[jira] [Commented] (COMPRESS-183) Support for de/encoding of tar entry names other than plain 8BIT conversion.

2012-03-23 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237447#comment-13237447
 ] 

Stefan Bodewig commented on COMPRESS-183:
-

I still need to add comments and want to fix the handling of linkName for tar 
entries that represent links, but in general the issue should be fixed with 
svn revision 1304709.

The tar package now uses the platform's native encoding by default (this may 
change to ISO-8859-1 before the release).  The encoding can be overridden via 
the constructor.
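
To illustrate why the choice of encoding matters, here is a minimal sketch 
that uses only the JDK. It is not the code from the svn revision above, and 
the class and method names are hypothetical. It decodes the same name bytes 
once with a plain 8-bit conversion (which gives the same result as 
ISO-8859-1) and once as UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not part of Commons Compress.
final class TarNameDecodingSketch {

    // Plain 8-bit conversion: every byte becomes exactly one char.
    static String decodeEightBit(byte[] raw) {
        StringBuilder sb = new StringBuilder(raw.length);
        for (byte b : raw) {
            sb.append((char) (b & 0xFF));
        }
        return sb.toString();
    }

    // Charset-aware decoding: multi-byte sequences become one char.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // "\u00e4.txt" is a-umlaut followed by ".txt"; UTF-8 needs two bytes for the umlaut.
        byte[] raw = "\u00e4.txt".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeEightBit(raw).length());                  // prints 6
        System.out.println(decode(raw, StandardCharsets.UTF_8).length());  // prints 5
    }
}
```

With the 8-bit conversion, the two UTF-8 bytes of the umlaut turn into two 
separate characters, so the decoded name no longer matches the original.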

The output stream has an additional option that tells it to write non-ASCII 
file names to PAX extension headers.  This should work with any modern 
implementation of tar and is the only way to get portable archives, at the 
expense of an additional 512-byte block.
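
For readers unfamiliar with PAX extension headers: each record has the form 
"<length> <key>=<value>\n", where the value is UTF-8 and the decimal length 
counts the whole record including the length digits themselves. The 
following is a minimal sketch of building such a record using only the JDK. 
It is an illustration under those assumptions rather than the code committed 
to Commons Compress, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not part of Commons Compress.
final class PaxRecordSketch {

    // Builds one PAX record: "<length> <key>=<value>\n", encoded as UTF-8.
    static byte[] paxRecord(String key, String value) {
        byte[] body = (" " + key + "=" + value + "\n").getBytes(StandardCharsets.UTF_8);

        // The length field counts its own digits, so iterate until it is stable
        // (adding the digits can push the total into one more digit).
        int len = body.length + String.valueOf(body.length).length();
        while (body.length + String.valueOf(len).length() != len) {
            len = body.length + String.valueOf(len).length();
        }

        byte[] prefix = String.valueOf(len).getBytes(StandardCharsets.US_ASCII);
        byte[] record = new byte[len];
        System.arraycopy(prefix, 0, record, 0, prefix.length);
        System.arraycopy(body, 0, record, prefix.length, body.length);
        return record;
    }

    public static void main(String[] args) {
        // A "path" record carrying a non-ASCII file name.
        byte[] record = paxRecord("path", "\u00e4.txt");
        System.out.println(record.length); // prints 15
    }
}
```

Because the record is always UTF-8, a reader does not need to know which 
encoding the writing platform used for the classic name field.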

The input stream will read and apply PAX extension headers transparently.
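
On the reading side, the data of a PAX extension header entry is a sequence 
of "<length> <key>=<value>\n" records. A minimal sketch of parsing them with 
only the JDK could look like this. It is not the Commons Compress 
implementation; the class and method names are hypothetical, and the input is 
assumed to be exactly the entry's data without the NUL padding of the final 
block.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration, not part of Commons Compress.
final class PaxHeaderParserSketch {

    // Parses consecutive "<length> <key>=<value>\n" records into a map.
    static Map<String, String> parse(byte[] data) {
        Map<String, String> headers = new LinkedHashMap<>();
        int pos = 0;
        while (pos < data.length) {
            // Read the decimal length up to the separating space.
            int len = 0;
            int i = pos;
            while (i < data.length && data[i] != ' ') {
                if (data[i] < '0' || data[i] > '9') {
                    throw new IllegalArgumentException("bad length at offset " + i);
                }
                len = len * 10 + (data[i] - '0');
                i++;
            }
            int digits = i - pos;
            // A record needs at least digits, space, '=' and newline.
            if (i >= data.length || len < digits + 3 || pos + len > data.length
                    || data[pos + len - 1] != '\n') {
                throw new IllegalArgumentException("malformed record at offset " + pos);
            }
            // The body sits between the space and the trailing newline.
            String body = new String(data, i + 1, len - digits - 2, StandardCharsets.UTF_8);
            int eq = body.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("missing '=' at offset " + pos);
            }
            headers.put(body.substring(0, eq), body.substring(eq + 1));
            pos += len;
        }
        return headers;
    }

    public static void main(String[] args) {
        byte[] data = "9 path=a\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(parse(data).get("path")); // prints a
    }
}
```

A reader that finds a "path" key in the result would use its value in place 
of the name stored in the following entry's regular header.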

> Support for de/encoding of tar entry names other than plain 8BIT conversion.
> 
>
> Key: COMPRESS-183
> URL: https://issues.apache.org/jira/browse/COMPRESS-183
> Project: Commons Compress
> Issue Type: Improvement
> Components: Archivers
> Affects Versions: 1.3
> Reporter: Joao Schim
> Labels: patch
> Fix For: 1.4
>
> Attachments: patch-tar-name-encoding.diff, 
> patch-tar-name-encoding.diff, patch-tar-name-encoding.diff
>
>
> The names of tar entries are currently encoded/decoded by means of plain 
> 8-bit conversions of byte to char and vice versa. This prohibits the use of 
> encodings like UTF-8 in the file names. Whether the use of UTF-8 (or any 
> other non-ASCII encoding) in file names is sensible is a chapter of its own. 
> However, tar archives that contain files whose names have been encoded with 
> UTF-8 do float around. These files currently cannot be read correctly by 
> commons-compress because the encoding is hardcoded to plain 8-bit only. 
> The supplied patch allows the use of encodings other than 8-bit by means of 
> a TarArchiveCodec structure. It does not change the standard functionality, 
> but adds the possibility of using a different encoding. 
> A method was added to the TarUtilsTest JUnit test to test the added 
> functionality.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (COMPRESS-183) Support for de/encoding of tar entry names other than plain 8BIT conversion.

2012-03-16 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231866#comment-13231866
 ] 

Stefan Bodewig commented on COMPRESS-183:
-

The zip package already contains code that is similar to the codec in your 
patch; I'll look into reusing that.

Modern (POSIX) tars support non-ASCII encodings via PAX extension headers, 
which current trunk already supports on the reading side.  Adding support on 
the writing side shouldn't be too hard.

