[ 
http://issues.apache.org/jira/browse/SANDBOX-176?page=comments#action_12437834 
] 
            
Christian Gosch commented on SANDBOX-176:
-----------------------------------------

Some arguments for giving this issue priority (from the reporter, I have to admit):

Java bug 4244499 (ZipEntry() does not convert filenames from Unicode to 
platform), which this proposal helps to work around, is number two on the 
Top25 list with 566 votes, exceeded only by 4670071 
(java.lang.ClassLoader.loadClassInternal(String) is too restrictive) with 793 
votes. So enabling people to work around this nasty Java bug will probably 
delight a lot of people.

Of course the reporter successfully tested the proposed solution with "Cp437" 
as the parameter for the new constructor. It has proven to work with a file / 
archive entry named "_A_Ä_a_ä_O_Ö_o_ö_U_Ü_u_ü_sz_ß_.jpg" and the zip archiver 
tools WinRAR and PowerArchiver (current versions). Additionally, the resulting 
archive was successfully extracted (with the file name preserved) by a service 
provider's server-based unzip mechanism. I do not know which program actually 
runs there, but the result was as it should be.
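
For readers who want to reproduce a similar check, here is a minimal sketch of 
a round trip with the test name above. It does not use the proposed 
ZipArchive(String) constructor. Instead it assumes a JDK (Java 7 or later) 
where java.util.zip.ZipOutputStream and ZipInputStream accept a Charset, and a 
JDK build that ships the Cp437 (IBM437) charset. The class and method names 
are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of commons-compress.
class Cp437RoundTrip {
    // The test name from the comment, written with Unicode escapes so the
    // source compiles the same way regardless of the file encoding.
    static final String NAME =
        "_A_\u00C4_a_\u00E4_O_\u00D6_o_\u00F6_U_\u00DC_u_\u00FC_sz_\u00DF_.jpg";

    /** Builds a one-entry archive in memory; the entry name is encoded with cs. */
    static byte[] write(String entryName, Charset cs) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(buf, cs)) {
            out.putNextEntry(new ZipEntry(entryName));
            out.write(new byte[] {1, 2, 3}); // dummy payload
            out.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    /** Returns the name of the first entry, decoding it with cs. */
    static String readFirstName(byte[] archive, Charset cs) {
        try (ZipInputStream in =
                 new ZipInputStream(new ByteArrayInputStream(archive), cs)) {
            ZipEntry entry = in.getNextEntry();
            return entry == null ? null : entry.getName();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Charset cp437 = Charset.forName("Cp437");
        byte[] archive = write(NAME, cp437);
        String readBack = readFirstName(archive, cp437);
        System.out.println("name preserved: " + NAME.equals(readBack));
    }
}
```

This only shows that the name survives when writer and reader agree on Cp437. 
Whether a particular external tool reads the archive correctly still has to be 
checked with that tool.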

> Enable creation of tool-readable ZIP archives with file names containing 
> non-ASCII characters
> ---------------------------------------------------------------------------------------------
>
>                 Key: SANDBOX-176
>                 URL: http://issues.apache.org/jira/browse/SANDBOX-176
>             Project: Commons Sandbox
>          Issue Type: Improvement
>          Components: Compress
>         Environment: Any / All
>            Reporter: Christian Gosch
>
> Currently it is not possible to generate externally readable ZIP archives 
> with java.util.zip.* or org.apache.commons.compress.* when the entries to 
> include have names with characters outside US-ASCII. This should be changed 
> to enable at least org.apache.commons.compress.* to produce ZIP archives in 
> an international context which are readable by common ZIP archiver tools like 
> pkzip, gzip, WinZIP, PowerArchiver, WinRAR / rar, StuffIt...
> For java.util.zip.* this is due to a really old flaw in the handling of entry 
> names: they are always rendered as UTF-8, which is kind of Java specific, and 
> not as Cp437, which is expected and written by most ZIP archiver tools 
> (possibly all). For more details see:
> http://bugs.sun.com/bugdatabase/view_bug.do;:YfiG?bug_id=4244499
> http://bugs.sun.com/bugdatabase/view_bug.do;:YfiG?bug_id=4820807
> For org.apache.commons.compress.archivers.zip.* the "compress & save" 
> operation can be easily improved by extending ZipArchive:
> // Add member:
>     protected String m_encoding = null;
> // Add constructor:
>     public ZipArchive(String encoding) {
>         m_encoding = encoding;
>     }
> // Extend doSave(FileOutputStream):
> // ...
>         // Pack operation
>         ZipOutputStream out = null;
>         try {
>             out = new ZipOutputStream(new BufferedOutputStream(output));
>             if (m_encoding != null) {   // added
>                 out.setEncoding(m_encoding);   // added
>             }  // added
>             while (iterator.hasNext()) {
> // ...
> Now it is possible to instantiate a ZipArchive with "Cp437" as encoding, and 
> external tools can recover the original entry names even if they contain 
> non-ASCII characters. (On the other hand, Java cannot read back and extract 
> such an archive since it expects UTF-8!)
> The "read & extract" operation for ZipArchive is more difficult to extend 
> since it currently relies completely on java.util.zip.*. The other reason 
> is that ZIP archives do not contain any hint about the character encoding 
> used for file names etc. It seems that the usual tools simply use Cp437 and 
> Java simply uses UTF-8, without any stated reason. Thus an extractor has to 
> guess.
> For TarArchive the situation is unclear. Here the commons-compress 
> implementation does not rely on third-party code as far as I can see, and TAR 
> is not a Java-bound file type (unlike JAR, which is Java-bound). So chances 
> are that everything works well, even when entry names with non-ASCII 
> characters come into play.
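
The quoted description notes that a reader has to guess the entry name 
encoding, because the archive does not declare it. One possible heuristic, 
shown here only as a sketch, is to try a strict UTF-8 decode first and fall 
back to Cp437 when the bytes are not valid UTF-8. The class and method names 
are hypothetical, and the Cp437 charset is assumed to be available in the JDK 
at hand.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of java.util.zip or commons-compress.
class EntryNameGuesser {
    /** Decodes raw entry-name bytes as UTF-8 if valid, otherwise as Cp437. */
    static String decodeName(byte[] raw) {
        try {
            // REPORT makes the decoder throw on invalid input instead of
            // silently substituting replacement characters.
            return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(raw))
                .toString();
        } catch (CharacterCodingException notUtf8) {
            // Every byte value maps to some character in Cp437, so this
            // fallback always yields a string.
            return new String(raw, Charset.forName("Cp437"));
        }
    }

    public static void main(String[] args) {
        String name = "_A_\u00C4_a_\u00E4_.jpg";
        byte[] asCp437 = name.getBytes(Charset.forName("Cp437"));
        byte[] asUtf8 = name.getBytes(StandardCharsets.UTF_8);
        System.out.println(name.equals(decodeName(asCp437)));
        System.out.println(name.equals(decodeName(asUtf8)));
    }
}
```

The guess can be wrong: some Cp437 byte sequences happen to be valid UTF-8 as 
well, and such names would be decoded as UTF-8. It is a best-effort fallback, 
not a reliable detection.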
