Enable creation of tool-readable ZIP archives with file names containing 
non-ASCII characters
---------------------------------------------------------------------------------------------

                 Key: SANDBOX-176
                 URL: http://issues.apache.org/jira/browse/SANDBOX-176
             Project: Commons Sandbox
          Issue Type: Improvement
          Components: Compress
         Environment: Any / All
            Reporter: Christian Gosch


Currently it is not possible to generate externally readable ZIP archives with
java.util.zip.* or org.apache.commons.compress.* when the entries to include
have names with characters outside US-ASCII. This should be changed so that
at least org.apache.commons.compress.* can produce ZIP archives in an
international context that are readable by common ZIP archiver tools like
pkzip, gzip, WinZIP, PowerArchiver, WinRAR / rar, StuffIt...


For java.util.zip.* this is due to a really old flaw in the handling of entry
names: they are always written as UTF-8, which is rather Java-specific, and not
as Cp437, which is what most (possibly all) ZIP archiver tools expect and
write. For more details see:

http://bugs.sun.com/bugdatabase/view_bug.do;:YfiG?bug_id=4244499
http://bugs.sun.com/bugdatabase/view_bug.do;:YfiG?bug_id=4820807


For org.apache.commons.compress.archivers.zip.* the "compress & save" operation 
can be easily improved by extending ZipArchive:

// Add member:

    protected String m_encoding = null;

// Add constructor:

    public ZipArchive(String encoding) {
        m_encoding = encoding;
    }

// Extend doSave(FileOutputStream):
// ...
        // Pack-Operation
        ZipOutputStream out = null;
        try {
            out = new ZipOutputStream(new BufferedOutputStream(output));
            if (m_encoding != null) {            // added
                out.setEncoding(m_encoding);     // added
            }                                    // added
            while (iterator.hasNext()) {
// ...


Now it is possible to instantiate a ZipArchive with "Cp437" as encoding, and
external tools can recover the original entry names even if they contain
non-ASCII characters. (On the other hand, Java cannot read back & inflate such
an archive, since it expects UTF-8!)
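
To illustrate the mismatch described above, here is a minimal sketch that
encodes one entry name both ways and then decodes it with the "wrong" charset.
The class and method names are made up for this example, and it assumes the
Java runtime ships the IBM437 charset (alias "Cp437"), which a minimal runtime
may not.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EntryNameEncodingDemo {

    // Assumption: IBM437 ("Cp437") is available in this runtime.
    static final Charset CP437 = Charset.forName("Cp437");

    // Raw bytes as they would be stored in the ZIP entry header.
    static byte[] encode(String name, Charset charset) {
        return name.getBytes(charset);
    }

    // Entry name as a reader assuming the given charset would see it.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    static String hex(byte[] raw) {
        StringBuilder sb = new StringBuilder();
        for (byte b : raw) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String name = "f\u00fcr.txt"; // contains U+00FC (u with diaeresis)

        byte[] asCp437 = encode(name, CP437);
        byte[] asUtf8 = encode(name, StandardCharsets.UTF_8);

        System.out.println("Cp437 bytes: " + hex(asCp437));
        System.out.println("UTF-8 bytes: " + hex(asUtf8));

        // What an external tool (assuming Cp437) makes of a Java-written name:
        System.out.println("UTF-8 read as Cp437: " + decode(asUtf8, CP437));
        // What Java (assuming UTF-8) makes of a tool-written name:
        System.out.println("Cp437 read as UTF-8: "
                + decode(asCp437, StandardCharsets.UTF_8));
    }
}
```

The name only round-trips when writer and reader agree on the charset, which
is why the writer-side setEncoding() call alone does not help Java's own
reader.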

The "read & inflate" operation for ZipArchive is more difficult to extend,
since it currently relies completely on java.util.zip.*. The other reason is
that ZIP archives do not contain any hint about the character encoding used
for file names etc. It seems that the usual tools simply use Cp437 and Java
simply uses UTF-8, without any stated reason. Thus a reader has to guess.
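
One way such guessing could look, purely as a sketch and not as a proposal
for the actual ZipArchive code: try a strict UTF-8 decode of the raw name
bytes first, and fall back to Cp437 if the bytes are not valid UTF-8. The
class and method names are hypothetical, and the heuristic is imperfect,
because some Cp437 byte sequences also happen to be valid UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EntryNameGuesser {

    // Assumption: IBM437 ("Cp437") is available in this runtime.
    static final Charset CP437 = Charset.forName("Cp437");

    // Decode raw entry-name bytes: strict UTF-8 first, Cp437 as fallback.
    static String guessName(byte[] raw) {
        CharsetDecoder strictUtf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return strictUtf8.decode(ByteBuffer.wrap(raw)).toString();
        } catch (CharacterCodingException e) {
            // Not well-formed UTF-8: assume the archive came from a
            // Cp437-writing tool. Every byte maps to some Cp437 character,
            // so this branch cannot fail, but it may still guess wrong.
            return new String(raw, CP437);
        }
    }

    public static void main(String[] args) {
        byte[] fromTool = "f\u00fcr.txt".getBytes(CP437);
        byte[] fromJava = "f\u00fcr.txt".getBytes(StandardCharsets.UTF_8);
        System.out.println(guessName(fromTool));
        System.out.println(guessName(fromJava));
    }
}
```

Pure ASCII names decode identically under both charsets, so the guess only
matters for names that contain bytes above 0x7F.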

For TarArchive the situation is unclear. Here the commons-compress
implementation does not rely on third-party code as far as I can see, and TAR
is not a Java-bound file type (unlike JAR). Thus chances are that everything
works well, even when entry names with non-ASCII characters come into play.


        
