[ https://issues.apache.org/jira/browse/COMPRESS-638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17679451#comment-17679451 ]
Gary D. Gregory commented on COMPRESS-638:
------------------------------------------

Percent-encoding is now used for non-ISO_8859_1 characters. Please see git
master or a snapshot build here:
https://repository.apache.org/content/repositories/snapshots/org/apache/commons/commons-compress/1.23-SNAPSHOT/

This should be considered a workaround IMO, and we could change the
percent-encoding format to something else in the future if needed.

There is no round trip back to the non-ISO_8859_1 characters when reading a
GZip file, since we cannot tell what the intent of the file name bytes really
is, unless we used some special marker, which is possible in the future, I
suppose.

> The GzipCompressorOutputStream#writeHeader() uses ISO_8859_1 to write the
> file name and comment. If the strings contains non-ISO_8859_1 characters,
> unknown characters are displayed after decompression. Use percent encoding
> for non ISO_8859_1 characters.
> ---------------------------------------------------------------------------
>
>                 Key: COMPRESS-638
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-638
>             Project: Commons Compress
>          Issue Type: Bug
>            Reporter: Radar wen
>            Priority: Major
>             Fix For: 1.23
>
>         Attachments: 0110.png
>
>
> The GzipCompressorOutputStream#writeHeader method uses ISO_8859_1 to
> write the file name.
> If the file name contains non-ISO_8859_1 characters, some unknown characters
> are displayed after decompression. !0110.png!
> Can we change ISO_8859_1 to UTF-8?
>         if (filename != null) {
>             out.write(filename.getBytes(ISO_8859_1));
>             out.write(0);
>         }
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
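
For illustration only, the percent-encoding fallback described in the comment above could look roughly like the sketch below. It is not the code committed to Commons Compress. The class name GzipNameSketch and the helper headerBytes are hypothetical, and the choice to percent-encode the UTF-8 bytes of every non-ASCII character is an assumption, since the comment does not spell out the exact format.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class GzipNameSketch {

    /**
     * Hypothetical helper: returns the bytes to write for a gzip header
     * file name or comment. Strings that fit in ISO_8859_1 are written
     * unchanged; anything else falls back to percent-encoding.
     */
    static byte[] headerBytes(String s) {
        CharsetEncoder latin1 = StandardCharsets.ISO_8859_1.newEncoder();
        if (latin1.canEncode(s)) {
            return s.getBytes(StandardCharsets.ISO_8859_1);
        }
        // Assumption: ASCII passes through as-is, and every other
        // character is written as %XX for each of its UTF-8 bytes.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            int len = Character.charCount(cp);
            if (cp < 0x80) {
                sb.append((char) cp);
            } else {
                byte[] utf8 = s.substring(i, i + len).getBytes(StandardCharsets.UTF_8);
                for (byte b : utf8) {
                    sb.append(String.format("%%%02X", b & 0xFF));
                }
            }
            i += len;
        }
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // A name with two CJK characters, which ISO_8859_1 cannot represent.
        byte[] encoded = headerBytes("\u6d4b\u8bd5.txt");
        // prints %E6%B5%8B%E8%AF%95.txt
        System.out.println(new String(encoded, StandardCharsets.US_ASCII));
    }
}
```

A reader of the resulting file sees only the percent-encoded ASCII text and, as noted in the comment, cannot tell whether the "%" sequences were produced by the writer or were part of the original name, so this sketch does not attempt to decode them.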