[jira] [Commented] (COMPRESS-638) The GzipCompressorOutputStream#writeHeader() uses ISO_8859_1 to write the file name and comment. If the strings contains non-ISO_8859_1 characters, unknown characters are displayed after decompression. Use percent encoding for non ISO_8859_1 characters.

Gary D. Gregory (Jira) Sat, 21 Jan 2023 06:29:04 -0800


    [ https://issues.apache.org/jira/browse/COMPRESS-638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17679451#comment-17679451 ]


Gary D. Gregory commented on COMPRESS-638:
------------------------------------------

Percent-encoding is now used for non-ISO_8859_1 characters.

Please see git master or a snapshot build here: 
https://repository.apache.org/content/repositories/snapshots/org/apache/commons/commons-compress/1.23-SNAPSHOT/

This should be considered a workaround IMO and we could change the 
percent-encoding format to something else in the future if needed.

There is no round trip back to the non-ISO_8859_1 characters when reading a 
GZip file, since we cannot tell what the intent of the file name bytes really 
is unless we use some special marker, which is possible in the future, I suppose.
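
For illustration only, here is a minimal Java sketch of the idea described 
above. It is not the code committed to Commons Compress: the class name, the 
method name and the choice to percent-encode the UTF-8 bytes are assumptions 
made for this example. It also shows why a reader cannot reverse the encoding: 
a name that needed encoding and an ASCII name that already looks 
percent-encoded produce the same header bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative sketch; names are hypothetical, not Commons Compress API.
final class GzipHeaderNameSketch {

    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    /** Returns the bytes for a header string, without the terminating 0 byte. */
    static byte[] encodeHeaderString(final String s) {
        // Every character fits in ISO_8859_1: write the string unchanged.
        if (StandardCharsets.ISO_8859_1.newEncoder().canEncode(s)) {
            return s.getBytes(StandardCharsets.ISO_8859_1);
        }
        // Otherwise keep ASCII bytes as-is and percent-encode all other UTF-8 bytes.
        // Assumption of this sketch: a literal '%' is NOT escaped.
        final StringBuilder sb = new StringBuilder();
        for (final byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (b >= 0) {
                sb.append((char) b);
            } else {
                sb.append('%').append(HEX[(b >> 4) & 0xF]).append(HEX[b & 0xF]);
            }
        }
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(final String[] args) {
        // U+4E2D is not representable in ISO_8859_1, so it is percent-encoded.
        final byte[] encoded = encodeHeaderString("\u4E2D.txt");
        // A plain ASCII name that already looks percent-encoded passes through as-is.
        final byte[] literal = encodeHeaderString("%E4%B8%AD.txt");

        System.out.println(new String(encoded, StandardCharsets.US_ASCII)); // %E4%B8%AD.txt
        System.out.println(Arrays.equals(encoded, literal));                // true
    }
}
```

Because both inputs yield identical bytes, a reader has no way to know whether 
to decode them, which is the ambiguity a special marker would have to resolve.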


> The GzipCompressorOutputStream#writeHeader() uses ISO_8859_1 to write the 
> file name and comment.  If the strings contains non-ISO_8859_1 characters, 
> unknown characters are displayed after decompression. Use percent encoding 
> for non ISO_8859_1 characters.
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: COMPRESS-638
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-638
>             Project: Commons Compress
>          Issue Type: Bug
>            Reporter: Radar wen
>            Priority: Major
>             Fix For: 1.23
>
>         Attachments: 0110.png
>
>
> The GzipCompressorOutputStream#writeHeader method uses ISO_8859_1 to write 
> the file name. 
> If the file name contains non-ISO_8859_1 characters, unknown characters are 
> displayed after decompression (see the attached 0110.png).
>  Can ISO_8859_1 be changed to UTF-8? 
>         if (filename != null) {
>             out.write(filename.getBytes(ISO_8859_1));
>             out.write(0);
>         }
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
