[jira] [Comment Edited] (COMPRESS-638) The GzipCompressorOutputStream#writeHeader() uses ISO_8859_1 to write the file name and comment. If the strings contains non-ISO_8859_1 characters, unknown characters are displayed after decompression. Use percent encoding for non ISO_8859_1 characters.

Gary D. Gregory (Jira) Sat, 21 Jan 2023 06:30:04 -0800


    [ 
https://issues.apache.org/jira/browse/COMPRESS-638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17679451#comment-17679451
 ]


Gary D. Gregory edited comment on COMPRESS-638 at 1/21/23 2:29 PM:
-------------------------------------------------------------------

Percent-encoding is now used for non-ISO_8859_1 characters.

Please see git master or a snapshot build here: 
https://repository.apache.org/content/repositories/snapshots/org/apache/commons/commons-compress/1.23-SNAPSHOT/

This should be considered a workaround IMO and we could change the 
percent-encoding format to something else in the future if needed.
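
A minimal sketch of the idea, not the code that was committed: names that fit in ISO_8859_1 are written unchanged, and any other name is written as ASCII with its non-ASCII UTF-8 bytes percent-encoded. The class name GzipNameEncodingSketch and the helper headerBytes are hypothetical, and the exact escaping rules in the snapshot build may differ.

```java
import java.nio.charset.StandardCharsets;

public class GzipNameEncodingSketch {

    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Hypothetical helper (an assumption for illustration, not the Commons Compress method).
    static byte[] headerBytes(String name) {
        // Names representable in ISO_8859_1 are written unchanged.
        if (StandardCharsets.ISO_8859_1.newEncoder().canEncode(name)) {
            return name.getBytes(StandardCharsets.ISO_8859_1);
        }
        // Otherwise percent-encode every non-ASCII UTF-8 byte; ASCII bytes pass through.
        StringBuilder sb = new StringBuilder();
        for (byte b : name.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            if (v < 0x80) {
                sb.append((char) v);
            } else {
                sb.append('%').append(HEX[v >> 4]).append(HEX[v & 0x0F]);
            }
        }
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] bytes = headerBytes("\u6D4B\u8BD5.txt");
        // prints %E6%B5%8B%E8%AF%95.txt
        System.out.println(new String(bytes, StandardCharsets.US_ASCII));
    }
}
```

Note that this sketch leaves a literal '%' in the name unescaped, so the output alone does not say whether a percent sequence was produced by the encoder.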

There is no roundtrip back to the non-ISO_8859_1 characters when reading a GZip 
file, since we cannot tell what the intent of the file name bytes really is 
unless we use some special marker, which is possible in the future, I suppose.
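
To illustrate the ambiguity, here is a small sketch assuming the reader decodes the header bytes as ISO_8859_1. The class GzipNameRoundTripSketch and its helpers are hypothetical. The same header bytes are consistent with two different original names, so a reader cannot safely percent-decode on its own.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class GzipNameRoundTripSketch {

    // What a reader sees when it decodes the header name bytes as ISO_8859_1.
    static String asRead(byte[] headerBytes) {
        return new String(headerBytes, StandardCharsets.ISO_8859_1);
    }

    // One possible guess at the writer's intent: treat the name as percent-encoded UTF-8.
    // Caveat: URLDecoder also turns '+' into a space, which is wrong for file names.
    static String percentDecoded(String name) {
        try {
            return URLDecoder.decode(name, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        byte[] header = "%E6%B5%8B.txt".getBytes(StandardCharsets.US_ASCII);
        String literal = asRead(header);          // a file really named %E6%B5%8B.txt
        String decoded = percentDecoded(literal); // or a file named \u6D4B.txt
        // Both candidates match the same bytes; prints false
        System.out.println(literal.equals(decoded));
    }
}
```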





> The GzipCompressorOutputStream#writeHeader() uses ISO_8859_1 to write the 
> file name and comment.  If the strings contains non-ISO_8859_1 characters, 
> unknown characters are displayed after decompression. Use percent encoding 
> for non ISO_8859_1 characters.
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: COMPRESS-638
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-638
>             Project: Commons Compress
>          Issue Type: Bug
>            Reporter: Radar wen
>            Priority: Major
>             Fix For: 1.23
>
>         Attachments: 0110.png
>
>
> The GzipCompressorOutputStream#writeHeader method uses ISO_8859_1 to 
> write the file name. 
> If the file name contains non-ISO_8859_1 characters, unknown characters 
> are displayed after decompression. !0110.png!
>  Can ISO_8859_1 be changed to UTF-8? 
>         if (filename != null) {
>             out.write(filename.getBytes(ISO_8859_1));
>             out.write(0);
>         }
>  
>  
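
The reported symptom can be reproduced outside Commons Compress with a short sketch. String#getBytes(Charset) replaces characters the charset cannot map with the charset's replacement byte, which is '?' for ISO_8859_1. The class name Latin1ReplacementDemo and the sample file name are assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;

public class Latin1ReplacementDemo {

    // Encodes the name the way the quoted writeHeader snippet does, then decodes it
    // the way a reader following the ISO_8859_1 convention would.
    static String roundTripLatin1(String name) {
        byte[] written = name.getBytes(StandardCharsets.ISO_8859_1);
        return new String(written, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The two CJK characters are unmappable and come back as '?'; prints ??.txt
        System.out.println(roundTripLatin1("\u6D4B\u8BD5.txt"));
    }
}
```

Switching the writer to UTF-8 would avoid the replacement, but RFC 1952 defines the file name field as ISO 8859-1 (LATIN-1) characters, so tools that follow the specification would display the UTF-8 bytes incorrectly.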



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
