[ 
https://issues.apache.org/jira/browse/COMPRESS-471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16694557#comment-16694557
 ] 

Gaurav Mittal commented on COMPRESS-471:
----------------------------------------

I tried using the raw name, but that would require a lot of code changes, and 
it would be difficult to manage them across the zip preview, the unzip 
process, dialog boxes, etc.

 

I would like the library to set a flag when a ZipFile contains non-UTF-8 
characters, for example:

foundUTF8 = zipFile.getEncoding().equalsIgnoreCase("UTF8");

which would return true or false.

Alternatively, there could be a boolean value that tells whether non-UTF-8 
characters are present in the zip file:

foundUTF8 = zipFile.isUTF8Encoding(); // proposed library API

 

On that basis we would be able to make more robust changes.
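
To illustrate the kind of check meant here, below is a minimal sketch using only the JDK. It assumes the caller already has an entry name as raw bytes, and it is not part of the library. The class name Utf8NameCheck and the helpers isValidUtf8 and decodeName are hypothetical names chosen for this example. The check is only a heuristic: bytes written in another charset can occasionally also be well-formed UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of Commons Compress.
public class Utf8NameCheck {

    /** Returns true if the bytes form a well-formed UTF-8 sequence. */
    public static boolean isValidUtf8(byte[] rawName) {
        // REPORT makes the decoder throw instead of silently substituting
        // a replacement character for malformed input.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(rawName));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Decodes as UTF-8 when the bytes are well-formed, else with the fallback. */
    public static String decodeName(byte[] rawName, Charset fallback) {
        Charset cs = isValidUtf8(rawName) ? StandardCharsets.UTF_8 : fallback;
        return new String(rawName, cs);
    }

    public static void main(String[] args) {
        byte[] utf8Name = "r\u00e9sum\u00e9.txt".getBytes(StandardCharsets.UTF_8);
        // A lone 0x82 byte is a UTF-8 continuation byte without a lead byte,
        // so this sequence is malformed as UTF-8.
        byte[] legacyName = {(byte) 0x82, '.', 't', 'x', 't'};

        // Cp850 is an optional charset in the JDK; fall back to Latin-1
        // for this demo if it is not installed.
        Charset fallback = Charset.isSupported("Cp850")
                ? Charset.forName("Cp850")
                : StandardCharsets.ISO_8859_1;

        System.out.println(isValidUtf8(utf8Name));
        System.out.println(isValidUtf8(legacyName));
        System.out.println(decodeName(legacyName, fallback));
    }
}
```

With a library-provided flag, client code would not need to repeat this check for every entry in the preview, unzip and dialog code paths.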

 

We know that the ZipFile constructor has everything it needs to detect 
non-UTF-8 characters in file names, but it does not expose that information 
to client code.

 

private ZipFile(SeekableByteChannel channel, String archiveName, String encoding,
        boolean useUnicodeExtraFields, boolean closeOnError) throws IOException {
    this.entries = new LinkedList();

    .................

    try {
        Map<ZipArchiveEntry, ZipFile.NameAndComment> entriesWithoutUTF8Flag =
                this.populateFromCentralDirectory();

        ..............

        ................

    }

    .............

}

 

Could you please make changes in the library, or suggest some other way 
(other than the raw name)?

> Zipped files names having non UTF-8 encoding are being replaced with '?' 
> while previewing file.
> -----------------------------------------------------------------------------------------------
>
>                 Key: COMPRESS-471
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-471
>             Project: Commons Compress
>          Issue Type: Bug
>    Affects Versions: 1.18
>            Reporter: Gaurav Mittal
>            Priority: Major
>         Attachments: Document(▒Γ║╗)_20150226_11.zip, Incorrect.JPG, 
> correct.JPG
>
>
> All strings that are not supported by UTF-8 are being replaced by the '?' 
> symbol. In the issue scenario the charset is 'Cp850'. Since the Commons 
> Compress library cannot identify the 'Cp850' charset, it falls back to the 
> default charset 'UTF-8', and therefore we see the '?' symbol.
> In our code:
> ZipFile ret = new ZipFile(path);
> However, if we pass the encoding to the constructor as shown below, it works 
> fine:
> ZipFile ret = new ZipFile(new File(path), "Cp850", false);
> But this second approach, where we force the encoding to 'Cp850', may cause 
> side effects in some cases.
>  --------------------------------------------------------------------------
> The code below does not seem to resolve the UTF-8 conflicts and does not 
> produce correct file names:
>
> try {
>     final Map<ZipArchiveEntry, NameAndComment> entriesWithoutUTF8Flag =
>         populateFromCentralDirectory();
>     resolveLocalFileHeaderData(entriesWithoutUTF8Flag);
>     success = true;
> } finally {
>     closed = !success;
>     if (!success && closeOnError) {
>         IOUtils.closeQuietly(archive);
>     }
> }



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
