[ 
https://issues.apache.org/jira/browse/COMPRESS-565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287589#comment-17287589
 ] 

Peter Lee commented on COMPRESS-565:
------------------------------------

I'm not familiar with the *Expand-Archive PowerShell utility*. Is it open 
source? I can't find anything on Google.

7-Zip is open source, but I'm not familiar with its code. :(

The difference between calling 
_output.setCreateUnicodeExtraFields(ZipArchiveOutputStream.UnicodeExtraFieldPolicy.ALWAYS)_
 and not calling it is whether the _Info-ZIP Unicode Path Extra Field_ is 
added to the entry's extra field data. I think the reason why 7-Zip complains 
and the *Expand-Archive PowerShell utility* on Windows can't extract the 
archive is that *the _Info-ZIP Unicode Path Extra Field_ is not supported by 
them*.

See also section 4.6.9 of the [zip 
APPNOTE|https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT] for more 
detailed information.
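
To check whether a given entry carries this field, one can walk the raw 
extra field data, which is a sequence of records of the form (2-byte header 
ID, 2-byte data size, data), all little-endian. The following is only a 
minimal sketch under that assumption; the class and method names are 
hypothetical and are not part of Commons Compress.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of Commons Compress.
class ExtraFieldScan {
    // Header ID of the Info-ZIP Unicode Path Extra Field.
    static final int UNICODE_PATH_ID = 0x7075;

    /** Returns true if the extra field data contains a record with the given header ID. */
    static boolean hasField(byte[] extra, int headerId) {
        ByteBuffer buf = ByteBuffer.wrap(extra).order(ByteOrder.LITTLE_ENDIAN);
        // Each record starts with a 2-byte header ID and a 2-byte data size.
        while (buf.remaining() >= 4) {
            int id = buf.getShort() & 0xFFFF;
            int size = buf.getShort() & 0xFFFF;
            if (id == headerId) {
                return true;
            }
            if (size > buf.remaining()) {
                break; // truncated record, stop scanning
            }
            buf.position(buf.position() + size); // skip this record's data
        }
        return false;
    }
}
```

Calling _ExtraFieldScan.hasField(extra, ExtraFieldScan.UNICODE_PATH_ID)_ on 
the extra field bytes of an entry should then tell the two cases apart.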

 

With _ZipArchiveOutputStream.UnicodeExtraFieldPolicy.ALWAYS_ being set, we will 
always add the _Info-ZIP Unicode Path Extra Field_, which can be seen in the 
generated zip:

!image-2021-02-20-15-51-21-747.png!

Here is a brief explanation:

First of all, the zip format is little-endian.

The first 2 bytes, 0x7075, are the signature of the _Info-ZIP Unicode Path 
Extra Field_. The 0x000e is the size of this field, which is 14.

The 0x01 is the version of this extra field, which is currently always 1 
(according to the 
[zip APPNOTE|https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT]).

The 4 bytes 0x7df6c07c are the CRC32 checksum of the file name (which can be 
easily checked with any CRC32 tool using the name _input.bin_).

The 9 bytes 0x69 6e 70 75 74 2e 62 69 6e are the UTF-8 encoding of the file 
name, which is _input.bin_.

You can see that 9 + 4 + 1 = 14, which is exactly the size of this field 
mentioned above. So I think we have built a correct _Info-ZIP Unicode Path 
Extra Field_.
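
The layout described above can be reproduced with the JDK alone. This is 
only a sketch to illustrate the byte layout, not the code Commons Compress 
uses; the class and method names are hypothetical, and it assumes the name 
stored in the header has the same bytes as its UTF-8 form (true for an ASCII 
name such as _input.bin_).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical helper, not part of Commons Compress.
class UnicodePathFieldSketch {
    /** Builds the raw bytes of an Info-ZIP Unicode Path Extra Field for the given name. */
    static byte[] build(String name) {
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);

        // Assumption: the header's file name bytes equal the UTF-8 bytes.
        CRC32 crc = new CRC32();
        crc.update(utf8);

        int dataSize = 1 + 4 + utf8.length; // version + CRC32 + UTF-8 name
        ByteBuffer buf = ByteBuffer.allocate(4 + dataSize).order(ByteOrder.LITTLE_ENDIAN);
        buf.putShort((short) 0x7075);       // signature, stored on disk as 75 70
        buf.putShort((short) dataSize);     // size of the data that follows
        buf.put((byte) 1);                  // version
        buf.putInt((int) crc.getValue());   // CRC32 of the file name
        buf.put(utf8);                      // UTF-8 file name
        return buf.array();
    }

    /** Formats bytes as space-separated hex, for comparison with a hex dump. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b & 0xFF));
        }
        return sb.toString().trim();
    }
}
```

Printing _UnicodePathFieldSketch.toHex(UnicodePathFieldSketch.build("input.bin"))_ 
gives 18 bytes (4 bytes of signature and size, plus 14 bytes of data) that 
can be compared against the hex dump of the generated zip.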

> Regression - Corrupted headers when using 64 bit ZipArchiveOutputStream
> -----------------------------------------------------------------------
>
>                 Key: COMPRESS-565
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-565
>             Project: Commons Compress
>          Issue Type: Bug
>          Components: Archivers
>    Affects Versions: 1.20
>            Reporter: Evgenii Bovykin
>            Assignee: Peter Lee
>            Priority: Major
>         Attachments: image-2021-02-20-15-51-21-747.png
>
>
> We've recently updated the commons-compress library from version 1.9 to 1.20 
> and are now experiencing a problem that didn't occur before.
>  
> When using ZipArchiveOutputStream to archive a 5 GB file with the 
> following settings
> {{output.setUseZip64(Zip64Mode.Always)}}
>  
> {{output.setCreateUnicodeExtraFields(ZipArchiveOutputStream.UnicodeExtraFieldPolicy.ALWAYS)}}
> the resulting archive contains corrupted headers.
> *The Expand-Archive PowerShell utility cannot extract the archive at all and 
> reports an error about a corrupted header. 7-Zip also complains about it, but 
> can extract the archive.*
>  
> The problem didn't appear when using library version 1.9.
>  
> I've created a sample project that reproduces the error - 
> [https://github.com/missingdays/commons-compress-example]
> The issue doesn't reproduce if you do any of the following:
>  
>  # Downgrade library to version 1.9
>  # Remove 
> output.setCreateUnicodeExtraFields(ZipArchiveOutputStream.UnicodeExtraFieldPolicy.ALWAYS)
>  # Remove output.setUseZip64(Zip64Mode.Always) and zip a smaller file (e.g. 1 GB)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
