[ https://issues.apache.org/jira/browse/COMPRESS-565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287589#comment-17287589 ]
Peter Lee commented on COMPRESS-565:
------------------------------------

I'm not familiar with the *Expand-Archive Powershell utility*. Is it open source? I couldn't find anything about it on Google. 7zip is open source, but I'm not familiar with its code. :(

The difference between calling _output.setCreateUnicodeExtraFields(ZipArchiveOutputStream.UnicodeExtraFieldPolicy.ALWAYS)_ and not calling it is whether we add the _Info-ZIP Unicode Path Extra Field_ to the entry's extra fields.

I think the reason why 7z complains and the *Expand-Archive Powershell utility* on Windows can't extract the archive is that *the _Info-ZIP Unicode Path Extra Field_ is not supported by them*. See section 4.6.9 of the [zip APPNOTE|https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT] for more detailed information.

With _ZipArchiveOutputStream.UnicodeExtraFieldPolicy.ALWAYS_ set, we always add the _Info-ZIP Unicode Path Extra Field_, which can be seen in the generated zip:

!image-2021-02-20-15-51-21-747.png!

Here is a short explanation. First of all, the zip format is little endian, so multi-byte values are stored with the least significant byte first. The first 2 bytes, 0x7075, are the header ID (signature) of the _Info-ZIP Unicode Path Extra Field_. The next 2 bytes, 0x000e, are the size of the field's data, which is 14. The 0x01 is the version of this extra field, which is currently always 1 (according to the [zip APPNOTE|https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT]). The 4 bytes 0x7df6c07c are the CRC32 checksum of the file name (which can easily be checked with any CRC32 tool using the name _input.bin_). The 9 bytes 0x69 6e 70 75 74 2e 62 69 6e are the UTF-8 encoding of the file name, which is _input.bin_. So 1 + 4 + 9 = 14 is exactly the data size recorded in the field.
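To make the byte layout above easier to check by hand, here is a minimal sketch that rebuilds the field for _input.bin_ using only the JDK. It is not the Commons Compress implementation: the class name _UnicodePathExtraFieldSketch_ and the method _build_ are hypothetical names chosen for illustration, and the sketch assumes an ASCII file name, so the name bytes stored in the header and the UTF-8 name are identical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical helper for illustration only; not part of Commons Compress.
class UnicodePathExtraFieldSketch {
    static final int HEADER_ID = 0x7075; // Info-ZIP Unicode Path Extra Field
    static final int VERSION = 1;        // currently always 1 per the APPNOTE

    // Builds header ID + data size + version + name CRC32 + UTF-8 name.
    // Assumption: the name is ASCII, so the CRC32 of the UTF-8 bytes equals
    // the CRC32 of the name bytes stored in the zip header.
    static byte[] build(String fileName) {
        byte[] utf8Name = fileName.getBytes(StandardCharsets.UTF_8);

        CRC32 crc = new CRC32();
        crc.update(utf8Name);

        int dataSize = 1 + 4 + utf8Name.length; // version + CRC32 + name

        // Zip stores multi-byte values in little-endian order.
        ByteBuffer buf = ByteBuffer.allocate(4 + dataSize)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        buf.putShort((short) HEADER_ID);
        buf.putShort((short) dataSize);
        buf.put((byte) VERSION);
        buf.putInt((int) crc.getValue());
        buf.put(utf8Name);
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] field = build("input.bin");
        StringBuilder hex = new StringBuilder();
        for (byte b : field) {
            hex.append(String.format("%02x ", b & 0xff));
        }
        System.out.println(hex.toString().trim());
        System.out.println("data size = " + (field.length - 4)); // 14
    }
}
```

For _input.bin_ the result is 18 bytes long: 4 bytes of header (75 70 0e 00 on disk, because of the little-endian order) followed by the 14 data bytes. The four CRC32 bytes printed by the sketch can be compared against the ones in the screenshot, keeping in mind that they are also written least significant byte first.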
So I think we have built a correct _Info-ZIP Unicode Path Extra Field_.

> Regression - Corrupted headers when using 64 bit ZipArchiveOutputStream
> -----------------------------------------------------------------------
>
>                 Key: COMPRESS-565
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-565
>             Project: Commons Compress
>          Issue Type: Bug
>          Components: Archivers
>    Affects Versions: 1.20
>            Reporter: Evgenii Bovykin
>            Assignee: Peter Lee
>            Priority: Major
>         Attachments: image-2021-02-20-15-51-21-747.png
>
>
> We've recently updated commons-compress library from version 1.9 to 1.20 and
> now experiencing the problem that didn't occur before.
>
> When using ZipArchiveOutputStream to archive 5Gb file and setting the
> following fields
> {{output.setUseZip64(Zip64Mode.Always)}}
>
> {{output.setCreateUnicodeExtraFields(ZipArchiveOutputStream.UnicodeExtraFieldPolicy.ALWAYS)}}
> resulting archive contains corrupted headers.
> *Expand-Archive Powershell utility cannot extract the archive at all with the
> error about corrupted header. 7zip also complains about it, but can extract
> the archive.*
>
> The problem didn't appear when using library version 1.9.
>
> I've created a sample project that reproduces the error -
> [https://github.com/missingdays/commons-compress-example]
> Issue doesn't reproduce if you do any of the following:
>
> # Downgrade library to version 1.9
> # Remove
> output.setCreateUnicodeExtraFields(ZipArchiveOutputStream.UnicodeExtraFieldPolicy.ALWAYS)
> # Remove output.setUseZip64(Zip64Mode.Always) and zip smaller file (e.g. 1Gb)

--
This message was sent by Atlassian Jira
(v8.3.4#803005)