[ https://issues.apache.org/jira/browse/COMPRESS-477?focusedWorklogId=341701&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341701 ]
ASF GitHub Bot logged work on COMPRESS-477:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 12/Nov/19 07:09
            Start Date: 12/Nov/19 07:09
    Worklog Time Spent: 10m
      Work Description: PeterAlfreadLee commented on pull request #86: COMPRESS-477 building a split zip
URL: https://github.com/apache/commons-compress/pull/86

[COMPRESS-477](https://issues.apache.org/jira/projects/COMPRESS/issues/COMPRESS-477)

Add support for building a split/spanned zip.

Sample code:
```
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;

import org.apache.commons.compress.archivers.zip.ZipArchiveEntry;
import org.apache.commons.compress.archivers.zip.ZipArchiveOutputStream;
import org.apache.commons.compress.utils.IOUtils;
import org.junit.Test;

// getFilesToZip() and dir are assumed to come from the enclosing test class.
@Test
public void buildSplitZipTest() throws IOException {
    File directoryToZip = getFilesToZip();
    File outputZipFile = new File(dir, "splitZip.zip");
    long splitSize = 100 * 1024L; /* 100 KB */
    // try-with-resources closes the stream even if adding a file fails
    try (ZipArchiveOutputStream zipArchiveOutputStream =
             new ZipArchiveOutputStream(outputZipFile, splitSize)) {
        addFilesToZip(zipArchiveOutputStream, directoryToZip);
    }
    // TODO: validate the created zip files when extracting split zip is merged into master
}

private void addFilesToZip(ZipArchiveOutputStream zipArchiveOutputStream, File fileToAdd) throws IOException {
    if (fileToAdd.isDirectory()) {
        File[] children = fileToAdd.listFiles();
        if (children == null) {
            return; // listFiles() returns null if the directory cannot be read
        }
        for (File file : children) {
            addFilesToZip(zipArchiveOutputStream, file);
        }
    } else {
        ZipArchiveEntry zipArchiveEntry = new ZipArchiveEntry(fileToAdd.getPath());
        zipArchiveEntry.setMethod(ZipEntry.DEFLATED);
        zipArchiveOutputStream.putArchiveEntry(zipArchiveEntry);
        // close the input file once its content has been copied
        try (InputStream in = new FileInputStream(fileToAdd)) {
            IOUtils.copy(in, zipArchiveOutputStream);
        }
        zipArchiveOutputStream.closeArchiveEntry();
    }
}
```

This PR is implemented by adding a new class `ZipSplitOutputStream`, and it's mainly implemented like this:
1. Write the zip split signature to the zip file in the constructor of `ZipSplitOutputStream` by calling `writeZipSplitSignature`;
2. Based on the zip specification, the split size must be between 64K and 4,294,967,295 bytes;
3. Rename the split zip files like .z01, .z02, ... , .z(N-1), .zip ONLY IF there is more than one split segment;
4. Get the only split segment, whose suffix is .zip, IF the split size is big enough (that is, the split size is bigger than the actual zip size);
5. Create a new zip split segment if the size of the data to write exceeds the split size; the newly created zip segments are named in sequence like .z01, .z02, ..., .z99, .z100, .z101, ... , .zip;
6. Based on the zip specification, the End Of Central Directory (EOCD) and the Zip64 End Of Central Directory Locator (Zip64_EOCDL) must reside on the same segment, so `ZipSplitOutputStream` will create a new segment if the remaining size is not enough before writing the EOCD and Zip64_EOCDL;
7. When creating a `ZipArchiveOutputStream`, if the split size is specified, it will create a split zip instead of a normal zip (as `ZipSplitOutputStream` needs the file name when creating new split segments, the constructor is like `public ZipArchiveOutputStream(final File file, final long zipSplitSize)`);
8. The disk number, relative offset, number of this disk, number of Central Directories on this disk, total number of disks in Central Directory, Zip64 End Of Central Directory, Zip64 End Of Central Directory Locator, and End Of Central Directory have all been set to the right values when writing a split/spanned zip;
9. The test cases need to be updated when [#84](https://github.com/apache/commons-compress/pull/84) is merged, because it seems I cannot test my created split/spanned zip on Linux. I have tested it on Windows and it works well;
10. This PR has some minor conflicts with [#84](https://github.com/apache/commons-compress/pull/84), and I will resolve all of these conflicts as soon as [#84](https://github.com/apache/commons-compress/pull/84) is merged.

Please feel free to let me know if the code needs to be refactored or rebased. I'm looking forward to your reviews. :-)

@bodewig @garydgregory

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 341701)
    Time Spent: 3h  (was: 2h 50m)

> Support for splitted zip files
> ------------------------------
>
>                 Key: COMPRESS-477
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-477
>             Project: Commons Compress
>          Issue Type: New Feature
>          Components: Archivers
>   Affects Versions: 1.18
>            Reporter: Luís Filipe Nassif
>            Priority: Major
>              Labels: zip
>          Time Spent: 3h
>  Remaining Estimate: 0h
>
> It would be very useful to support split zip files. I've read
> [https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT] and understood
> that simply concatenating the segments and removing the split signature
> 0x08074b50 from the first segment would be sufficient, but it is not that
> simple, because Compress fails with the exception below:
> {code}
> Caused by: java.util.zip.ZipException: archive's ZIP64 end of central directory locator is corrupt.
>     at org.apache.commons.compress.archivers.zip.ZipFile.positionAtCentralDirectory64(ZipFile.java:924) ~[commons-compress-1.18.jar:1.18]
>     at org.apache.commons.compress.archivers.zip.ZipFile.positionAtCentralDirectory(ZipFile.java:901) ~[commons-compress-1.18.jar:1.18]
>     at org.apache.commons.compress.archivers.zip.ZipFile.populateFromCentralDirectory(ZipFile.java:621) ~[commons-compress-1.18.jar:1.18]
>     at org.apache.commons.compress.archivers.zip.ZipFile.<init>(ZipFile.java:295) ~[commons-compress-1.18.jar:1.18]
>     at org.apache.commons.compress.archivers.zip.ZipFile.<init>(ZipFile.java:280) ~[commons-compress-1.18.jar:1.18]
>     at org.apache.commons.compress.archivers.zip.ZipFile.<init>(ZipFile.java:236) ~[commons-compress-1.18.jar:1.18]
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
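
For readers of this thread, the split-size limits and segment naming described in the pull request comment (items 2, 3 and 5), together with the split signature quoted in the issue, can be sketched in plain Java. This is only an illustration under stated assumptions, not the code from pull request #86: the class `SplitZipSketch` and all of its method names are hypothetical, "64K" is assumed to mean 65,536 bytes, and the signature bytes are written little-endian, as ZIP does for its multi-byte fields.

```java
import java.util.Locale;

/** Hypothetical helper for illustration only; it is not part of Commons Compress. */
final class SplitZipSketch {

    /** Lower bound from the PR description; "64K" is assumed to mean 64 * 1024 bytes. */
    static final long MIN_SPLIT_SIZE = 64 * 1024L;
    /** Upper bound from the PR description. */
    static final long MAX_SPLIT_SIZE = 4_294_967_295L;
    /** Split signature quoted in the issue description. */
    static final int SPLIT_SIGNATURE = 0x08074b50;

    private SplitZipSketch() {
    }

    /** Returns splitSize unchanged if it lies within the allowed range, else throws. */
    static long checkSplitSize(long splitSize) {
        if (splitSize < MIN_SPLIT_SIZE || splitSize > MAX_SPLIT_SIZE) {
            throw new IllegalArgumentException("split size must be between "
                    + MIN_SPLIT_SIZE + " and " + MAX_SPLIT_SIZE + " bytes, got " + splitSize);
        }
        return splitSize;
    }

    /**
     * Builds the file name of one segment. segmentNumber is 1-based.
     * The last segment always ends in .zip; earlier ones end in .z01, .z02, ..., .z99, .z100, ...
     */
    static String segmentName(String baseName, int segmentNumber, boolean isLast) {
        if (segmentNumber < 1) {
            throw new IllegalArgumentException("segment numbers start at 1");
        }
        if (isLast) {
            return baseName + ".zip";
        }
        // %02d pads to at least two digits and grows naturally beyond 99
        return String.format(Locale.ROOT, "%s.z%02d", baseName, segmentNumber);
    }

    /** The split signature as it would appear on disk (least significant byte first). */
    static byte[] splitSignatureBytes() {
        return new byte[] {
            (byte) SPLIT_SIGNATURE,
            (byte) (SPLIT_SIGNATURE >>> 8),
            (byte) (SPLIT_SIGNATURE >>> 16),
            (byte) (SPLIT_SIGNATURE >>> 24)
        };
    }

    public static void main(String[] args) {
        checkSplitSize(100 * 1024L); // the 100 KB used in the sample test is within range
        int segmentCount = 3;
        for (int i = 1; i <= segmentCount; i++) {
            System.out.println(segmentName("splitZip", i, i == segmentCount));
        }
    }
}
```

With three segments, the loop in `main` prints `splitZip.z01`, `splitZip.z02` and `splitZip.zip`. With a single segment only `splitZip.zip` is produced, which corresponds to item 4 of the description.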