Mikaël MECHOULAM created COMPRESS-679:
-----------------------------------------

Summary: Regression on parallel processing of 7zip files
Key: COMPRESS-679
URL: https://issues.apache.org/jira/browse/COMPRESS-679
Project: Commons Compress
Issue Type: Bug
Affects Versions: 1.26.1, 1.26.0
Reporter: Mikaël MECHOULAM
Attachments: file.7z

I've run into a bug that occurs when several threads read the same 7z file simultaneously, each through its own {{SevenZFile}} instance. The following code illustrates the problem. The file {{file.7z}} is attached.

{code:java}
import java.io.InputStream;
import java.nio.file.Paths;
import java.util.stream.IntStream;

import org.apache.commons.compress.archivers.sevenz.SevenZArchiveEntry;
import org.apache.commons.compress.archivers.sevenz.SevenZFile;

public class TestZip {
    public static void main(final String[] args) {
        // Each thread opens its own SevenZFile on the same archive.
        final Runnable runnable = () -> {
            try {
                try (final SevenZFile sevenZFile = SevenZFile.builder().setPath(Paths.get("file.7z")).get()) {
                    SevenZArchiveEntry sevenZArchiveEntry;
                    while ((sevenZArchiveEntry = sevenZFile.getNextEntry()) != null) {
                        if ("file4.txt".equals(sevenZArchiveEntry.getName())) {
                            // To reproduce, the entry must not be the first one in the 7z archive.
                            final InputStream inputStream = sevenZFile.getInputStream(sevenZArchiveEntry);
                            // Process the entry content here...
                            break;
                        }
                    }
                }
            } catch (final Exception e) {
                // java.io.IOException: Checksum verification failed
                e.printStackTrace();
            }
        };
        // Start 30 threads that read the archive concurrently.
        IntStream.range(0, 30).forEach(i -> new Thread(runnable).start());
    }
}
{code}

Below is the output I receive on version 1.26:

{code:java}
java.io.IOException: Checksum verification failed
    at org.apache.commons.compress.utils.ChecksumVerifyingInputStream.verify(ChecksumVerifyingInputStream.java:98)
    at org.apache.commons.compress.utils.ChecksumVerifyingInputStream.read(ChecksumVerifyingInputStream.java:92)
    at org.apache.commons.io.IOUtils.skip(IOUtils.java:2422)
    at org.apache.commons.io.IOUtils.skip(IOUtils.java:2380)
    at org.apache.commons.compress.archivers.sevenz.SevenZFile.getCurrentStream(SevenZFile.java:912)
    at org.apache.commons.compress.archivers.sevenz.SevenZFile.getInputStream(SevenZFile.java:988)
    at com.infotel.arcsys.nativ.archiving.zip.TestZip.lambda$main$0(TestZip.java:21)
    at java.base/java.lang.Thread.run(Thread.java:833)
{code}

The issue seems to have been introduced in the transition from version 1.25 to 1.26 of Apache Commons Compress. In the {{SevenZFile}} class of the library, the private method {{getCurrentStream}} has migrated from {{IOUtils.skip(InputStream, long)}} to a method with the same signature in the Commons IO package, which changes the behavior. In version 1.26, the skip uses a shared, unsynchronized buffer that is in theory intended only for writing ({{SCRATCH_BYTE_BUFFER_WO}}). This causes checksum verification failures within the library.

The problem seems to be resolved by specifying the {{Supplier}} of the buffer to use:

{code:java}
try (InputStream stream = deferredBlockStreams.remove(0)) {
    org.apache.commons.io.IOUtils.skip(stream, Long.MAX_VALUE,
            () -> new byte[org.apache.commons.io.IOUtils.DEFAULT_BUFFER_SIZE]);
}
{code}
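
The buffer-per-call idea behind the proposed fix can be illustrated with JDK classes alone. The following is only a minimal sketch, not the Commons Compress or Commons IO code: the class {{SkipBufferSketch}} and its methods {{skipAll}}, {{sampleData}}, {{crcOf}} and {{countMismatches}} are hypothetical names, and {{java.util.zip.CheckedInputStream}} stands in for {{ChecksumVerifyingInputStream}} on the assumption that both compute the checksum from the bytes placed in the caller's buffer.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;

public class SkipBufferSketch {

    /** Skips to the end of the stream by reading into a buffer obtained from the supplier. */
    static long skipAll(final InputStream in, final Supplier<byte[]> bufferSupplier) throws IOException {
        final byte[] buffer = bufferSupplier.get();
        long skipped = 0;
        int n;
        while ((n = in.read(buffer, 0, buffer.length)) != -1) {
            skipped += n;
        }
        return skipped;
    }

    /** Builds 1 MiB of deterministic sample bytes (illustrative data only). */
    static byte[] sampleData() {
        final byte[] data = new byte[1 << 20];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i * 31);
        }
        return data;
    }

    /** Computes the reference CRC-32 of the data in a single pass. */
    static long crcOf(final byte[] data) {
        final CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    /** Runs the skip in several threads and counts how many saw a wrong checksum. */
    static int countMismatches(final int threadCount) throws InterruptedException {
        final byte[] data = sampleData();
        final long expected = crcOf(data);
        final AtomicInteger mismatches = new AtomicInteger();
        final List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < threadCount; i++) {
            final Thread thread = new Thread(() -> {
                try (CheckedInputStream in =
                        new CheckedInputStream(new ByteArrayInputStream(data), new CRC32())) {
                    // A fresh buffer per call: no other thread can overwrite it
                    // between the read and the checksum update.
                    skipAll(in, () -> new byte[8192]);
                    if (in.getChecksum().getValue() != expected) {
                        mismatches.incrementAndGet();
                    }
                } catch (final IOException e) {
                    mismatches.incrementAndGet();
                }
            });
            threads.add(thread);
            thread.start();
        }
        for (final Thread thread : threads) {
            thread.join();
        }
        return mismatches.get();
    }

    public static void main(final String[] args) throws InterruptedException {
        System.out.println("Checksum mismatches: " + countMismatches(30));
    }
}
```

With a private buffer per call, every thread computes the same CRC-32. Replacing the supplier with one that returns a single static array shared by all threads would let one thread's read overwrite bytes that another thread has not yet fed to its checksum, which is presumably the mechanism behind the failure reported here.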