Mikaël MECHOULAM created COMPRESS-679:
-----------------------------------------

             Summary: Regression on parallel processing of 7zip files
                 Key: COMPRESS-679
                 URL: https://issues.apache.org/jira/browse/COMPRESS-679
             Project: Commons Compress
          Issue Type: Bug
    Affects Versions: 1.26.1, 1.26.0
            Reporter: Mikaël MECHOULAM
         Attachments: file.7z

I've run into a bug that occurs when reading a 7z file from several threads simultaneously. The following code illustrates the problem; the {{file.7z}} archive is attached.

{code:java}
import java.io.InputStream;
import java.nio.file.Paths;
import java.util.stream.IntStream;
import org.apache.commons.compress.archivers.sevenz.SevenZArchiveEntry;
import org.apache.commons.compress.archivers.sevenz.SevenZFile;

public class TestZip {
    public static void main(final String[] args) {
        final Runnable runnable = () -> {
            try {
                try (final SevenZFile sevenZFile = SevenZFile.builder().setPath(Paths.get("file.7z")).get()) {
                    SevenZArchiveEntry sevenZArchiveEntry;
                    while ((sevenZArchiveEntry = sevenZFile.getNextEntry()) != null) {
                        // To reproduce, the entry must not be the first one in the 7z archive
                        if ("file4.txt".equals(sevenZArchiveEntry.getName())) {
                            final InputStream inputStream = sevenZFile.getInputStream(sevenZArchiveEntry);
                            // treatments...
                            break;
                        }
                    }
                }
            } catch (final Exception e) { // java.io.IOException: Checksum verification failed
                e.printStackTrace();
            }
        };
        IntStream.range(0, 30).forEach(i -> new Thread(runnable).start());
    }
}
{code}
Below is the output I get with version 1.26:

{code:java}
java.io.IOException: Checksum verification failed
  at org.apache.commons.compress.utils.ChecksumVerifyingInputStream.verify(ChecksumVerifyingInputStream.java:98)
  at org.apache.commons.compress.utils.ChecksumVerifyingInputStream.read(ChecksumVerifyingInputStream.java:92)
  at org.apache.commons.io.IOUtils.skip(IOUtils.java:2422)
  at org.apache.commons.io.IOUtils.skip(IOUtils.java:2380)
  at org.apache.commons.compress.archivers.sevenz.SevenZFile.getCurrentStream(SevenZFile.java:912)
  at org.apache.commons.compress.archivers.sevenz.SevenZFile.getInputStream(SevenZFile.java:988)
  at com.infotel.arcsys.nativ.archiving.zip.TestZip.lambda$main$0(TestZip.java:21)
  at java.base/java.lang.Thread.run(Thread.java:833)
{code}
The issue seems to have been introduced between versions 1.25 and 1.26 of Apache Commons Compress. In the {{SevenZFile}} class, the private method {{getCurrentStream}} switched from the Commons Compress {{IOUtils.skip(InputStream, long)}} to the method with the same signature in Commons IO, and the behavior changed. In 1.26, the Commons IO method skips by reading into a shared, unsynchronized buffer that is meant to be write-only ({{SCRATCH_BYTE_BUFFER_WO}}). As the stack trace shows, the checksum-verifying stream is read through that buffer during the skip, so concurrent skips in other threads can overwrite the bytes being checksummed and make verification fail. The problem seems to be resolved by passing a {{Supplier}} that allocates a new buffer for each call:
{code:java}
try (InputStream stream = deferredBlockStreams.remove(0)) {
    // Use a fresh buffer for this call instead of the shared scratch buffer
    org.apache.commons.io.IOUtils.skip(stream, Long.MAX_VALUE, () -> new byte[org.apache.commons.io.IOUtils.DEFAULT_BUFFER_SIZE]);
}
{code}
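To make the suspected mechanism concrete, here is a JDK-only sketch; it is not the library code. {{skipByReading}} is a hypothetical helper with the same shape as Commons IO's {{IOUtils.skip(InputStream, long, Supplier<byte[]>)}}, and {{java.util.zip.CheckedInputStream}} with {{CRC32}} stands in for Commons Compress's {{ChecksumVerifyingInputStream}} (an assumption for illustration). Each read fills the buffer and the checksum is then updated from that same buffer, so a per-call buffer keeps every CRC correct, while a single static buffer shared by all threads lets one thread overwrite bytes another thread is still checksumming.

{code:java}
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;

public class SkipBufferDemo {

    // One buffer shared by every caller, analogous to a static scratch buffer.
    private static final byte[] SHARED_BUFFER = new byte[8192];

    // Hypothetical helper, shaped like IOUtils.skip(InputStream, long, Supplier<byte[]>):
    // it "skips" by reading into the supplied buffer and discarding the bytes.
    static long skipByReading(InputStream in, long toSkip, Supplier<byte[]> bufferSupplier) throws IOException {
        final byte[] buffer = bufferSupplier.get();
        long remaining = toSkip;
        while (remaining > 0) {
            final int n = in.read(buffer, 0, (int) Math.min(buffer.length, remaining));
            if (n < 0) {
                break; // end of stream reached before toSkip bytes
            }
            remaining -= n;
        }
        return toSkip - remaining;
    }

    // Skips through the whole stream and returns the CRC32 that CheckedInputStream
    // computed from the bytes as they sat in the buffer after each read.
    static long crcAfterSkip(byte[] data, Supplier<byte[]> bufferSupplier) throws IOException {
        try (CheckedInputStream in = new CheckedInputStream(new ByteArrayInputStream(data), new CRC32())) {
            skipByReading(in, Long.MAX_VALUE, bufferSupplier);
            return in.getChecksum().getValue();
        }
    }

    // Runs `tasks` concurrent skips and counts how many ended with a wrong CRC.
    static int countMismatches(byte[] data, Supplier<byte[]> bufferSupplier, int tasks) throws Exception {
        final CRC32 expected = new CRC32();
        expected.update(data);
        final ExecutorService pool = Executors.newFixedThreadPool(8);
        try {
            final List<Future<Long>> results = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                results.add(pool.submit(() -> crcAfterSkip(data, bufferSupplier)));
            }
            int mismatches = 0;
            for (final Future<Long> result : results) {
                if (result.get() != expected.getValue()) {
                    mismatches++;
                }
            }
            return mismatches;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        final byte[] data = new byte[1 << 20];
        new Random(42).nextBytes(data);
        // Fresh buffer per call, as in the Supplier-based fix: prints 0.
        System.out.println("per-call buffer mismatches: " + countMismatches(data, () -> new byte[8192], 30));
        // Shared buffer: the result depends on thread timing and may be non-zero.
        System.out.println("shared buffer mismatches: " + countMismatches(data, () -> SHARED_BUFFER, 30));
    }
}
{code}

The per-call count is always 0 because each task owns its stream, checksum and buffer. The shared-buffer count depends on thread scheduling, so a run that happens to print 0 there does not show that sharing the buffer is safe.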



--
This message was sent by Atlassian Jira
(v8.20.10#820010)