[ https://issues.apache.org/jira/browse/COMPRESS-666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gary D. Gregory moved COMMONSSITE-169 to COMPRESS-666:
------------------------------------------------------

        Key: COMPRESS-666  (was: COMMONSSITE-169)
    Project: Commons Compress  (was: Apache Commons All)

> Commons compress 1.26.0 gives unexpected Corrupted TAR archive
> --------------------------------------------------------------
>
>                 Key: COMPRESS-666
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-666
>             Project: Commons Compress
>          Issue Type: Bug
>         Environment: Commons compress 1.26.0 to get a failure. Any tar tgz.
>            Reporter: Cosmin Carabet
>            Priority: Major
>
> Something in
> [https://github.com/apache/commons-compress/compare/rel/commons-compress-1.25.0...master]
> seems to make iterating through the tar entries of multiple
> TarArchiveInputStreams throw Corrupted TAR archive:
>
> {code:java}
> @Test
> void bla() {
>     ExecutorService executorService = Executors.newFixedThreadPool(10);
>     // 200 tasks, each opening its own stream over the same classpath resource
>     List<CompletableFuture<Void>> tasks = IntStream.range(0, 200)
>             .mapToObj(_idx -> CompletableFuture.runAsync(
>                     () -> {
>                         try (InputStream inputStream = this.getClass()
>                                         .getResourceAsStream("/<your favourite tar tgz>");
>                                 TarArchiveInputStream tarInputStream =
>                                         new TarArchiveInputStream(new GZIPInputStream(inputStream))) {
>                             TarArchiveEntry tarEntry;
>                             while ((tarEntry = tarInputStream.getNextTarEntry()) != null) {
>                                 System.out.println("Reading entry %s with size %d"
>                                         .formatted(tarEntry.getName(), tarEntry.getSize()));
>                             }
>                         } catch (Exception ex) {
>                             throw new SafeRuntimeException(ex);
>                         }
>                     },
>                     executorService))
>             .toList();
>
>     // Wait for every task; the first failure surfaces here
>     Futures.getUnchecked(CompletableFuture.allOf(tasks.toArray(new CompletableFuture<?>[0])));
> }
> {code}
> Although TarArchiveInputStream is marked as not thread safe, I am not reusing
> objects here. Those are in fact separate objects, presumably all with their
> own position tracking info.
>
> The stacktrace here looks like:
> {code:java}
> Caused by: java.io.IOException: Corrupted TAR archive.
>     at org.apache.commons.compress.archivers.tar.TarArchiveEntry.parseTarHeader(TarArchiveEntry.java:1480)
>     at org.apache.commons.compress.archivers.tar.TarArchiveEntry.<init>(TarArchiveEntry.java:534)
>     at org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getNextTarEntry(TarArchiveInputStream.java:431)
>     at
> Caused by: java.lang.IllegalArgumentException: Invalid byte 100 at offset 0 in 'dddddddddddd' len=12
>     at org.apache.commons.compress.archivers.tar.TarUtils.parseOctal(TarUtils.java:516)
>     at org.apache.commons.compress.archivers.tar.TarUtils.parseOctalOrBinary(TarUtils.java:540)
>     at org.apache.commons.compress.archivers.tar.TarArchiveEntry.parseTarHeaderUnwrapped(TarArchiveEntry.java:1496)
>     at org.apache.commons.compress.archivers.tar.TarArchiveEntry.parseTarHeader(TarArchiveEntry.java:1478)
>     ... 7 more
> {code}
> That code shows that occasionally the header is wrong (the tar entry name
> contains gibberish bits) which makes me think that `getNextTarEntry()` can be
> faulty.
>
> Running that code with commons compress 1.25.0 works as expected. So it's
> probably something added since November. Note that this is something related
> to parallelism - using an executor service with a single thread doesn't
> suffer from the same error. The tgz to decompress doesn't really matter - you
> can use a manually created one worth a few KBs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
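
For readers puzzling over the inner exception: byte 100 is ASCII 'd', so the
parser was handed a 12-byte numeric header field (such as size or modification
time in the ustar layout) that contained file data instead of octal digits.
The sketch below is a simplified, stdlib-only illustration of octal field
parsing; the class name TarOctalSketch and its method are hypothetical and are
not the Commons Compress implementation, which also handles binary encodings.

```java
import java.nio.charset.StandardCharsets;

/** Simplified sketch of tar octal-field parsing; NOT the Commons Compress code. */
final class TarOctalSketch {

    /** Parses an octal field, tolerating leading spaces and trailing space/NUL padding. */
    static long parseOctal(byte[] buffer, int offset, int length) {
        int start = offset;
        int end = offset + length;
        // Skip leading spaces
        while (start < end && buffer[start] == ' ') {
            start++;
        }
        // Trim trailing NUL or space terminators
        while (end > start && (buffer[end - 1] == 0 || buffer[end - 1] == ' ')) {
            end--;
        }
        long result = 0;
        for (int i = start; i < end; i++) {
            byte b = buffer[i];
            if (b < '0' || b > '7') {
                // Any non-octal byte means the header bytes are not what we expect
                throw new IllegalArgumentException(
                        "Invalid byte " + b + " at offset " + (i - offset));
            }
            result = (result << 3) + (b - '0');
        }
        return result;
    }

    public static void main(String[] args) {
        // A well-formed 12-byte field: eleven octal digits plus a NUL terminator
        byte[] good = "00000002000\0".getBytes(StandardCharsets.US_ASCII);
        System.out.println(parseOctal(good, 0, good.length));

        // The bytes seen in the report: file content where a header was expected
        byte[] bad = "dddddddddddd".getBytes(StandardCharsets.US_ASCII);
        try {
            parseOctal(bad, 0, bad.length);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under that reading, the exception suggests the stream position had drifted so
that entry data was interpreted as a header block, which would be consistent
with the report that the archive itself is intact and the failure only appears
under concurrency.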