[ 
https://issues.apache.org/jira/browse/COMPRESS-724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruiqi Dong updated COMPRESS-724:
--------------------------------
    Priority: Major  (was: Critical)

> TarUtils.parsePAX1XSparseHeaders() skips one extra full record when the 
> sparse header is already aligned
> --------------------------------------------------------------------------------------------------------
>
>                 Key: COMPRESS-724
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-724
>             Project: Commons Compress
>          Issue Type: Bug
>          Components: Archivers
>    Affects Versions: 1.28.0
>            Reporter: Ruiqi Dong
>            Priority: Major
>
> *Summary*
> parsePAX1XSparseHeaders() computes the remaining padding as recordSize - 
> bytesRead % recordSize. When bytesRead % recordSize == 0, the correct skip 
> should be 0, but the current code skips an entire extra record. This helper 
> is used on the normal GNU sparse 1.x read path, so an affected archive can 
> advance the stream past real file data.
>  
> *Affected code*
> File: src/main/java/org/apache/commons/compress/archivers/tar/TarUtils.java
> {code:java}
> static List<TarArchiveStructSparse> parsePAX1XSparseHeaders(
>         final InputStream inputStream, final int recordSize) throws 
> IOException {
>     ...
>     while (sparseHeadersCount-- > 0) {
>         ...
>         bytesRead += readResult[1];
>         sparseHeaders.add(new TarArchiveStructSparse(sparseOffset, 
> sparseNumbytes));
>     }
>     final long bytesToSkip = recordSize - bytesRead % recordSize;
>     IOUtils.skip(inputStream, bytesToSkip);
>     return sparseHeaders;
> } {code}
> This code path is used when reading GNU sparse 1.x entries from both 
> TarArchiveInputStream and TarFile:
> {code:java}
> // TarArchiveInputStream
> if (currEntry.isPaxGNU1XSparse()) {
>     
> currEntry.setSparseHeaders(TarUtils.parsePAX1XSparseHeaders(currentInputStream,
>  getRecordSize()));
> }
> buildSparseInputStreams(); {code}
> {code:java}
> // TarFile
> if (currEntry.isPaxGNU1XSparse()) {
>     final long position = archive.position();
>     
> currEntry.setSparseHeaders(TarUtils.parsePAX1XSparseHeaders(currentStream, 
> recordSize));
>     final long sparseHeadersSize = archive.position() - position;
>     currEntry.setSize(currEntry.getSize() - sparseHeadersSize);
>     currEntry.setDataOffset(currEntry.getDataOffset() + sparseHeadersSize);
> } {code}
>  
> *Reproducer*
> Add the following test to 
> src/test/java/org/apache/commons/compress/archivers/tar/TarUtilsTest.java:
> {code:java}
> @Test
> void testParsePax1xSparseHeadersDoesNotSkipAlignedDataRecord() throws 
> Exception {
>     final byte[] sparseHeader = pax1xSparseHeaderAlignedToSingleRecord();
>     final byte[] fileData = new byte[513];
>     Arrays.fill(fileData, 0, 512, (byte) 'A');
>     fileData[512] = 'B';
>     final byte[] input = ArrayUtils.addAll(sparseHeader, fileData);
>     try (ByteArrayInputStream in = new ByteArrayInputStream(input)) {
>         final List<TarArchiveStructSparse> sparseHeaders =
>                 TarUtils.parsePAX1XSparseHeaders(in, 512);
>         assertEquals(100, sparseHeaders.size());
>         assertEquals('A', in.read());
>     }
> } {code}
> Run:
> {code:java}
> mvn -q -Dtest=org.apache.commons.compress.archivers.tar.TarUtilsTest test 
> {code}
> Observed behavior:
> {code:java}
> TarUtilsTest.testParsePax1xSparseHeadersDoesNotSkipAlignedDataRecord:251
> expected: <65> but was: <66> {code}
> Expected behavior:
> After parsing an already aligned sparse header block, the next byte should be 
> the first byte of file data.
> The current formula converts an exact alignment case into a full-record skip. 
> In the GNU sparse 1.x read path, that means the reader moves one record too 
> far before building the sparse-data streams, so real payload bytes are 
> skipped, and the extracted/read file content becomes misaligned.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to