[jira] [Updated] (COMPRESS-724) TarUtils.parsePAX1XSparseHeaders() skips one extra full record when the sparse header is already aligned

Ruiqi Dong (Jira) Thu, 21 May 2026 02:03:11 -0700


     [ 
https://issues.apache.org/jira/browse/COMPRESS-724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Ruiqi Dong updated COMPRESS-724:
--------------------------------
    Description: 
*Summary*
parsePAX1XSparseHeaders() computes the remaining padding as recordSize - 
bytesRead % recordSize. When bytesRead % recordSize == 0, the correct skip 
should be 0, but the current code skips an entire extra record. This helper is 
used on the normal GNU sparse 1.x read path, so an affected archive can advance 
the stream past real file data.
 
*Affected code*
File: src/main/java/org/apache/commons/compress/archivers/tar/TarUtils.java
{code:java}
static List<TarArchiveStructSparse> parsePAX1XSparseHeaders(
        final InputStream inputStream, final int recordSize) throws IOException 
{
    ...
    while (sparseHeadersCount-- > 0) {
        ...
        bytesRead += readResult[1];
        sparseHeaders.add(new TarArchiveStructSparse(sparseOffset, 
sparseNumbytes));
    }
    final long bytesToSkip = recordSize - bytesRead % recordSize;
    IOUtils.skip(inputStream, bytesToSkip);
    return sparseHeaders;
} {code}
This code path is used when reading GNU sparse 1.x entries from both 
TarArchiveInputStream and TarFile:
{code:java}
// TarArchiveInputStream
if (currEntry.isPaxGNU1XSparse()) {
    
currEntry.setSparseHeaders(TarUtils.parsePAX1XSparseHeaders(currentInputStream, 
getRecordSize()));
}
buildSparseInputStreams(); {code}
{code:java}
// TarFile
if (currEntry.isPaxGNU1XSparse()) {
    final long position = archive.position();
    currEntry.setSparseHeaders(TarUtils.parsePAX1XSparseHeaders(currentStream, 
recordSize));
    final long sparseHeadersSize = archive.position() - position;
    currEntry.setSize(currEntry.getSize() - sparseHeadersSize);
    currEntry.setDataOffset(currEntry.getDataOffset() + sparseHeadersSize);
} {code}
 

*Reproducer*
Add the following test to 
src/test/java/org/apache/commons/compress/archivers/tar/TarUtilsTest.java:
{code:java}
@Test
void testParsePax1xSparseHeadersDoesNotSkipAlignedDataRecord() throws Exception 
{
    final byte[] sparseHeader = pax1xSparseHeaderAlignedToSingleRecord();
    final byte[] fileData = new byte[513];
    Arrays.fill(fileData, 0, 512, (byte) 'A');
    fileData[512] = 'B';
    final byte[] input = ArrayUtils.addAll(sparseHeader, fileData);
    try (ByteArrayInputStream in = new ByteArrayInputStream(input)) {
        final List<TarArchiveStructSparse> sparseHeaders =
                TarUtils.parsePAX1XSparseHeaders(in, 512);
        assertEquals(100, sparseHeaders.size());
        assertEquals('A', in.read());
    }
} {code}
Run:
{code:java}
mvn -q -Dtest=org.apache.commons.compress.archivers.tar.TarUtilsTest test {code}
Observed behavior:
{code:java}
TarUtilsTest.testParsePax1xSparseHeadersDoesNotSkipAlignedDataRecord:251
expected: <65> but was: <66> {code}
Expected behavior:
After parsing an already aligned sparse header block, the next byte should be 
the first byte of file data.

The current formula converts an exact alignment case into a full-record skip. 
In the GNU sparse 1.x read path, that means the reader moves one record too far 
before building the sparse-data streams, so real payload bytes are skipped, and 
the extracted/read file content becomes misaligned.

  was:
*Summary*
parsePAX1XSparseHeaders() computes the remaining padding as recordSize - 
bytesRead % recordSize. When bytesRead % recordSize == 0, the correct skip 
should be 0, but the current code skips an entire extra record. This helper is 
used on the normal GNU sparse 1.x read path, so an affected archive can advance 
the stream past real file data.
 
*Affected code*
File: src/main/java/org/apache/commons/compress/archivers/tar/TarUtils.java
{code:java}
static List<TarArchiveStructSparse> parsePAX1XSparseHeaders(
        final InputStream inputStream, final int recordSize) throws IOException 
{
    ...
    while (sparseHeadersCount-- > 0) {
        ...
        bytesRead += readResult[1];
        sparseHeaders.add(new TarArchiveStructSparse(sparseOffset, 
sparseNumbytes));
    }
    final long bytesToSkip = recordSize - bytesRead % recordSize;
    IOUtils.skip(inputStream, bytesToSkip);
    return sparseHeaders;
} {code}
**
This code path is used when reading GNU sparse 1.x entries from both 
TarArchiveInputStream and TarFile:
{code:java}
// TarArchiveInputStream
if (currEntry.isPaxGNU1XSparse()) {
    
currEntry.setSparseHeaders(TarUtils.parsePAX1XSparseHeaders(currentInputStream, 
getRecordSize()));
}
buildSparseInputStreams(); {code}
{code:java}
// TarFile
if (currEntry.isPaxGNU1XSparse()) {
    final long position = archive.position();
    currEntry.setSparseHeaders(TarUtils.parsePAX1XSparseHeaders(currentStream, 
recordSize));
    final long sparseHeadersSize = archive.position() - position;
    currEntry.setSize(currEntry.getSize() - sparseHeadersSize);
    currEntry.setDataOffset(currEntry.getDataOffset() + sparseHeadersSize);
} {code}
*Reproducer*
Add the following test to 
src/test/java/org/apache/commons/compress/archivers/tar/TarUtilsTest.java:
{code:java}
@Test
void testParsePax1xSparseHeadersDoesNotSkipAlignedDataRecord() throws Exception 
{
    final byte[] sparseHeader = pax1xSparseHeaderAlignedToSingleRecord();
    final byte[] fileData = new byte[513];
    Arrays.fill(fileData, 0, 512, (byte) 'A');
    fileData[512] = 'B';
    final byte[] input = ArrayUtils.addAll(sparseHeader, fileData);
    try (ByteArrayInputStream in = new ByteArrayInputStream(input)) {
        final List<TarArchiveStructSparse> sparseHeaders =
                TarUtils.parsePAX1XSparseHeaders(in, 512);
        assertEquals(100, sparseHeaders.size());
        assertEquals('A', in.read());
    }
} {code}
Run:
{code:java}
mvn -q -Dtest=org.apache.commons.compress.archivers.tar.TarUtilsTest test {code}
Observed behavior:
{code:java}
TarUtilsTest.testParsePax1xSparseHeadersDoesNotSkipAlignedDataRecord:251
expected: <65> but was: <66> {code}
Expected behavior:
After parsing an already aligned sparse header block, the next byte should be 
the first byte of file data.


The current formula converts an exact alignment case into a full-record skip. 
In the GNU sparse 1.x read path, that means the reader moves one record too far 
before building the sparse-data streams, so real payload bytes are skipped, and 
the extracted/read file content becomes misaligned.


> TarUtils.parsePAX1XSparseHeaders() skips one extra full record when the 
> sparse header is already aligned
> --------------------------------------------------------------------------------------------------------
>
>                 Key: COMPRESS-724
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-724
>             Project: Commons Compress
>          Issue Type: Bug
>          Components: Archivers
>    Affects Versions: 1.28.0
>            Reporter: Ruiqi Dong
>            Priority: Critical
>
> *Summary*
> parsePAX1XSparseHeaders() computes the remaining padding as recordSize - 
> bytesRead % recordSize. When bytesRead % recordSize == 0, the correct skip 
> should be 0, but the current code skips an entire extra record. This helper 
> is used on the normal GNU sparse 1.x read path, so an affected archive can 
> advance the stream past real file data.
>  
> *Affected code*
> File: src/main/java/org/apache/commons/compress/archivers/tar/TarUtils.java
> {code:java}
> static List<TarArchiveStructSparse> parsePAX1XSparseHeaders(
>         final InputStream inputStream, final int recordSize) throws 
> IOException {
>     ...
>     while (sparseHeadersCount-- > 0) {
>         ...
>         bytesRead += readResult[1];
>         sparseHeaders.add(new TarArchiveStructSparse(sparseOffset, 
> sparseNumbytes));
>     }
>     final long bytesToSkip = recordSize - bytesRead % recordSize;
>     IOUtils.skip(inputStream, bytesToSkip);
>     return sparseHeaders;
> } {code}
> This code path is used when reading GNU sparse 1.x entries from both 
> TarArchiveInputStream and TarFile:
> {code:java}
> // TarArchiveInputStream
> if (currEntry.isPaxGNU1XSparse()) {
>     
> currEntry.setSparseHeaders(TarUtils.parsePAX1XSparseHeaders(currentInputStream,
>  getRecordSize()));
> }
> buildSparseInputStreams(); {code}
> {code:java}
> // TarFile
> if (currEntry.isPaxGNU1XSparse()) {
>     final long position = archive.position();
>     
> currEntry.setSparseHeaders(TarUtils.parsePAX1XSparseHeaders(currentStream, 
> recordSize));
>     final long sparseHeadersSize = archive.position() - position;
>     currEntry.setSize(currEntry.getSize() - sparseHeadersSize);
>     currEntry.setDataOffset(currEntry.getDataOffset() + sparseHeadersSize);
> } {code}
>  
> *Reproducer*
> Add the following test to 
> src/test/java/org/apache/commons/compress/archivers/tar/TarUtilsTest.java:
> {code:java}
> @Test
> void testParsePax1xSparseHeadersDoesNotSkipAlignedDataRecord() throws 
> Exception {
>     final byte[] sparseHeader = pax1xSparseHeaderAlignedToSingleRecord();
>     final byte[] fileData = new byte[513];
>     Arrays.fill(fileData, 0, 512, (byte) 'A');
>     fileData[512] = 'B';
>     final byte[] input = ArrayUtils.addAll(sparseHeader, fileData);
>     try (ByteArrayInputStream in = new ByteArrayInputStream(input)) {
>         final List<TarArchiveStructSparse> sparseHeaders =
>                 TarUtils.parsePAX1XSparseHeaders(in, 512);
>         assertEquals(100, sparseHeaders.size());
>         assertEquals('A', in.read());
>     }
> } {code}
> Run:
> {code:java}
> mvn -q -Dtest=org.apache.commons.compress.archivers.tar.TarUtilsTest test 
> {code}
> Observed behavior:
> {code:java}
> TarUtilsTest.testParsePax1xSparseHeadersDoesNotSkipAlignedDataRecord:251
> expected: <65> but was: <66> {code}
> Expected behavior:
> After parsing an already aligned sparse header block, the next byte should be 
> the first byte of file data.
> The current formula converts an exact alignment case into a full-record skip. 
> In the GNU sparse 1.x read path, that means the reader moves one record too 
> far before building the sparse-data streams, so real payload bytes are 
> skipped, and the extracted/read file content becomes misaligned.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (COMPRESS-724) TarUtils.parsePAX1XSparseHeaders() skips one extra full record when the sparse header is already aligned

Reply via email to