Ruiqi Dong created COMPRESS-724:
-----------------------------------
Summary: TarUtils.parsePAX1XSparseHeaders() skips one extra full
record when the sparse header is already aligned
Key: COMPRESS-724
URL: https://issues.apache.org/jira/browse/COMPRESS-724
Project: Commons Compress
Issue Type: Bug
Components: Archivers
Affects Versions: 1.28.0
Reporter: Ruiqi Dong
*Summary*
parsePAX1XSparseHeaders() computes the remaining padding as recordSize -
bytesRead % recordSize. When bytesRead % recordSize == 0, the correct skip
should be 0, but the current code skips an entire extra record. This helper is
used on the normal GNU sparse 1.x read path, so an affected archive can advance
the stream past real file data.
*Affected code*
File: src/main/java/org/apache/commons/compress/archivers/tar/TarUtils.java
{code:java}
static List<TarArchiveStructSparse> parsePAX1XSparseHeaders(
final InputStream inputStream, final int recordSize) throws IOException
{
...
while (sparseHeadersCount-- > 0) {
...
bytesRead += readResult[1];
sparseHeaders.add(new TarArchiveStructSparse(sparseOffset,
sparseNumbytes));
}
final long bytesToSkip = recordSize - bytesRead % recordSize;
IOUtils.skip(inputStream, bytesToSkip);
return sparseHeaders;
} {code}
**
This code path is used when reading GNU sparse 1.x entries from both
TarArchiveInputStream and TarFile:
{code:java}
// TarArchiveInputStream
if (currEntry.isPaxGNU1XSparse()) {
currEntry.setSparseHeaders(TarUtils.parsePAX1XSparseHeaders(currentInputStream,
getRecordSize()));
}
buildSparseInputStreams(); {code}
{code:java}
// TarFile
if (currEntry.isPaxGNU1XSparse()) {
final long position = archive.position();
currEntry.setSparseHeaders(TarUtils.parsePAX1XSparseHeaders(currentStream,
recordSize));
final long sparseHeadersSize = archive.position() - position;
currEntry.setSize(currEntry.getSize() - sparseHeadersSize);
currEntry.setDataOffset(currEntry.getDataOffset() + sparseHeadersSize);
} {code}
*Reproducer*
Add the following test to
src/test/java/org/apache/commons/compress/archivers/tar/TarUtilsTest.java:
{code:java}
@Test
void testParsePax1xSparseHeadersDoesNotSkipAlignedDataRecord() throws Exception
{
final byte[] sparseHeader = pax1xSparseHeaderAlignedToSingleRecord();
final byte[] fileData = new byte[513];
Arrays.fill(fileData, 0, 512, (byte) 'A');
fileData[512] = 'B';
final byte[] input = ArrayUtils.addAll(sparseHeader, fileData);
try (ByteArrayInputStream in = new ByteArrayInputStream(input)) {
final List<TarArchiveStructSparse> sparseHeaders =
TarUtils.parsePAX1XSparseHeaders(in, 512);
assertEquals(100, sparseHeaders.size());
assertEquals('A', in.read());
}
} {code}
Run:
{code:java}
mvn -q -Dtest=org.apache.commons.compress.archivers.tar.TarUtilsTest test {code}
Observed behavior:
{code:java}
TarUtilsTest.testParsePax1xSparseHeadersDoesNotSkipAlignedDataRecord:251
expected: <65> but was: <66> {code}
Expected behavior:
After parsing an already aligned sparse header block, the next byte should be
the first byte of file data.
The current formula converts an exact alignment case into a full-record skip.
In the GNU sparse 1.x read path, that means the reader moves one record too far
before building the sparse-data streams, so real payload bytes are skipped and
the extracted/read file content becomes misaligned.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)