Re: [PR] fix: Restore incremental file name handling [commons-compress]

via GitHub Tue, 26 Aug 2025 12:21:16 -0700


garydgregory commented on code in PR #698:
URL: https://github.com/apache/commons-compress/pull/698#discussion_r2300839235



##########
src/main/java/org/apache/commons/compress/archivers/tar/TarUtils.java:
##########
@@ -883,22 +885,32 @@ private static long[] readLineOfNumberForPax1x(final 
InputStream inputStream) th
      * @throws IOException if an I/O error occurs or the entry is truncated.
      * @throws ArchiveException if the entry size is invalid.
      */
-    private static String readLongName(final InputStream input, final 
ZipEncoding encoding, final TarArchiveEntry entry)
+    static String readLongName(final InputStream input, final ZipEncoding 
encoding, final TarArchiveEntry entry)
             throws IOException {
         final long size = entry.getSize();
+        // The encoding requires a byte array, whose size must be a positive 
int.
         if (size > Integer.MAX_VALUE) {
-            throw new ArchiveException("Invalid long name size: " + 
entry.getSize());
-        }
-        final int sizeInt = (int) size;
-        final byte[] buffer = new byte[sizeInt];
-        if (IOUtils.readFully(input, buffer, 0, sizeInt) < sizeInt) {
-            throw new ArchiveException("TAR entry is truncated.");
-        }
-        int length = buffer.length;
-        while (length > 0 && buffer[length - 1] == 0) {
+            throw new ArchiveException("Invalid long name entry: size %d 
exceeds maximum allowed.", entry.getSize());
+        }
+        // Read the long name incrementally to limit memory allocation in case 
of a corrupted entry.
+        final BoundedInputStream boundedInput = BoundedInputStream.builder()
+                .setInputStream(input)
+                .setMaxCount(size)
+                .setPropagateClose(false)
+                .get();
+        final UnsynchronizedByteArrayOutputStream outputStream = 
UnsynchronizedByteArrayOutputStream.builder()
+                
.setBufferSize(org.apache.commons.io.IOUtils.DEFAULT_BUFFER_SIZE)
+                .get();
+        final long read = 
org.apache.commons.io.IOUtils.copyLarge(boundedInput, outputStream);
+        if (read != size) {
+            throw new ArchiveException("Truncated long name entry: expected %d 
bytes, read %d bytes.", size, read);

Review Comment:
   `%d` -> use locale-specific grouping characters with `%,d`.



##########
src/main/java/org/apache/commons/compress/archivers/tar/TarUtils.java:
##########
@@ -883,22 +885,32 @@ private static long[] readLineOfNumberForPax1x(final 
InputStream inputStream) th
      * @throws IOException if an I/O error occurs or the entry is truncated.
      * @throws ArchiveException if the entry size is invalid.
      */
-    private static String readLongName(final InputStream input, final 
ZipEncoding encoding, final TarArchiveEntry entry)
+    static String readLongName(final InputStream input, final ZipEncoding 
encoding, final TarArchiveEntry entry)
             throws IOException {
         final long size = entry.getSize();
+        // The encoding requires a byte array, whose size must be a positive 
int.
         if (size > Integer.MAX_VALUE) {
-            throw new ArchiveException("Invalid long name size: " + 
entry.getSize());
-        }
-        final int sizeInt = (int) size;
-        final byte[] buffer = new byte[sizeInt];
-        if (IOUtils.readFully(input, buffer, 0, sizeInt) < sizeInt) {
-            throw new ArchiveException("TAR entry is truncated.");
-        }
-        int length = buffer.length;
-        while (length > 0 && buffer[length - 1] == 0) {
+            throw new ArchiveException("Invalid long name entry: size %d 
exceeds maximum allowed.", entry.getSize());
+        }
+        // Read the long name incrementally to limit memory allocation in case 
of a corrupted entry.
+        final BoundedInputStream boundedInput = BoundedInputStream.builder()
+                .setInputStream(input)
+                .setMaxCount(size)
+                .setPropagateClose(false)
+                .get();
+        final UnsynchronizedByteArrayOutputStream outputStream = 
UnsynchronizedByteArrayOutputStream.builder()
+                
.setBufferSize(org.apache.commons.io.IOUtils.DEFAULT_BUFFER_SIZE)

Review Comment:
   You don't need a FQCN here.



##########
src/test/java/org/apache/commons/compress/archivers/tar/TarUtilsTest.java:
##########
@@ -599,4 +607,121 @@ void testWriteNegativeBinary8Byte() {
         assertEquals(-3601L, TarUtils.parseOctalOrBinary(b, 0, 8));
     }
 
+    /**
+     * Builds an NTFS-style path (\\?\C:\...) up to a target total UTF-16 
length, respecting 255-unit segments.
+     */
+    private static String createNtfsLongNameByUtf16Units(final int totalUnits) 
{
+        final String prefix = "\\\\?\\C:\\";
+        final String extension = ".txt";
+
+        // U+2605 BLACK STAR (BMP, 1 UTF-16 unit, 3 UTF-8 bytes) => lets us 
pack 255 units per segment easily
+        final String segment = StringUtils.repeat("★", 255);
+        assertEquals(255, segment.length(), "Segment length should be 255 
UTF-16 code units");
+
+        final StringBuilder sb = new StringBuilder(prefix);
+        while (sb.length() + extension.length() < totalUnits) {
+            sb.append(segment).append('\\');
+        }
+
+        // Trim to exact totalUnits (UTF-16 units), then append extension
+        sb.setLength(totalUnits - extension.length());
+        sb.append(extension);
+        assertEquals(totalUnits, sb.length(), "Final length should be " + 
totalUnits + " UTF-16 code units");
+        return sb.toString();
+    }
+
+    /**
+     * Builds a POSIX-style path (rooted at `/`) up to a target total *byte* 
length in UTF-8, 255 bytes/segment.
+     */
+    private static String createPosixLongNameByUtf8Bytes(final int totalBytes) 
{
+        final String extension = ".txt";
+        // U+2605 BLACK STAR (BMP, 1 UTF-16 unit, 3 UTF-8 bytes) => 85 * 3 
UTF-8 bytes = 255 bytes
+        final String segment = StringUtils.repeat("★", 85);
+        assertEquals(255, utf8Len(segment), "Segment length should be 255 
bytes in UTF-8");
+
+        final StringBuilder sb = new StringBuilder();
+        int count = totalBytes / 256; // how many full 256-byte chunks can we 
fit?
+        while (count-- > 0) {
+            sb.append(segment).append('/');
+        }
+        count = totalBytes - utf8Len(sb) - utf8Len(extension);
+        while (count-- > 0) {
+            sb.append('a');
+        }
+        sb.append(extension);
+        assertEquals(totalBytes, utf8Len(sb), "Final length should be " + 
totalBytes + " bytes in UTF-8");
+        return sb.toString();
+    }
+
+    private static int utf8Len(final CharSequence s) {
+        return s.toString().getBytes(UTF_8).length;
+    }
+
+    private static byte[] utf8Bytes(final String s) {
+        return s.getBytes(UTF_8);
+    }
+
+    private static byte[] paddedUtf8Bytes(final String s) {
+        final int blockSize = 1024;
+        final byte[] bytes = s.getBytes(UTF_8);
+        return Arrays.copyOf(bytes, ((bytes.length + blockSize - 1) / 
blockSize) * blockSize);
+    }
+
+    static Stream<Arguments> readLongNameHandlesLimits() {
+        final String empty = "";
+        final String ntfsLongName = createNtfsLongNameByUtf16Units(32767);
+        final String posixLongName = createPosixLongNameByUtf8Bytes(4095);
+        return Stream.of(
+                Arguments.of("Empty", empty, utf8Bytes(empty)),
+                Arguments.of("Empty (padded)", empty, paddedUtf8Bytes(empty)),
+                Arguments.of("NTFS", ntfsLongName, utf8Bytes(ntfsLongName)),
+                Arguments.of("NTFS (padded)", ntfsLongName, 
paddedUtf8Bytes(ntfsLongName)),
+                Arguments.of("POSIX", posixLongName, utf8Bytes(posixLongName)),
+                Arguments.of("POSIX (padded)", posixLongName, 
paddedUtf8Bytes(posixLongName)));
+    }
+
+    @ParameterizedTest(name = "{0} long name is read correctly")
+    @MethodSource
+    void readLongNameHandlesLimits(final String kind, final String 
expectedName, final byte[] data) throws IOException {

Review Comment:
   Prefix test methods with `test`, like all other test methods in this class.



##########
src/main/java/org/apache/commons/compress/archivers/tar/TarUtils.java:
##########
@@ -883,22 +885,32 @@ private static long[] readLineOfNumberForPax1x(final 
InputStream inputStream) th
      * @throws IOException if an I/O error occurs or the entry is truncated.
      * @throws ArchiveException if the entry size is invalid.
      */
-    private static String readLongName(final InputStream input, final 
ZipEncoding encoding, final TarArchiveEntry entry)
+    static String readLongName(final InputStream input, final ZipEncoding 
encoding, final TarArchiveEntry entry)
             throws IOException {
         final long size = entry.getSize();
+        // The encoding requires a byte array, whose size must be a positive 
int.
         if (size > Integer.MAX_VALUE) {
-            throw new ArchiveException("Invalid long name size: " + 
entry.getSize());
-        }
-        final int sizeInt = (int) size;
-        final byte[] buffer = new byte[sizeInt];
-        if (IOUtils.readFully(input, buffer, 0, sizeInt) < sizeInt) {
-            throw new ArchiveException("TAR entry is truncated.");
-        }
-        int length = buffer.length;
-        while (length > 0 && buffer[length - 1] == 0) {
+            throw new ArchiveException("Invalid long name entry: size %d 
exceeds maximum allowed.", entry.getSize());

Review Comment:
   `%d` -> use locale-specific grouping characters with `%,d`



##########
src/main/java/org/apache/commons/compress/archivers/tar/TarUtils.java:
##########
@@ -883,22 +885,32 @@ private static long[] readLineOfNumberForPax1x(final 
InputStream inputStream) th
      * @throws IOException if an I/O error occurs or the entry is truncated.
      * @throws ArchiveException if the entry size is invalid.
      */
-    private static String readLongName(final InputStream input, final 
ZipEncoding encoding, final TarArchiveEntry entry)
+    static String readLongName(final InputStream input, final ZipEncoding 
encoding, final TarArchiveEntry entry)
             throws IOException {
         final long size = entry.getSize();
+        // The encoding requires a byte array, whose size must be a positive 
int.
         if (size > Integer.MAX_VALUE) {
-            throw new ArchiveException("Invalid long name size: " + 
entry.getSize());
-        }
-        final int sizeInt = (int) size;
-        final byte[] buffer = new byte[sizeInt];
-        if (IOUtils.readFully(input, buffer, 0, sizeInt) < sizeInt) {
-            throw new ArchiveException("TAR entry is truncated.");
-        }
-        int length = buffer.length;
-        while (length > 0 && buffer[length - 1] == 0) {
+            throw new ArchiveException("Invalid long name entry: size %d 
exceeds maximum allowed.", entry.getSize());
+        }
+        // Read the long name incrementally to limit memory allocation in case 
of a corrupted entry.
+        final BoundedInputStream boundedInput = BoundedInputStream.builder()
+                .setInputStream(input)
+                .setMaxCount(size)
+                .setPropagateClose(false)
+                .get();
+        final UnsynchronizedByteArrayOutputStream outputStream = 
UnsynchronizedByteArrayOutputStream.builder()
+                
.setBufferSize(org.apache.commons.io.IOUtils.DEFAULT_BUFFER_SIZE)
+                .get();
+        final long read = 
org.apache.commons.io.IOUtils.copyLarge(boundedInput, outputStream);

Review Comment:
   You don't need a FQCN here.



##########
src/test/java/org/apache/commons/compress/archivers/tar/TarUtilsTest.java:
##########
@@ -599,4 +607,121 @@ void testWriteNegativeBinary8Byte() {
         assertEquals(-3601L, TarUtils.parseOctalOrBinary(b, 0, 8));
     }
 
+    /**
+     * Builds an NTFS-style path (\\?\C:\...) up to a target total UTF-16 
length, respecting 255-unit segments.
+     */
+    private static String createNtfsLongNameByUtf16Units(final int totalUnits) 
{
+        final String prefix = "\\\\?\\C:\\";
+        final String extension = ".txt";
+
+        // U+2605 BLACK STAR (BMP, 1 UTF-16 unit, 3 UTF-8 bytes) => lets us 
pack 255 units per segment easily
+        final String segment = StringUtils.repeat("★", 255);
+        assertEquals(255, segment.length(), "Segment length should be 255 
UTF-16 code units");
+
+        final StringBuilder sb = new StringBuilder(prefix);
+        while (sb.length() + extension.length() < totalUnits) {
+            sb.append(segment).append('\\');
+        }
+
+        // Trim to exact totalUnits (UTF-16 units), then append extension
+        sb.setLength(totalUnits - extension.length());
+        sb.append(extension);
+        assertEquals(totalUnits, sb.length(), "Final length should be " + 
totalUnits + " UTF-16 code units");
+        return sb.toString();
+    }
+
+    /**
+     * Builds a POSIX-style path (rooted at `/`) up to a target total *byte* 
length in UTF-8, 255 bytes/segment.
+     */
+    private static String createPosixLongNameByUtf8Bytes(final int totalBytes) 
{
+        final String extension = ".txt";
+        // U+2605 BLACK STAR (BMP, 1 UTF-16 unit, 3 UTF-8 bytes) => 85 * 3 
UTF-8 bytes = 255 bytes
+        final String segment = StringUtils.repeat("★", 85);
+        assertEquals(255, utf8Len(segment), "Segment length should be 255 
bytes in UTF-8");
+
+        final StringBuilder sb = new StringBuilder();
+        int count = totalBytes / 256; // how many full 256-byte chunks can we 
fit?
+        while (count-- > 0) {
+            sb.append(segment).append('/');
+        }
+        count = totalBytes - utf8Len(sb) - utf8Len(extension);
+        while (count-- > 0) {
+            sb.append('a');
+        }
+        sb.append(extension);
+        assertEquals(totalBytes, utf8Len(sb), "Final length should be " + 
totalBytes + " bytes in UTF-8");
+        return sb.toString();
+    }
+
+    private static int utf8Len(final CharSequence s) {
+        return s.toString().getBytes(UTF_8).length;
+    }
+
+    private static byte[] utf8Bytes(final String s) {
+        return s.getBytes(UTF_8);
+    }
+
+    private static byte[] paddedUtf8Bytes(final String s) {
+        final int blockSize = 1024;
+        final byte[] bytes = s.getBytes(UTF_8);
+        return Arrays.copyOf(bytes, ((bytes.length + blockSize - 1) / 
blockSize) * blockSize);
+    }
+
+    static Stream<Arguments> readLongNameHandlesLimits() {
+        final String empty = "";
+        final String ntfsLongName = createNtfsLongNameByUtf16Units(32767);
+        final String posixLongName = createPosixLongNameByUtf8Bytes(4095);
+        return Stream.of(
+                Arguments.of("Empty", empty, utf8Bytes(empty)),
+                Arguments.of("Empty (padded)", empty, paddedUtf8Bytes(empty)),
+                Arguments.of("NTFS", ntfsLongName, utf8Bytes(ntfsLongName)),
+                Arguments.of("NTFS (padded)", ntfsLongName, 
paddedUtf8Bytes(ntfsLongName)),
+                Arguments.of("POSIX", posixLongName, utf8Bytes(posixLongName)),
+                Arguments.of("POSIX (padded)", posixLongName, 
paddedUtf8Bytes(posixLongName)));
+    }
+
+    @ParameterizedTest(name = "{0} long name is read correctly")
+    @MethodSource
+    void readLongNameHandlesLimits(final String kind, final String 
expectedName, final byte[] data) throws IOException {
+        final TarArchiveEntry entry = new TarArchiveEntry("test");
+        entry.setSize(data.length);
+        // Lets add a trailing "garbage" to ensure we only read what we should
+        final byte[] dataWithGarbage = Arrays.copyOf(data, data.length + 1024);
+        Arrays.fill(dataWithGarbage, data.length, dataWithGarbage.length, 
(byte) 0xFF);
+
+        try (InputStream in = new ByteArrayInputStream(dataWithGarbage)) {
+            final String actualName = TarUtils.readLongName(in, 
ZipEncodingHelper.getZipEncoding(UTF_8), entry);
+            assertEquals(
+                    expectedName,
+                    actualName,
+                    () -> String.format("[%s] The long name read does not 
match the expected value.", kind));
+        }
+    }
+
+    static Stream<Arguments> readLongNameThrowsOnTruncation() {
+        return Stream.of(
+                Arguments.of(Integer.MAX_VALUE, "truncated long name"),
+                Arguments.of(Long.MAX_VALUE, "invalid long name"));
+    }
+
+    @ParameterizedTest(name = "readLongName of {0} bytes throws 
ArchiveException")
+    @MethodSource
+    void readLongNameThrowsOnTruncation(final long size, final CharSequence 
expectedMessage) throws IOException {

Review Comment:
   Prefix test methods with `test`, like all other test methods in this class.



##########
src/test/java/org/apache/commons/compress/archivers/tar/TarUtilsTest.java:
##########
@@ -599,4 +607,121 @@ void testWriteNegativeBinary8Byte() {
         assertEquals(-3601L, TarUtils.parseOctalOrBinary(b, 0, 8));
     }
 
+    /**
+     * Builds an NTFS-style path (\\?\C:\...) up to a target total UTF-16 
length, respecting 255-unit segments.
+     */
+    private static String createNtfsLongNameByUtf16Units(final int totalUnits) 
{
+        final String prefix = "\\\\?\\C:\\";
+        final String extension = ".txt";
+
+        // U+2605 BLACK STAR (BMP, 1 UTF-16 unit, 3 UTF-8 bytes) => lets us 
pack 255 units per segment easily
+        final String segment = StringUtils.repeat("★", 255);
+        assertEquals(255, segment.length(), "Segment length should be 255 
UTF-16 code units");
+
+        final StringBuilder sb = new StringBuilder(prefix);
+        while (sb.length() + extension.length() < totalUnits) {
+            sb.append(segment).append('\\');

Review Comment:
   WDYT about calling `append()` once if you build the argument outside the 
loop?



##########
src/test/java/org/apache/commons/compress/archivers/tar/TarUtilsTest.java:
##########
@@ -599,4 +607,121 @@ void testWriteNegativeBinary8Byte() {
         assertEquals(-3601L, TarUtils.parseOctalOrBinary(b, 0, 8));
     }
 
+    /**
+     * Builds an NTFS-style path (\\?\C:\...) up to a target total UTF-16 
length, respecting 255-unit segments.
+     */
+    private static String createNtfsLongNameByUtf16Units(final int totalUnits) 
{
+        final String prefix = "\\\\?\\C:\\";
+        final String extension = ".txt";
+
+        // U+2605 BLACK STAR (BMP, 1 UTF-16 unit, 3 UTF-8 bytes) => lets us 
pack 255 units per segment easily
+        final String segment = StringUtils.repeat("★", 255);
+        assertEquals(255, segment.length(), "Segment length should be 255 
UTF-16 code units");

Review Comment:
   I guess this assertion is a sanity check; the `255` magic number is the same 
as the line above, so it could be refactored into a local variable for clarity.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] fix: Restore incremental file name handling [commons-compress]

Reply via email to