garydgregory commented on code in PR #781:
URL: https://github.com/apache/commons-io/pull/781#discussion_r2326320199


##########
src/main/java/org/apache/commons/io/FileSystem.java:
##########
@@ -380,23 +396,45 @@ public int[] getIllegalFileNameCodePoints() {
     }
 
     /**
-     * Gets the maximum length for file names. The file name does not include 
folders.
+     * Gets the maximum length for file names (excluding any folder path).
+     *
+     * <p><strong>Note:</strong> This excludes any folder path. The unit 
depends on the
+     * filesystem or OS; see {@link #getLengthUnit()} to check whether the 
value is in
+     * bytes or UTF-16 characters.</p>
      *
-     * @return the maximum length for file names.
+     * @return the maximum file name length.
      */
     public int getMaxFileNameLength() {
         return maxFileNameLength;
     }
 
     /**
-     * Gets the maximum length of the path to a file. This can include folders.
+     * Gets the maximum length for file paths (may include folders).
      *
-     * @return the maximum length of the path to a file.
+     * <p><strong>Note:</strong> This may include folder names as well as the 
file name.
+     * The unit is the same as {@link #getMaxFileNameLength()} and can be 
obtained
+     * from {@link #getLengthUnit()}.</p>
+     *
+     * @return the maximum file path length.

Review Comment:
   Should we now specify the scale of this length? Length in ... chars?



##########
src/main/java/org/apache/commons/io/FileSystem.java:
##########
@@ -380,23 +396,45 @@ public int[] getIllegalFileNameCodePoints() {
     }
 
     /**
-     * Gets the maximum length for file names. The file name does not include 
folders.
+     * Gets the maximum length for file names (excluding any folder path).
+     *
+     * <p><strong>Note:</strong> This excludes any folder path. The unit 
depends on the
+     * filesystem or OS; see {@link #getLengthUnit()} to check whether the 
value is in
+     * bytes or UTF-16 characters.</p>
      *
-     * @return the maximum length for file names.
+     * @return the maximum file name length.

Review Comment:
   Should we now specify the scale of this length? Length in ... `char`s?



##########
src/main/java/org/apache/commons/io/FileSystem.java:
##########
@@ -530,4 +623,76 @@ CharSequence trimExtension(final CharSequence cs) {
         final int index = indexOf(cs, '.', 0);
         return index < 0 ? cs : cs.subSequence(0, index);
     }
+
+    private boolean isLegalFileLength(final CharSequence candidate, final 
Charset charset) {
+        if (candidate == null || candidate.length() == 0) {
+            return false;
+        }
+        if (lengthUnit == LengthUnit.CHARS) {
+            return candidate.length() <= getMaxFileNameLength();
+        }
+        final CharsetEncoder encoder = charset.newEncoder();
+        try {
+            final ByteBuffer buffer = 
encoder.encode(CharBuffer.wrap(candidate));
+            return buffer.remaining() <= getMaxFileNameLength();
+        } catch (CharacterCodingException e) {
+            // If we can't encode, it's not legal
+            return false;
+        }
+    }
+
+    CharSequence truncateFileName(final CharSequence candidate, final Charset 
charset) {
+        final int maxFileNameLength = getMaxFileNameLength();
+        // Character-based limit: simple substring if needed.
+        if (lengthUnit == LengthUnit.CHARS) {
+            return candidate.length() <= maxFileNameLength ? candidate : 
candidate.subSequence(0, maxFileNameLength);
+        }
+
+        // Byte-based limit
+        return truncateByBytes(candidate, charset, maxFileNameLength);
+    }
+
+    static CharSequence truncateByBytes(final CharSequence candidate, final 
Charset charset, final int maxBytes) {

Review Comment:
   What does "truncateByBytes" mean?



##########
src/main/java/org/apache/commons/io/FileSystem.java:
##########
@@ -329,10 +342,12 @@ private static String replace(final String path, final 
char oldChar, final char
      * @param reservedFileNamesExtensions TODO
      * @param supportsDriveLetter Whether this file system support driver 
letters.
      * @param nameSeparator The name separator, '\\' on Windows, '/' on Linux.
+     * @param lengthUnit The unit of measurement for length limits.

Review Comment:
   The Javadoc should make explicit _what_ the measurement applies to. The 
length of ...



##########
src/main/java/org/apache/commons/io/FileSystem.java:
##########
@@ -530,4 +623,76 @@ CharSequence trimExtension(final CharSequence cs) {
         final int index = indexOf(cs, '.', 0);
         return index < 0 ? cs : cs.subSequence(0, index);
     }
+
+    private boolean isLegalFileLength(final CharSequence candidate, final 
Charset charset) {
+        if (candidate == null || candidate.length() == 0) {
+            return false;
+        }
+        if (lengthUnit == LengthUnit.CHARS) {

Review Comment:
   If we are going to add a LengthUnit class, then we should delegate these 
if/else computations to the class.



##########
src/main/java/org/apache/commons/io/FileSystem.java:
##########
@@ -530,4 +623,76 @@ CharSequence trimExtension(final CharSequence cs) {
         final int index = indexOf(cs, '.', 0);
         return index < 0 ? cs : cs.subSequence(0, index);
     }
+
+    private boolean isLegalFileLength(final CharSequence candidate, final 
Charset charset) {
+        if (candidate == null || candidate.length() == 0) {
+            return false;
+        }
+        if (lengthUnit == LengthUnit.CHARS) {
+            return candidate.length() <= getMaxFileNameLength();
+        }
+        final CharsetEncoder encoder = charset.newEncoder();
+        try {
+            final ByteBuffer buffer = 
encoder.encode(CharBuffer.wrap(candidate));
+            return buffer.remaining() <= getMaxFileNameLength();
+        } catch (CharacterCodingException e) {
+            // If we can't encode, it's not legal
+            return false;
+        }
+    }
+
+    CharSequence truncateFileName(final CharSequence candidate, final Charset 
charset) {
+        final int maxFileNameLength = getMaxFileNameLength();
+        // Character-based limit: simple substring if needed.
+        if (lengthUnit == LengthUnit.CHARS) {
+            return candidate.length() <= maxFileNameLength ? candidate : 
candidate.subSequence(0, maxFileNameLength);
+        }
+
+        // Byte-based limit
+        return truncateByBytes(candidate, charset, maxFileNameLength);
+    }
+
+    static CharSequence truncateByBytes(final CharSequence candidate, final 
Charset charset, final int maxBytes) {
+        // Byte-based limit
+        final CharsetEncoder encoder = charset.newEncoder()
+                .onMalformedInput(CodingErrorAction.REPORT)
+                .onUnmappableCharacter(CodingErrorAction.REPORT);
+
+        if (!encoder.canEncode(candidate)) {
+            throw new IllegalArgumentException(
+                    "File name contains characters that cannot be encoded with 
charset " + charset.name());
+        }
+
+        // Fast path: if even the worst-case expansion fits, we're done.
+        if (candidate.length() <= Math.floor(maxBytes / 
encoder.maxBytesPerChar())) {
+            return candidate;
+        }
+
+        // Slow path: encode into a fixed-size byte buffer.
+        final ByteBuffer out = ByteBuffer.allocate(maxBytes);
+        final CharBuffer in = CharBuffer.wrap(candidate);
+
+        // Encode until the first character that would exceed the byte budget.
+        final CoderResult cr = encoder.encode(in, out, true);
+
+        if (cr.isUnderflow()) {
+            // Entire candidate fit within maxFileNameLength bytes.
+            return candidate;
+        }
+
+        // We ran out of space mid-encode: truncate BEFORE the offending 
character.
+        return candidate.subSequence(0, in.position());
+    }
+
+    /**
+     * Units of length for the file name and path length limits.
+     *

Review Comment:
   I can't quite understand why this is needed and how it works. Is this saying 
that different OSs count file names on disk differently? Would this counting 
depend on the OS file encoding?



##########
src/main/java/org/apache/commons/io/FileSystem.java:
##########
@@ -530,4 +623,76 @@ CharSequence trimExtension(final CharSequence cs) {
         final int index = indexOf(cs, '.', 0);
         return index < 0 ? cs : cs.subSequence(0, index);
     }
+
+    private boolean isLegalFileLength(final CharSequence candidate, final 
Charset charset) {
+        if (candidate == null || candidate.length() == 0) {
+            return false;
+        }
+        if (lengthUnit == LengthUnit.CHARS) {
+            return candidate.length() <= getMaxFileNameLength();
+        }
+        final CharsetEncoder encoder = charset.newEncoder();
+        try {
+            final ByteBuffer buffer = 
encoder.encode(CharBuffer.wrap(candidate));
+            return buffer.remaining() <= getMaxFileNameLength();
+        } catch (CharacterCodingException e) {
+            // If we can't encode, it's not legal
+            return false;
+        }
+    }
+
+    CharSequence truncateFileName(final CharSequence candidate, final Charset 
charset) {
+        final int maxFileNameLength = getMaxFileNameLength();
+        // Character-based limit: simple substring if needed.
+        if (lengthUnit == LengthUnit.CHARS) {

Review Comment:
   If we are going to add a LengthUnit class, then we should delegate these 
if/else computations to the class.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to