ppkarwasz commented on code in PR #781:
URL: https://github.com/apache/commons-io/pull/781#discussion_r2328598616
##########
src/main/java/org/apache/commons/io/FileSystem.java:
##########
@@ -107,7 +107,7 @@ public enum FileSystem {
"LPT1", "LPT2", "LPT3", "LPT4", "LPT5", "LPT6", "LPT7",
"LPT8", "LPT9",
"LPT\u00b2", "LPT\u00b3", "LPT\u00b9", // Superscript 2 3 1 in that order
"NUL", "PRN"
- }, true, true, '\\', LengthUnit.CHARS);
+ }, true, true, '\\', NameLengthStrategy.UTF16_CHARS);
Review Comment:
Could you expand a bit on what you mean?
The main goal of my PR is to highlight that UTF-16 code units are almost
never the unit of measure used for file name limits. That severely constrains
how the values returned by `getMaxFileNameLength()` and `getMaxPathLength()`
can be interpreted.
My immediate motivation is to provide reasonable, conservative limits for
file names inside archives, so that corrupted or malicious values can be
detected early. For example, a file entry whose name is longer than
`getMaxPathLength() * maxBytesPerChar` bytes could immediately be marked as invalid.
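To make that check concrete, here is a minimal Java sketch. The class and method names are hypothetical. `maxPathLength` stands in for a value such as `FileSystem.getMaxPathLength()`, and `maxBytesPerChar` is assumed to mean the worst-case number of bytes per UTF-16 char for the entry's charset, as reported by `CharsetEncoder.maxBytesPerChar()`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical early sanity check for archive entry names (not part of Commons IO).
 * maxPathLength would come from something like FileSystem.getMaxPathLength();
 * it is a plain parameter here so the sketch has no dependencies.
 */
public final class ArchiveNameCheck {

    private ArchiveNameCheck() {
    }

    /**
     * Upper bound, in encoded bytes, for a path of at most maxPathLength
     * chars (UTF-16 code units) encoded with the given charset.
     */
    public static long maxEncodedPathBytes(int maxPathLength, Charset charset) {
        // Worst-case bytes the encoder emits for a single input char.
        float maxBytesPerChar = charset.newEncoder().maxBytesPerChar();
        return (long) Math.ceil(maxPathLength * (double) maxBytesPerChar);
    }

    /** Returns false for raw names that cannot possibly fit within the path limit. */
    public static boolean isPlausibleEntryName(byte[] rawName, int maxPathLength, Charset charset) {
        return rawName.length <= maxEncodedPathBytes(maxPathLength, charset);
    }

    public static void main(String[] args) {
        int maxPathLength = 32_000; // e.g. the effective JDK path limit on Windows
        byte[] ok = "dir/file.txt".getBytes(StandardCharsets.UTF_8);
        byte[] corrupt = new byte[200_000]; // e.g. produced by a garbage length field
        System.out.println(isPlausibleEntryName(ok, maxPathLength, StandardCharsets.UTF_8));
        System.out.println(isPlausibleEntryName(corrupt, maxPathLength, StandardCharsets.UTF_8));
    }
}
```

Because the bound is a worst case, it only rejects names that could not be a legal path in any case; names that pass still need the real per-filesystem checks.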
At first I looked only at **filesystem constraints**, but I realized I
should also consider the **Java APIs** that sit on top of them, since the JDK’s
abstractions can impose their own effective limits. That naturally leads to two
broad categories:
* **Windows**: API calls take UTF-16 strings and count limits in UTF-16 code
units, the same unit `String.length()` returns. That makes it relatively
straightforward to apply file name and path length checks in Java.
Interestingly, the JDK `Path` implementation lowers the theoretical maximum
from 32,767 UTF-16 units to 32,000 and throws if the limit is exceeded.
* **POSIX-based OSes (Linux, macOS)**: limits are defined in **bytes**
(`NAME_MAX`, `PATH_MAX`), not UTF-16 units. The behavior for names longer than
those limits is unspecified. For example, HFS+ allows 255 UTF-16 code units per
name, but macOS’s POSIX layer only guarantees 255 **UTF-8 bytes**, so names
that are legal on disk may still be inaccessible via standard APIs.
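A quick way to see why the unit matters is to measure the same name three ways. This is an illustrative sketch with hypothetical names; the 255 figures in the comments are the HFS+ and typical `NAME_MAX` values mentioned above, not constants read from the system.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative comparison of the units a file name "length" can be measured in. */
public final class NameLengthUnits {

    private NameLengthUnits() {
    }

    /** UTF-16 code units: what String.length() returns and what Windows APIs count. */
    public static int utf16Units(String name) {
        return name.length();
    }

    /** Unicode code points: a surrogate pair counts once. */
    public static int codePoints(String name) {
        return name.codePointCount(0, name.length());
    }

    /** UTF-8 bytes: the unit NAME_MAX is effectively expressed in on Linux and macOS. */
    public static int utf8Bytes(String name) {
        return name.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // 100 CJK ideographs: each is 1 UTF-16 unit but 3 UTF-8 bytes.
        String name = "\u6587".repeat(100);
        System.out.println(utf16Units(name)); // 100, within a 255 UTF-16 unit limit
        System.out.println(utf8Bytes(name));  // 300, over a 255-byte NAME_MAX
        // One character outside the BMP: 2 UTF-16 units, 1 code point, 4 UTF-8 bytes.
        String emoji = "\uD83D\uDE00";
        System.out.println(utf16Units(emoji) + " " + codePoints(emoji) + " " + utf8Bytes(emoji));
    }
}
```

Such a name passes a 255 UTF-16 unit check yet needs 300 UTF-8 bytes, which is the HFS+ versus POSIX mismatch described for macOS. `String.repeat` requires Java 11 or later.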