Matt Gilman created NIFI-16000:
----------------------------------
Summary: FileUtils.getSanitizedFilename rejects filenames
containing spaces
Key: NIFI-16000
URL: https://issues.apache.org/jira/browse/NIFI-16000
Project: Apache NiFi
Issue Type: Improvement
Components: Core Framework
Reporter: Matt Gilman
Assignee: Matt Gilman
`org.apache.nifi.util.file.FileUtils.getSanitizedFilename(String)` treats the
space character (code point `32`) as invalid and replaces it with an
underscore. The method's invalid-character list was originally derived from a
cross-platform "invalid filename characters" reference, but the space character
is legal on every major file system (NTFS, ext4, APFS, etc.).
This becomes a usability problem because of how the method is consumed. Both
`ConnectorResource` and `ParameterContextResource` use it as a strict
validation gate for the asset name supplied in the `Filename` request header:
{code:java}
final String sanitizedAssetName = FileUtils.getSanitizedFilename(assetName);
if (!assetName.equals(sanitizedAssetName)) {
    throw new IllegalArgumentException(FILENAME_HEADER + " header contains an invalid file name");
}
{code}
Because any name containing a space is rewritten during sanitization, the
equality check fails and the upload is rejected. As a result, common, perfectly
valid filenames cannot be uploaded as assets. For example, a file produced by
browser/OS download de-duplication such as {{driver (1).jar}} is sanitized to
{{driver_(1).jar}}, which differs from the original and is therefore rejected
with _"... header contains an invalid file name."_
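The rejection can be reproduced with a minimal stand-in for the current behavior. The sketch below is an assumption, not NiFi's implementation: {{sanitize}} is a hypothetical helper that replaces only a small illustrative subset of invalid characters (including the space) with an underscore, and {{isAccepted}} is a hypothetical helper that mirrors the equality check used at the call sites.

```java
public class CurrentBehaviorSketch {

    // Hypothetical stand-in for the current sanitizer. The real invalid-character
    // set in FileUtils is larger; only three characters are listed for illustration.
    static String sanitize(final String name) {
        final StringBuilder sb = new StringBuilder(name.length());
        for (final char c : name.toCharArray()) {
            final boolean invalid = c == ' ' || c == '/' || c == '\\';
            sb.append(invalid ? '_' : c);
        }
        return sb.toString();
    }

    // Mirrors the call-site gate: a name is accepted only if sanitization
    // leaves it unchanged.
    static boolean isAccepted(final String assetName) {
        return assetName.equals(sanitize(assetName));
    }

    public static void main(final String[] args) {
        System.out.println(sanitize("driver (1).jar"));   // driver_(1).jar
        System.out.println(isAccepted("driver (1).jar")); // false
        System.out.println(isAccepted("driver.jar"));     // true
    }
}
```

Because the space is rewritten, any name containing one can never equal its sanitized form, so the gate rejects it regardless of where the space appears.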
**Proposed change**
Permit spaces within a filename while keeping the result canonical and
file-system-safe:
* Remove the space character (`32`) from the invalid-character set so interior
spaces are preserved.
* After the existing per-character replacement, normalize the result by
collapsing interior whitespace runs to a single space, stripping
leading/trailing whitespace, and removing trailing dots.
This preserves the existing "sanitize, then reject if the name changed"
contract at the call sites (a non-canonical name, such as one with a
leading/trailing space or a trailing dot, is still rejected), while allowing
legitimate names that merely contain interior spaces. It also avoids the
ambiguous edge cases that simply accepting spaces would introduce:
leading/trailing spaces, repeated spaces, trailing dots, and whitespace-only
names. Names with trailing spaces or dots can also collide on Windows, where
those characters are silently stripped.
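The proposed steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual patch: the class and method names are hypothetical, the {{INVALID}} string holds only an illustrative subset of the real invalid-character set, and trailing dots and whitespace are stripped together in one pass so that a mixed tail such as {{". ."}} does not leave a dangling space.

```java
public class ProposedSanitizerSketch {

    // Illustrative subset only (assumption); the space character is intentionally absent.
    private static final String INVALID = "/\\";

    static String sanitize(final String name) {
        // Step 1: per-character replacement of invalid characters with '_'.
        final StringBuilder sb = new StringBuilder(name.length());
        for (final char c : name.toCharArray()) {
            sb.append(INVALID.indexOf(c) >= 0 ? '_' : c);
        }
        // Step 2: collapse each whitespace run to a single space.
        String result = sb.toString().replaceAll("\\s+", " ");
        // Step 3: strip leading whitespace.
        result = result.replaceAll("^\\s+", "");
        // Step 4: strip trailing whitespace and trailing dots in one pass.
        result = result.replaceAll("[\\s.]+$", "");
        return result;
    }

    // Same call-site contract as before: accept only already-canonical names.
    static boolean isAccepted(final String assetName) {
        return assetName.equals(sanitize(assetName));
    }

    public static void main(final String[] args) {
        System.out.println(isAccepted("driver (1).jar"));   // true
        System.out.println(isAccepted(" driver (1).jar ")); // false
        System.out.println(sanitize("report..."));          // report
    }
}
```

In this sketch a whitespace-only name sanitizes to the empty string, so it differs from its input and is rejected by the same equality check.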
**Examples (after change)**
| Input | Output | Accepted by callers? |
| {{driver (1).jar}} | {{driver (1).jar}} | Yes |
| {{driver  (1).jar}} (repeated spaces) | {{driver (1).jar}} | No (non-canonical) |
| {{ driver (1).jar }} (leading/trailing) | {{driver (1).jar}} | No (non-canonical) |
| {{report...}} (trailing dots) | {{report}} | No (non-canonical) |
| {{a/b\c}} | {{a_b_c}} | No (non-canonical) |
**Backward compatibility**
The change is backward compatible: names that previously sanitized cleanly
continue to do so, and the only behavioral change is that filenames whose sole
issue was an interior space are now accepted instead of being rewritten.