Matt Gilman created NIFI-16000:
----------------------------------

             Summary: FileUtils.getSanitizedFilename rejects filenames containing spaces
                 Key: NIFI-16000
                 URL: https://issues.apache.org/jira/browse/NIFI-16000
             Project: Apache NiFi
          Issue Type: Improvement
          Components: Core Framework
            Reporter: Matt Gilman
            Assignee: Matt Gilman


`org.apache.nifi.util.file.FileUtils.getSanitizedFilename(String)` treats the 
space character (code point `32`) as invalid and replaces it with an 
underscore. The method's invalid-character list was originally derived from a 
cross-platform "invalid filename characters" reference, but the space character 
is legal on every major file system (NTFS, ext4, APFS, etc.).

This becomes a usability problem because of how the method is consumed. Both 
`ConnectorResource` and `ParameterContextResource` use it as a strict 
validation gate for the asset name supplied in the `Filename` request header:

{code:java}
final String sanitizedAssetName = FileUtils.getSanitizedFilename(assetName);
if (!assetName.equals(sanitizedAssetName)) {
    throw new IllegalArgumentException(FILENAME_HEADER + " header contains an invalid file name");
}
{code}

Because any name containing a space is rewritten during sanitization, the 
equality check fails and the upload is rejected. As a result, common, perfectly 
valid filenames cannot be uploaded as assets. For example, a file produced by 
browser/OS download de-duplication such as {{driver (1).jar}} is sanitized to 
{{driver_(1).jar}}, which differs from the original and is therefore rejected 
with _"... header contains an invalid file name."_
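To make the failure concrete, here is a minimal sketch of the pre-change behavior combined with the validation gate above. `LegacySanitizer` is a hypothetical class, and its invalid-character set (control characters, space, and `"*/:<>?\|`) is an assumption modeled on a typical cross-platform list, not a copy of NiFi's actual table.

{code:java}
import java.util.Set;

// Hypothetical re-creation of the pre-change behavior. The INVALID set is an
// ASSUMPTION modeled on a typical cross-platform list, not NiFi's actual table.
class LegacySanitizer {
    // space, " * / : < > ? \ |  (control characters < 32 are handled separately)
    private static final Set<Integer> INVALID =
            Set.of(32, 34, 42, 47, 58, 60, 62, 63, 92, 124);

    // Replace every invalid code point with '_'.
    static String sanitize(String name) {
        StringBuilder sb = new StringBuilder(name.length());
        name.codePoints().forEach(cp -> {
            if (cp < 32 || INVALID.contains(cp)) {
                sb.append('_');
            } else {
                sb.appendCodePoint(cp);
            }
        });
        return sb.toString();
    }

    // Mirrors the "sanitize, then reject if the name changed" gate in the resources.
    static boolean isAccepted(String name) {
        return name.equals(sanitize(name));
    }

    public static void main(String[] args) {
        String name = "driver (1).jar";
        System.out.println(sanitize(name));   // driver_(1).jar
        System.out.println(isAccepted(name)); // false
    }
}
{code}

Because the space is in the invalid set, the sanitized name can never equal the original, so the gate rejects it regardless of what else the name contains.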

**Proposed change**

Permit spaces within a filename while keeping the result canonical and 
file-system-safe:

* Remove the space character (`32`) from the invalid-character set so interior 
spaces are preserved.
* After the existing per-character replacement, normalize the result by 
collapsing interior whitespace runs to a single space, stripping 
leading/trailing whitespace, and removing trailing dots.
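A minimal sketch of the proposed sanitization, assuming a typical cross-platform invalid-character set with the space removed (NiFi's actual list may differ). `ProposedSanitizer` is an illustrative name only. One further assumption: trailing dots and whitespace are stripped together as a single run, so an input such as {{report .}} cannot leave a trailing space behind once its dot is removed.

{code:java}
import java.util.Set;
import java.util.regex.Pattern;

// Sketch of the proposed behavior. The INVALID set is an ASSUMPTION (a typical
// cross-platform list) with space (32) deliberately removed.
class ProposedSanitizer {
    // " * / : < > ? \ |  (control characters < 32 are handled separately)
    private static final Set<Integer> INVALID =
            Set.of(34, 42, 47, 58, 60, 62, 63, 92, 124);
    private static final Pattern WHITESPACE_RUN = Pattern.compile("\\s+");
    // ASSUMPTION: trailing dots and spaces are removed together in one pass.
    private static final Pattern TRAILING_DOTS_AND_SPACES = Pattern.compile("[.\\s]+$");

    static String sanitize(String name) {
        // 1. Existing per-character replacement: control chars and INVALID -> '_'.
        //    Tabs/newlines are control chars, so only plain spaces survive this step.
        StringBuilder sb = new StringBuilder(name.length());
        name.codePoints().forEach(cp ->
                sb.appendCodePoint(cp < 32 || INVALID.contains(cp) ? '_' : cp));
        // 2. Collapse interior runs of spaces to a single space.
        String s = WHITESPACE_RUN.matcher(sb).replaceAll(" ");
        // 3. Strip leading/trailing whitespace.
        s = s.strip();
        // 4. Remove any trailing run of dots (and spaces exposed by removing them).
        return TRAILING_DOTS_AND_SPACES.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(sanitize("driver (1).jar")); // driver (1).jar
        System.out.println(sanitize("report ."));       // report
    }
}
{code}

Treating trailing dots and spaces as one run makes the result independent of the order in which those characters appear at the end of the name.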

This preserves the existing "sanitize, then reject if the name changed" 
contract at the call sites (a non-canonical name, such as one with a leading or 
trailing space or a trailing dot, is still rejected), while allowing legitimate 
names that merely contain interior spaces. It also avoids the ambiguous edge 
cases that simply accepting spaces would introduce: leading/trailing spaces, 
repeated spaces, trailing dots, and whitespace-only names. These matter most on 
Windows, which silently strips trailing spaces and dots, so names that differ 
only in those characters can collide.

**Examples (after change)**

||Input||Output||Accepted by callers?||
| {{driver (1).jar}} | {{driver (1).jar}} | Yes |
| {{driver   (1).jar}} (repeated spaces) | {{driver (1).jar}} | No (non-canonical) |
| {{ driver (1).jar }} (leading/trailing) | {{driver (1).jar}} | No (non-canonical) |
| {{report...}} (trailing dots) | {{report}} | No (non-canonical) |
| {{a/b\c}} | {{a_b_c}} | No (non-canonical) |
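The examples in the table can be encoded as a table-driven check. This sketch repeats a hypothetical sanitizer (same assumed cross-platform invalid-character set, without space) so that it is self-contained, and applies the same "sanitize, then reject if the name changed" gate the resources use. `SanitizedFilenameExamples` is an illustrative name.

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Table-driven check of the examples. The INVALID set is an ASSUMPTION
// (a typical cross-platform list without space), not NiFi's actual table.
class SanitizedFilenameExamples {
    private static final Set<Integer> INVALID =
            Set.of(34, 42, 47, 58, 60, 62, 63, 92, 124);

    static String sanitize(String name) {
        StringBuilder sb = new StringBuilder(name.length());
        name.codePoints().forEach(cp ->
                sb.appendCodePoint(cp < 32 || INVALID.contains(cp) ? '_' : cp));
        // Collapse space runs, trim, then drop any trailing dots/spaces.
        String s = sb.toString().replaceAll("\\s+", " ").strip();
        return s.replaceAll("[.\\s]+$", "");
    }

    // The caller-side gate: accept only names that are already canonical.
    static boolean isAccepted(String name) {
        return name.equals(sanitize(name));
    }

    public static void main(String[] args) {
        // input -> expected sanitized output, taken from the examples table
        Map<String, String> expected = new LinkedHashMap<>();
        expected.put("driver (1).jar", "driver (1).jar");
        expected.put("driver   (1).jar", "driver (1).jar");
        expected.put(" driver (1).jar ", "driver (1).jar");
        expected.put("report...", "report");
        expected.put("a/b\\c", "a_b_c");

        expected.forEach((input, output) -> {
            String actual = sanitize(input);
            if (!actual.equals(output)) {
                throw new AssertionError(input + " -> " + actual + ", expected " + output);
            }
            System.out.printf("[%s] -> [%s] accepted=%b%n", input, actual, isAccepted(input));
        });
    }
}
{code}

Only the first row is accepted: every other input differs from its sanitized form, so the gate still rejects it.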

**Backward compatibility**

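The change is backward compatible for names that previously sanitized cleanly, 
with one exception: a name ending in a dot (e.g. {{report.}}) was previously 
left unchanged and is now normalized, so callers will reject it. Otherwise, the 
only behavioral change is that filenames whose sole issue was an interior space 
are now accepted instead of being rewritten.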



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
