[ 
https://issues.apache.org/jira/browse/NIFI-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Gilman updated NIFI-16000:
-------------------------------
    Description: 
`org.apache.nifi.util.file.FileUtils.getSanitizedFilename(String)` treats the 
space character (code point `32`) as invalid and replaces it with an 
underscore. The method's invalid-character list was originally derived from a 
cross-platform "invalid filename characters" reference, but the space 
character is legal on every major file system (NTFS, ext4, APFS, etc.).

This becomes a usability problem because of how the method is consumed. Both 
`ConnectorResource` and `ParameterContextResource` use it as a strict 
validation gate for the asset name supplied in the `Filename` request header:

{code:java}
final String sanitizedAssetName = FileUtils.getSanitizedFilename(assetName);
if (!assetName.equals(sanitizedAssetName)) {
    throw new IllegalArgumentException(FILENAME_HEADER + " header contains an invalid file name");
}
{code}

The pattern is "sanitize, then reject if sanitization changed anything." 
Because any name containing a space is rewritten during sanitization, the 
equality check fails and the upload is rejected. As a result, common, perfectly 
valid filenames cannot be uploaded as assets. For example, a file produced by 
browser/OS download de-duplication such as {{driver (1).jar}} is sanitized to 
{{driver_(1).jar}}, which differs from the original and is therefore rejected 
with _"... header contains an invalid file name."_

**Proposed change**

Remove the space character (`32`) from the invalid-character set so spaces are 
preserved rather than replaced. Spaces are left exactly as supplied — including 
leading, trailing, repeated, and interior spaces — and no other normalization 
is performed. All other characters continue to be sanitized as before.

**Examples (after change)**

|| Input || Output ||
| {{driver (1).jar}} | {{driver (1).jar}} |
| {{my report.txt}} | {{my report.txt}} |
| {{driver   (1).jar}} | {{driver   (1).jar}} |
| {{a/b\c}} | {{a_b_c}} |
| {{name:}} | {{name_}} |
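
The rows above can be reproduced with a minimal sketch of the proposed behavior. This is not the actual `FileUtils` implementation: the class name, the `acceptedByCaller` helper, and the short invalid-character set below are assumptions made for illustration, and the real list in NiFi is likely longer. The only point shown is that the space character is absent from the set, so it passes through unchanged.

```java
import java.util.Set;

/** Hypothetical sketch of the proposed behavior; not the NiFi FileUtils code. */
final class SpacePreservingSanitizerSketch {

    // Assumed subset of invalid characters, for illustration only.
    // The space character (code point 32) is deliberately not included.
    private static final Set<Character> INVALID_CHARS =
            Set.of('/', '\\', ':', '*', '?', '"', '<', '>', '|');

    /** Replaces each invalid character with an underscore; everything else is kept as supplied. */
    static String sanitize(final String filename) {
        final StringBuilder sb = new StringBuilder(filename.length());
        for (int i = 0; i < filename.length(); i++) {
            final char c = filename.charAt(i);
            sb.append(INVALID_CHARS.contains(c) ? '_' : c);
        }
        return sb.toString();
    }

    /** Mirrors the call-site gate: the name is accepted only if sanitization left it unchanged. */
    static boolean acceptedByCaller(final String assetName) {
        return assetName.equals(sanitize(assetName));
    }

    public static void main(final String[] args) {
        final String[] names = {"driver (1).jar", "my report.txt", "driver   (1).jar", "a/b\\c", "name:"};
        for (final String name : names) {
            System.out.println("[" + name + "] -> [" + sanitize(name) + "] accepted=" + acceptedByCaller(name));
        }
    }
}
```

Under these assumptions, names containing only spaces and otherwise valid characters pass the equality check at the call sites, while names such as {{a/b\c}} are still rewritten and therefore still rejected.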

**Backward compatibility**

The change is backward compatible: any filename that contained no spaces is 
sanitized exactly as before. The only behavioral change is that the space 
character is now preserved instead of being replaced with an underscore, so 
filenames whose sole issue was a space are now accepted by the asset-upload 
callers instead of being rejected.

  was:
`org.apache.nifi.util.file.FileUtils.getSanitizedFilename(String)` treats the 
space character (code point `32`) as invalid and replaces it with an 
underscore. This list was originally derived from a cross-platform "invalid 
filename characters" reference, but the space character is legal on every major 
file system (NTFS, ext4, APFS, etc.).

This becomes a usability problem because of how the method is consumed. Both 
`ConnectorResource` and `ParameterContextResource` use it as a strict 
validation gate for the asset name supplied in the `Filename` request header:

{code:java}
final String sanitizedAssetName = FileUtils.getSanitizedFilename(assetName);
if (!assetName.equals(sanitizedAssetName)) {
    throw new IllegalArgumentException(FILENAME_HEADER + " header contains an invalid file name");
}
{code}

Because any name containing a space is rewritten during sanitization, the 
equality check fails and the upload is rejected. As a result, common, perfectly 
valid filenames cannot be uploaded as assets. For example, a file produced by 
browser/OS download de-duplication such as {{driver (1).jar}} is sanitized to 
{{driver_(1).jar}}, which differs from the original and is therefore rejected 
with _"... header contains an invalid file name."_

**Proposed change**

Permit spaces within a filename while keeping the result canonical and 
file-system-safe:

* Remove the space character (`32`) from the invalid-character set so interior 
spaces are preserved.
* After the existing per-character replacement, normalize the result by 
collapsing interior whitespace runs to a single space, stripping 
leading/trailing whitespace, and removing trailing dots.

This preserves the existing "sanitize, then reject if the name changed" 
contract at the call sites (a non-canonical name such as a leading/trailing 
space or a trailing dot is still rejected), while allowing legitimate names 
that merely contain interior spaces. It also avoids the ambiguous edge cases 
that simply accepting spaces would introduce (leading/trailing spaces, repeated 
spaces, trailing dots, and whitespace-only names — the latter of which can 
collide on Windows, where trailing spaces/dots are silently stripped).

**Examples (after change)**

|| Input || Output || Accepted by callers? ||
| {{driver (1).jar}} | {{driver (1).jar}} | Yes |
| {{driver   (1).jar}} (repeated spaces) | {{driver (1).jar}} | No (non-canonical) |
| {{ driver (1).jar }} (leading/trailing) | {{driver (1).jar}} | No (non-canonical) |
| {{report...}} (trailing dots) | {{report}} | No (non-canonical) |
| {{a/b\c}} | {{a_b_c}} | No (non-canonical) |
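
For comparison, the normalizing variant described in this earlier proposal could be sketched roughly as follows. This is an illustration only: the class name, the `acceptedByCaller` helper, and the short invalid-character set are assumptions, not the NiFi code. The final step removes any trailing run of dots and spaces together, which is one possible reading of "removing trailing dots". `String.strip()` requires Java 11 or later.

```java
import java.util.Set;

/** Hypothetical sketch of the earlier, normalizing proposal; not the NiFi FileUtils code. */
final class NormalizingSanitizerSketch {

    // Assumed subset of invalid characters, for illustration only (space is not included).
    private static final Set<Character> INVALID_CHARS =
            Set.of('/', '\\', ':', '*', '?', '"', '<', '>', '|');

    static String sanitize(final String filename) {
        // Step 1: per-character replacement, as in the existing method.
        final StringBuilder sb = new StringBuilder(filename.length());
        for (int i = 0; i < filename.length(); i++) {
            final char c = filename.charAt(i);
            sb.append(INVALID_CHARS.contains(c) ? '_' : c);
        }
        // Step 2: normalize to a canonical form.
        return sb.toString()
                .replaceAll("\\s+", " ")    // collapse whitespace runs to a single space
                .strip()                    // drop leading/trailing whitespace
                .replaceAll("[. ]+$", "");  // drop a trailing run of dots and spaces
    }

    /** Mirrors the call-site gate: only names that are already canonical are accepted. */
    static boolean acceptedByCaller(final String assetName) {
        return assetName.equals(sanitize(assetName));
    }
}
```

With this sketch, {{driver (1).jar}} is already canonical and passes the gate, whereas a whitespace-only name normalizes to the empty string and is rejected along with the other non-canonical inputs in the table.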

**Backward compatibility**

The change is backward compatible: names that previously sanitized cleanly 
continue to do so, and the only behavioral change is that filenames whose sole 
issue was an interior space are now accepted instead of being rewritten.


> FileUtils.getSanitizedFilename rejects filenames containing spaces
> ------------------------------------------------------------------
>
>                 Key: NIFI-16000
>                 URL: https://issues.apache.org/jira/browse/NIFI-16000
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>            Reporter: Matt Gilman
>            Assignee: Matt Gilman
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
