[jira] [Resolved] (NIFI-7145) Chained SplitText processors unable to handle files in some circumstances

Pierre Villard (Jira) Mon, 05 Jan 2026 02:42:22 -0800


     [ 
https://issues.apache.org/jira/browse/NIFI-7145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Pierre Villard resolved NIFI-7145.
----------------------------------
    Resolution: Feedback Received

Apache NiFi 1.x is no longer maintained and no new release is planned on the 
1.x release line. Marking as resolved as part of a cleanup operation. Please 
open a new one with an updated description if this is still relevant for NiFi 
2.x.

> Chained SplitText processors unable to handle files in some circumstances
> -------------------------------------------------------------------------
>
>                 Key: NIFI-7145
>                 URL: https://issues.apache.org/jira/browse/NIFI-7145
>             Project: Apache NiFi
>          Issue Type: Bug
>    Affects Versions: 1.11.1
>         Environment: Docker Image (apache/nifi) running in Kubernetes (1.15)
>            Reporter: Chris Sampson
>            Priority: Minor
>         Attachments: Broken_SplitText.json, Broken_SplitText.xml, Screen Shot 
> 2020-02-13 at 17.28.58.png, nifi-app.log, test.csv.tgz
>
>
> With chained SplitText processors (NiFi 1.11.1 apache/nifi Docker image with 
> default nifi.properties, although configured to allow secure access in my 
> environment with encrypted flowfile/provenance/content repositories; I don't 
> know whether that makes a difference):
>  * ingest 40MB CSV file with 50k lines of data (plus 1 header)
>  * SplitText - chunk the file into 10k-line segments (including header in 
> each file)
>  * SplitText - break each row out into its own FlowFile
>  
>  The 10k chunking works fine, but then the files sit in the queue between the 
> processors forever, with the second SplitText showing that it’s working but 
> never actually producing anything (I can’t see anything in the logs, although 
> I haven’t turned on debug logging to see whether that would provide anything 
> more).
>   
>  If I reduce the chunk size to 1k then the per-row split works fine - maybe 
> some sort of issue with SplitText and/or swapping of FlowFiles/content to the 
> repositories? Similarly, trying the same with a smaller file (i.e. just 
> including the first 3 columns from the attached file, but keeping the 50k 
> rows) seems to work fine too.
>   
>  Example Flow/Template attached with file that breaks the flow (untar and 
> copy into /tmp). Second SplitText set to Concurrency=3 in the template, but 
> fails just the same when set to default Concurrency=1.
>   
>  SplitRecord would be an alternative (which works fine when I try it), but I 
> can’t use that as we potentially lose data if the CSV is malformed (there are 
> more data fields in a row than defined headers - the extra fields are thrown 
> away by the Record processors, which I understand to be normal and that’s 
> fine, but unfortunately I later need to ValidateRecord for each of these rows 
> to check for this kind of invalidity).
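
For readers unfamiliar with the flow, the two-stage split described in the report can be sketched in plain Python. This is not NiFi code and does not reproduce the bug; it only illustrates the intended data flow. The function names and the choice to keep the header on each single-row output are assumptions made for illustration, since the report only states that the header is kept in the first stage.

```python
from typing import Iterator, List


def split_with_header(lines: List[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield chunks of at most chunk_size data lines, each prefixed by the header.

    Hypothetical helper: the first element of `lines` is treated as the header.
    """
    if not lines:
        return
    header, data = lines[0], lines[1:]
    for start in range(0, len(data), chunk_size):
        yield [header] + data[start:start + chunk_size]


def two_stage_split(lines: List[str], first_chunk: int = 10_000) -> Iterator[List[str]]:
    """Chain two splits: large chunks first, then one data row per output."""
    for chunk in split_with_header(lines, first_chunk):
        # Second stage: one data row per output. Keeping the header here is an
        # assumption; the report does not say how the second split is configured.
        yield from split_with_header(chunk, 1)
```

With 50k data lines and a first-stage chunk size of 10k, the first stage would emit 5 chunks and the second stage 50k single-row outputs, which is the volume at which the reporter saw the second processor stall.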

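The malformed-row condition mentioned at the end of the report (a row with more data fields than there are header columns) can be checked outside NiFi with a few lines of Python. This is a minimal sketch using the standard `csv` module, not a description of how ValidateRecord works; the function name is hypothetical and the input is assumed to be comma-delimited with a single header line.

```python
import csv
import io
from typing import Iterator, List, Tuple


def rows_with_extra_fields(csv_text: str) -> Iterator[Tuple[int, List[str]]]:
    """Yield (line_number, row) for each row that has more fields than the header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        return  # empty input: nothing to check
    for row in reader:
        if len(row) > len(header):
            # reader.line_num counts source lines read so far (header is line 1)
            yield reader.line_num, row
```

Running a check like this on the attached test file before ingesting it would show whether any rows carry the extra fields that the Record processors silently discard.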


--
This message was sent by Atlassian Jira
(v8.20.10#820010)
