[ 
https://issues.apache.org/jira/browse/NIFI-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15201708#comment-15201708
 ] 

Joseph Witt commented on NIFI-1649:
-----------------------------------

Mark Bean added a comment - 09/Mar/16 14:56
What is the intent of the 'Remove Trailing Newlines' property? I believe the 
intent is to remove the End Of Line (EOL) character from the last line of each 
split file, along with any additional lines that consist of nothing other than 
the EOL character (i.e., blank lines). It seems to work fine when there is data 
other than blank lines. However, blank lines result in odd behavior. For 
example, I have observed the second split file having only 2 (blank) lines in a 
case where Header Line Count = 0, Line Split Count = 3, Remove Trailing 
Newlines = true, and the input file has lines 4-9 consisting of only '\n'. 
Essentially, only the last line of the split has its EOL removed.

Even more concerning is the case when Header Line Count is specified (and 
therefore all lines are written to an output stream rather than simply cloning 
segments of the input flowfile). Here, when a split file consists of nothing 
but blank lines, not only is that split file not output, but no subsequent 
split files are generated. The splitting effectively stops because processing 
treats the empty split file as the result of End Of File. This is a bug.
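
To make the End Of File confusion concrete, here is a minimal sketch (not the
actual SplitText code) of a split loop that stops only when no lines could be
read from the input, rather than when the trimmed split is empty. The class
name SplitLoopSketch and its split method are hypothetical, header handling is
left out, and BufferedReader.readLine is used for brevity even though it
normalizes '\r\n' to '\n'.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Apache NiFi.
public final class SplitLoopSketch {
    private SplitLoopSketch() {}

    // Splits input into chunks of linesPerSplit lines and strips trailing
    // newlines from each chunk (roughly Remove Trailing Newlines = true).
    public static List<String> split(String input, int linesPerSplit) throws IOException {
        List<String> splits = new ArrayList<>();
        BufferedReader reader = new BufferedReader(new StringReader(input));
        while (true) {
            StringBuilder body = new StringBuilder();
            int linesRead = 0;
            String line;
            while (linesRead < linesPerSplit && (line = reader.readLine()) != null) {
                body.append(line).append('\n');
                linesRead++;
            }
            // End Of File is detected by "no lines consumed", NOT by
            // "nothing left after trimming". An all-blank split still
            // consumed lines, so the loop must keep going.
            if (linesRead == 0) {
                break;
            }
            int end = body.length();
            while (end > 0 && body.charAt(end - 1) == '\n') {
                end--;
            }
            // An all-blank split becomes an empty string here; what it
            // should contain is a separate question of defined behavior.
            splits.add(body.substring(0, end));
        }
        return splits;
    }
}
```

With the input "a\nb\nc\n\n\n\nd\ne\nf\n" and linesPerSplit = 3, this sketch
yields three splits: the middle one is empty, and the third still carries
lines d through f instead of being dropped.
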

This can be addressed in the redesign of the SplitText processor. However, 
"proper" behavior needs to be well-defined. Additionally, I strongly recommend 
that the last line of the split file contain exactly the contents of the 
corresponding line in the original flowfile. In other words, keep the EOL 
character. Removing it becomes highly problematic when splitting on maximum 
size. In such cases, you never know you're on the last line of the split file 
until the next line is read (and exceeds the limit). Further, the behavior of 
a split file consisting of only blank lines (when Remove Trailing Newlines is 
true) needs to be clearly defined.

Suggestions: include the EOL for all lines, and remove only trailing blank 
lines. Further, in cases where Remove Trailing Newlines is true and a split 
consists of only newlines, the split should consist of a single blank line.
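
The suggested behavior could be sketched roughly as follows. This is only an
illustration of the proposal, not existing SplitText code: the class
TrailingNewlineSketch and its method are hypothetical, a "blank line" is taken
to mean a line holding nothing but its EOL, and both '\n' and '\r\n' are
treated as EOLs by assumption.

```java
// Hypothetical illustration only; not part of Apache NiFi.
public final class TrailingNewlineSketch {
    private TrailingNewlineSketch() {}

    // Removes trailing blank lines from a split while keeping the EOL that
    // terminates the last non-blank line. A split made up only of EOLs is
    // reduced to a single blank line.
    public static String removeTrailingBlankLines(String split) {
        if (split.isEmpty()) {
            return split;
        }
        // Walk backwards over every trailing EOL character.
        int end = split.length();
        while (end > 0 && isEolChar(split.charAt(end - 1))) {
            end--;
        }
        if (end == 0) {
            // Only blank lines: keep exactly one of them.
            return split.substring(0, eolLength(split, 0));
        }
        if (end < split.length()) {
            // Keep the EOL belonging to the last line that has content.
            end += eolLength(split, end);
        }
        return split.substring(0, end);
    }

    private static boolean isEolChar(char c) {
        return c == '\n' || c == '\r';
    }

    // Length of the EOL sequence starting at index i (2 for "\r\n", else 1).
    private static int eolLength(String s, int i) {
        boolean crlf = s.charAt(i) == '\r' && i + 1 < s.length() && s.charAt(i + 1) == '\n';
        return crlf ? 2 : 1;
    }
}
```

Under these assumptions, "a\nb\n\n\n" becomes "a\nb\n", and a split of
"\n\n\n" becomes "\n". A final line with no EOL at all is left untouched.
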
Please comment.

> SplitText end of line handling is incorrect
> -------------------------------------------
>
>                 Key: NIFI-1649
>                 URL: https://issues.apache.org/jira/browse/NIFI-1649
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Extensions
>            Reporter: Joseph Witt
>            Assignee: Joseph Witt
>            Priority: Critical
>             Fix For: 0.6.0
>
>
> Lengthy discussion about this in NIFI-1118



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
