[ https://issues.apache.org/jira/browse/NIFI-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15201708#comment-15201708 ]
Joseph Witt commented on NIFI-1649:
-----------------------------------

Mark Bean added a comment - 09/Mar/16 14:56

What is the intent of the 'Remove Trailing Newlines' property? I believe the intent is to remove the End Of Line (EOL) character from the last line of each split file along with any additional lines that consist of nothing other than the EOL character (i.e. blank lines).

It seems to work fine when there is data other than blank lines. However, blank lines result in odd behavior. For example, I have observed the second split file having only 2 (blank) lines in a case where Header Line Count = 0, Line Split Count = 3, Remove Trailing Newlines = true, and the input file has lines 4-9 consisting of only '\n'. Essentially, only the last line of the split has its EOL removed.

Even more concerning is the case when Header Line Count is specified (and therefore all lines are written to an output stream versus simply cloning segments of the input flowfile). Here, when a split file consists of nothing but blank lines, not only is that split file not output, but no subsequent split files are generated. The splitting is effectively stopped because processing believes the empty split file is the result of End Of File. This is a bug.

This can be addressed in the redesign of the SplitText processor. However, "proper" behavior needs to be well-defined.

Additionally, I strongly recommend that the last line of the split file contain the exact contents as the line from the original flowfile. In other words, keep the EOL character. Removing it becomes highly problematic when splitting on maximum size. In such cases, you never know you're on the last line of the split file until the next line is read (and exceeds the limit).

Further, the behavior of a split file consisting of only blank lines (when Remove Trailing Newlines is true) needs to be clearly defined.

Suggestions: include EOL for all lines, but only remove trailing blank lines.
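
For illustration only, here is a minimal Python sketch of the first suggestion. It is not SplitText's actual implementation; the function name `drop_trailing_blank_lines` and the list-of-lines representation are assumptions made for this example. Every retained line keeps its EOL exactly as read, and only the trailing run of blank lines is dropped.

```python
def drop_trailing_blank_lines(lines):
    """Return a split's lines without its trailing blank lines.

    Hypothetical helper: `lines` is a list of strings, each still ending
    in its original EOL. Retained lines are returned unmodified.
    """
    end = len(lines)
    # Walk backwards over lines that contain nothing but an EOL sequence.
    while end > 0 and lines[end - 1].strip("\r\n") == "":
        end -= 1
    # Interior blank lines are kept; only the trailing run is removed.
    # A split made up solely of blank lines is not treated specially here.
    return lines[:end]


split = ["a\n", "\n", "b\n", "\n", "\n"]
result = drop_trailing_blank_lines(split)
```

In this sketch the last retained line ("b\n") still ends in its EOL, so the splitter never needs to know in advance which line will turn out to be the last one in a split.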
Further, in cases where Remove Trailing Newlines is true and a split consists of only newlines, the split should consist of a single blank line.

Please comment.

> SplitText end of line handling is incorrect
> -------------------------------------------
>
>                 Key: NIFI-1649
>                 URL: https://issues.apache.org/jira/browse/NIFI-1649
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Extensions
>            Reporter: Joseph Witt
>            Assignee: Joseph Witt
>            Priority: Critical
>             Fix For: 0.6.0
>
>
> Lengthy discussion about this in NIFI-1118

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)