[ https://issues.apache.org/jira/browse/NIFI-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Otto Fowler resolved NIFI-5689.
-------------------------------
    Resolution: Fixed

> ReplaceText does not handle end of line correctly on buffer boundary
> --------------------------------------------------------------------
>
>                 Key: NIFI-5689
>                 URL: https://issues.apache.org/jira/browse/NIFI-5689
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Extensions
>    Affects Versions: 1.7.1
>            Reporter: Sergei Zhirikov
>            Assignee: Otto Fowler
>            Priority: Minor
>         Attachments: Text_Parsing_Bug.xml
>
>
> ReplaceText appears to misbehave under the following conditions:
> * The input flow file contains text with Windows-style line endings (CR-LF).
> * ReplaceText is configured to perform "Regex Replace" in "Line-by-Line"
> mode.
> * The "Maximum Buffer Size" is set to a value smaller than the whole file
> content, but large enough to fit any of the text lines in the file.
> * A CR-LF pair of characters in one of the lines happens to be split across
> two buffers, that is, CR is the last character in one buffer and LF is the
> first character in the following one.
> An example flow template is attached to illustrate the problem.
> In the example, the regular expression is intended to remove white space at
> the end of each line. It operates as expected on all lines except the third
> one (containing "GHI"). That line satisfies the conditions described above.
> As a result, the CR character at the end of that line is removed, which does
> not happen on the other lines.
> In some more complicated cases both CR and LF end up being removed,
> effectively joining two lines into one, although I have not managed to
> create a simple test case that reproduces that.
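
The boundary condition described in the report can be illustrated with a small sketch. This is not NiFi's code: split_lines, its carry_cr flag, the sample text and the buffer size of 17 are all made up for illustration, and the sketch only assumes a reader that splits lines one buffer at a time. When the reader cannot look past the end of a buffer, a CR in the last position is treated as a complete line ending and the LF that opens the next buffer becomes a line of its own.

```python
def split_lines(text, buffer_size, carry_cr):
    """Split text into lines (terminators kept), reading buffer_size chars at a time.

    Hypothetical helper, not NiFi's implementation. With carry_cr=False a CR in
    the last position of a buffer ends the line immediately; with carry_cr=True
    the decision is deferred until the first character of the next buffer.
    """
    lines, current, pending_cr = [], "", False
    for start in range(0, len(text), buffer_size):
        buf = text[start:start + buffer_size]
        i = 0
        while i < len(buf):
            ch = buf[i]
            if pending_cr:
                # The previous buffer ended in CR: an LF here completes that CR-LF.
                pending_cr = False
                if ch == "\n":
                    current += ch
                    i += 1
                lines.append(current)
                current = ""
                continue
            current += ch
            i += 1
            if ch == "\n":
                lines.append(current)
                current = ""
            elif ch == "\r":
                if i < len(buf):
                    # The next character is visible in this buffer.
                    if buf[i] == "\n":
                        current += "\n"
                        i += 1
                    lines.append(current)
                    current = ""
                elif carry_cr:
                    pending_cr = True  # decide once the next buffer arrives
                else:
                    lines.append(current)  # CR alone is taken as the line ending
                    current = ""
    if current:
        lines.append(current)
    return lines


# Made-up sample: four CR-LF lines of 6 characters each. With a buffer of 17
# characters, the CR of the third line is the last character of the first buffer.
text = "ABC \r\nDEF \r\nGHI \r\nJKL \r\n"
naive = split_lines(text, buffer_size=17, carry_cr=False)
aware = split_lines(text, buffer_size=17, carry_cr=True)
print(naive)
print(aware)
```

With carry_cr=False the third line comes back as "GHI \r" followed by a separate "\n" line, so any per-line replacement sees a different line ending there than on the other lines. Deferring the decision on a trailing CR until the next buffer is read keeps the pair together.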