[ 
https://issues.apache.org/jira/browse/NIFI-942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14875731#comment-14875731
 ] 

Mark Payne commented on NIFI-942:
---------------------------------

Joe: finding a regex and matching a regex are two very different things.

Take for example the input: The quick brown fox jumps over the lazy dog.

This contains the regex: quick.*?fox
But it definitely does not MATCH that regex, because a match has to cover the 
entire input. It would match the regex: 
.*quick.*?fox.*

"Matches Regular Expression" would indicate that the entire line matches the 
regex exactly, whereas "Contains Regular Expression" would indicate that some 
portion of the line matches the regex.

In Java land, this would translate to:

contains: pattern.matcher(line).find()
matches: pattern.matcher(line).matches()

While we could just supply "Matches Regular Expression" and expect the user 
to add a .* before and after the rest of the regex, that is extremely 
inefficient with Java's Pattern matching.
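
As a minimal sketch of the difference, using the example line and the two 
regexes from above (the class and method names here are made up for 
illustration and are not part of any NiFi code):

```java
import java.util.regex.Pattern;

class FindVersusMatches {
    // "Contains Regular Expression": some portion of the line matches.
    static boolean containsRegex(Pattern pattern, String line) {
        return pattern.matcher(line).find();
    }

    // "Matches Regular Expression": the entire line must match.
    static boolean matchesRegex(Pattern pattern, String line) {
        return pattern.matcher(line).matches();
    }

    public static void main(String[] args) {
        String line = "The quick brown fox jumps over the lazy dog.";
        Pattern bare = Pattern.compile("quick.*?fox");
        Pattern wrapped = Pattern.compile(".*quick.*?fox.*");

        System.out.println(containsRegex(bare, line));    // true
        System.out.println(matchesRegex(bare, line));     // false
        System.out.println(matchesRegex(wrapped, line));  // true
    }
}
```

With find(), the user can write the bare regex and does not need the 
surrounding .* at all.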

> Create RouteText processor
> --------------------------
>
>                 Key: NIFI-942
>                 URL: https://issues.apache.org/jira/browse/NIFI-942
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Extensions
>            Reporter: Mark Payne
>            Assignee: Joseph Percivall
>             Fix For: 0.4.0
>
>
> The idea is to route individual lines of a text file to different 
> relationships. This allows for splitting lines based on some criteria or 
> filtering out specific lines, and would be a much more convenient alternative 
> to RouteOnContent for textual data.
> A discussion of this took place on the users mailing list 
> (http://mail-archives.apache.org/mod_mbox/nifi-users/201509.mbox/%3CCAKpk5PxjszdX-NXMMf6Pcet4x7Y5GmrT7_jn9uyzS-h_a9TG3A%40mail.gmail.com%3E)
> The way that I could see this working is to have a few different properties:
> Routing Strategy:
> - Route each line to matching Property Name (default)
> - Route matching lines to 'matched' if all match
> - Route matching lines to 'matched' if any match
> - Route FlowFile to 'matched' if all lines match
> - Route FlowFile to 'matched' if any line matches
> Match Strategy:
> - Starts With
> - Ends With
> - Contains
> - Equals
> - Matches Regular Expression
> - Contains Regular Expression
> And then user-defined properties that indicate what to search each line of 
> text for.
> So to find lines that begin with the < character, you would simply add a 
> property named "Begins with Less Than" and set the value to: <
> Then set the Match Strategy to "Starts With" and the Routing Strategy to 
> "Route each line to matching Property Name".
> Then, any line that begins with < will be routed to the "Begins with Less 
> Than" relationship.
> This would be a simple way to pull out any particular lines of interest in a 
> text file.
> I can see this being very useful for processing log files, CSV, etc.
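
For illustration only, the six Match Strategy options listed in the 
description could be evaluated against a single line roughly as follows. 
This is a sketch under assumptions, not the RouteText implementation: the 
LineMatcher class, the MatchStrategy enum, and lineMatches are hypothetical 
names, and a real processor would compile each regex once rather than once 
per line.

```java
import java.util.regex.Pattern;

class LineMatcher {
    // Hypothetical enum mirroring the Match Strategy options in the description.
    enum MatchStrategy {
        STARTS_WITH, ENDS_WITH, CONTAINS, EQUALS, MATCHES_REGEX, CONTAINS_REGEX
    }

    // Returns true if 'line' satisfies 'value' under the given strategy.
    static boolean lineMatches(MatchStrategy strategy, String value, String line) {
        switch (strategy) {
            case STARTS_WITH:
                return line.startsWith(value);
            case ENDS_WITH:
                return line.endsWith(value);
            case CONTAINS:
                return line.contains(value);
            case EQUALS:
                return line.equals(value);
            case MATCHES_REGEX:
                // The entire line must match the regex.
                return Pattern.compile(value).matcher(line).matches();
            case CONTAINS_REGEX:
                // Some portion of the line must match the regex.
                return Pattern.compile(value).matcher(line).find();
            default:
                throw new IllegalArgumentException("Unknown strategy: " + strategy);
        }
    }

    public static void main(String[] args) {
        // The "Begins with Less Than" example: value "<" with Starts With.
        System.out.println(lineMatches(MatchStrategy.STARTS_WITH, "<", "<root>"));   // true
        System.out.println(lineMatches(MatchStrategy.STARTS_WITH, "<", "a < b"));    // false
    }
}
```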



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
