[ https://issues.apache.org/jira/browse/NIFI-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148956#comment-17148956 ]
Otto Fowler edited comment on NIFI-2072 at 6/30/20, 9:21 PM:
-------------------------------------------------------------

[~pvillard]

Something like this?

The restriction when enabling the property is: if you want named groups, all your capturing groups MUST be named. You can't mix named and unnamed captures.

If they don't match, it falls back to the old way. But I haven't written the verify yet either...

{code:java}
final String SAMPLE_STRING = "foo\r\nbar1\r\nbar2\r\nbar3\r\nhello\r\nworld\r\n";

@Test
public void testProcessorWithGroupNames() throws Exception {
    final TestRunner testRunner = TestRunners.newTestRunner(new ExtractText());

    testRunner.setProperty("regex.result1", "(?s)(?<all>.*)");
    testRunner.setProperty("regex.result2", "(?s).*(?<bar1>bar1).*");
    testRunner.setProperty("regex.result3", "(?s).*?(?<bar1>bar\\d).*");
    testRunner.setProperty("regex.result4", "(?s).*?(?:bar\\d).*?(?<bar2>bar\\d).*?(?<bar3>bar3).*");
    testRunner.setProperty("regex.result5", "(?s).*(?<bar3>bar\\d).*");
    testRunner.setProperty("regex.result6", "(?s)^(?<all>.*)$");
    testRunner.setProperty("regex.result7", "(?s)(?<miss>XXX)");
    testRunner.setProperty(ENABLE_NAMED_GROUPS, "true");

    testRunner.enqueue(SAMPLE_STRING.getBytes("UTF-8"));
    testRunner.run();

    testRunner.assertAllFlowFilesTransferred(ExtractText.REL_MATCH, 1);
    final MockFlowFile out = testRunner.getFlowFilesForRelationship(ExtractText.REL_MATCH).get(0);
    java.util.Map<String,String> attributes = out.getAttributes();

    out.assertAttributeEquals("regex.result1.all", SAMPLE_STRING);
    out.assertAttributeEquals("regex.result2.bar1", "bar1");
    out.assertAttributeEquals("regex.result3.bar1", "bar1");
    out.assertAttributeEquals("regex.result4.bar2", "bar2");
    out.assertAttributeEquals("regex.result4.bar3", "bar3");
    out.assertAttributeEquals("regex.result5.bar3", "bar3");
    out.assertAttributeEquals("regex.result6.all", SAMPLE_STRING);
    out.assertAttributeEquals("regex.result7.miss", null);
}
{code}

was (Author: ottobackwards):
[~pvillard]

Something like this?

The restriction when enabling the property is: if you want named groups, all your capturing groups MUST be named. You can't mix named and unnamed captures.

{code:java}
final String SAMPLE_STRING = "foo\r\nbar1\r\nbar2\r\nbar3\r\nhello\r\nworld\r\n";

@Test
public void testProcessorWithGroupNames() throws Exception {
    final TestRunner testRunner = TestRunners.newTestRunner(new ExtractText());

    testRunner.setProperty("regex.result1", "(?s)(?<all>.*)");
    testRunner.setProperty("regex.result2", "(?s).*(?<bar1>bar1).*");
    testRunner.setProperty("regex.result3", "(?s).*?(?<bar1>bar\\d).*");
    testRunner.setProperty("regex.result4", "(?s).*?(?:bar\\d).*?(?<bar2>bar\\d).*?(?<bar3>bar3).*");
    testRunner.setProperty("regex.result5", "(?s).*(?<bar3>bar\\d).*");
    testRunner.setProperty("regex.result6", "(?s)^(?<all>.*)$");
    testRunner.setProperty("regex.result7", "(?s)(?<miss>XXX)");
    testRunner.setProperty(ENABLE_NAMED_GROUPS, "true");

    testRunner.enqueue(SAMPLE_STRING.getBytes("UTF-8"));
    testRunner.run();

    testRunner.assertAllFlowFilesTransferred(ExtractText.REL_MATCH, 1);
    final MockFlowFile out = testRunner.getFlowFilesForRelationship(ExtractText.REL_MATCH).get(0);
    java.util.Map<String,String> attributes = out.getAttributes();

    out.assertAttributeEquals("regex.result1.all", SAMPLE_STRING);
    out.assertAttributeEquals("regex.result2.bar1", "bar1");
    out.assertAttributeEquals("regex.result3.bar1", "bar1");
    out.assertAttributeEquals("regex.result4.bar2", "bar2");
    out.assertAttributeEquals("regex.result4.bar3", "bar3");
    out.assertAttributeEquals("regex.result5.bar3", "bar3");
    out.assertAttributeEquals("regex.result6.all", SAMPLE_STRING);
    out.assertAttributeEquals("regex.result7.miss", null);
}
{code}

> Support named captures in ExtractText
> -------------------------------------
>
>                 Key: NIFI-2072
>                 URL: https://issues.apache.org/jira/browse/NIFI-2072
>             Project: Apache NiFi
>          Issue Type: Improvement
>            Reporter: Joey Frazee
>            Assignee: Otto Fowler
>            Priority: Major
>
> ExtractText currently captures and creates attributes using numeric indices
> (e.g., attribute.name.0, attribute.name.1, etc.) whether or not the capture
> groups are named, i.e., patterns like (?<name>\w+).
> In addition to being more faithful to the provided regexes, named captures
> could help simplify data flows because you wouldn't have to add superfluous
> UpdateAttribute steps which are just renaming the indexed captures to more
> interpretable names.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
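For readers following the thread, the "all capturing groups must be named, otherwise fall back to the old way" rule from the comment could be sketched roughly as below. This is not the ExtractText implementation: the class `NamedGroupSketch` and its methods `groupNames`, `allGroupsNamed`, and `extract` are hypothetical names made up for illustration, and the name scan is a deliberate simplification that does not skip escaped parentheses or character classes. The index-based fallback only mirrors the `attribute.name.0`, `attribute.name.1` naming described in the issue.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of NiFi.
final class NamedGroupSketch {

    // Finds "(?<name>" openers. Lookbehinds such as "(?<=" or "(?<!" are not
    // matched because a group name must start with a letter.
    private static final Pattern NAMED_GROUP =
            Pattern.compile("\\(\\?<([a-zA-Z][a-zA-Z0-9]*)>");

    // Returns the group names in the order they appear in the regex text.
    static List<String> groupNames(String regex) {
        List<String> names = new ArrayList<>();
        Matcher m = NAMED_GROUP.matcher(regex);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    // True only if the regex has at least one capturing group and every
    // capturing group is named (no mixing of named and unnamed captures).
    static boolean allGroupsNamed(String regex) {
        int total = Pattern.compile(regex).matcher("").groupCount();
        return total > 0 && groupNames(regex).size() == total;
    }

    // Builds attributes keyed by "<property>.<groupName>" when every group is
    // named, and by "<property>.<index>" otherwise (the assumed fallback).
    static Map<String, String> extract(String property, String regex, String content) {
        Map<String, String> attrs = new LinkedHashMap<>();
        Matcher m = Pattern.compile(regex).matcher(content);
        if (!m.find()) {
            return attrs; // no match, no attributes
        }
        if (allGroupsNamed(regex)) {
            for (String name : groupNames(regex)) {
                attrs.put(property + "." + name, m.group(name));
            }
        } else {
            for (int i = 0; i <= m.groupCount(); i++) {
                attrs.put(property + "." + i, m.group(i));
            }
        }
        return attrs;
    }
}
```

With the `regex.result4` pattern from the test above, the non-capturing `(?:bar\\d)` group does not count as a capture, so the pattern still qualifies as "all named"; a pattern such as `(?<a>foo)(\\w+)` mixes named and unnamed groups and would take the index-based path in this sketch.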