[jira] [Commented] (NIFI-2072) Support named captures in ExtractText
[ https://issues.apache.org/jira/browse/NIFI-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17163937#comment-17163937 ]

Pierre Villard commented on NIFI-2072:
--------------------------------------

I might be able to have a look over the weekend, but if someone can give it a try, that would be helpful.

> Support named captures in ExtractText
> -------------------------------------
>
> Key: NIFI-2072
> URL: https://issues.apache.org/jira/browse/NIFI-2072
> Project: Apache NiFi
> Issue Type: Improvement
> Reporter: Joey Frazee
> Assignee: Otto Fowler
> Priority: Major
> Labels: extracttext
>
> ExtractText currently captures and creates attributes using numeric indices
> (e.g., attribute.name.0, attribute.name.1, etc.) whether or not the capture
> groups are named, i.e., patterns like (?<name>\w+).
> In addition to being more faithful to the provided regexes, named captures
> could help simplify data flows because you wouldn't have to add superfluous
> UpdateAttribute steps which are just renaming the indexed captures to more
> interpretable names.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (NIFI-2072) Support named captures in ExtractText
[ https://issues.apache.org/jira/browse/NIFI-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17163925#comment-17163925 ]

Otto Fowler commented on NIFI-2072:
-----------------------------------

The PR is up for review. The next step is for somebody to review it; if that person is a committer, they can +1 it and merge it.

[~pvillard] is pretty busy. You are welcome to review it and try it out, if that is something you are comfortable doing.
[jira] [Commented] (NIFI-2072) Support named captures in ExtractText
[ https://issues.apache.org/jira/browse/NIFI-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17163913#comment-17163913 ]

Malthe Borch commented on NIFI-2072:
------------------------------------

[~otto] Nice work. I think this is ready for the next step (I'm not sure who handles that or how it works).
[jira] [Commented] (NIFI-2072) Support named captures in ExtractText
[ https://issues.apache.org/jira/browse/NIFI-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17163793#comment-17163793 ]

Otto Fowler commented on NIFI-2072:
-----------------------------------

[~malthe] I just pushed validation support.
[jira] [Commented] (NIFI-2072) Support named captures in ExtractText
[ https://issues.apache.org/jira/browse/NIFI-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17153069#comment-17153069 ]

Otto Fowler commented on NIFI-2072:
-----------------------------------

I'm more inclined to do the validation, since I think handling mixed and scoped (nested) groups goes downhill real fast, as Java doesn't support it.

What would be nice is if, when I call group(String), I could also call getGroupIndex(String) so that I could mix and match, but you can't.
[jira] [Commented] (NIFI-2072) Support named captures in ExtractText
[ https://issues.apache.org/jira/browse/NIFI-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17152904#comment-17152904 ]

Malthe Borch commented on NIFI-2072:
------------------------------------

I would be happy then with "Enable named group support".

In terms of what happens if an unnamed capture group is used, I think it would be better to either:
- Allow it.
- Implement a validation step that scans the expression for unnamed capture groups (i.e. those that are not named and not non-capturing).
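The validation step suggested above could look roughly like the following. This is only a sketch, not the code from the PR: the class and method names are hypothetical, and it assumes that any opening parenthesis not followed by `?` starts an unnamed capturing group. It skips escaped characters and character classes, but it does not handle `\Q...\E` quoting or comments mode.

```java
class UnnamedGroupCheck {

    /**
     * Returns true if the regex appears to contain a capturing group without a name.
     * Hypothetical helper: "(" followed by "?" covers named groups, non-capturing
     * groups, lookarounds and inline flags, none of which are unnamed captures.
     */
    static boolean hasUnnamedCapturingGroup(String regex) {
        int classDepth = 0; // nesting depth inside [...] character classes
        for (int i = 0; i < regex.length(); i++) {
            char c = regex.charAt(i);
            if (c == '\\') {
                i++; // skip the escaped character, e.g. "\(" is a literal parenthesis
            } else if (c == '[') {
                classDepth++;
            } else if (c == ']' && classDepth > 0) {
                classDepth--;
            } else if (c == '(' && classDepth == 0) {
                boolean special = i + 1 < regex.length() && regex.charAt(i + 1) == '?';
                if (!special) {
                    return true; // plain "(...)" is an unnamed capturing group
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasUnnamedCapturingGroup("(\\w+)=(?<value>\\d+)"));
        System.out.println(hasUnnamedCapturingGroup("(?<key>\\w+)=(?<value>\\d+)"));
    }
}
```

A property validator could reject the expression when this check returns true and named group support is enabled.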
[jira] [Commented] (NIFI-2072) Support named captures in ExtractText
[ https://issues.apache.org/jira/browse/NIFI-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17152886#comment-17152886 ]

Pierre Villard commented on NIFI-2072:
--------------------------------------

Sorry, I don't really have the time to look into it as much as I'd like right now.

The only rule is that we can't make any breaking change, meaning that any existing flow should keep working the exact same way after an upgrade. That's why we usually provide a property to explicitly allow users to enable this new behavior.
[jira] [Commented] (NIFI-2072) Support named captures in ExtractText
[ https://issues.apache.org/jira/browse/NIFI-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17152081#comment-17152081 ]

Otto Fowler commented on NIFI-2072:
-----------------------------------

That is a good question. Usually (in my experience) when a behavior of a processor is changed, it is put behind a configuration property, so that is the convention I followed. [~pvillard] did this as well.
[jira] [Commented] (NIFI-2072) Support named captures in ExtractText
[ https://issues.apache.org/jira/browse/NIFI-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17151827#comment-17151827 ]

Malthe Borch commented on NIFI-2072:
------------------------------------

Is it really necessary to enable named capture groups rather than just use them?

If I don't want named capture groups, I suppose I am just not going to name them, opting instead for enumerated ones.
[jira] [Commented] (NIFI-2072) Support named captures in ExtractText
[ https://issues.apache.org/jira/browse/NIFI-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17151064#comment-17151064 ]

Otto Fowler commented on NIFI-2072:
-----------------------------------

OK, I have a PR just about ready for this. But just to get some feedback first:

After the PR there are implicitly two ways the processor works, depending on the enable named groups property: the old way if it is not enabled, and the new way if it is.

The new way is different in that numeric indices are not added until the second set of matches (if you have that enabled). The root attribute name is used for the 0 group, or for the whole matched line if there are no groups specified.

Such as:

{code:java}
@Test
public void testFindAll() throws Exception {
    final TestRunner testRunner = TestRunners.newTestRunner(new ExtractText());
    testRunner.setProperty(ENABLE_NAMED_GROUPS, "true");
    testRunner.setProperty(ExtractText.ENABLE_REPEATING_CAPTURE_GROUP, "true");
    final String attributeKey = "regex.result";
    testRunner.setProperty(attributeKey, "(?s)(?<W>\\w+)");

    testRunner.enqueue("This is my text".getBytes(StandardCharsets.UTF_8));
    testRunner.run();

    testRunner.assertAllFlowFilesTransferred(ExtractText.REL_MATCH, 1);
    final MockFlowFile out = testRunner.getFlowFilesForRelationship(ExtractText.REL_MATCH).get(0);

    // Ensure the zero capture group is in the resultant attributes
    out.assertAttributeExists(attributeKey);
    out.assertAttributeExists(attributeKey + ".W");
    out.assertAttributeExists(attributeKey + ".W.1");
    out.assertAttributeExists(attributeKey + ".W.2");
    out.assertAttributeExists(attributeKey + ".W.3");
    out.assertAttributeEquals(attributeKey, "This");
    out.assertAttributeEquals(attributeKey + ".W", "This");
    out.assertAttributeEquals(attributeKey + ".W.1", "is");
    out.assertAttributeEquals(attributeKey + ".W.2", "my");
    out.assertAttributeEquals(attributeKey + ".W.3", "text");
}

@Test
public void testFindAllPair() throws Exception {
    final TestRunner testRunner = TestRunners.newTestRunner(new ExtractText());
    testRunner.setProperty(ENABLE_NAMED_GROUPS, "true");
    testRunner.setProperty(ExtractText.ENABLE_REPEATING_CAPTURE_GROUP, "true");
    final String attributeKey = "regex.result";
    testRunner.setProperty(attributeKey, "(?<LEFT>\\w+)=(?<RIGHT>\\d+)");

    testRunner.enqueue("a=1,b=10,c=100".getBytes(StandardCharsets.UTF_8));
    testRunner.run();

    testRunner.assertAllFlowFilesTransferred(ExtractText.REL_MATCH, 1);
    final MockFlowFile out = testRunner.getFlowFilesForRelationship(ExtractText.REL_MATCH).get(0);

    // Ensure the zero capture group is in the resultant attributes
    out.assertAttributeExists(attributeKey);
    out.assertAttributeExists(attributeKey + ".LEFT");
    out.assertAttributeExists(attributeKey + ".RIGHT");
    out.assertAttributeExists(attributeKey + ".LEFT.1");
    out.assertAttributeExists(attributeKey + ".RIGHT.1");
    out.assertAttributeExists(attributeKey + ".LEFT.2");
    out.assertAttributeExists(attributeKey + ".RIGHT.2");
    out.assertAttributeNotExists(attributeKey + ".LEFT.3"); // Ensure there's no more attributes
    out.assertAttributeNotExists(attributeKey + ".RIGHT.3"); // Ensure there's no more attributes
    out.assertAttributeEquals(attributeKey, "a=1");
    out.assertAttributeEquals(attributeKey + ".LEFT", "a");
    out.assertAttributeEquals(attributeKey + ".RIGHT", "1");
    out.assertAttributeEquals(attributeKey + ".LEFT.1", "b");
    out.assertAttributeEquals(attributeKey + ".RIGHT.1", "10");
    out.assertAttributeEquals(attributeKey + ".LEFT.2", "c");
    out.assertAttributeEquals(attributeKey + ".RIGHT.2", "100");
}
{code}
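The naming scheme described in this comment can be sketched outside NiFi with plain `java.util.regex`. This is an illustration only, not the processor code: the class name, the `extract` method, and the idea of passing the group names in as a list are all assumptions made for the example. The root key receives the whole first match, each named group receives its value from the first match, and later matches add a numeric suffix starting at 1.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedCaptureAttributes {

    /** Hypothetical helper that mimics the attribute naming shown in the tests above. */
    static Map<String, String> extract(String key, String regex, List<String> groupNames, String input) {
        Map<String, String> attrs = new LinkedHashMap<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        int matchIndex = 0;
        while (m.find()) {
            if (matchIndex == 0) {
                attrs.put(key, m.group(0)); // root attribute holds the whole first match
            }
            for (String name : groupNames) {
                String value = m.group(name);
                if (value == null) {
                    continue; // group did not participate in this match
                }
                // First match: key.NAME; later matches: key.NAME.1, key.NAME.2, ...
                String attr = matchIndex == 0 ? key + "." + name : key + "." + name + "." + matchIndex;
                attrs.put(attr, value);
            }
            matchIndex++;
        }
        return attrs;
    }

    public static void main(String[] args) {
        Map<String, String> attrs = extract("regex.result", "(?<LEFT>\\w+)=(?<RIGHT>\\d+)",
                List.of("LEFT", "RIGHT"), "a=1,b=10,c=100");
        attrs.forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

For the `a=1,b=10,c=100` input this yields the same seven attributes that `testFindAllPair` asserts.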
[jira] [Commented] (NIFI-2072) Support named captures in ExtractText
[ https://issues.apache.org/jira/browse/NIFI-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17151065#comment-17151065 ]

Otto Fowler commented on NIFI-2072:
-----------------------------------

[~pvillard]
[jira] [Commented] (NIFI-2072) Support named captures in ExtractText
[ https://issues.apache.org/jira/browse/NIFI-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148956#comment-17148956 ]

Otto Fowler commented on NIFI-2072:
-----------------------------------

[~pvillard] Something like this?

The restriction when the property is enabled is: if you want named groups, all your capturing groups MUST be named. You can't mix named and unnamed captures.

{code:java}
final String SAMPLE_STRING = "foo\r\nbar1\r\nbar2\r\nbar3\r\nhello\r\nworld\r\n";

@Test
public void testProcessorWithGroupNames() throws Exception {
    final TestRunner testRunner = TestRunners.newTestRunner(new ExtractText());

    testRunner.setProperty("regex.result1", "(?s)(?<all>.*)");
    testRunner.setProperty("regex.result2", "(?s).*(?<bar1>bar1).*");
    testRunner.setProperty("regex.result3", "(?s).*?(?<bar1>bar\\d).*");
    testRunner.setProperty("regex.result4", "(?s).*?(?:bar\\d).*?(?<bar2>bar\\d).*?(?<bar3>bar3).*");
    testRunner.setProperty("regex.result5", "(?s).*(?<bar3>bar\\d).*");
    testRunner.setProperty("regex.result6", "(?s)^(?<all>.*)$");
    testRunner.setProperty("regex.result7", "(?s)(?<miss>XXX)");
    testRunner.setProperty(ENABLE_NAMED_GROUPS, "true");

    testRunner.enqueue(SAMPLE_STRING.getBytes("UTF-8"));
    testRunner.run();

    testRunner.assertAllFlowFilesTransferred(ExtractText.REL_MATCH, 1);
    final MockFlowFile out = testRunner.getFlowFilesForRelationship(ExtractText.REL_MATCH).get(0);
    java.util.Map<String, String> attributes = out.getAttributes();

    out.assertAttributeEquals("regex.result1.all", SAMPLE_STRING);
    out.assertAttributeEquals("regex.result2.bar1", "bar1");
    out.assertAttributeEquals("regex.result3.bar1", "bar1");
    out.assertAttributeEquals("regex.result4.bar2", "bar2");
    out.assertAttributeEquals("regex.result4.bar2", "bar2");
    out.assertAttributeEquals("regex.result4.bar3", "bar3");
    out.assertAttributeEquals("regex.result5.bar3", "bar3");
    out.assertAttributeEquals("regex.result6.all", SAMPLE_STRING);
    out.assertAttributeEquals("regex.result7.miss", null);
}
{code}
[jira] [Commented] (NIFI-2072) Support named captures in ExtractText
[ https://issues.apache.org/jira/browse/NIFI-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17147556#comment-17147556 ]

Otto Fowler commented on NIFI-2072:
-----------------------------------

I'll take a shot.
[jira] [Commented] (NIFI-2072) Support named captures in ExtractText
[ https://issues.apache.org/jira/browse/NIFI-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17146483#comment-17146483 ]

Pierre Villard commented on NIFI-2072:
--------------------------------------

Hi [~malthe] - I didn't go any further on this one, and it's definitely open to anyone willing to give it a try.
[jira] [Commented] (NIFI-2072) Support named captures in ExtractText
[ https://issues.apache.org/jira/browse/NIFI-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17146104#comment-17146104 ]

Malthe Borch commented on NIFI-2072:
------------------------------------

[~pvillard] did you ever make any headway with this, or is it open for work, assuming that you are still happy with the suggested behavior?
[jira] [Commented] (NIFI-2072) Support named captures in ExtractText
[ https://issues.apache.org/jira/browse/NIFI-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063392#comment-16063392 ]

Andre F de Miranda commented on NIFI-2072:
------------------------------------------

[~jfrazee] [~pvillard]

There is a workaround for this: using ExtractGrok.

Grok supports named captures (and their extraction into attributes) out of the box, making it a suitable alternative to the functionality requested here.

All you need to do is paste a pure regex into the grok pattern and voila: you get an attribute {{grok.captureName}} with the captured value.

Unless there are edge cases that ExtractGrok can't handle, I suggest this be a won't fix?
[jira] [Commented] (NIFI-2072) Support named captures in ExtractText
[ https://issues.apache.org/jira/browse/NIFI-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15522554#comment-15522554 ]

Pierre Villard commented on NIFI-2072:
--------------------------------------

[~jfrazee] Here is a proposal:
- I add a property allowing users to enable capture group naming.
- If this property is enabled, it won't change the current behavior, but additional attributes will be generated to return the corresponding capture groups.

Example:

Let's say the user has added the following property to the processor:
{code}
Property name = keyvalue
Property value = (\w+)=(?<value>\d+)
{code}

The data is:
{code}
a=1,b=10,c=100
{code}

The following attributes will be populated (in addition to the ones already created):
{code}
keyvalue.value.0=1
keyvalue.value.1=10
keyvalue.value.2=100
{code}

If the repeating capture groups property is not enabled, then we'll have:
{code}
keyvalue.value=1
{code}

If the regular expression is:
{code}
Property name = keyvalue
Property value = (?<key>\w+)=(?<value>\d+)
{code}

The following attributes will be populated (in addition to the ones already created):
{code}
keyvalue.value.0=1
keyvalue.value.1=10
keyvalue.value.2=100
keyvalue.key.0=a
keyvalue.key.1=b
keyvalue.key.2=c
{code}

If the repeating capture groups property is not enabled, then we'll have:
{code}
keyvalue.value=1
keyvalue.key=a
{code}

Does that sound acceptable?
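As a rough illustration of the additive scheme proposed here, the named attributes can be computed with plain `java.util.regex`. This sketch is not NiFi code: the class, the `namedAttributes` method, the `repeating` flag, and the explicit list of group names are assumptions made for the example. It produces only the additional named attributes, not the existing indexed ones.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AdditiveNamedAttributes {

    /**
     * Hypothetical helper: builds property.NAME.i for every match when repeating
     * is true, or property.NAME for the first match only when it is false.
     */
    static Map<String, String> namedAttributes(String property, String regex,
            List<String> names, String data, boolean repeating) {
        Map<String, String> attrs = new LinkedHashMap<>();
        Matcher m = Pattern.compile(regex).matcher(data);
        int i = 0;
        while (m.find()) {
            String suffix = repeating ? "." + i : ""; // zero-based index per match
            for (String name : names) {
                attrs.put(property + "." + name + suffix, m.group(name));
            }
            i++;
            if (!repeating) {
                break; // only the first match is reported
            }
        }
        return attrs;
    }

    public static void main(String[] args) {
        List<String> names = List.of("value", "key");
        String regex = "(?<key>\\w+)=(?<value>\\d+)";
        System.out.println(namedAttributes("keyvalue", regex, names, "a=1,b=10,c=100", true));
        System.out.println(namedAttributes("keyvalue", regex, names, "a=1,b=10,c=100", false));
    }
}
```

Unlike the later scheme in the thread, every repeated match here carries an index, including the first one.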
[jira] [Commented] (NIFI-2072) Support named captures in ExtractText
[ https://issues.apache.org/jira/browse/NIFI-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15516790#comment-15516790 ]

Joey Frazee commented on NIFI-2072:
-----------------------------------

[~pvillard] Yeah, I honestly just assumed that would have been there in the Pattern and/or Matcher. I do want to ask, though: from a usability perspective, might getting a bit nasty be justified here?
[jira] [Commented] (NIFI-2072) Support named captures in ExtractText
[ https://issues.apache.org/jira/browse/NIFI-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15512784#comment-15512784 ]

Pierre Villard commented on NIFI-2072:
--------------------------------------

[~jfrazee] I was looking at this JIRA and I agree that it would be a great addition. However, it seems it is not possible to get the group names from the Pattern unless we go a bit nasty...

http://stackoverflow.com/questions/15588903/get-group-names-in-java-regex
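One form the "nasty" route can take is to scan the regex source text for named-group openers with a second regex. The sketch below is an assumption about what such a workaround might look like, not the approach taken in NiFi; the class and method names are hypothetical. It will report a false positive if the expression contains an escaped literal such as `\(?<x>`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupNames {

    // Matches the opening of a named group such as "(?<key>". Lookbehinds "(?<=" and
    // "(?<!" are not matched because a group name has to start with a letter.
    private static final Pattern NAMED_GROUP = Pattern.compile("\\(\\?<([a-zA-Z][a-zA-Z0-9]*)>");

    /** Hypothetical helper: returns group names in the order they appear in the regex source. */
    static List<String> groupNames(String regex) {
        List<String> names = new ArrayList<>();
        Matcher m = NAMED_GROUP.matcher(regex);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(groupNames("(?<key>\\w+)=(?<value>\\d+)")); // prints [key, value]
    }
}
```

Each name found this way can then be passed to `Matcher.group(String)` on a matcher built from the original expression.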