[ https://issues.apache.org/jira/browse/BEAM-7018?focusedWorklogId=260355&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-260355 ]
ASF GitHub Bot logged work on BEAM-7018:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 14/Jun/19 12:12
            Start Date: 14/Jun/19 12:12
    Worklog Time Spent: 10m
      Work Description: robertwb commented on issue #8859: [BEAM-7018] Added Regex transform for PythonSDK
URL: https://github.com/apache/beam/pull/8859#issuecomment-502085526

   Thanks. These look generally useful. I wonder, however, if they should be in their own module rather than in a Regex class in the util module (which is generally a Java-ism, because in Java everything must be in a class). Alternatively, one could have a Regex module with functions like `matches` that return the relevant PTransforms.

   The other thought that came to me is that this could probably be simplified a lot using lambdas, instead of creating a new PTransform and DoFn class each time. E.g.

   ```
   import apache_beam as beam

   class Regex(object):
     @staticmethod
     def matches(regex, group=0):
       # _regex_compile is assumed to return a compiled pattern.
       regex = _regex_compile(regex)  # Do this once at construction time
       def maybe_match(element):
         m = regex.match(element)
         if m:
           yield m.group(group)
       return beam.FlatMap(maybe_match)
   ```

   If this is a common pattern, we could have a utility function

   ```
   def match_objects(regex):
     regex = _regex_compile(regex)
     def maybe_match(element):
       m = regex.match(element)
       if m:
         yield m  # Emit the match object itself
     return beam.FlatMap(maybe_match)

   def matches(regex, group):
     return match_objects(regex) | beam.Map(lambda m: m.group(group))
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 260355)
    Time Spent: 20m  (was: 10m)

> Regex transform for Python SDK
> ------------------------------
>
>                 Key: BEAM-7018
>                 URL: https://issues.apache.org/jira/browse/BEAM-7018
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-py-core
>            Reporter: Rose Nguyen
>            Assignee: Shehzaad Nakhoda
>            Priority: Minor
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> PTransforms that use regular expressions to process elements in a PCollection.
> It should offer the same API as its Java counterpart:
> [https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Regex.java]

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
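
For readers without a Beam environment, the closure-plus-generator pattern from the review comment above can be tried out with the standard library alone. This is only a rough sketch under stated assumptions: `flat_map` is a hypothetical local stand-in for `beam.FlatMap` (it just concatenates whatever each call yields), `make_maybe_match` is an illustrative name, and `re.compile` stands in for the `_regex_compile` helper, whose definition is not shown in the comment.

```python
import re
from itertools import chain


def flat_map(fn, elements):
    """Hypothetical stand-in for beam.FlatMap: concatenate the iterables fn returns."""
    return list(chain.from_iterable(fn(e) for e in elements))


def make_maybe_match(regex, group=0):
    """Build the per-element function described in the comment.

    re.compile is an assumption standing in for _regex_compile.
    """
    pattern = re.compile(regex)  # compiled once, when the function is built

    def maybe_match(element):
        m = pattern.match(element)
        if m:
            yield m.group(group)  # one output on a match, none otherwise

    return maybe_match


lines = ["key1=a", "no match here", "key2=b"]
whole = flat_map(make_maybe_match(r"(\w+)=(\w+)"), lines)
keys = flat_map(make_maybe_match(r"(\w+)=(\w+)", group=1), lines)
print(whole)  # ['key1=a', 'key2=b']
print(keys)   # ['key1', 'key2']
```

Because `maybe_match` is a generator, a non-matching element simply contributes nothing to the output, which is why the comment reaches for FlatMap rather than Map.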