[ 
https://issues.apache.org/jira/browse/BEAM-7018?focusedWorklogId=260355&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-260355
 ]

ASF GitHub Bot logged work on BEAM-7018:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 14/Jun/19 12:12
            Start Date: 14/Jun/19 12:12
    Worklog Time Spent: 10m 
      Work Description: robertwb commented on issue #8859: [BEAM-7018] Added 
Regex transform for PythonSDK
URL: https://github.com/apache/beam/pull/8859#issuecomment-502085526
 
 
   Thanks. These look generally useful. I wonder, however, if they should be in 
their own module rather than in a Regex class in the util module (wrapping 
functions in a class is largely a Java-ism, since in Java everything must live 
in a class). Alternatively, one could have a Regex module with functions like 
matches that return the relevant PTransforms. 
   
   The other thought that came to me is that this could probably be simplified 
a lot using lambdas and nested functions, instead of creating a new PTransform 
and DoFn class each time. E.g.
   
   ```
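   import apache_beam as beam

   class Regex(object):
     @staticmethod  # no self argument, so it can be called as Regex.matches(...)
     def matches(regex, group=0):
       # _regex_compile is assumed to wrap re.compile.
       regex = _regex_compile(regex)  # Do this once at construction time
       def maybe_match(element):
         m = regex.match(element)
         if m:
           yield m.group(group)  # emits one output on a match, none otherwise
       return beam.FlatMap(maybe_match)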
   ```
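
As a rough illustration of why the closure-based version behaves like a filter plus a map, the sketch below reproduces the same pattern in plain Python, without Beam. The names make_maybe_match and flat_map are hypothetical stand-ins (flat_map only imitates what beam.FlatMap does with a generator function), and re.compile stands in for the _regex_compile helper.

```python
import re
from itertools import chain


def make_maybe_match(regex, group=0):
    # Compile once, when the closure is built, not once per element.
    compiled = re.compile(regex)

    def maybe_match(element):
        m = compiled.match(element)
        if m:
            # A generator yields zero or one output per input element.
            yield m.group(group)

    return maybe_match


def flat_map(fn, elements):
    # Plain-Python stand-in for FlatMap: concatenate each element's outputs.
    return list(chain.from_iterable(fn(e) for e in elements))


words = ["abc123", "xyz", "abd9"]
matched = flat_map(make_maybe_match(r"ab(\w)"), words)
first_groups = flat_map(make_maybe_match(r"ab(\w)", group=1), words)
```

Elements that do not match simply produce no output, so no separate filtering step is needed.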
   
   If this is a common pattern, we could have a utility function
   
   ```
     def match_objects(regex):
       regex = _regex_compile(regex)
       def maybe_match(element):
         m = regex.match(element)
         if m:
           yield m
       return beam.FlatMap(maybe_match)
   
     def matches(regex, group=0):
       return match_objects(regex) | beam.Map(lambda m: m.group(group))
   ```
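
To show how a shared match_objects helper could be reused by several transforms, here is a minimal Beam-free sketch using plain generators. Everything other than match_objects and matches is a hypothetical name introduced for illustration (map_step, chain_steps, matches_kv), and chain_steps only imitates chaining two transforms with `|`.

```python
import re


def match_objects(regex):
    compiled = re.compile(regex)

    def step(elements):
        # Emit the match object itself so later steps decide what to extract.
        for element in elements:
            m = compiled.match(element)
            if m:
                yield m

    return step


def map_step(fn):
    # Stand-in for a one-to-one Map step.
    def step(elements):
        return (fn(e) for e in elements)

    return step


def chain_steps(*steps):
    # Stand-in for chaining transforms: feed each step's output to the next.
    def run(elements):
        for s in steps:
            elements = s(elements)
        return list(elements)

    return run


def matches(regex, group=0):
    return chain_steps(match_objects(regex), map_step(lambda m: m.group(group)))


def matches_kv(regex, key_group, value_group=0):
    # Hypothetical second consumer of the same helper: emit (key, value) pairs.
    return chain_steps(
        match_objects(regex),
        map_step(lambda m: (m.group(key_group), m.group(value_group))),
    )


lines = ["a=1", "!!", "bb=22"]
keys = matches(r"(\w+)=(\d+)", 1)(lines)
pairs = matches_kv(r"(\w+)=(\d+)", 1, 2)(lines)
```

The design point is that matching logic lives in one place, and each public function only supplies the small lambda that picks what to emit.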
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 260355)
    Time Spent: 20m  (was: 10m)

> Regex transform for Python SDK
> ------------------------------
>
>                 Key: BEAM-7018
>                 URL: https://issues.apache.org/jira/browse/BEAM-7018
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-py-core
>            Reporter: Rose Nguyen
>            Assignee: Shehzaad Nakhoda
>            Priority: Minor
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> PTransforms that use regular expressions to process elements in a PCollection.
> It should offer the same API as its Java counterpart: 
> [https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Regex.java]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
