Re: Multiple Outputs from Expand in Python
Ahmet: That's close to what I want; however, DoOutputsTuple doesn't allow for setting PCollections manually.

Robert: Great! I didn't see that documented anywhere. I'll try this out. I will modify the pipeline replacement method to enable multiple outputs too. Are there any "gotchas" when modifying this code?

On Fri, Oct 25, 2019 at 4:16 PM Robert Bradshaw wrote:
> You can literally return a Python tuple of outputs from a composite
> transform as well. (Dicts with PCollections as values are also
> supported, if you want things to be named rather than referenced by
> index.)
>
> On Fri, Oct 25, 2019 at 4:06 PM Ahmet Altay wrote:
> > Is DoOutputsTuple what you are looking for? [1] You can look at this
> > expand function using it [2].
> >
> > [1] https://github.com/apache/beam/blob/master/sdks/python/apache_beam/pvalue.py#L204
> > [2] https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/core.py#L1283
> >
> > On Fri, Oct 25, 2019 at 3:51 PM Luke Cwik wrote:
> > > On further investigation, my example is about multiple inputs, not
> > > multiple outputs, so it seems I don't know.
> > >
> > > The online documentation[1] doesn't seem to specify how to do this
> > > for composite transforms either. All the examples are of the
> > > single-output variety as well[2].
> > >
> > > 1: https://beam.apache.org/documentation/programming-guide/#composite-transforms
> > > 2: https://github.com/apache/beam/blob/4ba731fe93f7f8385c771caf576745d14edf34b8/sdks/python/apache_beam/examples/cookbook/custom_ptransform.py
> > >
> > > On Fri, Oct 25, 2019 at 10:24 AM Luke Cwik wrote:
> > > > I believe PCollectionTuple should be unnecessary since Python has
> > > > first-class support for tuples, as shown in the example below[1].
> > > > Can we use tuples to solve your issue?
> > > >
> > > >     wordsStartingWithA = \
> > > >         p | 'Words starting with A' >> beam.Create(['apple', 'ant', 'arrow'])
> > > >
> > > >     wordsStartingWithB = \
> > > >         p | 'Words starting with B' >> beam.Create(['ball', 'book', 'bow'])
> > > >
> > > >     ((wordsStartingWithA, wordsStartingWithB)
> > > >         | beam.Flatten()
> > > >         | LogElements())
> > > >
> > > > 1: https://github.com/apache/beam/blob/238659bce8043e6a64619a959ab44453dbe22dff/learning/katas/python/Core%20Transforms/Flatten/Flatten/task.py#L29
> > > >
> > > > On Fri, Oct 25, 2019 at 10:11 AM Sam Rohde wrote:
> > > > > Talked to Daniel offline and it looks like the Python SDK is
> > > > > missing PCollection Tuples like the one Java has:
> > > > > https://github.com/rohdesamuel/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/values/PCollectionTuple.java
> > > > >
> > > > > I'll go ahead and implement that for the Python SDK.
> > > > >
> > > > > On Thu, Oct 24, 2019 at 5:20 PM Sam Rohde wrote:
> > > > > > Hey All,
> > > > > >
> > > > > > I'm trying to implement an expand override with multiple
> > > > > > output PCollections. The kicker is that I want to insert a
> > > > > > new transform for each output PCollection. How can I do this?
> > > > > >
> > > > > > Regards,
> > > > > > Sam
Re: Multiple Outputs from Expand in Python
You can literally return a Python tuple of outputs from a composite transform as well. (Dicts with PCollections as values are also supported, if you want things to be named rather than referenced by index.)
Re: Multiple Outputs from Expand in Python
Is DoOutputsTuple what you are looking for? [1] You can look at this expand function using it [2].

[1] https://github.com/apache/beam/blob/master/sdks/python/apache_beam/pvalue.py#L204
[2] https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/core.py#L1283
Re: Multiple Outputs from Expand in Python
On further investigation, my example is about multiple inputs, not multiple outputs, so it seems I don't know.

The online documentation[1] doesn't seem to specify how to do this for composite transforms either. All the examples are of the single-output variety as well[2].

1: https://beam.apache.org/documentation/programming-guide/#composite-transforms
2: https://github.com/apache/beam/blob/4ba731fe93f7f8385c771caf576745d14edf34b8/sdks/python/apache_beam/examples/cookbook/custom_ptransform.py
Re: Multiple Outputs from Expand in Python
I believe PCollectionTuple should be unnecessary since Python has first-class support for tuples, as shown in the example below[1]. Can we use tuples to solve your issue?

    wordsStartingWithA = \
        p | 'Words starting with A' >> beam.Create(['apple', 'ant', 'arrow'])

    wordsStartingWithB = \
        p | 'Words starting with B' >> beam.Create(['ball', 'book', 'bow'])

    ((wordsStartingWithA, wordsStartingWithB)
        | beam.Flatten()
        | LogElements())

1: https://github.com/apache/beam/blob/238659bce8043e6a64619a959ab44453dbe22dff/learning/katas/python/Core%20Transforms/Flatten/Flatten/task.py#L29
Re: Multiple Outputs from Expand in Python
Talked to Daniel offline and it looks like the Python SDK is missing PCollection Tuples like the one Java has: https://github.com/rohdesamuel/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/values/PCollectionTuple.java

I'll go ahead and implement that for the Python SDK.
Multiple Outputs from Expand in Python
Hey All,

I'm trying to implement an expand override with multiple output PCollections. The kicker is that I want to insert a new transform for each output PCollection. How can I do this?

Regards,
Sam