On Wed, May 15, 2019 at 8:43 PM Allie Chen <[email protected]> wrote:
> Thanks all for your reply. I will try each of them and see how it goes. > > The experiment I am working now is similar to > https://stackoverflow.com/questions/48886943/early-results-from-groupbykey-transform, > which tries to get early results from GroupByKey with windowing. I have > some code like: > > Reading | beam.WindowInto(beam.window.GlobalWindows(), > > > trigger=trigger.Repeatedly(trigger.AfterCount(1)), > accumulation_mode=beam.transforms.trigger.AccumulationMode.DISCARDING) > > | MapWithAKey > > | GroupByKey > > | RemoveKey > > | OtherTransforms > > > I don't see the window and trigger working, GroupByKey still waits for all > elements. I also tried adding a timestamp for each element and using a > fixed size window. Seems no impact. > > > Anyone knows how to get the early results from GroupByKey for a bounded > source? > Note that this is essentially how Reshuffle() is implemented. However, batch never gives early results from a GroupByKey; each stage is executed sequentially. Is the goal here to be able to parallelize the Read with other operations? If the Read (and limited-parallelism write) is still the bottleneck, that might not help much.
