So, If an RPC call has to be performed for a batch of Rows(PCollection<Row>), instead of each Row, the recommended way is to batch the Rows in startBundle() of DoFn( https://stackoverflow.com/questions/49094781/yield-results-in-finish-bundle-from-a-custom-dofn/49101711#49101711)? I thought Stateful and Timely Processing could be helpful here.
On Thu, Jul 25, 2019 at 9:54 PM Robert Bradshaw <rober...@google.com> wrote: > Though it's not obvious in the name, Stateful ParDos can only be > applied to keyed PCollections, similar to GroupByKey. (You could, > however, assign every element to the same key and then apply a > Stateful DoFn, though in that case all elements would get processed on > the same worker.) > > On Thu, Jul 25, 2019 at 6:06 PM rahul patwari > <rahulpatwari8...@gmail.com> wrote: > > > > Hi, > > > > https://beam.apache.org/blog/2017/02/13/stateful-processing.html gives > an example of assigning an arbitrary-but-consistent index to each element > on a per key-and-window basis. > > > > If the Stateful ParDo is applied on a Non-Keyed PCollection, say, > PCollection<Row> with Fixed Windows, the state is maintained per window and > every element in the window will be assigned a consistent index? > > Does this mean every element belonging to the window will be processed > in a single DoFn Instance, which otherwise could have been done in multiple > parallel instances, limiting performance? > > Similarly, How does Stateful ParDo behave on Bounded Non-Keyed > PCollection? > > > > Thanks, > > Rahul >