Have you looked at the GroupIntoBatches transform?
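
To make the idea concrete, here is a rough plain-Python sketch (not Beam code, and not how GroupIntoBatches is actually implemented) of the pattern it supports: give each Row one of a few keys so the work still spreads across workers, then buffer per key and hand out fixed-size batches for the RPC. The names assign_shard_keys, group_into_batches and call_rpc, the round-robin keying, and the shard and batch sizes are all made up for illustration. In Beam itself, GroupIntoBatches expects a keyed input, which matches the point about keyed PCollections quoted below.

```python
from collections import defaultdict


def assign_shard_keys(rows, num_shards):
    """Attach one of num_shards keys to each row (round-robin is an assumption)."""
    for i, row in enumerate(rows):
        yield i % num_shards, row


def group_into_batches(keyed_rows, batch_size):
    """Buffer rows per key; emit (key, batch) when a buffer is full."""
    buffers = defaultdict(list)
    for key, row in keyed_rows:
        buffers[key].append(row)
        if len(buffers[key]) == batch_size:
            # Full batch: emit it and drop the buffer for this key.
            yield key, buffers.pop(key)
    # Flush whatever is left once the input is exhausted.
    for key, leftover in buffers.items():
        if leftover:
            yield key, leftover


def call_rpc(batch):
    """Hypothetical stand-in for the real RPC; returns the number of rows sent."""
    return len(batch)


rows = [f"row-{i}" for i in range(10)]
keyed = assign_shard_keys(rows, num_shards=2)
batches = list(group_into_batches(keyed, batch_size=3))
rows_sent = sum(call_rpc(batch) for _, batch in batches)
```

Using a single key would still batch correctly, but every element would then go through one buffer, which is the single-worker bottleneck mentioned below; a handful of shard keys trades slightly smaller trailing batches for parallelism.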

On Thu, Jul 25, 2019 at 9:34 AM rahul patwari <rahulpatwari8...@gmail.com>
wrote:

> So, if an RPC call has to be performed for a batch of Rows
> (PCollection<Row>) instead of for each Row, is the recommended way to
> buffer the Rows in the DoFn and flush them in finishBundle() (
> https://stackoverflow.com/questions/49094781/yield-results-in-finish-bundle-from-a-custom-dofn/49101711#49101711)?
> I thought Stateful and Timely Processing could be helpful here.
>
> On Thu, Jul 25, 2019 at 9:54 PM Robert Bradshaw <rober...@google.com>
> wrote:
>
>> Though it's not obvious in the name, Stateful ParDos can only be
>> applied to keyed PCollections, similar to GroupByKey. (You could,
>> however, assign every element to the same key and then apply a
>> Stateful DoFn, though in that case all elements would get processed on
>> the same worker.)
>>
>> On Thu, Jul 25, 2019 at 6:06 PM rahul patwari
>> <rahulpatwari8...@gmail.com> wrote:
>> >
>> > Hi,
>> >
>> > https://beam.apache.org/blog/2017/02/13/stateful-processing.html
>> > gives an example of assigning an arbitrary-but-consistent index to each
>> > element on a per key-and-window basis.
>> >
>> > If the Stateful ParDo is applied to a non-keyed PCollection, say,
>> > PCollection<Row> with Fixed Windows, is the state maintained per window,
>> > so that every element in the window is assigned a consistent index?
>> > Does this mean every element belonging to the window will be processed
>> > in a single DoFn instance, when it could otherwise have been processed in
>> > multiple parallel instances, limiting performance?
>> > Similarly, how does a Stateful ParDo behave on a bounded non-keyed
>> > PCollection?
>> >
>> > Thanks,
>> > Rahul
>>
>
