Thank you Luke. I will work on implementing my use case with Stateful ParDo
itself and come back if I have any questions.
Appreciate your help.
On Fri, Jun 26, 2020 at 8:14 AM Luke Cwik wrote:
> Use a stateful DoFn and buffer the elements in a bag state. You'll want to
> use a key that contains enough data to express the join condition you are
> trying to match. For example, if you're trying to match on a customerId,
> then you would do something like:
> element 1 -> ParDo(extract customer id) -> KV ->
> stateful ParDo(buffer element 1 in bag state)
> ...
> element 5 -> ParDo(extract customer id) -> KV ->
> stateful ParDo(output all elements in bag)
>
> If you are matching on customerId and eventId then you would use a
> composite key (customerId, eventId).
>
> You can always use a single global key but you will lose all parallelism
> during processing (for small pipelines this likely won't matter).
>
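To make the keying idea above concrete, here is a minimal plain-Python model
of it. This is only a sketch and not Beam code: the dict of lists stands in
for the per-key bag state, and the event fields (customer_id, event_id, kind,
payload) and the helper names are made up for illustration. It also assumes
elements arrive in order, which a real pipeline does not guarantee.

```python
from collections import defaultdict


def by_customer(event):
    # Key on customerId only.
    return event["customer_id"]


def by_customer_and_event(event):
    # Composite key: both fields must agree for two elements to meet.
    return (event["customer_id"], event["event_id"])


def run(events, key_fn):
    """Process events in order; return (lookup payload, buffered payload) pairs."""
    bags = defaultdict(list)  # stand-in for per-key bag state
    output = []
    for event in events:
        key = key_fn(event)
        if event["kind"] == "record":
            bags[key].append(event["payload"])  # buffer under this key
        else:
            # "lookup": read everything buffered so far for the same key.
            for buffered in bags.get(key, []):
                output.append((event["payload"], buffered))
    return output


# Hypothetical sample data.
events = [
    {"customer_id": "c1", "event_id": "e1", "kind": "record", "payload": "A"},
    {"customer_id": "c2", "event_id": "e1", "kind": "record", "payload": "B"},
    {"customer_id": "c1", "event_id": "e2", "kind": "record", "payload": "C"},
    {"customer_id": "c1", "event_id": "e1", "kind": "lookup", "payload": "D"},
]

print(run(events, by_customer))            # [('D', 'A'), ('D', 'C')]
print(run(events, by_customer_and_event))  # [('D', 'A')]
print(run(events, lambda event: None))     # single global key: matches A, B and C
```

The last call shows the single-global-key case: every element lands in the
same bag, so everything matches but nothing can be processed in parallel. In
an actual pipeline the buffer would be a bag state declared on the stateful
DoFn, and a lookup that arrives before its records may need buffering too.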
> On Fri, Jun 26, 2020 at 7:29 AM Praveen K Viswanathan <
> harish.prav...@gmail.com> wrote:
>
>> Hi All - I have a DoFn which generates data (a KV pair) for each element
>> that it processes. It also has to read from that KV for other elements
>> based on a key, which means the KV has to retain all the data that is
>> added to it while processing every element. I was thinking about the
>> "slow-caching side input pattern", but that is more about caching outside
>> the DoFn and then using the cache inside it. It doesn't update the cache
>> from inside a DoFn. Please share if anyone has thoughts on how to approach
>> this case.
>>
>> Element 1 > Add a record to the KV > . Element 5 > Use the value from
>> the KV if there is a match on the key
>>
>> --
>> Thanks,
>> Praveen K Viswanathan
>>
>
--
Thanks,
Praveen K Viswanathan