Hi Reuven,
I would like to know if it is possible to guarantee that records are processed by 
the same thread/task based on a key, as presumably happens in a combine/stateful 
operation, without adding the delay of a window.
This could increase the efficiency of caching and reduce some race conditions when 
writing data.
I understand that workers are not part of the programming model, so I would like to 
know if it is possible to achieve this behaviour while keeping the windowing delay 
to a minimum. We don't need any combine or state; we just want all records 
with a given key to be sent to the same thread.
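
For context, the guarantee being asked about is the one Storm's fieldsGrouping provides: deterministic key-to-worker assignment. A minimal sketch of that idea in plain Python (the `route` helper, worker count, and bucket structure are hypothetical illustrations, not part of the Beam API):

```python
import hashlib

# Hypothetical sketch of key-based routing (Storm-style fieldsGrouping):
# every record with the same key is deterministically mapped to the same
# worker index, so a single worker sees all records for a given key.
def route(key: str, num_workers: int) -> int:
    """Deterministically map a key to a worker index."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_workers

records = [("user-1", 10), ("user-2", 20), ("user-1", 30)]
buckets = {}
for key, value in records:
    # All ("user-1", ...) records land in the same bucket.
    buckets.setdefault(route(key, num_workers=4), []).append((key, value))
```

Because the assignment is a pure function of the key, no coordination or windowing is involved; the open question in this thread is whether a Beam runner exposes an equivalent guarantee.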
 
Thanks


On 2019/05/24 03:20:13, Reuven Lax <re...@google.com> wrote: 
> Can you explain what you mean by worker? While every runner has workers of
> course, workers are not part of the programming model.
> 
> On Thu, May 23, 2019 at 8:13 PM pasquale.bon...@gmail.com <
> pasquale.bon...@gmail.com> wrote:
> 
> > Hi all,
> > I would like to know if Apache Beam has a functionality similar to
> > fieldsGrouping in Storm that allows to send records to a specific
> > task/worker based on a key.
> > I know that we can achieve that with a combine/groupByKey operation, but
> > that implies adding windowing to our pipeline, which we don't want.
> > I have also tried using a stateful transformation.
> > I think that in that case we should also use windowing, but I see that a
> > job with a stateful ParDo operation can be submitted on Google Dataflow
> > without windowing. I don't know if this is due to a lack of support for
> > stateful processing on Dataflow, or whether I can effectively achieve my
> > goal with this solution.
> >
> >
> > Thanks in advance for your help
> >
> >
> 