Re: PTransform.expand() guarantees

Lukasz Cwik Fri, 21 Jun 2019 09:15:34 -0700

On Fri, Jun 21, 2019 at 9:07 AM Alexey Romanenko <[email protected]>
wrote:


> Hello,
>
> I tried to find answers to the questions below in the documentation but
> could not. There are three related questions:
>
> Does Beam guarantee where (on the "driver" or on a "worker" of the backend
> system) *PTransform.expand()* of a provided transform will be called?
>
No. There are use cases where the driver runs in the "cloud", such as
template generation and cross-language pipeline expansion. Also, when I was
investigating loops within Beam, one possible solution was to call expand()
on the "worker" whenever a new loop iteration needed to be generated.


> Does Beam guarantee how many times it will be called?
>
It should happen once per transform instance, but why is that important?
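
To illustrate why the location and call count need not matter, here is a
minimal sketch. It is a toy model, not Beam's API: ToyGraph, CountWords and
build are hypothetical names, and it assumes that expand() does nothing
except describe sub-steps. Under that assumption, expanding on the "driver"
and expanding again somewhere else yield the same result.

```python
class ToyGraph:
    """Hypothetical stand-in for a pipeline graph; not a Beam class."""

    def __init__(self):
        self.nodes = []

    def add(self, name):
        # Record a step and return its name so later steps can refer to it.
        self.nodes.append(name)
        return name


class CountWords:
    """Toy composite transform whose expand() only describes sub-steps."""

    def expand(self, graph, input_name):
        # No I/O, no clock, no environment lookups: only graph construction.
        tokens = graph.add(f"Tokenize({input_name})")
        return graph.add(f"CountPerElement({tokens})")


def build(transform):
    """Expand the transform into a fresh graph and return its steps."""
    graph = ToyGraph()
    transform.expand(graph, "lines")
    return graph.nodes


# Two independent expansions, e.g. one on a "driver" and one on a "worker".
driver_result = build(CountWords())
worker_result = build(CountWords())
```

If expand() instead opened connections or read local files, the two results
could differ, and that is when the questions above start to matter.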


> Does it depend on runner implementation or anything else?
>
It should not, but historically it has in some places. Some transforms were
written with logic like "if I am a streaming pipeline" or "if I am running
on Dataflow, then do X". We have tried to prevent this and have cleaned up
the places where we noticed it, because it makes pipelines hard to port
across runners.
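
A rough sketch of the difference, again as a toy model rather than real Beam
code. ToyOptions, the step names, and both functions are made up for
illustration; only the runner names are real.

```python
class ToyOptions:
    """Hypothetical pipeline options; the field names are illustrative only."""

    def __init__(self, runner, streaming):
        self.runner = runner
        self.streaming = streaming


def expand_runner_specific(options):
    # Anti-pattern: the shape of the expansion depends on the chosen runner.
    if options.runner == "DataflowRunner":
        return ["ReadSpecialForDataflow", "Parse"]
    return ["ReadGeneric", "Parse"]


def expand_portable(options):
    # Same expansion everywhere; any runner-specific choice is left to the runner.
    return ["ReadGeneric", "Parse"]


dataflow = ToyOptions(runner="DataflowRunner", streaming=True)
direct = ToyOptions(runner="DirectRunner", streaming=False)
```

With the first function the pipeline definition changes when the runner
changes; with the second it stays the same for both option sets.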

We have always wanted the driver program (wherever it may live) to give a
whole pipeline definition to the runner, which can then "optimize" it by
performing any additional PTransform replacements.
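
As a sketch of that split of responsibilities (a toy model with a pipeline
reduced to a list of step names; runner_optimize and the override table are
hypothetical, not a Beam API):

```python
def runner_optimize(pipeline, overrides):
    """Return a new step list with the runner's replacements applied."""
    # Steps without an override are kept unchanged.
    return [overrides.get(step, step) for step in pipeline]


# The driver hands over a complete, runner-agnostic definition.
pipeline = ["ReadGeneric", "Parse", "Write"]

# The runner decides which steps to swap for its own implementations.
overrides = {"ReadGeneric": "ReadOptimizedForThisRunner"}

optimized = runner_optimize(pipeline, overrides)
```

The driver's definition is left untouched, and the replacement happens
entirely on the runner side.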
