On Mon, Nov 4, 2019 at 11:54 AM Chamikara Jayalath <chamik...@google.com> wrote:
>
> On Mon, Nov 4, 2019 at 11:01 AM Hai Lu <lhai...@apache.org> wrote:
>>
>> Hi,
>>
>> We're looking into leveraging the cross language pipeline feature in our
>> Beam pipelines on Samza runner. While the feature seems to work well, the
>> PTransform expansion as a standalone service isn't very convenient.
>> Particularly that the Python pipeline needs to specify the address of the
>> expansion service.
>>
>> I'm wondering why we couldn't embed the expansion service into runner
>> itself. I understand the cross language feature wants to be runner
>> independent, but does it make sense to at least provide the option to allow
>> runner to use the expansion service as a library and make it transparent to
>> the portable pipeline?
>
>
> Beam composite transforms are expanded before defining the portable job
> definition (and before submitting the jobs to the runner). So naturally this
> is something that has to be done in the Beam side. As an added benefit, as
> you identified, this allows us to keep this logic runner independent.
> I think there were discussions regarding automatically starting up a local
> expansion service if one is not specified. Will this address your concerns ?

Just to add to this: if you have a pipeline A -> B -> C, the expansion of B often needs to be evaluated before C can be applied (e.g. we're planning on exposing the SQL transforms cross-language, and many cross-language IOs can query and supply their own schemas for downstream type checking), so one cannot construct the "whole" pipeline, pass it to the runner, and let the runner do the expansion.
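
To make the ordering constraint concrete, here is a rough toy sketch in plain Python. It does not use the Beam API at all: `toy_expand`, `apply_c`, `build_pipeline`, and the URN `toy:sql:select_name` are all hypothetical names made up for illustration, and schemas are modeled as simple dicts. The only point is that C's construction-time type check depends on the output schema that B's expansion returns.

```python
from typing import Dict, Optional

# A schema is modeled as a mapping from field name to Python type (an assumption
# made for this sketch; it is not how Beam represents schemas).
Schema = Dict[str, type]


def toy_expand(urn: str, input_schema: Schema) -> Schema:
    """Hypothetical stand-in for an expansion request.

    Returns the output schema of the expanded transform B.
    """
    if urn == "toy:sql:select_name":
        # The expanded transform keeps only the 'name' field of its input.
        return {"name": input_schema["name"]}
    raise ValueError(f"unknown URN: {urn}")


def apply_c(upstream_schema: Optional[Schema]) -> Schema:
    """Transform C: needs a 'name' field, so it must see B's output schema."""
    if upstream_schema is None:
        raise TypeError("cannot type check C: B has not been expanded yet")
    if "name" not in upstream_schema:
        raise TypeError("C requires a 'name' field")
    return {"greeting": str}


def build_pipeline(expand_eagerly: bool) -> Schema:
    """Builds A -> B -> C and returns C's output schema."""
    a_schema: Schema = {"name": str, "age": int}  # output of A
    # With deferred expansion, B's output schema is unknown at construction time.
    b_schema = (
        toy_expand("toy:sql:select_name", a_schema) if expand_eagerly else None
    )
    return apply_c(b_schema)


if __name__ == "__main__":
    print(build_pipeline(expand_eagerly=True))
    try:
        build_pipeline(expand_eagerly=False)
    except TypeError as err:
        print("deferred expansion fails:", err)
```

With eager expansion, C can be applied and checked while the pipeline is being constructed. If expansion is deferred to the runner, the construction-time check on C has nothing to check against.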