Re: [DISCUSS] [BEAM-438] Rename one of PTransform.apply or PInput.apply

Aljoscha Krettek Wed, 07 Dec 2016 15:51:32 -0800

+1

I've seen this mistake myself in some PRs.


On Thu, 8 Dec 2016 at 06:10 Ben Chambers <bchamb...@google.com.invalid>
wrote:

> +1 -- This seems like the best option. It's a mechanical change, and the
> compiler will let users know it needs to be made. It will make the mistake
> much less common, and when it occurs it will be much clearer what is wrong.
>
> It would be great if we could make the mis-use a compiler problem or a
> pipeline construction time error without changing the names, but both of
> these options are not practical. We can't hide the expansion method, since
> it is what PTransform implementations need to override. We can't make this
> a construction time exception since it would require adding code to every
> PTransform implementation.
>
> On Wed, Dec 7, 2016 at 1:55 PM Thomas Groh <tg...@google.com.invalid>
> wrote:
>
> > +1; This is probably the best way to make sure users don't reverse the
> > polarity of the PCollection flow.
> >
> > This also brings PInput.expand(), POutput.expand(), and
> > PTransform.expand(PInput) into line - namely, for some composite thing,
> > "represent yourself as some collection of primitives" (potentially
> > recursively).
> >
> > On Wed, Dec 7, 2016 at 1:37 PM, Kenneth Knowles <k...@google.com.invalid>
> > wrote:
> >
> > > Hi all,
> > >
> > > I want to bring up another major backwards-incompatible change before
> it
> > is
> > > too late, to resolve [BEAM-438].
> > >
> > > Summary: Leave PInput.apply the same but rename PTransform.apply to
> > > PTransform.expand. I have opened [PR #1538] just for reference (it took
> > 30
> > > seconds using IDE automated refactor)
> > >
> > > This change affects *PTransform authors* but does *not* affect pipeline
> > > authors.
> > >
> > > This issue was filed a long time ago. It has been a problem many times
> > with
> > > actual users since before Beam started incubating. This is what goes
> > wrong
> > > (often):
> > >
> > >    PCollection<Foo> input = ...
> > >    PTransform<PCollection<Foo>, ...> transform = ...
> > >
> > >    transform.apply(input)
> > >
> > > This type checks and even looks perfectly normal. Do you see the error?
> > >
> > > ... what we need the user to write is:
> > >
> > >     input.apply(transform)
> > >
> > > What a confusing difference! After all, the first one type-checks and
> the
> > > first one is how you apply a Function or Predicate or
> > SerializableFunction,
> > > etc. But it is broken. With transform.apply(input) the transform is not
> > > registered with the pipeline at all.
> > >
> > > We obviously can't (and don't want to) change the most core way that
> > > pipeline authors use Beam, so PInput.apply (aka PCollection.apply) must
> > > remain the same. But we do need a way to make it impossible to mix
> these
> > > up.
> > >
> > > The simplest way I can think of is to choose a new name for the other
> > > method involved. Users probably won't write transform.expand(input)
> since
> > > they will never have seen it in any examples, etc. This will just make
> > > PTransform authors need to do a global rename, and the type system will
> > > direct them to all cases so there is no silent failure possible.
> > >
> > > What do you think?
> > >
> > > Kenn
> > >
> > > [BEAM-438] https://issues.apache.org/jira/browse/BEAM-438
> > > [PR #1538] https://github.com/apache/incubator-beam/pull/1538
> > >
> > > p.s. there is a really amusing and confusing call chain:
> > PCollection.apply
> > > -> Pipeline.applyTransform -> Pipeline.applyInternal ->
> > > PipelineRunner.apply -> PTransform.apply
> > >
> > > After this change and work to get the runner out of the loop, it
> becomes
> > > PCollection.apply -> Pipeline.applyTransform -> PTransform.expand
> > >
> >
>

Re: [DISCUSS] [BEAM-438] Rename one of PTransform.apply or PInput.apply

Reply via email to