+1

Regards
JB

On 12/07/2016 10:37 PM, Kenneth Knowles wrote:
Hi all,

I want to bring up another major backwards-incompatible change before it is
too late, to resolve [BEAM-438].

Summary: Leave PInput.apply the same but rename PTransform.apply to
PTransform.expand. I have opened [PR #1538] just for reference (it took 30
seconds using IDE automated refactor)

This change affects *PTransform authors* but does *not* affect pipeline
authors.

This issue was filed a long time ago. It has been a problem many times with
actual users since before Beam started incubating. This is what goes wrong
(often):

   PCollection<Foo> input = ...
   PTransform<PCollection<Foo>, ...> transform = ...

   transform.apply(input)

This type checks and even looks perfectly normal. Do you see the error?

... what we need the user to write is:

    input.apply(transform)

What a confusing difference! After all, the first one type-checks and the
first one is how you apply a Function or Predicate or SerializableFunction,
etc. But it is broken. With transform.apply(input) the transform is not
registered with the pipeline at all.

We obviously can't (and don't want to) change the most core way that
pipeline authors use Beam, so PInput.apply (aka PCollection.apply) must
remain the same. But we do need a way to make it impossible to mix these up.

The simplest way I can think of is to choose a new name for the other
method involved. Users probably won't write transform.expand(input) since
they will never have seen it in any examples, etc. This will just make
PTransform authors need to do a global rename, and the type system will
direct them to all cases so there is no silent failure possible.

What do you think?

Kenn

[BEAM-438] https://issues.apache.org/jira/browse/BEAM-438
[PR #1538] https://github.com/apache/incubator-beam/pull/1538

p.s. there is a really amusing and confusing call chain: PCollection.apply
-> Pipeline.applyTransform -> Pipeline.applyInternal ->
PipelineRunner.apply -> PTransform.apply

After this change and work to get the runner out of the loop, it becomes
PCollection.apply -> Pipeline.applyTransform -> PTransform.expand


--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Reply via email to