+1 to pushing all remaining "major" (read likely to affect everyone) breaks through in a single release.
On Wed, Dec 7, 2016 at 3:56 PM Dan Halperin <dhalp...@google.com.invalid> wrote: > +user@, because this is a user-impacting change and they might not all be > paying attention to the dev@ list. > > +1 > > I'm mildly reluctant because this will break all users that have written > composite transforms -- and I'm the jerk that filed the issue (a few times > now, on different iterations of the SDKs). But, like Ben said, I can't > think of a different way to do this that doesn't break users. > > Hopefully, with breaking changes pushed to DoFn and to PTransform, the > worst user churn would be over. I can't think of anything quite so > invasive. > > IMO: if there is, we should try to push all the remaining "major" breaks > through in the same release. > > Dan > > On Thu, Dec 8, 2016 at 7:48 AM, Aljoscha Krettek <aljos...@apache.org> > wrote: > > > +1 > > > > I've seen this mistake myself in some PRs. > > > > On Thu, 8 Dec 2016 at 06:10 Ben Chambers <bchamb...@google.com.invalid> > > wrote: > > > > > +1 -- This seems like the best option. It's a mechanical change, and > the > > > compiler will let users know it needs to be made. It will make the > > mistake > > > much less common, and when it occurs it will be much clearer what is > > wrong. > > > > > > It would be great if we could make the mis-use a compiler problem or a > > > pipeline construction time error without changing the names, but both > of > > > these options are not practical. We can't hide the expansion method, > > since > > > it is what PTransform implementations need to override. We can't make > > this > > > a construction time exception since it would require adding code to > every > > > PTransform implementation. > > > > > > On Wed, Dec 7, 2016 at 1:55 PM Thomas Groh <tg...@google.com.invalid> > > > wrote: > > > > > > > +1; This is probably the best way to make sure users don't reverse > the > > > > polarity of the PCollection flow. > > > > > > > > This also brings PInput.expand(), POutput.expand(), and > > > > PTransform.expand(PInput) into line - namely, for some composite > thing, > > > > "represent yourself as some collection of primitives" (potentially > > > > recursively). > > > > > > > > On Wed, Dec 7, 2016 at 1:37 PM, Kenneth Knowles > <k...@google.com.invalid > > > > > > > wrote: > > > > > > > > > Hi all, > > > > > > > > > > I want to bring up another major backwards-incompatible change > before > > > it > > > > is > > > > > too late, to resolve [BEAM-438]. > > > > > > > > > > Summary: Leave PInput.apply the same but rename PTransform.apply to > > > > > PTransform.expand. I have opened [PR #1538] just for reference (it > > took > > > > 30 > > > > > seconds using IDE automated refactor) > > > > > > > > > > This change affects *PTransform authors* but does *not* affect > > pipeline > > > > > authors. > > > > > > > > > > This issue was filed a long time ago. It has been a problem many > > times > > > > with > > > > > actual users since before Beam started incubating. This is what > goes > > > > wrong > > > > > (often): > > > > > > > > > > PCollection<Foo> input = ... > > > > > PTransform<PCollection<Foo>, ...> transform = ... > > > > > > > > > > transform.apply(input) > > > > > > > > > > This type checks and even looks perfectly normal. Do you see the > > error? > > > > > > > > > > ... what we need the user to write is: > > > > > > > > > > input.apply(transform) > > > > > > > > > > What a confusing difference! After all, the first one type-checks > and > > > the > > > > > first one is how you apply a Function or Predicate or > > > > SerializableFunction, > > > > > etc. But it is broken. With transform.apply(input) the transform is > > not > > > > > registered with the pipeline at all. > > > > > > > > > > We obviously can't (and don't want to) change the most core way > that > > > > > pipeline authors use Beam, so PInput.apply (aka PCollection.apply) > > must > > > > > remain the same. But we do need a way to make it impossible to mix > > > these > > > > > up. > > > > > > > > > > The simplest way I can think of is to choose a new name for the > other > > > > > method involved. Users probably won't write transform.expand(input) > > > since > > > > > they will never have seen it in any examples, etc. This will just > > make > > > > > PTransform authors need to do a global rename, and the type system > > will > > > > > direct them to all cases so there is no silent failure possible. > > > > > > > > > > What do you think? > > > > > > > > > > Kenn > > > > > > > > > > [BEAM-438] https://issues.apache.org/jira/browse/BEAM-438 > > > > > [PR #1538] https://github.com/apache/incubator-beam/pull/1538 > > > > > > > > > > p.s. there is a really amusing and confusing call chain: > > > > PCollection.apply > > > > > -> Pipeline.applyTransform -> Pipeline.applyInternal -> > > > > > PipelineRunner.apply -> PTransform.apply > > > > > > > > > > After this change and work to get the runner out of the loop, it > > > becomes > > > > > PCollection.apply -> Pipeline.applyTransform -> PTransform.expand > > > > > > > > > > > > > > >