[ https://issues.apache.org/jira/browse/BEAM-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16107634#comment-16107634 ]
Thomas Groh commented on BEAM-2699: ----------------------------------- The latter claim is incorrect: two applications of the same transform to the same input will produce different output (unless the transform's output is a subset of it's input; in which case the two applications are indistinguishable (same input, same transform, same output), and reusing the same references cannot be observed. Additionally, applications within a pipeline are forced to have unique names, so no two transforms will ever incorrectly collide. The former claim is more worrying; failing to find the same application is a pretty big problem. Potentially equals/hashCode should be (Name + Pipeline)? As an aside, PTransforms shouldn't throw an exception if you call equals or hashCode on them, just return a more-discriminating response. > AppliedPTransform is used as a key in hashmaps but PTransform is not > hashable/equality-comparable > ------------------------------------------------------------------------------------------------- > > Key: BEAM-2699 > URL: https://issues.apache.org/jira/browse/BEAM-2699 > Project: Beam > Issue Type: Bug > Components: runner-core > Reporter: Eugene Kirpichov > Assignee: Thomas Groh > > There's plenty of occurrences in runners-core of Map or BiMap where the key > is an AppliedPTransform. > However, PTransform does not advertise that it is required to implement > equals/hashCode, and some transforms can't do it properly anyway - for > example, transforms that capture a ValueProvider which is also not > hashable/eq-comparable. I'm surprised that things aren't already very broken > because of this. > Fundamentally, I don't see why we should ever compare two PTransform's for > equality. > I looked at the code and wondered "can AppliedPTransform simply be > identity-hashable", but right now the answer is no because we can create an > AppliedPTransform for the same transform applied to the same thing multiple > times. > Fixing that appears to be not very easy, but definitely possible. Ideally > TransformHierarchy.Node would just know its AppliedPTransform, however a Node > can be constructed when there's yet no Pipeline. Suppose there's gotta be > some way to propagate a Pipeline into Node.finishSpecifying() (which should > be called exactly once on the Node, and this should be enforced), and have > finishSpecifying() return the AppliedPTransform, and have the caller use that > instead of potentially repeatedly calling .toAppliedPTransform() on the same > Node. > [~kenn] is on vacation but perhaps [~tgroh] can help with this meanwhile? > CC: [~reuvenlax] -- This message was sent by Atlassian JIRA (v6.4.14#64029)