[ 
https://issues.apache.org/jira/browse/BEAM-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16107634#comment-16107634
 ] 

Thomas Groh commented on BEAM-2699:
-----------------------------------

The latter claim is incorrect: two applications of the same transform to the 
same input will produce different output (unless the transform's output is a 
subset of it's input; in which case the two applications are indistinguishable 
(same input, same transform, same output), and reusing the same references 
cannot be observed. Additionally, applications within a pipeline are forced to 
have unique names, so no two transforms will ever incorrectly collide.

The former claim is more worrying; failing to find the same application is a 
pretty big problem. Potentially equals/hashCode should be (Name + Pipeline)?

As an aside, PTransforms shouldn't throw an exception if you call equals or 
hashCode on them, just return a more-discriminating response.

> AppliedPTransform is used as a key in hashmaps but PTransform is not 
> hashable/equality-comparable
> -------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-2699
>                 URL: https://issues.apache.org/jira/browse/BEAM-2699
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-core
>            Reporter: Eugene Kirpichov
>            Assignee: Thomas Groh
>
> There's plenty of occurrences in runners-core of Map or BiMap where the key 
> is an AppliedPTransform.
> However, PTransform does not advertise that it is required to implement 
> equals/hashCode, and some transforms can't do it properly anyway - for 
> example, transforms that capture a ValueProvider which is also not 
> hashable/eq-comparable. I'm surprised that things aren't already very broken 
> because of this.
> Fundamentally, I don't see why we should ever compare two PTransform's for 
> equality.
> I looked at the code and wondered "can AppliedPTransform simply be 
> identity-hashable", but right now the answer is no because we can create an 
> AppliedPTransform for the same transform applied to the same thing multiple 
> times.
> Fixing that appears to be not very easy, but definitely possible. Ideally 
> TransformHierarchy.Node would just know its AppliedPTransform, however a Node 
> can be constructed when there's yet no Pipeline. Suppose there's gotta be 
> some way to propagate a Pipeline into Node.finishSpecifying() (which should 
> be called exactly once on the Node, and this should be enforced), and have 
> finishSpecifying() return the AppliedPTransform, and have the caller use that 
> instead of potentially repeatedly calling .toAppliedPTransform() on the same 
> Node.
> [~kenn] is on vacation but perhaps [~tgroh] can help with this meanwhile?
> CC: [~reuvenlax]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to