Eugene Kirpichov created BEAM-2699:
--------------------------------------

             Summary: AppliedPTransform is used as a key in hashmaps but 
PTransform is not hashable/equality-comparable
                 Key: BEAM-2699
                 URL: https://issues.apache.org/jira/browse/BEAM-2699
             Project: Beam
          Issue Type: Bug
          Components: runner-core
            Reporter: Eugene Kirpichov
            Assignee: Thomas Groh


There's plenty of occurrences in runners-core of Map or BiMap where the key is 
an AppliedPTransform.

However, PTransform does not advertise that it is required to implement 
equals/hashCode, and some transforms can't do it properly anyway - for example, 
transforms that capture a ValueProvider which is also not 
hashable/eq-comparable. I'm surprised that things aren't already very broken 
because of this.

Fundamentally, I don't see why we should ever compare two PTransform's for 
equality.

I looked at the code and wondered "can AppliedPTransform simply be 
identity-hashable", but right now the answer is no because we can create an 
AppliedPTransform for the same transform applied to the same thing multiple 
times.

Fixing that appears to be not very easy, but definitely possible. Ideally 
TransformHierarchy.Node would just know its AppliedPTransform, however a Node 
can be constructed when there's yet no Pipeline. Suppose there's gotta be some 
way to propagate a Pipeline into Node.finishSpecifying() (which should be 
called exactly once on the Node, and this should be enforced), and have 
finishSpecifying() return the AppliedPTransform, and have the caller use that 
instead of potentially repeatedly calling .toAppliedPTransform() on the same 
Node.

[~kenn] is on vacation but perhaps [~tgroh] can help with this meanwhile?

CC: [~reuvenlax]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to