Eugene Kirpichov created BEAM-2536:
--------------------------------------

             Summary: Simplify specifying coders on PCollectionTuple
                 Key: BEAM-2536
                 URL: https://issues.apache.org/jira/browse/BEAM-2536
             Project: Beam
          Issue Type: Bug
          Components: sdk-java-core
            Reporter: Eugene Kirpichov


Currently, when using a multi-output ParDo, the user usually has to do one of 
the following:

1) Use an anonymous class: new TupleTag<Foo>() {} - in order to reify the Foo 
type and make coder inference work. A frequent problem in this case is that the 
anonymous class captures a large enclosing class instance, and either doesn't 
serialize at all, or at least serializes to something bulky.
2) Explicitly call tuple.get(myTag).setCoder(...)

Both of these are suboptimal.
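
To make the problem with option 1 concrete, here is a minimal plain-Java 
sketch. Tag and PipelineBuilder are hypothetical stand-ins written for this 
illustration, not Beam classes. The sketch only shows the general Java 
mechanism: an anonymous subclass records its type argument, but it also holds 
a reference to the enclosing instance.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

public class AnonymousTagDemo {
  // Hypothetical stand-in for a tag type; this is NOT Beam's TupleTag.
  static class Tag<T> implements Serializable {
    // Returns the type argument recorded by an anonymous subclass,
    // or null when the tag was created directly with new Tag<>().
    Type elementType() {
      Type superType = getClass().getGenericSuperclass();
      if (superType instanceof ParameterizedType) {
        return ((ParameterizedType) superType).getActualTypeArguments()[0];
      }
      return null;
    }
  }

  // Hypothetical enclosing class that is not Serializable, standing in for
  // a large class that builds a pipeline.
  static class PipelineBuilder {
    Tag<String> anonymousTag() {
      // Anonymous subclass created in an instance method: it keeps a
      // reference to the enclosing PipelineBuilder instance.
      return new Tag<String>() {};
    }
  }

  // Tries Java serialization and reports whether it succeeded.
  // NotSerializableException is a subclass of IOException.
  static boolean serializes(Object o) {
    try (ObjectOutputStream out =
        new ObjectOutputStream(new ByteArrayOutputStream())) {
      out.writeObject(o);
      return true;
    } catch (IOException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    Tag<String> plain = new Tag<>();
    Tag<String> anon = new PipelineBuilder().anonymousTag();
    System.out.println(plain.elementType());  // no reified type available
    System.out.println(anon.elementType());   // type argument is recoverable
    System.out.println(serializes(plain));
    System.out.println(serializes(anon));     // fails on the captured instance
  }
}
```

If the enclosing class were Serializable, the anonymous tag would serialize, 
but it would drag the whole enclosing instance along with it, which is the 
"bulky" case described above.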

Could we have e.g. a constructor for TupleTag that explicitly takes a 
TypeDescriptor? Or even a Coder? Or a family of factory methods for 
TupleTagList that take these? E.g.:
in.apply(ParDo.of(...).withOutputTags(mainTag, TupleTagList.of(side1, 
FooCoder.of()).and(side2, BarCoder.of())));

I would suggest both: the TupleTag constructor should optionally take a 
TypeDescriptor, and TupleTagList.of() and .and() should optionally take a Coder.
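
As a rough illustration of the second half of this suggestion, the following 
plain-Java sketch shows what a tag list that pairs each tag with a coder could 
look like. Every name in it (Tag, Coder, StringCoder, IntCoder, TagList, 
coderFor) is a hypothetical stand-in invented for the sketch. None of this is 
existing Beam API, and the real design might differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TagListSketch {
  // Hypothetical stand-ins; none of these are Beam classes.
  static final class Tag<T> {
    final String id;

    Tag(String id) {
      this.id = id;
    }
  }

  interface Coder<T> {}

  static final class StringCoder implements Coder<String> {
    static StringCoder of() {
      return new StringCoder();
    }
  }

  static final class IntCoder implements Coder<Integer> {
    static IntCoder of() {
      return new IntCoder();
    }
  }

  // Immutable list of tags, each paired with the coder for its output.
  static final class TagList {
    private final Map<Tag<?>, Coder<?>> coders;

    private TagList(Map<Tag<?>, Coder<?>> coders) {
      this.coders = coders;
    }

    // The shared type parameter T forces the coder to match the tag's type.
    static <T> TagList of(Tag<T> tag, Coder<T> coder) {
      return new TagList(new LinkedHashMap<>()).and(tag, coder);
    }

    // Returns a new list with the extra pair; the receiver is not modified.
    <T> TagList and(Tag<T> tag, Coder<T> coder) {
      Map<Tag<?>, Coder<?>> copy = new LinkedHashMap<>(coders);
      copy.put(tag, coder);
      return new TagList(copy);
    }

    // The cast is safe because of() and and() only store matching pairs.
    @SuppressWarnings("unchecked")
    <T> Coder<T> coderFor(Tag<T> tag) {
      return (Coder<T>) coders.get(tag);
    }

    int size() {
      return coders.size();
    }
  }

  public static void main(String[] args) {
    Tag<String> side1 = new Tag<>("side1");
    Tag<Integer> side2 = new Tag<>("side2");
    TagList tags = TagList.of(side1, StringCoder.of()).and(side2, IntCoder.of());
    System.out.println(tags.size());
    System.out.println(tags.coderFor(side1) instanceof StringCoder);
  }
}
```

With this shape, no anonymous subclass is needed to carry type information, 
and a mismatched pair such as TagList.of(side1, IntCoder.of()) is rejected at 
compile time.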



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
