It sounds like, in your specific case, you're saying that the same encoding can be viewed by the Java type system in two different ways. For instance, if you have a Person object that is convertible to JSON using Jackson, then that JSON encoding can be viewed either as a Person or as a Map<String, Object> of the JSON fields. In that case, there needs to be some kind of "view change" transform to change the type of the PCollection.
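As a rough sketch of what an explicit "view change" could look like (plain Java with no Beam dependency; the Person fields and the ViewChange helper are made up for illustration and are not part of any Beam or Jackson API), the conversion is just a typed function from one view to the other, with no JSON text in between:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical element type standing in for the Person discussed in this thread. */
final class Person {
    final String name;
    final long id;

    Person(String name, long id) {
        this.name = name;
        this.id = id;
    }
}

/** Hypothetical helper holding the body of an explicit "view change" step. */
final class ViewChange {
    private ViewChange() {}

    /** Re-expresses a Person as a map of its fields, without a JSON round trip. */
    static Map<String, Object> toFieldMap(Person p) {
        // LinkedHashMap keeps the field order stable, which is convenient for debugging.
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("name", p.name);
        fields.put("id", p.id);
        return fields;
    }
}
```

In a pipeline, a function like this would presumably be applied by an element-wise step (for example a MapElements or a ParDo), so the compiler can check that the input really is a Person and the output really is a Map<String, Object>. Whether the runner then serializes the result at all is still decided by the runner, not by this step.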

I'm not sure an untyped API would be better here. Requiring the "view change" to be explicit means we can ensure the types are compatible, and it also makes it very clear when this kind of change is desired.

Some background on Coders that may be relevant: it might help to think about Coders as the specification of how elements in a PCollection are encoded if/when the runner needs to serialize them. If you are trying to read JSON or XML records from a source, that parsing is part of the source transform (reading JSON or XML records) and not part of the collection produced by the transform.

Consider further: even if you read XML records from a source, you likely *wouldn't* want to use an XML Coder for those records within the pipeline, because every time the pipeline needed to serialize them you would produce much larger amounts of data (XML is not an efficient/compact encoding). Instead, you likely want to read XML records from the source and then encode them within the pipeline using something more efficient, and then convert them to something more readable but possibly less efficient before they exit the pipeline at a sink.

On Tue, Jan 30, 2018 at 12:23 PM Kenneth Knowles <k...@google.com> wrote:

> Ah, this is a point that Robert brings up quite often: one reason we put
> coders on PCollections instead of doing that work in PTransforms is that
> the runner (plus SDK harness) can automatically only serialize when
> necessary. So the default in Beam is that the thing you want to happen is
> already done. There are some corner cases when you get to the portability
> framework but I am pretty sure it already works this way. If you show what
> is a PTransform and PCollection in your example it might show where we can
> fix things.
>
> On Tue, Jan 30, 2018 at 12:17 PM, Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>
>> Indeed,
>>
>> I'll take a stupid example to make it shorter.
>> I have a source emitting Person objects ({name:...,id:...}) serialized
>> with Jackson as JSON.
>> Then my pipeline processes them with a DoFn taking a Map<String, String>.
>> Here I set the coder to read JSON as a map.
>>
>> However a Map<String, String> is not a Person, so my pipeline needs an
>> intermediate step to convert one into the other and has, by design, a
>> useless serialization round trip.
>>
>> If you check the chain you have: Person -> JSON -> Map<String, String> ->
>> JSON -> Map<String, String>, whereas Person -> JSON -> Map<String,
>> String> is fully enough because the JSON is equivalent in this example.
>>
>> In other words, if a coder's output is readable as another coder's input,
>> the Java strong typing doesn't know about it and can enforce some fake
>> steps.
>>
>> Romain Manni-Bucau
>> @rmannibucau <https://twitter.com/rmannibucau> | Blog
>> <https://rmannibucau.metawerx.net/> | Old Blog
>> <http://rmannibucau.wordpress.com> | Github
>> <https://github.com/rmannibucau> | LinkedIn
>> <https://www.linkedin.com/in/rmannibucau>
>>
>> 2018-01-30 21:07 GMT+01:00 Kenneth Knowles <k...@google.com>:
>>
>>> I'm not sure I understand your question. Can you explain more?
>>>
>>> On Tue, Jan 30, 2018 at 11:50 AM, Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>>>
>>>> Hi guys,
>>>>
>>>> just encountered an issue with the pipeline API and wondered if you
>>>> thought about it.
>>>>
>>>> It can happen that Coders are compatible with each other. A simple example:
>>>> a text coder like JSON or XML will be able to read text. However, with the
>>>> pipeline API you can't support this directly, and you
>>>> force the user to go through an intermediate state to be typed.
>>>>
>>>> Is there already a way to avoid these useless round trips?
>>>>
>>>> Said otherwise: how to handle coders transitivity?