[ 
https://issues.apache.org/jira/browse/BEAM-73?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Kirpichov closed BEAM-73.
--------------------------------
    Resolution: Duplicate

The only remaining instance of this is in KafkaIO, handled by BEAM-1573.

> IO design pattern: Decouple Parsers and Coders
> ----------------------------------------------
>
>                 Key: BEAM-73
>                 URL: https://issues.apache.org/jira/browse/BEAM-73
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-java-core
>            Reporter: Daniel Halperin
>            Priority: Minor
>              Labels: backward-incompatible
>             Fix For: First stable release
>
>
> Many Sources can be thought of as providing a byte[] payload -- e.g. TextIO 
> bytes between newlines, or PubSubIO messages. Therefore, we originally 
> suggested a Coder as the thing to use to decode these byte[] into T (what 
> I'll call Parsing).
> Consider the case of a text file of integers.
> 123\n
> 456\n
> ...
> We want a PCollection<Integer> out, so we can use TextualIntegerCoder with 
> TextIO.Read. However, that Coder will get propagated as the default coder for 
> that PCollection (and may be used in downstream DoFns). This seems bad as, 
> once the data is parsed, we probably want to use VarIntCoder or another Coder 
> that is more CPU- and Space-efficient.
> Another design pattern is
>     TextIO.Read() -> MapElements<String, Integer>(lambda s: Integer.parseInt(s))
> This has better behavior, but now we go from byte[] to String to Integer 
> rather than directly from byte[] to Integer.
> The solution seems to be to explicitly add Parser and Coder abstractions.
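
The split proposed in the description above can be sketched roughly as follows. This is only an illustration, not Beam's API: the Parser and Coder interfaces, TEXTUAL_INT_PARSER, and VAR_INT_CODER are all hypothetical names, and the variable-length encoding is a simplified stand-in for what a coder such as VarIntCoder might do. The parser goes directly from byte[] to Integer with no intermediate String, and the coder used downstream is chosen independently of how the source was parsed.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: none of these types are Beam APIs.
class ParserCoderSketch {

    /** One-way: turns a raw source payload (e.g. the bytes between newlines) into a T. */
    interface Parser<T> {
        T parse(byte[] payload);
    }

    /** Two-way: serializes a T between pipeline steps. */
    interface Coder<T> {
        byte[] encode(T value);
        T decode(byte[] bytes);
    }

    /** Parses ASCII decimal digits straight from bytes, without building a String. */
    static final Parser<Integer> TEXTUAL_INT_PARSER = payload -> {
        if (payload.length == 0) throw new NumberFormatException("empty payload");
        boolean negative = payload[0] == '-';
        int i = negative ? 1 : 0;
        if (i == payload.length) throw new NumberFormatException("no digits");
        long result = 0;
        for (; i < payload.length; i++) {
            int digit = payload[i] - '0';
            if (digit < 0 || digit > 9) {
                throw new NumberFormatException("not a digit at offset " + i);
            }
            result = result * 10 + digit;
            // Allow one past MAX_VALUE so that Integer.MIN_VALUE can still be parsed.
            if (result > Integer.MAX_VALUE + 1L) throw new NumberFormatException("overflow");
        }
        long value = negative ? -result : result;
        if (value > Integer.MAX_VALUE) throw new NumberFormatException("overflow");
        return (int) value;
    };

    /** Compact binary form: 7 bits per byte, high bit set on every byte except the last. */
    static final Coder<Integer> VAR_INT_CODER = new Coder<Integer>() {
        @Override
        public byte[] encode(Integer value) {
            ByteArrayOutputStream out = new ByteArrayOutputStream(5);
            int v = value;
            while ((v & ~0x7F) != 0) {
                out.write((v & 0x7F) | 0x80);
                v >>>= 7;
            }
            out.write(v);
            return out.toByteArray();
        }

        @Override
        public Integer decode(byte[] bytes) {
            // Sketch only: assumes a well-formed encoding of at most 5 bytes.
            int result = 0;
            int shift = 0;
            for (byte b : bytes) {
                result |= (b & 0x7F) << shift;
                shift += 7;
            }
            return result;
        }
    };

    public static void main(String[] args) {
        byte[] line = "456".getBytes(StandardCharsets.US_ASCII); // payload from the source
        int parsed = TEXTUAL_INT_PARSER.parse(line);             // parsing happens once, at the source
        byte[] encoded = VAR_INT_CODER.encode(parsed);           // downstream steps use the compact coder
        System.out.println(parsed + " -> " + encoded.length + " bytes -> "
                + VAR_INT_CODER.decode(encoded));
    }
}
```

In this sketch the textual form is never seen again after the source: the three-byte payload "456" is re-encoded in two bytes, and a different downstream coder could be swapped in without touching the parser.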



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
