On Fri, Apr 29, 2016 at 2:11 AM, Maximilian Michels <m...@apache.org> wrote:
> Further, the KafkaIO enforces a <key, value> data model which AFAIK is
> not enforced by the Beam model. I don't know the details for this
> design decision but I would like this to be communicated before it is
> merged into the master.

The <key, value> structure comes from Kafka; it is not related to Beam or
any runner. I am not sure I understood the concern here correctly. Every
Kafka record
<https://kafka.apache.org/090/javadoc/org/apache/kafka/clients/consumer/ConsumerRecord.html#ConsumerRecord(java.lang.String,%20int,%20long,%20K,%20V)>
has a key, a value, and some metadata like offset and partition, and
KafkaIO directly reflects that. Since most users don't need the Kafka
metadata, we added '.withoutMetadata()
<https://github.com/apache/incubator-beam/blob/master/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java#L130>',
which returns just <key, value>.

One can argue that many Kafka consumers don't need the key either. We
debated adding another helper method '.dropKey()' but left it out, since
users can easily drop keys in Beam:

    pipeline.apply(KafkaIO.read()....withoutMetadata())
            .apply(Values.<String>create())

> Otherwise, it will be harder to achieve
> portability among the Runners. In Flink and Spark we are already
> experienced with all kinds of connectors and users' needs. It would be
> nice to feed that back in the course of adding new connectors to the
> Beam API.
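
For completeness, here is a minimal sketch of such a pipeline in the Beam
Java SDK, reading from Kafka, dropping the metadata with withoutMetadata(),
and then dropping the keys with Values.create(). The broker address and
topic name are placeholders, and the exact configuration setters shown
(withBootstrapServers, withTopic, withKeyDeserializer, withValueDeserializer)
are my assumption of the KafkaIO API; they may differ slightly in the
incubator-era snapshot linked above:

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Values;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.kafka.common.serialization.StringDeserializer;

public class KafkaValuesOnly {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Read <key, value> records; withoutMetadata() drops offset/partition/topic metadata.
    // Broker address and topic name below are hypothetical placeholders.
    PCollection<KV<String, String>> keyedRecords =
        pipeline.apply(
            KafkaIO.<String, String>read()
                .withBootstrapServers("broker:9092")
                .withTopic("my_topic")
                .withKeyDeserializer(StringDeserializer.class)
                .withValueDeserializer(StringDeserializer.class)
                .withoutMetadata());

    // Keep only the record values, discarding the keys, as suggested in the thread.
    PCollection<String> values = keyedRecords.apply(Values.<String>create());

    pipeline.run();
  }
}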