Thanks so much JB. I Googled KafkaIO & got no references. Which Beam release supports KafkaIO? Please let me know when the examples are available... can't wait! Have a wonderful day.
From: Jean-Baptiste Onofré <j...@nanthrax.net>
To: dev@beam.incubator.apache.org
Sent: Thursday, April 28, 2016 10:40 AM
Subject: Re: [DISCUSS] Beam IO & runners native IO

Hi Amir,

Now we have a KafkaIO in Beam (both source and sink). I would start with this one. Actually, I'm preparing new pipeline examples showing the usage of that.

Regards
JB

On 04/28/2016 07:36 PM, amir bahmanyari wrote:
> Hi JB,
> Hope all is great. I am very new to Beam and trying to dive into it. I am having a lot of trouble reading from a Kafka topic using the google.cloud.dataflow APIs and the FlinkPipelineRunner. Could you point me to some docs and/or forums where I can find a solution for my problem, please? I really appreciate it.
>
> In case you are curious, given the following line:
>
>   p.apply(Read.named("ReadFromKafka").from(UnboundedFlinkSource.of(kafkaConsumer)))
>
> which connects to my Kafka consumer (and I confirm it on the server side), it fails and reports:
>
>   The transform ReadFromKafka [Read(UnboundedFlinkSource)] is currently not supported.
>
> I am not sure where/how I need to specify the "ReadFromKafka" part. I am sending stream data from my laptop to Kafka on a Linux box. The default Kafka consumer reports that it is received.
> I really appreciate your help. I know this is not a conventional way to ask such questions, but I have been spending a lot of time trying to figure it out. Have a wonderful day JB.
> Amir-
>
> From: Jean-Baptiste Onofré <j...@nanthrax.net>
> To: dev@beam.incubator.apache.org
> Sent: Thursday, April 28, 2016 5:41 AM
> Subject: [DISCUSS] Beam IO & runners native IO
>
> Hi all,
>
> Regarding the recent threads on the mailing list, I would like to start a formal discussion around the IO.
> As we can expect the first contributions in this area (I already have some work in progress around this ;)), I think it's a fair discussion to have.
>
> Now, we have two kinds of IO: the ones "generic" to Beam, and the ones "local" to the runners.
>
> For example, let's take Kafka: we have the KafkaIO (in IO), and, for instance, we have the spark-streaming Kafka connector (in the Spark runner).
>
> Right now, we have two approaches for the user:
> 1. In the pipeline, we use KafkaIO from Beam: it's the preferred approach for sure. However, the user may want to use the runner-specific IO for two reasons:
>    * Beam doesn't provide the IO yet (for instance, a Spark Cassandra connector is available whereas we don't yet have any CassandraIO (I'm working on it anyway ;))
>    * The runner native IO is optimized or contains more features than the Beam native IO
> 2. So, for the previous reasons, the user could want to use the native runner IO. The drawback of this approach is that the pipeline will be tied to a specific runner, which is completely against the Beam design.
>
> I wonder if it wouldn't make sense to add a flag on the IO API (and a related one on the Runner API) like .useNative().
>
> For instance, the user would be able to do:
>
>   pipeline.apply(KafkaIO.read().withBootstrapServers("...").withTopics("...").useNative(true));
>
> Then, if the runner has a "native" IO, it will use it; else, with useNative(false) (the default), it won't use any runner native IO.
>
> The point there is the configuration: assuming the Beam IO and the runner IO can differ, it means that the "Beam IO" would have to populate all runner-specific IO configuration.
>
> Of course, it's always possible to use a PTransform to wrap the runner native IO, but we are back to the same concern: the pipeline will be coupled to a specific runner.
>
> The purpose of the useNative() flag is to "automatically" inform the runner to use a specific IO if it has one: the pipeline stays decoupled from the runners.
>
> Thoughts?
>
> Thanks
> Regards
> JB
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
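[Editor's note: the proposed dispatch behind useNative() can be sketched in plain Java without the Beam SDK. Everything below (KafkaRead, Runner, resolve) is a hypothetical illustration of the proposal's mechanics, not actual Beam or runner API; only the builder shape and the useNative(false) default come from the thread.]

```java
// Self-contained sketch, assuming hypothetical names: an IO transform
// carries a useNative flag, and each runner decides at submission time
// whether to substitute its own native connector.
public class UseNativeSketch {

    // Stand-in for a Beam IO builder carrying the proposed flag.
    static class KafkaRead {
        private String bootstrapServers;
        private String topic;
        private boolean useNative = false; // default, per the proposal

        KafkaRead withBootstrapServers(String s) { this.bootstrapServers = s; return this; }
        KafkaRead withTopic(String t) { this.topic = t; return this; }
        KafkaRead useNative(boolean flag) { this.useNative = flag; return this; }
        boolean isNativeRequested() { return useNative; }
    }

    // Stand-in for a runner that may or may not ship a native connector.
    static class Runner {
        private final boolean hasNativeKafka;
        Runner(boolean hasNativeKafka) { this.hasNativeKafka = hasNativeKafka; }

        // Native IO is chosen only when the user asked for it AND the
        // runner provides one; otherwise fall back to the Beam IO, so the
        // same pipeline stays runnable on every runner.
        String resolve(KafkaRead read) {
            if (read.isNativeRequested() && hasNativeKafka) {
                return "runner-native-kafka";
            }
            return "beam-kafkaio";
        }
    }

    public static void main(String[] args) {
        KafkaRead read = new KafkaRead()
                .withBootstrapServers("localhost:9092")
                .withTopic("events")
                .useNative(true);

        // A runner with a native connector honors the flag...
        System.out.println(new Runner(true).resolve(read));   // runner-native-kafka
        // ...a runner without one silently falls back...
        System.out.println(new Runner(false).resolve(read));  // beam-kafkaio
        // ...and the default (flag unset) never picks native IO.
        System.out.println(new Runner(true).resolve(
                new KafkaRead().withTopic("events")));        // beam-kafkaio
    }
}
```

The point the sketch tries to make concrete: the runner, not the pipeline, resolves the flag, so the pipeline definition stays decoupled from any specific runner.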