Sorry JB.Where can I download a " SNAPSHOT/nightly build" jar?My apologies...I
searched couldnt find it.
From: Jean-Baptiste Onofré <[email protected]>
To: [email protected]
Sent: Thursday, April 28, 2016 11:06 AM
Subject: Re: [DISCUSS] Beam IO &runners native IO
The KafkaIO is there:
https://github.com/apache/incubator-beam/tree/master/sdks/java/io/kafka
You have to use SNAPSHOT/nightly build or build yourself.
Regards
JB
On 04/28/2016 07:49 PM, amir bahmanyari wrote:
> Thanks so much JB.I Googled KafkaIO & got no references.Which Beam release
> supports KafkaIO? Please let me know when the examples are available...cannt
> wait !Have a wonderful day.
>
> From: Jean-Baptiste Onofré <[email protected]>
> To: [email protected]
> Sent: Thursday, April 28, 2016 10:40 AM
> Subject: Re: [DISCUSS] Beam IO &runners native IO
>
> Hi Amir,
>
> Now, we have a KafkaIO in Beam (both source and sink). I would start
> with this one.
>
> Actually, I'm preparing new pipeline examples showing the usage of that.
>
> Regards
> JB
>
> On 04/28/2016 07:36 PM, amir bahmanyari wrote:
>> Hi JB,Hope all is great.I am very new to "Beam". Trying to dive into it.Am
>> having a lot of trouble to read from a Kafka topic using
>> google.cloud.dataflow APis & FlinkPipelineRunner.Could you point me to some
>> docs and/or forums where I can find a solution for my problem pls?I really
>> appreciate it.
>> In case you are curious, given the following
>> line:p.apply(Read.named("ReadFromKafka").from(UnboundedFlinkSource.of(kafkaConsumer))).
>> that connects to my kafka consumer (and I confirm it at the server side), it
>> fails & reports:The transform ReadFromKafka [Read(UnboundedFlinkSource)] is
>> currently not supported.
>>
>> I am not sure where/how I need to specify "ReadFromKafka" part.I am sending
>> stream data from my laptop to kafka in a linux box.The default Kafka
>> consumer reports thats its received.
>> I really appreciate your help. I know this is not a conventional way to ask
>> such questions but have been spending a lot of time to figure it out.Have a
>> wonderful day JB.
>> Amir-
>> From: Jean-Baptiste Onofré <[email protected]>
>> To: [email protected]
>> Sent: Thursday, April 28, 2016 5:41 AM
>> Subject: [DISCUSS] Beam IO &runners native IO
>>
>> Hi all,
>>
>> regarding the recent threads on the mailing list, I would like to start
>> a format discussion around the IO.
>> As we can expect the first contributions on this area (I already have
>> some work in progress around this ;)), I think it's a fair discussion to
>> have.
>>
>> Now, we have two kinds of IO: the one "generic" to Beam, the one "local"
>> to the runners.
>>
>> For example, let's take Kafka: we have the KafkaIO (in IO), and for
>> instance, we have the spark-streaming kafka connector (in Spark Runner).
>>
>> Right now, we have two approaches for the user:
>> 1. In the pipeline, we use KafkaIO from Beam: it's the preferred
>> approach for sure. However, the user may want to use the runner specific
>> IO for two reasons:
>> * Beam doesn't provide the IO yet (for instance, spark cassandra
>> connector is available whereas we don't have yet any CassandraIO (I'm
>> working on it anyway ;))
>> * The runner native IO is optimized or contain more features that the
>> Beam native IO
>> 2. So, for the previous reasons, the user could want to use the native
>> runner IO. The drawback of this approach is that the pipeline will be
>> tight to a specific runner, which is completely against the Beam design.
>>
>> I wonder if it wouldn't make sense to add flag on the IO API (and
>> related on Runner API) like .useNative().
>>
>> For instance, the user would be able to do:
>>
>> pipeline.apply(KafkaIO.read().withBootstrapServers("...").withTopics("...").useNative(true);
>>
>> then, if the runner has a "native" IO, it will use it, else, if
>> useNative(false) (the default), it won't use any runner native IO.
>>
>> The point there is for the configuration: assuming the Beam IO and the
>> runner IO can differ, it means that the "Beam IO" would have to populate
>> all runner specific IO configuration.
>>
>> Of course, it's always possible to use a PTransform to wrap the runner
>> native IO, but we are back on the same concern: the pipeline will be
>> couple to a specific runner.
>>
>> The purpose of the useNative() flag is to "automatically" inform the
>> runner to use a specific IO if it has one: the pipeline stays decoupled
>> from the runners.
>>
>> Thoughts ?
>>
>> Thanks
>> Regards
>> JB
>>
>
--
Jean-Baptiste Onofré
[email protected]
http://blog.nanthrax.net
Talend - http://www.talend.com