Hi JB & colleagues,I got this jar file from .Index of /groups/snapshots/org/apache/beam | | | Index of /groups/snapshots/org/apache/beam | |
| java-sdk-all-0.1.0-incubating-20160428.071145-22.jar Is this is right jar to get KafkaIO?Cannt import org.apache.beam.sdk.io.kafka.... Thanks so much for your help.Amir- From: amir bahmanyari <amirto...@yahoo.com> To: "dev@beam.incubator.apache.org" <dev@beam.incubator.apache.org> Sent: Thursday, April 28, 2016 12:59 PM Subject: Re: [DISCUSS] Beam IO &runners native IO Thanks so much JB.Got it.Cannt import org.apache.beam.sdk.io.kafka I see org.apache.beam.sdk.io source and sink thu. I wait till I see the usage examples.Thanks again From: Jean-Baptiste Onofré <j...@nanthrax.net> To: dev@beam.incubator.apache.org Sent: Thursday, April 28, 2016 12:18 PM Subject: Re: [DISCUSS] Beam IO &runners native IO http://repository.apache.org/content/groups/snapshots/org/apache/beam/ On 04/28/2016 09:10 PM, amir bahmanyari wrote: > Sorry JB.Where can I download a " SNAPSHOT/nightly build" jar?My > apologies...I searched couldnt find it. > > From: Jean-Baptiste Onofré <j...@nanthrax.net> > To: dev@beam.incubator.apache.org > Sent: Thursday, April 28, 2016 11:06 AM > Subject: Re: [DISCUSS] Beam IO &runners native IO > > The KafkaIO is there: > > https://github.com/apache/incubator-beam/tree/master/sdks/java/io/kafka > > You have to use SNAPSHOT/nightly build or build yourself. > > Regards > JB > > On 04/28/2016 07:49 PM, amir bahmanyari wrote: >> Thanks so much JB.I Googled KafkaIO & got no references.Which Beam release >> supports KafkaIO? Please let me know when the examples are available...cannt >> wait !Have a wonderful day. >> >> From: Jean-Baptiste Onofré <j...@nanthrax.net> >> To: dev@beam.incubator.apache.org >> Sent: Thursday, April 28, 2016 10:40 AM >> Subject: Re: [DISCUSS] Beam IO &runners native IO >> >> Hi Amir, >> >> Now, we have a KafkaIO in Beam (both source and sink). I would start >> with this one. >> >> Actually, I'm preparing new pipeline examples showing the usage of that. >> >> Regards >> JB >> >> On 04/28/2016 07:36 PM, amir bahmanyari wrote: >>> Hi JB,Hope all is great.I am very new to "Beam". Trying to dive into it.Am >>> having a lot of trouble to read from a Kafka topic using >>> google.cloud.dataflow APis & FlinkPipelineRunner.Could you point me to >>> some docs and/or forums where I can find a solution for my problem pls?I >>> really appreciate it. >>> In case you are curious, given the following >>> line:p.apply(Read.named("ReadFromKafka").from(UnboundedFlinkSource.of(kafkaConsumer))). >>> that connects to my kafka consumer (and I confirm it at the server side), >>> it fails & reports:The transform ReadFromKafka [Read(UnboundedFlinkSource)] >>> is currently not supported. >>> >>> I am not sure where/how I need to specify "ReadFromKafka" part.I am sending >>> stream data from my laptop to kafka in a linux box.The default Kafka >>> consumer reports thats its received. >>> I really appreciate your help. I know this is not a conventional way to ask >>> such questions but have been spending a lot of time to figure it out.Have a >>> wonderful day JB. >>> Amir- >>> From: Jean-Baptiste Onofré <j...@nanthrax.net> >>> To: dev@beam.incubator.apache.org >>> Sent: Thursday, April 28, 2016 5:41 AM >>> Subject: [DISCUSS] Beam IO &runners native IO >>> >>> Hi all, >>> >>> regarding the recent threads on the mailing list, I would like to start >>> a format discussion around the IO. >>> As we can expect the first contributions on this area (I already have >>> some work in progress around this ;)), I think it's a fair discussion to >>> have. >>> >>> Now, we have two kinds of IO: the one "generic" to Beam, the one "local" >>> to the runners. >>> >>> For example, let's take Kafka: we have the KafkaIO (in IO), and for >>> instance, we have the spark-streaming kafka connector (in Spark Runner). >>> >>> Right now, we have two approaches for the user: >>> 1. In the pipeline, we use KafkaIO from Beam: it's the preferred >>> approach for sure. However, the user may want to use the runner specific >>> IO for two reasons: >>> * Beam doesn't provide the IO yet (for instance, spark cassandra >>> connector is available whereas we don't have yet any CassandraIO (I'm >>> working on it anyway ;)) >>> * The runner native IO is optimized or contain more features that >>>the >>> Beam native IO >>> 2. So, for the previous reasons, the user could want to use the native >>> runner IO. The drawback of this approach is that the pipeline will be >>> tight to a specific runner, which is completely against the Beam design. >>> >>> I wonder if it wouldn't make sense to add flag on the IO API (and >>> related on Runner API) like .useNative(). >>> >>> For instance, the user would be able to do: >>> >>> pipeline.apply(KafkaIO.read().withBootstrapServers("...").withTopics("...").useNative(true); >>> >>> then, if the runner has a "native" IO, it will use it, else, if >>> useNative(false) (the default), it won't use any runner native IO. >>> >>> The point there is for the configuration: assuming the Beam IO and the >>> runner IO can differ, it means that the "Beam IO" would have to populate >>> all runner specific IO configuration. >>> >>> Of course, it's always possible to use a PTransform to wrap the runner >>> native IO, but we are back on the same concern: the pipeline will be >>> couple to a specific runner. >>> >>> The purpose of the useNative() flag is to "automatically" inform the >>> runner to use a specific IO if it has one: the pipeline stays decoupled >>> from the runners. >>> >>> Thoughts ? >>> >>> Thanks >>> Regards >>> JB >>> >> > -- Jean-Baptiste Onofré jbono...@apache.org http://blog.nanthrax.net Talend - http://www.talend.com