Re: [DISCUSS] Beam IO &runners native IO

Raghu Angadi Thu, 28 Apr 2016 20:48:07 -0700

Amir,

KafkaIO is in this jar :
https://repository.apache.org/content/groups/snapshots/org/apache/beam/kafka/0.1.0-incubating-SNAPSHOT/kafka-0.1.0-incubating-20160428.071230-8.jar


how are building your app? How are fetching dependencies?

On Thu, Apr 28, 2016 at 7:51 PM, amir bahmanyari <
amirto...@yahoo.com.invalid> wrote:

> Hi JB & colleagues,I got this jar file from .Index of
> /groups/snapshots/org/apache/beam
>
> |
> |   |
> Index of /groups/snapshots/org/apache/beam
>    |  |
>
>   |
>
>
> java-sdk-all-0.1.0-incubating-20160428.071145-22.jar
>
> Is this is right jar to get KafkaIO?Cannt
> import org.apache.beam.sdk.io.kafka....
> Thanks so much for your help.Amir-
>
>       From: amir bahmanyari <amirto...@yahoo.com>
>  To: "dev@beam.incubator.apache.org" <dev@beam.incubator.apache.org>
>  Sent: Thursday, April 28, 2016 12:59 PM
>  Subject: Re: [DISCUSS] Beam IO &runners native IO
>
> Thanks so much JB.Got it.Cannt import org.apache.beam.sdk.io.kafka I see
> org.apache.beam.sdk.io source and sink thu.
> I wait till I see the usage examples.Thanks again      From: Jean-Baptiste
> Onofré <j...@nanthrax.net>
>  To: dev@beam.incubator.apache.org
>  Sent: Thursday, April 28, 2016 12:18 PM
>  Subject: Re: [DISCUSS] Beam IO &runners native IO
>
> http://repository.apache.org/content/groups/snapshots/org/apache/beam/
>
> On 04/28/2016 09:10 PM, amir bahmanyari wrote:
> > Sorry JB.Where can I download a " SNAPSHOT/nightly build" jar?My
> apologies...I searched couldnt find it.
> >
> >        From: Jean-Baptiste Onofré <j...@nanthrax.net>
> >  To: dev@beam.incubator.apache.org
> >  Sent: Thursday, April 28, 2016 11:06 AM
> >  Subject: Re: [DISCUSS] Beam IO &runners native IO
> >
> > The KafkaIO is there:
> >
> > https://github.com/apache/incubator-beam/tree/master/sdks/java/io/kafka
> >
> > You have to use SNAPSHOT/nightly build or build yourself.
> >
> > Regards
> > JB
> >
> > On 04/28/2016 07:49 PM, amir bahmanyari wrote:
> >> Thanks so much JB.I Googled KafkaIO & got no references.Which Beam
> release supports KafkaIO? Please let me know when the examples are
> available...cannt wait !Have a wonderful day.
> >>
> >>          From: Jean-Baptiste Onofré <j...@nanthrax.net>
> >>    To: dev@beam.incubator.apache.org
> >>    Sent: Thursday, April 28, 2016 10:40 AM
> >>    Subject: Re: [DISCUSS] Beam IO &runners native IO
> >>
> >> Hi Amir,
> >>
> >> Now, we have a KafkaIO in Beam (both source and sink). I would start
> >> with this one.
> >>
> >> Actually, I'm preparing new pipeline examples showing the usage of that.
> >>
> >> Regards
> >> JB
> >>
> >> On 04/28/2016 07:36 PM, amir bahmanyari wrote:
> >>> Hi JB,Hope all is great.I am very new to "Beam". Trying to dive into
> it.Am having a lot of trouble to read from a Kafka topic using
> google.cloud.dataflow  APis & FlinkPipelineRunner.Could you point me to
> some docs and/or forums where I can find a solution for my problem pls?I
> really appreciate it.
> >>> In case you are curious, given the following
> line:p.apply(Read.named("ReadFromKafka").from(UnboundedFlinkSource.of(kafkaConsumer))).
> >>> that connects to my kafka consumer (and I confirm it at the server
> side), it fails & reports:The transform ReadFromKafka
> [Read(UnboundedFlinkSource)] is currently not supported.
> >>>
> >>> I am not sure where/how I need to specify "ReadFromKafka" part.I am
> sending stream data from my laptop to kafka in  a linux box.The default
> Kafka consumer reports thats its received.
> >>> I really appreciate your help. I know this is not a conventional way
> to ask such questions but have been spending a lot of time to figure it
> out.Have a wonderful day JB.
> >>> Amir-
> >>>            From: Jean-Baptiste Onofré <j...@nanthrax.net>
> >>>      To: dev@beam.incubator.apache.org
> >>>      Sent: Thursday, April 28, 2016 5:41 AM
> >>>      Subject: [DISCUSS] Beam IO &runners native IO
> >>>
> >>> Hi all,
> >>>
> >>> regarding the recent threads on the mailing list, I would like to start
> >>> a format discussion around the IO.
> >>> As we can expect the first contributions on this area (I already have
> >>> some work in progress around this ;)), I think it's a fair discussion
> to
> >>> have.
> >>>
> >>> Now, we have two kinds of IO: the one "generic" to Beam, the one
> "local"
> >>> to the runners.
> >>>
> >>> For example, let's take Kafka: we have the KafkaIO (in IO), and for
> >>> instance, we have the spark-streaming kafka connector (in Spark
> Runner).
> >>>
> >>> Right now, we have two approaches for the user:
> >>> 1. In the pipeline, we use KafkaIO from Beam: it's the preferred
> >>> approach for sure. However, the user may want to use the runner
> specific
> >>> IO for two reasons:
> >>>          * Beam doesn't provide the IO yet (for instance, spark
> cassandra
> >>> connector is available whereas we don't have yet any CassandraIO (I'm
> >>> working on it anyway ;))
> >>>          * The runner native IO is optimized or contain more features
> that the
> >>> Beam native IO
> >>> 2. So, for the previous reasons, the user could want to use the native
> >>> runner IO. The drawback of this approach is that the pipeline will be
> >>> tight to a specific runner, which is completely against the Beam
> design.
> >>>
> >>> I wonder if it wouldn't make sense to add flag on the IO API (and
> >>> related on Runner API) like .useNative().
> >>>
> >>> For instance, the user would be able to do:
> >>>
> >>>
> pipeline.apply(KafkaIO.read().withBootstrapServers("...").withTopics("...").useNative(true);
> >>>
> >>> then, if the runner has a "native" IO, it will use it, else, if
> >>> useNative(false) (the default), it won't use any runner native IO.
> >>>
> >>> The point there is for the configuration: assuming the Beam IO and the
> >>> runner IO can differ, it means that the "Beam IO" would have to
> populate
> >>> all runner specific IO configuration.
> >>>
> >>> Of course, it's always possible to use a PTransform to wrap the runner
> >>> native IO, but we are back on the same concern: the pipeline will be
> >>> couple to a specific runner.
> >>>
> >>> The purpose of the useNative() flag is to "automatically" inform the
> >>> runner to use a specific IO if it has one: the pipeline stays decoupled
> >>> from the runners.
> >>>
> >>> Thoughts ?
> >>>
> >>> Thanks
> >>> Regards
> >>> JB
> >>>
> >>
> >
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
>
>
>
>
>

Re: [DISCUSS] Beam IO &runners native IO

Reply via email to