Re: [DISCUSS] Beam IO &runners native IO

amir bahmanyari Thu, 28 Apr 2016 19:52:52 -0700

Hi JB & colleagues,I got this jar file from .Index of 
/groups/snapshots/org/apache/beam
  
|  
|   |  
Index of /groups/snapshots/org/apache/beam
   |  |


  |

 
java-sdk-all-0.1.0-incubating-20160428.071145-22.jar

Is this is right jar to get KafkaIO?Cannt import 
org.apache.beam.sdk.io.kafka....
Thanks so much for your help.Amir-

      From: amir bahmanyari <amirto...@yahoo.com>
 To: "dev@beam.incubator.apache.org" <dev@beam.incubator.apache.org> 
 Sent: Thursday, April 28, 2016 12:59 PM
 Subject: Re: [DISCUSS] Beam IO &runners native IO
   
Thanks so much JB.Got it.Cannt import org.apache.beam.sdk.io.kafka I see 
org.apache.beam.sdk.io source and sink thu.
I wait till I see the usage examples.Thanks again      From: Jean-Baptiste 
Onofré <j...@nanthrax.net>
 To: dev@beam.incubator.apache.org 
 Sent: Thursday, April 28, 2016 12:18 PM
 Subject: Re: [DISCUSS] Beam IO &runners native IO
  
http://repository.apache.org/content/groups/snapshots/org/apache/beam/

On 04/28/2016 09:10 PM, amir bahmanyari wrote:
> Sorry JB.Where can I download a " SNAPSHOT/nightly build" jar?My 
> apologies...I searched couldnt find it.
>
>        From: Jean-Baptiste Onofré <j...@nanthrax.net>
>  To: dev@beam.incubator.apache.org
>  Sent: Thursday, April 28, 2016 11:06 AM
>  Subject: Re: [DISCUSS] Beam IO &runners native IO
>
> The KafkaIO is there:
>
> https://github.com/apache/incubator-beam/tree/master/sdks/java/io/kafka
>
> You have to use SNAPSHOT/nightly build or build yourself.
>
> Regards
> JB
>
> On 04/28/2016 07:49 PM, amir bahmanyari wrote:
>> Thanks so much JB.I Googled KafkaIO & got no references.Which Beam release 
>> supports KafkaIO? Please let me know when the examples are available...cannt 
>> wait !Have a wonderful day.
>>
>>          From: Jean-Baptiste Onofré <j...@nanthrax.net>
>>    To: dev@beam.incubator.apache.org
>>    Sent: Thursday, April 28, 2016 10:40 AM
>>    Subject: Re: [DISCUSS] Beam IO &runners native IO
>>
>> Hi Amir,
>>
>> Now, we have a KafkaIO in Beam (both source and sink). I would start
>> with this one.
>>
>> Actually, I'm preparing new pipeline examples showing the usage of that.
>>
>> Regards
>> JB
>>
>> On 04/28/2016 07:36 PM, amir bahmanyari wrote:
>>> Hi JB,Hope all is great.I am very new to "Beam". Trying to dive into it.Am 
>>> having a lot of trouble to read from a Kafka topic using 
>>> google.cloud.dataflow  APis & FlinkPipelineRunner.Could you point me to 
>>> some docs and/or forums where I can find a solution for my problem pls?I 
>>> really appreciate it.
>>> In case you are curious, given the following 
>>> line:p.apply(Read.named("ReadFromKafka").from(UnboundedFlinkSource.of(kafkaConsumer))).
>>> that connects to my kafka consumer (and I confirm it at the server side), 
>>> it fails & reports:The transform ReadFromKafka [Read(UnboundedFlinkSource)] 
>>> is currently not supported.
>>>
>>> I am not sure where/how I need to specify "ReadFromKafka" part.I am sending 
>>> stream data from my laptop to kafka in  a linux box.The default Kafka 
>>> consumer reports thats its received.
>>> I really appreciate your help. I know this is not a conventional way to ask 
>>> such questions but have been spending a lot of time to figure it out.Have a 
>>> wonderful day JB.
>>> Amir-
>>>            From: Jean-Baptiste Onofré <j...@nanthrax.net>
>>>      To: dev@beam.incubator.apache.org
>>>      Sent: Thursday, April 28, 2016 5:41 AM
>>>      Subject: [DISCUSS] Beam IO &runners native IO
>>>
>>> Hi all,
>>>
>>> regarding the recent threads on the mailing list, I would like to start
>>> a format discussion around the IO.
>>> As we can expect the first contributions on this area (I already have
>>> some work in progress around this ;)), I think it's a fair discussion to
>>> have.
>>>
>>> Now, we have two kinds of IO: the one "generic" to Beam, the one "local"
>>> to the runners.
>>>
>>> For example, let's take Kafka: we have the KafkaIO (in IO), and for
>>> instance, we have the spark-streaming kafka connector (in Spark Runner).
>>>
>>> Right now, we have two approaches for the user:
>>> 1. In the pipeline, we use KafkaIO from Beam: it's the preferred
>>> approach for sure. However, the user may want to use the runner specific
>>> IO for two reasons:
>>>          * Beam doesn't provide the IO yet (for instance, spark cassandra
>>> connector is available whereas we don't have yet any CassandraIO (I'm
>>> working on it anyway ;))
>>>          * The runner native IO is optimized or contain more features that 
>>>the
>>> Beam native IO
>>> 2. So, for the previous reasons, the user could want to use the native
>>> runner IO. The drawback of this approach is that the pipeline will be
>>> tight to a specific runner, which is completely against the Beam design.
>>>
>>> I wonder if it wouldn't make sense to add flag on the IO API (and
>>> related on Runner API) like .useNative().
>>>
>>> For instance, the user would be able to do:
>>>
>>> pipeline.apply(KafkaIO.read().withBootstrapServers("...").withTopics("...").useNative(true);
>>>
>>> then, if the runner has a "native" IO, it will use it, else, if
>>> useNative(false) (the default), it won't use any runner native IO.
>>>
>>> The point there is for the configuration: assuming the Beam IO and the
>>> runner IO can differ, it means that the "Beam IO" would have to populate
>>> all runner specific IO configuration.
>>>
>>> Of course, it's always possible to use a PTransform to wrap the runner
>>> native IO, but we are back on the same concern: the pipeline will be
>>> couple to a specific runner.
>>>
>>> The purpose of the useNative() flag is to "automatically" inform the
>>> runner to use a specific IO if it has one: the pipeline stays decoupled
>>> from the runners.
>>>
>>> Thoughts ?
>>>
>>> Thanks
>>> Regards
>>> JB
>>>
>>
>

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: [DISCUSS] Beam IO &runners native IO

Reply via email to