Re: Kafka connector for Beam Python SDK

Chamikara Jayalath Mon, 30 Apr 2018 08:06:05 -0700

Hi Aljoscha,

I tried to cover this in the doc. Once we have full support for
cross-language IO, we can decide this on a case-by-case basis. But I don't
think we should cease defining new sources/sinks for Beam Python SDK till
we get to that point. I think there are good reasons for adding Kafka
support for Python today, and many Beam users have requested this. Also, note
that the proposed Python Kafka source will be based on the Splittable DoFn
framework while the current Java version is based on the UnboundedSource
framework. Here are the reasons that are currently listed in the doc.

   - Users might find it useful to have at least one unbounded source and
     sink combination implemented in the Python SDK, and Kafka is the
     streaming system that makes the most sense to support if we want to add
     support for only one such system in the Python SDK.
   - Not all runners might support cross-language IO. Also, some
     user/runner/deployment combinations might require an unbounded
     source/sink implemented in the Python SDK.
   - We recently added Splittable DoFn support to the Python SDK. It would be
     good to have at least one production-quality Splittable DoFn that will
     serve as a good example for any users who wish to implement new
     Splittable DoFns on top of the Beam Python SDK.
   - The cross-language transform feature is currently in the initial
     discussion phase, and it could be some time before we can offer the
     existing Java implementation of Kafka to Python SDK users.
   - Cross-language IO might take even longer to reach the point where it's
     fully equivalent in expressive power to a transform written in the host
     language - e.g. supporting host-language lambdas as part of the transform
     configuration is likely to take a lot longer than "first-order"
     cross-language IO. KafkaIO in Java uses lambdas as part of transform
     configuration, e.g. timestamp functions.
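
To illustrate the last point, here is a minimal sketch of why lambdas in a
transform configuration are harder to carry across languages than
"first-order" settings. It is only an illustration under assumptions: JSON
stands in for whatever language-neutral encoding cross-language IO ends up
using, and encode_config, topic, bootstrap_servers and timestamp_fn are
hypothetical names, not part of any Beam API.

```python
import json


def encode_config(config):
    """Try to encode a transform configuration in a language-neutral form.

    JSON is an assumed stand-in here. Returns the encoded string, or None if
    the configuration holds something (such as a function) that cannot be
    represented as plain data.
    """
    try:
        return json.dumps(config, sort_keys=True)
    except TypeError:
        return None


# "First-order" configuration: plain values that any SDK can read back.
first_order = {"topic": "events", "bootstrap_servers": "localhost:9092"}

# The same configuration plus a host-language lambda (hypothetical
# timestamp function). Its meaning is Python code, not data.
with_lambda = dict(first_order, timestamp_fn=lambda record: record["ts"])

first_order_encoded = encode_config(first_order)
with_lambda_encoded = encode_config(with_lambda)
```

The plain configuration encodes and can be decoded by any language, while
the one carrying a lambda cannot be encoded this way at all. A native Python
source can simply call the lambda in-process.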

Thanks,
Cham

On Mon, Apr 30, 2018 at 2:14 AM Aljoscha Krettek <[email protected]>
wrote:

> Is this what we want to do in the long run, i.e. implement copies of
> connectors for different SDKs? I thought the plan was to enable using
> connectors written in different languages, i.e. use the Java Kafka I/O from
> Python. This way we wouldn't duplicate bugs for three different languages
> (Java, Python, and Go for now).
>
> Best,
> Aljoscha
>
>
> On 29. Apr 2018, at 20:46, Eugene Kirpichov <[email protected]> wrote:
>
> Thanks Cham, this is great! I left just a couple of comments on the doc.
>
> On Fri, Apr 27, 2018 at 10:06 PM Chamikara Jayalath <[email protected]>
> wrote:
>
>> Hi All,
>>
>> I'm looking into adding a Kafka connector to the Beam Python SDK. I think
>> this will benefit many Python SDK users and will serve as a good example
>> for the recently added Splittable DoFn API (Fn API support, which will allow
>> all runners to use Python Splittable DoFn, is in active development). I
>> created a document [1] that makes the case for adding this connector and
>> compares the performance of available Python Kafka client libraries. I also
>> created a POC [2] that illustrates the API and how the Python SDF API can be
>> used to implement a Kafka source. I would greatly appreciate any feedback
>> related to this.
>>
>> [1]
>> https://docs.google.com/document/d/1ogRS-e-HYYTHsXi_l2zDUUOnvfzEbub3BFkPrYIOawU/edit?usp=sharing
>> [2]
>> https://github.com/chamikaramj/beam/commit/982767b69198579b22522de6794242142d12c5f9
>>
>> Thanks,
>> Cham
>>
>
>
