Hi Aljoscha,

I tried to cover this in the doc. Once we have full support for cross-language IO, we can decide this on a case-by-case basis. But I don't think we should stop defining new sources/sinks for the Beam Python SDK until we get to that point. I think there are good reasons for adding Kafka support to Python today, and many Beam users have requested this. Also, note that the proposed Python Kafka source will be based on the Splittable DoFn framework, while the current Java version is based on the UnboundedSource framework. Here are the reasons that are currently listed in the doc:
- Users might find it useful to have at least one unbounded source and sink combination implemented in the Python SDK, and Kafka is the streaming system that makes the most sense to support if we add support for only one such system in the Python SDK.
- Not all runners might support cross-language IO. Also, some user/runner/deployment combinations might require an unbounded source/sink implemented in the Python SDK.
- We recently added Splittable DoFn support to the Python SDK. It would be good to have at least one production-quality Splittable DoFn that can serve as an example for users who wish to implement new Splittable DoFns on top of the Beam Python SDK.
- The cross-language transform feature is currently in the initial discussion phase, and it could be some time before we can offer the existing Java implementation of Kafka to Python SDK users.
- Cross-language IO might take even longer to reach the point where it's fully equivalent in expressive power to a transform written in the host language; e.g. supporting host-language lambdas as part of the transform configuration is likely to take a lot longer than "first-order" cross-language IO. KafkaIO in Java uses lambdas as part of its transform configuration, e.g. timestamp functions.

Thanks,
Cham

On Mon, Apr 30, 2018 at 2:14 AM Aljoscha Krettek wrote:

> Is this what we want to do in the long run, i.e. implement copies of
> connectors for different SDKs? I thought the plan was to enable using
> connectors written in different languages, i.e. use the Java Kafka I/O from
> Python. This way we wouldn't duplicate bugs for three different languages
> (Java, Python, and Go for now).
>
> Best,
> Aljoscha
>
>
> On 29. Apr 2018, at 20:46, Eugene Kirpichov wrote:
>
> Thanks Cham, this is great! I left just a couple of comments on the doc.
>
> On Fri, Apr 27, 2018 at 10:06 PM Chamikara Jayalath wrote:
>
>> Hi All,
>>
>> I'm looking into adding a Kafka connector to the Beam Python SDK. I think
>> this will benefit many Python SDK users and will serve as a good example
>> for the recently added Splittable DoFn API (Fn API support, which will allow
>> all runners to use Python Splittable DoFns, is in active development). I
>> created a document [1] that makes the case for adding this connector and
>> compares the performance of available Python Kafka client libraries. I also
>> created a POC [2] that illustrates the API and how the Python SDF API can be
>> used to implement a Kafka source. I'd greatly appreciate any feedback on
>> this.
>>
>> [1]
>> https://docs.google.com/document/d/1ogRS-e-HYYTHsXi_l2zDUUOnvfzEbub3BFkPrYIOawU/edit?usp=sharing
>> [2]
>> https://github.com/chamikaramj/beam/commit/982767b69198579b22522de6794242142d12c5f9
>>
>> Thanks,
>> Cham
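To make the Splittable DoFn point above more concrete: an SDF-based Kafka source treats each partition's offsets as a restriction. The reader claims offsets one at a time before emitting records, and the runner can checkpoint the remaining range and resume it later. The following is a minimal, stdlib-only Python sketch of that idea. It assumes a half-open `[start, stop)` offset range, with `sys.maxsize` standing in for an unbounded stop offset. `OffsetRange`, `SimpleOffsetTracker`, and `read_partition` are hypothetical names. They are not Beam's API and not the POC's code. The `(offset, value)` list stands in for a real Kafka consumer.

```python
import sys
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class OffsetRange:
    """Hypothetical: half-open range [start, stop) of offsets in one partition."""
    start: int
    stop: int


class SimpleOffsetTracker:
    """Hypothetical, simplified restriction tracker (not Beam's API).

    An offset must be claimed before the record at that offset is emitted.
    checkpoint() shrinks the current range to what was already claimed and
    returns the unclaimed remainder so it can be resumed later.
    """

    def __init__(self, restriction: OffsetRange):
        self._range = restriction
        self._last_claimed: Optional[int] = None

    def current_restriction(self) -> OffsetRange:
        return self._range

    def try_claim(self, offset: int) -> bool:
        # Offsets must be claimed in strictly increasing order.
        if self._last_claimed is not None and offset <= self._last_claimed:
            raise ValueError(f"offset {offset} was already passed")
        if offset >= self._range.stop:
            return False  # Outside the (possibly shrunk) restriction.
        self._last_claimed = offset
        return True

    def checkpoint(self) -> OffsetRange:
        """Split at the current position and return the residual range."""
        resume_at = (self._range.start if self._last_claimed is None
                     else self._last_claimed + 1)
        residual = OffsetRange(resume_at, self._range.stop)
        self._range = OffsetRange(self._range.start, resume_at)
        return residual


def read_partition(records: List[Tuple[int, bytes]],
                   tracker: SimpleOffsetTracker,
                   max_records: int) -> Tuple[List[bytes], OffsetRange]:
    """Emit up to max_records claimed records, then self-checkpoint.

    `records` is a stand-in for a consumer's poll() results as
    (offset, value) pairs; no real Kafka client is used here.
    """
    out: List[bytes] = []
    for offset, value in records:
        if offset < tracker.current_restriction().start:
            continue  # Before this restriction; handled in an earlier bundle.
        if len(out) == max_records or not tracker.try_claim(offset):
            break
        out.append(value)
    return out, tracker.checkpoint()


if __name__ == "__main__":
    # Offsets 0..9 stand in for one Kafka partition; start reading at 3.
    records = [(o, f"msg-{o}".encode()) for o in range(10)]
    tracker = SimpleOffsetTracker(OffsetRange(3, sys.maxsize))
    values, residual = read_partition(records, tracker, max_records=4)
    print(values)          # [b'msg-3', b'msg-4', b'msg-5', b'msg-6']
    print(residual.start)  # 7
```

Each call emits a bounded batch and returns the residual range. This lets an unbounded read over a single partition continue in a later bundle without the runner needing a separate UnboundedSource checkpoint mechanism. A real implementation would use the Beam Python SDK's own restriction tracker classes and a real Kafka client instead of these stand-ins.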
