Another point: cross-language IOs might add a performance penalty in many cases. For an example of this look at BigQueryIO. The user can register a SerializableFunction that is evaluated on every record, and determines which destination to write the record to. Now a Python user would want to register a Python function for this of course. this means that the Java IO would have to invoke Python code for each record it sees, which will likely be a big performance hit.
Of course the downside of duplicating IOs is exactly as you say - multiple versions to maintain, and potentially duplicate bugs. I think the right answer will need to be on a case-by-case basis. Reuven On Mon, Apr 30, 2018 at 8:05 AM Chamikara Jayalath <chamik...@google.com> wrote: > Hi Aljoscha, > > I tried to cover this in the doc. Once we have full support for > cross-language IO, we can decide this on a case-by-case basis. But I don't > think we should cease defining new sources/sinks for Beam Python SDK till > we get to that point. I think there are good reasons for adding Kafka > support for Python today and many Beam users have request this. Also, note > that proposed Python Kafka source will be based on the Splittable DoFn > framework while the current Java version is based on the UnboundedSource > framework. Here are the reasons that are currently listed in the doc. > > > - > > Users might find it useful to have at least one unbounded source and > sink combination implemented in Python SDK and Kafka is the streaming > system that makes most sense to support if we just want to add support for > only one such system in Python SDK. > - > > Not all runners might support cross-language IO. Also some > user/runner/deployment combinations might require an unbounded source/sink > implemented in Python SDK. > - > > We recently added Splittable DoFn support to Python SDK. It will be > good to have at least one production quality Splittable DoFn that will > server as a good example for any users who wish to implement new Splittable > DoFn implementations on top of Beam Python SDK. > - > > Cross-language transform feature is currently is in the initial > discussion phase and it could be some time before we can offer existing > Java implementation of Kafka for Python SDK users. > - > > Cross-language IO might take even longer to reach the point where it's > fully equivalent in expressive power to a transform written in the host > language - e.g. supporting host-language lambdas as part of the transform > configuration is likely to take a lot longer than "first-order" > cross-language IO. KafkaIO in Java uses lambdas as part of transform > configuration, e.g. timestamp functions. > > > Thanks, > Cham > > On Mon, Apr 30, 2018 at 2:14 AM Aljoscha Krettek <aljos...@apache.org> > wrote: > >> Is this what we want to do in the long run, i.e. implement copies of >> connectors for different SDKs? I thought the plan was to enable using >> connectors written in different languages, i.e. use the Java Kafka I/O from >> python. This way we wouldn't duplicate bugs for three different language >> (Java, Python, and Go for now). >> >> Best, >> Aljoscha >> >> >> On 29. Apr 2018, at 20:46, Eugene Kirpichov <kirpic...@google.com> wrote: >> >> Thanks Cham, this is great! I left just a couple of comments on the doc. >> >> On Fri, Apr 27, 2018 at 10:06 PM Chamikara Jayalath <chamik...@google.com> >> wrote: >> >>> Hi All, >>> >>> I'm looking into adding a Kafka connector to Beam Python SDK. I think >>> this will benefits many Python SDK users and will serve as a good example >>> for recently added Splittable DoFn API (Fn API support which will allow all >>> runners to use Python Splittable DoFn is in active development). I created >>> a document [1] that makes the case for adding this connector and compares >>> the performance of available Python Kafka client libraries. Also I created >>> a POC [2] that illustrates the API and how Python SDF API can be used to >>> implement a Kafka source. I extremely appreciate any feedback related to >>> this. >>> >>> [1] >>> https://docs.google.com/document/d/1ogRS-e-HYYTHsXi_l2zDUUOnvfzEbub3BFkPrYIOawU/edit?usp=sharing >>> [2] >>> https://github.com/chamikaramj/beam/commit/982767b69198579b22522de6794242142d12c5f9 >>> >>> Thanks, >>> Cham >>> >> >>