Another point: cross-language IOs might add a performance penalty in many
cases. For an example of this look at BigQueryIO. The user can register a
SerializableFunction that is evaluated on every record, and determines
which destination to write the record to. Now a Python user would want to
register a Python function for this of course. this means that the Java IO
would have to invoke Python code for each record it sees, which will likely
be a big performance hit.

Of course the downside of duplicating IOs is exactly as you say - multiple
versions to maintain, and potentially duplicate bugs. I think the right
answer will need to be on a case-by-case basis.

Reuven

On Mon, Apr 30, 2018 at 8:05 AM Chamikara Jayalath <chamik...@google.com>
wrote:

> Hi Aljoscha,
>
> I tried to cover this in the doc. Once we have full support for
> cross-language IO, we can decide this on a case-by-case basis. But I don't
> think we should cease defining new sources/sinks for Beam Python SDK till
> we get to that point. I think there are good reasons for adding Kafka
> support for Python today and many Beam users have request this. Also, note
> that proposed Python Kafka source will be based on the Splittable DoFn
> framework while the current Java version is based on the UnboundedSource
> framework. Here are the reasons that are currently listed in the doc.
>
>
>    -
>
>    Users might find it useful to have at least one unbounded source and
>    sink combination implemented in Python SDK and Kafka is the streaming
>    system that makes most sense to support if we just want to add support for
>    only one such system in Python SDK.
>    -
>
>    Not all runners might support cross-language IO. Also some
>    user/runner/deployment combinations might require an unbounded source/sink
>    implemented in Python SDK.
>    -
>
>    We recently added Splittable DoFn support to Python SDK. It will be
>    good to have at least one production quality Splittable DoFn that will
>    server as a good example for any users who wish to implement new Splittable
>    DoFn implementations on top of Beam Python SDK.
>    -
>
>    Cross-language transform feature is currently is in the initial
>    discussion phase and it could be some time before we can offer existing
>    Java implementation of Kafka for Python SDK users.
>    -
>
>    Cross-language IO might take even longer to reach the point where it's
>    fully equivalent in expressive power to a transform written in the host
>    language - e.g. supporting host-language lambdas as part of the transform
>    configuration is likely to take a lot longer than "first-order"
>    cross-language IO. KafkaIO in Java uses lambdas as part of transform
>    configuration, e.g. timestamp functions.
>
>
> Thanks,
> Cham
>
> On Mon, Apr 30, 2018 at 2:14 AM Aljoscha Krettek <aljos...@apache.org>
> wrote:
>
>> Is this what we want to do in the long run, i.e. implement copies of
>> connectors for different SDKs? I thought the plan was to enable using
>> connectors written in different languages, i.e. use the Java Kafka I/O from
>> python. This way we wouldn't duplicate bugs for three different language
>> (Java, Python, and Go for now).
>>
>> Best,
>> Aljoscha
>>
>>
>> On 29. Apr 2018, at 20:46, Eugene Kirpichov <kirpic...@google.com> wrote:
>>
>> Thanks Cham, this is great! I left just a couple of comments on the doc.
>>
>> On Fri, Apr 27, 2018 at 10:06 PM Chamikara Jayalath <chamik...@google.com>
>> wrote:
>>
>>> Hi All,
>>>
>>> I'm looking into adding a Kafka connector to Beam Python SDK. I think
>>> this will benefits many Python SDK users and will serve as a good example
>>> for recently added Splittable DoFn API (Fn API support which will allow all
>>> runners to use Python Splittable DoFn is in active development).  I created
>>> a document [1] that makes the case for adding this connector and compares
>>> the performance of available Python Kafka client libraries. Also I created
>>> a POC [2] that illustrates the API and how Python SDF API can be used to
>>> implement a Kafka source. I extremely appreciate any feedback related to
>>> this.
>>>
>>> [1]
>>> https://docs.google.com/document/d/1ogRS-e-HYYTHsXi_l2zDUUOnvfzEbub3BFkPrYIOawU/edit?usp=sharing
>>> [2]
>>> https://github.com/chamikaramj/beam/commit/982767b69198579b22522de6794242142d12c5f9
>>>
>>> Thanks,
>>> Cham
>>>
>>
>>

Reply via email to