There's some work needed to make the Java connector available as a cross-language transform for Python. More specifically,
(1) Add a Java builder and registrar to register Java transforms with the expansion service (see [1] and [2] for Kafka)
(2) Add a Python wrapper (see [3] for Kafka)

Thanks,
Cham

[1] https://github.com/apache/beam/blob/master/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java#L396
[2] https://github.com/apache/beam/blob/master/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java#L1429
[3] https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/external/kafka.py

On Wed, Jun 17, 2020 at 8:57 AM Shashanka Balakuntala <[email protected]> wrote:

> Hi All,
> In regards with this discussion, I created a JIRA issue[1]. Now since
> there is a talk here on cross-platform connector, should I just close the
> issue with a link to Java Snowflake connector, or does anyone think writing
> python based connector has some advantage in terms of performance or
> usability. Please let me know what you guys think, so that i can take the
> necessary step on this.
>
> [1] - https://issues.apache.org/jira/browse/BEAM-9466
>
> *Regards*
> Shashanka Balakuntala Srinivasa
>
> On Wed, Mar 11, 2020 at 2:25 AM Chamikara Jayalath <[email protected]>
> wrote:
>
>> On Tue, Mar 10, 2020 at 1:18 PM Tyler Akidau <[email protected]> wrote:
>>
>>> On Tue, Mar 10, 2020 at 1:27 AM Elias Djurfeldt <
>>> [email protected]> wrote:
>>>
>>>> From what I can tell, the only difference is that the Python connector
>>>> is a pure Python implementation and doesn't rely on ODBC or JDBC (it's just
>>>> a pip installable). Whereas the Java version needs JDBC. But that seems to
>>>> be the only difference.
>>>>
>>>
>>> Correct me if I'm wrong, but this sounds like a concern around having to
>>> install Java dependencies for the cross-language transform. If so, I think
>>> the question is: how frictionless can we make the user experience here? If
>>> it can be relatively straightforward, even for a Python user with zero Java
>>> familiarity, it's going to be a win from a maintainability perspective to
>>> only have one implementation (Java, in this case) to keep up to date, as
>>> Cham pointed out. Kasia, do you have a sense yet for what the experience
>>> for a Python user would be for using the Python-wrapped Java SnowflakeIO
>>> connector?
>>>
>>
>> There are many aspects related to usability of cross-language transforms
>> that are currently being worked on. We are doing some of the usability
>> improvements to cross-language Kafka. But the end goal is to make using
>> cross-language transforms seamless as possible to end users. For example,
>> (1) Expansion service can be started up automatically if users have Java
>> installed in their system.
>> (2) Native language wrappers can be aware of the immediate dependencies
>> needed for the expansion service.
>> (3) Additional dependencies can be obtained as a part of the new
>> environment
>> <https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L1280>
>> received through the cross-language transform expansion protocol.
>>
>> Also we need to add better support for converting arbitrary Java types to
>> arbitrary Python types using Row coder (
>> https://issues.apache.org/jira/browse/BEAM-8732).
>>
>> So hopefully, the user experience of using cross-language Java transforms
>> from Python can be as seamless as "just install JRE and use the transforms
>> in Python xyz_io.py".
>>
>> There might be additional Snowflake specific considerations I'm not aware
>> of.
>>
>> Thanks,
>> Cham
>>
>>>
>>> -Tyler
>>>
>>>>
>>>> I don't know enough about the Java side of Beam (or Java in general
>>>> really) to say if that's an issue or not though :)
>>>>
>>>> Cheers,
>>>>
>>>> On Mon, 9 Mar 2020 at 18:06, Chamikara Jayalath <[email protected]>
>>>> wrote:
>>>>
>>>>> Thank you. Elias and Shashanka, do you think the Python connector (and
>>>>> API) can offer some additional benefits that a Java cross-language
>>>>> <https://beam.apache.org/roadmap/connectors-multi-sdk/> connector
>>>>> cannot ? It's fine to develop Java and Python versions if it makes sense
>>>>> but if cross-language Java version offers the same benefits as Python just
>>>>> having one implementation will reduce maintenance burden.
>>>>>
>>>>> Thanks,
>>>>> Cham
>>>>>
>>>>> On Mon, Mar 9, 2020 at 5:41 AM Katarzyna Kucharczyk <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> Me and my colleague Dariusz we are working currently on Java
>>>>>> connector and we are planning to use cross-language to add Python as
>>>>>> well.
>>>>>> The proposal should arrive on dev-list in the nearest future.
>>>>>> Also we would be happy to help if needed in current work of yours.
>>>>>>
>>>>>> Cheers,
>>>>>> Kasia
>>>>>>
>>>>>> On Mon, Mar 9, 2020 at 9:41 AM Elias Djurfeldt <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Cool Shashanka! Feel free to tag me in the JIRA and update me on any
>>>>>>> progress / ponderings.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Elias
>>>>>>>
>>>>>>> On Sat, 7 Mar 2020 at 03:43, Chamikara Jayalath <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Absolutely. Please create a JIRA and coordinate with Elias and any
>>>>>>>> others that would like to contribute to this.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Cham
>>>>>>>>
>>>>>>>> On Fri, Mar 6, 2020 at 10:46 AM Shashanka Balakuntala <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Chamikara and Elias,
>>>>>>>>> This seems like an interesting feature. Can I start working on
>>>>>>>>> this?
>>>>>>>>> *Regards*
>>>>>>>>> Shashanka Balakuntala Srinivasa
>>>>>>>>>
>>>>>>>>> On Sat, Mar 7, 2020 at 12:00 AM Chamikara Jayalath <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> I don't think we have this but contributions are welcome.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Cham
>>>>>>>>>>
>>>>>>>>>> On Tue, Mar 3, 2020 at 4:46 AM Elias Djurfeldt <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> I've stumbled upon a use case where I might need a SnowflakeIO
>>>>>>>>>>> in Python. Has anyone worked on this before or are there any
>>>>>>>>>>> discussions
>>>>>>>>>>> surrounding it?
>>>>>>>>>>>
>>>>>>>>>>> There is a Snowflake Python library available [1], so looks
>>>>>>>>>>> feasible to implement in Beam.
>>>>>>>>>>>
>>>>>>>>>>> [1]
>>>>>>>>>>> https://docs.snowflake.net/manuals/user-guide/python-connector.html
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Elias
>>>>>>>>>>>

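For anyone picking up step (2) from the reply at the top of this thread, here is a rough sketch of what a Python wrapper could look like, loosely modelled on the Kafka wrapper linked as [3]. Nothing here comes from an actual SnowflakeIO implementation: the URN, the `SnowflakeReadConfig` fields and the `build_read_transform` helper are all made-up names for illustration, and the URN would have to match whatever the Java registrar from step (1) ends up exposing. The Beam import is done lazily inside the helper, so the configuration part can be read and tested without Beam installed.

```python
import typing

# Hypothetical URN: it must match the one registered on the Java side (step 1).
SNOWFLAKE_READ_URN = "beam:external:java:snowflake:read:v1"


class SnowflakeReadConfig(typing.NamedTuple):
    """Hypothetical configuration fields handed to the Java builder."""
    server_name: str
    database: str
    schema: str
    table: str


def build_read_transform(config, expansion_service):
    """Wrap the Java transform so it can be applied in a Python pipeline.

    Assumes apache_beam is installed; it is imported here rather than at
    module level so the config class above stays usable without it.
    """
    from apache_beam.transforms.external import (
        ExternalTransform,
        NamedTupleBasedPayloadBuilder,
    )

    # The payload builder encodes the NamedTuple fields for the Java builder;
    # expansion_service is the address of a running expansion service.
    return ExternalTransform(
        SNOWFLAKE_READ_URN,
        NamedTupleBasedPayloadBuilder(config),
        expansion_service,
    )
```

Usage would then be something along the lines of `pipeline | build_read_transform(SnowflakeReadConfig(...), "localhost:8097")`, where the address is just a placeholder for wherever the expansion service is listening.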