[jira] [Commented] (BEAM-3788) Implement a Kafka IO for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117906#comment-17117906 ] Chamikara Madhusanka Jayalath commented on BEAM-3788: - KafkaIO is expected to be available as a cross-language transforms for Dataflow with Beam 2.22. Flink/Spark already support this. So closing the JIRA. We can track further improvements to the cross-language KafkaIO in other JIRAs. > Implement a Kafka IO for Python SDK > --- > > Key: BEAM-3788 > URL: https://issues.apache.org/jira/browse/BEAM-3788 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: P2 > Fix For: 2.22.0 > > > Java KafkaIO will be made available to Python users as a cross-language > transform. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-3788) Implement a Kafka IO for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17059941#comment-17059941 ] Nicolae Rosia commented on BEAM-3788: - until this is fixed, is there a way to implement an unbounded source in Python? If yes, then I could work around this by using a Kafka library in Python, such as [https://pypi.org/project/kafka-python/] > Implement a Kafka IO for Python SDK > --- > > Key: BEAM-3788 > URL: https://issues.apache.org/jira/browse/BEAM-3788 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > > Java KafkaIO will be made available to Python users as a cross-language > transform. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-3788) Implement a Kafka IO for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17041230#comment-17041230 ] Brian Hulette commented on BEAM-3788: - [~chadrik] for the coder issue: I'm +1 on the approach you suggested in your [BEAM-7870 comment|https://issues.apache.org/jira/browse/BEAM-7870?focusedCommentId=16959135=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16959135] - use portable row coder as the transport, then on the Python side make a PubsubMessage TypedDict (and possibly manually convert to a different more useful type, until we have BEAM-8732 and we can do this more cleanly). So there's still work to do there, but there's a clear path forward. > Implement a Kafka IO for Python SDK > --- > > Key: BEAM-3788 > URL: https://issues.apache.org/jira/browse/BEAM-3788 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > > Java KafkaIO will be made available to Python users as a cross-language > transform. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-3788) Implement a Kafka IO for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17040565#comment-17040565 ] Chamikara Madhusanka Jayalath commented on BEAM-3788: - Added a comment to the email thread. I think this is a Flink specific issue that should go away after SDF. > Implement a Kafka IO for Python SDK > --- > > Key: BEAM-3788 > URL: https://issues.apache.org/jira/browse/BEAM-3788 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > > Java KafkaIO will be made available to Python users as a cross-language > transform. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-3788) Implement a Kafka IO for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17040558#comment-17040558 ] Chad Dombrova commented on BEAM-3788: - Unless something has changed recently, https://issues.apache.org/jira/browse/BEAM-7870 is still a blocker for using KafkaIO in python out of the box. As the title suggests, it's also blocking PubSubIO in python and conceptually any external transform with a non-trivial coder. [~mxm], [~bhulette] has anything changed on that issue lately? > Implement a Kafka IO for Python SDK > --- > > Key: BEAM-3788 > URL: https://issues.apache.org/jira/browse/BEAM-3788 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > > Java KafkaIO will be made available to Python users as a cross-language > transform. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-3788) Implement a Kafka IO for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17036376#comment-17036376 ] Chamikara Madhusanka Jayalath commented on BEAM-3788: - We are actively working on making Java KafkaIO available to Python users as a cross-language transform for all runners. Current solution [1]makes Java Kafka source which is based on the UnboundedSource framework available through cross-language transforms framework. But this will not really work for any portable runners without transform overrides since UnboundedSource framework is not available for portable runners (This currently works for Flink since FlinkRunner overrides this source with a native source implementation). We need a Kafka cross-language transform based on a SDF version of KafkaIO which can be made available to Python users. On the other hand we are working on an UnboundedSource to SDF converter. So this can just be a cross-language transform for a Kafka transform that uses a SDF for existing Kafka UnboundedSource generated through this converter. [1][https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/external/kafka.py] > Implement a Kafka IO for Python SDK > --- > > Key: BEAM-3788 > URL: https://issues.apache.org/jira/browse/BEAM-3788 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > > Java KafkaIO will be made available to Python users as a cross-language > transform. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-3788) Implement a Kafka IO for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982762#comment-16982762 ] Jing Chen commented on BEAM-3788: - i am curious if there is a plan to integrate confluent's schema registry etc or simply vanilla kafka > Implement a Kafka IO for Python SDK > --- > > Key: BEAM-3788 > URL: https://issues.apache.org/jira/browse/BEAM-3788 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Priority: Major > > This will be implemented using the Splittable DoFn framework. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-3788) Implement a Kafka IO for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16956261#comment-16956261 ] Chamikara Madhusanka Jayalath commented on BEAM-3788: - I believe unbounded SDF support for Python SDK is few months away. We are also working on adding cross-language transforms support to Dataflow but I don't have an ETA yet. As I mentioned in a previous comment either of these will pave the way for a KafkaIO in Python SDK on Dataflow. KafkaIO is already available as a cross-language transform for Flink: [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/external/kafka.py] > Implement a Kafka IO for Python SDK > --- > > Key: BEAM-3788 > URL: https://issues.apache.org/jira/browse/BEAM-3788 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Priority: Major > > This will be implemented using the Splittable DoFn framework. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-3788) Implement a Kafka IO for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955943#comment-16955943 ] Chethan UK commented on BEAM-3788: -- [~chamikara] Any Docs? Wanted to use Kafka in Dataflow pipelines... Thanks!. > Implement a Kafka IO for Python SDK > --- > > Key: BEAM-3788 > URL: https://issues.apache.org/jira/browse/BEAM-3788 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Priority: Major > > This will be implemented using the Splittable DoFn framework. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-3788) Implement a Kafka IO for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16860419#comment-16860419 ] Chamikara Jayalath commented on BEAM-3788: -- This is blocked till we have a streaming source framework for Python SDK for portable runners. To this end, Splittable DoFn is currently under development. Note that, for folks using Flink runner, native Kafka source of Flink is currently available to Python SDK users through the cross-language transform API. > Implement a Kafka IO for Python SDK > --- > > Key: BEAM-3788 > URL: https://issues.apache.org/jira/browse/BEAM-3788 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Jayalath >Assignee: Chamikara Jayalath >Priority: Major > > This will be implemented using the Splittable DoFn framework. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-3788) Implement a Kafka IO for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859396#comment-16859396 ] Willem Pienaar commented on BEAM-3788: -- [~chamikara]:Is there anything still blocking this from being implemented? Might have some time over the next few weeks. > Implement a Kafka IO for Python SDK > --- > > Key: BEAM-3788 > URL: https://issues.apache.org/jira/browse/BEAM-3788 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Jayalath >Assignee: Chamikara Jayalath >Priority: Major > > This will be implemented using the Splittable DoFn framework. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-3788) Implement a Kafka IO for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1683#comment-1683 ] Chad Dombrova commented on BEAM-3788: - Hi, any updates on this? > Implement a Kafka IO for Python SDK > --- > > Key: BEAM-3788 > URL: https://issues.apache.org/jira/browse/BEAM-3788 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Jayalath >Assignee: Chamikara Jayalath >Priority: Major > > This will be implemented using the Splittable DoFn framework. -- This message was sent by Atlassian JIRA (v7.6.3#76005)