Just want to mention that we have been working with Vincent in the ReadAll implementation for Cassandra based on normal DoFn, and we expect to get it merged for the next release of Beam. Vincent is familiarized now with DoFn based IO composition, a first step towards SDF understanding. Vincent you can think of our Cassandra RingRange as a Restriction in the context of SDF. Just for reference it would be good to read in advance these two:
https://beam.apache.org/blog/splittable-do-fn/ https://beam.apache.org/documentation/programming-guide/#sdf-basics Thanks Boyuan for offering your help I think it is really needed considering that we don't have many Unbounded SDF connectors to use as reference. On Thu, Nov 19, 2020 at 11:16 PM Boyuan Zhang <boyu...@google.com> wrote: > > > > On Thu, Nov 19, 2020 at 1:29 PM Vincent Marquez <vincent.marq...@gmail.com> > wrote: >> >> >> >> >> On Thu, Nov 19, 2020 at 10:38 AM Boyuan Zhang <boyu...@google.com> wrote: >>> >>> Hi Vincent, >>> >>> Thanks for your contribution! I'm happy to work with you on this when you >>> contribute the code into Beam. >> >> >> Should I write up a JIRA to start? I have access, I've already been in the >> process of contributing some big changes to the CassandraIO connector. > > > Yes, please create a JIRA and assign it to yourself. > >> >> >>> >>> >>> Another thing is that it would be preferable to use Splittable DoFn instead >>> of using UnboundedSource to write a new IO. >> >> >> I would prefer to use the UnboundedSource connector, I've already written >> most of it, but also, I see some challenges using Splittable DoFn for Redis >> streams. >> >> Unlike Kafka and Kinesis, Redis Streams offsets are not simply monotonically >> increasing counters, so there is not a way to just claim a chunk of work >> and know that the chunk has any actual data in it. >> >> Since UnboundedSource is not yet deprecated, could I contribute that after >> finishing up some test aspects, and then perhaps we can implement a >> Splittable DoFn version? > > > It would be nice not to build new IOs on top of UnboundedSource. Currently we > already have the wrapper class which translates the existing UnboundedSource > into Unbounded Splittable DoFn and executes the UnboundedSource as the > Splittable DoFn. How about you open a WIP PR and we go through the > UnboundedSource implementation together to figure out a design for using > Splittable DoFn? > > >> >> >> >>> >>> >>> On Thu, Nov 19, 2020 at 10:18 AM Vincent Marquez >>> <vincent.marq...@gmail.com> wrote: >>>> >>>> Currently, Redis offers a streaming queue functionality similar to >>>> Kafka/Kinesis/Pubsub, but Beam does not currently have a connector for it. >>>> >>>> I've written an UnboundedSource connector that makes use of Redis Streams >>>> as a POC and it seems to work well. >>>> >>>> If someone is willing to work with me, I could write up a JIRA and/or open >>>> up a WIP pull request if there is interest in getting this as an official >>>> connector. I would mostly need guidance on naming/testing aspects. >>>> >>>> https://redis.io/topics/streams-intro >>>> >>>> ~Vincent >> >> >> ~Vincent