Just want to mention that we have been working with Vincent in the
ReadAll implementation for Cassandra based on normal DoFn, and we
expect to get it merged for the next release of Beam. Vincent is
familiarized now with DoFn based IO composition, a first step towards
SDF understanding. Vincent you can think of our Cassandra RingRange as
a Restriction in the context of SDF. Just for reference it would be
good to read in advance these two:

https://beam.apache.org/blog/splittable-do-fn/
https://beam.apache.org/documentation/programming-guide/#sdf-basics

Thanks Boyuan for offering your help I think it is really needed
considering that we don't have many Unbounded SDF connectors to use as
reference.

On Thu, Nov 19, 2020 at 11:16 PM Boyuan Zhang <boyu...@google.com> wrote:
>
>
>
> On Thu, Nov 19, 2020 at 1:29 PM Vincent Marquez <vincent.marq...@gmail.com> 
> wrote:
>>
>>
>>
>>
>> On Thu, Nov 19, 2020 at 10:38 AM Boyuan Zhang <boyu...@google.com> wrote:
>>>
>>> Hi Vincent,
>>>
>>> Thanks for your contribution! I'm happy to work with you on this when you 
>>> contribute the code into Beam.
>>
>>
>> Should I write up a JIRA to start?  I have access, I've already been in the 
>> process of contributing some big changes to the CassandraIO connector.
>
>
> Yes, please create a JIRA and assign it to yourself.
>
>>
>>
>>>
>>>
>>> Another thing is that it would be preferable to use Splittable DoFn instead 
>>> of using UnboundedSource to write a new IO.
>>
>>
>> I would prefer to use the UnboundedSource connector, I've already written 
>> most of it, but also, I see some challenges using Splittable DoFn for Redis 
>> streams.
>>
>> Unlike Kafka and Kinesis, Redis Streams offsets are not simply monotonically 
>> increasing counters, so there is not a way  to just claim a chunk of work 
>> and know that the chunk has any actual data in it.
>>
>> Since UnboundedSource is not yet deprecated, could I contribute that after 
>> finishing up some test aspects, and then perhaps we can implement a 
>> Splittable DoFn version?
>
>
> It would be nice not to build new IOs on top of UnboundedSource. Currently we 
> already have the wrapper class which translates the existing UnboundedSource 
> into Unbounded Splittable DoFn and executes the UnboundedSource as the 
> Splittable DoFn. How about you open a WIP PR and we go through the 
> UnboundedSource implementation together to figure out a design for using 
> Splittable DoFn?
>
>
>>
>>
>>
>>>
>>>
>>> On Thu, Nov 19, 2020 at 10:18 AM Vincent Marquez 
>>> <vincent.marq...@gmail.com> wrote:
>>>>
>>>> Currently, Redis offers a streaming queue functionality similar to 
>>>> Kafka/Kinesis/Pubsub, but Beam does not currently have a connector for it.
>>>>
>>>> I've written an UnboundedSource connector that makes use of Redis Streams 
>>>> as a POC and it seems to work well.
>>>>
>>>> If someone is willing to work with me, I could write up a JIRA and/or open 
>>>> up a WIP pull request if there is interest in getting this as an official 
>>>> connector.  I would mostly need guidance on naming/testing aspects.
>>>>
>>>> https://redis.io/topics/streams-intro
>>>>
>>>> ~Vincent
>>
>>
>> ~Vincent

Reply via email to