For reference, this is the JIRA ticket:
https://issues.apache.org/jira/browse/BEAM-9008
The improvement makes total sense, and the change in the internal
implementation from BoundedSource to ParDo has no backwards-compatibility
consequences for end users, so it looks good. This connector does not support
Dynamic Work Rebalancing, so there won't be any difference at runtime, and this
refactor could be the base for an SDF-based implementation.
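A minimal, library-free sketch of why a ParDo-based design lets 'read' and 'readAll' share code, assuming the common pattern where the single-query read is just readAll applied to a one-element input. All names here (`ReadAllSketch`, `Query`, `readOne`, `readAll`, `read`) are hypothetical and are not the CassandraIO or Beam API; in Beam, the loop in `readAll` would be a ParDo over a PCollection of queries, and `readOne` would be the body of a DoFn that actually queries Cassandra.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, library-free model of "read() as a special case of readAll()".
// None of these names belong to CassandraIO or Beam.
final class ReadAllSketch {

    // Hypothetical query description: a table plus a token range [startToken, endToken).
    static final class Query {
        final String table;
        final long startToken;
        final long endToken;

        Query(String table, long startToken, long endToken) {
            this.table = table;
            this.startToken = startToken;
            this.endToken = endToken;
        }
    }

    // Stand-in for the per-element work a DoFn would do: "execute" one query
    // and emit one placeholder row per token in its range (no real Cassandra I/O).
    static List<String> readOne(Query q) {
        List<String> rows = new ArrayList<>();
        for (long t = q.startToken; t < q.endToken; t++) {
            rows.add(q.table + ":" + t);
        }
        return rows;
    }

    // readAll: the shared core, applied to every query in the input.
    // In Beam this sequential loop would instead be a ParDo over a PCollection.
    static List<String> readAll(List<Query> queries) {
        List<String> out = new ArrayList<>();
        for (Query q : queries) {
            out.addAll(readOne(q));
        }
        return out;
    }

    // read: the single-query entry point reuses readAll on a one-element input,
    // so both entry points run exactly the same reading code.
    static List<String> read(Query q) {
        return readAll(Collections.singletonList(q));
    }
}
```

With this shape, `read(new Query("t", 0, 3))` and `readAll(Collections.singletonList(new Query("t", 0, 3)))` return the same rows, because only `readOne` contains reading logic.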

I added you as a contributor in JIRA and assigned the ticket to you,
Vincent. Great to see this one happening. Welcome to the project!

Regards,
Ismaël

On Fri, Dec 20, 2019 at 5:48 AM Vincent Marquez <vincent.marq...@gmail.com>
wrote:

>
>
> On Thu, Dec 12, 2019 at 8:43 PM Kenneth Knowles <k...@apache.org> wrote:
>
>> On Thu, Dec 12, 2019 at 3:30 PM Vincent Marquez <
>> vincent.marq...@gmail.com> wrote:
>>
>>> Hello, as I've mentioned in previous emails, I've found the CassandraIO
>>> connector lacking some essential features for efficient batch processing in
>>> real-world scenarios. We've developed a more fully featured connector and
>>> have had good results with it.
>>>
>>
>> Fantastic!
>>
>>
>>> Could I perhaps write up a JIRA proposal for some minor changes to the
>>> current connector that might improve things?
>>>
>>
>> Yes!
>>
>>
>>> The main pain point is the absence of a 'readAll' method, as I
>>> documented here:
>>>
>>> https://gist.github.com/vmarquez/204b8f44b1279fdbae97b40f8681bc25
>>>
>>> If I could write up a ticket, I wouldn't mind also submitting a small PR on
>>> GH addressing this lack of functionality. Thanks for your time.
>>>
>>
>> This would be excellent. Since it seems you already have implemented and
>> tested the functionality, a simple Jira with a title and description would
>> be enough, and then open a PR linked to the Jira with a title like
>> "[BEAM-1234567] Improve performance of CassandraIO"
>>
>
> I should clarify a bit. What has already been done and tested is a custom
> connector with 'readAll' CassandraIO functionality; I did not modify
> the existing Beam connector. However, I spent some time over the last couple
> of days looking over the details of the current CassandraIO connector to
> verify that it would be feasible for me to add something similar while still
> maintaining all the current functionality.
>
> To share code between the 'read' and 'readAll' styles of
> CassandraIO, I'd want to change the current Source-based connector into
> a ParDo-based one, so there is a refactor involved (a minor one, in my
> opinion, relative to the project). I'm happy to explain in more detail in
> the JIRA.
>
>> Thank you for writing to dev@ to share your experience and intentions. We
>> are happy to help you with the Jira and PR, and find the best reviewers, if
>> you will open them to get started.
>>
>> Kenn
>>
>
> Thank you!
>
>
>
>>
>>> *-Vincent*
>>>
>>
>
>
