I have opened a PR for BEAM-9008. I wasn't sure whether I should trigger any of the CI 'checks' on the PR myself, so please let me know if I need to, and flag any other changes or issues. Thanks.

On Fri, Dec 20, 2019 at 7:20 AM Ismaël Mejía <ieme...@gmail.com> wrote:

> For ref this is the JIRA ticket
> https://issues.apache.org/jira/browse/BEAM-9008
>
> The improvement makes total sense and the change in the internal
> implementation from BoundedSource to ParDo has no backwards consequences
> for the final users so looks good. This connector does not support Dynamic
> Work Rebalancing so there won't be any difference at runtime and this
> refactor could be the base for an SDF-based implementation.
>
> I added you as a contributor in JIRA and assigned the ticket to you,
> Vincent. Great to see this one happening. Welcome to the project!
>
> Regards,
> Ismaël
>
> On Fri, Dec 20, 2019 at 5:48 AM Vincent Marquez <vincent.marq...@gmail.com>
> wrote:
>
>> On Thu, Dec 12, 2019 at 8:43 PM Kenneth Knowles <k...@apache.org> wrote:
>>
>>> On Thu, Dec 12, 2019 at 3:30 PM Vincent Marquez <
>>> vincent.marq...@gmail.com> wrote:
>>>
>>>> Hello, as I've mentioned in previous emails, I've found the CassandraIO
>>>> connector lacking some essential features for efficient batch processing in
>>>> real world scenarios. We've developed a more fully featured connector and
>>>> had good results with it.
>>>
>>> Fantastic!
>>>
>>>> Could I perhaps write up a JIRA proposal for some minor changes to the
>>>> current connector that might improve things?
>>>
>>> Yes!
>>>
>>>> The main pain point is the absence of a 'readAll' method as I
>>>> documented here:
>>>>
>>>> https://gist.github.com/vmarquez/204b8f44b1279fdbae97b40f8681bc25
>>>>
>>>> If I could write up a ticket, I don't mind submitting a small PR on GH
>>>> as well addressing this lack of functionality. Thanks for your time.
>>>
>>> This would be excellent. Since it seems you already have implemented and
>>> tested the functionality, a simple Jira with a title and description would
>>> be enough, and then open a PR linked to the Jira with a title like
>>> "[BEAM-1234567] Improve performance of CassandraIO"
>>
>> I should clarify a bit. What has already been done and tested is a
>> custom connector that has 'readAll' CassandraIO functionality; I did not
>> modify the existing Beam connector. However, I spent some time the last
>> couple days looking over the details of the current CassandraIO connector
>> to verify it would be doable for me to add something similar and still
>> maintain all the current functionality.
>>
>> To share some code between both the 'read' and 'readAll' styles of
>> CassandraIO, I'd want to modify the current 'Source' based 'connector' to
>> be a 'ParDo' based one, so there is a minor (in my opinion, relative to the
>> project) refactor involved. I'm happy to explain in more detail in the
>> JIRA.
>>
>>> Thank you for writing to dev@ to share your experience and intentions.
>>> We are happy to help you with the Jira and PR, and find the best reviewers,
>>> if you will open them to get started.
>>>
>>> Kenn
>>
>> Thank you!
>>
>>>> *-Vincent*

--
*-Vincent*