Hello Pablo, thank you for the response, and apologies for the delay.  I was
busy with work and also wanted to prove out what I was proposing against our
own code at my workplace.

Here is a small gist of what I'm proposing. 

https://gist.github.com/vmarquez/204b8f44b1279fdbae97b40f8681bc25
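
As a rough illustration of the idea (this is not the code in the gist), grouping
a user-supplied list of queries by token range might look something like the
sketch below.  TokenQuery, splitIndex and groupByTokenRange are made-up names
for this email, not CassandraIO API, and the sketch assumes Murmur3Partitioner
tokens, i.e. signed 64-bit values, split into equal-width ranges.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TokenQueryGrouping {

    /** Hypothetical pairing of a CQL statement with the token of its partition key. */
    static final class TokenQuery {
        final String cql;
        final long token;

        TokenQuery(String cql, long token) {
            this.cql = cql;
            this.token = token;
        }
    }

    /**
     * Maps a token to one of numSplits equal-width slices of the full
     * signed 64-bit token range (assumption: Murmur3Partitioner).
     */
    static int splitIndex(long token, int numSplits) {
        // Shift the token into the unsigned range 0 .. 2^64 - 1.
        BigInteger offset =
            BigInteger.valueOf(token).subtract(BigInteger.valueOf(Long.MIN_VALUE));
        // floor(offset * numSplits / 2^64) always lands in 0 .. numSplits - 1.
        return offset.multiply(BigInteger.valueOf(numSplits)).shiftRight(64).intValue();
    }

    /** Groups the supplied queries by the token-range slice they fall into. */
    static Map<Integer, List<TokenQuery>> groupByTokenRange(
            List<TokenQuery> queries, int numSplits) {
        Map<Integer, List<TokenQuery>> groups = new TreeMap<>();
        for (TokenQuery q : queries) {
            groups.computeIfAbsent(splitIndex(q.token, numSplits), k -> new ArrayList<>())
                  .add(q);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Table name and token values are invented for illustration only.
        List<TokenQuery> queries = Arrays.asList(
            new TokenQuery("SELECT * FROM ks.events WHERE id = 1", -9000000000000000000L),
            new TokenQuery("SELECT * FROM ks.events WHERE id = 2", -8999999999999999000L),
            new TokenQuery("SELECT * FROM ks.events WHERE id = 3", 42L));

        groupByTokenRange(queries, 4).forEach((split, group) ->
            System.out.println("split " + split + ": " + group.size() + " queries"));
    }
}
```

Each group could then be handed to a worker as one unit, so queries that hit
neighbouring token ranges run together rather than being scattered across the
whole table.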

I'm happy to explain more or even write up an official design doc if you think
that would be helpful in explaining things.

--Vincent 

On 2019/10/04 18:03:23, Pablo Estrada <[email protected]> wrote: 
> Hi Vincent!
> Do you think you could add some code snippets / pseudocode as to what this
> looks like? Feel free to do it on email, gist, google doc, etc?
> Best
> -P.
>
> On Thu, Oct 3, 2019 at 4:16 PM Vincent Marquez <[email protected]>
> wrote:
>
> > Currently the CassandraIO connector allows a user to specify a table, and
> > the CassandraSource object generates a list of queries based on token
> > ranges of the table, along with grouping them by the token ranges.
> >
> > I often need to run (generated, sometimes a million+) queries against a
> > subset of a table.  Instead of providing a filter, it is easier and much
> > more performant to supply a collection of queries along with their tokens
> > to both partition and group by, instead of letting CassandraIO naively run
> > over the entire table or with a simple filter.
> >
> > I propose in addition to the current method of supplying a table and
> > filter, also allowing the user to pass in a collection of queries and
> > tokens.   The current way CassandraSource breaks up the table could be
> > modified to build on top of the proposed implementation to reduce code
> > duplication as well.  If this sounds like an acceptable alternative way of
> > using the CassandraIO connector, I don't mind giving it a shot with a pull
> > request.
> >
> > If there is a better way of doing this, I'm eager to hear and learn.
> > Thanks for reading!
> >
>
