Currently, the CassandraIO connector lets a user specify a table, and the CassandraSource object generates a list of queries based on the table's token ranges and groups those queries by token range.
I often need to run generated queries (sometimes a million or more) against a subset of a table. Rather than providing a filter, it is easier and much more performant to supply a collection of queries along with their tokens, which are used both to partition and to group the queries. This avoids having CassandraIO naively scan the entire table, with or without a simple filter.

I propose that, in addition to the current method of supplying a table and a filter, the user also be allowed to pass in a collection of queries and tokens. The way CassandraSource currently breaks up the table could then be rebuilt on top of the proposed implementation, which would also reduce code duplication.

If this sounds like an acceptable alternative way of using the CassandraIO connector, I don't mind giving it a shot with a pull request. If there is a better way of doing this, I'm eager to hear and learn. Thanks for reading!
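
To make the idea a bit more concrete, here is a rough sketch of how user-supplied queries could be grouped by token range. It is not existing CassandraIO code: the class `TokenGroupedQueries`, the `TokenQuery` holder, and the methods `splitIndex` and `groupByTokenRange` are all hypothetical names, and the sketch assumes the Murmur3Partitioner, whose tokens span the full signed 64-bit range, and that the caller already knows the token for each query.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper, not part of CassandraIO. */
public class TokenGroupedQueries {

    /** Hypothetical pairing of a CQL query with the token of the partition it reads. */
    public static final class TokenQuery {
        final String cql;
        final long token;

        public TokenQuery(String cql, long token) {
            this.cql = cql;
            this.token = token;
        }
    }

    /**
     * Maps a token to one of numSplits equal-width ranges covering
     * [Long.MIN_VALUE, Long.MAX_VALUE] (assumption: Murmur3Partitioner).
     * BigInteger is used so the offset from Long.MIN_VALUE cannot overflow.
     */
    public static int splitIndex(long token, int numSplits) {
        if (numSplits < 1) {
            throw new IllegalArgumentException("numSplits must be >= 1");
        }
        BigInteger offset =
            BigInteger.valueOf(token).subtract(BigInteger.valueOf(Long.MIN_VALUE));
        // floor(offset * numSplits / 2^64)
        return offset.multiply(BigInteger.valueOf(numSplits)).shiftRight(64).intValue();
    }

    /** Groups the supplied queries by the index of the token range they fall into. */
    public static Map<Integer, List<String>> groupByTokenRange(
            List<TokenQuery> queries, int numSplits) {
        Map<Integer, List<String>> groups = new TreeMap<>();
        for (TokenQuery q : queries) {
            groups.computeIfAbsent(splitIndex(q.token, numSplits), k -> new ArrayList<>())
                  .add(q.cql);
        }
        return groups;
    }
}
```

Each resulting group could then back one source split, so that only the supplied queries are executed rather than a scan over every token range. How the groups would actually be wired into CassandraSource is left open here.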
