I think that any valid SQL statement should work with any data source. Drill should:
- Push down as much processing as possible into the data source (Cassandra in this case)
- Maintain as much data locality as possible (i.e., spread the work so that each drillbit is handling local data)
- In the worst case, Drill should pull the entire table from the data source if that's what's needed to satisfy the query.

On Thu, Jan 8, 2015 at 8:29 AM, Yash Sharma <yash...@gmail.com> wrote:

> Hi Folks,
> This thread is to discuss few scenarios how Cassandra works - and how do we
> think it should be supported in Drill.
>
> While they are not supported in Cassandra inherently but its doable on
> Drill's end once we fetch a superset of data without these cases.
>
> 1. Filtering non indexed column in Cassandra
> 2. Filtering by subset of primary key
> 3. OR condition in where clause
>
> Should we apply filters at Drill's end and support these features or we
> propagate an error back to user for asking for a valid Cassandra based
> query?
>
> -----
> Examples:
> Here 'trending_now' is a dummy table with (id, rank, pog_id) where
> (id,rank) is primary key pair.
> 1.
> cqlsh:recsys> select * from trending_now where pog_id=10004 ;
> Bad Request: No indexed columns present in by-columns clause with Equal
> operator
>
> 2.
> cqlsh:recsys> select * from trending_now where rank=4;
> Bad Request: Cannot execute this query as it might involve data filtering
> and thus may have unpredictable performance. If you want to execute this
> query despite the performance unpredictability, use ALLOW FILTERING
> P.S. ALLOW FILTERING is not permitted in Cassandra java driver as of now.
>
> 3.
> cqlsh:recsys> select * from trending_now where rank=4 or id='id0004';
> Bad Request: line 1:40 missing EOF at 'or'
>
> 4. Valid Query:
> cqlsh:recsys> select * from trending_now where id='id0004' and rank=4;
>
>  id     | rank | pog_id
> --------+------+--------
>  id0004 |    4 |  10002
>
> (1 rows)
>
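
To make the "push down what you can, filter the rest in Drill" idea concrete, here is a rough Python sketch. It is not Drill or Cassandra code: `Eq`, `split_filters`, `scan` and `query` are hypothetical names, the sample rows other than id0004 are made up, and the push-down rule (push only when every primary key column is restricted) is a deliberate simplification of what Cassandra actually accepts.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


@dataclass(frozen=True)
class Eq:
    """A single `column = value` predicate (hypothetical helper type)."""
    column: str
    value: Any

    def matches(self, row: Row) -> bool:
        return row.get(self.column) == self.value


def split_filters(preds: List[Eq], primary_key: Tuple[str, ...]) -> Tuple[List[Eq], List[Eq]]:
    """Split AND-ed predicates into (pushed to the source, residual for Drill).

    Simplified assumption: the source accepts a filter only when every
    primary key column is restricted. Otherwise nothing is pushed down.
    """
    restricted = {p.column for p in preds}
    if all(col in restricted for col in primary_key):
        pushed = [p for p in preds if p.column in primary_key]
    else:
        pushed = []  # worst case: the whole table is pulled
    residual = [p for p in preds if p not in pushed]
    return pushed, residual


def scan(table: List[Row], pushed: List[Eq]) -> List[Row]:
    """Stand-in for the data source: returns rows matching the pushed filters."""
    return [row for row in table if all(p.matches(row) for p in pushed)]


def query(table: List[Row], preds: List[Eq], primary_key: Tuple[str, ...]) -> List[Row]:
    """Fetch a superset from the source, then apply the residual filters."""
    pushed, residual = split_filters(preds, primary_key)
    rows = scan(table, pushed)
    return [row for row in rows if all(p.matches(row) for p in residual)]


# Made-up sample data shaped like the 'trending_now' table in the thread.
TRENDING_NOW: List[Row] = [
    {"id": "id0001", "rank": 1, "pog_id": 10001},
    {"id": "id0004", "rank": 4, "pog_id": 10002},
    {"id": "id0005", "rank": 4, "pog_id": 10004},
]
PRIMARY_KEY = ("id", "rank")
```

With this sketch, `query(TRENDING_NOW, [Eq("pog_id", 10004)], PRIMARY_KEY)` falls back to a full scan and filters afterwards, while a query restricting both `id` and `rank` is pushed to the source in full. An OR condition is not modelled here; it could presumably take the same fallback path by evaluating the whole expression in the residual step.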