I think that any valid SQL statement should work with any data source.
Drill should:

   - Push down as much processing as possible into the data source
   (Cassandra in this case)
   - Maintain as much data locality as possible (i.e., spread the work so
   that each drillbit handles local data)
   - In the worst case, pull the entire table from the data
   source if that's what's needed to satisfy the query.
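
To make the push-down idea concrete, here is a rough Python sketch (not Drill code) of splitting AND-ed equality predicates into a part handed to Cassandra and a residual part filtered on Drill's side. The names Table, split_predicates and run_query are hypothetical, and the rules are a deliberately simplified assumption about what Cassandra accepts: the full partition key, then a prefix of the clustering columns, plus at most one indexed column.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical description of a Cassandra table's keys and indexes."""
    partition_key: list
    clustering_key: list
    indexed: frozenset = field(default_factory=frozenset)


def split_predicates(table, equals):
    """Split AND-ed equality predicates (column -> value) into
    (pushed, residual). Simplified rules, assumed for illustration only."""
    pushed, residual = {}, dict(equals)
    # Key columns are only usable when the whole partition key is restricted.
    if all(c in residual for c in table.partition_key):
        for c in table.partition_key:
            pushed[c] = residual.pop(c)
        # Clustering columns can follow, in order, with no gaps.
        for c in table.clustering_key:
            if c not in residual:
                break
            pushed[c] = residual.pop(c)
    # Push at most one indexed column; the rest stays with the caller.
    for c in list(residual):
        if c in table.indexed:
            pushed[c] = residual.pop(c)
            break
    return pushed, residual


def run_query(table, rows, equals):
    """Fetch a superset using the pushed predicates, then filter locally."""
    pushed, residual = split_predicates(table, equals)
    # Stand-in for the data source: only pushed predicates are applied here.
    fetched = [r for r in rows if all(r[c] == v for c, v in pushed.items())]
    # Residual predicates are applied on the query engine's side.
    return [r for r in fetched if all(r[c] == v for c, v in residual.items())]


# The table from the quoted message: primary key (id, rank), with id
# assumed to be the partition key and rank the clustering column.
trending_now = Table(partition_key=["id"], clustering_key=["rank"])
```

With this table, a filter on pog_id or on rank alone ends up entirely in the residual set, so the whole table is fetched and filtered locally, while id plus rank is pushed down in full. An OR condition is not handled by this sketch; under the same approach it would simply be treated as residual.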


On Thu, Jan 8, 2015 at 8:29 AM, Yash Sharma <yash...@gmail.com> wrote:

> Hi Folks,
> This thread is to discuss a few scenarios in how Cassandra works, and how we
> think they should be supported in Drill.
>
> While Cassandra does not support them natively, they are doable on
> Drill's end once we fetch a superset of the data without these conditions.
>
> 1. Filtering on a non-indexed column in Cassandra
> 2. Filtering by a subset of the primary key
> 3. OR condition in the WHERE clause
>
> Should we apply filters at Drill's end and support these features, or should
> we propagate an error back to the user asking for a valid Cassandra-based
> query?
>
> -----
> Examples:
> Here 'trending_now' is a dummy table with (id, rank, pog_id) where
> (id,rank) is primary key pair.
> 1.
> cqlsh:recsys> select * from trending_now where pog_id=10004 ;
> Bad Request: No indexed columns present in by-columns clause with Equal
> operator
>
> 2.
> cqlsh:recsys> select * from trending_now where rank=4;
> Bad Request: Cannot execute this query as it might involve data filtering
> and thus may have unpredictable performance. If you want to execute this
> query despite the performance unpredictability, use ALLOW FILTERING
> P.S. ALLOW FILTERING is not permitted in Cassandra java driver as of now.
>
> 3.
> cqlsh:recsys> select * from trending_now where rank=4 or id='id0004';
> Bad Request: line 1:40 missing EOF at 'or'
>
> 4. Valid Query:
> cqlsh:recsys> select * from trending_now where id='id0004' and rank=4;
>
>  id     | rank | pog_id
> --------+------+--------
>  id0004 |    4 |  10002
>
> (1 rows)
>
