Aditya Kishore created DRILL-683:
------------------------------------

             Summary: Qualify HBase scan with specified columns even if row_key 
is required.
                 Key: DRILL-683
                 URL: https://issues.apache.org/jira/browse/DRILL-683
             Project: Apache Drill
          Issue Type: Task
            Reporter: Aditya Kishore
            Assignee: Aditya Kishore
         Attachments: 
DRILL-683-Qualify-HBase-scan-with-specified-columns-.patch, 
DRILL-683-Qualify-HBase-scan-with-specified-columns-.patch

Currently (as of 
https://github.com/apache/incubator-drill/commit/612527bd22c27aa92363d2297a9c2b4a05475fd0),
 if row_key is specified as one of the projected columns in a query, we do not 
qualify the HBase scan with the specified cf\[:column qualifier].

This is done because, if we qualify the scan with the columns and a row contains 
NONE of them (but does contain other columns, so the row and hence its row_key 
exists), HBase will not return that row at all, not even its row key.

For example, consider the sample query:
{{SELECT row_key, f\['c1'], f\['c7'] from hbase.MyTable;}}
and suppose the table contains the following row:

{noformat}
row_key     f['c2']     f['c3']     f['c6']
---------------------------------------------
  row1       val1        val2        val3
{noformat}

If we qualify the HBase scan with {{f\['c1'], f\['c7']}}, the row {{row1}} is 
dropped from the scan result.
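
The dropping behavior can be modeled in a few lines of Python. This is only a toy in-memory model, not the HBase client API: {{qualified_scan}} and the dict-based table layout are hypothetical names introduced for illustration.

```python
def qualified_scan(table, columns):
    """Toy model of a column-qualified scan.

    `table` maps row_key -> {qualifier: value} for a single column family.
    Only the requested qualifiers are returned, and a row with none of
    them produces no result at all (so its row key is never seen).
    """
    wanted = set(columns)
    results = {}
    for row_key, cells in table.items():
        matched = {q: v for q, v in cells.items() if q in wanted}
        if matched:  # an empty result is not emitted
            results[row_key] = matched
    return results


# The sample row from above: it has c2, c3 and c6, but neither c1 nor c7.
my_table = {"row1": {"c2": "val1", "c3": "val2", "c6": "val3"}}

print(qualified_scan(my_table, ["c1", "c7"]))  # row1 is absent
print(qualified_scan(my_table, ["c2"]))        # row1 is present
```

In this model the row key is only observable through a matching cell, which is why {{row1}} disappears when neither requested column exists.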

However, not qualifying the scan has a severe impact on scan performance, since 
every column of every row is then read, not just the requested ones.

Hence we propose that if NONE of the columns specified in the query are present 
in a row, the entire row is omitted from the scan result.
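
As a rough illustration of the proposed behavior, the sketch below projects {{row_key}}, {{f\['c1']}} and {{f\['c7']}} over a small in-memory table. It is a simplified model, not Drill code: {{project}} is a hypothetical helper, the rows {{row2}} and {{row3}} are invented for the example, and representing a missing column as {{None}} is an assumption of the sketch.

```python
def project(table, columns):
    """Return (row_key, value, ...) tuples under the proposed behavior.

    A row is emitted if it has at least one of `columns`; columns it
    lacks are filled with None (an assumption of this sketch). A row
    with none of them is omitted entirely.
    """
    out = []
    for row_key, cells in sorted(table.items()):
        if not any(c in cells for c in columns):
            continue  # NONE of the requested columns exist: skip the row
        out.append((row_key,) + tuple(cells.get(c) for c in columns))
    return out


table = {
    "row1": {"c2": "val1", "c3": "val2", "c6": "val3"},  # from the example
    "row2": {"c1": "a", "c2": "b"},                      # hypothetical
    "row3": {"c1": "x", "c7": "y"},                      # hypothetical
}

for row in project(table, ["c1", "c7"]):
    print(row)
```

Here {{row1}} is omitted, while {{row2}} is still returned even though it has only one of the two requested columns.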



--
This message was sent by Atlassian JIRA
(v6.2#6252)
