Aditya Kishore created DRILL-683:
------------------------------------
Summary: Qualify HBase scan with specified columns even if row_key is required.
Key: DRILL-683
URL: https://issues.apache.org/jira/browse/DRILL-683
Project: Apache Drill
Issue Type: Task
Reporter: Aditya Kishore
Assignee: Aditya Kishore
Attachments:
DRILL-683-Qualify-HBase-scan-with-specified-columns-.patch,
DRILL-683-Qualify-HBase-scan-with-specified-columns-.patch
Currently (as of
https://github.com/apache/incubator-drill/commit/612527bd22c27aa92363d2297a9c2b4a05475fd0),
if row_key is specified as one of the projected columns in a query, we do not
qualify the HBase scan with the specified cf\[:column qualifier] columns.
This is because, if we qualify the scan with those columns and a row contains
NONE of them (but does contain other columns, which means the row, and hence its
row_key, exists), HBase will not return that row at all, not even its row key.
For example, consider the sample query:
{{SELECT row_key, f\['c1'], f\['c7'] from hbase.MyTable;}}
and a table that contains the following row:
{noformat}
row_key f['c2'] f['c3'] f['c6']
---------------------------------------------
row1 val1 val2 val3
{noformat}
If we qualify the HBase scan with {{f\['c1'], f\['c7']}}, then row {{row1}}
will be dropped from the scan result, even though it exists in the table.
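The dropping behavior can be modeled in a few lines. The sketch below is a hypothetical in-memory model, not the HBase client API: a table is a dict from row key to a dict of {{family:qualifier}} cells, and the made-up function {{hbase_scan}} emits a row from a qualified scan only if at least one requested column is present, mirroring the HBase behavior described above.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory table: row key -> {"family:qualifier": value}.
Table = Dict[str, Dict[str, str]]


def hbase_scan(table: Table,
               columns: Optional[List[str]] = None) -> List[Tuple[str, Dict[str, str]]]:
    """Model of an HBase scan (illustrative only).

    columns=None models an unqualified scan: every cell of every row is returned.
    A column list models a qualified scan: only those cells are returned, and a
    row holding none of them produces no result at all, not even its row key.
    """
    results = []
    for row_key in sorted(table):  # rows emitted in sorted row-key order, as in HBase
        cells = table[row_key]
        if columns is None:
            results.append((row_key, dict(cells)))
            continue
        selected = {c: v for c, v in cells.items() if c in columns}
        if selected:  # a row with no matching cells is silently dropped
            results.append((row_key, selected))
    return results


# The example row from this issue: row1 has f:c2, f:c3 and f:c6 only.
table = {"row1": {"f:c2": "val1", "f:c3": "val2", "f:c6": "val3"}}

print(hbase_scan(table, ["f:c1", "f:c7"]))  # → []
print(hbase_scan(table))                    # row1 comes back with all three cells
```

The unqualified call shows why the current workaround is correct but expensive: it returns every cell of the row just to recover its key.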
However, not qualifying the scan would severely degrade scan performance, since
HBase then returns every column of every row.
Hence we propose the following behavior: if NONE of the columns specified in the
query are present in a row, the entire row will be omitted from the scan result.
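A minimal sketch of the result set the proposal implies for a query such as {{SELECT row_key, f\['c1'], f\['c7']}}, using a hypothetical in-memory table (row key mapped to a dict of {{family:qualifier}} cells). The function name {{project_rows}} and the extra row {{row2}} are invented for illustration, and the sketch assumes that projected columns missing from a returned row come back as null ({{None}}).

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory table: row key -> {"family:qualifier": value}.
Table = Dict[str, Dict[str, str]]


def project_rows(table: Table, columns: List[str]) -> List[Tuple[Optional[str], ...]]:
    """Sketch of the proposed semantics for SELECT row_key, <columns>.

    A row with at least one requested column yields
    (row_key, value-or-None for each requested column).
    A row with NONE of the requested columns is omitted entirely.
    """
    out = []
    for row_key in sorted(table):
        cells = table[row_key]
        if not any(c in cells for c in columns):
            continue  # no projected column present: the row is omitted
        out.append((row_key, *(cells.get(c) for c in columns)))
    return out


table = {
    "row1": {"f:c2": "val1", "f:c3": "val2", "f:c6": "val3"},  # example row from this issue
    "row2": {"f:c1": "a", "f:c6": "b"},                         # hypothetical extra row
}
print(project_rows(table, ["f:c1", "f:c7"]))  # → [('row2', 'a', None)]
```

Under this behavior {{row1}} is omitted, which is the trade-off the proposal accepts in exchange for a qualified (and therefore cheaper) scan.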
--
This message was sent by Atlassian JIRA
(v6.2#6252)