[jira] [Commented] (DRILL-5830) Resolve regressions to MapR DB from DRILL-5546

ASF GitHub Bot (JIRA) Tue, 03 Oct 2017 21:52:20 -0700

    [ 
https://issues.apache.org/jira/browse/DRILL-5830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16190812#comment-16190812
 ]


ASF GitHub Bot commented on DRILL-5830:
---------------------------------------

Github user jinfengni commented on the issue:

    https://github.com/apache/drill/pull/968
  
    The project push-down rule may not work the way we want in at least two
cases: 1) there is a bug in the code that we are not aware of; 2) it does not
push the list of columns for "select *" when the table is a dynamic table. For
HBase, that's not an issue. But it is an issue for MapRDB binary tables, which
are processed as tables on a file system instead of following the HBase logic
at planning time. I can't say why MapRDB binary tables were handled that way.
But the reality is that you will still see the column star in execution for
MapRDB unless we keep the "redundant" code in execution (which essentially
converts * into row_key/column families).
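    To make the execution-side conversion concrete, here is a minimal sketch.
It assumes the reader receives the projected column names and the table's
column family names. `StarExpansion` and `expand` are hypothetical names and
are not Drill's actual HBase/MapRDB reader code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not Drill's reader implementation.
final class StarExpansion {
  static final String STAR = "*";
  static final String ROW_KEY = "row_key";

  // If the projection contains "*", replace the whole list with row_key
  // followed by every column family (assumed to come from table metadata).
  // Otherwise, return a copy of the projection unchanged.
  static List<String> expand(List<String> projected, List<String> columnFamilies) {
    if (!projected.contains(STAR)) {
      return new ArrayList<>(projected);
    }
    List<String> expanded = new ArrayList<>();
    expanded.add(ROW_KEY);
    expanded.addAll(columnFamilies);
    return expanded;
  }

  private StarExpansion() {}
}
```

    For example, a projection of ["*"] against families ["cf1", "cf2"] becomes
["row_key", "cf1", "cf2"]. An explicit projection passes through untouched, so
the sketch is a no-op whenever the planner has already expanded the star.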
    
    If we keep the column star in the record reader's column list, we end up
with a nullable-int vs. map column conflict. If we expand the column star, we
may end up with an empty map vs. a non-empty map. To me, the latter case is
easier to handle. Plus, it's something we already have to deal with for the
HBase and JSON plugins.
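    One way to see why the second case is easier: an empty map and a non-empty
map of the same name can be reconciled by taking the union of their members,
but a nullable INT and a map cannot be reconciled at all. The sketch below uses
a simplified, hypothetical column model (`Col`, `Kind`, `merge`). It is not
Drill's `MaterializedField` API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified column model for illustration only.
final class Col {
  enum Kind { NULLABLE_INT, VARBINARY, MAP }

  final String name;
  final Kind kind;
  final Map<String, Col> children = new LinkedHashMap<>(); // used only for MAP

  Col(String name, Kind kind) {
    this.name = name;
    this.kind = kind;
  }

  // Adds (or replaces) a child member and returns this column for chaining.
  Col with(Col child) {
    children.put(child.name, child);
    return this;
  }

  // Merges two versions of the same column seen in different batches.
  // Returns null when names or types conflict (e.g. NULLABLE_INT vs MAP).
  // Otherwise returns a new column whose MAP members are the union of both.
  static Col merge(Col a, Col b) {
    if (!a.name.equals(b.name) || a.kind != b.kind) {
      return null;
    }
    Col out = new Col(a.name, a.kind);
    for (Col c : a.children.values()) {
      out.with(c);
    }
    for (Col c : b.children.values()) {
      Col existing = out.children.get(c.name);
      if (existing == null) {
        out.with(c);
      } else {
        Col m = merge(existing, c);
        if (m == null) {
          return null; // conflicting member types inside the map
        }
        out.with(m);
      }
    }
    return out;
  }
}
```

    Merging cf{} with cf{mycol} yields cf{mycol}. Merging a nullable-int "cf"
with a map "cf" returns null, which signals that no common schema exists.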



> Resolve regressions to MapR DB from DRILL-5546
> ----------------------------------------------
>
>                 Key: DRILL-5830
>                 URL: https://issues.apache.org/jira/browse/DRILL-5830
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.12.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>             Fix For: 1.12.0
>
>
> DRILL-5546 added a number of fixes for empty batches. One part of the fix was 
> for HBase. Key changes:
> * Add code to expand wildcards (i.e. SELECT *) in the planner.
> * Remove support for wildcards in the HBase record reader.
> As noted in DRILL-5775, this change had the effect of breaking support for 
> MapR-DB binary (which is API-compatible with HBase). DRILL-5775 fixes this by 
> expanding wildcards in the planner for MapR-DB, as was done for HBase in 
> DRILL-5546.
> Unfortunately, this change introduced other regressions into the code as 
> described by DRILL-5706.
> Investigation of those issues revealed that we should back out the original 
> DRILL-5546 changes and go down a different route.
> As it turns out, HBase already had a project push-down rule that expanded 
> wildcards. However, that rule didn't work correctly some of the time. 
> DRILL-5546 fixed that bug, ensuring that wildcards are expanded (at least in 
> the cases tested for this ticket.)
> The actual issue turned out to be a bug in the {{RecordBatchLoader}} class, 
> which did not consider map contents when detecting a schema change. As a 
> result, results like (row_key, cf{}) were treated the same as (row_key, 
> cf{mycol}), and the actual data columns were discarded, seemingly at random, 
> depending on batch arrival order.
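The schema-change bug described above can be illustrated with a minimal
sketch that contrasts a shallow comparison, which checks only top-level names
and types, with a deep one that also recurses into map members. `Field`,
`sameShallow`, and `sameDeep` are hypothetical names, and the sketch assumes
fields are compared by position. It is not the actual {{RecordBatchLoader}}
code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical field model for illustration; not Drill's BatchSchema.
final class Field {
  final String name;
  final String type;                              // e.g. "VARBINARY", "MAP"
  final List<Field> children = new ArrayList<>(); // members of a MAP

  Field(String name, String type, Field... kids) {
    this.name = name;
    this.type = type;
    for (Field k : kids) {
      children.add(k);
    }
  }

  // Shallow check: compares only top-level names and types, so
  // (row_key, cf{}) and (row_key, cf{mycol}) look identical.
  static boolean sameShallow(List<Field> a, List<Field> b) {
    if (a.size() != b.size()) return false;
    for (int i = 0; i < a.size(); i++) {
      Field x = a.get(i), y = b.get(i);
      if (!x.name.equals(y.name) || !x.type.equals(y.type)) return false;
    }
    return true;
  }

  // Deep check: also recurses into map members, so adding mycol to cf
  // is reported as a schema change.
  static boolean sameDeep(List<Field> a, List<Field> b) {
    if (a.size() != b.size()) return false;
    for (int i = 0; i < a.size(); i++) {
      Field x = a.get(i), y = b.get(i);
      if (!x.name.equals(y.name) || !x.type.equals(y.type)) return false;
      if (!sameDeep(x.children, y.children)) return false;
    }
    return true;
  }
}
```

With the shallow check, a batch carrying cf{mycol} that arrives after a batch
carrying cf{} is not flagged as a schema change. Whether its map contents
survive then depends on arrival order. The deep check flags the difference
regardless of order.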



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
