[jira] [Commented] (DRILL-5375) Nested loop join: return correct result for left join

ASF GitHub Bot (JIRA) Wed, 29 Mar 2017 03:37:23 -0700

    [ 
https://issues.apache.org/jira/browse/DRILL-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15946908#comment-15946908
 ]


ASF GitHub Bot commented on DRILL-5375:
---------------------------------------

Github user arina-ielchiieva commented on a diff in the pull request:

    https://github.com/apache/drill/pull/794#discussion_r108640631
  
    --- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillOptiq.java
 ---
    @@ -70,27 +70,65 @@
       private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(DrillOptiq.class);
     
       /**
    -   * Converts a tree of {@link RexNode} operators into a scalar expression 
in Drill syntax.
    +   * Converts a tree of {@link RexNode} operators into a scalar expression 
in Drill syntax using one input.
    +   *
    +   * @param context parse context which contains planner settings
    +   * @param input data input
    +   * @param expr expression to be converted
    +   * @return converted expression
        */
       public static LogicalExpression toDrill(DrillParseContext context, 
RelNode input, RexNode expr) {
    -    final RexToDrill visitor = new RexToDrill(context, input);
    +    return toDrill(context, Lists.newArrayList(input), expr);
    +  }
    +
    +  /**
    +   * Converts a tree of {@link RexNode} operators into a scalar expression 
in Drill syntax using multiple inputs.
    +   *
    +   * @param context parse context which contains planner settings
    +   * @param inputs multiple data inputs
    +   * @param expr expression to be converted
    +   * @return converted expression
    +   */
    +  public static LogicalExpression toDrill(DrillParseContext context, 
List<RelNode> inputs, RexNode expr) {
    +    final RexToDrill visitor = new RexToDrill(context, inputs);
         return expr.accept(visitor);
       }
     
       private static class RexToDrill extends 
RexVisitorImpl<LogicalExpression> {
    -    private final RelNode input;
    +    private final List<RelNode> inputs;
         private final DrillParseContext context;
    +    private final List<RelDataTypeField> fieldList;
     
    -    RexToDrill(DrillParseContext context, RelNode input) {
    +    RexToDrill(DrillParseContext context, List<RelNode> inputs) {
           super(true);
           this.context = context;
    -      this.input = input;
    +      this.inputs = inputs;
    +      this.fieldList = Lists.newArrayList();
    +      /*
    +         Fields are enumerated by their presence order in input. Details 
{@link org.apache.calcite.rex.RexInputRef}.
    +         Thus we can merge field list from several inputs by adding them 
into the list in order of appearance.
    +         Each field index in the list will match field index in the 
RexInputRef instance which will allow us
    +         to retrieve field from field list by index in {@link 
#visitInputRef(RexInputRef)} method. Example:
    +
    +         Query: select t1.c1, t2.c1, t2.c2 from t1 inner join t2 on t1.c1 
between t2.c1 and t2.c2
    +
    +         Input 1: $0
    +         Input 2: $1, $2
    +
    +         Result: $0, $1, $2
    +       */
    +      for (RelNode input : inputs) {
    --- End diff --
    
    Yes, in `public LogicalExpression visitInputRef(RexInputRef inputRef)` we
    determine which input a field belongs to. Previously there was only one
    input, so a simple `input.getRowType().getFieldList().get(index)` was
    enough. Now there are two inputs, so the lookup has to try the first input
    and, if the field is not found there, try the second. I could either
    iterate over the two inputs on every lookup and break out of the loop once
    the field is found, or merge the field lists into one and do a simple
    `fieldList.get(index)`. For performance reasons, I decided to merge the
    field lists in the constructor and use the merged list in
    `visitInputRef(RexInputRef inputRef)` rather than iterating over the inputs
    for each field.
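
To illustrate the trade-off described above, here is a minimal standalone sketch. It is not the code from the pull request: the class `MergedFieldLookup` and both method names are hypothetical, and plain strings stand in for Calcite's `RelDataTypeField` so the example runs without Drill or Calcite on the classpath. It assumes, as the diff comment states, that field indexes are assigned in order of appearance across the inputs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MergedFieldLookup {

  // Option 1: walk the inputs on every lookup, subtracting each input's
  // field count until the index falls inside one of them.
  static String lookupByIteration(List<List<String>> inputs, int index) {
    int offset = index;
    for (List<String> fields : inputs) {
      if (offset < fields.size()) {
        return fields.get(offset);
      }
      offset -= fields.size();
    }
    throw new IndexOutOfBoundsException("No field at index " + index);
  }

  // Option 2: concatenate the field lists once, in input order, so that
  // every later lookup is a single get(index) on the merged list.
  static List<String> mergeFields(List<List<String>> inputs) {
    List<String> merged = new ArrayList<>();
    for (List<String> fields : inputs) {
      merged.addAll(fields);
    }
    return merged;
  }

  public static void main(String[] args) {
    // Mirrors the example in the diff comment: input 1 has $0, input 2 has $1 and $2.
    List<List<String>> inputs = Arrays.asList(
        Arrays.asList("t1.c1"),
        Arrays.asList("t2.c1", "t2.c2"));
    List<String> merged = mergeFields(inputs);
    for (int i = 0; i < merged.size(); i++) {
      System.out.println("$" + i + " -> " + merged.get(i)
          + " (by iteration: " + lookupByIteration(inputs, i) + ")");
    }
  }
}
```

Both options resolve the same field for a given index. The merged list pays the concatenation cost once in the constructor, while the iterating variant repeats the walk for every input reference visited.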


> Nested loop join: return correct result for left join
> -----------------------------------------------------
>
>                 Key: DRILL-5375
>                 URL: https://issues.apache.org/jira/browse/DRILL-5375
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.8.0
>            Reporter: Arina Ielchiieva
>            Assignee: Arina Ielchiieva
>              Labels: doc-impacting
>
> Mini repro:
> 1. Create 2 Hive tables with data
> {code}
> CREATE TABLE t1 (
>   FYQ varchar(999),
>   dts varchar(999),
>   dte varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> 2016-Q1,2016-06-01,2016-09-30
> 2016-Q2,2016-09-01,2016-12-31
> 2016-Q3,2017-01-01,2017-03-31
> 2016-Q4,2017-04-01,2017-06-30
> CREATE TABLE t2 (
>   who varchar(999),
>   event varchar(999),
>   dt varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> aperson,did somthing,2017-01-06
> aperson,did somthing else,2017-01-12
> aperson,had chrsitmas,2016-12-26
> aperson,went wild,2016-01-01
> {code}
> 2. Impala Query shows correct result
> {code}
> select t2.dt, t1.fyq, t2.who, t2.event
> from t2
> left join t1 on t2.dt between t1.dts and t1.dte
> order by t2.dt;
> +------------+---------+---------+-------------------+
> | dt         | fyq     | who     | event             |
> +------------+---------+---------+-------------------+
> | 2016-01-01 | NULL    | aperson | went wild         |
> | 2016-12-26 | 2016-Q2 | aperson | had chrsitmas     |
> | 2017-01-06 | 2016-Q3 | aperson | did somthing      |
> | 2017-01-12 | 2016-Q3 | aperson | did somthing else |
> +------------+---------+---------+-------------------+
> {code}
> 3. Drill query shows wrong results:
> {code}
> alter session set planner.enable_nljoin_for_scalar_only=false;
> use hive;
> select t2.dt, t1.fyq, t2.who, t2.event
> from t2
> left join t1 on t2.dt between t1.dts and t1.dte
> order by t2.dt;
> +-------------+----------+----------+--------------------+
> |     dt      |   fyq    |   who    |       event        |
> +-------------+----------+----------+--------------------+
> | 2016-12-26  | 2016-Q2  | aperson  | had chrsitmas      |
> | 2017-01-06  | 2016-Q3  | aperson  | did somthing       |
> | 2017-01-12  | 2016-Q3  | aperson  | did somthing else  |
> +-------------+----------+----------+--------------------+
> 3 rows selected (2.523 seconds)
> {code}
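
For reference, the expected left join semantics from the repro above can be sketched in a few lines of plain Java. This is only an illustration of what a nested loop left join should return for this data, not Drill's implementation; the class `LeftJoinSketch` and its members are hypothetical names, and the rows are copied from the two tables in the issue description.

```java
import java.util.ArrayList;
import java.util.List;

public class LeftJoinSketch {

  // Rows of t1: {fyq, dts, dte}
  static final String[][] T1 = {
      {"2016-Q1", "2016-06-01", "2016-09-30"},
      {"2016-Q2", "2016-09-01", "2016-12-31"},
      {"2016-Q3", "2017-01-01", "2017-03-31"},
      {"2016-Q4", "2017-04-01", "2017-06-30"}};

  // Rows of t2: {who, event, dt}
  static final String[][] T2 = {
      {"aperson", "did somthing", "2017-01-06"},
      {"aperson", "did somthing else", "2017-01-12"},
      {"aperson", "had chrsitmas", "2016-12-26"},
      {"aperson", "went wild", "2016-01-01"}};

  // Nested loop left join on "left.dt between right.dts and right.dte".
  // Output rows are {dt, fyq, who, event}, ordered by dt.
  static List<String[]> leftJoin(String[][] left, String[][] right) {
    List<String[]> out = new ArrayList<>();
    for (String[] l : left) {
      boolean matched = false;
      for (String[] r : right) {
        // ISO-8601 dates stored as strings compare correctly lexicographically.
        if (l[2].compareTo(r[1]) >= 0 && l[2].compareTo(r[2]) <= 0) {
          out.add(new String[] {l[2], r[0], l[0], l[1]});
          matched = true;
        }
      }
      if (!matched) {
        // A left join keeps the unmatched left row and pads the right side with NULL.
        out.add(new String[] {l[2], null, l[0], l[1]});
      }
    }
    out.sort((a, b) -> a[0].compareTo(b[0]));
    return out;
  }

  public static void main(String[] args) {
    for (String[] row : leftJoin(T2, T1)) {
      System.out.println(String.join(" | ",
          row[0], String.valueOf(row[1]), row[2], row[3]));
    }
  }
}
```

The unmatched `2016-01-01` row is the one that appears in the Impala result with a NULL `fyq` and is absent from the Drill result.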



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
