[ https://issues.apache.org/jira/browse/DRILL-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941724#comment-15941724 ]
ASF GitHub Bot commented on DRILL-5375:
---------------------------------------

Github user arina-ielchiieva commented on a diff in the pull request:

    https://github.com/apache/drill/pull/794#discussion_r108036357

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/NestedLoopJoinBatch.java ---
    @@ -214,26 +226,62 @@ private boolean hasMore(IterOutcome outcome) {
       /**
        * Method generates the runtime code needed for NLJ. Other than the setup method to set the input and output value
    -   * vector references we implement two more methods
    -   * 1. emitLeft() -> Project record from the left side
    -   * 2. emitRight() -> Project record from the right side (which is a hyper container)
    +   * vector references we implement three more methods
    +   * 1. doEval() -> Evaluates if record from left side matches record from the right side
    +   * 2. emitLeft() -> Project record from the left side
    +   * 3. emitRight() -> Project record from the right side (which is a hyper container)
        * @return the runtime generated class that implements the NestedLoopJoin interface
    -   * @throws IOException
    -   * @throws ClassTransformationException
        */
    -  private NestedLoopJoin setupWorker() throws IOException, ClassTransformationException {
    -    final CodeGenerator<NestedLoopJoin> nLJCodeGenerator = CodeGenerator.get(NestedLoopJoin.TEMPLATE_DEFINITION, context.getFunctionRegistry(), context.getOptions());
    +  private NestedLoopJoin setupWorker() throws IOException, ClassTransformationException, SchemaChangeException {
    +    final CodeGenerator<NestedLoopJoin> nLJCodeGenerator = CodeGenerator.get(
    +        NestedLoopJoin.TEMPLATE_DEFINITION, context.getFunctionRegistry(), context.getOptions());
         nLJCodeGenerator.plainJavaCapable(true);
         // Uncomment out this line to debug the generated code.
         // nLJCodeGenerator.saveCodeForDebugging(true);
         final ClassGenerator<NestedLoopJoin> nLJClassGenerator = nLJCodeGenerator.getRoot();

    +    // generate doEval
    +    final ErrorCollector collector = new ErrorCollectorImpl();
    +
    +
    +    /*
    +        Logical expression may contain fields from left and right batches. During code generation (materialization)
    +        we need to indicate from which input field should be taken. Mapping sets can work with only one input at a time.
    +        But non-equality expressions can be complex:
    +          select t1.c1, t2.c1, t2.c2 from t1 inner join t2 on t1.c1 between t2.c1 and t2.c2
    +        or even contain self join which can not be transformed into filter since OR clause is present
    +          select *from t1 inner join t2 on t1.c1 >= t2.c1 or t1.c3 <> t1.c4
    +
    +        In this case logical expression can not be split according to input presence (like during equality joins
    --- End diff --

    The thing is that an inequality join is not only a join that has `t1.c3 <> t1.c4` but also one that has `OR`. For example, the following query currently fails:
    `select * from t1 inner join t2 on t1.c1 = t2.c1 or t1.c2 = t2.c2`
    with the following error:
    `UNSUPPORTED_OPERATION ERROR: This query cannot be planned possibly due to either a cartesian join or an inequality join`.
    The main idea of my comment is that I don't care whether it's an equality or an inequality join: I just materialize the whole expression with fields from both inputs, and I add a batch indication to identify which input each field comes from.
    If you want, I can remove the comment if it's confusing.


> Nested loop join: return correct result for left join
> -----------------------------------------------------
>
>                 Key: DRILL-5375
>                 URL: https://issues.apache.org/jira/browse/DRILL-5375
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.8.0
>            Reporter: Arina Ielchiieva
>            Assignee: Arina Ielchiieva
>              Labels: doc-impacting
>
> Mini repro:
> 1. Create 2 Hive tables with data
> {code}
> CREATE TABLE t1 (
>   FYQ varchar(999),
>   dts varchar(999),
>   dte varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> 2016-Q1,2016-06-01,2016-09-30
> 2016-Q2,2016-09-01,2016-12-31
> 2016-Q3,2017-01-01,2017-03-31
> 2016-Q4,2017-04-01,2017-06-30
> CREATE TABLE t2 (
>   who varchar(999),
>   event varchar(999),
>   dt varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> aperson,did somthing,2017-01-06
> aperson,did somthing else,2017-01-12
> aperson,had chrsitmas,2016-12-26
> aperson,went wild,2016-01-01
> {code}
> 2. Impala query shows correct result
> {code}
> select t2.dt, t1.fyq, t2.who, t2.event
> from t2
> left join t1 on t2.dt between t1.dts and t1.dte
> order by t2.dt;
> +------------+---------+---------+-------------------+
> | dt         | fyq     | who     | event             |
> +------------+---------+---------+-------------------+
> | 2016-01-01 | NULL    | aperson | went wild         |
> | 2016-12-26 | 2016-Q2 | aperson | had chrsitmas     |
> | 2017-01-06 | 2016-Q3 | aperson | did somthing      |
> | 2017-01-12 | 2016-Q3 | aperson | did somthing else |
> +------------+---------+---------+-------------------+
> {code}
> 3. Drill query shows wrong results:
> {code}
> alter session set planner.enable_nljoin_for_scalar_only=false;
> use hive;
> select t2.dt, t1.fyq, t2.who, t2.event
> from t2
> left join t1 on t2.dt between t1.dts and t1.dte
> order by t2.dt;
> +-------------+----------+----------+--------------------+
> | dt          | fyq      | who      | event              |
> +-------------+----------+----------+--------------------+
> | 2016-12-26  | 2016-Q2  | aperson  | had chrsitmas      |
> | 2017-01-06  | 2016-Q3  | aperson  | did somthing       |
> | 2017-01-12  | 2016-Q3  | aperson  | did somthing else  |
> +-------------+----------+----------+--------------------+
> 3 rows selected (2.523 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
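
For readers following the thread, the left join semantics expected in the repro can be sketched in plain Java. This is only an illustration of a nested loop left join that evaluates the whole join condition on each (left, right) pair; it is not Drill's generated code. The class name `NljLeftJoinSketch`, the `leftJoin` helper, and the `String[]` row layout are all hypothetical and exist only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical illustration; not part of Drill.
class NljLeftJoinSketch {

  // t2 rows from the repro: {who, event, dt}. This is the left input.
  static final List<String[]> T2 = Arrays.asList(
      new String[] {"aperson", "did somthing", "2017-01-06"},
      new String[] {"aperson", "did somthing else", "2017-01-12"},
      new String[] {"aperson", "had chrsitmas", "2016-12-26"},
      new String[] {"aperson", "went wild", "2016-01-01"});

  // t1 rows from the repro: {fyq, dts, dte}. This is the right input.
  static final List<String[]> T1 = Arrays.asList(
      new String[] {"2016-Q1", "2016-06-01", "2016-09-30"},
      new String[] {"2016-Q2", "2016-09-01", "2016-12-31"},
      new String[] {"2016-Q3", "2017-01-01", "2017-03-31"},
      new String[] {"2016-Q4", "2017-04-01", "2017-06-30"});

  // t2.dt between t1.dts and t1.dte, compared as strings like the varchar columns.
  static final BiPredicate<String[], String[]> BETWEEN =
      (l, r) -> l[2].compareTo(r[1]) >= 0 && l[2].compareTo(r[2]) <= 0;

  /**
   * Nested loop LEFT join. Each result entry is {leftRow, rightRow};
   * rightRow is null when no right row satisfied the condition.
   */
  static List<String[][]> leftJoin(List<String[]> left, List<String[]> right,
                                   BiPredicate<String[], String[]> condition) {
    List<String[][]> out = new ArrayList<>();
    for (String[] l : left) {
      boolean matched = false;
      for (String[] r : right) {
        // The condition sees both rows at once, so it may mix fields from both inputs.
        if (condition.test(l, r)) {
          out.add(new String[][] {l, r});
          matched = true;
        }
      }
      if (!matched) {
        // Left join: keep the unmatched left row, with nulls on the right side.
        out.add(new String[][] {l, null});
      }
    }
    return out;
  }

  public static void main(String[] args) {
    for (String[][] pair : leftJoin(T2, T1, BETWEEN)) {
      String fyq = pair[1] == null ? "NULL" : pair[1][0];
      System.out.println(pair[0][2] + " | " + fyq + " | " + pair[0][1]);
    }
  }
}
```

In this sketch the `went wild` row (dt `2016-01-01`) matches no range in t1, so it is kept with a null right side. That is the row present in the Impala output and missing from the Drill output above.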