[ https://issues.apache.org/jira/browse/DRILL-6606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16549949#comment-16549949 ]
ASF GitHub Bot commented on DRILL-6606:
---------------------------------------

ilooner commented on issue #1384: DRILL-6606: Fixed bug in HashJoin that caused it not to return OK_NEW_SCHEMA in some cases.
URL: https://github.com/apache/drill/pull/1384#issuecomment-406432426

Thanks for the +1.

With respect to your comment, calling prefetchFirstBatchFromBothSides from buildSchema was actually the source of the problem. Doing so left the operator in state BatchState.FIRST after buildSchema returned, which caused an **OK_NEW_SCHEMA** to NOT be sent. Downstream operators then never built a correct schema and returned incorrect data types in some cases. That was the crux of the issue.

This change fixes the issue by separating prefetching into two phases:

 - Schema sniffing
 - Data sniffing

The schemas need to be sniffed in the buildSchema call so that the schema is available. After schema sniffing, the state of the operator is BUILD_SCHEMA and OK_NEW_SCHEMA is emitted. Data sniffing then needs to happen in the call to innerNext(), after the operator has emitted OK_NEW_SCHEMA.

Other binary operators don't have this issue because they don't live within their memory limit and, as a consequence, do not need to collect statistics about the data through sniffing.

Furthermore, doing the sniffing in two stages is not a hack. It is required for functional correctness for queries like the one added in the unit test, for the reasons described above.
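The ordering problem described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: ToyJoin, drain, and the simplified next() logic are hypothetical names invented for illustration and are not Drill's actual HashJoinBatch or AbstractRecordBatch code. The model assumes, as the comment describes, that OK_NEW_SCHEMA is only emitted when the operator is still in BUILD_SCHEMA after buildSchema returns, and that both inputs are empty (as with LIMIT 0).

```java
import java.util.ArrayList;
import java.util.List;

public class TwoPhaseSniffSketch {
    // Simplified stand-ins for the operator state and iteration outcome.
    enum BatchState { BUILD_SCHEMA, FIRST, NOT_FIRST }
    enum Outcome { OK_NEW_SCHEMA, OK, NONE }

    // Hypothetical toy operator; not Drill's real hash join.
    static class ToyJoin {
        private final boolean prefetchDataInBuildSchema;
        BatchState state = BatchState.BUILD_SCHEMA;
        boolean schemaSniffed = false;
        boolean dataSniffed = false;

        ToyJoin(boolean prefetchDataInBuildSchema) {
            this.prefetchDataInBuildSchema = prefetchDataInBuildSchema;
        }

        void buildSchema() {
            schemaSniffed = true;          // phase 1: schema sniffing
            if (prefetchDataInBuildSchema) {
                sniffData();               // single-phase variant: also moves the state on
            }
        }

        void sniffData() {
            dataSniffed = true;            // phase 2: data sniffing
            state = BatchState.FIRST;
        }

        Outcome next() {
            if (state == BatchState.BUILD_SCHEMA) {
                buildSchema();
                if (state == BatchState.BUILD_SCHEMA) {
                    state = BatchState.FIRST;
                    return Outcome.OK_NEW_SCHEMA;
                }
                // State was already advanced, so the schema is never announced.
            }
            return innerNext();
        }

        Outcome innerNext() {
            if (!dataSniffed) {
                sniffData();               // two-phase variant: data sniffing happens here
            }
            state = BatchState.NOT_FIRST;
            return Outcome.NONE;           // assumed empty inputs (LIMIT 0): nothing to return
        }
    }

    // Calls next() until NONE and records every outcome seen downstream.
    static List<Outcome> drain(ToyJoin join) {
        List<Outcome> seen = new ArrayList<>();
        Outcome o;
        do {
            o = join.next();
            seen.add(o);
        } while (o != Outcome.NONE);
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(drain(new ToyJoin(true)));   // prints [NONE]
        System.out.println(drain(new ToyJoin(false)));  // prints [OK_NEW_SCHEMA, NONE]
    }
}
```

In this model, the single-phase variant never delivers a schema to the downstream consumer, while the two-phase variant emits OK_NEW_SCHEMA before any data sniffing takes place.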
> Hash Join returns incorrect data types when joining subqueries with limit 0
> ---------------------------------------------------------------------------
>
>                 Key: DRILL-6606
>                 URL: https://issues.apache.org/jira/browse/DRILL-6606
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Bohdan Kazydub
>            Assignee: Timothy Farkas
>            Priority: Blocker
>             Fix For: 1.14.0
>
>
> PreparedStatement for query
> {code:sql}
> SELECT l.l_quantity, l.l_shipdate, o.o_custkey
> FROM (SELECT * FROM cp.`tpch/lineitem.parquet` LIMIT 0) l
>     JOIN (SELECT * FROM cp.`tpch/orders.parquet` LIMIT 0) o
>     ON l.l_orderkey = o.o_orderkey
> LIMIT 0
> {code}
> is created with wrong types (nullable INTEGER) for all selected columns, no
> matter what their actual type is. This behavior reproduces with hash join
> only and is very likely to be caused by DRILL-6027 as the query works fine
> before this feature was implemented.
> To reproduce the problem you can put the aforementioned query into
> TestPreparedStatementProvider#joinOrderByQuery() test method.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)