[ https://issues.apache.org/jira/browse/HIVE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yingzhong Xu updated HIVE-3738: ------------------------------- Attachment: DDL The DDL helps you create related tables > Bugs exists in SEMI JOIN > ------------------------ > > Key: HIVE-3738 > URL: https://issues.apache.org/jira/browse/HIVE-3738 > Project: Hive > Issue Type: Bug > Components: Query Processor > Affects Versions: 0.9.0 > Environment: JDK1.6 > Reporter: Yingzhong Xu > Labels: Semijoin > Attachments: DDL > > > I am using the version 0.9.0 and my tables are the same with TPC-H benchmark: > Here is a simple query(works correctly): > *Q1* > {quote} > INSERT OVERWRITE TABLE customer_orders_statistics > SELECT C_CUSTKEY FROM CUSTOMER > LEFT SEMI JOIN( > SELECT O_CUSTKEY FROM ORDERS WHERE unix_timestamp(O_ORDERDATE, > 'yyyy-MM-dd') > unix_timestamp('1995-12-31','yyyy-MM-dd') > ) tempTable ON tempTable.O_CUSTKEY=CUSTOMER.C_CUSTKEY > {quote} > it means inserting the key of customers who has orders since 1995-12-31 into > another table. > But if I write the query like this: > *Q2* > {quote} > INSERT OVERWRITE TABLE customer_orders_statistics > SELECT C_CUSTKEY FROM CUSTOMER > LEFT SEMI JOIN ORDERS > ON CUSTOMER.C_CUSTKEY=ORDERS.O_CUSTKEY > AND unix_timestamp(ORDERS.O_ORDERDATE, 'yyyy-MM-dd') > > unix_timestamp('1995-12-31','yyyy-MM-dd') > {quote} > I will get exception from Hive: > {quote} > FAILED: Hive Internal Error: java.lang.NullPointerException(null) > java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFilterPlan(SemanticAnalyzer.java:1566) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.pushJoinFilters(SemanticAnalyzer.java:5254) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:6754) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7531) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:909) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406) > at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:689) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:557) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.util.RunJar.main(RunJar.java:156) > {quote} > Also,If I write the query like this: > *Q3* > {quote} > INSERT OVERWRITE TABLE customer_orders_statistics > SELECT C_CUSTKEY FROM CUSTOMER > LEFT SEMI JOIN ORDERS > ON CUSTOMER.C_CUSTKEY=ORDERS.O_CUSTKEY > WHERE unix_timestamp(ORDERS.O_ORDERDATE, 'yyyy-MM-dd') > > unix_timestamp('1995-12-31','yyyy-MM-dd') > {quote} > Then this query can be executed(wondering the right hand of SEMI JOIN can be > referenced in WHERE clause now?), but the result is wrong(comparing to Q1, > Q1's result is the same with mysql). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira