[
https://issues.apache.org/jira/browse/HIVE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yingzhong Xu updated HIVE-3738:
-------------------------------
Attachment: DDL
The DDL helps you create related tables
> Bugs exists in SEMI JOIN
> ------------------------
>
> Key: HIVE-3738
> URL: https://issues.apache.org/jira/browse/HIVE-3738
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.9.0
> Environment: JDK1.6
> Reporter: Yingzhong Xu
> Labels: Semijoin
> Attachments: DDL
>
>
> I am using the version 0.9.0 and my tables are the same with TPC-H benchmark:
> Here is a simple query(works correctly):
> *Q1*
> {quote}
> INSERT OVERWRITE TABLE customer_orders_statistics
> SELECT C_CUSTKEY FROM CUSTOMER
> LEFT SEMI JOIN(
> SELECT O_CUSTKEY FROM ORDERS WHERE unix_timestamp(O_ORDERDATE,
> 'yyyy-MM-dd') > unix_timestamp('1995-12-31','yyyy-MM-dd')
> ) tempTable ON tempTable.O_CUSTKEY=CUSTOMER.C_CUSTKEY
> {quote}
> it means inserting the key of customers who has orders since 1995-12-31 into
> another table.
> But if I write the query like this:
> *Q2*
> {quote}
> INSERT OVERWRITE TABLE customer_orders_statistics
> SELECT C_CUSTKEY FROM CUSTOMER
> LEFT SEMI JOIN ORDERS
> ON CUSTOMER.C_CUSTKEY=ORDERS.O_CUSTKEY
> AND unix_timestamp(ORDERS.O_ORDERDATE, 'yyyy-MM-dd') >
> unix_timestamp('1995-12-31','yyyy-MM-dd')
> {quote}
> I will get exception from Hive:
> {quote}
> FAILED: Hive Internal Error: java.lang.NullPointerException(null)
> java.lang.NullPointerException
> at
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFilterPlan(SemanticAnalyzer.java:1566)
> at
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.pushJoinFilters(SemanticAnalyzer.java:5254)
> at
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:6754)
> at
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7531)
> at
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:909)
> at
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:689)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:557)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> {quote}
> Also,If I write the query like this:
> *Q3*
> {quote}
> INSERT OVERWRITE TABLE customer_orders_statistics
> SELECT C_CUSTKEY FROM CUSTOMER
> LEFT SEMI JOIN ORDERS
> ON CUSTOMER.C_CUSTKEY=ORDERS.O_CUSTKEY
> WHERE unix_timestamp(ORDERS.O_ORDERDATE, 'yyyy-MM-dd') >
> unix_timestamp('1995-12-31','yyyy-MM-dd')
> {quote}
> Then this query can be executed(wondering the right hand of SEMI JOIN can be
> referenced in WHERE clause now?), but the result is wrong(comparing to Q1,
> Q1's result is the same with mysql).
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira