[jira] [Updated] (HIVE-3738) Bugs exists in SEMI JOIN

2012-11-21 Thread Yingzhong Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yingzhong Xu updated HIVE-3738:
---

Attachment: DDL

The attached DDL helps you create the related tables.

 Bugs exists in SEMI JOIN
 

 Key: HIVE-3738
 URL: https://issues.apache.org/jira/browse/HIVE-3738
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.9.0
 Environment: JDK1.6
Reporter: Yingzhong Xu
  Labels: Semijoin
 Attachments: DDL


 I am using version 0.9.0, and my tables are the same as those in the TPC-H benchmark.
 Here is a simple query (works correctly):
 *Q1*
 {quote}
 INSERT OVERWRITE TABLE customer_orders_statistics 
  SELECT C_CUSTKEY FROM CUSTOMER 
  LEFT SEMI JOIN(
   SELECT O_CUSTKEY FROM ORDERS
   WHERE unix_timestamp(O_ORDERDATE, 'yyyy-MM-dd') > unix_timestamp('1995-12-31','yyyy-MM-dd')
  ) tempTable ON tempTable.O_CUSTKEY=CUSTOMER.C_CUSTKEY
 {quote}
 This inserts the keys of customers who have orders since 1995-12-31 into 
 another table.
 But if I write the query like this:
 *Q2*
 {quote}
 INSERT OVERWRITE TABLE customer_orders_statistics 
  SELECT C_CUSTKEY FROM CUSTOMER 
  LEFT SEMI JOIN ORDERS
  ON CUSTOMER.C_CUSTKEY=ORDERS.O_CUSTKEY 
  AND unix_timestamp(ORDERS.O_ORDERDATE, 'yyyy-MM-dd') > unix_timestamp('1995-12-31','yyyy-MM-dd')
 {quote}
 I get an exception from Hive:
 {quote}
 FAILED: Hive Internal Error: java.lang.NullPointerException(null)
 java.lang.NullPointerException
   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFilterPlan(SemanticAnalyzer.java:1566)
   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.pushJoinFilters(SemanticAnalyzer.java:5254)
   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:6754)
   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7531)
   at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:909)
   at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:689)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:557)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
 {quote}
 Also, if I write the query like this:
 *Q3*
 {quote}
 INSERT OVERWRITE TABLE customer_orders_statistics 
  SELECT C_CUSTKEY FROM CUSTOMER 
  LEFT SEMI JOIN ORDERS
  ON CUSTOMER.C_CUSTKEY=ORDERS.O_CUSTKEY 
  WHERE unix_timestamp(ORDERS.O_ORDERDATE, 'yyyy-MM-dd') > unix_timestamp('1995-12-31','yyyy-MM-dd')
 {quote}
 Then the query can be executed (can the right-hand side of a SEMI JOIN be 
 referenced in the WHERE clause now?), but the result is wrong (compared with Q1, 
 whose result is the same as MySQL's).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3738) Bugs exists in SEMI JOIN

2012-11-21 Thread Yingzhong Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yingzhong Xu updated HIVE-3738:
---

Description: 
I am using version 0.9.0, and my tables are the same as those in the TPC-H benchmark.

Here is a simple query (works correctly):

*Q1*
{quote}
INSERT OVERWRITE TABLE customer_orders_statistics 
 SELECT C_CUSTKEY FROM CUSTOMER 
 LEFT SEMI JOIN(
  SELECT O_CUSTKEY FROM ORDERS
  WHERE unix_timestamp(O_ORDERDATE, 'yyyy-MM-dd') > unix_timestamp('1995-12-31','yyyy-MM-dd')
 ) tempTable ON tempTable.O_CUSTKEY=CUSTOMER.C_CUSTKEY
{quote}

This inserts the keys of customers who have orders since 1995-12-31 into 
another table.
But if I write the query like this:

*Q2*
{quote}
INSERT OVERWRITE TABLE customer_orders_statistics 
 SELECT C_CUSTKEY FROM CUSTOMER 
 LEFT SEMI JOIN ORDERS
 ON CUSTOMER.C_CUSTKEY=ORDERS.O_CUSTKEY 
 AND unix_timestamp(ORDERS.O_ORDERDATE, 'yyyy-MM-dd') > unix_timestamp('1995-12-31','yyyy-MM-dd')
{quote}

I get an exception from Hive:

{quote}
FAILED: Hive Internal Error: java.lang.NullPointerException(null)
java.lang.NullPointerException
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFilterPlan(SemanticAnalyzer.java:1566)
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.pushJoinFilters(SemanticAnalyzer.java:5254)
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:6754)
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7531)
  at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243)
  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431)
  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336)
  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:909)
  at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
  at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215)
  at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
  at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:689)
  at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:557)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
{quote}

Also, if I write the query like this:
*Q3*
{quote}
INSERT OVERWRITE TABLE customer_orders_statistics 
 SELECT C_CUSTKEY FROM CUSTOMER 
 LEFT SEMI JOIN ORDERS
 ON CUSTOMER.C_CUSTKEY=ORDERS.O_CUSTKEY 
 WHERE unix_timestamp(ORDERS.O_ORDERDATE, 'yyyy-MM-dd') > unix_timestamp('1995-12-31','yyyy-MM-dd')
{quote}
Then the query can be executed (can the right-hand side of a SEMI JOIN be 
referenced in the WHERE clause now?), but the result is wrong (compared with *Q1*, 
whose result is the same as MySQL's).
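
For reference, the result that *Q1* is expected to produce can be sketched outside Hive. The example below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module rather than Hive, expresses the semi join as an EXISTS subquery (the usual standard-SQL equivalent), compares ISO date strings instead of calling unix_timestamp, and assumes a strict "later than" comparison. The helper names and the three-customer data set are hypothetical and are not taken from the attached DDL.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory stand-in for CUSTOMER and ORDERS (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE CUSTOMER (C_CUSTKEY INTEGER)")
    conn.execute("CREATE TABLE ORDERS (O_CUSTKEY INTEGER, O_ORDERDATE TEXT)")
    conn.executemany("INSERT INTO CUSTOMER VALUES (?)", [(1,), (2,), (3,)])
    conn.executemany(
        "INSERT INTO ORDERS VALUES (?, ?)",
        [
            (1, "1996-01-15"),  # customer 1: two orders after the cutoff
            (1, "1997-03-02"),
            (2, "1994-06-30"),  # customer 2: only an order before the cutoff
        ],                      # customer 3: no orders at all
    )
    return conn


def customers_with_orders_after(conn, cutoff):
    """Semi-join semantics: a customer is returned at most once,
    and only if at least one of its orders is later than `cutoff`."""
    rows = conn.execute(
        """
        SELECT C_CUSTKEY FROM CUSTOMER
        WHERE EXISTS (
            SELECT 1 FROM ORDERS
            WHERE ORDERS.O_CUSTKEY = CUSTOMER.C_CUSTKEY
              AND ORDERS.O_ORDERDATE > ?  -- ISO dates compare correctly as text
        )
        ORDER BY C_CUSTKEY
        """,
        (cutoff,),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(customers_with_orders_after(build_demo_db(), "1995-12-31"))
```

Customer 1 appears only once even though two of its orders match, which is the property that separates a semi join from an inner join. Any rewrite of *Q1*, such as *Q2* or *Q3*, would be expected to return the same keys.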
