[jira] [Updated] (HIVE-5891) Alias conflict when merging multiple mapjoin tasks into their common child mapred task

2013-12-25 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-5891:
---

   Resolution: Fixed
Fix Version/s: 0.13.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Sun!

 Alias conflict when merging multiple mapjoin tasks into their common child 
 mapred task
 --

 Key: HIVE-5891
 URL: https://issues.apache.org/jira/browse/HIVE-5891
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: Sun Rui
Assignee: Sun Rui
 Fix For: 0.13.0

 Attachments: HIVE-5891.1.patch, HIVE-5891.2.patch, HIVE-5891.3.patch


 Use the following test case with HIVE 0.12:
 {code:sql}
 create table src(key int, value string);
 load data local inpath 'src/data/files/kv1.txt' overwrite into table src;
 select * from (
   select c.key from
 (select a.key from src a join src b on a.key=b.key group by a.key) tmp
 join src c on tmp.key=c.key
   union all
   select c.key from
 (select a.key from src a join src b on a.key=b.key group by a.key) tmp
 join src c on tmp.key=c.key
 ) x;
 {code}
 We will get a NullPointerException from Union Operator:
 {noformat}
 java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
 Hive Runtime Error while processing row {_col0:0}
   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:175)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
   at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
 Error while processing row {_col0:0}
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:544)
   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:157)
   ... 4 more
 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.ql.exec.UnionOperator.processOp(UnionOperator.java:120)
   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
   at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:88)
   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
   at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:652)
   at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:655)
   at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758)
   at 
 org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:220)
   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
   at 
 org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:91)
   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534)
   ... 5 more
 {noformat}
   
 The root cause is in 
 CommonJoinTaskDispatcher.mergeMapJoinTaskIntoItsChildMapRedTask().
 {noformat}
   +--+  +--+
   | MapJoin task |  | MapJoin task |
   +--+  +--+
  \ /
   \   /
  +--+
  |  Union task  |
  +--+
 {noformat} 
 CommonJoinTaskDispatcher merges the two MapJoin tasks into their common 
 child: Union task. The two MapJoin tasks have the same alias name for their 
 big tables: $INTNAME, which is the name of the temporary table of a join 
 stream. The aliasToWork map uses alias as key, so eventually only the MapJoin 
 operator tree of one MapJoin task is saved into the aliasToWork map of the 
 Union task, while the MapJoin operator tree of another MapJoin task is lost. 
 As a result, Union operator won't be initialized because not all of its 
 parents gets intialized (The Union operator itself indicates it has two 
 parents, but actually it has only 1 parent because another parent is lost).
 This issue does not exist in HIVE 0.11 and thus is a regression bug in HIVE 
 0.12.
 The propsed solution is to use the query ID as prefix for the join stream 
 

[jira] [Updated] (HIVE-5891) Alias conflict when merging multiple mapjoin tasks into their common child mapred task

2013-12-24 Thread Sun Rui (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Rui updated HIVE-5891:
--

Status: Patch Available  (was: Open)

 Alias conflict when merging multiple mapjoin tasks into their common child 
 mapred task
 --

 Key: HIVE-5891
 URL: https://issues.apache.org/jira/browse/HIVE-5891
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: Sun Rui
Assignee: Sun Rui
 Attachments: HIVE-5891.1.patch, HIVE-5891.2.patch, HIVE-5891.3.patch


 Use the following test case with HIVE 0.12:
 {code:sql}
 create table src(key int, value string);
 load data local inpath 'src/data/files/kv1.txt' overwrite into table src;
 select * from (
   select c.key from
 (select a.key from src a join src b on a.key=b.key group by a.key) tmp
 join src c on tmp.key=c.key
   union all
   select c.key from
 (select a.key from src a join src b on a.key=b.key group by a.key) tmp
 join src c on tmp.key=c.key
 ) x;
 {code}
 We will get a NullPointerException from Union Operator:
 {noformat}
 java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
 Hive Runtime Error while processing row {_col0:0}
   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:175)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
   at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
 Error while processing row {_col0:0}
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:544)
   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:157)
   ... 4 more
 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.ql.exec.UnionOperator.processOp(UnionOperator.java:120)
   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
   at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:88)
   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
   at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:652)
   at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:655)
   at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758)
   at 
 org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:220)
   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
   at 
 org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:91)
   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534)
   ... 5 more
 {noformat}
   
 The root cause is in 
 CommonJoinTaskDispatcher.mergeMapJoinTaskIntoItsChildMapRedTask().
 {noformat}
   +--+  +--+
   | MapJoin task |  | MapJoin task |
   +--+  +--+
  \ /
   \   /
  +--+
  |  Union task  |
  +--+
 {noformat} 
 CommonJoinTaskDispatcher merges the two MapJoin tasks into their common 
 child: Union task. The two MapJoin tasks have the same alias name for their 
 big tables: $INTNAME, which is the name of the temporary table of a join 
 stream. The aliasToWork map uses alias as key, so eventually only the MapJoin 
 operator tree of one MapJoin task is saved into the aliasToWork map of the 
 Union task, while the MapJoin operator tree of another MapJoin task is lost. 
 As a result, Union operator won't be initialized because not all of its 
 parents gets intialized (The Union operator itself indicates it has two 
 parents, but actually it has only 1 parent because another parent is lost).
 This issue does not exist in HIVE 0.11 and thus is a regression bug in HIVE 
 0.12.
 The propsed solution is to use the query ID as prefix for the join stream 
 name to avoid conflict and add sanity check code in CommonJoinTaskDispatcher 
 that merge of a MapJoin task into its child MapRed task is skipped 

[jira] [Updated] (HIVE-5891) Alias conflict when merging multiple mapjoin tasks into their common child mapred task

2013-12-19 Thread Sun Rui (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Rui updated HIVE-5891:
--

Description: 
Use the following test case with HIVE 0.12:

{code:sql}
create table src(key int, value string);
load data local inpath 'src/data/files/kv1.txt' overwrite into table src;
select * from (
  select c.key from
(select a.key from src a join src b on a.key=b.key group by a.key) tmp
join src c on tmp.key=c.key
  union all
  select c.key from
(select a.key from src a join src b on a.key=b.key group by a.key) tmp
join src c on tmp.key=c.key
) x;
{code:sql}

We will get a NullPointerException from Union Operator:

{noformat}
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
Hive Runtime Error while processing row {_col0:0}
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:175)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row {_col0:0}
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:544)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:157)
... 4 more
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.exec.UnionOperator.processOp(UnionOperator.java:120)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:88)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:652)
at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:655)
at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758)
at 
org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:220)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:91)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534)
... 5 more
{noformat}
  
The root cause is in 
CommonJoinTaskDispatcher.mergeMapJoinTaskIntoItsChildMapRedTask().
{noformat}
  +--+  +--+
  | MapJoin task |  | MapJoin task |
  +--+  +--+
 \ /
  \   /
 +--+
 |  Union task  |
 +--+
{noformat} 
CommonJoinTaskDispatcher merges the two MapJoin tasks into their common child: 
Union task. The two MapJoin tasks have the same alias name for their big 
tables: $INTNAME, which is the name of the temporary table of a join stream. 
The aliasToWork map uses alias as key, so eventually only the MapJoin operator 
tree of one MapJoin task is saved into the aliasToWork map of the Union task, 
while the MapJoin operator tree of another MapJoin task is lost. As a result, 
Union operator won't be initialized because not all of its parents gets 
intialized (The Union operator itself indicates it has two parents, but 
actually it has only 1 parent because another parent is lost).

This issue does not exist in HIVE 0.11 and thus is a regression bug in HIVE 
0.12.

The propsed solution is to use the query ID as prefix for the join stream name 
to avoid conflict and add sanity check code in CommonJoinTaskDispatcher that 
merge of a MapJoin task into its child MapRed task is skipped if there is any 
alias conflict. Please review the patch. I am not sure if the patch properly 
handles the case of DemuxOperator.

BTW, anyone knows the origin of $INTNAME? it is so confusing, maybe we can 
replace it with a meaningful name.

  was:
Use the following test case with HIVE 0.12:

{sql}
create table src(key int, value string);
load data local inpath 'src/data/files/kv1.txt' overwrite into table src;
select * from (
  select c.key from
(select a.key from src a join src b on a.key=b.key group by a.key) tmp
join src c on tmp.key=c.key
  union all
  select c.key 

[jira] [Updated] (HIVE-5891) Alias conflict when merging multiple mapjoin tasks into their common child mapred task

2013-12-19 Thread Sun Rui (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Rui updated HIVE-5891:
--

Description: 
Use the following test case with HIVE 0.12:

{sql}
create table src(key int, value string);
load data local inpath 'src/data/files/kv1.txt' overwrite into table src;
select * from (
  select c.key from
(select a.key from src a join src b on a.key=b.key group by a.key) tmp
join src c on tmp.key=c.key
  union all
  select c.key from
(select a.key from src a join src b on a.key=b.key group by a.key) tmp
join src c on tmp.key=c.key
) x;
{sql}

We will get a NullPointerException from Union Operator:

{noformat}
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
Hive Runtime Error while processing row {_col0:0}
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:175)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row {_col0:0}
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:544)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:157)
... 4 more
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.exec.UnionOperator.processOp(UnionOperator.java:120)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:88)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:652)
at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:655)
at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758)
at 
org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:220)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:91)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534)
... 5 more
{noformat}
  
The root cause is in 
CommonJoinTaskDispatcher.mergeMapJoinTaskIntoItsChildMapRedTask().
{noformat}
  +--+  +--+
  | MapJoin task |  | MapJoin task |
  +--+  +--+
 \ /
  \   /
 +--+
 |  Union task  |
 +--+
{noformat} 
CommonJoinTaskDispatcher merges the two MapJoin tasks into their common child: 
Union task. The two MapJoin tasks have the same alias name for their big 
tables: $INTNAME, which is the name of the temporary table of a join stream. 
The aliasToWork map uses alias as key, so eventually only the MapJoin operator 
tree of one MapJoin task is saved into the aliasToWork map of the Union task, 
while the MapJoin operator tree of another MapJoin task is lost. As a result, 
Union operator won't be initialized because not all of its parents gets 
intialized (The Union operator itself indicates it has two parents, but 
actually it has only 1 parent because another parent is lost).

This issue does not exist in HIVE 0.11 and thus is a regression bug in HIVE 
0.12.

The propsed solution is to use the query ID as prefix for the join stream name 
to avoid conflict and add sanity check code in CommonJoinTaskDispatcher that 
merge of a MapJoin task into its child MapRed task is skipped if there is any 
alias conflict. Please review the patch. I am not sure if the patch properly 
handles the case of DemuxOperator.

BTW, anyone knows the origin of $INTNAME? it is so confusing, maybe we can 
replace it with a meaningful name.

  was:
Use the following test case with HIVE 0.12:

{quote}
create table src(key int, value string);
load data local inpath 'src/data/files/kv1.txt' overwrite into table src;
select * from (
  select c.key from
(select a.key from src a join src b on a.key=b.key group by a.key) tmp
join src c on tmp.key=c.key
  union all
  select c.key from

[jira] [Updated] (HIVE-5891) Alias conflict when merging multiple mapjoin tasks into their common child mapred task

2013-12-19 Thread Sun Rui (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Rui updated HIVE-5891:
--

Description: 
Use the following test case with HIVE 0.12:

{code:sql}
create table src(key int, value string);
load data local inpath 'src/data/files/kv1.txt' overwrite into table src;
select * from (
  select c.key from
(select a.key from src a join src b on a.key=b.key group by a.key) tmp
join src c on tmp.key=c.key
  union all
  select c.key from
(select a.key from src a join src b on a.key=b.key group by a.key) tmp
join src c on tmp.key=c.key
) x;
{code}

We will get a NullPointerException from Union Operator:

{noformat}
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
Hive Runtime Error while processing row {_col0:0}
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:175)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row {_col0:0}
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:544)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:157)
... 4 more
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.exec.UnionOperator.processOp(UnionOperator.java:120)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:88)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:652)
at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:655)
at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758)
at 
org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:220)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:91)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534)
... 5 more
{noformat}
  
The root cause is in 
CommonJoinTaskDispatcher.mergeMapJoinTaskIntoItsChildMapRedTask().
{noformat}
  +--+  +--+
  | MapJoin task |  | MapJoin task |
  +--+  +--+
 \ /
  \   /
 +--+
 |  Union task  |
 +--+
{noformat} 
CommonJoinTaskDispatcher merges the two MapJoin tasks into their common child: 
Union task. The two MapJoin tasks have the same alias name for their big 
tables: $INTNAME, which is the name of the temporary table of a join stream. 
The aliasToWork map uses alias as key, so eventually only the MapJoin operator 
tree of one MapJoin task is saved into the aliasToWork map of the Union task, 
while the MapJoin operator tree of another MapJoin task is lost. As a result, 
Union operator won't be initialized because not all of its parents gets 
intialized (The Union operator itself indicates it has two parents, but 
actually it has only 1 parent because another parent is lost).

This issue does not exist in HIVE 0.11 and thus is a regression bug in HIVE 
0.12.

The propsed solution is to use the query ID as prefix for the join stream name 
to avoid conflict and add sanity check code in CommonJoinTaskDispatcher that 
merge of a MapJoin task into its child MapRed task is skipped if there is any 
alias conflict. Please review the patch. I am not sure if the patch properly 
handles the case of DemuxOperator.

BTW, anyone knows the origin of $INTNAME? it is so confusing, maybe we can 
replace it with a meaningful name.

  was:
Use the following test case with HIVE 0.12:

{code:sql}
create table src(key int, value string);
load data local inpath 'src/data/files/kv1.txt' overwrite into table src;
select * from (
  select c.key from
(select a.key from src a join src b on a.key=b.key group by a.key) tmp
join src c on tmp.key=c.key
  union all
  select c.key 

[jira] [Updated] (HIVE-5891) Alias conflict when merging multiple mapjoin tasks into their common child mapred task

2013-11-26 Thread Sun Rui (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Rui updated HIVE-5891:
--

Description: 
Use the following test case with HIVE 0.12:

{quote}
create table src(key int, value string);
load data local inpath '/home/ray/hive-0.11.0/src/data/files/kv1.txt' overwrite 
into table src;
select * from (
  select c.key from
(select a.key from src a join src b on a.key=b.key group by a.key) tmp
join src c on tmp.key=c.key
  union all
  select c.key from
(select a.key from src a join src b on a.key=b.key group by a.key) tmp
join src c on tmp.key=c.key
) x;
{quote}

We will get a NullPointerException from Union Operator:

{quote}
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
Hive Runtime Error while processing row {_col0:0}
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:175)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row {_col0:0}
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:544)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:157)
... 4 more
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.exec.UnionOperator.processOp(UnionOperator.java:120)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:88)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:652)
at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:655)
at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758)
at 
org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:220)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:91)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534)
... 5 more
{quote}
  
The root cause is in 
CommonJoinTaskDispatcher.mergeMapJoinTaskIntoItsChildMapRedTask().
  +--+  +--+
  | MapJoin task |  | MapJoin task |
  +--+  +--+
 \ /
  \   /
 +--+
 |  Union task  |
 +--+
 
CommonJoinTaskDispatcher merges the two MapJoin tasks into their common child: 
Union task. The two MapJoin tasks
have the same alias name for their big tables: $INTNAME, which is the name of 
the temporary table of a join stream.
The aliasToWork map uses alias as key, so eventually only the MapJoin operator 
tree of one MapJoin task is saved
into the aliasToWork map of the Union task, while the MapJoin operator tree of 
another MapJoin task is lost.
As a result, Union operator won't be initialized because not all of its parents 
gets intialized (The Union operator
itself indicates it has two parents, but actually it has only 1 parent because 
another parent is lost).

This issue does not exist in HIVE 0.11 and thus is a regression bug in HIVE 
0.12.

The propsed solution is to use the query ID as prefix for the join stream name 
to avoid conflict and add sanity check
code in CommonJoinTaskDispatcher that merge of a MapJoin task into its child 
MapRed task is skipped if there is any
alias conflict. Please review the patch. I am not sure if the patch properly 
handles the case of DemuxOperator.

  was:
Use the following test case with HIVE 0.12:

{quote}
create table src(key int, value string);
load data local inpath '/home/ray/hive-0.11.0/src/data/files/kv1.txt' overwrite 
into table src;
select * from (
  select c.key from
(select a.key from src a join src b on a.key=b.key group by a.key) tmp
join src c on tmp.key=c.key
  union all
  select c.key from
(select a.key from src a join src b on a.key=b.key group by a.key) tmp
join src c on 

[jira] [Updated] (HIVE-5891) Alias conflict when merging multiple mapjoin tasks into their common child mapred task

2013-11-26 Thread Sun Rui (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Rui updated HIVE-5891:
--

Description: 
Use the following test case with HIVE 0.12:

{quote}
create table src(key int, value string);
load data local inpath '/home/ray/hive-0.11.0/src/data/files/kv1.txt' overwrite 
into table src;
select * from (
  select c.key from
(select a.key from src a join src b on a.key=b.key group by a.key) tmp
join src c on tmp.key=c.key
  union all
  select c.key from
(select a.key from src a join src b on a.key=b.key group by a.key) tmp
join src c on tmp.key=c.key
) x;
{quote}

We will get a NullPointerException from Union Operator:

{quote}
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
Hive Runtime Error while processing row {_col0:0}
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:175)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row {_col0:0}
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:544)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:157)
... 4 more
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.exec.UnionOperator.processOp(UnionOperator.java:120)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:88)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:652)
at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:655)
at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758)
at 
org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:220)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:91)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534)
... 5 more
{quote}
  
The root cause is in 
CommonJoinTaskDispatcher.mergeMapJoinTaskIntoItsChildMapRedTask().
  +--+  +--+
  | MapJoin task |  | MapJoin task |
  +--+  +--+
 \ /
  \   /
 +--+
 |  Union task  |
 +--+
 
CommonJoinTaskDispatcher merges the two MapJoin tasks into their common child: 
Union task. The two MapJoin tasks have the same alias name for their big 
tables: $INTNAME, which is the name of the temporary table of a join stream. 
The aliasToWork map uses alias as key, so eventually only the MapJoin operator 
tree of one MapJoin task is saved into the aliasToWork map of the Union task, 
while the MapJoin operator tree of another MapJoin task is lost. As a result, 
Union operator won't be initialized because not all of its parents gets 
intialized (The Union operator itself indicates it has two parents, but 
actually it has only 1 parent because another parent is lost).

This issue does not exist in HIVE 0.11 and thus is a regression bug in HIVE 
0.12.

The propsed solution is to use the query ID as prefix for the join stream name 
to avoid conflict and add sanity check code in CommonJoinTaskDispatcher that 
merge of a MapJoin task into its child MapRed task is skipped if there is any 
alias conflict. Please review the patch. I am not sure if the patch properly 
handles the case of DemuxOperator.

  was:
Use the following test case with HIVE 0.12:

{quote}
create table src(key int, value string);
load data local inpath '/home/ray/hive-0.11.0/src/data/files/kv1.txt' overwrite 
into table src;
select * from (
  select c.key from
(select a.key from src a join src b on a.key=b.key group by a.key) tmp
join src c on tmp.key=c.key
  union all
  select c.key from
(select a.key from src a join src b on a.key=b.key group by a.key) tmp
join src c on 

[jira] [Updated] (HIVE-5891) Alias conflict when merging multiple mapjoin tasks into their common child mapred task

2013-11-26 Thread Sun Rui (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Rui updated HIVE-5891:
--

Description: 
Use the following test case with HIVE 0.12:

{quote}
create table src(key int, value string);
load data local inpath 'src/data/files/kv1.txt' overwrite into table src;
select * from (
  select c.key from
(select a.key from src a join src b on a.key=b.key group by a.key) tmp
join src c on tmp.key=c.key
  union all
  select c.key from
(select a.key from src a join src b on a.key=b.key group by a.key) tmp
join src c on tmp.key=c.key
) x;
{quote}

We will get a NullPointerException from Union Operator:

{quote}
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
Hive Runtime Error while processing row {_col0:0}
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:175)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row {_col0:0}
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:544)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:157)
... 4 more
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.exec.UnionOperator.processOp(UnionOperator.java:120)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:88)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:652)
at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:655)
at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758)
at 
org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:220)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:91)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534)
... 5 more
{quote}
  
The root cause is in 
CommonJoinTaskDispatcher.mergeMapJoinTaskIntoItsChildMapRedTask().
  +--+  +--+
  | MapJoin task |  | MapJoin task |
  +--+  +--+
 \ /
  \   /
 +--+
 |  Union task  |
 +--+
 
CommonJoinTaskDispatcher merges the two MapJoin tasks into their common child: 
Union task. The two MapJoin tasks have the same alias name for their big 
tables: $INTNAME, which is the name of the temporary table of a join stream. 
The aliasToWork map uses alias as key, so eventually only the MapJoin operator 
tree of one MapJoin task is saved into the aliasToWork map of the Union task, 
while the MapJoin operator tree of another MapJoin task is lost. As a result, 
Union operator won't be initialized because not all of its parents gets 
intialized (The Union operator itself indicates it has two parents, but 
actually it has only 1 parent because another parent is lost).

This issue does not exist in HIVE 0.11 and thus is a regression bug in HIVE 
0.12.

The propsed solution is to use the query ID as prefix for the join stream name 
to avoid conflict and add sanity check code in CommonJoinTaskDispatcher that 
merge of a MapJoin task into its child MapRed task is skipped if there is any 
alias conflict. Please review the patch. I am not sure if the patch properly 
handles the case of DemuxOperator.

BTW, anyone knows the origin of $INTNAME? it is so confusing, maybe we can 
replace it with a meaningful name.

  was:
Use the following test case with HIVE 0.12:

{quote}
create table src(key int, value string);
load data local inpath '/home/ray/hive-0.11.0/src/data/files/kv1.txt' overwrite 
into table src;
select * from (
  select c.key from
(select a.key from src a join src b on a.key=b.key group by a.key) tmp
join src c on tmp.key=c.key
  union all
  select c.key from

[jira] [Updated] (HIVE-5891) Alias conflict when merging multiple mapjoin tasks into their common child mapred task

2013-11-26 Thread Sun Rui (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Rui updated HIVE-5891:
--

Status: Patch Available  (was: Open)

 Alias conflict when merging multiple mapjoin tasks into their common child 
 mapred task
 --

 Key: HIVE-5891
 URL: https://issues.apache.org/jira/browse/HIVE-5891
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: Sun Rui

 Use the following test case with HIVE 0.12:
 {quote}
 create table src(key int, value string);
 load data local inpath 'src/data/files/kv1.txt' overwrite into table src;
 select * from (
   select c.key from
 (select a.key from src a join src b on a.key=b.key group by a.key) tmp
 join src c on tmp.key=c.key
   union all
   select c.key from
 (select a.key from src a join src b on a.key=b.key group by a.key) tmp
 join src c on tmp.key=c.key
 ) x;
 {quote}
 We will get a NullPointerException from Union Operator:
 {quote}
 java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
 Hive Runtime Error while processing row {_col0:0}
   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:175)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
   at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
 Error while processing row {_col0:0}
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:544)
   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:157)
   ... 4 more
 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.ql.exec.UnionOperator.processOp(UnionOperator.java:120)
   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
   at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:88)
   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
   at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:652)
   at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:655)
   at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758)
   at 
 org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:220)
   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
   at 
 org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:91)
   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534)
   ... 5 more
 {quote}
   
 The root cause is in 
 CommonJoinTaskDispatcher.mergeMapJoinTaskIntoItsChildMapRedTask().
   +--+  +--+
   | MapJoin task |  | MapJoin task |
   +--+  +--+
  \ /
   \   /
  +--+
  |  Union task  |
  +--+
  
 CommonJoinTaskDispatcher merges the two MapJoin tasks into their common 
 child: Union task. The two MapJoin tasks have the same alias name for their 
 big tables: $INTNAME, which is the name of the temporary table of a join 
 stream. The aliasToWork map uses alias as key, so eventually only the MapJoin 
 operator tree of one MapJoin task is saved into the aliasToWork map of the 
 Union task, while the MapJoin operator tree of another MapJoin task is lost. 
 As a result, Union operator won't be initialized because not all of its 
 parents gets intialized (The Union operator itself indicates it has two 
 parents, but actually it has only 1 parent because another parent is lost).
 This issue does not exist in HIVE 0.11 and thus is a regression bug in HIVE 
 0.12.
 The propsed solution is to use the query ID as prefix for the join stream 
 name to avoid conflict and add sanity check code in CommonJoinTaskDispatcher 
 that merge of a MapJoin task into its child MapRed task is skipped if there 
 is any alias conflict. Please review the patch. I am not sure if the patch 
 properly handles the case of DemuxOperator.
 BTW, 

[jira] [Updated] (HIVE-5891) Alias conflict when merging multiple mapjoin tasks into their common child mapred task

2013-11-26 Thread Sun Rui (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Rui updated HIVE-5891:
--

Status: Open  (was: Patch Available)

 Alias conflict when merging multiple mapjoin tasks into their common child 
 mapred task
 --

 Key: HIVE-5891
 URL: https://issues.apache.org/jira/browse/HIVE-5891
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: Sun Rui

 Use the following test case with HIVE 0.12:
 {quote}
 create table src(key int, value string);
 load data local inpath 'src/data/files/kv1.txt' overwrite into table src;
 select * from (
   select c.key from
 (select a.key from src a join src b on a.key=b.key group by a.key) tmp
 join src c on tmp.key=c.key
   union all
   select c.key from
 (select a.key from src a join src b on a.key=b.key group by a.key) tmp
 join src c on tmp.key=c.key
 ) x;
 {quote}
 We will get a NullPointerException from Union Operator:
 {quote}
 java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
 Hive Runtime Error while processing row {_col0:0}
   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:175)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
   at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
 Error while processing row {_col0:0}
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:544)
   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:157)
   ... 4 more
 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.ql.exec.UnionOperator.processOp(UnionOperator.java:120)
   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
   at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:88)
   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
   at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:652)
   at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:655)
   at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758)
   at 
 org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:220)
   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
   at 
 org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:91)
   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534)
   ... 5 more
 {quote}
   
 The root cause is in 
 CommonJoinTaskDispatcher.mergeMapJoinTaskIntoItsChildMapRedTask().
   +--+  +--+
   | MapJoin task |  | MapJoin task |
   +--+  +--+
  \ /
   \   /
  +--+
  |  Union task  |
  +--+
  
 CommonJoinTaskDispatcher merges the two MapJoin tasks into their common 
 child: Union task. The two MapJoin tasks have the same alias name for their 
 big tables: $INTNAME, which is the name of the temporary table of a join 
 stream. The aliasToWork map uses alias as key, so eventually only the MapJoin 
 operator tree of one MapJoin task is saved into the aliasToWork map of the 
 Union task, while the MapJoin operator tree of another MapJoin task is lost. 
 As a result, Union operator won't be initialized because not all of its 
 parents gets intialized (The Union operator itself indicates it has two 
 parents, but actually it has only 1 parent because another parent is lost).
 This issue does not exist in HIVE 0.11 and thus is a regression bug in HIVE 
 0.12.
 The propsed solution is to use the query ID as prefix for the join stream 
 name to avoid conflict and add sanity check code in CommonJoinTaskDispatcher 
 that merge of a MapJoin task into its child MapRed task is skipped if there 
 is any alias conflict. Please review the patch. I am not sure if the patch 
 properly handles the case of DemuxOperator.
 BTW, 

[jira] [Updated] (HIVE-5891) Alias conflict when merging multiple mapjoin tasks into their common child mapred task

2013-11-26 Thread Sun Rui (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Rui updated HIVE-5891:
--

Attachment: HIVE-5891.1.patch

 Alias conflict when merging multiple mapjoin tasks into their common child 
 mapred task
 --

 Key: HIVE-5891
 URL: https://issues.apache.org/jira/browse/HIVE-5891
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: Sun Rui
 Attachments: HIVE-5891.1.patch


 Use the following test case with HIVE 0.12:
 {quote}
 create table src(key int, value string);
 load data local inpath 'src/data/files/kv1.txt' overwrite into table src;
 select * from (
   select c.key from
 (select a.key from src a join src b on a.key=b.key group by a.key) tmp
 join src c on tmp.key=c.key
   union all
   select c.key from
 (select a.key from src a join src b on a.key=b.key group by a.key) tmp
 join src c on tmp.key=c.key
 ) x;
 {quote}
 We will get a NullPointerException from Union Operator:
 {quote}
 java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
 Hive Runtime Error while processing row {_col0:0}
   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:175)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
   at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
 Error while processing row {_col0:0}
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:544)
   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:157)
   ... 4 more
 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.ql.exec.UnionOperator.processOp(UnionOperator.java:120)
   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
   at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:88)
   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
   at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:652)
   at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:655)
   at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758)
   at 
 org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:220)
   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
   at 
 org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:91)
   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534)
   ... 5 more
 {quote}
   
 The root cause is in 
 CommonJoinTaskDispatcher.mergeMapJoinTaskIntoItsChildMapRedTask().
   +--+  +--+
   | MapJoin task |  | MapJoin task |
   +--+  +--+
  \ /
   \   /
  +--+
  |  Union task  |
  +--+
  
 CommonJoinTaskDispatcher merges the two MapJoin tasks into their common 
 child: Union task. The two MapJoin tasks have the same alias name for their 
 big tables: $INTNAME, which is the name of the temporary table of a join 
 stream. The aliasToWork map uses alias as key, so eventually only the MapJoin 
 operator tree of one MapJoin task is saved into the aliasToWork map of the 
 Union task, while the MapJoin operator tree of another MapJoin task is lost. 
 As a result, Union operator won't be initialized because not all of its 
 parents gets intialized (The Union operator itself indicates it has two 
 parents, but actually it has only 1 parent because another parent is lost).
 This issue does not exist in HIVE 0.11 and thus is a regression bug in HIVE 
 0.12.
 The propsed solution is to use the query ID as prefix for the join stream 
 name to avoid conflict and add sanity check code in CommonJoinTaskDispatcher 
 that merge of a MapJoin task into its child MapRed task is skipped if there 
 is any alias conflict. Please review the patch. I am not sure if the patch 
 properly handles 

[jira] [Updated] (HIVE-5891) Alias conflict when merging multiple mapjoin tasks into their common child mapred task

2013-11-26 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-5891:
---

Assignee: Sun Rui

 Alias conflict when merging multiple mapjoin tasks into their common child 
 mapred task
 --

 Key: HIVE-5891
 URL: https://issues.apache.org/jira/browse/HIVE-5891
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: Sun Rui
Assignee: Sun Rui
 Attachments: HIVE-5891.1.patch


 Use the following test case with HIVE 0.12:
 {quote}
 create table src(key int, value string);
 load data local inpath 'src/data/files/kv1.txt' overwrite into table src;
 select * from (
   select c.key from
 (select a.key from src a join src b on a.key=b.key group by a.key) tmp
 join src c on tmp.key=c.key
   union all
   select c.key from
 (select a.key from src a join src b on a.key=b.key group by a.key) tmp
 join src c on tmp.key=c.key
 ) x;
 {quote}
 We will get a NullPointerException from Union Operator:
 {quote}
 java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
 Hive Runtime Error while processing row {_col0:0}
   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:175)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
   at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
 Error while processing row {_col0:0}
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:544)
   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:157)
   ... 4 more
 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.ql.exec.UnionOperator.processOp(UnionOperator.java:120)
   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
   at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:88)
   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
   at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:652)
   at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:655)
   at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758)
   at 
 org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:220)
   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
   at 
 org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:91)
   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534)
   ... 5 more
 {quote}
   
 The root cause is in 
 CommonJoinTaskDispatcher.mergeMapJoinTaskIntoItsChildMapRedTask().
   +--+  +--+
   | MapJoin task |  | MapJoin task |
   +--+  +--+
  \ /
   \   /
  +--+
  |  Union task  |
  +--+
  
 CommonJoinTaskDispatcher merges the two MapJoin tasks into their common 
 child: Union task. The two MapJoin tasks have the same alias name for their 
 big tables: $INTNAME, which is the name of the temporary table of a join 
 stream. The aliasToWork map uses alias as key, so eventually only the MapJoin 
 operator tree of one MapJoin task is saved into the aliasToWork map of the 
 Union task, while the MapJoin operator tree of another MapJoin task is lost. 
 As a result, Union operator won't be initialized because not all of its 
 parents gets intialized (The Union operator itself indicates it has two 
 parents, but actually it has only 1 parent because another parent is lost).
 This issue does not exist in HIVE 0.11 and thus is a regression bug in HIVE 
 0.12.
 The propsed solution is to use the query ID as prefix for the join stream 
 name to avoid conflict and add sanity check code in CommonJoinTaskDispatcher 
 that merge of a MapJoin task into its child MapRed task is skipped if there 
 is any alias conflict. Please review the patch. I am not