回复: hive 0.11 auto convert join bug report

2013-08-11 Thread wzc1989
Hi Navis,

My colleague chenchun finds that hashcode of 'deal' and 'dim_pay_date' are the 
same and the code in MapJoinProcessor.java ignores the order of rowschema.
I look at your patch and it's exactly the same place we are working on.
Thanks for your patch.


在 2013年8月11日星期日,下午9:38,Navis류승우 写道:

> Hi,
>  
> I've booked this on https://issues.apache.org/jira/browse/HIVE-5056
> and attached patch for it.
>  
> It needs full test for confirmation but you can try it.
>  
> Thanks.
>  
> 2013/8/11 mailto:wzc1...@gmail.com)>:
> > Hi all:
> > when I change the table alias dim_pay_date to A, the query pass in hive
> > 0.11(https://gist.github.com/code6/6187569#file-hive11_auto_convert_join_change_alias_pass):
> >  
> > use test;
> > create table if not exists src ( `key` int,`val` string);
> > load data local inpath '/Users/code6/git/hive/data/files/kv1.txt' overwrite
> > into table src;
> > drop table if exists orderpayment_small;
> > create table orderpayment_small (`dealid` int,`date` string,`time` string,
> > `cityid` int, `userid` int);
> > insert overwrite table orderpayment_small select 748, '2011-03-24',
> > '2011-03-24', 55 ,5372613 from src limit 1;
> > drop table if exists user_small;
> > create table user_small( userid int);
> > insert overwrite table user_small select key from src limit 100;
> > set hive.auto.convert.join.noconditionaltask.size = 200;
> > SELECT
> > `A`.`date`
> > , `deal`.`dealid`
> > FROM `orderpayment_small` `orderpayment`
> > JOIN `orderpayment_small` `A` ON `A`.`date` = `orderpayment`.`date`
> > JOIN `orderpayment_small` `deal` ON `deal`.`dealid` =
> > `orderpayment`.`dealid`
> > JOIN `orderpayment_small` `order_city` ON `order_city`.`cityid` =
> > `orderpayment`.`cityid`
> > JOIN `user_small` `user` ON `user`.`userid` = `orderpayment`.`userid`
> > limit 5;
> >  
> >  
> > It's quite strange and interesting now. I will keep searching for the answer
> > to this issue.
> >  
> >  
> >  
> > 在 2013年8月9日星期五,上午3:32,wzc1...@gmail.com (mailto:wzc1...@gmail.com) 写道:
> >  
> > Hi all:
> > I'm currently testing hive11 and encounter one bug with
> > hive.auto.convert.join, I construct a testcase so everyone can reproduce
> > it(or you can reach the testcase
> > here:https://gist.github.com/code6/6187569#file-hive11_auto_convert_join_bug):
> >  
> > use test;
> > create table src ( `key` int,`val` string);
> > load data local inpath '/Users/code6/git/hive/data/files/kv1.txt' overwrite
> > into table src;
> > drop table if exists orderpayment_small;
> > create table orderpayment_small (`dealid` int,`date` string,`time` string,
> > `cityid` int, `userid` int);
> > insert overwrite table orderpayment_small select 748, '2011-03-24',
> > '2011-03-24', 55 ,5372613 from src limit 1;
> > drop table if exists user_small;
> > create table user_small( userid int);
> > insert overwrite table user_small select key from src limit 100;
> > set hive.auto.convert.join.noconditionaltask.size = 200;
> > SELECT
> > `dim_pay_date`.`date`
> > , `deal`.`dealid`
> > FROM `orderpayment_small` `orderpayment`
> > JOIN `orderpayment_small` `dim_pay_date` ON `dim_pay_date`.`date` =
> > `orderpayment`.`date`
> > JOIN `orderpayment_small` `deal` ON `deal`.`dealid` =
> > `orderpayment`.`dealid`
> > JOIN `orderpayment_small` `order_city` ON `order_city`.`cityid` =
> > `orderpayment`.`cityid`
> > JOIN `user_small` `user` ON `user`.`userid` = `orderpayment`.`userid`
> > limit 5;
> >  
> >  
> > You should replace the path of kv1.txt by yourself. You can run the above
> > query in hive 0.11 and it will fail with ArrayIndexOutOfBoundsException, You
> > can see the explain result and the console output of the query here :
> > https://gist.github.com/code6/6187569
> >  
> > I compile the trunk code but it doesn't work with this query. I can run this
> > query in hive 0.9 with hive.auto.convert.join turns on.
> >  
> > I try to dig into this problem and I think it may be caused by the map join
> > optimization. Some adjacent operators aren't match for the input/output
> > tableinfo(column positions diff).
> >  
> > I'm not able to fix this bug and I would appreciate it if someone would like
> > to look into this problem.
> >  
> > Thanks.  



回复: hive 0.11 auto convert join bug report

2013-08-11 Thread wzc1989
Hi all:
when I change the table alias dim_pay_date to A, the query pass in hive 
0.11(https://gist.github.com/code6/6187569#file-hive11_auto_convert_join_change_alias_pass):

use test;
create table if not exists src ( `key` int,`val` string);
load data local inpath '/Users/code6/git/hive/data/files/kv1.txt' overwrite 
into table src;
drop table if exists orderpayment_small;
create table orderpayment_small (`dealid` int,`date` string,`time` string, 
`cityid` int, `userid` int);
insert overwrite table orderpayment_small select 748, '2011-03-24', 
'2011-03-24', 55 ,5372613 from src limit 1;
drop table if exists user_small;
create table user_small( userid int);
insert overwrite table user_small select key from src limit 100;
set hive.auto.convert.join.noconditionaltask.size = 200;
SELECT
`A`.`date`
, `deal`.`dealid`
FROM `orderpayment_small` `orderpayment`
JOIN `orderpayment_small` `A` ON `A`.`date` = `orderpayment`.`date`
JOIN `orderpayment_small` `deal` ON `deal`.`dealid` = `orderpayment`.`dealid`
JOIN `orderpayment_small` `order_city` ON `order_city`.`cityid` = 
`orderpayment`.`cityid`
JOIN `user_small` `user` ON `user`.`userid` = `orderpayment`.`userid`
limit 5;



It's quite strange and interesting now. I will keep searching for the answer to 
this issue.




在 2013年8月9日星期五,上午3:32,wzc1...@gmail.com 写道:

> Hi all:  
> I'm currently testing hive11 and encounter one bug with 
> hive.auto.convert.join, I construct a testcase so everyone can reproduce 
> it(or you can reach the testcase 
> here:https://gist.github.com/code6/6187569#file-hive11_auto_convert_join_bug):
>  
> use test;
> create table src ( `key` int,`val` string);
> load data local inpath '/Users/code6/git/hive/data/files/kv1.txt' overwrite 
> into table src;
> drop table if exists orderpayment_small;
> create table orderpayment_small (`dealid` int,`date` string,`time` string, 
> `cityid` int, `userid` int);
> insert overwrite table orderpayment_small select 748, '2011-03-24', 
> '2011-03-24', 55 ,5372613 from src limit 1;
> drop table if exists user_small;
> create table user_small( userid int);
> insert overwrite table user_small select key from src limit 100;
> set hive.auto.convert.join.noconditionaltask.size = 200;
> SELECT
> `dim_pay_date`.`date`
> , `deal`.`dealid`
> FROM `orderpayment_small` `orderpayment`
> JOIN `orderpayment_small` `dim_pay_date` ON `dim_pay_date`.`date` = 
> `orderpayment`.`date`
> JOIN `orderpayment_small` `deal` ON `deal`.`dealid` = `orderpayment`.`dealid`
> JOIN `orderpayment_small` `order_city` ON `order_city`.`cityid` = 
> `orderpayment`.`cityid`
> JOIN `user_small` `user` ON `user`.`userid` = `orderpayment`.`userid`
> limit 5;
>  
>  
> You should replace the path of kv1.txt by yourself. You can run the above 
> query in hive 0.11 and it will fail with ArrayIndexOutOfBoundsException, You 
> can see the explain result and the console output of the query here : 
> https://gist.github.com/code6/6187569
>  
> I compile the trunk code but it doesn't work with this query. I can run this 
> query in hive 0.9 with hive.auto.convert.join turns on.
>  
> I try to dig into this problem and I think it may be caused by the map join 
> optimization. Some adjacent operators aren't match for the input/output 
> tableinfo(column positions diff).  
>  
> I'm not able to fix this bug and I would appreciate it if someone would like 
> to look into this problem.
>  
> Thanks.  



hive 0.11 auto convert join bug report

2013-08-08 Thread wzc1989
Hi all: 
I'm currently testing hive11 and encounter one bug with hive.auto.convert.join, 
I construct a testcase so everyone can reproduce it(or you can reach the 
testcase 
here:https://gist.github.com/code6/6187569#file-hive11_auto_convert_join_bug):

use test;
create table src ( `key` int,`val` string);
load data local inpath '/Users/code6/git/hive/data/files/kv1.txt' overwrite 
into table src;
drop table if exists orderpayment_small;
create table orderpayment_small (`dealid` int,`date` string,`time` string, 
`cityid` int, `userid` int);
insert overwrite table orderpayment_small select 748, '2011-03-24', 
'2011-03-24', 55 ,5372613 from src limit 1;
drop table if exists user_small;
create table user_small( userid int);
insert overwrite table user_small select key from src limit 100;
set hive.auto.convert.join.noconditionaltask.size = 200;
SELECT
`dim_pay_date`.`date`
, `deal`.`dealid`
FROM `orderpayment_small` `orderpayment`
JOIN `orderpayment_small` `dim_pay_date` ON `dim_pay_date`.`date` = 
`orderpayment`.`date`
JOIN `orderpayment_small` `deal` ON `deal`.`dealid` = `orderpayment`.`dealid`
JOIN `orderpayment_small` `order_city` ON `order_city`.`cityid` = 
`orderpayment`.`cityid`
JOIN `user_small` `user` ON `user`.`userid` = `orderpayment`.`userid`
limit 5;


You should replace the path of kv1.txt by yourself. You can run the above query 
in hive 0.11 and it will fail with ArrayIndexOutOfBoundsException, You can see 
the explain result and the console output of the query here : 
https://gist.github.com/code6/6187569

I compile the trunk code but it doesn't work with this query. I can run this 
query in hive 0.9 with hive.auto.convert.join turns on.

I try to dig into this problem and I think it may be caused by the map join 
optimization. Some adjacent operators aren't match for the input/output 
tableinfo(column positions diff). 

I'm not able to fix this bug and I would appreciate it if someone would like to 
look into this problem.

Thanks.

回复: BUG IN HIVE-4650 seems not fixed

2013-08-01 Thread wzc1989
Hi Yin:  
Thanks for the patch, I patch it and pass this testcase, I will use it with our 
hive11 production test.



在 2013年8月1日星期四,上午5:09,Yin Huai 写道:  
> I just uploaded a patch to https://issues.apache.org/jira/browse/HIVE-4968. 
> You can try it and see if the problem has been resolved for your query.
>  
>  
> On Wed, Jul 31, 2013 at 11:21 AM, Yin Huai  (mailto:huaiyin@gmail.com)> wrote:
> > Seems it is another problem.  
> > Can you try
> >  
> >  
> > SELECT *
> > FROM (SELECT VAL001 x1,
> >  VAL002 x2,
> >  VAL003 x3,
> >  VAL004 x4,
> >  VAL005 y
> >   FROM (SELECT /*+ mapjoin(v2) */ (VAL001- mu1) * 1/(sd1) VAL001,
> >(VAL002- mu2) * 1/(sd2) VAL002,
> >(VAL003- mu3) * 1/(sd3) VAL003,
> >(VAL004- mu4) * 1/(sd4) VAL004,
> >(VAL005- mu5) * 1/(sd5) VAL005
> > FROM (SELECT x1 VAL001,
> >  
> >  x2 VAL002,
> >  x3 VAL003,
> >  x4 VAL004,
> >  y VAL005
> >   FROM cmnt) v3
> >  
> > JOIN (SELECT count(*) c,
> >  avg(VAL001) mu1,
> >  avg(VAL002) mu2,
> >  avg(VAL003) mu3,
> >  avg(VAL004) mu4,
> >  avg(VAL005) mu5,
> >  stddev_pop(VAL001) sd1,
> >  stddev_pop(VAL002) sd2,
> >  stddev_pop(VAL003) sd3,
> >  stddev_pop(VAL004) sd4,
> >  stddev_pop(VAL005) sd5
> >   FROM (SELECT *
> > FROM (SELECT x1 VAL001,
> >  x2 VAL002,
> >  x3 VAL003,
> >  x4 VAL004,
> >  y VAL005
> >   FROM cmnt) obj1_3) v1) v2) obj1_7) obj1_6;
> >  
> > Also, cmnt in v3 will be used to create the hash table. Seems the part of 
> > code in converting Join to MapJoin does not play well with this part of 
> > your original query
> >  
> >  
> > SELECT *
> >  FROM
> >(SELECT x1 VAL001,
> >x2 VAL002,
> >x3 VAL003,
> >x4 VAL004,
> >y VAL005
> > FROM cmnt) obj1_3) v3
> >  
> >  
> > I have created https://issues.apache.org/jira/browse/HIVE-4968 to address 
> > this issue.
> >  
> >  
> >  
> >  
> > On Sun, Jul 28, 2013 at 11:46 PM,  > (mailto:wzc1...@gmail.com)> wrote:
> > > Hi:
> > > I attach the output of EXPLAIN, and the hive I use is compiled from trunk 
> > > and my hadoop version is 1.0.1. I use default hive configuration.  
> > >  
> > >  
> > > --  
> > > wzc1...@gmail.com (mailto:wzc1...@gmail.com)
> > > 已使用 Sparrow (http://www.sparrowmailapp.com/?sig)
> > >  
> > > 已使用 Sparrow (http://www.sparrowmailapp.com/?sig)  
> > >  
> > > 在 2013年7月29日星期一,下午1:08,Yin Huai 写道:
> > >  
> > > > Hi,
> > > >  
> > > > Can you also post the output of EXPLAIN? The execution plan may be 
> > > > helpful to locate the problem.
> > > >  
> > > > Thanks,
> > > >  
> > > > Yin
> > > >  
> > > >  
> > > > On Sun, Jul 28, 2013 at 8:06 PM,  > > > (mailto:wzc1...@gmail.com)> wrote:
> > > > > What I mean by "not pass the testcase in HIVE-4650" is that I compile 
> > > > > the trunk code and run the query in HIVE-4650:  
> > > > > SELECT *
> > > > > FROM
> > > > >   (SELECT VAL001 x1,
> > > > >   VAL002 x2,
> > > > >   VAL003 x3,
> > > > >   VAL004 x4,
> > > > >   VAL005 y
> > > > >FROM
> > > > >  (SELECT /*+ mapjoin(v2) */ (VAL001- mu1) * 1/(sd1) 
> > > > > VAL001,(VAL002- mu2) * 1/(sd2) VAL002,(VAL003- mu3) * 1/(sd3) 
> > > > > VAL003,(VAL004- mu4) * 1/(sd4) VAL004,(VAL005- mu5) * 1/(sd5) VAL005
> > > > >   FROM
> > > > > (SELECT *
> > > > >  FROM
> > > > >(SELECT x1 VAL001,
> > > > >x2 VAL002,
> > > > >x3 VAL003,
> > > > >x4 VAL004,
> > > > >y VAL005
> > > > > FROM cmnt) obj1_3) v3
> > > > >   JOIN
> > > > > (SELECT count(*) c,
> > > > > avg(VAL001) mu1,
> > > > > avg(VAL002) mu2,
> > > > > avg(VAL003) mu3,
> > > > > avg(VAL004) mu4,
> > > > > avg(VAL005) mu5,
> > > > > stddev_pop(VAL001) sd1,
> > > > > stddev_pop(VAL002) sd2,
> > > > > stddev_pop(VAL003) sd3,
> > > > > stddev_pop(VAL004) sd4,
> > > > > stddev_pop(VAL005) sd5
> > > > >  FROM
> > > > >(SELECT *
> > > > > FROM
> > > > >   (SELECT x1 VAL001,
> > > > >   x2 VAL002,
> > > > >

回复: BUG IN HIVE-4650 seems not fixed

2013-07-28 Thread wzc1989
Hi:
I attach the output of EXPLAIN, and the hive I use is compiled from trunk and 
my hadoop version is 1.0.1. I use default hive configuration.  


--  
wzc1...@gmail.com
已使用 Sparrow (http://www.sparrowmailapp.com/?sig)

已使用 Sparrow (http://www.sparrowmailapp.com/?sig)  

在 2013年7月29日星期一,下午1:08,Yin Huai 写道:

> Hi,
>  
> Can you also post the output of EXPLAIN? The execution plan may be helpful to 
> locate the problem.
>  
> Thanks,
>  
> Yin
>  
>  
> On Sun, Jul 28, 2013 at 8:06 PM,  (mailto:wzc1...@gmail.com)> wrote:
> > What I mean by "not pass the testcase in HIVE-4650" is that I compile the 
> > trunk code and run the query in HIVE-4650:  
> > SELECT *
> > FROM
> >   (SELECT VAL001 x1,
> >   VAL002 x2,
> >   VAL003 x3,
> >   VAL004 x4,
> >   VAL005 y
> >FROM
> >  (SELECT /*+ mapjoin(v2) */ (VAL001- mu1) * 1/(sd1) VAL001,(VAL002- 
> > mu2) * 1/(sd2) VAL002,(VAL003- mu3) * 1/(sd3) VAL003,(VAL004- mu4) * 
> > 1/(sd4) VAL004,(VAL005- mu5) * 1/(sd5) VAL005
> >   FROM
> > (SELECT *
> >  FROM
> >(SELECT x1 VAL001,
> >x2 VAL002,
> >x3 VAL003,
> >x4 VAL004,
> >y VAL005
> > FROM cmnt) obj1_3) v3
> >   JOIN
> > (SELECT count(*) c,
> > avg(VAL001) mu1,
> > avg(VAL002) mu2,
> > avg(VAL003) mu3,
> > avg(VAL004) mu4,
> > avg(VAL005) mu5,
> > stddev_pop(VAL001) sd1,
> > stddev_pop(VAL002) sd2,
> > stddev_pop(VAL003) sd3,
> > stddev_pop(VAL004) sd4,
> > stddev_pop(VAL005) sd5
> >  FROM
> >(SELECT *
> > FROM
> >   (SELECT x1 VAL001,
> >   x2 VAL002,
> >   x3 VAL003,
> >   x4 VAL004,
> >   y VAL005
> >FROM cmnt) obj1_3) v1) v2) obj1_7) obj1_6 ;
> >  
> >  
> > and it still fail at the same place:
> > …
> > Diagnostic Messages for this Task:
> > java.lang.RuntimeException: 
> > org.apache.hadoop.hive.ql.metadata.HiveException: 
> > java.lang.NullPointerException
> > at 
> > org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:162)
> > at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
> > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
> > at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> > at java.security.AccessController.doPrivileged(Native Method)
> > at javax.security.auth.Subject.doAs(Subject.java:416)
> > at 
> > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
> > at org.apache.hadoop.mapred.Child.main(Child.java:249)
> > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> > java.lang.NullPointerException
> > at 
> > org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:198)
> > at 
> > org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:212)
> > at 
> > org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1377)
> > at 
> > org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1381)
> > at 
> > org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1381)
> > at 
> > org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:611)
> > at 
> > org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
> > ... 8 more
> > Caused by: java.lang.NullPointerException
> > at 
> > org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:186)
> > ... 14 more
> >  
> >  
> > --  
> > wzc1...@gmail.com (mailto:wzc1...@gmail.com)
> > 已使用 Sparrow (http://www.sparrowmailapp.com/?sig)
> >  
> > 已使用 Sparrow (http://www.sparrowmailapp.com/?sig)  
> >  
> > 在 2013年7月28日星期日,下午8:08,wzc1...@gmail.com (mailto:wzc1...@gmail.com) 写道:
> >  
> > > hi all:  
> > >  
> > > We are currently testing hive 0.11 against our production environment and 
> > > run into some problems. Some of them are related to the param 
> > > "hive.auto.convert.join".  
> > > We disable this param and some failed testcases passed. By searching in 
> > > hive jira issues I find that the patch in 
> > > HIVE-4650(https://issues.apache.org/jira/browse/HIVE-4650) may be helpful.
> > > I compile the newest code in trunk and try the failed testcase in 
> > > HIVE-4650, but it doesn't pass. It seems that this issue is not fixed 
> > > while it's closed.
> > >  
> > > Am I missed something?
> > >  
> > > --  
> > > wzc1...@gmail.com (mailto:wzc1...@gmail.com)
> > > 已使用 Sparrow (http://www.sparrowmailapp.com/?sig)
> > >  
>

回复: BUG IN HIVE-4650 seems not fixed

2013-07-28 Thread wzc1989
What I mean by "not pass the testcase in HIVE-4650" is that I compile the trunk 
code and run the query in HIVE-4650:  
SELECT *
FROM
  (SELECT VAL001 x1,
  VAL002 x2,
  VAL003 x3,
  VAL004 x4,
  VAL005 y
   FROM
 (SELECT /*+ mapjoin(v2) */ (VAL001- mu1) * 1/(sd1) VAL001,(VAL002- mu2) * 
1/(sd2) VAL002,(VAL003- mu3) * 1/(sd3) VAL003,(VAL004- mu4) * 1/(sd4) 
VAL004,(VAL005- mu5) * 1/(sd5) VAL005
  FROM
(SELECT *
 FROM
   (SELECT x1 VAL001,
   x2 VAL002,
   x3 VAL003,
   x4 VAL004,
   y VAL005
FROM cmnt) obj1_3) v3
  JOIN
(SELECT count(*) c,
avg(VAL001) mu1,
avg(VAL002) mu2,
avg(VAL003) mu3,
avg(VAL004) mu4,
avg(VAL005) mu5,
stddev_pop(VAL001) sd1,
stddev_pop(VAL002) sd2,
stddev_pop(VAL003) sd3,
stddev_pop(VAL004) sd4,
stddev_pop(VAL005) sd5
 FROM
   (SELECT *
FROM
  (SELECT x1 VAL001,
  x2 VAL002,
  x3 VAL003,
  x4 VAL004,
  y VAL005
   FROM cmnt) obj1_3) v1) v2) obj1_7) obj1_6 ;


and it still fail at the same place:
…
Diagnostic Messages for this Task:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:162)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:198)
at 
org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:212)
at 
org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1377)
at 
org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1381)
at 
org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1381)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:611)
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
... 8 more
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:186)
... 14 more


--  
wzc1...@gmail.com
已使用 Sparrow (http://www.sparrowmailapp.com/?sig)

已使用 Sparrow (http://www.sparrowmailapp.com/?sig)  

在 2013年7月28日星期日,下午8:08,wzc1...@gmail.com 写道:

> hi all:  
>  
> We are currently testing hive 0.11 against our production environment and run 
> into some problems. Some of them are related to the param 
> "hive.auto.convert.join".
> We disable this param and some failed testcases passed. By searching in hive 
> jira issues I find that the patch in 
> HIVE-4650(https://issues.apache.org/jira/browse/HIVE-4650) may be helpful.
> I compile the newest code in trunk and try the failed testcase in HIVE-4650, 
> but it doesn't pass. It seems that this issue is not fixed while it's closed.
>  
> Am I missed something?
>  
> --  
> wzc1...@gmail.com (mailto:wzc1...@gmail.com)
> 已使用 Sparrow (http://www.sparrowmailapp.com/?sig)
>  
> 已使用 Sparrow (http://www.sparrowmailapp.com/?sig)  



BUG IN HIVE-4650 seems not fixed

2013-07-28 Thread wzc1989
hi all:  

We are currently testing hive 0.11 against our production environment and run 
into some problems. Some of them are related to the param 
"hive.auto.convert.join".
We disable this param and some failed testcases passed. By searching in hive 
jira issues I find that the patch in 
HIVE-4650(https://issues.apache.org/jira/browse/HIVE-4650) may be helpful.
I compile the newest code in trunk and try the failed testcase in HIVE-4650, 
but it doesn't pass. It seems that this issue is not fixed while it's closed.

Am I missed something?

--  
wzc1...@gmail.com
已使用 Sparrow (http://www.sparrowmailapp.com/?sig)

已使用 Sparrow (http://www.sparrowmailapp.com/?sig)

回复: different outer join plan between hive 0.9 and hive 0.10

2013-07-03 Thread wzc1989
Hi navis:  
Thanks for your reply. Currently I'm working on the  temporary solution by 
changing the type of filter mask and doing the performance test. I try to read 
the patches and source code now and when I get better understanding of the code 
maybe I can help with this problem :)

--  
wzc1989
已使用 Sparrow (http://www.sparrowmailapp.com/?sig)


在 2013年7月2日星期二,上午8:24,Navis류승우 写道:

> Yes, a little bit.
>  
> IMHO, these flags could be assigned only for aliases with condition on
> 'on' clause. Then, I think, even a byte (8 flags) could be enough in
> most cases.
>  
> I'll do that if time permits.
>  
> 2013/7/1 wzc1989 mailto:wzc1...@gmail.com)>:
> > hi navis:
> > look at the patches in (HIVE-3411, HIVE-4206, HIVE-4212, HIVE-3464), I
> > understand what you mean by "hive tags rows a filter mask as a short for
> > outer join, which can contain 16 flags. " . I wonder why not choose Long or
> > int which can contain 64/32 tags. Does adding one Long/int in every row cost
> > too much?
> >  
> > --
> > wzc1989
> > 已使用 Sparrow
> >  
> > 在 2013年5月14日星期二,下午2:17,Navis류승우 写道:
> >  
> > In short, hive tags rows a filter mask as a short for outer join,
> > which can contain 16 flags. (see HIVE-3411, plz)
> >  
> > I'll survey for a solution.
> >  
> > 2013/5/14 wzc1989 mailto:wzc1...@gmail.com)>:
> >  
> > "hive cannot merge joins of 16+ aliases with outer join into single stage."
> > In our use case we use one table full outer join all other table to produce
> > one big table, which may exceed 16 outer join limits and will be split into
> > multi stage under hive 0.10.
> > It become very slow under hive 0.10 while we run such query well under hive
> > 0.9.
> > I believe it's due to the diff of query plan. I wonder why hive 0.10 cannot
> > merge join 16+ aliases into single stage while hive 0.9 doesn't have such
> > issue. could you explain this or give me some hint?
> >  
> > Thanks!
> >  
> > --
> > wzc1989
> > 已使用 Sparrow
> >  
> > 在 2013年5月14日星期二,下午12:26,Navis류승우 写道:
> >  
> > The error message means hive cannot merge joins of 16+ aliases with
> > outer join into single stage. It was 8 way originally (HIVE-3411) but
> > expanded to 16 later.
> >  
> > Check https://issues.apache.org/jira/browse/HIVE-3411 for details.
> >  
> > 2013/5/14 wzc1989 mailto:wzc1...@gmail.com)>:
> >  
> > This time i cherry-pick HIVE-3464, HIVE-4212, HIVE-4206 and some related
> > commits and the above explain result matches in hive 0.9 and hive 0.10,
> > thanks!
> > But I confuse about this error msg:
> >  
> > JOINNODE_OUTERJOIN_MORETHAN_16(10142, "Single join node containing outer
> > join(s) " +
> > "cannot have more than 16 aliases"),
> >  
> > does this mean in hive0.10 when we have more than 16 outer join the query
> > plan will still have some bug?
> > I test the sql below and find the explain result still diff between hive 0.9
> > and hive 0.10.
> >  
> > explain select
> > sum(a.value) val
> > from default.test_join a
> > left outer join default.test_join b on a.key = b.key
> > left outer join default.test_join c on a.key = c.key
> > left outer join default.test_join d on a.key = d.key
> > left outer join default.test_join e on a.key = e.key
> > left outer join default.test_join f on a.key = f.key
> > left outer join default.test_join g on a.key = g.key
> > left outer join default.test_join h on a.key = h.key
> > left outer join default.test_join i on a.key = i.key
> > left outer join default.test_join j on a.key = j.key
> > left outer join default.test_join k on a.key = k.key
> > left outer join default.test_join l on a.key = l.key
> > left outer join default.test_join m on a.key = m.key
> > left outer join default.test_join n on a.key = n.key
> > left outer join default.test_join u on a.key = u.key
> > left outer join default.test_join v on a.key = v.key
> > left outer join default.test_join w on a.key = w.key
> > left outer join default.test_join x on a.key = x.key
> > left outer join default.test_join z on a.key = z.key
> >  
> >  
> > --
> > wzc1989
> > 已使用 Sparrow
> >  
> > 在 2013年3月29日星期五,上午9:34,Navis류승우 写道:
> >  
> > The problem is mixture of issues (HIVE-3411, HIVE-4209, HIVE-4212,
> > HIVE-3464) and still not completely fixed even in trunk.
> >  
> > Will be fixed shortly.
> >  
> > 2013/3/29 wzc mailto:wzc1...@gmail.com)>:
> >  
>

回复: different outer join plan between hive 0.9 and hive 0.10

2013-07-01 Thread wzc1989
hi navis:
look at the patches in (HIVE-3411, HIVE-4206, HIVE-4212, HIVE-3464),  I 
understand what you mean by "hive tags rows a filter mask as a short for outer 
join, which can contain 16 flags. " . I wonder why not choose Long or int which 
can contain 64/32 tags. Does adding one Long/int in every row cost too much?

--  
wzc1989
已使用 Sparrow (http://www.sparrowmailapp.com/?sig)


在 2013年5月14日星期二,下午2:17,Navis류승우 写道:

> In short, hive tags rows a filter mask as a short for outer join,
> which can contain 16 flags. (see HIVE-3411, plz)
>  
> I'll survey for a solution.
>  
> 2013/5/14 wzc1989 mailto:wzc1...@gmail.com)>:
> > "hive cannot merge joins of 16+ aliases with outer join into single stage."
> > In our use case we use one table full outer join all other table to produce
> > one big table, which may exceed 16 outer join limits and will be split into
> > multi stage under hive 0.10.
> > It become very slow under hive 0.10 while we run such query well under hive
> > 0.9.
> > I believe it's due to the diff of query plan. I wonder why hive 0.10 cannot
> > merge join 16+ aliases into single stage while hive 0.9 doesn't have such
> > issue. could you explain this or give me some hint?
> >  
> > Thanks!
> >  
> > --
> > wzc1989
> > 已使用 Sparrow
> >  
> > 在 2013年5月14日星期二,下午12:26,Navis류승우 写道:
> >  
> > The error message means hive cannot merge joins of 16+ aliases with
> > outer join into single stage. It was 8 way originally (HIVE-3411) but
> > expanded to 16 later.
> >  
> > Check https://issues.apache.org/jira/browse/HIVE-3411 for details.
> >  
> > 2013/5/14 wzc1989 mailto:wzc1...@gmail.com)>:
> >  
> > This time i cherry-pick HIVE-3464, HIVE-4212, HIVE-4206 and some related
> > commits and the above explain result matches in hive 0.9 and hive 0.10,
> > thanks!
> > But I confuse about this error msg:
> >  
> > JOINNODE_OUTERJOIN_MORETHAN_16(10142, "Single join node containing outer
> > join(s) " +
> > "cannot have more than 16 aliases"),
> >  
> > does this mean in hive0.10 when we have more than 16 outer join the query
> > plan will still have some bug?
> > I test the sql below and find the explain result still diff between hive 0.9
> > and hive 0.10.
> >  
> > explain select
> > sum(a.value) val
> > from default.test_join a
> > left outer join default.test_join b on a.key = b.key
> > left outer join default.test_join c on a.key = c.key
> > left outer join default.test_join d on a.key = d.key
> > left outer join default.test_join e on a.key = e.key
> > left outer join default.test_join f on a.key = f.key
> > left outer join default.test_join g on a.key = g.key
> > left outer join default.test_join h on a.key = h.key
> > left outer join default.test_join i on a.key = i.key
> > left outer join default.test_join j on a.key = j.key
> > left outer join default.test_join k on a.key = k.key
> > left outer join default.test_join l on a.key = l.key
> > left outer join default.test_join m on a.key = m.key
> > left outer join default.test_join n on a.key = n.key
> > left outer join default.test_join u on a.key = u.key
> > left outer join default.test_join v on a.key = v.key
> > left outer join default.test_join w on a.key = w.key
> > left outer join default.test_join x on a.key = x.key
> > left outer join default.test_join z on a.key = z.key
> >  
> >  
> > --
> > wzc1989
> > 已使用 Sparrow
> >  
> > 在 2013年3月29日星期五,上午9:34,Navis류승우 写道:
> >  
> > The problem is mixture of issues (HIVE-3411, HIVE-4209, HIVE-4212,
> > HIVE-3464) and still not completely fixed even in trunk.
> >  
> > Will be fixed shortly.
> >  
> > 2013/3/29 wzc mailto:wzc1...@gmail.com)>:
> >  
> > The bug remains even if I apply the patch in HIVE-4206 :( The explain
> > result hasn't change.
> >  
> >  
> > 2013/3/28 Navis류승우 mailto:navis@nexr.com)>
> >  
> >  
> > It's a bug (https://issues.apache.org/jira/browse/HIVE-4206).
> >  
> > Thanks for reporting it.
> >  
> > 2013/3/24 wzc mailto:wzc1...@gmail.com)>:
> >  
> > Recently we tried to upgrade our hive from 0.9 to 0.10, but found some
> > of
> > our hive queries almost 7 times slow. One of such query consists
> > multiple
> > table outer join on the same key. By looking into the query, we found
> > the
> > query plans generate by hive 0.9 and hive 0.10 are different. Here is
> > the
> > example:
> >  
> &g

回复: different outer join plan between hive 0.9 and hive 0.10

2013-05-13 Thread wzc1989
"hive cannot merge joins of 16+ aliases with outer join into single stage."
In our use case we use one table full outer join all other table to produce one 
big table, which may exceed 16 outer join limits and will be split into multi 
stage under hive 0.10.
It become very slow under hive 0.10 while we run such query well under hive 0.9.
I believe it's due to the diff of query plan. I wonder why hive 0.10 cannot 
merge join 16+ aliases into single stage while hive 0.9 doesn't have such 
issue. could you explain this or give me some hint?

Thanks!  

--  
wzc1989
已使用 Sparrow (http://www.sparrowmailapp.com/?sig)


在 2013年5月14日星期二,下午12:26,Navis류승우 写道:

> The error message means hive cannot merge joins of 16+ aliases with
> outer join into single stage. It was 8 way originally (HIVE-3411) but
> expanded to 16 later.
>  
> Check https://issues.apache.org/jira/browse/HIVE-3411 for details.
>  
> 2013/5/14 wzc1989 mailto:wzc1...@gmail.com)>:
> > This time i cherry-pick HIVE-3464, HIVE-4212, HIVE-4206 and some related
> > commits and the above explain result matches in hive 0.9 and hive 0.10,
> > thanks!
> > But I confuse about this error msg:
> >  
> > JOINNODE_OUTERJOIN_MORETHAN_16(10142, "Single join node containing outer
> > join(s) " +
> > "cannot have more than 16 aliases"),
> >  
> > does this mean in hive0.10 when we have more than 16 outer join the query
> > plan will still have some bug?
> > I test the sql below and find the explain result still diff between hive 0.9
> > and hive 0.10.
> >  
> > explain select
> > sum(a.value) val
> > from default.test_join a
> > left outer join default.test_join b on a.key = b.key
> > left outer join default.test_join c on a.key = c.key
> > left outer join default.test_join d on a.key = d.key
> > left outer join default.test_join e on a.key = e.key
> > left outer join default.test_join f on a.key = f.key
> > left outer join default.test_join g on a.key = g.key
> > left outer join default.test_join h on a.key = h.key
> > left outer join default.test_join i on a.key = i.key
> > left outer join default.test_join j on a.key = j.key
> > left outer join default.test_join k on a.key = k.key
> > left outer join default.test_join l on a.key = l.key
> > left outer join default.test_join m on a.key = m.key
> > left outer join default.test_join n on a.key = n.key
> > left outer join default.test_join u on a.key = u.key
> > left outer join default.test_join v on a.key = v.key
> > left outer join default.test_join w on a.key = w.key
> > left outer join default.test_join x on a.key = x.key
> > left outer join default.test_join z on a.key = z.key
> >  
> >  
> > --
> > wzc1989
> > 已使用 Sparrow
> >  
> > 在 2013年3月29日星期五,上午9:34,Navis류승우 写道:
> >  
> > The problem is mixture of issues (HIVE-3411, HIVE-4209, HIVE-4212,
> > HIVE-3464) and still not completely fixed even in trunk.
> >  
> > Will be fixed shortly.
> >  
> > 2013/3/29 wzc mailto:wzc1...@gmail.com)>:
> >  
> > The bug remains even if I apply the patch in HIVE-4206 :( The explain
> > result hasn't change.
> >  
> >  
> > 2013/3/28 Navis류승우 mailto:navis@nexr.com)>
> >  
> >  
> > It's a bug (https://issues.apache.org/jira/browse/HIVE-4206).
> >  
> > Thanks for reporting it.
> >  
> > 2013/3/24 wzc mailto:wzc1...@gmail.com)>:
> >  
> > Recently we tried to upgrade our hive from 0.9 to 0.10, but found some
> > of
> > our hive queries almost 7 times slow. One of such query consists
> > multiple
> > table outer join on the same key. By looking into the query, we found
> > the
> > query plans generate by hive 0.9 and hive 0.10 are different. Here is
> > the
> > example:
> >  
> > testcase:
> >  
> > use default;
> > create table test_join (
> > `key` string,
> > `value` string
> > );
> >  
> > explain select
> > sum(a.value) val
> > from default.test_join a
> > left outer join default.test_join b on a.key = b.key
> > left outer join default.test_join c on a.key = c.key
> > left outer join default.test_join d on a.key = d.key
> > left outer join default.test_join e on a.key = e.key
> > left outer join default.test_join f on a.key = f.key
> > left outer join default.test_join g on a.key = g.key
> >  
> >  
> > the explain of hive 0.9:
> >  
> > STAGE DEPENDENCIES:
> >  
> > Stage-1 is a root stage
> >  
> > Stage-2 depends on stages: Stage-1
> >  
> > Stage

回复: different outer join plan between hive 0.9 and hive 0.10

2013-05-13 Thread wzc1989
This time i cherry-pick HIVE-3464, HIVE-4212, HIVE-4206 and some related 
commits and the above explain result matches in hive 0.9 and hive 0.10, thanks!
But I confuse about this error msg:

 JOINNODE_OUTERJOIN_MORETHAN_16(10142, "Single join node containing outer 
join(s) " +
  "cannot have more than 16 aliases"),


does this mean in hive0.10 when we have more than 16 outer join the query plan 
will still have some bug?
I test the sql below and find the explain result still diff between hive 0.9 
and hive 0.10.  

explain select
sum(a.value) val
from default.test_join a
left outer join default.test_join b on a.key = b.key
left outer join default.test_join c on a.key = c.key
left outer join default.test_join d on a.key = d.key
left outer join default.test_join e on a.key = e.key
left outer join default.test_join f on a.key = f.key
left outer join default.test_join g on a.key = g.key
left outer join default.test_join h on a.key = h.key
left outer join default.test_join i on a.key = i.key
left outer join default.test_join j on a.key = j.key
left outer join default.test_join k on a.key = k.key
left outer join default.test_join l on a.key = l.key
left outer join default.test_join m on a.key = m.key
left outer join default.test_join n on a.key = n.key
left outer join default.test_join u on a.key = u.key
left outer join default.test_join v on a.key = v.key
left outer join default.test_join w on a.key = w.key
left outer join default.test_join x on a.key = x.key
left outer join default.test_join z on a.key = z.key



--  
wzc1989
已使用 Sparrow (http://www.sparrowmailapp.com/?sig)


在 2013年3月29日星期五,上午9:34,Navis류승우 写道:

> The problem is mixture of issues (HIVE-3411, HIVE-4209, HIVE-4212,
> HIVE-3464) and still not completely fixed even in trunk.
>  
> Will be fixed shortly.
>  
> 2013/3/29 wzc mailto:wzc1...@gmail.com)>:
> > The bug remains even if I apply the patch in HIVE-4206 :( The explain
> > result hasn't change.
> >  
> >  
> > 2013/3/28 Navis류승우 mailto:navis@nexr.com)>
> > >  
> > > It's a bug (https://issues.apache.org/jira/browse/HIVE-4206).
> > >  
> > > Thanks for reporting it.
> > >  
> > > 2013/3/24 wzc mailto:wzc1...@gmail.com)>:
> > > > Recently we tried to upgrade our hive from 0.9 to 0.10, but found some
> > > > of
> > > > our hive queries almost 7 times slow. One of such query consists
> > > > multiple
> > > > table outer join on the same key. By looking into the query, we found
> > > > the
> > > > query plans generate by hive 0.9 and hive 0.10 are different. Here is
> > > > the
> > > > example:
> > > >  
> > > > testcase:
> > > >  
> > > > use default;
> > > > create table test_join (
> > > > `key` string,
> > > > `value` string
> > > > );
> > > >  
> > > > explain select
> > > > sum(a.value) val
> > > > from default.test_join a
> > > > left outer join default.test_join b on a.key = b.key
> > > > left outer join default.test_join c on a.key = c.key
> > > > left outer join default.test_join d on a.key = d.key
> > > > left outer join default.test_join e on a.key = e.key
> > > > left outer join default.test_join f on a.key = f.key
> > > > left outer join default.test_join g on a.key = g.key
> > > >  
> > > >  
> > > > the explain of hive 0.9:
> > > >  
> > > > STAGE DEPENDENCIES:
> > > >  
> > > > Stage-1 is a root stage
> > > >  
> > > > Stage-2 depends on stages: Stage-1
> > > >  
> > > > Stage-0 is a root stage
> > > >  
> > > > ...
> > > >  
> > > > Reduce Operator Tree:
> > > >  
> > > > Join Operator
> > > >  
> > > > condition map:
> > > >  
> > > > Left Outer Join0 to 1
> > > >  
> > > > Left Outer Join0 to 2
> > > >  
> > > > Left Outer Join0 to 3
> > > >  
> > > > Left Outer Join0 to 4
> > > >  
> > > > Left Outer Join0 to 5
> > > >  
> > > > Left Outer Join0 to 6
> > > >  
> > > > condition expressions:
> > > >  
> > > > 0 {VALUE._col1}
> > > >  
> > > > 1
> > > >  
> > > > 2
> > > >  
> > > > 3
> > > >  
> > > > 4
> > > >  
> > > > 5
> > > >  
> > > > 6
&