[jira] [Commented] (HIVE-12189) The list in pushdownPreds of ppd.ExprWalkerInfo should not be allowed to grow very large

2015-10-21, Ashutosh Chauhan (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-12189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14967804#comment-14967804 ]

Ashutosh Chauhan commented on HIVE-12189:
-

[~jcamachorodriguez] This one might interest you.

> The list in pushdownPreds of ppd.ExprWalkerInfo should not be allowed to grow 
> very large
> 
>
> Key: HIVE-12189
> URL: https://issues.apache.org/jira/browse/HIVE-12189
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 1.1.0, 2.0.0
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
> Attachments: HIVE-12189.1.patch
>
>
> Some queries are very slow to compile. For example, the following query
> {noformat}
> select * from tt1 nf 
> join tt2 a1 on (nf.col1 = a1.col1 and nf.hdp_databaseid = a1.hdp_databaseid) 
> join tt3 a2 on(a2.col2 = a1.col2 and a2.col3 = nf.col3 and 
> a2.hdp_databaseid = nf.hdp_databaseid) 
> join tt4 a3 on  (a3.col4 = a2.col4 and a3.col3 = a2.col3) 
> join tt5 a4 on (a4.col4 = a2.col4 and a4.col5 = a2.col5 and a4.col3 = 
> a2.col3 and a4.hdp_databaseid = nf.hdp_databaseid) 
> join tt6 a5 on  (a5.col3 = a2.col3 and a5.col2 = a2.col2 and 
> a5.hdp_databaseid = nf.hdp_databaseid) 
> JOIN tt7 a6 ON (a2.col3 = a6.col3 and a2.col2 = a6.col2 and a6.hdp_databaseid 
> = nf.hdp_databaseid) 
> JOIN tt8 a7 ON (a2.col3 = a7.col3 and a2.col2 = a7.col2 and a7.hdp_databaseid 
> = nf.hdp_databaseid)
> where nf.hdp_databaseid = 102 limit 10;
> {noformat}
> takes around 120 seconds to compile in Hive 1.1 when
> hive.mapred.mode=strict;
> hive.optimize.ppd=true;
> and Hive is not in test mode.
> All the above tables are partitioned on one column, and all of them are 
> empty. If the tables are not empty, the compile is reportedly so slow that 
> it looks like Hive is hanging. 
> In Hive 2.0 the compile is much faster: explain takes 6.6 seconds. But that 
> is still a lot of time. One of the problems slowing PPD down is that the 
> list in pushdownPreds can grow very large, which gives extractPushdownPreds 
> bad performance:
> {noformat}
> public static ExprWalkerInfo extractPushdownPreds(OpWalkerInfo opContext,
>     Operator<? extends OperatorDesc> op, List<ExprNodeDesc> preds)
> {noformat}
> While running the query above, at a breakpoint here preds has a size of 
> 12051, and most entries of the list are duplicates like: 
> GenericUDFOPEqual(Column[hdp_databaseid], Const int 102), 
> GenericUDFOPEqual(Column[hdp_databaseid], Const int 102), 
> GenericUDFOPEqual(Column[hdp_databaseid], Const int 102), 
> GenericUDFOPEqual(Column[hdp_databaseid], Const int 102), 
> The following code in extractPushdownPreds will clone all the nodes in 
> preds and do the walk. Hive 2.0 is faster because HIVE-11652 (and other 
> jiras) makes startWalking much faster, but we still clone thousands of 
> nodes with the same expression. Should we store so many identical 
> predicates in the list, or is just one good enough?
> {noformat}
> List<Node> startNodes = new ArrayList<Node>();
> List<ExprNodeDesc> clonedPreds = new ArrayList<ExprNodeDesc>();
> for (ExprNodeDesc node : preds) {
>   ExprNodeDesc clone = node.clone();
>   clonedPreds.add(clone);
>   exprContext.getNewToOldExprMap().put(clone, node);
> }
> startNodes.addAll(clonedPreds);
> egw.startWalking(startNodes, null);
> {noformat}
> Should we change java/org/apache/hadoop/hive/ql/ppd/ExprWalkerInfo.java
> methods 
> public void addFinalCandidate(String alias, ExprNodeDesc expr) 
> and
> public void addPushDowns(String alias, List<ExprNodeDesc> pushDowns) 
> so that they only add an expr that is not already in the pushdown list for 
> the alias?
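For illustration, here is a minimal sketch of the kind of insertion-time guard the description proposes, kept in a standalone class. It assumes ExprNodeDesc.isSame() as the semantic-equality check, and the class name DedupPushdownPreds is hypothetical; the actual HIVE-12189 patch may implement the check differently.

{noformat}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;

// Hypothetical sketch, not the actual HIVE-12189 patch.
class DedupPushdownPreds {
  // alias -> candidate pushdown predicates, duplicates filtered on insert
  private final Map<String, List<ExprNodeDesc>> pushdownPreds =
      new HashMap<String, List<ExprNodeDesc>>();

  public void addFinalCandidate(String alias, ExprNodeDesc expr) {
    List<ExprNodeDesc> predList = getPushdownPreds(alias);
    for (ExprNodeDesc existing : predList) {
      if (existing.isSame(expr)) {
        return; // a semantically identical predicate is already queued
      }
    }
    predList.add(expr);
  }

  public void addPushDowns(String alias, List<ExprNodeDesc> pushDowns) {
    for (ExprNodeDesc expr : pushDowns) {
      addFinalCandidate(alias, expr); // reuse the same dedup check per expr
    }
  }

  private List<ExprNodeDesc> getPushdownPreds(String alias) {
    List<ExprNodeDesc> predList = pushdownPreds.get(alias);
    if (predList == null) {
      predList = new ArrayList<ExprNodeDesc>();
      pushdownPreds.put(alias, predList);
    }
    return predList;
  }
}
{noformat}

The linear isSame() scan makes each insert O(n) in the list size; if the lists can still grow long, a set keyed on ExprNodeDesc.getExprString() would keep the duplicate check near O(1).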





[jira] [Commented] (HIVE-12189) The list in pushdownPreds of ppd.ExprWalkerInfo should not be allowed to grow very large

2015-10-21, Chao Sun (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-12189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14967713#comment-14967713 ]

Chao Sun commented on HIVE-12189:
-

Sorry, didn't see this. I'll take a look at the patch today.



[jira] [Commented] (HIVE-12189) The list in pushdownPreds of ppd.ExprWalkerInfo should not be allowed to grow very large

2015-10-21, Chao Sun (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-12189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14968542#comment-14968542 ]

Chao Sun commented on HIVE-12189:
-

Patch looks good to me. +1. I don't think we should add predicates that are 
semantically the same.




[jira] [Commented] (HIVE-12189) The list in pushdownPreds of ppd.ExprWalkerInfo should not be allowed to grow very large

2015-10-20, Yongzhi Chen (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-12189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965444#comment-14965444 ]

Yongzhi Chen commented on HIVE-12189:
-

[~csun], [~ctang.ma] could you review the change? Thanks



[jira] [Commented] (HIVE-12189) The list in pushdownPreds of ppd.ExprWalkerInfo should not be allowed to grow very large

2015-10-19, Yongzhi Chen (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-12189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14963281#comment-14963281 ]

Yongzhi Chen commented on HIVE-12189:
-

I did some comparison tests:
1. Backporting HIVE-11652 to Hive 1.1 does not make the compile much 
faster; it still took 102 seconds. So HIVE-11652 alone cannot make 
egw.startWalking as fast as in Hive 2.0; I need to find more jiras to backport.
2. Applying only the patch for this jira (HIVE-12189) drops the compile time 
for the query to 6.2 seconds on Hive 1.1. 
3. For Hive 2.0, this patch drops the compile time from 6.6 seconds to 2.3.




[jira] [Commented] (HIVE-12189) The list in pushdownPreds of ppd.ExprWalkerInfo should not be allowed to grow very large

2015-10-17, Yongzhi Chen (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-12189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14962169#comment-14962169 ]

Yongzhi Chen commented on HIVE-12189:
-

The 4 failures are not related: each has an age greater than 14, i.e. it was already failing in earlier precommit builds.



[jira] [Commented] (HIVE-12189) The list in pushdownPreds of ppd.ExprWalkerInfo should not be allowed to grow very large

2015-10-17, Hive QA (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-12189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14962014#comment-14962014 ]

Hive QA commented on HIVE-12189:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12767094/HIVE-12189.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 9702 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_explode
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udtf_explode
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5693/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5693/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5693/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12767094 - PreCommit-HIVE-TRUNK-Build


[jira] [Commented] (HIVE-12189) The list in pushdownPreds of ppd.ExprWalkerInfo should not be allowed to grow very large

2015-10-16, Yongzhi Chen (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-12189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960959#comment-14960959 ]

Yongzhi Chen commented on HIVE-12189:
-

With the patch, the explain time drops to 2.x seconds.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)