[jira] [Commented] (HIVE-12189) The list in pushdownPreds of ppd.ExprWalkerInfo should not be allowed to grow very large
[ https://issues.apache.org/jira/browse/HIVE-12189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14967804#comment-14967804 ]

Ashutosh Chauhan commented on HIVE-12189:
-----------------------------------------

[~jcamachorodriguez] This one might interest you.

> The list in pushdownPreds of ppd.ExprWalkerInfo should not be allowed to grow very large
> ----------------------------------------------------------------------------------------
>
>              Key: HIVE-12189
>              URL: https://issues.apache.org/jira/browse/HIVE-12189
>          Project: Hive
>       Issue Type: Bug
>       Components: Logical Optimizer
> Affects Versions: 1.1.0, 2.0.0
>         Reporter: Yongzhi Chen
>         Assignee: Yongzhi Chen
>      Attachments: HIVE-12189.1.patch
>
> Some queries are very slow to compile. For example, the following query:
> {noformat}
> select * from tt1 nf
> join tt2 a1 on (nf.col1 = a1.col1 and nf.hdp_databaseid = a1.hdp_databaseid)
> join tt3 a2 on (a2.col2 = a1.col2 and a2.col3 = nf.col3 and a2.hdp_databaseid = nf.hdp_databaseid)
> join tt4 a3 on (a3.col4 = a2.col4 and a3.col3 = a2.col3)
> join tt5 a4 on (a4.col4 = a2.col4 and a4.col5 = a2.col5 and a4.col3 = a2.col3 and a4.hdp_databaseid = nf.hdp_databaseid)
> join tt6 a5 on (a5.col3 = a2.col3 and a5.col2 = a2.col2 and a5.hdp_databaseid = nf.hdp_databaseid)
> JOIN tt7 a6 ON (a2.col3 = a6.col3 and a2.col2 = a6.col2 and a6.hdp_databaseid = nf.hdp_databaseid)
> JOIN tt8 a7 ON (a2.col3 = a7.col3 and a2.col2 = a7.col2 and a7.hdp_databaseid = nf.hdp_databaseid)
> where nf.hdp_databaseid = 102 limit 10;
> {noformat}
> takes around 120 seconds to compile in Hive 1.1 when
> hive.mapred.mode=strict;
> hive.optimize.ppd=true;
> and Hive is not in test mode.
> All the above tables are partitioned on a single column, but all of them are empty. If the tables are not empty, the compile is reportedly so slow that it looks like Hive is hanging.
> In Hive 2.0 the compile is much faster; explain takes 6.6 seconds. But that is still a lot of time.
> One of the problems that slows ppd down is that the list in pushdownPreds can grow very large, which gives extractPushdownPreds bad performance:
> {noformat}
> public static ExprWalkerInfo extractPushdownPreds(OpWalkerInfo opContext,
>     Operator op, List<ExprNodeDesc> preds)
> {noformat}
> While running the query above, at a breakpoint in this method preds has a size of 12051, and most entries of the list are:
> GenericUDFOPEqual(Column[hdp_databaseid], Const int 102),
> GenericUDFOPEqual(Column[hdp_databaseid], Const int 102),
> GenericUDFOPEqual(Column[hdp_databaseid], Const int 102),
> GenericUDFOPEqual(Column[hdp_databaseid], Const int 102),
> The following code in extractPushdownPreds clones all the nodes in preds and does the walk. Hive 2.0 is faster because HIVE-11652 (and other jiras) makes startWalking much faster, but we still clone thousands of nodes with the same expression. Should we store so many identical predicates in the list, when just one would be enough?
> {noformat}
> List<ExprNodeDesc> startNodes = new ArrayList<ExprNodeDesc>();
> List<ExprNodeDesc> clonedPreds = new ArrayList<ExprNodeDesc>();
> for (ExprNodeDesc node : preds) {
>   ExprNodeDesc clone = node.clone();
>   clonedPreds.add(clone);
>   exprContext.getNewToOldExprMap().put(clone, node);
> }
> startNodes.addAll(clonedPreds);
> egw.startWalking(startNodes, null);
> {noformat}
> Should we change the methods
> public void addFinalCandidate(String alias, ExprNodeDesc expr)
> and
> public void addPushDowns(String alias, List<ExprNodeDesc> pushDowns)
> in java/org/apache/hadoop/hive/ql/ppd/ExprWalkerInfo.java so that they only add an expr that is not already in the pushdown list for its alias?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
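The duplicate-check proposed at the end of the description can be sketched in isolation. This is a minimal, hypothetical illustration, not the actual ExprWalkerInfo API: `SimpleExprInfo` stands in for ExprWalkerInfo, and plain strings stand in for ExprNodeDesc (the real patch would have to compare ExprNodeDesc instances by expression equality rather than String.equals). The point is only that addFinalCandidate/addPushDowns skip predicates already queued for an alias, so the list stays small even when joins contribute the same predicate thousands of times.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for ExprWalkerInfo: predicates are plain strings here,
// whereas the real class stores ExprNodeDesc objects.
class SimpleExprInfo {
    private final Map<String, List<String>> pushdownPreds = new HashMap<>();

    // Only add the predicate if an equal one is not already queued for the alias.
    public void addFinalCandidate(String alias, String expr) {
        List<String> preds =
            pushdownPreds.computeIfAbsent(alias, k -> new ArrayList<>());
        if (!preds.contains(expr)) {
            preds.add(expr);
        }
    }

    // Bulk variant reuses the same duplicate check.
    public void addPushDowns(String alias, List<String> pushDowns) {
        for (String expr : pushDowns) {
            addFinalCandidate(alias, expr);
        }
    }

    public List<String> getPushdowns(String alias) {
        return pushdownPreds.getOrDefault(alias, new ArrayList<>());
    }
}

public class DedupSketch {
    public static void main(String[] args) {
        SimpleExprInfo info = new SimpleExprInfo();
        // Simulate the repeated predicate observed in the bug report.
        for (int i = 0; i < 4; i++) {
            info.addFinalCandidate("nf",
                "GenericUDFOPEqual(Column[hdp_databaseid], Const int 102)");
        }
        // The alias's list holds one entry instead of four.
        System.out.println(info.getPushdowns("nf").size());
    }
}
```

With ExprNodeDesc the containment test would need a semantic comparison (clones of the same predicate are distinct objects), which is why the later review comment focuses on predicates that are "semantically the same."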
[jira] [Commented] (HIVE-12189) The list in pushdownPreds of ppd.ExprWalkerInfo should not be allowed to grow very large
[ https://issues.apache.org/jira/browse/HIVE-12189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14967713#comment-14967713 ]

Chao Sun commented on HIVE-12189:
---------------------------------

Sorry, didn't see this. I'll take a look at the patch today.
[jira] [Commented] (HIVE-12189) The list in pushdownPreds of ppd.ExprWalkerInfo should not be allowed to grow very large
[ https://issues.apache.org/jira/browse/HIVE-12189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968542#comment-14968542 ]

Chao Sun commented on HIVE-12189:
---------------------------------

Patch looks good to me. +1. I don't think we should add predicates that are semantically the same.
[jira] [Commented] (HIVE-12189) The list in pushdownPreds of ppd.ExprWalkerInfo should not be allowed to grow very large
[ https://issues.apache.org/jira/browse/HIVE-12189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14965444#comment-14965444 ]

Yongzhi Chen commented on HIVE-12189:
-------------------------------------

[~csun], [~ctang.ma] could you review the change? Thanks
[jira] [Commented] (HIVE-12189) The list in pushdownPreds of ppd.ExprWalkerInfo should not be allowed to grow very large
[ https://issues.apache.org/jira/browse/HIVE-12189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14963281#comment-14963281 ]

Yongzhi Chen commented on HIVE-12189:
-------------------------------------

I did some comparison tests:
1. Backporting HIVE-11652 to Hive 1.1 does not make the compile much faster; it still took 102 seconds. So HIVE-11652 alone cannot make egw.startWalking as fast as Hive 2.0; I need to find more jiras to backport.
2. Applying only the patch for this jira (HIVE-12189) drops the compile time for the query to 6.2 seconds on Hive 1.1.
3. On Hive 2.0, this patch drops the compile time from 6.6 seconds to 2.3 seconds.
[jira] [Commented] (HIVE-12189) The list in pushdownPreds of ppd.ExprWalkerInfo should not be allowed to grow very large
[ https://issues.apache.org/jira/browse/HIVE-12189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14962169#comment-14962169 ]

Yongzhi Chen commented on HIVE-12189:
-------------------------------------

The 4 failures are not related; their ages are more than 14.
[jira] [Commented] (HIVE-12189) The list in pushdownPreds of ppd.ExprWalkerInfo should not be allowed to grow very large
[ https://issues.apache.org/jira/browse/HIVE-12189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14962014#comment-14962014 ]

Hive QA commented on HIVE-12189:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12767094/HIVE-12189.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 9702 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_explode
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udtf_explode
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5693/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5693/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5693/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12767094 - PreCommit-HIVE-TRUNK-Build
[jira] [Commented] (HIVE-12189) The list in pushdownPreds of ppd.ExprWalkerInfo should not be allowed to grow very large
[ https://issues.apache.org/jira/browse/HIVE-12189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960959#comment-14960959 ]

Yongzhi Chen commented on HIVE-12189:
-------------------------------------

With the patch, the explain time drops to 2.* seconds.