[jira] [Commented] (HIVE-27102) Upgrade Calcite to 1.33.0 and Avatica to 1.23.0
[ https://issues.apache.org/jira/browse/HIVE-27102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843243#comment-17843243 ]

Stamatis Zampetakis commented on HIVE-27102:

One major element that was introduced in Calcite 1.26.0 and heavily affects the upgrade in Hive is the internal SEARCH operator (CALCITE-4173). As discussed under the respective ticket, this operator aims to represent and unify various kinds of expressions, notably IN, BETWEEN, and conjunctions/disjunctions with range/equality predicates.

Since Hive does not know about the SEARCH operator, one idea was to try to eliminate it from certain optimization phases in Hive by relying on {{RexUtil#expandSearch}} and the {{HiveSearchExpandRule}} (which was introduced in https://github.com/zabetak/hive/tree/calcite-upgrade-1.33). However, some core APIs in Calcite, such as RelBuilder and RexSimplify, now return the internal SEARCH operator and thus affect many rules, metadata providers, and other APIs. It might be difficult to ensure that the SEARCH operator is completely eliminated from the plans, and doing this may result in brittle code.

Going forward, Calcite will rely more and more on the SEARCH operator, so instead of trying to get rid of it we should embrace it and ensure that we are handling it properly in Hive. In fact, we should try to use the SEARCH operator as much as possible during the optimization phase and avoid back-and-forth conversions among SEARCH, BETWEEN, IN, etc. Failure to do so will probably make future upgrades harder and harder, and it will increase the likelihood of "infinite rule matching" compilation failures.

In Hive there are two kinds of rules that are strongly related to the SEARCH operator:
* HivePointLookupOptimizerRule (useful for normalization and runtime performance)
* HiveInBetweenExpandRule (useful for normalization and view-based rewriting)

With the advent of the SEARCH operator these rules are heavily impacted. Parts of the rules are probably redundant since RexSimplify should be able to handle some if not all of their use cases. Ideally, these rules should be removed altogether. Since the physical evaluation of the IN operator seems to have some benefits over the evaluation of OR (HIVE-11424), we should have some logic at the end of the optimization phase that decides whether to translate the SEARCH operator to a physical OR or IN operator.

Apart from the aforementioned rules, there are probably other places where we have to consider the SEARCH operator and, correspondingly, get rid of the IN operator, but this seems to be the best way forward.

Note that the presence of the SEARCH operator in the EXPLAIN CBO plan is not something to be avoided if the resulting plans are equivalent or better. We should focus only on cases where we spot regressions and decide how to tackle them. The plan changes that need to be reviewed more carefully are those at the physical layer, in order to ensure that we don't have performance regressions.

> Upgrade Calcite to 1.33.0 and Avatica to 1.23.0
> ---
>
> Key: HIVE-27102
> URL: https://issues.apache.org/jira/browse/HIVE-27102
> Project: Hive
> Issue Type: Improvement
> Components: CBO
> Reporter: Stamatis Zampetakis
> Assignee: Stamatis Zampetakis
> Priority: Major
>
> New versions for Calcite and Avatica are available so we should upgrade to
> them.
> I had some WIP in HIVE-26610 for upgrading calcite to 1.32.0 but given that
> the work was not in very advanced state it is preferred to jump directly to
> 1.33.0.
> Avatica must be in line with Calcite so both need to be updated at the same
> time.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
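As a rough illustration of the last step proposed in the comment above (deciding, at the end of the optimization phase, whether a SEARCH becomes a physical IN or OR), here is a minimal, self-contained Java sketch. It does not use the Calcite or Hive APIs: the Interval class is only a simplified stand-in for one range of a SEARCH argument, and SearchTranslationSketch, translate, point, range, and the minInSize threshold are all hypothetical names introduced for this example, not anything that exists in either project.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch only; none of these names exist in Hive or Calcite. */
public class SearchTranslationSketch {

  /** Closed integer interval [lo, hi]; it is a single point when lo == hi. */
  public static final class Interval {
    final int lo;
    final int hi;

    Interval(int lo, int hi) {
      if (lo > hi) {
        throw new IllegalArgumentException("lo must not exceed hi");
      }
      this.lo = lo;
      this.hi = hi;
    }

    boolean isPoint() {
      return lo == hi;
    }
  }

  public static Interval point(int v) {
    return new Interval(v, v);
  }

  public static Interval range(int lo, int hi) {
    return new Interval(lo, hi);
  }

  /**
   * Renders the ranges as IN when every range is a point and there are at
   * least minInSize of them (assumed threshold); otherwise renders a
   * disjunction of equality and BETWEEN predicates.
   */
  public static String translate(String column, int minInSize, Interval... ranges) {
    if (ranges.length == 0) {
      return "FALSE"; // an empty set of ranges matches nothing
    }
    boolean allPoints = true;
    for (Interval r : ranges) {
      allPoints &= r.isPoint();
    }
    if (allPoints && ranges.length >= minInSize) {
      List<String> values = new ArrayList<>();
      for (Interval r : ranges) {
        values.add(String.valueOf(r.lo));
      }
      return column + " IN (" + String.join(", ", values) + ")";
    }
    List<String> disjuncts = new ArrayList<>();
    for (Interval r : ranges) {
      disjuncts.add(r.isPoint()
          ? column + " = " + r.lo
          : column + " BETWEEN " + r.lo + " AND " + r.hi);
    }
    return String.join(" OR ", disjuncts);
  }

  public static void main(String[] args) {
    System.out.println(translate("d_year", 2, point(1999), point(2000), point(2001)));
    System.out.println(translate("d_year", 2, point(1999), range(2001, 2005)));
    System.out.println(translate("d_year", 2, point(1999)));
  }
}
```

With a threshold of 2, the three calls in main print `d_year IN (1999, 2000, 2001)`, `d_year = 1999 OR d_year BETWEEN 2001 AND 2005`, and `d_year = 1999`. A real implementation would have to work on Calcite's own range representation, handle NULL semantics and open bounds, and pick the threshold based on measurements such as those discussed in HIVE-11424.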
[jira] [Commented] (HIVE-27102) Upgrade Calcite to 1.33.0 and Avatica to 1.23.0
[ https://issues.apache.org/jira/browse/HIVE-27102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839518#comment-17839518 ]

Stamatis Zampetakis commented on HIVE-27102:

Hey [~frankgrimes97], Calcite upgrades are rather complex, but we will try to advance this work in the next few weeks and hopefully have it in 4.1.0. Other than that, it's worth mentioning that CVE-2020-13955 and CVE-2022-39135 are probably not exploitable via Hive since the respective code paths do not seem to be used.
[jira] [Commented] (HIVE-27102) Upgrade Calcite to 1.33.0 and Avatica to 1.23.0
[ https://issues.apache.org/jira/browse/HIVE-27102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838656#comment-17838656 ]

Frank Grimes commented on HIVE-27102:

Any update on this? I see that Hive 4.0.0 has recently been released, but it still uses Calcite 1.25.0, which we believe is still vulnerable to the following:
- [CVE-2020-13955 - Missing Authentication for Critical Function in Apache Calcite|https://nvd.nist.gov/vuln/detail/CVE-2020-13955]
- [CVE-2022-39135 - Apache Calcite before 1.32.0 vulnerable to potential XML External Entity (XXE) attack|https://nvd.nist.gov/vuln/detail/CVE-2022-39135]
[jira] [Commented] (HIVE-27102) Upgrade Calcite to 1.33.0 and Avatica to 1.23.0
[ https://issues.apache.org/jira/browse/HIVE-27102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692778#comment-17692778 ]

Stamatis Zampetakis commented on HIVE-27102:

I will not be working on this for the next few weeks, so if someone wants to advance this, the ideal next step would be to address the two issues mentioned above.
[jira] [Commented] (HIVE-27102) Upgrade Calcite to 1.33.0 and Avatica to 1.23.0
[ https://issues.apache.org/jira/browse/HIVE-27102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692776#comment-17692776 ]

Stamatis Zampetakis commented on HIVE-27102:

The branch I am working on can be found here: [https://github.com/zabetak/hive/tree/calcite-upgrade-1.33]

All the compilation problems are fixed in the branch. Moreover, there are workarounds and notes for some other problems found along the way.

At this stage I am mainly running the CBO queries with the TPCDS stats, trying to resolve errors and explain plan changes.
{noformat}
mvn test -Dtest=TestTezTPCDS30TBPerfCliDriver -Dqfile_regex=cbo_query.* -Dtest.output.overwrite
{noformat}
In terms of errors, there are at least a few (cbo_query8.q and others) with the stacktrace below:
{noformat}
[ERROR] org.apache.hadoop.hive.cli.TestTezTPCDS30TBPerfCliDriver.testCliDriver[cbo_query8] Time elapsed: 0.425 s <<< FAILURE!
java.lang.AssertionError
	at org.apache.calcite.rex.RexCall.<init>(RexCall.java:86)
	at org.apache.calcite.rex.RexBuilder.makeCall(RexBuilder.java:253)
	at org.apache.hadoop.hive.ql.parse.type.HiveFunctionHelper.getExpression(HiveFunctionHelper.java:322)
	at org.apache.hadoop.hive.ql.parse.type.RexNodeExprFactory.createFuncCallExpr(RexNodeExprFactory.java:649)
	at org.apache.hadoop.hive.ql.parse.type.RexNodeExprFactory.createFuncCallExpr(RexNodeExprFactory.java:97)
	at org.apache.hadoop.hive.ql.parse.type.TypeCheckProcFactory$DefaultExprProcessor.getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:1081)
	at org.apache.hadoop.hive.ql.parse.type.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1471)
	at org.apache.hadoop.hive.ql.lib.CostLessRuleDispatcher.dispatch(CostLessRuleDispatcher.java:66)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
	at org.apache.hadoop.hive.ql.lib.ExpressionWalker.walk(ExpressionWalker.java:101)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
	at org.apache.hadoop.hive.ql.parse.type.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:228)
	at org.apache.hadoop.hive.ql.parse.type.RexNodeTypeCheck.genExprNode(RexNodeTypeCheck.java:40)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genAllRexNode(CalcitePlanner.java:5516)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genRexNode(CalcitePlanner.java:5474)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genRexNode(CalcitePlanner.java:5434)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genFilterRelNode(CalcitePlanner.java:3164)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genFilterRelNode(CalcitePlanner.java:3482)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genFilterLogicalPlan(CalcitePlanner.java:3493)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5163)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5059)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5065)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5104)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5059)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5104)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5059)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5104)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1648)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1592)
	at org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:140)
	at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:936)
	at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:191)
	at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:135)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1344)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:571)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12816)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:466)
	at