[jira] [Commented] (HIVE-27102) Upgrade Calcite to 1.33.0 and Avatica to 1.23.0

2024-05-03 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843243#comment-17843243
 ] 

Stamatis Zampetakis commented on HIVE-27102:


One major element that was introduced in Calcite 1.26.0 and heavily affects the
upgrade in Hive is the internal SEARCH operator (CALCITE-4173). As discussed
under the respective ticket, this operator aims to represent and unify various
kinds of expressions, notably IN, BETWEEN, and conjunctions/disjunctions of
range/equality predicates.
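
To make the unification concrete, here is a minimal, self-contained sketch of the idea behind SEARCH: IN, BETWEEN, and ORs of such predicates on the same column all reduce to one normalized list of ranges. This is a toy model over ints, not Calcite code; the names {{SearchSketch}}, {{Range}}, {{in}}, {{between}}, {{or}}, and {{matches}} are hypothetical. Calcite itself keeps the search argument in a {{Sarg}} backed by a range set over comparable values.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Toy model (hypothetical, not Calcite API) of a SEARCH argument over ints. */
final class SearchSketch {

  /** Closed range [lo, hi]; an equality on v is the degenerate range [v, v]. */
  static final class Range {
    final int lo;
    final int hi;

    Range(int lo, int hi) {
      this.lo = lo;
      this.hi = hi;
    }

    @Override public String toString() {
      return lo == hi ? String.valueOf(lo) : "[" + lo + ".." + hi + "]";
    }
  }

  /** x IN (v1, v2, ...) becomes one point range per value. */
  static List<Range> in(int... values) {
    List<Range> ranges = new ArrayList<>();
    for (int v : values) {
      ranges.add(new Range(v, v));
    }
    return normalize(ranges);
  }

  /** x BETWEEN lo AND hi becomes a single closed range (empty if lo > hi). */
  static List<Range> between(int lo, int hi) {
    List<Range> ranges = new ArrayList<>();
    if (lo <= hi) {
      ranges.add(new Range(lo, hi));
    }
    return ranges;
  }

  /** A disjunction on the same column is the union of both range lists. */
  static List<Range> or(List<Range> left, List<Range> right) {
    List<Range> all = new ArrayList<>(left);
    all.addAll(right);
    return normalize(all);
  }

  /** Sorts by lower bound and merges overlapping ranges. */
  static List<Range> normalize(List<Range> ranges) {
    List<Range> sorted = new ArrayList<>(ranges);
    sorted.sort(Comparator.comparingInt((Range r) -> r.lo));
    List<Range> out = new ArrayList<>();
    for (Range r : sorted) {
      if (!out.isEmpty() && r.lo <= out.get(out.size() - 1).hi) {
        Range last = out.remove(out.size() - 1);
        out.add(new Range(last.lo, Math.max(last.hi, r.hi)));
      } else {
        out.add(r);
      }
    }
    return out;
  }

  /** Evaluates the SEARCH-like predicate for a concrete value of x. */
  static boolean matches(List<Range> ranges, int x) {
    for (Range r : ranges) {
      if (r.lo <= x && x <= r.hi) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // x IN (1, 7) OR x BETWEEN 5 AND 10
    List<Range> sarg = or(in(1, 7), between(5, 10));
    System.out.println(sarg);             // [1, [5..10]]
    System.out.println(matches(sarg, 7)); // true
    System.out.println(matches(sarg, 3)); // false
  }
}
```

The point 7 is absorbed by the range [5..10], which is the kind of simplification that becomes trivial once all predicates share one representation.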

Since Hive does not know about the SEARCH operator, one idea was to try to
eliminate it from certain optimization phases in Hive by relying on
{{RexUtil#expandSearch}} and the {{HiveSearchExpandRule}} (which was introduced
in https://github.com/zabetak/hive/tree/calcite-upgrade-1.33). However, some
core APIs in Calcite, such as RelBuilder and RexSimplify, now return the
internal SEARCH operator and thus affect many rules, metadata providers, and
other APIs. It might be difficult to ensure that the SEARCH operator is
completely eliminated from the plans, and doing so may result in brittle code.

Going forward, Calcite will rely more and more on the SEARCH operator, so
instead of trying to get rid of it we should embrace it and ensure that we
handle it properly in Hive. In fact, we should try to use the SEARCH operator
as much as possible during the optimization phase and avoid back-and-forth
conversions between SEARCH, BETWEEN, IN, etc. Failure to do so will probably
make future upgrades harder and harder and will increase the likelihood of
"infinite rule matching" compilation failures.

In Hive there are two kinds of rules that are strongly related to the SEARCH
operator:
* HivePointLookupOptimizerRule (useful for normalization and runtime
performance)
* HiveInBetweenExpandRule (useful for normalization and view-based rewriting)

With the advent of the SEARCH operator, these rules are heavily impacted. Parts
of the rules are probably redundant since RexSimplify should be able to handle
some if not all of their use cases. Ideally, these rules should be removed
altogether.

Since the physical evaluation of the IN operator seems to have some benefits
over the evaluation of OR (HIVE-11424), we should have some logic at the end of
the optimization phase that decides whether to translate the SEARCH operator to
a physical OR or a physical IN operator.
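
As an illustration only, such a final translation step could look roughly like the sketch below. It is self-contained and does not use Hive or Calcite classes; {{SearchToPhysicalSketch}}, {{Range}}, {{translate}}, and the {{minInSize}} threshold are hypothetical names, and the output is a SQL-like string rather than a real expression tree. The assumed policy is: if the argument consists only of point values and there are enough of them, emit IN; otherwise emit an OR of equalities and BETWEENs.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch: choose IN vs OR when lowering a SEARCH-like predicate. */
final class SearchToPhysicalSketch {

  /** Closed range [lo, hi]; a point value v is the range [v, v]. */
  static final class Range {
    final int lo;
    final int hi;

    private Range(int lo, int hi) {
      this.lo = lo;
      this.hi = hi;
    }

    static Range point(int v) {
      return new Range(v, v);
    }

    static Range of(int lo, int hi) {
      return new Range(lo, hi);
    }

    boolean isPoint() {
      return lo == hi;
    }
  }

  /**
   * Renders the predicate for one column.
   *
   * @param minInSize assumed threshold: the minimum number of point values
   *                  for which IN is preferred over OR
   */
  static String translate(String column, List<Range> ranges, int minInSize) {
    if (ranges.isEmpty()) {
      return "FALSE"; // an empty range set matches nothing
    }
    boolean allPoints = ranges.stream().allMatch(Range::isPoint);
    if (allPoints && ranges.size() >= minInSize) {
      // Only equalities and enough of them: a single IN is the cheaper form.
      return column + " IN ("
          + ranges.stream().map(r -> String.valueOf(r.lo)).collect(Collectors.joining(", "))
          + ")";
    }
    // Mixed points and ranges (or too few points): fall back to a disjunction.
    return ranges.stream()
        .map(r -> r.isPoint()
            ? column + " = " + r.lo
            : column + " BETWEEN " + r.lo + " AND " + r.hi)
        .collect(Collectors.joining(" OR "));
  }

  public static void main(String[] args) {
    List<Range> points = List.of(Range.point(1), Range.point(2), Range.point(3));
    List<Range> mixed = List.of(Range.point(1), Range.of(5, 10));
    System.out.println(translate("x", points, 2)); // x IN (1, 2, 3)
    System.out.println(translate("x", mixed, 2));  // x = 1 OR x BETWEEN 5 AND 10
  }
}
```

Keeping this decision in one place at the end of planning means the logical phase can work on SEARCH alone, without converting back and forth.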

Apart from the aforementioned rules, there are probably other places where we
have to take the SEARCH operator into account and, correspondingly, get rid of
the IN operator, but it seems that this is the best way forward.

Note that the presence of the SEARCH operator in the EXPLAIN CBO plan is not 
something to be avoided if the resulting plans are equivalent or better. We 
should focus only on cases where we spot regressions and decide how to tackle 
them. The plan changes that need to be more carefully reviewed are those at the 
physical layer in order to ensure that we don't have performance regressions.

> Upgrade Calcite to 1.33.0 and Avatica to 1.23.0
> ---
>
> Key: HIVE-27102
> URL: https://issues.apache.org/jira/browse/HIVE-27102
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> New versions for Calcite and Avatica are available so we should upgrade to 
> them.
> I had some WIP in HIVE-26610 for upgrading Calcite to 1.32.0, but given that
> the work was not in a very advanced state it is preferable to jump directly to
> 1.33.0.
> Avatica must be in line with Calcite, so both need to be updated at the same
> time.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27102) Upgrade Calcite to 1.33.0 and Avatica to 1.23.0

2024-04-22 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839518#comment-17839518
 ] 

Stamatis Zampetakis commented on HIVE-27102:


Hey [~frankgrimes97], Calcite upgrades are rather complex, but we will try to
advance this work in the next few weeks and hopefully have it in 4.1.0. Other
than that, it's worth mentioning that CVE-2020-13955 and CVE-2022-39135 are
probably not exploitable via Hive since the respective code path does not seem
to be used.



[jira] [Commented] (HIVE-27102) Upgrade Calcite to 1.33.0 and Avatica to 1.23.0

2024-04-18 Thread Frank Grimes (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838656#comment-17838656
 ] 

Frank Grimes commented on HIVE-27102:

Any update on this? I see that Hive 4.0.0 has recently been released, but it
still uses Calcite 1.25.0, which we believe is still vulnerable to the
following:

  - [CVE-2020-13955 - Missing Authentication for Critical Function in Apache 
Calcite|https://nvd.nist.gov/vuln/detail/CVE-2020-13955]
  - [CVE-2022-39135 - Apache Calcite before 1.32.0 vulnerable to potential XML 
External Entity (XXE) attack|https://nvd.nist.gov/vuln/detail/CVE-2022-39135]



[jira] [Commented] (HIVE-27102) Upgrade Calcite to 1.33.0 and Avatica to 1.23.0

2023-02-23 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692778#comment-17692778
 ] 

Stamatis Zampetakis commented on HIVE-27102:


I will not be working on this for the next few weeks, so if someone wants to
advance it, the ideal next step would be to address the two issues mentioned
above.



[jira] [Commented] (HIVE-27102) Upgrade Calcite to 1.33.0 and Avatica to 1.23.0

2023-02-23 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692776#comment-17692776
 ] 

Stamatis Zampetakis commented on HIVE-27102:


The branch I am working on can be found here:
[https://github.com/zabetak/hive/tree/calcite-upgrade-1.33]

All the compilation problems are fixed in the branch. Moreover, there are 
workarounds and notes for some other problems found along the way.

At this stage I am mainly running the CBO queries with the TPCDS stats, trying
to resolve errors and explain plan changes.
{noformat}
mvn test -Dtest=TestTezTPCDS30TBPerfCliDriver -Dqfile_regex=cbo_query.* -Dtest.output.overwrite
{noformat}
In terms of errors, there are at least a few failures (cbo_query8.q and others)
with the stack trace below:
{noformat}
[ERROR] org.apache.hadoop.hive.cli.TestTezTPCDS30TBPerfCliDriver.testCliDriver[cbo_query8]  Time elapsed: 0.425 s  <<< FAILURE!
java.lang.AssertionError
    at org.apache.calcite.rex.RexCall.<init>(RexCall.java:86)
    at org.apache.calcite.rex.RexBuilder.makeCall(RexBuilder.java:253)
    at org.apache.hadoop.hive.ql.parse.type.HiveFunctionHelper.getExpression(HiveFunctionHelper.java:322)
    at org.apache.hadoop.hive.ql.parse.type.RexNodeExprFactory.createFuncCallExpr(RexNodeExprFactory.java:649)
    at org.apache.hadoop.hive.ql.parse.type.RexNodeExprFactory.createFuncCallExpr(RexNodeExprFactory.java:97)
    at org.apache.hadoop.hive.ql.parse.type.TypeCheckProcFactory$DefaultExprProcessor.getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:1081)
    at org.apache.hadoop.hive.ql.parse.type.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1471)
    at org.apache.hadoop.hive.ql.lib.CostLessRuleDispatcher.dispatch(CostLessRuleDispatcher.java:66)
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
    at org.apache.hadoop.hive.ql.lib.ExpressionWalker.walk(ExpressionWalker.java:101)
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
    at org.apache.hadoop.hive.ql.parse.type.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:228)
    at org.apache.hadoop.hive.ql.parse.type.RexNodeTypeCheck.genExprNode(RexNodeTypeCheck.java:40)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genAllRexNode(CalcitePlanner.java:5516)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genRexNode(CalcitePlanner.java:5474)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genRexNode(CalcitePlanner.java:5434)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genFilterRelNode(CalcitePlanner.java:3164)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genFilterRelNode(CalcitePlanner.java:3482)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genFilterLogicalPlan(CalcitePlanner.java:3493)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5163)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5059)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5065)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5104)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5059)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5104)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5059)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5104)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1648)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1592)
    at org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:140)
    at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:936)
    at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:191)
    at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:135)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1344)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:571)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12816)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:466)
    at