Default operator override in Apache Calcite

2022-07-09 Thread Vladimir Ozerov
Hi,

Apache Calcite has a powerful but complicated and undocumented function
library. Some projects may need to override some of the existing
operators to introduce custom type deduction, custom validation, etc. This
includes the basic arithmetic functions (e.g., disallowing INT + VARCHAR),
aggregate functions (e.g., custom precision extension), etc.

One convenient way of doing this would be to redefine the function in your
custom operator table. However, this doesn't work because the Apache Calcite
core uses direct references to SqlStdOperatorTable. It starts with the
parser [1] and validator [2]. Even if you manage to inject your functions at
this stage (e.g., using a custom validator implementation or a custom
SqlVisitor), the sql-to-rel converter will still overwrite your
functions [3]. And even when you get the RelNode, optimization rules would
silently replace your custom functions with the default ones [4].

Alternatively, you may try extending some base interface, such as
TypeCoercion, but this doesn't give fine-grained control over the function
behavior because you have to retain the existing function definitions for
coercion to work.

A better solution might be to abstract out the function references
through some sort of "factory"/"resolver", somewhat similar to the one used
to resolve user-provided operators. For instance, the user may pass an
optional desired operator table to parser/validator/converter configs and
RelOptCluster. Then the "core" codebase could be refactored to dereference
functions by SqlKind instead of SqlStdOperatorTable. If some required
function types are missing from the SqlKind enum, we can add them. The
default behavior would delegate to SqlStdOperatorTable, so the existing
apps would not be affected.
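
To make the idea concrete, here is a minimal, self-contained sketch of such
a resolver. It does not use any Calcite classes: Kind, Operator,
OperatorResolver, StandardResolver and OverridingResolver are all
hypothetical names standing in for SqlKind, SqlOperator and whatever
interface we would eventually agree on. It only illustrates the lookup by
kind with a fallback to the standard definitions.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical stand-in for SqlKind; only a few kinds are modeled.
enum Kind { PLUS, MINUS, SUM }

// Hypothetical stand-in for SqlOperator; "owner" records which table defined it.
final class Operator {
  final Kind kind;
  final String owner;

  Operator(Kind kind, String owner) {
    this.kind = kind;
    this.owner = owner;
  }

  @Override public String toString() {
    return owner + "." + kind;
  }
}

// Hypothetical resolver that core code would call instead of referencing
// the standard operator table directly.
interface OperatorResolver {
  Operator resolve(Kind kind);
}

// Default behavior: every kind maps to the standard definition.
final class StandardResolver implements OperatorResolver {
  private final Map<Kind, Operator> table = new EnumMap<>(Kind.class);

  StandardResolver() {
    for (Kind kind : Kind.values()) {
      table.put(kind, new Operator(kind, "std"));
    }
  }

  @Override public Operator resolve(Kind kind) {
    return table.get(kind);
  }
}

// User-supplied overrides; anything not overridden falls back to the delegate.
final class OverridingResolver implements OperatorResolver {
  private final Map<Kind, Operator> overrides = new EnumMap<>(Kind.class);
  private final OperatorResolver fallback;

  OverridingResolver(OperatorResolver fallback) {
    this.fallback = fallback;
  }

  OverridingResolver withOverride(Operator operator) {
    overrides.put(operator.kind, operator);
    return this;
  }

  @Override public Operator resolve(Kind kind) {
    Operator operator = overrides.get(kind);
    return operator != null ? operator : fallback.resolve(kind);
  }
}
```

With this shape, a project would build something like
new OverridingResolver(new StandardResolver()).withOverride(customPlus)
once and hand the same instance to every phase, so that PLUS resolves to
the custom operator everywhere while SUM still resolves to the standard
one.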

A "small" problem is that there are ~1500 usages of the SqlStdOperatorTable
in the "core" module, but most of the usages are very straightforward to
replace.

This way, we would ensure consistent function resolution throughout all
query optimization phases. WDYT?

Regards,
Vladimir.

[1]
https://github.com/apache/calcite/blob/calcite-1.30.0/core/src/main/codegen/templates/Parser.jj#L7164
[2]
https://github.com/apache/calcite/blob/calcite-1.30.0/core/src/main/java/org/apache/calcite/sql/validate/SqlValidatorImpl.java#L6788
[3]
https://github.com/apache/calcite/blob/calcite-1.30.0/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L1438
[4]
https://github.com/apache/calcite/blob/calcite-1.30.0/core/src/main/java/org/apache/calcite/rel/rules/AggregateReduceFunctionsRule.java#L357


[jira] [Created] (CALCITE-5205) Supports hint option as string and numeric literal

2022-07-09 Thread Jiajun Xie (Jira)
Jiajun Xie created CALCITE-5205:
---

 Summary: Supports hint option as string and numeric literal
 Key: CALCITE-5205
 URL: https://issues.apache.org/jira/browse/CALCITE-5205
 Project: Calcite
  Issue Type: Improvement
  Components: core
Reporter: Jiajun Xie
Assignee: Jiajun Xie


Spark supports hints whose options contain strings and numbers:
{code:java}
SELECT /*+ REPARTITION(3, c) */ * FROM t
{code}
But Calcite can't parse it:
{code:java}
Error while parsing SQL: select /*+ repartition(3, empno) */ empno, ename, deptno from emps
java.lang.RuntimeException: Error while parsing SQL: select /*+ repartition(3, empno) */ empno, ename, deptno from emps
{code}
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-5206) Parser allows MERGE with mismatched parentheses

2022-07-09 Thread Julian Hyde (Jira)
Julian Hyde created CALCITE-5206:


 Summary: Parser allows MERGE with mismatched parentheses
 Key: CALCITE-5206
 URL: https://issues.apache.org/jira/browse/CALCITE-5206
 Project: Calcite
  Issue Type: Bug
Reporter: Julian Hyde


The SQL parser allows invalid MERGE statements with mismatched parentheses. For 
example, the following invalid statement is treated as valid but is missing a 
trailing ')':
{code}
merge into emps as e
using temps as t on e.empno = t.empno
when not matched
then insert (a, b) (values (1, 2);
{code}
And the following invalid statement is treated as valid but has an unmatched 
trailing ')':
{code}
merge into emps as e
using temps as t on e.empno = t.empno
when not matched
then insert (a, b) values (1, 2));
{code}



