Darpan Lunagariya (e6data computing) created CALCITE-7640:
-------------------------------------------------------------
Summary: Enumerable engine should execute aggregates whose
implementor comes from a custom RexImplementorTable"
Key: CALCITE-7640
URL: https://issues.apache.org/jira/browse/CALCITE-7640
Project: Calcite
Issue Type: Improvement
Components: core
Affects Versions: 1.42.0
Reporter: Darpan Lunagariya (e6data computing)
h2. Background
CALCITE-7631 introduced {{RexImplementorTable}}, a composable SPI that is the
code-generation counterpart of {{SqlOperatorTable}}, and threaded it through
scalar code generation ({{RexToLixTranslator}}) and constant reduction
({{RexExecutorImpl}}). An extension can now supply implementors for its own
scalar operators and chain them ahead of the built-ins.
The aggregate path was left out of that change. {{EnumerableAggregate}} still
resolves aggregate implementors only through the {{RexImpTable}} singleton, so
a custom aggregate operator which is contributed through a {{SqlOperatorTable}}
rather than registered as a schema function, cannot be planned or executed by
the enumerable engine, even when a {{RexImplementorTable}} that knows it is in
scope.
h2. Problem
The built-in singleton is consulted in two places on the aggregate path:
* *Planning time*: the {{EnumerableAggregate}} constructor pre-checks every
{{AggregateCall}} against {{RexImpTable.INSTANCE}} and throws
{{InvalidRelException}} when there is no implementor.
{{EnumerableAggregateRule}} catches that and returns {{null}}, so an aggregate
the built-ins do not know is rejected during planning (ultimately
{{CannotPlanException}}).
* *Code-generation time*: {{AggImpState}} resolves the implementor from
{{RexImpTable.INSTANCE}} as well.
Because both hard-code the singleton, a chained {{RexImplementorTable}}
carrying a custom aggregate implementor is ignored. This is inconsistent with
scalar operators, which CALCITE-7631 already made pluggable.
The deeper issue is that the constructor conflates two unrelated validations:
# table-independent structural checks ({{DISTINCT}} / {{WITHIN DISTINCT}} are
never supported by the enumerable aggregate regardless of any table), which
legitimately belong to the node.
# implementor availability against a specific registry, which is a planning
decision and depends on which table is in play.
Only the second is the defect: hard-coding {{RexImpTable.INSTANCE}} for it
bakes in "the built-in table is the only table"; the closed-world assumption
CALCITE-7631 removed for scalars. (The sibling {{EnumerableSortedAggregate}}
performs no such check in its constructor.)
h2. Proposed change
Resolve aggregate implementors through the composable table, defaulting to the
built-ins:
* Add {{RexImplementorTables.of(RelOptCluster)}}, which returns the
{{RexImplementorTable}} registered on the planner {{Context}}, or
{{RexImpTable.instance()}} when none is registered -- the single place the
default is named.
* Move the implementor-availability check out of the {{EnumerableAggregate}}
constructor into {{EnumerableAggregateRule.convert}}, where the planner (hence
the injected table) is available. The rule declines (returns {{null}}) when the
active table has no implementor, so the planner can try another convention. The
constructor keeps only the table-independent structural checks.
* Thread the resolved table into aggregate code generation: {{AggImpState}}
takes a {{RexImplementorTable}}, supplied by its callers
({{EnumerableAggregate}}, {{EnumerableSortedAggregate}}, {{EnumerableWindow}}
and the interpreter's {{AggregateNode}}) from
{{RexImplementorTables.of(getCluster())}}.
Both the planning gate and code generation then read the same table, so "the
rule accepted this aggregate" implies "code generation can generate it".
{{RelOptCluster}} itself is not changed; it is only used to reach the planner
{{Context}}.
h3. Registration
An extension supplies the table on the planner {{Context}} once, at planner
creation:
{code:java}
FrameworkConfig config = Frameworks.newConfigBuilder()
.operatorTable(
SqlOperatorTables.chain(SqlStdOperatorTable.instance(),
SqlOperatorTables.of(myAggFunction)))
.context(
Contexts.of(
RexImplementorTables.chain(customTable, RexImpTable.instance())))
// ... rules, schema ...
.build();
{code}
The same applies to a directly built {{VolcanoPlanner(Contexts.of(...))}} and
to the JDBC path (a {{CalcitePrepareImpl}} subclass overriding
{{createPlanner}} to populate {{externalContext}}, wired via
{{Driver.withPrepareFactory}}).
h3. Backward compatibility
With no registration the accessor resolves to {{RexImpTable.instance()}}, so
existing behaviour is unchanged. The former 3-argument {{AggImpState}}
constructor is retained (deprecated) and delegates to the built-in table.
h2. Example
{code:sql}
SELECT g, MY_CUSTOM_AGG(x) AS c
FROM (VALUES ('a', 1), ('a', 2), ('b', 3)) AS t (g, x)
GROUP BY g
{code}
where {{MY_CUSTOM_AGG}} is contributed through a {{SqlOperatorTable}} and its
implementor through a chained {{RexImplementorTable}}. Today this fails to
plan; after the change it plans as an {{EnumerableAggregate}} and executes,
returning {{(a, 2), (b, 1)}}.
h2. Testing
End-to-end enumerable execution of a custom aggregate implementor, asserting
the actual result values -- driven both through the {{Frameworks}} API (real
SQL) and a directly built planner -- plus the negative case: an unknown
aggregate is rejected when no custom table is registered.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)