[
https://issues.apache.org/jira/browse/CALCITE-7640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Darpan Lunagariya (e6data computing) reassigned CALCITE-7640:
-------------------------------------------------------------
Assignee: Darpan Lunagariya (e6data computing)
> Enumerable engine should execute aggregates whose implementor comes from a
> custom RexImplementorTable"
> ------------------------------------------------------------------------------------------------------
>
> Key: CALCITE-7640
> URL: https://issues.apache.org/jira/browse/CALCITE-7640
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.42.0
> Reporter: Darpan Lunagariya (e6data computing)
> Assignee: Darpan Lunagariya (e6data computing)
> Priority: Minor
>
> h2. Background
> CALCITE-7631 introduced {{RexImplementorTable}}, a composable SPI that is the
> code-generation counterpart of {{SqlOperatorTable}}, and threaded it through
> scalar code generation ({{RexToLixTranslator}}) and constant reduction
> ({{RexExecutorImpl}}). An extension can now supply implementors for its own
> scalar operators and chain them ahead of the built-ins.
> The aggregate path was left out of that change. {{EnumerableAggregate}} still
> resolves aggregate implementors only through the {{RexImpTable}} singleton,
> so a custom aggregate operator which is contributed through a
> {{SqlOperatorTable}} rather than registered as a schema function, cannot be
> planned or executed by the enumerable engine, even when a
> {{RexImplementorTable}} that knows it is in scope.
> h2. Problem
> The built-in singleton is consulted in two places on the aggregate path:
> * *Planning time*: the {{EnumerableAggregate}} constructor pre-checks every
> {{AggregateCall}} against {{RexImpTable.INSTANCE}} and throws
> {{InvalidRelException}} when there is no implementor.
> {{EnumerableAggregateRule}} catches that and returns {{null}}, so an
> aggregate the built-ins do not know is rejected during planning (ultimately
> {{CannotPlanException}}).
> * *Code-generation time*: {{AggImpState}} resolves the implementor from
> {{RexImpTable.INSTANCE}} as well.
> Because both hard-code the singleton, a chained {{RexImplementorTable}}
> carrying a custom aggregate implementor is ignored. This is inconsistent with
> scalar operators, which CALCITE-7631 already made pluggable.
> The deeper issue is that the constructor conflates two unrelated validations:
> # table-independent structural checks ({{DISTINCT}} / {{WITHIN DISTINCT}} are
> never supported by the enumerable aggregate regardless of any table), which
> legitimately belong to the node.
> # implementor availability against a specific registry, which is a planning
> decision and depends on which table is in play.
> Only the second is the defect: hard-coding {{RexImpTable.INSTANCE}} for it
> bakes in "the built-in table is the only table"; the closed-world assumption
> CALCITE-7631 removed for scalars. (The sibling {{EnumerableSortedAggregate}}
> performs no such check in its constructor.)
> h2. Proposed change
> Resolve aggregate implementors through the composable table, defaulting to
> the built-ins:
> * Add {{RexImplementorTables.of(RelOptCluster)}}, which returns the
> {{RexImplementorTable}} registered on the planner {{Context}}, or
> {{RexImpTable.instance()}} when none is registered -- the single place the
> default is named.
> * Move the implementor-availability check out of the {{EnumerableAggregate}}
> constructor into {{EnumerableAggregateRule.convert}}, where the planner
> (hence the injected table) is available. The rule declines (returns {{null}})
> when the active table has no implementor, so the planner can try another
> convention. The constructor keeps only the table-independent structural
> checks.
> * Thread the resolved table into aggregate code generation: {{AggImpState}}
> takes a {{RexImplementorTable}}, supplied by its callers
> ({{EnumerableAggregate}}, {{EnumerableSortedAggregate}}, {{EnumerableWindow}}
> and the interpreter's {{AggregateNode}}) from
> {{RexImplementorTables.of(getCluster())}}.
> Both the planning gate and code generation then read the same table, so "the
> rule accepted this aggregate" implies "code generation can generate it".
> {{RelOptCluster}} itself is not changed; it is only used to reach the planner
> {{Context}}.
> h3. Registration
> An extension supplies the table on the planner {{Context}} once, at planner
> creation:
> {code:java}
> FrameworkConfig config = Frameworks.newConfigBuilder()
> .operatorTable(
> SqlOperatorTables.chain(SqlStdOperatorTable.instance(),
> SqlOperatorTables.of(myAggFunction)))
> .context(
> Contexts.of(
> RexImplementorTables.chain(customTable, RexImpTable.instance())))
> // ... rules, schema ...
> .build();
> {code}
> The same applies to a directly built {{VolcanoPlanner(Contexts.of(...))}} and
> to the JDBC path (a {{CalcitePrepareImpl}} subclass overriding
> {{createPlanner}} to populate {{externalContext}}, wired via
> {{Driver.withPrepareFactory}}).
> h3. Backward compatibility
> With no registration the accessor resolves to {{RexImpTable.instance()}}, so
> existing behaviour is unchanged. The former 3-argument {{AggImpState}}
> constructor is retained (deprecated) and delegates to the built-in table.
> h2. Example
> {code:sql}
> SELECT g, MY_CUSTOM_AGG(x) AS c
> FROM (VALUES ('a', 1), ('a', 2), ('b', 3)) AS t (g, x)
> GROUP BY g
> {code}
> where {{MY_CUSTOM_AGG}} is contributed through a {{SqlOperatorTable}} and its
> implementor through a chained {{RexImplementorTable}}. Today this fails to
> plan; after the change it plans as an {{EnumerableAggregate}} and executes,
> returning {{(a, 2), (b, 1)}}.
> h2. Testing
> End-to-end enumerable execution of a custom aggregate implementor, asserting
> the actual result values -- driven both through the {{Frameworks}} API (real
> SQL) and a directly built planner -- plus the negative case: an unknown
> aggregate is rejected when no custom table is registered.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)