[ 
https://issues.apache.org/jira/browse/CALCITE-7640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18093570#comment-18093570
 ] 

Darpan Lunagariya (e6data computing) commented on CALCITE-7640:
---------------------------------------------------------------

The concrete driver on my side was a custom {{RexCallImplementor}} for a scalar 
operator, which CALCITE-7631 now supports. Working on that made it clear that 
{{EnumerableAggregate}} still resolves its implementor from the hardcoded 
{{RexImpTable}} singleton; so aggregates were left as a second, non-composable 
path alongside the now-pluggable scalar one. This issue closes that gap so both 
go through the same SPI rather than diverging.

> Enumerable engine should execute aggregates whose implementor comes from a 
> custom RexImplementorTable"
> ------------------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-7640
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7640
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.42.0
>            Reporter: Darpan Lunagariya (e6data computing)
>            Assignee: Darpan Lunagariya (e6data computing)
>            Priority: Minor
>              Labels: pull-request-available
>
> h2. Background
> CALCITE-7631 introduced {{RexImplementorTable}}, a composable SPI that is the 
> code-generation counterpart of {{SqlOperatorTable}}, and threaded it through 
> scalar code generation ({{RexToLixTranslator}}) and constant reduction 
> ({{RexExecutorImpl}}). An extension can now supply implementors for its own 
> scalar operators and chain them ahead of the built-ins.
> The aggregate path was left out of that change. {{EnumerableAggregate}} still 
> resolves aggregate implementors only through the {{RexImpTable}} singleton, 
> so a custom aggregate operator which is contributed through a 
> {{SqlOperatorTable}} rather than registered as a schema function, cannot be 
> planned or executed by the enumerable engine, even when a 
> {{RexImplementorTable}} that knows it is in scope.
> h2. Problem
> The built-in singleton is consulted in two places on the aggregate path:
> * *Planning time*: the {{EnumerableAggregate}} constructor pre-checks every 
> {{AggregateCall}} against {{RexImpTable.INSTANCE}} and throws 
> {{InvalidRelException}} when there is no implementor. 
> {{EnumerableAggregateRule}} catches that and returns {{null}}, so an 
> aggregate the built-ins do not know is rejected during planning (ultimately 
> {{CannotPlanException}}).
> * *Code-generation time*: {{AggImpState}} resolves the implementor from 
> {{RexImpTable.INSTANCE}} as well.
> Because both hard-code the singleton, a chained {{RexImplementorTable}} 
> carrying a custom aggregate implementor is ignored. This is inconsistent with 
> scalar operators, which CALCITE-7631 already made pluggable.
> The deeper issue is that the constructor conflates two unrelated validations:
> # table-independent structural checks ({{DISTINCT}} / {{WITHIN DISTINCT}} are 
> never supported by the enumerable aggregate regardless of any table), which 
> legitimately belong to the node.
> # implementor availability against a specific registry, which is a planning 
> decision and depends on which table is in play.
> Only the second is the defect: hard-coding {{RexImpTable.INSTANCE}} for it 
> bakes in "the built-in table is the only table"; the closed-world assumption 
> CALCITE-7631 removed for scalars. (The sibling {{EnumerableSortedAggregate}} 
> performs no such check in its constructor.)
> h2. Proposed change
> Resolve aggregate implementors through the composable table, defaulting to 
> the built-ins:
> * Add {{RexImplementorTables.of(RelOptCluster)}}, which returns the 
> {{RexImplementorTable}} registered on the planner {{Context}}, or 
> {{RexImpTable.instance()}} when none is registered -- the single place the 
> default is named.
> * Move the implementor-availability check out of the {{EnumerableAggregate}} 
> constructor into {{EnumerableAggregateRule.convert}}, where the planner 
> (hence the injected table) is available. The rule declines (returns {{null}}) 
> when the active table has no implementor, so the planner can try another 
> convention. The constructor keeps only the table-independent structural 
> checks.
> * Thread the resolved table into aggregate code generation: {{AggImpState}} 
> takes a {{RexImplementorTable}}, supplied by its callers 
> ({{EnumerableAggregate}}, {{EnumerableSortedAggregate}}, {{EnumerableWindow}} 
> and the interpreter's {{AggregateNode}}) from 
> {{RexImplementorTables.of(getCluster())}}.
> Both the planning gate and code generation then read the same table, so "the 
> rule accepted this aggregate" implies "code generation can generate it". 
> {{RelOptCluster}} itself is not changed; it is only used to reach the planner 
> {{Context}}.
> h3. Registration
> An extension supplies the table on the planner {{Context}} once, at planner 
> creation:
> {code:java}
> FrameworkConfig config = Frameworks.newConfigBuilder()
>     .operatorTable(
>         SqlOperatorTables.chain(SqlStdOperatorTable.instance(),
>             SqlOperatorTables.of(myAggFunction)))
>     .context(
>         Contexts.of(
>             RexImplementorTables.chain(customTable, RexImpTable.instance())))
>     // ... rules, schema ...
>     .build();
> {code}
> The same applies to a directly built {{VolcanoPlanner(Contexts.of(...))}} and 
> to the JDBC path (a {{CalcitePrepareImpl}} subclass overriding 
> {{createPlanner}} to populate {{externalContext}}, wired via 
> {{Driver.withPrepareFactory}}).
> h3. Backward compatibility
> With no registration the accessor resolves to {{RexImpTable.instance()}}, so 
> existing behaviour is unchanged. The former 3-argument {{AggImpState}} 
> constructor is retained (deprecated) and delegates to the built-in table.
> h2. Example
> {code:sql}
> SELECT g, MY_CUSTOM_AGG(x) AS c
> FROM (VALUES ('a', 1), ('a', 2), ('b', 3)) AS t (g, x)
> GROUP BY g
> {code}
> where {{MY_CUSTOM_AGG}} is contributed through a {{SqlOperatorTable}} and its 
> implementor through a chained {{RexImplementorTable}}. Today this fails to 
> plan; after the change it plans as an {{EnumerableAggregate}} and executes, 
> returning {{(a, 2), (b, 1)}}.
> h2. Testing
> End-to-end enumerable execution of a custom aggregate implementor, asserting 
> the actual result values -- driven both through the {{Frameworks}} API (real 
> SQL) and a directly built planner -- plus the negative case: an unknown 
> aggregate is rejected when no custom table is registered.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to