[ 
https://issues.apache.org/jira/browse/CALCITE-7640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18093571#comment-18093571
 ] 

Darpan Lunagariya (e6data computing) commented on CALCITE-7640:
---------------------------------------------------------------

More broadly, projects that build on Calcite routinely add operators that are 
specific to their engine and have to provide the code generation for them. 
Keeping implementor resolution extensible is what lets them plug in their own 
implementors, for aggregates as well as scalar operators; instead of patching 
the built-in {{RexImpTable}}. This change simply extends that extensibility to 
the aggregate path.

> Enumerable engine should execute aggregates whose implementor comes from a 
> custom RexImplementorTable"
> ------------------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-7640
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7640
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.42.0
>            Reporter: Darpan Lunagariya (e6data computing)
>            Assignee: Darpan Lunagariya (e6data computing)
>            Priority: Minor
>              Labels: pull-request-available
>
> h2. Background
> CALCITE-7631 introduced {{RexImplementorTable}}, a composable SPI that is the 
> code-generation counterpart of {{SqlOperatorTable}}, and threaded it through 
> scalar code generation ({{RexToLixTranslator}}) and constant reduction 
> ({{RexExecutorImpl}}). An extension can now supply implementors for its own 
> scalar operators and chain them ahead of the built-ins.
> The aggregate path was left out of that change. {{EnumerableAggregate}} still 
> resolves aggregate implementors only through the {{RexImpTable}} singleton, 
> so a custom aggregate operator which is contributed through a 
> {{SqlOperatorTable}} rather than registered as a schema function, cannot be 
> planned or executed by the enumerable engine, even when a 
> {{RexImplementorTable}} that knows it is in scope.
> h2. Problem
> The built-in singleton is consulted in two places on the aggregate path:
> * *Planning time*: the {{EnumerableAggregate}} constructor pre-checks every 
> {{AggregateCall}} against {{RexImpTable.INSTANCE}} and throws 
> {{InvalidRelException}} when there is no implementor. 
> {{EnumerableAggregateRule}} catches that and returns {{null}}, so an 
> aggregate the built-ins do not know is rejected during planning (ultimately 
> {{CannotPlanException}}).
> * *Code-generation time*: {{AggImpState}} resolves the implementor from 
> {{RexImpTable.INSTANCE}} as well.
> Because both hard-code the singleton, a chained {{RexImplementorTable}} 
> carrying a custom aggregate implementor is ignored. This is inconsistent with 
> scalar operators, which CALCITE-7631 already made pluggable.
> The deeper issue is that the constructor conflates two unrelated validations:
> # table-independent structural checks ({{DISTINCT}} / {{WITHIN DISTINCT}} are 
> never supported by the enumerable aggregate regardless of any table), which 
> legitimately belong to the node.
> # implementor availability against a specific registry, which is a planning 
> decision and depends on which table is in play.
> Only the second is the defect: hard-coding {{RexImpTable.INSTANCE}} for it 
> bakes in "the built-in table is the only table"; the closed-world assumption 
> CALCITE-7631 removed for scalars. (The sibling {{EnumerableSortedAggregate}} 
> performs no such check in its constructor.)
> h2. Proposed change
> Resolve aggregate implementors through the composable table, defaulting to 
> the built-ins:
> * Add {{RexImplementorTables.of(RelOptCluster)}}, which returns the 
> {{RexImplementorTable}} registered on the planner {{Context}}, or 
> {{RexImpTable.instance()}} when none is registered -- the single place the 
> default is named.
> * Move the implementor-availability check out of the {{EnumerableAggregate}} 
> constructor into {{EnumerableAggregateRule.convert}}, where the planner 
> (hence the injected table) is available. The rule declines (returns {{null}}) 
> when the active table has no implementor, so the planner can try another 
> convention. The constructor keeps only the table-independent structural 
> checks.
> * Thread the resolved table into aggregate code generation: {{AggImpState}} 
> takes a {{RexImplementorTable}}, supplied by its callers 
> ({{EnumerableAggregate}}, {{EnumerableSortedAggregate}}, {{EnumerableWindow}} 
> and the interpreter's {{AggregateNode}}) from 
> {{RexImplementorTables.of(getCluster())}}.
> Both the planning gate and code generation then read the same table, so "the 
> rule accepted this aggregate" implies "code generation can generate it". 
> {{RelOptCluster}} itself is not changed; it is only used to reach the planner 
> {{Context}}.
> h3. Registration
> An extension supplies the table on the planner {{Context}} once, at planner 
> creation:
> {code:java}
> FrameworkConfig config = Frameworks.newConfigBuilder()
>     .operatorTable(
>         SqlOperatorTables.chain(SqlStdOperatorTable.instance(),
>             SqlOperatorTables.of(myAggFunction)))
>     .context(
>         Contexts.of(
>             RexImplementorTables.chain(customTable, RexImpTable.instance())))
>     // ... rules, schema ...
>     .build();
> {code}
> The same applies to a directly built {{VolcanoPlanner(Contexts.of(...))}} and 
> to the JDBC path (a {{CalcitePrepareImpl}} subclass overriding 
> {{createPlanner}} to populate {{externalContext}}, wired via 
> {{Driver.withPrepareFactory}}).
> h3. Backward compatibility
> With no registration the accessor resolves to {{RexImpTable.instance()}}, so 
> existing behaviour is unchanged. The former 3-argument {{AggImpState}} 
> constructor is retained (deprecated) and delegates to the built-in table.
> h2. Example
> {code:sql}
> SELECT g, MY_CUSTOM_AGG(x) AS c
> FROM (VALUES ('a', 1), ('a', 2), ('b', 3)) AS t (g, x)
> GROUP BY g
> {code}
> where {{MY_CUSTOM_AGG}} is contributed through a {{SqlOperatorTable}} and its 
> implementor through a chained {{RexImplementorTable}}. Today this fails to 
> plan; after the change it plans as an {{EnumerableAggregate}} and executes, 
> returning {{(a, 2), (b, 1)}}.
> h2. Testing
> End-to-end enumerable execution of a custom aggregate implementor, asserting 
> the actual result values -- driven both through the {{Frameworks}} API (real 
> SQL) and a directly built planner -- plus the negative case: an unknown 
> aggregate is rejected when no custom table is registered.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to