Darpan Lunagariya (e6data computing) created CALCITE-7640:
-------------------------------------------------------------

             Summary: Enumerable engine should execute aggregates whose 
implementor comes from a custom RexImplementorTable"
                 Key: CALCITE-7640
                 URL: https://issues.apache.org/jira/browse/CALCITE-7640
             Project: Calcite
          Issue Type: Improvement
          Components: core
    Affects Versions: 1.42.0
            Reporter: Darpan Lunagariya (e6data computing)


h2. Background

CALCITE-7631 introduced {{RexImplementorTable}}, a composable SPI that is the 
code-generation counterpart of {{SqlOperatorTable}}, and threaded it through 
scalar code generation ({{RexToLixTranslator}}) and constant reduction 
({{RexExecutorImpl}}). An extension can now supply implementors for its own 
scalar operators and chain them ahead of the built-ins.

The aggregate path was left out of that change. {{EnumerableAggregate}} still 
resolves aggregate implementors only through the {{RexImpTable}} singleton, so 
a custom aggregate operator which is contributed through a {{SqlOperatorTable}} 
rather than registered as a schema function, cannot be planned or executed by 
the enumerable engine, even when a {{RexImplementorTable}} that knows it is in 
scope.

h2. Problem

The built-in singleton is consulted in two places on the aggregate path:

* *Planning time*: the {{EnumerableAggregate}} constructor pre-checks every 
{{AggregateCall}} against {{RexImpTable.INSTANCE}} and throws 
{{InvalidRelException}} when there is no implementor. 
{{EnumerableAggregateRule}} catches that and returns {{null}}, so an aggregate 
the built-ins do not know is rejected during planning (ultimately 
{{CannotPlanException}}).
* *Code-generation time*: {{AggImpState}} resolves the implementor from 
{{RexImpTable.INSTANCE}} as well.

Because both hard-code the singleton, a chained {{RexImplementorTable}} 
carrying a custom aggregate implementor is ignored. This is inconsistent with 
scalar operators, which CALCITE-7631 already made pluggable.

The deeper issue is that the constructor conflates two unrelated validations:
# table-independent structural checks ({{DISTINCT}} / {{WITHIN DISTINCT}} are 
never supported by the enumerable aggregate regardless of any table), which 
legitimately belong to the node.
# implementor availability against a specific registry, which is a planning 
decision and depends on which table is in play.

Only the second is the defect: hard-coding {{RexImpTable.INSTANCE}} for it 
bakes in "the built-in table is the only table"; the closed-world assumption 
CALCITE-7631 removed for scalars. (The sibling {{EnumerableSortedAggregate}} 
performs no such check in its constructor.)

h2. Proposed change

Resolve aggregate implementors through the composable table, defaulting to the 
built-ins:

* Add {{RexImplementorTables.of(RelOptCluster)}}, which returns the 
{{RexImplementorTable}} registered on the planner {{Context}}, or 
{{RexImpTable.instance()}} when none is registered -- the single place the 
default is named.
* Move the implementor-availability check out of the {{EnumerableAggregate}} 
constructor into {{EnumerableAggregateRule.convert}}, where the planner (hence 
the injected table) is available. The rule declines (returns {{null}}) when the 
active table has no implementor, so the planner can try another convention. The 
constructor keeps only the table-independent structural checks.
* Thread the resolved table into aggregate code generation: {{AggImpState}} 
takes a {{RexImplementorTable}}, supplied by its callers 
({{EnumerableAggregate}}, {{EnumerableSortedAggregate}}, {{EnumerableWindow}} 
and the interpreter's {{AggregateNode}}) from 
{{RexImplementorTables.of(getCluster())}}.

Both the planning gate and code generation then read the same table, so "the 
rule accepted this aggregate" implies "code generation can generate it". 
{{RelOptCluster}} itself is not changed; it is only used to reach the planner 
{{Context}}.

h3. Registration

An extension supplies the table on the planner {{Context}} once, at planner 
creation:

{code:java}
FrameworkConfig config = Frameworks.newConfigBuilder()
    .operatorTable(
        SqlOperatorTables.chain(SqlStdOperatorTable.instance(),
            SqlOperatorTables.of(myAggFunction)))
    .context(
        Contexts.of(
            RexImplementorTables.chain(customTable, RexImpTable.instance())))
    // ... rules, schema ...
    .build();
{code}

The same applies to a directly built {{VolcanoPlanner(Contexts.of(...))}} and 
to the JDBC path (a {{CalcitePrepareImpl}} subclass overriding 
{{createPlanner}} to populate {{externalContext}}, wired via 
{{Driver.withPrepareFactory}}).

h3. Backward compatibility

With no registration the accessor resolves to {{RexImpTable.instance()}}, so 
existing behaviour is unchanged. The former 3-argument {{AggImpState}} 
constructor is retained (deprecated) and delegates to the built-in table.

h2. Example

{code:sql}
SELECT g, MY_CUSTOM_AGG(x) AS c
FROM (VALUES ('a', 1), ('a', 2), ('b', 3)) AS t (g, x)
GROUP BY g
{code}

where {{MY_CUSTOM_AGG}} is contributed through a {{SqlOperatorTable}} and its 
implementor through a chained {{RexImplementorTable}}. Today this fails to 
plan; after the change it plans as an {{EnumerableAggregate}} and executes, 
returning {{(a, 2), (b, 1)}}.

h2. Testing

End-to-end enumerable execution of a custom aggregate implementor, asserting 
the actual result values -- driven both through the {{Frameworks}} API (real 
SQL) and a directly built planner -- plus the negative case: an unknown 
aggregate is rejected when no custom table is registered.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to