[
https://issues.apache.org/jira/browse/CALCITE-7631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Darpan Lunagariya (e6data computing) updated CALCITE-7631:
----------------------------------------------------------
Labels: (was: pull)
> Introduce a composable RexImplementorTable SPI for operator code generation
> ---------------------------------------------------------------------------
>
> Key: CALCITE-7631
> URL: https://issues.apache.org/jira/browse/CALCITE-7631
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.42.0
> Reporter: Darpan Lunagariya (e6data computing)
> Assignee: Darpan Lunagariya (e6data computing)
> Priority: Minor
> Original Estimate: 336h
> Remaining Estimate: 336h
>
> h2. Problem
> Enumerable code generation resolves the implementor for every operator
> through the {{RexImpTable.INSTANCE}} singleton. There is exactly one
> extension hook: {{get(SqlOperator)}} consults {{ImplementableFunction}} when
> the operator is a {{SqlUserDefinedFunction}} (a function registered in a
> schema). For any *other* operator (a custom or dialect operator that an
> adapter registers through a {{SqlOperatorTable}}), there is no way to supply
> a code-generation implementor:
> * the backing maps are {{private final ImmutableMap}};
> * the {{Builder}} / {{AbstractBuilder}} and their {{define*}} methods are
> private;
> * every consumer references {{RexImpTable.INSTANCE}} directly: most
> importantly {{RexToLixTranslator.visitCall}}, but also {{RexExecutorImpl}},
> {{EnumerableAggregate}}, {{EnumerableMatch}} and {{EnumerableTableFunctionScan}}.
> Practical consequences for such an operator:
> * it throws {{"cannot translate call"}} during code generation; and
> * it cannot be constant-folded, because {{ReduceExpressionsRule}} runs
> through {{RexExecutorImpl}}, which compiles the whole batch and fails if a
> single operator has no implementor.
> h2. Why this matters
> {{RexImpTable}} is the only registry of its kind that is a hard,
> non-composable singleton. Every other piece of pluggable behaviour in Calcite
> is an interface (or builder) with a default, obtained or composed at
> configuration time:
> * operators — {{SqlOperatorTable}} + {{SqlOperatorTables.chain(...)}}
> * type system — {{RelDataTypeSystem}} (+ {{RelDataTypeSystem.DEFAULT}})
> * metadata — {{RelMetadataProvider}} / {{ChainedRelMetadataProvider}}
> * cost — {{RelOptCostFactory}}
> * constant executor — {{RexExecutor}} (set on the planner)
> So an adapter can already _define_ its operators (compose a
> {{SqlOperatorTable}}) and _validate_ them, but it cannot _generate code_ for
> them. The validation half of "defining a function" is open; the
> code-generation half is sealed. Closing that asymmetry is the goal.
> h2. Proposal
> Make the implementor table a first-class, composable SPI, the code-generation
> counterpart of {{SqlOperatorTable}}, *without changing default behaviour*.
> # Extract an interface {{RexImplementorTable}} with the existing lookups
> ({{get}} for scalar / aggregate / match / windowed-table-function operators).
> {{RexImpTable}} becomes its default implementation; {{RexImpTable.INSTANCE}},
> together with a new {{RexImpTable.instance()}} accessor, continues to supply
> the default.
> # Add {{RexImplementorTables.chain(...)}} (mirroring
> {{SqlOperatorTables.chain}}): consult each table in turn and return the first
> non-null implementor, so a table earlier in the chain overrides later ones.
> # Thread an injectable {{RexImplementorTable}} (defaulting to the built-ins)
> through the code-generation entry points — {{RexToLixTranslator}} (new
> overloads of {{translateProjects}} / {{translateCondition}}) and
> {{RexExecutorImpl}} (for constant folding) — sourced the same way
> {{conformance}} already travels into {{EnumerableRelImplementor}}.
> An adapter then supplies implementors for its own operators by composing
> {{RexImplementorTables.chain(myTable, RexImpTable.instance())}} — exactly
> parallel to how it composes its {{SqlOperatorTable}} today.
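The chaining rule in step 2 can be sketched in isolation. This is a minimal illustration only: {{ImplementorTable}}, {{MapTable}} and {{ChainedTable}} are hypothetical stand-ins invented for this sketch, implementors are modelled as plain strings, and none of it is Calcite's actual API or the proposed patch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for an implementor table; not Calcite's real interface. */
interface ImplementorTable {
  /** Returns the implementor registered for the operator name, or null on a miss. */
  String get(String operatorName);
}

/** Hypothetical map-backed table, used here for both the adapter and the built-ins. */
final class MapTable implements ImplementorTable {
  private final Map<String, String> implementors = new HashMap<>();

  MapTable define(String operatorName, String implementor) {
    implementors.put(operatorName, implementor);
    return this;
  }

  @Override public String get(String operatorName) {
    return implementors.get(operatorName);
  }
}

/** Sketch of chain(...): consult each table in order, return the first non-null hit. */
final class ChainedTable implements ImplementorTable {
  private final List<ImplementorTable> tables;

  ChainedTable(ImplementorTable... tables) {
    this.tables = Arrays.asList(tables);
  }

  @Override public String get(String operatorName) {
    for (ImplementorTable table : tables) {
      String implementor = table.get(operatorName);
      if (implementor != null) {
        return implementor; // earlier tables override later ones
      }
    }
    return null; // every table missed; the caller decides how to fail
  }
}

final class ChainDemo {
  public static void main(String[] args) {
    ImplementorTable builtIns = new MapTable()
        .define("PLUS", "builtin-plus")
        .define("UPPER", "builtin-upper");
    ImplementorTable adapter = new MapTable()
        .define("MY_OP", "adapter-my-op")
        .define("UPPER", "adapter-upper");
    ImplementorTable chained = new ChainedTable(adapter, builtIns);

    System.out.println(chained.get("MY_OP"));   // found only in the adapter table
    System.out.println(chained.get("PLUS"));    // falls through to the built-ins
    System.out.println(chained.get("UPPER"));   // adapter entry shadows the built-in
    System.out.println(chained.get("UNKNOWN")); // no table has it
  }
}
```

Placing the adapter table first is what lets it shadow a built-in; reversing the order would let the built-ins win instead.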
> h3. Backward compatibility
> * {{RexImpTable.INSTANCE}} and all existing public methods remain; the
> default resolution path is unchanged.
> * New table-carrying overloads are added; the older overloads are deprecated
> and delegate to them.
> * The match / windowed-table-function lookups change from "throw on miss" to
> "return {{null}} on miss" so a chained table can fall through; call sites
> that require an implementor preserve the same failure via an explicit check.
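The "return {{null}} on miss, check at the call site" pattern from the last bullet might look roughly like the following. The class names, the string-valued implementors, the exception type and the message are all assumptions made for this sketch; the real call sites would keep whatever failure they raise today.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical lookup that returns null on a miss instead of throwing. */
final class NullableLookup {
  private final Map<String, String> implementors = new HashMap<>();

  NullableLookup define(String operatorName, String implementor) {
    implementors.put(operatorName, implementor);
    return this;
  }

  /** Returns the implementor, or null so that a chained table can fall through. */
  String get(String operatorName) {
    return implementors.get(operatorName);
  }
}

/** Hypothetical call site that cannot proceed without an implementor. */
final class RequiredLookup {
  private RequiredLookup() {}

  /** Turns a null miss back into a failure, via an explicit check. */
  static String requireImplementor(NullableLookup table, String operatorName) {
    String implementor = table.get(operatorName);
    if (implementor == null) {
      // Assumed exception type and message, for illustration only.
      throw new IllegalStateException("no implementor for operator: " + operatorName);
    }
    return implementor;
  }

  public static void main(String[] args) {
    NullableLookup table = new NullableLookup().define("TUMBLE", "tumble-impl");
    System.out.println(requireImplementor(table, "TUMBLE"));
    try {
      requireImplementor(table, "HOP");
    } catch (IllegalStateException e) {
      System.out.println("failed as expected: " + e.getMessage());
    }
  }
}
```

The lookup itself stays composable because it never throws; only the consumer that genuinely needs a result decides that a miss is an error.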
> h3. Example
> {code:java}
> RexImplementorTable table =
> RexImplementorTables.chain(myAdapterImplementors, RexImpTable.instance());
> // constant folding
> planner.setExecutor(new RexExecutorImpl(dataContext, table));
> {code}
> h2. Follow-up
> * Now that implementor lookup is an interface ({{RexImplementorTable}}), the
> static code-gen helpers and constants still living on {{RexImpTable}}
> ({{multiplyDivide}}, {{optimize2}}, {{NullAs}}, {{TRUE_EXPR}} /
> {{FALSE_EXPR}} / {{NULL_EXPR}}) should be extracted into a dedicated utility
> class (e.g. {{RexImpUtil}}) rather than remain on the implementation class.
> * Enumerable-engine _execution_ of custom *aggregates* additionally needs the
> table at planning time, because the {{EnumerableAggregate}} constructor
> pre-checks operator support. That wiring can be a follow-up; the SPI itself
> already covers aggregate implementors.