[ https://issues.apache.org/jira/browse/CALCITE-7631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Darpan Lunagariya (e6data computing) updated CALCITE-7631:
----------------------------------------------------------
Description:
h2. Problem
Enumerable code generation resolves the implementor for every operator through
the {{RexImpTable.INSTANCE}} singleton. There is exactly one extension hook:
{{get(SqlOperator)}} consults {{ImplementableFunction}} when the operator is a
{{SqlUserDefinedFunction}} (a function registered in a schema). For any *other*
operator (a custom or dialect operator that an adapter registers through a
{{SqlOperatorTable}}), there is no way to supply a code-generation implementor:
* the backing maps are {{private final ImmutableMap}};
* the {{Builder}} / {{AbstractBuilder}} and their {{define*}} methods are
private;
* every consumer references {{RexImpTable.INSTANCE}} directly: most importantly
{{RexToLixTranslator.visitCall}}, but also {{RexExecutorImpl}},
{{EnumerableAggregate}}, {{EnumerableMatch}} and {{EnumerableTableFunctionScan}}.
Practical consequences for such an operator:
* it throws {{"cannot translate call"}} during code generation; and
* it cannot be constant-folded, because {{ReduceExpressionsRule}} runs through
{{RexExecutorImpl}}, which compiles the whole batch and fails if a single
operator has no implementor.
h2. Why this matters
{{RexImpTable}} is the only registry of its kind that is a hard, non-composable
singleton. Every other piece of pluggable behaviour in Calcite is an interface
(or builder) with a default, obtained or composed at configuration time:
* operators — {{SqlOperatorTable}} + {{SqlOperatorTables.chain(...)}}
* type system — {{RelDataTypeSystem}} (+ {{RelDataTypeSystem.DEFAULT}})
* metadata — {{RelMetadataProvider}} / {{ChainedRelMetadataProvider}}
* cost — {{RelOptCostFactory}}
* constant executor — {{RexExecutor}} (set on the planner)
So an adapter can already _define_ its operators (compose a
{{SqlOperatorTable}}) and _validate_ them, but it cannot _generate code_ for
them. The validation half of "defining a function" is open; the code-generation
half is sealed. Closing that asymmetry is the goal.
h2. Proposal
Make the implementor table a first-class, composable SPI, the code-generation
counterpart of {{SqlOperatorTable}}, *without changing default behaviour*.
# Extract an interface {{RexImplementorTable}} with the existing lookups
({{get}} for scalar / aggregate / match / windowed-table-function operators).
{{RexImpTable}} becomes its default implementation, reachable through the
existing {{RexImpTable.INSTANCE}} and a new {{RexImpTable.instance()}}.
# Add {{RexImplementorTables.chain(...)}} (mirroring
{{SqlOperatorTables.chain}}): consult each table in turn and return the first
non-null result, so an earlier table overrides a later one.
# Thread an injectable {{RexImplementorTable}} (defaulting to the built-ins)
through the code-generation entry points — {{RexToLixTranslator}} (new
overloads of {{translateProjects}} / {{translateCondition}}) and
{{RexExecutorImpl}} (for constant folding) — sourced the same way
{{conformance}} already travels into {{EnumerableRelImplementor}}.
An adapter then supplies implementors for its own operators by composing
{{RexImplementorTables.chain(myTable, RexImpTable.instance())}} — exactly
parallel to how it composes its {{SqlOperatorTable}} today.
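The chaining rule in step 2 can be sketched in isolation. This is only an illustration under stated assumptions, not the proposed patch: {{ImplementorTable}} and {{ImplementorTables}} are simplified stand-ins for the proposed {{RexImplementorTable}} / {{RexImplementorTables}}, operators are identified by a plain string name, and an "implementor" is just a string label.

```java
import java.util.List;
import java.util.Map;

/** Simplified stand-in for the proposed RexImplementorTable (hypothetical shape). */
interface ImplementorTable {
  /** Returns the implementor registered for the operator, or null if there is none. */
  String get(String operatorName);
}

/** Simplified stand-in for the proposed RexImplementorTables utility (hypothetical). */
final class ImplementorTables {
  private ImplementorTables() {}

  /** Wraps a map as a table; operators missing from the map yield null. */
  static ImplementorTable of(Map<String, String> implementors) {
    return implementors::get;
  }

  /** Consults each table in order and returns the first non-null implementor. */
  static ImplementorTable chain(ImplementorTable... tables) {
    final List<ImplementorTable> ordered = List.of(tables);
    return operatorName -> {
      for (ImplementorTable table : ordered) {
        final String implementor = table.get(operatorName);
        if (implementor != null) {
          return implementor; // earlier tables override later ones
        }
      }
      return null; // no table knows the operator; the caller decides what to do
    };
  }
}
```

If an adapter table defines {{UPPER}} and {{MY_FUNC}} and the built-in table defines {{PLUS}} and {{UPPER}}, then {{chain(adapter, builtIns)}} resolves {{UPPER}} to the adapter's entry, {{PLUS}} to the built-in entry, and any operator known to neither table to {{null}}.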
h3. Backward compatibility
* {{RexImpTable.INSTANCE}} and all existing public methods remain; the default
resolution path is unchanged.
* New table-carrying overloads are added; the older overloads are deprecated
and delegate to them.
* The match / windowed-table-function lookups change from "throw on miss" to
"return {{null}} on miss" so a chained table can fall through; call sites that
require an implementor preserve the same failure via an explicit check.
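The "explicit check" in the last bullet might look roughly like the sketch below. It is a stand-alone illustration, not the actual change: the lookup is modelled as a {{Function}} from operator name to implementor label, and both the exception type and the helper name {{requireImplementor}} are assumptions chosen for the example. Only the {{"cannot translate call"}} message is taken from the description above.

```java
import java.util.function.Function;

/** Hypothetical call-site helper; not part of Calcite. */
final class MissHandlingSketch {
  private MissHandlingSketch() {}

  /**
   * Looks up an implementor through a table that returns null on a miss, and
   * reproduces the old "throw on miss" behaviour for callers that need a result.
   */
  static String requireImplementor(Function<String, String> lookup, String operatorName) {
    final String implementor = lookup.apply(operatorName);
    if (implementor == null) {
      // Exception type is an assumption made for this sketch.
      throw new IllegalStateException("cannot translate call " + operatorName);
    }
    return implementor;
  }
}
```

Keeping the check at the call site, rather than inside each table, is what lets a chained table fall through to the next table on a miss while callers that require an implementor still fail as before.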
h3. Example
{code:java}
// adapter implementors come first, so they take precedence over the built-ins
RexImplementorTable table =
    RexImplementorTables.chain(myAdapterImplementors, RexImpTable.instance());
// constant folding
planner.setExecutor(new RexExecutorImpl(dataContext, table));
{code}
h2. Follow-up
* Now that implementor lookup is an interface ({{RexImplementorTable}}), the
static code-gen helpers and constants still living on {{RexImpTable}}
({{multiplyDivide}}, {{optimize2}}, {{NullAs}}, {{TRUE_EXPR}} / {{FALSE_EXPR}} /
{{NULL_EXPR}}) should be extracted into a dedicated utility class (e.g.
{{RexImpUtil}}) rather than remain on the implementation class.
* Enumerable-engine _execution_ of custom *aggregates* additionally needs the
table at planning time (the {{EnumerableAggregate}} constructor pre-checks
operator support), which can be a follow-up; the SPI itself already covers
aggregate implementors.
was:
h2. Problem
Enumerable code generation resolves the implementor for every operator through
the {{RexImpTable.INSTANCE}} singleton. There is exactly one extension hook:
{{get(SqlOperator)}} consults {{ImplementableFunction}} when the operator is a
{{SqlUserDefinedFunction}} (a function registered in a schema). For any *other*
operator (a custom or dialect operator that an adapter registers through a
{{SqlOperatorTable}}), there is no way to supply a code-generation implementor:
* the backing maps are {{private final ImmutableMap}};
* the {{Builder}} / {{AbstractBuilder}} and their {{define*}} methods are
private;
* every consumer references {{RexImpTable.INSTANCE}} directly, most importantly
{{RexToLixTranslator.visitCall}}, but also {{RexExecutorImpl}},
{{EnumerableAggregate}}, {{EnumerableMatch}}, {{EnumerableTableFunctionScan}}.
Practical consequences for such an operator:
* it throws {{"cannot translate call"}} during code generation; and
* it cannot be constant-folded, because {{ReduceExpressionsRule}} runs through
{{RexExecutorImpl}}, which compiles the whole batch and fails if a single
operator has no implementor.
h2. Why this matters
{{RexImpTable}} is the only registry of its kind that is a hard, non-composable
singleton. Every other piece of pluggable behaviour in Calcite is an interface
(or builder) with a default, obtained or composed at configuration time:
* operators — {{SqlOperatorTable}} + {{SqlOperatorTables.chain(...)}}
* type system — {{RelDataTypeSystem}} (+ {{RelDataTypeSystem.DEFAULT}})
* metadata — {{RelMetadataProvider}} / {{ChainedRelMetadataProvider}}
* cost — {{RelOptCostFactory}}
* constant executor — {{RexExecutor}} (set on the planner)
So an adapter can already _define_ its operators (compose a
{{SqlOperatorTable}}) and _validate_ them, but it cannot _generate code_ for
them. The validation half of "defining a function" is open; the code-generation
half is sealed. Closing that asymmetry is the goal.
h2. Proposal
Make the implementor table a first-class, composable SPI, the code-generation
counterpart of {{SqlOperatorTable}}, *without changing default behaviour*.
# Extract an interface {{RexImplementorTable}} with the existing lookups
({{get}} for scalar / aggregate / match / windowed-table-function operators).
{{RexImpTable}} becomes its default implementation; {{RexImpTable.INSTANCE}}
and a new {{RexImpTable.instance()}} remain the default.
# Add {{RexImplementorTables.chain(...)}} (mirroring
{{SqlOperatorTables.chain}}): consult each table in turn, first non-null wins;
chain order provides override.
# Thread an injectable {{RexImplementorTable}} (defaulting to the built-ins)
through the code-generation entry points — {{RexToLixTranslator}} (new
overloads of {{translateProjects}} / {{translateCondition}}) and
{{RexExecutorImpl}} (for constant folding) — sourced the same way
{{conformance}} already travels into {{EnumerableRelImplementor}}.
An adapter then supplies implementors for its own operators by composing
{{RexImplementorTables.chain(myTable, RexImpTable.instance())}} — exactly
parallel to how it composes its {{SqlOperatorTable}} today.
h3. Backward compatibility
* {{RexImpTable.INSTANCE}} and all existing public methods remain; the default
resolution path is unchanged.
* New table-carrying overloads are added; the older overloads are deprecated
and delegate to them.
* The match / windowed-table-function lookups change from "throw on miss" to
"return {{null}} on miss" so a chained table can fall through; call sites that
require an implementor preserve the same failure via an explicit check.
h3. Example
{code:java}
RexImplementorTable table =
RexImplementorTables.chain(myAdapterImplementors, RexImpTable.instance());
// constant folding
planner.setExecutor(new RexExecutorImpl(dataContext, table));
{code}
h2. Scope / non-goals
* Enumerable-engine _execution_ of custom *aggregates* additionally needs the
table at planning time (the {{EnumerableAggregate}} constructor pre-checks
operator support), which can be a follow-up; the SPI itself already covers
aggregate implementors.
* The operator *catalog* ({{SqlLibrary}} / {{SqlLibraryOperators}}) is
unchanged. That is first-party content behind the already-open
{{SqlOperatorTable}} and is intentionally out of scope.
> Introduce a composable RexImplementorTable SPI for operator code generation
> ---------------------------------------------------------------------------
>
> Key: CALCITE-7631
> URL: https://issues.apache.org/jira/browse/CALCITE-7631
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.42.0
> Reporter: Darpan Lunagariya (e6data computing)
> Assignee: Darpan Lunagariya (e6data computing)
> Priority: Minor
> Original Estimate: 336h
> Remaining Estimate: 336h