terrymanu commented on PR #36370:
URL: https://github.com/apache/shardingsphere/pull/36370#issuecomment-3614983406
> ### 1. Parameter filtering modeling
> Current tokens can only rewrite SQL via `toString(RouteUnit)`, but cannot
tell the rewrite engine which parameters to remove. For IN optimization, I need
both capabilities: rewrite SQL per route AND filter parameters per route.
>
> That's why I added `ParameterFilterable` interface in `infra.rewrite.core`
- it's an extension point that lets tokens communicate their parameter
filtering needs to the engine. Currently designed for IN optimization, but the
mechanism is generic enough for other scenarios if needed.
>
> ### 2. Compatibility with existing token generators
> The interface is optional. Existing tokens don't implement it, so they're
completely unaffected. When no filterable tokens exist, the engine returns
original parameters unchanged.
>
> ### 3. Custom algorithm compatibility
> I don't modify any SPI interfaces. For each IN value, I call the
algorithm's existing `doSharding()` method (same as routing does) to determine
target tables, then map them to route units. If the algorithm returns null or
empty, the value is marked as orphan and excluded from all routes.
>
> Since I'm using the same SPI that routing uses, any algorithm that works
with routing will work with this optimization.
>
> ### 4. Complex SQL support
> The optimization works at the parameter level, not SQL structure level. It
applies to any query using `StandardParameterBuilder` (all non-batch queries
including subqueries and JOINs), regardless of SQL complexity.
>
> In the token generator, I currently support `StandardShardingAlgorithm`
and `ComplexKeysShardingAlgorithm`. For each IN value, I invoke the algorithm
to determine target tables. For other strategy types, the value is included in
all routes (original behavior), ensuring compatibility without requiring
changes.
>
> ## PR Split Plan
> ### PR 1: Add ParameterFilterable interface
> Introduce the parameter filtering extension point in `infra.rewrite.core`.
This interface allows tokens to declare which parameters should be removed for
each route, enabling route-aware parameter filtering in the rewrite engine.
>
> ### PR 2: Integrate parameter filtering into RouteSQLRewriteEngine
> Implement the parameter filtering mechanism in the rewrite engine. When
processing each route, the engine checks for filterable tokens and applies
parameter filtering accordingly. When no filterable tokens exist, it maintains
original behavior with zero overhead.
>
> ### PR 3: Add data structures (ShardingInPredicateValue and Token)
> Introduce the data model for IN predicate optimization.
`ShardingInPredicateValue` represents a single IN value with its target route
information. `ShardingInPredicateToken` generates optimized SQL per route and
implements parameter filtering, with single-value optimization (IN → =) for
better database performance.
>
> ### PR 4: Simple optimization - literal values only
> First end-to-end working version. The token generator detects IN
predicates on sharding keys, calls sharding algorithms to determine target
routes for each value, and generates tokens with route distribution
information. This PR handles only literal values to keep the change manageable.
>
> ### PR 5: Add parameter marker support
> Extend the generator to support parameter markers (`?`) in IN predicates.
Implement `ParametersAware` interface to access bound parameters, enabling
optimization for prepared statements.
>
> ### PR 6: Complex algorithm support
> Extend support to `ComplexKeysShardingAlgorithm` for multi-column sharding
keys. The generator builds appropriate sharding values and invokes the
algorithm to determine target routes.
>
> ### PR 7: Nested expression support
> Support IN predicates nested in AND/OR expressions. The generator
recursively extracts IN predicates from binary operations, handling arbitrary
nesting depth and multiple IN predicates in the same query.
>
> @terrymanu Thanks again for the detailed feedback. I've tried to address
the architectural concerns you raised and broken down the work into smaller PRs.
>
> I'm not entirely sure if this is what you're looking for - if I've
misunderstood anything or if you'd like me to approach it differently, please
let me know. Happy to adjust the plan based on your guidance.
Some problems need to be re-design:
- Responsibility mixing: pushing parameter filtering into SQL tokens
breaks the current separation where ParameterBuilder/ParameterRewriter handle
parameters and tokens handle SQL text; this risks index/order conflicts with
other parameter
rewriters (encryption, pagination, shadow, etc.).
- Routing consistency risk: re-running doSharding per value inside tokens
may diverge from the existing RouteContext (which already reflects hints,
binding tables, merged conditions). Routing-stage condition merges/AlwaysFalse
handling aren’t
reused, so misrouting or missed routes are possible.
- SQL/parameter alignment: RouteSQLRewriteEngine currently reuses the same
parameter list in multi-route aggregation (UNION ALL); if each RouteUnit
filters parameters differently, the existing concatenation can leave
placeholder counts and
parameter counts misaligned.
- Incomplete coverage: the plan only mentions StandardParameterBuilder;
GroupedParameterBuilder / batch / multi-value scenarios are unspecified, so
behavior may be inconsistent or fail outright there.
- Behavior change: when an algorithm returns empty/null or no route,
values are silently dropped from IN, effectively weakening user predicates
without fallback or warning—risking data loss.
- Complex SQL / multi-table joins: the optimization assumes single-column
distribution; it doesn’t explain how to keep routes consistent across binding
tables, joins, or subqueries sharing parameters, so partial filtering and
incomplete
results are possible.
- New interface surface risk: introducing a cross-module
ParameterFilterable for a single scenario invites more tokens to take on
parameter logic, increasing coupling and eroding the current
parameter-vs-syntax separation.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]