Re: [PR] support IN clause optimization for sharding key [shardingsphere]

via GitHub Thu, 04 Dec 2025 17:49:57 -0800


terrymanu commented on PR #36370:
URL: https://github.com/apache/shardingsphere/pull/36370#issuecomment-3614983406


   > ### 1. Parameter filtering modeling
   > Current tokens can only rewrite SQL via `toString(RouteUnit)`, but cannot 
tell the rewrite engine which parameters to remove. For IN optimization, I need 
both capabilities: rewrite SQL per route AND filter parameters per route.
   > 
   > That's why I added `ParameterFilterable` interface in `infra.rewrite.core` 
- it's an extension point that lets tokens communicate their parameter 
filtering needs to the engine. Currently designed for IN optimization, but the 
mechanism is generic enough for other scenarios if needed.
   > 
   > ### 2. Compatibility with existing token generators
   > The interface is optional. Existing tokens don't implement it, so they're 
completely unaffected. When no filterable tokens exist, the engine returns 
original parameters unchanged.
   > 
   > ### 3. Custom algorithm compatibility
   > I don't modify any SPI interfaces. For each IN value, I call the 
algorithm's existing `doSharding()` method (same as routing does) to determine 
target tables, then map them to route units. If the algorithm returns null or 
empty, the value is marked as orphan and excluded from all routes.
   > 
   > Since I'm using the same SPI that routing uses, any algorithm that works 
with routing will work with this optimization.
   > 
   > ### 4. Complex SQL support
   > The optimization works at the parameter level, not SQL structure level. It 
applies to any query using `StandardParameterBuilder` (all non-batch queries 
including subqueries and JOINs), regardless of SQL complexity.
   > 
   > In the token generator, I currently support `StandardShardingAlgorithm` 
and `ComplexKeysShardingAlgorithm`. For each IN value, I invoke the algorithm 
to determine target tables. For other strategy types, the value is included in 
all routes (original behavior), ensuring compatibility without requiring 
changes.
   > 
   > ## PR Split Plan
   > ### PR 1: Add ParameterFilterable interface
   > Introduce the parameter filtering extension point in `infra.rewrite.core`. 
This interface allows tokens to declare which parameters should be removed for 
each route, enabling route-aware parameter filtering in the rewrite engine.
   > 
   > ### PR 2: Integrate parameter filtering into RouteSQLRewriteEngine
   > Implement the parameter filtering mechanism in the rewrite engine. When 
processing each route, the engine checks for filterable tokens and applies 
parameter filtering accordingly. When no filterable tokens exist, it maintains 
original behavior with zero overhead.
   > 
   > ### PR 3: Add data structures (ShardingInPredicateValue and Token)
   > Introduce the data model for IN predicate optimization. 
`ShardingInPredicateValue` represents a single IN value with its target route 
information. `ShardingInPredicateToken` generates optimized SQL per route and 
implements parameter filtering, with single-value optimization (IN → =) for 
better database performance.
   > 
   > ### PR 4: Simple optimization - literal values only
   > First end-to-end working version. The token generator detects IN 
predicates on sharding keys, calls sharding algorithms to determine target 
routes for each value, and generates tokens with route distribution 
information. This PR handles only literal values to keep the change manageable.
   > 
   > ### PR 5: Add parameter marker support
   > Extend the generator to support parameter markers (`?`) in IN predicates. 
Implement `ParametersAware` interface to access bound parameters, enabling 
optimization for prepared statements.
   > 
   > ### PR 6: Complex algorithm support
   > Extend support to `ComplexKeysShardingAlgorithm` for multi-column sharding 
keys. The generator builds appropriate sharding values and invokes the 
algorithm to determine target routes.
   > 
   > ### PR 7: Nested expression support
   > Support IN predicates nested in AND/OR expressions. The generator 
recursively extracts IN predicates from binary operations, handling arbitrary 
nesting depth and multiple IN predicates in the same query.
   > 
   > @terrymanu Thanks again for the detailed feedback. I've tried to address 
the architectural concerns you raised and broken down the work into smaller PRs.
   > 
   > I'm not entirely sure if this is what you're looking for - if I've 
misunderstood anything or if you'd like me to approach it differently, please 
let me know. Happy to adjust the plan based on your guidance.
   
   
   Some problems need to be re-design:
   
     - Responsibility mixing: pushing parameter filtering into SQL tokens 
breaks the current separation where ParameterBuilder/ParameterRewriter handle 
parameters and tokens handle SQL text; this risks index/order conflicts with 
other parameter
       rewriters (encryption, pagination, shadow, etc.).
     - Routing consistency risk: re-running doSharding per value inside tokens 
may diverge from the existing RouteContext (which already reflects hints, 
binding tables, merged conditions). Routing-stage condition merges/AlwaysFalse 
handling aren’t
       reused, so misrouting or missed routes are possible.
     - SQL/parameter alignment: RouteSQLRewriteEngine currently reuses the same 
parameter list in multi-route aggregation (UNION ALL); if each RouteUnit 
filters parameters differently, the existing concatenation can leave 
placeholder counts and
       parameter counts misaligned.
     - Incomplete coverage: the plan only mentions StandardParameterBuilder; 
GroupedParameterBuilder / batch / multi-value scenarios are unspecified, so 
behavior may be inconsistent or fail outright there.
     - Behavior change: when an algorithm returns empty/null or no route, 
values are silently dropped from IN, effectively weakening user predicates 
without fallback or warning—risking data loss.
     - Complex SQL / multi-table joins: the optimization assumes single-column 
distribution; it doesn’t explain how to keep routes consistent across binding 
tables, joins, or subqueries sharing parameters, so partial filtering and 
incomplete
       results are possible.
     - New interface surface risk: introducing a cross-module 
ParameterFilterable for a single scenario invites more tokens to take on 
parameter logic, increasing coupling and eroding the current 
parameter-vs-syntax separation.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] support IN clause optimization for sharding key [shardingsphere]

Reply via email to