rzepinskip opened a new issue, #19408:
URL: https://github.com/apache/druid/issues/19408

   
   ### Affected Version
   
   Druid 36.0.0
   
   ### Description
   
   Range partition pruning produces **incorrect query results** (data loss) 
when filtering on LONG-typed dimensions that were used as partition keys, if 
those LONG values have different string widths (different digit counts). The 
bug causes segments containing matching rows to be incorrectly pruned from the 
query plan.
   
   Although docs (and example) mentions string dimensions in context of 
secondary partitioning, it is NOT explicitly said that it does not work for 
non-string dimensions nor index_parall / index_hadoop reject non-string 
dimensions during runtime. I only found this issue because MSQ ingestion 
CLUSTERED_BY fails on non-string dimensions.
   
   #### Root Cause
   
   1. **At ingestion time:** Range partitioning 
(`DimensionRangePartitionsSpec`) builds partition boundaries by stringifying 
dimension values via `InputRow.getDimension()`, which always returns 
`List<String>`. This produces `StringTuple` boundaries stored in 
`DimensionRangeShardSpec` (e.g., `start=["100", "200"]`).
   
   2. **At query time:** When the SQL planner or native query engine issues a 
numeric range filter like `WHERE id >= 80` on a LONG dimension:
      - SQL generates a `RangeFilter("id", ColumnType.LONG, 80L, null, ...)` 
via `Expressions.toLeafFilter()`.
      - `RangeFilter.getDimensionRangeSet()` converts the LONG bound to a 
string via `lowerEval.asString()` → produces `Range.atLeast("80")`.
      - This range is compared **lexicographically** against the segment's 
stringified `StringTuple` bounds in 
`DimensionRangeShardSpec.possibleInDomain()`.
   
   3. **The bug:** Lexicographic comparison of `"80"` vs `["100", "200"]`:
      - `'1' < '8'` (lexicographically) → segment is pruned.
      - **But numerically:** every value in [100, 200] satisfies `>= 80` → 
segment should be kept.
      - Result: **Rows matching the filter are silently dropped from query 
results.**
   
   The code comment in `RangeFilter.getDimensionRangeSet()` even acknowledges 
this is wrong:
   ```java
   // We need to return a RangeSet<String>, but we have Object, not String. We 
align with the interface by
   // converting things to String, but we'd probably be better off adjusting 
the interface to something that is
   // more type aware in the future
   ```
   
   ---
   
   #### Steps to Reproduce
   
   **1. Add failing unit test to demonstrate the bug:**
   
   File: 
`processing/src/test/java/org/apache/druid/query/filter/FilterSegmentPrunerTest.java`
   
   ```java
   @Test
   void testNumericRangePruning_mixedWidthLongs_falseNegative()
   {
     final String interval = "2026-01-01T00:00:00Z/2026-01-02T00:00:00Z";
   
     DataSegment segLargeIds = makeDataSegment(
         interval,
         makeRange("id", 0, "100", "200")
     );
   
     DimFilter filter =
         new RangeFilter("id", ColumnType.LONG, 80L, null, false, false, null);
     FilterSegmentPruner pruner = new FilterSegmentPruner(filter, null, null);
   
     // BUG: rows with id in [100, 200] all match `id >= 80`, but the segment
     //      is wrongly pruned because '100' and '200' are lex-less-than '80'.
     Assertions.assertTrue(
         pruner.include(segLargeIds),
         "Segment with numeric ids [100,200] must be kept for `id >= 80`."
     );
   }
   ```
   
   **Run the test:**
   ```bash
   mvn test -pl processing -am \
     
-Dtest="org.apache.druid.query.filter.FilterSegmentPrunerTest#testNumericRangePruning_mixedWidthLongs_falseNegative"
 \
     -Dsurefire.failIfNoSpecifiedTests=false \
     -Pskip-static-checks -Dweb.console.skip=true -T1C
   ```
   
   **Expected:** Test passes (segment is kept).  
   **Actual:** Test **fails** with:
   ```
   AssertionFailedError: Segment with numeric ids [100,200] must be kept for 
`id >= 80`.
   ==> expected: <true> but was: <false>
   ```
   
   **2. Contrasting working test (same-width LONGs):**
   
   ```java
   @Test
   void testNumericRangePruning_sameWidthLongs_works()
   {
     final String interval = "2026-01-01T00:00:00Z/2026-01-02T00:00:00Z";
   
     DataSegment segOlder = makeDataSegment(
         interval,
         makeRange("id", 0, "1000000000", "1500000000")
     );
     DataSegment segNewer = makeDataSegment(
         interval,
         makeRange("id", 1, "1500000000", "2000000000")
     );
   
     DimFilter filter =
         new RangeFilter("id", ColumnType.LONG, 1_700_000_000L, null, false, 
false, null);
     FilterSegmentPruner pruner = new FilterSegmentPruner(filter, null, null);
   
     Assertions.assertFalse(pruner.include(segOlder));  // Correctly pruned
     Assertions.assertTrue(pruner.include(segNewer));   // Correctly kept
   }
   ```
   
   This test **passes** because all 10-digit values have the same string width, 
so lexicographic order coincides with numeric order.
   
   ---
   
   #### Impact
   
   **Affected queries:**
   - SQL queries with numeric comparison operators on LONG partition dimensions:
     ```sql
     SELECT * FROM datasource WHERE account_id >= 100
     ```
   - Native queries with `RangeFilter` on LONG partition dimensions.
   
   **Affected scenarios:**
   1. **Variable-width LONG values** used as partition dimensions:
      - IDs spanning multiple magnitude ranges (e.g., 1-digit to 6-digit 
account IDs).
      - Any range including values with different digit counts.
   
   2. **Negative LONG values** (lexicographic sign ordering `'-' < '0'...'9'` 
differs from numeric).
   
   3. **Range filters on the first partition dimension** (or any dimension when 
the leading dimensions match as singletons, enabling deeper pruning).
   
   **Safe scenarios (no bug):**
   - Equality filters (`=`, `IN`, `NOT IN`) work correctly (they also use 
stringified values for comparison).
   - Same-width LONG values (e.g., 10-digit Unix timestamps in seconds, 
zero-padded IDs) work correctly because lexicographic order happens to match 
numeric order.
   - Partitioning by STRING dimensions (already lexicographic).
   
   ---
   
   #### Example Real-World Scenario
   
   **Setup:**
   - Datasource partitioned by `partitionDimensions: ["account_id"]` (LONG 
type).
   - Ingestion produces segments with boundaries like:
     ```json
     {"start": ["100"], "end": ["200"]}     // 3-digit IDs
     {"start": ["200"], "end": ["1000"]}    // 3-4 digit IDs
     {"start": ["1000"], "end": ["10000"]}  // 4-5 digit IDs
     ```
   
   **Query:**
   ```sql
   SELECT COUNT(*) FROM datasource WHERE account_id >= 80
   ```
   
   **Expected:** All rows with `account_id >= 80` (includes segments with IDs 
100-200, 200-1000, 1000-10000, etc.).
   
   **Actual:** Segment `["100", "200"]` is **incorrectly pruned** because 
lexicographically `"100" < "80"` and `"200" < "80"` (comparing first character: 
`'1' < '8'`, `'2' < '8'`). Query returns 0 rows from that segment even though 
numerically all rows match.
   
   ---
   
   #### Debugging Already Done
   
   1. **Code review:**
      - Traced ingestion path: `RangePartitionIndexTaskInputRowIteratorBuilder` 
→ `InputRow.getDimension()` → stringified values → `StringSketch` → 
`StringTuple` boundaries.
      - Traced query path: `Expressions.toLeafFilter()` → `RangeFilter` → 
`getDimensionRangeSet()` → `lowerEval.asString()` → lexicographic 
`Range<String>`.
      - Confirmed `DimensionRangeShardSpec.possibleInDomain()` performs 
lexicographic intersection via Guava `Range<String>` operations.
   
   2. **Test coverage gap:**
      - No existing tests in `FilterSegmentPrunerTest`, 
`DimensionRangeShardSpecTest`, or `RangeFilterTests` exercise numeric 
`RangeFilter` against `DimensionRangeShardSpec` with variable-width values.
      - All existing tests use STRING-typed filters or same-width numeric 
values.
   
   3. **Related findings:**
      - `BoundDimFilter.getDimensionRangeSet()` returns `null` for 
non-lexicographic orderings (safer, but older code path).
      - SQL planner always generates `RangeFilter` (not `BoundDimFilter`) for 
`>=`, `>`, `<`, `<=` operators since the "always use RangeFilter" change in 
`Expressions.toLeafFilter()`.
   
   ---
   
   #### Configuration Details
   
   **Cluster size:** N/A (reproducible in unit tests).
   
   **Ingestion spec - index_parallel (relevant excerpt):**
   ```json
   {
     "partitionsSpec": {
       "type": "range",
       "targetRowsPerSegment": 5000000,
       "partitionDimensions": ["account_id", "secondary_timestamp_ms_unix"]
     }
   }
   ```
   
   **Schema (relevant excerpt):**
   ```json
   {
     "dimensionsSpec": {
       "dimensions": [
         {"name": "account_id", "type": "long"},
         {"name": "secondary_timestamp_ms_unix", "type": "long"}
       ]
     }
   }
   ```
   
   **Query context:** Default (no special settings).
   
   #### Short term fix
   
   We can use `"secondaryPartitionPruning": false` to disable pruning in 
affected queries.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to