rzepinskip opened a new issue, #19408:
URL: https://github.com/apache/druid/issues/19408
### Affected Version
Druid 36.0.0
### Description
Range partition pruning produces **incorrect query results** (data loss)
when filtering on LONG-typed dimensions that were used as partition keys, if
those LONG values have different string widths (different digit counts). The
bug causes segments containing matching rows to be incorrectly pruned from the
query plan.
Although docs (and example) mentions string dimensions in context of
secondary partitioning, it is NOT explicitly said that it does not work for
non-string dimensions nor index_parall / index_hadoop reject non-string
dimensions during runtime. I only found this issue because MSQ ingestion
CLUSTERED_BY fails on non-string dimensions.
#### Root Cause
1. **At ingestion time:** Range partitioning
(`DimensionRangePartitionsSpec`) builds partition boundaries by stringifying
dimension values via `InputRow.getDimension()`, which always returns
`List<String>`. This produces `StringTuple` boundaries stored in
`DimensionRangeShardSpec` (e.g., `start=["100", "200"]`).
2. **At query time:** When the SQL planner or native query engine issues a
numeric range filter like `WHERE id >= 80` on a LONG dimension:
- SQL generates a `RangeFilter("id", ColumnType.LONG, 80L, null, ...)`
via `Expressions.toLeafFilter()`.
- `RangeFilter.getDimensionRangeSet()` converts the LONG bound to a
string via `lowerEval.asString()` → produces `Range.atLeast("80")`.
- This range is compared **lexicographically** against the segment's
stringified `StringTuple` bounds in
`DimensionRangeShardSpec.possibleInDomain()`.
3. **The bug:** Lexicographic comparison of `"80"` vs `["100", "200"]`:
- `'1' < '8'` (lexicographically) → segment is pruned.
- **But numerically:** every value in [100, 200] satisfies `>= 80` →
segment should be kept.
- Result: **Rows matching the filter are silently dropped from query
results.**
The code comment in `RangeFilter.getDimensionRangeSet()` even acknowledges
this is wrong:
```java
// We need to return a RangeSet<String>, but we have Object, not String. We
align with the interface by
// converting things to String, but we'd probably be better off adjusting
the interface to something that is
// more type aware in the future
```
---
#### Steps to Reproduce
**1. Add failing unit test to demonstrate the bug:**
File:
`processing/src/test/java/org/apache/druid/query/filter/FilterSegmentPrunerTest.java`
```java
@Test
void testNumericRangePruning_mixedWidthLongs_falseNegative()
{
final String interval = "2026-01-01T00:00:00Z/2026-01-02T00:00:00Z";
DataSegment segLargeIds = makeDataSegment(
interval,
makeRange("id", 0, "100", "200")
);
DimFilter filter =
new RangeFilter("id", ColumnType.LONG, 80L, null, false, false, null);
FilterSegmentPruner pruner = new FilterSegmentPruner(filter, null, null);
// BUG: rows with id in [100, 200] all match `id >= 80`, but the segment
// is wrongly pruned because '100' and '200' are lex-less-than '80'.
Assertions.assertTrue(
pruner.include(segLargeIds),
"Segment with numeric ids [100,200] must be kept for `id >= 80`."
);
}
```
**Run the test:**
```bash
mvn test -pl processing -am \
-Dtest="org.apache.druid.query.filter.FilterSegmentPrunerTest#testNumericRangePruning_mixedWidthLongs_falseNegative"
\
-Dsurefire.failIfNoSpecifiedTests=false \
-Pskip-static-checks -Dweb.console.skip=true -T1C
```
**Expected:** Test passes (segment is kept).
**Actual:** Test **fails** with:
```
AssertionFailedError: Segment with numeric ids [100,200] must be kept for
`id >= 80`.
==> expected: <true> but was: <false>
```
**2. Contrasting working test (same-width LONGs):**
```java
@Test
void testNumericRangePruning_sameWidthLongs_works()
{
final String interval = "2026-01-01T00:00:00Z/2026-01-02T00:00:00Z";
DataSegment segOlder = makeDataSegment(
interval,
makeRange("id", 0, "1000000000", "1500000000")
);
DataSegment segNewer = makeDataSegment(
interval,
makeRange("id", 1, "1500000000", "2000000000")
);
DimFilter filter =
new RangeFilter("id", ColumnType.LONG, 1_700_000_000L, null, false,
false, null);
FilterSegmentPruner pruner = new FilterSegmentPruner(filter, null, null);
Assertions.assertFalse(pruner.include(segOlder)); // Correctly pruned
Assertions.assertTrue(pruner.include(segNewer)); // Correctly kept
}
```
This test **passes** because all 10-digit values have the same string width,
so lexicographic order coincides with numeric order.
---
#### Impact
**Affected queries:**
- SQL queries with numeric comparison operators on LONG partition dimensions:
```sql
SELECT * FROM datasource WHERE account_id >= 100
```
- Native queries with `RangeFilter` on LONG partition dimensions.
**Affected scenarios:**
1. **Variable-width LONG values** used as partition dimensions:
- IDs spanning multiple magnitude ranges (e.g., 1-digit to 6-digit
account IDs).
- Any range including values with different digit counts.
2. **Negative LONG values** (lexicographic sign ordering `'-' < '0'...'9'`
differs from numeric).
3. **Range filters on the first partition dimension** (or any dimension when
the leading dimensions match as singletons, enabling deeper pruning).
**Safe scenarios (no bug):**
- Equality filters (`=`, `IN`, `NOT IN`) work correctly (they also use
stringified values for comparison).
- Same-width LONG values (e.g., 10-digit Unix timestamps in seconds,
zero-padded IDs) work correctly because lexicographic order happens to match
numeric order.
- Partitioning by STRING dimensions (already lexicographic).
---
#### Example Real-World Scenario
**Setup:**
- Datasource partitioned by `partitionDimensions: ["account_id"]` (LONG
type).
- Ingestion produces segments with boundaries like:
```json
{"start": ["100"], "end": ["200"]} // 3-digit IDs
{"start": ["200"], "end": ["1000"]} // 3-4 digit IDs
{"start": ["1000"], "end": ["10000"]} // 4-5 digit IDs
```
**Query:**
```sql
SELECT COUNT(*) FROM datasource WHERE account_id >= 80
```
**Expected:** All rows with `account_id >= 80` (includes segments with IDs
100-200, 200-1000, 1000-10000, etc.).
**Actual:** Segment `["100", "200"]` is **incorrectly pruned** because
lexicographically `"100" < "80"` and `"200" < "80"` (comparing first character:
`'1' < '8'`, `'2' < '8'`). Query returns 0 rows from that segment even though
numerically all rows match.
---
#### Debugging Already Done
1. **Code review:**
- Traced ingestion path: `RangePartitionIndexTaskInputRowIteratorBuilder`
→ `InputRow.getDimension()` → stringified values → `StringSketch` →
`StringTuple` boundaries.
- Traced query path: `Expressions.toLeafFilter()` → `RangeFilter` →
`getDimensionRangeSet()` → `lowerEval.asString()` → lexicographic
`Range<String>`.
- Confirmed `DimensionRangeShardSpec.possibleInDomain()` performs
lexicographic intersection via Guava `Range<String>` operations.
2. **Test coverage gap:**
- No existing tests in `FilterSegmentPrunerTest`,
`DimensionRangeShardSpecTest`, or `RangeFilterTests` exercise numeric
`RangeFilter` against `DimensionRangeShardSpec` with variable-width values.
- All existing tests use STRING-typed filters or same-width numeric
values.
3. **Related findings:**
- `BoundDimFilter.getDimensionRangeSet()` returns `null` for
non-lexicographic orderings (safer, but older code path).
- SQL planner always generates `RangeFilter` (not `BoundDimFilter`) for
`>=`, `>`, `<`, `<=` operators since the "always use RangeFilter" change in
`Expressions.toLeafFilter()`.
---
#### Configuration Details
**Cluster size:** N/A (reproducible in unit tests).
**Ingestion spec - index_parallel (relevant excerpt):**
```json
{
"partitionsSpec": {
"type": "range",
"targetRowsPerSegment": 5000000,
"partitionDimensions": ["account_id", "secondary_timestamp_ms_unix"]
}
}
```
**Schema (relevant excerpt):**
```json
{
"dimensionsSpec": {
"dimensions": [
{"name": "account_id", "type": "long"},
{"name": "secondary_timestamp_ms_unix", "type": "long"}
]
}
}
```
**Query context:** Default (no special settings).
#### Short term fix
We can use `"secondaryPartitionPruning": false` to disable pruning in
affected queries.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]