xiangfu0 opened a new pull request, #18239:
URL: https://github.com/apache/pinot/pull/18239
## Summary
Adds `UUID` as a first-class Pinot data type backed by 16-byte BYTES storage
with canonical lowercase RFC 4122 strings as the external representation.
### Core type plumbing
- `FieldSpec.DataType.UUID` — stored as BYTES (16 bytes), sortable via
unsigned byte order. UUID is placed **after `UNKNOWN`** (ordinal 14) so that
existing ordinals do not shift: `AnyValueAggregationFunction` serializes
`DataType.ordinal()` into intermediate results.
- `PinotDataType.UUID` — conversions to/from STRING/BYTES; numeric ops throw
as expected.
- `DataSchema.ColumnDataType.UUID` — backed by BYTES internally, maps to
Calcite `SqlTypeName.UUID`.
- `Schema.validate()` — UUID and BIG_DECIMAL enforce SV-only constraint.
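To illustrate the ordinal concern above, here is a minimal sketch. The enums are hypothetical stand-ins, not Pinot's actual `FieldSpec.DataType`; they only show that appending a constant keeps every existing `ordinal()` stable, while inserting one in the middle shifts the constants that follow it.

```java
// Hypothetical enums for illustration only; not Pinot's FieldSpec.DataType.
enum TypesV1 { INT, STRING, BYTES, UNKNOWN }

// New constant appended at the end: existing ordinals are unchanged.
enum TypesAppended { INT, STRING, BYTES, UNKNOWN, UUID }

// New constant inserted in the middle: BYTES and UNKNOWN shift by one.
enum TypesInserted { INT, STRING, UUID, BYTES, UNKNOWN }

final class OrdinalStabilityDemo {
  private OrdinalStabilityDemo() {
  }

  /** Returns true if every TypesV1 constant keeps its ordinal in the candidate enum. */
  static <E extends Enum<E>> boolean preservesOrdinals(Class<E> candidate) {
    for (TypesV1 old : TypesV1.values()) {
      // Look up the constant with the same name in the newer enum.
      E renamed = Enum.valueOf(candidate, old.name());
      if (renamed.ordinal() != old.ordinal()) {
        return false;
      }
    }
    return true;
  }
}
```

`OrdinalStabilityDemo.preservesOrdinals(TypesAppended.class)` returns `true` and `preservesOrdinals(TypesInserted.class)` returns `false`, which is why anything that persists or ships `ordinal()` values favors append-only changes.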
### UuidUtils (`pinot-spi`)
- Canonical conversions: `toBytes()`, `toUUID()`, `toString()`
- `UuidKey` inner class — stores UUID as two `long` fields for O(1)
hashCode/equals in hot group-by/join paths (avoids a `ByteArray` allocation per row)
- `compare()` uses `Long.compareUnsigned` for correct unsigned byte-order
UUID comparison
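A rough sketch of the ideas listed above, using only the JDK. The class name `UuidSketch`, the nested `Key`, and all method signatures are assumptions made for illustration; they are not the API of the `UuidUtils` class in this PR.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Illustrative only: names and signatures here are assumptions, not the PR's UuidUtils.
final class UuidSketch {
  private UuidSketch() {
  }

  /** Encodes a UUID as 16 big-endian bytes (most significant long first). */
  static byte[] toBytes(UUID uuid) {
    return ByteBuffer.allocate(16)
        .putLong(uuid.getMostSignificantBits())
        .putLong(uuid.getLeastSignificantBits())
        .array();
  }

  /** Decodes 16 big-endian bytes back into a UUID. */
  static UUID toUuid(byte[] bytes) {
    if (bytes.length != 16) {
      throw new IllegalArgumentException("Expected 16 bytes, got " + bytes.length);
    }
    ByteBuffer buffer = ByteBuffer.wrap(bytes);
    return new UUID(buffer.getLong(), buffer.getLong());
  }

  /** Orders UUIDs the same way an unsigned byte-by-byte comparison of toBytes() would. */
  static int compare(UUID a, UUID b) {
    int high = Long.compareUnsigned(a.getMostSignificantBits(), b.getMostSignificantBits());
    return high != 0
        ? high
        : Long.compareUnsigned(a.getLeastSignificantBits(), b.getLeastSignificantBits());
  }

  /** Two-long hash key: equals/hashCode touch two primitives instead of a byte array. */
  static final class Key {
    final long _msb;
    final long _lsb;

    Key(long msb, long lsb) {
      _msb = msb;
      _lsb = lsb;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof Key)) {
        return false;
      }
      Key other = (Key) o;
      return _msb == other._msb && _lsb == other._lsb;
    }

    @Override
    public int hashCode() {
      return 31 * Long.hashCode(_msb) + Long.hashCode(_lsb);
    }
  }
}
```

The unsigned comparison matters because `java.util.UUID.compareTo` compares the two halves as signed longs: for `7fffffff-ffff-ffff-ffff-ffffffffffff` and `80000000-0000-0000-0000-000000000000`, `compareTo` orders the second value first, while the sketch's `compare` agrees with unsigned byte order.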
### Query engine
- Predicate evaluators — Eq/NotEq/In/NotIn handle UUID via
`BytesRaw*Evaluator` + `DataType.UUID`
- Group key generators — `NoDictionarySingleColumnGroupKeyGenerator`,
`NoDictionaryMultiColumnGroupKeyGenerator` updated
- MSQE hot-path: `UuidToIdMap`, `OneUuidKeyGroupIdGenerator`,
`UuidLookupTable`
- `RequestContextUtils.evaluateLiteralValue` — supports
CAST/TO_UUID/UUID_TO_BYTES/BYTES_TO_UUID/UUID_TO_STRING/IS_UUID on predicate
RHS literals
- `CastTransformFunction` — CAST(col AS UUID) per-row support
- Scalar functions: `IS_UUID`, `TO_UUID`, `UUID_TO_BYTES`, `BYTES_TO_UUID`,
`UUID_TO_STRING`
- `ValueBasedSegmentPruner` — UUID bloom filter checked with
`DataType.toString()` for canonical UUID strings
### gRPC / response encoding (bug fixes)
- `JsonResponseEncoder.extractValue` — added `case UUID` to avoid
`IllegalArgumentException` when decoding gRPC responses with UUID columns.
**Root cause of `UuidUpsertRealtimeTest` failure.**
- `ArrowResponseEncoder` — full UUID support via `getVarCharValue()` helper;
avoids `ClassCastException` in Arrow encoding path.
- `ServerPlanRequestUtils.computeInOperands` — emits canonical UUID string
literals (not raw bytes) for UUID columns in MSQE join dynamic filters.
### Bloom filter (bug fix)
- `BloomFilterHandler` — UUID columns are now handled correctly in both
rebuild paths:
- Dictionary path: calls `UuidUtils.toString(dictionary.getBytesValue(i))`
instead of `dictionary.getStringValue(i)` (which returns hex encoding for
byte-backed dicts).
- Non-dictionary path: adds explicit `case UUID` before `case BYTES` to
avoid `IllegalStateException` on segment reload.
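A small sketch of why the dictionary path needs the explicit conversion: the hex encoding of the 16 bytes and the canonical UUID string are different keys, so a filter built from one cannot answer lookups made with the other. `BloomKeySketch` and its methods are hypothetical helpers written for this illustration, not Pinot classes.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical helper for illustration; not part of Pinot.
final class BloomKeySketch {
  private BloomKeySketch() {
  }

  /** Plain lowercase hex of the raw bytes, standing in for a byte-backed string encoding. */
  static String toHex(byte[] bytes) {
    StringBuilder sb = new StringBuilder(bytes.length * 2);
    for (byte b : bytes) {
      sb.append(String.format("%02x", b & 0xff));
    }
    return sb.toString();
  }

  /** Canonical 8-4-4-4-12 lowercase form of the same 16 bytes. */
  static String toCanonical(byte[] bytes) {
    if (bytes.length != 16) {
      throw new IllegalArgumentException("Expected 16 bytes, got " + bytes.length);
    }
    ByteBuffer buffer = ByteBuffer.wrap(bytes);
    return new UUID(buffer.getLong(), buffer.getLong()).toString();
  }

  /** Encodes a UUID as 16 big-endian bytes so the two string forms can be compared. */
  static byte[] toBytes(UUID uuid) {
    return ByteBuffer.allocate(16)
        .putLong(uuid.getMostSignificantBits())
        .putLong(uuid.getLeastSignificantBits())
        .array();
  }
}
```

For the UUID `123e4567-e89b-12d3-a456-426614174000`, `toCanonical` returns the same dashed string, whereas `toHex` returns `123e4567e89b12d3a456426614174000`. A filter keyed on the hex form would therefore report a miss for a probe made with the canonical form.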
### Segment / realtime
- `MutableSegmentImpl` — UUID primary key normalization for upsert
- `MutableNoDictColumnStatistics.isSorted()` — uses `UuidUtils.compare`
(unsigned) instead of `ByteArray.compare` (signed) for UUID
- `BloomFilterCreator` — UUID stored as canonical string
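The signed-versus-unsigned distinction in the `isSorted()` item can be sketched with JDK calls alone. Here `Arrays.compare(byte[], byte[])` stands in for a signed comparison and `Arrays.compareUnsigned` for an unsigned one (both require Java 9+); `SortedCheckSketch` is a hypothetical class and does not mirror `MutableNoDictColumnStatistics`.

```java
import java.util.Arrays;

// Hypothetical illustration; Pinot's column statistics classes are not used here.
final class SortedCheckSketch {
  private SortedCheckSketch() {
  }

  /** Checks ascending order using signed byte comparison (0x80 sorts before 0x7f). */
  static boolean isSortedSigned(byte[][] values) {
    for (int i = 1; i < values.length; i++) {
      if (Arrays.compare(values[i - 1], values[i]) > 0) {
        return false;
      }
    }
    return true;
  }

  /** Checks ascending order using unsigned byte comparison (0x7f sorts before 0x80). */
  static boolean isSortedUnsigned(byte[][] values) {
    for (int i = 1; i < values.length; i++) {
      if (Arrays.compareUnsigned(values[i - 1], values[i]) > 0) {
        return false;
      }
    }
    return true;
  }

  /** Builds a 16-byte value whose first byte is set and the rest are zero. */
  static byte[] withLeadingByte(int first) {
    byte[] bytes = new byte[16];
    bytes[0] = (byte) first;
    return bytes;
  }
}
```

With the input `{withLeadingByte(0x7f), withLeadingByte(0x80)}`, `isSortedUnsigned` returns `true` and `isSortedSigned` returns `false`, so the same column can be classified as sorted or unsorted depending on which comparator is used.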
### Input format
- Avro plugin — `logicalType: "uuid"` Avro schema field → UUID DataType
### Tests
- `UuidTypeTest`, `UuidTypeRealtimeTest` — full integration tests for batch
and realtime
- `UuidUpsertRealtimeTest` — upsert with UUID primary key
- Unit tests for UuidUtils, DataSchema, PinotDataType, RequestContextUtils,
response encoders, Avro utils, predicate evaluators
## Test plan
- [ ] `UuidTypeTest` (batch): predicate pushdown, group-by, CAST, ORDER BY
- [ ] `UuidTypeRealtimeTest`: same for realtime segment
- [ ] `UuidUpsertRealtimeTest`: upsert/dedup with UUID primary key (verifies
gRPC fix)
- [ ] `UuidUtilsTest`, `DataSchemaTest`, `PinotDataTypeTest`: unit coverage
- [ ] `ArrowResponseEncoderTest`, `JsonResponseEncoderTest`: encoding
round-trips
- [ ] `RequestContextUtilsTest`: literal evaluation for UUID functions
🤖 Generated with [Claude Code](https://claude.com/claude-code)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]