xiangfu0 commented on PR #18869: URL: https://github.com/apache/pinot/pull/18869#issuecomment-4860707548
Rebased onto latest `master` and addressed all review feedback. Summary of changes: ### @Jackie-Jiang - **`UuidUtils` javadoc → markdown (`///`)** — converted throughout `UuidUtils` and the extracted `UuidKey`. - **Private constructor next to class declaration** — moved to the top of `UuidUtils`. - **Remove per-value `null` checks** — removed all `validateNotNull(...)` calls and the helper; callers must pass non-null (documented on the class). Kept only the 16-byte length validation and the `isUuid(...)` `null → false` contract. - **`toBytes(byte[])` defensive copy** — removed; it now validates the width and returns the caller-owned buffer as-is. - **Move `UuidKey` to a dedicated class** — extracted to `UuidKey.java`. - **`FieldSpec.getDefaultNullValue()` copy** — removed; returns the stored default directly (test updated to `assertSame`). - **Enum ordinal-append hack** — `UUID` is now placed **right after its stored type `BYTES`** (not at the end), and the fragile "must stay last / AnyValueAggregationFunction" comment is gone. Making `AnyValueAggregationFunction` self-contained is left as the separate follow-up you suggested. - **`case UUID` after `case BYTES`** — reordered in `convert()`/`convertInternal()` (the other switches already had this order). - **`getDefaultNullValueString` / `getStringDefaultNullValue` "why the check / why specialize UUID"** — the UUID branch is required for correctness: `byte[] → hex` does not re-parse, so UUID must stringify to its canonical `8-4-4-4-12` form to round-trip through `getDefaultNullValue(...)`. For every non-UUID type `_dataType.toString(v) == getStringValue(v)`, so behavior is unchanged elsewhere. I folded the `instanceof String` branch into `getStringDefaultNullValue` so `setDefaultNullValue` is simpler. - **`toBytes(String)` canonical check / length check as hotspots** — kept the validated methods as the ingestion boundary; the hot compute path goes through `toUUID(long,long)` / `UuidKey.fromLongs(...)`, which skip validation entirely. Happy to add an explicit unchecked variant if a profiled call site needs one, but I left it out for now to avoid an unused API. ### @gortiz - **`StringFunctions.fromUUIDBytes` back-compat** — restored the null-on-exception behavior (try/catch → `null`) and corrected the Javadoc. - **Time-based helper tests** — added `UuidUtilsTest` coverage: `getVersion` returns 4/7 for `randomV4`/`randomV7`, `getTimestampMillis(randomV7())` is within `[before, after]` of `System.currentTimeMillis()`, a fixed-layout v7 assertion, and `getTimestampMillis` throws for a v4 UUID. - **Description bullet** — fixed: MV UUID is **allowed** (validation is consistent with the rest of the stack), not rejected. - **Nits** — `///` javadoc done; the `TableIndexingTest` comment no longer names tests that only exist in later PRs. ### @Copilot - **`PinotDataType.STRING.toUUID` trim** — now trims (matches the other STRING→primitive conversions and `UUID.convert(..., STRING)`). - **MV UUID validation** — intentional (see description fix above); behavior kept. - **`fromUUIDBytes` / version+timestamp coverage** — covered above. CI is green except the **Binary Compatibility Check**, which flags the additive `DataType.UUID` enum value and the `UuidUtils` API changes — expected for a new logical type. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
