anshul98ks123 opened a new pull request, #18317: URL: https://github.com/apache/pinot/pull/18317
### Issue(s) Follow-up to [apache/pinot#17558 -- "Slim JSON serialization for Schema and TableConfigs"](https://github.com/apache/pinot/pull/17558). The previous PR fixed slim serialization for `Schema` and `TableConfigs`, but left every `*IndexConfig` POJO using Jackson's default bean serializer. As a result, slim user-supplied index configs (e.g. `{"encodingType":"RAW"}`, `{"fpp":0.5}`) get *fattened* with default values on the first round-trip through any Pinot `ObjectMapper` (controller REST handler, ZooKeeper writer, proxy, broker preview path). This makes it impossible to keep ZK storage minimal and produces noisy UI diffs against freshly-edited table configs. The user-visible repro -- `connection__cell__timeouts` -- showed a `forward` block ballooning from `{"compressionCodec":"SNAPPY"}` to a six-key blob the moment it touched the controller. This PR rebrands the entire `*IndexConfig` serialization surface as **slim-by-default**, so any new index POJO follows the same single rule (emit only fields that differ from the class default) without further per-class plumbing. --- ### Description This PR adds curated `@JsonValue toJsonObject()` slim serializers to **12 `*IndexConfig` classes** spanning `pinot-spi` and `pinot-segment-spi`, and updates the small set of in-tree tests that previously pinned the fat wire shape. It is wire-format-narrowing only (no read changes), but it is submitted as a single PR so all `*IndexConfig` POJOs flip to the slim shape atomically -- mixing slim and fat serializers across configs would defeat the whole point. **Background -- The `*IndexConfig` Serialization Surface (Before):** | Layer | Shape (before) | Shape (after) | |---|---|---| | Default bean serializer | every getter emitted -- `disabled`, primitive-`0`, `null`, empty `[]`, every Lucene default | `@JsonValue toJsonObject()` -- only fields whose value differs from the class default | | `IndexConfig` (base) | `{"disabled":false}` | `{}` (only `{"disabled":true}` if disabled) | | `BloomFilterConfig` | `{"disabled":false,"fpp":0.05,"maxSizeInBytes":0,"loadOnHeap":false}` | `{}` | | `JsonIndexConfig` | `{"disabled":false,"maxLevels":-1,"excludeArray":false,...}` | `{}` (and slim form is now idempotent under round-trip) | | `ForwardIndexConfig` | static cluster-tunable defaults baked into JSON (`rawIndexWriterVersion=4`, `targetMaxChunkSize=1MB`, `targetDocsPerChunk=1024`) | omitted -- compared against the *live* default at call time | | `TextIndexConfig` (18 Lucene fields) | every default emitted, including `"luceneAnalyzerClassArgs":[]` (which itself broke deserialization -- see `testEmptyTextConfigRoundTrip`) | only non-default fields; empty-list defaults are omitted entirely | | `H3IndexConfig` | `{"disabled":false,"resolution":[5,6,13]}` | `{"resolution":[5,6,13]}` | | `VectorIndexConfig` | `{"disabled":false,"vectorDimension":0,"version":0}` | `{}` | | `MultiColumnTextIndexConfig` / `StarTreeIndexConfig` | full bean shape (`BaseJsonConfig` subclasses, no `disabled`) | only required + non-default fields | --- ### The Problem 1. **Slim input -> fat output on first round-trip.** A user posts `{"forward":{"compressionCodec":"SNAPPY"}}` to `inferIndexes` / `applyIndexes`. The controller deserializes via the existing `@JsonCreator` (which fills primitives + cluster-tunable statics with defaults), then re-serializes with the *default bean serializer*, producing the six-key fat blob. The slim shape is silently lost the moment any code path round-trips the POJO. 2. **Cluster-tunable static defaults leak into ZK.** `ForwardIndexConfig._rawIndexWriterVersion` / `_targetMaxChunkSize` / `_targetDocsPerChunk` are mutable statics. Pre-fix, the *value at write time* was baked into the ZK znode; post-fix, the slim form omits the field iff it equals the *live* default at that instant -- so retuning the cluster default does not require rewriting every znode. 3. **Primitive-`int` defaults are indistinguishable from "absent".** `VectorIndexConfig`'s `@JsonCreator` takes `int vectorDimension` / `int version`; Jackson fills missing values with `0`. Pre-fix, the bean serializer always emitted `"vectorDimension":0,"version":0`. Pre-fix `*Test` classes pinned that exact string. 4. **`JsonIndexConfig.maxLevels` was non-idempotent.** Field initializer is `-1` (meaning "unlimited"), `@JsonCreator` parameter is primitive `int` (default `0`). The pre-fix serializer omitted `maxLevels` only on `-1`, so deserializing slim JSON -> POJO -> re-serializing produced `{"maxLevels":0,...}` -- a strictly different wire shape than the slim input. Functionally `0` and `-1` both mean "no limit" (`JsonUtils.flatten()` only consults `maxLevels > 0`), so this was a serializer bug, not a config bug. 5. **In-tree tests pinned the fat shape.** `H3IndexConfigTest.serde`, `VectorConfigTest.serde`, `TextIndexConfigTest.testEmptyTextConfigRoundTrip` (in `pinot-segment-spi`), and `H3IndexTest`, `VectorIndexTest`, `RangeIndexTest`, `TextIndexTest` (`testConvertToUpdatedFormat` / `oldToNewConfConversion` in `pinot-segment-local`) all asserted on the exact fat string -- so the slim flip required updating these inline. --- ### Solution **1. `@JsonValue toJsonObject()` on every `*IndexConfig`** Each class gets one curated method that builds a fresh `ObjectNode`, calls `super.toJsonObject()` first to inherit `disabled` handling (when subclassing `IndexConfig`), and emits a field only when its value differs from the class default. Jackson sees the `@JsonValue` annotation and *skips* default bean introspection entirely -- the method is the sole source of truth. ```java // IndexConfig (base) -- emits only the disabled=true case @JsonValue public ObjectNode toJsonObject() { ObjectNode node = JsonUtils.newObjectNode(); if (_disabled) { node.put("disabled", true); } return node; } // ForwardIndexConfig -- reads cluster-tunable statics at call time @Override @JsonValue public ObjectNode toJsonObject() { ObjectNode node = super.toJsonObject(); if (_compressionCodec != null) { node.put("compressionCodec", _compressionCodec.name()); } if (_deriveNumDocsPerChunk) { node.put("deriveNumDocsPerChunk", true); } // Read cluster-tunable statics at call time so the serialized form tracks the live default. if (_rawIndexWriterVersion != _defaultRawIndexWriterVersion) { node.put("rawIndexWriterVersion", _rawIndexWriterVersion); } // ... return node; } ``` **2. `JsonIndexConfig.maxLevels` idempotency fix** The pre-fix omission rule (`_maxLevels != -1`) was tightened to `_maxLevels > 0`, treating both `-1` (field-initializer default) and `0` (`@JsonCreator` primitive default) as "default unlimited". This makes the slim form idempotent: `serialize -> deserialize -> serialize` produces a byte-identical result. ```java // Before -- non-idempotent: deserializing slim JSON makes _maxLevels=0, which then re-serializes if (_maxLevels != -1) { node.put("maxLevels", _maxLevels); } // After -- both -1 and 0 functionally mean "unlimited"; JsonUtils.flatten() only consults maxLevels > 0 if (_maxLevels > 0) { node.put("maxLevels", _maxLevels); } ``` **3. Two standalone serializers for `BaseJsonConfig` subclasses** `MultiColumnTextIndexConfig` and `StarTreeIndexConfig` extend `BaseJsonConfig` directly (not `IndexConfig`), so they do not call `super.toJsonObject()` -- they start with a fresh `ObjectNode` and emit required fields plus any non-default optionals. **4. Preserved fat-form-tolerant `@JsonCreator`s + regression tests** The wire format is strictly narrower, but the *read* path is unchanged: every existing `@JsonCreator` still accepts the old fat JSON. Pre-fix and post-fix payloads both deserialize identically, which keeps mixed-version clusters safe (pre-fix parsers already accepted slim payloads -- the preview pipeline has always emitted slim). To pin this contract: - `IndexConfigSerializationTest` + 11 sibling `*SerializationTest` classes lock the slim wire shape per class via a standardized nine-test matrix (defaults -> empty JSON, each non-default field round-trips, fat JSON still deserializes, `@JsonValue` is what Jackson actually invokes, etc.). - `IndexConfigSlimRoundTripTest` (cross-cutting, in `pinot-segment-local`) pushes a slim JSON through `FieldConfig.indexes` on a real `TableConfig`, resolves via `FieldIndexConfigsUtil` (the same path the broker / server use), re-serializes the resolved POJO, and asserts byte-equivalence with the original input -- this catches any wiring regression between the new serializers and `IndexType.getConfig(...)`. **5. Backward compatibility** Strictly narrower wire format. Reads are unchanged -- existing fat and slim payloads both still deserialize through the unchanged `@JsonCreator`s. No ZK migration: old fat znodes keep working; the next write replaces them with slim znodes. Pre-fix `BaseJsonConfig.equals()` / `hashCode()` rely on `toJsonNode()`, which after this PR routes through `@JsonValue toJsonObject()` -- comparisons are now based on slim outputs, but two POJOs that differ only in default-valued fields were already equal pre-fix, so no behavior change. --- ### New Flow `*IndexConfig` write path after this PR: ``` slim JSON input --> @JsonCreator ctor --> POJO (defaults materialized) | +--------------------------+ | v @JsonValue toJsonObject() | v slim JSON (only non-default fields) --> ZooKeeper / REST response / proxy ``` Example -- the original `connection__cell__timeouts` repro: ``` # Before -- slim input gets fattened on first round-trip POST /tables/inferIndexes { "forward": { "compressionCodec": "SNAPPY" } } zk/get /CONFIGS/TABLE/... { "forward": { "disabled": false, "compressionCodec": "SNAPPY", "deriveNumDocsPerChunk": false, "rawIndexWriterVersion": 4, "targetMaxChunkSize": "1M", "targetDocsPerChunk": 1024 } } # After -- slim input survives the round-trip intact POST /tables/inferIndexes { "forward": { "compressionCodec": "SNAPPY" } } zk/get /CONFIGS/TABLE/... { "forward": { "compressionCodec": "SNAPPY" } } ``` --- ### Testing - [x] `checkstyle:check` clean on `pinot-spi`, `pinot-segment-spi`, `pinot-segment-local` -- 0 violations - [x] `license:check` clean on the same three modules - [x] `mvn test` green: `pinot-spi` 604/604, `pinot-segment-spi` 342/342, `pinot-common` build success, `pinot-segment-local` 4270/4284 (the 14 unrelated failures are pre-existing JDK 21 / `datasketches.WritableMemory` initialization errors in `*SketchValueAggregatorTest` classes that we did not touch) - [x] 12 new `*SerializationTest` classes (5 in `pinot-spi`, 7 in `pinot-segment-spi`) implementing the standardized nine-test matrix per class - [x] `IndexConfigSlimRoundTripTest` -- 14 cross-cutting tests covering 9 index types (forward, dictionary, range, json, bloom, text, h3, plus universal `disabled=true`) via the real `IndexLoadingConfig` resolution path - [x] `JsonIndexConfigSerializationTest.testEachNonDefaultFieldRoundTrips` -- pins the `maxLevels` idempotency fix - [x] `ForwardIndexConfigSerializationTest` + `IndexConfigSlimRoundTripTest.slimForwardConfigDoesNotEmitClusterTunableDefaults` -- pin the cluster-tunable-static behavior - [x] Pre-existing pinned-fat assertions updated inline with slim shape: `H3IndexConfigTest.serde`, `VectorConfigTest.serde`, `TextIndexConfigTest.testEmptyTextConfigRoundTrip`, `H3IndexTest.testConvertToUpdatedFormat`, `VectorIndexTest.testConvertToUpdatedFormat`, `RangeIndexTest.oldToNewConfConversion`, `TextIndexTest.oldToNewConfConversion` - [x] Manual repro of `connection__cell__timeouts`: `POST /tables/inferIndexes` with slim forward config -> response `forward` block equals slim input; `PUT /tableConfigs/{name}` -> `zk/get` znode is slim -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
