anshul98ks123 opened a new pull request, #18317:
URL: https://github.com/apache/pinot/pull/18317

   ### Issue(s)
   
   Follow-up to [apache/pinot#17558 -- "Slim JSON serialization for Schema and 
TableConfigs"](https://github.com/apache/pinot/pull/17558).
   
   The previous PR fixed slim serialization for `Schema` and `TableConfigs`, 
but left every `*IndexConfig` POJO using Jackson's default bean serializer. As 
a result, slim user-supplied index configs (e.g. `{"encodingType":"RAW"}`, 
`{"fpp":0.5}`) get *fattened* with default values on the first round-trip 
through any Pinot `ObjectMapper` (controller REST handler, ZooKeeper writer, 
proxy, broker preview path). This makes it impossible to keep ZK storage 
minimal and produces noisy UI diffs against freshly-edited table configs. The 
user-visible repro -- `connection__cell__timeouts` -- showed a `forward` block 
ballooning from `{"compressionCodec":"SNAPPY"}` to a six-key blob the moment it 
touched the controller.
   
   This PR rebrands the entire `*IndexConfig` serialization surface as 
**slim-by-default**, so any new index POJO follows the same single rule (emit 
only fields that differ from the class default) without further per-class 
plumbing.
   
   ---
   
   ### Description
   
   This PR adds curated `@JsonValue toJsonObject()` slim serializers to **12 
`*IndexConfig` classes** spanning `pinot-spi` and `pinot-segment-spi`, and 
updates the small set of in-tree tests that previously pinned the fat wire 
shape. It is wire-format-narrowing only (no read changes), but it is submitted 
as a single PR so all `*IndexConfig` POJOs flip to the slim shape atomically -- 
mixing slim and fat serializers across configs would defeat the whole point.
   
   **Background -- The `*IndexConfig` Serialization Surface (Before):**
   
   | Layer | Shape (before) | Shape (after) |
   |---|---|---|
   | Default bean serializer | every getter emitted -- `disabled`, 
primitive-`0`, `null`, empty `[]`, every Lucene default | `@JsonValue 
toJsonObject()` -- only fields whose value differs from the class default |
   | `IndexConfig` (base) | `{"disabled":false}` | `{}` (only 
`{"disabled":true}` if disabled) |
   | `BloomFilterConfig` | 
`{"disabled":false,"fpp":0.05,"maxSizeInBytes":0,"loadOnHeap":false}` | `{}` |
   | `JsonIndexConfig` | 
`{"disabled":false,"maxLevels":-1,"excludeArray":false,...}` | `{}` (and slim 
form is now idempotent under round-trip) |
   | `ForwardIndexConfig` | static cluster-tunable defaults baked into JSON 
(`rawIndexWriterVersion=4`, `targetMaxChunkSize=1MB`, 
`targetDocsPerChunk=1024`) | omitted -- compared against the *live* default at 
call time |
   | `TextIndexConfig` (18 Lucene fields) | every default emitted, including 
`"luceneAnalyzerClassArgs":[]` (which itself broke deserialization -- see 
`testEmptyTextConfigRoundTrip`) | only non-default fields; empty-list defaults 
are omitted entirely |
   | `H3IndexConfig` | `{"disabled":false,"resolution":[5,6,13]}` | 
`{"resolution":[5,6,13]}` |
   | `VectorIndexConfig` | `{"disabled":false,"vectorDimension":0,"version":0}` 
| `{}` |
   | `MultiColumnTextIndexConfig` / `StarTreeIndexConfig` | full bean shape 
(`BaseJsonConfig` subclasses, no `disabled`) | only required + non-default 
fields |
   
   ---
   
   ### The Problem
   
   1. **Slim input -> fat output on first round-trip.** A user posts 
`{"forward":{"compressionCodec":"SNAPPY"}}` to `inferIndexes` / `applyIndexes`. 
The controller deserializes via the existing `@JsonCreator` (which fills 
primitives + cluster-tunable statics with defaults), then re-serializes with 
the *default bean serializer*, producing the six-key fat blob. The slim shape 
is silently lost the moment any code path round-trips the POJO.
   2. **Cluster-tunable static defaults leak into ZK.** 
`ForwardIndexConfig._rawIndexWriterVersion` / `_targetMaxChunkSize` / 
`_targetDocsPerChunk` are mutable statics. Pre-fix, the *value at write time* 
was baked into the ZK znode; post-fix, the slim form omits the field iff it 
equals the *live* default at that instant -- so retuning the cluster default 
does not require rewriting every znode.
   3. **Primitive-`int` defaults are indistinguishable from "absent".** 
`VectorIndexConfig`'s `@JsonCreator` takes `int vectorDimension` / `int 
version`; Jackson fills missing values with `0`. Pre-fix, the bean serializer 
always emitted `"vectorDimension":0,"version":0`. Pre-fix `*Test` classes 
pinned that exact string.
   4. **`JsonIndexConfig.maxLevels` was non-idempotent.** Field initializer is 
`-1` (meaning "unlimited"), `@JsonCreator` parameter is primitive `int` 
(default `0`). The pre-fix serializer omitted `maxLevels` only on `-1`, so 
deserializing slim JSON -> POJO -> re-serializing produced 
`{"maxLevels":0,...}` -- a strictly different wire shape than the slim input. 
Functionally `0` and `-1` both mean "no limit" (`JsonUtils.flatten()` only 
consults `maxLevels > 0`), so this was a serializer bug, not a config bug.
   5. **In-tree tests pinned the fat shape.** `H3IndexConfigTest.serde`, 
`VectorConfigTest.serde`, `TextIndexConfigTest.testEmptyTextConfigRoundTrip` 
(in `pinot-segment-spi`), and `H3IndexTest`, `VectorIndexTest`, 
`RangeIndexTest`, `TextIndexTest` (`testConvertToUpdatedFormat` / 
`oldToNewConfConversion` in `pinot-segment-local`) all asserted on the exact 
fat string -- so the slim flip required updating these inline.
   
   ---
   
   ### Solution
   
   **1. `@JsonValue toJsonObject()` on every `*IndexConfig`**
   
   Each class gets one curated method that builds a fresh `ObjectNode`, calls 
`super.toJsonObject()` first to inherit `disabled` handling (when subclassing 
`IndexConfig`), and emits a field only when its value differs from the class 
default. Jackson sees the `@JsonValue` annotation and *skips* default bean 
introspection entirely -- the method is the sole source of truth.
   
   ```java
   // IndexConfig (base) -- emits only the disabled=true case
   @JsonValue
   public ObjectNode toJsonObject() {
     ObjectNode node = JsonUtils.newObjectNode();
     if (_disabled) {
       node.put("disabled", true);
     }
     return node;
   }
   
   // ForwardIndexConfig -- reads cluster-tunable statics at call time
   @Override
   @JsonValue
   public ObjectNode toJsonObject() {
     ObjectNode node = super.toJsonObject();
     if (_compressionCodec != null) {
       node.put("compressionCodec", _compressionCodec.name());
     }
     if (_deriveNumDocsPerChunk) {
       node.put("deriveNumDocsPerChunk", true);
     }
     // Read cluster-tunable statics at call time so the serialized form tracks 
the live default.
     if (_rawIndexWriterVersion != _defaultRawIndexWriterVersion) {
       node.put("rawIndexWriterVersion", _rawIndexWriterVersion);
     }
     // ...
     return node;
   }
   ```
   
   **2. `JsonIndexConfig.maxLevels` idempotency fix**
   
   The pre-fix omission rule (`_maxLevels != -1`) was tightened to `_maxLevels 
> 0`, treating both `-1` (field-initializer default) and `0` (`@JsonCreator` 
primitive default) as "default unlimited". This makes the slim form idempotent: 
`serialize -> deserialize -> serialize` produces a byte-identical result.
   
   ```java
   // Before -- non-idempotent: deserializing slim JSON makes _maxLevels=0, 
which then re-serializes
   if (_maxLevels != -1) {
     node.put("maxLevels", _maxLevels);
   }
   
   // After -- both -1 and 0 functionally mean "unlimited"; JsonUtils.flatten() 
only consults maxLevels > 0
   if (_maxLevels > 0) {
     node.put("maxLevels", _maxLevels);
   }
   ```
   
   **3. Two standalone serializers for `BaseJsonConfig` subclasses**
   
   `MultiColumnTextIndexConfig` and `StarTreeIndexConfig` extend 
`BaseJsonConfig` directly (not `IndexConfig`), so they do not call 
`super.toJsonObject()` -- they start with a fresh `ObjectNode` and emit 
required fields plus any non-default optionals.
   
   **4. Preserved fat-form-tolerant `@JsonCreator`s + regression tests**
   
   The wire format is strictly narrower, but the *read* path is unchanged: 
every existing `@JsonCreator` still accepts the old fat JSON. Pre-fix and 
post-fix payloads both deserialize identically, which keeps mixed-version 
clusters safe (pre-fix parsers already accepted slim payloads -- the preview 
pipeline has always emitted slim).
   
   To pin this contract:
   - `IndexConfigSerializationTest` + 11 sibling `*SerializationTest` classes 
lock the slim wire shape per class via a standardized nine-test matrix 
(defaults -> empty JSON, each non-default field round-trips, fat JSON still 
deserializes, `@JsonValue` is what Jackson actually invokes, etc.).
   - `IndexConfigSlimRoundTripTest` (cross-cutting, in `pinot-segment-local`) 
pushes a slim JSON through `FieldConfig.indexes` on a real `TableConfig`, 
resolves via `FieldIndexConfigsUtil` (the same path the broker / server use), 
re-serializes the resolved POJO, and asserts byte-equivalence with the original 
input -- this catches any wiring regression between the new serializers and 
`IndexType.getConfig(...)`.
   
   **5. Backward compatibility**
   
   Strictly narrower wire format. Reads are unchanged -- existing fat and slim 
payloads both still deserialize through the unchanged `@JsonCreator`s. No ZK 
migration: old fat znodes keep working; the next write replaces them with slim 
znodes. Pre-fix `BaseJsonConfig.equals()` / `hashCode()` rely on 
`toJsonNode()`, which after this PR routes through `@JsonValue toJsonObject()` 
-- comparisons are now based on slim outputs, but two POJOs that differ only in 
default-valued fields were already equal pre-fix, so no behavior change.
   
   ---
   
   ### New Flow
   
   `*IndexConfig` write path after this PR:
   
   ```
   slim JSON input  -->  @JsonCreator ctor  -->  POJO (defaults materialized)
                                                          |
                               +--------------------------+
                               |
                               v
                        @JsonValue toJsonObject()
                               |
                               v
                 slim JSON (only non-default fields)  -->  ZooKeeper / REST 
response / proxy
   ```
   
   Example -- the original `connection__cell__timeouts` repro:
   
   ```
   # Before -- slim input gets fattened on first round-trip
   POST /tables/inferIndexes  { "forward": { "compressionCodec": "SNAPPY" } }
   zk/get /CONFIGS/TABLE/...   { "forward": {
                                   "disabled": false,
                                   "compressionCodec": "SNAPPY",
                                   "deriveNumDocsPerChunk": false,
                                   "rawIndexWriterVersion": 4,
                                   "targetMaxChunkSize": "1M",
                                   "targetDocsPerChunk": 1024
                                 } }
   
   # After -- slim input survives the round-trip intact
   POST /tables/inferIndexes  { "forward": { "compressionCodec": "SNAPPY" } }
   zk/get /CONFIGS/TABLE/...   { "forward": { "compressionCodec": "SNAPPY" } }
   ```
   
   ---
   
   ### Testing
   
   - [x] `checkstyle:check` clean on `pinot-spi`, `pinot-segment-spi`, 
`pinot-segment-local` -- 0 violations
   - [x] `license:check` clean on the same three modules
   - [x] `mvn test` green: `pinot-spi` 604/604, `pinot-segment-spi` 342/342, 
`pinot-common` build success, `pinot-segment-local` 4270/4284 (the 14 unrelated 
failures are pre-existing JDK 21 / `datasketches.WritableMemory` initialization 
errors in `*SketchValueAggregatorTest` classes that we did not touch)
   - [x] 12 new `*SerializationTest` classes (5 in `pinot-spi`, 7 in 
`pinot-segment-spi`) implementing the standardized nine-test matrix per class
   - [x] `IndexConfigSlimRoundTripTest` -- 14 cross-cutting tests covering 9 
index types (forward, dictionary, range, json, bloom, text, h3, plus universal 
`disabled=true`) via the real `IndexLoadingConfig` resolution path
   - [x] `JsonIndexConfigSerializationTest.testEachNonDefaultFieldRoundTrips` 
-- pins the `maxLevels` idempotency fix
   - [x] `ForwardIndexConfigSerializationTest` + 
`IndexConfigSlimRoundTripTest.slimForwardConfigDoesNotEmitClusterTunableDefaults`
 -- pin the cluster-tunable-static behavior
   - [x] Pre-existing pinned-fat assertions updated inline with slim shape: 
`H3IndexConfigTest.serde`, `VectorConfigTest.serde`, 
`TextIndexConfigTest.testEmptyTextConfigRoundTrip`, 
`H3IndexTest.testConvertToUpdatedFormat`, 
`VectorIndexTest.testConvertToUpdatedFormat`, 
`RangeIndexTest.oldToNewConfConversion`, `TextIndexTest.oldToNewConfConversion`
   - [x] Manual repro of `connection__cell__timeouts`: `POST 
/tables/inferIndexes` with slim forward config -> response `forward` block 
equals slim input; `PUT /tableConfigs/{name}` -> `zk/get` znode is slim
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to