hongkunxu opened a new pull request, #18564:
URL: https://github.com/apache/pinot/pull/18564

   ## Motivation
   
   Today, "is this a materialized view (MV) table?" requires scattered 
heuristics:
   
   - scanning `tableConfig.task` for a `MaterializedViewTask` entry,
   - fetching the MV definition znode from ZK,
   - or parsing `definedSQL` with `CalciteSqlParser`.
   
   This has three concrete problems:
   
   1. **No single source of truth.** Different call sites use different checks; 
they can disagree and tend to drift apart over time.
   2. **Identity is observable only after the scheduler has run.** The MV 
definition znode is created lazily by the scheduler on the first task run, not 
synchronously at table create — so any caller that asks "is MV?" between create 
and the first scheduler tick gets the wrong answer.
   3. **Task config doubles as identity.** `MaterializedViewTask` (and 
`definedSQL`) describe *execution*, not *identity*. Using them for identity 
means the create/update path cannot reject malformed combinations early, and 
there is no place to anchor backward-compatibility invariants like 
"`definedSQL` is immutable".
   
   ## What this PR does
   
   Introduce `isMaterializedView` on `TableConfig` as the **single source of 
truth** for MV identity, modeled on `isDimTable`. Every code path that needs to 
know "is this an MV table?" now reads exactly one boolean.
   
   ## Resulting layering
   
   | Layer | Responsibility | Owns |
   |-------|----------------|------|
   | **`pinot-spi` — `TableConfig.isMaterializedView`** | MV identity 
(declarative). | `IS_MATERIALIZED_VIEW_KEY` field, getter, 
`TableConfigBuilder.setIsMaterializedView`, ZN serde, DDL property mapping. |
   | **`pinot-segment-local` — `TableConfigUtils`** | Structural invariants. | 
`validateMaterializedViewInvariants` (flag ⇒ `OFFLINE` + `MaterializedViewTask` 
+ non-empty `definedSQL`; task without flag rejected; task without `definedSQL` 
rejected; cannot also be `isDimTable`). `validateBackwardCompatibility` — flag 
is immutable; `definedSQL` is immutable after creation; `MaterializedViewTask` 
cannot be removed from an MV table. |
   | **`pinot-materialized-view` — `MaterializedViewAnalyzer` / `Scheduler`** | 
Semantic validation + task generation. | `analyze()` requires the flag; 
`MaterializedViewTaskScheduler.generateTasks` skips non-MV tables. |
   | **`pinot-controller` — `PinotHelixResourceManager`** | Wire MV into 
cluster lifecycle. | Both 
`notifyMaterializedViewConsistencyManagerForTableCreate` and `…ForTableDrop` 
gated on the flag; the drop path no longer falls back to task-config scan or 
`CalciteSqlParser` to infer identity. |
   | **ZK `Definition` / `Runtime` znodes** | Runtime view of the MV instance. 
| Unchanged — still drive `watermarkMs`, partition VALID/STALE, etc. Identity 
and runtime are now cleanly separated. |
   
   Each layer depends only on the one above, with no circular reads. 
`MaterializedViewTask` keeps its original role: carrier of execution parameters 
(`definedSQL`, `bucketTimePeriod`, …), not identity.
   
   ## Benefits
   
   - **One bit answers identity** — no more scanning task configs, fetching 
znodes, or parsing SQL just to ask "is this an MV?". Identity is now correct 
from the moment the table config is written, not from the first scheduler tick.
   - **Fail-fast on create/update** — malformed combinations (e.g. 
`MaterializedViewTask` without the flag, flag without `OFFLINE`, flag together 
with `isDimTable`) are rejected at the validation gate, instead of producing a 
half-configured MV that the scheduler / consistency manager later trips over.
   - **Immutability has a place to live** — `definedSQL` immutability and 
"can't drop `MaterializedViewTask` from an MV table" are now enforced by 
`validateBackwardCompatibility`, aligned with 
`pinot-materialized-view/DESIGN.md`.
   - **Clean module boundaries** — `pinot-spi` owns the flag, 
`pinot-segment-local` owns the structural rules, `pinot-materialized-view` owns 
the semantics, `pinot-controller` only consumes the flag. `pinot-segment-local` 
does **not** depend on `pinot-materialized-view`.
   - **Easier to extend** — future MV-aware code (broker rewrite, REST 
listings, UI filters, etc.) can rely on a single declarative bit instead of 
replicating heuristics.
   
   ## Tests
   
   - `MaterializedViewAnalyzerTest`: every MV `TableConfigBuilder` now calls 
`.setIsMaterializedView(true)`; new test asserts `analyze()` fails when the 
flag is missing. 50/50 pass.
   - `TableConfigUtilsTest`: new tests cover the four 
`validateMaterializedViewInvariants` branches and the backward-compat rules 
(`definedSQL` immutable, flag cannot change). Existing positional 
`TableConfig(...)` call sites updated for the new parameter.
   - Checkstyle clean on `pinot-spi`, `pinot-common`, `pinot-segment-local`, 
`pinot-materialized-view`, `pinot-sql-ddl`, `pinot-controller`.
   - `pinot-connectors/pinot-spark-3-connector` recompiled against the new 
constructor signature.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to