hongkunxu commented on issue #17298:
URL: https://github.com/apache/pinot/issues/17298#issuecomment-4535005627

   Hi @chrajeshbabu 
   
   Thanks for raising this PEP — the goals here line up almost exactly with the 
MV work we have been landing over the last weeks. Quick status from our side:
   
   **Already merged / in review:**
   
   | Area | PR | Status |
   | --- | --- | --- |
   | MV creation, metadata model, ingestion (Minion task framework, 
ZK-persisted definition + runtime, partition fingerprinting, 
APPEND/OVERWRITE/DELETE lifecycle) | 
[#18528](https://github.com/apache/pinot/pull/18528) | Merged |
   | Calcite-based query rewrite (subsumption strategies, broker rewrite 
engine, hybrid MV+base execution, per-MV `rewriteEnabled` / staleness SLO 
gates) | [#18529](https://github.com/apache/pinot/pull/18529) | In review |
   | `CREATE / DROP / ALTER MATERIALIZED VIEW` DDL | 
[#18544](https://github.com/apache/pinot/pull/18544) | In review |
   | MV management UI + Data Sources hub | 
[#18537](https://github.com/apache/pinot/pull/18537) | In review |
   
   **Mapping to the PEP asks:**
   
   - **MV Creation** — done, with one intentional design choice: Pinot builds 
and incrementally refreshes the MV itself via a Minion task driven by a 
`definedSQL`, rather than requiring users to externally produce and ingest MV 
tables. That removes the "users must refresh MV tables when underlying data 
changes" con listed in the PEP — refresh is event-driven (controller marks 
partitions STALE on base-table segment changes; minion picks them up).
   - **Metadata Mapping** — done. MV definition (including the original SQL, 
base tables, bucket spec, rewrite/staleness flags) is persisted under 
`/CONFIGS/MATERIALIZED_VIEW/DEFINITION`; runtime coverage under 
`/CONFIGS/MATERIALIZED_VIEW/RUNTIME`. Base-table delete is blocked while 
dependent MVs exist.
   - **Query Rewrite** — done in #18529. Broker-side Calcite rewrite picks the 
cheapest covering MV per query, with a staleness SLO gate and a per-MV 
`rewriteEnabled` switch; falls back transparently to the base table for 
uncovered ranges.
   - **Star-tree on MV tables** — naturally supported. MV tables are regular 
Pinot OFFLINE tables, so all existing index types (star-tree, sorted, inverted, 
etc.) work as-is. The "smaller table → cheaper star-tree" benefit the PEP 
describes comes for free.
   
   **Observed benefits:**
   
   - **Drastically reduced scan / IO volume** — wide-column base tables get 
pre-aggregated and projected down to only the columns and grain a query family 
actually needs, so a rewritten query reads orders of magnitude fewer segments, 
columns, and rows than the original.
   - **Order-of-magnitude lower aggregation latency on large tables** — 
group-by / sum / count / distinct-count workloads that previously had to scan 
billions of base rows now hit a handful of pre-aggregated MV rows. In our 
internal benchmarks, ad-hoc aggregation latency on wide tables drops from 
multi-second to sub-second, while p99 stays stable under concurrency because 
the MV path neither fans out across the full base segment set nor competes for 
the same scan budget.
   - **Compounding effect with star-tree** — because MV tables are small and 
dense, layering a star-tree on top stays cheap on disk, so the two 
optimizations stack instead of trading off.
   - **Transparent to clients** — applications keep issuing queries against the 
base table; the broker rewriter routes them to the MV automatically and falls 
back to base for uncovered ranges, so the latency win requires zero query-side 
change.
   
   **Scope of phase 1:** offline base tables only, exactly as the PEP suggests. 
REALTIME source tables are explicitly rejected at MV-create time today; 
LLC-commit notify wiring will come in a follow-up.
   
   Happy to consolidate the PEP around this implementation rather than carry 
two parallel proposals — let me know if it would help to walk through the 
design doc / quickstart on a call.
   
   cc @xiangfu0 @Jackie-Jiang 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to