hongkunxu commented on issue #17298: URL: https://github.com/apache/pinot/issues/17298#issuecomment-4535005627
Hi @chrajeshbabu Thanks for raising this PEP — the goals here line up almost exactly with the MV work we have been landing over the last weeks. Quick status from our side: **Already merged / in review:** | Area | PR | Status | | --- | --- | --- | | MV creation, metadata model, ingestion (Minion task framework, ZK-persisted definition + runtime, partition fingerprinting, APPEND/OVERWRITE/DELETE lifecycle) | [#18528](https://github.com/apache/pinot/pull/18528) | Merged | | Calcite-based query rewrite (subsumption strategies, broker rewrite engine, hybrid MV+base execution, per-MV `rewriteEnabled` / staleness SLO gates) | [#18529](https://github.com/apache/pinot/pull/18529) | In review | | `CREATE / DROP / ALTER MATERIALIZED VIEW` DDL | [#18544](https://github.com/apache/pinot/pull/18544) | In review | | MV management UI + Data Sources hub | [#18537](https://github.com/apache/pinot/pull/18537) | In review | **Mapping to the PEP asks:** - **MV Creation** — done, with one intentional design choice: Pinot builds and incrementally refreshes the MV itself via a Minion task driven by a `definedSQL`, rather than requiring users to externally produce and ingest MV tables. That removes the "users must refresh MV tables when underlying data changes" con listed in the PEP — refresh is event-driven (controller marks partitions STALE on base-table segment changes; minion picks them up). - **Metadata Mapping** — done. MV definition (including the original SQL, base tables, bucket spec, rewrite/staleness flags) is persisted under `/CONFIGS/MATERIALIZED_VIEW/DEFINITION`; runtime coverage under `/CONFIGS/MATERIALIZED_VIEW/RUNTIME`. Base-table delete is blocked while dependent MVs exist. - **Query Rewrite** — done in #18529. Broker-side Calcite rewrite picks the cheapest covering MV per query, with a staleness SLO gate and a per-MV `rewriteEnabled` switch; falls back transparently to the base table for uncovered ranges. - **Star-tree on MV tables** — naturally supported. MV tables are regular Pinot OFFLINE tables, so all existing index types (star-tree, sorted, inverted, etc.) work as-is. The "smaller table → cheaper star-tree" benefit the PEP describes comes for free. **Observed benefits:** - **Drastically reduced scan / IO volume** — wide-column base tables get pre-aggregated and projected down to only the columns and grain a query family actually needs, so a rewritten query reads orders of magnitude fewer segments, columns, and rows than the original. - **Order-of-magnitude lower aggregation latency on large tables** — group-by / sum / count / distinct-count workloads that previously had to scan billions of base rows now hit a handful of pre-aggregated MV rows. In our internal benchmarks, ad-hoc aggregation latency on wide tables drops from multi-second to sub-second, while p99 stays stable under concurrency because the MV path neither fans out across the full base segment set nor competes for the same scan budget. - **Compounding effect with star-tree** — because MV tables are small and dense, layering a star-tree on top stays cheap on disk, so the two optimizations stack instead of trading off. - **Transparent to clients** — applications keep issuing queries against the base table; the broker rewriter routes them to the MV automatically and falls back to base for uncovered ranges, so the latency win requires zero query-side change. **Scope of phase 1:** offline base tables only, exactly as the PEP suggests. REALTIME source tables are explicitly rejected at MV-create time today; LLC-commit notify wiring will come in a follow-up. Happy to consolidate the PEP around this implementation rather than carry two parallel proposals — let me know if it would help to walk through the design doc / quickstart on a call. cc @xiangfu0 @Jackie-Jiang -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
