findinpath commented on code in PR #11041:
URL: https://github.com/apache/iceberg/pull/11041#discussion_r2872952820
##########
format/view-spec.md:
##########
@@ -322,3 +453,96 @@
s3://bucket/warehouse/default.db/event_agg/metadata/00002-(uuid).metadata.json
} ]
}
```
+
+### Materialized View Example
+
+Imagine the following operation, which creates a materialized view that precomputes daily event counts:
+
+```sql
+USE prod.default
+```
+```sql
+CREATE MATERIALIZED VIEW event_agg_mv (
+ event_count COMMENT 'Count of events',
+ event_date)
+COMMENT 'Precomputed daily event counts'
+AS
+SELECT
+ COUNT(1), CAST(event_ts AS DATE)
+FROM events
+GROUP BY 2
+```
+
+The materialized view metadata JSON file looks as follows:
+
+```
+s3://bucket/warehouse/default.db/event_agg_mv/metadata/00001-(uuid).metadata.json
+```
+```
+{
+ "view-uuid": "b2a12651-3038-4a72-8a31-5027ab84da35",
+ "format-version" : 1,
+ "location" : "s3://bucket/warehouse/default.db/event_agg_mv",
+ "current-version-id" : 1,
+ "properties" : {
+ "comment" : "Precomputed daily event counts"
+ },
+ "versions" : [ {
+ "version-id" : 1,
+ "timestamp-ms" : 1573518431292,
+ "schema-id" : 1,
+ "default-catalog" : "prod",
+ "default-namespace" : [ "default" ],
+ "summary" : {
+ "engine-name" : "Spark",
+ "engine-version" : "3.4.1"
+ },
+ "representations" : [ {
+ "type" : "sql",
+ "sql" : "SELECT\n COUNT(1), CAST(event_ts AS DATE)\nFROM events\nGROUP BY 2",
+ "dialect" : "spark"
+ } ],
+ "storage-table" : {
+ "namespace" : [ "default" ],
+ "name" : "event_agg_mv__storage"
+ }
+ } ],
+ "schemas": [ {
+ "schema-id": 1,
+ "type" : "struct",
+ "fields" : [ {
+ "id" : 1,
+ "name" : "event_count",
+ "required" : false,
+ "type" : "int",
+ "doc" : "Count of events"
+ }, {
+ "id" : 2,
+ "name" : "event_date",
+ "required" : false,
+ "type" : "date"
+ } ]
+ } ],
+ "version-log" : [ {
+ "timestamp-ms" : 1573518431292,
+ "version-id" : 1
+ } ]
+}
+```
+
+After a refresh operation, the storage table's snapshot summary contains the `refresh-state` property.
Review Comment:
With regard to repeated "refresh" operations, is it worth adding a note to
the spec about expiring snapshots of the storage table?
Is the expiration of previous snapshots a query engine detail, or is there an
opportunity for the spec to say more about how it happens?
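
To make the question concrete, here is a rough sketch of how I would expect
this to be handled today if it is left to the engine. It assumes the storage
table from the example (`default.event_agg_mv__storage`) can be addressed like
any other Iceberg table in the `prod` catalog, and that the Iceberg Spark
`expire_snapshots` procedure is used; the timestamp and `retain_last` value are
placeholders, not a recommendation:

```sql
-- Assumption: the storage table is maintained like a regular Iceberg table.
-- Expire storage-table snapshots left behind by earlier refreshes,
-- always retaining at least the most recent snapshot.
CALL prod.system.expire_snapshots(
  table => 'default.event_agg_mv__storage',
  older_than => TIMESTAMP '2024-01-01 00:00:00.000',
  retain_last => 1
)
```

If that is the intended path, a one-line note in the spec saying so might be
enough.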
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]