Re: Materialized view integration with REST spec

Micah Kornfield Wed, 21 Feb 2024 10:59:03 -0800

>
> Of course we also need threads that express our preferences (voting). I
> would suggest to keep these separate from discussions about single points
> so that they can be persisted in the document.



Not sure if it helpful, but I added voting chips Question 0, as maybe an
easier way to keep track of votes.  If it is helpful, I can add them in
other places that still need a vote (I think one needs a paid Google Docs
account to insert them).

Thanks,
Micah

On Wed, Feb 21, 2024 at 10:23 AM Szehon Ho <szehon.apa...@gmail.com> wrote:

> Thanks Jan.  +1 on having just one thread per question for
> vote/preference.  Where do you suggest we have it, on the discussion
> question itself?  It would be to keep the existing threads and move it
> there.
>
> Also, I think it makes sense with making a slack channel (for quick
> question, reply) , and also discuss unresolved questions in the next week's
> sync or a separate meeting.
>
> On Wed, Feb 21, 2024 at 12:40 AM Jan Kaul <jank...@mailbox.org.invalid>
> wrote:
>
>> Thank you Jack for driving the consensus for the MV spec and thank you
>> all for the discussion.
>>
>> I really like the idea about incremental consensus because we often loose
>> sight in detailed discussions. As Jack mentioned, the highest priority
>> question currently is:
>>
>> *Should the Iceberg MV be realized as a view + storage table or do we
>> define a new metadata format? *To have one place for the discussion, I
>> created another Question (
>> https://docs.google.com/document/d/1UnhldHhe3Grz8JBngwXPA6ZZord1xMedY5ukEhZYF-A/edit?pli=1#heading=h.y70rtfhi9qxi)
>> to the Materialized View Spec google document.
>>
>> To improve the visibility of the arguments I would like to propose a new
>> process. It would be great if all relevant information is stored in the
>> document itself. Therefore I would suggest to use the comment threads for
>> smaller, temporary discussions which can be resolved by adding the points
>> to the main document. Please close the threads if the information was added
>> to the document. Additionally, I gave you all permissions to edit the
>> documents, so you can add missing points yourselves.
>>
>> Of course we also need threads that express our preferences (voting). I
>> would suggest to keep these separate from discussions about single points
>> so that they can be persisted in the document.
>>
>> After a phase of collecting arguments for the different designs I think
>> it would make sense to have video call to have a face to face discussion.
>>
>> What do you think?
>>
>> Best wishes,
>>
>> Jan
>> On 20.02.24 21:32, Manish Malhotra wrote:
>>
>> Very excited for MV to be in Iceberg :)
>> Keeping in the same doc. would be helpful, to have the trail.
>> But also agreed, if there are too many directions/threads, then keep
>> closing the old one, if there are no more questions.
>> And put down the assumptions for the initial version to move forward.
>>
>>
>> On Tue, Feb 20, 2024 at 12:17 PM Walaa Eldin Moustafa <
>> wa.moust...@gmail.com> wrote:
>>
>>> I would vote to keep a log in the doc with open questions, and keep the
>>> doc updated with open questions as they arise/get resolved.
>>>
>>> On Tue, Feb 20, 2024 at 11:37 AM Jack Ye <yezhao...@gmail.com> wrote:
>>>
>>>> Thanks for the response from everyone!
>>>>
>>>> Before proceeding further, I see a few people referring back to the
>>>> current design from Jan. I specifically raised this thread based on the
>>>> information in the doc and a few latest discussions we had there. Because
>>>> there are many threads in the doc, and each thread points further to other
>>>> discussion threads in the same doc or other doc, it is now quite hard to
>>>> follow and continue discussing all different topics there.
>>>>
>>>> I hope we can make incremental consensus of the questions in the doc
>>>> through devlist, because it provides more visibility, and also a single
>>>> thread instead of multiple threads going on at the same time. If we think
>>>> this format is not effective, I propose that we create a new mv channel in
>>>> Iceberg Slack workspace, and people interested can join and discuss all
>>>> these points directly. What do we think?
>>>>
>>>> Best,
>>>> Jack Ye
>>>>
>>>>
>>>>
>>>> On Mon, Feb 19, 2024 at 6:03 PM Szehon Ho <szehon.apa...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Great to see more discussion on the MV spec.  Actually, Jan's document 
>>>>> "Iceberg
>>>>> Materialized View Spec"
>>>>> <https://docs.google.com/document/d/1UnhldHhe3Grz8JBngwXPA6ZZord1xMedY5ukEhZYF-A>
>>>>>  has
>>>>> been organized , with a "Design Questions" section to track these debates,
>>>>> and it would be nice to centralize the debates there, as Micah mentions.
>>>>>
>>>>> For Dan's question, I think this debate was tracked in "Design Question
>>>>> 3: Should the storage table be registered in the catalog?". I think the
>>>>> general idea there was to not expose it directly via Catalog as it is then
>>>>> exposed to user modification. If the engine wants to access anything about
>>>>> the storage table (including audit and storage), it is of course there via
>>>>> the storage table pointer. I think Walaa's point is also good, we could
>>>>> expose it as we expose metadata tables, but I am still not sure if there 
>>>>> is
>>>>> still some use-cases of engine access not covered?
>>>>>
>>>>> It is true that for Jack's initial question (Do we really want to go
>>>>> with the MV = view + storage table design approach for Iceberg MV?),
>>>>> unfortunately we did not capture it as a "Design Question" in Jan's doc, 
>>>>> as
>>>>> it was an implicit assumption of 'yes', because it is the choice of Hive,
>>>>> Trino, and other engines , as others have pointed out.
>>>>>
>>>>> Jack's point about potential evolution of MV (like to add
>>>>> partitioning) is an interesting one, but definitely hard to grasp.  I 
>>>>> think
>>>>> it makes sense to add this as a separate Design Question in the doc, and
>>>>> add the options.  This will allow us to flesh out this alternative
>>>>> option(s).  Maybe Micah's point about modifying existing proposal to
>>>>> 'embed' the required table metadata fields in the existing view metadata,
>>>>> is one middle ground option.  Or we add a totally new MV object spec for
>>>>> MV, separate than existing View spec?
>>>>>
>>>>> Also , as Jack pointed out, it may make sense to have the REST /
>>>>> Catalog API proposal in the doc to educate the above decision.
>>>>>
>>>>> Thanks
>>>>> Szehon
>>>>>
>>>>> On Mon, Feb 19, 2024 at 4:08 PM Walaa Eldin Moustafa <
>>>>> wa.moust...@gmail.com> wrote:
>>>>>
>>>>>> I think it would help if we answer the question of whether an MV is a
>>>>>> view + storage table (and degree of exposing this underlying
>>>>>> implementation) in the context of the user interfacing with those 
>>>>>> concepts:
>>>>>>
>>>>>> For the end user, interfacing with the engine APIs (e.g., through
>>>>>> SQL), materialized view APIs should be almost the same as regular view 
>>>>>> APIs
>>>>>> (except for operations specific to materialized views like REFRESH 
>>>>>> command
>>>>>> etc). Typically, the end user interacts with the (materialized) view 
>>>>>> object
>>>>>> as a view, and the engine performs the abstraction over the storage 
>>>>>> table.
>>>>>>
>>>>>> For the engines interfacing with Iceberg, it sounds the correct
>>>>>> abstraction at this layer is indeed view + storage table, and engines 
>>>>>> could
>>>>>> have access to both objects to optimize queries.
>>>>>>
>>>>>> So in a sense, the engine will ultimately hide most of the
>>>>>> storage detail from the end user (except for advanced users who want to
>>>>>> explicitly access the storage table with a modifier like
>>>>>> "db.view.storageTable" -- and they can only read it), while Iceberg will
>>>>>> expose the storage details to the engine catalog to use it in scans if
>>>>>> needed. So the storage table is hidden or exposed based on the 
>>>>>> context/the
>>>>>> actual users. From Iceberg point of view (which interacts with the
>>>>>> engines), the storage table is exposed. Note that this does not
>>>>>> necessarily mean that the storage table is registered in the catalog with
>>>>>> its own independent name (e.g., where we can drop the view but keep the
>>>>>> storage table and access it from the catalog). Addressing the storage 
>>>>>> table
>>>>>> using a virtual namespace like "db.view.storageTable" sounds like a good
>>>>>> middle ground. Anyways, end users should not need to directly access the
>>>>>> storage table in most cases.
>>>>>>
>>>>>> Thanks,
>>>>>> Walaa.
>>>>>>
>>>>>> On Mon, Feb 19, 2024 at 3:38 PM Micah Kornfield <
>>>>>> emkornfi...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Jack,
>>>>>>>
>>>>>>>
>>>>>>>> In my mind, the first key point we all need to agree upon to move
>>>>>>>> this design forward is*: Do we really want to go with the MV =
>>>>>>>> view + storage table design approach for Iceberg MV?*
>>>>>>>
>>>>>>>
>>>>>>> I think we want this to the extent that we do not want to redefine
>>>>>>> the same concept with different representations/naming to the greatest
>>>>>>> degree possible.  This is why borrowing the concepts from the view (e.g.
>>>>>>> multiple ways of expressing the same view logic in different dialects) 
>>>>>>> and
>>>>>>> aspects of the materialized data (e.g. partitioning, ordering) feels 
>>>>>>> most
>>>>>>> natural.  IIUC your proposal, I think you are saying maybe two
>>>>>>> modifications to the existing proposals in the document:
>>>>>>>
>>>>>>> 1.  No separate storage table link, instead embed most of the
>>>>>>> metadata of the materialized table into the MV document (the exception
>>>>>>> seems to be snapshot history)
>>>>>>> 2.  For snapshot history, have one unified history specific to the
>>>>>>> MV.
>>>>>>>
>>>>>>> This seems fairly reasonable to me and I think I can solve some
>>>>>>> challenges with the existing proposal in an elegant way.  If this is
>>>>>>> correct (or maybe if it isn't quite correct) perhaps you can make
>>>>>>> suggestions to the document so all of the trade-offs can be discussed in
>>>>>>> one place?
>>>>>>>
>>>>>>> I think the one thing the current draft of the materialized view
>>>>>>> ignores is how to store algebraic summaries (e.g. separate sum and count
>>>>>>> for averages, or other sketches), so that new data can be incrementally
>>>>>>> incorporated.  But representing these structures feels like it 
>>>>>>> potentially
>>>>>>> has value beyond just MVs (e.g. it can be a natural way to express 
>>>>>>> summary
>>>>>>> statistics in table metadata), so I think it deserves at least a try in
>>>>>>> incorporating the concepts in the table specification, so the 
>>>>>>> definitions
>>>>>>> can be shared.  I was imagining this could come as part of the next
>>>>>>> revision of MV specification.
>>>>>>>
>>>>>>> The MV internal structure could evolve in a way that works more
>>>>>>>> efficiently with the reduced scope of functionalities, without relying 
>>>>>>>> on
>>>>>>>> table to offer the same capabilities. I can at least say that is true 
>>>>>>>> based
>>>>>>>> on my internal knowledge of how Redshift MVs work.
>>>>>>>
>>>>>>>
>>>>>>> I'm not sure I fully understand this point, but it seems the main
>>>>>>> question here is what would break if it started to evolve in this
>>>>>>> direction.  Is it purely additive or do we suspect some elements would 
>>>>>>> need
>>>>>>> to be removed?  My gut feeling here is the main concerns here are  
>>>>>>> getting
>>>>>>> the cardinatities correct (i.e. 1 MV should probably have 0, 1 or more
>>>>>>> materialized storage tables associated with it, to support more advanced
>>>>>>> algebraic structures listed above, and perhaps a second without them, 
>>>>>>> and
>>>>>>> additional metadata to distinguish between these two different modes).
>>>>>>>
>>>>>>> If after the evaluation, we are confident that the MV = view +
>>>>>>>> storage table approach is the right way to go, then we can debate the 
>>>>>>>> other
>>>>>>>> issues, and I think the next issue to reach consensus should be 
>>>>>>>> "Should the
>>>>>>>> storage table be registered in the catalog?".
>>>>>>>
>>>>>>>
>>>>>>> I actually think there are actually more fundamental questions posed:
>>>>>>> 1.  Should be considering how items should be modelled in the REST
>>>>>>> API concurrently with the Iceberg spec, as that potentially impacts 
>>>>>>> design
>>>>>>> decision (I think the answer is yes, and we should update the doc with
>>>>>>> sketches on new endpoints and operations on the endpoints to ensure 
>>>>>>> things
>>>>>>> align).
>>>>>>> 2.  Going forward should new aspects of Iceberg artifacts rely on
>>>>>>> the fact that a catalog is present and we can rely on a naming 
>>>>>>> convention
>>>>>>> for looking up other artifacts in a catalog as pointers (I lean yes on
>>>>>>> this, but I'm a little bit more ambivalent).
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Micah
>>>>>>>
>>>>>>> On Mon, Feb 19, 2024 at 12:52 PM Jack Ye <yezhao...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I suggest we need a step-by-step process to make incremental
>>>>>>>> consensus, otherwise we are constantly talking about many different 
>>>>>>>> debates
>>>>>>>> at the same time.
>>>>>>>>
>>>>>>>> In my mind, the first key point we all need to agree upon to move
>>>>>>>> this design forward is*: Do we really want to go with the MV =
>>>>>>>> view + storage table design approach for Iceberg MV?*
>>>>>>>>
>>>>>>>> I think we (at least me) started with this assumption, mostly
>>>>>>>> because this is how Trino implements MV, and how Hive tables store MV
>>>>>>>> information today. But does it mean we should design it that way in 
>>>>>>>> Iceberg?
>>>>>>>>
>>>>>>>> Now I look back at how we did the view spec design, we could also
>>>>>>>> say that we just add a representation field in the table spec to store
>>>>>>>> view, and an Iceberg view is just a table with no data but with
>>>>>>>> representations defined. But we did not do that. So it feels now quite
>>>>>>>> inconsistent to say we want to just add a few fields in the table and 
>>>>>>>> view
>>>>>>>> spec to call it an Iceberg MV.
>>>>>>>>
>>>>>>>> If we look into most of the other database systems (e.g. Redshift,
>>>>>>>> BigQuery, Snowflake), they never expose such implementation details 
>>>>>>>> like
>>>>>>>> storage table. Apart from being close-sourced systems, I think it is 
>>>>>>>> also
>>>>>>>> for good technical reasons. There are many more things that a table 
>>>>>>>> needs
>>>>>>>> to support, but does not really apply to MV. The MV internal structure
>>>>>>>> could evolve in a way that works more efficiently with the reduced 
>>>>>>>> scope of
>>>>>>>> functionalities, without relying on table to offer the same 
>>>>>>>> capabilities. I
>>>>>>>> can at least say that is true based on my internal knowledge of how
>>>>>>>> Redshift MVs work.
>>>>>>>>
>>>>>>>> I think we should fully evaluate both directions, and commit to one
>>>>>>>> first before debating more things.
>>>>>>>>
>>>>>>>> If we have a new and independent Iceberg MV spec, then an Iceberg
>>>>>>>> MV is under-the-hood a single object containing all MV information. It 
>>>>>>>> has
>>>>>>>> its own name, snapshots, view representation, etc. I don't believe we 
>>>>>>>> will
>>>>>>>> be blocked by Trino due to its MV SPIs currently requiring the 
>>>>>>>> existence of
>>>>>>>> a storage table, as it will just be a different implementation from the
>>>>>>>> existing one in Trino-Iceberg. In this direction, I don't think we 
>>>>>>>> need to
>>>>>>>> have any further debate about pointers, metadata locations, storage 
>>>>>>>> table,
>>>>>>>> etc. because everything will be new.
>>>>>>>>
>>>>>>>> If after the evaluation, we are confident that the MV = view +
>>>>>>>> storage table approach is the right way to go, then we can debate the 
>>>>>>>> other
>>>>>>>> issues, and I think the next issue to reach consensus should be 
>>>>>>>> "Should the
>>>>>>>> storage table be registered in the catalog?".
>>>>>>>>
>>>>>>>> What do we think?
>>>>>>>>
>>>>>>>> -Jack
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Feb 19, 2024 at 11:32 AM Daniel Weeks <dwe...@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Jack,
>>>>>>>>>
>>>>>>>>> I think we should consider either allowing the storage table to be
>>>>>>>>> fully exposed/addressable via the catalog or allow access via 
>>>>>>>>> namespacing
>>>>>>>>> like with metadata tables.  E.g. 
>>>>>>>>> <catalog>.<database>.<table>.<storage>,
>>>>>>>>> which would allow for full access to the underlying table.
>>>>>>>>>
>>>>>>>>> For other engines to interact with the storage table (e.g. to
>>>>>>>>> execute the query to materialize the table), it may be necessary that 
>>>>>>>>> the
>>>>>>>>> table is fully addressable.  Whether the storage table is returned as 
>>>>>>>>> part
>>>>>>>>> of list operations may be something we leave up to the catalog
>>>>>>>>> implementation.
>>>>>>>>>
>>>>>>>>> I don't think the table should reference a physical location (only
>>>>>>>>> a logical reference) since things will be changing behind the view
>>>>>>>>> definition and I'm not confident we want to have to update the view
>>>>>>>>> representation everytime the storage table is updated.
>>>>>>>>>
>>>>>>>>> I think there's still some exploration as to whether we need to
>>>>>>>>> model this as separate from view endpoints, but there may be enough 
>>>>>>>>> overlap
>>>>>>>>> that it's not necessary to have yet another set of endpoints for
>>>>>>>>> materialized views (maybe filter params if you need to distinguish?).
>>>>>>>>>
>>>>>>>>> -Dan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sun, Feb 18, 2024 at 6:57 PM Renjie Liu <
>>>>>>>>> liurenjie2...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi, Jack:
>>>>>>>>>>
>>>>>>>>>> Thanks for raising this.
>>>>>>>>>>
>>>>>>>>>> In most database systems, MV, view and table are considered
>>>>>>>>>>> independent objects, at least at API level. It is very rare for a 
>>>>>>>>>>> system to
>>>>>>>>>>> support operations like "materializing a logical view" or 
>>>>>>>>>>> "upgrading a
>>>>>>>>>>> logical view to MV", because view and MV are very different in 
>>>>>>>>>>> almost every
>>>>>>>>>>> aspect of user experience. Extending the existing view or table 
>>>>>>>>>>> spec to
>>>>>>>>>>> accommodate MV might give us a MV implementation similar to the 
>>>>>>>>>>> current
>>>>>>>>>>> Trino or Hive views, save us some effort and a few APIs in REST, 
>>>>>>>>>>> but it
>>>>>>>>>>> binds us to a very specific design of MV, which we might regret in 
>>>>>>>>>>> the
>>>>>>>>>>> future.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> When I reviewed the doc, I thought we were discussing the spec of
>>>>>>>>>> materialized view, just like the spec of table metadata, but didn't 
>>>>>>>>>> not the
>>>>>>>>>> user facing api. I would definitely agree that we should consider MV 
>>>>>>>>>> as
>>>>>>>>>> another kind of database object in user facing api, even though it's
>>>>>>>>>> internally modelled as a view + storage table pointer.
>>>>>>>>>>
>>>>>>>>>> If we want to make the REST experience good for MV, I think we
>>>>>>>>>>> should at least consider directly describing the full metadata of 
>>>>>>>>>>> the
>>>>>>>>>>> storage table in Iceberg view, instead of pointing to a JSON file.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Do you mean we need to add components like
>>>>>>>>>> `LoadMaterializedViewResponse`, if so, I would +1 for this.
>>>>>>>>>>
>>>>>>>>>> *Q2: what REST APIs do we expect to use for interactions with
>>>>>>>>>>> MVs?*
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> As I have mentioned above,  I think we should consider MV as
>>>>>>>>>> another database object, so I think we should add a set of apis
>>>>>>>>>> specifically designed for MV, such as `loadMV`, `freshMV`.
>>>>>>>>>>
>>>>>>>>>> On Sat, Feb 17, 2024 at 11:14 AM Jack Ye <yezhao...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>
>>>>>>>>>>> As we are discussing the spec change for materialized view,
>>>>>>>>>>> there has been a missing aspect that is technically also related, 
>>>>>>>>>>> and might
>>>>>>>>>>> affect the MV spec design: *how do we want to add MV support to
>>>>>>>>>>> the REST spec?*
>>>>>>>>>>>
>>>>>>>>>>> I would like to discuss this in a new thread to collect people's
>>>>>>>>>>> thoughts. This topic expands to the following 2 sub-questions:
>>>>>>>>>>>
>>>>>>>>>>> *Q1: how would the MV spec change affect the REST spec?*
>>>>>>>>>>> In the current proposal, it looks like we are using a design
>>>>>>>>>>> where a MV is modeled as an Iceberg view linking to an Iceberg 
>>>>>>>>>>> storage
>>>>>>>>>>> table. At the same time, we do not want to expose this storage 
>>>>>>>>>>> table in the
>>>>>>>>>>> catalog, thus the Iceberg view has a pointer to only a metadata 
>>>>>>>>>>> JSON file
>>>>>>>>>>> of the Iceberg storage table. Each MV refresh updates the pointer 
>>>>>>>>>>> to a new
>>>>>>>>>>> metadata JSON file.
>>>>>>>>>>>
>>>>>>>>>>> I feel this does not play very well with the direction that REST
>>>>>>>>>>> is going. The REST catalog is trying to remove the dependency to the
>>>>>>>>>>> metadata JSON file. For example, in LoadTableResponse the only 
>>>>>>>>>>> required
>>>>>>>>>>> field is the metadata, and metadata-location is actually optional.
>>>>>>>>>>>
>>>>>>>>>>> If we want to make the REST experience good for MV, I think we
>>>>>>>>>>> should at least consider directly describing the full metadata of 
>>>>>>>>>>> the
>>>>>>>>>>> storage table in Iceberg view, instead of pointing to a JSON file.
>>>>>>>>>>>
>>>>>>>>>>> *Q2: what REST APIs do we expect to use for interactions with
>>>>>>>>>>> MVs?*
>>>>>>>>>>> So far we have been thinking about amending the view spec to
>>>>>>>>>>> accommodate MV. This entails likely having MVs also being handled 
>>>>>>>>>>> through
>>>>>>>>>>> the view APIs in REST spec.
>>>>>>>>>>>
>>>>>>>>>>> We need to agree with that first in the community, because this
>>>>>>>>>>> has various implications, and I am not really sure at this point if 
>>>>>>>>>>> it is
>>>>>>>>>>> the best way to go.
>>>>>>>>>>>
>>>>>>>>>>> If MV interactions are through the view APIs, the view APIs need
>>>>>>>>>>> to be updated to accommodate MV constructs that are not really 
>>>>>>>>>>> related to
>>>>>>>>>>> logical views. In fact, most actions performed on MVs are more 
>>>>>>>>>>> similar to
>>>>>>>>>>> actions performed on table rather than view, which involve 
>>>>>>>>>>> configuring data
>>>>>>>>>>> layout, read and write constructs. For example, users might run 
>>>>>>>>>>> something
>>>>>>>>>>> like:
>>>>>>>>>>>
>>>>>>>>>>> CREATE MATERIALIZED VIEW mv
>>>>>>>>>>> PARTITION BY col1
>>>>>>>>>>> CLUSTER BY col2
>>>>>>>>>>> AS ( // some sql )
>>>>>>>>>>>
>>>>>>>>>>> then the CreateView API needs to accept partition spec and sort
>>>>>>>>>>> order that are completely not relevant for logical views.
>>>>>>>>>>>
>>>>>>>>>>> When reading a MV, we might even want to have a
>>>>>>>>>>> PlanMaterializedView API similar to the PlanTable API we are adding.
>>>>>>>>>>>
>>>>>>>>>>> *My personal take*
>>>>>>>>>>> It feels like we need to reconsider the question of what is the
>>>>>>>>>>> best way to model MV in Iceberg. Should it be (1) a view linked to a
>>>>>>>>>>> storage table, or (2) a table with a view SQL associated with it, 
>>>>>>>>>>> or (3)
>>>>>>>>>>> it's a completely independent thing. This topic was discussed in 
>>>>>>>>>>> the past in
>>>>>>>>>>> this doc
>>>>>>>>>>> <https://docs.google.com/document/d/1QAuy-meSZ6Oy37iPym8sV_n7R2yKZOHunVR-ZWhhZ6Q/edit?pli=1>,
>>>>>>>>>>> but at that time we did not have much perspective about aspects 
>>>>>>>>>>> like REST
>>>>>>>>>>> spec, and the view integration was also not fully completed yet. 
>>>>>>>>>>> With the
>>>>>>>>>>> new knowledge, currently I am actually leaning a bit more towards 
>>>>>>>>>>> (3).
>>>>>>>>>>>
>>>>>>>>>>> In most database systems, MV, view and table are considered
>>>>>>>>>>> independent objects, at least at API level. It is very rare for a 
>>>>>>>>>>> system to
>>>>>>>>>>> support operations like "materializing a logical view" or 
>>>>>>>>>>> "upgrading a
>>>>>>>>>>> logical view to MV", because view and MV are very different in 
>>>>>>>>>>> almost every
>>>>>>>>>>> aspect of user experience. Extending the existing view or table 
>>>>>>>>>>> spec to
>>>>>>>>>>> accommodate MV might give us a MV implementation similar to the 
>>>>>>>>>>> current
>>>>>>>>>>> Trino or Hive views, save us some effort and a few APIs in REST, 
>>>>>>>>>>> but it
>>>>>>>>>>> binds us to a very specific design of MV, which we might regret in 
>>>>>>>>>>> the
>>>>>>>>>>> future.
>>>>>>>>>>>
>>>>>>>>>>> If we make a new MV spec, it can be made up of fields that
>>>>>>>>>>> already exist in the table and view specs, but it is a whole new 
>>>>>>>>>>> spec. In
>>>>>>>>>>> this way, the spec can evolve independently to accommodate MV 
>>>>>>>>>>> specific
>>>>>>>>>>> features, and we can also create MV-related REST endpoints that 
>>>>>>>>>>> will evolve
>>>>>>>>>>> independently from table and view REST APIs.
>>>>>>>>>>>
>>>>>>>>>>> But on the other side it is definitely associated with more work
>>>>>>>>>>> to maintain a new spec, and potentially big refactoring in the 
>>>>>>>>>>> codebase to
>>>>>>>>>>> make sure operations today that work on table or view can now 
>>>>>>>>>>> support MV as
>>>>>>>>>>> a different object. And it definitely has other problems that I have
>>>>>>>>>>> overlooked. I would greatly appreciate any thoughts about this!
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Jack Ye
>>>>>>>>>>>
>>>>>>>>>>>

Re: Materialized view integration with REST spec

Reply via email to