Hey,

This is a pretty interesting proposal, thanks for raising it, Yufei!

About the 'big metadata' topic: I'm trying to understand the scale of the data being returned in such a scenario. Let me know if my calculations are wrong: taking the 'files' metadata table on a table with 100 cols, it seems to me that *one row* in such a table could take somewhere around *2 KB*. Most of this is stats (lower/upper bounds, value counts etc.), where one particular stat needs at least 100 cols * 4 bytes = 400 bytes.

I wonder what row count we'd consider 'big metadata'. Let's say we have a table with a number of files in the 100k range and 100 cols, then the total result set is around *200 MB* (100k rows * 2 KB row size), plus the overhead of the format we choose to return this result. Is my calculation correct?
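To make the estimate easy to re-run with other numbers, here is a rough Python sketch of the same arithmetic. The function names are made up for illustration, and the count of five per-column stat maps is only an assumption chosen to land near the ~2 KB row size mentioned above; real rows carry extra fields and encoding overhead that this ignores.

```python
# Back-of-the-envelope sizing for the 'files' metadata table.
# All constants are assumptions taken from (or chosen to match) the estimate above.

BYTES_PER_COLUMN_ENTRY = 4  # assumed lower bound per column, per stat
STAT_MAPS_PER_ROW = 5       # assumption: e.g. bounds, value counts, etc.


def per_stat_bytes(columns: int) -> int:
    """Minimum bytes one stat map needs for a table with `columns` columns."""
    return columns * BYTES_PER_COLUMN_ENTRY


def row_bytes(columns: int) -> int:
    """Approximate bytes of stats in one 'files' row."""
    return per_stat_bytes(columns) * STAT_MAPS_PER_ROW


def result_set_bytes(num_files: int, columns: int) -> int:
    """Approximate total payload, before any wire-format overhead."""
    return num_files * row_bytes(columns)


if __name__ == "__main__":
    total = result_set_bytes(num_files=100_000, columns=100)
    print(f"row ~ {row_bytes(100)} bytes, result set ~ {total / 1_000_000:.0f} MB")
```

Changing `num_files` or `columns` shows how quickly the payload grows, which may help in agreeing on a threshold for 'big metadata'.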
Thanks,
Gabor

On Thu, Jul 4, 2024 at 8:35 AM Szehon Ho <szehon...@apple.com.invalid> wrote:

> Hi Piotr
>
> Thanks for the reply. It’s a good point, I was thinking it would be
> convenient in REST, and could avoid the hassle of spec change. But you are
> right that it probably belongs at a lower level if we support this feature
> generally (like an additional boolean on snapshot).
>
> Sorry to hijack the thread of the main topic, will start a proper thread
> on this when I get a chance.
>
> Thanks
> Szehon
>
> On Jul 3, 2024, at 11:26 PM, Piotr Findeisen <piotr.findei...@gmail.com>
> wrote:
>
> Hi Szehon,
>
> re listing 'removed' snapshots
>
> If I understand correctly, what you're saying is the following: Iceberg
> table format requires users to first delete metadata information about
> files and only then delete the files, and sometimes users want to order
> these events differently.
> We can solve this within a REST catalog, because REST catalog is not
> limited by the Iceberg spec. In particular, it can do copies of metadata
> and other workarounds.
> However, why wouldn't we choose to solve this within Iceberg format? A
> naive person could think that it's conceptually trivial to mark a snapshot
> as 'expired' to allow data file removal without removing all the snapshot
> information yet.
> Please help me understand the reasoning behind these tradeoffs.
>
> Best
> PF
>
>
> On Thu, 4 Jul 2024 at 02:26, Szehon Ho <szehon.apa...@gmail.com> wrote:
>
>> Yes, I was chatting with Yufei about this, at first glance I agree
>> this would be nice to have. I always thought that metadata tables are
>> important enough to spec somewhere, and I think this is a nice place to do
>> it. There seems to be some overlap with existing calls (i.e., you can get
>> snapshots from table, and files from proposed Plan API), but it does seem
>> valuable to get it in one place.
>>
>> If we can solve the 'big metadata' issue for PrePlan/PlanTable APIs, it
>> sounds like we can re-use the solution for files metadata tables. I'd
>> perhaps leave out the position_deletes one though, as it's mostly used
>> internally and seems a bit too 'big' even for this.
>>
>> I wonder if we can even add an optional endpoint for listing 'removed'
>> snapshots. I know it sounds weird, but when looking at metadata tables,
>> the one question that I got a lot but could not answer is how to find when
>> a data file is added (or a partition is added). If the snapshot is expired
>> then it is no longer possible to trace that history. Users often expire
>> snapshots to claw back disk space, but may not necessarily want to delete
>> the snapshot history. But I believe the REST catalog seems to have an
>> opportunity in removeSnapshot to preserve the metadata of the old snapshot
>> (up to some configured time). So we can query the snapshot metadata even
>> after it expires, which I feel will be valuable.
>>
>> Thanks
>> Szehon
>>
>>
>> On Wed, Jul 3, 2024 at 3:04 PM Jack Ye <yezhao...@gmail.com> wrote:
>>
>>> Hi Yufei,
>>>
>>> Interesting that we are thinking about similar things. I had this item
>>> as a part of the roadmap discussion items in the catalog sync meeting, and
>>> then I removed it before the meeting because I felt it's too early to
>>> discuss.
>>>
>>> My main concern for having server-side metadata tables is how we solve
>>> the "big metadata" issue. The partitions, manifests, and files tables can
>>> each easily become a big table, and the REST server becomes inefficient in
>>> retrieving results. It's the same old "HMS is too slow in iterating through
>>> the partitions" problem. Iceberg kind of solves it by having this
>>> information in Avro and in storage that can be scanned in a distributed
>>> fashion, but with server-side metadata tables, we are technically
>>> re-introducing the problem.
>>>
>>> Maybe one potential approach is to run those potentially large metadata
>>> table scans through the PreplanTable and PlanTable APIs. Just a quick
>>> thought for now, I need to think a bit more about this.
>>>
>>> Best,
>>> Jack Ye
>>>
>>>
>>> On Wed, Jul 3, 2024 at 1:45 PM Yufei Gu <flyrain...@gmail.com> wrote:
>>>
>>>> Hi folks,
>>>>
>>>> I'd like to discuss a new proposal to support server-side metadata
>>>> tables.
>>>>
>>>> One of Iceberg's most advantageous features is the ability to inspect a
>>>> table using metadata tables. For instance, we can query snapshots just like
>>>> we query data rows using the following command: SELECT * FROM
>>>> prod.db.table.snapshots;
>>>>
>>>> With the REST catalog, we can simplify this process further by
>>>> providing metadata directly from REST endpoints. Here are several benefits
>>>> of this approach:
>>>>
>>>> - Engine Independence: The metadata tables do not rely on a
>>>>   specific implementation of an engine. The REST server returns the
>>>>   results directly. For example, the Rust Iceberg does not need to
>>>>   implement its own logic to query the snapshot table if it connects to
>>>>   a server with this capability. This reduces the complexity and
>>>>   development effort required for different clients and engines.
>>>> - Enables New Use Cases: A catalog UI or Lakehouse UI can present a
>>>>   table's metadata (e.g., snapshot/partition list) without relying on an
>>>>   engine like Trino. This opens up possibilities for lightweight UIs and
>>>>   tools that can directly interact with the REST endpoints to retrieve and
>>>>   display metadata.
>>>> - Enhanced Performance: With server-side caching, the server-side
>>>>   metadata tables will perform better. Caching reduces the need to
>>>>   repeatedly compute or retrieve metadata, leading to faster response
>>>>   times and reduced load on the underlying storage systems.
>>>>
>>>> Here is the proposal in Google Doc:
>>>> https://docs.google.com/document/d/1MVLwyMQtZ-7jewsQ0PuTvtJbpfl4HCoVdbowMqFTmfc/edit?usp=sharing
>>>>
>>>> Estimated read time: 5 mins
>>>>
>>>> Would really appreciate any feedback on this topic and proposal!
>>>>
>>>>
>>>> Yufei
>>>>