Hey,
This is a pretty interesting proposal, thanks for raising it, Yufei!

About the 'big metadata' topic:
I'm trying to understand the scale of the data being returned in such a
scenario. Let me know if my calculations are wrong: taking the 'files'
metadata table on a table with 100 cols, it seems to me that *one row* in
such a table could take somewhere around *2 KB*. Most of this is stats
(lower/upper bounds, value counts etc.), where each particular stat needs
at least 100 cols * 4 bytes.
I wonder what row count we'd consider 'big metadata'. Let's say we have a
table with a number of files in the 100k range and 100 cols; then the size
of the total result set is around *200 MB* (100k rows * 2 KB row size) plus
the overhead of the format we choose to return this result. Is my
calculation correct?
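
To make the estimate above easy to check or tweak, here is a rough sketch of
the same arithmetic. The number of per-column stat maps per row (5) is my
assumption, chosen only so that the row lands near 2 KB; the function names
are made up for illustration and the result ignores any serialization
overhead.

```python
# Back-of-the-envelope size of a 'files' metadata table result set.
# All constants are assumptions for illustration, not measured values.

COLUMNS = 100            # columns in the data table
BYTES_PER_VALUE = 4      # lower bound on bytes per column entry in one stat
STAT_MAPS_PER_ROW = 5    # assumed count of per-column stats (bounds, counts, ...)


def row_size_bytes(columns=COLUMNS,
                   bytes_per_value=BYTES_PER_VALUE,
                   stat_maps=STAT_MAPS_PER_ROW):
    """Approximate size of one 'files' row, counting only per-column stats."""
    return columns * bytes_per_value * stat_maps


def result_set_bytes(file_count, row_bytes):
    """Total payload size before any format overhead."""
    return file_count * row_bytes


if __name__ == "__main__":
    row = row_size_bytes()
    total = result_set_bytes(100_000, row)
    print(f"row size:   {row / 1_000:.0f} KB")
    print(f"result set: {total / 1_000_000:.0f} MB")
```

Changing the file count or column count shows how quickly the result set
grows, which might help us agree on what 'big' means here.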

Thanks,
Gabor


On Thu, Jul 4, 2024 at 8:35 AM Szehon Ho <szehon...@apple.com.invalid>
wrote:

> Hi Piotr
>
> Thanks for the reply.  It’s a good point, I was thinking it would be
> convenient in REST, and could avoid the hassle of spec change.  But you are
> right that it probably belongs at a lower level if we support this feature
> generally (like an additional boolean on snapshot).
>
> Sorry to hijack the thread of the main topic, will start a proper thread
> on this when I get a chance.
>
> Thanks
> Szehon
>
> On Jul 3, 2024, at 11:26 PM, Piotr Findeisen <piotr.findei...@gmail.com>
> wrote:
>
> Hi Szehon,
>
> re listing 'removed' snapshots
>
> If I understand correctly, what you're saying is the following: the Iceberg
> table format requires users to first delete metadata information about files
> and only then delete the files, and sometimes users want to order these
> events differently.
> We can solve this within a REST catalog, because REST catalog is not
> limited by the Iceberg spec. In particular, it can do copies of metadata
> and other workarounds.
> However, why wouldn't we choose to solve this within Iceberg format? A
> naive person could think that it's conceptually trivial to mark a snapshot
> as 'expired' to allow data file removal without removing all the snapshot
> information yet.
> Please help me understand the reasoning behind these tradeoffs.
>
> Best
> PF
>
>
>
>
> On Thu, 4 Jul 2024 at 02:26, Szehon Ho <szehon.apa...@gmail.com> wrote:
>
>> Yes, I was chatting with Yufei about this, and at first glance I agree
>> this would be nice to have.  I always thought that metadata tables are
>> important enough to spec somewhere, and I think this is a nice place to do
>> it.  There seems to be some overlap with existing calls (i.e., you can get
>> snapshots from the table, and files from the proposed Plan API), but it does
>> seem valuable to get it in one place.
>>
>> If we can solve the 'big metadata' issue for the PrePlan/PlanTable APIs, it
>> sounds like we can re-use the solution for the files metadata tables.  I'd
>> perhaps leave out the position_deletes one though, as it's mostly used
>> internally and seems a bit too 'big' even for this.
>>
>> I wonder if we can even add an optional endpoint for listing 'removed'
>> snapshots.   I know it sounds weird, but when looking at metadata tables,
>> the one question that I got a lot but could not answer is how to find when
>> a data file is added (or a partition is added).  If the snapshot is expired
>> then it is no longer possible to trace that history.  Users often expire
>> snapshots to claw back disk space, but may not necessarily want to delete
>> the snapshot history.  But I believe the REST catalog seems to have an
>> opportunity in removeSnapshot to preserve the metadata of the old snapshot
>> (up to some configured time).  So we can query the snapshot metadata even
>> after it expires, which I feel will be valuable.
>>
>> Thanks
>> Szehon
>>
>>
>> On Wed, Jul 3, 2024 at 3:04 PM Jack Ye <yezhao...@gmail.com> wrote:
>>
>>> Hi Yufei,
>>>
>>> Interesting that we are thinking about similar things. I had this item
>>> as a part of the roadmap discussion items in the catalog sync meeting, and
>>> then I removed it before the meeting because I felt it's too early to
>>> discuss.
>>>
>>> My main concern for having server-side metadata tables is how we solve
>>> the "big metadata" issue. The partitions, manifests, and files tables can
>>> each easily become a big table, and the REST server becomes inefficient in
>>> retrieving results. It's the same old "HMS is too slow in iterating through
>>> the partitions" problem. Iceberg kind of solves it by having this
>>> information in Avro and in storage that can be scanned in a distributed
>>> fashion, but with server-side metadata tables, we are technically
>>> re-introducing the problem.
>>>
>>> Maybe one potential approach is to run those potentially large metadata
>>> table scans through the PreplanTable and PlanTable APIs. Just a quick
>>> thought for now, I need to think a bit more about this.
>>>
>>> Best,
>>> Jack Ye
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Jul 3, 2024 at 1:45 PM Yufei Gu <flyrain...@gmail.com> wrote:
>>>
>>>> Hi folks,
>>>>
>>>> I'd like to discuss a new proposal to support server-side metadata
>>>> tables.
>>>>
>>>> One of Iceberg's most advantageous features is the ability to inspect a
>>>> table using metadata tables. For instance, we can query snapshots just like
>>>> we query data rows using the following command: SELECT * FROM
>>>> prod.db.table.snapshots;
>>>>
>>>> With the REST catalog, we can simplify this process further by
>>>> providing metadata directly from REST endpoints. Here are several benefits
>>>> of this approach:
>>>>
>>>>    - Engine Independence: The metadata tables do not rely on a
>>>>    specific implementation of an engine. The REST server returns the
>>>>    results directly. For example, the Rust Iceberg does not need to
>>>>    implement its own logic to query the snapshot table if it connects to
>>>>    a server with this capability. This reduces the complexity and
>>>>    development effort required for different clients and engines.
>>>>    - Enabled New Use Cases: A catalog UI or Lakehouse UI can present a
>>>>    table's metadata (e.g., snapshot/partition list) without relying on an
>>>>    engine like Trino. This opens up possibilities for lightweight UIs and
>>>>    tools that can directly interact with the REST endpoints to retrieve and
>>>>    display metadata.
>>>>    - Enhanced Performance: With server-side caching, the server-side
>>>>    metadata tables will perform better. Caching reduces the need to
>>>>    repeatedly compute or retrieve metadata, leading to faster response
>>>>    times and reduced load on the underlying storage systems.
>>>>
>>>> Here is the proposal in a Google Doc:
>>>> https://docs.google.com/document/d/1MVLwyMQtZ-7jewsQ0PuTvtJbpfl4HCoVdbowMqFTmfc/edit?usp=sharing
>>>>
>>>> Estimated read time: 5 mins
>>>>
>>>> Would really appreciate any feedback on this topic and proposal!
>>>>
>>>>
>>>> Yufei
>>>>
>>>
>
