Hey all,

The current architecture diagram
<https://iceberg.apache.org/img/iceberg-metadata.png> for an Iceberg table
hasn't been updated in over 3 years, and some aspects of the architecture
of an Iceberg table have changed since then, most notably the addition of
delete files and Puffin files. Since this diagram gets a lot of use in
enablement content around the community and isn't totally accurate anymore,
@Ajantha Bhat U <ajantha.bh...@dremio.com> and I discussed updating it to be
more accurate.

Here's an updated version of the diagram
<https://docs.google.com/drawings/d/1m_iiJIJjiymadFIsCYnuUS6BvFo0MYDPCx0kKhZgIx4/edit>
that we put together.

A few points for discussion that we'd like others' thoughts on:

   1. The diagram is somewhat more visually complicated than the current
   one, but IMO the benefit of being more accurate for people learning
   Iceberg outweighs the additional complexity.
   2. Since the partition stats spec PR
   <https://github.com/apache/iceberg/pull/7105> just got merged, we
   thought it'd be good to include that too while we're updating the
   diagram, and to combine Puffin files and partition stats files into one
   category labeled "statistics files". We combined them, rather than
   splitting them up, because (a) it gives a simpler diagram, (b) it gets
   the primary point across, and (c) they both serve the purpose of
   providing statistics for tools to leverage (albeit for different use
   cases).
   3. We put statistics files in the diagram for both s0 and s1, though we
   could instead show statistics files only for s1. That would (a) make the
   diagram simpler, and (b) show a simple example of a table that doesn't
   need stats files initially but does need them once data grows and/or
   query patterns change.

If folks are on board with updating the diagram, then once we come to a
conclusion on the discussion points above and any others that come up, I
can export it to a PNG and create a PR to update the architecture diagram
image on the site.

thanks!


Jason Hughes


Dremio | Director of Technical Advocacy
