An update would be greatly appreciated, thank you!

On Thu, Nov 2, 2023 at 12:42 PM Jason Hughes <ja...@dremio.com.invalid>
wrote:

> Hey all,
>
> The current architecture diagram
> <https://iceberg.apache.org/img/iceberg-metadata.png> for an iceberg
> table hasn't been updated in over 3 years, and there are some aspects of
> the architecture of an iceberg table that have changed, most notably delete
> files and puffin files. since this diagram gets a lot of use in enablement
> content around the community and isn't totally accurate anymore, @Ajantha
> Bhat U <ajantha.bh...@dremio.com> and I discussed updating it to be more
> accurate
>
> here's an updated version of the diagram
> <https://docs.google.com/drawings/d/1m_iiJIJjiymadFIsCYnuUS6BvFo0MYDPCx0kKhZgIx4/edit>
> we put together
>
> a few points for discussion that we're interested in others' thoughts on:
>
>    1. the diagram is obviously somewhat more visually complicated than
>    the current one, but IMO the benefit of being more accurate for people
>    learning iceberg outweighs the additional complexity
>    2. since the partition stats spec PR
>    <https://github.com/apache/iceberg/pull/7105> just got merged, we
>    thought it'd be good to include that too while we're updating it, and
>    combine puffin files with partition stats files into one category of files
>    in the diagram labeled "statistics files". we combined them in the diagram,
>    rather than splitting them up, because 1. it keeps the diagram simpler, 2.
>    it gets the primary point across, and 3. they both serve the purpose of
>    providing statistics for tools to leverage (albeit for different use cases)
>    3. we put statistics files in place in the diagram for both s0 and s1,
>    though we could instead show statistics files only for s1, which would 1.
>    make the diagram simpler, and 2. show a simple example of the use case of
>    not needing stats files initially, but then needing them as data grows
>    and/or query patterns change
>
> if folks are on board with updating the diagram, and after we come to a
> conclusion on the above discussion points and any others that come up, I
> can export it to a png and create a PR to update the arch diagram image on
> the site
>
> thanks!
>
>
> Jason Hughes
>
>
> Dremio | Director of Technical Advocacy
>
>
>
>
>

-- 
Aaron Niskode-Dossett, Data Engineering -- Etsy
