An update would be greatly appreciated, thank you!

On Thu, Nov 2, 2023 at 12:42 PM Jason Hughes <ja...@dremio.com.invalid> wrote:
> Hey all,
>
> The current architecture diagram
> <https://iceberg.apache.org/img/iceberg-metadata.png> for an iceberg
> table hasn't been updated in over 3 years, and there are some aspects of
> the architecture of an iceberg table that have changed, most notably delete
> files and puffin files. since this diagram gets a lot of use in enablement
> content around the community and isn't totally accurate anymore, @Ajantha
> Bhat U <ajantha.bh...@dremio.com> and I discussed updating it to be more
> accurate
>
> here's an updated version of the diagram
> <https://docs.google.com/drawings/d/1m_iiJIJjiymadFIsCYnuUS6BvFo0MYDPCx0kKhZgIx4/edit>
> we put together
>
> a few points for discussion that we're interested in others' thoughts on:
>
>    1. the diagram is obviously somewhat more visually complicated than
>    the current one, but IMO the benefit of being more accurate for people
>    learning iceberg outweighs the additional complexity
>    2. since the partition stats spec PR
>    <https://github.com/apache/iceberg/pull/7105> just got merged, we
>    thought it'd be good to include that too while we're updating it, and
>    combine puffin files with partition stats files into one category of
>    files in the diagram labeled "statistics files". we combined them in the
>    diagram, rather than splitting them up, because (a) it provides a simpler
>    diagram, (b) it gets the primary point across, and (c) they both serve
>    the purpose of providing statistics for tools to leverage (albeit for
>    different use cases)
>    3. we put statistics files in place in the diagram for both s0 and s1,
>    though we could instead have statistics files only for s1, which would
>    (a) make the diagram simpler, and (b) show a simple example of the use
>    case of not needing stats files initially, but then needing them as data
>    grows and/or query patterns change
>
> if folks are on board with updating the diagram, and after we come to a
> conclusion on the above discussion points and any others that come up, I
> can export it to a png and create a PR to update the arch diagram image on
> the site
>
> thanks!
>
> Jason Hughes
> Dremio | Director of Technical Advocacy
>

-- 
Aaron Niskode-Dossett, Data Engineering -- Etsy