Thanks Vivekanand! I made some comments on the doc. Overall, I think a partition index is a good idea. We've thought about adding sketches that contain skew estimates for certain columns in a partition so that we can do better join estimation. Getting a start on how we would store data like this is a good step.
I'm a bit more skeptical about locality information, since it would get out of date and require rewriting old, large manifests.

On Fri, Nov 20, 2020 at 1:44 AM Vivekanand Vellanki <[email protected]> wrote:

> Hi,
>
> I would like to propose additional fields in Iceberg manifest files
> <https://docs.google.com/document/d/1G6GeOXkGSiSTcu0lDS6VA1FtJ_uz9FO4tF2Pffmx9LU/edit#>
> to support the following scenarios:
>
> - Partition index to include per-partition stats to help support
>   planning
> - Data locality information to support split assignment in distributed
>   query engines
>
> Comments are welcome.
>
> --
> Thanks
> Vivek
>

--
Ryan Blue
Software Engineer
Netflix
