Ryan, I created 2 tickets to work on the partition index. Can you please assign them to me (vvellanki) so that I can work on them?

https://github.com/apache/iceberg/issues/1832
https://github.com/apache/iceberg/issues/1833

Thanks
Vivek

On Fri, Nov 20, 2020 at 11:18 PM Ryan Blue <[email protected]> wrote:

> Thanks Vivekanand!
>
> I made some comments on the doc. Overall, I think a partition index is a
> good idea. We've thought about adding sketches that contain skew estimates
> for certain columns in a partition so that we can do better join
> estimation. Getting a start on how we would store data like this is a good
> step.
>
> I'm a bit more skeptical about locality information, since it would get
> out of date and require rewriting old, large manifests.
>
> On Fri, Nov 20, 2020 at 1:44 AM Vivekanand Vellanki <[email protected]>
> wrote:
>
>> Hi,
>>
>> I would like to propose additional fields in Iceberg manifest files
>> <https://docs.google.com/document/d/1G6GeOXkGSiSTcu0lDS6VA1FtJ_uz9FO4tF2Pffmx9LU/edit#>
>> to support the following scenarios:
>>
>> - Partition index to include per-partition stats to help support
>>   planning
>> - Data locality information to support split assignment in
>>   distributed query engines
>>
>> Comments are welcome.
>>
>> --
>> Thanks
>> Vivek
>>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
