Thanks for the feedback. I responded to the comments in the doc.

Regarding locality information, I introduced a timestamp field that records
when the information was populated. Engines can use this timestamp to decide
whether the data locality information is still valid. In addition, when
manifest files are rewritten as part of MergeAppend or compaction, this
information would be updated.
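As an illustration of the validity check described above, here is a minimal sketch of how an engine might use such a timestamp during split assignment. The names (`LocalityInfo`, `populated_at_ms`, `is_fresh`, `preferred_hosts`) and the fixed maximum-age policy are hypothetical assumptions for this example, not fields or APIs from the proposal or from Iceberg.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LocalityInfo:
    # Hypothetical shape: hosts that held the data file, plus the time
    # (epoch milliseconds) at which this locality information was populated.
    hosts: List[str] = field(default_factory=list)
    populated_at_ms: int = 0


def is_fresh(info: LocalityInfo, max_age_ms: int, now_ms: Optional[int] = None) -> bool:
    """Treat locality info as valid only if it is at most max_age_ms old."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    age_ms = now_ms - info.populated_at_ms
    # A timestamp in the future (e.g. clock skew) is treated as not trustworthy.
    return 0 <= age_ms <= max_age_ms


def preferred_hosts(info: Optional[LocalityInfo], max_age_ms: int,
                    now_ms: Optional[int] = None) -> List[str]:
    """Hosts to prefer for a split, or [] to fall back to locality-agnostic assignment."""
    if info is None or not is_fresh(info, max_age_ms, now_ms):
        return []
    return list(info.hosts)
```

For example, with `LocalityInfo(["host-a", "host-b"], 1_000_000)` and a 60-second limit, `preferred_hosts(info, 60_000, now_ms=1_030_000)` returns both hosts, while at `now_ms=1_100_000` the information is considered stale and an empty list is returned. The cutoff is an engine-side policy choice; the timestamp only gives engines the input needed to make it.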

On Fri, Nov 20, 2020 at 11:18 PM Ryan Blue <[email protected]>
wrote:

> Thanks Vivekanand!
>
> I made some comments on the doc. Overall, I think a partition index is a
> good idea. We've thought about adding sketches that contain skew estimates
> for certain columns in a partition so that we can do better join
> estimation. Getting a start on how we would store data like this is a good
> step.
>
> I'm a bit more skeptical about locality information, since it would get
> out of date and require rewriting old, large manifests.
>
> On Fri, Nov 20, 2020 at 1:44 AM Vivekanand Vellanki <[email protected]>
> wrote:
>
>> Hi,
>>
>> I would like to propose additional fields in Iceberg manifest files
>> <https://docs.google.com/document/d/1G6GeOXkGSiSTcu0lDS6VA1FtJ_uz9FO4tF2Pffmx9LU/edit#>
>> to support the following scenarios:
>>
>>    - Partition index with per-partition stats to support query
>>    planning
>>    - Data locality information to support split assignment in
>>    distributed query engines
>>
>> Comments are welcome.
>>
>> --
>> Thanks
>> Vivek
>>
>>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>
