Hey Micah,

For #1, I don't believe the spec clearly calls out that all data/delete
files must be unique, but the requirements for cleanup would be violated in
certain cases if the same file were referenced in multiple manifests.
In practice, the best way to ensure data correctness and metadata
consistency is to ensure that all referenced files have unique locations
and that those locations do not get overwritten.
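
To make the cleanup concern concrete, here is a rough sketch of one case I
have in mind. It is plain Python, not the Iceberg API: the Entry class and
the unsafe_to_remove function are made up for illustration, and the status
codes follow the manifest entry status values in the spec (2 = deleted).
The assumption is that cleanup removes a file once the snapshot holding its
deleted entry is expired, without looking for other references.

```python
from dataclasses import dataclass

# Manifest entry status values as defined in the spec.
EXISTING, ADDED, DELETED = 0, 1, 2


@dataclass(frozen=True)
class Entry:
    """Hypothetical, simplified manifest entry (not the library's class)."""
    manifest: str
    file_path: str
    status: int


def unsafe_to_remove(entries):
    """Return file paths that have a DELETED entry and a live entry elsewhere.

    If cleanup trusts the DELETED entry alone, these files would be removed
    from storage while another manifest still references them.
    """
    deleted = {e.file_path for e in entries if e.status == DELETED}
    live = {e.file_path for e in entries if e.status != DELETED}
    return deleted & live
```

If every referenced file has a unique location and appears in only one
manifest, this set is always empty, which is why uniqueness is the safe
convention.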

For #2, the answer mostly follows from #1: if all referenced files are
unique, then yes, the pairwise intersection of entries in the manifest
files of a given snapshot is empty.
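
If you want to check this property on a table, something along these lines
would do it. This is only a sketch in plain Python: it assumes you have
already read each manifest of the snapshot into a list of file paths by
whatever means you prefer, and the function name and input shape are
hypothetical rather than anything the library provides.

```python
from collections import defaultdict


def files_in_multiple_manifests(manifests):
    """Find files referenced by more than one manifest of a snapshot.

    manifests: dict mapping a manifest path to an iterable of the data or
    delete file paths it references (an assumed, pre-extracted input).
    Returns a dict of file path -> sorted list of manifests referencing it.
    """
    seen = defaultdict(set)
    for manifest_path, file_paths in manifests.items():
        for file_path in file_paths:
            seen[file_path].add(manifest_path)
    # Keep only files that show up in two or more manifests.
    return {p: sorted(ms) for p, ms in seen.items() if len(ms) > 1}
```

An empty result means the pairwise intersection is empty for that snapshot.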

The Java library does perform some checks to prevent a file from being
added to the same manifest multiple times, but I don't think that
extends to all possible ways of adding files.  So duplicate references
may be possible, but they are not a good idea.

Sam might know if there's a way to add a nav for the format page (it is a
little difficult to navigate at the moment).

-Dan

On Thu, Mar 3, 2022 at 4:49 PM Micah Kornfield <emkornfi...@gmail.com>
wrote:

> Hi Iceberg Dev,
> I tried searching for it in the specification but couldn't find anything
> explicit:
>
> 1.  Is it assumed that all data files and delete files will always have
> globally unique names in a table?
> 2.  Is it expected that the pairwise intersection of all manifest files in
> a snapshot is empty (i.e., any given data file has exactly zero or one
> entries across all manifest files in a snapshot)?
>
> I think the uniqueness of both can maybe be inferred by this sentence (but
> I'm not 100% sure):
>
>> When a file is replaced or deleted from the dataset, its manifest entry
>> fields store the snapshot ID in which the file was deleted and status 2
>> (deleted). The file may be deleted from the file system when the snapshot
>> in which it was deleted is garbage collected, assuming that older snapshots
>> have also been garbage collected [1].
>
>
> Thanks,
> Micah
>
>
> P.S. Is there a way to add a table of contents to the specification?  I
> might be missing it but I don't see one rendered at:
> https://iceberg.apache.org/spec/
>
