In the real system each file would have a universally unique identifier. When Iceberg does a delete it doesn't actually remove the file; it creates a new metadata file which no longer includes that file. When you access the table as of time one, you are actually just reading the first metadata file, not the new metadata file, which is missing the entry for the deleted file.
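
As a rough illustration of that behavior, here is a toy model in Python. It is not the Iceberg API: ToyTable, Snapshot, append, delete, and scan are hypothetical names made up for this sketch, and a dict stands in for the filesystem. It only shows that a delete commits new metadata and leaves the data file alone, so a read as of an older snapshot still sees the old file.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Immutable metadata: the set of data files visible at one point in time."""
    snapshot_id: int
    files: frozenset


class ToyTable:
    """Hypothetical toy model of snapshot metadata; not the Iceberg API."""

    def __init__(self):
        self.storage = {}                         # path -> rows, stands in for the filesystem
        self.snapshots = [Snapshot(0, frozenset())]

    def current(self):
        return self.snapshots[-1]

    def _commit(self, files):
        # Every change writes a new snapshot; older snapshots are never modified.
        snap = Snapshot(len(self.snapshots), frozenset(files))
        self.snapshots.append(snap)
        return snap.snapshot_id

    def append(self, rows):
        # Each new data file gets a unique name, so a path is never reused.
        path = f"data-{uuid.uuid4()}.parquet"
        self.storage[path] = list(rows)
        return self._commit(self.current().files | {path}), path

    def delete(self, path):
        # Metadata-only delete: the file stays in storage.
        return self._commit(self.current().files - {path})

    def scan(self, snapshot_id=None):
        # Read the current snapshot, or an older one for a time-travel read.
        snap = self.current() if snapshot_id is None else self.snapshots[snapshot_id]
        rows = []
        for path in sorted(snap.files):
            rows.extend(self.storage[path])
        return rows


table = ToyTable()
t1, f1 = table.append(["a", "b"])   # T1: metadata lists f1
t2 = table.delete(f1)               # T2: metadata no longer lists f1
t3, f3 = table.append(["c"])        # T3: metadata lists f3 under a new unique name

assert table.scan(t1) == ["a", "b"]  # reading as of T1 still works
assert table.scan(t2) == []
assert table.scan(t3) == ["c"]
```

Because f1 is never physically removed or overwritten in this model, reading as of T1 can neither fail nor pick up different data later.
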
The only way to end up in the scenario you describe is if you were manually deleting and adding files through the Iceberg internal API rather than through something like Spark or Flink.

What actually happens is something like this:

T1: metadata says f1-uuid exists
The data is deleted
T2: metadata no longer lists f1
New data is written
T3: metadata says f3_uuid now exists

Data files are only physically deleted by Iceberg through the expire snapshots command. This removes the snapshot metadata as well as any data files that are referenced only by the expired snapshots.

If you are using the internal API (org.apache.iceberg.Table), then it is your responsibility not to perform operations or delete files in a way that would violate the uniqueness of each snapshot. In this case you would similarly solve the problem by just not physically deleting the file when you remove it from the table, although using a unique name every time you add data is usually a good safety measure.

> On May 16, 2021, at 4:53 AM, Vivekanand Vellanki <[email protected]> wrote:
>
> Hi,
>
> I would like to understand if Iceberg supports the following scenario:
> At time t1, there's a table with a file f1.parquet
> At time t2, f1.parquet is removed from the table. f1.parquet is also deleted from the filesystem
> Querying table@t1 results in errors since f1.parquet is no longer available in the filesystem
> At time t3, f1.parquet is recreated and added back to the table
> Querying table@t1 now results in potentially incorrect results since f1.parquet is now present in the filesystem
>
> Should there be a version identifier for each data-file in the manifest file to handle such scenarios?
>
> Thanks
> Vivek
