> Maybe the user does want to change a row in a file directly and replace the file to get an updated result quickly, bypassing the Iceberg API

Actually, the above is one of the reasons our customers overwrite Parquet files. They discover that a Parquet file contains incorrect data - they fix it by recreating the Parquet file with the corrected data and replacing the old version of the file with the new one, bypassing Iceberg completely.

On Mon, May 17, 2021 at 9:39 AM Jack Ye <[email protected]> wrote:

> I actually think there is an argument against that use case of returning an error after time t3. Maybe the user does want to change a row in a file directly and replace the file to get an updated result quickly, bypassing the Iceberg API. In that case, failing that query after t3 would block that use case. The statistics in the manifest might be wrong, but we can further argue that the user can directly modify statistics and replace files all the way up to the snapshot to make sure everything continues to work.
>
> In general, if a user decides to bypass the contract set by Iceberg, I believe we should not try to predict that behavior and compensate for it in the system, because users can bypass the contract in all sorts of ways, and doing so would open the door to satisfying many awkward use cases and, in the end, breaking fundamental assumptions.
>
> In the case you described, I think the existing Iceberg behavior makes total sense. If you would like to achieve what you described, you can potentially update your FileIO and leverage the versioning feature of the underlying storage to make sure that an uploaded file never has the same identifier, so that users cannot replace a file at t3. For example, if you are running on S3, you can enable S3 versioning and extend S3FileIO so that each file path is not just the S3 path, but the S3 path + version.
>
> But this is just what I think, let's see how others reply.
>
> -Jack
>
> On Sun, May 16, 2021 at 8:52 PM Vivekanand Vellanki <[email protected]> wrote:
>
>> From an Iceberg perspective, I understand what you are saying.
>>
>> A lot of our customers add/remove files in the table using scripts. The typical workflow would be:
>> - Create Parquet files using other tools
>> - Add these files to the Iceberg table
>>
>> Similarly, for removing Parquet files from the table. I understand that Iceberg doesn't delete the data file until all snapshots that refer to that data file expire. However, the customer can delete the file directly - they might understand that a query on such a snapshot will then fail.
>>
>> I am concerned that an unintentional mistake in updating the Iceberg table results in incorrect results while querying an Iceberg snapshot. It is ok to return an error when a file referred to by a snapshot does not exist.
>>
>> This issue can be addressed by adding a version identifier (e.g. mtime) to the DataFile object and including this information in the manifest file. This ensures that snapshot reads are correct even when users make mistakes while adding or removing files.
>>
>> We can work on this, if there is sufficient interest.
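
Side note: the "Add these files to the Iceberg table" step in the workflow above is typically a small script against the Java API. A minimal sketch, assuming an unpartitioned table; the path, file size and row count below are placeholders:

    import org.apache.iceberg.DataFile;
    import org.apache.iceberg.DataFiles;
    import org.apache.iceberg.FileFormat;
    import org.apache.iceberg.Table;

    public class RegisterExistingFile {
      // Register a Parquet file that was written by some other tool.
      static void addExistingParquet(Table table) {
        DataFile dataFile = DataFiles.builder(table.spec())
            .withPath("s3://bucket/warehouse/db/tbl/data/f1.parquet")  // placeholder path
            .withFormat(FileFormat.PARQUET)
            .withFileSizeInBytes(64L * 1024 * 1024)  // actual size of the file in bytes
            .withRecordCount(1_000_000L)             // actual number of rows in the file
            .build();

        // Commit a new snapshot that references the file; the file itself is not rewritten.
        table.newAppend()
            .appendFile(dataFile)
            .commit();
      }
    }

Removing a file goes through table.newDelete().deleteFile(...).commit() in the same way; in both cases Iceberg only rewrites metadata, never the Parquet files themselves.
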
>> On Sun, May 16, 2021 at 8:34 PM <[email protected]> wrote:
>>
>>> In the real system each file would have a unique universal identifier. When Iceberg does a delete it doesn't actually remove the file; it creates a new metadata file which no longer includes that file.
>>>
>>> When you attempt to access the table at time t1, you are actually just reading the first metadata file and not the new metadata file, which is missing the entry for the deleted file.
>>>
>>> The only way to end up in the scenario you describe is if you were manually deleting files and adding files using the Iceberg internal API and not something like Spark or Flink.
>>>
>>> What actually happens is something like:
>>> T1: metadata says f1_uuid exists
>>>
>>> The data is deleted
>>> T2: metadata no longer lists f1
>>>
>>> New data is written
>>> T3: metadata says f3_uuid now exists
>>>
>>> Data files are only physically deleted by Iceberg through the expire snapshots command. This removes the snapshot metadata as well as any data files which are only referred to by those snapshots that are expired.
>>>
>>> If you are using the internal API (org.apache.iceberg.Table) then it is your responsibility not to perform operations or delete files that would violate the uniqueness of each snapshot. In this case you would similarly solve the problem by just not physically deleting the file when you remove it. Although usually, having unique names every time you add data is a good safety measure.
>>>
>>> On May 16, 2021, at 4:53 AM, Vivekanand Vellanki <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> I would like to understand if Iceberg supports the following scenario:
>>>
>>> - At time t1, there's a table with a file f1.parquet
>>> - At time t2, f1.parquet is removed from the table. f1.parquet is also deleted from the filesystem
>>> - Querying table@t1 results in errors since f1.parquet is no longer available in the filesystem
>>> - At time t3, f1.parquet is recreated and added back to the table
>>> - Querying table@t1 now results in potentially incorrect results since f1.parquet is now present in the filesystem
>>>
>>> Should there be a version identifier for each data-file in the manifest file to handle such scenarios?
>>>
>>> Thanks
>>> Vivek
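
To make the timeline above concrete: a reader pins a snapshot (or a point in time) and plans the scan purely from that snapshot's metadata, and data files are only physically removed when snapshots are expired. A minimal sketch against the Java API; the timestamps are placeholders:

    import org.apache.iceberg.Table;
    import org.apache.iceberg.TableScan;

    public class SnapshotBasics {
      // Plan a scan against the snapshot that was current at the given time (e.g. t1).
      // The scan only consults that snapshot's manifests; it has no way of knowing
      // whether a file listed there was later replaced in the filesystem.
      static TableScan scanAsOf(Table table, long timestampMillis) {
        return table.newScan().asOfTime(timestampMillis);
      }

      // Physical deletion happens here: expiring snapshots removes their metadata and
      // any data files that are no longer referenced by any remaining snapshot.
      static void expire(Table table, long olderThanMillis) {
        table.expireSnapshots()
            .expireOlderThan(olderThanMillis)
            .commit();
      }
    }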

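On the version-identifier idea raised earlier in the thread: manifests today do not record an mtime or version for a data file, but they do record the file size, so a crude read-side guard can approximate the proposal. This is not an existing Iceberg feature; the sketch below is hypothetical and only shows the shape of the check:

    import org.apache.iceberg.DataFile;
    import org.apache.iceberg.FileScanTask;
    import org.apache.iceberg.Table;
    import org.apache.iceberg.io.InputFile;

    public class StaleFileGuard {
      // Compare the file length recorded in the manifest with the length of the object
      // currently at that path; a mismatch suggests the file was replaced outside Iceberg.
      // A same-length replacement would still slip through, which is why the proposal
      // above asks for a real version identifier (or checksum) in the manifest.
      static void check(Table table, FileScanTask task) {
        DataFile dataFile = task.file();
        InputFile current = table.io().newInputFile(dataFile.path().toString());
        if (current.getLength() != dataFile.fileSizeInBytes()) {
          throw new IllegalStateException(
              "Data file appears to have been replaced outside Iceberg: " + dataFile.path());
        }
      }
    }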