>
> Maybe the user does want to change a row in a file directly and replace
> the file to get an updated result quickly bypassing the Iceberg API
>

Actually, the above is one of the reasons our customers overwrite Parquet
files. They discover that a Parquet file contains incorrect data, fix it by
creating a new Parquet file with the corrected data, and then replace the old
version of the file with the new one, bypassing Iceberg completely.

On Mon, May 17, 2021 at 9:39 AM Jack Ye <[email protected]> wrote:

> I actually think there is an argument against returning an error after
> time t3. Maybe the user does want to change a row in a file directly and
> replace the file to get an updated result quickly, bypassing the Iceberg
> API. In that case, failing the query after t3 would block that use case.
> The statistics in the manifest might be wrong, but we can further argue
> that the user can directly modify statistics and replace files all the way
> up to the snapshot to make sure everything continues to work.
>
> In general, if a user decides to bypass the contract set by Iceberg, I
> believe that we should not try to predict that behavior and compensate the
> system for it, because users can bypass the contract in many different
> ways; it would open the door to many awkward use cases and, in the end,
> break fundamental assumptions.
>
> In the case you described, I think the existing Iceberg behavior makes
> total sense. If you would like to achieve what you described, you can
> potentially update your FileIO and leverage the versioning feature of the
> underlying storage to make sure that an uploaded file never has the same
> identifier, so that users cannot replace a file at t3. For example, if you
> are running on S3, you can enable S3 versioning and extend the S3FileIO so
> that each file path is not just the s3 path, but the s3 path + version.
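>
> A rough sketch of what I mean, just to make it concrete (illustrative
> only; the "#versionId" suffix convention and the class name below are
> assumptions, not existing Iceberg APIs - it reads a pinned object version
> with the AWS SDK v2 directly):
>
> import software.amazon.awssdk.core.ResponseInputStream;
> import software.amazon.awssdk.services.s3.S3Client;
> import software.amazon.awssdk.services.s3.model.GetObjectRequest;
> import software.amazon.awssdk.services.s3.model.GetObjectResponse;
>
> public final class VersionedS3Paths {
>   // "s3://bucket/key#versionId" -> GetObject pinned to that version
>   public static ResponseInputStream<GetObjectResponse> open(S3Client s3, String location) {
>     int sep = location.indexOf('#');
>     String path = sep < 0 ? location : location.substring(0, sep);
>     String versionId = sep < 0 ? null : location.substring(sep + 1);
>     String bucketAndKey = path.substring("s3://".length());
>     int slash = bucketAndKey.indexOf('/');
>     GetObjectRequest.Builder request = GetObjectRequest.builder()
>         .bucket(bucketAndKey.substring(0, slash))
>         .key(bucketAndKey.substring(slash + 1));
>     if (versionId != null) {
>       request.versionId(versionId);  // an overwritten key no longer matches this pinned version
>     }
>     return s3.getObject(request.build());
>   }
> }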
>
> But this is just what I think, let's see how others reply.
>
> -Jack
>
> On Sun, May 16, 2021 at 8:52 PM Vivekanand Vellanki <[email protected]>
> wrote:
>
>> From an Iceberg perspective, I understand what you are saying.
>>
>> A lot of our customers add/remove files to the table using scripts. The
>> typical workflow would be:
>> - Create Parquet files using other tools
>> - Add these files to the Iceberg table
>>
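>> For reference, the "add" step in those scripts is roughly the following
>> (a sketch against the Java API for an unpartitioned table; the size and
>> record-count values are placeholders the script would fill in):
>>
>> import org.apache.iceberg.DataFile;
>> import org.apache.iceberg.DataFiles;
>> import org.apache.iceberg.FileFormat;
>> import org.apache.iceberg.Table;
>>
>> public class AddExistingParquetFile {
>>   public static void add(Table table, String path, long sizeInBytes, long recordCount) {
>>     DataFile dataFile = DataFiles.builder(table.spec())
>>         .withPath(path)                     // Parquet file created by another tool
>>         .withFormat(FileFormat.PARQUET)
>>         .withFileSizeInBytes(sizeInBytes)
>>         .withRecordCount(recordCount)
>>         .build();
>>     table.newAppend().appendFile(dataFile).commit();  // register the file with Iceberg
>>   }
>> }
>>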
>> Similarly, for removing Parquet files from the table. I understand that
>> Iceberg doesn't delete the data file until all snapshots that refer to that
>> data file expire. However, the customer can delete the file directly - they
>> might understand that a query on a snapshot will fail.
>>
>> I am concerned that an unintentional mistake in updating the Iceberg
>> table leads to incorrect results when querying an Iceberg snapshot. It is
>> ok to return an error when a file referred to by a snapshot does not
>> exist.
>>
>> This issue can be addressed by adding a version identifier (e.g. mtime)
>> to the DataFile object and including this information in the manifest
>> file. This ensures that snapshot reads are correct even when users make
>> mistakes while adding files to or removing files from the table.
>>
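>> Purely as an illustration of the check I have in mind (nothing like this
>> exists in Iceberg today; the recorded mtime would have to come from a new
>> field in the manifest entry):
>>
>> import java.io.IOException;
>> import org.apache.hadoop.conf.Configuration;
>> import org.apache.hadoop.fs.FileStatus;
>> import org.apache.hadoop.fs.Path;
>>
>> public class MtimeCheck {
>>   public static void verify(String dataFilePath, long recordedMtime) throws IOException {
>>     Path path = new Path(dataFilePath);
>>     FileStatus status = path.getFileSystem(new Configuration()).getFileStatus(path);
>>     if (status.getModificationTime() != recordedMtime) {
>>       // the planner could fail the scan instead of silently reading replaced data
>>       throw new IllegalStateException(
>>           "Data file " + dataFilePath + " was modified after it was committed");
>>     }
>>   }
>> }
>>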
>> We can work on this, if there is sufficient interest.
>>
>> On Sun, May 16, 2021 at 8:34 PM <[email protected]> wrote:
>>
>>> In the real system each file would have a unique universal identifier.
>>> When Iceberg does a delete it doesn't actually remove the file; it
>>> creates a new metadata file which no longer includes that file. When you
>>> attempt to access the table at time one, you are actually just reading
>>> the first metadata file and not the new metadata file, which is missing
>>> the entry for the deleted file.
>>>
>>> The only way to end up in the scenario you describe is if you were
>>> manually deleting and adding files using the Iceberg internal API and
>>> not something like Spark or Flink.
>>>
>>> What actually happens is something like:
>>> T1: metadata says f1-uuid exists
>>>
>>> The data is deleted
>>> T2: metadata no longer lists f1
>>>
>>> New data is written
>>> T3: metadata says f3_uuid now exists
>>>
>>> Data files are only physically deleted by Iceberg through the expire
>>> snapshots command. This removes the snapshot metadata as well as any
>>> data files which are only referred to by the expired snapshots.
>>>
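>>> For example, through the table API that looks roughly like this (the
>>> seven-day cutoff is just an arbitrary example):
>>>
>>> import java.util.concurrent.TimeUnit;
>>> import org.apache.iceberg.Table;
>>>
>>> public class ExpireOldSnapshots {
>>>   public static void expire(Table table) {
>>>     long cutoff = System.currentTimeMillis() - TimeUnit.DAYS.toMillis(7);
>>>     table.expireSnapshots()
>>>         .expireOlderThan(cutoff)  // drop snapshot metadata older than the cutoff
>>>         .commit();                // also deletes data files referenced only by expired snapshots
>>>   }
>>> }
>>>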
>>> If you are using the internal API (org.apache.iceberg.Table) then it is
>>> your responsibility not to perform operations or delete files that would
>>> violate the uniqueness of each snapshot. In this case you would similarly
>>> solve the problem by just not physically deleting the file when you
>>> remove it. That said, having unique names every time you add data is
>>> usually a good safety measure.
>>>
>>> On May 16, 2021, at 4:53 AM, Vivekanand Vellanki <[email protected]>
>>> wrote:
>>>
>>>
>>> Hi,
>>>
>>> I would like to understand if Iceberg supports the following scenario:
>>>
>>>    - At time t1, there's a table with a file f1.parquet
>>>    - At time t2, f1.parquet is removed from the table. f1.parquet is
>>>    also deleted from the filesystem
>>>    - Querying table@t1 results in errors since f1.parquet is no longer
>>>    available in the filesystem
>>>    - At time t3, f1.parquet is recreated and added back to the table
>>>    - Querying table@t1 now results in potentially incorrect results
>>>    since f1.parquet is now present in the filesystem
>>>
>>> Should there be a version identifier for each data-file in the manifest
>>> file to handle such scenarios?
>>>
>>> Thanks
>>> Vivek
>>>
>>>
