It works for me too. 

.. Owen

> On Jul 3, 2019, at 11:27, Anton Okolnychyi <aokolnyc...@apple.com.invalid> 
> wrote:
> 
> Works for me too.
> 
>> On 3 Jul 2019, at 19:09, Erik Wright <erik.wri...@shopify.com.INVALID> wrote:
>> 
>> That works for me.
>> 
>> On Wed, Jul 3, 2019 at 2:01 PM Ryan Blue <rb...@netflix.com.invalid> wrote:
>>> How about 9AM PDT on Friday, 5 July then?
>>> 
>>>> On Wed, Jul 3, 2019 at 10:55 AM Owen O'Malley <owen.omal...@gmail.com> 
>>>> wrote:
>>>> I'd like to call in, but I'm out Thursday. Friday would work, except 11am 
>>>> to 1pm PDT.
>>>> 
>>>> .. Owen
>>>> 
>>>>> On Wed, Jul 3, 2019 at 10:42 AM Ryan Blue <rb...@netflix.com.invalid> 
>>>>> wrote:
>>>>> I'm available Thursday and Friday this week as well, but it's a holiday 
>>>>> in the US so some people may be out. If there are no objections from 
>>>>> anyone that would like to attend, then I'm up for that.
>>>>> 
>>>>>> On Wed, Jul 3, 2019 at 10:40 AM Anton Okolnychyi <aokolnyc...@apple.com> 
>>>>>> wrote:
>>>>>> I apologize for the delay on my side. I’ll still have to go through the 
>>>>>> last emails. I am available on Thursday/Friday this week and it would be 
>>>>>> great to sync.
>>>>>> 
>>>>>> Thanks,
>>>>>> Anton
>>>>>> 
>>>>>>> On 3 Jul 2019, at 01:29, Ryan Blue <rb...@netflix.com.INVALID> wrote:
>>>>>>> 
>>>>>>> Sorry I didn't get back to this thread last week. Let's try to have a 
>>>>>>> video call to sync up on this next week. What days would work for 
>>>>>>> everyone?
>>>>>>> 
>>>>>>> rb
>>>>>>> 
>>>>>>>> On Fri, Jun 21, 2019 at 9:06 AM Erik Wright <erik.wri...@shopify.com> 
>>>>>>>> wrote:
>>>>>>>> With regard to operation values, currently they are:
>>>>>>>> - append: data files were added and no files were removed.
>>>>>>>> - replace: data files were rewritten with the same data; i.e., 
>>>>>>>>   compaction, changing the data file format, or relocating data files.
>>>>>>>> - overwrite: data files were deleted and added in a logical overwrite 
>>>>>>>>   operation.
>>>>>>>> - delete: data files were removed and their contents logically deleted.
>>>>>>>> If deletion files (with or without data files) are appended to the 
>>>>>>>> dataset, will we consider that an `append` operation? If so, if 
>>>>>>>> deletion and/or data files are appended, and whole files are also 
>>>>>>>> deleted, will we consider that an `overwrite`?
>>>>>>>> 
>>>>>>>> Given that the only apparent purpose of the operation field is to 
>>>>>>>> optimize snapshot expiration, the above seems to meet its needs. An 
>>>>>>>> incremental reader can also skip `replace` snapshots but no others. 
>>>>>>>> Once it decides to read a snapshot I don't think there's any 
>>>>>>>> difference in how it processes the data for append/overwrite/delete 
>>>>>>>> cases.
>>>>>>>> 
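[Editorial note: the incremental-reader behavior described above can be sketched as follows. This is an illustrative Python sketch only; `Snapshot` and `snapshots_to_process` are hypothetical names, not Iceberg APIs.]

```python
from dataclasses import dataclass

@dataclass
class Snapshot:
    snapshot_id: int
    operation: str  # one of: "append", "replace", "overwrite", "delete"

def snapshots_to_process(snapshots):
    """Return the snapshots an incremental reader must actually read.

    Per the discussion above: "replace" snapshots rewrite the same logical
    data (compaction, file-format change, relocation) and can be skipped;
    append/overwrite/delete snapshots are processed the same way once a
    reader decides to read them.
    """
    return [s for s in snapshots if s.operation != "replace"]
```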
>>>>>>>>> On Thu, Jun 20, 2019 at 8:55 PM Ryan Blue <rb...@netflix.com> wrote:
>>>>>>>>> I don’t see that we need [sequence numbers] for file/offset-deletes, 
>>>>>>>>> since they apply to a specific file. They’re not harmful, but they 
>>>>>>>>> don’t seem relevant.
>>>>>>>>> 
>>>>>>>>> These delete files will probably contain a path and an offset and 
>>>>>>>>> could contain deletes for multiple files. In that case, the sequence 
>>>>>>>>> number can be used to eliminate delete files that don’t need to be 
>>>>>>>>> applied to a particular data file, just like the column equality 
>>>>>>>>> deletes. Likewise, it can be used to drop the delete files when there 
>>>>>>>>> are no data files with an older sequence number.
>>>>>>>>> 
>>>>>>>>> I don’t understand the purpose of the min sequence number, nor what 
>>>>>>>>> the “min data seq” is.
>>>>>>>>> 
>>>>>>>>> Min sequence number would be used for pruning delete files without 
>>>>>>>>> reading all the manifests to find out if there are old data files. If 
>>>>>>>>> no manifest with data for a partition contains a file older than some 
>>>>>>>>> sequence number N, then any delete file with a sequence number < N 
>>>>>>>>> can be removed.
>>>>>>>>> 
>>>>>>>> OK, so the minimum sequence number is an attribute of manifest files. 
>>>>>>>> Sounds good. It can likely permit us to optimize compaction operations 
>>>>>>>> as well (i.e., you can easily limit the operation to a subset of 
>>>>>>>> manifest files as long as they are the oldest ones).
>>>>>>>>  
>>>>>>>>> The “min data seq” is the minimum sequence number of a data file. 
>>>>>>>>> That seems like what we actually want for the pruning I described 
>>>>>>>>> above.
>>>>>>>>> 
>>>>>>>> I would expect a data file (appended rows or deletions by column 
>>>>>>>> value) to have a single sequence number that applies to the whole 
>>>>>>>> file. Even a delete-by-file-and-offset file can make do with only a single 
>>>>>>>> sequence number (which must be larger than the sequence numbers of all 
>>>>>>>> deleted files). Why do we need a "minimum" data sequence per file?
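[Editorial note: the sequence-number rules being discussed can be sketched as follows. Hypothetical helper names, not Iceberg APIs; the pruning threshold follows the "< N" rule stated in the thread.]

```python
def delete_applies(delete_seq, data_seq):
    """A delete file applies only to data files with an older
    (strictly smaller) sequence number."""
    return data_seq < delete_seq

def prunable_delete_files(delete_seqs, min_data_seq):
    """Per the rule above: if no data file is older than some sequence
    number N (the minimum data sequence number), then any delete file
    with a sequence number < N applies to nothing and can be removed."""
    return [d for d in delete_seqs if d < min_data_seq]
```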
>>>>>>>>> Off the top of my head [supporting non-key delete] requires adding 
>>>>>>>>> additional information to the manifest file, indicating the columns 
>>>>>>>>> that are used for the deletion. Only equality would be supported; if 
>>>>>>>>> multiple columns were used, they would be combined with boolean-and. 
>>>>>>>>> I don’t see anything too tricky about it.
>>>>>>>>> 
>>>>>>>>> Yes, exactly. I actually phrased it wrong initially. I think it would 
>>>>>>>>> be simple to extend the equality deletes to do this. We just need a 
>>>>>>>>> way to have global scope, not just partition scope.
>>>>>>>>> 
>>>>>>>> I don't think anything special needs to be done with regards to 
>>>>>>>> scoping/partitioning of delete files. When scanning one or more data 
>>>>>>>> files, one must also consider any and all deletion files that could 
>>>>>>>> apply to them. The only way to prune deletion files from consideration 
>>>>>>>> is:
>>>>>>>> 1. All of your data files have at least one partition column in common.
>>>>>>>> 2. The deletion file is also partitioned on that column (at least).
>>>>>>>> 3. The value sets of the data files do not overlap the value sets of 
>>>>>>>>    the deletion files in that column.
>>>>>>>>  So given a dataset of sessions that is partitioned by device form 
>>>>>>>> factor and date, for example, you could have a delete (user_id=9876) 
>>>>>>>> in a deletion file that is not partitioned. And it would be "in scope" 
>>>>>>>> for all of those data files.
>>>>>>>> 
>>>>>>>> If you had the same dataset partitioned by hash(user_id) and your 
>>>>>>>> deletes were _also_ partitioned by hash(user_id) you would be able to 
>>>>>>>> prune those deletes while scanning the sessions.
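[Editorial note: the pruning conditions above can be sketched as follows. `can_prune_delete` is a hypothetical helper, not an Iceberg API; partitions are modeled as a mapping from partition column name to the set of values present.]

```python
def can_prune_delete(data_partition, delete_partition):
    """A deletion file can be skipped only when it shares a partition
    column with the data files and the value sets in that column are
    disjoint. An unpartitioned deletion file (empty mapping) shares no
    column, so it is always in scope."""
    common = set(data_partition) & set(delete_partition)
    return any(data_partition[c].isdisjoint(delete_partition[c])
               for c in common)
```

For instance, with both sessions and deletes partitioned by hash(user_id), a delete file in a different hash bucket is prunable, while the unpartitioned user_id=9876 delete from the example above never is.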
>>>>>>>>> If we add this on a per-deletion file basis it is not clear if there 
>>>>>>>>> is any relevance in preserving the concept of a unique row ID.
>>>>>>>>> 
>>>>>>>>> Agreed. That’s why I’ve been steering us away from the debate about 
>>>>>>>>> whether keys are unique or not. Either way, a natural key delete must 
>>>>>>>>> delete all of the records it matches.
>>>>>>>>> 
>>>>>>>>> I would assume that the maximum sequence number should appear in the 
>>>>>>>>> table metadata
>>>>>>>>> 
>>>>>>>>> Agreed.
>>>>>>>>> 
>>>>>>>>> [W]ould you make it optional to assign a sequence number to a 
>>>>>>>>> snapshot? “Replace” snapshots would not need one.
>>>>>>>>> 
>>>>>>>>> The only requirement is that it is monotonically increasing. If one 
>>>>>>>>> isn’t used, we don’t have to increment. I’d say it is up to the 
>>>>>>>>> implementation to decide. I would probably increment it every time to 
>>>>>>>>> avoid errors.
>>>>>>>>> 
>>>>>>>>> -- 
>>>>>>>>> Ryan Blue
>>>>>>>>> Software Engineer
>>>>>>>>> Netflix
>>>>>>> 
>>>>>>> 
>>>>>>> -- 
>>>>>>> Ryan Blue
>>>>>>> Software Engineer
>>>>>>> Netflix
>>>>>> 
>>>>> 
>>>>> 
>>>>> -- 
>>>>> Ryan Blue
>>>>> Software Engineer
>>>>> Netflix
>>> 
>>> 
>>> -- 
>>> Ryan Blue
>>> Software Engineer
>>> Netflix
> 
