Upserts in Iceberg

Owen O'Malley Wed, 12 Jun 2019 14:50:55 -0700


> On May 15, 2019, at 12:54 PM, Ryan Blue <rb...@netflix.com.INVALID> wrote:
> 
> 2. Iceberg diff files should use synthetic keys
> 
> A lot of the discussion on the doc is about whether natural keys are 
> practical or what assumptions we can make or trade about them. In my opinion, 
> Iceberg tables will absolutely need natural keys for reasonable use cases. 
> And those natural keys will need to be unique. And Iceberg will need to rely 
> on engines to enforce that uniqueness.
> 
Agreed. One restriction that we should probably adopt is not allowing mutations 
of the natural keys or the partition/bucketing. Allowing such mutations while 
enforcing uniqueness would typically be expensive. Without those mutations, 
uniqueness is relatively cheap to enforce.


> But, there is a difference between table behavior and implementation. We can 
> use synthetic keys to implement the requirements of natural keys. Each row 
> should be identified by its file and position in a file. When deleting by a 
> natural key, we just need to find out what the synthetic key is and encode 
> that in the delete diff.
> 
+1

> With the physical representation using synthetic keys, we should also define 
> how to communicate a natural key constraint for a table. That way, writers 
> can fail if a write may violate the key constraints of a table.
> 
> 3. Synthetic keys should be based on filename and position
> 
> I think identifying the file in a synthetic key makes a lot of sense. This 
> would allow for delta file reuse as individual files are rewritten by a 
> “major” compaction and provides nice flexibility that fits with the format. 
> We will need to think through all the impacts, like how file relocation works 
> (e.g., move between regions) and the requirements for rewrites (must apply 
> the delta when rewriting).
> 
I’d recommend using the file and row number within the file. I believe Avro, 
ORC, and Parquet all track row numbers and thus they are a basically free 
synthetic id within each file. The critical feature is that each input split 
needs to know how many rows are above it in the file, so that delete files can 
be read efficiently.

.. Owen

> Open questions
> 
> There are also quite a few remaining questions for a design:
> 
> Should Iceberg use insert diff files? (My initial answer is no)
> Should Iceberg require diff compaction? Iceberg could require one delete diff 
> per partition, for example. (My answer: no)
> Should data files store synthetic key position? If so, why?
> Should there be a dense format for deletes, or just a sparse format?
> What is the scope of a delete diff? At a minimum, partition. But does it make 
> sense to build ways to restrict scope further?
> 
> On Fri, May 10, 2019 at 11:27 AM Anton Okolnychyi 
> <aokolnyc...@apple.com.invalid> wrote:
> We did take a look at Hudi. The overall design seems to be pretty complicated 
> and, unfortunately, I didn’t have time to explore every detail.
> 
> Here is my understanding (correct me if I am wrong):
> 
> - Hudi has RECORD_KEY, which is expected to be unique.
> - Hudi has PRECOMBINED_KEY, which is used to pick only one row in the 
> incoming batch if there are multiple rows with the same key. As I understand, 
> this isn't used on reads. It is used on writes to deduplicate rows with 
> identical keys within one incoming batch. For example, if we are inserting 10 
> records and two rows have the same key, PRECOMBINED_KEY will be used to pick 
> up only one row.
> - Once Hudi ensures the uniqueness of RECORD_KEY within the incoming batch, 
> it loads the Bloom filter index from all existing Parquet files in the 
> involved partitions (meaning, partitions spread from the input batch) and 
> tags each record as either an update or insert by mapping the incoming keys 
> to existing files for updates. At this point, it seems to rely on join.
> 
> Is my understanding correct? If so, do we want to consider joins on write? We 
> mentioned this technique as one way to ensure the uniqueness of natural keys 
> but we were concerned about the performance. Also, does Hudi support 
> record-level updates? 
> 
> Thanks,
> Anton
> 
>> On 10 May 2019, at 18:22, Erik Wright <erik.wri...@shopify.com.INVALID 
>> <mailto:erik.wri...@shopify.com.INVALID>> wrote:
>> 
>> Thanks for putting this forward.
>> 
>> Another term for the "lazy" approach would be "merge on read".
>> 
>> My team has built something internally that uses merge-on-read internally 
>> but uses an "Eager" materialization for publication to Presto. Roughly, we 
>> maintain a table metadata file that looks a bit like Iceberg's and tracks 
>> the "live" version of each partition as it is updated over time. We are 
>> looking into a solution that will allow us to push the merge-on-read all the 
>> way to Presto (and other consumers), and adding Merge-On-Read to Iceberg is 
>> one of the approaches we are considering.
>> 
>> It's worth noting that Hudi does have support for upserts/deletes as well, 
>> so that's another model to consider.
>> 
>> On Fri, May 10, 2019 at 8:30 AM Miguel Miranda 
>> <miguelnmira...@apple.com.invalid <mailto:miguelnmira...@apple.com.invalid>> 
>> wrote:
>> Hi,
>> 
>> As Anton said, we purposely avoided making a "decision" on which approach 
>> should be implemented in order to allow for a meaningful discussion with the 
>> community.
>> 
>> The document starts with an eager approach as it is straightforward and easy 
>> to understand: steps resemble the usual file level operations/manipulations 
>> frequently used by engineers when implementing Update/Delete/Upsert 
>> behaviour themselves, hopefully creating a conceptual bridge to the more 
>> involved designs. Right now, Iceberg has almost everything to implement the 
>> "eager" approach as we simply need to adjust the retry mechanism. For 
>> example, I have implemented a prototype of the eager solution with Spark and 
>> Iceberg.
>> 
>> We looked into many existing solutions for inspiration, but when there isn't 
>> a paper or code in the public domain it becomes hard to assess the 
>> underlying design, although some of it can be inferred from the API or 
>> documentation.
>> 
>> Best,
>> Miguel
>> 
>>> On 10 May 2019, at 11:57, Anton Okolnychyi <aokolnyc...@apple.com 
>>> <mailto:aokolnyc...@apple.com>> wrote:
>>> 
>>> Thanks for the feedback, Jacques!
>>> 
>>> You are correct, we kept the question of the best approach as open :) The 
>>> idea was to have a discussion in the community. Hopefully, we can reach a 
>>> consensus.
>>> 
>>> While the proposed “lazy” approaches certainly offer significant benefits, 
>>> they require more changes in Iceberg as well as in readers/query engines 
>>> (depending on how we want to merge base and diff files). For us, it is 
>>> important to understand whether the Iceberg community would even consider 
>>> such changes. 
>>> 
>>> Hive ACID 3 is one the projects we looked at. In fact, we spoke to Owen, 
>>> the original creator of updates/deletes/upserts in Hive. I believe the 
>>> “lazy” approaches are close to what Hive 3 does but with their own 
>>> distinctions that Iceberg allows us to have. It would be great to have 
>>> Owen’s feedback.
>>> 
>>> We don’t know the internals of Delta as updates/deletes/upserts are not 
>>> open source. My personal guess, yes, it might be similar to the “eager” 
>>> approach in our doc.
>>> 
>>> Jacques, could you share some insights how you implement the merge of 
>>> diffs? Is it done by readers?
>>> 
>>> Thanks,
>>> Anton
>>> 
>>>> On 10 May 2019, at 06:24, Jacques Nadeau <jacq...@dremio.com 
>>>> <mailto:jacq...@dremio.com>> wrote:
>>>> 
>>>> This is a nice doc and it covers many different options. Upon first skim, 
>>>> I don't see a strong argument for particular approach. D
>>>> 
>>>> In our own development, we've been leaning heavily towards what you 
>>>> describe in the document as "lazy with SRI". I believe this is consistent 
>>>> with what the Hive community did on top of Orc. It's interesting because 
>>>> my (maybe incorrect) understanding of the Databricks Delta approach is 
>>>> they chose what you title "eager" in their approach to upserts. They may 
>>>> also have a lazy approach for other types of mutations but I don't think 
>>>> they do.
>>>> 
>>>> Thanks again for putting this together!
>>>> Jacques
>>>> --
>>>> Jacques Nadeau
>>>> CTO and Co-Founder, Dremio
>>>> 
>>>> 
>>>> On Wed, May 8, 2019 at 3:42 AM Anton Okolnychyi 
>>>> <aokolnyc...@apple.com.invalid <mailto:aokolnyc...@apple.com.invalid>> 
>>>> wrote:
>>>> Hi folks,
>>>> 
>>>> Miguel (cc) and I have spent some time thinking about how to perform 
>>>> updates/deletes/upserts on top of Iceberg tables. This functionality is 
>>>> essential for many modern use cases. We've summarized our ideas in a doc 
>>>> [1], which, hopefully, will trigger a discussion in the community. The 
>>>> document presents different conceptual approaches alongside their 
>>>> trade-offs. We will be glad to consider any other ideas as well.
>>>> 
>>>> Thanks,
>>>> Anton
>>>> 
>>>> [1] - 
>>>> https://docs.google.com/document/d/1Pk34C3diOfVCRc-sfxfhXZfzvxwum1Odo-6Jj9mwK38/
>>>>  
>>>> <https://docs.google.com/document/d/1Pk34C3diOfVCRc-sfxfhXZfzvxwum1Odo-6Jj9mwK38/>
>>>> 
>>>> 
>>> 
>> 
> 
> 
> 
> -- 
> Ryan Blue
> Software Engineer
> Netflix

Re: Updates/Deletes/Upserts in Iceberg

Reply via email to