Re: [PR] Flink: Apply DeleteGranularity for writes [iceberg]

via GitHub Wed, 24 Apr 2024 19:02:52 -0700


aokolnychyi commented on PR #10200:
URL: https://github.com/apache/iceberg/pull/10200#issuecomment-2076189673


   After taking a closer look at `BaseTaskWriter`, I think we may have a 
correctness issue when encoding changes if the table contains multiple specs. 
Our current implementation of `BaseTaskWriter` assumes all writes are encoded 
against the current spec. However, what if there are matching keys in data 
written under other specs? The deletes we write will be scoped to the current 
partition spec and will not apply to data in other specs, so an upsert may 
fail to replace some records. I think we would eventually want to migrate 
Flink to new writers that inherit `PartitioningWriter`. That is out of scope 
for this PR, however.
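
   To make the multi-spec concern concrete, here is a minimal sketch in plain 
Java. It is a toy model, not Iceberg code: `Row`, `EqDelete`, `applies`, 
`partitionFor`, and `liveAfterUpsert` are hypothetical names, and the matching 
rule is a simplification that assumes an equality delete only applies to 
older rows in the same spec and partition.

```java
import java.util.ArrayList;
import java.util.List;

class SpecScopedDeleteSketch {
  /** Hypothetical stand-in for a data row; not an Iceberg class. */
  static final class Row {
    final int specId;
    final String partition;
    final long key;
    final long seq;

    Row(int specId, String partition, long key, long seq) {
      this.specId = specId;
      this.partition = partition;
      this.key = key;
      this.seq = seq;
    }
  }

  /** Hypothetical stand-in for an equality delete scoped to one spec and partition. */
  static final class EqDelete {
    final int specId;
    final String partition;
    final long key;
    final long seq;

    EqDelete(int specId, String partition, long key, long seq) {
      this.specId = specId;
      this.partition = partition;
      this.key = key;
      this.seq = seq;
    }
  }

  /** Simplified rule: same spec, same partition, same key, and the row is older. */
  static boolean applies(EqDelete delete, Row row) {
    return delete.specId == row.specId
        && delete.partition.equals(row.partition)
        && delete.key == row.key
        && row.seq < delete.seq;
  }

  /** Made-up partition values, one per spec, to keep the example small. */
  static String partitionFor(int specId) {
    return specId == 0 ? "day=2024-04-01" : "bucket=3";
  }

  /** Upserts key 42 under spec 1 and counts the rows that remain live for that key. */
  static int liveAfterUpsert(int oldSpecId) {
    int currentSpecId = 1;
    long key = 42L;

    List<Row> rows = new ArrayList<>();
    // Existing row, written earlier under oldSpecId.
    rows.add(new Row(oldSpecId, partitionFor(oldSpecId), key, 1L));

    // The upsert is encoded against the current spec only: one delete plus one new row.
    EqDelete delete = new EqDelete(currentSpecId, partitionFor(currentSpecId), key, 2L);
    rows.add(new Row(currentSpecId, partitionFor(currentSpecId), key, 2L));

    int live = 0;
    for (Row row : rows) {
      if (!applies(delete, row)) {
        live++;
      }
    }
    return live;
  }

  public static void main(String[] args) {
    System.out.println(liveAfterUpsert(1)); // prints 1: the old row is replaced
    System.out.println(liveAfterUpsert(0)); // prints 2: the old row in spec 0 survives
  }
}
```

   In this model, the second call leaves two live rows for the same key, 
which is the missed upsert described above.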
   
   I am also not sure we need `ContinuousFileScopedPositionDeleteWriter`. I 
understand we want to solve the companion issue before Flink migrates to 
`PartitioningWriter`, so we have to come up with a fix for `TaskWriter`. What 
about directly using `SortingPositionOnlyDeleteWriter` with file granularity in 
`BaseEqualityDeltaWriter`? We only need to pass a closure that creates new 
position delete writers, and that class should already sort deletes for us on 
the fly. We have never needed to persist deleted rows in position deletes, so 
there would be no behavior change.
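
   As a rough illustration of the idea (buffer positions, sort them, and 
obtain a fresh writer from a closure for each data file), here is a 
self-contained sketch in plain Java. It does not use the Iceberg API: 
`DeleteFile`, `SortingFileScopedWriter`, and `demo` are hypothetical names, 
and the real `SortingPositionOnlyDeleteWriter` may differ in its details.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.Supplier;

class FileScopedSortingSketch {
  /** Hypothetical stand-in for one position delete file; it just records entries. */
  static final class DeleteFile {
    final List<String> entries = new ArrayList<>();

    void write(String dataFile, long position) {
      entries.add(dataFile + "@" + position);
    }
  }

  /** Buffers (data file, position) pairs and emits one sorted delete file per data file. */
  static final class SortingFileScopedWriter {
    private final Supplier<DeleteFile> newWriter;
    // TreeMap and TreeSet keep data files and positions sorted as they arrive.
    private final Map<String, TreeSet<Long>> positionsByFile = new TreeMap<>();

    SortingFileScopedWriter(Supplier<DeleteFile> newWriter) {
      this.newWriter = newWriter;
    }

    void delete(String dataFile, long position) {
      positionsByFile.computeIfAbsent(dataFile, k -> new TreeSet<>()).add(position);
    }

    /** File granularity: ask the closure for a fresh writer for every data file. */
    List<DeleteFile> close() {
      List<DeleteFile> result = new ArrayList<>();
      for (Map.Entry<String, TreeSet<Long>> entry : positionsByFile.entrySet()) {
        DeleteFile out = newWriter.get();
        for (long position : entry.getValue()) {
          out.write(entry.getKey(), position);
        }
        result.add(out);
      }
      return result;
    }
  }

  /** Feeds deletes out of order and returns the resulting delete files. */
  static List<DeleteFile> demo() {
    SortingFileScopedWriter writer = new SortingFileScopedWriter(DeleteFile::new);
    writer.delete("data-b.parquet", 7);
    writer.delete("data-a.parquet", 9);
    writer.delete("data-b.parquet", 2);
    writer.delete("data-a.parquet", 4);
    return writer.close();
  }

  public static void main(String[] args) {
    for (DeleteFile file : demo()) {
      System.out.println(file.entries);
    }
    // prints:
    // [data-a.parquet@4, data-a.parquet@9]
    // [data-b.parquet@2, data-b.parquet@7]
  }
}
```

   The caller only supplies the closure (`DeleteFile::new` here); grouping by 
data file and ordering by position stay inside the sorting writer.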


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


