aokolnychyi commented on PR #10200: URL: https://github.com/apache/iceberg/pull/10200#issuecomment-2076189673
After taking a closer look at `BaseTaskWriter`, I think we may have a correctness issue when encoding changes if the table contains multiple specs. Our current implementation of `BaseTaskWriter` assumes all writes are encoded against the current spec. However, what if there are matching keys in other specs? The written deletes will be scoped to the current partition spec and will not apply to data in other specs, so some records may fail to be upserted. I think we would eventually want to migrate Flink to new writers that inherit `PartitioningWriter`. That is out of scope for this PR, however.

I am also not sure we need `ContinuousFileScopedPositionDeleteWriter`. I understand we want to solve the companion issue before Flink migrates to `PartitioningWriter`, so we have to come up with a fix for `TaskWriter`. What about directly using `SortingPositionOnlyDeleteWriter` with file granularity in `BaseEqualityDeltaWriter`? We only need to pass a closure that creates new position delete writers, and that class should already sort deletes for us on the fly. We never needed to persist deleted rows in position deletes, so there is no behavior change.
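
To make the multi-spec concern concrete, here is a toy sketch in plain Java. It is not Iceberg code: `SpecScopedDeletes`, `Row`, `Delete`, and `liveRows` are hypothetical names, and the only rule it models is the one described above, namely that a delete applies only to data written under the same spec.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these classes are hypothetical and are not Iceberg APIs.
class SpecScopedDeletes {

  // A data row tagged with the ID of the partition spec it was written under.
  static final class Row {
    final int specId;
    final String key;

    Row(int specId, String key) {
      this.specId = specId;
      this.key = key;
    }
  }

  // An equality delete tagged with the spec it was encoded against.
  static final class Delete {
    final int specId;
    final String key;

    Delete(int specId, String key) {
      this.specId = specId;
      this.key = key;
    }
  }

  // Keeps every row that no delete in the same spec matches by key.
  static List<Row> liveRows(List<Row> rows, List<Delete> deletes) {
    List<Row> live = new ArrayList<>();
    for (Row row : rows) {
      boolean deleted = false;
      for (Delete delete : deletes) {
        if (delete.specId == row.specId && delete.key.equals(row.key)) {
          deleted = true;
          break;
        }
      }
      if (!deleted) {
        live.add(row);
      }
    }
    return live;
  }

  public static void main(String[] args) {
    int oldSpec = 0;
    int currentSpec = 1;

    // Key "k1" exists under both specs before the upsert.
    List<Row> rows = new ArrayList<>();
    rows.add(new Row(oldSpec, "k1"));
    rows.add(new Row(currentSpec, "k1"));

    // The upsert encodes its delete against the current spec only.
    List<Delete> deletes = new ArrayList<>();
    deletes.add(new Delete(currentSpec, "k1"));

    // The stale copy written under the old spec is still visible.
    List<Row> live = liveRows(rows, deletes);
    System.out.println(live.size() + " live row(s), spec " + live.get(0).specId);
  }
}
```

In this model the row written under the old spec survives the upsert, which is the missed-upsert case I am worried about.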