amogh-jahagirdar commented on code in PR #11561:
URL: https://github.com/apache/iceberg/pull/11561#discussion_r1863707596
##########
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkBatchQueryScan.java:
##########
@@ -168,7 +168,7 @@ protected Map<String, DeleteFileSet> rewritableDeletes() {
for (ScanTask task : tasks()) {
FileScanTask fileScanTask = task.asFileScanTask();
for (DeleteFile deleteFile : fileScanTask.deletes()) {
- if (ContentFileUtil.isFileScoped(deleteFile)) {
+ if (ContentFileUtil.isFileScoped(deleteFile) ||
ContentFileUtil.isDV(deleteFile)) {
Review Comment:
Great catch, yes to both points. the file scoped Util correctly handles the
DV case by checking the referenced data file, and we for DVs we do need to
merge all existing position deletes, regardless of scope. I've fixed that and
added a test case for update/delete/merge which will create a mix of file
scoped and partition scoped deletes, and then finally produce a delete which
has the positions merged from these existing deletes.
##########
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkBatchQueryScan.java:
##########
@@ -168,7 +168,7 @@ protected Map<String, DeleteFileSet> rewritableDeletes() {
for (ScanTask task : tasks()) {
FileScanTask fileScanTask = task.asFileScanTask();
for (DeleteFile deleteFile : fileScanTask.deletes()) {
- if (ContentFileUtil.isFileScoped(deleteFile)) {
+ if (ContentFileUtil.isFileScoped(deleteFile) ||
ContentFileUtil.isDV(deleteFile)) {
Review Comment:
Great catch, yes to both points. the file scoped Util correctly handles the
DV case by checking the referenced data file, and we for DVs we do need to
merge all existing position deletes, regardless of scope. I've fixed that and
added a test case for update/delete/merge which will create a mix of file
scoped and partition scoped deletes, and then finally produce a DV which has
the positions merged from these existing deletes.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]