RussellSpitzer commented on issue #12467:
URL: https://github.com/apache/iceberg/issues/12467#issuecomment-2707097097
@sfc-gh-ygu
It has to apply both in Copy on Write.
So imagine I have a data file

*data.parquet*

| x |
|---|
| 1 |
| 2 |
| 3 |
| 4 |

and an equality delete file

*eq.parquet*

| x |
|---|
| 3 |

When I scan this file without a filter, I create a scan task:
```
{
file = data.parquet
deleteList = eq.parquet
}
```
The problem comes when I apply a filter.
Say I run `DELETE WHERE x = 2`.
This produces a scan with a pushed-down filter of `x = 2`, which is used in
`table.scan.filter`.
The filter condition is then checked against `eq.parquet`, whose min
and max for `x` are both 3. Since the filter requires `x = 2`, we get a
"CAN NOT MATCH" and ignore `eq.parquet`.
So I produce a scan task that looks like
```
{
file = data.parquet
deleteList = nil
}
```
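
To make the two planning outcomes concrete, here is a minimal toy model in Python. It is not Iceberg's planner: `DeleteFile`, `ScanTask`, `may_match`, and `plan_scan` are hypothetical names invented for this illustration, and the only behavior modeled is the min/max check described above.

```python
from dataclasses import dataclass, field


@dataclass
class DeleteFile:
    # Hypothetical stand-in for a delete file's column stats on x.
    name: str
    min_x: int
    max_x: int


@dataclass
class ScanTask:
    # Hypothetical stand-in for the scan task shown above.
    file: str
    delete_list: list = field(default_factory=list)


def may_match(delete_file, x_equals):
    """Min/max check: could any row in the delete file satisfy x = x_equals?"""
    return delete_file.min_x <= x_equals <= delete_file.max_x


def plan_scan(data_file, delete_files, x_equals=None):
    """Toy planner: with a filter, delete files that cannot match are dropped."""
    if x_equals is None:
        kept = list(delete_files)
    else:
        kept = [d for d in delete_files if may_match(d, x_equals)]
    return ScanTask(data_file, [d.name for d in kept])


eq = DeleteFile("eq.parquet", min_x=3, max_x=3)

unfiltered = plan_scan("data.parquet", [eq])            # keeps eq.parquet
filtered = plan_scan("data.parquet", [eq], x_equals=2)  # drops eq.parquet

print(unfiltered)
print(filtered)
```

In this sketch the pruning is harmless for a plain read with `x = 2`, since no row with `x = 2` can be removed by `eq.parquet`. It only becomes a problem when the task is used to rewrite the whole file.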
This scan task goes through the COW execution path, which applies
`DELETE WHERE x = 2` to every row. Because we are in COW, rows that are not
deleted are written into a new file. Here we have a problem: we aren't applying
the equality deletes, so the new file still contains the row with `x = 3`:

*data_2.parquet*

| x |
|---|
| 1 |
| 3 |
| 4 |
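
The rewrite step can be sketched the same way. This is a toy illustration under the assumptions of the example above, not Iceberg's COW implementation; `cow_rewrite` is a hypothetical helper that keeps every row surviving both the new `DELETE` and whatever equality deletes are attached to the scan task.

```python
def cow_rewrite(rows, delete_value, equality_deletes):
    """Return the rows to write into the new file.

    A row is kept only if it is not removed by the new DELETE
    and not removed by the equality deletes attached to the task.
    """
    return [x for x in rows if x != delete_value and x not in equality_deletes]


data = [1, 2, 3, 4]   # contents of data.parquet
eq_deletes = {3}      # contents of eq.parquet

# Delete list was pruned by the x = 2 filter, so nothing is applied.
buggy = cow_rewrite(data, 2, set())

# Delete list kept, so both the DELETE and the equality delete apply.
correct = cow_rewrite(data, 2, eq_deletes)

print("without equality deletes:", buggy)
print("with equality deletes:", correct)
```

With the pruned delete list the new file holds 1, 3, 4, which matches `data_2.parquet` above. With the delete list kept it holds 1, 4, which is what applying both would produce.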