wypoon commented on PR #5742:
URL: https://github.com/apache/iceberg/pull/5742#issuecomment-1244528824

   @flyrain @RussellSpitzer this is a follow-up to 
https://github.com/apache/iceberg/pull/4588. That change left one code path 
untested: counting positional deletes when a streaming delete filter is used. 
I tested that path manually by temporarily lowering the threshold for using a 
streaming filter in `DeleteFilter` from 100,000 to 2 and running 
`TestSparkReaderDeletes`. This change makes the threshold configurable so that 
tests can set it. I had included this change in the original PR at some point, 
but Russell asked me to separate it out because the PR was already quite 
complex.
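   
   To illustrate why a low threshold exercises the otherwise untested path, 
here is a minimal sketch. `PosDeleteFilterSketch` and everything in it are 
hypothetical stand-ins, not the actual `DeleteFilter` code. It assumes that 
fewer deletes than the threshold are loaded into an in-memory set, and that 
anything at or above the threshold is streamed against sorted positions:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a positional delete filter; NOT the Iceberg DeleteFilter API.
class PosDeleteFilterSketch {
  // Mirrors the 100,000 default mentioned above; the field name is an assumption.
  static final long DEFAULT_STREAM_THRESHOLD = 100_000L;

  private final long streamThreshold;
  private long deleteCount = 0L;
  private boolean usedStreaming = false;

  PosDeleteFilterSketch(long streamThreshold) {
    this.streamThreshold = streamThreshold;
  }

  // Both lists are assumed to be sorted in ascending order.
  List<Long> filter(List<Long> rowPositions, List<Long> deletePositions) {
    // Assumption: at or above the threshold we stream, below it we use a set.
    usedStreaming = deletePositions.size() >= streamThreshold;
    return usedStreaming
        ? streamingFilter(rowPositions, deletePositions)
        : setFilter(rowPositions, deletePositions);
  }

  // Loads all delete positions into memory and probes the set per row.
  private List<Long> setFilter(List<Long> rows, List<Long> deletes) {
    Set<Long> deleted = new HashSet<>(deletes);
    List<Long> kept = new ArrayList<>();
    for (Long pos : rows) {
      if (deleted.contains(pos)) {
        deleteCount++;
      } else {
        kept.add(pos);
      }
    }
    return kept;
  }

  // Walks rows and deletes in lockstep without materializing a set.
  private List<Long> streamingFilter(List<Long> rows, List<Long> deletes) {
    Iterator<Long> deleteIter = deletes.iterator();
    Long nextDelete = deleteIter.hasNext() ? deleteIter.next() : null;
    List<Long> kept = new ArrayList<>();
    for (Long pos : rows) {
      // Skip delete positions that are behind the current row.
      while (nextDelete != null && nextDelete < pos) {
        nextDelete = deleteIter.hasNext() ? deleteIter.next() : null;
      }
      if (nextDelete != null && nextDelete.equals(pos)) {
        deleteCount++;
      } else {
        kept.add(pos);
      }
    }
    return kept;
  }

  long deleteCount() {
    return deleteCount;
  }

  boolean usedStreaming() {
    return usedStreaming;
  }

  public static void main(String[] args) {
    // A threshold of 2 forces the streaming path with only two deletes.
    PosDeleteFilterSketch filter = new PosDeleteFilterSketch(2L);
    List<Long> kept = filter.filter(Arrays.asList(0L, 1L, 2L, 3L, 4L), Arrays.asList(1L, 3L));
    System.out.println(kept + " streaming=" + filter.usedStreaming()
        + " deletes=" + filter.deleteCount());
  }
}
```

   With the default threshold, a test with a handful of deletes would only 
ever reach the set-based branch in this sketch, so the counter in the 
streaming branch would go unexercised.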
   
   The logic behind this change is as follows:
   We add a `streamDeleteFilterThreshold` field to `SparkScan.ReadTask`. The 
`planInputPartitions` methods of `SparkBatch` and `SparkMicroBatchStream` both 
construct `SparkScan.ReadTask`s, and both classes take a `SparkReadConf`, so 
they can read the threshold from the `SparkReadConf` and pass it in when 
constructing each `SparkScan.ReadTask`. `SparkScan.RowReader` and 
`SparkScan.BatchReader` each take a `SparkScan.ReadTask` in their constructor, 
so they can get the threshold from the task and pass it up the constructor 
chain to their respective superclasses, `RowDataReader` and `BatchDataReader`. 
Those classes construct a `BaseReader.SparkDeleteFilter` in their 
`open(FileScanTask)` methods, which is where the threshold value is passed in.
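   
   The plumbing described above can be sketched roughly as follows. Every 
class here is a simplified, hypothetical stand-in (note the `Sketch` 
suffixes), not the real Iceberg class or its signature. Only the direction in 
which the value flows is taken from the description:

```java
// Hypothetical stand-in for SparkReadConf: the source of the threshold value.
class ReadConfSketch {
  private final long streamDeleteFilterThreshold;

  ReadConfSketch(long streamDeleteFilterThreshold) {
    this.streamDeleteFilterThreshold = streamDeleteFilterThreshold;
  }

  long streamDeleteFilterThreshold() {
    return streamDeleteFilterThreshold;
  }
}

// Hypothetical stand-in for SparkScan.ReadTask: carries the threshold to the readers.
class ReadTaskSketch {
  private final long streamDeleteFilterThreshold;

  ReadTaskSketch(long streamDeleteFilterThreshold) {
    this.streamDeleteFilterThreshold = streamDeleteFilterThreshold;
  }

  long streamDeleteFilterThreshold() {
    return streamDeleteFilterThreshold;
  }
}

// Hypothetical stand-in for SparkBatch / SparkMicroBatchStream.
class PlannerSketch {
  private final ReadConfSketch readConf;

  PlannerSketch(ReadConfSketch readConf) {
    this.readConf = readConf;
  }

  // Each planned task receives the threshold read from the conf.
  ReadTaskSketch[] planInputPartitions(int numTasks) {
    ReadTaskSketch[] tasks = new ReadTaskSketch[numTasks];
    for (int i = 0; i < numTasks; i++) {
      tasks[i] = new ReadTaskSketch(readConf.streamDeleteFilterThreshold());
    }
    return tasks;
  }
}

// Hypothetical stand-in for BaseReader.SparkDeleteFilter: the final consumer.
class DeleteFilterSketch {
  private final long threshold;

  DeleteFilterSketch(long threshold) {
    this.threshold = threshold;
  }

  long threshold() {
    return threshold;
  }
}

// Hypothetical stand-in for RowDataReader / BatchDataReader.
class DataReaderSketch {
  private final long streamDeleteFilterThreshold;

  DataReaderSketch(long streamDeleteFilterThreshold) {
    this.streamDeleteFilterThreshold = streamDeleteFilterThreshold;
  }

  // The real open(FileScanTask) takes a task; the argument is omitted in this sketch.
  DeleteFilterSketch open() {
    return new DeleteFilterSketch(streamDeleteFilterThreshold);
  }
}

// Hypothetical stand-in for SparkScan.RowReader / SparkScan.BatchReader.
class RowReaderSketch extends DataReaderSketch {
  RowReaderSketch(ReadTaskSketch task) {
    // Pass the threshold from the task up the constructor chain.
    super(task.streamDeleteFilterThreshold());
  }

  public static void main(String[] args) {
    // A test-sized threshold set on the conf reaches the delete filter unchanged.
    ReadTaskSketch[] tasks = new PlannerSketch(new ReadConfSketch(2L)).planInputPartitions(1);
    DeleteFilterSketch filter = new RowReaderSketch(tasks[0]).open();
    System.out.println("threshold=" + filter.threshold());
  }
}
```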


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

