nastra opened a new pull request, #13222:
URL: https://github.com/apache/iceberg/pull/13222

   This keeps track of all data files to be removed/rewritten in 
**MergingSnapshotProducer** and passes them to the **ManifestFilterManager** 
for deletes. When that **ManifestFilterManager** goes through the delete 
manifests, it also checks whether a DV references any of the data files to be 
removed. 
   This is needed in addition to https://github.com/apache/iceberg/pull/13245 
so that we can properly remove orphaned DVs when, for example, a metadata-only 
delete is performed.
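
The flow described above can be sketched roughly as follows. This is only a minimal illustration, not the code in this PR: the class `OrphanedDvSketch`, the `Dv` type, and both method names are hypothetical stand-ins, and file paths are modeled as plain strings.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class OrphanedDvSketch {

  /** Hypothetical stand-in for a DV entry read from a delete manifest. */
  static final class Dv {
    final String path;
    final String referencedDataFile;

    Dv(String path, String referencedDataFile) {
      this.path = path;
      this.referencedDataFile = referencedDataFile;
    }
  }

  // Paths of the data files that the pending commit removes or rewrites.
  private final Set<String> removedDataFiles = new HashSet<>();

  /** Producer side: record each data file that is about to be removed. */
  void trackRemovedDataFile(String dataFilePath) {
    removedDataFiles.add(dataFilePath);
  }

  /** Filter side: while walking delete manifests, collect DVs whose data file is going away. */
  List<Dv> findOrphanedDvs(List<Dv> dvsInDeleteManifests) {
    List<Dv> orphaned = new ArrayList<>();
    for (Dv dv : dvsInDeleteManifests) {
      if (removedDataFiles.contains(dv.referencedDataFile)) {
        orphaned.add(dv);
      }
    }
    return orphaned;
  }

  public static void main(String[] args) {
    OrphanedDvSketch sketch = new OrphanedDvSketch();
    sketch.trackRemovedDataFile("data/file-a.parquet");

    List<Dv> dvs = new ArrayList<>();
    dvs.add(new Dv("deletes/dv-1.puffin", "data/file-a.parquet"));
    dvs.add(new Dv("deletes/dv-2.puffin", "data/file-b.parquet"));

    // Only the DV that points at the removed data file is reported.
    for (Dv dv : sketch.findOrphanedDvs(dvs)) {
      System.out.println("orphaned DV: " + dv.path);
    }
  }
}
```

The hash-set lookup keeps the extra check at roughly constant cost per DV, so the added work grows with the number of tracked data files and the number of delete manifest entries.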
   
   The benchmark below shows that tracking the data files to be removed and 
then detecting orphaned DVs while the delete manifests are read adds only a 
small overhead to the runtime (scores are in s/op, so lower is better).
   
   without tracking data files to be removed
   =========================================
   
   ```
   Benchmark                                   (numFiles)  (percentDataFilesRewritten)  Mode  Cnt   Score   Error  Units
   RewriteDataFilesBenchmark.rewriteDataFiles       50000                            5    ss    5   0.497 ± 0.061   s/op
   RewriteDataFilesBenchmark.rewriteDataFiles       50000                           25    ss    5   0.649 ± 0.080   s/op
   RewriteDataFilesBenchmark.rewriteDataFiles       50000                           50    ss    5   1.889 ± 0.096   s/op
   RewriteDataFilesBenchmark.rewriteDataFiles       50000                          100    ss    5   2.093 ± 0.125   s/op
   RewriteDataFilesBenchmark.rewriteDataFiles      100000                            5    ss    5   0.503 ± 0.040   s/op
   RewriteDataFilesBenchmark.rewriteDataFiles      100000                           25    ss    5   1.941 ± 0.154   s/op
   RewriteDataFilesBenchmark.rewriteDataFiles      100000                           50    ss    5   2.139 ± 0.165   s/op
   RewriteDataFilesBenchmark.rewriteDataFiles      100000                          100    ss    5   2.474 ± 0.149   s/op
   RewriteDataFilesBenchmark.rewriteDataFiles      500000                            5    ss    5   1.054 ± 0.067   s/op
   RewriteDataFilesBenchmark.rewriteDataFiles      500000                           25    ss    5   2.577 ± 0.247   s/op
   RewriteDataFilesBenchmark.rewriteDataFiles      500000                           50    ss    5   3.318 ± 1.121   s/op
   RewriteDataFilesBenchmark.rewriteDataFiles      500000                          100    ss    5   5.792 ± 1.725   s/op
   RewriteDataFilesBenchmark.rewriteDataFiles     1000000                            5    ss    5   1.352 ± 0.122   s/op
   RewriteDataFilesBenchmark.rewriteDataFiles     1000000                           25    ss    5   3.252 ± 0.325   s/op
   RewriteDataFilesBenchmark.rewriteDataFiles     1000000                           50    ss    5   4.887 ± 0.548   s/op
   RewriteDataFilesBenchmark.rewriteDataFiles     1000000                          100    ss    5   8.297 ± 1.991   s/op
   RewriteDataFilesBenchmark.rewriteDataFiles     2000000                            5    ss    5   2.536 ± 0.232   s/op
   RewriteDataFilesBenchmark.rewriteDataFiles     2000000                           25    ss    5   5.227 ± 1.042   s/op
   RewriteDataFilesBenchmark.rewriteDataFiles     2000000                           50    ss    5   7.545 ± 2.052   s/op
   RewriteDataFilesBenchmark.rewriteDataFiles     2000000                          100    ss    5  18.058 ± 4.773   s/op
   ```
   
   with tracking data files to be removed
   ======================================
   ```
   Benchmark                                   (numFiles)  (percentDataFilesRewritten)  Mode  Cnt   Score   Error  Units
   RewriteDataFilesBenchmark.rewriteDataFiles       50000                            5    ss    5   0.626 ± 0.080   s/op
   RewriteDataFilesBenchmark.rewriteDataFiles       50000                           25    ss    5   0.813 ± 0.081   s/op
   RewriteDataFilesBenchmark.rewriteDataFiles       50000                           50    ss    5   2.013 ± 0.037   s/op
   RewriteDataFilesBenchmark.rewriteDataFiles       50000                          100    ss    5   2.316 ± 0.137   s/op
   RewriteDataFilesBenchmark.rewriteDataFiles      100000                            5    ss    5   0.676 ± 0.036   s/op
   RewriteDataFilesBenchmark.rewriteDataFiles      100000                           25    ss    5   2.036 ± 0.105   s/op
   RewriteDataFilesBenchmark.rewriteDataFiles      100000                           50    ss    5   2.125 ± 0.107   s/op
   RewriteDataFilesBenchmark.rewriteDataFiles      100000                          100    ss    5   2.856 ± 0.138   s/op
   RewriteDataFilesBenchmark.rewriteDataFiles      500000                            5    ss    5   1.319 ± 0.049   s/op
   RewriteDataFilesBenchmark.rewriteDataFiles      500000                           25    ss    5   3.314 ± 0.265   s/op
   RewriteDataFilesBenchmark.rewriteDataFiles      500000                           50    ss    5   4.219 ± 0.498   s/op
   RewriteDataFilesBenchmark.rewriteDataFiles      500000                          100    ss    5   6.164 ± 0.736   s/op
   RewriteDataFilesBenchmark.rewriteDataFiles     1000000                            5    ss    5   2.010 ± 0.224   s/op
   RewriteDataFilesBenchmark.rewriteDataFiles     1000000                           25    ss    5   3.961 ± 0.137   s/op
   RewriteDataFilesBenchmark.rewriteDataFiles     1000000                           50    ss    5   5.771 ± 0.839   s/op
   RewriteDataFilesBenchmark.rewriteDataFiles     1000000                          100    ss    5   9.082 ± 0.869   s/op
   RewriteDataFilesBenchmark.rewriteDataFiles     2000000                            5    ss    5   3.772 ± 0.602   s/op
   RewriteDataFilesBenchmark.rewriteDataFiles     2000000                           25    ss    5   6.558 ± 0.939   s/op
   RewriteDataFilesBenchmark.rewriteDataFiles     2000000                           50    ss    5   9.179 ± 1.330   s/op
   RewriteDataFilesBenchmark.rewriteDataFiles     2000000                          100    ss    5  23.623 ± 8.387   s/op
   ```
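
To compare the two runs row by row, the relative overhead can be computed as (with - without) / without. The snippet below is a small sketch of that arithmetic for three rows copied from the tables above; the class and method names are illustrative only, and the error bars are wide for the larger inputs, so the resulting percentages are rough indications rather than precise figures.

```java
import java.util.Locale;

public class OverheadSketch {

  /** Relative change in s/op when tracking is enabled, in percent. */
  static double relativeOverheadPercent(double withoutTracking, double withTracking) {
    return (withTracking - withoutTracking) / withoutTracking * 100.0;
  }

  public static void main(String[] args) {
    // {numFiles, percentDataFilesRewritten, score without tracking, score with tracking}
    double[][] rows = {
      {50000, 5, 0.497, 0.626},
      {100000, 50, 2.139, 2.125},
      {2000000, 100, 18.058, 23.623},
    };

    for (double[] row : rows) {
      double overhead = relativeOverheadPercent(row[2], row[3]);
      System.out.println(
          String.format(
              Locale.ROOT,
              "numFiles=%d percent=%d overhead=%.1f%%",
              (long) row[0],
              (long) row[1],
              overhead));
    }
  }
}
```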

