lawofcycles commented on PR #3320: URL: https://github.com/apache/iceberg-python/pull/3320#issuecomment-4370712868
## Benchmark results This PR brings three capabilities to PyIceberg's write path. 1. Transparent retry for concurrent writes. Users no longer need to implement retry logic around `table.append()` or `table.delete()`. 2. Data conflict validation. Incompatible concurrent modifications (e.g. concurrent deletes on the same data) are detected and rejected with `ValidationException`, preventing silent data corruption. 3. Efficient retry via data file reuse. On retry, only manifests are regenerated. Data files already written to S3 are reused, avoiding redundant Parquet writes. To validate (3), I benchmarked concurrent appends using the NYC Yellow Taxi dataset (2024-01, 2.9M rows, 19 columns) with Glue Data Catalog + S3. ### Before vs After Without this PR, concurrent appends fail immediately with `CommitFailedException`. Only one writer succeeds per batch, regardless of parallelism. | Workers | Before (no retry) | After (this PR) | |--------:|-------------------:|----------------:| | 2 | 50.0% | 100.0% | | 4 | 25.0% | 100.0% | | 8 | 12.5% | 100.0% | (N workers x 10 batches x 1K rows, `commit.retry.num-retries=10`, `commit.retry.min-wait-ms=500`) ### Internal retry vs user-side retry Compared the internal retry (this PR) against a user-side retry that catches `CommitFailedException` and re-does `load_table` + `append` from scratch. Both use the same backoff parameters (retries=15, min-wait=500ms). | Workers | Internal retry | User-side retry | Speedup | |--------:|---------------:|----------------:|--------:| | 2 | 33s | 46s | 1.4x | | 4 | 68s | 87s | 1.3x | | 8 | 167s | 299s | 1.8x | | 16 | 399s | 588s | 1.5x | (3 batches per worker, ~370K-1.5M rows per batch depending on worker count) Internal retry is faster because it reuses data files already written to S3 and only regenerates manifests on retry. User-side retry rewrites Parquet files on every attempt. Interestingly, internal retry actually performs *more* retries than user-side retry (88 vs 50 total retries at 8 workers), because the shorter retry window increases commit attempt density. Despite more retries, the total time is lower because each retry is much cheaper. ### Tuning `commit.retry.min-wait-ms` Tested different `min-wait-ms` values with 8 workers to find the optimal backoff for Glue. | min-wait-ms | Total time | Total retries | |------------:|-----------:|--------------:| | 100 | 158s | 78 | | 500 | 126s | 115 | | 1000 | 238s | 67 | | 2000 | 235s | 63 | | 3000 | 206s | 41 | The default (100ms, matching Java Iceberg) works reasonably well, but 500ms is optimal for Glue. Too short causes contention storms, too long wastes time waiting. The optimal value depends on the catalog's commit latency. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
