The GitHub Actions job "CI" on iceberg-rust.git/spawn-multiple-tasks-per-read has failed.
Run started by GitHub user tafia (triggered by tafia).

Head commit for run:
297d707984bfdfd0cece7a9d9e26d4be8129dabd / Johann Tuffe <[email protected]>
feat: (perf) allow spawning multiple tasks per read

Scanning of all files is both cpu and io intensive. While we can control the io parallelism via concurrency_limit* arguments, all the work is effectively done on the same tokio task, thus the same cpu.
This situation is one of the main reason why iceberg-rust is much slower than pyiceberg while reading large files (my test involved a 10G file).

This PR proposes to split scans into chunks which can be spawned independently to allow cpu parallelism.

In my tests (I have yet to find how to benchmark it in this project directly), reading a 10G file:
- before: 38s
- after: 16s
- pyiceberg: 15s

Report URL: https://github.com/apache/iceberg-rust/actions/runs/22252417856

With regards,
GitHub Actions via GitBox
