sdd opened a new pull request, #515:
URL: https://github.com/apache/iceberg-rust/pull/515

   NB: This PR builds on top of 
https://github.com/apache/iceberg-rust/pull/373 and 
https://github.com/apache/iceberg-rust/pull/512 and includes their commits. It 
should be rebased on main once they are merged, before this one gets merged.
   
   This brings some big performance gains vs the previous sequential batch 
processing. On my 12-core Ryzen 9 5900X, I see all 12 cores hitting about 50% 
utilization. 
   
   In my perf testing branch, a full table scan retrieving all the data read 
83 million rows in 7s, or over 11M rows/sec. Real-world throughput could be 
quite a bit higher, as 50% of the CPU usage was Minio serving up the data 
files.
   
   As with the concurrent file plan PR, the concurrency config has been set to 
fast defaults based on testing a range of values but can be user-configured.
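
   To illustrate the general idea of sequential versus bounded-concurrency 
batch processing with a user-configurable limit, here is a minimal sketch. It 
is not the code in this PR: it uses plain `std` threads rather than whatever 
the PR uses, and `FileTask`, `ScanConfig`, `concurrency_limit`, 
`process_sequential` and `process_concurrent` are hypothetical names invented 
for the example.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for the work attached to one data file.
#[derive(Debug, Clone)]
pub struct FileTask {
    pub rows: Vec<u64>,
}

/// Hypothetical config: an upper bound on tasks processed at once.
pub struct ScanConfig {
    pub concurrency_limit: usize,
}

/// Stand-in for "read and decode one file": here it just sums the rows.
fn process_task(task: &FileTask) -> u64 {
    task.rows.iter().sum()
}

/// Baseline: handle one task after another on the calling thread.
pub fn process_sequential(tasks: &[FileTask]) -> u64 {
    tasks.iter().map(process_task).sum()
}

/// Bounded concurrency: at most `concurrency_limit` worker threads pull
/// tasks from a shared queue and send their results back over a channel.
pub fn process_concurrent(tasks: Vec<FileTask>, config: &ScanConfig) -> u64 {
    // Treat a limit of 0 as 1 so that work always makes progress.
    let workers = config.concurrency_limit.max(1);
    let queue = Arc::new(Mutex::new(tasks.into_iter()));
    let (tx, rx) = mpsc::channel::<u64>();

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let queue = Arc::clone(&queue);
            let tx = tx.clone();
            thread::spawn(move || loop {
                // Take the next task; the lock is released at the end of
                // this statement, before the task is processed.
                let next = queue.lock().unwrap().next();
                match next {
                    Some(task) => tx.send(process_task(&task)).unwrap(),
                    None => break,
                }
            })
        })
        .collect();

    // Drop the original sender so the receiver ends once all workers finish.
    drop(tx);
    let total: u64 = rx.iter().sum();
    for handle in handles {
        handle.join().unwrap();
    }
    total
}
```

   Both functions return the same total for the same input; only the number of 
tasks in flight differs, which is the knob a caller would tune.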
   
   
![Screenshot_20240731_211559](https://github.com/user-attachments/assets/0c5cd1c2-9389-4864-a3bd-566aebf6f9a2)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

