jorisvandenbossche commented on issue #38389:
URL: https://github.com/apache/arrow/issues/38389#issuecomment-1774678058

   For reference, I ran your snippet above, repeating the timing part 
multiple times, on a **Linux (Ubuntu 20.04) Dell XPS 13 9380** (more than 4 
years old, 8th gen Intel Core i7, 4 cores / 8 threads). I get almost 2 GB/s 
for disk speed and around 1 GB/s for reading (just under 1 GB/s when reading 
from file, just above when reading from memory).
   
   (so at least it's not a simple mac vs linux issue)
   
   One environment characteristic that will significantly influence those 
numbers is parallelization (by default, Parquet reading uses all the 
available cores). So it might be worth running those timings with and without 
threads enabled, to check that the difference isn't caused by poor scaling on 
that front.
   On my laptop, I get the expected 3-4x speedup with threads enabled (the 
numbers above): with `use_threads=False` I get around 250-300 MB/s.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
