jorisvandenbossche commented on issue #38389: URL: https://github.com/apache/arrow/issues/38389#issuecomment-1774678058
For reference, I ran your snippet above, repeating the timing part multiple times, on a **Linux (Ubuntu 20.04) Dell XPS 13 9380** (more than 4 years old, 8th gen Intel Core i7, 4 cores / 8 threads). I get almost 2 GB/s for disk speed and around 1 GB/s for reading: just under 1 GB/s when reading from file, just above when reading from memory. So at least it's not simply a macOS vs. Linux issue.

One environment characteristic that will significantly influence these numbers is parallelization: by default, Parquet reading uses all available cores. So it might be worth running those timings with and without threads enabled, to rule out poor scaling on that front. On my laptop, I get the expected 3-4x speedup with threads enabled (the numbers above); with `use_threads=False` I get around 250-300 MB/s.