Dandandan commented on issue #20325: URL: https://github.com/apache/datafusion/issues/20325#issuecomment-3905282899
In addition to some of the overhead you already mentioned (CachedArrayReader / skips / filter + concat) that could be reduced, I think a lot of it is actually the IO pattern. Some observations/thoughts from running with target_partitions = CPU cores on my machine:

* CPU usage is not very high for the parquet decoding / DataFusion exec threads: ~90% of their time is spent waiting for data to arrive.
* This can be demonstrated by doubling the parallelism (2x cores): on my machine the query finishes ~20% faster, because more IO can run in parallel and hide latency.
* In this case the IO pattern (skipping data) and the `LocalFileSystem` object store also add overhead (open / close / allocation / metadata), as it reads more, smaller ranges; there is probably also more context switching going on:

  <img width="708" height="397" alt="Image" src="https://github.com/user-attachments/assets/94e9eca3-2450-41f1-8cd8-d7bd3b8afdb3" />

* I think it might help in this case to cache / remember the file handles (open / close is called many times, once per range): https://github.com/apache/arrow-rs-object-store/issues/18
* Tokio (with this query) spreads the IO out to ~30 blocking threads (and keeps them available / spreads the work across them), as the object store uses `spawn_blocking`; perhaps we can bring our own thread pool to reduce context switching.
* Perhaps we can also consider prefetching inside the parquet reader?
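To illustrate the file-handle caching idea from the list above, here is a minimal sketch of a per-path handle cache that serves ranged reads with positioned (`pread`-style) reads instead of reopening the file for every range. All names (`FileHandleCache`, `read_range`) are hypothetical; this is not the arrow-rs-object-store API, just a sketch of the technique under the assumption of a Unix target (it uses `FileExt::read_exact_at`).

```rust
use std::collections::HashMap;
use std::fs::File;
use std::io::Write;
use std::os::unix::fs::FileExt;
use std::sync::Mutex;

/// Hypothetical sketch: cache one open handle per path so repeated
/// ranged reads (as issued when skipping parquet pages) do not pay the
/// open / close / metadata cost on every range.
struct FileHandleCache {
    handles: Mutex<HashMap<String, File>>,
}

impl FileHandleCache {
    fn new() -> Self {
        Self { handles: Mutex::new(HashMap::new()) }
    }

    /// Read `len` bytes at `offset`, opening the file only on first use.
    fn read_range(&self, path: &str, offset: u64, len: usize) -> std::io::Result<Vec<u8>> {
        let mut handles = self.handles.lock().unwrap();
        if !handles.contains_key(path) {
            handles.insert(path.to_string(), File::open(path)?);
        }
        let file = handles.get(path).unwrap();
        let mut buf = vec![0u8; len];
        // Positioned read: no seek, so the shared handle stays stateless.
        file.read_exact_at(&mut buf, offset)?;
        Ok(buf)
    }
}

fn main() -> std::io::Result<()> {
    // Create a small demo file to read ranges from.
    let path = std::env::temp_dir().join("handle_cache_demo.bin");
    let mut f = File::create(&path)?;
    f.write_all(b"0123456789abcdef")?;
    drop(f);

    let cache = FileHandleCache::new();
    let p = path.to_str().unwrap();
    // Two disjoint ranges served from a single open handle.
    let a = cache.read_range(p, 0, 4)?;
    let b = cache.read_range(p, 10, 4)?;
    assert_eq!(a, b"0123".to_vec());
    assert_eq!(b, b"abcd".to_vec());
    assert_eq!(cache.handles.lock().unwrap().len(), 1);
    println!("ok");
    Ok(())
}
```

A real implementation would also need an eviction policy (open handles are a finite resource) and async integration, but even this shape avoids the repeated open/close visible in the profile above.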
