Dandandan commented on issue #19971: URL: https://github.com/apache/datafusion/issues/19971#issuecomment-3801621664
I am using samply, a sampling-based CPU profiler: https://github.com/apache/datafusion/blob/6524d91938d2ea6c764edd1a2bc3fd4c98cfcc9c/docs/source/library-user-guide/profiling.md#profiling-using-samply-cross-platform-profiler

The example in the screenshot is just one query (5) of `clickbench_partitioned` (which has 100 files).

I agree there is probably not much to be added re: listing of objects, but the heavy part (when running it locally against a number of files) is actually the CPU part: deserializing/converting/merging/... Parquet metadata + statistics, which is also done in `list_files_for_scan`. Moving this to use a number of threads should at least spread the work.
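
To illustrate what "spreading the work" could look like, here is a minimal sketch using only the Rust standard library. It is not DataFusion code: `FileStats`, `decode_stats`, `merge` and `collect_stats_parallel` are hypothetical names, each "file" is just a vector of integers standing in for raw footer bytes, and plain scoped threads are used in place of whatever runtime `list_files_for_scan` would actually run on.

```rust
use std::thread;

/// Hypothetical stand-in for the statistics decoded from one Parquet footer.
#[derive(Debug, Clone, Copy, PartialEq)]
struct FileStats {
    num_rows: u64,
    min: i64,
    max: i64,
}

/// Hypothetical CPU-bound step: "decode" the statistics of a single file.
/// The real work would be deserializing and converting Parquet metadata.
fn decode_stats(raw: &[i64]) -> FileStats {
    FileStats {
        num_rows: raw.len() as u64,
        min: raw.iter().copied().min().unwrap_or(i64::MAX),
        max: raw.iter().copied().max().unwrap_or(i64::MIN),
    }
}

/// Merge the statistics of two files (or two partial results) into one.
fn merge(a: FileStats, b: FileStats) -> FileStats {
    FileStats {
        num_rows: a.num_rows + b.num_rows,
        min: a.min.min(b.min),
        max: a.max.max(b.max),
    }
}

/// Split the files into at most `num_threads` chunks, decode and merge each
/// chunk on its own thread, then merge the per-thread partial results.
fn collect_stats_parallel(files: &[Vec<i64>], num_threads: usize) -> Option<FileStats> {
    if files.is_empty() {
        return None;
    }
    let n = num_threads.max(1);
    let chunk_size = (files.len() + n - 1) / n; // ceiling division, always >= 1
    thread::scope(|s| {
        // Spawn all workers first so they run concurrently, then join them.
        let handles: Vec<_> = files
            .chunks(chunk_size)
            .map(|chunk| s.spawn(move || chunk.iter().map(|f| decode_stats(f)).reduce(merge)))
            .collect();
        handles
            .into_iter()
            .filter_map(|h| h.join().expect("worker thread panicked"))
            .reduce(merge)
    })
}
```

Calling `collect_stats_parallel(&files, 4)` on 100 inputs would give each of four threads 25 files to decode. The per-file decode is independent, so only the final merge of the partial results stays single-threaded.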
