BlakeOrth commented on issue #19971:
URL: https://github.com/apache/datafusion/issues/19971#issuecomment-3800841802

   I'm sure there are performance gains to be had during the file listing phase 
of a cold query. I'm skeptical, though, that parallelizing the calls that 
actually list the objects backing a table would gain much (read: actual 
evidence of a performance improvement should be required here). The issue with 
parallelizing the listing itself is that the underlying `object_store` 
machinery is inherently sequential. Even if you invoke it in a parallel manner, 
the underlying implementations still have to make sequential calls (how do you 
parallelize the discovery of a set of unknown size that doesn't guarantee 
deterministic ordering of results?).
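
   To make the sequential dependency concrete, here is a minimal toy model of 
token-based pagination. It is a sketch only: `PagedStore`, `list_page`, and 
`list_all` are hypothetical names for illustration and are not the 
`object_store` API. The assumption is that each listing request returns one 
page plus an opaque token that is needed to request the next page.

```python
class PagedStore:
    """Hypothetical in-memory store that lists keys one page at a time."""

    def __init__(self, keys, page_size=2):
        self._keys = sorted(keys)
        self._page_size = page_size
        self.requests = 0  # counts round trips made against the store

    def list_page(self, prefix, token=None):
        """Return (keys, next_token); next_token is None on the last page."""
        self.requests += 1
        matching = [k for k in self._keys if k.startswith(prefix)]
        # The token is an offset here, but callers must treat it as opaque:
        # they cannot know it until the previous page has come back.
        start = 0 if token is None else int(token)
        end = start + self._page_size
        next_token = str(end) if end < len(matching) else None
        return matching[start:end], next_token


def list_all(store, prefix):
    """Collect every key under prefix by following tokens page by page."""
    keys, token = [], None
    while True:
        # Request N+1 cannot be issued until request N returns its token.
        page, token = store.list_page(prefix, token)
        keys.extend(page)
        if token is None:
            return keys
```

   In this model, the number of round trips for one prefix is fixed by the 
number of pages, and each round trip waits on the one before it, so issuing 
the calls from more threads does not shorten the chain.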
   
   @Dandandan I'm not familiar with the output of the benchmarking tool you're 
using here. Is the single thread that's operating during the listing operations 
actually exhibiting CPU bottlenecking, or is DataFusion spending time waiting 
on IO here?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

