BlakeOrth commented on PR #18160:
URL: https://github.com/apache/datafusion/pull/18160#issuecomment-3429171331

   @alamb Yes, agreed: this should improve performance on most datasets when 
using high-latency storage, especially since fetching the Parquet footer and 
then the Parquet metadata is a strictly sequential operation for each file.
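
To illustrate why the two fetches are sequential, here is a minimal sketch, not the code in this PR. It relies on the Parquet layout, where a file ends with a 4-byte little-endian metadata length followed by the `PAR1` magic, so the metadata range is unknown until those 8 bytes have been read. `CountingStore`, `get_range`, `prefetch_hint`, and `make_fake_file` are hypothetical names made up for this example and are not DataFusion or `object_store` APIs.

```python
import struct

MAGIC = b"PAR1"
FOOTER_LEN = 8  # 4-byte little-endian metadata length + 4-byte magic


class CountingStore:
    """Hypothetical in-memory stand-in for a storage backend; counts range reads."""

    def __init__(self, data: bytes):
        self.data = data
        self.reads = 0

    def get_range(self, start: int, end: int) -> bytes:
        self.reads += 1
        return self.data[start:end]


def parse_footer(footer: bytes) -> int:
    """Return the metadata length encoded in the final 8 bytes of the file."""
    if len(footer) != FOOTER_LEN or footer[4:] != MAGIC:
        raise ValueError("not a Parquet footer")
    return struct.unpack("<I", footer[:4])[0]


def read_metadata(store, size, prefetch_hint=None):
    """Fetch the raw metadata bytes, optionally via one speculative tail read."""
    if prefetch_hint is None:
        # Two strictly sequential reads: the footer first, then the metadata,
        # because the metadata offset depends on the footer's contents.
        meta_len = parse_footer(store.get_range(size - FOOTER_LEN, size))
        meta_start = size - FOOTER_LEN - meta_len
        return store.get_range(meta_start, size - FOOTER_LEN)

    # Speculatively read the last `prefetch_hint` bytes (at least the footer).
    tail_start = max(0, size - max(prefetch_hint, FOOTER_LEN))
    tail = store.get_range(tail_start, size)
    meta_len = parse_footer(tail[-FOOTER_LEN:])
    meta_start = size - FOOTER_LEN - meta_len
    if meta_start >= tail_start:
        # The hint covered the metadata, so no second read is needed.
        return tail[meta_start - tail_start : len(tail) - FOOTER_LEN]
    # The hint was too small: fall back to a second read for the metadata.
    return store.get_range(meta_start, size - FOOTER_LEN)


def make_fake_file(metadata: bytes, body: bytes = b"\x00" * 1000) -> bytes:
    """Build a byte string with Parquet-style framing (contents are not real Parquet)."""
    return MAGIC + body + metadata + struct.pack("<I", len(metadata)) + MAGIC
```

With a hint large enough to cover the metadata, `read_metadata` issues one read instead of two. With a hint that is too small, it falls back to the same two reads as the unhinted path.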
   
   The benchmark results here are a bit curious and look inconsistent (perhaps 
for reasons outside everyone's control). However, I wouldn't be too surprised 
to see minor performance improvements for some queries backed by local disk. 
The 8-byte fetch for the Parquet footer is smaller than the block size of 
pretty much any reasonable storage device or file system, so the local disk 
and file system are probably doing the same amount of work in either case, and 
this PR eliminates one extra call to disk and any internal runtime scheduling 
around managing that call.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

