Re: [PR] PARQUET-2374: Add metrics support for parquet file reader [parquet-mr]

via GitHub Wed, 22 Nov 2023 17:12:35 -0800


parthchandra commented on PR #1187:
URL: https://github.com/apache/parquet-mr/pull/1187#issuecomment-1823714212


   > > For the object stores, things to measure are
   > > 
   > > * time to open() and close() a file
   > > * time for a read after a backwards seek
   > > * time for a read after a forwards seek.
   > > * how many reads actually took place
   > > * for vector IO, whatever gets picked up there
   > > * were errors reported and retried, or throttling events
   > > * number of underlying GET requests
   > 
   > CMIW, it seems that these stats can be collected solely at the input stream level.
   
   Yes, they are best collected by the file system client API. However, it would be nice to be able to hook all of these metrics up together. Then we could, for instance, have a single Spark scan operator display the stats for the operator, the Parquet reader, and the input stream in one place.
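
To make the "one place" idea concrete, here is a rough sketch of what a shared sink could look like. Everything in it is hypothetical: `ScanMetrics`, `TrackedStream`, the layer prefixes, and the counter names are made up for illustration and are not the parquet-mr, Hadoop, or Spark APIs. It only assumes that each layer can be handed the same sink object, and it covers a few of the stream-level items from the list above (reads, forward seeks, backward seeks).

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical shared sink that every layer of a scan reports into. */
final class ScanMetrics {
    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    /** Adds delta to the counter named "layer.name", creating it on first use. */
    void add(String layer, String name, long delta) {
        counters.computeIfAbsent(layer + "." + name, k -> new LongAdder()).add(delta);
    }

    /** Returns a sorted copy of all counters, ready to show in one place. */
    Map<String, Long> snapshot() {
        Map<String, Long> out = new TreeMap<>();
        counters.forEach((k, v) -> out.put(k, v.sum()));
        return out;
    }
}

/** Hypothetical stand-in for a seekable input stream that reports its own stats. */
final class TrackedStream {
    private final ScanMetrics metrics;
    private long pos;

    TrackedStream(ScanMetrics metrics) {
        this.metrics = metrics;
    }

    /** Moves the position and records whether the seek went backwards or forwards. */
    void seek(long newPos) {
        if (newPos < pos) {
            metrics.add("stream", "backwardSeeks", 1);
        } else if (newPos > pos) {
            metrics.add("stream", "forwardSeeks", 1);
        }
        pos = newPos;
    }

    /** Pretends to read len bytes and records the read. */
    void read(int len) {
        metrics.add("stream", "reads", 1);
        metrics.add("stream", "bytesRead", len);
        pos += len;
    }
}

final class ScanMetricsDemo {
    public static void main(String[] args) {
        ScanMetrics metrics = new ScanMetrics();

        // Stream layer: jump to a (made-up) footer offset, read it, seek back.
        TrackedStream stream = new TrackedStream(metrics);
        stream.seek(4096);
        stream.read(512);
        stream.seek(0);

        // Reader and operator layers report into the same sink.
        metrics.add("reader", "rowGroupsRead", 1);
        metrics.add("operator", "rowsOutput", 1000);

        // One combined view across all three layers.
        System.out.println(metrics.snapshot());
    }
}
```

The layer prefix keeps the counters from the operator, the reader, and the stream distinguishable while still letting a UI render them from a single snapshot. A real integration would more likely adapt whatever statistics the file system client already exposes rather than wrap the stream like this.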


