parthchandra commented on PR #1187:
URL: https://github.com/apache/parquet-mr/pull/1187#issuecomment-1823714212

   > > For the object stores, things to measure are
   > > 
   > > * time to open() and close() a file
   > > * time for a read after a backwards seek
   > > * time for a read after a forwards seek.
   > > * how many reads actually took place
   > > * for vector IO, whatever gets picked up there
   > > * were errors reported and retried, or throttling events
   > > * number of underlying GET requests
   > 
   > Correct me if I'm wrong, but it seems that these stats can be collected solely at the input stream level.
   
   Yes, they are best collected by the file system client API. However, it would be nice to be able to hook all of these metrics together. Then we could, for instance, have a single Spark scan operator display stats for the operator, the Parquet reader, and the input stream in one place.
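
As a rough illustration of the idea (a sketch only, not code from this PR), the snippet below counts a few of the stream-level metrics listed above and then merges them with operator-level and reader-level counters into one report. Every name here (`StreamStats`, `CountingStream`, `merged_report`, and the `operator.`/`reader.`/`stream.` key prefixes) is hypothetical and does not correspond to any parquet-mr, Hadoop, or Spark API. Python is used only to keep the sketch short.

```python
import io
from dataclasses import asdict, dataclass


@dataclass
class StreamStats:
    """Hypothetical counters for a subset of the stream-level metrics."""
    reads: int = 0
    bytes_read: int = 0
    forward_seeks: int = 0
    backward_seeks: int = 0


class CountingStream:
    """Wraps a seekable binary stream and records reads and seeks."""

    def __init__(self, inner, stats):
        self._inner = inner
        self.stats = stats

    def seek(self, pos):
        # Classify the seek relative to the current position.
        current = self._inner.tell()
        if pos > current:
            self.stats.forward_seeks += 1
        elif pos < current:
            self.stats.backward_seeks += 1
        return self._inner.seek(pos)

    def read(self, n=-1):
        data = self._inner.read(n)
        self.stats.reads += 1
        self.stats.bytes_read += len(data)
        return data


def merged_report(operator_stats, reader_stats, stream_stats):
    """Flatten the per-layer counters into one dict with prefixed keys."""
    layers = (
        ("operator", operator_stats),
        ("reader", reader_stats),
        ("stream", asdict(stream_stats)),
    )
    return {
        f"{prefix}.{key}": value
        for prefix, layer in layers
        for key, value in layer.items()
    }


# Usage: an in-memory buffer stands in for an object-store input stream.
stats = StreamStats()
stream = CountingStream(io.BytesIO(b"0123456789"), stats)
stream.seek(6)   # forward seek from offset 0
stream.read(2)
stream.seek(1)   # backward seek from offset 8
stream.read(3)

# The operator and reader counters are made-up placeholder values.
report = merged_report({"rows": 4}, {"pages": 1}, stats)
```

The point of the flat, prefixed report is that a single consumer, such as a scan operator's metrics display, can show all three layers without knowing how each one gathers its numbers.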

