parthchandra commented on PR #1187:
URL: https://github.com/apache/parquet-mr/pull/1187#issuecomment-1823714212
> > For the object stores, things to measure are
> >
> > * time to open() and close() a file
> > * time for a read after a backwards seek
> > * time for a read after a forwards seek
> > * how many reads actually took place
> > * for vector IO, whatever gets picked up there
> > * were errors reported and retried, or throttling events
> > * number of underlying GET requests
>
> CMIW, it seems that these stats can be collected solely at the input stream level.

Yes, they are best collected by the file system client API. However, it would be nice to be able to hook all these metrics together. Then we could, for instance, have a single Spark scan operator display stats for the operator, the Parquet reader, and the input stream in one place.
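
To make the idea concrete, here is a minimal, self-contained sketch. It does not use the parquet-mr, Hadoop, or Spark APIs. Every name in it (`MetricsSource`, `SeekableStream`, `InstrumentedStream`, `combine`) is hypothetical. Stream-level counters (reads, bytes read, forward and backward seeks, read time) are gathered by a wrapper around the stream. Each layer exposes its counters through one small interface, and a scan operator merges them under per-layer prefixes so they can be shown together. In a real system the stream counters would come from the file system client rather than from a wrapper like this one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ScanMetricsSketch {

    /** Hypothetical: any layer (operator, reader, stream) that can report named counters. */
    interface MetricsSource {
        Map<String, Long> metrics();
    }

    /** Hypothetical stand-in for a file system client's seekable input stream. */
    interface SeekableStream {
        long getPos();
        void seek(long newPos);
        int read(byte[] buf, int off, int len);
    }

    /** In-memory stream so the sketch runs without any file system. */
    static final class ByteArraySeekableStream implements SeekableStream {
        private final byte[] data;
        private long pos;
        ByteArraySeekableStream(byte[] data) { this.data = data; }
        public long getPos() { return pos; }
        public void seek(long newPos) { pos = newPos; }
        public int read(byte[] buf, int off, int len) {
            if (pos >= data.length) return -1; // end of data
            int n = (int) Math.min(len, data.length - pos);
            System.arraycopy(data, (int) pos, buf, off, n);
            pos += n;
            return n;
        }
    }

    /** Wraps a stream and records stream-level stats: reads, bytes, seek direction, read time. */
    static final class InstrumentedStream implements SeekableStream, MetricsSource {
        private final SeekableStream in;
        private long reads, bytesRead, forwardSeeks, backwardSeeks, readNanos;
        InstrumentedStream(SeekableStream in) { this.in = in; }
        public long getPos() { return in.getPos(); }
        public void seek(long newPos) {
            long cur = in.getPos();
            if (newPos > cur) forwardSeeks++;
            else if (newPos < cur) backwardSeeks++;
            in.seek(newPos);
        }
        public int read(byte[] buf, int off, int len) {
            long start = System.nanoTime();
            int n = in.read(buf, off, len);
            readNanos += System.nanoTime() - start;
            reads++;
            if (n > 0) bytesRead += n;
            return n;
        }
        public Map<String, Long> metrics() {
            Map<String, Long> m = new LinkedHashMap<>();
            m.put("reads", reads);
            m.put("bytesRead", bytesRead);
            m.put("forwardSeeks", forwardSeeks);
            m.put("backwardSeeks", backwardSeeks);
            m.put("readNanos", readNanos);
            return m;
        }
    }

    /** Merges every layer's counters into one map, keyed as "<layer>.<counter>". */
    static Map<String, Long> combine(Map<String, MetricsSource> layers) {
        Map<String, Long> all = new LinkedHashMap<>();
        layers.forEach((prefix, src) ->
            src.metrics().forEach((k, v) -> all.put(prefix + "." + k, v)));
        return all;
    }

    public static void main(String[] args) {
        InstrumentedStream stream =
            new InstrumentedStream(new ByteArraySeekableStream(new byte[100]));
        byte[] buf = new byte[10];
        stream.seek(50);                 // forward seek
        stream.read(buf, 0, buf.length);
        stream.seek(0);                  // backward seek
        stream.read(buf, 0, buf.length);

        // Hypothetical reader and operator layers, each reporting one counter.
        MetricsSource reader = () -> Map.of("rowGroupsRead", 1L);
        MetricsSource operator = () -> Map.of("rowsOutput", 42L);

        Map<String, MetricsSource> layers = new LinkedHashMap<>();
        layers.put("operator", operator);
        layers.put("reader", reader);
        layers.put("stream", stream);
        System.out.println(combine(layers));
    }
}
```

The prefix-per-layer design keeps each component ignorant of the others. The stream only knows its own counters, and the scan operator decides how to present the merged view.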