BlakeOrth opened a new issue, #18232:
URL: https://github.com/apache/datafusion/issues/18232

   ### Is your feature request related to a problem or challenge?
   
   As noted in the comment chain here: 
    - https://github.com/apache/datafusion/pull/18139#discussion_r2440968965
   
   The duration statistic reported by some of the instrumented object store's 
methods, while technically accurate, can be misleading for users. For example, 
the duration reported for a `put_multipart` call is the time spent initiating 
a multipart put session with the backing store, not the time spent pushing 
data to it. Users would likely expect the latter, since that is the part of 
the process where actual "work" with the backing store is done. Additionally, 
duration-based caveats like this are not readily apparent without 
understanding both the instrumentation code in `datafusion` and how operations 
work in `object_store`. 
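
   To illustrate the gap, here is a minimal, std-only sketch. It does not use 
the real `object_store` or `datafusion` APIs: `FakeUpload`, `initiate_upload`, 
and `measure` are hypothetical names, and `sleep` stands in for network I/O. 
It only shows how a timer stopped right after session setup differs from one 
that also covers the data transfer.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a multipart upload session (not the `object_store` API).
struct FakeUpload {
    parts_written: usize,
}

impl FakeUpload {
    /// Pretend to push one part to the backing store.
    fn put_part(&mut self, transfer_time: Duration) {
        sleep(transfer_time);
        self.parts_written += 1;
    }
}

/// Pretend to open a multipart session; assumed cheap compared to moving data.
fn initiate_upload(setup_time: Duration) -> FakeUpload {
    sleep(setup_time);
    FakeUpload { parts_written: 0 }
}

/// Returns (time spent on initiation only, time spent on initiation plus transfer).
fn measure(setup: Duration, per_part: Duration, parts: usize) -> (Duration, Duration) {
    let start = Instant::now();
    let mut upload = initiate_upload(setup);
    // Stopping the timer here captures only the session setup.
    let initiation_only = start.elapsed();

    for _ in 0..parts {
        upload.put_part(per_part);
    }
    // Stopping the timer here also captures the data transfer.
    let end_to_end = start.elapsed();

    assert_eq!(upload.parts_written, parts);
    (initiation_only, end_to_end)
}
```

   With a short setup and several slow parts, the first value stays small 
while the second grows with the amount of data written, which is the number a 
user would probably expect to see.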
   
   Since the instrumented object store is currently mostly a 
development/debug utility, the above caveats are likely tolerable. However, 
scrutinizing and improving the accounting for the collected and reported 
durations would make the instrumented object store more useful for profiling 
work that focuses strictly on the runtime duration of operations.
   
   ### Describe the solution you'd like
   
   I would like additional logic added to the instrumented object store so 
that the collected and reported duration statistics are in line with an end 
user's expectations.
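
   One possible shape for such logic, sketched with std only and under loud 
assumptions: `TimedUpload` is a hypothetical wrapper, not an existing 
`datafusion` or `object_store` type, and the closure stands in for whatever 
call actually writes a part. The idea is to accumulate the time spent in each 
write and report the total when the upload completes, rather than timing only 
the call that opens the session.

```rust
use std::time::{Duration, Instant};

/// Hypothetical wrapper that times every part written through it.
struct TimedUpload<F: FnMut(&[u8])> {
    inner: F,       // stand-in for the real "write one part" operation
    busy: Duration, // total time spent inside `inner`
    bytes: usize,   // total bytes handed to `inner`
}

impl<F: FnMut(&[u8])> TimedUpload<F> {
    fn new(inner: F) -> Self {
        Self { inner, busy: Duration::ZERO, bytes: 0 }
    }

    /// Write one part and add its duration to the running total.
    fn put_part(&mut self, data: &[u8]) {
        let start = Instant::now();
        (self.inner)(data);
        self.busy += start.elapsed();
        self.bytes += data.len();
    }

    /// Finish the upload and hand back the accumulated statistics.
    fn complete(self) -> (Duration, usize) {
        (self.busy, self.bytes)
    }
}
```

   Reporting at completion is a design choice in this sketch; an 
implementation could equally report per part.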
   
   ### Describe alternatives you've considered
   
   If the goal is just to make sure the reported duration stats are not 
misleading, duration could be omitted from various operations (and that 
omission accounted for when computing summary statistics). This would keep the 
reported statistics from being misleading, but it would also reduce the 
granularity of reporting, which seems somewhat undesirable.
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

