BlakeOrth opened a new pull request, #18064:
URL: https://github.com/apache/datafusion/pull/18064

   
   
   ## Which issue does this PR close?
   
   This PR does not fully close the issue below, but it is an incremental 
building block for: 
    - https://github.com/apache/datafusion/issues/17207
   
   The full context of how this code is likely to progress can be seen in the 
POC for this effort:
    - https://github.com/apache/datafusion/pull/17266
   
   ## Rationale for this change
   
   For queries that make many calls to an instrumented object store, printing 
every call in full along with the summary of those calls could generate 
thousands of lines of output. Allowing users to see only a summary in these 
cases helps ensure the instrumented object store does not completely dominate 
the output for a query.
   
   ## What changes are included in this PR?
   
    - Adds the ability for a user to choose a summary-only output for an 
instrumented object store when using the CLI
    - Renames the existing "enabled" setting, which displays both a summary and 
detailed usage for each object store call, to `Trace` to improve clarity
    - Adds test cases for summary-only output and modifies existing tests 
to use trace
    - Updates user guide docs to reflect the CLI flag and command changes
   
   ## Are these changes tested?
   
   Yes. Additional unit tests have been added, and the existing integration 
test has been augmented to exercise the new option(s).
   
   Example functional output:
   ```console
   ./datafusion-cli --object-store-profiling trace
   ```
   ```sql
   DataFusion CLI v50.2.0
   > CREATE EXTERNAL TABLE hits
   STORED AS PARQUET
   LOCATION 
'https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_1.parquet';
   0 row(s) fetched.
   Elapsed 0.532 seconds.
   
   Object Store Profiling
   Instrumented Object Store: instrument_mode: Trace, inner: HttpStore
   2025-10-14T22:26:13.185625701+00:00 operation=Get duration=0.035335s size=8 
range: bytes=174965036-174965043 
path=hits_compatible/athena_partitioned/hits_1.parquet
   2025-10-14T22:26:13.221015783+00:00 operation=Get duration=0.045423s 
size=34322 range: bytes=174930714-174965035 
path=hits_compatible/athena_partitioned/hits_1.parquet
   
   Summaries:
   Get
   count: 2
   duration min: 0.035335s
   duration max: 0.045423s
   duration avg: 0.040379s
   size min: 8 B
   size max: 34322 B
   size avg: 17165 B
   size sum: 34330 B
   
   > \object_store_profiling summary
   ObjectStore Profile mode set to Summary
   > CREATE EXTERNAL TABLE hits2
   STORED AS PARQUET
   LOCATION 
'https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_2.parquet';
   0 row(s) fetched.
   Elapsed 0.179 seconds.
   
   Object Store Profiling
   Instrumented Object Store: instrument_mode: Summary, inner: HttpStore
   Summaries:
   Get
   count: 2
   duration min: 0.021558s
   duration max: 0.022129s
   duration avg: 0.021843s
   size min: 8 B
   size max: 55508 B
   size avg: 27758 B
   size sum: 55516 B
   
   >
   ```
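
The size figures in the summaries above (count, min, max, avg, sum) can be reproduced with a small aggregation. The sketch below is only an illustration of that arithmetic: `SizeSummary` and `summarize_sizes` are hypothetical names, and the use of integer division for the average is an assumption, not necessarily what the CLI does.

```rust
/// Hypothetical aggregate over the byte sizes of recorded calls.
#[derive(Debug, PartialEq)]
pub struct SizeSummary {
    pub count: usize,
    pub min: u64,
    pub max: u64,
    pub avg: u64,
    pub sum: u64,
}

/// Summarize a slice of sizes; returns `None` when no calls were recorded.
pub fn summarize_sizes(sizes: &[u64]) -> Option<SizeSummary> {
    // `min`/`max` return `None` on an empty slice, which `?` propagates.
    let min = *sizes.iter().min()?;
    let max = *sizes.iter().max()?;
    let sum: u64 = sizes.iter().sum();
    let count = sizes.len();
    Some(SizeSummary {
        count,
        min,
        max,
        // Integer division is an assumption made for this sketch.
        avg: sum / count as u64,
        sum,
    })
}
```

Feeding it the two `Get` sizes from the trace run, `summarize_sizes(&[8, 34322])`, gives a sum of 34330 and an average of 17165, matching the printed summary.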
   
   ## Are there any user-facing changes?
   
   Yes. The existing `--object-store-profiling` CLI flag and the associated 
`\object_store_profiling` command were changed. The user documentation has been 
updated to reflect these changes.
   
   ##
   cc @alamb 
   (I believe the previous PR that was merged for this effort was the last 
major set of core functionality! :tada: The remaining PRs should all be pretty 
concise and just fill out the small bits of missing implementation.)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

