BlakeOrth opened a new pull request, #18064:
URL: https://github.com/apache/datafusion/pull/18064
## Which issue does this PR close?
This PR does not fully close the issue below, but it is an incremental
building block for:
- https://github.com/apache/datafusion/issues/17207
The full context of how this code is likely to progress can be seen in the
POC for this effort:
- https://github.com/apache/datafusion/pull/17266
## Rationale for this change
For queries that make many calls to an instrumented object store, printing
every call along with the summary of those calls could generate thousands of
lines of output. Allowing users to see only a summary in these cases helps
ensure the instrumented object store does not completely dominate the output
for a query.
## What changes are included in this PR?
- Adds the ability for a user to choose summary-only output for an
instrumented object store when using the CLI
- Renames the existing "enabled" setting, which displays both a summary and
detailed usage for each object store call, to `Trace` to improve clarity
- Adds test cases for summary-only mode and modifies existing tests to use
trace
- Updates the user guide docs to reflect the CLI flag and command changes
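For readers who want a picture of the mode selection described above, here is a minimal sketch, not the code in this PR. The `ProfileMode` enum, its `Disabled` variant, the case-insensitive parsing, and the `shows_calls` / `shows_summary` helpers are all assumptions made for illustration; only the `summary` and `trace` values come from the description.

```rust
use std::str::FromStr;

/// Hypothetical profiling modes (names assumed for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum ProfileMode {
    /// Assumed default: no profiling output at all.
    #[default]
    Disabled,
    /// Print only the per-operation summaries.
    Summary,
    /// Print every object store call, followed by the summaries.
    Trace,
}

impl FromStr for ProfileMode {
    type Err = String;

    /// Parses a mode name. Case-insensitive matching is an assumption.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "disabled" => Ok(Self::Disabled),
            "summary" => Ok(Self::Summary),
            "trace" => Ok(Self::Trace),
            other => Err(format!("unknown profiling mode: {other}")),
        }
    }
}

impl ProfileMode {
    /// True when individual calls should be listed.
    pub fn shows_calls(self) -> bool {
        matches!(self, Self::Trace)
    }

    /// True when the aggregated summaries should be printed.
    pub fn shows_summary(self) -> bool {
        matches!(self, Self::Summary | Self::Trace)
    }
}
```

With this shape, `"trace".parse::<ProfileMode>()` yields `Ok(ProfileMode::Trace)`, while the old `"enabled"` value is rejected with an error.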
## Are these changes tested?
Yes. Additional unit tests have been added, and the existing integration
test has been augmented to exercise the new option(s).
Example functional output:
```console
./datafusion-cli --object-store-profiling trace
```
```sql
DataFusion CLI v50.2.0
> CREATE EXTERNAL TABLE hits
STORED AS PARQUET
LOCATION
'https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_1.parquet';
0 row(s) fetched.
Elapsed 0.532 seconds.
Object Store Profiling
Instrumented Object Store: instrument_mode: Trace, inner: HttpStore
2025-10-14T22:26:13.185625701+00:00 operation=Get duration=0.035335s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-14T22:26:13.221015783+00:00 operation=Get duration=0.045423s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
Summaries:
Get
count: 2
duration min: 0.035335s
duration max: 0.045423s
duration avg: 0.040379s
size min: 8 B
size max: 34322 B
size avg: 17165 B
size sum: 34330 B
> \object_store_profiling summary
ObjectStore Profile mode set to Summary
> CREATE EXTERNAL TABLE hits2
STORED AS PARQUET
LOCATION
'https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_2.parquet';
0 row(s) fetched.
Elapsed 0.179 seconds.
Object Store Profiling
Instrumented Object Store: instrument_mode: Summary, inner: HttpStore
Summaries:
Get
count: 2
duration min: 0.021558s
duration max: 0.022129s
duration avg: 0.021843s
size min: 8 B
size max: 55508 B
size avg: 27758 B
size sum: 55516 B
>
```
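The summary fields in the output above (count, plus min/max/avg for duration and min/max/avg/sum for size) can be reproduced with a small aggregation. The following is an illustrative sketch only: the `Call` and `Summary` types and the `summarize` function are hypothetical names, durations are modeled as `f64` seconds, and the size average uses truncating integer division, none of which is confirmed to match the implementation in this PR.

```rust
/// Hypothetical record of a single object store call.
#[derive(Debug, Clone, Copy)]
pub struct Call {
    pub duration_secs: f64,
    pub size_bytes: u64,
}

/// Hypothetical aggregate over all calls of one operation (e.g. `Get`).
#[derive(Debug, PartialEq)]
pub struct Summary {
    pub count: usize,
    pub duration_min: f64,
    pub duration_max: f64,
    pub duration_avg: f64,
    pub size_min: u64,
    pub size_max: u64,
    pub size_avg: u64,
    pub size_sum: u64,
}

/// Aggregates calls into a summary; returns `None` when there are no calls.
pub fn summarize(calls: &[Call]) -> Option<Summary> {
    if calls.is_empty() {
        return None;
    }
    let count = calls.len();
    let mut duration_min = f64::INFINITY;
    let mut duration_max = f64::NEG_INFINITY;
    let mut duration_sum = 0.0;
    let mut size_min = u64::MAX;
    let mut size_max = 0u64;
    let mut size_sum = 0u64;
    for call in calls {
        duration_min = duration_min.min(call.duration_secs);
        duration_max = duration_max.max(call.duration_secs);
        duration_sum += call.duration_secs;
        size_min = size_min.min(call.size_bytes);
        size_max = size_max.max(call.size_bytes);
        size_sum += call.size_bytes;
    }
    Some(Summary {
        count,
        duration_min,
        duration_max,
        duration_avg: duration_sum / count as f64,
        size_min,
        size_max,
        // Truncating integer division (an assumption of this sketch).
        size_avg: size_sum / count as u64,
        size_sum,
    })
}
```

Feeding in the two `Get` calls from the trace output (8 B in 0.035335s and 34322 B in 0.045423s) gives a size sum of 34330 B and a size average of 17165 B, matching the summary shown.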
## Are there any user-facing changes?
Yes. The existing CLI flag and its associated command have changed. The user
documentation has been updated to reflect these changes.
##
cc @alamb
(I believe the previous PR that was merged for this effort was the last
major set of core functionality! :tada: The remaining PRs should all be pretty
concise and just fill out the small bits of missing implementation.)