alamb opened a new issue #679:
URL: https://github.com/apache/arrow-datafusion/issues/679


   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   Use cases:
   1. Connecting DataFusion metrics to existing state-of-the-art metrics collection systems (Prometheus, InfluxDB, OpenTelemetry)
   2. Gaining access to real-time values of metrics (not just snapshots)
   
   When running plans with multiple operators of the same type (e.g. multiple `HashJoinExec`s) there is currently no way to programmatically gain access to an individual HashJoin's statistics. For example, in https://github.com/apache/arrow-datafusion/pull/662 the interface returns metrics keyed by a single string such as `inputRows`. For the use case of printing metrics per operator, that is just fine; however, for use cases spanning the whole plan it is less fine.
   
   It is currently very awkward to create metrics with finer granularity (such as per partition of the hash join, or Parquet metrics per file). The current string-to-metric map interface means you have to make a compound string key (such as "metrics for file foo") and then parse that key back apart (as I did in https://github.com/apache/arrow-datafusion/pull/657), as sketched below.
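   For illustration, here is a minimal sketch of the compound-key workaround (the key format here is made up, not the exact keys from those PRs):
   
   ```rust
   use std::collections::HashMap;
   
   fn main() {
       // Sketch only: the flat string interface forces the granularity
       // (here, the file name) into the metric key itself...
       let mut metrics: HashMap<String, usize> = HashMap::new();
       metrics.insert("rows_scanned for file my_file.parquet".to_string(), 100);
   
       // ...so every consumer has to parse the compound key back apart
       for (key, value) in &metrics {
           if let Some(filename) = key.strip_prefix("rows_scanned for file ") {
               println!("file={} rows_scanned={}", filename, value);
           }
       }
   }
   ```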
   
   Other metric systems such as Prometheus, OpenTelemetry, and InfluxDB allow name=value pairs on each metric to address exactly these problems. Without such pairs, integrating DataFusion metrics with these systems will be a challenge.
   
   These systems let you express metrics like:
   
   ```
   operator=ParquetExec,filename="my_filename",partition_number=0 
rows_scanned=100
   operator=ParquetExec,filename="my_other_filename",partition_number=0 
rows_scanned=200
   ```
   
   or for hash join
   ```
   operator=HashJoin,partition_number=0 rows_scanned=100
   operator=HashJoin,partition_number=1 rows_scanned=200
   ```
   
   Another challenge with the current metrics interface is that, despite using `Arc`s and atomic counters internally, the only external interface returns a snapshot of the metrics. If it returned the `Arc`s themselves, we could implement interactive visualizations showing how the metrics evolve over time.
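   As a minimal sketch of what handing out the shared counters would enable (plain `Arc<AtomicUsize>` here, not the actual `SQLMetric` type):
   
   ```rust
   use std::sync::atomic::{AtomicUsize, Ordering};
   use std::sync::Arc;
   use std::thread;
   use std::time::Duration;
   
   fn main() {
       // The operator holds one end of the counter...
       let rows = Arc::new(AtomicUsize::new(0));
   
       // ...while a monitoring thread holds the other, observing live
       // values as they change rather than a one-time snapshot
       let observer = Arc::clone(&rows);
       let monitor = thread::spawn(move || {
           for _ in 0..3 {
               println!("rows so far: {}", observer.load(Ordering::Relaxed));
               thread::sleep(Duration::from_millis(10));
           }
       });
   
       for _ in 0..1000 {
           rows.fetch_add(1, Ordering::Relaxed); // simulated operator work
       }
       monitor.join().unwrap();
   }
   ```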
   
   **Describe the solution you'd like**
   
   Thus, I propose the following changes to how metrics are created (with name/value pairs):
   
   ```rust
   let metrics = SQLMetric::counter("numRows")
     .with("partition", 1)
     .with("filename", "my_file");
   
   let sub_metric = SQLMetric::counter("otherMetric")
     // inherit all name/value pairs from `metrics`
     .with_family(metrics)
     .with("new_detail", "awesome");
   ```
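   For concreteness, one possible (purely hypothetical) shape for a labeled metric that would support the builder calls above:
   
   ```rust
   use std::sync::atomic::{AtomicUsize, Ordering};
   use std::sync::Arc;
   
   /// Hypothetical sketch of a labeled counter; not DataFusion's current type
   pub struct SQLMetric {
       name: String,
       /// name=value pairs, as in Prometheus / InfluxDB / OpenTelemetry
       labels: Vec<(String, String)>,
       value: Arc<AtomicUsize>,
   }
   
   impl SQLMetric {
       pub fn counter(name: impl Into<String>) -> Self {
           Self {
               name: name.into(),
               labels: vec![],
               value: Arc::new(AtomicUsize::new(0)),
           }
       }
   
       /// Attach a name=value pair to this metric
       pub fn with(mut self, name: impl Into<String>, value: impl ToString) -> Self {
           self.labels.push((name.into(), value.to_string()));
           self
       }
   
       /// Inherit all name/value pairs from another metric
       pub fn with_family(mut self, family: SQLMetric) -> Self {
           self.labels.extend(family.labels);
           self
       }
   
       /// Increment the counter (called by the operator while running)
       pub fn add(&self, n: usize) {
           self.value.fetch_add(n, Ordering::Relaxed);
       }
   }
   ```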
   
   And then the collection interface should return a list of these metrics rather than a `HashMap` keyed by metric name. For example, rather than
   
   
   ```rust
   pub trait ExecutionPlan {
   ...
   
       /// Return a snapshot of the metrics collected during execution
       fn metrics(&self) -> HashMap<String, SQLMetric> {
           HashMap::new()
       }
   ```
   
   Something like
   
   ```rust
   pub trait ExecutionPlan {
   ...
   
       /// Return the metrics for this execution
       fn metrics(&self) -> Vec<Arc<SQLMetric>> {
        Vec::new()
       }
   ```
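   With that shape, an exporter could walk the plan and render each metric in a label-based line format directly (a sketch, assuming the hypothetical `SQLMetric` above lives in the same module so its fields are visible):
   
   ```rust
   use std::sync::atomic::Ordering;
   use std::sync::Arc;
   
   /// Sketch: print each metric as its label pairs plus name=value,
   /// e.g. `operator=HashJoin,partition_number=0 rows_scanned=100`
   fn export(metrics: &[Arc<SQLMetric>]) {
       for m in metrics {
           let labels: Vec<String> = m
               .labels
               .iter()
               .map(|(k, v)| format!("{}={}", k, v))
               .collect();
           println!(
               "{} {}={}",
               labels.join(","),
               m.name,
               m.value.load(Ordering::Relaxed)
           );
       }
   }
   ```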
   
   

