GitHub user JHoelli edited a discussion: Refactor and Restructure Prometheus Metrics to best practices and naming convention
Currently we do not really have a naming convention for Prometheus metrics (https://prometheus.io/docs/practices/naming/). The idea is to structure the metric names as prefix_componentName_function_unit.

As prefix I would propose:

- sp_core for everything related to the Java "backend"
- sp_extension for everything related to a service, i.e., anything listed as a service with a service ID in CouchDB

For the current metrics this scheme would result in:

**Pipelines**

| Prefix | Component | Function | Suffix | Description | Labels | Old Naming |
| -- | -- | -- | -- | -- | -- | -- |
| sp_core_ | pipeline_ | count_ | total | Total number of pipelines | - | all_pipelines and element_count |
| sp_core_ | pipeline_ | health_ | state | Total number of pipelines per status (failed, attention, healthy) | operation = failed \| attention \| healthy | attention_required_pipelines, failed_pipelines, healthy_pipelines |
| sp_core_ | pipeline_ | running_ | state | Total number of pipelines running or stopped | operation = running \| stopped | running_pipelines, stopped_pipelines |
| sp_core_ | pipeline_element | data_ | total | Total amount of data received/sent by a pipeline element (e.g., filter) | operation = received \| send, elementId="", pipelineId="" | element_input_total_data, element_output_total_data |

**Adapter**

| Prefix | Component | Function | Suffix | Description | Labels | Old Naming |
| -- | -- | -- | -- | -- | -- | -- |
| sp_core_ | adapter_instance_ | published_ | total | Total number of published events per adapter instance | adapterid="", adaptername="" | adapter_events_published_total |

**Load Balancer**

| Prefix | Component | Function | Suffix | Description | Labels | Old Naming |
| -- | -- | -- | -- | -- | -- | -- |
| sp_core_ | - | migration_time_ | seconds | Total pipeline migration time | - | lb_migration_time_seconds |
| sp_extension_ | - | count_ | total | Total number of elements in extension service | - | lb_service_adapter_count, lb_service_pipeline_count |

**Memory**

| Prefix | Component | Function | Suffix | Description | Labels | Old Naming |
| -- | -- | -- | -- | -- | -- | -- |
| sp_extension_ | memory_ | used_ | bytes | Amount of memory used in bytes (indirectly via job = serviceId) | - | sp_memory_used_bytes |
| sp_extension_ | memory_ | allocation_rate_ | bytes_per_second | Memory allocation rate in bytes per second (indirectly via job = serviceId) | - | sp_memory_allocation_rate_bytes_per_second |
| sp_extension_ | memory_ | usage_ | bytes | Element memory usage in bytes (from load balancer) | service_id="" | memory_usage |

**Service**

| Prefix | Component | Function | Suffix | Description | Labels | Old Naming |
| -- | -- | -- | -- | -- | -- | -- |
| sp_extension_ | cpu_ | usage_ | percentage | CPU usage percentage | serviceid="" | cpu_usage |
| sp_extension_ | weight_ | count_ | total | Weight of remaining available resources for an element | serviceid="" | weight |
| sp_extension_ | system_load_ | last_minute | ? | System load average over the last minute | serviceid="" | system_load |
| sp_extension_ | system_load_ | historic_ | average | Historical system load average | serviceid="" | Historical_system_load |

**Rate Limiter**

| Prefix | Component | Function | Suffix | Description | Labels | Old Naming |
| -- | -- | -- | -- | -- | -- | -- |
| sp_extension_ | rate_limiter_ | queue_ | total | Current size of the waiting queue | - | sp_rate_limiter_queue_size |
| sp_extension_ | rate_limiter_ | average_wait_time | seconds | Average wait time for permit acquisition in seconds | - | sp_rate_limiter_average_wait_time_seconds |

Any opinions on the approach?

GitHub link: https://github.com/apache/streampipes/discussions/4014
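To make the proposed prefix_componentName_function_unit scheme concrete, here is a minimal Python sketch of how the four columns from the tables above could be joined into a final metric name. The helper `build_metric_name` and the `RENAMES` lookup are hypothetical names introduced only for illustration and are not part of StreamPipes. The regular expression is the classic character set that the Prometheus documentation gives for metric names, used here as a sanity check.

```python
import re

# Classic character set for Prometheus metric names (from the Prometheus docs).
METRIC_NAME_RE = re.compile(r"^[a-zA-Z_:][a-zA-Z0-9_:]*$")


def build_metric_name(prefix, component, function, suffix):
    """Join the four naming parts with single underscores, skipping empty parts.

    Hypothetical helper: trailing or leading underscores on a part are
    stripped, so "sp_core_" and "sp_core" behave the same.
    """
    parts = [p.strip("_") for p in (prefix, component, function, suffix)]
    name = "_".join(p for p in parts if p)
    if not METRIC_NAME_RE.match(name):
        raise ValueError(f"invalid Prometheus metric name: {name!r}")
    return name


# Hypothetical lookup: old metric name -> proposed parts (rows taken from the tables).
# An empty string stands for a row with no component.
RENAMES = {
    "adapter_events_published_total": ("sp_core_", "adapter_instance_", "published_", "total"),
    "lb_migration_time_seconds": ("sp_core_", "", "migration_time_", "seconds"),
    "sp_memory_used_bytes": ("sp_extension_", "memory_", "used_", "bytes"),
}

for old, parts in RENAMES.items():
    print(f"{old} -> {build_metric_name(*parts)}")
```

Joining the parts in one place, rather than concatenating the table cells by hand, would avoid missing or doubled underscores when a cell lacks its trailing underscore.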
