[
https://issues.apache.org/jira/browse/SOLR-9857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki updated SOLR-9857:
------------------------------------
Attachment: SOLR-9857.patch
Initial version of the reporting and aggregation of replica metrics.
The design reuses the {{SolrMetricReporter}} API: it implements a
{{SolrReplicaReporter}}, which is scheduled to report a relevant subset of
metrics to the shard leader every N seconds. It sends the serialized metrics
data in javabin format.
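For readers unfamiliar with the pattern, the periodic reporting described above can be sketched roughly as follows. This is not the code in the attached patch: the class name {{PeriodicReporterSketch}}, the metric names, and the {{sender}} callback (a stand-in for serializing to javabin and posting to the shard leader) are all hypothetical, and only JDK classes are used.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical sketch of a periodic reporter; not the SolrReplicaReporter from the patch. */
class PeriodicReporterSketch {
    private final Supplier<Map<String, Number>> source;   // current metric values on this replica
    private final Predicate<String> filter;               // selects the "relevant subset" by name
    private final Consumer<Map<String, Number>> sender;   // assumption: stands in for the javabin send
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    PeriodicReporterSketch(Supplier<Map<String, Number>> source,
                           Predicate<String> filter,
                           Consumer<Map<String, Number>> sender) {
        this.source = source;
        this.filter = filter;
        this.sender = sender;
    }

    /** Collects the filtered subset of metrics and hands it to the sender once. */
    void reportOnce() {
        Map<String, Number> subset = new HashMap<>();
        for (Map.Entry<String, Number> e : source.get().entrySet()) {
            if (filter.test(e.getKey())) {
                subset.put(e.getKey(), e.getValue());
            }
        }
        sender.accept(subset);
    }

    /** Schedules reportOnce() to run every periodSeconds seconds. */
    void start(long periodSeconds) {
        scheduler.scheduleAtFixedRate(this::reportOnce, periodSeconds, periodSeconds,
                TimeUnit.SECONDS);
    }

    /** Cancels any scheduled reports. */
    void stop() {
        scheduler.shutdownNow();
    }
}
```

Keeping the send step behind a callback makes it possible to exercise the filtering logic without a running cluster.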
There is also a new handler at the {{CoreContainer}} level under
{{/admin/metricsCollector}}, which aggregates the reports sent by
{{SolrReplicaReporter}} instances. It runs at the {{CoreContainer}} level
rather than the core level because I hope to reuse it to also aggregate node
statistics in SOLR-9858. Partial metrics from replicas are then added to a
registry named after the shard, with a ".leader" suffix.
I spent some time thinking about how best to aggregate partial metrics. In
the general case it's not possible to do this in a meaningful way, and the
Metrics API doesn't offer any help here. In the end I implemented
{{AggregateMetric}}, which keeps all partial values for a selected metric and
provides only basic statistics (average, min/max, stddev), leaving it to the
user to decide which statistic, if any, is most meaningful.
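To make the idea concrete, here is a minimal sketch of such an aggregate, assuming one numeric value per reporting replica. It is not the {{AggregateMetric}} class from the attached patch: the class name {{AggregateMetricSketch}}, its method names, and the choice of the sample (n - 1) standard deviation are assumptions made for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch; not the AggregateMetric class from SOLR-9857.patch. */
class AggregateMetricSketch {
    // Latest partial value reported by each replica, keyed by a reporter id.
    private final Map<String, Double> values = new ConcurrentHashMap<>();

    /** Records (or replaces) the partial value reported by one replica. */
    void set(String reporterId, double value) {
        values.put(reporterId, value);
    }

    int size() {
        return values.size();
    }

    double getMin() {
        return values.values().stream().mapToDouble(Double::doubleValue).min().orElse(0.0);
    }

    double getMax() {
        return values.values().stream().mapToDouble(Double::doubleValue).max().orElse(0.0);
    }

    double getMean() {
        return values.values().stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
    }

    /** Sample standard deviation (divides by n - 1); returns 0 for fewer than two values. */
    double getStdDev() {
        int n = values.size();
        if (n < 2) {
            return 0.0;
        }
        double mean = getMean();
        double sumSq = 0.0;
        for (double v : values.values()) {
            double d = v - mean;
            sumSq += d * d;
        }
        return Math.sqrt(sumSq / (n - 1));
    }
}
```

Keeping every partial value, rather than a running total, is what lets each statistic be recomputed when a replica reports a new value for the same metric.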
These aggregated metrics are kept in a regular {{MetricRegistry}} on the shard
leader, so they are also reported by {{/admin/metrics}}.
Comments and suggestions are welcome :)
> Collect aggregated metrics from replicas in shard leader
> --------------------------------------------------------
>
> Key: SOLR-9857
> URL: https://issues.apache.org/jira/browse/SOLR-9857
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Components: metrics
> Affects Versions: master (7.0)
> Reporter: Andrzej Bialecki
> Assignee: Andrzej Bialecki
> Priority: Minor
> Attachments: SOLR-9857.patch
>
>
> Shard leaders can collect metrics from replicas in order to learn about their
> load and the progress of replication. These per-replica metrics need to be
> aggregated (if possible) in order to report cluster-wide per-shard metrics.