[
https://issues.apache.org/jira/browse/SOLR-9898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15782804#comment-15782804
]
Andrzej Bialecki commented on SOLR-9898:
-----------------------------------------
h1. Overview
Solr 6.4 adds a developer API and instrumentation for collecting detailed
performance-oriented metrics throughout the life-cycle of the Solr service and
its various components. Internally it uses the [Dropwizard Metrics
API|http://metrics.dropwizard.io], which provides the following classes of
metrics to measure events:
* *counters* - simply count events. They provide a single long value, e.g. the
number of requests.
* *meters* - additionally compute rates of events. They provide a count (as
above) and 1-, 5-, and 15-minute exponentially decaying rates, similar to the
Unix system load average.
* *histograms* - calculate an approximate distribution of events according to
their values. They provide the following approximate statistics, with a similar
exponential decay as above: mean (arithmetic average), median, maximum,
minimum, standard deviation, and the 75th, 95th, 98th, 99th and 99.9th
percentiles.
* *timers* - measure the number and duration of events. They provide a count
and a histogram of timings.
* *gauges* - offer an instantaneous reading of a current value, e.g. current
queue depth, current number of active connections, free heap size.
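To make the "exponentially decaying rate" of a meter more concrete, here is a minimal Python sketch of an exponentially weighted moving average. It is not the Dropwizard code: the class name {{DecayingRate}}, the helper {{simulate}} and the 5-second tick interval are assumptions chosen for illustration.

```python
import math

TICK_SECONDS = 5.0  # assumed interval between rate updates


class DecayingRate:
    """Exponentially weighted moving average of an event rate (events/second)."""

    def __init__(self, window_minutes):
        # Smoothing factor: a tick's influence fades by 1/e over `window_minutes`.
        self.alpha = 1.0 - math.exp(-TICK_SECONDS / 60.0 / window_minutes)
        self.rate = None     # no estimate until the first tick
        self.uncounted = 0   # events seen since the last tick

    def mark(self, n=1):
        """Record n events."""
        self.uncounted += n

    def tick(self):
        """Fold the events of the last interval into the running rate."""
        instant = self.uncounted / TICK_SECONDS
        self.uncounted = 0
        if self.rate is None:
            self.rate = instant
        else:
            self.rate += self.alpha * (instant - self.rate)
        return self.rate


def simulate(window_minutes, busy_ticks, idle_ticks, events_per_tick):
    """Run a burst of traffic followed by silence and return the final rate."""
    r = DecayingRate(window_minutes)
    for _ in range(busy_ticks):
        r.mark(events_per_tick)
        r.tick()
    for _ in range(idle_ticks):
        r.tick()
    return r.rate
```

With this sketch, one minute of silence after a steady load shrinks the 1-minute rate by a factor of 1/e, while the 15-minute rate barely moves. The load average on Unix behaves the same way.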
A group of related metrics with unique names is managed in a *metric registry*.
Solr maintains several such registries, each corresponding to a high-level
group such as: {{jvm, jetty, http, node, core}} (see below). Metrics are
maintained and accumulated through all life-cycles of components, from the
start of the process until its shutdown - e.g. metrics for a particular
SolrCore are tracked through possibly several load / unload / rename
operations, and deleted only when the core is explicitly deleted. However,
metrics are not persisted across process restarts - restarting Solr will
discard all collected metrics.
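The registry life-cycle described above can be pictured with a small in-memory model. This is only a sketch in Python: {{RegistryManager}}, its methods and the registry name {{solr.core.demo}} are hypothetical and do not correspond to Solr classes.

```python
from collections import defaultdict


class RegistryManager:
    """Hypothetical holder of named registries (not a Solr class)."""

    def __init__(self):
        # registry name -> {metric name -> counter value}
        self._registries = defaultdict(dict)

    def registry(self, name):
        """Return the registry for `name`, creating it on first use."""
        return self._registries[name]

    def count(self, registry_name, metric_name, n=1):
        """Increment a counter inside the named registry."""
        reg = self._registries[registry_name]
        reg[metric_name] = reg.get(metric_name, 0) + n
        return reg[metric_name]

    def remove(self, registry_name):
        """Drop a registry, as on explicit deletion of a core."""
        self._registries.pop(registry_name, None)

    def names(self):
        return sorted(self._registries)


def demo():
    manager = RegistryManager()
    manager.count("solr.core.demo", "requests")  # core loaded, one request
    # Core unloaded and loaded again: the registry is left untouched.
    manager.count("solr.core.demo", "requests")
    before_delete = manager.registry("solr.core.demo")["requests"]
    manager.remove("solr.core.demo")             # core explicitly deleted
    return before_delete, manager.names()
```

Because everything lives in process memory, nothing in this model survives a restart, which matches the behaviour described above.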
For each group (and/or for each registry) there can be several *reporters* -
components responsible for communicating metrics from selected registries to
external systems. Currently implemented reporters support emitting metrics via
JMX, Ganglia, Graphite and SLF4J. There is also a dedicated {{/admin/metrics}}
handler that can be queried to report all or a subset of the current metrics
from multiple registries.
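As an illustration of querying the {{/admin/metrics}} handler for a subset of metrics, the following Python sketch only builds the request URL and sends nothing. The parameter names {{group}}, {{type}} and {{wt}}, the comma-separated value syntax, and the base URL {{http://localhost:8983/solr}} are assumptions not stated in this draft; check them against the handler before relying on them.

```python
from urllib.parse import urlencode


def metrics_url(base_url, groups=(), types=(), wt="json"):
    """Build a URL for the /admin/metrics handler.

    `groups` and `types` are optional filters; the parameter names used
    here are assumptions, not taken from this draft.
    """
    params = [("wt", wt)]
    if groups:
        params.append(("group", ",".join(groups)))
    if types:
        params.append(("type", ",".join(types)))
    # Keep commas readable instead of percent-encoding them.
    query = urlencode(params, safe=",")
    return base_url.rstrip("/") + "/admin/metrics?" + query
```

For example, {{metrics_url("http://localhost:8983/solr", groups=["jvm", "jetty"])}} produces a URL restricted to two of the groups listed in the next section.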
h2. Metric groups
These are the major groups of metrics that are collected:
h3. JVM level ({{solr.jvm}} registry):
* direct and mapped buffer pools
* class loading / unloading
* OS memory, CPU time, file descriptors, swap, system load
* GC count and time
* heap, non-heap memory and GC pools
* number of threads, their states and deadlocks
h3. Node / CoreContainer level ({{solr.node}} registry):
* handler requests (count, timing): collections, info, admin, configSets, etc.
* number of cores (loaded, lazy, unloaded)
h3. Core (SolrCore) level ({{solr.core.<collection>...}} registries, one for each core):
* all common RequestHandler-s report: request timers / counters, timeouts,
errors.
* index-level events (in progress - SOLR-9854): meters for minor / major
merges, number of merged docs, number of deleted docs, gauges for currently
running merges and their size.
* directory-level IO: total read / write meters, histograms for read / write
operations and their size, optionally split per index file (e.g. field data,
term dictionary, docValues, etc.) (in progress - SOLR-9854)
* shard replication and transaction log replay on replicas (TBD, SOLR-9856)
* TBD: caches, update handler details, and other relevant SolrInfoMBean-s
h3. HTTP level ({{solr.http}} registry):
* open / available / pending connections for shard handler and update handler
h3. Jetty level ({{solr.jetty}} registry):
* threads and pools
* connection and request timers
* meters for responses by HTTP class (1xx, 2xx, etc.)
h3. Shard leader (TBD)
* aggregated metrics from each replica (SOLR-9857)
h3. Overseer (TBD)
* aggregated metrics from shard leaders and cluster nodes (SOLR-9858)
> Documentation for metrics collection and /admin/metrics
> -------------------------------------------------------
>
> Key: SOLR-9898
> URL: https://issues.apache.org/jira/browse/SOLR-9898
> Project: Solr
> Issue Type: Task
> Security Level: Public(Default Security Level. Issues are Public)
> Affects Versions: master (7.0), 6.4
> Reporter: Andrzej Bialecki
>
> Draft documentation follows.