[ 
https://issues.apache.org/jira/browse/SOLR-15300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17319504#comment-17319504
 ] 

Andrzej Bialecki commented on SOLR-15300:
-----------------------------------------

Based on the Slack discussions, I propose adding the following information to 
the output of the CLUSTERSTATUS command:
 * add a calculated (not stored in DocCollection) "health" property at the 
level of each shard and each collection.
 * use the following symbolic names for the health state:
 ** GREEN: all replicas up, leader exists,
 ** YELLOW: some replicas down, leader exists,
 ** ORANGE: many replicas down, leader exists,
 ** RED: most replicas down, or no leader.
 * use 66% and 33% of replicas being active as the thresholds between 
YELLOW/ORANGE and ORANGE/RED, respectively.
 * report the collection-level health status as the worst status of any of 
its shards.
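
To make the proposed rules concrete, here is a minimal sketch of the calculation. It is not the actual Solr implementation: the class and method names (ShardHealthSketch, shardHealth, collectionHealth) are hypothetical, and it assumes that a fraction falling exactly on a threshold maps to the worse state, that a shard with no replicas is RED, and that a collection with no shards reports GREEN. None of these edge cases is settled by the proposal above.

```java
import java.util.List;

// Hypothetical sketch of the proposed health calculation; not Solr's actual code.
final class ShardHealthSketch {

    /** Health states ordered from best to worst, so a larger ordinal means worse health. */
    enum Health { GREEN, YELLOW, ORANGE, RED }

    /**
     * Derives a shard's health from its replica counts and leader presence.
     * Assumption: a fraction exactly at a threshold falls into the worse state.
     */
    static Health shardHealth(int activeReplicas, int totalReplicas, boolean hasLeader) {
        // No leader (or, by assumption, no replicas at all) is always RED.
        if (!hasLeader || totalReplicas <= 0) {
            return Health.RED;
        }
        // All replicas up and a leader exists.
        if (activeReplicas >= totalReplicas) {
            return Health.GREEN;
        }
        double activeFraction = (double) activeReplicas / totalReplicas;
        if (activeFraction > 0.66) {
            return Health.YELLOW;  // some replicas down
        }
        if (activeFraction > 0.33) {
            return Health.ORANGE;  // many replicas down
        }
        return Health.RED;         // most replicas down
    }

    /**
     * Collection health is the worst health of any shard.
     * Assumption: a collection without shards reports GREEN.
     */
    static Health collectionHealth(List<Health> shardHealths) {
        Health worst = Health.GREEN;
        for (Health h : shardHealths) {
            if (h.compareTo(worst) > 0) {
                worst = h;
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        Health shard1 = shardHealth(3, 3, true);  // all replicas up
        Health shard2 = shardHealth(1, 2, true);  // half of the replicas up
        System.out.println("shard1=" + shard1 + " shard2=" + shard2
                + " collection=" + collectionHealth(List.of(shard1, shard2)));
    }
}
```

Because the value is derived from replica states on each request, nothing new would have to be stored in DocCollection. Ordering the enum from best to worst keeps the collection-level roll-up a simple maximum.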

The notion of having a flag for a "read only" collection (when there's no 
leader or only PULL replicas) needs further thought, because there's already a 
"readOnly" flag that users can explicitly set using MODIFYCOLLECTION (this flag 
is also used in REINDEXCOLLECTION).

> Shard "state" flag is confusing and of limited value to outside consumers
> -------------------------------------------------------------------------
>
>                 Key: SOLR-15300
>                 URL: https://issues.apache.org/jira/browse/SOLR-15300
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Andrzej Bialecki
>            Assignee: Andrzej Bialecki
>            Priority: Major
>
> The Solr API (and consequently the metric reporters, which are often used for 
> Solr monitoring) reports a shard as being in the ACTIVE state even when in 
> reality its functionality is severely compromised (e.g. no replicas, all 
> replicas down, or no leader).
> This reported state is technically correct because it is used only for 
> tracking SPLITSHARD operations, as defined in {{Slice.State}}. 
> However, this may be misleading and is more often unhelpful than not: for 
> continuous monitoring, a flag that actually reports impaired functionality of a 
> shard would be more useful than a flag that tracks the relatively uncommon 
> SPLITSHARD operation.
> We could either redefine the meaning of the existing flag (and change its 
> state according to some of the criteria I listed above), or add another flag 
> to represent the "health" status of a shard. The value of this flag would 
> then provide an easy way to monitor and to alert external systems of 
> dangerous function impairment, without monitoring the state of all replicas 
> of a collection.


