[ 
https://issues.apache.org/jira/browse/SOLR-15300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17316772#comment-17316772
 ] 

Jan Høydahl commented on SOLR-15300:
------------------------------------

{quote}what is the intended replication factor and how to measure it
{quote}
Well, the intended replicationFactor for a given shard is the number of 
replicas currently registered in CLUSTERSTATUS. After a system start we 
expect all replicas to be operational. It's not hard to calculate those numbers 
from CLUSTERSTATUS, but you need some logic, some counting, etc. The numbers 
are not just there in ZK.
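
As a rough sketch of the counting involved (Python, assuming the usual 
CLUSTERSTATUS JSON layout with "cluster" -> "collections" -> shards -> 
replicas plus "live_nodes"; the function name and the sample data are made up 
for illustration, and "live" is my assumption of active state on a live node):

```python
def shard_replica_counts(cluster_status):
    """Map (collection, shard) -> (registered, live) replica counts."""
    cluster = cluster_status["cluster"]
    live_nodes = set(cluster.get("live_nodes", []))
    counts = {}
    for coll_name, coll in cluster.get("collections", {}).items():
        for shard_name, shard in coll.get("shards", {}).items():
            replicas = list(shard.get("replicas", {}).values())
            # Assumption: "live" means state is active AND the node is in live_nodes.
            live = sum(
                1 for r in replicas
                if r.get("state") == "active" and r.get("node_name") in live_nodes
            )
            counts[(coll_name, shard_name)] = (len(replicas), live)
    return counts


# Made-up sample in the shape of a trimmed CLUSTERSTATUS response.
SAMPLE = {
    "cluster": {
        "live_nodes": ["host1:8983_solr"],
        "collections": {
            "test": {
                "shards": {
                    "shard1": {
                        "state": "active",
                        "replicas": {
                            "core_node1": {"state": "active",
                                           "node_name": "host1:8983_solr",
                                           "leader": "true"},
                            "core_node2": {"state": "active",
                                           "node_name": "host2:8983_solr"},
                        },
                    }
                }
            }
        },
    }
}

for (coll, shard), (registered, live) in shard_replica_counts(SAMPLE).items():
    print(f"{coll}/{shard}: {live} of {registered} replicas live")
```

Here the registered count plays the role of the intended replicationFactor, 
and the second replica is not counted as live because its node is missing 
from live_nodes even though its recorded state is still "active".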

CLUSTERSTATUS output should probably remain as it is in ZK. If we add 
properties to the data, those props should either be in their own sub-tree next 
to "collections" or be clearly marked as "_live-state" or similar, so that they 
are not confused with what is in ZK.
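
To illustrate the sub-tree option, here is a hypothetical Python sketch. It is 
not an existing Solr feature: the key name is borrowed from the "_live-state" 
suggestion, and the property names "liveReplicas" and "hasLiveLeader" are 
invented for the example. The point is only that the derived values sit next 
to "collections" and the ZK-backed data is left untouched:

```python
import copy


def with_live_state(cluster_status, key="_live-state"):
    """Return a copy of the response with derived values under a sibling key."""
    result = copy.deepcopy(cluster_status)
    cluster = result["cluster"]
    live_nodes = set(cluster.get("live_nodes", []))
    derived = {}
    for coll_name, coll in cluster.get("collections", {}).items():
        for shard_name, shard in coll.get("shards", {}).items():
            # Assumption: a replica is live if it is active on a live node.
            live = [
                r for r in shard.get("replicas", {}).values()
                if r.get("state") == "active" and r.get("node_name") in live_nodes
            ]
            derived.setdefault(coll_name, {})[shard_name] = {
                "liveReplicas": len(live),
                "hasLiveLeader": any(r.get("leader") == "true" for r in live),
            }
    # Derived data goes next to "collections", never inside it.
    cluster[key] = derived
    return result


# Made-up, trimmed response for illustration.
status = {
    "cluster": {
        "live_nodes": ["host1:8983_solr"],
        "collections": {
            "test": {
                "shards": {
                    "shard1": {
                        "state": "active",
                        "replicas": {
                            "core_node1": {"state": "active",
                                           "node_name": "host1:8983_solr",
                                           "leader": "true"},
                        },
                    }
                }
            }
        },
    }
}

annotated = with_live_state(status)
print(annotated["cluster"]["_live-state"])
```

A consumer that only wants what is in ZK can keep reading "collections" and 
ignore the extra key.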

> Shard "state" flag is confusing and of limited value to outside consumers
> -------------------------------------------------------------------------
>
>                 Key: SOLR-15300
>                 URL: https://issues.apache.org/jira/browse/SOLR-15300
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Andrzej Bialecki
>            Assignee: Andrzej Bialecki
>            Priority: Major
>
> The Solr API (and consequently the metric reporters, which are often used for 
> Solr monitoring) reports the shard as being in ACTIVE state even when in 
> reality its functionality is severely compromised (e.g. no replicas, all 
> replicas down, or no leader).
> This reported state is technically correct because it is used only for 
> tracking SPLITSHARD operations, as defined in {{Slice.State}}. 
> However, this may be misleading and is more often unhelpful than not: for 
> constant monitoring, a flag that actually reports impaired functionality of a 
> shard would be more useful than a flag that reports a relatively uncommon 
> SPLITSHARD operation.
> We could either redefine the meaning of the existing flag (and change its 
> state according to some of the criteria I listed above), or add another flag 
> to represent the "health" status of a shard. The value of this flag would 
> then provide an easy way to monitor and to alert external systems of 
> dangerous function impairment, without monitoring the state of all replicas 
> of a collection.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

