[ https://issues.apache.org/jira/browse/IGNITE-27163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vladislav Pyatkov updated IGNITE-27163:
---------------------------------------
Description:
h3. Motivation
We already have states for the RAFT groups of partitions. It would be logical to
extend this approach to the system groups CMG (Cluster Management Group) and MG
(Metastorage Group).
{noformat}
Local partition states. A local property of the replica, storage, state machine,
etc., associated with the partition:
Healthy
The state machine is running, and everything is fine.
Initializing
The Ignite node is online, but the corresponding RAFT group has yet to complete
its initialization.
Snapshot installation
A full state transfer is taking place. Once it’s finished, the partition will
become healthy or catching-up. Until then, data can’t be read, and log
replication is paused.
Catching-up
The node is replicating data from the leader, and its data lags slightly behind.
More specifically, the node has not yet replicated the tail of the log, which
corresponds to the last N log entries or to the log entries of the last M
seconds. The latest index isn’t known on the follower node, whereas the time lag
can be estimated as the difference between the node’s clock and the safe time,
so the time-based interval seems to be the preferred option.
Broken
Something’s wrong with the state machine. Some data might be unavailable for
reading, the log can’t be replicated, and this state won’t change automatically
without intervention.
Global partition states. A global property of a partition that specifies its
apparent functionality from the user’s point of view:
Available partition
A healthy partition that can process read and write requests. This means that
the majority of peers are healthy at the moment.
Read-only partition
A partition that can process read requests but can’t process write requests.
There’s no healthy majority, but there’s at least one alive (healthy or
catching-up) peer that can process historical read-only queries.
Unavailable partition
A partition that can’t process any requests.
Degraded partition
A partition that is fully available to the user but is at a higher risk of
having issues than other partitions. For example, one of the group’s peers is
offline: there’s still a majority, but the backup factor is lowered.
{noformat}
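The local states above, including the time-based catching-up rule (lag estimated as the node's clock minus the safe time), could be classified per peer roughly as follows. This is a minimal Java sketch under stated assumptions: the class, enum and method names, the order in which the checks are applied, and the `maxLag` threshold (standing in for M) are hypothetical and are not Ignite's actual API.

```java
import java.time.Duration;

/** Hypothetical sketch: classifying one peer's local state for a RAFT group. */
final class LocalStateSketch {
    /** Local states as listed in the description (names are illustrative). */
    enum LocalState { HEALTHY, INITIALIZING, INSTALLING_SNAPSHOT, CATCHING_UP, BROKEN }

    /**
     * Classifies a peer. The precedence (broken, then initializing, then snapshot,
     * then lag) is an assumption; the description does not define it.
     */
    static LocalState classify(
            boolean initialized,
            boolean installingSnapshot,
            boolean stateMachineBroken,
            long safeTimeMillis,
            long nodeClockMillis,
            Duration maxLag) {
        if (stateMachineBroken) {
            return LocalState.BROKEN;
        }
        if (!initialized) {
            return LocalState.INITIALIZING;
        }
        if (installingSnapshot) {
            return LocalState.INSTALLING_SNAPSHOT;
        }
        // Time-based lag: how far the replicated data (safe time) trails the local clock.
        long lagMillis = nodeClockMillis - safeTimeMillis;
        return lagMillis > maxLag.toMillis() ? LocalState.CATCHING_UP : LocalState.HEALTHY;
    }

    public static void main(String[] args) {
        Duration m = Duration.ofSeconds(5); // hypothetical value for M
        System.out.println(classify(true, false, false, 10_000, 12_000, m)); // HEALTHY
        System.out.println(classify(true, false, false, 10_000, 20_000, m)); // CATCHING_UP
    }
}
```

Using time rather than an entry count matches the reasoning above: a follower cannot know the leader's latest log index, but it can compare its safe time with its own clock.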
h3. Definition of done
The global and local states are available for both system groups (CMG and MG).
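To meet this, each system group needs a global state derived from its peers' local states. A minimal Java sketch of the majority rules from the description, assuming a hypothetical `OFFLINE` local state for unreachable peers and assuming that a group with a healthy majority is reported as degraded whenever any peer is not healthy. All names are illustrative, not Ignite's API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: deriving a RAFT group's global state from its peers' local states. */
final class GlobalStateSketch {
    /** Local states from the description, plus OFFLINE (an assumption) for unreachable peers. */
    enum LocalState { HEALTHY, INITIALIZING, INSTALLING_SNAPSHOT, CATCHING_UP, BROKEN, OFFLINE }

    enum GlobalState { AVAILABLE, DEGRADED, READ_ONLY, UNAVAILABLE }

    static GlobalState globalState(List<LocalState> peers) {
        long healthy = peers.stream().filter(s -> s == LocalState.HEALTHY).count();
        long alive = peers.stream()
                .filter(s -> s == LocalState.HEALTHY || s == LocalState.CATCHING_UP)
                .count();
        // Strict majority: more than half of all peers in the group.
        boolean healthyMajority = healthy > peers.size() / 2;

        if (healthyMajority) {
            // All peers healthy: AVAILABLE; only a majority: DEGRADED (backup factor lowered).
            return healthy == peers.size() ? GlobalState.AVAILABLE : GlobalState.DEGRADED;
        }
        // No healthy majority: historical reads are possible if any peer is alive.
        return alive > 0 ? GlobalState.READ_ONLY : GlobalState.UNAVAILABLE;
    }

    public static void main(String[] args) {
        // Hypothetical peer states for the two system groups.
        Map<String, List<LocalState>> groups = new LinkedHashMap<>();
        groups.put("CMG", List.of(LocalState.HEALTHY, LocalState.HEALTHY, LocalState.OFFLINE));
        groups.put("MG", List.of(LocalState.CATCHING_UP, LocalState.OFFLINE, LocalState.OFFLINE));

        // Prints "CMG: DEGRADED" and then "MG: READ_ONLY".
        groups.forEach((group, peers) -> System.out.println(group + ": " + globalState(peers)));
    }
}
```

The same function applies unchanged to partition groups and to CMG/MG, which is the point of reusing the partition approach for the system groups.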
was:TBD
> Metrics for CMG/MG
> ------------------
>
> Key: IGNITE-27163
> URL: https://issues.apache.org/jira/browse/IGNITE-27163
> Project: Ignite
> Issue Type: Improvement
> Components: metrics ai3
> Reporter: Vladislav Pyatkov
> Priority: Major
> Labels: ignite-3
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)