w3ll1ngt commented on code in PR #13130:
URL: https://github.com/apache/ignite/pull/13130#discussion_r3261086342


##########
docs/_docs/perf-and-troubleshooting/general-perf-tips.adoc:
##########
@@ -47,3 +47,20 @@ queries with JOINs at massive scale and expect significant 
performance benefits.
 
 * Adjust link:data-rebalancing[data rebalancing settings] to ensure that 
rebalancing completes faster when your cluster topology changes.
 
+== What healthy cluster behavior looks like
+
+A healthy Ignite cluster is not defined by a single latency, CPU, or memory 
number. In practice, it is a cluster whose topology is stable, whose cluster 
state and baseline match the intended deployment, whose partitions are not lost 
or divergent, whose rebalancing and checkpointing complete in bounded time, and 
whose execution queues and memory pools return to a steady level after 
short-lived spikes. Ignite exposes these signals through built-in metrics, 
system views, and the control script rather than through a single aggregate 
health score.
+
+When checking whether a cluster is healthy, start with topology and cluster 
state. The cluster should be in the expected state, usually ACTIVE, and the 
number of server and client nodes should be stable. If native persistence is 
enabled, the baseline should also be in the expected shape: for a stable 
deployment, the nodes that are expected to be online should appear online both 
in baseline-related metrics and in the SYS.BASELINE_NODES system view. Frequent 
unexpected topology changes are not normal and should be treated as a sign of 
node instability or network problems.

Review Comment:
   Thank you, links added.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.