On further thought, I'm now thinking this is telling me which rank is stopped 
(rank 2), not that two ranks are stopped. I am still curious why this 
information is retained here: can rank 2 be made active again, and if so, 
would it then be removed from "stopped"?

The state diagram here: http://docs.ceph.com/docs/master/cephfs/mds-states/

seems to indicate that once a rank is "Stopped" it has no path to move out of 
that state. Perhaps I am reading it wrong.

We have upgraded multi-active-MDS clusters, pushing max_mds down to 1 and then 
back to 2 during those upgrades, and none of those clusters list anything in 
"stopped," so I am guessing those ranks go back to active.
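For context, the down-then-up sequence we used during those upgrades looks roughly like this. This is a hedged sketch, not the exact commands from our runbooks: it assumes a filesystem named "cephfs" and the Luminous-era behavior where ranks above max_mds must be deactivated explicitly.

```shell
# Cap the number of active ranks at 1 before upgrading daemons.
ceph fs set cephfs max_mds 1

# On Luminous (12.2.x), lowering max_mds alone does not stop extra ranks;
# the surplus rank has to be deactivated by hand (it passes through
# up:stopping on its way to stopped).
ceph mds deactivate cephfs:1

# ... upgrade the MDS daemons one at a time ...

# Raise max_mds again; a standby takes over rank 1 and it returns to active.
ceph fs set cephfs max_mds 2
```

(Newer releases stop surplus ranks automatically when max_mds is lowered, so the explicit deactivate step is specific to clusters of this vintage.)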

Thanks for the clarity.
________________________________
From: ceph-users <ceph-users-boun...@lists.ceph.com> on behalf of Wesley 
Dillingham <wdilling...@godaddy.com>
Sent: Tuesday, May 28, 2019 5:15 PM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Meaning of Ceph MDS / Rank in "Stopped" state.

I am working to develop some monitoring for our file clusters, and as part of 
the check I inspect `ceph mds stat` for damaged/failed/stopped MDS ranks. 
Initially I set my check to alarm if any of these states was present, but as 
I rolled it out I noticed that one of our clusters had the following:

    "failed": [],
    "damaged": [],
    "stopped": [
        2
    ],

However, the cluster health is good and the MDS state is: cephfs-2/2/2 up  
{0=p3plcephmds001=up:active,1=p3plcephmds002=up:active}, 1 up:standby

A little further digging showed that a "stopped" state doesn't apply to an 
MDS daemon but rather to a rank, and may indicate that max_mds was previously 
set higher than its current setting of 2; the "stopped" ranks are simply ranks 
that were once active and have since offloaded their state to other ranks.

My question is: how can I inspect further which ranks are "stopped"? Would 
it be appropriate to "clear" those stopped ranks if possible, or should I 
modify my check to ignore stopped ranks and focus only on damaged/failed ranks?
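If it helps to make the second option concrete, here is a minimal sketch of the check treating "stopped" as benign. The `mds_alarms` helper and the flat dict layout are my own illustration; the exact JSON shape of `ceph mds stat -f json` varies by release, so assume the failed/damaged/stopped lists have already been pulled out of the real output.

```python
import json


def mds_alarms(mdsmap: dict) -> list:
    """Return alarm strings for damaged/failed ranks, ignoring "stopped"."""
    alarms = []
    # "stopped" is deliberately excluded: it only records ranks above the
    # current max_mds that were once active, which is not a fault condition.
    for state in ("failed", "damaged"):
        for rank in mdsmap.get(state, ()):
            alarms.append(f"rank {rank} is {state}")
    return alarms


# Sample mirroring the output quoted above: rank 2 stopped, nothing broken.
sample = json.loads('{"failed": [], "damaged": [], "stopped": [2]}')
print(mds_alarms(sample))  # -> []
```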

The cluster is running Ceph 12.2.12 (Luminous).

Thanks.

Respectfully,

Wes Dillingham
wdilling...@godaddy.com
Site Reliability Engineer IV - Platform Storage
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com