I think you really need to sit down and explain the full story. Dropping 
one-liners with new information will not work via e-mail.

I have never heard of the problem you are facing, so you did something that 
possibly no one else has done before. Unless we know the full history from the 
last time the cluster was HEALTH_OK until now, it will almost certainly not be 
possible to figure out what is going on via e-mail.

Usually, setting the "norebalance" and "norecover" flags should stop any 
recovery IO and allow the PGs to peer. If they do not become active, something 
is wrong, and the information we have so far gives no clue as to what it could be.
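
As a rough sketch, assuming a standard ceph CLI (the unset steps are for 
later, once the PGs have peered):

    ceph osd set norebalance     # stop rebalancing data movement
    ceph osd set norecover       # stop recovery IO
    # ... wait for the PGs to peer, then re-enable recovery:
    ceph osd unset norecover
    ceph osd unset norebalance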

Please post the output of "ceph health detail", "ceph osd pool stats" and 
"ceph osd pool ls detail", together with a log of actions and results since the 
last HEALTH_OK status; maybe that gives a clue as to what is going on.
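
To capture all of that for the list, something like this should do (the file 
names are just suggestions):

    ceph health detail        > health-detail.txt
    ceph osd pool stats       > pool-stats.txt
    ceph osd pool ls detail   > pool-ls-detail.txt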

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Zhenshi Zhou <deader...@gmail.com>
Sent: 29 October 2020 09:44:14
To: Frank Schilder
Cc: ceph-users
Subject: Re: [ceph-users] monitor sst files continue growing

I reset the pg_num after adding OSDs; it made some PGs inactive (in the 
"activating" state).
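
A pg_num change of that kind is typically made with something like the 
following sketch ("mypool" and the target count are placeholders):

    ceph osd pool set mypool pg_num 256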

Frank Schilder <fr...@dtu.dk> wrote on Thu, 29 Oct 2020 at 15:56:
This does not explain incomplete and inactive PGs. Are you hitting 
https://tracker.ceph.com/issues/46847 (see also the thread "Ceph does not 
recover from OSD restart")? In that case, temporarily stopping and restarting 
all new OSDs might help.
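
A minimal sketch of such a restart, assuming non-containerized OSDs managed by 
systemd (OSD ID 12 is a placeholder; repeat for each newly added OSD):

    systemctl stop ceph-osd@12
    # wait until the OSD is reported down, then:
    systemctl start ceph-osd@12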

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Zhenshi Zhou <deader...@gmail.com>
Sent: 29 October 2020 08:30:25
To: Frank Schilder
Cc: ceph-users
Subject: Re: [ceph-users] monitor sst files continue growing

After adding OSDs to the cluster, the recovery and backfill progress has not 
finished yet.

Zhenshi Zhou <deader...@gmail.com> wrote on Thu, 29 Oct 2020 at 15:29:
The MGR was stopped by me because it consumed too much memory.
As for the PG status: I added some OSDs to this cluster, and it

Frank Schilder <fr...@dtu.dk> wrote on Thu, 29 Oct 2020 at 15:27:
Your problem is the overall cluster health. The MONs store cluster history 
information that is only trimmed once the cluster reaches HEALTH_OK. Restarting 
the MONs only makes things worse right now. The health status is a mess: no 
MGR, a bunch of PGs inactive, etc. This is what you need to resolve. How did 
your cluster end up like this?

It looks like all OSDs are up and in. You need to find out

- why there are inactive PGs
- why there are incomplete PGs

This usually happens when OSDs go missing.
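
A rough starting point for that investigation ("<pgid>" is a placeholder for 
one of the affected PGs):

    ceph pg dump_stuck inactive    # list PGs that are stuck inactive
    ceph pg <pgid> query           # peering state and blocking OSDs for one PG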

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Zhenshi Zhou <deader...@gmail.com>
Sent: 29 October 2020 07:37:19
To: ceph-users
Subject: [ceph-users] monitor sst files continue growing

Hi all,

My cluster is in a bad state. SST files in /var/lib/ceph/mon/xxx/store.db keep 
growing, and the cluster complains that the mons are using a lot of disk space.

I set "mon compact on start = true" and restart one of the monitors. But
it started and campacting for a long time, seems it has no end.
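
For reference, roughly what that setting and a manual compaction look like 
(the mon ID "a" is a placeholder):

    # in ceph.conf:
    [mon]
        mon compact on start = true

    # or trigger compaction on a running mon without restarting it:
    ceph tell mon.a compact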

[image.png]
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
