I believe this is the reason.

I mean the number of OSDs in the “up” set should be at least 1 greater than the 
pool's min_size for the upgrade to proceed. Otherwise, once any OSD is stopped, 
the PG can drop below min_size, which would prevent it from becoming active. So 
just clean up the misplaced objects and the upgrade should proceed automatically.
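
You can see which PGs are affected, and their "up" vs. "acting" sets, with 
something like this (just a sketch; the state filter should work on recent 
releases):

  # list only the remapped PGs, including their UP and ACTING sets
  ceph pg ls remapped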

But I’m a little confused. I think if you have only 2 up OSDs in a size-3 
replicated pool, the PG should be in a degraded state and should give you a 
HEALTH_WARN.

On Feb 11, 2022, at 03:06, Zach Heise (SSCC) <he...@ssc.wisc.edu> wrote:



Hi Weiwen, thanks for replying.

All of my replicated pools, including the newest ssdpool I made most recently, 
have a min_size of 2. My other two EC pools have a min_size of 3.

Looking at pg dump output again, it does look like the two EC pools have 
exactly 4 OSDs listed in the "Acting" column, and everything else has 3 OSDs in 
Acting. So that's as it should be, I believe?

I do have some 'misplaced' objects on 8 different PGs (the 
active+clean+remapped ones listed in my original ceph -s output); those PGs 
only have 2 "up" OSDs listed, but each has 3 OSDs in the "Acting" column, as 
they should. Apparently these 231 misplaced objects aren't enough to cause ceph 
to drop out of HEALTH_OK status.
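
If it helps, I believe a one-liner like this would pull those PGs out of pg 
dump (the awk column positions match my pgs_brief output and may differ on 
other releases):

  # remapped PGs with their UP and ACTING sets
  # (columns here: PG_STAT STATE UP UP_PRIMARY ACTING ACTING_PRIMARY)
  ceph pg dump pgs_brief 2>/dev/null | awk '$2 ~ /remapped/ {print $1, "up=" $3, "acting=" $5}'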

Zach


On 2022-02-10 12:41 PM, huw...@outlook.com wrote:

Hi Zach,

How about your min_size setting? Have you checked whether the number of OSDs in 
the acting set of every PG is at least 1 greater than the min_size of the 
corresponding pool?
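
If you're not sure, something like this should show it (just a sketch; replace 
<pool-name> with one of your pools):

  # size and min_size are listed for every pool
  ceph osd pool ls detail
  # or query a single pool directly
  ceph osd pool get <pool-name> min_size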

Weiwen Hu



On Feb 10, 2022, at 05:02, Zach Heise (SSCC) <he...@ssc.wisc.edu> wrote:

Hello,

ceph health detail says my 5-node cluster is healthy, yet when I ran ceph orch 
upgrade start --ceph-version 16.2.7, everything seemed to go fine until we got 
to the OSD section. Now, for the past hour, a new log entry of 'Upgrade: unsafe 
to stop osd(s) at this time (1 PGs are or would become offline)' has appeared 
every 15 seconds.

ceph pg dump_stuck (unclean, degraded, etc.) shows "ok" for everything too. Yet 
somehow 1 PG is (apparently) holding up all the OSD upgrades and not letting 
the process finish. Should I stop the upgrade and try it again? (I haven't done 
that before, so I was just nervous to try it.) Any other ideas?

 cluster:
   id:     9aa000e8-b999-11eb-82f2-ecf4bbcc0ac0
   health: HEALTH_OK
  services:
   mon: 4 daemons, quorum ceph05,ceph04,ceph01,ceph03 (age 92m)
   mgr: ceph03.futetp(active, since 97m), standbys: ceph01.fblojp
   mds: 1/1 daemons up, 1 hot standby
   osd: 33 osds: 33 up (since 2h), 33 in (since 4h); 9 remapped pgs
  data:
   volumes: 1/1 healthy
   pools:   7 pools, 193 pgs
   objects: 3.72k objects, 14 GiB
   usage:   43 GiB used, 64 TiB / 64 TiB avail
   pgs:     231/11170 objects misplaced (2.068%)
            185 active+clean
            8   active+clean+remapped
  io:
   client:   1.2 KiB/s rd, 2 op/s rd, 0 op/s wr
  progress:
   Upgrade to 16.2.7 (5m)
     [=====.......................] (remaining: 24m)

--
Zach

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
