[ceph-users] Re: Cluster healthy, but 16.2.7 osd daemon upgrade says its unsafe to stop them?

2022-05-28 Thread Janne Johansson
On Fri, 27 May 2022 at 18:26, Sarunas Burdulis wrote: > Thanks. I don't recall creating any of the default.* pools, so they > might have been created by ceph-deploy, years ago (kraken?). They all have > min_size 1, replica 2. Those are automatically created by radosgw when it starts. -- May the most
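
For reference, a rough way to find the rgw-created pools and raise their replication; the default.rgw.meta name below is only illustrative, since the thread just says "default.*":
$ ceph osd pool ls | grep '^default\.'
$ ceph osd pool set default.rgw.meta size 3
$ ceph osd pool set default.rgw.meta min_size 2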

[ceph-users] Re: Cluster healthy, but 16.2.7 osd daemon upgrade says its unsafe to stop them?

2022-05-27 Thread Sarunas Burdulis
On 5/27/22 11:41, Bogdan Adrian Velica wrote: Hi, Can you please tell us the size of your ceph cluster? How many OSDs do you have? 16 OSDs. $ ceph df --- RAW STORAGE --- CLASS SIZE AVAIL USED RAW USED %RAW USED hdd 8.9 TiB 8.3 TiB 595 GiB 595 GiB 6.55 ssd 7.6 TiB
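
For context, the OSD count and layout can be confirmed with, e.g.:
$ ceph osd stat
$ ceph osd tree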

[ceph-users] Re: Cluster healthy, but 16.2.7 osd daemon upgrade says its unsafe to stop them?

2022-05-27 Thread Bogdan Adrian Velica
Hi, Can you please tell us the size of your ceph cluster? How many OSDs do you have? The default recommendation is min_size 2 and replica 3 per replicated pool. Thank you, Bogdan Velica croit.io On Fri, May 27, 2022 at 6:33 PM Sarunas Burdulis wrote: > On 5/27/22 04:54, Robert
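
Applied to a single replicated pool, that recommendation would look roughly like this (<pool> is a placeholder):
$ ceph osd pool set <pool> size 3
$ ceph osd pool set <pool> min_size 2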

[ceph-users] Re: Cluster healthy, but 16.2.7 osd daemon upgrade says its unsafe to stop them?

2022-05-27 Thread Sarunas Burdulis
On 5/27/22 04:54, Robert Sander wrote: On 26.05.22 at 20:21, Sarunas Burdulis wrote: size 2 min_size 1 With such a setting you are guaranteed to lose data. What would you suggest? -- Sarunas Burdulis Dartmouth Mathematics math.dartmouth.edu/~sarunas · https://useplaintext.email ·

[ceph-users] Re: Cluster healthy, but 16.2.7 osd daemon upgrade says its unsafe to stop them?

2022-05-27 Thread Sarunas Burdulis
On 5/26/22 14:38, Wesley Dillingham wrote: pool 13 'mathfs_metadata' replicated size 2 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change The problem is you have size=2 and min_size=2 on this pool. I would increase the size of this pool to 3 (but I

[ceph-users] Re: Cluster healthy, but 16.2.7 osd daemon upgrade says its unsafe to stop them?

2022-05-27 Thread Robert Sander
On 26.05.22 at 20:21, Sarunas Burdulis wrote: size 2 min_size 1 With such a setting you are guaranteed to lose data. Regards -- Robert Sander Heinlein Consulting GmbH Schwedter Str. 8/9b, 10119 Berlin http://www.heinlein-support.de Tel: 030 / 405051-43 Fax: 030 / 405051-19 Mandatory disclosures

[ceph-users] Re: Cluster healthy, but 16.2.7 osd daemon upgrade says its unsafe to stop them?

2022-05-26 Thread Wesley Dillingham
pool 13 'mathfs_metadata' replicated size 2 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change The problem is you have size=2 and min_size=2 on this pool. I would increase the size of this pool to 3 (but I would also do that to all of your pools which
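
Assuming the goal is size=3/min_size=2 on that pool, a sketch of the commands:
$ ceph osd pool set mathfs_metadata size 3
$ ceph osd pool set mathfs_metadata min_size 2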

[ceph-users] Re: Cluster healthy, but 16.2.7 osd daemon upgrade says its unsafe to stop them?

2022-05-26 Thread Sarunas Burdulis
On 5/26/22 14:09, Wesley Dillingham wrote: What does "ceph osd pool ls detail" say? $ ceph osd pool ls detail pool 0 'rbd' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 64 pgp_num 64 autoscale_mode on last_change 44740 flags hashpspool,selfmanaged_snaps stripe_width 0

[ceph-users] Re: Cluster healthy, but 16.2.7 osd daemon upgrade says its unsafe to stop them?

2022-05-26 Thread Wesley Dillingham
What does "ceph osd pool ls detail" say? Respectfully, *Wes Dillingham* w...@wesdillingham.com LinkedIn On Thu, May 26, 2022 at 11:24 AM Sarunas Burdulis < saru...@math.dartmouth.edu> wrote: > Running > > `ceph osd ok-to-stop 0` > > shows: > >

[ceph-users] Re: Cluster healthy, but 16.2.7 osd daemon upgrade says its unsafe to stop them?

2022-05-26 Thread Sarunas Burdulis
Running `ceph osd ok-to-stop 0` shows: {"ok_to_stop":false,"osds":[1], "num_ok_pgs":25,"num_not_ok_pgs":2, "bad_become_inactive":["13.a","13.11"],
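
The PGs listed under bad_become_inactive can be inspected directly, e.g.:
$ ceph pg map 13.a
$ ceph pg 13.11 query | less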

[ceph-users] Re: Cluster healthy, but 16.2.7 osd daemon upgrade says its unsafe to stop them?

2022-05-25 Thread Sarunas Burdulis
On 25/05/2022 15.39, Tim Olow wrote: Do you have any pools with only one replica? All pools are 'replicated size' 2 or 3, 'min_size' 1 or 2. -- Sarunas Burdulis Dartmouth Mathematics https://math.dartmouth.edu/~sarunas · https://useplaintext.email ·
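
One rough way to double-check size/min_size across every pool:
$ for p in $(ceph osd pool ls); do
>   echo "$p: $(ceph osd pool get $p size) $(ceph osd pool get $p min_size)"
> done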

[ceph-users] Re: Cluster healthy, but 16.2.7 osd daemon upgrade says its unsafe to stop them?

2022-05-25 Thread Tim Olow
Do you have any pools with only one replica? Tim On 5/25/22, 1:48 PM, "Sarunas Burdulis" wrote: > ceph health detail says my 5-node cluster is healthy, yet when I ran > ceph orch upgrade start --ceph-version 16.2.7 everything seemed to go > fine until we got to the OSD section,

[ceph-users] Re: Cluster healthy, but 16.2.7 osd daemon upgrade says its unsafe to stop them?

2022-05-25 Thread Sarunas Burdulis
ceph health detail says my 5-node cluster is healthy, yet when I ran ceph orch upgrade start --ceph-version 16.2.7 everything seemed to go fine until we got to the OSD section, now for the past hour, every 15 seconds a new log entry of 'Upgrade: unsafe to stop osd(s) at this time (1 PGs are
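
For what it's worth, the upgrade progress and the cephadm retry loop can be watched (or paused) with roughly:
$ ceph orch upgrade status
$ ceph -W cephadm          # follow the cephadm log channel
$ ceph orch upgrade pause  # if the loop needs to be stopped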

[ceph-users] Re: Cluster healthy, but 16.2.7 osd daemon upgrade says its unsafe to stop them?

2022-02-10 Thread Zach Heise (SSCC)
Yes, these 8 PGs have been in this 'remapped' state for quite a while. I don't know why CRUSH has not seen fit to designate new OSDs for them so that acting and up match. For the error in question - ceph upgrade is saying that only 1 PG would become offline if
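
The remapped PGs and their up/acting sets can be listed with something like:
$ ceph pg ls remapped
$ ceph pg dump pgs_brief | grep remapped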

[ceph-users] Re: Cluster healthy, but 16.2.7 osd daemon upgrade says its unsafe to stop them?

2022-02-10 Thread Gregory Farnum
“Up” is the set of OSDs which are alive from the calculated crush mapping. “Acting” includes those extras which have been added in to bring the PG up to proper size. So the PG does have 3 live OSDs serving it. But perhaps the safety check *is* looking at up instead of acting? That seems like a
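
To compare the two sets for one PG (the PG id is a placeholder, and jq is assumed to be installed):
$ ceph pg map <pgid>
$ ceph pg <pgid> query | jq '.up, .acting'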

[ceph-users] Re: Cluster healthy, but 16.2.7 osd daemon upgrade says its unsafe to stop them?

2022-02-10 Thread 胡 玮文
I believe this is the reason. I mean the number of OSDs in the “up” set should be at least 1 greater than the min_size for the upgrade to proceed. Or once any OSD is stopped, it can drop below min_size, and prevent the pg from becoming active. So just clean up the misplaced and the upgrade should

[ceph-users] Re: Cluster healthy, but 16.2.7 osd daemon upgrade says its unsafe to stop them?

2022-02-10 Thread Zach Heise (SSCC)
Hi Weiwen, thanks for replying. All of my replicated pools, including the newest ssdpool I made most recently, have a min_size of 2. My other two EC pools have a min_size of 3. Looking at pg dump output again, it does look like the two EC pools have exactly 4

[ceph-users] Re: Cluster healthy, but 16.2.7 osd daemon upgrade says its unsafe to stop them?

2022-02-10 Thread 胡 玮文
Hi Zach, How about your min_size setting? Have you checked the number of OSDs in the acting set of every PG is at least 1 greater than the min_size of the corresponding pool? Weiwen Hu > On Feb 10, 2022, at 05:02, Zach Heise (SSCC) wrote: > > Hello, > > ceph health detail says my 5-node cluster is
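
A rough way to eyeball that, PG by PG, against the pool's min_size (<pool> is a placeholder):
$ ceph pg dump pgs_brief            # UP and ACTING columns per PG
$ ceph osd pool get <pool> min_size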

[ceph-users] Re: Cluster healthy, but 16.2.7 osd daemon upgrade says its unsafe to stop them?

2022-02-10 Thread Zach Heise (SSCC)
That's an excellent point! Between my last ceph upgrade and now, I did make a new crush ruleset and a new pool that uses that crush rule. It was just for SSDs, of which I have 5, one per host. All of my other pools are using the default crush rulesets
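
A device-class rule of that kind is typically created along these lines (the rule name is illustrative; ssdpool is the pool mentioned above):
$ ceph osd crush rule create-replicated ssd-rule default host ssd
$ ceph osd pool set ssdpool crush_rule ssd-rule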

[ceph-users] Re: Cluster healthy, but 16.2.7 osd daemon upgrade says its unsafe to stop them?

2022-02-10 Thread Gregory Farnum
I don’t know how to get better errors out of cephadm, but the only way I can think of for this to happen is if your crush rule is somehow placing multiple replicas of a pg on a single host that cephadm wants to upgrade. So check your rules, your pool sizes, and osd tree? -Greg On Thu, Feb 10,
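
Concretely, those three checks are roughly:
$ ceph osd crush rule dump
$ ceph osd pool ls detail
$ ceph osd tree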

[ceph-users] Re: Cluster healthy, but 16.2.7 osd daemon upgrade says its unsafe to stop them?

2022-02-10 Thread Zach Heise (SSCC)
It could be an issue with the devicehealth pool as you are correct, it is a single PG - but when the cluster is reporting that everything is healthy, it's difficult to know where to go from there. What I don't understand is why it's refusing to upgrade ANY of the osd daemons;

[ceph-users] Re: Cluster healthy, but 16.2.7 osd daemon upgrade says its unsafe to stop them?

2022-02-09 Thread Anthony D'Atri
Speculation: might the devicehealth pool be involved? It seems to typically have just 1 PG. > On Feb 9, 2022, at 1:41 PM, Zach Heise (SSCC) wrote: > > Good afternoon, thank you for your reply. Yes I know you are right, > eventually we'll switch to an odd number of mons rather than even.
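
That pool (named device_health_metrics on Octopus/Pacific, if I recall correctly) can be checked with, e.g.:
$ ceph osd pool ls detail | grep device_health
$ ceph pg ls-by-pool device_health_metrics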

[ceph-users] Re: Cluster healthy, but 16.2.7 osd daemon upgrade says its unsafe to stop them?

2022-02-09 Thread Zach Heise (SSCC)
Good afternoon, thank you for your reply. Yes I know you are right, eventually we'll switch to an odd number of mons rather than even. We're still in 'testing' mode right now and only my coworkers and I are using the cluster. Of the 7 pools, all but 2 are replica x3. The last two are EC 2+2.

[ceph-users] Re: Cluster healthy, but 16.2.7 osd daemon upgrade says its unsafe to stop them?

2022-02-09 Thread sascha a.
Hello, all your pools running replica > 1? Also, having 4 monitors is pretty bad for split-brain situations. Zach Heise (SSCC) wrote on Wed, 9 Feb 2022, 22:02: > Hello, > > ceph health detail says my 5-node cluster is healthy, yet when I ran > ceph orch upgrade start --ceph-version 16.2.7
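
Monitor count and quorum can be confirmed with, for example:
$ ceph mon stat
$ ceph quorum_status --format json-pretty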