[ceph-users] Re: Nautilus: Decommission an OSD Node
Hi Dave,

It's been a few days and I haven't seen any follow-up on the list, so I'm wondering if the issue is a typo in your OSD list? It appears that you have 16 included again in the destination instead of 26: "24,25,16,27,28". I'm not familiar with the pgremapper script, so I may be misunderstanding your command.

Rich

On Thu, 2 Nov 2023 at 09:39, Dave Hall wrote:
>
> Hello,
>
> I've recently made the decision to gradually decommission my Nautilus
> cluster and migrate the hardware to a new Pacific or Quincy cluster. By
> gradually, I mean that as I expand the new cluster I will move (copy/erase)
> content from the old cluster to the new, making room to decommission more
> nodes and move them over.
>
> In order to do this I will, of course, need to remove OSD nodes by first
> emptying the OSDs on each node.
>
> I noticed that pgremapper (a version prior to October 2021) has a 'drain'
> subcommand that allows one to control which target OSDs would receive the
> PGs from the source OSD being drained. This seemed like a good idea: if
> one simply marks an OSD 'out', its contents would be rebalanced to other
> OSDs on the same node that are still active, which seems like it would
> cause a lot of unnecessary data movement and also make removing the next
> OSD take longer.
>
> So I went through the trouble of creating a 'really long' pgremapper drain
> command excluding the OSDs of two nodes as targets:
>
> # bin/pgremapper drain 16 --target-osds
> 00,01,02,03,04,05,06,07,24,25,16,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71
> --allow-movement-across host --max-source-backfills 75 --concurrency 20
> --verbose --yes
>
> However, when this completed, OSD 16 actually contained more PGs than
> before I started. It appears that the mapping generated by pgremapper also
> back-filled the OSD as it was draining it.
>
> So did I miss something here? What is the best way to proceed? I
> understand that it would be mayhem to mark 8 of 72 OSDs out and then turn
> backfill/rebalance/recovery back on. But it seems like there should be a
> better way.
>
> Suggestions?
>
> Thanks.
>
> -Dave
>
> --
> Dave Hall
> Binghamton University
> kdh...@binghamton.edu
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
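The typo Rich spotted can be caught mechanically. A minimal sketch (a hypothetical helper, not part of pgremapper) that scans a --target-osds list for duplicates and for the source OSD appearing as its own backfill target; both problems are present in the command above:

```python
def check_target_osds(source_osd, target_csv):
    """Flag problems in a pgremapper-style --target-osds list:
    duplicate entries, and the source OSD listed as its own target."""
    targets = [int(t) for t in target_csv.split(",")]
    seen, dups = set(), []
    for t in targets:
        if t in seen:
            dups.append(t)
        seen.add(t)
    problems = []
    if dups:
        problems.append(f"duplicate target OSDs: {sorted(set(dups))}")
    if source_osd in seen:
        problems.append(f"source osd.{source_osd} is also a backfill target")
    return problems

# The list from the command above: 16 appears where 26 was likely
# intended, and 41 is listed twice.
targets = ("00,01,02,03,04,05,06,07,24,25,16,27,28,29,30,31,32,33,34,35,"
           "36,37,38,39,40,41,41,42,43,44,45,46,47,48,49,50,51,52,53,54,"
           "55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71")
print(check_target_osds(16, targets))
# → ['duplicate target OSDs: [41]', 'source osd.16 is also a backfill target']
```

A check like this before running a long drain command would explain why osd.16 ended up receiving PGs while being drained: it was in its own target list.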
[ceph-users] Re: Many pgs inactive after node failure
On Sat, Nov 4, 2023, 6:44 AM Matthew Booth wrote:
> I have a 3 node ceph cluster in my home lab. One of the pools spans 3
> hdds, one on each node, and has size 2, min size 1. One of my nodes is
> currently down, and I have 160 pgs in 'unknown' state. The other 2
> hosts are up and the cluster has quorum.
>
> Example `ceph health detail` output:
> pg 9.0 is stuck inactive for 25h, current state unknown, last acting []
>
> I have 3 questions:
>
> Why would the pgs be in an unknown state?

No quick answer to this, unfortunately. Try `ceph pg map 9.0` and look at it alongside the output of `ceph osd tree`. Do you have device classes/CRUSH rules or anything else you were tinkering with? Did the OSD that failed get marked out? Do you have an active mgr? Does `ceph health detail` indicate anything else being a problem?

> I would like to recover the cluster without recovering the failed
> node, primarily so that I know I can. Is that possible?
>
> The boot nvme of the host has failed, so I will most likely rebuild
> it. I'm running rook, and I will most likely delete the old node and
> create a new one with the same name. AFAIK, the OSDs are fine. When
> rook rediscovers the OSDs, will it add them back with data intact? If
> not, is there any way I can make it so it will?

Assuming you used standard tools/playbooks, mostly everything just puts Ceph OSDs onto an LVM partition. As long as you leave the LVM partition alone, you can just tell Ceph to scan the LVM for metadata and "activate" it again (in Ceph parlance), as another user mentioned.

Happy homelabbing!

> Thanks!
> --
> Matthew Booth
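The symptom Matthew reports (empty acting set, state "unknown") can be spotted in bulk rather than reading `ceph health detail` line by line. A small sketch, assuming records shaped like the `pg_stats` entries of `ceph pg dump --format json` (field names should be verified against your release):

```python
def find_inactive_pgs(pg_stats):
    """Return the pgids whose acting set is empty or whose state is
    'unknown' — the symptom described in this thread (last acting [])."""
    bad = []
    for pg in pg_stats:
        if not pg.get("acting") or "unknown" in pg.get("state", ""):
            bad.append(pg["pgid"])
    return bad

# Sample records shaped like pg_stats entries from `ceph pg dump`.
sample = [
    {"pgid": "9.0", "state": "unknown", "acting": []},
    {"pgid": "9.1", "state": "active+clean", "acting": [2, 5]},
]
print(find_inactive_pgs(sample))  # → ['9.0']
```

Feeding the real dump through a filter like this makes it easy to see whether all 160 unknown PGs belong to the same pool or CRUSH rule, which is usually the first clue.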
[ceph-users] Re: OSD not starting
Hi Alex,

Thank you very much. Yes, it was a time sync issue; after fixing time sync the OSD service started.

regards,
Amudhan

On Sat, Nov 4, 2023 at 9:07 PM Alex Gorbachev wrote:
> Hi Amudhan,
>
> Have you checked the time sync? This could be the issue:
>
> https://tracker.ceph.com/issues/17170
> --
> Alex Gorbachev
> Intelligent Systems Services Inc.
> http://www.iss-integration.com
> https://www.linkedin.com/in/alex-gorbachev-iss/
>
> On Sat, Nov 4, 2023 at 11:22 AM Amudhan P wrote:
>> Hi,
>>
>> One of the servers in the Ceph cluster shut down abruptly due to a
>> power failure. After restarting, the OSDs are not coming up, and the
>> Ceph health check shows them as down.
>> Each OSD logs "osd.26 18865 unable to obtain rotating service
>> keys; retrying" every 30 seconds, and it's the same for all OSDs on
>> the system.
>>
>> Nov 04 20:03:05 strg-node-03 bash[34287]: debug
>> 2023-11-04T14:33:05.089+ 7f1f5693c080 -1 osd.26 18865 unable to obtain
>> rotating service keys; retrying
>> Nov 04 20:03:35 strg-node-03 bash[34287]: debug
>> 2023-11-04T14:33:35.090+ 7f1f5693c080 -1 osd.26 18865 unable to obtain
>> rotating service keys; retrying
>>
>> ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific
>> (stable) on Debian 11 bullseye, cephadm-based installation.
>>
>> I tried searching for the error message but couldn't find anything
>> useful.
>>
>> How do I fix this issue?
>>
>> regards,
>> Amudhan
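The failure mode here is worth spelling out: cephx rotating service keys are only valid within a time window, so an OSD whose clock has drifted beyond the allowed tolerance keeps rejecting otherwise-good keys and logs exactly this "unable to obtain rotating service keys" message. A toy model (an illustration only, not Ceph's actual ticket format; 0.05 s is Ceph's default monitor clock-drift allowance):

```python
def key_valid(osd_clock, issued_at, ttl, max_skew=0.05):
    """A rotating key issued at `issued_at` with lifetime `ttl` seconds
    is usable only if the OSD's clock falls inside the validity window,
    padded by the allowed clock skew."""
    return issued_at - max_skew <= osd_clock <= issued_at + ttl + max_skew

# Monitor issues a key at t=1000 that is valid for 3600 s.
print(key_valid(1000.0, 1000.0, 3600))  # → True  (clocks in sync)
print(key_valid(995.0, 1000.0, 3600))   # → False (OSD clock 5 s behind)
```

This is why a node that comes back from a hard power-off with a drifted hardware clock can see every OSD on it stuck retrying until time sync (chrony/ntpd) is fixed, as happened here.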
[ceph-users] Re: Many pgs inactive after node failure
Hi,

this is another example of why min_size 1 / size 2 is a bad choice (if you value your data). There have been plenty of discussions on this list about that, so I won't go into detail here. I'm not familiar with rook, but activating existing OSDs usually works fine [1].

Regards,
Eugen

[1] https://docs.ceph.com/en/reef/cephadm/services/osd/#activate-existing-osds

Quoting Matthew Booth:

> I have a 3 node ceph cluster in my home lab. One of the pools spans 3
> hdds, one on each node, and has size 2, min size 1. One of my nodes is
> currently down, and I have 160 pgs in 'unknown' state. The other 2
> hosts are up and the cluster has quorum.
>
> Example `ceph health detail` output:
> pg 9.0 is stuck inactive for 25h, current state unknown, last acting []
>
> I have 3 questions:
>
> Why would the pgs be in an unknown state?
>
> I would like to recover the cluster without recovering the failed
> node, primarily so that I know I can. Is that possible?
>
> The boot nvme of the host has failed, so I will most likely rebuild
> it. I'm running rook, and I will most likely delete the old node and
> create a new one with the same name. AFAIK, the OSDs are fine. When
> rook rediscovers the OSDs, will it add them back with data intact? If
> not, is there any way I can make it so it will?
>
> Thanks!
> --
> Matthew Booth