[ceph-users] Re: Nautilus: Decommission an OSD Node

2023-11-05 Thread Richard Bade
Hi Dave,
It's been a few days and I haven't seen any follow-up on the list, so
I'm wondering if the issue is that there was a typo in your OSD list?
It appears that you have 16 included again in the destination instead of 26?
"24,25,16,27,28"
I'm not familiar with the pgremapper script so I may be
misunderstanding your command.
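
If that's the typo, the fix would presumably just be swapping that 16 for 26
in the target list, i.e. "...,24,25,26,27,28,...", and re-running the same
drain command.

Also, purely as an untested sketch (not pgremapper-specific): a common way to
make sure a draining OSD is never chosen as a backfill target is to drop its
CRUSH weight to zero before moving its PGs off, e.g.

ceph osd crush reweight osd.16 0

With a CRUSH weight of 0 the OSD stays up and keeps serving its current PGs
while they migrate away, but no new PGs should map onto it.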

Rich

On Thu, 2 Nov 2023 at 09:39, Dave Hall  wrote:
>
> Hello,
>
> I've recently made the decision to gradually decommission my Nautilus
> cluster and migrate the hardware to a new Pacific or Quincy cluster. By
> gradually, I mean that as I expand the new cluster I will move (copy/erase)
> content from the old cluster to the new, making room to decommission more
> nodes and move them over.
>
> In order to do this I will, of course, need to remove OSD nodes by first
> emptying the OSDs on each node.
>
> I noticed that pgremapper (a version prior to October 2021) has a 'drain'
> subcommand that allows one to control which target OSDs would receive the
> PGs from the source OSD being drained.  This seemed like a good idea:  If
> one simply marks an OSD 'out', its contents would be rebalanced to other
> OSDs on the same node that are still active, which seems like it would cause
> a lot of unnecessary data movement and also make removing the next OSD take
> longer.
>
> So I went through the trouble of creating a 'really long' pgremapper drain
> command excluding the OSDs of two nodes as targets:
>
> # bin/pgremapper drain 16 --target-osds
> 00,01,02,03,04,05,06,07,24,25,16,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71
> --allow-movement-across host  --max-source-backfills 75 --concurrency 20
> --verbose --yes
>
>
> However, when this completed, OSD 16 actually contained more PGs than
> before I started.  It appears that the mapping generated by pgremapper also
> backfilled the OSD as it was draining it.
>
> So did I miss something here?  What is the best way to proceed?  I
> understand that it would be mayhem to mark 8 of 72 OSDs out and then turn
> backfill/rebalance/recover back on.  But it seems like there should be a
> better way.
>
> Suggestions?
>
> Thanks.
>
> -Dave
>
> --
> Dave Hall
> Binghamton University
> kdh...@binghamton.edu
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Many pgs inactive after node failure

2023-11-05 Thread Tyler Stachecki
On Sat, Nov 4, 2023, 6:44 AM Matthew Booth  wrote:

> I have a 3 node ceph cluster in my home lab. One of the pools spans 3
> hdds, one on each node, and has size 2, min size 1. One of my nodes is
> currently down, and I have 160 pgs in 'unknown' state. The other 2
> hosts are up and the cluster has quorum.
>
> Example `ceph health detail` output:
> pg 9.0 is stuck inactive for 25h, current state unknown, last acting []
>
> I have 3 questions:
>
> Why would the pgs be in an unknown state?
>

No quick answer to this, unfortunately. Try `ceph pg map 9.0` and look
at it alongside the output of `ceph osd tree`.

Do you have device classes/CRUSH rules or anything that you were tinkering
with? Did the OSD that failed get marked out? Do you have an active mgr?
Does `ceph health detail` indicate any other problems?
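
Something along these lines (just a sketch, using your pg 9.0 as the example)
should show whether CRUSH can place the PG at all and whether a mgr is active:

# where does CRUSH want to put this PG, and which OSDs are acting?
ceph pg map 9.0
# are the surviving OSDs up/in, and under the expected CRUSH buckets?
ceph osd tree
# is there an active mgr? pg states are reported via the mgr, so 'unknown'
# often just means nothing is reporting
ceph mgr stat
ceph health detail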


> I would like to recover the cluster without recovering the failed
> node, primarily so that I know I can. Is that possible?
>
> The boot nvme of the host has failed, so I will most likely rebuild
> it. I'm running rook, and I will most likely delete the old node and
> create a new one with the same name. AFAIK, the OSDs are fine. When
> rook rediscovers the OSDs, will it add them back with data intact? If
> not, is there any way I can make it so it will?
>

Assuming you used standard tools/playbooks, mostly everything just shoves
Ceph OSDs onto an LVM partition. As long as you leave the LVM partition
alone, you can just tell Ceph to scan the LVM for metadata and "activate"
it again (in Ceph parlance), as another user mentioned.
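
Outside of rook, that usually looks something like this on the affected host
(a sketch only; paths and unit handling differ between plain and
cephadm/container setups, and rook may well do this for you on its own):

# list the OSD metadata ceph-volume can find on the LVM volumes
ceph-volume lvm list
# recreate the OSD mounts from that metadata and start the OSDs
ceph-volume lvm activate --all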

Happy homelabbing!


> Thanks!
> --
> Matthew Booth
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSD not starting

2023-11-05 Thread Amudhan P
Hi Alex,

Thank you very much. Yes, it was a time sync issue; after fixing the time
sync, the OSD service started.
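
For anyone else who hits the same "unable to obtain rotating service keys"
message, a quick way to confirm clock skew (just a sketch; adjust for whatever
time daemon you run):

# clock skew between the monitors, as Ceph sees it
ceph time-sync-status
# on each affected node: is NTP enabled and the clock synchronized?
timedatectl status
# if chrony is the time daemon on that node
chronyc tracking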

regards,
Amudhan

On Sat, Nov 4, 2023 at 9:07 PM Alex Gorbachev 
wrote:

> Hi  Amudhan,
>
> Have you checked the time sync?  This could be an issue:
>
> https://tracker.ceph.com/issues/17170
> --
> Alex Gorbachev
> Intelligent Systems Services Inc.
> http://www.iss-integration.com
> https://www.linkedin.com/in/alex-gorbachev-iss/
>
>
>
> On Sat, Nov 4, 2023 at 11:22 AM Amudhan P  wrote:
>
>> Hi,
>>
>> One of the servers in the Ceph cluster shut down abruptly due to a
>> power failure. After restarting, the OSDs are not coming up, and the Ceph
>> health check shows them as down.
>> When checking the OSD status I see "osd.26 18865 unable to obtain rotating
>> service keys; retrying".
>> Every 30 seconds it just logs this message, and it is the same for
>> all OSDs on the system.
>>
>> Nov 04 20:03:05 strg-node-03 bash[34287]: debug
>> 2023-11-04T14:33:05.089+ 7f1f5693c080 -1 osd.26 18865 unable to obtain
>> rotating service keys; retrying
>> Nov 04 20:03:35 strg-node-03 bash[34287]: debug
>> 2023-11-04T14:33:35.090+ 7f1f5693c080 -1 osd.26 18865 unable to obtain
>> rotating service keys; retrying
>>
>> ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific
>> (stable) on Debian 11 bullseye, with a cephadm-based installation.
>>
>> I tried searching for the error message but couldn't find anything useful.
>>
>> How do I fix this issue?
>>
>> regards,
>> Amudhan
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Many pgs inactive after node failure

2023-11-05 Thread Eugen Block

Hi,

this is another example of why min_size 1/size 2 is a bad choice (if you
value your data). There have been plenty of discussions on this list
about that, so I won't go into detail here. I'm not familiar
with rook, but activating existing OSDs usually works fine [1].
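
Once the failed node is back and the cluster has recovered, moving that pool
to a safer size 3 / min_size 2 is just (a sketch; substitute your pool name
and make sure you have the capacity for a third replica):

ceph osd pool set <pool> size 3
ceph osd pool set <pool> min_size 2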


Regards,
Eugen

[1] https://docs.ceph.com/en/reef/cephadm/services/osd/#activate-existing-osds
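
For a cephadm-managed cluster [1] essentially boils down to something like

ceph cephadm osd activate <host>

with the hostname substituted; how (or whether) rook exposes the equivalent I
can't say.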

Quoting Matthew Booth :


I have a 3 node ceph cluster in my home lab. One of the pools spans 3
hdds, one on each node, and has size 2, min size 1. One of my nodes is
currently down, and I have 160 pgs in 'unknown' state. The other 2
hosts are up and the cluster has quorum.

Example `ceph health detail` output:
pg 9.0 is stuck inactive for 25h, current state unknown, last acting []

I have 3 questions:

Why would the pgs be in an unknown state?

I would like to recover the cluster without recovering the failed
node, primarily so that I know I can. Is that possible?

The boot nvme of the host has failed, so I will most likely rebuild
it. I'm running rook, and I will most likely delete the old node and
create a new one with the same name. AFAIK, the OSDs are fine. When
rook rediscovers the OSDs, will it add them back with data intact? If
not, is there any way I can make it so it will?

Thanks!
--
Matthew Booth

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io