[ceph-users] Re: Urgent help! RGW Disappeared on Quincy
A few additional findings. I'm under the impression that the initial deployment and configuration of RGW on Quincy (net-new) creates a default realm. For reasons I don't understand, my default realm no longer exists. Could this be why the RGWs are not manageable / detected by my cluster? Under what circumstances would a realm disappear, and how can I get this fixed?

# radosgw-admin realm list
{
    "default_info": "",
    "realms": []
}

# radosgw-admin realm default
failed to init realm: (2) No such file or directory

# radosgw-admin realm get-default
No default realm is set

# radosgw-admin realm list-periods
failed to read realm: (2) No such file or directory

# rados ls -p .rgw.root
zonegroup_info.45518452-8aa6-41b4-99f0-059b255c31cd
zone_info.743ea532-f5bc-4cca-891b-c27a586d5129
zone_names.default
zonegroups_names.default

On Sat, Dec 31, 2022 at 3:15 PM Deep Dish wrote:

> Hi Pavin,
>
> Happy New Year!
>
> Many thanks for the commands. I managed to get the cluster into green
> status with the repeer command.
>
> Still had some slow MDS ops; I decided to purge a few older backups in a
> repository that's backed by one of the cephfs volumes. This resolved all
> slow ops -- there must have been an inconsistent file causing an issue.
> Not a big deal, as it's a backup (the issue likely manifested itself while
> the cluster was rebuilding / rebalancing).
> I'm still having an issue in getting RGWs to be manageable / detected by
> the cluster (any ideas appreciated):
> - Increased debug logging to 5/5 across debug_rgw, debug_rgw_datacache,
>   debug_rgw_sync;
> - Redeployed the RGW service (removed and recreated the service);
> - All GWs show similar log entries -- there's a "failed to init realm id"
>   error (could that be it?):
>
> 2022-12-31T20:02:55.570+ 7fea6d76a5c0 0 deferred set uid:gid to 167:167 (ceph:ceph)
> 2022-12-31T20:02:55.570+ 7fea6d76a5c0 0 ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable), process radosgw, pid 2
> 2022-12-31T20:02:55.570+ 7fea6d76a5c0 0 framework: beast
> 2022-12-31T20:02:55.570+ 7fea6d76a5c0 0 framework conf key: port, val: 80
> 2022-12-31T20:02:55.570+ 7fea6d76a5c0 1 radosgw_Main not setting numa affinity
> 2022-12-31T20:02:55.574+ 7fea6d76a5c0 1 rgw_d3n: rgw_d3n_l1_local_datacache_enabled=0
> 2022-12-31T20:02:55.574+ 7fea6d76a5c0 1 D3N datacache enabled: 0
> 2022-12-31T20:02:55.590+ 7fea6d76a5c0 4 rgw main: RGWPeriod::init failed to init realm id : (2) No such file or directory
> 2022-12-31T20:02:55.686+ 7fea6d76a5c0 4 rgw main: Realm: ()
> 2022-12-31T20:02:55.686+ 7fea6d76a5c0 4 rgw main: ZoneGroup: default (45518452-8aa6-41b4-99f0-059b255c31cd)
> 2022-12-31T20:02:55.686+ 7fea6d76a5c0 4 rgw main: Zone: default (743ea532-f5bc-4cca-891b-c27a586d5129)
> 2022-12-31T20:02:56.010+ 7fea6d76a5c0 2 all 8 watchers are set, enabling cache
> 2022-12-31T20:02:56.026+ 7fea52ab9700 2 rgw data changes log: RGWDataChangesLog::ChangesRenewThread: start
> 2022-12-31T20:02:57.170+ 7fea4eab1700 2 garbage collection: garbage collection: start
> 2022-12-31T20:02:57.170+ 7fea4e2b0700 2 rgw object expirer Worker thread: object expiration: start
> 2022-12-31T20:02:57.170+ 7fea48aa5700 5 lifecycle: schedule life cycle next start time: Sun Jan 1 00:00:00 2023
> 2022-12-31T20:02:57.170+ 7fea46aa1700 5 lifecycle: schedule life cycle next start time: Sun Jan 1 00:00:00 2023
> 2022-12-31T20:02:57.170+ 7fea44a9d700 5 lifecycle: schedule life cycle next start time: Sun Jan 1 00:00:00 2023
> 2022-12-31T20:02:59.186+ 7fea4eab1700 2 garbage collection: garbage collection: stop
> 2022-12-31T20:03:18.029+ 7fea52ab9700 2 rgw data changes log: RGWDataChangesLog::ChangesRenewThread: start
> 2022-12-31T20:03:40.029+ 7fea52ab9700 2 rgw data changes log: RGWDataChangesLog::ChangesRenewThread: start
> 2022-12-31T20:04:02.028+ 7fea52ab9700 2 rgw data changes log: RGWDataChangesLog::ChangesRenewThread: start
> 2022-12-31T20:04:24.027+ 7fea52ab9700 2 rgw data changes log: RGWDataChangesLog::ChangesRenewThread: start
> 2022-12-31T20:04:46.031+ 7fea52ab9700 2 rgw data changes log: RGWDataChangesLog::ChangesRenewThread: start
> 2022-12-31T20:05:08.030+ 7fea52ab9700 2 rgw data changes log: RGWDataChangesLog::ChangesRenewThread: start
> 2022-12-31T20:05:30.030+ 7fea52ab9700 2 rgw data changes log: RGWDataChangesLog::ChangesRenewThread: start
> 2022-12-31T20:05:52.029+ 7fea52ab9700 2 rgw data changes log: RGWDataChangesLog::ChangesRenewThread: start
> 2022-12-31T20:06:14.029+ 7fea52ab9700 2 rgw data changes log: RGWDataChangesLog::ChangesRenewThread: start
> 2022-12-31T20:06:36.028+ 7fea52ab9700 2 rgw data changes log: RGWDataChangesLog::ChangesRenewThread: start
> 2022-12-31T20:06:58.032+ 7fea52ab9700
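[Editor's note: given the state shown above (a surviving default zonegroup and zone in .rgw.root, but no realm), one commonly suggested recovery path is to recreate a realm and attach the existing default zonegroup/zone to it. This is only a sketch, assuming a single-site deployment with the name "default" and no multisite sync configured; verify against your cluster before running anything:]

```shell
# Sketch only: recreate a realm and bind the surviving default
# zonegroup and zone to it. Assumes a single-site cluster; do NOT
# run against a multisite deployment without checking first.
radosgw-admin realm create --rgw-realm=default --default
radosgw-admin zonegroup modify --rgw-zonegroup=default --rgw-realm=default --default
radosgw-admin zone modify --rgw-zone=default --rgw-realm=default --default
# Commit the new period so the RGWs pick up the realm on restart.
radosgw-admin period update --commit
# Then restart the RGW daemons (service name is deployment-specific):
# ceph orch restart rgw.<service-name>
```

After a restart, the "RGWPeriod::init failed to init realm id" line should no longer appear in the RGW startup log if the realm was the problem.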
[ceph-users] Re: Urgent help! RGW Disappeared on Quincy
Hi Pavin,

Happy New Year!

Many thanks for the commands. I managed to get the cluster into green status with the repeer command.

Still had some slow MDS ops; I decided to purge a few older backups in a repository that's backed by one of the cephfs volumes. This resolved all slow ops -- there must have been an inconsistent file causing an issue. Not a big deal, as it's a backup (the issue likely manifested itself while the cluster was rebuilding / rebalancing).

I'm still having an issue in getting RGWs to be manageable / detected by the cluster (any ideas appreciated):
- Increased debug logging to 5/5 across debug_rgw, debug_rgw_datacache, debug_rgw_sync;
- Redeployed the RGW service (removed and recreated the service);
- All GWs show similar log entries -- there's a "failed to init realm id" error (could that be it?):

2022-12-31T20:02:55.570+ 7fea6d76a5c0 0 deferred set uid:gid to 167:167 (ceph:ceph)
2022-12-31T20:02:55.570+ 7fea6d76a5c0 0 ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable), process radosgw, pid 2
2022-12-31T20:02:55.570+ 7fea6d76a5c0 0 framework: beast
2022-12-31T20:02:55.570+ 7fea6d76a5c0 0 framework conf key: port, val: 80
2022-12-31T20:02:55.570+ 7fea6d76a5c0 1 radosgw_Main not setting numa affinity
2022-12-31T20:02:55.574+ 7fea6d76a5c0 1 rgw_d3n: rgw_d3n_l1_local_datacache_enabled=0
2022-12-31T20:02:55.574+ 7fea6d76a5c0 1 D3N datacache enabled: 0
2022-12-31T20:02:55.590+ 7fea6d76a5c0 4 rgw main: RGWPeriod::init failed to init realm id : (2) No such file or directory
2022-12-31T20:02:55.686+ 7fea6d76a5c0 4 rgw main: Realm: ()
2022-12-31T20:02:55.686+ 7fea6d76a5c0 4 rgw main: ZoneGroup: default (45518452-8aa6-41b4-99f0-059b255c31cd)
2022-12-31T20:02:55.686+ 7fea6d76a5c0 4 rgw main: Zone: default (743ea532-f5bc-4cca-891b-c27a586d5129)
2022-12-31T20:02:56.010+ 7fea6d76a5c0 2 all 8 watchers are set, enabling cache
2022-12-31T20:02:56.026+ 7fea52ab9700 2 rgw data changes log: RGWDataChangesLog::ChangesRenewThread: start
2022-12-31T20:02:57.170+ 7fea4eab1700 2 garbage collection: garbage collection: start
2022-12-31T20:02:57.170+ 7fea4e2b0700 2 rgw object expirer Worker thread: object expiration: start
2022-12-31T20:02:57.170+ 7fea48aa5700 5 lifecycle: schedule life cycle next start time: Sun Jan 1 00:00:00 2023
2022-12-31T20:02:57.170+ 7fea46aa1700 5 lifecycle: schedule life cycle next start time: Sun Jan 1 00:00:00 2023
2022-12-31T20:02:57.170+ 7fea44a9d700 5 lifecycle: schedule life cycle next start time: Sun Jan 1 00:00:00 2023
2022-12-31T20:02:59.186+ 7fea4eab1700 2 garbage collection: garbage collection: stop
2022-12-31T20:03:18.029+ 7fea52ab9700 2 rgw data changes log: RGWDataChangesLog::ChangesRenewThread: start
2022-12-31T20:03:40.029+ 7fea52ab9700 2 rgw data changes log: RGWDataChangesLog::ChangesRenewThread: start
2022-12-31T20:04:02.028+ 7fea52ab9700 2 rgw data changes log: RGWDataChangesLog::ChangesRenewThread: start
2022-12-31T20:04:24.027+ 7fea52ab9700 2 rgw data changes log: RGWDataChangesLog::ChangesRenewThread: start
2022-12-31T20:04:46.031+ 7fea52ab9700 2 rgw data changes log: RGWDataChangesLog::ChangesRenewThread: start
2022-12-31T20:05:08.030+ 7fea52ab9700 2 rgw data changes log: RGWDataChangesLog::ChangesRenewThread: start
2022-12-31T20:05:30.030+ 7fea52ab9700 2 rgw data changes log: RGWDataChangesLog::ChangesRenewThread: start
2022-12-31T20:05:52.029+ 7fea52ab9700 2 rgw data changes log: RGWDataChangesLog::ChangesRenewThread: start
2022-12-31T20:06:14.029+ 7fea52ab9700 2 rgw data changes log: RGWDataChangesLog::ChangesRenewThread: start
2022-12-31T20:06:36.028+ 7fea52ab9700 2 rgw data changes log: RGWDataChangesLog::ChangesRenewThread: start
2022-12-31T20:06:58.032+ 7fea52ab9700 2 rgw data changes log: RGWDataChangesLog::ChangesRenewThread: start
2022-12-31T20:07:20.031+ 7fea52ab9700 2 rgw data changes log: RGWDataChangesLog::ChangesRenewThread: start
2022-12-31T20:07:42.031+ 7fea52ab9700 2 rgw data changes log: RGWDataChangesLog::ChangesRenewThread: start
2022-12-31T20:08:04.030+ 7fea52ab9700 2 rgw data changes log: RGWDataChangesLog::ChangesRenewThread: start
2022-12-31T20:08:26.029+ 7fea52ab9700 2 rgw data changes log: RGWDataChangesLog::ChangesRenewThread: start
2022-12-31T20:08:48.029+ 7fea52ab9700 2 rgw data changes log: RGWDataChangesLog::ChangesRenewThread: start
2022-12-31T20:09:10.032+ 7fea52ab9700 2 rgw data changes log: RGWDataChangesLog::ChangesRenewThread: start
2022-12-31T20:09:32.028+ 7fea52ab9700 2 rgw data changes log: RGWDataChangesLog::ChangesRenewThread: start
2022-12-31T20:09:54.027+ 7fea52ab9700 2 rgw data changes log: RGWDataChangesLog::ChangesRenewThread: start

On Sat, Dec 31, 2022 at 1:22 AM Pavin Joseph wrote:

> Hey there,
>
> Sorry for the late reply.
> If the pg
[ceph-users] Re: How to shutdown a ceph node
Yes I do. It's the Ceph default, we use it on clusters of any size (the smallest is 3 hosts with 6 disks each), and it removes a lot of headache. :-) And as the OP did not provide any config, I assumed he uses the defaults.

Happy new year.

> On 31.12.2022 at 15:11, Anthony D'Atri wrote:
>
> Are you using size=3 replication and failure domain = host? If so you'll be ok.
> We see folks sometimes using an EC profile that will result in PGs down,
> especially with such a small cluster.
>
>> On Dec 31, 2022, at 4:11 AM, Boris wrote:
>>
>> Hi,
>> I usually do 'ceph osd set noout' and 'ceph osd set norebalance' and then
>> shut down the OS normally.
>>
>> After everything is done I unset both values and let the objects recover.
>>
>> Cheers and happy new year.
>>
>>> On 31.12.2022 at 08:52, Bülent ŞENGÜLER wrote:
>>>
>>> Hello,
>>>
>>> I have a ceph cluster with 4 nodes and I have to shut down one of them
>>> due to electricity maintenance. I found how to shut down a cluster, but I
>>> could not find how to shut down a single node. How can I power off a node
>>> gracefully? Thanks for your answer.
>>>
>>> Regards.
>>> ___
>>> ceph-users mailing list -- ceph-users@ceph.io
>>> To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: How to shutdown a ceph node
Are you using size=3 replication and failure domain = host? If so you'll be ok.
We see folks sometimes using an EC profile that will result in PGs down, especially with such a small cluster.

> On Dec 31, 2022, at 4:11 AM, Boris wrote:
>
> Hi,
> I usually do 'ceph osd set noout' and 'ceph osd set norebalance' and then
> shut down the OS normally.
>
> After everything is done I unset both values and let the objects recover.
>
> Cheers and happy new year.
>
>> On 31.12.2022 at 08:52, Bülent ŞENGÜLER wrote:
>>
>> Hello,
>>
>> I have a ceph cluster with 4 nodes and I have to shut down one of them
>> due to electricity maintenance. I found how to shut down a cluster, but I
>> could not find how to shut down a single node. How can I power off a node
>> gracefully? Thanks for your answer.
>>
>> Regards.
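[Editor's note: to check whether a cluster actually matches the size=3 / failure-domain=host assumption discussed above, commands along these lines can be used (a sketch; run on a node with admin credentials):]

```shell
# Show per-pool settings; look for "size 3" (replicated) or the
# EC profile name in each pool's line.
ceph osd pool ls detail

# Dump CRUSH rules; for a host failure domain, the chooseleaf step
# should reference type "host".
ceph osd crush rule dump

# List any erasure-code profiles in use (k, m, crush-failure-domain).
ceph osd erasure-code-profile ls
```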
[ceph-users] Re: How to shutdown a ceph node
Hi,
I usually do 'ceph osd set noout' and 'ceph osd set norebalance' and then shut down the OS normally.

After everything is done I unset both values and let the objects recover.

Cheers and happy new year.

> On 31.12.2022 at 08:52, Bülent ŞENGÜLER wrote:
>
> Hello,
>
> I have a ceph cluster with 4 nodes and I have to shut down one of them
> due to electricity maintenance. I found how to shut down a cluster, but I
> could not find how to shut down a single node. How can I power off a node
> gracefully? Thanks for your answer.
>
> Regards.
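[Editor's note: the noout/norebalance procedure described above can be sketched as a short maintenance flow. A sketch, not an authoritative runbook; `ceph` must be run with admin privileges on a cluster node:]

```shell
# Before shutting the node down: stop the cluster from marking its
# OSDs "out" and from rebalancing data away while the host is absent.
ceph osd set noout
ceph osd set norebalance

# ... power off the host normally, perform the maintenance,
# ... then boot the host back up and let its OSDs rejoin.

# After the node's OSDs are back up: clear both flags so the
# cluster can recover any objects written in the meantime.
ceph osd unset noout
ceph osd unset norebalance

# Watch recovery progress until the cluster returns to HEALTH_OK.
ceph -s
```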