[ceph-users] Re: Urgent help! RGW Disappeared on Quincy

2022-12-31 Thread Deep Dish
A few additional findings.  I'm of the impression that an initial (net-new)
deployment and configuration of RGW on Quincy creates a default realm.  For
reasons I don't understand, my default realm no longer exists.

Could this be the issue with RGWs not being manageable / detected by my
cluster?
Under what circumstances would a Realm disappear?
How to get this fixed?

# radosgw-admin realm list

{
    "default_info": "",
    "realms": []
}


# radosgw-admin realm default

failed to init realm: (2) No such file or directory


# radosgw-admin realm get-default

No default realm is set



# radosgw-admin realm list-periods

failed to read realm: (2) No such file or directory


# rados ls -p .rgw.root

zonegroup_info.45518452-8aa6-41b4-99f0-059b255c31cd

zone_info.743ea532-f5bc-4cca-891b-c27a586d5129

zone_names.default

zonegroups_names.default
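
If the realm object really is gone but the default zonegroup / zone are still
intact (which is what the .rgw.root listing above suggests), my understanding
from the multisite docs is that the fix would look roughly like the following.
This is only a sketch -- I haven't run it against my cluster yet, and the realm
name "default" is just an assumption -- so please correct me if this is the
wrong approach:

# radosgw-admin realm create --rgw-realm=default --default

# radosgw-admin zonegroup modify --rgw-zonegroup=default --rgw-realm=default --master --default

# radosgw-admin zone modify --rgw-zone=default --rgw-zonegroup=default --rgw-realm=default --master --default

# radosgw-admin period update --commit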



On Sat, Dec 31, 2022 at 3:15 PM Deep Dish  wrote:

> Hi Pavin,
>
> Happy New Year!
>
> Many thanks for the commands.  I managed to get the cluster into green
> status with the repeer command.
>
> Still had some slow MDS ops, so I decided to purge a few older backups in a
> repository that's backed by one of the CephFS volumes.  This resolved all the
> slow ops -- there must have been an inconsistent file causing the issue; not a
> big deal, as it's a backup (it likely manifested itself while the cluster was
> rebuilding / rebalancing).
>
> I'm still having an issue getting RGWs to be manageable / detected by the
> cluster (any ideas appreciated):
> - Increased debug logging to 5/5 across debug_rgw, debug_rgw_datacache, and
> debug_rgw_sync;
> - Redeployed RGW service (removed and recreated service);
> - All GWs show similar log entries -- there's a "failed to init realm id"
> error (could that be it?):
>
> 2022-12-31T20:02:55.570+ 7fea6d76a5c0  0 deferred set uid:gid to
> 167:167 (ceph:ceph)
>
> 2022-12-31T20:02:55.570+ 7fea6d76a5c0  0 ceph version 17.2.5
> (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable), process
> radosgw, pid 2
>
> 2022-12-31T20:02:55.570+ 7fea6d76a5c0  0 framework: beast
>
> 2022-12-31T20:02:55.570+ 7fea6d76a5c0  0 framework conf key: port,
> val: 80
>
> 2022-12-31T20:02:55.570+ 7fea6d76a5c0  1 radosgw_Main not setting
> numa affinity
>
> 2022-12-31T20:02:55.574+ 7fea6d76a5c0  1 rgw_d3n:
> rgw_d3n_l1_local_datacache_enabled=0
>
> 2022-12-31T20:02:55.574+ 7fea6d76a5c0  1 D3N datacache enabled: 0
>
> 2022-12-31T20:02:55.590+ 7fea6d76a5c0  4 rgw main: RGWPeriod::init
> failed to init realm  id  : (2) No such file or directory
>
> 2022-12-31T20:02:55.686+ 7fea6d76a5c0  4 rgw main: Realm:
>   ()
>
> 2022-12-31T20:02:55.686+ 7fea6d76a5c0  4 rgw main: ZoneGroup: default
> (45518452-8aa6-41b4-99f0-059b255c31cd)
>
> 2022-12-31T20:02:55.686+ 7fea6d76a5c0  4 rgw main: Zone:  default
> (743ea532-f5bc-4cca-891b-c27a586d5129)
>
> 2022-12-31T20:02:56.010+ 7fea6d76a5c0  2 all 8 watchers are set,
> enabling cache
>
> 2022-12-31T20:02:56.026+ 7fea52ab9700  2 rgw data changes log:
> RGWDataChangesLog::ChangesRenewThread: start
>
> 2022-12-31T20:02:57.170+ 7fea4eab1700  2 garbage collection: garbage
> collection: start
>
> 2022-12-31T20:02:57.170+ 7fea4e2b0700  2 rgw object expirer Worker
> thread: object expiration: start
>
> 2022-12-31T20:02:57.170+ 7fea48aa5700  5 lifecycle: schedule life
> cycle next start time: Sun Jan  1 00:00:00 2023
>
> 2022-12-31T20:02:57.170+ 7fea46aa1700  5 lifecycle: schedule life
> cycle next start time: Sun Jan  1 00:00:00 2023
>
> 2022-12-31T20:02:57.170+ 7fea44a9d700  5 lifecycle: schedule life
> cycle next start time: Sun Jan  1 00:00:00 2023
>
> 2022-12-31T20:02:59.186+ 7fea4eab1700  2 garbage collection: garbage
> collection: stop
>
> 2022-12-31T20:03:18.029+ 7fea52ab9700  2 rgw data changes log:
> RGWDataChangesLog::ChangesRenewThread: start
>
> 2022-12-31T20:03:40.029+ 7fea52ab9700  2 rgw data changes log:
> RGWDataChangesLog::ChangesRenewThread: start
>
> 2022-12-31T20:04:02.028+ 7fea52ab9700  2 rgw data changes log:
> RGWDataChangesLog::ChangesRenewThread: start
>
> 2022-12-31T20:04:24.027+ 7fea52ab9700  2 rgw data changes log:
> RGWDataChangesLog::ChangesRenewThread: start
>
> 2022-12-31T20:04:46.031+ 7fea52ab9700  2 rgw data changes log:
> RGWDataChangesLog::ChangesRenewThread: start
>
> 2022-12-31T20:05:08.030+ 7fea52ab9700  2 rgw data changes log:
> RGWDataChangesLog::ChangesRenewThread: start
>
> 2022-12-31T20:05:30.030+ 7fea52ab9700  2 rgw data changes log:
> RGWDataChangesLog::ChangesRenewThread: start
>
> 2022-12-31T20:05:52.029+ 7fea52ab9700  2 rgw data changes log:
> RGWDataChangesLog::ChangesRenewThread: start
>
> 2022-12-31T20:06:14.029+ 7fea52ab9700  2 rgw data changes log:
> RGWDataChangesLog::ChangesRenewThread: start
>
> 2022-12-31T20:06:36.028+ 7fea52ab9700  2 rgw data changes log:
> RGWDataChangesLog::ChangesRenewThread: start
>
> 2022-12-31T20:06:58.032+ 7fea52ab9700  

[ceph-users] Re: Urgent help! RGW Disappeared on Quincy

2022-12-31 Thread Deep Dish
Hi Pavin,

Happy New Year!

Many thanks for the commands.  I managed to get the cluster into green
status with the repeer command.
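
(For reference, that was "ceph pg repeer <pgid>" run against the PGs that were
stuck, as per the commands you sent.)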

Still had some slow MDS ops, so I decided to purge a few older backups in a
repository that's backed by one of the CephFS volumes.  This resolved all the
slow ops -- there must have been an inconsistent file causing the issue; not a
big deal, as it's a backup (it likely manifested itself while the cluster was
rebuilding / rebalancing).

I'm still having an issue getting RGWs to be manageable / detected by the
cluster (any ideas appreciated):
- Increased debug logging to 5/5 across debug_rgw, debug_rgw_datacache, and
debug_rgw_sync (the exact commands are noted after the log excerpt below);
- Redeployed RGW service (removed and recreated service);
- All GWs show similar log entries -- there's a "failed to init realm id"
error (could that be it?):

2022-12-31T20:02:55.570+ 7fea6d76a5c0  0 deferred set uid:gid to
167:167 (ceph:ceph)

2022-12-31T20:02:55.570+ 7fea6d76a5c0  0 ceph version 17.2.5
(98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable), process
radosgw, pid 2

2022-12-31T20:02:55.570+ 7fea6d76a5c0  0 framework: beast

2022-12-31T20:02:55.570+ 7fea6d76a5c0  0 framework conf key: port, val:
80

2022-12-31T20:02:55.570+ 7fea6d76a5c0  1 radosgw_Main not setting numa
affinity

2022-12-31T20:02:55.574+ 7fea6d76a5c0  1 rgw_d3n:
rgw_d3n_l1_local_datacache_enabled=0

2022-12-31T20:02:55.574+ 7fea6d76a5c0  1 D3N datacache enabled: 0

2022-12-31T20:02:55.590+ 7fea6d76a5c0  4 rgw main: RGWPeriod::init
failed to init realm  id  : (2) No such file or directory

2022-12-31T20:02:55.686+ 7fea6d76a5c0  4 rgw main: Realm:
()

2022-12-31T20:02:55.686+ 7fea6d76a5c0  4 rgw main: ZoneGroup: default
(45518452-8aa6-41b4-99f0-059b255c31cd)

2022-12-31T20:02:55.686+ 7fea6d76a5c0  4 rgw main: Zone:  default
(743ea532-f5bc-4cca-891b-c27a586d5129)

2022-12-31T20:02:56.010+ 7fea6d76a5c0  2 all 8 watchers are set,
enabling cache

2022-12-31T20:02:56.026+ 7fea52ab9700  2 rgw data changes log:
RGWDataChangesLog::ChangesRenewThread: start

2022-12-31T20:02:57.170+ 7fea4eab1700  2 garbage collection: garbage
collection: start

2022-12-31T20:02:57.170+ 7fea4e2b0700  2 rgw object expirer Worker
thread: object expiration: start

2022-12-31T20:02:57.170+ 7fea48aa5700  5 lifecycle: schedule life cycle
next start time: Sun Jan  1 00:00:00 2023

2022-12-31T20:02:57.170+ 7fea46aa1700  5 lifecycle: schedule life cycle
next start time: Sun Jan  1 00:00:00 2023

2022-12-31T20:02:57.170+ 7fea44a9d700  5 lifecycle: schedule life cycle
next start time: Sun Jan  1 00:00:00 2023

2022-12-31T20:02:59.186+ 7fea4eab1700  2 garbage collection: garbage
collection: stop

2022-12-31T20:03:18.029+ 7fea52ab9700  2 rgw data changes log:
RGWDataChangesLog::ChangesRenewThread: start

2022-12-31T20:03:40.029+ 7fea52ab9700  2 rgw data changes log:
RGWDataChangesLog::ChangesRenewThread: start

2022-12-31T20:04:02.028+ 7fea52ab9700  2 rgw data changes log:
RGWDataChangesLog::ChangesRenewThread: start

2022-12-31T20:04:24.027+ 7fea52ab9700  2 rgw data changes log:
RGWDataChangesLog::ChangesRenewThread: start

2022-12-31T20:04:46.031+ 7fea52ab9700  2 rgw data changes log:
RGWDataChangesLog::ChangesRenewThread: start

2022-12-31T20:05:08.030+ 7fea52ab9700  2 rgw data changes log:
RGWDataChangesLog::ChangesRenewThread: start

2022-12-31T20:05:30.030+ 7fea52ab9700  2 rgw data changes log:
RGWDataChangesLog::ChangesRenewThread: start

2022-12-31T20:05:52.029+ 7fea52ab9700  2 rgw data changes log:
RGWDataChangesLog::ChangesRenewThread: start

2022-12-31T20:06:14.029+ 7fea52ab9700  2 rgw data changes log:
RGWDataChangesLog::ChangesRenewThread: start

2022-12-31T20:06:36.028+ 7fea52ab9700  2 rgw data changes log:
RGWDataChangesLog::ChangesRenewThread: start

2022-12-31T20:06:58.032+ 7fea52ab9700  2 rgw data changes log:
RGWDataChangesLog::ChangesRenewThread: start

2022-12-31T20:07:20.031+ 7fea52ab9700  2 rgw data changes log:
RGWDataChangesLog::ChangesRenewThread: start

2022-12-31T20:07:42.031+ 7fea52ab9700  2 rgw data changes log:
RGWDataChangesLog::ChangesRenewThread: start

2022-12-31T20:08:04.030+ 7fea52ab9700  2 rgw data changes log:
RGWDataChangesLog::ChangesRenewThread: start

2022-12-31T20:08:26.029+ 7fea52ab9700  2 rgw data changes log:
RGWDataChangesLog::ChangesRenewThread: start

2022-12-31T20:08:48.029+ 7fea52ab9700  2 rgw data changes log:
RGWDataChangesLog::ChangesRenewThread: start

2022-12-31T20:09:10.032+ 7fea52ab9700  2 rgw data changes log:
RGWDataChangesLog::ChangesRenewThread: start

2022-12-31T20:09:32.028+ 7fea52ab9700  2 rgw data changes log:
RGWDataChangesLog::ChangesRenewThread: start

2022-12-31T20:09:54.027+ 7fea52ab9700  2 rgw data changes log:
RGWDataChangesLog::ChangesRenewThread: start
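
For reference, I raised the debug levels above roughly like this (assuming the
RGW daemons pick up settings from the generic client.rgw section of the config
database; with cephadm-named daemons the per-daemon entity may need to be
targeted instead):

# ceph config set client.rgw debug_rgw 5/5

# ceph config set client.rgw debug_rgw_datacache 5/5

# ceph config set client.rgw debug_rgw_sync 5/5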



On Sat, Dec 31, 2022 at 1:22 AM Pavin Joseph  wrote:

> Hey there,
>
> Sorry for the late reply.
> If the pg 

[ceph-users] Re: How to shutdown a ceph node

2022-12-31 Thread Boris
Yes, I do. It's the Ceph default and we use it on any cluster size (the smallest
is 3 hosts with 6 disks each), and it removes a lot of headache. :-)

And as the OP did not provide any config, I assumed they use the default.

Happy new year. 

> On 31.12.2022 at 15:11, Anthony D'Atri wrote:
> 
> Are you using size=3 replication and failure domain = host?  If so you’ll be 
> ok.
> We see folks sometimes using an EC profile that will result in PGs down, 
> especially with such a small cluster.
> 
>> On Dec 31, 2022, at 4:11 AM, Boris  wrote:
>> 
>> Hi,
>> I usually do 'ceph osd set noout' and 'ceph osd set norebalance' and then 
>> shut down the OS normally.
>> 
>> After everything is done I unset both values and let the objects recover.
>> 
>> Cheers and happy new year. 
>> 
>>> On 31.12.2022 at 08:52, Bülent ŞENGÜLER wrote:
>>> 
>>> Hello,
>>> 
>>> I have a Ceph cluster with 4 nodes and I have to shut down one of them
>>> due to electrical maintenance. I found how to shut down a whole cluster, but
>>> I could not find how to shut down a single node. How can I power off a node
>>> gracefully? Thanks for the answer.
>>> 
>>> Regards.


[ceph-users] Re: How to shutdown a ceph node

2022-12-31 Thread Anthony D'Atri
Are you using size=3 replication and failure domain = host?  If so you’ll be ok.
We see folks sometimes using an EC profile that will result in PGs down, 
especially with such a small cluster.
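
If in doubt, one way to check (the rule name below is just the usual default,
and <profile> is a placeholder -- adjust to your setup):

ceph osd pool ls detail                      # look for "size 3" and note each pool's crush_rule
ceph osd crush rule dump replicated_rule     # confirm the chooseleaf step uses "type": "host"
ceph osd erasure-code-profile get <profile>  # for EC pools, check crush-failure-domain (host is the default)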

> On Dec 31, 2022, at 4:11 AM, Boris  wrote:
> 
> Hi,
> I usually do 'ceph osd set noout' and 'ceph osd set norebalance' and then 
> shut down the OS normally.
> 
> After everything is done I unset both values and let the objects recover.
> 
> Cheers and happy new year. 
> 
>> On 31.12.2022 at 08:52, Bülent ŞENGÜLER wrote:
>> 
>> Hello,
>> 
>> I have a Ceph cluster with 4 nodes and I have to shut down one of them
>> due to electrical maintenance. I found how to shut down a whole cluster, but
>> I could not find how to shut down a single node. How can I power off a node
>> gracefully? Thanks for the answer.
>> 
>> Regards.


[ceph-users] Re: How to shutdown a ceph node

2022-12-31 Thread Boris
Hi,
I usually do 'ceph osd set noout' and 'ceph osd set norebalance' and then shut 
down the OS normally.

After everything is done I unset both values and let the objects recover.
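
For completeness, the whole sequence looks roughly like this -- the node
shutdown step depends on how the daemons are deployed (plain packages vs.
cephadm, where the systemd target is ceph-<fsid>.target), so treat it as a
sketch:

ceph osd set noout
ceph osd set norebalance

# on the node being serviced:
systemctl stop ceph.target    # or the cephadm-managed ceph-<fsid>.target
shutdown -h now

# after the node is back and its OSDs have rejoined:
ceph osd unset norebalance
ceph osd unset noout
ceph -s                       # wait for recovery to finish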

Cheers and happy new year. 

> On 31.12.2022 at 08:52, Bülent ŞENGÜLER wrote:
> 
> Hello,
> 
> I have a Ceph cluster with 4 nodes and I have to shut down one of them
> due to electrical maintenance. I found how to shut down a whole cluster, but
> I could not find how to shut down a single node. How can I power off a node
> gracefully? Thanks for the answer.
> 
> Regards.