Happy New Year, glad you got the fundamentals fixed :)
I'm not very familiar with RGW/Object storage as I mainly use CephFS in
production.
The missing realm is probably why RGW is not working with your
configuration; perhaps the orchestrator fudged it on upgrade.
Since you've
A few additional findings. My impression is that an initial (net-new)
deployment and configuration of RGW on Quincy creates a default realm. For
reasons I don't understand, my default realm no longer exists.
Could this be the issue with RGWs not being manageable / detected by my
cluster?
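If it helps, a possible way to check for (and, if needed, recreate) the missing realm is sketched below. This is only a sketch; the realm name "default" and the need for a period commit are assumptions, and rewiring existing zones/zonegroups into a recreated realm may require additional steps:

```shell
# List realms known to this cluster; an empty list confirms the realm is gone
radosgw-admin realm list

# If no realm exists, recreate one and mark it as the default
# (the realm name here is illustrative)
radosgw-admin realm create --rgw-realm=default --default

# Commit the updated period so the RGW daemons pick up the change
radosgw-admin period update --commit
```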
Under
Hi Pavin,
Happy New Year!
Many thanks for the commands. I managed to get the cluster into green
status with the repeer command.
I still had some slow MDS ops, so I decided to purge a few older backups in a
repository that's backed by one of the CephFS volumes. This resolved all
slow ops -- there
Hey there,
Sorry for the late reply.
If the pg issue isn't solved yet, could you run these (substituting the
affected PG's id):
ceph pg repeer <pgid>
ceph pg repair <pgid>
Pavin.
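A sketch of how the two commands above might be applied, assuming the stuck PG's id can be read from the cluster's health output (the PG id shown is a placeholder, not from this cluster):

```shell
# Find PGs that are stuck (inactive/unclean/stale)
ceph pg dump_stuck

# Ask the primary OSD to re-initiate peering for that PG
# (replace <pgid> with the actual id, e.g. something like 2.1a)
ceph pg repeer <pgid>

# If repeering alone doesn't clear it, request a repair scrub
ceph pg repair <pgid>
```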
On 29-Dec-22 4:08 AM, Deep Dish wrote:
Hi Pavin,
The following are additional developments. There's one PG that's
stuck and unable to recover. I've attached relevant ceph -s / health
detail and pg stat outputs below.
- There were some remaining lock files as suggested in /var/run/ceph/
pertaining to rgw. I removed the service,
1. This is a guess, but check /var/[lib|run]/ceph for any lock files.
2. This is more straightforward to fix, add faster WAL/Block device/LV
for each OSD or create a fast storage pool just for metadata. Also,
experiment with MDS cache size/trim [0] settings.
[0]:
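For point 2, a sketch of the MDS cache settings that could be experimented with; the specific values here are illustrative assumptions, not recommendations for this cluster:

```shell
# Raise the MDS cache memory limit (value in bytes; 8 GiB shown as an example)
ceph config set mds mds_cache_memory_limit 8589934592

# Tune how aggressively the MDS trims its cache
ceph config set mds mds_cache_trim_threshold 524288

# Verify what the MDS is actually running with
ceph config get mds mds_cache_memory_limit
```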
Got logging enabled as per
https://ceph.io/en/news/blog/2022/centralized_logging/. My embedded
grafana doesn't come up in the dashboard, but at least I have log (files)
on my nodes. Interesting.
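For reference, the centralized-logging post linked above boils down to enabling file logging cluster-wide via the config database; a minimal sketch:

```shell
# Write daemon logs to files under /var/log/ceph/<fsid>/ on each node
ceph config set global log_to_file true

# Also capture the cluster log to a file on the monitors
ceph config set global mon_cluster_log_to_file true
```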
Two issues plaguing my cluster:
1 - RGWs not manageable
2 - MDS_SLOW_METADATA_IO warning (impact
Hi Pavin,
Thanks for the reply. Honestly, I'm a bit at a loss, as this worked
perfectly without any issue up until the rebalance of the cluster.
Orchestrator is great. Aside from this (which I suspect is not
orchestrator related), I haven't had any issues.
In terms of logs, I'm not sure where
Quick update:
- I followed documentation, and ran the following:
# ceph dashboard set-rgw-credentials
Error EINVAL: No RGW credentials found, please consult the documentation on
how to enable RGW for the dashboard.
- I see dashboard credentials configured (all this was working fine before):
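One possible avenue when `set-rgw-credentials` fails to discover anything: check whether any RGW users exist at all, and if automatic discovery stays broken, set the dashboard's RGW API keys by hand. A sketch, assuming a working RGW endpoint and key files created locally (the filenames are placeholders):

```shell
# Does the RGW even have users? An error here points back at the
# realm/zone problem rather than at the dashboard
radosgw-admin user list

# If a usable S3 user exists, its keys can be fed to the dashboard
# manually instead of relying on auto-discovery
ceph dashboard set-rgw-api-access-key -i access_key.txt
ceph dashboard set-rgw-api-secret-key -i secret_key.txt
```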