Hi,

I have a problem with a full cluster and getting it back to a healthy state.
Fortunately it's a small test cluster with no valuable data in it.
It is used exclusively for RGW/S3, running 17.2.3.

I intentionally filled it up via rclone/S3 until it got into HEALTH_ERR, to see 
what would happen in that situation.
At first it sort of looks OK, as the cluster apparently goes into a read-only 
state: I can still get the stored data via S3.

But then there seems to be no way to get out of the full state. Via S3 one 
can't delete any objects or buckets; the requests just hang until they time out.
Or did I miss something?
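
As far as I understand, the usual escape hatch would be to temporarily raise 
the full ratio so deletes can go through again, something like (0.97 is just 
an example value, the default full ratio is 0.95):

# ceph osd set-full-ratio 0.97
... delete data ...
# ceph osd set-full-ratio 0.95

I haven't verified whether S3 deletes then actually succeed, though.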

So, I used "rados rm -p <pool> <obj> --force-full" to delete a bunch of leftover 
multipart parts and other "old" objects.
That got the cluster back into HEALTH_OK.
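
Roughly along these lines (pool name and grep pattern are just illustrative, 
they will differ per setup):

# rados -p default.rgw.buckets.data ls | grep _multipart_ | \
    while read obj; do rados -p default.rgw.buckets.data rm "$obj" --force-full; done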

But now the RGW gc seems to be screwed up:
# radosgw-admin gc list --include-all | grep oid | wc -l
158109
# radosgw-admin gc process --include-all
# radosgw-admin gc list --include-all | grep oid | wc -l
158109

I.e., it reports 158109 objects to clean up, but doesn't actually remove anything.
I guess that's because the objects it wants to collect don't exist anymore, but 
are still referenced in some index or other list.
Is there any way to reset or clean up the gc list?
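
If it helps: as far as I know the pending gc entries live on the gc.N objects 
in the log pool (assuming default pool names; rgw_gc_max_objs defaults to 32), 
so they can be inspected with something like

# rados -p default.rgw.log --namespace gc ls
# rados -p default.rgw.log --namespace gc listomapkeys gc.0

though I'm not sure whether on 17.2 the entries are plain omap keys or sit in 
the newer gc queue format.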

I'd appreciate any hints.

Ciao, Uli
