Recently on one of our bigger clusters (~1,900 OSDs) running Luminous (12.2.8), 
we had a problem where OSDs would frequently get restarted while deep-scrubbing.

After digging into it, I found that a number of the OSDs had very large omap
directories (50 GiB+). I believe these were OSDs that had previously held PGs
from the .rgw.buckets.index pool, which I recently moved to all SSDs; however,
it seems the old data remained on the HDDs.

I was able to reduce the data usage on most of the OSDs (from ~50 GiB to
< 200 MiB!) by compacting the omap DBs offline, which I did by setting
'leveldb_compact_on_mount = true' in the [osd] section of ceph.conf. That
didn't work on the newer OSDs, which use RocksDB, so on those I had to do an
online compaction using a command like:

$ ceph tell osd.510 compact

That worked, but today when I tried the same thing on some of the SSD-based
OSDs backing .rgw.buckets.index, I started getting slow requests, and the
compaction ultimately failed with this error:

$ ceph tell osd.1720 compact
osd.1720: Error ENXIO: osd down

When I tried it again it succeeded:

$ ceph tell osd.1720 compact
osd.1720: compacted omap in 420.999 seconds
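
Since the first attempt on osd.1720 failed with ENXIO and the retry succeeded,
a minimal sketch for working through the remaining OSDs one at a time with the
same 'ceph tell osd.N compact' command, retrying failed attempts, might look
like the following. The function names and the MAX_TRIES / RETRY_DELAY knobs
are hypothetical (not Ceph settings), and this is still an online compaction,
so it does not avoid the slow requests.

```shell
#!/bin/sh
# Sketch only: compact OSD omap DBs serially via "ceph tell osd.N compact".
# MAX_TRIES and RETRY_DELAY are hypothetical script knobs, not Ceph options.
MAX_TRIES=${MAX_TRIES:-3}
RETRY_DELAY=${RETRY_DELAY:-60}   # seconds to wait before retrying an OSD

# Compact one OSD, retrying up to MAX_TRIES times; returns 0 on success.
compact_osd() {
    osd_id=$1
    attempt=1
    while [ "$attempt" -le "$MAX_TRIES" ]; do
        if ceph tell "osd.${osd_id}" compact; then
            return 0
        fi
        echo "osd.${osd_id}: compaction attempt ${attempt} failed" >&2
        if [ "$attempt" -lt "$MAX_TRIES" ]; then
            # Give the OSD time to be marked up again (e.g. after ENXIO).
            sleep "$RETRY_DELAY"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}

# Compact each OSD ID given as an argument, one at a time, never in parallel.
# Returns non-zero if any OSD still failed after MAX_TRIES attempts.
compact_osds() {
    failed=""
    for id in "$@"; do
        compact_osd "$id" || failed="$failed osd.$id"
    done
    if [ -n "$failed" ]; then
        echo "compaction failed on:$failed" >&2
        return 1
    fi
    return 0
}
```

For example, 'compact_osds 510 1720' compacts those two OSDs serially and
returns non-zero if either one still failed after MAX_TRIES attempts.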

The data usage on that OSD dropped from 57.8 GiB to 43.4 GiB, which was nice,
but I don't expect it to get any smaller until I start splitting the PGs in
the .rgw.buckets.index pool to spread that pool more evenly across the
SSD-based OSDs.

My first question: what is the option for doing an offline compaction of
RocksDB, so I don't impact our customers while compacting the rest of the
SSD-based OSDs?

Next, is there a way to configure Ceph to automatically compact the omap DBs
in the background without affecting the user experience?

Finally, I was able to tell that the omap directories were getting large
because we're using FileStore on this cluster, where they are visible on
disk. How could someone determine this when using BlueStore?

Thanks,
Bryan

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com