Nautilus will make this easier. https://github.com/ceph/ceph/pull/18096
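
For reference, my understanding is that in Nautilus BlueStore OSDs report their omap usage directly, so finding the bloated ones should come down to something like:

$ ceph osd df

and reading the OMAP/META columns, rather than digging around on disk. (That is my recollection of where the reporting work is headed; I have not checked it against that exact PR.)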
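
On the offline compaction question: I'm not aware of a rocksdb equivalent of leveldb_compact_on_mount in 12.2.8, but with the OSD stopped you should be able to compact its omap with ceph-kvstore-tool. Roughly (the path below is the filestore omap dir; adjust the OSD id, and try it on a non-critical OSD first):

$ ceph osd set noout
$ systemctl stop ceph-osd@510
$ ceph-kvstore-tool rocksdb /var/lib/ceph/osd/ceph-510/current/omap compact
$ systemctl start ceph-osd@510
$ ceph osd unset noout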
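
As for spotting this on BlueStore: the omap lives inside the OSD's RocksDB rather than in a directory you can du. The closest thing I know of pre-Nautilus is the bluefs perf counters (run on the OSD's host, against a BlueStore OSD):

$ ceph daemon osd.<id> perf dump

and look at db_used_bytes / slow_used_bytes in the "bluefs" section. That covers the whole RocksDB rather than just omap, but an OSD whose DB is much bigger than its peers' is usually the one that needs compacting or PG splitting.
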
On Thu, Jan 3, 2019 at 5:22 AM Bryan Stillwell <bstillw...@godaddy.com> wrote:
>
> Recently on one of our bigger clusters (~1,900 OSDs) running Luminous
> (12.2.8), we had a problem where OSDs would frequently get restarted while
> deep-scrubbing.
>
> After digging into it I found that a number of the OSDs had very large omap
> directories (50GiB+). I believe these were OSDs that had previously held PGs
> that were part of the .rgw.buckets.index pool, which I have recently moved to
> all SSDs; however, it seems the data remained on the HDDs.
>
> I was able to reduce the data usage on most of the OSDs (from ~50 GiB to
> < 200 MiB!) by compacting the omap dbs offline by setting
> 'leveldb_compact_on_mount = true' in the [osd] section of ceph.conf, but that
> didn't work on the newer OSDs, which use rocksdb. On those I had to do an
> online compaction using a command like:
>
> $ ceph tell osd.510 compact
>
> That worked, but today when I tried doing that on some of the SSD-based OSDs
> which are backing .rgw.buckets.index, I started getting slow requests and the
> compaction ultimately failed with this error:
>
> $ ceph tell osd.1720 compact
> osd.1720: Error ENXIO: osd down
>
> When I tried it again it succeeded:
>
> $ ceph tell osd.1720 compact
> osd.1720: compacted omap in 420.999 seconds
>
> The data usage on that OSD dropped from 57.8 GiB to 43.4 GiB, which was nice,
> but I don't believe it'll get any smaller until I start splitting the PGs in
> the .rgw.buckets.index pool to better distribute that pool across the
> SSD-based OSDs.
>
> The first question I have is: what is the option to do an offline compaction
> of rocksdb, so I don't impact our customers while compacting the rest of the
> SSD-based OSDs?
>
> The next question is whether there's a way to configure Ceph to automatically
> compact the omap dbs in the background in a way that doesn't affect the user
> experience.
>
> Finally, I was able to figure out that the omap directories were getting
> large because we're using filestore on this cluster, but how could someone
> determine this when using BlueStore?
>
> Thanks,
> Bryan
--
Cheers,
Brad