Nautilus will make this easier.

https://github.com/ceph/ceph/pull/18096
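
In the meantime, ceph-kvstore-tool should be able to compact a stopped OSD's
store offline.  A rough sketch only -- the OSD id and the filestore omap path
below are just examples, and the noout/stop/start wrapping is my assumption
about how you'd want to do it safely on a production cluster:

$ ceph osd set noout
$ systemctl stop ceph-osd@510
$ # filestore keeps the omap db under the OSD's current/ directory
$ ceph-kvstore-tool rocksdb /var/lib/ceph/osd/ceph-510/current/omap compact
$ systemctl start ceph-osd@510
$ ceph osd unset noout

For the older leveldb-backed OSDs the first argument would be 'leveldb'
rather than 'rocksdb'.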

On Thu, Jan 3, 2019 at 5:22 AM Bryan Stillwell <bstillw...@godaddy.com> wrote:
>
> Recently on one of our bigger clusters (~1,900 OSDs) running Luminous 
> (12.2.8), we had a problem where OSDs would frequently get restarted while 
> deep-scrubbing.
>
> After digging into it I found that a number of the OSDs had very large omap
> directories (50GiB+).  I believe these were OSDs that had previously held
> PGs from the .rgw.buckets.index pool, which I recently moved to all SSDs;
> however, it seems the omap data remained on the HDDs.
>
> I was able to reduce the data usage on most of the OSDs (from ~50 GiB to
> < 200 MiB!) by compacting the omap dbs offline, setting
> 'leveldb_compact_on_mount = true' in the [osd] section of ceph.conf, but
> that didn't work on the newer OSDs, which use rocksdb.  On those I had to
> do an online compaction using a command like:
>
> $ ceph tell osd.510 compact
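>
> For reference, this is what that offline-compaction setting looks like in
> ceph.conf.  Scoping it to a single OSD with an [osd.NNN] section should
> also work, though that part is an assumption on my end:
>
> [osd]
>     # compact the leveldb omap db when the OSD (re)starts
>     leveldb_compact_on_mount = true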
>
> The online compaction worked, but today when I tried it on some of the
> SSD-based OSDs backing .rgw.buckets.index I started getting slow requests,
> and the compaction ultimately failed with this error:
>
> $ ceph tell osd.1720 compact
> osd.1720: Error ENXIO: osd down
>
> When I tried it again it succeeded:
>
> $ ceph tell osd.1720 compact
> osd.1720: compacted omap in 420.999 seconds
>
> The data usage on that OSD dropped from 57.8 GiB to 43.4 GiB, which was
> nice, but I don't believe it'll get any smaller until I start splitting the
> PGs in the .rgw.buckets.index pool to better distribute that pool across
> the SSD-based OSDs.
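>
> When I do get around to splitting, I expect it to look roughly like the
> following (the pg_num of 128 is just a placeholder, not a sizing
> recommendation, and on Luminous pgp_num has to be bumped separately):
>
> $ ceph osd pool get .rgw.buckets.index pg_num
> $ ceph osd pool set .rgw.buckets.index pg_num 128
> $ ceph osd pool set .rgw.buckets.index pgp_num 128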
>
> The first question I have is: what is the rocksdb equivalent for doing an
> offline compaction, so that I don't impact our customers while compacting
> the rest of the SSD-based OSDs?
>
> The next question is whether there's a way to configure Ceph to
> automatically compact the omap dbs in the background without affecting the
> user experience.
>
> Finally, I was only able to spot the large omap directories because we're
> using filestore on this cluster, but how could someone determine this when
> using BlueStore?
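>
> The only thing I've come up with so far is poking at the bluefs perf
> counters, something like the following (the exact counter names are from
> memory and worth double-checking, and jq is assumed to be installed):
>
> $ ceph daemon osd.1720 perf dump | jq '.bluefs | {db_total_bytes, db_used_bytes}'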
>
> Thanks,
> Bryan
>



-- 
Cheers,
Brad
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
