Public bug reported:

This is an issue described in https://tracker.ceph.com/issues/38745,
where ceph health details shows messages like,

sudo ceph health detail
HEALTH_WARN 3 OSD(s) experiencing BlueFS spillover; mon juju-6879b7-6-lxd-1 is low on available space
[WRN] BLUEFS_SPILLOVER: 3 OSD(s) experiencing BlueFS spillover <---
    osd.41 spilled over 66 MiB metadata from 'db' device (3.0 GiB used of 29 GiB) to slow device
    osd.96 spilled over 461 MiB metadata from 'db' device (3.0 GiB used of 29 GiB) to slow device
    osd.105 spilled over 198 MiB metadata from 'db' device (3.0 GiB used of 29 GiB) to slow device

The BlueFS spillover is very likely caused by RocksDB's level-based
("leveled") sizing behaviour.

https://docs.ceph.com/en/latest/rados/configuration/bluestore-config-ref/#sizing has a statement about this leveled sizing.

Between versions 15.2.6 and 15.2.10, if the value of
bluestore_volume_selection_policy is not set to use_some_extra, this
issue can be hit despite free space being available, because RocksDB
only uses "leveled" amounts of space on the NVMe DB partition. The level
sizes are 300 MB, 3 GB, 30 GB and 300 GB, and any DB data above the
largest level that fully fits on the partition automatically ends up on
the slow device. In this case each OSD has a 29 GiB DB partition: the
300 MB and 3 GB levels fit, but the next 30 GB level does not, so only
about 3 GiB of the partition is used and the rest of the metadata spills
over to the slow device.
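
To quantify the spillover per OSD, the BlueFS perf counters can be
checked on the OSD host. This is only a sketch: the osd id is taken from
the health output above, and the jq filter plus the exact counter names
are assumptions based on the 15.2.x BlueStore perf counters.

  # on the host running osd.41, dump the BlueFS usage counters
  sudo ceph daemon osd.41 perf dump | jq '.bluefs | {db_total_bytes, db_used_bytes, slow_used_bytes}'
  # a non-zero slow_used_bytes means BlueFS metadata has spilled onto the slow device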

There is also a discussion at www.mail-archive.com/ceph-us...@ceph.io/msg05782.html

Running a manual compaction on the database, i.e. ceph tell osd.XX
compact (replace XX with the OSD number), can work around the issue
temporarily, but the spillover will recur as the metadata grows again;
see the example below.
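
For example, compaction can be triggered for the OSDs flagged in the
health output above (the osd ids here are just the ones from this
report):

  # manually compact the RocksDB database on each affected OSD
  sudo ceph tell osd.41 compact
  sudo ceph tell osd.96 compact
  sudo ceph tell osd.105 compact
  # re-check once compaction has finished
  sudo ceph health detail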

I am also pasting some notes Dongdong mentions on SF case 00326782,
where the proper fix is to either:

A. Redeploy the OSDs with a larger DB lvm/partition.

OR

B. Migrate to a new, larger DB lvm/partition. This can be done offline
with ceph-volume lvm migrate (please refer to
https://docs.ceph.com/en/octopus/ceph-volume/lvm/migrate/), but it
requires upgrading the cluster to 15.2.14 first; see the sketch after
the A/B discussion below.

A will be much safer, but more time-consuming. B will be much faster,
but it's recommended to do it on one node first and wait/monitor for a
couple of weeks before moving forward.
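
A rough sketch of option B for a single OSD is below. It is hedged: the
volume group, LV name and size are placeholders, the systemd unit name
assumes a non-containerized ceph-osd deployment, and the exact
ceph-volume lvm migrate syntax should be confirmed against the Octopus
docs linked above.

  # stop the OSD; the migration is an offline operation
  sudo systemctl stop ceph-osd@41
  # create a new, larger LV for the DB (vg/lv names and size are placeholders)
  sudo lvcreate -L 60G -n osd41_db_new ceph-db-vg
  # move BlueFS data from the main (slow) and old DB devices onto the new DB LV
  sudo ceph-volume lvm migrate --osd-id 41 --osd-fsid <osd-fsid> --from data db --target ceph-db-vg/osd41_db_new
  # bring the OSD back up and confirm the spillover warning clears
  sudo systemctl start ceph-osd@41
  sudo ceph health detail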

As mentioned above, to avoid running into the issue even with free space
available, the value of bluestore_volume_selection_policy should be set
to use_some_extra for all OSDs. 15.2.6 already has
bluestore_volume_selection_policy, but the default was only changed to
use_some_extra from 15.2.11 onwards (https://tracker.ceph.com/issues/47053).
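
On releases where the old default still applies, the policy can be set
explicitly via the config database (a sketch; I believe the OSDs have to
be restarted for the new policy to take effect, so treat that as an
assumption to verify):

  # set the policy for all OSDs, then confirm it for one of the affected ones
  sudo ceph config set osd bluestore_volume_selection_policy use_some_extra
  sudo ceph config get osd.41 bluestore_volume_selection_policy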

** Affects: ceph (Ubuntu)
     Importance: Undecided
         Status: New
