Public bug reported:

Ceph on bcache can suffer serious performance degradation (roughly a 10x drop) when both of the conditions below are met:
1. bluefs_buffered_io is turned on

2. Any OSD bcache's cache_available_percent is less than 60

As many of us may already know, bcache forces all writes to go directly to the backing device once the CUTOFF_WRITEBACK_SYNC threshold is crossed, i.e. once cache_available_percent drops below 30.

What is less widely known is that bcache starts to bypass *some* writes much earlier, once the CUTOFF_WRITEBACK threshold is crossed, i.e. once cache_available_percent drops below 60. The writes that get bypassed are the ones that do not carry any of the kernel's synchronization flags: REQ_SYNC, REQ_FUA, REQ_PREFLUSH.
The code is here:
https://github.com/torvalds/linux/blob/master/drivers/md/bcache/writeback.h#L123
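
To make the cutoff behaviour concrete, here is a simplified user-space paraphrase of the decision in should_writeback() from the header linked above (not the actual kernel function). It only models the two percentage cutoffs and the sync flags; the real function also looks at the cache mode, detach state and dirty stripes. The cutoffs are applied to the cache in-use percentage, and as I read the code cache_available_percent is effectively its complement, which is where the 60 and 30 above come from. The REQ_* values below are stand-ins, not the real kernel bit definitions, and the cutoff values are taken from current upstream, so treat the exact numbers as an assumption for older kernels:

/*
 * Simplified, user-space illustration of the writeback cutoff logic in
 * drivers/md/bcache/writeback.h (should_writeback()).  Only the two
 * percentage cutoffs and the sync flags are modelled here.
 */
#include <stdbool.h>
#include <stdio.h>

#define CUTOFF_WRITEBACK       40   /* in-use %  ~ cache_available_percent < 60 */
#define CUTOFF_WRITEBACK_SYNC  70   /* in-use %  ~ cache_available_percent < 30 */

/* Stand-ins for the bio flags the kernel treats as "sync". */
#define REQ_SYNC      (1u << 0)
#define REQ_FUA       (1u << 1)
#define REQ_PREFLUSH  (1u << 2)

static bool should_writeback(unsigned int cache_in_use_percent,
                             unsigned int bio_flags)
{
    /* Above the sync cutoff every write bypasses the cache. */
    if (cache_in_use_percent > CUTOFF_WRITEBACK_SYNC)
        return false;

    /* Between the two cutoffs only sync writes are still cached. */
    if (bio_flags & (REQ_SYNC | REQ_FUA | REQ_PREFLUSH))
        return true;

    return cache_in_use_percent <= CUTOFF_WRITEBACK;
}

int main(void)
{
    /* cache_available_percent = 55 -> in_use = 45: only sync writes cached. */
    printf("in_use=45, plain write -> writeback=%d\n", should_writeback(45, 0));
    printf("in_use=45, REQ_SYNC    -> writeback=%d\n", should_writeback(45, REQ_SYNC));
    /* cache_available_percent = 20 -> in_use = 80: everything bypasses. */
    printf("in_use=80, REQ_SYNC    -> writeback=%d\n", should_writeback(80, REQ_SYNC));
    return 0;
}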


The problem I found in a recent case (bionic-stein + 4.15 kernel) is that when BlueFS submits writes with bluefs_buffered_io turned on, the writes that reach bcache do not carry any of the sync flags. Once cache_available_percent drops below 60 (which is quite easy to hit), all BlueFS I/O is therefore forced into non-writeback mode. This is equivalent to putting the bluestore DB directly on an HDD, so every I/O is bounded by HDD speed.
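
To illustrate the difference (this is not BlueFS code, just a minimal user-space sketch with arbitrary file paths): a write that goes through the page cache is only submitted to the block layer later by the kernel's flusher threads, typically as an ordinary asynchronous write, whereas a write on a descriptor opened with O_DIRECT|O_DSYNC is submitted immediately with sync semantics. Running something like this on a filesystem on top of a bcache device and watching the requests with blktrace/blkparse shows which of them carry the sync flag:

/*
 * Minimal user-space sketch (not BlueFS code) showing two ways a write can
 * reach the block layer: via the page cache (typically no REQ_SYNC on the
 * bios later issued by the flusher threads) and via O_DIRECT|O_DSYNC
 * (submitted with sync semantics).  Paths and sizes are arbitrary.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BUF_SIZE 4096

int main(void)
{
    /* O_DIRECT needs an aligned buffer. */
    void *buf;
    if (posix_memalign(&buf, BUF_SIZE, BUF_SIZE) != 0)
        return 1;
    memset(buf, 0xab, BUF_SIZE);

    /* 1. Buffered write: lands in the page cache; the flusher threads
     *    write it back later as a normal, non-sync write. */
    int fd = open("/mnt/bcache-test/buffered.bin",
                  O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0 || write(fd, buf, BUF_SIZE) != BUF_SIZE)
        perror("buffered write");
    close(fd);

    /* 2. Direct + O_DSYNC write: submitted immediately and with sync
     *    semantics, so bcache still treats it as a writeback candidate
     *    between the two cutoffs. */
    fd = open("/mnt/bcache-test/direct.bin",
              O_CREAT | O_WRONLY | O_TRUNC | O_DIRECT | O_DSYNC, 0644);
    if (fd < 0 || write(fd, buf, BUF_SIZE) != BUF_SIZE)
        perror("direct write");
    close(fd);

    free(buf);
    return 0;
}

My working assumption is that on the affected kernels BlueFS with bluefs_buffered_io enabled ends up looking like the first case, so the bios that eventually reach bcache are plain asynchronous writes.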

I'm not sure how the sync flags are propagated from Ceph all the way down to the kernel bcache layer, but I have verified the behaviour across different Ceph/kernel/Ubuntu combinations with bluefs_buffered_io turned on:

N: no issue, all writes carry the SYNC flag.
P: has the issue; disabling bluefs_buffered_io works around it.

Bionic-ussuri   + kernel 5.4.0  -> N
Bionic-ussuri   + kernel 4.15.0 -> P
Bionic-stein    + kernel 5.4.0  -> N
Bionic-stein    + kernel 4.15.0 -> P
Bionic-train    + kernel 5.4.0  -> N
Bionic-train    + kernel 4.15.0 -> P
Focal (octopus) + kernel 5.4.0  -> N
Focal (octopus) + kernel 5.8.0  -> N
Focal-wallaby   + kernel 5.4.0  -> N
Focal-wallaby   + kernel 5.8.0  -> N


As we can see, the issue only appears with bluefs_buffered_io = true on the 4.15.0 kernel. I'm not sure how or why the SYNC flag ends up being set on the 5.4 and 5.8 kernels when bluefs_buffered_io is enabled; I only know that 5.4 and 5.8 behave correctly in that configuration.

Note that if all OSDs are deployed with a separate NVMe device for the bluestore DB, the cluster won't hit the issue; only OSDs that put the bluestore DB on a bcache device are affected.

For reference, the history of the bluefs_buffered_io default in Ceph:
bluefs_buffered_io was enabled by default in v13.2.0 and v14.2.0.
bluefs_buffered_io was disabled by default in v14.2.10 and v15.2.0.
bluefs_buffered_io was re-enabled by default in the following releases:
v14.2.22
v15.2.13
v16.2.0

So, in summary, if all three conditions below are met, the cluster will very likely hit this issue as soon as any OSD's bcache cache_available_percent drops below 60:

1. Ceph has bluefs_buffered_io enabled

2. OSDs put the bluestore DB on top of a bcache device

3. The kernel is the bionic GA kernel (4.15.0)

** Affects: ceph (Ubuntu)
     Importance: Undecided
         Status: New
