Hi Wido

Thanks for the quick answer. They are all Intel P3520: https://ark.intel.com/content/www/us/en/ark/products/88727/intel-ssd-dc-p3520-series-2-0tb-2-5in-pcie-3-0-x4-3d1-mlc.html
And this is the output of 'ceph df':
RAW STORAGE:
    CLASS     SIZE       AVAIL       USED        RAW USED     %RAW USED
    nvme      11 TiB     2.3 TiB     8.6 TiB     8.7 TiB          79.28
    TOTAL     11 TiB     2.3 TiB     8.6 TiB     8.7 TiB          79.28

POOLS:
    POOL     ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
    ceph      8     2.9 TiB     769.41k     8.6 TiB     89.15       359 GiB
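For the per-OSD picture, this is roughly what I'd run on one of the nodes to see which OSDs are closest to the nearfull limit; just a sketch with the standard Ceph CLI:

    # utilization and variance per OSD, grouped by host
    ceph osd df tree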

Cheers
Raffael

On 29/07/2020 15:04, Wido den Hollander wrote:


On 29/07/2020 14:52, Raffael Bachmann wrote:
Hi All,

I'm kind of cross-posting this from here: https://forum.proxmox.com/threads/i-o-wait-after-upgrade-5-x-to-6-2-and-ceph-luminous-to-nautilus.73581/ But since I'm more and more sure that it's a Ceph problem, I'll try my luck here.

Since updating from Luminous to Nautilus I have a big problem.

I have a 3-node cluster. Each node has two NVMe SSDs and a 10GBASE-T network for Ceph. Every few minutes an OSD seems to compact its RocksDB. While doing this it uses a lot of I/O and blocks, which basically stalls the whole cluster: no VM/container can read data for several seconds (sometimes minutes).

While this is happening, "iostat -x" looks like this:

Device            r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
nvme0n1          0.00    2.00      0.00     24.00     0.00    46.00   0.00  95.83    0.00    0.00   0.00     0.00    12.00   2.00   0.40
nvme1n1          0.00 1495.00      0.00   3924.00     0.00  6099.00   0.00  80.31    0.00  352.39 523.78     0.00     2.62   0.67 100.00

And iotop:

Total DISK READ:         0.00 B/s | Total DISK WRITE: 1573.47 K/s
Current DISK READ:       0.00 B/s | Current DISK WRITE: 3.43 M/s
     TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN IO>    COMMAND
    2306 be/4 ceph        0.00 B/s 1533.22 K/s  0.00 % 99.99 % ceph-osd -f --cluster ceph --id 3 --setuser ceph --setgroup ceph [rocksdb:low1]


In the ceph-osd log I see that rocksdb is compacting. https://gist.github.com/qwasli/3bd0c7d535ee462feff8aaee618f3e08
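For what it's worth, the compaction can also be triggered manually, so it could at least be moved to a quiet window. This is only a sketch of what I'd try (osd.3 taken from the iotop output above; the data path assumes the default /var/lib/ceph/osd/ceph-3, and the offline variant needs the OSD stopped first):

    # online, via the running OSD
    ceph tell osd.3 compact

    # offline, with the OSD stopped
    systemctl stop ceph-osd@3
    ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-3 compact
    systemctl start ceph-osd@3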

The pool and one OSD are nearfull. I'd planned to move some data away to another Ceph pool, but now I'm not sure anymore whether I should stay with Ceph. I'll move some data away today anyway to see if that helps, but before the upgrade there was the same amount of data and I didn't have this problem.
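In case it helps, this is roughly what I'd try to relieve the nearfull pressure until the data move is done; just a sketch, and <osd-id> is a placeholder for the fullest OSD:

    # temporarily lower the weight of the fullest OSD so data moves off it
    ceph osd reweight <osd-id> 0.9

    # or, as a short-term stop-gap, raise the nearfull warning threshold slightly
    ceph osd set-nearfull-ratio 0.88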

Any hints to solve this are appreciated.

What model/type of NVMe is this?

And on a nearfull cluster these problems can arise; it's usually not a good idea to let OSDs get nearfull.
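The thresholds in play can be checked like this (just a pointer, assuming the defaults haven't been overridden elsewhere):

    # shows full_ratio, backfillfull_ratio and nearfull_ratio
    ceph osd dump | grep -i ratio
    # lists which OSDs/pools are currently flagged nearfull
    ceph health detail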

What does 'ceph df' tell you?

Wido


Cheers
Raffael

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
