If anyone here is interested in what became of my problem with dreadfully bad performance with ceph, I'd like to offer this follow up.

The problem, as it turns out, is a regression specific to version 3.18 of the kernel. Upgrading to 4.0 solved it, and performance is now normal. It's odd that no one here suggested this fix; all the messing about with various topologies, placement groups, and so on was for naught.
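In case it saves anyone else the same detour: the rbd images here are mapped through the kernel client (that's the rbd0 device in the stats quoted below), so the affected component is simply the running kernel. A quick check, assuming a similar setup, is:

uname -r          # 3.18.x was the problem for us; 4.0 behaves normally
rbd showmapped    # confirms which images are mapped via the kernel client

The upgrade procedure itself will of course depend on your distribution.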

Jeff

On 04/09/2015 11:25 PM, Jeff Epstein wrote:
As a follow-up to this issue, I'd like to point out some other things I've noticed.

First, per suggestions posted here, I've reduced the number of pgs per pool. This results in the following ceph status:

cluster e96e10d3-ad2b-467f-9fe4-ab5269b70206
     health HEALTH_WARN too few pgs per osd (14 < min 20)
monmap e1: 3 mons at {a=192.168.224.4:6789/0,b=192.168.232.4:6789/0,c=192.168.240.4:6789/0}, election epoch 8, quorum 0,1,2 a,b,c
     osdmap e238: 6 osds: 6 up, 6 in
      pgmap v1107: 86 pgs, 23 pools, 2511 MB data, 801 objects
            38288 MB used, 1467 GB / 1504 GB avail
                  86 active+clean

I'm not sure whether I should be concerned about the HEALTH_WARN.
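As far as I can tell, the warning is just arithmetic: 86 PGs spread over 6 OSDs works out to about 14 PGs per OSD, which is below the mon_pg_warn_min_per_osd threshold (20 here). If it turns out to matter, the count could presumably be raised again on the busier pools with something like the following (pool name is a placeholder, and as far as I know pg_num on an existing pool can only be increased, not reduced):

ceph osd pool set <poolname> pg_num 64     # raise the placement group count
ceph osd pool set <poolname> pgp_num 64    # and the placement count to match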

However, this has not helped the performance issues. I've dug deeper to try to understand what is actually happening. It's curious, because there isn't much data: our pools are about 5 GB, so it really shouldn't take 30 minutes to an hour to run mkfs. Here are some results taken from disk analysis tools while this delay is in progress:

From pt-diskstats:

#ts device rd_s rd_avkb rd_mb_s rd_mrg rd_cnc rd_rt wr_s wr_avkb wr_mb_s wr_mrg wr_cnc wr_rt busy in_prg io_s qtime stime
1.0 rbd0    0.0     0.0     0.0     0%    0.0   0.0  0.0     0.0     0.0     0%    0.0   0.0 100%      6  0.0   0.0   0.0

From iostat:

Device:         rrqm/s  wrqm/s   r/s   w/s  rkB/s  wkB/s avgrq-sz avgqu-sz    await r_await  w_await   svctm  %util
rbd0              0.00    0.03  0.03  0.04   0.13  10.73   310.78     3.31 19730.41    0.40 37704.35 7073.59  49.47
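For completeness, both snapshots were taken while one of these mkfs runs was hanging; the invocations were roughly as follows (exact flags from memory):

pt-diskstats --devices-regex rbd    # Percona Toolkit, samples /proc/diskstats
iostat -xk rbd0 5                   # extended per-device stats every 5 seconds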

These results correspond with my experience: the device is busy, as witnessed by the "busy" column in pt-diskstats and the "await" column in iostat. But both tools also show that there isn't much reading or writing going on; according to pt-diskstats, there isn't any. So my question is: what is ceph /doing/? It clearly isn't just blocking as a result of excess I/O load; something else is going on. Can anyone please explain?

Jeff


