Hi JC,

In answer to your question, iostat shows high wait times on the RBD, but not on the underlying medium. For example:

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz     await r_await    w_await    svctm  %util
rbd50             0.00     0.01    0.00    0.00     0.01     0.03    15.81     5.08 892493.16    1.47 1692367.78 85088.96  48.41
xvdk              0.00    32.18    0.20    4.55     1.00   314.30   132.83     0.40     83.70    1.10      87.35     0.82   0.39


In this case, rbd50 is the RBD that is blocking, while xvdk is the physical disk where the OSD data is stored. xvdk looks completely normal, whereas rbd50 shows absurdly high wait times. This leads me to think that the problem is a bug or misconfiguration in ceph, rather than the RBD actually being blocked by slow I/O.
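For reference, this is ordinary extended iostat output; I just picked out the rbd50 and xvdk rows from a run along the lines of:

    iostat -x 5

(the interval is arbitrary).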

This information is also reflected in vmstat:

    procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
     0  1 1386024  79596 263552 468136    1   11    26   297  213  247  0  1 57 42
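That's from an ordinary vmstat run with an arbitrary interval, e.g.:

    vmstat 5

The wa column (percentage of CPU time spent waiting on I/O) sitting around 42 is the part that stands out to me.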


Finally, one can see the blocked processes, and the function they're blocked in, with ps. This is pretty typical:

    2  6750 root     D     0.0 jbd2/rbd47-8    wait_on_buffer
    2 17551 root     D     0.0 jbd2/rbd51-8    wait_on_buffer
    2 19019 root     D     0.0 kworker/u30:3   get_write_access
22369 22374 root     D     0.0 shutdown        sync_inodes_sb
22372 22381 root     D     0.0 shutdown        sync_inodes_sb

Frequently mkfs blocks as well:

 1468 12329 root     D     0.0 mkfs.ext4       wait_on_page_bit
 1468 12332 root     D     0.0 mkfs.ext4       wait_on_buffer
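(For reference, that listing comes from ps with a custom field list, along these lines (from memory, so treat the exact fields as approximate):

    ps -eo ppid,pid,user,stat,pcpu,comm,wchan:32 | awk '$4 ~ /D/'

i.e. every process currently in uninterruptible sleep, plus the kernel symbol it's waiting in.)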


I haven't seen anything obviously unusual in ceph -w, but I'm also not completely sure what I'm looking for.
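For what it's worth, I've mostly just been eyeballing the stream, plus occasionally filtering it with something like:

    ceph -w | grep -iE 'slow|blocked|stuck|warn'

but nothing interesting has shown up so far.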

The network connection between our nodes is provided by Amazon AWS, and has always been sufficient for our production needs until now. If there's a specific issue of concern related to ceph that I should be investigating, please let me know.
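If it would help to rule out the network, I'm happy to run a quick throughput test between the client and the OSD hosts, e.g. with iperf:

    iperf -s                  # on one of the OSD hosts
    iperf -c <osd-host-addr>  # on the client, pointing at that host

Just let me know if that (or something else) would be useful.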

Here's a pastebin of the log from an OSD experiencing the problem I described, with debug_osd set to 5/5: http://pastebin.com/kLSwbVRb If you can provide any insight, I'd be grateful. Also, if you have any more suggestions on how I can collect potentially interesting debug info, please let me know. Thanks.
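(In case it matters for interpreting the log: I bumped the OSD debug level with injectargs, something like

    ceph tell osd.* injectargs '--debug-osd 5/5'

though the exact invocation above is from memory.)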

Jeff

On 04/10/2015 12:24 AM, LOPEZ Jean-Charles wrote:
Hi Jeff,

have you tried gathering iostat output on the OSD side, to see how your OSD drives behave?

The RBD side shows you what the client is experiencing (the symptom) but will 
not help you find the problem.

Can you grab this iostat output on the OSD VMs (district-1 or district-2, depending on which test you did last)? Don’t forget to indicate which devices are the OSD devices on your VMs when you post the iostat output.

Have you also investigated the network between your client and the OSDs? While the test is running, do you see any unusual messages in a « ceph -w » output?

Pastebin and we’ll see if we can spot something.

As for the too few PGs, once we’ve found the root cause of why it’s slow, 
you’ll be able to adjust and increase the number of PGs per pool.

Cheers
JC

On 9 Apr 2015, at 20:25, Jeff Epstein <jeff.epst...@commerceguys.com> wrote:

As a follow-up to this issue, I'd like to point out some other things I've 
noticed.

First, per suggestions posted here, I've reduced the number of pgs per pool. 
This results in the following ceph status:

     cluster e96e10d3-ad2b-467f-9fe4-ab5269b70206
      health HEALTH_WARN too few pgs per osd (14 < min 20)
      monmap e1: 3 mons at {a=192.168.224.4:6789/0,b=192.168.232.4:6789/0,c=192.168.240.4:6789/0}, election epoch 8, quorum 0,1,2 a,b,c
      osdmap e238: 6 osds: 6 up, 6 in
       pgmap v1107: 86 pgs, 23 pools, 2511 MB data, 801 objects
             38288 MB used, 1467 GB / 1504 GB avail
                   86 active+clean

I'm not sure if I should be concerned about the HEALTH_WARN.
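(In case it's relevant how I got there: the PG counts were chosen when the pools were created, i.e. something along the lines of

    ceph osd pool create <pool-name> <pg_num> <pgp_num>

with small pg_num/pgp_num values; the placeholders are just illustrative.)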

However, this has not helped the performance issues. I've dug deeper to try to 
understand what is actually happening. It's curious because there isn't much 
data: our pools are about 5GB, so it really shouldn't take 30 minutes to an 
hour to run mkfs. Here are some results taken from disk analysis tools while this 
delay is in progress:

 From pt-diskstats:

   #ts device    rd_s rd_avkb rd_mb_s rd_mrg rd_cnc   rd_rt    wr_s wr_avkb wr_mb_s wr_mrg wr_cnc   wr_rt busy in_prg    io_s  qtime stime
   1.0 rbd0       0.0     0.0     0.0     0%    0.0     0.0     0.0     0.0     0.0     0%    0.0     0.0 100%      6     0.0    0.0   0.0
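(That sample was collected with pt-diskstats from the Percona toolkit, filtered to the rbd device, roughly:

    pt-diskstats --devices-regex '^rbd0$'

the exact option spelling is from memory.)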

 From iostat:

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz    await r_await  w_await   svctm  %util
rbd0              0.00     0.03    0.03    0.04     0.13    10.73   310.78     3.31 19730.41    0.40 37704.35 7073.59  49.47

These results correspond with my experience: the device is busy, as witnessed by the "busy" column in pt-diskstats and the "await" column in iostat. But both tools also attest to the fact that there isn't much reading or writing going on; according to pt-diskstats, there isn't any. So my question is: what is ceph doing? It clearly isn't just blocking as a result of excess I/O load; something else is going on. Can anyone please explain?

Jeff


_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
