Hi,

Yeah, the performance when the system isn't waiting on slow IOs is definitely 
acceptable for what I'm doing; it's just the handful of slow IOs that wrecks 
the overall latency.

Bill
________________________________
From: Gonzalo Aguilar Delgado [gagui...@aguilardelgado.com]
Sent: Monday, January 05, 2015 3:47 PM
To: Sanders, Bill
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Slow/Hung IOs

Hi,

I just ran this test and found my system is no better, but I use commodity 
hardware. The big difference is latency; you should look at that.

 Total time run:         62.412381
Total writes made:      919
Write size:             4194304
Bandwidth (MB/sec):     58.899

Stddev Bandwidth:       27.7791
Max bandwidth (MB/sec): 108
Min bandwidth (MB/sec): 0
Average Latency:        2.15587
Stddev Latency:         1.1327
Max latency:            6.42248
Min latency:            0.444754

I use 2 nodes on a 1Gbit network, but only get about 100 MB/s. I will look 
into it and increase the node count.
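
If it helps, a quick way to spot a single slow disk or journal is the per-OSD 
latency summary (a rough check, not a benchmark):

  ceph osd perf

It lists fs_commit_latency(ms) and fs_apply_latency(ms) for every OSD, so one 
outlier stands out immediately.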

Best regards,


On Mon, Jan 5, 2015 at 11:36, Sanders, Bill <bill.sand...@teradata.com> 
wrote:
Hi Ceph Users,

We've built a Ceph cluster, and we're experiencing issues with slow or hung 
IOs, even when running 'rados bench' directly against the cluster.  Things 
start out great, ~600 MB/s, then throughput rapidly drops off as the test 
waits on IOs.  Nothing seems to be taxed... the system just seems to be 
waiting.  Any help figuring out what could cause the slow IOs is appreciated.

For example, 'rados -p rbd bench 60 write -t 32' takes over 900s to complete:

A typical rados bench:
 Total time run:         957.458274
Total writes made:      9251
Write size:             4194304
Bandwidth (MB/sec):     38.648

Stddev Bandwidth:       157.323
Max bandwidth (MB/sec): 964
Min bandwidth (MB/sec): 0
Average Latency:        3.21126
Stddev Latency:         51.9546
Max latency:            910.72
Min latency:            0.04516
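
(If it's useful for comparison, the same bench can be rerun with a single 
outstanding op to see whether individual writes still stall even without any 
concurrency, e.g.:

  rados -p rbd bench 60 write -t 1

The -t value is just the concurrency knob from the run above.)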


According to ceph.log, we're not experiencing any OSD flapping or monitor 
election cycles, just slow requests:

# grep slow /var/log/ceph/ceph.log:
2015-01-05 13:42:42.937678 osd.18 39.7.48.7:6803/11185 220 : [WRN] 3 slow requests, 1 included below; oldest blocked for > 513.611379 secs
2015-01-05 13:42:42.937685 osd.18 39.7.48.7:6803/11185 221 : [WRN] slow request 30.136429 seconds old, received at 2015-01-05 13:42:12.801205: osd_op(client.92008.1:3101508 rb.0.1437.238e1f29.00000000000f [write 114688~512] 3.841c0edf ondisk+write e994) v4 currently waiting for subops from 3,37
2015-01-05 13:42:49.938681 osd.18 39.7.48.7:6803/11185 222 : [WRN] 3 slow requests, 1 included below; oldest blocked for > 520.612372 secs
2015-01-05 13:42:49.938688 osd.18 39.7.48.7:6803/11185 223 : [WRN] slow request 480.636547 seconds old, received at 2015-01-05 13:34:49.302080: osd_op(client.92008.1:3100010 rb.0.140d.238e1f29.000000000c77 [write 3622400~512] 3.d031a69f ondisk+write e994) v4 currently waiting for subops from 26,37
2015-01-05 13:43:12.941838 osd.18 39.7.48.7:6803/11185 224 : [WRN] 3 slow requests, 1 included below; oldest blocked for > 543.615545 secs
2015-01-05 13:43:12.941844 osd.18 39.7.48.7:6803/11185 225 : [WRN] slow request 60.140595 seconds old, received at 2015-01-05 13:42:12.801205: osd_op(client.92008.1:3101508 rb.0.1437.238e1f29.00000000000f [write 114688~512] 3.841c0edf ondisk+write e994) v4 currently waiting for subops from 3,37
2015-01-05 13:44:04.933440 osd.14 39.7.48.7:6818/11640 251 : [WRN] 4 slow requests, 1 included below; oldest blocked for > 606.941954 secs
2015-01-05 13:44:04.933469 osd.14 39.7.48.7:6818/11640 252 : [WRN] slow request 240.101138 seconds old, received at 2015-01-05 13:40:04.832272: osd_op(client.92008.1:3101102 rb.0.142b.238e1f29.000000000010 [write 475136~512] 3.5e623815 ondisk+write e994) v4 currently waiting for subops from 27,33
2015-01-05 13:44:12.950805 osd.18 39.7.48.7:6803/11185 226 : [WRN] 3 slow requests, 1 included below; oldest blocked for > 603.624511 secs
2015-01-05 13:44:12.950812 osd.18 39.7.48.7:6803/11185 227 : [WRN] slow request 120.149561 seconds old, received at 2015-01-05 13:42:12.801205: osd_op(client.92008.1:3101508 rb.0.1437.238e1f29.00000000000f [write 114688~512] 3.841c0edf ondisk+write e994) v4 currently waiting for subops from 3,37
2015-01-05 13:46:12.988010 osd.18 39.7.48.7:6803/11185 228 : [WRN] 3 slow requests, 1 included below; oldest blocked for > 723.661722 secs
2015-01-05 13:46:12.988017 osd.18 39.7.48.7:6803/11185 229 : [WRN] slow request 240.186772 seconds old, received at 2015-01-05 13:42:12.801205: osd_op(client.92008.1:3101508 rb.0.1437.238e1f29.00000000000f [write 114688~512] 3.841c0edf ondisk+write e994) v4 currently waiting for subops from 3,37
2015-01-05 13:46:18.971570 osd.14 39.7.48.7:6818/11640 253 : [WRN] 4 slow requests, 1 included below; oldest blocked for > 740.980083 secs
2015-01-05 13:46:18.971577 osd.14 39.7.48.7:6818/11640 254 : [WRN] slow request 480.063439 seconds old, received at 2015-01-05 13:38:18.908100: osd_op(client.91911.1:3113675 rb.0.13f5.238e1f29.000000000010 [write 475136~512] 3.679a939d ondisk+write e994) v4 currently waiting for subops from 27,34
2015-01-05 13:48:05.030581 osd.14 39.7.48.7:6818/11640 255 : [WRN] 4 slow requests, 1 included below; oldest blocked for > 847.039098 secs
2015-01-05 13:48:05.030587 osd.14 39.7.48.7:6818/11640 256 : [WRN] slow request 480.198282 seconds old, received at 2015-01-05 13:40:04.832272: osd_op(client.92008.1:3101102 rb.0.142b.238e1f29.000000000010 [write 475136~512] 3.5e623815 ondisk+write e994) v4 currently waiting for subops from 27,33
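
Note that it's the same op (client.92008.1:3101508) showing up at 30, 60, 120, 
and 240 seconds old, always waiting for subops from OSDs 3 and 37.  If more 
detail is needed, the per-op state on the implicated OSDs can be pulled from 
the admin socket on whichever node hosts them, along these lines (default 
socket paths assumed):

  ceph --admin-daemon /var/run/ceph/ceph-osd.18.asok dump_ops_in_flight
  ceph --admin-daemon /var/run/ceph/ceph-osd.37.asok dump_historic_ops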


iostat -dxm:
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sda            5662.99     3.63  155.17    3.37    22.94     0.03   296.72     1.05    6.64   1.79  28.32
sdb               0.73     0.04    3.62   70.66     0.04    10.85   300.04     0.06    0.79   0.08   0.61
sdc               0.70     0.00    1.70   65.45     0.01    10.34   315.60     0.06    0.85   0.09   0.57
sdd               0.07     0.95   16.78   13.47     1.03     2.28   224.25     0.37   12.32   3.84  11.62
sde               0.26     0.80   25.79   10.89     2.11     1.85   220.74     2.82   76.94   4.73  17.35
sdf               0.23     0.94   21.16   12.39     1.56     2.14   225.94     2.81   83.84   4.70  15.75
sdg               0.02     0.91    9.22   12.30     0.33     2.06   227.64     0.26   12.26   3.67   7.89
sdh               0.07     0.86   14.31   12.35     0.84     2.06   223.02     2.17   81.27   4.79  12.76
sdk               0.19     0.85   15.92   10.48     1.06     1.78   220.09     3.11  114.07   4.68  12.35
sdl               0.30     0.87   20.32   12.07     1.45     2.06   221.99     2.71   82.65   4.68  15.16
sdi               0.13     0.88   16.03   11.51     1.07     1.94   224.27     2.99  108.40   4.64  12.78
sdj               0.15     0.90   15.63   11.83     0.95     2.01   220.32     2.85  103.83   4.65  12.76
sdm               0.19     0.86   20.48   12.04     1.46     2.04   220.42     2.64   81.17   4.72  15.34
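
Most of the OSD data disks (sde, sdf, sdh, sdi, sdj, sdk, sdl, sdm) are 
sitting at 75-115 ms await, while sdd and sdg are around 12 ms.  To see 
whether that latency is constant or bursty, the worst devices could be watched 
over time, for example:

  iostat -dxm sde sdi sdj sdk 5

(the device list and 5-second interval are only an example; any of the disks 
above could be substituted).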

Our ceph cluster is 4x Dell R720xd nodes:
2x 1TB spinners configured in RAID for the OS
10x 4TB spinners for OSDs (XFS)
2x 400GB SSDs, each with 5x ~50GB OSD journals
2x Xeon E5-2620 CPUs (/proc/cpuinfo reports 24 cores)
128GB RAM
Two networks (public + cluster), both over InfiniBand

Three monitors are configured on the first three nodes; each uses a chunk of 
one of the SSDs for its data, on an XFS partition.
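
In case the journals are suspect, a rough spot-check of the SSDs' sync-write 
latency can be done with something like the following (the path is only an 
example and should point at a scratch file on the SSD being tested):

  # example only: time 1000 x 4k synchronous writes to a scratch file
  dd if=/dev/zero of=/path/on/ssd/ddtest bs=4k count=1000 oflag=direct,dsync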

Software:
SLES 11 SP3, with some in-house patching (3.0.1 kernel, "ceph-client" 
backported from 3.10)
Ceph version: ceph-0.80.5-0.9.2, packaged by SUSE

ceph.conf:
fsid = 3e8dbfd8-c3c8-4d30-80e2-cd059619d757
mon initial members = tvsaq1, tvsaq2, tvsar1
mon host = 39.7.48.6, 39.7.48.7, 39.7.48.8

cluster network = 39.64.0.0/12
public network = 39.0.0.0/12
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
osd journal size = 9000
filestore xattr use omap = true
osd crush update on start = false
osd pool default size = 3
osd pool default min size = 1
osd pool default pg num = 4096
osd pool default pgp num = 4096

mon clock drift allowed = .100
osd mount options xfs = rw,noatime,inode64
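
(If it's worth double-checking what the running daemons actually picked up 
from this file, the admin socket can dump the live config, e.g.:

  ceph --admin-daemon /var/run/ceph/ceph-osd.18.asok config show | grep journal

again assuming the default socket path.)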



