Hi Bill,

From your log excerpt, it looks like your slow requests are happening on OSDs 
14 and 18. Is it always these two OSDs?
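A quick way to double-check (assuming all your log lines look like the excerpt,
where the OSD name is the third field) would be something like:

    grep 'slow request' /var/log/ceph/ceph.log | awk '{print $3}' | sort | uniq -c | sort -rn

which should give you a rough per-OSD count of the slow request warnings.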

If you don't have a long recovery time (e.g., the cluster is just full of test 
data), maybe you could try setting OSDs 14 and 18 out and re-benching?
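Just to sketch it out (re-using the rados bench invocation from your mail, and
keeping in mind that marking the OSDs out will trigger some backfill):

    ceph osd out 14
    ceph osd out 18
    # wait for 'ceph -s' to settle back to HEALTH_OK, then re-run the bench:
    rados -p rbd bench 60 write -t 32
    # bring them back in afterwards:
    ceph osd in 14
    ceph osd in 18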

Alternatively, I suppose you could just use bonnie++ or dd, etc., to write to the 
disks backing those OSDs (being careful not to clobber any Ceph dirs) and see how 
the performance looks.
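For the dd route, a rough sketch (the mount point below is just an assumption;
double-check where osd.14's data partition is actually mounted, and remove the
test file when you're done):

    dd if=/dev/zero of=/var/lib/ceph/osd/ceph-14/dd-test.tmp bs=4M count=256 conv=fdatasync
    rm /var/lib/ceph/osd/ceph-14/dd-test.tmp

If one of those disks comes out noticeably slower than its peers, that would
point at the hardware rather than Ceph.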

Cheers,
Lincoln

On Jan 5, 2015, at 4:36 PM, Sanders, Bill wrote:

> Hi Ceph Users,
> 
> We've built a Ceph cluster, and we're experiencing issues with slow or hung 
> IOs, even when running 'rados bench' against the OSD cluster. Things start out 
> great, ~600 MB/s, then throughput rapidly drops off as the test waits for IOs. 
> Nothing seems to be taxed... the system just seems to be waiting. Any help 
> figuring out what could cause the slow IOs is appreciated.
> 
> For example, 'rados -p rbd bench 60 write -t 32' takes over 900s to complete:
> 
> A typical rados bench:
>  Total time run:         957.458274
> Total writes made:      9251
> Write size:             4194304
> Bandwidth (MB/sec):     38.648 
> 
> Stddev Bandwidth:       157.323
> Max bandwidth (MB/sec): 964
> Min bandwidth (MB/sec): 0
> Average Latency:        3.21126
> Stddev Latency:         51.9546
> Max latency:            910.72
> Min latency:            0.04516
> 
> 
> According to ceph.log, we're not experiencing any OSD flapping or monitor 
> election cycles, just slow requests:
> 
> # grep slow /var/log/ceph/ceph.log:
> 2015-01-05 13:42:42.937678 osd.18 39.7.48.7:6803/11185 220 : [WRN] 3 slow 
> requests, 1 included below; oldest blocked for > 513.611379 secs
> 2015-01-05 13:42:42.937685 osd.18 39.7.48.7:6803/11185 221 : [WRN] slow 
> request 30.136429 seconds old, received at 2015-01-05 13:42:12.801205: 
> osd_op(client.92008.1:3101508 rb.0.1437.238e1f29.00000000000f [write 
> 114688~512] 3.841c0edf ondisk+write e994) v4 currently waiting for subops 
> from 3,37
> 2015-01-05 13:42:49.938681 osd.18 39.7.48.7:6803/11185 222 : [WRN] 3 slow 
> requests, 1 included below; oldest blocked for > 520.612372 secs
> 2015-01-05 13:42:49.938688 osd.18 39.7.48.7:6803/11185 223 : [WRN] slow 
> request 480.636547 seconds old, received at 2015-01-05 13:34:49.302080: 
> osd_op(client.92008.1:3100010 rb.0.140d.238e1f29.000000000c77 [write 
> 3622400~512] 3.d031a69f ondisk+write e994) v4 currently waiting for subops 
> from 26,37
> 2015-01-05 13:43:12.941838 osd.18 39.7.48.7:6803/11185 224 : [WRN] 3 slow 
> requests, 1 included below; oldest blocked for > 543.615545 secs
> 2015-01-05 13:43:12.941844 osd.18 39.7.48.7:6803/11185 225 : [WRN] slow 
> request 60.140595 seconds old, received at 2015-01-05 13:42:12.801205: 
> osd_op(client.92008.1:3101508 rb.0.1437.238e1f29.00000000000f [write 
> 114688~512] 3.841c0edf ondisk+write e994) v4 currently waiting for subops 
> from 3,37
> 2015-01-05 13:44:04.933440 osd.14 39.7.48.7:6818/11640 251 : [WRN] 4 slow 
> requests, 1 included below; oldest blocked for > 606.941954 secs
> 2015-01-05 13:44:04.933469 osd.14 39.7.48.7:6818/11640 252 : [WRN] slow 
> request 240.101138 seconds old, received at 2015-01-05 13:40:04.832272: 
> osd_op(client.92008.1:3101102 rb.0.142b.238e1f29.000000000010 [write 
> 475136~512] 3.5e623815 ondisk+write e994) v4 currently waiting for subops 
> from 27,33
> 2015-01-05 13:44:12.950805 osd.18 39.7.48.7:6803/11185 226 : [WRN] 3 slow 
> requests, 1 included below; oldest blocked for > 603.624511 secs
> 2015-01-05 13:44:12.950812 osd.18 39.7.48.7:6803/11185 227 : [WRN] slow 
> request 120.149561 seconds old, received at 2015-01-05 13:42:12.801205: 
> osd_op(client.92008.1:3101508 rb.0.1437.238e1f29.00000000000f [write 
> 114688~512] 3.841c0edf ondisk+write e994) v4 currently waiting for subops 
> from 3,37
> 2015-01-05 13:46:12.988010 osd.18 39.7.48.7:6803/11185 228 : [WRN] 3 slow 
> requests, 1 included below; oldest blocked for > 723.661722 secs
> 2015-01-05 13:46:12.988017 osd.18 39.7.48.7:6803/11185 229 : [WRN] slow 
> request 240.186772 seconds old, received at 2015-01-05 13:42:12.801205: 
> osd_op(client.92008.1:3101508 rb.0.1437.238e1f29.00000000000f [write 
> 114688~512] 3.841c0edf ondisk+write e994) v4 currently waiting for subops 
> from 3,37
> 2015-01-05 13:46:18.971570 osd.14 39.7.48.7:6818/11640 253 : [WRN] 4 slow 
> requests, 1 included below; oldest blocked for > 740.980083 secs
> 2015-01-05 13:46:18.971577 osd.14 39.7.48.7:6818/11640 254 : [WRN] slow 
> request 480.063439 seconds old, received at 2015-01-05 13:38:18.908100: 
> osd_op(client.91911.1:3113675 rb.0.13f5.238e1f29.000000000010 [write 
> 475136~512] 3.679a939d ondisk+write e994) v4 currently waiting for subops 
> from 27,34
> 2015-01-05 13:48:05.030581 osd.14 39.7.48.7:6818/11640 255 : [WRN] 4 slow 
> requests, 1 included below; oldest blocked for > 847.039098 secs
> 2015-01-05 13:48:05.030587 osd.14 39.7.48.7:6818/11640 256 : [WRN] slow 
> request 480.198282 seconds old, received at 2015-01-05 13:40:04.832272: 
> osd_op(client.92008.1:3101102 rb.0.142b.238e1f29.000000000010 [write 
> 475136~512] 3.5e623815 ondisk+write e994) v4 currently waiting for subops 
> from 27,33
> 
> 
> iostat -dxm:
> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
> sda            5662.99     3.63  155.17    3.37    22.94     0.03   296.72     1.05    6.64   1.79  28.32
> sdb               0.73     0.04    3.62   70.66     0.04    10.85   300.04     0.06    0.79   0.08   0.61
> sdc               0.70     0.00    1.70   65.45     0.01    10.34   315.60     0.06    0.85   0.09   0.57
> sdd               0.07     0.95   16.78   13.47     1.03     2.28   224.25     0.37   12.32   3.84  11.62
> sde               0.26     0.80   25.79   10.89     2.11     1.85   220.74     2.82   76.94   4.73  17.35
> sdf               0.23     0.94   21.16   12.39     1.56     2.14   225.94     2.81   83.84   4.70  15.75
> sdg               0.02     0.91    9.22   12.30     0.33     2.06   227.64     0.26   12.26   3.67   7.89
> sdh               0.07     0.86   14.31   12.35     0.84     2.06   223.02     2.17   81.27   4.79  12.76
> sdk               0.19     0.85   15.92   10.48     1.06     1.78   220.09     3.11  114.07   4.68  12.35
> sdl               0.30     0.87   20.32   12.07     1.45     2.06   221.99     2.71   82.65   4.68  15.16
> sdi               0.13     0.88   16.03   11.51     1.07     1.94   224.27     2.99  108.40   4.64  12.78
> sdj               0.15     0.90   15.63   11.83     0.95     2.01   220.32     2.85  103.83   4.65  12.76
> sdm               0.19     0.86   20.48   12.04     1.46     2.04   220.42     2.64   81.17   4.72  15.34
> 
> Our Ceph cluster is 4x Dell R720xd nodes, each with:
> 2x 1TB spinners configured in RAID for the OS
> 10x 4TB spinners for OSDs (XFS)
> 2x 400GB SSDs, each with 5x ~50GB OSD journals
> 2x Xeon E5-2620 CPUs (/proc/cpuinfo reports 24 cores)
> 128GB RAM
> Two networks (public+cluster), both over InfiniBand
> 
> Three monitors are configured on the first three nodes, and use a chunk of 
> one of the SSDs for their data, on an XFS partition.
> 
> Software:
> SLES 11 SP3, with some in-house patching (3.0.1 kernel, "ceph-client" 
> backported from 3.10)
> Ceph version: ceph-0.80.5-0.9.2, packaged by SUSE
> 
> ceph.conf:
> fsid = 3e8dbfd8-c3c8-4d30-80e2-cd059619d757
> mon initial members = tvsaq1, tvsaq2, tvsar1
> mon host = 39.7.48.6, 39.7.48.7, 39.7.48.8
> 
> cluster network = 39.64.0.0/12
> public network = 39.0.0.0/12
> auth cluster required = cephx
> auth service required = cephx
> auth client required = cephx
> osd journal size = 9000
> filestore xattr use omap = true
> osd crush update on start = false
> osd pool default size = 3
> osd pool default min size = 1
> osd pool default pg num = 4096
> osd pool default pgp num = 4096
> 
> mon clock drift allowed = .100
> osd mount options xfs = rw,noatime,inode64
> 

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
