Hi Bill,

From your log excerpt, it looks like your slow requests are happening on OSDs 14 and 18. Is it always these two OSDs?
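If you want to double-check, something along these lines should tally which OSDs are reporting the slow requests and which peer OSDs the subops are stuck on (just a sketch; it assumes the same log location as your grep):

    grep 'slow request' /var/log/ceph/ceph.log | awk '{print $3}' | sort | uniq -c | sort -rn
    grep -o 'waiting for subops from [0-9,]*' /var/log/ceph/ceph.log | sort | uniq -c | sort -rn

In your excerpt the reporters are osd.14 and osd.18, with subops pending on a handful of peers (3, 26, 27, 33, 34, 37), so it's worth seeing whether the same few keep showing up over a longer window.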
If you don't have a long recovery time (e.g., the cluster is just full of test data), maybe you could try setting OSDs 14 and 18 out and re-benching? Alternatively, I suppose you could just use bonnie++ or dd etc. to write to those OSDs (careful not to clobber any Ceph dirs) and see how the performance looks; I've put a rough sketch of the commands at the bottom of this mail.

Cheers,
Lincoln

On Jan 5, 2015, at 4:36 PM, Sanders, Bill wrote:

> Hi Ceph Users,
>
> We've got a Ceph cluster we've built, and we're experiencing issues with slow
> or hung IO's, even running 'rados bench' on the OSD cluster. Things start
> out great, ~600 MB/s, then rapidly drop off as the test waits for IO's.
> Nothing seems to be taxed... the system just seems to be waiting. Any help
> trying to figure out what could cause the slow IO's is appreciated.
>
> For example, 'rados -p rbd bench 60 write -t 32' takes over 900s to complete:
>
> A typical rados bench:
> Total time run:         957.458274
> Total writes made:      9251
> Write size:             4194304
> Bandwidth (MB/sec):     38.648
>
> Stddev Bandwidth:       157.323
> Max bandwidth (MB/sec): 964
> Min bandwidth (MB/sec): 0
> Average Latency:        3.21126
> Stddev Latency:         51.9546
> Max latency:            910.72
> Min latency:            0.04516
>
> According to ceph.log, we're not experiencing any OSD flapping or monitor
> election cycles, just slow requests:
>
> # grep slow /var/log/ceph/ceph.log:
> 2015-01-05 13:42:42.937678 osd.18 39.7.48.7:6803/11185 220 : [WRN] 3 slow requests, 1 included below; oldest blocked for > 513.611379 secs
> 2015-01-05 13:42:42.937685 osd.18 39.7.48.7:6803/11185 221 : [WRN] slow request 30.136429 seconds old, received at 2015-01-05 13:42:12.801205: osd_op(client.92008.1:3101508 rb.0.1437.238e1f29.00000000000f [write 114688~512] 3.841c0edf ondisk+write e994) v4 currently waiting for subops from 3,37
> 2015-01-05 13:42:49.938681 osd.18 39.7.48.7:6803/11185 222 : [WRN] 3 slow requests, 1 included below; oldest blocked for > 520.612372 secs
> 2015-01-05 13:42:49.938688 osd.18 39.7.48.7:6803/11185 223 : [WRN] slow request 480.636547 seconds old, received at 2015-01-05 13:34:49.302080: osd_op(client.92008.1:3100010 rb.0.140d.238e1f29.000000000c77 [write 3622400~512] 3.d031a69f ondisk+write e994) v4 currently waiting for subops from 26,37
> 2015-01-05 13:43:12.941838 osd.18 39.7.48.7:6803/11185 224 : [WRN] 3 slow requests, 1 included below; oldest blocked for > 543.615545 secs
> 2015-01-05 13:43:12.941844 osd.18 39.7.48.7:6803/11185 225 : [WRN] slow request 60.140595 seconds old, received at 2015-01-05 13:42:12.801205: osd_op(client.92008.1:3101508 rb.0.1437.238e1f29.00000000000f [write 114688~512] 3.841c0edf ondisk+write e994) v4 currently waiting for subops from 3,37
> 2015-01-05 13:44:04.933440 osd.14 39.7.48.7:6818/11640 251 : [WRN] 4 slow requests, 1 included below; oldest blocked for > 606.941954 secs
> 2015-01-05 13:44:04.933469 osd.14 39.7.48.7:6818/11640 252 : [WRN] slow request 240.101138 seconds old, received at 2015-01-05 13:40:04.832272: osd_op(client.92008.1:3101102 rb.0.142b.238e1f29.000000000010 [write 475136~512] 3.5e623815 ondisk+write e994) v4 currently waiting for subops from 27,33
> 2015-01-05 13:44:12.950805 osd.18 39.7.48.7:6803/11185 226 : [WRN] 3 slow requests, 1 included below; oldest blocked for > 603.624511 secs
> 2015-01-05 13:44:12.950812 osd.18 39.7.48.7:6803/11185 227 : [WRN] slow request 120.149561 seconds old, received at 2015-01-05 13:42:12.801205: osd_op(client.92008.1:3101508 rb.0.1437.238e1f29.00000000000f [write 114688~512] 3.841c0edf ondisk+write e994) v4 currently waiting for subops from 3,37
> 2015-01-05 13:46:12.988010 osd.18 39.7.48.7:6803/11185 228 : [WRN] 3 slow requests, 1 included below; oldest blocked for > 723.661722 secs
> 2015-01-05 13:46:12.988017 osd.18 39.7.48.7:6803/11185 229 : [WRN] slow request 240.186772 seconds old, received at 2015-01-05 13:42:12.801205: osd_op(client.92008.1:3101508 rb.0.1437.238e1f29.00000000000f [write 114688~512] 3.841c0edf ondisk+write e994) v4 currently waiting for subops from 3,37
> 2015-01-05 13:46:18.971570 osd.14 39.7.48.7:6818/11640 253 : [WRN] 4 slow requests, 1 included below; oldest blocked for > 740.980083 secs
> 2015-01-05 13:46:18.971577 osd.14 39.7.48.7:6818/11640 254 : [WRN] slow request 480.063439 seconds old, received at 2015-01-05 13:38:18.908100: osd_op(client.91911.1:3113675 rb.0.13f5.238e1f29.000000000010 [write 475136~512] 3.679a939d ondisk+write e994) v4 currently waiting for subops from 27,34
> 2015-01-05 13:48:05.030581 osd.14 39.7.48.7:6818/11640 255 : [WRN] 4 slow requests, 1 included below; oldest blocked for > 847.039098 secs
> 2015-01-05 13:48:05.030587 osd.14 39.7.48.7:6818/11640 256 : [WRN] slow request 480.198282 seconds old, received at 2015-01-05 13:40:04.832272: osd_op(client.92008.1:3101102 rb.0.142b.238e1f29.000000000010 [write 475136~512] 3.5e623815 ondisk+write e994) v4 currently waiting for subops from 27,33
>
> iostat -dxm:
> Device:  rrqm/s  wrqm/s     r/s    w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz   await  svctm  %util
> sda     5662.99    3.63  155.17   3.37  22.94   0.03    296.72      1.05    6.64   1.79  28.32
> sdb        0.73    0.04    3.62  70.66   0.04  10.85    300.04      0.06    0.79   0.08   0.61
> sdc        0.70    0.00    1.70  65.45   0.01  10.34    315.60      0.06    0.85   0.09   0.57
> sdd        0.07    0.95   16.78  13.47   1.03   2.28    224.25      0.37   12.32   3.84  11.62
> sde        0.26    0.80   25.79  10.89   2.11   1.85    220.74      2.82   76.94   4.73  17.35
> sdf        0.23    0.94   21.16  12.39   1.56   2.14    225.94      2.81   83.84   4.70  15.75
> sdg        0.02    0.91    9.22  12.30   0.33   2.06    227.64      0.26   12.26   3.67   7.89
> sdh        0.07    0.86   14.31  12.35   0.84   2.06    223.02      2.17   81.27   4.79  12.76
> sdk        0.19    0.85   15.92  10.48   1.06   1.78    220.09      3.11  114.07   4.68  12.35
> sdl        0.30    0.87   20.32  12.07   1.45   2.06    221.99      2.71   82.65   4.68  15.16
> sdi        0.13    0.88   16.03  11.51   1.07   1.94    224.27      2.99  108.40   4.64  12.78
> sdj        0.15    0.90   15.63  11.83   0.95   2.01    220.32      2.85  103.83   4.65  12.76
> sdm        0.19    0.86   20.48  12.04   1.46   2.04    220.42      2.64   81.17   4.72  15.34
>
> Our ceph cluster is 4x Dell R720xd nodes:
> 2x 1TB spinners configured in RAID for the OS
> 10x 4TB spinners for OSD's (XFS)
> 2x 400GB SSD's, each with 5x ~50GB OSD journals
> 2x Xeon E5-2620 CPU (/proc/cpuinfo reports 24 cores)
> 128GB RAM
> Two networks (public+cluster), both over infiniband
>
> Three monitors are configured on the first three nodes, and use a chunk of
> one of the SSDs for their data, on an XFS partition.
>
> Software:
> SLES 11SP3, with some in-house patching (3.0.1 kernel, "ceph-client" backported from 3.10)
> Ceph version: ceph-0.80.5-0.9.2, packaged by SUSE
>
> ceph.conf:
> fsid = 3e8dbfd8-c3c8-4d30-80e2-cd059619d757
> mon initial members = tvsaq1, tvsaq2, tvsar1
> mon host = 39.7.48.6, 39.7.48.7, 39.7.48.8
>
> cluster network = 39.64.0.0/12
> public network = 39.0.0.0/12
> auth cluster required = cephx
> auth service required = cephx
> auth client required = cephx
> osd journal size = 9000
> filestore xattr use omap = true
> osd crush update on start = false
> osd pool default size = 3
> osd pool default min size = 1
> osd pool default pg num = 4096
> osd pool default pgp num = 4096
>
> mon clock drift allowed = .100
> osd mount options xfs = rw,noatime,inode64
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
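P.S. For the re-bench and the direct disk test, here's roughly what I had in mind. Just a sketch: the OSD ids are the two from your log, but the data path below is the upstream default and may differ under your SUSE packaging, so double-check it, and use a scratch file so you don't touch any Ceph data or journals.

    # mark the suspect OSDs out, let the cluster settle, then re-run the benchmark
    ceph osd out 14
    ceph osd out 18
    rados -p rbd bench 60 write -t 32

    # or write directly to one suspect OSD's filesystem with dd
    dd if=/dev/zero of=/var/lib/ceph/osd/ceph-14/dd-scratch.img bs=4M count=1000 oflag=direct conv=fsync
    rm /var/lib/ceph/osd/ceph-14/dd-scratch.img

(And 'ceph osd in 14' / 'ceph osd in 18' to bring them back afterwards.)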