I've just added another server (same specs) with one OSD, and the behavior is the same: bad performance, with "cur MB/s" repeatedly dropping to 0. I checked the network with iperf3 and found no issues.
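For reference, the "cur MB/s" column comes from rados bench; the read output further down was produced by a run along these lines (the pool name and the 60-second duration here are just placeholders, and the write pass with --no-cleanup is what leaves objects in place for the read test):

rados bench -p rbd 60 write --no-cleanup    # populate the pool and keep the objects for the read test
rados bench -p rbd 60 seq                   # sequential 4 MB reads - the kind of run behind the "Total reads made" output below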
So it is not a server issue, since I am getting the same behavior with two different servers, and the network checks out with iperf3. What else can it be?

ceph osd df tree
ID CLASS WEIGHT  REWEIGHT SIZE   USE     AVAIL  %USE  VAR  PGS TYPE NAME
-1       3.44714        -  588G  80693M   509G     0    0   - root default
-9       0.57458        -  588G  80693M   509G 13.39 1.13   -     host osd01
 5   hdd 0.57458  1.00000  588G  80693M   509G 13.39 1.13  64         osd.5
-7       1.14899        - 1176G    130G  1046G 11.06 0.94   -     host osd02
 0   hdd 0.57500  1.00000  588G  70061M   519G 11.63 0.98  50         osd.0
 1   hdd 0.57500  1.00000  588G  63200M   526G 10.49 0.89  41         osd.1
-3       1.14899        - 1176G    138G  1038G 11.76 1.00   -     host osd03
 2   hdd 0.57500  1.00000  588G  68581M   521G 11.38 0.96  48         osd.2
 3   hdd 0.57500  1.00000  588G  73185M   516G 12.15 1.03  53         osd.3
-4       0.57458        -     0       0      0     0    0   -     host osd04
 4   hdd 0.57458        0     0       0      0     0    0   0         osd.4

2018-04-10 15:11:58.542507 min lat: 0.0201432 max lat: 13.9308 avg lat: 0.466235
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
   40      16      1294      1278   127.785         0           -    0.466235
   41      16      1294      1278   124.668         0           -    0.466235
   42      16      1294      1278     121.7         0           -    0.466235
   43      16      1294      1278    118.87         0           -    0.466235
   44      16      1302      1286   116.896       6.4   0.0302793    0.469203
   45      16      1395      1379   122.564       372    0.312525     0.51994
   46      16      1458      1442   125.377       252   0.0387492    0.501892
   47      16      1458      1442   122.709         0           -    0.501892
   48      16      1458      1442   120.153         0           -    0.501892
   49      16      1458      1442   117.701         0           -    0.501892
   50      16      1522      1506   120.466        64    0.137913    0.516969
   51      16      1522      1506   118.104         0           -    0.516969
   52      16      1522      1506   115.833         0           -    0.516969
   53      16      1522      1506   113.648         0           -    0.516969
   54      16      1522      1506   111.543         0           -    0.516969
   55      16      1522      1506   109.515         0           -    0.516969
   56      16      1522      1506   107.559         0           -    0.516969
   57      16      1522      1506   105.672         0           -    0.516969
   58      16      1522      1506   103.851         0           -    0.516969
Total time run:       58.927431
Total reads made:     1522
Read size:            4194304
Object size:          4194304
Bandwidth (MB/sec):   103.314
Average IOPS:         25
Stddev IOPS:          35
Max IOPS:             106
Min IOPS:             0
Average Latency(s):   0.618812
Max latency(s):       13.9308
Min latency(s):       0.0201432

iperf3 -c 192.168.0.181 -i1 -t 10
Connecting to host 192.168.0.181, port 5201
[  4] local 192.168.0.182 port 57448 connected to 192.168.0.181 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  1.15 GBytes  9.92 Gbits/sec    0    830 KBytes
[  4]   1.00-2.00   sec  1.15 GBytes  9.90 Gbits/sec    0    830 KBytes
[  4]   2.00-3.00   sec  1.15 GBytes  9.91 Gbits/sec    0    918 KBytes
[  4]   3.00-4.00   sec  1.15 GBytes  9.90 Gbits/sec    0    918 KBytes
[  4]   4.00-5.00   sec  1.15 GBytes  9.90 Gbits/sec    0    918 KBytes
[  4]   5.00-6.00   sec  1.15 GBytes  9.90 Gbits/sec    0    918 KBytes
[  4]   6.00-7.00   sec  1.15 GBytes  9.90 Gbits/sec    0    918 KBytes
[  4]   7.00-8.00   sec  1.15 GBytes  9.90 Gbits/sec    0    918 KBytes
[  4]   8.00-9.00   sec  1.15 GBytes  9.90 Gbits/sec    0    918 KBytes
[  4]   9.00-10.00  sec  1.15 GBytes  9.91 Gbits/sec    0    918 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  11.5 GBytes  9.90 Gbits/sec    0             sender
[  4]   0.00-10.00  sec  11.5 GBytes  9.90 Gbits/sec                  receiver
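Since the network itself looks clean, the next thing I plan to check is whether the newly added OSDs are simply slow on their own. A rough sketch of those checks (osd.5 is the new disk on osd01 here; the daemon command has to be run on the host that owns the OSD):

ceph osd perf                          # per-OSD commit/apply latency - a slow disk usually stands out
ceph tell osd.5 bench                  # write benchmark run directly against the OSD, bypassing the client path
ceph daemon osd.5 dump_historic_ops    # recent slowest ops on that OSD (run on its host)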
On Tue, 10 Apr 2018 at 08:49, Steven Vacaroaia <ste...@gmail.com> wrote:

> Hi,
> Thanks for providing guidance.
>
> VD0 is the SSD drive.
> Many people suggested not enabling WB for the SSD so that the cache can be
> used for the HDDs, where it is needed more.
>
> The setup is 3 identical Dell R620 servers: OSD01, OSD02, OSD04.
> 10 GbE separate networks, 600 GB enterprise HDD, 320 GB enterprise SSD.
> BlueStore, with separate WAL/DB on the SSD (1 GB partition for WAL, 30 GB for DB).
>
> With 2 OSDs per server and only OSD01 and OSD02, performance is as expected
> (no gaps with CUR MB/s 0).
>
> Adding one OSD from OSD04 tanks performance (lots of gaps with CUR MB/s 0).
> See below.
>
> ceph -s
>   cluster:
>     id:     1e98e57a-ef41-4327-b88a-dd2531912632
>     health: HEALTH_WARN
>             noscrub,nodeep-scrub flag(s) set
>
> WITH OSD04
>
> ceph osd tree
> ID CLASS WEIGHT  TYPE NAME      STATUS REWEIGHT PRI-AFF
> -1       2.87256 root default
> -7       1.14899     host osd02
>  0   hdd 0.57500         osd.0      up  1.00000 1.00000
>  1   hdd 0.57500         osd.1      up  1.00000 1.00000
> -3       1.14899     host osd03
>  2   hdd 0.57500         osd.2      up  1.00000 1.00000
>  3   hdd 0.57500         osd.3      up  1.00000 1.00000
> -4       0.57458     host osd04
>  4   hdd 0.57458         osd.4      up  1.00000 1.00000
>
> 2018-04-10 08:37:08.111037 min lat: 0.0128562 max lat: 13.1623 avg lat: 0.528273
>   sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
>   100      16      3001      2985   119.388        90   0.0169507    0.528273
>   101      16      3029      3013   119.315       112   0.0410565    0.524325
>   102      16      3029      3013   118.145         0           -    0.524325
>   103      16      3029      3013   116.998         0           -    0.524325
>   104      16      3029      3013   115.873         0           -    0.524325
>   105      16      3071      3055    116.37        42   0.0888923     0.54832
>   106      16      3156      3140   118.479       340   0.0162464    0.535244
>   107      16      3156      3140   117.372         0           -    0.535244
>   108      16      3156      3140   116.285         0           -    0.535244
>   109      16      3156      3140   115.218         0           -    0.535244
>   110      16      3156      3140   114.171         0           -    0.535244
>   111      16      3156      3140   113.142         0           -    0.535244
>   112      16      3156      3140   112.132         0           -    0.535244
>   113      16      3156      3140    111.14         0           -    0.535244
>   114      16      3156      3140   110.165         0           -    0.535244
>   115      16      3156      3140   109.207         0           -    0.535244
>   116      16      3230      3214   110.817      29.6   0.0169969    0.574856
>   117      16      3311      3295   112.639       324   0.0704851    0.565529
>   118      16      3311      3295   111.684         0           -    0.565529
>   119      16      3311      3295   110.746         0           -    0.565529
> 2018-04-10 08:37:28.112886 min lat: 0.0128562 max lat: 14.7293 avg lat: 0.565529
>   sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
>   120      16      3311      3295   109.823         0           -    0.565529
>   121      16      3311      3295   108.915         0           -    0.565529
>   122      16      3311      3295   108.022         0           -    0.565529
> Total time run:         122.568983
> Total writes made:      3312
> Write size:             4194304
> Object size:            4194304
> Bandwidth (MB/sec):     108.086
> Stddev Bandwidth:       121.191
> Max bandwidth (MB/sec): 520
> Min bandwidth (MB/sec): 0
> Average IOPS:           27
> Stddev IOPS:            30
> Max IOPS:               130
> Min IOPS:               0
> Average Latency(s):     0.591771
> Stddev Latency(s):      1.74753
> Max latency(s):         14.7293
> Min latency(s):         0.0128562
>
> AFTER ceph osd down osd.4; ceph osd out osd.4
>
> ceph osd tree
> ID CLASS WEIGHT  TYPE NAME      STATUS REWEIGHT PRI-AFF
> -1       2.87256 root default
> -7       1.14899     host osd02
>  0   hdd 0.57500         osd.0      up  1.00000 1.00000
>  1   hdd 0.57500         osd.1      up  1.00000 1.00000
> -3       1.14899     host osd03
>  2   hdd 0.57500         osd.2      up  1.00000 1.00000
>  3   hdd 0.57500         osd.3      up  1.00000 1.00000
> -4       0.57458     host osd04
>  4   hdd 0.57458         osd.4      up        0 1.00000
>
> 2018-04-10 08:46:55.193642 min lat: 0.0156532 max lat: 2.5884 avg lat: 0.310681
>   sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
>   100      16      5144      5128   205.097       220   0.0372222    0.310681
>   101      16      5196      5180   205.126       208    0.421245    0.310908
>   102      16      5232      5216   204.526       144    0.543723    0.311544
>   103      16      5271      5255   204.055       156    0.465998    0.312394
>   104      16      5310      5294   203.593       156    0.483188    0.313355
>   105      16      5357      5341   203.444       188   0.0313209    0.313267
>   106      16      5402      5386   203.223       180    0.517098    0.313714
>   107      16      5457      5441   203.379       220   0.0277359    0.313288
>   108      16      5515      5499   203.644       232    0.470556    0.313203
>   109      16      5565      5549   203.611       200    0.564713    0.313173
>   110      16      5606      5590    203.25       164   0.0223166    0.313596
>   111      16      5659      5643   203.329       212   0.0231103    0.313597
>   112      16      5703      5687   203.085       176    0.033348    0.314018
>   113      16      5757      5741   203.199       216     1.53862    0.313991
>   114      16      5798      5782   202.855       164      0.4711    0.314511
>   115      16      5852      5836   202.969       216   0.0350226     0.31424
>   116      16      5912      5896   203.288       240   0.0253188    0.313657
>   117      16      5964      5948   203.328       208   0.0223623    0.313562
>   118      16      6024      6008   203.639       240    0.174245    0.313531
>   119      16      6070      6054   203.473       184    0.712498    0.313582
> 2018-04-10 08:47:15.195873 min lat: 0.0154679 max lat: 2.5884 avg lat: 0.313564
>   sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
>   120      16      6120      6104   203.444       200   0.0351212    0.313564
> Total time run:         120.551897
> Total writes made:      6120
> Write size:             4194304
> Object size:            4194304
> Bandwidth (MB/sec):     203.066
> Stddev Bandwidth:       43.8329
> Max bandwidth (MB/sec): 480
> Min bandwidth (MB/sec): 128
> Average IOPS:           50
> Stddev IOPS:            10
> Max IOPS:               120
> Min IOPS:               32
> Average Latency(s):     0.314959
> Stddev Latency(s):      0.379298
> Max latency(s):         2.5884
> Min latency(s):         0.0154679
>
> On Tue, 10 Apr 2018 at 07:58, Kai Wagner <kwag...@suse.com> wrote:
>
>> Is this just from one server or from all servers? Just wondering why VD
>> 0 is using WriteThrough compared to the others. If that's the setup for
>> the OSDs, you already have a cache setup problem.
>>
>> On 10.04.2018 13:44, Mohamad Gebai wrote:
>> > megacli -LDGetProp -cache -Lall -a0
>> >
>> > Adapter 0-VD 0(target id: 0): Cache Policy:WriteThrough,
>> > ReadAheadNone, Direct, Write Cache OK if bad BBU
>> > Adapter 0-VD 1(target id: 1): Cache Policy:WriteBack, ReadAdaptive,
>> > Cached, No Write Cache if bad BBU
>> > Adapter 0-VD 2(target id: 2): Cache Policy:WriteBack, ReadAdaptive,
>> > Cached, No Write Cache if bad BBU
>> > Adapter 0-VD 3(target id: 3): Cache Policy:WriteBack, ReadAdaptive,
>> > Cached, No Write Cache if bad BBU
>>
>> --
>> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
>> HRB 21284 (AG Nürnberg)
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
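(For completeness on the cache question quoted above: if I do end up making VD 0 match the other virtual disks, the rough plan would be a megacli sequence like the one below. The VD/adapter numbers are simply the ones from Mohamad's output and would need adjusting per server.)

megacli -AdpBbuCmd -GetBbuStatus -a0    # confirm the BBU is healthy before enabling write-back
megacli -LDSetProp WB -L0 -a0           # switch VD 0 from WriteThrough to WriteBack
megacli -LDGetProp -cache -L0 -a0       # verify the new policy took effect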
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com