Hi,

Thanks for providing guidance. VD 0 is the SSD drive. Many people suggested not enabling WriteBack (WB) for the SSD, so that the controller cache can be used for the HDDs, where it is needed more.
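For reference, this is roughly how the per-VD cache policy can be checked and changed with megacli (a sketch only; it assumes adapter 0 and that VD 0 is the SSD as above, and the exact flag spelling can vary between MegaCli builds):

    # show the current cache policy of every virtual drive on adapter 0
    megacli -LDGetProp -Cache -LAll -a0

    # keep the SSD VD (VD 0) in WriteThrough so the controller cache
    # stays available for the HDD VDs
    megacli -LDSetProp WT -L0 -a0

    # enable WriteBack on an HDD VD (e.g. VD 1), but only while the BBU is healthy
    megacli -LDSetProp WB -L1 -a0
    megacli -LDSetProp NoCachedBadBBU -L1 -a0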
Setup: three identical Dell R620 servers (OSD01, OSD02, OSD04), separate 10 Gb networks, 600 GB enterprise HDDs and a 320 GB enterprise SSD per server. BlueStore, with separate WAL/DB on the SSD (1 GB partition for the WAL, 30 GB for the DB).

With two OSDs per server and only OSD01 and OSD02 in the cluster, performance is as expected (no gaps in cur MB/s). Adding a single OSD from OSD04 tanks performance (lots of gaps with cur MB/s at 0). See below.

ceph -s
  cluster:
    id:     1e98e57a-ef41-4327-b88a-dd2531912632
    health: HEALTH_WARN
            noscrub,nodeep-scrub flag(s) set

WITH OSD04:

ceph osd tree
ID CLASS WEIGHT  TYPE NAME      STATUS REWEIGHT PRI-AFF
-1       2.87256 root default
-7       1.14899     host osd02
 0   hdd 0.57500         osd.0      up  1.00000 1.00000
 1   hdd 0.57500         osd.1      up  1.00000 1.00000
-3       1.14899     host osd03
 2   hdd 0.57500         osd.2      up  1.00000 1.00000
 3   hdd 0.57500         osd.3      up  1.00000 1.00000
-4       0.57458     host osd04
 4   hdd 0.57458         osd.4      up  1.00000 1.00000

2018-04-10 08:37:08.111037 min lat: 0.0128562 max lat: 13.1623 avg lat: 0.528273
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
  100      16      3001      2985   119.388        90   0.0169507    0.528273
  101      16      3029      3013   119.315       112   0.0410565    0.524325
  102      16      3029      3013   118.145         0           -    0.524325
  103      16      3029      3013   116.998         0           -    0.524325
  104      16      3029      3013   115.873         0           -    0.524325
  105      16      3071      3055    116.37        42   0.0888923     0.54832
  106      16      3156      3140   118.479       340   0.0162464    0.535244
  107      16      3156      3140   117.372         0           -    0.535244
  108      16      3156      3140   116.285         0           -    0.535244
  109      16      3156      3140   115.218         0           -    0.535244
  110      16      3156      3140   114.171         0           -    0.535244
  111      16      3156      3140   113.142         0           -    0.535244
  112      16      3156      3140   112.132         0           -    0.535244
  113      16      3156      3140    111.14         0           -    0.535244
  114      16      3156      3140   110.165         0           -    0.535244
  115      16      3156      3140   109.207         0           -    0.535244
  116      16      3230      3214   110.817      29.6   0.0169969    0.574856
  117      16      3311      3295   112.639       324   0.0704851    0.565529
  118      16      3311      3295   111.684         0           -    0.565529
  119      16      3311      3295   110.746         0           -    0.565529
2018-04-10 08:37:28.112886 min lat: 0.0128562 max lat: 14.7293 avg lat: 0.565529
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
  120      16      3311      3295   109.823         0           -    0.565529
  121      16      3311      3295   108.915         0           -    0.565529
  122      16      3311      3295   108.022         0           -    0.565529
Total time run:         122.568983
Total writes made:      3312
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     108.086
Stddev Bandwidth:       121.191
Max bandwidth (MB/sec): 520
Min bandwidth (MB/sec): 0
Average IOPS:           27
Stddev IOPS:            30
Max IOPS:               130
Min IOPS:               0
Average Latency(s):     0.591771
Stddev Latency(s):      1.74753
Max latency(s):         14.7293
Min latency(s):         0.0128562

AFTER ceph osd down osd.4; ceph osd out osd.4:

ceph osd tree
ID CLASS WEIGHT  TYPE NAME      STATUS REWEIGHT PRI-AFF
-1       2.87256 root default
-7       1.14899     host osd02
 0   hdd 0.57500         osd.0      up  1.00000 1.00000
 1   hdd 0.57500         osd.1      up  1.00000 1.00000
-3       1.14899     host osd03
 2   hdd 0.57500         osd.2      up  1.00000 1.00000
 3   hdd 0.57500         osd.3      up  1.00000 1.00000
-4       0.57458     host osd04
 4   hdd 0.57458         osd.4      up        0 1.00000

2018-04-10 08:46:55.193642 min lat: 0.0156532 max lat: 2.5884 avg lat: 0.310681
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
  100      16      5144      5128   205.097       220   0.0372222    0.310681
  101      16      5196      5180   205.126       208    0.421245    0.310908
  102      16      5232      5216   204.526       144    0.543723    0.311544
  103      16      5271      5255   204.055       156    0.465998    0.312394
  104      16      5310      5294   203.593       156    0.483188    0.313355
  105      16      5357      5341   203.444       188   0.0313209    0.313267
  106      16      5402      5386   203.223       180    0.517098    0.313714
  107      16      5457      5441   203.379       220   0.0277359    0.313288
  108      16      5515      5499   203.644       232    0.470556    0.313203
  109      16      5565      5549   203.611       200    0.564713    0.313173
  110      16      5606      5590    203.25       164   0.0223166    0.313596
  111      16      5659      5643   203.329       212   0.0231103    0.313597
  112      16      5703      5687   203.085       176    0.033348    0.314018
  113      16      5757      5741   203.199       216     1.53862    0.313991
  114      16      5798      5782   202.855       164      0.4711    0.314511
  115      16      5852      5836   202.969       216   0.0350226     0.31424
  116      16      5912      5896   203.288       240   0.0253188    0.313657
  117      16      5964      5948   203.328       208   0.0223623    0.313562
  118      16      6024      6008   203.639       240    0.174245    0.313531
  119      16      6070      6054   203.473       184    0.712498    0.313582
2018-04-10 08:47:15.195873 min lat: 0.0154679 max lat: 2.5884 avg lat: 0.313564
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
  120      16      6120      6104   203.444       200   0.0351212    0.313564
Total time run:         120.551897
Total writes made:      6120
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     203.066
Stddev Bandwidth:       43.8329
Max bandwidth (MB/sec): 480
Min bandwidth (MB/sec): 128
Average IOPS:           50
Stddev IOPS:            10
Max IOPS:               120
Min IOPS:               32
Average Latency(s):     0.314959
Stddev Latency(s):      0.379298
Max latency(s):         2.5884
Min latency(s):         0.0154679
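Since the stalls disappear as soon as osd.4 is marked out, it is probably worth watching that OSD while the benchmark runs. A sketch of what could be checked (assuming the admin socket is available on host osd04; output formats vary slightly between Ceph releases):

    # per-OSD commit/apply latency as seen by the cluster
    ceph osd perf

    # slowest recent ops on the suspect OSD (run on host osd04)
    ceph daemon osd.4 dump_historic_ops

    # raw device behaviour under load (run on host osd04)
    iostat -x 1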
On Tue, 10 Apr 2018 at 07:58, Kai Wagner <kwag...@suse.com> wrote:

> Is this just from one server or from all servers? Just wondering why VD
> 0 is using WriteThrough compared to the others. If that's the setup for
> the OSD's you already have a cache setup problem.
>
> On 10.04.2018 13:44, Mohamad Gebai wrote:
> > megacli -LDGetProp -cache -Lall -a0
> >
> > Adapter 0-VD 0(target id: 0): Cache Policy:WriteThrough,
> > ReadAheadNone, Direct, Write Cache OK if bad BBU
> > Adapter 0-VD 1(target id: 1): Cache Policy:WriteBack, ReadAdaptive,
> > Cached, No Write Cache if bad BBU
> > Adapter 0-VD 2(target id: 2): Cache Policy:WriteBack, ReadAdaptive,
> > Cached, No Write Cache if bad BBU
> > Adapter 0-VD 3(target id: 3): Cache Policy:WriteBack, ReadAdaptive,
> > Cached, No Write Cache if bad BBU
>
> --
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB
> 21284 (AG Nürnberg)
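To check whether the WriteThrough setting on VD 0 is specific to one box, something like this could be run against all three hosts (a sketch; it assumes passwordless ssh and the same controller layout on every server):

    for h in osd02 osd03 osd04; do
        echo "== $h =="
        ssh "$h" megacli -LDGetProp -Cache -LAll -a0
    done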