Re: [ceph-users] Rados performance inconsistencies, lower than expected performance
> > I’m running Proxmox VE 5.2 which includes ceph version 12.2.7
> > (94ce186ac93bb28c3c444bccfefb8a31eb0748e4) luminous (stable)
>
> 12.2.8 is in the repositories. ;)

I forgot to reply to this part. I did notice the update afterwards and have since updated, but performance stayed the same.

I redid all the steps (delete OSDs, zap disks, create OSDs and sync data, for each server) and performance is back again.

Any ideas or suggestions on how to provide useful information or how to debug this issue?

Thanks,
Menno
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Rados performance inconsistencies, lower than expected performance
-----Original message-----
> From: Alwin Antreich
> Sent: Thursday 13th September 2018 14:41
> To: Menno Zonneveld
> Cc: ceph-users; Marc Roos
> Subject: Re: [ceph-users] Rados performance inconsistencies, lower than expected performance
>
> > Am I doing something wrong? Did I run into some sort of bug?
> AFAIR, the object deletion is done in the background. Depending on how
> quickly you do the subsequent tests and how much the cluster is working on
> the recovery, the results may well vary.

I've noticed this as well, but the performance never recovers unless I recreate the OSDs. The storage machines show no activity on CPU or disk, so I assume the deletion and recovery are already done (ceph -w also shows everything is OK); this is something I made sure of while doing these tests. The second the third machine finishes recovery after recreating all OSDs, performance is back again.

Thanks,
Menno
Re: [ceph-users] Rados performance inconsistencies, lower than expected performance
On Thu, Sep 13, 2018 at 02:17:20PM +0200, Menno Zonneveld wrote:
> Update on the subject. Warning: lengthy post, but with reproducible results
> and a workaround to get performance back to the expected level.
>
> One of the servers had a broken disk controller causing some performance
> issues on this one host; FIO showed about half the performance on some disks
> compared to the other hosts. It's been replaced, but rados performance did
> not improve.
>
> - write: 313.86 MB/s / 0.203907ms
>
> I figured it would be wise to test all servers and disks to see if they
> deliver the expected performance.
>
> Since I have data on the cluster that I wanted to keep online, I did one
> server at a time: delete the 3 OSDs, FIO test them, recreate the OSDs and add
> them back to the cluster, wait till the cluster is healthy and on to the next
> server. All disks match the expected performance now, so with all servers and
> OSDs up again I redid the rados benchmark.
>
> Performance was almost twice as good as before recreating all OSDs and on par
> with what I had expected for bandwidth and latency.
>
> - write: 586.679 MB/s / 0.109085ms
> - read: 2092.27 MB/s / 0.0292913ms
>
> As the controller was not to blame, I wanted to test if having different size
> OSDs with 'correct' weights assigned was causing the issue. I removed one OSD
> on each storage node (one node at a time), re-partitioned it and added it
> back to the cluster with the correct weight; performance was still OK, though
> a little slower than before.
>
> Figuring this wasn't the cause either, I took out the OSDs with partitions
> again, wiped the disks and recreated the OSDs. Performance now was even
> lower, almost as low as when I just swapped the controller.
>
> Since I knew the performance could be better, I decided to recreate all OSDs
> one server at a time and performance once again was good.
>
> Since I was now able to reproduce the issue, I started once more and
> documented all the steps to see if there is any logic to the issue.
>
> With the cluster performing well I started removing one OSD at a time, waited
> for the cluster to become healthy again, benchmarked, added it back and moved
> on to the next server.
>
> These are the results of each step.
>
> One OSD out:
>
> write: 528.439 / 0.121021
> read: 2022.19 / 0.03031
>
> OSD back in again:
>
> write: 584.14 / 0.10956
> read: 2108.06 / 0.0289867
>
> Next server, one OSD out:
>
> write: 482.923 / 0.132512
> read: 2008.38 / 0.0305356
>
> OSD back in again:
>
> write: 578.034 / 0.110686
> read: 2059.24 / 0.0297554
>
> Next server, one OSD out:
>
> write: 470.384 / 0.136055
> read: 2043.68 / 0.0299759
>
> OSD back in again:
>
> write: 424.01 / 0.150886
> read: 2086.94 / 0.0293182
>
> Write performance now is significantly lower than when I started. When I
> first wrote to the mailing list, performance seemed to go up once Ceph
> entered the 'near-full' state, so I decided to test that again.
>
> I reached full by accident and the last two write tests showed somewhat
> better performance, but not near the level I started with.
>
> write: 468.632 / 0.136559
> write: 488.523 / 0.130999
>
> I removed the benchmark pool and recreated it, testing a few more times;
> performance now seems even lower again, near the results I started off with.
>
> write: 449.034 / 0.142524
> write: 399.532 / 0.160168
> write: 366.831 / 0.174451
>
> I know how to get the performance back to the expected level by recreating
> all OSDs and shuffling data around the cluster, but I don't think this should
> happen in the first place.
>
> Just to clarify: when removing an OSD I reweight it to 0 and wait until it's
> safe to destroy the OSD. I assume this is the correct way of doing such
> things.
>
> Am I doing something wrong? Did I run into some sort of bug?
AFAIR, the object deletion is done in the background.
Depending on how quickly you do the subsequent tests and how much the cluster
is working on the recovery, the results may well vary.

> I'm running Proxmox VE 5.2 which includes ceph version 12.2.7
> (94ce186ac93bb28c3c444bccfefb8a31eb0748e4) luminous (stable)
12.2.8 is in the repositories. ;)

> Thanks,
> Menno
>
> My script to safely remove an OSD:
>
> ceph osd crush reweight osd.$1 0.0
> while ! ceph osd safe-to-destroy $1; do echo "not safe to destroy, waiting.." ; sleep 10 ; done
> sleep 5
> ceph osd out $1
> systemctl disable ceph-osd@$1
> systemctl stop ceph-osd@$1
> ceph osd crush remove osd.$1
> ceph auth del osd.$1
> ceph osd down $1
> ceph osd rm $1

--
Cheers,
Alwin
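For what it's worth, since Luminous the last three steps of the quoted script (`crush remove`, `auth del`, `osd rm`) can be collapsed into one `ceph osd purge` call. A sketch of the same flow follows; note that `ceph` and `systemctl` are replaced by stub functions here purely so the control flow can be read and exercised without a live cluster (the stubs are an assumption for illustration, remove them to run this for real):

```shell
# Sketch of the OSD-removal flow from the script above. The stub functions
# below are NOT part of a real deployment; they only echo what would be run.
ceph()      { echo "stub: ceph $*"; }
systemctl() { echo "stub: systemctl $*"; }

osd=3   # example OSD id; the original script takes this as $1

ceph osd crush reweight "osd.$osd" 0.0
# Wait until all PGs have been drained off the OSD.
while ! ceph osd safe-to-destroy "$osd"; do
    echo "not safe to destroy, waiting.."
    sleep 10
done
ceph osd out "$osd"
systemctl disable "ceph-osd@$osd"
systemctl stop "ceph-osd@$osd"
# Luminous and later: purge combines `crush remove`, `auth del` and `osd rm`.
ceph osd purge "$osd" --yes-i-really-mean-it
```

With the stubs removed this performs the same drain-then-remove sequence as the posted script, just with fewer commands at the end.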
Re: [ceph-users] Rados performance inconsistencies, lower than expected performance
Update on the subject. Warning: lengthy post, but with reproducible results and a workaround to get performance back to the expected level.

One of the servers had a broken disk controller causing some performance issues on this one host; FIO showed about half the performance on some disks compared to the other hosts. It's been replaced, but rados performance did not improve.

- write: 313.86 MB/s / 0.203907ms

I figured it would be wise to test all servers and disks to see if they deliver the expected performance.

Since I have data on the cluster that I wanted to keep online, I did one server at a time: delete the 3 OSDs, FIO test them, recreate the OSDs and add them back to the cluster, wait till the cluster is healthy and on to the next server. All disks match the expected performance now, so with all servers and OSDs up again I redid the rados benchmark.

Performance was almost twice as good as before recreating all OSDs and on par with what I had expected for bandwidth and latency.

- write: 586.679 MB/s / 0.109085ms
- read: 2092.27 MB/s / 0.0292913ms

As the controller was not to blame, I wanted to test if having different size OSDs with 'correct' weights assigned was causing the issue. I removed one OSD on each storage node (one node at a time), re-partitioned it and added it back to the cluster with the correct weight; performance was still OK, though a little slower than before.

Figuring this wasn't the cause either, I took out the OSDs with partitions again, wiped the disks and recreated the OSDs. Performance now was even lower, almost as low as when I just swapped the controller.

Since I knew the performance could be better, I decided to recreate all OSDs one server at a time and performance once again was good.

Since I was now able to reproduce the issue, I started once more and documented all the steps to see if there is any logic to the issue.

With the cluster performing well I started removing one OSD at a time, waited for the cluster to become healthy again, benchmarked, added it back and moved on to the next server.

These are the results of each step.

One OSD out:

write: 528.439 / 0.121021
read: 2022.19 / 0.03031

OSD back in again:

write: 584.14 / 0.10956
read: 2108.06 / 0.0289867

Next server, one OSD out:

write: 482.923 / 0.132512
read: 2008.38 / 0.0305356

OSD back in again:

write: 578.034 / 0.110686
read: 2059.24 / 0.0297554

Next server, one OSD out:

write: 470.384 / 0.136055
read: 2043.68 / 0.0299759

OSD back in again:

write: 424.01 / 0.150886
read: 2086.94 / 0.0293182

Write performance now is significantly lower than when I started. When I first wrote to the mailing list, performance seemed to go up once Ceph entered the 'near-full' state, so I decided to test that again.

I reached full by accident and the last two write tests showed somewhat better performance, but not near the level I started with.

write: 468.632 / 0.136559
write: 488.523 / 0.130999

I removed the benchmark pool and recreated it, testing a few more times; performance now seems even lower again, near the results I started off with.

write: 449.034 / 0.142524
write: 399.532 / 0.160168
write: 366.831 / 0.174451

I know how to get the performance back to the expected level by recreating all OSDs and shuffling data around the cluster, but I don't think this should happen in the first place.

Just to clarify: when removing an OSD I reweight it to 0 and wait until it's safe to destroy the OSD. I assume this is the correct way of doing such things.

Am I doing something wrong? Did I run into some sort of bug?

I'm running Proxmox VE 5.2 which includes ceph version 12.2.7 (94ce186ac93bb28c3c444bccfefb8a31eb0748e4) luminous (stable)

Thanks,
Menno

My script to safely remove an OSD:

ceph osd crush reweight osd.$1 0.0
while ! ceph osd safe-to-destroy $1; do echo "not safe to destroy, waiting.." ; sleep 10 ; done
sleep 5
ceph osd out $1
systemctl disable ceph-osd@$1
systemctl stop ceph-osd@$1
ceph osd crush remove osd.$1
ceph auth del osd.$1
ceph osd down $1
ceph osd rm $1

-----Original message-----
> From: Menno Zonneveld
> Sent: Monday 10th September 2018 11:45
> To: Alwin Antreich; ceph-users
> Cc: Marc Roos
> Subject: RE: [ceph-users] Rados performance inconsistencies, lower than expected performance
>
> > -----Original message-----
> > From: Alwin Antreich
> > Sent: Thursday 6th September 2018 18:36
> > To: ceph-users
> > Cc: Menno Zonneveld; Marc Roos
> > Subject: Re: [ceph-users] Rados performance inconsistencies, lower than expected performance
> >
> > On Thu, Sep 06, 2018 at 05:15:26PM +0200, Marc Roos wrote:
> > >
> > > It is idle, testing still, running backups at night on it.
> > > How do you fill up the cluster so you can test between empty and full?
> > > Do you have a "ceph df" from empty and full?
> > >
> > > I have done anot
Re: [ceph-users] Rados performance inconsistencies, lower than expected performance
-----Original message-----
> From: Alwin Antreich
> Sent: Thursday 6th September 2018 18:36
> To: ceph-users
> Cc: Menno Zonneveld; Marc Roos
> Subject: Re: [ceph-users] Rados performance inconsistencies, lower than expected performance
>
> On Thu, Sep 06, 2018 at 05:15:26PM +0200, Marc Roos wrote:
> >
> > It is idle, testing still, running backups at night on it.
> > How do you fill up the cluster so you can test between empty and full?
> > Do you have a "ceph df" from empty and full?
> >
> > I have done another test disabling new scrubs on the rbd.ssd pool (but
> > still 3 on hdd) with:
> > ceph tell osd.* injectargs --osd_max_backfills=0
> > Again getting slower towards the end.
> > Bandwidth (MB/sec): 395.749
> > Average Latency(s): 0.161713
> In the results you both had, the latency is twice as high as in our
> tests [1]. That can already make quite some difference. Depending on the
> actual hardware used, there may or may not be the possibility for good
> optimisation.
>
> As a start, you could test the disks with fio, as shown in our benchmark
> paper, to get some results for comparison. The forum thread [1] has
> some benchmarks from other users for comparison.
>
> [1] https://forum.proxmox.com/threads/proxmox-ve-ceph-benchmark-2018-02.41761/

Thanks for the suggestion. I redid the fio test and one server seems to be causing trouble. When I initially tested our SSDs according to the benchmark paper, our Intel SSDs performed more or less equal to the Samsung SSDs used there.

From fio.log:

fio: (groupid=0, jobs=1): err= 0: pid=3606315: Mon Sep 10 11:12:36 2018
  write: io=4005.9MB, bw=68366KB/s, iops=17091, runt= 60001msec
    slat (usec): min=5, max=252, avg= 5.76, stdev= 0.66
    clat (usec): min=6, max=949, avg=51.72, stdev= 9.54
     lat (usec): min=54, max=955, avg=57.48, stdev= 9.56

However, one of the other machines (with identical SSDs) now performs poorly compared to the others, with these results:

fio: (groupid=0, jobs=1): err= 0: pid=3893600: Mon Sep 10 11:15:17 2018
  write: io=1258.8MB, bw=51801KB/s, iops=12950, runt= 24883msec
    slat (usec): min=5, max=259, avg= 6.17, stdev= 0.78
    clat (usec): min=53, max=857, avg=69.77, stdev=13.11
     lat (usec): min=70, max=863, avg=75.93, stdev=13.17

I'll first resolve the slower machine before doing more testing, as this surely won't help overall performance.

> --
> Cheers,
> Alwin

Thanks!,
Menno
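As a sanity check on such fio results: with a single job at iodepth=1 there is exactly one I/O in flight, so IOPS should be roughly the inverse of the average total latency, and bandwidth should be IOPS times the block size. A small sketch using the figures copied from the two fio outputs above:

```python
# Cross-check of the fio numbers above: at numjobs=1, iodepth=1,
# IOPS ~= 1 / avg latency, and bandwidth = IOPS * block size (4 KiB here).

def expected_iops(avg_lat_us: float) -> float:
    """One I/O in flight: completions per second is the inverse of latency."""
    return 1_000_000 / avg_lat_us

fast = expected_iops(57.48)   # healthy machine, fio reported iops=17091
slow = expected_iops(75.93)   # suspect machine, fio reported iops=12950

print(round(fast), round(slow))    # 17397 and 13170, both within ~2% of fio's numbers
print(round(17091 * 4096 / 1024))  # 68364 KB/s, matching fio's bw=68366KB/s
```

The ~32% higher average latency on the suspect machine accounts almost exactly for its ~24% lower IOPS, which is consistent with a per-I/O (controller or disk) slowdown rather than a throughput cap.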
Re: [ceph-users] Rados performance inconsistencies, lower than expected performance
I filled up the cluster by accident by not supplying --no-cleanup to the write benchmark; I'm sure there must be a better way to do that though.

I've run the tests again, once when the cluster was 'empty' (I have a few test VMs stored on Ceph) and again after letting it fill up. Performance goes up from 276.812 to 433.859 MB/sec and latency goes down from 0.231178 to 0.147433.

I do have to mention I did find a problem with the cluster thanks to Alwin's suggestion to (re)do the fio benchmarks: one server with identical SSDs is performing poorly compared to the others. I'll resolve this first before continuing other benchmarks.

When empty:

# ceph df
GLOBAL:
    SIZE      AVAIL     RAW USED     %RAW USED
    3784G     2488G        1295G         34.24
POOLS:
    NAME         ID     USED     %USED     MAX AVAIL     OBJECTS
    ssd          1      431G     37.33          723G      110984
    rbdbench     76        0         0          723G           0

# rados bench -p rbdbench 180 write -b 4M -t 16 --no-cleanup
Total time run:         180.223580
Total writes made:      12472
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     276.812
Stddev Bandwidth:       66.2295
Max bandwidth (MB/sec): 524
Min bandwidth (MB/sec): 112
Average IOPS:           69
Stddev IOPS:            16
Max IOPS:               131
Min IOPS:               28
Average Latency(s):     0.231178
Stddev Latency(s):      0.19153
Max latency(s):         1.16432
Min latency(s):         0.022585

And after a few benchmarks, when I hit Ceph's near-full warning:

# ceph df
GLOBAL:
    SIZE      AVAIL     RAW USED     %RAW USED
    3784G      751G        3032G         80.13
POOLS:
    NAME         ID     USED     %USED     MAX AVAIL     OBJECTS
    ssd          1      431G     82.93        90858M      110984
    rbdbench     76     579G     86.73        90858M      148467

# rados bench -p rbdbench 180 write -b 4M -t 16 --no-cleanup
Total time run:         180.233495
Total writes made:      19549
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     433.859
Stddev Bandwidth:       73.0601
Max bandwidth (MB/sec): 584
Min bandwidth (MB/sec): 220
Average IOPS:           108
Stddev IOPS:            18
Max IOPS:               146
Min IOPS:               55
Average Latency(s):     0.147433
Stddev Latency(s):      0.103518
Max latency(s):         1.08162
Min latency(s):         0.0218688

-----Original message-----
> From: Marc Roos
> Sent: Thursday 6th September 2018 17:15
> To: ceph-users; Menno Zonneveld
> Subject: RE: [ceph-users] Rados performance inconsistencies, lower than expected performance
>
> It is idle, testing still, running backups at night on it.
> How do you fill up the cluster so you can test between empty and full?
> Do you have a "ceph df" from empty and full?
>
> I have done another test disabling new scrubs on the rbd.ssd pool (but
> still 3 on hdd) with:
> ceph tell osd.* injectargs --osd_max_backfills=0
> Again getting slower towards the end.
> Bandwidth (MB/sec): 395.749
> Average Latency(s): 0.161713
>
>
> -----Original Message-----
> From: Menno Zonneveld [mailto:me...@1afa.com]
> Sent: donderdag 6 september 2018 16:56
> To: Marc Roos; ceph-users
> Subject: RE: [ceph-users] Rados performance inconsistencies, lower than expected performance
>
> The benchmark does fluctuate quite a bit; that's why I run it for 180
> seconds now, as then I do get consistent results.
>
> Your performance seems on par with what I'm getting with 3 nodes and 9
> OSDs, not sure what to make of that.
>
> Are your machines actively used perhaps? Mine are mostly idle as it's
> still a test setup.
>
> > -----Original message-----
> > From: Marc Roos
> > Sent: Thursday 6th September 2018 16:23
> > To: ceph-users; Menno Zonneveld
> > Subject: RE: [ceph-users] Rados performance inconsistencies, lower
> > than expected performance
> >
> > I am on 4 nodes, mostly hdds, and 4x samsung sm863 480GB, 2x E5-2660, 2x
> > LSI SAS2308, 1x dual port 10Gbit (one used, and shared between
> > cluster/client vlans)
> >
> > I have 5 PGs scrubbing, but I am not sure if there is any on the ssd
> > pool. I am noticing a drop in the performance at the end of the test.
> > Maybe some caching on the ssd?
> >
> > rados bench -p rbd.ssd 60 write -b 4M -t 16
> > Bandwidth (MB/sec): 448.465
> > Average Latency(s): 0.142671
> >
> > rados bench -p rbd.ssd 180 write -b 4M -t 16
> > Bandwidth (MB/sec): 381.998
> > Average Latency(s): 0.167524
> >
> >
> > -----Original Message-----
> > From: Menno Zonneveld [mailto:me...@1afa.com]
> > Sent: donderdag
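The rados bench summaries above are easy to cross-check: the reported bandwidth is simply the total number of writes times the 4 MiB object size, divided by the total run time. A quick verification with the two runs quoted in this message:

```python
# Sanity check of the rados bench summaries above:
# Bandwidth (MB/sec) = total writes made * 4 (MiB per object) / total time run.

def bench_bw(writes: int, seconds: float, obj_mb: int = 4) -> float:
    return writes * obj_mb / seconds

empty = bench_bw(12472, 180.223580)   # empty-cluster run
full  = bench_bw(19549, 180.233495)   # near-full run

print(round(empty, 3), round(full, 3))  # 276.812 433.859, matching the reports
```

So the two summaries are internally consistent; the empty/near-full difference is real throughput, not a reporting artifact.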
Re: [ceph-users] Rados performance inconsistencies, lower than expected performance
the samsung sm863.

write-4k-seq: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
randwrite-4k-seq: (g=1): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
read-4k-seq: (g=2): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
randread-4k-seq: (g=3): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
rw-4k-seq: (g=4): rw=rw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
randrw-4k-seq: (g=5): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
write-128k-seq: (g=6): rw=write, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=1
randwrite-128k-seq: (g=7): rw=randwrite, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=1
read-128k-seq: (g=8): rw=read, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=1
randread-128k-seq: (g=9): rw=randread, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=1
rw-128k-seq: (g=10): rw=rw, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=1
randrw-128k-seq: (g=11): rw=randrw, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=1
write-1024k-seq: (g=12): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=1
randwrite-1024k-seq: (g=13): rw=randwrite, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=1
read-1024k-seq: (g=14): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=1
randread-1024k-seq: (g=15): rw=randread, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=1
rw-1024k-seq: (g=16): rw=rw, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=1
randrw-1024k-seq: (g=17): rw=randrw, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=1
fio-3.1
Starting 18 processes

write-4k-seq: (groupid=0, jobs=1): err= 0: pid=522702: Thu Sep 6 21:04:12 2018
  write: IOPS=18.3k, BW=71.6MiB/s (75.1MB/s)(12.6GiB/180001msec)
    slat (usec): min=4, max=118, avg= 9.69, stdev= 4.91
    clat (nsec): min=1147, max=647409, avg=42105.25, stdev=8700.49
     lat (usec): min=34, max=662, avg=52.09, stdev= 9.30
    clat percentiles (usec):
     |  1.00th=[   33],  5.00th=[   35], 10.00th=[   35], 20.00th=[   35],
     | 30.00th=[   36], 40.00th=[   36], 50.00th=[   39], 60.00th=[   43],
     | 70.00th=[   47], 80.00th=[   52], 90.00th=[   56], 95.00th=[   58],
     | 99.00th=[   60], 99.50th=[   62], 99.90th=[   67], 99.95th=[   69],
     | 99.99th=[  155]
   bw (  KiB/s): min=36464, max=92829, per=84.59%, avg=62036.62, stdev=13226.37, samples=359
   iops        : min= 9116, max=23207, avg=15508.80, stdev=3306.62, samples=359
  lat (usec)   : 2=0.01%, 10=0.01%, 20=0.01%, 50=75.33%, 100=24.64%
  lat (usec)   : 250=0.01%, 500=0.01%, 750=0.01%
  cpu          : usr=11.41%, sys=27.84%, ctx=3300541, majf=0, minf=51
  IO depths    : 1=116.7%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=0,3300219,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1
randwrite-4k-seq: (groupid=1, jobs=1): err= 0: pid=522903: Thu Sep 6 21:04:12 2018
  write: IOPS=17.9k, BW=69.8MiB/s (73.2MB/s)(12.3GiB/180001msec)
    slat (usec): min=4, max=333, avg= 9.94, stdev= 5.00
    clat (nsec): min=1141, max=1131.2k, avg=42560.69, stdev=9074.14
     lat (usec): min=35, max=1137, avg=52.80, stdev= 9.42
    clat percentiles (usec):
     |  1.00th=[   33],  5.00th=[   35], 10.00th=[   35], 20.00th=[   35],
     | 30.00th=[   36], 40.00th=[   36], 50.00th=[   41], 60.00th=[   43],
     | 70.00th=[   49], 80.00th=[   54], 90.00th=[   57], 95.00th=[   58],
     | 99.00th=[   60], 99.50th=[   62], 99.90th=[   67], 99.95th=[   70],
     | 99.99th=[  174]
   bw (  KiB/s): min=34338, max=92268, per=84.26%, avg=60268.13, stdev=12283.36, samples=359
   iops        : min= 8584, max=23067, avg=15066.67, stdev=3070.87, samples=359
  lat (usec)   : 2=0.01%, 10=0.01%, 20=0.01%, 50=71.73%, 100=28.24%
  lat (usec)   : 250=0.01%, 500=0.01%, 750=0.01%
  lat (msec)   : 2=0.01%
  cpu          : usr=12.96%, sys=26.87%, ctx=3218988, majf=0, minf=10962
  IO depths    : 1=116.8%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
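The headline numbers in the fio output above can be reconciled against the issued-I/O count: bandwidth is just effective IOPS (total I/Os issued over the run time) times the 4 KiB block size. A quick check for the first group:

```python
# Reconcile fio's headline line for group 0 above:
# "write: IOPS=18.3k, BW=71.6MiB/s (75.1MB/s)(12.6GiB/180001msec)"
# using the "issued rwt: total=0,3300219,0" count.

issued, runtime_ms = 3300219, 180001
iops = issued / (runtime_ms / 1000)       # ~18334, fio rounds this to 18.3k

print(round(iops * 4096 / 1e6, 1))        # 75.1  -> matches the MB/s figure
print(round(iops * 4096 / 2**20, 1))      # 71.6  -> matches the MiB/s figure
```

This also illustrates why fio prints both figures: MB/s is decimal (10^6 bytes), MiB/s binary (2^20 bytes), so the two numbers differ by about 5%.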
Re: [ceph-users] Rados performance inconsistencies, lower than expected performance
On Thu, Sep 06, 2018 at 05:15:26PM +0200, Marc Roos wrote:
>
> It is idle, testing still, running backups at night on it.
> How do you fill up the cluster so you can test between empty and full?
> Do you have a "ceph df" from empty and full?
>
> I have done another test disabling new scrubs on the rbd.ssd pool (but
> still 3 on hdd) with:
> ceph tell osd.* injectargs --osd_max_backfills=0
> Again getting slower towards the end.
> Bandwidth (MB/sec): 395.749
> Average Latency(s): 0.161713
In the results you both had, the latency is twice as high as in our
tests [1]. That can already make quite some difference. Depending on the
actual hardware used, there may or may not be the possibility for good
optimisation.

As a start, you could test the disks with fio, as shown in our benchmark
paper, to get some results for comparison. The forum thread [1] has
some benchmarks from other users for comparison.

[1] https://forum.proxmox.com/threads/proxmox-ve-ceph-benchmark-2018-02.41761/
--
Cheers,
Alwin
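For readers without the benchmark paper at hand, the single-disk fio test it describes is a 4k sequential write with libaio at queue depth 1, run directly against the raw device. A fio job file roughly in that shape might look as follows; this is a sketch modeled on the paper, not copied from it, and `/dev/sdX` is a placeholder — the test writes to the raw device and destroys any data on it:

```ini
; Hypothetical fio job modeled on the 4k write test in the Proxmox
; Ceph benchmark paper. /dev/sdX is a placeholder; data on it is destroyed.
[write-4k-seq]
rw=write
bs=4k
ioengine=libaio
iodepth=1
numjobs=1
direct=1
sync=1
filename=/dev/sdX
runtime=60
time_based=1
```

The `direct=1`/`sync=1` combination is what makes this a useful Ceph-journal-style test: it measures the disk's synchronous write latency rather than its cache.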
Re: [ceph-users] Rados performance inconsistencies, lower than expected performance
It is idle, testing still, running backups at night on it.
How do you fill up the cluster so you can test between empty and full?
Do you have a "ceph df" from empty and full?

I have done another test disabling new scrubs on the rbd.ssd pool (but
still 3 on hdd) with:
ceph tell osd.* injectargs --osd_max_backfills=0
Again getting slower towards the end.
Bandwidth (MB/sec): 395.749
Average Latency(s): 0.161713


-----Original Message-----
From: Menno Zonneveld [mailto:me...@1afa.com]
Sent: donderdag 6 september 2018 16:56
To: Marc Roos; ceph-users
Subject: RE: [ceph-users] Rados performance inconsistencies, lower than expected performance

The benchmark does fluctuate quite a bit; that's why I run it for 180
seconds now, as then I do get consistent results.

Your performance seems on par with what I'm getting with 3 nodes and 9
OSDs, not sure what to make of that.

Are your machines actively used perhaps? Mine are mostly idle as it's
still a test setup.

-----Original message-----
> From: Marc Roos
> Sent: Thursday 6th September 2018 16:23
> To: ceph-users; Menno Zonneveld
> Subject: RE: [ceph-users] Rados performance inconsistencies, lower
> than expected performance
>
> I am on 4 nodes, mostly hdds, and 4x samsung sm863 480GB, 2x E5-2660, 2x
> LSI SAS2308, 1x dual port 10Gbit (one used, and shared between
> cluster/client vlans)
>
> I have 5 PGs scrubbing, but I am not sure if there is any on the ssd
> pool. I am noticing a drop in the performance at the end of the test.
> Maybe some caching on the ssd?
>
> rados bench -p rbd.ssd 60 write -b 4M -t 16
> Bandwidth (MB/sec): 448.465
> Average Latency(s): 0.142671
>
> rados bench -p rbd.ssd 180 write -b 4M -t 16
> Bandwidth (MB/sec): 381.998
> Average Latency(s): 0.167524
>
>
> -----Original Message-----
> From: Menno Zonneveld [mailto:me...@1afa.com]
> Sent: donderdag 6 september 2018 15:52
> To: Marc Roos; ceph-users
> Subject: RE: [ceph-users] Rados performance inconsistencies, lower
> than expected performance
>
> ah yes, 3x replicated with minimal 2.
>
> my ceph.conf is pretty bare, just in case it might be relevant
>
> [global]
>      auth client required = cephx
>      auth cluster required = cephx
>      auth service required = cephx
>
>      cluster network = 172.25.42.0/24
>
>      fsid = f4971cca-e73c-46bc-bb05-4af61d419f6e
>
>      keyring = /etc/pve/priv/$cluster.$name.keyring
>
>      mon allow pool delete = true
>      mon osd allow primary affinity = true
>
>      osd journal size = 5120
>      osd pool default min size = 2
>      osd pool default size = 3
>
>
> > -----Original message-----
> > From: Marc Roos
> > Sent: Thursday 6th September 2018 15:43
> > To: ceph-users; Menno Zonneveld
> > Subject: RE: [ceph-users] Rados performance inconsistencies, lower
> > than expected performance
> >
> > Test pool is 3x replicated?
> >
> >
> > -----Original Message-----
> > From: Menno Zonneveld [mailto:me...@1afa.com]
> > Sent: donderdag 6 september 2018 15:29
> > To: ceph-users@lists.ceph.com
> > Subject: [ceph-users] Rados performance inconsistencies, lower than
> > expected performance
> >
> > I've set up a Ceph cluster to test things before going into
> > production, but I've run into some performance issues that I cannot
> > resolve or explain.
> >
> > Hardware in use in each storage machine (x3)
> > - dual 10Gbit Solarflare Communications SFC9020 (Linux bond, mtu 9000)
> > - dual 10Gbit EdgeSwitch 16-Port XG
> > - LSI Logic / Symbios Logic SAS2308 PCI-Express Fusion-MPT SAS-2 HBA
> > - 3x Intel S4500 480GB SSD as OSD's
> > - 2x SSD raid-1 boot/OS disks
> > - 2x Intel(R) Xeon(R) CPU E5-2630
> > - 128GB memory
> >
> > Software wise I'm running CEPH 12.2.7-pve1 setup from Proxmox VE 5.2
> > on all nodes.
> >
> > Running the rados benchmark resulted in somewhat lower than expected
> > performance unless ceph enters the 'near-full' state. When the
> > cluster is mostly empty, rados bench (180 write -b 4M -t 16) results in about
> > 330MB/s with 0.18ms latency, but when hitting near-full state this
> > goes up to a more expected 550MB/s and 0.11ms latency.
> >
> > iostat on the storage machines shows the disks are hardly utilized
> > unless the cluster hits near-full, CPU and network also aren't maxed
> > out. I've also tried with NIC bonding and just one switch, without
> > jumbo frames, but nothing seems to matter in this case.
Re: [ceph-users] Rados performance inconsistencies, lower than expected performance
The benchmark does fluctuate quite a bit; that's why I run it for 180 seconds now, as then I do get consistent results.

Your performance seems on par with what I'm getting with 3 nodes and 9 OSDs, not sure what to make of that.

Are your machines actively used perhaps? Mine are mostly idle as it's still a test setup.

-----Original message-----
> From: Marc Roos
> Sent: Thursday 6th September 2018 16:23
> To: ceph-users; Menno Zonneveld
> Subject: RE: [ceph-users] Rados performance inconsistencies, lower than expected performance
>
> I am on 4 nodes, mostly hdds, and 4x samsung sm863 480GB
> 2x E5-2660
> 2x LSI SAS2308
> 1x dual port 10Gbit (one used, and shared between cluster/client vlans)
>
> I have 5 PGs scrubbing, but I am not sure if there is any on the ssd
> pool. I am noticing a drop in the performance at the end of the test.
> Maybe some caching on the ssd?
>
> rados bench -p rbd.ssd 60 write -b 4M -t 16
> Bandwidth (MB/sec): 448.465
> Average Latency(s): 0.142671
>
> rados bench -p rbd.ssd 180 write -b 4M -t 16
> Bandwidth (MB/sec): 381.998
> Average Latency(s): 0.167524
>
>
> -----Original Message-----
> From: Menno Zonneveld [mailto:me...@1afa.com]
> Sent: donderdag 6 september 2018 15:52
> To: Marc Roos; ceph-users
> Subject: RE: [ceph-users] Rados performance inconsistencies, lower than expected performance
>
> ah yes, 3x replicated with minimal 2.
>
> my ceph.conf is pretty bare, just in case it might be relevant
>
> [global]
>      auth client required = cephx
>      auth cluster required = cephx
>      auth service required = cephx
>
>      cluster network = 172.25.42.0/24
>
>      fsid = f4971cca-e73c-46bc-bb05-4af61d419f6e
>
>      keyring = /etc/pve/priv/$cluster.$name.keyring
>
>      mon allow pool delete = true
>      mon osd allow primary affinity = true
>
>      osd journal size = 5120
>      osd pool default min size = 2
>      osd pool default size = 3
>
>
> > -----Original message-----
> > From: Marc Roos
> > Sent: Thursday 6th September 2018 15:43
> > To: ceph-users; Menno Zonneveld
> > Subject: RE: [ceph-users] Rados performance inconsistencies, lower
> > than expected performance
> >
> > Test pool is 3x replicated?
> >
> >
> > -----Original Message-----
> > From: Menno Zonneveld [mailto:me...@1afa.com]
> > Sent: donderdag 6 september 2018 15:29
> > To: ceph-users@lists.ceph.com
> > Subject: [ceph-users] Rados performance inconsistencies, lower than
> > expected performance
> >
> > I've set up a Ceph cluster to test things before going into production,
> > but I've run into some performance issues that I cannot resolve or
> > explain.
> >
> > Hardware in use in each storage machine (x3)
> > - dual 10Gbit Solarflare Communications SFC9020 (Linux bond, mtu 9000)
> > - dual 10Gbit EdgeSwitch 16-Port XG
> > - LSI Logic / Symbios Logic SAS2308 PCI-Express Fusion-MPT SAS-2 HBA
> > - 3x Intel S4500 480GB SSD as OSD's
> > - 2x SSD raid-1 boot/OS disks
> > - 2x Intel(R) Xeon(R) CPU E5-2630
> > - 128GB memory
> >
> > Software wise I'm running CEPH 12.2.7-pve1 setup from Proxmox VE 5.2
> > on all nodes.
> >
> > Running the rados benchmark resulted in somewhat lower than expected
> > performance unless ceph enters the 'near-full' state. When the cluster
> > is mostly empty, rados bench (180 write -b 4M -t 16) results in about
> > 330MB/s with 0.18ms latency, but when hitting near-full state this goes
> > up to a more expected 550MB/s and 0.11ms latency.
> >
> > iostat on the storage machines shows the disks are hardly utilized
> > unless the cluster hits near-full, CPU and network also aren't maxed
> > out. I've also tried with NIC bonding and just one switch, without
> > jumbo frames, but nothing seems to matter in this case.
> >
> > Is this expected behavior, or what can I try to do to pinpoint the
> > bottleneck?
> >
> > The expected performance is per Proxmox's benchmark results they
> > released this year: they have 4 OSD's per server and hit almost
> > 800MB/s with 0.08ms latency using 10Gbit and 3 nodes. Though they have
> > more OSD's and somewhat different hardware, I understand I won't hit
> > the 800MB/s mark, but the difference between an empty and an almost full
> > cluster makes no sense to me; I'd expect it to be the other way around.
> >
> > Thanks,
> > Menno
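The bandwidth and latency figures in this thread are linked by Little's law: `rados bench -t 16` keeps 16 four-megabyte writes in flight, so throughput ≈ 16 × 4 MB / average latency. (Note this only works out if the "0.18ms"/"0.11ms" figures quoted above are read as seconds — they come from rados bench's "Average Latency(s)" field, which reports seconds.) A small sketch, using the numbers from this message:

```python
# Little's law applied to rados bench:
# throughput ~= in-flight ops * object size / average latency (latency in seconds).

def bench_throughput(avg_lat_s: float, threads: int = 16, obj_mb: int = 4) -> float:
    return threads * obj_mb / avg_lat_s

print(round(bench_throughput(0.18)))  # 356 MB/s, near the ~330 MB/s empty-cluster figure
print(round(bench_throughput(0.11)))  # 582 MB/s, near the ~550 MB/s near-full figure
```

The residual gap between the formula and the measured values comes from ramp-up and drain at the start and end of the run; the point is that bandwidth and latency here are two views of the same per-operation slowness, which is why iostat shows the disks mostly idle.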
Re: [ceph-users] Rados performance inconsistencies, lower than expected performance
-Original message- > From:Alwin Antreich > Sent: Thursday 6th September 2018 16:27 > To: ceph-users > Cc: Menno Zonneveld > Subject: Re: [ceph-users] Rados performance inconsistencies, lower than > expected performance > > Hi, Hi! > On Thu, Sep 06, 2018 at 03:52:21PM +0200, Menno Zonneveld wrote: > > ah yes, 3x replicated with minimal 2. > > > > > > my ceph.conf is pretty bare, just in case it might be relevant > > > > [global] > > auth client required = cephx > > auth cluster required = cephx > > auth service required = cephx > > > > cluster network = 172.25.42.0/24 > > > > fsid = f4971cca-e73c-46bc-bb05-4af61d419f6e > > > > keyring = /etc/pve/priv/$cluster.$name.keyring > > > > mon allow pool delete = true > > mon osd allow primary affinity = true > On our test cluster, we didn't set the primary affinity as all OSDs were > SSDs of the same model. Did you do any settings other than this? How > does your crush map look like? I only used this option when testing with mixing HDD and SSD (1 replica on SSD and 2 on HDD); right now affinity for all disks is 1. The weight of one OSD in each server is lower because I have partitioned the drive to be able to test with SSD journal for HDDs but this isn't active at the moment. 
If I understand correctly, setting the weight like this should be fine; I also tested with weight 1 for all OSD's and I still get the same performance ('slow' when empty, fast when full).

Current ceph osd tree:

ID CLASS WEIGHT  TYPE NAME             STATUS REWEIGHT PRI-AFF
-1       3.71997 root ssd
-5       1.23999     host ceph01-test
 2   ssd 0.36600         osd.2              up      1.0     1.0
 3   ssd 0.43700         osd.3              up      1.0     1.0
 6   ssd 0.43700         osd.6              up      1.0     1.0
-7       1.23999     host ceph02-test
 4   ssd 0.36600         osd.4              up      1.0     1.0
 5   ssd 0.43700         osd.5              up      1.0     1.0
 7   ssd 0.43700         osd.7              up      1.0     1.0
-3       1.23999     host ceph03-test
 0   ssd 0.36600         osd.0              up      1.0     1.0
 1   ssd 0.43700         osd.1              up      1.0     1.0
 8   ssd 0.43700         osd.8              up      1.0     1.0

My current crush map looks like this:

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class ssd
device 1 osd.1 class ssd
device 2 osd.2 class ssd
device 3 osd.3 class ssd
device 4 osd.4 class ssd
device 5 osd.5 class ssd
device 6 osd.6 class ssd
device 7 osd.7 class ssd
device 8 osd.8 class ssd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host ceph03-test {
        id -3           # do not change unnecessarily
        id -4 class ssd # do not change unnecessarily
        # weight 1.240
        alg straw2
        hash 0  # rjenkins1
        item osd.1 weight 0.437
        item osd.0 weight 0.366
        item osd.8 weight 0.437
}
host ceph01-test {
        id -5           # do not change unnecessarily
        id -6 class ssd # do not change unnecessarily
        # weight 1.240
        alg straw2
        hash 0  # rjenkins1
        item osd.3 weight 0.437
        item osd.2 weight 0.366
        item osd.6 weight 0.437
}
host ceph02-test {
        id -7           # do not change unnecessarily
        id -8 class ssd # do not change unnecessarily
        # weight 1.240
        alg straw2
        hash 0  # rjenkins1
        item osd.5 weight 0.437
        item osd.4 weight 0.366
        item osd.7 weight 0.437
}
root ssd {
        id -1           # do not change unnecessarily
        id -2 class ssd # do not change unnecessarily
        # weight 3.720
        alg straw2
        hash 0  # rjenkins1
        item ceph03-test weight 1.240
        item ceph01-test weight 1.240
        item ceph02-test weight 1.240
}

# rules
rule ssd {
        id 0
        type replicated
        min_size 1
        max_size 10
        step take ssd
        step chooseleaf firstn 0 type host
        step emit
}
# end crush map

> >
> > osd journal size = 5120
> > osd pool default min size = 2
> > osd pool default size = 3
> >
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
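The CRUSH weights in the map above follow Ceph's usual convention of drive capacity in TiB (weight 1.0 per TiB): a 480 GB (decimal) SSD works out to 0.437, matching the full-disk OSDs, while the lower 0.366 entries are the drives that were re-partitioned for the SSD-journal test mentioned earlier. A quick check of that arithmetic:

```shell
# CRUSH weight = capacity in TiB. A 480 GB (decimal) SSD:
#   480 * 10^9 bytes / 2^40 bytes-per-TiB ~= 0.4366
bytes=$((480 * 1000 * 1000 * 1000))
weight=$(awk -v b="$bytes" 'BEGIN { printf "%.3f", b / (1024 ^ 4) }')
echo "480GB SSD -> crush weight $weight"   # -> 0.437, matching osd.1/3/5 etc.
```

By the same rule, the 0.366 weights correspond to a partition of roughly 402 GB, consistent with part of each of those disks being reserved for journals.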
Re: [ceph-users] Rados performance inconsistencies, lower than expected performance
Hi, On Thu, Sep 06, 2018 at 03:52:21PM +0200, Menno Zonneveld wrote: > ah yes, 3x replicated with minimal 2. > > > my ceph.conf is pretty bare, just in case it might be relevant > > [global] >auth client required = cephx >auth cluster required = cephx >auth service required = cephx > >cluster network = 172.25.42.0/24 > >fsid = f4971cca-e73c-46bc-bb05-4af61d419f6e > >keyring = /etc/pve/priv/$cluster.$name.keyring > >mon allow pool delete = true >mon osd allow primary affinity = true On our test cluster, we didn't set the primary affinity as all OSDs were SSDs of the same model. Did you do any settings other than this? How does your crush map look like? > >osd journal size = 5120 >osd pool default min size = 2 >osd pool default size = 3 > > > -Original message- > > From:Marc Roos > > Sent: Thursday 6th September 2018 15:43 > > To: ceph-users ; Menno Zonneveld > > Subject: RE: [ceph-users] Rados performance inconsistencies, lower than > > expected performance > > > > > > > > Test pool is 3x replicated? > > > > > > -Original Message- > > From: Menno Zonneveld [mailto:me...@1afa.com] > > Sent: donderdag 6 september 2018 15:29 > > To: ceph-users@lists.ceph.com > > Subject: [ceph-users] Rados performance inconsistencies, lower than > > expected performance > > > > I've setup a CEPH cluster to test things before going into production > > but I've run into some performance issues that I cannot resolve or > > explain. > > > > Hardware in use in each storage machine (x3) > > - dual 10Gbit Solarflare Communications SFC9020 (Linux bond, mtu 9000) > > - dual 10Gbit EdgeSwitch 16-Port XG > > - LSI Logic / Symbios Logic SAS2308 PCI-Express Fusion-MPT SAS-2 HBA > > - 3x Intel S4500 480GB SSD as OSD's > > - 2x SSD raid-1 boot/OS disks > > - 2x Intel(R) Xeon(R) CPU E5-2630 > > - 128GB memory > > > > Software wise I'm running CEPH 12.2.7-pve1 setup from Proxmox VE 5.2 on > > all nodes. 
> > > > Running rados benchmark resulted in somewhat lower than expected > > performance unless ceph enters the 'near-full' state. When the cluster > > is mostly empty rados bench (180 write -b 4M -t 16) results in about > > 330MB/s with 0.18ms latency but when hitting near-full state this goes > > up to a more expected 550MB/s and 0.11ms latency. > > > > iostat on the storage machines shows the disks are hardly utilized > > unless the cluster hits near-full, CPU and network also aren't maxed > > out. I’ve also tried with NIC bonding and just one switch, without > > jumbo frames but nothing seem to matter in this case. > > > > Is this expected behavior or what can I try to do to pinpoint the > > bottleneck ? > > > > The expected performance is per Proxmox's benchmark results they > > released this year, they have 4 OSD's per server and hit almost 800MB/s > > with 0.08ms latency using 10Gbit and 3 nodes, though they have more > > OSD's and somewhat different hardware I understand I won't hit the > > 800MB/s mark but the difference between empty and almost full cluster > > makes no sense to me, I'd expect it to be the other way around. > > > > Thanks, > > Menno ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
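For context on the `mon osd allow primary affinity = true` option discussed above: that flag only permits the feature; affinities are then set per OSD, which is how the mixed HDD/SSD test (1 replica on SSD, 2 on HDD) would steer reads to the SSDs. A hedged sketch — the OSD IDs and values below are illustrative, not from this cluster:

```shell
# With the flag enabled, primary affinity is set per OSD; 0.0 means
# "avoid making this OSD a primary", 1.0 is the default. The commands are
# printed here rather than run (no cluster assumed); IDs are illustrative.
cat <<'EOF'
ceph osd primary-affinity osd.0 1.0   # SSD-backed OSD: eligible as primary
ceph osd primary-affinity osd.1 0.0   # HDD-backed OSD: avoid primary role
EOF
```

Since all OSDs in the current all-SSD setup have affinity 1, this setting should not be a factor in the benchmark numbers, as Menno notes.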
Re: [ceph-users] Rados performance inconsistencies, lower than expected performance
I am on 4 nodes, mostly hdds, and 4x samsung sm863 480GB 2x E5-2660 2x LSI SAS2308 1x dual port 10Gbit (one used, and shared between cluster/client vlans) I have 5 pg's scrubbing, but I am not sure if there is any on the ssd pool. I am noticing a drop in the performance at the end of the test. Maybe some caching on the ssd? rados bench -p rbd.ssd 60 write -b 4M -t 16 Bandwidth (MB/sec): 448.465 Average Latency(s): 0.142671 rados bench -p rbd.ssd 180 write -b 4M -t 16 Bandwidth (MB/sec): 381.998 Average Latency(s): 0.167524 -Original Message- From: Menno Zonneveld [mailto:me...@1afa.com] Sent: donderdag 6 september 2018 15:52 To: Marc Roos; ceph-users Subject: RE: [ceph-users] Rados performance inconsistencies, lower than expected performance ah yes, 3x replicated with minimal 2. my ceph.conf is pretty bare, just in case it might be relevant [global] auth client required = cephx auth cluster required = cephx auth service required = cephx cluster network = 172.25.42.0/24 fsid = f4971cca-e73c-46bc-bb05-4af61d419f6e keyring = /etc/pve/priv/$cluster.$name.keyring mon allow pool delete = true mon osd allow primary affinity = true osd journal size = 5120 osd pool default min size = 2 osd pool default size = 3 -Original message- > From:Marc Roos > Sent: Thursday 6th September 2018 15:43 > To: ceph-users ; Menno Zonneveld > > Subject: RE: [ceph-users] Rados performance inconsistencies, lower > than expected performance > > > > Test pool is 3x replicated? > > > -Original Message- > From: Menno Zonneveld [mailto:me...@1afa.com] > Sent: donderdag 6 september 2018 15:29 > To: ceph-users@lists.ceph.com > Subject: [ceph-users] Rados performance inconsistencies, lower than > expected performance > > I've setup a CEPH cluster to test things before going into production > but I've run into some performance issues that I cannot resolve or > explain. 
> > Hardware in use in each storage machine (x3) > - dual 10Gbit Solarflare Communications SFC9020 (Linux bond, mtu 9000) > - dual 10Gbit EdgeSwitch 16-Port XG > - LSI Logic / Symbios Logic SAS2308 PCI-Express Fusion-MPT SAS-2 HBA > - 3x Intel S4500 480GB SSD as OSD's > - 2x SSD raid-1 boot/OS disks > - 2x Intel(R) Xeon(R) CPU E5-2630 > - 128GB memory > > Software wise I'm running CEPH 12.2.7-pve1 setup from Proxmox VE 5.2 > on all nodes. > > Running rados benchmark resulted in somewhat lower than expected > performance unless ceph enters the 'near-full' state. When the cluster > is mostly empty rados bench (180 write -b 4M -t 16) results in about > 330MB/s with 0.18ms latency but when hitting near-full state this goes > up to a more expected 550MB/s and 0.11ms latency. > > iostat on the storage machines shows the disks are hardly utilized > unless the cluster hits near-full, CPU and network also aren't maxed > out. I’ve also tried with NIC bonding and just one switch, without > jumbo frames but nothing seem to matter in this case. > > Is this expected behavior or what can I try to do to pinpoint the > bottleneck ? > > The expected performance is per Proxmox's benchmark results they > released this year, they have 4 OSD's per server and hit almost > 800MB/s with 0.08ms latency using 10Gbit and 3 nodes, though they have > more OSD's and somewhat different hardware I understand I won't hit > the 800MB/s mark but the difference between empty and almost full > cluster makes no sense to me, I'd expect it to be the other way around. > > Thanks, > Menno > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
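When comparing runs of different lengths, as Marc does above with the 60 s and 180 s benchmarks, pulling the two summary numbers out of saved logs keeps the comparison honest. A small sketch; the embedded sample (taken from the figures quoted above) stands in for a real `rados bench ... | tee bench.log` capture, and `/tmp/bench.log` is an assumed file name:

```shell
# Extract the summary bandwidth and average latency from a saved
# `rados bench` log. The heredoc below substitutes for a real log file.
cat > /tmp/bench.log <<'EOF'
Bandwidth (MB/sec):     448.465
Average Latency(s):     0.142671
EOF
awk -F': *' '/^Bandwidth \(MB\/sec\)/ { bw = $2 }
             /^Average Latency\(s\)/  { lat = $2 }
             END { printf "%s MB/s at %s s avg latency\n", bw, lat }' /tmp/bench.log
# -> 448.465 MB/s at 0.142671 s avg latency
```

Logging each run this way also makes the drop at the end of a long run visible: compare the per-second lines, not just the summary.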
Re: [ceph-users] Rados performance inconsistencies, lower than expected performance
ah yes, 3x replicated with minimal 2. my ceph.conf is pretty bare, just in case it might be relevant [global] auth client required = cephx auth cluster required = cephx auth service required = cephx cluster network = 172.25.42.0/24 fsid = f4971cca-e73c-46bc-bb05-4af61d419f6e keyring = /etc/pve/priv/$cluster.$name.keyring mon allow pool delete = true mon osd allow primary affinity = true osd journal size = 5120 osd pool default min size = 2 osd pool default size = 3 -Original message- > From:Marc Roos > Sent: Thursday 6th September 2018 15:43 > To: ceph-users ; Menno Zonneveld > Subject: RE: [ceph-users] Rados performance inconsistencies, lower than > expected performance > > > > Test pool is 3x replicated? > > > -Original Message- > From: Menno Zonneveld [mailto:me...@1afa.com] > Sent: donderdag 6 september 2018 15:29 > To: ceph-users@lists.ceph.com > Subject: [ceph-users] Rados performance inconsistencies, lower than > expected performance > > I've setup a CEPH cluster to test things before going into production > but I've run into some performance issues that I cannot resolve or > explain. > > Hardware in use in each storage machine (x3) > - dual 10Gbit Solarflare Communications SFC9020 (Linux bond, mtu 9000) > - dual 10Gbit EdgeSwitch 16-Port XG > - LSI Logic / Symbios Logic SAS2308 PCI-Express Fusion-MPT SAS-2 HBA > - 3x Intel S4500 480GB SSD as OSD's > - 2x SSD raid-1 boot/OS disks > - 2x Intel(R) Xeon(R) CPU E5-2630 > - 128GB memory > > Software wise I'm running CEPH 12.2.7-pve1 setup from Proxmox VE 5.2 on > all nodes. > > Running rados benchmark resulted in somewhat lower than expected > performance unless ceph enters the 'near-full' state. When the cluster > is mostly empty rados bench (180 write -b 4M -t 16) results in about > 330MB/s with 0.18ms latency but when hitting near-full state this goes > up to a more expected 550MB/s and 0.11ms latency. 
> > iostat on the storage machines shows the disks are hardly utilized > unless the cluster hits near-full, CPU and network also aren't maxed > out. I’ve also tried with NIC bonding and just one switch, without > jumbo frames but nothing seem to matter in this case. > > Is this expected behavior or what can I try to do to pinpoint the > bottleneck ? > > The expected performance is per Proxmox's benchmark results they > released this year, they have 4 OSD's per server and hit almost 800MB/s > with 0.08ms latency using 10Gbit and 3 nodes, though they have more > OSD's and somewhat different hardware I understand I won't hit the > 800MB/s mark but the difference between empty and almost full cluster > makes no sense to me, I'd expect it to be the other way around. > > Thanks, > Menno > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Rados performance inconsistencies, lower than expected performance
Test pool is 3x replicated? -Original Message- From: Menno Zonneveld [mailto:me...@1afa.com] Sent: donderdag 6 september 2018 15:29 To: ceph-users@lists.ceph.com Subject: [ceph-users] Rados performance inconsistencies, lower than expected performance I've setup a CEPH cluster to test things before going into production but I've run into some performance issues that I cannot resolve or explain. Hardware in use in each storage machine (x3) - dual 10Gbit Solarflare Communications SFC9020 (Linux bond, mtu 9000) - dual 10Gbit EdgeSwitch 16-Port XG - LSI Logic / Symbios Logic SAS2308 PCI-Express Fusion-MPT SAS-2 HBA - 3x Intel S4500 480GB SSD as OSD's - 2x SSD raid-1 boot/OS disks - 2x Intel(R) Xeon(R) CPU E5-2630 - 128GB memory Software wise I'm running CEPH 12.2.7-pve1 setup from Proxmox VE 5.2 on all nodes. Running rados benchmark resulted in somewhat lower than expected performance unless ceph enters the 'near-full' state. When the cluster is mostly empty rados bench (180 write -b 4M -t 16) results in about 330MB/s with 0.18ms latency but when hitting near-full state this goes up to a more expected 550MB/s and 0.11ms latency. iostat on the storage machines shows the disks are hardly utilized unless the cluster hits near-full, CPU and network also aren't maxed out. I’ve also tried with NIC bonding and just one switch, without jumbo frames but nothing seem to matter in this case. Is this expected behavior or what can I try to do to pinpoint the bottleneck ? The expected performance is per Proxmox's benchmark results they released this year, they have 4 OSD's per server and hit almost 800MB/s with 0.08ms latency using 10Gbit and 3 nodes, though they have more OSD's and somewhat different hardware I understand I won't hit the 800MB/s mark but the difference between empty and almost full cluster makes no sense to me, I'd expect it to be the other way around. 
Thanks, Menno ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Rados performance inconsistencies, lower than expected performance
I've set up a Ceph cluster to test things before going into production, but I've run into some performance issues that I cannot resolve or explain.

Hardware in use in each storage machine (x3):
- dual 10Gbit Solarflare Communications SFC9020 (Linux bond, mtu 9000)
- dual 10Gbit EdgeSwitch 16-Port XG
- LSI Logic / Symbios Logic SAS2308 PCI-Express Fusion-MPT SAS-2 HBA
- 3x Intel S4500 480GB SSD as OSDs
- 2x SSD raid-1 boot/OS disks
- 2x Intel(R) Xeon(R) CPU E5-2630
- 128GB memory

Software-wise I'm running Ceph 12.2.7-pve1, set up from Proxmox VE 5.2, on all nodes.

Running the rados benchmark resulted in somewhat lower than expected performance unless Ceph enters the 'near-full' state. When the cluster is mostly empty, rados bench (180 write -b 4M -t 16) results in about 330MB/s with 0.18ms latency, but when hitting the near-full state this goes up to a more expected 550MB/s and 0.11ms latency.

iostat on the storage machines shows the disks are hardly utilized unless the cluster hits near-full; CPU and network aren't maxed out either. I've also tried with NIC bonding and just one switch, without jumbo frames, but nothing seems to matter in this case.

Is this expected behavior, or what can I try to pinpoint the bottleneck?

The expected performance is per Proxmox's benchmark results released this year: with 4 OSDs per server, 3 nodes, and 10Gbit networking they hit almost 800MB/s with 0.08ms latency. They have more OSDs and somewhat different hardware, so I understand I won't hit the 800MB/s mark, but the difference between an empty and an almost-full cluster makes no sense to me; I'd expect it to be the other way around.

Thanks, Menno ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
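The shorthand "180 write -b 4M -t 16" above expands to the commands below. This sketch only prints them so the parameters stay in one place; the pool name is borrowed from Marc's reply (`rbd.ssd`) as a stand-in and should be replaced with your own test pool:

```shell
# Expand the rados bench shorthand used in this thread into full commands.
# --no-cleanup keeps the written objects so a follow-up sequential-read
# test has data to read; the final cleanup removes the benchmark objects.
pool="rbd.ssd"      # assumed pool name; substitute your test pool
secs=180; objsize="4M"; threads=16
cat <<EOF
rados bench -p ${pool} ${secs} write -b ${objsize} -t ${threads} --no-cleanup
rados bench -p ${pool} ${secs} seq -t ${threads}
rados -p ${pool} cleanup
EOF
```

Note that object deletion after `cleanup` proceeds in the background, so (as Alwin points out later in the thread) it is worth waiting for `ceph -s` to settle before comparing back-to-back runs.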