It is idle, still a test setup; it only runs backups at night. How do you fill up the cluster so you can test between empty and full? Do you have a "ceph df" from empty and full?
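One way to compare the two states (a rough sketch, assuming the rbd.ssd pool used elsewhere in this thread and that it is OK to fill it with benchmark objects): let rados bench leave its objects in place, capture ceph df at both points, and clean up afterwards.

# usage while the cluster is still (mostly) empty
ceph df detail

# fill the pool with benchmark objects; --no-cleanup leaves them in place
# (repeat or lengthen the run until ceph df shows the cluster approaching near-full)
rados bench -p rbd.ssd 600 write -b 4M -t 16 --no-cleanup

# usage near full, then the same benchmark again for comparison
ceph df detail
rados bench -p rbd.ssd 180 write -b 4M -t 16

# remove the benchmark objects when done
rados -p rbd.ssd cleanup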
I have done another test disabling new scrubs on the rbd.ssd pool (but still 3 on hdd) with:

ceph tell osd.* injectargs --osd_max_backfills=0

Again getting slower towards the end.

Bandwidth (MB/sec):    395.749
Average Latency(s):    0.161713

-----Original Message-----
From: Menno Zonneveld [mailto:me...@1afa.com]
Sent: Thursday 6 September 2018 16:56
To: Marc Roos; ceph-users
Subject: RE: [ceph-users] Rados performance inconsistencies, lower than expected performance

The benchmark does fluctuate quite a bit; that's why I run it for 180 seconds now, as then I do get consistent results.

Your performance seems on par with what I'm getting with 3 nodes and 9 OSDs, not sure what to make of that.

Are your machines actively used perhaps? Mine are mostly idle as it's still a test setup.

-----Original message-----
> From: Marc Roos <m.r...@f1-outsourcing.eu>
> Sent: Thursday 6th September 2018 16:23
> To: ceph-users <ceph-users@lists.ceph.com>; Menno Zonneveld <me...@1afa.com>
> Subject: RE: [ceph-users] Rados performance inconsistencies, lower than expected performance
>
> I am on 4 nodes, mostly HDDs, with 4x Samsung SM863 480GB, 2x E5-2660,
> 2x LSI SAS2308 and 1x dual-port 10Gbit (one port used, shared between
> cluster/client VLANs).
>
> I have 5 PGs scrubbing, but I am not sure if any are on the ssd pool.
> I am noticing a drop in the performance at the end of the test.
> Maybe some caching on the SSDs?
>
> rados bench -p rbd.ssd 60 write -b 4M -t 16
> Bandwidth (MB/sec):    448.465
> Average Latency(s):    0.142671
>
> rados bench -p rbd.ssd 180 write -b 4M -t 16
> Bandwidth (MB/sec):    381.998
> Average Latency(s):    0.167524
>
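For reference, a quick way to check whether any of the scrubbing PGs belong to the ssd pool, and to keep new scrubs from starting during a bench run (a rough sketch; only the pool name rbd.ssd is taken from this thread):

# map pool names to ids; PG ids are <pool-id>.<pg-id>
ceph osd pool ls detail

# list PGs currently scrubbing or deep-scrubbing
ceph pg dump pgs_brief | grep scrub

# temporarily stop new scrubs cluster-wide while benchmarking
ceph osd set noscrub
ceph osd set nodeep-scrub

# re-enable afterwards
ceph osd unset noscrub
ceph osd unset nodeep-scrub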
> -----Original Message-----
> From: Menno Zonneveld [mailto:me...@1afa.com]
> Sent: Thursday 6 September 2018 15:52
> To: Marc Roos; ceph-users
> Subject: RE: [ceph-users] Rados performance inconsistencies, lower than expected performance
>
> ah yes, 3x replicated with min_size 2.
>
> my ceph.conf is pretty bare, just in case it might be relevant:
>
> [global]
> auth client required = cephx
> auth cluster required = cephx
> auth service required = cephx
>
> cluster network = 172.25.42.0/24
>
> fsid = f4971cca-e73c-46bc-bb05-4af61d419f6e
>
> keyring = /etc/pve/priv/$cluster.$name.keyring
>
> mon allow pool delete = true
> mon osd allow primary affinity = true
>
> osd journal size = 5120
> osd pool default min size = 2
> osd pool default size = 3
>
> -----Original message-----
> > From: Marc Roos <m.r...@f1-outsourcing.eu>
> > Sent: Thursday 6th September 2018 15:43
> > To: ceph-users <ceph-users@lists.ceph.com>; Menno Zonneveld <me...@1afa.com>
> > Subject: RE: [ceph-users] Rados performance inconsistencies, lower than expected performance
> >
> > Test pool is 3x replicated?
> >
> > -----Original Message-----
> > From: Menno Zonneveld [mailto:me...@1afa.com]
> > Sent: Thursday 6 September 2018 15:29
> > To: ceph-users@lists.ceph.com
> > Subject: [ceph-users] Rados performance inconsistencies, lower than expected performance
> >
> > I've set up a Ceph cluster to test things before going into production,
> > but I've run into some performance issues that I cannot resolve or explain.
> >
> > Hardware in use in each storage machine (x3):
> > - dual 10Gbit Solarflare Communications SFC9020 (Linux bond, mtu 9000)
> > - dual 10Gbit EdgeSwitch 16-Port XG
> > - LSI Logic / Symbios Logic SAS2308 PCI-Express Fusion-MPT SAS-2 HBA
> > - 3x Intel S4500 480GB SSD as OSDs
> > - 2x SSD raid-1 boot/OS disks
> > - 2x Intel(R) Xeon(R) CPU E5-2630
> > - 128GB memory
> >
> > Software-wise I'm running Ceph 12.2.7-pve1, set up from Proxmox VE 5.2 on all nodes.
> >
> > Running the rados benchmark resulted in somewhat lower than expected performance
> > unless Ceph enters the 'near-full' state. When the cluster is mostly empty,
> > rados bench (180 write -b 4M -t 16) results in about 330MB/s at 0.18s average
> > latency, but when hitting the near-full state this goes up to a more expected
> > 550MB/s at 0.11s latency.
> >
> > iostat on the storage machines shows the disks are hardly utilized unless the
> > cluster hits near-full; CPU and network also aren't maxed out. I've also tried
> > NIC bonding, just one switch, and no jumbo frames, but nothing seems to matter
> > in this case.
> >
> > Is this expected behavior, or what can I try in order to pinpoint the bottleneck?
> >
> > My expectation is based on the benchmark results Proxmox released this year:
> > with 3 nodes, 4 OSDs per server and 10Gbit they hit almost 800MB/s at 0.08s
> > latency. As they have more OSDs and somewhat different hardware I understand
> > I won't hit the 800MB/s mark, but the difference between an empty and an almost
> > full cluster makes no sense to me; I'd expect it to be the other way around.
> >
> > Thanks,
> > Menno
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com