It is idle, still a test setup; it only runs backups at night. How do you fill up the cluster so you can test between empty and full? Do you have a "ceph df" from empty and full?
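One way to compare the two states (a rough sketch, assuming the rbd.ssd pool used elsewhere in this thread and that it is OK to fill it with benchmark objects): let rados bench leave its objects in place, capture ceph df at both points, and clean up afterwards.

# usage while the cluster is still (mostly) empty
ceph df detail

# fill the pool with benchmark objects; --no-cleanup leaves them in place
# (repeat or lengthen the run until ceph df shows the cluster approaching near-full)
rados bench -p rbd.ssd 600 write -b 4M -t 16 --no-cleanup

# usage near full, then the same benchmark again for comparison
ceph df detail
rados bench -p rbd.ssd 180 write -b 4M -t 16

# remove the benchmark objects when done
rados -p rbd.ssd cleanup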
I have done another test disabling new scrubs on the rbd.ssd pool (but still 3 on hdd) with:

ceph tell osd.* injectargs --osd_max_backfills=0

Again getting slower towards the end.

Bandwidth (MB/sec):    395.749
Average Latency(s):    0.161713

-----Original Message-----
From: Menno Zonneveld [mailto:me...@1afa.com]
Sent: Thursday 6 September 2018 16:56
To: Marc Roos; ceph-users
Subject: RE: [ceph-users] Rados performance inconsistencies, lower than expected performance

The benchmark does fluctuate quite a bit; that's why I run it for 180 seconds now, as then I do get consistent results.

Your performance seems on par with what I'm getting with 3 nodes and 9 OSDs, not sure what to make of that.

Are your machines actively used perhaps? Mine are mostly idle as it's still a test setup.

-----Original message-----
> From: Marc Roos <m.r...@f1-outsourcing.eu>
> Sent: Thursday 6th September 2018 16:23
> To: ceph-users <ceph-users@lists.ceph.com>; Menno Zonneveld <me...@1afa.com>
> Subject: RE: [ceph-users] Rados performance inconsistencies, lower than expected performance
>
> I am on 4 nodes, mostly HDDs, with 4x Samsung SM863 480GB, 2x E5-2660,
> 2x LSI SAS2308 and 1x dual-port 10Gbit (one port used, shared between
> cluster/client VLANs).
>
> I have 5 PGs scrubbing, but I am not sure if any are on the ssd pool.
> I am noticing a drop in the performance at the end of the test.
> Maybe some caching on the SSDs?
>
> rados bench -p rbd.ssd 60 write -b 4M -t 16
> Bandwidth (MB/sec):    448.465
> Average Latency(s):    0.142671
>
> rados bench -p rbd.ssd 180 write -b 4M -t 16
> Bandwidth (MB/sec):    381.998
> Average Latency(s):    0.167524
>
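For reference, a quick way to check whether any of the scrubbing PGs belong to the ssd pool, and to keep new scrubs from starting during a bench run (a rough sketch; only the pool name rbd.ssd is taken from this thread):

# map pool names to ids; PG ids are <pool-id>.<pg-id>
ceph osd pool ls detail

# list PGs currently scrubbing or deep-scrubbing
ceph pg dump pgs_brief | grep scrub

# temporarily stop new scrubs cluster-wide while benchmarking
ceph osd set noscrub
ceph osd set nodeep-scrub

# re-enable afterwards
ceph osd unset noscrub
ceph osd unset nodeep-scrub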
> -----Original Message-----
> From: Menno Zonneveld [mailto:me...@1afa.com]
> Sent: Thursday 6 September 2018 15:52
> To: Marc Roos; ceph-users
> Subject: RE: [ceph-users] Rados performance inconsistencies, lower than expected performance
>
> ah yes, 3x replicated with min_size 2.
>
> my ceph.conf is pretty bare, just in case it might be relevant:
>
> [global]
> auth client required = cephx
> auth cluster required = cephx
> auth service required = cephx
>
> cluster network = 172.25.42.0/24
>
> fsid = f4971cca-e73c-46bc-bb05-4af61d419f6e
>
> keyring = /etc/pve/priv/$cluster.$name.keyring
>
> mon allow pool delete = true
> mon osd allow primary affinity = true
>
> osd journal size = 5120
> osd pool default min size = 2
> osd pool default size = 3
>
> -----Original message-----
> > From: Marc Roos <m.r...@f1-outsourcing.eu>
> > Sent: Thursday 6th September 2018 15:43
> > To: ceph-users <ceph-users@lists.ceph.com>; Menno Zonneveld <me...@1afa.com>
> > Subject: RE: [ceph-users] Rados performance inconsistencies, lower than expected performance
> >
> > Test pool is 3x replicated?
> >
> > -----Original Message-----
> > From: Menno Zonneveld [mailto:me...@1afa.com]
> > Sent: Thursday 6 September 2018 15:29
> > To: ceph-users@lists.ceph.com
> > Subject: [ceph-users] Rados performance inconsistencies, lower than expected performance
> >
> > I've set up a Ceph cluster to test things before going into production,
> > but I've run into some performance issues that I cannot resolve or explain.
> >
> > Hardware in use in each storage machine (x3):
> > - dual 10Gbit Solarflare Communications SFC9020 (Linux bond, mtu 9000)
> > - dual 10Gbit EdgeSwitch 16-Port XG
> > - LSI Logic / Symbios Logic SAS2308 PCI-Express Fusion-MPT SAS-2 HBA
> > - 3x Intel S4500 480GB SSD as OSDs
> > - 2x SSD raid-1 boot/OS disks
> > - 2x Intel(R) Xeon(R) CPU E5-2630
> > - 128GB memory
> >
> > Software-wise I'm running Ceph 12.2.7-pve1, set up from Proxmox VE 5.2 on all nodes.
> >
> > Running the rados benchmark resulted in somewhat lower than expected performance
> > unless Ceph enters the 'near-full' state. When the cluster is mostly empty,
> > rados bench (180 write -b 4M -t 16) results in about 330MB/s at 0.18s average
> > latency, but when hitting the near-full state this goes up to a more expected
> > 550MB/s at 0.11s latency.
> >
> > iostat on the storage machines shows the disks are hardly utilized unless the
> > cluster hits near-full; CPU and network also aren't maxed out. I've also tried
> > NIC bonding, just one switch, and no jumbo frames, but nothing seems to matter
> > in this case.
> >
> > Is this expected behavior, or what can I try in order to pinpoint the bottleneck?
> >
> > My expectation is based on the benchmark results Proxmox released this year:
> > with 3 nodes, 4 OSDs per server and 10Gbit they hit almost 800MB/s at 0.08s
> > latency. As they have more OSDs and somewhat different hardware I understand
> > I won't hit the 800MB/s mark, but the difference between an empty and an almost
> > full cluster makes no sense to me; I'd expect it to be the other way around.
> >
> > Thanks,
> > Menno
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com