FYI, when I performed testing on our cluster I saw the same thing: an fio randwrite 4k test over a large volume was considerably faster with a larger RBD object size (8 MB was marginally better than the default 4 MB). It makes no sense to me unless there is a large overhead from the growing number of objects, or perhaps some sort of alignment problem that causes small objects to overlap with the actual workload. (In my cluster some objects are mysteriously sized at 4 MiB minus 4 KiB.)
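A rough way to see the object-count difference described above (a minimal sketch; the helper names are illustrative, not librbd API):

```python
# Minimal sketch (illustrative helpers, not librbd API): how many RADOS
# objects back a volume at a given RBD object size, and which object a
# 4 KiB write lands in.

def object_count(image_size: int, object_size: int) -> int:
    """Number of RADOS objects a fully allocated image spans."""
    return (image_size + object_size - 1) // object_size

def locate_write(offset: int, object_size: int) -> tuple:
    """(object index, offset within that object) for a write at `offset`."""
    return offset // object_size, offset % object_size

MiB = 1 << 20
GiB = 1 << 30

# Doubling the object size halves the object count for a 1 TiB volume:
print(object_count(1024 * GiB, 4 * MiB))  # 262144 objects at 4 MiB
print(object_count(1024 * GiB, 8 * MiB))  # 131072 objects at 8 MiB

# With clean 4 MiB objects an aligned 4 KiB write never crosses a
# boundary; odd-sized (4 MiB - 4 KiB) objects would shift every later
# boundary and could make "aligned" writes straddle two objects.
print(locate_write(8 * MiB + 4096, 4 * MiB))  # (2, 4096)
```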
Jan

> On 25. 3. 2016, at 10:17, Zhang Qiang <dotslash...@gmail.com> wrote:
>
> Hi Christian,
>
> Thanks for your reply, here're the test specs:
>
> >>>
> [global]
> ioengine=libaio
> runtime=90
> direct=1
> group_reporting
> iodepth=16
> ramp_time=5
> size=1G
>
> [seq_w_4k_20]
> bs=4k
> filename=seq_w_4k_20
> rw=write
> numjobs=20
>
> [seq_w_1m_20]
> bs=1m
> filename=seq_w_1m_20
> rw=write
> numjobs=20
> <<<<
>
> Test results: 4k - aggrb=13245KB/s, 1m - aggrb=1102.6MB/s
>
> Mount options: ceph-fuse /ceph -m 10.3.138.36:6789
>
> Ceph configuration:
> >>>>
> filestore_xattr_use_omap = true
> auth cluster required = cephx
> auth service required = cephx
> auth client required = cephx
> osd journal size = 128
> osd pool default size = 2
> osd pool default min size = 1
> osd pool default pg num = 512
> osd pool default pgp num = 512
> osd crush chooseleaf type = 1
> <<<<
>
> All other settings are defaults.
>
> Status:
>      health HEALTH_OK
>      monmap e5: 5 mons at {1=10.3.138.37:6789/0,2=10.3.138.39:6789/0,3=10.3.138.40:6789/0,4=10.3.138.59:6789/0,GGZ-YG-S0311-PLATFORM-138=10.3.138.36:6789/0}
>             election epoch 28, quorum 0,1,2,3,4 GGZ-YG-S0311-PLATFORM-138,1,2,3,4
>      mdsmap e55: 1/1/1 up {0=1=up:active}
>      osdmap e1290: 20 osds: 20 up, 20 in
>      pgmap v7180: 1000 pgs, 2 pools, 14925 MB data, 3851 objects
>             37827 MB used, 20837 GB / 21991 GB avail
>                 1000 active+clean
>
>> On Fri, 25 Mar 2016 at 16:44 Christian Balzer <ch...@gol.com> wrote:
>>
>> Hello,
>>
>> On Fri, 25 Mar 2016 08:11:27 +0000 Zhang Qiang wrote:
>>
>> > Hi all,
>> >
>> > According to fio,
>>
>> Exact fio command please.
>>
>> > with 4k block size, the sequential write performance of
>> > my ceph-fuse mount
>>
>> Exact mount options and ceph config (RBD cache) please.
>>
>> > is just about 20+ MB/s; only 200 Mb of the 1 Gb full-duplex
>> > NIC's outgoing bandwidth was used at maximum. But with a 1M block
>> > size the performance can reach as high as 1000 Mb/s, approaching the
>> > limit of the NIC bandwidth.
>> > Why do the performance stats differ so much
>> > for different block sizes?
>>
>> That's exactly why.
>> You can see this with locally attached storage as well: many small requests
>> are slower than large (essentially sequential) writes.
>> Network-attached storage in general (latency), and thus Ceph as well (plus
>> code overhead), amplifies that.
>>
>> > Can I configure the ceph-fuse mount's block size
>> > for maximum performance?
>> >
>> Very little, if you're using sync writes (thus the request for the fio
>> command line); if not, RBD cache could/should help.
>>
>> Christian
>>
>> > Basic information about the cluster: 20 OSDs on separate PCIe hard disks
>> > distributed across 2 servers, each with write performance of about 300 MB/s;
>> > 5 MONs; 1 MDS. Ceph version 0.94.6
>> > (e832001feaf8c176593e0325c8298e3f16dfb403).
>> >
>> > Thanks :)
>>
>>
>> --
>> Christian Balzer        Network/Systems Engineer
>> ch...@gol.com           Global OnLine Japan/Rakuten Communications
>> http://www.gol.com/
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
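The latency amplification Christian describes can be put into rough numbers from the reported fio results. This is a back-of-the-envelope sketch using Little's law (in-flight requests = IOPS x latency); it assumes fio's "KB/s" is decimal and that all 20 jobs x iodepth 16 requests were actually in flight, neither of which is guaranteed:

```python
# Back-of-the-envelope: per-request latency implied by the reported
# aggregate bandwidths, via Little's law (in_flight = iops * latency).
# Assumes 1 KB = 1000 B and full queue occupancy -- an illustration,
# not a measurement.

def implied_latency_s(bandwidth_bytes_s: float, block_size: int,
                      in_flight: int) -> float:
    """Average per-request latency implied by Little's law."""
    iops = bandwidth_bytes_s / block_size
    return in_flight / iops

KiB, MiB = 1024, 1024 * 1024
in_flight = 20 * 16  # numjobs=20 * iodepth=16 from the job file

lat_4k = implied_latency_s(13245 * 1000, 4 * KiB, in_flight)  # aggrb=13245KB/s
lat_1m = implied_latency_s(1102.6e6, 1 * MiB, in_flight)      # aggrb=1102.6MB/s

# The 1 MiB request carries 256x the payload for only ~3x the latency:
# per-request overhead, not data movement, dominates the 4k case.
print(f"4k: {13245 * 1000 / (4 * KiB):.0f} IOPS, ~{lat_4k * 1000:.0f} ms/request")
print(f"1m: {1102.6e6 / MiB:.0f} IOPS, ~{lat_1m * 1000:.0f} ms/request")
```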