On Fri, Dec 18, 2015 at 11:16 AM, Christian Balzer <ch...@gol.com> wrote:
>
> Hello,
>
> On Fri, 18 Dec 2015 03:36:12 +0100 Francois Lafont wrote:
>
>> Hi,
>>
>> I have a Ceph cluster, currently unused, and I have (to my mind) very
>> low performance. I'm not an expert in benchmarks; here is an example
>> of a quick bench:
>>
>> ---------------------------------------------------------------
>> # fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 \
>>       --name=readwrite --filename=rw.data --bs=4k --iodepth=64 \
>>       --size=300MB --readwrite=randrw --rwmixread=50
>> readwrite: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
>> fio-2.1.3
>> Starting 1 process
>> readwrite: Laying out IO file(s) (1 file(s) / 300MB)
>> Jobs: 1 (f=1): [m] [100.0% done] [2264KB/2128KB/0KB /s] [566/532/0 iops] [eta 00m:00s]
>> readwrite: (groupid=0, jobs=1): err= 0: pid=3783: Fri Dec 18 02:01:13 2015
>>   read : io=153640KB, bw=2302.9KB/s, iops=575, runt= 66719msec
>>   write: io=153560KB, bw=2301.7KB/s, iops=575, runt= 66719msec
>>   cpu          : usr=0.77%, sys=3.07%, ctx=115432, majf=0, minf=604
>>   IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=99.9%
>>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
>>      issued    : total=r=38410/w=38390/d=0, short=r=0/w=0/d=0
>>
>> Run status group 0 (all jobs):
>>    READ: io=153640KB, aggrb=2302KB/s, minb=2302KB/s, maxb=2302KB/s, mint=66719msec, maxt=66719msec
>>   WRITE: io=153560KB, aggrb=2301KB/s, minb=2301KB/s, maxb=2301KB/s, mint=66719msec, maxt=66719msec
>> ---------------------------------------------------------------
>>
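One way to narrow down where the ~575 IOPS ceiling in the run above comes from is to repeat the same workload with the synchronous engine at iodepth=1: if both runs score roughly the same, the AIO path is effectively serialized. A sketch of a fio job file for that comparison (the file name and section names are illustrative, not from the thread):

```ini
; sync-check.fio -- hypothetical comparison job, run against the cephfs mount.
; stonewall in [global] makes each job wait for the previous one to finish.
[global]
directory=/mnt/cephfs
size=300MB
bs=4k
direct=1
randrepeat=1
rw=randrw
rwmixread=50
stonewall

; the original workload: async engine, deep queue
[aio-deep]
ioengine=libaio
iodepth=64

; one outstanding synchronous op at a time
[sync-shallow]
ioengine=sync
iodepth=1
```

Run with `fio sync-check.fio` and compare the per-job IOPS lines in the output.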
fio tests AIO performance in this case. CephFS does not handle AIO
properly; AIO is actually sync IO. That's why CephFS is so slow in this
case.

Regards
Yan, Zheng

>> It seems to me very bad.
> Indeed.
> Firstly, let me state that I don't use CephFS and have no clue how this
> influences things and can/should be tuned.
>
> That being said, the fio above running in a VM (RBD) gives me 440 IOPS
> against a single OSD storage server (replica 1) with 4 crappy HDDs and
> on-disk journals on my test cluster (1Gb/s links).
> So yeah, given your configuration that's bad.
>
> In comparison, I get 3000 IOPS against a production cluster (so not
> idle) with 4 storage nodes, each with 4 100GB DC S3700s for journals
> and OS and 8 SATA HDDs, with Infiniband (IPoIB) connectivity for
> everything.
>
> All of this is with 0.80.x (Firefly) on Debian Jessie.
>
>
>> Can I hope for better results with my setup
>> (explained below)? During the bench, I don't see particular symptoms
>> (no CPU blocked at 100%, etc). If you have advice on improving the
>> performance and/or on making smarter benchmarks, I'm really interested.
>>
> You want to use atop on all your nodes and look at everything from disk
> to network utilization.
> There might be nothing obvious going on, but it needs to be ruled out.
>
>> Thanks in advance for your help. Here is my conf...
>>
>> I use Ubuntu 14.04 on each server with the 3.13 kernel (it's the same
>> for the ceph client where I run my bench) and I use Ceph 9.2.0
>> (Infernalis).
>
> I seem to recall that this particular kernel has issues; you might want
> to scour the archives here.
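Yan's sync-IO point also explains the rough magnitude of the numbers in this thread: when each op must complete before the next is issued, IOPS is bounded by the reciprocal of per-op latency, so network hops dominate. A back-of-the-envelope sketch (the latency figures below are illustrative assumptions, not measurements from either cluster):

```python
def max_sync_iops(latency_ms: float) -> float:
    """Upper bound on IOPS when each op must finish before the next is issued."""
    return 1000.0 / latency_ms

# ~1.7 ms per op (1GbE round trips plus size=3 replication and the journal
# write) would cap a fully synchronous workload near the ~575 IOPS seen above.
print(round(max_sync_iops(1.7)))   # 588

# Cutting per-op latency is why low-latency links (10GbE, IPoIB) and SSD
# journals help far more than raw bandwidth for this kind of workload.
print(round(max_sync_iops(0.3)))   # 3333
```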
>
>> On the client, cephfs is mounted via ceph-fuse with this
>> in /etc/fstab:
>>
>> id=cephfs,keyring=/etc/ceph/ceph.client.cephfs.keyring,client_mountpoint=/ /mnt/cephfs fuse.ceph noatime,defaults,_netdev 0 0
>>
>> I have 5 cluster node servers "Supermicro Motherboard X10SLM+-LN4 S1150"
>> with one 1GbE port for the ceph public network and one 10GbE port for
>> the ceph private network:
>>
> For the sake of latency (which becomes the biggest issue when you're not
> exhausting CPU/disk), you'd be better off with everything on 10GbE,
> unless you need the 1GbE to connect to clients that have no 10Gb/s ports.
>
>> - 1 x Intel Xeon E3-1265Lv3
>> - 1 SSD DC S3710 Series 200GB (with partitions for the OS, the 3
>>   OSD journals and, just for ceph01, ceph02 and ceph03, a partition
>>   for the workdir of a monitor)
> The 200GB DC S3700 would have been faster, but that's a moot point and
> not your bottleneck for sure.
>
>> - 3 HD 4TB Western Digital (WD) SATA 7200rpm
>> - RAM 32GB
>> - NO RAID controller
>
> Which controller are you using?
> I recently came across an Adaptec SATA3 HBA that delivered only 176 MB/s
> writes with 200GB DC S3700s, as opposed to 280MB/s when used with Intel
> onboard SATA-3 ports or an LSI 9211-4i HBA.
>
> Regards,
>
> Christian
>
>> - Each partition uses XFS with the noatime option, except the OS
>>   partition, which is EXT4.
>>
>> Here is my ceph.conf:
>>
>> ---------------------------------------------------------------
>> [global]
>> fsid = xxxxxxxxxxxxxxxxxxxxxxxxxxxx
>> cluster network = 192.168.22.0/24
>> public network = 10.0.2.0/24
>> auth cluster required = cephx
>> auth service required = cephx
>> auth client required = cephx
>> filestore xattr use omap = true
>> osd pool default size = 3
>> osd pool default min size = 1
>> osd pool default pg num = 64
>> osd pool default pgp num = 64
>> osd crush chooseleaf type = 1
>> osd journal size = 0
>> osd max backfills = 1
>> osd recovery max active = 1
>> osd client op priority = 63
>> osd recovery op priority = 1
>> osd op threads = 4
>> mds cache size = 1000000
>> osd scrub begin hour = 3
>> osd scrub end hour = 5
>> mon allow pool delete = false
>> mon osd down out subtree limit = host
>> mon osd min down reporters = 4
>>
>> [mon.ceph01]
>> host = ceph01
>> mon addr = 10.0.2.101
>>
>> [mon.ceph02]
>> host = ceph02
>> mon addr = 10.0.2.102
>>
>> [mon.ceph03]
>> host = ceph03
>> mon addr = 10.0.2.103
>> ---------------------------------------------------------------
>>
>> mds are in active/standby mode.
>>
>
>
> --
> Christian Balzer        Network/Systems Engineer
> ch...@gol.com           Global OnLine Japan/Rakuten Communications
> http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com