Also take a look at Galera Cluster. You can relax flushing to disk as long as your nodes don't all go down at the same time. (And when a node comes back up after a crash, you should trash its data so that it rejoins the cluster via a full state transfer.)

Jan
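In practice, "relax flushing to disk" on a Galera node usually comes down to a couple of InnoDB settings. The fragment below is only a sketch: the file path is an assumption, the wsrep settings themselves are omitted, and whether losing up to roughly a second of transactions on a crashed node is acceptable depends on your durability requirements.

  # /etc/my.cnf.d/relaxed-durability.cnf  (path is an assumption)
  [mysqld]
  # Write the InnoDB redo log at every commit but fsync it only about once
  # a second; a crashing node can lose up to ~1s of transactions, which the
  # surviving Galera nodes still hold.
  innodb_flush_log_at_trx_commit = 2
  # Do not fsync the binary log on every commit either.
  sync_binlog = 0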
> On 26 Feb 2016, at 11:01, Nick Fisk <n...@fisk.me.uk> wrote:
>
> I guess my question was more around what your final workload looks like.
> If it's the same as the SQL benchmarks, then you are not going to get
> much better performance than you do now, aside from trying some of the
> tuning options I mentioned, which might get you an extra 100 iops.
>
> The only other option would be to look at some sort of client-side SSD
> caching (flashcache, bcache, etc.) of the RBD. These are not ideal, but
> it might be your only option for getting near local sync write
> performance.
>
>> -----Original Message-----
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>> Huan Zhang
>> Sent: 26 February 2016 09:30
>> To: Nick Fisk <n...@fisk.me.uk>
>> Cc: josh durgin <josh.dur...@inktank.com>; ceph-users <ceph-us...@ceph.com>
>> Subject: Re: [ceph-users] Guest sync write iops so poor.
>>
>> Hi Nick,
>> The DB's IO pattern depends on its config; take MySQL for example.
>> With innodb_flush_log_at_trx_commit = 1, MySQL will sync after each
>> transaction, like:
>> write
>> sync
>> write
>> sync
>> ...
>>
>> With innodb_flush_log_at_trx_commit = 5:
>> write
>> write
>> write
>> write
>> write
>> sync
>>
>> With innodb_flush_log_at_trx_commit = 0:
>> write
>> write
>> ...
>> one second later:
>> sync
>>
>> That may not be entirely accurate, but it is more or less the pattern.
>> We tested MySQL TPS with innodb_flush_log_at_trx_commit = 1 and got very
>> poor performance, even though we can reach very high O_DIRECT randwrite
>> iops with fio.
>>
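A rough way to reproduce those three flush patterns from inside the guest is one fio job per pattern, along the lines of the sketch below. It is only an approximation: the file name, size and block size are arbitrary placeholders, real redo-log writes are small sequential appends, and the once-a-second background flush of the "0" mode is not modelled here.

  # trx-flush-patterns.fio  (all values are placeholders)
  [global]
  filename=fio-trxlog.tmp
  size=1g
  ioengine=psync
  rw=write
  bs=4k
  runtime=30
  time_based=1

  # write, fsync, write, fsync, ...
  [fsync_every_write]
  stonewall
  fsync=1

  # a handful of buffered writes per fsync
  [fsync_every_5_writes]
  stonewall
  fsync=5

  # buffered writes only, no per-write sync
  [no_fsync]
  stonewall

Running that inside the VM and comparing the iops of the three jobs should show the same gap as the MySQL TPS numbers: only the first job pays a full journal round trip per write.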
>> 2016-02-26 16:59 GMT+08:00 Nick Fisk <n...@fisk.me.uk>:
>>> -----Original Message-----
>>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>>> Huan Zhang
>>> Sent: 26 February 2016 06:50
>>> To: Jason Dillaman <dilla...@redhat.com>
>>> Cc: josh durgin <josh.dur...@inktank.com>; Nick Fisk <n...@fisk.me.uk>;
>>> ceph-users <ceph-us...@ceph.com>
>>> Subject: Re: [ceph-users] Guest sync write iops so poor.
>>>
>>> The rbd engine with fsync=1 seems stuck:
>>> Jobs: 1 (f=1): [w(1)] [0.0% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta
>>> 1244d:10h:39m:18s]
>>>
>>> But fio against /dev/rbd0 with sync=1 direct=1 ioengine=libaio
>>> iodepth=64 gets very high iops, ~35K, similar to a direct write.
>>>
>>> I'm confused by that result. IMHO, Ceph could just ignore the sync
>>> cache command since it always uses sync writes to the journal, right?
>>
>> Even if the data is not sync'd to the data storage part of the OSD, the
>> data still has to be written to the journal, and this is where the
>> performance limit lies.
>>
>> The very nature of SDS means that you are never going to achieve the
>> same latency as you do to a local disk: even if the software side
>> introduced no extra latency, the network latency alone will severely
>> limit your sync performance.
>>
>> Do you know the IO pattern the DBs generate? I know you can switch most
>> DBs to flush with O_DIRECT instead of sync; it might be that this helps
>> in your case.
>>
>> Also check out the tech talk from last month about high-performance
>> databases on Ceph. The presenter gave the impression that, at least in
>> their case, not every write was a sync IO. So your results could
>> possibly matter less than you think.
>>
>> Also please search the lists and past presentations about reducing write
>> latency. There are a few things you can do, like disabling logging and
>> some kernel parameters to stop the CPUs entering sleep states or reducing
>> frequency. One thing I have witnessed is that if the Ceph cluster is only
>> running at low queue depths, so it's only generating low CPU load, all
>> the cores on the CPUs throttle themselves down to their lowest speeds,
>> which really hurts latency.
>>
>>>
>>> Why do we get such bad sync iops, and how does Ceph handle it?
>>> Your reply would be very much appreciated!
>>>
>>> 2016-02-25 22:44 GMT+08:00 Jason Dillaman <dilla...@redhat.com>:
>>>> 35K IOPS with ioengine=rbd sounds like the "sync=1" option doesn't
>>>> actually work. Or it's not touching the same object (but I wonder
>>>> whether write ordering is preserved at that rate?).
>>>
>>> The fio rbd engine does not support "sync=1"; however, it should
>>> support "fsync=1" to accomplish roughly the same effect.
>>>
>>> Jason
>>

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
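For completeness, a minimal fio job along the lines Jason suggests (rbd engine with fsync=1 rather than sync=1) might look like the sketch below; the pool, image and client names are placeholders and the image must already exist.

  # rbd-fsync.fio  (pool/image/client names are placeholders)
  [global]
  ioengine=rbd
  clientname=admin
  pool=rbd
  rbdname=fio_test
  rw=randwrite
  bs=4k
  iodepth=1
  runtime=60
  time_based=1

  # flush after every write, approximating the effect of sync=1
  [fsync_per_write]
  fsync=1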