Also take a look at Galera Cluster. You can relax flushing to disk as long as 
all your nodes don't go down at the same time.
(And when a node comes back up after a crash, you should trash its data 
before it rejoins the cluster, so it resyncs in full from the other nodes.)
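
For example, relaxed flushing in my.cnf might look something like this (a 
minimal sketch; pick values based on how much recent data a single node is 
allowed to lose):

  [mysqld]
  # Write the InnoDB log at each commit, but fsync only ~once a second;
  # a crashed node can lose up to a second of transactions, which is
  # acceptable as long as the other Galera nodes stay up.
  innodb_flush_log_at_trx_commit = 2
  # Don't fsync the binlog on every commit either.
  sync_binlog = 0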

Jan


> On 26 Feb 2016, at 11:01, Nick Fisk <n...@fisk.me.uk> wrote:
> 
> I guess my question was more around what your final workload looks like.
> If it's the same as the SQL benchmarks, then you are not going to get much
> better performance than you do now, aside from trying some of the tuning
> options I mentioned, which might get you an extra 100 IOPS.
> 
> The only other option would be to look at some sort of client-side SSD
> caching (flashcache, bcache, etc.) of the RBD. These are not ideal, but it
> might be your only option for getting near local sync write performance.
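> 
> With bcache, for example, the rough shape would be something like this
> (device names and the cache-set UUID are placeholders; note that in
> writeback mode a dead client SSD can cost you data):
> 
>   # RBD image as backing device, local SSD partition as cache
>   make-bcache -B /dev/rbd0
>   make-bcache -C /dev/nvme0n1p1
>   # attach the cache set (UUID from bcache-super-show) and go writeback
>   echo <cache-set-uuid> > /sys/block/bcache0/bcache/attach
>   echo writeback > /sys/block/bcache0/bcache/cache_mode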
> 
>> -----Original Message-----
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>> Huan Zhang
>> Sent: 26 February 2016 09:30
>> To: Nick Fisk <n...@fisk.me.uk>
>> Cc: josh durgin <josh.dur...@inktank.com>; ceph-users <ceph-
>> us...@ceph.com>
>> Subject: Re: [ceph-users] Guest sync write iops so poor.
>> 
>> Hi Nick,
>> A DB's IO pattern depends on its config; take MySQL for example.
>> With innodb_flush_log_at_trx_commit = 1, MySQL will sync after each
>> transaction, like:
>> write
>> sync
>> write
>> sync
>> ...
>> 
>> With innodb_flush_log_at_trx_commit = 2, the log is written per commit
>> but flushed only about once a second:
>> write
>> write
>> write
>> write
>> write
>> sync
>> 
>> With innodb_flush_log_at_trx_commit = 0:
>> write
>> write
>> ...
>> (one second later)
>> sync
>> 
>> 
>> That may not be entirely accurate, but it's more or less the pattern.
>> We tested MySQL TPS with innodb_flush_log_at_trx_commit = 1 and got very
>> poor performance, even though we can reach very high O_DIRECT randwrite
>> IOPS with fio.
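>> 
>> A rough fio approximation of the trx_commit = 1 pattern (one fsync per
>> small write at queue depth 1; device path as in our tests) would be:
>> 
>>   fio --name=trxlog --filename=/dev/rbd0 --rw=write --bs=4k \
>>       --ioengine=libaio --direct=1 --fsync=1 --iodepth=1 --runtime=60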
>> 
>> 
>> 
>> 2016-02-26 16:59 GMT+08:00 Nick Fisk <n...@fisk.me.uk>:
>>> -----Original Message-----
>>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
>> Of
>>> Huan Zhang
>>> Sent: 26 February 2016 06:50
>>> To: Jason Dillaman <dilla...@redhat.com>
>>> Cc: josh durgin <josh.dur...@inktank.com>; Nick Fisk <n...@fisk.me.uk>;
>>> ceph-users <ceph-us...@ceph.com>
>>> Subject: Re: [ceph-users] Guest sync write iops so poor.
>>> 
>>> The rbd engine with fsync=1 seems stuck:
>>> Jobs: 1 (f=1): [w(1)] [0.0% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta
>>> 1244d:10h:39m:18s]
>>> 
>>> But fio against /dev/rbd0 with sync=1 direct=1 ioengine=libaio
>>> iodepth=64 gets very high IOPS, ~35K, similar to direct write.
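>>> 
>>> That is, roughly:
>>> 
>>>   fio --name=test --filename=/dev/rbd0 --rw=randwrite --bs=4k \
>>>       --ioengine=libaio --direct=1 --sync=1 --iodepth=64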
>>> 
>>> I'm confused by that result. IMHO, Ceph could just ignore the sync cache
>>> command, since it always uses sync writes to the journal, right?
>> 
>> Even if the data is not sync'd to the data storage part of the OSD, it
>> still has to be written to the journal, and this is where the performance
>> limit lies.
>> 
>> The very nature of SDS means that you are never going to achieve the same
>> latency as you do to a local disk: even if the software side introduced
>> no extra latency, the network latency alone will severely limit your sync
>> performance.
>> 
>> Do you know the IO pattern the DBs generate? I know you can switch most
>> DBs to flush with O_DIRECT instead of sync; it may be that this helps in
>> your case.
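>> 
>> For InnoDB that would be something along the lines of (my.cnf; on newer
>> MySQL, O_DIRECT_NO_FSYNC additionally skips the fsync):
>> 
>>   innodb_flush_method = O_DIRECT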
>> 
>> Also check out the tech talk from last month about high performance
>> databases on Ceph. The presenter gave the impression that, at least in their
>> case, not every write was a sync IO. So your results could possibly
>> matter less than you think.
>> 
>> Also, please search the lists and past presentations about reducing write
>> latency. There are a few things you can do, like disabling logging and
>> setting some kernel parameters to stop the CPUs from entering sleep states
>> or dropping their frequency. One thing I witnessed is that if the Ceph
>> cluster is only running at low queue depths, so it's only generating low
>> CPU load, all the cores throttle themselves down to their lowest speeds,
>> which really hurts latency.
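>> 
>> For example (exact knobs depend on your distro and hardware):
>> 
>>   # pin the cpufreq governor to performance on all cores
>>   for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
>>       echo performance > "$g"
>>   done
>>   # and/or cap C-states via kernel boot parameters, e.g.
>>   #   intel_idle.max_cstate=1 processor.max_cstate=1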
>> 
>>> 
>>> Why do we get such bad sync IOPS, and how does Ceph handle it?
>>> Any reply would be very much appreciated!
>>> 
>>> 2016-02-25 22:44 GMT+08:00 Jason Dillaman <dilla...@redhat.com>:
>>>> 35K IOPS with ioengine=rbd sounds like the "sync=1" option doesn't
>>>> actually work. Or it's not touching the same object (but I wonder
>>>> whether write ordering is preserved at that rate?).
>>> 
>>> The fio rbd engine does not support "sync=1"; however, it should support
>>> "fsync=1" to accomplish roughly the same effect.
>>> 
>>> Jason
>> 
> 
> 

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
