On Thursday, March 5, 2015, Nick Fisk <n...@fisk.me.uk> wrote:

Hi All,

Just a heads up after a day's experimentation.

I believe tgt with its default settings has a small write cache when exporting a kernel-mapped RBD. Doing some write tests I saw four times the write throughput with tgt aio + krbd compared to tgt with the built-in librbd backend.

After running the following command against the LUN, which apparently disables the write cache, performance dropped back to what I am seeing with tgt + librbd, and also the same as fio:

tgtadm --op update --mode logicalunit --tid 2 --lun 3 -P mode_page=8:0:18:0x10:0:0xff:0xff:0:0:0xff:0xff:0xff:0xff:0x80:0x14:0:0:0:0:0:0

From that I can only deduce that using tgt + krbd in its default state is not 100% safe to use, especially in an HA environment.

Nick


On 06 March 2015 12:52, Jake Young wrote:

Hey Nick,

tgt actually does not have any caches, no read and no write. tgt's design is to pass all commands through to the backend as efficiently as possible:

http://lists.wpkg.org/pipermail/stgt/2013-May/005788.html

The configuration parameters just inform the initiators whether the backend storage has a cache. Clearly this makes a big difference for you. What initiator are you using in this test?

Maybe the kernel is doing the caching. What tuning parameters do you have on the krbd disk?

It could be that using aio is much more efficient. Maybe the built-in librbd backend isn't doing aio?

Jake


On Fri, Mar 6, 2015 at 9:04 AM, Nick Fisk <n...@fisk.me.uk> wrote:

Hi Jake,

Hmm, that's interesting; it's definitely affecting write behaviour though.

I was running iometer doing writes at a queue depth of one in a Windows VM on ESXi using its software initiator, which as far as I'm aware should be sending a sync write for each request.

I saw in iostat on the tgt server that my 128 kB writes were being coalesced into ~1024 kB writes, which would explain the performance increase. So something somewhere is doing caching, albeit on a small scale.

The krbd disk was using all default settings. I know the RBD support in tgt uses synchronous librbd writes, which I suppose might explain the difference at defaults, but that should be the expected behaviour.

Nick
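[For anyone who wants to reproduce the comparison, the two setups look roughly like this in /etc/tgt/targets.conf. This is only a sketch: the target IQNs, the rbd/test-image name and the /dev/rbd0 device node are placeholders, and the rbd backing store is only available if tgt was built with Ceph support.

# krbd path: map the image first (rbd map rbd/test-image), then export
# the resulting block device through the aio backing store
<target iqn.2015-03.com.example:krbd-test>
    bs-type aio
    backing-store /dev/rbd0
</target>

# librbd path: tgt opens the image directly through librbd
<target iqn.2015-03.com.example:librbd-test>
    bs-type rbd
    backing-store rbd/test-image
</target>
]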
On 06 March 2015 15:07, Jake Young wrote:

My initiator is also the VMware software iSCSI initiator. I had my tgt iSCSI targets' write-cache setting off.

I turned write and read cache on in the middle of creating a large eager-zeroed disk (tgt has no VAAI support, so this is all regular synchronous IO) and it did give me a clear performance boost.

Not orders of magnitude, but maybe 15% faster.

If the image makes it to the list, the yellow line is write KBps. It went from about 85 MBps to about 100 MBps. What was more noticeable was that the latency (grey line) went from around 250 ms to 130 ms.

[Inline image 1: graph of write throughput and latency]

I'm pretty sure this IO (zeroing) is always 1 MB writes, so I don't think this changed my write size. Maybe it did something to the iSCSI packets?

Jake


On Fri, Mar 6, 2015 at 10:18 AM, Nick Fisk <n...@fisk.me.uk> wrote:

Hi Jake,

Good to see it's not just me.

I'm guessing that because you are doing 1 MB writes, the latency difference has a less noticeable impact on the overall write bandwidth. What I have been discovering with Ceph + iSCSI is that all the extra hops (client -> iSCSI proxy -> primary OSD -> secondary OSD) cause a lot of latency serialisation, which dramatically impacts single-threaded iops at small IO sizes.

A few days back I tested adding a tiny SSD write cache on the iSCSI proxy, and this had a dramatic effect in "hiding" the latency behind it from the client.

Nick
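[To put rough numbers on that serialisation effect (assumed figures for illustration, not measurements from this thread): at a queue depth of one, IOPS is simply 1 / per-write latency. If each hop in client -> proxy -> primary OSD -> secondary OSD adds around 0.5 ms, a sync write costs about 2 ms end to end, capping a single thread at roughly 500 IOPS; at 128 kB per write that is only about 64 MB/s, however fast the underlying disks are. A 1 MB write amortises the same 2 ms over eight times as much data, which is why Jake's zeroing workload suffers much less.]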
Jake Young replied:

That makes sense, although I don't really understand how the latency is going down if tgt is not actually doing any caching.

After seeing your results, I've been considering experimenting with that. Currently my iSCSI proxy nodes are VMs. I would like to build a few dedicated servers with fast SSDs or Fusion-io devices. It depends on my budget; it's hard to justify a card that costs ten times the rest of the server... I would run all my tgt instances in containers, each pointing at its rbd disk plus cache device. A single Fusion-io device could support many tgt containers.

I don't really want to go back to krbd. I have a few RBDs that are format 2 with striping, and there aren't any stable kernels that support that (or any kernels at all yet for "fancy striping"). I wish there were a way to incorporate a local cache device into tgt with librbd backends.

Jake
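[For what it's worth, a local SSD write cache in front of a kernel-mapped image can be sketched with bcache, something like the following. Device names are placeholders and this assumes a kernel with bcache support; note that a writeback cache on a single proxy node reintroduces exactly the HA concern from Nick's first mail, since dirty data lives only on that proxy's SSD.

# one-time setup: /dev/rbd0 is the mapped RBD (backing device),
# /dev/sdb1 is an SSD partition used as the cache device
make-bcache -B /dev/rbd0 -C /dev/sdb1

# switch the cache from the default writethrough to writeback
echo writeback > /sys/block/bcache0/bcache/cache_mode

# then export /dev/bcache0 from tgt (bs-type aio) instead of /dev/rbd0

As Jake says, nothing equivalent exists inside tgt's librbd backing store; librbd's own "rbd cache" option is RAM-based, not a persistent local device.]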