> -----Original Message-----
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Mark Nelson
> Sent: 18 August 2015 15:55
> To: Jan Schermer <j...@schermer.cz>
> Cc: ceph-users@lists.ceph.com; Nick Fisk <n...@fisk.me.uk>
> Subject: Re: [ceph-users] any recommendation of using EnhanceIO?
>
>
>
> On 08/18/2015 09:24 AM, Jan Schermer wrote:
> >
> >> On 18 Aug 2015, at 15:50, Mark Nelson <mnel...@redhat.com> wrote:
> >>
> >>
> >>
> >> On 08/18/2015 06:47 AM, Nick Fisk wrote:
> >>> Just to chime in, I gave dm-cache a limited test, but its lack of a proper
> >>> write-back cache ruled it out for me. It only performs write-back caching on
> >>> blocks already on the SSD, whereas I need something that works like a
> >>> battery-backed RAID controller, caching all writes.
> >>>
> >>> It's amazing the 100x performance increase you get with RBDs when
> >>> doing sync writes and giving them something like just 1GB of write-back
> >>> cache with flashcache.
> >>
> >> For your use case, is it OK that data may live on the flashcache for some
> >> amount of time before making it to Ceph to be replicated?  We've wondered
> >> internally whether this kind of trade-off is acceptable to customers or not,
> >> should the flashcache SSD fail.
> >>
> >
> > Was it me pestering you about it? :-)
> > All my customers need this desperately - people don't care about having
> > RPO=0 seconds when all hell breaks loose.
> > People care about their apps being slow all the time, which is effectively an
> > "outage".
> > I (the sysadmin) care about having consistent data where all I have to do is
> > start up the VMs.
> >
> > Any ideas how to approach this? I think even checkpoints (like reverting to
> > a known point in the past) would be great and sufficient for most people...
>
> Here's kind of how I see the field right now:
>
> 1) Cache at the client level.  Likely fastest but obvious issues like above.
> RAID1 might be an option at increased cost.  Lack of barriers in some
> implementations scary.

Agreed.
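
For anyone unfamiliar with the setup being discussed, a minimal sketch of
client-side write-back caching over RBD with flashcache might look like the
following (device names, image name and cache size are only examples, and the
caveat above about losing the cache SSD applies):

  rbd map rbd/vm-disk-1                        # exposes the image as e.g. /dev/rbd0
  flashcache_create -p back -s 1g rbd0_cache /dev/sdb1 /dev/rbd0
  mkfs.xfs /dev/mapper/rbd0_cache              # new filesystem on the cached device
  mount /dev/mapper/rbd0_cache /mnt/vm-disk-1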

>
> 2) Cache below the OSD.  Not much recent data on this.  Not likely as fast as
> client side cache, but likely cheaper (fewer OSD nodes than client nodes?).
> Lack of barriers in some implementations scary.

This also has the benefit of caching the leveldb on the OSD, so get a big 
performance gain from there too for small sequential writes. I looked at using 
Flashcache for this too but decided it was adding to much complexity and risk.

I thought I read somewhere that RocksDB allows you to move its WAL to an SSD; is 
there anything in the pipeline for something like moving the filestore to use 
RocksDB?
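
Purely as an illustration of the "cache below the OSD" idea (not something I 
ended up running), a bcache-based sketch might look like this, with the OSD 
filesystem then created on the resulting bcache device - device names are made up:

  make-bcache -B /dev/sdc                      # spinner becomes the backing device
  make-bcache -C /dev/nvme0n1                  # SSD becomes the cache device
  bcache-super-show /dev/nvme0n1               # note the cset.uuid it reports
  echo <cset-uuid> > /sys/block/bcache0/bcache/attach
  echo writeback > /sys/block/bcache0/bcache/cache_mode
  mkfs.xfs /dev/bcache0                        # then build the OSD on top as usual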

>
> 3) Ceph Cache Tiering. Network overhead and write amplification on
> promotion make this primarily useful when workloads fit mostly into the
> cache tier.  Overall safe design but care must be taken not to over-promote.
>
> 4) separate SSD pool.  Manual and not particularly flexible, but perhaps best
> for applications that need consistently high performance.

I think it depends on the definition of performance. Currently even very fast 
CPUs and SSDs in their own pool will still struggle to get below 1ms of write 
latency. If your performance requirements are for large queue depths then you 
will probably be all right. If you require something that mirrors the performance 
of a traditional write-back cache, then even pure SSD pools can start to struggle.
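
For reference, carving out a dedicated SSD pool is mostly a CRUSH exercise. A 
rough sketch (bucket, rule and pool names, weights and PG counts are just 
examples and would need sizing for a real cluster):

  ceph osd crush add-bucket ssd root
  ceph osd crush add-bucket node1-ssd host
  ceph osd crush move node1-ssd root=ssd
  ceph osd crush create-or-move osd.20 0.8 root=ssd host=node1-ssd   # per SSD OSD
  ceph osd crush rule create-simple ssd_rule ssd host
  ceph osd pool create ssd-pool 512 512 replicated ssd_rule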


To give a real-world example of what I see when doing various tests, here is a 
rough guide to IOPS when removing a snapshot on an ESX server:

Traditional Array 10K disks = 300-600 IOPS
Ceph 7.2K + SSD Journal = 100-200 IOPS (LevelDB syncing on the OSD seems to be the 
main limitation)
Ceph Pure SSD Pool = 500 IOPS (Intel S3700 SSDs)
Ceph Cache Tiering = 10-500 IOPS (as we know, misses can be very painful)
Ceph + RBD Caching with Flashcache = 200-1000 IOPS (readahead can give high 
bursts if snapshot blocks are sequential)

And when copying VMs to a datastore (ESXi does this in sequential 64k 
IOs... yes, silly I know):

Traditional Array 10K disks = ~100MB/s (limited by the 1GbE interface; on other 
arrays I guess this scales)
Ceph 7.2K + SSD Journal = ~20MB/s (again, LevelDB sync seems to be the limit here 
for sequential writes)
Ceph Pure SSD Pool = ~50MB/s (a Ceph CPU bottleneck is occurring)
Ceph Cache Tiering = ~50MB/s when writing to a new block, <10MB/s on 
promote+overwrite
Ceph + RBD Caching with Flashcache = as fast as the SSD will go
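
For anyone wanting to approximate that second test, something along these lines 
with fio's rbd engine generates roughly the same IO pattern (pool and image 
names are made up, and the image gets written to, so don't point it at anything 
you care about):

  fio --name=esxi-copy-sim --ioengine=rbd --clientname=admin \
      --pool=rbd --rbdname=test-img --rw=write --bs=64k \
      --iodepth=1 --size=10G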


>
> >
> >
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On
> >>>> Behalf Of Jan Schermer
> >>>> Sent: 18 August 2015 12:44
> >>>> To: Mark Nelson <mnel...@redhat.com>
> >>>> Cc: ceph-users@lists.ceph.com
> >>>> Subject: Re: [ceph-users] any recommendation of using EnhanceIO?
> >>>>
> >>>> I did not. Not sure why now - probably for the same reason I didn't
> >>>> extensively test bcache.
> >>>> I'm not a real fan of device mapper though, so if I had to choose
> >>>> I'd still go for bcache :-)
> >>>>
> >>>> Jan
> >>>>
> >>>>> On 18 Aug 2015, at 13:33, Mark Nelson <mnel...@redhat.com>
> wrote:
> >>>>>
> >>>>> Hi Jan,
> >>>>>
> >>>>> Out of curiosity did you ever try dm-cache?  I've been meaning to
> >>>>> give it a
> >>>> spin but haven't had the spare cycles.
> >>>>>
> >>>>> Mark
> >>>>>
> >>>>> On 08/18/2015 04:00 AM, Jan Schermer wrote:
> >>>>>> I already evaluated EnhanceIO in combination with CentOS 6 (and
> >>>>>> backported 3.10 and 4.0 kernel-lt if I remember correctly).
> >>>>>> It worked fine during benchmarks and stress tests, but once we
> >>>>>> ran DB2 on it, it panicked within minutes and took all the data with it
> >>>>>> (almost literally - files that weren't touched, like OS binaries,
> >>>>>> were b0rked and the filesystem was unsalvageable).
> >>>>>> Even if you disregard this warning, the performance gains weren't
> >>>>>> that great either, at least in a VM. It had problems when flushing to disk
> >>>>>> after reaching the dirty watermark, and the block size has some
> >>>>>> not-well-documented implications (not sure now, but I think it only
> >>>>>> cached IO _larger_ than the block size, so if your database keeps
> >>>>>> incrementing an XX-byte counter it will go straight to disk).
> >>>>>>
> >>>>>> Flashcache doesn't respect barriers (or does it now?) - if that's
> >>>>>> OK for you, then go for it; it should be stable, and I used it in the
> >>>>>> past in production without problems.
> >>>>>>
> >>>>>> bcache seemed to work fine, but I needed to
> >>>>>> a) use it for root
> >>>>>> b) disable and enable it on the fly (doh)
> >>>>>> c) make it non-persistent (flush it) before reboot - not sure if
> >>>>>> that was possible either
> >>>>>> d) do all that in a customer's VM, and that customer didn't have a strong
> >>>>>> technical background to be able to fiddle with it...
> >>>>>> So I haven't tested it heavily.
> >>>>>>
> >>>>>> Bcache should be the obvious choice if you are in control of the
> >>>>>> environment. At least you can cry on LKML's shoulder when you
> >>>>>> lose data :-)
> >>>>>>
> >>>>>> Jan
> >>>>>>
> >>>>>>
> >>>>>>> On 18 Aug 2015, at 01:49, Alex Gorbachev
> >>>>>>> <a...@iss-integration.com>
> >>>> wrote:
> >>>>>>>
> >>>>>>> What about https://github.com/Frontier314/EnhanceIO?  Last
> >>>>>>> commit 2 months ago, but no external contributors :(
> >>>>>>>
> >>>>>>> The nice thing about EnhanceIO is there is no need to change the
> >>>>>>> device name, unlike bcache, flashcache, etc.
> >>>>>>>
> >>>>>>> Best regards,
> >>>>>>> Alex
> >>>>>>>
> >>>>>>> On Thu, Jul 23, 2015 at 11:02 AM, Daniel Gryniewicz
> >>>>>>> <d...@redhat.com>
> >>>> wrote:
> >>>>>>>> I did some (non-Ceph) work on these and concluded that bcache
> >>>>>>>> was the best supported, most stable, and fastest.  This was ~1
> >>>>>>>> year ago, so take it with a grain of salt, but that's what I would
> >>>>>>>> recommend.
> >>>>>>>>
> >>>>>>>> Daniel
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> ________________________________
> >>>>>>>> From: "Dominik Zalewski" <dzalew...@optlink.net>
> >>>>>>>> To: "German Anders" <gand...@despegar.com>
> >>>>>>>> Cc: "ceph-users" <ceph-users@lists.ceph.com>
> >>>>>>>> Sent: Wednesday, July 1, 2015 5:28:10 PM
> >>>>>>>> Subject: Re: [ceph-users] any recommendation of using
> EnhanceIO?
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> I asked the same question in the last week or so (just search the
> >>>>>>>> mailing list archives for EnhanceIO :) and got some interesting answers.
> >>>>>>>>
> >>>>>>>> Looks like the project is pretty much dead since it was bought
> >>>>>>>> out by HGST.
> >>>>>>>> Even their website has some broken links regarding EnhanceIO.
> >>>>>>>>
> >>>>>>>> I’m keen to try flashcache or bcache (the latter has been in the
> >>>>>>>> mainline kernel for some time).
> >>>>>>>>
> >>>>>>>> Dominik
> >>>>>>>>
> >>>>>>>> On 1 Jul 2015, at 21:13, German Anders
> <gand...@despegar.com>
> >>>> wrote:
> >>>>>>>>
> >>>>>>>> Hi cephers,
> >>>>>>>>
> >>>>>>>> Is anyone out there who has implemented EnhanceIO in a production
> >>>>>>>> environment? Any recommendations? Any perf output to share showing
> >>>>>>>> the difference between using it and not?
> >>>>>>>>
> >>>>>>>> Thanks in advance,
> >>>>>>>>
> >>>>>>>> German


Nick Fisk
Technical Support Engineer

System Professional Ltd
tel: 01825 830000
mob: 07711377522
fax: 01825 830001
mail: nick.f...@sys-pro.co.uk
web: www.sys-pro.co.uk<http://www.sys-pro.co.uk>

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
