I would concur, having spent a lot of time with ZFS on Solaris.

A dedicated ZIL device will reduce the fragmentation problem a lot (because intent 
logging no longer goes into the filesystem itself, which fragments the block 
allocations) and write response will be a lot better.  I would use different 
devices for L2ARC and ZIL - the ZIL needs to be small and fast for writes (and 
mirrored - we have used some HGST 16G devices which are designed as ZILs, pricey 
but highly recommended) - the L2ARC just needs to be faster for reads than your 
data disks, so most SSDs would be fine for this.
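
For reference, a minimal sketch of how those devices might be attached - the pool 
name "tank" and the device paths are placeholders, adjust to your setup:

  # mirrored SLOG (dedicated ZIL) - small, fast, write-optimised SSDs
  zpool add tank log mirror /dev/disk/by-id/zil-ssd-a /dev/disk/by-id/zil-ssd-b

  # L2ARC cache device - no redundancy needed, a single read-fast SSD is fine
  zpool add tank cache /dev/disk/by-id/l2arc-ssd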

A 14-disk RAIDZ2 is also going to be very poor for writes, especially with SATA 
- you effectively get only one disk's worth of write IOPS, because each write 
needs to hit every disk in the vdev.  Without a dedicated ZIL device you are also 
losing write IOPS to intent logging and metadata operations.
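
To put rough numbers on it (illustrative only, assuming ~100 random write IOPS 
per 7.2k SATA disk): random write IOPS scale with the number of vdevs, not the 
number of disks, so a single 14-disk RAIDZ2 vdev gives roughly 1 x 100 = ~100 
write IOPS, two 7-disk RAIDZ2 vdevs roughly 2 x 100 = ~200, and seven mirrored 
pairs roughly 7 x 100 = ~700.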



> -----Original Message-----
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Patrick Donnelly
> Sent: Wednesday, 11 January 2017 5:24 PM
> To: Kevin Olbrich
> Cc: Ceph Users
> Subject: Re: [ceph-users] Review of Ceph on ZFS - or how not to deploy Ceph
> for RBD + OpenStack
>
> Hello Kevin,
>
> On Tue, Jan 10, 2017 at 4:21 PM, Kevin Olbrich <k...@sv01.de> wrote:
> > 5x Ceph node equipped with 32GB RAM, Intel i5, Intel DC P3700 NVMe
> > journal,
>
> Is the "journal" used as a ZIL?
>
> > We experienced a lot of io blocks (X requests blocked > 32 sec) when a
> > lot of data is changed in cloned RBDs (disk imported via OpenStack
> > Glance, cloned during instance creation by Cinder).
> > If the disk was cloned some months ago and large software updates are
> > applied (a lot of small files) combined with a lot of syncs, we often
> > had a node hit suicide timeout.
> > Most likely this is a problem with op thread count, as it is easy to
> > block threads with RAIDZ2 (RAID6) if many small operations are written
> > to disk (again, COW is not optimal here).
> > When recovery took place (0.020% degraded) the cluster performance was
> > very bad - remote service VMs (Windows) were unusable. Recovery itself
> > was using
> > 70 - 200 mb/s which was okay.
>
> I would think having an SSD ZIL here would make a very large difference.
> Probably a ZIL may have a much larger performance impact than an L2ARC
> device. [You may even partition it and have both but I'm not sure if that's
> normally recommended.]
>
> Thanks for your writeup!
>
> --
> Patrick Donnelly
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com