I have suffered power losses in every data center I've been in, and I've
lost SSDs because of them (Intel 320 Series).  The worst time, I lost both
SSDs in a RAID1.  That was a bad day.

I'm using Intel DC S3700s now to avoid a repeat.  My cluster is small
enough that losing a journal SSD would be a major headache.

I'm manually monitoring wear level for now.  So far all of my journal SSDs
are still at 100% lifetime.  I do have some of the Intel 320s down to 45%
lifetime remaining (those are in less critical roles).  One of these days
I'll get around to automating it.
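Something like the sketch below is roughly what I have in mind for
automating it.  It is untested; the device paths and warning threshold are
placeholders, and it assumes smartmontools is installed and that the drives
report SMART attribute 233 (Media_Wearout_Indicator), which is what my
Intel SSDs expose.

#!/usr/bin/env python
# Rough wear-level check: warn when an SSD's normalized wearout value
# drops below a threshold.  Assumes smartctl is on the PATH (usually needs
# root) and that the drive reports attribute 233 (Media_Wearout_Indicator).
import subprocess

DEVICES = ["/dev/sda", "/dev/sdb"]   # placeholder paths for the journal SSDs
THRESHOLD = 50                       # warn below 50% lifetime remaining

for dev in DEVICES:
    output = subprocess.check_output(["smartctl", "-A", dev]).decode()
    for line in output.splitlines():
        fields = line.split()
        if fields and fields[0] == "233":    # Media_Wearout_Indicator row
            remaining = int(fields[3])       # normalized VALUE column
            if remaining < THRESHOLD:
                print("WARNING: %s is at %d%% lifetime" % (dev, remaining))
            break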


Speed-wise, my small cluster was fast enough without SSDs until I started
to expand.  I'm only using RadosGW, and I only care about latency on a
human timescale.  A second or two of latency is annoying, but not a big
deal.

I went from 3 nodes to 5, and the expansion was extremely painful.  I
admit that I inflicted a lot of the pain on myself: I expanded too fast
(add all the OSDs at the same time?  Sure, why not.) and I was running the
default configs.  Things got better after I lowered the backfill priority
and count and learned to add one or two disks at a time.  Even so,
customers noticed the increased latency while I was adding OSDs.
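For reference, these are roughly the knobs I turned down.  The values are
just what worked for me, not recommendations, and the defaults vary between
releases, so check yours:

# ceph.conf -- throttle backfill/recovery so client I/O isn't starved
[osd]
    osd max backfills = 1
    osd recovery max active = 1
    osd recovery op priority = 1

# Or inject at runtime without restarting the OSDs:
# ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'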

Now that I have the journals on SSDs, customers don't notice the
maintenance anymore.  RadosGW latency goes from ~50ms to ~80ms, not from
~50ms to ~2000ms.



On Tue, Nov 25, 2014 at 9:12 AM, Michael Kuriger <mk7...@yp.com> wrote:

> My cluster is actually very fast without SSD drives.  Thanks for the
> advice!
>
> Michael Kuriger
> mk7...@yp.com
> 818-649-7235
>
> MikeKuriger (IM)
>
>
>
>
> On 11/25/14, 7:49 AM, "Mark Nelson" <mark.nel...@inktank.com> wrote:
>
> >On 11/25/2014 09:41 AM, Erik Logtenberg wrote:
> >> If you are like me, you have the journals for your OSD's with rotating
> >> media stored separately on an SSD. If you are even more like me, you
> >> happen to use Intel 530 SSD's in some of your hosts. If so, please do
> >> check your S.M.A.R.T. statistics regularly, because these SSD's really
> >> can't cope with Ceph.
> >>
> >> Check out the media-wear graphs for the two Intel 530's in my cluster.
> >> As soon as those declining lines get down to 30% or so, they need to be
> >> replaced. That means less than half a year between purchase and
> >> end-of-life :(
> >>
> >> Tip of the week, keep an eye on those statistics, don't let a failing
> >> SSD surprise you.
> >
> >This is really good advice, and it's not just the Intel 530s.  Most
> >consumer grade SSDs have pretty low write endurance.  If you mostly are
> >doing reads from your cluster you may be OK, but if you have even
> >moderately high write workloads and you care about avoiding OSD downtime
> >(which in a production cluster is pretty important though not usually
> >100% critical), get high write endurance SSDs.
> >
> >Mark
> >
> >>
> >> Erik.
> >>
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
