Re: [ceph-users] SSD selection

2015-02-28 Thread Christian Balzer
On Sat, 28 Feb 2015 20:42:35 -0600 Tony Harris wrote:

> Hi all,
> 
> I have a small cluster together and it's running fairly well (3 nodes, 21
> osds).  I'm looking to improve the write performance a bit though, which
> I was hoping that using SSDs for journals would do.  But, I was wondering
> what people had as recommendations for SSDs to act as journal drives.
> If I read the docs on ceph.com correctly, I'll need 2 ssds per node
> (with 7 drives in each node, I think the recommendation was 1ssd per 4-5
> drives?) so I'm looking for drives that will work well without breaking
> the bank for where I work (I'll probably have to purchase them myself
> and donate, so my budget is somewhat small).  Any suggestions?  I'd
> prefer one that can finish its write in a power outage case, the only
> one I know of off hand is the intel dcs3700 I think, but at $300 it's
> WAY above my affordability range.

Firstly, an uneven number of OSDs (HDDs) per node will bite you in the
proverbial behind down the road when combined with journal SSDs, as one of
those SSDs will wear out faster than the other.

Secondly, how many SSDs you need is basically a trade-off between price,
performance, endurance and limiting failure impact. 

I have a cluster where I used 4 100GB DC S3700s with 8 HDD OSDs, optimizing
the write paths and IOPS and failure domain, but not the sequential speed
or cost.

Depending on what your write load is and the expected lifetime of this
cluster, you might be able to get away with DC S3500s or even better the
new DC S3610s.
Keep in mind that buying a cheap, low endurance SSD now might cost you
more down the road if you have to replace it after a year (TBW/$).

All the cheap alternatives to DC level SSDs tend to wear out too fast,
have no powercaps and tend to have unpredictable (caused by garbage
collection) and steadily decreasing performance.

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD selection

2015-03-01 Thread Tony Harris
Well, although I have 7 now per node, you make a good point and I'm in a
position where I can either increase to 8 and split 4/4 and have 2 ssds, or
reduce to 5 and use a single ssd per node (the system is not in production
yet).

Do all the DC lines have caps in them or just the DC S line?

-Tony

On Sat, Feb 28, 2015 at 11:21 PM, Christian Balzer  wrote:

> On Sat, 28 Feb 2015 20:42:35 -0600 Tony Harris wrote:
>
> > Hi all,
> >
> > I have a small cluster together and it's running fairly well (3 nodes, 21
> > osds).  I'm looking to improve the write performance a bit though, which
> > I was hoping that using SSDs for journals would do.  But, I was wondering
> > what people had as recommendations for SSDs to act as journal drives.
> > If I read the docs on ceph.com correctly, I'll need 2 ssds per node
> > (with 7 drives in each node, I think the recommendation was 1ssd per 4-5
> > drives?) so I'm looking for drives that will work well without breaking
> > the bank for where I work (I'll probably have to purchase them myself
> > and donate, so my budget is somewhat small).  Any suggestions?  I'd
> > prefer one that can finish its write in a power outage case, the only
> > one I know of off hand is the intel dcs3700 I think, but at $300 it's
> > WAY above my affordability range.
>
> Firstly, an uneven number of OSDs (HDDs) per node will bite you in the
> proverbial behind down the road when combined with journal SSDs, as one of
> those SSDs will wear our faster than the other.
>
> Secondly, how many SSDs you need is basically a trade-off between price,
> performance, endurance and limiting failure impact.
>
> I have cluster where I used 4 100GB DC S3700s with 8 HDD OSDs, optimizing
> the write paths and IOPS and failure domain, but not the sequential speed
> or cost.
>
> Depending on what your write load is and the expected lifetime of this
> cluster, you might be able to get away with DC S3500s or even better the
> new DC S3610s.
> Keep in mind that buying a cheap, low endurance SSD now might cost you
> more down the road if you have to replace it after a year (TBW/$).
>
> All the cheap alternatives to DC level SSDs tend to wear out too fast,
> have no powercaps and tend to have unpredictable (caused by garbage
> collection) and steadily decreasing performance.
>
> Christian
> --
> Christian BalzerNetwork/Systems Engineer
> ch...@gol.com   Global OnLine Japan/Fusion Communications
> http://www.gol.com/
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD selection

2015-03-01 Thread Andrei Mikhailovsky
I would not use a single ssd for 5 osds. I would recommend 3-4 osds max per
ssd or you will hit a bottleneck on the ssd side.

I've had a reasonable experience with Intel 520 ssds (which are not produced 
anymore). I've found Samsung 840 Pro to be horrible! 

Otherwise, it seems that everyone here recommends the DC S3500 or DC S3700, as
they have the best wear per $ ratio out of all the drives.

Andrei 

- Original Message -

> From: "Tony Harris" 
> To: "Christian Balzer" 
> Cc: ceph-users@lists.ceph.com
> Sent: Sunday, 1 March, 2015 4:19:30 PM
> Subject: Re: [ceph-users] SSD selection

> Well, although I have 7 now per node, you make a good point and I'm
> in a position where I can either increase to 8 and split 4/4 and
> have 2 ssds, or reduce to 5 and use a single osd per node (the
> system is not in production yet).

> Do all the DC lines have caps in them or just the DC s line?

> -Tony

> On Sat, Feb 28, 2015 at 11:21 PM, Christian Balzer < ch...@gol.com >
> wrote:

> > On Sat, 28 Feb 2015 20:42:35 -0600 Tony Harris wrote:
> >
> > > Hi all,
> > >
> > > I have a small cluster together and it's running fairly well (3
> > > nodes, 21 osds). I'm looking to improve the write performance a bit
> > > though, which I was hoping that using SSDs for journals would do.
> > > But, I was wondering what people had as recommendations for SSDs to
> > > act as journal drives. If I read the docs on ceph.com correctly,
> > > I'll need 2 ssds per node (with 7 drives in each node, I think the
> > > recommendation was 1ssd per 4-5 drives?) so I'm looking for drives
> > > that will work well without breaking the bank for where I work
> > > (I'll probably have to purchase them myself and donate, so my
> > > budget is somewhat small). Any suggestions? I'd prefer one that can
> > > finish its write in a power outage case, the only one I know of off
> > > hand is the intel dcs3700 I think, but at $300 it's WAY above my
> > > affordability range.
> >
> > Firstly, an uneven number of OSDs (HDDs) per node will bite you in
> > the proverbial behind down the road when combined with journal SSDs,
> > as one of those SSDs will wear our faster than the other.
> >
> > Secondly, how many SSDs you need is basically a trade-off between
> > price, performance, endurance and limiting failure impact.
> >
> > I have cluster where I used 4 100GB DC S3700s with 8 HDD OSDs,
> > optimizing the write paths and IOPS and failure domain, but not the
> > sequential speed or cost.
> >
> > Depending on what your write load is and the expected lifetime of
> > this cluster, you might be able to get away with DC S3500s or even
> > better the new DC S3610s.
> > Keep in mind that buying a cheap, low endurance SSD now might cost
> > you more down the road if you have to replace it after a year (TBW/$).
> >
> > All the cheap alternatives to DC level SSDs tend to wear out too
> > fast, have no powercaps and tend to have unpredictable (caused by
> > garbage collection) and steadily decreasing performance.
> >
> > Christian
> > --
> > Christian Balzer Network/Systems Engineer
> > ch...@gol.com Global OnLine Japan/Fusion Communications
> > http://www.gol.com/
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD selection

2015-03-01 Thread Tony Harris
Ok, any size suggestion?  Can I get a 120GB and be ok?  I see I can get the
DC S3500 120GB for within $120/drive, so it's possible to get 6 of them...

-Tony

On Sun, Mar 1, 2015 at 12:46 PM, Andrei Mikhailovsky 
wrote:

>
> I would not use a single ssd for 5 osds. I would recommend the 3-4 osds
> max per ssd or you will get the bottleneck on the ssd side.
>
> I've had a reasonable experience with Intel 520 ssds (which are not
> produced anymore). I've found Samsung 840 Pro to be horrible!
>
> Otherwise, it seems that everyone here recommends the DC3500 or DC3700 and
> it has the best wear per $ ratio out of all the drives.
>
> Andrei
>
>
> --
>
> *From: *"Tony Harris" 
> *To: *"Christian Balzer" 
> *Cc: *ceph-users@lists.ceph.com
> *Sent: *Sunday, 1 March, 2015 4:19:30 PM
> *Subject: *Re: [ceph-users] SSD selection
>
>
> Well, although I have 7 now per node, you make a good point and I'm in a
> position where I can either increase to 8 and split 4/4 and have 2 ssds, or
> reduce to 5 and use a single osd per node (the system is not in production
> yet).
>
> Do all the DC lines have caps in them or just the DC s line?
>
> -Tony
>
> On Sat, Feb 28, 2015 at 11:21 PM, Christian Balzer  wrote:
>
>> On Sat, 28 Feb 2015 20:42:35 -0600 Tony Harris wrote:
>>
>> > Hi all,
>> >
>> > I have a small cluster together and it's running fairly well (3 nodes,
>> 21
>> > osds).  I'm looking to improve the write performance a bit though, which
>> > I was hoping that using SSDs for journals would do.  But, I was
>> wondering
>> > what people had as recommendations for SSDs to act as journal drives.
>> > If I read the docs on ceph.com correctly, I'll need 2 ssds per node
>> > (with 7 drives in each node, I think the recommendation was 1ssd per 4-5
>> > drives?) so I'm looking for drives that will work well without breaking
>> > the bank for where I work (I'll probably have to purchase them myself
>> > and donate, so my budget is somewhat small).  Any suggestions?  I'd
>> > prefer one that can finish its write in a power outage case, the only
>> > one I know of off hand is the intel dcs3700 I think, but at $300 it's
>> > WAY above my affordability range.
>>
>> Firstly, an uneven number of OSDs (HDDs) per node will bite you in the
>> proverbial behind down the road when combined with journal SSDs, as one of
>> those SSDs will wear our faster than the other.
>>
>> Secondly, how many SSDs you need is basically a trade-off between price,
>> performance, endurance and limiting failure impact.
>>
>> I have cluster where I used 4 100GB DC S3700s with 8 HDD OSDs, optimizing
>> the write paths and IOPS and failure domain, but not the sequential speed
>> or cost.
>>
>> Depending on what your write load is and the expected lifetime of this
>> cluster, you might be able to get away with DC S3500s or even better the
>> new DC S3610s.
>> Keep in mind that buying a cheap, low endurance SSD now might cost you
>> more down the road if you have to replace it after a year (TBW/$).
>>
>> All the cheap alternatives to DC level SSDs tend to wear out too fast,
>> have no powercaps and tend to have unpredictable (caused by garbage
>> collection) and steadily decreasing performance.
>>
>> Christian
>> --
>> Christian BalzerNetwork/Systems Engineer
>> ch...@gol.com   Global OnLine Japan/Fusion Communications
>> http://www.gol.com/
>>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD selection

2015-03-01 Thread Andrei Mikhailovsky
I am not sure about the enterprise grade and underprovisioning, but for the
Intel 520s I've got the 240GB models (the speed of the 240GB is a bit better
than the 120GB), and I've left 50% underprovisioned. I've got 10GB per journal
and I am using 4 osds per ssd.
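
As a rough sketch of that layout, the arithmetic looks like this (the numbers
below are assumptions based on the description above, not an exact record of
the partitioning):

# Journal layout arithmetic for a 240GB SSD shared by 4 OSD journals (sketch).
drive_gb = 240        # Intel 520 240GB
journals = 4          # OSDs whose journals live on this SSD
journal_gb = 10       # size of each journal partition

partitioned = journals * journal_gb            # 40 GB carved into partitions
raw = drive_gb - partitioned                   # 200 GB never partitioned
print(f"{partitioned} GB of journals, {raw} GB ({100 * raw / drive_gb:.0f}%) "
      "left raw for overprovisioning")
# -> 40 GB of journals, 200 GB (83%) left raw for overprovisioning,
#    comfortably above the 50% target mentioned above.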

Andrei 

- Original Message -

> From: "Tony Harris" 
> To: "Andrei Mikhailovsky" 
> Cc: ceph-users@lists.ceph.com, "Christian Balzer" 
> Sent: Sunday, 1 March, 2015 8:49:56 PM
> Subject: Re: [ceph-users] SSD selection

> Ok, any size suggestion? Can I get a 120 and be ok? I see I can get
> DCS3500 120GB for within $120/drive so it's possible to get 6 of
> them...

> -Tony

> On Sun, Mar 1, 2015 at 12:46 PM, Andrei Mikhailovsky <
> and...@arhont.com > wrote:

> > I would not use a single ssd for 5 osds. I would recommend the 3-4
> > osds max per ssd or you will get the bottleneck on the ssd side.
> >
> > I've had a reasonable experience with Intel 520 ssds (which are not
> > produced anymore). I've found Samsung 840 Pro to be horrible!
> >
> > Otherwise, it seems that everyone here recommends the DC3500 or
> > DC3700 and it has the best wear per $ ratio out of all the drives.
> >
> > Andrei
> >
> > > From: "Tony Harris" < neth...@gmail.com >
> > > To: "Christian Balzer" < ch...@gol.com >
> > > Cc: ceph-users@lists.ceph.com
> > > Sent: Sunday, 1 March, 2015 4:19:30 PM
> > > Subject: Re: [ceph-users] SSD selection
> > >
> > > Well, although I have 7 now per node, you make a good point and I'm
> > > in a position where I can either increase to 8 and split 4/4 and
> > > have 2 ssds, or reduce to 5 and use a single osd per node (the
> > > system is not in production yet).
> > >
> > > Do all the DC lines have caps in them or just the DC s line?
> > >
> > > -Tony
> > >
> > > On Sat, Feb 28, 2015 at 11:21 PM, Christian Balzer < ch...@gol.com >
> > > wrote:
> > >
> > > > On Sat, 28 Feb 2015 20:42:35 -0600 Tony Harris wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I have a small cluster together and it's running fairly well (3
> > > > > nodes, 21 osds). I'm looking to improve the write performance a
> > > > > bit though, which I was hoping that using SSDs for journals
> > > > > would do. But, I was wondering what people had as
> > > > > recommendations for SSDs to act as journal drives. If I read
> > > > > the docs on ceph.com correctly, I'll need 2 ssds per node (with
> > > > > 7 drives in each node, I think the recommendation was 1ssd per
> > > > > 4-5 drives?) so I'm looking for drives that will work well
> > > > > without breaking the bank for where I work (I'll probably have
> > > > > to purchase them myself and donate, so my budget is somewhat
> > > > > small). Any suggestions? I'd prefer one that can finish its
> > > > > write in a power outage case, the only one I know of off hand
> > > > > is the intel dcs3700 I think, but at $300 it's WAY above my
> > > > > affordability range.
> > > >
> > > > Firstly, an uneven number of OSDs (HDDs) per node will bite you
> > > > in the proverbial behind down the road when combined with journal
> > > > SSDs, as one of those SSDs will wear our f

Re: [ceph-users] SSD selection

2015-03-01 Thread Tony Harris
Now, I've never set up a journal on a separate disk.  I assume you have 4
partitions at 10GB per partition; I noticed the docs referred to 10GB as a
good starting point.  Would it be better to have 4 partitions @ 10GB each or
4 @ 20GB?

I know I'll take a speed hit, but unless I can get my work to buy the
drives, they will have to sit with what my personal budget can afford and
be willing to donate ;)

-Tony

On Sun, Mar 1, 2015 at 2:54 PM, Andrei Mikhailovsky 
wrote:

> I am not sure about the enterprise grade and underprovisioning, but for
> the Intel 520s i've got 240gbs (the speeds of 240 is a bit better than
> 120s). and i've left 50% underprovisioned. I've got 10GB for journals and I
> am using 4 osds per ssd.
>
> Andrei
>
>
> --
>
> *From: *"Tony Harris" 
> *To: *"Andrei Mikhailovsky" 
> *Cc: *ceph-users@lists.ceph.com, "Christian Balzer" 
> *Sent: *Sunday, 1 March, 2015 8:49:56 PM
>
> *Subject: *Re: [ceph-users] SSD selection
>
> Ok, any size suggestion?  Can I get a 120 and be ok?  I see I can get
> DCS3500 120GB for within $120/drive so it's possible to get 6 of them...
>
> -Tony
>
> On Sun, Mar 1, 2015 at 12:46 PM, Andrei Mikhailovsky 
> wrote:
>
>>
>> I would not use a single ssd for 5 osds. I would recommend the 3-4 osds
>> max per ssd or you will get the bottleneck on the ssd side.
>>
>> I've had a reasonable experience with Intel 520 ssds (which are not
>> produced anymore). I've found Samsung 840 Pro to be horrible!
>>
>> Otherwise, it seems that everyone here recommends the DC3500 or DC3700
>> and it has the best wear per $ ratio out of all the drives.
>>
>> Andrei
>>
>>
>> --
>>
>> *From: *"Tony Harris" 
>> *To: *"Christian Balzer" 
>> *Cc: *ceph-users@lists.ceph.com
>> *Sent: *Sunday, 1 March, 2015 4:19:30 PM
>> *Subject: *Re: [ceph-users] SSD selection
>>
>>
>> Well, although I have 7 now per node, you make a good point and I'm in a
>> position where I can either increase to 8 and split 4/4 and have 2 ssds, or
>> reduce to 5 and use a single osd per node (the system is not in production
>> yet).
>>
>> Do all the DC lines have caps in them or just the DC s line?
>>
>> -Tony
>>
>> On Sat, Feb 28, 2015 at 11:21 PM, Christian Balzer  wrote:
>>
>>> On Sat, 28 Feb 2015 20:42:35 -0600 Tony Harris wrote:
>>>
>>> > Hi all,
>>> >
>>> > I have a small cluster together and it's running fairly well (3 nodes,
>>> 21
>>> > osds).  I'm looking to improve the write performance a bit though,
>>> which
>>> > I was hoping that using SSDs for journals would do.  But, I was
>>> wondering
>>> > what people had as recommendations for SSDs to act as journal drives.
>>> > If I read the docs on ceph.com correctly, I'll need 2 ssds per node
>>> > (with 7 drives in each node, I think the recommendation was 1ssd per
>>> 4-5
>>> > drives?) so I'm looking for drives that will work well without breaking
>>> > the bank for where I work (I'll probably have to purchase them myself
>>> > and donate, so my budget is somewhat small).  Any suggestions?  I'd
>>> > prefer one that can finish its write in a power outage case, the only
>>> > one I know of off hand is the intel dcs3700 I think, but at $300 it's
>>> > WAY above my affordability range.
>>>
>>> Firstly, an uneven number of OSDs (HDDs) per node will bite you in the
>>> proverbial behind down the road when combined with journal SSDs, as one
>>> of
>>> those SSDs will wear our faster than the other.
>>>
>>> Secondly, how many SSDs you need is basically a trade-off between price,
>>> performance, endurance and limiting failure impact.
>>>
>>> I have cluster where I used 4 100GB DC S3700s with 8 HDD OSDs, optimizing
>>> the write paths and IOPS and failure domain, but not the sequential speed
>>> or cost.
>>>
>>> Depending on what your write load is and the expected lifetime of this
>>> cluster, you might be able to get away with DC S3500s or even better the
>>> new DC S3610s.
>>> Keep in mind that buying a cheap, low endurance SSD now might cost you
>>> more down the road if you have to replace it after a year (TBW/$).
>>>
>>> All the cheap alternatives to DC level SSDs tend to wear out too fast,
>>> have no powercaps and tend to have unpredictable (caused by garbage
>>> collection) and steadily decreasing performance.
>>>
>>> Christian
>>> --
>>> Christian BalzerNetwork/Systems Engineer
>>> ch...@gol.com   Global OnLine Japan/Fusion Communications
>>> http://www.gol.com/
>>>
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD selection

2015-03-01 Thread Christian Balzer

Again, ultimately you will need to sit down, compile and compare the
numbers.

Start with this:
http://ark.intel.com/products/family/83425/Data-Center-SSDs

Pay close attention to the 3610 SSDs, while slightly more expensive they
offer 10 times the endurance. 

Guesstimate the amount of data written to your cluster per day, break that
down to the load a journal SSD will see and then multiply by at least 5 to
be on the safe side. Then see which SSD will fit your expected usage
pattern.
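
To make that concrete, a minimal back-of-the-envelope sketch (every input
below is an assumption to be replaced with your own estimates; endurance
ratings should come from the ark pages):

# Journal SSD endurance estimate (sketch; all inputs are assumptions).
cluster_writes_gb_day = 200   # guesstimated client writes per day
replication = 3               # each write also hits the journal of every replica
journal_ssds = 6              # journal SSDs in the whole cluster
safety_factor = 5             # the "multiply by at least 5" above
years = 5                     # desired service life

per_ssd_gb_day = cluster_writes_gb_day * replication / journal_ssds * safety_factor
total_tb = per_ssd_gb_day * 365 * years / 1000
print(f"~{per_ssd_gb_day:.0f} GB/day per journal SSD, ~{total_tb:.0f} TB over {years} years")
# With these inputs: ~500 GB/day and ~912 TB written over 5 years -- compare
# that against the drive's rated endurance (roughly 70 TBW for a 120GB DC
# S3500 vs. about 1.8 PBW for a 100GB DC S3700; check the ark pages for
# exact figures).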

You didn't mention your network, but I assume it's 10Gb/s?

At 135MB/s writes the 100GB DC S3500 will not cut the mustard in any shape
or form when journaling for 4 HDDs. 
With 2 HDDs it might be a so-so choice, but still falling short. 
Most current 7.2K RPM HDDs these days can do around 150MB/s writes,
however that's neither uniform, nor does Ceph do anything resembling a
sequential write (which is where these speeds come from), so in my book
80-120MB/s on the SSD journal per HDD are enough.
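
Put as numbers (a quick sketch using the 135MB/s figure above and the
80-120MB/s per-HDD rule of thumb):

# Journal bandwidth sanity check (sketch, figures as discussed above).
ssd_write_mb_s = 135               # sequential write speed quoted above
per_hdd_mb_s = (80, 120)           # journal bandwidth wanted per HDD

for hdds in (2, 4):
    low, high = hdds * per_hdd_mb_s[0], hdds * per_hdd_mb_s[1]
    print(f"{hdds} HDDs want ~{low}-{high} MB/s of journal bandwidth")
# -> 2 HDDs want ~160-240 MB/s of journal bandwidth
# -> 4 HDDs want ~320-480 MB/s of journal bandwidth
# Either range is more than a 135MB/s SSD can deliver, which is the point above.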

A speed hit is one thing, more than halving your bandwidth is bad,
especially when thinking about backfilling. 

Journal size doesn't matter that much; 10GB is fine, and 4x 20GB is OK with
the 100GB DC drives. With 5xx consumer models I'd leave at least 50% free.
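
For what it's worth, the FileStore journal sizing rule of thumb in the Ceph
docs of that era works out the same way (a sketch; check the docs for your
release, and the throughput and interval values below are assumptions):

# FileStore journal size rule of thumb: 2 * throughput * sync interval (sketch).
expected_throughput_mb_s = 120        # what one HDD OSD can realistically absorb
filestore_max_sync_interval_s = 5     # Ceph's default at the time

journal_mb = 2 * expected_throughput_mb_s * filestore_max_sync_interval_s
print(f"suggested journal size: ~{journal_mb} MB")   # ~1200 MB
# i.e. a 10GB partition already has lots of headroom, which is why the exact
# 10GB-vs-20GB choice matters little.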

Christian

On Sun, 1 Mar 2015 15:08:10 -0600 Tony Harris wrote:

> Now, I've never setup a journal on a separate disk, I assume you have 4
> partitions at 10GB / partition, I noticed in the docs they referred to 10
> GB, as a good starter.  Would it be better to have 4 partitions @ 10g ea
> or 4 @20?
> 
> I know I'll take a speed hit, but unless I can get my work to buy the
> drives, they will have to sit with what my personal budget can afford and
> be willing to donate ;)
> 
> -Tony
> 
> On Sun, Mar 1, 2015 at 2:54 PM, Andrei Mikhailovsky 
> wrote:
> 
> > I am not sure about the enterprise grade and underprovisioning, but for
> > the Intel 520s i've got 240gbs (the speeds of 240 is a bit better than
> > 120s). and i've left 50% underprovisioned. I've got 10GB for journals
> > and I am using 4 osds per ssd.
> >
> > Andrei
> >
> >
> > --
> >
> > *From: *"Tony Harris" 
> > *To: *"Andrei Mikhailovsky" 
> > *Cc: *ceph-users@lists.ceph.com, "Christian Balzer" 
> > *Sent: *Sunday, 1 March, 2015 8:49:56 PM
> >
> > *Subject: *Re: [ceph-users] SSD selection
> >
> > Ok, any size suggestion?  Can I get a 120 and be ok?  I see I can get
> > DCS3500 120GB for within $120/drive so it's possible to get 6 of
> > them...
> >
> > -Tony
> >
> > On Sun, Mar 1, 2015 at 12:46 PM, Andrei Mikhailovsky
> >  wrote:
> >
> >>
> >> I would not use a single ssd for 5 osds. I would recommend the 3-4
> >> osds max per ssd or you will get the bottleneck on the ssd side.
> >>
> >> I've had a reasonable experience with Intel 520 ssds (which are not
> >> produced anymore). I've found Samsung 840 Pro to be horrible!
> >>
> >> Otherwise, it seems that everyone here recommends the DC3500 or DC3700
> >> and it has the best wear per $ ratio out of all the drives.
> >>
> >> Andrei
> >>
> >>
> >> --
> >>
> >> *From: *"Tony Harris" 
> >> *To: *"Christian Balzer" 
> >> *Cc: *ceph-users@lists.ceph.com
> >> *Sent: *Sunday, 1 March, 2015 4:19:30 PM
> >> *Subject: *Re: [ceph-users] SSD selection
> >>
> >>
> >> Well, although I have 7 now per node, you make a good point and I'm
> >> in a position where I can either increase to 8 and split 4/4 and have
> >> 2 ssds, or reduce to 5 and use a single osd per node (the system is
> >> not in production yet).
> >>
> >> Do all the DC lines have caps in them or just the DC s line?
> >>
> >> -Tony
> >>
> >> On Sat, Feb 28, 2015 at 11:21 PM, Christian Balzer 
> >> wrote:
> >>
> >>> On Sat, 28 Feb 2015 20:42:35 -0600 Tony Harris wrote:
> >>>
> >>> > Hi all,
> >>> >
> >>> > I have a small cluster together and it's running fairly well (3
> >>> > nodes,
> >>> 21
> >>> > osds).  I'm looking to improve the write performance a bit though,
> >>> which
> >>> > I was hoping that using SSDs for journals would do.  But, I was
> >>> wondering
> >>> > what people had as recommendations for SSDs to act as journal
> >>> > drives. If 

Re: [ceph-users] SSD selection

2015-03-01 Thread Tony Harris
On Sun, Mar 1, 2015 at 6:32 PM, Christian Balzer  wrote:

>
> Again, penultimately you will need to sit down, compile and compare the
> numbers.
>
> Start with this:
> http://ark.intel.com/products/family/83425/Data-Center-SSDs
>
> Pay close attention to the 3610 SSDs, while slightly more expensive they
> offer 10 times the endurance.
>

Unfortunately, $300 vs $100 isn't really slightly more expensive ;)
 Although I did notice that the 3710's can be gotten for ~210.



>
> Guestimate the amount of data written to your cluster per day, break that
> down to the load a journal SSD will see and then multiply by at least 5 to
> be on the safe side. Then see which SSD will fit your expected usage
> pattern.
>

Luckily I don't think there will be a ton of data per day written.  The
majority of servers whose VHDs will be stored in our cluster don't have a
lot of frequent activity - aside from a few Windows servers that have DB
servers in them (and even they don't write a ton of data per day, really).



>
> You didn't mention your network, but I assume it's 10Gb/s?
>

Would be nice, if I had access to the kind of cash to get a 10Gb network, I
wouldn't be stressing the cost of a set of SSDs ;)


>
> At 135MB/s writes the 100GB DC S3500 will not cut the mustard in any shape
> or form when journaling for 4 HDDs.
> With 2 HDDs it might be a so-so choice, but still falling short.
> Most currenth 7.2K RPM HDDs these days can do around 150MB/s writes,
> however that's neither uniform, nor does Ceph do anything resembling a
> sequential write (which is where these speeds come from), so in my book
> 80-120MB/s on the SSD journal per HDD are enough.
>

The drives I have access to that are in the cluster aren't the fastest, most
current drives out there; but from what you're describing, to have even 3
HDDs per SSD you'd need an SSD with 240-360MB/s of write capability...  Why
does the ceph documentation then suggest 1 SSD per 4-5 OSD drives?  It would
be near impossible to get an SSD to meet that level of speed...


>
> A speed hit is one thing, more than halving your bandwidth is bad,
> especially when thinking about backfilling.
>

Although I'm working with more than 1Gb/s, it's a lot less than 10Gb/s, so
there might be a threshold there where we wouldn't experience an issue
where someone using 10G would (God I'd love a 10G network, but no budget
for it)


>
> Journal size doesn't matter that much, 10GB is fine, 20GB x4 is OK with
> the 100GB DC drives, with 5xx consumer models I'd leave at least 50% free.
>

Well, I'd like to steer away from the consumer models if possible since
they (AFAIK) don't contain caps to finish writes should a power loss occur,
unless there is one that does?

-Tony


>
> Christian
>
> On Sun, 1 Mar 2015 15:08:10 -0600 Tony Harris wrote:
>
> > Now, I've never setup a journal on a separate disk, I assume you have 4
> > partitions at 10GB / partition, I noticed in the docs they referred to 10
> > GB, as a good starter.  Would it be better to have 4 partitions @ 10g ea
> > or 4 @20?
> >
> > I know I'll take a speed hit, but unless I can get my work to buy the
> > drives, they will have to sit with what my personal budget can afford and
> > be willing to donate ;)
> >
> > -Tony
> >
> > On Sun, Mar 1, 2015 at 2:54 PM, Andrei Mikhailovsky 
> > wrote:
> >
> > > I am not sure about the enterprise grade and underprovisioning, but for
> > > the Intel 520s i've got 240gbs (the speeds of 240 is a bit better than
> > > 120s). and i've left 50% underprovisioned. I've got 10GB for journals
> > > and I am using 4 osds per ssd.
> > >
> > > Andrei
> > >
> > >
> > > --
> > >
> > > *From: *"Tony Harris" 
> > > *To: *"Andrei Mikhailovsky" 
> > > *Cc: *ceph-users@lists.ceph.com, "Christian Balzer" 
> > > *Sent: *Sunday, 1 March, 2015 8:49:56 PM
> > >
> > > *Subject: *Re: [ceph-users] SSD selection
> > >
> > > Ok, any size suggestion?  Can I get a 120 and be ok?  I see I can get
> > > DCS3500 120GB for within $120/drive so it's possible to get 6 of
> > > them...
> > >
> > > -Tony
> > >
> > > On Sun, Mar 1, 2015 at 12:46 PM, Andrei Mikhailovsky
> > >  wrote:
> > >
> > >>
> > >> I would not use a single ssd for 5 osds. I would recommend the 3-4
> > >> osds max per ssd or you will get the bottleneck on the ssd side.
> > >>
> > >> I've had a reasona

Re: [ceph-users] SSD selection

2015-03-01 Thread Christian Balzer
> Well, I'd like to steer away from the consumer models if possible since
> they (AFAIK) don't contain caps to finish writes should a power loss
> occur, unless there is one that does?
> 
Not that I'm aware of. 

Also note that while Andrei is happy with his 520s (especially compared to
the Samsungs) I have various 5x0 Intel SSDs in use as well and while they
are quite nice the 3700s are so much faster (consistently) in comparison
that one can't believe it ain't butter. ^o^

Christian

> -Tony
> 
> 
> >
> > Christian
> >
> > On Sun, 1 Mar 2015 15:08:10 -0600 Tony Harris wrote:
> >
> > > Now, I've never setup a journal on a separate disk, I assume you
> > > have 4 partitions at 10GB / partition, I noticed in the docs they
> > > referred to 10 GB, as a good starter.  Would it be better to have 4
> > > partitions @ 10g ea or 4 @20?
> > >
> > > I know I'll take a speed hit, but unless I can get my work to buy the
> > > drives, they will have to sit with what my personal budget can
> > > afford and be willing to donate ;)
> > >
> > > -Tony
> > >
> > > On Sun, Mar 1, 2015 at 2:54 PM, Andrei Mikhailovsky
> > >  wrote:
> > >
> > > > I am not sure about the enterprise grade and underprovisioning,
> > > > but for the Intel 520s i've got 240gbs (the speeds of 240 is a bit
> > > > better than 120s). and i've left 50% underprovisioned. I've got
> > > > 10GB for journals and I am using 4 osds per ssd.
> > > >
> > > > Andrei
> > > >
> > > >
> > > > --
> > > >
> > > > *From: *"Tony Harris" 
> > > > *To: *"Andrei Mikhailovsky" 
> > > > *Cc: *ceph-users@lists.ceph.com, "Christian Balzer" 
> > > > *Sent: *Sunday, 1 March, 2015 8:49:56 PM
> > > >
> > > > *Subject: *Re: [ceph-users] SSD selection
> > > >
> > > > Ok, any size suggestion?  Can I get a 120 and be ok?  I see I can
> > > > get DCS3500 120GB for within $120/drive so it's possible to get 6
> > > > of them...
> > > >
> > > > -Tony
> > > >
> > > > On Sun, Mar 1, 2015 at 12:46 PM, Andrei Mikhailovsky
> > > >  wrote:
> > > >
> > > >>
> > > >> I would not use a single ssd for 5 osds. I would recommend the 3-4
> > > >> osds max per ssd or you will get the bottleneck on the ssd side.
> > > >>
> > > >> I've had a reasonable experience with Intel 520 ssds (which are
> > > >> not produced anymore). I've found Samsung 840 Pro to be horrible!
> > > >>
> > > >> Otherwise, it seems that everyone here recommends the DC3500 or
> > > >> DC3700 and it has the best wear per $ ratio out of all the drives.
> > > >>
> > > >> Andrei
> > > >>
> > > >>
> > > >> --
> > > >>
> > > >> *From: *"Tony Harris" 
> > > >> *To: *"Christian Balzer" 
> > > >> *Cc: *ceph-users@lists.ceph.com
> > > >> *Sent: *Sunday, 1 March, 2015 4:19:30 PM
> > > >> *Subject: *Re: [ceph-users] SSD selection
> > > >>
> > > >>
> > > >> Well, although I have 7 now per node, you make a good point and
> > > >> I'm in a position where I can either increase to 8 and split 4/4
> > > >> and have 2 ssds, or reduce to 5 and use a single osd per node
> > > >> (the system is not in production yet).
> > > >>
> > > >> Do all the DC lines have caps in them or just the DC s line?
> > > >>
> > > >> -Tony
> > > >>
> > > >> On Sat, Feb 28, 2015 at 11:21 PM, Christian Balzer 
> > > >> wrote:
> > > >>
> > > >>> On Sat, 28 Feb 2015 20:42:35 -0600 Tony Harris wrote:
> > > >>>
> > > >>> > Hi all,
> > > >>> >
> > > >>> > I have a small cluster together and it's running fairly well (3
> > > >>> > nodes,
> > > >>> 21
> > > >>> > osds).  I'm looking to improve the write performance a bit
> > > >>> > though,
> > > >>> which
> > > >>> > I was hoping that using SSDs for journals would do.  But, I was
> > > >>> wondering
> > > >>

Re: [ceph-users] SSD selection

2015-03-01 Thread Tony Harris
On Sun, Mar 1, 2015 at 10:18 PM, Christian Balzer  wrote:

> On Sun, 1 Mar 2015 21:26:16 -0600 Tony Harris wrote:
>
> > On Sun, Mar 1, 2015 at 6:32 PM, Christian Balzer  wrote:
> >
> > >
> > > Again, penultimately you will need to sit down, compile and compare the
> > > numbers.
> > >
> > > Start with this:
> > > http://ark.intel.com/products/family/83425/Data-Center-SSDs
> > >
> > > Pay close attention to the 3610 SSDs, while slightly more expensive
> > > they offer 10 times the endurance.
> > >
> >
> > Unfortunately, $300 vs $100 isn't really slightly more expensive ;)
> >  Although I did notice that the 3710's can be gotten for ~210.
> >
> >
> I'm not sure where you get those prices from or what you're comparing with
> what but if you look at the OEM prices in the URL up there (which compare
> quite closely to what you can find when looking at shopping prices) a
> comparison with closely matched capabilities goes like this:
>
> http://ark.intel.com/compare/71913,86640,75680,75679
>
>
I'll be honest, the pricing on Intel's website is far from reality.  I
haven't been able to find any OEMs, and retail pricing on the 200GB 3610 is
~231 (the $300 must have been a different model in the line).  Although
$231 does add up real quick if I need to get 6 of them :(


> You really wouldn't want less than 200MB/s, even in your setup which I
> take to be 2Gb/s from what you wrote below.



> Note that the 100GB 3700 is going to perform way better and last immensely
> longer than the 160GB 3500 while being moderately more expensive, while
> the the 200GB 3610 is faster (IOPS), lasting 10 times long AND cheaper than
> the 240GB 3500.
>
> It is pretty much those numbers that made me use 4 100GB 3700s instead of
> 3500s (240GB), much more bang for the buck and it still did fit my budget
> and could deal with 80% of the network bandwidth.
>

So the 3710s would be an OK solution?  I have seen the 3700s for right
about $200 which, although it doesn't seem a lot cheaper, does shave about
$200 after shipping costs when getting 6...


>
> >
> > >
> > > Guestimate the amount of data written to your cluster per day, break
> > > that down to the load a journal SSD will see and then multiply by at
> > > least 5 to be on the safe side. Then see which SSD will fit your
> > > expected usage pattern.
> > >
> >
> > Luckily I don't think there will be a ton of data per day written.  The
> > majority of servers whose VHDs will be stored in our cluster don't have a
> > lot of frequent activity - aside from a few windows servers that have DBs
> > servers in them (and even they don't write a ton of data per day really).
> >
>
> Being able to put even a coarse number on this will tell you if you can
> skim on the endurance and have your cluster last like 5 years or if
> getting a higher endurance SSD is going to be cheaper.
>

Any suggestions on how I can get a really accurate number on this?  I mean,
I could probably get some good numbers from the database servers in terms
of their writes in a given day, but when it comes to other processes
running in the background I'm not sure how much these  might really affect
this number.


>
>
> >
> So it's 2x1Gb/s then?
>

Client side 2x1Gb/s, cluster side 3x1Gb/s.


>
> At that speed a single SSD from the list above would do, if you're
> a) aware of the risk that this SSD failing will kill all OSDs on that node
> and
> b) don't expect your cluster to be upgraded
>

I'd really prefer 2 per node from our discussions so far - it's all a
matter of cost, but I also don't want to jump to a poor decision just
because it can't be afforded immediately.  I'd rather gradually upgrade
nodes as can be afforded than jump into cheap now only to have to pay a
bigger price later.


>
> > Well, I'd like to steer away from the consumer models if possible since
> > they (AFAIK) don't contain caps to finish writes should a power loss
> > occur, unless there is one that does?
> >
> Not that I'm aware of.
>
> Also note that while Andrei is happy with his 520s (especially compared to
> the Samsungs) I have various 5x0 Intel SSDs in use as well and while they
> are quite nice the 3700s are so much faster (consistently) in comparison
> that one can't believe it ain't butter. ^o^
>

I'll have to see if I can get funding, I've already donated enough to get
the (albeit used) servers and NIC cards, I just can't personally afford to
donate another $1,000-1,200, but hopefully I'll soon have it nailed down what
exact model I would like to have and maybe I can get them to pay for at
least 1/2 of them...  God working for a school can be taxing at times.

-Tony



>
> Christian
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD selection

2015-03-01 Thread Christian Balzer
On Sun, 1 Mar 2015 22:47:48 -0600 Tony Harris wrote:

> On Sun, Mar 1, 2015 at 10:18 PM, Christian Balzer  wrote:
> 
> > On Sun, 1 Mar 2015 21:26:16 -0600 Tony Harris wrote:
> >
> > > On Sun, Mar 1, 2015 at 6:32 PM, Christian Balzer 
> > > wrote:
> > >
> > > >
> > > > Again, penultimately you will need to sit down, compile and
> > > > compare the numbers.
> > > >
> > > > Start with this:
> > > > http://ark.intel.com/products/family/83425/Data-Center-SSDs
> > > >
> > > > Pay close attention to the 3610 SSDs, while slightly more expensive
> > > > they offer 10 times the endurance.
> > > >
> > >
> > > Unfortunately, $300 vs $100 isn't really slightly more expensive ;)
> > >  Although I did notice that the 3710's can be gotten for ~210.
> > >
> > >
> > I'm not sure where you get those prices from or what you're comparing
> > with what but if you look at the OEM prices in the URL up there (which
> > compare quite closely to what you can find when looking at shopping
> > prices) a comparison with closely matched capabilities goes like this:
> >
> > http://ark.intel.com/compare/71913,86640,75680,75679
> >
> >
> I'll be honest, the pricing on Intel's website is far from reality.  I
> haven't been able to find any OEMs, and retail pricing on the 200GB 3610
> is ~231 (the $300 must have been a different model in the line).
> Although $231 does add up real quick if I need to get 6 of them :(
> 
> 
Using the google shopping (which isn't ideal, but for simplicity's sake)
search I see the 100GB DC S3700 from 170USD and the 160GB DC S3500 from
150USD, which are a pretty good match to the OEM price on the Intel site
of 180 and 160 respectively.

> > You really wouldn't want less than 200MB/s, even in your setup which I
> > take to be 2Gb/s from what you wrote below.
> 
> 
> 
> > Note that the 100GB 3700 is going to perform way better and last
> > immensely longer than the 160GB 3500 while being moderately more
> > expensive, while the the 200GB 3610 is faster (IOPS), lasting 10 times
> > long AND cheaper than the 240GB 3500.
> >
> > It is pretty much those numbers that made me use 4 100GB 3700s instead
> > of 3500s (240GB), much more bang for the buck and it still did fit my
> > budget and could deal with 80% of the network bandwidth.
> >
> 
> So the 3710's would be an ok solution?  

No, because they start from 200GB and with a 300USD price tag. The 3710s
do not replace the 3700s, they extend the selection upwards (in size
mostly).  

>I have seen the 3700s for right
> about $200, which although doesn't seem a lot cheaper, when getting 6,
> that does shave about $200 after shipping costs as well...
> 
See above, google shopping. The lowballer is Walmart, of all places:

http://www.walmart.com/ip/26972768?wmlspartner=wlpa&selectedSellerId=0


> 
> >
> > >
> > > >
> > > > Guestimate the amount of data written to your cluster per day,
> > > > break that down to the load a journal SSD will see and then
> > > > multiply by at least 5 to be on the safe side. Then see which SSD
> > > > will fit your expected usage pattern.
> > > >
> > >
> > > Luckily I don't think there will be a ton of data per day written.
> > > The majority of servers whose VHDs will be stored in our cluster
> > > don't have a lot of frequent activity - aside from a few windows
> > > servers that have DBs servers in them (and even they don't write a
> > > ton of data per day really).
> > >
> >
> > Being able to put even a coarse number on this will tell you if you can
> > skim on the endurance and have your cluster last like 5 years or if
> > getting a higher endurance SSD is going to be cheaper.
> >
> 
> Any suggestions on how I can get a really accurate number on this?  I
> mean, I could probably get some good numbers from the database servers
> in terms of their writes in a given day, but when it comes to other
> processes running in the background I'm not sure how much these  might
> really affect this number.
>

If you have existing servers that run linux and have been up for a
reasonably long time (months), iostat will give you a very good idea.
No ideas about Windows, but I bet those stats exist someplace, too.
 
For example a Ceph storage node, up 74 days with OS and journals on the
first 4 drives and OSD HDDs on the other 8:

Device:            tps    kB_read/s    kB_wrtn/s    kB_read     kB_wrtn
sda               9.82        29.88       187.87  191341125  1203171718
sdb               9.79        29.57       194.22  189367432  1243850846
sdc               9.77        29.83       188.89  191061000  1209676622
sdd               8.77        29.57       175.40  189399240  1123294410
sde               5.24       354.19        55.68 2268306443   356604748
sdi               5.02       335.61        63.60 2149338787   407307544
sdj               4.96       350.33        52.43 2243590803   335751320
sdl               5.04       374.62        48.49 2399170183   310559488
sdf               4.85       354.52        50.43 2270401571   322947192
sdh               4.77       332.38        50.60 2128622471   324065888
sdg               6.26       403.97        65.42 2587109283   418931316
sdk               5.86       385.36        55.61 2467921295   356120140
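
To turn totals like the above into a rough per-day figure, something along
these lines does the job (a sketch; the kB_wrtn values and uptime are taken
from the output above, and those first four drives also carry the OS, so the
result is slightly inflated):

# Rough daily-write figure from iostat's kB_wrtn counters (sketch).
uptime_days = 74
kb_wrtn = {                      # journal/OS drives from the output above
    "sda": 1203171718, "sdb": 1243850846,
    "sdc": 1209676622, "sdd": 1123294410,
}

total_gb = sum(kb_wrtn.values()) / 1024**2
print(f"~{total_gb:.0f} GB written in total, ~{total_gb / uptime_days:.0f} GB/day, "
      f"~{total_gb / uptime_days / len(kb_wrtn):.0f} GB/day per journal SSD")
# -> roughly 4559 GB in total, ~62 GB/day, ~15 GB/day per journal SSD.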

Re: [ceph-users] SSD selection

2015-03-02 Thread Tony Harris
On Sun, Mar 1, 2015 at 11:19 PM, Christian Balzer  wrote:

>
> > >
> > I'll be honest, the pricing on Intel's website is far from reality.  I
> > haven't been able to find any OEMs, and retail pricing on the 200GB 3610
> > is ~231 (the $300 must have been a different model in the line).
> > Although $231 does add up real quick if I need to get 6 of them :(
> >
> >
> Using the google shopping (which isn't ideal, but for simplicities sake)
> search I see the 100GB DC S3700 from 170USD and the 160GB DC S3500 from
> 150USD, which are a pretty good match to the OEM price on the Intel site
> of 180 and 160 respectively.
>
>
If I have to buy them personally, that'll work well.  If I can get work to
get them, then I kinda have to limit myself to whom we have marked as
suppliers as it's a pain to get a new company in the mix.



> > > You really wouldn't want less than 200MB/s, even in your setup which I
> > > take to be 2Gb/s from what you wrote below.
> >
> >
> >
> > > Note that the 100GB 3700 is going to perform way better and last
> > > immensely longer than the 160GB 3500 while being moderately more
> > > expensive, while the the 200GB 3610 is faster (IOPS), lasting 10 times
> > > long AND cheaper than the 240GB 3500.
> > >
> > > It is pretty much those numbers that made me use 4 100GB 3700s instead
> > > of 3500s (240GB), much more bang for the buck and it still did fit my
> > > budget and could deal with 80% of the network bandwidth.
> > >
> >
> > So the 3710's would be an ok solution?
>
> No, because they start from 200GB and with a 300USD price tag. The 3710s
> do not replace the 3700s, they extend the selection upwards (in size
> mostly).
>

I thought I had corrected that - I was thinking the 3700's and typed 3710 :)


>
> >I have seen the 3700s for right
> > about $200, which although doesn't seem a lot cheaper, when getting 6,
> > that does shave about $200 after shipping costs as well...
> >
> See above, google shopping. The lowballer is Walmart, of all places:
>
> http://www.walmart.com/ip/26972768?wmlspartner=wlpa&selectedSellerId=0
>
>
> >
> > >
> > > >
> > > > >
> > > > > Guestimate the amount of data written to your cluster per day,
> > > > > break that down to the load a journal SSD will see and then
> > > > > multiply by at least 5 to be on the safe side. Then see which SSD
> > > > > will fit your expected usage pattern.
> > > > >
> > > >
> > > > Luckily I don't think there will be a ton of data per day written.
> > > > The majority of servers whose VHDs will be stored in our cluster
> > > > don't have a lot of frequent activity - aside from a few windows
> > > > servers that have DBs servers in them (and even they don't write a
> > > > ton of data per day really).
> > > >
> > >
> > > Being able to put even a coarse number on this will tell you if you can
> > > skim on the endurance and have your cluster last like 5 years or if
> > > getting a higher endurance SSD is going to be cheaper.
> > >
> >
> > Any suggestions on how I can get a really accurate number on this?  I
> > mean, I could probably get some good numbers from the database servers
> > in terms of their writes in a given day, but when it comes to other
> > processes running in the background I'm not sure how much these  might
> > really affect this number.
> >
>
> If you have existing servers that run linux and have been up for
> reasonably long time (months), iostat will give you a very good idea.
> No ideas about Windows, but I bet those stats exist someplace, too.
>

I can't say months, but at least a month, maybe two - trying to remember
when our last extended power outage was - I can find out later.


>
> For example a Ceph storage node, up 74 days with OS and journals on the
> first 4 drives and OSD HDDs on the other 8:
>
> Device:            tps    kB_read/s    kB_wrtn/s    kB_read     kB_wrtn
> sda               9.82        29.88       187.87  191341125  1203171718
> sdb               9.79        29.57       194.22  189367432  1243850846
> sdc               9.77        29.83       188.89  191061000  1209676622
> sdd               8.77        29.57       175.40  189399240  1123294410
> sde               5.24       354.19        55.68 2268306443   356604748
> sdi               5.02       335.61        63.60 2149338787   407307544
> sdj               4.96       350.33        52.43 2243590803   335751320
> sdl               5.04       374.62        48.49 2399170183   310559488
> sdf               4.85       354.52        50.43 2270401571   322947192
> sdh               4.77       332.38        50.60 2128622471   324065888
> sdg               6.26       403.97        65.42 2587109283   418931316
> sdk               5.86       385.36        55.61 2467921295   356120140
>

I do have some linux vms that have been up for a while, can't say how many
months since the last extended power outage off hand (granted I know once I
look at the uptime), but hopefully it will at least give me an idea.


> >
> > >
> > >
> > > >
> > > So it's 2x1Gb/s then?
> > >