Re: [ceph-users] OSD node memory sizing

2016-05-19 Thread Christian Balzer

Hello,

On Thu, 19 May 2016 10:51:20 +0200 Dietmar Rieder wrote:

> Hello,
> 
> On 05/19/2016 03:36 AM, Christian Balzer wrote:
> > 
> > Hello again,
> > 
> > On Wed, 18 May 2016 15:32:50 +0200 Dietmar Rieder wrote:
> > 
> >> Hello Christian,
> >>
> >>> Hello,
> >>>
> >>> On Wed, 18 May 2016 13:57:59 +0200 Dietmar Rieder wrote:
> >>>
>  Dear Ceph users,
> 
>  I've a question regarding the memory recommendations for an OSD
>  node.
> 
>  The official Ceph hardware recommendations say that an OSD node
>  should have 1GB Ram / TB OSD [1]
> 
>  The "Reference Architecture" whitpaper from Red Hat & Supermicro
>  says that "typically" 2GB of memory per OSD on a OSD node is used.
>  [2]
> 
> >>> This question has been asked and answered here countless times.
> >>>
> >>> Maybe something a bit more detailed ought to be placed in the first
> >>> location, or simply a reference to the 2nd one. 
> >>> But then again, that would detract from the RH added value.
> >>
> >> thanks for replying, nonetheless.
> >> I checked the list before but I failed to find a definitive answer,
> >> maybe I was not looking hard enough. Anyway, thanks!
> >>
> > They tend to be hidden sometimes in other threads, but there really is a
> > lot...
> 
> It seems so, I'll have to dig deeper into the available discussions...
>
See the recent thread "journal or cache tier on SSDs ?", started by
another academic slightly to your west, for some insights; more below.

> > 
> >>>  
>  According to the recommendation in [1] an OSD node with 24x 8TB OSD
>  disks is "underpowered "  when it is equipped with 128GB of RAM.
>  However, following the "recommendation" in [2] 128GB should be
>  plenty enough.
> 
> >>> It's fine per se, the OSD processes will not consume all of that even
> >>> in extreme situations.
> >>
> >> Ok, if I understood this correctly, then 128GB should be enough also
> >> during rebalancing or backfilling.
> >>
> > Definitely, but realize that during this time of high memory
> > consumption caused by backfilling your system is also under strain from
> > objects moving in and out, so as per the high-density thread you will
> > want all your dentry and other important SLAB objects to stay in RAM.
> > 
> > That's a lot of objects potentially with 8TB, so when choosing DIMMs
> > pick ones that leave you with the option to go to 256GB later if need
> > be.
> 
> Good point, I'll keep this in mind
> 
> > 
> > Also you'll probably have loads of fun playing with CRUSH weights to
> > keep the utilization of these 8TB OSDs within 100GB of each other. 
> 
> I'm afraid that finding the "optimal" settings will demand a lot of
> testing/playing
> 

Optimal settings are another topic; this is just making tiny adjustments to
your CRUSH weights so that the OSDs stay within a few percent of usage of
each other.
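
For example, something along these lines (OSD id and weight are made up,
take the real numbers from "ceph osd df"; an 8TB drive starts out at a
CRUSH weight of roughly 7.28 if it was taken from the raw size):

  # check per-OSD utilization
  ceph osd df tree
  # nudge an over-full OSD down a touch
  ceph osd crush reweight osd.17 7.20

A few hundredths at a time is plenty; every change moves some data around.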

> > 
> >>>
> >>> Very large OSDs and high density storage nodes have other issues and
> >>> challenges, tuning and memory wise.
> >>> There are several threads about these recently, including today.
> >>
> >> Thanks, I'll study these...
> >>
>  I'm wondering which of the two is good enough for a Ceph cluster
>  with 10 nodes using EC (6+3)
> 
> >>> I would spend more time pondering about the CPU power of these
> >>> machines (EC needs more) and what cache tier to get.
> >>
> >> We are planning to equip the OSD nodes with 2x2650v4 CPUs (24 cores @
> >> 2.2GHz), that is 1 core/OSD. For the cache tier each OSD node gets two
> >> 800GB NVMes. We hope this setup will give reasonable performance with
> >> EC.
> >>
> > So you have actually 26 OSDs per node then.
> > I'd say the CPUs are fine, but EC and the NVMes will eat a fair share
> > of it.
> 
> You're right, it is 26 OSDs, but I still assume that with these CPUs we
> will not be completely underpowered.
>
Since you stated your use case I'll say the same; not so much if this were
to be the storage for lots of high-IOPS VMs.
 
> > That's why I prefer to have dedicated cache tier nodes with fewer but
> > faster cores, unless the cluster is going to be very large.
> > With Hammer an 800GB DC S3160 SSD-based OSD can easily saturate an
> > "E5-2623 v3" core @3.3GHz (nearly 2 cores to be precise), and Jewel has
> > optimizations that will both make it faster by itself AND enable it to
> > use more CPU resources as well.
> > 
> 
> That's probably the best solution, but it will not fit within our budget
> and rackspace limits for the first setup. However, when expanding later
> on it will definitely be something to consider, also depending on the
> performance that we obtain with this first setup.
> 
Well, if you're gonna grow this cluster your shared setup will become more
and more effective (but still remain harder to design/specify just right).

> > The NVMes (DC P3700 one presumes?) just for cache tiering, no SSD
> > journals for the OSDs?
> 
> For now we have an offer for HPE 800GB NVMe MU (mixed use), 880MB/s
> write, 2600MB/s read, 3 DW/D.

Re: [ceph-users] OSD node memory sizing

2016-05-19 Thread Dietmar Rieder
Hello,

On 05/19/2016 03:36 AM, Christian Balzer wrote:
> 
> Hello again,
> 
> On Wed, 18 May 2016 15:32:50 +0200 Dietmar Rieder wrote:
> 
>> Hello Christian,
>>
>>> Hello,
>>>
>>> On Wed, 18 May 2016 13:57:59 +0200 Dietmar Rieder wrote:
>>>
 Dear Ceph users,

 I've a question regarding the memory recommendations for an OSD node.

 The official Ceph hardware recommendations say that an OSD node should
 have 1GB RAM / TB OSD [1]

 The "Reference Architecture" whitpaper from Red Hat & Supermicro says
 that "typically" 2GB of memory per OSD on a OSD node is used. [2]

>>> This question has been asked and answered here countless times.
>>>
>>> Maybe something a bit more detailed ought to be placed in the first
>>> location, or simply a reference to the 2nd one. 
>>> But then again, that would detract from the RH added value.
>>
>> thanks for replying, nonetheless.
>> I checked the list before but I failed to find a definitive answer; maybe
>> I was not looking hard enough. Anyway, thanks!
>>
> They tend to be hidden sometimes in other threads, but there really is a lot...

It seems so, I'll have to dig deeper into the available discussions...

> 
>>>  
 According to the recommendation in [1] an OSD node with 24x 8TB OSD
 disks is "underpowered "  when it is equipped with 128GB of RAM.
 However, following the "recommendation" in [2] 128GB should be plenty
 enough.

>>> It's fine per se, the OSD processes will not consume all of that even
>>> in extreme situations.
>>
>> Ok, if I understood this correctly, then 128GB should be enough also
>> during rebalancing or backfilling.
>>
> Definitely, but realize that during this time of high memory consumption
> caused by backfilling your system is also under strain from objects moving
> in and out, so as per the high-density thread you will want all your dentry
> and other important SLAB objects to stay in RAM.
> 
> That's a lot of objects potentially with 8TB, so when choosing DIMMs pick
> ones that leave you with the option to go to 256GB later if need be.

Good point, I'll keep this in mind

> 
> Also you'll probably have loads of fun playing with CRUSH weights to keep
> the utilization of these 8TB OSDs within 100GB of each other. 

I'm afraid that finding the "optimal" settings will demand a lot of
testing/playing

> 
>>>
>>> Very large OSDs and high density storage nodes have other issues and
>>> challenges, tuning and memory wise.
>>> There are several threads about these recently, including today.
>>
>> Thanks, I'll study these...
>>
 I'm wondering which of the two is good enough for a Ceph cluster with
 10 nodes using EC (6+3)

>>> I would spend more time pondering about the CPU power of these machines
>>> (EC needs more) and what cache tier to get.
>>
>> We are planning to equip the OSD nodes with 2x2650v4 CPUs (24 cores @
>> 2.2GHz), that is 1 core/OSD. For the cache tier each OSD node gets two
>> 800GB NVMes. We hope this setup will give reasonable performance with
>> EC.
>>
> So you have actually 26 OSDs per node then.
> I'd say the CPUs are fine, but EC and the NVMes will eat a fair share of
> it.

You're right, it is 26 OSDs, but I still assume that with these CPUs we
will not be completely underpowered.

> That's why I prefer to have dedicated cache tier nodes with fewer but
> faster cores, unless the cluster is going to be very large.
> With Hammer an 800GB DC S3160 SSD-based OSD can easily saturate an
> "E5-2623 v3" core @3.3GHz (nearly 2 cores to be precise), and Jewel has
> optimizations that will both make it faster by itself AND enable it to
> use more CPU resources as well.
> 

That's probably the best solution, but it will not fit within our budget
and rackspace limits for the first setup. However, when expanding later
on it will definitely be something to consider, also depending on the
performance that we obtain with this first setup.

> The NVMes (DC P3700 one presumes?) just for cache tiering, no SSD
> journals for the OSDs?

For now we have an offer for HPE 800GB NVMe MU (mixed use), 880MB/s
write, 2600MB/s read, 3 DW/D. So they are as fast as the DC P3700; we will
probably also check other options.

> What are your network plans then, as in is your node storage bandwidth a
> good match for your network bandwidth? 
>

For the network we will have 2x10GBit bonded for the cluster-internal
traffic and 2x10GBit bonded towards the clients, plus 1GBit for
administration.
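
As a rough sanity check (the ~150MB/s sequential per 8TB spinner is an
assumption on my part):

  # aggregate disk bandwidth per node vs. bonded network line rate
  echo $((24 * 150))   # ~3600 MB/s from 24 spinners
  echo $((2 * 1250))   # ~2500 MB/s for 2x10GBit

so for purely sequential streams the bonded links, not the disks, would be
the ceiling.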


>>> That is, if performance is a requirement in your use case.
>>
>> Always, who wouldn't care about performance?  :-)
>>
> "Good enough" sometimes really is good enough.
> 
> Since you're going for 8TB OSDs, EC and 10 nodes it feels that for you
> space is important, so something like archival, not RBD images for high
> performance VMs.
> 
> What is your use case?


You're right, space is most important. Our use case is not serving RBD
for VMs. We will mainly store genomic data on cephfs volumes and access it
from a computing cluster for analysis. This 

Re: [ceph-users] OSD node memory sizing

2016-05-18 Thread Christian Balzer

Hello again,

On Wed, 18 May 2016 15:32:50 +0200 Dietmar Rieder wrote:

> Hello Christian,
> 
> > Hello,
> > 
> > On Wed, 18 May 2016 13:57:59 +0200 Dietmar Rieder wrote:
> > 
> >> Dear Ceph users,
> >>
> >> I've a question regarding the memory recommendations for an OSD node.
> >>
> >> The official Ceph hardware recommendations say that an OSD node should
> >> have 1GB RAM / TB OSD [1]
> >>
> >> The "Reference Architecture" whitpaper from Red Hat & Supermicro says
> >> that "typically" 2GB of memory per OSD on a OSD node is used. [2]
> >>
> > This question has been asked and answered here countless times.
> > 
> > Maybe something a bit more detailed ought to be placed in the first
> > location, or simply a reference to the 2nd one. 
> > But then again, that would detract from the RH added value.
> 
> thanks for replying, nonetheless.
> I checked the list before but I failed to find a definitive answer; maybe
> I was not looking hard enough. Anyway, thanks!
> 
They tend to be hidden sometimes in other threads, but there really is a lot...

> >  
> >> According to the recommendation in [1] an OSD node with 24x 8TB OSD
> >> disks is "underpowered "  when it is equipped with 128GB of RAM.
> >> However, following the "recommendation" in [2] 128GB should be plenty
> >> enough.
> >>
> > It's fine per se, the OSD processes will not consume all of that even
> > in extreme situations.
> 
> Ok, if I understood this correctly, then 128GB should be enough also
> during rebalancing or backfilling.
> 
Definitely, but realize that during this time of high memory consumption
caused by backfilling your system is also under strain from objects moving
in and out, so as per the high-density thread you will want all your dentry
and other important SLAB objects to stay in RAM.
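
If you want to bias the kernel towards keeping those caches and take some
pressure off during recovery, a starting point could look like this (the
values and the sysctl file name are just examples, not gospel):

  # prefer keeping dentry/inode objects over reclaiming them
  echo 'vm.vfs_cache_pressure = 10' > /etc/sysctl.d/90-ceph-osd.conf
  sysctl -p /etc/sysctl.d/90-ceph-osd.conf
  # throttle backfill/recovery so memory and I/O spikes stay civil
  ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'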

That's a lot of objects potentially with 8TB, so when choosing DIMMs pick
ones that leave you with the option to go to 256GB later if need be.

Also you'll probably have loads of fun playing with CRUSH weights to keep
the utilization of these 8TB OSDs within 100GB of each other. 

> > 
> > Very large OSDs and high density storage nodes have other issues and
> > challenges, tuning and memory wise.
> > There are several threads about these recently, including today.
> 
> Thanks, I'll study these...
> 
> >> I'm wondering which of the two is good enough for a Ceph cluster with
> >> 10 nodes using EC (6+3)
> >>
> > I would spend more time pondering about the CPU power of these machines
> > (EC needs more) and what cache tier to get.
> 
> We are planning to equip the OSD nodes with 2x2650v4 CPUs (24 cores @
> 2.2GHz), that is 1 core/OSD. For the cache tier each OSD node gets two
> 800GB NVMes. We hope this setup will give reasonable performance with
> EC.
> 
So you have actually 26 OSDs per node then.
I'd say the CPUs are fine, but EC and the NVMes will eat a fair share of
it.
That's why I prefer to have dedicated cache tier nodes with fewer but
faster cores, unless the cluster is going to be very large.
With Hammer an 800GB DC S3160 SSD-based OSD can easily saturate an
"E5-2623 v3" core @3.3GHz (nearly 2 cores to be precise), and Jewel has
optimizations that will both make it faster by itself AND enable it to
use more CPU resources as well.

The NVMes (DC P3700 one presumes?) just for cache tiering, no SSD
journals for the OSDs?
What are your network plans then, as in is your node storage bandwidth a
good match for your network bandwidth? 

> > That is, if performance is a requirement in your use case.
> 
> Always, who wouldn't care about performance?  :-)
> 
"Good enough" sometimes really is good enough.

Since you're going for 8TB OSDs, EC and 10 nodes it feels that for you
space is important, so something like archival, not RBD images for high
performance VMs.

What is your use case?

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/


Re: [ceph-users] OSD node memory sizing

2016-05-18 Thread Dietmar Rieder
Hello Christian,

> Hello,
> 
> On Wed, 18 May 2016 13:57:59 +0200 Dietmar Rieder wrote:
> 
>> Dear Ceph users,
>>
>> I've a question regarding the memory recommendations for an OSD node.
>>
>> The official Ceph hardware recommendations say that an OSD node should
>> have 1GB RAM / TB OSD [1]
>>
>> The "Reference Architecture" whitpaper from Red Hat & Supermicro says
>> that "typically" 2GB of memory per OSD on a OSD node is used. [2]
>>
> This question has been asked and answered here countless times.
> 
> Maybe something a bit more detailed ought to be placed in the first
> location, or simply a reference to the 2nd one. 
> But then again, that would detract from the RH added value.

thanks for replying, nonetheless.
I checked the list before but I failed to find a definitive answer; maybe
I was not looking hard enough. Anyway, thanks!

>  
>> According to the recommendation in [1] an OSD node with 24x 8TB OSD
>> disks is "underpowered "  when it is equipped with 128GB of RAM.
>> However, following the "recommendation" in [2] 128GB should be plenty
>> enough.
>>
> It's fine per se, the OSD processes will not consume all of that even in
> extreme situations.

Ok, if I understood this correctly, then 128GB should be enough also
during rebalancing or backfilling.

> 
> Very large OSDs and high density storage nodes have other issues and
> challenges, tuning and memory wise.
> There are several threads about these recently, including today.

Thanks, I'll study these...

>> I'm wondering which of the two is good enough for a Ceph cluster with 10
>> nodes using EC (6+3)
>>
> I would spend more time pondering about the CPU power of these machines
> (EC needs more) and what cache tier to get.

We are planning to equip the OSD nodes with 2x2650v4 CPUs (24 cores @
2.2GHz), that is 1 core/OSD. For the cache tier each OSD node gets two
800GB NVMes. We hope this setup will give reasonable performance with EC.
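
Roughly, the idea is to put a small replicated NVMe pool in front of the EC
data pool, along these lines (pool names, PG counts and the target size are
placeholders; a CRUSH rule that maps the cache pool onto the NVMe OSDs is
assumed, as is an EC profile named "ec63"):

  ceph osd pool create cephfs-data 2048 2048 erasure ec63
  ceph osd pool create nvme-cache 512 512 replicated
  ceph osd tier add cephfs-data nvme-cache
  ceph osd tier cache-mode nvme-cache writeback
  ceph osd tier set-overlay cephfs-data nvme-cache
  ceph osd pool set nvme-cache hit_set_type bloom
  # cap the cache below the usable NVMe capacity
  ceph osd pool set nvme-cache target_max_bytes 4000000000000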

> That is, if performance is a requirement in your use case.

Always, who wouldn't care about performance?  :-)

Dietmar

-- 
_
D i e t m a r  R i e d e r, Mag.Dr.
Innsbruck Medical University
Biocenter - Division for Bioinformatics
Innrain 80, 6020 Innsbruck
Phone: +43 512 9003 71402
Fax: +43 512 9003 73100
Email: dietmar.rie...@i-med.ac.at
Web:   http://www.icbi.at







[ceph-users] OSD node memory sizing

2016-05-18 Thread Dietmar Rieder
Dear Ceph users,

I've a question regarding the memory recommendations for an OSD node.

The official Ceph hardware recommendations say that an OSD node should
have 1GB RAM / TB OSD [1]

The "Reference Architecture" whitpaper from Red Hat & Supermicro says
that "typically" 2GB of memory per OSD on a OSD node is used. [2]

According to the recommendation in [1] an OSD node with 24x 8TB OSD
disks is "underpowered "  when it is equipped with 128GB of RAM.
However, following the "recommendation" in [2] 128GB should be plenty
enough.
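
Spelled out, that is just:

  # per [1]: 1GB of RAM per TB of OSD capacity
  echo $((24 * 8 * 1))   # 192 GB for 24x 8TB
  # per [2]: ~2GB of RAM per OSD daemon
  echo $((24 * 2))       # 48 GB for 24 OSDs

so the two documents differ by roughly a factor of four for such a node.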

I'm wondering which of the two is good enough for a Ceph cluster with 10
nodes using EC (6+3)
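
For reference, the profile I have in mind would be roughly this (Jewel
syntax, the profile name is just a placeholder; with k=6 and m=3 every
object spans 9 of the 10 hosts, so host is the failure domain and only one
spare host is left to recover onto):

  ceph osd erasure-code-profile set ec63 k=6 m=3 ruleset-failure-domain=host
  ceph osd erasure-code-profile get ec63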

Thanks for any comment
  Dietmar

[1] http://docs.ceph.com/docs/jewel/start/hardware-recommendations/
[2]
https://www.redhat.com/en/files/resources/en-rhst-cephstorage-supermicro-INC0270868_v2_0715.pdf

-- 
_
D i e t m a r  R i e d e r, Mag.Dr.
Innsbruck Medical University
Biocenter - Division for Bioinformatics




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com