On Sun, May 12, 2013 at 03:14:15PM +0200, Tim Mohlmann wrote:
> Hi,
> 
> On Saturday 11 May 2013 16:04:27 Leen Besselink wrote:
>  
> > Someone is going to correct me if I'm wrong, but I think you misread
> > something.
> >
> >
> > The Mon-daemon doesn't need that much RAM:
> > 
> > The 'RAM: 1 GB per daemon' is per Mon-daemon, not per OSD-daemon.
> > 
> Gosh, I feel embarrassed. This actually was my main concern / bottleneck.
> Thanks for pointing this out. It seems Ceph really rocks for deploying
> affordable data clusters.
> 

I did see you mentioned you wanted to have many disks in the same machine,
not just machines with, let's say, 12 disks.

Did you know you need the CPU power of a 1 GHz Xeon core per OSD for the
times when recovery is happening?
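
To put a rough number on that for the 36-bay nodes you describe, here is a
back-of-the-envelope sketch using only the figures mentioned in this thread
(about 1 GHz of Xeon core per OSD during recovery, roughly 500 MB of RAM per
OSD daemon and 1 GB per Mon daemon). The little Python script and the
per-node totals are just my own illustration under those assumptions, not
official guidance:

    # Back-of-the-envelope sizing for one 36-bay OSD node.
    # Assumptions (taken from this thread / the hardware-recommendations
    # page, not measured values): one OSD daemon per disk, ~1 GHz of Xeon
    # core per OSD during recovery, ~500 MB of RAM per OSD daemon,
    # ~1 GB of RAM per Mon daemon.

    osds_per_node = 36              # one OSD per 3.5" drive bay
    ghz_per_osd_recovery = 1.0      # CPU rule of thumb during recovery
    ram_per_osd_gb = 0.5            # ~500 MB per OSD daemon
    ram_per_mon_gb = 1.0            # per Mon daemon, NOT per OSD

    # Two Xeon E5620s: 2 sockets x 4 physical cores x 2.4 GHz
    # (hyper-threads not counted in this rough estimate).
    cpu_ghz_available = 2 * 4 * 2.4

    cpu_ghz_needed = osds_per_node * ghz_per_osd_recovery
    osd_ram_gb = osds_per_node * ram_per_osd_gb

    print("CPU needed during recovery: %.1f GHz" % cpu_ghz_needed)
    print("CPU available (E5620 x 2):  %.1f GHz" % cpu_ghz_available)
    print("OSD RAM rule of thumb:      %.1f GB" % osd_ram_gb)
    print("RAM for one Mon daemon:     %.1f GB" % ram_per_mon_gb)

By that rule of thumb, two E5620s give you roughly 19 GHz of physical cores
against about 36 GHz wanted during recovery, so with 36 OSDs per node it may
be the CPUs rather than the RAM that you want to double-check.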

> Regards, Tim
> 
> > On Sat, May 11, 2013 at 03:42:59PM +0200, Tim Mohlmann wrote:
> > > Hi,
> > > 
> > > First of all, I am new to Ceph and this mailing list. At this moment I am
> > > looking into the possibility of getting involved in the storage business. I
> > > am trying to get an estimate of the costs, and after that I will start to
> > > determine how to generate sufficient income.
> > > 
> > > First I will describe my case; at the bottom you will find my questions.
> > > 
> > > 
> > > GENERAL LAYOUT:
> > > 
> > > Part of this cost calculation is of course hardware. For the larger part
> > > I've already figured it out. In my plans I will be leasing a full rack
> > > (46U). Depending on the domestic needs I will be using 36 or 40U for OSD
> > > storage servers. (I will assume 36U from here on, to keep a solid value
> > > for the calculation and to have enough spare space for extra devices.)
> > > 
> > > Each OSD server uses 4U and can take 36x3.5" drives. So in 36U I can put
> > > 36/4=9 OSD servers, containing 9*36=324 HDDs.
> > > 
> > > 
> > > HARD DISK DRIVES
> > > 
> > > I have been looking at the Western Digital RE and Red series. RE is more
> > > expensive per GB, but has a higher MTBF and offers a 4TB model. Red is
> > > really cheap per GB, but only goes up to 3TB.
> > > 
> > > In my current calculations it does not matter much whether I put in the
> > > expensive WD RE 4TB disks or the cheaper WD Red 3TB ones; the price per GB
> > > over the complete cluster expense and 3 years of running costs (including
> > > AFR) is almost the same.
> > > 
> > > So basically, if I can reduce the costs of all the other components used
> > > in the cluster, I will go for the 3TB disks, and if the costs turn out
> > > higher than my first calculation, I will use the 4TB disks.
> > > 
> > > Let's assume 4TB from now on. So, 4*324=1296TB. So let's go petabyte ;).
> > > 
> > > 
> > > NETWORK
> > > 
> > > I will use a redundant 2x10GbE network connection for each node. Two
> > > independent 10GbE switches will be used, and I will use bonding between the
> > > interfaces on each node. (Thanks to some guy in the #ceph IRC channel for
> > > pointing this option out.) I will use VLANs to split the front-side,
> > > back-side and Internet networks.
> > > 
> > > 
> > > OSD SERVER
> > > 
> > > SuperMicro based, 36 hot-swap 3.5" HDD bays, dual-socket mainboard with 16
> > > DIMM sockets. It is advertised that they can take up to 512GB of RAM. I
> > > will install 2x Intel Xeon E5620 2.40GHz processors, with 4 cores and 8
> > > threads each. About the RAM I am in doubt (see below). I am looking into
> > > running 1 OSD per disk.
> > > 
> > > 
> > > MON AND MDS SERVERS
> > > 
> > > Now comes the big question: what specs are required? At first I had the
> > > plan to use 4 SuperMicro superservers, with 4-socket mainboards that can
> > > take the new 16-core AMD processors and up to 1TB of RAM.
> > > 
> > > I want all 4 of those servers to run a MON service, an MDS service and
> > > customer / public services. I would probably use VMs (KVM) to separate
> > > them. I will compile my own kernel to enable Kernel Samepage Merging,
> > > hugepage support and memory compaction to make RAM use more efficient. The
> > > requirements for my public services will be added on top, once I know what
> > > I need for MON and MDS.
> > > 
> > > 
> > > RAM FOR ALL SERVERS
> > > 
> > > So what would you estimate the RAM usage to be?
> > > http://ceph.com/docs/master/install/hardware-recommendations/#minimum-hardware-recommendations
> > > 
> > > Sounds OK for the OSD part. 500 MB per daemon would put the minimum RAM
> > > requirement for my OSD servers at 18GB, so 32GB should be more than enough.
> > > Although, I would like to know whether it is possible to use btrfs
> > > compression; in that case I'd need more RAM in there.
> > > 
> > > What I really want to know: how much RAM do I need for the MON and MDS
> > > servers? 1GB per daemon sounds pretty steep. As everybody knows, RAM is
> > > expensive!
> > > 
> > > In my case I would need at least 324GB of RAM for each of them. Initially
> > > I was planning to use 4 servers, each of them running both services.
> > > Combining that with the other duties those systems have to perform, I
> > > would need the full 1TB of RAM. I would need to use 32GB modules, which are
> > > really expensive per GB and difficult to find (not many server hardware
> > > vendors in the Netherlands have them).
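
(A quick inline note on this particular sum, since it is what the correction
at the top of my mail is about: the 1 GB figure is per monitor daemon, not
per OSD, and the same goes for the 10 GB of monitor store you mention further
down. Assuming the usual 3 or 5 monitors, which is my assumption and not
something from your mail, the arithmetic looks more like this:)

    # Monitor requirements scale with the number of Mon daemons,
    # not with the number of OSDs. The monitor counts below are assumed
    # (a typical odd-numbered quorum), not taken from the original mail.
    ram_per_mon_gb = 1.0     # RAM per Mon daemon
    store_per_mon_gb = 10.0  # disk space per Mon daemon

    for mons in (3, 5):
        print("%d mons: about %.0f GB RAM and %.0f GB of store in total"
              % (mons, mons * ram_per_mon_gb, mons * store_per_mon_gb))

(So the monitors themselves come to a few GB of RAM and a few tens of GB of
disk, not hundreds of GB.)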
> > > 
> > > 
> > > QUESTIONS
> > > 
> > > Question 1: Is it really the number of OSDs that counts for MON and MDS
> > > RAM usage, or the size of the object store?
> > > 
> > > Question 2: Can I do it with less RAM? Any statistics, or better: a
> > > calculation? I can imagine memory pages becoming redundant (and thus
> > > mergeable) as the cluster grows, so less memory would be required per OSD.
> > > 
> > > Question 3: If it is the number of OSDs that counts, would it be
> > > beneficial to combine disks in a RAID 0 (LVM or btrfs) array?
> > > 
> > > Question 4: Is it safe / possible to store the MON files inside the
> > > cluster itself? The 10GB per daemon requirement would mean I need 3240GB
> > > of storage for each MON, meaning I need to get some huge disks and an
> > > (LVM) RAID 1 array for redundancy, while I have a huge redundant file
> > > system at hand already.
> > > 
> > > Question 5: Is it possible to enable btrfs compression? I know btrfs is
> > > not yet stable for production, but it would be nice if compression were
> > > supported in the future, when it does become stable.
> > > 
> > > If the RAM requirement is not so steep, I am thinking about the
> > > possibility of running the MON service on 4 of the OSD servers. Upgrading
> > > them to 16x16GB of RAM would give me 256GB of RAM. (Again, 32GB modules
> > > are too expensive and not an option.) This would make 2 of the
> > > superservers obsolete, decrease the workload of the remaining ones, and
> > > keep some spare computing power for future growth. The only reason I
> > > needed them was RAM capacity.
> > > 
> > > Getting rid of 2 superservers will provide me with enough space to fit a
> > > 10th storage server. This will considerably reduce the total cost per GB
> > > of this cluster. (Comparing all the hardware without the HDDs, the 4
> > > superservers are the most expensive part.)
> > > 
> > > I completely understand if you think: hey, that kind of thing should be
> > > corporate advice, etc. Please understand that I am just an individual,
> > > working his (non-IT) job, who has Linux and open source as a hobby. I have
> > > just started brainstorming on some business opportunities. If this story
> > > turns out to be feasible, I will use this information to make a business
> > > and investment plan and look for investors.
> > > 
> > > Thanks and best regards,
> > > 
> > > Tim Mohlmann
> > > 
> > > 
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
