Re: [ceph-users] Ceph Supermicro hardware recommendation
17.02.2015 04:11, Christian Balzer wrote:
> Hello,
>
> re-adding the mailing list.
>
> On Mon, 16 Feb 2015 17:54:01 +0300 Mike wrote:
>> Hello
>>
>> 05.02.2015 08:35, Christian Balzer wrote:
>>>>>> LSI 2308 IT
>>>>>> 2 x SSD Intel DC S3700 400GB
>>>>>> 2 x SSD Intel DC S3700 200GB
>>>>> Why the separation of SSDs? They aren't going to be that busy with regards to the OS.
>>>> We would like to use the 400GB SSDs for a cache pool and the 200GB SSDs for journaling.
>>> Don't, at least not like that.
>>> First and foremost, SSD-based OSDs/pools have different requirements, especially when it comes to CPU. Mixing your HDD- and SSD-based OSDs in the same chassis is generally a bad idea.
>> Why? If we have, for example, a SuperServer 6028U-TR4+ with a proper configuration (4 x SSD DC S3700 for the cache pool / 8 x 6-8TB SATA HDDs for cold storage / E5-2695 v3 CPU / 128GB RAM), why is it still a bad idea? Is it something inside Ceph that doesn't work well?
> Ceph in and by itself will of course work. But your example up there is total overkill on one hand and simply not balanced on the other. You'd be much better off (both performance- and price-wise) going with something less powerful for an HDD storage node, like this:
> http://www.supermicro.com/products/system/2U/6027/SSG-6027R-E1R12T.cfm
> with 2 400GB Intels in the back for journals and 16 cores total. Your SSD-based storage nodes, meanwhile, could be nicely dense using something like:
> http://www.supermicro.com/products/system/2U/2028/SYS-2028TP-DC0TR.cfm
> with 2 E5-2690 v3 per node (I'd actually prefer the E5-2687W v3, but those run too hot). Alternatively, one of the 1U cases with up to 10 SSDs.
> Also, maintaining a CRUSH map that separates the SSD pools from the HDD pools is made a lot easier and less error prone by segregating nodes into SSD and HDD ones.
> There are several more reasons below.

Yes, these are normal configuration variants. But that way you have two different node types instead of one, which requires more support inside the company.
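The CRUSH-map separation Christian mentions is usually done with one root and rule per media type. A decompiled-map sketch along those lines (host names, IDs and weights are invented for illustration):

```
# Sketch: separate CRUSH roots for HDD and SSD hosts, plus a rule
# that pins a pool to the SSD root. Hypothetical names and weights.
root hdd {
        id -1
        alg straw
        hash 0  # rjenkins1
        item hdd-node-1 weight 48.000
        item hdd-node-2 weight 48.000
}
root ssd {
        id -5
        alg straw
        hash 0  # rjenkins1
        item ssd-node-1 weight 1.600
        item ssd-node-2 weight 1.600
}
rule ssd {
        ruleset 1
        type replicated
        min_size 1
        max_size 10
        step take ssd
        step chooseleaf firstn 0 type host
        step emit
}
```

A pool is then pointed at the SSD rule with `ceph osd pool set <pool> crush_ruleset 1`. Keeping each host under exactly one root is what becomes error prone when a single chassis holds both media types.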
In the whole setup you will have one configuration each for the MON, OSD and SSD-cache servers, and another configuration for the compute nodes. A lot of support, supplies, attention. That's why we are trying to reduce the number of configurations to support. It's a balance of support versus cost/speed/etc.

>> For me a cache pool is a first, small, fast tier of storage in front of the big, slow storage.
> That's the idea, yes. But besides the performance problems I'm listing again below, that "small" is another problem, very difficult to judge in advance. By mixing your cache-pool SSD OSDs into the HDD OSD chassis, you're making yourself inflexible in that area (as in: just add another SSD cache-pool node when needed).

Yes, it is inflexible in some ways, but I have one configuration instead of two and can grow the cluster simply by adding nodes.

>> You don't need the journal anymore, and if you need to, you can enlarge the fast storage.
> You still need the journal of course; it's (unfortunately in some cases) a basic requirement in Ceph. I suppose what you meant is that you don't need the journal on SSDs anymore. And while that is true, it makes your slow storage at least twice as slow, which at some point (deep-scrub, data re-balancing, a very busy cluster) is likely to make you wish you had those journal SSDs.

Yes, the journal on cold storage is needed for re-balancing the cluster if some node/HDD fails, or for promoting/evicting objects from the SSD cache. I remember an email on this list from one of the Inktank guys (sorry, I don't remember his full email and name); he wrote that you don't need journal SSDs if you use a cache pool.

> If you really want to use SSD-based OSDs, go at least with Giant, probably better even to wait for Hammer. Otherwise your performance will be nowhere near the investment you're making. Read up in the ML archives about SSD-based clusters and their performance, as well as cache pools.
> Which brings us to the second point: cache pools are pretty pointless currently when it comes to performance.
> So unless you're planning to use EC pools, you will gain very little from them.

So, are SSD cache pools useless at all?

> They're (currently) not performing all that well; ask people on the ML who're actually using them.

By now that's true; I'm reading the ML every day.

> This is a combination of Ceph currently being unable to fully utilize the potential of SSDs in general, and of the cache-pool code (having to promote/demote whole objects, mainly) in particular. Both of these things are of course known to the Ceph developers and being improved, but right now I don't think they will give you what you expect from them.
> I would build a good, solid, classic Ceph cluster at this point in time and have a small cache pool for testing. Once that pool performs to your satisfaction, you can always grow it. Another reason to keep SSD-based storage nodes separate.
>
> Christian

Thanks for the answer!

___
ceph-users mailing list
ceph-users@lists.ceph.com
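For reference, the "small cache pool for testing" Christian suggests is wired up with the tiering commands below. The pool names, PG count and size target are made up, and the exact knobs differ between Firefly, Giant and Hammer, so check the documentation for your release:

```
# Create a small SSD-backed pool and attach it as a writeback cache
# in front of an existing base pool (here 'rbd'); names and sizes
# are illustrative only.
ceph osd pool create cache-test 128 128
ceph osd tier add rbd cache-test
ceph osd tier cache-mode cache-test writeback
ceph osd tier set-overlay rbd cache-test
ceph osd pool set cache-test target_max_bytes 107374182400  # ~100 GB
```

Growing the pool later is then just a matter of adding SSD OSDs under the cache pool's CRUSH root and raising `target_max_bytes`.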
Re: [ceph-users] Ceph Supermicro hardware recommendation
I ended up destroying the EC pool and starting over. It was killing all of my OSD machines, and I couldn't keep anything working right with EC in use. So, no core dumps, and I'm not in a place to reproduce it easily anymore. This was with Giant on Ubuntu 14.04.

On Thu Feb 12 2015 at 7:07:38 AM Mark Nelson <mnel...@redhat.com> wrote:
> On 02/08/2015 10:41 PM, Scott Laird wrote:
>> Does anyone have a good recommendation for per-OSD memory for EC? My EC test blew up in my face when my OSDs suddenly spiked to 10+ GB per OSD process as soon as any reconstruction was needed. Which (of course) caused OSDs to OOM, which meant more reconstruction, which fairly immediately led to a dead cluster. This was with Giant. Is this typical?
> Doh, that shouldn't happen. Can you reproduce it? It would be especially nice if we could get a core dump, or if you could make it happen under valgrind. If the CPUs are spinning, even a perf report might prove useful.
>> On Fri Feb 06 2015 at 2:41:50 AM Mohamed Pakkeer <mdfakk...@gmail.com> wrote:
>>> Hi all,
>>> We are building an EC cluster with a cache tier for CephFS. We are planning to use the following 1U chassis along with Intel SSD DC S3700s for the cache tier. It has 10 x 2.5" slots. Could you recommend a suitable Intel processor and amount of RAM to cater for 10 SSDs?
>>> http://www.supermicro.com/products/system/1U/1028/SYS-1028R-WTRT.cfm
>>> Regards
>>> K. Mohamed Pakkeer
>>>
>>> On Fri, Feb 6, 2015 at 2:57 PM, Stephan Seitz <s.se...@heinlein-support.de> wrote:
>>>> Hi,
>>>> Am Dienstag, den 03.02.2015, 15:16 +0000 schrieb Colombo Marco:
>>>>> Hi all,
>>>>> I have to build a new Ceph storage cluster; after I've read the hardware recommendations and some mail from this mailing list, I would like to buy these servers:
>>>> just FYI: SuperMicro already focuses on Ceph with a product line:
>>>> http://www.supermicro.com/solutions/datasheet_Ceph.pdf
>>>> http://www.supermicro.com/solutions/storage_ceph.cfm
>>>> regards,
>>>> Stephan Seitz
>>>> --
>>>> Heinlein Support GmbH
>>>> Schwedter Str. 8/9b, 10119 Berlin
>>>> http://www.heinlein-support.de
>>>> Tel: 030 / 405051-44
>>>> Fax: 030 / 405051-19
>>>> Zwangsangaben lt. §35a GmbHG: HRB 93818 B / Amtsgericht Berlin-Charlottenburg, Geschäftsführer: Peer Heinlein -- Sitz: Berlin
>>> --
>>> Thanks & Regards
>>> K. Mohamed Pakkeer
>>> Mobile: 0091-8754410114
Re: [ceph-users] Ceph Supermicro hardware recommendation
On 02/08/2015 10:41 PM, Scott Laird wrote:
> Does anyone have a good recommendation for per-OSD memory for EC? My EC test blew up in my face when my OSDs suddenly spiked to 10+ GB per OSD process as soon as any reconstruction was needed. Which (of course) caused OSDs to OOM, which meant more reconstruction, which fairly immediately led to a dead cluster. This was with Giant. Is this typical?

Doh, that shouldn't happen. Can you reproduce it? It would be especially nice if we could get a core dump, or if you could make it happen under valgrind. If the CPUs are spinning, even a perf report might prove useful.
Re: [ceph-users] Ceph Supermicro hardware recommendation
Does anyone have a good recommendation for per-OSD memory for EC? My EC test blew up in my face when my OSDs suddenly spiked to 10+ GB per OSD process as soon as any reconstruction was needed. Which (of course) caused OSDs to OOM, which meant more reconstruction, which fairly immediately led to a dead cluster. This was with Giant. Is this typical?

On Fri Feb 06 2015 at 2:41:50 AM Mohamed Pakkeer <mdfakk...@gmail.com> wrote:
> Hi all,
> We are building an EC cluster with a cache tier for CephFS. We are planning to use the following 1U chassis along with Intel SSD DC S3700s for the cache tier. It has 10 x 2.5" slots. Could you recommend a suitable Intel processor and amount of RAM to cater for 10 SSDs?
> http://www.supermicro.com/products/system/1U/1028/SYS-1028R-WTRT.cfm
> Regards
> K. Mohamed Pakkeer
Re: [ceph-users] Ceph Supermicro hardware recommendation
Hi all,

We are building an EC cluster with a cache tier for CephFS. We are planning to use the following 1U chassis along with Intel SSD DC S3700s for the cache tier. It has 10 x 2.5" slots. Could you recommend a suitable Intel processor and amount of RAM to cater for 10 SSDs?

http://www.supermicro.com/products/system/1U/1028/SYS-1028R-WTRT.cfm

Regards
K. Mohamed Pakkeer

On Fri, Feb 6, 2015 at 2:57 PM, Stephan Seitz <s.se...@heinlein-support.de> wrote:
> Hi,
> Am Dienstag, den 03.02.2015, 15:16 +0000 schrieb Colombo Marco:
>> Hi all,
>> I have to build a new Ceph storage cluster; after I've read the hardware recommendations and some mail from this mailing list, I would like to buy these servers:
> just FYI: SuperMicro already focuses on Ceph with a product line:
> http://www.supermicro.com/solutions/datasheet_Ceph.pdf
> http://www.supermicro.com/solutions/storage_ceph.cfm
> regards,
> Stephan Seitz
Re: [ceph-users] Ceph Supermicro hardware recommendation
Hi Christian,

On 04/02/15 02:39, Christian Balzer <ch...@gol.com> wrote:
> On Tue, 3 Feb 2015 15:16:57 +0000 Colombo Marco wrote:
>> Hi all,
>> I have to build a new Ceph storage cluster; after I've read the hardware recommendations and some mail from this mailing list, I would like to buy these servers:
> Nick mentioned a number of things already I totally agree with, so don't be surprised if some of this feels like a repeat.
>> OSD:
>> SSG-6027R-E1R12L - http://www.supermicro.nl/products/system/2U/6027/SSG-6027R-E1R12L.cfm
>> Intel Xeon E5-2630 v2
>> 64 GB RAM
> As Nick said, v3 and more RAM might be helpful; depending on your use case (small writes versus large ones), even faster CPUs as well.

Ok, we'll switch from v2 to v3 and from 64 to 96 GB of RAM.

>> LSI 2308 IT
>> 2 x SSD Intel DC S3700 400GB
>> 2 x SSD Intel DC S3700 200GB
> Why the separation of SSDs? They aren't going to be that busy with regards to the OS.

We would like to use the 400GB SSDs for a cache pool and the 200GB SSDs for journaling.

> Get a case like Nick mentioned with 2 2.5" bays in the back, put 2 DC S3700 400GBs in there (connected to onboard 6Gb/s SATA3), and partition them so that you have a RAID1 for the OS and plain partitions for the journals of the now 12 OSD HDDs in your chassis.
> Of course this optimization in terms of cost and density comes with a price: if one SSD should fail, you will have 6 OSDs down. Given how reliable the Intels are this is unlikely, but something you need to consider.
> If you want to limit the impact of an SSD failure and have just 2 OSD journals per SSD, get a chassis like the one above and 4 DC S3700 200GBs, RAID10 them for the OS and put 2 journal partitions on each. I did the same with 8 3TB HDDs and 4 DC S3700 100GBs; the HDDs (and the CPU, with 4KB IOPS) are the limiting factor, not the SSDs.
>> 8 x HDD Seagate Enterprise 6TB
> Are you really sure you need that density? One disk failure will result in a LOT of data movement once these become somewhat full.
> If you were to go for a 12-OSD node as described above, consider 4TB drives for the same overall density, while having more IOPS and likely the same price or less.

We chose the 6TB disks because we need a lot of storage in a small number of servers, and we prefer servers without too many disks. However, we plan to use at most 80% of each 6TB disk.

>> 2 x 40GbE for backend network
> You'd be lucky to write more than 800MB/s sustained to your 8 HDDs (remember they will have to deal with competing reads and writes; this is not a sequential synthetic write benchmark). Incidentally, 1GB/s to 1.2GB/s (depending on configuration) would also be the limit of your journal SSDs.
> Other than backfilling caused by cluster changes (OSDs removed/added), your limitation is nearly always going to be IOPS, not bandwidth.

Ok, after some discussion, we'll switch to 2 x 10GbE.

> So 2x10GbE or, if you're comfortable with it (I am ^o^), an Infiniband backend (can be cheaper, less latency, plans for RDMA support in Ceph) should be more than sufficient.
>> 2 x 10GbE for public network
>> META/MON:
>> SYS-6017R-72RFTP - http://www.supermicro.com/products/system/1U/6017/SYS-6017R-72RFTP.cfm
>> 2 x Intel Xeon E5-2637 v2
>> 4 x SSD Intel DC S3500 240GB raid 1+0
> You're likely to get better performance and of course MUCH better durability by using 2 DC S3700s, at about the same price.

Ok, we'll switch to 2 x SSD DC S3700.

>> 128 GB RAM
> Total overkill for a MON, but I have no idea about the MDS, and RAM never hurts.

Ok, we'll switch from 128 to 96.

> In your follow-up you mentioned 3 mons. I would suggest putting 2 more mons (only, not MDS) on OSD nodes and making sure that within the IP numbering the real mons have the lowest IP addresses, because the MON with the lowest IP becomes master (and thus the busiest). This way you can survive the loss of 2 nodes and still have a valid quorum.

Ok, got it.

> Christian
>> 2 x 10 GbE
>> What do you think? Any feedback, advice, or ideas are welcome!
>> Thanks so much
>> Regards,
> --
> Christian Balzer    Network/Systems Engineer
> ch...@gol.com    Global OnLine Japan/Fusion Communications
> http://www.gol.com/

Thanks so much!
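Christian's journal layout quoted above (two 400GB DC S3700s, each holding a RAID1 OS member plus six journal partitions for 12 OSDs) leaves a lot of headroom, which a quick back-of-the-envelope check shows. The OS-partition size and throughput figures below are assumptions, not measurements:

```python
# Sanity-checking Christian's journal layout: two 400 GB DC S3700s,
# each holding a RAID1 OS member plus 6 journal partitions (12 OSDs).
# The OS-partition size and throughput figures are assumptions.

SSD_GB = 400          # capacity per journal SSD
OS_PART_GB = 40       # RAID1 member for the OS (assumed size)
JOURNALS_PER_SSD = 6  # 12 OSD HDDs split across 2 SSDs

# Common journal sizing rule of thumb:
#   2 * (expected throughput per journal) * (filestore max sync interval)
ssd_write_mbs = 450   # sequential write, one DC S3700 400GB (approx.)
sync_interval_s = 5   # filestore max sync interval (default)
journal_gb = 2 * (ssd_write_mbs / JOURNALS_PER_SSD) * sync_interval_s / 1000

used_gb = OS_PART_GB + JOURNALS_PER_SSD * journal_gb
print(f"rule-of-thumb journal size: ~{journal_gb:.2f} GB each")
print(f"per SSD: {used_gb:.1f} GB used of {SSD_GB} GB")
```

Even with the far more common 10 GB default (`osd journal size = 10240`), six journals plus the OS slice use well under a third of the drive, leaving plenty of spare area for wear leveling.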
Re: [ceph-users] Ceph Supermicro hardware recommendation
Hi Marco,

Am 04.02.2015 10:20, schrieb Colombo Marco:
> ...
> We chose the 6TB disks because we need a lot of storage in a small number of servers, and we prefer servers without too many disks. However, we plan to use at most 80% of each 6TB disk.

80% is too much! You will run into trouble. Ceph doesn't distribute data evenly; sometimes I see a difference of 20% in the usage of the OSDs. I recommend 60-70% as the maximum.

Udo
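Udo's 60-70% ceiling follows directly from the default full-ratio safety margins once you allow for the uneven distribution he describes. A small sketch (the 20% spread is his observation; the ratios are Ceph's defaults):

```python
# Why an 80% average fill is risky: Ceph doesn't distribute data evenly,
# and any OSD reaching the full ratio (default 0.95) blocks writes.
# The 20% spread is Udo's observation; the ratios are Ceph defaults.

FULL_RATIO = 0.95      # mon osd full ratio (default)
NEARFULL_RATIO = 0.85  # mon osd nearfull ratio (default)
spread = 0.20          # worst-case deviation between OSDs

for avg in (0.60, 0.70, 0.80):
    fullest = avg * (1 + spread)
    if fullest >= FULL_RATIO:
        status = "FULL - writes blocked"
    elif fullest >= NEARFULL_RATIO:
        status = "nearfull warning"
    else:
        status = "OK"
    print(f"average {avg:.0%} -> fullest OSD ~{fullest:.0%}: {status}")
```

At a 60-70% average the fullest OSD stays below the nearfull warning; at 80% it lands right at the full ratio, which stops client writes cluster-wide.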
Re: [ceph-users] Ceph Supermicro hardware recommendation
Hello,

On Wed, 4 Feb 2015 09:20:24 +0000 Colombo Marco wrote:
> Hi Christian,
> On 04/02/15 02:39, Christian Balzer <ch...@gol.com> wrote:
>> On Tue, 3 Feb 2015 15:16:57 +0000 Colombo Marco wrote:
>>> Hi all,
>>> I have to build a new Ceph storage cluster; after I've read the hardware recommendations and some mail from this mailing list, I would like to buy these servers:
>> Nick mentioned a number of things already I totally agree with, so don't be surprised if some of this feels like a repeat.
>>> OSD:
>>> SSG-6027R-E1R12L - http://www.supermicro.nl/products/system/2U/6027/SSG-6027R-E1R12L.cfm
>>> Intel Xeon E5-2630 v2
>>> 64 GB RAM
>> As Nick said, v3 and more RAM might be helpful; depending on your use case (small writes versus large ones), even faster CPUs as well.
> Ok, we switch from v2 to v3 and from 64 to 96 GB of RAM.
>>> LSI 2308 IT
>>> 2 x SSD Intel DC S3700 400GB
>>> 2 x SSD Intel DC S3700 200GB
>> Why the separation of SSDs? They aren't going to be that busy with regards to the OS.
> We would like to use the 400GB SSDs for a cache pool and the 200GB SSDs for journaling.

Don't, at least not like that.

First and foremost, SSD-based OSDs/pools have different requirements, especially when it comes to CPU. Mixing your HDD- and SSD-based OSDs in the same chassis is generally a bad idea.

If you really want to use SSD-based OSDs, go at least with Giant, probably better even to wait for Hammer. Otherwise your performance will be nowhere near the investment you're making. Read up in the ML archives about SSD-based clusters and their performance, as well as cache pools.

Which brings us to the second point: cache pools are pretty pointless currently when it comes to performance. So unless you're planning to use EC pools, you will gain very little from them.

Lastly, if you still want to do SSD-based OSDs, go for something like this:
http://www.supermicro.com.tw/products/system/2U/2028/SYS-2028TP-DC0TR.cfm
Add the fastest CPUs you can afford and voila, an instant SSD-based cluster (replication of 2 should be fine with DC S3700s).
Now with _this_ particular type of node, you might want to consider 40GbE links (front- and back-end).

>> Get a case like Nick mentioned with 2 2.5" bays in the back, put 2 DC S3700 400GBs in there (connected to onboard 6Gb/s SATA3), and partition them so that you have a RAID1 for the OS and plain partitions for the journals of the now 12 OSD HDDs in your chassis.
>> Of course this optimization in terms of cost and density comes with a price: if one SSD should fail, you will have 6 OSDs down. Given how reliable the Intels are this is unlikely, but something you need to consider.
>> If you want to limit the impact of an SSD failure and have just 2 OSD journals per SSD, get a chassis like the one above and 4 DC S3700 200GBs, RAID10 them for the OS and put 2 journal partitions on each. I did the same with 8 3TB HDDs and 4 DC S3700 100GBs; the HDDs (and the CPU, with 4KB IOPS) are the limiting factor, not the SSDs.
>>> 8 x HDD Seagate Enterprise 6TB
>> Are you really sure you need that density? One disk failure will result in a LOT of data movement once these become somewhat full. If you were to go for a 12-OSD node as described above, consider 4TB drives for the same overall density, while having more IOPS and likely the same price or less.
> We chose the 6TB disks because we need a lot of storage in a small number of servers, and we prefer servers without too many disks. However, we plan to use at most 80% of each 6TB disk.

Fewer disks, less IOPS, less bandwidth. Reducing the number of servers (which are a fixed cost, after all) is understandable. But you have an option up there that gives you the same density as the 6TB disks, with significantly improved performance.

>>> 2 x 40GbE for backend network
>> You'd be lucky to write more than 800MB/s sustained to your 8 HDDs (remember they will have to deal with competing reads and writes; this is not a sequential synthetic write benchmark). Incidentally, 1GB/s to 1.2GB/s (depending on configuration) would also be the limit of your journal SSDs.
>> Other than backfilling caused by cluster changes (OSDs removed/added), your limitation is nearly always going to be IOPS, not bandwidth.
> Ok, after some discussion, we switch to 2 x 10 GbE.
>> So 2x10GbE or, if you're comfortable with it (I am ^o^), an Infiniband backend (can be cheaper, less latency, plans for RDMA support in Ceph) should be more than sufficient.
>>> 2 x 10GbE for public network
>>> META/MON:
>>> SYS-6017R-72RFTP - http://www.supermicro.com/products/system/1U/6017/SYS-6017R-72RFTP.cfm
>>> 2 x Intel Xeon E5-2637 v2
>>> 4 x SSD Intel DC S3500 240GB raid 1+0
>> You're likely to get better performance and of course MUCH better durability by using 2 DC S3700s, at about the same price.
> Ok, we switch to 2 x SSD DC S3700.
>>> 128 GB RAM
>> Total overkill for a MON, but I have no idea about the MDS, and RAM never hurts.
> Ok, we switch from 128 to 96.

Don't take my
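Christian's bandwidth argument in the quoted exchange is easy to reproduce: the node's sustainable write rate is capped by the slower of the HDDs and the journal SSDs, and even bonded 10GbE exceeds it. The per-device throughput figures below are rough assumptions:

```python
# Reproducing Christian's point: node write throughput is capped by the
# slower of the HDDs and the journal SSDs (every write hits the journal
# first), so even 2 x 10GbE exceeds what the node can absorb.
# Per-device throughput figures are rough assumptions.

hdds, hdd_mbs = 8, 100           # ~100 MB/s sustained per 7.2k HDD
journal_ssds, ssd_mbs = 2, 460   # DC S3700 400GB sequential write (approx.)

disk_limit = hdds * hdd_mbs                  # MB/s into the filestore
journal_limit = journal_ssds * ssd_mbs       # MB/s through the journals
node_limit = min(disk_limit, journal_limit)

for name, gbit in (("2 x 10GbE", 20), ("2 x 40GbE", 80)):
    net_mbs = gbit * 1000 // 8               # line rate, ignoring overhead
    print(f"{name}: {net_mbs} MB/s of network for ~{node_limit} MB/s of node writes")
```

With the node topping out around 800 MB/s, 2 x 10GbE already offers roughly three times the needed headroom, which is why the 40GbE spend buys nothing here.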
[ceph-users] Ceph Supermicro hardware recommendation
Hi all,

I have to build a new Ceph storage cluster; after I've read the hardware recommendations and some mail from this mailing list, I would like to buy these servers:

OSD:
SSG-6027R-E1R12L - http://www.supermicro.nl/products/system/2U/6027/SSG-6027R-E1R12L.cfm
Intel Xeon E5-2630 v2
64 GB RAM
LSI 2308 IT
2 x SSD Intel DC S3700 400GB
2 x SSD Intel DC S3700 200GB
8 x HDD Seagate Enterprise 6TB
2 x 40GbE for backend network
2 x 10GbE for public network

META/MON:
SYS-6017R-72RFTP - http://www.supermicro.com/products/system/1U/6017/SYS-6017R-72RFTP.cfm
2 x Intel Xeon E5-2637 v2
4 x SSD Intel DC S3500 240GB raid 1+0
128 GB RAM
2 x 10 GbE

What do you think? Any feedback, advice, or ideas are welcome!

Thanks so much
Regards,
Re: [ceph-users] Ceph Supermicro hardware recommendation
Hi,

Just a couple of points. You might want to see if you can get a Xeon v3 board+CPU, as they have more performance and use less power.

You can also get an SM 2U chassis which has 2x 2.5" disk slots at the rear; this would allow you to have an extra 2x 3.5" disks in the front of the server.

Extra RAM in the OSD nodes would probably help performance a bit.

How many nodes are you going to have? You might find that bonded 10G networking is sufficient instead of the extra cost of 40Gb networking.

Nick

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Colombo Marco
Sent: 03 February 2015 15:17
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Ceph Supermicro hardware recommendation
Re: [ceph-users] Ceph Supermicro hardware recommendation
Hi Nick,

> Hi,
> Just a couple of points. You might want to see if you can get a Xeon v3 board+CPU, as they have more performance and use less power.

ok

> You can also get an SM 2U chassis which has 2x 2.5" disk slots at the rear; this would allow you to have an extra 2x 3.5" disks in the front of the server.

These two rear slots will be used for the operating system's SSDs.

> Extra RAM in the OSD nodes would probably help performance a bit.

ok

> How many nodes are you going to have? You might find that bonded 10G networking is sufficient instead of the extra cost of 40Gb networking.

I'm thinking about 14 or 16 OSD nodes and 3 metadata/monitor nodes.

> Nick

Thanks
Regards
Marco

> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Colombo Marco
> Sent: 03 February 2015 15:17
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] Ceph Supermicro hardware recommendation
Re: [ceph-users] Ceph Supermicro hardware recommendation
On Tue, 3 Feb 2015 15:16:57 +0000 Colombo Marco wrote:
> Hi all,
> I have to build a new Ceph storage cluster; after I've read the hardware recommendations and some mail from this mailing list, I would like to buy these servers:

Nick mentioned a number of things already I totally agree with, so don't be surprised if some of this feels like a repeat.

> OSD:
> SSG-6027R-E1R12L - http://www.supermicro.nl/products/system/2U/6027/SSG-6027R-E1R12L.cfm
> Intel Xeon E5-2630 v2
> 64 GB RAM

As Nick said, v3 and more RAM might be helpful; depending on your use case (small writes versus large ones), even faster CPUs as well.

> LSI 2308 IT
> 2 x SSD Intel DC S3700 400GB
> 2 x SSD Intel DC S3700 200GB

Why the separation of SSDs? They aren't going to be that busy with regards to the OS.

Get a case like Nick mentioned with 2 2.5" bays in the back, put 2 DC S3700 400GBs in there (connected to onboard 6Gb/s SATA3), and partition them so that you have a RAID1 for the OS and plain partitions for the journals of the now 12 OSD HDDs in your chassis.

Of course this optimization in terms of cost and density comes with a price: if one SSD should fail, you will have 6 OSDs down. Given how reliable the Intels are this is unlikely, but something you need to consider.

If you want to limit the impact of an SSD failure and have just 2 OSD journals per SSD, get a chassis like the one above and 4 DC S3700 200GBs, RAID10 them for the OS and put 2 journal partitions on each. I did the same with 8 3TB HDDs and 4 DC S3700 100GBs; the HDDs (and the CPU, with 4KB IOPS) are the limiting factor, not the SSDs.

> 8 x HDD Seagate Enterprise 6TB

Are you really sure you need that density? One disk failure will result in a LOT of data movement once these become somewhat full. If you were to go for a 12-OSD node as described above, consider 4TB drives for the same overall density, while having more IOPS and likely the same price or less.
> 2 x 40GbE for backend network

You'd be lucky to write more than 800MB/s sustained to your 8 HDDs (remember they will have to deal with competing reads and writes; this is not a sequential synthetic write benchmark). Incidentally, 1GB/s to 1.2GB/s (depending on configuration) would also be the limit of your journal SSDs.

Other than backfilling caused by cluster changes (OSDs removed/added), your limitation is nearly always going to be IOPS, not bandwidth. So 2x10GbE or, if you're comfortable with it (I am ^o^), an Infiniband backend (can be cheaper, less latency, plans for RDMA support in Ceph) should be more than sufficient.

> 2 x 10GbE for public network
> META/MON:
> SYS-6017R-72RFTP - http://www.supermicro.com/products/system/1U/6017/SYS-6017R-72RFTP.cfm
> 2 x Intel Xeon E5-2637 v2
> 4 x SSD Intel DC S3500 240GB raid 1+0

You're likely to get better performance and of course MUCH better durability by using 2 DC S3700s, at about the same price.

> 128 GB RAM

Total overkill for a MON, but I have no idea about the MDS, and RAM never hurts.

In your follow-up you mentioned 3 mons. I would suggest putting 2 more mons (only, not MDS) on OSD nodes and making sure that within the IP numbering the real mons have the lowest IP addresses, because the MON with the lowest IP becomes master (and thus the busiest). This way you can survive the loss of 2 nodes and still have a valid quorum.

Christian

> 2 x 10 GbE
> What do you think? Any feedback, advice, or ideas are welcome!
> Thanks so much
> Regards,

--
Christian Balzer    Network/Systems Engineer
ch...@gol.com    Global OnLine Japan/Fusion Communications
http://www.gol.com/
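Christian's monitor-placement tip can be illustrated with the selection rule itself: the quorum leader is the mon with the numerically lowest address, so the dedicated mons should get the low addresses. Hostnames and IPs below are invented:

```python
# Christian's placement tip, as code: the quorum leader is the monitor
# with the numerically lowest IP, so the dedicated mons should get the
# low addresses. Hostnames and addresses here are invented.
import ipaddress

mons = {
    "mon-a": "10.0.0.10",   # dedicated mon
    "mon-b": "10.0.0.11",   # dedicated mon
    "mon-c": "10.0.0.12",   # dedicated mon
    "osd-1": "10.0.0.101",  # extra mon co-located on an OSD node
    "osd-2": "10.0.0.102",  # extra mon co-located on an OSD node
}

leader = min(mons, key=lambda m: ipaddress.ip_address(mons[m]))
quorum = len(mons) // 2 + 1
print(f"leader (lowest IP): {leader}")
print(f"quorum needs {quorum} of {len(mons)} mons; losing 2 nodes leaves {len(mons) - 2}")
```

With five mons laid out this way, the busy leader role stays on a dedicated machine, and losing two nodes (even both OSD-co-located mons) still leaves the three needed for quorum.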