Re: Hadoop cluster hardware configuration
if you tell us the purpose of this cluster, it would be easier to tell you exactly how well suited it is.

On Mon, Jun 4, 2012 at 3:57 PM, praveenesh kumar praveen...@gmail.com wrote:

> Hello all,
>
> I am looking to build a 5-node Hadoop cluster with the following configuration per machine:
>
> 1. Intel Xeon E5-2609 (2.40 GHz, 4-core)
> 2. 32 GB RAM (8 GB 1Rx4 PC3)
> 3. 5 x 900 GB 6G SAS 10K hard disks (4.5 TB total storage per machine)
> 4. 1 GbE Ethernet connection
>
> I would like the experts to review it and share whether this sounds like a sensible Hadoop hardware configuration or not. I know that without knowing the actual use case it is hard to comment, but in general I would still like to have your views. Please also suggest if I am missing something.
>
> Regards,
> Praveenesh

--
Nitin Pawar
Re: Hadoop cluster hardware configuration
On a very high level: we would be utilizing the cluster not only for Hadoop but also for other I/O-bound and in-memory operations. That is the reason we are going for SAS hard disks. We also need to perform a lot of computational tasks, which is why we have kept the RAM at 32 GB (it can be increased). So, at a high level, I just wanted to know: do these hardware specs make sense?

Regards,
Praveenesh

On Mon, Jun 4, 2012 at 5:46 PM, Nitin Pawar nitinpawar...@gmail.com wrote:

> if you tell us the purpose of this cluster, it would be easier to tell you exactly how well suited it is.
Re: Hadoop cluster hardware configuration
if you are doing computations using hadoop on a small scale, yes, this hardware is good enough. Normally Hadoop clusters are kept busy with heavy loads, so they are not shared for other uses, unless your Hadoop utilization is on the lower side and you then want to reuse the hardware.

On Mon, Jun 4, 2012 at 5:52 PM, praveenesh kumar praveen...@gmail.com wrote:

> On a very high level: we would be utilizing the cluster not only for Hadoop but also for other I/O-bound and in-memory operations. That is the reason we are going for SAS hard disks. We also need to perform a lot of computational tasks, which is why we have kept the RAM at 32 GB (it can be increased). So, at a high level, I just wanted to know: do these hardware specs make sense?

--
Nitin Pawar
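The "is this hardware good enough" question above lends itself to a quick back-of-the-envelope check. The sketch below estimates usable HDFS capacity for the proposed 5-node cluster; the replication factor of 3 is HDFS's default, and the 25% reservation for non-HDFS data is an illustrative assumption, not a figure from the thread.

```python
# Back-of-the-envelope usable-capacity estimate for the proposed cluster.
# Assumptions (not from the thread): HDFS replication factor 3, and ~25%
# of raw disk reserved for MapReduce intermediate output and OS overhead.

nodes = 5
disks_per_node = 5
disk_tb = 0.9                              # 900 GB per SAS disk

raw_tb = nodes * disks_per_node * disk_tb  # 22.5 TB raw across the cluster
after_overhead_tb = raw_tb * 0.75          # leave room for temp data / OS
usable_tb = after_overhead_tb / 3          # divide by 3x HDFS replication

print(f"raw: {raw_tb} TB, usable HDFS: {usable_tb:.2f} TB")
```

Under these assumptions the cluster stores only about a quarter of its raw 22.5 TB as unique HDFS data, which is worth knowing before also sharing the disks with non-Hadoop workloads.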
Re: Hadoop and hardware
Pierre,

As discussed in other recent threads, it depends. The most sensible thing for Hadoop nodes is to find a sweet spot for price/performance. In general that means keeping a balance between compute power, disks, and network bandwidth, and factoring in racks, space, operating costs, etc.

How much storage capacity are you thinking of when you target about 120 data nodes? If you had, for example, 60 quad-core nodes with 12 x 2 TB disks (or more), I would suspect you would be bottlenecked on your 1 GbE network connections.

Another thing to consider is how many nodes per rack. If these 60 nodes were 2U and you fit 20 nodes in a rack, then losing one top-of-rack switch means losing 1/3 of the capacity of your cluster.

Yet another consideration is how easily you want to be able to expand your cluster incrementally. Until you run Hadoop 0.23 you probably want all your nodes to be roughly similar in capacity.

Cheers,
Joep

On Fri, Dec 16, 2011 at 3:50 AM, Cussol pierre.cus...@cnes.fr wrote:

> In my company, we intend to set up a Hadoop cluster to run analytics applications. This cluster would have about 120 data nodes, dual-socket servers with a GbE interconnect. We are also exploring a solution with 60 quad-socket servers. How do quad-socket and dual-socket servers compare in a Hadoop cluster? Any help?
>
> pierre
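Joep's 1 GbE bottleneck concern can be sanity-checked with rough numbers. The per-disk and NIC throughput figures below are typical ballpark assumptions for that era's hardware, not measurements from the thread.

```python
# Rough check of the bottleneck claim: with 12 x 2 TB disks per node,
# aggregate local disk bandwidth dwarfs a single 1 GbE link.
# Assumed ballpark figures (not from the thread):
#   ~100 MB/s sequential throughput per 7200 RPM SATA disk
#   1 GbE = 125 MB/s theoretical, ~110 MB/s achievable in practice

disks_per_node = 12
disk_mb_s = 100                          # per-disk sequential read, assumed
nic_mb_s = 110                           # realistic 1 GbE throughput, assumed

disk_total = disks_per_node * disk_mb_s  # aggregate local-disk bandwidth
ratio = disk_total / nic_mb_s            # how badly the NIC constrains I/O

print(f"local disks: {disk_total} MB/s vs network: {nic_mb_s} MB/s "
      f"(~{ratio:.0f}x gap)")
```

Any job phase that must move data across the network (shuffle, replication, non-local map input) hits the NIC roughly an order of magnitude before it saturates the disks, which is the gist of Joep's warning.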
Re: Hadoop cluster hardware details for big data
Hi,

Thanks a lot for your timely help. Your valuable answers helped us understand what kind of hardware to use when it comes to huge data.

--
With Regards,
Karthik
Re: Hadoop cluster hardware details for big data
Have you taken a look at http://wiki.apache.org/hadoop/PoweredBy? It contains information relevant to your question, if not a detailed answer.

On Wed, Jul 6, 2011 at 4:13 PM, Karthik Kumar karthik84ku...@gmail.com wrote:

> Hi,
>
> Has anyone here used Hadoop to process more than 3 TB of data? If so, we would like to know how many machines you used in your cluster and about the hardware configuration. The objective is to know how to handle huge data in a Hadoop cluster.

--
Harsh J
Re: Hadoop cluster hardware details for big data
Hi,

I wanted to know the time required to process huge datasets and the number of machines used for them.

On 7/6/11, Harsh J ha...@cloudera.com wrote:

> Have you taken a look at http://wiki.apache.org/hadoop/PoweredBy? It contains information relevant to your question, if not a detailed answer.

--
With Regards,
Karthik
Re: Hadoop cluster hardware details for big data
Karthik,

That is a highly process-dependent question, I think. What you do with the data determines the time it takes; no two applications are the same, in my belief.

On Wed, Jul 6, 2011 at 4:35 PM, Karthik Kumar karthik84ku...@gmail.com wrote:

> Hi,
>
> I wanted to know the time required to process huge datasets and the number of machines used for them.

--
Harsh J
Re: Hadoop cluster hardware details for big data
On 06/07/11 11:43, Karthik Kumar wrote:

> Hi,
>
> Has anyone here used Hadoop to process more than 3 TB of data? If so, we would like to know how many machines you used in your cluster and about the hardware configuration. The objective is to know how to handle huge data in a Hadoop cluster.

This is too vague a question. What do you mean by "process"?

Scan through some logs looking for values? You could do that on a single machine if you weren't in a rush and had enough disks; you'd just be very I/O-bound. And, to be honest, HDFS needs a minimum number of machines to become fault tolerant.

Do complex matrix operations that use lots of RAM and CPU? You'll need more machines.

If your cluster has a block size of 512 MB, then a 3 TB file fits into (3*1024*1024)/512 = 6144 blocks, so you can't usefully have more than 6144 machines anyway. That's your theoretical maximum, even if your name is Facebook or Yahoo!

What you are looking for is something between 10 and 6144, the exact number driven by:

- how much compute you need to do, and how fast you want it done (controls number of CPUs and RAM)
- how much total HDD storage you anticipate needing
- whether you want to do leading-edge GPU work (good performance on some tasks, but limited work per machine)

You can use benchmarking tools like gridmix3 to get more data on the characteristics of your workload, which you can then take to your server supplier and say "this is what we need, what can you offer?" Otherwise everyone is just guessing.

Remember also that you can add more racks later, but you will need to plan ahead on datacentre space, power, and, very importantly, how you are going to expand the networking. Life is simplest if everything fits into one rack, but if you plan to expand you need a roadmap for connecting that rack to new ones, which means adding fast interconnect between the different top-of-rack switches. You also need to worry about how to get data in and out fast.

-Steve
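Steve's block-count ceiling is simple arithmetic; a sketch of the same calculation, using the block size and file size from his example:

```python
# Steve's theoretical ceiling: a file occupies at most one machine per
# HDFS block, so the block count bounds useful cluster size for that file.

file_tb = 3
block_mb = 512

file_mb = file_tb * 1024 * 1024      # 3 TB expressed in MB
max_blocks = file_mb // block_mb     # blocks in the file = machine ceiling

print(f"{file_tb} TB at {block_mb} MB blocks -> {max_blocks} blocks")
# matches the 6144-machine theoretical maximum quoted in the thread
```

With HDFS's more common default block size of 64 MB (an assumption about typical 2011-era configs, not stated in the thread), the same file would span eight times as many blocks, raising the ceiling accordingly; either way the practical cluster size is set by workload, not by this bound.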
Re: Hadoop cluster hardware details for big data
On 06/07/11 11:43, Karthik Kumar wrote:

> Hi,
>
> Has anyone here used Hadoop to process more than 3 TB of data? If so, we would like to know how many machines you used in your cluster and about the hardware configuration. The objective is to know how to handle huge data in a Hadoop cluster.

Actually, I've just thought of a simpler answer: 40.

It's completely random, but if said with confidence it's as valid as any other answer to your current question.
Re: Hadoop cluster hardware details for big data
On 06/07/11 13:18, Michel Segel wrote:

> Wasn't the answer 42? ;-P

42 = 40 + NN + 2ary NN, assuming the JT runs on the 2ary or on one of the worker nodes.

> Looking at your calc... You forgot to factor in the number of slots per node. So the number is only a fraction. Assume 10 slots per node. (10 because it makes the math easier.)
>
> I thought something was wrong. Then I thought of the server revenue and decided not to look that hard.