Re: Typical hardware configurations
Scott Carey wrote:
> On 3/30/09 4:41 AM, Steve Loughran <ste...@apache.org> wrote:
>> Ryan Rawson wrote:
>>> You should also be getting 64-bit systems and running a 64-bit distro on it and a JVM that has -d64 available.
>> For the namenode, yes. For the others, you will take a fairly big memory hit (1.5x object size) due to the longer pointers. JRockit has special compressed pointers; so will JDK 7, apparently.
> Sun Java 6 update 14 has "Ordinary Object Pointer" compression as well: -XX:+UseCompressedOops. I've been testing out the pre-release of that with great success.

Nice. Have you tried Hadoop with it yet?

> JRockit has virtually no 64-bit overhead up to 4GB; Sun Java 6u14 has a small overhead up to 32GB with the new compression scheme. IBM's VM also has some sort of pointer compression, but I don't have experience with it myself.

I use the JRockit JVM as it is what our customers use, and we need to test on the same JVM. It is interesting in that recursive calls don't ever seem to run out; the way it handles the stack doesn't use separate memory spaces for the stack, the permanent-generation heap, and the like. That doesn't mean apps are light: a freshly started IDE consumes more physical memory than a VMware image running XP and Outlook. But it is fairly responsive, which is good for a UI:

  2295m  650m   22m S   2 10.9   0:43.80 java
   855m  543m  530m S  11  9.1   4:40.40 vmware-vmx

http://wikis.sun.com/display/HotSpotInternals/CompressedOops
http://blog.juma.me.uk/tag/compressed-oops/

> With pointer compression, there may be gains to be had with running 64-bit JVMs smaller than 4GB on x86, since the runtime then has access to native 64-bit integer operations and registers (as well as 2x the register count). It will be highly use-case dependent.

That would certainly benefit atomic operations on longs; for floating-point math it would be less useful, as JVMs have long made use of the SSE register set for FP work. 64-bit registers would make it easier to move stuff in and out of those registers.
I will try to set up a Hudson server with this update and see how well it behaves.
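The "1.5x object size" memory hit mentioned above can be sketched with back-of-envelope arithmetic: on a typical HotSpot layout, object headers grow from roughly 8 to 16 bytes and each reference field from 4 to 8 bytes when moving from 32-bit to 64-bit, while primitive fields stay the same size. The header and field sizes here are typical figures, not measurements from the thread, and real footprints are also 8-byte aligned:

```java
// Rough estimate of the 64-bit pointer overhead: headers roughly
// double (8 -> 16 bytes) and references double (4 -> 8 bytes),
// while primitive payload is unchanged. Pointer-heavy objects
// therefore approach 1.5-2x their 32-bit footprint.
public class PointerOverhead {
    // refs = number of reference fields, primBytes = primitive payload
    static int size32(int refs, int primBytes) {
        return 8 + 4 * refs + primBytes;   // 8-byte header, 4-byte oops
    }

    static int size64(int refs, int primBytes) {
        return 16 + 8 * refs + primBytes;  // 16-byte header, 8-byte oops
    }

    public static void main(String[] args) {
        // e.g. a small node object: four references, 16 bytes of primitives
        int refs = 4, prim = 16;
        double ratio = (double) size64(refs, prim) / size32(refs, prim);
        System.out.printf("32-bit: %d bytes, 64-bit: %d bytes, ratio %.2f%n",
                size32(refs, prim), size64(refs, prim), ratio);
    }
}
```

For this example object the ratio comes out around 1.6x, which is the kind of growth compressed oops (-XX:+UseCompressedOops) claws back by storing 32-bit offsets instead of full 64-bit pointers.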
Re: Typical hardware configurations
Ryan Rawson wrote:
> You should also be getting 64-bit systems and running a 64-bit distro on it and a JVM that has -d64 available.

For the namenode, yes. For the others, you will take a fairly big memory hit (1.5x object size) due to the longer pointers. JRockit has special compressed pointers; so will JDK 7, apparently.
Re: Typical hardware configurations
On 3/30/09 4:41 AM, Steve Loughran <ste...@apache.org> wrote:
> Ryan Rawson wrote:
>> You should also be getting 64-bit systems and running a 64-bit distro on it and a JVM that has -d64 available.
> For the namenode, yes. For the others, you will take a fairly big memory hit (1.5x object size) due to the longer pointers. JRockit has special compressed pointers; so will JDK 7, apparently.

Sun Java 6 update 14 has "Ordinary Object Pointer" compression as well: -XX:+UseCompressedOops. I've been testing out the pre-release of that with great success.

JRockit has virtually no 64-bit overhead up to 4GB; Sun Java 6u14 has a small overhead up to 32GB with the new compression scheme. IBM's VM also has some sort of pointer compression, but I don't have experience with it myself.

http://wikis.sun.com/display/HotSpotInternals/CompressedOops
http://blog.juma.me.uk/tag/compressed-oops/

With pointer compression, there may be gains to be had with running 64-bit JVMs smaller than 4GB on x86, since the runtime then has access to native 64-bit integer operations and registers (as well as 2x the register count). It will be highly use-case dependent.
Re: Typical hardware configurations
I run a 10-node cluster with 2 cores at 2.4GHz, 4GB RAM, and dual 250GB drives per node. I run on used 32-bit servers, so I can only give HBase 2GB, but I still have memory left for the tasktracker and datanode. More files in Hadoop = more memory used on the namenode. The HBase master is lightly loaded, so I run mine on the same node as the namenode.

My personal opinion is that a large-memory 64-bit machine cannot be fully loaded with HBase at this time, but it will give you better performance. Maybe if you have lots of MR jobs or need the better response, then it would be worth it. I think there are still some open issues on too many open file handles etc. that can keep larger servers from being used to their full capacity.

Think in terms of Google: they stick with small (cheap to replace) hard drives, medium memory, and cost/performance CPUs, but have lots of them.

Billy

"Amandeep Khurana" <ama...@gmail.com> wrote in message news:35a22e220903272207s30f26310y3ecbec723b83e...@mail.gmail.com...
> What are the typical hardware configs for a node that people are using for Hadoop and HBase? I am setting up a new 10-node cluster which will also run HBase, feeding my front end directly. Previously I had a 3-node cluster with 2 GB of RAM on the slaves and 4 GB of RAM on the master. This didn't work very well due to the RAM being a little low. I got some config details from the "powered by" page on the Hadoop wiki, but there is nothing like that for HBase.
>
> Amandeep Khurana
> Computer Science Graduate Student
> University of California, Santa Cruz
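Billy's point that more files in Hadoop means more namenode memory can be made concrete with the commonly cited rule of thumb of roughly 150 bytes of NameNode heap per filesystem object (file, directory, or block). The 150-byte figure is a community estimate, not a number from this thread; a sketch:

```java
// Rough NameNode heap estimate: the NameNode keeps the entire
// namespace in memory, at roughly ~150 bytes of heap per file,
// directory, or block (a widely quoted rule of thumb, and an
// assumption here, not a measured value).
public class NameNodeHeap {
    static long estimateHeapBytes(long files, long dirs, long blocks) {
        final long BYTES_PER_OBJECT = 150; // rule-of-thumb average
        return (files + dirs + blocks) * BYTES_PER_OBJECT;
    }

    public static void main(String[] args) {
        // e.g. 10 million files averaging 1.5 blocks each, 500k directories
        long bytes = estimateHeapBytes(10000000L, 500000L, 15000000L);
        System.out.printf("Estimated NameNode heap: ~%d MB%n",
                bytes / (1024 * 1024));
    }
}
```

This is why a small cluster gets by with a 1-2GB NameNode heap while a namespace with tens of millions of files needs several gigabytes more.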
Re: Typical hardware configurations
Hello Amandeep,

A basic rule of thumb is 1 core and 1 GB RAM per JVM. The Hadoop and HBase daemons will all need such an allocation. You can extend this to the mapreduce subsystem when considering how many mappers and/or reducers can execute concurrently on each node alongside the rest of what you are running. Or you can choose to partition your hardware, supporting HDFS and HBase separately from the mapreduce task runners, as some do, which changes the situation. Lots of people try to run all-in-one clusters, where all functions are more or less co-located on every node.

Strictly speaking, how much heap a TaskTracker map or reduce task child will require depends on the user application. But it still loads the CPU, so I use the 1 CPU / 1 GB rule of thumb even for these. Overload your CPU resources and the JVM scheduler will starve threads, introducing spurious heartbeat misses, timeouts, and recovery behaviors in system daemons that will unnecessarily degrade performance and operation. One thing I have considered but not tried is using Linux CPU affinity masks to put system functions in one partition and all user mapreduce tasks in the other. Another option, as I mentioned, is to split hardware resources among the functions.

Here is what I have used in the past in a successful all-in-one deployment. In parentheses next to each Java process' name is the heap allocation reserved with -Xmx (in MB):

  1 node:   NameNode (2000) and DataNode (1000)
  1 node:   HMaster (1000), JobTracker (1000), and DataNode (1000)
  23 nodes: DataNode (1000), HRegionServer (2000), TaskTracker (1000), with the concurrency limits for mappers and reducers set to 4 and 4, respectively

We picked a midpoint between cheap hardware and big iron. Our per-node specs were dual quad-core, 4/8 GB RAM, and 6x 1TB disks. 2x 1TB hosted the system volume in a RAID-1 configuration. The remaining 4x 1TB drives were attached as JBOD and used as DataNode data volumes.
The rationale for using so much disk per node was maximizing cluster/rack density.

As the size of your HDFS volume increases, you'll need to grow the heap allocation of your NameNode accordingly. In all my time running HBase I never needed more than 2GB allocated to it, but I hear that Facebook runs a NameNode with a 20GB heap.

A word of warning, however: currently HBase is a very challenging user of HDFS. In 0.20 there are some changes (HFile) which somewhat lessen the number of open files and should also lower the total number of DataNode xceivers necessary to support operations. However, on my 25-node cluster running Hadoop/HBase 0.19, I found it necessary to increase the DataNode xceiver limit to 4096 (from its default of 512!) to successfully bootstrap an HBase cluster with 7000 regions. Therefore it may not be the per-node spec that is the determining factor for the stability of your cluster, but rather the number of DataNodes employed to sufficiently spread the load.

Hope that helps,
- Andy

From: Amandeep Khurana <ama...@gmail.com>
Subject: Typical hardware configurations
To: core-user@hadoop.apache.org, hbase-u...@hadoop.apache.org
Date: Friday, March 27, 2009, 10:07 PM
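The xceiver bump Andy describes is a per-DataNode setting in hadoop-site.xml. A sketch for 0.19-era Hadoop, where the property name carries a historical misspelling ("xcievers"):

```xml
<!-- hadoop-site.xml: raise the DataNode data-transfer thread limit
     from its default so an HBase cluster with many regions can keep
     enough files open. Note the historical spelling "xcievers". -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>
```

The setting must be applied on every DataNode and requires a DataNode restart to take effect.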
Re: Typical hardware configurations
Hi Amandeep,

I just did the same investigation not long ago, and I was recommended to get nodes equivalent to an Amazon EC2 Extra Large instance (http://aws.amazon.com/ec2/#pricing): 8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each), 15 GB memory, 1690 GB of instance storage, 64-bit platform. One EC2 Compute Unit (ECU) is equivalent to the CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.

For more details, you may want to refer to Daniel Leffel's experience setting up HBase:
http://mail-archives.apache.org/mod_mbox/hadoop-hbase-user/200805.mbox/%3C25e5a0c00805072129w3b54599r286940f134c6f235%40mail.gmail.com%3E

Hope it helps.

Best,
Arber

On Fri, Mar 27, 2009 at 10:07 PM, Amandeep Khurana <ama...@gmail.com> wrote:
> [original question quoted in full earlier in the thread]
Re: Typical hardware configurations
Even though HBase runs on 'commodity' hardware, it's important to remember that to achieve scale you need to do a bit better than "1 CPU, 1 GB RAM" type things. I tend to think in per-core specs; that way you don't have to worry about 2-core vs 4-core vs 8-core: you buy whatever is most economical at the time. I'd match 1 core with 2-4GB of RAM. You'll want to dedicate 4GB of RAM to HBase; it'll make life easier. You should also be getting 64-bit systems, running a 64-bit distro on them, and a JVM that has -d64 available.

A word about masters: for HBase, the master is (a) important and (b) very lightweight, meaning the master doesn't use much RAM. For Hadoop, things are different, because the HDFS master is relatively lightweight in CPU, but needs lots of RAM (every file takes up memory space). On my cluster, the master is the same node type as the rest. I've heard recommendations to buy better hardware for your master: if you lose its disk, your whole cluster goes down. I can't say I disagree with that sentiment.

Good luck!

On Fri, Mar 27, 2009 at 10:43 PM, Yabo-Arber Xu <arber.resea...@gmail.com> wrote:
> [Arber's message and the original question, quoted in full earlier in the thread]
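Ryan's per-core rule of thumb can be written down as a tiny sizing sketch. The 2-4 GB per core and the 4 GB HBase reservation are the thread's rules of thumb, not benchmarks:

```java
// Per-core node sizing per Ryan's rule of thumb: pair each core
// with 2-4 GB of RAM, and reserve 4 GB outright for the HBase
// region server. Heuristics from the thread, not measurements.
public class PerCoreSizing {
    static int minRamGb(int cores) { return cores * 2; }
    static int maxRamGb(int cores) { return cores * 4; }

    public static void main(String[] args) {
        int cores = 8; // e.g. a dual quad-core worker
        System.out.println("Suggested RAM for " + cores + " cores: "
                + minRamGb(cores) + "-" + maxRamGb(cores)
                + " GB (dedicate 4 GB of it to HBase)");
    }
}
```

Thinking per-core keeps the spec stable as you move between 2-, 4-, and 8-core boxes: the total just scales with whatever core count is cheapest at purchase time.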