On Oct 1, 2009, at 7:13 AM, Steve Loughran wrote:
Ryan Smith wrote:
I have a question that I feel I should ask on this thread. Let's say you want to build a cluster where you will be doing very little map/reduce, storage and replication of data only on HDFS. What would the hardware requirements be? No quad core? Less RAM?

Servers with more HDD per CPU, and less RAM. CPUs are a big slice not just of capital, but of your power budget. If you are running a big datacentre, you will care about that electricity bill.

Assuming you go for 1U with 6 HDD in a 1U box, you could have 6 or 12 TB per U, then perhaps a 2-core or 4-core server with "enough" ECC RAM.

* With less M/R work, you could allocate most of that TB to work, leave a few hundred GB for OS and logs.
* You'd better estimate external load; if the cluster is storing data then total network bandwidth will be 3X the data ingress (for replication = 3), read costs are that of the data itself. Also, 5 threads on 3 different machines handling the write and forward process.
* I don't know how much load the datanode JVM would take with, say, 11 TB of managed storage underneath; that's memory and CPU time.
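The sizing arithmetic above can be sketched roughly as follows. This is only a back-of-the-envelope illustration: the function names and the 100 MB/s ingress figure are made up for the example, and it assumes every ingested byte is written exactly once per replica.

```python
def write_network_traffic(ingress_mb_s, replication=3):
    """Total network traffic caused by writes.

    Assumption: each ingested byte crosses the network once per replica
    (client -> datanode 1 -> datanode 2 -> datanode 3 for replication = 3).
    """
    return ingress_mb_s * replication


def raw_tb_per_rack_unit(disks_per_box=6, tb_per_disk=1):
    """Raw capacity of a 1U box holding disks_per_box drives."""
    return disks_per_box * tb_per_disk


# Hypothetical ingress of 100 MB/s with the default replication of 3.
print(write_network_traffic(100))   # total write traffic in MB/s
print(raw_tb_per_rack_unit(6, 1))   # 6 x 1 TB drives
print(raw_tb_per_rack_unit(6, 2))   # 6 x 2 TB drives
```

Reads are not multiplied in this sketch, since the read cost is that of the data itself.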
Datanode load is a function of the number of IOPS. Basically, if you buy 3 24TB nodes instead of 6 12TB nodes, you double the number of IOPS each node has to serve.
If you're using HDFS solely for backup, then the number of IOPS is so small you can assume it's zero. We use HDFS for a non-mapreduce physics application, and our particular application mix is such that I target 1 batch system core per usable HDFS TB.
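To make the per-node scaling concrete, here is a minimal sketch. It assumes the aggregate IOPS is spread evenly across datanodes, which a real cluster may not do; the 6000 IOPS figure and the function names are illustrative only. The core target is the one quoted for our application mix and will differ elsewhere.

```python
def iops_per_node(total_iops, nodes):
    """Per-node IOPS, assuming load is spread evenly across datanodes."""
    return total_iops / nodes


def batch_cores_needed(usable_hdfs_tb, cores_per_tb=1.0):
    """Batch cores for a given usable HDFS capacity (1 core per usable TB here)."""
    return usable_hdfs_tb * cores_per_tb


# Same 72 TB of raw disk, bought as 6 x 12 TB or as 3 x 24 TB nodes.
small_nodes = iops_per_node(6000, 6)
big_nodes = iops_per_node(6000, 3)
print(big_nodes / small_nodes)      # ratio of per-node load, big vs small nodes
print(batch_cores_needed(24))       # cores targeted for 24 usable TB
```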
Is anyone out there running big datanodes? What do they see?
Our biggest is 48TB:

* They go offline for 5 minutes during the block reports. We use rack awareness to make sure that both copies are not on big data nodes. Fixed in future releases (0.20.0 even, maybe).
* When one disk goes out, the datanode shuts down - meaning that 48 disks go out. This is to be fixed in 0.21.0, I think.
* The CPUs (4 cores) are pegged when the system is under full load. If I had a chance, I'd give it more CPU horsepower.
As usual, everyone's application is different enough that any anecdote is possibly not applicable.
Brian