Ryan Smith wrote:
I have a question that I feel I should ask on this thread. Let's say you
want to build a cluster where you will be doing very little map/reduce:
storage and replication of data on HDFS only. What would the hardware
requirements be? No quad core? Less RAM?


Servers with more HDD per CPU, and less RAM. CPUs are a big slice not just of capital, but of your power budget. If you are running a big datacentre, you will care about that electricity bill.

Assuming you go for 1U boxes with 6 HDDs each, you could have 6 or 12 TB per U (with 1 TB or 2 TB drives), then perhaps a 2-core or 4-core server with "enough" ECC RAM.
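A quick back-of-envelope sketch of the per-U arithmetic above, assuming one 6-drive box per U and the default replication of 3. The function names are hypothetical helpers, not part of any Hadoop tool; the point is that raw capacity is not the same as the amount of unique data you can hold.

```python
def raw_tb_per_u(disks_per_box: int, tb_per_disk: float, boxes_per_u: int = 1) -> float:
    """Raw disk capacity per rack unit, before any OS/log reservation."""
    return disks_per_box * tb_per_disk * boxes_per_u


def logical_tb(raw_tb: float, replication: int = 3) -> float:
    """Unique data you can store once HDFS keeps `replication` copies of every block."""
    return raw_tb / replication


# Hypothetical comparison: 6 drives per 1U box, with 1 TB vs 2 TB drives.
for drive_tb in (1, 2):
    raw = raw_tb_per_u(disks_per_box=6, tb_per_disk=drive_tb)
    print(f"{drive_tb} TB drives: {raw:g} TB raw per U, "
          f"~{logical_tb(raw):g} TB of unique data at replication 3")
```

With 2 TB drives this gives 12 TB raw per U but only about 4 TB of unique data, which is worth keeping in mind when sizing a storage-only cluster.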

* With less M/R work, you could allocate most of those TB to HDFS storage and leave a few hundred GB for the OS and logs.

* You'd better estimate external load. If the cluster is storing data, total network bandwidth will be 3X the data ingress (for replication = 3), while reads cost only the data itself. Also, there are 5 threads on 3 different machines handling the write-and-forward process.

* I don't know how much load the datanode JVM would carry with, say, 11 TB of managed storage underneath it, in terms of both memory and CPU time.
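The capacity and bandwidth points in the list above can be turned into a rough estimator. This is a sketch under stated assumptions: the 300 GB reservation is a hypothetical stand-in for "a few hundred GB", the client writing the data is assumed to sit outside the cluster (so all three pipeline hops cross the network), and the function names are illustrative only.

```python
REPLICATION = 3  # replication factor used in the post


def hdfs_capacity_tb(raw_tb: float, reserved_gb: float = 300.0) -> float:
    """Disk left for HDFS blocks on one node after reserving space for the OS and logs.
    300 GB is a hypothetical value for 'a few hundred GB'."""
    return raw_tb - reserved_gb / 1000.0


def network_load_mb_s(ingress_mb_s: float, egress_mb_s: float,
                      replication: int = REPLICATION) -> float:
    """Aggregate network traffic for the cluster, in MB/s.

    Writes: each replica is one hop in the write pipeline
    (client -> DN1 -> DN2 -> DN3 for replication 3), so traffic is
    replication x ingress, assuming an off-cluster client.
    Reads: one copy of the data leaves the cluster, so cost is the data itself.
    """
    return replication * ingress_mb_s + egress_mb_s


# Hypothetical workload: 100 MB/s of incoming data, 50 MB/s of reads.
print(network_load_mb_s(ingress_mb_s=100, egress_mb_s=50))  # 350
print(hdfs_capacity_tb(12.0))
```

Even a modest ingest rate triples on the wire, so for a storage-heavy cluster the switch fabric may matter more than the CPUs.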

Is anyone out there running big datanodes? What do they see?

-steve
