Hi, I am working on implementing some machine learning algorithms using MapReduce. I have data that takes 5-6 hours to train on a normal machine. Will adding 2-3 more nodes make a noticeable difference? I read this in the Yahoo! Hadoop tutorial: "Executing Hadoop on a limited amount of data on a small number of nodes may not demonstrate particularly stellar performance as the overhead involved in starting Hadoop programs is relatively high. Other parallel/distributed programming paradigms such as MPI (Message Passing Interface) may perform much better on two, four, or perhaps a dozen machines."
I have at my disposal 3 laptops, each with 4 GB of RAM and 150 GB of hard disk space, and I have 600 MB of training data.

--
View this message in context: http://www.nabble.com/How-many-nodes-does-one-man-want--tp22733399p22733399.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.