Hi all, I have read through the dev mailing list and have a rough idea of Mahout's progress. I have also read the Google paper and the NIPS paper. Regarding the NIPS paper, "Map-Reduce for Machine Learning on Multicore", I have some questions.
1. Section 4.1, algorithm time complexity analysis: the paper assumes m >> n, i.e., that the number of training instances is much larger than the number of features, and its datasets indeed have very few features. But this does not hold for many tasks, e.g., text classification, where the feature dimension reaches 10^4-10^5. Does the analysis still hold in that regime? (See my P.S. below for the concern spelled out.)

2. Also in Section 4.1: "reduce phase can minimize communication by combining data as it's passed back; this accounts for the log(P) factor". Could you help me figure out how the log(P) is derived? (My current guess is in the P.S.)

3. Section 5.4, Results and Discussion: "SVM gets about 13.6% speed up on average over 16 cores" -- is that 13.6% or 13.6x? From Figure 2 it looks like it should be 13.6x. (A quick sanity check is in the P.S.)

4. Also in Section 5.4: "Finally, the above are runs on multiprocessor machines." Whether multiprocessor or multicore, everything runs on a single machine with shared memory, whereas MapReduce in practice targets clusters of machines, where inter-machine communication costs much more. Doesn't that make the paper's results questionable as a guide for the distributed setting?

Maybe some of these questions are complete misinterpretations on my part. Please help me get a full understanding of the paper. Thanks.
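P.S. Below are the back-of-the-envelope notes behind questions 1-3, in case they help pin down where I am confused.

On question 1, my worry in Amdahl's-law terms (my own framing, not something taken from the paper): if some part of an algorithm stays serial -- say an O(n^3) solve of an n x n system, as in the normal-equations form of linear regression -- while only the per-example work is split across P cores, the achievable speedup is bounded by 1 / (s + (1 - s)/P), where s is the serial fraction. For example, if a large n makes the serial solve 20% of the total work (s = 0.2), the best possible speedup on P = 16 cores is 1 / (0.2 + 0.8/16) = 4x, no matter how large m is. That is why I wonder whether the near-linear speedups carry over to high-dimensional text data.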
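On question 2, my current guess at where log(P) comes from, which I would appreciate someone confirming: if each of the P cores/mappers emits a partial result and the reduce phase combines them pairwise in a binary tree instead of funneling all P results to one node, only about log2(P) combining rounds are needed. A toy sketch in Java (my own illustration, not code from the paper or from Mahout):

    // Combine P partial sums pairwise; each round halves the number of values
    // still to be merged, so only about ceil(log2(P)) combining rounds are needed.
    public class TreeCombineSketch {
        static double treeReduce(double[] partialSums) {
            int active = partialSums.length;             // one partial result per core/mapper
            while (active > 1) {                         // each loop iteration = one combining round
                int half = (active + 1) / 2;
                for (int i = 0; i + half < active; i++) {
                    partialSums[i] += partialSums[i + half];  // pairwise combine
                }
                active = half;                           // half as many values remain
            }
            return partialSums[0];                       // total after ~log2(P) rounds
        }

        public static void main(String[] args) {
            double[] partials = {1, 2, 3, 4, 5, 6, 7, 8};    // pretend P = 8 cores
            System.out.println(treeReduce(partials));        // prints 36.0 after 3 rounds
        }
    }

So with P = 16, that would be 4 combining rounds instead of 16 sequential merges -- is that roughly where the log(P) term in the analysis comes from?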
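On question 3, the quick arithmetic that makes me think it must be 13.6x: a 13.6x speedup on 16 cores corresponds to a parallel efficiency of 13.6 / 16 = 85%, which is consistent with what I see in Figure 2, whereas a 13.6% speedup would mean the parallel run is only 1.136x faster than the serial one, which the figure clearly does not show.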