Hi all,

I have read through the dev mailing list and have a rough idea of
Mahout's progress. I have also read the Google MapReduce paper and the
NIPS paper. Regarding the NIPS paper, "Map-Reduce for Machine Learning
on Multicore", I have some questions.

1. Sect. 4.1, Algorithm Time Complexity Analysis.
The paper assumes m >> n, i.e. that the number of training instances
is much larger than the number of features, and its datasets do indeed
have very few features. But this may not be true for many tasks, e.g.
text classification, where the feature dimension can reach 10^4-10^5.
Will the analysis still hold in that case?
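
To make the concern concrete, here is my own back-of-envelope sketch
(not taken from the paper), using the LWLR decomposition A * theta = b
with A = sum_i w_i x_i x_i^T as the example:

  map (per core):  O(m n^2 / P)   accumulating the partial sums of A
  final solve:     O(n^3)         solving the n x n linear system

  m = 10^6, n = 10^2:  n^3 = 10^6 is negligible next to m n^2 / P
  m = 10^5, n = 10^4:  n^3 = 10^12 vs m n^2 / P = 10^13 / P, so for
                       P >= 10 the serial solve already dominates

Please correct me if I have set this up wrong.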

2. Also Sect. 4.1.
The paper says the "reduce phase can minimize communication by
combining data as it's passed back; this accounts for the logP
factor". Could you help me figure out how the logP factor is derived?
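
My current guess is that the logP comes from combining the partial
results pairwise in a tree over the P cores, so there are ceil(log2 P)
combine rounds instead of P - 1 sequential merges. A toy sketch in
Java of what I mean (my own illustration, not code from the paper or
from Mahout):

  // Toy illustration: folding P partial sums together pairwise in a
  // tree takes ceil(log2 P) rounds rather than P - 1 sequential merges.
  public class TreeCombine {
      public static void main(String[] args) {
          double[] partial = {1, 2, 3, 4, 5, 6, 7, 8}; // P = 8 partial results
          int active = partial.length;
          int rounds = 0;
          while (active > 1) {
              int half = (active + 1) / 2;
              // each round, the upper half of the still-active results
              // is folded into the lower half (in parallel in reality)
              for (int i = 0; i + half < active; i++) {
                  partial[i] += partial[i + half];
              }
              active = half;
              rounds++;
          }
          // prints: sum = 36.0, rounds = 3 (= log2 of 8)
          System.out.println("sum = " + partial[0] + ", rounds = " + rounds);
      }
  }

Is that the right way to read it, or does the logP come from somewhere
else?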

3. Sect. 5.4, Results and Discussion.
The paper says "SVM gets about 13.6% speed up on average over 16
cores". Should that be 13.6% or 13.6x? From Figure 2 it looks like it
should be a 13.6x speedup.

4. Also Sect. 5.4.
"Finally, the above are runs on multiprocessor machines." Whether
multiprocessor or multicore, these experiments run on a single machine
with shared memory. But MapReduce is really meant for clusters of
machines, where inter-machine communication is far more expensive. So
might the paper's results be hard to carry over to a real distributed
setting?
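
Just to quantify this worry with a rough, made-up example (the numbers
below are my own assumptions, not from the paper): suppose each mapper
has to ship an n x n partial matrix of doubles to the reducer.

  n = 10^4  =>  8 * n^2 bytes = 800 MB per partial result
  over shared memory that is essentially a pointer hand-off, but over
  a 1 Gbit/s link it is on the order of 6-7 seconds per mapper before
  the reducer even starts combining.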

Maybe some of these questions come from a misreading on my part.
Please help me get a full understanding of the paper. Thanks.
