Hi,

I have a question about Hadoop which someone in Mahout has most likely
solved before:

Many online ML algorithms need multiple passes over the data to perform
well. When running these algorithms on Hadoop, one would want to
execute the code close to the data (same machine or rack). Mappers
offer this data-local execution but provide no way to make multiple
passes over the data. Of course, one could run the code outside the
Hadoop MapReduce framework as an HDFS client, but that loses the
data-locality advantage and is not scheduled through the Hadoop
scheduler.
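
To make the question concrete, here is a minimal sketch of the only
workaround I know of: launching one MapReduce job per pass from a
driver loop. It assumes the Hadoop 2.x MapReduce API, and MultiPassDriver
and PassMapper are made-up names. Each pass gets data locality, but
pays the full job-launch overhead, and the model has to be shipped
between passes out of band.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MultiPassDriver {

  // Placeholder mapper: a real one would load the model from the
  // previous pass (e.g. via the distributed cache) and update it
  // against each record it sees.
  public static class PassMapper
      extends Mapper<LongWritable, Text, Text, NullWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      context.write(value, NullWritable.get());
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    int numPasses = 10; // one full MapReduce job per pass over the data

    for (int pass = 0; pass < numPasses; pass++) {
      Job job = Job.getInstance(conf, "training pass " + pass);
      job.setJarByClass(MultiPassDriver.class);
      job.setMapperClass(PassMapper.class);
      job.setNumReduceTasks(0); // map-only: each pass reads the input data-locally
      job.setOutputKeyClass(Text.class);
      job.setOutputValueClass(NullWritable.class);
      FileInputFormat.addInputPath(job, new Path(args[0]));
      FileOutputFormat.setOutputPath(job, new Path(args[1] + "/pass-" + pass));
      if (!job.waitForCompletion(true)) {
        System.exit(1); // abort if any pass fails
      }
      // The model written under pass-<i> would feed pass i+1, paying
      // full job-scheduling overhead on every single pass.
    }
  }
}

This works, but launching a fresh job for every pass seems wasteful,
which is why I am asking how Mahout approaches it.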

How is this solved in Mahout?

Thanks for any pointers,

Markus
