Maybe there is no direct equivalent, but there are many ways to build a MapReduce architecture into an existing system without Hadoop, and all these systems have something in common at a high level. I can see that many existing systems are adding the MapReduce paradigm to their stack (e.g. Aster, GigaSpaces, to name a few).

Do you think it would be too difficult or impractical at this point to aim for a clean design of the algorithms in Mahout and make them pure MapReduce, as opposed to coupled with Hadoop? A MapReduce API can be just a set of a few interfaces (I think such interfaces already exist in Hadoop, but I don't think you can get them as a separate JAR). The rest of the Hadoop dependencies (like using HDFS) can be abstracted later if needed.

Think of a developer who would like to use Mahout but cannot use Hadoop. For such a developer it would be "just" a matter of adapting Mahout to his/her proprietary MapReduce system. I am not saying Mahout should have this capability now, but it would be a nice goal.
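
To make the "set of a few interfaces" idea concrete, here is a minimal sketch of what such an API might look like. All names in it (Emitter, Mapper, Reducer, MapReduceProvider, LocalProvider, WordCount) are hypothetical and made up for illustration; they are not the Hadoop or Mahout types. The in-memory provider only stands in for a real engine, and it assumes keys are comparable so the output order is stable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical names throughout: none of these types come from Hadoop or Mahout.

/** Receives the key/value pairs emitted by a mapper or reducer. */
interface Emitter<K, V> {
    void emit(K key, V value);
}

/** Map phase: turns one input record into zero or more intermediate pairs. */
interface Mapper<I, K, V> {
    void map(I input, Emitter<K, V> out);
}

/** Reduce phase: folds all values sharing a key into output pairs. */
interface Reducer<K, V, R> {
    void reduce(K key, List<V> values, Emitter<K, R> out);
}

/** Runs a job. A Hadoop-backed or proprietary engine would be another implementation. */
interface MapReduceProvider {
    <I, K extends Comparable<K>, V, R> Map<K, R> run(
            List<I> inputs, Mapper<I, K, V> mapper, Reducer<K, V, R> reducer);
}

/** Single-threaded, in-memory provider; for tests and illustration only. */
class LocalProvider implements MapReduceProvider {
    @Override
    public <I, K extends Comparable<K>, V, R> Map<K, R> run(
            List<I> inputs, Mapper<I, K, V> mapper, Reducer<K, V, R> reducer) {
        // Map + shuffle: group intermediate values by key, sorted by key.
        Map<K, List<V>> groups = new TreeMap<>();
        for (I input : inputs) {
            mapper.map(input, (k, v) -> groups.computeIfAbsent(k, x -> new ArrayList<>()).add(v));
        }
        // Reduce: one call per key; if a reducer emits a key twice, the last value wins.
        Map<K, R> result = new TreeMap<>();
        for (Map.Entry<K, List<V>> e : groups.entrySet()) {
            reducer.reduce(e.getKey(), e.getValue(), result::put);
        }
        return result;
    }
}

/** An algorithm written only against the interfaces above, not against any engine. */
class WordCount {
    static Map<String, Integer> count(MapReduceProvider provider, List<String> lines) {
        Mapper<String, String, Integer> mapper = (line, out) -> {
            for (String w : line.toLowerCase().split("\\s+")) {
                if (!w.isEmpty()) out.emit(w, 1);
            }
        };
        Reducer<String, Integer, Integer> reducer = (word, ones, out) -> {
            int sum = 0;
            for (int n : ones) sum += n;
            out.emit(word, sum);
        };
        return provider.run(lines, mapper, reducer);
    }
}

class WordCountDemo {
    public static void main(String[] args) {
        Map<String, Integer> counts = WordCount.count(
                new LocalProvider(), Arrays.asList("map reduce map", "reduce map"));
        System.out.println(counts); // prints {map=3, reduce=2}
    }
}
```

The point of the sketch is only that WordCount never mentions the engine: swapping LocalProvider for another MapReduceProvider would be the "adapting" step. Input/output formats, partitioning and storage (HDFS or otherwise) are deliberately left out.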

Regards,
Lukas

On Mon, Sep 7, 2009 at 9:42 AM, Sean Owen <sro...@gmail.com> wrote:
> I don't know of any other viable alternative at the moment, and I
> think any alternative would be sufficiently different that it would be
> hard to meaningfully abstract it away without inventing our own little
> mapreduce layer. It still doesn't save anyone from thinking about the
> details of configuring the underlying implementation -- in fact, now
> they have to worry about configuring Mahout-style mapreduce layer as
> well.
>
> (In comparison, take a look at something as simple as logging. Through
> people inventing abstractions, and abstractions on abstractions, it's
> actually turned into something difficult to manage. Using SLF4J,
> putting in the right bindings .jar so it routes through Log4J -- and
> don't forget log4j.xml -- which you have to use because your
> dependencies use it, and then, what about that library that will try
> to select Log4J or Commons on its own, but it's using Commons because
> it found it in the classpath, and now you don't remember which file
> configures that, and...)
>
>
> On Mon, Sep 7, 2009 at 8:32 AM, Lukáš Vlček <lukas.vl...@gmail.com> wrote:
> > Hi,
> > just a note: Wouldn't it be better to talk about MapReduce as opposed to
> > Hadoop? This means that for each algorithm implemented in Mahout it should
> > be clearly stated whether it is MapReduce based implementation or not (or
> > using other ways to make it scalable). I can imagine it could be useful to
> > abstract from Hadoop to the point where it would be possible to use
> > different MapReduce providers. I am not sure whether there is any consensus
> > about how MapReduce interfaces API should look like but Mahout could be a
> > good candidate for a project to define and create abstract MapReduce API.