Isabel and Dmitriy,

Thank you for your input on this. I've noticed that Mahout's code uses the new mapreduce package, so I have been following the new APIs. Sean also suggested this w.r.t. Mahout-294.

Multiple inputs are a requirement for my project, and I was planning to use the old org.apache.hadoop.mapred.lib.MultipleInputs class, which is not marked as deprecated in 0.20.2:

http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/lib/MultipleInputs.html

Is this advisable? If not, what are my options for handling multiple inputs?

On Sat, May 28, 2011 at 5:59 PM, Dmitriy Lyubimov <[email protected]> wrote:

> Dhruv,
>
> Just a warning, before you want to lock yourself to new apis:
>
> Yes new APIs are preferrable but it is not always possible to use them
> because 0.20.2 lacks _a lot_ in terms of bare necessities in new api
> realm . (multiple inputs/ outputs come to mind at once).
>
> I think i did weasel my way out of those in some cases but i did not
> test it at scale yet, it is certainly not an official way to do it.
>
> Either way it's probably not worth it for anything beyond sheer basic
> MR functionality until we switch to something that actually does have
> the 'new api' because 0.20.2 has some very much truncated version
> which is very far from complete.
>
> -d
>
> On Fri, May 27, 2011 at 3:19 AM, Isabel Drost <[email protected]> wrote:
> > On 18.05.2011 Dhruv Kumar wrote:
> >> For the GSoC project which version of Hadoop's API should I follow?
> >
> > Try to use the new M/R apis where possible - we had the same discussion in an
> > earlier thread on spectral clustering, in addition Sean just opened an issue
> > concerning Upgrading to newer Hadoop versions, you can take a look there as
> > well.
> >
> > Isabel
