Isabel and Dmitriy,

Thank you for your input on this. I've noticed that Mahout's code uses the
new org.apache.hadoop.mapreduce package, so I have been following the new
APIs. Sean also suggested this w.r.t. Mahout-294.

Multiple inputs are a requirement for my project, and I was planning on using
the old org.apache.hadoop.mapred.lib.MultipleInputs class, which is not marked
as deprecated in 0.20.2:


http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/lib/MultipleInputs.html

Is this advisable? If not, what are my options for handling multiple inputs?

On Sat, May 28, 2011 at 5:59 PM, Dmitriy Lyubimov <[email protected]> wrote:

> Dhruv,
>
> Just a warning, before you want to lock yourself to new apis:
>
> Yes new APIs are preferable but it is not always possible to use them
> because 0.20.2 lacks _a lot_ in terms of bare necessities in new api
> realm (multiple inputs/outputs come to mind at once).
>
> I think i did weasel my way out of those in some cases but i did not
> test it at scale yet, it is certainly not an official way to do it.
>
> Either way it's probably not worth it for anything beyond sheer basic
> MR functionality until we switch to something that actually does have
> the 'new api' because 0.20.2 has some very much truncated version
> which is very far from complete.
>
> -d
>
> On Fri, May 27, 2011 at 3:19 AM, Isabel Drost <[email protected]> wrote:
> > On 18.05.2011 Dhruv Kumar wrote:
> >> For the GSoC project which version of Hadoop's API should I follow?
> >
> > Try to use the new M/R apis where possible - we had the same discussion in
> > an earlier thread on spectral clustering, in addition Sean just opened an
> > issue concerning Upgrading to newer Hadoop versions, you can take a look
> > there as well.
> >
> > Isabel
> >
>
