A job's input path is always a list of paths; you don't need multiple
inputs just to specify several of them. What you need MultipleInputs
for is to assign different input file formats, and different mappers,
to different paths.

If all your input is formatted homogeneously, both in record structure
and in processing logic, then you don't need MultipleInputs; that's
not what it is for.
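To make the distinction concrete, here is a minimal sketch against the Hadoop 0.20.2 APIs discussed in this thread. The paths are made up for illustration, and IdentityMapper is used only as a placeholder mapper; this has not been run against a cluster.

```java
// Sketch only: contrasts plain multi-path input (new API) with
// per-path formats/mappers via MultipleInputs (old API, 0.20.2).
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.MultipleInputs;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class MultipleInputsSketch {
    public static void main(String[] args) throws Exception {
        // New API: several input paths, but one input format and one
        // mapper shared by all of them.
        Job job = new Job(new Configuration(), "new-api-job");
        FileInputFormat.addInputPath(job, new Path("/data/part1"));
        FileInputFormat.addInputPath(job, new Path("/data/part2"));

        // Old API: MultipleInputs binds a distinct InputFormat and
        // mapper class to each path. Note the JobConf signature --
        // this is what ties it to the old API.
        JobConf conf = new JobConf();
        MultipleInputs.addInputPath(conf, new Path("/data/text"),
                TextInputFormat.class, IdentityMapper.class);
        MultipleInputs.addInputPath(conf, new Path("/data/seq"),
                SequenceFileInputFormat.class, IdentityMapper.class);
    }
}
```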

On Sat, May 28, 2011 at 3:58 PM, Shannon Quinn <[email protected]> wrote:
> Isn't this just a matter of making multiple calls to
> FileInputFormat.addInputPath(...) (to adhere to the new APIs) ?
>
> On 5/28/11 5:54 PM, Dmitriy Lyubimov wrote:
>>
>> I don't see how you can use the deprecated MultipleInputs: unless
>> I'm missing something, its signature is tied to old-API types such as
>> JobConf, which you of course won't have when you define a new-API job.
>>
>> On Sat, May 28, 2011 at 3:43 PM, Dhruv Kumar<[email protected]>  wrote:
>>>
>>> Isabel and Dmitry,
>>>
>>> Thank you for your input on this. I've noticed that Mahout's code uses
>>> the
>>> new mapreduce package, so I have been following the new APIs. This was
>>> also
>>> suggested by Sean w.r.t Mahout-294.
>>>
>>> Multiple inputs is a requirement for my project, and I was planning
>>> on using the old org.apache.hadoop.mapred.lib.MultipleInputs class,
>>> which is not marked as deprecated in 0.20.2:
>>>
>>>
>>>
>>> http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/lib/MultipleInputs.html
>>>
>>> Is this advisable and if not, what are my options to handle multiple
>>> inputs?
>>>
>>> On Sat, May 28, 2011 at 5:59 PM, Dmitriy Lyubimov<[email protected]>
>>>  wrote:
>>>
>>>> Dhruv,
>>>>
>>>> Just a warning, before you want to lock yourself to new apis:
>>>>
>>>> Yes, the new APIs are preferable, but it is not always possible to
>>>> use them, because 0.20.2 lacks _a lot_ in terms of bare necessities
>>>> in the new-API realm (multiple inputs/outputs come to mind at once).
>>>>
>>>> I think I did weasel my way out of those in some cases, but I have
>>>> not tested it at scale yet, and it is certainly not an official way
>>>> to do it.
>>>>
>>>> Either way, it's probably not worth it for anything beyond basic MR
>>>> functionality until we switch to something that actually does have
>>>> the 'new api', because 0.20.2 has a very truncated version which is
>>>> far from complete.
>>>>
>>>> -d
>>>>
>>>> On Fri, May 27, 2011 at 3:19 AM, Isabel Drost<[email protected]>  wrote:
>>>>>
>>>>> On 18.05.2011 Dhruv Kumar wrote:
>>>>>>
>>>>>> For the GSoC project which version of Hadoop's API should I follow?
>>>>>
>>>>> Try to use the new M/R apis where possible - we had the same
>>>>> discussion in an earlier thread on spectral clustering. In addition,
>>>>> Sean just opened an issue concerning upgrading to newer Hadoop
>>>>> versions; you can take a look there as well.
>>>>>
>>>>> Isabel
>>>>>
>
>
