Thanks for the response. What I am trying to do is find the average and
then the standard deviation of a very large set (say a million) of
numbers. The result would be used in further calculations.
I have got the average from the first MapReduce job in the chain. Now I
need to read this average, as well as the original set of numbers, to
calculate the standard deviation. So one file would have the input set,
and the other "resultant" file would have just the average.
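To make this concrete, here is roughly what I have in mind for the driver of
the second job: read the single average value back out of HDFS and hand it to
the mappers through the job configuration. This is an untested sketch; the
"avg-out" path, the "stddev.average" key, and the StdDevJob class are just
placeholder names I made up:

import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

// In the driver, after the first (averaging) job has finished:
JobConf conf = new JobConf(StdDevJob.class);
FileSystem fs = FileSystem.get(conf);
// The first job's lone output record, e.g. a single line "average\t42.5".
Path avgFile = new Path("avg-out/part-00000");
BufferedReader in = new BufferedReader(new InputStreamReader(fs.open(avgFile)));
String line = in.readLine();
in.close();
// Make the average available to every mapper of the second job.
conf.set("stddev.average", line.split("\t")[1]);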
Please do tell me if there is a better way of doing things than what I am
doing. Any input/suggestion is appreciated. :)
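For reference, the mapper I am sketching for the second job looks something
like this (again untested; StdDevMapper is just a name I picked). It pulls the
average back out of the configuration in configure() and emits one squared
deviation per input number, all under a single key:

import java.io.IOException;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class StdDevMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, DoubleWritable> {

  private double average;

  public void configure(JobConf job) {
    // Set by the driver from the first job's output (see above).
    average = Double.parseDouble(job.get("stddev.average"));
  }

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, DoubleWritable> output,
                  Reporter reporter) throws IOException {
    double x = Double.parseDouble(value.toString().trim());
    double d = x - average;
    // One shared key so a single reducer sees all squared deviations.
    output.collect(new Text("sumsq"), new DoubleWritable(d * d));
  }
}

A single reducer would then sum these values, divide by the count, and take
the square root: stddev = sqrt(sum((x - avg)^2) / n).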



On Mon, Nov 10, 2008 at 4:22 AM, Amar Kamat <[EMAIL PROTECTED]> wrote:

> Amar Kamat wrote:
>
>> some speed wrote:
>>
>>> I was wondering if it was possible to read the input for a map function
>>> from 2 different files:
>>>  1st file ---> user-input file from a particular location (path)
>>>
> Is the input/user file sorted? If yes then you can use "map-side join" for
> performance reasons. See org.apache.hadoop.mapred.join for more details.
>
>>> 2nd file ---> A resultant file (has just one <key,value> pair) from a
>>> previous MapReduce job. (I am implementing a chained MapReduce flow.)
>>>
> Can you explain the contents of the 2nd file in more detail?
>
>>
>>> Now, for every <key,value> pair in the user-input file, I would like to
>>> use the same <key,value> pair from the 2nd file for some calculations.
>>>
> Can you explain this in more detail? Can you give some abstracted example
> of what file1 and file2 look like and what operation/processing you want to
> do?
>
>> I guess you might need to do some kind of join on the 2 files. Look at
>> contrib/data_join for more details.
>> Amar
>>
>>> Is it possible for me to do so? Can someone guide me in the right
>>> direction please?
>>>
>>>
>>> Thanks!
