Thank you Amogh. I will go through the link.

Udaya.

On 1/28/10, Ravi <ravindra.babu.rav...@gmail.com> wrote:
> Thank you Amogh
>
> Ravi.
>
> On 1/28/10, Amogh Vasekar <am...@yahoo-inc.com> wrote:
>> Hi,
>> Here's the relevant thread with Gordon, the author of the solution:
>> I am in the process of learning Hadoop (and I think I've made a lot of
>> progress).  I have described the specific problem and solution on my blog
>> http://www.data-miners.com/blog/2009/11/hadoop-and-mapreduce-parallel-program.html.
>>
>> Your particular solution won't work, because I need to do additional
>> processing between the two passes.
>>
>> --gordon
>>
>> On Wed, Nov 25, 2009 at 1:50 AM, Amogh Vasekar <am...@yahoo-inc.com>
>> wrote:
>>
>> Amogh
>>
>>
>> On 1/28/10 4:03 PM, "Ravi" <ravindra.babu.rav...@gmail.com> wrote:
>>
>> Thank you Amogh.
>>
>> On Thu, Jan 28, 2010 at 3:44 PM, Amogh Vasekar <am...@yahoo-inc.com>
>> wrote:
>>
>>> Hi,
>>> For global line numbers, you would need to know the ordering within each
>>> split generated from the input file. The standard input formats provide
>>> offsets in splits, so if the records are of equal length you can compute
>>> some kind of numbering.
>>> I remember someone had implemented sequential numbering using the
>>> partition id for each map task (mapred.task.partition) and posted this
>>> on his blog. I don't have it handy with me right now, but will send it
>>> to you off the list if I find it.
>>>
>>> Amogh
>>>
>>>
>>> On 1/28/10 3:29 PM, "Udaya Lakshmi" <udaya...@gmail.com> wrote:
>>>
>>> Hi all..
>>>  I have searched the documentation but could not find an input file
>>> format that gives the line number as the key and the line as the value.
>>> Did I miss something? Can someone give me a clue about how to implement
>>> such an input format?
>>>
>>> Thanks,
>>> Udaya.
>>>
>>>
>>
>>
>
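
The two numbering ideas Amogh mentions in the quoted thread can be
sketched in plain Java. This is only an illustration under stated
assumptions, not the code from the blog post and not a Hadoop
InputFormat: the class and method names are hypothetical, no Hadoop
API is used, and it assumes partition ids follow the order of the
splits in the input file and that a first pass has already counted
the lines in each partition.

```java
import java.util.Arrays;

// Hypothetical helper; names are illustrative, not part of Hadoop.
public class GlobalLineNumbers {

    // Fixed-length records: the byte offset that the standard input
    // formats supply as the key maps directly to a 0-based line number.
    public static long lineNumberFromOffset(long byteOffset, int recordLength) {
        if (recordLength <= 0) {
            throw new IllegalArgumentException("recordLength must be positive");
        }
        if (byteOffset % recordLength != 0) {
            throw new IllegalArgumentException("offset is not on a record boundary");
        }
        return byteOffset / recordLength;
    }

    // Variable-length records: given the line count of each partition
    // (assumed to come from a first pass), the first global line number
    // of partition i is the sum of the counts of partitions 0..i-1.
    public static long[] startingLineNumbers(long[] linesPerPartition) {
        long[] starts = new long[linesPerPartition.length];
        long running = 0;
        for (int i = 0; i < linesPerPartition.length; i++) {
            starts[i] = running;
            running += linesPerPartition[i];
        }
        return starts;
    }

    // Second pass: a task adds its local 0-based line index to the
    // starting number of its own partition.
    public static long globalLineNumber(long[] starts, int partitionId, long localIndex) {
        return starts[partitionId] + localIndex;
    }

    public static void main(String[] args) {
        long[] counts = {3, 5, 2};                          // lines seen per partition
        long[] starts = startingLineNumbers(counts);
        System.out.println(Arrays.toString(starts));        // [0, 3, 8]
        System.out.println(globalLineNumber(starts, 1, 4)); // 7
        System.out.println(lineNumberFromOffset(240, 80));  // 3
    }
}
```

In a real job the partition id would presumably be read from the task
configuration (mapred.task.partition, as mentioned above), and the
per-partition counts would have to be made available to the second
pass by some side channel; both steps are left out here.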
