Re: Input file format doubt

Ravi Thu, 28 Jan 2010 04:19:24 -0800

Thank you Amogh

Ravi.


On 1/28/10, Amogh Vasekar <am...@yahoo-inc.com> wrote:
> Hi,
> Here's the relevant thread with Gordon, the author of the solution:
> I am in the process of learning Hadoop (and I think I've made a lot of
> progress).  I have described the specific problem and solution on my blog
> http://www.data-miners.com/blog/2009/11/hadoop-and-mapreduce-parallel-program.html.
>
> Your particular solution won't work, because I need to do additional
> processing between the two passes.
>
> --gordon
>
> On Wed, Nov 25, 2009 at 1:50 AM, Amogh Vasekar <am...@yahoo-inc.com> wrote:
>
> Amogh
>
>
> On 1/28/10 4:03 PM, "Ravi" <ravindra.babu.rav...@gmail.com> wrote:
>
> Thank you Amogh.
>
> On Thu, Jan 28, 2010 at 3:44 PM, Amogh Vasekar <am...@yahoo-inc.com> wrote:
>
>> Hi,
>> For global line numbers, you would need to know the ordering within each
>> split generated from the input file. The standard input formats provide
>> byte offsets as keys, so if the records are of equal length you can
>> compute a line number from the offset.
>> I remember someone had implemented sequential numbering using the
>> partition id for each map task (mapred.task.partition) and posted this on
>> his blog. I don't have it handy right now, but I will send it to you
>> off-list if I find it.
>>
>> Amogh
>>
>>
>> On 1/28/10 3:29 PM, "Udaya Lakshmi" <udaya...@gmail.com> wrote:
>>
>> Hi all,
>> I have searched the documentation but could not find an input file
>> format that gives the line number as the key and the line as the value.
>> Did I miss something? Can someone give me a clue on how to implement
>> such an input file format?
>>
>> Thanks,
>> Udaya.
>>
>>
>
>
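
The two ideas in the thread (dividing the byte offset by a fixed record
length, and numbering lines sequentially by map task partition id) can be
sketched outside Hadoop in plain Python. This is only an illustration under
stated assumptions, not the code from the blog post: the function names are
hypothetical, each split is modelled as a list of lines, and partition ids
are assumed to follow the order of the splits in the file.

```python
from itertools import accumulate


def line_number_from_offset(byte_offset, record_length):
    """0-based line number of a record in a file of fixed-length records.

    Assumes every record (including its newline) is exactly
    record_length bytes long.
    """
    if record_length <= 0:
        raise ValueError("record_length must be positive")
    if byte_offset % record_length != 0:
        raise ValueError("offset is not on a record boundary")
    return byte_offset // record_length


def count_lines_per_split(splits):
    """Pass 1: each 'map task' reports how many lines its split holds."""
    return [len(lines) for lines in splits]


def starting_line_numbers(counts):
    """Between the passes: exclusive prefix sum of the per-split counts.

    Entry i is the global number of the first line in split i.
    """
    return list(accumulate(counts, initial=0))[:-1]


def number_lines(splits):
    """Pass 2: each 'map task' adds its starting number to a local index."""
    starts = starting_line_numbers(count_lines_per_split(splits))
    numbered = []
    for partition_id, lines in enumerate(splits):
        for local_index, line in enumerate(lines):
            numbered.append((starts[partition_id] + local_index, line))
    return numbered
```

In a real job the per-split counts would have to be collected and turned into
starting numbers between the two passes, which appears to be the extra
processing Gordon refers to. If partition ids do not match file order, the
numbering is still sequential but will not match the original line order.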
