Thank you, Amogh. I will go through the link.

Udaya.

On 1/28/10, Ravi <ravindra.babu.rav...@gmail.com> wrote:
> Thank you Amogh
>
> Ravi.
>
> On 1/28/10, Amogh Vasekar <am...@yahoo-inc.com> wrote:
>> Hi,
>> Here's the relevant thread with Gordon, the author of the solution:
>> I am in the process of learning Hadoop (and I think I've made a lot of
>> progress). I have described the specific problem and solution on my blog
>> http://www.data-miners.com/blog/2009/11/hadoop-and-mapreduce-parallel-program.html.
>>
>> Your particular solution won't work, because I need to do additional
>> processing between the two passes.
>>
>> --gordon
>>
>> On Wed, Nov 25, 2009 at 1:50 AM, Amogh Vasekar <am...@yahoo-inc.com>
>> wrote:
>>
>> Amogh
>>
>>
>> On 1/28/10 4:03 PM, "Ravi" <ravindra.babu.rav...@gmail.com> wrote:
>>
>> Thank you Amogh.
>>
>> On Thu, Jan 28, 2010 at 3:44 PM, Amogh Vasekar <am...@yahoo-inc.com>
>> wrote:
>>
>>> Hi,
>>> For global line numbers, you would need to know the ordering within each
>>> split generated from the input file. The standard input formats provide
>>> offsets in splits, so if the records are of equal length you can compute
>>> some kind of numbering.
>>> I remember someone had implemented sequential numbering using the
>>> partition id for each map task (mapred.task.partition) and posted this on
>>> his blog. I don't have it handy with me right now, but will send you off
>>> the list if I find it.
>>>
>>> Amogh
>>>
>>>
>>> On 1/28/10 3:29 PM, "Udaya Lakshmi" <udaya...@gmail.com> wrote:
>>>
>>> Hi all..
>>> I have searched the documentation but could not find an input file
>>> format which will give line number as the key and line as the value.
>>> Did I miss something? Can someone give me a clue of how to implement
>>> one such input file format?
>>>
>>> Thanks,
>>> Udaya.
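
The two ideas in Amogh's reply can be sketched in plain Python. This is only an illustration of the arithmetic, not Hadoop code and not the blog author's implementation: the function names are hypothetical, the record length is assumed to include the line terminator, and the partition ids are assumed to follow the order of the splits in the input file.

```python
def line_number_from_offset(byte_offset, record_length):
    """0-based global line number of a fixed-length record.

    Assumption: every record has the same length in bytes,
    line terminator included.
    """
    if byte_offset % record_length != 0:
        raise ValueError("offset is not on a record boundary")
    return byte_offset // record_length


def partition_start_numbers(lines_per_partition):
    """First pass result -> starting line number of each partition.

    lines_per_partition[i] is the number of lines counted by the task
    with partition id i (assumed to match the split order in the file).
    """
    starts, total = [], 0
    for count in lines_per_partition:
        starts.append(total)  # lines seen in all earlier partitions
        total += count
    return starts


def number_lines(partition_id, lines, starts):
    """Second pass: attach global line numbers to one partition's lines."""
    first = starts[partition_id]
    return [(first + i, line) for i, line in enumerate(lines)]


# Hypothetical input: three splits of a six-line file.
splits = [["a", "b", "c"], ["d"], ["e", "f"]]
starts = partition_start_numbers([len(s) for s in splits])
numbered_last_split = number_lines(2, splits[2], starts)
```

With equal-length records the offset alone is enough, so a single pass works. With variable-length records each task first has to report how many lines it saw, which is why the partition id approach needs two passes.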