Thank you, Amogh.

Ravi
On 1/28/10, Amogh Vasekar <am...@yahoo-inc.com> wrote:
> Hi,
> Here's the relevant thread with Gordon, the author of the solution:
>
> I am in the process of learning Hadoop (and I think I've made a lot of
> progress). I have described the specific problem and solution on my blog
> http://www.data-miners.com/blog/2009/11/hadoop-and-mapreduce-parallel-program.html.
>
> Your particular solution won't work, because I need to do additional
> processing between the two passes.
>
> --gordon
>
> On Wed, Nov 25, 2009 at 1:50 AM, Amogh Vasekar <am...@yahoo-inc.com> wrote:
>
> Amogh
>
>
> On 1/28/10 4:03 PM, "Ravi" <ravindra.babu.rav...@gmail.com> wrote:
>
> Thank you Amogh.
>
> On Thu, Jan 28, 2010 at 3:44 PM, Amogh Vasekar <am...@yahoo-inc.com> wrote:
>
>> Hi,
>> For global line numbers, you would need to know the ordering within each
>> split generated from the input file. The standard input formats provide
>> offsets in splits, so if the records are of equal length you can compute
>> some kind of numbering.
>> I remember someone had implemented sequential numbering using the partition
>> id for each map task (mapred.task.partition) and posted this on his blog. I
>> don't have it handy with me right now, but will send you off the list if I
>> find it.
>>
>> Amogh
>>
>>
>> On 1/28/10 3:29 PM, "Udaya Lakshmi" <udaya...@gmail.com> wrote:
>>
>> Hi all..
>> I have searched the documentation but could not find an input file
>> format which will give line number as the key and line as the value.
>> Did I miss something? Can someone give me a clue of how to implement
>> one such input file format.
>>
>> Thanks,
>> Udaya.
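
For anyone reading this thread later, here is a rough sketch of the arithmetic behind the two ideas Amogh mentions. It is plain Python with no Hadoop calls, and it is not Gordon's code. The function names are made up for illustration. It assumes that partition ids follow the order of the splits in the input file, and that the per-partition line counts from a first pass are available before the second pass starts.

```python
from itertools import accumulate


def line_number_from_offset(byte_offset, record_length):
    """Zero-based line number of a record, given its byte offset.

    Only valid when every record (newline included) is exactly
    record_length bytes long, which is the equal-length case above.
    """
    if record_length <= 0:
        raise ValueError("record_length must be positive")
    if byte_offset % record_length != 0:
        raise ValueError("offset does not fall on a record boundary")
    return byte_offset // record_length


def starting_line_numbers(lines_per_partition):
    """Global number of the first line in each partition.

    lines_per_partition[i] is the line count reported for partition id i;
    the ids are ASSUMED to follow file order.
    """
    # Exclusive prefix sum: partition i starts after all lines in 0..i-1.
    return list(accumulate(lines_per_partition, initial=0))[:-1]


def number_lines(partitions):
    """Simulate both passes over in-memory partitions (lists of lines)."""
    # Pass 1: each map task would report how many lines it saw.
    counts = [len(lines) for lines in partitions]
    # Between passes: turn the counts into per-partition starting numbers.
    starts = starting_line_numbers(counts)
    # Pass 2: each task numbers its own lines from its starting number.
    return [
        (start + i, line)
        for start, lines in zip(starts, partitions)
        for i, line in enumerate(lines)
    ]
```

In a real job the counts would come out of the first MapReduce pass and the starting numbers would be handed to the second pass through the job configuration or a side file. This sketch only shows the bookkeeping, not that plumbing.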