Assume I have a large file called BigData.unsorted (say 500 GB) consisting of lines of text, and that these lines are in random order. I understand how to assign a key to each line, and that Hadoop will pass the lines to my reducers in order of that key.
Now assume I want a single file called BigData.sorted with the lines in the order of the keys. I think I understand how to get files part-00000, part-00001, ..., but not:

1) How do I get just the lines from the reducer, not the keys?
2) How do I make the reducer generate a file with the name that I want, "BigData.sorted"?
3) How do I get a single output file without using a single reducer instance, or is a single reducer the right choice for this task?

Also, it would be very nice if the output of the reducer were compressed, say BigData.sorted.gz.

Any suggestions?

--
Steven M. Lewis PhD
Institute for Systems Biology
Seattle WA
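
P.S. To make the desired result concrete, here is a minimal single-machine sketch in plain Java. It is not Hadoop code and would not scale to 500 GB; it only shows the output I am after: lines written in key order, without the keys, into one gzip-compressed file. The class name, the helper methods, and the key function (lower-casing the line) are made-up placeholders for illustration.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class SortedOutputSketch {

    // Hypothetical key function: the real job would derive its own key per line.
    static String keyOf(String line) {
        return line.toLowerCase();
    }

    // Sort the lines by key, then write only the lines (never the keys)
    // to a single gzip-compressed file.
    static void writeSorted(List<String> lines, File out) throws IOException {
        List<String> sorted = new ArrayList<>(lines);
        sorted.sort(Comparator.comparing(SortedOutputSketch::keyOf));
        try (Writer w = new OutputStreamWriter(
                new GZIPOutputStream(new FileOutputStream(out)),
                StandardCharsets.UTF_8)) {
            for (String line : sorted) {
                w.write(line);
                w.write('\n');
            }
        }
    }

    // Read the compressed file back, one line per list element.
    static List<String> readGzipLines(File in) throws IOException {
        List<String> result = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new FileInputStream(in)),
                StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        File out = new File("BigData.sorted.gz");
        writeSorted(Arrays.asList("banana", "apple", "Cherry"), out);
        for (String line : readGzipLines(out)) {
            System.out.println(line);
        }
    }
}
```

What I am looking for is the Hadoop equivalent of writeSorted above, producing the same kind of file from the reducer output.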
