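Seeing only part-00000 means there is only one reduce task in your configuration. Hadoop does not merge the reducer outputs: each reduce task writes its own part-NNNNN file, so with R reduce tasks (set with JobConf.setNumReduceTasks, for example) you would get R files, starting at part-00000. Hope this helps.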
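
To make the "one file per reduce task" point concrete, here is a minimal Python sketch. It is not Hadoop code. The function names are hypothetical, and the zero-padded five-digit naming is assumed from the part-00000 name in this thread. The second helper shows one way to take every 1000th record from a single part file, which each reducer's output would allow independently of the others.

```python
def part_file_names(num_reduce_tasks):
    """Expected output file names, one per reduce task (assumed naming scheme)."""
    return ["part-%05d" % i for i in range(num_reduce_tasks)]


def sample_every_nth(records, n=1000):
    """Keep records 0, n, 2n, ... from an iterable, preserving their order."""
    return [rec for i, rec in enumerate(records) if i % n == 0]


if __name__ == "__main__":
    # With a single reduce task there is exactly one output file.
    print(part_file_names(1))
    # With three reduce tasks there would be three separate files.
    print(part_file_names(3))
    # Sampling a stand-in for the lines of one part file.
    print(sample_every_nth(range(2500), 1000))
```

Whether the concatenated per-file samples are globally ordered depends on how keys are partitioned across reducers, which this sketch does not address.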
Tien Duc Dinh

Jim Twensky wrote:
>
> Hello,
>
> The original map-reduce paper states: "After successful completion, the
> output of the map-reduce execution is available in the R output files (one
> per reduce task, with file names as specified by the user)." However, when
> using Hadoop's TextOutputFormat, all the reducer outputs are combined in a
> single file called part-00000. I was wondering how and when this merging
> process is done. When the reducer calls output.collect(key,value), is this
> record written to a local temporary output file in the reducer's disk and
> then these local files (a total of R) are later merged into one single file
> with a final thread or is it directly written to the final output file
> (part-00000)? I am asking this because I'd like to get an ordered sample of
> the final output data, ie. one record per every 1000 records or something
> similar and I don't want to run a serial process that iterates on the final
> output file.
>
> Thanks,
> Jim
>

--
View this message in context: http://www.nabble.com/Merging-reducer-outputs-into-a-single-part-00000-file-tp21396867p21399089.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.