Ordering of records in output files?

Joel Welling Wed, 10 Sep 2008 10:32:58 -0700

Hi folks;
  I have a simple Streaming job where the mapper produces output records
beginning with a 16-character ASCII string and passes them to
IdentityReducer.  When I run it, I get as many output files as the
value of mapred.reduce.tasks.  Each one contains some of the strings,
and within each file the strings are in sorted order.
  But there is no obvious ordering *across* the files.  For example, I
can see that the first few strings in sorted order went to files 0, 1,
3, 4, and then back to 0, but none of them ended up in file 2.
  What's the algorithm that determines which strings end up in which
files?  Is there a way I can change it so that sequentially ordered
strings end up in the same file rather than spraying off across all the
files?
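
  To make the question concrete, here is a minimal sketch of the two
behaviours I am contrasting.  It rests on assumptions, not on anything I
have confirmed in the Hadoop source: I am guessing that keys are assigned
to reducers by something like a non-negative hash of the key modulo the
number of reduce tasks, and the hash function below is only a stand-in.
The names java_style_hash, hash_partition and range_partition are
hypothetical, as are the split points.

```python
from bisect import bisect_right

def java_style_hash(key: str) -> int:
    """Stand-in 32-bit hash (h = 31*h + byte); not claimed to match Hadoop's."""
    h = 0
    for b in key.encode("ascii"):
        h = (31 * h + b) & 0xFFFFFFFF
    return h

def hash_partition(key: str, num_reducers: int) -> int:
    # Assumed scheme: non-negative hash modulo the number of reduce tasks.
    # Neighbouring keys can land in unrelated partitions.
    return (java_style_hash(key) & 0x7FFFFFFF) % num_reducers

def range_partition(key: str, split_points: list) -> int:
    # What I would like instead: split_points is a sorted list of boundary
    # keys, and each key goes to the partition whose range contains it,
    # so sequentially ordered keys stay together.
    return bisect_right(split_points, key)

# Ten sequential 16-character keys, like the ones my mapper emits.
keys = [f"{i:016d}" for i in range(10)]

hash_parts = [hash_partition(k, 5) for k in keys]
range_parts = [
    range_partition(k, ["0000000000000003", "0000000000000006"]) for k in keys
]

print("hash :", hash_parts)
print("range:", range_parts)
```

With the range version the partition numbers never decrease as the keys
increase, so concatenating the output files in order would give a fully
sorted result.  That is the property I am after.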


Thanks,
-Joel
 [EMAIL PROTECTED]
