Hi folks,

I have a simple Streaming job where the mapper produces output records beginning with a 16-character ASCII string and passes them to IdentityReducer. When I run it, I get as many output files as the value of mapred.reduce.tasks. Each file contains some of the strings, and within each file the strings are in sorted order, but there is no obvious ordering *across* the files. For example, the first few strings in the output went to files 0, 1, 3, 4, and then back to 0, but none of them ended up in file 2.

What algorithm determines which strings end up in which files? Is there a way to change it so that sequentially ordered strings end up in the same file rather than being sprayed across all the files?

Thanks,
-Joel