You can use a randomized reduce key to parallelize the comparison of different runs. Each mapper would assign its output a reduce key drawn at random from a small range of integers (say 0..99). Each reducer would then be in charge of keeping only the best solution among the runs it receives. The final output would be at most 100 values, which could be compared conventionally.
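As a rough illustration of the idea, here is a minimal sketch in plain Python that simulates the map, shuffle and reduce steps in a single process. It does not use the Hadoop API. The names `NUM_KEYS`, `map_run`, `reduce_best` and `best_run` are made up for this example, and it assumes that "best" means the highest score.

```python
import random
from collections import defaultdict

NUM_KEYS = 100  # assumed size of the reduce-key range


def map_run(run_id, score, rng):
    """Emit (random reduce key, (score, run_id)) for one run."""
    return rng.randrange(NUM_KEYS), (score, run_id)


def reduce_best(values):
    """Keep only the best candidate for one key (assumed: highest score)."""
    return max(values)


def best_run(runs, seed=0):
    rng = random.Random(seed)
    # Shuffle phase: group the map outputs by their random reduce key.
    groups = defaultdict(list)
    for run_id, score in runs:
        key, value = map_run(run_id, score, rng)
        groups[key].append(value)
    # Reduce phase: each key independently keeps its best candidate.
    partial = [reduce_best(values) for values in groups.values()]
    # Final step: compare the at most NUM_KEYS survivors conventionally.
    return max(partial)


# Synthetic (run_id, score) pairs standing in for real run results.
runs = [(i, (i * 37) % 1009) for i in range(5000)]
best_score, best_id = best_run(runs)
```

Because the key is random, the runs spread roughly evenly over the reducers, and the result does not depend on which key a given run happens to land on.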
Whether this would help really depends on how many runs you have. If you have fewer than millions of runs, this probably doesn't matter and Miles' suggestion is fine.

On Thu, Mar 19, 2009 at 11:54 AM, Miles Osborne <mi...@inf.ed.ac.uk> wrote:

> you won't need any reducers.

--
Ted Dunning, CTO DeepDyve