In iterated map-reduce (a series of code-identical jobs in which the reduce output of one job is the map input of the next), there are two synchronization barriers per iteration: one in the middle of each job, between map and reduce, and one at the end of each job. In principle, this could be a painfully excessive amount of synchronization. Is it in practice? Do you have iterated map-reduce applications with severe load imbalance in some phases?
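To make the concern concrete, here is a minimal sketch of how one might estimate the time lost at those barriers. It assumes a simplified model in which each phase (map or reduce) finishes only when its slowest worker finishes, and everyone else idles until then. The function names and the per-worker timings are hypothetical, chosen only for illustration; they are not measurements from any real job.

```python
from typing import List, Tuple


def barrier_cost(phases: List[List[float]]) -> Tuple[float, float]:
    """Return (wall_time, idle_time) when every phase ends in a barrier.

    phases[i][w] is the time worker w spends computing in phase i.
    Assumed model: a phase lasts as long as its slowest worker.
    """
    wall = 0.0
    idle = 0.0
    for durations in phases:
        slowest = max(durations)
        wall += slowest
        # Each faster worker waits at the barrier for the slowest one.
        idle += sum(slowest - d for d in durations)
    return wall, idle


def idle_fraction(phases: List[List[float]]) -> float:
    """Fraction of total worker-time spent waiting at barriers."""
    wall, idle = barrier_cost(phases)
    workers = len(phases[0])
    return idle / (wall * workers)


# Hypothetical timings: 2 iterations x (map, reduce), 4 workers each.
phases = [
    [1.0, 1.0, 1.0, 1.0],  # iteration 1, map: balanced
    [1.0, 1.0, 1.0, 4.0],  # iteration 1, reduce: one straggler
    [1.0, 1.0, 1.0, 1.0],  # iteration 2, map: balanced
    [1.0, 1.0, 1.0, 4.0],  # iteration 2, reduce: one straggler
]

wall, idle = barrier_cost(phases)
print(f"wall time: {wall}, idle worker-time: {idle}")
print(f"idle fraction: {idle_fraction(phases):.2f}")
```

Under this model a single straggler in the reduce phase leaves the other workers idle for most of that phase, and the loss repeats every iteration. Whether real applications show imbalance of that kind is exactly what I am asking about.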
Thanks, Mike