Say I have two MapReduce jobs, A and B. The two are algorithmically dissimilar, so they have to be implemented as separate MapReduce jobs. The output of A is used as the input of B, so A has to run first. However, B doesn't need all of A's output as input, only a partition of it. So in theory A and B could run concurrently in a producer/consumer arrangement, where B would start working as soon as A had produced some output, before A had completed. Obviously, this could be a big parallelization win.
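For concreteness, here's roughly what the serialized version looks like today with the plain Hadoop Java API (the paths are made up, and I'm using the identity Mapper/Reducer as stand-ins for A's and B's real logic). The waitForCompletion() call on A is the barrier I'd like to get rid of; note that B already reads only a single partition of A's output:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class ChainAThenB {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            // Job A: writes its full output under /data/a-out (path made up).
            Job jobA = Job.getInstance(conf, "A");
            jobA.setJarByClass(ChainAThenB.class);
            jobA.setMapperClass(Mapper.class);    // identity stand-in for A's real logic
            jobA.setReducerClass(Reducer.class);
            FileInputFormat.addInputPath(jobA, new Path("/data/a-in"));
            FileOutputFormat.setOutputPath(jobA, new Path("/data/a-out"));

            // This is the barrier: B cannot even be submitted until A has
            // completely finished and committed its output directory.
            if (!jobA.waitForCompletion(true)) {
                System.exit(1);
            }

            // Job B: reads only one partition of A's output -- here the file
            // written by A's reducer 0 -- not the whole directory.
            Job jobB = Job.getInstance(conf, "B");
            jobB.setJarByClass(ChainAThenB.class);
            jobB.setMapperClass(Mapper.class);    // identity stand-in for B's real logic
            jobB.setReducerClass(Reducer.class);
            FileInputFormat.addInputPath(jobB, new Path("/data/a-out/part-r-00000"));
            FileOutputFormat.setOutputPath(jobB, new Path("/data/b-out"));

            System.exit(jobB.waitForCompletion(true) ? 0 : 1);
        }
    }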
Is this possible in MapReduce? I know it is not at the most basic level: there is no synchronization mechanism that would let the same HDFS directory serve as B's input while A is still writing it as output. But is there some abstraction layer on top that allows it? I've been digging around, and I think the answer is "no", but I want to be sure. More specifically, the only abstraction layer I'm aware of that chains MapReduce jobs together is Cascading, and I believe it requires the reduce steps to be serialized. Again, though, I'm not sure, because I've only read the documentation and haven't actually played with it.
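The closest thing I've found in stock Hadoop is the JobControl/ControlledJob helper, and it illustrates why I suspect the answer is "no": you can declare the A-to-B dependency, but B is only submitted after A has succeeded, so the jobs still run strictly one after the other. A rough sketch (assuming jobA and jobB are configured as in the snippet above):

    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
    import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;

    public class AThenBWithJobControl {
        // jobA and jobB are assumed to be configured as in the previous sketch.
        static void runChained(Job jobA, Job jobB) throws Exception {
            ControlledJob ctrlA = new ControlledJob(jobA.getConfiguration());
            ControlledJob ctrlB = new ControlledJob(jobB.getConfiguration());
            ctrlB.addDependingJob(ctrlA);  // declare B's dependency on A

            JobControl control = new JobControl("A-then-B");
            control.addJob(ctrlA);
            control.addJob(ctrlB);

            // JobControl runs in its own thread, but it only submits B after
            // A has reached the SUCCESS state -- the same barrier as before,
            // just expressed declaratively instead of via waitForCompletion().
            Thread runner = new Thread(control);
            runner.start();
            while (!control.allFinished()) {
                Thread.sleep(1000);
            }
            control.stop();
        }
    }

So even the dependency-aware tooling treats whole-job completion as the unit of scheduling; what I'm after is something that can pipeline at the partition or record level.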