On 11/6/07, Doug Cutting <[EMAIL PROTECTED]> wrote:
>
> Joydeep Sen Sarma wrote:
> > One of the controversies is whether in the presence of failures, this
> > makes performance worse rather than better (kind of like udp vs. tcp -
> > what's better depends on error rate). The probability of a failure per
> > job will increase non-linearly as the number of nodes involved per job
> > increases. So what might make sense for small clusters may not make
> > sense for bigger ones. But it sure would be nice to have this option.
>
> Hmm. Personally I wouldn't put a very high priority on complicated
> features that don't scale well.
I don't think what we're asking for is necessarily complicated, and I think it can be made to scale quite well. The critique is fair for the current "polyreduce" prototype, but that is one particular solution to a general problem, and *scaling* is exactly the issue we are trying to address here (the current model is quite wasteful of computational resources).

As for complexity, I agree that the prototype Joydeep linked to is needlessly cumbersome. But in principle, the only additional information a scheduler needs in order to plan a composite job is a dependency graph between the mappers and reducers. Dependency graphs are about as clear as it gets in this business, and they aren't going to cause any scaling or conceptual problems.

I also don't see how failures would be more catastrophic if this were implemented purely in terms of smarter scheduling (which is emphatically not what the current polyreduce prototype does). You're running the exact same jobs you would be running otherwise, just sooner. :)

Chris
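
P.S. Roughly what I have in mind by "just a dependency graph" -- this is only a sketch, nothing from the current prototype, and all the class and method names below are made up for illustration:

    import java.util.*;

    // Rough sketch: a composite job expressed as a plain dependency graph
    // of map/reduce stages.  A smarter scheduler could topologically order
    // these and launch each stage as an ordinary job as soon as the stages
    // it depends on have finished.
    public class CompositeJobSketch {

        // A stage is just a named map or reduce step plus what it depends on.
        static class Stage {
            final String name;
            final List<Stage> upstream = new ArrayList<Stage>();
            Stage(String name) { this.name = name; }
            Stage dependsOn(Stage... stages) {
                upstream.addAll(Arrays.asList(stages));
                return this;
            }
            public String toString() { return name; }
        }

        // Topological order: a stage appears only after everything it depends on.
        static List<Stage> scheduleOrder(List<Stage> stages) {
            Map<Stage, Integer> remaining = new HashMap<Stage, Integer>();
            Map<Stage, List<Stage>> downstream = new HashMap<Stage, List<Stage>>();
            for (Stage s : stages) {
                remaining.put(s, s.upstream.size());
                downstream.put(s, new ArrayList<Stage>());
            }
            for (Stage s : stages) {
                for (Stage up : s.upstream) {
                    downstream.get(up).add(s);
                }
            }
            LinkedList<Stage> ready = new LinkedList<Stage>();
            for (Stage s : stages) {
                if (remaining.get(s) == 0) ready.add(s);
            }
            List<Stage> order = new ArrayList<Stage>();
            while (!ready.isEmpty()) {
                Stage s = ready.removeFirst();
                order.add(s);
                for (Stage d : downstream.get(s)) {
                    int left = remaining.get(d) - 1;
                    remaining.put(d, left);
                    if (left == 0) ready.add(d);
                }
            }
            if (order.size() != stages.size()) {
                throw new IllegalStateException("cycle in the dependency graph");
            }
            return order;
        }

        public static void main(String[] args) {
            // Two independent map stages feeding one reduce, feeding another reduce.
            Stage mapA = new Stage("map-A");
            Stage mapB = new Stage("map-B");
            Stage reduce1 = new Stage("reduce-1").dependsOn(mapA, mapB);
            Stage reduce2 = new Stage("reduce-2").dependsOn(reduce1);
            // Prints [map-A, map-B, reduce-1, reduce-2]
            System.out.println(scheduleOrder(Arrays.asList(mapA, mapB, reduce1, reduce2)));
        }
    }

The point is that nothing new gets run here -- the scheduler just knows which ordinary jobs it can start, and when.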