It looks to me like the original questions, and the ensuing discussion, are a mix of two concerns. (1) BSP has some abstractions that have been realized in many concrete systems, and these abstractions can be compared to MapReduce independently of the details of particular implementations. (2) is asking for details of the implementations in Giraph and Hama. I think it is very valuable to separate these. Even for concrete systems like Hama and Giraph, the APIs are sufficiently abstract to allow quite a variety of implementations.
Regarding the comparison of abstractions, the first thing to note is that MapReduce omits two important concepts from BSP, namely iteration and per-component state. To make a very meaningful comparison, you have to compare BSP with an iterated use of MapReduce. Imagine running the same MapReduce job over and over again, with the output of one iteration being the input to the next. In this case you can pair up each reduce task (except those of the final iteration) of one iteration with a map task of the following iteration --- the one that reads the reduce task's output. It is that pair that corresponds to a "compute" invocation in BSP. Note that in iterated MapReduce there are two synchronization barriers per iteration: one between map and reduce, and one between iterations. BSP has just one barrier per iteration. Iterated MapReduce externalizes the intermediate data that flows from one iteration's reduce tasks to the next iteration's map tasks; in the corresponding BSP computation this intermediate data is just held inside the compute invocations. Regards, Mike
