First, I would like to explain the BSP task lifecycle. In the first
superstep, a set of BSP tasks runs in parallel; these tasks are frequently
called peers. The number of peers can be specified by the Hama job designer,
or decided by the Hama job scheduler based on the number of input splits or
the availability of free slots. In the first superstep, the peers typically
read from HDFS or a socket stream, or generate their own inputs for the
next superstep. The outputs generated in the first superstep are then
delivered to the intended peers. Every peer then enters the sync barrier.
This provides a global synchronization point at which every peer can be
sure that the destination peers have received the messages sent to them.
In each subsequent superstep, every peer reads the messages that other
peers sent to it in the previous superstep. This repeats until the maximum
allowed superstep count specified by the job designer is reached, or until
the desired results are computed.
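The lifecycle above can be sketched in plain Java. This is not the Hama API — it is a hypothetical in-process simulation in which each peer is a thread, each peer's incoming messages sit in a queue, and the sync barrier is a CyclicBarrier; all names (BspLifecycleSketch, Peer, MAX_SUPERSTEPS) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CyclicBarrier;

// Hypothetical in-process sketch of the BSP lifecycle: real Hama peers run
// as separate tasks on cluster machines, but the structure is the same.
public class BspLifecycleSketch {
    static final int NUM_PEERS = 3;
    static final int MAX_SUPERSTEPS = 4;

    // One incoming-message queue ("mailbox") per peer.
    static final List<ConcurrentLinkedQueue<Integer>> mailboxes = new ArrayList<>();
    // Passing the barrier guarantees every peer's messages are delivered
    // before any peer moves on.
    static final CyclicBarrier syncBarrier = new CyclicBarrier(NUM_PEERS);

    static class Peer implements Runnable {
        final int id;
        Peer(int id) { this.id = id; }

        public void run() {
            try {
                int value = id; // superstep 0: each peer makes up its own input
                for (int superstep = 0; superstep < MAX_SUPERSTEPS; superstep++) {
                    // Send this superstep's output to the intended peer
                    // (here: the next peer in a ring).
                    mailboxes.get((id + 1) % NUM_PEERS).add(value);
                    syncBarrier.await(); // global synchronization point
                    // Subsequent superstep: read the messages sent to us.
                    Integer msg;
                    while ((msg = mailboxes.get(id).poll()) != null) {
                        value += msg;
                    }
                    syncBarrier.await(); // don't start sending before all peers drain
                }
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        for (int i = 0; i < NUM_PEERS; i++) {
            mailboxes.add(new ConcurrentLinkedQueue<Integer>());
        }
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < NUM_PEERS; i++) threads.add(new Thread(new Peer(i)));
        for (Thread t : threads) t.start();
        for (Thread t : threads) t.join();
        System.out.println("all peers finished " + MAX_SUPERSTEPS + " supersteps");
    }
}
```

The two barrier crossings per superstep make the sketch deterministic: sends and reads never overlap, which is exactly the guarantee the sync barrier gives the peers.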

> But, the number of free slots changes constantly (as other
> jobs get completed and started). The free slots could be assigned to the
> BSP job in question at a later stage too, similar to an MR job.

> Adding additional tasks might make the job complete faster.

So this is where we have to understand that the lifecycle of a mapper or a
reducer is equivalent to a single superstep of a BSP task. Unlike a mapper
or reducer task, a BSP task stays on the same machine until the job
completes.

So when a BSP job is in progress and free slots come into the picture, it
is too late for the currently running BSP jobs to take advantage of those
new slots. The Hama job designer would already have set the optimum number
of tasks for the job. The same rule applies to a Hadoop job designer: we
can't have 1,000,000 mappers and 1,000,000 reducers, as explained on their
wiki. A careful designer specifies the input partitions and the partitioner
for reduce such that they are optimal and won't exhaust the cluster's
resources. Having more BSP tasks in parallel need not make things faster;
I feel optimizing algorithms on both frameworks is an iterative process.
In Hama, new free slots are considered if a job is scheduled after those
slots come into the picture. If some jobs don't get scheduled because of
non-availability of slots, that is a good indication that either the
current tasks are taking too many slots or your cluster is not big enough
to run all the jobs in parallel.

Hadoop has some very good functionality implemented by some very good
engineers, and we do try to get insights from them, if not reuse their
code ;). But the requirements change in certain circumstances when we have
to implement the BSP model. We may have to tweak HDFS a lot too in the
future.

To add to this, I remember Chiahung mentioning the Ciel project, which
schedules tasks dynamically as needed, and I thought it would be cool to
have this feature in Hama, especially for real-time computation. Vaguely
thinking, we could start tasks with input from checkpointed data. We don't
have this feature in our immediate roadmap; we are trying to get some
important features out first.


-Suraj
