hama - general questions

Paweł Brach Wed, 02 Feb 2011 01:26:31 -0800

Hi,

I'm trying to learn Hama. I've set up a Hadoop cluster on 3 machines, with
Hama on top of it (also 3 nodes). Everything seems to work: I'm able to run
the example programs such as Pi and Test (provided with Hama).
Now I want to write my own code. I'm trying to create a graph and calculate
something on it using Hama. To do that, I need to understand better how it
works.
For example, I want to implement the PageRank algorithm (in the naive way).
In a typical Hama job, there is a main method where we configure the
cluster and set up the job for the nodes.
I want to create a random graph in the main method, divide the set of
vertices into 3 groups (I have 3 Hama nodes), and have every Hama node
process 1/3 of the vertices.
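
A minimal sketch of the setup described above, in plain Java with no Hama or HDFS calls. The class name GraphPartitionSketch, the modulo assignment rule in ownerOf, and the fixed out-degree are all assumptions made for illustration; nothing here comes from Hama itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.TreeSet;

class GraphPartitionSketch {
    // Assumed rule: vertex v belongs to node (v mod numNodes).
    static int ownerOf(int vertex, int numNodes) {
        return vertex % numNodes;
    }

    // Random directed graph: every vertex gets outDegree distinct
    // neighbours, never itself. Result is an adjacency list.
    static List<List<Integer>> randomGraph(int numVertices, int outDegree, long seed) {
        if (outDegree >= numVertices) {
            throw new IllegalArgumentException("outDegree must be < numVertices");
        }
        Random random = new Random(seed);
        List<List<Integer>> adjacency = new ArrayList<>();
        for (int v = 0; v < numVertices; v++) {
            Set<Integer> neighbours = new TreeSet<>();
            while (neighbours.size() < outDegree) {
                int candidate = random.nextInt(numVertices);
                if (candidate != v) {
                    neighbours.add(candidate);
                }
            }
            adjacency.add(new ArrayList<>(neighbours));
        }
        return adjacency;
    }

    // Split vertex ids 0..numVertices-1 into numNodes groups.
    static List<List<Integer>> partition(int numVertices, int numNodes) {
        List<List<Integer>> groups = new ArrayList<>();
        for (int i = 0; i < numNodes; i++) {
            groups.add(new ArrayList<>());
        }
        for (int v = 0; v < numVertices; v++) {
            groups.get(ownerOf(v, numNodes)).add(v);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<List<Integer>> graph = randomGraph(9, 2, 42L);
        List<List<Integer>> groups = partition(graph.size(), 3);
        for (int node = 0; node < groups.size(); node++) {
            System.out.println("node " + node + " owns " + groups.get(node));
        }
    }
}
```

With a fixed rule like ownerOf, any node can work out which node owns a neighbour without a lookup table.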


The Hama node code skeleton (one superstep):
- for each vertex: send its current PR value to its neighbours (i.e. to the
Hama node which is processing that neighbour)
- sync
- for each vertex: receive messages and update PR value
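
The superstep above can be sketched as a single-threaded simulation in plain Java. This is not Hama's API: the class BspPageRankSketch, the Message type, the damping factor 0.85, and the "vertex v lives on peer v mod numPeers" rule are assumptions for illustration, and the sync barrier is only simulated by finishing all sends before any receive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BspPageRankSketch {
    static final double DAMPING = 0.85; // assumed damping factor

    // One message: a PR contribution addressed to a target vertex.
    static final class Message {
        final int target;
        final double value;
        Message(int target, double value) {
            this.target = target;
            this.value = value;
        }
    }

    // Runs one superstep and returns the updated PR values.
    // outEdges[v] lists the neighbours of vertex v.
    static double[] superstep(int[][] outEdges, double[] rank, int numPeers) {
        int n = rank.length;
        List<List<Message>> inbox = new ArrayList<>();
        for (int p = 0; p < numPeers; p++) {
            inbox.add(new ArrayList<>());
        }

        // Step 1: each peer sends, for each vertex it owns, the vertex's
        // PR share to the peer that owns the neighbour.
        // Vertices without out-edges send nothing (their mass is dropped here).
        for (int p = 0; p < numPeers; p++) {
            for (int v = p; v < n; v += numPeers) {
                for (int w : outEdges[v]) {
                    double share = rank[v] / outEdges[v].length;
                    inbox.get(w % numPeers).add(new Message(w, share));
                }
            }
        }

        // Step 2: sync. In a real BSP run this is a barrier; here all
        // messages are already in the inboxes.

        // Step 3: each peer receives its messages and updates PR values.
        double[] next = new double[n];
        Arrays.fill(next, (1.0 - DAMPING) / n);
        for (int p = 0; p < numPeers; p++) {
            for (Message m : inbox.get(p)) {
                next[m.target] += DAMPING * m.value;
            }
        }
        return next;
    }

    public static void main(String[] args) {
        int[][] cycle = { {1}, {2}, {0} }; // 0 -> 1 -> 2 -> 0
        double[] rank = { 1.0 / 3, 1.0 / 3, 1.0 / 3 };
        System.out.println(Arrays.toString(superstep(cycle, rank, 3)));
    }
}
```

Repeating the superstep until the values stop changing gives the naive iteration; in a real job each peer would hold only its own inbox and its own slice of the rank array.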

Questions:
- As I mentioned before, in the main method of the Hama job I would like to
create a random graph and assign 1/3 of the vertices to every Hama node. The
question is how to make the vertices accessible to the Hama nodes (in the bsp
method). Probably I need to put all the vertices in HDFS and read them from
HDFS in every Hama node, but how do I do that? How does Hama provide access
to HDFS?
- Underneath Hama is HDFS. I have a cluster of 3 machines, and every machine
runs both a Hadoop DataNode and a Hama node. When a Hama node wants to get
something from HDFS, it might communicate with a DataNode located on a
different machine. How can I force it to use the DataNode located on the
same machine?

Thanks for your help,
Cheers,
Pawel
