Hi,

I'm trying to learn Hama. I've set up a Hadoop cluster on 3 machines, with Hama running on top of it (also 3 nodes). Everything seems to work: I can run the examples provided with Hama, such as Pi and Test. Now I want to write my own code. I'm trying to create a graph and calculate something on it using Hama, and to do that I need to understand better how Hama works. As an example, I want to implement the PageRank algorithm (in the naive way).

In a typical Hama job there is a main method where we configure the cluster and set up the jobs for the nodes. I want to create a random graph in the main method, divide the set of vertices into 3 groups (I have 3 Hama nodes), and have every Hama node process 1/3 of the vertices.
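
To make the setup concrete, here is a minimal sketch of the graph creation and partitioning I have in mind. It is plain Java with no Hama or Hadoop calls. The class name GraphPartitionSketch, the helpers randomGraph, ownerOf and partition, and the modulo rule for assigning vertices to nodes are my own assumptions for illustration, not anything Hama prescribes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper class: plain Java only, no Hama or Hadoop API is used.
class GraphPartitionSketch {

    // Builds a random directed graph as adjacency lists:
    // adj.get(v) holds the out-neighbours of vertex v.
    // Self-loops and duplicate edges are possible in this naive version.
    static List<List<Integer>> randomGraph(int numVertices, int edgesPerVertex, long seed) {
        Random rnd = new Random(seed);
        List<List<Integer>> adj = new ArrayList<>();
        for (int v = 0; v < numVertices; v++) {
            List<Integer> out = new ArrayList<>();
            for (int e = 0; e < edgesPerVertex; e++) {
                out.add(rnd.nextInt(numVertices));
            }
            adj.add(out);
        }
        return adj;
    }

    // Assumed ownership rule: vertex v belongs to node (v mod numNodes).
    static int ownerOf(int vertexId, int numNodes) {
        return vertexId % numNodes;
    }

    // Splits vertex ids 0..numVertices-1 into numNodes groups using ownerOf.
    static List<List<Integer>> partition(int numVertices, int numNodes) {
        List<List<Integer>> groups = new ArrayList<>();
        for (int n = 0; n < numNodes; n++) {
            groups.add(new ArrayList<>());
        }
        for (int v = 0; v < numVertices; v++) {
            groups.get(ownerOf(v, numNodes)).add(v);
        }
        return groups;
    }

    public static void main(String[] args) {
        int numVertices = 12;
        int numNodes = 3; // one group per Hama node
        List<List<Integer>> adj = randomGraph(numVertices, 2, 42L);
        List<List<Integer>> groups = partition(numVertices, numNodes);
        for (int n = 0; n < numNodes; n++) {
            System.out.println("node " + n + " owns vertices " + groups.get(n));
        }
        System.out.println("out-neighbours of vertex 0: " + adj.get(0));
    }
}
```

With the modulo rule, any node can work out the owner of a neighbour from the vertex id alone, which should help when a node has to send a value to whichever node processes that neighbour.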

The code skeleton for a Hama node (one superstep):
- for each vertex: send the current PR value to its neighbours (i.e. to the Hama node that is processing each neighbour)
- sync
- for each vertex: receive messages and update the PR value

Questions:
- As I mentioned above, I would like to create a random graph in the main method of the Hama job and assign 1/3 of the vertices to every Hama node. The question is how to make the vertices accessible to the Hama nodes (in the bsp method). I probably need to put all the vertices into HDFS and read them from HDFS in every Hama node, but how do I do that? How does Hama provide access to HDFS?
- Hama sits on top of HDFS. I have a cluster of 3 machines, and every machine runs both a Hadoop DataNode and a Hama node. When a Hama node wants to read something from HDFS, it might communicate with a DataNode located on a different machine. How can I force it to use the DataNode located on the same machine?

Thanks for your help,
Cheers,
Pawel
