Giraph 1.0 | Computation stuck at map 100% - reduce 0% for my algorithm only, at multi-node cluster
Good morning, I have been having a problem for the past few days which, sadly, I can't solve. First I set up a Hadoop 0.20.203.0 cluster of two nodes, a master and a slave. I followed this tutorial for the settings: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/

Then I set up Giraph and built it properly with Maven. When I run SimpleShortestPathVertex with the number of workers set to 2, it runs properly and gives me results that I can view from either of the two nodes. The jobtracker at master:50030 and slave:50030, and everything else, is working as expected. However, when I try to run my own algorithm, it hangs at map 100% reduce 0% forever. I looked at SimpleShortestPathVertex for any special configuration and it has none.

The weird part is that the jobs at the jobtracker have no logs at stdout or stderr. The only readable thing is the map task info:

task_201409300940_0001_m_00 | 100.00% - MASTER_ZOOKEEPER_ONLY | 1 finished out of 2 on superstep -1
task_201409300940_0001_m_01 | 100.00% | startSuperstep: WORKER_ONLY - Attempt=0, Superstep=-1
task_201409300940_0001_m_02 | 100.00% | startSuperstep: WORKER_ONLY - Attempt=0, Superstep=-1

Is there anything I'm overlooking? I have Googled the obvious Stack Overflow solutions for two days now. Has anyone encountered anything similar?

Regards,
Panagiotis Eustratiadis.
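For reference, the invocation used for the working example looks roughly like this; the jar name and HDFS paths here are placeholders from a typical Giraph 1.0 quick-start build, not the poster's exact setup:

```shell
# Run the shortest-paths example through GiraphRunner with 2 workers.
# The jar file name, input/output paths, and I/O format classes below
# are assumptions from a standard Giraph 1.0 build -- adjust to taste.
hadoop jar giraph-examples-1.0.0-for-hadoop-0.20.203.0-jar-with-dependencies.jar \
  org.apache.giraph.GiraphRunner \
  org.apache.giraph.examples.SimpleShortestPathsVertex \
  -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
  -vip /user/hduser/input/tiny_graph.txt \
  -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
  -op /user/hduser/output/shortestpaths \
  -w 2
```

Running a custom algorithm is the same command with your own computation class in place of the example class; if that hangs while the example completes, the difference is in the job itself, which is what the rest of the thread digs into.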
Re: Giraph 1.0 | Computation stuck at map 100% - reduce 0% for my algorithm only, at multi-node cluster
I'm new, but in my meager experience, when it stops at map 100% it means there was an error somewhere. In Giraph I've often found it difficult to pin down what that error actually was (e.g., out of memory), but the logs are the first place to look.

Just to clarify, regarding not finding outputs: are you going to http://your_host.com:50030/jobtracker.jsp and clicking on the failed job id (e.g., job_201409251209_0029 - http://your_host.com:50030/jobdetails.jsp?jobid=job_201409251209_0029&refresh=0)? From there, click the map link in the table to see its tasks. (Giraph runs entirely as a map task, IIUC.) You should see tasks for the master plus your workers. If you click on one of them (e.g., task_201409251209_0029_m_00 - http://your_host.com:50030/taskdetails.jsp?tipid=task_201409251209_0029_m_00) you should see what machine it ran on, plus a link to the Task Logs. Click on "All" and you should see three sections for stdout, stderr, and syslog, the last of which usually contains hints about what went wrong. You should check all the worker logs.

Hope that helps.

On Tue, Sep 30, 2014 at 2:53 AM, Panagiotis Eustratiadis ep.pan@gmail.com wrote:
> [...]

-- Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson Street, Amherst MA 01002 | matthewcornell.org
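If the web UI shows nothing, the same task-attempt logs can usually be read straight off the disk of whichever node ran the attempt. A rough sketch, assuming the default log directory under $HADOOP_HOME (the actual location depends on your hadoop.log.dir setting, and the attempt id shown is made up to match the job id in this thread):

```shell
# Task-attempt logs live under the tasktracker's userlogs directory
# on the node that ran the attempt; list them to find your attempts.
ls "$HADOOP_HOME/logs/userlogs/"

# Each attempt directory holds stdout, stderr, and syslog files.
# Grep them for obvious failures (attempt id below is illustrative):
grep -ri "exception\|error" \
  "$HADOOP_HOME/logs/userlogs/attempt_201409300940_0001_m_000001_0/"
```

Checking the logs on the slave node directly is worthwhile here, since the web UI only links to logs it knows about, and a worker that died early may never have reported any.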
Re: Giraph 1.0 | Computation stuck at map 100% - reduce 0% for my algorithm only, at multi-node cluster
Hello Matthew, thanks for the answer.

Oddly enough, the job isn't listed under the failed jobs, as it is still running. The execution never ends unless I kill it from the command line. And I did check the logs (I always do), but they don't say anything. By the way, I see no syslog, only stdout and stderr.

What I didn't mention in my previous post, and it might help, is that the algorithm executes perfectly on a single-node cluster setup. And from the fact that SimpleShortestPathVertex runs just fine on the multi-node setup, we can deduce that the multi-node setup is correct (right?).

2014-09-30 15:24 GMT+03:00 Matthew Cornell m...@matthewcornell.org:
> [...]
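For anyone following along, killing a hung job from the command line, as described above, looks like this in Hadoop 0.20.x; the job id is the one shown on the jobtracker page:

```shell
# Kill the stuck job by its id (taken from the jobtracker UI).
hadoop job -kill job_201409300940_0001

# List the jobs still running to confirm it is gone.
hadoop job -list
```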