Alessio Arleo created GIRAPH-970:
------------------------------------
Summary: Missing chosen workers on superstep -1
Key: GIRAPH-970
URL: https://issues.apache.org/jira/browse/GIRAPH-970
Project: Giraph
Issue Type: Bug
Components: bsp
Affects Versions: 1.1.0
Environment: Linux version 3.13.0-37-generic (buildd@kapok) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1)) 64 bit
Hadoop 1.2.1
Reporter: Alessio Arleo
I found a problem with Giraph 1.1.0 while trying to run the
SimpleShortestPathsComputation example.
This is the command I ran:
$HADOOP_HOME/bin/hadoop jar \
  ~/git/giraph_patched/giraph-examples/target/giraph-examples-1.1.0-for-hadoop-1.2.1-jar-with-dependencies.jar \
  org.apache.giraph.GiraphRunner \
  org.apache.giraph.examples.SimpleShortestPathsComputation \
  -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
  -vip /users/hadoop/input/tiny_graph.txt \
  -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
  -op /users/hadoop/output/shortestpath \
  -w 1
And this is the output:
#################################
Warning: $HADOOP_HOME is deprecated.
14/12/15 12:07:36 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
14/12/15 12:07:36 INFO utils.ConfigurationUtils: No edge output format specified. Ensure your OutputFormat does not require one.
14/12/15 12:07:36 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
14/12/15 12:07:38 INFO job.GiraphJob: Tracking URL: http://VirtualMINT-H023:50030/jobdetails.jsp?jobid=job_201412151205_0001
14/12/15 12:07:38 INFO job.GiraphJob: Waiting for resources... Job will start only when it gets all 2 mappers
14/12/15 12:08:51 INFO job.HaltApplicationUtils$DefaultHaltInstructionsWriter: writeHaltInstructions: To halt after next superstep execute: 'bin/halt-application --zkServer virtualmint-h023:22181 --zkNode /_hadoopBsp/job_201412151205_0001/_haltComputation'
14/12/15 12:08:51 INFO mapred.JobClient: Running job: job_201412151205_0001
14/12/15 12:08:52 INFO mapred.JobClient: map 100% reduce 0%
################################
The computation hangs here until the timeout is reached. Here is what I found
in the log of the first task, which runs the master:
2014-12-15 12:12:16,303 INFO org.apache.giraph.master.BspServiceMaster: createVertexInputSplits: Starting to write input split data to zookeeper with 1 threads
2014-12-15 12:12:16,314 INFO org.apache.giraph.master.BspServiceMaster: createVertexInputSplits: Done writing input split data to zookeeper
2014-12-15 12:12:16,332 INFO org.apache.giraph.comm.netty.NettyClient: Using Netty without authentication.
2014-12-15 12:12:16,341 INFO org.apache.giraph.comm.netty.NettyClient: connectAllAddresses: Successfully added 1 connections, (1 total connected) 0 failed, 0 failures total.
2014-12-15 12:12:16,344 INFO org.apache.giraph.partition.PartitionUtils: computePartitionCount: Creating 1, default would have been 1 partitions.
2014-12-15 12:12:16,373 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out of 1 workers finished on superstep -1 on path /_hadoopBsp/job_201412151211_0001/_vertexInputSplitDoneDir
2014-12-15 12:12:16,375 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: Waiting on [virtualmint-h023_1]
2014-12-15 12:12:16,393 INFO org.apache.giraph.comm.netty.NettyServer: start: Using Netty without authentication.
2014-12-15 12:12:16,464 ERROR org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: Missing chosen workers [Worker(hostname=virtualmint-h023, MRtaskID=1, port=30001)] on superstep -1
2014-12-15 12:12:16,464 ERROR org.apache.giraph.master.MasterThread: masterThread: Master algorithm failed with IllegalStateException
java.lang.IllegalStateException: coordinateVertexInputSplits: Worker failed during input split (currently not supported)
    at org.apache.giraph.master.BspServiceMaster.coordinateInputSplits(BspServiceMaster.java:1489)
    at org.apache.giraph.master.BspServiceMaster.coordinateSuperstep(BspServiceMaster.java:1656)
    at org.apache.giraph.master.MasterThread.run(MasterThread.java:124)
2014-12-15 12:12:16,464 FATAL org.apache.giraph.graph.GraphTaskManager: uncaughtException: OverrideExceptionHandler on thread org.apache.giraph.master.MasterThread, msg = java.lang.IllegalStateException: coordinateVertexInputSplits: Worker failed during input split (currently not supported), exiting...
java.lang.IllegalStateException: java.lang.IllegalStateException: coordinateVertexInputSplits: Worker failed during input split (currently not supported)
    at org.apache.giraph.master.MasterThread.run(MasterThread.java:194)
Caused by: java.lang.IllegalStateException: coordinateVertexInputSplits: Worker failed during input split (currently not supported)
    at org.apache.giraph.master.BspServiceMaster.coordinateInputSplits(BspServiceMaster.java:1489)
    at org.apache.giraph.master.BspServiceMaster.coordinateSuperstep(BspServiceMaster.java:1656)
    at org.apache.giraph.master.MasterThread.run(MasterThread.java:124)
2014-12-15 12:12:16,464 WARN org.apache.giraph.zk.ZooKeeperManager: logZooKeeperOutput: Dumping up to last 100 lines of the ZooKeeper process STDOUT and STDERR.
################################
The computation never reaches the first superstep: the master cannot find the
worker. The GIRAPH-904 patch has been applied to BspServiceMaster.
I am running Hadoop 1.2.1 on a single machine with the configuration
suggested in the Giraph Quick Start guide. Hadoop itself works fine (tested
with the wordcount example).
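The two ERROR lines in the master log suggest how the failure is detected: the master waits on the _vertexInputSplitDoneDir barrier for the worker it chose for superstep -1 (virtualmint-h023_1), and gives up as soon as that worker no longer appears alive. The following is a simplified, hypothetical Java model of that check, not Giraph's actual code; BarrierCheckSketch, missingChosenWorkers and the in-memory healthy set are illustrative stand-ins for the ZooKeeper-based bookkeeping in BspServiceMaster.barrierOnWorkerList and coordinateInputSplits, whose real logic may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical, simplified model of the master-side barrier check behind the
 * "Missing chosen workers" error. Names are illustrative, not Giraph's API.
 */
public class BarrierCheckSketch {

  /** Returns the chosen workers that are absent from the healthy set. */
  static List<String> missingChosenWorkers(List<String> chosen, Set<String> healthy) {
    List<String> missing = new ArrayList<>();
    for (String worker : chosen) {
      if (!healthy.contains(worker)) {
        missing.add(worker);
      }
    }
    return missing;
  }

  /**
   * Models the coordination step: if any chosen worker vanished before it
   * signalled that it finished loading its input splits, abort.
   */
  static void coordinateInputSplits(List<String> chosen, Set<String> healthy) {
    List<String> missing = missingChosenWorkers(chosen, healthy);
    if (!missing.isEmpty()) {
      throw new IllegalStateException(
          "Missing chosen workers " + missing + " on superstep -1");
    }
  }

  public static void main(String[] args) {
    List<String> chosen = Arrays.asList("virtualmint-h023_1");
    // Assumed state matching the log: the chosen worker is no longer healthy.
    Set<String> healthy = new HashSet<>();
    try {
      coordinateInputSplits(chosen, healthy);
      System.out.println("barrier passed");
    } catch (IllegalStateException e) {
      System.out.println("aborted: " + e.getMessage());
    }
  }
}
```

In this model the master only observes that the worker disappeared, not why; if the real code behaves similarly, the underlying cause would most likely be recorded in the worker task's own log (MRtaskID=1 here) rather than in the master's.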
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)