[ 
https://issues.apache.org/jira/browse/GIRAPH-601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617924#comment-13617924
 ] 

Eugene Koontz commented on GIRAPH-601:
--------------------------------------

Thanks for the feedback Eli! This is where I was waylaid while hoping to take a 
look at GIRAPH-13. Once I get this sorted out I hope to continue with the 
latter!

Here's some output enhanced with taskId information for each worker and master:

{code}
ekoontz@Eugenes-MacBook-Pro ~$ perl format.pl < postprocessed.txt
application_1364578380737_0027/container_1364578380737_0027_01_000002/syslog
=========
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Log level remains 
at info
INFO [main] org.apache.giraph.graph.GraphTaskManager: Distributed cache is 
empty. Assuming fatjar.
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: classpath @ 
/tmp/hadoop-yarn/staging/ekoontz/.staging/job_1364578380737_0027/job.jar for 
job org.apache.giraph.benchmark.PageRankBenchmark
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker: true
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: taskPartition: 0
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is 
true.
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is 
true and zkAlreadyProvided=true.
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is 
true and taskPartition (0) is less than masterCount (1), so MASTER_ONLY.
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Starting up 
BspServiceMaster (master thread)...
INFO [main] org.apache.zookeeper.ZooKeeper: Initiating client connection, 
connectString=localhost:2181 sessionTimeout=60000 
watcher=org.apache.giraph.master.BspServiceMaster@47875da7
INFO [main] org.apache.giraph.graph.GraphTaskManager: map: No need to do 
anything when not a worker
INFO [main] org.apache.giraph.graph.GraphTaskManager: cleanup: Starting for 
MASTER_ONLY
INFO [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.BspServiceMaster: becomeMaster: First child is 
'/_hadoopBsp/job_1364578380737_0027/_masterElectionDir/eugenes-macbook-pro.local_00000000000'
 and my bid is 
'/_hadoopBsp/job_1364578380737_0027/_masterElectionDir/eugenes-macbook-pro.local_00000000000'
INFO [org.apache.giraph.master.MasterThread] 
org.apache.giraph.comm.netty.NettyServer: NettyServer: Using execution handler 
with 8 threads after requestFrameDecoder.
WARN [org.apache.giraph.master.MasterThread] 
org.apache.hadoop.conf.Configuration: mapred.map.tasks is deprecated. Instead, 
use mapreduce.job.maps
INFO [org.apache.giraph.master.MasterThread] 
org.apache.giraph.comm.netty.NettyServer: start: Started server communication 
server: Eugenes-MacBook-Pro.local/172.16.175.1:30000 with up to 16 threads on 
bind attempt 0 with sendBufferSize = 32768 receiveBufferSize = 524288 backlog = 
6
INFO [org.apache.giraph.master.MasterThread] 
org.apache.giraph.comm.netty.NettyClient: NettyClient: Using execution handler 
with 8 threads after requestEncoder.
INFO [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.BspServiceMaster: becomeMaster: I am now the master!
DEBUG [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.BspServiceMaster: checkWorkers: Got event that health 
registration changed, not using poll attempt
DEBUG [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.BspServiceMaster: checkWorkers: Got event that health 
registration changed, not using poll attempt
DEBUG [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.BspServiceMaster: checkWorkers: Got event that health 
registration changed, not using poll attempt
DEBUG [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.BspServiceMaster: checkWorkers: Got event that health 
registration changed, not using poll attempt
DEBUG [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.BspServiceMaster: checkWorkers: Got event that health 
registration changed, not using poll attempt
INFO [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses 
of 6 needed to start superstep -1.  Reporting every 30000 msecs, 565085 more 
msecs left before giving up.
INFO [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No 
response from partition 6 (could be master)
INFO [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses 
of 6 needed to start superstep -1.  Reporting every 30000 msecs, 535076 more 
msecs left before giving up.
INFO [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No 
response from partition 6 (could be master)
INFO [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses 
of 6 needed to start superstep -1.  Reporting every 30000 msecs, 505068 more 
msecs left before giving up.
INFO [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No 
response from partition 6 (could be master)
INFO [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses 
of 6 needed to start superstep -1.  Reporting every 30000 msecs, 475059 more 
msecs left before giving up.
INFO [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No 
response from partition 6 (could be master)
INFO [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses 
of 6 needed to start superstep -1.  Reporting every 30000 msecs, 445051 more 
msecs left before giving up.
INFO [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No 
response from partition 6 (could be master)
INFO [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses 
of 6 needed to start superstep -1.  Reporting every 30000 msecs, 415042 more 
msecs left before giving up.
INFO [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No 
response from partition 6 (could be master)
INFO [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses 
of 6 needed to start superstep -1.  Reporting every 30000 msecs, 385033 more 
msecs left before giving up.
INFO [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No 
response from partition 6 (could be master)
INFO [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses 
of 6 needed to start superstep -1.  Reporting every 30000 msecs, 355024 more 
msecs left before giving up.
INFO [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No 
response from partition 6 (could be master)
INFO [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses 
of 6 needed to start superstep -1.  Reporting every 30000 msecs, 325016 more 
msecs left before giving up.
INFO [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No 
response from partition 6 (could be master)
INFO [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses 
of 6 needed to start superstep -1.  Reporting every 30000 msecs, 295008 more 
msecs left before giving up.
INFO [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No 
response from partition 6 (could be master)
INFO [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses 
of 6 needed to start superstep -1.  Reporting every 30000 msecs, 265001 more 
msecs left before giving up.
INFO [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No 
response from partition 6 (could be master)
INFO [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses 
of 6 needed to start superstep -1.  Reporting every 30000 msecs, 234992 more 
msecs left before giving up.
INFO [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No 
response from partition 6 (could be master)
INFO [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses 
of 6 needed to start superstep -1.  Reporting every 30000 msecs, 204984 more 
msecs left before giving up.
INFO [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No 
response from partition 6 (could be master)
INFO [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses 
of 6 needed to start superstep -1.  Reporting every 30000 msecs, 174974 more 
msecs left before giving up.
INFO [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No 
response from partition 6 (could be master)
INFO [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses 
of 6 needed to start superstep -1.  Reporting every 30000 msecs, 144968 more 
msecs left before giving up.
INFO [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No 
response from partition 6 (could be master)
INFO [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses 
of 6 needed to start superstep -1.  Reporting every 30000 msecs, 114959 more 
msecs left before giving up.
INFO [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No 
response from partition 6 (could be master)
INFO [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses 
of 6 needed to start superstep -1.  Reporting every 30000 msecs, 84952 more 
msecs left before giving up.
INFO [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No 
response from partition 6 (could be master)
INFO [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses 
of 6 needed to start superstep -1.  Reporting every 30000 msecs, 54946 more 
msecs left before giving up.
INFO [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No 
response from partition 6 (could be master)
INFO [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses 
of 6 needed to start superstep -1.  Reporting every 30000 msecs, 24938 more 
msecs left before giving up.
INFO [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No 
response from partition 6 (could be master)
INFO [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses 
of 6 needed to start superstep -1.  Reporting every 30000 msecs, -5072 more 
msecs left before giving up.
INFO [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No 
response from partition 6 (could be master)
ERROR [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.BspServiceMaster: checkWorkers: Did not receive enough 
processes in time (only 5 of 6 required) after waiting 600000msecs).  This 
occurs if you do not have enough map tasks available simultaneously on your 
Hadoop instance to fulfill the number of requested workers.
INFO [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.BspServiceMaster: setJobState: 
{"_stateKey":"FAILED","_applicationAttemptKey":-1,"_superstepKey":-1} on 
superstep -1
FATAL [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.BspServiceMaster: failJob: Killing job 
job_1364578380737_0027
FATAL [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.BspServiceMaster: failJob: exception 
java.lang.IllegalStateException: Not enough healthy workers to create input 
splits
WARN [org.apache.giraph.master.MasterThread] 
org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final 
parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
WARN [org.apache.giraph.master.MasterThread] 
org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final 
parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
INFO [org.apache.giraph.master.MasterThread] 
org.apache.hadoop.yarn.service.AbstractService: 
Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
INFO [org.apache.giraph.master.MasterThread] 
org.apache.hadoop.yarn.service.AbstractService: 
Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
ERROR [org.apache.giraph.master.MasterThread] 
org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException 
as:ekoontz (auth:SIMPLE) cause:java.io.IOException
ERROR [org.apache.giraph.master.MasterThread] 
org.apache.giraph.master.MasterThread: masterThread: Master algorithm failed 
with RuntimeException






FATAL [org.apache.giraph.master.MasterThread] 
org.apache.giraph.graph.GraphMapper: uncaughtException: 
OverrideExceptionHandler on thread org.apache.giraph.master.MasterThread, msg = 
java.lang.RuntimeException: java.io.IOException, exiting...








application_1364578380737_0027/container_1364578380737_0027_01_000003/syslog
=========
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Log level remains 
at info
INFO [main] org.apache.giraph.graph.GraphTaskManager: Distributed cache is 
empty. Assuming fatjar.
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: classpath @ 
/tmp/hadoop-yarn/staging/ekoontz/.staging/job_1364578380737_0027/job.jar for 
job org.apache.giraph.benchmark.PageRankBenchmark
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker: true
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: taskPartition: 1
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is 
true.
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is 
true and zkAlreadyProvided=true.
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is 
true and taskPartition (1) is NOT less than masterCount (1), so WORKER_ONLY.
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Starting up 
BspServiceWorker...
INFO [main] org.apache.zookeeper.ZooKeeper: Initiating client connection, 
connectString=localhost:2181 sessionTimeout=60000 
watcher=org.apache.giraph.worker.BspServiceWorker@5b4bc4e6
INFO [main] org.apache.giraph.comm.netty.NettyServer: NettyServer: Using 
execution handler with 8 threads after requestFrameDecoder.
INFO [main] org.apache.giraph.comm.netty.NettyServer: start: Started server 
communication server: Eugenes-MacBook-Pro.local/172.16.175.1:30001 with up to 
16 threads on bind attempt 0 with sendBufferSize = 32768 receiveBufferSize = 
524288 backlog = 6
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Registering health 
of this worker...
INFO [main] org.apache.giraph.bsp.BspService: getJobState: Job state already 
exists (/_hadoopBsp/job_1364578380737_0027/_masterJobState)
DEBUG [main] org.apache.giraph.worker.BspServiceWorker: worker: 
Worker(hostname=Eugenes-MacBook-Pro.local, MRtaskID=1, port=30001) with 
taskId=1 is starting superstep.
INFO [main] org.apache.giraph.worker.BspServiceWorker: registerHealth: Created 
my health node for attempt=0, superstep=-1 with 
/_hadoopBsp/job_1364578380737_0027/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/Eugenes-MacBook-Pro.local_1
 and workerInfo= Worker(hostname=Eugenes-MacBook-Pro.local, MRtaskID=1, 
port=30001)
INFO [main-EventThread] org.apache.giraph.worker.BspServiceWorker: 
processEvent: Job state changed, checking to see if it needs to restart
INFO [main-EventThread] org.apache.giraph.bsp.BspService: getJobState: Job 
state already exists (/_hadoopBsp/job_1364578380737_0027/_masterJobState)

application_1364578380737_0027/container_1364578380737_0027_01_000004/syslog
=========
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Log level remains 
at info
INFO [main] org.apache.giraph.graph.GraphTaskManager: Distributed cache is 
empty. Assuming fatjar.
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: classpath @ 
/tmp/hadoop-yarn/staging/ekoontz/.staging/job_1364578380737_0027/job.jar for 
job org.apache.giraph.benchmark.PageRankBenchmark
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker: true
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: taskPartition: 2
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is 
true.
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is 
true and zkAlreadyProvided=true.
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is 
true and taskPartition (2) is NOT less than masterCount (1), so WORKER_ONLY.
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Starting up 
BspServiceWorker...
INFO [main] org.apache.zookeeper.ZooKeeper: Initiating client connection, 
connectString=localhost:2181 sessionTimeout=60000 
watcher=org.apache.giraph.worker.BspServiceWorker@7aec8784
INFO [main] org.apache.giraph.comm.netty.NettyServer: NettyServer: Using 
execution handler with 8 threads after requestFrameDecoder.
INFO [main] org.apache.giraph.comm.netty.NettyServer: start: Started server 
communication server: Eugenes-MacBook-Pro.local/172.16.175.1:30002 with up to 
16 threads on bind attempt 0 with sendBufferSize = 32768 receiveBufferSize = 
524288 backlog = 6
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Registering health 
of this worker...
INFO [main] org.apache.giraph.bsp.BspService: getJobState: Job state already 
exists (/_hadoopBsp/job_1364578380737_0027/_masterJobState)
DEBUG [main] org.apache.giraph.worker.BspServiceWorker: worker: 
Worker(hostname=Eugenes-MacBook-Pro.local, MRtaskID=2, port=30002) with 
taskId=2 is starting superstep.
INFO [main] org.apache.giraph.worker.BspServiceWorker: registerHealth: Created 
my health node for attempt=0, superstep=-1 with 
/_hadoopBsp/job_1364578380737_0027/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/Eugenes-MacBook-Pro.local_2
 and workerInfo= Worker(hostname=Eugenes-MacBook-Pro.local, MRtaskID=2, 
port=30002)
INFO [main-EventThread] org.apache.giraph.worker.BspServiceWorker: 
processEvent: Job state changed, checking to see if it needs to restart
INFO [main-EventThread] org.apache.giraph.bsp.BspService: getJobState: Job 
state already exists (/_hadoopBsp/job_1364578380737_0027/_masterJobState)

application_1364578380737_0027/container_1364578380737_0027_01_000005/syslog
=========
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Log level remains 
at info
INFO [main] org.apache.giraph.graph.GraphTaskManager: Distributed cache is 
empty. Assuming fatjar.
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: classpath @ 
/tmp/hadoop-yarn/staging/ekoontz/.staging/job_1364578380737_0027/job.jar for 
job org.apache.giraph.benchmark.PageRankBenchmark
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker: true
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: taskPartition: 3
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is 
true.
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is 
true and zkAlreadyProvided=true.
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is 
true and taskPartition (3) is NOT less than masterCount (1), so WORKER_ONLY.
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Starting up 
BspServiceWorker...
INFO [main] org.apache.zookeeper.ZooKeeper: Initiating client connection, 
connectString=localhost:2181 sessionTimeout=60000 
watcher=org.apache.giraph.worker.BspServiceWorker@c45aa2c
INFO [main] org.apache.giraph.comm.netty.NettyServer: NettyServer: Using 
execution handler with 8 threads after requestFrameDecoder.
INFO [main] org.apache.giraph.comm.netty.NettyServer: start: Started server 
communication server: Eugenes-MacBook-Pro.local/172.16.175.1:30003 with up to 
16 threads on bind attempt 0 with sendBufferSize = 32768 receiveBufferSize = 
524288 backlog = 6
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Registering health 
of this worker...
INFO [main] org.apache.giraph.bsp.BspService: getJobState: Job state already 
exists (/_hadoopBsp/job_1364578380737_0027/_masterJobState)
DEBUG [main] org.apache.giraph.worker.BspServiceWorker: worker: 
Worker(hostname=Eugenes-MacBook-Pro.local, MRtaskID=3, port=30003) with 
taskId=3 is starting superstep.
INFO [main] org.apache.giraph.worker.BspServiceWorker: registerHealth: Created 
my health node for attempt=0, superstep=-1 with 
/_hadoopBsp/job_1364578380737_0027/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/Eugenes-MacBook-Pro.local_3
 and workerInfo= Worker(hostname=Eugenes-MacBook-Pro.local, MRtaskID=3, 
port=30003)
INFO [main-EventThread] org.apache.giraph.worker.BspServiceWorker: 
processEvent: Job state changed, checking to see if it needs to restart
INFO [main-EventThread] org.apache.giraph.bsp.BspService: getJobState: Job 
state already exists (/_hadoopBsp/job_1364578380737_0027/_masterJobState)

application_1364578380737_0027/container_1364578380737_0027_01_000006/syslog
=========
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Log level remains 
at info
INFO [main] org.apache.giraph.graph.GraphTaskManager: Distributed cache is 
empty. Assuming fatjar.
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: classpath @ 
/tmp/hadoop-yarn/staging/ekoontz/.staging/job_1364578380737_0027/job.jar for 
job org.apache.giraph.benchmark.PageRankBenchmark
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker: true
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: taskPartition: 4
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is 
true.
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is 
true and zkAlreadyProvided=true.
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is 
true and taskPartition (4) is NOT less than masterCount (1), so WORKER_ONLY.
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Starting up 
BspServiceWorker...
INFO [main] org.apache.zookeeper.ZooKeeper: Initiating client connection, 
connectString=localhost:2181 sessionTimeout=60000 
watcher=org.apache.giraph.worker.BspServiceWorker@7e413fc6
INFO [main] org.apache.giraph.comm.netty.NettyServer: NettyServer: Using 
execution handler with 8 threads after requestFrameDecoder.
INFO [main] org.apache.giraph.comm.netty.NettyServer: start: Started server 
communication server: Eugenes-MacBook-Pro.local/172.16.175.1:30004 with up to 
16 threads on bind attempt 0 with sendBufferSize = 32768 receiveBufferSize = 
524288 backlog = 6
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Registering health 
of this worker...
INFO [main] org.apache.giraph.bsp.BspService: getJobState: Job state already 
exists (/_hadoopBsp/job_1364578380737_0027/_masterJobState)
DEBUG [main] org.apache.giraph.worker.BspServiceWorker: worker: 
Worker(hostname=Eugenes-MacBook-Pro.local, MRtaskID=4, port=30004) with 
taskId=4 is starting superstep.
INFO [main] org.apache.giraph.worker.BspServiceWorker: registerHealth: Created 
my health node for attempt=0, superstep=-1 with 
/_hadoopBsp/job_1364578380737_0027/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/Eugenes-MacBook-Pro.local_4
 and workerInfo= Worker(hostname=Eugenes-MacBook-Pro.local, MRtaskID=4, 
port=30004)
INFO [main-EventThread] org.apache.giraph.worker.BspServiceWorker: 
processEvent: Job state changed, checking to see if it needs to restart
INFO [main-EventThread] org.apache.giraph.bsp.BspService: getJobState: Job 
state already exists (/_hadoopBsp/job_1364578380737_0027/_masterJobState)

application_1364578380737_0027/container_1364578380737_0027_01_000007/syslog
=========
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Log level remains 
at info
INFO [main] org.apache.giraph.graph.GraphTaskManager: Distributed cache is 
empty. Assuming fatjar.
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: classpath @ 
/tmp/hadoop-yarn/staging/ekoontz/.staging/job_1364578380737_0027/job.jar for 
job org.apache.giraph.benchmark.PageRankBenchmark
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker: true
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: taskPartition: 5
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is 
true.
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is 
true and zkAlreadyProvided=true.
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is 
true and taskPartition (5) is NOT less than masterCount (1), so WORKER_ONLY.
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Starting up 
BspServiceWorker...
INFO [main] org.apache.zookeeper.ZooKeeper: Initiating client connection, 
connectString=localhost:2181 sessionTimeout=60000 
watcher=org.apache.giraph.worker.BspServiceWorker@3b5ad1da
INFO [main] org.apache.giraph.comm.netty.NettyServer: NettyServer: Using 
execution handler with 8 threads after requestFrameDecoder.
INFO [main] org.apache.giraph.comm.netty.NettyServer: start: Started server 
communication server: Eugenes-MacBook-Pro.local/172.16.175.1:30005 with up to 
16 threads on bind attempt 0 with sendBufferSize = 32768 receiveBufferSize = 
524288 backlog = 6
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Registering health 
of this worker...
INFO [main] org.apache.giraph.bsp.BspService: getJobState: Job state already 
exists (/_hadoopBsp/job_1364578380737_0027/_masterJobState)
DEBUG [main] org.apache.giraph.worker.BspServiceWorker: worker: 
Worker(hostname=Eugenes-MacBook-Pro.local, MRtaskID=5, port=30005) with 
taskId=5 is starting superstep.
INFO [main] org.apache.giraph.worker.BspServiceWorker: registerHealth: Created 
my health node for attempt=0, superstep=-1 with 
/_hadoopBsp/job_1364578380737_0027/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/Eugenes-MacBook-Pro.local_5
 and workerInfo= Worker(hostname=Eugenes-MacBook-Pro.local, MRtaskID=5, 
port=30005)
INFO [main-EventThread] org.apache.giraph.worker.BspServiceWorker: 
processEvent: Job state changed, checking to see if it needs to restart
INFO [main-EventThread] org.apache.giraph.bsp.BspService: getJobState: Job 
state already exists (/_hadoopBsp/job_1364578380737_0027/_masterJobState)

application_1364578380737_0027/container_1364578380737_0027_01_000008/syslog
=========
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Log level remains 
at info
INFO [main] org.apache.giraph.graph.GraphTaskManager: Distributed cache is 
empty. Assuming fatjar.
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: classpath @ 
/tmp/hadoop-yarn/staging/ekoontz/.staging/job_1364578380737_0027/job.jar for 
job org.apache.giraph.benchmark.PageRankBenchmark
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker: true
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: taskPartition: 6
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is 
true.
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is 
true and zkAlreadyProvided=true.
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is 
true and taskPartition (6) is NOT less than masterCount (1), so WORKER_ONLY.
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Starting up 
BspServiceWorker...
INFO [main] org.apache.zookeeper.ZooKeeper: Initiating client connection, 
connectString=localhost:2181 sessionTimeout=60000 
watcher=org.apache.giraph.worker.BspServiceWorker@1b275a34
INFO [main] org.apache.giraph.comm.netty.NettyServer: NettyServer: Using 
execution handler with 8 threads after requestFrameDecoder.
INFO [main] org.apache.giraph.comm.netty.NettyServer: start: Started server 
communication server: Eugenes-MacBook-Pro.local/172.16.175.1:30006 with up to 
16 threads on bind attempt 0 with sendBufferSize = 32768 receiveBufferSize = 
524288 backlog = 6
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Registering health 
of this worker...
INFO [main] org.apache.giraph.bsp.BspService: getJobState: Job state already 
exists (/_hadoopBsp/job_1364578380737_0027/_masterJobState)
DEBUG [main] org.apache.giraph.worker.BspServiceWorker: worker: 
Worker(hostname=Eugenes-MacBook-Pro.local, MRtaskID=6, port=30006) with 
taskId=6 is starting superstep.
INFO [main] org.apache.giraph.worker.BspServiceWorker: registerHealth: Created 
my health node for attempt=0, superstep=-1 with 
/_hadoopBsp/job_1364578380737_0027/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/Eugenes-MacBook-Pro.local_6
 and workerInfo= Worker(hostname=Eugenes-MacBook-Pro.local, MRtaskID=6, 
port=30006)

{code}

                
> Exception when running pagerank benchmark: SendVertexRequest cannot be cast 
> to MasterRequest
> --------------------------------------------------------------------------------------------
>
>                 Key: GIRAPH-601
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-601
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Eugene Koontz
>         Attachments: instrumentation.patch
>
>
> Building Giraph with:
> {code}
> mvn -DskipTests  -Phadoop_2.0.3 clean compile
> {code}
> Running pagerank like this:
> {code}
>  $HADOOP_RUNTIME/bin/hadoop jar $JAR \
>          org.apache.giraph.benchmark.PageRankBenchmark \
>         -e 10 -s 10 -v -V 10 -w 6
> {code}
> I see this in  
> /tmp/userlogs/application_1364578380737_0003/container_1364578380737_0003_01_000002/
>  :
> {code}
> 2013-03-29 10:58:06,371 DEBUG [org.apache.giraph.master.MasterThread] 
> org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: Got finished 
> worker list = [Eugenes-MacBook-Pro.local_1, Eugenes-MacBook-Pro.local_3], 
> size = 2, worker list = [Worker(hostname=Eugenes-MacBook-Pro.local, 
> MRtaskID=2, port=30002), Worker(hostname=Eugenes-MacBook-Pro.local, 
> MRtaskID=1, port=30001), Worker(hostname=Eugenes-MacBook-Pro.local, 
> MRtaskID=4, port=30004), Worker(hostname=Eugenes-MacBook-Pro.local, 
> MRtaskID=3, port=30003), Worker(hostname=Eugenes-MacBook-Pro.local, 
> MRtaskID=5, port=30005), Worker(hostname=Eugenes-MacBook-Pro.local, 
> MRtaskID=0, port=30010)], size = 6 from 
> /_hadoopBsp/job_1364578380737_0003/_vertexInputSplitDoneDir
> 2013-03-29 10:58:06,373 WARN [netty-server-exec-3] 
> org.apache.giraph.comm.netty.handler.RequestServerHandler: exceptionCaught: 
> Channel failed with remote address /172.16.175.1:56236
> java.lang.ClassCastException: 
> org.apache.giraph.comm.requests.SendVertexRequest cannot be cast to 
> org.apache.giraph.comm.requests.MasterRequest
>       at 
> org.apache.giraph.comm.netty.handler.MasterRequestServerHandler.processRequest(MasterRequestServerHandler.java:27)
>       at 
> org.apache.giraph.comm.netty.handler.RequestServerHandler.messageReceived(RequestServerHandler.java:106)
>       at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
>       at 
> org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:71)
>       at 
> org.jboss.netty.handler.execution.ChannelUpstreamEventRunnable.doRun(ChannelUpstreamEventRunnable.java:45)
>       at 
> org.jboss.netty.handler.execution.ChannelEventRunnable.run(ChannelEventRunnable.java:69)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>       at java.lang.Thread.run(Thread.java:680)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to