[ https://issues.apache.org/jira/browse/GIRAPH-601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617924#comment-13617924 ]
Eugene Koontz commented on GIRAPH-601: -------------------------------------- Thanks for the feedback Eli! This is where I was waylaid while hoping to take a look at GIRAPH-13. Once I get this sorted out I hope to continue with the latter! Here's some output enhanced with taskId information for each worker and master: {code} ekoontz@Eugenes-MacBook-Pro ~$ perl format.pl < postprocessed.txt application_1364578380737_0027/container_1364578380737_0027_01_000002/syslog ========= INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Log level remains at info INFO [main] org.apache.giraph.graph.GraphTaskManager: Distributed cache is empty. Assuming fatjar. INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: classpath @ /tmp/hadoop-yarn/staging/ekoontz/.staging/job_1364578380737_0027/job.jar for job org.apache.giraph.benchmark.PageRankBenchmark DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker: true DEBUG [main] org.apache.giraph.graph.GraphTaskManager: taskPartition: 0 DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is true. DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is true and zkAlreadyProvided=true. DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is true and taskPartition (0) is less than masterCount (1), so MASTER_ONLY. INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Starting up BspServiceMaster (master thread)... INFO [main] org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=60000 watcher=org.apache.giraph.master.BspServiceMaster@47875da7 INFO [main] org.apache.giraph.graph.GraphTaskManager: map: No need to do anything when not a worker INFO [main] org.apache.giraph.graph.GraphTaskManager: cleanup: Starting for MASTER_ONLY INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: becomeMaster: First child is '/_hadoopBsp/job_1364578380737_0027/_masterElectionDir/eugenes-macbook-pro.local_00000000000' and my bid is '/_hadoopBsp/job_1364578380737_0027/_masterElectionDir/eugenes-macbook-pro.local_00000000000' INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.comm.netty.NettyServer: NettyServer: Using execution handler with 8 threads after requestFrameDecoder. WARN [org.apache.giraph.master.MasterThread] org.apache.hadoop.conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.comm.netty.NettyServer: start: Started server communication server: Eugenes-MacBook-Pro.local/172.16.175.1:30000 with up to 16 threads on bind attempt 0 with sendBufferSize = 32768 receiveBufferSize = 524288 backlog = 6 INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.comm.netty.NettyClient: NettyClient: Using execution handler with 8 threads after requestEncoder. INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: becomeMaster: I am now the master! DEBUG [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: checkWorkers: Got event that health registration changed, not using poll attempt DEBUG [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: checkWorkers: Got event that health registration changed, not using poll attempt DEBUG [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: checkWorkers: Got event that health registration changed, not using poll attempt DEBUG [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: checkWorkers: Got event that health registration changed, not using poll attempt DEBUG [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: checkWorkers: Got event that health registration changed, not using poll attempt INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses of 6 needed to start superstep -1. Reporting every 30000 msecs, 565085 more msecs left before giving up. INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No response from partition 6 (could be master) INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses of 6 needed to start superstep -1. Reporting every 30000 msecs, 535076 more msecs left before giving up. INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No response from partition 6 (could be master) INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses of 6 needed to start superstep -1. Reporting every 30000 msecs, 505068 more msecs left before giving up. INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No response from partition 6 (could be master) INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses of 6 needed to start superstep -1. Reporting every 30000 msecs, 475059 more msecs left before giving up. INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No response from partition 6 (could be master) INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses of 6 needed to start superstep -1. Reporting every 30000 msecs, 445051 more msecs left before giving up. INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No response from partition 6 (could be master) INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses of 6 needed to start superstep -1. Reporting every 30000 msecs, 415042 more msecs left before giving up. INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No response from partition 6 (could be master) INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses of 6 needed to start superstep -1. Reporting every 30000 msecs, 385033 more msecs left before giving up. INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No response from partition 6 (could be master) INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses of 6 needed to start superstep -1. Reporting every 30000 msecs, 355024 more msecs left before giving up. INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No response from partition 6 (could be master) INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses of 6 needed to start superstep -1. Reporting every 30000 msecs, 325016 more msecs left before giving up. INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No response from partition 6 (could be master) INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses of 6 needed to start superstep -1. Reporting every 30000 msecs, 295008 more msecs left before giving up. INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No response from partition 6 (could be master) INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses of 6 needed to start superstep -1. Reporting every 30000 msecs, 265001 more msecs left before giving up. INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No response from partition 6 (could be master) INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses of 6 needed to start superstep -1. Reporting every 30000 msecs, 234992 more msecs left before giving up. INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No response from partition 6 (could be master) INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses of 6 needed to start superstep -1. Reporting every 30000 msecs, 204984 more msecs left before giving up. INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No response from partition 6 (could be master) INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses of 6 needed to start superstep -1. Reporting every 30000 msecs, 174974 more msecs left before giving up. INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No response from partition 6 (could be master) INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses of 6 needed to start superstep -1. Reporting every 30000 msecs, 144968 more msecs left before giving up. INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No response from partition 6 (could be master) INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses of 6 needed to start superstep -1. Reporting every 30000 msecs, 114959 more msecs left before giving up. INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No response from partition 6 (could be master) INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses of 6 needed to start superstep -1. Reporting every 30000 msecs, 84952 more msecs left before giving up. INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No response from partition 6 (could be master) INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses of 6 needed to start superstep -1. Reporting every 30000 msecs, 54946 more msecs left before giving up. INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No response from partition 6 (could be master) INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses of 6 needed to start superstep -1. Reporting every 30000 msecs, 24938 more msecs left before giving up. INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No response from partition 6 (could be master) INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses of 6 needed to start superstep -1. Reporting every 30000 msecs, -5072 more msecs left before giving up. INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No response from partition 6 (could be master) ERROR [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: checkWorkers: Did not receive enough processes in time (only 5 of 6 required) after waiting 600000msecs). This occurs if you do not have enough map tasks available simultaneously on your Hadoop instance to fulfill the number of requested workers. INFO [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: setJobState: {"_stateKey":"FAILED","_applicationAttemptKey":-1,"_superstepKey":-1} on superstep -1 FATAL [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: failJob: Killing job job_1364578380737_0027 FATAL [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: failJob: exception java.lang.IllegalStateException: Not enough healthy workers to create input splits WARN [org.apache.giraph.master.MasterThread] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring. WARN [org.apache.giraph.master.MasterThread] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring. INFO [org.apache.giraph.master.MasterThread] org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited. INFO [org.apache.giraph.master.MasterThread] org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started. ERROR [org.apache.giraph.master.MasterThread] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:ekoontz (auth:SIMPLE) cause:java.io.IOException ERROR [org.apache.giraph.master.MasterThread] org.apache.giraph.master.MasterThread: masterThread: Master algorithm failed with RuntimeException FATAL [org.apache.giraph.master.MasterThread] org.apache.giraph.graph.GraphMapper: uncaughtException: OverrideExceptionHandler on thread org.apache.giraph.master.MasterThread, msg = java.lang.RuntimeException: java.io.IOException, exiting... application_1364578380737_0027/container_1364578380737_0027_01_000003/syslog ========= INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Log level remains at info INFO [main] org.apache.giraph.graph.GraphTaskManager: Distributed cache is empty. Assuming fatjar. INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: classpath @ /tmp/hadoop-yarn/staging/ekoontz/.staging/job_1364578380737_0027/job.jar for job org.apache.giraph.benchmark.PageRankBenchmark DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker: true DEBUG [main] org.apache.giraph.graph.GraphTaskManager: taskPartition: 1 DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is true. DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is true and zkAlreadyProvided=true. DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is true and taskPartition (1) is NOT less than masterCount (1), so WORKER_ONLY. INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Starting up BspServiceWorker... INFO [main] org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=60000 watcher=org.apache.giraph.worker.BspServiceWorker@5b4bc4e6 INFO [main] org.apache.giraph.comm.netty.NettyServer: NettyServer: Using execution handler with 8 threads after requestFrameDecoder. INFO [main] org.apache.giraph.comm.netty.NettyServer: start: Started server communication server: Eugenes-MacBook-Pro.local/172.16.175.1:30001 with up to 16 threads on bind attempt 0 with sendBufferSize = 32768 receiveBufferSize = 524288 backlog = 6 INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Registering health of this worker... INFO [main] org.apache.giraph.bsp.BspService: getJobState: Job state already exists (/_hadoopBsp/job_1364578380737_0027/_masterJobState) DEBUG [main] org.apache.giraph.worker.BspServiceWorker: worker: Worker(hostname=Eugenes-MacBook-Pro.local, MRtaskID=1, port=30001) with taskId=1 is starting superstep. INFO [main] org.apache.giraph.worker.BspServiceWorker: registerHealth: Created my health node for attempt=0, superstep=-1 with /_hadoopBsp/job_1364578380737_0027/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/Eugenes-MacBook-Pro.local_1 and workerInfo= Worker(hostname=Eugenes-MacBook-Pro.local, MRtaskID=1, port=30001) INFO [main-EventThread] org.apache.giraph.worker.BspServiceWorker: processEvent: Job state changed, checking to see if it needs to restart INFO [main-EventThread] org.apache.giraph.bsp.BspService: getJobState: Job state already exists (/_hadoopBsp/job_1364578380737_0027/_masterJobState) application_1364578380737_0027/container_1364578380737_0027_01_000004/syslog ========= INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Log level remains at info INFO [main] org.apache.giraph.graph.GraphTaskManager: Distributed cache is empty. Assuming fatjar. INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: classpath @ /tmp/hadoop-yarn/staging/ekoontz/.staging/job_1364578380737_0027/job.jar for job org.apache.giraph.benchmark.PageRankBenchmark DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker: true DEBUG [main] org.apache.giraph.graph.GraphTaskManager: taskPartition: 2 DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is true. DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is true and zkAlreadyProvided=true. DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is true and taskPartition (2) is NOT less than masterCount (1), so WORKER_ONLY. INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Starting up BspServiceWorker... INFO [main] org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=60000 watcher=org.apache.giraph.worker.BspServiceWorker@7aec8784 INFO [main] org.apache.giraph.comm.netty.NettyServer: NettyServer: Using execution handler with 8 threads after requestFrameDecoder. INFO [main] org.apache.giraph.comm.netty.NettyServer: start: Started server communication server: Eugenes-MacBook-Pro.local/172.16.175.1:30002 with up to 16 threads on bind attempt 0 with sendBufferSize = 32768 receiveBufferSize = 524288 backlog = 6 INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Registering health of this worker... INFO [main] org.apache.giraph.bsp.BspService: getJobState: Job state already exists (/_hadoopBsp/job_1364578380737_0027/_masterJobState) DEBUG [main] org.apache.giraph.worker.BspServiceWorker: worker: Worker(hostname=Eugenes-MacBook-Pro.local, MRtaskID=2, port=30002) with taskId=2 is starting superstep. INFO [main] org.apache.giraph.worker.BspServiceWorker: registerHealth: Created my health node for attempt=0, superstep=-1 with /_hadoopBsp/job_1364578380737_0027/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/Eugenes-MacBook-Pro.local_2 and workerInfo= Worker(hostname=Eugenes-MacBook-Pro.local, MRtaskID=2, port=30002) INFO [main-EventThread] org.apache.giraph.worker.BspServiceWorker: processEvent: Job state changed, checking to see if it needs to restart INFO [main-EventThread] org.apache.giraph.bsp.BspService: getJobState: Job state already exists (/_hadoopBsp/job_1364578380737_0027/_masterJobState) application_1364578380737_0027/container_1364578380737_0027_01_000005/syslog ========= INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Log level remains at info INFO [main] org.apache.giraph.graph.GraphTaskManager: Distributed cache is empty. Assuming fatjar. INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: classpath @ /tmp/hadoop-yarn/staging/ekoontz/.staging/job_1364578380737_0027/job.jar for job org.apache.giraph.benchmark.PageRankBenchmark DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker: true DEBUG [main] org.apache.giraph.graph.GraphTaskManager: taskPartition: 3 DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is true. DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is true and zkAlreadyProvided=true. DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is true and taskPartition (3) is NOT less than masterCount (1), so WORKER_ONLY. INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Starting up BspServiceWorker... INFO [main] org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=60000 watcher=org.apache.giraph.worker.BspServiceWorker@c45aa2c INFO [main] org.apache.giraph.comm.netty.NettyServer: NettyServer: Using execution handler with 8 threads after requestFrameDecoder. INFO [main] org.apache.giraph.comm.netty.NettyServer: start: Started server communication server: Eugenes-MacBook-Pro.local/172.16.175.1:30003 with up to 16 threads on bind attempt 0 with sendBufferSize = 32768 receiveBufferSize = 524288 backlog = 6 INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Registering health of this worker... INFO [main] org.apache.giraph.bsp.BspService: getJobState: Job state already exists (/_hadoopBsp/job_1364578380737_0027/_masterJobState) DEBUG [main] org.apache.giraph.worker.BspServiceWorker: worker: Worker(hostname=Eugenes-MacBook-Pro.local, MRtaskID=3, port=30003) with taskId=3 is starting superstep. INFO [main] org.apache.giraph.worker.BspServiceWorker: registerHealth: Created my health node for attempt=0, superstep=-1 with /_hadoopBsp/job_1364578380737_0027/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/Eugenes-MacBook-Pro.local_3 and workerInfo= Worker(hostname=Eugenes-MacBook-Pro.local, MRtaskID=3, port=30003) INFO [main-EventThread] org.apache.giraph.worker.BspServiceWorker: processEvent: Job state changed, checking to see if it needs to restart INFO [main-EventThread] org.apache.giraph.bsp.BspService: getJobState: Job state already exists (/_hadoopBsp/job_1364578380737_0027/_masterJobState) application_1364578380737_0027/container_1364578380737_0027_01_000006/syslog ========= INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Log level remains at info INFO [main] org.apache.giraph.graph.GraphTaskManager: Distributed cache is empty. Assuming fatjar. INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: classpath @ /tmp/hadoop-yarn/staging/ekoontz/.staging/job_1364578380737_0027/job.jar for job org.apache.giraph.benchmark.PageRankBenchmark DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker: true DEBUG [main] org.apache.giraph.graph.GraphTaskManager: taskPartition: 4 DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is true. DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is true and zkAlreadyProvided=true. DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is true and taskPartition (4) is NOT less than masterCount (1), so WORKER_ONLY. INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Starting up BspServiceWorker... INFO [main] org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=60000 watcher=org.apache.giraph.worker.BspServiceWorker@7e413fc6 INFO [main] org.apache.giraph.comm.netty.NettyServer: NettyServer: Using execution handler with 8 threads after requestFrameDecoder. INFO [main] org.apache.giraph.comm.netty.NettyServer: start: Started server communication server: Eugenes-MacBook-Pro.local/172.16.175.1:30004 with up to 16 threads on bind attempt 0 with sendBufferSize = 32768 receiveBufferSize = 524288 backlog = 6 INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Registering health of this worker... INFO [main] org.apache.giraph.bsp.BspService: getJobState: Job state already exists (/_hadoopBsp/job_1364578380737_0027/_masterJobState) DEBUG [main] org.apache.giraph.worker.BspServiceWorker: worker: Worker(hostname=Eugenes-MacBook-Pro.local, MRtaskID=4, port=30004) with taskId=4 is starting superstep. INFO [main] org.apache.giraph.worker.BspServiceWorker: registerHealth: Created my health node for attempt=0, superstep=-1 with /_hadoopBsp/job_1364578380737_0027/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/Eugenes-MacBook-Pro.local_4 and workerInfo= Worker(hostname=Eugenes-MacBook-Pro.local, MRtaskID=4, port=30004) INFO [main-EventThread] org.apache.giraph.worker.BspServiceWorker: processEvent: Job state changed, checking to see if it needs to restart INFO [main-EventThread] org.apache.giraph.bsp.BspService: getJobState: Job state already exists (/_hadoopBsp/job_1364578380737_0027/_masterJobState) application_1364578380737_0027/container_1364578380737_0027_01_000007/syslog ========= INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Log level remains at info INFO [main] org.apache.giraph.graph.GraphTaskManager: Distributed cache is empty. Assuming fatjar. INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: classpath @ /tmp/hadoop-yarn/staging/ekoontz/.staging/job_1364578380737_0027/job.jar for job org.apache.giraph.benchmark.PageRankBenchmark DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker: true DEBUG [main] org.apache.giraph.graph.GraphTaskManager: taskPartition: 5 DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is true. DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is true and zkAlreadyProvided=true. DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is true and taskPartition (5) is NOT less than masterCount (1), so WORKER_ONLY. INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Starting up BspServiceWorker... INFO [main] org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=60000 watcher=org.apache.giraph.worker.BspServiceWorker@3b5ad1da INFO [main] org.apache.giraph.comm.netty.NettyServer: NettyServer: Using execution handler with 8 threads after requestFrameDecoder. INFO [main] org.apache.giraph.comm.netty.NettyServer: start: Started server communication server: Eugenes-MacBook-Pro.local/172.16.175.1:30005 with up to 16 threads on bind attempt 0 with sendBufferSize = 32768 receiveBufferSize = 524288 backlog = 6 INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Registering health of this worker... INFO [main] org.apache.giraph.bsp.BspService: getJobState: Job state already exists (/_hadoopBsp/job_1364578380737_0027/_masterJobState) DEBUG [main] org.apache.giraph.worker.BspServiceWorker: worker: Worker(hostname=Eugenes-MacBook-Pro.local, MRtaskID=5, port=30005) with taskId=5 is starting superstep. INFO [main] org.apache.giraph.worker.BspServiceWorker: registerHealth: Created my health node for attempt=0, superstep=-1 with /_hadoopBsp/job_1364578380737_0027/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/Eugenes-MacBook-Pro.local_5 and workerInfo= Worker(hostname=Eugenes-MacBook-Pro.local, MRtaskID=5, port=30005) INFO [main-EventThread] org.apache.giraph.worker.BspServiceWorker: processEvent: Job state changed, checking to see if it needs to restart INFO [main-EventThread] org.apache.giraph.bsp.BspService: getJobState: Job state already exists (/_hadoopBsp/job_1364578380737_0027/_masterJobState) application_1364578380737_0027/container_1364578380737_0027_01_000008/syslog ========= INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Log level remains at info INFO [main] org.apache.giraph.graph.GraphTaskManager: Distributed cache is empty. Assuming fatjar. INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: classpath @ /tmp/hadoop-yarn/staging/ekoontz/.staging/job_1364578380737_0027/job.jar for job org.apache.giraph.benchmark.PageRankBenchmark DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker: true DEBUG [main] org.apache.giraph.graph.GraphTaskManager: taskPartition: 6 DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is true. DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is true and zkAlreadyProvided=true. DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is true and taskPartition (6) is NOT less than masterCount (1), so WORKER_ONLY. INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Starting up BspServiceWorker... INFO [main] org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=60000 watcher=org.apache.giraph.worker.BspServiceWorker@1b275a34 INFO [main] org.apache.giraph.comm.netty.NettyServer: NettyServer: Using execution handler with 8 threads after requestFrameDecoder. INFO [main] org.apache.giraph.comm.netty.NettyServer: start: Started server communication server: Eugenes-MacBook-Pro.local/172.16.175.1:30006 with up to 16 threads on bind attempt 0 with sendBufferSize = 32768 receiveBufferSize = 524288 backlog = 6 INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Registering health of this worker... INFO [main] org.apache.giraph.bsp.BspService: getJobState: Job state already exists (/_hadoopBsp/job_1364578380737_0027/_masterJobState) DEBUG [main] org.apache.giraph.worker.BspServiceWorker: worker: Worker(hostname=Eugenes-MacBook-Pro.local, MRtaskID=6, port=30006) with taskId=6 is starting superstep. INFO [main] org.apache.giraph.worker.BspServiceWorker: registerHealth: Created my health node for attempt=0, superstep=-1 with /_hadoopBsp/job_1364578380737_0027/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/Eugenes-MacBook-Pro.local_6 and workerInfo= Worker(hostname=Eugenes-MacBook-Pro.local, MRtaskID=6, port=30006) {code} > Exception when running pagerank benchmark: SendVertexRequest cannot be cast > to MasterRequest > -------------------------------------------------------------------------------------------- > > Key: GIRAPH-601 > URL: https://issues.apache.org/jira/browse/GIRAPH-601 > Project: Giraph > Issue Type: Bug > Reporter: Eugene Koontz > Attachments: instrumentation.patch > > > Building Giraph with: > {code} > mvn -DskipTests -Phadoop_2.0.3 clean compile > {code} > Running pagerank like this: > {code} > $HADOOP_RUNTIME/bin/hadoop jar $JAR \ > org.apache.giraph.benchmark.PageRankBenchmark \ > -e 10 -s 10 -v -V 10 -w 6 > {code} > I see this in > /tmp/userlogs/application_1364578380737_0003/container_1364578380737_0003_01_000002/ > : > {code} > 2013-03-29 10:58:06,371 DEBUG [org.apache.giraph.master.MasterThread] > org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: Got finished > worker list = [Eugenes-MacBook-Pro.local_1, Eugenes-MacBook-Pro.local_3], > size = 2, worker list = [Worker(hostname=Eugenes-MacBook-Pro.local, > MRtaskID=2, port=30002), Worker(hostname=Eugenes-MacBook-Pro.local, > MRtaskID=1, port=30001), Worker(hostname=Eugenes-MacBook-Pro.local, > MRtaskID=4, port=30004), Worker(hostname=Eugenes-MacBook-Pro.local, > MRtaskID=3, port=30003), Worker(hostname=Eugenes-MacBook-Pro.local, > MRtaskID=5, port=30005), Worker(hostname=Eugenes-MacBook-Pro.local, > MRtaskID=0, port=30010)], size = 6 from > /_hadoopBsp/job_1364578380737_0003/_vertexInputSplitDoneDir > 2013-03-29 10:58:06,373 WARN [netty-server-exec-3] > org.apache.giraph.comm.netty.handler.RequestServerHandler: exceptionCaught: > Channel failed with remote address /172.16.175.1:56236 > java.lang.ClassCastException: > org.apache.giraph.comm.requests.SendVertexRequest cannot be cast to > org.apache.giraph.comm.requests.MasterRequest > at > org.apache.giraph.comm.netty.handler.MasterRequestServerHandler.processRequest(MasterRequestServerHandler.java:27) > at > org.apache.giraph.comm.netty.handler.RequestServerHandler.messageReceived(RequestServerHandler.java:106) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296) > at > org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:71) > at > org.jboss.netty.handler.execution.ChannelUpstreamEventRunnable.doRun(ChannelUpstreamEventRunnable.java:45) > at > org.jboss.netty.handler.execution.ChannelEventRunnable.run(ChannelEventRunnable.java:69) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) > at java.lang.Thread.run(Thread.java:680) > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira