[ https://issues.apache.org/jira/browse/GIRAPH-53?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13134723#comment-13134723 ]
Avery Ching commented on GIRAPH-53:
-----------------------------------

No, the map tasks are held for the duration of the application (no matter how many supersteps). That is a huge benefit when compared to implementing iterative graph applications on a traditional MapReduce framework.

> Unable to read additional data from server session, likely server has closed socket
> -----------------------------------------------------------------------------------
>
>         Key: GIRAPH-53
>         URL: https://issues.apache.org/jira/browse/GIRAPH-53
>     Project: Giraph
>  Issue Type: Bug
>    Reporter: locker
>
> I got an error recently. Everything goes well until it reaches the 103rd superstep.
>
> 2011-10-14 16:23:38,904 INFO org.apache.giraph.comm.BasicRPCCommunications: prepareSuperstep
> 2011-10-14 16:23:39,018 WARN org.apache.giraph.graph.BspService: process: Unknown and unprocessed event (path=/_hadoopBsp/job_201110141621_0001/_applicationAttemptsDir/0/_superstepDir/101/_vertexRangeAssignments, type=NodeDeleted, state=SyncConnected)
> 2011-10-14 16:23:39,057 INFO org.apache.giraph.graph.BspServiceWorker: registerHealth: Created my health node for attempt=0, superstep=103 with /_hadoopBsp/job_201110141621_0001/_applicationAttemptsDir/0/_superstepDir/103/_workerHealthyDir/locker-desktop_1 and hostnamePort = ["locker-desktop",30001]
> 2011-10-14 16:23:39,057 WARN org.apache.giraph.graph.BspService: process: Unknown and unprocessed event (path=/_hadoopBsp/job_201110141621_0001/_applicationAttemptsDir/0/_superstepDir/101/_superstepFinished, type=NodeDeleted, state=SyncConnected)
> 2011-10-14 16:23:39,529 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x1330186cff30001, likely server has closed socket, closing socket connection and attempting reconnect
> 2011-10-14 16:23:39,630 ERROR org.apache.zookeeper.ClientCnxn: Error while calling watcher
> java.lang.RuntimeException: process: Disconnected from ZooKeeper, cannot recover.
>     at org.apache.giraph.graph.BspService.process(BspService.java:995)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:488)
> 2011-10-14 16:23:41,098 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server locker-desktop/10.13.30.90:22181
> 2011-10-14 16:23:41,099 WARN org.apache.zookeeper.ClientCnxn: Session 0x1330186cff30001 for server null, unexpected error, closing socket connection and attempting reconnect
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
> 2011-10-14 16:23:41,212 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
> 2011-10-14 16:23:41,306 INFO org.apache.hadoop.io.nativeio.NativeIO: Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
> 2011-10-14 16:23:41,307 INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName dic for UID 1001 from the native implementation
> 2011-10-14 16:23:41,318 WARN org.apache.hadoop.mapred.Child: Error running child
> java.lang.RuntimeException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /_hadoopBsp/job_201110141621_0001/_applicationAttemptsDir/0/_superstepDir/103/_vertexRangeAssignments
>     at org.apache.giraph.graph.BspServiceWorker.startSuperstep(BspServiceWorker.java:836)
>     at org.apache.giraph.graph.GraphMapper.map(GraphMapper.java:551)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>     at org.apache.hadoop.mapred.Child.main(Child.java:253)
> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /_hadoopBsp/job_201110141621_0001/_applicationAttemptsDir/0/_superstepDir/103/_vertexRangeAssignments
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:809)
>     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:837)
>     at org.apache.giraph.graph.BspServiceWorker.startSuperstep(BspServiceWorker.java:830)
>     ... 9 more
>
> I don't know whether it should be called a bug or not. Waiting for some help, thanks.

--
This message is automatically generated by JIRA.
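To illustrate Avery's point that the map tasks stay alive for every superstep, here is a minimal Java sketch of a bulk synchronous parallel (BSP) loop in which a fixed set of workers is started once and then runs all supersteps, keeping its state in memory. It is a hypothetical analogy, not Giraph's implementation: threads stand in for the long-lived map tasks, and a `java.util.concurrent.CyclicBarrier` loosely stands in for the ZooKeeper-coordinated superstep barrier visible in the log (the `_superstepDir` znodes). The class `BspSketch`, the method `propagateMax`, and the max-value-on-a-ring computation are illustrative choices.

```java
import java.util.Arrays;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

// Hypothetical sketch, not Giraph code: long-lived threads stand in for the
// map tasks, and every superstep runs inside the same threads.
public class BspSketch {

    // Max-value propagation on a ring: in each superstep every worker takes the
    // max of its own value and its left neighbor's value from the previous superstep.
    static int[] propagateMax(int[] initial, int numSupersteps) throws InterruptedException {
        int n = initial.length;
        int[] current = initial.clone(); // values visible during this superstep
        int[] next = new int[n];         // values computed for the next superstep
        // The barrier action runs once per superstep, after every worker has
        // arrived and before any is released: publish next as current.
        CyclicBarrier barrier =
            new CyclicBarrier(n, () -> System.arraycopy(next, 0, current, 0, n));

        Thread[] workers = new Thread[n];
        for (int w = 0; w < n; w++) {
            final int id = w;
            workers[w] = new Thread(() -> {
                try {
                    // Each worker is started once and loops over all supersteps,
                    // keeping its state in memory (no relaunch per iteration).
                    for (int step = 0; step < numSupersteps; step++) {
                        int left = current[(id - 1 + n) % n];
                        next[id] = Math.max(current[id], left);
                        barrier.await(); // end of superstep: wait for every worker
                    }
                } catch (InterruptedException | BrokenBarrierException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[w].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        return current;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] result = propagateMax(new int[] {3, 6, 2, 1}, 3);
        System.out.println(Arrays.toString(result)); // [6, 6, 6, 6]
    }
}
```

In the iterative-MapReduce alternative Avery contrasts against, each iteration would typically be a separate job that launches new tasks and re-reads its input; in this sketch the same threads and arrays are reused across all supersteps, which is the kind of saving he describes.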