[ 
https://issues.apache.org/jira/browse/GIRAPH-53?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13127702#comment-13127702
 ] 

Avery Ching commented on GIRAPH-53:
-----------------------------------

Also, I wonder if this is related to the counter issues; see 
https://issues.apache.org/jira/browse/GIRAPH-52. You can try disabling the 
superstep counters with the job option "-Dgiraph.useSuperstepCounters=false" 
and then see whether the problem still occurs.
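
For anyone unsure how that option reaches the job, here is a minimal, self-contained sketch of the idea: a "-Dkey=value" argument becomes a configuration entry that is later read as a boolean. Only the key name comes from the comment above. The class, the method names, and the assumed default of "true" are hypothetical and are not Giraph's or Hadoop's actual code. In a real job, Hadoop's generic option handling would normally do the parsing, assuming the job driver uses it.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: mimics how "-Dkey=value" job options become configuration entries. */
class SuperstepCounterOption {
    // Key taken from the comment above.
    static final String KEY = "giraph.useSuperstepCounters";
    // Assumption for this sketch: counters stay on unless explicitly disabled.
    static final boolean ASSUMED_DEFAULT = true;

    /** Collects every "-Dkey=value" argument into a map; other arguments are ignored. */
    static Map<String, String> parseGenericOptions(String[] args) {
        Map<String, String> conf = new HashMap<>();
        for (String arg : args) {
            if (arg.startsWith("-D")) {
                int eq = arg.indexOf('=');
                if (eq > 2) { // require a non-empty key between "-D" and "="
                    conf.put(arg.substring(2, eq), arg.substring(eq + 1));
                }
            }
        }
        return conf;
    }

    /** Reads the flag, falling back to the assumed default when it is not set. */
    static boolean useSuperstepCounters(Map<String, String> conf) {
        String value = conf.get(KEY);
        return value == null ? ASSUMED_DEFAULT : Boolean.parseBoolean(value.trim());
    }

    public static void main(String[] args) {
        Map<String, String> conf = parseGenericOptions(args);
        System.out.println(KEY + " = " + useSuperstepCounters(conf));
    }
}
```

Running this sketch with the argument "-Dgiraph.useSuperstepCounters=false" reports the flag as false; with no arguments it falls back to the assumed default.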
                
> Unable to read additional data from server session, likely server has closed 
> socket
> -----------------------------------------------------------------------------------
>
>                 Key: GIRAPH-53
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-53
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: locker
>
> I ran into an error recently. Everything goes well until the 103rd 
> superstep:
> 2011-10-14 16:23:38,904 INFO org.apache.giraph.comm.BasicRPCCommunications: 
> prepareSuperstep
> 2011-10-14 16:23:39,018 WARN org.apache.giraph.graph.BspService: process: 
> Unknown and unprocessed event 
> (path=/_hadoopBsp/job_201110141621_0001/_applicationAttemptsDir/0/_superstepDir/101/_vertexRangeAssignments,
>  type=NodeDeleted, state=SyncConnected)
> 2011-10-14 16:23:39,057 INFO org.apache.giraph.graph.BspServiceWorker: 
> registerHealth: Created my health node for attempt=0, superstep=103 with 
> /_hadoopBsp/job_201110141621_0001/_applicationAttemptsDir/0/_superstepDir/103/_workerHealthyDir/locker-desktop_1
>  and hostnamePort = ["locker-desktop",30001]
> 2011-10-14 16:23:39,057 WARN org.apache.giraph.graph.BspService: process: 
> Unknown and unprocessed event 
> (path=/_hadoopBsp/job_201110141621_0001/_applicationAttemptsDir/0/_superstepDir/101/_superstepFinished,
>  type=NodeDeleted, state=SyncConnected)
> 2011-10-14 16:23:39,529 INFO org.apache.zookeeper.ClientCnxn: Unable to read 
> additional data from server sessionid 0x1330186cff30001, likely server has 
> closed socket, closing socket connection and attempting reconnect
> 2011-10-14 16:23:39,630 ERROR org.apache.zookeeper.ClientCnxn: Error while 
> calling watcher 
> java.lang.RuntimeException: process: Disconnected from ZooKeeper, cannot 
> recover.
>       at org.apache.giraph.graph.BspService.process(BspService.java:995)
>       at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:488)
> 2011-10-14 16:23:41,098 INFO org.apache.zookeeper.ClientCnxn: Opening socket 
> connection to server locker-desktop/10.13.30.90:22181
> 2011-10-14 16:23:41,099 WARN org.apache.zookeeper.ClientCnxn: Session 
> 0x1330186cff30001 for server null, unexpected error, closing socket 
> connection and attempting reconnect
> java.net.ConnectException: Connection refused
>       at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>       at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>       at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
> 2011-10-14 16:23:41,212 INFO org.apache.hadoop.mapred.TaskLogsTruncater: 
> Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
> 2011-10-14 16:23:41,306 INFO org.apache.hadoop.io.nativeio.NativeIO: 
> Initialized cache for UID to User mapping with a cache timeout of 14400 
> seconds.
> 2011-10-14 16:23:41,307 INFO org.apache.hadoop.io.nativeio.NativeIO: Got 
> UserName dic for UID 1001 from the native implementation
> 2011-10-14 16:23:41,318 WARN org.apache.hadoop.mapred.Child: Error running 
> child
> java.lang.RuntimeException: 
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
> = ConnectionLoss for 
> /_hadoopBsp/job_201110141621_0001/_applicationAttemptsDir/0/_superstepDir/103/_vertexRangeAssignments
>       at 
> org.apache.giraph.graph.BspServiceWorker.startSuperstep(BspServiceWorker.java:836)
>       at org.apache.giraph.graph.GraphMapper.map(GraphMapper.java:551)
>       at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>       at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:396)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>       at org.apache.hadoop.mapred.Child.main(Child.java:253)
> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: 
> KeeperErrorCode = ConnectionLoss for 
> /_hadoopBsp/job_201110141621_0001/_applicationAttemptsDir/0/_superstepDir/103/_vertexRangeAssignments
>       at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
>       at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>       at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:809)
>       at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:837)
>       at 
> org.apache.giraph.graph.BspServiceWorker.startSuperstep(BspServiceWorker.java:830)
>       ... 9 more
> I don't know whether this should be called a bug or not. Any help would be 
> appreciated, thanks.

