[jira] [Commented] (GIRAPH-972) Race condition in checkpointing

Hudson (JIRA) Thu, 18 Dec 2014 16:17:28 -0800

    [ 
https://issues.apache.org/jira/browse/GIRAPH-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252615#comment-14252615
 ]


Hudson commented on GIRAPH-972:
-------------------------------

ABORTED: Integrated in Giraph-trunk-Commit #1507 (See 
[https://builds.apache.org/job/Giraph-trunk-Commit/1507/])
GIRAPH-972 Race condition in checkpointing (edunov: 
http://git-wip-us.apache.org/repos/asf?p=giraph.git&a=commit&h=7f2d58445e2353a1a42fbb4282ed5cad724186b5)
* giraph-core/src/main/java/org/apache/giraph/master/BspServiceMaster.java
* CHANGELOG
* giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java
* giraph-core/src/main/java/org/apache/giraph/bsp/BspService.java


> Race condition in checkpointing
> -------------------------------
>
>                 Key: GIRAPH-972
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-972
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Sergey Edunov
>
> A couple of issues were noticed with checkpointing of large jobs:
> 1) The task ID of the master appears to be important. In most cases it is 0, 
> but sometimes it is not, and since we cannot control it, checkpointing should 
> not depend on it.
> 2) A race condition occurs on the master when a worker dies:
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /_hadoopBsp/job_201411061513.38895_0001/_applicationAttemptsDir/0/_superstepDir/9/_workerHealthyDir/hadoop4921.prn2.facebook.com_3
>       at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
>       at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>       at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
>       at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1180)
>       at org.apache.giraph.zk.ZooKeeperExt.getData(ZooKeeperExt.java:470)
>       at org.apache.giraph.utils.WritableUtils.readFieldsFromZnode(WritableUtils.java:126)
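
The trace above is consistent with a list-then-read race: the master lists the healthy-worker znodes, a worker dies and its znode disappears, and the later getData call fails with NoNode. The sketch below illustrates one common way to tolerate that, by skipping nodes that vanish between the listing and the read. It is not the change committed for GIRAPH-972. FakeZk, NoNodeException, readHealthyWorkers and the betweenListAndRead hook are hypothetical stand-ins written for this example, so it runs without a ZooKeeper dependency; the paths are made up too.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class HealthyWorkerScan {
    /** Hypothetical stand-in for ZooKeeper's KeeperException.NoNodeException. */
    static class NoNodeException extends Exception {
        NoNodeException(String path) {
            super("KeeperErrorCode = NoNode for " + path);
        }
    }

    /** Hypothetical in-memory stand-in for a znode tree (full path -> data). */
    static class FakeZk {
        private final Map<String, byte[]> nodes = new ConcurrentHashMap<>();

        void create(String path, String data) {
            nodes.put(path, data.getBytes(StandardCharsets.UTF_8));
        }

        void delete(String path) {
            nodes.remove(path);
        }

        /** Returns full child paths, sorted (real ZooKeeper returns bare child names). */
        List<String> getChildren(String dir) {
            List<String> children = new ArrayList<>();
            for (String path : nodes.keySet()) {
                if (path.startsWith(dir + "/")) {
                    children.add(path);
                }
            }
            Collections.sort(children);
            return children;
        }

        byte[] getData(String path) throws NoNodeException {
            byte[] data = nodes.get(path);
            if (data == null) {
                throw new NoNodeException(path);
            }
            return data;
        }
    }

    /**
     * Lists the worker nodes under dir, then reads each one. A node that
     * disappears after the listing is treated as a dead worker and skipped
     * instead of failing the whole scan.
     */
    static List<String> readHealthyWorkers(FakeZk zk, String dir, Runnable betweenListAndRead) {
        List<String> workers = new ArrayList<>();
        List<String> children = zk.getChildren(dir);
        betweenListAndRead.run(); // test hook: simulates a worker dying mid-scan
        for (String child : children) {
            try {
                workers.add(new String(zk.getData(child), StandardCharsets.UTF_8));
            } catch (NoNodeException e) {
                // The worker died between getChildren and getData; skip it.
            }
        }
        return workers;
    }

    public static void main(String[] args) {
        FakeZk zk = new FakeZk();
        String dir = "/_workerHealthyDir";
        zk.create(dir + "/worker_1", "w1");
        zk.create(dir + "/worker_2", "w2");
        zk.create(dir + "/worker_3", "w3");
        // worker_2 dies after the listing but before its data is read.
        List<String> alive = readHealthyWorkers(zk, dir, () -> zk.delete(dir + "/worker_2"));
        System.out.println(alive); // prints [w1, w3]
    }
}
```

Whether skipping is the right reaction depends on the caller: a master that needs an exact worker count for the superstep may prefer to re-list and retry rather than silently drop the missing node.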



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
