Hi all,
I recently set up a Hama cluster on 2 machines.
The specification is as follows:
- 8 GB RAM
- 12 TB HDD
- (I don’t remember the CPU spec.)
To run Hama jobs, I set bsp.tasks.maximum=40 and
bsp.child.java.opts=-Xmx4096m in hama-site.xml. (The rest of the settings
are omitted here.)
I then ran the Pi Estimator and FastGraphGen examples, but I got the
errors below.
attempt_201507071627_0001_000023_0: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /bsp/job_201507071627_0001/peers/cluster-0:61029
attempt_201507071627_0001_000023_0: at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
attempt_201507071627_0001_000023_0: at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
attempt_201507071627_0001_000023_0: at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
attempt_201507071627_0001_000023_0: at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.sync.ZKSyncClient.isExists(ZKSyncClient.java:108)
attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.sync.ZKSyncClient.writeNode(ZKSyncClient.java:261)
attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.registerTask(ZooKeeperSyncClientImpl.java:279)
attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.register(ZooKeeperSyncClientImpl.java:261)
attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.BSPPeerImpl.initializeSyncService(BSPPeerImpl.java:305)
attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.BSPPeerImpl.<init>(BSPPeerImpl.java:185)
attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1251)
attempt_201507071627_0001_000023_0: 15/07/07 16:27:40 ERROR sync.ZKSyncClient: Error creating zk path /bsp/job_201507071627_0001/peers/cluster-0:61029
attempt_201507071627_0001_000023_0: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /bsp
attempt_201507071627_0001_000023_0: at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
attempt_201507071627_0001_000023_0: at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
attempt_201507071627_0001_000023_0: at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
attempt_201507071627_0001_000023_0: at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.sync.ZKSyncClient.createZnode(ZKSyncClient.java:135)
attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.sync.ZKSyncClient.writeNode(ZKSyncClient.java:281)
attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.registerTask(ZooKeeperSyncClientImpl.java:279)
attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.register(ZooKeeperSyncClientImpl.java:261)
attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.BSPPeerImpl.initializeSyncService(BSPPeerImpl.java:305)
attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.BSPPeerImpl.<init>(BSPPeerImpl.java:185)
attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1251)
attempt_201507071627_0001_000023_0: 15/07/07 16:27:42 ERROR sync.ZKSyncClient: Error checking zk path /bsp/job_201507071627_0001/sync/-1
attempt_201507071627_0001_000023_0: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /bsp/job_201507071627_0001/sync/-1
attempt_201507071627_0001_000023_0: at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
attempt_201507071627_0001_000023_0: at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
attempt_201507071627_0001_000023_0: at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
attempt_201507071627_0001_000023_0: at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.sync.ZKSyncClient.isExists(ZKSyncClient.java:108)
attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.sync.ZKSyncClient.writeNode(ZKSyncClient.java:261)
attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.enterBarrier(ZooKeeperSyncClientImpl.java:100)
attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.BSPPeerImpl.doFirstSync(BSPPeerImpl.java:312)
attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.BSPPeerImpl.<init>(BSPPeerImpl.java:238)
attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1251)
attempt_201507071627_0001_000023_0: 15/07/07 16:27:44 ERROR sync.ZKSyncClient: Error creating zk path /bsp/job_201507071627_0001/sync/-1
attempt_201507071627_0001_000023_0: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /bsp
attempt_201507071627_0001_000023_0: at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
attempt_201507071627_0001_000023_0: at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
attempt_201507071627_0001_000023_0: at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
attempt_201507071627_0001_000023_0: at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.sync.ZKSyncClient.createZnode(ZKSyncClient.java:135)
attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.sync.ZKSyncClient.writeNode(ZKSyncClient.java:281)
attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.enterBarrier(ZooKeeperSyncClientImpl.java:100)
attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.BSPPeerImpl.doFirstSync(BSPPeerImpl.java:312)
attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.BSPPeerImpl.<init>(BSPPeerImpl.java:238)
attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1251)
attempt_201507071627_0001_000023_0: 15/07/07 16:27:46 FATAL bsp.GroomServer: SyncError from child
attempt_201507071627_0001_000023_0: org.apache.hama.bsp.sync.SyncException
attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.enterBarrier(ZooKeeperSyncClientImpl.java:138)
attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.BSPPeerImpl.doFirstSync(BSPPeerImpl.java:312)
attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.BSPPeerImpl.<init>(BSPPeerImpl.java:238)
attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1251)
15/07/07 16:27:48 INFO bsp.BSPJobClient: Job failed.
This is a ZooKeeper error: the Hama tasks try to access the /bsp node in
ZooKeeper and fail.
The cause is that hama.zookeeper.property.maxClientCnxns is set to 30 in
hama-default.xml.
The problem occurs when the maximum number of tasks is larger than this
value.
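
For anyone who wants to check their own setup, here is a rough sketch of
the comparison described above. It is not part of Hama; the helper names
and the inline XML are made up for illustration, and it assumes that each
task on a host holds one ZooKeeper connection, so that bsp.tasks.maximum
can be compared directly with maxClientCnxns.

```python
import xml.etree.ElementTree as ET

# Hypothetical hama-site.xml content, using the value quoted in this mail.
SITE_XML = """
<configuration>
  <property>
    <name>bsp.tasks.maximum</name>
    <value>40</value>
  </property>
</configuration>
"""

# Default for hama.zookeeper.property.maxClientCnxns as quoted in this mail.
DEFAULT_MAX_CLIENT_CNXNS = 30


def read_properties(xml_text):
    """Return a {name: value} dict built from <property> elements."""
    root = ET.fromstring(xml_text.strip())
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def connection_shortfall(props, default_cnxns=DEFAULT_MAX_CLIENT_CNXNS):
    """Return how many tasks exceed the connection limit (0 if none).

    Assumption: one ZooKeeper connection per task on the same host.
    """
    tasks = int(props["bsp.tasks.maximum"])
    limit = int(props.get("hama.zookeeper.property.maxClientCnxns",
                          default_cnxns))
    return max(0, tasks - limit)


props = read_properties(SITE_XML)
shortfall = connection_shortfall(props)
print(shortfall)  # 10, i.e. 40 tasks against a limit of 30
```

With the settings from this mail the shortfall is 10; once
hama.zookeeper.property.maxClientCnxns is overridden with a value of at
least 40, it drops to 0.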
To solve the problem, Hama has a setting that increases the number of
connections allowed to ZooKeeper:
<property>
<name>hama.zookeeper.property.maxClientCnxns</name>
<value>100</value>
</property>
So I think we should raise the default number of connections to more than
100, since server hardware has become much more capable than before.
If you agree, I will change the default value to 300.
Best regards,
Minho Kim