[ https://issues.apache.org/jira/browse/HBASE-4045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jean-Daniel Cryans updated HBASE-4045:
--------------------------------------

    Attachment: HBASE-4045.patch

Easy fix, instead of returning null in fetchSlavesAddresses I'll return an empty list.

> [replication] NPE in ReplicationSource when ZK is gone
> ------------------------------------------------------
>
>                 Key: HBASE-4045
>                 URL: https://issues.apache.org/jira/browse/HBASE-4045
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>            Priority: Minor
>             Fix For: 0.90.4
>
>         Attachments: HBASE-4045.patch
>
>
> We got this in production, it killed the replication thread but the server itself was fine and the master kept the logs:
> {quote}
> 2011-06-26 16:02:56,092 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 26667ms for sessionid 0x22f9dcb30ab01b8, closing socket connection and attempting reconnect
> 2011-06-26 16:02:56,213 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: connection to cluster: 5-0x22f9dcb30ab01b8-0x22f9dcb30ab01b8 Received ZooKeeper Event, type=None, state=Disconnected, path=null
> 2011-06-26 16:02:56,213 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: connection to cluster: 5-0x22f9dcb30ab01b8-0x22f9dcb30ab01b8 Received Disconnected from ZooKeeper, ignoring
> 2011-06-26 16:02:56,213 WARN org.apache.hadoop.hbase.replication.ReplicationZookeeper: Cannot get peer's region server addresses
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/rs
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>     at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1243)
>     at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenNoWatch(ZKUtil.java:389)
>     at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndGetAsAddresses(ZKUtil.java:355)
>     at org.apache.hadoop.hbase.replication.ReplicationZookeeper.fetchSlavesAddresses(ReplicationZookeeper.java:228)
>     at org.apache.hadoop.hbase.replication.ReplicationZookeeper.getSlavesAddresses(ReplicationZookeeper.java:216)
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.chooseSinks(ReplicationSource.java:205)
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:588)
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:341)
> 2011-06-26 16:02:56,222 ERROR org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Closing source 5 because an error occurred: Uncaught exception during runtime
> java.lang.Exception: java.lang.NullPointerException
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$1.uncaughtException(ReplicationSource.java:628)
>     at java.lang.Thread.dispatchUncaughtException(Thread.java:1874)
> Caused by: java.lang.NullPointerException
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.chooseSinks(ReplicationSource.java:208)
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:588)
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:341)
> {quote}
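
For illustration only, the fix described in the comment (return an empty list rather than null when the ZooKeeper lookup fails) could look roughly like the sketch below. This is not the content of HBASE-4045.patch. The PeerAddressSource interface, the use of String addresses, the maxSinks parameter, and IOException as a stand-in for the ZooKeeper KeeperException are all assumptions made to keep the example self-contained; only the method names fetchSlavesAddresses and chooseSinks come from the stack trace.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ReplicationSinkSketch {

    /** Hypothetical stand-in for the ZooKeeper lookup of a peer's region servers. */
    interface PeerAddressSource {
        List<String> listRegionServers() throws IOException;
    }

    /**
     * Sketch of the proposed behaviour: when the lookup fails, log a warning
     * and return an empty list instead of null, so callers never dereference null.
     */
    static List<String> fetchSlavesAddresses(PeerAddressSource source) {
        try {
            List<String> addresses = source.listRegionServers();
            return addresses == null ? Collections.<String>emptyList() : addresses;
        } catch (IOException e) {
            // IOException is an assumption; the real code catches a ZooKeeper exception.
            System.err.println("Cannot get peer's region server addresses: " + e.getMessage());
            return Collections.emptyList();
        }
    }

    /** Simplified caller: picks up to maxSinks addresses and tolerates an empty result. */
    static List<String> chooseSinks(PeerAddressSource source, int maxSinks) {
        List<String> candidates = fetchSlavesAddresses(source);
        List<String> sinks = new ArrayList<>();
        for (int i = 0; i < candidates.size() && i < maxSinks; i++) {
            sinks.add(candidates.get(i));
        }
        return sinks;
    }

    public static void main(String[] args) {
        // Simulate ZooKeeper being unreachable.
        PeerAddressSource zkGone = () -> {
            throw new IOException("ConnectionLoss for /hbase/rs");
        };
        List<String> sinks = chooseSinks(zkGone, 3);
        System.out.println("sinks chosen while ZK is gone: " + sinks.size());
    }
}
```

With this shape, a caller that sees an empty sink list can wait and retry later instead of dying on a NullPointerException. Whether the real ReplicationSource retries in that case is not stated in this message.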