[
https://issues.apache.org/jira/browse/HBASE-25627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bharath Vissapragada resolved HBASE-25627.
------------------------------------------
Resolution: Fixed
Thanks [~sandeep.pal]
> HBase replication should have a metric to represent if the source is stuck
> getting initialized
> ----------------------------------------------------------------------------------------------
>
> Key: HBASE-25627
> URL: https://issues.apache.org/jira/browse/HBASE-25627
> Project: HBase
> Issue Type: Improvement
> Components: Replication
> Affects Versions: 3.0.0-alpha-1, 1.7.0, 2.5.0, 2.3.5, 2.4.3
> Reporter: Sandeep Pal
> Assignee: Sandeep Pal
> Priority: Major
> Fix For: 3.0.0-alpha-1, 1.7.0, 2.5.0, 2.4.3
>
>
> There can be situations where the cluster is not able to talk to the peer
> cluster's ZK. In that case the logQueue will keep accumulating, but without
> digging into the logs we cannot tell why the logQueue is accumulating on
> the source.
> Since the replication source doesn't even start the shipper in this case, it
> would be good to have a dedicated metric that shows when the RS cannot talk
> to the peer's ZK at all.
>
> {code:java}
> 2021-03-03 04:02:10,704 DEBUG [peerId] zookeeper.RecoverableZooKeeper - Possibly transient ZooKeeper, quorum=zookeeper-0.zookeeper-a.fakeAddress:2181,zookeeper-1.zookeeper-a.fakeAddress:2181,zookeeper-2.zookeeper-a.fakeAddress:2181,zookeeper-3.zookeeper-a.fakeAddress:2181,zookeeper-4.zookeeper-a.fakeAddress:2181, exception=org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /hbase/hbaseid
> 2021-03-03 04:02:10,704 DEBUG [peerId] zookeeper.RecoverableZooKeeper - Possibly transient ZooKeeper, quorum=zookeeper-0.zookeeper-a.fakeAddress:2181,zookeeper-1.zookeeper-a.fakeAddress:2181,zookeeper-2.zookeeper-a.fakeAddress:2181,zookeeper-3.zookeeper-a.fakeAddress:2181,zookeeper-4.zookeeper-a.fakeAddress:2181, exception=org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /hbase/hbaseid
> org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /hbase/hbaseid
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:126)
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
>   at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1119)
>   at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:284)
>   at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:469)
>   at org.apache.hadoop.hbase.zookeeper.ZKClusterId.readClusterIdZNode(ZKClusterId.java:65)
>   at org.apache.hadoop.hbase.zookeeper.ZKClusterId.getUUIDForCluster(ZKClusterId.java:96)
>   at org.apache.hadoop.hbase.replication.HBaseReplicationEndpoint.getPeerUUID(HBaseReplicationEndpoint.java:104)
>   at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:306)
> {code}
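
For readers of the archive, here is a rough sketch of the kind of metric the description asks for. It is not the committed patch: InitializingSourceGauge, SourceInitializer, peerReachable and maxAttempts are all hypothetical names made up for illustration, and the real change presumably hooks into the existing replication source metrics rather than a standalone class. The idea is only that a gauge goes up when a source starts initializing and comes back down once the peer has been reached, so a value that stays above zero points at a source stuck before the shipper starts.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

/** Hypothetical gauge counting sources that have not finished initializing. */
class InitializingSourceGauge {
  private final AtomicInteger initializing = new AtomicInteger();

  void markInitializing() {
    initializing.incrementAndGet();
  }

  void markInitialized() {
    initializing.decrementAndGet();
  }

  int value() {
    return initializing.get();
  }
}

/** Hypothetical stand-in for the initialization loop of a replication source. */
class SourceInitializer {
  /**
   * Tries to reach the peer up to maxAttempts times. peerReachable is an
   * assumed callback standing in for the peer ZK lookup (the getPeerUUID call
   * in the stack trace above). Returns true once the peer answers.
   */
  static boolean initialize(InitializingSourceGauge gauge,
                            BooleanSupplier peerReachable,
                            int maxAttempts) {
    gauge.markInitializing();
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      if (peerReachable.getAsBoolean()) {
        // Peer answered: the source is no longer initializing.
        gauge.markInitialized();
        return true;
      }
    }
    // Peer never answered: the gauge is left raised so the stuck source
    // is visible without reading the logs.
    return false;
  }
}
```

With something along these lines, an alert on the gauge staying non-zero for longer than the expected startup time would flag the AuthFailed case above directly, instead of having to infer it from a growing logQueue.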
--
This message was sent by Atlassian Jira
(v8.3.4#803005)