[jira] [Resolved] (HBASE-25627) HBase replication should have a metric to represent if the source is stuck getting initialized

Bharath Vissapragada (Jira) Mon, 22 Mar 2021 22:58:10 -0700


     [ https://issues.apache.org/jira/browse/HBASE-25627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Bharath Vissapragada resolved HBASE-25627.
------------------------------------------
    Resolution: Fixed

Thanks [~sandeep.pal]

> HBase replication should have a metric to represent if the source is stuck 
> getting initialized
> ----------------------------------------------------------------------------------------------
>
>                 Key: HBASE-25627
>                 URL: https://issues.apache.org/jira/browse/HBASE-25627
>             Project: HBase
>          Issue Type: Improvement
>          Components: Replication
>    Affects Versions: 3.0.0-alpha-1, 1.7.0, 2.5.0, 2.3.5, 2.4.3
>            Reporter: Sandeep Pal
>            Assignee: Sandeep Pal
>            Priority: Major
>             Fix For: 3.0.0-alpha-1, 1.7.0, 2.5.0, 2.4.3
>
>
> There can be situations where the cluster cannot talk to the peer cluster's 
> ZK. In that case the logQueue keeps accumulating, but without digging into 
> the logs we cannot tell why the logQueue is growing on the source. 
> Since the replication source does not even start the shipper in this case, it 
> would be good to have a dedicated metric indicating that the RS cannot talk to 
> the peer's ZK at all. 
>  
> {code:java}
> 2021-03-03 04:02:10,704 DEBUG [peerId] zookeeper.RecoverableZooKeeper - Possibly transient ZooKeeper, quorum=zookeeper-0.zookeeper-a.fakeAddress:2181,zookeeper-1.zookeeper-a.fakeAddress:2181,zookeeper-2.zookeeper-a.fakeAddress:2181,zookeeper-3.zookeeper-a.fakeAddress:2181,zookeeper-4.zookeeper-a.fakeAddress:2181, exception=org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /hbase/hbaseid
> 2021-03-03 04:02:10,704 DEBUG [peerId] zookeeper.RecoverableZooKeeper - Possibly transient ZooKeeper, quorum=zookeeper-0.zookeeper-a.fakeAddress:2181,zookeeper-1.zookeeper-a.fakeAddress:2181,zookeeper-2.zookeeper-a.fakeAddress:2181,zookeeper-3.zookeeper-a.fakeAddress:2181,zookeeper-4.zookeeper-a.fakeAddress:2181, exception=org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /hbase/hbaseid
> org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /hbase/hbaseid
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:126)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
>     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1119)
>     at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:284)
>     at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:469)
>     at org.apache.hadoop.hbase.zookeeper.ZKClusterId.readClusterIdZNode(ZKClusterId.java:65)
>     at org.apache.hadoop.hbase.zookeeper.ZKClusterId.getUUIDForCluster(ZKClusterId.java:96)
>     at org.apache.hadoop.hbase.replication.HBaseReplicationEndpoint.getPeerUUID(HBaseReplicationEndpoint.java:104)
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:306)
> {code}
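
The proposed metric can be pictured as a gauge that a source increments when it enters its initialization step (the `getPeerUUID` call in the trace above) and decrements once that step is over. A non-zero value then flags a source that is still trying to reach the peer's ZooKeeper, without anyone reading the logs. This is a minimal sketch under stated assumptions: `SourceInitGauge`, `initialize` and the bounded retry count are hypothetical names chosen for illustration, not the metric or API that the fix actually added to HBase.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical sketch: names are illustrative, not HBase's actual metrics API.
class SourceInitGauge {
  // Number of replication sources currently inside their initialization step.
  private final AtomicInteger initializing = new AtomicInteger();

  // Value a metrics system would export as the gauge.
  int get() {
    return initializing.get();
  }

  /**
   * Runs an initialization attempt (e.g. fetching the peer cluster UUID) up to
   * maxAttempts times. The gauge stays incremented for as long as the source is
   * retrying and is decremented once it succeeds or gives up.
   */
  boolean initialize(BooleanSupplier attempt, int maxAttempts) {
    initializing.incrementAndGet();
    try {
      for (int i = 0; i < maxAttempts; i++) {
        if (attempt.getAsBoolean()) {
          return true;
        }
      }
      return false;
    } finally {
      initializing.decrementAndGet();
    }
  }
}
```

Using `finally` keeps the gauge accurate even if an attempt throws; a source that retries indefinitely (as in the AuthFailed case above) would simply keep the gauge elevated.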



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
