[ 
https://issues.apache.org/jira/browse/HBASE-10335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13870586#comment-13870586
 ] 

Liang Xie commented on HBASE-10335:
-----------------------------------

I raised it to "Blocker" because, when we hit this issue, the only solution 
currently available to us seems to be restarting the whole cluster.
If anyone thinks that is not suitable, feel free to reset it to "Major" :)

Nit for the patch: adding an isDebugEnabled check would be better.
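
To illustrate the nit, here is a minimal sketch of the guard I have in mind. It is not taken from the patch: the Log interface, RecordingLog, describePeer and logReconnect are hypothetical stand-ins (only the isDebugEnabled/debug shape follows the usual logging API), so treat it as an assumption about the pattern rather than the actual code.

```java
import java.util.ArrayList;
import java.util.List;

final class DebugGuardSketch {
    /** Hypothetical stand-in for a logging interface; not HBase code. */
    interface Log {
        boolean isDebugEnabled();
        void debug(String message);
    }

    /** Records messages so the effect of the guard can be observed. */
    static final class RecordingLog implements Log {
        final boolean debugEnabled;
        final List<String> messages = new ArrayList<>();

        RecordingLog(boolean debugEnabled) {
            this.debugEnabled = debugEnabled;
        }

        @Override
        public boolean isDebugEnabled() {
            return debugEnabled;
        }

        @Override
        public void debug(String message) {
            if (debugEnabled) {
                messages.add(message);
            }
        }
    }

    /** Counts how often the (hypothetically expensive) message is built. */
    static int describeCalls = 0;

    static String describePeer(String peerId) {
        describeCalls++;
        return "peer " + peerId;
    }

    /** The message is only built when debug logging is switched on. */
    static void logReconnect(Log log, String peerId) {
        if (log.isDebugEnabled()) {
            log.debug("Reconnecting to " + describePeer(peerId));
        }
    }

    public static void main(String[] args) {
        RecordingLog off = new RecordingLog(false);
        logReconnect(off, "20");
        System.out.println("message built with debug off: " + (describeCalls > 0));
    }
}
```

The point of the guard is only to skip building the message string when debug logging is off; it does not change what is logged when debug is on.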

> AuthFailedException in zookeeper may block replication forever
> --------------------------------------------------------------
>
>                 Key: HBASE-10335
>                 URL: https://issues.apache.org/jira/browse/HBASE-10335
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication, security
>    Affects Versions: 0.94.15, 0.99.0
>            Reporter: Liu Shaohui
>            Assignee: Liu Shaohui
>            Priority: Blocker
>         Attachments: HBASE-10335-v1.diff
>
>
> ReplicationSource will rechoose sinks when it encounters exceptions while 
> shipping edits to the current sink. But if the ZooKeeper client for the peer 
> cluster goes into the AUTH_FAILED state, the ReplicationSource will always get 
> AuthFailedException. The ReplicationSource does not reconnect to the peer, 
> because reconnectPeer only handles ConnectionLossException and 
> SessionExpiredException. As a result, replication prints the following log: 
> {quote}
> 2014-01-14,12:07:06,892 INFO 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Getting 0 
> rs from peer cluster # 20
> 2014-01-14,12:07:06,892 INFO 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Slave 
> cluster looks down: 20 has 0 region servers
> {quote}
> and is blocked forever.
> I think other places may have the same problem of not handling 
> AuthFailedException in ZooKeeper, e.g. HBASE-8675.
> [~apurtell]
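
For readers following the quoted description, the reconnect decision can be sketched roughly as below. This is a simplified model and not the HBase source or the attached patch: the ZkFailure enum and both method names are hypothetical, and the enum merely mirrors the three ZooKeeper exceptions named in the description.

```java
final class ReconnectSketch {
    /** Hypothetical mirror of the ZooKeeper failures named in the description. */
    enum ZkFailure { CONNECTION_LOSS, SESSION_EXPIRED, AUTH_FAILED, OTHER }

    /** Behaviour as described: only these two failures trigger a reconnect. */
    static boolean reconnectsToday(ZkFailure failure) {
        return failure == ZkFailure.CONNECTION_LOSS
            || failure == ZkFailure.SESSION_EXPIRED;
    }

    /** One possible fix (an assumption): also reconnect on AUTH_FAILED. */
    static boolean reconnectsWithAuthHandling(ZkFailure failure) {
        return reconnectsToday(failure) || failure == ZkFailure.AUTH_FAILED;
    }

    public static void main(String[] args) {
        for (ZkFailure failure : ZkFailure.values()) {
            System.out.println(failure
                + " today=" + reconnectsToday(failure)
                + " withAuthHandling=" + reconnectsWithAuthHandling(failure));
        }
    }
}
```

In this model, a client stuck in AUTH_FAILED never gets a fresh connection under the first predicate, which matches the "blocked forever" behaviour reported above.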



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
