[ 
https://issues.apache.org/jira/browse/HBASE-24585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140153#comment-17140153
 ] 

Michael Stack commented on HBASE-24585:
---------------------------------------

I think we are doing the right thing after all: the exception does NOT bubble 
up and kill the RS. The abort happens earlier, around a failed RPC getting the 
table descriptor. Will be back.

> If RSProcedureHandler throws exception, it aborts the hosting RS
> ----------------------------------------------------------------
>
>                 Key: HBASE-24585
>                 URL: https://issues.apache.org/jira/browse/HBASE-24585
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Michael Stack
>            Priority: Major
>
> HBASE-24574 proc v2 distributed log splitting is enabled. A remote split 
> failed because it was interrupted. The InterruptedException became an IOE and 
> then bubbled up and out of the RSProcedureHandler (RSPH) below, causing an RS 
> abort.
> {code}
>  2020-06-17 21:20:37,472 ERROR [RS_LOG_REPLAY_OPS-regionserver/localhost:16020-0] handler.RSProcedureHandler: Error when call RSProcedureCallable:
>  java.io.IOException: Failed WAL split, status=RESIGNED, wal=file:/Users/stack/checkouts/hbase.apache.git/tmp/hbase/WALs/localhost,16020,1592440848604-splitting/localhost%2C16020%2C1592440848604.meta.1592440852959.meta
>    at org.apache.hadoop.hbase.regionserver.SplitWALCallable.splitWal(SplitWALCallable.java:106)
>    at org.apache.hadoop.hbase.regionserver.SplitWALCallable.call(SplitWALCallable.java:86)
>    at org.apache.hadoop.hbase.regionserver.SplitWALCallable.call(SplitWALCallable.java:49)
>    at org.apache.hadoop.hbase.regionserver.handler.RSProcedureHandler.process(RSProcedureHandler.java:49)
>    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>    at java.lang.Thread.run(Thread.java:748)
> {code}
> Does the remote-procedure framework need to be more resilient? Log the 
> exception, unless it is an Error, and keep going? Otherwise, features like 
> procedure v2 distributed log splitting are brittle.
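
The "log it and keep going" idea raised in the description could be sketched
roughly as below. This is only an illustration in plain Java, not the HBase
implementation: the class ResilientHandlerSketch, the Outcome enum, and the
runGuarded method are hypothetical names, and no HBase classes are used. The
sketch assumes the handler should turn any Exception (including an interrupt)
into a reported failure, while letting java.lang.Error propagate.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical sketch; none of these names exist in HBase.
public class ResilientHandlerSketch {

  /** Result reported to the caller instead of letting the exception escape. */
  public enum Outcome { SUCCEEDED, FAILED }

  /**
   * Runs the callable. Any Exception is logged and converted to FAILED so the
   * hosting process keeps running. Errors (e.g. OutOfMemoryError) are not
   * caught here and still propagate to the caller.
   */
  public static Outcome runGuarded(Callable<Void> callable) {
    try {
      callable.call();
      return Outcome.SUCCEEDED;
    } catch (Exception e) {
      if (e instanceof InterruptedException) {
        // Restore the interrupt flag so code further up can still see it.
        Thread.currentThread().interrupt();
      }
      System.err.println("Remote procedure failed, reporting failure: " + e);
      return Outcome.FAILED;
    }
  }

  public static void main(String[] args) {
    Outcome ok = runGuarded(() -> null);
    Outcome failed = runGuarded(() -> {
      throw new IOException("Failed WAL split, status=RESIGNED");
    });
    System.out.println(ok + " " + failed);
  }
}
```

Catching Exception rather than Throwable is the design choice that matches the
"unless it is an Error" wording: a failed split becomes a result the caller
can retry, while a genuinely fatal condition is left alone.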



