[ 
https://issues.apache.org/jira/browse/HBASE-24585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140087#comment-17140087
 ] 

Guanghao Zhang commented on HBASE-24585:
----------------------------------------

{quote}I think the design here is that the task should not throw any exceptions, as 
it should just report the error to master. If we do meet an exception, the 
only safe way is to abort.
{quote}
Do we have any doc that explains this design? If not, we need to add a clear 
explanation of it to prevent misuse in the future.
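
To make the contract in the quote concrete, here is a minimal sketch of a handler that never lets a task's exception escape and instead reports the failure. It is only an illustration under stated assumptions: `ReportingHandlerSketch`, `ResultReporter`, and `process` are hypothetical names, not HBase classes, and the real RSProcedureHandler may be structured differently.

```java
import java.util.concurrent.Callable;

// Hypothetical sketch; none of these names are actual HBase classes.
final class ReportingHandlerSketch {

  /** Hypothetical stand-in for "report the result to master". */
  interface ResultReporter {
    // error == null means the task succeeded.
    void report(long procId, Throwable error);
  }

  /** Runs the task; any Exception is captured and reported, not rethrown. */
  static void process(long procId, Callable<Void> task, ResultReporter reporter) {
    Throwable error = null;
    try {
      task.call();
    } catch (InterruptedException e) {
      // Restore the interrupt flag so the executor can still observe it.
      Thread.currentThread().interrupt();
      error = e;
    } catch (Exception e) {
      // Covers IOException and friends; java.lang.Error still propagates.
      error = e;
    }
    reporter.report(procId, error);
  }
}
```

In this sketch only a `java.lang.Error` would still reach the executor, so an abort would be reserved for cases the task cannot report itself.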

> If RSProcedureHandler throws exception, it aborts the hosting RS
> ----------------------------------------------------------------
>
>                 Key: HBASE-24585
>                 URL: https://issues.apache.org/jira/browse/HBASE-24585
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Michael Stack
>            Priority: Major
>
> HBASE-24574 proc v2 distributed log splitting is enabled. A remote split 
> fails because it was interrupted. The InterruptedException became an IOE and 
> then bubbled up and out of the RSPH below, causing an RS abort.
> {code}
>  2020-06-17 21:20:37,472 ERROR [RS_LOG_REPLAY_OPS-regionserver/localhost:16020-0] handler.RSProcedureHandler: Error when call RSProcedureCallable:
>  java.io.IOException: Failed WAL split, status=RESIGNED, wal=file:/Users/stack/checkouts/hbase.apache.git/tmp/hbase/WALs/localhost,16020,1592440848604-splitting/localhost%2C16020%2C1592440848604.meta.1592440852959.meta
>    at org.apache.hadoop.hbase.regionserver.SplitWALCallable.splitWal(SplitWALCallable.java:106)
>    at org.apache.hadoop.hbase.regionserver.SplitWALCallable.call(SplitWALCallable.java:86)
>    at org.apache.hadoop.hbase.regionserver.SplitWALCallable.call(SplitWALCallable.java:49)
>    at org.apache.hadoop.hbase.regionserver.handler.RSProcedureHandler.process(RSProcedureHandler.java:49)
>    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>    at java.lang.Thread.run(Thread.java:748)
> {code}
> Should the remote-procedure framework be more resilient? Log the exception, 
> unless it is an ERROR, and keep going? Otherwise, features like procedure v2 
> distributed log splitting are brittle.



