Nirdosh Kumar Yadav created HBASE-29041:
-------------------------------------------

             Summary: Set UncaughtException Handler for RegionServer ExecutorService
                 Key: HBASE-29041
                 URL: https://issues.apache.org/jira/browse/HBASE-29041
             Project: HBase
          Issue Type: Bug
          Components: regionserver
    Affects Versions: 2.5.10, 2.6.1, 3.0.0
            Reporter: Nirdosh Kumar Yadav


In our HBase cluster we encountered a scenario where a server crash procedure
(SCP) for a RegionServer waited for more than 3 hours. The incident was
triggered by temporary network unavailability in the HBase cluster. On
debugging, we found that the SCP was stuck on its child {{SplitWALProcedure}},
which was waiting for a RegionServer worker to complete the corresponding
SplitWALRemote procedure. While running, the SplitWALRemote procedure
encountered an unknown exception: the logs show a
{{hdfs.DataStreamer - No ack received}} error while the RegionServer was
connecting to a DataNode. After this error the worker thread either hung or
died, as no further related logs exist. Inconsistent regions were reported
during this period. All procedures were restarted and completed after the
active HMaster service was bounced.
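The summary proposes setting an uncaught exception handler on the RegionServer executor threads, so that a worker dying on an unexpected exception leaves a log entry instead of vanishing silently. The sketch below is a minimal, hypothetical illustration using only `java.util.concurrent` and `java.util.logging`; the class name `LoggingThreadFactory`, the thread-name prefix, and the `runDemo` helper are assumptions for illustration, not HBase's actual `ExecutorService` code.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * Hypothetical sketch (not HBase code): a ThreadFactory that names worker threads
 * and installs an UncaughtExceptionHandler, so a worker killed by an unexpected
 * Throwable is logged instead of disappearing without a trace.
 */
public class LoggingThreadFactory implements ThreadFactory {
    private static final Logger LOG = Logger.getLogger(LoggingThreadFactory.class.getName());

    private final String prefix;
    private final AtomicInteger threadIds = new AtomicInteger();
    private final AtomicInteger uncaught = new AtomicInteger();
    // Remember created threads so the demo can wait for them to fully exit.
    private final List<Thread> threads = new CopyOnWriteArrayList<>();

    public LoggingThreadFactory(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public Thread newThread(Runnable r) {
        Thread t = new Thread(r, prefix + "-" + threadIds.incrementAndGet());
        t.setDaemon(true);
        // Runs on the dying thread itself, before that thread terminates.
        t.setUncaughtExceptionHandler((thread, e) -> {
            uncaught.incrementAndGet();
            LOG.log(Level.SEVERE, "Worker " + thread.getName() + " died with an uncaught exception", e);
        });
        threads.add(t);
        return t;
    }

    public int uncaughtCount() {
        return uncaught.get();
    }

    /** Waits until every thread this factory created has terminated. */
    public void joinAll() throws InterruptedException {
        for (Thread t : threads) {
            t.join();
        }
    }

    /** Runs one failing task and returns how many uncaught failures were logged. */
    static int runDemo() {
        LoggingThreadFactory factory = new LoggingThreadFactory("demo-worker");
        ExecutorService pool = Executors.newFixedThreadPool(1, factory);
        // execute() lets the exception escape the worker, so the handler fires.
        pool.execute(() -> { throw new IllegalStateException("simulated failure"); });
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
            // The pool can report termination before the handler has run, so join the threads.
            factory.joinAll();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return factory.uncaughtCount();
    }

    public static void main(String[] args) {
        System.out.println("uncaught failures: " + runDemo());
    }
}
```

One caveat on this design: the handler only fires for tasks passed to `execute()`. With `submit()`, the exception is captured in the returned `Future` and the thread survives, so a task wrapper that catches and logs `Throwable`, or an override of `ThreadPoolExecutor.afterExecute(Runnable, Throwable)` that unwraps the `Future`, would be needed instead.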

Related logs:

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
