[https://issues.apache.org/jira/browse/HBASE-29041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
Nirdosh Kumar Yadav updated HBASE-29041:
----------------------------------------
Description:
In an HBase cluster we encountered a scenario where a RegionServer server crash
procedure (SCP) waited for more than 3 hours. The incident was triggered by
temporary network unavailability in the HBase cluster. On debugging, we found
that the SCP was stuck on its child {{SplitWALProcedure}}, which was waiting for
the RegionServer worker to complete the {{SplitWALRemoteProcedure}}. While
running, the {{SplitWALRemoteProcedure}} encountered an unknown exception. In
the logs we can see an *"hdfs.DataStreamer - No ack received"* error while the
RegionServer was connecting to a DataNode. No related logs exist after this
error, so the thread was either stuck or had died. Inconsistent regions were
reported during this period. All procedures were restarted and completed after
the active HMaster service was bounced.
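The title of this issue proposes setting an UncaughtException handler on the RegionServer's ExecutorService threads, so that a worker that dies on an unexpected exception (as suspected after the DataStreamer error above) leaves a log entry. The sketch below is a minimal, hedged illustration in plain java.util.concurrent, not HBase's actual executor wiring; the class UncaughtHandlerSketch, the namedFactory helper and the split-log-demo thread prefix are hypothetical. It installs the handler through a ThreadFactory and shows that it fires for a task passed to execute().

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

public class UncaughtHandlerSketch {
  private static final Logger LOG = Logger.getLogger(UncaughtHandlerSketch.class.getName());

  // Hypothetical helper, not an HBase API: names each worker thread and installs
  // the handler, so a thread killed by an uncaught Throwable leaves a log entry.
  static ThreadFactory namedFactory(String prefix, Thread.UncaughtExceptionHandler handler) {
    AtomicInteger seq = new AtomicInteger();
    return runnable -> {
      Thread t = new Thread(runnable, prefix + "-" + seq.getAndIncrement());
      t.setDaemon(true);
      t.setUncaughtExceptionHandler(handler);
      return t;
    };
  }

  // Runs one failing task via execute() and returns the names of the threads
  // whose uncaught exceptions reached the handler.
  static List<String> runFailingTask() throws InterruptedException {
    List<String> deadThreads = new CopyOnWriteArrayList<>();
    CountDownLatch handled = new CountDownLatch(1);
    Thread.UncaughtExceptionHandler handler = (thread, error) -> {
      LOG.log(Level.SEVERE, "Thread " + thread.getName() + " died unexpectedly", error);
      deadThreads.add(thread.getName());
      handled.countDown();
    };
    ExecutorService pool =
        Executors.newFixedThreadPool(1, namedFactory("split-log-demo", handler));
    // execute() lets the exception escape the worker thread, so the handler runs.
    // submit() would instead store it in the returned Future and log nothing.
    pool.execute(() -> {
      throw new IllegalStateException("simulated failure after 'No ack received'");
    });
    // The handler runs on the dying thread, possibly after the pool has already
    // replaced that worker, so wait on the latch rather than on pool termination.
    handled.await(5, TimeUnit.SECONDS);
    pool.shutdown();
    pool.awaitTermination(5, TimeUnit.SECONDS);
    return deadThreads;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("Threads that died: " + runFailingTask());
  }
}
```

The handler only observes exceptions that actually escape the worker thread. A task handed to submit() has its exception captured in the returned Future, so in that case the failure stays silent unless someone calls Future.get() or the executor inspects the task in afterExecute.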
Related logs:
[HMASTER-4]
2024-12-05 14:55:11,264 INFO [PEWorker-41] procedure2.ProcedureExecutor -
Initialized subprocedures=[\{pid=6003288, ppid=6002575, state=RUNNABLE;
SplitWALRemoteProcedureregionserver-53.regionserver.hbase.hbase33a.hbase.monitoring.aws-esvc1-useast2.aws.sfdc.is%2C60020%2C1730886070174.1733410178680,
worker=regionserver-15.regionserver.hbase.hbase33a.hbase.monitoring.aws-esvc1-useast2.aws.sfdc.is,60020,1730878238028}]
[RS-15]
2024-12-05 14:55:11,461 DEBUG
[iority.RWQ.Fifo.read.handler=83,queue=2,port=60020] regionserver.RSRpcServices
- Executing remote procedure
classorg.apache.hadoop.hbase.regionserver.SplitWALCallable, pid=6003288
[RS-15]
2024-12-05 14:55:54,689 ERROR [split-log-closeStream-pool-0] hdfs.DataStreamer
- No ack received, took 25002ms (threshold=25000ms). File being written:
/hbase/data/default/tsdb/c997c5f8dd36481dcd3ebb9b79a35b51/recovered.edits/0000000000539451088-regionserver-53.regionserver.hbase.hbase33a.hbase.monitoring.aws-esvc1-useast2.aws.sfdc.is%2C60020%2C1730886070174.1733410178680.temp,
block: BP-1745262640-10.60.130.13-1712173738392:blk_1330710217_257120451,
Write pipeline datanodes:
[DatanodeInfoWithStorage[10.60.52.107:50010,DS-f2b7ba1a-68b5-433a-9fe8-99315a172098,SSD],
DatanodeInfoWithStorage[10.60.75.52:50010,DS-93a433be-972f-4457-92ae-dd07288e41b5,SSD]].
[HMASTER-1]
2024-12-05 18:11:42,036 DEBUG [master/hmaster-1:60000:becomeActiveMaster]
store.ProcedureTree -Procedure Procedure(pid=6003288, ppid=6002575,
class=org.apache.hadoop.hbase.master.procedure.SplitWALRemoteProcedure) stack
ids=[3592]
[HMASTER-1]
2024-12-05 18:11:42,214 DEBUG [master/hmaster-1:60000:becomeActiveMaster]
procedure2.ProcedureExecutor - Loading pid=6003288, ppid=6002575,
state=RUNNABLE; SplitWALRemoteProcedure
regionserver-53.regionserver.hbase.hbase33a.hbase.monitoring.aws-esvc1-useast2.aws.sfdc.is%2C60020%2C1730886070174.1733410178680,
worker=regionserver-15.regionserver.hbase.hbase33a.hbase.monitoring.aws-esvc1-useast2.aws.sfdc.is,60020,1730878238028
[RS-15]
2024-12-05 18:11:42,769 DEBUG
[iority.RWQ.Fifo.read.handler=79,queue=7,port=60020] regionserver.RSRpcServices
- Executing remote procedure
classorg.apache.hadoop.hbase.regionserver.SplitWALCallable, pid=6003288
[RS-15]
2024-12-05 18:11:48,247 DEBUG
[_REPLAY_OPS-regionserver/regionserver-15:60020-192]
regionserver.RemoteProcedureResultReporter - Successfully complete execution of
pid=6003288
[HMASTER-1]
2024-12-05 18:11:48,304 INFO [PEWorker-2] procedure2.ProcedureExecutor -
Finished pid=6002575, ppid=6000775, state=SUCCESS;
SplitWALProcedureregionserver-53.regionserver.hbase.hbase33a.hbase.monitoring.aws-esvc1-useast2.aws.sfdc.is%2C60020%2C1730886070174.1733410178680,
worker=regionserver-15.regionserver.hbase.hbase33a.hbase.monitoring.aws-esvc1-useast2.aws.sfdc.is,60020,1730878238028
in 3 hrs, 16 mins, 52.806 sec
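The excerpted logs show nothing further for pid=6003288 between the DataStreamer error at 14:55:54 and the re-dispatch at 18:11:42, after HMASTER-1 became active. If the RegionServer runs such work through submit() rather than execute() (an assumption; the report does not say which), an UncaughtExceptionHandler alone would not log the failure, because FutureTask stores the exception instead of rethrowing it. The sketch below follows the pattern documented for ThreadPoolExecutor.afterExecute to unwrap and log it; AfterExecuteSketch, LoggingThreadPoolExecutor and runFailingSubmit are hypothetical names, not HBase APIs.

```java
import java.util.List;
import java.util.concurrent.CancellationException;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.logging.Level;
import java.util.logging.Logger;

public class AfterExecuteSketch {
  private static final Logger LOG = Logger.getLogger(AfterExecuteSketch.class.getName());

  // Hypothetical executor, not an HBase class: logs failures of tasks passed to
  // submit(), whose exceptions are captured by FutureTask and never reach an
  // UncaughtExceptionHandler.
  static class LoggingThreadPoolExecutor extends ThreadPoolExecutor {
    final List<Throwable> failures = new CopyOnWriteArrayList<>();

    LoggingThreadPoolExecutor(int threads) {
      super(threads, threads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
    }

    @Override
    protected void afterExecute(Runnable r, Throwable t) {
      super.afterExecute(r, t);
      // Unwrap the outcome of a completed FutureTask (pattern from the
      // ThreadPoolExecutor.afterExecute javadoc).
      if (t == null && r instanceof Future<?> && ((Future<?>) r).isDone()) {
        try {
          ((Future<?>) r).get();
        } catch (CancellationException e) {
          t = e;
        } catch (ExecutionException e) {
          t = e.getCause();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }
      if (t != null) {
        LOG.log(Level.SEVERE, "Task failed on " + Thread.currentThread().getName(), t);
        failures.add(t);
      }
    }
  }

  // Submits one failing task and returns the failures recorded by afterExecute.
  static List<Throwable> runFailingSubmit() throws InterruptedException {
    LoggingThreadPoolExecutor pool = new LoggingThreadPoolExecutor(1);
    Runnable failing = () -> {
      throw new IllegalStateException("simulated split failure");
    };
    pool.submit(failing);
    pool.shutdown();
    // afterExecute runs on the worker before it exits, so termination implies it ran.
    pool.awaitTermination(5, TimeUnit.SECONDS);
    return pool.failures;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("Recorded failures: " + runFailingSubmit());
  }
}
```

This only helps when the task actually fails. A thread blocked indefinitely, the other possibility raised in the description, would still need a timeout or watchdog on the waiting side.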
was:
In an HBase cluster we encountered a scenario where a RegionServer server crash
procedure (SCP) waited for more than 3 hours. The incident was triggered by
temporary network unavailability in the HBase cluster. On debugging, we found
that the SCP was stuck on its child {{SplitWALProcedure}}, which was waiting for
the RegionServer worker to complete the {{SplitWALRemoteProcedure}}. While
running, the {{SplitWALRemoteProcedure}} encountered an unknown exception. In
the logs we can see an *"hdfs.DataStreamer - No ack received"* error while the
RegionServer was connecting to a DataNode. No related logs exist after this
error, so the thread was either stuck or had died. Inconsistent regions were
reported during this period. All procedures were restarted and completed after
the active HMaster service was bounced.
Related logs:
{quote}{color:#000000}[HMASTER-4]{color}
2024-12-05 14:55:11,264 INFO [PEWorker-41] procedure2.ProcedureExecutor -
Initialized subprocedures=[\{pid=6003288, ppid=6002575, state=RUNNABLE;
SplitWALRemoteProcedureregionserver-53.regionserver.hbase.hbase33a.hbase.monitoring.aws-esvc1-useast2.aws.sfdc.is%2C60020%2C1730886070174.1733410178680,
worker=regionserver-15.regionserver.hbase.hbase33a.hbase.monitoring.aws-esvc1-useast2.aws.sfdc.is,60020,1730878238028}]
{color:#000000}[RS-15]{color}
2024-12-05 14:55:11,461 DEBUG
[iority.RWQ.Fifo.read.handler=83,queue=2,port=60020] regionserver.RSRpcServices
- Executing remote procedure
classorg.apache.hadoop.hbase.regionserver.SplitWALCallable, pid=6003288
{color:#000000}[RS-15]{color}
2024-12-05 14:55:54,689 ERROR [split-log-closeStream-pool-0] hdfs.DataStreamer
- No ack received, took 25002ms (threshold=25000ms). File being written:
/hbase/data/default/tsdb/c997c5f8dd36481dcd3ebb9b79a35b51/recovered.edits/0000000000539451088-regionserver-53.regionserver.hbase.hbase33a.hbase.monitoring.aws-esvc1-useast2.aws.sfdc.is%2C60020%2C1730886070174.1733410178680.temp,
block: BP-1745262640-10.60.130.13-1712173738392:blk_1330710217_257120451,
Write pipeline datanodes:
[DatanodeInfoWithStorage[10.60.52.107:50010,DS-f2b7ba1a-68b5-433a-9fe8-99315a172098,SSD],
DatanodeInfoWithStorage[10.60.75.52:50010,DS-93a433be-972f-4457-92ae-dd07288e41b5,SSD]].
{color:#000000}[HMASTER-1]{color}
2024-12-05 18:11:42,036 DEBUG [master/hmaster-1:60000:becomeActiveMaster]
store.ProcedureTree -Procedure Procedure(pid=6003288, ppid=6002575,
class=org.apache.hadoop.hbase.master.procedure.SplitWALRemoteProcedure) stack
ids=[3592]
{color:#000000}[HMASTER-1]{color}
2024-12-05 18:11:42,214 DEBUG [master/hmaster-1:60000:becomeActiveMaster]
procedure2.ProcedureExecutor - Loading pid=6003288, ppid=6002575,
state=RUNNABLE; SplitWALRemoteProcedure
regionserver-53.regionserver.hbase.hbase33a.hbase.monitoring.aws-esvc1-useast2.aws.sfdc.is%2C60020%2C1730886070174.1733410178680,
worker=regionserver-15.regionserver.hbase.hbase33a.hbase.monitoring.aws-esvc1-useast2.aws.sfdc.is,60020,1730878238028
{color:#000000}[RS-15]{color}
2024-12-05 18:11:42,769 DEBUG
[iority.RWQ.Fifo.read.handler=79,queue=7,port=60020] regionserver.RSRpcServices
- Executing remote procedure
classorg.apache.hadoop.hbase.regionserver.SplitWALCallable, pid=6003288
{color:#000000}[RS-15]{color}
2024-12-05 18:11:48,247 DEBUG
[_REPLAY_OPS-regionserver/regionserver-15:60020-192]
regionserver.RemoteProcedureResultReporter - Successfully complete execution of
pid=6003288
{color:#000000}[HMASTER-1]{color}
2024-12-05 18:11:48,304 INFO [PEWorker-2] procedure2.ProcedureExecutor -
Finished pid=6002575, ppid=6000775, state=SUCCESS;
SplitWALProcedureregionserver-53.regionserver.hbase.hbase33a.hbase.monitoring.aws-esvc1-useast2.aws.sfdc.is%2C60020%2C1730886070174.1733410178680,
worker=regionserver-15.regionserver.hbase.hbase33a.hbase.monitoring.aws-esvc1-useast2.aws.sfdc.is,60020,1730878238028
in *3 hrs, 16 mins, 52.806 sec*
{quote}
> Set UncaughtException Handler for RegionServer ExecutorService
> --------------------------------------------------------------
>
> Key: HBASE-29041
> URL: https://issues.apache.org/jira/browse/HBASE-29041
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 3.0.0, 2.6.1, 2.5.10
> Reporter: Nirdosh Kumar Yadav
> Priority: Minor
>
> In an HBase cluster we encountered a scenario where a RegionServer server
> crash procedure (SCP) waited for more than 3 hours. The incident was
> triggered by temporary network unavailability in the HBase cluster. On
> debugging, we found that the SCP was stuck on its child {{SplitWALProcedure}},
> which was waiting for the RegionServer worker to complete the
> {{SplitWALRemoteProcedure}}. While running, the {{SplitWALRemoteProcedure}}
> encountered an unknown exception. In the logs we can see an
> *"hdfs.DataStreamer - No ack received"* error while the RegionServer was
> connecting to a DataNode. No related logs exist after this error, so the
> thread was either stuck or had died. Inconsistent regions were reported
> during this period. All procedures were restarted and completed after the
> active HMaster service was bounced.
> Related logs:
> [HMASTER-4]
> 2024-12-05 14:55:11,264 INFO [PEWorker-41] procedure2.ProcedureExecutor -
> Initialized subprocedures=[\{pid=6003288, ppid=6002575, state=RUNNABLE;
> SplitWALRemoteProcedureregionserver-53.regionserver.hbase.hbase33a.hbase.monitoring.aws-esvc1-useast2.aws.sfdc.is%2C60020%2C1730886070174.1733410178680,
> worker=regionserver-15.regionserver.hbase.hbase33a.hbase.monitoring.aws-esvc1-useast2.aws.sfdc.is,60020,1730878238028}]
> [RS-15]
> 2024-12-05 14:55:11,461 DEBUG
> [iority.RWQ.Fifo.read.handler=83,queue=2,port=60020]
> regionserver.RSRpcServices - Executing remote procedure
> classorg.apache.hadoop.hbase.regionserver.SplitWALCallable, pid=6003288
> [RS-15]
> 2024-12-05 14:55:54,689 ERROR [split-log-closeStream-pool-0]
> hdfs.DataStreamer - No ack received, took 25002ms (threshold=25000ms). File
> being written:
> /hbase/data/default/tsdb/c997c5f8dd36481dcd3ebb9b79a35b51/recovered.edits/0000000000539451088-regionserver-53.regionserver.hbase.hbase33a.hbase.monitoring.aws-esvc1-useast2.aws.sfdc.is%2C60020%2C1730886070174.1733410178680.temp,
> block: BP-1745262640-10.60.130.13-1712173738392:blk_1330710217_257120451,
> Write pipeline datanodes:
> [DatanodeInfoWithStorage[10.60.52.107:50010,DS-f2b7ba1a-68b5-433a-9fe8-99315a172098,SSD],
> DatanodeInfoWithStorage[10.60.75.52:50010,DS-93a433be-972f-4457-92ae-dd07288e41b5,SSD]].
> [HMASTER-1]
> 2024-12-05 18:11:42,036 DEBUG [master/hmaster-1:60000:becomeActiveMaster]
> store.ProcedureTree -Procedure Procedure(pid=6003288, ppid=6002575,
> class=org.apache.hadoop.hbase.master.procedure.SplitWALRemoteProcedure) stack
> ids=[3592]
> [HMASTER-1]
> 2024-12-05 18:11:42,214 DEBUG [master/hmaster-1:60000:becomeActiveMaster]
> procedure2.ProcedureExecutor - Loading pid=6003288, ppid=6002575,
> state=RUNNABLE; SplitWALRemoteProcedure
> regionserver-53.regionserver.hbase.hbase33a.hbase.monitoring.aws-esvc1-useast2.aws.sfdc.is%2C60020%2C1730886070174.1733410178680,
> worker=regionserver-15.regionserver.hbase.hbase33a.hbase.monitoring.aws-esvc1-useast2.aws.sfdc.is,60020,1730878238028
> [RS-15]
> 2024-12-05 18:11:42,769 DEBUG
> [iority.RWQ.Fifo.read.handler=79,queue=7,port=60020]
> regionserver.RSRpcServices - Executing remote procedure
> classorg.apache.hadoop.hbase.regionserver.SplitWALCallable, pid=6003288
> [RS-15]
> 2024-12-05 18:11:48,247 DEBUG
> [_REPLAY_OPS-regionserver/regionserver-15:60020-192]
> regionserver.RemoteProcedureResultReporter - Successfully complete execution
> of pid=6003288
> [HMASTER-1]
> 2024-12-05 18:11:48,304 INFO [PEWorker-2] procedure2.ProcedureExecutor -
> Finished pid=6002575, ppid=6000775, state=SUCCESS;
> SplitWALProcedureregionserver-53.regionserver.hbase.hbase33a.hbase.monitoring.aws-esvc1-useast2.aws.sfdc.is%2C60020%2C1730886070174.1733410178680,
> worker=regionserver-15.regionserver.hbase.hbase33a.hbase.monitoring.aws-esvc1-useast2.aws.sfdc.is,60020,1730878238028
> in 3 hrs, 16 mins, 52.806 sec
--
This message was sent by Atlassian Jira
(v8.20.10#820010)