Viraj Jasani created HBASE-29180:
------------------------------------

             Summary: Apply fail-fast retry limit for UnknownHostException
                 Key: HBASE-29180
                 URL: https://issues.apache.org/jira/browse/HBASE-29180
             Project: HBase
          Issue Type: Sub-task
    Affects Versions: 2.5.11
            Reporter: Viraj Jasani

As part of HBASE-28638, a fail-fast retry limit was introduced for errors such as CallQueueTooBigException, SaslException, and ConnectionClosedException. It limits the number of retries that RSProcedureDispatcher has to perform while executing remote procedures. Since the region open/close fails on the remote server, we also trigger SCP (ServerCrashProcedure) for the target server.

We recently came across UnknownHostException as another case where remote calls can get stuck retrying forever:

{code:java}
WARN [RSProcedureDispatcher-pool-98034] procedure.RSProcedureDispatcher - request to rs1.xyz,60020,1739254267238 failed due to java.net.UnknownHostException: Call to address=rs1.xyz:60020 failed on local exception: java.net.UnknownHostException: rs1.xyz:60020 could not be resolved, try=2867, retrying... , request params: open_region { open_info { region { ... ...
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
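
The fail-fast idea described in this issue could be sketched roughly as follows. This is only an illustration under stated assumptions, not the RSProcedureDispatcher code: the class name FailFastRetrySketch, the methods isFailFastError and shouldGiveUp, and the limit of 5 are all hypothetical, and only JDK exception types are used so the example stays self-contained (the HBase-specific exceptions named above are left out).

```java
import java.io.IOException;
import java.net.UnknownHostException;
import javax.security.sasl.SaslException;

public class FailFastRetrySketch {
    // Hypothetical limit chosen for illustration; not an actual HBase default.
    public static final int FAIL_FAST_RETRY_LIMIT = 5;

    /** Returns true if the error, or any cause in its chain, is unlikely to clear up on retry. */
    public static boolean isFailFastError(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof UnknownHostException || c instanceof SaslException) {
                return true;
            }
        }
        return false;
    }

    /** Returns true once a fail-fast error has been seen for at least the limit of attempts. */
    public static boolean shouldGiveUp(Throwable error, int attempts) {
        return isFailFastError(error) && attempts >= FAIL_FAST_RETRY_LIMIT;
    }

    public static void main(String[] args) {
        // Mirrors the shape of the logged error: an IOException wrapping UnknownHostException.
        IOException wrapped = new IOException(
            "Call to address=rs1.xyz:60020 failed on local exception",
            new UnknownHostException("rs1.xyz:60020 could not be resolved"));

        System.out.println(shouldGiveUp(wrapped, 2));   // below the limit, keep retrying
        System.out.println(shouldGiveUp(wrapped, 5));   // limit reached, stop retrying
        System.out.println(shouldGiveUp(new IOException("timeout"), 2867)); // not a fail-fast error
    }
}
```

Walking the cause chain matters here because the UnknownHostException in the log above arrives wrapped in a "failed on local exception" error, so checking only the outermost exception type would miss it.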