Jing Zhao created HDFS-10688:
--------------------------------

             Summary: BPServiceActor may run into a tight loop for sending block report when hitting IOException
                 Key: HDFS-10688
                 URL: https://issues.apache.org/jira/browse/HDFS-10688
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: datanode
            Reporter: Jing Zhao
            Assignee: Chen Liang

Currently, in BPServiceActor#offerService, when the DataNode hits a local IOException, it only logs the exception and immediately re-enters the while loop. Unlike the RemoteException branch, the IOException branch does not sleep before retrying:

{code}
} catch(RemoteException re) {
  .......
  LOG.warn("RemoteException in offerService", re);
  try {
    long sleepTime = Math.min(1000, dnConf.heartBeatInterval);
    Thread.sleep(sleepTime);
  } catch (InterruptedException ie) {
    Thread.currentThread().interrupt();
  }
} catch (IOException e) {
  LOG.warn("IOException in offerService", e);
}
{code}

This tight loop can cause problems. For example, in a production cluster we saw a DataNode hit an exception while doing a Kerberos realm lookup. The tight loop then caused the DataNode to send hundreds of DNS lookup queries.
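
One possible mitigation, sketched here as a standalone program rather than a patch against BPServiceActor, is to apply the same capped sleep to the IOException branch that the RemoteException branch already uses. The class name OfferServiceBackoffSketch, the Iteration interface, and the helpers runLoop, sleepTimeMs and sleepAfterException are hypothetical names for illustration only. The only part taken from the snippet above is the Math.min(1000, heartBeatInterval) sleep and the interrupt handling.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

public class OfferServiceBackoffSketch {

    /** Hypothetical stand-in for one pass through the offerService loop body. */
    interface Iteration {
        void run() throws IOException;
    }

    /** Same cap as the RemoteException branch: at most one second per retry. */
    static long sleepTimeMs(long heartBeatIntervalMs) {
        return Math.min(1000, heartBeatIntervalMs);
    }

    /** Sleeps once, restoring the interrupt flag if the thread is interrupted. */
    static void sleepAfterException(long heartBeatIntervalMs) {
        try {
            Thread.sleep(sleepTimeMs(heartBeatIntervalMs));
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
    }

    /**
     * Runs the body up to maxIterations times (a bound added only so the
     * sketch terminates) and returns how many IOExceptions were seen.
     */
    static int runLoop(Iteration body, int maxIterations, long heartBeatIntervalMs) {
        int failures = 0;
        for (int i = 0; i < maxIterations
                && !Thread.currentThread().isInterrupted(); i++) {
            try {
                body.run();
            } catch (IOException e) {
                failures++;
                System.err.println("IOException in offerService: " + e.getMessage());
                // Without this sleep the loop retries immediately (the tight loop).
                sleepAfterException(heartBeatIntervalMs);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        long start = System.nanoTime();
        // Simulate a body that always fails locally, e.g. a failing lookup.
        int failures = runLoop(() -> {
            calls.incrementAndGet();
            throw new IOException("simulated local failure");
        }, 5, 100L);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("iterations=" + calls.get()
                + " failures=" + failures + " elapsedMs=" + elapsedMs);
    }
}
```

With the sleep in place, each failing iteration costs at least the capped interval, so a persistent local failure produces a bounded retry rate instead of back-to-back retries. Whether a fixed sleep or a growing backoff is more appropriate for the DataNode is a design choice this sketch does not settle.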