[ https://issues.apache.org/jira/browse/HDFS-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chen Liang updated HDFS-10688:
------------------------------
    Attachment: HDFS-10688.001.patch

> BPServiceActor may run into a tight loop for sending block report when
> hitting IOException
> ------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10688
>                 URL: https://issues.apache.org/jira/browse/HDFS-10688
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>            Reporter: Jing Zhao
>            Assignee: Chen Liang
>         Attachments: HDFS-10688.001.patch
>
>
> Currently in BPServiceActor#offerService, when the DataNode runs into a local
> IOException, it only logs the exception and immediately re-enters the while
> loop without sleeping (unlike the RemoteException branch):
> {code}
>       } catch(RemoteException re) {
>         .......
>         LOG.warn("RemoteException in offerService", re);
>         try {
>           long sleepTime = Math.min(1000, dnConf.heartBeatInterval);
>           Thread.sleep(sleepTime);
>         } catch (InterruptedException ie) {
>           Thread.currentThread().interrupt();
>         }
>       } catch (IOException e) {
>         LOG.warn("IOException in offerService", e);
>       }
> {code}
> This tight loop can cause problems. For example, in a production cluster,
> we saw a DataNode hit an exception during a Kerberos realm lookup. The tight
> loop eventually caused the DataNode to send hundreds of DNS lookup queries.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
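The issue text describes the problem but does not include the patch itself. Purely as an illustration, here is a minimal Java sketch of a service loop that backs off after a local IOException the same way the quoted RemoteException branch does, sleeping `Math.min(1000, heartBeatInterval)` ms. Everything here is hypothetical: the class `OfferServiceBackoffSketch`, the `Action` and `Sleeper` interfaces, and `runLoop` are not the HDFS-10688 patch or the real `BPServiceActor` API, and `System.err` stands in for `LOG.warn`.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (NOT the HDFS-10688 patch): a service loop that backs off
 * after a local IOException instead of retrying immediately.
 */
public class OfferServiceBackoffSketch {

  /** One unit of work, e.g. a heartbeat or block report (hypothetical). */
  public interface Action {
    void run() throws IOException;
  }

  /** Abstraction over Thread.sleep so the loop can be exercised without real delays. */
  public interface Sleeper {
    void sleep(long millis) throws InterruptedException;
  }

  private final long heartBeatIntervalMs;
  private final Sleeper sleeper;

  public OfferServiceBackoffSketch(long heartBeatIntervalMs, Sleeper sleeper) {
    this.heartBeatIntervalMs = heartBeatIntervalMs;
    this.sleeper = sleeper;
  }

  /** Runs at most maxIterations attempts and returns how many of them failed. */
  public int runLoop(Action action, int maxIterations) {
    int failures = 0;
    for (int i = 0; i < maxIterations && !Thread.currentThread().isInterrupted(); i++) {
      try {
        action.run();
      } catch (IOException e) {
        failures++;
        // Stand-in for LOG.warn("IOException in offerService", e).
        System.err.println("IOException in offerService: " + e.getMessage());
        // Back off like the RemoteException branch, so a persistent local
        // failure (e.g. a failing realm lookup) cannot spin the loop.
        try {
          sleeper.sleep(Math.min(1000, heartBeatIntervalMs));
        } catch (InterruptedException ie) {
          // Restore the interrupt flag; the loop condition then stops iterating.
          Thread.currentThread().interrupt();
        }
      }
    }
    return failures;
  }

  public static void main(String[] args) {
    // Record requested sleep durations instead of actually sleeping.
    List<Long> sleeps = new ArrayList<>();
    OfferServiceBackoffSketch loop = new OfferServiceBackoffSketch(3000, sleeps::add);
    int failures = loop.runLoop(() -> { throw new IOException("realm lookup failed"); }, 5);
    System.out.println(failures + " failures, sleeps=" + sleeps);
  }
}
```

Running `main` prints `5 failures, sleeps=[1000, 1000, 1000, 1000, 1000]`: every failed attempt is followed by a capped sleep rather than an immediate retry. Injecting the `Sleeper` is only a testing convenience in this sketch; a production loop would call `Thread.sleep` directly.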