[ https://issues.apache.org/jira/browse/HDFS-5500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vinod Kumar Vavilapalli updated HDFS-5500:
------------------------------------------
    Target Version/s:   (was: 2.8.0)

Not much has happened here for a long time, so I am dropping this from 2.8.0. I am not setting a new target version either; let's target it once there is patch activity.

> Critical datanode threads may terminate silently on uncaught exceptions
> -----------------------------------------------------------------------
>
>                 Key: HDFS-5500
>                 URL: https://issues.apache.org/jira/browse/HDFS-5500
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Kihwal Lee
>            Priority: Critical
>
> We've seen the refreshUsed (DU) thread disappear on uncaught exceptions. This
> can go unnoticed for a long time. If an OOM occurs, more things can go wrong.
> On one occasion, the Timer, multiple refreshUsed threads, and the
> DataXceiverServer thread had terminated.
> DataXceiverServer catches OutOfMemoryError and sleeps for 30 seconds, but I
> am not sure that really helps. In one case, the thread did this multiple
> times and then terminated. I suspect another OOM was thrown while it was in
> the catch block. As a result, the server socket was not closed and clients
> hung on connect. If it had at least closed the socket, the client side would
> have been impacted less.
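
For illustration only, here is a minimal Java sketch of the two mitigations the description points toward: reporting a thread's uncaught Throwable instead of letting the thread vanish, and closing the server socket on every exit path so that clients fail fast rather than hang on connect. This is not the DataNode code and not a proposed patch. The class GuardedThreads and the methods startGuarded and serve are hypothetical names; only the java.lang.Thread and java.net.ServerSocket calls are real JDK APIs.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical helper, not part of Hadoop: a sketch under the assumptions above.
final class GuardedThreads {

    /**
     * Starts a daemon thread whose uncaught Throwable is passed to the given
     * handler (for example to log it or mark the service unhealthy) rather
     * than only being printed by the default handler.
     */
    static Thread startGuarded(String name, Runnable body,
                               Thread.UncaughtExceptionHandler handler) {
        Thread t = new Thread(body, name);
        t.setDaemon(true);
        t.setUncaughtExceptionHandler(handler);
        t.start();
        return t;
    }

    /**
     * Runs acceptOnce in a loop until the socket is closed. The finally block
     * attempts to close the socket however the loop exits, including when an
     * Error such as OutOfMemoryError propagates out of acceptOnce.
     */
    static void serve(ServerSocket socket, Runnable acceptOnce) {
        try {
            while (!socket.isClosed()) {
                acceptOnce.run();
            }
        } finally {
            try {
                socket.close();
            } catch (IOException ignored) {
                // Best effort: nothing more can be done at this point.
            }
        }
    }
}
```

A caller would wrap the accept loop as startGuarded("acceptor", () -> serve(socket, work), handler). Closing in finally is best effort under memory pressure, since the cleanup itself may fail if another OutOfMemoryError is thrown, but it avoids the case described above where the thread is gone and the listening socket is still open.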