[ 
https://issues.apache.org/jira/browse/HDFS-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965255#comment-13965255
 ] 

Ding Yuan commented on HDFS-6145:
---------------------------------

Ping. Is there anything else I can help with from my side?

> Stopping unexpected exception from propagating to avoid serious consequences
> ----------------------------------------------------------------------------
>
>                 Key: HDFS-6145
>                 URL: https://issues.apache.org/jira/browse/HDFS-6145
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.2.0
>            Reporter: Ding Yuan
>
> There are a few cases where an exception should never occur, yet the code
> simply logs it and lets execution continue. Since these exceptions indicate
> unexpected internal errors, a safer approach may be to terminate execution
> and stop them from propagating into unexpected consequences.
> ==========================
> Case 1:
> Line: 336, File: 
> "org/apache/hadoop/hdfs/server/namenode/snapshot/INodeDirectorySnapshottable.java"
> {noformat}
> 325:       try {
> 326:         Quota.Counts counts = cleanSubtree(snapshot, prior, collectedBlocks,
> 327:             removedINodes, true);
> 328:         INodeDirectory parent = getParent();
>  .. ..
> 335:       } catch(QuotaExceededException e) {
> 336:         LOG.error("BUG: removeSnapshot increases namespace usage.", e);
> 337:       }
> {noformat}
> Since this shouldn't happen unless there is an unexpected bug, should the NN
> simply stop execution to prevent bad state from propagating? (A sketch of
> such stricter handling follows the list below.)
> Similar handling of QuotaExceededException can be found at:
>   Line: 544, File: 
> "org/apache/hadoop/hdfs/server/namenode/INodeReference.java"
>   Line: 657, File: 
> "org/apache/hadoop/hdfs/server/namenode/INodeReference.java"
>   Line: 669, File: 
> "org/apache/hadoop/hdfs/server/namenode/INodeReference.java"
> ==========================================
> ==========================
> Case 2:
> Line: 601, File: "org/apache/hadoop/hdfs/server/namenode/JournalSet.java"
> {noformat}
> 591:  public synchronized RemoteEditLogManifest getEditLogManifest(long fromTxId,
> ..
> 595:    for (JournalAndStream j : journals) {
> ..
> 598:         try {
> 599:           allLogs.addAll(fjm.getRemoteEditLogs(fromTxId, forReading, false));
> 600:         } catch (Throwable t) {
> 601:           LOG.warn("Cannot list edit logs in " + fjm, t);
> 602:         }
> {noformat}
> An exception from this call will result in some edit log files not being
> considered and therefore not included in the checkpoint, which may result in
> data loss.
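> One stricter alternative, sketched below purely for illustration (a judgment
> call, not a decided fix): propagate the failure so a manifest is never built
> from an incomplete set of logs.
> {noformat}
>         try {
>           allLogs.addAll(fjm.getRemoteEditLogs(fromTxId, forReading, false));
>         } catch (Throwable t) {
>           // Propagate instead of logging and continuing, so an incomplete
>           // manifest never silently reaches the checkpointing code.
>           throw new RuntimeException("Cannot list edit logs in " + fjm, t);
>         }
> {noformat}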
> ==========================================
> ==========================
> Case 3:
> Line: 4029, File: "org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java"
> {noformat}
> 4010:       try {
> 4011:         while (fsRunning && shouldNNRmRun) {
> 4012:           checkAvailableResources();
> 4013:           if(!nameNodeHasResourcesAvailable()) {
> 4014:             String lowResourcesMsg = "NameNode low on available disk space. ";
> 4015:             if (!isInSafeMode()) {
> 4016:               FSNamesystem.LOG.warn(lowResourcesMsg + "Entering safe mode.");
> 4017:             } else {
> 4018:               FSNamesystem.LOG.warn(lowResourcesMsg + "Already in safe mode.");
> 4019:             }
> 4020:             enterSafeMode(true);
> 4021:           }
> .. ..
> 4027:         }
> 4028:       } catch (Exception e) {
> 4029:         FSNamesystem.LOG.error("Exception in NameNodeResourceMonitor: ", e);
> 4030:       }
> {noformat}
> enterSafeMode might throw an exception. If the NameNode is unable to enter
> safe mode, should the execution simply terminate?
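> A minimal sketch of that terminate-on-failure behaviour (illustration only;
> it assumes ExitUtil.terminate is the appropriate shutdown path when the
> monitor itself fails):
> {noformat}
>       } catch (Exception e) {
>         FSNamesystem.LOG.error("Exception in NameNodeResourceMonitor: ", e);
>         // If the monitor cannot even drive the NN into safe mode, running on
>         // without resource protection seems riskier than exiting.
>         ExitUtil.terminate(1, "NameNodeResourceMonitor failed: " + e);
>       }
> {noformat}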
> ==========================================



--
This message was sent by Atlassian JIRA
(v6.2#6252)
