[ https://issues.apache.org/jira/browse/HDFS-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14050803#comment-14050803 ]

Colin Patrick McCabe commented on HDFS-6618:
--------------------------------------------

Ugh, I wrote a long comment and then closed the window.

One thing that I'm a bit afraid of here is that previously, if we threw an 
exception while building up the {{List<INode> removedINodes}}, we'd avoid 
modifying the INodeMap.  But now, we may leave the INodeMap in an inconsistent 
state.

Many of the functions you're passing the remover to are declared to throw 
checked exceptions, like this one:
{code}
  public Quota.Counts cleanSubtree(final int snapshotId, int priorSnapshotId,
      final BlocksMapUpdateInfo collectedBlocks,
      final INodeRemover inodeRemover, final boolean countDiffChange)
      throws QuotaExceededException {
{code}

What happens if a {{QuotaExceededException}} is thrown here after we have 
modified the INodeMap, but before we have invalidated the blocks (which are 
still collected in {{collectedBlocks}})?  It seems like we'd be left in an 
inconsistent state.
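
To make that window concrete, here's a deliberately simplified sketch (made-up 
types and helpers -- {{QuotaError}}, {{RemovalSketch}}, {{checkQuota}} -- not the 
real {{FSDirectory}} / {{INodeMap}} code). If the map is mutated eagerly while the 
subtree is still being walked, a quota failure part-way through leaves it 
half-updated; the deferred list only touches the map once the whole traversal 
has succeeded:
{code}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in: QuotaError plays the role of QuotaExceededException.
class QuotaError extends Exception {
}

class RemovalSketch {
  final Map<Long, String> inodeMap = new HashMap<Long, String>(); // stand-in for the INodeMap

  // Eager style: entries are removed while the subtree is still being walked.
  // If checkQuota() throws part-way through, some entries are already gone,
  // some are not, and the caller never reaches the block-invalidation step.
  void deleteEager(List<Long> subtree) throws QuotaError {
    for (long id : subtree) {
      checkQuota(id);        // may throw QuotaError
      inodeMap.remove(id);   // map is left half-updated if a later iteration throws
    }
  }

  // Deferred style: nothing is removed until the whole traversal has succeeded,
  // so a QuotaError leaves inodeMap exactly as it was.
  void deleteDeferred(List<Long> subtree) throws QuotaError {
    List<Long> removedINodes = new ArrayList<Long>();
    for (long id : subtree) {
      checkQuota(id);          // may throw QuotaError
      removedINodes.add(id);   // collect only, no mutation yet
    }
    for (long id : removedINodes) {
      inodeMap.remove(id);     // applied only after the traversal succeeded
    }
  }

  void checkQuota(long id) throws QuotaError {
    // hypothetical quota check; throws when the quota would be exceeded
  }
}
{code}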

Maybe I'm missing something obvious, but can't we just move the 
{{dir.removeFromInodeMap}} line into the first {{writeLock}} block?  Then we 
could keep the deferred "add inodes to a list" approach rather than directly 
mutating the {{inodeMap}}.

{code}
    writeLock();
    try {
      checkOperation(OperationCategory.WRITE);
      checkNameNodeSafeMode("Cannot delete " + src);
      src = FSDirectory.resolvePath(src, pathComponents, dir);
      if (!recursive && dir.isNonEmptyDirectory(src)) {
        throw new PathIsNotEmptyDirectoryException(src + " is non empty");
      }
      if (enforcePermission && isPermissionEnabled) {
        checkPermission(pc, src, false, null, FsAction.WRITE, null,
            FsAction.ALL, true, false);
      }
      long mtime = now();
      // Unlink the target directory from directory tree
      long filesRemoved = dir.delete(src, collectedBlocks, removedINodes,
              mtime);
      if (filesRemoved < 0) {
        return false;
      }
      getEditLog().logDelete(src, mtime, logRetryCache);
      incrDeletedFileCount(filesRemoved);
      // Blocks/INodes will be handled later
      removePathAndBlocks(src, null, null);
      ret = true;
    } finally {
      writeUnlock();
    }
    getEditLog().logSync(); 
    removeBlocks(collectedBlocks); // Incremental deletion of blocks
    collectedBlocks.clear();

    dir.writeLock();
    try {
      dir.removeFromInodeMap(removedINodes);  // <=== move this up?
    } finally {
      dir.writeUnlock();
    }
    removedINodes.clear();
{code}
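
Roughly what I have in mind, as a sketch only (the same code as above, with the 
INodeMap update pulled inside the first lock section; I haven't tried it):
{code}
    writeLock();
    try {
      // ... same safe-mode, path, and permission checks as above ...
      long mtime = now();
      // Unlink the target directory from directory tree
      long filesRemoved = dir.delete(src, collectedBlocks, removedINodes,
              mtime);
      if (filesRemoved < 0) {
        return false;
      }
      getEditLog().logDelete(src, mtime, logRetryCache);
      incrDeletedFileCount(filesRemoved);
      removePathAndBlocks(src, null, null);
      // Drop the deleted INodes from the INodeMap before releasing the
      // namesystem write lock, so no reader sees a map that disagrees with
      // the directory tree.
      dir.writeLock();
      try {
        dir.removeFromInodeMap(removedINodes);
      } finally {
        dir.writeUnlock();
      }
      removedINodes.clear();
      ret = true;
    } finally {
      writeUnlock();
    }
    getEditLog().logSync();
    removeBlocks(collectedBlocks); // Incremental deletion of blocks
    collectedBlocks.clear();
{code}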

I apologize if I'm missing an obvious locking or other issue that prevents this.

> Remove deleted INodes from INodeMap right away
> ----------------------------------------------
>
>                 Key: HDFS-6618
>                 URL: https://issues.apache.org/jira/browse/HDFS-6618
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.5.0
>            Reporter: Kihwal Lee
>            Priority: Blocker
>         Attachments: HDFS-6618.AbstractList.patch, 
> HDFS-6618.inodeRemover.patch, HDFS-6618.inodeRemover.v2.patch, HDFS-6618.patch
>
>
> After HDFS-6527, we have not seen the edit log corruption for weeks on 
> multiple clusters until yesterday. Previously, we would see it within 30 
> minutes on a cluster.
> But the same condition was reproduced even with HDFS-6527.  The only 
> explanation is that the RPC handler thread serving {{addBlock()}} was 
> accessing a stale parent value.  Although nulling out the parent is done 
> inside the {{FSNamesystem}} and {{FSDirectory}} write lock, there is no 
> memory barrier because no "synchronized" block is involved in the process.
> I suggest making {{parent}} volatile.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
