Rushabh S Shah created HDFS-8245:
------------------------------------

             Summary: Standby namenode doesn't process DELETED_BLOCK if the add 
block request is in the edit log.
                 Key: HDFS-8245
                 URL: https://issues.apache.org/jira/browse/HDFS-8245
             Project: Hadoop HDFS
          Issue Type: Bug
    Affects Versions: 2.6.0
            Reporter: Rushabh S Shah


The following series of events happened on the Standby namenode:
2015-04-09 07:47:21,735 \[Edit log tailer] INFO ha.EditLogTailer: Triggering 
log roll on remote NameNode Active Namenode (ANN)
2015-04-09 07:58:01,858 \[Edit log tailer] INFO ha.EditLogTailer: Triggering 
log roll on remote NameNode ANN

The following series of events happened on the Active Namenode:
2015-04-09 07:47:21,747 \[IPC Server handler 99 on 8020] INFO 
namenode.FSNamesystem: Roll Edit Log from Standby NN (SNN)
2015-04-09 07:58:01,868 \[IPC Server handler 18 on 8020] INFO 
namenode.FSNamesystem: Roll Edit Log from SNN


The following series of events happened on the datanode ({color:red}datanodeA{color}):
2015-04-09 07:52:15,817 \[DataXceiver for client 
DFSClient_attempt_1428022041757_102831_r_000107_0_1139131345_1 at /xxxx:51078 
\[Receiving block 
BP-595383232-xxxx-1360869396230:blk_1570321882_1102029183867]] INFO 
datanode.DataNode: Receiving 
BP-595383232-xxxx-1360869396230:blk_1570321882_1102029183867 src: /client:51078 
dest: /{color:red}datanodeA:1004{color}

2015-04-09 07:52:15,969 \[PacketResponder: 
BP-595383232-xxxx-1360869396230:blk_1570321882_1102029183867, 
type=HAS_DOWNSTREAM_IN_PIPELINE] INFO DataNode.clienttrace: src: /client:51078, 
dest: /{color:red}datanodeA:1004{color}, bytes: 20, op: HDFS_WRITE, cliID: 
DFSClient_attempt_1428022041757_102831_r_000107_0_1139131345_1, offset: 0, 
srvID: 356a8a98-826f-446d-8f4c-ce288c1f0a75, blockid: 
BP-595383232-xxxx-1360869396230:blk_1570321882_1102029183867, duration: 
148948385
2015-04-09 07:52:15,969 \[PacketResponder: 
BP-595383232-xxxx-1360869396230:blk_1570321882_1102029183867, 
type=HAS_DOWNSTREAM_IN_PIPELINE] INFO datanode.DataNode: PacketResponder: 
BP-595383232-xxxx-1360869396230:blk_1570321882_1102029183867, 
type=HAS_DOWNSTREAM_IN_PIPELINE terminating
2015-04-09 07:52:25,970 \[DataXceiver for client /{color:red}<datanodeB> 
{color}:52827 \[Copying block 
BP-595383232-xxxx-1360869396230:blk_1570321882_1102029183867]] INFO 
datanode.DataNode: Copied 
BP-595383232-xxxx-1360869396230:blk_1570321882_1102029183867 to 
<{color:red}datanodeB{color}>:52827
2015-04-09 07:52:28,187 \[DataNode:   heartbeating to ANN:8020] INFO 
impl.FsDatasetAsyncDiskService: Scheduling blk_1570321882_1102029183867 file 
<path>/blk_1570321882 for deletion
2015-04-09 07:52:28,188 \[Async disk worker #1482 for volume ] INFO 
impl.FsDatasetAsyncDiskService: Deleted BP-595383232-xxxx-1360869396230 
blk_1570321882_1102029183867 file <path>/blk_1570321882

Then we failed over for an upgrade, and the standby became active.

When we ran an ls command on this file, we got the following exception:

15/04/09 22:07:39 WARN hdfs.BlockReaderFactory: I/O error constructing remote 
block reader.
java.io.IOException: Got error for OP_READ_BLOCK, self=/client:32947, 
remote={color:red}datanodeA:1004{color}, for file <filename>, for pool 
BP-595383232-xxxx-1360869396230 block 1570321882_1102029183867
        at 
org.apache.hadoop.hdfs.RemoteBlockReader2.checkSuccess(RemoteBlockReader2.java:445)
        at 
org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:410)
        at 
org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:815)
        at 
org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:693)
        at 
org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:351)
        at 
org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:576)
        at 
org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:800)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:847)
        at java.io.DataInputStream.read(DataInputStream.java:100)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:78)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:52)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:112)
        at 
org.apache.hadoop.fs.shell.CopyCommands$Merge.processArguments(CopyCommands.java:97)
        at 
org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:190)
        at org.apache.hadoop.fs.shell.Command.run(Command.java:154)
        at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
        at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)

Eventually this request was served by {color:red}datanodeB{color} on the second attempt:
15/04/09 22:07:39 INFO hdfs.DFSClient: Successfully connected to 
{color:red}datanodeB:1004{color} for 
BP-595383232-xxxx-1360869396230:blk_1570321882_1102029183867

On further analysis, I found that when {color:red}datanodeA{color} notified the namenode after deleting that block, the then-standby namenode (SNN) didn't queue the delete request in PendingDNMessages (the add-block op for that block was still sitting in the untailed edit log, so the block was unknown to it).
So when the then-SNN later read the edits, it didn't know that the replica on {color:red}datanodeA{color} had been deleted, and it kept reporting datanodeA as a valid location.
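
To make the expected behaviour concrete, here is a minimal, self-contained sketch of the queue-until-edits-are-tailed pattern that PendingDNMessages is meant to provide. All class and method names below (StandbyBlockTracker, onBlockReport, onEditLogAddBlock, etc.) are hypothetical illustrations, not the actual org.apache.hadoop.hdfs.server.blockmanagement code; the point is that a DELETED_BLOCK report for a block whose add is still only in the untailed edit log has to be queued just like a RECEIVED_BLOCK report, otherwise the deletion is lost.

{code:java}
// Hypothetical sketch only -- not the actual BlockManager/PendingDataNodeMessages code.
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

public class StandbyBlockTracker {

  enum Op { RECEIVED, DELETED }

  static final class DnMessage {
    final String datanode;
    final long blockId;
    final Op op;
    DnMessage(String datanode, long blockId, Op op) {
      this.datanode = datanode;
      this.blockId = blockId;
      this.op = op;
    }
  }

  // Blocks whose add-block op has already been tailed from the edit log.
  private final Set<Long> knownBlocks = new HashSet<>();
  // Replica locations per block id.
  private final Map<Long, Set<String>> locations = new HashMap<>();
  // Datanode messages that arrived before the corresponding edit was tailed.
  private final Queue<DnMessage> pending = new ArrayDeque<>();

  /** Incremental block report (RECEIVED_BLOCK / DELETED_BLOCK) from a datanode. */
  void onBlockReport(String datanode, long blockId, Op op) {
    if (!knownBlocks.contains(blockId)) {
      // The add is still only in the untailed edit log: queue BOTH kinds of
      // messages. Dropping the DELETED message here is the behaviour this
      // JIRA describes as the bug.
      pending.add(new DnMessage(datanode, blockId, op));
      return;
    }
    apply(datanode, blockId, op);
  }

  /** Called when the edit-log tailer replays the add-block op. */
  void onEditLogAddBlock(long blockId) {
    knownBlocks.add(blockId);
    // Replay any queued messages for this block in arrival order.
    pending.removeIf(m -> {
      if (m.blockId == blockId) {
        apply(m.datanode, m.blockId, m.op);
        return true;
      }
      return false;
    });
  }

  private void apply(String datanode, long blockId, Op op) {
    Set<String> locs = locations.computeIfAbsent(blockId, k -> new HashSet<>());
    if (op == Op.RECEIVED) {
      locs.add(datanode);
    } else {
      locs.remove(datanode); // the replica is gone from this datanode
    }
  }

  /** Locations a client would be sent to after a failover. */
  Set<String> getLocations(long blockId) {
    return Collections.unmodifiableSet(
        locations.getOrDefault(blockId, Collections.emptySet()));
  }
}
{code}

In the scenario above, the RECEIVED message from datanodeA gets queued and later replayed, but the DELETED message does not, so getLocations() keeps returning datanodeA and the first read after failover fails with the OP_READ_BLOCK error shown earlier.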



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
