[ 
https://issues.apache.org/jira/browse/HDFS-15175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388275#comment-17388275
 ] 

Stephen O'Donnell commented on HDFS-15175:
------------------------------------------

This is a tricky problem. If I understand correctly the issue is that with 
async edit logging, some number of edits are pending in memory waiting to be 
flushed to disk. Those edit Ops have a reference to a block object which is 
mutable and can change before they are flushed. If that block object changes, 
then the information in all the pending edits will be changed too. This goes 
further than just the block objects, as even the list of blocks for a given 
file is a mutable list and the edit op holds a reference to it. Therefore if 
the list of blocks changes, which is written to the edit log will change too.

One solution (which is not really feasible) is to make the Block object 
immutable, so that when it is changed a new instance is created, but that would 
be difficult - it is used in many places. It would be more efficient than deep 
copying every object for edits, but perhaps not in other parts of the NN. I 
don't see a better way than making a copy of the mutable data when it is stored 
into the pending edit op.

> Multiple CloseOp shared block instance causes the standby namenode to crash 
> when rolling editlog
> ------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-15175
>                 URL: https://issues.apache.org/jira/browse/HDFS-15175
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.9.2
>            Reporter: Yicong Cai
>            Assignee: Wan Chang
>            Priority: Critical
>              Labels: NameNode
>         Attachments: HDFS-15175-trunk.1.patch
>
>
>  
> {panel:title=Crash exception}
> 2020-02-16 09:24:46,426 [507844305] - ERROR [Edit log 
> tailer:FSEditLogLoader@245] - Encountered exception on operation CloseOp 
> [length=0, inodeId=0, path=..., replication=3, mtime=1581816138774, 
> atime=1581814760398, blockSize=536870912, blocks=[blk_5568434562_4495417845], 
> permissions=da_music:hdfs:rw-r-----, aclEntries=null, clientName=, 
> clientMachine=, overwrite=false, storagePolicyId=0, opCode=OP_CLOSE, 
> txid=32625024993]
>  java.io.IOException: File is not under construction: ......
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:442)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:237)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:146)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:891)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:872)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:262)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:395)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:348)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:365)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:360)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1873)
>  at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:479)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:361)
> {panel}
>  
> {panel:title=Editlog}
> <RECORD>
>  <OPCODE>OP_REASSIGN_LEASE</OPCODE>
>  <DATA>
>  <TXID>32625021150</TXID>
>  <LEASEHOLDER>DFSClient_NONMAPREDUCE_-969060727_197760</LEASEHOLDER>
>  <PATH>......</PATH>
>  <NEWHOLDER>DFSClient_NONMAPREDUCE_1000868229_201260</NEWHOLDER>
>  </DATA>
>  </RECORD>
> ......
> <RECORD>
>  <OPCODE>OP_CLOSE</OPCODE>
>  <DATA>
>  <TXID>32625023743</TXID>
>  <LENGTH>0</LENGTH>
>  <INODEID>0</INODEID>
>  <PATH>......</PATH>
>  <REPLICATION>3</REPLICATION>
>  <MTIME>1581816135883</MTIME>
>  <ATIME>1581814760398</ATIME>
>  <BLOCKSIZE>536870912</BLOCKSIZE>
>  <CLIENT_NAME></CLIENT_NAME>
>  <CLIENT_MACHINE></CLIENT_MACHINE>
>  <OVERWRITE>false</OVERWRITE>
>  <BLOCK>
>  <BLOCK_ID>5568434562</BLOCK_ID>
>  <NUM_BYTES>185818644</NUM_BYTES>
>  <GENSTAMP>4495417845</GENSTAMP>
>  </BLOCK>
>  <PERMISSION_STATUS>
>  <USERNAME>da_music</USERNAME>
>  <GROUPNAME>hdfs</GROUPNAME>
>  <MODE>416</MODE>
>  </PERMISSION_STATUS>
>  </DATA>
>  </RECORD>
> ......
> <RECORD>
>  <OPCODE>OP_TRUNCATE</OPCODE>
>  <DATA>
>  <TXID>32625024049</TXID>
>  <SRC>......</SRC>
>  <CLIENTNAME>DFSClient_NONMAPREDUCE_1000868229_201260</CLIENTNAME>
>  <CLIENTMACHINE>......</CLIENTMACHINE>
>  <NEWLENGTH>185818644</NEWLENGTH>
>  <TIMESTAMP>1581816136336</TIMESTAMP>
>  <BLOCK>
>  <BLOCK_ID>5568434562</BLOCK_ID>
>  <NUM_BYTES>185818648</NUM_BYTES>
>  <GENSTAMP>4495417845</GENSTAMP>
>  </BLOCK>
>  </DATA>
>  </RECORD>
> ......
> <RECORD>
>  <OPCODE>OP_CLOSE</OPCODE>
>  <DATA>
>  <TXID>32625024993</TXID>
>  <LENGTH>0</LENGTH>
>  <INODEID>0</INODEID>
>  <PATH>......</PATH>
>  <REPLICATION>3</REPLICATION>
>  <MTIME>1581816138774</MTIME>
>  <ATIME>1581814760398</ATIME>
>  <BLOCKSIZE>536870912</BLOCKSIZE>
>  <CLIENT_NAME></CLIENT_NAME>
>  <CLIENT_MACHINE></CLIENT_MACHINE>
>  <OVERWRITE>false</OVERWRITE>
>  <BLOCK>
>  <BLOCK_ID>5568434562</BLOCK_ID>
>  <NUM_BYTES>185818644</NUM_BYTES>
>  <GENSTAMP>4495417845</GENSTAMP>
>  </BLOCK>
>  <PERMISSION_STATUS>
>  <USERNAME>da_music</USERNAME>
>  <GROUPNAME>hdfs</GROUPNAME>
>  <MODE>416</MODE>
>  </PERMISSION_STATUS>
>  </DATA>
>  </RECORD>
> {panel}
>  
>  
> The block size should be 185818648 in the first CloseOp. When truncate is 
> used, the block size becomes 185818644. The CloseOp/TruncateOp/CloseOp is 
> synchronized to the JournalNode in the same batch. The block used by CloseOp 
> twice is the same instance, which causes the first CloseOp has wrong block 
> size. When SNN rolling Editlog, TruncateOp does not make the file to the 
> UnderConstruction state. Then, when the second CloseOp is executed, the file 
> is not in the UnderConstruction state, and SNN crashes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

Reply via email to