[ https://issues.apache.org/jira/browse/HDFS-15175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388275#comment-17388275 ]
Stephen O'Donnell commented on HDFS-15175:
------------------------------------------

This is a tricky problem. If I understand correctly, the issue is that with async edit logging, some number of edits sit in memory waiting to be flushed to disk. Those edit ops hold a reference to a Block object which is mutable and can change before they are flushed; if that Block object changes, the information in all the pending edits changes with it. This goes further than just the Block objects: even the list of blocks for a given file is a mutable list, and the edit op holds a reference to it, so if the list of blocks changes, what is written to the edit log changes too.

One solution (which is not really feasible) is to make the Block object immutable, so that a new instance is created whenever it changes. That would be difficult, as Block is used in many places. It would be more efficient than deep-copying every object for edits, but perhaps not in other parts of the NN. I don't see a better way than making a copy of the mutable data when it is stored into the pending edit op.
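To illustrate the aliasing hazard described above, here is a minimal, self-contained sketch (the `Block` and `CloseOp` classes below are simplified stand-ins, not Hadoop's actual `org.apache.hadoop.hdfs.protocol.Block` or `FSEditLogOp` classes): an op that aliases the live mutable block sees a later truncate, while an op that snapshots a copy at enqueue time does not.

```java
public class SharedBlockSketch {
    // Stand-in for the mutable Block object referenced by pending edit ops.
    static class Block {
        long numBytes;
        Block(long numBytes) { this.numBytes = numBytes; }
        Block copyOf() { return new Block(numBytes); }  // defensive copy
    }

    // Stand-in for a CloseOp waiting in the async edit-log flush queue.
    static class CloseOp {
        final Block block;
        CloseOp(Block block) { this.block = block; }
    }

    public static void main(String[] args) {
        Block shared = new Block(185818648L);

        // Buggy path: the queued op aliases the live block.
        CloseOp aliased = new CloseOp(shared);
        // Fixed path: the queued op snapshots the block at enqueue time.
        CloseOp copied = new CloseOp(shared.copyOf());

        // A truncate shrinks the live block before the ops are flushed.
        shared.numBytes = 185818644L;

        System.out.println(aliased.block.numBytes); // mutated: 185818644
        System.out.println(copied.block.numBytes);  // preserved: 185818648
    }
}
```

This is the essence of the copy-on-enqueue fix: the cost is one small allocation per op, paid only on the edit-log path rather than everywhere Block is used.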
> Multiple CloseOp shared block instance causes the standby namenode to crash
> when rolling editlog
> ------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-15175
>                 URL: https://issues.apache.org/jira/browse/HDFS-15175
>             Project: Hadoop HDFS
>          Issue Type: Bug
>   Affects Versions: 2.9.2
>           Reporter: Yicong Cai
>           Assignee: Wan Chang
>           Priority: Critical
>             Labels: NameNode
>        Attachments: HDFS-15175-trunk.1.patch
>
> {panel:title=Crash exception}
> 2020-02-16 09:24:46,426 [507844305] - ERROR [Edit log tailer:FSEditLogLoader@245] - Encountered exception on operation CloseOp [length=0, inodeId=0, path=..., replication=3, mtime=1581816138774, atime=1581814760398, blockSize=536870912, blocks=[blk_5568434562_4495417845], permissions=da_music:hdfs:rw-r-----, aclEntries=null, clientName=, clientMachine=, overwrite=false, storagePolicyId=0, opCode=OP_CLOSE, txid=32625024993]
> java.io.IOException: File is not under construction: ......
> 	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:442)
> 	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:237)
> 	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:146)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:891)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:872)
> 	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:262)
> 	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:395)
> 	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:348)
> 	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:365)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:360)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1873)
> 	at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:479)
> 	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:361)
> {panel}
>
> {panel:title=Editlog}
> <RECORD>
>   <OPCODE>OP_REASSIGN_LEASE</OPCODE>
>   <DATA>
>     <TXID>32625021150</TXID>
>     <LEASEHOLDER>DFSClient_NONMAPREDUCE_-969060727_197760</LEASEHOLDER>
>     <PATH>......</PATH>
>     <NEWHOLDER>DFSClient_NONMAPREDUCE_1000868229_201260</NEWHOLDER>
>   </DATA>
> </RECORD>
> ......
> <RECORD>
>   <OPCODE>OP_CLOSE</OPCODE>
>   <DATA>
>     <TXID>32625023743</TXID>
>     <LENGTH>0</LENGTH>
>     <INODEID>0</INODEID>
>     <PATH>......</PATH>
>     <REPLICATION>3</REPLICATION>
>     <MTIME>1581816135883</MTIME>
>     <ATIME>1581814760398</ATIME>
>     <BLOCKSIZE>536870912</BLOCKSIZE>
>     <CLIENT_NAME></CLIENT_NAME>
>     <CLIENT_MACHINE></CLIENT_MACHINE>
>     <OVERWRITE>false</OVERWRITE>
>     <BLOCK>
>       <BLOCK_ID>5568434562</BLOCK_ID>
>       <NUM_BYTES>185818644</NUM_BYTES>
>       <GENSTAMP>4495417845</GENSTAMP>
>     </BLOCK>
>     <PERMISSION_STATUS>
>       <USERNAME>da_music</USERNAME>
>       <GROUPNAME>hdfs</GROUPNAME>
>       <MODE>416</MODE>
>     </PERMISSION_STATUS>
>   </DATA>
> </RECORD>
> ......
> <RECORD>
>   <OPCODE>OP_TRUNCATE</OPCODE>
>   <DATA>
>     <TXID>32625024049</TXID>
>     <SRC>......</SRC>
>     <CLIENTNAME>DFSClient_NONMAPREDUCE_1000868229_201260</CLIENTNAME>
>     <CLIENTMACHINE>......</CLIENTMACHINE>
>     <NEWLENGTH>185818644</NEWLENGTH>
>     <TIMESTAMP>1581816136336</TIMESTAMP>
>     <BLOCK>
>       <BLOCK_ID>5568434562</BLOCK_ID>
>       <NUM_BYTES>185818648</NUM_BYTES>
>       <GENSTAMP>4495417845</GENSTAMP>
>     </BLOCK>
>   </DATA>
> </RECORD>
> ......
> <RECORD>
>   <OPCODE>OP_CLOSE</OPCODE>
>   <DATA>
>     <TXID>32625024993</TXID>
>     <LENGTH>0</LENGTH>
>     <INODEID>0</INODEID>
>     <PATH>......</PATH>
>     <REPLICATION>3</REPLICATION>
>     <MTIME>1581816138774</MTIME>
>     <ATIME>1581814760398</ATIME>
>     <BLOCKSIZE>536870912</BLOCKSIZE>
>     <CLIENT_NAME></CLIENT_NAME>
>     <CLIENT_MACHINE></CLIENT_MACHINE>
>     <OVERWRITE>false</OVERWRITE>
>     <BLOCK>
>       <BLOCK_ID>5568434562</BLOCK_ID>
>       <NUM_BYTES>185818644</NUM_BYTES>
>       <GENSTAMP>4495417845</GENSTAMP>
>     </BLOCK>
>     <PERMISSION_STATUS>
>       <USERNAME>da_music</USERNAME>
>       <GROUPNAME>hdfs</GROUPNAME>
>       <MODE>416</MODE>
>     </PERMISSION_STATUS>
>   </DATA>
> </RECORD>
> {panel}
>
> The block size should be 185818648 in the first CloseOp. When truncate is
> used, the block size becomes 185818644. The CloseOp/TruncateOp/CloseOp is
> synchronized to the JournalNode in the same batch.
> The block used by the two CloseOps is the same instance, which causes the
> first CloseOp to record the wrong block size. When the SNN rolls the edit
> log, TruncateOp does not put the file into the UnderConstruction state, so
> when the second CloseOp is applied the file is not under construction and
> the SNN crashes.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)