[ https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16821795#comment-16821795 ]
angerszhu commented on HDFS-14437: ---------------------------------- [~hexiaoqiao] [~daryn] [~starphin] [~kihwal] [~arpitagarwal] When I do Unit test , I add some log in my code and print thread id in log4j. In TestEditLog#Transactions's run method, I add a call for #rollEditLog to simulate random multithreading call of logSync(). {code:java} @Override public void run() { PermissionStatus p = namesystem.createFsOwnerPermissions( new FsPermission((short)0777)); FSEditLog editLog = namesystem.getEditLog(); for (int i = 0; i < numTransactions; i++) { INodeFile inode = new INodeFile(namesystem.allocateNewInodeId(), null, p, 0L, 0L, BlockInfo.EMPTY_ARRAY, replication, blockSize, (byte)0); inode.toUnderConstruction("", ""); editLog.logOpenFile("/filename" + (startIndex + i), inode, false, false); editLog.logCloseFile("/filename" + (startIndex + i), inode); editLog.logSync(); if(i % 100 ==0){ try { editLog.rollEditLog(); } catch (IOException e) { e.printStackTrace(); } } } } {code} and I add some Log of these there transaction id: * txid : Current max transaction id of all thread * myTransactionId : max transaction id of current thread * syncTxId : current transaction if of EditLog that has been synced to QJM The Right #rollEditLog Event: see thread {code:java} TransactionThread-267{code} {code:java} 2019-04-19 16:32:08,068 INFO namenode.FSEditLog TransactionThread-267 - Rolling edit logs 2019-04-19 16:32:08,068 INFO namenode.FSEditLog TransactionThread-267 - Ending log segment 829 2019-04-19 16:32:08,068 INFO namenode.FSEditLog TransactionThread-267 - When endCurrentLogSegment was called : txid = 1286 2019-04-19 16:32:08,068 INFO namenode.FSEditLog TransactionThread-267 - When endCurrentLogSegment was called : myTransationId = 830 2019-04-19 16:32:08,068 INFO namenode.FSEditLog TransactionThread-267 - When endCurrentLogSegment was called : synctxid = 830 2019-04-19 16:32:08,068 INFO namenode.FSEditLog TransactionThread-267 - when logEdit, OpCode OP_END_LOG_SEGMENT 2019-04-19 16:32:08,068 INFO namenode.FSEditLog TransactionThread-267 - when logEdit, should not forceSync 2019-04-19 16:32:08,068 INFO namenode.FSEditLog TransactionThread-267 - Start LogSync 2019-04-19 16:32:08,068 INFO namenode.FSEditLog TransactionThread-267 - when logSyncAll, myTransactionId 1287 2019-04-19 16:32:08,068 INFO namenode.FSEditLog TransactionThread-267 - When logSync was called : txid = 1287 2019-04-19 16:32:08,068 INFO namenode.FSEditLog TransactionThread-267 - When logSync was called : myTransationId = 1287 2019-04-19 16:32:08,068 INFO namenode.FSEditLog TransactionThread-267 - When logSync was called : synctxid = 830 2019-04-19 16:32:08,068 INFO namenode.FSEditLog TransactionThread-267 - when logSync, start flush 2019-04-19 16:32:08,068 INFO namenode.FSEditLog TransactionThread-267 - when logSync, finish flush 2019-04-19 16:32:08,068 INFO namenode.FSEditLog TransactionThread-267 - Number of transactions: 459 Total time for transactions(ms): 9 Number of transactions batched in Syncs: 280 Number of syncs: 4 SyncTimes(ms): 0 3 2019-04-19 16:32:08,068 INFO namenode.FileJournalManager TransactionThread-267 - Finalizing edits file /Users/angerszhu/Documents/project/gdc/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1/current/edits_inprogress_0000000000000000829 -> /Users/angerszhu/Documents/project/gdc/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1/current/edits_0000000000000000829-0000000000000001287 2019-04-19 16:32:08,069 INFO namenode.FileJournalManager TransactionThread-267 - Finalizing edits file /Users/angerszhu/Documents/project/gdc/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name2/current/edits_inprogress_0000000000000000829 -> /Users/angerszhu/Documents/project/gdc/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name2/current/edits_0000000000000000829-0000000000000001287 2019-04-19 16:32:08,069 INFO namenode.FSEditLog TransactionThread-267 - Finish Ending log segment 20 {code} we can see that the sequence of this three Id is Ok . The Error #rollEditLog Case: see {code:java} TransactionThread-226 {code} {code:java} 2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-226 - When logSync was called : txid = 760 2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-226 - When logSync was called : myTransationId = 19 2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-226 - When logSync was called : synctxid = 23 2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-226 - Rolling edit logs 2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-226 - Ending log segment 3 2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-226 - When endCurrentLogSegment was called : txid = 760 2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-226 - When endCurrentLogSegment was called : myTransationId = 19 2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-226 - When endCurrentLogSegment was called : synctxid = 23 2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-226 - when logEdit, OpCode OP_END_LOG_SEGMENT 2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-226 - when logEdit, should not forceSync 2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-226 - Start LogSync 2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-226 - when logSyncAll, myTransactionId 761 2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-265 - when logEdit, OpCode OP_ADD 2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-265 - when logEdit, should not forceSync 2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-264 - when logEdit, OpCode OP_ADD 2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-264 - when logEdit, should not forceSync 2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-227 - when logEdit, OpCode OP_CLOSE 2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-227 - when logEdit, should not forceSync 2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-263 - when logEdit, OpCode OP_ADD 2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-263 - when logEdit, should not forceSync 2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-228 - when logEdit, OpCode OP_CLOSE 2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-228 - when logEdit, should not forceSync 2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-262 - when logEdit, OpCode OP_ADD 2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-262 - when logEdit, should not forceSync 2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-261 - when logEdit, OpCode OP_ADD 2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-261 - when logEdit, should not forceSync 2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-261 - when logEdit, OpCode OP_CLOSE 2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-261 - when logEdit, should not forceSync 2019-04-19 16:32:07,979 INFO namenode.FSEditLog TransactionThread-229 - when logEdit, OpCode OP_CLOSE 2019-04-19 16:32:07,979 INFO namenode.FSEditLog TransactionThread-229 - when logEdit, should not forceSync 2019-04-19 16:32:07,979 INFO namenode.FSEditLog TransactionThread-260 - when logEdit, OpCode OP_ADD 2019-04-19 16:32:07,979 INFO namenode.FSEditLog TransactionThread-260 - when logEdit, should not forceSync ................ .................. 2019-04-19 16:32:07,982 INFO namenode.FSEditLog TransactionThread-260 - when logEdit, should not forceSync 2019-04-19 16:32:07,982 INFO namenode.FSEditLog TransactionThread-262 - when logEdit, OpCode OP_CLOSE 2019-04-19 16:32:07,982 INFO namenode.FSEditLog TransactionThread-262 - when logEdit, should not forceSync 2019-04-19 16:32:07,982 INFO namenode.FSEditLog TransactionThread-263 - when logEdit, OpCode OP_CLOSE 2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-263 - when logEdit, should not forceSync 2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-264 - when logEdit, OpCode OP_CLOSE 2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-264 - when logEdit, should not forceSync 2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-265 - when logEdit, OpCode OP_CLOSE 2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-265 - when logEdit, should not forceSync 2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-266 - when logEdit, OpCode OP_CLOSE 2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-266 - when logEdit, should not forceSync 2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-225 - Rolling edit logs 2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-225 - Ending log segment 3 2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-225 - When endCurrentLogSegment was called : txid = 827 2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-225 - When endCurrentLogSegment was called : myTransationId = 758 2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-225 - When endCurrentLogSegment was called : synctxid = 758 2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-225 - when logEdit, OpCode OP_END_LOG_SEGMENT 2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-225 - when logEdit, should not forceSync 2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-225 - Start LogSync 2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-225 - when logSyncAll, myTransactionId 828 2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-225 - When logSync was called : txid = 828 2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-225 - When logSync was called : myTransationId = 828 2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-225 - When logSync was called : synctxid = 758 2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-225 - when logSync, start flush 2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-225 - when logSync, finish flush 2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-225 - Number of transactions: 826 Total time for transactions(ms): 55 Number of transactions batched in Syncs: 1 Number of syncs: 5 SyncTimes(ms): 1 2 2019-04-19 16:32:07,984 INFO namenode.FileJournalManager TransactionThread-225 - Finalizing edits file /Users/angerszhu/Documents/project/gdc/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1/current/edits_inprogress_0000000000000000003 -> /Users/angerszhu/Documents/project/gdc/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1/current/edits_0000000000000000003-0000000000000000828 2019-04-19 16:32:07,984 INFO namenode.FileJournalManager TransactionThread-225 - Finalizing edits file /Users/angerszhu/Documents/project/gdc/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name2/current/edits_inprogress_0000000000000000003 -> /Users/angerszhu/Documents/project/gdc/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name2/current/edits_0000000000000000003-0000000000000000828 2019-04-19 16:32:07,984 INFO namenode.FSEditLog TransactionThread-225 - Finish Ending log segment 2019-04-19 16:32:07,984 INFO namenode.FSEditLog TransactionThread-225 - Starting log segment at 829 2019-04-19 16:32:07,985 INFO namenode.FSEditLog TransactionThread-225 - when logEdit, OpCode OP_START_LOG_SEGMENT 2019-04-19 16:32:07,985 INFO namenode.FSEditLog TransactionThread-225 - when logEdit, should not forceSync 2019-04-19 16:32:07,985 INFO namenode.FSEditLog TransactionThread-225 - When logSync was called : txid = 829 2019-04-19 16:32:07,985 INFO namenode.FSEditLog TransactionThread-225 - When logSync was called : myTransationId = 829 2019-04-19 16:32:07,986 INFO namenode.FSEditLog TransactionThread-225 - When logSync was called : synctxid = 828 2019-04-19 16:32:07,986 INFO namenode.FSEditLog TransactionThread-225 - when logSync, start flush 2019-04-19 16:32:07,986 INFO namenode.FSEditLog TransactionThread-225 - when logSync, finish flush 2019-04-19 16:32:07,986 INFO namenode.FSEditLog TransactionThread-225 - Finish Rolling edit logs 2019-04-19 16:32:07,986 INFO namenode.FSEditLog TransactionThread-267 - when logEdit, OpCode OP_CLOSE 2019-04- ................. ................ 2019-04-19 16:32:08,080 INFO namenode.FSEditLog TransactionThread-227 - When endCurrentLogSegment was called : txid = 1328 2019-04-19 16:32:08,081 INFO namenode.FSEditLog TransactionThread-227 - When endCurrentLogSegment was called : myTransationId = 764 2019-04-19 16:32:08,081 INFO namenode.FSEditLog TransactionThread-227 - When endCurrentLogSegment was called : synctxid = 1288 2019-04-19 16:32:08,081 INFO namenode.FSEditLog TransactionThread-227 - when logEdit, OpCode OP_END_LOG_SEGMENT 2019-04-19 16:32:08,081 INFO namenode.FSEditLog TransactionThread-227 - when logEdit, should not forceSync 2019-04-19 16:32:08,081 INFO namenode.FSEditLog TransactionThread-227 - Start LogSync 2019-04-19 16:32:08,081 INFO namenode.FSEditLog TransactionThread-227 - when logSyncAll, myTransactionId 1329 2019-04-19 16:32:08,081 INFO namenode.FSEditLog TransactionThread-226 - When logSync was called : txid = 1329 2019-04-19 16:32:08,081 INFO namenode.FSEditLog TransactionThread-226 - When logSync was called : myTransationId = 761 2019-04-19 16:32:08,081 INFO namenode.FSEditLog TransactionThread-226 - When logSync was called : synctxid = 1288 2019-04-19 16:32:08,081 INFO namenode.FSEditLog TransactionThread-226 - Number of transactions: 42 Total time for transactions(ms): 2 Number of transactions batched in Syncs: 41 Number of syncs: 3 SyncTimes(ms): 1 1 2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - The edits buffer is 663 bytes long with 39 unflushed transactions. Below is the list of unflushed transactions: 2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed op [0]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1291] 2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed op [1]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1292] 2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed op [2]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1293] 2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed op [3]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1294] 2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed op [4]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1295] 2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed op [5]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1296] 2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed op [6]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1297] 2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed op [7]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1298] 2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed op [8]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1299] 2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed op [9]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1300] 2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed op [10]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1301] 2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed op [11]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1302] 2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed op [12]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1303] 2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed op [13]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1304] 2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed op [14]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1305] 2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed op [15]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1306] 2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed op [16]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1307] 2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed op [17]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1308] 2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed op [18]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1309] 2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed op [19]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1310] 2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed op [20]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1311] 2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed op [21]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1312] 2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed op [22]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1313] 2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed op [23]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1314] 2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed op [24]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1315] 2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed op [25]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1316] 2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed op [26]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1317] 2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed op [27]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1318] 2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed op [28]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1319] 2019-04-19 16:32:08,085 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed op [29]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1320] 2019-04-19 16:32:08,085 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed op [30]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1321] 2019-04-19 16:32:08,085 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed op [31]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1322] 2019-04-19 16:32:08,085 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed op [32]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1323] 2019-04-19 16:32:08,085 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed op [33]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1324] 2019-04-19 16:32:08,085 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed op [34]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1325] 2019-04-19 16:32:08,085 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed op [35]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1326] 2019-04-19 16:32:08,085 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed op [36]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1327] 2019-04-19 16:32:08,085 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed op [37]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1328] 2019-04-19 16:32:08,085 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed op [38]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1329] 2019-04-19 16:32:08,085 ERROR namenode.FSEditLog TransactionThread-226 - Error: finalize log segment 1288, 1329 failed for (journal JournalAndStream(mgr=FileJournalManager(root=/Users/angerszhu/Documents/project/gdc/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1), stream=EditLogFileOutputStream(/Users/angerszhu/Documents/project/gdc/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1/current/edits_inprogress_0000000000000001288))) java.io.IOException: FSEditStream has 663 bytes still to be flushed and cannot be closed. at org.apache.hadoop.hdfs.server.namenode.EditsDoubleBuffer.close(EditsDoubleBuffer.java:73) at org.apache.hadoop.hdfs.server.namenode.EditLogFileOutputStream.close(EditLogFileOutputStream.java:151) at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalAndStream.closeStream(JournalSet.java:115) at org.apache.hadoop.hdfs.server.namenode.JournalSet$4.apply(JournalSet.java:235) at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393) at org.apache.hadoop.hdfs.server.namenode.JournalSet.finalizeLogSegment(JournalSet.java:231) at org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:1259) at org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:1182) at org.apache.hadoop.hdfs.server.namenode.TestEditLog$Transactions.run(TestEditLog.java:223) at java.lang.Thread.run(Thread.java:748) {code} For the TransactionThread-226, we can see that it's interrupted, During the whole cycle of #rollEditLog, it was been interrupted by other thread's evnent. And this make the #syncTxId will be larger then the old thread that call #rollEditLog and been interrupted. Finally because: myTxid <= synctxid, it just returned here. {code:java} if (mytxid <= synctxid) { numTransactionsBatchedInSync++; if (metrics != null) { // Metrics is non-null only when used inside name node metrics.incrTransactionsBatchedInSync(); } return; } {code} But when this time. bufCurrent will still have data not been flushed. Then cause that error. But I can't say why it's interrupted since there are many way's will call logSync and I don't know more detail of the whole HA and QJM. > Exception happened when rollEditLog expects empty > EditsDoubleBuffer.bufCurrent but not > ----------------------------------------------------------------------------------------- > > Key: HDFS-14437 > URL: https://issues.apache.org/jira/browse/HDFS-14437 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, namenode, qjm > Reporter: angerszhu > Priority: Major > > For the problem mentioned in https://issues.apache.org/jira/browse/HDFS-10943 > , I have sort the process of write and flush EditLog and some important > function, I found the in the class FSEditLog class, the close() function > will call such process like below: > > {code:java} > waitForSyncToFinish(); > endCurrentLogSegment(true);{code} > since we have gain the object lock in the function close(), so when > waitForSyncToFish() method return, it mean all logSync job has done and all > data in bufReady has been flushed out, and since current thread has the lock > of this object, when call endCurrentLogSegment(), no other thread will gain > the lock so they can't write new editlog into currentBuf. > But when we don't call waitForSyncToFish() before endCurrentLogSegment(), > there may be some autoScheduled logSync()'s flush process is doing, since > this process don't need > synchronization since it has mention in the comment of logSync() method : > > {code:java} > /** > * Sync all modifications done by this thread. > * > * The internal concurrency design of this class is as follows: > * - Log items are written synchronized into an in-memory buffer, > * and each assigned a transaction ID. > * - When a thread (client) would like to sync all of its edits, logSync() > * uses a ThreadLocal transaction ID to determine what edit number must > * be synced to. > * - The isSyncRunning volatile boolean tracks whether a sync is currently > * under progress. > * > * The data is double-buffered within each edit log implementation so that > * in-memory writing can occur in parallel with the on-disk writing. > * > * Each sync occurs in three steps: > * 1. synchronized, it swaps the double buffer and sets the isSyncRunning > * flag. > * 2. unsynchronized, it flushes the data to storage > * 3. synchronized, it resets the flag and notifies anyone waiting on the > * sync. > * > * The lack of synchronization on step 2 allows other threads to continue > * to write into the memory buffer while the sync is in progress. > * Because this step is unsynchronized, actions that need to avoid > * concurrency with sync() should be synchronized and also call > * waitForSyncToFinish() before assuming they are running alone. > */ > public void logSync() { > long syncStart = 0; > // Fetch the transactionId of this thread. > long mytxid = myTransactionId.get().txid; > > boolean sync = false; > try { > EditLogOutputStream logStream = null; > synchronized (this) { > try { > printStatistics(false); > // if somebody is already syncing, then wait > while (mytxid > synctxid && isSyncRunning) { > try { > wait(1000); > } catch (InterruptedException ie) { > } > } > // > // If this transaction was already flushed, then nothing to do > // > if (mytxid <= synctxid) { > numTransactionsBatchedInSync++; > if (metrics != null) { > // Metrics is non-null only when used inside name node > metrics.incrTransactionsBatchedInSync(); > } > return; > } > > // now, this thread will do the sync > syncStart = txid; > isSyncRunning = true; > sync = true; > // swap buffers > try { > if (journalSet.isEmpty()) { > throw new IOException("No journals available to flush"); > } > editLogStream.setReadyToFlush(); > } catch (IOException e) { > final String msg = > "Could not sync enough journals to persistent storage " + > "due to " + e.getMessage() + ". " + > "Unsynced transactions: " + (txid - synctxid); > LOG.fatal(msg, new Exception()); > synchronized(journalSetLock) { > IOUtils.cleanup(LOG, journalSet); > } > terminate(1, msg); > } > } finally { > // Prevent RuntimeException from blocking other log edit write > doneWithAutoSyncScheduling(); > } > //editLogStream may become null, > //so store a local variable for flush. > logStream = editLogStream; > } > > // do the sync > long start = now(); > try { > if (logStream != null) { > logStream.flush(); > } > } catch (IOException ex) { > synchronized (this) { > final String msg = > "Could not sync enough journals to persistent storage. " > + "Unsynced transactions: " + (txid - synctxid); > LOG.fatal(msg, new Exception()); > synchronized(journalSetLock) { > IOUtils.cleanup(LOG, journalSet); > } > terminate(1, msg); > } > } > long elapsed = now() - start; > if (metrics != null) { // Metrics non-null only when used inside name node > metrics.addSync(elapsed); > } > > } finally { > // Prevent RuntimeException from blocking other log edit sync > synchronized (this) { > if (sync) { > synctxid = syncStart; > for (JournalManager jm : journalSet.getJournalManagers()) { > /** > * {@link FileJournalManager#lastReadableTxId} is only meaningful > * for file-based journals. Therefore the interface is not added to > * other types of {@link JournalManager}. > */ > if (jm instanceof FileJournalManager) { > ((FileJournalManager)jm).setLastReadableTxId(syncStart); > } > } > isSyncRunning = false; > } > this.notifyAll(); > } > } > } > {code} > So when not call waitForSyncFinish() before endCurrentLogSegment() will came > to a situation that when it can't guarantee that when call > endCurrentLogSegment() method, there is no flush job was doing. Then in the > endCurrentLogSegment() method process , bufReady may haven't been flushed out > totally, then it swap with the bufCurrent, finally when call > EditLogOutputStream's close() function, there is still un-flushed bytes in > bufCurrent then cause the error in > https://issues.apache.org/jira/browse/HDFS-10943 > > so maybe we should add waitForSyncFinish() before endCurrentLogSegment() > method in rollEditLog() method in FSEditLog class ? > {code:java} > synchronized long rollEditLog() throws IOException { > LOG.info("Rolling edit logs"); > waitForSyncToFinish(); > endCurrentLogSegment(true); > > long nextTxId = getLastWrittenTxId() + 1; > startLogSegment(nextTxId, true); > > assert curSegmentTxId == nextTxId; > return nextTxId; > }{code} > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org