[ 
https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16821795#comment-16821795
 ] 

angerszhu commented on HDFS-14437:
----------------------------------

[~hexiaoqiao] [~daryn] [~starphin] [~kihwal] [~arpitagarwal]

 

When I do Unit test , I add some log in my code and print thread id in log4j.

In TestEditLog#Transactions's run method, I add a call for #rollEditLog to 
simulate random multithreading call of logSync().

 
{code:java}
@Override
public void run() {
 PermissionStatus p = namesystem.createFsOwnerPermissions(
 new FsPermission((short)0777));
 FSEditLog editLog = namesystem.getEditLog();

 for (int i = 0; i < numTransactions; i++) {
 INodeFile inode = new INodeFile(namesystem.allocateNewInodeId(), null,
 p, 0L, 0L, BlockInfo.EMPTY_ARRAY, replication, blockSize, (byte)0);
 inode.toUnderConstruction("", "");

 editLog.logOpenFile("/filename" + (startIndex + i), inode, false, false);
 editLog.logCloseFile("/filename" + (startIndex + i), inode);
 editLog.logSync();
 if(i % 100 ==0){
 try {
 editLog.rollEditLog();
 } catch (IOException e) {
 e.printStackTrace();
 }
 }
 }
}

{code}
 

and I add some Log of these there transaction id: 
 * txid  : Current max transaction id of all thread
 * myTransactionId : max transaction id of current thread
 * syncTxId : current transaction if of EditLog that has been synced to QJM

The Right #rollEditLog Event: see thread 
{code:java}
TransactionThread-267{code}
 
{code:java}
2019-04-19 16:32:08,068 INFO namenode.FSEditLog TransactionThread-267 - Rolling 
edit logs
2019-04-19 16:32:08,068 INFO namenode.FSEditLog TransactionThread-267 - Ending 
log segment 829
2019-04-19 16:32:08,068 INFO namenode.FSEditLog TransactionThread-267 - When 
endCurrentLogSegment was called : txid = 1286
2019-04-19 16:32:08,068 INFO namenode.FSEditLog TransactionThread-267 - When 
endCurrentLogSegment was called : myTransationId = 830
2019-04-19 16:32:08,068 INFO namenode.FSEditLog TransactionThread-267 - When 
endCurrentLogSegment was called : synctxid = 830
2019-04-19 16:32:08,068 INFO namenode.FSEditLog TransactionThread-267 - when 
logEdit, OpCode OP_END_LOG_SEGMENT
2019-04-19 16:32:08,068 INFO namenode.FSEditLog TransactionThread-267 - when 
logEdit, should not forceSync
2019-04-19 16:32:08,068 INFO namenode.FSEditLog TransactionThread-267 - Start 
LogSync
2019-04-19 16:32:08,068 INFO namenode.FSEditLog TransactionThread-267 - when 
logSyncAll, myTransactionId 1287
2019-04-19 16:32:08,068 INFO namenode.FSEditLog TransactionThread-267 - When 
logSync was called : txid = 1287
2019-04-19 16:32:08,068 INFO namenode.FSEditLog TransactionThread-267 - When 
logSync was called : myTransationId = 1287
2019-04-19 16:32:08,068 INFO namenode.FSEditLog TransactionThread-267 - When 
logSync was called : synctxid = 830
2019-04-19 16:32:08,068 INFO namenode.FSEditLog TransactionThread-267 - when 
logSync, start flush
2019-04-19 16:32:08,068 INFO namenode.FSEditLog TransactionThread-267 - when 
logSync, finish flush
2019-04-19 16:32:08,068 INFO namenode.FSEditLog TransactionThread-267 - Number 
of transactions: 459 Total time for transactions(ms): 9 Number of transactions 
batched in Syncs: 280 Number of syncs: 4 SyncTimes(ms): 0 3
2019-04-19 16:32:08,068 INFO namenode.FileJournalManager TransactionThread-267 
- Finalizing edits file 
/Users/angerszhu/Documents/project/gdc/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1/current/edits_inprogress_0000000000000000829
 -> 
/Users/angerszhu/Documents/project/gdc/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1/current/edits_0000000000000000829-0000000000000001287
2019-04-19 16:32:08,069 INFO namenode.FileJournalManager TransactionThread-267 
- Finalizing edits file 
/Users/angerszhu/Documents/project/gdc/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name2/current/edits_inprogress_0000000000000000829
 -> 
/Users/angerszhu/Documents/project/gdc/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name2/current/edits_0000000000000000829-0000000000000001287
2019-04-19 16:32:08,069 INFO namenode.FSEditLog TransactionThread-267 - Finish 
Ending log segment
20
{code}
we can see that the sequence of this three Id is Ok . 

 

 

The Error #rollEditLog Case: see 
{code:java}
TransactionThread-226 {code}
 
{code:java}
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-226 - When 
logSync was called : txid = 760
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-226 - When 
logSync was called : myTransationId = 19
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-226 - When 
logSync was called : synctxid = 23
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-226 - Rolling 
edit logs
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-226 - Ending 
log segment 3
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-226 - When 
endCurrentLogSegment was called : txid = 760
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-226 - When 
endCurrentLogSegment was called : myTransationId = 19
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-226 - When 
endCurrentLogSegment was called : synctxid = 23
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-226 - when 
logEdit, OpCode OP_END_LOG_SEGMENT
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-226 - when 
logEdit, should not forceSync
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-226 - Start 
LogSync
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-226 - when 
logSyncAll, myTransactionId 761
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-265 - when 
logEdit, OpCode OP_ADD
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-265 - when 
logEdit, should not forceSync
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-264 - when 
logEdit, OpCode OP_ADD
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-264 - when 
logEdit, should not forceSync
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-227 - when 
logEdit, OpCode OP_CLOSE
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-227 - when 
logEdit, should not forceSync
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-263 - when 
logEdit, OpCode OP_ADD
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-263 - when 
logEdit, should not forceSync
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-228 - when 
logEdit, OpCode OP_CLOSE
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-228 - when 
logEdit, should not forceSync
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-262 - when 
logEdit, OpCode OP_ADD
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-262 - when 
logEdit, should not forceSync
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-261 - when 
logEdit, OpCode OP_ADD
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-261 - when 
logEdit, should not forceSync
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-261 - when 
logEdit, OpCode OP_CLOSE
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-261 - when 
logEdit, should not forceSync
2019-04-19 16:32:07,979 INFO namenode.FSEditLog TransactionThread-229 - when 
logEdit, OpCode OP_CLOSE
2019-04-19 16:32:07,979 INFO namenode.FSEditLog TransactionThread-229 - when 
logEdit, should not forceSync
2019-04-19 16:32:07,979 INFO namenode.FSEditLog TransactionThread-260 - when 
logEdit, OpCode OP_ADD
2019-04-19 16:32:07,979 INFO namenode.FSEditLog TransactionThread-260 - when 
logEdit, should not forceSync

................
..................
2019-04-19 16:32:07,982 INFO namenode.FSEditLog TransactionThread-260 - when 
logEdit, should not forceSync
2019-04-19 16:32:07,982 INFO namenode.FSEditLog TransactionThread-262 - when 
logEdit, OpCode OP_CLOSE
2019-04-19 16:32:07,982 INFO namenode.FSEditLog TransactionThread-262 - when 
logEdit, should not forceSync
2019-04-19 16:32:07,982 INFO namenode.FSEditLog TransactionThread-263 - when 
logEdit, OpCode OP_CLOSE
2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-263 - when 
logEdit, should not forceSync
2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-264 - when 
logEdit, OpCode OP_CLOSE
2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-264 - when 
logEdit, should not forceSync
2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-265 - when 
logEdit, OpCode OP_CLOSE
2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-265 - when 
logEdit, should not forceSync
2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-266 - when 
logEdit, OpCode OP_CLOSE
2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-266 - when 
logEdit, should not forceSync
2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-225 - Rolling 
edit logs
2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-225 - Ending 
log segment 3
2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-225 - When 
endCurrentLogSegment was called : txid = 827
2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-225 - When 
endCurrentLogSegment was called : myTransationId = 758
2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-225 - When 
endCurrentLogSegment was called : synctxid = 758
2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-225 - when 
logEdit, OpCode OP_END_LOG_SEGMENT
2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-225 - when 
logEdit, should not forceSync
2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-225 - Start 
LogSync
2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-225 - when 
logSyncAll, myTransactionId 828
2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-225 - When 
logSync was called : txid = 828
2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-225 - When 
logSync was called : myTransationId = 828
2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-225 - When 
logSync was called : synctxid = 758
2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-225 - when 
logSync, start flush
2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-225 - when 
logSync, finish flush
2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-225 - Number 
of transactions: 826 Total time for transactions(ms): 55 Number of transactions 
batched in Syncs: 1 Number of syncs: 5 SyncTimes(ms): 1 2
2019-04-19 16:32:07,984 INFO namenode.FileJournalManager TransactionThread-225 
- Finalizing edits file 
/Users/angerszhu/Documents/project/gdc/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1/current/edits_inprogress_0000000000000000003
 -> 
/Users/angerszhu/Documents/project/gdc/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1/current/edits_0000000000000000003-0000000000000000828
2019-04-19 16:32:07,984 INFO namenode.FileJournalManager TransactionThread-225 
- Finalizing edits file 
/Users/angerszhu/Documents/project/gdc/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name2/current/edits_inprogress_0000000000000000003
 -> 
/Users/angerszhu/Documents/project/gdc/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name2/current/edits_0000000000000000003-0000000000000000828
2019-04-19 16:32:07,984 INFO namenode.FSEditLog TransactionThread-225 - Finish 
Ending log segment
2019-04-19 16:32:07,984 INFO namenode.FSEditLog TransactionThread-225 - 
Starting log segment at 829
2019-04-19 16:32:07,985 INFO namenode.FSEditLog TransactionThread-225 - when 
logEdit, OpCode OP_START_LOG_SEGMENT
2019-04-19 16:32:07,985 INFO namenode.FSEditLog TransactionThread-225 - when 
logEdit, should not forceSync
2019-04-19 16:32:07,985 INFO namenode.FSEditLog TransactionThread-225 - When 
logSync was called : txid = 829
2019-04-19 16:32:07,985 INFO namenode.FSEditLog TransactionThread-225 - When 
logSync was called : myTransationId = 829
2019-04-19 16:32:07,986 INFO namenode.FSEditLog TransactionThread-225 - When 
logSync was called : synctxid = 828
2019-04-19 16:32:07,986 INFO namenode.FSEditLog TransactionThread-225 - when 
logSync, start flush
2019-04-19 16:32:07,986 INFO namenode.FSEditLog TransactionThread-225 - when 
logSync, finish flush
2019-04-19 16:32:07,986 INFO namenode.FSEditLog TransactionThread-225 - Finish 
Rolling edit logs
2019-04-19 16:32:07,986 INFO namenode.FSEditLog TransactionThread-267 - when 
logEdit, OpCode OP_CLOSE
2019-04-
.................
................


2019-04-19 16:32:08,080 INFO namenode.FSEditLog TransactionThread-227 - When 
endCurrentLogSegment was called : txid = 1328
2019-04-19 16:32:08,081 INFO namenode.FSEditLog TransactionThread-227 - When 
endCurrentLogSegment was called : myTransationId = 764
2019-04-19 16:32:08,081 INFO namenode.FSEditLog TransactionThread-227 - When 
endCurrentLogSegment was called : synctxid = 1288
2019-04-19 16:32:08,081 INFO namenode.FSEditLog TransactionThread-227 - when 
logEdit, OpCode OP_END_LOG_SEGMENT
2019-04-19 16:32:08,081 INFO namenode.FSEditLog TransactionThread-227 - when 
logEdit, should not forceSync
2019-04-19 16:32:08,081 INFO namenode.FSEditLog TransactionThread-227 - Start 
LogSync
2019-04-19 16:32:08,081 INFO namenode.FSEditLog TransactionThread-227 - when 
logSyncAll, myTransactionId 1329
2019-04-19 16:32:08,081 INFO namenode.FSEditLog TransactionThread-226 - When 
logSync was called : txid = 1329
2019-04-19 16:32:08,081 INFO namenode.FSEditLog TransactionThread-226 - When 
logSync was called : myTransationId = 761
2019-04-19 16:32:08,081 INFO namenode.FSEditLog TransactionThread-226 - When 
logSync was called : synctxid = 1288
2019-04-19 16:32:08,081 INFO namenode.FSEditLog TransactionThread-226 - Number 
of transactions: 42 Total time for transactions(ms): 2 Number of transactions 
batched in Syncs: 41 Number of syncs: 3 SyncTimes(ms): 1 1
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - 
The edits buffer is 663 bytes long with 39 unflushed transactions. Below is the 
list of unflushed transactions:
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - 
Unflushed op [0]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1291]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - 
Unflushed op [1]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1292]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - 
Unflushed op [2]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1293]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - 
Unflushed op [3]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1294]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - 
Unflushed op [4]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1295]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - 
Unflushed op [5]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1296]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - 
Unflushed op [6]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1297]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - 
Unflushed op [7]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1298]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - 
Unflushed op [8]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1299]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - 
Unflushed op [9]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1300]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - 
Unflushed op [10]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1301]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - 
Unflushed op [11]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1302]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - 
Unflushed op [12]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1303]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - 
Unflushed op [13]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1304]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - 
Unflushed op [14]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1305]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - 
Unflushed op [15]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1306]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - 
Unflushed op [16]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1307]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - 
Unflushed op [17]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1308]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - 
Unflushed op [18]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1309]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - 
Unflushed op [19]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1310]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - 
Unflushed op [20]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1311]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - 
Unflushed op [21]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1312]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - 
Unflushed op [22]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1313]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - 
Unflushed op [23]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1314]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - 
Unflushed op [24]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1315]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - 
Unflushed op [25]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1316]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - 
Unflushed op [26]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1317]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - 
Unflushed op [27]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1318]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - 
Unflushed op [28]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1319]
2019-04-19 16:32:08,085 WARN namenode.EditsDoubleBuffer TransactionThread-226 - 
Unflushed op [29]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1320]
2019-04-19 16:32:08,085 WARN namenode.EditsDoubleBuffer TransactionThread-226 - 
Unflushed op [30]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1321]
2019-04-19 16:32:08,085 WARN namenode.EditsDoubleBuffer TransactionThread-226 - 
Unflushed op [31]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1322]
2019-04-19 16:32:08,085 WARN namenode.EditsDoubleBuffer TransactionThread-226 - 
Unflushed op [32]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1323]
2019-04-19 16:32:08,085 WARN namenode.EditsDoubleBuffer TransactionThread-226 - 
Unflushed op [33]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1324]
2019-04-19 16:32:08,085 WARN namenode.EditsDoubleBuffer TransactionThread-226 - 
Unflushed op [34]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1325]
2019-04-19 16:32:08,085 WARN namenode.EditsDoubleBuffer TransactionThread-226 - 
Unflushed op [35]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1326]
2019-04-19 16:32:08,085 WARN namenode.EditsDoubleBuffer TransactionThread-226 - 
Unflushed op [36]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1327]
2019-04-19 16:32:08,085 WARN namenode.EditsDoubleBuffer TransactionThread-226 - 
Unflushed op [37]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1328]
2019-04-19 16:32:08,085 WARN namenode.EditsDoubleBuffer TransactionThread-226 - 
Unflushed op [38]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1329]
2019-04-19 16:32:08,085 ERROR namenode.FSEditLog TransactionThread-226 - Error: 
finalize log segment 1288, 1329 failed for (journal 
JournalAndStream(mgr=FileJournalManager(root=/Users/angerszhu/Documents/project/gdc/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1),
 
stream=EditLogFileOutputStream(/Users/angerszhu/Documents/project/gdc/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1/current/edits_inprogress_0000000000000001288)))
java.io.IOException: FSEditStream has 663 bytes still to be flushed and cannot 
be closed.
at 
org.apache.hadoop.hdfs.server.namenode.EditsDoubleBuffer.close(EditsDoubleBuffer.java:73)
at 
org.apache.hadoop.hdfs.server.namenode.EditLogFileOutputStream.close(EditLogFileOutputStream.java:151)
at 
org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalAndStream.closeStream(JournalSet.java:115)
at 
org.apache.hadoop.hdfs.server.namenode.JournalSet$4.apply(JournalSet.java:235)
at 
org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393)
at 
org.apache.hadoop.hdfs.server.namenode.JournalSet.finalizeLogSegment(JournalSet.java:231)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:1259)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:1182)
at 
org.apache.hadoop.hdfs.server.namenode.TestEditLog$Transactions.run(TestEditLog.java:223)
at java.lang.Thread.run(Thread.java:748)
{code}
For the TransactionThread-226, we can see that it's interrupted, 

During the whole cycle of  #rollEditLog, it was been interrupted by other 
thread's evnent.

And this make the #syncTxId will be larger then the old thread that call 
#rollEditLog and been interrupted.

Finally because: myTxid <= synctxid, it just returned here.

 
{code:java}
if (mytxid <= synctxid) {
  numTransactionsBatchedInSync++;
  if (metrics != null) {
    // Metrics is non-null only when used inside name node
    metrics.incrTransactionsBatchedInSync();
  }
  return;
}

{code}
But when this time. bufCurrent will still have data not been flushed.

Then cause that error.

 

 

But I can't say why it's interrupted since there are many way's will call 
logSync and I don't know more detail of the whole HA and QJM. 

 

 

> Exception happened when   rollEditLog expects empty 
> EditsDoubleBuffer.bufCurrent  but not
> -----------------------------------------------------------------------------------------
>
>                 Key: HDFS-14437
>                 URL: https://issues.apache.org/jira/browse/HDFS-14437
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha, namenode, qjm
>            Reporter: angerszhu
>            Priority: Major
>
> For the problem mentioned in https://issues.apache.org/jira/browse/HDFS-10943 
> , I have sort the process of write and flush EditLog and some important 
> function, I found the in the class  FSEditLog class, the close() function 
> will call such process like below:
>  
> {code:java}
> waitForSyncToFinish();
> endCurrentLogSegment(true);{code}
> since we have gain the object lock in the function close(), so when  
> waitForSyncToFish() method return, it mean all logSync job has done and all 
> data in bufReady has been flushed out, and since current thread has the lock 
> of this object, when call endCurrentLogSegment(), no other thread will gain 
> the lock so they can't write new editlog into currentBuf.
> But when we don't call waitForSyncToFish() before endCurrentLogSegment(), 
> there may be some autoScheduled logSync()'s flush process is doing, since 
> this process don't need
> synchronization since it has mention in the comment of logSync() method :
>  
> {code:java}
> /**
>  * Sync all modifications done by this thread.
>  *
>  * The internal concurrency design of this class is as follows:
>  *   - Log items are written synchronized into an in-memory buffer,
>  *     and each assigned a transaction ID.
>  *   - When a thread (client) would like to sync all of its edits, logSync()
>  *     uses a ThreadLocal transaction ID to determine what edit number must
>  *     be synced to.
>  *   - The isSyncRunning volatile boolean tracks whether a sync is currently
>  *     under progress.
>  *
>  * The data is double-buffered within each edit log implementation so that
>  * in-memory writing can occur in parallel with the on-disk writing.
>  *
>  * Each sync occurs in three steps:
>  *   1. synchronized, it swaps the double buffer and sets the isSyncRunning
>  *      flag.
>  *   2. unsynchronized, it flushes the data to storage
>  *   3. synchronized, it resets the flag and notifies anyone waiting on the
>  *      sync.
>  *
>  * The lack of synchronization on step 2 allows other threads to continue
>  * to write into the memory buffer while the sync is in progress.
>  * Because this step is unsynchronized, actions that need to avoid
>  * concurrency with sync() should be synchronized and also call
>  * waitForSyncToFinish() before assuming they are running alone.
>  */
> public void logSync() {
>   long syncStart = 0;
>   // Fetch the transactionId of this thread. 
>   long mytxid = myTransactionId.get().txid;
>   
>   boolean sync = false;
>   try {
>     EditLogOutputStream logStream = null;
>     synchronized (this) {
>       try {
>         printStatistics(false);
>         // if somebody is already syncing, then wait
>         while (mytxid > synctxid && isSyncRunning) {
>           try {
>             wait(1000);
>           } catch (InterruptedException ie) {
>           }
>         }
>         //
>         // If this transaction was already flushed, then nothing to do
>         //
>         if (mytxid <= synctxid) {
>           numTransactionsBatchedInSync++;
>           if (metrics != null) {
>             // Metrics is non-null only when used inside name node
>             metrics.incrTransactionsBatchedInSync();
>           }
>           return;
>         }
>    
>         // now, this thread will do the sync
>         syncStart = txid;
>         isSyncRunning = true;
>         sync = true;
>         // swap buffers
>         try {
>           if (journalSet.isEmpty()) {
>             throw new IOException("No journals available to flush");
>           }
>           editLogStream.setReadyToFlush();
>         } catch (IOException e) {
>           final String msg =
>               "Could not sync enough journals to persistent storage " +
>               "due to " + e.getMessage() + ". " +
>               "Unsynced transactions: " + (txid - synctxid);
>           LOG.fatal(msg, new Exception());
>           synchronized(journalSetLock) {
>             IOUtils.cleanup(LOG, journalSet);
>           }
>           terminate(1, msg);
>         }
>       } finally {
>         // Prevent RuntimeException from blocking other log edit write 
>         doneWithAutoSyncScheduling();
>       }
>       //editLogStream may become null,
>       //so store a local variable for flush.
>       logStream = editLogStream;
>     }
>     
>     // do the sync
>     long start = now();
>     try {
>       if (logStream != null) {
>         logStream.flush();
>       }
>     } catch (IOException ex) {
>       synchronized (this) {
>         final String msg =
>             "Could not sync enough journals to persistent storage. "
>             + "Unsynced transactions: " + (txid - synctxid);
>         LOG.fatal(msg, new Exception());
>         synchronized(journalSetLock) {
>           IOUtils.cleanup(LOG, journalSet);
>         }
>         terminate(1, msg);
>       }
>     }
>     long elapsed = now() - start;
>     if (metrics != null) { // Metrics non-null only when used inside name node
>       metrics.addSync(elapsed);
>     }
>     
>   } finally {
>     // Prevent RuntimeException from blocking other log edit sync 
>     synchronized (this) {
>       if (sync) {
>         synctxid = syncStart;
>         for (JournalManager jm : journalSet.getJournalManagers()) {
>           /**
>            * {@link FileJournalManager#lastReadableTxId} is only meaningful
>            * for file-based journals. Therefore the interface is not added to
>            * other types of {@link JournalManager}.
>            */
>           if (jm instanceof FileJournalManager) {
>             ((FileJournalManager)jm).setLastReadableTxId(syncStart);
>           }
>         }
>         isSyncRunning = false;
>       }
>       this.notifyAll();
>    }
>   }
> }
> {code}
> So  when not call waitForSyncFinish() before endCurrentLogSegment() will came 
> to a situation that when it can't guarantee that when call 
> endCurrentLogSegment() method, there is no flush job was doing. Then in the 
> endCurrentLogSegment() method process , bufReady may haven't been flushed out 
> totally, then it swap with the bufCurrent, finally when call 
> EditLogOutputStream's close() function, there is still un-flushed bytes in 
> bufCurrent then cause the error in 
> https://issues.apache.org/jira/browse/HDFS-10943
>  
> so maybe we should add waitForSyncFinish() before endCurrentLogSegment() 
> method in rollEditLog() method in FSEditLog class ?
> {code:java}
> synchronized long rollEditLog() throws IOException {
>  LOG.info("Rolling edit logs");
>  waitForSyncToFinish();
>  endCurrentLogSegment(true);
>  
>  long nextTxId = getLastWrittenTxId() + 1;
>  startLogSegment(nextTxId, true);
>  
>  assert curSegmentTxId == nextTxId;
>  return nextTxId;
> }{code}
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

Reply via email to