[ https://issues.apache.org/jira/browse/HDFS-16659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17603505#comment-17603505 ]
ASF GitHub Bot commented on HDFS-16659:
---------------------------------------

ZanderXu commented on PR #4560:
URL: https://github.com/apache/hadoop/pull/4560#issuecomment-1245232780

@xkrogen After deeper thought and some verification, I found two places that should be fixed for the case where `sinceTxId = highestTxId + 1`.

Currently the Journal throws a `NewerTxnIdException` to the namenode, and we expect the namenode to catch the `NewerTxnIdException` during `selectRpcInputStreams` and ignore it. But the namenode actually throws a `QuorumException` during `selectRpcInputStreams`, because a majority of the responses are `NewerTxnIdException`, and then falls back to `selectStreamingInputStreams`.

Besides this problem, JournalNodeRpcServer shouldn't log anything about `NewerTxnIdException` when `sinceTxId = highestTxId + 1`, but it should log the `NewerTxnIdException` when `sinceTxId > highestTxId + 1`.

Given the cases above, how about handling them differently? For example:
```
long highestTxId = getHighestWrittenTxId();
if (sinceTxId == highestTxId + 1) {
  // This is the normal case and returns one response with 0 txnCount.
  metrics.rpcEmptyResponses.incr();
  return GetJournaledEditsResponseProto.newBuilder().setTxnCount(0).build();
} else if (sinceTxId > highestTxId) {
  // Requested edits that don't exist yet and are newer than highestTxId.
  metrics.rpcEmptyResponses.incr();
  throw new NewerTxnIdException(
      "Highest txn ID available in the journal is %d, but requested txns starting at %d.",
      highestTxId, sinceTxId);
}
```

> JournalNode should throw NewerTxnIdException if SinceTxId is bigger than
> HighestWrittenTxId
> -------------------------------------------------------------------------
>
>                 Key: HDFS-16659
>                 URL: https://issues.apache.org/jira/browse/HDFS-16659
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: ZanderXu
>            Assignee: ZanderXu
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> JournalNode should throw `CacheMissException` if `sinceTxId` is bigger than
> `highestWrittenTxId` while handling the `getJournaledEdits` RPC from NNs.
> The current logic can leave the in-progress EditLogTailer unable to replay
> any edits from the JournalNodes in some corner cases, so the Observer
> NameNode cannot serve requests from clients.
>
> Suppose there are 3 JournalNodes, JN0 ~ JN2:
> * JN0 hits some abnormal case while the Active NameNode is syncing 10 edits
>   with first txid 11
> * The NameNode just ignores the abnormal JN0 and continues to sync edits to
>   JN1 and JN2
> * JN0 returns to health
> * The NameNode continues, syncing 10 edits with first txid 21
> * At this point, edits 11 ~ 30 are absent from JN0's cache
> * The Observer NameNode tries to select an EditLogInputStream through
>   `getJournaledEdits` with sinceTxId 21
> * JN2 hits some abnormal case that causes a slow response
>
> The expected result is a response containing the 10 edits from txid 21 to
> txid 30 from JN1 and JN2, because the Active NameNode successfully wrote
> these edits to JN1 and JN2 and failed to write them to JN0. But in the
> current implementation the result is [Response(0) from JN0, Response(10)
> from JN1], because the abnormal condition on JN2, such as GC or a bad
> network, causes a slow response. So `maxAllowedTxns` will be 0 and the
> NameNode will not replay any edits.
>
> As above, the root cause is that the JournalNode should throw a cache-miss
> exception when `sinceTxId` is greater than `highestWrittenTxId`.
> And the bug code is as below:
> {code:java}
> if (sinceTxId > getHighestWrittenTxId()) {
>   // Requested edits that don't exist yet; short-circuit the cache here
>   metrics.rpcEmptyResponses.incr();
>   return GetJournaledEditsResponseProto.newBuilder().setTxnCount(0).build();
> }
> {code}
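To make the quorum arithmetic in the description concrete, here is a minimal, self-contained Java sketch of the majority-availability rule it relies on, assuming a simple "k-th largest response" formulation. This is an illustration only, not the actual `QuorumJournalManager#selectRpcInputStreams` code; the class and method names are hypothetical.

{code:java}
import java.util.Arrays;

/** Illustrative sketch only, not the real QuorumJournalManager logic. */
public class MaxAllowedTxnsSketch {

  /**
   * Number of transactions known to exist on at least a majority of
   * journal nodes. Journals that have not responded contribute 0.
   */
  static long maxAllowedTxns(long[] txnCounts, int totalJournals) {
    int majority = totalJournals / 2 + 1;
    // Pad with zeros for journals that returned nothing (e.g. slow JN2).
    long[] counts = Arrays.copyOf(txnCounts, totalJournals);
    Arrays.sort(counts); // ascending
    // The count at this index is matched or exceeded by at least
    // `majority` journals, so that many txns are safe to read.
    return counts[totalJournals - majority];
  }

  public static void main(String[] args) {
    // Scenario from the issue: JN0 answers 0 (its cache is missing the
    // edits), JN1 answers 10, JN2 is slow and never answers in time.
    System.out.println(maxAllowedTxns(new long[] {0, 10}, 3)); // prints 0
    // If JN0 instead threw NewerTxnIdException and were excluded, waiting
    // for JN2 would eventually yield {10, 10}, allowing all 10 edits.
    System.out.println(maxAllowedTxns(new long[] {10, 10}, 3)); // prints 10
  }
}
{code}

Under the buggy behavior, JN0's Response(0) is accepted as a legitimate answer and caps the majority-available count at 0; had JN0 thrown an exception, the namenode would have had to wait for JN2's slow response and could then read all 10 edits.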
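For the first of the two fixes the comment asks for, here is a hedged sketch of how the namenode side could ignore `NewerTxnIdException` results instead of counting them toward a `QuorumException`. The result-map shape and every name below are illustrative assumptions, not the real `QuorumCall`/`QuorumJournalManager` API.

{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: treat NewerTxnIdException as "this journal has
 * nothing for us yet" rather than as an RPC failure, so a majority of
 * such exceptions can no longer surface as a QuorumException.
 */
class SelectRpcInputStreamsSketch {

  /** Stand-in for the HDFS NewerTxnIdException type. */
  static class NewerTxnIdException extends IOException {
    NewerTxnIdException(String msg) { super(msg); }
  }

  /** Each journal's result is either a Long txnCount or a Throwable. */
  static List<Long> collectUsableResponses(Map<String, Object> results) {
    List<Long> txnCounts = new ArrayList<>();
    for (Map.Entry<String, Object> entry : results.entrySet()) {
      Object result = entry.getValue();
      if (result instanceof NewerTxnIdException) {
        // Benign: this journal's cache ends before sinceTxId. Skip it
        // quietly instead of counting it toward a quorum failure.
        continue;
      }
      if (result instanceof Throwable) {
        // A genuine error; real code would count these and fail the
        // call once a majority of journals have errored out.
        continue;
      }
      txnCounts.add((Long) result); // a successful txnCount response
    }
    return txnCounts;
  }
}
{code}

Paired with the server-side change proposed above, where `sinceTxId = highestTxId + 1` yields an empty response and only a genuine gap throws, this catch would fire only in the abnormal case, which also matches the logging split ZanderXu suggests.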