[ 
https://issues.apache.org/jira/browse/HDFS-16659?focusedWorklogId=793334&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-793334
 ]

ASF GitHub Bot logged work on HDFS-16659:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 20/Jul/22 16:19
            Start Date: 20/Jul/22 16:19
    Worklog Time Spent: 10m 
      Work Description: ZanderXu commented on PR #4560:
URL: https://github.com/apache/hadoop/pull/4560#issuecomment-1190487791

   @jojochuang @goiri Can you help me review this patch? Thanks




Issue Time Tracking
-------------------

    Worklog Id:     (was: 793334)
    Time Spent: 40m  (was: 0.5h)

> JournalNode should throw CacheMissException if SinceTxId is bigger than 
> HighestWrittenTxId
> ------------------------------------------------------------------------------------------
>
>                 Key: HDFS-16659
>                 URL: https://issues.apache.org/jira/browse/HDFS-16659
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: ZanderXu
>            Assignee: ZanderXu
>            Priority: Critical
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> JournalNode should throw `CacheMissException` if `sinceTxId` is bigger than 
> `highestWrittenTxId`. Otherwise, EditLogTailer may be unable to tail edits, 
> and the Observer NameNode may then be unable to handle requests from 
> clients.
> Suppose there are 3 JournalNodes, JN0 ~ JN2.
> The corner case is as below:
> * JN0 hits some abnormal condition while the Active NameNode is journaling 
> edits starting at txId 11
> * The Active NameNode ignores the abnormal JN0 and continues to write edits 
> to JN1 and JN2
> * JN0 becomes healthy again
> * The Observer NameNode tries to select an EditLogInputStream via RPC with 
> start txId 21
> * JN1 hits some abnormal condition that causes a slow RPC response
> The expected selection result is: the response should contain 20 edits, from 
> txId 21 to txId 40, served by JN1 and JN2. Because the Active NameNode 
> successfully wrote these edits to JN1 and JN2 but failed to write them to 
> JN0, there are no edits from txId 21 to 40 in the cache of JN0.
> But in the current implementation, there are no edits in the response, 
> because the NameNode successfully got a response from JN0 that did not 
> contain any edits.
> The buggy code is as below:
> {code:java}
> if (sinceTxId > getHighestWrittenTxId()) {
>     // Requested edits that don't exist yet; short-circuit the cache here
>     metrics.rpcEmptyResponses.incr();
>     return GetJournaledEditsResponseProto.newBuilder().setTxnCount(0).build();
> }
> {code}
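
The scenario above can be replayed with a small, self-contained model. This is only a sketch under stated assumptions, not the Hadoop code: `CacheMissSketch`, `FakeJournalNode`, `selectEditCount`, and `scenario` are hypothetical names, the `CacheMissException` here is a local stand-in for the one named in the issue title, and the quorum rule (stop at the first majority of successful responses and serve only what every responder in that majority can serve) is a simplification of how the reader picks edits. The `throwOnMiss` flag switches between the behavior described in the issue and the behavior the title asks for.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Simplified model of the corner case; none of these types are real Hadoop classes. */
final class CacheMissSketch {

  /** Local stand-in for the exception named in the issue title. */
  static final class CacheMissException extends Exception {
    CacheMissException(String message) {
      super(message);
    }
  }

  /** Hypothetical JournalNode that only tracks its highest written txId. */
  static final class FakeJournalNode {
    final long highestWrittenTxId;
    final boolean throwOnMiss;

    FakeJournalNode(long highestWrittenTxId, boolean throwOnMiss) {
      this.highestWrittenTxId = highestWrittenTxId;
      this.throwOnMiss = throwOnMiss;
    }

    /** Returns how many edits, starting at sinceTxId, this node can serve. */
    int getJournaledEditCount(long sinceTxId) throws CacheMissException {
      if (sinceTxId > highestWrittenTxId) {
        if (throwOnMiss) {
          // Behavior requested by the issue title: report a miss.
          throw new CacheMissException("sinceTxId " + sinceTxId
              + " > highestWrittenTxId " + highestWrittenTxId);
        }
        // Behavior described in the issue: an empty but successful response.
        return 0;
      }
      return (int) (highestWrittenTxId - sinceTxId + 1);
    }
  }

  /**
   * Collects responses in arrival order, stops at the first majority of
   * successful responses, and returns the smallest count in that majority.
   */
  static int selectEditCount(List<FakeJournalNode> nodesInResponseOrder, long sinceTxId) {
    int majority = nodesInResponseOrder.size() / 2 + 1;
    List<Integer> counts = new ArrayList<>();
    for (FakeJournalNode jn : nodesInResponseOrder) {
      try {
        counts.add(jn.getJournaledEditCount(sinceTxId));
      } catch (CacheMissException e) {
        // A failed response does not count toward the quorum.
        continue;
      }
      if (counts.size() == majority) {
        break;
      }
    }
    if (counts.size() < majority) {
      throw new IllegalStateException("no quorum of successful responses");
    }
    return Collections.min(counts);
  }

  /** JN0 stopped at txId 10; JN1 and JN2 hold edits up to txId 40; JN1 answers last. */
  static List<FakeJournalNode> scenario(boolean throwOnMiss) {
    FakeJournalNode jn0 = new FakeJournalNode(10L, throwOnMiss);
    FakeJournalNode jn1 = new FakeJournalNode(40L, throwOnMiss);
    FakeJournalNode jn2 = new FakeJournalNode(40L, throwOnMiss);
    return Arrays.asList(jn0, jn2, jn1); // response order: the slow JN1 comes last
  }

  public static void main(String[] args) {
    System.out.println(selectEditCount(scenario(false), 21L)); // prints 0
    System.out.println(selectEditCount(scenario(true), 21L));  // prints 20
  }
}
```

In this model, the empty response from JN0 completes the quorum together with JN2 and drags the result down to zero edits. When JN0 reports a miss instead, the reader has to wait for the slow JN1 and ends up with the 20 edits from txId 21 to txId 40.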



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
