[ 
https://issues.apache.org/jira/browse/HDFS-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446918#comment-13446918
 ] 

Chao Shi commented on HDFS-3885:
--------------------------------

A similar optimization that saves network latency: sync logs to the lagging
node in larger batches. I guess a batch of 512KB or 1MB should be much more
efficient.

Note that this can also work for uncommitted transactions. Imagine this with 3
JNs:
tx1 is committed by JN1 and JN2, QJM is writing tx2, and JN3 is lagging. So
both tx1 and tx2 are in JN3's queue, and we can send them to JN3 in a single
batch.

Implementing the above idea would need more changes to the current code
structure, which simply uses a single-threaded executor as the queue.
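
As a rough illustration of the batching idea above (not the actual QJM code),
here is a minimal Python sketch of a per-JN queue that coalesces pending edits
into one batch up to a size cap. The names Edit, LaggingNodeQueue and
max_batch_bytes are hypothetical, and the 1MB default is just the figure
guessed at above.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Edit:
    txid: int
    payload: bytes


class LaggingNodeQueue:
    """Hypothetical per-JN queue of edits not yet sent to that node."""

    def __init__(self, max_batch_bytes=1024 * 1024):
        self.max_batch_bytes = max_batch_bytes
        self.pending = deque()

    def enqueue(self, edit):
        # Committed and uncommitted edits are queued alike.
        self.pending.append(edit)

    def next_batch(self):
        """Drain consecutive pending edits into one batch, up to the size cap."""
        batch, size = [], 0
        while self.pending:
            edit = self.pending[0]
            # Always take at least one edit so an oversized edit cannot
            # stall the queue.
            if batch and size + len(edit.payload) > self.max_batch_bytes:
                break
            batch.append(self.pending.popleft())
            size += len(edit.payload)
        return batch
```

In the 3-JN example, tx1 and tx2 would both be enqueued for JN3, and a single
next_batch() call would return them together so they go out in one request.
A single-threaded executor that sends one queued item per task cannot
coalesce like this, which is why the structure would have to change.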
                
> QJM: optimize log sync when JN is lagging behind
> ------------------------------------------------
>
>                 Key: HDFS-3885
>                 URL: https://issues.apache.org/jira/browse/HDFS-3885
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: QuorumJournalManager (HDFS-3077)
>            Reporter: Todd Lipcon
>
> This is a potential optimization that we can add to the JournalNode: when one 
> of the nodes is lagging behind the others (e.g. because its local disk is 
> slower or there was a network blip), it receives edits after they've been 
> committed to a majority. It can tell this because the committed txid included 
> in the request info is higher than the highest txid in the actual batch to be 
> written. In this case, we know that this batch has already been fsynced to a 
> quorum of nodes, so we can skip the fsync() on the laggy node, helping it to 
> catch back up.
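
The check described in the quoted issue can be sketched in a few lines of
Python. This is only an illustration under assumed names (should_fsync,
committed_txid, batch_last_txid are hypothetical), not the JournalNode
implementation.

```python
def should_fsync(committed_txid, batch_last_txid):
    """Decide whether a JN needs to fsync the batch it just received.

    committed_txid:  highest txid the writer reports as committed to a quorum
                     (carried in the request info).
    batch_last_txid: highest txid contained in the batch being written.
    """
    # If the committed txid is already higher than everything in this batch,
    # a quorum has fsynced these edits, so the lagging node can skip fsync.
    return not (committed_txid > batch_last_txid)
```

For example, a batch ending at txid 7 that arrives with committed txid 10
would skip the fsync, while the same batch arriving with committed txid 5
would still be synced.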

