[ 
https://issues.apache.org/jira/browse/HDFS-9198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated HDFS-9198:
------------------------------
    Attachment: HDFS-9198-trunk.patch
                HDFS-9198-branch2.patch

Incremental block reports (IBRs) are placed on a queue for asynchronous 
processing by a background thread.  This thread acquires the write lock and 
processes IBRs until the queue drains or the max lock hold time is reached.  
The max hold is 4ms, which may seem high, but if the NN is that backlogged, 
it's better to take the hit and catch up to avoid client issues.
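
To illustrate the drain loop described above, here is a minimal sketch.  It is 
not the code in the attached patches: the class name CoalescingProcessor, the 
method names, and the use of Runnable to stand in for a queued IBR are all 
assumptions made for the example.  The max hold is a constructor parameter so 
the behaviour is easy to exercise; the 4ms figure would be passed in as 
TimeUnit.MILLISECONDS.toNanos(4).

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of coalesced IBR processing; names are illustrative only.
class CoalescingProcessor {
    private final Queue<Runnable> queue = new ConcurrentLinkedQueue<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final long maxLockHoldNanos;

    CoalescingProcessor(long maxLockHoldNanos) {
        this.maxLockHoldNanos = maxLockHoldNanos;
    }

    // Called by IPC handler threads: queue the report and return immediately,
    // without touching the write lock.
    void enqueue(Runnable report) {
        queue.add(report);
    }

    // One pass of the background thread: take the write lock once and process
    // queued reports until the queue is empty or the max hold is reached.
    // Returns the number of reports processed in this lock acquisition.
    int drainOnce() {
        Runnable next = queue.poll();
        if (next == null) {
            return 0; // nothing queued, so do not take the lock at all
        }
        int processed = 0;
        lock.writeLock().lock();
        try {
            long start = System.nanoTime();
            while (next != null) {
                next.run();
                processed++;
                // Check the hold time before dequeuing again so that no
                // report is removed from the queue and then dropped.
                if (System.nanoTime() - start >= maxLockHoldNanos) {
                    break;
                }
                next = queue.poll();
            }
        } finally {
            lock.writeLock().unlock();
        }
        return processed;
    }
}
```

A background thread would call drainOnce() in a loop, sleeping or blocking 
while the queue is empty.  However many reports are waiting, each pass costs 
one write-lock acquisition instead of one per report.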

Full BR processing also goes through the queue, but synchronously.  This helps 
preserve the ordering between the IBRs and full BRs from a node.  Another 
reason for synchronous full BR processing is that it may issue a finalize 
command.
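
One way to picture a synchronous caller sharing the same queue is to wrap the 
full BR in a FutureTask and have the caller wait on it.  This is only a sketch 
under that assumption, not the mechanism in the patches; ReportQueue, 
enqueueAsync, enqueueSync and await are hypothetical names.

```java
import java.util.Queue;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

// Hypothetical sketch: async IBRs and sync full BRs share one FIFO queue.
class ReportQueue {
    private final Queue<Runnable> queue = new ConcurrentLinkedQueue<>();

    // IBR path: fire and forget, the caller gets no result back.
    void enqueueAsync(Runnable ibr) {
        queue.add(ibr);
    }

    // Full BR path: queued behind any earlier IBRs, but the caller keeps a
    // handle so it can wait for the result (for example a command to return).
    <T> FutureTask<T> enqueueSync(Callable<T> fullReport) {
        FutureTask<T> task = new FutureTask<>(fullReport);
        queue.add(task);
        return task;
    }

    // Stand-in for the background thread: run everything in arrival order.
    void drain() {
        Runnable next;
        while ((next = queue.poll()) != null) {
            next.run();
        }
    }

    // Block until the queued full BR has been processed and return its result.
    static <T> T await(FutureTask<T> task) {
        try {
            return task.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}
```

Because both kinds of report pass through one FIFO queue, a full BR is never 
processed ahead of an IBR that the same node sent earlier, and the caller 
still gets a return value it can turn into a command.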

IBRs do not send commands so they can be async.  Currently, in the unlikely 
event that an IBR fails, the DN re-queues the IBR; with async processing the 
DN always sees success.  In practice an IBR fails if the DN is dead or 
unregistered.  On the off-chance that an IBR fails for another reason, I added 
minimal support to force the DN to re-register, which elicits a full BR for 
re-syncing.

(The two patches differ only by trivial line number conflicts and a 
@VisibleForTesting.)

> Coalesce IBR processing in the NN
> ---------------------------------
>
>                 Key: HDFS-9198
>                 URL: https://issues.apache.org/jira/browse/HDFS-9198
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 2.0.0-alpha
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>         Attachments: HDFS-9198-branch2.patch, HDFS-9198-trunk.patch
>
>
> IBRs from thousands of DNs under load will degrade NN performance due to 
> excessive write-lock contention from multiple IPC handler threads.  The IBR 
> processing is quick, so the lock contention may be reduced by coalescing 
> multiple IBRs into a single write-lock transaction.  The handlers will also 
> be freed up faster for other operations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
