[ 
https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14981323#comment-14981323
 ] 

Daryn Sharp commented on HDFS-7964:
-----------------------------------

I think special-casing a few ops is fragile, even if we add an assert.  I feel 
safer knowing a production deadlock can't occur if/when someone doesn't add 
tests that stress all possible code paths or, more importantly, doesn't test 
against async logging.  A runtime check would require halting the NN because 
data structures are already updated - definitely not good when a simple sync 
check will prevent any issues.

Exploring latency concerns:

Today, logEdit has to serialize and compute the CRC for ops while holding the 
write lock.  With this feature, logEdit only queues the op, which releases the 
write lock sooner.  The serialization cost is shifted to the background thread; 
however, there aren't 100 threads contending.  In the meantime, other calls are 
being processed.
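
To make the queue-then-serialize split concrete, here is a minimal sketch.  It 
is not the HDFS-7964 patch: the class AsyncEditLogSketch, its Edit type, and 
the method bodies are hypothetical, and only the names logEdit and logSync are 
borrowed from the discussion.  It assumes a single background thread that 
drains the queue, serializes each op, computes a CRC, and then releases the 
waiting callers; the actual write and fsync are left as a comment.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.zip.CRC32;

/** Hypothetical illustration only; names and structure are assumptions. */
class AsyncEditLogSketch {
    /** A queued op plus a latch that opens once the op has been "synced". */
    static final class Edit {
        final String op;
        final CountDownLatch synced = new CountDownLatch(1);
        Edit(String op) { this.op = op; }
    }

    private final BlockingQueue<Edit> queue = new LinkedBlockingQueue<>();
    private final List<Integer> batchSizes = new ArrayList<>();
    private volatile long lastCrc;

    AsyncEditLogSketch() {
        Thread t = new Thread(this::run, "edit-log-sync");
        t.setDaemon(true);
        t.start();
    }

    /** Cheap part, meant to run under the namespace write lock: enqueue only. */
    Edit logEdit(String op) {
        Edit e = new Edit(op);
        queue.add(e);
        return e;
    }

    /** Runs outside the lock: wait until the background thread handled the op. */
    void logSync(Edit e) {
        try {
            e.synced.await();
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ie);
        }
    }

    /** Total number of ops the background thread has processed so far. */
    int totalSynced() {
        synchronized (batchSizes) {
            int sum = 0;
            for (int n : batchSizes) sum += n;
            return sum;
        }
    }

    /** Background loop: take one op, drain whatever else is queued, process the batch. */
    private void run() {
        List<Edit> batch = new ArrayList<>();
        try {
            while (true) {
                batch.clear();
                batch.add(queue.take());
                queue.drainTo(batch);
                CRC32 crc = new CRC32();
                for (Edit e : batch) {
                    // Stand-in for real op serialization.
                    byte[] bytes = e.op.getBytes(StandardCharsets.UTF_8);
                    crc.update(bytes, 0, bytes.length);
                }
                lastCrc = crc.getValue();
                // A real edit log would write the batch and fsync here.
                synchronized (batchSizes) { batchSizes.add(batch.size()); }
                for (Edit e : batch) e.synced.countDown();
            }
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        AsyncEditLogSketch log = new AsyncEditLogSketch();
        List<Edit> edits = new ArrayList<>();
        for (String op : new String[] {"mkdir /a", "mkdir /b", "rename /a /c"}) {
            edits.add(log.logEdit(op));
        }
        for (Edit e : edits) log.logSync(e);
        System.out.println("synced " + log.totalSynced() + " edits");
    }
}
```

In this sketch the recorded batch sizes are what an average-batch-size 
measurement would be computed from: a batch is larger than 1 only when more 
ops arrive while the background thread is still busy with the previous batch.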

Sadly, the slow, steady stream concern isn't even possible.  Even with 20-30k 
ops/sec, the average batch size is 1.6.  I've seen it as high as 7.  Now, once 
fine-grained locking is done...  We may be able to have that debate. ;)



> Add support for async edit logging
> ----------------------------------
>
>                 Key: HDFS-7964
>                 URL: https://issues.apache.org/jira/browse/HDFS-7964
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>    Affects Versions: 2.0.2-alpha
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>         Attachments: HDFS-7964.patch, HDFS-7964.patch, HDFS-7964.patch
>
>
> Edit logging is a major source of contention within the NN.  logEdit is 
> called within the namespace write lock, while logSync is called outside of the 
> lock to allow greater concurrency.  The handler thread remains busy until 
> logSync returns to provide the client with a durability guarantee for the 
> response.
> Write-heavy RPC load and/or slow IO causes handlers to stall in logSync.  
> Although the write lock is not held, readers are limited/starved and the call 
> queue fills.  Combining an edit log thread with postponed RPC responses from 
> HADOOP-10300 will provide the same durability guarantee but immediately free 
> up the handlers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
