[ https://issues.apache.org/jira/browse/HADOOP-1003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12471764 ]
Raghu Angadi commented on HADOOP-1003:
--------------------------------------
> b) Another Server thread that waits for pending commits to be synced and
> replies back to clients.
This extra thread is not required. IPC threads can do the job.
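Below is a minimal Java sketch of this idea. It assumes a hypothetical GroupCommitLog class and is not the actual namenode edit log code. Each IPC handler appends its edit while holding the namesystem lock, releases the lock, and then calls logSync() for its own transaction id before replying. The first handler to find no sync in progress syncs everything written so far in one go. Handlers whose edits were covered by that sync return without syncing. writeAndFsync() only simulates the fsync with a short sleep.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Hadoop code: IPC handler threads batch their own syncs,
// so no separate reply thread is needed.
class GroupCommitLog {
    private final List<String> buffer = new ArrayList<>(); // edits written but not yet synced
    private long lastWrittenTxId = 0;   // highest txid appended to the buffer
    private long syncedTxId = 0;        // highest txid known to be durable
    private boolean syncInProgress = false;
    private int syncCount = 0;          // number of physical syncs performed

    // Append one edit and return its txid; in the namenode this would run under the namesystem lock.
    synchronized long logEdit(String op) {
        buffer.add(op);
        return ++lastWrittenTxId;
    }

    // Called by the handler after releasing the namesystem lock, before replying to the client.
    void logSync(long txId) throws InterruptedException {
        List<String> batch;
        long batchEnd;
        synchronized (this) {
            // Another handler is syncing; its batch may already include our edit.
            while (syncInProgress && syncedTxId < txId) {
                wait();
            }
            if (syncedTxId >= txId) {
                return; // covered by an earlier sync: reply without syncing
            }
            // Become the syncer for every edit written so far.
            syncInProgress = true;
            batch = new ArrayList<>(buffer);
            buffer.clear();
            batchEnd = lastWrittenTxId;
        }
        writeAndFsync(batch); // slow I/O runs outside the monitor, so others can keep logging
        synchronized (this) {
            syncedTxId = batchEnd;
            syncInProgress = false;
            syncCount++;
            notifyAll(); // wake handlers waiting on this sync or wanting to start the next one
        }
    }

    // Stand-in for appending the batch to the edits file and calling fsync();
    // a real log would also have to handle I/O errors here.
    private void writeAndFsync(List<String> batch) {
        try {
            Thread.sleep(5); // pretend the fsync takes a few milliseconds
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    synchronized long getSyncedTxId() { return syncedTxId; }

    synchronized int getSyncCount() { return syncCount; }
}
```

Because the waiting happens in the handler thread, the number of edits that can share one sync is bounded by the number of IPC handler threads. Item (a) of the proposal is meant to remove that limit.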
> Proposal to batch commits to edits log.
> ---------------------------------------
>
> Key: HADOOP-1003
> URL: https://issues.apache.org/jira/browse/HADOOP-1003
> Project: Hadoop
> Issue Type: Improvement
> Components: dfs
> Reporter: Raghu Angadi
> Assigned To: Sameer Paranjpye
>
> Right now the most expensive namenode operations are those that require commits
> to the edits log, e.g. creating, deleting, or renaming a file. Most of the time is
> spent in fsync() of the edits file (multiple fsync() calls in the case of
> multiple image directories). During this time the whole namesystem is under lock,
> and even non-mutating operations like open() are blocked.
> On a local filesystem, each fsync could take on the order of milliseconds. My
> understanding is that the guarantee the namenode provides is that the edits log
> is synced before replying to the client. Without any changes to the current
> locking structure, I was thinking of the following for batching multiple edits:
> a) A facility in the RPC Server to postpone responding to a particular call
> (perhaps communicated via ThreadLocals). This is not strictly required, but
> without it, the number of operations batched would be limited to the number of
> IPC threads.
> b) Another Server thread that waits for pending commits to be synced and
> replies back to clients.
> c) An fsync manager that periodically syncs the edits log and informs the
> waiting RPCs. The sync thread can dynamically decide to wait longer or
> shorter based on the load, so that we don't increase latency when the namenode
> is lightly loaded. Even a simple policy of 'sync if there are any mutations'
> would work, but that might reduce hard disk life.
>
> All the synchronization between these threads is a bit complicated, but it can
> be made stable. My main concern is whether the guarantee we are providing is
> enough for namenode operation. I think it is enough.
> In terms of throughput, the number of creates a namenode can do should be in the
> same range as the number of opens it can do.
>
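For comparison, items (a) to (c) of the proposal can be sketched in Java as below. The class name BatchingSyncManager, the busyThreshold and lingerMillis parameters, and the use of CompletableFuture as a stand-in for a postponed RPC response are all assumptions for illustration. Handlers submit an edit and return immediately. A dedicated sync thread syncs every pending edit at once and then completes their futures, which is where the deferred replies would be sent. Since one fsync is shared by every edit in the batch, its cost per operation drops roughly in proportion to the batch size. The load policy is deliberately simple: sync at once when few edits are pending, and linger briefly under load so more edits can join the batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch of items (a)-(c): deferred replies plus a dedicated sync thread.
class BatchingSyncManager implements Runnable {
    // One pending edit plus the future standing in for its postponed RPC response.
    private static final class Pending {
        final String edit; // would be written to the edits file
        final CompletableFuture<Void> reply = new CompletableFuture<>();
        Pending(String edit) { this.edit = edit; }
    }

    private final List<Pending> pending = new ArrayList<>();
    private final long lingerMillis;  // extra wait, used only when loaded (assumed policy)
    private final int busyThreshold;  // pending edits that count as "loaded" (assumed policy)
    private boolean stopped = false;
    private int syncCount = 0;

    BatchingSyncManager(long lingerMillis, int busyThreshold) {
        this.lingerMillis = lingerMillis;
        this.busyThreshold = busyThreshold;
    }

    // Called by an RPC handler after logging its edit; returns without blocking.
    // The client's call would be answered when the returned future completes.
    synchronized CompletableFuture<Void> submit(String edit) {
        Pending p = new Pending(edit);
        pending.add(p);
        notifyAll(); // wake the sync thread if it is idle
        return p.reply;
    }

    synchronized void stop() {
        stopped = true;
        notifyAll();
    }

    synchronized int getSyncCount() { return syncCount; }

    @Override
    public void run() {
        try {
            while (true) {
                boolean loaded;
                synchronized (this) {
                    while (pending.isEmpty() && !stopped) {
                        wait();
                    }
                    if (pending.isEmpty()) {
                        return; // stopped and fully drained
                    }
                    loaded = pending.size() >= busyThreshold;
                }
                // Light load: sync at once to keep latency low.
                // Heavy load: linger briefly so more edits share the next sync.
                if (loaded) {
                    Thread.sleep(lingerMillis);
                }
                List<Pending> batch;
                synchronized (this) {
                    batch = new ArrayList<>(pending);
                    pending.clear();
                }
                Thread.sleep(5); // stand-in for writing the batch and calling fsync()
                synchronized (this) {
                    syncCount++;
                }
                for (Pending p : batch) {
                    p.reply.complete(null); // edit is durable: release the deferred reply
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Unlike the handler-driven approach, the number of edits per batch here is not limited by the number of IPC threads, at the cost of an extra thread and the deferred-response plumbing in the RPC Server.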
--
This message is automatically generated by JIRA.