[ https://issues.apache.org/jira/browse/HADOOP-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
dhruba borthakur updated HADOOP-1942:
-------------------------------------
Attachment: transactionLogSync6.patch
This patch does the following:
1. Removes the "synchronized estreams". This exposed a bug that caused the
transaction log to get corrupted.
2. EditLogOutputStream no longer implements DataOutput, which simplifies the
code.
3. Swaps DataOutputStreams rather than ByteOutputStreams. This fixes the bug
exposed by item 1 above.
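The notes above do not spell out the corrupting bug. As a hedged illustration of item 3 only, here is a small single-threaded Java sketch. The class EditBuffers, its Buffer pair and the record format are hypothetical and are not taken from the patch. Each DataOutputStream stays bound to its own ByteArrayOutputStream, and the two pairs are swapped as whole units, so a stream and the bytes written through it always belong to the same buffer. Locking is omitted.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical illustration: swap whole DataOutputStream + buffer pairs. */
class EditBuffers {
    /** A DataOutputStream permanently bound to its own in-memory byte buffer. */
    static final class Buffer {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        final DataOutputStream out = new DataOutputStream(bytes);
    }

    private Buffer current = new Buffer(); // writers append here
    private Buffer ready = new Buffer();   // drained by whoever flushes

    /** Appends a hypothetical record (opcode byte + path) to the current buffer. */
    void logEdit(byte opcode, String path) {
        try {
            current.out.writeByte(opcode);
            current.out.writeUTF(path);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // in-memory writes do not fail in practice
        }
    }

    /** Exchanges the pairs; the returned Buffer holds the edits to flush. */
    Buffer swap() {
        Buffer full = current;
        current = ready;
        ready = full;
        return full;
    }

    int currentSize() {
        return current.bytes.size();
    }

    public static void main(String[] args) {
        EditBuffers eb = new EditBuffers();
        eb.logEdit((byte) 1, "/a"); // 1 + (2 + 2) = 5 bytes
        eb.logEdit((byte) 2, "/b");
        Buffer toFlush = eb.swap();
        // prints "10 bytes to flush, 0 in the new current buffer"
        System.out.println(toFlush.bytes.size() + " bytes to flush, "
                + eb.currentSize() + " in the new current buffer");
        toFlush.bytes.reset(); // must be emptied before it is swapped back in
    }
}
```

A caveat: DataOutputStream.size() counts bytes written through the wrapper and is not cleared by ByteArrayOutputStream.reset(), so the sketch measures buffer contents through the byte array instead.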
Thanks to Raghu for these review comments.
> Increase the concurrency of transaction logging to edits log
> ------------------------------------------------------------
>
> Key: HADOOP-1942
> URL: https://issues.apache.org/jira/browse/HADOOP-1942
> Project: Hadoop
> Issue Type: Improvement
> Components: dfs
> Reporter: dhruba borthakur
> Assignee: dhruba borthakur
> Priority: Blocker
> Fix For: 0.15.0
>
> Attachments: transactionLogSync.patch, transactionLogSync2.patch,
> transactionLogSync3.patch, transactionLogSync4.patch,
> transactionLogSync5.patch, transactionLogSync6.patch
>
>
> For some typical workloads, the throughput of the namenode is bottlenecked by
> the rate of transactions that are being logged into the edits log. In the
> current code, a batching scheme ensures that not every transaction has to
> incur a sync of the edits log to disk. However, the existing batching scheme
> can be improved.
> One option is to keep two buffers associated with the edits file. Threads
> write to the primary buffer while holding the FSNamesystem lock. The thread
> then releases the FSNamesystem lock, acquires a new lock called the syncLock,
> swaps buffers, and flushes the old buffer to the persistent store. Since the
> buffers are swapped, new transactions continue to get logged into the new
> buffer. (Of course, the new transactions cannot complete until this new
> buffer is synced.)
> This approach does a better job of batching syncs to disk, thus improving
> performance.
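Below is a minimal Java sketch of the two-buffer scheme, offered under stated assumptions. The class DoubleBufferedEditLog, its logEdit/logSync methods, the record format, and the plain Object monitors standing in for the FSNamesystem lock and the syncLock are all hypothetical, not Hadoop's code. The description leaves open how the swap stays atomic with respect to writers. This sketch assumes the flushing thread briefly re-takes the namespace lock for the swap alone, then writes out the old buffer while holding only syncLock.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/**
 * Hypothetical sketch of the two-buffer edit log. nsLock stands in for the
 * FSNamesystem lock; syncLock serializes swapping and flushing.
 * Lock order is always syncLock, then nsLock.
 */
class DoubleBufferedEditLog {
    private final Object nsLock = new Object();
    private final Object syncLock = new Object();
    private final OutputStream store; // stand-in for the persistent edits file

    private ByteArrayOutputStream current = new ByteArrayOutputStream(); // guarded by nsLock
    private ByteArrayOutputStream ready = new ByteArrayOutputStream();   // used only under syncLock
    private long lastWrittenTxId = 0; // guarded by nsLock
    private long lastSyncedTxId = 0;  // guarded by syncLock

    DoubleBufferedEditLog(OutputStream store) {
        this.store = store;
    }

    /** Appends one record (txid + op name) under nsLock and returns its txid. */
    long logEdit(String op) {
        synchronized (nsLock) {
            long txId = ++lastWrittenTxId;
            try {
                DataOutputStream out = new DataOutputStream(current);
                out.writeLong(txId);
                out.writeUTF(op);
            } catch (IOException e) {
                throw new UncheckedIOException(e); // in-memory writes do not fail in practice
            }
            return txId;
        }
    }

    /** Returns once txId is on the store. Never call this while holding nsLock. */
    void logSync(long txId) throws IOException {
        synchronized (syncLock) {
            if (txId <= lastSyncedTxId) {
                return; // another thread's flush already covered this transaction
            }
            long syncUpTo;
            synchronized (nsLock) { // held only for the swap itself
                ByteArrayOutputStream full = current;
                current = ready; // empty: it was reset after the previous flush
                ready = full;
                syncUpTo = lastWrittenTxId;
            }
            // nsLock is released here, so writers keep filling the other buffer.
            ready.writeTo(store);
            store.flush(); // a real edits file would also force the bytes to disk
            ready.reset();
            lastSyncedTxId = syncUpTo;
        }
    }

    long syncedTxId() {
        synchronized (syncLock) {
            return lastSyncedTxId;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ByteArrayOutputStream disk = new ByteArrayOutputStream();
        DoubleBufferedEditLog log = new DoubleBufferedEditLog(disk);
        Thread[] workers = new Thread[8];
        for (int t = 0; t < workers.length; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < 10; i++) {
                    long txId = log.logEdit("mkdir"); // 8 + 2 + 5 = 15 bytes
                    try {
                        log.logSync(txId);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        // prints "80 txns, 1200 bytes": each record reaches the store exactly once
        System.out.println(log.syncedTxId() + " txns, " + disk.size() + " bytes");
    }
}
```

The batching comes from the early return in logSync. If another thread's flush has already carried a transaction to the store, the thread that logged it skips the sync entirely. Meanwhile, new writers only ever contend for nsLock, never for the disk write.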
--
This message is automatically generated by JIRA.