[
https://issues.apache.org/jira/browse/HDFS-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862909#action_12862909
]
Hairong Kuang commented on HDFS-1112:
-------------------------------------
Here is the proposal:
# Add a method getUnflushedDataLen() to EditLogOutputStream that returns the
length of the buffered edit logs that need to be flushed.
# After each edit log entry is written (FSEditLog#logEdit), check the length of
the unflushed edit logs. If it is greater than or equal to the initial buffer
size, which is 512K for now, all edit log streams are automatically flushed and
synced to disk, as sketched below.
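
A minimal sketch of the two steps, under simplified signatures: getUnflushedDataLen(), EditLogOutputStream, and FSEditLog#logEdit come from the proposal itself, while the remaining names (write, flushAndSync, editStreams, INIT_BUFFER_SIZE, logSync) are illustrative stand-ins, not the actual HDFS code.

{code:java}
import java.util.ArrayList;
import java.util.List;

abstract class EditLogOutputStream {
  /** Bytes of buffered edits that have not yet been flushed to disk. */
  abstract int getUnflushedDataLen();

  abstract void write(byte[] entry);

  abstract void flushAndSync();
}

class FSEditLog {
  // Initial buffer size from the proposal; a buffer may temporarily
  // exceed it by at most one edit log entry.
  private static final int INIT_BUFFER_SIZE = 512 * 1024;

  private final List<EditLogOutputStream> editStreams =
      new ArrayList<EditLogOutputStream>();

  synchronized void logEdit(byte[] entry) {
    for (EditLogOutputStream s : editStreams) {
      s.write(entry);                   // buffer the entry on every stream
    }
    // Auto-flush: once any stream has buffered >= 512K of edits,
    // flush and sync all streams so the buffers stay bounded.
    if (maxUnflushedLen() >= INIT_BUFFER_SIZE) {
      logSync();
    }
  }

  synchronized void logSync() {
    for (EditLogOutputStream s : editStreams) {
      s.flushAndSync();                 // flush buffered edits and fsync
    }
  }

  private int maxUnflushedLen() {
    int max = 0;
    for (EditLogOutputStream s : editStreams) {
      max = Math.max(max, s.getUnflushedDataLen());
    }
    return max;
  }
}
{code}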
This proposal does allow the edit log buffer to grow beyond the initial buffer
size, but the overflow is bounded by the max length of a single edit log
entry. In most cases, I believe the buffer can grow to at most 1M bytes. The
advantage of not shrinking the edit log buffer afterwards is that it avoids
frequent buffer allocations and deallocations, and hence frequent GCs, when a
large number of open requests hit the NameNode in a short time; see the
illustration below.
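
To illustrate the no-shrink point, here is one way a concrete stream could implement getUnflushedDataLen() on top of the sketch above. ByteArrayOutputStream.reset() empties the buffer without releasing its backing array, so a buffer that once grew past 512K stays allocated rather than being reallocated on every burst. Again, the class and its internals are hypothetical, not the committed code.

{code:java}
import java.io.ByteArrayOutputStream;

class BufferedEditLogOutputStream extends EditLogOutputStream {
  // One reusable buffer, pre-sized to the 512K initial buffer size.
  private final ByteArrayOutputStream buf =
      new ByteArrayOutputStream(512 * 1024);

  @Override
  int getUnflushedDataLen() {
    return buf.size();                  // bytes written since the last flush
  }

  @Override
  void write(byte[] entry) {
    buf.write(entry, 0, entry.length);  // may push buf past 512K by one entry
  }

  @Override
  void flushAndSync() {
    // ... write buf.toByteArray() to the edit log file and fsync it ...
    // reset() discards the contents but keeps the backing array, so there
    // is no allocation/deallocation churn (and no extra GC work) per flush.
    buf.reset();
  }
}
{code}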
> Edit log buffer should not grow unboundedly
> -------------------------------------------
>
> Key: HDFS-1112
> URL: https://issues.apache.org/jira/browse/HDFS-1112
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.22.0
> Reporter: Hairong Kuang
> Fix For: 0.22.0
>
>
> Currently HDFS does not impose an upper limit on the edit log buffer. When
> a large number of open operations come in with access time updates on, since
> open does not call sync automatically, the buffer may grow to a large size,
> causing a memory leak and full GCs in extreme cases, as described in
> HDFS-1104.
> The edit log buffer should be automatically flushed when the buffer becomes
> full.