[ https://issues.apache.org/jira/browse/HDFS-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Colin Patrick McCabe updated HDFS-3510:
---------------------------------------
Attachment: HDFS-3510.007.patch
* use IOUtils#writeFully now that it's available
* write a megabyte at a time to minimize seeks
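
To illustrate the two points above, here is a minimal sketch that uses only java.nio rather than the Hadoop IOUtils class. The class name WriteFullySketch, the local writeFully helper, and appendPadding are hypothetical names for illustration and are not taken from the patch. The sketch shows why a "write fully" loop is needed (a single positional FileChannel write may be partial) and how padding can be appended one megabyte at a time.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

final class WriteFullySketch {
    static final int CHUNK_SIZE = 1024 * 1024; // one megabyte per write

    /** Loops until the whole buffer is written; one positional write may be partial. */
    static void writeFully(FileChannel fc, ByteBuffer buf, long offset) throws IOException {
        while (buf.hasRemaining()) {
            offset += fc.write(buf, offset);
        }
    }

    /** Appends `chunks` megabytes of 0xff padding at the end of the file; returns the new size. */
    static long appendPadding(FileChannel fc, int chunks) throws IOException {
        byte[] padding = new byte[CHUNK_SIZE];
        Arrays.fill(padding, (byte) 0xff);
        ByteBuffer fill = ByteBuffer.wrap(padding);
        long size = fc.size();
        for (int i = 0; i < chunks; i++) {
            fill.clear(); // reset position so the full megabyte is written again
            writeFully(fc, fill, size);
            size += CHUNK_SIZE;
        }
        return size;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("edits-sketch", ".bin");
        try (FileChannel fc = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
            long size = appendPadding(fc, 2);
            // Positional writes extend the file without moving the channel position.
            System.out.println("file size: " + size + ", position: " + fc.position());
        } finally {
            Files.delete(tmp);
        }
    }
}
```

Because the padding is written with positional writes, the channel position stays where the next edit log entry will go.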
> Fix FSEditLog pre-allocation
> ----------------------------
>
> Key: HDFS-3510
> URL: https://issues.apache.org/jira/browse/HDFS-3510
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 1.0.0, 2.0.0-alpha
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Fix For: 1.0.0, 2.0.1-alpha
>
> Attachments: HDFS-3510-b1.001.patch, HDFS-3510-b1.002.patch,
> HDFS-3510.001.patch, HDFS-3510.003.patch, HDFS-3510.004.patch,
> HDFS-3510.004.patch, HDFS-3510.006.patch, HDFS-3510.007.patch
>
>
> In the FSEditLog, we want to avoid running out of space in the middle of
> writing an edit log operation to the disk. We do this through
> "preallocation": reserving space on the disk for the upcoming edit log
> entries before beginning to write them.
> The idea is that if we're going to encounter an out-of-disk-space condition,
> we don't want it to happen in the middle of writing valid data. Instead, we
> want it to happen in the middle of writing padding bytes. The edit log uses
> bytes with the value 0xff (-1 when read as a signed Java byte) as padding.
> These bytes correspond to FSEditLogOp.OP_INVALID.
> The current preallocation strategy is flawed. Although we preallocate a very
> large chunk at a time (1 megabyte), we only trigger this preallocation
> when fewer than 4096 bytes remain before the end of the file. This means
> that the effective preallocation length is only 4096 bytes. A batch of edit
> log entries could easily be larger than this. There is evidence that this has
> caused problems in the field for end-users.
> Here is a visual illustration of the old preallocation strategy:
> {code}
> first write
> |
> V <----- 1 MB ----->
> +--+---------------+
> |__|FFFFFFFFFFFFFFF|
> +--+---------------+
> second write
> |
> V
> +--+------+--------+
> |__|______|FFFFFFFF|
> +--+------+--------+
> third write
> |
> V
> +--+------+------+-+
> |__|______|______|_|
> +--+------+------+-+
> fourth write
> | (NOT preallocated)
> V
> +--+------+------+-+
> |__|______|______|________
> +--+------+------+-+
> fifth write
> |
> V<--- 1 MB -->
> +--+------+------+--------+---+--------+
> |__|______|______|________|___|FFFFFFFF|
> +--+------+------+--------+---+--------+
> {code}
> And here is the new preallocation strategy:
> {code}
> first write
> |
> V
> +--+
> |__|
> +--+
> second write
> |
> V
> +--+------+
> |__|______|
> +--+------+
> third write
> |
> V
> +--+------+------+
> |__|______|______|
> +--+------+------+
> fourth write
> |
> V
> +--+------+------+--------+
> |__|______|______|________|
> +--+------+------+--------+
> fifth write
> |
> V
> +--+------+------+--------+---+
> |__|______|______|________|___|
> +--+------+------+--------+---+
> {code}
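
The description does not spell out the exact rule the new strategy uses, so the following is only a sketch under one assumption: that the preallocation is sized from the pending batch instead of a fixed 4096-byte threshold. The class name PreallocationSizing and the method bytesToPreallocate are hypothetical and are not taken from the patch. Given the current write position, the current file size, and the size of the batch about to be written, it computes how many padding bytes to append, rounded up to whole 1 MB chunks.

```java
final class PreallocationSizing {
    static final long CHUNK_SIZE = 1024L * 1024L; // 1 MB, the chunk size named in the description

    /**
     * Returns how many padding bytes to append so that `pending` bytes
     * fit between `position` and the end of the file (`size`).
     * The result is 0 or a whole number of 1 MB chunks.
     */
    static long bytesToPreallocate(long position, long size, long pending) {
        long need = pending - (size - position); // shortfall beyond the existing padding
        if (need <= 0) {
            return 0; // enough space is already reserved
        }
        long chunks = (need + CHUNK_SIZE - 1) / CHUNK_SIZE; // round up to whole chunks
        return chunks * CHUNK_SIZE;
    }

    public static void main(String[] args) {
        // 5000-byte batch with only 100 bytes of padding left: one more chunk.
        System.out.println(bytesToPreallocate(CHUNK_SIZE - 100, CHUNK_SIZE, 5000)); // 1048576
        // 1.5 MB batch into an empty file: two chunks.
        System.out.println(bytesToPreallocate(0, 0, 3 * CHUNK_SIZE / 2)); // 2097152
        // 4097-byte batch with a full megabyte already reserved: nothing to do.
        System.out.println(bytesToPreallocate(0, CHUNK_SIZE, 4097)); // 0
    }
}
```

The first case is the one the old strategy could miss: a batch larger than 4096 bytes arriving when little padding is left. Sizing by the batch means an out-of-space error would surface while writing padding, before any of the batch is written.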