[ https://issues.apache.org/jira/browse/HDFS-1846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023631#comment-13023631 ]

Aaron T. Myers commented on HDFS-1846:
--------------------------------------

Hey guys, I've done some performance analysis, and here are the results. I'll 
post a patch shortly (not intended for inclusion) so you can see how I did the 
analysis. If anyone would like to try this patch on their own system, I'd be 
very curious to see the results, since, as Nathan points out, they can be 
affected by many factors.

{noformat}
----------------------------------------------------
Results for classic scheme (all times in ms):
Overall total ops: 100000 
Overall total time of all ops: 39224.0
Overall average time of op: 0.39224
Overall fastest op: 0
Overall slowest op: 223 
Preallocation total ops: 23
Preallocation total time of all ops: 24.0
Preallocation average time of op: 1.0434782608695652
Preallocation fastest op: 0
Preallocation slowest op: 6
Total time of slowest 1% of ops: 4858.0
Average time of slowest 1% of ops: 4.858
----------------------------------------------------
----------------------------------------------------
Results for new scheme (all times in ms):
Overall total ops: 100000
Overall total time of all ops: 37192.0
Overall average time of op: 0.37192
Overall fastest op: 0 
Overall slowest op: 231
Preallocation total ops: 23 
Preallocation total time of all ops: 291.0 
Preallocation average time of op: 12.652173913043478
Preallocation fastest op: 10 
Preallocation slowest op: 21
Total time of slowest 1% of ops: 4670.0
Average time of slowest 1% of ops: 4.67
----------------------------------------------------
{noformat}

I personally ran this test several times on my own system, and the results from 
this particular test run are pretty representative. There wasn't much variation 
across runs.

As you can see from this data, with the new scheme an edit which causes an 
on-disk preallocation is indeed slower - about 10x slower than a similar op 
under the previous scheme. However, I was correct that the average op is 
faster with the new scheme than with the old. It's also worth noting that the 
average time for the slowest 1% of ops is lower with the new scheme, since 
there were only 23 preallocations during the test run.
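For reference, the two schemes being compared can be sketched roughly as 
follows. This is a minimal illustration based on the descriptions above, not 
the actual FSEditLog code; the class name, the 1 MB preallocation size, and 
the 0xFF fill byte are all my own choices for the example:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

public class PreallocDemo {
    static final int PREALLOC_SIZE = 1024 * 1024; // illustrative 1 MB chunk

    // Classic scheme: seek past the current end and write a single byte.
    // The intervening bytes read back as zeros, and most file systems
    // leave them as a sparse hole with no blocks allocated on disk.
    static void preallocateClassic(RandomAccessFile f) throws IOException {
        long position = f.length();
        f.seek(position + PREALLOC_SIZE - 1);
        f.write(0);
        f.seek(position); // restore the write position for the next edit
    }

    // New scheme: physically write non-zero fill bytes, forcing the file
    // system to allocate real blocks up front. Slower at preallocation
    // time, but a full volume fails here instead of mid-transaction.
    static void preallocateFilled(RandomAccessFile f) throws IOException {
        long position = f.length();
        byte[] fill = new byte[PREALLOC_SIZE];
        java.util.Arrays.fill(fill, (byte) 0xFF); // any non-zero marker
        f.seek(position);
        f.write(fill);
        f.seek(position);
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("prealloc", ".log");
        try (RandomAccessFile f = new RandomAccessFile(p.toFile(), "rw")) {
            preallocateClassic(f);
            System.out.println("after classic: " + f.length() + " bytes");
            preallocateFilled(f);
            System.out.println("after filled:  " + f.length() + " bytes");
        }
        Files.delete(p);
    }
}
```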

I'm of the opinion that the increased latency of the preallocation-inducing 
ops is worth the improvement in average op latency and the extra durability 
this patch would provide. The worst increase in latency for an op which 
happens to induce a preallocation is ~20ms, which seems acceptable.

Also, curiously, in the course of this analysis I discovered that under both 
preallocation schemes there are fairly consistently ~10 ops that take ~200ms 
each on my system. These ops seem uncorrelated with preallocations. I'm 
leaving the investigation of their cause as future work.

> Don't fill preallocated portion of edits log with 0x00
> ------------------------------------------------------
>
>                 Key: HDFS-1846
>                 URL: https://issues.apache.org/jira/browse/HDFS-1846
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>    Affects Versions: 0.23.0
>            Reporter: Aaron T. Myers
>            Assignee: Aaron T. Myers
>         Attachments: hdfs-1846.0.txt
>
>
> HADOOP-2330 added a feature to preallocate space in the local file system for 
> the NN transaction log. That change seeks past the current end of the file 
> and writes out some data, which on most systems results in the intervening 
> data in the file being filled with zeros. Most underlying file systems have 
> special handling for sparse files, and don't actually allocate blocks on disk 
> for blocks of a file which consist completely of 0x00.
>
> I've seen cases in the wild where the volume an edits dir is on fills up, 
> resulting in a partial final transaction being written out to disk. If you 
> examine the bytes of this (now corrupt) edits file, you'll see the partial 
> final transaction followed by a lot of zeros, suggesting that the 
> preallocation previously succeeded before the volume ran out of space. If we 
> fill the preallocated space with something other than zeros, we'd likely see 
> the failure at preallocation time, rather than transaction-writing time, and 
> so cause the NN to crash earlier, without a partial transaction being written 
> out.
>
> I also hypothesize that filling the preallocated space in the edits log with 
> something other than 0x00 will result in a performance improvement in NN 
> throughput. I haven't tested this yet, but I intend to as part of this JIRA.
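The zero-fill behavior the description relies on is easy to reproduce. A 
hedged sketch, where the tiny 4-byte "transaction", the 4 KB gap, and the 
0xFF marker byte are arbitrary values chosen purely for illustration:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

public class SparseZeroDemo {
    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("edits", ".log");
        try (RandomAccessFile f = new RandomAccessFile(p.toFile(), "rw")) {
            // Write a small stand-in "transaction"...
            f.write(new byte[] {1, 2, 3, 4});

            // ...then seek past EOF and write a byte, as the description
            // says the HADOOP-2330 preallocation does.
            f.seek(4 + 4096 - 1);
            f.write(0xFF);

            // Every byte in the resulting hole reads back as 0x00, so a
            // reader cannot distinguish preallocated-but-unwritten space
            // from a log that ends after the last complete transaction.
            f.seek(4);
            System.out.println("first byte after txn: " + f.read()); // 0
        }
        Files.delete(p);
    }
}
```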

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
