[ https://issues.apache.org/jira/browse/HDFS-1846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023631#comment-13023631 ]
Aaron T. Myers commented on HDFS-1846:
--------------------------------------

Hey guys, I've done some performance analysis, and here are the results. I'll post a patch shortly (not intended for inclusion) so you can see what I did to do the analysis. If anyone would like to try this patch on their own system, I'd be very curious to see the results, since, as Nathan points out, the results can be affected by many factors.

{noformat}
----------------------------------------------------
Results for classic scheme:
Overall total ops: 100000
Overall total time of all ops: 39224.0
Overall average time of op: 0.39224
Overall fastest op: 0
Overall slowest op: 223
Preallocation total ops: 23
Preallocation total time of all ops: 24.0
Preallocation average time of op: 1.0434782608695652
Preallocation fastest op: 0
Preallocation slowest op: 6
Total time of slowest 1% of ops: 4858.0
Average time of slowest 1% of ops: 4.858
----------------------------------------------------
----------------------------------------------------
Results for new scheme:
Overall total ops: 100000
Overall total time of all ops: 37192.0
Overall average time of op: 0.37192
Overall fastest op: 0
Overall slowest op: 231
Preallocation total ops: 23
Preallocation total time of all ops: 291.0
Preallocation average time of op: 12.652173913043478
Preallocation fastest op: 10
Preallocation slowest op: 21
Total time of slowest 1% of ops: 4670.0
Average time of slowest 1% of ops: 4.67
----------------------------------------------------
{noformat}

I personally ran this test several times on my own system, and the results from this particular run are representative; there wasn't much variation across runs. As you can see from this data, with the new scheme an edit which triggers an on-disk preallocation is indeed slower - about 10x slower than the equivalent op under the previous scheme.
However, I was correct that the average op is indeed faster with the new scheme than the old. It's also worth noting that the average time taken for the slowest 1% of ops is lower with the new scheme, since there were only 23 preallocations during the test run. In my opinion, the increased latency of the preallocation-inducing ops is worth the performance improvement of the average op and the extra durability this patch would provide. The worst increase in latency for an op which happens to induce a preallocation is ~20ms, which seems acceptable.

Also, curiously, in the course of this analysis I discovered that under both preallocation schemes there are fairly consistently ~10 ops whose total time was ~200ms on my system. These ops seem uncorrelated with preallocations; determining what's causing them is left as future work.

> Don't fill preallocated portion of edits log with 0x00
> ------------------------------------------------------
>
>                 Key: HDFS-1846
>                 URL: https://issues.apache.org/jira/browse/HDFS-1846
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>    Affects Versions: 0.23.0
>            Reporter: Aaron T. Myers
>            Assignee: Aaron T. Myers
>         Attachments: hdfs-1846.0.txt
>
>
> HADOOP-2330 added a feature to preallocate space in the local file system for
> the NN transaction log. That change seeks past the current end of the file
> and writes out some data, which on most systems results in the intervening
> portion of the file being filled with zeros. Most underlying file systems have
> special handling for sparse files, and don't actually allocate blocks on disk
> for regions of a file which consist entirely of 0x00.
> I've seen cases in the wild where the volume an edits dir is on fills up,
> resulting in a partial final transaction being written out to disk.
> If you examine the bytes of this (now corrupt) edits file, you'll see the
> partial final transaction followed by a lot of zeros, suggesting that the
> preallocation previously succeeded before the volume ran out of space. If we
> fill the preallocated space with something other than zeros, we'd likely see
> the failure at preallocation time rather than at transaction-writing time,
> causing the NN to crash earlier, without a partial transaction being written
> out.
> I also hypothesize that filling the preallocated space in the edits log with
> something other than 0x00 will result in a performance improvement in NN
> throughput. I haven't tested this yet, but I intend to as part of this JIRA.
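To make the distinction between the two schemes concrete, here is a minimal standalone Java sketch, not the actual patch: the class name {{PreallocDemo}}, the 1 MB preallocation size, and the 0xFF fill byte are illustrative assumptions for this example. The sparse variant seeks past EOF and writes one byte (the file system may skip allocating the intervening blocks); the dense variant writes real non-zero bytes, so a full volume fails at preallocation time instead of mid-transaction.

```java
import java.io.*;

public class PreallocDemo {
    // Illustrative preallocation size; not taken from the patch.
    static final int PREALLOC_SIZE = 1024 * 1024;

    // Sparse preallocation (old scheme): seek past EOF and write a single
    // byte. The file length grows, but the intervening bytes read back as
    // 0x00 and may never be backed by allocated disk blocks.
    static void preallocateSparse(File f) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            raf.seek(PREALLOC_SIZE - 1);
            raf.write(0);
        }
    }

    // Dense preallocation (proposed scheme): actually write non-zero fill
    // bytes so the file system must allocate blocks up front. An ENOSPC
    // condition then surfaces here, before any transaction is written.
    static void preallocateDense(File f) throws IOException {
        byte[] fill = new byte[4096];
        java.util.Arrays.fill(fill, (byte) 0xFF); // hypothetical fill value
        try (FileOutputStream out = new FileOutputStream(f)) {
            for (int written = 0; written < PREALLOC_SIZE; written += fill.length) {
                out.write(fill);
            }
            out.getFD().sync(); // flush so allocation failures show up now
        }
    }

    public static void main(String[] args) throws IOException {
        File sparse = File.createTempFile("edits-sparse", ".tmp");
        File dense = File.createTempFile("edits-dense", ".tmp");
        preallocateSparse(sparse);
        preallocateDense(dense);
        System.out.println("sparse length: " + sparse.length());
        System.out.println("dense length:  " + dense.length());
        sparse.delete();
        dense.delete();
    }
}
```

Both files report the same length; the difference is only in whether the underlying blocks are actually allocated, which is exactly why the zero-filled edits file in the description could "succeed" at preallocation and still run the volume out of space later.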