[ https://issues.apache.org/jira/browse/HDFS-1846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025517#comment-13025517 ]
Aaron T. Myers commented on HDFS-1846:
--------------------------------------

@Eli - the first test was indeed done on an SSD. Here are the results of running the test on a spinning HDD:

{noformat}
----------------------------------------------------
Results for classic scheme:
Overall total ops: 100000
Overall total time of all ops: 1024072.0
Overall average time of op: 10.24072
Overall fastest op: 3
Overall slowest op: 178
Preallocation total ops: 23
Preallocation total time of all ops: 871.0
Preallocation average time of op: 37.869565217391305
Preallocation fastest op: 28
Preallocation slowest op: 52
Total time of slowest 1% of ops: 48949.0
Average time of slowest 1% of ops: 48.949
----------------------------------------------------
----------------------------------------------------
Results for new scheme:
Overall total ops: 100000
Overall total time of all ops: 860702.0
Overall average time of op: 8.60702
Overall fastest op: 2
Overall slowest op: 288
Preallocation total ops: 23
Preallocation total time of all ops: 1236.0
Preallocation average time of op: 53.73913043478261
Preallocation fastest op: 41
Preallocation slowest op: 91
Total time of slowest 1% of ops: 36456.0
Average time of slowest 1% of ops: 36.456
----------------------------------------------------
{noformat}

The results are similar to those of my previous test, just a whole lot slower across the board. If anything, the percent improvement for the average op has grown - from a 5% improvement on an SSD to an 18% improvement on a spinning HDD. The average performance degradation of a preallocation-inducing op has also improved - from 1200% worse to 42% worse.

Also worth noting that, per an offline suggestion from Todd, I ran this test slightly differently: I ran each test (classic and new schemes) twice, to account for any warm-up time for the various caches involved (disk, JIT, classloading, local FS, etc.). The results I've included here are from the second run of each test.
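As a sanity check, the percentages above can be recomputed from the raw per-op averages. A minimal sketch (the class name is just illustrative; note that the average-op ratio comes out to roughly 19%, so the 18% figure quoted above depends on the rounding convention used):

```java
public class EditsPerfRatios {
    public static void main(String[] args) {
        // Average op time in ms, taken from the HDD results above
        double classicAvg = 10.24072, newAvg = 8.60702;
        // Average time of a preallocation-inducing op, from the same results
        double classicPre = 37.869565217391305, newPre = 53.73913043478261;
        // Speedup of the new scheme on the average op: classic/new - 1
        System.out.printf("avg op speedup: %.0f%%%n", (classicAvg / newAvg - 1) * 100);
        // Slowdown of a preallocation-inducing op under the new scheme: new/classic - 1
        System.out.printf("prealloc slowdown: %.0f%%%n", (newPre / classicPre - 1) * 100);
    }
}
```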
Here's a diff based off my previous patch:

{code}
index 7e74429..d599224 100644
--- src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestEditLogOutputStream.java
+++ src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestEditLogOutputStream.java
@@ -19,11 +19,13 @@ public class TestEditLogOutputStream {
   @Test
   public void testEditLogOutputStreamPerformanceWithClassicPreallocationScheme() throws IOException {
     performTestAndPrintResults(false);
+    performTestAndPrintResults(false);
   }
 
   @Test
   public void testEditLogOutputStreamPerformanceWithNewPreallocationScheme() throws IOException {
     performTestAndPrintResults(true);
+    performTestAndPrintResults(true);
   }
 
   private void performTestAndPrintResults(boolean useNewPreallocationScheme) throws IOException {
@@ -32,6 +34,7 @@ public class TestEditLogOutputStream {
 
     Configuration conf = new Configuration();
     conf.set(DFSConfigKeys.DFS_PERMISSIONS_ENABLED_KEY, "false");
+    conf.set("hadoop.tmp.dir", "/data/1/atm/edits-log-preallocate-test/tmp");
     FileSystem.setDefaultUri(conf, "hdfs://localhost:0");
     conf.set("dfs.http.address", "127.0.0.1:0");
     File baseDir = new File(conf.get("hadoop.tmp.dir"), "dfs/");
{code}

> Don't fill preallocated portion of edits log with 0x00
> ------------------------------------------------------
>
>                 Key: HDFS-1846
>                 URL: https://issues.apache.org/jira/browse/HDFS-1846
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>    Affects Versions: 0.23.0
>            Reporter: Aaron T. Myers
>            Assignee: Aaron T. Myers
>         Attachments: hdfs-1846-perf-analysis.0.patch, hdfs-1846.0.txt
>
>
> HADOOP-2330 added a feature to preallocate space in the local file system for
> the NN transaction log. That change seeks past the current end of the file
> and writes out some data, which on most systems results in the intervening
> data in the file being filled with zeros.
> Most underlying file systems have special handling for sparse files, and
> don't actually allocate blocks on disk for blocks of a file which consist
> completely of 0x00.
>
> I've seen cases in the wild where the volume an edits dir is on fills up,
> resulting in a partial final transaction being written out to disk. If you
> examine the bytes of this (now corrupt) edits file, you'll see the partial
> final transaction followed by a lot of zeros, suggesting that the
> preallocation previously succeeded before the volume ran out of space. If we
> fill the preallocated space with something other than zeros, we'd likely see
> the failure at preallocation time, rather than at transaction-writing time,
> and so cause the NN to crash earlier, without a partial transaction being
> written out.
>
> I also hypothesize that filling the preallocated space in the edits log with
> something other than 0x00 will result in a performance improvement in NN
> throughput. I haven't tested this yet, but I intend to as part of this JIRA.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
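The two preallocation schemes contrasted in the description can be sketched as follows. This is a simplified illustration, not the actual FSEditLog code: the class, method names, chunk size, and 0xFF fill byte are all assumptions for the sketch. The key point is that seeking past EOF leaves a hole of implicit zeros that most local file systems never back with disk blocks, while explicitly writing a non-zero fill forces real allocation, so an out-of-space condition surfaces at preallocation time.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Arrays;

public class PreallocationSketch {
    static final int PREALLOC_SIZE = 1024 * 1024; // hypothetical 1 MB chunk

    // Classic scheme (HADOOP-2330 style): seek past the current end and write
    // a single byte. The skipped range becomes a hole -- the file is sparse,
    // and the file system typically allocates no blocks for the zeros.
    static void preallocateSparse(RandomAccessFile f) throws IOException {
        f.seek(f.length() + PREALLOC_SIZE - 1);
        f.write(0);
    }

    // New scheme: explicitly write a non-zero fill byte (0xFF here) over the
    // whole preallocated region, forcing the FS to allocate real blocks.
    static void preallocateFilled(RandomAccessFile f) throws IOException {
        byte[] fill = new byte[4096];
        Arrays.fill(fill, (byte) 0xFF);
        long remaining = PREALLOC_SIZE;
        f.seek(f.length());
        while (remaining > 0) {
            int n = (int) Math.min(fill.length, remaining);
            f.write(fill, 0, n);
            remaining -= n;
        }
    }

    public static void main(String[] args) throws IOException {
        File sparse = File.createTempFile("edits-sparse", ".log");
        File filled = File.createTempFile("edits-filled", ".log");
        try (RandomAccessFile a = new RandomAccessFile(sparse, "rw");
             RandomAccessFile b = new RandomAccessFile(filled, "rw")) {
            preallocateSparse(a);
            preallocateFilled(b);
            // Both files report the same logical length, even though only the
            // filled one is guaranteed to have disk blocks behind it.
            System.out.println(a.length() == b.length());
        }
        sparse.delete();
        filled.delete();
    }
}
```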