[ https://issues.apache.org/jira/browse/HDFS-1846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025517#comment-13025517 ]

Aaron T. Myers commented on HDFS-1846:
--------------------------------------

@Eli - the first test was indeed done on an SSD. Here are the results of 
running the test on a spinning HDD:

{noformat}
----------------------------------------------------
Results for classic scheme:
Overall total ops: 100000
Overall total time of all ops: 1024072.0
Overall average time of op: 10.24072
Overall fastest op: 3
Overall slowest op: 178
Preallocation total ops: 23
Preallocation total time of all ops: 871.0
Preallocation average time of op: 37.869565217391305
Preallocation fastest op: 28
Preallocation slowest op: 52
Total time of slowest 1% of ops: 48949.0
Average time of slowest 1% of ops: 48.949
----------------------------------------------------
----------------------------------------------------
Results for new scheme:
Overall total ops: 100000
Overall total time of all ops: 860702.0
Overall average time of op: 8.60702
Overall fastest op: 2
Overall slowest op: 288
Preallocation total ops: 23
Preallocation total time of all ops: 1236.0
Preallocation average time of op: 53.73913043478261
Preallocation fastest op: 41
Preallocation slowest op: 91
Total time of slowest 1% of ops: 36456.0
Average time of slowest 1% of ops: 36.456
----------------------------------------------------
{noformat}

The results are similar to my previous test, just a whole lot slower across the 
board. If anything, the percent improvement for the average op has increased - 
from a 5% improvement on an SSD to an 18% improvement on a spinning HDD. The 
average performance degradation of a preallocation-inducing op has also 
improved - from 1200% worse to 42% worse.

It's also worth noting that, per an offline suggestion from Todd, I ran this 
test slightly differently. I ran each test (classic and new schemes) twice, to 
account for any warm-up time for the various caches involved (disk, JIT, class 
loading, local FS, etc.). The results I've included here are from the second 
run of each test. Here's a diff based off my previous patch:

{code}
index 7e74429..d599224 100644
--- src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestEditLogOutputStream.java
+++ src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestEditLogOutputStream.java
@@ -19,11 +19,13 @@ public class TestEditLogOutputStream {
   @Test
   public void testEditLogOutputStreamPerformanceWithClassicPreallocationScheme() throws IOException {
     performTestAndPrintResults(false);
+    performTestAndPrintResults(false);
   }
   
   @Test
   public void testEditLogOutputStreamPerformanceWithNewPreallocationScheme() throws IOException {
     performTestAndPrintResults(true);
+    performTestAndPrintResults(true);
   }
 
   private void performTestAndPrintResults(boolean useNewPreallocationScheme) throws IOException {
@@ -32,6 +34,7 @@ public class TestEditLogOutputStream {
     
     Configuration conf = new Configuration();
     conf.set(DFSConfigKeys.DFS_PERMISSIONS_ENABLED_KEY, "false");
+    conf.set("hadoop.tmp.dir", "/data/1/atm/edits-log-preallocate-test/tmp");
     FileSystem.setDefaultUri(conf, "hdfs://localhost:0");
     conf.set("dfs.http.address", "127.0.0.1:0");
     File baseDir = new File(conf.get("hadoop.tmp.dir"), "dfs/");
{code}
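For anyone following along, here is a rough standalone sketch of the core idea behind the new scheme - writing real non-zero bytes into the preallocated region instead of seeking past EOF and leaving a sparse, zero-filled hole. This is not the actual patch; the class name, the 0xFF fill pattern, and the chunk size are all illustrative:

{code}
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Arrays;

public class PreallocateSketch {
  // Write a non-zero fill pattern past the current EOF so the local FS
  // must actually allocate blocks now. If the volume is full, the write
  // fails here, at preallocation time, rather than later when a real
  // transaction is being written out.
  static void preallocate(RandomAccessFile raf, long bytes) throws IOException {
    byte[] fill = new byte[4096];
    Arrays.fill(fill, (byte) 0xFF); // any non-zero pattern defeats sparse-file handling
    raf.seek(raf.length());
    long remaining = bytes;
    while (remaining > 0) {
      int chunk = (int) Math.min(fill.length, remaining);
      raf.write(fill, 0, chunk);
      remaining -= chunk;
    }
  }

  public static void main(String[] args) throws IOException {
    File f = File.createTempFile("edits", ".tmp");
    f.deleteOnExit();
    try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
      preallocate(raf, 1 << 20); // preallocate 1 MB
      System.out.println(raf.length()); // prints 1048576
    }
  }
}
{code}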

> Don't fill preallocated portion of edits log with 0x00
> ------------------------------------------------------
>
>                 Key: HDFS-1846
>                 URL: https://issues.apache.org/jira/browse/HDFS-1846
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>    Affects Versions: 0.23.0
>            Reporter: Aaron T. Myers
>            Assignee: Aaron T. Myers
>         Attachments: hdfs-1846-perf-analysis.0.patch, hdfs-1846.0.txt
>
>
> HADOOP-2330 added a feature to preallocate space in the local file system for 
> the NN transaction log. That change seeks past the current end of the file 
> and writes out some data, which on most systems results in the intervening 
> data in the file being filled with zeros. Most underlying file systems have 
> special handling for sparse files, and don't actually allocate blocks on disk 
> for blocks of a file which consist completely of 0x00.
> I've seen cases in the wild where the volume an edits dir is on fills up, 
> resulting in a partial final transaction being written out to disk. If you 
> examine the bytes of this (now corrupt) edits file, you'll see the partial 
> final transaction followed by a lot of zeros, suggesting that the 
> preallocation previously succeeded before the volume ran out of space. If we 
> fill the preallocated space with something other than zeros, we'd likely see 
> the failure at preallocation time, rather than transaction-writing time, and 
> so cause the NN to crash earlier, without a partial transaction being written 
> out.
> I also hypothesize that filling the preallocated space in the edits log with 
> something other than 0x00 will result in a performance improvement in NN 
> throughput. I haven't tested this yet, but I intend to as part of this JIRA.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira