Gopal V commented on HDFS-4070:
-------------------------------

I agree that the kernel should be doing I/O merging and handling concurrent disk writes well, because it has global knowledge of I/O patterns. Ideally, userland code should just write large, page-aligned chunks of data and let the kernel do the actual magic. In this case, the userland layer is being unnecessarily chatty with syscalls (a minimal demonstration of the batching idea is sketched after the quoted description below). I'd hazard a guess that the performance boost comes simply from the reduced number of syscalls, and from the kernel doing fewer wake-ups of threads running CPU-bound code (checksums and such).

Before I benchmarked it, I was more annoyed that the system outright overrides my buffer settings, which would otherwise be a decent tunable (a configuration sketch also follows below). I don't think the 15% number translates directly to any other benchmark; this test isolates DFS writes, with no other operations in the middle.

This was tested on an RHEL 5.5 box on EC2 (with 4 EBS volumes backing HDFS), kernel 2.6.18-194.32.1.el5xen #1 SMP. A real bare-metal box would behave differently (I've ordered a new SSD-backed box; it'll arrive in parts and should be working by next week, I guess). In the meantime, if you can bench this on some real hardware, I'll have some foundation for my theories (beyond averaging runs on EC2).

> DFSClient ignores bufferSize argument & always performs small writes
> --------------------------------------------------------------------
>
>                 Key: HDFS-4070
>                 URL: https://issues.apache.org/jira/browse/HDFS-4070
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs client
>    Affects Versions: 1.0.3, 2.0.3-alpha
>         Environment: RHEL 5.5 x86_64 (ec2)
>            Reporter: Gopal V
>            Priority: Minor
>         Attachments: gistfe319436b880026cbad4-aad495d50e0d6b538831327752b984e0fdcc74db.tar.gz
>
>
> The following code illustrates the issue at hand:
> {code}
> protected void map(LongWritable offset, Text value, Context context)
>     throws IOException, InterruptedException {
>   // fs (a FileSystem) and buffer (a byte[] of at least 1 KB) are fields,
>   // assumed to be initialized elsewhere, e.g. in setup()
>   OutputStream out = fs.create(new Path("/tmp/benchmark/", value.toString()),
>       true, 1024 * 1024);           // bufferSize argument: 1 MB
>   int i;
>   for (i = 0; i < 1024 * 1024; i++) {
>     out.write(buffer, 0, 1024);     // 1 GB total, written 1 KB at a time
>   }
>   out.close();
>   context.write(value, new IntWritable(i));
> }
> {code}
> This code is run as a single map-only task with an input file on disk and map output to disk:
> {{# su - hdfs -c 'hadoop jar /tmp/dfs-test-1.0-SNAPSHOT-job.jar file:///tmp/list file:///grid/0/hadoop/hdfs/tmp/benchmark'}}
> In the datanode's disk access patterns, the following pattern was observed consistently, irrespective of the bufferSize provided:
> {code}
> 21119 read(58, <unfinished ...>
> 21119 <... read resumed> "\0\1\0\0\0\0\0\0\0034\212\0\0\0\0\0\0\0+\220\0\0\0\376\0\262\252ux\262\252u"..., 65557) = 65557
> 21119 lseek(107, 0, SEEK_CUR <unfinished ...>
> 21119 <... lseek resumed> ) = 53774848
> 21119 write(107, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65024 <unfinished ...>
> 21119 <... write resumed> ) = 65024
> 21119 write(108, "\262\252ux\262\252ux\262\252ux\262\252ux\262\252ux\262\252ux\262\252ux\262\252ux"..., 508 <unfinished ...>
> 21119 <... write resumed> ) = 508
> {code}
> Here fd 58 is the incoming socket, 107 is the blk file and 108 is the .meta file (the 65024-byte write is 127 × 512-byte data chunks, and the 508-byte write is the matching 127 × 4-byte checksums).
> The DFS packet size ignores the bufferSize argument; with the default 64 KB value it suffers from suboptimal syscall & disk performance, as is obvious from the interrupted read/write operations above.
> Changing the packet size to a more optimal 1056405 bytes results in a decent spike in performance, by cutting down on disk & network IOPS.
> h3. Average time (milliseconds) for a 10 GB write as 10 files in a single map task
> ||timestamp||65536-byte packets||1056252-byte packets||
> |1350469614|88530|78662|
> |1350469827|88610|81680|
> |1350470042|92632|78277|
> |1350470261|89726|79225|
> |1350470476|92272|78265|
> |1350470696|89646|81352|
> |1350470913|92311|77281|
> |1350471132|89632|77601|
> |1350471345|89302|81530|
> |1350471564|91844|80413|
> On average that is an increase from ~115 MB/s to ~130 MB/s, obtained by modifying the global packet size setting.
> This suggests that there is value in adapting the user-provided bufferSize to Hadoop packet sizing, per stream (see the sizing sketch at the end of this comment).
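To make the syscall-chattiness point concrete, here is a minimal stand-alone sketch — not HDFS code; the file name, buffer capacity, and write counts are all arbitrary. It shows how a large userland buffer turns 1 KB application writes into roughly one {{write(2)}} per megabyte:

{code}
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class BatchedWrites {
  public static void main(String[] args) throws IOException {
    byte[] chunk = new byte[1024];   // 1 KB application-level writes
    int bufSize = 1024 * 1024;       // 1 MB buffer, a multiple of the 4 KB page size
    OutputStream out = new BufferedOutputStream(
        new FileOutputStream("/tmp/batched.out"), bufSize);
    for (int i = 0; i < 1024 * 1024; i++) {
      // coalesced in the buffer; flushed to the kernel as ~1 MB write() calls
      out.write(chunk, 0, chunk.length);
    }
    out.close();
  }
}
{code}

Running that under {{strace -f -e trace=write}} against an unbuffered {{FileOutputStream}} should show the syscall count drop by roughly three orders of magnitude for the same 1 GB of data.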
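On the overridden tunable: the 64 KB figure above is the client's global write-packet-size setting, which the bufferSize argument to {{fs.create()}} never reaches. A sketch of raising it per job, assuming the 2.x key {{dfs.client-write-packet-size}} (branch-1 reads {{dfs.write.packet.size}}); the key names are from memory, so verify them against the version in use:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PacketSizeOverride {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // The packet size is per-client, not per-stream; the bufferSize
    // argument to create() below has no effect on it.
    conf.setInt("dfs.client-write-packet-size", 1056252); // 2.x key (assumed)
    // conf.setInt("dfs.write.packet.size", 1056252);     // branch-1 equivalent
    FileSystem fs = FileSystem.get(conf);
    FSDataOutputStream out =
        fs.create(new Path("/tmp/benchmark/packet-test"), true, 1024 * 1024);
    out.write(new byte[1024]);
    out.close();
  }
}
{code}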
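Finally, on the per-stream suggestion: the client already builds each packet out of a whole number of checksum chunks, so adapting would mean deriving the chunk count from the caller's bufferSize instead of the global setting. A hypothetical helper, for illustration only — the method name and the 25-byte header constant are invented here, not HDFS API, while 512-byte chunks with 4-byte CRC32 checksums are the usual defaults:

{code}
public class PacketSizing {
  static final int PKT_HEADER_LEN = 25;       // assumed header length, illustration only
  static final int BYTES_PER_CHECKSUM = 512;  // io.bytes.per.checksum default
  static final int CHECKSUM_SIZE = 4;         // 4-byte CRC32 per chunk

  // Round a user-supplied bufferSize down to a whole number of
  // checksum chunks, mirroring how the client sizes its packets.
  static int packetSizeFor(int bufferSize) {
    int chunkOnWire = BYTES_PER_CHECKSUM + CHECKSUM_SIZE;  // data + checksum
    int chunks = Math.max(1, (bufferSize - PKT_HEADER_LEN) / chunkOnWire);
    return PKT_HEADER_LEN + chunks * chunkOnWire;
  }

  public static void main(String[] args) {
    // A 1 MB bufferSize maps to ~1 MB packets instead of the 64 KB default.
    System.out.println(packetSizeFor(1024 * 1024)); // prints 1048537
  }
}
{code}

For what it's worth, the 1056252 bytes benchmarked above is exactly 2047 such 516-byte chunks, so whole-chunk sizing is already compatible with the numbers in the table.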