[ https://issues.apache.org/jira/browse/HDFS-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13222216#comment-13222216 ]

Henry Robinson commented on HDFS-2834:
--------------------------------------

Here are some initial benchmark numbers. They need a little explaining.

I ran 16 experiments in total, varying the read path (copying or direct), the 
kind of checksum used (native, non-native or none), and the locality 
(shortcircuit or remote-to-same-machine), although not in all combinations. All 
measurements are through libhdfs, which explains a couple of oddities in the 
performance numbers. Once I get a little time, I'll try to do a pure-Java 
benchmark, but the relative results should be quite similar. 

*Configuration*

The test reads the first 512MB of a 2GB file from a MiniDFSCluster running on 
the same machine. Each configuration was run 50 times; the first 5 runs were 
discarded and the remaining runs averaged. The file was read from the buffer 
cache on a machine with 16GB of RAM and an 8-core i7. You can see the code 
here: https://gist.github.com/1977470
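
In outline, each measurement amounts to the following (a minimal pure-Java 
sketch of the same methodology; the actual benchmark in the gist drives the 
reads through libhdfs, so the class name and argument handling here are 
illustrative):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadBench {
  private static final long TOTAL = 512L * 1024 * 1024; // first 512MB of the file
  private static final int RUNS = 50, WARMUP = 5;       // discard the first 5 runs

  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path file = new Path(args[0]);
    int readSize = Integer.parseInt(args[1]); // 32k or 1MB in these experiments
    byte[] buf = new byte[readSize];
    double sumMBps = 0;
    for (int run = 0; run < RUNS; run++) {
      FSDataInputStream in = fs.open(file);
      long start = System.nanoTime();
      long remaining = TOTAL;
      while (remaining > 0) {
        int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
        if (n < 0) break;
        remaining -= n;
      }
      double secs = (System.nanoTime() - start) / 1e9;
      in.close();
      if (run >= WARMUP) {
        sumMBps += (TOTAL / (1024.0 * 1024.0)) / secs;
      }
    }
    System.out.printf("mean throughput: %.2f MB/s%n", sumMBps / (RUNS - WARMUP));
  }
}
{code}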

*Read sizes*

The size of each requested read is an important variable. When checksumming, 
the maximum read size is bounded by the number of checksums that 
BlockReaderLocal can fit into its internal buffer. Prior to this patch, this 
fixed the maximum size of a single read at 32k. In a revision of this patch 
that I'll upload shortly, I've made this buffer size configurable.
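
To make the bound concrete: each checksum covers one chunk of the file (512 
bytes by default), so a buffer with room for N checksums caps a single read at 
N * 512 bytes. A back-of-the-envelope sketch, where the 64-checksum capacity 
is inferred from the observed 32k limit rather than read out of the code:

{code:java}
public class ChecksumBound {
  public static void main(String[] args) {
    // dfs.bytes-per-checksum default: one CRC32 per 512-byte chunk.
    int bytesPerChecksum = 512;
    // Assumed capacity of BlockReaderLocal's internal checksum buffer,
    // inferred from the 32k limit, not taken from the implementation.
    int checksumsInBuffer = 64;
    int maxSingleRead = checksumsInBuffer * bytesPerChecksum;
    System.out.println(maxSingleRead + " bytes"); // 32768 bytes = 32k
  }
}
{code}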

I ran all the experiments in two configurations: with a 32k read buffer and 
with a 1MB one. The requested read size was also fixed to 32k and 1MB 
respectively. When not performing checksums but doing a shortcircuit read, 
there is no limit on the size of a single read; for comparison, these 
experiments were run with 32k and 1MB reads as well. 

Finally, remote reads are limited to 64k in size. Again, I ran the experiment 
with both read sizes. The 1MB / copying read performance is extremely slow 
when performing a remote read. This is because of the excessive memory 
allocation inside libhdfs, which allocates a fresh 1MB byte[] for every 64k 
read. This illustrates one of the confounding effects of measuring performance 
through libhdfs, and the danger of not matching your read size to the size of 
read the BlockReader implementation can actually return.
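
The pattern behind that slowdown looks roughly like this, as seen from the 
Java side (an illustrative sketch, not the actual libhdfs code, which does the 
equivalent through JNI):

{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

public class CopyingReadSketch {
  // The copying path sizes a fresh byte[] to the *requested* read on every
  // call, even though a remote BlockReader hands back at most 64k per read.
  static void copyingRead(InputStream in, ByteBuffer userBuffer)
      throws IOException {
    while (userBuffer.hasRemaining()) {
      byte[] jvmBuf = new byte[userBuffer.remaining()]; // 1MB allocated per call...
      int n = in.read(jvmBuf, 0, jvmBuf.length);        // ...filled with at most 64k
      if (n < 0) break;
      userBuffer.put(jvmBuf, 0, n); // second copy, out to the caller's buffer
    }
  }
}
{code}

With a 1MB request against 64k remote reads, roughly 15/16 of every allocation 
is wasted; sizing the request to what the BlockReader can actually return 
avoids the churn.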

*Results*

(All values are throughput measured in MB/s)

||      ||Native Checksums||No Checksums||Non-native Checksums||Remote, Native Checksums||
|Direct (MB/s) - 1MB buffer and request size|3834.25|4665.05|867.06|2057.17|
|Copying (MB/s) - 1MB buffer and request size|1976.09|1650.15|754.97|394.91|
|Direct (MB/s) - 32k buffer and request size|2943.02|3695.37|816.22|1925.03|
|Copying (MB/s) - 32k buffer and request size|2010.21|2290.50|721.52|1412.20|

... and in pretty picture form:

!hdfs-2834-libhdfs-benchmark.png!


                
> ByteBuffer-based read API for DFSInputStream
> --------------------------------------------
>
>                 Key: HDFS-2834
>                 URL: https://issues.apache.org/jira/browse/HDFS-2834
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Henry Robinson
>            Assignee: Henry Robinson
>         Attachments: HDFS-2834-no-common.patch, HDFS-2834.3.patch, 
> HDFS-2834.4.patch, HDFS-2834.5.patch, HDFS-2834.6.patch, HDFS-2834.patch, 
> HDFS-2834.patch, hdfs-2834-libhdfs-benchmark.png
>
>
> The {{DFSInputStream}} read-path always copies bytes into a JVM-allocated 
> {{byte[]}}. Although for many clients this is desired behaviour, in certain 
> situations, such as native reads through libhdfs, this imposes an extra copy 
> penalty since the {{byte[]}} needs to be copied out again into a natively 
> readable memory area. 
> For these cases, it would be preferable to allow the client to supply its own 
> buffer, wrapped in a {{ByteBuffer}}, to avoid that final copy overhead. 
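
For illustration, client code against the proposed API might look like the 
following (a sketch assuming the {{read(ByteBuffer)}} method this patch adds 
to {{DFSInputStream}}; the final signature may differ):

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import org.apache.hadoop.fs.FSDataInputStream;

public class DirectReadSketch {
  // A direct buffer lets a native caller such as libhdfs hand HDFS memory it
  // can read in place, skipping the intermediate byte[] copy.
  static long readAll(FSDataInputStream in, int chunkSize) throws IOException {
    ByteBuffer buf = ByteBuffer.allocateDirect(chunkSize);
    long total = 0;
    int n;
    while ((n = in.read(buf)) > 0) { // fills buf directly, no intermediate byte[]
      total += n;
      buf.clear(); // reuse the same native memory for the next read
    }
    return total;
  }
}
{code}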

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
