[ https://issues.apache.org/jira/browse/HDFS-959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833737#action_12833737 ]

Naredula Janardhana Reddy commented on HDFS-959:
------------------------------------------------

bq. What is the bandwidth of the disk that you use? How is it compared to 165 
MB/sec throughput?

I misunderstood your question the first time. 

Based on a simple C program I wrote, I was able to push ~185 MB/s. The program 
calls write() in a tight loop over a large amount of data (>40 GB) on a machine 
with 16 GB of RAM, which ensures the data actually reaches the disk rather than 
staying buffered in the OS page cache.
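
For illustration, a minimal Java analogue of that kind of test (the path, 
buffer size, and class name are placeholders; the original was a C program):

  import java.io.FileOutputStream;
  import java.io.IOException;

  public class SeqWriteBench {
      public static void main(String[] args) throws IOException {
          long total = 40L << 30;           // write >40 GB so 16 GB of RAM cannot cache it all
          byte[] buf = new byte[1 << 20];   // 1 MB per write() call
          long start = System.nanoTime();
          FileOutputStream out = new FileOutputStream("/data1/bench.tmp");
          try {
              for (long done = 0; done < total; done += buf.length) {
                  out.write(buf);
              }
              out.getFD().sync();           // flush dirty pages to disk before stopping the clock
          } finally {
              out.close();
          }
          double secs = (System.nanoTime() - start) / 1e9;
          System.out.printf("%.1f MB/s%n", (total / secs) / (1 << 20));
      }
  }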

I currently don't have access to the same nodes, so I am unable to measure it 
with a standard benchmark such as bonnie++.

> Performance improvements to DFSClient and DataNode for faster DFS write at 
> replication factor of 1
> --------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-959
>                 URL: https://issues.apache.org/jira/browse/HDFS-959
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node, hdfs client
>    Affects Versions: 0.20.2
>         Environment: RHEL5 on Dual CPU quad-core Intel servers, 16 GB RAM, 4 
> SATA disks.
>            Reporter: Naredula Janardhana Reddy
>             Fix For: 0.20.2
>
>         Attachments: performance_patch
>
>
> The following improvements are suggested to DFSClient and DataNode to improve 
> DFS write throughput, based on experimental verification with replication 
> factor of 1.
> The changes are useful in principle for replication factors of 2 and 3 as 
> well, but they do not currently demonstrate noticeable performance 
> improvement in our test-bed because of a network throughput bottleneck that 
> hides the benefit of these changes. 
> All changes are applicable to 0.20.2. Some of them are applicable to trunk, 
> as noted below. I have not verified applicability to 0.21.
> List of Improvements
> -----------------------------
> Item 1: DFSClient: finer-grained locks in writeChunk(). Currently the lock is 
> acquired per 512-byte chunk. It can be moved to the packet level (64 KB) to 
> lower the locking frequency (see the sketch below).
>  This optimization applies to 20.2. It already appears in trunk.
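> A rough sketch of the idea (the names below are illustrative, not the actual 
> DFSClient fields): take the lock once per packet and append all of that 
> packet's 512-byte chunks under it, instead of locking per chunk.
>   // One lock acquisition per 64 KB packet instead of one per 512-byte chunk.
>   synchronized (dataQueue) {
>     for (int off = 0; off < packet.length; off += CHUNK_SIZE) {
>       int len = Math.min(CHUNK_SIZE, packet.length - off);
>       appendChunk(packet, off, len);   // hypothetical per-chunk helper
>     }
>   }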
> Item 2: Misc. improvements to DataNode
>  2.1: Concurrency of Disk Writes: Checksum verification and writing data to 
> disk can be moved to a separate thread ("Disk Write Thread"). This lets the 
> existing "network thread" send acks to the DFSClient sooner and forward the 
> packet to the next node in the replication pipeline faster. In effect, the 
> DataNode can consume packets at a higher rate (see the sketch below).
>  This optimization applies to 20.2 and trunk.
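> A minimal sketch of the hand-off (class and method names are hypothetical, 
> not the patch itself):
>   import java.util.concurrent.ArrayBlockingQueue;
>   import java.util.concurrent.BlockingQueue;
>
>   class DiskWriteThread extends Thread {
>       // Bounded queue between the network thread (producer) and this thread (consumer).
>       private final BlockingQueue<byte[]> packets = new ArrayBlockingQueue<byte[]>(64);
>
>       void enqueue(byte[] packet) throws InterruptedException {
>           packets.put(packet);     // network thread returns quickly, free to ack and forward
>       }
>
>       public void run() {
>           try {
>               while (true) {
>                   byte[] p = packets.take();
>                   verifyChecksum(p);   // checksum work moved off the network thread
>                   writeToDisk(p);
>               }
>           } catch (InterruptedException e) {
>               Thread.currentThread().interrupt();
>           }
>       }
>
>       private void verifyChecksum(byte[] p) { /* elided */ }
>       private void writeToDisk(byte[] p)    { /* elided */ }
>   }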
>  2.2: Bulk Receive and Bulk Send: This optimization is enabled by 2.1: since 
> a buffer now sits between the (existing) network thread and the (newly added) 
> Disk Write thread, the DataNode can receive more than one packet at a time.
>  This optimization applies to 20.2 and trunk.
>  2.3: Early Ack: The proposed optimization is to send acks to the client as 
> soon as possible instead of waiting for the disk write. Note that the last 
> ack is an exception: it is sent only after the data has been flushed to the 
> OS.
>  This optimization applies to 20.2. It already appears in trunk.
>  2.4: lseek optimization: Currently lseek (the system call) is invoked before 
> every disk write, which is unnecessary when the write is sequential. The 
> proposed optimization calls lseek only when necessary (see the sketch below).
>  This optimization applies to 20.2. I was unable to tell if it is already in 
> trunk.
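> A sketch of the bookkeeping (names hypothetical): remember where the last 
> write ended and seek only when the next write does not start there.
>   import java.io.IOException;
>   import java.io.RandomAccessFile;
>
>   class SequentialWriter {
>       private long expectedOffset = 0;   // where the next sequential write would land
>
>       void writeAt(RandomAccessFile out, long offset, byte[] data) throws IOException {
>           if (offset != expectedOffset) {
>               out.seek(offset);          // lseek() is issued only for non-sequential writes
>           }
>           out.write(data);
>           expectedOffset = offset + data.length;
>       }
>   }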
>  2.5: Checksum buffered writes: Currently the checksum is written through a 
> buffered stream of size 512 bytes. This can be increased to a larger size, 
> such as 4 KB, to reduce the number of write() system calls and save 
> context-switch overhead (see the sketch below).
>  This optimization applies to 20.2. I was unable to tell if it is already in 
> trunk.
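> The change amounts to widening the buffer on the stream the checksum data is 
> written through, e.g. (the meta file name here is illustrative):
>   // Fragment: widen the checksum stream's buffer from 512 bytes to 4 KB so
>   // several 512-byte checksum records coalesce into one write() system call.
>   java.io.OutputStream checksumOut = new java.io.BufferedOutputStream(
>       new java.io.FileOutputStream("blk_1234.meta"), 4096);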
> Item 3: Applying HADOOP-6166 - PureJavaCrc32() - from trunk to 20.2
>  This is applicable to 20.2. It already appears in trunk (usage sketch below).
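> PureJavaCrc32 implements java.util.zip.Checksum, so it is a drop-in 
> replacement for the JNI-backed java.util.zip.CRC32:
>   java.util.zip.Checksum crc = new org.apache.hadoop.util.PureJavaCrc32();
>   crc.update(data, 0, data.length);   // same API as CRC32, but no JNI crossing
>   long value = crc.getValue();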
> Performance Experiments Results
> -----------------------------------------------
> Performance experiments showed the following numbers:
> Hadoop Version: 0.20.2
> Server Configs: RHEL5, Quad-core dual-CPU, 16GB RAM, 4 SATA disks
>  $ uname -a
>  Linux gsbl90324.blue.ygrid.yahoo.com 2.6.18-53.1.13.el5 #1 SMP Mon Feb 11 
> 13:27:27 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
>  $ cat /proc/cpuinfo
>  model name   : Intel(R) Xeon(R) CPU           L5420  @ 2.50GHz
>  $ cat /etc/issue
>  Red Hat Enterprise Linux Server release 5.1 (Tikanga)
>  Kernel \r on an \m
> Benchmark Details
> --------------------------
> Benchmark Name: DFSIO
> Benchmark Configuration:
>  a) # maps (writers to DFS per node). Tried the following values: 1,2,3
>  b) # of nodes: Single-node test and 15-node cluster test
> Results Summary
> --------------------------
> a) With all the above optimizations turned on
> All these tests were done with a replication factor of 1. Tests with 
> replication factors of 2 and 3 showed no noticeable improvement, because the 
> gains are masked by the network bandwidth bottleneck noted above.
> What was measured: Write throughput per client (in MB/s)
> | Test Description                                 | Baseline (MB/s) | With improvements (MB/s) | % improvement |
> | 15-node cluster with 1 map (writer) per node     | 103             | 147                      | ~43%          |
> | Single-node test with 1 map (writer) per node    | 102             | 148                      | ~45%          |
> | Single-node test with 2 maps (writers) per node  | 86              | 101                      | ~16%          |
> | Single-node test with 3 maps (writers) per node  | 67              | 76                       | ~13%          |
>
> b) With the above optimizations turned on individually
> I ran experiments adding and removing items individually to understand the 
> approximate performance contribution of each item. These are the numbers I 
> got (they are approximate):
> | ITEM     | Title                                          | Improvement in 0.20 | Improvement in trunk |
> | Item 1   | DFSClient: finer-grained locks in writeChunk() | 30%                 | Already in trunk     |
> | Item 2.1 | Concurrency of Disk Writes                     | 25%                 | 15-20%               |
> | Item 2.2 | Bulk Receive and Bulk Send                     | 2%                  | (Have not yet tried) |
> | Item 2.3 | Early Ack                                      | 2%                  | Already in trunk     |
> | Item 2.4 | lseek optimization                             | 2%                  | (Have not yet tried) |
> | Item 2.5 | Checksum buffered writes                       | 2%                  | (Have not yet tried) |
> | Item 3   | Applying HADOOP-6166 - PureJavaCrc32()         | 15%                 | Already in trunk     |
> Patches
> -----------
> I will submit a patch for 0.20.2 shortly (within a day).
> I expect to submit a patch for trunk after review comments on the above patch.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
