[ 
https://issues.apache.org/jira/browse/HADOOP-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated HADOOP-1649:
---------------------------------

    Attachment: HADOOP-1649.patch

Latest patch that adds buffering to various consumers and producers of block 
data. With this patch most of the performance gap in benchmarks is closed. 
With TestDFSIO we are still seeing a 3-5% difference on average. Each time, this 
difference can be traced to nodes with slow disks. Whether block CRCs make 
bad nodes worse is not clear. 

This patch adds a buffer while writing data to disk as well as while reading 
from disk. From the tests, the buffer while writing is more important. I guess 
OS read-ahead while reading the data makes the buffer for reading less 
important.
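
For illustration only, here is a minimal sketch of what buffering on the write 
and read paths can look like in plain Java. It is not the code in the patch: 
the class name, method names, BUFFER_SIZE, and CHUNK_SIZE are all hypothetical. 
The idea is that many small writes are collected in memory and reach the disk 
as fewer, larger writes.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;

// Hypothetical helper, not part of Hadoop or the attached patch.
final class BufferedBlockIo {
    // Assumed sizes, chosen only for this example.
    static final int BUFFER_SIZE = 64 * 1024;
    static final int CHUNK_SIZE = 512;

    // Writes data in small chunks; the BufferedOutputStream coalesces them
    // so the underlying file sees fewer, larger writes.
    static void writeBlock(File file, byte[] data) throws IOException {
        try (OutputStream out =
                 new BufferedOutputStream(new FileOutputStream(file), BUFFER_SIZE)) {
            for (int off = 0; off < data.length; off += CHUNK_SIZE) {
                int len = Math.min(CHUNK_SIZE, data.length - off);
                out.write(data, off, len);
            }
        } // close() flushes whatever is still in the buffer
    }

    // Reads the file back in small chunks through a BufferedInputStream.
    static byte[] readBlock(File file) throws IOException {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (InputStream in =
                 new BufferedInputStream(new FileInputStream(file), BUFFER_SIZE)) {
            byte[] chunk = new byte[CHUNK_SIZE];
            int n;
            while ((n = in.read(chunk)) != -1) {
                result.write(chunk, 0, n);
            }
        }
        return result.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("blk_example", ".dat");
        file.deleteOnExit();
        byte[] data = new byte[5000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        writeBlock(file, data);
        System.out.println(Arrays.equals(data, readBlock(file)));
    }
}
```

Note that each buffer layer copies the data once more on its way through, 
which is the cost mentioned in the next paragraph.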

Of course, extra buffering adds extra data copies. I will file another jira to 
remove the majority of these copies without changing the buffering.

Another change is that DataNode opens the block file with RandomAccessFile() 
and seeks to the first read position. It used to skip() to the position.
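
As a rough sketch of the difference between the two approaches, the example 
below reads the same byte range once by seeking with RandomAccessFile and once 
by skipping on a FileInputStream. It is not the DataNode code: the class and 
method names are hypothetical. One general point it shows is that 
InputStream.skip() may skip fewer bytes than requested, so the caller has to 
loop, while seek() sets the file position directly.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.util.Arrays;

// Hypothetical helper, not part of Hadoop or the attached patch.
final class BlockReadPosition {
    // Positions the file pointer with seek() and reads len bytes from there.
    static byte[] readWithSeek(File file, long offset, int len) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            raf.seek(offset);
            byte[] buf = new byte[len];
            raf.readFully(buf);
            return buf;
        }
    }

    // Reaches the same position with skip(), which may skip less than asked.
    static byte[] readWithSkip(File file, long offset, int len) throws IOException {
        try (FileInputStream in = new FileInputStream(file)) {
            long remaining = offset;
            while (remaining > 0) {
                long skipped = in.skip(remaining);
                if (skipped <= 0) {
                    throw new EOFException("could not skip to offset " + offset);
                }
                remaining -= skipped;
            }
            byte[] buf = new byte[len];
            new DataInputStream(in).readFully(buf);
            return buf;
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("blk_example", ".dat");
        file.deleteOnExit();
        byte[] data = new byte[4096];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        Files.write(file.toPath(), data);
        byte[] a = readWithSeek(file, 1000, 16);
        byte[] b = readWithSkip(file, 1000, 16);
        System.out.println(Arrays.equals(a, b));
    }
}
```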

> Performance regression with Block CRCs
> --------------------------------------
>
>                 Key: HADOOP-1649
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1649
>             Project: Hadoop
>          Issue Type: Bug
>    Affects Versions: 0.14.0
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>            Priority: Blocker
>             Fix For: 0.14.0
>
>         Attachments: HADOOP-1649.patch, HADOOP-1649.patch
>
>
> Performance is noticeably affected by the Block Level CRCs patch (HADOOP-1134). 
> This is more noticeable on writes (randomwriter test etc.). 
> With random writer, it takes 20-25% longer on a small cluster (20 nodes) and 
> maybe 10% longer on a larger cluster. 
> There are a few differences in how data is written with 1134. As soon as I 
> can reproduce this, I think it will be easier to fix. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
