[jira] [Updated] (HDFS-8722) Optimize datanode writes for small writes and flushes

Junping Du (JIRA) Thu, 05 Jan 2017 16:51:30 -0800

     [ 
https://issues.apache.org/jira/browse/HDFS-8722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Junping Du updated HDFS-8722:
-----------------------------
    Fix Version/s: 2.8.0

> Optimize datanode writes for small writes and flushes
> -----------------------------------------------------
>
>                 Key: HDFS-8722
>                 URL: https://issues.apache.org/jira/browse/HDFS-8722
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.7.1
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Critical
>             Fix For: 2.8.0, 2.7.2, 2.6.4, 3.0.0-alpha1
>
>         Attachments: HDFS-8722.br26.patch, HDFS-8722.patch, HDFS-8722.v1.patch
>
>
> After the data corruption fix in HDFS-4660, the CRC for a partial chunk is 
> recalculated more frequently when a client repeatedly writes a few bytes 
> and calls hflush/hsync.  This is because the generic logic forces CRC 
> recalculation whenever the on-disk data is not CRC chunk-aligned. Prior to 
> HDFS-4660, the datanode blindly accepted whatever CRC the client provided, 
> as long as the incoming data was chunk-aligned. This was the source of the 
> corruption.
> We can still optimize for the most common case, where a client repeatedly 
> writes a small number of bytes followed by hflush/hsync with no pipeline 
> recovery or append, by allowing the previous behavior for this specific case. 
> If the incoming data overlaps data already on disk and the overlap starts 
> at the last chunk boundary before the on-disk partial chunk, the datanode 
> can use the checksum supplied by the client instead of recomputing it.  
> This reduces disk reads as well as CPU load for the checksum calculation.
> If the incoming packet data goes back further than the last on-disk chunk 
> boundary, the datanode will still recalculate the checksum, but this occurs 
> rarely during pipeline recoveries. Thus the optimization for this specific 
> case should be sufficient to speed up the vast majority of cases.
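
The decision described above can be sketched roughly as follows. This is a
minimal illustration in Java, not the code from the attached patches: the
class, enum, method, and parameter names are all hypothetical, and it assumes
that the only inputs needed are the on-disk length of the replica, the offset
in the block at which the incoming packet starts, and the checksum chunk size.

```java
// Hypothetical sketch of the choice the issue describes; none of these
// names are taken from the Hadoop source tree.
class PartialChunkChecksumSketch {

    enum Action { USE_CLIENT_CHECKSUM, RECALCULATE }

    /**
     * @param onDiskLen        bytes of the block already on disk (assumed input)
     * @param packetOffset     offset in the block where the incoming packet starts
     * @param bytesPerChecksum size of one CRC chunk, e.g. 512
     */
    static Action decide(long onDiskLen, long packetOffset, int bytesPerChecksum) {
        long partialBytes = onDiskLen % bytesPerChecksum;
        if (partialBytes == 0) {
            // On-disk data ends on a chunk boundary: there is no partial
            // chunk whose CRC would have to be reconciled.
            return Action.USE_CLIENT_CHECKSUM;
        }
        long lastChunkBoundary = onDiskLen - partialBytes;
        if (packetOffset == lastChunkBoundary) {
            // The packet resends the on-disk partial chunk from its start,
            // so the client's CRC covers the whole chunk (the optimized case).
            return Action.USE_CLIENT_CHECKSUM;
        }
        // Any other overlap or a mid-chunk start falls back to the generic
        // path, which recomputes the CRC.
        return Action.RECALCULATE;
    }

    public static void main(String[] args) {
        // 1000 bytes on disk, 512-byte chunks: the partial chunk starts at 512.
        System.out.println(decide(1000, 512, 512));
        System.out.println(decide(1000, 0, 512));
    }
}
```

In this sketch, a client that writes a few bytes and calls hflush repeatedly
always lands in the second branch, so no on-disk read or CRC recomputation is
needed; a packet that reaches back past the last chunk boundary takes the
fallback path.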



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
