[ https://issues.apache.org/jira/browse/HADOOP-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12545343 ]
Rajagopal Natarajan commented on HADOOP-2154:
---------------------------------------------

Before I start writing the code, I have a query about this improvement. As far as I can see, the bytes-per-checksum setting is typically 512 by default, while the default block size is 67108864 bytes. This means that if chunks were written to the socket directly, each with its checksum (as opposed to accumulating chunks and writing them out once a full block is ready), we would be decreasing the data:header ratio by a large factor, wouldn't we? Wouldn't this be inefficient? Or am I missing something?

> Non-interleaved checksums would optimize block transfers.
> ---------------------------------------------------------
>
>                 Key: HADOOP-2154
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2154
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.14.0
>            Reporter: Konstantin Shvachko
>            Assignee: Rajagopal Natarajan
>             Fix For: 0.16.0
>
>
> Currently, when a block is transferred to a data-node, the client interleaves data chunks with their respective checksums.
> This requires creating an extra copy of the original data in a new buffer interleaved with the CRCs.
> We can avoid the extra copying if the data and the CRCs are fed to the socket one after another.
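For a sense of the numbers involved, here is a small back-of-the-envelope sketch in Java (the 4-byte checksum size assumes Hadoop's CRC32 checksums; the class and variable names are illustrative only):

{code:java}
// Back-of-the-envelope arithmetic behind the question above.
// Assumes the defaults quoted in the comment (512 bytes per checksum,
// 67108864-byte blocks) and a 4-byte CRC32 per chunk.
public class ChecksumOverhead {
    public static void main(String[] args) {
        final long blockSize = 67108864L; // default block size
        final int bytesPerChecksum = 512; // default bytes per checksum
        final int crcSize = 4;            // one CRC32 value per chunk

        long chunks = blockSize / bytesPerChecksum;       // 131072 chunks per block
        long crcBytes = chunks * crcSize;                 // 524288 checksum bytes per block
        double overhead = (double) crcBytes / blockSize;  // 0.78125%

        System.out.printf("chunks per block:         %d%n", chunks);
        System.out.printf("checksum bytes per block: %d%n", crcBytes);
        System.out.printf("checksum overhead:        %.5f%%%n", overhead * 100);
    }
}
{code}

Note that the checksum bytes per block come out the same whether the CRCs are interleaved with the data or sent after it; what changes is the layout on the wire and the buffering needed to produce it.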
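And a minimal sketch of the two write strategies the issue description contrasts (hypothetical method names and a deliberate simplification, not the actual DFSClient code):

{code:java}
import java.io.IOException;
import java.io.OutputStream;

class BlockWriterSketch {
    // Current approach: copy the data and the CRCs into one interleaved
    // buffer, then write it out. The intermediate copy is what HADOOP-2154
    // proposes to avoid.
    static void writeInterleaved(OutputStream out, byte[] data, byte[] crcs,
                                 int bytesPerChecksum, int crcSize)
            throws IOException {
        byte[] buf = new byte[data.length + crcs.length];
        int pos = 0;
        for (int chunk = 0; chunk * bytesPerChecksum < data.length; chunk++) {
            int off = chunk * bytesPerChecksum;
            int len = Math.min(bytesPerChecksum, data.length - off);
            System.arraycopy(data, off, buf, pos, len);   // copy the chunk
            pos += len;
            System.arraycopy(crcs, chunk * crcSize, buf, pos, crcSize); // copy its CRC
            pos += crcSize;
        }
        out.write(buf, 0, pos);
    }

    // Proposed approach: feed the existing data and CRC buffers to the
    // socket one after another, with no intermediate interleaved buffer.
    static void writeSequential(OutputStream out, byte[] data, byte[] crcs)
            throws IOException {
        out.write(data);
        out.write(crcs);
    }
}
{code}

The interleaved path pays for an extra full-size buffer and a pair of copies per chunk; the sequential path writes the two buffers that already exist.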