[ https://issues.apache.org/jira/browse/HADOOP-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12545343 ]
Rajagopal Natarajan commented on HADOOP-2154:
---------------------------------------------

Before I start writing the code, I have a query about this improvement. As far as I can see, the bytes-per-checksum setting is typically 512 by default, while the default block size is 67108864 bytes. This means that if chunks were written to the socket directly, each with its checksum (as opposed to accumulating chunks and writing them out once a full block is ready), we would be decreasing the data:header ratio by a large factor, wouldn't we? Wouldn't this be inefficient? Or am I missing something?

> Non-interleaved checksums would optimize block transfers.
> ---------------------------------------------------------
>
>                 Key: HADOOP-2154
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2154
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.14.0
>            Reporter: Konstantin Shvachko
>            Assignee: Rajagopal Natarajan
>             Fix For: 0.16.0
>
>
> Currently, when a block is transferred to a data-node, the client interleaves data chunks with their respective checksums.
> This requires creating an extra copy of the original data in a new buffer interleaved with the CRCs.
> We can avoid the extra copying if the data and the CRCs are fed to the socket one after another.
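For a sense of the numbers involved, here is a small back-of-the-envelope sketch in Java (the 4-byte checksum size assumes Hadoop's CRC32 checksums; the class and variable names are illustrative only):

{code:java}
// Back-of-the-envelope arithmetic behind the question above.
// Assumes the defaults quoted in the comment (512 bytes per checksum,
// 67108864-byte blocks) and a 4-byte CRC32 per chunk.
public class ChecksumOverhead {
    public static void main(String[] args) {
        final long blockSize = 67108864L; // default block size
        final int bytesPerChecksum = 512; // default bytes per checksum
        final int crcSize = 4;            // one CRC32 value per chunk

        long chunks = blockSize / bytesPerChecksum;       // 131072 chunks per block
        long crcBytes = chunks * crcSize;                 // 524288 checksum bytes per block
        double overhead = (double) crcBytes / blockSize;  // 0.78125%

        System.out.printf("chunks per block:         %d%n", chunks);
        System.out.printf("checksum bytes per block: %d%n", crcBytes);
        System.out.printf("checksum overhead:        %.5f%%%n", overhead * 100);
    }
}
{code}

Note that the checksum bytes per block come out the same whether the CRCs are interleaved with the data or sent after it; what changes is the layout on the wire and the buffering needed to produce it.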
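And a minimal sketch of the two write strategies the issue description contrasts (hypothetical method names and a deliberate simplification, not the actual DFSClient code):

{code:java}
import java.io.IOException;
import java.io.OutputStream;

class BlockWriterSketch {
    // Current approach: copy the data and the CRCs into one interleaved
    // buffer, then write it out. The intermediate copy is what HADOOP-2154
    // proposes to avoid.
    static void writeInterleaved(OutputStream out, byte[] data, byte[] crcs,
                                 int bytesPerChecksum, int crcSize)
            throws IOException {
        byte[] buf = new byte[data.length + crcs.length];
        int pos = 0;
        for (int chunk = 0; chunk * bytesPerChecksum < data.length; chunk++) {
            int off = chunk * bytesPerChecksum;
            int len = Math.min(bytesPerChecksum, data.length - off);
            System.arraycopy(data, off, buf, pos, len);   // copy the chunk
            pos += len;
            System.arraycopy(crcs, chunk * crcSize, buf, pos, crcSize); // copy its CRC
            pos += crcSize;
        }
        out.write(buf, 0, pos);
    }

    // Proposed approach: feed the existing data and CRC buffers to the
    // socket one after another, with no intermediate interleaved buffer.
    static void writeSequential(OutputStream out, byte[] data, byte[] crcs)
            throws IOException {
        out.write(data);
        out.write(crcs);
    }
}
{code}

The interleaved path pays for an extra full-size buffer and a pair of copies per chunk; the sequential path writes the two buffers that already exist.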