[ 
https://issues.apache.org/jira/browse/HBASE-5355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13474786#comment-13474786
 ] 

Lars Hofhansl commented on HBASE-5355:
--------------------------------------

Before we commit this or the trunk patch I'd love to see some numbers comparing 
this full compression stream approach with just avoiding duplicate data while 
serializing from/to the RegionServer. On both sides we'd have to reassemble the 
full KVs (unless we finally make a KV interface), but we can do that efficiently 
if we keep track of the size of the omitted parts of each KV, preallocate the 
space, and copy the data into it. That way we'd have the same amount of memory 
copying (ignoring DMA from the network card for the moment) and can save bytes 
on the wire.
I raised this on the mailing list a while ago, and Andy commented on that 
somewhere as well.
KVs are sorted when traveling over the wire (as a set of Puts/Deletes or in a 
Result), so we can simply avoid copying the shared prefix multiple times.
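
To make the idea concrete, here is a rough sketch of what eliding the shared 
prefix could look like. It is not taken from any patch on this issue: the class 
name PrefixElision, the method names, and the wire layout (count, then 
shared-length / suffix-length / suffix per key) are all made up for 
illustration, and it treats each KV as an opaque sorted byte[] key rather than 
using the real KeyValue layout.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: prefix elision for a sorted list of serialized keys.
public class PrefixElision {

    // Number of leading bytes that a and b have in common.
    static int commonPrefix(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        int i = 0;
        while (i < n && a[i] == b[i]) {
            i++;
        }
        return i;
    }

    // Sender side: write only the bytes that differ from the previous key.
    static byte[] encode(List<byte[]> sortedKeys) {
        // First pass: compute the exact wire size so we allocate once.
        int size = 4; // key count
        byte[] prev = new byte[0];
        for (byte[] key : sortedKeys) {
            size += 8 + key.length - commonPrefix(prev, key);
            prev = key;
        }
        ByteBuffer out = ByteBuffer.allocate(size);
        out.putInt(sortedKeys.size());
        prev = new byte[0];
        for (byte[] key : sortedKeys) {
            int shared = commonPrefix(prev, key);
            out.putInt(shared);                        // size of the omitted part
            out.putInt(key.length - shared);           // size of the suffix that follows
            out.put(key, shared, key.length - shared); // suffix bytes only
            prev = key;
        }
        return out.array();
    }

    // Receiver side: preallocate each full key, then copy prefix and suffix in.
    static List<byte[]> decode(byte[] wire) {
        ByteBuffer in = ByteBuffer.wrap(wire);
        int count = in.getInt();
        List<byte[]> keys = new ArrayList<>(count);
        byte[] prev = new byte[0];
        for (int i = 0; i < count; i++) {
            int shared = in.getInt();
            int suffixLen = in.getInt();
            byte[] key = new byte[shared + suffixLen];
            System.arraycopy(prev, 0, key, 0, shared); // omitted prefix from previous key
            in.get(key, shared, suffixLen);            // suffix from the wire
            keys.add(key);
            prev = key;
        }
        return keys;
    }
}
```

The fixed 4-byte ints keep the sketch short; a real format would presumably use 
variable-length ints so that the two length fields do not eat the savings on 
short keys.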
                
> Compressed RPC's for HBase
> --------------------------
>
>                 Key: HBASE-5355
>                 URL: https://issues.apache.org/jira/browse/HBASE-5355
>             Project: HBase
>          Issue Type: Improvement
>          Components: IPC/RPC
>    Affects Versions: 0.89.20100924
>            Reporter: Karthik Ranganathan
>            Assignee: Karthik Ranganathan
>         Attachments: HBASE-5355-0.94.patch
>
>
> Some applications need the ability to do large batched writes and reads from 
> a remote MR cluster. These eventually get bottlenecked on the network. The 
> results are also pretty compressible sometimes.
> The aim here is to add the ability to do compressed calls to the server on 
> both the send and receive paths.
