[ 
https://issues.apache.org/jira/browse/HBASE-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13003260#comment-13003260
 ] 

ryan rawson commented on HBASE-3480:
------------------------------------

In my test, the added cost of serialization was larger than the time
saved transmitting the data on the wire.

I think we are going to put a freeze on RPC changes soon; we need to
be thinking next gen.
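
To make the trade-off concrete, here is a rough sketch that adds
serialization time to wire time for one old and one new sample taken
from the issue description quoted below. It assumes the 120MB/sec line
rate quoted there, takes MB as 10^6 bytes, and assumes the two samples
are comparable replies; the function name is made up for illustration.

```python
# Line rate from the issue description; MB is assumed to mean 10**6 bytes.
LINE_RATE_BYTES_PER_SEC = 120e6


def total_ms(size_bytes, serialize_ns):
    """Serialization time plus wire transfer time, in milliseconds."""
    serialize_ms = serialize_ns / 1e6
    transfer_ms = size_bytes / LINE_RATE_BYTES_PER_SEC * 1e3
    return serialize_ms + transfer_ms


# (size in bytes, serialization time in ns) samples from the description.
old_ms = total_ms(3874599, 10685836)
new_ms = total_ms(1898788, 118504672)
print(f"old: {old_ms:.1f} ms, new: {new_ms:.1f} ms")
```

With these inputs the old path comes to roughly 43 ms end to end and
the new path to roughly 134 ms, so the smaller payload does not pay for
the extra serialization time.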


> Reduce the size of Result serialization
> ---------------------------------------
>
>                 Key: HBASE-3480
>                 URL: https://issues.apache.org/jira/browse/HBASE-3480
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: ryan rawson
>         Attachments: HBASE-3480-lzf.txt, HBASE-3480.txt
>
>
> When faced with a gigabit ethernet network connection, things are actually 
> pretty slow.  For example, take a 2 MB reply: at a 120MB/sec line rate, 
> we are talking about 16ms to transfer that data across a GigE line.  
> This is a pretty significant amount of time.
> So this JIRA is about reducing the size of the Result[] serialization.  By 
> exploiting family and qualifier and rowkey duplication, I created a simple 
> encoding scheme to use a dictionary instead of literal strings.  
> In my testing, I am seeing some success with the sizes.  Average serialized 
> size is about 1/2 of previous, but time to serialize on the regionserver side 
> is way up, by a factor of about 10.  This might be due to the simplistic first 
> implementation, however.
> Here is the average post-change size, in bytes:
> grep 'Serialized size' * | perl -ne '/Serialized size: (\d+?) in (\d+?) ns/ ; 
> print $1, " ", $2, "\n" if $1 > 10000;' | cut -f1 -d' ' | perl -ne '$sum += 
> $_; $count++; END {print $sum/$count, "\n"}'
> 377047.1125
> Here is the average pre-change size, in bytes:
> grep 'Serialized size' * | perl -ne '/Serialized size: (\d+?) in (\d+?) ns/ ; 
> print $1, " ", $2, "\n" if $1 > 10000;' | cut -f1 -d' ' | perl -ne '$sum += 
> $_; $count++; END {print $sum/$count, "\n"}'
> 601078.505882353
> That is, the new average size is about 63% of the old one, a reduction of 
> roughly 37%.
> But the times are not so good.  Here are some samples from the old 
> implementation, as (size in bytes) (time in ns):
> 3874599 10685836
> 5582725 11525888
> So that is about 11ms to serialize 3.9-5.6 MB of data.
> In the new implementation:
> 1898788 118504672
> 1630058 91133003
> That is 91-118ms for serialized sizes of 1.6-1.9 MB.
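
The dictionary idea from the description (exploiting rowkey, family
and qualifier duplication) can be sketched roughly as follows. This is
not the format used in the attached patches; the tuple layout and the
function names are assumptions chosen only to illustrate the technique.

```python
def encode(cells):
    """Replace repeated row/family/qualifier byte strings with dictionary indexes."""
    dictionary = []  # index -> byte string, in order of first appearance
    index_of = {}    # byte string -> index
    encoded = []
    for row, family, qualifier, value in cells:
        ids = []
        for part in (row, family, qualifier):
            if part not in index_of:
                index_of[part] = len(dictionary)
                dictionary.append(part)
            ids.append(index_of[part])
        # Values are kept literal; only the repeated key parts are indexed.
        encoded.append((ids[0], ids[1], ids[2], value))
    return dictionary, encoded


def decode(dictionary, encoded):
    """Rebuild the literal cells from the dictionary and the index tuples."""
    return [(dictionary[r], dictionary[f], dictionary[q], v)
            for r, f, q, v in encoded]


# Hypothetical cells: (row, family, qualifier, value).
cells = [
    (b"row1", b"cf", b"a", b"v1"),
    (b"row1", b"cf", b"b", b"v2"),
    (b"row2", b"cf", b"a", b"v3"),
]
dictionary, encoded = encode(cells)
assert decode(dictionary, encoded) == cells
```

Each distinct key part is written once in the dictionary and then
referenced by a small integer, which is where the size savings come
from; the extra lookups per cell are also one plausible source of the
added serialization time.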

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
