[ 
https://issues.apache.org/jira/browse/HBASE-1765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743035#action_12743035
 ] 

Jonathan Gray commented on HBASE-1765:
--------------------------------------

Of note in this implementation:  

There was a decision to make about how to serialize and deserialize 
Result[].  Prior to this patch, we read a single massive byte[] for 
the whole Result[].  To keep doing that, a Result cannot hold just a 
byte[]; it also needs an offset into the shared array.  Rather than expose 
a byte[] plus an offset (which would cost us a simple .getBytes() method), 
I'm using ImmutableBytesWritable, which, like KeyValue, is given (byte[], 
offset, length).  So Result.getBytes() now returns an ImmutableBytesWritable.

This lets us keep the optimization of reading a single large byte[] for 
the entire Result array rather than one byte[] per Result.  The trade-off is 
that Result.getBytes() returns an IBW instead of a byte[], so consumers must 
remember to check IBW.getOffset().  There is a note in the javadoc to 
that effect.
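
To make the offset caveat concrete, here is a minimal sketch of several 
values sharing one backing array.  ByteSlice and ByteSliceDemo are 
hypothetical stand-ins written for this illustration; they are not the 
HBase ImmutableBytesWritable or Result classes, and the accessor names 
other than getOffset() are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical (array, offset, length) view, loosely modeled on the idea
// behind ImmutableBytesWritable. Not the HBase class.
final class ByteSlice {
    private final byte[] array;
    private final int offset;
    private final int length;

    ByteSlice(byte[] array, int offset, int length) {
        if (offset < 0 || length < 0 || offset > array.length - length) {
            throw new IllegalArgumentException("slice out of bounds");
        }
        this.array = array;
        this.offset = offset;
        this.length = length;
    }

    byte[] get() { return array; }        // the whole shared backing array
    int getOffset() { return offset; }    // where this slice starts
    int getLength() { return length; }    // how many bytes belong to it

    // Copies only this slice's bytes into a fresh array.
    byte[] copyBytes() {
        return Arrays.copyOfRange(array, offset, offset + length);
    }
}

final class ByteSliceDemo {
    // Carves one buffer into consecutive slices without copying.
    static ByteSlice[] split(byte[] buffer, int[] lengths) {
        ByteSlice[] out = new ByteSlice[lengths.length];
        int pos = 0;
        for (int i = 0; i < lengths.length; i++) {
            out[i] = new ByteSlice(buffer, pos, lengths[i]);
            pos += lengths[i];
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] buffer = "row1row-two".getBytes(StandardCharsets.US_ASCII);
        ByteSlice second = split(buffer, new int[] {4, 7})[1];

        // Decoding second.get() alone would read the entire shared buffer;
        // the offset and length must be passed along with it.
        String text = new String(second.get(), second.getOffset(),
                second.getLength(), StandardCharsets.US_ASCII);
        System.out.println(text); // prints "row-two"
    }
}
```

A consumer that only needs its own bytes can call copyBytes() and pay for 
one copy; a consumer that forwards the data can pass the array, offset and 
length through untouched.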

> Delay Result deserialization until asked for and permit access to the raw 
> binary to prevent forced deserialization
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-1765
>                 URL: https://issues.apache.org/jira/browse/HBASE-1765
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: client
>    Affects Versions: 0.20.0
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>             Fix For: 0.20.1, 0.21.0
>
>         Attachments: HBASE-1765-v1.patch, HBASE-1765-v2.patch
>
>
> We have our own API that we use to access HBase from other languages like 
> Erlang, Python, C, etc.
> The Java gateway that maps from the actual HBase API to our internal API 
> wants to pass the raw binary received for a Result.  As is, we have to 
> deserialize into an array of KeyValues and then re-serialize into a flat 
> byte[].
> We would like to propose modifying Result to not build the KeyValue[] until 
> it's asked for via client methods (.raw() or .sorted() or any of the map 
> methods).  This is already how the map methods work (we don't build the map 
> until it's asked for the first time).
> The only API change would be adding an additional Result.getBytes() method 
> to get the raw underlying byte[] that was sent from the server.  
> Result.readFields(DataInput) would then only read in the full byte[].  
> We would add an additional private method Result.readFields() that generates 
> the KeyValue[].  That would be called whenever a client asks for anything 
> besides .getBytes().
> Since all access to Result is done through those methods (KeyValue[] private 
> and not directly accessible w/o using those methods) this should not impact 
> any existing code.
> Thoughts?
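
The deferred deserialization proposed in the description can be sketched 
roughly as follows.  LazyResult and LazyResultDemo are hypothetical classes 
written for this illustration, not the HBase Result, and the wire format 
(an int total length followed by int-length-prefixed cells) is an 
assumption made only so the example runs.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical lazily parsed result. Not HBase's Result.
final class LazyResult {
    private byte[] raw;          // bytes exactly as read from the stream
    private List<byte[]> cells;  // built on first request only
    private int parseCount;      // instrumentation for the example

    // Reads the payload but does not parse it.
    void readFields(DataInput in) throws IOException {
        raw = new byte[in.readInt()];
        in.readFully(raw);
        cells = null;
    }

    // Raw access never triggers parsing.
    byte[] getBytes() { return raw; }

    // Any structured access parses once and caches the result.
    List<byte[]> cells() {
        if (cells == null) {
            cells = parse(raw);
        }
        return cells;
    }

    int parseCount() { return parseCount; }

    private List<byte[]> parse(byte[] data) {
        parseCount++;
        List<byte[]> out = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.wrap(data);
        while (buf.hasRemaining()) {
            byte[] cell = new byte[buf.getInt()];
            buf.get(cell);
            out.add(cell);
        }
        return out;
    }
}

final class LazyResultDemo {
    // Builds a payload in the assumed format: total length, then cells.
    static byte[] encode(byte[]... cells) {
        int bodyLength = 0;
        for (byte[] cell : cells) {
            bodyLength += 4 + cell.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(4 + bodyLength);
        buf.putInt(bodyLength);
        for (byte[] cell : cells) {
            buf.putInt(cell.length);
            buf.put(cell);
        }
        return buf.array();
    }

    static LazyResult read(byte[] framed) {
        LazyResult result = new LazyResult();
        try {
            result.readFields(new DataInputStream(new ByteArrayInputStream(framed)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        LazyResult result = read(encode(new byte[] {1, 2}, new byte[] {3}));
        System.out.println(result.parseCount());   // prints 0
        System.out.println(result.cells().size()); // prints 2
        result.cells();
        System.out.println(result.parseCount());   // prints 1
    }
}
```

A gateway that only forwards getBytes() never pays the parsing cost, while 
callers of cells() see the same data as before.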

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
