[ https://issues.apache.org/jira/browse/AVRO-406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831252#action_12831252 ]

ryan rawson commented on AVRO-406:
----------------------------------

The HBase API is really returning a 2-dimensional array of byte arrays: each 
cell (row/col/version) is one byte array, each row is an array of those, and 
each result set is multiple rows.  Another way to think of this is an endless 
array of cells, with 'next row' inferred by watching the row key change 
between cells. 
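The cell model described above can be sketched as follows. This is a minimal illustration, not the actual HBase client types; `Cell` and `groupIntoRows` are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

public class CellStream {
    // Illustrative stand-in for an HBase cell: (row key, column, version) -> value bytes.
    public record Cell(String rowKey, String column, long version, byte[] value) {}

    // Group a flat stream of cells into rows by watching for the row key to change,
    // i.e. the "inferred 'next row'" idea from the comment above.
    public static List<List<Cell>> groupIntoRows(List<Cell> cells) {
        List<List<Cell>> rows = new ArrayList<>();
        String currentKey = null;
        List<Cell> currentRow = null;
        for (Cell c : cells) {
            if (!c.rowKey().equals(currentKey)) {  // row key changed: start a new row
                currentRow = new ArrayList<>();
                rows.add(currentRow);
                currentKey = c.rowKey();
            }
            currentRow.add(c);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Cell> cells = List.of(
            new Cell("row1", "cf:a", 1, new byte[]{1}),
            new Cell("row1", "cf:b", 1, new byte[]{2}),
            new Cell("row2", "cf:a", 1, new byte[]{3}));
        List<List<Cell>> rows = groupIntoRows(cells);
        System.out.println(rows.size() + " rows; first row has "
            + rows.get(0).size() + " cells");  // prints "2 rows; first row has 2 cells"
    }
}
```

With this framing, a server could emit cells one at a time and the client could reconstruct row boundaries itself, which is what makes the flat-stream view attractive for a streaming RPC.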

The optimization available to us here is that each HBase RPC call spends very 
little time actually _in hbase_ and mostly ends up waiting on the Datanode to 
get the data back.  Some kind of async framework on the server side could help 
chain daemons that make multiple RPCs, avoiding busy-waiting threads.

One other thought: right now we use a block-oriented loader, since the entire 
value has to be held in RAM at least once (during the RPC and while in the 
memstore).  But if someone wanted to store massive values in hbase, we could 
use the DN streaming API and stream those chunks back to the client.  
Everything is currently modelled as arrays of bytes, so that might not be so 
hard to do.  I'm a little wary of large-object APIs, though, since at that 
point you might as well store the data in HDFS directly. 
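The chunked-value idea could look roughly like this. It is a sketch under assumptions: `chunks` is a hypothetical helper, not an existing HBase or Avro API, and the chunk size is arbitrary:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Iterator;

public class ChunkedValue {
    // Expose a large cell value as a sequence of fixed-size chunks rather than
    // one contiguous byte[], so neither server nor client must hold the whole
    // value in RAM at once.
    public static Iterator<byte[]> chunks(InputStream in, int chunkSize) {
        return new Iterator<>() {
            byte[] next = read();
            byte[] read() {
                try {
                    byte[] buf = new byte[chunkSize];
                    int n = in.read(buf);               // -1 at end of stream
                    return n <= 0 ? null : Arrays.copyOf(buf, n);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
            public boolean hasNext() { return next != null; }
            public byte[] next() { byte[] cur = next; next = read(); return cur; }
        };
    }

    public static void main(String[] args) {
        // A 10-byte value split into 4-byte chunks: 4 + 4 + 2 -> 3 chunks.
        Iterator<byte[]> it = chunks(new ByteArrayInputStream(new byte[10]), 4);
        int count = 0;
        while (it.hasNext()) { it.next(); count++; }
        System.out.println(count + " chunks");  // prints "3 chunks"
    }
}
```

Since everything is already bytes, the server side would just swap a single `byte[]` for an iterator like this, and the RPC layer would need to frame each chunk as it goes out.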

Right now the return type might be:
array of array of byte

if you say only the outermost enclosing array is 'streaming', that means the 
sub-arrays are NOT streamed, right?

If so, then streaming excessively large objects alongside normal-sized objects 
in the same stream might not be the right thing to do.
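The return type in question could be written as the Avro schema below. This is only a sketch of the shape; the point is that if streaming applies to the outer array only, each inner array of bytes would still have to be fully buffered:

```json
{
  "type": "array",
  "items": {"type": "array", "items": "bytes"}
}
```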


> Support streaming RPC calls
> ---------------------------
>
>                 Key: AVRO-406
>                 URL: https://issues.apache.org/jira/browse/AVRO-406
>             Project: Avro
>          Issue Type: New Feature
>          Components: java, spec
>            Reporter: Todd Lipcon
>
> Avro nicely supports chunking of container types into multiple frames. We 
> need to expose this to RPC layer to facilitate use cases like the Hadoop 
> Datanode where a single "RPC" can yield far more data than should be buffered 
> in memory.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
