[ https://issues.apache.org/jira/browse/AVRO-406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831252#action_12831252 ]
ryan rawson commented on AVRO-406:
----------------------------------

The HBase API is really returning a two-dimensional array of byte arrays... Each cell (row/col/version) is one byte array, each row is an array of those, and each result set is multiple rows. Perhaps another way to think of this is as an endless array of cells, with 'next row' inferred by watching the row key change between cells.

The optimization available to us here is that each HBase RPC call spends very little time actually _in hbase_ but ends up waiting on the Datanode to get the data back. Some kind of async framework on the server side could help chain daemons that make multiple RPCs and avoid busy-waiting threads.

One other thought: right now we use a block-oriented loader, since you have to store the entire value in RAM at least once (during RPC and memstore times). But if someone wanted to store massive values in HBase, we could use the DN streaming API and stream those chunks back to the client. Right now everything is modelled as arrays of bytes, so that might not be so hard to do. I'm a little wary of large object APIs, since you might as well store the data in HDFS directly.

Right now the return type might be:
array of array of byte

If you say only the first enclosing array is 'streaming', that means the sub-array is NOT streamed, right? If so, then streaming excessively large objects in the process of streaming normal and other associated objects might not be the right thing to do.

> Support streaming RPC calls
> ---------------------------
>
>                 Key: AVRO-406
>                 URL: https://issues.apache.org/jira/browse/AVRO-406
>             Project: Avro
>          Issue Type: New Feature
>          Components: java, spec
>            Reporter: Todd Lipcon
>
> Avro nicely supports chunking of container types into multiple frames. We need to expose this to RPC layer to facilitate use cases like the Hadoop Datanode where a single "RPC" can yield far more data than should be buffered in memory.
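
For illustration, the "endless array of cells" view from the comment above can be sketched in Java roughly as follows. This is only a sketch under assumptions: `Cell`, `rows` and `CellStreamSketch` are hypothetical names invented here, not HBase or Avro APIs. It shows a consumer rebuilding rows from a flat cell stream by watching for the row key to change, while buffering a single row at a time.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class CellStreamSketch {

    /** Hypothetical cell type: row key, column and value are all plain byte arrays. */
    static final class Cell {
        final byte[] row;
        final byte[] column;
        final byte[] value;

        Cell(String row, String column, String value) {
            this.row = row.getBytes(StandardCharsets.UTF_8);
            this.column = column.getBytes(StandardCharsets.UTF_8);
            this.value = value.getBytes(StandardCharsets.UTF_8);
        }
    }

    /**
     * Lazily groups a flat stream of cells into rows. A new row starts
     * whenever the row key differs from the previous cell's row key.
     */
    static Iterator<List<Cell>> rows(final Iterator<Cell> cells) {
        return new Iterator<List<Cell>>() {
            // One cell of lookahead: the first cell of the row not yet returned.
            private Cell pending = cells.hasNext() ? cells.next() : null;

            @Override
            public boolean hasNext() {
                return pending != null;
            }

            @Override
            public List<Cell> next() {
                if (pending == null) {
                    throw new NoSuchElementException();
                }
                byte[] key = pending.row;
                List<Cell> row = new ArrayList<>();
                // Consume cells until the row key changes or the stream ends.
                while (pending != null && Arrays.equals(pending.row, key)) {
                    row.add(pending);
                    pending = cells.hasNext() ? cells.next() : null;
                }
                return row;
            }
        };
    }

    public static void main(String[] args) {
        List<Cell> flat = Arrays.asList(
                new Cell("r1", "a", "1"),
                new Cell("r1", "b", "2"),
                new Cell("r2", "a", "3"));
        Iterator<List<Cell>> it = rows(flat.iterator());
        while (it.hasNext()) {
            List<Cell> row = it.next();
            String key = new String(row.get(0).row, StandardCharsets.UTF_8);
            System.out.println(key + ": " + row.size() + " cell(s)");
        }
    }
}
```

Note that only the outer sequence is consumed incrementally here; each cell's value is still a complete byte array held in memory, which is the limitation the comment raises about the sub-array not being streamed.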