[ 
https://issues.apache.org/jira/browse/AVRO-406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831614#action_12831614
 ] 

Todd Lipcon commented on AVRO-406:
----------------------------------

bq. As I think about it more, I believe the Iterator<Iterator<Foo>> approach 
has merit. Avro's runtime supports the notion of efficient skipping

Skipping is one thing, but there's no way to rewind a socket. For example, what 
do we do about the following user code:

{code}
void myStreamedRpc(Iterable<Iterable<Foo>> myArg) {
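  // Typed outer iterator, so next() returns Iterable<Foo> rather than Object.
  Iterator<Iterable<Foo>> outerIter = myArg.iterator();
  Iterator<Foo> firstIter = outerIter.next().iterator();
  Iterator<Foo> secondIter = outerIter.next().iterator();
  // The second inner stream is read before the first one.
  Foo a = secondIter.next();
  Foo b = firstIter.next();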
}
{code}

This code tries to read the data off the stream in the opposite order from 
that in which it arrives. When assigning 'a' we could certainly skip all of 
firstIter's data, but then we'd be screwed when it comes time to assign 'b' 
since we can't skip back. We could buffer all of firstIter as soon as we access 
secondIter, but then we're not being very transparent at all to end users. I 
don't like the idea that users of this API would have to really understand its 
workings under the hood or else risk potentially unbounded memory buffering.
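
To make the "skip but never rewind" option concrete, here is a rough sketch of 
what it might look like. Everything in it is hypothetical: ForwardOnlyNested 
and ForwardOnlyDemo are made-up names, not Avro classes, and a list of lists 
stands in for the socket. Advancing the outer iterator abandons the previous 
group, and any later read from that group fails rather than buffering.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch, not Avro code: nested iteration over a forward-only source.
final class ForwardOnlyNested<T> implements Iterator<Iterator<T>> {
    // Stand-in for the socket: groups arrive strictly in this order.
    private final Iterator<List<T>> wire;
    // Index of the group the stream is currently positioned on.
    private int currentGroup = -1;

    ForwardOnlyNested(List<List<T>> groups) {
        this.wire = groups.iterator();
    }

    @Override
    public boolean hasNext() {
        return wire.hasNext();
    }

    @Override
    public Iterator<T> next() {
        // Moving to the next group implicitly skips the rest of the previous one.
        final Iterator<T> items = wire.next().iterator();
        final int myGroup = ++currentGroup;
        return new Iterator<T>() {
            @Override
            public boolean hasNext() {
                check();
                return items.hasNext();
            }

            @Override
            public T next() {
                check();
                return items.next();
            }

            // An inner iterator is only usable while the stream is still on its group.
            private void check() {
                if (myGroup != currentGroup) {
                    throw new IllegalStateException(
                        "group " + myGroup + " was already skipped; the stream cannot rewind");
                }
            }
        };
    }
}

final class ForwardOnlyDemo {
    // Mirrors the access pattern of myStreamedRpc: second group first, then first.
    static String run() {
        ForwardOnlyNested<String> outer = new ForwardOnlyNested<>(
            Arrays.asList(Arrays.asList("b1", "b2"), Arrays.asList("a1")));
        Iterator<String> firstIter = outer.next();
        Iterator<String> secondIter = outer.next();
        String a = secondIter.next();
        try {
            String b = firstIter.next();
            return "read " + a + " and " + b;
        } catch (IllegalStateException e) {
            return "read " + a + ", then failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(ForwardOnlyDemo.run());
    }
}
```

In this sketch the out-of-order read of firstIter throws instead of silently 
buffering, which is the other half of the trade-off: the user code above 
compiles fine but fails at runtime.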

Keeping the API restricted to "you may have one streamed input and one streamed 
output" has its downsides in loss of generality, but at least it is very 
transparent to implementors, and slight deviations in processing order can't 
cause huge swings in performance.
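
For comparison, the restricted shape could be sketched roughly as follows. The 
names here (StreamedRpc, RunningTotal, StreamedRpcDemo) are invented for 
illustration and are not a proposed Avro interface. The handler gets exactly 
one input iterator and one output sink, so there is only one order in which 
the data can be consumed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch, not an Avro interface: one streamed input, one streamed output.
interface StreamedRpc<I, O> {
    void call(Iterator<I> input, Consumer<O> output);
}

// Example handler: emits a running total, holding only one int of state.
final class RunningTotal implements StreamedRpc<Integer, Integer> {
    @Override
    public void call(Iterator<Integer> input, Consumer<Integer> output) {
        int total = 0;
        while (input.hasNext()) {
            total += input.next();
            // Each result is handed off immediately instead of being collected.
            output.accept(total);
        }
    }
}

final class StreamedRpcDemo {
    // A list stands in for both ends of the wire in this sketch.
    static List<Integer> run() {
        List<Integer> sent = new ArrayList<>();
        new RunningTotal().call(Arrays.asList(1, 2, 3).iterator(), sent::add);
        return sent;
    }

    public static void main(String[] args) {
        System.out.println(StreamedRpcDemo.run());
    }
}
```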

> Support streaming RPC calls
> ---------------------------
>
>                 Key: AVRO-406
>                 URL: https://issues.apache.org/jira/browse/AVRO-406
>             Project: Avro
>          Issue Type: New Feature
>          Components: java, spec
>            Reporter: Todd Lipcon
>
> Avro nicely supports chunking of container types into multiple frames. We 
> need to expose this to RPC layer to facilitate use cases like the Hadoop 
> Datanode where a single "RPC" can yield far more data than should be buffered 
> in memory.
