Hi Bruce,

Would this RPC protocol take the role of the transport in the Avro specification, or would it replace the protocol? If the handshake occurs on channel 0 while the request/response payloads are transferred on a different channel, this would not meet the existing wire protocol as described in the current 1.3.2 spec, right?

A couple of other questions inline:

On Thu, Apr 8, 2010 at 11:54 AM, Bruce Mitchener <bruce.mitche...@gmail.com> wrote:
> While I recommend actually reading RFC 3080 (it is an easy read), this
> summary may help...
>
> Framing: Length prefixed data, nothing unusual.
> Encoding: Messages are effectively this:
>
> enum message_type {
>     message,  // a request
>     reply,    // when there's only a single reply
>     answer,   // when there are multiple replies, send multiple answers and then a null.
>     null,     // terminate a chain of replies
>     error,    // oops, there was an error
> }
>
> struct message {
>     enum message_type message_type;
>     int channel;
>     int message_id;
>     bool more;                   // Is this message complete, or is more data coming? for streaming
>     int sequence_number;         // see RFC 3080
>     optional int answer_number;  // Used for answers
>     bytes payload;               // The actual RPC command, still serialized here
> }
>
> When a connection is opened, there's initially one channel, channel 0. That
> channel is used for commands controlling the connection state, like opening
> and closing channels. We should also perform Avro RPC handshakes over
> channel 0.

Is channel 0 used exclusively as a control channel, or would requests be allowed on this channel? Any idea what the control messages would look like?

>
> Channels allow for concurrency. You can send requests/messages down
> multiple channels and process them independently. Messages on a single
> channel need to be processed in order though. This allows for both
> guaranteed order of execution (within a single channel) and greater
> concurrency (multiple channels).
>
> Streaming happens in 2 ways.

For streaming transfers, any thoughts on attaching an optional compression codec to streaming channels? It may be useful for IO-bound applications, but if you're transferring files that are already compressed, like Avro object container files, you'd need some extra coordination (but maybe that's outside the problem domain).

>
> The first way is to flip the more flag on a message. This means that the
> data has been broken up over multiple messages and you need to receive the
> whole thing before processing it.
>
> The second is to have multiple answers (followed by a null frame) to a
> single request message. This allows you to process the data in a streaming
> fashion. The only thing that this doesn't allow is to process the data
> being sent in a streaming fashion, but you could look at doing that by
> sending multiple request messages instead.
>
> Security and privacy can be handled by SASL.
>
> The RFC defines a number of ways in which you can detect buggy
> implementations of the protocol or invalid data being sent (framing /
> encoding violations).
>
> This should be pretty straightforward to implement, and as such (and since
> I need such a thing in the immediate future), I've already begun an
> implementation in C.
>
> - Bruce
>
> On Wed, Apr 7, 2010 at 4:13 PM, Bruce Mitchener
> <bruce.mitche...@gmail.com> wrote:
>
>> I'm assuming that the goals of an optimized transport for Avro RPC are
>> something like the following:
>>
>> * Framing should be efficient, easy to implement.
>> * Streaming of large values, both as part of a request and as a response
>> is very important.
>> * Being able to have multiple concurrent requests in flight, while also
>> being able to have ordering guarantees where desired is necessary.
>> * It should be easy to implement this in Java, C, Python, Ruby, etc.
>> * Security is or will be important. This security can include
>> authorization as well as privacy concerns.
>>
>> I'd like to see something based largely upon RFC 3080, with some
>> simplifications and extensions:
>>
>> http://www.faqs.org/rfcs/rfc3080.html
>>
>> What does this get us?
>>
>> * This system has mechanisms in place for streaming both a single large
>> message and breaking a single reply up into multiple answers, allowing for
>> pretty flexible streaming.
>> (You can even mix these by having an answer that
>> gets chunked itself.)
>> * Concurrency is achieved by having multiple channels. Each channel
>> executes messages in order, so you have a good mechanism for sending
>> multiple things at once as well as maintaining ordering guarantees as
>> necessary.
>> * Reporting errors is very clear as it is a separate response type.
>> * It has already been specified pretty clearly and we'd just be evolving
>> that to something that more closely matches our needs.
>> * It specifies sufficient data that you could implement this over
>> transports other than TCP, such as UDP.
>>
>> Changes, rough list:
>>
>> * Use Avro-encoding for most things, so the encoding of a message would
>> become an Avro struct.
>> * Lose profiles in the sense that they're used in that specification since
>> we're just exchanging Avro RPCs.
>> * Do length prefixing rather than in the header, so that it is very
>> amenable to binary I/O at high volumes.
>> * No XML stuff, just existing things like the Avro handshake, wrapped up
>> in messages.
>> * For now, don't worry about things like flow control as expressed in RFC
>> 3081, mapping of 3080 to TCP.
>> * Think about adding something for true one-way messages, but an empty
>> reply frame is probably sufficient, since that still allows reporting errors
>> if needed (or desired).
>> * May well need some extensions for a more flexible security model.
>> * Use Avro RPC stuff to encode the channel management commands on channel
>> 0 rather than XML.
>>
>> RFC 3117 (http://www.faqs.org/rfcs/rfc3117.html) goes into some of the
>> philosophy and thinking behind the design of RFC 3080. Both are short and
>> easy reading.
>>
>> - Bruce
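
To make the framing discussion concrete, here is a rough Python sketch of the length-prefixed message struct quoted above, plus reassembly of messages split with the "more" flag. Everything about the byte layout is an assumption made for illustration only: the 4-byte big-endian length prefix, the fixed-width header, the presence flag for answer_number, and the names Message, encode_frame, decode_frame and reassemble are all hypothetical. The proposal would Avro-encode the struct instead, and RFC 3080 defines its own framing.

```python
import struct
from dataclasses import dataclass
from enum import IntEnum
from typing import Iterable, List, Optional, Tuple


class MessageType(IntEnum):
    MESSAGE = 0  # a request
    REPLY = 1    # the only reply to a request
    ANSWER = 2   # one of several replies
    NULL = 3     # terminates a chain of answers
    ERROR = 4


@dataclass
class Message:
    message_type: MessageType
    channel: int
    message_id: int
    more: bool                    # True if further chunks follow
    sequence_number: int
    answer_number: Optional[int]  # only meaningful for answers
    payload: bytes                # the still-serialized RPC data


# Assumed layout: type, channel, message_id, more, sequence_number,
# has_answer_number flag, answer_number (all big-endian, no padding).
_HEADER = struct.Struct("!BIIBIBI")
_LENGTH = struct.Struct("!I")  # assumed 4-byte length prefix


def encode_frame(msg: Message) -> bytes:
    """Serialize one message as length prefix + header + payload."""
    header = _HEADER.pack(
        msg.message_type, msg.channel, msg.message_id, int(msg.more),
        msg.sequence_number,
        0 if msg.answer_number is None else 1,
        msg.answer_number or 0,
    )
    body = header + msg.payload
    return _LENGTH.pack(len(body)) + body


def decode_frame(buf: bytes) -> Tuple[Optional[Message], bytes]:
    """Return (message, leftover bytes), or (None, buf) if buf is incomplete."""
    if len(buf) < _LENGTH.size:
        return None, buf
    (length,) = _LENGTH.unpack_from(buf)
    if length < _HEADER.size:
        raise ValueError("frame shorter than header")
    end = _LENGTH.size + length
    if len(buf) < end:
        return None, buf
    mtype, channel, msg_id, more, seq, has_ans, ans = _HEADER.unpack_from(
        buf, _LENGTH.size)
    payload = buf[_LENGTH.size + _HEADER.size:end]
    msg = Message(MessageType(mtype), channel, msg_id, bool(more), seq,
                  ans if has_ans else None, payload)
    return msg, buf[end:]


def reassemble(messages: Iterable[Message]) -> List[bytes]:
    """Concatenate chunked payloads until a chunk arrives with more == False."""
    pending = {}
    complete = []
    for m in messages:
        key = (m.channel, m.message_id, m.answer_number)
        pending.setdefault(key, bytearray()).extend(m.payload)
        if not m.more:
            complete.append(bytes(pending.pop(key)))
    return complete
```

A receiver would call decode_frame in a loop on its read buffer and pass the decoded messages for each channel to reassemble. Keying on answer_number as well keeps a chunked answer separate from its sibling answers.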