Re: Thoughts on an RPC protocol

2010-04-12 Thread Doug Cutting

Jeff Hodges wrote:

To throw another set of ideas into the hat, SPDY[1][2] would be good
to learn from. SPDY takes the basics of HTTP and makes them fast.
Benefits we would enjoy include:

* Multiplexed streams
* Request prioritization
* HTTP header compression
* Server push

Currently in draft form.

[1] http://dev.chromium.org/spdy/spdy-whitepaper
[2] http://dev.chromium.org/spdy/spdy-protocol/spdy-protocol-draft2


I like that SPDY is more actively developed than BEEP.  It would be nice 
not to have to re-implement clients and servers from scratch, and to 
perhaps even use a pre-existing specification.


SPDY does fix one of the primary restrictions of HTTP in that it permits 
request multiplexing: responses need not arrive in order.
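To make the out-of-order point concrete, here is a minimal sketch of matching multiplexed responses back to their requests by id. The frame shapes and names are mine for illustration, not SPDY's actual wire format:

```python
# Minimal sketch: match out-of-order responses to pending requests by id.
# (Hypothetical frame shapes, not the SPDY wire format.)
pending = {}  # request id -> name of the outstanding call

def send_request(req_id, method):
    pending[req_id] = method

def on_response(req_id, payload):
    # Responses may arrive in any order; the id tells us which call completed.
    method = pending.pop(req_id)
    return (method, payload)

send_request(1, "slow_query")
send_request(2, "fast_query")
# The second request's response can arrive first:
assert on_response(2, b"fast") == ("fast_query", b"fast")
assert on_response(1, b"slow") == ("slow_query", b"slow")
```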


However other concerns folks have had about HTTP are that:
 a. text headers are big and slow to process
 b. SSL/TLS is heavyweight and inflexible for authentication

SPDY addresses the size of headers by compressing them, but that may 
hinder the speed of header processing.


SPDY uses SSL/TLS, so would have the same issues there.  Perhaps they 
could be convinced to adopt SASL instead of TLS?


Doug


Re: Thoughts on an RPC protocol

2010-04-12 Thread Jeff Hodges
I hadn't thought of adding SASL to SPDY. I haven't dived into the
community yet to see if they've discussed it.

The philosophy behind BEEP seems pretty good, even if the crazy XML
style of the actual spec is not a good match (as put in an earlier
email). The lack of community is worrisome, of course, but having it
as an influence in our own design is probably worthwhile.

I do worry about the desire not to name things. URIs seem so
obviously a good thing in a system that they should be consistent for
any implementation of the protocol and instance of its use. However, I
recognize how much of a bite that is to chew if we don't adopt a
substantial portion of an existing spec that already has that
defined. I'm not trying to push a REST dogma. Building a spec without
RESTful design is fine by me, as long as we recognize the tradeoffs.
--
Jeff

On Mon, Apr 12, 2010 at 1:46 PM, Doug Cutting cutt...@apache.org wrote:



Re: Thoughts on an RPC protocol

2010-04-12 Thread Jeff Hodges
Er, note that I'm generally lightly positive on using it as an
influence on the protocol, even if I find portions of the philosophy
less than ideal. I mentioned the lack of community because a lack of
an active community means whatever obstacles occur due to its
fundamental nature aren't well-known and well-defined. That's
reasonable, if not ideal, as we are already talking about building our
own RPC!
--
Jeff

On Mon, Apr 12, 2010 at 2:48 PM, Jeff Hodges jhod...@twitter.com wrote:




Re: Thoughts on an RPC protocol

2010-04-10 Thread Jeff Hodges
To throw another set of ideas into the hat, SPDY[1][2] would be good
to learn from. SPDY takes the basics of HTTP and makes them fast.
Benefits we would enjoy include:

* Multiplexed streams
* Request prioritization
* HTTP header compression
* Server push

Currently in draft form.

[1] http://dev.chromium.org/spdy/spdy-whitepaper
[2] http://dev.chromium.org/spdy/spdy-protocol/spdy-protocol-draft2
--
Jeff
On Fri, Apr 9, 2010 at 2:29 PM, Doug Cutting cutt...@apache.org wrote:


Re: Thoughts on an RPC protocol

2010-04-10 Thread Jeff Hodges
Oh, and it's been partially implemented in Chromium, so there's a
quasi-reference implementation.
--
Jeff

On Sat, Apr 10, 2010 at 10:48 AM, Jeff Hodges jhod...@twitter.com wrote:




Re: Thoughts on an RPC protocol

2010-04-10 Thread Jeff Hodges
Sorry for the spam. Python, Java, and Apache httpd implementations are
listed at the project page: http://www.chromium.org/spdy

On Sat, Apr 10, 2010 at 10:53 AM, Jeff Hodges jhod...@twitter.com wrote:





Re: Thoughts on an RPC protocol

2010-04-10 Thread Bruce Mitchener
What specific changes would you propose making to my proposal?

 - Bruce

On Sat, Apr 10, 2010 at 11:57 AM, Jeff Hodges jhod...@twitter.com wrote:




Re: Thoughts on an RPC protocol

2010-04-10 Thread Jeff Hodges
I may have misunderstood the direction this thread had taken. I'm
still going through the spec.
--
Jeff

On Sat, Apr 10, 2010 at 10:59 AM, Bruce Mitchener
bruce.mitche...@gmail.com wrote:




Re: Thoughts on an RPC protocol

2010-04-09 Thread Bruce Mitchener
Doug,

I'm happy to hear that you like this approach!

Allocation of channels seems to be something specific to an application.  In
my app, I'd have a channel for the streaming data that is constantly
arriving and a channel for making requests on and getting back answers
immediately.  Others could have a channel per object or whatever.

Are your proxy servers custom software or are they just passing traffic
along directly? If they're Avro-aware, then they can manage the handshaking
process when routing to a new peer.  Is this something that is actively
happening today or just something that is possible?

I definitely agree about not wanting a handshake per request. For my
application that would add a lot of overhead in terms of the data
transmitted.  (I'm sending a lot of small requests, hopefully many thousands
per second...)  I would be much much happier being able to have a handshake
per connection (or per channel open).

 - Bruce

On Thu, Apr 8, 2010 at 4:43 PM, Doug Cutting cutt...@apache.org wrote:

 Bruce,

 Overall this looks like a good approach to me.

 How do you anticipate allocating channels?  I'm guessing this would be one
 per client object, that a pool of open connections to servers would be
 maintained, and creating a new client object would allocate a new channel.

 Currently we perform a handshake per request.  This is fairly cheap and
 permits things like routing through proxy servers.  Different requests over
 the same connection can talk to different backend servers running different
 versions of the protocol.  Also consider the case where, between calls on an
 object, the connection times out, and a new session is established and a new
 handshake must take place.

 That said, having a session where the handshake can be assumed vastly
 simplifies one-way messages.  Without a response or error on which to prefix
 a handshake response, a one-way client has no means to know that the server
 was able to even parse its request.  Yet we'd still like a handshake for
 one-way messages, so that clients and servers need not be versioned in
 lockstep.  So the handshake-per-request model doesn't serve one-way messages
 well.

 How can we address both of these needs: to permit flexible payload routing
 and efficient one-way messaging?

 Doug


 Bruce Mitchener wrote:

 * Think about adding something for true one-way messages, but an empty
 reply frame is probably sufficient, since that still allows reporting
 errors if needed (or desired).




Re: Thoughts on an RPC protocol

2010-04-09 Thread Scott Carey

On Apr 8, 2010, at 11:35 PM, Bruce Mitchener wrote:

 Doug,
 
 I'm happy to hear that you like this approach!
 
 Allocation of channels seems to be something specific to an application.  In
 my app, I'd have a channel for the streaming data that is constantly
 arriving and a channel for making requests on and getting back answers
 immediately.  Others could have a channel per object or whatever.

If this is all on one TCP port, then channels will interfere with one another 
somewhat -- the transport layer will see packets arrive in the order they were 
sent.  If one packet in your streaming data stalls, both channels will stall.  
Depending on the application requirements, this might be fine.  But it should 
be made clear that channels are not independent; they are just interleaved over 
one ordered data stream.  How each implementation orders sending data on one 
end will affect order on the other side.
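Scott's interleaving point can be shown with a toy model of my own (not Avro's actual framing): two channels share one ordered stream, so frames queued for one channel sit behind frames queued earlier for the other.

```python
# Toy model: two channels interleaved over one ordered TCP stream.
# Frames are (channel, data); the receiver sees them strictly in the
# order they were queued, so a stall on one channel's frame delays both.
wire = []  # the single ordered byte stream, modeled as a frame list

def send(channel, data):
    wire.append((channel, data))

send(1, "stream-chunk-1")
send(2, "request-A")
send(1, "stream-chunk-2")

# Channel 2's request is stuck behind channel 1's first chunk,
# regardless of how the receiver would like to prioritize it.
assert [ch for ch, _ in wire] == [1, 2, 1]
```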

 
 I definitely agree about not wanting a handshake per request. For my
 application that would add a lot of overhead in terms of the data
 transmitted.  (I'm sending a lot of small requests, hopefully many thousands
 per second...)  I would be much much happier being able to have a handshake
 per connection (or per channel open).
 

Handshake per request will limit WAN usage.  Doubling request latency isn't a 
problem for local networks with sub 0.1ms RTTs, but it is a problem with 25ms 
RTTs.  Round trips aren't free on the processing or bandwidth side either.  If 
there is a way to meet most goals and limit extra handshakes to specific cases, 
that would be a significant performance improvement.




Re: Thoughts on an RPC protocol

2010-04-09 Thread Bruce Mitchener
On Fri, Apr 9, 2010 at 10:00 AM, Scott Carey sc...@richrelevance.comwrote:


 On Apr 8, 2010, at 11:35 PM, Bruce Mitchener wrote:

  Doug,
 
  I'm happy to hear that you like this approach!
 
  Allocation of channels seems to be something specific to an application.
  In
  my app, I'd have a channel for the streaming data that is constantly
  arriving and a channel for making requests on and getting back answers
  immediately.  Others could have a channel per object or whatever.

 If this is all on one TCP port, then channels will interfere with one
 another somewhat -- the transport layer will see packets arrive in the order
 they were sent.  If one packet in your streaming data stalls, both channels
 will stall.  Depending on the application requirements, this might be fine.
  But it should be made clear that channels are not independent, they are
 just interleaved over one ordered data stream.  How each implementation
 orders sending data on one end will affect order on the other side.


Agreed, that's just a fact of life with TCP.  Perhaps if SCTP ever gets some
traction, then people can do a mapping for that.  In the meantime, we could
look at what RFC 3081 did in the TCP mapping for RFC 3080 with respect to
flow control.

 I definitely agree about not wanting a handshake per request. For my
  application that would add a lot of overhead in terms of the data
  transmitted.  (I'm sending a lot of small requests, hopefully many
 thousands
  per second...)  I would be much much happier being able to have a
 handshake
  per connection (or per channel open).
 

 Handshake per request will limit WAN usage.  Doubling request latency isn't
 a problem for local networks with sub 0.1ms RTTs, but it is a problem with
 25ms RTTs.  Round trips aren't free on the processing or bandwidth side
 either.   If there is a way to meet most goals and limit extra handshakes to
 specific cases that would be a significant performance improvement.


We agree very strongly here.

 - Bruce


Re: Thoughts on an RPC protocol

2010-04-09 Thread Bo Shi
On Fri, Apr 9, 2010 at 2:35 AM, Bruce Mitchener
bruce.mitche...@gmail.com wrote:
 Doug,

 I'm happy to hear that you like this approach!

 Allocation of channels seems to be something specific to an application.  In
 my app, I'd have a channel for the streaming data that is constantly
 arriving and a channel for making requests on and getting back answers
 immediately.  Others could have a channel per object or whatever.

One ubiquitous protocol that shares many of the same requirements and
properties (in particular multiplexed channels over a transport) is
SSH.  Their channel mechanism may provide additional inspiration:
[http://tools.ietf.org/html/rfc4254#section-5].  One interesting bit
is that SSH doesn't have an explicit channel for control commands;
instead, they create additional message types for control messages that
aren't associated with any channel.  It's only a minor distinction,
though.


 Are your proxy servers custom software or are they just passing traffic
 along directly? If they're Avro-aware, then they can manage the handshaking
 process when routing to a new peer.  Is this something that is actively
 happening today or just something that is possible?

 I definitely agree about not wanting a handshake per request. For my
 application that would add a lot of overhead in terms of the data
 transmitted.  (I'm sending a lot of small requests, hopefully many thousands
 per second...)  I would be much much happier being able to have a handshake
 per connection (or per channel open).


If, as you suggest above, we enforce a 1-1 mapping of channel and Avro
protocol, wouldn't that eliminate the need for a handshake on
subsequent requests on the same channel?  The handshake process could
be part of the open-channel negotiation.  I'm still wrapping my head
around the routing use-case; not sure if this meets the requirements
there though.





Re: Thoughts on an RPC protocol

2010-04-09 Thread Scott Carey
On Apr 9, 2010, at 11:56 AM, Bo Shi wrote:

 On Fri, Apr 9, 2010 at 2:35 AM, Bruce Mitchener
 bruce.mitche...@gmail.com wrote:
 Doug,
 
 I'm happy to hear that you like this approach!
 
 Allocation of channels seems to be something specific to an application.  In
 my app, I'd have a channel for the streaming data that is constantly
 arriving and a channel for making requests on and getting back answers
 immediately.  Others could have a channel per object or whatever.
 
 One ubiquitous protocol that shares many of the same requirements and
 properties (in particular multiplexed channels over a transport) is
 SSH.  Their channel mechanism may provide additional inspiration:
 [http://tools.ietf.org/html/rfc4254#section-5].  One interesting bit
 is that SSH doesn't have an explicit channel for control commands,
 instead they create additional message types for control messages that
 aren't associated with any channel.  It's only a minor distinction
 though.
 

One flaw in SSH is that the bandwidth over a WAN is often pathetic.
Because it has multiple channels, it implements its own flow control and 
receive windows.  The effective bandwidth is the window size divided by the 
RTT.  Many implementations have a hard-coded 64KB receive buffer -- over a 
connection with a 30ms RTT this is a peak data throughput of 2MB/sec.  The 
latest versions of SSH patch the issue with an automatically growing buffer, 
but the in-memory buffer size required for high-throughput transfer over 
higher-latency links is large.
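As a quick sanity check on that 2MB/sec figure, here is the window/RTT arithmetic with the constants from the paragraph above:

```python
# Effective bandwidth = receive window / round-trip time.
window = 64 * 1024        # bytes: the hard-coded 64KB receive buffer
rtt = 0.030               # seconds: a 30ms WAN round trip
throughput = window / rtt # bytes per second
# ~2.18 million bytes/sec, i.e. roughly the "2MB/sec" cited above.
assert 2_000_000 < throughput < 2_300_000
```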

http://www.docstoc.com/docs/18191581/High-Performance-Networking-with-the-SSH-Protocol/
http://www.psc.edu/networking/projects/tcptune/
http://www.psc.edu/networking/projects/hpn-ssh/

This arises out of flow control to make sure that channels do not interfere 
with each other too much.

HTTP does not have this problem -- it just uses TCP's flow control, and its 
only multi-channel-like feature, HTTP pipelining, doesn't bother with flow 
control.

Whatever Avro does for a custom socket-based transport should avoid this.  This 
sort of use case will likely be common (a distributed copy of HDFS across a WAN, 
for example).

 
 If, as you suggest above, we enforce a 1-1 mapping of channel and avro
 protocol, wouldn't that eliminate the need for a handshake on
 subsequent requests on the same channel?  The handshake process could
 be part of the open-channel negotiation.  I'm still wrapping my head
 around the routing use-case; not sure if this meets the requirements
 there though.

I also have not wrapped my head around routing/proxy use cases.  From a 
somewhat ignorant perspective on them -- I'd rather have a solid point-to-point 
protocol that just works, is simple, and can meet the vast majority of use 
cases with high performance than one that happens to be capable of 
sophisticated routing but has a lot of other limitations or is a lot harder to 
implement and debug.


 
 



Re: Thoughts on an RPC protocol

2010-04-09 Thread Doug Cutting

Scott Carey wrote:

I also have not wrapped my head around routing/proxy use cases.  From
a somewhat ignorant perspective on them -- I'd rather have a solid
point-to-point protocol that just works, is simple, and can meet the
vast majority of use cases with high performance than one that
happens to be capable of sophisticated routing but has a lot of other
limitations or is a lot harder to implement and debug.


FWIW, they're theoretical at this point.  I was only stating that 
prefixing every request and response with handshakes makes stuff like 
proxies trivial, since the protocol becomes stateless.  Once we start 
having sessions things get trickier.  For example, many HTTP client 
libraries cache connections, so, if you're building on top of one of 
those, it's hard to know when a new connection is opened.


One approach is to declare that the current framing and handshake rules 
only apply to HTTP, currently our only standard transport.  Then we can 
define a new transport that's point-to-point, stateful, etc. which may 
handle framing and handshakes differently.  Thus we can retain 
back-compatibility.  Make sense?


Doug


Re: Thoughts on an RPC protocol

2010-04-08 Thread Bruce Mitchener
While I recommend actually reading RFC 3080 (it is an easy read), this
summary may help...

Framing: Length prefixed data, nothing unusual.
Encoding: Messages are effectively this:

enum message_type {
    message,  // a request
    reply,    // when there's only a single reply
    answer,   // when there are multiple replies, send multiple
              // answers and then a null
    null,     // terminate a chain of replies
    error,    // oops, there was an error
}

struct message {
    enum message_type message_type;
    int channel;
    int message_id;
    bool more;                  // is this message complete, or is more
                                // data coming? for streaming
    int sequence_number;        // see RFC 3080
    optional int answer_number; // used for answers
    bytes payload;              // the actual RPC command, still serialized here
}

When a connection is opened, there's initially one channel, channel 0. That
channel is used for commands controlling the connection state, like opening
and closing channels.  We should also perform Avro RPC handshakes over
channel 0.

Channels allow for concurrency.  You can send requests/messages down
multiple channels and process them independently. Messages on a single
channel need to be processed in order though. This allows for both
guaranteed order of execution (within a single channel) and greater
concurrency (multiple channels).
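
The ordering rule can be pictured with per-channel FIFO queues; a minimal sketch (names invented for illustration, not from the RFC; a real implementation would drain the channels concurrently):

```python
from collections import defaultdict, deque

class ChannelDispatcher:
    """Messages within one channel are kept in arrival order; distinct
    channels are independent queues, so they can progress in parallel."""
    def __init__(self):
        self.queues = defaultdict(deque)

    def enqueue(self, channel, message):
        self.queues[channel].append(message)

    def drain(self, channel):
        # Process (here: just collect) one channel's messages in order.
        out, q = [], self.queues[channel]
        while q:
            out.append(q.popleft())
        return out
```

This is the whole trade-off in a few lines: ordering guarantees come from staying on one channel, concurrency from opening more of them.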

Streaming happens in 2 ways.

The first way is to flip the more flag on a message. This means that the
data has been broken up over multiple messages and you need to receive the
whole thing before processing it.

The second is to have multiple answers (followed by a null frame) to a
single request message.  This allows you to process the data in a streaming
fashion.  The only thing that this doesn't allow is to process the data
being sent in a streaming fashion, but you could look at doing that by
sending multiple request messages instead.
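
The two streaming modes might be sketched like this (illustrative Python; the tuple representations of frames are invented for the example):

```python
def reassemble(fragments):
    """Mode 1: a message split with the 'more' flag.  Concatenate
    fragments until one arrives with more=False, then process the
    whole payload at once.  Each fragment is (more, payload)."""
    buf = b""
    for more, payload in fragments:
        buf += payload
        if not more:
            return buf
    raise ValueError("stream ended before the final fragment")

def iter_answers(frames):
    """Mode 2: multiple answers terminated by a null frame.  Each
    answer can be handed to the caller as it arrives, which is what
    makes this mode usable for streamed processing."""
    for msg_type, payload in frames:
        if msg_type == "null":
            return
        yield payload
```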

Security and privacy can be handled by SASL.

The RFC defines a number of ways in which you can detect buggy
implementations of the protocol or invalid data being sent (framing /
encoding violations).

 This should be pretty straightforward to implement, and as such (and since
 I need such a thing in the immediate future), I've already begun an
 implementation in C.

 - Bruce

On Wed, Apr 7, 2010 at 4:13 PM, Bruce Mitchener
bruce.mitche...@gmail.comwrote:

 I'm assuming that the goals of an optimized transport for Avro RPC are
 something like the following:

  * Framing should be efficient and easy to implement.
  * Streaming of large values, both as part of a request and as part of a
 response, is very important.
  * Being able to have multiple concurrent requests in flight, while also
 being able to have ordering guarantees where desired, is necessary.
  * It should be easy to implement this in Java, C, Python, Ruby, etc.
  * Security is or will be important. This security can include
 authorization as well as privacy concerns.

 I'd like to see something based largely upon RFC 3080, with some
 simplifications and extensions:

 http://www.faqs.org/rfcs/rfc3080.html

 What does this get us?

  * This system has mechanisms in place for streaming both a single large
 message and breaking a single reply up into multiple answers, allowing for
 pretty flexible streaming.  (You can even mix these by having an answer that
 gets chunked itself.)
  * Concurrency is achieved by having multiple channels. Each channel
 executes messages in order, so you have a good mechanism for sending
 multiple things at once as well as maintaining ordering guarantees as
 necessary.
  * Reporting errors is very clear as it is a separate response type.
  * It has already been specified pretty clearly and we'd just be evolving
 that to something that more closely matches our needs.
  * It specifies sufficient data that you could implement this over
 transports other than TCP, such as UDP.

 Changes, rough list:

  * Use Avro-encoding for most things, so the encoding of a message would
 become an Avro struct.
  * Lose profiles in the sense that they're used in that specification since
 we're just exchanging Avro RPCs.
  * Prefix each frame with its length rather than carrying the length in the
 header, so that it is very amenable to binary I/O at high volumes.
  * No XML stuff, just existing things like the Avro handshake, wrapped up
 in messages.
  * For now, don't worry about things like flow control as expressed in RFC
 3081, mapping of 3080 to TCP.
  * Think about adding something for true one-way messages, but an empty
 reply frame is probably sufficient, since that still allows reporting errors
 if needed (or desired).
  * May well need some extensions for a more flexible security model.
  * Use Avro RPC stuff to encode the channel management commands on channel
 0 rather than XML.
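
To picture that last point: the channel-management commands could themselves be declared as an ordinary Avro protocol spoken on channel 0. The sketch below is entirely hypothetical; every name and message shape is invented for illustration and comes from no spec:

```json
{
  "namespace": "org.example.channel",
  "protocol": "ChannelControl",
  "doc": "Hypothetical channel-0 control protocol.",

  "messages": {
    "openChannel": {
      "request": [{"name": "channel", "type": "int"}],
      "response": "boolean"
    },
    "closeChannel": {
      "request": [{"name": "channel", "type": "int"}],
      "response": "null"
    }
  }
}
```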

 RFC 3117 (http://www.faqs.org/rfcs/rfc3117.html) goes into some of the
 philosophy and thinking behind the design of RFC 3080.  Both are short and
 easy reading.

  - 

Re: Thoughts on an RPC protocol

2010-04-08 Thread Bo Shi
Hi Bruce,

Would this RPC protocol take the role of the transport in the Avro
specification or would it replace the protocol?  If the handshake
occurs on channel 0 while the request/response payloads are
transferred on a different channel, this would not meet the existing
wire protocol as described in the current 1.3.2 spec right?

A couple other questions inline:

On Thu, Apr 8, 2010 at 11:54 AM, Bruce Mitchener
bruce.mitche...@gmail.com wrote:
 While I recommend actually reading RFC 3080 (it is an easy read), this
 summary may help...

 Framing: Length prefixed data, nothing unusual.
 Encoding: Messages are effectively this:

 enum message_type {
    message,            // a request
    reply,                  // when there's only a single reply
    answer,               // when there are multiple replies, send multiple
 answers and then a null.
    null,                    // terminate a chain of replies
    error,                  // oops, there was an error
 }

 struct message {
    enum message_type message_type;
    int channel;
    int message_id;
    bool more;          // Is this message complete, or is more data coming?
 for streaming
    int sequence_number; // see RFC 3080
    optional int answer_number; // Used for answers
    bytes payload;   // The actual RPC command, still serialized here
 }

 When a connection is opened, there's initially one channel, channel 0. That
 channel is used for commands controlling the connection state, like opening
 and closing channels.  We should also perform Avro RPC handshakes over
 channel 0.

Is channel 0 used exclusively as a control channel or would requests
be allowed on this channel?  Any idea on what the control messages
would look like?


 Channels allow for concurrency.  You can send requests/messages down
 multiple channels and process them independently. Messages on a single
 channel need to be processed in order though. This allows for both
 guaranteed order of execution (within a single channel) and greater
 concurrency (multiple channels).

 Streaming happens in 2 ways.

For streaming transfers, thoughts on optional compression codec
attachment to streaming channels?  It may be useful for IO-bound
applications, but if you're transferring files like avro object
container files that are already compressed - you'd need some extra
coordination (but maybe that's outside the problem domain).


 The first way is to flip the more flag on a message. This means that the
 data has been broken up over multiple messages and you need to receive the
 whole thing before processing it.

 The second is to have multiple answers (followed by a null frame) to a
 single request message.  This allows you to process the data in a streaming
 fashion.  The only thing that this doesn't allow is to process the data
 being sent in a streaming fashion, but you could look at doing that by
 sending multiple request messages instead.

 Security and privacy can be handled by SASL.

 The RFC defines a number of ways in which you can detect buggy
 implementations of the protocol or invalid data being sent (framing /
 encoding violations).

 This should be pretty straight forward to implement, and as such (and since
 I need such a thing in the immediate future), I've already begun an
 implementation in C.

  - Bruce

 On Wed, Apr 7, 2010 at 4:13 PM, Bruce Mitchener
 bruce.mitche...@gmail.comwrote:

 I'm assuming that the goals of an optimized transport for Avro RPC are
 something like the following:

  * Framing should be efficient, easy to implement.
  * Streaming of large values, both as part of a request and as a response
 is very important.
  * Being able to have multiple concurrent requests in flight, while also
 being able to have ordering guarantees where desired is necessary.
  * It should be easy to implement this in Java, C, Python, Ruby, etc.
  * Security is or will be important. This security can include
 authorization as well as privacy concerns.

 I'd like to see something based largely upon RFC 3080, with some
 simplifications and extensions:

     http://www.faqs.org/rfcs/rfc3080.html

 What does this get us?

  * This system has mechanisms in place for streaming both a single large
 message and breaking a single reply up into multiple answers, allowing for
 pretty flexible streaming.  (You can even mix these by having an answer that
 gets chunked itself.)
  * Concurrency is achieved by having multiple channels. Each channel
 executes messages in order, so you have a good mechanism for sending
 multiple things at once as well as maintaining ordering guarantees as
 necessary.
  * Reporting errors is very clear as it is a separate response type.
  * It has already been specified pretty clearly and we'd just be evolving
 that to something that more closely matches our needs.
  * It specifies sufficient data that you could implement this over
 transports other than TCP, such as UDP.

 Changes, rough list:

  * Use Avro-encoding for most things, so the encoding of a 

Re: Thoughts on an RPC protocol

2010-04-08 Thread Doug Cutting

Bruce,

Overall this looks like a good approach to me.

How do you anticipate allocating channels?  I'm guessing this would be 
one per client object, that a pool of open connections to servers would 
be maintained, and creating a new client object would allocate a new 
channel.


Currently we perform a handshake per request.  This is fairly cheap and 
permits things like routing through proxy servers.  Different requests 
over the same connection can talk to different backend servers running 
different versions of the protocol.  Also consider the case where, 
between calls on an object, the connection times out, and a new session 
is established and a new handshake must take place.


That said, having a session where the handshake can be assumed vastly 
simplifies one-way messages.  Without a response or error on which to 
prefix a handshake response, a one-way client has no means to know that 
the server was able to even parse its request.  Yet we'd still like a 
handshake for one-way messages, so that clients and servers need not be 
versioned in lockstep.  So the handshake-per-request model doesn't serve 
one-way messages well.


How can we address both of these needs: to permit flexible payload 
routing and efficient one-way messaging?


Doug

Bruce Mitchener wrote:

 * Think about adding something for true one-way messages, but an empty
reply frame is probably sufficient, since that still allows reporting errors
if needed (or desired).


Re: Thoughts on an RPC protocol

2010-04-08 Thread Jeremy Custenborder
I really like the model that Voldemort uses for their protocol buffers
implementation. I ported their client to .NET and it was really
simple. The framing is simple: an integer length prefix followed by
binary data. The binary data is a request made with a protocol
buffer, and so is the response. The response is a protocol buffer
package specific to the method called. They used enums for the method
names. For streaming results it prefixes a 1 if there is another
record and either a 0 or -1 (I forget and need to look at the code;
you get the idea) if it's the end of the stream. I like the idea of
keeping things as simple as possible because it makes it easier for
additional languages to be added quickly. Personally I would prefer an
easy request-then-response model that blocks on the client side. For
simplicity a client could just block and wait for the data to return,
or a more advanced client could use callbacks or I/O completion
ports. This would allow a client to use things like connection
pooling to handle concurrency of multiple requests.
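
From memory of that description, the Voldemort-style framing might be sketched like this (illustrative only; the exact end-of-stream sentinel is uncertain, as noted above):

```python
import io
import struct

def write_message(stream, payload):
    """Length-prefixed framing as described: a 4-byte big-endian
    integer, then the serialized (e.g. protobuf) bytes."""
    stream.write(struct.pack(">i", len(payload)))
    stream.write(payload)

def read_message(stream):
    (n,) = struct.unpack(">i", stream.read(4))
    return stream.read(n)

def read_stream(stream):
    """Streaming results: a one-byte continuation flag before each
    record; any flag other than 1 marks end of stream (whether the
    terminator is 0 or -1 is left open in the original mail)."""
    records = []
    while stream.read(1) == b"\x01":
        records.append(read_message(stream))
    return records
```

A blocking client then reduces to: write one framed request, read one framed response (or loop with `read_stream` for streaming results).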

As for routing, I typically don't like using a proxy because that
limits you to the interface bandwidth of the proxy appliance. For
example, if you use something like a NetScaler as your proxy in front
of your back-end servers, you could have 50 back-end servers with gig
connections trying to feed a couple of NetScalers that constrain the
bandwidth. Adding more bandwidth means adding more proxy boxes, which
gets expensive fast, especially with NetScalers. Voldemort uses the
concept of node banning: if a node doesn't respond quickly enough or
has errors, it gets banned for a period of time. Couple this with
something like SRV records in DNS and you can easily manipulate the
direction of your traffic without using a proxy, saving you some cash.
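
A minimal sketch of the node-banning bookkeeping (all names invented for illustration; Voldemort's actual implementation differs):

```python
import time

class NodeBanList:
    """A node that errors or times out is skipped for ban_seconds
    before being retried; unbanned nodes are always candidates."""
    def __init__(self, ban_seconds=30.0, clock=time.monotonic):
        self.ban_seconds = ban_seconds
        self.clock = clock          # injectable for testing
        self.banned_until = {}

    def ban(self, node):
        self.banned_until[node] = self.clock() + self.ban_seconds

    def available(self, nodes):
        now = self.clock()
        return [n for n in nodes if self.banned_until.get(n, 0) <= now]
```

The client picks from `available(...)` on each request, so a flaky node drops out of rotation without any proxy in the path.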

I'm currently working on the .NET port of Avro, and the RPC
implementation is one of my priorities. My goal is to get to the
point where I can have a utility that connects to the RPC server,
gets the protocol handshake, then builds out strongly typed code that
a developer can work against.

{
  "namespace": "com.acme",
  "protocol": "HelloWorld",
  "doc": "Protocol Greetings",

  "types": [
    {"name": "Greeting", "type": "record", "fields": [
      {"name": "message", "type": "string"}]},
    {"name": "Curse", "type": "error", "fields": [
      {"name": "message", "type": "string"}]}
  ],

  "messages": {
    "hello": {
      "doc": "Say hello.",
      "request": [{"name": "greeting", "type": "Greeting"}],
      "response": "Greeting",
      "errors": ["Curse"]
    }
  }
}

would generate

namespace com.acme
{
    public class HelloWorld : Avro.Protocol
    {
        public Greeting hello(Greeting greeting);
    }
}

On Thu, Apr 8, 2010 at 3:43 PM, Doug Cutting cutt...@apache.org wrote:
 Bruce,

 Overall this looks like a good approach to me.

 How do you anticipate allocating channels?  I'm guessing this would be one
 per client object, that a pool of open connections to servers would be
 maintained, and creating a new client object would allocate a new channel.

 Currently we perform a handshake per request.  This is fairly cheap and
 permits things like routing through proxy servers.  Different requests over
 the same connection can talk to different backend servers running different
 versions of the protocol.  Also consider the case where, between calls on an
 object, the connection times out, and a new session is established and a new
 handshake must take place.

 That said, having a session where the handshake can be assumed vastly
 simplifies one-way messages.  Without a response or error on which to prefix
 a handshake response, a one-way client has no means to know that the server
 was able to even parse its request.  Yet we'd still like a handshake for
 one-way messages, so that clients and servers need not be versioned in
 lockstep.  So the handshake-per-request model doesn't serve one-way messages
 well.

 How can we address both of these needs: to permit flexible payload routing
 and efficient one-way messaging?

 Doug

 Bruce Mitchener wrote:

  * Think about adding something for true one-way messages, but an empty
 reply frame is probably sufficient, since that still allows reporting
 errors
 if needed (or desired).