[ https://issues.apache.org/jira/browse/AVRO-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830451#action_12830451 ]

Philip Zeyliger commented on AVRO-341:
--------------------------------------

I'm hijacking this thread for the description (as opposed to the title).  Let's 
start thinking about a high-performance, secure transport for Avro.

Here's a dump of my current thoughts on this topic, after reading up a bit on 
SASL and reading through some of the Hadoop security patches.

First off, we should probably call this a "protocol".  It's a bit tricky, since 
we've already got a notion of Avro protocols, but "transport" reminds people of 
http://en.wikipedia.org/wiki/Transport_Layer, i.e., UDP vs TCP, and that's not 
what we're discussing here.  (On the TCP vs UDP front, let's focus our efforts 
first on a TCP protocol.  There might be a lot of value in having a UDP 
protocol as well, but it's clear that we'll need a TCP one.)

It's a bit meta, but I'd like us to consider describing Avro's protocol in 
terms of (and here the terminology falls down) an Avro protocol, or at least in 
terms of Avro records.  Instead of saying "and then there shall be a long, 
encoded like so, and then it shall be followed by that many bytes", we should 
just say "and then we shall receive a record with the following schema".  We 
already do so in part, and I think that's the right direction.  It will make 
the description of the protocol clearer, and it will let implementations 
re-use some schema functionality.  (I think implementations should use the 
most type-safe APIs they have available to them, but, hey, that's by 
definition an implementation detail.)
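
To make the contrast concrete, here's a minimal Python sketch of what "a record with the following schema" buys us. The `Message` schema and its `callId`/`payload` fields are hypothetical; the encoding rules used (zig-zag varint longs, `bytes` as a long length followed by the data, record fields written in declaration order) are Avro's binary encoding. It's hand-rolled only to stay self-contained; a real implementation would lean on an Avro library's datum reader and writer.

```python
import io

# Hypothetical schema for one protocol message (field names are illustrative).
# The functions below hand-encode exactly this layout.
MESSAGE_SCHEMA = {
    "type": "record",
    "name": "Message",
    "fields": [
        {"name": "callId", "type": "long"},
        {"name": "payload", "type": "bytes"},
    ],
}


def write_long(out, n):
    """Avro long: zig-zag encode, then emit 7 bits per byte, low bits first."""
    z = (n << 1) ^ (n >> 63)
    while z & ~0x7F:
        out.write(bytes([(z & 0x7F) | 0x80]))
        z >>= 7
    out.write(bytes([z]))


def read_long(inp):
    shift = z = 0
    while True:
        b = inp.read(1)
        if not b:
            raise EOFError("truncated long")
        z |= (b[0] & 0x7F) << shift
        shift += 7
        if not b[0] & 0x80:
            return (z >> 1) ^ -(z & 1)  # undo zig-zag


def write_bytes(out, data):
    write_long(out, len(data))  # "a long, encoded like so..."
    out.write(data)             # "...followed by that many bytes"


def read_bytes(inp):
    n = read_long(inp)
    data = inp.read(n)
    if len(data) != n:
        raise EOFError("truncated bytes")
    return data


def encode_message(call_id, payload):
    out = io.BytesIO()
    # Record fields are written in the order the schema declares them.
    write_long(out, call_id)
    write_bytes(out, payload)
    return out.getvalue()


def decode_message(data):
    inp = io.BytesIO(data)
    return read_long(inp), read_bytes(inp)
```

The ad-hoc prose description falls out of the schema for free: the `payload` field is exactly "a long, then that many bytes".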

In terms of the "primitives", here's what I can think of:
* CALL: this is the work-horse of the RPC, analogous to 
http://hadoop.apache.org/avro/docs/1.2.0/spec.html#Call+Format.  If we decide 
to do schema resolution at the handshake level, we would do it here.  Returns 
the response.  May throw AuthenticationRequired.
* AUTHENTICATE: this is the command for authentication.  SASL sometimes 
requires a back and forth (until it's "done"); we'd put the hooks for all of 
that here.
* DISCOVER: asks the server for information about itself.  Specifically, 
servers may tell clients what protocols they support.  This may throw 
AuthenticationRequired or return nothing, if the server wants to be cagey.  
This is in some sense similar to FB303 
(https://svn.apache.org/repos/asf/incubator/thrift/trunk/contrib/fb303/if/fb303.thrift).  
In a friendly environment, a server could tell you who's running it (a 
username), what machine it's on, and arbitrary key/value statistics.
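
A rough server-side sketch of how the three primitives might be dispatched. Everything here is hypothetical: `Command`, `AuthenticationRequired`, the `handle` method, and `TwoStepMechanism`, a toy stand-in for a SASL mechanism that needs one extra round trip (it provides no real security).

```python
from enum import Enum


class Command(Enum):
    CALL = "CALL"
    AUTHENTICATE = "AUTHENTICATE"
    DISCOVER = "DISCOVER"


class AuthenticationRequired(Exception):
    pass


class TwoStepMechanism:
    """Toy stand-in for a SASL mechanism: one challenge, then a secret."""

    def __init__(self, secret):
        self.secret = secret
        self.step = 0

    def evaluate(self, token):
        """Return (challenge_or_None, done)."""
        if self.step == 0:
            self.step = 1
            return b"send-secret", False
        if token == self.secret:
            return None, True
        raise PermissionError("bad credentials")


class Server:
    def __init__(self, mechanism, handlers, cagey=True):
        self.mechanism = mechanism
        self.handlers = handlers  # message name -> callable
        self.cagey = cagey
        self.authenticated = False

    def handle(self, command, arg=None):
        if command is Command.AUTHENTICATE:
            # SASL may need several AUTHENTICATE round trips before it's "done".
            challenge, done = self.mechanism.evaluate(arg)
            self.authenticated = done
            return challenge
        if command is Command.DISCOVER:
            if self.cagey and not self.authenticated:
                return None  # a cagey server says nothing to strangers
            return {"protocols": sorted(self.handlers)}
        if command is Command.CALL:
            if not self.authenticated:
                raise AuthenticationRequired()
            name, params = arg
            return self.handlers[name](*params)
        raise ValueError(f"unknown command {command}")
```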

We absolutely need to support piggy-backing of commands.  One way to do that is 
for clients to simply send multiple commands in a row, without waiting for the 
responses.  Another is to allow commands to include subcommands.
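
Pipelining in its simplest form: the client writes several requests back to back and only then reads the replies. This sketch uses a local `socket.socketpair()` in place of a TCP connection and assumes a 4-byte big-endian length prefix per request; the "server" is a toy that upper-cases each payload.

```python
import socket
import struct


def send_msg(sock, payload):
    sock.sendall(struct.pack(">I", len(payload)) + payload)


def recv_exact(sock, n):
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise EOFError("connection closed")
        buf += chunk
    return bytes(buf)


def recv_msg(sock):
    (n,) = struct.unpack(">I", recv_exact(sock, 4))
    return recv_exact(sock, n)


def pipelined_round_trip(requests):
    client, server = socket.socketpair()
    with client, server:
        # Client: write every request back to back, without waiting for replies.
        batch = b"".join(struct.pack(">I", len(r)) + r for r in requests)
        client.sendall(batch)
        # Server: read and answer each request in turn.
        for _ in requests:
            send_msg(server, recv_msg(server).upper())
        # Client: only now collect the replies.
        return [recv_msg(client) for _ in requests]
```

Here the replies come back in request order; the next point is about relaxing that.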

We need to support out-of-order responses and "one way" (don't wait for a 
response) commands.
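
Out-of-order responses require each request to carry an identifier that the response echoes back; one-way commands simply never register a pending entry. A minimal client-side sketch, assuming a hypothetical `callId` field in every request and response (`Correlator` and `PendingCall` are illustrative names):

```python
import itertools


class PendingCall:
    """Slot filled in when the matching response arrives."""

    def __init__(self):
        self.done = False
        self.result = None


class Correlator:
    def __init__(self):
        self._ids = itertools.count(1)
        self._pending = {}  # callId -> PendingCall

    def request(self, message, one_way=False):
        """Build a request; only two-way requests are tracked."""
        call_id = next(self._ids)
        frame = {"callId": call_id, "oneWay": one_way, "message": message}
        if one_way:
            return frame, None  # nothing will ever come back for this id
        pending = PendingCall()
        self._pending[call_id] = pending
        return frame, pending

    def on_response(self, call_id, result):
        """Responses may arrive in any order; match them by callId."""
        pending = self._pending.pop(call_id, None)
        if pending is None:
            raise KeyError(f"no outstanding call with id {call_id}")
        pending.result = result
        pending.done = True
```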

We still need to do framing.  Also, once SASL has negotiated a security layer, 
all bytes after successful SASL authentication must be wrapped by SASL, so 
servers and clients need a state machine that tracks this and wraps 
appropriately.  (We could maybe have avoided framing if we supported framing 
directly in Avro's string primitive type, like we do in Avro's map type, by 
having a negative string length indicate a string that is continued.)
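
A sketch of framing plus the SASL state machine. The framing follows the buffer format in the current Avro spec (each buffer is a four-byte big-endian length followed by that many bytes, and a zero-length buffer ends the message). Python's standard library has no SASL support, so `wrap`/`unwrap` are injected callables standing in for something like Java's `SaslClient.wrap`/`unwrap`. Length-prefixing each wrapped token is my assumption, not something any spec says, and `Connection` is a hypothetical name.

```python
import struct

MAX_BUFFER = 8192


def frame_message(message, max_buffer=MAX_BUFFER):
    """Split a message into length-prefixed buffers, ending with a zero-length one."""
    out = bytearray()
    for i in range(0, len(message), max_buffer):
        chunk = message[i:i + max_buffer]
        out += struct.pack(">I", len(chunk)) + chunk
    out += struct.pack(">I", 0)
    return bytes(out)


def unframe_message(data):
    """Reassemble one message; returns (message, bytes consumed)."""
    message = bytearray()
    pos = 0
    while True:
        (n,) = struct.unpack_from(">I", data, pos)
        pos += 4
        if n == 0:
            return bytes(message), pos
        message += data[pos:pos + n]
        pos += n


class Connection:
    """Plain framing until SASL completes; afterwards every byte is wrapped."""

    def __init__(self, wrap, unwrap):
        self.wrap = wrap
        self.unwrap = unwrap
        self.state = "NEGOTIATING"

    def sasl_complete(self):
        self.state = "WRAPPED"

    def encode(self, message):
        framed = frame_message(message)
        if self.state == "NEGOTIATING":
            return framed
        token = self.wrap(framed)
        # Prefix the wrapped token so the peer can find token boundaries.
        return struct.pack(">I", len(token)) + token

    def decode(self, data):
        if self.state == "WRAPPED":
            (n,) = struct.unpack_from(">I", data, 0)
            data = self.unwrap(data[4:4 + n])
        message, _ = unframe_message(data)
        return message
```

In a quick test, a toy XOR function can play the part of `wrap`/`unwrap`; that only exercises the state transition and is not security.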

Finally, we need to think hard about how to version this protocol itself.  It's 
appealing to be able to add commands in the future ("oneway" is an example) or 
to enrich the response of commands like "DISCOVER".  It's noteworthy that 
text-based protocols like IMAP have had little trouble extending themselves to 
support SASL, because they could just augment what existing commands did.  
(RFC 4959 is pretty short.)  A simple approach would be to bootstrap by 
sending hash(avro protocol schema) and then proceeding much as we do with 
calls right now.
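
One way to read "bootstrap by sending hash(avro protocol schema)": reuse the shape of today's call handshake, where the client sends the MD5 hash of its protocol plus its guess at the server's hash, and the server answers with a `match` of `BOTH`, `CLIENT`, or `NONE`. The sketch below applies that to the meta-protocol itself. `MetaProtocolServer` and the JSON protocol descriptions in the test are hypothetical, and the handshake is simplified (no `meta` map, no error handling).

```python
import hashlib


def protocol_hash(protocol_json):
    """MD5 of the protocol's JSON text, as the existing call handshake uses."""
    return hashlib.md5(protocol_json.encode("utf-8")).digest()


class MetaProtocolServer:
    def __init__(self, server_protocol, known_client_protocols=()):
        self.protocol = server_protocol
        self.hash = protocol_hash(server_protocol)
        self.known = {protocol_hash(p): p for p in known_client_protocols}
        self.known[self.hash] = server_protocol

    def handshake(self, client_hash, server_hash_guess, client_protocol=None):
        # A client that got NONE retries with its full protocol text.
        if client_protocol is not None and protocol_hash(client_protocol) == client_hash:
            self.known[client_hash] = client_protocol
        wrong_guess = server_hash_guess != self.hash
        if client_hash not in self.known:
            match = "NONE"    # never seen this client's protocol
        elif wrong_guess:
            match = "CLIENT"  # know the client, but it must learn our protocol
        else:
            match = "BOTH"    # both sides agree; no schemas exchanged
        return {
            "match": match,
            "serverProtocol": self.protocol if wrong_guess else None,
            "serverHash": self.hash if wrong_guess else None,
        }
```

Adding a command like "oneway" then just means a new meta-protocol schema with a new hash; old clients get `CLIENT` plus the new schema instead of breaking.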

Anyway, that's where I am right now.  Looking forward to more discussion.

-- Philip

> specify avro transport in spec
> ------------------------------
>
>                 Key: AVRO-341
>                 URL: https://issues.apache.org/jira/browse/AVRO-341
>             Project: Avro
>          Issue Type: Improvement
>          Components: spec
>            Reporter: Doug Cutting
>
> We should develop a high-performance, secure transport for Avro.  This 
> should be named with avro: uris.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
