Alternative Binary Protocol idea for memcached.

Clint Webb Tue, 19 Feb 2008 22:20:35 -0800

Instead of having a static structure-based protocol, a more advanced (yet
simplified) protocol based on reduced instructions is feasible.


I know a considerable amount of thought and work has gone into the existing
flavour of the binary protocol, and I don't expect that work to be discarded.
I'm really only mentioning this new concept now as an alternative for the
future, in case we ever find the current binary protocol to be too
restrictive and inflexible.  It is also something to think about, or even
use elsewhere.

I haven't written any code for memcached or clients to handle this; it's just
a concept to think about.  But I have used this style of protocol several
times in the past, and its flexibility has been VERY useful.  Processing the
instructions has also been trivial.

Basically the idea is to use a series of simple commands that are tied in
together to perform something useful.  It is modeled on the RISC (reduced
instruction set) approach to processor design.

Each 'command' is a single byte.  Some commands don't have any parameters.
Some do.  In my previous instances of similar protocols, we used certain
number ranges to indicate what kind of parameter to expect (byte, 32-bit
word, short-string, long-string), but that doesn't need to be the case.

In the examples listed here, I am using the descriptive name of a command.
Keep in mind that these are not ASCII commands, but are a logical
representation of a single byte.   Parameters are handled such that strings
are first represented by a length, followed by the data.  Short-strings
(used only for keys and tags) are a byte which indicates length, followed by
the data.  Long strings are a 32-bit length, followed by the data.
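
As a rough illustration of the parameter encodings just described, here is
a minimal Python sketch.  The function names are made up for this example,
and the byte order (big-endian, i.e. network order) is an assumption; the
description above only fixes the width of the length prefixes.

```python
import struct


def encode_short_string(data: bytes) -> bytes:
    """Short-string: one length byte, then the data (keys and tags)."""
    if len(data) > 255:
        raise ValueError("a short-string holds at most 255 bytes")
    return bytes([len(data)]) + data


def encode_long_string(data: bytes) -> bytes:
    """Long-string: 32-bit length, then the data."""
    # Assumption: big-endian length; the byte order is not specified.
    return struct.pack("!I", len(data)) + data


def encode_word(value: int) -> bytes:
    """32-bit word parameter, using the same assumed byte order."""
    return struct.pack("!I", value)
```

With these helpers a three-byte key costs four bytes on the wire, and a
value costs its own length plus four.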

Here is a quick example of a SET.

    CLEAR
    KEY [short-string]
    VALUE [long-string]
    FLAGS [word]
    EXPIRES [long]
    SET
    GO

The point with this protocol is that all these operations can be done in any
order, as nothing is actually applied until a GO is received.  More
importantly, elements that we don't need do not have to be included, and
extra features added in the future can simply become additional commands.
If we don't want to set an expiry, then we don't include it in
the operation.
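
To make the "leave out what you don't need" point concrete, here is a small
Python sketch of a client building the SET stream.  The opcode numbers are
invented for the example (no byte values have been assigned to commands),
and FLAGS and EXPIRES are both assumed to carry a 32-bit big-endian
parameter.

```python
import struct

# Hypothetical opcode values, chosen only for this illustration.
CLEAR, GO = 0x01, 0x02
KEY, VALUE = 0x10, 0x11
SET = 0x20
FLAGS, EXPIRES = 0x30, 0x31


def build_set(key: bytes, value: bytes, flags=None, expires=None) -> bytes:
    """Return the byte stream for one SET; optional parts are just omitted."""
    if len(key) > 255:
        raise ValueError("key must fit in a short-string (255 bytes)")
    out = bytearray([CLEAR])
    out += bytes([KEY, len(key)]) + key
    out += bytes([VALUE]) + struct.pack("!I", len(value)) + value
    if flags is not None:
        out += bytes([FLAGS]) + struct.pack("!I", flags)
    if expires is not None:
        out += bytes([EXPIRES]) + struct.pack("!I", expires)
    out += bytes([SET, GO])
    return bytes(out)
```

Under these assumptions, a SET without an expiry is simply five bytes
shorter than one with it; there is no placeholder field to fill in.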

For another example, the following stream could set a key, then apply 3
tags to it:

    CLEAR
    KEY [short-string]
    VALUE [long-string]
    SET
    GO
    TAG [short-string]
    GO
    TAG [short-string]
    GO
    TAG [short-string]
    GO

This also shows off the ability to selectively clear out the accumulated
data.  The KEY we applied stays in a buffer until we get another KEY, a
CLEAR, or the socket is closed.  I haven't used this protocol in a
connectionless environment (e.g. UDP), so I'm not sure if it will work very
well there.
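
The buffering behaviour can be sketched as a tiny per-connection object.
This is only an illustration in Python: the Session class and its feed()
method are hypothetical names, and commands are passed as plain strings
rather than bytes to keep the example short.

```python
class Session:
    """Per-connection buffer of accumulated settings (hypothetical sketch)."""

    def __init__(self):
        self.fields = {}

    def feed(self, command, param=None):
        """Accept one command; return a snapshot of the buffer on GO."""
        if command == "CLEAR":
            self.fields.clear()
            return None
        if command == "GO":
            # Nothing is applied until GO; here we only report what the
            # server would see at this point.
            return dict(self.fields)
        # A repeated command replaces its earlier value.
        self.fields[command] = param
        return None
```

Feeding KEY, VALUE, SET, GO and then TAG, GO yields a second snapshot that
still contains the original KEY, because nothing removed it in between.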

The commands that I thought of are:
    CLEAR
    GO

    KEY [short-string]
    VALUE [long-string]
    TAG [short-string]

    ADD
    SET
    GET
    PURGE
    DELETE

    INCR [word]
    DECR [word]

    EXPIRES [long]
    FLAGS [word]
    QUIET
    VERBOSE

    RESULT [byte]
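
To show how little work parsing such a stream takes, here is a Python
sketch of a decoder driven by a command table.  The opcode byte values are
invented for the example, EXPIRES and FLAGS are assumed to take a 32-bit
parameter as in the SET example, and multi-byte integers are assumed to be
big-endian.  It does no checking for truncated input.

```python
import struct

# name -> (hypothetical opcode byte, parameter kind)
COMMANDS = {
    "CLEAR": (0x01, None), "GO": (0x02, None),
    "KEY": (0x10, "short"), "VALUE": (0x11, "long"), "TAG": (0x12, "short"),
    "ADD": (0x20, None), "SET": (0x21, None), "GET": (0x22, None),
    "PURGE": (0x23, None), "DELETE": (0x24, None),
    "INCR": (0x30, "word"), "DECR": (0x31, "word"),
    "EXPIRES": (0x40, "word"), "FLAGS": (0x41, "word"),
    "QUIET": (0x50, None), "VERBOSE": (0x51, None),
    "RESULT": (0x60, "byte"),
}
BY_OPCODE = {op: (name, kind) for name, (op, kind) in COMMANDS.items()}


def decode(stream: bytes):
    """Turn a byte stream into a list of (command name, parameter) pairs."""
    pos, out = 0, []
    while pos < len(stream):
        name, kind = BY_OPCODE[stream[pos]]
        pos += 1
        param = None
        if kind == "byte":
            param = stream[pos]
            pos += 1
        elif kind == "word":
            (param,) = struct.unpack_from("!I", stream, pos)
            pos += 4
        elif kind == "short":
            length = stream[pos]
            pos += 1
            param = stream[pos:pos + length]
            pos += length
        elif kind == "long":
            (length,) = struct.unpack_from("!I", stream, pos)
            pos += 4
            param = stream[pos:pos + length]
            pos += length
        out.append((name, param))
    return out
```

Adding a new command in this sketch means adding one row to the table; the
decoding loop itself does not change.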

Results are returned using the exact same command structure.  For example,
here is a get.

client --> server
    CLEAR
    GET
    KEY [key]
    GO

server --> client
    VALUE [value]
    RESULT [0]


If you wanted to get multiple keys, VERBOSE has the server include the KEY
with each VALUE:

client --> server
    CLEAR
    GET
    KEY [key]
    VERBOSE
    GO

server --> client
    KEY [key]
    VALUE [value]
    RESULT [0]
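
On the client side, pairing up such replies could look like the following
Python sketch.  It works on already-decoded (name, parameter) pairs, and it
assumes that every reply ends with a RESULT and that only a RESULT of 0
carries a usable value; the function name is made up for the example.

```python
def collect_replies(commands):
    """Group a decoded reply stream into a {key: value} dict."""
    results = {}
    key = value = None
    for name, param in commands:
        if name == "KEY":
            key = param
        elif name == "VALUE":
            value = param
        elif name == "RESULT":
            # RESULT closes one reply; keep it only if it succeeded.
            if param == 0 and key is not None:
                results[key] = value
            key = value = None
    return results
```

Because each reply names its own KEY, the client does not need to remember
the order in which it asked for the keys.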


There are various ways errors can be handled on the server side.  Whenever I
have implemented this kind of protocol, I have taken the approach that
commands overwrite competing commands.  For example, if we get a (CLEAR
SET GET KEY [key] GO) sequence, the GET overwrites the SET and this is a
valid operation, even though it contains an extra and unnecessary byte.
This is especially important when re-using settings from previous
operations.  But for operations that require certain data that is not
supplied, such as a SET without a KEY, a RESULT greater than zero should be
returned (along with a VALUE) to indicate what the problem is.
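
Here is a Python sketch of that server-side handling, working on decoded
(name, parameter) pairs.  The error codes and message texts are invented
for the example, only SET and GET are implemented, and a plain dict stands
in for the cache.

```python
OPERATIONS = {"ADD", "SET", "GET", "PURGE", "DELETE"}

# Hypothetical non-zero RESULT codes; no actual values are defined.
ERR_NO_KEY, ERR_NO_VALUE, ERR_NOT_FOUND, ERR_UNSUPPORTED = 1, 2, 3, 4


def execute(op, state, store):
    """Apply one operation against the store and return the reply."""
    if "KEY" not in state:
        return [("VALUE", b"missing KEY"), ("RESULT", ERR_NO_KEY)]
    key = state["KEY"]
    if op == "SET":
        if "VALUE" not in state:
            return [("VALUE", b"missing VALUE"), ("RESULT", ERR_NO_VALUE)]
        store[key] = state["VALUE"]
        return [("RESULT", 0)]
    if op == "GET":
        if key not in store:
            return [("VALUE", b"not found"), ("RESULT", ERR_NOT_FOUND)]
        return [("VALUE", store[key]), ("RESULT", 0)]
    return [("VALUE", b"unsupported"), ("RESULT", ERR_UNSUPPORTED)]


def run(commands, store):
    """Process a command stream; the last operation seen before GO wins."""
    state, op, replies = {}, None, []
    for name, param in commands:
        if name == "CLEAR":
            state, op = {}, None
        elif name in OPERATIONS:
            op = name  # overwrites any competing operation
        elif name == "GO":
            replies.extend(execute(op, state, store))
        else:
            state[name] = param
    return replies
```

In this sketch the (CLEAR SET GET KEY [key] GO) sequence runs as a plain
GET, and a SET with no KEY comes back with a non-zero RESULT and a short
explanatory VALUE.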

Another example, if we need to set 3 keys to expire in 300 seconds:

    CLEAR
    SET
    EXPIRES [300]
    KEY [key1]
    VALUE [value1]
    GO
    KEY [key2]
    VALUE [value2]
    GO
    KEY [key3]
    VALUE [value3]
    GO
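
A client helper that produces exactly this kind of stream might look like
the Python sketch below.  It emits logical (name, parameter) pairs rather
than bytes, and the function name is made up for the example.

```python
def batch_set(items, expires):
    """Build one stream that sets several keys with a shared expiry."""
    # SET and EXPIRES are sent once and stay buffered for every later GO.
    stream = [("CLEAR", None), ("SET", None), ("EXPIRES", expires)]
    for key, value in items:
        stream += [("KEY", key), ("VALUE", value), ("GO", None)]
    return stream
```

Each additional key costs only a KEY, a VALUE and a GO, since the operation
and the expiry are carried over from the buffer.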

If you wanted to do multiple sets, only showing results for exceptions, you
would use the QUIET command, which causes the server to suppress RESULT
output unless there were issues.  A final RESULT 0 then indicates that
the operation has completed.

I have more generic stuff somewhere, regarding this kind of protocol from
previous projects where I have used it, but this should be enough to at
least get the concept across.

I was meaning to bring this up before, but never seemed to have the time.
As a result the binary protocol efforts have moved on to having actual
implementations, so this idea is probably a bit too late.  I wanted to share
it anyway.

disclaimer: I haven't looked at the current binary spec since it was first
scratched out, so it's possible I'm missing some important functionality.



-- 
"Be excellent to each other"
