Re: Alternative Binary Protocol idea for memcached.

Clint Webb Thu, 21 Feb 2008 16:56:34 -0800

I don't know if there is any need for a different binary protocol; I was
only bringing it up as an alternative design style in case it ever comes in
handy in the future.


I do have working code of a very similar protocol used for a different
purpose and was curious how easy it would be to integrate it into memcached
since Dustin mentioned that the protocol handling was abstracted.

So when I looked at the memcached code (binary branch) I realized that the
protocol abstraction could be improved a bit, so that's what I've been
looking at.  I wouldn't say that the performance improvements would be
massive, but I do think that it would be something measurable at least.

Regardless of whether a third protocol is ever implemented, I think I can
provide some patches that improve the performance, and at the same time
completely abstract out the protocol handling.

The improvements come about because currently we have a single event handler
that has to handle both the ascii protocol and the binary protocol.  The two
need completely different optimisations.  The ascii one needs to keep reading
in data until it gets a complete line.  The binary one only needs to read in
X bytes, figure out what command it is, read X more bytes, etc...
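
To make the difference concrete, here is a rough sketch in Python (not
memcached code; the function names are made up for illustration).  The
line-oriented reader has to scan for a terminator, while the binary-style
reader only compares a length:

```python
def take_line(buf: bytearray):
    """ASCII-style framing: return one complete line, or None if the
    terminator has not arrived yet (caller keeps reading)."""
    end = buf.find(b"\r\n")
    if end < 0:
        return None
    line = bytes(buf[:end])
    del buf[:end + 2]          # consume the line and its "\r\n"
    return line


def take_exact(buf: bytearray, n: int):
    """Binary-style framing: return exactly n bytes, or None if fewer
    than n bytes are buffered so far."""
    if len(buf) < n:
        return None
    chunk = bytes(buf[:n])
    del buf[:n]
    return chunk
```

In the first case every partial read means re-scanning for the end of line;
in the second the handler always knows how many bytes it is still waiting for.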

Using a different event handler for each protocol, and separating the
code, not only makes it easier to understand, but would also let us easily
compile in only the binary protocol, for example.
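
Roughly what I have in mind, sketched in Python rather than C to keep it
short.  Everything here is hypothetical (the registry, the Conn class, and
especially the toy "binary" format, which is just a one-byte length prefix
and NOT the real memcached binary protocol).  Each connection gets its
handler once, at accept time, so there is no per-event if/else:

```python
from dataclasses import dataclass, field
from typing import Callable

HANDLERS = {}  # protocol name -> on_readable callback


def register(name):
    """Decorator that adds a handler to the table; leaving a protocol
    out of the build just means never registering it."""
    def wrap(fn):
        HANDLERS[name] = fn
        return fn
    return wrap


@register("ascii")
def ascii_on_readable(conn):
    # Line-oriented: do nothing until a full line is buffered.
    end = conn.inbuf.find(b"\r\n")
    if end < 0:
        return None
    line = bytes(conn.inbuf[:end])
    del conn.inbuf[:end + 2]
    return ("ascii", line)


@register("binary")
def binary_on_readable(conn):
    # Toy format (assumption): 1 length byte, then that many body bytes.
    if not conn.inbuf:
        return None
    need = 1 + conn.inbuf[0]
    if len(conn.inbuf) < need:
        return None
    body = bytes(conn.inbuf[1:need])
    del conn.inbuf[:need]
    return ("binary", body)


@dataclass
class Conn:
    on_readable: Callable
    inbuf: bytearray = field(default_factory=bytearray)


def accept(proto):
    # The protocol is chosen once per connection, not once per event.
    return Conn(on_readable=HANDLERS[proto])


def feed(conn, data):
    """Stand-in for the event loop: buffer the data, call the handler."""
    conn.inbuf += data
    return conn.on_readable(conn)
```

The event loop itself never needs to know which protocol a connection speaks.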



On Thu, Feb 21, 2008 at 12:06 PM, Marc <[EMAIL PROTECTED]> wrote:

>  I'd like to see a compelling use case and prototype for just one more
> protocol before we start worrying about 100s of protocols.  Generalization
> for its own sake isn't an interesting design goal.  What problem are you
> trying to solve that isn't solved by the current implementation?  What do
> you expect to gain by deferring requests?  I'm a little mystified.
>
>
> On 2/20/08 6:42 PM, "Clint Webb" <[EMAIL PROTECTED]> wrote:
>
> Alright, I've started looking at the code.
>
> Would anyone object if I first started working on abstracting the protocol
> handling routines so that they are handled by callback instead of the
> current if/else setup?
>
> Then I was going to put all the protocol stuff in prot_ascii.h|c and
> prot_binary.h|c respectively.
>
> This way we could potentially have hundreds of protocols handled without
> having a lengthy if/else comparison.
>
> On Wed, Feb 20, 2008 at 10:06 AM, Clint Webb <[EMAIL PROTECTED]> wrote:
>
> I wasn't expecting such a quick reply.  Good point about allowing multiple
> protocols.  I might pull out some of my old code and see how easy it is to
> drop in.
>
> I thought I'd give a little background on myself and this protocol style.
> I used to work in the controls and automation industry.  If you've ever
> checked your luggage into an airport, or sent anything thru USPS, UPS or
> FedEx, or bought anything online from Amazon, B&N and other large online
> stores, or from Walmart, Kmart, Target, etc... then your product has likely
> had some experience with my coding somewhere along the line.
>
> I developed this protocol for a tiny little side-project for one of the
> above mentioned companies.  It was basically a connector that took
> information from a large number of different systems and passed it to
> another.  The requirements changed a lot, so I developed something that
> could be fast, but had to be flexible.  If I added a feature to the server,
> I didn't want to be forced to update all the clients as well.   Also, some
> of the clients were tiny little embedded controllers, so it had to be pretty
> simple too.
>
> This solution was VERY fast, as all the commands are a single byte and
> could be easily mapped to an array of callback routines.   This protocol
> had to run on a real-time system as well, so we had to ensure that all
> operations performed in a predictable fashion.
>
> I separated the commands by their parameter type.  0-31 had no parameter.
>  32-63 had a single byte parameter.  64-95 had a 32-bit parameter.  96-127
> had a single-byte parameter which was the length of a stream of data to
> follow (short-string).  128-159 had a 32-bit integer that was the length of
> a stream of data to follow.   These were our 5 different parameter types.  A
> command could only have one parameter.
>
> This way, the IO handling code could retrieve all the necessary data and
> then pass it off to a callback routine for that command.
>
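
For anyone who wants to see the shape of it, here is a minimal Python sketch
of the command ranges described above.  The ranges come from the description;
the byte order (big-endian), treating the 32-bit values as unsigned, and all
of the names are my assumptions for the sake of the example:

```python
import struct


def param_kind(cmd):
    """Map a command byte to its parameter type by numeric range."""
    if cmd < 32:
        return "none"
    if cmd < 64:
        return "byte"
    if cmd < 96:
        return "int32"
    if cmd < 128:
        return "short_string"
    if cmd < 160:
        return "long_string"
    raise ValueError("command %d is outside the assigned ranges" % cmd)


def decode_one(buf: bytearray):
    """Return (cmd, param) and consume it, or None if more data is needed."""
    if not buf:
        return None
    cmd = buf[0]
    kind = param_kind(cmd)
    if kind == "none":
        need, param = 1, None
    elif kind == "byte":
        if len(buf) < 2:
            return None
        need, param = 2, buf[1]
    elif kind == "int32":
        if len(buf) < 5:
            return None
        need, param = 5, struct.unpack_from(">I", buf, 1)[0]
    elif kind == "short_string":
        if len(buf) < 2:
            return None
        need = 2 + buf[1]               # 1-byte length, then the data
        if len(buf) < need:
            return None
        param = bytes(buf[2:need])
    else:                               # long_string
        if len(buf) < 5:
            return None
        need = 5 + struct.unpack_from(">I", buf, 1)[0]
        if len(buf) < need:
            return None
        param = bytes(buf[5:need])
    del buf[:need]
    return cmd, param


CALLBACKS = [None] * 256   # indexed directly by the command byte


def dispatch(state, buf):
    """Decode every complete command in buf and call its callback."""
    while (item := decode_one(buf)) is not None:
        cmd, param = item
        CALLBACKS[cmd](state, param)
```

The IO code never needs to know what a command means, only which range it
falls in; the callback table does the rest.
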
> Each socket connection would have a set of buffers set aside for the
> incoming data.  In this case we would want a buffer set aside to hold the
> key and value data.
>
> To speed up processing and ensure that the minimum data set has been
> provided,  we used a 32-bit (or was it 64-bit?) word as flags.   Each
> operation would set or clear one or more flags.   So when a GO command is
> received, the server can quickly determine what 'action' needs to take
> place,  and which 'parameters' have been provided.
>
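
A small Python sketch of the flags idea, under my own assumptions: the flag
names, bit positions, the GET/SET actions and the error strings are all
invented for the example.  Only the notion of a flags word checked at GO,
and state that survives until CLEAR, comes from the description:

```python
# Parameter flags (low bits) and action flags (higher bits); all hypothetical.
HAS_KEY, HAS_VALUE = 1 << 0, 1 << 1
ACT_GET, ACT_SET = 1 << 8, 1 << 9
ACTION_MASK = ACT_GET | ACT_SET

# Minimum parameters each action needs before GO may run it.
REQUIRED = {ACT_GET: HAS_KEY, ACT_SET: HAS_KEY | HAS_VALUE}


class RequestState:
    """Per-connection state built up by single-parameter commands."""

    def __init__(self):
        self.clear()

    def clear(self):
        # CLEAR: forget everything gathered so far.
        self.flags = 0
        self.key = None
        self.value = None

    def set_key(self, key):
        self.key = key
        self.flags |= HAS_KEY

    def set_value(self, value):
        self.value = value
        self.flags |= HAS_VALUE

    def set_action(self, action):
        # Selecting an action replaces any previously selected one.
        self.flags = (self.flags & ~ACTION_MASK) | action

    def go(self, store):
        # GO: one mask and one compare decide whether the request is complete.
        action = self.flags & ACTION_MASK
        needed = REQUIRED.get(action)
        if needed is None:
            return "ERROR no action"
        if (self.flags & needed) != needed:
            return "ERROR missing parameter"
        if action == ACT_SET:
            store[self.key] = self.value
            return "STORED"
        return store.get(self.key)
```

Because the state is kept until CLEAR, a client can SET a key and then GET it
back by changing only the action and sending GO again, without resending the
key.
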
> If we ran out of room having to handle more than 256 commands, we would
> use a command of 0xFF which would expect that the next byte would be another
> command (from a set different to the first).  I never actually implemented
> it though.  The most commands I ever used was about 100 or so.
>
> I can't imagine that a variable-length structured protocol could be much
> faster than that.  Still the emphasis of this protocol is not so much on
> speed, but on flexibility to add functionality to the protocol (by adding
> commands) without breaking existing clients (and without having to handle
> multiple versions of the protocol).
>
> The 'noreply' stuff that I have seen around the list could probably
> benefit from this protocol.  I haven't looked close enough at the CAS stuff
> either, but I suspect that would be easy to implement too.
>
> Also, those that want to shave off a few extra bytes in their client, have
> the option of sending a request that only includes the bits they want.  If
> you don't care about expiry, leave it out; same with flags, tags, cas id's, and
> anything else.  Plus you can stream-line some of your requests by not using
> the CLEAR command, and re-using the state.
>
> Dang, if I had a little more time on my hands right now, I'd be really
> tempted to implement it.   I don't actually have a *need* for this protocol
> in memcached, it was purely an intellectual itch ever since I saw people
> complaining about the existing protocols being difficult to expand.
>
>
>
> On Feb 20, 2008 4:14 PM, Dustin Sallings <[EMAIL PROTECTED]> wrote:
>
>
> On Feb 19, 2008, at 22:20, Clint Webb wrote:
>
> > I know a considerable amount of thought and work has gone into the
> > existing flavour of the binary protocol, and I don't expect that work
> > to be discarded, I'm really only mentioning this new concept now as
> > an alternative for the future if we ever find the current binary
> > protocol to be too restrictive and inflexible.  And something to
> > think about, or even use elsewhere.
>
>         This is certainly interesting.  The first step of doing the binary
> protocol implementation was to create a change that allowed multiple
> protocols to coexist.  It would be possible to implement this to run
> in parallel with the existing protocols in the 1.3 codebase.
>
>         Intuitively, it doesn't seem as efficient to process as what we've
> got now, but I like being proven wrong, so I'd welcome another front-
> end protocol.  :)
>
>         Of course, I wrote the majority of the current binary protocol code
> about six months ago, so I'd really like to at least have one in more
> people's hands.
>
> --
> Dustin Sallings
>
>
>
>
>


-- 
"Be excellent to each other"
