I did read it and I didn't agree with it.

>       * A single slow response blocks all requests behind it.

The same is true of bulk get. Remember that the only requests that can safely 
be pipelined are idempotent ones, which in practice generally means GET. So if 
a single GET can slow down the whole pipeline, then a single 'virtual' GET 
inside a bulk GET request can slow down the whole bulk response just as easily.
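To make the symmetry concrete, here's a toy model (a sketch, not real I/O; the durations are made up) of in-order response delivery. Whether the slow unit is one pipelined GET or one 'virtual' GET inside a bulk request, everything queued behind it waits:

```python
from itertools import accumulate

def delivery_times(durations):
    """durations[i] = seconds the server spends producing response i.
    Even if the server computes all responses in parallel, pipelined
    HTTP/1.1 (and likewise a single bulk response body) must deliver
    them in request order, so response i can't arrive before any
    earlier one: its delivery time is the running max of finish times.
    """
    return list(accumulate(durations, max))

# One slow response at the front stalls everything behind it.
print(delivery_times([5.0, 0.1, 0.1]))  # -> [5.0, 5.0, 5.0]
```

Either way the client sees nothing until the slow item clears; head-of-line blocking is a property of ordered delivery, not of pipelining per se.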

>       * When processing in parallel, servers must buffer pipelined
> responses, which may exhaust server resources-e.g., what if one of the
> responses is very large? This exposes an attack vector against the server!

This is the whole point of flow control in TCP. The server only pulls off the 
socket what it can handle. If a client sends more requests than the server can 
handle, then the server stops draining the receive buffer and flow control 
automatically pushes back on the client.
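A quick local sketch of that pushback (using a socket pair rather than a real TCP connection, so the sizes and names here are illustrative): if the receiving side never reads, the sender's writes fail once the kernel buffers fill, with no application code needed on the receiving side.

```python
import socket

def fill_until_backpressure(chunk_size=65536, max_chunks=100_000):
    """Write into one end of a socket pair while the other end never
    reads. Once the kernel buffers are full, a non-blocking send
    raises BlockingIOError -- the sender has been throttled by flow
    control automatically, with no action by the 'server' side."""
    client, server = socket.socketpair()
    client.setblocking(False)
    chunk = b"x" * chunk_size
    sent = 0
    try:
        for _ in range(max_chunks):
            sent += client.send(chunk)
    except BlockingIOError:
        pass  # buffers full: the sender is pushed back on
    finally:
        client.close()
        server.close()
    return sent
```

The sender stops long before it has written everything it wanted to; that's the kernel doing the server's resource protection for it.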

Put another way, if this attack works then a client can replicate it without 
pipelining just by making multiple independent requests. So either a server 
protects itself from DoS by clients or it doesn't; pipelining doesn't change 
anything.

>       * A failed response may terminate the TCP connection, forcing the
> client to re-request all the subsequent resources, which may cause duplicate
> processing.

Certainly nothing in HTTP requires such a termination, so what this point 
really says is 'bad clients will throw exceptions on non-200 responses'. Well, 
bad clients are going to do a lot of silly things. If they use decent 
libraries (e.g. Apache, .NET, etc.) this isn't a problem, because the 
exception won't terminate the connection; the connection is actually part of a 
pool and is managed separately.

So yes, bad clients will do bad things but that applies no matter what so I 
don't see it worth worrying about.

>       * Detecting pipelining compatibility reliably, where intermediaries
> may be present, is a nontrivial problem.

Pipelining is point to point, not end to end. In other words, if the 
intermediary is returning 1.1 responses then it is a 1.1 intermediary; 
otherwise its job is to return 1.0 even if the upstream system it's talking to 
is 1.1 and pipelining happens on that hop. So each hop only needs to probe its 
next hop.
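As a minimal sketch of that per-hop check (simplified: a real probe would also watch for aborted connections and mis-ordered responses), the HTTP version in the next hop's status line already tells you whether that hop speaks 1.1:

```python
def next_hop_is_1_1(status_line: str) -> bool:
    """Return True if the peer's response status line declares
    HTTP/1.1. A hop answering with 1.1 is a 1.1 hop; a 1.0 answer
    means don't pipeline on this connection, regardless of what the
    systems further upstream support."""
    version = status_line.split(" ", 1)[0]
    return version == "HTTP/1.1"
```

The decision is purely local to the connection, which is the point: no end-to-end capability discovery is needed.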

>       * Some intermediaries do not support pipelining and may abort the
> connection, while others may serialize all requests.

Intermediaries that don't support pipelining advertise 1.0 for exactly that 
reason. And serialization is always a possibility, but the server can do the 
same serialization. So yes, bad infrastructure is bad infrastructure. But that 
isn't a reason to abandon the protocol and invent a new protocol to crawl 
through the old one.

So personally I'm having trouble buying the protocol argument. But you make 
two arguments in your email that seem well positioned for a really productive 
conversation.

Your first argument is that the overhead of GET is so bad that even with 
pipelining the performance will still be significantly worse than a bulk 
request. Well, you said you already implemented bulk requests. So um... why 
not publish some numbers along with the code you used to generate them?

The same argument applies to compression and the benefits of compressing 
similar data together. You said you already have this up and running. So why 
not just publish some numbers comparing a non-pipelined connection, a 
pipelined connection, and your bulk GET? You can show latency, bandwidth, and 
CPU load.
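As a rough illustration of the compression half of the argument (the document shape here is invented, echoing the "this_is_my_custom_property" example from your mail; only real numbers from your implementation would settle it), compare compressing a thousand similar documents as one body versus one at a time:

```python
import zlib

# A thousand small, similar documents -- a hypothetical shape.
docs = [b'{"this_is_my_custom_property": %d}' % i for i in range(1000)]

# One compressed body for the whole batch...
together = len(zlib.compress(b"".join(docs)))
# ...versus compressing each document independently.
separate = sum(len(zlib.compress(d)) for d in docs)

print(together, separate)  # the batched body should be far smaller
```

The repeated property name only gets tokenized when the documents share a compression context, which is exactly the effect a benchmark should quantify.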

I suspect those numbers would make for a more productive conversation.

        Thanks,

                        Yaron

> -----Original Message-----
> From: Jens Alfke [mailto:[email protected]]
> Sent: Monday, January 27, 2014 9:13 PM
> To: [email protected]
> Subject: Re: _bulk_get protocol extension
> 
> 
> On Jan 27, 2014, at 7:26 PM, Yaron Goland <[email protected]> wrote:
> 
> > Nevertheless he did say that so long as one probes the connection then
> pipelining is known to work. Probing just means that you can't assume that
> the server you are talking to is a 1.1 server and therefore supports 
> pipelining.
> 
> Well, yes, that's pretty clear - I mean, I know pipelining's been
> implemented. (And on iOS and Mac the frameworks already know how to
> support pipelining, so one doesn't have to do the probing oneself.)
> 
> The problems with pipelining are higher level than that. Did you read the text
> by Ilya Grigorik that I linked to? Here's another excerpt:
> 
>       * A single slow response blocks all requests behind it.
>       * When processing in parallel, servers must buffer pipelined
> responses, which may exhaust server resources-e.g., what if one of the
> responses is very large? This exposes an attack vector against the server!
>       * A failed response may terminate the TCP connection, forcing the
> client to re-request all the subsequent resources, which may cause duplicate
> processing.
>       * Detecting pipelining compatibility reliably, where intermediaries
> may be present, is a nontrivial problem.
>       * Some intermediaries do not support pipelining and may abort the
> connection, while others may serialize all requests.
> -
> http://chimera.labs.oreilly.com/books/1230000000545/ch11.html#HTTP_PIPELINING
> 
> (Now, HTTP 2.0 is adding multiplexing, which alleviates most of those
> problems. I'll be happy when we get to use it, but that probably won't be for
> a year or two at least.)
> 
> I also mentioned the overhead of issuing a bunch of HTTP requests versus
> just one. As a thought experiment, consider fetching a one-megabyte HTTP
> resource by using a thousand byte-range GET requests each requesting 1K of
> the file. Would this take longer than issuing a single GET request for the
> entire resource? Yeah, and probably a lot longer, even with pipelining. The
> client and the server both introduce overhead in handling requests.
> 
> Finally, consider that putting a number of related resources together into a
> single body enables better compression, since general-purpose compression
> algorithms look for repeated patterns. If I have a thousand small documents
> each of which contains a property named "this_is_my_custom_property",
> then if all those documents are returned in one response each instance of
> that string will get compressed down to a very short token. If they're
> separate responses, the string won't get compressed.
> 
> -Jens
