Hi Ilya,

On Thu, Feb 06, 2014 at 04:14:14PM -0800, Ilya Grigorik wrote:
(...)
> > I preferred to only rely on CF_STREAMER and ignore the _FAST variant
> > because it would only favor high bandwidth clients (it's used to
> > enable splice() in fact). But I thought that CF_STREAMER alone would
> > do the right job. And your WPT test seems to confirm this, when we
> > look at the bandwidth usage!
> 
> Gotcha, thanks. As a follow up question, is it possible for me to control
> the size of the read buffer?

Yes, in the global section, you can set :

  - tune.bufsize : size of the buffer
  - tune.maxrewrite : reserve at the end of the buffer which is left
    untouched when receiving HTTP headers

So while the headers are being received, the buffer is considered full once
it holds (bufsize - maxrewrite) bytes. After the headers phase, the full
bufsize is usable.
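For example, something like this in the global section (the values below are
only illustrative, not a recommendation) :

  global
      tune.bufsize    16384    # per-buffer size, in bytes
      tune.maxrewrite 1024     # reserve kept free for header rewrites

With these numbers, the buffer is considered full at 16384 - 1024 = 15360
bytes while headers are being received, and at 16384 bytes afterwards.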

> > > This works great if we're talking to a
> > > backend in "http" mode: we parse the HTTP/1.x protocol and detect when a
> > > new request is being processed, etc. However, what if I'm using HAProxy to
> > > terminate TLS (+alpn negotiate) and then route the data to a "tcp" mode
> > > backend.. which is my spdy / http/2 server talking over a non-encrypted
> > > channel.
> >
> > Ah good point. I *suspect* that in practice it will work because :
> >
> >   - the last segment of the first transfer will almost always be incomplete
> >     (you don't always transfer exact multiples of the buffer size) ;
> >   - the first response for the next request will almost always be incomplete
> >     (headers and not all data)
> >
> 
> Ah, clever. To make this more interesting, say we have multiple streams in
> flight: the frames may be interleaved and some streams may finish sooner
> than others, but since multiple are in flight, chances are we'll be able to
> fill the read buffer until the last stream completes.. which is actually
> exactly what we want: we wouldn't want to reset the window at the end of each
> stream, but only when the connection goes quiet!

But then if we have multiple streams in flight, chances are that almost
all reads will be large enough to fill the buffer. This will certainly
not always be the case, but even if we're doing incomplete reads, we'll
send everything we have at once until there are at least two consecutive
incomplete reads. That said, the best way to deal with this will obviously
be to implement support for the upper protocols themselves at some point.
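To make this more concrete, here is a rough sketch in C of the heuristic
described above (this is not HAProxy's actual code; the struct, the function
name and the "two full reads" threshold are made up for the illustration) :

  #include <stddef.h>

  /* Illustration only : keep a "streamer" flag in the spirit of CF_STREAMER.
   * A couple of consecutive full reads sets it, and two consecutive
   * incomplete reads clears it. */
  struct rx_state {
      int full_reads;     /* consecutive reads that filled the buffer */
      int partial_reads;  /* consecutive reads that did not fill it   */
      int streamer;       /* stand-in for the CF_STREAMER idea        */
  };

  static void account_read(struct rx_state *rx, size_t ret, size_t room)
  {
      if (ret >= room) {                 /* read filled all available room */
          rx->partial_reads = 0;
          if (++rx->full_reads >= 2)     /* looks like a streaming transfer */
              rx->streamer = 1;
      } else {                           /* incomplete read */
          rx->full_reads = 0;
          if (++rx->partial_reads >= 2)  /* two in a row : quiet again */
              rx->streamer = 0;
      }
  }

With several streams interleaved on the same connection, most reads fill the
buffer, so the flag only drops once the connection really goes quiet, which
matches what you described above.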

> > So if we're in this situation, this will be enough to reset the CF_STREAMER
> > flag (2 consecutive incomplete reads). I think it would be worth testing
> > it.
> > A very simple way to test it in your environment would be to chain two
> > instances, one in TCP mode deciphering, and one in HTTP mode.
> >
> 
> That's clever. I think for a realistic test we'd need a SPDY backend
> though, since that's the only way we can actually get the multiplexed
> streams flowing in parallel.

Yes, it would be interesting to know how it behaves.
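For what it's worth, here is a minimal sketch of the chained setup (addresses,
ports and the certificate path are placeholders, and alpn/npn negotiation
would be added on the bind line depending on the OpenSSL version in use) :

  # instance 1 : terminates TLS but stays in TCP mode
  frontend tls_in
      mode tcp
      bind :443 ssl crt /path/to/cert.pem
      default_backend to_second_instance

  backend to_second_instance
      mode tcp
      server second 127.0.0.1:8080

  # instance 2 : plain HTTP mode, so the usual HTTP analyzers run
  frontend http_in
      mode http
      bind 127.0.0.1:8080
      default_backend app

  backend app
      mode http
      server app1 127.0.0.1:8081

Each half can go into its own instance's configuration file; the point is only
that the deciphered traffic crosses an HTTP-mode proxy afterwards.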

> > One of the huge difficulties we'll face will be to manage multiple streams
> > over one connection. I think it will change the current paradigm of how
> > requests are instantiated (a change which has already started). From the very
> > first version, we instantiated one "session" upon accept(), and this session
> > contains buffers on which analyzers are plugged. The HTTP parsers are
> > such analyzers. All the states and counters are stored at the session
> > level. In 1.5, we started to change a few things. A connection is
> > instantiated upon accept, then the session is allocated after the connection
> > is initialized (e.g. SSL handshake complete). But splitting the sessions
> > between multiple requests will be quite complex. For example, I fear
> > that we'll have to always copy data because we'll have multiple
> > connections on one side and a single multiplexed one on the other side.
> > You can take a look at doc/internal/entities.pdf if you're interested.
> >
> 
> Yep, and you guys are not the only ones that will have to go through this
> architectural shift... I think many of the popular servers (Apache in
> particular comes to mind) might have to seriously reconsider their
> internal architecture. Not an easy thing to do, but I think it'll be worth
> it. :-)

Yes, but the difficulty is that we also try to remain performant. At the
moment, we have no problem load-balancing videos or moderately large
objects (images) at 40 Gbps through a single-socket Xeon. I really fear
that the architecture required for HTTP/2 will make this drop significantly,
just because of the smaller windows, extra send/recv calls, and possibly
extra copies. And the worst thing would be to lose this performance in
HTTP/1 just because of the architecture shift needed to support HTTP/2.
We'll see...

Cheers,
Willy

