Hi all,                                       

I found some time to work on haproxy over the last few weeks and to make a
number of fundamental changes that have been needed for a long time.

First, while working on SSL and Compression at Exceliance, we found that the
way the internal buffers and the HTTP message interact is really annoying.
It is a long-standing leftover from the migration which happened in 1.3, and
it had to come to an end. Some buffer manipulation functions have to deal
with pointers that are copied into other places and because of this, some
operations such as a simple realign are not possible.

So I've changed the way it works. Now a buffer has a base (or origin) pointer:
everything below it is from the past and is leaving the buffer, and everything
above it is new and waiting to be forwarded. HTTP messages don't hold an
absolute pointer anymore, just offsets relative to the base pointer. The
change was complex but the code is much more manageable and offers much more
flexibility now.
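
To illustrate why relative offsets make a realign possible, here is a minimal
sketch in C. It is not haproxy's actual code: the struct and function names
are hypothetical, the buffer is assumed to be linear (no wrapping), and the
message only records where the URI sits relative to the origin.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified buffer (assumption: linear, never wraps). */
struct sketch_buf {
    char   data[64];  /* storage area */
    size_t origin;    /* index of the base pointer inside data[] */
    size_t out;       /* bytes below the origin, leaving the buffer */
    size_t in;        /* bytes from the origin upward, waiting to be forwarded */
};

/* Hypothetical message descriptor: offsets only, no absolute pointer. */
struct sketch_msg {
    size_t uri_off;   /* start of the URI, relative to the origin */
    size_t uri_len;   /* length of the URI in bytes */
};

/* Turn the stored offset into a pointer only at the moment it is needed. */
static const char *sketch_msg_uri(const struct sketch_buf *b,
                                  const struct sketch_msg *m)
{
    assert(m->uri_off + m->uri_len <= b->in);
    return b->data + b->origin + m->uri_off;
}

/* Realign: slide all pending bytes down so that they start at data[0]. */
static void sketch_realign(struct sketch_buf *b)
{
    size_t start = b->origin - b->out;

    memmove(b->data, b->data + start, b->out + b->in);
    b->origin = b->out;  /* the message offsets stay valid as they are */
}
```

After sketch_realign() the message descriptor needs no fix-up, since nothing
in it depends on where the data physically sits in the storage area.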

Some of these changes conflicted with the ACL and pattern frameworks, so it
was the right moment to merge them together. We now have a single sample fetch
function for each type of data we want to extract, and both ACLs and patterns
rely on this. The first user-visible benefit from this is that ACLs can now
match cookies, URL parameters and arbitrary payloads. In practice, the current
code is almost ready to enable session tracking on any input criteria. I
thought I could make the track-sc1 and track-sc2 actions track headers but
some more changes were needed that were out of the scope of all these changes,
so I left them for later.

Since some ACLs and pattern fetch methods supported an argument, a new argument
management framework was implemented, making it very easy to declare a variable
number of typed arguments for new keywords. Thanks to this extension, I could
bring new optional arguments to hdr() and cook() fetch methods to specify an
occurrence number. This allows stick-tables to extract an IP address from a
precise occurrence of the X-Forwarded-For header for instance, and to write
ACLs which match such headers against networks found in files.
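
As a rough illustration of what selecting an occurrence means, here is a small
C sketch. It assumes that positive numbers count from the first occurrence
(1 being the first) and negative numbers count from the last one (-1 being the
last). The helper name is hypothetical and the header values are assumed to be
already split into an array, which is not how haproxy stores them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return the requested occurrence among <count> header
 * values, or NULL if it does not exist. occ must not be zero.
 */
static const char *sketch_pick_occurrence(const char *const *vals,
                                          size_t count, int occ)
{
    size_t back;

    assert(occ != 0);

    if (occ > 0)  /* counted from the first value */
        return (size_t)occ <= count ? vals[occ - 1] : NULL;

    /* counted from the last value; widen before negating to avoid overflow */
    back = (size_t)(-(long long)occ);
    return back <= count ? vals[count - back] : NULL;
}
```

With three X-Forwarded-For values, occurrence -1 would pick the last one,
which is typically the address appended by the closest proxy.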

Another thing which had to be done was to type the samples automatically.
Since the pattern framework supported automatic type casts, it was easy to
complete this. Thanks to these types, we now support IPv6 ACLs, and the
"src" and "dst" ACL/patterns are IPv4 or IPv6 depending on the data found.
This is important because it means that it is now possible to mix v4 and v6
addresses in ACL patterns. As a side effect, the "src6" and "dst6" pattern
fetches have been removed because they were redundant with "src" and "dst".

All these extensions required some improved parsing and error reporting.
Thus I have implemented a simple and convenient error reporting framework
based on a new "memprintf()" function which acts on a single pointer that
is automatically reallocated and freed. A large number of config parsing
options (specifically the ACL ones) which used to report "error at line X"
are now able to say something like "occurrence -20 too negative at argument
2 of hdr_ip(), must be >= -10". I wish I had done this earlier: it's so simple
that it took far less time to implement than I have spent in the past working
without it!
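
The idea of a function acting on a single pointer that is reallocated and
freed automatically can be sketched as follows. This is only an approximation
written for illustration, not the real memprintf(): the name sketch_memprintf
and its exact behaviour on allocation failure are assumptions.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Format a message into a freshly allocated string, release the previous
 * string held in *out, then store the new one there. Returns the new string,
 * or NULL on failure (in which case *out is NULL as well).
 */
static char *sketch_memprintf(char **out, const char *fmt, ...)
{
    va_list args;
    char *msg = NULL;
    int len;

    assert(out != NULL);

    /* first pass: compute the length needed */
    va_start(args, fmt);
    len = vsnprintf(NULL, 0, fmt, args);
    va_end(args);

    if (len >= 0 && (msg = malloc((size_t)len + 1)) != NULL) {
        /* second pass: produce the text */
        va_start(args, fmt);
        vsnprintf(msg, (size_t)len + 1, fmt, args);
        va_end(args);
    }

    /* The old string is freed only after formatting, so the caller may pass
     * *out itself as an argument in order to extend an existing message.
     */
    free(*out);
    *out = msg;
    return msg;
}
```

A parser can then start with a NULL pointer, let each layer prepend or append
its own context to the same message, and call free() once after reporting it.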

Alongside these changes, the long-awaited "use-server" directive was
introduced. It works as an exception to load balancing and persistence. It is
convenient to avoid creating many backends when you want to select a server
for a specific purpose (eg: monitoring).

The log framework has now learned to create, emit and log a unique request ID.
Using the same syntax as log-format, it is possible to build a string which
is supposed to uniquely identify a request in a given environment. This
string is logged and emitted in headers so that everyone along the chain can
log the same information, making it much easier to correlate events across
large infrastructures.

The error capture system was lacking several important pieces of information.
I discovered this while trying to track a bug I have on my server, which causes
invalid contents to sometimes be emitted and blocked by haproxy, which logs
them. Unfortunately, the level of information made these traces unusable.
Now there is additional information such as the client's source port, all
known internal flags, the position in the stream and the length of the last
chunk. This will probably help when I get the error again.

On another note, I found an uninitialized entry in a structure which made me
waste 2 hours because on one machine, the first malloc() returned a zeroed
area while on another one it was not the case. So I have added a command
line option to enable memory poisoning. It immediately gave me another
occurrence which I fixed :-) However I think the code is safe now.
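
For readers unfamiliar with the technique, memory poisoning can be sketched in
a few lines of C. This is a generic illustration under assumed names
(sketch_alloc, sketch_poison_byte), not the code that went into haproxy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical global setting: -1 disables poisoning, any value from 0 to
 * 255 is the byte written over every newly allocated area.
 */
static int sketch_poison_byte = -1;

/* Allocate <size> bytes and, if poisoning is enabled, fill them with the
 * poison byte so that code reading a field it never initialized sees the
 * same garbage on every machine instead of an accidental zero.
 */
static void *sketch_alloc(size_t size)
{
    void *p;

    assert(size > 0);

    p = malloc(size);
    if (p != NULL && sketch_poison_byte >= 0)
        memset(p, sketch_poison_byte, size);
    return p;
}
```

A non-zero poison byte is the useful choice here, since a zeroed area is
exactly what hid the bug on the first machine.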

A number of other minor issues were fixed :
  - balance source did not properly hash IPv6 addresses (Alex Markham)
  - logformat could sometimes segfault (William Lallemand)
  - req_ssl_sni would randomly fail if a session ID is present (Emmanuel Bégazu)
  - doc cleanups and fixes to support HTML converter (Cyril Bonté)

What's in the pipe now? We're still working on getting SSL and compression to
work. I think that the deep changes are done for now. We found how to split
the socket and protocol handling and it looks promising. I hope that we'll get
something working soon so that we can work on the multi-process model which is
an absolute requirement if we want to get some performance on SSL. I already
have some ideas and I believe I found how we could share the stats and server
states between all processes. But that's for another version :-)

I've just released haproxy 1.5-dev9 with all the changes above. Thanks to all
those who read this far.

   site index      : http://haproxy.1wt.eu/
   sources         : http://haproxy.1wt.eu/download/1.5/src/devel/
   changelog       : http://haproxy.1wt.eu/download/1.5/src/CHANGELOG
                             
Cheers,
Willy

