Hi all,

I've long wanted to send a status update on where haproxy 1.9 is going, and seeing some recent threads speculating about what will be available reminded me that it's really time to send this update. Be careful, this e-mail is long.

So first of all, while we didn't have *that* many bugs on 1.8, the ones we received lately were really painful to work on. As was expected, most of them came from the new core stuff, namely H2, threads, cache, master-worker, as well as a few existing parts having moved a lot like SSL and Lua. Fixing them kept a number of developers quite busy scratching their heads much longer than initially anticipated, to the point that the core parts that needed to be worked on had little opportunity to make any acceptable progress.

Fortunately, thanks to the amazing help that was provided here on the list and on Discourse sorting out PEBKAC from bugs, we're progressively recovering and getting back to development. The influx of new contributions is also a positive sign of this progressive restart.

So where are we now?

The following features were already merged :

  - the fd_cache was made lockless, we've seen a ~40% performance gain on a 12-core machine when using threads [Olivier] ;
  - the CLI now supports a payload, which will be used to update certs on the fly [Aurélien] ;
  - SPOE was significantly updated to provide a smoother load balancing with pipeline-aware agents, allowing them to scale even better [Christopher]

A number of features have already been submitted at least once, are being reviewed and/or discussed and are expected to get merged once everything is considered OK :

  - certificate update on the CLI [Aurélien] : from what I understand, some more discussion is needed to figure out how to reliably identify a certificate.
  - make a "resolvers" section able to use /etc/resolv.conf [Ben]. Baptiste is going to take a look at this very soon, I think we should be good soon on this one.
  - queue priorities manipulation [Patrick] : Patrick already sent his patch set, he's just waiting for me to take a look. A quick look at it makes me think we're about to be good as well, there will likely be very few round trips.
  - alert if some header addition cannot be performed [Tim] : it's not exactly a feature but I suspect that we'll want to iterate over it to further refine the reported information in case of errors, and it's really needed. I need to take a look at it, I still couldn't find time for this.
  - scheduler update [Olivier] : the scheduler has been reworked to become more thread-aware, to improve the inter-thread communication, and to improve fairness between tasks. The first step consisted in changing the internal API a bit. I've just received the patch set for review, it should be merged soon. The second step removes a lot of locking and will make the scheduler scale much better with many threads. It's also part of the patch set I have to review.
  - peers over SSL [Fred] : the purpose is to allow all bind options with peers so that peers can exchange information securely. I think I've seen it posted somewhere, I'll have to dig through the archives.

A number of other things were started but not submitted yet :

  - applet/connection scheduler [Olivier] : the next step will consist in using regular tasks to schedule applets and prevent them from clogging the CPU. The last step will consist in using miniature sub-tasks called "tasklets" to also schedule I/Os. This is needed to sort out the trouble we have with the MUX architecture right now, where it's impossible to just wake a mux up without pretending to have received a fake I/O.
  - analyzer timing [me] : in order to constantly improve haproxy's observability, I want to add the ability to measure the CPU time taken in each task, the latency added by other tasks, and the time spent in I/O. It becomes really important with potentially heavy Lua scripts or huge certificates which cause large delays affecting all other tasks. Now it will be possible to see the culprits and the victims. A next step will consist in adding processing timeouts to alert about offenders or kill them.
  - buffer rework [me] : we've reached the point where the current buffer internal API is really annoying and forces us to spend more effort working around it than updating it. The details of the changes have been put into a new doc (buffers-rework.txt). It should lead to quite some code removal. The changes will affect a lot of places, will be 90% mechanical and 10% tricky. Over the long term it will allow us to merge chunks and buffers and remove buffer allocations in many places, as well as to send error files that are larger than a buffer.
  - native HTTP representation allowing end-to-end HTTP/2 [Christopher] : this will allow HTTP/2 to be used end-to-end. For now Chris is working on the "easy" part which is the HTTP/1 parsing/conversion. I'm putting quotes around "easy" because while HTTP/1 appears easy to deal with, that's where most of the obstacles are, due to the fact that HTTP/1 is currently used as the native internal representation (H2 is currently translated to H1). This will heavily rely on the mux changes. We know that some parts ("tcp-request content" L7-based rules, option http-send-name-header, and filters/compression) will cause some head scratching. The expected gains from this part are tremendous. But while I thought that H2 and threads used to be the most complex changes over the last decade, I think this one's complexity has already surpassed them both. I'd say the chances of success for 1.9 are around 70%, which is not much but which still gives us the motivation for pursuing the efforts. Once completed, Christopher will deserve a barrel of beer to forget about this painful task ;-)
  - master/worker improvements [William] : William is currently working on making the master process much smarter and able to really manage the workers. It will be possible to have a CLI on the master to consult the list of workers for example, and maybe later even bounce between CLIs from the master.
    I'm unclear about all the details but when he explains it to me it sounds really cool.
  - http-request "do-resolve" action [Baptiste] : the purpose is to be able to perform DNS resolutions on the fly (i.e. read a name from a variable, set the IP into another one). Since converters cannot yield, Baptiste implemented this as an action. I don't remember if there's something equivalent planned or possible in Lua.

Not started yet but already planned :

  - split certificates [William] : some users prefer to have separate .key and .crt files (e.g. for permission reasons). William has already started to look and it seems doable, there are "just" a few shortcomings to take care of (I don't remember which ones).
  - certificate merge on startup [William] : the purpose is to save a lot of RAM by trying to load certificates only once even when they appear on multiple bind lines. Some more discussion is required on this.
  - mux rework [Chris/Olivier/me?] : the current mux design is painful as we have to cross various levels multiple times and duplicate some application level code. A rework attempt was made consisting in chaining layers in a more natural way which relies on breaking a very old assumption, which is that send() is only performed upon solicitation from I/O callbacks instead of from the upper layers. We do have some experimental code for this and the end-to-end HTTP code will heavily rely on this. Currently a big part of the difficulties lies in the connection scheduling and buffers API limitations, addressed in the steps above.
  - mux upgrade [Chris/Olivier/me?] : Christopher has identified that we'll likely need a mux to be upgradable on the fly (e.g. switch from a TCP frontend to an HTTP backend). This is another necessary pain inflicted by the HTTP mux design which will offer many new perspectives in the future.
  - small improvements to the cache [William] : I don't remember exactly what, I have some memories about removing the single-buffer size limitation and possibly starting to take Vary into account in some cases.
  - SSL layer splitting between FD level and buffer level [Olivier] : this will be required for QUIC so that we can feed buffers containing SSL traffic to OpenSSL. For now the SSL layer relies on OpenSSL's native read()/write() calls (yes, the ones that force you to disable SIGPIPE in gdb).

Some ideas to study later, nothing really assigned :

  - certificates : study if it makes sense to parse them on the fly on first use to also save on startup time.
  - logs : some improvements to the logging system have been discussed a few times, which can be summarized like this :
      - some people would like a set of log "profiles" that can be reused everywhere instead of copy-pasting the same log-format lines.
      - some people really need a per-server log-format because they send, say, native logs locally and JSON logs to another server.
      - some people want to load-balance between log servers
      - other people want to sample logs without having to trick the config
    => all of this has to be addressed together.
  - the "return" directive, regularly talked about, may possibly be done for 1.9.

It's possible I accidentally missed someone's tasks, if so, please yell now.

I'd prefer that any development outside the scope of what is enumerated above and not started yet not be submitted for now, because discussing designs and reviewing code takes a huge amount of time and concentration that inevitably postpones code progress on planned stuff. If some people want to start to work on significant changes, I'd prefer they do it in their own branches. We can possibly create a -next branch to merge such pending stuff once reviewed and accepted.

If some are interested in participating in the "not started but planned" part because they already have things available, or want to take care of stuff not listed there, please discuss this here so that we can all decide together. Just be patient about the response :-)

I think I'll soon issue a -dev1 after we merge most of the pending stuff so that at least those who contributed it can use it in a tagged version, and can ask for testers around them. If I forget, ping me!

For the next steps, given that we've spent a lot of time dealing with bugs and painful patches, I'll be a bit more demanding regarding changes. While the vast majority of regular contributors have made awesome progress in the quality of their patches, we're still spending some time reviewing huge patches that should be broken up into smaller pieces, or re-editing commit messages to avoid ruining the life of the -stable team. Sometimes patches submitted for review stay pending for a long time just because of irritating things that will require more processing time than needed (usually it's from first-time contributors). Thus please ensure you've *really* read CONTRIBUTING at least once in your life before submitting patches, even if you're used to submitting them. Feel free to disagree with it and to propose patches against it. At least for those who've read it, patches have always been much easier to handle and were merged much faster.

If you don't get a response for one week after submitting a patch, complain loudly! If you're waiting for a maintainer who doesn't respond, complain as well! We're all humans and we all miss stuff, and you cannot expect that someone will suddenly rediscover your submission in the middle of 20000 other messages 1 month later. For maintainers, if you cannot respond in time because you're busy, please at least respond "ping me again in 5 days" or so.

Last point, as some of you might have noticed, we finally recovered the haproxy GitHub account (thanks to Jeff for this as well as Dan, Lukas and Joe for helping coordinate the operations). A new haproxy repository was placed there, only with mainline for now. By the way, if you have a fork from more than one month ago, chances are that you accidentally forked the wrong one ;-)

We do have a few plans with this account now, some of which seem appealing, and others which don't sound really compatible with some annoying GitHub limitations, making me think that maybe in the end we should switch to GitLab (William already created haproxy.org there) :

  - re-enable issues : we still don't have a public bug tracker/todo list and it's a pain for everyone. It would be nice to show what's pending or being worked on, and to sometimes add extra info regarding a given task. The problem we've faced in the past with GitHub's issue tracker is that it's impossible to restrict the creation of new issues to participants. Given that haproxy is a complex component, the boundary between human error, misunderstanding and bug is extremely thin. It resulted in the issue tracker being filled with bogus bug reports, 100% of which were in fact requests for help. It makes the tool totally useless for development and bug fixing as it requires more time to maintain in a clean state than it takes to put issues in a mailbox. Some people suggested using issue templates, but I hardly see how this will help: at best it will annoy regular participants, at worst it will not stop polluters from filling crap there. What I'd ideally like would be that just about any participant here on the list (i.e. those who help others) can create an issue, and that anyone could see open issues and add info to existing issues. The closest we've found to this is GitLab, but when issue creation is limited to contributors, only they can see the issues.
    I tend to consider that once many participants here are contributors, the tracker becomes almost public, thus this limitation is not a big deal, but it's still annoying. A limitation that isn't addressed by any of them is that an issue has a single status and not one per maintenance branch. Some will say that labels replace this but I'd say that this isn't true, they just vaguely emulate it. Anyway, if we don't have anything better we can go with this. I often dream of the day someone pissed off by using prehistoric issue trackers writes a modern one, just like when Linus created Git...
  - wiki : we all know that the architecture guide is obsolete, everyone wants to refresh it and nobody can because it's a tedious task that no single person can address, and nobody knows all haproxy pieces anymore. In addition a lot of participants have useful tips to share and would be much more at ease with putting this into a wiki than with editing architecture.txt. Thus I'd like to reuse either GitHub's or GitLab's wiki to place real contents edited and maintained by the community, and to kill this obsolete file. This way the obsolete stuff on the main haproxy.org page could go as well.
  - automated builds and CI : GitLab and GitHub have these possibilities (either natively or relying on third party products). It would help us detect the occasional stupid build breakage (like building with/without threads, with/without SSL).
  - we could imagine reusing self-hosted pages to put Cyril's generated docs, or to link there if it's easier. Similar for Thierry's Lua docs, it will first depend on what they prefer (I don't want to steal your work guys, don't worry, just suggesting that it could be done if you find it desirable).
  - maybe placing the releases there will help (I don't know the upload process for any of these, I'll have to check more closely). It may even be used for Aleks' docker images if that makes sense at all (I don't know).

Feel free to discuss these points here on the list as usual.
Please don't PM me, I don't have time to rehash the same points with multiple people. If I missed anything important that you care about, say it now.

thanks,
Willy