On Wed, Mar 31, 2021 at 02:29:40PM +0200, Vincent Bernat wrote:
> 31 March 2021 12:46 +02, Willy Tarreau:
>
> > On the kernel Greg solved all this by issuing all versions very
> > frequently: as long as you produce updates faster than users are
> > willing to deploy them, they can choose what to do. It just requires
> > a bandwidth that we don't have :-/ Some weeks several of us work full
> > time on backports and tests! Right now we've reached a point where
> > backports can prevent us from working on mainline, and where this lack
> > of time increases the risk of regressions, and the regressions require
> > more backport time.
>
> Wouldn't this mean there are too many versions in parallel?

It cannot be summed up this easily. Normally, old versions are not
released often so they don't cost much. But not releasing them often
complicates the backports and their testing so it's still better to try
to feed them along with the other ones. However, releasing them in
parallel to the other ones makes them more susceptible to stupid issues
like the last build failure with libmusl. But not releasing them
wouldn't change much given that build failures in certain environments
are only detected once the release sends the signal that it's time to
update :-/

With this said, while the adoption of non-LTS versions has added one to
two versions to the series, it has significantly reduced the pain of
certain backports precisely because it resulted in splitting the
population of users. So at the cost of ~1 more version in the pipe, we
get more detailed reports from users who are more accustomed to enabling
core dumps, firing gdb, applying patches etc, which reduces the time
spent on bugs and increases the confidence in fixes that get backported.
So I'd say that it remains a very good investment.

However I wanted to make sure we shorten the non-LTS versions' life to
limit the in-field fragmentation. And this works extremely well (I'm
very grateful to our users for this, and I suspect that the status
banner in the executable reminding about EOL helps). We have probably
not seen a single 2.1 report in the issues over the last 3-4 months. And
I expect that 6 months after 2.4 is released, we won't read about 2.3
anymore.

Also if you dig into the issue tracker, you'll see a noticeable number
of users who agree to run some tests on 2.3 to verify if it fixes an
issue they face in 2.2. We're usually not asking for an upgrade, just a
test on a very close version. This flexibility is very important as
well. So the number of parallel versions is one aspect of the problem
but it's also an important part of the solution.

I hope we can continue to maintain short lives for non-LTS but at the
same time it must remain a win-win: if we get useful reports on one
version that are valid for other ones as well, I'm fine with extending
it a little bit as we did for 1.9; there's no reason for those making
the most effort to be the first ones punished.

Overall the real issue remains the number of bugs we introduce in the
code, and that is unavoidable when working on lower layers where good
test coverage is extremely difficult to achieve. Making smaller and more
detailed patches is mandatory. Continuing to add reg-tests definitely
helps a lot. We've added more than one reg-test per week since 2.3,
that's definitely not bad at all, but this effort must continue! The CI
reports few false positives now and the situation has tremendously
improved over the last 2 years. So with better code we can hope for
fewer bugs, fewer fixes, fewer backports and hence fewer risks of
regressions.

> > I think that the real problem arrives when a version becomes generally
> > available in distros. And distro users are often the ones with the least
> > autonomy when it comes to rolling back. When you build from sources,
> > you're more at ease. Thus probably that a nice solution would be to
> > add an idle period between a stable release and its appearance in
> > distros so that it really gets some initial deployment before becoming
> > generally available. And I know that some users complain when they do
> > not immediately see their binary package, but that's something we can
> > easily explain and document. We could even indicate a level of confidence
> > in the announce messages.
> > It has the merit of respecting the principle
> > of least surprise for everyone in the chain, including those like you
> > and me involved in the release cycle and who did not necessarily plan
> > to stop all activities to work on yet-another-release because the
> > long-awaited fix-of-the-month broke something and its own fix broke
> > something else.
>
> We can do that. In the future, I may even tackle all the problems at
> once: providing easy access to old versions and have two versions of
> each repository: one with new versions immediately available and one
> with a semi-fixed delay.

Ah I really like this! Your packages definitely are the most exposed
ones so this could very efficiently reduce the exposure in the early
days and still provide a downgrade path for those who would be the
unlucky ones to first detect a regression. It could also represent an
incentive for users to follow updates more closely, knowing that if
2.2.14 breaks they can roll back to 2.2.13 so that it's better for them
not to leave too large steps between updates.

Thanks!
Willy