On Tue, Feb 24, 2015 at 11:23:29AM +0100, Ingo Molnar wrote:

> > Your upgrade proposal is an *enormous* disruption to the
> > system:
> >
> > - a latency of "well below 10" seconds is completely
> >   unacceptable to most users who want to patch the kernel
> >   of a production system _while_ it's in production.
>
> I think this statement is false for the following reasons.

The statement is very true.

> - I'd say the majority of system operators of production
>   systems can live with a couple of seconds of delay at a
>   well defined moment of the day or week - with gradual,
>   pretty much open ended improvements in that latency
>   down the line.

In the most usual corporate setting any noticeable outage, even out of
business hours, requires advance notice and an agreement of all
stakeholders - teams that depend on the system.

If a live patching technology introduces an outage, it's not "live" and
because of these bureaucratic reasons, it will not be used and a regular
reboot will be scheduled instead.

> - I think your argument ignores the fact that live
>   upgrades would extend the scope of 'users willing to
>   patch the kernel of a production system' _enormously_.
>
>   For example, I have a production system with this much
>   uptime:
>
>     10:50:09 up 153 days, 3:58, 34 users, load average: 0.00, 0.02, 0.05
>
>   While currently I'm reluctant to reboot the system to
>   upgrade the kernel (due to a reboot's intrusiveness),
>   and that is why it has achieved a relatively high
>   uptime, but I'd definitely allow the kernel to upgrade
>   at 0:00am just fine. (I'd even give it up to a few
>   minutes, as long as TCP connections don't time out.)
>
>   And I don't think my usecase is special.

I agree that this is useful. But it is a different problem that only
partially overlaps with what we're trying to achieve with live patching.

If you can make full kernel upgrades work this way, which I doubt is
achievable in the next 10 years due to all the research and
infrastructure needed, then you certainly gain an additional group of
users. And a great tool. A large portion of those that ask for live
patching won't use it, though.

But honestly, I prefer a solution that works for small patches now over
a solution for unlimited patches sometime in the next decade.

> What gradual improvements in live upgrade latency am I
> talking about?
>
> - For example the majority of pure user-space process
>   pages in RAM could be saved from the old kernel over
>   into the new kernel - i.e. they'd stay in place in RAM,
>   but they'd be re-hashed for the new data structures.
>   This avoids a big chunk of checkpointing overhead.

I'd have hoped this would be a given. If you can't preserve memory
contents and have to re-load from disk, you can just as well reboot
entirely; the time needed will not be much more.

> - Likewise, most of the page cache could be saved from an
>   old kernel to a new kernel as well - further reducing
>   checkpointing overhead.
>
> - The PROT_NONE mechanism of the current NUMA balancing
>   code could be used to transparently mark user-space
>   pages as 'checkpointed'. This would reduce system
>   interruption as only 'newly modified' pages would have
>   to be checkpointed when the upgrade happens.
>
> - Hardware devices could be marked as 'already in well
>   defined state', skipping the more expensive steps of
>   driver initialization.
>
> - Possibly full user-space page tables could be preserved
>   over an upgrade: this way user-space execution would be
>   unaffected even in the micro level: cache layout, TLB
>   patterns, etc.
>
> There's lots of gradual speedups possible with such a model
> IMO.

Yes, as I say above, guaranteeing decades of employment. ;)

> With live kernel patching we run into a brick wall of
> complexity straight away: we have to analyze the nature of
> the kernel modification, in the context of live patching,
> and that only works for the simplest of kernel
> modifications.

But you're able to _use_ it.

> With live kernel upgrades no such brick wall exists, just
> about any transition between kernel versions is possible.

The brick wall you run into is "I need to implement full kernel state
serialization before I can do anything at all." That's something that
isn't even clear _how_ to do.
Particularly with the Linux kernel's development model, where internal
ABI and structures are always in flux, it may not even be realistic.

> Granted, with live kernel upgrades it's much more complex
> to get the 'simple' case into an even rudimentarily working
> fashion (full userspace state has to be enumerated, saved
> and restored), but once we are there, it's a whole new
> category of goodness and it probably covers 90%+ of the
> live kernel patching usecases on day 1 already ...

Feel free to start working on it. I'll stick with live patching.

--
Vojtech Pavlik
Director SUSE Labs