On Fri, Nov 07, 2014 at 09:45:00AM -0600, Josh Poimboeuf wrote:

> > LEAVE_FUNCTION
> > LEAVE_PATCHED_SET
> > LEAVE_KERNEL
> >
> > SWITCH_FUNCTION
> > SWITCH_THREAD
> > SWITCH_KERNEL
> >
> > Now with those definitions:
> >
> >   livepatch (null model), as is, is LEAVE_FUNCTION and SWITCH_FUNCTION
> >
> >   kpatch, masami-refcounting and Ksplice are LEAVE_PATCHED_SET and
> >   SWITCH_KERNEL
> >
> >   kGraft is LEAVE_KERNEL and SWITCH_THREAD
> >
> >   CRIU/kexec is LEAVE_KERNEL and SWITCH_KERNEL
>
> Thanks, nice analysis!
>
> > By blending kGraft and masami-refcounting, we could create a consistency
> > engine capable of almost any combination of these properties and thus
> > all the consistency models.
>
> Can you elaborate on what this would look like?

There would be the refcounting engine, counting entries/exits of the
area of interest (nothing for LEAVE_FUNCTION, patched functions for
LEAVE_PATCHED_SET - same as Masami's work now, or syscall entry/exit
for LEAVE_KERNEL). It'd do the counting either per thread, flagging a
thread as 'new universe' when its count goes to zero, or globally,
flipping a 'new universe' switch for the whole kernel when the count
goes down to zero.

A patch would have flags which specify the combination of the above
properties needed for successful application of that specific patch.

> The big problem with SWITCH_THREAD is that it adds the possibility that
> old functions can run simultaneously with new ones.  When you change
> data or data semantics, which is roughly 10% of security patches, it
> creates some serious headaches:
>
> - It makes patch safety analysis much harder by doubling the number of
>   permutations of scenarios you have to consider.  In addition to
>   considering newfunc/olddata and newfunc/newdata, you also have to
>   consider oldfunc/olddata and oldfunc/newdata.
>
> - It requires two patches instead of one.  The first patch is needed to
>   modify the old functions to be able to deal with new data.  After the
>   first patch has been fully applied, then you apply the second patch
>   which can start creating new versions of data.

For data layout and semantic changes, there are two approaches:

1) TRANSFORM_WORLD

Stop the world, transform everything, resume. This is what Ksplice does
and what could work for kpatch. It would be rather interesting (but
possible) for masami-refcounting and doesn't work at all for the
per-thread kGraft.

It allows deallocating structures, allocating new ones, basically
rebuilding the data structures of the kernel. No shadowing or use of
padding is needed.

The nice part is that the patch can stay pretty much the original patch
that fixes the bug when applied to normal kernel sources.
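
To make the "stop the world, transform everything, resume" step a bit
more concrete, here is a minimal userspace sketch in C. Everything in it
is made up for illustration: the item_v1 and item_v2 layouts, the
item_v1_push() helper and transform_world() are not taken from any of
the patching projects. It assumes the list is reachable from a single
root and that nothing else runs during the transformation; how the world
gets stopped is not shown.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical old layout, as seen by the unpatched functions. */
struct item_v1 {
	int id;
	int len;
	struct item_v1 *next;
};

/* Hypothetical new layout from the patch: wider len, added flags. */
struct item_v2 {
	int id;
	long len;
	unsigned int flags;
	struct item_v2 *next;
};

/* Prepend an old-layout item to a list and return the new head. */
static struct item_v1 *item_v1_push(struct item_v1 *head, int id, int len)
{
	struct item_v1 *it = malloc(sizeof(*it));

	if (!it)
		abort();	/* sketch only: no error recovery */
	it->id = id;
	it->len = len;
	it->next = head;
	return it;
}

/*
 * Rebuild the whole list in the new layout, preserving order, and free
 * every old node. Assumed to run only while the world is stopped, so
 * no other code can hold a pointer into the old list.
 */
static struct item_v2 *transform_world(struct item_v1 *old_head)
{
	struct item_v2 *new_head = NULL;
	struct item_v2 **tail = &new_head;

	while (old_head) {
		struct item_v1 *old = old_head;
		struct item_v2 *conv = malloc(sizeof(*conv));

		if (!conv)
			abort();	/* a real transform would have to unwind */
		conv->id = old->id;
		conv->len = old->len;
		conv->flags = 0;	/* new field starts at its default */
		conv->next = NULL;

		*tail = conv;
		tail = &conv->next;

		old_head = old->next;
		free(old);
	}
	return new_head;
}
```

A reverse-transforming function would be the same walk in the other
direction, dropping the flags field again.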

The trickiest part of this approach is writing the additional
transformation code, in particular finding all instances of a changed
data structure. That fails if only semantics are changed, but that is
easily fixed by making sure there is always a layout change for any
semantic change.

All instances of a specific data structure can be found, worst case
with some compiler help: No function can have pointers to or instances
of the structure on the stack or in registers, as that would include it
in the patched set. So all instances have to be either global, or
referenced by a globally-rooted tree, linked list or any other
structure.

This approach can also be reverted, if a reverse-transforming function
is provided.

masami-refcounting can be made to work with this by spinning in every
function entry ftrace/kprobe callback after a universe flip and calling
stop_kernel from the function exit callback that flipped the switch.

2) TRANSFORM_ON_ACCESS

This requires structure versioning and/or shadowing. All 'new'
functions are written with this in mind: they can handle both the old
and new data formats and can transform the data to the new format.
When the universe transition is completed for the whole system, a
single flag is flipped for the functions to start transforming.

The advantage is not having to look up every single instance of the
structure and not having to make sure you found them all.

The disadvantages are that the patch now looks very different from what
goes into the kernel sources, that you never know whether the
conversion is complete, and that reverting the patch is tough, although
that can be helped by keeping track of transformed functions at the
cost of maintaining another data structure for that.

It works with any of the approaches (except the null model) and while
it needs two steps (patch, then enable conversion), it doesn't require
two rounds of patching. Also, you don't have to consider
oldfunc/newdata as that will never happen.

oldfunc/olddata obviously works, so you only have to look at
newfunc/olddata and newfunc/newdata as the transformation goes on.

I don't see either of these as really that much simpler. But I do see
value in offering both.

> On the other hand, SWITCH_KERNEL doesn't have those problems.  It does
> have the problem you mentioned, roughly 2% of the time, where it can't
> patch functions which are always in use.  But in that case we can skip
> the backtrace check ~90% of the time.

An interesting bit is that when you skip the backtrace check you're
actually reverting to LEAVE_FUNCTION SWITCH_FUNCTION, forfeiting all
consistency, and not to LEAVE_FUNCTION SWITCH_KERNEL as one would
expect.

Hence for those 2% of cases (going with your number, because it's a
guess anyway) LEAVE_PATCHED_SET SWITCH_THREAD would in fact be a safer
option.

> So it's really maybe something
> like 0.2% of patches which can't be patched with SWITCH_KERNEL.  But
> even then I think we could overcome that by getting creative, e.g. using
> the multiple patch approach.
>
> So my perspective is that SWITCH_THREAD causes big headaches 10% of the
> time, whereas SWITCH_KERNEL causes small headaches 1.8% of the time, and
> big headaches 0.2% of the time :-)

My preferred way would be to go with SWITCH_THREAD for the simpler
stuff and do a SWITCH_KERNEL for the 10% of complex patches. This is
because (LEAVE_PATCHED_SET) SWITCH_THREAD finishes much quicker. But
I'm biased there. ;)

It seems more and more to me that we will actually want the more
powerful engine coping with the various options.

-- 
Vojtech Pavlik
Director SUSE Labs