> On Jan 11, 2019, at 1:41 PM, Josh Poimboeuf <jpoim...@redhat.com> wrote:
>
> On Fri, Jan 11, 2019 at 09:36:59PM +0000, Nadav Amit wrote:
>>> On Jan 11, 2019, at 1:22 PM, Josh Poimboeuf <jpoim...@redhat.com> wrote:
>>>
>>> On Fri, Jan 11, 2019 at 12:46:39PM -0800, Linus Torvalds wrote:
>>>> On Fri, Jan 11, 2019 at 12:31 PM Josh Poimboeuf <jpoim...@redhat.com>
>>>> wrote:
>>>>> I was referring to the fact that a single static call key update will
>>>>> usually result in patching multiple call sites. But you're right, it's
>>>>> only 1-2 trampolines per text_poke_bp() invocation. Though eventually
>>>>> we may want to batch all the writes like what Daniel has proposed for
>>>>> jump labels, to reduce IPIs.
>>>>
>>>> Yeah, my suggestion doesn't allow for batching, since it would
>>>> basically generate one trampoline for every rewritten instruction.
>>>
>>> As Andy said, I think batching would still be possible, it's just that
>>> we'd have to create multiple trampolines at a time.
>>>
>>> Or... we could do a hybrid approach: create a single custom trampoline
>>> which has the call destination patched in, but put the return address in
>>> %rax -- which is always clobbered, even for callee-saved PV ops. Like:
>>>
>>> trampoline:
>>>     push %rax
>>>     call patched-dest
>>>
>>> That way the batching could be done with a single trampoline
>>> (particularly if using rcu-sched to avoid the sti hack).
>>
>> I don’t see RCU-sched solves the problem if you don’t disable preemption. On
>> a fully preemptable kernel, you can get preempted between the push and the
>> call (jmp) or before the push. RCU-sched can then finish, and the preempted
>> task may later jump to a wrong patched-dest.
>
> Argh, I misspoke about RCU-sched. Words are hard.
>
> I meant synchronize_rcu_tasks(), which is a completely different animal.
> My understanding is that it waits until all runnable tasks (including
> preempted tasks) have gotten a chance to run.

Actually, I just used the term you used, and was thinking of synchronize_sched(). If you look at my patch [1], you’ll see I did something similar using synchronize_sched(). But this required some delicate work to restart any preempted “optpoline” (or whatever you want to call it) block. [Note that my implementation has a terrible bug in this respect.] This is required because letting a preempted task run does not prevent it from being preempted again before it makes any “real” progress.

If we want to adapt the same solution to static_calls, this means that in retint_kernel (entry_64.S), you need to check whether you were preempted inside the trampoline and, if so, change the saved RIP back to before the static_call.

IMHO, sti+jmp is simpler.

[1] https://lore.kernel.org/lkml/20181231072112.21051-6-na...@vmware.com/
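To make the retint_kernel idea concrete, here is a minimal user-space model in C. It is only a sketch under stated assumptions: all toy_* names are hypothetical, it is neither the code from [1] nor kernel code, and it assumes a trampoline laid out as a 1-byte push %rax followed by a 5-byte jmp rel32, where re-executing the static_call site pushes the return address again. On the preemption path, a saved IP inside the trampoline is rewound to the call site (discarding the pushed return address if the push already ran), so no preempted task is left parked inside the trampoline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: every name below is hypothetical, not a kernel API. */

/* Hypothetical saved state of an interrupted task (stand-in for pt_regs). */
struct toy_regs {
    unsigned long ip;
    unsigned long sp;
};

/* Assumed trampoline layout:
 *   start + 0: push %rax          (1 byte)
 *   start + 1: jmp  patched-dest  (5 bytes, rel32)
 */
#define TOY_PUSH_LEN 1UL
#define TOY_JMP_LEN  5UL

struct toy_trampoline {
    unsigned long start;
    unsigned long call_site; /* address of the static_call to restart */
};

static bool toy_in_trampoline(unsigned long ip, const struct toy_trampoline *t)
{
    return ip >= t->start && ip < t->start + TOY_PUSH_LEN + TOY_JMP_LEN;
}

/* Models the check on the preemption path: if the interrupted IP lies
 * inside the trampoline, rewind it to the call site so the task
 * re-executes the static_call when it resumes. Returns true if the
 * saved state was rewound. */
static bool toy_rewind_if_in_trampoline(struct toy_regs *regs,
                                        const struct toy_trampoline *t)
{
    assert(regs && t);
    if (!toy_in_trampoline(regs->ip, t))
        return false;

    /* The push already ran: drop the return address it stored, since
     * (by assumption) re-executing the call site pushes it again. */
    if (regs->ip >= t->start + TOY_PUSH_LEN)
        regs->sp += sizeof(unsigned long);

    regs->ip = t->call_site;
    return true;
}

/* The property that makes repatching safe in this model: no preempted
 * task has a saved IP inside the trampoline. */
static bool toy_trampoline_idle(const struct toy_regs *preempted, size_t n,
                                const struct toy_trampoline *t)
{
    assert((preempted || n == 0) && t);
    for (size_t i = 0; i < n; i++)
        if (toy_in_trampoline(preempted[i].ip, t))
            return false;
    return true;
}
```

Under this model's assumptions, a task running inside the trampoline when patching starts either leaves it or is preempted and rewound, so waiting for every CPU to schedule (synchronize_sched()-style) is enough. The price is extra logic on the interrupt-return path, which is the reason sti+jmp looks simpler.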