On Fri, 19 Oct 2018 04:44:33 +0000 Nadav Amit <na...@vmware.com> wrote:
> at 9:29 PM, Andy Lutomirski <l...@kernel.org> wrote: > > >> On Oct 18, 2018, at 6:08 PM, Nadav Amit <na...@vmware.com> wrote: > >> > >> at 10:00 AM, Andy Lutomirski <l...@amacapital.net> wrote: > >> > >>>> On Oct 18, 2018, at 9:47 AM, Nadav Amit <na...@vmware.com> wrote: > >>>> > >>>> at 8:51 PM, Andy Lutomirski <l...@amacapital.net> wrote: > >>>> > >>>>>> On Wed, Oct 17, 2018 at 8:12 PM Nadav Amit <na...@vmware.com> wrote: > >>>>>> at 6:22 PM, Andy Lutomirski <l...@amacapital.net> wrote: > >>>>>> > >>>>>>>> On Oct 17, 2018, at 5:54 PM, Nadav Amit <na...@vmware.com> wrote: > >>>>>>>> > >>>>>>>> It is sometimes beneficial to prevent preemption for very few > >>>>>>>> instructions, or prevent preemption for some instructions that > >>>>>>>> precede > >>>>>>>> a branch (this latter case will be introduced in the next patches). > >>>>>>>> > >>>>>>>> To provide such functionality on x86-64, we use an empty REX-prefix > >>>>>>>> (opcode 0x40) as an indication that preemption is disabled for the > >>>>>>>> following instruction. > >>>>>>> > >>>>>>> Nifty! > >>>>>>> > >>>>>>> That being said, I think you have a few bugs. First, you can’t just > >>>>>>> ignore > >>>>>>> a rescheduling interrupt, as you introduce unbounded latency when this > >>>>>>> happens ― you’re effectively emulating preempt_enable_no_resched(), > >>>>>>> which > >>>>>>> is not a drop-in replacement for preempt_enable(). To fix this, you > >>>>>>> may > >>>>>>> need to jump to a slow-path trampoline that calls schedule() at the > >>>>>>> end or > >>>>>>> consider rewinding one instruction instead. Or use TF, which is only a > >>>>>>> little bit terrifying… > >>>>>> > >>>>>> Yes, I didn’t pay enough attention here. For my use-case, I think that > >>>>>> the > >>>>>> easiest solution would be to make synchronize_sched() ignore > >>>>>> preemptions > >>>>>> that happen while the prefix is detected. It would slightly change the > >>>>>> meaning of the prefix. > >>>> > >>>> So thinking about it further, rewinding the instruction seems the easiest > >>>> and most robust solution. I’ll do it. > >>>> > >>>>>>> You also aren’t accounting for the case where you get an exception > >>>>>>> that > >>>>>>> is, in turn, preempted. > >>>>>> > >>>>>> Hmm.. Can you give me an example for such an exception in my use-case? > >>>>>> I > >>>>>> cannot think of an exception that might be preempted (assuming #BP, #MC > >>>>>> cannot be preempted). > >>>>> > >>>>> Look for cond_local_irq_enable(). > >>>> > >>>> I looked at it. Yet, I still don’t see how exceptions might happen in my > >>>> use-case, but having said that - this can be fixed too. > >>> > >>> I’m not totally certain there’s a case that matters. But it’s worth > >>> checking > >> > >> I am still checking. But, I wanted to ask you whether the existing code is > >> correct, since it seems to me that others do the same mistake I did, unless > >> I don’t understand the code. > >> > >> Consider for example do_int3(), and see my inlined comments: > >> > >> dotraplinkage void notrace do_int3(struct pt_regs *regs, long error_code) > >> { > >> ... > >> ist_enter(regs); // => preempt_disable() > >> cond_local_irq_enable(regs); // => assume it enables IRQs > >> > >> ... > >> // resched irq can be delivered here. It will not caused rescheduling > >> // since preemption is disabled > >> > >> cond_local_irq_disable(regs); // => assume it disables IRQs > >> ist_exit(regs); // => preempt_enable_no_resched() > >> } > >> > >> At this point resched will not happen for unbounded length of time (unless > >> there is another point when exiting the trap handler that checks if > >> preemption should take place). > > > > I think it's only a bug in the cases where someone uses extable to fix > > up an int3 (which would be nuts) or that we oops. But I should still > > fix it. In the normal case where int3 was in user code, we'll miss > > the reschedule in do_trap(), but we'll reschedule in > > prepare_exit_to_usermode() -> exit_to_usermode_loop(). > > Thanks for your quick response, and sorry for bothering instead of dealing > with it. Note that do_debug() does something similar to do_int3(). > > And then there is optimized_callback() that also uses > preempt_enable_no_resched(). I think the original use was correct, but then > a19b2e3d7839 ("kprobes/x86: Remove IRQ disabling from ftrace-based/optimized > kprobes”) removed the IRQ disabling, while leaving > preempt_enable_no_resched() . No? Ah, good catch! Indeed, we don't need to stick on no_resched anymore. Thanks! -- Masami Hiramatsu <mhira...@kernel.org>