> On Oct 18, 2018, at 6:08 PM, Nadav Amit <na...@vmware.com> wrote: > > at 10:00 AM, Andy Lutomirski <l...@amacapital.net> wrote: > >> >> >>> On Oct 18, 2018, at 9:47 AM, Nadav Amit <na...@vmware.com> wrote: >>> >>> at 8:51 PM, Andy Lutomirski <l...@amacapital.net> wrote: >>> >>>>> On Wed, Oct 17, 2018 at 8:12 PM Nadav Amit <na...@vmware.com> wrote: >>>>> at 6:22 PM, Andy Lutomirski <l...@amacapital.net> wrote: >>>>> >>>>>>> On Oct 17, 2018, at 5:54 PM, Nadav Amit <na...@vmware.com> wrote: >>>>>>> >>>>>>> It is sometimes beneficial to prevent preemption for very few >>>>>>> instructions, or prevent preemption for some instructions that precede >>>>>>> a branch (this latter case will be introduced in the next patches). >>>>>>> >>>>>>> To provide such functionality on x86-64, we use an empty REX-prefix >>>>>>> (opcode 0x40) as an indication that preemption is disabled for the >>>>>>> following instruction. >>>>>> >>>>>> Nifty! >>>>>> >>>>>> That being said, I think you have a few bugs. First, you can’t just >>>>>> ignore >>>>>> a rescheduling interrupt, as you introduce unbounded latency when this >>>>>> happens — you’re effectively emulating preempt_enable_no_resched(), which >>>>>> is not a drop-in replacement for preempt_enable(). To fix this, you may >>>>>> need to jump to a slow-path trampoline that calls schedule() at the end >>>>>> or >>>>>> consider rewinding one instruction instead. Or use TF, which is only a >>>>>> little bit terrifying… >>>>> >>>>> Yes, I didn’t pay enough attention here. For my use-case, I think that the >>>>> easiest solution would be to make synchronize_sched() ignore preemptions >>>>> that happen while the prefix is detected. It would slightly change the >>>>> meaning of the prefix. >>> >>> So thinking about it further, rewinding the instruction seems the easiest >>> and most robust solution. I’ll do it. >>> >>>>>> You also aren’t accounting for the case where you get an exception that >>>>>> is, in turn, preempted. >>>>> >>>>> Hmm.. Can you give me an example for such an exception in my use-case? I >>>>> cannot think of an exception that might be preempted (assuming #BP, #MC >>>>> cannot be preempted). >>>> >>>> Look for cond_local_irq_enable(). >>> >>> I looked at it. Yet, I still don’t see how exceptions might happen in my >>> use-case, but having said that - this can be fixed too. >> >> I’m not totally certain there’s a case that matters. But it’s worth checking > > I am still checking. But, I wanted to ask you whether the existing code is > correct, since it seems to me that others do the same mistake I did, unless > I don’t understand the code. > > Consider for example do_int3(), and see my inlined comments: > > dotraplinkage void notrace do_int3(struct pt_regs *regs, long error_code) > { > ... > ist_enter(regs); // => preempt_disable() > cond_local_irq_enable(regs); // => assume it enables IRQs > > ... > // resched irq can be delivered here. It will not caused rescheduling > // since preemption is disabled > > cond_local_irq_disable(regs); // => assume it disables IRQs > ist_exit(regs); // => preempt_enable_no_resched() > } > > At this point resched will not happen for unbounded length of time (unless > there is another point when exiting the trap handler that checks if > preemption should take place).
I think it's only a bug in the cases where someone uses extable to fix up an int3 (which would be nuts) or that we oops. But I should still fix it. In the normal case where int3 was in user code, we'll miss the reschedule in do_trap(), but we'll reschedule in prepare_exit_to_usermode() -> exit_to_usermode_loop(). > > Another example is __BPF_PROG_RUN_ARRAY(), which also uses > preempt_enable_no_resched(). Alexei, I think this code is just wrong. Do you know why it uses preempt_enable_no_resched()? Oleg, the code in kernel/signal.c: preempt_disable(); read_unlock(&tasklist_lock); preempt_enable_no_resched(); freezable_schedule(); looks bogus. I don't get what it's trying to achieve with preempt_disable(), and I also don't see why no_resched does anything. Sure, it prevents a reschedule triggered during read_unlock() from causing a reschedule, but it doesn't prevent an interrupt immediately after the preempt_enable_no_resched() call from scheduling. --Andy > > Am I missing something? > > Thanks, > Nadav