On Wed, Feb 3, 2021 at 10:10 AM Linus Torvalds
<torva...@linux-foundation.org> wrote:
>
> On Wed, Feb 3, 2021 at 10:00 AM Gabriel Krisman Bertazi
> <kris...@collabora.com> wrote:
> >
> > Does the patch below follows your suggestion?  I'm setting the
> > SYSCALL_WORK shadowing TIF_SINGLESTEP every time, instead of only when
> > the child is inside a system call.  Is this acceptable?
>
> Looks sane to me.
>
> My main worry would be about "what about the next system call"? It's
> not what Kyle's case cares about, but let me just give an example:
>
>  - task A traces task B, and starts single-stepping. Task B was *not*
>    in a system call at this point.
>
>  - task B happily executes one instruction at a time, takes a TF
>    fault, everything is good
>
>  - task B now does a system call. That will disable single-stepping
>    while in the kernel
>
>  - task B returns from the system call. TF will be set in eflags, but
>    the first instruction *after* the system call will execute unless we
>    go through the system call exit path
>
> So I think the tracer basically misses one instruction when
> single-stepping.

I was hoping you wouldn't ask this :)

The x86 architecture is fundamentally a bit busted here.  If we return
from a system call with SYSRET and TF is set in R11, then SYSRET
traps, which means that #DB is delivered before executing a user
instruction.  I have been asking Intel for quite a while to document
this, and they said they did, but I still can't find it.

IRET is the opposite: if we return from a system call with IRET and TF
is set on the stack, we execute one user instruction and then trap.

So if we want to reliably single-step a system call and trap after the
system call, we just need to synthesize a trap on the way out.  Doing
this and getting all the nasty corners right (e.g. sigreturn setting
TF, sigreturn *clearing* TF, signal delivery as part of the syscall,
ptrace mucking with TF, etc.) might be nontrivial.

I suspect the behavior back in the bad old asm-entry-path days was at
best inconsistent.

--Andy
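
To make the SYSRET/IRET asymmetry above concrete, here is a toy model in
plain C.  It is only a sketch of the behavior described in this message,
not kernel code: the enum, the function names and the "synthesize_trap"
flag are all made up for illustration, and the claim that a synthesized
trap stops the tracee before any user instruction runs is my reading of
"trap after the system call".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; nothing here exists in the kernel. */
enum ret_path { RET_SYSRET, RET_IRET };

/*
 * Number of user instructions that run before #DB is delivered when
 * the kernel returns to user mode, per the description above:
 *   SYSRET with TF in R11    -> 0 (SYSRET itself traps)
 *   IRET with TF on the stack -> 1 (one instruction, then trap)
 * Returns -1 when TF is clear, i.e. no single-step trap at all.
 */
static int user_insns_before_db(enum ret_path path, bool tf_set)
{
    if (!tf_set)
        return -1;
    return path == RET_SYSRET ? 0 : 1;
}

/*
 * Same question, but with an (assumed) synthesized trap on the syscall
 * exit path: the tracer then sees a stop before any user instruction,
 * regardless of which return instruction is used.
 */
static int user_insns_before_stop(enum ret_path path, bool tf_set,
                                  bool synthesize_trap)
{
    if (tf_set && synthesize_trap)
        return 0;
    return user_insns_before_db(path, tf_set);
}

/* Spell out the cases discussed in the thread. */
static void check_model(void)
{
    assert(user_insns_before_db(RET_SYSRET, true) == 0);
    assert(user_insns_before_db(RET_IRET, true) == 1);
    assert(user_insns_before_stop(RET_IRET, true, true) == 0);
}
```

In this model the "missed instruction" Linus describes is the IRET row
without a synthesized trap; synthesizing the trap makes both return
paths look the same to the tracer.  The nasty corners (sigreturn,
signal delivery, ptrace changing TF) are deliberately not modeled.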