On Tue, Dec 30, 2014 at 3:29 PM, Andy Lutomirski <l...@amacapital.net> wrote:
> On Dec 30, 2014 11:03 AM, "Peter Zijlstra" <pet...@infradead.org> wrote:
>>
>> On Thu, Dec 25, 2014 at 07:48:28AM -0800, Andy Lutomirski wrote:
>> > On a quick look, there are plenty of other bugs in there besides just
>> > the stack pointer issue. The ABI check that uses TIF_IA32 in the perf
>> > core is completely wrong. TIF_IA32 may be equal to the actual
>> > userspace bitness by luck, but, if so, that's more or less just luck.
>> > And there's a user_mode test that should be user_mode_vm.
>> >
>> > Also, it's not just sp that's wrong. There are various places that
>> > you can interrupt in which many of the registers have confusing
>> > locations. You could try using the cfi unwind data, but that's
>> > unlikely to work for regs like cs and ss, and, during context switch,
>> > this has very little chance of working.
>> >
>> > What's the point of this feature? Honestly, my suggestion would be to
>> > delete it instead of trying to fix it. It's also not clear to me that
>> > there aren't serious security problems here -- it's entirely possible
>> > for sensitive *kernel* values to end up in task_pt_regs at certain
>> > times, and if you run during context switch and there's no code to
>> > suppress this dump during context switch, then you could be showing
>> > regs that belong to the wrong task.
>>
>> Of course the people who actually wrote the code are not on CC :/
>>
>> There's two users of this iirc;
>>
>> 1) the dwarf stack unwinder thingy, which basically dumps the userspace
>> regs and the top of userspace stack on 'event'.
>>
>
> Given how the x86_64* entry code works, using task_pt_regs from
> anywhere except explicitly supported contexts (including exceptions
> that originated in userspace and a small handful of system calls) is
> asking for trouble. NMI context is especially bad.
>
> How important is this feature, and which registers matter?
> It might be possible to use a dwarf unwinder on the kernel call stack
> to get most of the regs from most contexts, and it might also be
> possible to make small changes to the entry code to make it possible
> to get some of the registers reliably, but it's not currently possible
> to safely use task_pt_regs *at all* from NMI context unless you've at
> least blacklisted a handful of origin RIP values that give dangerously
> bogus results. (Using do_nmi's regs parameter if user_mode_vm(regs) is
> a different story.)
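
To make the do_nmi case quoted above concrete, here is a rough userspace
model of "only trust the interrupt's own regs, and only if they say we
came from userspace". This is a sketch, not kernel code: struct
model_regs, model_user_mode_vm() and model_pick_user_regs() are made-up
names, the struct is a cut-down stand-in for pt_regs, and the check is
modelled on the 32-bit form of user_mode_vm (CS privilege bits plus
EFLAGS.VM) as I understand it.

```c
#include <assert.h>   /* only needed for the self-checks in the note below */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for the kernel's struct pt_regs. */
struct model_regs {
    uint64_t ip;
    uint64_t sp;
    uint64_t cs;
    uint64_t flags;
};

/* Assumed values, mirroring the x86 definitions as I understand them. */
#define MODEL_SEGMENT_RPL_MASK 0x3u        /* low two bits of CS: privilege level */
#define MODEL_USER_RPL         0x3u        /* ring 3 */
#define MODEL_X86_VM_MASK      0x00020000u /* EFLAGS.VM (virtual-8086 mode) */

/*
 * Model of a user_mode_vm-style test: true if the saved frame came from
 * ring 3 or from virtual-8086 mode, false if it came from the kernel.
 */
static bool model_user_mode_vm(const struct model_regs *regs)
{
    return ((regs->cs & MODEL_SEGMENT_RPL_MASK) |
            (regs->flags & MODEL_X86_VM_MASK)) >= MODEL_USER_RPL;
}

/*
 * Return the registers that are safe to report as "user regs" for this
 * sample, or NULL if the sample should carry no user regs at all.
 * Only the frame handed to the interrupt handler is consulted; nothing
 * like task_pt_regs is touched when the kernel was interrupted.
 */
static const struct model_regs *
model_pick_user_regs(const struct model_regs *intr_regs)
{
    if (intr_regs != NULL && model_user_mode_vm(intr_regs))
        return intr_regs;
    return NULL;
}
```

With this model, a frame with cs = 0x33 is returned as-is, a frame with
cs = 0x10 and EFLAGS.VM clear yields NULL, and a frame with EFLAGS.VM
set is treated as user mode regardless of cs. The point of returning
NULL rather than falling back to some other register set is that the
fallback is exactly the part that is unsafe.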

It's actually worse than just knowing the interrupted kernel RIP. If
the call chain goes usermode -> IST exception -> NMI, then
task_pt_regs is entirely uninitialized. Assuming all the CFI
annotations are correct, the unwinder could still do it from the
kernel.

Note that, as far as I know, Jan Beulich is the only person who uses
the unwinder on kernel code. Jan, how do you do this?

>
> * I'm not nearly as familiar with the 32-bit entry code, so I don't
> know whether we have the same issues there.
>
>> 2) the recent sample_regs_intr, which dumps the register set at
>> 'event', be it kernel or userspace.
>>
>
> What's wrong with the PMI's pt_regs for that? If we interrupted the
> kernel, they'll be kernel regs (with all their attendant security
> issues) and, if we interrupted userspace, then they'll be the full,
> correct userspace registers.
>
> --Andy
>
>>
>> The first is somewhat usable when lacking framepointers while still
>> desiring some unwind information, the second is useful to things like
>> call argument profiling and the like.

-- 
Andy Lutomirski
AMA Capital Management, LLC