* Andy Lutomirski <l...@kernel.org> wrote:

> On Tue, Nov 21, 2017 at 10:22 PM, Ingo Molnar <mi...@kernel.org> wrote:
> >
> > * Andy Lutomirski <l...@kernel.org> wrote:
> >
> >> This sets up stack switching, including for SYSCALL.  I think it's
> >> in decent shape.
> >>
> >> Known issues:
> >>  - I think we're going to want a way to turn the stack switching on and
> >>    off either at boot time or at runtime.  It should be fairly
> >>    straightforward to make it work.
> >>
> >>  - I think the ORC unwinder isn't so good at dealing with stack overflows.
> >>    It bails too early (I think), resulting in lots of ? entries.  This
> >>    isn't a regression with this series -- it's just something that could
> >>    be improved.
> >>
> >> Ingo, patch 1 may be tip/urgent material.  It fixes what I think is
> >> a bug in Xen.  I'm having a hard time testing because it's being
> >> masked by a bigger unrelated bug that's keeping Xen from booting
> >> when configured to hit the bug I'm fixing.  (The latter bug goes at
> >> least back to v4.13, I think I know roughly what's wrong, and I've
> >> reported it to the maintainers.)
> >
> > Hm, with this series the previous IRQ vector bug appears again:
> >
> >  [   51.156370] do_IRQ: 16.34 No irq handler for vector
> >  [   57.511030] do_IRQ: 16.34 No irq handler for vector
> >  [   57.528335] do_IRQ: 16.34 No irq handler for vector
> >  [   57.533256] do_IRQ: 16.34 No irq handler for vector
> >  [   63.991913] do_IRQ: 16.34 No irq handler for vector
> >  [   63.996810] do_IRQ: 16.34 No irq handler for vector
> >
> > I've attached the reproducer config. Note that the system appears to be
> > working to a certain extent (I could ssh to it and extract its config),
> > but produces these warnings sporadically.
>
> I'll try to reproduce this, but this is weird.  This is vector 34,
> which is, or could be, a genuine IRQ vector.
> The only way I can think
> of that my series would have caused this is if I very severely broke
> common_interrupt, but I don't see how that could have happened without
> breaking everything.  It's also weird that you're seeing this only on
> CPU 16.  Maybe it's worth adding a WARN_ON to that warning to get a
> stack trace just in case.
>
> Thomas, any insight here?
>
> > but don't get the IRQ vector warnings.
>
> Ingo, are you saying that you only get the IRQ vector warnings with
> the SYSCALL hwframe fix applied?  That's bizarre.

Correct. I assume it's because lockdep is working fine with that fix applied,
but that also means that different irq-tracing code paths are taken. The
lockdep error disables lockdep globally and immediately.

> Anyway, I booted your config (more or less -- I munged it through
> virtme-configkernel --update first) with 17 vCPUs and it seems fine.
> Is the issue reliable enough to bisect?

Ok, it should be bisectable, will try to bisect it. I think it's a key aspect
that the CPU is AMD - a similar config on Intel seems to be working fine
(modulo the unwinder warning).

Thanks,

	Ingo