At Thu, 19 Mar 2015 08:41:57 -0700,
Andy Lutomirski wrote:
>
> On Thu, Mar 19, 2015 at 8:22 AM, Takashi Iwai <ti...@suse.de> wrote:
> > At Thu, 19 Mar 2015 15:55:26 +0100,
> > Takashi Iwai wrote:
> >>
> >> At Thu, 19 Mar 2015 14:47:12 +0100,
> >> Takashi Iwai wrote:
> >> >
> >> > At Thu, 19 Mar 2015 13:48:56 +0100,
> >> > Denys Vlasenko wrote:
> >> > >
> >> > > Having no more ideas at the moment, here is a tarball of 13 patches
> >> > > of commits touching entry_64.S up to 4.0.0-rc1.
> >> > >
> >> > > x0001.patch is the latest, x0015.patch is the oldest.
> >> > >
> >> > > Patches 0003 and 0008 are not there since 0003 is an empty merge patch
> >> > > and 0008 does some PCI fixup.
> >> > >
> >> > > If this breakage is recent, it ought to be one of these.
> >> > > Most of them do some non-trivial surgery.
> >> > >
> >> > > Even though I did not spot anything suspicious in them,
> >> > > entry.S is notorious for subtle breakage.
> >> > >
> >> > > Try reverting them in sequence starting from x0001.patch
> >> > > and see which revert makes the crash disappear.
> >> >
> >> > OK, I'm going to check these git series.
> >>
> >> Reverting the commit
> >> 96b6352c12711d5c0bb7157f49c92580248e8146
> >> x86_64, entry: Remove the syscall exit audit and schedule optimizations
> >>
> >> seems enough. After reverting this one, the machine runs stably with
> >> the kvm stress test.
> >>
> >> (I'll keep the test running for a while; at the previous bisection, I hit
> >> the bug right after posting the mail ;)
> >
> > It survived long enough, so this looks like the spot.
> >
> > Also, I tested the patch below instead of reverting the commit, and
> > it seems to work, too.
> >
> >
> > Takashi
> >
> > diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> > index 1d74d161687c..5340ac7f88a9 100644
> > --- a/arch/x86/kernel/entry_64.S
> > +++ b/arch/x86/kernel/entry_64.S
> > @@ -364,12 +364,12 @@ system_call_fastpath:
> >  	 * Has incomplete stack frame and undefined top of stack.
> >  	 */
> >  ret_from_sys_call:
> > -	testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
> > -	jnz int_ret_from_sys_call_fixup	/* Go the the slow path */
> > -
> >  	LOCKDEP_SYS_EXIT
> >  	DISABLE_INTERRUPTS(CLBR_NONE)
> >  	TRACE_IRQS_OFF
> > +	testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
> > +	jnz int_ret_from_sys_call_fixup	/* Go the the slow path */
> > +
> >  	CFI_REMEMBER_STATE
> >  	/*
> >  	 * sysretq will re-enable interrupts:
>
> The crash you're seeing could certainly be caused by an IRQ at the
> wrong time. However:
>
> int_ret_from_sys_call_fixup:
> 	FIXUP_TOP_OF_STACK %r11, -ARGOFFSET
> 	jmp int_ret_from_sys_call
>
> and
>
> GLOBAL(int_ret_from_sys_call)
> 	DISABLE_INTERRUPTS(CLBR_NONE)
> 	TRACE_IRQS_OFF
>
> so with or without your little patch, we're turning off IRQs very
> quickly. retint_swapgs also turns off interrupts before doing
> anything. So I don't see how your patch would have any effect.
What about LOCKDEP_SYS_EXIT?


Takashi