Hi Andy,

On 18.03.2015 at 20:26, Andy Lutomirski wrote:
> Hi Linus-
>
> You seem to enjoy debugging these things.  Want to give this a shot?
> My guess is a vmalloc fault accessing either old_rsp or kernel_stack
> right after swapgs in syscall entry.
>
> On Wed, Mar 18, 2015 at 12:03 PM, Stefan Seyfried
> <stefan.seyfr...@googlemail.com> wrote:
>> Hi all,
>>
>> first, I'm kind of happy that I'm not the only one seeing this, and
>> thus my beloved Thinkpad can stay for a bit longer... :-)
>>
>> Then, I'm mostly an amateur when it comes to kernel debugging, so bear
>> with me when I'm stumbling through the code...
>>
>> On 18.03.2015 at 19:03, Andy Lutomirski wrote:
>>> On Wed, Mar 18, 2015 at 10:46 AM, Takashi Iwai <ti...@suse.de> wrote:
>>>> At Wed, 18 Mar 2015 18:43:52 +0100,
>>>> Takashi Iwai wrote:
>>>>>
>>>>> At Wed, 18 Mar 2015 15:16:42 +0100,
>>>>> Takashi Iwai wrote:
>>>>>>
>>>>>> At Sun, 15 Mar 2015 09:17:15 +0100,
>>>>>> Stefan Seyfried wrote:
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> in 4.0-rc I have recently seen a few crashes, always when running
>>>>>>> KVM guests (IIRC). Today I was able to capture a crash dump, this
>>>>>>> is the backtrace from dmesg.txt:
>>>>>>>
>>>>>>> [242060.604870] PANIC: double fault, error_code: 0x0
>>>
>>> OK, we double faulted.  Too bad that x86 CPUs don't tell us why.
>>>
>>>>>>> [242060.604878] CPU: 1 PID: 2132 Comm: qemu-system-x86 Tainted: G        W       4.0.0-rc3-2.gd5c547f-desktop #1
>>>>>>> [242060.604880] Hardware name: LENOVO 74665EG/74665EG, BIOS 6DET71WW (3.21 ) 12/13/2011
>>>>>>> [242060.604883] task: ffff880103f46150 ti: ffff8801013d4000 task.ti: ffff8801013d4000
>>>>>>> [242060.604885] RIP: 0010:[<ffffffff816834ad>]  [<ffffffff816834ad>] page_fault+0xd/0x30
>>>
>>> The double fault happened during page fault processing.  Could you
>>> disassemble your page_fault function to find the offending
>>> instruction?
>>
>> This one is easy:
>>
>> crash> disassemble page_fault
>> Dump of assembler code for function page_fault:
>>    0xffffffff816834a0 <+0>:   data32 xchg %ax,%ax
>>    0xffffffff816834a3 <+3>:   data32 xchg %ax,%ax
>>    0xffffffff816834a6 <+6>:   data32 xchg %ax,%ax
>>    0xffffffff816834a9 <+9>:   sub    $0x78,%rsp
>>    0xffffffff816834ad <+13>:  callq  0xffffffff81683620 <error_entry>
>
> The callq was the double-faulting instruction, and it is indeed the
> first instruction in here that would have accessed the stack.  (The sub
> *changes* rsp but isn't a memory access.)  So, since RSP is bogus, we
> page fault, and the page fault is promoted to a double fault.  The
> surprising thing is that the page fault itself seems to have been
> delivered okay, and RSP wasn't on a page boundary.
>
> You wouldn't happen to be using a Broadwell machine?
No, this is a quite old Thinkpad X200s, Core2duo:

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 6
model		: 23
model name	: Intel(R) Core(TM)2 Duo CPU L9400 @ 1.86GHz
stepping	: 10
microcode	: 0xa0c

> The only way to get here with bogus RSP is if we interrupted something
> that was previously running at CPL0 with similarly bogus RSP.
>
> I don't know if I trust CR2.  It's 16 bytes lower than I'd expect.
>
>> 0xffffffff816834b2 <+18>:  mov    %rsp,%rdi
>> 0xffffffff816834b5 <+21>:  mov    0x78(%rsp),%rsi
>> 0xffffffff816834ba <+26>:  movq   $0xffffffffffffffff,0x78(%rsp)
>> 0xffffffff816834c3 <+35>:  callq  0xffffffff810504e0 <do_page_fault>
>> 0xffffffff816834c8 <+40>:  jmpq   0xffffffff816836d0 <error_exit>
>> End of assembler dump.
>>
>>
>>>>>>> [242060.604893] RSP: 0018:00007fffa55eafb8  EFLAGS: 00010016
>>>
>>> Uh, what?  That RSP is a user address.
>>>
>>>>>>> [242060.604895] RAX: 000000000000aa40 RBX: 0000000000000001 RCX: ffffffff81682237
>>>>>>> [242060.604896] RDX: 000000000000aa40 RSI: 0000000000000000 RDI: 00007fffa55eb078
>>>>>>> [242060.604898] RBP: 00007fffa55f1c1c R08: 0000000000000008 R09: 0000000000000000
>>>>>>> [242060.604900] R10: 0000000000000000 R11: 0000000000000293 R12: 000000000000004a
>>>>>>> [242060.604902] R13: 00007ffa356b5d60 R14: 000000000000000f R15: 00007ffa3556cf20
>>>>>>> [242060.604904] FS:  00007ffa33dbfa80(0000) GS:ffff88023bc80000(0000) knlGS:0000000000000000
>>>>>>> [242060.604906] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>> [242060.604908] CR2: 00007fffa55eafa8 CR3: 0000000002d7e000 CR4: 00000000000427e0
>>>>>>> [242060.604909] Stack:
>>>>>>> [242060.604942] BUG: unable to handle kernel paging request at 00007fffa55eafb8
>>>>>>> [242060.604995] IP: [<ffffffff81005b44>] show_stack_log_lvl+0x124/0x190
>>>
>>> This is suspicious.  We need to have died, again, of a fatal page
>>> fault while dumping the stack.
>>
>> I posted the same problem to the opensuse kernel list shortly before turning
>> to LKML. There, Michal Kubecek noted:
>>
>> "I encountered a similar problem recently. The thing is, x86
>> specification says that on a double fault, RIP and RSP registers are
>> undefined, i.e. you not only can't expect them to contain values
>> corresponding to the first or second fault but you can't even expect
>> them to have any usable values at all. Unfortunately the kernel double
>> fault handler doesn't take this into account and does try to display
>> usual crash related information so that it itself does usually crash
>> when trying to show stack content (that's the show_stack_log_lvl()
>> crash).
>
> I think that's not entirely true.  RIP is reliable for many classes of
> double faults, and we rely on that for espfix64.  The fact that hpa
> was willing to write that code strongly suggests that Intel chips at
> least really do work that way.
>
>>
>> The result is a double fault (which itself would be very hard to debug)
>> followed by a crash in its handler so that analysing the outcome is
>> extremely difficult."
>>
>> I cannot judge if this is true, but it sounded related to solving the
>> problem to me.
>
> The crash in the handler is a separate bug.
>
>
>
>>
>>>>>>> [242060.605036] PGD 4779a067 PUD 40e3e067 PMD 4769e067 PTE 0
>>>>>>> [242060.605078] Oops: 0000 [#1] PREEMPT SMP
>>>>>>> [242060.605106] Modules linked in: vhost_net vhost macvtap macvlan nfsv3 nfs_acl
>>>>>>> rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc fscache
>>>>>>> nls_iso8859_1 nls_cp437 vfat fat ppp_deflate bsd_comp ppp_async crc_ccitt
>>>>>>> ppp_generic slhc ses enclosure uas usb_storage cmac algif_hash ctr ccm rfcomm
>>>>>>> fuse xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat
>>>>>>> nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack
>>>>>>> ipt_REJECT xt_tcpudp tun bridge stp llc ebtable_filter ebtables ip6table_filter
>>>>>>> ip6_tables iptable_filter ip_tables x_tables af_packet bnep dm_crypt ecb cbc
>>>>>>> algif_skcipher af_alg xfs libcrc32c snd_hda_codec_conexant snd_hda_codec_generic
>>>>>>> iTCO_wdt iTCO_vendor_support snd_hda_intel snd_hda_controller snd_hda_codec
>>>>>>> snd_hwdep snd_pcm_oss snd_pcm
>>>>>>> [242060.605396] dm_mod snd_seq snd_seq_device snd_timer coretemp kvm_intel kvm
>>>>>>> snd_mixer_oss cdc_ether cdc_wdm cdc_acm usbnet mii arc4 uvcvideo
>>>>>>> videobuf2_vmalloc videobuf2_memops thinkpad_acpi videobuf2_core btusb
>>>>>>> v4l2_common videodev i2c_i801 iwldvm bluetooth serio_raw mac80211 pcspkr e1000e
>>>>>>> iwlwifi snd lpc_ich mei_me ptp mfd_core pps_core mei cfg80211 shpchp wmi
>>>>>>> soundcore rfkill battery ac tpm_tis tpm acpi_cpufreq i915 xhci_pci xhci_hcd
>>>>>>> i2c_algo_bit drm_kms_helper drm thermal video button processor sg loop
>>>>>>> [242060.605396] CPU: 1 PID: 2132 Comm: qemu-system-x86 Tainted: G        W       4.0.0-rc3-2.gd5c547f-desktop #1
>>>>>>> [242060.605396] Hardware name: LENOVO 74665EG/74665EG, BIOS 6DET71WW (3.21 ) 12/13/2011
>>>>>>> [242060.605396] task: ffff880103f46150 ti: ffff8801013d4000 task.ti: ffff8801013d4000
>>>>>>> [242060.605396] RIP: 0010:[<ffffffff81005b44>]  [<ffffffff81005b44>] show_stack_log_lvl+0x124/0x190
>>>>>>> [242060.605396] RSP: 0018:ffff88023bc84e88  EFLAGS: 00010046
>>>>>>> [242060.605396] RAX: 00007fffa55eafc0 RBX: 00007fffa55eafb8 RCX: ffff88023bc7ffc0
>>>>>>> [242060.605396] RDX: 0000000000000000 RSI: ffff88023bc84f58 RDI: 0000000000000000
>>>>>>> [242060.605396] RBP: ffff88023bc83fc0 R08: ffffffff81a2fe15 R09: 0000000000000020
>>>>>>> [242060.605396] R10: 0000000000000afb R11: ffff88023bc84bee R12: ffff88023bc84f58
>>>>>>> [242060.605396] R13: 0000000000000000 R14: ffffffff81a2fe15 R15: 0000000000000000
>>>>>>> [242060.605396] FS:  00007ffa33dbfa80(0000) GS:ffff88023bc80000(0000) knlGS:0000000000000000
>>>>>>> [242060.605396] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>> [242060.605396] CR2: 00007fffa55eafb8 CR3: 0000000002d7e000 CR4: 00000000000427e0
>>>>>>> [242060.605396] Stack:
>>>>>>> [242060.605396]  0000000002d7e000 0000000000000008 ffff88023bc84ee8 00007fffa55eafb8
>>>>>>> [242060.605396]  0000000000000000 ffff88023bc84f58 00007fffa55eafb8 0000000000000040
>>>>>>> [242060.605396]  00007ffa356b5d60 000000000000000f 00007ffa3556cf20 ffffffff81005c36
>>>>>>> [242060.605396] Call Trace:
>>>>>>> [242060.605396]  [<ffffffff81005c36>] show_regs+0x86/0x210
>>>>>>> [242060.605396]  [<ffffffff8104636f>] df_debug+0x1f/0x30
>>>>>>> [242060.605396]  [<ffffffff810041a4>] do_double_fault+0x84/0x100
>>>>>>> [242060.605396]  [<ffffffff81683088>] double_fault+0x28/0x30
>>>>>>> [242060.605396]  [<ffffffff816834ad>] page_fault+0xd/0x30
>>>>>>> [242060.605396] Code: fe a2 81 31 c0 89 54 24 08 48 89 0c 24 48 8b 5b f8 e8 cc 06 67 00 48 8b 0c 24 8b 54 24 08 85 d2 74 05 f6 c2 03 74 48 48 8d 43 08 <48> 8b 33 48 c7 c7 0d fe a2 81 89 54 24 14 48 89 4c 24 08 48 89
>>>>>>> [242060.605396] RIP  [<ffffffff81005b44>] show_stack_log_lvl+0x124/0x190
>>>>>>> [242060.605396]  RSP <ffff88023bc84e88>
>>>>>>> [242060.605396] CR2: 00007fffa55eafb8
>>>>>>>
>>>>>>> I would not totally rule out a hardware problem, since this machine had
>>>>>>> another weird crash where it crashed and the bios beeper was constant
>>>>>>> on until I hit the power button for 5 seconds.
>>>>>>>
>>>>>>> Unfortunately, I cannot load the crashdump with the crash version in
>>>>>>> openSUSE Tumbleweed, so the backtrace is all I have for now.
>>>>>>
>>>>>> Just "me too", I'm getting the very same crash out of sudden with the
>>>>>> recent 4.0-rc.  Judging from the very same pattern (usually crash
>>>>>> happened while using KVM (-smp 4) and kernel builds with -j8), I don't
>>>>>> think it's a hardware problem.
>>>>>
>>>>> The git bisection pointed to the commit:
>>>>>     commit b926e6f61a26036ee9eabe6761483954d481ad25
>>>>>     x86, traps: Fix ist_enter from userspace
>>>>>
>>>>> And reverting this on top of the latest Linus tree seems working.
>>>>> Seife, could you verify on your machine, too?
>>>>
>>>> Argh, false positive.  Right after I wrote this mail, I got the very
>>>> same crash.  I seem to need running the test much longer than I
>>>> thought.
>>>>
>>>> But somehow the commits around the above smell suspicious...
>>>>
>>>
>>> Those commits shouldn't really have affected page fault or double
>>> fault behavior.  They made big changes to MCE, breakpoints, and debug
>>> exceptions.
>>>
>>> Something's very wrong here.  I'm guessing that we somehow ended up in
>>> page_fault in a completely invalid context.
>>>
>>> One hairy code path that could plausibly do this is:
>>>
>>> 1. syscall
>>>
>>> 2. vmalloc fault accessing old_rsp aka rsp_scratch (or kernel_stack --
>>> same issue)
>>>
>>> 3. page fault.  Now we're on the user stack and we do awful things.
>>> If we run off the end of the presently writable portion of the stack,
>>> we double fault.
>>
>> Maybe Michal's idea from above points in the right direction?
>>
>> Now since I have a crash dump and the corresponding debuginfo at hand,
>> might this help somehow to find out where the problem originated from?
>>
>> I mean -- it's only 508 processes to look at :-) but if I knew what to
>> look for in their backtraces, I would try to do it.
>
> The relevant thread's stack is here (see ti in the trace):
>
> ffff8801013d4000
>
> It could be interesting to see what's there.
>
> I don't suppose you want to try to walk the paging structures to see
> if ffff88023bc80000 (i.e. gsbase) and, more specifically,
> ffff88023bc80000 + old_rsp and ffff88023bc80000 + kernel_stack are
> present?  You'd only have to walk one level -- presumably, if the PGD
> entry is there, the rest of the entries are okay, too.

That's all greek to me :-)

I see that there is something at ffff88023bc80000:

crash> x /64xg 0xffff88023bc80000
0xffff88023bc80000:	0x0000000000000000	0x0000000000000000
0xffff88023bc80010:	0x0000000000000000	0x0000000000000000
0xffff88023bc80020:	0x0000000000000000	0x000000006686ada9
0xffff88023bc80030:	0x0000000000000000	0x0000000000000000
0xffff88023bc80040:	0x0000000000000000	0x0000000000000000
0xffff88023bc80050:	0x0000000000000000	0x0000000000000000
0xffff88023bc80060:	0x0000000000000000	0x0000000000000000
0xffff88023bc80070:	0x0000000000000000	0x0000000000000000
0xffff88023bc80080:	0x0000000000000000	0x0000000000000000
0xffff88023bc80090:	0x0000000000000000	0x0000000000000000
0xffff88023bc800a0:	0x0000000000000000	0x0000000000000000
0xffff88023bc800b0:	0x0000000000000000	0x0000000000000000
0xffff88023bc800c0:	0x0000000000000000	0x0000000000000000
0xffff88023bc800d0:	0x0000000000000000	0x0000000000000000
0xffff88023bc800e0:	0x0000000000000000	0x0000000000000000
0xffff88023bc800f0:	0x0000000000000000	0x0000000000000000
0xffff88023bc80100:	0x0000000000000000	0x0000000000000000
0xffff88023bc80110:	0x0000000000000000	0x0000000000000000
0xffff88023bc80120:	0x0000000000000000	0x0000000000000000
0xffff88023bc80130:	0x0000000000000000	0x0000000000000000
0xffff88023bc80140:	0x0000000000000000	0x0000000000000000
0xffff88023bc80150:	0x0000000000000000	0x0000000000000000
0xffff88023bc80160:	0x0000000000000000	0x0000000000000000
0xffff88023bc80170:	0x0000000000000000	0x0000000000000000
0xffff88023bc80180:	0x0000000000000000	0x0000000000000000
0xffff88023bc80190:	0x0000000000000000	0x0000000000000000
0xffff88023bc801a0:	0x0000000000000000	0x0000000000000000
0xffff88023bc801b0:	0x0000000000000000	0x0000000000000000
0xffff88023bc801c0:	0x0000000000000000	0x0000000000000000
0xffff88023bc801d0:	0x0000000000000000	0x0000000000000000
0xffff88023bc801e0:	0x0000000000000000	0x0000000000000000
0xffff88023bc801f0:	0x0000000000000000	0x0000000000000000

old_rsp and kernel_stack seem bogus:

crash> print old_rsp
Cannot access memory at address 0xa200
gdb: gdb request failed: print old_rsp
crash> print kernel_stack
Cannot access memory at address 0xaa48
gdb: gdb request failed: print kernel_stack

kernel_stack is not a pointer?
So 0xffff88023bc80000 + 0xaa48 it is:

crash> x /64xg 0xffff88023bc8aa00
0xffff88023bc8aa00:	0x0000000000000000	0x0000000000000000
0xffff88023bc8aa10:	0x0000000000000000	0xffff880103f46150
0xffff88023bc8aa20:	0xffffffff80010001	0xffff88023bc83fc0
0xffff88023bc8aa30:	0x0000000000000000	0x0000000000000000
0xffff88023bc8aa40:	0xffff880103f46150	0xffff8801013d7fd8
0xffff88023bc8aa50:	0x0000000000000000	0x0000000000000000
0xffff88023bc8aa60:	0x0000000000000000	0x0000000000000000
0xffff88023bc8aa70:	0x0000000000000000	0x0000000000000000
0xffff88023bc8aa80:	0x0000000000000000	0x0000000000000000
0xffff88023bc8aa90:	0x0000000000000000	0x0000000000000000
0xffff88023bc8aaa0:	0x0000000000000000	0x0000000000000000
0xffff88023bc8aab0:	0x0000000000000000	0x0000000000000000
0xffff88023bc8aac0:	0x0000000000000000	0x0000000000000000
0xffff88023bc8aad0:	0x0000000000000000	0x0000000000000000
0xffff88023bc8aae0:	0x0000000000000000	0x0000000000000000
0xffff88023bc8aaf0:	0x0000000000000000	0x0000000000000000
0xffff88023bc8ab00:	0x0000000000000000	0x0000000000000000
0xffff88023bc8ab10:	0x0000000000000000	0x0000000000000000
0xffff88023bc8ab20:	0x0000000000000000	0x0000000000000000
0xffff88023bc8ab30:	0x0000000000000000	0x0000000000000000
0xffff88023bc8ab40:	0x0000000000000000	0x0000000000000000
0xffff88023bc8ab50:	0x0000000000000000	0x0000000000000000
0xffff88023bc8ab60:	0x0000000000000000	0x000000007fffffff
0xffff88023bc8ab70:	0x0000000000000000	0x0000000000000000
0xffff88023bc8ab80:	0x0000000000000000	0x0000000000000000
0xffff88023bc8ab90:	0x0000000000000000	0x0000000000000000
0xffff88023bc8aba0:	0x0000000000000000	0x0000000000000000
0xffff88023bc8abb0:	0x0000000000000000	0x0000000000000000
0xffff88023bc8abc0:	0x0000000000000000	0x0000000000000000
0xffff88023bc8abd0:	0x0000000000000000	0x0000000000000000
0xffff88023bc8abe0:	0x0000000000000000	0x0000000000000000
0xffff88023bc8abf0:	0x0000000000000000	0x0000000000000000

and for old_rsp:

crash> x /64xg 0xffff88023bc8a200
0xffff88023bc8a200:	0x00007fffa55f1b98	0x0000000000000000
0xffff88023bc8a210:	0x0000000000000000	0x0000000000000000
0xffff88023bc8a220:	0x0000000000000000	0x0000000000000000
0xffff88023bc8a230:	0x0000000000003ce6	0x0000000000000000
0xffff88023bc8a240:	0xffffffffffffffff	0xffffffffffffffff
0xffff88023bc8a250:	0xffffffffffffffff	0xffffffffffffffff
0xffff88023bc8a260:	0xffffffffffffffff	0xffffffffffffffff
0xffff88023bc8a270:	0xffffffffffffffff	0xffffffffffffffff
0xffff88023bc8a280:	0xffffffffffffffff	0xffffffffffffffff
0xffff88023bc8a290:	0xffffffffffffffff	0xffffffffffffffff
0xffff88023bc8a2a0:	0xffffffffffffffff	0xffffffffffffffff
0xffff88023bc8a2b0:	0xffffffffffffffff	0xffffffffffffffff
0xffff88023bc8a2c0:	0xffffffffffffffff	0xffffffff0000001b
0xffff88023bc8a2d0:	0xffffffff00000023	0xffffffffffffffff
0xffff88023bc8a2e0:	0xffffffffffffffff	0xffffffffffffffff
0xffff88023bc8a2f0:	0xffffffffffffffff	0xffffffffffffffff
0xffff88023bc8a300:	0x0000000100000000	0x0000000300000002
0xffff88023bc8a310:	0x0000000500000004	0x0000000700000006
0xffff88023bc8a320:	0x0000000900000008	0x0000000b0000000a
0xffff88023bc8a330:	0x0000000d0000000c	0x0000000f0000000e
0xffff88023bc8a340:	0x00000014ffffffff	0xffffffffffffffff
0xffff88023bc8a350:	0xffffffffffffffff	0xffffffffffffffff
0xffff88023bc8a360:	0xffffffffffffffff	0xffffffffffffffff
0xffff88023bc8a370:	0xffffffffffffffff	0xffffffffffffffff
0xffff88023bc8a380:	0x00000015ffffffff	0xffffffffffffffff
0xffff88023bc8a390:	0xffffffff0000001e	0xffffffffffffffff
0xffff88023bc8a3a0:	0xffffffffffffffff	0xffffffffffffffff
0xffff88023bc8a3b0:	0xffffffffffffffff	0xffffffffffffffff
0xffff88023bc8a3c0:	0x00000016ffffffff	0xffffffffffffffff
0xffff88023bc8a3d0:	0xffffffffffffffff	0xffffffffffffffff
0xffff88023bc8a3e0:	0xffffffffffffffff	0xffffffffffffffff
0xffff88023bc8a3f0:	0xffffffffffffffff	0xffffffffffffffff

But it's entirely possible I'm using the tool wrong.
What type of data is supposed to be at those places? I tried

crash> print *(pgtable_t) 0xffff88023bc8a200

but I cannot judge if the output is useful :-(

Best regards,

	Stefan
-- 
Stefan Seyfried
Linux Consultant & Developer -- GPG Key: 0x731B665B

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/