Should have replied to the list!
-------- Forwarded Message --------
Subject: Re: [qubes-users] Re: AppVms being killed on resume due to
clock skew too large
Date: Sat, 1 Feb 2020 11:49:29 +0000
From: Mike Keehan <m...@keehan.net>
To: mmo...@disroot.org
On 2/1/20 10:27 AM, mmo...@disroot.org wrote:
Same problem again, this time not related to any socket closure.
Apparently related to systemd:
[41911.199732] audit: type=1104 audit(1580516883.707:119): pid=4917
uid=0 auid=4294967295 ses=4294967295 msg='op=PAM:setcred
grantors=pam_rootok acct="root" exe="/usr/lib/qubes/qrexec-agent"
hostname=? addr=? terminal=? res=success'
[41920.252871] clocksource: timekeeping watchdog on CPU0: Marking
clocksource 'tsc' as unstable because the skew is too large:
[41920.252927] clocksource: 'xen' wd_now: 2a1620baf67a wd_last:
2a140e3c5f9f mask: ffffffffffffffff
[41920.252972] clocksource: 'tsc' cs_now: ffffff88779d4270 cs_last:
5083a288ea9a mask: ffffffffffffffff
[41920.253013] tsc: Marking TSC unstable due to clocksource watchdog
[41921.161370] audit: type=1100 audit(1580516893.670:120): pid=4955
uid=0 auid=4294967295 ses=4294967295 msg='op=PAM:authentication
grantors=pam_rootok acct="root" exe="/usr/lib/qubes/qrexec-agent"
hostname=? addr=? terminal=? res=success'
[41921.163039] audit: type=1103 audit(1580516893.672:121): pid=4955
uid=0 auid=4294967295 ses=4294967295 msg='op=PAM:setcred
grantors=pam_rootok acct="root" exe="/usr/lib/qubes/qrexec-agent"
hostname=? addr=? terminal=? res=success'
[41921.176874] audit: type=1105 audit(1580516893.686:122): pid=4955
uid=0 auid=4294967295 ses=4294967295 msg='op=PAM:session_open
grantors=pam_keyinit,pam_limits,pam_systemd,pam_unix,pam_umask,pam_lastlog
acct="root" exe="/usr/lib/qubes/qrexec-agent" hostname=? addr=?
terminal=? res=success'
[41922.205481] audit: type=1106 audit(1580552389.038:123): pid=4955
uid=0 auid=4294967295 ses=4294967295 msg='op=PAM:session_close
grantors=pam_keyinit,pam_limits,pam_systemd,pam_unix,pam_umask,pam_lastlog
acct="root" exe="/usr/lib/qubes/qrexec-agent" hostname=? addr=?
terminal=? res=success'
[41922.205554] audit: type=1104 audit(1580552389.038:124): pid=4955
uid=0 auid=4294967295 ses=4294967295 msg='op=PAM:setcred
grantors=pam_rootok acct="root" exe="/usr/lib/qubes/qrexec-agent"
hostname=? addr=? terminal=? res=success'
*[41932.321374] systemd[4919]: segfault at 640550f11920 ip
0000640550345cbd sp 00007ffd40e80440 error 6 in systemd[6405502f6000+b7000]
[41932.321420] Code: 24 28 02 00 00 48 85 c9 74 0f 48 89 81 28 02 00 00
49 8b 84 24 28 02 00 00 48 85 c0 0f 84 a0 07 00 00 49 8b 94 24 20 02 00
00 <48> 89 90 20 02 00 00 49 c7 84 24 28 02 00 00 00 00 00 00 49 c7 84*
[41932.321515] audit: type=1701 audit(1580552399.156:125): auid=0 uid=0
gid=0 ses=4 pid=4919 comm="systemd" exe="/usr/lib/systemd/systemd"
sig=11 res=1
[41932.336794] audit: type=1130 audit(1580552399.171:126): pid=1 uid=0
auid=4294967295 ses=4294967295 msg='unit=systemd-coredump@0-4990-0
comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=?
terminal=? res=success'
[41932.627105] audit: type=1131 audit(1580552399.456:127): pid=1 uid=0
auid=4294967295 ses=4294967295 msg='unit=user@0 comm="systemd"
exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[41932.636551] audit: type=1131 audit(1580552399.471:128): pid=1 uid=0
auid=4294967295 ses=4294967295 msg='unit=user-runtime-dir@0
comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=?
terminal=? res=success'
[41932.661359] audit: type=1131 audit(1580552399.495:129): pid=1 uid=0
auid=4294967295 ses=4294967295 msg='unit=systemd-coredump@0-4990-0
comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=?
terminal=? res=success'
[41934.482123] BUG: unable to handle kernel NULL pointer dereference at
0000000000000080
[41934.482143] PGD 0 P4D 0
[41934.482150] Oops: 0000 [#1] SMP PTI
[41934.482159] CPU: 0 PID: 5002 Comm: Compositor Tainted: G O
4.19.94-1.pvops.qubes.x86_64 #1
[41934.482178] RIP: 0010:mem_cgroup_page_lruvec+0x28/0x50
[41934.482189] Code: 00 00 0f 1f 44 00 00 0f 1f 44 00 00 48 8b 47 38 48
8b 17 48 85 c0 48 0f 44 05 dc d1 0c 01 48 c1 ea 36 48 8b 84 d0 48 0a 00
00 <48> 3b b0 80 00 00 00 75 12 f3 c3 48 8d 86 a0 a1 02 00 48 3b b0 80
[41934.482222] RSP: 0018:ffffc900011d3aa8 EFLAGS: 00010046
[41934.482232] RAX: 0000000000000000 RBX: ffffffff82369cc0 RCX:
ffffc900011d3ae8
[41934.482246] RDX: 0000000000000000 RSI: ffff8880f9fd5000 RDI:
ffffea0002adec00
[41934.482265] RBP: ffff88802f7e6fb8 R08: ffffc900011d3ae8 R09:
000000000001eb39
[41934.482279] R10: 00000000000fa000 R11: ffffffffffffffff R12:
ffff8880f9fd5000
[41934.482294] R13: ffffea0002adec00 R14: 0000000000000014 R15:
ffff88802f7e7000
[41934.482308] FS: 0000000000000000(0000) GS:ffff8880f5a00000(0000)
knlGS:0000000000000000
[41934.482323] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[41934.482335] CR2: 0000000000000080 CR3: 000000003c9da001 CR4:
00000000003606f0
[41934.482351] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[41934.482365] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[41934.482380] Call Trace:
[41934.482388] release_pages+0x12c/0x4b0
[41934.482397] tlb_flush_mmu_free+0x36/0x50
[41934.482406] unmap_page_range+0x8f0/0xd00
[41934.482415] unmap_vmas+0x4c/0xa0
[41934.482423] exit_mmap+0xb5/0x1a0
[41934.482432] mmput+0x5f/0x140
[41934.482443] flush_old_exec+0x597/0x6c0
[41934.482451] ? load_elf_phdrs+0x97/0xb0
[41934.482460] load_elf_binary+0x3d9/0x1224
[41934.482468] ? get_acl+0x1a/0x100
[41934.482477] search_binary_handler+0xa6/0x1c0
[41934.482487] __do_execve_file.isra.34+0x587/0x7e0
[41934.482498] __x64_sys_execve+0x34/0x40
[41934.482506] do_syscall_64+0x5b/0x190
[41934.482515] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[41934.482526] RIP: 0033:0x7c1fb7d15acb
[41934.482535] Code: Bad RIP value.
[41934.482543] RSP: 002b:00007c1fa7361b18 EFLAGS: 00000246 ORIG_RAX:
000000000000003b
[41934.482557] RAX: ffffffffffffffda RBX: 00007c1fa7361b40 RCX:
00007c1fb7d15acb
[41934.482572] RDX: 00007c1fa9b5f800 RSI: 00007c1fa7361b20 RDI:
00007c1fb7a22cd0
[41934.482586] RBP: 00007c1fa7361ba0 R08: 00007c1fa7361b38 R09:
00007c1fa7361b60
[41934.482600] R10: 00007c1fa7361b20 R11: 0000000000000246 R12:
00007c1fa7361bd8
[41934.482615] R13: 0000000000000000 R14: 000000005e355001 R15:
00007c1fa7361bf0
[41934.482630] Modules linked in: ip6table_filter ip6_tables
xt_conntrack ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack
nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c intel_rapl crct10dif_pclmul
crc32_pclmul crc32c_intel xen_netfront ghash_clmulni_intel
intel_rapl_perf pcspkr u2mfn(O) xenfs xen_privcmd xen_gntdev
xen_gntalloc xen_blkback xen_evtchn overlay xen_blkfront
[41934.482694] CR2: 0000000000000080
[41934.482703] ---[ end trace f587889938477959 ]---
[41934.482714] RIP: 0010:mem_cgroup_page_lruvec+0x28/0x50
[41934.482724] Code: 00 00 0f 1f 44 00 00 0f 1f 44 00 00 48 8b 47 38 48
8b 17 48 85 c0 48 0f 44 05 dc d1 0c 01 48 c1 ea 36 48 8b 84 d0 48 0a 00
00 <48> 3b b0 80 00 00 00 75 12 f3 c3 48 8d 86 a0 a1 02 00 48 3b b0 80
[41934.482756] RSP: 0018:ffffc900011d3aa8 EFLAGS: 00010046
[41934.482766] RAX: 0000000000000000 RBX: ffffffff82369cc0 RCX:
ffffc900011d3ae8
[41934.482780] RDX: 0000000000000000 RSI: ffff8880f9fd5000 RDI:
ffffea0002adec00
[41934.482794] RBP: ffff88802f7e6fb8 R08: ffffc900011d3ae8 R09:
000000000001eb39
[41934.482808] R10: 00000000000fa000 R11: ffffffffffffffff R12:
ffff8880f9fd5000
[41934.482822] R13: ffffea0002adec00 R14: 0000000000000014 R15:
ffff88802f7e7000
[41934.482837] FS: 0000000000000000(0000) GS:ffff8880f5a00000(0000)
knlGS:0000000000000000
[41934.482851] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[41934.482863] CR2: 00007c1fb7d15aa1 CR3: 000000003c9da001 CR4:
00000000003606f0
[41934.482877] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[41934.482891] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[41934.482905] Kernel panic - not syncing: Fatal exception
[41936.108632] Shutting down cpus with NMI
[41936.108774] Kernel Offset: disabled
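As a sanity check on the clocksource warning above, here is a rough decode of the watchdog numbers (a sketch only; I'm assuming the 'xen' clocksource ticks in nanoseconds, which I believe is the case for Xen PV time):

# Deltas taken from the wd_now/wd_last and cs_now/cs_last values above.
wd_delta = 0x2a1620baf67a - 0x2a140e3c5f9f    # 'xen' watchdog advance
print(wd_delta / 1e9)                         # ~8.9 -> seconds between checks
tsc_delta = 0xffffff88779d4270 - 0x5083a288ea9a
print(hex(tsc_delta))                         # enormous -> TSC jumped, not just drifted

If I'm reading this right, about 8.9 seconds passed between watchdog checks, while the TSC reading leapt to a value near the top of its 64-bit range, which is why the kernel gave up on it.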
Any idea what might be causing this issue?
Thanks.
January 31, 2020 5:08 PM, mmo...@disroot.org wrote:
Many thanks for the suggestion!
I'm not using any proprietary modules of any sort; below are the
only modules that are loaded in the AppVM that was killed (as you
can see, nothing really special):
Module Size Used by
fuse 126976 3
ip6table_filter 16384 1
ip6_tables 32768 1 ip6table_filter
xt_conntrack 16384 2
ipt_MASQUERADE 16384 1
iptable_nat 16384 1
nf_nat_ipv4 16384 2 ipt_MASQUERADE,iptable_nat
nf_nat 36864 1 nf_nat_ipv4
nf_conntrack 163840 4 xt_conntrack,nf_nat,ipt_MASQUERADE,nf_nat_ipv4
nf_defrag_ipv6 20480 1 nf_conntrack
nf_defrag_ipv4 16384 1 nf_conntrack
libcrc32c 16384 2 nf_conntrack,nf_nat
intel_rapl 24576 0
crct10dif_pclmul 16384 0
crc32_pclmul 16384 0
crc32c_intel 24576 1
ghash_clmulni_intel 16384 0
xen_netfront 32768 0
intel_rapl_perf 16384 0
pcspkr 16384 0
xenfs 16384 1
u2mfn 16384 0
xen_privcmd 24576 17 xenfs
xen_gntdev 24576 1
xen_gntalloc 16384 5
xen_blkback 49152 0
xen_evtchn 16384 6
overlay 122880 1
xen_blkfront 45056 6
The closure of the socket is probably related to borgmatic (which
I'm using as my backup mechanism for the AppVMs). But I don't think
it's related, since I have this enabled only on a few machines, and
even the ones that are not using borgmatic are terminated on resume.
I'm running out of ideas on this. What I did notice, though, is that
if the resume is done immediately after the suspend, it works fine
without any AppVM being killed, which seems to indicate an issue
with the clock (that's the only thing that comes to mind, especially
given the warning above), but I'm not sure if this is the root cause.
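To test that theory, here is a minimal sketch I could run inside an AppVM across a suspend/resume cycle (just an idea; it only assumes Python 3 and standard Linux sysfs, nothing Qubes-specific):

#!/usr/bin/env python3
# CLOCK_MONOTONIC does not advance during suspend, so a change in the
# (REALTIME - MONOTONIC) offset across suspend/resume shows how far
# the VM's wall clock jumped relative to its running time.
import time

def offset():
    return (time.clock_gettime(time.CLOCK_REALTIME)
            - time.clock_gettime(time.CLOCK_MONOTONIC))

with open("/sys/devices/system/clocksource/clocksource0/"
          "current_clocksource") as f:
    print("clocksource:", f.read().strip())

before = offset()
input("Suspend and resume now, then press Enter... ")
print("offset changed by %.3f s" % (offset() - before))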
Any more suggestions would be really appreciated!
As this is a different crash, maybe it is memory corruption.
Some information about which VM template is crashing may help,
and whether there are any VMs that never crash.
What type of machine are you using?