Nathan,
Thanks for the report. Would you mind re-posting this to the
xen-users mailing list? You're much more likely to get someone there
who's seen such a bug before.
-George
On Tue, Nov 7, 2017 at 11:12 PM, Nathan March wrote:
> Since moving from 4.4 to 4.6, I’ve been seeing an increasing number of
> stability issues on our hypervisors. I’m not clear if there’s a singular
> root cause here, or if I’m dealing with multiple bugs…
>
> One of the more common ones I’ve seen is that a VM will remain in the
> (null) state on shutdown, and a kernel BUG is thrown:
>
> xen001 log # xl list
> Name                 ID    Mem VCPUs State   Time(s)
> Domain-0              0 614424       r-       6639.7
> (null)                3      0     1 --pscd     36.3
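A `(null)` entry like the one above is a zombie domain whose teardown never completed. Spotting them can be scripted by filtering the `xl list` output; a minimal sketch, run here against sample text mirroring the report rather than a live `xl` (the `zombie domid:` label and the sample values are illustrative, not from the host):

```shell
# Sample `xl list` output mirroring the report (values illustrative).
xl_output='Name                 ID   Mem VCPUs State   Time(s)
Domain-0              0 614424     4 r-----   6639.7
(null)                3      0     1 --pscd     36.3'

# A zombie domain shows up with the literal name "(null)";
# print its domain ID from the second column.
printf '%s\n' "$xl_output" | awk '$1 == "(null)" { print "zombie domid:", $2 }'
# → zombie domid: 3
```

On a live host the same filter would be piped straight from `xl list`.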
>
> [89920.839074] BUG: unable to handle kernel paging request at 88020ee9a000
> [89920.839546] IP: [] __memcpy+0x12/0x20
> [89920.839933] PGD 2008067
> [89920.840022] PUD 17f43f067
> [89920.840390] PMD 1e0976067
> [89920.840469] PTE 0
> [89920.840833]
> [89920.841123] Oops: [#1] SMP
> [89920.841417] Modules linked in: ebt_ip ebtable_filter ebtables arptable_filter arp_tables bridge xen_pciback xen_gntalloc nfsd auth_rpcgss nfsv3 nfs_acl nfs fscache lockd sunrpc grace 8021q mrp garp stp llc bonding xen_acpi_processor blktap xen_netback xen_blkback xen_gntdev xen_evtchn xenfs xen_privcmd dcdbas fjes pcspkr ipmi_devintf ipmi_si ipmi_msghandler joydev i2c_i801 i2c_smbus lpc_ich shpchp mei_me mei ioatdma ixgbe mdio igb dca ptp pps_core uas usb_storage wmi ttm
> [89920.847080] CPU: 4 PID: 1471 Comm: loop6 Not tainted 4.9.58-29.el6.x86_64 #1
> [89920.847381] Hardware name: Dell Inc. PowerEdge C6220/03C9JJ, BIOS 2.7.1 03/04/2015
> [89920.847893] task: 8801b75e0700 task.stack: c900460e
> [89920.848192] RIP: e030:[] [] __memcpy+0x12/0x20
> [89920.848783] RSP: e02b:c900460e3b20 EFLAGS: 00010246
> [89920.849081] RAX: 88018916d000 RBX: 8801b75e0700 RCX: 0200
> [89920.849384] RDX: RSI: 88020ee9a000 RDI: 88018916d000
> [89920.849686] RBP: c900460e3b38 R08: 88011da9fcf8 R09: 0002
> [89920.849989] R10: 88019535bddc R11: ea0006245b5c R12: 1000
> [89920.850294] R13: 88018916e000 R14: 1000 R15: c900460e3b68
> [89920.850605] FS: 7fb865c30700() GS:880204b0() knlGS:
> [89920.851118] CS: e033 DS: ES: CR0: 80050033
> [89920.851418] CR2: 88020ee9a000 CR3: 0001ef03b000 CR4: 00042660
> [89920.851720] Stack:
> [89920.852009] 814375ca c900460e3b38 c900460e3d08 c900460e3bb8
> [89920.852821] 814381c5 c900460e3b68 c900460e3d08 1000
> [89920.853633] c900460e3d88 1000 ea00
> [89920.854445] Call Trace:
> [89920.854741] [] ? memcpy_from_page+0x3a/0x70
> [89920.855043] [] iov_iter_copy_from_user_atomic+0x265/0x290
> [89920.855354] [] generic_perform_write+0xf3/0x1d0
> [89920.855673] [] ? xen_load_tls+0xaa/0x160
> [89920.855992] [] nfs_file_write+0xdb/0x200 [nfs]
> [89920.856297] [] vfs_iter_write+0xa2/0xf0
> [89920.856599] [] lo_write_bvec+0x65/0x100
> [89920.856899] [] do_req_filebacked+0x195/0x300
> [89920.857202] [] loop_queue_work+0x5b/0x80
> [89920.857505] [] kthread_worker_fn+0x98/0x1b0
> [89920.857808] [] ? schedule+0x3a/0xa0
> [89920.858108] [] ? _raw_spin_unlock_irqrestore+0x16/0x20
> [89920.858411] [] ? kthread_probe_data+0x40/0x40
> [89920.858713] [] kthread+0xe5/0x100
> [89920.859014] [] ? __kthread_init_worker+0x40/0x40
> [89920.859317] [] ret_from_fork+0x25/0x30
> [89920.859615] Code: 81 f3 00 00 00 00 e9 1e ff ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 66 90 66 90 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 f3
> [89920.864410] RIP [] __memcpy+0x12/0x20
> [89920.864749] RSP
> [89920.865021] CR2: 88020ee9a000
> [89920.865294] ---[ end trace b77d2ce5646284d1 ]---
>
> Wondering if anyone has advice on how to troubleshoot the above, or might
> have some insight into what the issue could be? This hypervisor had only
> been up for a day and had run almost no VMs since boot; I booted a single
> Windows test VM, which BSODed, and then this happened.
>
> This is on Xen 4.6.6-4.el6 with kernel 4.9.58-29.el6.x86_64. I see these
> issues across a wide range of systems from both Dell and Supermicro,
> although we run the same Intel X540 10GbE NICs in each system with the same
> NetApp NFS backend storage.
>
> Cheers,
>
> Nathan
>
> _______________________________________________
> CentOS-virt mailing list
> CentOS-virt@centos.org
>