Re: [CentOS-virt] Xen PV DomU running Kernel 4.14.5-1.el7.elrepo.x86_64: xl -v vcpu-set triggers domU kernel WARNING, then domU becomes unresponsive
On 15-12-2017 9:01, Adi Pircalabu wrote:
On 15-12-2017 4:10, Akemi Yagi wrote:
On Mon, Dec 11, 2017 at 4:52 PM, Adi Pircalabu wrote:

Has anyone seen this recently? I couldn't replicate it on:
- CentOS 6 running kernel-2.6.32-696.16.1.el6.x86_64, kernel-lt-4.4.105-1.el6.elrepo.x86_64
- CentOS 7 running 4.9.67-1.el7.centos.x86_64

But I can replicate it consistently running "xl -v vcpu-set " on:
- CentOS 6 running 4.14.5-1.el6.elrepo.x86_64
- CentOS 7 running 4.14.5-1.el7.elrepo.x86_64

dom0 versions tested with similar results in the domU:
- 4.6.6-6.el7 on kernel 4.9.63-29.el7.x86_64
- 4.6.3-15.el6 on kernel 4.9.37-29.el6.x86_64

Noticed behaviour:
- These commands stall:
  top
  ls -l /var/tmp
  ls -l /tmp
- Stuck in D state on the CentOS 7 domU:
  root     5  0.0  0.0      0     0 ?     D    11:20   0:00 [kworker/u8:0]
  root   316  0.0  0.0      0     0 ?     D    11:20   0:00 [jbd2/xvda1-8]
  root  1145  0.0  0.2 116636  4776 ?     Ds   11:20   0:00 -bash
  root  1289  0.0  0.1  25852  2420 ?     Ds   11:35   0:00 /usr/bin/systemd-tmpfiles --clean
  root  1290  0.0  0.1 125248  2696 pts/1 D+   11:44   0:00 ls --color=auto -l /tmp/
  root  1293  0.0  0.1 125248  2568 pts/2 D+   11:44   0:00 ls --color=auto -l /var/tmp
  root  1296  0.0  0.2 116636  4908 pts/3 Ds+  11:44   0:00 -bash
  root  1358  0.0  0.1 125248  2612 pts/4 D+   11:47   0:00 ls --color=auto -l /var/tmp

At a first glance it appears the issue is in 4.14.5 kernel. Stack traces follow:

Adi Pircalabu

Can you test-install 4.15-rcX to see if the problem persists in the latest kernel?:
http://elrepo.org/people/ajb/devel/kernel-ml/el7/x86_64/RPMS/ [1]

Akemi

Thanks for that, tested it on both CentOS 6 and 7 PV domU and I get similar panics:

-CentOS 6-
[...]
dracut: Switching root
Welcome to CentOS
Starting udev: udev: starting version 147
input: PC Speaker as /devices/platform/pcspkr/input/input0
xen_netfront: Initialising Xen virtual ethernet driver
BUG: unable to handle kernel NULL pointer dereference at 0010
IP: coretemp_cpu_online+0x116/0x190 [coretemp]
PGD 7b5c7067 P4D 7b5c7067 PUD 7b5cd067 PMD 0
Oops: 0002 [#1] SMP
Modules linked in: coretemp(+) hwmon xen_netfront pcspkr ext4 jbd2 mbcache xen_blkfront dm_mirror dm_region_hash dm_log dm_mod dax
CPU: 0 PID: 12 Comm: cpuhp/0 Not tainted 4.15.0-0.rc1.el6.elrepo.x86_64 #1
task: 88007c8f43c0 task.stack: c9004039
RIP: e030:coretemp_cpu_online+0x116/0x190 [coretemp]
RSP: e02b:c90040393cd8 EFLAGS: 00010246
RAX: 0010 RBX: RCX: 88007c87c248
RDX: RSI: 880077720c28 RDI: 8800069ea020
RBP: c90040393d18 R08: R09: c90040393a08
R10: R11: 005f R12:
R13: 8800069ea000 R14: 88007f60a040 R15:
FS: 7f685ca0a700() GS:88007f60() knlGS:
CS: e033 DS: ES: CR0: 80050033
CR2: 0010 CR3: 0683a000 CR4: 00042660
Call Trace:
 ? coretemp_add_core+0x50/0x50 [coretemp]
 cpuhp_invoke_callback+0xe9/0x700
 ? put_prev_task_fair+0x26/0x40
 ? __schedule+0x2d0/0x6e0
 ? __wake_up_common+0x84/0x130
 ? __wake_up_common+0x84/0x130
 cpuhp_thread_fun+0xee/0x170
 smpboot_thread_fn+0x10c/0x160
 ? smpboot_create_threads+0x80/0x80
 kthread+0x10a/0x140
 ? kthread_probe_data+0x40/0x40
 ret_from_fork+0x1f/0x30
Code: 11 15 41 e1 49 89 c5 b8 f4 ff ff ff 4d 85 ed 0f 84 66 ff ff ff 4c 89 ef e8 88 11 41 e1 85 c0 75 6e 48 8b 05 75 17 00 00 4d 63 ff <4e> 89 2c f8 49 81 fd 00 f0 ff ff 44 89 e8 0f 87 3c ff ff ff 49
RIP: coretemp_cpu_online+0x116/0x190 [coretemp] RSP: c90040393cd8
CR2: 0010
---[ end trace 8253bafacf228cf2 ]---
-CentOS 6-

-CentOS 7-
[...]
[ OK ] Found device /dev/xvda2.
Activating swap /dev/xvda2...
[4.998940] alg: No test for pcbc(aes) (pcbc-aes-aesni)
[5.001054] Adding 1048572k swap on /dev/xvda2. Priority:-2 extents:1 across:1048572k SSFS
[ OK ] Activated swap /dev/xvda2.
[ OK ] Reached target Swap.
[5.020760] BUG: unable to handle kernel NULL pointer dereference at 0010
[5.020767] IP: coretemp_cpu_online+0xf8/0x1f7 [coretemp]
[5.020769] PGD 0 P4D 0
[5.020771] Oops: 0002 [#1] SMP
[5.020773] Modules linked in: coretemp(+) crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel crypto_simd glue_helper cryptd pcspkr intel_rapl_perf nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 xen_netfront xen_blkfront crc32c_intel
[5.020786] CPU: 0 PID: 12 Comm: cpuhp/0 Not tainted 4.15.0-0.rc3.el7.elrepo.x86_64 #1
[5.020789] RIP: e030:coretemp_cpu_online+0xf8/0x1f7 [coretemp]
[5.020790] RSP: e02b:c90040387e10 EFLAGS: 00010246
[5.020793] RAX: 0010 RBX: 8800040d8800 RCX:
[5.020794] RDX: 880079761e70 RSI:
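
Both 4.15-rc traces report the fault in the same place: coretemp_cpu_online in the coretemp module, running on the cpuhp/0 thread. When comparing several console logs, it can help to pull that location out mechanically. The following is only a minimal sketch in Python; the function name faulting_location and the regular expression are illustrative assumptions and not part of any kernel tooling. It reads the "IP:" or "RIP:" line of an oops and returns the symbol, offset, function size and module.

```python
import re

# Matches the "IP:" / "RIP:" line of an oops, for example:
#   IP: coretemp_cpu_online+0x116/0x190 [coretemp]
#   RIP: e030:coretemp_cpu_online+0xf8/0x1f7 [coretemp]
OOPS_IP = re.compile(
    r"\bR?IP:\s+(?:[0-9a-f]{4}:)?"            # optional segment prefix such as "e030:"
    r"(?P<symbol>[A-Za-z_][\w.]*)"            # faulting function
    r"\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"            # module name, if the symbol is in one
)


def faulting_location(log_text):
    """Return the first IP/RIP location found in log_text, or None."""
    match = OOPS_IP.search(log_text)
    if match is None:
        return None
    return {
        "symbol": match.group("symbol"),
        "offset": int(match.group("offset"), 16),
        "size": int(match.group("size"), 16),
        "module": match.group("module"),
    }


if __name__ == "__main__":
    line = "[5.020767] IP: coretemp_cpu_online+0xf8/0x1f7 [coretemp]"
    print(faulting_location(line))
```

Feeding it each captured console log and comparing the "symbol" and "module" fields is a quick way to check whether two crashes share a code path before reading the full traces.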
Re: [CentOS-virt] Xen PV DomU running Kernel 4.14.5-1.el7.elrepo.x86_64: xl -v vcpu-set triggers domU kernel WARNING, then domU becomes unresponsive
On Mon, Dec 11, 2017 at 4:52 PM, Adi Pircalabu wrote:
> Has anyone seen this recently? I couldn't replicate it on:
> - CentOS 6 running kernel-2.6.32-696.16.1.el6.x86_64,
>   kernel-lt-4.4.105-1.el6.elrepo.x86_64
> - CentOS 7 running 4.9.67-1.el7.centos.x86_64
>
> But I can replicate it consistently running "xl -v vcpu-set " on:
> - CentOS 6 running 4.14.5-1.el6.elrepo.x86_64
> - CentOS 7 running 4.14.5-1.el7.elrepo.x86_64
>
> dom0 versions tested with similar results in the domU:
> - 4.6.6-6.el7 on kernel 4.9.63-29.el7.x86_64
> - 4.6.3-15.el6 on kernel 4.9.37-29.el6.x86_64
>
> Noticed behaviour:
> - These commands stall:
>   top
>   ls -l /var/tmp
>   ls -l /tmp
> - Stuck in D state on the CentOS 7 domU:
>   root     5  0.0  0.0      0     0 ?     D    11:20   0:00 [kworker/u8:0]
>   root   316  0.0  0.0      0     0 ?     D    11:20   0:00 [jbd2/xvda1-8]
>   root  1145  0.0  0.2 116636  4776 ?     Ds   11:20   0:00 -bash
>   root  1289  0.0  0.1  25852  2420 ?     Ds   11:35   0:00 /usr/bin/systemd-tmpfiles --clean
>   root  1290  0.0  0.1 125248  2696 pts/1 D+   11:44   0:00 ls --color=auto -l /tmp/
>   root  1293  0.0  0.1 125248  2568 pts/2 D+   11:44   0:00 ls --color=auto -l /var/tmp
>   root  1296  0.0  0.2 116636  4908 pts/3 Ds+  11:44   0:00 -bash
>   root  1358  0.0  0.1 125248  2612 pts/4 D+   11:47   0:00 ls --color=auto -l /var/tmp
>
> At a first glance it appears the issue is in 4.14.5 kernel. Stack traces
> follow:
>
> Adi Pircalabu

Can you test-install 4.15-rcX to see if the problem persists in the latest kernel?:
http://elrepo.org/people/ajb/devel/kernel-ml/el7/x86_64/RPMS/

Akemi
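
For anyone trying to confirm the same symptom on another domU, the "stuck in D state" listing in the original report can be reproduced by filtering `ps aux` output on the STAT column. The sketch below is only an illustration in Python; the helper name uninterruptible is hypothetical, and it assumes the usual 11-column `ps aux` layout (USER, PID, %CPU, %MEM, VSZ, RSS, TTY, STAT, START, TIME, COMMAND).

```python
def uninterruptible(ps_aux_lines):
    """Return (pid, stat, command) for rows whose STAT starts with 'D'.

    Assumes the standard `ps aux` column order; the header row and any
    malformed rows are skipped.
    """
    hits = []
    for line in ps_aux_lines:
        # Split into at most 11 fields so COMMAND keeps its embedded spaces.
        fields = line.strip().split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue
        stat = fields[7]
        if stat.startswith("D"):
            hits.append((int(fields[1]), stat, fields[10]))
    return hits


if __name__ == "__main__":
    import subprocess

    # Runs the real `ps aux` on the local machine and prints blocked tasks.
    output = subprocess.run(
        ["ps", "aux"], capture_output=True, text=True, check=True
    ).stdout
    for pid, stat, command in uninterruptible(output.splitlines()):
        print(pid, stat, command)
```

Run before and after the vcpu-set call, this gives a simple before/after list of tasks in uninterruptible sleep, which is easier to attach to a bug report than an interactive `top` session that may itself stall.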