Re: [CentOS-virt] Xen PV DomU running Kernel 4.14.5-1.el7.elrepo.x86_64: xl -v vcpu-set triggers domU kernel WARNING, then domU becomes unresponsive

2017-12-14 Thread Adi Pircalabu

On 15-12-2017 4:10, Akemi Yagi wrote:

On Mon, Dec 11, 2017 at 4:52 PM, Adi Pircalabu wrote:


Has anyone seen this recently? I couldn't replicate it on:
- CentOS 6 running kernel-2.6.32-696.16.1.el6.x86_64,
kernel-lt-4.4.105-1.el6.elrepo.x86_64
- CentOS 7 running 4.9.67-1.el7.centos.x86_64

But I can replicate it consistently running "xl -v vcpu-set " on:
- CentOS 6 running 4.14.5-1.el6.elrepo.x86_64
- CentOS 7 running 4.14.5-1.el7.elrepo.x86_64

dom0 versions tested with similar results in the domU:
- 4.6.6-6.el7 on kernel 4.9.63-29.el7.x86_64
- 4.6.3-15.el6 on kernel 4.9.37-29.el6.x86_64

Noticed behaviour:
- These commands stall:
top
ls -l /var/tmp
ls -l /tmp
- Stuck in D state on the CentOS 7 domU:
root         5  0.0  0.0      0     0 ?        D    11:20   0:00 [kworker/u8:0]
root       316  0.0  0.0      0     0 ?        D    11:20   0:00 [jbd2/xvda1-8]
root      1145  0.0  0.2 116636  4776 ?        Ds   11:20   0:00 -bash
root      1289  0.0  0.1  25852  2420 ?        Ds   11:35   0:00 /usr/bin/systemd-tmpfiles --clean
root      1290  0.0  0.1 125248  2696 pts/1    D+   11:44   0:00 ls --color=auto -l /tmp/
root      1293  0.0  0.1 125248  2568 pts/2    D+   11:44   0:00 ls --color=auto -l /var/tmp
root      1296  0.0  0.2 116636  4908 pts/3    Ds+  11:44   0:00 -bash
root      1358  0.0  0.1 125248  2612 pts/4    D+   11:47   0:00 ls --color=auto -l /var/tmp
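
For anyone trying to confirm the same symptom, the listing above can be picked out of full `ps aux` output mechanically. The following is only a minimal sketch, assuming the standard `ps aux` column order (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND); the function name `uninterruptible` and the `sleep 60` sample row are hypothetical and not part of the original report.

```python
def uninterruptible(ps_lines):
    """Return (pid, command) for each `ps aux` row whose STAT starts with 'D'."""
    stuck = []
    for line in ps_lines:
        # Assumed columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # skip the header row and incomplete lines
        if fields[7].startswith("D"):
            stuck.append((int(fields[1]), fields[10]))
    return stuck


sample = [
    "USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND",
    "root         5  0.0  0.0      0     0 ?        D    11:20   0:00 [kworker/u8:0]",
    "root      1290  0.0  0.1 125248  2696 pts/1    D+   11:44   0:00 ls --color=auto -l /tmp/",
    # Hypothetical non-stuck process, added only to show the filter at work.
    "root      1300  0.0  0.0 107952   612 pts/1    S+   11:45   0:00 sleep 60",
]

for pid, command in uninterruptible(sample):
    print(pid, command)
```

On a live domU the input would come from something like `subprocess.run(["ps", "aux"], ...)`, although a guest in this state may stall on `ps` just as it does on `top`.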

At first glance, the issue appears to be in the 4.14.5 kernel. Stack
traces follow:

Adi Pircalabu


Can you test-install 4.15-rcX to see if the problem persists in the
latest kernel?

http://elrepo.org/people/ajb/devel/kernel-ml/el7/x86_64/RPMS/

Akemi


Thanks for that. I tested it on both CentOS 6 and CentOS 7 PV domUs
and I get similar panics:


-CentOS 6-
[...]
dracut: Switching root
Welcome to CentOS
Starting udev: udev: starting version 147
input: PC Speaker as /devices/platform/pcspkr/input/input0
xen_netfront: Initialising Xen virtual ethernet driver
BUG: unable to handle kernel NULL pointer dereference at 0010
IP: coretemp_cpu_online+0x116/0x190 [coretemp]
PGD 7b5c7067 P4D 7b5c7067 PUD 7b5cd067 PMD 0
Oops: 0002 [#1] SMP
Modules linked in: coretemp(+) hwmon xen_netfront pcspkr ext4 jbd2 mbcache xen_blkfront dm_mirror dm_region_hash dm_log dm_mod dax
CPU: 0 PID: 12 Comm: cpuhp/0 Not tainted 4.15.0-0.rc1.el6.elrepo.x86_64 #1
task: 88007c8f43c0 task.stack: c9004039
RIP: e030:coretemp_cpu_online+0x116/0x190 [coretemp]
RSP: e02b:c90040393cd8 EFLAGS: 00010246
RAX: 0010 RBX:  RCX: 88007c87c248
RDX:  RSI: 880077720c28 RDI: 8800069ea020
RBP: c90040393d18 R08:  R09: c90040393a08
R10:  R11: 005f R12: 
R13: 8800069ea000 R14: 88007f60a040 R15: 
FS:  7f685ca0a700() GS:88007f60() knlGS:
CS:  e033 DS:  ES:  CR0: 80050033
CR2: 0010 CR3: 0683a000 CR4: 00042660
Call Trace:
 ? coretemp_add_core+0x50/0x50 [coretemp]
 cpuhp_invoke_callback+0xe9/0x700
 ? put_prev_task_fair+0x26/0x40
 ? __schedule+0x2d0/0x6e0
 ? __wake_up_common+0x84/0x130
 ? __wake_up_common+0x84/0x130
 cpuhp_thread_fun+0xee/0x170
 smpboot_thread_fn+0x10c/0x160
 ? smpboot_create_threads+0x80/0x80
 kthread+0x10a/0x140
 ? kthread_probe_data+0x40/0x40
 ret_from_fork+0x1f/0x30
Code: 11 15 41 e1 49 89 c5 b8 f4 ff ff ff 4d 85 ed 0f 84 66 ff ff ff 4c 89 ef e8 88 11 41 e1 85 c0 75 6e 48 8b 05 75 17 00 00 4d 63 ff <4e> 89 2c f8 49 81 fd 00 f0 ff ff 44 89 e8 0f 87 3c ff ff ff 49
RIP: coretemp_cpu_online+0x116/0x190 [coretemp] RSP: c90040393cd8
CR2: 0010
---[ end trace 8253bafacf228cf2 ]---
-CentOS 6-
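
When comparing several console captures like the one above, it can help to pull the faulting symbol and module out of the `RIP:` line automatically. This is a rough sketch only: the regular expression is an assumption based on the `RIP:` lines shown in this thread (optional `e030:`-style segment prefix, `symbol+0xoffset/0xsize`, optional `[module]`), and the names `RIP_RE` and `faulting_location` are hypothetical.

```python
import re

# Assumed shape: "RIP: [seg:]symbol+0xOFFSET/0xSIZE [module]"
RIP_RE = re.compile(
    r"RIP:\s+(?:[0-9a-f]+:)?(?P<symbol>[\w.]+)"
    r"\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def faulting_location(log_text):
    """Return symbol, offset, size and module from the first RIP line, or None."""
    match = RIP_RE.search(log_text)
    if match is None:
        return None
    return {
        "symbol": match.group("symbol"),
        "offset": int(match.group("offset"), 16),
        "size": int(match.group("size"), 16),
        "module": match.group("module"),  # None for built-in code
    }


line = "RIP: e030:coretemp_cpu_online+0x116/0x190 [coretemp]"
print(faulting_location(line))
```

For both traces in this message the sketch reports `coretemp_cpu_online` in the `coretemp` module, which makes it easy to see that the CentOS 6 and CentOS 7 guests fault in the same function.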
-CentOS 7-
[...]
[  OK  ] Found device /dev/xvda2.
 Activating swap /dev/xvda2...
[4.998940] alg: No test for pcbc(aes) (pcbc-aes-aesni)
[5.001054] Adding 1048572k swap on /dev/xvda2.  Priority:-2 extents:1 across:1048572k SSFS
[  OK  ] Activated swap /dev/xvda2.
[  OK  ] Reached target Swap.
[5.020760] BUG: unable to handle kernel NULL pointer dereference at 0010
[5.020767] IP: coretemp_cpu_online+0xf8/0x1f7 [coretemp]
[5.020769] PGD 0 P4D 0
[5.020771] Oops: 0002 [#1] SMP
[5.020773] Modules linked in: coretemp(+) crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel crypto_simd glue_helper cryptd pcspkr intel_rapl_perf nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 xen_netfront xen_blkfront crc32c_intel
[5.020786] CPU: 0 PID: 12 Comm: cpuhp/0 Not tainted 4.15.0-0.rc3.el7.elrepo.x86_64 #1
[5.020789] RIP: e030:coretemp_cpu_online+0xf8/0x1f7 [coretemp]
[5.020790] RSP: e02b:c90040387e10 EFLAGS: 00010246
[5.020793] RAX: 0010 RBX: 8800040d8800 RCX: 

[5.020794] RDX: 880079761e70 RSI: 88007c438cc8 RDI: 

Re: [CentOS-virt] Xen PV DomU running Kernel 4.14.5-1.el7.elrepo.x86_64: xl -v vcpu-set triggers domU kernel WARNING, then domU becomes unresponsive

2017-12-14 Thread Akemi Yagi
On Mon, Dec 11, 2017 at 4:52 PM, Adi Pircalabu wrote:

> Has anyone seen this recently? I couldn't replicate it on:
> - CentOS 6 running kernel-2.6.32-696.16.1.el6.x86_64,
> kernel-lt-4.4.105-1.el6.elrepo.x86_64
> - CentOS 7 running 4.9.67-1.el7.centos.x86_64
>
> But I can replicate it consistently running "xl -v vcpu-set " on:
> - CentOS 6 running 4.14.5-1.el6.elrepo.x86_64
> - CentOS 7 running 4.14.5-1.el7.elrepo.x86_64
>
> dom0 versions tested with similar results in the domU:
> - 4.6.6-6.el7 on kernel 4.9.63-29.el7.x86_64
> - 4.6.3-15.el6 on kernel 4.9.37-29.el6.x86_64
>
> Noticed behaviour:
> - These commands stall:
> top
> ls -l /var/tmp
> ls -l /tmp
> - Stuck in D state on the CentOS 7 domU:
> root         5  0.0  0.0      0     0 ?        D    11:20   0:00 [kworker/u8:0]
> root       316  0.0  0.0      0     0 ?        D    11:20   0:00 [jbd2/xvda1-8]
> root      1145  0.0  0.2 116636  4776 ?        Ds   11:20   0:00 -bash
> root      1289  0.0  0.1  25852  2420 ?        Ds   11:35   0:00 /usr/bin/systemd-tmpfiles --clean
> root      1290  0.0  0.1 125248  2696 pts/1    D+   11:44   0:00 ls --color=auto -l /tmp/
> root      1293  0.0  0.1 125248  2568 pts/2    D+   11:44   0:00 ls --color=auto -l /var/tmp
> root      1296  0.0  0.2 116636  4908 pts/3    Ds+  11:44   0:00 -bash
> root      1358  0.0  0.1 125248  2612 pts/4    D+   11:47   0:00 ls --color=auto -l /var/tmp
>
> At first glance, the issue appears to be in the 4.14.5 kernel. Stack traces
> follow:
>
> Adi Pircalabu


Can you test-install 4.15-rcX to see if the problem persists in the
latest kernel?

http://elrepo.org/people/ajb/devel/kernel-ml/el7/x86_64/RPMS/

Akemi