Re: Xen Domu kernel crash at start of boot

2018-06-22 Thread Jaromír Doleček
2018-06-22 2:54 GMT+02:00 Chuck Zmudzinski :
> I am getting a kernel crash almost immediately after booting the current
> kernel. I am running NetBSD/xen amd64 on a Debian Linux 8.10 DOM0 which uses
> Xen-4.4. Last week's kernel was good. I built a kernel from a cvs update a
> couple of days ago and tried it. It crashed. I tried the most recent daily
> snapshot available from NetBSD daily builds. It crashed too. Here is the
> information from the console about the daily snapshot kernel that crashed
> (it was built earlier today):
> [   1.000] vcpu0: Intel(R) Core(TM) i5-4590S CPU @ 3.00GHz, id 0x306c3
> [   1.000] vcpu0: package 0, core 3, smt 0
> [   1.000] vcpu1 at hypervisor0
> [   1.000] vcpu1: Intel(R) Core(TM) i5-4590S CPU @ 3.00GHz, id 0x306c3
> [   1.000] vcpu1: package 0, core 3, smt 0
> [   1.000] xenbus0 at hypervisor0: Xen Virtual Bus Interface
> [   1.000] xencons0 at hypervisor0: Xen Virtual Console Driver
> [   1.030] fatal protection fault in supervisor mode
> [   1.030] trap type 4 code 0 rip 0x80205968 cs 0x1e030 rflags 0x10046 cr2 0 ilevel 0 rsp 0xa000a570fbf0
> [   1.030] curlwp 0xa642d4a0 pid 0.15 lowest kstack 0xa000a570b2c0
> kernel: protection fault trap, code=0
> Stopped in pid 0.15 (system) at 80205968:   fxsavel

That would be my fault.

Can you send me the moral equivalent of "cpuctl identify 0" from the DOM0?
I want to know what CPUID is saying about supported features on the
CPU.

Can you also check whether you use the no-xsave flag for your DOM0 by
chance? It should not be needed on Intel CPUs.
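
Since the DOM0 in this report runs Linux rather than NetBSD, a small user-space program can stand in for cpuctl. The sketch below reads CPUID leaf 1 and reports the XSAVE (ECX bit 26) and OSXSAVE (ECX bit 27) bits. It assumes GCC or Clang on x86 with `<cpuid.h>`; the helper names are illustrative and not taken from NetBSD or Xen, and the output only reflects what the CPUID instruction returns in the environment where the program runs.

```c
#include <cpuid.h>   /* GCC/Clang wrapper for the CPUID instruction */
#include <stdint.h>
#include <stdio.h>

/* CPUID leaf 1, ECX feature bits. */
#define ECX_XSAVE    (UINT32_C(1) << 26)  /* CPU implements XSAVE/XRSTOR */
#define ECX_OSXSAVE  (UINT32_C(1) << 27)  /* OS has set CR4.OSXSAVE */

/* Illustrative helpers: test individual bits of a leaf-1 ECX value. */
static int has_xsave(uint32_t ecx)   { return (ecx & ECX_XSAVE) != 0; }
static int has_osxsave(uint32_t ecx) { return (ecx & ECX_OSXSAVE) != 0; }

/* Query leaf 1 and print both bits; returns 0 on success, -1 if unsupported. */
static int report_xsave_state(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return -1;
    printf("cpuid.1: eax=0x%x ecx=0x%08x xsave=%d osxsave=%d\n",
           eax, ecx, has_xsave(ecx), has_osxsave(ecx));
    return 0;
}
```

Calling `report_xsave_state()` from a `main` function prints one line; the `eax` value is the same family/model/stepping id that the DomU boot log shows after "id".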

Meanwhile this change in sys/arch/xen/x86/cpu.c can be done to avoid this:

@@ -551,7 +551,7 @@ cpu_init(struct cpu_info *ci)
  * does, here we only set CR4_OSXSAVE if the feature is already
  * enabled according to CPUID.
  */
- if (cpu_feature[1] & CPUID2_OSXSAVE)
+ if (0 && cpu_feature[1] & CPUID2_OSXSAVE)
  cr4 |= CR4_OSXSAVE;
  else {
  x86_xsave_features = 0;
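
For readers following the workaround: `&` binds tighter than `&&`, so the patched condition parses as `0 && (cpu_feature[1] & CPUID2_OSXSAVE)` and is always false, which forces the `else` branch. Below is a minimal user-space model of that decision. The names `model_cpu_init` and `struct model_state` are stand-ins, not the kernel's, and only the one `else` statement visible in the hunk is modelled.

```c
#include <stdint.h>

#define CPUID2_OSXSAVE UINT32_C(0x08000000) /* CPUID.1:ECX bit 27 */
#define CR4_OSXSAVE    UINT64_C(0x00040000) /* CR4 bit 18 */

/* Stand-in for the kernel state touched by the hunk above. */
struct model_state {
    uint64_t cr4;
    uint64_t xsave_features;
};

/*
 * Model of the patched branch. workaround != 0 mimics the "0 &&" edit:
 * CR4_OSXSAVE is never requested and the XSAVE feature mask is cleared.
 */
static void model_cpu_init(struct model_state *s, uint32_t feature_ecx,
                           int workaround)
{
    if (!workaround && (feature_ecx & CPUID2_OSXSAVE))
        s->cr4 |= CR4_OSXSAVE;
    else
        s->xsave_features = 0;
}
```

With the workaround enabled the result no longer depends on what CPUID reports, which is why it sidesteps the crash regardless of what the hypervisor advertises.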

The change was tested on Xen 4.2 and Xen 4.8; I wonder if Xen 4.4 has
yet another quirk. Any chance you could try again with your DOM0 updated
to a newer Xen?

Jaromir


Re: Xen Domu kernel crash at start of boot

2018-06-21 Thread Paul Goyette

> Stopped in pid 0.15 (system) at 80205968:   fxsavel

I seem to recall some recent changes related to save/restore of FPU
state.

On Thu, 21 Jun 2018, Chuck Zmudzinski wrote:

I am getting a kernel crash almost immediately after booting the current 
kernel. I am running NetBSD/xen amd64 on a Debian Linux 8.10 DOM0 which uses 
Xen-4.4. Last week's kernel was good. I built a kernel from a cvs update a 
couple of days ago and tried it. It crashed. I tried the most recent daily 
snapshot available from NetBSD daily builds. It crashed too. Here is the 
information from the console about the daily snapshot kernel that crashed (it 
was built earlier today):


[   1.000] NetBSD 8.99.19 (XEN3_DOMU) #0: Thu Jun 21 11:48:05 UTC 2018
[   1.000] mkre...@mkrepro.netbsd.org:/usr/src/sys/arch/xen/compile/XEN3_DOMU


Here is the information I got from the console about the crash:

[   1.000] total memory = 3072 MB
[   1.000] avail memory = 2963 MB
[   1.000] cpu_rng: RDRAND
[   1.000] running cgd selftest aes-xts-256 aes-xts-512 done
[   1.000] mainbus0 (root)
[   1.000] hypervisor0 at mainbus0: Xen version 4.4.1
[   1.000] vcpu0 at hypervisor0
[   1.000] vcpu0: Intel(R) Core(TM) i5-4590S CPU @ 3.00GHz, id 0x306c3
[   1.000] vcpu0: package 0, core 3, smt 0
[   1.000] vcpu1 at hypervisor0
[   1.000] vcpu1: Intel(R) Core(TM) i5-4590S CPU @ 3.00GHz, id 0x306c3
[   1.000] vcpu1: package 0, core 3, smt 0
[   1.000] xenbus0 at hypervisor0: Xen Virtual Bus Interface
[   1.000] xencons0 at hypervisor0: Xen Virtual Console Driver
[   1.030] fatal protection fault in supervisor mode
[   1.030] trap type 4 code 0 rip 0x80205968 cs 0x1e030 rflags 0x10046 cr2 0 ilevel 0 rsp 0xa000a570fbf0
[   1.030] curlwp 0xa642d4a0 pid 0.15 lowest kstack 0xa000a570b2c0

kernel: protection fault trap, code=0
Stopped in pid 0.15 (system) at 80205968:   fxsavel
ds  0
es  0
fs  fd00
gs  0
rdi a000a570fbf8
rsi 1
rbp a000a570fe68
rbx 0
rdx 2
rcx 0
rax 0
r8  a668
r9  cccd
r10 64
r11 0
r12 a000a570fbf8
r13 1
r14 0
r15 0
rip 80205968
cs  e030
rflags  10046
rsp a000a570fbf0
ss  e02b
80205968:   fxsavel
db{1}> bt
?() at 80205968
?() at 802313f0
?() at 8023155e

I could not get any useful info from addr2line. Any ideas why it crashes?

Thanks,

Chuck Zmudzinski



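+------------------+--------------------------+----------------------------+
| Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:          |
| (Retired)        | FA29 0E3B 35AF E8AE 6651 | paul at whooppee dot com   |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd dot org |
+------------------+--------------------------+----------------------------+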

Re: Xen Domu kernel crash at start of boot

2018-06-21 Thread Chuck Zmudzinski

The next few lines of the working kernel from last week:

Jun 21 15:38:33 ave /netbsd: [   1.000] xencons0: console major 143, unit 0
Jun 21 15:38:33 ave /netbsd: [   1.000] xencons0: using event channel 2
Jun 21 15:38:33 ave /netbsd: [   1.000] timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
Jun 21 15:38:33 ave /netbsd: [   1.030] timecounter: Timecounter "xen_system_time" frequency 10 Hz quality 10


It looks like it panics when configuring the Xen console. I don't have
time to investigate this now. Maybe next week if it's not fixed by then...
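
The comparison used here (take the last message the crashing kernel printed, then look at what the working kernel printed next) can be sketched as a small C helper. It assumes each log has already been read into an array of unwrapped lines and that the message text follows the first "] " on each line; the function names are illustrative.

```c
#include <stddef.h>
#include <string.h>

/* Return the message text after the "[ time] " stamp, or the whole line. */
static const char *msg_text(const char *line)
{
    const char *p = strstr(line, "] ");
    return p ? p + 2 : line;
}

/*
 * Find the last line of the crashing log in the working log and return the
 * working-log line that follows it, or NULL if there is no match or no
 * following line.
 */
static const char *next_good_line(const char *const *good, size_t ngood,
                                  const char *last_bad)
{
    const char *want = msg_text(last_bad);

    for (size_t i = 0; i + 1 < ngood; i++)
        if (strcmp(msg_text(good[i]), want) == 0)
            return good[i + 1];
    return NULL;
}
```

Stripping the prefix lets a syslog-style line from the working kernel match a bare console line from the crashing one. Trailing whitespace is not trimmed, so lines need to be cleaned up before comparison.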


On 06/21/2018 09:06 PM, Greg Troxel wrote:

you might see what the first line not printed is, in other words, what
did the working kernel print next?
And the usual bisecting