On Mon, Jan 13, 2020 at 02:49:52PM +0000, Andrew Doran wrote:
> > Now I get a different panic:
> > [ 1.0000000] vcpu0 at hypervisor0
> > [ 1.0000000] vcpu0: 64 page colors
> > [ 1.0000000] vcpu0: Intel(R) Core(TM)2 Duo CPU E6550 @ 2.33GHz, id 0x6fb
> > [ 1.0000000] vcpu0: node 0, package 0, core 1, smt 0
> > [ 1.0000000] vcpu1 at hypervisor0
> > [ 1.0000000] vcpu1: 2 page colors
> > [ 1.0000000] vcpu1: starting
> > [ 1.0000000] vcpu1: is started.
> > [ 1.0000000] vcpu1: Intel(R) Core(TM)2 Duo CPU E6550 @ 2.33GHz, id 0x6fb
> > [ 1.0000000] vcpu1: node 0, package 0, core 0, smt 0
> > [...]
> > [ 1.0000030] UVM: using package allocation scheme, 1 package(s) per bucket
> > [ 1.0000030] Xen vcpu1 clock: using event channel 7
> > [ 1.8809493] vcpu1: running
> > [ 1.8809493] panic: kernel diagnostic assertion "prev != NULL" failed: file "/dsk/l1/misc/bouyer/HEAD/clean/src/sys/kern/kern_lwp.c", line 1021
> > [ 1.8809493] cpu1: Begin traceback...
> > [ 1.8809493] vpanic(c057f868,d77abf74,d77abf98,c03cc3e5,c057f868,c057f802,c05b0f71,c05b0ce4,3fd,0) at netbsd:vpanic+0x134
> > [ 1.8809493] kern_assert(c057f868,c057f802,c05b0f71,c05b0ce4,3fd,0,0,0,c13a6900,c03c60c0) at netbsd:kern_assert+0x23
> > [ 1.8809493] lwp_startup(0,c13a6900,8b1000,c0674200,0,c010007a,0,0,0,0) at netbsd:lwp_startup+0x155
> > [ 1.8809493] cpu1: End traceback...
> >
> > If I remove the call to cpu_switchto() in cpu_hatch() it boots, but it seems
> > that all user processes are running on cpu0 only ...
>
> I looked and the only thing cpu_switchto() is doing there is setting curlwp,
> but that's already set in cpu_start_secondary(), so it's not needed.

It also sets rsp and rbp. I think rbp is not set by anything else, at least
in the Xen case. The different rbp value would explain why in one case
we hit a KASSERT() in lwp_startup later.
But I don't know what pcb_rbp contains; I couldn't find where the pcb for
idlelwp is initialized.

> > I can't see what extra work the cpu_switchto() could be doing that would
> > matter, except maybe the %ebp/rbp init. Any idea?
>
> Right I don't think cpu_switchto() matters there. The strategy for
> assigning LWPs to CPUs in the scheduler has changed. If the machine is not
> busy everything is likely to stay on CPU0. Are you putting much load on it?

I just tried a build.sh -j4.
CPU0 is 100% busy, the others are 100% idle:

load averages: 3.02, 2.14, 1.26; up 0+00:51:59 16:59:03
61 processes: 5 runnable, 54 sleeping, 2 on CPU
CPU0 states: 39.3% user,  0.0% nice, 60.7% system,  0.0% interrupt,  0.0% idle
CPU1 states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU2 states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU3 states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Memory: 1402M Act, 168K Inact, 16K Wired, 14M Exec, 1352M File, 1932M Free
Swap:

  PID USERNAME PRI NICE   SIZE   RES STATE       TIME   WCPU    CPU COMMAND
21392 bouyer    33    0    29M 5964K RUN/0       0:00  2.00%  0.10% as
    0 root       0    0     0K   11M CPU/3       0:30  0.00%  0.00% [system]
   81 bouyer    85    0    20M 3596K kqueue/0    0:19  0.00%  0.00% tmux
  226 bouyer    43    0    16M 1900K CPU/0       0:00  0.00%  0.00% top
16883 bouyer    33    0  8992K 2212K RUN/0       0:00  0.00%  0.00% nbmake
21137 bouyer    33    0  7844K 1220K RUN/0       0:00  0.00%  0.00% sed
12098 bouyer    33    0  4288K  164K RUN/0       0:00  0.00%  0.00% sh
22411 bouyer    33    0  4288K  164K RUN/0       0:00  0.00%  0.00% cc
   42 root      85    0    80M 5768K poll/0      0:00  0.00%  0.00% sshd

-- 
Manuel Bouyer <bou...@antioche.eu.org>
     NetBSD: 26 ans d'experience feront toujours la difference