On Mon, May 15, 2017 at 11:28 AM, Mike Belopuhov <m...@belopuhov.com> wrote:

> On Mon, May 15, 2017 at 11:18 -0400, Dan Cross wrote:
> > On Mon, May 15, 2017 at 11:01 AM, Mike Belopuhov <m...@belopuhov.com> wrote:
> > >
> > > Thanks for reporting this, however there's not enough info to follow
> > > up on this right now. What is clear is that your provider is using
> > > an ancient version of Xen that doesn't even support the callback
> > > vector interrupt delivery (the emulated xspd0 device is delivering
> > > all interrupts). We have developed code for Xen 4.5+ platforms and
> > > there was only some testing done by users on 3.x. So, in a way, you
> > > can consider Xen 3.x to not be officially supported at this point.
> >
> > That's unfortunate. Sadly, this is common across two different providers
> > (Panix and rootbsd.net). The latter, I'm sure, would at least be
> > interested in coordinating with you guys to get a fix. I'll open a
> > trouble ticket with them.
> >
> > > Having said that, I've got a few questions:
> > >
> > > - Do you see other write failures as well?
> >
> > Yes. E.g, syslogd had a similar write failure before panic.
>
> Can you reproduce any of these write failures at will?

I'm not sure what you mean. If I induce the load conditions, then the VM
will panic fairly reliably.

> What happens when you just send a signal to dump the core?
> You can test this by running "sleep 100", and then call
> "pkill -ABRT -lf sleep".

I'm not sure what this shows, but sure, I can do that:

: jaan; /bin/sleep 100&
[1] 20701
: jaan; pkill -ABRT -lf sleep
20701 sleep
: jaan;
[1] + abort (core dumped) /bin/sleep 100
: jaan; ls -l sleep.core
-rw------- 1 cross staff 4208416 May 15 15:42 sleep.core
: jaan;

The panic-inducing condition seems to be that, for whatever reason, the
kernel gets into a funny state where processes like init(8) die due to
having part of their VM image corrupted; the kernel then panics because
`init` dies.

> > > - Do you have swap enabled? (pstat -s)
> >
> > Yes; a gig:
> >
> > : jaan; pstat -s
> > Device 1K-blocks Used Avail Capacity Priority
> > /dev/sd0b 1048249 0 1048249 0% 0
> > : jaan;
>
> Do you see swap being used under your load?

I'm not sure. I can try and crash a machine again and poke at a kernel
var from ddb to see; anything in particular you want me to look at?

> > > - Do you see crashes when bsd.mp is used instead of a single processor
> > > kernel (that's right, even on the single processor VM)?
> >
> > Yes; the panic happens whether using single- or multi-processor kernels.
>
> Good, nothing has slipped through those cracks again.

I can see the value in narrowing down the search space. :-)

- Dan C.