Re: Panic on 11-STABLE with Xen guest

2018-11-26 Thread Roger Pau Monné
On Mon, Nov 26, 2018 at 10:31:43AM -0800, John Baldwin wrote:
> On 11/22/18 12:39 PM, Joe Clarke wrote:
> > I believe after the commit 340016 for the dynamic IRQ layout, my Xen VM
> > started to panic.  I just upgraded the kernel today and saw this:
> > 
> > xen: unable to map IRQ#2
> > panic: Unable to register interrupt override
> > cpuid = 0
> > KDB: stack backtrace:
> > #0 0x8060a4e7 at kdb_backtrace+0x67
> > #1 0x805c3787 at vpanic+0x177
> > #2 0x805c3603 at panic+0x43
> > #3 0x8093a766 at madt_parse_ints+0x96
> > #4 0x803353f9 at acpi_walk_subtables+0x29
> > #5 0x8093a5e6 at xenpv_register_pirqs+0x56
> > #6 0x80928296 at intr_init_sources+0x116
> > #7 0x8055eba8 at mi_startup+0x118
> > #8 0x8029902c at btext+0x2c
> > 
> > The following kernel works:
> > 
> > @(#)FreeBSD 11.2-STABLE #4: Thu Nov  1 02:24:07 EDT 2018
> > FreeBSD 11.2-STABLE #4: Thu Nov  1 02:24:07 EDT 2018
> > root@creme-brulee:/usr/obj/usr/src/sys/CREME-BRULEE
> > 
> > The following kernel produces the panic above immediately on boot:
> > 
> > @(#)FreeBSD 11.2-STABLE #5: Wed Nov 21 11:08:38 EST 2018
> > FreeBSD 11.2-STABLE #5: Wed Nov 21 11:08:38 EST 2018
> > root@creme-brulee:/usr/obj/usr/src/sys/CREME-BRULEE
> > 
> > Attached is a screen grab of the console of the panic.
> 
> Hmm, I don't see any obvious candidates of Xen changes that weren't included
> in the MFC.  I've added royger@ (who maintains Xen in FreeBSD) to the cc to
> see if he has an idea.
> 
> Roger, the main changes that aren't MFC'd to 11 from 12/head seem to be some
> refcounting on event channels and PVHv2 vs PVHv1?

Sorry, I seem to have missed the thread on stable@ and only replied to
the people that have CC'ed me. The issue is already fixed, see:

https://svnweb.freebsd.org/base?view=revision&revision=340982

Thanks, Roger.
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: Boot hang on Xen after r318347/(310418)

2017-05-25 Thread Roger Pau Monné
On Wed, May 24, 2017 at 06:33:07PM -0400, Adam McDougall wrote:
> Hello,
> 
> Recently I made a new build of 11-STABLE but encountered a boot hang
> at this state:
> http://www.egr.msu.edu/~mcdouga9/pics/r318347-smp-hang.png
> 
> It is easy to reproduce: I can just boot from any 11 or 12 ISO that
> contains the commit.

I have just tested latest HEAD (r318861) and stable/11 (r318854) and
they both work fine on my environment (a VM with 4 vCPUs and 2GB of
RAM on OSS Xen 4.9). I'm also adding Colin in case he has some input,
he has been doing some tests on HEAD and AFAIK he hasn't seen any
issues.

> I compiled various svn revisions to confirm that r318347 caused the 
> issue and r318346 is fine. With r318347 or later including the latest 
> 11-STABLE, the system will only boot with one virtual CPU in XenServer. 
> Any more cpus and it hangs. I also tried a 12 kernel from head this 
> afternoon and I have the same hang. I had this issue on XenServer 7 
> (Xen 4.7) and XenServer 6.5 (Xen 4.4). I did most of my testing on 7. I 
> also did much of my testing with a GENERIC kernel to try to rule out 
> kernel configuration mistakes. When it hangs, the performance 
> monitoring in Xen tells me at least one CPU is pegged. r318674 boots 
> fine on physical hardware without Xen involved.
> 
> Looking at r318347, which mentions EARLY_AP_STARTUP, and later seeing
> r318763, which enables EARLY_AP_STARTUP in GENERIC, I tried adding it to
> my kernel, but that turned the hang into a panic that occurs with any
> number of CPUs:
> http://www.egr.msu.edu/~mcdouga9/pics/r318347-early-ap-startup-panic.png

I guess this is on stable/11, right? The panic looks easier to debug
than the hang, so let's start with that one. Can you enable the serial
console and the kernel debug options in order to get a trace? With just
the screenshot it's almost impossible to know what went wrong.
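
For reference, a minimal sketch of the loader settings that usually send
the FreeBSD console to the first serial port. It assumes the guest has an
emulated serial port and that 115200 baud matches whatever is attached on
the host side; both are assumptions, not details taken from the report:

```
# /boot/loader.conf
boot_multicons="YES"              # use both video and serial consoles
boot_serial="YES"                 # prefer the serial console
comconsole_speed="115200"         # assumed speed, adjust to your setup
console="comconsole,vidconsole"   # serial first, video second
```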

If you still have that kernel around (and its debug symbols), can you
run:

$ addr2line -e /usr/lib/debug/boot/kernel/kernel.debug 0x80793344

(The address is the instruction pointer on the crash image, I think I
got it right)
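
A small wrapper along these lines can make the lookup less error-prone.
It is only a sketch: the function name resolve_kernel_addr is made up for
illustration, and it assumes an addr2line that accepts -f (print the
function name) and -e (select the file).

```shell
# Hypothetical helper: map a kernel address to its function and file:line.
resolve_kernel_addr() {
    debug_file="$1"   # e.g. /usr/lib/debug/boot/kernel/kernel.debug
    addr="$2"         # e.g. 0x80793344

    # Fail early with a clear message if the debug symbols are missing.
    if [ ! -r "$debug_file" ]; then
        echo "cannot read debug file: $debug_file" >&2
        return 1
    fi

    # -f prints the enclosing function name before the file:line pair.
    addr2line -f -e "$debug_file" "$addr"
}

# Usage (same file and address as above):
#   resolve_kernel_addr /usr/lib/debug/boot/kernel/kernel.debug 0x80793344
```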

In order to compile a stable/11 kernel with full debugging support you
will have to add:

# For full debugger support use (turn off in stable branch):
options         BUF_TRACKING            # Track buffer history
options         DDB                     # Support DDB.
options         FULL_BUF_TRACKING       # Track more buffer history
options         GDB                     # Support remote GDB.
options         DEADLKRES               # Enable the deadlock resolver
options         INVARIANTS              # Enable calls of extra sanity checking
options         INVARIANT_SUPPORT       # Extra sanity checks of internal structures, required by INVARIANTS
options         WITNESS                 # Enable checks to detect deadlocks and cycles
options         WITNESS_SKIPSPIN        # Don't run witness on spinlocks for speed
options         MALLOC_DEBUG_MAXZONES=8 # Separate malloc(9) zones

to your kernel config file.
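
One way to do that without editing GENERIC itself is to generate a small
config that includes it. The following is a sketch, not a tested recipe:
the helper name write_debug_kernconf and the ident XENDEBUG are made up
for illustration, and the paths in the usage comment assume an amd64
source tree under /usr/src.

```shell
# Hypothetical helper: write a debug kernel config that extends GENERIC.
write_debug_kernconf() {
    out="$1"   # destination file, e.g. /usr/src/sys/amd64/conf/XENDEBUG

    # 'include' pulls in GENERIC; the options are the debug list above.
    cat > "$out" <<'EOF'
include GENERIC
ident   XENDEBUG

options BUF_TRACKING
options DDB
options FULL_BUF_TRACKING
options GDB
options DEADLKRES
options INVARIANTS
options INVARIANT_SUPPORT
options WITNESS
options WITNESS_SKIPSPIN
options MALLOC_DEBUG_MAXZONES=8
EOF
}

# Usage (assumed paths and config name):
#   write_debug_kernconf /usr/src/sys/amd64/conf/XENDEBUG
#   cd /usr/src && make buildkernel KERNCONF=XENDEBUG
#   make installkernel KERNCONF=XENDEBUG
```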

Just to be sure, this is an amd64 kernel, right?

Roger.


Re: FreeBSD 11 Stable on a Xen :: bridge0 crashing server

2017-01-20 Thread Roger Pau Monné
On Fri, Jan 20, 2017 at 07:20:15PM +0300, Andrey V. Elsukov wrote:
> On 20.01.2017 18:57, Trond Endrestøl wrote:
> > > Here is the situation:
> > > I have a VPS server from a well reputed provider (and they deserve the
> > > reputation), running FreeBSD 11 stable x64 under Xen Full Virtualization
> > > (HVM). I have the xn0 interface which is working fine. I intend to use
> > > VIMAGE, so I compiled the kernel, added cloned_interfaces="bridge0" and
> > > restarted the server. But as soon as I attach xn0 to bridge0, the kernel
> > > panics and the server restarts.
> > > Any suggestion/pointer/test-instruction is highly appreciated.
> > 
> > The code crashes at line 427 of sys/netinet/if_ether.c:
> > 
> >   ARPSTAT_INC(txrequests);
> > 
> > See
> > https://svnweb.freebsd.org/base/stable/11/sys/netinet/if_ether.c?view=annotate#l427
> > 
> > stable/11 has problems accounting the outgoing octets of any xn
> > interface, although this isn't connected to your case.
> > 
> > Just to rule out any uncertainty, try this patch:
> > 
> > https://svnweb.freebsd.org/base?view=revision&revision=308126
> > 
> > See PR 213439,
> > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=213439
> > 
> > Note, I'm not a developer nor a committer, just a humble sysadmin.
> 
> This problem is unrelated. ARP statistics are global and aren't tied to
> any specific interface. IMHO, the kernel panics due to a missing VNET
> context. As far as I can see from the code in sys/dev/xen, it is not
> compatible with VIMAGE.

I cannot really look into this right now due to lack of time, but I'm more than
happy to review/apply patches in order to fix this.

Roger.


Re: XEN kernel broken with mlx5*

2015-12-14 Thread Roger Pau Monné
El 14/12/15 a les 15.35, NGie Cooper ha escrit:
> Hi HPS,
>   It seems that XEN.i386 is broken with the mlx5* module due to a 
> mismatch in function definitions. Some refactoring took place on head that 
> appears to have “fixed” this issue. Could you or Roger please fix it?

No refactoring took place on HEAD; the XEN.i386 kernel was simply
removed. AFAICT adding the affected drivers to the WITHOUT_MODULES
list in the XEN kernel config file should solve it.
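
As a sketch, the change would look roughly like this. The file path and
the module names mlx5 and mlx5en are assumptions based on the driver name
in the subject, and the "..." stands for whatever the existing list
already contains, which is not reproduced here:

```
# sys/i386/conf/XEN (fragment, assumed path)
makeoptions     WITHOUT_MODULES="... mlx5 mlx5en"
```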

Roger.
