Re: Commit 367705+367706 causes a pabic

2020-11-20 Thread Kristof Provost

I still can’t reproduce that panic.

Does it happen immediately after you start a vnet jail?

Does it also happen with a GENERIC kernel?

Regards,
Kristof

On 20 Nov 2020, at 14:53, Peter Blok wrote:

The panic with ipsec code in the backtrace was already very strange. I 
was using IPsec, but only on one interface totally separate from the 
members of the bridge as well as the bridge itself. The jails were not 
doing any ipsec as well. Note that panic was a while ago and it was 
after the 1st bridge epochification was done on stable-12 which was 
later backed out


Today the system is no longer using ipsec, but it is still compiled 
in. I can remove it if need be for a test



src.conf
WITHOUT_KERBEROS=yes
WITHOUT_GSSAPI=yes
WITHOUT_SENDMAIL=true
WITHOUT_MAILWRAPPER=true
WITHOUT_DMAGENT=true
WITHOUT_GAMES=true
WITHOUT_IPFILTER=true
WITHOUT_UNBOUND=true
WITHOUT_PROFILE=true
WITHOUT_ATM=true
WITHOUT_BSNMP=true
#WITHOUT_CROSS_COMPILER=true
WITHOUT_DEBUG_FILES=true
WITHOUT_DICT=true
WITHOUT_FLOPPY=true
WITHOUT_HTML=true
WITHOUT_HYPERV=true
WITHOUT_NDIS=true
WITHOUT_NIS=true
WITHOUT_PPP=true
WITHOUT_TALK=true
WITHOUT_TESTS=true
WITHOUT_WIRELESS=true
#WITHOUT_LIB32=true
WITHOUT_LPR=true

make.conf
KERNCONF=BHYVE
MODULES_OVERRIDE=opensolaris dtrace zfs vmm nmdm if_bridge bridgestp 
if_vxlan pflog libmchain libiconv smbfs linux linux64 linux_common 
linuxkpi linprocfs linsysfs ext2fs

DEFAULT_VERSIONS+=perl5=5.30 mysql=5.7 python=3.8 python3=3.8
OPTIONS_UNSET=DOCS NLS MANPAGES

BHYVE
cpu HAMMER
ident   BHYVE

makeoptions DEBUG=-g# Build kernel with gdb(1) debug symbols
makeoptions WITH_CTF=1  # Run ctfconvert(1) for DTrace support

options CAMDEBUG

options SCHED_ULE   # ULE scheduler
options PREEMPTION  # Enable kernel thread preemption
options INET# InterNETworking
options INET6   # IPv6 communications protocols
options IPSEC
options TCP_OFFLOAD # TCP offload
options TCP_RFC7413 # TCP FASTOPEN
options SCTP# Stream Control Transmission Protocol
options FFS # Berkeley Fast Filesystem
options SOFTUPDATES # Enable FFS soft updates support
options UFS_ACL # Support for access control lists
options UFS_DIRHASH # Improve performance on big directories
options UFS_GJOURNAL# Enable gjournal-based UFS journaling
options QUOTA   # Enable disk quotas for UFS
options SUIDDIR
options NFSCL   # Network Filesystem Client
options NFSD# Network Filesystem Server
options NFSLOCKD# Network Lock Manager
options MSDOSFS # MSDOS Filesystem
options CD9660  # ISO 9660 Filesystem
options FUSEFS
options NULLFS  # NULL filesystem
options UNIONFS
options FDESCFS # File descriptor filesystem
options PROCFS  # Process filesystem (requires PSEUDOFS)
options PSEUDOFS# Pseudo-filesystem framework
options GEOM_PART_GPT   # GUID Partition Tables.
options GEOM_RAID   # Soft RAID functionality.
options GEOM_LABEL  # Provides labelization
options GEOM_ELI# Disk encryption.
options COMPAT_FREEBSD32# Compatible with i386 binaries
options COMPAT_FREEBSD4 # Compatible with FreeBSD4
options COMPAT_FREEBSD5 # Compatible with FreeBSD5
options COMPAT_FREEBSD6 # Compatible with FreeBSD6
options COMPAT_FREEBSD7 # Compatible with FreeBSD7
options COMPAT_FREEBSD9 # Compatible with FreeBSD9
options COMPAT_FREEBSD10# Compatible with FreeBSD10
options COMPAT_FREEBSD11# Compatible with FreeBSD11
options SCSI_DELAY=5000 # Delay (in ms) before probing SCSI
options KTRACE  # ktrace(1) support
options STACK   # stack(9) support
options SYSVSHM # SYSV-style shared memory
options SYSVMSG # SYSV-style message queues
options SYSVSEM # SYSV-style semaphores
options 	_KPOSIX_PRIORITY_SCHEDULING # POSIX P1003_1B real-time 
extensions
options 	PRINTF_BUFR_SIZE=128	# Prevent printf output being 
interspersed.

options KBD_INSTALL_CDEV# install a CDEV entry in /dev
options HWPMC_HOOKS # Necessary kernel hooks for hwpmc(4)
options AUDIT   # Security event auditing
options CAPABILITY_MODE # Capsicum capability mode
options CAPABILITIES# Capsicum 

Re: Commit 367705+367706 causes a pabic

2020-11-20 Thread Peter Blok
The panic with ipsec code in the backtrace was already very strange. I was 
using IPsec, but only on one interface totally separate from the members of the 
bridge as well as the bridge itself. The jails were not doing any ipsec as 
well. Note that panic was a while ago and it was after the 1st bridge 
epochification was done on stable-12 which was later backed out

Today the system is no longer using ipsec, but it is still compiled in. I can 
remove it if need be for a test


src.conf
WITHOUT_KERBEROS=yes
WITHOUT_GSSAPI=yes
WITHOUT_SENDMAIL=true
WITHOUT_MAILWRAPPER=true
WITHOUT_DMAGENT=true
WITHOUT_GAMES=true
WITHOUT_IPFILTER=true
WITHOUT_UNBOUND=true
WITHOUT_PROFILE=true
WITHOUT_ATM=true
WITHOUT_BSNMP=true
#WITHOUT_CROSS_COMPILER=true
WITHOUT_DEBUG_FILES=true
WITHOUT_DICT=true
WITHOUT_FLOPPY=true
WITHOUT_HTML=true
WITHOUT_HYPERV=true
WITHOUT_NDIS=true
WITHOUT_NIS=true
WITHOUT_PPP=true
WITHOUT_TALK=true
WITHOUT_TESTS=true
WITHOUT_WIRELESS=true
#WITHOUT_LIB32=true
WITHOUT_LPR=true

make.conf
KERNCONF=BHYVE
MODULES_OVERRIDE=opensolaris dtrace zfs vmm nmdm if_bridge bridgestp if_vxlan 
pflog libmchain libiconv smbfs linux linux64 linux_common linuxkpi linprocfs 
linsysfs ext2fs
DEFAULT_VERSIONS+=perl5=5.30 mysql=5.7 python=3.8 python3=3.8
OPTIONS_UNSET=DOCS NLS MANPAGES

BHYVE
cpu HAMMER
ident   BHYVE

makeoptions DEBUG=-g# Build kernel with gdb(1) debug symbols
makeoptions WITH_CTF=1  # Run ctfconvert(1) for DTrace support

options CAMDEBUG

options SCHED_ULE   # ULE scheduler
options PREEMPTION  # Enable kernel thread preemption
options INET# InterNETworking
options INET6   # IPv6 communications protocols
options IPSEC
options TCP_OFFLOAD # TCP offload
options TCP_RFC7413 # TCP FASTOPEN
options SCTP# Stream Control Transmission Protocol
options FFS # Berkeley Fast Filesystem
options SOFTUPDATES # Enable FFS soft updates support
options UFS_ACL # Support for access control lists
options UFS_DIRHASH # Improve performance on big directories
options UFS_GJOURNAL# Enable gjournal-based UFS journaling
options QUOTA   # Enable disk quotas for UFS
options SUIDDIR
options NFSCL   # Network Filesystem Client
options NFSD# Network Filesystem Server
options NFSLOCKD# Network Lock Manager
options MSDOSFS # MSDOS Filesystem
options CD9660  # ISO 9660 Filesystem
options FUSEFS
options NULLFS  # NULL filesystem
options UNIONFS
options FDESCFS # File descriptor filesystem
options PROCFS  # Process filesystem (requires PSEUDOFS)
options PSEUDOFS# Pseudo-filesystem framework
options GEOM_PART_GPT   # GUID Partition Tables.
options GEOM_RAID   # Soft RAID functionality.
options GEOM_LABEL  # Provides labelization
options GEOM_ELI# Disk encryption.
options COMPAT_FREEBSD32# Compatible with i386 binaries
options COMPAT_FREEBSD4 # Compatible with FreeBSD4
options COMPAT_FREEBSD5 # Compatible with FreeBSD5
options COMPAT_FREEBSD6 # Compatible with FreeBSD6
options COMPAT_FREEBSD7 # Compatible with FreeBSD7
options COMPAT_FREEBSD9 # Compatible with FreeBSD9
options COMPAT_FREEBSD10# Compatible with FreeBSD10
options COMPAT_FREEBSD11# Compatible with FreeBSD11
options SCSI_DELAY=5000 # Delay (in ms) before probing SCSI
options KTRACE  # ktrace(1) support
options STACK   # stack(9) support
options SYSVSHM # SYSV-style shared memory
options SYSVMSG # SYSV-style message queues
options SYSVSEM # SYSV-style semaphores
options _KPOSIX_PRIORITY_SCHEDULING # POSIX P1003_1B real-time 
extensions
options PRINTF_BUFR_SIZE=128# Prevent printf output being 
interspersed.
options KBD_INSTALL_CDEV# install a CDEV entry in /dev
options HWPMC_HOOKS # Necessary kernel hooks for hwpmc(4)
options AUDIT   # Security event auditing
options CAPABILITY_MODE # Capsicum capability mode
options CAPABILITIES# Capsicum capabilities
options MAC # TrustedBSD MAC Framework
options MAC_PORTACL
options MAC_NTPD
options KDTRACE_FRAME   # Ensure frames are 

Re: Commit 367705+367706 causes a pabic

2020-11-20 Thread Kristof Provost
Can you share your kernel config file (and src.conf / make.conf if they 
exist)?


This second panic is in the IPSec code. My current thinking is that your 
kernel config is triggering a bug that’s manifesting in multiple 
places, but not actually caused by those places.


I’d like to be able to reproduce it so we can debug it.

Best regards,
Kristof

On 20 Nov 2020, at 12:02, Peter Blok wrote:

Hi Kristof,

This is 12-stable. With the previous bridge epochification that was 
backed out my config had a panic too.


I don’t have any local modifications. I did a clean rebuild after 
removing /usr/obj/usr


My kernel is custom - I only have zfs.ko, opensolaris.ko, vmm.ko and 
nmdm.ko as modules. Everything else is statically linked. I have 
removed all drivers not needed for the hardware at hand.


My bridge is between two vlans from the same trunk and the jail epair 
devices as well as the bhyve tap devices.


The panic happens when the jails are starting.

I can try to narrow it down over the weekend and make the crash dump 
available for analysis.


Previously I had the following crash with 363492

kernel trap 12 with interrupts disabled


Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address   = 0x0410
fault code  = supervisor read data, page not present
instruction pointer = 0x20:0x80692326
stack pointer   = 0x28:0xfe00c06097b0
frame pointer   = 0x28:0xfe00c06097f0
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= resume, IOPL = 0
current process = 2030 (ifconfig)
trap number = 12
panic: page fault
cpuid = 2
time = 1595683412
KDB: stack backtrace:
#0 0x80698165 at kdb_backtrace+0x65
#1 0x8064d67b at vpanic+0x17b
#2 0x8064d4f3 at panic+0x43
#3 0x809cc311 at trap_fatal+0x391
#4 0x809cc36f at trap_pfault+0x4f
#5 0x809cb9b6 at trap+0x286
#6 0x809a5b28 at calltrap+0x8
#7 0x803677fd at ck_epoch_synchronize_wait+0x8d
#8 0x8069213a at epoch_wait_preempt+0xaa
#9 0x807615b7 at ipsec_ioctl+0x3a7
#10 0x8075274f at ifioctl+0x47f
#11 0x806b5ea7 at kern_ioctl+0x2b7
#12 0x806b5b4a at sys_ioctl+0xfa
#13 0x809ccec7 at amd64_syscall+0x387
#14 0x809a6450 at fast_syscall_common+0x101





On 20 Nov 2020, at 11:30, Kristof Provost  wrote:

On 20 Nov 2020, at 11:18, peter.b...@bsd4all.org 
 wrote:
I’m afraid the last Epoch fix for bridge is not solving the 
problem ( or perhaps creates a new ).



We’re talking about the stable/12 branch, right?


This seems to happen when the jail epair is added to the bridge.

There must be something more to it than that. I’ve run the bridge 
tests on stable/12 without issue, and this is a problem we didn’t 
see when the bridge epochification initially went into stable/12.


Do you have a custom kernel config? Other patches? What exact 
commands do you run to trigger the panic?



kernel trap 12 with interrupts disabled


Fatal trap 12: page fault while in kernel mode
cpuid = 6; apic id = 06
fault virtual address   = 0xc10
fault code  = supervisor read data, page not present
instruction pointer = 0x20:0x80695e76
stack pointer   = 0x28:0xfe00bf14e6e0
frame pointer   = 0x28:0xfe00bf14e720
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= resume, IOPL = 0
current process = 1686 (jail)
trap number = 12
panic: page fault
cpuid = 6
time = 1605811310
KDB: stack backtrace:
#0 0x8069bb85 at kdb_backtrace+0x65
#1 0x80650a4b at vpanic+0x17b
#2 0x806508c3 at panic+0x43
#3 0x809d0351 at trap_fatal+0x391
#4 0x809d03af at trap_pfault+0x4f
#5 0x809cf9f6 at trap+0x286
#6 0x809a98c8 at calltrap+0x8
#7 0x80368a8d at ck_epoch_synchronize_wait+0x8d
#8 0x80695c8a at epoch_wait_preempt+0xaa
#9 0x80757d40 at vnet_if_init+0x120
#10 0x8078c994 at vnet_alloc+0x114
#11 0x8061e3f7 at kern_jail_set+0x1bb7
#12 0x80620190 at sys_jail_set+0x40
#13 0x809d0f07 at amd64_syscall+0x387
#14 0x809aa1ee at fast_syscall_common+0xf8


This panic is rather odd. This isn’t even the bridge code. This is 
during initial creation of the vnet. I don’t really see how this 
could even trigger panics.
That panic looks as if something corrupted the net_epoch_preempt, by 
overwriting the epoch->e_epoch. The bridge patches only access this 
variable through the well-established functions and macros. I see no 
obvious way that they could corrupt it.


Best regards,
Kristof



___
freebsd-stable@freebsd.org mailing list

Re: Commit 367705+367706 causes a pabic

2020-11-20 Thread Peter Blok
Hi Kristof,

This is 12-stable. With the previous bridge epochification that was backed out 
my config had a panic too.

I don’t have any local modifications. I did a clean rebuild after removing 
/usr/obj/usr

My kernel is custom - I only have zfs.ko, opensolaris.ko, vmm.ko and nmdm.ko as 
modules. Everything else is statically linked. I have removed all drivers not 
needed for the hardware at hand.

My bridge is between two vlans from the same trunk and the jail epair devices 
as well as the bhyve tap devices.

The panic happens when the jails are starting.

I can try to narrow it down over the weekend and make the crash dump available 
for analysis.

Previously I had the following crash with 363492

kernel trap 12 with interrupts disabled


Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address   = 0x0410
fault code  = supervisor read data, page not present
instruction pointer = 0x20:0x80692326
stack pointer   = 0x28:0xfe00c06097b0
frame pointer   = 0x28:0xfe00c06097f0
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= resume, IOPL = 0
current process = 2030 (ifconfig)
trap number = 12
panic: page fault
cpuid = 2
time = 1595683412
KDB: stack backtrace:
#0 0x80698165 at kdb_backtrace+0x65
#1 0x8064d67b at vpanic+0x17b
#2 0x8064d4f3 at panic+0x43
#3 0x809cc311 at trap_fatal+0x391
#4 0x809cc36f at trap_pfault+0x4f
#5 0x809cb9b6 at trap+0x286
#6 0x809a5b28 at calltrap+0x8
#7 0x803677fd at ck_epoch_synchronize_wait+0x8d
#8 0x8069213a at epoch_wait_preempt+0xaa
#9 0x807615b7 at ipsec_ioctl+0x3a7
#10 0x8075274f at ifioctl+0x47f
#11 0x806b5ea7 at kern_ioctl+0x2b7
#12 0x806b5b4a at sys_ioctl+0xfa
#13 0x809ccec7 at amd64_syscall+0x387
#14 0x809a6450 at fast_syscall_common+0x101




> On 20 Nov 2020, at 11:30, Kristof Provost  wrote:
> 
> On 20 Nov 2020, at 11:18, peter.b...@bsd4all.org 
>  wrote:
>> I’m afraid the last Epoch fix for bridge is not solving the problem ( or 
>> perhaps creates a new ).
>> 
> We’re talking about the stable/12 branch, right?
> 
>> This seems to happen when the jail epair is added to the bridge.
>> 
> There must be something more to it than that. I’ve run the bridge tests on 
> stable/12 without issue, and this is a problem we didn’t see when the bridge 
> epochification initially went into stable/12.
> 
> Do you have a custom kernel config? Other patches? What exact commands do you 
> run to trigger the panic?
> 
>> kernel trap 12 with interrupts disabled
>> 
>> 
>> Fatal trap 12: page fault while in kernel mode
>> cpuid = 6; apic id = 06
>> fault virtual address= 0xc10
>> fault code   = supervisor read data, page not present
>> instruction pointer  = 0x20:0x80695e76
>> stack pointer= 0x28:0xfe00bf14e6e0
>> frame pointer= 0x28:0xfe00bf14e720
>> code segment = base 0x0, limit 0xf, type 0x1b
>>  = DPL 0, pres 1, long 1, def32 0, gran 1
>> processor eflags = resume, IOPL = 0
>> current process  = 1686 (jail)
>> trap number  = 12
>> panic: page fault
>> cpuid = 6
>> time = 1605811310
>> KDB: stack backtrace:
>> #0 0x8069bb85 at kdb_backtrace+0x65
>> #1 0x80650a4b at vpanic+0x17b
>> #2 0x806508c3 at panic+0x43
>> #3 0x809d0351 at trap_fatal+0x391
>> #4 0x809d03af at trap_pfault+0x4f
>> #5 0x809cf9f6 at trap+0x286
>> #6 0x809a98c8 at calltrap+0x8
>> #7 0x80368a8d at ck_epoch_synchronize_wait+0x8d
>> #8 0x80695c8a at epoch_wait_preempt+0xaa
>> #9 0x80757d40 at vnet_if_init+0x120
>> #10 0x8078c994 at vnet_alloc+0x114
>> #11 0x8061e3f7 at kern_jail_set+0x1bb7
>> #12 0x80620190 at sys_jail_set+0x40
>> #13 0x809d0f07 at amd64_syscall+0x387
>> #14 0x809aa1ee at fast_syscall_common+0xf8
> 
> This panic is rather odd. This isn’t even the bridge code. This is during 
> initial creation of the vnet. I don’t really see how this could even trigger 
> panics.
> That panic looks as if something corrupted the net_epoch_preempt, by 
> overwriting the epoch->e_epoch. The bridge patches only access this variable 
> through the well-established functions and macros. I see no obvious way that 
> they could corrupt it.
> 
> Best regards,
> Kristof



smime.p7s
Description: S/MIME cryptographic signature


Re: Commit 367705+367706 causes a pabic

2020-11-20 Thread Kristof Provost

On 20 Nov 2020, at 11:18, peter.b...@bsd4all.org wrote:
I’m afraid the last Epoch fix for bridge is not solving the problem 
( or perhaps creates a new ).



We’re talking about the stable/12 branch, right?


This seems to happen when the jail epair is added to the bridge.

There must be something more to it than that. I’ve run the bridge 
tests on stable/12 without issue, and this is a problem we didn’t see 
when the bridge epochification initially went into stable/12.


Do you have a custom kernel config? Other patches? What exact commands 
do you run to trigger the panic?



kernel trap 12 with interrupts disabled


Fatal trap 12: page fault while in kernel mode
cpuid = 6; apic id = 06
fault virtual address   = 0xc10
fault code  = supervisor read data, page not present
instruction pointer = 0x20:0x80695e76
stack pointer   = 0x28:0xfe00bf14e6e0
frame pointer   = 0x28:0xfe00bf14e720
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= resume, IOPL = 0
current process = 1686 (jail)
trap number = 12
panic: page fault
cpuid = 6
time = 1605811310
KDB: stack backtrace:
#0 0x8069bb85 at kdb_backtrace+0x65
#1 0x80650a4b at vpanic+0x17b
#2 0x806508c3 at panic+0x43
#3 0x809d0351 at trap_fatal+0x391
#4 0x809d03af at trap_pfault+0x4f
#5 0x809cf9f6 at trap+0x286
#6 0x809a98c8 at calltrap+0x8
#7 0x80368a8d at ck_epoch_synchronize_wait+0x8d
#8 0x80695c8a at epoch_wait_preempt+0xaa
#9 0x80757d40 at vnet_if_init+0x120
#10 0x8078c994 at vnet_alloc+0x114
#11 0x8061e3f7 at kern_jail_set+0x1bb7
#12 0x80620190 at sys_jail_set+0x40
#13 0x809d0f07 at amd64_syscall+0x387
#14 0x809aa1ee at fast_syscall_common+0xf8


This panic is rather odd. This isn’t even the bridge code. This is 
during initial creation of the vnet. I don’t really see how this could 
even trigger panics.
That panic looks as if something corrupted the net_epoch_preempt, by 
overwriting the epoch->e_epoch. The bridge patches only access this 
variable through the well-established functions and macros. I see no 
obvious way that they could corrupt it.


Best regards,
Kristof
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Commit 367705+367706 causes a pabic

2020-11-20 Thread Peter Blok
Hi,

I’m afraid the last Epoch fix for bridge is not solving the problem ( or 
perhaps creates a new ).

This seems to happen when the jail epair is added to the bridge.

Removing both fixes solves the problem.

Peter


kernel trap 12 with interrupts disabled


Fatal trap 12: page fault while in kernel mode
cpuid = 6; apic id = 06
fault virtual address   = 0xc10
fault code  = supervisor read data, page not present
instruction pointer = 0x20:0x80695e76
stack pointer   = 0x28:0xfe00bf14e6e0
frame pointer   = 0x28:0xfe00bf14e720
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= resume, IOPL = 0
current process = 1686 (jail)
trap number = 12
panic: page fault
cpuid = 6
time = 1605811310
KDB: stack backtrace:
#0 0x8069bb85 at kdb_backtrace+0x65
#1 0x80650a4b at vpanic+0x17b
#2 0x806508c3 at panic+0x43
#3 0x809d0351 at trap_fatal+0x391
#4 0x809d03af at trap_pfault+0x4f
#5 0x809cf9f6 at trap+0x286
#6 0x809a98c8 at calltrap+0x8
#7 0x80368a8d at ck_epoch_synchronize_wait+0x8d
#8 0x80695c8a at epoch_wait_preempt+0xaa
#9 0x80757d40 at vnet_if_init+0x120
#10 0x8078c994 at vnet_alloc+0x114
#11 0x8061e3f7 at kern_jail_set+0x1bb7
#12 0x80620190 at sys_jail_set+0x40
#13 0x809d0f07 at amd64_syscall+0x387
#14 0x809aa1ee at fast_syscall_common+0xf8

smime.p7s
Description: S/MIME cryptographic signature