freebsd-update 13.2 to 14.0-RC1 takes 12-24 hours (block cloning regression)

2023-10-17 Thread Kevin Bowling
Hi,

I have two systems with a zpool 2x2 mirror on 7.2k RPM disks.  One
system also has a flash SLOG.

The flash SLOG system took around 12 hours to complete freebsd-update
from 13.2 to 14.0-RC1.  The system without the SLOG took nearly 24
hours.  This was the result of ~50k patches and ~10k files from
freebsd-update, combined with very pathological 'install' command
performance.

'ps auxww | grep install':
root   52225   0.0  0.0  12852   2504  0  D+   20:55  0:00.00 install -S -o 0 -g 0 -m 0644 b6850914127c27fe192a41387f5cec04a1d927e6605ff09e8fd88dcd74fdec9d ///usr/src/sys/netgraph/ng_vlan.h
root   68042   0.0  0.0  13580   3648  0  I+   02:24  0:01.14 /bin/sh /usr/sbin/freebsd-update install
root   69946   0.0  0.0  13580   3632  0  S+   02:24  0:15.65 /bin/sh /usr/sbin/freebsd-update install

'control+t on freebsd-update':

load: 0.16  cmd: install 97128 [tx->tx_sync_done_cv] 0.67r 0.00u 0.00s 0% 2440k
mi_switch+0xc2 _cv_wait+0x113 txg_wait_synced_impl+0xb9 txg_wait_synced+0xb
dmu_offset_next+0x77 zfs_holey+0x137 zfs_freebsd_ioctl+0x4f
vn_generic_copy_file_range+0x64b kern_copy_file_range+0x327
sys_copy_file_range+0x78 amd64_syscall+0x10c fast_syscall_common+0xf8

I spoke with mjg about this: because my pools do not have block
cloning enabled, copy_file_range turns into a massive pessimization in
'install'.  He suggested 'sysctl vfs.zfs.dmu_offset_next_sync=0' as a
workaround, but we should probably sort this out for 14.0-RELEASE.
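
The stall in the trace above is the hole-probing path: copy_file_range asks where the next hole is, which on ZFS lands in dmu_offset_next and can force a transaction-group sync per probe.  A minimal sketch of that SEEK_HOLE/SEEK_DATA probing (Python, not the FreeBSD install(1) code; exact behavior depends on filesystem support):

```python
import os
import tempfile

# Write a small fully populated file.  POSIX-style SEEK_HOLE semantics
# treat end-of-file as an implicit hole, so on a file with no real
# holes the first hole probe should land at the file size.  On ZFS this
# probe is what maps to dmu_offset_next, the call seen in the trace.
fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"x" * 65536)
    os.fsync(fd)
    size = os.fstat(fd).st_size
    first_hole = os.lseek(fd, 0, os.SEEK_HOLE)  # first hole at/after offset 0
    first_data = os.lseek(fd, 0, os.SEEK_DATA)  # first data at/after offset 0
    print(size, first_hole, first_data)  # typically: 65536 65536 0
finally:
    os.close(fd)
    os.remove(path)
```

A copy routine that probes holes this way avoids copying zero runs, but on a filesystem where each probe is expensive it can be far slower than a plain read/write loop, which is the pessimization described above.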

Regards,
Kevin



Re: Potential show-stopper in em driver?

2023-08-14 Thread Kevin Bowling
On Mon, Aug 14, 2023 at 4:45 PM Greg 'groggy' Lehey  wrote:
>
> [moving to current as requested by bz@]
>
> On Monday, 14 August 2023 at 10:09:22 -0700, Kevin Bowling wrote:
> >
> > I'm able to replicate this on my I217 using iperf3.  It happens
> > quickly with flow control enabled (default) and takes about 15 minutes
> > of line rate with flow control disabled.  I am looking into the scope
> > of the issue and will commit a fix or enable chicken bits for affected
> > parts soon.
>
> Thanks.  Let me know when you have something and I'll test it.

I went ahead and reverted: 797e480cba8834e584062092c098e60956d28180

I thought I had a handle on the bug a few times but all I've managed
to do is make it take a bit longer to stall so far.  Whatever is going
on is very subtle and may be outside the em_txrx.c code.

> I'll reply to the other message later, but things look better without
> TCO.

I'm able to reproduce on my machine with the same NIC as yours so I no
longer need the requested debugging info.

> Greg
> --
> Sent from my desktop computer.
> See complete headers for address and phone numbers.
> This message is digitally signed.  If your Microsoft mail program
> reports problems, please read http://lemis.com/broken-MUA.php



Re: Potential show-stopper in em driver?

2023-08-13 Thread Kevin Bowling
On Sun, Aug 13, 2023 at 6:55 PM Greg 'groggy' Lehey  wrote:
>
> I've spent the last couple of days chasing random hangs on my -CURRENT
> box.  It seems to be related to the Ethernet driver (em).  I've been
> trying without much success to chase it down, and I'd be grateful.
> The box is headless, and all communication is via the net, which
> doesn't make it any easier.  I've tried a verbose boot, but nothing of
> interest shows up.  Typically it happens during the nightly backups,
> which are over NFS:
>
>   Aug 13 21:06:46 dereel kernel: <<<66>n>nffs server s6>neurekfs server 
> aeureka:/dum:p: /ndoump: tnot responding
>   Aug 13 21:06:46 dereel kernel:
>   Aug 13 21:06:46 dereel kernel: responding
>   Aug 13 21:06:46 dereel kernel:
>   Aug 13 21:06:46 dereel kernel: server eureka:/dump:n not responding
>
> And if you haven't seen those garbled messages before, admire.
> They've been there for a long time, and they have nothing to do with
> the problem.  More to the point, there are no other error messages.
>
> I've run three kernels on this box over the last few weeks:
>
> 1. FreeBSD dereel 14.0-CURRENT FreeBSD 14.0-CURRENT amd64 1400093 #10 main-n264292-7f9318a022ef: Mon Jul 24 17:13:32 AEST 2023 grog@dereel:/usr/obj/eureka/home/src/FreeBSD/git/main/amd64.amd64/sys/GENERIC amd64
>
>    This works with no problems.
>
> 2. FreeBSD 14.0-CURRENT amd64 1400094 #11 main-n264653-517e0978db1f: Thu Aug 10 14:17:13 AEST 2023 grog@dereel:/usr/obj/eureka/home/src/FreeBSD/git/main/amd64.amd64/sys/GENERIC
>
> 3. FreeBSD dereel 14.0-ALPHA1 FreeBSD 14.0-ALPHA1 amd64 1400094 #12 main-n264693-b231322dbe95: Sat Aug 12 14:31:44 AEST 2023 grog@dereel:/usr/obj/eureka/home/src/FreeBSD/git/main/amd64.amd64/sys/GENERIC amd64
>
>    Both of these exhibit the problem.
>
> Note that we're now ALPHA1, so it's a good idea to get to the bottom
> of it.  The box is a ThinkCentre M93p.  I'm attaching a verbose boot
> log, though I don't expect anybody to find something of use there.
> I'm also currently building a new world in case something has happened
> since Saturday.

The verbose boot didn't make it in the email.  'dmesg | grep em' would
be a good enough start.

Can you post 'sysctl dev.em.' from the machine after
a lockup?  Since you don't seem to have OOB access you may have to get
creative with cron or something.
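
One way to "get creative with cron" might be a periodic snapshot of the driver sysctls, so the last file written before a hang survives a power cycle.  A sketch of such a crontab fragment (the "dev.em.0" unit and log path are assumptions for illustration):

```
# /etc/crontab fragment (illustrative): snapshot em(4) sysctls and
# interface counters once a minute; inspect the last snapshot after
# rebooting out of a lockup.
*	*	*	*	*	root	(sysctl dev.em.0; netstat -i) > /var/log/em-snapshot.last 2>&1
```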

> Greg
> --
> Sent from my desktop computer.
> See complete headers for address and phone numbers.
> This message is digitally signed.  If your Microsoft mail program
> reports problems, please read http://lemis.com/broken-MUA.php



Re: ZFS deadlock in 14

2023-08-10 Thread Kevin Bowling
Spoke too soon; still seeing ZFS lockups under heavy poudriere
workload after the MFVs.  The regression timing matches what has been
reported here.

On Thu, Aug 10, 2023 at 4:33 PM Cy Schubert wrote:

> I haven't experienced any problems (yet) either.
>
>
> --
> Cheers,
> Cy Schubert 
> FreeBSD UNIX: Web:  https://FreeBSD.org
> NTP:   Web:  https://nwtime.org
>
> e^(i*pi)+1=0
>
>
> In message, Kevin Bowling writes:
> > The two MFVs on head have improved/fixed stability with poudriere for
> > me 48 core bare metal.
> >
> > On Thu, Aug 10, 2023 at 6:37 AM Cy Schubert wrote:
> > >
> > > In message, Kevin Bowling writes:
> > > > Possibly https://github.com/openzfs/zfs/commit/2cb992a99ccadb78d97049b40bd442eb4fdc549d
> > > >
> > > > On Tue, Aug 8, 2023 at 10:08 AM Dag-Erling Smørgrav wrote:
> > > > >
> > > > > At some point between 42d088299c (4 May) and f0c9703301 (26 June), a
> > > > > deadlock was introduced in ZFS.  It is still present as of 9c2823bae9 (4
> > > > > August) and is 100% reproducible just by starting poudriere bulk in a
> > > > > 16-core VM and waiting a few hours until deadlkres kicks in.  In the
> > > > > latest instance, deadlkres complained about a bash process:
> > > > >
> > > > > #0  sched_switch (td=td@entry=0xfe02fb1d8000, flags=flags@entry=259) at /usr/src/sys/kern/sched_ule.c:2299
> > > > > #1  0x80b5a0a3 in mi_switch (flags=flags@entry=259) at /usr/src/sys/kern/kern_synch.c:550
> > > > > #2  0x80babcb4 in sleepq_switch (wchan=0xf818543a9e70, pri=64) at /usr/src/sys/kern/subr_sleepqueue.c:609
> > > > > #3  0x80babb8c in sleepq_wait (wchan=, pri=<unavailable>) at /usr/src/sys/kern/subr_sleepqueue.c:660
> > > > > #4  0x80b1c1b0 in sleeplk (lk=lk@entry=0xf818543a9e70, flags=flags@entry=2121728, ilk=ilk@entry=0x0, wmesg=wmesg@entry=0x8222a054 "zfs", pri=, pri@entry=64, timo=timo@entry=6, queue=1) at /usr/src/sys/kern/kern_lock.c:310
> > > > > #5  0x80b1a23f in lockmgr_slock_hard (lk=0xf818543a9e70, flags=2121728, ilk=, file=0x812544fb "/usr/src/sys/kern/vfs_subr.c", line=3057, lwa=0x0) at /usr/src/sys/kern/kern_lock.c:705
> > > > > #6  0x80c59ec3 in VOP_LOCK1 (vp=0xf818543a9e00, flags=2105344, file=0x812544fb "/usr/src/sys/kern/vfs_subr.c", line=3057) at ./vnode_if.h:1120
> > > > > #7  _vn_lock (vp=vp@entry=0xf818543a9e00, flags=2105344, file=, line=, line@entry=3057) at /usr/src/sys/kern/vfs_vnops.c:1815
> > > > > #8  0x80c4173d in vget_finish (vp=0xf818543a9e00, flags=, vs=vs@entry=VGET_USECOUNT) at /usr/src/sys/kern/vfs_subr.c:3057
> > > > > #9  0x80c1c9b7 in cache_lookup (dvp=dvp@entry=0xf802cd02ac40, vpp=vpp@entry=0xfe046b20ac30, cnp=cnp@entry=0xfe046b20ac58, tsp=tsp@entry=0x0, ticksp=ticksp@entry=0x0) at /usr/src/sys/kern/vfs_cache.c:2086
> > > > > #10 0x80c2150c in vfs_cache_lookup (ap=) at /usr/src/sys/kern/vfs_cache.c:3068
> > > > > #11 0x80c32c37 in VOP_LOOKUP (dvp=0xf802cd02ac40, vpp=0xfe046b20ac30, cnp=0xfe046b20ac58) at ./vnode_if.h:69
> > > > > #12 vfs_lookup (ndp=ndp@entry=0xfe046b20abd8) at /usr/src/sys/kern/vfs_lookup.c:1266

Re: ZFS deadlock in 14

2023-08-10 Thread Kevin Bowling
The two MFVs on head have improved/fixed stability with poudriere for
me 48 core bare metal.

On Thu, Aug 10, 2023 at 6:37 AM Cy Schubert  wrote:
>
> In message, Kevin Bowling writes:
> > Possibly https://github.com/openzfs/zfs/commit/2cb992a99ccadb78d97049b40bd442eb4fdc549d
> >
> > On Tue, Aug 8, 2023 at 10:08 AM Dag-Erling Smørgrav wrote:
> > >
> > > At some point between 42d088299c (4 May) and f0c9703301 (26 June), a
> > > deadlock was introduced in ZFS.  It is still present as of 9c2823bae9 (4
> > > August) and is 100% reproducible just by starting poudriere bulk in a
> > > 16-core VM and waiting a few hours until deadlkres kicks in.  In the
> > > latest instance, deadlkres complained about a bash process:
> > >
> > > #0  sched_switch (td=td@entry=0xfe02fb1d8000, flags=flags@entry=259) at /usr/src/sys/kern/sched_ule.c:2299
> > > #1  0x80b5a0a3 in mi_switch (flags=flags@entry=259) at /usr/src/sys/kern/kern_synch.c:550
> > > #2  0x80babcb4 in sleepq_switch (wchan=0xf818543a9e70, pri=64) at /usr/src/sys/kern/subr_sleepqueue.c:609
> > > #3  0x80babb8c in sleepq_wait (wchan=, pri=<unavailable>) at /usr/src/sys/kern/subr_sleepqueue.c:660
> > > #4  0x80b1c1b0 in sleeplk (lk=lk@entry=0xf818543a9e70, flags=flags@entry=2121728, ilk=ilk@entry=0x0, wmesg=wmesg@entry=0x8222a054 "zfs", pri=, pri@entry=64, timo=timo@entry=6, queue=1) at /usr/src/sys/kern/kern_lock.c:310
> > > #5  0x80b1a23f in lockmgr_slock_hard (lk=0xf818543a9e70, flags=2121728, ilk=, file=0x812544fb "/usr/src/sys/kern/vfs_subr.c", line=3057, lwa=0x0) at /usr/src/sys/kern/kern_lock.c:705
> > > #6  0x80c59ec3 in VOP_LOCK1 (vp=0xf818543a9e00, flags=2105344, file=0x812544fb "/usr/src/sys/kern/vfs_subr.c", line=3057) at ./vnode_if.h:1120
> > > #7  _vn_lock (vp=vp@entry=0xf818543a9e00, flags=2105344, file=, line=, line@entry=3057) at /usr/src/sys/kern/vfs_vnops.c:1815
> > > #8  0x80c4173d in vget_finish (vp=0xf818543a9e00, flags=, vs=vs@entry=VGET_USECOUNT) at /usr/src/sys/kern/vfs_subr.c:3057
> > > #9  0x80c1c9b7 in cache_lookup (dvp=dvp@entry=0xf802cd02ac40, vpp=vpp@entry=0xfe046b20ac30, cnp=cnp@entry=0xfe046b20ac58, tsp=tsp@entry=0x0, ticksp=ticksp@entry=0x0) at /usr/src/sys/kern/vfs_cache.c:2086
> > > #10 0x80c2150c in vfs_cache_lookup (ap=) at /usr/src/sys/kern/vfs_cache.c:3068
> > > #11 0x80c32c37 in VOP_LOOKUP (dvp=0xf802cd02ac40, vpp=0xfe046b20ac30, cnp=0xfe046b20ac58) at ./vnode_if.h:69
> > > #12 vfs_lookup (ndp=ndp@entry=0xfe046b20abd8) at /usr/src/sys/kern/vfs_lookup.c:1266
> > > #13 0x80c31ce1 in namei (ndp=ndp@entry=0xfe046b20abd8) at /usr/src/sys/kern/vfs_lookup.c:689
> > > #14 0x80c52090 in kern_statat (td=0xfe02fb1d8000, flag=, fd=-100, path=0xa75b480e070 <error: Cannot access memory at address 0xa75b480e070>, pathseg=pathseg@entry=UIO_USERSPACE, sbp=sbp@entry=0xfe046b20ad18) at /usr/src/sys/kern/vfs_syscalls.c:2441
> > > #15 0x80c52797 in sys_fstatat (td=, uap=0xfe02fb1d8400) at /usr/src/sys/kern/vfs_syscalls.c:2419
> > > #16 0x81049398 in syscallenter (td=) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:190
> > > #17 amd64_syscall (td=0xfe02fb1d8000, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1199
> > > #18
> > >
> > > The lock it is trying to acquire in frame 5 belongs to another bash
> > > process which is in the process of creating a fifo:
> > >
> > > #0  sched_switch (td=td@entry=0xfe046acd8e40, flags=flags@entry=259) at /usr/src/sys/kern/sched_ule.c:2299
> > > #1  0x80b5a0a3 in mi_switch (flags=flags@entry=259) at /usr/src/sys/kern/kern_synch.c:550
> > > #2  0x80babcb4 in sleepq_switch (wchan=0xf8018acbf154, pri=87) at /usr/src/sys/kern/subr_sleepqueue.c:609
> > > #3  0x80babb8c in sleepq_wait (wchan=, pri=<unavailable>) at /usr/src/sys/kern/subr_sleepqueue.c:660

Re: ZFS deadlock in 14

2023-08-09 Thread Kevin Bowling
Possibly 
https://github.com/openzfs/zfs/commit/2cb992a99ccadb78d97049b40bd442eb4fdc549d

On Tue, Aug 8, 2023 at 10:08 AM Dag-Erling Smørgrav  wrote:
>
> At some point between 42d088299c (4 May) and f0c9703301 (26 June), a
> deadlock was introduced in ZFS.  It is still present as of 9c2823bae9 (4
> August) and is 100% reproducible just by starting poudriere bulk in a
> 16-core VM and waiting a few hours until deadlkres kicks in.  In the
> latest instance, deadlkres complained about a bash process:
>
> #0  sched_switch (td=td@entry=0xfe02fb1d8000, flags=flags@entry=259) 
> at /usr/src/sys/kern/sched_ule.c:2299
> #1  0x80b5a0a3 in mi_switch (flags=flags@entry=259) at 
> /usr/src/sys/kern/kern_synch.c:550
> #2  0x80babcb4 in sleepq_switch (wchan=0xf818543a9e70, 
> pri=64) at /usr/src/sys/kern/subr_sleepqueue.c:609
> #3  0x80babb8c in sleepq_wait (wchan=, 
> pri=) at /usr/src/sys/kern/subr_sleepqueue.c:660
> #4  0x80b1c1b0 in sleeplk (lk=lk@entry=0xf818543a9e70, 
> flags=flags@entry=2121728, ilk=ilk@entry=0x0, 
> wmesg=wmesg@entry=0x8222a054 "zfs", pri=, 
> pri@entry=64, timo=timo@entry=6, queue=1) at /usr/src/sys/kern/kern_lock.c:310
> #5  0x80b1a23f in lockmgr_slock_hard (lk=0xf818543a9e70, 
> flags=2121728, ilk=, file=0x812544fb 
> "/usr/src/sys/kern/vfs_subr.c", line=3057, lwa=0x0) at 
> /usr/src/sys/kern/kern_lock.c:705
> #6  0x80c59ec3 in VOP_LOCK1 (vp=0xf818543a9e00, 
> flags=2105344, file=0x812544fb "/usr/src/sys/kern/vfs_subr.c", 
> line=3057) at ./vnode_if.h:1120
> #7  _vn_lock (vp=vp@entry=0xf818543a9e00, flags=2105344, 
> file=, line=, line@entry=3057) at 
> /usr/src/sys/kern/vfs_vnops.c:1815
> #8  0x80c4173d in vget_finish (vp=0xf818543a9e00, 
> flags=, vs=vs@entry=VGET_USECOUNT) at 
> /usr/src/sys/kern/vfs_subr.c:3057
> #9  0x80c1c9b7 in cache_lookup (dvp=dvp@entry=0xf802cd02ac40, 
> vpp=vpp@entry=0xfe046b20ac30, cnp=cnp@entry=0xfe046b20ac58, 
> tsp=tsp@entry=0x0, ticksp=ticksp@entry=0x0) at 
> /usr/src/sys/kern/vfs_cache.c:2086
> #10 0x80c2150c in vfs_cache_lookup (ap=) at 
> /usr/src/sys/kern/vfs_cache.c:3068
> #11 0x80c32c37 in VOP_LOOKUP (dvp=0xf802cd02ac40, 
> vpp=0xfe046b20ac30, cnp=0xfe046b20ac58) at ./vnode_if.h:69
> #12 vfs_lookup (ndp=ndp@entry=0xfe046b20abd8) at 
> /usr/src/sys/kern/vfs_lookup.c:1266
> #13 0x80c31ce1 in namei (ndp=ndp@entry=0xfe046b20abd8) at 
> /usr/src/sys/kern/vfs_lookup.c:689
> #14 0x80c52090 in kern_statat (td=0xfe02fb1d8000, flag=, fd=-100, path=0xa75b480e070 <error: Cannot access memory at address 0xa75b480e070>, pathseg=pathseg@entry=UIO_USERSPACE, sbp=sbp@entry=0xfe046b20ad18) at /usr/src/sys/kern/vfs_syscalls.c:2441
> #15 0x80c52797 in sys_fstatat (td=, 
> uap=0xfe02fb1d8400) at /usr/src/sys/kern/vfs_syscalls.c:2419
> #16 0x81049398 in syscallenter (td=) at 
> /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:190
> #17 amd64_syscall (td=0xfe02fb1d8000, traced=0) at 
> /usr/src/sys/amd64/amd64/trap.c:1199
> #18 
>
> The lock it is trying to acquire in frame 5 belongs to another bash
> process which is in the process of creating a fifo:
>
> #0  sched_switch (td=td@entry=0xfe046acd8e40, flags=flags@entry=259) 
> at /usr/src/sys/kern/sched_ule.c:2299
> #1  0x80b5a0a3 in mi_switch (flags=flags@entry=259) at 
> /usr/src/sys/kern/kern_synch.c:550
> #2  0x80babcb4 in sleepq_switch (wchan=0xf8018acbf154, 
> pri=87) at /usr/src/sys/kern/subr_sleepqueue.c:609
> #3  0x80babb8c in sleepq_wait (wchan=, 
> pri=) at /usr/src/sys/kern/subr_sleepqueue.c:660
> #4  0x80b59606 in _sleep (ident=ident@entry=0xf8018acbf154, 
> lock=lock@entry=0xf8018acbf120, priority=priority@entry=87, 
> wmesg=0x8223af0e "zfs teardown inactive", sbt=sbt@entry=0, 
> pr=pr@entry=0, flags=256)
> at /usr/src/sys/kern/kern_synch.c:225
> #5  0x80b45dc0 in rms_rlock_fallback (rms=0xf8018acbf120) at 
> /usr/src/sys/kern/kern_rmlock.c:1015
> #6  0x80b45c93 in rms_rlock (rms=, 
> rms@entry=0xf8018acbf120) at /usr/src/sys/kern/kern_rmlock.c:1036
> #7  0x81fb147b in zfs_freebsd_reclaim (ap=) at 
> /usr/src/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c:5164
> #8  0x8111d245 in VOP_RECLAIM_APV (vop=0x822e71a0 
> , a=a@entry=0xfe0410f1c9c8) at vnode_if.c:2180
> #9  0x80c43569 in VOP_RECLAIM (vp=0xf802cdbaca80) at 
> ./vnode_if.h:1084
> #10 vgonel (vp=vp@entry=0xf802cdbaca80) at 
> /usr/src/sys/kern/vfs_subr.c:4143
> #11 0x80c3ef61 in vtryrecycle (vp=0xf802cdbaca80) at 
> /usr/src/sys/kern/vfs_subr.c:1693
> #12 vnlru_free_impl (count=count@entry=1, mnt_op=mnt_op@entry=0x0, 
> mvp=0xf8010864da00) at /usr/src/sys/kern/vfs_subr.c:1
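
The shape of the deadlock above is a classic lock-order inversion: one thread holds a vnode lock while waiting on the ZFS teardown lock, while the reclaim path holds the teardown side and waits on the vnode.  A generic illustration of the pattern (a sketch only, not the ZFS code; the lock names are stand-ins):

```python
import threading

vnode_lock = threading.Lock()     # stand-in for the vnode lock in frame 5
teardown_lock = threading.Lock()  # stand-in for the "zfs teardown" lock

results = []

def worker(name):
    # Both threads take the locks in the same global order, so this
    # completes.  If one thread instead took teardown_lock first and
    # then vnode_lock, the pair could block forever -- the situation
    # deadlkres reported above.
    with vnode_lock:
        with teardown_lock:
            results.append(name)

threads = [threading.Thread(target=worker, args=(n,))
           for n in ("lookup", "reclaim")]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # ['lookup', 'reclaim']
```

The usual fixes are either a consistent global lock order or dropping one lock before sleeping on the other, which is roughly what the eventual upstream change addressed.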

Re: Kernel panic after updating 14-CURRENT amd64 to main-n264268-ff4633d9f89

2023-07-22 Thread Kevin Bowling
On Sat, Jul 22, 2023 at 1:21 AM Yasuhiro Kimura  wrote:
>
> From: Kevin Bowling 
> Subject: Re: Kernel panic after updating 14-CURRENT amd64 to 
> main-n264268-ff4633d9f89
> Date: Fri, 21 Jul 2023 21:44:13 -0700
>
> > Thanks, I have reverted for now.  Can you tell me which NIC is
> > implemented there?
>
> Output of `pciconf -lv` says as following.
>
> em0@pci0:0:3:0: class=0x02 rev=0x02 hdr=0x00 vendor=0x8086 device=0x100e 
> subvendor=0x8086 subdevice=0x001e
> vendor = 'Intel Corporation'
> device = '82540EM Gigabit Ethernet Controller'
> class  = network
> subclass   = ethernet
>
> Regards.

Thanks for the report, I've identified the errors and recommitted.

> ---
> Yasuhiro Kimura



Re: Kernel panic after updating 14-CURRENT amd64 to main-n264268-ff4633d9f89

2023-07-21 Thread Kevin Bowling
Thanks, I have reverted for now.  Can you tell me which NIC is
implemented there?

On Fri, Jul 21, 2023 at 12:45 PM Yasuhiro Kimura  wrote:
>
> From: Yasuhiro Kimura 
> Subject: Kernel panic after updating 14-CURRENT amd64 to 
> main-n264268-ff4633d9f89
> Date: Sat, 22 Jul 2023 02:50:23 +0900 (JST)
>
> > After updating my 14.0-CURRENT amd64 system from
> > main-n264162-f58378393fb to main-n264268-ff4633d9f89, kernel crashes
> > with panic as following.
> >
> > https://people.freebsd.org/~yasu/FreeBSD-14-CURRENT-amd64-main-n264268-ff4633d9f89.20230721.panic.png
>
> According to the result of bisect, kernel panic starts with following
> commit.
>
> --
> commit 95f7b36e8fac45092b9a4eea5e32732e979989f0
> Author: Kevin Bowling 
> Date:   Thu Jul 20 20:30:00 2023 -0700
>
> e1000: lem(4)/em(4) ifcaps, TSO and hwcsum fixes
>
> * em(4) obey administrative ifcaps for using hwcsum offload
> * em(4) obey administrative ifcaps for hw vlan receive tagging
> * em(4) add additional TSO6 ifcap, but disabled by default as is TSO4
> * lem(4) obey administrative ifcaps for using hwcsum offload
> * lem(4) add support for hw vlan receive tagging
> * lem(4) Add ifcaps for TSO offload experimentation, but disabled by
>   default due to errata and possibly missing txrx code.
> * lem(4) disable HWCSUM ifcaps by default on 82547 due to errata around
>   full duplex links.  It may still be administratively enabled.
>
> Reviewed by:markj (previous version)
> MFC after:  2 weeks
> Differential Revision:  https://reviews.freebsd.org/D30072
> --
>
> Cc-ing to its committer.
>
> ---
> Yasuhiro Kimura



amd64: enable options NUMA in GENERIC and MINIMAL

2018-09-11 Thread Kevin Bowling
-CURRENT users, in svn rS338602, 'options NUMA' has just been enabled
for amd64 GENERIC and MINIMAL kernels.

This should benefit systems with more than one physical CPU and
associated memory, certain high-core-count Intel chips when configured
to use Cluster-on-Die or Sub-NUMA Clustering, and recent AMD products
with multiple chiplets such as Threadripper and EPYC.  If you have a
single-domain system, there is no expected change.

If you have such NUMA machines, early testing would be greatly
appreciated in order to ensure the quality and stability of the 12.0
release.

You can confirm configuration of NUMA domains with 'sysctl
vm.ndomains', example output:
vm.ndomains: 2

This work was sponsored by Dell EMC Isilon and Netflix and led by Jeff
Roberson.  Many folks have participated in bi-weekly calls and
stabilization work for a long time.  It has been used in production
configurations for some time at Netflix and Limelight Networks.

If you encounter any issues where you suspect this is the root cause,
please disable 'options NUMA' in the kernel configuration and
reproduce before reporting as such.

This would be a good time to audit your kernel config if copied from
GENERIC and consider if you can instead use 'include GENERIC-NODEBUG'
to only track local modifications.
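
A kernel config following that advice might look like the sketch below (the file name and local options are illustrative, not a recommendation):

```
# /usr/src/sys/amd64/conf/MYKERNEL (illustrative)
include GENERIC-NODEBUG
ident   MYKERNEL

# To test whether a suspected regression is NUMA-related, rebuild once
# with the option removed and try to reproduce:
nooptions NUMA
```

Tracking GENERIC via include keeps local diffs small, so future additions such as 'options NUMA' are picked up automatically on update.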

Regards,
Kevin Bowling
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: [RFC] Deprecation and removal of the drm2 driver

2018-05-30 Thread Kevin Bowling
32-bit compat is quite different from the i386 arch.  It makes sense
to maintain 32-bit compat for quite a while.

On Wed, May 30, 2018 at 3:33 AM, Thomas Mueller  wrote:
>> Wow, this blew up quite a lot bigger than I anticipated.  I'll try to
>> summarize the discussion a bit below and then suggest a way forward.
>
>> The primary reasons we want to do this is because there are conflicts between
>> the new drm drivers in ports, and the drm drivers in base, since they control
>> the same hardware.  It is hard to make conflicting drivers to auto load in a
>> consistent way.  In order to improve the desktop experience I'd like to see
>> that graphics drivers are loaded on system boot.  There is also a push from
>> upstream to have the xf86-video* drivers stop loading driver kernel modules.
>> It is also easier to keep a port updated than keeping the base system 
>> updated,
>> and updates can propagate to multiple FreeBSD versions at once.  This will
>> also ensure that all ports use the same firmware blobs.
>
>> So, to the summary.  A lot of people are using i386, and as such still need
>> the old drm drivers.  There were also some reports about issues with the
>> drm-next/stable drivers, which needs investigating. Power is another
>> architecture that also is not supported by drm-next/stable, although we hope
>> to extend support to powerpc in the future. There was a lot of discussion
>> regarding making it into a port, or only excluding the driver on amd64, and
>> similar suggestions.
>
>> To move forward, we'll do the following:  Note that this is for current only.
>> We take the drm and drm2 drivers and make a port for it, maintained by the
>> graphics team (x11@).  After a transition period, then the drivers are 
>> removed
>> from base.  At the same time, pkg-messages are added to relevant places to
>> point people to the various available drm drivers.
>
>> Regards
>
>> Niclas Zeising
>> FreeBSD graphics/x11 team
>
> One reason I can think of to maintain i386 compatibility is to be able to run 
> wine and possibly other software that requires i386 compatibility.
>
> That said, I currently have no active FreeBSD i386 installation, and probably 
> won't get around to it anytime soon.
>
> I believe Linux can run wine on an amd64 multilib installation, but FreeBSD 
> is not up to that yet.
>
> For the above purpose, keeping drm and drm2 as a port might be good enough, 
> as opposed to being part of base.
>
> i386 is not dead.  While some Linux distros (such as Arch) and DragonFlyBSD 
> have quit i386 support, Haiku maintains 32-bit support to be able to run old 
> BeOS software as well as newer things.
>
> Tom
>
> ___
> freebsd-...@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-x11
> To unsubscribe, send any mail to "freebsd-x11-unsubscr...@freebsd.org"


Re: [RFC] future of drm1 in base

2017-09-04 Thread Kevin Bowling
I wasn't subscribed to -x11 earlier but do want to add some commentary
to this thread.

The language describing these drivers upstream is not at all ambiguous
WRT how bad they are, and Linux distributions have dropped them:


menuconfig DRM_LEGACY
bool "Enable legacy drivers (DANGEROUS)"
depends on DRM && MMU
select DRM_VM
help
 Enable legacy DRI1 drivers. Those drivers expose unsafe and dangerous
 APIs to user-space, which can be used to circumvent access
 restrictions and other security measures. For backwards compatibility
 those drivers are still available, but their use is highly
 inadvisable and might harm your system.

 You are recommended to use the safe modeset-only drivers instead, and
 perform 3D emulation in user-space.

 Unless you have strong reasons to go rogue, say "N".


There seems to be some kind of misunderstanding that removing these
drivers is taking something away from users.  I disagree.  It does
_not_ deorbit HW support.  We'd be giving users a supportable and
secure experience by dropping them.  For a 2D desktop, a framebuffer
is likely as fast or faster, as David states.  On any reasonable CPU,
3D (like the server models people pointed out) may even be faster
using llvmpipe.

Regards,

On Mon, Sep 4, 2017 at 12:33 PM, Johannes M Dieterich  wrote:
>
>
>
> Mark Johnston – Sun., 3. September 2017 15:19
>> On Sat, Sep 02, 2017 at 10:02:57PM -0400, Johannes M Dieterich wrote:
>> > Dear current/x11,
>> >
>> > please CC me on responses.
>> >
>> > I am writing you on behalf of the FreeBSDDesktop team concerning the
>> > future of drm1 in base.
>> >
>> > drm1 in base supports the following GPUs:
>> > * 3dfx Banshee/Voodoo3+ (tdfx)
>> > * ATI Rage 128 (r128)
>> > * ATI Rage Pro (mach64)
>> > * Matrox G200/G400 (mga)
>> > * Savage3D/MX/IX, Savage4, SuperSavage, Twister, ProSavage[DDR] (savage)
>> > * SIS 300/630/540 and XGI V3XE/V5/V8 (sis)
>> > * VIA Unichrome / Pro (via)
>> >
>> > Since their original introduction up to 2010 these drivers have mostly
>> > been maintained as part of larger cleanups. The newest hardware drm1
>> > supports dates from 2004, if I am not mistaken, and most of the
>> > hardware is AGP-based.
>> >
>> > With the introduction of graphics/drm-next-kmod which brings its own
>> > drm.ko following the Linux notation, we are facing collisions between
>> > these old drivers' drm.ko and the newer one.
>>
>> I don't think this is a real problem. The reason one currently needs to
>> manually load the drm-next drm.ko (rather than just kldloading a driver
>> and having it pick up the right drm.ko automatically) is that our drm.ko
>> defines the same module ("drmn") as drm2.ko in the base system. So upon
>> attempting to load a drm-next driver, the kernel uses the linker hints
>> to load drm2.ko, which is incorrect. However, this can be addressed by
>> simply bumping the drmn version in the port and modifying the drivers
>> accordingly. I've submitted a 4-line PR which does exactly that. After
>> that change, we can modify the pkg-message to omit drm.ko from the
>> kld_list value. As a result, the name of our DRM module doesn't matter
>> since users don't need to specify it, so the collision with drm1 isn't a
>> problem.
>>
>> > We would like to hear if anybody still runs CURRENT on machines housing
>> > the above GPUs and relies on drm1.
>> >
>> > If there are still a significant number of people running CURRENT on
>> > this hardware in production, we would be willing to make a
>> > graphics/drm-legacy-kmod port.
>>
>> With the PR I mentioned above, I think it's a non-issue to keep drm1 in
>> the base system. Since there appear to be at least some users of those
>> drivers, I really think it would be preferable to avoid removing them
>> unless it's absolutely necessary.
> Your proposed solution does work, thanks for providing it! Let's move this 
> conversation to a later point in time then.
>
> Johannes

Re: !EARLY_AP_STARTUP and -CURRENT

2017-08-31 Thread Kevin Bowling
panic: mutex sched lock 0 not owned at /d0/kev/freebsd/sys/kern/sched_ule.c:2379

On Thu, Aug 31, 2017 at 7:38 AM, John Baldwin  wrote:
> On Wednesday, August 30, 2017 04:54:07 PM Kevin Bowling wrote:
>> I'm dealing with a shit sandwich right now where the mps(4) or cam_da
>> reorders drives on a few thousand legacy MBR machines I have (and I
>> can't easily install glabel ATM), and !EARLY_AP_STARTUP seems to have
>> regressed.  I'd like to be able to run w/o EARLY_AP_STARTUP right
>> quick so I can take a more leisurely approach to fixing mps(4) boot
>> probe correctly (freebsd-scsi@ has that thread).
>>
>> With WITNESS and !EARLY_AP_STARTUP I hit an assert in sched_setpreempt
>> in kern/sched_ule.c 100% of the time.  Here are a couple invocations,
>> with oddness around a different CPU holding the curthread lock but
>> somehow a different AP is runnable in the function:
>
> Do you have the panic messages?
>
>> Tracing pid 11 tid 100020 td 0xf80128cd1560
>> kdb_enter() at kdb_enter+0x3b/frame 0xfe3e653dcc10
>> vpanic() at vpanic+0x1b9/frame 0xfe3e653dcc90
>> panic() at panic+0x43/frame 0xfe3e653dccf0
>> __mtx_assert() at __mtx_assert+0xb4/frame 0xfe3e653dcd00
>> sched_add() at sched_add+0x152/frame 0xfe3e653dcd40
>> intr_event_schedule_thread() at intr_event_schedule_thread+0xca/frame
>> 0xfe3e653dcd80
>> swi_sched() at swi_sched+0x6c/frame 0xfe3e653dcdc0
>> softclock_call_cc() at softclock_call_cc+0x155/frame 0xfe3e653dce70
>> callout_process() at callout_process+0x1f9/frame 0xfe3e653dcef0
>> handleevents() at handleevents+0x1a4/frame 0xfe3e653dcf30
>> cpu_initclocks_ap() at cpu_initclocks_ap+0xc8/frame 0xfe3e653dcf60
>> init_secondary_tail() at init_secondary_tail+0x1e3/frame 0xfe3e653dcf90
>> init_secondary() at init_secondary+0x2b3/frame 0xfe3e653dcff0
>>
>>
>> db> show thread 0xf80128cd1560
>> Thread 100020 at 0xf80128cd1560:
>>  proc (pid 11): 0xf80128cb5000
>>  name: idle: cpu17
>>  stack: 0xfe3e5cd88000-0xfe3e5cd8bfff
>>  flags: 0x40024  pflags: 0x20
>>  state: CAN RUN
>>  priority: 255
>>  container lock: sched lock 0 (0x81c39800)
>> db> show lock 0x81c39800
>>  class: spin mutex
>>  name: sched lock 0
>>  flags: {SPIN, RECURSE}
>>  state: {OWNED}
>>  owner: 0xf80128cca000 (tid 100017, pid 11, "idle: cpu14")
>>
>>
>> db> bt
>> Tracing pid 11 tid 100021 td 0xf80128cd2000
>> kdb_enter() at kdb_enter+0x3b/frame 0xfe3e655e4c10
>> vpanic() at vpanic+0x1b9/frame 0xfe3e655e4c90
>> panic() at panic+0x43/frame 0xfe3e655e4cf0
>> __mtx_assert() at __mtx_assert+0xb4/frame 0xfe3e655e4d00
>> sched_add() at sched_add+0x152/frame 0xfe3e655e4d40
>> intr_event_schedule_thread() at intr_event_schedule_thread+0xca/frame
>> 0xfe3e655e4d80
>> swi_sched() at swi_sched+0x6c/frame 0xfe3e655e4dc0
>> softclock_call_cc() at softclock_call_cc+0x155/frame 0xfe3e655e4e70
>> callout_process() at callout_process+0x1f9/frame 0xfe3e655e4ef0
>> handleevents() at handleevents+0x1a4/frame 0xfe3e655e4f30
>> cpu_initclocks_ap() at cpu_initclocks_ap+0xc8/frame 0xfe3e655e4f60
>> init_secondary_tail() at init_secondary_tail+0x1e3/frame 0xfe3e655e4f90
>> init_secondary() at init_secondary+0x2b3/frame 0xfe3e655e4ff0
>> db> show thread 0xf80128cd2000
>> Thread 100021 at 0xf80128cd2000:
>>  proc (pid 11): 0xf80128cb6000
>>  name: idle: cpu18
>>  stack: 0xfe3e5cf17000-0xfe3e5cf1afff
>>  flags: 0x40024  pflags: 0x20
>>  state: CAN RUN
>>  priority: 255
>>  container lock: sched lock 0 (0x81c39800)
>> db> show lock 0x81c39800
>>  class: spin mutex
>>  name: sched lock 0
>>  flags: {SPIN, RECURSE}
>>  state: {OWNED}
>>  owner: 0xf80128cdb560 (tid 100028, pid 11, "idle: cpu25")
>>
>> Regards,
>> Kevin
>
>
> --
> John Baldwin
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


!EARLY_AP_STARTUP and -CURRENT

2017-08-30 Thread Kevin Bowling
I'm dealing with a shit sandwich right now where mps(4) or CAM's da(4)
reorders drives on a few thousand legacy MBR machines I have (and I
can't easily install glabel at the moment), and !EARLY_AP_STARTUP seems
to have regressed.  I'd like to be able to run without EARLY_AP_STARTUP
for now so I can take a more leisurely approach to fixing the mps(4)
boot probe correctly (freebsd-scsi@ has that thread).

With WITNESS and !EARLY_AP_STARTUP I hit an assert in sched_setpreempt
in kern/sched_ule.c 100% of the time.  Here are a couple of invocations;
the oddity is that a different CPU's idle thread holds the curthread
lock while a different AP is running the function:
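For reference, the configuration under test can be reproduced with a
custom kernel config along these lines (a sketch; the config name is
hypothetical, and GENERIC on amd64 enables EARLY_AP_STARTUP by default):

```
# Hypothetical kernel config: start from GENERIC, strip EARLY_AP_STARTUP
include         GENERIC
ident           TEST-NO-EARLY-AP
nooptions       EARLY_AP_STARTUP
```

followed by the usual 'make buildkernel KERNCONF=TEST-NO-EARLY-AP' and
'make installkernel KERNCONF=TEST-NO-EARLY-AP'.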

Tracing pid 11 tid 100020 td 0xf80128cd1560
kdb_enter() at kdb_enter+0x3b/frame 0xfe3e653dcc10
vpanic() at vpanic+0x1b9/frame 0xfe3e653dcc90
panic() at panic+0x43/frame 0xfe3e653dccf0
__mtx_assert() at __mtx_assert+0xb4/frame 0xfe3e653dcd00
sched_add() at sched_add+0x152/frame 0xfe3e653dcd40
intr_event_schedule_thread() at intr_event_schedule_thread+0xca/frame
0xfe3e653dcd80
swi_sched() at swi_sched+0x6c/frame 0xfe3e653dcdc0
softclock_call_cc() at softclock_call_cc+0x155/frame 0xfe3e653dce70
callout_process() at callout_process+0x1f9/frame 0xfe3e653dcef0
handleevents() at handleevents+0x1a4/frame 0xfe3e653dcf30
cpu_initclocks_ap() at cpu_initclocks_ap+0xc8/frame 0xfe3e653dcf60
init_secondary_tail() at init_secondary_tail+0x1e3/frame 0xfe3e653dcf90
init_secondary() at init_secondary+0x2b3/frame 0xfe3e653dcff0


db> show thread 0xf80128cd1560
Thread 100020 at 0xf80128cd1560:
 proc (pid 11): 0xf80128cb5000
 name: idle: cpu17
 stack: 0xfe3e5cd88000-0xfe3e5cd8bfff
 flags: 0x40024  pflags: 0x20
 state: CAN RUN
 priority: 255
 container lock: sched lock 0 (0x81c39800)
db> show lock 0x81c39800
 class: spin mutex
 name: sched lock 0
 flags: {SPIN, RECURSE}
 state: {OWNED}
 owner: 0xf80128cca000 (tid 100017, pid 11, "idle: cpu14")


db> bt
Tracing pid 11 tid 100021 td 0xf80128cd2000
kdb_enter() at kdb_enter+0x3b/frame 0xfe3e655e4c10
vpanic() at vpanic+0x1b9/frame 0xfe3e655e4c90
panic() at panic+0x43/frame 0xfe3e655e4cf0
__mtx_assert() at __mtx_assert+0xb4/frame 0xfe3e655e4d00
sched_add() at sched_add+0x152/frame 0xfe3e655e4d40
intr_event_schedule_thread() at intr_event_schedule_thread+0xca/frame
0xfe3e655e4d80
swi_sched() at swi_sched+0x6c/frame 0xfe3e655e4dc0
softclock_call_cc() at softclock_call_cc+0x155/frame 0xfe3e655e4e70
callout_process() at callout_process+0x1f9/frame 0xfe3e655e4ef0
handleevents() at handleevents+0x1a4/frame 0xfe3e655e4f30
cpu_initclocks_ap() at cpu_initclocks_ap+0xc8/frame 0xfe3e655e4f60
init_secondary_tail() at init_secondary_tail+0x1e3/frame 0xfe3e655e4f90
init_secondary() at init_secondary+0x2b3/frame 0xfe3e655e4ff0
db> show thread 0xf80128cd2000
Thread 100021 at 0xf80128cd2000:
 proc (pid 11): 0xf80128cb6000
 name: idle: cpu18
 stack: 0xfe3e5cf17000-0xfe3e5cf1afff
 flags: 0x40024  pflags: 0x20
 state: CAN RUN
 priority: 255
 container lock: sched lock 0 (0x81c39800)
db> show lock 0x81c39800
 class: spin mutex
 name: sched lock 0
 flags: {SPIN, RECURSE}
 state: {OWNED}
 owner: 0xf80128cdb560 (tid 100028, pid 11, "idle: cpu25")

Regards,
Kevin


Re: HOWTO articles for migrating from Linux to FreeBSD, especially for pkg?

2014-07-20 Thread Kevin Bowling

On 7/18/2014 1:18 PM, Alfred Perlstein wrote:


On 7/18/14, 6:28 AM, Allan Jude wrote:

On 2014-07-17 16:12, Adrian Chadd wrote:

On 17 July 2014 13:03, Alberto Mijares  wrote:

On Thu, Jul 17, 2014 at 2:58 PM, Adrian Chadd 
wrote:

Hi!

3) The binary packages need to work out of the box
4) .. which means, when you do things like pkg install apache, it
can't just be installed and not be enabled, because that's a bit of a
problem;


No. Please NEVER do that! The user must be able to edit the files and
start the service by himself.

Cool, so what's the single-line command you need to type to start a
given package service?



-a


We could make 'service apache22 enable'

which can run: sysrc -f /etc/rc.conf apache22_enable="YES"

and 'service apache22 disable'

that can use sysrc -x

And then ports can individually extend the functionality if they require.
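
The enable/disable verbs above amount to editing a single knob in an
rc.conf-style file.  Here is a toy stand-in for the file edit those
sysrc(8) calls perform (hypothetical helper names, plain POSIX sh; the
real sysrc also handles quoting, defaults, and multiple files):

```shell
# set_rcvar FILE name=value : add or update an rc.conf-style knob
# (roughly what `sysrc -f FILE name=value` does).
set_rcvar() {
    file=$1
    knob=${2%%=*}
    grep -v "^${knob}=" "$file" > "$file.tmp" || true   # drop any old value
    echo "$2" >> "$file.tmp"                            # append the new one
    mv "$file.tmp" "$file"
}

# del_rcvar FILE name : remove the knob entirely (like `sysrc -x name`).
del_rcvar() {
    file=$1
    grep -v "^${2}=" "$file" > "$file.tmp" || true
    mv "$file.tmp" "$file"
}

rcconf=$(mktemp)
set_rcvar "$rcconf" 'apache22_enable="YES"'   # i.e. service apache22 enable
enabled_line=$(cat "$rcconf")
del_rcvar "$rcconf" apache22_enable           # i.e. service apache22 disable
remaining=$(wc -l < "$rcconf")
rm -f "$rcconf"
```

A generic 'enable' verb in rc.subr could then be little more than a
call out to sysrc with ${name}_enable="YES".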


I like this a lot.

That said, if other distros are setting up apache in 2 steps and we
require 3, then we require 50% MORE STEPS!

Or they require 33% FEWER steps than us.

Just to put it into perspective.  Should FreeBSD be 50% more difficult
or time-consuming to configure?

-Alfred
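
For concreteness, the three steps being counted would presumably look
something like this (package name taken from the thread; the exact
commands are an illustration, not a prescription):

```
pkg install apache22           # 1. install the package
sysrc apache22_enable="YES"    # 2. enable it at boot
service apache22 start         # 3. start it now
```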


Yes.  As someone who works on a large fleet of Ubuntu systems, I think
the worst thing dpkg does is auto-start services - it even auto-restarts
them on updates in some cases.


* Starting a service is a security risk, especially before it has been
configured, either manually or with tools.  This is potentially true
even with "sane defaults" - for instance, the package may be installed
from an image/media and need to be updated from an internet repo because
the image has aged.
* Mandatory (re)starting of a service may happen before all of its
dependencies are upgraded/installed, requiring multiple pointless and
time-consuming restarts.
* Likewise, starting a service before manual configuration or the CM
policy is applied can cause all sorts of problems, and again has
security implications.


The way things are done for large infrastructure is with some type of
configuration management or orchestration tool like Puppet, Chef, Salt,
Ansible, or cfengine.  This is the case even for the small deployments
by the types of users Craig was talking about in the initial post.


Regards,
Kevin

