Re: FreeBSD panics possibly caused by nfs clients

2024-03-06 Thread Rick Macklem
On Wed, Mar 6, 2024 at 10:56 AM Matthew L. Dailey wrote:
>
> Posting a few updates on this issue.
>
> I was able to induce a panic on a CURRENT kernel (20240215), built with
> GENERIC-KASAN and running kern.kstack_pages=6 (default) after ~189
> hours. The panic message and backtrace are below -

Re: FreeBSD panics possibly caused by nfs clients

2024-03-06 Thread Matthew L. Dailey
Posting a few updates on this issue. I was able to induce a panic on a CURRENT kernel (20240215), built with GENERIC-KASAN and running kern.kstack_pages=6 (default) after ~189 hours. The panic message and backtrace are below - please reach out directly if you'd like to have a look at the core.

Re: FreeBSD panics possibly caused by nfs clients

2024-02-20 Thread Rick Macklem
On Tue, Feb 20, 2024 at 11:21 AM Matthew L. Dailey wrote:
>
> Hi all,
>
> I induced a panic on my CURRENT (20240215-d79b6b8ec267-268300) VM after
> about 24 hours. This is the one without any debugging, so it only
> confirms the fact that the panics we've been experiencing still exist in
>

Re: FreeBSD panics possibly caused by nfs clients

2024-02-20 Thread Matthew L. Dailey
Hi all, I induced a panic on my CURRENT (20240215-d79b6b8ec267-268300) VM after about 24 hours. This is the one without any debugging, so it only confirms the fact that the panics we've been experiencing still exist in CURRENT. There was some disk issue that prevented the dump, so all I have

Re: FreeBSD panics possibly caused by nfs clients

2024-02-19 Thread Rick Macklem
On Mon, Feb 19, 2024 at 7:44 AM Matthew L. Dailey wrote:
>
> Hi all,
>
> So I finally induced a panic on a "pure" ufs system - root and exported
> filesystem were both ufs. So, I think this definitively rules out zfs as
> a source of the issue.
>
> This panic was on 14.0p5 without debugging

Re: FreeBSD panics possibly caused by nfs clients

2024-02-19 Thread Matthew L. Dailey
Hi all, So I finally induced a panic on a "pure" ufs system - root and exported filesystem were both ufs. So, I think this definitively rules out zfs as a source of the issue. This panic was on 14.0p5 without debugging options, so the core may not be helpful. The panic and backtrace are below

Re: FreeBSD panics possibly caused by nfs clients

2024-02-16 Thread Matthew L. Dailey
Hi all, Before the week was out, I wanted to provide an update on this issue. Last weekend, I installed two VMs with CURRENT (20240208-82bebc793658-268105) - one on zfs and one on ufs - and built a kernel with this config file:

    include GENERIC
    ident THAYER-FULLDEBUG
    makeoptions DEBUG=-g

Re: FreeBSD panics possibly caused by nfs clients

2024-02-09 Thread Rick Macklem
On Fri, Feb 9, 2024 at 10:23 AM Matthew L. Dailey wrote:
>
> I had my first kernel panic with a KASAN kernel after only 01:27. This
> first panic was a "double fault," which isn't anything we've seen
> previously - usually we've seen trap 9 or trap 12, but sometimes others.
> Based on the

Re: FreeBSD panics possibly caused by nfs clients

2024-02-09 Thread Mark Johnston
On Fri, Feb 09, 2024 at 10:11:14PM +, Matthew L. Dailey wrote:
> On 2/9/24 4:18 PM, Mark Johnston wrote:
> > On Fri, Feb 09, 2024 at 06:23:08PM +,

Re: FreeBSD panics possibly caused by nfs clients

2024-02-09 Thread Rick Macklem
On Fri, Feb 9, 2024 at 2:04 PM Zaphod Beeblebrox wrote:
>
> Just in case it's relevant, I'm carrying around this patch on my fairly busy
> little RISC-V machine.
>
> diff --git a/sys/fs/nfsclient/nfs_clvnops.c b/sys/fs/nfsclient/nfs_clvnops.c
> index 0b8c587a542c..85c0ebd7a10f 100644
> ---

Re: FreeBSD panics possibly caused by nfs clients

2024-02-09 Thread Matthew L. Dailey
On 2/9/24 4:18 PM, Mark Johnston wrote:
> On Fri, Feb 09, 2024 at 06:23:08PM +, Matthew L. Dailey wrote:
>> I had my first kernel panic with a KASAN

Re: FreeBSD panics possibly caused by nfs clients

2024-02-09 Thread Zaphod Beeblebrox
Just in case it's relevant, I'm carrying around this patch on my fairly busy little RISC-V machine.

diff --git a/sys/fs/nfsclient/nfs_clvnops.c b/sys/fs/nfsclient/nfs_clvnops.c
index 0b8c587a542c..85c0ebd7a10f 100644
--- a/sys/fs/nfsclient/nfs_clvnops.c
+++ b/sys/fs/nfsclient/nfs_clvnops.c
@@

Re: FreeBSD panics possibly caused by nfs clients

2024-02-09 Thread Mark Johnston
On Fri, Feb 09, 2024 at 06:23:08PM +, Matthew L. Dailey wrote:
> I had my first kernel panic with a KASAN kernel after only 01:27. This
> first panic was a "double fault," which isn't anything we've seen
> previously - usually we've seen trap 9 or trap 12, but sometimes others.
> Based on

Re: FreeBSD panics possibly caused by nfs clients

2024-02-09 Thread Matthew L. Dailey
I had my first kernel panic with a KASAN kernel after only 01:27. This first panic was a "double fault," which isn't anything we've seen previously - usually we've seen trap 9 or trap 12, but sometimes others. Based on the backtrace, it definitely looks like KASAN caught something, but I don't

Re: FreeBSD panics possibly caused by nfs clients

2024-02-09 Thread Matthew L. Dailey
On 2/9/24 11:04 AM, Mark Johnston wrote:
> On Thu, Feb 08, 2024 at 03:34:52PM +, Matthew L. Dailey wrote:
>> Good morning all,
>>
>> Per Rick Macklem's

Re: FreeBSD panics possibly caused by nfs clients

2024-02-09 Thread Mark Johnston
On Thu, Feb 08, 2024 at 03:34:52PM +, Matthew L. Dailey wrote:
> Good morning all,
>
> Per Rick Macklem's suggestion, I'm posting this query here in the hopes
> that others may have ideas.
>
> We did do some minimal testing with ufs around this problem back in
> August, but hadn't narrowed

FreeBSD panics possibly caused by nfs clients

2024-02-08 Thread Matthew L. Dailey
Good morning all, Per Rick Macklem's suggestion, I'm posting this query here in the hopes that others may have ideas. We did do some minimal testing with ufs around this problem back in August, but hadn't narrowed the issue down to hdf5 workloads yet, so testing was inconclusive. We'll do