Re: NFSv4 crash of CURRENT

Peter Blok Mon, 15 Jan 2024 07:59:49 -0800

Rick,

I can confirm Kostik’s fix works on 13-stable.


Peter

> On 15 Jan 2024, at 16:13, Peter Blok <pb...@bsd4all.org> wrote:
> 
> I can give it a shot on one of my clients.
> 
>> On 15 Jan 2024, at 16:04, Rick Macklem <rick.mack...@gmail.com> wrote:
>> 
>> On Mon, Jan 15, 2024 at 2:53 AM Peter Blok <pb...@bsd4all.org> wrote:
>>> 
>>> Hi,
>>> 
>>> Forgot to mention I’m on 13-stable. The fix that is causing the crash with 
>>> automounted NFS is:
>>> 
>>> commit cc5cda1dbaa907ce52074f47264cc45b5a7d6c8b
>>> Author: Konstantin Belousov <k...@freebsd.org>
>>> Date:   Tue Jan 2 00:22:44 2024 +0200
>>> 
>>>    nfsclient: limit situations when we do unlocked read-ahead by nfsiod
>>> 
>>>    (cherry picked from commit 70dc6b2ce314a0f32755005ad02802fca7ed186e)
>>> 
>>> When I remove the fix, the problem is gone. Add it back and the crash 
>>> happens.
>> Kostik has already come up with a probable fix. If you want it right
>> away, here it is,
>> but he'll probably commit it soon anyhow:
>> diff --git a/sys/fs/nfsclient/nfs_clbio.c b/sys/fs/nfsclient/nfs_clbio.c
>> index c027d7d7c3fd..1cf45bb0c924 100644
>> --- a/sys/fs/nfsclient/nfs_clbio.c
>> +++ b/sys/fs/nfsclient/nfs_clbio.c
>> @@ -414,6 +414,18 @@ nfs_bioread_check_cons(struct vnode *vp, struct thread *td, struct ucred *cred)
>>        return (error);
>> }
>> 
>> +static bool
>> +ncl_bioread_dora(struct vnode *vp)
>> +{
>> +       vm_object_t obj;
>> +
>> +       obj = vp->v_object;
>> +       if (obj == NULL)
>> +               return (true);
>> +       return (!vm_object_mightbedirty(vp->v_object) &&
>> +           vp->v_object->un_pager.vnp.writemappings == 0);
>> +}
>> +
>> /*
>>  * Vnode op for read using bio
>>  */
>> @@ -486,9 +498,7 @@ ncl_bioread(struct vnode *vp, struct uio *uio, int ioflag, struct ucred *cred)
>>                 * unlocked read by nfsiod could obliterate changes
>>                 * done by userspace.
>>                 */
>> -               if (nmp->nm_readahead > 0 &&
>> -                   !vm_object_mightbedirty(vp->v_object) &&
>> -                   vp->v_object->un_pager.vnp.writemappings == 0) {
>> +               if (nmp->nm_readahead > 0 && ncl_bioread_dora(vp)) {
>>                    for (nra = 0; nra < nmp->nm_readahead && nra < seqcount &&
>>                        (off_t)(lbn + 1 + nra) * biosize < nsize; nra++) {
>>                        rabn = lbn + 1 + nra;
>> @@ -675,9 +685,7 @@ ncl_bioread(struct vnode *vp, struct uio *uio, int ioflag, struct ucred *cred)
>>                 *  directory offset cookie of the next block.)
>>                 */
>>                NFSLOCKNODE(np);
>> -               if (nmp->nm_readahead > 0 &&
>> -                   !vm_object_mightbedirty(vp->v_object) &&
>> -                   vp->v_object->un_pager.vnp.writemappings == 0 &&
>> +               if (nmp->nm_readahead > 0 && ncl_bioread_dora(vp) &&
>>                    (bp->b_flags & B_INVAL) == 0 &&
>>                    (np->n_direofoffset == 0 ||
>>                    (lbn + 1) * NFS_DIRBLKSIZ < np->n_direofoffset) &&
>> 
>> rick
>> ps: It appears that autofs causes the directory to be read before it
>>      is open'd for some reason. I've never looked at autofs.
>> 
>>> 
>>> Peter
>>> 
>>> On 15 Jan 2024, at 09:31, Peter Blok <pb...@bsd4all.org> wrote:
>>> 
>>> Hi,
>>> 
>>> I have a crash on an NFS client running today's stable 
>>> (4c4633fdffbe8e4b6d328c2bc9bb3edacc9ab50a). It is also autofs related. 
>>> Maybe it is the same problem.
>>> 
>>> I have ports automounted on /am/ports. When I cd to /am/ports/sys and press 
>>> Tab to autocomplete, it crashes with the stack trace below. If I mount ports 
>>> directly on /usr/ports and do the same, everything works. I am using NFSv3.
>>> 
>>> Peter
>>> 
>>> 
>>> 
>>> 
>>> Fatal trap 12: page fault while in kernel mode
>>> cpuid = 2; apic id = 04
>>> fault virtual address = 0x89
>>> fault code = supervisor read data, page not present
>>> instruction pointer = 0x20:0xffffffff809645d4
>>> stack pointer        = 0x28:0xfffffe00acadb830
>>> frame pointer        = 0x28:0xfffffe00acadb830
>>> code segment = base 0x0, limit 0xfffff, type 0x1b
>>> = DPL 0, pres 1, long 1, def32 0, gran 1
>>> processor eflags = interrupt enabled, resume, IOPL = 0
>>> current process = 6869 (csh)
>>> trap number = 12
>>> panic: page fault
>>> cpuid = 2
>>> time = 1705306940
>>> KDB: stack backtrace:
>>> #0 0xffffffff806232f5 at kdb_backtrace+0x65
>>> #1 0xffffffff805d7a02 at vpanic+0x152
>>> #2 0xffffffff805d78a3 at panic+0x43
>>> #3 0xffffffff809d58ad at trap_fatal+0x38d
>>> #4 0xffffffff809d58ff at trap_pfault+0x4f
>>> #5 0xffffffff809af048 at calltrap+0x8
>>> #6 0xffffffff804c7a7e at ncl_bioread+0xb7e
>>> #7 0xffffffff804b9d90 at nfs_readdir+0x1f0
>>> #8 0xffffffff8069c61a at vop_sigdefer+0x2a
>>> #9 0xffffffff809f8ae0 at VOP_READDIR_APV+0x20
>>> #10 0xffffffff81ce75de at autofs_readdir+0x2ce
>>> #11 0xffffffff809f8ae0 at VOP_READDIR_APV+0x20
>>> #12 0xffffffff806c3002 at kern_getdirentries+0x222
>>> #13 0xffffffff806c33a9 at sys_getdirentries+0x29
>>> #14 0xffffffff809d6180 at amd64_syscall+0x110
>>> #15 0xffffffff809af95b at fast_syscall_common+0xf8
>>> 
>>> 
>>> 
>>> On 15 Jan 2024, at 06:46, FreeBSD User <free...@walstatt-de.de> wrote:
>>> 
>>> On Sun, 14 Jan 2024 20:34:12 -0800
>>> Cy Schubert <cy.schub...@cschubert.com> wrote:
>>> 
>>> In message 
>>> <CAM5tNy5aat8vUn2fsX9jV=D9yGZdnO20Q0Ea7qtszx+zSES2bw@mail.gmail.com>,
>>> Rick Macklem writes:
>>> 
>>> On Sat, Jan 13, 2024 at 12:39 PM Ronald Klop <ronald-li...@klop.ws> wrote:
>>> 
>>> 
>>> 
>>> From: FreeBSD User <free...@walstatt-de.de>
>>> Date: 13 January 2024 19:34
>>> To: FreeBSD CURRENT <freebsd-current@freebsd.org>
>>> Subject: NFSv4 crash of CURRENT
>>> 
>>> Hello,
>>> 
>>> running CURRENT client (FreeBSD 15.0-CURRENT #4 main-n267556-69748e62e82a: Sat Jan 13 18:08:32
>>> CET 2024 amd64). One NFSv4 server is the same OS revision as the mentioned client, the other is FreeBSD
>>> 13.2-RELEASE-p8. Both offer NFSv4 filesystems, non-kerberized.
>>> 
>>> I can crash the client reproducibly by accessing either NFSv4 FS (a simple ls -la).
>>> The NFSv4 FS is backed by ZFS (if this matters). I do not have physical access to the client
>>> host; luckily the box recovers.
>>> 
>>> Did you rebuild both the nfscommon and nfscl modules from the same sources?
>>> I did a commit to main that changes the interface between these two
>>> modules and did bump the
>>> __FreeBSD_version to 1500010, which should cause both to be rebuilt.
>>> (If you have "options NFSCL" in your kernel config, both should have
>>> been rebuilt as a part of
>>> the kernel build.)
>>> 
>>> 
>>> Is anyone by chance seeing autofs in the backtrace too?
>>> 
>>> 
>>> 
>>> Hello Cy Schubert,
>>> 
>>> I forgot to mention that those crashes occur with autofs mounted 
>>> filesystems. Good question,
>>> by the way, I will check whether crashes also happen when mounting the 
>>> traditional way.
>>> 
>>> Kind regards,
>>> 
>>> oh
>>> 
>>> --
>>> O. Hartmann
>
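
For readers following the thread: the patch quoted above moves the read-ahead
condition into ncl_bioread_dora() and makes it tolerate a NULL vp->v_object.
The faulting address in the trace (0x89) is consistent with a member access
through a NULL pointer, though the thread itself does not spell that out.
Below is a minimal userland sketch of the same guard pattern. Everything in it
is a hypothetical stand-in (fake_object, fake_vnode, fake_bioread_dora); it is
not the kernel code and does not use the real vm_object or vnode definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the VM object; not the kernel's vm_object. */
struct fake_object {
	bool might_be_dirty;	/* models vm_object_mightbedirty() */
	int  writemappings;	/* models un_pager.vnp.writemappings */
};

/* Hypothetical stand-in for a vnode whose object may not exist yet. */
struct fake_vnode {
	struct fake_object *v_object;	/* NULL until the object is created */
};

/*
 * Same shape as ncl_bioread_dora() in the quoted patch: with no object
 * there can be no dirty pages or write mappings, so read-ahead is
 * allowed; otherwise both conditions must hold.
 */
static bool
fake_bioread_dora(const struct fake_vnode *vp)
{
	const struct fake_object *obj;

	assert(vp != NULL);
	obj = vp->v_object;
	if (obj == NULL)
		return (true);
	return (!obj->might_be_dirty && obj->writemappings == 0);
}
```

Calling fake_bioread_dora() on a fake_vnode whose v_object is NULL returns
true instead of dereferencing the pointer, which is the case the pre-patch
condition did not handle.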
