Dear Neil, I was running it on arm64, may that be the reason?
Regards, Richard On May 27, 2024 4:02:32 AM GMT+02:00, NeilBrown <ne...@suse.de> wrote: >On Sun, 26 May 2024, Richard Kojedzinszky wrote: >> Dear Neil, >> >> According to my quick tests, your patch seems to fix this bug. Could you >> also manage to try my attached code, could you also reproduce the bug? > >Thanks for testing. > >I can run your test code but it isn't triggering the bug (90 minutes so >far). Possibly a different compiler used for the kernel, possibly >hardware differences (I'm running under qemu). Bugs related to barriers >(which this one seems to be) need just the right circumstances to >trigger so they can be hard to reproduce on a different system. > >I've made some cosmetic improvements to the patch and will post it to >the NFS maintainers. > >Thanks again, >NeilBrown > > >> >> Thanks, >> Richard >> >> 2024-05-24 07:29 időpontban Richard Kojedzinszky ezt írta: >> > Dear Neil, >> > >> > I've applied your patch, and since then there are no lockups. Before >> > that my application reported a lockup in a minute or two, now it has >> > been running for half an hour, and still running. >> > >> > Thanks, >> > Richard >> > >> > 2024-05-24 01:31 időpontban NeilBrown ezt írta: >> >> On Fri, 24 May 2024, Richard Kojedzinszky wrote: >> >>> Dear devs, >> >>> >> >>> I am attaching a stripped down version of the little program which >> >>> triggers the bug very quickly, in a few minutes in my test lab. It >> >>> turned out that a single NFS mountpoint is enough. Just start the >> >>> program giving it the NFS mount as first argument. It will chdir >> >>> there, >> >>> and do file operations, which will trigger a lockup in a few minutes. >> >> >> >> I couldn't get the go code to run. But then it is a long time since I >> >> played with go and I didn't try very hard. >> >> If you could provide simple instructions and a list of package >> >> dependencies that I need to install (on Debian), I can give it a try. >> >> >> >> Or you could try this patch. It might help, but I don't have high >> >> hopes. It adds some memory barriers and fixes a bug which would cause >> >> a >> >> problem if memory allocation failed (but memory allocation never >> >> fails). >> >> >> >> NeilBrown >> >> >> >> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c >> >> index ac505671efbd..5bcc0d14d519 100644 >> >> --- a/fs/nfs/dir.c >> >> +++ b/fs/nfs/dir.c >> >> @@ -1804,7 +1804,7 @@ __nfs_lookup_revalidate(struct dentry *dentry, >> >> unsigned int flags, >> >> } else { >> >> /* Wait for unlink to complete */ >> >> wait_var_event(&dentry->d_fsdata, >> >> - dentry->d_fsdata != NFS_FSDATA_BLOCKED); >> >> + smp_load_acquire(&dentry->d_fsdata) != >> >> NFS_FSDATA_BLOCKED); >> >> parent = dget_parent(dentry); >> >> ret = reval(d_inode(parent), dentry, flags); >> >> dput(parent); >> >> @@ -2508,7 +2508,7 @@ int nfs_unlink(struct inode *dir, struct dentry >> >> *dentry) >> >> spin_unlock(&dentry->d_lock); >> >> error = nfs_safe_remove(dentry); >> >> nfs_dentry_remove_handle_error(dir, dentry, error); >> >> - dentry->d_fsdata = NULL; >> >> + smp_store_release(&dentry->d_fsdata, NULL); >> >> wake_up_var(&dentry->d_fsdata); >> >> out: >> >> trace_nfs_unlink_exit(dir, dentry, error); >> >> @@ -2616,7 +2616,7 @@ nfs_unblock_rename(struct rpc_task *task, struct >> >> nfs_renamedata *data) >> >> { >> >> struct dentry *new_dentry = data->new_dentry; >> >> >> >> - new_dentry->d_fsdata = NULL; >> >> + smp_store_release(&new_dentry->d_fsdata, NULL); >> >> wake_up_var(&new_dentry->d_fsdata); >> >> } >> >> >> >> @@ -2717,6 +2717,10 @@ int nfs_rename(struct mnt_idmap *idmap, struct >> >> inode *old_dir, >> >> task = nfs_async_rename(old_dir, new_dir, old_dentry, new_dentry, >> >> must_unblock ? nfs_unblock_rename : NULL); >> >> if (IS_ERR(task)) { >> >> + if (must_unlock) { >> >> + smp_store_release(&new_dentry->d_fsdata, NULL); >> >> + wake_up_var(&new_dentry->d_fsdata); >> >> + } >> >> error = PTR_ERR(task); >> >> goto out; >> >> } >> >