https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=249871

            Bug ID: 249871
           Summary: NFSv4 faulty directory listings under heavy load
           Product: Base System
           Version: 12.1-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: b...@freebsd.org
          Reporter: j...@freebsd.org

I think I've discovered a peculiar bug in NFSv4.  When the server is under
heavy load, directory listings sometimes show duplicate filenames and other
times omit filenames.

This was discovered while running parallel jobs on a small HPC cluster.  Each
job runs xzcat on an NFS-served file, writes the uncompressed output to a local
disk on the client, does some brief but heavy computation, and then writes
several small output files back to the NFS server.  As shown below, 11,031
files were processed.  Concurrency was capped at between 50 and 150 jobs at a
time, and the problem occurred regardless of the cap.

All of the list-*.txt files shown below were produced by

    ls | grep 'combined.*-ad\.vcf\.xz'

or

    find . -maxdepth 1 -name 'combined.*-ad.vcf.xz'

The file list-1.txt contains the correct directory listing.

list-100.txt, however, contains duplicate filenames, and list-1000.txt has both
duplicate and missing filenames.

# sort list-1.txt | uniq -d

# sort list-100.txt | uniq -d
combined.NWD297242-ad.vcf.xz
combined.NWD745320-ad.vcf.xz
combined.NWD787696-ad.vcf.xz

# wc -l list-1.txt list-100.txt list-1000.txt
   11031 list-1.txt
   11034 list-100.txt
   11027 list-1000.txt
   33092 total

# diff list-1.txt list-100.txt
2404a2405
> combined.NWD297242-ad.vcf.xz
7856a7858
> combined.NWD745320-ad.vcf.xz
8391a8394
> combined.NWD787696-ad.vcf.xz

# diff list-1.txt list-1000.txt
153a154
> combined.NWD111306-ad.vcf.xz
170d170
< combined.NWD113182-ad.vcf.xz
512d511
[snip]
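The three kinds of discrepancy (duplicated names, missing names, and names that
never existed) can be separated mechanically instead of being read out of raw
diff output.  Below is a minimal POSIX sh sketch, assuming list-1.txt is a
trustworthy reference taken under light load.  The function name
classify_listing is hypothetical and not part of any existing tool.

```sh
#!/bin/sh
# classify_listing REFERENCE CANDIDATE (hypothetical helper)
# Prints one line per problem found in CANDIDATE relative to REFERENCE:
#   "duplicate NAME"  NAME appears more than once in CANDIDATE
#   "missing NAME"    NAME is in REFERENCE but not in CANDIDATE
#   "extra NAME"      NAME is in CANDIDATE but never in REFERENCE
classify_listing() {
    ref=$1 cand=$2
    tmp=$(mktemp -d) || return 1
    # Use the C locale so sort, uniq and comm all agree on ordering.
    LC_ALL=C sort "$cand" > "$tmp/cand.sorted"
    LC_ALL=C sort -u "$ref" > "$tmp/ref.set"
    LC_ALL=C sort -u "$cand" > "$tmp/cand.set"
    # Names listed more than once in the candidate listing.
    LC_ALL=C uniq -d "$tmp/cand.sorted" | sed 's/^/duplicate /'
    # Names present in the reference but absent from the candidate.
    LC_ALL=C comm -23 "$tmp/ref.set" "$tmp/cand.set" | sed 's/^/missing /'
    # Names present in the candidate but never in the reference.
    LC_ALL=C comm -13 "$tmp/ref.set" "$tmp/cand.set" | sed 's/^/extra /'
    rm -rf "$tmp"
}
```

For example, "classify_listing list-1.txt list-100.txt" should report only the
three duplicated names shown above, while list-1000.txt would produce both
"duplicate" and "missing" lines.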

If I revert the mounts to NFSv3, the problem goes away (but performance
suffers).

There are no apparent problems delivering file content, just directory
listings.  Using this fact, I can work around the problem by writing the
directory listing to a file beforehand, when the server is not under load:

    ls | grep 'combined.*-ad\.vcf\.xz' > VCF-list.txt

Reading this file under heavy load causes no problems; the errors appear only
when a fresh directory listing is generated with "ls" or "find".
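The snapshot workaround can be made slightly more defensive.  Below is a
minimal POSIX sh sketch; snapshot_listing and for_each_listed are hypothetical
names.  Note that the duplicate check catches only one symptom: a missing name
cannot be detected without a known-good reference listing to compare against.

```sh
#!/bin/sh
# Hypothetical helpers for the snapshot workaround; not part of any tool.

# snapshot_listing DIR OUT: save the matching names in DIR to OUT, but refuse
# to save a listing that already contains a duplicated name.
snapshot_listing() {
    dir=$1 out=$2
    ( cd "$dir" && ls ) | grep 'combined.*-ad\.vcf\.xz' > "$out.tmp" || {
        rm -f "$out.tmp"
        return 1
    }
    if [ -n "$(LC_ALL=C sort "$out.tmp" | LC_ALL=C uniq -d)" ]; then
        echo "duplicate names in listing of $dir; retry under lighter load" >&2
        rm -f "$out.tmp"
        return 1
    fi
    mv "$out.tmp" "$out"
}

# for_each_listed LIST CMD...: run CMD once per saved name, appending the name
# as the last argument, without listing the NFS directory again.
for_each_listed() {
    list=$1
    shift
    while IFS= read -r name; do
        "$@" "$name"
    done < "$list"
}
```

For example, "snapshot_listing . VCF-list.txt" could be run while the server
is idle, and each job could then use "for_each_listed VCF-list.txt CMD"
instead of calling ls or find itself.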

The problem is consistently reproducible under heavy load and does not occur 
under light load.
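To capture faulty listings as they happen, a loop like the following could be
run on a client while the parallel jobs are active.  This is a hypothetical
sketch (watch_listing is not an existing tool).  It assumes REFERENCE was
produced by the same ls | grep pipeline, in the same locale, while the server
was lightly loaded, so that a byte-for-byte cmp is meaningful.

```sh
#!/bin/sh
# watch_listing DIR REFERENCE ITERATIONS OUTDIR (hypothetical helper)
# Lists DIR ITERATIONS times, keeps every listing that differs from
# REFERENCE in OUTDIR, and prints a one-line summary.
watch_listing() {
    dir=$1 ref=$2 n=$3 outdir=$4
    mkdir -p "$outdir" || return 1
    bad=0
    i=1
    while [ "$i" -le "$n" ]; do
        ( cd "$dir" && ls ) | grep 'combined.*-ad\.vcf\.xz' > "$outdir/list-$i.txt"
        if cmp -s "$ref" "$outdir/list-$i.txt"; then
            rm -f "$outdir/list-$i.txt"   # identical to the reference; discard
        else
            bad=$((bad + 1))              # keep the faulty listing for inspection
        fi
        i=$((i + 1))
    done
    echo "$bad of $n listings differed from $ref"
}
```

Any listings kept in OUTDIR can then be compared against the reference with
sort, uniq -d and diff, as was done above.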

/etc/exports:

V4: /

/etc/zfs/exports:

# !!! DO NOT EDIT THIS FILE MANUALLY !!!

/pxeserver/images       -alldirs -ro -network 192.168.0.0 -mask 255.255.128.0 
/raid-00        -maproot=root -network 192.168.0.0 -mask 255.255.128.0 
/sharedapps     -maproot=root -network 192.168.0.0 -mask 255.255.128.0 
/usr/home       -maproot=root -network 192.168.0.0 -mask 255.255.128.0 
/var/cache/pkg  -maproot=root -network 192.168.0.0 -mask 255.255.128.0 

/etc/fstab on the clients:

login:/usr/home         /usr/home       nfs     rw,bg,intr,noatime 0       0
login:/raid-00          /raid-00        nfs     rw,bg,intr,noatime 0       0
login:/sharedapps       /sharedapps     nfs     rw,bg,intr,noatime 0       0
login:/var/cache/pkg    /var/cache/pkg  nfs     rw,bg,intr,noatime 0       0

_______________________________________________
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs