Very informative! Thank you so much!
Fred

> -----Original Message-----
> From: Beowulf [mailto:beowulf-boun...@beowulf.org] On Behalf Of Bill Broadley
> Sent: Thursday, July 21, 2016 7:19
> To: Beowulf@beowulf.org
> Subject: [Beowulf] NFS HPC survey results.
>
> Many thanks for all the responses.
>
> Here's the promised raw data:
> https://wiki.cse.ucdavis.edu/_media/wiki:linux-hpc-nfs-survey.csv
>
> I'll summarize the 26 results below. I'll email something similar to those
> who asked.
>
> Not everyone answered all questions.
>
> 1) Cluster OS:
> 72% RedHat/CentOS/Scientific Linux or derivative
> 24% Debian/Ubuntu or derivative
>  4% SUSE or derivative
>
> 2) Appliance/NAS or Linux server (multiple answers allowed, so percentages
>    sum to more than 100%):
> 32% NFS appliance
> 76% Linux server
> 12% other (illumos/Solaris)
>
> 3) Appliances used (one each, free-form answers):
> * Hitachi BlueARC, EMC Isilon, DDN/GPFS, x4540
> * Not sure - something that corporate provided. An F5, maybe...? Also a
>   Panasas system for /scratch.
> * NetApp FAS6xxx
> * NetApp
> * Isilon X and NL
> * Isilon
> * NetApp
> * Synology
>
> 4) Which kernel do you use:
> 88% one provided with the Linux distribution
> 12% one that I compile/tweak myself
>
> 5) What kernel changes do you make:
> * CPU performance tweaking, network performance.
> * Raise the ARP cache size; a kernel newer than the stock 3.2 was needed
>   for newer hardware (3.14 at the moment).
> * ZFS
>
> 6) Do you often see problems like "nfs: server 192.168.5.30 not
>    responding, timed out":
> 42.3% never
> 23.1% sometimes
> 19.2% rarely
>  7.7% daily
>  7.7% often
>
> 7) If you see NFS timeouts, what do you do (free-form answers):
> * Nothing.
> * Nothing.
> * Restart nfsd, look for performance-intensive jobs, sometimes increase
>   the number of nfsd threads.
> * Look at what's going on on that server: what the disks are doing, what
>   network flows are going to/from that server, and determine whether the
>   load is something to act on or to leave alone.
> * Not much.
> * Reboot.
> * Resolve any connectivity issue and run the mount command on the nodes.
>   If this doesn't fix it, then reboot.
> * Ignore them, unless they become a problem.
> * Look for the root cause of the issue; typically the system is suffering
>   network issues or is overloaded by a user's abuse/misuse.
> * Diagnose and identify the underlying cause.
> * Try to figure out who is overloading the NFS server (a hard job).
> * Troubleshoot; typically a machine is offline or the network is
>   saturated.
>
> 8) Which NFS options do you use (free-form):
> * tcp,async,nodev,nosuid,rsize=32768,wsize=32768,timeout=10
> * nfsvers=3,nolock,hard,intr,timeo=16,retrans=8
> * hard,intr,rsize=32768,wsize=32768
> * all default
> * async
> * async,nodev,nosuid,rsize=32768,wsize=32768
> * tcp,async,nodev,nosuid,timeout=10
> * -rw,intr,nosuid,proto=tcp (mostly. Could be "ro" and/or "suid")
> * rsize=32768,wsize=32768,hard,intr,vers=3,proto=tcp,retrans=2,timeo=600
> * rsize=32768,wsize=32768
> * -nobrowse,intr,rsize=32768,wsize=32768,vers=3
> * udp,hard,timeo=50,retrans=7,intr,bg,rsize=8192,wsize=8192,nfsvers=3,
>   mountvers=3
> * RHEL defaults
> * default ones, they're almost always the best ones
> * rw,nosuid,nodev,tcp,hard,intr,vers=4
> * rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,
>   proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.5.6.7,
>   local_lock=none,addr=10.5.6.1
> * defaults,_netdev,vers=3
> * nfsvers=3,tcp,rw,hard,intr,timeo=600,retrans=2
> * rw,hard,tcp,nfsvers=3,noacl,nolock
> * default RHEL6 (+nosuid, nodev, and sometimes nfsvers=3)
> * tcp, intr, noauto, timeout, rsize, wsize, auto
> * nfsvers=3,rsize=1024,wsize=1024,cto
>
> 9) Any explanations:
> * We have not yet made the change to NFSv4. We use nolock due to various
>   application "issues". We do not hard-set rsize/wsize, as they have been
>   negotiating better values on their own for a number of years under v3,
>   and the timeo/retrans values are a bit of a legacy from working on this
>   issue of server overload.
>   Hard was a choice on our end: having things hang definitely seemed
>   better than having things fail and go stale. We still agree with the
>   choice of hard. intr just helps to "interrupt" stuck things when needed.
> * We like to be able to Ctrl-C hung processes. For some systems we use
>   larger rsize/wsize if the vendor supports it.
> * Works for me without tweaks.
> * We didn't use TCP until the last couple of years.
> * Probably needs a revisit; the block size was set up for 2.x-series
>   kernels.
> * Defaults of CentOS 7.
> * NFSv4 was not stable enough last time we tried it; we don't fix
>   rsize/wsize, as client/server usually negotiate to 1M anyway.
> * We have frequent power outages (5+ times a year) and noauto helps our
>   nodes not to hang on mounting NFS shares. The drawback is that you have
>   to mount manually. Timeout helps with this issue as well.
> * These are adjusted if necessary for particular workloads.
>
> 10) What parts of the file system do you use NFS for (free-form):
> * /home
> * /home
> * /home
> * /home
> * /home
> * /home
> * /home
> * /home and /apps
> * We use NFS for the OS (NFSRoot), the app tree, $HOME, group dedicated
>   space, as well as some of our scratch spaces. All of these come from
>   different NFS servers.
> * /home, /apps
> * /home /opt /etc /usr /boot
> * /home, /apps
> * /home, /apps, /scratch - all of 'em
> * /home, long-term project storage, shared software
> * /cluster/home, /cluster/local, /cluster/scratch, /cluster/data
> * home, apps, shared data
> * /usr/local, /home
> * /home, /apps
> * various
> * /home, /group, /usr/local
> * /home, parts of /opt, some specific top-level auto-mountable dirs
> * What above is called /apps and /home, for a few medium-sized systems
> * /home, /local, /opt, /diskless
> * /home, /opt, diskless node images
>
> 11) How many nodes can mount a single NFS server at once:
> 24% >= 512 nodes
> 20% 65-128 nodes
> 16% 1-16 nodes
> 12% 17-32 nodes
> 12% 257-512 nodes
> 12% 129-256 nodes
>  4% 33-64 nodes
>
> 12) How many nfsd daemons do you run per NFS server:
> 45.0% 1-16
> 13.6% 129-256
> 13.6% 65-128
>  9.1% 33-64
>  4.5% 17-32
>  4.5% 256-512
>  4.5% 512-1024
>  4.5% 2048-4096
>
> 13) Do you use kernel NFSd or user space:
> 81.0% kernel NFSd
> 14.3% user space
>  4.8% both
>
> 14) What interconnect do you use with NFS?
> 38.5% 10G
> 26.9% GigE
> 23.1% IB
> 11.5% other
>
> 15) If IB, what transport (10 responses):
> 100% IPoIB
>   0% other
>
> 16) If IB, do you use connected mode (8 responses):
> 62.5% connected mode
> 37.5% don't use connected mode
>
> 17) Do you use UDP or TCP (25 responses):
> 84% TCP
> 12% UDP
>  4% other
>
> 18) Which other network file systems do you use? (24 responses)
>  0.0% pNFS
> 58.3% Lustre
> 16.7% Ceph
> 12.5% BeeGFS
> 12.5% GlusterFS
>  8.3% none (Panasas, GPFS, HSM/SAM/QFS, or more than one of the above)
>
> 19) Are the other network file systems more or less reliable than NFS?
> 58.3% similar
> 16.7% I use only NFS
> 12.5% much more reliable
>  4.2% much less reliable
>  4.2% somewhat less reliable
>  4.2% somewhat more reliable
>
> 20) Do you support MPI-IO (not just MPI)?
> 70.8% no
> 20.8% yes
>  8.3% yes, but nobody uses it
>
> 21) Any tips for making NFS perform better or more reliably?
> * We start with the underlying block (RAID/disks) setup that you are
>   going to serve data from and plumb up from there. The key things here
>   are choosing your RAID stride/chunk sizes and making your file system
>   as aware of the RAID layout as you can, for good alignment. We follow
>   the ESnet host tuning found at http://fasterdata.es.net/host-tuning/linux/
>   on both client and server systems. We also bump up the rpc.mountd count
>   to help ensure successful mounts, as we use autofs to mount a number of
>   the NFS spaces. When a larger HPC job started up on many nodes, there
>   was a time when not all of them could mount successfully if the server
>   was under load; increasing the rpc.mountd count helped. We also set
>   async and wdelay on our exports on the servers.
> * Kernel settings.
> * I've heard that configuring IB in RDMA mode boosts NFS performance.
> * We don't use NFS for high-performance cluster data; that's Lustre's
>   world. Where NFS is used for scientific data, it's in places with
>   modest numbers of concurrent clients.
> * More disks.
> * RPCMOUNTDOPTS="--num-threads=64"
> * Try to optimize /etc/sysconfig/nfs as much as possible.
>
> 22) Any tips for making NFS clients perform better or more reliably?
> * Following the above-mentioned ESnet info at
>   http://fasterdata.es.net/host-tuning/linux/. I should note that for
>   both client and server using IPoIB, we use connected mode and set the
>   MTU to 64k.
> * Reducing the size of the kernel dirty buffer on the clients makes
>   performance much more consistent.
> * Use reliable interconnect hardware.
> * We've tried scripting NFS mounts without much success.
> * Educate users on using the right filesystem for the right task.
>
> 23) Anything you would like to add:
>
> * We have also seen input from others that they see gains with the client
>   option of 'nocto'.
>   The man pages suggest this has some risks, so while we have tested it
>   and can see that certain loads gain from it, we have not yet deployed
>   this option in our general setup. We are in the process of testing our
>   apps to make sure we do not create other issues if we do use this flag.
>   Another thing we have been looking at is cachefilesd, to see how well
>   it helps for data that can easily be cached. For things like our
>   application trees, the OS (we are NFSRoot-booted), and even some user
>   reference data sets, this looks quite promising, but we have not gone
>   live with it yet either.
> * We're always looking to improve our environment as well. We don't
>   always have TIME to do so, of course.
> * Horses for courses. NFS is great for shared software and home
>   directories. It's pretty useless for high-performance access from
>   hundreds of compute nodes.
> * Every storage system / file system I've ever seen or used has had its
>   problems. There is no silver bullet (afaik). Use that which you have
>   the competence to handle.
> * We are currently struggling with NFS mounts. We use them extensively
>   throughout our department. The problems are that they hang constantly,
>   and when one person is using a share heavily it slows down other
>   computers. We've done lots of research into optimizing NFS but always
>   come back to the same issues (hanging mounts that don't recover without
>   admin intervention). We would love to know what other people are doing.
>   We are experimenting with Ceph at the moment for future large storage
>   needs.
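For anyone wanting to try the server- and client-side tuning mentioned above (nfsd/rpc.mountd thread counts, async+wdelay exports, the commonly reported NFSv3 mount options, and shrinking the client dirty buffer), here is a rough consolidation as a shell sketch. All of the specific values, hostnames, and paths are illustrative assumptions on my part, not recommendations from the survey; RHEL/CentOS-style config file locations are assumed.

```shell
# Hedged sketch of the tuning tips from the survey; tune values to your site.

# --- Server: more nfsd and rpc.mountd threads (Q12, Q21) ---
# In /etc/sysconfig/nfs (RHEL/CentOS), something like:
#   RPCNFSDCOUNT=64
#   RPCMOUNTDOPTS="--num-threads=64"
# More mountd threads help when a large job mounts via autofs on many nodes.

# --- Server: export with async and wdelay (Q21) ---
# In /etc/exports (network and path are made-up examples):
#   /home  10.0.0.0/16(rw,async,wdelay,no_subtree_check)
# followed by:  exportfs -ra

# --- Client: a commonly reported NFSv3 mount (Q8) ---
# In /etc/fstab ("nfsserver" is a placeholder):
#   nfsserver:/home  /home  nfs  rw,hard,intr,tcp,nfsvers=3,timeo=600,retrans=2  0 0

# --- Client: cap the dirty page cache for more consistent writeback (Q22) ---
# Absolute-byte limits instead of the ratio defaults; needs root.
sysctl -w vm.dirty_background_bytes=$((64 * 1024 * 1024))   # 64 MiB
sysctl -w vm.dirty_bytes=$((256 * 1024 * 1024))             # 256 MiB
```

As a config fragment this isn't something to run blindly; the dirty-buffer numbers in particular trade peak write throughput for steadier behavior under many concurrent clients, so benchmark with your own workload first.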
>
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf