> Size is not that big, 600GB space with around half of that actually
> used. GlusterFS servers themselves each have 4 cores and 12GB memory.
> It might also be important to note that these are VMware-hosted nodes
> that make use of SAN storage for the datastores.
4 cores is quite low, especially when healing.

> Connected to that NFS (Ganesha) exported share are just over 100
> clients, all RHEL6 and RHEL7, some spanning 10 network hops away. All
> of those clients are (currently) using the same virtual IP, so all end
> up on the same server.

Why not FUSE? Ganesha is suitable for UNIX and BSD systems that do not
support FUSE.

> Note that I mentioned 'should', since at times it had anywhere between
> 250,000 and 1 million files in it (which of course is not advised).
> Using some kind of hashing (subfolders spread per day/hour etc.) was
> also already advised.

If you have multiple subvolumes (going from replicate to
distributed-replicated), you can also spread the load -- yet 'find'
won't be faster :) A small sketch of such a per-day/hour layout is
appended at the end of this mail.

Problems that are often seen:

> - Any kind of operation on VMware such as a vMotion, creating a VM
> snapshot etc. on the node that has these 100+ clients connected causes
> such a temporary pause that pacemaker decides to switch the resources
> (causing a failover of the virtual IP address, thus connected clients
> suffer delay).

RH corosync defaults are not suitable for VMs; I prefer SUSE's defaults.
Consider increasing the 'token' and 'consensus' timeouts to more
meaningful values -- start with a 10s token, for example (see the
corosync.conf sketch appended below).

> One would expect this to last just shy of a minute, then clients would
> happily continue. However connected clients are stuck with a
> non-working mountpoint (commands such as df, ls, find etc. simply
> hang; they go into an uninterruptible sleep).

In regular HA NFS, there is a "notify" resource that notifies the
clients about the failover (a rough example is appended below). The
staleness happens because your IP is brought up before the NFS export is
ready. As you haven't provided HA details, I can't help much there.

> Mounts are 'hard' mounts to ensure guaranteed writes.

That's good. It is also needed for HA to work properly.

> - Once the number of files is over the 100,000 mark (again in a
> single, unhashed folder) any operation on that share becomes very
> sluggish (even a df, on a client, would take 20-30 seconds; a find
> command would take minutes to complete).

I think that's expected...

> If anyone can spot any ideas for improvement?

I would first switch to 'replica 3 arbiter 1', as the current setup is
wasting storage, and next switch the clients to FUSE. For performance
improvements, I would bring some SSDs into the game (tier 1+ storage)
and use the SSD-based LUNs for LVM caching. Rough sketches for all three
are appended below.

Best Regards,
Strahil Nikolov
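P.S. A few rough sketches to illustrate the suggestions above. All names
below (volumes, bricks, hosts, devices, paths) are assumptions for
illustration, not your actual setup.

First, the per-day/hour hashing idea as a minimal shell sketch -- a
hypothetical ingest wrapper that drops each incoming file into a
date-based subfolder instead of one flat directory:

    #!/bin/sh
    # Spread files over /YYYY/MM/DD/HH subfolders so no single
    # directory grows into the 100k+ range (hypothetical mountpoint).
    BASE=/mnt/share
    SUB=$(date +%Y/%m/%d/%H)
    mkdir -p "$BASE/$SUB"
    mv "$1" "$BASE/$SUB/"

Readers then only walk the subfolder they need instead of stat()ing
hundreds of thousands of entries in one directory.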
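Next, the corosync timeouts. A minimal totem sketch, assuming corosync
2.x and keeping the rest of your existing settings untouched; values are
in milliseconds, and 'consensus' defaults to 1.2 * token, so it only
needs setting if you want a different ratio:

    totem {
            version: 2
            # 10s before a lost token triggers a membership change;
            # rides out short vMotion/snapshot stalls
            token: 10000
            # optional: 1.2 * token is already the default
            consensus: 12000
    }

Copy the file to all nodes, then reload (corosync-cfgtool -R on recent
corosync, or restart the cluster stack node by node).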
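For the "notify" resource: with kernel NFS (NFSv3/NSM), the
resource-agents package ships ocf:heartbeat:nfsnotify, which sends
reboot notifications to clients after a failover. A hedged pcs example,
assuming your virtual IP resource is named 'nfs-vip' and resolves to
nfs-vip.example.com:

    pcs resource create nfs-notify ocf:heartbeat:nfsnotify \
            source_host=nfs-vip.example.com
    pcs constraint order start nfs-vip then nfs-notify

Note this covers NFSv3 statd notifications; with Ganesha and NFSv4 it is
the grace period that lets clients reclaim their state, so check how
your HA setup handles that.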
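For the arbiter conversion, assuming the volume 'data' is currently a
plain replica 3 across srv1/srv2/srv3: drop one data brick, then add a
fresh (empty) brick back as the arbiter:

    gluster volume remove-brick data replica 2 srv3:/bricks/data force
    gluster volume add-brick data replica 3 arbiter 1 \
            srv3:/bricks/data-arbiter

The arbiter stores only metadata, so this frees roughly one full copy of
the volume while still keeping quorum.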
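Switching a client to FUSE is a one-liner in /etc/fstab; the FUSE client
talks to all bricks itself, so the virtual IP (and the failover pain)
drops out of the picture. Hostnames and volume name are again
assumptions:

    srv1:/data  /mnt/data  glusterfs  defaults,_netdev,backup-volfile-servers=srv2:srv3  0 0

backup-volfile-servers only matters at mount time, for the case where
srv1 happens to be down at that moment.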
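Finally, the LVM caching sketch, assuming the brick LV is
vg_bricks/brick_lv and the SSD-based LUN shows up as /dev/sdX:

    pvcreate /dev/sdX                 # the SSD-based LUN
    vgextend vg_bricks /dev/sdX
    # carve a cache pool out of the SSD...
    lvcreate --type cache-pool -L 100G -n brick_cache vg_bricks /dev/sdX
    # ...and attach it to the brick LV holding the gluster brick
    lvconvert --type cache --cachepool vg_bricks/brick_cache \
            vg_bricks/brick_lv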