>Size is not that big, 600GB space with around half of that actually used.
>GlusterFS servers themselves each have 4 cores and 12GB memory. It might also
>be important to note that these are VMware hosted nodes that make use of SAN
>storage for the datastores.

4 cores is quite low, especially when healing.

>Connected to that NFS (ganesha) exported share are just over 100 clients, all 
>RHEL6 and RHEL7, some spanning 10 network hops away.  All of those clients are 
>(currently) using the same virtual-IP, so all end up on the same server.

Why not FUSE? Ganesha is mainly useful for UNIX and BSD clients that do not
support FUSE; RHEL6 and RHEL7 clients can use the native FUSE client directly.
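A native mount would look roughly like this (volume name and hostnames are
placeholders; depending on the Gluster version the option may be spelled
'backupvolfile-server' instead):

    mount -t glusterfs gluster1:/myvol /mnt/myvol \
        -o backup-volfile-servers=gluster2:gluster3

With FUSE every client talks to all bricks directly, so there is no single
virtual IP that all 100+ clients depend on.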

>Note that I mentioned 'should', since at times it had anywhere between 250.000 
>and 1 million files in it (which of course is not advised).  Using some kind 
>of hashing (subfolders spread per day/hour etc) was also already advised.

If you have multiple subvolumes (by converting from replicate to
distributed-replicate), you can also spread the load - yet 'find' won't be
faster :)
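Roughly (hostnames and brick paths below are just placeholders), adding
another full replica set turns a plain replicated volume into a
distributed-replicated one, and a rebalance then spreads the existing files:

    gluster volume add-brick myvol replica 3 \
        server4:/bricks/b1 server5:/bricks/b1 server6:/bricks/b1
    gluster volume rebalance myvol start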


>Problems that are often seen:
>- Any kind of operation on VMware such as a vMotion, creating a VM snapshot 
>etc. on the node that has these 100+ clients connected causes such a temporary 
>pause that pacemaker decides to switch the resources (causing a failover of 
>the virtual IP address, thus clients connected suffer delay).  
The RH corosync defaults are not suitable for VMs; I prefer SUSE's defaults.
Consider increasing 'token' and 'consensus' to more meaningful values - start
with a 10s token, for example.
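As a rough illustration only (tune the values to your environment), that goes
into the totem section of /etc/corosync/corosync.conf on every node, followed
by a cluster-wide corosync restart/reload:

    totem {
        token: 10000        # 10 seconds before a node is declared lost
        consensus: 12000    # usually kept at about 1.2 x token
    }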

>One would expect this to last just shy under a minute, then clients would 
>happily continue.  However connected clients are stuck with a non-working 
>mountpoint (commands as df, ls, find etc simply hang.. they go into an 
>uninterruptible sleep).
In regular HA NFS, there is a "notify" resource that notifies the clients
about the failover. The stale mount happens because your IP is brought up
before the NFS export is ready. As you haven't provided the HA details, I
can't help much there.
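If ordering turns out to be the culprit, constraints along these lines
(resource names here are purely hypothetical) make pacemaker start the export
before it brings up the VIP, and keep the two together:

    pcs constraint order start nfs-ganesha-clone then start virtual-ip
    pcs constraint colocation add virtual-ip with nfs-ganesha-clone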

>Mounts are 'hard' mounts to ensure guaranteed writes.
That's good. It is also needed for the HA failover to work properly.

>- Once the number of files are over the 100.000 mark (again into a single, 
>unhashed, folder) any operation on that share becomes very sluggish (even a 
>df, on a client, would take 20/30 seconds,  a find command would take minutes 
>to complete).
I think that's expected; listing or stat-ing hundreds of thousands of entries
in a single directory is slow on any network filesystem.

>If anyone can spot any ideas for improvement ?
I would first try to switch to 'replica 3 arbiter 1', as the current setup is
wasting storage, and next switch the clients to FUSE.
For performance improvements, I would bring some SSDs into the game (tier 1+
storage) and use the SSD-based LUNs for LVM caching of the bricks.
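Both changes could look roughly like this (volume name, VG/LV names, brick
paths and the cache size are placeholders, and the data on the removed brick
must be fully healed/in sync first):

    # turn the third full data copy into a metadata-only arbiter brick
    gluster volume remove-brick myvol replica 2 server3:/bricks/b1 force
    gluster volume add-brick myvol replica 3 arbiter 1 server3:/bricks/arbiter

    # put an SSD-backed LUN in front of the brick LV as a dm-cache
    lvcreate --type cache-pool -L 100G -n ssd_cache vg_gluster /dev/ssd_lun
    lvconvert --type cache --cachepool vg_gluster/ssd_cache vg_gluster/brick_lv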

Best Regards,
Strahil Nikolov