----- Original Message -----
> From: "Niels de Vos" <nde...@redhat.com>
> To: "Raghavendra G" <raghaven...@gluster.com>
> Cc: "Dan Lambright" <dlamb...@redhat.com>, "Gluster Devel" <gluster-devel@gluster.org>,
>     "Csaba Henk" <csaba.h...@gmail.com>
> Sent: Wednesday, August 17, 2016 4:49:41 AM
> Subject: Re: [Gluster-devel] md-cache improvements
>
> On Wed, Aug 17, 2016 at 11:42:25AM +0530, Raghavendra G wrote:
> > On Fri, Aug 12, 2016 at 10:29 AM, Raghavendra G <raghaven...@gluster.com> wrote:
> > >
> > > On Thu, Aug 11, 2016 at 9:31 AM, Raghavendra G <raghaven...@gluster.com> wrote:
> > >
> > >> A couple more areas to explore:
> > >> 1. Purging the kernel dentry and/or page-cache too. Because of patch [1],
> > >> an upcall notification can result in a call to inode_invalidate, which
> > >> results in an "invalidate" notification to the fuse kernel module. While
> > >> I am sure that this notification will purge the page-cache from the
> > >> kernel, I am not sure about dentries. I assume that if an inode is
> > >> invalidated, it should result in a lookup (from kernel to glusterfs).
> > >> Nevertheless, we should look into the differences between
> > >> entry_invalidation and inode_invalidation and harness them appropriately.
>
> I do not think fuse handles upcall yet. I think there is a patch for
> that somewhere. It's been a while since I looked into that, but I think
> invalidating the affected dentries was straightforward.

Can the patch # be tracked down? I'd like to run some experiments with it + tiering.

> > >> 2. Granularity of invalidation. E.g., we shouldn't be purging the
> > >> page-cache in the kernel because of a change in an xattr used by an
> > >> xlator (e.g., the dht layout xattr). We have to make sure that [1] is
> > >> handling this. We need to add more granularity to invalidation (like
> > >> internal xattr invalidation, user xattr invalidation, entry invalidation
> > >> in kernel, page-cache invalidation in kernel, attribute/stat invalidation
> > >> in kernel etc.) and use them judiciously, while making sure other cached
> > >> data remains present.
> > >>
> > >
> > > To stress the importance of this point, it should be noted that with tier
> > > there can be constant migration of files, which can result in spurious
> > > (from the perspective of the application) invalidations, even though the
> > > application is not doing any writes on files [2][3][4]. Also, even if the
> > > application is writing to a file, there is no point in invalidating the
> > > dentry cache. We should explore more ways to solve [2][3][4].
>
> Actually upcall tracks the client/inode combination, and only sends
> upcall events to clients that (recently/timeout?) accessed the inode.
> There should not be any upcalls for inodes that the client did not
> access. So, when promotion/demotion happens, only the process doing this
> should receive the event, not any of the other clients that did not
> access the inode.
>
> > > 3. We have a long-standing issue of spurious termination of the fuse
> > > invalidation thread. Since the thread is not re-spawned after
> > > termination, we would not be able to purge the kernel
> > > entry/attribute/page-cache. This issue was touched upon during a
> > > discussion [5], though we didn't solve the problem then for lack of
> > > bandwidth. Csaba has agreed to work on this issue.
> > >
> >
> > 4. Flooding of network with upcall notifications.
> > Is it a problem? If yes, does the upcall infra already solve it?
> > Would NFS/SMB leases help here?
>
> I guess some form of flooding is possible when two or more clients do
> many directory operations in the same directory. Hmm, now I wonder if a
> client gets an upcall event for something it did itself. I guess that
> would (most often?) not be needed.
>
> Niels
>
> > >
> > > [2] https://bugzilla.redhat.com/show_bug.cgi?id=1293967#c7
> > > [3] https://bugzilla.redhat.com/show_bug.cgi?id=1293967#c8
> > > [4] https://bugzilla.redhat.com/show_bug.cgi?id=1293967#c9
> > > [5] http://review.gluster.org/#/c/13274/1/xlators/mount/fuse/src/fuse-bridge.c
> > >
> > >>
> > >> [1] http://review.gluster.org/12951
> > >>
> > >> On Wed, Aug 10, 2016 at 10:35 PM, Dan Lambright <dlamb...@redhat.com> wrote:
> > >>
> > >>> There have been recurring discussions within the gluster community to
> > >>> build on existing support for md-cache and upcalls to help performance
> > >>> for small file workloads. In certain cases, "lookup amplification"
> > >>> dominates data transfers, i.e. the cumulative round trip times of
> > >>> multiple LOOKUPs from the client offset the benefits of faster backend
> > >>> storage.
> > >>>
> > >>> To tackle this problem, one suggestion is to more aggressively utilize
> > >>> md-cache to cache inodes on the client than is currently done. The
> > >>> inodes would be cached until they are invalidated by the server.
> > >>>
> > >>> Several gluster development engineers within the DHT, NFS, and Samba
> > >>> teams have been involved with related efforts, which have been underway
> > >>> for some time now. At this juncture, comments are requested from gluster
> > >>> developers.
> > >>>
> > >>> (1) .. help call out where additional upcalls would be needed to
> > >>> invalidate stale client cache entries (in particular, need feedback
> > >>> from DHT/AFR areas),
> > >>>
> > >>> (2) ..
> > >>> identify failure cases, when we cannot trust the contents of
> > >>> md-cache, e.g. when an upcall may have been dropped by the network
> > >>>
> > >>> (3) .. point out additional improvements which md-cache needs. For
> > >>> example, it cannot be allowed to grow unbounded.
> > >>>
> > >>> Dan
> > >>>
> > >>> ----- Original Message -----
> > >>> > From: "Raghavendra Gowdappa" <rgowd...@redhat.com>
> > >>> >
> > >>> > List of areas where we need invalidation notification:
> > >>> > 1. Any changes to xattrs used by xlators to store metadata (like the
> > >>> > dht layout xattr, afr xattrs etc.).
> > >>> > 2. Scenarios where an individual xlator feels like it needs a lookup.
> > >>> > For example, failed directory creation on a non-hashed subvol in dht
> > >>> > during mkdir. Though dht reports the mkdir as successful, it would be
> > >>> > better not to cache this inode, as a subsequent lookup will heal the
> > >>> > directory and make things better.
> > >>> > 3. Removal of files.
> > >>> > 4. writev on brick (to invalidate the read cache on the client).
> > >>> >
> > >>> > Other questions:
> > >>> > 5. Does md-cache have cache management, like LRU or an upper limit
> > >>> > for the cache?
> > >>> > 6. Network disconnects and invalidating the cache. When a network
> > >>> > disconnect happens we need to invalidate the cache for inodes present
> > >>> > on that brick, as we might be missing some notifications. The current
> > >>> > approach of purging the cache of all inodes might not be optimal, as
> > >>> > it might roll back the benefits of caching. Also, please note that
> > >>> > network disconnects are not rare events.
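
To make points 5 and 6 above a bit more concrete, here is a toy sketch of a
size-bounded cache with LRU eviction that can also drop only the entries
belonging to one brick after a disconnect. This is not md-cache code and not
a proposal for its data structures. The class name, the method names, the
gfid/brick strings and the assumption that each cached inode maps to exactly
one brick are all made up for illustration (with replication an inode would
map to several bricks).

```python
from collections import OrderedDict


class MetadataCache:
    """Toy bounded metadata cache (hypothetical, not md-cache's real design)."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        # gfid -> (brick, stat); ordered from least to most recently used.
        self._entries = OrderedDict()

    def put(self, gfid, brick, stat):
        """Insert or refresh an entry, evicting the LRU entry when over limit."""
        self._entries[gfid] = (brick, stat)
        self._entries.move_to_end(gfid)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)

    def get(self, gfid):
        """Return the cached stat, or None on a miss (caller would do a LOOKUP)."""
        entry = self._entries.get(gfid)
        if entry is None:
            return None
        self._entries.move_to_end(gfid)
        return entry[1]

    def invalidate(self, gfid):
        """Drop one entry, e.g. on an upcall for that inode."""
        self._entries.pop(gfid, None)

    def invalidate_brick(self, brick):
        """Drop only entries served by `brick`, e.g. after it disconnects."""
        stale = [g for g, (b, _) in self._entries.items() if b == brick]
        for gfid in stale:
            del self._entries[gfid]
        return len(stale)


cache = MetadataCache(max_entries=2)
cache.put("gfid-a", "brick-0", {"size": 10})
cache.put("gfid-b", "brick-1", {"size": 20})
cache.get("gfid-a")                           # gfid-a is now most recently used
cache.put("gfid-c", "brick-1", {"size": 30})  # over the limit: gfid-b is evicted
dropped = cache.invalidate_brick("brick-1")   # drops gfid-c, keeps gfid-a
```

The point of invalidate_brick() is only that entries for bricks that stayed
connected survive a disconnect elsewhere, which is the saving item 6 asks for
compared to purging everything.
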
> > >>> >
> > >>> > regards,
> > >>> > Raghavendra
> > >>> _______________________________________________
> > >>> Gluster-devel mailing list
> > >>> Gluster-devel@gluster.org
> > >>> http://www.gluster.org/mailman/listinfo/gluster-devel
> > >>>
> > >>
> > >> --
> > >> Raghavendra G
> > >
> > > --
> > > Raghavendra G
> >
> > --
> > Raghavendra G