[Devel] [PATCH RHEL7 COMMIT] ms/trylock_super(): replacement for grab_super_passive()

Konstantin Khorenko Fri, 13 Oct 2017 05:43:13 -0700

The commit is pushed to "branch-rh7-3.10.0-693.1.1.vz7.37.x-ovz" and will 
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-693.1.1.vz7.37.13
------>
commit 22c1ffe395aa3d560ba33139aeb38a8840783766
Author: Konstantin Khlebnikov <khlebni...@yandex-team.ru>
Date:   Fri Oct 13 15:40:02 2017 +0300


    ms/trylock_super(): replacement for grab_super_passive()
    
    We have a softlockup in wb_do_writeback -> wb_writeback ->
    __writeback_inodes_wb stack, as we iterate over works in
    wb->bdi->work_list and for each work we move wb->b_dirty inodes to
    wb->b_io(in queue_io), and later in __writeback_inodes_wb() we try
    to write them, we trylock inode->i_sb->s_umount rw_semaphore and
    fail, we put each inode back to wb->b_dirty, and on each inode we
    take sb_lock - which can be long. We do aproximately ~100.000.000
    iterations, witch takes more than softlockup threshold(20sec).
    The above s_umount lock is taken by sync() in container in
    sync_filesystems_ve() wich sleeps for 50 sec.
    
    Removing sb_lock from the equation should remove most of contention
    and all iterations should pass much faster.
    
    https://jira.sw.ru/browse/PSBM-73370
    ms commit: eb6ef3d ("trylock_super(): replacement for grab_super_passive()")
    
    original commit message:
    
    I've noticed significant locking contention in memory reclaimer around
    sb_lock inside grab_super_passive(). Grab_super_passive() is called from
    two places: in icache/dcache shrinkers (function super_cache_scan) and
    from writeback (function __writeback_inodes_wb). Both are required for
    progress in memory allocator.
    
    Grab_super_passive() acquires sb_lock to increment sb->s_count and check
    sb->s_instances. It seems sb->s_umount locked for read is enough here:
    super-block deactivation always runs under sb->s_umount locked for write.
    Protecting super-block itself isn't a problem: in super_cache_scan() sb
    is protected by shrinker_rwsem: it cannot be freed if its slab shrinkers
    are still active. Inside writeback super-block comes from inode from bdi
    writeback list under wb->list_lock.
    
    This patch removes locking sb_lock and checks s_instances under s_umount:
    generic_shutdown_super() unlinks it under sb->s_umount locked for write.
    New variant is called trylock_super() and since it only locks semaphore,
    callers must call up_read(&sb->s_umount) instead of drop_super(sb) when
    they're done.
    
    Signed-off-by: Konstantin Khlebnikov <khlebni...@yandex-team.ru>
    Signed-off-by: Al Viro <v...@zeniv.linux.org.uk>
    
    Suggested-by: Andrey Ryabinin <aryabi...@virtuozzo.com>
    Signed-off-by: Pavel Tikhomirov <ptikhomi...@virtuozzo.com>
---
 fs/fs-writeback.c |  6 +++---
 fs/internal.h     |  2 +-
 fs/super.c        | 42 +++++++++++++++++++-----------------------
 3 files changed, 23 insertions(+), 27 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 91f1d0a..7cea021 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -797,9 +797,9 @@ static long __writeback_inodes_wb(struct bdi_writeback *wb,
                if (time_is_before_jiffies(start_time + 15* HZ))
                        trace = 1;
 
-               if (!grab_super_passive(sb)) {
+               if (!trylock_super(sb)) {
                        /*
-                        * grab_super_passive() may fail consistently due to
+                        * trylock_super() may fail consistently due to
                         * s_umount being grabbed by someone else. Don't use
                         * requeue_io() to avoid busy retrying the inode/sb.
                         */
@@ -810,7 +810,7 @@ static long __writeback_inodes_wb(struct bdi_writeback *wb,
                        continue;
                }
                wrote += writeback_sb_inodes(sb, wb, work);
-               drop_super(sb);
+               up_read(&sb->s_umount);
 
                /* refer to the same tests at the end of writeback_sb_inodes */
                if (wrote) {
diff --git a/fs/internal.h b/fs/internal.h
index 289437c..ab2f706 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -86,7 +86,7 @@ extern struct file *get_empty_filp(void);
  * super.c
  */
 extern int do_remount_sb(struct super_block *, int, void *, int);
-extern bool grab_super_passive(struct super_block *sb);
+extern bool trylock_super(struct super_block *sb);
 extern struct dentry *mount_fs(struct file_system_type *,
                               int, const char *, void *);
 extern struct super_block *user_get_super(dev_t);
diff --git a/fs/super.c b/fs/super.c
index af7ae96..3b59975 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -94,8 +94,8 @@ static unsigned long super_cache_scan(struct shrinker *shrink,
        if (!(sc->gfp_mask & __GFP_FS))
                return SHRINK_STOP;
 
-       if (!grab_super_passive(sb))
-                return SHRINK_STOP;
+       if (!trylock_super(sb))
+               return SHRINK_STOP;
 
        if (sb->s_op && sb->s_op->nr_cached_objects)
                fs_objects = sb->s_op->nr_cached_objects(sb, sc);
@@ -123,7 +123,7 @@ static unsigned long super_cache_scan(struct shrinker 
*shrink,
                freed += sb->s_op->free_cached_objects(sb, sc);
        }
 
-       drop_super(sb);
+       up_read(&sb->s_umount);
        return freed;
 }
 
@@ -139,7 +139,7 @@ static unsigned long super_cache_count(struct shrinker 
*shrink,
        sb = container_of(shrink, struct super_block, s_shrink);
 
        /*
-        * Don't call grab_super_passive as it is a potential
+        * Don't call trylock_super as it is a potential
         * scalability bottleneck. The counts could get updated
         * between super_cache_count and super_cache_scan anyway.
         * Call to super_cache_count with shrinker_rwsem held
@@ -373,35 +373,31 @@ static int grab_super(struct super_block *s) 
__releases(sb_lock)
 }
 
 /*
- *     grab_super_passive - acquire a passive reference
+ *     trylock_super - try to grab ->s_umount shared
  *     @sb: reference we are trying to grab
  *
- *     Tries to acquire a passive reference. This is used in places where we
+ *     Try to prevent fs shutdown.  This is used in places where we
  *     cannot take an active reference but we need to ensure that the
- *     superblock does not go away while we are working on it. It returns
- *     false if a reference was not gained, and returns true with the s_umount
- *     lock held in read mode if a reference is gained. On successful return,
- *     the caller must drop the s_umount lock and the passive reference when
- *     done.
+ *     filesystem is not shut down while we are working on it. It returns
+ *     false if we cannot acquire s_umount or if we lose the race and
+ *     filesystem already got into shutdown, and returns true with the s_umount
+ *     lock held in read mode in case of success. On successful return,
+ *     the caller must drop the s_umount lock when done.
+ *
+ *     Note that unlike get_super() et.al. this one does *not* bump ->s_count.
+ *     The reason why it's safe is that we are OK with doing trylock instead
+ *     of down_read().  There's a couple of places that are OK with that, but
+ *     it's very much not a general-purpose interface.
  */
-bool grab_super_passive(struct super_block *sb)
+bool trylock_super(struct super_block *sb)
 {
-       spin_lock(&sb_lock);
-       if (hlist_unhashed(&sb->s_instances)) {
-               spin_unlock(&sb_lock);
-               return false;
-       }
-
-       sb->s_count++;
-       spin_unlock(&sb_lock);
-
        if (down_read_trylock(&sb->s_umount)) {
-               if (sb->s_root && (sb->s_flags & MS_BORN))
+               if (!hlist_unhashed(&sb->s_instances) &&
+                   sb->s_root && (sb->s_flags & MS_BORN))
                        return true;
                up_read(&sb->s_umount);
        }
 
-       put_super(sb);
        return false;
 }
 
_______________________________________________
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel

[Devel] [PATCH RHEL7 COMMIT] ms/trylock_super(): replacement for grab_super_passive()

Reply via email to