The commit is pushed to "branch-rh7-3.10.0-693.1.1.vz7.37.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git after rh7-3.10.0-693.1.1.vz7.37.13 ------> commit 22c1ffe395aa3d560ba33139aeb38a8840783766 Author: Konstantin Khlebnikov <khlebni...@yandex-team.ru> Date: Fri Oct 13 15:40:02 2017 +0300
ms/trylock_super(): replacement for grab_super_passive() We have a softlockup in wb_do_writeback -> wb_writeback -> __writeback_inodes_wb stack, as we iterate over works in wb->bdi->work_list and for each work we move wb->b_dirty inodes to wb->b_io(in queue_io), and later in __writeback_inodes_wb() we try to write them, we trylock inode->i_sb->s_umount rw_semaphore and fail, we put each inode back to wb->b_dirty, and on each inode we take sb_lock - which can be long. We do aproximately ~100.000.000 iterations, witch takes more than softlockup threshold(20sec). The above s_umount lock is taken by sync() in container in sync_filesystems_ve() wich sleeps for 50 sec. Removing sb_lock from the equation should remove most of contention and all iterations should pass much faster. https://jira.sw.ru/browse/PSBM-73370 ms commit: eb6ef3d ("trylock_super(): replacement for grab_super_passive()") original commit message: I've noticed significant locking contention in memory reclaimer around sb_lock inside grab_super_passive(). Grab_super_passive() is called from two places: in icache/dcache shrinkers (function super_cache_scan) and from writeback (function __writeback_inodes_wb). Both are required for progress in memory allocator. Grab_super_passive() acquires sb_lock to increment sb->s_count and check sb->s_instances. It seems sb->s_umount locked for read is enough here: super-block deactivation always runs under sb->s_umount locked for write. Protecting super-block itself isn't a problem: in super_cache_scan() sb is protected by shrinker_rwsem: it cannot be freed if its slab shrinkers are still active. Inside writeback super-block comes from inode from bdi writeback list under wb->list_lock. This patch removes locking sb_lock and checks s_instances under s_umount: generic_shutdown_super() unlinks it under sb->s_umount locked for write. New variant is called trylock_super() and since it only locks semaphore, callers must call up_read(&sb->s_umount) instead of drop_super(sb) when they're done. Signed-off-by: Konstantin Khlebnikov <khlebni...@yandex-team.ru> Signed-off-by: Al Viro <v...@zeniv.linux.org.uk> Suggested-by: Andrey Ryabinin <aryabi...@virtuozzo.com> Signed-off-by: Pavel Tikhomirov <ptikhomi...@virtuozzo.com> --- fs/fs-writeback.c | 6 +++--- fs/internal.h | 2 +- fs/super.c | 42 +++++++++++++++++++----------------------- 3 files changed, 23 insertions(+), 27 deletions(-) diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c index 91f1d0a..7cea021 100644 --- a/fs/fs-writeback.c +++ b/fs/fs-writeback.c @@ -797,9 +797,9 @@ static long __writeback_inodes_wb(struct bdi_writeback *wb, if (time_is_before_jiffies(start_time + 15* HZ)) trace = 1; - if (!grab_super_passive(sb)) { + if (!trylock_super(sb)) { /* - * grab_super_passive() may fail consistently due to + * trylock_super() may fail consistently due to * s_umount being grabbed by someone else. Don't use * requeue_io() to avoid busy retrying the inode/sb. */ @@ -810,7 +810,7 @@ static long __writeback_inodes_wb(struct bdi_writeback *wb, continue; } wrote += writeback_sb_inodes(sb, wb, work); - drop_super(sb); + up_read(&sb->s_umount); /* refer to the same tests at the end of writeback_sb_inodes */ if (wrote) { diff --git a/fs/internal.h b/fs/internal.h index 289437c..ab2f706 100644 --- a/fs/internal.h +++ b/fs/internal.h @@ -86,7 +86,7 @@ extern struct file *get_empty_filp(void); * super.c */ extern int do_remount_sb(struct super_block *, int, void *, int); -extern bool grab_super_passive(struct super_block *sb); +extern bool trylock_super(struct super_block *sb); extern struct dentry *mount_fs(struct file_system_type *, int, const char *, void *); extern struct super_block *user_get_super(dev_t); diff --git a/fs/super.c b/fs/super.c index af7ae96..3b59975 100644 --- a/fs/super.c +++ b/fs/super.c @@ -94,8 +94,8 @@ static unsigned long super_cache_scan(struct shrinker *shrink, if (!(sc->gfp_mask & __GFP_FS)) return SHRINK_STOP; - if (!grab_super_passive(sb)) - return SHRINK_STOP; + if (!trylock_super(sb)) + return SHRINK_STOP; if (sb->s_op && sb->s_op->nr_cached_objects) fs_objects = sb->s_op->nr_cached_objects(sb, sc); @@ -123,7 +123,7 @@ static unsigned long super_cache_scan(struct shrinker *shrink, freed += sb->s_op->free_cached_objects(sb, sc); } - drop_super(sb); + up_read(&sb->s_umount); return freed; } @@ -139,7 +139,7 @@ static unsigned long super_cache_count(struct shrinker *shrink, sb = container_of(shrink, struct super_block, s_shrink); /* - * Don't call grab_super_passive as it is a potential + * Don't call trylock_super as it is a potential * scalability bottleneck. The counts could get updated * between super_cache_count and super_cache_scan anyway. * Call to super_cache_count with shrinker_rwsem held @@ -373,35 +373,31 @@ static int grab_super(struct super_block *s) __releases(sb_lock) } /* - * grab_super_passive - acquire a passive reference + * trylock_super - try to grab ->s_umount shared * @sb: reference we are trying to grab * - * Tries to acquire a passive reference. This is used in places where we + * Try to prevent fs shutdown. This is used in places where we * cannot take an active reference but we need to ensure that the - * superblock does not go away while we are working on it. It returns - * false if a reference was not gained, and returns true with the s_umount - * lock held in read mode if a reference is gained. On successful return, - * the caller must drop the s_umount lock and the passive reference when - * done. + * filesystem is not shut down while we are working on it. It returns + * false if we cannot acquire s_umount or if we lose the race and + * filesystem already got into shutdown, and returns true with the s_umount + * lock held in read mode in case of success. On successful return, + * the caller must drop the s_umount lock when done. + * + * Note that unlike get_super() et.al. this one does *not* bump ->s_count. + * The reason why it's safe is that we are OK with doing trylock instead + * of down_read(). There's a couple of places that are OK with that, but + * it's very much not a general-purpose interface. */ -bool grab_super_passive(struct super_block *sb) +bool trylock_super(struct super_block *sb) { - spin_lock(&sb_lock); - if (hlist_unhashed(&sb->s_instances)) { - spin_unlock(&sb_lock); - return false; - } - - sb->s_count++; - spin_unlock(&sb_lock); - if (down_read_trylock(&sb->s_umount)) { - if (sb->s_root && (sb->s_flags & MS_BORN)) + if (!hlist_unhashed(&sb->s_instances) && + sb->s_root && (sb->s_flags & MS_BORN)) return true; up_read(&sb->s_umount); } - put_super(sb); return false; } _______________________________________________ Devel mailing list Devel@openvz.org https://lists.openvz.org/mailman/listinfo/devel