[Devel] [PATCH rh7] netfilter: Add warning on nft NAT init if "iptable_nat" already loaded
nft NAT cannot work along with iptables NAT. The "iptable_nat" module is
always loaded on the VZ Node (libvirt triggers the load), so warn on
"nft_nat" module load.

An additional check has been added to verify that the "ip(6)table_nat"
modules are really loaded -- at some point libvirt may stop triggering
their load.

https://jira.sw.ru/browse/PSBM-102919
https://jira.sw.ru/browse/PSBM-123111

Signed-off-by: Konstantin Khorenko
---
 net/netfilter/nft_nat.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/netfilter/nft_nat.c b/net/netfilter/nft_nat.c
index 3883504db5c3..d12d540e1b60 100644
--- a/net/netfilter/nft_nat.c
+++ b/net/netfilter/nft_nat.c
@@ -279,6 +279,12 @@ static struct nft_expr_type nft_nat_type __read_mostly = {

 static int __init nft_nat_module_init(void)
 {
+        /* nft NAT does not work if ip(6)table_nat module is loaded */
+        WARN_ONCE(init_net.ipv4.nat_table || init_net.ipv6.ip6table_nat,
+                  "WARNING: 'nft_nat' kernel module is being loaded "
+                  "while 'ip(6)table_nat' module already loaded. "
+                  "nft NAT will not work.\n");
+
         return nft_register_expr(&nft_nat_type);
 }

--
2.24.3
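For reviewers who want to check a live node: the patch keys off the init_net
table pointers rather than module presence, but which of the involved modules
are currently loaded is visible in /proc/modules. Below is a minimal
userspace sketch for that check; it is not part of the patch, and the file
name and module list are only for illustration.

/* check_nat_modules.c - print which NAT-related modules are loaded,
 * based on the module names listed in /proc/modules.
 * Build: cc -o check_nat_modules check_nat_modules.c
 */
#include <stdio.h>
#include <string.h>

int main(void)
{
        const char *wanted[] = { "iptable_nat", "ip6table_nat", "nft_nat" };
        char line[512];
        FILE *f = fopen("/proc/modules", "r");

        if (!f) {
                perror("/proc/modules");
                return 1;
        }

        while (fgets(line, sizeof(line), f)) {
                char name[64];

                /* the first whitespace-separated field is the module name */
                if (sscanf(line, "%63s", name) != 1)
                        continue;
                for (size_t i = 0; i < sizeof(wanted) / sizeof(wanted[0]); i++)
                        if (strcmp(name, wanted[i]) == 0)
                                printf("loaded: %s\n", name);
        }
        fclose(f);
        return 0;
}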
[Devel] [PATCH rh7 8/8] ms/mm/memory.c: share the i_mmap_rwsem
From: Davidlohr Bueso

The unmap_mapping_range family of functions do the unmapping of user pages
(ultimately via zap_page_range_single) without touching the actual interval
tree, thus share the lock.

Signed-off-by: Davidlohr Bueso
Cc: "Kirill A. Shutemov"
Acked-by: Hugh Dickins
Cc: Oleg Nesterov
Cc: Peter Zijlstra (Intel)
Cc: Rik van Riel
Cc: Srikar Dronamraju
Acked-by: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

https://jira.sw.ru/browse/PSBM-122663
(cherry picked from commit c8475d144abb1e62958cc5ec281d2a9e161c1946)
Signed-off-by: Andrey Ryabinin
---
 mm/memory.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 7e66dea08f3f..3e5124d14996 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2712,10 +2712,10 @@ void unmap_mapping_range(struct address_space *mapping,
         if (details.last_index < details.first_index)
                 details.last_index = ULONG_MAX;


-        i_mmap_lock_write(mapping);
+        i_mmap_lock_read(mapping);
         if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap)))
                 unmap_mapping_range_tree(&mapping->i_mmap, &details);
-        i_mmap_unlock_write(mapping);
+        i_mmap_unlock_read(mapping);
 }
 EXPORT_SYMBOL(unmap_mapping_range);
--
2.26.2
[Devel] [PATCH rh7 7/8] ms/mm/nommu: share the i_mmap_rwsem
From: Davidlohr Bueso

Shrinking/truncate logic can call nommu_shrink_inode_mappings() to verify
that any shared mappings of the inode in question aren't broken (dead zone).
afaict the only user being ramfs to handle the size change attribute.

Pretty much a no-brainer to share the lock.

Signed-off-by: Davidlohr Bueso
Acked-by: "Kirill A. Shutemov"
Acked-by: Hugh Dickins
Cc: Oleg Nesterov
Acked-by: Peter Zijlstra (Intel)
Cc: Rik van Riel
Cc: Srikar Dronamraju
Acked-by: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

https://jira.sw.ru/browse/PSBM-122663
(cherry picked from commit 1acf2e040721564d579297646862b8ea3dd4511b)
Signed-off-by: Andrey Ryabinin
---
 mm/nommu.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/mm/nommu.c b/mm/nommu.c
index f994621e52f0..290fe3031147 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -2134,14 +2134,14 @@ int nommu_shrink_inode_mappings(struct inode *inode, size_t size,
         high = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;

         down_write(&nommu_region_sem);
-        i_mmap_lock_write(inode->i_mapping);
+        i_mmap_lock_read(inode->i_mapping);

         /* search for VMAs that fall within the dead zone */
         vma_interval_tree_foreach(vma, &inode->i_mapping->i_mmap, low, high) {
                 /* found one - only interested if it's shared out of the page
                  * cache */
                 if (vma->vm_flags & VM_SHARED) {
-                        i_mmap_unlock_write(inode->i_mapping);
+                        i_mmap_unlock_read(inode->i_mapping);
                         up_write(&nommu_region_sem);
                         return -ETXTBSY; /* not quite true, but near enough */
                 }
@@ -2153,8 +2153,7 @@ int nommu_shrink_inode_mappings(struct inode *inode, size_t size,
          * we don't check for any regions that start beyond the EOF as there
          * shouldn't be any
          */
-        vma_interval_tree_foreach(vma, &inode->i_mapping->i_mmap,
-                                  0, ULONG_MAX) {
+        vma_interval_tree_foreach(vma, &inode->i_mapping->i_mmap, 0, ULONG_MAX) {
                 if (!(vma->vm_flags & VM_SHARED))
                         continue;

@@ -2169,7 +2168,7 @@ int nommu_shrink_inode_mappings(struct inode *inode, size_t size,
                 }
         }

-        i_mmap_unlock_write(inode->i_mapping);
+        i_mmap_unlock_read(inode->i_mapping);
         up_write(&nommu_region_sem);
         return 0;
 }
--
2.26.2
[Devel] [PATCH rh7 5/8] ms/uprobes: share the i_mmap_rwsem
From: Davidlohr Bueso

Both register and unregister call build_map_info() in order to create the
list of mappings before installing or removing breakpoints for every mm
which maps file backed memory. As such, there is no reason to hold the
i_mmap_rwsem exclusively, so share it and allow concurrent readers to build
the mapping data.

Signed-off-by: Davidlohr Bueso
Acked-by: Srikar Dronamraju
Acked-by: "Kirill A. Shutemov"
Cc: Oleg Nesterov
Acked-by: Hugh Dickins
Acked-by: Peter Zijlstra (Intel)
Cc: Rik van Riel
Acked-by: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

https://jira.sw.ru/browse/PSBM-122663
(cherry picked from commit 4a23717a236b2ab31efb1651f586126789fc997f)
Signed-off-by: Andrey Ryabinin
---
 kernel/events/uprobes.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 9f312227a769..be501d8d9704 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -690,7 +690,7 @@ build_map_info(struct address_space *mapping, loff_t offset, bool is_register)
         int more = 0;

 again:
-        i_mmap_lock_write(mapping);
+        i_mmap_lock_read(mapping);
         vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
                 if (!valid_vma(vma, is_register))
                         continue;
@@ -721,7 +721,7 @@ build_map_info(struct address_space *mapping, loff_t offset, bool is_register)
                 info->mm = vma->vm_mm;
                 info->vaddr = offset_to_vaddr(vma, offset);
         }
-        i_mmap_unlock_write(mapping);
+        i_mmap_unlock_read(mapping);

         if (!more)
                 goto out;
--
2.26.2
[Devel] [PATCH rh7 6/8] ms/mm/memory-failure: share the i_mmap_rwsem
From: Davidlohr Bueso

No brainer conversion: collect_procs_file() only schedules a process for
later kill, share the lock, similarly to the anon vma variant.

Signed-off-by: Davidlohr Bueso
Acked-by: "Kirill A. Shutemov"
Acked-by: Hugh Dickins
Cc: Oleg Nesterov
Acked-by: Peter Zijlstra (Intel)
Cc: Rik van Riel
Cc: Srikar Dronamraju
Acked-by: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

https://jira.sw.ru/browse/PSBM-122663
(cherry picked from commit d28eb9c861f41aa2af4cfcc5eeeddff42b13d31e)
Signed-off-by: Andrey Ryabinin
---
 mm/memory-failure.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index da1ef2edd5dd..a5f5e604c0b8 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -497,7 +497,7 @@ static void collect_procs_file(struct page *page, struct list_head *to_kill,
         struct task_struct *tsk;
         struct address_space *mapping = page->mapping;

-        i_mmap_lock_write(mapping);
+        i_mmap_lock_read(mapping);
         qread_lock(&tasklist_lock);
         for_each_process(tsk) {
                 pgoff_t pgoff = page_to_pgoff(page);
@@ -519,7 +519,7 @@ static void collect_procs_file(struct page *page, struct list_head *to_kill,
                 }
         }
         qread_unlock(&tasklist_lock);
-        i_mmap_unlock_write(mapping);
+        i_mmap_unlock_read(mapping);
 }

 /*
--
2.26.2
[Devel] [PATCH rh7 1/8] ms/mm, fs: introduce helpers around the i_mmap_mutex
From: Davidlohr Bueso

This series is a continuation of the conversion of the i_mmap_mutex to
rwsem, following what we have for the anon memory counterpart, with Hugh's
feedback from the first iteration.

Ultimately, the most obvious paths that require exclusive ownership of the
lock are when we modify the VMA interval tree, via the
vma_interval_tree_insert() and vma_interval_tree_remove() families. Cases
such as unmapping, where the pte contents are changed but the tree remains
untouched, should make it safe to share the i_mmap_rwsem.

As such, the code of course is straightforward; however, the devil is very
much in the details. While it's been tested on a number of workloads
without anything exploding, I would not be surprised if there are some less
documented/known assumptions about the lock that could suffer from these
changes. Or maybe I'm just missing something, but either way I believe it's
at the point where it could use more eyes and hopefully some time in
linux-next.

Because the lock type conversion is the heart of this patchset, it's worth
noting a few comparisons between mutex vs rwsem (xadd):

  (i)   Same size, no extra footprint.
  (ii)  Both have CONFIG_XXX_SPIN_ON_OWNER capabilities for exclusive lock
        ownership.
  (iii) Both can be slightly unfair wrt exclusive ownership, with writer
        lock stealing properties, not necessarily respecting FIFO order for
        granting the lock when contended.
  (iv)  Mutexes can be slightly faster than rwsems when the lock is
        non-contended.
  (v)   Both suck at performance for debug (slowpaths), which shouldn't
        matter anyway.

Sharing the lock is obviously beneficial, and rwsem writer ownership is
close enough to mutexes. The biggest winner of these changes is migration.

As for concrete numbers, the following performance results are for a
4-socket 60-core IvyBridge-EX with 130Gb of RAM. Both the alltests and disk
(xfs+ramdisk) workloads of the aim7 suite do quite well with this set, with
a steady ~60% throughput (jpm) increase for alltests and up to ~30% for
disk for high amounts of concurrency. Lower counts of workload users
(< 100) do not show much difference at all, so at least no regressions.

                       3.18-rc1               3.18-rc1-i_mmap_rwsem
alltests-100        17918.72 (  0.00%)      28417.97 ( 58.59%)
alltests-200        16529.39 (  0.00%)      26807.92 ( 62.18%)
alltests-300        16591.17 (  0.00%)      26878.08 ( 62.00%)
alltests-400        16490.37 (  0.00%)      26664.63 ( 61.70%)
alltests-500        16593.17 (  0.00%)      26433.72 ( 59.30%)
alltests-600        16508.56 (  0.00%)      26409.20 ( 59.97%)
alltests-700        16508.19 (  0.00%)      26298.58 ( 59.31%)
alltests-800        16437.58 (  0.00%)      26433.02 ( 60.81%)
alltests-900        16418.35 (  0.00%)      26241.61 ( 59.83%)
alltests-1000       16369.00 (  0.00%)      26195.76 ( 60.03%)
alltests-1100       16330.11 (  0.00%)      26133.46 ( 60.03%)
alltests-1200       16341.30 (  0.00%)      26084.03 ( 59.62%)
alltests-1300       16304.75 (  0.00%)      26024.74 ( 59.61%)
alltests-1400       16231.08 (  0.00%)      25952.35 ( 59.89%)
alltests-1500       16168.06 (  0.00%)      25850.58 ( 59.89%)
alltests-1600       16142.56 (  0.00%)      25767.42 ( 59.62%)
alltests-1700       16118.91 (  0.00%)      25689.58 ( 59.38%)
alltests-1800       16068.06 (  0.00%)      25599.71 ( 59.32%)
alltests-1900       16046.94 (  0.00%)      25525.92 ( 59.07%)
alltests-2000       16007.26 (  0.00%)      25513.07 ( 59.38%)
disk-100             7582.14 (  0.00%)       7257.48 ( -4.28%)
disk-200             6962.44 (  0.00%)       7109.15 (  2.11%)
disk-300             6435.93 (  0.00%)       6904.75 (  7.28%)
disk-400             6370.84 (  0.00%)       6861.26 (  7.70%)
disk-500             6353.42 (  0.00%)       6846.71 (  7.76%)
disk-600             6368.82 (  0.00%)       6806.75 (  6.88%)
disk-700             6331.37 (  0.00%)       6796.01 (  7.34%)
disk-800             6324.22 (  0.00%)       6788.00 (  7.33%)
disk-900             6253.52 (  0.00%)       6750.43 (  7.95%)
disk-1000            6242.53 (  0.00%)       6855.11 (  9.81%)
disk-1100            6234.75 (  0.00%)       6858.47 ( 10.00%)
disk-1200            6312.76 (  0.00%)       6845.13 (  8.43%)
disk-1300            6309.95 (  0.00%)       6834.51 (  8.31%)
disk-1400            6171.76 (  0.00%)       6787.09 (  9.97%)
disk-1500            6139.81 (  0.00%)       6761.09 ( 10.12%)
disk-1600            4807.12 (  0.00%)       6725.33 ( 39.90%)
disk-1700            4669.50 (  0.00%)       5985.38 ( 28.18%)
disk-1800            4663.51 (  0.00%)       5972.99 ( 28.08%)
disk-1900            4674.31 (  0.00%)       5949.94 ( 27.29%)
disk-2000            4668.36 (  0.00%)       5834.93 ( 24.99%)

In addition, a 67.5% increase in successfully migrated NUMA pages, thus
improving node locality.

The patch layout is simple but designed for bisection (in case reversion is
needed if the changes break upstream) and easier review:

o Patches 1-4 convert the i_mmap lock from mutex to rwsem.
o Patches 5-10 share the lock in specific paths, each patch details the
  rationale behind why it should be safe. This p
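As a side note on the locking rule this series relies on (interval-tree
modifications take the lock exclusively, walks that only touch page/pte
state can share it), here is a small userspace analogy that uses a pthread
rwlock in place of the kernel rw_semaphore. It is purely illustrative: the
array standing in for the VMA interval tree and the walker thread are made
up for this sketch.

/* rwsem_sharing_demo.c - concurrent readers may walk a structure that no
 * writer is currently modifying; only insert/remove needs exclusivity.
 * Build: cc -pthread -o rwsem_sharing_demo rwsem_sharing_demo.c
 */
#include <pthread.h>
#include <stdio.h>

static pthread_rwlock_t tree_lock = PTHREAD_RWLOCK_INITIALIZER;
static int tree[16];                    /* stand-in for the VMA interval tree */
static int tree_size;

static void tree_insert(int val)        /* modifies the structure: exclusive */
{
        pthread_rwlock_wrlock(&tree_lock);
        if (tree_size < 16)
                tree[tree_size++] = val;
        pthread_rwlock_unlock(&tree_lock);
}

static void *walker(void *arg)          /* only reads the structure: shared */
{
        long id = (long)arg;
        long sum = 0;

        pthread_rwlock_rdlock(&tree_lock);
        for (int i = 0; i < tree_size; i++)
                sum += tree[i];         /* "unmap": touches payload, not the tree */
        pthread_rwlock_unlock(&tree_lock);

        printf("walker %ld saw sum %ld\n", id, sum);
        return NULL;
}

int main(void)
{
        pthread_t t[4];

        for (int i = 0; i < 8; i++)
                tree_insert(i);
        for (long i = 0; i < 4; i++)
                pthread_create(&t[i], NULL, walker, (void *)i);
        for (int i = 0; i < 4; i++)
                pthread_join(t[i], NULL);
        return 0;
}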
[Devel] [PATCH rh7 4/8] ms/mm/rmap: share the i_mmap_rwsem
From: Davidlohr Bueso

Similarly to the anon memory counterpart, we can share the mapping's lock
ownership as the interval tree is not modified when doing the walk, only
the file page.

Signed-off-by: Davidlohr Bueso
Acked-by: Rik van Riel
Acked-by: "Kirill A. Shutemov"
Acked-by: Hugh Dickins
Cc: Oleg Nesterov
Acked-by: Peter Zijlstra (Intel)
Cc: Srikar Dronamraju
Acked-by: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

https://jira.sw.ru/browse/PSBM-122663
(cherry picked from commit 3dec0ba0be6a532cac949e02b853021bf6d57dad)
Signed-off-by: Andrey Ryabinin
---
 include/linux/fs.h | 10 ++++++++++
 mm/rmap.c          |  9 +++++----
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index f422b0f7b02a..acedffc46fe4 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -709,6 +709,16 @@ static inline void i_mmap_unlock_write(struct address_space *mapping)
         up_write(&mapping->i_mmap_rwsem);
 }

+static inline void i_mmap_lock_read(struct address_space *mapping)
+{
+        down_read(&mapping->i_mmap_rwsem);
+}
+
+static inline void i_mmap_unlock_read(struct address_space *mapping)
+{
+        up_read(&mapping->i_mmap_rwsem);
+}
+
 /*
  * Might pages of this file be mapped into userspace?
  */
diff --git a/mm/rmap.c b/mm/rmap.c
index e72be32c3dae..523957450d20 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1723,7 +1723,8 @@ static int rmap_walk_file(struct page *page, struct rmap_walk_control *rwc)
         if (!mapping)
                 return ret;
         pgoff = page_to_pgoff(page);
-        down_write_nested(&mapping->i_mmap_rwsem, SINGLE_DEPTH_NESTING);
+
+        i_mmap_lock_read(mapping);
         vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
                 unsigned long address = vma_address(page, vma);

@@ -1748,7 +1749,7 @@ static int rmap_walk_file(struct page *page, struct rmap_walk_control *rwc)
                         if (!mapping_mapped(peer))
                                 continue;

-                        i_mmap_lock_write(peer);
+                        i_mmap_lock_read(peer);
                         vma_interval_tree_foreach(vma, &peer->i_mmap, pgoff, pgoff) {
                                 unsigned long address = vma_address(page, vma);

@@ -1764,7 +1765,7 @@ static int rmap_walk_file(struct page *page, struct rmap_walk_control *rwc)
                                 cond_resched();
                         }
-                        i_mmap_unlock_write(peer);
+                        i_mmap_unlock_read(peer);

                         if (ret != SWAP_AGAIN)
                                 goto done;
@@ -1772,7 +1773,7 @@ static int rmap_walk_file(struct page *page, struct rmap_walk_control *rwc)
                         goto done;
         }
 done:
-        i_mmap_unlock_write(mapping);
+        i_mmap_unlock_read(mapping);

         return ret;
 }
--
2.26.2
[Devel] [PATCH rh7 3/8] ms/mm: convert i_mmap_mutex to rwsem
From: Davidlohr Bueso

The i_mmap_mutex is a close cousin of the anon vma lock, both protecting
similar data, one for file backed pages and the other for anon memory. To
this end, this lock can also be a rwsem. In addition, there are some
important opportunities to share the lock when there are no tree
modifications.

This conversion is straightforward. For now, all users take the write lock.

[s...@canb.auug.org.au: update fremap.c]
Signed-off-by: Davidlohr Bueso
Reviewed-by: Rik van Riel
Acked-by: "Kirill A. Shutemov"
Acked-by: Hugh Dickins
Cc: Oleg Nesterov
Acked-by: Peter Zijlstra (Intel)
Cc: Srikar Dronamraju
Acked-by: Mel Gorman
Signed-off-by: Stephen Rothwell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

https://jira.sw.ru/browse/PSBM-122663
(cherry picked from commit c8c06efa8b552608493b7066c234cfa82c47fcea)
Signed-off-by: Andrey Ryabinin
---
 Documentation/vm/locking     |  2 +-
 fs/hugetlbfs/inode.c         | 10 +-
 fs/inode.c                   |  2 +-
 include/linux/fs.h           |  7 ---
 include/linux/mmu_notifier.h |  2 +-
 kernel/events/uprobes.c      |  2 +-
 mm/filemap.c                 | 10 +-
 mm/hugetlb.c                 | 10 +-
 mm/memory.c                  |  2 +-
 mm/mmap.c                    |  6 +++---
 mm/mremap.c                  |  2 +-
 mm/rmap.c                    |  4 ++--
 12 files changed, 30 insertions(+), 29 deletions(-)

diff --git a/Documentation/vm/locking b/Documentation/vm/locking
index f61228bd6395..fb6402884062 100644
--- a/Documentation/vm/locking
+++ b/Documentation/vm/locking
@@ -66,7 +66,7 @@ in some cases it is not really needed. Eg, vm_start is modified by
 expand_stack(), it is hard to come up with a destructive scenario without
 having the vmlist protection in this case.

-The page_table_lock nests with the inode i_mmap_mutex and the kmem cache
+The page_table_lock nests with the inode i_mmap_rwsem and the kmem cache
 c_spinlock spinlocks. This is okay, since the kmem code asks for pages after
 dropping c_spinlock. The page_table_lock also nests with pagecache_lock and
 pagemap_lru_lock spinlocks, and no code asks for memory with these locks
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index fb40a55cc8f1..68f8f2f0eaf5 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -757,12 +757,12 @@ static struct inode *hugetlbfs_get_root(struct super_block *sb,
 }

 /*
- * Hugetlbfs is not reclaimable; therefore its i_mmap_mutex will never
+ * Hugetlbfs is not reclaimable; therefore its i_mmap_rwsem will never
  * be taken from reclaim -- unlike regular filesystems. This needs an
  * annotation because huge_pmd_share() does an allocation under hugetlb's
- * i_mmap_mutex.
+ * i_mmap_rwsem.
  */
-struct lock_class_key hugetlbfs_i_mmap_mutex_key;
+static struct lock_class_key hugetlbfs_i_mmap_rwsem_key;

 static struct inode *hugetlbfs_get_inode(struct super_block *sb,
                                         struct inode *dir,
@@ -779,8 +779,8 @@ static struct inode *hugetlbfs_get_inode(struct super_block *sb,
         if (inode) {
                 inode->i_ino = get_next_ino();
                 inode_init_owner(inode, dir, mode);
-                lockdep_set_class(&inode->i_mapping->i_mmap_mutex,
-                                &hugetlbfs_i_mmap_mutex_key);
+                lockdep_set_class(&inode->i_mapping->i_mmap_rwsem,
+                                &hugetlbfs_i_mmap_rwsem_key);
                 inode->i_mapping->a_ops = &hugetlbfs_aops;
                 inode->i_mapping->backing_dev_info = &hugetlbfs_backing_dev_info;
                 inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
diff --git a/fs/inode.c b/fs/inode.c
index 5253272c3742..2423a30dda1b 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -356,7 +356,7 @@ void address_space_init_once(struct address_space *mapping)
         memset(mapping, 0, sizeof(*mapping));
         INIT_RADIX_TREE(&mapping->page_tree, GFP_ATOMIC);
         spin_lock_init(&mapping->tree_lock);
-        mutex_init(&mapping->i_mmap_mutex);
+        init_rwsem(&mapping->i_mmap_rwsem);
         INIT_LIST_HEAD(&mapping->private_list);
         spin_lock_init(&mapping->private_lock);
         mapping->i_mmap = RB_ROOT;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e32cb9b71042..f422b0f7b02a 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -17,6 +17,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -626,7 +627,7 @@ struct address_space {
         RH_KABI_REPLACE(unsigned int i_mmap_writable,
                         atomic_t i_mmap_writable) /* count VM_SHARED mappings */
         struct rb_root          i_mmap;         /* tree of private and shared mappings */
-        struct mutex            i_mmap_mutex;   /* protect tree, count, list */
+        struct rw_semaphore     i_mmap_rwsem;   /* protect tree, count, list */
         /* Protected by tree_lock together with the radix tree */
         unsign
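For readers less familiar with the rwsem API: the conversion above is a
one-to-one mapping of lock operations (mutex_init -> init_rwsem,
mutex_lock -> down_write, mutex_unlock -> up_write), with every caller
initially taking the exclusive variant. A rough sketch of the pattern on a
hypothetical structure (illustration only, not the rh7 hunk):

#include <linux/rwsem.h>

struct demo_mapping {                   /* hypothetical stand-in for address_space */
        struct rw_semaphore demo_rwsem; /* was: struct mutex demo_mutex */
};

static void demo_init(struct demo_mapping *m)
{
        init_rwsem(&m->demo_rwsem);     /* was: mutex_init(&m->demo_mutex) */
}

static void demo_modify_tree(struct demo_mapping *m)
{
        down_write(&m->demo_rwsem);     /* was: mutex_lock(&m->demo_mutex) */
        /* ... vma_interval_tree_insert()/remove() style modifications ... */
        up_write(&m->demo_rwsem);       /* was: mutex_unlock(&m->demo_mutex) */
}

static void demo_walk_tree(struct demo_mapping *m)
{
        down_read(&m->demo_rwsem);      /* later patches downgrade walkers to this */
        /* ... read-only walk, structure not modified ... */
        up_read(&m->demo_rwsem);
}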
[Devel] [PATCH rh7 2/8] ms/mm: use new helper functions around the i_mmap_mutex
From: Davidlohr Bueso

Convert all open coded mutex_lock/unlock calls to the
i_mmap_[lock/unlock]_write() helpers.

Signed-off-by: Davidlohr Bueso
Acked-by: Rik van Riel
Acked-by: "Kirill A. Shutemov"
Acked-by: Hugh Dickins
Cc: Oleg Nesterov
Acked-by: Peter Zijlstra (Intel)
Cc: Srikar Dronamraju
Acked-by: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

https://jira.sw.ru/browse/PSBM-122663
(cherry picked from commit 83cde9e8ba95d180eaefefe834958fbf7008cf39)
Signed-off-by: Andrey Ryabinin
---
 fs/dax.c                |  4 ++--
 fs/hugetlbfs/inode.c    | 12 ++--
 kernel/events/uprobes.c |  4 ++--
 kernel/fork.c           |  4 ++--
 mm/hugetlb.c            | 12 ++--
 mm/memory-failure.c     |  4 ++--
 mm/memory.c             | 28 ++--
 mm/mmap.c               | 14 +++---
 mm/mremap.c             |  4 ++--
 mm/nommu.c              | 14 +++---
 mm/rmap.c               |  6 +++---
 11 files changed, 53 insertions(+), 53 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index f22e3b32b6cc..7a18745acf01 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -909,7 +909,7 @@ static void dax_mapping_entry_mkclean(struct address_space *mapping,
         spinlock_t *ptl;
         bool changed;

-        mutex_lock(&mapping->i_mmap_mutex);
+        i_mmap_lock_write(mapping);
         vma_interval_tree_foreach(vma, &mapping->i_mmap, index, index) {
                 unsigned long address;

@@ -960,7 +960,7 @@ unlock_pte:
                 if (changed)
                         mmu_notifier_invalidate_page(vma->vm_mm, address);
         }
-        mutex_unlock(&mapping->i_mmap_mutex);
+        i_mmap_unlock_write(mapping);
 }

 static int dax_writeback_one(struct dax_device *dax_dev,
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index bdd5c7827391..fb40a55cc8f1 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -493,11 +493,11 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
                         if (unlikely(page_mapped(page))) {
                                 BUG_ON(truncate_op);

-                                mutex_lock(&mapping->i_mmap_mutex);
+                                i_mmap_lock_write(mapping);
                                 hugetlb_vmdelete_list(&mapping->i_mmap,
                                         next * pages_per_huge_page(h),
                                         (next + 1) * pages_per_huge_page(h));
-                                mutex_unlock(&mapping->i_mmap_mutex);
+                                i_mmap_unlock_write(mapping);
                         }

                         lock_page(page);
@@ -553,10 +553,10 @@ static int hugetlb_vmtruncate(struct inode *inode, loff_t offset)
         pgoff = offset >> PAGE_SHIFT;

         i_size_write(inode, offset);
-        mutex_lock(&mapping->i_mmap_mutex);
+        i_mmap_lock_write(mapping);
         if (!RB_EMPTY_ROOT(&mapping->i_mmap))
                 hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0);
-        mutex_unlock(&mapping->i_mmap_mutex);
+        i_mmap_unlock_write(mapping);
         remove_inode_hugepages(inode, offset, LLONG_MAX);
         return 0;
 }
@@ -578,12 +578,12 @@ static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
                 struct address_space *mapping = inode->i_mapping;

                 mutex_lock(&inode->i_mutex);
-                mutex_lock(&mapping->i_mmap_mutex);
+                i_mmap_lock_write(mapping);
                 if (!RB_EMPTY_ROOT(&mapping->i_mmap))
                         hugetlb_vmdelete_list(&mapping->i_mmap,
                                                 hole_start >> PAGE_SHIFT,
                                                 hole_end >> PAGE_SHIFT);
-                mutex_unlock(&mapping->i_mmap_mutex);
+                i_mmap_unlock_write(mapping);
                 remove_inode_hugepages(inode, hole_start, hole_end);
                 mutex_unlock(&inode->i_mutex);
         }
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index a5a59cc93fb6..816ad8e3d92f 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -690,7 +690,7 @@ build_map_info(struct address_space *mapping, loff_t offset, bool is_register)
         int more = 0;

 again:
-        mutex_lock(&mapping->i_mmap_mutex);
+        i_mmap_lock_write(mapping);
         vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
                 if (!valid_vma(vma, is_register))
                         continue;
@@ -721,7 +721,7 @@ build_map_info(struct address_space *mapping, loff_t offset, bool is_register)
                 info->mm = vma->vm_mm;
                 info->vaddr = offset_to_vaddr(vma, offset);
         }
-        mutex_unlock(&mapping->i_mmap_mutex);
+        i_mmap_unlock_write(mapping);

         if (!more)
                 goto out;
diff --git a/kernel/fork.c b/kernel/fork.c
index 9467e21a8fa4..b6a5279403be 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -5
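The value of doing this conversion as a separate step is that, once every
call site goes through a helper, the later lock-type switch and the
read-sharing patches only have to touch the helpers and the call sites that
can safely be downgraded. A small self-contained userspace illustration of
that encapsulation idea (all names below are made up for this sketch):

/* lock_helpers_demo.c - wrap a lock in helpers so its type can change in
 * one place while call sites stay untouched.
 * Build: cc -pthread -o lock_helpers_demo lock_helpers_demo.c
 */
#include <pthread.h>
#include <stdio.h>

struct mapping {
        /* Step 1 (this patch's analogue): callers stop naming the lock type.
         * Step 2 (the rwsem patch's analogue): only this field and the
         * helpers change when the plain mutex becomes a reader/writer lock. */
        pthread_rwlock_t i_mmap_lock;
        int tree;
};

static void mapping_lock_write(struct mapping *m)
{
        pthread_rwlock_wrlock(&m->i_mmap_lock);
}

static void mapping_unlock_write(struct mapping *m)
{
        pthread_rwlock_unlock(&m->i_mmap_lock);
}

static void insert(struct mapping *m, int v)
{
        mapping_lock_write(m);  /* call site unchanged by the type switch */
        m->tree += v;
        mapping_unlock_write(m);
}

int main(void)
{
        struct mapping m = {
                .i_mmap_lock = PTHREAD_RWLOCK_INITIALIZER,
                .tree = 0,
        };

        insert(&m, 42);
        printf("tree = %d\n", m.tree);
        return 0;
}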
[Devel] [PATCH rh7] Revert "mm: Port diff-mm-vmscan-disable-fs-related-activity-for-direct-direct-reclaim"
This reverts commit 50fb388878b646872b78143de3c1bf3fa6f7f148.

Sometimes we can see a lot of reclaimable dcache and no other reclaimable
memory. It looks like kswapd can't keep up reclaiming dcache fast enough.

Commit 50fb388878b6 forbids reclaiming dcache in direct reclaim to prevent
potential deadlocks that might happen due to bugs in other subsystems.
Revert it to allow more aggressive dcache reclaim. It's unlikely to cause
any problems since we already directly reclaim dcache in memcg reclaim, so
let's do the same for the global one.

https://jira.sw.ru/browse/PSBM-122663
Signed-off-by: Andrey Ryabinin
---
 mm/vmscan.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 85622f235e78..240435eb6d84 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2653,15 +2653,9 @@ static void shrink_zone(struct zone *zone, struct scan_control *sc,
 {
         struct reclaim_state *reclaim_state = current->reclaim_state;
         unsigned long nr_reclaimed, nr_scanned;
-        gfp_t slab_gfp = sc->gfp_mask;
         bool slab_only = sc->slab_only;
         bool retry;

-        /* Disable fs-related IO for direct reclaim */
-        if (!sc->target_mem_cgroup &&
-            (current->flags & (PF_MEMALLOC|PF_KSWAPD)) == PF_MEMALLOC)
-                slab_gfp &= ~__GFP_FS;
-
         do {
                 struct mem_cgroup *root = sc->target_mem_cgroup;
                 struct mem_cgroup_reclaim_cookie reclaim = {
@@ -2695,7 +2689,7 @@ static void shrink_zone(struct zone *zone, struct scan_control *sc,
                 }

                 if (is_classzone) {
-                        shrink_slab(slab_gfp, zone_to_nid(zone),
+                        shrink_slab(sc->gfp_mask, zone_to_nid(zone),
                                     memcg, sc->priority, false);
                         if (reclaim_state) {
                                 sc->nr_reclaimed += reclaim_state->reclaimed_slab;
--
2.26.2
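For context on why masking __GFP_FS effectively disabled dcache reclaim in
direct reclaim: filesystem shrinkers bail out when the reclaim context does
not allow filesystem activity, so clearing the flag before calling
shrink_slab() turned the superblock shrinker into a no-op for direct
reclaimers. A simplified sketch of that early-out pattern (modeled on the
superblock shrinker, not the actual rh7 source):

#include <linux/gfp.h>
#include <linux/shrinker.h>

/* Hypothetical scan callback illustrating the __GFP_FS gate. */
static unsigned long demo_fs_shrinker_scan(struct shrinker *shrink,
                                           struct shrink_control *sc)
{
        /* If the caller cleared __GFP_FS, refuse to do any work at all:
         * this is why masking the flag in shrink_zone() meant no
         * dentry/inode reclaim from the direct reclaim path. */
        if (!(sc->gfp_mask & __GFP_FS))
                return SHRINK_STOP;

        /* ... prune dentries/inodes and return the number of objects freed ... */
        return 0;
}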