Hi Will, none of my comments below represent objections to this patch, but let me remark:
On Thu, Oct 05, 2017 at 05:31:54PM +0100, Will Deacon wrote:
> Hi Paul,
>
> On Tue, Oct 03, 2017 at 12:11:10PM -0700, Paul E. McKenney wrote:
> > On Fri, Sep 29, 2017 at 05:33:49PM +0100, Will Deacon wrote:
> > > On Fri, Sep 29, 2017 at 09:29:39AM -0700, Paul E. McKenney wrote:
> > > > On Fri, Sep 29, 2017 at 10:08:43AM +0100, Will Deacon wrote:
> > > > > Ok, but where does that leave us wrt my initial proposal of moving
> > > > > smp_read_barrier_depends() into READ_ONCE and getting rid of
> > > > > lockless_dereference?
> > > > >
> > > > > Michael (or anybody else running mainline on SMP Alpha) -- would you be
> > > > > able to give the diff below a spin and see whether there's a measurable
> > > > > performance impact?
> > > >
> > > > This will be a sensitive test. The smp_read_barrier_depends() can be
> > > > removed from lockless_dereference(). Without this removal Alpha will
> > > > get two memory barriers from rcu_dereference() and friends.
> > >
> > > Oh yes, good point. I was trying to keep the diff simple, but you're
> > > right that this is packing too many barriers. Fixed diff below.
> >
> > Not seeing any objections thus far. If there are none by (say) the
> > end of this week, I would be happy to queue a patch for the 4.16
> > merge window. That should give ample opportunity for further review
> > and testing.
>
> Ok, full patch below.
>
> Will
>
> --->8
>
> From 15956d0cc6b37208d8542b1858a8d8b64227acf4 Mon Sep 17 00:00:00 2001
> From: Will Deacon <will.dea...@arm.com>
> Date: Thu, 5 Oct 2017 16:57:36 +0100
> Subject: [PATCH] locking/barriers: Kill lockless_dereference
>
> lockless_dereference is a nice idea, but it's gained little traction in
> kernel code since it's introduction three years ago. This is partly
> because it's a pain to type, but also because using READ_ONCE instead
> will work correctly on all architectures apart from Alpha, which is a
> fully supported but somewhat niche architecture these days.

lockless_dereference might be a mouthful, but it does (explicitly) say:
"yes, we are relying on the following address dependency to be in
strong-ppo".  Such information will be lost, or at least will not be
immediately clear, from a bare READ_ONCE().  (And yes, this information
is only relevant once we include Alpha in the picture/analysis; the
first sketch at the end of my remarks below illustrates the point.)

>
> This patch moves smp_read_barrier_depends() (a NOP on all architectures
> other than Alpha) from lockless_dereference into READ_ONCE, converts
> the few actual users over to READ_ONCE and then finally removes
> lockless_dereference altogether.

Notice that several "potential users" of lockless_dereference are
currently hidden in other call sites of smp_read_barrier_depends (i.e.,
cases where this barrier is not called from within a lockless or an
RCU dereference).  Some of these usages, e.g.,

  include/linux/percpu-refcount.h:__ref_is_percpu,
  mm/ksm.c:get_ksm_page,
  security/keys/keyring.c:search_nested_keyrings,

precede this barrier with a READ_ONCE; others, e.g.,

  arch/alpha/include/asm/pgtable.h:pmd_offset,
  net/ipv4/netfilter/arp_tables.c:arpt_do_table,
  kernel/events/uprobes.c:get_trampoline_vaddr,

precede it with a plain read.  There also appear to be cases where the
barrier is preceded by an ACCESS_ONCE (cf. fs/dcache.c:prepend_name) or
by an xchg_release (cf. kernel/locking/qspinlock.c:queued_spin_lock_slowpath),
and it would not be difficult to imagine/create further variations; the
second sketch below shows the kind of call site I have in mind.

>
> Signed-off-by: Will Deacon <will.dea...@arm.com>

I understand that we all agree we're missing a Tested-by here ;-).
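To make the first remark concrete, here is a minimal sketch; struct foo,
gp and reader() are made-up names, the publisher is assumed to use
smp_store_release(), and the two loads are shown side by side purely for
contrast (the patch of course removes the first spelling):

/* Hypothetical example, not taken from the kernel tree. */
struct foo {
        int a;
};

extern struct foo *gp;          /* assumed published via smp_store_release() */

int reader(void)
{
        struct foo *p = lockless_dereference(gp);      /* old spelling */
        struct foo *q = READ_ONCE(gp);                 /* new spelling */

        /*
         * Both loads head an address dependency to ->a, and with this
         * patch both also imply smp_read_barrier_depends() on Alpha;
         * the difference is only in what the spelling documents.
         */
        return (p ? p->a : 0) + (q ? q->a : 0);
}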
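And a sketch of the second remark, loosely modelled on the
__ref_is_percpu style of call site (struct my_obj and my_obj_get_data
are made-up names): a READ_ONCE() already paired with an explicit
smp_read_barrier_depends().  Once READ_ONCE() itself implies the
barrier, the explicit call becomes redundant and could be dropped in a
follow-up.

/* Hypothetical example, not taken from the kernel tree. */
struct my_obj {
        void *data;
};

static bool my_obj_get_data(struct my_obj *obj, void **datap)
{
        void *data = READ_ONCE(obj->data);

        /*
         * Order the load of ->data against the caller's later
         * dereferences of it.  After this patch, the READ_ONCE()
         * above already provides this ordering on Alpha, so the
         * explicit barrier below becomes redundant.
         */
        smp_read_barrier_depends();

        if (!data)
                return false;

        *datap = data;
        return true;
}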
Andrea > --- > Documentation/memory-barriers.txt | 12 ------------ > .../translations/ko_KR/memory-barriers.txt | 12 ------------ > arch/x86/events/core.c | 2 +- > arch/x86/include/asm/mmu_context.h | 4 ++-- > arch/x86/kernel/ldt.c | 2 +- > drivers/md/dm-mpath.c | 20 ++++++++++---------- > fs/dcache.c | 4 ++-- > fs/overlayfs/ovl_entry.h | 2 +- > fs/overlayfs/readdir.c | 2 +- > include/linux/compiler.h | 21 > +-------------------- > include/linux/rculist.h | 4 ++-- > include/linux/rcupdate.h | 4 ++-- > kernel/events/core.c | 4 ++-- > kernel/seccomp.c | 2 +- > kernel/task_work.c | 2 +- > mm/slab.h | 2 +- > 16 files changed, 28 insertions(+), 71 deletions(-) > > diff --git a/Documentation/memory-barriers.txt > b/Documentation/memory-barriers.txt > index b759a60624fd..470a682f3fa4 100644 > --- a/Documentation/memory-barriers.txt > +++ b/Documentation/memory-barriers.txt > @@ -1886,18 +1886,6 @@ There are some more advanced barrier functions: > See Documentation/atomic_{t,bitops}.txt for more information. > > > - (*) lockless_dereference(); > - > - This can be thought of as a pointer-fetch wrapper around the > - smp_read_barrier_depends() data-dependency barrier. > - > - This is also similar to rcu_dereference(), but in cases where > - object lifetime is handled by some mechanism other than RCU, for > - example, when the objects removed only when the system goes down. > - In addition, lockless_dereference() is used in some data structures > - that can be used both with and without RCU. > - > - > (*) dma_wmb(); > (*) dma_rmb(); > > diff --git a/Documentation/translations/ko_KR/memory-barriers.txt > b/Documentation/translations/ko_KR/memory-barriers.txt > index a7a813258013..ec3b46e27b7a 100644 > --- a/Documentation/translations/ko_KR/memory-barriers.txt > +++ b/Documentation/translations/ko_KR/memory-barriers.txt > @@ -1858,18 +1858,6 @@ Mandatory 배리어들은 SMP 시스템에서도 UP 시스템에서도 SMP 효 > 참고하세요. > > > - (*) lockless_dereference(); > - > - 이 함수는 smp_read_barrier_depends() 데이터 의존성 배리어를 사용하는 > - 포인터 읽어오기 래퍼(wrapper) 함수로 생각될 수 있습니다. > - > - 객체의 라이프타임이 RCU 외의 메커니즘으로 관리된다는 점을 제외하면 > - rcu_dereference() 와도 유사한데, 예를 들면 객체가 시스템이 꺼질 때에만 > - 제거되는 경우 등입니다. 또한, lockless_dereference() 은 RCU 와 함께 > - 사용될수도, RCU 없이 사용될 수도 있는 일부 데이터 구조에 사용되고 > - 있습니다. 
> - > - > (*) dma_wmb(); > (*) dma_rmb(); > > diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c > index 80534d3c2480..589af1eec7c1 100644 > --- a/arch/x86/events/core.c > +++ b/arch/x86/events/core.c > @@ -2371,7 +2371,7 @@ static unsigned long get_segment_base(unsigned int > segment) > struct ldt_struct *ldt; > > /* IRQs are off, so this synchronizes with smp_store_release */ > - ldt = lockless_dereference(current->active_mm->context.ldt); > + ldt = READ_ONCE(current->active_mm->context.ldt); > if (!ldt || idx >= ldt->nr_entries) > return 0; > > diff --git a/arch/x86/include/asm/mmu_context.h > b/arch/x86/include/asm/mmu_context.h > index c120b5db178a..9037a4e546e8 100644 > --- a/arch/x86/include/asm/mmu_context.h > +++ b/arch/x86/include/asm/mmu_context.h > @@ -72,8 +72,8 @@ static inline void load_mm_ldt(struct mm_struct *mm) > #ifdef CONFIG_MODIFY_LDT_SYSCALL > struct ldt_struct *ldt; > > - /* lockless_dereference synchronizes with smp_store_release */ > - ldt = lockless_dereference(mm->context.ldt); > + /* READ_ONCE synchronizes with smp_store_release */ > + ldt = READ_ONCE(mm->context.ldt); > > /* > * Any change to mm->context.ldt is followed by an IPI to all > diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c > index f0e64db18ac8..0a21390642c4 100644 > --- a/arch/x86/kernel/ldt.c > +++ b/arch/x86/kernel/ldt.c > @@ -101,7 +101,7 @@ static void finalize_ldt_struct(struct ldt_struct *ldt) > static void install_ldt(struct mm_struct *current_mm, > struct ldt_struct *ldt) > { > - /* Synchronizes with lockless_dereference in load_mm_ldt. */ > + /* Synchronizes with READ_ONCE in load_mm_ldt. */ > smp_store_release(¤t_mm->context.ldt, ldt); > > /* Activate the LDT for all CPUs using current_mm. */ > diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c > index 11f273d2f018..3f88c9d32f7e 100644 > --- a/drivers/md/dm-mpath.c > +++ b/drivers/md/dm-mpath.c > @@ -366,7 +366,7 @@ static struct pgpath *choose_path_in_pg(struct multipath > *m, > > pgpath = path_to_pgpath(path); > > - if (unlikely(lockless_dereference(m->current_pg) != pg)) { > + if (unlikely(READ_ONCE(m->current_pg) != pg)) { > /* Only update current_pgpath if pg changed */ > spin_lock_irqsave(&m->lock, flags); > m->current_pgpath = pgpath; > @@ -390,7 +390,7 @@ static struct pgpath *choose_pgpath(struct multipath *m, > size_t nr_bytes) > } > > /* Were we instructed to switch PG? */ > - if (lockless_dereference(m->next_pg)) { > + if (READ_ONCE(m->next_pg)) { > spin_lock_irqsave(&m->lock, flags); > pg = m->next_pg; > if (!pg) { > @@ -406,7 +406,7 @@ static struct pgpath *choose_pgpath(struct multipath *m, > size_t nr_bytes) > > /* Don't change PG until it has no remaining paths */ > check_current_pg: > - pg = lockless_dereference(m->current_pg); > + pg = READ_ONCE(m->current_pg); > if (pg) { > pgpath = choose_path_in_pg(m, pg, nr_bytes); > if (!IS_ERR_OR_NULL(pgpath)) > @@ -473,7 +473,7 @@ static int multipath_clone_and_map(struct dm_target *ti, > struct request *rq, > struct request *clone; > > /* Do we need to select a new pgpath? */ > - pgpath = lockless_dereference(m->current_pgpath); > + pgpath = READ_ONCE(m->current_pgpath); > if (!pgpath || !test_bit(MPATHF_QUEUE_IO, &m->flags)) > pgpath = choose_pgpath(m, nr_bytes); > > @@ -535,7 +535,7 @@ static int __multipath_map_bio(struct multipath *m, > struct bio *bio, struct dm_m > bool queue_io; > > /* Do we need to select a new pgpath? 
*/ > - pgpath = lockless_dereference(m->current_pgpath); > + pgpath = READ_ONCE(m->current_pgpath); > queue_io = test_bit(MPATHF_QUEUE_IO, &m->flags); > if (!pgpath || !queue_io) > pgpath = choose_pgpath(m, nr_bytes); > @@ -1804,7 +1804,7 @@ static int multipath_prepare_ioctl(struct dm_target *ti, > struct pgpath *current_pgpath; > int r; > > - current_pgpath = lockless_dereference(m->current_pgpath); > + current_pgpath = READ_ONCE(m->current_pgpath); > if (!current_pgpath) > current_pgpath = choose_pgpath(m, 0); > > @@ -1826,7 +1826,7 @@ static int multipath_prepare_ioctl(struct dm_target *ti, > } > > if (r == -ENOTCONN) { > - if (!lockless_dereference(m->current_pg)) { > + if (!READ_ONCE(m->current_pg)) { > /* Path status changed, redo selection */ > (void) choose_pgpath(m, 0); > } > @@ -1895,9 +1895,9 @@ static int multipath_busy(struct dm_target *ti) > return (m->queue_mode != DM_TYPE_MQ_REQUEST_BASED); > > /* Guess which priority_group will be used at next mapping time */ > - pg = lockless_dereference(m->current_pg); > - next_pg = lockless_dereference(m->next_pg); > - if (unlikely(!lockless_dereference(m->current_pgpath) && next_pg)) > + pg = READ_ONCE(m->current_pg); > + next_pg = READ_ONCE(m->next_pg); > + if (unlikely(!READ_ONCE(m->current_pgpath) && next_pg)) > pg = next_pg; > > if (!pg) { > diff --git a/fs/dcache.c b/fs/dcache.c > index f90141387f01..34c852af215c 100644 > --- a/fs/dcache.c > +++ b/fs/dcache.c > @@ -231,7 +231,7 @@ static inline int dentry_cmp(const struct dentry *dentry, > const unsigned char *c > { > /* > * Be careful about RCU walk racing with rename: > - * use 'lockless_dereference' to fetch the name pointer. > + * use 'READ_ONCE' to fetch the name pointer. > * > * NOTE! Even if a rename will mean that the length > * was not loaded atomically, we don't care. 
The > @@ -245,7 +245,7 @@ static inline int dentry_cmp(const struct dentry *dentry, > const unsigned char *c > * early because the data cannot match (there can > * be no NUL in the ct/tcount data) > */ > - const unsigned char *cs = lockless_dereference(dentry->d_name.name); > + const unsigned char *cs = READ_ONCE(dentry->d_name.name); > > return dentry_string_cmp(cs, ct, tcount); > } > diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h > index 878a750986dd..0f6809fa6628 100644 > --- a/fs/overlayfs/ovl_entry.h > +++ b/fs/overlayfs/ovl_entry.h > @@ -74,5 +74,5 @@ static inline struct ovl_inode *OVL_I(struct inode *inode) > > static inline struct dentry *ovl_upperdentry_dereference(struct ovl_inode > *oi) > { > - return lockless_dereference(oi->__upperdentry); > + return READ_ONCE(oi->__upperdentry); > } > diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c > index 62e9b22a2077..0b389d330613 100644 > --- a/fs/overlayfs/readdir.c > +++ b/fs/overlayfs/readdir.c > @@ -754,7 +754,7 @@ static int ovl_dir_fsync(struct file *file, loff_t start, > loff_t end, > if (!od->is_upper && OVL_TYPE_UPPER(ovl_path_type(dentry))) { > struct inode *inode = file_inode(file); > > - realfile = lockless_dereference(od->upperfile); > + realfile = READ_ONCE(od->upperfile); > if (!realfile) { > struct path upperpath; > > diff --git a/include/linux/compiler.h b/include/linux/compiler.h > index e95a2631e545..f260ff39f90f 100644 > --- a/include/linux/compiler.h > +++ b/include/linux/compiler.h > @@ -340,6 +340,7 @@ static __always_inline void __write_once_size(volatile > void *p, void *res, int s > __read_once_size(&(x), __u.__c, sizeof(x)); \ > else \ > __read_once_size_nocheck(&(x), __u.__c, sizeof(x)); \ > + smp_read_barrier_depends(); /* Enforce dependency ordering from x */ \ > __u.__val; \ > }) > #define READ_ONCE(x) __READ_ONCE(x, 1) > @@ -604,24 +605,4 @@ static __always_inline void __write_once_size(volatile > void *p, void *res, int s > (volatile typeof(x) *)&(x); }) > #define ACCESS_ONCE(x) (*__ACCESS_ONCE(x)) > > -/** > - * lockless_dereference() - safely load a pointer for later dereference > - * @p: The pointer to load > - * > - * Similar to rcu_dereference(), but for situations where the pointed-to > - * object's lifetime is managed by something other than RCU. That > - * "something other" might be reference counting or simple immortality. > - * > - * The seemingly unused variable ___typecheck_p validates that @p is > - * indeed a pointer type by using a pointer to typeof(*p) as the type. > - * Taking a pointer to typeof(*p) again is needed in case p is void *. > - */ > -#define lockless_dereference(p) \ > -({ \ > - typeof(p) _________p1 = READ_ONCE(p); \ > - typeof(*(p)) *___typecheck_p __maybe_unused; \ > - smp_read_barrier_depends(); /* Dependency order vs. p above. */ \ > - (_________p1); \ > -}) > - > #endif /* __LINUX_COMPILER_H */ > diff --git a/include/linux/rculist.h b/include/linux/rculist.h > index b1fd8bf85fdc..3a2bb7d8ed4d 100644 > --- a/include/linux/rculist.h > +++ b/include/linux/rculist.h > @@ -274,7 +274,7 @@ static inline void list_splice_tail_init_rcu(struct > list_head *list, > * primitives such as list_add_rcu() as long as it's guarded by > rcu_read_lock(). > */ > #define list_entry_rcu(ptr, type, member) \ > - container_of(lockless_dereference(ptr), type, member) > + container_of(READ_ONCE(ptr), type, member) > > /** > * Where are list_empty_rcu() and list_first_entry_rcu()? 
> @@ -367,7 +367,7 @@ static inline void list_splice_tail_init_rcu(struct > list_head *list, > * example is when items are added to the list, but never deleted. > */ > #define list_entry_lockless(ptr, type, member) \ > - container_of((typeof(ptr))lockless_dereference(ptr), type, member) > + container_of((typeof(ptr))READ_ONCE(ptr), type, member) > > /** > * list_for_each_entry_lockless - iterate over rcu list of given type > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h > index de50d8a4cf41..380a3aeb09d7 100644 > --- a/include/linux/rcupdate.h > +++ b/include/linux/rcupdate.h > @@ -346,7 +346,7 @@ static inline void rcu_preempt_sleep_check(void) { } > #define __rcu_dereference_check(p, c, space) \ > ({ \ > /* Dependency order vs. p above. */ \ > - typeof(*p) *________p1 = (typeof(*p) *__force)lockless_dereference(p); \ > + typeof(*p) *________p1 = (typeof(*p) *__force)READ_ONCE(p); \ > RCU_LOCKDEP_WARN(!(c), "suspicious rcu_dereference_check() usage"); \ > rcu_dereference_sparse(p, space); \ > ((typeof(*p) __force __kernel *)(________p1)); \ > @@ -360,7 +360,7 @@ static inline void rcu_preempt_sleep_check(void) { } > #define rcu_dereference_raw(p) \ > ({ \ > /* Dependency order vs. p above. */ \ > - typeof(p) ________p1 = lockless_dereference(p); \ > + typeof(p) ________p1 = READ_ONCE(p); \ > ((typeof(*p) __force __kernel *)(________p1)); \ > }) > > diff --git a/kernel/events/core.c b/kernel/events/core.c > index 6bc21e202ae4..417812ce0099 100644 > --- a/kernel/events/core.c > +++ b/kernel/events/core.c > @@ -4231,7 +4231,7 @@ static void perf_remove_from_owner(struct perf_event > *event) > * indeed free this event, otherwise we need to serialize on > * owner->perf_event_mutex. > */ > - owner = lockless_dereference(event->owner); > + owner = READ_ONCE(event->owner); > if (owner) { > /* > * Since delayed_put_task_struct() also drops the last > @@ -4328,7 +4328,7 @@ int perf_event_release_kernel(struct perf_event *event) > * Cannot change, child events are not migrated, see the > * comment with perf_event_ctx_lock_nested(). > */ > - ctx = lockless_dereference(child->ctx); > + ctx = READ_ONCE(child->ctx); > /* > * Since child_mutex nests inside ctx::mutex, we must jump > * through hoops. We start by grabbing a reference on the ctx. > diff --git a/kernel/seccomp.c b/kernel/seccomp.c > index bb3a38005b9c..1daa8b61a268 100644 > --- a/kernel/seccomp.c > +++ b/kernel/seccomp.c > @@ -189,7 +189,7 @@ static u32 seccomp_run_filters(const struct seccomp_data > *sd, > u32 ret = SECCOMP_RET_ALLOW; > /* Make sure cross-thread synced filter points somewhere sane. */ > struct seccomp_filter *f = > - lockless_dereference(current->seccomp.filter); > + READ_ONCE(current->seccomp.filter); > > /* Ensure unexpected behavior doesn't result in failing open. */ > if (unlikely(WARN_ON(f == NULL))) > diff --git a/kernel/task_work.c b/kernel/task_work.c > index 836a72a66fba..9a9f262fc53d 100644 > --- a/kernel/task_work.c > +++ b/kernel/task_work.c > @@ -67,7 +67,7 @@ task_work_cancel(struct task_struct *task, task_work_func_t > func) > * we raced with task_work_run(), *pprev == NULL/exited. 
> */ > raw_spin_lock_irqsave(&task->pi_lock, flags); > - while ((work = lockless_dereference(*pprev))) { > + while ((work = READ_ONCE(*pprev))) { > if (work->func != func) > pprev = &work->next; > else if (cmpxchg(pprev, work, work->next) == work) > diff --git a/mm/slab.h b/mm/slab.h > index 073362816acc..8894f811a89d 100644 > --- a/mm/slab.h > +++ b/mm/slab.h > @@ -258,7 +258,7 @@ cache_from_memcg_idx(struct kmem_cache *s, int idx) > * memcg_caches issues a write barrier to match this (see > * memcg_create_kmem_cache()). > */ > - cachep = lockless_dereference(arr->entries[idx]); > + cachep = READ_ONCE(arr->entries[idx]); > rcu_read_unlock(); > > return cachep; > -- > 2.1.4 >
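P.S.  For anyone skimming the diff, the include/linux/compiler.h hunk
effectively amounts to the following; this is a simplified sketch that
ignores the size-dispatch and checking details of the real macro, and
MY_READ_ONCE is a made-up name:

/* Simplified sketch of the post-patch behaviour, not the actual macro. */
#define MY_READ_ONCE(x)                                                 \
({                                                                      \
        typeof(x) ___val = *(const volatile typeof(x) *)&(x);          \
        smp_read_barrier_depends();     /* a NOP except on Alpha */    \
        ___val;                                                         \
})

That is, every READ_ONCE() now carries the dependency-ordering barrier
that lockless_dereference() used to provide, which is what makes the
one-for-one conversions in the rest of the patch safe.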