RE: [PATCH RFC 01/16] prcu: Add PRCU implementation

2018-01-30 Thread zhangheng (AC)
-Original Message-
>From: Boqun Feng [mailto:boqun.f...@gmail.com] 
>Sent: 2018年1月30日 14:41
>To: zhangheng (AC) <hen...@huawei.com>
>Cc: Paul E. McKenney <paul...@linux.vnet.ibm.com>; liangli...@huawei.com; 
>Guohanjun (Hanjun Guo) <guohan...@huawei.com>; Chenhaibo (Haibo, OS Lab) 
><hb.c...@huawei.com>; lihao.li...@gmail.com; linux-kernel@vger.kernel.org
>Subject: Re: [PATCH RFC 01/16] prcu: Add PRCU implementation
>
>On Tue, Jan 30, 2018 at 05:34:03AM +, zhangheng (AC) wrote:
>[...]
>> >> > +static void prcu_handler(void *info) {
>> >> > +   struct prcu_local_struct *local;
>> >> > +
>> >> > +   local = this_cpu_ptr(_local);
>> >> > +   if (!local->locked)
>> >
>> >And I think a smp_mb() is needed here, because in the following case:
>> >
>> >CPU 0 CPU 1
>> >==  ==
>> >{X is initially 0}
>> >
>> >WRITE_ONCE(X, 1);
>> >
>> >  prcu_read_unlock(void):
>> >  if (locked) {
>> >  synchronize_prcu(void):
>> >...
>> >
>> >local->locked--;
>> >  # switch to IPI
>> > WRITE_ONCE(local->version,)
>> >> > latest>
>> >  
>> >
>> >  r1 = READ_ONCE(X);
>> >
>> >r1 could be 0, which breaks RCU guarantees.
>> >
>> 
>> Thank you.
>> As I know,
>> it guarantees that the interrupt to be handled after all write instructions 
>> issued before have complete in x86 arch.
>> So the smp_mb is meaningless in x86 arch.
>
>Sure. x86 is TSO, and we are talking about reordering of two stores here, and 
>that can not happen on TSO.
>
>> But I am not sure whether other archs guarantee this feature. If not, we do 
>> need a smp_mb here.
>> 
>
>I think most of the weak memory model don't have this gaurantee, so you need a 
>smp_mb() or use smp_store_release().

Agree.

>
>> >> > +   WRITE_ONCE(local->version, 
>> >> > +atomic64_read(>global_version));
>> >> > +}
>> >> > +
>> >> > +void synchronize_prcu(void)
>> >> > +{
>> >> > +   int cpu;
>> >> > +   cpumask_t cpus;
>> >> > +   unsigned long long version;
>> >> > +   struct prcu_local_struct *local;
>> >> > +
>> >> > +   version = atomic64_add_return(1, >global_version);
>> >> > +   mutex_lock(>mtx);
>> >> > +
>> >> > +   local = get_cpu_ptr(_local);
>> >> > +   local->version = version;
>> >> > +   put_cpu_ptr(_local);
>> >> > +
>> >> > +   cpumask_clear();
>> >> > +   for_each_possible_cpu(cpu) {
>> >> > +   local = per_cpu_ptr(_local, cpu);
>> >> > +   if (!READ_ONCE(local->online))
>> >> > +   continue;
>> >> > +   if (READ_ONCE(local->version) < version) {
>> >> 
>> >> On 32-bit systems, given that ->version is long long, you might see 
>> >> load tearing.  And on some 32-bit systems, the cmpxchg() in
>> >> prcu_hander() might not build.
>> >> 
>> >
>> >/me curious about why an atomic64_t is used here for global version. I 
>> >think maybe 32bit global version still suffices.
>> >
>> >Regards,
>> >Boqun
>> 
>> Because the synchronization latency is low, it can have higher gp frequency.
>> It seems that 32bit can only correctly work for several years if there are 
>> 20+ gps per second.
>> 
>
>Because PRCU doesn't handle gp number overflow? May I ask why this is 
>difficult? Currently RCU could tolerate counter wrap for grace period:
>
>   https://lwn.net/Articles/652677/ (Details in "Parallelism facts of 
> life")
>
>Is there any subtle difference I'm missing?
>
>Regards,
>Boqun
>

Yes, you are right. Currently prcu hasn't given a solution for overflow thus it 
needs a 64-bit

RE: [PATCH RFC 01/16] prcu: Add PRCU implementation

2018-01-30 Thread zhangheng (AC)
-Original Message-
>From: Boqun Feng [mailto:boqun.f...@gmail.com] 
>Sent: 2018年1月30日 14:41
>To: zhangheng (AC) 
>Cc: Paul E. McKenney ; liangli...@huawei.com; 
>Guohanjun (Hanjun Guo) ; Chenhaibo (Haibo, OS Lab) 
>; lihao.li...@gmail.com; linux-kernel@vger.kernel.org
>Subject: Re: [PATCH RFC 01/16] prcu: Add PRCU implementation
>
>On Tue, Jan 30, 2018 at 05:34:03AM +, zhangheng (AC) wrote:
>[...]
>> >> > +static void prcu_handler(void *info) {
>> >> > +   struct prcu_local_struct *local;
>> >> > +
>> >> > +   local = this_cpu_ptr(_local);
>> >> > +   if (!local->locked)
>> >
>> >And I think a smp_mb() is needed here, because in the following case:
>> >
>> >CPU 0 CPU 1
>> >==  ==
>> >{X is initially 0}
>> >
>> >WRITE_ONCE(X, 1);
>> >
>> >  prcu_read_unlock(void):
>> >  if (locked) {
>> >  synchronize_prcu(void):
>> >...
>> >
>> >local->locked--;
>> >  # switch to IPI
>> > WRITE_ONCE(local->version,)
>> >> > latest>
>> >  
>> >
>> >  r1 = READ_ONCE(X);
>> >
>> >r1 could be 0, which breaks RCU guarantees.
>> >
>> 
>> Thank you.
>> As I know,
>> it guarantees that the interrupt to be handled after all write instructions 
>> issued before have complete in x86 arch.
>> So the smp_mb is meaningless in x86 arch.
>
>Sure. x86 is TSO, and we are talking about reordering of two stores here, and 
>that can not happen on TSO.
>
>> But I am not sure whether other archs guarantee this feature. If not, we do 
>> need a smp_mb here.
>> 
>
>I think most of the weak memory model don't have this gaurantee, so you need a 
>smp_mb() or use smp_store_release().

Agree.

>
>> >> > +   WRITE_ONCE(local->version, 
>> >> > +atomic64_read(>global_version));
>> >> > +}
>> >> > +
>> >> > +void synchronize_prcu(void)
>> >> > +{
>> >> > +   int cpu;
>> >> > +   cpumask_t cpus;
>> >> > +   unsigned long long version;
>> >> > +   struct prcu_local_struct *local;
>> >> > +
>> >> > +   version = atomic64_add_return(1, >global_version);
>> >> > +   mutex_lock(>mtx);
>> >> > +
>> >> > +   local = get_cpu_ptr(_local);
>> >> > +   local->version = version;
>> >> > +   put_cpu_ptr(_local);
>> >> > +
>> >> > +   cpumask_clear();
>> >> > +   for_each_possible_cpu(cpu) {
>> >> > +   local = per_cpu_ptr(_local, cpu);
>> >> > +   if (!READ_ONCE(local->online))
>> >> > +   continue;
>> >> > +   if (READ_ONCE(local->version) < version) {
>> >> 
>> >> On 32-bit systems, given that ->version is long long, you might see 
>> >> load tearing.  And on some 32-bit systems, the cmpxchg() in
>> >> prcu_hander() might not build.
>> >> 
>> >
>> >/me curious about why an atomic64_t is used here for global version. I 
>> >think maybe 32bit global version still suffices.
>> >
>> >Regards,
>> >Boqun
>> 
>> Because the synchronization latency is low, it can have higher gp frequency.
>> It seems that 32bit can only correctly work for several years if there are 
>> 20+ gps per second.
>> 
>
>Because PRCU doesn't handle gp number overflow? May I ask why this is 
>difficult? Currently RCU could tolerate counter wrap for grace period:
>
>   https://lwn.net/Articles/652677/ (Details in "Parallelism facts of 
> life")
>
>Is there any subtle difference I'm missing?
>
>Regards,
>Boqun
>

Yes, you are right. Currently prcu hasn't given a solution for overflow thus it 
needs a 64-bit counter.
Giving a solution is not that difficult. I just didn't consider it when I use 
64-bit counter.

Re: [PATCH RFC 01/16] prcu: Add PRCU implementation

2018-01-29 Thread Boqun Feng
On Tue, Jan 30, 2018 at 05:34:03AM +, zhangheng (AC) wrote:
[...]
> >> > +static void prcu_handler(void *info) {
> >> > +struct prcu_local_struct *local;
> >> > +
> >> > +local = this_cpu_ptr(_local);
> >> > +if (!local->locked)
> >
> >And I think a smp_mb() is needed here, because in the following case:
> >
> > CPU 0 CPU 1
> > ==  ==
> > {X is initially 0}
> >
> > WRITE_ONCE(X, 1);
> >
> >   prcu_read_unlock(void):
> >   if (locked) {
> >   synchronize_prcu(void):
> > ...
> > 
> > local->locked--;
> >   # switch to IPI
> > WRITE_ONCE(local->version,)
> >  > latest>
> >   
> >
> >   r1 = READ_ONCE(X);
> >
> >r1 could be 0, which breaks RCU guarantees.
> >
> 
> Thank you.
> As I know,
> it guarantees that the interrupt to be handled after all write instructions 
> issued before have complete in x86 arch.
> So the smp_mb is meaningless in x86 arch.

Sure. x86 is TSO, and we are talking about reordering of two stores
here, and that can not happen on TSO.

> But I am not sure whether other archs guarantee this feature. If not, we do 
> need a smp_mb here.
> 

I think most of the weak memory model don't have this gaurantee, so you
need a smp_mb() or use smp_store_release().

> >> > +WRITE_ONCE(local->version, 
> >> > atomic64_read(>global_version));
> >> > +}
> >> > +
> >> > +void synchronize_prcu(void)
> >> > +{
> >> > +int cpu;
> >> > +cpumask_t cpus;
> >> > +unsigned long long version;
> >> > +struct prcu_local_struct *local;
> >> > +
> >> > +version = atomic64_add_return(1, >global_version);
> >> > +mutex_lock(>mtx);
> >> > +
> >> > +local = get_cpu_ptr(_local);
> >> > +local->version = version;
> >> > +put_cpu_ptr(_local);
> >> > +
> >> > +cpumask_clear();
> >> > +for_each_possible_cpu(cpu) {
> >> > +local = per_cpu_ptr(_local, cpu);
> >> > +if (!READ_ONCE(local->online))
> >> > +continue;
> >> > +if (READ_ONCE(local->version) < version) {
> >> 
> >> On 32-bit systems, given that ->version is long long, you might see 
> >> load tearing.  And on some 32-bit systems, the cmpxchg() in 
> >> prcu_hander() might not build.
> >> 
> >
> >/me curious about why an atomic64_t is used here for global version. I think 
> >maybe 32bit global version still suffices.
> >
> >Regards,
> >Boqun
> 
> Because the synchronization latency is low, it can have higher gp frequency.
> It seems that 32bit can only correctly work for several years if there are 
> 20+ gps per second.
> 

Because PRCU doesn't handle gp number overflow? May I ask why this is
difficult? Currently RCU could tolerate counter wrap for grace period:

https://lwn.net/Articles/652677/ (Details in "Parallelism facts of 
life")

Is there any subtle difference I'm missing?

Regards,
Boqun

> >
> >> Or is the idea that only prcu_handler() updates ->version?  But in 
> >> that case, you wouldn't need the READ_ONCE() above.  What am I missing 
> >> here?
> >> 
> >> > +smp_call_function_single(cpu, prcu_handler, 
> >> > NULL, 0);
> >> > +cpumask_set_cpu(cpu, );
> >> > +}
> >> > +}
> >> > +
> >> > +for_each_cpu(cpu, ) {
> >> > +local = per_cpu_ptr(_local, cpu);
> >> > +while (READ_ONCE(local->version) < version)
> >> 
> >> This ->version read can also tear on some 32-bit systems, and this one 
> >> most definitely can race with the prcu_handler() above.  Does the 
> >> algorithm operate correctly in that case?  (It doesn't look that way 
> >> to me, but I might be missing something.) Or are 32-bit systems excluded?
> >> 
> >> > +cpu_relax();
> >> > +}
> >> 
> >> I might be missing something, but I believe we need a memory barrier 
> >> here on non-TSO systems.  Without that, couldn't we miss a preemption?
> >> 
> >> > +
> >> > +if (atomic_read(>active_ctr))
> >> > +wait_event(prcu->wait_q, 
> >> > !atomic_read(>active_ctr));
> >> > +
> >> > +mutex_unlock(>mtx);
> >> > +}
> >> > +EXPORT_SYMBOL(synchronize_prcu);
> >> > +
> >> > +void prcu_note_context_switch(void) {
> >> > +struct prcu_local_struct *local;
> >> > +
> >> > +local = get_cpu_ptr(_local);
> >> > +if (local->locked) {
> >> > +atomic_add(local->locked, >active_ctr);
> >> > +local->locked = 0;
> >> > + 

Re: [PATCH RFC 01/16] prcu: Add PRCU implementation

2018-01-29 Thread Boqun Feng
On Tue, Jan 30, 2018 at 05:34:03AM +, zhangheng (AC) wrote:
[...]
> >> > +static void prcu_handler(void *info) {
> >> > +struct prcu_local_struct *local;
> >> > +
> >> > +local = this_cpu_ptr(_local);
> >> > +if (!local->locked)
> >
> >And I think a smp_mb() is needed here, because in the following case:
> >
> > CPU 0 CPU 1
> > ==  ==
> > {X is initially 0}
> >
> > WRITE_ONCE(X, 1);
> >
> >   prcu_read_unlock(void):
> >   if (locked) {
> >   synchronize_prcu(void):
> > ...
> > 
> > local->locked--;
> >   # switch to IPI
> > WRITE_ONCE(local->version,)
> >  > latest>
> >   
> >
> >   r1 = READ_ONCE(X);
> >
> >r1 could be 0, which breaks RCU guarantees.
> >
> 
> Thank you.
> As I know,
> it guarantees that the interrupt to be handled after all write instructions 
> issued before have complete in x86 arch.
> So the smp_mb is meaningless in x86 arch.

Sure. x86 is TSO, and we are talking about reordering of two stores
here, and that can not happen on TSO.

> But I am not sure whether other archs guarantee this feature. If not, we do 
> need a smp_mb here.
> 

I think most of the weak memory model don't have this gaurantee, so you
need a smp_mb() or use smp_store_release().

> >> > +WRITE_ONCE(local->version, 
> >> > atomic64_read(>global_version));
> >> > +}
> >> > +
> >> > +void synchronize_prcu(void)
> >> > +{
> >> > +int cpu;
> >> > +cpumask_t cpus;
> >> > +unsigned long long version;
> >> > +struct prcu_local_struct *local;
> >> > +
> >> > +version = atomic64_add_return(1, >global_version);
> >> > +mutex_lock(>mtx);
> >> > +
> >> > +local = get_cpu_ptr(_local);
> >> > +local->version = version;
> >> > +put_cpu_ptr(_local);
> >> > +
> >> > +cpumask_clear();
> >> > +for_each_possible_cpu(cpu) {
> >> > +local = per_cpu_ptr(_local, cpu);
> >> > +if (!READ_ONCE(local->online))
> >> > +continue;
> >> > +if (READ_ONCE(local->version) < version) {
> >> 
> >> On 32-bit systems, given that ->version is long long, you might see 
> >> load tearing.  And on some 32-bit systems, the cmpxchg() in 
> >> prcu_hander() might not build.
> >> 
> >
> >/me curious about why an atomic64_t is used here for global version. I think 
> >maybe 32bit global version still suffices.
> >
> >Regards,
> >Boqun
> 
> Because the synchronization latency is low, it can have higher gp frequency.
> It seems that 32bit can only correctly work for several years if there are 
> 20+ gps per second.
> 

Because PRCU doesn't handle gp number overflow? May I ask why this is
difficult? Currently RCU could tolerate counter wrap for grace period:

https://lwn.net/Articles/652677/ (Details in "Parallelism facts of 
life")

Is there any subtle difference I'm missing?

Regards,
Boqun

> >
> >> Or is the idea that only prcu_handler() updates ->version?  But in 
> >> that case, you wouldn't need the READ_ONCE() above.  What am I missing 
> >> here?
> >> 
> >> > +smp_call_function_single(cpu, prcu_handler, 
> >> > NULL, 0);
> >> > +cpumask_set_cpu(cpu, );
> >> > +}
> >> > +}
> >> > +
> >> > +for_each_cpu(cpu, ) {
> >> > +local = per_cpu_ptr(_local, cpu);
> >> > +while (READ_ONCE(local->version) < version)
> >> 
> >> This ->version read can also tear on some 32-bit systems, and this one 
> >> most definitely can race with the prcu_handler() above.  Does the 
> >> algorithm operate correctly in that case?  (It doesn't look that way 
> >> to me, but I might be missing something.) Or are 32-bit systems excluded?
> >> 
> >> > +cpu_relax();
> >> > +}
> >> 
> >> I might be missing something, but I believe we need a memory barrier 
> >> here on non-TSO systems.  Without that, couldn't we miss a preemption?
> >> 
> >> > +
> >> > +if (atomic_read(>active_ctr))
> >> > +wait_event(prcu->wait_q, 
> >> > !atomic_read(>active_ctr));
> >> > +
> >> > +mutex_unlock(>mtx);
> >> > +}
> >> > +EXPORT_SYMBOL(synchronize_prcu);
> >> > +
> >> > +void prcu_note_context_switch(void) {
> >> > +struct prcu_local_struct *local;
> >> > +
> >> > +local = get_cpu_ptr(_local);
> >> > +if (local->locked) {
> >> > +atomic_add(local->locked, >active_ctr);
> >> > +local->locked = 0;
> >> > + 

RE: [PATCH RFC 01/16] prcu: Add PRCU implementation

2018-01-29 Thread zhangheng (AC)
>-Original Message-
>From: jiangshan...@gmail.com [mailto:jiangshan...@gmail.com] On Behalf Of Lai 
>Jiangshan
>Sent: 2018年1月29日 17:11
>To: liangli...@huawei.com
>Cc: Paul E. McKenney <paul...@linux.vnet.ibm.com>; Guohanjun (Hanjun Guo) 
><guohan...@huawei.com>; zhangheng (AC) <hen...@huawei.com>; Chenhaibo (Haibo, 
>OS Lab) <hb.c...@huawei.com>; lihao.li...@gmail.com; LKML 
><linux-kernel@vger.kernel.org>
>Subject: Re: [PATCH RFC 01/16] prcu: Add PRCU implementation
>
>On Tue, Jan 23, 2018 at 3:59 PM,  <liangli...@huawei.com> wrote:
>> From: Heng Zhang <hen...@huawei.com>
>>
>> This RCU implementation (PRCU) is based on a fast consensus protocol 
>> published in the following paper:
>>
>> Fast Consensus Using Bounded Staleness for Scalable Read-mostly 
>> Synchronization.
>> Haibo Chen, Heng Zhang, Ran Liu, Binyu Zang, and Haibing Guan.
>> IEEE Transactions on Parallel and Distributed Systems (TPDS), 2016.
>> https://dl.acm.org/citation.cfm?id=3024114.3024143
>>
>> Signed-off-by: Heng Zhang <hen...@huawei.com>
>> Signed-off-by: Lihao Liang <liangli...@huawei.com>
>> ---
>>  include/linux/prcu.h |  37 +++
>>  kernel/rcu/Makefile  |   2 +-
>>  kernel/rcu/prcu.c| 125 
>> +++
>>  kernel/sched/core.c  |   2 +
>>  4 files changed, 165 insertions(+), 1 deletion(-)  create mode 100644 
>> include/linux/prcu.h  create mode 100644 kernel/rcu/prcu.c
>>
>> diff --git a/include/linux/prcu.h b/include/linux/prcu.h new file mode 
>> 100644 index ..653b4633
>> --- /dev/null
>> +++ b/include/linux/prcu.h
>> @@ -0,0 +1,37 @@
>> +#ifndef __LINUX_PRCU_H
>> +#define __LINUX_PRCU_H
>> +
>> +#include 
>> +#include 
>> +#include 
>> +
>> +#define CONFIG_PRCU
>> +
>> +struct prcu_local_struct {
>> +   unsigned int locked;
>> +   unsigned int online;
>> +   unsigned long long version;
>> +};
>> +
>> +struct prcu_struct {
>> +   atomic64_t global_version;
>> +   atomic_t active_ctr;
>> +   struct mutex mtx;
>> +   wait_queue_head_t wait_q;
>> +};
>> +
>> +#ifdef CONFIG_PRCU
>> +void prcu_read_lock(void);
>> +void prcu_read_unlock(void);
>> +void synchronize_prcu(void);
>> +void prcu_note_context_switch(void);
>> +
>> +#else /* #ifdef CONFIG_PRCU */
>> +
>> +#define prcu_read_lock() do {} while (0) #define prcu_read_unlock() 
>> +do {} while (0) #define synchronize_prcu() do {} while (0) #define 
>> +prcu_note_context_switch() do {} while (0)
>> +
>> +#endif /* #ifdef CONFIG_PRCU */
>> +#endif /* __LINUX_PRCU_H */
>> diff --git a/kernel/rcu/Makefile b/kernel/rcu/Makefile index 
>> 23803c7d..8791419c 100644
>> --- a/kernel/rcu/Makefile
>> +++ b/kernel/rcu/Makefile
>> @@ -2,7 +2,7 @@
>>  # and is generally not a function of system call inputs.
>>  KCOV_INSTRUMENT := n
>>
>> -obj-y += update.o sync.o
>> +obj-y += update.o sync.o prcu.o
>>  obj-$(CONFIG_CLASSIC_SRCU) += srcu.o
>>  obj-$(CONFIG_TREE_SRCU) += srcutree.o
>>  obj-$(CONFIG_TINY_SRCU) += srcutiny.o diff --git a/kernel/rcu/prcu.c 
>> b/kernel/rcu/prcu.c new file mode 100644 index ..a00b9420
>> --- /dev/null
>> +++ b/kernel/rcu/prcu.c
>> @@ -0,0 +1,125 @@
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +
>> +#include 
>> +
>> +DEFINE_PER_CPU_SHARED_ALIGNED(struct prcu_local_struct, prcu_local);
>> +
>> +struct prcu_struct global_prcu = {
>> +   .global_version = ATOMIC64_INIT(0),
>> +   .active_ctr = ATOMIC_INIT(0),
>> +   .mtx = __MUTEX_INITIALIZER(global_prcu.mtx),
>> +   .wait_q = __WAIT_QUEUE_HEAD_INITIALIZER(global_prcu.wait_q)
>> +};
>> +struct prcu_struct *prcu = _prcu;
>> +
>> +static inline void prcu_report(struct prcu_local_struct *local) {
>> +   unsigned long long global_version;
>> +   unsigned long long local_version;
>> +
>> +   global_version = atomic64_read(>global_version);
>> +   local_version = local->version;
>> +   if (global_version > local_version)
>> +   cmpxchg(>version, local_version, 
>> + global_version);
>
>It is called with irq-disabled, and local->version can't be modified on other 
>cpu. why cmpxchg is needed?

No, it will also be called by prcu_read_unlock in this implementation.

>> +}
>

RE: [PATCH RFC 01/16] prcu: Add PRCU implementation

2018-01-29 Thread zhangheng (AC)
>-Original Message-
>From: jiangshan...@gmail.com [mailto:jiangshan...@gmail.com] On Behalf Of Lai 
>Jiangshan
>Sent: 2018年1月29日 17:11
>To: liangli...@huawei.com
>Cc: Paul E. McKenney ; Guohanjun (Hanjun Guo) 
>; zhangheng (AC) ; Chenhaibo (Haibo, 
>OS Lab) ; lihao.li...@gmail.com; LKML 
>
>Subject: Re: [PATCH RFC 01/16] prcu: Add PRCU implementation
>
>On Tue, Jan 23, 2018 at 3:59 PM,   wrote:
>> From: Heng Zhang 
>>
>> This RCU implementation (PRCU) is based on a fast consensus protocol 
>> published in the following paper:
>>
>> Fast Consensus Using Bounded Staleness for Scalable Read-mostly 
>> Synchronization.
>> Haibo Chen, Heng Zhang, Ran Liu, Binyu Zang, and Haibing Guan.
>> IEEE Transactions on Parallel and Distributed Systems (TPDS), 2016.
>> https://dl.acm.org/citation.cfm?id=3024114.3024143
>>
>> Signed-off-by: Heng Zhang 
>> Signed-off-by: Lihao Liang 
>> ---
>>  include/linux/prcu.h |  37 +++
>>  kernel/rcu/Makefile  |   2 +-
>>  kernel/rcu/prcu.c| 125 
>> +++
>>  kernel/sched/core.c  |   2 +
>>  4 files changed, 165 insertions(+), 1 deletion(-)  create mode 100644 
>> include/linux/prcu.h  create mode 100644 kernel/rcu/prcu.c
>>
>> diff --git a/include/linux/prcu.h b/include/linux/prcu.h new file mode 
>> 100644 index ..653b4633
>> --- /dev/null
>> +++ b/include/linux/prcu.h
>> @@ -0,0 +1,37 @@
>> +#ifndef __LINUX_PRCU_H
>> +#define __LINUX_PRCU_H
>> +
>> +#include 
>> +#include 
>> +#include 
>> +
>> +#define CONFIG_PRCU
>> +
>> +struct prcu_local_struct {
>> +   unsigned int locked;
>> +   unsigned int online;
>> +   unsigned long long version;
>> +};
>> +
>> +struct prcu_struct {
>> +   atomic64_t global_version;
>> +   atomic_t active_ctr;
>> +   struct mutex mtx;
>> +   wait_queue_head_t wait_q;
>> +};
>> +
>> +#ifdef CONFIG_PRCU
>> +void prcu_read_lock(void);
>> +void prcu_read_unlock(void);
>> +void synchronize_prcu(void);
>> +void prcu_note_context_switch(void);
>> +
>> +#else /* #ifdef CONFIG_PRCU */
>> +
>> +#define prcu_read_lock() do {} while (0) #define prcu_read_unlock() 
>> +do {} while (0) #define synchronize_prcu() do {} while (0) #define 
>> +prcu_note_context_switch() do {} while (0)
>> +
>> +#endif /* #ifdef CONFIG_PRCU */
>> +#endif /* __LINUX_PRCU_H */
>> diff --git a/kernel/rcu/Makefile b/kernel/rcu/Makefile index 
>> 23803c7d..8791419c 100644
>> --- a/kernel/rcu/Makefile
>> +++ b/kernel/rcu/Makefile
>> @@ -2,7 +2,7 @@
>>  # and is generally not a function of system call inputs.
>>  KCOV_INSTRUMENT := n
>>
>> -obj-y += update.o sync.o
>> +obj-y += update.o sync.o prcu.o
>>  obj-$(CONFIG_CLASSIC_SRCU) += srcu.o
>>  obj-$(CONFIG_TREE_SRCU) += srcutree.o
>>  obj-$(CONFIG_TINY_SRCU) += srcutiny.o diff --git a/kernel/rcu/prcu.c 
>> b/kernel/rcu/prcu.c new file mode 100644 index ..a00b9420
>> --- /dev/null
>> +++ b/kernel/rcu/prcu.c
>> @@ -0,0 +1,125 @@
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +
>> +#include 
>> +
>> +DEFINE_PER_CPU_SHARED_ALIGNED(struct prcu_local_struct, prcu_local);
>> +
>> +struct prcu_struct global_prcu = {
>> +   .global_version = ATOMIC64_INIT(0),
>> +   .active_ctr = ATOMIC_INIT(0),
>> +   .mtx = __MUTEX_INITIALIZER(global_prcu.mtx),
>> +   .wait_q = __WAIT_QUEUE_HEAD_INITIALIZER(global_prcu.wait_q)
>> +};
>> +struct prcu_struct *prcu = _prcu;
>> +
>> +static inline void prcu_report(struct prcu_local_struct *local) {
>> +   unsigned long long global_version;
>> +   unsigned long long local_version;
>> +
>> +   global_version = atomic64_read(>global_version);
>> +   local_version = local->version;
>> +   if (global_version > local_version)
>> +   cmpxchg(>version, local_version, 
>> + global_version);
>
>It is called with irq-disabled, and local->version can't be modified on other 
>cpu. why cmpxchg is needed?

No, it will also be called by prcu_read_unlock in this implementation.

>> +}
>> +
>> +void prcu_read_lock(void)
>> +{
>> +   struct prcu_local_struct *local;
>> +
>> +   local = get_cpu_ptr(_local);
>> +   if (!local->online) {
>> +   WRITE_ONCE(local->online

RE: [PATCH RFC 01/16] prcu: Add PRCU implementation

2018-01-29 Thread zhangheng (AC)
-Original Message-
>From: Boqun Feng [mailto:boqun.f...@gmail.com] 
>Sent: 2018年1月25日 15:31
>To: Paul E. McKenney <paul...@linux.vnet.ibm.com>
>Cc: liangli...@huawei.com; Guohanjun (Hanjun Guo) <guohan...@huawei.com>; 
>zhangheng (AC) <hen...@huawei.com>; Chenhaibo (Haibo, OS Lab) 
><hb.c...@huawei.com>; lihao.li...@gmail.com; linux-kernel@vger.kernel.org
>Subject: Re: [PATCH RFC 01/16] prcu: Add PRCU implementation
>
>On Wed, Jan 24, 2018 at 10:16:18PM -0800, Paul E. McKenney wrote:
>> On Tue, Jan 23, 2018 at 03:59:26PM +0800, liangli...@huawei.com wrote:
>> > From: Heng Zhang <hen...@huawei.com>
>> > 
>> > This RCU implementation (PRCU) is based on a fast consensus protocol 
>> > published in the following paper:
>> > 
>> > Fast Consensus Using Bounded Staleness for Scalable Read-mostly 
>> > Synchronization.
>> > Haibo Chen, Heng Zhang, Ran Liu, Binyu Zang, and Haibing Guan.
>> > IEEE Transactions on Parallel and Distributed Systems (TPDS), 2016.
>> > https://dl.acm.org/citation.cfm?id=3024114.3024143
>> > 
>> > Signed-off-by: Heng Zhang <hen...@huawei.com>
>> > Signed-off-by: Lihao Liang <liangli...@huawei.com>
>> 
>> A few comments and questions interspersed.
>> 
>>  Thanx, Paul
>> 
>> > ---
>> >  include/linux/prcu.h |  37 +++
>> >  kernel/rcu/Makefile  |   2 +-
>> >  kernel/rcu/prcu.c| 125 
>> > +++
>> >  kernel/sched/core.c  |   2 +
>> >  4 files changed, 165 insertions(+), 1 deletion(-)  create mode 
>> > 100644 include/linux/prcu.h  create mode 100644 kernel/rcu/prcu.c
>> > 
>> > diff --git a/include/linux/prcu.h b/include/linux/prcu.h new file 
>> > mode 100644 index ..653b4633
>> > --- /dev/null
>> > +++ b/include/linux/prcu.h
>> > @@ -0,0 +1,37 @@
>> > +#ifndef __LINUX_PRCU_H
>> > +#define __LINUX_PRCU_H
>> > +
>> > +#include 
>> > +#include 
>> > +#include 
>> > +
>> > +#define CONFIG_PRCU
>> > +
>> > +struct prcu_local_struct {
>> > +  unsigned int locked;
>> > +  unsigned int online;
>> > +  unsigned long long version;
>> > +};
>> > +
>> > +struct prcu_struct {
>> > +  atomic64_t global_version;
>> > +  atomic_t active_ctr;
>> > +  struct mutex mtx;
>> > +  wait_queue_head_t wait_q;
>> > +};
>> > +
>> > +#ifdef CONFIG_PRCU
>> > +void prcu_read_lock(void);
>> > +void prcu_read_unlock(void);
>> > +void synchronize_prcu(void);
>> > +void prcu_note_context_switch(void);
>> > +
>> > +#else /* #ifdef CONFIG_PRCU */
>> > +
>> > +#define prcu_read_lock() do {} while (0) #define prcu_read_unlock() 
>> > +do {} while (0) #define synchronize_prcu() do {} while (0) #define 
>> > +prcu_note_context_switch() do {} while (0)
>> 
>> If CONFIG_PRCU=n and some code is built that uses PRCU, shouldn't you 
>> get a build error rather than an error-free but inoperative PRCU?
>> 
>> Of course, Peter's question about purpose of the patch set applies 
>> here as well.
>> 
>> > +
>> > +#endif /* #ifdef CONFIG_PRCU */
>> > +#endif /* __LINUX_PRCU_H */
>> > diff --git a/kernel/rcu/Makefile b/kernel/rcu/Makefile index 
>> > 23803c7d..8791419c 100644
>> > --- a/kernel/rcu/Makefile
>> > +++ b/kernel/rcu/Makefile
>> > @@ -2,7 +2,7 @@
>> >  # and is generally not a function of system call inputs.
>> >  KCOV_INSTRUMENT := n
>> > 
>> > -obj-y += update.o sync.o
>> > +obj-y += update.o sync.o prcu.o
>> >  obj-$(CONFIG_CLASSIC_SRCU) += srcu.o
>> >  obj-$(CONFIG_TREE_SRCU) += srcutree.o
>> >  obj-$(CONFIG_TINY_SRCU) += srcutiny.o diff --git 
>> > a/kernel/rcu/prcu.c b/kernel/rcu/prcu.c new file mode 100644 index 
>> > ..a00b9420
>> > --- /dev/null
>> > +++ b/kernel/rcu/prcu.c
>> > @@ -0,0 +1,125 @@
>> > +#include 
>> > +#include 
>> > +#include 
>> > +#include 
>> > +#include 
>> > +
>> > +#include 
>> > +
>> > +DEFINE_PER_CPU_SHARED_ALIGNED(struct prcu_local_struct, 
>> > +prcu_local);
>> > +
>> > +struct prcu_struct global_prcu = {
>> > +  .global_version = ATOMIC64_INIT(0),
>> > +  .active_ctr

RE: [PATCH RFC 01/16] prcu: Add PRCU implementation

2018-01-29 Thread zhangheng (AC)
-Original Message-
>From: Boqun Feng [mailto:boqun.f...@gmail.com] 
>Sent: 2018年1月25日 15:31
>To: Paul E. McKenney 
>Cc: liangli...@huawei.com; Guohanjun (Hanjun Guo) ; 
>zhangheng (AC) ; Chenhaibo (Haibo, OS Lab) 
>; lihao.li...@gmail.com; linux-kernel@vger.kernel.org
>Subject: Re: [PATCH RFC 01/16] prcu: Add PRCU implementation
>
>On Wed, Jan 24, 2018 at 10:16:18PM -0800, Paul E. McKenney wrote:
>> On Tue, Jan 23, 2018 at 03:59:26PM +0800, liangli...@huawei.com wrote:
>> > From: Heng Zhang 
>> > 
>> > This RCU implementation (PRCU) is based on a fast consensus protocol 
>> > published in the following paper:
>> > 
>> > Fast Consensus Using Bounded Staleness for Scalable Read-mostly 
>> > Synchronization.
>> > Haibo Chen, Heng Zhang, Ran Liu, Binyu Zang, and Haibing Guan.
>> > IEEE Transactions on Parallel and Distributed Systems (TPDS), 2016.
>> > https://dl.acm.org/citation.cfm?id=3024114.3024143
>> > 
>> > Signed-off-by: Heng Zhang 
>> > Signed-off-by: Lihao Liang 
>> 
>> A few comments and questions interspersed.
>> 
>>  Thanx, Paul
>> 
>> > ---
>> >  include/linux/prcu.h |  37 +++
>> >  kernel/rcu/Makefile  |   2 +-
>> >  kernel/rcu/prcu.c| 125 
>> > +++
>> >  kernel/sched/core.c  |   2 +
>> >  4 files changed, 165 insertions(+), 1 deletion(-)  create mode 
>> > 100644 include/linux/prcu.h  create mode 100644 kernel/rcu/prcu.c
>> > 
>> > diff --git a/include/linux/prcu.h b/include/linux/prcu.h new file 
>> > mode 100644 index ..653b4633
>> > --- /dev/null
>> > +++ b/include/linux/prcu.h
>> > @@ -0,0 +1,37 @@
>> > +#ifndef __LINUX_PRCU_H
>> > +#define __LINUX_PRCU_H
>> > +
>> > +#include 
>> > +#include 
>> > +#include 
>> > +
>> > +#define CONFIG_PRCU
>> > +
>> > +struct prcu_local_struct {
>> > +  unsigned int locked;
>> > +  unsigned int online;
>> > +  unsigned long long version;
>> > +};
>> > +
>> > +struct prcu_struct {
>> > +  atomic64_t global_version;
>> > +  atomic_t active_ctr;
>> > +  struct mutex mtx;
>> > +  wait_queue_head_t wait_q;
>> > +};
>> > +
>> > +#ifdef CONFIG_PRCU
>> > +void prcu_read_lock(void);
>> > +void prcu_read_unlock(void);
>> > +void synchronize_prcu(void);
>> > +void prcu_note_context_switch(void);
>> > +
>> > +#else /* #ifdef CONFIG_PRCU */
>> > +
>> > +#define prcu_read_lock() do {} while (0) #define prcu_read_unlock() 
>> > +do {} while (0) #define synchronize_prcu() do {} while (0) #define 
>> > +prcu_note_context_switch() do {} while (0)
>> 
>> If CONFIG_PRCU=n and some code is built that uses PRCU, shouldn't you 
>> get a build error rather than an error-free but inoperative PRCU?
>> 
>> Of course, Peter's question about purpose of the patch set applies 
>> here as well.
>> 
>> > +
>> > +#endif /* #ifdef CONFIG_PRCU */
>> > +#endif /* __LINUX_PRCU_H */
>> > diff --git a/kernel/rcu/Makefile b/kernel/rcu/Makefile index 
>> > 23803c7d..8791419c 100644
>> > --- a/kernel/rcu/Makefile
>> > +++ b/kernel/rcu/Makefile
>> > @@ -2,7 +2,7 @@
>> >  # and is generally not a function of system call inputs.
>> >  KCOV_INSTRUMENT := n
>> > 
>> > -obj-y += update.o sync.o
>> > +obj-y += update.o sync.o prcu.o
>> >  obj-$(CONFIG_CLASSIC_SRCU) += srcu.o
>> >  obj-$(CONFIG_TREE_SRCU) += srcutree.o
>> >  obj-$(CONFIG_TINY_SRCU) += srcutiny.o diff --git 
>> > a/kernel/rcu/prcu.c b/kernel/rcu/prcu.c new file mode 100644 index 
>> > ..a00b9420
>> > --- /dev/null
>> > +++ b/kernel/rcu/prcu.c
>> > @@ -0,0 +1,125 @@
>> > +#include 
>> > +#include 
>> > +#include 
>> > +#include 
>> > +#include 
>> > +
>> > +#include 
>> > +
>> > +DEFINE_PER_CPU_SHARED_ALIGNED(struct prcu_local_struct, 
>> > +prcu_local);
>> > +
>> > +struct prcu_struct global_prcu = {
>> > +  .global_version = ATOMIC64_INIT(0),
>> > +  .active_ctr = ATOMIC_INIT(0),
>> > +  .mtx = __MUTEX_INITIALIZER(global_prcu.mtx),
>> > +  .wait_q = __WAIT_QUEUE_HEAD_INITIALIZER(global_prcu.wait_q)
>> > +};
>&g

RE: [PATCH RFC 01/16] prcu: Add PRCU implementation

2018-01-29 Thread zhangheng (AC)
Very sorry for late response, I found this email has been blocked after Lihao 
mentioned this morning.

-Original Message-
From: zhangheng (AC) 
Sent: 2018年1月26日 18:30
To: 'paul...@linux.vnet.ibm.com' <paul...@linux.vnet.ibm.com>; 
liangli...@huawei.com
Cc: Guohanjun (Hanjun Guo) <guohan...@huawei.com>; Chenhaibo (Haibo, OS Lab) 
<hb.c...@huawei.com>; lihao.li...@gmail.com; linux-kernel@vger.kernel.org
Subject: RE: [PATCH RFC 01/16] prcu: Add PRCU implementation

Hi Paul, thanks a lot for pointing out the problems of the implementation. 
Here's my understanding.

Best Regards,
Heng

-Original Message-
>From: Paul E. McKenney [mailto:paul...@linux.vnet.ibm.com]
>Sent: 2018年1月25日 14:16
>To: liangli...@huawei.com
>Cc: Guohanjun (Hanjun Guo) <guohan...@huawei.com>; zhangheng (AC) 
><hen...@huawei.com>; Chenhaibo (Haibo, OS Lab) <hb.c...@huawei.com>; 
>lihao.li...@gmail.com; linux-kernel@vger.kernel.org
>Subject: Re: [PATCH RFC 01/16] prcu: Add PRCU implementation

>On Tue, Jan 23, 2018 at 03:59:26PM +0800, liangli...@huawei.com wrote:
>> From: Heng Zhang <hen...@huawei.com>
>> 
>> This RCU implementation (PRCU) is based on a fast consensus protocol 
>> published in the following paper:
>> 
>> Fast Consensus Using Bounded Staleness for Scalable Read-mostly 
>> Synchronization.
>> Haibo Chen, Heng Zhang, Ran Liu, Binyu Zang, and Haibing Guan.
>> IEEE Transactions on Parallel and Distributed Systems (TPDS), 2016.
>> https://dl.acm.org/citation.cfm?id=3024114.3024143
>> 
>> Signed-off-by: Heng Zhang <hen...@huawei.com>
>> Signed-off-by: Lihao Liang <liangli...@huawei.com>

>A few comments and questions interspersed.

>   Thanx, Paul

>> ---
>>  include/linux/prcu.h |  37 +++
>>  kernel/rcu/Makefile  |   2 +-
>>  kernel/rcu/prcu.c| 125 
>> +++
>>  kernel/sched/core.c  |   2 +
>>  4 files changed, 165 insertions(+), 1 deletion(-)  create mode 
>> 100644 include/linux/prcu.h  create mode 100644 kernel/rcu/prcu.c
>> 
>> diff --git a/include/linux/prcu.h b/include/linux/prcu.h new file 
>> mode
>> 100644 index ..653b4633
>> --- /dev/null
>> +++ b/include/linux/prcu.h
>> @@ -0,0 +1,37 @@
>> +#ifndef __LINUX_PRCU_H
>> +#define __LINUX_PRCU_H
>> +
>> +#include 
>> +#include 
>> +#include 
>> +
>> +#define CONFIG_PRCU
>> +
>> +struct prcu_local_struct {
>> +unsigned int locked;
>> +unsigned int online;
>> +unsigned long long version;
>> +};
>> +
>> +struct prcu_struct {
>> +atomic64_t global_version;
>> +atomic_t active_ctr;
>> +struct mutex mtx;
>> +wait_queue_head_t wait_q;
>> +};
>> +
>> +#ifdef CONFIG_PRCU
>> +void prcu_read_lock(void);
>> +void prcu_read_unlock(void);
>> +void synchronize_prcu(void);
>> +void prcu_note_context_switch(void);
>> +
>> +#else /* #ifdef CONFIG_PRCU */
>> +
>> +#define prcu_read_lock() do {} while (0) #define prcu_read_unlock() 
>> +do {} while (0) #define synchronize_prcu() do {} while (0) #define
>> +prcu_note_context_switch() do {} while (0)

>If CONFIG_PRCU=n and some code is built that uses PRCU, shouldn't you get a 
>build error rather than an error-free but inoperative PRCU?

>Of course, Peter's question about purpose of the patch set applies here as 
>well.

Yes, we should handle this case more carefully.
And in my personal opinion, prcu is designed for some modules that have a few 
threads and require low synchronization latency (because it just sends IPIs to 
online readers). I think if a module that uses synchronize_rcu_expedited, it 
may need prcu.

>> +
>> +#endif /* #ifdef CONFIG_PRCU */
>> +#endif /* __LINUX_PRCU_H */
>> diff --git a/kernel/rcu/Makefile b/kernel/rcu/Makefile index 
>> 23803c7d..8791419c 100644
>> --- a/kernel/rcu/Makefile
>> +++ b/kernel/rcu/Makefile
>> @@ -2,7 +2,7 @@
>>  # and is generally not a function of system call inputs.
>>  KCOV_INSTRUMENT := n
>> 
>> -obj-y += update.o sync.o
>> +obj-y += update.o sync.o prcu.o
>>  obj-$(CONFIG_CLASSIC_SRCU) += srcu.o
>>  obj-$(CONFIG_TREE_SRCU) += srcutree.o
>>  obj-$(CONFIG_TINY_SRCU) += srcutiny.o diff --git a/kernel/rcu/prcu.c 
>> b/kernel/rcu/prcu.c new file mode 100644 index ..a00b9420
>> --- /dev/null
>> +++ b/kernel/rcu/prcu.c
>> @@ -0,0 +1,125 @@
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#inclu

RE: [PATCH RFC 01/16] prcu: Add PRCU implementation

2018-01-29 Thread zhangheng (AC)
Very sorry for late response, I found this email has been blocked after Lihao 
mentioned this morning.

-Original Message-
From: zhangheng (AC) 
Sent: 2018年1月26日 18:30
To: 'paul...@linux.vnet.ibm.com' ; 
liangli...@huawei.com
Cc: Guohanjun (Hanjun Guo) ; Chenhaibo (Haibo, OS Lab) 
; lihao.li...@gmail.com; linux-kernel@vger.kernel.org
Subject: RE: [PATCH RFC 01/16] prcu: Add PRCU implementation

Hi Paul, thanks a lot for pointing out the problems of the implementation. 
Here's my understanding.

Best Regards,
Heng

-Original Message-
>From: Paul E. McKenney [mailto:paul...@linux.vnet.ibm.com]
>Sent: 2018年1月25日 14:16
>To: liangli...@huawei.com
>Cc: Guohanjun (Hanjun Guo) ; zhangheng (AC) 
>; Chenhaibo (Haibo, OS Lab) ; 
>lihao.li...@gmail.com; linux-kernel@vger.kernel.org
>Subject: Re: [PATCH RFC 01/16] prcu: Add PRCU implementation

>On Tue, Jan 23, 2018 at 03:59:26PM +0800, liangli...@huawei.com wrote:
>> From: Heng Zhang 
>> 
>> This RCU implementation (PRCU) is based on a fast consensus protocol 
>> published in the following paper:
>> 
>> Fast Consensus Using Bounded Staleness for Scalable Read-mostly 
>> Synchronization.
>> Haibo Chen, Heng Zhang, Ran Liu, Binyu Zang, and Haibing Guan.
>> IEEE Transactions on Parallel and Distributed Systems (TPDS), 2016.
>> https://dl.acm.org/citation.cfm?id=3024114.3024143
>> 
>> Signed-off-by: Heng Zhang 
>> Signed-off-by: Lihao Liang 

>A few comments and questions interspersed.

>   Thanx, Paul

>> ---
>>  include/linux/prcu.h |  37 +++
>>  kernel/rcu/Makefile  |   2 +-
>>  kernel/rcu/prcu.c| 125 
>> +++
>>  kernel/sched/core.c  |   2 +
>>  4 files changed, 165 insertions(+), 1 deletion(-)  create mode 
>> 100644 include/linux/prcu.h  create mode 100644 kernel/rcu/prcu.c
>> 
>> diff --git a/include/linux/prcu.h b/include/linux/prcu.h new file 
>> mode
>> 100644 index ..653b4633
>> --- /dev/null
>> +++ b/include/linux/prcu.h
>> @@ -0,0 +1,37 @@
>> +#ifndef __LINUX_PRCU_H
>> +#define __LINUX_PRCU_H
>> +
>> +#include 
>> +#include 
>> +#include 
>> +
>> +#define CONFIG_PRCU
>> +
>> +struct prcu_local_struct {
>> +unsigned int locked;
>> +unsigned int online;
>> +unsigned long long version;
>> +};
>> +
>> +struct prcu_struct {
>> +atomic64_t global_version;
>> +atomic_t active_ctr;
>> +struct mutex mtx;
>> +wait_queue_head_t wait_q;
>> +};
>> +
>> +#ifdef CONFIG_PRCU
>> +void prcu_read_lock(void);
>> +void prcu_read_unlock(void);
>> +void synchronize_prcu(void);
>> +void prcu_note_context_switch(void);
>> +
>> +#else /* #ifdef CONFIG_PRCU */
>> +
>> +#define prcu_read_lock() do {} while (0) #define prcu_read_unlock() 
>> +do {} while (0) #define synchronize_prcu() do {} while (0) #define
>> +prcu_note_context_switch() do {} while (0)

>If CONFIG_PRCU=n and some code is built that uses PRCU, shouldn't you get a 
>build error rather than an error-free but inoperative PRCU?

>Of course, Peter's question about purpose of the patch set applies here as 
>well.

Yes, we should handle this case more carefully.
And in my personal opinion, prcu is designed for some modules that have a few 
threads and require low synchronization latency (because it just sends IPIs to 
online readers). I think if a module that uses synchronize_rcu_expedited, it 
may need prcu.

>> +
>> +#endif /* #ifdef CONFIG_PRCU */
>> +#endif /* __LINUX_PRCU_H */
>> diff --git a/kernel/rcu/Makefile b/kernel/rcu/Makefile index 
>> 23803c7d..8791419c 100644
>> --- a/kernel/rcu/Makefile
>> +++ b/kernel/rcu/Makefile
>> @@ -2,7 +2,7 @@
>>  # and is generally not a function of system call inputs.
>>  KCOV_INSTRUMENT := n
>> 
>> -obj-y += update.o sync.o
>> +obj-y += update.o sync.o prcu.o
>>  obj-$(CONFIG_CLASSIC_SRCU) += srcu.o
>>  obj-$(CONFIG_TREE_SRCU) += srcutree.o
>>  obj-$(CONFIG_TINY_SRCU) += srcutiny.o diff --git a/kernel/rcu/prcu.c 
>> b/kernel/rcu/prcu.c new file mode 100644 index ..a00b9420
>> --- /dev/null
>> +++ b/kernel/rcu/prcu.c
>> @@ -0,0 +1,125 @@
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +
>> +#include 
>> +
>> +DEFINE_PER_CPU_SHARED_ALIGNED(struct prcu_local_struct, prcu_local);
>> +
>> +struct prcu_struct global_prcu = {
>> +.global_version = ATOMIC64_INIT(0),
>> +.active_ctr = ATOMI

Re: [PATCH RFC 01/16] prcu: Add PRCU implementation

2018-01-29 Thread Lai Jiangshan
On Tue, Jan 23, 2018 at 3:59 PM,   wrote:
> From: Heng Zhang 
>
> This RCU implementation (PRCU) is based on a fast consensus protocol
> published in the following paper:
>
> Fast Consensus Using Bounded Staleness for Scalable Read-mostly 
> Synchronization.
> Haibo Chen, Heng Zhang, Ran Liu, Binyu Zang, and Haibing Guan.
> IEEE Transactions on Parallel and Distributed Systems (TPDS), 2016.
> https://dl.acm.org/citation.cfm?id=3024114.3024143
>
> Signed-off-by: Heng Zhang 
> Signed-off-by: Lihao Liang 
> ---
>  include/linux/prcu.h |  37 +++
>  kernel/rcu/Makefile  |   2 +-
>  kernel/rcu/prcu.c| 125 
> +++
>  kernel/sched/core.c  |   2 +
>  4 files changed, 165 insertions(+), 1 deletion(-)
>  create mode 100644 include/linux/prcu.h
>  create mode 100644 kernel/rcu/prcu.c
>
> diff --git a/include/linux/prcu.h b/include/linux/prcu.h
> new file mode 100644
> index ..653b4633
> --- /dev/null
> +++ b/include/linux/prcu.h
> @@ -0,0 +1,37 @@
> +#ifndef __LINUX_PRCU_H
> +#define __LINUX_PRCU_H
> +
> +#include 
> +#include 
> +#include 
> +
> +#define CONFIG_PRCU
> +
> +struct prcu_local_struct {
> +   unsigned int locked;
> +   unsigned int online;
> +   unsigned long long version;
> +};
> +
> +struct prcu_struct {
> +   atomic64_t global_version;
> +   atomic_t active_ctr;
> +   struct mutex mtx;
> +   wait_queue_head_t wait_q;
> +};
> +
> +#ifdef CONFIG_PRCU
> +void prcu_read_lock(void);
> +void prcu_read_unlock(void);
> +void synchronize_prcu(void);
> +void prcu_note_context_switch(void);
> +
> +#else /* #ifdef CONFIG_PRCU */
> +
> +#define prcu_read_lock() do {} while (0)
> +#define prcu_read_unlock() do {} while (0)
> +#define synchronize_prcu() do {} while (0)
> +#define prcu_note_context_switch() do {} while (0)
> +
> +#endif /* #ifdef CONFIG_PRCU */
> +#endif /* __LINUX_PRCU_H */
> diff --git a/kernel/rcu/Makefile b/kernel/rcu/Makefile
> index 23803c7d..8791419c 100644
> --- a/kernel/rcu/Makefile
> +++ b/kernel/rcu/Makefile
> @@ -2,7 +2,7 @@
>  # and is generally not a function of system call inputs.
>  KCOV_INSTRUMENT := n
>
> -obj-y += update.o sync.o
> +obj-y += update.o sync.o prcu.o
>  obj-$(CONFIG_CLASSIC_SRCU) += srcu.o
>  obj-$(CONFIG_TREE_SRCU) += srcutree.o
>  obj-$(CONFIG_TINY_SRCU) += srcutiny.o
> diff --git a/kernel/rcu/prcu.c b/kernel/rcu/prcu.c
> new file mode 100644
> index ..a00b9420
> --- /dev/null
> +++ b/kernel/rcu/prcu.c
> @@ -0,0 +1,125 @@
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +
> +DEFINE_PER_CPU_SHARED_ALIGNED(struct prcu_local_struct, prcu_local);
> +
> +struct prcu_struct global_prcu = {
> +   .global_version = ATOMIC64_INIT(0),
> +   .active_ctr = ATOMIC_INIT(0),
> +   .mtx = __MUTEX_INITIALIZER(global_prcu.mtx),
> +   .wait_q = __WAIT_QUEUE_HEAD_INITIALIZER(global_prcu.wait_q)
> +};
> +struct prcu_struct *prcu = _prcu;
> +
> +static inline void prcu_report(struct prcu_local_struct *local)
> +{
> +   unsigned long long global_version;
> +   unsigned long long local_version;
> +
> +   global_version = atomic64_read(>global_version);
> +   local_version = local->version;
> +   if (global_version > local_version)
> +   cmpxchg(>version, local_version, global_version);

It is called with irq-disabled, and local->version can't be modified on
other cpu. why cmpxchg is needed?

> +}
> +
> +void prcu_read_lock(void)
> +{
> +   struct prcu_local_struct *local;
> +
> +   local = get_cpu_ptr(_local);
> +   if (!local->online) {
> +   WRITE_ONCE(local->online, 1);
> +   smp_mb();

What's is the paired code?

> +   }
> +
> +   local->locked++;
> +   put_cpu_ptr(_local);
> +}
> +EXPORT_SYMBOL(prcu_read_lock);
> +
> +void prcu_read_unlock(void)
> +{
> +   int locked;
> +   struct prcu_local_struct *local;
> +
> +   barrier();
> +   local = get_cpu_ptr(_local);
> +   locked = local->locked;
> +   if (locked) {
> +   local->locked--;
> +   if (locked == 1)
> +   prcu_report(local);
> +   put_cpu_ptr(_local);
> +   } else {
> +   put_cpu_ptr(_local);
> +   if (!atomic_dec_return(>active_ctr))
> +   wake_up(>wait_q);
> +   }
> +}
> +EXPORT_SYMBOL(prcu_read_unlock);
> +
> +static void prcu_handler(void *info)
> +{
> +   struct prcu_local_struct *local;
> +
> +   local = this_cpu_ptr(_local);
> +   if (!local->locked)
> +   WRITE_ONCE(local->version, 
> atomic64_read(>global_version));
> +}
> +
> +void synchronize_prcu(void)
> +{
> +   int cpu;
> +   cpumask_t cpus;

It might overflow the stack if the cpumask is large, please move it to
struct prcu.

> +   unsigned long long version;
> +   struct 

Re: [PATCH RFC 01/16] prcu: Add PRCU implementation

2018-01-29 Thread Lai Jiangshan
On Tue, Jan 23, 2018 at 3:59 PM,   wrote:
> From: Heng Zhang 
>
> This RCU implementation (PRCU) is based on a fast consensus protocol
> published in the following paper:
>
> Fast Consensus Using Bounded Staleness for Scalable Read-mostly 
> Synchronization.
> Haibo Chen, Heng Zhang, Ran Liu, Binyu Zang, and Haibing Guan.
> IEEE Transactions on Parallel and Distributed Systems (TPDS), 2016.
> https://dl.acm.org/citation.cfm?id=3024114.3024143
>
> Signed-off-by: Heng Zhang 
> Signed-off-by: Lihao Liang 
> ---
>  include/linux/prcu.h |  37 +++
>  kernel/rcu/Makefile  |   2 +-
>  kernel/rcu/prcu.c| 125 
> +++
>  kernel/sched/core.c  |   2 +
>  4 files changed, 165 insertions(+), 1 deletion(-)
>  create mode 100644 include/linux/prcu.h
>  create mode 100644 kernel/rcu/prcu.c
>
> diff --git a/include/linux/prcu.h b/include/linux/prcu.h
> new file mode 100644
> index ..653b4633
> --- /dev/null
> +++ b/include/linux/prcu.h
> @@ -0,0 +1,37 @@
> +#ifndef __LINUX_PRCU_H
> +#define __LINUX_PRCU_H
> +
> +#include 
> +#include 
> +#include 
> +
> +#define CONFIG_PRCU
> +
> +struct prcu_local_struct {
> +   unsigned int locked;
> +   unsigned int online;
> +   unsigned long long version;
> +};
> +
> +struct prcu_struct {
> +   atomic64_t global_version;
> +   atomic_t active_ctr;
> +   struct mutex mtx;
> +   wait_queue_head_t wait_q;
> +};
> +
> +#ifdef CONFIG_PRCU
> +void prcu_read_lock(void);
> +void prcu_read_unlock(void);
> +void synchronize_prcu(void);
> +void prcu_note_context_switch(void);
> +
> +#else /* #ifdef CONFIG_PRCU */
> +
> +#define prcu_read_lock() do {} while (0)
> +#define prcu_read_unlock() do {} while (0)
> +#define synchronize_prcu() do {} while (0)
> +#define prcu_note_context_switch() do {} while (0)
> +
> +#endif /* #ifdef CONFIG_PRCU */
> +#endif /* __LINUX_PRCU_H */
> diff --git a/kernel/rcu/Makefile b/kernel/rcu/Makefile
> index 23803c7d..8791419c 100644
> --- a/kernel/rcu/Makefile
> +++ b/kernel/rcu/Makefile
> @@ -2,7 +2,7 @@
>  # and is generally not a function of system call inputs.
>  KCOV_INSTRUMENT := n
>
> -obj-y += update.o sync.o
> +obj-y += update.o sync.o prcu.o
>  obj-$(CONFIG_CLASSIC_SRCU) += srcu.o
>  obj-$(CONFIG_TREE_SRCU) += srcutree.o
>  obj-$(CONFIG_TINY_SRCU) += srcutiny.o
> diff --git a/kernel/rcu/prcu.c b/kernel/rcu/prcu.c
> new file mode 100644
> index ..a00b9420
> --- /dev/null
> +++ b/kernel/rcu/prcu.c
> @@ -0,0 +1,125 @@
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +
> +DEFINE_PER_CPU_SHARED_ALIGNED(struct prcu_local_struct, prcu_local);
> +
> +struct prcu_struct global_prcu = {
> +   .global_version = ATOMIC64_INIT(0),
> +   .active_ctr = ATOMIC_INIT(0),
> +   .mtx = __MUTEX_INITIALIZER(global_prcu.mtx),
> +   .wait_q = __WAIT_QUEUE_HEAD_INITIALIZER(global_prcu.wait_q)
> +};
> +struct prcu_struct *prcu = _prcu;
> +
> +static inline void prcu_report(struct prcu_local_struct *local)
> +{
> +   unsigned long long global_version;
> +   unsigned long long local_version;
> +
> +   global_version = atomic64_read(>global_version);
> +   local_version = local->version;
> +   if (global_version > local_version)
> +   cmpxchg(>version, local_version, global_version);

It is called with irq-disabled, and local->version can't be modified on
other cpu. why cmpxchg is needed?

> +}
> +
> +void prcu_read_lock(void)
> +{
> +   struct prcu_local_struct *local;
> +
> +   local = get_cpu_ptr(_local);
> +   if (!local->online) {
> +   WRITE_ONCE(local->online, 1);
> +   smp_mb();

What's is the paired code?

> +   }
> +
> +   local->locked++;
> +   put_cpu_ptr(_local);
> +}
> +EXPORT_SYMBOL(prcu_read_lock);
> +
> +void prcu_read_unlock(void)
> +{
> +   int locked;
> +   struct prcu_local_struct *local;
> +
> +   barrier();
> +   local = get_cpu_ptr(_local);
> +   locked = local->locked;
> +   if (locked) {
> +   local->locked--;
> +   if (locked == 1)
> +   prcu_report(local);
> +   put_cpu_ptr(_local);
> +   } else {
> +   put_cpu_ptr(_local);
> +   if (!atomic_dec_return(>active_ctr))
> +   wake_up(>wait_q);
> +   }
> +}
> +EXPORT_SYMBOL(prcu_read_unlock);
> +
> +static void prcu_handler(void *info)
> +{
> +   struct prcu_local_struct *local;
> +
> +   local = this_cpu_ptr(_local);
> +   if (!local->locked)
> +   WRITE_ONCE(local->version, 
> atomic64_read(>global_version));
> +}
> +
> +void synchronize_prcu(void)
> +{
> +   int cpu;
> +   cpumask_t cpus;

It might overflow the stack if the cpumask is large, please move it to
struct prcu.

> +   unsigned long long version;
> +   struct prcu_local_struct *local;
> +
> +   version = atomic64_add_return(1, >global_version);


Re: [PATCH RFC 01/16] prcu: Add PRCU implementation

2018-01-26 Thread Lihao Liang
On Thu, Jan 25, 2018 at 6:16 AM, Paul E. McKenney
 wrote:
> On Tue, Jan 23, 2018 at 03:59:26PM +0800, liangli...@huawei.com wrote:
>> From: Heng Zhang 
>>
>> This RCU implementation (PRCU) is based on a fast consensus protocol
>> published in the following paper:
>>
>> Fast Consensus Using Bounded Staleness for Scalable Read-mostly 
>> Synchronization.
>> Haibo Chen, Heng Zhang, Ran Liu, Binyu Zang, and Haibing Guan.
>> IEEE Transactions on Parallel and Distributed Systems (TPDS), 2016.
>> https://dl.acm.org/citation.cfm?id=3024114.3024143
>>
>> Signed-off-by: Heng Zhang 
>> Signed-off-by: Lihao Liang 
>
> A few comments and questions interspersed.
>
> Thanx, Paul
>
>> ---
>>  include/linux/prcu.h |  37 +++
>>  kernel/rcu/Makefile  |   2 +-
>>  kernel/rcu/prcu.c| 125 
>> +++
>>  kernel/sched/core.c  |   2 +
>>  4 files changed, 165 insertions(+), 1 deletion(-)
>>  create mode 100644 include/linux/prcu.h
>>  create mode 100644 kernel/rcu/prcu.c
>>
>> diff --git a/include/linux/prcu.h b/include/linux/prcu.h
>> new file mode 100644
>> index ..653b4633
>> --- /dev/null
>> +++ b/include/linux/prcu.h
>> @@ -0,0 +1,37 @@
>> +#ifndef __LINUX_PRCU_H
>> +#define __LINUX_PRCU_H
>> +
>> +#include 
>> +#include 
>> +#include 
>> +
>> +#define CONFIG_PRCU
>> +
>> +struct prcu_local_struct {
>> + unsigned int locked;
>> + unsigned int online;
>> + unsigned long long version;
>> +};
>> +
>> +struct prcu_struct {
>> + atomic64_t global_version;
>> + atomic_t active_ctr;
>> + struct mutex mtx;
>> + wait_queue_head_t wait_q;
>> +};
>> +
>> +#ifdef CONFIG_PRCU
>> +void prcu_read_lock(void);
>> +void prcu_read_unlock(void);
>> +void synchronize_prcu(void);
>> +void prcu_note_context_switch(void);
>> +
>> +#else /* #ifdef CONFIG_PRCU */
>> +
>> +#define prcu_read_lock() do {} while (0)
>> +#define prcu_read_unlock() do {} while (0)
>> +#define synchronize_prcu() do {} while (0)
>> +#define prcu_note_context_switch() do {} while (0)
>
> If CONFIG_PRCU=n and some code is built that uses PRCU, shouldn't you
> get a build error rather than an error-free but inoperative PRCU?
>

Very good point, thank you!

> Of course, Peter's question about purpose of the patch set applies
> here as well.
>

The main motivation of this patch set is the comparison results of
rcuperf between PRCU and Tree RCU in which PRCU outperformed Tree RCU
by a large margin.

As indicated in your reply of the email in this patch series

[PATCH RFC 00/16] A new RCU implementation based on a fast consensus protocol

this may be a bug on either expedited RCU grace-period latency or on
rcuperf's measurements.

Many thanks,
Lihao.

>> +
>> +#endif /* #ifdef CONFIG_PRCU */
>> +#endif /* __LINUX_PRCU_H */
>> diff --git a/kernel/rcu/Makefile b/kernel/rcu/Makefile
>> index 23803c7d..8791419c 100644
>> --- a/kernel/rcu/Makefile
>> +++ b/kernel/rcu/Makefile
>> @@ -2,7 +2,7 @@
>>  # and is generally not a function of system call inputs.
>>  KCOV_INSTRUMENT := n
>>
>> -obj-y += update.o sync.o
>> +obj-y += update.o sync.o prcu.o
>>  obj-$(CONFIG_CLASSIC_SRCU) += srcu.o
>>  obj-$(CONFIG_TREE_SRCU) += srcutree.o
>>  obj-$(CONFIG_TINY_SRCU) += srcutiny.o
>> diff --git a/kernel/rcu/prcu.c b/kernel/rcu/prcu.c
>> new file mode 100644
>> index ..a00b9420
>> --- /dev/null
>> +++ b/kernel/rcu/prcu.c
>> @@ -0,0 +1,125 @@
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +
>> +#include 
>> +
>> +DEFINE_PER_CPU_SHARED_ALIGNED(struct prcu_local_struct, prcu_local);
>> +
>> +struct prcu_struct global_prcu = {
>> + .global_version = ATOMIC64_INIT(0),
>> + .active_ctr = ATOMIC_INIT(0),
>> + .mtx = __MUTEX_INITIALIZER(global_prcu.mtx),
>> + .wait_q = __WAIT_QUEUE_HEAD_INITIALIZER(global_prcu.wait_q)
>> +};
>> +struct prcu_struct *prcu = _prcu;
>> +
>> +static inline void prcu_report(struct prcu_local_struct *local)
>> +{
>> + unsigned long long global_version;
>> + unsigned long long local_version;
>> +
>> + global_version = atomic64_read(>global_version);
>> + local_version = local->version;
>> + if (global_version > local_version)
>> + cmpxchg(>version, local_version, global_version);
>> +}
>> +
>> +void prcu_read_lock(void)
>> +{
>> + struct prcu_local_struct *local;
>> +
>> + local = get_cpu_ptr(_local);
>> + if (!local->online) {
>> + WRITE_ONCE(local->online, 1);
>> + smp_mb();
>> + }
>> +
>> + local->locked++;
>> + put_cpu_ptr(_local);
>> +}
>> +EXPORT_SYMBOL(prcu_read_lock);
>> +
>> +void prcu_read_unlock(void)
>> +{
>> + int locked;
>> + struct prcu_local_struct *local;
>> +
>> + barrier();
>> + local = get_cpu_ptr(_local);
>> + locked = local->locked;
>> + if (locked) {
>> + 

Re: [PATCH RFC 01/16] prcu: Add PRCU implementation

2018-01-26 Thread Lihao Liang
On Thu, Jan 25, 2018 at 6:16 AM, Paul E. McKenney
 wrote:
> On Tue, Jan 23, 2018 at 03:59:26PM +0800, liangli...@huawei.com wrote:
>> From: Heng Zhang 
>>
>> This RCU implementation (PRCU) is based on a fast consensus protocol
>> published in the following paper:
>>
>> Fast Consensus Using Bounded Staleness for Scalable Read-mostly 
>> Synchronization.
>> Haibo Chen, Heng Zhang, Ran Liu, Binyu Zang, and Haibing Guan.
>> IEEE Transactions on Parallel and Distributed Systems (TPDS), 2016.
>> https://dl.acm.org/citation.cfm?id=3024114.3024143
>>
>> Signed-off-by: Heng Zhang 
>> Signed-off-by: Lihao Liang 
>
> A few comments and questions interspersed.
>
> Thanx, Paul
>
>> ---
>>  include/linux/prcu.h |  37 +++
>>  kernel/rcu/Makefile  |   2 +-
>>  kernel/rcu/prcu.c| 125 
>> +++
>>  kernel/sched/core.c  |   2 +
>>  4 files changed, 165 insertions(+), 1 deletion(-)
>>  create mode 100644 include/linux/prcu.h
>>  create mode 100644 kernel/rcu/prcu.c
>>
>> diff --git a/include/linux/prcu.h b/include/linux/prcu.h
>> new file mode 100644
>> index ..653b4633
>> --- /dev/null
>> +++ b/include/linux/prcu.h
>> @@ -0,0 +1,37 @@
>> +#ifndef __LINUX_PRCU_H
>> +#define __LINUX_PRCU_H
>> +
>> +#include 
>> +#include 
>> +#include 
>> +
>> +#define CONFIG_PRCU
>> +
>> +struct prcu_local_struct {
>> + unsigned int locked;
>> + unsigned int online;
>> + unsigned long long version;
>> +};
>> +
>> +struct prcu_struct {
>> + atomic64_t global_version;
>> + atomic_t active_ctr;
>> + struct mutex mtx;
>> + wait_queue_head_t wait_q;
>> +};
>> +
>> +#ifdef CONFIG_PRCU
>> +void prcu_read_lock(void);
>> +void prcu_read_unlock(void);
>> +void synchronize_prcu(void);
>> +void prcu_note_context_switch(void);
>> +
>> +#else /* #ifdef CONFIG_PRCU */
>> +
>> +#define prcu_read_lock() do {} while (0)
>> +#define prcu_read_unlock() do {} while (0)
>> +#define synchronize_prcu() do {} while (0)
>> +#define prcu_note_context_switch() do {} while (0)
>
> If CONFIG_PRCU=n and some code is built that uses PRCU, shouldn't you
> get a build error rather than an error-free but inoperative PRCU?
>

Very good point, thank you!

> Of course, Peter's question about purpose of the patch set applies
> here as well.
>

The main motivation of this patch set is the comparison results of
rcuperf between PRCU and Tree RCU in which PRCU outperformed Tree RCU
by a large margin.

As indicated in your reply of the email in this patch series

[PATCH RFC 00/16] A new RCU implementation based on a fast consensus protocol

this may be a bug on either expedited RCU grace-period latency or on
rcuperf's measurements.

Many thanks,
Lihao.

>> +
>> +#endif /* #ifdef CONFIG_PRCU */
>> +#endif /* __LINUX_PRCU_H */
>> diff --git a/kernel/rcu/Makefile b/kernel/rcu/Makefile
>> index 23803c7d..8791419c 100644
>> --- a/kernel/rcu/Makefile
>> +++ b/kernel/rcu/Makefile
>> @@ -2,7 +2,7 @@
>>  # and is generally not a function of system call inputs.
>>  KCOV_INSTRUMENT := n
>>
>> -obj-y += update.o sync.o
>> +obj-y += update.o sync.o prcu.o
>>  obj-$(CONFIG_CLASSIC_SRCU) += srcu.o
>>  obj-$(CONFIG_TREE_SRCU) += srcutree.o
>>  obj-$(CONFIG_TINY_SRCU) += srcutiny.o
>> diff --git a/kernel/rcu/prcu.c b/kernel/rcu/prcu.c
>> new file mode 100644
>> index ..a00b9420
>> --- /dev/null
>> +++ b/kernel/rcu/prcu.c
>> @@ -0,0 +1,125 @@
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +
>> +#include 
>> +
>> +DEFINE_PER_CPU_SHARED_ALIGNED(struct prcu_local_struct, prcu_local);
>> +
>> +struct prcu_struct global_prcu = {
>> + .global_version = ATOMIC64_INIT(0),
>> + .active_ctr = ATOMIC_INIT(0),
>> + .mtx = __MUTEX_INITIALIZER(global_prcu.mtx),
>> + .wait_q = __WAIT_QUEUE_HEAD_INITIALIZER(global_prcu.wait_q)
>> +};
>> +struct prcu_struct *prcu = _prcu;
>> +
>> +static inline void prcu_report(struct prcu_local_struct *local)
>> +{
>> + unsigned long long global_version;
>> + unsigned long long local_version;
>> +
>> + global_version = atomic64_read(>global_version);
>> + local_version = local->version;
>> + if (global_version > local_version)
>> + cmpxchg(>version, local_version, global_version);
>> +}
>> +
>> +void prcu_read_lock(void)
>> +{
>> + struct prcu_local_struct *local;
>> +
>> + local = get_cpu_ptr(_local);
>> + if (!local->online) {
>> + WRITE_ONCE(local->online, 1);
>> + smp_mb();
>> + }
>> +
>> + local->locked++;
>> + put_cpu_ptr(_local);
>> +}
>> +EXPORT_SYMBOL(prcu_read_lock);
>> +
>> +void prcu_read_unlock(void)
>> +{
>> + int locked;
>> + struct prcu_local_struct *local;
>> +
>> + barrier();
>> + local = get_cpu_ptr(_local);
>> + locked = local->locked;
>> + if (locked) {
>> + local->locked--;
>> + if (locked == 1)
>> + 

Re: [PATCH RFC 01/16] prcu: Add PRCU implementation

2018-01-24 Thread Boqun Feng
On Wed, Jan 24, 2018 at 10:16:18PM -0800, Paul E. McKenney wrote:
> On Tue, Jan 23, 2018 at 03:59:26PM +0800, liangli...@huawei.com wrote:
> > From: Heng Zhang 
> > 
> > This RCU implementation (PRCU) is based on a fast consensus protocol
> > published in the following paper:
> > 
> > Fast Consensus Using Bounded Staleness for Scalable Read-mostly 
> > Synchronization.
> > Haibo Chen, Heng Zhang, Ran Liu, Binyu Zang, and Haibing Guan.
> > IEEE Transactions on Parallel and Distributed Systems (TPDS), 2016.
> > https://dl.acm.org/citation.cfm?id=3024114.3024143
> > 
> > Signed-off-by: Heng Zhang 
> > Signed-off-by: Lihao Liang 
> 
> A few comments and questions interspersed.
> 
>   Thanx, Paul
> 
> > ---
> >  include/linux/prcu.h |  37 +++
> >  kernel/rcu/Makefile  |   2 +-
> >  kernel/rcu/prcu.c| 125 
> > +++
> >  kernel/sched/core.c  |   2 +
> >  4 files changed, 165 insertions(+), 1 deletion(-)
> >  create mode 100644 include/linux/prcu.h
> >  create mode 100644 kernel/rcu/prcu.c
> > 
> > diff --git a/include/linux/prcu.h b/include/linux/prcu.h
> > new file mode 100644
> > index ..653b4633
> > --- /dev/null
> > +++ b/include/linux/prcu.h
> > @@ -0,0 +1,37 @@
> > +#ifndef __LINUX_PRCU_H
> > +#define __LINUX_PRCU_H
> > +
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#define CONFIG_PRCU
> > +
> > +struct prcu_local_struct {
> > +   unsigned int locked;
> > +   unsigned int online;
> > +   unsigned long long version;
> > +};
> > +
> > +struct prcu_struct {
> > +   atomic64_t global_version;
> > +   atomic_t active_ctr;
> > +   struct mutex mtx;
> > +   wait_queue_head_t wait_q;
> > +};
> > +
> > +#ifdef CONFIG_PRCU
> > +void prcu_read_lock(void);
> > +void prcu_read_unlock(void);
> > +void synchronize_prcu(void);
> > +void prcu_note_context_switch(void);
> > +
> > +#else /* #ifdef CONFIG_PRCU */
> > +
> > +#define prcu_read_lock() do {} while (0)
> > +#define prcu_read_unlock() do {} while (0)
> > +#define synchronize_prcu() do {} while (0)
> > +#define prcu_note_context_switch() do {} while (0)
> 
> If CONFIG_PRCU=n and some code is built that uses PRCU, shouldn't you
> get a build error rather than an error-free but inoperative PRCU?
> 
> Of course, Peter's question about purpose of the patch set applies
> here as well.
> 
> > +
> > +#endif /* #ifdef CONFIG_PRCU */
> > +#endif /* __LINUX_PRCU_H */
> > diff --git a/kernel/rcu/Makefile b/kernel/rcu/Makefile
> > index 23803c7d..8791419c 100644
> > --- a/kernel/rcu/Makefile
> > +++ b/kernel/rcu/Makefile
> > @@ -2,7 +2,7 @@
> >  # and is generally not a function of system call inputs.
> >  KCOV_INSTRUMENT := n
> > 
> > -obj-y += update.o sync.o
> > +obj-y += update.o sync.o prcu.o
> >  obj-$(CONFIG_CLASSIC_SRCU) += srcu.o
> >  obj-$(CONFIG_TREE_SRCU) += srcutree.o
> >  obj-$(CONFIG_TINY_SRCU) += srcutiny.o
> > diff --git a/kernel/rcu/prcu.c b/kernel/rcu/prcu.c
> > new file mode 100644
> > index ..a00b9420
> > --- /dev/null
> > +++ b/kernel/rcu/prcu.c
> > @@ -0,0 +1,125 @@
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#include 
> > +
> > +DEFINE_PER_CPU_SHARED_ALIGNED(struct prcu_local_struct, prcu_local);
> > +
> > +struct prcu_struct global_prcu = {
> > +   .global_version = ATOMIC64_INIT(0),
> > +   .active_ctr = ATOMIC_INIT(0),
> > +   .mtx = __MUTEX_INITIALIZER(global_prcu.mtx),
> > +   .wait_q = __WAIT_QUEUE_HEAD_INITIALIZER(global_prcu.wait_q)
> > +};
> > +struct prcu_struct *prcu = _prcu;
> > +
> > +static inline void prcu_report(struct prcu_local_struct *local)
> > +{
> > +   unsigned long long global_version;
> > +   unsigned long long local_version;
> > +
> > +   global_version = atomic64_read(>global_version);
> > +   local_version = local->version;
> > +   if (global_version > local_version)
> > +   cmpxchg(>version, local_version, global_version);
> > +}
> > +
> > +void prcu_read_lock(void)
> > +{
> > +   struct prcu_local_struct *local;
> > +
> > +   local = get_cpu_ptr(_local);
> > +   if (!local->online) {
> > +   WRITE_ONCE(local->online, 1);
> > +   smp_mb();
> > +   }
> > +
> > +   local->locked++;
> > +   put_cpu_ptr(_local);
> > +}
> > +EXPORT_SYMBOL(prcu_read_lock);
> > +
> > +void prcu_read_unlock(void)
> > +{
> > +   int locked;
> > +   struct prcu_local_struct *local;
> > +
> > +   barrier();
> > +   local = get_cpu_ptr(_local);
> > +   locked = local->locked;
> > +   if (locked) {
> > +   local->locked--;
> > +   if (locked == 1)
> > +   prcu_report(local);
> 
> Is ordering important here?  It looks to me that the compiler could
> rearrange some of the accesses within prcu_report() with the local->locked
> decrement.  There appears to be some potential for load and store tearing,
> though perhaps you have verified that your compiler avoids 

Re: [PATCH RFC 01/16] prcu: Add PRCU implementation

2018-01-24 Thread Boqun Feng
On Wed, Jan 24, 2018 at 10:16:18PM -0800, Paul E. McKenney wrote:
> On Tue, Jan 23, 2018 at 03:59:26PM +0800, liangli...@huawei.com wrote:
> > From: Heng Zhang 
> > 
> > This RCU implementation (PRCU) is based on a fast consensus protocol
> > published in the following paper:
> > 
> > Fast Consensus Using Bounded Staleness for Scalable Read-mostly 
> > Synchronization.
> > Haibo Chen, Heng Zhang, Ran Liu, Binyu Zang, and Haibing Guan.
> > IEEE Transactions on Parallel and Distributed Systems (TPDS), 2016.
> > https://dl.acm.org/citation.cfm?id=3024114.3024143
> > 
> > Signed-off-by: Heng Zhang 
> > Signed-off-by: Lihao Liang 
> 
> A few comments and questions interspersed.
> 
>   Thanx, Paul
> 
> > ---
> >  include/linux/prcu.h |  37 +++
> >  kernel/rcu/Makefile  |   2 +-
> >  kernel/rcu/prcu.c| 125 
> > +++
> >  kernel/sched/core.c  |   2 +
> >  4 files changed, 165 insertions(+), 1 deletion(-)
> >  create mode 100644 include/linux/prcu.h
> >  create mode 100644 kernel/rcu/prcu.c
> > 
> > diff --git a/include/linux/prcu.h b/include/linux/prcu.h
> > new file mode 100644
> > index ..653b4633
> > --- /dev/null
> > +++ b/include/linux/prcu.h
> > @@ -0,0 +1,37 @@
> > +#ifndef __LINUX_PRCU_H
> > +#define __LINUX_PRCU_H
> > +
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#define CONFIG_PRCU
> > +
> > +struct prcu_local_struct {
> > +   unsigned int locked;
> > +   unsigned int online;
> > +   unsigned long long version;
> > +};
> > +
> > +struct prcu_struct {
> > +   atomic64_t global_version;
> > +   atomic_t active_ctr;
> > +   struct mutex mtx;
> > +   wait_queue_head_t wait_q;
> > +};
> > +
> > +#ifdef CONFIG_PRCU
> > +void prcu_read_lock(void);
> > +void prcu_read_unlock(void);
> > +void synchronize_prcu(void);
> > +void prcu_note_context_switch(void);
> > +
> > +#else /* #ifdef CONFIG_PRCU */
> > +
> > +#define prcu_read_lock() do {} while (0)
> > +#define prcu_read_unlock() do {} while (0)
> > +#define synchronize_prcu() do {} while (0)
> > +#define prcu_note_context_switch() do {} while (0)
> 
> If CONFIG_PRCU=n and some code is built that uses PRCU, shouldn't you
> get a build error rather than an error-free but inoperative PRCU?
> 
> Of course, Peter's question about purpose of the patch set applies
> here as well.
> 
> > +
> > +#endif /* #ifdef CONFIG_PRCU */
> > +#endif /* __LINUX_PRCU_H */
> > diff --git a/kernel/rcu/Makefile b/kernel/rcu/Makefile
> > index 23803c7d..8791419c 100644
> > --- a/kernel/rcu/Makefile
> > +++ b/kernel/rcu/Makefile
> > @@ -2,7 +2,7 @@
> >  # and is generally not a function of system call inputs.
> >  KCOV_INSTRUMENT := n
> > 
> > -obj-y += update.o sync.o
> > +obj-y += update.o sync.o prcu.o
> >  obj-$(CONFIG_CLASSIC_SRCU) += srcu.o
> >  obj-$(CONFIG_TREE_SRCU) += srcutree.o
> >  obj-$(CONFIG_TINY_SRCU) += srcutiny.o
> > diff --git a/kernel/rcu/prcu.c b/kernel/rcu/prcu.c
> > new file mode 100644
> > index ..a00b9420
> > --- /dev/null
> > +++ b/kernel/rcu/prcu.c
> > @@ -0,0 +1,125 @@
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#include 
> > +
> > +DEFINE_PER_CPU_SHARED_ALIGNED(struct prcu_local_struct, prcu_local);
> > +
> > +struct prcu_struct global_prcu = {
> > +   .global_version = ATOMIC64_INIT(0),
> > +   .active_ctr = ATOMIC_INIT(0),
> > +   .mtx = __MUTEX_INITIALIZER(global_prcu.mtx),
> > +   .wait_q = __WAIT_QUEUE_HEAD_INITIALIZER(global_prcu.wait_q)
> > +};
> > +struct prcu_struct *prcu = _prcu;
> > +
> > +static inline void prcu_report(struct prcu_local_struct *local)
> > +{
> > +   unsigned long long global_version;
> > +   unsigned long long local_version;
> > +
> > +   global_version = atomic64_read(>global_version);
> > +   local_version = local->version;
> > +   if (global_version > local_version)
> > +   cmpxchg(>version, local_version, global_version);
> > +}
> > +
> > +void prcu_read_lock(void)
> > +{
> > +   struct prcu_local_struct *local;
> > +
> > +   local = get_cpu_ptr(_local);
> > +   if (!local->online) {
> > +   WRITE_ONCE(local->online, 1);
> > +   smp_mb();
> > +   }
> > +
> > +   local->locked++;
> > +   put_cpu_ptr(_local);
> > +}
> > +EXPORT_SYMBOL(prcu_read_lock);
> > +
> > +void prcu_read_unlock(void)
> > +{
> > +   int locked;
> > +   struct prcu_local_struct *local;
> > +
> > +   barrier();
> > +   local = get_cpu_ptr(_local);
> > +   locked = local->locked;
> > +   if (locked) {
> > +   local->locked--;
> > +   if (locked == 1)
> > +   prcu_report(local);
> 
> Is ordering important here?  It looks to me that the compiler could
> rearrange some of the accesses within prcu_report() with the local->locked
> decrement.  There appears to be some potential for load and store tearing,
> though perhaps you have verified that your compiler avoids this on
> the architecture that you are using.
> 
> > + 

Re: [PATCH RFC 01/16] prcu: Add PRCU implementation

2018-01-24 Thread Paul E. McKenney
On Tue, Jan 23, 2018 at 03:59:26PM +0800, liangli...@huawei.com wrote:
> From: Heng Zhang 
> 
> This RCU implementation (PRCU) is based on a fast consensus protocol
> published in the following paper:
> 
> Fast Consensus Using Bounded Staleness for Scalable Read-mostly 
> Synchronization.
> Haibo Chen, Heng Zhang, Ran Liu, Binyu Zang, and Haibing Guan.
> IEEE Transactions on Parallel and Distributed Systems (TPDS), 2016.
> https://dl.acm.org/citation.cfm?id=3024114.3024143
> 
> Signed-off-by: Heng Zhang 
> Signed-off-by: Lihao Liang 

A few comments and questions interspersed.

Thanx, Paul

> ---
>  include/linux/prcu.h |  37 +++
>  kernel/rcu/Makefile  |   2 +-
>  kernel/rcu/prcu.c| 125 
> +++
>  kernel/sched/core.c  |   2 +
>  4 files changed, 165 insertions(+), 1 deletion(-)
>  create mode 100644 include/linux/prcu.h
>  create mode 100644 kernel/rcu/prcu.c
> 
> diff --git a/include/linux/prcu.h b/include/linux/prcu.h
> new file mode 100644
> index ..653b4633
> --- /dev/null
> +++ b/include/linux/prcu.h
> @@ -0,0 +1,37 @@
> +#ifndef __LINUX_PRCU_H
> +#define __LINUX_PRCU_H
> +
> +#include 
> +#include 
> +#include 
> +
> +#define CONFIG_PRCU
> +
> +struct prcu_local_struct {
> + unsigned int locked;
> + unsigned int online;
> + unsigned long long version;
> +};
> +
> +struct prcu_struct {
> + atomic64_t global_version;
> + atomic_t active_ctr;
> + struct mutex mtx;
> + wait_queue_head_t wait_q;
> +};
> +
> +#ifdef CONFIG_PRCU
> +void prcu_read_lock(void);
> +void prcu_read_unlock(void);
> +void synchronize_prcu(void);
> +void prcu_note_context_switch(void);
> +
> +#else /* #ifdef CONFIG_PRCU */
> +
> +#define prcu_read_lock() do {} while (0)
> +#define prcu_read_unlock() do {} while (0)
> +#define synchronize_prcu() do {} while (0)
> +#define prcu_note_context_switch() do {} while (0)

If CONFIG_PRCU=n and some code is built that uses PRCU, shouldn't you
get a build error rather than an error-free but inoperative PRCU?

Of course, Peter's question about purpose of the patch set applies
here as well.

> +
> +#endif /* #ifdef CONFIG_PRCU */
> +#endif /* __LINUX_PRCU_H */
> diff --git a/kernel/rcu/Makefile b/kernel/rcu/Makefile
> index 23803c7d..8791419c 100644
> --- a/kernel/rcu/Makefile
> +++ b/kernel/rcu/Makefile
> @@ -2,7 +2,7 @@
>  # and is generally not a function of system call inputs.
>  KCOV_INSTRUMENT := n
> 
> -obj-y += update.o sync.o
> +obj-y += update.o sync.o prcu.o
>  obj-$(CONFIG_CLASSIC_SRCU) += srcu.o
>  obj-$(CONFIG_TREE_SRCU) += srcutree.o
>  obj-$(CONFIG_TINY_SRCU) += srcutiny.o
> diff --git a/kernel/rcu/prcu.c b/kernel/rcu/prcu.c
> new file mode 100644
> index ..a00b9420
> --- /dev/null
> +++ b/kernel/rcu/prcu.c
> @@ -0,0 +1,125 @@
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +
> +DEFINE_PER_CPU_SHARED_ALIGNED(struct prcu_local_struct, prcu_local);
> +
> +struct prcu_struct global_prcu = {
> + .global_version = ATOMIC64_INIT(0),
> + .active_ctr = ATOMIC_INIT(0),
> + .mtx = __MUTEX_INITIALIZER(global_prcu.mtx),
> + .wait_q = __WAIT_QUEUE_HEAD_INITIALIZER(global_prcu.wait_q)
> +};
> +struct prcu_struct *prcu = _prcu;
> +
> +static inline void prcu_report(struct prcu_local_struct *local)
> +{
> + unsigned long long global_version;
> + unsigned long long local_version;
> +
> + global_version = atomic64_read(>global_version);
> + local_version = local->version;
> + if (global_version > local_version)
> + cmpxchg(>version, local_version, global_version);
> +}
> +
> +void prcu_read_lock(void)
> +{
> + struct prcu_local_struct *local;
> +
> + local = get_cpu_ptr(_local);
> + if (!local->online) {
> + WRITE_ONCE(local->online, 1);
> + smp_mb();
> + }
> +
> + local->locked++;
> + put_cpu_ptr(_local);
> +}
> +EXPORT_SYMBOL(prcu_read_lock);
> +
> +void prcu_read_unlock(void)
> +{
> + int locked;
> + struct prcu_local_struct *local;
> +
> + barrier();
> + local = get_cpu_ptr(_local);
> + locked = local->locked;
> + if (locked) {
> + local->locked--;
> + if (locked == 1)
> + prcu_report(local);

Is ordering important here?  It looks to me that the compiler could
rearrange some of the accesses within prcu_report() with the local->locked
decrement.  There appears to be some potential for load and store tearing,
though perhaps you have verified that your compiler avoids this on
the architecture that you are using.

> + put_cpu_ptr(_local);
> + } else {

Hmmm...  We get here if the RCU read-side critical section was preempted.
If none of them are preempted, ->active_ctr remains zero.

> + put_cpu_ptr(_local);
> + if 

Re: [PATCH RFC 01/16] prcu: Add PRCU implementation

2018-01-24 Thread Paul E. McKenney
On Tue, Jan 23, 2018 at 03:59:26PM +0800, liangli...@huawei.com wrote:
> From: Heng Zhang 
> 
> This RCU implementation (PRCU) is based on a fast consensus protocol
> published in the following paper:
> 
> Fast Consensus Using Bounded Staleness for Scalable Read-mostly 
> Synchronization.
> Haibo Chen, Heng Zhang, Ran Liu, Binyu Zang, and Haibing Guan.
> IEEE Transactions on Parallel and Distributed Systems (TPDS), 2016.
> https://dl.acm.org/citation.cfm?id=3024114.3024143
> 
> Signed-off-by: Heng Zhang 
> Signed-off-by: Lihao Liang 

A few comments and questions interspersed.

Thanx, Paul

> ---
>  include/linux/prcu.h |  37 +++
>  kernel/rcu/Makefile  |   2 +-
>  kernel/rcu/prcu.c| 125 
> +++
>  kernel/sched/core.c  |   2 +
>  4 files changed, 165 insertions(+), 1 deletion(-)
>  create mode 100644 include/linux/prcu.h
>  create mode 100644 kernel/rcu/prcu.c
> 
> diff --git a/include/linux/prcu.h b/include/linux/prcu.h
> new file mode 100644
> index ..653b4633
> --- /dev/null
> +++ b/include/linux/prcu.h
> @@ -0,0 +1,37 @@
> +#ifndef __LINUX_PRCU_H
> +#define __LINUX_PRCU_H
> +
> +#include 
> +#include 
> +#include 
> +
> +#define CONFIG_PRCU
> +
> +struct prcu_local_struct {
> + unsigned int locked;
> + unsigned int online;
> + unsigned long long version;
> +};
> +
> +struct prcu_struct {
> + atomic64_t global_version;
> + atomic_t active_ctr;
> + struct mutex mtx;
> + wait_queue_head_t wait_q;
> +};
> +
> +#ifdef CONFIG_PRCU
> +void prcu_read_lock(void);
> +void prcu_read_unlock(void);
> +void synchronize_prcu(void);
> +void prcu_note_context_switch(void);
> +
> +#else /* #ifdef CONFIG_PRCU */
> +
> +#define prcu_read_lock() do {} while (0)
> +#define prcu_read_unlock() do {} while (0)
> +#define synchronize_prcu() do {} while (0)
> +#define prcu_note_context_switch() do {} while (0)

If CONFIG_PRCU=n and some code is built that uses PRCU, shouldn't you
get a build error rather than an error-free but inoperative PRCU?

Of course, Peter's question about purpose of the patch set applies
here as well.

> +
> +#endif /* #ifdef CONFIG_PRCU */
> +#endif /* __LINUX_PRCU_H */
> diff --git a/kernel/rcu/Makefile b/kernel/rcu/Makefile
> index 23803c7d..8791419c 100644
> --- a/kernel/rcu/Makefile
> +++ b/kernel/rcu/Makefile
> @@ -2,7 +2,7 @@
>  # and is generally not a function of system call inputs.
>  KCOV_INSTRUMENT := n
> 
> -obj-y += update.o sync.o
> +obj-y += update.o sync.o prcu.o
>  obj-$(CONFIG_CLASSIC_SRCU) += srcu.o
>  obj-$(CONFIG_TREE_SRCU) += srcutree.o
>  obj-$(CONFIG_TINY_SRCU) += srcutiny.o
> diff --git a/kernel/rcu/prcu.c b/kernel/rcu/prcu.c
> new file mode 100644
> index ..a00b9420
> --- /dev/null
> +++ b/kernel/rcu/prcu.c
> @@ -0,0 +1,125 @@
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +
> +DEFINE_PER_CPU_SHARED_ALIGNED(struct prcu_local_struct, prcu_local);
> +
> +struct prcu_struct global_prcu = {
> + .global_version = ATOMIC64_INIT(0),
> + .active_ctr = ATOMIC_INIT(0),
> + .mtx = __MUTEX_INITIALIZER(global_prcu.mtx),
> + .wait_q = __WAIT_QUEUE_HEAD_INITIALIZER(global_prcu.wait_q)
> +};
> +struct prcu_struct *prcu = _prcu;
> +
> +static inline void prcu_report(struct prcu_local_struct *local)
> +{
> + unsigned long long global_version;
> + unsigned long long local_version;
> +
> + global_version = atomic64_read(>global_version);
> + local_version = local->version;
> + if (global_version > local_version)
> + cmpxchg(>version, local_version, global_version);
> +}
> +
> +void prcu_read_lock(void)
> +{
> + struct prcu_local_struct *local;
> +
> + local = get_cpu_ptr(_local);
> + if (!local->online) {
> + WRITE_ONCE(local->online, 1);
> + smp_mb();
> + }
> +
> + local->locked++;
> + put_cpu_ptr(_local);
> +}
> +EXPORT_SYMBOL(prcu_read_lock);
> +
> +void prcu_read_unlock(void)
> +{
> + int locked;
> + struct prcu_local_struct *local;
> +
> + barrier();
> + local = get_cpu_ptr(_local);
> + locked = local->locked;
> + if (locked) {
> + local->locked--;
> + if (locked == 1)
> + prcu_report(local);

Is ordering important here?  It looks to me that the compiler could
rearrange some of the accesses within prcu_report() with the local->locked
decrement.  There appears to be some potential for load and store tearing,
though perhaps you have verified that your compiler avoids this on
the architecture that you are using.

> + put_cpu_ptr(_local);
> + } else {

Hmmm...  We get here if the RCU read-side critical section was preempted.
If none of them are preempted, ->active_ctr remains zero.

> + put_cpu_ptr(_local);
> + if (!atomic_dec_return(>active_ctr))
> + wake_up(>wait_q);
> + }
> 

Re: [PATCH RFC 01/16] prcu: Add PRCU implementation

2018-01-24 Thread Peter Zijlstra
On Wed, Jan 24, 2018 at 05:15:30PM +, Lihao Liang wrote:
> Alternatively, the paper can be found at
> 
> http://ipads.se.sjtu.edu.cn/lib/exe/fetch.php?media=publications:consensus-tpds16.pdf

Thanks, I'll try and have a read.


Re: [PATCH RFC 01/16] prcu: Add PRCU implementation

2018-01-24 Thread Peter Zijlstra
On Wed, Jan 24, 2018 at 05:15:30PM +, Lihao Liang wrote:
> Alternatively, the paper can be found at
> 
> http://ipads.se.sjtu.edu.cn/lib/exe/fetch.php?media=publications:consensus-tpds16.pdf

Thanks, I'll try and have a read.


Re: [PATCH RFC 01/16] prcu: Add PRCU implementation

2018-01-24 Thread Lihao Liang
Dear Peter,

Many thanks for your comments. I will provide a proper changelog.
Alternatively, the paper can be found at

http://ipads.se.sjtu.edu.cn/lib/exe/fetch.php?media=publications:consensus-tpds16.pdf

Best,
Lihao.

On Wed, Jan 24, 2018 at 11:26 AM, Peter Zijlstra  wrote:
> On Tue, Jan 23, 2018 at 03:59:26PM +0800, liangli...@huawei.com wrote:
>> From: Heng Zhang 
>>
>> This RCU implementation (PRCU) is based on a fast consensus protocol
>> published in the following paper:
>>
>> Fast Consensus Using Bounded Staleness for Scalable Read-mostly 
>> Synchronization.
>> Haibo Chen, Heng Zhang, Ran Liu, Binyu Zang, and Haibing Guan.
>> IEEE Transactions on Parallel and Distributed Systems (TPDS), 2016.
>> https://dl.acm.org/citation.cfm?id=3024114.3024143
>
> That's an utterly useless changelog for something like a new RCU
> implementation.
>
> You fail to describe why you're proposing a new RCU implementation; what
> problems does it fix?, how is it better?
>
> All you provide is a paywalled link to some paper that we can't read.
>
> Please write a real changelog that describes things properly and
> provide, if at all possible, a readily accessible link to your paper.


Re: [PATCH RFC 01/16] prcu: Add PRCU implementation

2018-01-24 Thread Lihao Liang
Dear Peter,

Many thanks for your comments. I will provide a proper changelog.
Alternatively, the paper can be found at

http://ipads.se.sjtu.edu.cn/lib/exe/fetch.php?media=publications:consensus-tpds16.pdf

Best,
Lihao.

On Wed, Jan 24, 2018 at 11:26 AM, Peter Zijlstra  wrote:
> On Tue, Jan 23, 2018 at 03:59:26PM +0800, liangli...@huawei.com wrote:
>> From: Heng Zhang 
>>
>> This RCU implementation (PRCU) is based on a fast consensus protocol
>> published in the following paper:
>>
>> Fast Consensus Using Bounded Staleness for Scalable Read-mostly 
>> Synchronization.
>> Haibo Chen, Heng Zhang, Ran Liu, Binyu Zang, and Haibing Guan.
>> IEEE Transactions on Parallel and Distributed Systems (TPDS), 2016.
>> https://dl.acm.org/citation.cfm?id=3024114.3024143
>
> That's an utterly useless changelog for something like a new RCU
> implementation.
>
> You fail to describe why you're proposing a new RCU implementation; what
> problems does it fix?, how is it better?
>
> All you provide is a paywalled link to some paper that we can't read.
>
> Please write a real changelog that describes things properly and
> provide, if at all possible, a readily accessible link to your paper.


Re: [PATCH RFC 01/16] prcu: Add PRCU implementation

2018-01-24 Thread Peter Zijlstra
On Tue, Jan 23, 2018 at 03:59:26PM +0800, liangli...@huawei.com wrote:
> From: Heng Zhang 
> 
> This RCU implementation (PRCU) is based on a fast consensus protocol
> published in the following paper:
> 
> Fast Consensus Using Bounded Staleness for Scalable Read-mostly 
> Synchronization.
> Haibo Chen, Heng Zhang, Ran Liu, Binyu Zang, and Haibing Guan.
> IEEE Transactions on Parallel and Distributed Systems (TPDS), 2016.
> https://dl.acm.org/citation.cfm?id=3024114.3024143

That's an utterly useless changelog for something like a new RCU
implementation.

You fail to describe why you're proposing a new RCU implementation; what
problems does it fix?, how is it better?

All you provide is a paywalled link to some paper that we can't read.

Please write a real changelog that describes things properly and
provide, if at all possible, a readily accessible link to your paper.


Re: [PATCH RFC 01/16] prcu: Add PRCU implementation

2018-01-24 Thread Peter Zijlstra
On Tue, Jan 23, 2018 at 03:59:26PM +0800, liangli...@huawei.com wrote:
> From: Heng Zhang 
> 
> This RCU implementation (PRCU) is based on a fast consensus protocol
> published in the following paper:
> 
> Fast Consensus Using Bounded Staleness for Scalable Read-mostly 
> Synchronization.
> Haibo Chen, Heng Zhang, Ran Liu, Binyu Zang, and Haibing Guan.
> IEEE Transactions on Parallel and Distributed Systems (TPDS), 2016.
> https://dl.acm.org/citation.cfm?id=3024114.3024143

That's an utterly useless changelog for something like a new RCU
implementation.

You fail to describe why you're proposing a new RCU implementation; what
problems does it fix?, how is it better?

All you provide is a paywalled link to some paper that we can't read.

Please write a real changelog that describes things properly and
provide, if at all possible, a readily accessible link to your paper.