[tip: core/rcu] rcu: Fix CPU-offline trace in rcutree_dying_cpu

2021-04-11 Thread tip-bot2 for Neeraj Upadhyay
The following commit has been merged into the core/rcu branch of tip:

Commit-ID: 47fcbc8dd62f15dc75916225ebacdc3bca9c12b2
Gitweb:
https://git.kernel.org/tip/47fcbc8dd62f15dc75916225ebacdc3bca9c12b2
Author: Neeraj Upadhyay
AuthorDate: Mon, 11 Jan 2021 17:15:58 +05:30
Committer: Paul E. McKenney 
CommitterDate: Mon, 08 Mar 2021 14:17:35 -08:00

rcu: Fix CPU-offline trace in rcutree_dying_cpu

The condition in the trace_rcu_grace_period() in rcutree_dying_cpu() is
backwards, so that it uses the string "cpuofl" when the offline CPU is
blocking the current grace period and "cpuofl-bgp" otherwise.  Given that
the "-bgp" stands for "blocking grace period", this is at best misleading.
This commit therefore switches these strings in order to correctly trace
whether the outgoing cpu blocks the current grace period.

Signed-off-by: Neeraj Upadhyay 
Signed-off-by: Paul E. McKenney 
---
 kernel/rcu/tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index cdf091f..e62c2de 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2413,7 +2413,7 @@ int rcutree_dying_cpu(unsigned int cpu)
 
blkd = !!(rnp->qsmask & rdp->grpmask);
trace_rcu_grace_period(rcu_state.name, READ_ONCE(rnp->gp_seq),
-  blkd ? TPS("cpuofl") : TPS("cpuofl-bgp"));
+  blkd ? TPS("cpuofl-bgp") : TPS("cpuofl"));
return 0;
 }
 


Re: [PATCH 01/13] rcu/nocb: Fix potential missed nocb_timer rearm

2021-03-03 Thread Neeraj Upadhyay

 This is normally improbable given that the timer is set for only
 one jiffy, but timers can be delayed.  Besides, it is possible
 that the kernel was built with CONFIG_RCU_STRICT_GRACE_PERIOD=y.
 
 7.  The grace period ends, so rcu_gp_kthread awakens the

 ->nocb_gp_kthread, which in turn awakens both CPU 1's and
 CPU 2's ->nocb_cb_kthread.  Then ->nocb_gp_kthread sleeps
 waiting for more newly queued callbacks.
 
 8.  CPU 1's ->nocb_cb_kthread invokes its callback, then sleeps

 waiting for more invocable callbacks.
 
 9.  Note that neither kthread updated any ->nocb_timer state,

 so CPU 1's ->nocb_defer_wakeup is still set to RCU_NOCB_WAKE.
 
 10. CPU 1 enqueues its second callback, this time with interrupts

 enabled so it can directly wake ->nocb_gp_kthread.
 It does so by calling wake_nocb_gp(), which also cancels the
 pending timer that was queued in step 2. But that does not reset
 CPU 1's ->nocb_defer_wakeup, which is still set to RCU_NOCB_WAKE.
 So CPU 1's ->nocb_defer_wakeup and its ->nocb_timer are now
 desynchronized.
 
 11. ->nocb_gp_kthread associates the callback queued in 10 with a new

 grace period, arranges for that grace period to start and sleeps
 waiting for it to complete.
 
 12. The grace period ends, rcu_gp_kthread awakens ->nocb_gp_kthread,

 which in turn wakes up CPU 1's ->nocb_cb_kthread which then
 invokes the callback queued in 10.
 
 13. CPU 1 enqueues its third callback, this time with interrupts

 disabled so it must queue a timer for a deferred wakeup. However,
 the value of its ->nocb_defer_wakeup is RCU_NOCB_WAKE, which
 incorrectly indicates that a timer is already queued.  In fact,
 CPU 1's ->nocb_timer was cancelled in step 10.  CPU 1 therefore fails
 to queue the ->nocb_timer.
 
 14. CPU 1 has a pending callback that may go unnoticed until

 some other CPU eventually wakes up ->nocb_gp_kthread or until CPU 1
 itself performs an explicit deferred wakeup, for example, during idle entry.
 
 This commit fixes this bug by resetting rdp->nocb_defer_wakeup every time

 the ->nocb_timer is deleted.
 
 Fixes: d1b222c6be1f ("rcu/nocb: Add bypass callback queueing")

 Cc: Stable 
 Cc: Josh Triplett 
 Cc: Lai Jiangshan 
 Cc: Joel Fernandes 
 Cc: Neeraj Upadhyay 
 Cc: Boqun Feng 
 Signed-off-by: Frederic Weisbecker 
 Signed-off-by: Paul E. McKenney 

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 44746d8..429491d 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1721,7 +1721,11 @@ static bool wake_nocb_gp(struct rcu_data *rdp, bool 
force,
rcu_nocb_unlock_irqrestore(rdp, flags);
return false;
}
-   del_timer(&rdp->nocb_timer);
+
+   if (READ_ONCE(rdp->nocb_defer_wakeup) > RCU_NOCB_WAKE_NOT) {
+   WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
+   del_timer(&rdp->nocb_timer);
+   }
rcu_nocb_unlock_irqrestore(rdp, flags);
raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);
if (force || READ_ONCE(rdp_gp->nocb_gp_sleep)) {
@@ -2350,7 +2354,6 @@ static bool do_nocb_deferred_wakeup_common(struct 
rcu_data *rdp)
return false;
}
ndw = READ_ONCE(rdp->nocb_defer_wakeup);
-   WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
ret = wake_nocb_gp(rdp, ndw == RCU_NOCB_WAKE_FORCE, flags);
trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("DeferredWake"));
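
As an illustration of the race described in steps 10-13, here is a minimal
user-space model (hypothetical names, not the kernel code): a wakeup flag
claims a timer is pending even though the timer was cancelled, so the next
deferred wakeup is silently dropped.

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for ->nocb_defer_wakeup and ->nocb_timer. */
static int  defer_wakeup;   /* 0 == WAKE_NOT, 1 == WAKE */
static bool timer_queued;

static void queue_deferred_wakeup(void)
{
	if (defer_wakeup)       /* Flag says a timer is already queued... */
		return;         /* ...so do nothing (step 13's mistake). */
	defer_wakeup = 1;
	timer_queued = true;    /* Step 2: timer armed, flag set. */
}

static void direct_wakeup_buggy(void)
{
	timer_queued = false;   /* Step 10: cancel the timer... */
	/* BUG: defer_wakeup is not reset, so state is desynchronized. */
}

static void direct_wakeup_fixed(void)
{
	if (defer_wakeup) {     /* The fix: reset the flag whenever the timer is deleted. */
		defer_wakeup = 0;
		timer_queued = false;
	}
}

int main(void)
{
	queue_deferred_wakeup();        /* step 2 */
	direct_wakeup_buggy();          /* step 10 */
	queue_deferred_wakeup();        /* step 13: silently skipped */
	printf("buggy: timer_queued=%d (wakeup lost)\n", timer_queued);

	defer_wakeup = 0;               /* reset the model */
	queue_deferred_wakeup();
	direct_wakeup_fixed();
	queue_deferred_wakeup();        /* now queued again as expected */
	printf("fixed: timer_queued=%d\n", timer_queued);
	return 0;
}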
  



Reviewed-by: Neeraj Upadhyay 


Thanks
Neeraj

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member of the Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH] arm64: Add part number for Arm Cortex-A78

2021-02-17 Thread Neeraj Upadhyay




On 2/17/2021 10:36 PM, Will Deacon wrote:

On Wed, Feb 17, 2021 at 10:14:11PM +0530, Neeraj Upadhyay wrote:

Add the MIDR part number info for the Arm Cortex-A78.

Signed-off-by: Neeraj Upadhyay 
---
  arch/arm64/include/asm/cputype.h | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/arch/arm64/include/asm/cputype.h b/arch/arm64/include/asm/cputype.h
index ef5b040..3aced88 100644
--- a/arch/arm64/include/asm/cputype.h
+++ b/arch/arm64/include/asm/cputype.h
@@ -72,6 +72,7 @@
  #define ARM_CPU_PART_CORTEX_A76   0xD0B
  #define ARM_CPU_PART_NEOVERSE_N1  0xD0C
  #define ARM_CPU_PART_CORTEX_A77   0xD0D
+#define ARM_CPU_PART_CORTEX_A78   0xD41
  
  #define APM_CPU_PART_POTENZA		0x000
  
@@ -109,6 +110,7 @@

  #define MIDR_CORTEX_A76   MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, 
ARM_CPU_PART_CORTEX_A76)
  #define MIDR_NEOVERSE_N1 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, 
ARM_CPU_PART_NEOVERSE_N1)
  #define MIDR_CORTEX_A77   MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, 
ARM_CPU_PART_CORTEX_A77)
+#define MIDR_CORTEX_A78   MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A78)
  #define MIDR_THUNDERX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, 
CAVIUM_CPU_PART_THUNDERX)
  #define MIDR_THUNDERX_81XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, 
CAVIUM_CPU_PART_THUNDERX_81XX)
  #define MIDR_THUNDERX_83XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, 
CAVIUM_CPU_PART_THUNDERX_83XX)


This usually means there's an erratum to work around. What are you hiding ;)

Will



:-) This is needed for supporting the implementation-defined AMU counters
in the A78 [1]. However, there is no upstream user of it.


[1] https://www.spinics.net/lists/arm-kernel/msg856989.html


Thanks
Neeraj

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member of the Code Aurora Forum, hosted by The Linux Foundation


[PATCH] arm64: Add part number for Arm Cortex-A78

2021-02-17 Thread Neeraj Upadhyay
Add the MIDR part number info for the Arm Cortex-A78.

Signed-off-by: Neeraj Upadhyay 
---
 arch/arm64/include/asm/cputype.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/include/asm/cputype.h b/arch/arm64/include/asm/cputype.h
index ef5b040..3aced88 100644
--- a/arch/arm64/include/asm/cputype.h
+++ b/arch/arm64/include/asm/cputype.h
@@ -72,6 +72,7 @@
 #define ARM_CPU_PART_CORTEX_A760xD0B
 #define ARM_CPU_PART_NEOVERSE_N1   0xD0C
 #define ARM_CPU_PART_CORTEX_A770xD0D
+#define ARM_CPU_PART_CORTEX_A78   0xD41
 
 #define APM_CPU_PART_POTENZA   0x000
 
@@ -109,6 +110,7 @@
 #define MIDR_CORTEX_A76MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, 
ARM_CPU_PART_CORTEX_A76)
 #define MIDR_NEOVERSE_N1 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, 
ARM_CPU_PART_NEOVERSE_N1)
 #define MIDR_CORTEX_A77MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, 
ARM_CPU_PART_CORTEX_A77)
+#define MIDR_CORTEX_A78   MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A78)
 #define MIDR_THUNDERX  MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, 
CAVIUM_CPU_PART_THUNDERX)
 #define MIDR_THUNDERX_81XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, 
CAVIUM_CPU_PART_THUNDERX_81XX)
 #define MIDR_THUNDERX_83XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, 
CAVIUM_CPU_PART_THUNDERX_83XX)
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
the Code Aurora Forum, 
hosted by The Linux Foundation



[tip: core/rcu] rcu: Check and report missed fqs timer wakeup on RCU stall

2021-02-12 Thread tip-bot2 for Neeraj Upadhyay
The following commit has been merged into the core/rcu branch of tip:

Commit-ID: 683954e55c981467bfd4688417e914bafc40959f
Gitweb:
https://git.kernel.org/tip/683954e55c981467bfd4688417e914bafc40959f
Author: Neeraj Upadhyay
AuthorDate: Mon, 16 Nov 2020 21:36:00 +05:30
Committer: Paul E. McKenney 
CommitterDate: Wed, 06 Jan 2021 16:54:11 -08:00

rcu: Check and report missed fqs timer wakeup on RCU stall

For a new grace period request, the RCU GP kthread transitions through
the following states:

a. [RCU_GP_WAIT_GPS] -> [RCU_GP_DONE_GPS]

The RCU_GP_WAIT_GPS state is where the GP kthread waits for a request
for a new GP.  Once it receives a request (for example, when a new RCU
callback is queued), the GP kthread transitions to RCU_GP_DONE_GPS.

b. [RCU_GP_DONE_GPS] -> [RCU_GP_ONOFF]

Grace period initialization starts in rcu_gp_init(), which records the
start of new GP in rcu_state.gp_seq and transitions to RCU_GP_ONOFF.

c. [RCU_GP_ONOFF] -> [RCU_GP_INIT]

The purpose of the RCU_GP_ONOFF state is to apply the online/offline
information that was buffered for any CPUs that recently came online or
went offline.  This state is maintained in per-leaf rcu_node bitmasks,
with the buffered state in ->qsmaskinitnext and the state for the upcoming
GP in ->qsmaskinit.  At the end of this RCU_GP_ONOFF state, each bit in
->qsmaskinit will correspond to a CPU that must pass through a quiescent
state before the upcoming grace period is allowed to complete.

However, a leaf rcu_node structure with an all-zeroes ->qsmaskinit
cannot necessarily be ignored.  In preemptible RCU, there might well be
tasks still in RCU read-side critical sections that were first preempted
while running on one of the CPUs managed by this structure.  Such tasks
will be queued on this structure's ->blkd_tasks list.  Only after this
list fully drains can this leaf rcu_node structure be ignored, and even
then only if none of its CPUs have come back online in the meantime.
Once that happens, the ->qsmaskinit masks further up the tree will be
updated to exclude this leaf rcu_node structure.

Once the ->qsmaskinitnext and ->qsmaskinit fields have been updated
as needed, the GP kthread transitions to RCU_GP_INIT.

d. [RCU_GP_INIT] -> [RCU_GP_WAIT_FQS]

The purpose of the RCU_GP_INIT state is to copy each ->qsmaskinit to
the ->qsmask field within each rcu_node structure.  This copying is done
breadth-first from the root to the leaves.  Why not just copy directly
from ->qsmaskinitnext to ->qsmask?  Because the ->qsmaskinitnext masks
can change in the meantime as additional CPUs come online or go offline.
Such changes would result in inconsistencies in the ->qsmask fields up and
down the tree, which could in turn result in too-short grace periods or
grace-period hangs.  These issues are avoided by snapshotting the leaf
rcu_node structures' ->qsmaskinitnext fields into their ->qsmaskinit
counterparts, generating a consistent set of ->qsmaskinit fields
throughout the tree, and only then copying these consistent ->qsmaskinit
fields to their ->qsmask counterparts.

Once this initialization step is complete, the GP kthread transitions
to RCU_GP_WAIT_FQS, where it waits to do a force-quiescent-state scan
on the one hand or for the end of the grace period on the other.

e. [RCU_GP_WAIT_FQS] -> [RCU_GP_DOING_FQS]

The RCU_GP_WAIT_FQS state waits for one of three things:  (1) An
explicit request to do a force-quiescent-state scan, (2) The end of
the grace period, or (3) A short interval of time, after which it
will do a force-quiescent-state (FQS) scan.  The explicit request can
come from rcutorture or from any CPU that has too many RCU callbacks
queued (see the qhimark kernel parameter and the RCU_GP_FLAG_OVLD
flag).  The aforementioned "short period of time" is specified by the
jiffies_till_first_fqs boot parameter for a given grace period's first
FQS scan and by the jiffies_till_next_fqs for later FQS scans.

Either way, once the wait is over, the GP kthread transitions to
RCU_GP_DOING_FQS.

f. [RCU_GP_DOING_FQS] -> [RCU_GP_CLEANUP]

The RCU_GP_DOING_FQS state performs an FQS scan.  Each such scan carries
out two functions for any CPU whose bit is still set in its leaf rcu_node
structure's ->qsmask field, that is, for any CPU that has not yet reported
a quiescent state for the current grace period:

  i.  Report quiescent states on behalf of CPUs that have been observed
  to be idle (from an RCU perspective) since the beginning of the
  grace period.

  ii. If the current grace period is too old, take various actions to
  encourage holdout CPUs to pass through quiescent states, including
  enlisting the aid of any calls to cond_resched() and might_sleep(),
  and even including IPIing the holdout CPUs.

These checks are skipped for any leaf rcu_node structure with an all-zero
->qsmask field; however, such structures are subject 
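
As a compact summary of the walkthrough above, here is a minimal standalone
model of the state sequence (state names taken from the text; this is a
sketch, not the kernel's own enum, and the WAIT_FQS/DOING_FQS pair can
repeat before cleanup):

#include <stdio.h>

/* Model of the RCU GP kthread states described in (a)-(f) above. */
enum gp_state {
	RCU_GP_WAIT_GPS,   /* Wait for a request for a new grace period. */
	RCU_GP_DONE_GPS,   /* Request received. */
	RCU_GP_ONOFF,      /* Apply buffered CPU online/offline information. */
	RCU_GP_INIT,       /* Copy ->qsmaskinit to ->qsmask, root to leaves. */
	RCU_GP_WAIT_FQS,   /* Wait for an FQS request, GP end, or timeout. */
	RCU_GP_DOING_FQS,  /* Do a force-quiescent-state scan. */
	RCU_GP_CLEANUP,    /* Grace period has ended. */
};

int main(void)
{
	static const char * const names[] = {
		"RCU_GP_WAIT_GPS", "RCU_GP_DONE_GPS", "RCU_GP_ONOFF",
		"RCU_GP_INIT", "RCU_GP_WAIT_FQS", "RCU_GP_DOING_FQS",
		"RCU_GP_CLEANUP",
	};
	/* One pass through a grace period. */
	for (int s = RCU_GP_WAIT_GPS; s <= RCU_GP_CLEANUP; s++)
		printf("-> %s\n", names[s]);
	return 0;
}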

Re: [PATCH] rcu: Correct cpu offline trace in rcutree_dying_cpu

2021-01-12 Thread Neeraj Upadhyay




On 1/12/2021 11:01 PM, Paul E. McKenney wrote:

On Mon, Jan 11, 2021 at 05:15:58PM +0530, Neeraj Upadhyay wrote:

Correctly trace whether the outgoing cpu blocks current gp in
rcutree_dying_cpu().

Signed-off-by: Neeraj Upadhyay 


Good catch, queued, thank you!  Please see below for my usual
wordsmithing, and please let me know if I messed something up.

Thanx, Paul



Thanks Paul, looks good!


Thanks
Neeraj




commit ab6e7609e7590e1bb220ef6b0822a823dde46f6c
Author: Neeraj Upadhyay 
Date:   Mon Jan 11 17:15:58 2021 +0530

 rcu: Fix CPU-offline trace in rcutree_dying_cpu
 
 The condition in the trace_rcu_grace_period() in rcutree_dying_cpu() is

 backwards, so that it uses the string "cpuofl" when the offline CPU is
 blocking the current grace period and "cpuofl-bgp" otherwise.  Given that
 the "-bgp" stands for "blocking grace period", this is at best misleading.
 This commit therefore switches these strings in order to correctly trace
 whether the outgoing cpu blocks the current grace period.
 
 Signed-off-by: Neeraj Upadhyay 

 Signed-off-by: Paul E. McKenney 

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index cc6b6fc..63c6dba 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2387,7 +2387,7 @@ int rcutree_dying_cpu(unsigned int cpu)
  
  	blkd = !!(rnp->qsmask & rdp->grpmask);

trace_rcu_grace_period(rcu_state.name, READ_ONCE(rnp->gp_seq),
-  blkd ? TPS("cpuofl") : TPS("cpuofl-bgp"));
+  blkd ? TPS("cpuofl-bgp") : TPS("cpuofl"));
return 0;
  }
  



--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member of the Code Aurora Forum, hosted by The Linux Foundation


[PATCH] rcu: Correct cpu offline trace in rcutree_dying_cpu

2021-01-11 Thread Neeraj Upadhyay
Correctly trace whether the outgoing cpu blocks current gp in
rcutree_dying_cpu().

Signed-off-by: Neeraj Upadhyay 
---
 kernel/rcu/tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 097990a..1f4bff4 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2364,7 +2364,7 @@ int rcutree_dying_cpu(unsigned int cpu)
 
blkd = !!(rnp->qsmask & rdp->grpmask);
trace_rcu_grace_period(rcu_state.name, READ_ONCE(rnp->gp_seq),
-  blkd ? TPS("cpuofl") : TPS("cpuofl-bgp"));
+  blkd ? TPS("cpuofl-bgp") : TPS("cpuofl"));
return 0;
 }
 
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[PATCH] rcu: Trace cbs accelerated without leaf rnp node lock held

2021-01-11 Thread Neeraj Upadhyay
Trace cbs which are accelerated without the leaf rnp node lock held, in
rcu_accelerate_cbs_unlocked().

Signed-off-by: Neeraj Upadhyay 
---
 kernel/rcu/tree.c | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 338b817..097990a 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1473,6 +1473,17 @@ static void rcu_gp_kthread_wake(void)
 swake_up_one(&rcu_state.gp_wq);
 }
 
+static void rcu_trace_accelerated_cbs(struct rcu_data *rdp,
+ unsigned long gp_seq_req)
+{
+   rcu_lockdep_assert_cblist_protected(rdp);
+   /* Trace depending on how much we were able to accelerate. */
+   if (rcu_segcblist_restempty(&rdp->cblist, RCU_WAIT_TAIL))
+   trace_rcu_grace_period(rcu_state.name, gp_seq_req, TPS("AccWaitCB"));
+   else
+   trace_rcu_grace_period(rcu_state.name, gp_seq_req, TPS("AccReadyCB"));
+}
+
 /*
  * If there is room, assign a ->gp_seq number to any callbacks on this
  * CPU that have not already been assigned.  Also accelerate any callbacks
@@ -1511,12 +1522,7 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
 if (!rcu_segcblist_accelerate(&rdp->cblist, gp_seq_req))
 return ret;
 ret = rcu_start_this_gp(rnp, rdp, gp_seq_req);
-
-   /* Trace depending on how much we were able to accelerate. */
-   if (rcu_segcblist_restempty(&rdp->cblist, RCU_WAIT_TAIL))
-   trace_rcu_grace_period(rcu_state.name, gp_seq_req, TPS("AccWaitCB"));
-   else
-   trace_rcu_grace_period(rcu_state.name, gp_seq_req, TPS("AccReadyCB"));
+   rcu_trace_accelerated_cbs(rdp, gp_seq_req);
 
return ret;
 }
@@ -1538,7 +1544,8 @@ static void rcu_accelerate_cbs_unlocked(struct rcu_node *rnp,
 c = rcu_seq_snap(&rcu_state.gp_seq);
 if (!READ_ONCE(rdp->gpwrap) && ULONG_CMP_GE(rdp->gp_seq_needed, c)) {
 /* Old request still live, so mark recent callbacks. */
-   (void)rcu_segcblist_accelerate(&rdp->cblist, c);
+   if (rcu_segcblist_accelerate(&rdp->cblist, c))
+   rcu_trace_accelerated_cbs(rdp, c);
return;
}
raw_spin_lock_rcu_node(rnp); /* irqs already disabled. */
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[PATCH] rcu: Only trace when cbs are accelerated

2021-01-11 Thread Neeraj Upadhyay
Fix rcu_accelerate_cbs() traces to only trace when cbs are actually
accelerated in the current call.

Signed-off-by: Neeraj Upadhyay 
---
 kernel/rcu/tree.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 804e543..338b817 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1508,8 +1508,9 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
  * number.
  */
 gp_seq_req = rcu_seq_snap(&rcu_state.gp_seq);
-   if (rcu_segcblist_accelerate(&rdp->cblist, gp_seq_req))
-   ret = rcu_start_this_gp(rnp, rdp, gp_seq_req);
+   if (!rcu_segcblist_accelerate(&rdp->cblist, gp_seq_req))
+   return ret;
+   ret = rcu_start_this_gp(rnp, rdp, gp_seq_req);
 
/* Trace depending on how much we were able to accelerate. */
 if (rcu_segcblist_restempty(&rdp->cblist, RCU_WAIT_TAIL))
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[PATCH] rcu: Fix dynticks_nmi_nesting underflow check in rcu_is_cpu_rrupt_from_idle

2020-12-23 Thread Neeraj Upadhyay
Commit 806f04e9fd2c ("rcu: Allow for smp_call_function() running
callbacks from idle") added a check to rcu_is_cpu_rrupt_from_idle()
that allows a dynticks_nmi_nesting value of 0 for smp_call_function()
callbacks running from the idle loop.  However, that commit missed
updating a preexisting underflow check of dynticks_nmi_nesting, which
still insists on a non-zero positive value.
Fix this warning and, while at it, read the counter only once.

Signed-off-by: Neeraj Upadhyay 
---

Hi,

I was not able to get this warning, with scftorture.

  RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nmi_nesting) <= 0,
"RCU dynticks_nmi_nesting counter underflow/zero!");

I am not sure whether the idle-loop smp_call_function() optimization is
already present in mainline.

Another thing I am not sure of: maybe lockdep gets disabled in the
idle-loop contexts where rcu_is_cpu_rrupt_from_idle() is called?
Was the original intention to keep the lockdep-based
RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nmi_nesting) <= 0)
check separate from the idle-task-context nesting-value check
WARN_ON_ONCE(!nesting && !is_idle_task(current))?


Thanks
Neeraj

 kernel/rcu/tree.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index bc8b489..c3037cf 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -457,11 +457,10 @@ static int rcu_is_cpu_rrupt_from_idle(void)
/* Check for counter underflows */
RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nesting) < 0,
 "RCU dynticks_nesting counter underflow!");
-   RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nmi_nesting) <= 0,
-"RCU dynticks_nmi_nesting counter underflow/zero!");
+   nesting = __this_cpu_read(rcu_data.dynticks_nmi_nesting);
+   RCU_LOCKDEP_WARN(nesting < 0, "RCU dynticks_nmi_nesting counter underflow!");
 
/* Are we at first interrupt nesting level? */
-   nesting = __this_cpu_read(rcu_data.dynticks_nmi_nesting);
if (nesting > 1)
return false;
 
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[tip: core/rcu] rcu: Clarify nocb kthreads naming in RCU_NOCB_CPU config

2020-12-13 Thread tip-bot2 for Neeraj Upadhyay
The following commit has been merged into the core/rcu branch of tip:

Commit-ID: a3941517fcd6625adc540aef5ec3f717c8fa71e8
Gitweb:
https://git.kernel.org/tip/a3941517fcd6625adc540aef5ec3f717c8fa71e8
Author: Neeraj Upadhyay
AuthorDate: Thu, 24 Sep 2020 12:04:10 +05:30
Committer: Paul E. McKenney 
CommitterDate: Thu, 19 Nov 2020 19:37:16 -08:00

rcu: Clarify nocb kthreads naming in RCU_NOCB_CPU config

This commit clarifies that the "p" and the "s" in the RCU_NOCB_CPU
config-option description refer to the "x" in the "rcuox/N" kthread name.

Signed-off-by: Neeraj Upadhyay 
[ paulmck: While in the area, update description and advice. ]
Signed-off-by: Paul E. McKenney 
---
 kernel/rcu/Kconfig | 20 
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
index b71e21f..cdc57b4 100644
--- a/kernel/rcu/Kconfig
+++ b/kernel/rcu/Kconfig
@@ -221,19 +221,23 @@ config RCU_NOCB_CPU
  Use this option to reduce OS jitter for aggressive HPC or
  real-time workloads.  It can also be used to offload RCU
  callback invocation to energy-efficient CPUs in battery-powered
- asymmetric multiprocessors.
+ asymmetric multiprocessors.  The price of this reduced jitter
+ is that the overhead of call_rcu() increases and that some
+ workloads will incur significant increases in context-switch
+ rates.
 
  This option offloads callback invocation from the set of CPUs
  specified at boot time by the rcu_nocbs parameter.  For each
  such CPU, a kthread ("rcuox/N") will be created to invoke
  callbacks, where the "N" is the CPU being offloaded, and where
- the "p" for RCU-preempt (PREEMPTION kernels) and "s" for RCU-sched
- (!PREEMPTION kernels).  Nothing prevents this kthread from running
- on the specified CPUs, but (1) the kthreads may be preempted
- between each callback, and (2) affinity or cgroups can be used
- to force the kthreads to run on whatever set of CPUs is desired.
-
- Say Y here if you want to help to debug reduced OS jitter.
+ the "x" is "p" for RCU-preempt (PREEMPTION kernels) and "s" for
+ RCU-sched (!PREEMPTION kernels).  Nothing prevents this kthread
+ from running on the specified CPUs, but (1) the kthreads may be
+ preempted between each callback, and (2) affinity or cgroups can
+ be used to force the kthreads to run on whatever set of CPUs is
+ desired.
+
+ Say Y here if you need reduced OS jitter, despite added overhead.
  Say N here if you are unsure.
 
 config TASKS_TRACE_RCU_READ_MB


[tip: core/rcu] rcu: Fix single-CPU check in rcu_blocking_is_gp()

2020-12-13 Thread tip-bot2 for Neeraj Upadhyay
The following commit has been merged into the core/rcu branch of tip:

Commit-ID: ed73860cecc3ec12aa50a6dcfb4900e5b4ae9507
Gitweb:
https://git.kernel.org/tip/ed73860cecc3ec12aa50a6dcfb4900e5b4ae9507
Author: Neeraj Upadhyay
AuthorDate: Wed, 23 Sep 2020 12:59:33 +05:30
Committer: Paul E. McKenney 
CommitterDate: Thu, 19 Nov 2020 19:37:16 -08:00

rcu: Fix single-CPU check in rcu_blocking_is_gp()

Currently, for CONFIG_PREEMPTION=n kernels, rcu_blocking_is_gp() uses
num_online_cpus() to determine whether there is only one CPU online.  When
there is only a single CPU online, the simple fact that synchronize_rcu()
could be legally called implies that a full grace period has elapsed.
Therefore, in the single-CPU case, synchronize_rcu() simply returns
immediately.  Unfortunately, num_online_cpus() is unreliable while a
CPU-hotplug operation is transitioning to or from single-CPU operation
because:

1.  num_online_cpus() uses atomic_read(&__num_online_cpus) to
locklessly sample the number of online CPUs.  The hotplug locks
are not held, which means that an incoming CPU can concurrently
update this count.  This in turn means that an RCU read-side
critical section on the incoming CPU might observe updates
prior to the grace period, but also that this critical section
might extend beyond the end of the optimized synchronize_rcu().
This breaks RCU's fundamental guarantee.

2.  In addition, num_online_cpus() does no ordering, thus providing
another way that RCU's fundamental guarantee can be broken by
the current code.

3.  The most probable failure mode happens on outgoing CPUs.
The outgoing CPU updates the count of online CPUs in the
CPUHP_TEARDOWN_CPU stop-machine handler, which is fine in
and of itself due to preemption being disabled at the call
to num_online_cpus().  Unfortunately, after that stop-machine
handler returns, the CPU takes one last trip through the
scheduler (which has RCU readers) and, after the resulting
context switch, one final dive into the idle loop.  During this
time, RCU needs to keep track of two CPUs, but num_online_cpus()
will say that there is only one, which in turn means that the
surviving CPU will incorrectly ignore the outgoing CPU's RCU
read-side critical sections.

This problem is illustrated by the following litmus test in which P0()
corresponds to synchronize_rcu() and P1() corresponds to the incoming CPU.
The herd7 tool confirms that the "exists" clause can be satisfied,
thus demonstrating that this breakage can happen according to the Linux
kernel memory model.

   {
 int x = 0;
 atomic_t numonline = ATOMIC_INIT(1);
   }

   P0(int *x, atomic_t *numonline)
   {
 int r0;
 WRITE_ONCE(*x, 1);
 r0 = atomic_read(numonline);
 if (r0 == 1) {
   smp_mb();
 } else {
   synchronize_rcu();
 }
 WRITE_ONCE(*x, 2);
   }

   P1(int *x, atomic_t *numonline)
   {
 int r0; int r1;

 atomic_inc(numonline);
 smp_mb();
 rcu_read_lock();
 r0 = READ_ONCE(*x);
 smp_rmb();
 r1 = READ_ONCE(*x);
 rcu_read_unlock();
   }

   locations [x;numonline;]

   exists (1:r0=0 /\ 1:r1=2)

It is important to note that these problems arise only when the system
is transitioning to or from single-CPU operation.

One solution would be to hold the CPU-hotplug locks while sampling
num_online_cpus(), which was in fact the intent of the (redundant)
preempt_disable() and preempt_enable() surrounding this call to
num_online_cpus().  Actually blocking CPU hotplug would not only result
in excessive overhead, but would also unnecessarily impede CPU-hotplug
operations.

This commit therefore follows long-standing RCU tradition by maintaining
a separate RCU-specific set of CPU-hotplug books.

This separate set of books is implemented by a new ->n_online_cpus field
in the rcu_state structure that maintains RCU's count of the online CPUs.
This count is incremented early in the CPU-online process, so that
the critical transition away from single-CPU operation will occur when
there is only a single CPU.  Similarly for the critical transition to
single-CPU operation, the counter is decremented late in the CPU-offline
process, again while there is only a single CPU.  Because there is only
ever a single CPU when the ->n_online_cpus field undergoes the critical
1->2 and 2->1 transitions, full memory ordering and mutual exclusion is
provided implicitly and, better yet, for free.
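
As a rough standalone model of this separate bookkeeping (hypothetical
helper names; the real code lives in RCU's CPU-hotplug paths and in
rcu_blocking_is_gp()):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for rcu_state.n_online_cpus (model only, not kernel code). */
static atomic_int n_online_cpus = 1;   /* Boot CPU. */

/* Incremented early in the CPU-online process... */
static void model_cpu_coming_online(void) { atomic_fetch_add(&n_online_cpus, 1); }
/* ...and decremented late in the CPU-offline process. */
static void model_cpu_gone_offline(void)  { atomic_fetch_sub(&n_online_cpus, 1); }

/* The single-CPU fast path: safe only because the critical 1->2 and
 * 2->1 transitions happen while there really is only one CPU running. */
static bool model_blocking_is_gp(void)
{
	return atomic_load(&n_online_cpus) <= 1;
}

int main(void)
{
	printf("single CPU, fast path: %d\n", model_blocking_is_gp());
	model_cpu_coming_online();  /* second CPU coming online */
	printf("two CPUs,  fast path: %d\n", model_blocking_is_gp());
	model_cpu_gone_offline();   /* back to one CPU */
	printf("single CPU, fast path: %d\n", model_blocking_is_gp());
	return 0;
}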

In the case where the CPU is coming online, nothing will happen until
the current CPU helps it come online.  Therefore, the new CPU will see
all accesses prior to the optimized grace period, which means that RCU
does not need to further delay this new CPU.  In the case where the CPU
is going offline, the outgoing CPU is totally out of the picture before
the optimized 

Re: [PATCH v2 tip/core/rcu 1/6] srcu: Make Tiny SRCU use multi-bit grace-period counter

2020-11-28 Thread Neeraj Upadhyay




On 11/28/2020 7:46 AM, Paul E. McKenney wrote:

On Wed, Nov 25, 2020 at 10:03:26AM +0530, Neeraj Upadhyay wrote:



On 11/24/2020 10:48 AM, Neeraj Upadhyay wrote:



On 11/24/2020 1:25 AM, Paul E. McKenney wrote:

On Mon, Nov 23, 2020 at 10:01:13AM +0530, Neeraj Upadhyay wrote:

On 11/21/2020 6:29 AM, paul...@kernel.org wrote:

From: "Paul E. McKenney" 

There is a need for a polling interface for SRCU grace periods.  This
polling needs to distinguish between an SRCU instance being idle on the
one hand or in the middle of a grace period on the other.  This commit
therefore converts the Tiny SRCU srcu_struct structure's srcu_idx from
a de facto boolean to a free-running counter, using the bottom bit to
indicate that a grace period is in progress.  The second-from-bottom
bit is thus used as the index returned by srcu_read_lock().

Link:
https://lore.kernel.org/rcu/20201112201547.gf3365...@moria.home.lan/
Reported-by: Kent Overstreet 
[ paulmck: Fix __srcu_read_lock() idx computation per Neeraj Upadhyay. ]
Signed-off-by: Paul E. McKenney 
---
    include/linux/srcutiny.h | 4 ++--
    kernel/rcu/srcutiny.c    | 5 +++--
    2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/include/linux/srcutiny.h b/include/linux/srcutiny.h
index 5a5a194..d9edb67 100644
--- a/include/linux/srcutiny.h
+++ b/include/linux/srcutiny.h
@@ -15,7 +15,7 @@
    struct srcu_struct {
    short srcu_lock_nesting[2];    /* srcu_read_lock()
nesting depth. */
-    short srcu_idx;    /* Current reader array element. */
+    unsigned short srcu_idx;    /* Current reader array
element in bit 0x2. */
    u8 srcu_gp_running;    /* GP workqueue running? */
    u8 srcu_gp_waiting;    /* GP waiting for readers? */
    struct swait_queue_head srcu_wq;
@@ -59,7 +59,7 @@ static inline int __srcu_read_lock(struct
srcu_struct *ssp)
    {
    int idx;
-    idx = READ_ONCE(ssp->srcu_idx);
+    idx = ((READ_ONCE(ssp->srcu_idx) + 1) & 0x2) >> 1;
    WRITE_ONCE(ssp->srcu_lock_nesting[idx],
ssp->srcu_lock_nesting[idx] + 1);
    return idx;
    }


Need a change in the idx calculation in srcu_torture_stats_print()?

static inline void srcu_torture_stats_print(struct srcu_struct *ssp,
    idx = READ_ONCE(ssp->srcu_idx) & 0x1;


Excellent point!  It should match the calculation in __srcu_read_lock(),
shouldn't it?  I have updated this, thank you!

     Thanx, Paul



Updated version looks good!


Thanks
Neeraj



For the version in rcu -dev:

Reviewed-by: Neeraj Upadhyay 


I applied all of these, thank you very much!



Welcome :)


The only minor point I have is that the idx calculation could be made an inline
function (though srcu_drive_gp() does not require a READ_ONCE for ->srcu_idx):

__srcu_read_lock() and srcu_torture_stats_print() are using

idx = ((READ_ONCE(ssp->srcu_idx) + 1) & 0x2) >> 1;

whereas srcu_drive_gp() uses:

idx = (ssp->srcu_idx & 0x2) / 2;


They do work on different elements of the various arrays.  Or do you
believe that the srcu_drive_gp() use needs adjusting?


My bad, I missed that they are using different elements of the array.
Please ignore this comment.
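
For anyone following along, here is a small standalone demo of the two
index computations quoted above (values only, not the kernel code): during
a grace period the bottom bit of the counter is set, so readers land on the
array element that the grace period is not waiting on.

#include <stdio.h>

int main(void)
{
	/* Free-running Tiny SRCU counter: bit 0x1 = GP in progress,
	 * bit 0x2 selects the reader array element. */
	for (unsigned int srcu_idx = 0; srcu_idx < 4; srcu_idx++) {
		int reader_idx  = ((srcu_idx + 1) & 0x2) >> 1; /* __srcu_read_lock() */
		int gp_wait_idx = (srcu_idx & 0x2) / 2;        /* srcu_drive_gp() */
		printf("srcu_idx=%u  readers use [%d]  GP waits on [%d]\n",
		       srcu_idx, reader_idx, gp_wait_idx);
	}
	return 0;
}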


Thanks
Neeraj



Either way, the overhead of READ_ONCE() is absolutely not at all
a problem.  Would you like to put together a patch so that I can see
exactly what you are suggesting?

Thanx, Paul


Thanks
Neeraj


Thanks
Neeraj


diff --git a/kernel/rcu/srcutiny.c b/kernel/rcu/srcutiny.c
index 6208c1d..5598cf6 100644
--- a/kernel/rcu/srcutiny.c
+++ b/kernel/rcu/srcutiny.c
@@ -124,11 +124,12 @@ void srcu_drive_gp(struct work_struct *wp)
    ssp->srcu_cb_head = NULL;
    ssp->srcu_cb_tail = &ssp->srcu_cb_head;
    local_irq_enable();
-    idx = ssp->srcu_idx;
-    WRITE_ONCE(ssp->srcu_idx, !ssp->srcu_idx);
+    idx = (ssp->srcu_idx & 0x2) / 2;
+    WRITE_ONCE(ssp->srcu_idx, ssp->srcu_idx + 1);
    WRITE_ONCE(ssp->srcu_gp_waiting, true);  /*
srcu_read_unlock() wakes! */
    swait_event_exclusive(ssp->srcu_wq,
!READ_ONCE(ssp->srcu_lock_nesting[idx]));
    WRITE_ONCE(ssp->srcu_gp_waiting, false); /*
srcu_read_unlock() cheap. */
+    WRITE_ONCE(ssp->srcu_idx, ssp->srcu_idx + 1);
    /* Invoke the callbacks we removed above. */
    while (lh) {



--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is
a member of
the Code Aurora Forum, hosted by The Linux Foundation




--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of
the Code Aurora Forum, hosted by The Linux Foundation


--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member of the Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH v2 tip/core/rcu 6/6] srcu: Document polling interfaces for Tree SRCU grace periods

2020-11-27 Thread Neeraj Upadhyay




On 11/21/2020 6:29 AM, paul...@kernel.org wrote:

From: "Paul E. McKenney" 

This commit adds requirements documentation for the
get_state_synchronize_srcu(), start_poll_synchronize_srcu(), and
poll_state_synchronize_srcu() functions.

Link: https://lore.kernel.org/rcu/20201112201547.gf3365...@moria.home.lan/
Reported-by: Kent Overstreet 
Signed-off-by: Paul E. McKenney 
---


Reviewed-by: Neeraj Upadhyay 


Thanks
Neeraj


  Documentation/RCU/Design/Requirements/Requirements.rst | 18 ++
  1 file changed, 18 insertions(+)

diff --git a/Documentation/RCU/Design/Requirements/Requirements.rst 
b/Documentation/RCU/Design/Requirements/Requirements.rst
index 1e3df77..2dce79d 100644
--- a/Documentation/RCU/Design/Requirements/Requirements.rst
+++ b/Documentation/RCU/Design/Requirements/Requirements.rst
@@ -2600,6 +2600,24 @@ also includes DEFINE_SRCU(), DEFINE_STATIC_SRCU(), and
  init_srcu_struct() APIs for defining and initializing
  ``srcu_struct`` structures.
  
+More recently, the SRCU API has added polling interfaces:

+
+#. start_poll_synchronize_srcu() returns a cookie identifying
+   the completion of a future SRCU grace period and ensures
+   that this grace period will be started.
+#. poll_state_synchronize_srcu() returns ``true`` iff the
+   specified cookie corresponds to an already-completed
+   SRCU grace period.
+#. get_state_synchronize_srcu() returns a cookie just like
+   start_poll_synchronize_srcu() does, but differs in that
+   it does nothing to ensure that any future SRCU grace period
+   will be started.
+
+These functions are used to avoid unnecessary SRCU grace periods in
+certain types of buffer-cache algorithms having multi-stage age-out
+mechanisms.  The idea is that by the time the block has aged completely
+from the cache, an SRCU grace period will be very likely to have elapsed.
+
  Tasks RCU
  ~
  



--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member of the Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH v2 tip/core/rcu 5/6] srcu: Provide polling interfaces for Tree SRCU grace periods

2020-11-26 Thread Neeraj Upadhyay




On 11/21/2020 6:29 AM, paul...@kernel.org wrote:

From: "Paul E. McKenney" 

There is a need for a polling interface for SRCU grace
periods, so this commit supplies get_state_synchronize_srcu(),
start_poll_synchronize_srcu(), and poll_state_synchronize_srcu() for this
purpose.  The first can be used if future grace periods are inevitable
(perhaps due to a later call_srcu() invocation), the second if future
grace periods might not otherwise happen, and the third to check if a
grace period has elapsed since the corresponding call to either of the
first two.

As with get_state_synchronize_rcu() and cond_synchronize_rcu(),
the return value from either get_state_synchronize_srcu() or
start_poll_synchronize_srcu() must be passed in to a later call to
poll_state_synchronize_srcu().
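
As a usage sketch (not taken from the patch; it assumes a DEFINE_SRCU()
instance named my_srcu and omits error handling), the cookie is recorded
up front and checked later without blocking:

#include <linux/srcu.h>

/* Sketch only; in real code this would live in some driver. */
DEFINE_SRCU(my_srcu);

static unsigned long my_cookie;

static void my_record_grace_period(void)
{
	/* Return a cookie for a future GP and ensure that GP is started.
	 * If a later call_srcu() is guaranteed anyway, the cheaper
	 * get_state_synchronize_srcu(&my_srcu) could be used instead. */
	my_cookie = start_poll_synchronize_srcu(&my_srcu);
}

static bool my_grace_period_elapsed(void)
{
	/* True iff a full SRCU grace period has elapsed since the cookie
	 * was taken, so objects removed before then may now be freed. */
	return poll_state_synchronize_srcu(&my_srcu, my_cookie);
}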

Link: https://lore.kernel.org/rcu/20201112201547.gf3365...@moria.home.lan/
Reported-by: Kent Overstreet 
[ paulmck: Add EXPORT_SYMBOL_GPL() per kernel test robot feedback. ]
[ paulmck: Apply feedback from Neeraj Upadhyay. ]
Link: https://lore.kernel.org/lkml/20201117004017.GA7444@paulmck-ThinkPad-P72/
Signed-off-by: Paul E. McKenney 
---


For version in -rcu dev

Reviewed-by: Neeraj Upadhyay 


Thanks
Neeraj


  kernel/rcu/srcutiny.c |  2 +-
  kernel/rcu/srcutree.c | 67 ---
  2 files changed, 64 insertions(+), 5 deletions(-)

diff --git a/kernel/rcu/srcutiny.c b/kernel/rcu/srcutiny.c
index b073175..c8f4202 100644
--- a/kernel/rcu/srcutiny.c
+++ b/kernel/rcu/srcutiny.c
@@ -148,7 +148,7 @@ void srcu_drive_gp(struct work_struct *wp)
 * straighten that out.
 */
WRITE_ONCE(ssp->srcu_gp_running, false);
-   if (USHORT_CMP_GE(ssp->srcu_idx, READ_ONCE(ssp->srcu_idx_max)))
+   if (USHORT_CMP_LT(ssp->srcu_idx, READ_ONCE(ssp->srcu_idx_max)))
 schedule_work(&ssp->srcu_work);
  }
  EXPORT_SYMBOL_GPL(srcu_drive_gp);
diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index d930ece..945c047 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -810,7 +810,8 @@ static void srcu_leak_callback(struct rcu_head *rhp)
  /*
   * Start an SRCU grace period, and also queue the callback if non-NULL.
   */
-static void srcu_gp_start_if_needed(struct srcu_struct *ssp, struct rcu_head 
*rhp, bool do_norm)
+static unsigned long srcu_gp_start_if_needed(struct srcu_struct *ssp,
+struct rcu_head *rhp, bool do_norm)
  {
unsigned long flags;
int idx;
@@ -819,10 +820,12 @@ static void srcu_gp_start_if_needed(struct srcu_struct 
*ssp, struct rcu_head *rh
unsigned long s;
struct srcu_data *sdp;
  
+	check_init_srcu_struct(ssp);

idx = srcu_read_lock(ssp);
sdp = raw_cpu_ptr(ssp->sda);
spin_lock_irqsave_rcu_node(sdp, flags);
-   rcu_segcblist_enqueue(&sdp->srcu_cblist, rhp);
+   if (rhp)
+   rcu_segcblist_enqueue(&sdp->srcu_cblist, rhp);
rcu_segcblist_advance(&sdp->srcu_cblist,
  rcu_seq_current(&ssp->srcu_gp_seq));
s = rcu_seq_snap(&ssp->srcu_gp_seq);
@@ -841,6 +844,7 @@ static void srcu_gp_start_if_needed(struct srcu_struct 
*ssp, struct rcu_head *rh
else if (needexp)
srcu_funnel_exp_start(ssp, sdp->mynode, s);
srcu_read_unlock(ssp, idx);
+   return s;
  }
  
  /*

@@ -874,7 +878,6 @@ static void srcu_gp_start_if_needed(struct srcu_struct 
*ssp, struct rcu_head *rh
  static void __call_srcu(struct srcu_struct *ssp, struct rcu_head *rhp,
rcu_callback_t func, bool do_norm)
  {
-   check_init_srcu_struct(ssp);
if (debug_rcu_head_queue(rhp)) {
/* Probable double call_srcu(), so leak the callback. */
WRITE_ONCE(rhp->func, srcu_leak_callback);
@@ -882,7 +885,7 @@ static void __call_srcu(struct srcu_struct *ssp, struct 
rcu_head *rhp,
return;
}
rhp->func = func;
-   srcu_gp_start_if_needed(ssp, rhp, do_norm);
+   (void)srcu_gp_start_if_needed(ssp, rhp, do_norm);
  }
  
  /**

@@ -1011,6 +1014,62 @@ void synchronize_srcu(struct srcu_struct *ssp)
  }
  EXPORT_SYMBOL_GPL(synchronize_srcu);
  
+/**

+ * get_state_synchronize_srcu - Provide an end-of-grace-period cookie
+ * @ssp: srcu_struct to provide cookie for.
+ *
+ * This function returns a cookie that can be passed to
+ * poll_state_synchronize_srcu(), which will return true if a full grace
+ * period has elapsed in the meantime.  It is the caller's responsibility
+ * to make sure that grace period happens, for example, by invoking
+ * call_srcu() after return from get_state_synchronize_srcu().
+ */
+unsigned long get_state_synchronize_srcu(struct srcu_struct *ssp)
+{
+   // Any prior manipulation of SRCU-protected data must happen
+   // before the load from ->srcu_gp_seq.
+   smp_mb();
+   return rcu_seq_snap(&ssp->srcu_gp_seq);
+}
EXPORT_SYMBOL_GPL(get_state_synchronize_srcu);

Re: [PATCH v2 tip/core/rcu 4/6] srcu: Provide polling interfaces for Tiny SRCU grace periods

2020-11-24 Thread Neeraj Upadhyay




On 11/25/2020 1:00 AM, Paul E. McKenney wrote:

On Tue, Nov 24, 2020 at 10:44:24AM +0530, Neeraj Upadhyay wrote:



On 11/24/2020 2:42 AM, Paul E. McKenney wrote:

On Mon, Nov 23, 2020 at 10:13:13AM +0530, Neeraj Upadhyay wrote:



On 11/21/2020 6:29 AM, paul...@kernel.org wrote:

From: "Paul E. McKenney" 

There is a need for a polling interface for SRCU grace
periods, so this commit supplies get_state_synchronize_srcu(),
start_poll_synchronize_srcu(), and poll_state_synchronize_srcu() for this
purpose.  The first can be used if future grace periods are inevitable
(perhaps due to a later call_srcu() invocation), the second if future
grace periods might not otherwise happen, and the third to check if a
grace period has elapsed since the corresponding call to either of the
first two.

As with get_state_synchronize_rcu() and cond_synchronize_rcu(),
the return value from either get_state_synchronize_srcu() or
start_poll_synchronize_srcu() must be passed in to a later call to
poll_state_synchronize_srcu().

Link: https://lore.kernel.org/rcu/20201112201547.gf3365...@moria.home.lan/
Reported-by: Kent Overstreet 
[ paulmck: Add EXPORT_SYMBOL_GPL() per kernel test robot feedback. ]
[ paulmck: Apply feedback from Neeraj Upadhyay. ]
Link: https://lore.kernel.org/lkml/20201117004017.GA7444@paulmck-ThinkPad-P72/
Signed-off-by: Paul E. McKenney 
---
include/linux/rcupdate.h |  2 ++
include/linux/srcu.h |  3 +++
include/linux/srcutiny.h |  1 +
kernel/rcu/srcutiny.c| 52 
++--
4 files changed, 56 insertions(+), 2 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index de08264..e09c0d8 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -33,6 +33,8 @@
#define ULONG_CMP_GE(a, b)  (ULONG_MAX / 2 >= (a) - (b))
#define ULONG_CMP_LT(a, b)  (ULONG_MAX / 2 < (a) - (b))
#define ulong2long(a)   (*(long *)(&(a)))
+#define USHORT_CMP_GE(a, b)  (USHRT_MAX / 2 >= (unsigned short)((a) - (b)))
+#define USHORT_CMP_LT(a, b)  (USHRT_MAX / 2 < (unsigned short)((a) - (b)))
/* Exported common interfaces */
void call_rcu(struct rcu_head *head, rcu_callback_t func);
diff --git a/include/linux/srcu.h b/include/linux/srcu.h
index e432cc9..a0895bb 100644
--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h
@@ -60,6 +60,9 @@ void cleanup_srcu_struct(struct srcu_struct *ssp);
int __srcu_read_lock(struct srcu_struct *ssp) __acquires(ssp);
void __srcu_read_unlock(struct srcu_struct *ssp, int idx) __releases(ssp);
void synchronize_srcu(struct srcu_struct *ssp);
+unsigned long get_state_synchronize_srcu(struct srcu_struct *ssp);
+unsigned long start_poll_synchronize_srcu(struct srcu_struct *ssp);
+bool poll_state_synchronize_srcu(struct srcu_struct *ssp, unsigned long 
cookie);
#ifdef CONFIG_DEBUG_LOCK_ALLOC
diff --git a/include/linux/srcutiny.h b/include/linux/srcutiny.h
index d9edb67..c7f0c1f 100644
--- a/include/linux/srcutiny.h
+++ b/include/linux/srcutiny.h
@@ -16,6 +16,7 @@
struct srcu_struct {
short srcu_lock_nesting[2]; /* srcu_read_lock() nesting depth. */
unsigned short srcu_idx;/* Current reader array element in bit 
0x2. */
+   unsigned short srcu_idx_max;/* Furthest future srcu_idx request. */
u8 srcu_gp_running; /* GP workqueue running? */
u8 srcu_gp_waiting; /* GP waiting for readers? */
struct swait_queue_head srcu_wq;
diff --git a/kernel/rcu/srcutiny.c b/kernel/rcu/srcutiny.c
index 3bac1db..b073175 100644
--- a/kernel/rcu/srcutiny.c
+++ b/kernel/rcu/srcutiny.c
@@ -34,6 +34,7 @@ static int init_srcu_struct_fields(struct srcu_struct *ssp)
ssp->srcu_gp_running = false;
ssp->srcu_gp_waiting = false;
ssp->srcu_idx = 0;
+   ssp->srcu_idx_max = 0;
 INIT_WORK(&ssp->srcu_work, srcu_drive_gp);
 INIT_LIST_HEAD(&ssp->srcu_work.entry);
return 0;


Minor: cleanup_srcu_struct() can probably have 2 new sanity checks?

WARN_ON(ssp->srcu_idx != ssp->srcu_idx_max);
WARN_ON(ssp->srcu_idx & 1);


Good point, added and under test.


Thanks
Neeraj


@@ -114,7 +115,7 @@ void srcu_drive_gp(struct work_struct *wp)
struct srcu_struct *ssp;
ssp = container_of(wp, struct srcu_struct, srcu_work);
-   if (ssp->srcu_gp_running || !READ_ONCE(ssp->srcu_cb_head))
+   if (ssp->srcu_gp_running || USHORT_CMP_GE(ssp->srcu_idx, 
READ_ONCE(ssp->srcu_idx_max)))
return; /* Already running or nothing to do. */
/* Remove recently arrived callbacks and wait for readers. */
@@ -147,13 +148,18 @@ void srcu_drive_gp(struct work_struct *wp)
 * straighten that out.
 */
WRITE_ONCE(ssp->srcu_gp_running, false);
-   if (READ_ONCE(ssp->srcu_cb_head))
+   if (USHORT_CMP_GE(ssp->srcu_idx, READ_ONCE(ssp->srcu_idx_max)))

Re: [PATCH v2 tip/core/rcu 1/6] srcu: Make Tiny SRCU use multi-bit grace-period counter

2020-11-24 Thread Neeraj Upadhyay




On 11/24/2020 10:48 AM, Neeraj Upadhyay wrote:



On 11/24/2020 1:25 AM, Paul E. McKenney wrote:

On Mon, Nov 23, 2020 at 10:01:13AM +0530, Neeraj Upadhyay wrote:

On 11/21/2020 6:29 AM, paul...@kernel.org wrote:

From: "Paul E. McKenney" 

There is a need for a polling interface for SRCU grace periods.  This
polling needs to distinguish between an SRCU instance being idle on the
one hand or in the middle of a grace period on the other.  This commit
therefore converts the Tiny SRCU srcu_struct structure's srcu_idx from
a de facto boolean to a free-running counter, using the bottom bit to
indicate that a grace period is in progress.  The second-from-bottom
bit is thus used as the index returned by srcu_read_lock().

Link: 
https://lore.kernel.org/rcu/20201112201547.gf3365...@moria.home.lan/

Reported-by: Kent Overstreet 
[ paulmck: Fix __srcu_read_lock() idx computation per Neeraj Upadhyay. ]

Signed-off-by: Paul E. McKenney 
---
   include/linux/srcutiny.h | 4 ++--
   kernel/rcu/srcutiny.c    | 5 +++--
   2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/include/linux/srcutiny.h b/include/linux/srcutiny.h
index 5a5a194..d9edb67 100644
--- a/include/linux/srcutiny.h
+++ b/include/linux/srcutiny.h
@@ -15,7 +15,7 @@
   struct srcu_struct {
   short srcu_lock_nesting[2];    /* srcu_read_lock() nesting 
depth. */

-    short srcu_idx;    /* Current reader array element. */
+    unsigned short srcu_idx;    /* Current reader array element in 
bit 0x2. */

   u8 srcu_gp_running;    /* GP workqueue running? */
   u8 srcu_gp_waiting;    /* GP waiting for readers? */
   struct swait_queue_head srcu_wq;
@@ -59,7 +59,7 @@ static inline int __srcu_read_lock(struct 
srcu_struct *ssp)

   {
   int idx;
-    idx = READ_ONCE(ssp->srcu_idx);
+    idx = ((READ_ONCE(ssp->srcu_idx) + 1) & 0x2) >> 1;
   WRITE_ONCE(ssp->srcu_lock_nesting[idx], 
ssp->srcu_lock_nesting[idx] + 1);

   return idx;
   }


Need a change in the idx calculation in srcu_torture_stats_print()?

static inline void srcu_torture_stats_print(struct srcu_struct *ssp,
   idx = READ_ONCE(ssp->srcu_idx) & 0x1;


Excellent point!  It should match the calculation in __srcu_read_lock(),
shouldn't it?  I have updated this, thank you!

    Thanx, Paul



Updated version looks good!


Thanks
Neeraj



For the version in rcu -dev:

Reviewed-by: Neeraj Upadhyay 

Only minor point which I have is, the idx calculation can be made an 
inline func (though srcu_drive_gp() does not require a READ_ONCE for 
->srcu_idx):


__srcu_read_lock() and srcu_torture_stats_print() are using

idx = ((READ_ONCE(ssp->srcu_idx) + 1) & 0x2) >> 1;

whereas srcu_drive_gp() uses:

idx = (ssp->srcu_idx & 0x2) / 2;

Thanks
Neeraj


Thanks
Neeraj


diff --git a/kernel/rcu/srcutiny.c b/kernel/rcu/srcutiny.c
index 6208c1d..5598cf6 100644
--- a/kernel/rcu/srcutiny.c
+++ b/kernel/rcu/srcutiny.c
@@ -124,11 +124,12 @@ void srcu_drive_gp(struct work_struct *wp)
   ssp->srcu_cb_head = NULL;
    ssp->srcu_cb_tail = &ssp->srcu_cb_head;
   local_irq_enable();
-    idx = ssp->srcu_idx;
-    WRITE_ONCE(ssp->srcu_idx, !ssp->srcu_idx);
+    idx = (ssp->srcu_idx & 0x2) / 2;
+    WRITE_ONCE(ssp->srcu_idx, ssp->srcu_idx + 1);
   WRITE_ONCE(ssp->srcu_gp_waiting, true);  /* 
srcu_read_unlock() wakes! */
   swait_event_exclusive(ssp->srcu_wq, 
!READ_ONCE(ssp->srcu_lock_nesting[idx]));
   WRITE_ONCE(ssp->srcu_gp_waiting, false); /* 
srcu_read_unlock() cheap. */

+    WRITE_ONCE(ssp->srcu_idx, ssp->srcu_idx + 1);
   /* Invoke the callbacks we removed above. */
   while (lh) {



--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member of

the Code Aurora Forum, hosted by The Linux Foundation




--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member of the Code Aurora Forum, hosted by The Linux Foundation


Re: AMU extension v1 support for cortex A76, A77, A78 CPUs

2020-11-23 Thread Neeraj Upadhyay

Thanks Marc, Vladimir, Mark, Sudeep for your inputs!


Thanks
Neeraj


On 11/20/2020 3:43 PM, Mark Rutland wrote:

On Fri, Nov 20, 2020 at 09:09:00AM +, Vladimir Murzin wrote:

On 11/20/20 8:56 AM, Marc Zyngier wrote:

On 2020-11-20 04:30, Neeraj Upadhyay wrote:

Hi,

For ARM cortex A76, A77, A78 cores (which as per TRM, support AMU)
AA64PFR0[47:44] field is not set, and AMU does not get enabled for
them.
Can you please provide support for these CPUs in cpufeature.c?


If that was the case, that'd be an erratum, and it would need to be
documented as such. It could also be that this is an optional feature
for these cores (though the TRM doesn't suggest that).

Can someone at ARM confirm what is the expected behaviour of these CPUs?


Not a confirmation, but IIRC, these are imp-def features, while our cpufeature
code catches architected ones.


We generally don't make use of IMP-DEF featurees because of all the pain
it brings.

Looking at the Cortex-A76 TRM, the encoding for AMCNTENCLR is:

  Op0: 3  (0b11)
  Op1: 3  (0b011)
  CRn: 15 (0b1111)
  CRm: 9  (0b1001)
  Op2: 7  (0b111)

... whereas the architected encoding (from our sysreg.h) is:

  Op0: 3
  Op1: 3
  CRn: 13
  CRm: 2
  Op2: 4

... so that's a different register with the same name, which is
confusing and unfortunate.
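
To make the comparison concrete, here is a small standalone demo that packs
the two (Op0, Op1, CRn, CRm, Op2) tuples with a local helper (illustrative
packing only, not the kernel's sys_reg() macro):

#include <stdio.h>

/* Arbitrary local packing of an (Op0, Op1, CRn, CRm, Op2) tuple; it only
 * serves to show that the two encodings name different registers. */
static unsigned int enc(unsigned int o0, unsigned int o1, unsigned int crn,
			unsigned int crm, unsigned int o2)
{
	return (o0 << 16) | (o1 << 12) | (crn << 8) | (crm << 4) | o2;
}

int main(void)
{
	unsigned int impdef = enc(3, 3, 15, 9, 7);  /* Cortex-A76 AMCNTENCLR */
	unsigned int arch   = enc(3, 3, 13, 2, 4);  /* architected encoding  */

	printf("imp-def:     0x%05x\n", impdef);
	printf("architected: 0x%05x\n", arch);
	printf("same register? %s\n", impdef == arch ? "yes" : "no");
	return 0;
}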

The encodings are different (and I haven't checked whether the fields /
semantics are the same), so it's not just a matter of wiring up new
detection code. There are also IMP-DEF traps in ACTLR_EL3 and ACTLR_EL2
which we can't be certain of the configuration of, and as the registers
are in the IMP-DEF encoding space they'll be trapped by HCR_EL2.TIDCP
and emulated as UNDEFINED by a hypervisor. All of that means that going
by the MIDR alone is not sufficient to know we can safely access the
registers.

So as usual for IMP-DEF stuff I don't think we can or should make use of
this.

Thanks,
Mark.



--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member of the Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH v2 tip/core/rcu 1/6] srcu: Make Tiny SRCU use multi-bit grace-period counter

2020-11-23 Thread Neeraj Upadhyay




On 11/24/2020 1:25 AM, Paul E. McKenney wrote:

On Mon, Nov 23, 2020 at 10:01:13AM +0530, Neeraj Upadhyay wrote:

On 11/21/2020 6:29 AM, paul...@kernel.org wrote:

From: "Paul E. McKenney" 

There is a need for a polling interface for SRCU grace periods.  This
polling needs to distinguish between an SRCU instance being idle on the
one hand or in the middle of a grace period on the other.  This commit
therefore converts the Tiny SRCU srcu_struct structure's srcu_idx from
a de facto boolean to a free-running counter, using the bottom bit to
indicate that a grace period is in progress.  The second-from-bottom
bit is thus used as the index returned by srcu_read_lock().

Link: https://lore.kernel.org/rcu/20201112201547.gf3365...@moria.home.lan/
Reported-by: Kent Overstreet 
[ paulmck: Fix __srcu_read_lock() idx computation per Neeraj Upadhyay. ]
Signed-off-by: Paul E. McKenney 
---
   include/linux/srcutiny.h | 4 ++--
   kernel/rcu/srcutiny.c| 5 +++--
   2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/include/linux/srcutiny.h b/include/linux/srcutiny.h
index 5a5a194..d9edb67 100644
--- a/include/linux/srcutiny.h
+++ b/include/linux/srcutiny.h
@@ -15,7 +15,7 @@
   struct srcu_struct {
short srcu_lock_nesting[2]; /* srcu_read_lock() nesting depth. */
-   short srcu_idx; /* Current reader array element. */
+   unsigned short srcu_idx;/* Current reader array element in bit 
0x2. */
u8 srcu_gp_running; /* GP workqueue running? */
u8 srcu_gp_waiting; /* GP waiting for readers? */
struct swait_queue_head srcu_wq;
@@ -59,7 +59,7 @@ static inline int __srcu_read_lock(struct srcu_struct *ssp)
   {
int idx;
-   idx = READ_ONCE(ssp->srcu_idx);
+   idx = ((READ_ONCE(ssp->srcu_idx) + 1) & 0x2) >> 1;
WRITE_ONCE(ssp->srcu_lock_nesting[idx], ssp->srcu_lock_nesting[idx] + 
1);
return idx;
   }


Need a change in the idx calculation in srcu_torture_stats_print()?

static inline void srcu_torture_stats_print(struct srcu_struct *ssp,
   idx = READ_ONCE(ssp->srcu_idx) & 0x1;


Excellent point!  It should match the calculation in __srcu_read_lock(),
shouldn't it?  I have updated this, thank you!

Thanx, Paul



Updated version looks good!


Thanks
Neeraj


Thanks
Neeraj


diff --git a/kernel/rcu/srcutiny.c b/kernel/rcu/srcutiny.c
index 6208c1d..5598cf6 100644
--- a/kernel/rcu/srcutiny.c
+++ b/kernel/rcu/srcutiny.c
@@ -124,11 +124,12 @@ void srcu_drive_gp(struct work_struct *wp)
ssp->srcu_cb_head = NULL;
 ssp->srcu_cb_tail = &ssp->srcu_cb_head;
local_irq_enable();
-   idx = ssp->srcu_idx;
-   WRITE_ONCE(ssp->srcu_idx, !ssp->srcu_idx);
+   idx = (ssp->srcu_idx & 0x2) / 2;
+   WRITE_ONCE(ssp->srcu_idx, ssp->srcu_idx + 1);
WRITE_ONCE(ssp->srcu_gp_waiting, true);  /* srcu_read_unlock() wakes! */
swait_event_exclusive(ssp->srcu_wq, 
!READ_ONCE(ssp->srcu_lock_nesting[idx]));
WRITE_ONCE(ssp->srcu_gp_waiting, false); /* srcu_read_unlock() cheap. */
+   WRITE_ONCE(ssp->srcu_idx, ssp->srcu_idx + 1);
/* Invoke the callbacks we removed above. */
while (lh) {



--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of
the Code Aurora Forum, hosted by The Linux Foundation


--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member of the Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH v2 tip/core/rcu 4/6] srcu: Provide polling interfaces for Tiny SRCU grace periods

2020-11-23 Thread Neeraj Upadhyay




On 11/24/2020 2:42 AM, Paul E. McKenney wrote:

On Mon, Nov 23, 2020 at 10:13:13AM +0530, Neeraj Upadhyay wrote:



On 11/21/2020 6:29 AM, paul...@kernel.org wrote:

From: "Paul E. McKenney" 

There is a need for a polling interface for SRCU grace
periods, so this commit supplies get_state_synchronize_srcu(),
start_poll_synchronize_srcu(), and poll_state_synchronize_srcu() for this
purpose.  The first can be used if future grace periods are inevitable
(perhaps due to a later call_srcu() invocation), the second if future
grace periods might not otherwise happen, and the third to check if a
grace period has elapsed since the corresponding call to either of the
first two.

As with get_state_synchronize_rcu() and cond_synchronize_rcu(),
the return value from either get_state_synchronize_srcu() or
start_poll_synchronize_srcu() must be passed in to a later call to
poll_state_synchronize_srcu().

Link: https://lore.kernel.org/rcu/20201112201547.gf3365...@moria.home.lan/
Reported-by: Kent Overstreet 
[ paulmck: Add EXPORT_SYMBOL_GPL() per kernel test robot feedback. ]
[ paulmck: Apply feedback from Neeraj Upadhyay. ]
Link: https://lore.kernel.org/lkml/20201117004017.GA7444@paulmck-ThinkPad-P72/
Signed-off-by: Paul E. McKenney 
---
   include/linux/rcupdate.h |  2 ++
   include/linux/srcu.h |  3 +++
   include/linux/srcutiny.h |  1 +
   kernel/rcu/srcutiny.c| 52 
++--
   4 files changed, 56 insertions(+), 2 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index de08264..e09c0d8 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -33,6 +33,8 @@
   #define ULONG_CMP_GE(a, b)   (ULONG_MAX / 2 >= (a) - (b))
   #define ULONG_CMP_LT(a, b)   (ULONG_MAX / 2 < (a) - (b))
   #define ulong2long(a)(*(long *)(&(a)))
+#define USHORT_CMP_GE(a, b)  (USHRT_MAX / 2 >= (unsigned short)((a) - (b)))
+#define USHORT_CMP_LT(a, b)  (USHRT_MAX / 2 < (unsigned short)((a) - (b)))
   /* Exported common interfaces */
   void call_rcu(struct rcu_head *head, rcu_callback_t func);
diff --git a/include/linux/srcu.h b/include/linux/srcu.h
index e432cc9..a0895bb 100644
--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h
@@ -60,6 +60,9 @@ void cleanup_srcu_struct(struct srcu_struct *ssp);
   int __srcu_read_lock(struct srcu_struct *ssp) __acquires(ssp);
   void __srcu_read_unlock(struct srcu_struct *ssp, int idx) __releases(ssp);
   void synchronize_srcu(struct srcu_struct *ssp);
+unsigned long get_state_synchronize_srcu(struct srcu_struct *ssp);
+unsigned long start_poll_synchronize_srcu(struct srcu_struct *ssp);
+bool poll_state_synchronize_srcu(struct srcu_struct *ssp, unsigned long 
cookie);
   #ifdef CONFIG_DEBUG_LOCK_ALLOC
diff --git a/include/linux/srcutiny.h b/include/linux/srcutiny.h
index d9edb67..c7f0c1f 100644
--- a/include/linux/srcutiny.h
+++ b/include/linux/srcutiny.h
@@ -16,6 +16,7 @@
   struct srcu_struct {
short srcu_lock_nesting[2]; /* srcu_read_lock() nesting depth. */
unsigned short srcu_idx;/* Current reader array element in bit 
0x2. */
+   unsigned short srcu_idx_max;/* Furthest future srcu_idx request. */
u8 srcu_gp_running; /* GP workqueue running? */
u8 srcu_gp_waiting; /* GP waiting for readers? */
struct swait_queue_head srcu_wq;
diff --git a/kernel/rcu/srcutiny.c b/kernel/rcu/srcutiny.c
index 3bac1db..b073175 100644
--- a/kernel/rcu/srcutiny.c
+++ b/kernel/rcu/srcutiny.c
@@ -34,6 +34,7 @@ static int init_srcu_struct_fields(struct srcu_struct *ssp)
ssp->srcu_gp_running = false;
ssp->srcu_gp_waiting = false;
ssp->srcu_idx = 0;
+   ssp->srcu_idx_max = 0;
INIT_WORK(&ssp->srcu_work, srcu_drive_gp);
INIT_LIST_HEAD(&ssp->srcu_work.entry);
return 0;


Minor: cleanup_srcu_struct() can probably have 2 new sanity checks?

WARN_ON(ssp->srcu_idx != ssp->srcu_idx_max);
WARN_ON(ssp->srcu_idx & 1);


Good point, added and under test.
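
For reference, a sketch of where those two checks could sit in Tiny SRCU's
cleanup_srcu_struct(); the surrounding WARN_ON()s paraphrase the existing
function and may not match the final committed form exactly:

void cleanup_srcu_struct(struct srcu_struct *ssp)
{
	WARN_ON(ssp->srcu_lock_nesting[0] || ssp->srcu_lock_nesting[1]);
	flush_work(&ssp->srcu_work);
	WARN_ON(ssp->srcu_gp_running);
	WARN_ON(ssp->srcu_gp_waiting);
	WARN_ON(ssp->srcu_cb_head);
	WARN_ON(&ssp->srcu_cb_head != ssp->srcu_cb_tail);
	/* The two checks suggested above: instance idle, no GP in flight. */
	WARN_ON(ssp->srcu_idx != ssp->srcu_idx_max);
	WARN_ON(ssp->srcu_idx & 0x1);
}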


Thanks
Neeraj


@@ -114,7 +115,7 @@ void srcu_drive_gp(struct work_struct *wp)
struct srcu_struct *ssp;
ssp = container_of(wp, struct srcu_struct, srcu_work);
-   if (ssp->srcu_gp_running || !READ_ONCE(ssp->srcu_cb_head))
+   if (ssp->srcu_gp_running || USHORT_CMP_GE(ssp->srcu_idx, 
READ_ONCE(ssp->srcu_idx_max)))
return; /* Already running or nothing to do. */
/* Remove recently arrived callbacks and wait for readers. */
@@ -147,13 +148,18 @@ void srcu_drive_gp(struct work_struct *wp)
 * straighten that out.
 */
WRITE_ONCE(ssp->srcu_gp_running, false);
-   if (READ_ONCE(ssp->srcu_cb_head))
+   if (USHORT_CMP_GE(ssp->srcu_idx, READ_ONCE(ssp->srcu_idx_max)))
schedule_work(&ssp->srcu_work);
   }
   EXPORT_SYMBOL_GPL(srcu_drive_gp);
   static void srcu_gp_start_

Re: [PATCH v2 tip/core/rcu 4/6] srcu: Provide polling interfaces for Tiny SRCU grace periods

2020-11-22 Thread Neeraj Upadhyay




On 11/21/2020 6:29 AM, paul...@kernel.org wrote:

From: "Paul E. McKenney" 

There is a need for a polling interface for SRCU grace
periods, so this commit supplies get_state_synchronize_srcu(),
start_poll_synchronize_srcu(), and poll_state_synchronize_srcu() for this
purpose.  The first can be used if future grace periods are inevitable
(perhaps due to a later call_srcu() invocation), the second if future
grace periods might not otherwise happen, and the third to check if a
grace period has elapsed since the corresponding call to either of the
first two.

As with get_state_synchronize_rcu() and cond_synchronize_rcu(),
the return value from either get_state_synchronize_srcu() or
start_poll_synchronize_srcu() must be passed in to a later call to
poll_state_synchronize_srcu().

Link: https://lore.kernel.org/rcu/20201112201547.gf3365...@moria.home.lan/
Reported-by: Kent Overstreet 
[ paulmck: Add EXPORT_SYMBOL_GPL() per kernel test robot feedback. ]
[ paulmck: Apply feedback from Neeraj Upadhyay. ]
Link: https://lore.kernel.org/lkml/20201117004017.GA7444@paulmck-ThinkPad-P72/
Signed-off-by: Paul E. McKenney 
---
  include/linux/rcupdate.h |  2 ++
  include/linux/srcu.h |  3 +++
  include/linux/srcutiny.h |  1 +
  kernel/rcu/srcutiny.c| 52 ++--
  4 files changed, 56 insertions(+), 2 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index de08264..e09c0d8 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -33,6 +33,8 @@
  #define ULONG_CMP_GE(a, b)(ULONG_MAX / 2 >= (a) - (b))
  #define ULONG_CMP_LT(a, b)(ULONG_MAX / 2 < (a) - (b))
  #define ulong2long(a) (*(long *)(&(a)))
+#define USHORT_CMP_GE(a, b)(USHRT_MAX / 2 >= (unsigned short)((a) - (b)))
+#define USHORT_CMP_LT(a, b)(USHRT_MAX / 2 < (unsigned short)((a) - (b)))
  
  /* Exported common interfaces */

  void call_rcu(struct rcu_head *head, rcu_callback_t func);
diff --git a/include/linux/srcu.h b/include/linux/srcu.h
index e432cc9..a0895bb 100644
--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h
@@ -60,6 +60,9 @@ void cleanup_srcu_struct(struct srcu_struct *ssp);
  int __srcu_read_lock(struct srcu_struct *ssp) __acquires(ssp);
  void __srcu_read_unlock(struct srcu_struct *ssp, int idx) __releases(ssp);
  void synchronize_srcu(struct srcu_struct *ssp);
+unsigned long get_state_synchronize_srcu(struct srcu_struct *ssp);
+unsigned long start_poll_synchronize_srcu(struct srcu_struct *ssp);
+bool poll_state_synchronize_srcu(struct srcu_struct *ssp, unsigned long 
cookie);
  
  #ifdef CONFIG_DEBUG_LOCK_ALLOC
  
diff --git a/include/linux/srcutiny.h b/include/linux/srcutiny.h

index d9edb67..c7f0c1f 100644
--- a/include/linux/srcutiny.h
+++ b/include/linux/srcutiny.h
@@ -16,6 +16,7 @@
  struct srcu_struct {
short srcu_lock_nesting[2]; /* srcu_read_lock() nesting depth. */
unsigned short srcu_idx;/* Current reader array element in bit 
0x2. */
+   unsigned short srcu_idx_max;/* Furthest future srcu_idx request. */
u8 srcu_gp_running; /* GP workqueue running? */
u8 srcu_gp_waiting; /* GP waiting for readers? */
struct swait_queue_head srcu_wq;
diff --git a/kernel/rcu/srcutiny.c b/kernel/rcu/srcutiny.c
index 3bac1db..b073175 100644
--- a/kernel/rcu/srcutiny.c
+++ b/kernel/rcu/srcutiny.c
@@ -34,6 +34,7 @@ static int init_srcu_struct_fields(struct srcu_struct *ssp)
ssp->srcu_gp_running = false;
ssp->srcu_gp_waiting = false;
ssp->srcu_idx = 0;
+   ssp->srcu_idx_max = 0;
INIT_WORK(&ssp->srcu_work, srcu_drive_gp);
INIT_LIST_HEAD(&ssp->srcu_work.entry);
return 0;


Minor: cleanup_srcu_struct() can probably have 2 new sanity checks?

WARN_ON(ssp->srcu_idx != ssp->srcu_idx_max);
WARN_ON(ssp->srcu_idx & 1);

Thanks
Neeraj


@@ -114,7 +115,7 @@ void srcu_drive_gp(struct work_struct *wp)
struct srcu_struct *ssp;
  
  	ssp = container_of(wp, struct srcu_struct, srcu_work);

-   if (ssp->srcu_gp_running || !READ_ONCE(ssp->srcu_cb_head))
+   if (ssp->srcu_gp_running || USHORT_CMP_GE(ssp->srcu_idx, 
READ_ONCE(ssp->srcu_idx_max)))
return; /* Already running or nothing to do. */
  
  	/* Remove recently arrived callbacks and wait for readers. */

@@ -147,13 +148,18 @@ void srcu_drive_gp(struct work_struct *wp)
 * straighten that out.
 */
WRITE_ONCE(ssp->srcu_gp_running, false);
-   if (READ_ONCE(ssp->srcu_cb_head))
+   if (USHORT_CMP_GE(ssp->srcu_idx, READ_ONCE(ssp->srcu_idx_max)))
schedule_work(&ssp->srcu_work);
  }
  EXPORT_SYMBOL_GPL(srcu_drive_gp);
  
  static void srcu_gp_start_if_needed(struct srcu_struct *ssp)

  {
+   unsigned short cookie;
+
+   cookie = get_state_synchronize_srcu(ssp);
+   if (USHORT_CMP_LT(READ_ONCE(ssp->srcu_idx_max), co

Re: [PATCH RFC tip/core/rcu 4/5] srcu: Provide polling interfaces for Tiny SRCU grace periods

2020-11-22 Thread Neeraj Upadhyay




On 11/22/2020 11:31 PM, Paul E. McKenney wrote:

On Sun, Nov 22, 2020 at 07:57:26PM +0530, Neeraj Upadhyay wrote:

On 11/21/2020 5:43 AM, Paul E. McKenney wrote:

On Fri, Nov 20, 2020 at 05:28:32PM +0530, Neeraj Upadhyay wrote:

Hi Paul,

On 11/17/2020 6:10 AM, paul...@kernel.org wrote:

From: "Paul E. McKenney" 

There is a need for a polling interface for SRCU grace
periods, so this commit supplies get_state_synchronize_srcu(),
start_poll_synchronize_srcu(), and poll_state_synchronize_srcu() for this
purpose.  The first can be used if future grace periods are inevitable
(perhaps due to a later call_srcu() invocation), the second if future
grace periods might not otherwise happen, and the third to check if a
grace period has elapsed since the corresponding call to either of the
first two.

As with get_state_synchronize_rcu() and cond_synchronize_rcu(),
the return value from either get_state_synchronize_srcu() or
start_poll_synchronize_srcu() must be passed in to a later call to
poll_state_synchronize_srcu().

Link: https://lore.kernel.org/rcu/20201112201547.gf3365...@moria.home.lan/
Reported-by: Kent Overstreet 
[ paulmck: Add EXPORT_SYMBOL_GPL() per kernel test robot feedback. ]
Signed-off-by: Paul E. McKenney 
---
include/linux/rcupdate.h |  2 ++
include/linux/srcu.h |  3 +++
include/linux/srcutiny.h |  1 +
kernel/rcu/srcutiny.c| 52 
++--
4 files changed, 56 insertions(+), 2 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index de08264..e09c0d8 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -33,6 +33,8 @@
#define ULONG_CMP_GE(a, b)  (ULONG_MAX / 2 >= (a) - (b))
#define ULONG_CMP_LT(a, b)  (ULONG_MAX / 2 < (a) - (b))
#define ulong2long(a)   (*(long *)(&(a)))
+#define USHORT_CMP_GE(a, b)(USHRT_MAX / 2 >= (unsigned short)((a) - (b)))
+#define USHORT_CMP_LT(a, b)(USHRT_MAX / 2 < (unsigned short)((a) - (b)))
/* Exported common interfaces */
void call_rcu(struct rcu_head *head, rcu_callback_t func);
diff --git a/include/linux/srcu.h b/include/linux/srcu.h
index e432cc9..a0895bb 100644
--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h
@@ -60,6 +60,9 @@ void cleanup_srcu_struct(struct srcu_struct *ssp);
int __srcu_read_lock(struct srcu_struct *ssp) __acquires(ssp);
void __srcu_read_unlock(struct srcu_struct *ssp, int idx) __releases(ssp);
void synchronize_srcu(struct srcu_struct *ssp);
+unsigned long get_state_synchronize_srcu(struct srcu_struct *ssp);
+unsigned long start_poll_synchronize_srcu(struct srcu_struct *ssp);
+bool poll_state_synchronize_srcu(struct srcu_struct *ssp, unsigned long 
cookie);
#ifdef CONFIG_DEBUG_LOCK_ALLOC
diff --git a/include/linux/srcutiny.h b/include/linux/srcutiny.h
index fed4a2d..e9bd6fb 100644
--- a/include/linux/srcutiny.h
+++ b/include/linux/srcutiny.h
@@ -16,6 +16,7 @@
struct srcu_struct {
short srcu_lock_nesting[2]; /* srcu_read_lock() nesting depth. */
unsigned short srcu_idx;/* Current reader array element in bit 
0x2. */
+   unsigned short srcu_idx_max;/* Furthest future srcu_idx request. */
u8 srcu_gp_running; /* GP workqueue running? */
u8 srcu_gp_waiting; /* GP waiting for readers? */
struct swait_queue_head srcu_wq;
diff --git a/kernel/rcu/srcutiny.c b/kernel/rcu/srcutiny.c
index 3bac1db..b405811 100644
--- a/kernel/rcu/srcutiny.c
+++ b/kernel/rcu/srcutiny.c
@@ -34,6 +34,7 @@ static int init_srcu_struct_fields(struct srcu_struct *ssp)
ssp->srcu_gp_running = false;
ssp->srcu_gp_waiting = false;
ssp->srcu_idx = 0;
+   ssp->srcu_idx_max = 0;
INIT_WORK(&ssp->srcu_work, srcu_drive_gp);
INIT_LIST_HEAD(&ssp->srcu_work.entry);
return 0;
@@ -114,7 +115,7 @@ void srcu_drive_gp(struct work_struct *wp)
struct srcu_struct *ssp;
ssp = container_of(wp, struct srcu_struct, srcu_work);
-   if (ssp->srcu_gp_running || !READ_ONCE(ssp->srcu_cb_head))
+   if (ssp->srcu_gp_running || USHORT_CMP_GE(ssp->srcu_idx, 
READ_ONCE(ssp->srcu_idx_max)))
return; /* Already running or nothing to do. */
/* Remove recently arrived callbacks and wait for readers. */
@@ -147,14 +148,19 @@ void srcu_drive_gp(struct work_struct *wp)
 * straighten that out.
 */
WRITE_ONCE(ssp->srcu_gp_running, false);
-   if (READ_ONCE(ssp->srcu_cb_head))
+   if (USHORT_CMP_GE(ssp->srcu_idx, READ_ONCE(ssp->srcu_idx_max)))


Should this be USHORT_CMP_LT ?


I believe that you are correct.  As is, it works but does needless
grace periods.
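
To make the distinction concrete, a small userspace sketch using the macro
definitions copied from the patch (the example values are illustrative):

#include <stdio.h>
#include <limits.h>

#define USHORT_CMP_GE(a, b) (USHRT_MAX / 2 >= (unsigned short)((a) - (b)))
#define USHORT_CMP_LT(a, b) (USHRT_MAX / 2 < (unsigned short)((a) - (b)))

int main(void)
{
	unsigned short idx = 4, idx_max = 4;	/* GP just finished, no new request. */

	/* GE is true at equality, so the worker would be requeued needlessly. */
	printf("GE = %d\n", USHORT_CMP_GE(idx, idx_max));	/* prints 1 */
	/* LT means "a newer grace-period request is pending". */
	printf("LT = %d\n", USHORT_CMP_LT(idx, idx_max));	/* prints 0 */

	idx_max = 6;	/* A new request arrived while the GP was running. */
	printf("LT = %d\n", USHORT_CMP_LT(idx, idx_max));	/* prints 1 */
	return 0;
}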


schedule_work(&ssp->srcu_work);
}
EXPORT_SYMBOL_GPL(srcu_drive_gp);
static void srcu_gp_start_if_needed(struct srcu_struct *ssp)
{
+   unsigned short cookie;
+
if (!READ_ONCE(ssp->

Re: [PATCH v2 tip/core/rcu 1/6] srcu: Make Tiny SRCU use multi-bit grace-period counter

2020-11-22 Thread Neeraj Upadhyay




On 11/21/2020 6:29 AM, paul...@kernel.org wrote:

From: "Paul E. McKenney" 

There is a need for a polling interface for SRCU grace periods.  This
polling needs to distinguish between an SRCU instance being idle on the
one hand or in the middle of a grace period on the other.  This commit
therefore converts the Tiny SRCU srcu_struct structure's srcu_idx from
a defacto boolean to a free-running counter, using the bottom bit to
indicate that a grace period is in progress.  The second-from-bottom
bit is thus used as the index returned by srcu_read_lock().
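
To illustrate the encoding (these helpers are made up for this note; the
patch open-codes the bit operations, and the reader-side computation is
discussed further down the thread):

/*
 * srcu_idx is now a free-running counter:
 *   bottom bit (0x1)     - a grace period is in progress
 *   next bit (0x2) >> 1  - which srcu_lock_nesting[] element this
 *                          counter value corresponds to
 */
static inline bool srcu_idx_gp_in_progress(unsigned short srcu_idx)
{
	return srcu_idx & 0x1;
}

static inline int srcu_idx_array_element(unsigned short srcu_idx)
{
	return (srcu_idx & 0x2) >> 1;
}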

Link: https://lore.kernel.org/rcu/20201112201547.gf3365...@moria.home.lan/
Reported-by: Kent Overstreet 
[ paulmck: Fix __srcu_read_lock() idx computation per Neeraj Upadhyay. ]
Signed-off-by: Paul E. McKenney 
---
  include/linux/srcutiny.h | 4 ++--
  kernel/rcu/srcutiny.c| 5 +++--
  2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/include/linux/srcutiny.h b/include/linux/srcutiny.h
index 5a5a194..d9edb67 100644
--- a/include/linux/srcutiny.h
+++ b/include/linux/srcutiny.h
@@ -15,7 +15,7 @@
  
  struct srcu_struct {

short srcu_lock_nesting[2]; /* srcu_read_lock() nesting depth. */
-   short srcu_idx; /* Current reader array element. */
+   unsigned short srcu_idx;/* Current reader array element in bit 
0x2. */
u8 srcu_gp_running; /* GP workqueue running? */
u8 srcu_gp_waiting; /* GP waiting for readers? */
struct swait_queue_head srcu_wq;
@@ -59,7 +59,7 @@ static inline int __srcu_read_lock(struct srcu_struct *ssp)
  {
int idx;
  
-	idx = READ_ONCE(ssp->srcu_idx);

+   idx = ((READ_ONCE(ssp->srcu_idx) + 1) & 0x2) >> 1;
WRITE_ONCE(ssp->srcu_lock_nesting[idx], ssp->srcu_lock_nesting[idx] + 
1);
return idx;
  }


Need change in idx calculation in srcu_torture_stats_print()?

static inline void srcu_torture_stats_print(struct srcu_struct *ssp,
  idx = READ_ONCE(ssp->srcu_idx) & 0x1;

Thanks
Neeraj


diff --git a/kernel/rcu/srcutiny.c b/kernel/rcu/srcutiny.c
index 6208c1d..5598cf6 100644
--- a/kernel/rcu/srcutiny.c
+++ b/kernel/rcu/srcutiny.c
@@ -124,11 +124,12 @@ void srcu_drive_gp(struct work_struct *wp)
ssp->srcu_cb_head = NULL;
ssp->srcu_cb_tail = &ssp->srcu_cb_head;
local_irq_enable();
-   idx = ssp->srcu_idx;
-   WRITE_ONCE(ssp->srcu_idx, !ssp->srcu_idx);
+   idx = (ssp->srcu_idx & 0x2) / 2;
+   WRITE_ONCE(ssp->srcu_idx, ssp->srcu_idx + 1);
WRITE_ONCE(ssp->srcu_gp_waiting, true);  /* srcu_read_unlock() wakes! */
swait_event_exclusive(ssp->srcu_wq, 
!READ_ONCE(ssp->srcu_lock_nesting[idx]));
WRITE_ONCE(ssp->srcu_gp_waiting, false); /* srcu_read_unlock() cheap. */
+   WRITE_ONCE(ssp->srcu_idx, ssp->srcu_idx + 1);
  
  	/* Invoke the callbacks we removed above. */

while (lh) {



--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member of the Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH v2 tip/core/rcu 4/6] srcu: Provide polling interfaces for Tiny SRCU grace periods

2020-11-22 Thread Neeraj Upadhyay




On 11/21/2020 6:29 AM, paul...@kernel.org wrote:

From: "Paul E. McKenney" 

There is a need for a polling interface for SRCU grace
periods, so this commit supplies get_state_synchronize_srcu(),
start_poll_synchronize_srcu(), and poll_state_synchronize_srcu() for this
purpose.  The first can be used if future grace periods are inevitable
(perhaps due to a later call_srcu() invocation), the second if future
grace periods might not otherwise happen, and the third to check if a
grace period has elapsed since the corresponding call to either of the
first two.

As with get_state_synchronize_rcu() and cond_synchronize_rcu(),
the return value from either get_state_synchronize_srcu() or
start_poll_synchronize_srcu() must be passed in to a later call to
poll_state_synchronize_srcu().

Link: https://lore.kernel.org/rcu/20201112201547.gf3365...@moria.home.lan/
Reported-by: Kent Overstreet 
[ paulmck: Add EXPORT_SYMBOL_GPL() per kernel test robot feedback. ]
[ paulmck: Apply feedback from Neeraj Upadhyay. ]
Link: https://lore.kernel.org/lkml/20201117004017.GA7444@paulmck-ThinkPad-P72/
Signed-off-by: Paul E. McKenney 
---
  include/linux/rcupdate.h |  2 ++
  include/linux/srcu.h |  3 +++
  include/linux/srcutiny.h |  1 +
  kernel/rcu/srcutiny.c| 52 ++--
  4 files changed, 56 insertions(+), 2 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index de08264..e09c0d8 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -33,6 +33,8 @@
  #define ULONG_CMP_GE(a, b)(ULONG_MAX / 2 >= (a) - (b))
  #define ULONG_CMP_LT(a, b)(ULONG_MAX / 2 < (a) - (b))
  #define ulong2long(a) (*(long *)(&(a)))
+#define USHORT_CMP_GE(a, b)(USHRT_MAX / 2 >= (unsigned short)((a) - (b)))
+#define USHORT_CMP_LT(a, b)(USHRT_MAX / 2 < (unsigned short)((a) - (b)))
  
  /* Exported common interfaces */

  void call_rcu(struct rcu_head *head, rcu_callback_t func);
diff --git a/include/linux/srcu.h b/include/linux/srcu.h
index e432cc9..a0895bb 100644
--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h
@@ -60,6 +60,9 @@ void cleanup_srcu_struct(struct srcu_struct *ssp);
  int __srcu_read_lock(struct srcu_struct *ssp) __acquires(ssp);
  void __srcu_read_unlock(struct srcu_struct *ssp, int idx) __releases(ssp);
  void synchronize_srcu(struct srcu_struct *ssp);
+unsigned long get_state_synchronize_srcu(struct srcu_struct *ssp);
+unsigned long start_poll_synchronize_srcu(struct srcu_struct *ssp);
+bool poll_state_synchronize_srcu(struct srcu_struct *ssp, unsigned long 
cookie);
  
  #ifdef CONFIG_DEBUG_LOCK_ALLOC
  
diff --git a/include/linux/srcutiny.h b/include/linux/srcutiny.h

index d9edb67..c7f0c1f 100644
--- a/include/linux/srcutiny.h
+++ b/include/linux/srcutiny.h
@@ -16,6 +16,7 @@
  struct srcu_struct {
short srcu_lock_nesting[2]; /* srcu_read_lock() nesting depth. */
unsigned short srcu_idx;/* Current reader array element in bit 
0x2. */
+   unsigned short srcu_idx_max;/* Furthest future srcu_idx request. */
u8 srcu_gp_running; /* GP workqueue running? */
u8 srcu_gp_waiting; /* GP waiting for readers? */
struct swait_queue_head srcu_wq;
diff --git a/kernel/rcu/srcutiny.c b/kernel/rcu/srcutiny.c
index 3bac1db..b073175 100644
--- a/kernel/rcu/srcutiny.c
+++ b/kernel/rcu/srcutiny.c
@@ -34,6 +34,7 @@ static int init_srcu_struct_fields(struct srcu_struct *ssp)
ssp->srcu_gp_running = false;
ssp->srcu_gp_waiting = false;
ssp->srcu_idx = 0;
+   ssp->srcu_idx_max = 0;
INIT_WORK(&ssp->srcu_work, srcu_drive_gp);
INIT_LIST_HEAD(&ssp->srcu_work.entry);
return 0;
@@ -114,7 +115,7 @@ void srcu_drive_gp(struct work_struct *wp)
struct srcu_struct *ssp;
  
  	ssp = container_of(wp, struct srcu_struct, srcu_work);

-   if (ssp->srcu_gp_running || !READ_ONCE(ssp->srcu_cb_head))
+   if (ssp->srcu_gp_running || USHORT_CMP_GE(ssp->srcu_idx, 
READ_ONCE(ssp->srcu_idx_max)))
return; /* Already running or nothing to do. */
  
  	/* Remove recently arrived callbacks and wait for readers. */

@@ -147,13 +148,18 @@ void srcu_drive_gp(struct work_struct *wp)
 * straighten that out.
 */
WRITE_ONCE(ssp->srcu_gp_running, false);
-   if (READ_ONCE(ssp->srcu_cb_head))
+   if (USHORT_CMP_GE(ssp->srcu_idx, READ_ONCE(ssp->srcu_idx_max)))


USHORT_CMP_LT ?

Thanks
Neeraj


schedule_work(&ssp->srcu_work);



  }
  EXPORT_SYMBOL_GPL(srcu_drive_gp);
  
  static void srcu_gp_start_if_needed(struct srcu_struct *ssp)

  {
+   unsigned short cookie;
+
+   cookie = get_state_synchronize_srcu(ssp);
+   if (USHORT_CMP_LT(READ_ONCE(ssp->srcu_idx_max), cookie))
+   WRITE_ONCE(ssp->srcu_idx_max, cookie);
if (!READ_ONCE(ssp->srcu_gp_running)) {
if (l

Re: [PATCH RFC tip/core/rcu 4/5] srcu: Provide polling interfaces for Tiny SRCU grace periods

2020-11-22 Thread Neeraj Upadhyay




On 11/21/2020 5:43 AM, Paul E. McKenney wrote:

On Fri, Nov 20, 2020 at 05:28:32PM +0530, Neeraj Upadhyay wrote:

Hi Paul,

On 11/17/2020 6:10 AM, paul...@kernel.org wrote:

From: "Paul E. McKenney" 

There is a need for a polling interface for SRCU grace
periods, so this commit supplies get_state_synchronize_srcu(),
start_poll_synchronize_srcu(), and poll_state_synchronize_srcu() for this
purpose.  The first can be used if future grace periods are inevitable
(perhaps due to a later call_srcu() invocation), the second if future
grace periods might not otherwise happen, and the third to check if a
grace period has elapsed since the corresponding call to either of the
first two.

As with get_state_synchronize_rcu() and cond_synchronize_rcu(),
the return value from either get_state_synchronize_srcu() or
start_poll_synchronize_srcu() must be passed in to a later call to
poll_state_synchronize_srcu().

Link: https://lore.kernel.org/rcu/20201112201547.gf3365...@moria.home.lan/
Reported-by: Kent Overstreet 
[ paulmck: Add EXPORT_SYMBOL_GPL() per kernel test robot feedback. ]
Signed-off-by: Paul E. McKenney 
---
   include/linux/rcupdate.h |  2 ++
   include/linux/srcu.h |  3 +++
   include/linux/srcutiny.h |  1 +
   kernel/rcu/srcutiny.c| 52 
++--
   4 files changed, 56 insertions(+), 2 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index de08264..e09c0d8 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -33,6 +33,8 @@
   #define ULONG_CMP_GE(a, b)   (ULONG_MAX / 2 >= (a) - (b))
   #define ULONG_CMP_LT(a, b)   (ULONG_MAX / 2 < (a) - (b))
   #define ulong2long(a)(*(long *)(&(a)))
+#define USHORT_CMP_GE(a, b)(USHRT_MAX / 2 >= (unsigned short)((a) - (b)))
+#define USHORT_CMP_LT(a, b)(USHRT_MAX / 2 < (unsigned short)((a) - (b)))
   /* Exported common interfaces */
   void call_rcu(struct rcu_head *head, rcu_callback_t func);
diff --git a/include/linux/srcu.h b/include/linux/srcu.h
index e432cc9..a0895bb 100644
--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h
@@ -60,6 +60,9 @@ void cleanup_srcu_struct(struct srcu_struct *ssp);
   int __srcu_read_lock(struct srcu_struct *ssp) __acquires(ssp);
   void __srcu_read_unlock(struct srcu_struct *ssp, int idx) __releases(ssp);
   void synchronize_srcu(struct srcu_struct *ssp);
+unsigned long get_state_synchronize_srcu(struct srcu_struct *ssp);
+unsigned long start_poll_synchronize_srcu(struct srcu_struct *ssp);
+bool poll_state_synchronize_srcu(struct srcu_struct *ssp, unsigned long 
cookie);
   #ifdef CONFIG_DEBUG_LOCK_ALLOC
diff --git a/include/linux/srcutiny.h b/include/linux/srcutiny.h
index fed4a2d..e9bd6fb 100644
--- a/include/linux/srcutiny.h
+++ b/include/linux/srcutiny.h
@@ -16,6 +16,7 @@
   struct srcu_struct {
short srcu_lock_nesting[2]; /* srcu_read_lock() nesting depth. */
unsigned short srcu_idx;/* Current reader array element in bit 
0x2. */
+   unsigned short srcu_idx_max;/* Furthest future srcu_idx request. */
u8 srcu_gp_running; /* GP workqueue running? */
u8 srcu_gp_waiting; /* GP waiting for readers? */
struct swait_queue_head srcu_wq;
diff --git a/kernel/rcu/srcutiny.c b/kernel/rcu/srcutiny.c
index 3bac1db..b405811 100644
--- a/kernel/rcu/srcutiny.c
+++ b/kernel/rcu/srcutiny.c
@@ -34,6 +34,7 @@ static int init_srcu_struct_fields(struct srcu_struct *ssp)
ssp->srcu_gp_running = false;
ssp->srcu_gp_waiting = false;
ssp->srcu_idx = 0;
+   ssp->srcu_idx_max = 0;
INIT_WORK(&ssp->srcu_work, srcu_drive_gp);
INIT_LIST_HEAD(&ssp->srcu_work.entry);
return 0;
@@ -114,7 +115,7 @@ void srcu_drive_gp(struct work_struct *wp)
struct srcu_struct *ssp;
ssp = container_of(wp, struct srcu_struct, srcu_work);
-   if (ssp->srcu_gp_running || !READ_ONCE(ssp->srcu_cb_head))
+   if (ssp->srcu_gp_running || USHORT_CMP_GE(ssp->srcu_idx, 
READ_ONCE(ssp->srcu_idx_max)))
return; /* Already running or nothing to do. */
/* Remove recently arrived callbacks and wait for readers. */
@@ -147,14 +148,19 @@ void srcu_drive_gp(struct work_struct *wp)
 * straighten that out.
 */
WRITE_ONCE(ssp->srcu_gp_running, false);
-   if (READ_ONCE(ssp->srcu_cb_head))
+   if (USHORT_CMP_GE(ssp->srcu_idx, READ_ONCE(ssp->srcu_idx_max)))


Should this be USHORT_CMP_LT ?


I believe that you are correct.  As is, it works but does needless
grace periods.


schedule_work(&ssp->srcu_work);
   }
   EXPORT_SYMBOL_GPL(srcu_drive_gp);
   static void srcu_gp_start_if_needed(struct srcu_struct *ssp)
   {
+   unsigned short cookie;
+
if (!READ_ONCE(ssp->srcu_gp_running)) {
+   cookie = get_state_synchronize_srcu(ssp);
+   if (USHORT_CMP_LT(READ_ONCE(ssp-

Re: [PATCH RFC tip/core/rcu 5/5] srcu: Provide polling interfaces for Tree SRCU grace periods

2020-11-22 Thread Neeraj Upadhyay




On 11/21/2020 5:46 AM, Paul E. McKenney wrote:

On Fri, Nov 20, 2020 at 05:31:43PM +0530, Neeraj Upadhyay wrote:



On 11/17/2020 6:10 AM, paul...@kernel.org wrote:

From: "Paul E. McKenney" 

There is a need for a polling interface for SRCU grace
periods, so this commit supplies get_state_synchronize_srcu(),
start_poll_synchronize_srcu(), and poll_state_synchronize_srcu() for this
purpose.  The first can be used if future grace periods are inevitable
(perhaps due to a later call_srcu() invocation), the second if future
grace periods might not otherwise happen, and the third to check if a
grace period has elapsed since the corresponding call to either of the
first two.

As with get_state_synchronize_rcu() and cond_synchronize_rcu(),
the return value from either get_state_synchronize_srcu() or
start_poll_synchronize_srcu() must be passed in to a later call to
poll_state_synchronize_srcu().

Link: https://lore.kernel.org/rcu/20201112201547.gf3365...@moria.home.lan/
Reported-by: Kent Overstreet 
[ paulmck: Add EXPORT_SYMBOL_GPL() per kernel test robot feedback. ]
Signed-off-by: Paul E. McKenney 
---
   kernel/rcu/srcutree.c | 63 
---
   1 file changed, 60 insertions(+), 3 deletions(-)

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index d930ece..015d80e 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -810,7 +810,8 @@ static void srcu_leak_callback(struct rcu_head *rhp)
   /*
* Start an SRCU grace period, and also queue the callback if non-NULL.
*/
-static void srcu_gp_start_if_needed(struct srcu_struct *ssp, struct rcu_head 
*rhp, bool do_norm)
+static unsigned long srcu_gp_start_if_needed(struct srcu_struct *ssp,
+struct rcu_head *rhp, bool do_norm)
   {
unsigned long flags;
int idx;
@@ -822,7 +823,8 @@ static void srcu_gp_start_if_needed(struct srcu_struct 
*ssp, struct rcu_head *rh
idx = srcu_read_lock(ssp);
sdp = raw_cpu_ptr(ssp->sda);
spin_lock_irqsave_rcu_node(sdp, flags);
-   rcu_segcblist_enqueue(&sdp->srcu_cblist, rhp);
+   if (rhp)
+   rcu_segcblist_enqueue(&sdp->srcu_cblist, rhp);
rcu_segcblist_advance(&sdp->srcu_cblist,
  rcu_seq_current(&ssp->srcu_gp_seq));
s = rcu_seq_snap(&ssp->srcu_gp_seq);
@@ -841,6 +843,7 @@ static void srcu_gp_start_if_needed(struct srcu_struct 
*ssp, struct rcu_head *rh
else if (needexp)
srcu_funnel_exp_start(ssp, sdp->mynode, s);
srcu_read_unlock(ssp, idx);
+   return s;
   }
   /*
@@ -882,7 +885,7 @@ static void __call_srcu(struct srcu_struct *ssp, struct 
rcu_head *rhp,
return;
}
rhp->func = func;
-   srcu_gp_start_if_needed(ssp, rhp, do_norm);
+   (void)srcu_gp_start_if_needed(ssp, rhp, do_norm);
   }
   /**
@@ -1011,6 +1014,60 @@ void synchronize_srcu(struct srcu_struct *ssp)
   }
   EXPORT_SYMBOL_GPL(synchronize_srcu);
+/**
+ * get_state_synchronize_srcu - Provide an end-of-grace-period cookie
+ * @ssp: srcu_struct to provide cookie for.
+ *
+ * This function returns a cookie that can be passed to
+ * poll_state_synchronize_srcu(), which will return true if a full grace
+ * period has elapsed in the meantime.  It is the caller's responsibility
+ * to make sure that grace period happens, for example, by invoking
+ * call_srcu() after return from get_state_synchronize_srcu().
+ */
+unsigned long get_state_synchronize_srcu(struct srcu_struct *ssp)
+{
+   // Any prior manipulation of SRCU-protected data must happen
+   // before the load from ->srcu_gp_seq.
+   smp_mb();
+   return rcu_seq_snap(&ssp->srcu_gp_seq);
+}
+EXPORT_SYMBOL_GPL(get_state_synchronize_srcu);
+
+/**
+ * start_poll_synchronize_srcu - Provide cookie and start grace period
+ * @ssp: srcu_struct to provide cookie for.
+ *
+ * This function returns a cookie that can be passed to
+ * poll_state_synchronize_srcu(), which will return true if a full grace
+ * period has elapsed in the meantime.  Unlike get_state_synchronize_srcu(),
+ * this function also ensures that any needed SRCU grace period will be
+ * started.  This convenience does come at a cost in terms of CPU overhead.
+ */
+unsigned long start_poll_synchronize_srcu(struct srcu_struct *ssp)
+{
+   return srcu_gp_start_if_needed(ssp, NULL, true);
+}
+EXPORT_SYMBOL_GPL(start_poll_synchronize_srcu);
+
+/**
+ * poll_state_synchronize_srcu - Has cookie's grace period ended?
+ * @ssp: srcu_struct to provide cookie for.
+ * @cookie: Return value from get_state_synchronize_srcu() or 
start_poll_synchronize_srcu().
+ *
+ * This function takes the cookie that was returned from either
+ * get_state_synchronize_srcu() or start_poll_synchronize_srcu(), and
+ * returns @true if an SRCU grace period elapsed since the time that the
+ * cookie was created.
+ */
+bool poll_state_synchronize_srcu(struc

Re: [PATCH RFC tip/core/rcu 5/5] srcu: Provide polling interfaces for Tree SRCU grace periods

2020-11-20 Thread Neeraj Upadhyay




On 11/17/2020 6:10 AM, paul...@kernel.org wrote:

From: "Paul E. McKenney" 

There is a need for a polling interface for SRCU grace
periods, so this commit supplies get_state_synchronize_srcu(),
start_poll_synchronize_srcu(), and poll_state_synchronize_srcu() for this
purpose.  The first can be used if future grace periods are inevitable
(perhaps due to a later call_srcu() invocation), the second if future
grace periods might not otherwise happen, and the third to check if a
grace period has elapsed since the corresponding call to either of the
first two.

As with get_state_synchronize_rcu() and cond_synchronize_rcu(),
the return value from either get_state_synchronize_srcu() or
start_poll_synchronize_srcu() must be passed in to a later call to
poll_state_synchronize_srcu().

Link: https://lore.kernel.org/rcu/20201112201547.gf3365...@moria.home.lan/
Reported-by: Kent Overstreet 
[ paulmck: Add EXPORT_SYMBOL_GPL() per kernel test robot feedback. ]
Signed-off-by: Paul E. McKenney 
---
  kernel/rcu/srcutree.c | 63 ---
  1 file changed, 60 insertions(+), 3 deletions(-)

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index d930ece..015d80e 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -810,7 +810,8 @@ static void srcu_leak_callback(struct rcu_head *rhp)
  /*
   * Start an SRCU grace period, and also queue the callback if non-NULL.
   */
-static void srcu_gp_start_if_needed(struct srcu_struct *ssp, struct rcu_head 
*rhp, bool do_norm)
+static unsigned long srcu_gp_start_if_needed(struct srcu_struct *ssp,
+struct rcu_head *rhp, bool do_norm)
  {
unsigned long flags;
int idx;
@@ -822,7 +823,8 @@ static void srcu_gp_start_if_needed(struct srcu_struct 
*ssp, struct rcu_head *rh
idx = srcu_read_lock(ssp);
sdp = raw_cpu_ptr(ssp->sda);
spin_lock_irqsave_rcu_node(sdp, flags);
-   rcu_segcblist_enqueue(&sdp->srcu_cblist, rhp);
+   if (rhp)
+   rcu_segcblist_enqueue(&sdp->srcu_cblist, rhp);
rcu_segcblist_advance(&sdp->srcu_cblist,
  rcu_seq_current(&ssp->srcu_gp_seq));
s = rcu_seq_snap(&ssp->srcu_gp_seq);
@@ -841,6 +843,7 @@ static void srcu_gp_start_if_needed(struct srcu_struct 
*ssp, struct rcu_head *rh
else if (needexp)
srcu_funnel_exp_start(ssp, sdp->mynode, s);
srcu_read_unlock(ssp, idx);
+   return s;
  }
  
  /*

@@ -882,7 +885,7 @@ static void __call_srcu(struct srcu_struct *ssp, struct 
rcu_head *rhp,
return;
}
rhp->func = func;
-   srcu_gp_start_if_needed(ssp, rhp, do_norm);
+   (void)srcu_gp_start_if_needed(ssp, rhp, do_norm);
  }
  
  /**

@@ -1011,6 +1014,60 @@ void synchronize_srcu(struct srcu_struct *ssp)
  }
  EXPORT_SYMBOL_GPL(synchronize_srcu);
  
+/**

+ * get_state_synchronize_srcu - Provide an end-of-grace-period cookie
+ * @ssp: srcu_struct to provide cookie for.
+ *
+ * This function returns a cookie that can be passed to
+ * poll_state_synchronize_srcu(), which will return true if a full grace
+ * period has elapsed in the meantime.  It is the caller's responsibility
+ * to make sure that grace period happens, for example, by invoking
+ * call_srcu() after return from get_state_synchronize_srcu().
+ */
+unsigned long get_state_synchronize_srcu(struct srcu_struct *ssp)
+{
+   // Any prior manipulation of SRCU-protected data must happen
+   // before the load from ->srcu_gp_seq.
+   smp_mb();
+   return rcu_seq_snap(&ssp->srcu_gp_seq);
+}
+EXPORT_SYMBOL_GPL(get_state_synchronize_srcu);
+
+/**
+ * start_poll_synchronize_srcu - Provide cookie and start grace period
+ * @ssp: srcu_struct to provide cookie for.
+ *
+ * This function returns a cookie that can be passed to
+ * poll_state_synchronize_srcu(), which will return true if a full grace
+ * period has elapsed in the meantime.  Unlike get_state_synchronize_srcu(),
+ * this function also ensures that any needed SRCU grace period will be
+ * started.  This convenience does come at a cost in terms of CPU overhead.
+ */
+unsigned long start_poll_synchronize_srcu(struct srcu_struct *ssp)
+{
+   return srcu_gp_start_if_needed(ssp, NULL, true);
+}
+EXPORT_SYMBOL_GPL(start_poll_synchronize_srcu);
+
+/**
+ * poll_state_synchronize_srcu - Has cookie's grace period ended?
+ * @ssp: srcu_struct to provide cookie for.
+ * @cookie: Return value from get_state_synchronize_srcu() or 
start_poll_synchronize_srcu().
+ *
+ * This function takes the cookie that was returned from either
+ * get_state_synchronize_srcu() or start_poll_synchronize_srcu(), and
+ * returns @true if an SRCU grace period elapsed since the time that the
+ * cookie was created.
+ */
+bool poll_state_synchronize_srcu(struct srcu_struct *ssp, unsigned long cookie)
+{
+   if (!rcu_seq_done(&ssp->srcu_gp_seq, cookie))
+   return false;
+   smp_mb(); // ^^^


Minor: Should this comment be 

Re: [PATCH RFC tip/core/rcu 4/5] srcu: Provide polling interfaces for Tiny SRCU grace periods

2020-11-20 Thread Neeraj Upadhyay

Hi Paul,

On 11/17/2020 6:10 AM, paul...@kernel.org wrote:

From: "Paul E. McKenney" 

There is a need for a polling interface for SRCU grace
periods, so this commit supplies get_state_synchronize_srcu(),
start_poll_synchronize_srcu(), and poll_state_synchronize_srcu() for this
purpose.  The first can be used if future grace periods are inevitable
(perhaps due to a later call_srcu() invocation), the second if future
grace periods might not otherwise happen, and the third to check if a
grace period has elapsed since the corresponding call to either of the
first two.

As with get_state_synchronize_rcu() and cond_synchronize_rcu(),
the return value from either get_state_synchronize_srcu() or
start_poll_synchronize_srcu() must be passed in to a later call to
poll_state_synchronize_srcu().

Link: https://lore.kernel.org/rcu/20201112201547.gf3365...@moria.home.lan/
Reported-by: Kent Overstreet 
[ paulmck: Add EXPORT_SYMBOL_GPL() per kernel test robot feedback. ]
Signed-off-by: Paul E. McKenney 
---
  include/linux/rcupdate.h |  2 ++
  include/linux/srcu.h |  3 +++
  include/linux/srcutiny.h |  1 +
  kernel/rcu/srcutiny.c| 52 ++--
  4 files changed, 56 insertions(+), 2 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index de08264..e09c0d8 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -33,6 +33,8 @@
  #define ULONG_CMP_GE(a, b)(ULONG_MAX / 2 >= (a) - (b))
  #define ULONG_CMP_LT(a, b)(ULONG_MAX / 2 < (a) - (b))
  #define ulong2long(a) (*(long *)(&(a)))
+#define USHORT_CMP_GE(a, b)(USHRT_MAX / 2 >= (unsigned short)((a) - (b)))
+#define USHORT_CMP_LT(a, b)(USHRT_MAX / 2 < (unsigned short)((a) - (b)))
  
  /* Exported common interfaces */

  void call_rcu(struct rcu_head *head, rcu_callback_t func);
diff --git a/include/linux/srcu.h b/include/linux/srcu.h
index e432cc9..a0895bb 100644
--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h
@@ -60,6 +60,9 @@ void cleanup_srcu_struct(struct srcu_struct *ssp);
  int __srcu_read_lock(struct srcu_struct *ssp) __acquires(ssp);
  void __srcu_read_unlock(struct srcu_struct *ssp, int idx) __releases(ssp);
  void synchronize_srcu(struct srcu_struct *ssp);
+unsigned long get_state_synchronize_srcu(struct srcu_struct *ssp);
+unsigned long start_poll_synchronize_srcu(struct srcu_struct *ssp);
+bool poll_state_synchronize_srcu(struct srcu_struct *ssp, unsigned long 
cookie);
  
  #ifdef CONFIG_DEBUG_LOCK_ALLOC
  
diff --git a/include/linux/srcutiny.h b/include/linux/srcutiny.h

index fed4a2d..e9bd6fb 100644
--- a/include/linux/srcutiny.h
+++ b/include/linux/srcutiny.h
@@ -16,6 +16,7 @@
  struct srcu_struct {
short srcu_lock_nesting[2]; /* srcu_read_lock() nesting depth. */
unsigned short srcu_idx;/* Current reader array element in bit 
0x2. */
+   unsigned short srcu_idx_max;/* Furthest future srcu_idx request. */
u8 srcu_gp_running; /* GP workqueue running? */
u8 srcu_gp_waiting; /* GP waiting for readers? */
struct swait_queue_head srcu_wq;
diff --git a/kernel/rcu/srcutiny.c b/kernel/rcu/srcutiny.c
index 3bac1db..b405811 100644
--- a/kernel/rcu/srcutiny.c
+++ b/kernel/rcu/srcutiny.c
@@ -34,6 +34,7 @@ static int init_srcu_struct_fields(struct srcu_struct *ssp)
ssp->srcu_gp_running = false;
ssp->srcu_gp_waiting = false;
ssp->srcu_idx = 0;
+   ssp->srcu_idx_max = 0;
INIT_WORK(&ssp->srcu_work, srcu_drive_gp);
INIT_LIST_HEAD(&ssp->srcu_work.entry);
return 0;
@@ -114,7 +115,7 @@ void srcu_drive_gp(struct work_struct *wp)
struct srcu_struct *ssp;
  
  	ssp = container_of(wp, struct srcu_struct, srcu_work);

-   if (ssp->srcu_gp_running || !READ_ONCE(ssp->srcu_cb_head))
+   if (ssp->srcu_gp_running || USHORT_CMP_GE(ssp->srcu_idx, 
READ_ONCE(ssp->srcu_idx_max)))
return; /* Already running or nothing to do. */
  
  	/* Remove recently arrived callbacks and wait for readers. */

@@ -147,14 +148,19 @@ void srcu_drive_gp(struct work_struct *wp)
 * straighten that out.
 */
WRITE_ONCE(ssp->srcu_gp_running, false);
-   if (READ_ONCE(ssp->srcu_cb_head))
+   if (USHORT_CMP_GE(ssp->srcu_idx, READ_ONCE(ssp->srcu_idx_max)))


Should this be USHORT_CMP_LT ?


schedule_work(&ssp->srcu_work);
  }
  EXPORT_SYMBOL_GPL(srcu_drive_gp);
  
  static void srcu_gp_start_if_needed(struct srcu_struct *ssp)

  {
+   unsigned short cookie;
+
if (!READ_ONCE(ssp->srcu_gp_running)) {
+   cookie = get_state_synchronize_srcu(ssp);
+   if (USHORT_CMP_LT(READ_ONCE(ssp->srcu_idx_max), cookie))
+   WRITE_ONCE(ssp->srcu_idx_max, cookie);


I was thinking of a case which might break with this.

Consider a scenario where a GP is in progress and the kworker is right
before the point below, after executing callbacks:

void srcu_drive_gp(struct work_struct 

Re: [PATCH RFC tip/core/rcu 2/5] srcu: Provide internal interface to start a Tiny SRCU grace period

2020-11-20 Thread Neeraj Upadhyay




On 11/17/2020 6:10 AM, paul...@kernel.org wrote:

From: "Paul E. McKenney" 

There is a need for a polling interface for SRCU grace periods.
This polling needs to initiate an SRCU grace period without
having to queue (and manage) a callback.  This commit therefore
splits the Tiny SRCU call_srcu() function into callback-queuing and
start-grace-period portions, with the latter in a new function named
srcu_gp_start_if_needed().

Link: https://lore.kernel.org/rcu/20201112201547.gf3365...@moria.home.lan/
Reported-by: Kent Overstreet 
Signed-off-by: Paul E. McKenney 
---


Reviewed-by: Neeraj Upadhyay 


Thanks
Neeraj


  kernel/rcu/srcutiny.c | 17 +++--
  1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/kernel/rcu/srcutiny.c b/kernel/rcu/srcutiny.c
index 5598cf6..3bac1db 100644
--- a/kernel/rcu/srcutiny.c
+++ b/kernel/rcu/srcutiny.c
@@ -152,6 +152,16 @@ void srcu_drive_gp(struct work_struct *wp)
  }
  EXPORT_SYMBOL_GPL(srcu_drive_gp);
  
+static void srcu_gp_start_if_needed(struct srcu_struct *ssp)

+{
+   if (!READ_ONCE(ssp->srcu_gp_running)) {
+   if (likely(srcu_init_done))
+   schedule_work(&ssp->srcu_work);
+   else if (list_empty(&ssp->srcu_work.entry))
+   list_add(&ssp->srcu_work.entry, &srcu_boot_list);
+   }
+}
+
  /*
   * Enqueue an SRCU callback on the specified srcu_struct structure,
   * initiating grace-period processing if it is not already running.
@@ -167,12 +177,7 @@ void call_srcu(struct srcu_struct *ssp, struct rcu_head 
*rhp,
*ssp->srcu_cb_tail = rhp;
ssp->srcu_cb_tail = &rhp->next;
local_irq_restore(flags);
-   if (!READ_ONCE(ssp->srcu_gp_running)) {
-   if (likely(srcu_init_done))
-   schedule_work(&ssp->srcu_work);
-   else if (list_empty(&ssp->srcu_work.entry))
-   list_add(&ssp->srcu_work.entry, &srcu_boot_list);
-   }
+   srcu_gp_start_if_needed(ssp);
  }
  EXPORT_SYMBOL_GPL(call_srcu);
  



--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member of the Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH RFC tip/core/rcu 3/5] srcu: Provide internal interface to start a Tree SRCU grace period

2020-11-20 Thread Neeraj Upadhyay




On 11/17/2020 6:10 AM, paul...@kernel.org wrote:

From: "Paul E. McKenney" 

There is a need for a polling interface for SRCU grace periods.
This polling needs to initiate an SRCU grace period without having
to queue (and manage) a callback.  This commit therefore splits the
Tree SRCU __call_srcu() function into callback-initialization and
queuing/start-grace-period portions, with the latter in a new function
named srcu_gp_start_if_needed().  This function may be passed a NULL
callback pointer, in which case it will refrain from queuing anything.

Why have the new function mess with queuing?  Locking considerations,
of course!

Link: https://lore.kernel.org/rcu/20201112201547.gf3365...@moria.home.lan/
Reported-by: Kent Overstreet 
Signed-off-by: Paul E. McKenney 
---


Reviewed-by: Neeraj Upadhyay 


Thanks
Neeraj


  kernel/rcu/srcutree.c | 66 +--
  1 file changed, 37 insertions(+), 29 deletions(-)

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 79b7081..d930ece 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -808,6 +808,42 @@ static void srcu_leak_callback(struct rcu_head *rhp)
  }
  
  /*

+ * Start an SRCU grace period, and also queue the callback if non-NULL.
+ */
+static void srcu_gp_start_if_needed(struct srcu_struct *ssp, struct rcu_head 
*rhp, bool do_norm)
+{
+   unsigned long flags;
+   int idx;
+   bool needexp = false;
+   bool needgp = false;
+   unsigned long s;
+   struct srcu_data *sdp;
+
+   idx = srcu_read_lock(ssp);
+   sdp = raw_cpu_ptr(ssp->sda);
+   spin_lock_irqsave_rcu_node(sdp, flags);
+   rcu_segcblist_enqueue(&sdp->srcu_cblist, rhp);
+   rcu_segcblist_advance(&sdp->srcu_cblist,
+ rcu_seq_current(&ssp->srcu_gp_seq));
+   s = rcu_seq_snap(&ssp->srcu_gp_seq);
+   (void)rcu_segcblist_accelerate(&sdp->srcu_cblist, s);
+   if (ULONG_CMP_LT(sdp->srcu_gp_seq_needed, s)) {
+   sdp->srcu_gp_seq_needed = s;
+   needgp = true;
+   }
+   if (!do_norm && ULONG_CMP_LT(sdp->srcu_gp_seq_needed_exp, s)) {
+   sdp->srcu_gp_seq_needed_exp = s;
+   needexp = true;
+   }
+   spin_unlock_irqrestore_rcu_node(sdp, flags);
+   if (needgp)
+   srcu_funnel_gp_start(ssp, sdp, s, do_norm);
+   else if (needexp)
+   srcu_funnel_exp_start(ssp, sdp->mynode, s);
+   srcu_read_unlock(ssp, idx);
+}
+
+/*
   * Enqueue an SRCU callback on the srcu_data structure associated with
   * the current CPU and the specified srcu_struct structure, initiating
   * grace-period processing if it is not already running.
@@ -838,13 +874,6 @@ static void srcu_leak_callback(struct rcu_head *rhp)
  static void __call_srcu(struct srcu_struct *ssp, struct rcu_head *rhp,
rcu_callback_t func, bool do_norm)
  {
-   unsigned long flags;
-   int idx;
-   bool needexp = false;
-   bool needgp = false;
-   unsigned long s;
-   struct srcu_data *sdp;
-
check_init_srcu_struct(ssp);
if (debug_rcu_head_queue(rhp)) {
/* Probable double call_srcu(), so leak the callback. */
@@ -853,28 +882,7 @@ static void __call_srcu(struct srcu_struct *ssp, struct 
rcu_head *rhp,
return;
}
rhp->func = func;
-   idx = srcu_read_lock(ssp);
-   sdp = raw_cpu_ptr(ssp->sda);
-   spin_lock_irqsave_rcu_node(sdp, flags);
-   rcu_segcblist_enqueue(>srcu_cblist, rhp);
-   rcu_segcblist_advance(>srcu_cblist,
- rcu_seq_current(>srcu_gp_seq));
-   s = rcu_seq_snap(>srcu_gp_seq);
-   (void)rcu_segcblist_accelerate(>srcu_cblist, s);
-   if (ULONG_CMP_LT(sdp->srcu_gp_seq_needed, s)) {
-   sdp->srcu_gp_seq_needed = s;
-   needgp = true;
-   }
-   if (!do_norm && ULONG_CMP_LT(sdp->srcu_gp_seq_needed_exp, s)) {
-   sdp->srcu_gp_seq_needed_exp = s;
-   needexp = true;
-   }
-   spin_unlock_irqrestore_rcu_node(sdp, flags);
-   if (needgp)
-   srcu_funnel_gp_start(ssp, sdp, s, do_norm);
-   else if (needexp)
-   srcu_funnel_exp_start(ssp, sdp->mynode, s);
-   srcu_read_unlock(ssp, idx);
+   srcu_gp_start_if_needed(ssp, rhp, do_norm);
  }
  
  /**




--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member of the Code Aurora Forum, hosted by The Linux Foundation


AMU extension v1 support for cortex A76, A77, A78 CPUs

2020-11-19 Thread Neeraj Upadhyay

Hi,

For ARM Cortex-A76, A77, and A78 cores (which, as per the TRM, support AMU),
the AA64PFR0[47:44] field is not set, and AMU does not get enabled for them.

Can you please provide support for these CPUs in cpufeature.c?



Thanks
Neeraj

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member of the Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH RFC tip/core/rcu 1/5] srcu: Make Tiny SRCU use multi-bit grace-period counter

2020-11-19 Thread Neeraj Upadhyay

Hi Paul,

On 11/17/2020 6:10 AM, paul...@kernel.org wrote:

From: "Paul E. McKenney" 

There is a need for a polling interface for SRCU grace periods.  This
polling needs to distinguish between an SRCU instance being idle on the
one hand or in the middle of a grace period on the other.  This commit
therefore converts the Tiny SRCU srcu_struct structure's srcu_idx from
a defacto boolean to a free-running counter, using the bottom bit to
indicate that a grace period is in progress.  The second-from-bottom
bit is thus used as the index returned by srcu_read_lock().

Link: https://lore.kernel.org/rcu/20201112201547.gf3365...@moria.home.lan/
Reported-by: Kent Overstreet 
Signed-off-by: Paul E. McKenney 
---
  include/linux/srcutiny.h | 4 ++--
  kernel/rcu/srcutiny.c| 5 +++--
  2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/include/linux/srcutiny.h b/include/linux/srcutiny.h
index 5a5a194..fed4a2d 100644
--- a/include/linux/srcutiny.h
+++ b/include/linux/srcutiny.h
@@ -15,7 +15,7 @@
  
  struct srcu_struct {

short srcu_lock_nesting[2]; /* srcu_read_lock() nesting depth. */
-   short srcu_idx; /* Current reader array element. */
+   unsigned short srcu_idx;/* Current reader array element in bit 
0x2. */
u8 srcu_gp_running; /* GP workqueue running? */
u8 srcu_gp_waiting; /* GP waiting for readers? */
struct swait_queue_head srcu_wq;
@@ -59,7 +59,7 @@ static inline int __srcu_read_lock(struct srcu_struct *ssp)
  {
int idx;
  
-	idx = READ_ONCE(ssp->srcu_idx);

+   idx = (READ_ONCE(ssp->srcu_idx) & 0x2) / 2;


Should we use bit 0x2 of (READ_ONCE(ssp->srcu_idx) + 1), if a GP
(srcu_drive_gp()) is in progress? Or am I missing something here?


idx = ((READ_ONCE(ssp->srcu_idx) + 1) & 0x2) / 2;

Also, any reason for using division instead of shift; something to
do with the 16-bit srcu_idx which I am missing here?

Thanks
Neeraj


WRITE_ONCE(ssp->srcu_lock_nesting[idx], ssp->srcu_lock_nesting[idx] + 
1);
return idx;
  }
diff --git a/kernel/rcu/srcutiny.c b/kernel/rcu/srcutiny.c
index 6208c1d..5598cf6 100644
--- a/kernel/rcu/srcutiny.c
+++ b/kernel/rcu/srcutiny.c
@@ -124,11 +124,12 @@ void srcu_drive_gp(struct work_struct *wp)
ssp->srcu_cb_head = NULL;
ssp->srcu_cb_tail = &ssp->srcu_cb_head;
local_irq_enable();
-   idx = ssp->srcu_idx;
-   WRITE_ONCE(ssp->srcu_idx, !ssp->srcu_idx);
+   idx = (ssp->srcu_idx & 0x2) / 2;
+   WRITE_ONCE(ssp->srcu_idx, ssp->srcu_idx + 1);
WRITE_ONCE(ssp->srcu_gp_waiting, true);  /* srcu_read_unlock() wakes! */
swait_event_exclusive(ssp->srcu_wq, 
!READ_ONCE(ssp->srcu_lock_nesting[idx]));
WRITE_ONCE(ssp->srcu_gp_waiting, false); /* srcu_read_unlock() cheap. */
+   WRITE_ONCE(ssp->srcu_idx, ssp->srcu_idx + 1);
  
  	/* Invoke the callbacks we removed above. */

while (lh) {



--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member of the Code Aurora Forum, hosted by The Linux Foundation


[PATCH V2] rcu: Check and report missed fqs timer wakeup on RCU stall

2020-11-16 Thread Neeraj Upadhyay
For a new grace period request, the RCU GP kthread transitions
through the following states:

a. [RCU_GP_WAIT_GPS] -> [RCU_GP_DONE_GPS]

Initial state, where GP kthread waits for a new GP start
request is RCU_GP_WAIT_GPS. On new GP request (started by
callers, for any new callbacks, by setting RCU_GP_FLAG_INIT
gp_state flag and waking up the GP kthread) GP kthread
enters RCU_GP_DONE_GPS.

b. [RCU_GP_DONE_GPS] -> [RCU_GP_ONOFF]

Grace period initialization starts in rcu_gp_init(), which
records the start of new GP in rcu_state.gp_seq and enters
into RCU_GP_ONOFF state.

c. [RCU_GP_ONOFF] -> [RCU_GP_INIT]

In RCU_GP_ONOFF state, for each leaf rnp node, GP kthread
applies the buffered online/offline information of its cpus,
from ->qsmaskinitnext to ->qsmaskinit, which is basically
the mask of cpus, which need to report quiescent state, for
the new GP to complete.

Also, in this state, an outgoing rnp's (for which all cpus
are now offline and there are no tasks blocked inside
RCU read section) or an incoming rnp's (for which first cpu
comes online) participation in the new and subsequent GP is
advertised, by propagating its qsmaskinit information up the
tree.

d. [RCU_GP_INIT] -> [RCU_GP_WAIT_FQS]

In RCU_GP_INIT state, current GPs qs mask information and new
GP seq is propagated down the tree, into all rnp nodes,
in breadth first order.

On GP initialization completion, the GP kthread enters a fqs loop,
which does the following things to get a quiescent state reported
for CPUs which aren't quiescent in ->qsmask:

  i. For CPUs, which have entered idle since the first iteration,
 report quiescent state up the tree.

  ii. Based on how long the current grace period has been running,
  either set the appropriate flags so that the sched clock
  interrupt on that CPU does qs reporting, or force a resched
  on that CPU.

On each iteration, it transitions through following states. The fqs
loop is terminated on GP completion, when all CPUs report their
quiescent state and all RCU readers, blocking current grace period
have completed the RCU read section.

e. [RCU_GP_WAIT_FQS] -> [RCU_GP_DOING_FQS]

GP kthread uses a timed wait (jiffies_till_first_fqs for first
iteration and jiffies_till_next_fqs for subsequent iterations),
before doing all the work, to force quiescent state on
all cpus. It wakes up from this state, either on timeout, or
when (RCU_GP_FLAG_FQS | RCU_GP_FLAG_OVLD) flags are set, either
on callback overload conditions or when last cpu reports its
quiescent state and all RCU readers blocking current GP, have
left the RCU read section.

f. [RCU_GP_CLEANUP] -> [RCU_GP_CLEANED]

Mark end of grace period, in ->gp_seq, on all rnp nodes, in breadth
first order and finally in rcu_state.

For cases where timers are not served on the CPU where the GP kthread
queued its timer for the timed wait done in RCU_GP_WAIT_FQS (due to
issues in timer configuration, in the timer framework, or because
softirqs are not handled during a storm of interrupts), it is possible
that the RCU GP kthread never wakes up.  Report this from the stall
warnings if the GP kthread is in RCU_GP_WAIT_FQS state, the timeout has
elapsed, and the kthread has not been woken.
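
As a sketch of the check this adds to the stall-warning code (the helper
name, the two-second slack, and the exact message format below are
illustrative, not the patch itself):

static void sketch_check_fqs_timer_wakeup(void)
{
	unsigned long jiffies_fqs;

	/*
	 * Pairs with the update side, which publishes jiffies_force_qs
	 * before entering RCU_GP_WAIT_FQS.
	 */
	if (smp_load_acquire(&rcu_state.gp_state) != RCU_GP_WAIT_FQS)
		return;		/* GP kthread is not in the timed fqs wait. */

	jiffies_fqs = READ_ONCE(rcu_state.jiffies_force_qs);
	/* The fqs timer should have fired long ago, yet no wakeup happened. */
	if (time_after(jiffies, jiffies_fqs + 2 * HZ))
		pr_err("kthread timer wakeup didn't happen for %lu jiffies!\n",
		       jiffies - jiffies_fqs);
}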

Signed-off-by: Neeraj Upadhyay 
---
Changes in V2:

 - Documentation update.
 - Replace READ_ONCE()/smp_rmb() with smp_load_acquire(), as suggested by Paul.
 - Correct time_after() check.
 - Move fqs timer warning above starvation warning.

 Documentation/RCU/stallwarn.rst | 23 ++-
 kernel/rcu/tree.c   | 25 +++--
 kernel/rcu/tree_stall.h | 32 
 3 files changed, 69 insertions(+), 11 deletions(-)

diff --git a/Documentation/RCU/stallwarn.rst b/Documentation/RCU/stallwarn.rst
index c9ab6af..5170cc0 100644
--- a/Documentation/RCU/stallwarn.rst
+++ b/Documentation/RCU/stallwarn.rst
@@ -92,7 +92,9 @@ warnings:
buggy timer hardware through bugs in the interrupt or exception
path (whether hardware, firmware, or software) through bugs
in Linux's timer subsystem through bugs in the scheduler, and,
-   yes, even including bugs in RCU itself.
+   yes, even including bugs in RCU itself. It can also result in
+   the ``rcu_.*timer wakeup didn't happen for`` console-log message,
+   which will include additional debugging information.
 
 -  A bug in the RCU implementation.
 
@@ -292,6 +294,25 @@ kthread is waiting for a short timeout, the "state" 
precedes value of the
 task_struct ->state field, and the "cpu" indicates that the grace-period
 kthread last ran on CPU 5.
 
+If the relevant grace-period kthread isn't woken up from fqs wait, and
+fqs timeout has happened, then the following additional line is printed::
+
+   kthread timer wakeup didn't happen for 23804 jiffies! g7076 f0x0 
RCU_GP_WAIT_FQS(5) ->state=0x402
+
+The "23804" indicates that kthread's timer expired 23 thousand jiffies
+back. Re

Re: [PATCH] rcu: Check and report missed fqs timer wakeup on RCU stall

2020-11-16 Thread Neeraj Upadhyay

Hi Paul,

On 11/12/2020 1:01 AM, Paul E. McKenney wrote:

On Wed, Nov 11, 2020 at 07:37:37PM +0530, Neeraj Upadhyay wrote:

For a new grace period request, RCU GP kthread transitions
through following states:

a. [RCU_GP_WAIT_GPS] -> [RCU_GP_DONE_GPS]

Initial state, where GP kthread waits for a new GP start
request is RCU_GP_WAIT_GPS. On new GP request (started by
callers, for any new callbacks, by setting RCU_GP_FLAG_INIT
gp_state flag and waking up the GP kthread) GP kthread
enters RCU_GP_DONE_GPS.

b. [RCU_GP_DONE_GPS] -> [RCU_GP_ONOFF]

Grace period initialization starts in rcu_gp_init(), which
records the start of new GP in rcu_state.gp_seq and enters
into RCU_GP_ONOFF state.

c. [RCU_GP_ONOFF] -> [RCU_GP_INIT]

In RCU_GP_ONOFF state, for each leaf rnp node, GP kthread
applies the buffered online/offline information of its cpus,
from ->qsmaskinitnext to ->qsmaskinit, which is basically
the mask of cpus, which need to report quiescent state, for
the new GP to complete.

Also, in this state, an outgoing rnp's (for which all cpus
are now offline and there are no tasks blocked inside
RCU read section) or an incoming rnp's (for which first cpu
comes online) participation in the new and subsequent GP is
advertised, by propagating its qsmaskinit information up the
tree.

d. [RCU_GP_INIT] -> [RCU_GP_WAIT_FQS]

In RCU_GP_INIT state, current GPs qs mask information and new
GP seq is propagated down the tree, into all rnp nodes,
in breadth first order.

On GP initialization completion, the GP kthread enters a fqs loop,
which does the following things to get a quiescent state reported
for CPUs which aren't quiescent in ->qsmask:

   i. For CPUs, which have entered idle since the first iteration,
  report quiescent state up the tree.

   ii. Based on how long the current grace period has been running
   for, either, set the appropriate flags, so that sched clock
   interrupt on that cpu, does qs reporting or do a resched
   on that cpu.

On each iteration, it transitions through following states. The fqs
loop is terminated on GP completion, when all CPUs report their
quiescent state and all RCU readers, blocking current grace period
have completed the RCU read section.

e. [RCU_GP_WAIT_FQS] -> [RCU_GP_DOING_FQS]

GP kthread uses a timed wait (jiffies_till_first_fqs for first
iteration and jiffies_till_next_fqs for subsequent iterations),
before doing all the work, to force quiescent state on
all cpus. It wakes up from this state, either on timeout, or
when (RCU_GP_FLAG_FQS | RCU_GP_FLAG_OVLD) flags are set, either
on callback overload conditions or when last cpu reports its
quiescent state and all RCU readers blocking current GP, have
left the RCU read section.

f. [RCU_GP_CLEANUP] -> [RCU_GP_CLEANED]

Mark end of grace period, in ->gp_seq, on all rnp nodes, in breadth
first order and finally in rcu_state.

For cases where timers are not served on the CPU where the GP kthread
queued its timer for the timed wait done in RCU_GP_WAIT_FQS (due to
issues in timer configuration, in the timer framework, or because
softirqs are not handled during a storm of interrupts), it is possible
that the RCU GP kthread never wakes up.  Report this from the stall
warnings if the GP kthread is in RCU_GP_WAIT_FQS state, the timeout has
elapsed, and the kthread has not been woken.

Signed-off-by: Neeraj Upadhyay 
---
  kernel/rcu/tree.c   | 25 +++--
  kernel/rcu/tree_stall.h | 33 +
  2 files changed, 48 insertions(+), 10 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 413831b..804e543 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1767,7 +1767,7 @@ static bool rcu_gp_init(void)
 * go offline later.  Please also refer to "Hotplug CPU" section
 * of RCU's Requirements documentation.
 */
-   rcu_state.gp_state = RCU_GP_ONOFF;
+   WRITE_ONCE(rcu_state.gp_state, RCU_GP_ONOFF);
rcu_for_each_leaf_node(rnp) {
smp_mb(); // Pair with barriers used when updating ->ofl_seq to 
odd values.
firstseq = READ_ONCE(rnp->ofl_seq);
@@ -1833,7 +1833,7 @@ static bool rcu_gp_init(void)
 * The grace period cannot complete until the initialization
 * process finishes, because this kthread handles both.
 */
-   rcu_state.gp_state = RCU_GP_INIT;
+   WRITE_ONCE(rcu_state.gp_state, RCU_GP_INIT);
rcu_for_each_node_breadth_first(rnp) {
rcu_gp_slow(gp_init_delay);
raw_spin_lock_irqsave_rcu_node(rnp, flags);
@@ -1932,17 +1932,22 @@ static void rcu_gp_fqs_loop(void)
ret = 0;
for (;;) {
if (!ret) {
-   rcu_state.jiffies_force_qs = jiffies + j;
+   WRITE_ONCE(rcu_state.jiffies_force_qs, jiffies + j);
+   /*
+* jiffies_force_qs before RCU_GP_WAIT_FQS state
+ 

[PATCH] rcu: Check and report missed fqs timer wakeup on RCU stall

2020-11-11 Thread Neeraj Upadhyay
For a new grace period request, RCU GP kthread transitions
through following states:

a. [RCU_GP_WAIT_GPS] -> [RCU_GP_DONE_GPS]

The initial state, in which the GP kthread waits for a new GP start
request, is RCU_GP_WAIT_GPS. On a new GP request (started by
callers, for any new callbacks, by setting the RCU_GP_FLAG_INIT
gp_state flag and waking up the GP kthread), the GP kthread
enters RCU_GP_DONE_GPS.

b. [RCU_GP_DONE_GPS] -> [RCU_GP_ONOFF]

Grace period initialization starts in rcu_gp_init(), which
records the start of the new GP in rcu_state.gp_seq and enters
the RCU_GP_ONOFF state.

c. [RCU_GP_ONOFF] -> [RCU_GP_INIT]

In the RCU_GP_ONOFF state, for each leaf rnp node, the GP kthread
applies the buffered online/offline information for its CPUs,
copying ->qsmaskinitnext to ->qsmaskinit, which is the mask of
CPUs that must report a quiescent state for the new GP to
complete.

Also in this state, the participation of an outgoing rnp (all of
whose CPUs are now offline, with no tasks blocked inside an RCU
read-side critical section) or an incoming rnp (whose first CPU
has just come online) in the new and subsequent GPs is advertised
by propagating its qsmaskinit information up the tree.

d. [RCU_GP_INIT] -> [RCU_GP_WAIT_FQS]

In the RCU_GP_INIT state, the current GP's qs-mask information and
the new GP sequence number are propagated down the tree, into all
rnp nodes, in breadth-first order.

On completion of GP initialization, the GP kthread enters an fqs
loop, which does the following to get quiescent states reported
for CPUs that have not yet quiesced in ->qsmask:

   i. For CPUs that have entered idle since the first iteration,
      report the quiescent state up the tree.

   ii. Depending on how long the current grace period has been
       running, either set the appropriate flags so that the
       scheduling-clock interrupt on that CPU reports the qs, or
       force a resched on that CPU.

On each iteration, the loop transitions through the following
states. The fqs loop terminates on GP completion, that is, when
all CPUs have reported their quiescent states and all RCU readers
blocking the current grace period have left their read-side
critical sections.

e. [RCU_GP_WAIT_FQS] -> [RCU_GP_DOING_FQS]

The GP kthread uses a timed wait (jiffies_till_first_fqs for the
first iteration and jiffies_till_next_fqs for subsequent
iterations) before doing the work of forcing quiescent states on
all CPUs. It wakes up from this state either on timeout or when
the RCU_GP_FLAG_FQS or RCU_GP_FLAG_OVLD flag is set, which happens
on callback-overload conditions or when the last CPU reports its
quiescent state and all RCU readers blocking the current GP have
left their read-side critical sections.

f. [RCU_GP_CLEANUP] -> [RCU_GP_CLEANED]

The end of the grace period is marked in ->gp_seq on all rnp
nodes, in breadth-first order, and finally in rcu_state.

If timers are not served on the CPU where the GP kthread queued
its timer (for the timed wait done in RCU_GP_WAIT_FQS), whether
due to a timer misconfiguration, a problem in the timer framework,
or softirqs not being handled during an interrupt storm, it is
possible that the GP kthread never wakes up. Report this from the
stall warnings if the GP kthread is in the RCU_GP_WAIT_FQS state,
the timeout has elapsed, and the kthread has not been woken.
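
The ordering requirement that motivates the smp_wmb() in the diff below can
be pictured as the following pairing (a rough sketch; the stall-check side
shown here is an assumption about how the reader orders its accesses):

	/* GP kthread, in rcu_gp_fqs_loop(), before the timed wait: */
	WRITE_ONCE(rcu_state.jiffies_force_qs, jiffies + j);
	smp_wmb(); /* jiffies_force_qs visible before the state change */
	WRITE_ONCE(rcu_state.gp_state, RCU_GP_WAIT_FQS);

	/* Stall-check side: */
	if (READ_ONCE(rcu_state.gp_state) == RCU_GP_WAIT_FQS) {
		smp_rmb(); /* pairs with the smp_wmb() above */
		if (time_after(jiffies, READ_ONCE(rcu_state.jiffies_force_qs)))
			; /* fqs timer should have fired: suspect a missed wakeup */
	}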

Signed-off-by: Neeraj Upadhyay 
---
 kernel/rcu/tree.c   | 25 +++--
 kernel/rcu/tree_stall.h | 33 +
 2 files changed, 48 insertions(+), 10 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 413831b..804e543 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1767,7 +1767,7 @@ static bool rcu_gp_init(void)
 * go offline later.  Please also refer to "Hotplug CPU" section
 * of RCU's Requirements documentation.
 */
-   rcu_state.gp_state = RCU_GP_ONOFF;
+   WRITE_ONCE(rcu_state.gp_state, RCU_GP_ONOFF);
rcu_for_each_leaf_node(rnp) {
smp_mb(); // Pair with barriers used when updating ->ofl_seq to 
odd values.
firstseq = READ_ONCE(rnp->ofl_seq);
@@ -1833,7 +1833,7 @@ static bool rcu_gp_init(void)
 * The grace period cannot complete until the initialization
 * process finishes, because this kthread handles both.
 */
-   rcu_state.gp_state = RCU_GP_INIT;
+   WRITE_ONCE(rcu_state.gp_state, RCU_GP_INIT);
rcu_for_each_node_breadth_first(rnp) {
rcu_gp_slow(gp_init_delay);
raw_spin_lock_irqsave_rcu_node(rnp, flags);
@@ -1932,17 +1932,22 @@ static void rcu_gp_fqs_loop(void)
ret = 0;
for (;;) {
if (!ret) {
-   rcu_state.jiffies_force_qs = jiffies + j;
+   WRITE_ONCE(rcu_state.jiffies_force_qs, jiffies + j);
+   /*
+* jiffies_force_qs before RCU_GP_WAIT_FQS state
+* update; required for stall checks.
+*/
+   smp_wmb();

Re: [PATCH v9 4/7] rcu/trace: Add tracing for how segcb list changes

2020-11-04 Thread Neeraj Upadhyay




On 11/3/2020 7:56 PM, Joel Fernandes (Google) wrote:

Track how the segcb list changes before/after acceleration, during
queuing and during dequeuing.

This has proved useful to discover an optimization to avoid unwanted GP
requests when there are no callbacks accelerated. The overhead is minimal as
each segment's length is now stored in the respective segment.

Reviewed-by: Frederic Weisbecker 
Reviewed-by: Neeraj Upadhyay 
Signed-off-by: Joel Fernandes (Google) 
---
  include/trace/events/rcu.h | 25 +
  kernel/rcu/rcu_segcblist.c | 34 ++
  kernel/rcu/rcu_segcblist.h |  5 +
  kernel/rcu/tree.c  |  9 +
  4 files changed, 73 insertions(+)

diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index 155b5cb43cfd..5f8f2ee1a936 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -505,6 +505,31 @@ TRACE_EVENT_RCU(rcu_callback,
  __entry->qlen)
  );
  
+TRACE_EVENT_RCU(rcu_segcb_stats,

+
+   TP_PROTO(const char *ctx, int *cb_count, unsigned long *gp_seq),


I think we need to use long[] instead of int[] for cb_count everywhere 
in this patch?



Thanks
Neeraj
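
For reference, the int -> long change being suggested would look roughly
like the following (illustrative only, not the final patch):

	TP_PROTO(const char *ctx, long *cb_count, unsigned long *gp_seq),
	...
	__array(long, cb_count, RCU_CBLIST_NSEGS)
	...
	memcpy(__entry->cb_count, cb_count, RCU_CBLIST_NSEGS * sizeof(long));

with the matching "%ld" conversions in TP_printk().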


+
+   TP_ARGS(ctx, cb_count, gp_seq),
+
+   TP_STRUCT__entry(
+   __field(const char *, ctx)
+   __array(int, cb_count, RCU_CBLIST_NSEGS)
+   __array(unsigned long, gp_seq, RCU_CBLIST_NSEGS)
+   ),
+
+   TP_fast_assign(
+   __entry->ctx = ctx;
+   memcpy(__entry->cb_count, cb_count, RCU_CBLIST_NSEGS * 
sizeof(int));
+   memcpy(__entry->gp_seq, gp_seq, RCU_CBLIST_NSEGS * 
sizeof(unsigned long));
+   ),
+
+   TP_printk("%s cb_count: (DONE=%d, WAIT=%d, NEXT_READY=%d, NEXT=%d) 
"
+ "gp_seq: (DONE=%lu, WAIT=%lu, NEXT_READY=%lu, 
NEXT=%lu)", __entry->ctx,
+ __entry->cb_count[0], __entry->cb_count[1], 
__entry->cb_count[2], __entry->cb_count[3],
+ __entry->gp_seq[0], __entry->gp_seq[1], 
__entry->gp_seq[2], __entry->gp_seq[3])
+
+);
+
  /*
   * Tracepoint for the registration of a single RCU callback of the special
   * kvfree() form.  The first argument is the RCU type, the second argument
diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
index 357c19bbcb00..2a03949d0b82 100644
--- a/kernel/rcu/rcu_segcblist.c
+++ b/kernel/rcu/rcu_segcblist.c
@@ -14,6 +14,7 @@
  #include 
  
  #include "rcu_segcblist.h"

+#include "rcu.h"
  
  /* Initialize simple callback list. */

  void rcu_cblist_init(struct rcu_cblist *rclp)
@@ -328,6 +329,39 @@ void rcu_segcblist_extract_done_cbs(struct rcu_segcblist 
*rsclp,
rcu_segcblist_set_seglen(rsclp, RCU_DONE_TAIL, 0);
  }
  
+/*

+ * Return how many CBs each segment along with their gp_seq values.
+ *
+ * This function is O(N) where N is the number of segments. Only used from
+ * tracing code which is usually disabled in production.
+ */
+#ifdef CONFIG_RCU_TRACE
+static void rcu_segcblist_countseq(struct rcu_segcblist *rsclp,
+int cbcount[RCU_CBLIST_NSEGS],
+unsigned long gpseq[RCU_CBLIST_NSEGS])
+{
+   int i;
+
+   for (i = 0; i < RCU_CBLIST_NSEGS; i++) {
+   cbcount[i] = rcu_segcblist_get_seglen(rsclp, i);
+   gpseq[i] = rsclp->gp_seq[i];
+   }
+}
+
+void __trace_rcu_segcb_stats(struct rcu_segcblist *rsclp, const char *context)
+{
+   int cbs[RCU_CBLIST_NSEGS];
+   unsigned long gps[RCU_CBLIST_NSEGS];
+
+   if (!trace_rcu_segcb_stats_enabled())
+   return;
+
+   rcu_segcblist_countseq(rsclp, cbs, gps);
+
+   trace_rcu_segcb_stats(context, cbs, gps);
+}
+#endif
+
  /*
   * Extract only those callbacks still pending (not yet ready to be
   * invoked) from the specified rcu_segcblist structure and place them in
diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
index cd35c9faaf51..7750734fa116 100644
--- a/kernel/rcu/rcu_segcblist.h
+++ b/kernel/rcu/rcu_segcblist.h
@@ -103,3 +103,8 @@ void rcu_segcblist_advance(struct rcu_segcblist *rsclp, 
unsigned long seq);
  bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq);
  void rcu_segcblist_merge(struct rcu_segcblist *dst_rsclp,
 struct rcu_segcblist *src_rsclp);
+#ifdef CONFIG_RCU_TRACE
+void __trace_rcu_segcb_stats(struct rcu_segcblist *rsclp, const char *context);
+#else
+#define __trace_rcu_segcb_stats(...)
+#endif
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 24c00020ab83..f6c6653b3ec2 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1497,6 +1497,8 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, 
struct rcu_data *rdp)
if (!rcu_segcblist_pend_

Re: [PATCH v9 3/7] srcu: Fix invoke_rcu_callbacks() segcb length adjustment

2020-11-04 Thread Neeraj Upadhyay




On 11/3/2020 7:55 PM, Joel Fernandes (Google) wrote:

With earlier patches, the negative counting of the unsegmented list
cannot be used to adjust the segmented one. To fix this, sample the
unsegmented length in advance, and use it after CB execution to adjust
the segmented list's length.

Reviewed-by: Frederic Weisbecker 
Suggested-by: Frederic Weisbecker 
Signed-off-by: Joel Fernandes (Google) 
---


Reviewed-by: Neeraj Upadhyay 


  kernel/rcu/srcutree.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 0f23d20d485a..79b7081143a7 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -1160,6 +1160,7 @@ static void srcu_advance_state(struct srcu_struct *ssp)
   */
  static void srcu_invoke_callbacks(struct work_struct *work)
  {
+   long len;
bool more;
struct rcu_cblist ready_cbs;
struct rcu_head *rhp;
@@ -1182,6 +1183,7 @@ static void srcu_invoke_callbacks(struct work_struct 
*work)
/* We are on the job!  Extract and invoke ready callbacks. */
sdp->srcu_cblist_invoking = true;
	rcu_segcblist_extract_done_cbs(&sdp->srcu_cblist, &ready_cbs);
+   len = ready_cbs.len;
spin_unlock_irq_rcu_node(sdp);
	rhp = rcu_cblist_dequeue(&ready_cbs);
	for (; rhp != NULL; rhp = rcu_cblist_dequeue(&ready_cbs)) {
@@ -1190,13 +1192,14 @@ static void srcu_invoke_callbacks(struct work_struct 
*work)
rhp->func(rhp);
local_bh_enable();
}
+   WARN_ON_ONCE(ready_cbs.len);
  
  	/*

 * Update counts, accelerate new callbacks, and if needed,
 * schedule another round of callback invocation.
 */
spin_lock_irq_rcu_node(sdp);
-   rcu_segcblist_insert_count(&sdp->srcu_cblist, &ready_cbs);
+   rcu_segcblist_add_len(&sdp->srcu_cblist, -len);
	(void)rcu_segcblist_accelerate(&sdp->srcu_cblist,
				   rcu_seq_snap(&ssp->srcu_gp_seq));
sdp->srcu_cblist_invoking = false;



--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member of the Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH v9 2/7] rcu/segcblist: Add counters to segcblist datastructure

2020-11-04 Thread Neeraj Upadhyay




On 11/3/2020 7:55 PM, Joel Fernandes (Google) wrote:

Add counting of segment lengths of segmented callback list.

This will be useful for a number of things such as knowing how big the
ready-to-execute segment have gotten. The immediate benefit is ability
to trace how the callbacks in the segmented callback list change.

Also, this patch removes hacks related to using donecbs's ->len field as a
temporary variable to save the segmented callback list's length. This cannot
be done anymore and is not needed.

Reviewed-by: Frederic Weisbecker 
Signed-off-by: Joel Fernandes (Google) 
---


Reviewed-by: Neeraj Upadhyay 


  include/linux/rcu_segcblist.h |   1 +
  kernel/rcu/rcu_segcblist.c| 120 ++
  kernel/rcu/rcu_segcblist.h|   2 -
  3 files changed, 79 insertions(+), 44 deletions(-)

diff --git a/include/linux/rcu_segcblist.h b/include/linux/rcu_segcblist.h
index b36afe7b22c9..6c01f09a6456 100644
--- a/include/linux/rcu_segcblist.h
+++ b/include/linux/rcu_segcblist.h
@@ -72,6 +72,7 @@ struct rcu_segcblist {
  #else
long len;
  #endif
+   long seglen[RCU_CBLIST_NSEGS];
u8 enabled;
u8 offloaded;
  };
diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
index bb246d8c6ef1..357c19bbcb00 100644
--- a/kernel/rcu/rcu_segcblist.c
+++ b/kernel/rcu/rcu_segcblist.c
@@ -7,10 +7,11 @@
   * Authors: Paul E. McKenney 
   */
  
-#include 

-#include 
+#include 
  #include 
+#include 
  #include 
+#include 
  
  #include "rcu_segcblist.h"
  
@@ -88,6 +89,46 @@ static void rcu_segcblist_set_len(struct rcu_segcblist *rsclp, long v)

  #endif
  }
  
+/* Get the length of a segment of the rcu_segcblist structure. */

+static long rcu_segcblist_get_seglen(struct rcu_segcblist *rsclp, int seg)
+{
+   return READ_ONCE(rsclp->seglen[seg]);
+}
+
+/* Set the length of a segment of the rcu_segcblist structure. */
+static void rcu_segcblist_set_seglen(struct rcu_segcblist *rsclp, int seg, 
long v)
+{
+   WRITE_ONCE(rsclp->seglen[seg], v);
+}
+
+/* Return number of callbacks in a segment of the segmented callback list. */
+static void rcu_segcblist_add_seglen(struct rcu_segcblist *rsclp, int seg, 
long v)
+{
+   WRITE_ONCE(rsclp->seglen[seg], rsclp->seglen[seg] + v);
+}
+
+/* Move from's segment length to to's segment. */
+static void rcu_segcblist_move_seglen(struct rcu_segcblist *rsclp, int from, 
int to)
+{
+   long len;
+
+   if (from == to)
+   return;
+
+   len = rcu_segcblist_get_seglen(rsclp, from);
+   if (!len)
+   return;
+
+   rcu_segcblist_add_seglen(rsclp, to, len);
+   rcu_segcblist_set_seglen(rsclp, from, 0);
+}
+
+/* Increment segment's length. */
+static void rcu_segcblist_inc_seglen(struct rcu_segcblist *rsclp, int seg)
+{
+   rcu_segcblist_add_seglen(rsclp, seg, 1);
+}
+
  /*
   * Increase the numeric length of an rcu_segcblist structure by the
   * specified amount, which can be negative.  This can cause the ->len
@@ -119,26 +160,6 @@ void rcu_segcblist_inc_len(struct rcu_segcblist *rsclp)
rcu_segcblist_add_len(rsclp, 1);
  }
  
-/*

- * Exchange the numeric length of the specified rcu_segcblist structure
- * with the specified value.  This can cause the ->len field to disagree
- * with the actual number of callbacks on the structure.  This exchange is
- * fully ordered with respect to the callers accesses both before and after.
- */
-static long rcu_segcblist_xchg_len(struct rcu_segcblist *rsclp, long v)
-{
-#ifdef CONFIG_RCU_NOCB_CPU
-   return atomic_long_xchg(&rsclp->len, v);
-#else
-   long ret = rsclp->len;
-
-   smp_mb(); /* Up to the caller! */
-   WRITE_ONCE(rsclp->len, v);
-   smp_mb(); /* Up to the caller! */
-   return ret;
-#endif
-}
-
  /*
   * Initialize an rcu_segcblist structure.
   */
@@ -149,8 +170,10 @@ void rcu_segcblist_init(struct rcu_segcblist *rsclp)
BUILD_BUG_ON(RCU_NEXT_TAIL + 1 != ARRAY_SIZE(rsclp->gp_seq));
BUILD_BUG_ON(ARRAY_SIZE(rsclp->tails) != ARRAY_SIZE(rsclp->gp_seq));
rsclp->head = NULL;
-   for (i = 0; i < RCU_CBLIST_NSEGS; i++)
+   for (i = 0; i < RCU_CBLIST_NSEGS; i++) {
	rsclp->tails[i] = &rsclp->head;
+   rcu_segcblist_set_seglen(rsclp, i, 0);
+   }
rcu_segcblist_set_len(rsclp, 0);
rsclp->enabled = 1;
  }
@@ -246,6 +269,7 @@ void rcu_segcblist_enqueue(struct rcu_segcblist *rsclp,
  {
rcu_segcblist_inc_len(rsclp);
smp_mb(); /* Ensure counts are updated before callback is enqueued. */
+   rcu_segcblist_inc_seglen(rsclp, RCU_NEXT_TAIL);
rhp->next = NULL;
WRITE_ONCE(*rsclp->tails[RCU_NEXT_TAIL], rhp);
WRITE_ONCE(rsclp->tails[RCU_NEXT_TAIL], >next);
@@ -274,27 +298,13 @@ bool rcu_segcblist_entrain(struct rcu_segcblist *rsclp,
for (i = RCU_NEXT_TAIL; i > RCU_DONE_TAIL; i--)
   

Re: Queries on ARM SDEI Linux kernel code

2020-10-21 Thread Neeraj Upadhyay

Hi James,

Sorry for late reply. Thanks for your comments!

On 10/16/2020 9:57 PM, James Morse wrote:

Hi Neeraj,

On 15/10/2020 07:07, Neeraj Upadhyay wrote:

1. Looks like interrupt bind interface (SDEI_1_0_FN_SDEI_INTERRUPT_BIND) is not 
available
for clients to use; can you please share information on
why it is not provided?


There is no compelling use-case for it, and it's very complex to support as
the driver can no longer hide things like hibernate.

Last time I looked, it looked like the SDEI driver would need to ask the 
irqchip to
prevent modification while firmware re-configures the irq. I couldn't work out 
how this
would work if the irq is in-progress on another CPU.



Got it. I will think in this direction, on how to achieve this.


The reasons to use bound-interrupts can equally be supported with an event 
provided by
firmware.



Ok, I will explore in that direction.


While trying to dig information on this, I saw  that [1] says:
   Now the hotplug callbacks save  nothing, and restore the OS-view of 
registered/enabled.
This makes bound-interrupts harder to work with.



Based on this comment, what I could understand from the changes since v4 [2]
is that the cpu down path does not save the current event enable status;
instead, we rely on the enable status `event->reenable', which is set when
the register/unregister and enable/disable calls are made, and this enable
status is used during the cpu up path to decide whether to reenable the
interrupt.



Does this make bound-interrupts harder to work with? How? Can you please
explain? Or is the above save/restore not the reason, and you meant
something else?


If you bind a level-triggered interrupt, how does firmware know how to clear 
the interrupt
from whatever is generating it?

What happens if the OS can't do this either, as it needs to allocate memory, or 
take a
lock, which it can't do in nmi context?

Ok, makes sense.
The people that wrote the SDEI spec's answer to this was that the handler can 
disable the
event from inside the handler... and firmware will do, something, to stop the 
interrupt
screaming.

So now an event can become disabled anytime its registered, which makes it more
complicated to save/restore.



Also, do shared bound interrupts


Shared-interrupts as an NMI made me jump. But I think you mean a bound
interrupt as a shared event, i.e. an SPI not a PPI.



Sorry I should have worded properly; yes I meant SPI as shared event.


also have the same problem, as the save/restore behavior was only for
private events?


See above, the problem is the event disabling itself.


This makes sense now.


Additionally those changes to unregister the private-event mean the code can't 
tell the
difference between cpuhp and hibernate... only hibernate additionally loses the 
state in
firmware.



Got it!

2. The SDEI_EVENT_SIGNAL API is not provided? What is the reason for that?
Does its handling have the same problems as bound interrupts?


It's not supported as no-one showed up with a use-case.
While firmware is expected to back it with a PPI, it doesn't have the same
problems as bound-interrupts, as it's not an interrupt the OS ever knows about.



Also, if it is provided, clients need to register event 0 ? Vendor events or 
other event
nums are not supported, as per spec.


Ideally the driver would register the event, and provide a call_on_cpu() helper 
to trigger
it. This should fit in with however the GIC's PMR based NMI does its PPI based
crash/stacktrace call so that the caller doesn't need to know if its back by 
IRQ, pNMI or
SDEI.


Ok; I will explore how PMR based NMIs work; I thought it was SGI based. 
But will recheck.



3. Can kernel panic() be triggered from sdei event handler?


Yes,



Is it a safe operation?


panic() wipes out the machine... did you expect it to keep running?


I wanted to check the case where panic triggers the kexec/kdump path into
the capture kernel.



What does safe mean here?
I think I didn't put it correctly; I meant what possible scenarios can 

happen in this case and you explained one below, thanks!


You should probably call nmi_panic() if there is the risk that the event 
occurred during
panic() on the same CPU, as it would otherwise just block.



The spec says, synchronous exceptions should not be triggered; I think panic
won't do it; but anything which triggers a WARN
or other sync exception in that path can cause undefined behavior. Can you 
share your
thoughts on this?


What do you mean by undefined behaviour?

I was thinking: if an SDEI event preempts EL1 at the point where EL1 has
just entered an exception and hasn't yet captured registers like spsr_el1
and elr_el1, what will the behavior be?



SDEI was originally intended to report external aborts to the OS in regions
where the OS can't take an exception because the exception registers are
live, just after an exception and just before eret.

If you take another exception from the NMI handler, chances are you're going to 
go back
ro

Queries on ARM SDEI Linux kernel code

2020-10-15 Thread Neeraj Upadhyay

Hi James,

Have few queries on ARM SDEI Linux code. Queries are listed below; can 
you please help provide your insights on these?


1. Looks like interrupt bind interface (SDEI_1_0_FN_SDEI_INTERRUPT_BIND) 
is not available for clients to use; can you please share information on

why it is not provided?

While trying to dig information on this, I saw  that [1] says:
  Now the hotplug callbacks save  nothing, and restore the OS-view of 
registered/enabled. This makes bound-interrupts harder to work with.


Based on this comment, what I could understand from the changes since v4 [2]
is that the cpu down path does not save the current event enable status;
instead, we rely on the enable status `event->reenable', which is set when
the register/unregister and enable/disable calls are made, and this enable
status is used during the cpu up path to decide whether to reenable the
interrupt.


Does this make bound-interrupts harder to work with? How? Can you please
explain? Or is the above save/restore not the reason, and you meant
something else?


Also, do shared bound interrupts also have the same problem, as the
save/restore behavior was only for private events?


2. The SDEI_EVENT_SIGNAL API is not provided? What is the reason for that?
Does its handling have the same problems as bound interrupts?


Also, if it is provided, clients need to register event 0 ? Vendor 
events or other event nums are not supported, as per spec.


3. Can kernel panic() be triggered from an SDEI event handler? Is it a safe
operation? The spec says synchronous exceptions should not be triggered;
I think panic won't do that, but anything which triggers a WARN or other
sync exception in that path can cause undefined behavior. Can you share
your thoughts on this?


"The handler code should not enable asynchronous exceptions by clearing 
any of the PSTATE.DAIF bits, and should not cause synchronous exceptions 
to the client Exception level."



[1] https://lwn.net/Articles/740817/
[2] https://www.spinics.net/lists/kvm-arm/msg27784.html

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member of the Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH v6 3/4] rcu/trace: Add tracing for how segcb list changes

2020-10-14 Thread Neeraj Upadhyay




On 9/23/2020 8:52 PM, Joel Fernandes (Google) wrote:

Track how the segcb list changes before/after acceleration, during
queuing and during dequeuing.

This has proved useful to discover an optimization to avoid unwanted GP
requests when there are no callbacks accelerated. The overhead is minimal as
each segment's length is now stored in the respective segment.

Signed-off-by: Joel Fernandes (Google) 
---
  include/trace/events/rcu.h | 25 +
  kernel/rcu/rcu_segcblist.c | 34 ++
  kernel/rcu/rcu_segcblist.h |  5 +
  kernel/rcu/tree.c  |  9 +
  4 files changed, 73 insertions(+)

diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index 155b5cb43cfd..7b84df3c95df 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -505,6 +505,31 @@ TRACE_EVENT_RCU(rcu_callback,
  __entry->qlen)
  );
  
+TRACE_EVENT_RCU(rcu_segcb,

+
+   TP_PROTO(const char *ctx, int *cb_count, unsigned long *gp_seq),
+
+   TP_ARGS(ctx, cb_count, gp_seq),
+
+   TP_STRUCT__entry(
+   __field(const char *, ctx)
+   __array(int, cb_count, 4)
+   __array(unsigned long, gp_seq, 4)


Use RCU_CBLIST_NSEGS in place of 4 ?

+   ),
+
+   TP_fast_assign(
+   __entry->ctx = ctx;
+   memcpy(__entry->cb_count, cb_count, 4 * sizeof(int));
+   memcpy(__entry->gp_seq, gp_seq, 4 * sizeof(unsigned 
long));
+   ),
+
+   TP_printk("%s cb_count: (DONE=%d, WAIT=%d, NEXT_READY=%d, NEXT=%d) 
"
+ "gp_seq: (DONE=%lu, WAIT=%lu, NEXT_READY=%lu, 
NEXT=%lu)", __entry->ctx,
+ __entry->cb_count[0], __entry->cb_count[1], 
__entry->cb_count[2], __entry->cb_count[3],
+ __entry->gp_seq[0], __entry->gp_seq[1], 
__entry->gp_seq[2], __entry->gp_seq[3])
+
+);
+
  /*
   * Tracepoint for the registration of a single RCU callback of the special
   * kvfree() form.  The first argument is the RCU type, the second argument
diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
index 0e6d19bd3de9..df0f31e30947 100644
--- a/kernel/rcu/rcu_segcblist.c
+++ b/kernel/rcu/rcu_segcblist.c
@@ -13,6 +13,7 @@
  #include 
  
  #include "rcu_segcblist.h"

+#include "rcu.h"
  
  /* Initialize simple callback list. */

  void rcu_cblist_init(struct rcu_cblist *rclp)
@@ -343,6 +344,39 @@ void rcu_segcblist_extract_done_cbs(struct rcu_segcblist 
*rsclp,
rcu_segcblist_set_seglen(rsclp, RCU_DONE_TAIL, 0);
  }
  
+/*

+ * Return how many CBs each segment along with their gp_seq values.
+ *
+ * This function is O(N) where N is the number of callbacks. Only used from


N is number of segments?


+ * tracing code which is usually disabled in production.
+ */
+#ifdef CONFIG_RCU_TRACE
+static void rcu_segcblist_countseq(struct rcu_segcblist *rsclp,
+int cbcount[RCU_CBLIST_NSEGS],
+unsigned long gpseq[RCU_CBLIST_NSEGS])
+{
+   int i;
+
+   for (i = 0; i < RCU_CBLIST_NSEGS; i++)
+   cbcount[i] = 0;
+


What is the reason for initializing to 0?


+   for (i = 0; i < RCU_CBLIST_NSEGS; i++) {
+   cbcount[i] = rcu_segcblist_get_seglen(rsclp, i);
+   gpseq[i] = rsclp->gp_seq[i];
+   }
+}
+
+void trace_rcu_segcb_list(struct rcu_segcblist *rsclp, char *context)
+{
+   int cbs[RCU_CBLIST_NSEGS];
+   unsigned long gps[RCU_CBLIST_NSEGS];
+
+   rcu_segcblist_countseq(rsclp, cbs, gps);
+
+   trace_rcu_segcb(context, cbs, gps);
+}
+#endif
+
  /*
   * Extract only those callbacks still pending (not yet ready to be
   * invoked) from the specified rcu_segcblist structure and place them in
diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
index 3e0eb1056ae9..15c10d30f88c 100644
--- a/kernel/rcu/rcu_segcblist.h
+++ b/kernel/rcu/rcu_segcblist.h
@@ -103,3 +103,8 @@ void rcu_segcblist_advance(struct rcu_segcblist *rsclp, 
unsigned long seq);
  bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq);
  void rcu_segcblist_merge(struct rcu_segcblist *dst_rsclp,
 struct rcu_segcblist *src_rsclp);
+#ifdef CONFIG_RCU_TRACE
+void trace_rcu_segcb_list(struct rcu_segcblist *rsclp, char *context);
+#else
+#define trace_rcu_segcb_list(...)
+#endif
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 50af465729f4..e3381ff67fc6 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1492,6 +1492,8 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, 
struct rcu_data *rdp)
	if (!rcu_segcblist_pend_cbs(&rdp->cblist))
return false;
  
+	trace_rcu_segcb_list(&rdp->cblist, "SegCbPreAcc");


Use TPS("SegCbPreAcc") ?


Thanks
Neeraj


+
/*
 * Callbacks are often registered with incomplete grace-period
 * 

Re: [PATCH v6 1/4] rcu/tree: Make rcu_do_batch count how many callbacks were executed

2020-10-14 Thread Neeraj Upadhyay




On 9/23/2020 8:52 PM, Joel Fernandes (Google) wrote:

Currently, rcu_do_batch() depends on the unsegmented callback list's len field
to know how many CBs are executed. This field counts down from 0 as CBs are
dequeued.  It is possible that all CBs could not be run because of reaching
limits in which case the remaining unexecuted callbacks are requeued in the
CPU's segcblist.

The number of callbacks that were not requeued is then the negative count (how
many CBs were run) stored in the rcl->len which has been counting down on every
dequeue. This negative count is then added to the per-cpu segmented callback
list's to correct its count.

Such a design works against future efforts to track the length of each segment
of the segmented callback list. The reason is because
rcu_segcblist_extract_done_cbs() will be populating the unsegmented callback
list's length field (rcl->len) during extraction.

Also, the design of counting down from 0 is confusing and error-prone IMHO.

This commit therefore explicitly counts how many callbacks were executed in
rcu_do_batch() itself, and uses that to update the per-CPU segcb list's ->len
field, without relying on the negativity of rcl->len.

Signed-off-by: Joel Fernandes (Google) 
---
  kernel/rcu/rcu_segcblist.c | 2 +-
  kernel/rcu/rcu_segcblist.h | 1 +
  kernel/rcu/tree.c  | 9 -
  3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
index 2d2a6b6b9dfb..bb246d8c6ef1 100644
--- a/kernel/rcu/rcu_segcblist.c
+++ b/kernel/rcu/rcu_segcblist.c
@@ -95,7 +95,7 @@ static void rcu_segcblist_set_len(struct rcu_segcblist 
*rsclp, long v)
   * This increase is fully ordered with respect to the callers accesses
   * both before and after.
   */
-static void rcu_segcblist_add_len(struct rcu_segcblist *rsclp, long v)
+void rcu_segcblist_add_len(struct rcu_segcblist *rsclp, long v)
  {
  #ifdef CONFIG_RCU_NOCB_CPU
smp_mb__before_atomic(); /* Up to the caller! */
diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
index 5c293afc07b8..b90725f81d77 100644
--- a/kernel/rcu/rcu_segcblist.h
+++ b/kernel/rcu/rcu_segcblist.h
@@ -76,6 +76,7 @@ static inline bool rcu_segcblist_restempty(struct 
rcu_segcblist *rsclp, int seg)
  }
  
  void rcu_segcblist_inc_len(struct rcu_segcblist *rsclp);

+void rcu_segcblist_add_len(struct rcu_segcblist *rsclp, long v);
  void rcu_segcblist_init(struct rcu_segcblist *rsclp);
  void rcu_segcblist_disable(struct rcu_segcblist *rsclp);
  void rcu_segcblist_offload(struct rcu_segcblist *rsclp);
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 7623128d0020..50af465729f4 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2427,7 +2427,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
		rcu_segcblist_is_offloaded(&rdp->cblist);
struct rcu_head *rhp;
struct rcu_cblist rcl = RCU_CBLIST_INITIALIZER(rcl);
-   long bl, count;
+   long bl, count = 0;
long pending, tlimit = 0;
  
  	/* If no callbacks are ready, just return. */

@@ -2472,6 +2472,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
	for (; rhp; rhp = rcu_cblist_dequeue(&rcl)) {
rcu_callback_t f;
  
+		count++;

debug_rcu_head_unqueue(rhp);
  
		rcu_lock_acquire(&rcu_callback_map);

@@ -2485,9 +2486,8 @@ static void rcu_do_batch(struct rcu_data *rdp)
  
  		/*

 * Stop only if limit reached and CPU has something to do.
-* Note: The rcl structure counts down from zero.
 */
-   if (-rcl.len >= bl && !offloaded &&
+   if (count >= bl && !offloaded &&
(need_resched() ||
 (!is_idle_task(current) && !rcu_is_callbacks_kthread(
break;


Update below usage of -rcl.len also?

if (likely((-rcl.len & 31) || local_clock() < tlimit))
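
In other words, with the positive running count, that remaining check would
presumably become something like this (illustrative only):

	if (likely((count & 31) || local_clock() < tlimit))
		continue;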


Thanks
Neeraj


@@ -2510,7 +2510,6 @@ static void rcu_do_batch(struct rcu_data *rdp)
  
  	local_irq_save(flags);

rcu_nocb_lock(rdp);
-   count = -rcl.len;
rdp->n_cbs_invoked += count;
trace_rcu_batch_end(rcu_state.name, count, !!rcl.head, need_resched(),
is_idle_task(current), rcu_is_callbacks_kthread());
@@ -2518,7 +2517,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
/* Update counts and requeue any remaining callbacks. */
	rcu_segcblist_insert_done_cbs(&rdp->cblist, &rcl);
smp_mb(); /* List handling before counting for rcu_barrier(). */
-   rcu_segcblist_insert_count(&rdp->cblist, &rcl);
+   rcu_segcblist_add_len(&rdp->cblist, -count);
  
  	/* Reinstate batch limit if we have worked down the excess. */

	count = rcu_segcblist_n_cbs(&rdp->cblist);



--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member of the Code Aurora Forum, hosted by The Linux Foundation


[tip: core/rcu] rcu/tree: Remove CONFIG_PREEMPT_RCU check in force_qs_rnp()

2020-10-09 Thread tip-bot2 for Neeraj Upadhyay
The following commit has been merged into the core/rcu branch of tip:

Commit-ID: 9b1ce0acb5e65e9ea1e6b322562d072f9f7d1f6e
Gitweb:
https://git.kernel.org/tip/9b1ce0acb5e65e9ea1e6b322562d072f9f7d1f6e
Author:Neeraj Upadhyay 
AuthorDate:Mon, 22 Jun 2020 23:37:03 +05:30
Committer: Paul E. McKenney 
CommitterDate: Mon, 24 Aug 2020 18:36:06 -07:00

rcu/tree: Remove CONFIG_PREEMPT_RCU check in force_qs_rnp()

Originally, the call to rcu_preempt_blocked_readers_cgp() from
force_qs_rnp() had to be conditioned on CONFIG_PREEMPT_RCU=y, as in
commit a77da14ce9af ("rcu: Yet another fix for preemption and CPU
hotplug").  However, there is now a CONFIG_PREEMPT_RCU=n definition of
rcu_preempt_blocked_readers_cgp() that unconditionally returns zero, so
invoking it is now safe.  In addition, the CONFIG_PREEMPT_RCU=n definition
of rcu_initiate_boost() simply releases the rcu_node structure's ->lock,
which is what happens when the "if" condition evaluates to false.

This commit therefore drops the IS_ENABLED(CONFIG_PREEMPT_RCU) check,
so that rcu_initiate_boost() is called only in CONFIG_PREEMPT_RCU=y
kernels when there are readers blocking the current grace period.
This does not change the behavior, but reduces code-reader confusion by
eliminating non-CONFIG_PREEMPT_RCU=y calls to rcu_initiate_boost().
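
For reference, the CONFIG_PREEMPT_RCU=n definitions being relied on look
roughly like the following (paraphrased from tree_plugin.h; treat the exact
bodies as a sketch rather than a quotation):

	/* No preemptible RCU, so there can be no blocked readers. */
	static int rcu_preempt_blocked_readers_cgp(struct rcu_node *rnp)
	{
		return 0;
	}

	/* No RCU priority boosting needed: just drop the rnp lock. */
	static void rcu_initiate_boost(struct rcu_node *rnp, unsigned long flags)
		__releases(rnp->lock)
	{
		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
	}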

Signed-off-by: Neeraj Upadhyay 
Signed-off-by: Paul E. McKenney 
---
 kernel/rcu/tree.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 4770d77..acc926f 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2533,8 +2533,7 @@ static void force_qs_rnp(int (*f)(struct rcu_data *rdp))
raw_spin_lock_irqsave_rcu_node(rnp, flags);
rcu_state.cbovldnext |= !!rnp->cbovldmask;
if (rnp->qsmask == 0) {
-   if (!IS_ENABLED(CONFIG_PREEMPT_RCU) ||
-   rcu_preempt_blocked_readers_cgp(rnp)) {
+   if (rcu_preempt_blocked_readers_cgp(rnp)) {
/*
 * No point in scanning bits because they
 * are all zero.  But we might need to


[tip: core/rcu] rcu/tree: Force quiescent state on callback overload

2020-10-09 Thread tip-bot2 for Neeraj Upadhyay
The following commit has been merged into the core/rcu branch of tip:

Commit-ID: 9c39245382de4d52a122641952900709d4a9950b
Gitweb:
https://git.kernel.org/tip/9c39245382de4d52a122641952900709d4a9950b
Author:Neeraj Upadhyay 
AuthorDate:Mon, 22 Jun 2020 00:07:27 +05:30
Committer: Paul E. McKenney 
CommitterDate: Mon, 24 Aug 2020 18:36:05 -07:00

rcu/tree: Force quiescent state on callback overload

On callback overload, it is necessary to quickly detect idle CPUs,
and rcu_gp_fqs_check_wake() checks for this condition.  Unfortunately,
the code following the call to this function does not repeat this check,
which means that in reality there is no actual quiescent-state forcing,
merely a couple of quick and pointless wakeups at the beginning of the
grace period.

This commit therefore adds a check for the RCU_GP_FLAG_OVLD flag in
the post-wakeup "if" statement in rcu_gp_fqs_loop().

Fixes: 1fca4d12f4637 ("rcu: Expedite first two FQS scans under 
callback-overload conditions")
Reviewed-by: Joel Fernandes (Google) 
Signed-off-by: Neeraj Upadhyay 
Signed-off-by: Paul E. McKenney 
---
 kernel/rcu/tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 8969120..4770d77 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1884,7 +1884,7 @@ static void rcu_gp_fqs_loop(void)
break;
/* If time for quiescent-state forcing, do it. */
if (!time_after(rcu_state.jiffies_force_qs, jiffies) ||
-   (gf & RCU_GP_FLAG_FQS)) {
+   (gf & (RCU_GP_FLAG_FQS | RCU_GP_FLAG_OVLD))) {
trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
   TPS("fqsstart"));
rcu_gp_fqs(first_gp_fqs);


Re: [PATCH v2] rcu/tree: nocb: Avoid raising softirq when there are ready to execute CBs

2020-10-08 Thread Neeraj Upadhyay




On 10/8/2020 4:04 AM, Paul E. McKenney wrote:

On Sun, Oct 04, 2020 at 10:11:32PM -0400, Joel Fernandes (Google) wrote:

During testing, I see it is possible that rcu_pending() returns 1 when
offloaded callbacks are ready to execute thus raising the RCU softirq.

However, softirq does not execute offloaded callbacks. They are executed in a
kthread which is awakened independent of the softirq.

This commit therefore avoids raising the softirq in the first place. That's
probably a good thing considering that the purpose of callback offloading is to
reduce softirq activity.

Passed 30 minute tests of TREE01 through TREE09 each.

On TREE08, I notice that there is at most 150us from when the softirq was
NOT raised even though ready cbs were present, to when the ready callbacks
were invoked by the rcuop thread. This also further confirms that there is
no need to raise the softirq for ready cbs in the first place.

Cc: neer...@codeaurora.org
Signed-off-by: Joel Fernandes (Google) 


Looks good, applied, thank you!  I reworked things a bit based on
previous patches and to more precisely capture why this patch does
not cause additional problems.  Please let me know if I messed
anything up.

Thanx, Paul



commit 33847a34a2d261354a79b4a24d9d37222e8ec888
Author: Joel Fernandes (Google) 
Date:   Wed Oct 7 13:50:36 2020 -0700

 rcu/tree: nocb: Avoid raising softirq for offloaded ready-to-execute CBs
 
 Testing showed that rcu_pending() can return 1 when offloaded callbacks

 are ready to execute.  This invokes RCU core processing, for example,
 by raising RCU_SOFTIRQ, eventually resulting in a call to rcu_core().
 However, rcu_core() explicitly avoids in any way manipulating offloaded
 callbacks, which are instead handled by the rcuog and rcuoc kthreads,
 which work independently of rcu_core().
 
 One exception to this independence is that rcu_core() invokes

 do_nocb_deferred_wakeup(), however, rcu_pending() also checks
 rcu_nocb_need_deferred_wakeup() in order to correctly handle this case,
 invoking rcu_core() when needed.
 
 This commit therefore avoids needlessly invoking RCU core processing

 by checking rcu_segcblist_ready_cbs() only on non-offloaded CPUs.
 This reduces overhead, for example, by reducing softirq activity.
 
 This change passed 30 minute tests of TREE01 through TREE09 each.
 
 On TREE08, there is at most 150us from the time that rcu_pending() chose

 not to invoke RCU core processing to the time when the ready callbacks
 were invoked by the rcuoc kthread.  This provides further evidence that
 there is no need to invoke rcu_core() for offloaded callbacks that are
 ready to invoke.
 
 Cc: Neeraj Upadhyay 

 Signed-off-by: Joel Fernandes (Google) 
 Signed-off-by: Paul E. McKenney 



Reviewed-by: Neeraj Upadhyay 


Thanks
Neeraj


diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 85e3f29..bfd38f2 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3716,7 +3716,8 @@ static int rcu_pending(int user)
return 1;
  
  	/* Does this CPU have callbacks ready to invoke? */

-   if (rcu_segcblist_ready_cbs(&rdp->cblist))
+   if (!rcu_segcblist_is_offloaded(&rdp->cblist) &&
+   rcu_segcblist_ready_cbs(&rdp->cblist))
return 1;
  
  	/* Has RCU gone idle with this CPU needing another grace period? */




--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member of the Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH] rcu: Clarify nocb kthreads naming in RCU_NOCB_CPU config

2020-09-24 Thread Neeraj Upadhyay

Hi Paul,

On 9/25/2020 4:29 AM, Paul E. McKenney wrote:

On Thu, Sep 24, 2020 at 12:04:10PM +0530, Neeraj Upadhyay wrote:

Clarify the "x" in rcuox/N naming in RCU_NOCB_CPU config
description.

Signed-off-by: Neeraj Upadhyay 


Applied with a few additional updates as shown below.  As always, please
let me know if I messed anything up.



Looks good! thanks!


Thanks
Neeraj


Thanx, Paul



commit 8d1d776b4998896a6f8f4608edb0b258bd37ec9f
Author: Neeraj Upadhyay 
Date:   Thu Sep 24 12:04:10 2020 +0530

 rcu: Clarify nocb kthreads naming in RCU_NOCB_CPU config
 
 This commit clarifies that the "p" and the "s" in the in the RCU_NOCB_CPU

 config-option description refer to the "x" in the "rcuox/N" kthread name.
 
 Signed-off-by: Neeraj Upadhyay 

 [ paulmck: While in the area, update description and advice. ]
 Signed-off-by: Paul E. McKenney 

diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
index b71e21f..cdc57b4 100644
--- a/kernel/rcu/Kconfig
+++ b/kernel/rcu/Kconfig
@@ -221,19 +221,23 @@ config RCU_NOCB_CPU
  Use this option to reduce OS jitter for aggressive HPC or
  real-time workloads.  It can also be used to offload RCU
  callback invocation to energy-efficient CPUs in battery-powered
- asymmetric multiprocessors.
+ asymmetric multiprocessors.  The price of this reduced jitter
+ is that the overhead of call_rcu() increases and that some
+ workloads will incur significant increases in context-switch
+ rates.
  
  	  This option offloads callback invocation from the set of CPUs

  specified at boot time by the rcu_nocbs parameter.  For each
  such CPU, a kthread ("rcuox/N") will be created to invoke
  callbacks, where the "N" is the CPU being offloaded, and where
- the "p" for RCU-preempt (PREEMPTION kernels) and "s" for RCU-sched
- (!PREEMPTION kernels).  Nothing prevents this kthread from running
- on the specified CPUs, but (1) the kthreads may be preempted
- between each callback, and (2) affinity or cgroups can be used
- to force the kthreads to run on whatever set of CPUs is desired.
-
- Say Y here if you want to help to debug reduced OS jitter.
+ the "x" is "p" for RCU-preempt (PREEMPTION kernels) and "s" for
+ RCU-sched (!PREEMPTION kernels).  Nothing prevents this kthread
+ from running on the specified CPUs, but (1) the kthreads may be
+ preempted between each callback, and (2) affinity or cgroups can
+ be used to force the kthreads to run on whatever set of CPUs is
+ desired.
+
+ Say Y here if you need reduced OS jitter, despite added overhead.
  Say N here if you are unsure.
  
  config TASKS_TRACE_RCU_READ_MB




--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member of the Code Aurora Forum, hosted by The Linux Foundation


[PATCH] rcu: Clarify nocb kthreads naming in RCU_NOCB_CPU config

2020-09-24 Thread Neeraj Upadhyay
Clarify the "x" in rcuox/N naming in RCU_NOCB_CPU config
description.

Signed-off-by: Neeraj Upadhyay 
---
 kernel/rcu/Kconfig | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
index b71e21f..5b22747 100644
--- a/kernel/rcu/Kconfig
+++ b/kernel/rcu/Kconfig
@@ -227,11 +227,12 @@ config RCU_NOCB_CPU
  specified at boot time by the rcu_nocbs parameter.  For each
  such CPU, a kthread ("rcuox/N") will be created to invoke
  callbacks, where the "N" is the CPU being offloaded, and where
- the "p" for RCU-preempt (PREEMPTION kernels) and "s" for RCU-sched
- (!PREEMPTION kernels).  Nothing prevents this kthread from running
- on the specified CPUs, but (1) the kthreads may be preempted
- between each callback, and (2) affinity or cgroups can be used
- to force the kthreads to run on whatever set of CPUs is desired.
+ the "x" is "p" for RCU-preempt (PREEMPTION kernels) and "s" for
+ RCU-sched (!PREEMPTION kernels).  Nothing prevents this kthread
+ from running on the specified CPUs, but (1) the kthreads may be
+ preempted between each callback, and (2) affinity or cgroups can
+ be used to force the kthreads to run on whatever set of CPUs is
+ desired.
 
  Say Y here if you want to help to debug reduced OS jitter.
  Say N here if you are unsure.
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



Re: [PATCH V2] rcu/tree: Correctly handle single cpu check in rcu_blocking_is_gp

2020-09-23 Thread Neeraj Upadhyay

Hi Paul,

On 9/24/2020 2:33 AM, Paul E. McKenney wrote:

On Wed, Sep 23, 2020 at 12:59:33PM +0530, Neeraj Upadhyay wrote:

Currently, for non-preempt kernels (with CONFIG_PREEMPTION=n),
rcu_blocking_is_gp() checks (with preemption disabled) whether
there is only one CPU online. It uses num_online_cpus() to
decide whether only one CPU is online. If only a single CPU is
online, synchronize_rcu() is optimized to return without doing
all the work of waiting for a grace period. However, there are
a few issues with the num_online_cpus() check used, for the
transition of __num_online_cpus from 2 -> 1 on the CPU-down path
and from 1 -> 2 on the CPU-up path.


Again, good catch!

As usual, I could not resist editing the commit log and comments, so
could you please look the following over to make sure that I did not
mess anything up?



The commit log and comments look very good! Thanks!


Thanks
Neeraj


Thanx, Paul



commit 7af8c1c8d13c6c9c7013fbfef77f843dbc430c4b
Author: Neeraj Upadhyay 
Date:   Wed Sep 23 12:59:33 2020 +0530

 rcu: Fix single-CPU check in rcu_blocking_is_gp()
 
 Currently, for CONFIG_PREEMPTION=n kernels, rcu_blocking_is_gp() uses

 num_online_cpus() to determine whether there is only one CPU online.  When
 there is only single CPU online, the simple fact that synchronize_rcu()
 could be legally called implies that a full grace period has elapsed.
 Therefore, in the single-CPU case, synchronize_rcu() simply returns
 immediately.  Unfortunately, num_online_cpus() is unreliable while a
 CPU-hotplug operation is transitioning to or from single-CPU operation
 because:
 
 1.  num_online_cpus() uses atomic_read(&__num_online_cpus) to

 locklessly sample the number of online CPUs.  The hotplug locks
 are not held, which means that an incoming CPU can concurrently
 update this count.  This in turn means that an RCU read-side
 critical section on the incoming CPU might observe updates
 prior to the grace period, but also that this critical section
 might extend beyond the end of the optimized synchronize_rcu().
 This breaks RCU's fundamental guarantee.
 
 2.  In addition, num_online_cpus() does no ordering, thus providing

 another way that RCU's fundamental guarantee can be broken by
 the current code.
 
 3.  The most probable failure mode happens on outgoing CPUs.

 The outgoing CPU updates the count of online CPUs in the
 CPUHP_TEARDOWN_CPU stop-machine handler, which is fine in
 and of itself due to preemption being disabled at the call
 to num_online_cpus().  Unfortunately, after that stop-machine
 handler returns, the CPU takes one last trip through the
 scheduler (which has RCU readers) and, after the resulting
 context switch, one final dive into the idle loop.  During this
 time, RCU needs to keep track of two CPUs, but num_online_cpus()
 will say that there is only one, which in turn means that the
 surviving CPU will incorrectly ignore the outgoing CPU's RCU
 read-side critical sections.
 
 This problem is illustrated by the following litmus test in which P0()

 corresponds to synchronize_rcu() and P1() corresponds to the incoming CPU.
 The herd7 tool confirms that the "exists" clause can be satisfied,
 thus demonstrating that this breakage can happen according to the Linux
 kernel memory model.
 
{

  int x = 0;
  atomic_t numonline = ATOMIC_INIT(1);
}
 
P0(int *x, atomic_t *numonline)

{
  int r0;
  WRITE_ONCE(*x, 1);
  r0 = atomic_read(numonline);
  if (r0 == 1) {
smp_mb();
  } else {
synchronize_rcu();
  }
  WRITE_ONCE(*x, 2);
}
 
P1(int *x, atomic_t *numonline)

{
  int r0; int r1;
 
  atomic_inc(numonline);

  smp_mb();
  rcu_read_lock();
  r0 = READ_ONCE(*x);
  smp_rmb();
  r1 = READ_ONCE(*x);
  rcu_read_unlock();
}
 
locations [x;numonline;]
 
exists (1:r0=0 /\ 1:r1=2)
 
 It is important to note that these problems arise only when the system

 is transitioning to or from single-CPU operation.
 
 One solution would be to hold the CPU-hotplug locks while sampling

 num_online_cpus(), which was in fact the intent of the (redundant)
 preempt_disable() and preempt_enable() surrounding this call to
 num_online_cpus().  Actually blocking CPU hotplug would not only result
 in excessive overhead, but would also unnecessaril

[PATCH V2] rcu/tree: Correctly handle single cpu check in rcu_blocking_is_gp

2020-09-23 Thread Neeraj Upadhyay
Currently, for non-preempt kernels (with CONFIG_PREEMPTION=n),
rcu_blocking_is_gp() checks (with preemption disabled) whether
there is only one CPU online. It uses num_online_cpus() to
decide whether only one CPU is online. If only a single CPU is
online, synchronize_rcu() is optimized to return without doing
all the work of waiting for a grace period. However, there are
a few issues with the num_online_cpus() check used, for the
transition of __num_online_cpus from 2 -> 1 on the CPU-down path
and from 1 -> 2 on the CPU-up path.

1. For the CPU-up path, num_online_cpus() does an
   atomic_read(&__num_online_cpus). As the hotplug locks are not held,
   this does not ensure that a new incoming CPU's update of the count
   is visible. This can result in a read-side section on the new
   incoming CPU observing updates which should not be visible beyond
   the grace period corresponding to synchronize_rcu().

   For example, the litmus test below, where the P0 process corresponds
   to synchronize_rcu() and P1 corresponds to the new online CPU, has
   positive witnesses, confirming that a read-side section can extend
   before and after the grace period, thereby breaking the guarantees
   provided by synchronize_rcu().

   {
 int x = 0;
 atomic_t numonline = ATOMIC_INIT(1);
   }

   P0(int *x, atomic_t *numonline)
   {
 int r0;
 WRITE_ONCE(*x, 1);
 r0 = atomic_read(numonline);
 if (r0 == 1) {
   smp_mb();
 } else {
   synchronize_rcu();
 }
 WRITE_ONCE(*x, 2);
   }

   P1(int *x, atomic_t *numonline)
   {
 int r0; int r1;

 atomic_inc(numonline);
 smp_mb();
 rcu_read_lock();
 r0 = READ_ONCE(*x);
 smp_rmb();
 r1 = READ_ONCE(*x);
 rcu_read_unlock();
   }

   locations [x;numonline;]

   exists (1:r0=0 /\ 1:r1=2)

2. The second problem is that the same early exit from synchronize_rcu()
   does not provide full ordering (a full memory barrier) with respect
   to memory accesses after the synchronize_rcu() call.

3. The third, more important issue is related to the outgoing CPU.
   Checking only __num_online_cpus with preemption disabled isn't
   sufficient for RCU because, on completion of the CPUHP_TEARDOWN_CPU
   stop-machine (which removes the outgoing CPU from __num_online_cpus),
   the CPU switches to the idle task. So checking only __num_online_cpus
   does not cover RCU read-side sections in scheduler code (before
   switching to the idle task) or any potential read-side sections in
   the idle task, before the final RCU-quiesce entry into
   cpuhp_report_idle_dead() -> rcu_report_dead().

The underlying problem is that changes to the number of online CPUs happen
in places that are not helpful for lockless access from synchronize_rcu():
for an offlining CPU, the update happens in stop-machine on that CPU, before
the CPU is marked offline in RCU's bookkeeping, resulting in unwatched RCU
critical sections; and for an onlining CPU, it happens on the new CPU after
it is online, resulting in visibility issues.

To handle these issues, add a new rcu_state member n_online_cpus to keep
count of the current number of online CPUs. This counter is incremented
early in the CPU-hotplug CPU-online process, that is, on the control CPU,
prior to the new CPU coming online. For the outgoing CPU, the counter is
decremented on the control CPU, after the offlining CPU is long gone.

Updating this counter from the control CPU means that, if
rcu_state.n_online_cpus is equal to one in rcu_blocking_is_gp(), the
following is guaranteed:

a. There is only one CPU, and that CPU sees all prior accesses made by
   any CPU that was online at the time of its access.  Furthermore,
   rcu_state.n_online_cpus value cannot change, as CONFIG_PREEMPTION=n.

b. All later CPUs (both this one and any that come online later on) are
   guaranteed to see all accesses by any CPU prior to this point in the
   code. The CPU-hotplug code provides the required memory barriers to
   ensure this.

This eliminates the need for any additional memory barriers in the
single-CPU-online case; it also eliminates the early-counter-update and
visibility problems with the __num_online_cpus value. In the
multiple-CPUs-online case, synchronize_rcu() does all the work of grace
period initiation and waiting, and that provides the required ordering
guarantees. Updating rcu_state.n_online_cpus from the control CPU results
in unnecessary calls to the synchronize_rcu() slow path during the
CPU-online process, but that should have negligible impact.
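
A sketch of how the synchronize_rcu() fast path can then use the new
counter (the body below is illustrative; the comment summarizes the
single-CPU reasoning from the guarantees above):

	static int rcu_blocking_is_gp(void)
	{
		int ret;

		if (IS_ENABLED(CONFIG_PREEMPTION))
			return rcu_scheduler_active == RCU_SCHEDULER_INACTIVE;
		might_sleep();  /* Check for RCU read-side critical section. */
		preempt_disable();
		/*
		 * If rcu_state.n_online_cpus is 1, there is only one CPU and it
		 * sees all prior accesses made by any CPU that was online at the
		 * time of its access; any CPU that comes online later is ordered
		 * by the CPU-hotplug code on the control CPU.
		 */
		ret = READ_ONCE(rcu_state.n_online_cpus) <= 1;
		preempt_enable();
		return ret;
	}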

Signed-off-by: Neeraj Upadhyay 
---

Changes in V2:

- Make rcu_state.n_online_cpus int, instead of atomic_t.
- Move counter updates to rcutree_prepare_cpu() and rcutree_dead_cpu().
- Update commit log and code comments to highlight single-cpu behavior.

 kernel/rcu/tree.c | 21 -
 kernel/rcu/tree.h |  1 +
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 2424e2a..facdec9 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2405,6 +2405,7 @@ int rcutree_dead_cpu(unsigned int cpu)
if (!IS_ENABLED(CONFI

Re: [PATCH] rcu/tree: Correctly handle single cpu check in rcu_blocking_is_gp

2020-09-23 Thread Neeraj Upadhyay

Hi Paul,

On 9/23/2020 1:59 AM, Paul E. McKenney wrote:

On Tue, Sep 22, 2020 at 01:15:57AM +0530, Neeraj Upadhyay wrote:

Currently, for non-preempt kernels (with CONFIG_PREEMPTION=n),
rcu_blocking_is_gp() checks (with preemption disabled) whether
there is only one CPU online. It uses num_online_cpus() to
decide whether only one CPU is online. If only a single CPU is
online, synchronize_rcu() is optimized to return without doing
all the work of waiting for a grace period. However, there are
a few issues with the num_online_cpus() check used:


Great catch!!!

I do have some questions about your suggested fix, though.


1. num_online_cpus() does an atomic_read(&__num_online_cpus). As the
hotplug locks are not held, this does not ensure that a new incoming
CPU's update of the count is visible. This can result in a read-side
section on the new incoming CPU observing updates which should not be
visible beyond the grace period corresponding to synchronize_rcu().

For example, the litmus test below, where the P0 process corresponds to
synchronize_rcu() and P1 corresponds to the new online CPU, has positive
witnesses, confirming that a read-side section can extend before and
after the grace period, thereby breaking the guarantees provided by
synchronize_rcu().

{
  int x = 0;
  atomic_t numonline = ATOMIC_INIT(1);
}

P0(int *x, atomic_t *numonline)
{
  int r0;
  WRITE_ONCE(*x, 1);
  r0 = atomic_read(numonline);
  if (r0 == 1) {
smp_mb();
  } else {
synchronize_rcu();
  }
  WRITE_ONCE(*x, 2);
}

P1(int *x, atomic_t *numonline)
{
  int r0; int r1;

  atomic_inc(numonline);
  smp_mb();
  rcu_read_lock();
  r0 = READ_ONCE(*x);
  smp_rmb();
  r1 = READ_ONCE(*x);
  rcu_read_unlock();
}

locations [x;numonline;]

exists (1:r0=0 /\ 1:r1=2)

2. The second problem is that the same early exit from synchronize_rcu()
does not provide full ordering (a full memory barrier) with respect to
memory accesses after the synchronize_rcu() call.

3. The third, more important issue is related to the outgoing CPU.
Checking only __num_online_cpus with preemption disabled isn't sufficient
for RCU because, on completion of the CPUHP_TEARDOWN_CPU stop-machine
(which removes the outgoing CPU from __num_online_cpus), the CPU switches
to the idle task. So checking only __num_online_cpus does not cover RCU
read-side sections in scheduler code (before switching to the idle task)
or any potential read-side sections in the idle task, before the final
RCU-quiesce entry into cpuhp_report_idle_dead() -> rcu_report_dead().

To handle these issues, add a new rcu_state member n_online_cpus, to
keep account of the current number of online cpus. The atomic updates
to this counter from rcu_report_dead() and rcu_cpu_starting(), together
with the added read/write memory-ordering semantics, ensure that the
synchronize_rcu() fast path waits for all read side sections on cpus
that RCU considers online, i.e. after rcu_cpu_starting() and before
rcu_report_dead().
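
As a rough sketch of what this enables (illustrative only, following the
description above rather than the posted diff):

  static int rcu_blocking_is_gp(void)
  {
          int ret;

          if (IS_ENABLED(CONFIG_PREEMPTION))
                  return rcu_scheduler_active == RCU_SCHEDULER_INACTIVE;
          might_sleep();  /* Catch calls from RCU read-side critical sections. */
          preempt_disable();
          /*
           * With the counter updated only between rcu_cpu_starting() and
           * rcu_report_dead(), a value of 1 really does mean a single
           * RCU-online CPU, closing the windows described in points 1 and 3.
           */
          ret = READ_ONCE(rcu_state.n_online_cpus) <= 1;
          preempt_enable();
          return ret;
  }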

Signed-off-by: Neeraj Upadhyay 
---

Below is the reproducer for issue described in point 3; this snippet
is based on klitmus generated test, which is modified to sample reads
from idle thread:

static void code0(int* x) {
WRITE_ONCE(*x, 1);
   idle_ctr = 0;
   smp_mb();
   mdelay(10);
   WRITE_ONCE(*x, 1);
   idle_ctr = 1;
   synchronize_rcu();
   WRITE_ONCE(*x, 2);
   idle_ctr = 2;

}

static int thread0(void *_p) {
   int _j, _i;
   ctx_t *_a = (ctx_t *)_p;

   smp_mb();
   for (_j = 0 ; _j < stride ; _j++) {
 for (_i = _j ; _i < size ; _i += stride) {
   while (idle_wait1) {
   cpu_relax();
   cond_resched();
 }
   code0(&_a->x[_i]);
   smp_mb();
   get_online_cpus();
   idle_wait1 = true;
   put_online_cpus();
 }
   }
   atomic_inc();
   smp_mb();
   wake_up(wq);
   smp_mb();
   do_exit(0);
}


static void code1(int* x,int* out_1_r1,int* out_1_r0) {

   int r0; int r1;

   r0 = READ_ONCE(idle_ctr_snap1);
   r1 = READ_ONCE(idle_ctr_snap2);

   *out_1_r1 = (int)r1;
   *out_1_r0 = (int)r0;
}

static int thread1(void *_p) {
   ctx_t *_a = (ctx_t *)_p;
   int _j, _i;

   smp_mb();
   for (_j = 0 ; _j < stride ; _j++) {
 for (_i = _j ; _i < size ; _i += stride) {
   while (idle_wait2) {
   cpu_relax();
   cond_resched();
   }
   get_online_cpus();
   code1(&_a->x[_i],&_a->out_1_r1[_i],&_a->out_1_r0[_i]);
   smp_mb();
   idle_wait2 = true;
   put_online_cpus();
 }
   }
   atomic_inc();
   smp_mb();
   wake_up(wq);
   smp_mb();
   do_exit(0);
}

Idle thread snippet:

if (cpu_is_offline(cpu)) {
 smp_mb();
 idle_wait1 = false;
 mdelay(8);
 smp_mb();
 rcu_read_lock();
 idle_ctr_snap1 = idle_ctr;
 mdelay(40);
 smp_rmb();
 idle_ctr_snap2 = idle_c

[PATCH] rcu/tree: Correctly handle single cpu check in rcu_blocking_is_gp

2020-09-21 Thread Neeraj Upadhyay
Currently, for non-preempt kernels (with CONFIG_PREEMPTION=n),
rcu_blocking_is_gp() checks (with preemption disabled) whether
there is only one cpu online. It uses num_online_cpus() to
decide whether only one cpu is online. If there is only a single
cpu online, synchronize_rcu() is optimized to return without
doing all the work to wait for a grace period. However, there are
a few issues with the num_online_cpus() check used:

1. num_online_cpus() does an atomic_read(&__num_online_cpus). As
   hotplug locks are not held, this does not ensure that an
   incoming cpu's update of the count is visible. This can
   result in a read side section on the new incoming cpu
   observing updates which should not be visible beyond the grace
   period corresponding to synchronize_rcu().

   For example, the litmus test below, where process P0 corresponds to
   synchronize_rcu() and P1 corresponds to the new online cpu, has
   positive witnesses, confirming the possibility of a read side
   section extending before and after the grace period, thereby
   breaking the guarantees provided by synchronize_rcu().

   {
 int x = 0;
 atomic_t numonline = ATOMIC_INIT(1);
   }

   P0(int *x, atomic_t *numonline)
   {
 int r0;
 WRITE_ONCE(*x, 1);
 r0 = atomic_read(numonline);
 if (r0 == 1) {
   smp_mb();
 } else {
   synchronize_rcu();
 }
 WRITE_ONCE(*x, 2);
   }

   P1(int *x, atomic_t *numonline)
   {
 int r0; int r1;

 atomic_inc(numonline);
 smp_mb();
 rcu_read_lock();
 r0 = READ_ONCE(*x);
 smp_rmb();
 r1 = READ_ONCE(*x);
 rcu_read_unlock();
   }

   locations [x;numonline;]

   exists (1:r0=0 /\ 1:r1=2)

2. The second problem is that the same early exit from synchronize_rcu()
   does not provide full ordering (a full memory barrier) w.r.t. memory
   accesses after the synchronize_rcu() call.

3. The third, more important issue is related to the outgoing cpu. Checking
   only __num_online_cpus with preemption disabled isn't sufficient
   for RCU, as, on completion of the CPUHP_TEARDOWN_CPU stop machine (which
   clears the outgoing cpu from __num_online_cpus), the CPU switches to the
   idle task. So, checking only __num_online_cpus does not consider
   RCU read side sections in scheduler code (before switching to the idle
   task) and any potential read side sections in the idle task, before the
   final RCU-quiesce entry into cpuhp_report_idle_dead() -> rcu_report_dead().

To handle these issues, add a new rcu_state member n_online_cpus, to
keep account of the current number of online cpus. The atomic updates
to this counter from rcu_report_dead() and rcu_cpu_starting(), together
with the added read/write memory-ordering semantics, ensure that the
synchronize_rcu() fast path waits for all read side sections on cpus
that RCU considers online, i.e. after rcu_cpu_starting() and before
rcu_report_dead().
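
For illustration, the ordering this relies on can be sketched as follows
(fragments only, not the posted diff; barrier placement per the description
above):

  /* Incoming CPU, in rcu_cpu_starting(): mark it RCU-online. */
  atomic_inc(&rcu_state.n_online_cpus);
  smp_mb__after_atomic();   /* Readers on this CPU are ordered after the inc. */

  /* Outgoing CPU, in rcu_report_dead(): its last RCU-online act. */
  smp_mb__before_atomic();  /* Prior readers are ordered before the dec. */
  atomic_dec(&rcu_state.n_online_cpus);

  /* synchronize_rcu() fast path, with preemption disabled: */
  if (atomic_read(&rcu_state.n_online_cpus) == 1) {
          smp_mb();  /* Pairs with the hotplug-side barriers above. */
          return;    /* Single RCU-online CPU: grace period is trivial. */
  }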

Signed-off-by: Neeraj Upadhyay 
---

Below is the reproducer for issue described in point 3; this snippet
is based on klitmus generated test, which is modified to sample reads
from idle thread:

static void code0(int* x) {
WRITE_ONCE(*x, 1);
  idle_ctr = 0;
  smp_mb();
  mdelay(10);
  WRITE_ONCE(*x, 1);
  idle_ctr = 1;
  synchronize_rcu();
  WRITE_ONCE(*x, 2);
  idle_ctr = 2;

}

static int thread0(void *_p) {
  int _j, _i;
  ctx_t *_a = (ctx_t *)_p;

  smp_mb();
  for (_j = 0 ; _j < stride ; _j++) {
for (_i = _j ; _i < size ; _i += stride) {
  while (idle_wait1) {
  cpu_relax();
  cond_resched();
}
  code0(&_a->x[_i]);
  smp_mb();
  get_online_cpus();
  idle_wait1 = true;
  put_online_cpus();
}
  }
  atomic_inc();
  smp_mb();
  wake_up(wq);
  smp_mb();
  do_exit(0);
}


static void code1(int* x,int* out_1_r1,int* out_1_r0) {

  int r0; int r1;

  r0 = READ_ONCE(idle_ctr_snap1);
  r1 = READ_ONCE(idle_ctr_snap2);

  *out_1_r1 = (int)r1;
  *out_1_r0 = (int)r0;
}

static int thread1(void *_p) {
  ctx_t *_a = (ctx_t *)_p;
  int _j, _i;

  smp_mb();
  for (_j = 0 ; _j < stride ; _j++) {
for (_i = _j ; _i < size ; _i += stride) {
  while (idle_wait2) {
  cpu_relax();
  cond_resched();
  }
  get_online_cpus();
  code1(&_a->x[_i],&_a->out_1_r1[_i],&_a->out_1_r0[_i]);
  smp_mb();
  idle_wait2 = true;
  put_online_cpus();
}
  }
  atomic_inc();
  smp_mb();
  wake_up(wq);
  smp_mb();
  do_exit(0);
}

Idle thread snippet:

if (cpu_is_offline(cpu)) {
smp_mb();
idle_wait1 = false;
mdelay(8);
smp_mb();
rcu_read_lock();
idle_ctr_snap1 = idle_ctr;
mdelay(40);
smp_rmb();
idle_ctr_snap2 = idle_ctr;
rcu_read_unlock();
smp_mb();
idle_wait2 = false;
tick_nohz_idle_stop_tick();
cpuhp_report_idle_dead();
arch_cpu_idle_dead();

 kernel/rcu/tree.c | 65 +++
 kernel/rcu/tree.h |  1 +
 2 files changed, 66 insertions(+)

dif

Re: [PATCH] rcu/tree: Remove CONFIG_PREMPT_RCU check in force_qs_rnp

2020-06-23 Thread Neeraj Upadhyay

Hi Paul,

On 6/23/2020 4:48 AM, Paul E. McKenney wrote:

On Mon, Jun 22, 2020 at 11:37:03PM +0530, Neeraj Upadhyay wrote:

Remove the CONFIG_PREEMPT_RCU check in force_qs_rnp(). Originally,
this check was required to skip executing the fqs failsafe
for rcu-sched, which was added in commit a77da14ce9af ("rcu:
Yet another fix for preemption and CPU hotplug"). However,
this failsafe has since been removed. So, clean up the
code to avoid any confusion around the need for boosting
for !CONFIG_PREEMPT_RCU.

Signed-off-by: Neeraj Upadhyay 


Good point, there is a !PREEMPT definition of the function
rcu_preempt_blocked_readers_cgp() that unconditionally returns zero.
And in !PREEMPT kernels, the same thing happens in the "if"
body as after it, so behavior is not changed.

I have queued and pushed this with an upgraded commit log as
shown below.

Thanx, Paul



Thanks! patch looks good to me!

Thanks
Neeraj


---
  kernel/rcu/tree.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 6226bfb..57c904b 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2514,8 +2514,7 @@ static void force_qs_rnp(int (*f)(struct rcu_data *rdp))
raw_spin_lock_irqsave_rcu_node(rnp, flags);
rcu_state.cbovldnext |= !!rnp->cbovldmask;
if (rnp->qsmask == 0) {
-   if (!IS_ENABLED(CONFIG_PREEMPT_RCU) ||
-   rcu_preempt_blocked_readers_cgp(rnp)) {
+   if (rcu_preempt_blocked_readers_cgp(rnp)) {
/*
 * No point in scanning bits because they
 * are all zero.  But we might need to
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project




commit a4600389c35010aef414b89e2817d4a527e751b5
Author: Neeraj Upadhyay 
Date:   Mon Jun 22 23:37:03 2020 +0530

 rcu/tree: Remove CONFIG_PREMPT_RCU check in force_qs_rnp()
 
 Originally, the call to rcu_preempt_blocked_readers_cgp() from

 force_qs_rnp() had to be conditioned on CONFIG_PREEMPT_RCU=y, as in
 commit a77da14ce9af ("rcu: Yet another fix for preemption and CPU
 hotplug").  However, there is now a CONFIG_PREEMPT_RCU=n definition of
 rcu_preempt_blocked_readers_cgp() that unconditionally returns zero, so
 invoking it is now safe.  In addition, the CONFIG_PREEMPT_RCU=n definition
 of rcu_initiate_boost() simply releases the rcu_node structure's ->lock,
 which is what happens when the "if" condition evaluates to false.
 
 This commit therefore drops the IS_ENABLED(CONFIG_PREEMPT_RCU) check,

 so that rcu_initiate_boost() is called only in CONFIG_PREEMPT_RCU=y
 kernels when there are readers blocking the current grace period.
 This does not change the behavior, but reduces code-reader confusion by
 eliminating non-CONFIG_PREEMPT_RCU=y calls to rcu_initiate_boost().
 
 Signed-off-by: Neeraj Upadhyay 

 Signed-off-by: Paul E. McKenney 
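
For reference, the CONFIG_PREEMPT_RCU=n stub referred to in the log above is
simply the following (condensed from kernel/rcu/tree_plugin.h; comment added
here for illustration):

  static int rcu_preempt_blocked_readers_cgp(struct rcu_node *rnp)
  {
          return 0;       /* No preemptible readers, so none can block the GP. */
  }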

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 6226bfb..57c904b 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2514,8 +2514,7 @@ static void force_qs_rnp(int (*f)(struct rcu_data *rdp))
raw_spin_lock_irqsave_rcu_node(rnp, flags);
rcu_state.cbovldnext |= !!rnp->cbovldmask;
if (rnp->qsmask == 0) {
-   if (!IS_ENABLED(CONFIG_PREEMPT_RCU) ||
-   rcu_preempt_blocked_readers_cgp(rnp)) {
+   if (rcu_preempt_blocked_readers_cgp(rnp)) {
/*
 * No point in scanning bits because they
 * are all zero.  But we might need to



--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member of the Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH] rcu/tree: Force quiescent state on callback overload

2020-06-23 Thread Neeraj Upadhyay

Hi Paul,

On 6/23/2020 4:23 AM, Paul E. McKenney wrote:

On Mon, Jun 22, 2020 at 09:16:24AM +0530, Neeraj Upadhyay wrote:

Hi Paul,

On 6/22/2020 8:43 AM, Paul E. McKenney wrote:

On Mon, Jun 22, 2020 at 01:30:31AM +0530, Neeraj Upadhyay wrote:

Hi Paul,

On 6/22/2020 1:20 AM, Paul E. McKenney wrote:

On Mon, Jun 22, 2020 at 12:07:27AM +0530, Neeraj Upadhyay wrote:

On callback overload, we want to force quiescent state immediately,
for the first and second fqs. Enforce the same, by including
RCU_GP_FLAG_OVLD flag, in fqsstart check.

Signed-off-by: Neeraj Upadhyay 


Good catch!

But what did you do to verify that this change does the right thing?

Thanx, Paul



I haven't done a runtime verification of this code path; I posted this,
based on review of this code.


My concern is that under overload, the FQS scans would happen continuously
rather than accelerating only the first such scan in a given grace period.
This would of course result in a CPU-bound grace-period kthread, which
users might not be all that happy with.

Or am I missing something subtle that prevents this?


Looks like under overload, only the first and second scans are accelerated?

	gf = 0;
	if (first_gp_fqs) {
		first_gp_fqs = false;
		gf = rcu_state.cbovld ? RCU_GP_FLAG_OVLD : 0;
	}


Very good, it does sound like you understand this, and it matches my
analysis and passes light testing, so I queued this one.  I did improve
the commit log, please check below.  The added detail is helpful to people
(including ourselves, by the way) who might need to look at this commit
some time in the future.



Thanks; patch looks good; I will try to put more efforts on commit log
for future patches.


If you have an x86 system lying around, running rcutorture is quite
straightforward.  Non-x86 systems can also run rcutorture, if nothing
else by using modprobe and rmmod as described here:

https://paulmck.livejournal.com/57769.html

The scripting described in the latter part of this document has worked
on ARMv8 and PowerPC, and might still work for all I know.


I will set it up at my end; the livejournal is pretty detailed! Thanks
for sharing this!


Thanks
Neeraj



Thanx, Paul



commit 9482524d7dd0aea5d32a6efa2979223eea07c029
Author: Neeraj Upadhyay 
Date:   Mon Jun 22 00:07:27 2020 +0530

 rcu/tree: Force quiescent state on callback overload
 
 On callback overload, it is necessary to quickly detect idle CPUs,

 and rcu_gp_fqs_check_wake() checks for this condition.  Unfortunately,
 the code following the call to this function does not repeat this check,
 which means that in reality there is no actual quiescent-state forcing,
 but instead only a couple of quick and pointless wakeups at the beginning
 of the grace period.
 
 This commit therefore adds a check for the RCU_GP_FLAG_OVLD flag in

 the post-wakeup "if" statement in rcu_gp_fqs_loop().
 
 Signed-off-by: Neeraj Upadhyay 

 Signed-off-by: Paul E. McKenney 

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index d0988a1..6226bfb 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1865,7 +1865,7 @@ static void rcu_gp_fqs_loop(void)
break;
/* If time for quiescent-state forcing, do it. */
if (!time_after(rcu_state.jiffies_force_qs, jiffies) ||
-   (gf & RCU_GP_FLAG_FQS)) {
+   (gf & (RCU_GP_FLAG_FQS | RCU_GP_FLAG_OVLD))) {
trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
   TPS("fqsstart"));
rcu_gp_fqs(first_gp_fqs);



--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member of the Code Aurora Forum, hosted by The Linux Foundation


[PATCH] rcu/tree: Remove CONFIG_PREMPT_RCU check in force_qs_rnp

2020-06-22 Thread Neeraj Upadhyay
Remove the CONFIG_PREEMPT_RCU check in force_qs_rnp(). Originally,
this check was required to skip executing the fqs failsafe
for rcu-sched, which was added in commit a77da14ce9af ("rcu:
Yet another fix for preemption and CPU hotplug"). However,
this failsafe has since been removed. So, clean up the
code to avoid any confusion around the need for boosting
for !CONFIG_PREEMPT_RCU.

Signed-off-by: Neeraj Upadhyay 
---
 kernel/rcu/tree.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 6226bfb..57c904b 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2514,8 +2514,7 @@ static void force_qs_rnp(int (*f)(struct rcu_data *rdp))
raw_spin_lock_irqsave_rcu_node(rnp, flags);
rcu_state.cbovldnext |= !!rnp->cbovldmask;
if (rnp->qsmask == 0) {
-   if (!IS_ENABLED(CONFIG_PREEMPT_RCU) ||
-   rcu_preempt_blocked_readers_cgp(rnp)) {
+   if (rcu_preempt_blocked_readers_cgp(rnp)) {
/*
 * No point in scanning bits because they
 * are all zero.  But we might need to
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



Re: [PATCH] rcu/tree: Force quiescent state on callback overload

2020-06-21 Thread Neeraj Upadhyay

Hi Paul,

On 6/22/2020 8:43 AM, Paul E. McKenney wrote:

On Mon, Jun 22, 2020 at 01:30:31AM +0530, Neeraj Upadhyay wrote:

Hi Paul,

On 6/22/2020 1:20 AM, Paul E. McKenney wrote:

On Mon, Jun 22, 2020 at 12:07:27AM +0530, Neeraj Upadhyay wrote:

On callback overload, we want to force quiescent state immediately,
for the first and second fqs. Enforce the same, by including
RCU_GP_FLAG_OVLD flag, in fqsstart check.

Signed-off-by: Neeraj Upadhyay 


Good catch!

But what did you do to verify that this change does the right thing?

Thanx, Paul



I haven't done a runtime verification of this code path; I posted this,
based on review of this code.


My concern is that under overload, the FQS scans would happen continuously
rather than accelerating only the first such scan in a given grace period.
This would of course result in a CPU-bound grace-period kthread, which
users might not be all that happy with.

Or am I missing something subtle that prevents this?


Looks like under overload, only the first and second scans are accelerated?

	gf = 0;
	if (first_gp_fqs) {
		first_gp_fqs = false;
		gf = rcu_state.cbovld ? RCU_GP_FLAG_OVLD : 0;
	}


Thanks
Neeraj



But yes, it does look like the current mainline code fails to do the
first scan immediately, so again, good catch!

Thanx, Paul




Thanks
Neeraj


---
   kernel/rcu/tree.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index d0988a1..6226bfb 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1865,7 +1865,7 @@ static void rcu_gp_fqs_loop(void)
break;
/* If time for quiescent-state forcing, do it. */
if (!time_after(rcu_state.jiffies_force_qs, jiffies) ||
-   (gf & RCU_GP_FLAG_FQS)) {
+   (gf & (RCU_GP_FLAG_FQS | RCU_GP_FLAG_OVLD))) {
trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
   TPS("fqsstart"));
rcu_gp_fqs(first_gp_fqs);
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of
the Code Aurora Forum, hosted by The Linux Foundation


--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member of the Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH] rcu/tree: Force quiescent state on callback overload

2020-06-21 Thread Neeraj Upadhyay

Hi Paul,

On 6/22/2020 1:20 AM, Paul E. McKenney wrote:

On Mon, Jun 22, 2020 at 12:07:27AM +0530, Neeraj Upadhyay wrote:

On callback overload, we want to force quiescent state immediately,
for the first and second fqs. Enforce the same, by including
RCU_GP_FLAG_OVLD flag, in fqsstart check.

Signed-off-by: Neeraj Upadhyay 


Good catch!

But what did you do to verify that this change does the right thing?

Thanx, Paul



I haven't done a runtime verification of this code path; I posted this, 
based on review of this code.



Thanks
Neeraj


---
  kernel/rcu/tree.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index d0988a1..6226bfb 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1865,7 +1865,7 @@ static void rcu_gp_fqs_loop(void)
break;
/* If time for quiescent-state forcing, do it. */
if (!time_after(rcu_state.jiffies_force_qs, jiffies) ||
-   (gf & RCU_GP_FLAG_FQS)) {
+   (gf & (RCU_GP_FLAG_FQS | RCU_GP_FLAG_OVLD))) {
trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
   TPS("fqsstart"));
rcu_gp_fqs(first_gp_fqs);
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member of the Code Aurora Forum, hosted by The Linux Foundation


[PATCH] rcu/tree: Force quiescent state on callback overload

2020-06-21 Thread Neeraj Upadhyay
On callback overload, we want to force quiescent state immediately,
for the first and second fqs. Enforce the same, by including
RCU_GP_FLAG_OVLD flag, in fqsstart check.

Signed-off-by: Neeraj Upadhyay 
---
 kernel/rcu/tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index d0988a1..6226bfb 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1865,7 +1865,7 @@ static void rcu_gp_fqs_loop(void)
break;
/* If time for quiescent-state forcing, do it. */
if (!time_after(rcu_state.jiffies_force_qs, jiffies) ||
-   (gf & RCU_GP_FLAG_FQS)) {
+   (gf & (RCU_GP_FLAG_FQS | RCU_GP_FLAG_OVLD))) {
trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
   TPS("fqsstart"));
rcu_gp_fqs(first_gp_fqs);
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



Re: Query regarding pseudo nmi support on GIC V3 and request_nmi()

2020-05-08 Thread Neeraj Upadhyay

Hi Marc,

Thanks a lot for your comments. I will work on exploring how SDEI can be 
used for it.




Thanks
Neeraj

On 5/8/2020 9:41 PM, Marc Zyngier wrote:

On Fri, 08 May 2020 14:34:10 +0100,
Neeraj Upadhyay  wrote:


Hi Marc,

On 5/8/2020 6:23 PM, Marc Zyngier wrote:

On Fri, 8 May 2020 18:09:00 +0530
Neeraj Upadhyay  wrote:


Hi Marc,

On 5/8/2020 5:57 PM, Marc Zyngier wrote:

On Fri, 8 May 2020 16:36:42 +0530
Neeraj Upadhyay  wrote:


Hi Marc,

On 5/8/2020 4:15 PM, Marc Zyngier wrote:

On Thu, 07 May 2020 17:06:19 +0100,
Neeraj Upadhyay  wrote:


Hi,

I have one query regarding pseudo NMI support on GIC v3; from what I
could understand, GIC v3 supports pseudo NMI setup for SPIs and PPIs.
However the request_nmi() in irq framework requires NMI to be per cpu
interrupt source (it checks for IRQF_PERCPU). Can you please help
understand this part, how SPIs can be configured as NMIs, if there is
a per cpu interrupt source restriction?


Let me answer your question by another question: what is the semantic
of a NMI if you can't associate it with a particular CPU?
>>

I was actually thinking of a use case, where, we have a watchdog
interrupt (which is a SPI), which is used for detecting software
hangs and cause device reset; If that interrupt's current cpu
affinity is on a core, where interrupts are disabled, we won't be
able to serve it; so, we need to group that interrupt as an fiq;


Linux doesn't use Group-0 interrupts, as they are strictly secure
(unless your SoC doesn't have EL3, which I doubt).


Yes, we handle that watchdog interrupt as a Group-0 interrupt, which
is handled as fiq in EL3.



I was thinking, if its feasible to mark that interrupt as pseudo
NMI and route it to EL1 as irq. However, looks like that is not the
semantic of a NMI and we would need something like pseudo NMI ipi
for this.


Sending a NMI IPI from another NMI handler? Even once I've added
these, there is no way this will work for that particular scenario.
Just look at the restrictions we impose on NMIs.



Sending a pseudo NMI IPI (to EL1) from fiq handler (which runs in
EL3); I will check, but do you think, that might not work?


How do you know, from EL3, what to write in memory so that the NMI
handler knows what you want to do? Are you going to parse the S1 page
tables? Hard-code the behaviour of some random Linux version in your
legendary non-updatable firmware? This isn't an acceptable behaviour.



Ok, I understand;

Initial thought was to use watchdog SPI as pseudo NMI; however, as
pseudo NMIs are only per CPU sources, we were exploring the
possibility of using an unused ipi (using the work which is done in
[1] and [2]  for SGIs) as pseudo NMI, which EL3 sends to EL1, on
receiving watchdog fiq. The pseudo NMI handler would collect required
debug information, to help identify the lockup cause. We weren't
thinking of communicating any information from EL3 fiq handler to
EL1.


What if the operating system running at EL1/EL2 is *not* Linux?



However, from this discussion, I realize that calling irq handler from
fiq handler, would not be possible. So, the approach looks flawed.

I believe, allowing a non-percpu pseudo NMI is not acceptable to community?


No, I really don't want to entertain this idea, because the semantics
are way too loosely defined and you'd end up with everyone wanting
something mildly different.


An IPI is between two CPUs used by the same SW entity. What runs at
EL3 is completely alien to Linux, and so is Linux to EL3. If you want
to IPI, send Group-0 IPIs that are private to the firmware.



Ok got it; however, I wonder what's the use case of sending
SGI to EL1, from secure world, using ICC_ASGI1R. I thought it
allowed communication between EL1 and EL3; but, looks like I
understood it wrong.


There is what the GIC architecture can do, and there is what is
sensible for Linux. The GIC allows IPIs from S-to-NS as well as the
opposite. This doesn't make it a good idea (it actually is a terrible
idea, and I really hope that future versions of the architecture will
simply kill the feature).

The idea was that you'd make SGIs an first class ABI between S and
NS. Given that the two are developed separately and that nobody ever
standardised what the SGI numbers mean, this idea is completely dead.




If you want to inject NMI-type exceptions into EL1, you can always try
SDEI (did I actually write this? Help!). But given your use case below,
that wouldn't work either.



Ok.


Frankly, if all you need to do is to reset the SoC, use EL3
firmware. That is what it is for.



Before triggering SoC reset, we want to collect certain  EL1 debug
information like stack trace for CPUs and other debug information.


Frankly, if you are going to reset the SoC because EL1/EL2 has gone
bust, how can you trust it to do anything sensible when injecting an
interrupt?. Once you take a SPI at EL3, gather whatever state you want
from EL3. Do not involve EL1 at all.

M.



Agree that it

Re: Query regarding pseudo nmi support on GIC V3 and request_nmi()

2020-05-08 Thread Neeraj Upadhyay

Hi Marc,

On 5/8/2020 6:23 PM, Marc Zyngier wrote:

On Fri, 8 May 2020 18:09:00 +0530
Neeraj Upadhyay  wrote:


Hi Marc,

On 5/8/2020 5:57 PM, Marc Zyngier wrote:

On Fri, 8 May 2020 16:36:42 +0530
Neeraj Upadhyay  wrote:
   

Hi Marc,

On 5/8/2020 4:15 PM, Marc Zyngier wrote:

On Thu, 07 May 2020 17:06:19 +0100,
Neeraj Upadhyay  wrote:


Hi,

I have one query regarding pseudo NMI support on GIC v3; from what I
could understand, GIC v3 supports pseudo NMI setup for SPIs and PPIs.
However the request_nmi() in irq framework requires NMI to be per cpu
interrupt source (it checks for IRQF_PERCPU). Can you please help
understand this part, how SPIs can be configured as NMIs, if there is
a per cpu interrupt source restriction?


Let me answer your question by another question: what is the semantic
of a NMI if you can't associate it with a particular CPU?
   >>

I was actually thinking of a use case, where, we have a watchdog
interrupt (which is a SPI), which is used for detecting software
hangs and cause device reset; If that interrupt's current cpu
affinity is on a core, where interrupts are disabled, we won't be
able to serve it; so, we need to group that interrupt as an fiq;


Linux doesn't use Group-0 interrupts, as they are strictly secure
(unless your SoC doesn't have EL3, which I doubt).


Yes, we handle that watchdog interrupt as a Group-0 interrupt, which
is handled as fiq in EL3.

   

I was thinking, if its feasible to mark that interrupt as pseudo
NMI and route it to EL1 as irq. However, looks like that is not the
semantic of a NMI and we would need something like pseudo NMI ipi
for this.


Sending a NMI IPI from another NMI handler? Even once I've added
these, there is no way this will work for that particular scenario.
Just look at the restrictions we impose on NMIs.
   


Sending a pseudo NMI IPI (to EL1) from fiq handler (which runs in
EL3); I will check, but do you think, that might not work?


How do you know, from EL3, what to write in memory so that the NMI
handler knows what you want to do? Are you going to parse the S1 page
tables? Hard-code the behaviour of some random Linux version in your
legendary non-updatable firmware? This isn't an acceptable behaviour.



Ok, I understand;

Initial thought was to use watchdog SPI as pseudo NMI; however, as 
pseudo NMIs are only per CPU sources, we were exploring the possibility 
of using an unused ipi (using the work which is done in [1] and [2]  for 
SGIs) as pseudo NMI, which EL3 sends to EL1, on receiving watchdog fiq. 
The pseudo NMI handler would collect required debug information, to help 
identify the lockup cause. We weren't thinking of communicating any 
information from EL3 fiq handler to EL1.


However, from this discussion, I realize that calling irq handler from
fiq handler, would not be possible. So, the approach looks flawed.

I believe, allowing a non-percpu pseudo NMI is not acceptable to community?



[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=irq/gic-sgi


[2] https://lkml.org/lkml/2020/4/25/135


An IPI is between two CPUs used by the same SW entity. What runs at
EL3 is completely alien to Linux, and so is Linux to EL3. If you want
to IPI, send Group-0 IPIs that are private to the firmware.



Ok got it; however, I wonder what's the use case of sending
SGI to EL1, from secure world, using ICC_ASGI1R. I thought it
allowed communication between EL1 and EL3; but, looks like I understood 
it wrong.



If you want to inject NMI-type exceptions into EL1, you can always try
SDEI (did I actually write this? Help!). But given your use case below,
that wouldn't work either.



Ok.


Frankly, if all you need to do is to reset the SoC, use EL3
firmware. That is what it is for.
   


Before triggering SoC reset, we want to collect certain  EL1 debug
information like stack trace for CPUs and other debug information.


Frankly, if you are going to reset the SoC because EL1/EL2 has gone
bust, how can you trust it to do anything sensible when injecting an
interrupt?. Once you take a SPI at EL3, gather whatever state you want
from EL3. Do not involve EL1 at all.

M.



Agree that it might not work for all cases. But for cases like 
some kernel code getting stuck after disabling local irqs, a pseudo NMI might 
still be able to run and capture the stack and other debug info, to help 
detect the cause of lockups.



Thanks
Neeraj

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member of the Code Aurora Forum, hosted by The Linux Foundation


Re: Query regarding pseudo nmi support on GIC V3 and request_nmi()

2020-05-08 Thread Neeraj Upadhyay

Hi Marc,

On 5/8/2020 5:57 PM, Marc Zyngier wrote:

On Fri, 8 May 2020 16:36:42 +0530
Neeraj Upadhyay  wrote:


Hi Marc,

On 5/8/2020 4:15 PM, Marc Zyngier wrote:

On Thu, 07 May 2020 17:06:19 +0100,
Neeraj Upadhyay  wrote:


Hi,

I have one query regarding pseudo NMI support on GIC v3; from what I
could understand, GIC v3 supports pseudo NMI setup for SPIs and PPIs.
However the request_nmi() in irq framework requires NMI to be per cpu
interrupt source (it checks for IRQF_PERCPU). Can you please help
understand this part, how SPIs can be configured as NMIs, if there is
a per cpu interrupt source restriction?


Let me answer your question by another question: what is the semantic
of a NMI if you can't associate it with a particular CPU?
  


I was actually thinking of a use case, where, we have a watchdog
interrupt (which is a SPI), which is used for detecting software
hangs and cause device reset; If that interrupt's current cpu
affinity is on a core, where interrupts are disabled, we won't be
able to serve it; so, we need to group that interrupt as an fiq;


Linux doesn't use Group-0 interrupts, as they are strictly secure
(unless your SoC doesn't have EL3, which I doubt).


Yes, we handle that watchdog interrupt as a Group-0 interrupt, which is 
handled as fiq in EL3.





I was thinking, if its feasible to mark that interrupt as pseudo NMI
and route it to EL1 as irq. However, looks like that is not the
semantic of a NMI and we would need something like pseudo NMI ipi for
this.


Sending a NMI IPI from another NMI handler? Even once I've added these,
there is no way this will work for that particular scenario. Just look
at the restrictions we impose on NMIs.



Sending a pseudo NMI IPI (to EL1) from fiq handler (which runs in EL3); 
I will check, but do you think, that might not work?



Frankly, if all you need to do is to reset the SoC, use EL3 firmware.
That is what it is for.



Before triggering SoC reset, we want to collect certain  EL1 debug 
information like stack trace for CPUs and other debug information.



Thanks,

M.



Thanks
Neeraj

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member of the Code Aurora Forum, hosted by The Linux Foundation


Re: Query regarding pseudo nmi support on GIC V3 and request_nmi()

2020-05-08 Thread Neeraj Upadhyay

Hi Marc,

On 5/8/2020 4:15 PM, Marc Zyngier wrote:

On Thu, 07 May 2020 17:06:19 +0100,
Neeraj Upadhyay  wrote:


Hi,

I have one query regarding pseudo NMI support on GIC v3; from what I
could understand, GIC v3 supports pseudo NMI setup for SPIs and PPIs.
However the request_nmi() in irq framework requires NMI to be per cpu
interrupt source (it checks for IRQF_PERCPU). Can you please help
understand this part, how SPIs can be configured as NMIs, if there is
a per cpu interrupt source restriction?


Let me answer your question by another question: what is the semantic
of a NMI if you can't associate it with a particular CPU?



I was actually thinking of a use case, where, we have a watchdog 
interrupt (which is a SPI), which is used for detecting software hangs
and cause device reset; If that interrupt's current cpu affinity is on a 
core, where interrupts are disabled, we won't be able to serve it; so, 
we need to group that interrupt as an fiq; I was thinking, if its 
feasible to mark that interrupt as pseudo NMI and route it to EL1 as 
irq. However, looks like that is not the semantic of a NMI and we would

need something like pseudo NMI ipi for this.


We use pseudo-NMI to be able to profile (or detect lockups) within
sections where normal interrupts cannot fire. If the interrupt can
end-up on a random CPU (with an unrelated PMU or one that hasn't
locked up), what have we achieved? Only confusion.

The whole point is that NMIs have to be tied to a given CPU. For
SGI/PPI, this is guaranteed by construction. For SPIs, this means that
the affinity cannot be changed from userspace. IRQF_PERCPU doesn't
mean much in this context as we don't "broadcast" interrupts, but is
an indication to the core kernel that the same interrupt cannot be
taken on another CPU.

The short of it is that NMIs are only for per-CPU sources. For SPIs,
that's for PMUs that use SPIs instead of PPIs. Don't use it for
anything else.
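
To make the per-CPU restriction concrete, the request flow for a legitimate
pseudo-NMI user (a per-CPU source such as a PMU PPI) is roughly the following
sketch; my_pmu_nmi_handler and the per-CPU dev_id are placeholders, and the
signatures are as provided by kernel/irq/manage.c of this era:

  static DEFINE_PER_CPU(void *, my_nmi_dev_id);

  static int setup_pmu_nmi(unsigned int irq)
  {
          int err;

          /* Register the NMI handler against a strictly per-CPU source. */
          err = request_percpu_nmi(irq, my_pmu_nmi_handler, "my-pmu",
                                   &my_nmi_dev_id);
          if (err)
                  return err;

          /* Then, on each CPU (e.g. from a CPU-hotplug online callback): */
          err = prepare_percpu_nmi(irq);
          if (!err)
                  enable_percpu_nmi(irq, IRQ_TYPE_NONE);
          return err;
  }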



Thank you for the explanation!


Thanks,

M.



Thanks
Neeraj

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member of the Code Aurora Forum, hosted by The Linux Foundation


Query regarding pseudo nmi support on GIC V3 and request_nmi()

2020-05-07 Thread Neeraj Upadhyay

Hi,

I have one query regarding pseudo NMI support on GIC v3; from what I 
could understand, GIC v3 supports pseudo NMI setup for SPIs and PPIs.

However the request_nmi() in irq framework requires NMI to be per cpu
interrupt source (it checks for IRQF_PERCPU). Can you please help
understand this part, how SPIs can be configured as NMIs, if there is
a per cpu interrupt source restriction?



Thanks
Neeraj

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member of the Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH] arm64: Explicitly set pstate.ssbs for el0 on kernel entry

2019-07-18 Thread Neeraj Upadhyay

Hi Marc,


On 7/9/19 7:52 PM, Marc Zyngier wrote:

On 09/07/2019 15:18, Neeraj Upadhyay wrote:

Hi Marc,

On 7/9/19 6:38 PM, Marc Zyngier wrote:

Hi Neeraj,

On 09/07/2019 12:22, Neeraj Upadhyay wrote:

For cpus which do not support pstate.ssbs feature, el0
might not retain spsr.ssbs. This is problematic, if this
task migrates to a cpu supporting this feature, thus
relying on its state to be correct. On kernel entry,
explicitly set spsr.ssbs, so that speculation is enabled
for el0, when this task migrates to a cpu supporting
ssbs feature. Restoring state at kernel entry ensures
that el0 ssbs state is always consistent while we are
in el1.

As alternatives are applied by boot cpu, at the end of smp
init, presence/absence of ssbs feature on boot cpu, is used
for deciding, whether the capability is uniformly provided.

I've seen the same issue, but went for a slightly different
approach, see below.


Signed-off-by: Neeraj Upadhyay 
---
   arch/arm64/kernel/cpu_errata.c | 16 
   arch/arm64/kernel/entry.S  | 26 +-
   2 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index ca11ff7..c84a56d 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -336,6 +336,22 @@ void __init arm64_enable_wa2_handling(struct alt_instr 
*alt,
*updptr = cpu_to_le32(aarch64_insn_gen_nop());
   }
   
+void __init arm64_restore_ssbs_state(struct alt_instr *alt,

+__le32 *origptr, __le32 *updptr,
+int nr_inst)
+{
+   BUG_ON(nr_inst != 1);
+   /*
+* Only restore EL0 SSBS state on EL1 entry if cpu does not
+* support the capability and capability is present for at
+* least one cpu and if the SSBD state allows it to
+* be changed.
+*/
+   if (!this_cpu_has_cap(ARM64_SSBS) && cpus_have_cap(ARM64_SSBS) &&
+   arm64_get_ssbd_state() != ARM64_SSBD_FORCE_ENABLE)
+   *updptr = cpu_to_le32(aarch64_insn_gen_nop());
+}
+
   void arm64_set_ssbd_mitigation(bool state)
   {
if (!IS_ENABLED(CONFIG_ARM64_SSBD)) {
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 9cdc459..7e79305 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -143,6 +143,25 @@ alternative_cb_end
   #endif
.endm
   
+	// This macro updates spsr. It also corrupts the condition

+   // codes state.
+   .macro  restore_ssbs_state, saved_spsr, tmp
+#ifdef CONFIG_ARM64_SSBD
+alternative_cb arm64_restore_ssbs_state
+   b   .L__asm_ssbs_skip\@
+alternative_cb_end
+   ldr \tmp, [tsk, #TSK_TI_FLAGS]
+   tbnz\tmp, #TIF_SSBD, .L__asm_ssbs_skip\@
+   tst \saved_spsr, #PSR_MODE32_BIT// native task?
+   b.ne.L__asm_ssbs_compat\@
+   orr \saved_spsr, \saved_spsr, #PSR_SSBS_BIT
+   b   .L__asm_ssbs_skip\@
+.L__asm_ssbs_compat\@:
+   orr \saved_spsr, \saved_spsr, #PSR_AA32_SSBS_BIT
+.L__asm_ssbs_skip\@:
+#endif
+   .endm

Although this is in keeping with the rest of entry.S (perfectly
unreadable ;-), I think we can do something a bit simpler, that
doesn't rely on patching. Also, this doesn't seem to take the
SSBD options such as ARM64_SSBD_FORCE_ENABLE into account.

arm64_restore_ssbs_state has a check for ARM64_SSBD_FORCE_ENABLE,

does that look wrong?

No, I just focused on the rest of the asm code and missed it, apologies.


+
.macro  kernel_entry, el, regsize = 64
.if \regsize == 32
mov w0, w0  // zero upper 32 bits of x0
@@ -182,8 +201,13 @@ alternative_cb_end
str x20, [tsk, #TSK_TI_ADDR_LIMIT]
/* No need to reset PSTATE.UAO, hardware's already set it to 0 for us */
.endif /* \el == 0 */
-   mrs x22, elr_el1
mrs x23, spsr_el1
+
+   .if \el == 0
+   restore_ssbs_state x23, x22
+   .endif
+
+   mrs x22, elr_el1
stp lr, x21, [sp, #S_LR]
   
   	/*



How about the patch below?

Looks good; was just going to mention PF_KTHREAD check, but Mark R. has
already

given detailed information about it.

Yup, well spotted. I'll respin it shortly and we can then work out
whether that's really a better approach.


Did you get chance to recheck it?


Thanks

Neeraj



Thanks,

M.


--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation



Re: [PATCH] arm64: Explicitly set pstate.ssbs for el0 on kernel entry

2019-07-09 Thread Neeraj Upadhyay

Hi Marc,

On 7/9/19 6:38 PM, Marc Zyngier wrote:

Hi Neeraj,

On 09/07/2019 12:22, Neeraj Upadhyay wrote:

For cpus which do not support pstate.ssbs feature, el0
might not retain spsr.ssbs. This is problematic, if this
task migrates to a cpu supporting this feature, thus
relying on its state to be correct. On kernel entry,
explicitly set spsr.ssbs, so that speculation is enabled
for el0, when this task migrates to a cpu supporting
ssbs feature. Restoring state at kernel entry ensures
that el0 ssbs state is always consistent while we are
in el1.

As alternatives are applied by boot cpu, at the end of smp
init, presence/absence of ssbs feature on boot cpu, is used
for deciding, whether the capability is uniformly provided.

I've seen the same issue, but went for a slightly different
approach, see below.


Signed-off-by: Neeraj Upadhyay 
---
  arch/arm64/kernel/cpu_errata.c | 16 
  arch/arm64/kernel/entry.S  | 26 +-
  2 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index ca11ff7..c84a56d 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -336,6 +336,22 @@ void __init arm64_enable_wa2_handling(struct alt_instr 
*alt,
*updptr = cpu_to_le32(aarch64_insn_gen_nop());
  }
  
+void __init arm64_restore_ssbs_state(struct alt_instr *alt,

+__le32 *origptr, __le32 *updptr,
+int nr_inst)
+{
+   BUG_ON(nr_inst != 1);
+   /*
+* Only restore EL0 SSBS state on EL1 entry if cpu does not
+* support the capability and capability is present for at
+* least one cpu and if the SSBD state allows it to
+* be changed.
+*/
+   if (!this_cpu_has_cap(ARM64_SSBS) && cpus_have_cap(ARM64_SSBS) &&
+   arm64_get_ssbd_state() != ARM64_SSBD_FORCE_ENABLE)
+   *updptr = cpu_to_le32(aarch64_insn_gen_nop());
+}
+
  void arm64_set_ssbd_mitigation(bool state)
  {
if (!IS_ENABLED(CONFIG_ARM64_SSBD)) {
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 9cdc459..7e79305 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -143,6 +143,25 @@ alternative_cb_end
  #endif
.endm
  
+	// This macro updates spsr. It also corrupts the condition

+   // codes state.
+   .macro  restore_ssbs_state, saved_spsr, tmp
+#ifdef CONFIG_ARM64_SSBD
+alternative_cb arm64_restore_ssbs_state
+   b   .L__asm_ssbs_skip\@
+alternative_cb_end
+   ldr \tmp, [tsk, #TSK_TI_FLAGS]
+   tbnz\tmp, #TIF_SSBD, .L__asm_ssbs_skip\@
+   tst \saved_spsr, #PSR_MODE32_BIT// native task?
+   b.ne.L__asm_ssbs_compat\@
+   orr \saved_spsr, \saved_spsr, #PSR_SSBS_BIT
+   b   .L__asm_ssbs_skip\@
+.L__asm_ssbs_compat\@:
+   orr \saved_spsr, \saved_spsr, #PSR_AA32_SSBS_BIT
+.L__asm_ssbs_skip\@:
+#endif
+   .endm

Although this is in keeping with the rest of entry.S (perfectly
unreadable ;-), I think we can do something a bit simpler, that
doesn't rely on patching. Also, this doesn't seem to take the
SSBD options such as ARM64_SSBD_FORCE_ENABLE into account.


arm64_restore_ssbs_state has a check for ARM64_SSBD_FORCE_ENABLE,

does that look wrong?




+
.macro  kernel_entry, el, regsize = 64
.if \regsize == 32
mov w0, w0  // zero upper 32 bits of x0
@@ -182,8 +201,13 @@ alternative_cb_end
str x20, [tsk, #TSK_TI_ADDR_LIMIT]
/* No need to reset PSTATE.UAO, hardware's already set it to 0 for us */
.endif /* \el == 0 */
-   mrs x22, elr_el1
mrs x23, spsr_el1
+
+   .if \el == 0
+   restore_ssbs_state x23, x22
+   .endif
+
+   mrs x22, elr_el1
stp lr, x21, [sp, #S_LR]
  
  	/*



How about the patch below?


Looks good; was just going to mention PF_KTHREAD check, but Mark R. has 
already


given detailed information about it.


Thanks

Neeraj



Thanks,

M.

 From 7d4314d1ef3122d8bf56a7ef239c8c68e0c81277 Mon Sep 17 00:00:00 2001
From: Marc Zyngier 
Date: Tue, 4 Jun 2019 17:35:18 +0100
Subject: [PATCH] arm64: Force SSBS on context switch

On a CPU that doesn't support SSBS, PSTATE[12] is RES0.  In a system
where only some of the CPUs implement SSBS, we end-up losing track of
the SSBS bit across task migration.

To address this issue, let's force the SSBS bit on context switch.

Signed-off-by: Marc Zyngier 
---
  arch/arm64/include/asm/processor.h | 14 --
  arch/arm64/kernel/process.c| 14 ++
  2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/processor.h 
b/arch/arm64/include/asm/processor.h
index fd5b1a4efc70..844e2964b0f5 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -193,6 +193,16 
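
The remainder of this diff is truncated in the archive; condensed, the approach
amounts to the following sketch (helper names assumed from the eventual
mainline version of this patch):

  /* On every context switch, force the SSBS bit into the incoming task's
   * saved PSTATE so it survives migration from a CPU lacking SSBS. */
  static void ssbs_thread_switch(struct task_struct *next)
  {
          struct pt_regs *regs = task_pt_regs(next);

          /* Kernel threads never return to EL0 and 'regs' may be junk. */
          if (unlikely(next->flags & PF_KTHREAD))
                  return;

          /* If the mitigation is forced on (or requested), leave SSBS clear. */
          if (arm64_get_ssbd_state() == ARM64_SSBD_FORCE_ENABLE ||
              test_tsk_thread_flag(next, TIF_SSBD))
                  return;

          if (compat_user_mode(regs))
                  set_compat_ssbs_bit(regs);      /* PSR_AA32_SSBS_BIT */
          else if (user_mode(regs))
                  set_ssbs_bit(regs);             /* PSR_SSBS_BIT */
  }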

[PATCH] arm64: Explicitly set pstate.ssbs for el0 on kernel entry

2019-07-09 Thread Neeraj Upadhyay
For cpus which do not support pstate.ssbs feature, el0
might not retain spsr.ssbs. This is problematic, if this
task migrates to a cpu supporting this feature, thus
relying on its state to be correct. On kernel entry,
explicitly set spsr.ssbs, so that speculation is enabled
for el0, when this task migrates to a cpu supporting
ssbs feature. Restoring state at kernel entry ensures
that el0 ssbs state is always consistent while we are
in el1.

As alternatives are applied by boot cpu, at the end of smp
init, presence/absence of ssbs feature on boot cpu, is used
for deciding, whether the capability is uniformly provided.

Signed-off-by: Neeraj Upadhyay 
---
 arch/arm64/kernel/cpu_errata.c | 16 
 arch/arm64/kernel/entry.S  | 26 +-
 2 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index ca11ff7..c84a56d 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -336,6 +336,22 @@ void __init arm64_enable_wa2_handling(struct alt_instr 
*alt,
*updptr = cpu_to_le32(aarch64_insn_gen_nop());
 }
 
+void __init arm64_restore_ssbs_state(struct alt_instr *alt,
+__le32 *origptr, __le32 *updptr,
+int nr_inst)
+{
+   BUG_ON(nr_inst != 1);
+   /*
+* Only restore EL0 SSBS state on EL1 entry if cpu does not
+* support the capability and capability is present for at
+* least one cpu and if the SSBD state allows it to
+* be changed.
+*/
+   if (!this_cpu_has_cap(ARM64_SSBS) && cpus_have_cap(ARM64_SSBS) &&
+   arm64_get_ssbd_state() != ARM64_SSBD_FORCE_ENABLE)
+   *updptr = cpu_to_le32(aarch64_insn_gen_nop());
+}
+
 void arm64_set_ssbd_mitigation(bool state)
 {
if (!IS_ENABLED(CONFIG_ARM64_SSBD)) {
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 9cdc459..7e79305 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -143,6 +143,25 @@ alternative_cb_end
 #endif
.endm
 
+   // This macro updates spsr. It also corrupts the condition
+   // codes state.
+   .macro  restore_ssbs_state, saved_spsr, tmp
+#ifdef CONFIG_ARM64_SSBD
+alternative_cb arm64_restore_ssbs_state
+   b   .L__asm_ssbs_skip\@
+alternative_cb_end
+   ldr \tmp, [tsk, #TSK_TI_FLAGS]
+   tbnz\tmp, #TIF_SSBD, .L__asm_ssbs_skip\@
+   tst \saved_spsr, #PSR_MODE32_BIT// native task?
+   b.ne.L__asm_ssbs_compat\@
+   orr \saved_spsr, \saved_spsr, #PSR_SSBS_BIT
+   b   .L__asm_ssbs_skip\@
+.L__asm_ssbs_compat\@:
+   orr \saved_spsr, \saved_spsr, #PSR_AA32_SSBS_BIT
+.L__asm_ssbs_skip\@:
+#endif
+   .endm
+
.macro  kernel_entry, el, regsize = 64
.if \regsize == 32
mov w0, w0  // zero upper 32 bits of x0
@@ -182,8 +201,13 @@ alternative_cb_end
str x20, [tsk, #TSK_TI_ADDR_LIMIT]
/* No need to reset PSTATE.UAO, hardware's already set it to 0 for us */
.endif /* \el == 0 */
-   mrs x22, elr_el1
mrs x23, spsr_el1
+
+   .if \el == 0
+   restore_ssbs_state x23, x22
+   .endif
+
+   mrs x22, elr_el1
stp lr, x21, [sp, #S_LR]
 
/*
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation



[PATCH v3] pinctrl: qcom: Add irq_enable callback for msm gpio

2019-06-25 Thread Neeraj Upadhyay
From: Srinivas Ramana 

Introduce the irq_enable callback which will be same as irq_unmask
except that it will also clear the status bit before unmask.

This will help in clearing any erroneous interrupts that would
have got latched when the interrupt is not in use.

There may be devices like UART which can use the same gpio line
for data rx as well as a wakeup gpio when in suspend. The data that
was flowing on the line may latch the interrupt and when we enable
the interrupt before going to suspend, this would trigger the
unexpected interrupt. This change helps clear the interrupt
so that these unexpected interrupts get cleared.
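
As a usage illustration only (a hypothetical consumer, not part of this patch),
the suspend flow described above would look roughly like:

  static int my_uart_suspend(struct device *dev)
  {
          struct my_uart_port *port = dev_get_drvdata(dev);      /* hypothetical */

          /* Any rx activity latched while the line carried data is cleared by
           * the new irq_enable callback before the unmask, so this does not
           * fire a spurious wakeup. */
          enable_irq(port->wakeup_irq);           /* -> msm_gpio_irq_enable() */
          enable_irq_wake(port->wakeup_irq);
          return 0;
  }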

Signed-off-by: Srinivas Ramana 
Signed-off-by: Neeraj Upadhyay 
---

Changes since v2:
- Renamed function to msm_gpio_irq_clear_unmask()

 drivers/pinctrl/qcom/pinctrl-msm.c | 25 -
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/drivers/pinctrl/qcom/pinctrl-msm.c 
b/drivers/pinctrl/qcom/pinctrl-msm.c
index 6e319bc..0223755 100644
--- a/drivers/pinctrl/qcom/pinctrl-msm.c
+++ b/drivers/pinctrl/qcom/pinctrl-msm.c
@@ -729,7 +729,7 @@ static void msm_gpio_irq_mask(struct irq_data *d)
	raw_spin_unlock_irqrestore(&pctrl->lock, flags);
 }
 
-static void msm_gpio_irq_unmask(struct irq_data *d)
+static void msm_gpio_irq_clear_unmask(struct irq_data *d, bool status_clear)
 {
struct gpio_chip *gc = irq_data_get_irq_chip_data(d);
struct msm_pinctrl *pctrl = gpiochip_get_data(gc);
@@ -741,6 +741,17 @@ static void msm_gpio_irq_unmask(struct irq_data *d)
 
	raw_spin_lock_irqsave(&pctrl->lock, flags);
 
+   if (status_clear) {
+   /*
+* clear the interrupt status bit before unmask to avoid
+* any erroneous interrupts that would have got latched
+* when the interrupt is not in use.
+*/
+   val = msm_readl_intr_status(pctrl, g);
+   val &= ~BIT(g->intr_status_bit);
+   msm_writel_intr_status(val, pctrl, g);
+   }
+
val = msm_readl_intr_cfg(pctrl, g);
val |= BIT(g->intr_raw_status_bit);
val |= BIT(g->intr_enable_bit);
@@ -751,6 +762,17 @@ static void msm_gpio_irq_unmask(struct irq_data *d)
	raw_spin_unlock_irqrestore(&pctrl->lock, flags);
 }
 
+static void msm_gpio_irq_enable(struct irq_data *d)
+{
+
+   msm_gpio_irq_clear_unmask(d, true);
+}
+
+static void msm_gpio_irq_unmask(struct irq_data *d)
+{
+   msm_gpio_irq_clear_unmask(d, false);
+}
+
 static void msm_gpio_irq_ack(struct irq_data *d)
 {
struct gpio_chip *gc = irq_data_get_irq_chip_data(d);
@@ -978,6 +1000,7 @@ static int msm_gpio_init(struct msm_pinctrl *pctrl)
chip->need_valid_mask = msm_gpio_needs_valid_mask(pctrl);
 
pctrl->irq_chip.name = "msmgpio";
+   pctrl->irq_chip.irq_enable = msm_gpio_irq_enable;
pctrl->irq_chip.irq_mask = msm_gpio_irq_mask;
pctrl->irq_chip.irq_unmask = msm_gpio_irq_unmask;
pctrl->irq_chip.irq_ack = msm_gpio_irq_ack;
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation



Re: [PATCH v2] pinctrl: qcom: Add irq_enable callback for msm gpio

2019-06-25 Thread Neeraj Upadhyay



On 6/25/19 2:28 PM, Linus Walleij wrote:

On Mon, Jun 17, 2019 at 11:35 AM Neeraj Upadhyay  wrote:


From: Srinivas Ramana 

Introduce the irq_enable callback which will be same as irq_unmask
except that it will also clear the status bit before unmask.

This will help in clearing any erroneous interrupts that would
have got latched when the interrupt is not in use.

There may be devices like UART which can use the same gpio line
for data rx as well as a wakeup gpio when in suspend. The data that
was flowing on the line may latch the interrupt and when we enable
the interrupt before going to suspend, this would trigger the
unexpected interrupt. This change helps clear the interrupt
so that these unexpected interrupts get cleared.

Signed-off-by: Srinivas Ramana 
Signed-off-by: Neeraj Upadhyay 

Overall this looks good to me, waiting for Bjorn's review.


Changes since v1:
- Extracted common code into __msm_gpio_irq_unmask().

Please don't name functions __like __that.


-static void msm_gpio_irq_unmask(struct irq_data *d)
+static void __msm_gpio_irq_unmask(struct irq_data *d, bool status_clear)

Instead of __unclear __underscore __semantic use something
really descriptive like

static void msm_gpio_irq_clear_irq()

That is what it does, right?


Is below ok? as it clears (if status_clear set) and then unmasks irq

static void msm_gpio_irq_clear_unmask()



Other than that it looks fine.

Yours,
Linus Walleij


--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation



Re: Fwd: Re: [PATCH] pinctrl: qcom: Clear status bit on irq_unmask

2019-06-17 Thread Neeraj Upadhyay

Thanks for the review, Linus.

On 6/17/19 5:20 PM, Linus Walleij wrote:

On Mon, Jun 17, 2019 at 12:35 PM Neeraj Upadhyay  wrote:


Hi Stephen, there is one use case which is not covered by commit
b55326dc969e (

"pinctrl: msm: Really mask level interrupts to prevent latching"). That
happens when

gpio line is toggled between i/o mode and interrupt mode :

1. GPIO is configured as irq line. Peripheral raises interrupt.

2. IRQ handler runs and disables the irq line (through wq work).

3. GPIO is configured for input and data is received from the
peripheral.

There is no distinction between using a GPIO line as input
and using it for IRQ. All input GPIOs can be used for IRQs,
if the hardware supports it (has an irqchip).


Ok




4. Now, when GPIO is re-enabled as irq, we see spurious irq, and there
isn't

any data received on the gpio line, when it is read back after
configuring as input.

That's an interesting usecase. Hans Verkuil reworked the
GPIO irq support very elegantly exactly to support this type
of usecase (irq switch on and off dynamically), where he
was even switching the line into output mode between
the IRQ trains. (one-wire transcactions for CEC).


Patch https://lkml.org/lkml/2019/6/17/226 tries to cover this use case.
Can you please provide your comments?

What this patch does is clear all pending IRQs at irq
unmask. This is usually safe, unless there may be cases
where you *want* to catch any pending IRQs. I guess
normally you don't so it should be safe?


We want to clear pending status at irq enable. Afaik, no,

we don't normally track these pending irqs. So, should be

fine here.



The corner case is when you start some transaction
or whatever that gets ACKed by an IRQ and you actually
get the IRQ back before you had time to execute the code
enabling the IRQ.

That would be racy and bad code, as you should clearly
enable the IRQ first, then start the transaction. So I think
this patch is safe.

But let's see what Bjorn says.

Yours,
Linus Walleij


--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation



Re: Fwd: Re: [PATCH] pinctrl: qcom: Clear status bit on irq_unmask

2019-06-17 Thread Neeraj Upadhyay




Quoting tengf...@codeaurora.org (2019-06-11 03:41:26)

On 2019-06-10 22:51, Stephen Boyd wrote:
> Quoting Linus Walleij (2019-06-07 14:08:10)
>> On Fri, May 31, 2019 at 8:52 AM Tengfei Fan 
 >> wrote:

>> >> > The gpio interrupt status bit is getting set after the
>> > irq is disabled and causing an immediate interrupt after
>> > enablling the irq, so clear status bit on irq_unmask.
>> >
>> > Signed-off-by: Tengfei Fan 
>> >> This looks pretty serious, can one of the Qcom maintainers ACK
>> this?
>> >> Should it be sent to fixes and even stable?
>> >> Fixes: tag?
>> > > How is the interrupt status bit getting set after the irq is 
disabled?

> It looks like this is a level type interrupt? I thought that after
> commit b55326dc969e ("pinctrl: msm: Really mask level interrupts to
> prevent latching") this wouldn't be a problem. Am I wrong, or is qcom
> just clearing out patches on drivers and this is the last one that 
> needs

> to be upstreamed?

Your patch(commit b55326dc969e) can cover our issue, and my patch is 
no longer needed.

Your patch isn't included in our code, so I submitted this patch.


Alright cool. Sounds like this patch can be dropped then and you can
pick up the patch from upstream into your vendor kernel.



Hi Stephen, there is one use case which is not covered by commit 
b55326dc969e (


"pinctrl: msm: Really mask level interrupts to prevent latching"). That 
happens when


gpio line is toggled between i/o mode and interrupt mode :

1. GPIO is configured as irq line. Peripheral raises interrupt.

2. IRQ handler runs and disables the irq line (through wq work).

3. GPIO is configured for input and data is received from the 
peripheral.


4. Now, when GPIO is re-enabled as irq, we see spurious irq, and there 
isn't


any data received on the gpio line, when it is read back after 
configuring as input.


This can happen for both edge and level interrupts.

Patch https://lkml.org/lkml/2019/6/17/226 tries to cover this use case. 
Can you please


provide your comments?


Thanks

Neeraj

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation



[PATCH v2] pinctrl: qcom: Add irq_enable callback for msm gpio

2019-06-17 Thread Neeraj Upadhyay
From: Srinivas Ramana 

Introduce the irq_enable callback which will be same as irq_unmask
except that it will also clear the status bit before unmask.

This will help in clearing any erroneous interrupts that would
have got latched when the interrupt is not in use.

There may be devices like UART which can use the same gpio line
for data rx as well as a wakeup gpio when in suspend. The data that
was flowing on the line may latch the interrupt and when we enable
the interrupt before going to suspend, this would trigger the
unexpected interrupt. This change helps clear the interrupt
so that these unexpected interrupts get cleared.

Signed-off-by: Srinivas Ramana 
Signed-off-by: Neeraj Upadhyay 
---

Changes since v1:
- Extracted common code into __msm_gpio_irq_unmask().

 drivers/pinctrl/qcom/pinctrl-msm.c | 25 -
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/drivers/pinctrl/qcom/pinctrl-msm.c 
b/drivers/pinctrl/qcom/pinctrl-msm.c
index 6e319bc..2a127f0 100644
--- a/drivers/pinctrl/qcom/pinctrl-msm.c
+++ b/drivers/pinctrl/qcom/pinctrl-msm.c
@@ -729,7 +729,7 @@ static void msm_gpio_irq_mask(struct irq_data *d)
	raw_spin_unlock_irqrestore(&pctrl->lock, flags);
 }
 
-static void msm_gpio_irq_unmask(struct irq_data *d)
+static void __msm_gpio_irq_unmask(struct irq_data *d, bool status_clear)
 {
struct gpio_chip *gc = irq_data_get_irq_chip_data(d);
struct msm_pinctrl *pctrl = gpiochip_get_data(gc);
@@ -741,6 +741,17 @@ static void msm_gpio_irq_unmask(struct irq_data *d)
 
	raw_spin_lock_irqsave(&pctrl->lock, flags);
 
+   if (status_clear) {
+   /*
+* clear the interrupt status bit before unmask to avoid
+* any erroneous interrupts that would have got latched
+* when the interrupt is not in use.
+*/
+   val = msm_readl_intr_status(pctrl, g);
+   val &= ~BIT(g->intr_status_bit);
+   msm_writel_intr_status(val, pctrl, g);
+   }
+
val = msm_readl_intr_cfg(pctrl, g);
val |= BIT(g->intr_raw_status_bit);
val |= BIT(g->intr_enable_bit);
@@ -751,6 +762,17 @@ static void msm_gpio_irq_unmask(struct irq_data *d)
	raw_spin_unlock_irqrestore(&pctrl->lock, flags);
 }
 
+static void msm_gpio_irq_enable(struct irq_data *d)
+{
+
+   __msm_gpio_irq_unmask(d, true);
+}
+
+static void msm_gpio_irq_unmask(struct irq_data *d)
+{
+   __msm_gpio_irq_unmask(d, false);
+}
+
 static void msm_gpio_irq_ack(struct irq_data *d)
 {
struct gpio_chip *gc = irq_data_get_irq_chip_data(d);
@@ -978,6 +1000,7 @@ static int msm_gpio_init(struct msm_pinctrl *pctrl)
chip->need_valid_mask = msm_gpio_needs_valid_mask(pctrl);
 
pctrl->irq_chip.name = "msmgpio";
+   pctrl->irq_chip.irq_enable = msm_gpio_irq_enable;
pctrl->irq_chip.irq_mask = msm_gpio_irq_mask;
pctrl->irq_chip.irq_unmask = msm_gpio_irq_unmask;
pctrl->irq_chip.irq_ack = msm_gpio_irq_ack;
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation
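
For context, the genirq core only invokes the new callback on the enable path;
when an irqchip does not provide ->irq_enable(), it falls back to
->irq_unmask(). A trimmed sketch of that dispatch (paraphrased from
kernel/irq/chip.c; details may differ by kernel version):

void irq_enable(struct irq_desc *desc)
{
	if (!irqd_irq_disabled(&desc->irq_data)) {
		unmask_irq(desc);
	} else {
		irq_state_clr_disabled(desc);
		if (desc->irq_data.chip->irq_enable) {
			/* msm_gpio_irq_enable() is called here and clears
			 * the latched status before unmasking. */
			desc->irq_data.chip->irq_enable(&desc->irq_data);
			irq_state_clr_masked(desc);
		} else {
			unmask_irq(desc);
		}
	}
}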



Re: [PATCH] rcu: tree_stall: Correctly unlock root node in rcu_check_gp_start_stall

2019-03-29 Thread Neeraj Upadhyay




On 3/30/19 2:57 AM, Paul E. McKenney wrote:

On Fri, Mar 29, 2019 at 07:52:15PM +0530, Neeraj Upadhyay wrote:



On 3/29/19 6:58 PM, Mukesh Ojha wrote:


On 3/29/2019 4:57 PM, Neeraj Upadhyay wrote:

Only unlock the root node, if current node (rnp) is not
root node.

Signed-off-by: Neeraj Upadhyay 





---
   kernel/rcu/tree_stall.h | 4 +++-
   1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index f65a73a..0651833 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h



why this is showing as under tree_stall.h while it is under
"kernel/rcu/tree.c"


It's moved in 
https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=dev=10462d6f58fb6dbde7563e9343505d98d5bfba3d

Please see linux-rcu dev tree for other changes, which moves code to
this file.


Thanks
Neeraj




@@ -630,7 +630,9 @@ static void rcu_check_gp_start_stall(struct
rcu_node *rnp, struct rcu_data *rdp,
   time_before(j, rcu_state.gp_req_activity + gpssdelay) ||
   time_before(j, rcu_state.gp_activity + gpssdelay) ||
   atomic_xchg(&warned, 1)) {
-    raw_spin_unlock_rcu_node(rnp_root); /* irqs remain disabled. */
+    if (rnp_root != rnp)
+    /* irqs remain disabled. */
+    raw_spin_unlock_rcu_node(rnp_root);


Looks good, as it will balance the lock: if rnp is the root node, the lock
was not taken earlier, and the unlock was happening without any lock held
on root.

Reviewed-by: Mukesh Ojha 


Applied, again thank you both!

In both cases, I updated the commit log, so please check to make sure
that I didn't mess anything up.

Thanx, Paul



Thanks Paul. One minor comment on 
https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=dev=ec6530e763046b6bb1f4c2c2aed49ebc68aae2a0 



"it clearly does not make sense to release
both rnp->lock and rnp->lock"

should be rnp->lock and rnp_root->lock



Thanks
Neeraj


Cheers,
-Mukesh


   raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
   return;
   }


--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation





--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH] rcu: tree_stall: Correctly unlock root node in rcu_check_gp_start_stall

2019-03-29 Thread Neeraj Upadhyay




On 3/29/19 6:58 PM, Mukesh Ojha wrote:


On 3/29/2019 4:57 PM, Neeraj Upadhyay wrote:

Only unlock the root node, if current node (rnp) is not
root node.

Signed-off-by: Neeraj Upadhyay 





---
  kernel/rcu/tree_stall.h | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index f65a73a..0651833 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h



why this is showing as under tree_stall.h while it is under 
"kernel/rcu/tree.c"


It's moved in 
https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=dev=10462d6f58fb6dbde7563e9343505d98d5bfba3d


Please see linux-rcu dev tree for other changes, which moves code to 
this file.



Thanks
Neeraj



@@ -630,7 +630,9 @@ static void rcu_check_gp_start_stall(struct 
rcu_node *rnp, struct rcu_data *rdp,

  time_before(j, rcu_state.gp_req_activity + gpssdelay) ||
  time_before(j, rcu_state.gp_activity + gpssdelay) ||
  atomic_xchg(&warned, 1)) {
-    raw_spin_unlock_rcu_node(rnp_root); /* irqs remain disabled. */
+    if (rnp_root != rnp)
+    /* irqs remain disabled. */
+    raw_spin_unlock_rcu_node(rnp_root);


Looks good, as it will balance the lock: if rnp is the root node, the lock
was not taken earlier, and the unlock was happening without any lock held on
root.


Reviewed-by: Mukesh Ojha 

Cheers,
-Mukesh


  raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
  return;
  }


--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation


[PATCH] rcu: tree_stall: Correctly unlock root node in rcu_check_gp_start_stall

2019-03-29 Thread Neeraj Upadhyay
Only unlock the root node, if current node (rnp) is not
root node.

Signed-off-by: Neeraj Upadhyay 
---
 kernel/rcu/tree_stall.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index f65a73a..0651833 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -630,7 +630,9 @@ static void rcu_check_gp_start_stall(struct rcu_node *rnp, 
struct rcu_data *rdp,
time_before(j, rcu_state.gp_req_activity + gpssdelay) ||
time_before(j, rcu_state.gp_activity + gpssdelay) ||
	atomic_xchg(&warned, 1)) {
-   raw_spin_unlock_rcu_node(rnp_root); /* irqs remain disabled. */
+   if (rnp_root != rnp)
+   /* irqs remain disabled. */
+   raw_spin_unlock_rcu_node(rnp_root);
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
return;
}
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation
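
For context on the fix above: earlier in rcu_check_gp_start_stall() the root
lock is only taken when rnp is not the root node, so the early-return path must
unlock it conditionally as well. A trimmed sketch of that part of the function
(paraphrased from the -rcu tree of that time; not part of the patch):

	raw_spin_lock_irqsave_rcu_node(rnp, flags);
	/* ... early checks that unlock rnp and return ... */

	/* Hold onto the leaf lock to make others see warned==1. */
	if (rnp_root != rnp)
		raw_spin_lock_rcu_node(rnp_root); /* irqs already disabled. */

	/* The checks quoted in the hunk above (ending in atomic_xchg(&warned, 1))
	 * run next; on their early-return path rnp_root must only be unlocked
	 * when it was actually locked, i.e. when rnp_root != rnp. */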



[PATCH] rcu: tree_plugin: Dump specified number of blocked tasks

2019-03-29 Thread Neeraj Upadhyay
dump_blkd_tasks() uses 10 as the max number of blocked
tasks, which are printed. However, it has an argument
which provides that number. So, use the argument value
instead. As all callers currently pass 10 as the number,
there isn't any impact.

Signed-off-by: Neeraj Upadhyay 
---
 kernel/rcu/tree_plugin.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 3960294..08bcd87 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -740,7 +740,7 @@ void exit_rcu(void)
i = 0;
	list_for_each(lhp, &rnp->blkd_tasks) {
pr_cont(" %p", lhp);
-   if (++i >= 10)
+   if (++i >= ncheck)
break;
}
pr_cont("\n");
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation



Re: [PATCH] rcu/tree: Fix self wakeups for grace period kthread

2019-03-12 Thread Neeraj Upadhyay




On 3/12/19 7:20 AM, Steven Rostedt wrote:

On Fri,  8 Mar 2019 15:16:18 +0530
Neeraj Upadhyay  wrote:


Update the code to match the comment that self wakeup of
grace period kthread is allowed from interrupt handler, and
softirq handler, running in the grace period kthread's
context. Present code allows self wakeups from all
interrupt contexts - nmi, softirq and hardirq contexts.


That's not actually the issue. But it appears that we return if we
simply have BH disabled, which I don't think we want, and we don't care
about NMI as NMI should never call this code.

I think your patch is correct, but the change log is not.

-- Steve



Hi Steve, sorry, I don't fully understand why we want to not return in the
BH-disabled case. From the commit logs and lkml discussion, there is a case
where the GP kthread is interrupted in the wait event path and
rcu_gp_kthread_wake() is called in the softirq handler (I am not sure about
the interrupt handler case, i.e. how rcu_gp_kthread_wake() is called from that
path).


https://github.com/torvalds/linux/commit/1d1f898df6586c5ea9aeaf349f13089c6fa37903
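
For reference, the context-test macros being discussed, as they read in
include/linux/preempt.h around that time (quoted from memory, so treat as a
sketch). in_interrupt() is also true when BH is merely disabled, which is the
case Steve does not want to bail out on, while in_irq() is true only in
hardirq context:

#define in_irq()		(hardirq_count())
#define in_softirq()		(softirq_count())	/* also nonzero with BH disabled */
#define in_interrupt()		(irq_count())		/* hardirq | softirq | NMI       */
#define in_serving_softirq()	(softirq_count() & SOFTIRQ_OFFSET)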

Thanks
Neeraj




Signed-off-by: Neeraj Upadhyay 
---
  kernel/rcu/tree.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index acd6ccf..57cac6d 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1585,7 +1585,7 @@ static bool rcu_future_gp_cleanup(struct rcu_node *rnp)
  static void rcu_gp_kthread_wake(void)
  {
if ((current == rcu_state.gp_kthread &&
-!in_interrupt() && !in_serving_softirq()) ||
+!in_irq() && !in_serving_softirq()) ||
!READ_ONCE(rcu_state.gp_flags) ||
!rcu_state.gp_kthread)
return;




--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH] rcu/tree_plugin: Fix nohz status in stall warning

2019-03-08 Thread Neeraj Upadhyay




On 3/9/19 2:48 AM, Paul E. McKenney wrote:

On Fri, Mar 08, 2019 at 11:51:49PM +0530, Neeraj Upadhyay wrote:

Fix stall warning, to show correct nohz marker.

Signed-off-by: Neeraj Upadhyay 


Good eyes, thank you!

I applied and pushed all three with modified commit logs.  Please check
to make sure that I didn't mess anything up.  If testing and reviews
are OK, I expect to submit them for the v5.2 merge window.

I had to hand-apply this one, so in future patches could you please
generate them against -rcu?  It may be found here:

git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git

Thanx, Paul



Thanks Paul, I reviewed the patches on -rcu; they look good. Thanks for 
applying them! I will move my codebase to  -rcu, for all future work.


Thanks
Neeraj


---
  kernel/rcu/tree_plugin.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 97dba50..93da32c 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1659,7 +1659,7 @@ static void print_cpu_stall_fast_no_hz(char *cp, int cpu)
		rdp->last_accelerate & 0xffff, jiffies & 0xffff,
		".l"[rdp->all_lazy],
		".L"[!rcu_segcblist_n_nonlazy_cbs(&rdp->cblist)],
-   ".D"[!rdp->tick_nohz_enabled_snap]);
+   ".D"[!!rdp->tick_nohz_enabled_snap]);
  }

  #else /* #ifdef CONFIG_RCU_FAST_NO_HZ */
--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation





--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation


[PATCH] rcu/rcutorture: Fix expected forward progress duration in oom notifier

2019-03-08 Thread Neeraj Upadhyay
Fix a possible miscalculation in rcutorture_oom_notify(),
for the expected forward progress duration of current
forward progress test.

Signed-off-by: Neeraj Upadhyay 
---
 kernel/rcu/rcutorture.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index f14d1b1..5d09f79 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -1848,7 +1848,7 @@ static int rcutorture_oom_notify(struct notifier_block 
*self,
WARN(1, "%s invoked upon OOM during forward-progress testing.\n",
 __func__);
rcu_torture_fwd_cb_hist();
-   rcu_fwd_progress_check(1 + (jiffies - READ_ONCE(rcu_fwd_startat) / 2));
+   rcu_fwd_progress_check(1 + (jiffies - READ_ONCE(rcu_fwd_startat)) / 2);
WRITE_ONCE(rcu_fwd_emergency_stop, true);
smp_mb(); /* Emergency stop before free and wait to avoid hangs. */
pr_info("%s: Freed %lu RCU callbacks.\n",
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation
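
A worked illustration of the precedence issue fixed above, with hypothetical
values (not from the patch):

/*
 * Suppose jiffies == 1000 and rcu_fwd_startat == 600, i.e. 400 jiffies elapsed.
 *
 *   old: 1 + (jiffies - rcu_fwd_startat / 2)  == 1 + (1000 - 300) == 701
 *   new: 1 + (jiffies - rcu_fwd_startat) / 2  == 1 + (400 / 2)    == 201
 *
 * The old form divides the start timestamp instead of the elapsed time, so the
 * expected forward-progress duration passed to rcu_fwd_progress_check() is
 * much larger than intended.
 */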



[PATCH] rcu/tree_plugin: Fix nohz status in stall warning

2019-03-08 Thread Neeraj Upadhyay
Fix stall warning, to show correct nohz marker.

Signed-off-by: Neeraj Upadhyay 
---
 kernel/rcu/tree_plugin.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 97dba50..93da32c 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1659,7 +1659,7 @@ static void print_cpu_stall_fast_no_hz(char *cp, int cpu)
		rdp->last_accelerate & 0xffff, jiffies & 0xffff,
		".l"[rdp->all_lazy],
		".L"[!rcu_segcblist_n_nonlazy_cbs(&rdp->cblist)],
-   ".D"[!rdp->tick_nohz_enabled_snap]);
+   ".D"[!!rdp->tick_nohz_enabled_snap]);
 }
 
 #else /* #ifdef CONFIG_RCU_FAST_NO_HZ */
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation
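
A short illustration of what the extra negation changes (values are
hypothetical):

/*
 * With rdp->tick_nohz_enabled_snap == 1:
 *
 *   ".D"[!rdp->tick_nohz_enabled_snap]   -> ".D"[0] -> '.'
 *   ".D"[!!rdp->tick_nohz_enabled_snap]  -> ".D"[1] -> 'D'
 *
 * so only the !! form prints the 'D' marker when the snapshot is nonzero,
 * matching the intent of the fix above.
 */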



[PATCH] rcu/tree: Fix self wakeups for grace period kthread

2019-03-08 Thread Neeraj Upadhyay
Update the code to match the comment that self wakeup of
grace period kthread is allowed from interrupt handler, and
softirq handler, running in the grace period kthread's
context. Present code allows self wakeups from all
interrupt contexts - nmi, softirq and hardirq contexts.

Signed-off-by: Neeraj Upadhyay 
---
 kernel/rcu/tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index acd6ccf..57cac6d 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1585,7 +1585,7 @@ static bool rcu_future_gp_cleanup(struct rcu_node *rnp)
 static void rcu_gp_kthread_wake(void)
 {
if ((current == rcu_state.gp_kthread &&
-!in_interrupt() && !in_serving_softirq()) ||
+!in_irq() && !in_serving_softirq()) ||
!READ_ONCE(rcu_state.gp_flags) ||
!rcu_state.gp_kthread)
return;
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation



[tip:smp/urgent] cpu/hotplug: Adjust misplaced smb() in cpuhp_thread_fun()

2018-09-06 Thread tip-bot for Neeraj Upadhyay
Commit-ID:  f8b7530aa0a1def79c93101216b5b17cf408a70a
Gitweb: https://git.kernel.org/tip/f8b7530aa0a1def79c93101216b5b17cf408a70a
Author: Neeraj Upadhyay 
AuthorDate: Wed, 5 Sep 2018 11:22:07 +0530
Committer:  Thomas Gleixner 
CommitDate: Thu, 6 Sep 2018 15:21:37 +0200

cpu/hotplug: Adjust misplaced smb() in cpuhp_thread_fun()

The smp_mb() in cpuhp_thread_fun() is misplaced. It needs to be after the
load of st->should_run to prevent reordering of the later load/stores
w.r.t. the load of st->should_run.

Fixes: 4dddfb5faa61 ("smp/hotplug: Rewrite AP state machine core")
Signed-off-by: Neeraj Upadhyay 
Signed-off-by: Thomas Gleixner 
Acked-by: Peter Zijlstra (Intel) 
Cc: j...@joshtriplett.org
Cc: pet...@infradead.org
Cc: jiangshan...@gmail.com
Cc: dzic...@redhat.com
Cc: brendan.jack...@arm.com
Cc: ma...@debian.org
Cc: mo...@codeaurora.org
Cc: sram...@codeaurora.org
Cc: linux-arm-...@vger.kernel.org
Cc: sta...@vger.kernel.org
Link: 
https://lkml.kernel.org/r/1536126727-11629-1-git-send-email-neer...@codeaurora.org

---
 kernel/cpu.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index aa7fe85ad62e..eb4041f78073 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -607,15 +607,15 @@ static void cpuhp_thread_fun(unsigned int cpu)
bool bringup = st->bringup;
enum cpuhp_state state;
 
+   if (WARN_ON_ONCE(!st->should_run))
+   return;
+
/*
 * ACQUIRE for the cpuhp_should_run() load of ->should_run. Ensures
 * that if we see ->should_run we also see the rest of the state.
 */
smp_mb();
 
-   if (WARN_ON_ONCE(!st->should_run))
-   return;
-
cpuhp_lock_acquire(bringup);
 
if (st->single) {



Re: [PATCH] cpu/hotplug: Fix rollback during error-out in takedown_cpu()

2018-09-06 Thread Neeraj Upadhyay




On 09/06/2018 01:48 PM, Thomas Gleixner wrote:

On Thu, 6 Sep 2018, Neeraj Upadhyay wrote:

On 09/05/2018 06:47 PM, Thomas Gleixner wrote:

On Wed, 5 Sep 2018, Neeraj Upadhyay wrote:

On 09/05/2018 05:53 PM, Thomas Gleixner wrote:

And looking closer this is a general issue. Just that the TEARDOWN state
makes it simple to observe. It's universally broken, when the first
teardown
callback fails because, st->state is only decremented _AFTER_ the
callback
returns success, but undo_cpu_down() increments unconditionally.


As per my understanding, there are 2 problems here; one is fixed with your
patch, and the other is that cpuhp_reset_state() is used during rollback from
non-AP to AP state, which seems to result in 2 increments of st->state (one
increment done by cpuhp_reset_state() and another by cpuhp_thread_fun()).

And how did your hack fix that up magically? I'll have a look later today.

Thanks,

tglx


The hack fixes it by not calling cpuhp_reset_state() and doing rollback state
reset inline in  _cpu_down().


And how is that any different from the proper patch I sent? It ends up in
the same state. So I have a hard time to understand your blurb above where
you claim that my patch just solves one of two problems?

Thanks,

tglx



Yes, your patch solves the problem related to smpboot unparking being
skipped during rollback, and with the hack we end up in the same state. The
second thing, which I am referring to, is that there is one additional
state increment. I missed the part that it could be required, so that
we reach CPUHP_AP_ONLINE_IDLE before calling __cpuhp_kick_ap(). So, it's
not a problem.


* 	cpuhp_reset_state() does one state increment and we reach 
CPUHP_AP_ONLINE_IDLE.


if (ret && st->state > CPUHP_TEARDOWN_CPU && st->state < prev_state) {
cpuhp_reset_state(st, prev_state);
__cpuhp_kick_ap(st);
}

CPUHP_TEARDOWN_CPU,
CPUHP_AP_ONLINE_IDLE,
CPUHP_AP_SMPBOOT_THREADS,


* cpuhp_thread_fun() does one more increment before invoking state 
callback (so, we skip CPUHP_AP_ONLINE_IDLE) and we reach 
CPUHP_AP_SMPBOOT_THREADS:


static void cpuhp_thread_fun(unsigned int cpu)

if (bringup) {
st->state++;
state = st->state;
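
For reference, a trimmed sketch of cpuhp_reset_state() as it looked then
(paraphrased from kernel/cpu.c; the !st->last branch is the first increment
referred to above):

static inline void cpuhp_reset_state(struct cpuhp_cpu_state *st,
				     enum cpuhp_state prev_state)
{
	st->rollback = true;

	/* No multi-instance rollback pending, so step past the failed state. */
	if (!st->last) {
		if (st->bringup)
			st->state--;
		else
			st->state++;	/* rollback of a down: first increment */
	}

	st->target = prev_state;
	st->bringup = !st->bringup;
}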



--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH] cpu/hotplug: Fix rollback during error-out in takedown_cpu()

2018-09-05 Thread Neeraj Upadhyay




On 09/05/2018 06:47 PM, Thomas Gleixner wrote:

On Wed, 5 Sep 2018, Neeraj Upadhyay wrote:

On 09/05/2018 05:53 PM, Thomas Gleixner wrote:

And looking closer this is a general issue. Just that the TEARDOWN state
makes it simple to observe. It's universally broken, when the first teardown
callback fails because, st->state is only decremented _AFTER_ the callback
returns success, but undo_cpu_down() increments unconditionally.


As per my understanding, there are 2 problems here; one is fixed with your
patch, and the other is that cpuhp_reset_state() is used during rollback from
non-AP to AP state, which seems to result in 2 increments of st->state (one
increment done by cpuhp_reset_state() and another by cpuhp_thread_fun()).

And how did your hack fix that up magically? I'll have a look later today.

Thanks,

tglx


The hack fixes it by not calling cpuhp_reset_state() and doing rollback 
state reset inline in  _cpu_down().





--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation



Re: [PATCH] cpu/hotplug: Fix rollback during error-out in takedown_cpu()

2018-09-05 Thread Neeraj Upadhyay




On 09/05/2018 05:53 PM, Thomas Gleixner wrote:

On Wed, 5 Sep 2018, Thomas Gleixner wrote:

On Tue, 4 Sep 2018, Neeraj Upadhyay wrote:

ret = cpuhp_down_callbacks(cpu, st, target);
if (ret && st->state > CPUHP_TEARDOWN_CPU && st->state < prev_state) {
-   cpuhp_reset_state(st, prev_state);
+   /*
+* As st->last is not set, cpuhp_reset_state() increments
+* st->state, which results in CPUHP_AP_SMPBOOT_THREADS being
+* skipped during rollback. So, don't use it here.
+*/
+   st->rollback = true;
+   st->target = prev_state;
+   st->bringup = !st->bringup;

No, this is just papering over the actual problem.

The state inconsistency happens in take_cpu_down() when it returns with a
failure from __cpu_disable() because that returns with state = TEARDOWN_CPU
and st->state is then incremented in undo_cpu_down().

That's the real issue and we need to analyze the whole cpu_down rollback
logic first.

And looking closer this is a general issue. Just that the TEARDOWN state
makes it simple to observe. It's universally broken, when the first teardown
callback fails because, st->state is only decremented _AFTER_ the callback
returns success, but undo_cpu_down() increments unconditionally.

Patch below.

Thanks,

tglx
As per my understanding, there are 2 problems here; one is fixed with your
patch, and the other is that cpuhp_reset_state() is used during rollback from
non-AP to AP state, which seems to result in 2 increments of st->state (one
increment done by cpuhp_reset_state() and another by cpuhp_thread_fun()).


--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -916,7 +916,8 @@ static int cpuhp_down_callbacks(unsigned
ret = cpuhp_invoke_callback(cpu, st->state, false, NULL, NULL);
if (ret) {
st->target = prev_state;
-   undo_cpu_down(cpu, st);
+   if (st->state < prev_state)
+   undo_cpu_down(cpu, st);
break;
}
}
@@ -969,7 +970,7 @@ static int __ref _cpu_down(unsigned int
 * to do the further cleanups.
 */
ret = cpuhp_down_callbacks(cpu, st, target);
-   if (ret && st->state > CPUHP_TEARDOWN_CPU && st->state < prev_state) {
+   if (ret && st->state == CPUHP_TEARDOWN_CPU && st->state < prev_state) {
cpuhp_reset_state(st, prev_state);
__cpuhp_kick_ap(st);
}


--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation



[PATCH] cpu/hotplug: Fix acquire of st->should_run in cpuhp_thread_fun()

2018-09-04 Thread Neeraj Upadhyay
The smp_mb() in cpuhp_thread_fun() appears to be misplaced, and
needs to be after the load of st->should_run, to prevent
reordering of the later load/stores w.r.t. the load of
st->should_run.

Signed-off-by: Neeraj Upadhyay 
---
 kernel/cpu.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index aa7fe85..eb4041f 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -607,15 +607,15 @@ static void cpuhp_thread_fun(unsigned int cpu)
bool bringup = st->bringup;
enum cpuhp_state state;
 
+   if (WARN_ON_ONCE(!st->should_run))
+   return;
+
/*
 * ACQUIRE for the cpuhp_should_run() load of ->should_run. Ensures
 * that if we see ->should_run we also see the rest of the state.
 */
smp_mb();
 
-   if (WARN_ON_ONCE(!st->should_run))
-   return;
-
cpuhp_lock_acquire(bringup);
 
if (st->single) {
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation



[PATCH] cpu/hotplug: Fix rollback during error-out in takedown_cpu()

2018-09-04 Thread Neeraj Upadhyay
If takedown_cpu() fails during _cpu_down(), st->state is reset,
by calling cpuhp_reset_state(). This results in an additional
increment of st->state, which results in CPUHP_AP_SMPBOOT_THREADS
state being skipped during rollback. Fix this by not calling
cpuhp_reset_state() and doing the state reset directly in
_cpu_down().

Fixes: 4dddfb5faa61 ("smp/hotplug: Rewrite AP state machine core")
Signed-off-by: Neeraj Upadhyay 
---
 kernel/cpu.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index aa7fe85..9f49edb 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -970,7 +970,14 @@ static int __ref _cpu_down(unsigned int cpu, int 
tasks_frozen,
 */
ret = cpuhp_down_callbacks(cpu, st, target);
if (ret && st->state > CPUHP_TEARDOWN_CPU && st->state < prev_state) {
-   cpuhp_reset_state(st, prev_state);
+   /*
+* As st->last is not set, cpuhp_reset_state() increments
+* st->state, which results in CPUHP_AP_SMPBOOT_THREADS being
+* skipped during rollback. So, don't use it here.
+*/
+   st->rollback = true;
+   st->target = prev_state;
+   st->bringup = !st->bringup;
__cpuhp_kick_ap(st);
}
 
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation



Re: [PATCH] workqueue: Handle race between wake up and rebind

2018-01-18 Thread Neeraj Upadhyay



On 01/18/2018 08:32 AM, Lai Jiangshan wrote:

On Wed, Jan 17, 2018 at 4:08 AM, Neeraj Upadhyay <neer...@codeaurora.org> wrote:


On 01/16/2018 11:05 PM, Tejun Heo wrote:

Hello, Neeraj.

On Mon, Jan 15, 2018 at 02:08:12PM +0530, Neeraj Upadhyay wrote:

- kworker/0:0 gets chance to run on cpu1; while processing
a work, it goes to sleep. However, it does not decrement
pool->nr_running. This is because WORKER_REBOUND (NOT_
RUNNING) flag was cleared, when worker entered worker_

Do you mean that because REBOUND was set?


Actually, I meant REBOUND was not set. Below is the sequence

- cpu0 bounded pool is unbound.

- kworker/0:0 is woken up on cpu1.

- cpu0 pool is rebound
   REBOUND is set for kworker/0:0


Thanks for looking into the detail of workqueue...

"REBOUND is set for kworker/0:0" means set_cpus_allowed_ptr(kworker/0:0)
already successfull returned and kworker/0:0 is already moved to cpu0.

It will not still run on cpu1 as the following steps you described.

If there is something wrong with " set_cpus_allowed_ptr()"
in this situation, could you please elaborate it.


Thanks Lai, I missed that; will debug from that perspective.
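
For reference, a trimmed sketch of rebind_workers() (paraphrased from
kernel/workqueue.c of that era), showing Lai's point that the cpumask is
restored with set_cpus_allowed_ptr() before WORKER_REBOUND is set under the
pool lock:

static void rebind_workers(struct worker_pool *pool)
{
	struct worker *worker;

	/* 1) migrate every pool worker back to the pool's cpumask */
	for_each_pool_worker(worker, pool)
		WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
						  pool->attrs->cpumask) < 0);

	spin_lock_irq(&pool->lock);
	pool->flags &= ~POOL_DISASSOCIATED;

	/* 2) only then mark the workers REBOUND (a NOT_RUNNING flag) */
	for_each_pool_worker(worker, pool) {
		unsigned int worker_flags = worker->flags;

		if (worker_flags & WORKER_IDLE)
			wake_up_process(worker->task);
		worker_flags |= WORKER_REBOUND;
		worker_flags &= ~WORKER_UNBOUND;
		WRITE_ONCE(worker->flags, worker_flags);
	}
	spin_unlock_irq(&pool->lock);
}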



- kworker/0:0 starts running on cpu1
   worker_thread()
 // It clears REBOUND and sets nr_running =1 after below call
 worker_clr_flags(worker, WORKER_PREP | WORKER_REBOUND);

- kworker/0:0 goes to sleep
   wq_worker_sleeping()
 // Below condition is not true, as all NOT_RUNNING
 // flags were cleared in worker_thread()
 if (worker->flags & WORKER_NOT_RUNNING)
 // Below is true, as worker is running on cpu1
 if (WARN_ON_ONCE(pool->cpu != raw_smp_processor_id()))
   return NULL;
 // Below is not reached and nr_running stays 1
 if (atomic_dec_and_test(&pool->nr_running) &&

- kworker/0:0 wakes up again, this time on cpu0, as worker->task
   cpus_allowed was set to cpu0, in rebind_workers.
   wq_worker_waking_up()
 if (!(worker->flags & WORKER_NOT_RUNNING)) {
 // Increments pool->nr_running to 2
 atomic_inc(&worker->pool->nr_running);


thread().

Worker 0 runs on cpu1
  worker_thread()
process_one_work()
  wq_worker_sleeping()
if (worker->flags & WORKER_NOT_RUNNING)
  return NULL;
if (WARN_ON_ONCE(pool->cpu != raw_smp_processor_id()))
  

- After this, when kworker/0:0 wakes up, this time on its
bounded cpu cpu0, it increments pool->nr_running again.
So, pool->nr_running becomes 2.

Why is it suddenly 2?  Who made it one on the account of the kworker?

As shown in above comment, it became 1 in
worker_clr_flags(worker, WORKER_PREP | WORKER_REBOUND);


Do you see this happening?  Or better, is there a (semi) reliable
repro for this issue?

Yes, this was reported in our long run testing with random hotplug.
Sorry, don't have a quick reproducer for it. Issue is reported in few
days of testing.


Thanks.


--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation



--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation



Re: [PATCH] workqueue: Handle race between wake up and rebind

2018-01-16 Thread Neeraj Upadhyay



On 01/16/2018 11:05 PM, Tejun Heo wrote:

Hello, Neeraj.

On Mon, Jan 15, 2018 at 02:08:12PM +0530, Neeraj Upadhyay wrote:

- kworker/0:0 gets chance to run on cpu1; while processing
   a work, it goes to sleep. However, it does not decrement
   pool->nr_running. This is because WORKER_REBOUND (NOT_
   RUNNING) flag was cleared, when worker entered worker_

Do you mean that because REBOUND was set?


Actually, I meant REBOUND was not set. Below is the sequence

- cpu0 bounded pool is unbound.

- kworker/0:0 is woken up on cpu1.

- cpu0 pool is rebound
  REBOUND is set for kworker/0:0

- kworker/0:0 starts running on cpu1
  worker_thread()
// It clears REBOUND and sets nr_running =1 after below call
worker_clr_flags(worker, WORKER_PREP | WORKER_REBOUND);

- kworker/0:0 goes to sleep
  wq_worker_sleeping()
// Below condition is not true, as all NOT_RUNNING
// flags were cleared in worker_thread()
if (worker->flags & WORKER_NOT_RUNNING)
// Below is true, as worker is running on cpu1
if (WARN_ON_ONCE(pool->cpu != raw_smp_processor_id()))
  return NULL;
// Below is not reached and nr_running stays 1
    if (atomic_dec_and_test(&pool->nr_running) &&

- kworker/0:0 wakes up again, this time on cpu0, as worker->task
  cpus_allowed was set to cpu0, in rebind_workers.
  wq_worker_waking_up()
if (!(worker->flags & WORKER_NOT_RUNNING)) {
// Increments pool->nr_running to 2
    atomic_inc(&worker->pool->nr_running);




   thread().

   Worker 0 runs on cpu1
 worker_thread()
   process_one_work()
 wq_worker_sleeping()
   if (worker->flags & WORKER_NOT_RUNNING)
 return NULL;
   if (WARN_ON_ONCE(pool->cpu != raw_smp_processor_id()))
 

- After this, when kworker/0:0 wakes up, this time on its
   bounded cpu cpu0, it increments pool->nr_running again.
   So, pool->nr_running becomes 2.

Why is it suddenly 2?  Who made it one on the account of the kworker?

As shown in above comment, it became 1 in
worker_clr_flags(worker, WORKER_PREP | WORKER_REBOUND);


Do you see this happening?  Or better, is there a (semi) reliable
repro for this issue?

Yes, this was reported in our long run testing with random hotplug.
Sorry, don't have a quick reproducer for it. Issue is reported in few
days of testing.


Thanks.



--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation



[PATCH] workqueue: Handle race between wake up and rebind

2018-01-15 Thread Neeraj Upadhyay
There is a potential race between rebind_workers() and
wakeup of a worker thread, which can result in
workqueue lockup for a bounded worker pool.

Below is the potential race:

- cpu0 is a bounded worker pool, which is unbound
  from its cpu. A new work is queued on this pool,
  which causes its worker (kworker/0:0) to be woken
  up on a cpu different from cpu0, lets say cpu1.

  workqueue_queue_work
workqueue_activate_work
  

- cpu0 rebind happens
  rebind_workers()
Clears POOL_DISASSOCIATED and binds cpumask of all
workers.

- kworker/0:0 gets a chance to run on cpu1; while processing
  a work, it goes to sleep. However, it does not decrement
  pool->nr_running. This is because the WORKER_REBOUND
  (NOT_RUNNING) flag was cleared when the worker entered
  worker_thread().

  Worker 0 runs on cpu1
worker_thread()
  process_one_work()
wq_worker_sleeping()
  if (worker->flags & WORKER_NOT_RUNNING)
return NULL;
  if (WARN_ON_ONCE(pool->cpu != raw_smp_processor_id()))


- After this, when kworker/0:0 wakes up, this time on its
  bounded cpu cpu0, it increments pool->nr_running again.
  So, pool->nr_running becomes 2.

- When kworker/0:0 enters idle, it decrements pool->nr_running
  by 1. This leaves pool->nr_running =1 , with no workers in
  runnable state.

- Now, no new workers will be woken up, as pool->nr_running is
  non-zero. This results in indefinite lockup for this pool.

Fix this by deferring the work to some other idle worker,
if the current worker is not bound to its pool's CPU.

Signed-off-by: Neeraj Upadhyay <neer...@codeaurora.org>
---
 kernel/workqueue.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 43d18cb..71c0023 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2218,6 +2218,17 @@ static int worker_thread(void *__worker)
if (unlikely(!may_start_working(pool)) && manage_workers(worker))
goto recheck;
 
+   /* handle the case where, while a bounded pool is unbound,
+* its worker is woken up on a target CPU, which is different
+* from pool->cpu, but pool is rebound before this worker gets
+* chance to run on the target CPU.
+*/
+   if (WARN_ON_ONCE(!(pool->flags & POOL_DISASSOCIATED) &&
+   raw_smp_processor_id() != pool->cpu)) {
+   wake_up_worker(pool);
+   goto sleep;
+   }
+
/*
 * ->scheduled list can only be filled while a worker is
 * preparing to process a work or actually processing it.
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation
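
A recap of the nr_running accounting in the scenario above (values only):

/*
 *   worker_clr_flags(worker, WORKER_PREP | WORKER_REBOUND)   nr_running: 0 -> 1
 *   wq_worker_sleeping() returns early on the CPU check      nr_running stays 1
 *   wq_worker_waking_up() on the bound CPU (cpu0)            nr_running: 1 -> 2
 *   worker enters idle                                       nr_running: 2 -> 1
 *
 * nr_running can never reach 0 again, so the pool never wakes another worker
 * and the queued work is stuck, which is the lockup described above.
 */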



Re: Query regarding srcu_funnel_exp_start()

2017-10-27 Thread Neeraj Upadhyay

On 10/28/2017 03:50 AM, Paul E. McKenney wrote:

On Fri, Oct 27, 2017 at 10:15:04PM +0530, Neeraj Upadhyay wrote:

On 10/27/2017 05:56 PM, Paul E. McKenney wrote:

On Fri, Oct 27, 2017 at 02:23:07PM +0530, Neeraj Upadhyay wrote:

Hi,

One query regarding srcu_funnel_exp_start() function in
kernel/rcu/srcutree.c.

static void srcu_funnel_exp_start(struct srcu_struct *sp, struct
srcu_node *snp,
  unsigned long s)
{

if (!ULONG_CMP_LT(sp->srcu_gp_seq_needed_exp, s))
sp->srcu_gp_seq_needed_exp = s;

}

Why is sp->srcu_gp_seq_needed_exp set to 's' if srcu_gp_seq_needed_exp is >=
's'. Shouldn't srcu_gp_seq_needed_exp be equal to the greater of both?


Let's suppose that it is incorrect as currently written.  Can you
construct a test case demonstrating a failure of some sort, then provide
a fix?


Will check this. Might take some time to build a test case.


Fair enough!

I suggest checking to see if kernel/rcu/rcuperf.c can do what you need for
this test.  (Might not with a single test, but perhaps a before-and-after
comparison.  Or maybe you really do need to add some test code somewhere.)



Thanks for the suggestion, will try that out.


To start with, if it is currently incorrect, what would be the nature
of the failure?

Thanx, Paul



Hi Paul,

I see below scenario, where new gp won't be expedited. Please correct
me if I am missing something here.

1. CPU0 calls synchronize_srcu_expedited()

synchronize_srcu_expedited()
   __synchronize_srcu()
 __call_srcu()
     s = rcu_seq_snap(&sp->srcu_gp_seq); // lets say
srcu_gp_seq  = 0;
 // s = 0x100


Looks like you have one hex digit and then two binary digits, but why not?
(RCU_SEQ_STATE_MASK is 3 rather than 0xff >


Yeah, sorry I confused myself while representing the values. 0x100 need 
to be replaced with b'100' and 0x200 with b'1000'.
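
For reference, the b'100'/b'1000' values follow from the rcu_seq_snap()
arithmetic of that time (quoted from memory, so treat as a sketch), with
RCU_SEQ_STATE_MASK == 3:

/*
 *   s = (READ_ONCE(*sp) + 2 * RCU_SEQ_STATE_MASK + 1) & ~RCU_SEQ_STATE_MASK;
 *
 *   srcu_gp_seq == 0 (idle)           ->  s = (0 + 7) & ~3 = 0b100   (step 1)
 *   srcu_gp_seq == 1 (gp in progress) ->  s = (1 + 7) & ~3 = 0b1000  (steps 2 and 3)
 */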



 sdp->srcu_gp_seq_needed = s // 0x100
 needgp = true
 sdp->srcu_gp_seq_needed_exp = s // 0x100
 srcu_funnel_gp_start()
 sp->srcu_gp_seq_needed_exp = s;
 srcu_gp_start(sp);
         rcu_seq_start(&sp->srcu_gp_seq);

2. CPU1 calls normal synchronize_srcu()

synchronize_srcu()
 __synchronize_srcu(sp, true)
 __call_srcu()
     s = rcu_seq_snap(&sp->srcu_gp_seq); // srcu_gp_seq = 1
 // s= 0x200
 sdp->srcu_gp_seq_needed = s; // 0x200
 srcu_funnel_gp_start()
         smp_store_release(&sp->srcu_gp_seq_needed, s); // 0x200

3. CPU3 calls synchronize_srcu_expedited()

synchronize_srcu_expedited()
   __synchronize_srcu()
 __call_srcu()
     s = rcu_seq_snap(&sp->srcu_gp_seq); // srcu_gp_seq = 1
 // s = 0x200
 sdp->srcu_gp_seq_needed_exp = s // 0x200
 srcu_funnel_exp_start(sp, sdp->mynode, s);
 // sp->srcu_gp_seq_needed_exp = 0x100
 // s = 0x200 ; sp->srcu_gp_seq_needed_exp is not updated
 if (!ULONG_CMP_LT(sp->srcu_gp_seq_needed_exp, s))
 sp->srcu_gp_seq_needed_exp = s;


Seems plausible, but you should be able to show the difference in
grace-period duration with a test.



Ok sure, will attempt that.


While you are in srcu_funnel_exp_start(), should it be rechecking
rcu_seq_done(>srcu_gp_seq, s) as well as the current
ULONG_CMP_GE(snp->srcu_gp_seq_needed_exp, s) under the lock?
Why or why not?

Thanx, Paul



Hi Paul,

I don't see how it will impact. I have put markers in code snippet
below to explain my points. My understanding is

* The rcu_seq_done() check @a is a fastpath return, and avoids contention
on the snp lock, if the gp has already elapsed.

* Checking it @b, inside srcu_node  lock might not make any
difference, as sp->srcu_gp_seq counter portion is updated
under srcu_struct lock. Also, we cannot lock srcu_struct at this
point, as it will cause lock contention among multiple CPUs.

* Checking rcu_seq_done @c also does not impact, as we have already
done all the work of traversing the entire parent chain and if
rcu_seq_done() is true srcu_gp_seq_needed_exp will be greater
than or equal to 's'.

  srcu_gp_end()
raw_spin_lock_irq_rcu_node(sp);
    rcu_seq_end(&sp->srcu_gp_seq);
    gpseq = rcu_seq_current(&sp->srcu_gp_seq);
if (ULONG_CMP_LT(sp->srcu_gp_seq_needed_exp, gpseq))
sp->srcu_gp_seq_needed_exp = gpseq;
raw_spin_unlock_irq_rcu_node(sp);

static void srcu_funnel_exp_start(...)
{


for (; snp != NULL; snp = snp->srcu_parent) {
        if (rcu_seq_done(&sp->srcu_gp_seq, s) ||  /* a */
ULONG_CMP_GE(READ_ONCE(snp->srcu_gp_seq_needed_exp), s))
return;
  

Re: Query regarding srcu_funnel_exp_start()

2017-10-27 Thread Neeraj Upadhyay

On 10/28/2017 03:50 AM, Paul E. McKenney wrote:

On Fri, Oct 27, 2017 at 10:15:04PM +0530, Neeraj Upadhyay wrote:

On 10/27/2017 05:56 PM, Paul E. McKenney wrote:

On Fri, Oct 27, 2017 at 02:23:07PM +0530, Neeraj Upadhyay wrote:

Hi,

One query regarding srcu_funnel_exp_start() function in
kernel/rcu/srcutree.c.

static void srcu_funnel_exp_start(struct srcu_struct *sp, struct
srcu_node *snp,
  unsigned long s)
{

if (!ULONG_CMP_LT(sp->srcu_gp_seq_needed_exp, s))
sp->srcu_gp_seq_needed_exp = s;

}

Why is sp->srcu_gp_seq_needed_exp set to 's' if srcu_gp_seq_needed_exp is >=
's'. Shouldn't srcu_gp_seq_needed_exp be equal to the greater of both?


Let's suppose that it is incorrect as currently written.  Can you
construct a test case demonstrating a failure of some sort, then provide
a fix?


Will check this. Might take some time to build a test case.


Fair enough!

I suggest checking to see if kernel/rcu/rcuperf.c can do what you need for
this test.  (Might not with a single test, but perhaps a before-and-after
comparison.  Or maybe you really do need to add some test code somewhere.)



Thanks for the suggestion, will try that out.


To start with, if it is currently incorrect, what would be the nature
of the failure?

Thanx, Paul



Hi Paul,

I see the scenario below, where a newly requested grace period won't be
expedited. Please correct me if I am missing something here.

1. CPU0 calls synchronize_srcu_expedited()

synchronize_srcu_expedited()
  __synchronize_srcu()
    __call_srcu()
      s = rcu_seq_snap(&sp->srcu_gp_seq); // let's say srcu_gp_seq = 0;
      // s = 0x100


Looks like you have one hex digit and then two binary digits, but why not?
(RCU_SEQ_STATE_MASK is 3 rather than 0xff.)


Yeah, sorry I confused myself while representing the values. 0x100 needs
to be replaced with b'100' and 0x200 with b'1000'.



      sdp->srcu_gp_seq_needed = s // 0x100
      needgp = true
      sdp->srcu_gp_seq_needed_exp = s // 0x100
      srcu_funnel_gp_start()
        sp->srcu_gp_seq_needed_exp = s;
        srcu_gp_start(sp);
          rcu_seq_start(&sp->srcu_gp_seq);

2. CPU1 calls normal synchronize_srcu()

synchronize_srcu()
  __synchronize_srcu(sp, true)
    __call_srcu()
      s = rcu_seq_snap(&sp->srcu_gp_seq); // srcu_gp_seq = 1
      // s = 0x200
      sdp->srcu_gp_seq_needed = s; // 0x200
      srcu_funnel_gp_start()
        smp_store_release(&sp->srcu_gp_seq_needed, s); // 0x200

3. CPU3 calls synchronize_srcu_expedited()

synchronize_srcu_expedited()
  __synchronize_srcu()
    __call_srcu()
      s = rcu_seq_snap(&sp->srcu_gp_seq); // srcu_gp_seq = 1
      // s = 0x200
      sdp->srcu_gp_seq_needed_exp = s // 0x200
      srcu_funnel_exp_start(sp, sdp->mynode, s);
        // sp->srcu_gp_seq_needed_exp = 0x100
        // s = 0x200 ; sp->srcu_gp_seq_needed_exp is not updated
        if (!ULONG_CMP_LT(sp->srcu_gp_seq_needed_exp, s))
          sp->srcu_gp_seq_needed_exp = s;


Seems plausible, but you should be able to show the difference in
grace-period duration with a test.



Ok sure, will attempt that.


While you are in srcu_funnel_exp_start(), should it be rechecking
rcu_seq_done(&sp->srcu_gp_seq, s) as well as the current
ULONG_CMP_GE(snp->srcu_gp_seq_needed_exp, s) under the lock?
Why or why not?

Thanx, Paul



Hi Paul,

I don't see how rechecking under the lock would make a difference. I have
put markers in the code snippet below to explain my points. My
understanding is:

* The rcu_seq_done() check at @a is a fastpath return; it avoids
contention on the snp lock if the grace period has already elapsed.

* Checking it at @b, inside the srcu_node lock, might not make any
difference, as the counter portion of sp->srcu_gp_seq is updated
under the srcu_struct lock. Also, we cannot take the srcu_struct
lock at this point, as it would cause lock contention among
multiple CPUs.

* Checking rcu_seq_done() at @c also would not make a difference, as we
have already done all the work of traversing the entire parent chain,
and if rcu_seq_done() is true, srcu_gp_seq_needed_exp will be greater
than or equal to 's'.

  srcu_gp_end()
        raw_spin_lock_irq_rcu_node(sp);
        rcu_seq_end(&sp->srcu_gp_seq);
        gpseq = rcu_seq_current(&sp->srcu_gp_seq);
        if (ULONG_CMP_LT(sp->srcu_gp_seq_needed_exp, gpseq))
                sp->srcu_gp_seq_needed_exp = gpseq;
        raw_spin_unlock_irq_rcu_node(sp);

static void srcu_funnel_exp_start(...)
{
        ...
        for (; snp != NULL; snp = snp->srcu_parent) {
                if (rcu_seq_done(&sp->srcu_gp_seq, s) ||  /* a */
                    ULONG_CMP_GE(READ_ONCE(snp->srcu_gp_seq_needed_exp), s))
                        return;
        ...

Re: Query regarding srcu_funnel_exp_start()

2017-10-27 Thread Neeraj Upadhyay

On 10/27/2017 05:56 PM, Paul E. McKenney wrote:

On Fri, Oct 27, 2017 at 02:23:07PM +0530, Neeraj Upadhyay wrote:

Hi,

One query regarding srcu_funnel_exp_start() function in
kernel/rcu/srcutree.c.

static void srcu_funnel_exp_start(struct srcu_struct *sp,
                                  struct srcu_node *snp,
                                  unsigned long s)
{
        ...
        if (!ULONG_CMP_LT(sp->srcu_gp_seq_needed_exp, s))
                sp->srcu_gp_seq_needed_exp = s;
        ...
}

Why is sp->srcu_gp_seq_needed_exp set to 's' if srcu_gp_seq_needed_exp is >=
's'? Shouldn't srcu_gp_seq_needed_exp be set to the greater of the two?


Let's suppose that it is incorrect as currently written.  Can you
construct a test case demonstrating a failure of some sort, then provide
a fix?


Will check this. Might take some time to build a test case.



To start with, if it is currently incorrect, what would be the nature
of the failure?

Thanx, Paul



Hi Paul,

I see the scenario below, where a newly requested grace period won't be
expedited. Please correct me if I am missing something here.

1. CPU0 calls synchronize_srcu_expedited()

synchronize_srcu_expedited()
  __synchronize_srcu()
    __call_srcu()
      s = rcu_seq_snap(&sp->srcu_gp_seq); // let's say srcu_gp_seq = 0;
      // s = 0x100
      sdp->srcu_gp_seq_needed = s // 0x100
      needgp = true
      sdp->srcu_gp_seq_needed_exp = s // 0x100
      srcu_funnel_gp_start()
        sp->srcu_gp_seq_needed_exp = s;
        srcu_gp_start(sp);
          rcu_seq_start(&sp->srcu_gp_seq);

2. CPU1 calls normal synchronize_srcu()

synchronize_srcu()
  __synchronize_srcu(sp, true)
    __call_srcu()
      s = rcu_seq_snap(&sp->srcu_gp_seq); // srcu_gp_seq = 1
      // s = 0x200
      sdp->srcu_gp_seq_needed = s; // 0x200
      srcu_funnel_gp_start()
        smp_store_release(&sp->srcu_gp_seq_needed, s); // 0x200

3. CPU3 calls synchronize_srcu_expedited()

synchronize_srcu_expedited()
  __synchronize_srcu()
    __call_srcu()
      s = rcu_seq_snap(&sp->srcu_gp_seq); // srcu_gp_seq = 1
      // s = 0x200
      sdp->srcu_gp_seq_needed_exp = s // 0x200
      srcu_funnel_exp_start(sp, sdp->mynode, s);
        // sp->srcu_gp_seq_needed_exp = 0x100
        // s = 0x200 ; sp->srcu_gp_seq_needed_exp is not updated
        if (!ULONG_CMP_LT(sp->srcu_gp_seq_needed_exp, s))
          sp->srcu_gp_seq_needed_exp = s;


Thanks
Neeraj






--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation

