Re: 回复: [PATCH] rcu: shrink each possible cpu krcp

2020-08-31 Thread Uladzislau Rezki
On Fri, Aug 21, 2020 at 08:33:28AM -0700, Paul E. McKenney wrote:
> On Thu, Aug 20, 2020 at 06:39:57PM -0400, Joel Fernandes wrote:
> > On Wed, Aug 19, 2020 at 05:58:08PM +0200, Uladzislau Rezki wrote:
> > > On Wed, Aug 19, 2020 at 08:21:59AM -0700, Paul E. McKenney wrote:
> > > > On Wed, Aug 19, 2020 at 09:56:54AM -0400, Joel Fernandes wrote:
> > > > > On Wed, Aug 19, 2020 at 03:00:55AM +, Zhang, Qiang wrote:
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 发件人: linux-kernel-ow...@vger.kernel.org 
> > > > > >  代表 Joel Fernandes 
> > > > > > 
> > > > > > 发送时间: 2020年8月19日 8:04
> > > > > > 收件人: Paul E. McKenney
> > > > > > 抄送: Uladzislau Rezki; Zhang, Qiang; Josh Triplett; Steven Rostedt; 
> > > > > > Mathieu Desnoyers; Lai Jiangshan; rcu; LKML
> > > > > > 主题: Re: [PATCH] rcu: shrink each possible cpu krcp
> > > > > > 
> > > > > > On Tue, Aug 18, 2020 at 6:02 PM Paul E. McKenney 
> > > > > >  wrote:
> > > > > > 
> > > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > > > > index b8ccd7b5af82..6decb9ad2421 100644
> > > > > > > > --- a/kernel/rcu/tree.c
> > > > > > > > +++ b/kernel/rcu/tree.c
> > > > > > > > @@ -2336,10 +2336,15 @@ int rcutree_dead_cpu(unsigned int cpu)
> > > > > > > >  {
> > > > > > > > struct rcu_data *rdp = per_cpu_ptr(_data, cpu);
> > > > > > > > struct rcu_node *rnp = rdp->mynode;  /* Outgoing CPU's 
> > > > > > > > rdp & rnp. */
> > > > > > > > +   struct kfree_rcu_cpu *krcp;
> > > > > > > >
> > > > > > > > if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
> > > > > > > > return 0;
> > > > > > > >
> > > > > > > > +   /* Drain the kcrp of this CPU. IRQs should be disabled? 
> > > > > > > > */
> > > > > > > > +   krcp = this_cpu_ptr()
> > > > > > > > +   schedule_delayed_work(>monitor_work, 0);
> > > > > > > > +
> > > > > > > >
> > > > > > > > A cpu can be offlined and its krp will be stuck until a 
> > > > > > > > shrinker is involved.
> > > > > > > > Maybe be never.
> > > > > > >
> > > > > > > Does the same apply to its kmalloc() per-CPU caches?  If so, I 
> > > > > > > have a
> > > > > > > hard time getting too worried about it.  ;-)
> > > > > > 
> > > > > > >Looking at slab_offline_cpu() , that calls 
> > > > > > >cancel_delayed_work_sync()
> > > > > > >on the cache reaper who's job is to flush the per-cpu caches. So I
> > > > > > >believe during CPU offlining, the per-cpu slab caches are flushed.
> > > > > > >
> > > > > > >thanks,
> > > > > > >
> > > > > >  >- Joel
> > > > > > 
> > > > > > When cpu going offline, the slub or slab only flush free objects in 
> > > > > > offline
> > > > > > cpu cache,  put these free objects in node list  or return buddy 
> > > > > > system,
> > > > > > for those who are still in use, they still stay offline cpu cache.
> > > > > > 
> > > > > > If we want clean per-cpu "krcp" objects when cpu going offline.  we 
> > > > > > should
> > > > > > free "krcp" cache objects in "rcutree_offline_cpu", this func be 
> > > > > > called
> > > > > > before other rcu cpu offline func. and then "rcutree_offline_cpu" 
> > > > > > will be
> > > > > > called in "cpuhp/%u" per-cpu thread.
> > > > > > 
> > > > > 
> > > > > Could you please wrap text properly when you post to mailing list, 
> > > > > thanks. I
> > > > > fixed it for you above.
> > > > > 
> > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > > index 8ce77d9ac716..1812d4a1ac1b 100644
> > > > > > --- a/kernel/rcu/tree.c
> > > > > > +++ b/kernel/rcu/tree.c
> > > > > > @@ -3959,6 +3959,7 @@ int rcutree_offline_cpu(unsigned int cpu)
> > > > > > unsigned long flags;
> > > > > > struct rcu_data *rdp;
> > > > > > struct rcu_node *rnp;
> > > > > > +   struct kfree_rcu_cpu *krcp;
> > > > > >  
> > > > > > rdp = per_cpu_ptr(_data, cpu);
> > > > > > rnp = rdp->mynode;
> > > > > > @@ -3970,6 +3971,11 @@ int rcutree_offline_cpu(unsigned int cpu)
> > > > > >  
> > > > > > // nohz_full CPUs need the tick for stop-machine to work 
> > > > > > quickly
> > > > > > tick_dep_set(TICK_DEP_BIT_RCU);
> > > > > > +
> > > > > > +   krcp = per_cpu_ptr(, cpu);
> > > > > > +   raw_spin_lock_irqsave(>lock, flags);
> > > > > > +   schedule_delayed_work(>monitor_work, 0);
> > > > > > +   raw_spin_unlock_irqrestore(>lock, flags);
> > > > > > return 0;
> > > > > 
> > > > > I realized the above is not good enough for what this is trying to 
> > > > > do. Unlike
> > > > > the slab, the new kfree_rcu objects cannot always be drained / 
> > > > > submitted to
> > > > > RCU because the previous batch may still be waiting for a grace 
> > > > > period. So
> > > > > the above code could very well return with the yet-to-be-submitted 
> > > > > kfree_rcu
> > > > > objects still in the cache.
> > > > > 
> > > > > One option is to spin-wait here for monitor_todo to be false and keep 
> > > > > calling
> > > > > kfree_rcu_drain_unlock() till 

Re: 回复: [PATCH] rcu: shrink each possible cpu krcp

2020-08-21 Thread Paul E. McKenney
On Thu, Aug 20, 2020 at 06:39:57PM -0400, Joel Fernandes wrote:
> On Wed, Aug 19, 2020 at 05:58:08PM +0200, Uladzislau Rezki wrote:
> > On Wed, Aug 19, 2020 at 08:21:59AM -0700, Paul E. McKenney wrote:
> > > On Wed, Aug 19, 2020 at 09:56:54AM -0400, Joel Fernandes wrote:
> > > > On Wed, Aug 19, 2020 at 03:00:55AM +, Zhang, Qiang wrote:
> > > > > 
> > > > > 
> > > > > 
> > > > > 发件人: linux-kernel-ow...@vger.kernel.org 
> > > > >  代表 Joel Fernandes 
> > > > > 
> > > > > 发送时间: 2020年8月19日 8:04
> > > > > 收件人: Paul E. McKenney
> > > > > 抄送: Uladzislau Rezki; Zhang, Qiang; Josh Triplett; Steven Rostedt; 
> > > > > Mathieu Desnoyers; Lai Jiangshan; rcu; LKML
> > > > > 主题: Re: [PATCH] rcu: shrink each possible cpu krcp
> > > > > 
> > > > > On Tue, Aug 18, 2020 at 6:02 PM Paul E. McKenney  
> > > > > wrote:
> > > > > 
> > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > > > index b8ccd7b5af82..6decb9ad2421 100644
> > > > > > > --- a/kernel/rcu/tree.c
> > > > > > > +++ b/kernel/rcu/tree.c
> > > > > > > @@ -2336,10 +2336,15 @@ int rcutree_dead_cpu(unsigned int cpu)
> > > > > > >  {
> > > > > > > struct rcu_data *rdp = per_cpu_ptr(_data, cpu);
> > > > > > > struct rcu_node *rnp = rdp->mynode;  /* Outgoing CPU's 
> > > > > > > rdp & rnp. */
> > > > > > > +   struct kfree_rcu_cpu *krcp;
> > > > > > >
> > > > > > > if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
> > > > > > > return 0;
> > > > > > >
> > > > > > > +   /* Drain the kcrp of this CPU. IRQs should be disabled? */
> > > > > > > +   krcp = this_cpu_ptr()
> > > > > > > +   schedule_delayed_work(>monitor_work, 0);
> > > > > > > +
> > > > > > >
> > > > > > > A cpu can be offlined and its krp will be stuck until a shrinker 
> > > > > > > is involved.
> > > > > > > Maybe be never.
> > > > > >
> > > > > > Does the same apply to its kmalloc() per-CPU caches?  If so, I have 
> > > > > > a
> > > > > > hard time getting too worried about it.  ;-)
> > > > > 
> > > > > >Looking at slab_offline_cpu() , that calls cancel_delayed_work_sync()
> > > > > >on the cache reaper who's job is to flush the per-cpu caches. So I
> > > > > >believe during CPU offlining, the per-cpu slab caches are flushed.
> > > > > >
> > > > > >thanks,
> > > > > >
> > > > >  >- Joel
> > > > > 
> > > > > When cpu going offline, the slub or slab only flush free objects in 
> > > > > offline
> > > > > cpu cache,  put these free objects in node list  or return buddy 
> > > > > system,
> > > > > for those who are still in use, they still stay offline cpu cache.
> > > > > 
> > > > > If we want clean per-cpu "krcp" objects when cpu going offline.  we 
> > > > > should
> > > > > free "krcp" cache objects in "rcutree_offline_cpu", this func be 
> > > > > called
> > > > > before other rcu cpu offline func. and then "rcutree_offline_cpu" 
> > > > > will be
> > > > > called in "cpuhp/%u" per-cpu thread.
> > > > > 
> > > > 
> > > > Could you please wrap text properly when you post to mailing list, 
> > > > thanks. I
> > > > fixed it for you above.
> > > > 
> > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > index 8ce77d9ac716..1812d4a1ac1b 100644
> > > > > --- a/kernel/rcu/tree.c
> > > > > +++ b/kernel/rcu/tree.c
> > > > > @@ -3959,6 +3959,7 @@ int rcutree_offline_cpu(unsigned int cpu)
> > > > > unsigned long flags;
> > > > > struct rcu_data *rdp;
> > > > > struct rcu_node *rnp;
> > > > > +   struct kfree_rcu_cpu *krcp;
> > > > >  
> > > > > rdp = per_cpu_ptr(_data, cpu);
> > > > > rnp = rdp->mynode;
> > > > > @@ -3970,6 +3971,11 @@ int rcutree_offline_cpu(unsigned int cpu)
> > > > >  
> > > > > // nohz_full CPUs need the tick for stop-machine to work 
> > > > > quickly
> > > > > tick_dep_set(TICK_DEP_BIT_RCU);
> > > > > +
> > > > > +   krcp = per_cpu_ptr(, cpu);
> > > > > +   raw_spin_lock_irqsave(>lock, flags);
> > > > > +   schedule_delayed_work(>monitor_work, 0);
> > > > > +   raw_spin_unlock_irqrestore(>lock, flags);
> > > > > return 0;
> > > > 
> > > > I realized the above is not good enough for what this is trying to do. 
> > > > Unlike
> > > > the slab, the new kfree_rcu objects cannot always be drained / 
> > > > submitted to
> > > > RCU because the previous batch may still be waiting for a grace period. 
> > > > So
> > > > the above code could very well return with the yet-to-be-submitted 
> > > > kfree_rcu
> > > > objects still in the cache.
> > > > 
> > > > One option is to spin-wait here for monitor_todo to be false and keep 
> > > > calling
> > > > kfree_rcu_drain_unlock() till then.
> > > > 
> > > > But then that's not good enough either, because if new objects are 
> > > > queued
> > > > when interrupts are enabled in the CPU offline path, then the cache 
> > > > will get
> > > > new objects after the previous set was drained. Further, spin waiting 
> > > > may
> > > > introduce deadlocks.

Re: 回复: [PATCH] rcu: shrink each possible cpu krcp

2020-08-20 Thread Joel Fernandes
On Wed, Aug 19, 2020 at 05:58:08PM +0200, Uladzislau Rezki wrote:
> On Wed, Aug 19, 2020 at 08:21:59AM -0700, Paul E. McKenney wrote:
> > On Wed, Aug 19, 2020 at 09:56:54AM -0400, Joel Fernandes wrote:
> > > On Wed, Aug 19, 2020 at 03:00:55AM +, Zhang, Qiang wrote:
> > > > 
> > > > 
> > > > 
> > > > 发件人: linux-kernel-ow...@vger.kernel.org 
> > > >  代表 Joel Fernandes 
> > > > 
> > > > 发送时间: 2020年8月19日 8:04
> > > > 收件人: Paul E. McKenney
> > > > 抄送: Uladzislau Rezki; Zhang, Qiang; Josh Triplett; Steven Rostedt; 
> > > > Mathieu Desnoyers; Lai Jiangshan; rcu; LKML
> > > > 主题: Re: [PATCH] rcu: shrink each possible cpu krcp
> > > > 
> > > > On Tue, Aug 18, 2020 at 6:02 PM Paul E. McKenney  
> > > > wrote:
> > > > 
> > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > > index b8ccd7b5af82..6decb9ad2421 100644
> > > > > > --- a/kernel/rcu/tree.c
> > > > > > +++ b/kernel/rcu/tree.c
> > > > > > @@ -2336,10 +2336,15 @@ int rcutree_dead_cpu(unsigned int cpu)
> > > > > >  {
> > > > > > struct rcu_data *rdp = per_cpu_ptr(_data, cpu);
> > > > > > struct rcu_node *rnp = rdp->mynode;  /* Outgoing CPU's rdp 
> > > > > > & rnp. */
> > > > > > +   struct kfree_rcu_cpu *krcp;
> > > > > >
> > > > > > if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
> > > > > > return 0;
> > > > > >
> > > > > > +   /* Drain the kcrp of this CPU. IRQs should be disabled? */
> > > > > > +   krcp = this_cpu_ptr()
> > > > > > +   schedule_delayed_work(>monitor_work, 0);
> > > > > > +
> > > > > >
> > > > > > A cpu can be offlined and its krp will be stuck until a shrinker is 
> > > > > > involved.
> > > > > > Maybe be never.
> > > > >
> > > > > Does the same apply to its kmalloc() per-CPU caches?  If so, I have a
> > > > > hard time getting too worried about it.  ;-)
> > > > 
> > > > >Looking at slab_offline_cpu() , that calls cancel_delayed_work_sync()
> > > > >on the cache reaper who's job is to flush the per-cpu caches. So I
> > > > >believe during CPU offlining, the per-cpu slab caches are flushed.
> > > > >
> > > > >thanks,
> > > > >
> > > >  >- Joel
> > > > 
> > > > When cpu going offline, the slub or slab only flush free objects in 
> > > > offline
> > > > cpu cache,  put these free objects in node list  or return buddy system,
> > > > for those who are still in use, they still stay offline cpu cache.
> > > > 
> > > > If we want clean per-cpu "krcp" objects when cpu going offline.  we 
> > > > should
> > > > free "krcp" cache objects in "rcutree_offline_cpu", this func be called
> > > > before other rcu cpu offline func. and then "rcutree_offline_cpu" will 
> > > > be
> > > > called in "cpuhp/%u" per-cpu thread.
> > > > 
> > > 
> > > Could you please wrap text properly when you post to mailing list, 
> > > thanks. I
> > > fixed it for you above.
> > > 
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > index 8ce77d9ac716..1812d4a1ac1b 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -3959,6 +3959,7 @@ int rcutree_offline_cpu(unsigned int cpu)
> > > > unsigned long flags;
> > > > struct rcu_data *rdp;
> > > > struct rcu_node *rnp;
> > > > +   struct kfree_rcu_cpu *krcp;
> > > >  
> > > > rdp = per_cpu_ptr(_data, cpu);
> > > > rnp = rdp->mynode;
> > > > @@ -3970,6 +3971,11 @@ int rcutree_offline_cpu(unsigned int cpu)
> > > >  
> > > > // nohz_full CPUs need the tick for stop-machine to work quickly
> > > > tick_dep_set(TICK_DEP_BIT_RCU);
> > > > +
> > > > +   krcp = per_cpu_ptr(, cpu);
> > > > +   raw_spin_lock_irqsave(>lock, flags);
> > > > +   schedule_delayed_work(>monitor_work, 0);
> > > > +   raw_spin_unlock_irqrestore(>lock, flags);
> > > > return 0;
> > > 
> > > I realized the above is not good enough for what this is trying to do. 
> > > Unlike
> > > the slab, the new kfree_rcu objects cannot always be drained / submitted 
> > > to
> > > RCU because the previous batch may still be waiting for a grace period. So
> > > the above code could very well return with the yet-to-be-submitted 
> > > kfree_rcu
> > > objects still in the cache.
> > > 
> > > One option is to spin-wait here for monitor_todo to be false and keep 
> > > calling
> > > kfree_rcu_drain_unlock() till then.
> > > 
> > > But then that's not good enough either, because if new objects are queued
> > > when interrupts are enabled in the CPU offline path, then the cache will 
> > > get
> > > new objects after the previous set was drained. Further, spin waiting may
> > > introduce deadlocks.
> > > 
> > > Another option is to switch the kfree_rcu() path to non-batching (so new
> > > objects cannot be cached in the offline path and are submitted directly to
> > > RCU), wait for a GP and then submit the work. But then not sure if 
> > > 1-argument
> > > kfree_rcu() will like that.
> > 
> > Or spawn a workqueue that does something like this:
> > 
> 

Re: 回复: [PATCH] rcu: shrink each possible cpu krcp

2020-08-19 Thread Uladzislau Rezki
On Wed, Aug 19, 2020 at 08:21:59AM -0700, Paul E. McKenney wrote:
> On Wed, Aug 19, 2020 at 09:56:54AM -0400, Joel Fernandes wrote:
> > On Wed, Aug 19, 2020 at 03:00:55AM +, Zhang, Qiang wrote:
> > > 
> > > 
> > > 
> > > 发件人: linux-kernel-ow...@vger.kernel.org 
> > >  代表 Joel Fernandes 
> > > 
> > > 发送时间: 2020年8月19日 8:04
> > > 收件人: Paul E. McKenney
> > > 抄送: Uladzislau Rezki; Zhang, Qiang; Josh Triplett; Steven Rostedt; 
> > > Mathieu Desnoyers; Lai Jiangshan; rcu; LKML
> > > 主题: Re: [PATCH] rcu: shrink each possible cpu krcp
> > > 
> > > On Tue, Aug 18, 2020 at 6:02 PM Paul E. McKenney  
> > > wrote:
> > > 
> > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > index b8ccd7b5af82..6decb9ad2421 100644
> > > > > --- a/kernel/rcu/tree.c
> > > > > +++ b/kernel/rcu/tree.c
> > > > > @@ -2336,10 +2336,15 @@ int rcutree_dead_cpu(unsigned int cpu)
> > > > >  {
> > > > > struct rcu_data *rdp = per_cpu_ptr(_data, cpu);
> > > > > struct rcu_node *rnp = rdp->mynode;  /* Outgoing CPU's rdp & 
> > > > > rnp. */
> > > > > +   struct kfree_rcu_cpu *krcp;
> > > > >
> > > > > if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
> > > > > return 0;
> > > > >
> > > > > +   /* Drain the kcrp of this CPU. IRQs should be disabled? */
> > > > > +   krcp = this_cpu_ptr()
> > > > > +   schedule_delayed_work(>monitor_work, 0);
> > > > > +
> > > > >
> > > > > A cpu can be offlined and its krp will be stuck until a shrinker is 
> > > > > involved.
> > > > > Maybe be never.
> > > >
> > > > Does the same apply to its kmalloc() per-CPU caches?  If so, I have a
> > > > hard time getting too worried about it.  ;-)
> > > 
> > > >Looking at slab_offline_cpu() , that calls cancel_delayed_work_sync()
> > > >on the cache reaper who's job is to flush the per-cpu caches. So I
> > > >believe during CPU offlining, the per-cpu slab caches are flushed.
> > > >
> > > >thanks,
> > > >
> > >  >- Joel
> > > 
> > > When cpu going offline, the slub or slab only flush free objects in 
> > > offline
> > > cpu cache,  put these free objects in node list  or return buddy system,
> > > for those who are still in use, they still stay offline cpu cache.
> > > 
> > > If we want clean per-cpu "krcp" objects when cpu going offline.  we should
> > > free "krcp" cache objects in "rcutree_offline_cpu", this func be called
> > > before other rcu cpu offline func. and then "rcutree_offline_cpu" will be
> > > called in "cpuhp/%u" per-cpu thread.
> > > 
> > 
> > Could you please wrap text properly when you post to mailing list, thanks. I
> > fixed it for you above.
> > 
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 8ce77d9ac716..1812d4a1ac1b 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -3959,6 +3959,7 @@ int rcutree_offline_cpu(unsigned int cpu)
> > > unsigned long flags;
> > > struct rcu_data *rdp;
> > > struct rcu_node *rnp;
> > > +   struct kfree_rcu_cpu *krcp;
> > >  
> > > rdp = per_cpu_ptr(_data, cpu);
> > > rnp = rdp->mynode;
> > > @@ -3970,6 +3971,11 @@ int rcutree_offline_cpu(unsigned int cpu)
> > >  
> > > // nohz_full CPUs need the tick for stop-machine to work quickly
> > > tick_dep_set(TICK_DEP_BIT_RCU);
> > > +
> > > +   krcp = per_cpu_ptr(, cpu);
> > > +   raw_spin_lock_irqsave(>lock, flags);
> > > +   schedule_delayed_work(>monitor_work, 0);
> > > +   raw_spin_unlock_irqrestore(>lock, flags);
> > > return 0;
> > 
> > I realized the above is not good enough for what this is trying to do. 
> > Unlike
> > the slab, the new kfree_rcu objects cannot always be drained / submitted to
> > RCU because the previous batch may still be waiting for a grace period. So
> > the above code could very well return with the yet-to-be-submitted kfree_rcu
> > objects still in the cache.
> > 
> > One option is to spin-wait here for monitor_todo to be false and keep 
> > calling
> > kfree_rcu_drain_unlock() till then.
> > 
> > But then that's not good enough either, because if new objects are queued
> > when interrupts are enabled in the CPU offline path, then the cache will get
> > new objects after the previous set was drained. Further, spin waiting may
> > introduce deadlocks.
> > 
> > Another option is to switch the kfree_rcu() path to non-batching (so new
> > objects cannot be cached in the offline path and are submitted directly to
> > RCU), wait for a GP and then submit the work. But then not sure if 
> > 1-argument
> > kfree_rcu() will like that.
> 
> Or spawn a workqueue that does something like this:
> 
> 1.Get any pending kvfree_rcu() requests sent off to RCU.
> 
> 2.Do an rcu_barrier().
> 
> 3.Do the cleanup actions.
> 
> > Probably Qian's original fix for for_each_possible_cpus() is good enough for
> > the shrinker case, and then we can tackle the hotplug one.
> 
> It might take some experimentation to find the best 

Re: 回复: [PATCH] rcu: shrink each possible cpu krcp

2020-08-19 Thread Joel Fernandes
On Wed, Aug 19, 2020 at 08:21:59AM -0700, Paul E. McKenney wrote:
> On Wed, Aug 19, 2020 at 09:56:54AM -0400, Joel Fernandes wrote:
> > On Wed, Aug 19, 2020 at 03:00:55AM +, Zhang, Qiang wrote:
> > > 
> > > 
> > > 
> > > 发件人: linux-kernel-ow...@vger.kernel.org 
> > >  代表 Joel Fernandes 
> > > 
> > > 发送时间: 2020年8月19日 8:04
> > > 收件人: Paul E. McKenney
> > > 抄送: Uladzislau Rezki; Zhang, Qiang; Josh Triplett; Steven Rostedt; 
> > > Mathieu Desnoyers; Lai Jiangshan; rcu; LKML
> > > 主题: Re: [PATCH] rcu: shrink each possible cpu krcp
> > > 
> > > On Tue, Aug 18, 2020 at 6:02 PM Paul E. McKenney  
> > > wrote:
> > > 
> > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > index b8ccd7b5af82..6decb9ad2421 100644
> > > > > --- a/kernel/rcu/tree.c
> > > > > +++ b/kernel/rcu/tree.c
> > > > > @@ -2336,10 +2336,15 @@ int rcutree_dead_cpu(unsigned int cpu)
> > > > >  {
> > > > > struct rcu_data *rdp = per_cpu_ptr(_data, cpu);
> > > > > struct rcu_node *rnp = rdp->mynode;  /* Outgoing CPU's rdp & 
> > > > > rnp. */
> > > > > +   struct kfree_rcu_cpu *krcp;
> > > > >
> > > > > if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
> > > > > return 0;
> > > > >
> > > > > +   /* Drain the kcrp of this CPU. IRQs should be disabled? */
> > > > > +   krcp = this_cpu_ptr()
> > > > > +   schedule_delayed_work(>monitor_work, 0);
> > > > > +
> > > > >
> > > > > A cpu can be offlined and its krp will be stuck until a shrinker is 
> > > > > involved.
> > > > > Maybe be never.
> > > >
> > > > Does the same apply to its kmalloc() per-CPU caches?  If so, I have a
> > > > hard time getting too worried about it.  ;-)
> > > 
> > > >Looking at slab_offline_cpu() , that calls cancel_delayed_work_sync()
> > > >on the cache reaper who's job is to flush the per-cpu caches. So I
> > > >believe during CPU offlining, the per-cpu slab caches are flushed.
> > > >
> > > >thanks,
> > > >
> > >  >- Joel
> > > 
> > > When cpu going offline, the slub or slab only flush free objects in 
> > > offline
> > > cpu cache,  put these free objects in node list  or return buddy system,
> > > for those who are still in use, they still stay offline cpu cache.
> > > 
> > > If we want clean per-cpu "krcp" objects when cpu going offline.  we should
> > > free "krcp" cache objects in "rcutree_offline_cpu", this func be called
> > > before other rcu cpu offline func. and then "rcutree_offline_cpu" will be
> > > called in "cpuhp/%u" per-cpu thread.
> > > 
> > 
> > Could you please wrap text properly when you post to mailing list, thanks. I
> > fixed it for you above.
> > 
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 8ce77d9ac716..1812d4a1ac1b 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -3959,6 +3959,7 @@ int rcutree_offline_cpu(unsigned int cpu)
> > > unsigned long flags;
> > > struct rcu_data *rdp;
> > > struct rcu_node *rnp;
> > > +   struct kfree_rcu_cpu *krcp;
> > >  
> > > rdp = per_cpu_ptr(_data, cpu);
> > > rnp = rdp->mynode;
> > > @@ -3970,6 +3971,11 @@ int rcutree_offline_cpu(unsigned int cpu)
> > >  
> > > // nohz_full CPUs need the tick for stop-machine to work quickly
> > > tick_dep_set(TICK_DEP_BIT_RCU);
> > > +
> > > +   krcp = per_cpu_ptr(, cpu);
> > > +   raw_spin_lock_irqsave(>lock, flags);
> > > +   schedule_delayed_work(>monitor_work, 0);
> > > +   raw_spin_unlock_irqrestore(>lock, flags);
> > > return 0;
> > 
> > I realized the above is not good enough for what this is trying to do. 
> > Unlike
> > the slab, the new kfree_rcu objects cannot always be drained / submitted to
> > RCU because the previous batch may still be waiting for a grace period. So
> > the above code could very well return with the yet-to-be-submitted kfree_rcu
> > objects still in the cache.
> > 
> > One option is to spin-wait here for monitor_todo to be false and keep 
> > calling
> > kfree_rcu_drain_unlock() till then.
> > 
> > But then that's not good enough either, because if new objects are queued
> > when interrupts are enabled in the CPU offline path, then the cache will get
> > new objects after the previous set was drained. Further, spin waiting may
> > introduce deadlocks.
> > 
> > Another option is to switch the kfree_rcu() path to non-batching (so new
> > objects cannot be cached in the offline path and are submitted directly to
> > RCU), wait for a GP and then submit the work. But then not sure if 
> > 1-argument
> > kfree_rcu() will like that.
> 
> Or spawn a workqueue that does something like this:
> 
> 1.Get any pending kvfree_rcu() requests sent off to RCU.
> 
> 2.Do an rcu_barrier().
> 
> 3.Do the cleanup actions.

One more thing I'n not sure about is if this will even be an issue  because
the delayed-work to run the monitor could be migrated to another (both timers
and wq get migrated).  So in that case, the draining 

Re: 回复: [PATCH] rcu: shrink each possible cpu krcp

2020-08-19 Thread Paul E. McKenney
On Wed, Aug 19, 2020 at 09:56:54AM -0400, Joel Fernandes wrote:
> On Wed, Aug 19, 2020 at 03:00:55AM +, Zhang, Qiang wrote:
> > 
> > 
> > 
> > 发件人: linux-kernel-ow...@vger.kernel.org 
> >  代表 Joel Fernandes 
> > 
> > 发送时间: 2020年8月19日 8:04
> > 收件人: Paul E. McKenney
> > 抄送: Uladzislau Rezki; Zhang, Qiang; Josh Triplett; Steven Rostedt; Mathieu 
> > Desnoyers; Lai Jiangshan; rcu; LKML
> > 主题: Re: [PATCH] rcu: shrink each possible cpu krcp
> > 
> > On Tue, Aug 18, 2020 at 6:02 PM Paul E. McKenney  wrote:
> > 
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > index b8ccd7b5af82..6decb9ad2421 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -2336,10 +2336,15 @@ int rcutree_dead_cpu(unsigned int cpu)
> > > >  {
> > > > struct rcu_data *rdp = per_cpu_ptr(_data, cpu);
> > > > struct rcu_node *rnp = rdp->mynode;  /* Outgoing CPU's rdp & 
> > > > rnp. */
> > > > +   struct kfree_rcu_cpu *krcp;
> > > >
> > > > if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
> > > > return 0;
> > > >
> > > > +   /* Drain the kcrp of this CPU. IRQs should be disabled? */
> > > > +   krcp = this_cpu_ptr()
> > > > +   schedule_delayed_work(>monitor_work, 0);
> > > > +
> > > >
> > > > A cpu can be offlined and its krp will be stuck until a shrinker is 
> > > > involved.
> > > > Maybe be never.
> > >
> > > Does the same apply to its kmalloc() per-CPU caches?  If so, I have a
> > > hard time getting too worried about it.  ;-)
> > 
> > >Looking at slab_offline_cpu() , that calls cancel_delayed_work_sync()
> > >on the cache reaper who's job is to flush the per-cpu caches. So I
> > >believe during CPU offlining, the per-cpu slab caches are flushed.
> > >
> > >thanks,
> > >
> >  >- Joel
> > 
> > When cpu going offline, the slub or slab only flush free objects in offline
> > cpu cache,  put these free objects in node list  or return buddy system,
> > for those who are still in use, they still stay offline cpu cache.
> > 
> > If we want clean per-cpu "krcp" objects when cpu going offline.  we should
> > free "krcp" cache objects in "rcutree_offline_cpu", this func be called
> > before other rcu cpu offline func. and then "rcutree_offline_cpu" will be
> > called in "cpuhp/%u" per-cpu thread.
> > 
> 
> Could you please wrap text properly when you post to mailing list, thanks. I
> fixed it for you above.
> 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 8ce77d9ac716..1812d4a1ac1b 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -3959,6 +3959,7 @@ int rcutree_offline_cpu(unsigned int cpu)
> > unsigned long flags;
> > struct rcu_data *rdp;
> > struct rcu_node *rnp;
> > +   struct kfree_rcu_cpu *krcp;
> >  
> > rdp = per_cpu_ptr(_data, cpu);
> > rnp = rdp->mynode;
> > @@ -3970,6 +3971,11 @@ int rcutree_offline_cpu(unsigned int cpu)
> >  
> > // nohz_full CPUs need the tick for stop-machine to work quickly
> > tick_dep_set(TICK_DEP_BIT_RCU);
> > +
> > +   krcp = per_cpu_ptr(, cpu);
> > +   raw_spin_lock_irqsave(>lock, flags);
> > +   schedule_delayed_work(>monitor_work, 0);
> > +   raw_spin_unlock_irqrestore(>lock, flags);
> > return 0;
> 
> I realized the above is not good enough for what this is trying to do. Unlike
> the slab, the new kfree_rcu objects cannot always be drained / submitted to
> RCU because the previous batch may still be waiting for a grace period. So
> the above code could very well return with the yet-to-be-submitted kfree_rcu
> objects still in the cache.
> 
> One option is to spin-wait here for monitor_todo to be false and keep calling
> kfree_rcu_drain_unlock() till then.
> 
> But then that's not good enough either, because if new objects are queued
> when interrupts are enabled in the CPU offline path, then the cache will get
> new objects after the previous set was drained. Further, spin waiting may
> introduce deadlocks.
> 
> Another option is to switch the kfree_rcu() path to non-batching (so new
> objects cannot be cached in the offline path and are submitted directly to
> RCU), wait for a GP and then submit the work. But then not sure if 1-argument
> kfree_rcu() will like that.

Or spawn a workqueue that does something like this:

1.  Get any pending kvfree_rcu() requests sent off to RCU.

2.  Do an rcu_barrier().

3.  Do the cleanup actions.

> Probably Qian's original fix for for_each_possible_cpus() is good enough for
> the shrinker case, and then we can tackle the hotplug one.

It might take some experimentation to find the best solution.

Thanx, Paul


Re: 回复: [PATCH] rcu: shrink each possible cpu krcp

2020-08-19 Thread Joel Fernandes
On Wed, Aug 19, 2020 at 03:00:55AM +, Zhang, Qiang wrote:
> 
> 
> 
> 发件人: linux-kernel-ow...@vger.kernel.org  
> 代表 Joel Fernandes 
> 发送时间: 2020年8月19日 8:04
> 收件人: Paul E. McKenney
> 抄送: Uladzislau Rezki; Zhang, Qiang; Josh Triplett; Steven Rostedt; Mathieu 
> Desnoyers; Lai Jiangshan; rcu; LKML
> 主题: Re: [PATCH] rcu: shrink each possible cpu krcp
> 
> On Tue, Aug 18, 2020 at 6:02 PM Paul E. McKenney  wrote:
> 
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index b8ccd7b5af82..6decb9ad2421 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -2336,10 +2336,15 @@ int rcutree_dead_cpu(unsigned int cpu)
> > >  {
> > > struct rcu_data *rdp = per_cpu_ptr(_data, cpu);
> > > struct rcu_node *rnp = rdp->mynode;  /* Outgoing CPU's rdp & rnp. 
> > > */
> > > +   struct kfree_rcu_cpu *krcp;
> > >
> > > if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
> > > return 0;
> > >
> > > +   /* Drain the kcrp of this CPU. IRQs should be disabled? */
> > > +   krcp = this_cpu_ptr()
> > > +   schedule_delayed_work(>monitor_work, 0);
> > > +
> > >
> > > A cpu can be offlined and its krp will be stuck until a shrinker is 
> > > involved.
> > > Maybe be never.
> >
> > Does the same apply to its kmalloc() per-CPU caches?  If so, I have a
> > hard time getting too worried about it.  ;-)
> 
> >Looking at slab_offline_cpu() , that calls cancel_delayed_work_sync()
> >on the cache reaper who's job is to flush the per-cpu caches. So I
> >believe during CPU offlining, the per-cpu slab caches are flushed.
> >
> >thanks,
> >
>  >- Joel
> 
> When cpu going offline, the slub or slab only flush free objects in offline
> cpu cache,  put these free objects in node list  or return buddy system,
> for those who are still in use, they still stay offline cpu cache.
> 
> If we want clean per-cpu "krcp" objects when cpu going offline.  we should
> free "krcp" cache objects in "rcutree_offline_cpu", this func be called
> before other rcu cpu offline func. and then "rcutree_offline_cpu" will be
> called in "cpuhp/%u" per-cpu thread.
> 

Could you please wrap text properly when you post to mailing list, thanks. I
fixed it for you above.

> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 8ce77d9ac716..1812d4a1ac1b 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3959,6 +3959,7 @@ int rcutree_offline_cpu(unsigned int cpu)
> unsigned long flags;
> struct rcu_data *rdp;
> struct rcu_node *rnp;
> +   struct kfree_rcu_cpu *krcp;
>  
> rdp = per_cpu_ptr(_data, cpu);
> rnp = rdp->mynode;
> @@ -3970,6 +3971,11 @@ int rcutree_offline_cpu(unsigned int cpu)
>  
> // nohz_full CPUs need the tick for stop-machine to work quickly
> tick_dep_set(TICK_DEP_BIT_RCU);
> +
> +   krcp = per_cpu_ptr(, cpu);
> +   raw_spin_lock_irqsave(>lock, flags);
> +   schedule_delayed_work(>monitor_work, 0);
> +   raw_spin_unlock_irqrestore(>lock, flags);
> return 0;

I realized the above is not good enough for what this is trying to do. Unlike
the slab, the new kfree_rcu objects cannot always be drained / submitted to
RCU because the previous batch may still be waiting for a grace period. So
the above code could very well return with the yet-to-be-submitted kfree_rcu
objects still in the cache.

One option is to spin-wait here for monitor_todo to be false and keep calling
kfree_rcu_drain_unlock() till then.

But then that's not good enough either, because if new objects are queued
when interrupts are enabled in the CPU offline path, then the cache will get
new objects after the previous set was drained. Further, spin waiting may
introduce deadlocks.

Another option is to switch the kfree_rcu() path to non-batching (so new
objects cannot be cached in the offline path and are submitted directly to
RCU), wait for a GP and then submit the work. But then not sure if 1-argument
kfree_rcu() will like that.

Probably Qian's original fix for for_each_possible_cpus() is good enough for
the shrinker case, and then we can tackle the hotplug one.

thanks,

 - Joel



Re: 回复: [PATCH] rcu: shrink each possible cpu krcp

2020-08-19 Thread Paul E. McKenney
On Wed, Aug 19, 2020 at 03:00:55AM +, Zhang, Qiang wrote:
> 
> 
> 
> 发件人: linux-kernel-ow...@vger.kernel.org  
> 代表 Joel Fernandes 
> 发送时间: 2020年8月19日 8:04
> 收件人: Paul E. McKenney
> 抄送: Uladzislau Rezki; Zhang, Qiang; Josh Triplett; Steven Rostedt; Mathieu 
> Desnoyers; Lai Jiangshan; rcu; LKML
> 主题: Re: [PATCH] rcu: shrink each possible cpu krcp
> 
> On Tue, Aug 18, 2020 at 6:02 PM Paul E. McKenney  wrote:
> 
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index b8ccd7b5af82..6decb9ad2421 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -2336,10 +2336,15 @@ int rcutree_dead_cpu(unsigned int cpu)
> > >  {
> > > struct rcu_data *rdp = per_cpu_ptr(_data, cpu);
> > > struct rcu_node *rnp = rdp->mynode;  /* Outgoing CPU's rdp & rnp. 
> > > */
> > > +   struct kfree_rcu_cpu *krcp;
> > >
> > > if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
> > > return 0;
> > >
> > > +   /* Drain the kcrp of this CPU. IRQs should be disabled? */
> > > +   krcp = this_cpu_ptr()
> > > +   schedule_delayed_work(>monitor_work, 0);
> > > +
> > >
> > > A cpu can be offlined and its krp will be stuck until a shrinker is 
> > > involved.
> > > Maybe be never.
> >
> > Does the same apply to its kmalloc() per-CPU caches?  If so, I have a
> > hard time getting too worried about it.  ;-)
> 
> >Looking at slab_offline_cpu() , that calls cancel_delayed_work_sync()
> >on the cache reaper who's job is to flush the per-cpu caches. So I
> >believe during CPU offlining, the per-cpu slab caches are flushed.
> >
> >thanks,
> >
>  >- Joel
> 
> When cpu going offline, the slub or slab only flush free objects in offline 
> cpu cache,  put these free objects in node list  or return buddy system,  for 
> those who are still in use, they still stay offline cpu cache.
> 
> If we want clean per-cpu "krcp" objects when cpu going offline.
>  we should free "krcp" cache objects in "rcutree_offline_cpu", this func be 
> called before other rcu cpu offline func. and then "rcutree_offline_cpu" will 
> be called in "cpuhp/%u" per-cpu thread.

If Joel and Uladislau are good with this, would you be willing to
resend with a commit log and your Signed-off-by?

Thanx, Paul

> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 8ce77d9ac716..1812d4a1ac1b 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3959,6 +3959,7 @@ int rcutree_offline_cpu(unsigned int cpu)
> unsigned long flags;
> struct rcu_data *rdp;
> struct rcu_node *rnp;
> +   struct kfree_rcu_cpu *krcp;
>  
> rdp = per_cpu_ptr(_data, cpu);
> rnp = rdp->mynode;
> @@ -3970,6 +3971,11 @@ int rcutree_offline_cpu(unsigned int cpu)
>  
> // nohz_full CPUs need the tick for stop-machine to work quickly
> tick_dep_set(TICK_DEP_BIT_RCU);
> +
> +   krcp = per_cpu_ptr(, cpu);
> +   raw_spin_lock_irqsave(>lock, flags);
> +   schedule_delayed_work(>monitor_work, 0);
> +   raw_spin_unlock_irqrestore(>lock, flags);
> return 0;
>  }
> 
> thanks,
> 
> Zqiang


回复: [PATCH] rcu: shrink each possible cpu krcp

2020-08-18 Thread Zhang, Qiang



发件人: linux-kernel-ow...@vger.kernel.org  代表 
Joel Fernandes 
发送时间: 2020年8月19日 8:04
收件人: Paul E. McKenney
抄送: Uladzislau Rezki; Zhang, Qiang; Josh Triplett; Steven Rostedt; Mathieu 
Desnoyers; Lai Jiangshan; rcu; LKML
主题: Re: [PATCH] rcu: shrink each possible cpu krcp

On Tue, Aug 18, 2020 at 6:02 PM Paul E. McKenney  wrote:

> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index b8ccd7b5af82..6decb9ad2421 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -2336,10 +2336,15 @@ int rcutree_dead_cpu(unsigned int cpu)
> >  {
> > struct rcu_data *rdp = per_cpu_ptr(_data, cpu);
> > struct rcu_node *rnp = rdp->mynode;  /* Outgoing CPU's rdp & rnp. */
> > +   struct kfree_rcu_cpu *krcp;
> >
> > if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
> > return 0;
> >
> > +   /* Drain the kcrp of this CPU. IRQs should be disabled? */
> > +   krcp = this_cpu_ptr()
> > +   schedule_delayed_work(>monitor_work, 0);
> > +
> >
> > A cpu can be offlined and its krp will be stuck until a shrinker is 
> > involved.
> > Maybe be never.
>
> Does the same apply to its kmalloc() per-CPU caches?  If so, I have a
> hard time getting too worried about it.  ;-)

>Looking at slab_offline_cpu() , that calls cancel_delayed_work_sync()
>on the cache reaper who's job is to flush the per-cpu caches. So I
>believe during CPU offlining, the per-cpu slab caches are flushed.
>
>thanks,
>
 >- Joel

When cpu going offline, the slub or slab only flush free objects in offline cpu 
cache,  put these free objects in node list  or return buddy system,  for those 
who are still in use, they still stay offline cpu cache.

If we want clean per-cpu "krcp" objects when cpu going offline.
 we should free "krcp" cache objects in "rcutree_offline_cpu", this func be 
called before other rcu cpu offline func. and then "rcutree_offline_cpu" will 
be called in "cpuhp/%u" per-cpu thread.

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 8ce77d9ac716..1812d4a1ac1b 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3959,6 +3959,7 @@ int rcutree_offline_cpu(unsigned int cpu)
unsigned long flags;
struct rcu_data *rdp;
struct rcu_node *rnp;
+   struct kfree_rcu_cpu *krcp;
 
rdp = per_cpu_ptr(_data, cpu);
rnp = rdp->mynode;
@@ -3970,6 +3971,11 @@ int rcutree_offline_cpu(unsigned int cpu)
 
// nohz_full CPUs need the tick for stop-machine to work quickly
tick_dep_set(TICK_DEP_BIT_RCU);
+
+   krcp = per_cpu_ptr(, cpu);
+   raw_spin_lock_irqsave(>lock, flags);
+   schedule_delayed_work(>monitor_work, 0);
+   raw_spin_unlock_irqrestore(>lock, flags);
return 0;
 }

thanks,

Zqiang