Re: 回复: [PATCH] rcu: shrink each possible cpu krcp
On Fri, Aug 21, 2020 at 08:33:28AM -0700, Paul E. McKenney wrote: > On Thu, Aug 20, 2020 at 06:39:57PM -0400, Joel Fernandes wrote: > > On Wed, Aug 19, 2020 at 05:58:08PM +0200, Uladzislau Rezki wrote: > > > On Wed, Aug 19, 2020 at 08:21:59AM -0700, Paul E. McKenney wrote: > > > > On Wed, Aug 19, 2020 at 09:56:54AM -0400, Joel Fernandes wrote: > > > > > On Wed, Aug 19, 2020 at 03:00:55AM +, Zhang, Qiang wrote: > > > > > > > > > > > > > > > > > > > > > > > > 发件人: linux-kernel-ow...@vger.kernel.org > > > > > > 代表 Joel Fernandes > > > > > > > > > > > > 发送时间: 2020年8月19日 8:04 > > > > > > 收件人: Paul E. McKenney > > > > > > 抄送: Uladzislau Rezki; Zhang, Qiang; Josh Triplett; Steven Rostedt; > > > > > > Mathieu Desnoyers; Lai Jiangshan; rcu; LKML > > > > > > 主题: Re: [PATCH] rcu: shrink each possible cpu krcp > > > > > > > > > > > > On Tue, Aug 18, 2020 at 6:02 PM Paul E. McKenney > > > > > > wrote: > > > > > > > > > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > > > > > > index b8ccd7b5af82..6decb9ad2421 100644 > > > > > > > > --- a/kernel/rcu/tree.c > > > > > > > > +++ b/kernel/rcu/tree.c > > > > > > > > @@ -2336,10 +2336,15 @@ int rcutree_dead_cpu(unsigned int cpu) > > > > > > > > { > > > > > > > > struct rcu_data *rdp = per_cpu_ptr(_data, cpu); > > > > > > > > struct rcu_node *rnp = rdp->mynode; /* Outgoing CPU's > > > > > > > > rdp & rnp. */ > > > > > > > > + struct kfree_rcu_cpu *krcp; > > > > > > > > > > > > > > > > if (!IS_ENABLED(CONFIG_HOTPLUG_CPU)) > > > > > > > > return 0; > > > > > > > > > > > > > > > > + /* Drain the kcrp of this CPU. IRQs should be disabled? > > > > > > > > */ > > > > > > > > + krcp = this_cpu_ptr() > > > > > > > > + schedule_delayed_work(>monitor_work, 0); > > > > > > > > + > > > > > > > > > > > > > > > > A cpu can be offlined and its krp will be stuck until a > > > > > > > > shrinker is involved. > > > > > > > > Maybe be never. > > > > > > > > > > > > > > Does the same apply to its kmalloc() per-CPU caches? If so, I > > > > > > > have a > > > > > > > hard time getting too worried about it. ;-) > > > > > > > > > > > > >Looking at slab_offline_cpu() , that calls > > > > > > >cancel_delayed_work_sync() > > > > > > >on the cache reaper who's job is to flush the per-cpu caches. So I > > > > > > >believe during CPU offlining, the per-cpu slab caches are flushed. > > > > > > > > > > > > > >thanks, > > > > > > > > > > > > > >- Joel > > > > > > > > > > > > When cpu going offline, the slub or slab only flush free objects in > > > > > > offline > > > > > > cpu cache, put these free objects in node list or return buddy > > > > > > system, > > > > > > for those who are still in use, they still stay offline cpu cache. > > > > > > > > > > > > If we want clean per-cpu "krcp" objects when cpu going offline. we > > > > > > should > > > > > > free "krcp" cache objects in "rcutree_offline_cpu", this func be > > > > > > called > > > > > > before other rcu cpu offline func. and then "rcutree_offline_cpu" > > > > > > will be > > > > > > called in "cpuhp/%u" per-cpu thread. > > > > > > > > > > > > > > > > Could you please wrap text properly when you post to mailing list, > > > > > thanks. I > > > > > fixed it for you above. > > > > > > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > > > > index 8ce77d9ac716..1812d4a1ac1b 100644 > > > > > > --- a/kernel/rcu/tree.c > > > > > > +++ b/kernel/rcu/tree.c > > > > > > @@ -3959,6 +3959,7 @@ int rcutree_offline_cpu(unsigned int cpu) > > > > > > unsigned long flags; > > > > > > struct rcu_data *rdp; > > > > > > struct rcu_node *rnp; > > > > > > + struct kfree_rcu_cpu *krcp; > > > > > > > > > > > > rdp = per_cpu_ptr(_data, cpu); > > > > > > rnp = rdp->mynode; > > > > > > @@ -3970,6 +3971,11 @@ int rcutree_offline_cpu(unsigned int cpu) > > > > > > > > > > > > // nohz_full CPUs need the tick for stop-machine to work > > > > > > quickly > > > > > > tick_dep_set(TICK_DEP_BIT_RCU); > > > > > > + > > > > > > + krcp = per_cpu_ptr(, cpu); > > > > > > + raw_spin_lock_irqsave(>lock, flags); > > > > > > + schedule_delayed_work(>monitor_work, 0); > > > > > > + raw_spin_unlock_irqrestore(>lock, flags); > > > > > > return 0; > > > > > > > > > > I realized the above is not good enough for what this is trying to > > > > > do. Unlike > > > > > the slab, the new kfree_rcu objects cannot always be drained / > > > > > submitted to > > > > > RCU because the previous batch may still be waiting for a grace > > > > > period. So > > > > > the above code could very well return with the yet-to-be-submitted > > > > > kfree_rcu > > > > > objects still in the cache. > > > > > > > > > > One option is to spin-wait here for monitor_todo to be false and keep > > > > > calling > > > > > kfree_rcu_drain_unlock() till
Re: 回复: [PATCH] rcu: shrink each possible cpu krcp
On Thu, Aug 20, 2020 at 06:39:57PM -0400, Joel Fernandes wrote: > On Wed, Aug 19, 2020 at 05:58:08PM +0200, Uladzislau Rezki wrote: > > On Wed, Aug 19, 2020 at 08:21:59AM -0700, Paul E. McKenney wrote: > > > On Wed, Aug 19, 2020 at 09:56:54AM -0400, Joel Fernandes wrote: > > > > On Wed, Aug 19, 2020 at 03:00:55AM +, Zhang, Qiang wrote: > > > > > > > > > > > > > > > > > > > > 发件人: linux-kernel-ow...@vger.kernel.org > > > > > 代表 Joel Fernandes > > > > > > > > > > 发送时间: 2020年8月19日 8:04 > > > > > 收件人: Paul E. McKenney > > > > > 抄送: Uladzislau Rezki; Zhang, Qiang; Josh Triplett; Steven Rostedt; > > > > > Mathieu Desnoyers; Lai Jiangshan; rcu; LKML > > > > > 主题: Re: [PATCH] rcu: shrink each possible cpu krcp > > > > > > > > > > On Tue, Aug 18, 2020 at 6:02 PM Paul E. McKenney > > > > > wrote: > > > > > > > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > > > > > index b8ccd7b5af82..6decb9ad2421 100644 > > > > > > > --- a/kernel/rcu/tree.c > > > > > > > +++ b/kernel/rcu/tree.c > > > > > > > @@ -2336,10 +2336,15 @@ int rcutree_dead_cpu(unsigned int cpu) > > > > > > > { > > > > > > > struct rcu_data *rdp = per_cpu_ptr(_data, cpu); > > > > > > > struct rcu_node *rnp = rdp->mynode; /* Outgoing CPU's > > > > > > > rdp & rnp. */ > > > > > > > + struct kfree_rcu_cpu *krcp; > > > > > > > > > > > > > > if (!IS_ENABLED(CONFIG_HOTPLUG_CPU)) > > > > > > > return 0; > > > > > > > > > > > > > > + /* Drain the kcrp of this CPU. IRQs should be disabled? */ > > > > > > > + krcp = this_cpu_ptr() > > > > > > > + schedule_delayed_work(>monitor_work, 0); > > > > > > > + > > > > > > > > > > > > > > A cpu can be offlined and its krp will be stuck until a shrinker > > > > > > > is involved. > > > > > > > Maybe be never. > > > > > > > > > > > > Does the same apply to its kmalloc() per-CPU caches? If so, I have > > > > > > a > > > > > > hard time getting too worried about it. ;-) > > > > > > > > > > >Looking at slab_offline_cpu() , that calls cancel_delayed_work_sync() > > > > > >on the cache reaper who's job is to flush the per-cpu caches. So I > > > > > >believe during CPU offlining, the per-cpu slab caches are flushed. > > > > > > > > > > > >thanks, > > > > > > > > > > > >- Joel > > > > > > > > > > When cpu going offline, the slub or slab only flush free objects in > > > > > offline > > > > > cpu cache, put these free objects in node list or return buddy > > > > > system, > > > > > for those who are still in use, they still stay offline cpu cache. > > > > > > > > > > If we want clean per-cpu "krcp" objects when cpu going offline. we > > > > > should > > > > > free "krcp" cache objects in "rcutree_offline_cpu", this func be > > > > > called > > > > > before other rcu cpu offline func. and then "rcutree_offline_cpu" > > > > > will be > > > > > called in "cpuhp/%u" per-cpu thread. > > > > > > > > > > > > > Could you please wrap text properly when you post to mailing list, > > > > thanks. I > > > > fixed it for you above. > > > > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > > > index 8ce77d9ac716..1812d4a1ac1b 100644 > > > > > --- a/kernel/rcu/tree.c > > > > > +++ b/kernel/rcu/tree.c > > > > > @@ -3959,6 +3959,7 @@ int rcutree_offline_cpu(unsigned int cpu) > > > > > unsigned long flags; > > > > > struct rcu_data *rdp; > > > > > struct rcu_node *rnp; > > > > > + struct kfree_rcu_cpu *krcp; > > > > > > > > > > rdp = per_cpu_ptr(_data, cpu); > > > > > rnp = rdp->mynode; > > > > > @@ -3970,6 +3971,11 @@ int rcutree_offline_cpu(unsigned int cpu) > > > > > > > > > > // nohz_full CPUs need the tick for stop-machine to work > > > > > quickly > > > > > tick_dep_set(TICK_DEP_BIT_RCU); > > > > > + > > > > > + krcp = per_cpu_ptr(, cpu); > > > > > + raw_spin_lock_irqsave(>lock, flags); > > > > > + schedule_delayed_work(>monitor_work, 0); > > > > > + raw_spin_unlock_irqrestore(>lock, flags); > > > > > return 0; > > > > > > > > I realized the above is not good enough for what this is trying to do. > > > > Unlike > > > > the slab, the new kfree_rcu objects cannot always be drained / > > > > submitted to > > > > RCU because the previous batch may still be waiting for a grace period. > > > > So > > > > the above code could very well return with the yet-to-be-submitted > > > > kfree_rcu > > > > objects still in the cache. > > > > > > > > One option is to spin-wait here for monitor_todo to be false and keep > > > > calling > > > > kfree_rcu_drain_unlock() till then. > > > > > > > > But then that's not good enough either, because if new objects are > > > > queued > > > > when interrupts are enabled in the CPU offline path, then the cache > > > > will get > > > > new objects after the previous set was drained. Further, spin waiting > > > > may > > > > introduce deadlocks.
Re: 回复: [PATCH] rcu: shrink each possible cpu krcp
On Wed, Aug 19, 2020 at 05:58:08PM +0200, Uladzislau Rezki wrote: > On Wed, Aug 19, 2020 at 08:21:59AM -0700, Paul E. McKenney wrote: > > On Wed, Aug 19, 2020 at 09:56:54AM -0400, Joel Fernandes wrote: > > > On Wed, Aug 19, 2020 at 03:00:55AM +, Zhang, Qiang wrote: > > > > > > > > > > > > > > > > 发件人: linux-kernel-ow...@vger.kernel.org > > > > 代表 Joel Fernandes > > > > > > > > 发送时间: 2020年8月19日 8:04 > > > > 收件人: Paul E. McKenney > > > > 抄送: Uladzislau Rezki; Zhang, Qiang; Josh Triplett; Steven Rostedt; > > > > Mathieu Desnoyers; Lai Jiangshan; rcu; LKML > > > > 主题: Re: [PATCH] rcu: shrink each possible cpu krcp > > > > > > > > On Tue, Aug 18, 2020 at 6:02 PM Paul E. McKenney > > > > wrote: > > > > > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > > > > index b8ccd7b5af82..6decb9ad2421 100644 > > > > > > --- a/kernel/rcu/tree.c > > > > > > +++ b/kernel/rcu/tree.c > > > > > > @@ -2336,10 +2336,15 @@ int rcutree_dead_cpu(unsigned int cpu) > > > > > > { > > > > > > struct rcu_data *rdp = per_cpu_ptr(_data, cpu); > > > > > > struct rcu_node *rnp = rdp->mynode; /* Outgoing CPU's rdp > > > > > > & rnp. */ > > > > > > + struct kfree_rcu_cpu *krcp; > > > > > > > > > > > > if (!IS_ENABLED(CONFIG_HOTPLUG_CPU)) > > > > > > return 0; > > > > > > > > > > > > + /* Drain the kcrp of this CPU. IRQs should be disabled? */ > > > > > > + krcp = this_cpu_ptr() > > > > > > + schedule_delayed_work(>monitor_work, 0); > > > > > > + > > > > > > > > > > > > A cpu can be offlined and its krp will be stuck until a shrinker is > > > > > > involved. > > > > > > Maybe be never. > > > > > > > > > > Does the same apply to its kmalloc() per-CPU caches? If so, I have a > > > > > hard time getting too worried about it. ;-) > > > > > > > > >Looking at slab_offline_cpu() , that calls cancel_delayed_work_sync() > > > > >on the cache reaper who's job is to flush the per-cpu caches. So I > > > > >believe during CPU offlining, the per-cpu slab caches are flushed. > > > > > > > > > >thanks, > > > > > > > > > >- Joel > > > > > > > > When cpu going offline, the slub or slab only flush free objects in > > > > offline > > > > cpu cache, put these free objects in node list or return buddy system, > > > > for those who are still in use, they still stay offline cpu cache. > > > > > > > > If we want clean per-cpu "krcp" objects when cpu going offline. we > > > > should > > > > free "krcp" cache objects in "rcutree_offline_cpu", this func be called > > > > before other rcu cpu offline func. and then "rcutree_offline_cpu" will > > > > be > > > > called in "cpuhp/%u" per-cpu thread. > > > > > > > > > > Could you please wrap text properly when you post to mailing list, > > > thanks. I > > > fixed it for you above. > > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > > index 8ce77d9ac716..1812d4a1ac1b 100644 > > > > --- a/kernel/rcu/tree.c > > > > +++ b/kernel/rcu/tree.c > > > > @@ -3959,6 +3959,7 @@ int rcutree_offline_cpu(unsigned int cpu) > > > > unsigned long flags; > > > > struct rcu_data *rdp; > > > > struct rcu_node *rnp; > > > > + struct kfree_rcu_cpu *krcp; > > > > > > > > rdp = per_cpu_ptr(_data, cpu); > > > > rnp = rdp->mynode; > > > > @@ -3970,6 +3971,11 @@ int rcutree_offline_cpu(unsigned int cpu) > > > > > > > > // nohz_full CPUs need the tick for stop-machine to work quickly > > > > tick_dep_set(TICK_DEP_BIT_RCU); > > > > + > > > > + krcp = per_cpu_ptr(, cpu); > > > > + raw_spin_lock_irqsave(>lock, flags); > > > > + schedule_delayed_work(>monitor_work, 0); > > > > + raw_spin_unlock_irqrestore(>lock, flags); > > > > return 0; > > > > > > I realized the above is not good enough for what this is trying to do. > > > Unlike > > > the slab, the new kfree_rcu objects cannot always be drained / submitted > > > to > > > RCU because the previous batch may still be waiting for a grace period. So > > > the above code could very well return with the yet-to-be-submitted > > > kfree_rcu > > > objects still in the cache. > > > > > > One option is to spin-wait here for monitor_todo to be false and keep > > > calling > > > kfree_rcu_drain_unlock() till then. > > > > > > But then that's not good enough either, because if new objects are queued > > > when interrupts are enabled in the CPU offline path, then the cache will > > > get > > > new objects after the previous set was drained. Further, spin waiting may > > > introduce deadlocks. > > > > > > Another option is to switch the kfree_rcu() path to non-batching (so new > > > objects cannot be cached in the offline path and are submitted directly to > > > RCU), wait for a GP and then submit the work. But then not sure if > > > 1-argument > > > kfree_rcu() will like that. > > > > Or spawn a workqueue that does something like this: > > >
Re: 回复: [PATCH] rcu: shrink each possible cpu krcp
On Wed, Aug 19, 2020 at 08:21:59AM -0700, Paul E. McKenney wrote: > On Wed, Aug 19, 2020 at 09:56:54AM -0400, Joel Fernandes wrote: > > On Wed, Aug 19, 2020 at 03:00:55AM +, Zhang, Qiang wrote: > > > > > > > > > > > > 发件人: linux-kernel-ow...@vger.kernel.org > > > 代表 Joel Fernandes > > > > > > 发送时间: 2020年8月19日 8:04 > > > 收件人: Paul E. McKenney > > > 抄送: Uladzislau Rezki; Zhang, Qiang; Josh Triplett; Steven Rostedt; > > > Mathieu Desnoyers; Lai Jiangshan; rcu; LKML > > > 主题: Re: [PATCH] rcu: shrink each possible cpu krcp > > > > > > On Tue, Aug 18, 2020 at 6:02 PM Paul E. McKenney > > > wrote: > > > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > > > index b8ccd7b5af82..6decb9ad2421 100644 > > > > > --- a/kernel/rcu/tree.c > > > > > +++ b/kernel/rcu/tree.c > > > > > @@ -2336,10 +2336,15 @@ int rcutree_dead_cpu(unsigned int cpu) > > > > > { > > > > > struct rcu_data *rdp = per_cpu_ptr(_data, cpu); > > > > > struct rcu_node *rnp = rdp->mynode; /* Outgoing CPU's rdp & > > > > > rnp. */ > > > > > + struct kfree_rcu_cpu *krcp; > > > > > > > > > > if (!IS_ENABLED(CONFIG_HOTPLUG_CPU)) > > > > > return 0; > > > > > > > > > > + /* Drain the kcrp of this CPU. IRQs should be disabled? */ > > > > > + krcp = this_cpu_ptr() > > > > > + schedule_delayed_work(>monitor_work, 0); > > > > > + > > > > > > > > > > A cpu can be offlined and its krp will be stuck until a shrinker is > > > > > involved. > > > > > Maybe be never. > > > > > > > > Does the same apply to its kmalloc() per-CPU caches? If so, I have a > > > > hard time getting too worried about it. ;-) > > > > > > >Looking at slab_offline_cpu() , that calls cancel_delayed_work_sync() > > > >on the cache reaper who's job is to flush the per-cpu caches. So I > > > >believe during CPU offlining, the per-cpu slab caches are flushed. > > > > > > > >thanks, > > > > > > > >- Joel > > > > > > When cpu going offline, the slub or slab only flush free objects in > > > offline > > > cpu cache, put these free objects in node list or return buddy system, > > > for those who are still in use, they still stay offline cpu cache. > > > > > > If we want clean per-cpu "krcp" objects when cpu going offline. we should > > > free "krcp" cache objects in "rcutree_offline_cpu", this func be called > > > before other rcu cpu offline func. and then "rcutree_offline_cpu" will be > > > called in "cpuhp/%u" per-cpu thread. > > > > > > > Could you please wrap text properly when you post to mailing list, thanks. I > > fixed it for you above. > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > index 8ce77d9ac716..1812d4a1ac1b 100644 > > > --- a/kernel/rcu/tree.c > > > +++ b/kernel/rcu/tree.c > > > @@ -3959,6 +3959,7 @@ int rcutree_offline_cpu(unsigned int cpu) > > > unsigned long flags; > > > struct rcu_data *rdp; > > > struct rcu_node *rnp; > > > + struct kfree_rcu_cpu *krcp; > > > > > > rdp = per_cpu_ptr(_data, cpu); > > > rnp = rdp->mynode; > > > @@ -3970,6 +3971,11 @@ int rcutree_offline_cpu(unsigned int cpu) > > > > > > // nohz_full CPUs need the tick for stop-machine to work quickly > > > tick_dep_set(TICK_DEP_BIT_RCU); > > > + > > > + krcp = per_cpu_ptr(, cpu); > > > + raw_spin_lock_irqsave(>lock, flags); > > > + schedule_delayed_work(>monitor_work, 0); > > > + raw_spin_unlock_irqrestore(>lock, flags); > > > return 0; > > > > I realized the above is not good enough for what this is trying to do. > > Unlike > > the slab, the new kfree_rcu objects cannot always be drained / submitted to > > RCU because the previous batch may still be waiting for a grace period. So > > the above code could very well return with the yet-to-be-submitted kfree_rcu > > objects still in the cache. > > > > One option is to spin-wait here for monitor_todo to be false and keep > > calling > > kfree_rcu_drain_unlock() till then. > > > > But then that's not good enough either, because if new objects are queued > > when interrupts are enabled in the CPU offline path, then the cache will get > > new objects after the previous set was drained. Further, spin waiting may > > introduce deadlocks. > > > > Another option is to switch the kfree_rcu() path to non-batching (so new > > objects cannot be cached in the offline path and are submitted directly to > > RCU), wait for a GP and then submit the work. But then not sure if > > 1-argument > > kfree_rcu() will like that. > > Or spawn a workqueue that does something like this: > > 1.Get any pending kvfree_rcu() requests sent off to RCU. > > 2.Do an rcu_barrier(). > > 3.Do the cleanup actions. > > > Probably Qian's original fix for for_each_possible_cpus() is good enough for > > the shrinker case, and then we can tackle the hotplug one. > > It might take some experimentation to find the best
Re: 回复: [PATCH] rcu: shrink each possible cpu krcp
On Wed, Aug 19, 2020 at 08:21:59AM -0700, Paul E. McKenney wrote: > On Wed, Aug 19, 2020 at 09:56:54AM -0400, Joel Fernandes wrote: > > On Wed, Aug 19, 2020 at 03:00:55AM +, Zhang, Qiang wrote: > > > > > > > > > > > > 发件人: linux-kernel-ow...@vger.kernel.org > > > 代表 Joel Fernandes > > > > > > 发送时间: 2020年8月19日 8:04 > > > 收件人: Paul E. McKenney > > > 抄送: Uladzislau Rezki; Zhang, Qiang; Josh Triplett; Steven Rostedt; > > > Mathieu Desnoyers; Lai Jiangshan; rcu; LKML > > > 主题: Re: [PATCH] rcu: shrink each possible cpu krcp > > > > > > On Tue, Aug 18, 2020 at 6:02 PM Paul E. McKenney > > > wrote: > > > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > > > index b8ccd7b5af82..6decb9ad2421 100644 > > > > > --- a/kernel/rcu/tree.c > > > > > +++ b/kernel/rcu/tree.c > > > > > @@ -2336,10 +2336,15 @@ int rcutree_dead_cpu(unsigned int cpu) > > > > > { > > > > > struct rcu_data *rdp = per_cpu_ptr(_data, cpu); > > > > > struct rcu_node *rnp = rdp->mynode; /* Outgoing CPU's rdp & > > > > > rnp. */ > > > > > + struct kfree_rcu_cpu *krcp; > > > > > > > > > > if (!IS_ENABLED(CONFIG_HOTPLUG_CPU)) > > > > > return 0; > > > > > > > > > > + /* Drain the kcrp of this CPU. IRQs should be disabled? */ > > > > > + krcp = this_cpu_ptr() > > > > > + schedule_delayed_work(>monitor_work, 0); > > > > > + > > > > > > > > > > A cpu can be offlined and its krp will be stuck until a shrinker is > > > > > involved. > > > > > Maybe be never. > > > > > > > > Does the same apply to its kmalloc() per-CPU caches? If so, I have a > > > > hard time getting too worried about it. ;-) > > > > > > >Looking at slab_offline_cpu() , that calls cancel_delayed_work_sync() > > > >on the cache reaper who's job is to flush the per-cpu caches. So I > > > >believe during CPU offlining, the per-cpu slab caches are flushed. > > > > > > > >thanks, > > > > > > > >- Joel > > > > > > When cpu going offline, the slub or slab only flush free objects in > > > offline > > > cpu cache, put these free objects in node list or return buddy system, > > > for those who are still in use, they still stay offline cpu cache. > > > > > > If we want clean per-cpu "krcp" objects when cpu going offline. we should > > > free "krcp" cache objects in "rcutree_offline_cpu", this func be called > > > before other rcu cpu offline func. and then "rcutree_offline_cpu" will be > > > called in "cpuhp/%u" per-cpu thread. > > > > > > > Could you please wrap text properly when you post to mailing list, thanks. I > > fixed it for you above. > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > index 8ce77d9ac716..1812d4a1ac1b 100644 > > > --- a/kernel/rcu/tree.c > > > +++ b/kernel/rcu/tree.c > > > @@ -3959,6 +3959,7 @@ int rcutree_offline_cpu(unsigned int cpu) > > > unsigned long flags; > > > struct rcu_data *rdp; > > > struct rcu_node *rnp; > > > + struct kfree_rcu_cpu *krcp; > > > > > > rdp = per_cpu_ptr(_data, cpu); > > > rnp = rdp->mynode; > > > @@ -3970,6 +3971,11 @@ int rcutree_offline_cpu(unsigned int cpu) > > > > > > // nohz_full CPUs need the tick for stop-machine to work quickly > > > tick_dep_set(TICK_DEP_BIT_RCU); > > > + > > > + krcp = per_cpu_ptr(, cpu); > > > + raw_spin_lock_irqsave(>lock, flags); > > > + schedule_delayed_work(>monitor_work, 0); > > > + raw_spin_unlock_irqrestore(>lock, flags); > > > return 0; > > > > I realized the above is not good enough for what this is trying to do. > > Unlike > > the slab, the new kfree_rcu objects cannot always be drained / submitted to > > RCU because the previous batch may still be waiting for a grace period. So > > the above code could very well return with the yet-to-be-submitted kfree_rcu > > objects still in the cache. > > > > One option is to spin-wait here for monitor_todo to be false and keep > > calling > > kfree_rcu_drain_unlock() till then. > > > > But then that's not good enough either, because if new objects are queued > > when interrupts are enabled in the CPU offline path, then the cache will get > > new objects after the previous set was drained. Further, spin waiting may > > introduce deadlocks. > > > > Another option is to switch the kfree_rcu() path to non-batching (so new > > objects cannot be cached in the offline path and are submitted directly to > > RCU), wait for a GP and then submit the work. But then not sure if > > 1-argument > > kfree_rcu() will like that. > > Or spawn a workqueue that does something like this: > > 1.Get any pending kvfree_rcu() requests sent off to RCU. > > 2.Do an rcu_barrier(). > > 3.Do the cleanup actions. One more thing I'n not sure about is if this will even be an issue because the delayed-work to run the monitor could be migrated to another (both timers and wq get migrated). So in that case, the draining
Re: 回复: [PATCH] rcu: shrink each possible cpu krcp
On Wed, Aug 19, 2020 at 09:56:54AM -0400, Joel Fernandes wrote: > On Wed, Aug 19, 2020 at 03:00:55AM +, Zhang, Qiang wrote: > > > > > > > > 发件人: linux-kernel-ow...@vger.kernel.org > > 代表 Joel Fernandes > > > > 发送时间: 2020年8月19日 8:04 > > 收件人: Paul E. McKenney > > 抄送: Uladzislau Rezki; Zhang, Qiang; Josh Triplett; Steven Rostedt; Mathieu > > Desnoyers; Lai Jiangshan; rcu; LKML > > 主题: Re: [PATCH] rcu: shrink each possible cpu krcp > > > > On Tue, Aug 18, 2020 at 6:02 PM Paul E. McKenney wrote: > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > > index b8ccd7b5af82..6decb9ad2421 100644 > > > > --- a/kernel/rcu/tree.c > > > > +++ b/kernel/rcu/tree.c > > > > @@ -2336,10 +2336,15 @@ int rcutree_dead_cpu(unsigned int cpu) > > > > { > > > > struct rcu_data *rdp = per_cpu_ptr(_data, cpu); > > > > struct rcu_node *rnp = rdp->mynode; /* Outgoing CPU's rdp & > > > > rnp. */ > > > > + struct kfree_rcu_cpu *krcp; > > > > > > > > if (!IS_ENABLED(CONFIG_HOTPLUG_CPU)) > > > > return 0; > > > > > > > > + /* Drain the kcrp of this CPU. IRQs should be disabled? */ > > > > + krcp = this_cpu_ptr() > > > > + schedule_delayed_work(>monitor_work, 0); > > > > + > > > > > > > > A cpu can be offlined and its krp will be stuck until a shrinker is > > > > involved. > > > > Maybe be never. > > > > > > Does the same apply to its kmalloc() per-CPU caches? If so, I have a > > > hard time getting too worried about it. ;-) > > > > >Looking at slab_offline_cpu() , that calls cancel_delayed_work_sync() > > >on the cache reaper who's job is to flush the per-cpu caches. So I > > >believe during CPU offlining, the per-cpu slab caches are flushed. > > > > > >thanks, > > > > > >- Joel > > > > When cpu going offline, the slub or slab only flush free objects in offline > > cpu cache, put these free objects in node list or return buddy system, > > for those who are still in use, they still stay offline cpu cache. > > > > If we want clean per-cpu "krcp" objects when cpu going offline. we should > > free "krcp" cache objects in "rcutree_offline_cpu", this func be called > > before other rcu cpu offline func. and then "rcutree_offline_cpu" will be > > called in "cpuhp/%u" per-cpu thread. > > > > Could you please wrap text properly when you post to mailing list, thanks. I > fixed it for you above. > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > index 8ce77d9ac716..1812d4a1ac1b 100644 > > --- a/kernel/rcu/tree.c > > +++ b/kernel/rcu/tree.c > > @@ -3959,6 +3959,7 @@ int rcutree_offline_cpu(unsigned int cpu) > > unsigned long flags; > > struct rcu_data *rdp; > > struct rcu_node *rnp; > > + struct kfree_rcu_cpu *krcp; > > > > rdp = per_cpu_ptr(_data, cpu); > > rnp = rdp->mynode; > > @@ -3970,6 +3971,11 @@ int rcutree_offline_cpu(unsigned int cpu) > > > > // nohz_full CPUs need the tick for stop-machine to work quickly > > tick_dep_set(TICK_DEP_BIT_RCU); > > + > > + krcp = per_cpu_ptr(, cpu); > > + raw_spin_lock_irqsave(>lock, flags); > > + schedule_delayed_work(>monitor_work, 0); > > + raw_spin_unlock_irqrestore(>lock, flags); > > return 0; > > I realized the above is not good enough for what this is trying to do. Unlike > the slab, the new kfree_rcu objects cannot always be drained / submitted to > RCU because the previous batch may still be waiting for a grace period. So > the above code could very well return with the yet-to-be-submitted kfree_rcu > objects still in the cache. > > One option is to spin-wait here for monitor_todo to be false and keep calling > kfree_rcu_drain_unlock() till then. > > But then that's not good enough either, because if new objects are queued > when interrupts are enabled in the CPU offline path, then the cache will get > new objects after the previous set was drained. Further, spin waiting may > introduce deadlocks. > > Another option is to switch the kfree_rcu() path to non-batching (so new > objects cannot be cached in the offline path and are submitted directly to > RCU), wait for a GP and then submit the work. But then not sure if 1-argument > kfree_rcu() will like that. Or spawn a workqueue that does something like this: 1. Get any pending kvfree_rcu() requests sent off to RCU. 2. Do an rcu_barrier(). 3. Do the cleanup actions. > Probably Qian's original fix for for_each_possible_cpus() is good enough for > the shrinker case, and then we can tackle the hotplug one. It might take some experimentation to find the best solution. Thanx, Paul
Re: 回复: [PATCH] rcu: shrink each possible cpu krcp
On Wed, Aug 19, 2020 at 03:00:55AM +, Zhang, Qiang wrote: > > > > 发件人: linux-kernel-ow...@vger.kernel.org > 代表 Joel Fernandes > 发送时间: 2020年8月19日 8:04 > 收件人: Paul E. McKenney > 抄送: Uladzislau Rezki; Zhang, Qiang; Josh Triplett; Steven Rostedt; Mathieu > Desnoyers; Lai Jiangshan; rcu; LKML > 主题: Re: [PATCH] rcu: shrink each possible cpu krcp > > On Tue, Aug 18, 2020 at 6:02 PM Paul E. McKenney wrote: > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > index b8ccd7b5af82..6decb9ad2421 100644 > > > --- a/kernel/rcu/tree.c > > > +++ b/kernel/rcu/tree.c > > > @@ -2336,10 +2336,15 @@ int rcutree_dead_cpu(unsigned int cpu) > > > { > > > struct rcu_data *rdp = per_cpu_ptr(_data, cpu); > > > struct rcu_node *rnp = rdp->mynode; /* Outgoing CPU's rdp & rnp. > > > */ > > > + struct kfree_rcu_cpu *krcp; > > > > > > if (!IS_ENABLED(CONFIG_HOTPLUG_CPU)) > > > return 0; > > > > > > + /* Drain the kcrp of this CPU. IRQs should be disabled? */ > > > + krcp = this_cpu_ptr() > > > + schedule_delayed_work(>monitor_work, 0); > > > + > > > > > > A cpu can be offlined and its krp will be stuck until a shrinker is > > > involved. > > > Maybe be never. > > > > Does the same apply to its kmalloc() per-CPU caches? If so, I have a > > hard time getting too worried about it. ;-) > > >Looking at slab_offline_cpu() , that calls cancel_delayed_work_sync() > >on the cache reaper who's job is to flush the per-cpu caches. So I > >believe during CPU offlining, the per-cpu slab caches are flushed. > > > >thanks, > > > >- Joel > > When cpu going offline, the slub or slab only flush free objects in offline > cpu cache, put these free objects in node list or return buddy system, > for those who are still in use, they still stay offline cpu cache. > > If we want clean per-cpu "krcp" objects when cpu going offline. we should > free "krcp" cache objects in "rcutree_offline_cpu", this func be called > before other rcu cpu offline func. and then "rcutree_offline_cpu" will be > called in "cpuhp/%u" per-cpu thread. > Could you please wrap text properly when you post to mailing list, thanks. I fixed it for you above. > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > index 8ce77d9ac716..1812d4a1ac1b 100644 > --- a/kernel/rcu/tree.c > +++ b/kernel/rcu/tree.c > @@ -3959,6 +3959,7 @@ int rcutree_offline_cpu(unsigned int cpu) > unsigned long flags; > struct rcu_data *rdp; > struct rcu_node *rnp; > + struct kfree_rcu_cpu *krcp; > > rdp = per_cpu_ptr(_data, cpu); > rnp = rdp->mynode; > @@ -3970,6 +3971,11 @@ int rcutree_offline_cpu(unsigned int cpu) > > // nohz_full CPUs need the tick for stop-machine to work quickly > tick_dep_set(TICK_DEP_BIT_RCU); > + > + krcp = per_cpu_ptr(, cpu); > + raw_spin_lock_irqsave(>lock, flags); > + schedule_delayed_work(>monitor_work, 0); > + raw_spin_unlock_irqrestore(>lock, flags); > return 0; I realized the above is not good enough for what this is trying to do. Unlike the slab, the new kfree_rcu objects cannot always be drained / submitted to RCU because the previous batch may still be waiting for a grace period. So the above code could very well return with the yet-to-be-submitted kfree_rcu objects still in the cache. One option is to spin-wait here for monitor_todo to be false and keep calling kfree_rcu_drain_unlock() till then. But then that's not good enough either, because if new objects are queued when interrupts are enabled in the CPU offline path, then the cache will get new objects after the previous set was drained. Further, spin waiting may introduce deadlocks. Another option is to switch the kfree_rcu() path to non-batching (so new objects cannot be cached in the offline path and are submitted directly to RCU), wait for a GP and then submit the work. But then not sure if 1-argument kfree_rcu() will like that. Probably Qian's original fix for for_each_possible_cpus() is good enough for the shrinker case, and then we can tackle the hotplug one. thanks, - Joel
Re: 回复: [PATCH] rcu: shrink each possible cpu krcp
On Wed, Aug 19, 2020 at 03:00:55AM +, Zhang, Qiang wrote: > > > > 发件人: linux-kernel-ow...@vger.kernel.org > 代表 Joel Fernandes > 发送时间: 2020年8月19日 8:04 > 收件人: Paul E. McKenney > 抄送: Uladzislau Rezki; Zhang, Qiang; Josh Triplett; Steven Rostedt; Mathieu > Desnoyers; Lai Jiangshan; rcu; LKML > 主题: Re: [PATCH] rcu: shrink each possible cpu krcp > > On Tue, Aug 18, 2020 at 6:02 PM Paul E. McKenney wrote: > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > index b8ccd7b5af82..6decb9ad2421 100644 > > > --- a/kernel/rcu/tree.c > > > +++ b/kernel/rcu/tree.c > > > @@ -2336,10 +2336,15 @@ int rcutree_dead_cpu(unsigned int cpu) > > > { > > > struct rcu_data *rdp = per_cpu_ptr(_data, cpu); > > > struct rcu_node *rnp = rdp->mynode; /* Outgoing CPU's rdp & rnp. > > > */ > > > + struct kfree_rcu_cpu *krcp; > > > > > > if (!IS_ENABLED(CONFIG_HOTPLUG_CPU)) > > > return 0; > > > > > > + /* Drain the kcrp of this CPU. IRQs should be disabled? */ > > > + krcp = this_cpu_ptr() > > > + schedule_delayed_work(>monitor_work, 0); > > > + > > > > > > A cpu can be offlined and its krp will be stuck until a shrinker is > > > involved. > > > Maybe be never. > > > > Does the same apply to its kmalloc() per-CPU caches? If so, I have a > > hard time getting too worried about it. ;-) > > >Looking at slab_offline_cpu() , that calls cancel_delayed_work_sync() > >on the cache reaper who's job is to flush the per-cpu caches. So I > >believe during CPU offlining, the per-cpu slab caches are flushed. > > > >thanks, > > > >- Joel > > When cpu going offline, the slub or slab only flush free objects in offline > cpu cache, put these free objects in node list or return buddy system, for > those who are still in use, they still stay offline cpu cache. > > If we want clean per-cpu "krcp" objects when cpu going offline. > we should free "krcp" cache objects in "rcutree_offline_cpu", this func be > called before other rcu cpu offline func. and then "rcutree_offline_cpu" will > be called in "cpuhp/%u" per-cpu thread. If Joel and Uladislau are good with this, would you be willing to resend with a commit log and your Signed-off-by? Thanx, Paul > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > index 8ce77d9ac716..1812d4a1ac1b 100644 > --- a/kernel/rcu/tree.c > +++ b/kernel/rcu/tree.c > @@ -3959,6 +3959,7 @@ int rcutree_offline_cpu(unsigned int cpu) > unsigned long flags; > struct rcu_data *rdp; > struct rcu_node *rnp; > + struct kfree_rcu_cpu *krcp; > > rdp = per_cpu_ptr(_data, cpu); > rnp = rdp->mynode; > @@ -3970,6 +3971,11 @@ int rcutree_offline_cpu(unsigned int cpu) > > // nohz_full CPUs need the tick for stop-machine to work quickly > tick_dep_set(TICK_DEP_BIT_RCU); > + > + krcp = per_cpu_ptr(, cpu); > + raw_spin_lock_irqsave(>lock, flags); > + schedule_delayed_work(>monitor_work, 0); > + raw_spin_unlock_irqrestore(>lock, flags); > return 0; > } > > thanks, > > Zqiang
回复: [PATCH] rcu: shrink each possible cpu krcp
发件人: linux-kernel-ow...@vger.kernel.org 代表 Joel Fernandes 发送时间: 2020年8月19日 8:04 收件人: Paul E. McKenney 抄送: Uladzislau Rezki; Zhang, Qiang; Josh Triplett; Steven Rostedt; Mathieu Desnoyers; Lai Jiangshan; rcu; LKML 主题: Re: [PATCH] rcu: shrink each possible cpu krcp On Tue, Aug 18, 2020 at 6:02 PM Paul E. McKenney wrote: > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > index b8ccd7b5af82..6decb9ad2421 100644 > > --- a/kernel/rcu/tree.c > > +++ b/kernel/rcu/tree.c > > @@ -2336,10 +2336,15 @@ int rcutree_dead_cpu(unsigned int cpu) > > { > > struct rcu_data *rdp = per_cpu_ptr(_data, cpu); > > struct rcu_node *rnp = rdp->mynode; /* Outgoing CPU's rdp & rnp. */ > > + struct kfree_rcu_cpu *krcp; > > > > if (!IS_ENABLED(CONFIG_HOTPLUG_CPU)) > > return 0; > > > > + /* Drain the kcrp of this CPU. IRQs should be disabled? */ > > + krcp = this_cpu_ptr() > > + schedule_delayed_work(>monitor_work, 0); > > + > > > > A cpu can be offlined and its krp will be stuck until a shrinker is > > involved. > > Maybe be never. > > Does the same apply to its kmalloc() per-CPU caches? If so, I have a > hard time getting too worried about it. ;-) >Looking at slab_offline_cpu() , that calls cancel_delayed_work_sync() >on the cache reaper who's job is to flush the per-cpu caches. So I >believe during CPU offlining, the per-cpu slab caches are flushed. > >thanks, > >- Joel When cpu going offline, the slub or slab only flush free objects in offline cpu cache, put these free objects in node list or return buddy system, for those who are still in use, they still stay offline cpu cache. If we want clean per-cpu "krcp" objects when cpu going offline. we should free "krcp" cache objects in "rcutree_offline_cpu", this func be called before other rcu cpu offline func. and then "rcutree_offline_cpu" will be called in "cpuhp/%u" per-cpu thread. diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 8ce77d9ac716..1812d4a1ac1b 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -3959,6 +3959,7 @@ int rcutree_offline_cpu(unsigned int cpu) unsigned long flags; struct rcu_data *rdp; struct rcu_node *rnp; + struct kfree_rcu_cpu *krcp; rdp = per_cpu_ptr(_data, cpu); rnp = rdp->mynode; @@ -3970,6 +3971,11 @@ int rcutree_offline_cpu(unsigned int cpu) // nohz_full CPUs need the tick for stop-machine to work quickly tick_dep_set(TICK_DEP_BIT_RCU); + + krcp = per_cpu_ptr(, cpu); + raw_spin_lock_irqsave(>lock, flags); + schedule_delayed_work(>monitor_work, 0); + raw_spin_unlock_irqrestore(>lock, flags); return 0; } thanks, Zqiang