Re: [PATCH v2] blk-throttle: fix race between blkcg_bio_issue_check and cgroup_rmdir
On 18/3/5 04:23, Tejun Heo wrote:
> Hello, Joseph.
>
> Sorry about late reply.
>
> On Wed, Feb 28, 2018 at 02:52:10PM +0800, Joseph Qi wrote:
>> In current code, I'm afraid pd_offline_fn() as well as the rest of the
>> destruction have to be called together under the same blkcg->lock and
>> q->queue_lock.
>> For example, if we split pd_offline_fn() and radix_tree_delete()
>> into 2 phases, it may introduce a race between blkcg_deactivate_policy()
>> on queue exit and blkcg_css_free(), which will result in
>> pd_offline_fn() being called twice.
>
> So, yeah, the sync scheme around blkg is pretty brittle and we'd need
> some restructuring to separate out blkg offlining and release, but it
> looks like that'd be the right thing to do, no?
>
Agree. Apart from the restriction above, I don't see any other issue for now.
I'll try to fix it the way you suggested and post v3.

Thanks,
Joseph
Re: [PATCH v2] blk-throttle: fix race between blkcg_bio_issue_check and cgroup_rmdir
Hello, Joseph.

Sorry about late reply.

On Wed, Feb 28, 2018 at 02:52:10PM +0800, Joseph Qi wrote:
> In current code, I'm afraid pd_offline_fn() as well as the rest of the
> destruction have to be called together under the same blkcg->lock and
> q->queue_lock.
> For example, if we split pd_offline_fn() and radix_tree_delete()
> into 2 phases, it may introduce a race between blkcg_deactivate_policy()
> on queue exit and blkcg_css_free(), which will result in
> pd_offline_fn() being called twice.

So, yeah, the sync scheme around blkg is pretty brittle and we'd need
some restructuring to separate out blkg offlining and release, but it
looks like that'd be the right thing to do, no?

Thanks.

--
tejun
Re: [PATCH v2] blk-throttle: fix race between blkcg_bio_issue_check and cgroup_rmdir
Hi Tejun,

On 18/2/28 02:33, Tejun Heo wrote:
> Hello, Joseph.
>
> On Sat, Feb 24, 2018 at 09:45:49AM +0800, Joseph Qi wrote:
>>> IIRC, as long as the blkcg and the device are there, the blkgs aren't
>>> gonna be destroyed. So, if you have a ref to the blkcg through
>>> tryget, the blkg shouldn't go away.
>>>
>>
>> Maybe we have a misunderstanding here.
>>
>> In this case, blkg doesn't go away as we have rcu protection, but
>> blkg_destroy() can be called, in which blkg_put() will put the last
>> refcnt and then schedule __blkg_release_rcu().
>>
>> css refcnt can't prevent blkcg css from offlining; instead it is css
>> online_cnt that does.
>>
>> css_tryget() will only get a refcnt of blkcg css, but can't be
>> guaranteed to fail when css is confirmed to kill.
>
> Ah, you're right. I was thinking we only destroy blkgs from blkcg
> release path. Given that we primarily use blkcg refcnting to pin
> them, I believe that's what we should do - ie. only call
> pd_offline_fn() from blkcg_css_offline() path and do the rest of
> destruction from blkcg_css_free(). What do you think?
>
In current code, I'm afraid pd_offline_fn() as well as the rest of the
destruction have to be called together under the same blkcg->lock and
q->queue_lock.
For example, if we split pd_offline_fn() and radix_tree_delete()
into 2 phases, it may introduce a race between blkcg_deactivate_policy()
on queue exit and blkcg_css_free(), which will result in
pd_offline_fn() being called twice.

Thanks,
Joseph
Re: [PATCH v2] blk-throttle: fix race between blkcg_bio_issue_check and cgroup_rmdir
Hello, Joseph.

On Sat, Feb 24, 2018 at 09:45:49AM +0800, Joseph Qi wrote:
>> IIRC, as long as the blkcg and the device are there, the blkgs aren't
>> gonna be destroyed. So, if you have a ref to the blkcg through
>> tryget, the blkg shouldn't go away.
>>
>
> Maybe we have misunderstanding here.
>
> In this case, blkg doesn't go away as we have rcu protect, but
> blkg_destroy() can be called, in which blkg_put() will put the last
> refcnt and then schedule __blkg_release_rcu().
>
> css refcnt can't prevent blkcg css from offlining, instead it is css
> online_cnt.
>
> css_tryget() will only get a refcnt of blkcg css, but can't be
> guaranteed to fail when css is confirmed to kill.

Ah, you're right. I was thinking we only destroy blkgs from blkcg
release path. Given that we primarily use blkcg refcnting to pin
them, I believe that's what we should do - ie. only call
pd_offline_fn() from blkcg_css_offline() path and do the rest of
destruction from blkcg_css_free(). What do you think?

Thanks.

--
tejun
Re: [PATCH v2] blk-throttle: fix race between blkcg_bio_issue_check and cgroup_rmdir
Hi Tejun,

Could you please help take a look at this and give a confirmation?

Thanks,
Joseph

On 18/2/24 09:45, Joseph Qi wrote:
> Hi Tejun,
>
> On 18/2/23 22:23, Tejun Heo wrote:
>> Hello,
>>
>> On Fri, Feb 23, 2018 at 09:56:54AM +0800, xuejiufei wrote:
>>>> On Thu, Feb 22, 2018 at 02:14:34PM +0800, Joseph Qi wrote:
>>>>> I still don't get how css_tryget can work here.
>>>>>
>>>>> The race happens when:
>>>>> 1) writeback kworker has found the blkg with rcu;
>>>>> 2) blkcg is during offlining and blkg_destroy() has already been called.
>>>>> Then, writeback kworker will take queue lock and access the blkg with
>>>>> refcount 0.
>>>>
>>>> Yeah, then tryget would fail and it should go through the root.
>>>>
>>> In this race, the refcount of blkg becomes zero and is destroyed.
>>> However css may still have refcount, and css_tryget can return success
>>> before other callers put the refcount.
>>> So I don't get how css_tryget can fix this race? Or I wonder if we can
>>> add another function blkg_tryget?
>>
>> IIRC, as long as the blkcg and the device are there, the blkgs aren't
>> gonna be destroyed. So, if you have a ref to the blkcg through
>> tryget, the blkg shouldn't go away.
>>
>
> Maybe we have misunderstanding here.
>
> In this case, blkg doesn't go away as we have rcu protect, but
> blkg_destroy() can be called, in which blkg_put() will put the last
> refcnt and then schedule __blkg_release_rcu().
>
> css refcnt can't prevent blkcg css from offlining, instead it is css
> online_cnt.
>
> css_tryget() will only get a refcnt of blkcg css, but can't be
> guaranteed to fail when css is confirmed to kill.
>
> The race sequence:
> writeback kworker               cgroup_rmdir
>                                   cgroup_destroy_locked
>                                     kill_css
>                                       css_killed_ref_fn
>                                         css_killed_work_fn
>                                           offline_css
>                                             blkcg_css_offline
> blkcg_bio_issue_check
>   rcu_read_lock
>   blkg_lookup
>                                             spin_trylock(q->queue_lock)
>                                             blkg_destroy
>                                             spin_unlock(q->queue_lock)
>   blk_throtl_bio
>     spin_lock_irq(q->queue_lock)
>     spin_unlock_irq(q->queue_lock)
>   rcu_read_unlock
>
> Thanks,
> Joseph
Re: [PATCH v2] blk-throttle: fix race between blkcg_bio_issue_check and cgroup_rmdir
Hi Tejun,

On 18/2/23 22:23, Tejun Heo wrote:
> Hello,
>
> On Fri, Feb 23, 2018 at 09:56:54AM +0800, xuejiufei wrote:
>>> On Thu, Feb 22, 2018 at 02:14:34PM +0800, Joseph Qi wrote:
>>>> I still don't get how css_tryget can work here.
>>>>
>>>> The race happens when:
>>>> 1) writeback kworker has found the blkg with rcu;
>>>> 2) blkcg is during offlining and blkg_destroy() has already been called.
>>>> Then, writeback kworker will take queue lock and access the blkg with
>>>> refcount 0.
>>>
>>> Yeah, then tryget would fail and it should go through the root.
>>>
>> In this race, the refcount of blkg becomes zero and is destroyed.
>> However css may still have refcount, and css_tryget can return success
>> before other callers put the refcount.
>> So I don't get how css_tryget can fix this race? Or I wonder if we can
>> add another function blkg_tryget?
>
> IIRC, as long as the blkcg and the device are there, the blkgs aren't
> gonna be destroyed. So, if you have a ref to the blkcg through
> tryget, the blkg shouldn't go away.
>
Maybe we have misunderstanding here.

In this case, blkg doesn't go away as we have rcu protect, but
blkg_destroy() can be called, in which blkg_put() will put the last
refcnt and then schedule __blkg_release_rcu().

css refcnt can't prevent blkcg css from offlining, instead it is css
online_cnt.

css_tryget() will only get a refcnt of blkcg css, but can't be
guaranteed to fail when css is confirmed to kill.

The race sequence:
writeback kworker               cgroup_rmdir
                                  cgroup_destroy_locked
                                    kill_css
                                      css_killed_ref_fn
                                        css_killed_work_fn
                                          offline_css
                                            blkcg_css_offline
blkcg_bio_issue_check
  rcu_read_lock
  blkg_lookup
                                            spin_trylock(q->queue_lock)
                                            blkg_destroy
                                            spin_unlock(q->queue_lock)
blk_throtl_bio
  spin_lock_irq(q->queue_lock)
  spin_unlock_irq(q->queue_lock)
rcu_read_unlock

Thanks,
Joseph
Re: [PATCH v2] blk-throttle: fix race between blkcg_bio_issue_check and cgroup_rmdir
Hello,

On Fri, Feb 23, 2018 at 09:56:54AM +0800, xuejiufei wrote:
>> On Thu, Feb 22, 2018 at 02:14:34PM +0800, Joseph Qi wrote:
>>> I still don't get how css_tryget can work here.
>>>
>>> The race happens when:
>>> 1) writeback kworker has found the blkg with rcu;
>>> 2) blkcg is during offlining and blkg_destroy() has already been called.
>>> Then, writeback kworker will take queue lock and access the blkg with
>>> refcount 0.
>>
>> Yeah, then tryget would fail and it should go through the root.
>>
> In this race, the refcount of blkg becomes zero and is destroyed.
> However css may still have refcount, and css_tryget can return success
> before other callers put the refcount.
> So I don't get how css_tryget can fix this race? Or I wonder if we can
> add another function blkg_tryget?

IIRC, as long as the blkcg and the device are there, the blkgs aren't
gonna be destroyed. So, if you have a ref to the blkcg through
tryget, the blkg shouldn't go away.

Thanks.

--
tejun
Re: [PATCH v2] blk-throttle: fix race between blkcg_bio_issue_check and cgroup_rmdir
Hi Tejun,

On 2018/2/22 11:18 PM, Tejun Heo wrote:
> Hello,
>
> On Thu, Feb 22, 2018 at 02:14:34PM +0800, Joseph Qi wrote:
>> I still don't get how css_tryget can work here.
>>
>> The race happens when:
>> 1) writeback kworker has found the blkg with rcu;
>> 2) blkcg is during offlining and blkg_destroy() has already been called.
>> Then, writeback kworker will take queue lock and access the blkg with
>> refcount 0.
>
> Yeah, then tryget would fail and it should go through the root.
>
In this race, the refcount of blkg becomes zero and is destroyed.
However css may still have refcount, and css_tryget can return success
before other callers put the refcount.
So I don't get how css_tryget can fix this race? Or I wonder if we can
add another function blkg_tryget?

Thanks,
Jiufei

> Thanks.
>
Re: [PATCH v2] blk-throttle: fix race between blkcg_bio_issue_check and cgroup_rmdir
Hi Tejun,

Sorry for the delayed reply.

On 18/2/13 01:11, Tejun Heo wrote:
> Hello, Joseph.
>
> On Fri, Feb 09, 2018 at 10:15:19AM +0800, Joseph Qi wrote:
>> IIUC, we have to identify it is in blkcg_css_offline now which will
>> blkg_put. Since percpu_ref_kill_and_confirm in kill_css will set the flag
>> __PERCPU_REF_DEAD, we can use this to avoid the race. IOW, if
>> __PERCPU_REF_DEAD is set now, we know blkcg css is offlining and
>> continuing to access blkg may risk double free. Thus we choose to skip
>> these ios.
>> I don't get how css_tryget works since it doesn't care about the flag
>> __PERCPU_REF_DEAD. Also css_tryget can't prevent blkcg_css from
>> offlining since the race happens while blkcg_css_offline is in progress.
>> Am I missing something here?
>
> Once marked dead, the ref is in atomic mode and css_tryget() would hit
> the atomic counter. Here, we don't care about the offlining and
> draining. A draining memcg can still have a lot of memory to be
> written back attached to it and we don't want to punt all of them to the
> root cgroup.

I still don't get how css_tryget can work here.

The race happens when:
1) writeback kworker has found the blkg with rcu;
2) blkcg is during offlining and blkg_destroy() has already been called.
Then, writeback kworker will take queue lock and access the blkg with
refcount 0.

So, I think we should either take the queue lock when looking up the blkg
if we want to throttle the ios during offline (the way this patch tries),
or use css_tryget_online() to skip further use of the risky blkg, which
means these ios won't be throttled either.

Thanks,
Joseph
Re: [PATCH v2] blk-throttle: fix race between blkcg_bio_issue_check and cgroup_rmdir
Hello, Joseph.

On Fri, Feb 09, 2018 at 10:15:19AM +0800, Joseph Qi wrote:
> IIUC, we have to identify it is in blkcg_css_offline now which will
> blkg_put. Since percpu_ref_kill_and_confirm in kill_css will set the flag
> __PERCPU_REF_DEAD, we can use this to avoid the race. IOW, if
> __PERCPU_REF_DEAD is set now, we know blkcg css is offlining and
> continuing to access blkg may risk double free. Thus we choose to skip
> these ios.
> I don't get how css_tryget works since it doesn't care about the flag
> __PERCPU_REF_DEAD. Also css_tryget can't prevent blkcg_css from
> offlining since the race happens while blkcg_css_offline is in progress.
> Am I missing something here?

Once marked dead, the ref is in atomic mode and css_tryget() would hit
the atomic counter. Here, we don't care about the offlining and
draining. A draining memcg can still have a lot of memory to be
written back attached to it and we don't want to punt all of them to the
root cgroup.

Thanks.

--
tejun
Re: [PATCH v2] blk-throttle: fix race between blkcg_bio_issue_check and cgroup_rmdir
Hi Tejun,

On 18/2/8 23:23, Tejun Heo wrote:
> Hello, Joseph.
>
> On Thu, Feb 08, 2018 at 10:29:43AM +0800, Joseph Qi wrote:
>> So you mean checking css->refcnt to prevent the further use of
>> blkg_get? I think it makes sense.
>
> Yes.
>
>> IMO, we should use css_tryget_online instead, and right after taking
>
> Not really. An offline css still can have a vast amount of IO to
> drain and write out.
>
IIUC, we have to identify it is in blkcg_css_offline now which will
blkg_put. Since percpu_ref_kill_and_confirm in kill_css will set the flag
__PERCPU_REF_DEAD, we can use this to avoid the race. IOW, if
__PERCPU_REF_DEAD is set now, we know blkcg css is offlining and
continuing to access blkg may risk double free. Thus we choose to skip
these ios.

I don't get how css_tryget works since it doesn't care about the flag
__PERCPU_REF_DEAD. Also css_tryget can't prevent blkcg_css from
offlining since the race happens while blkcg_css_offline is in progress.
Am I missing something here?

Thanks,
Joseph

>> queue_lock. Because there may be more uses of blkg_get in blk_throtl_bio
>> in the future. Actually it already has two now. One is in
>> blk_throtl_assoc_bio, and the other is in throtl_qnode_add_bio.
>> What do you think of this?
>
> As long as we avoid making the bypass paths expensive, whatever makes
> the code easier to read and maintain would be better. css ref ops are
> extremely cheap anyway.
>
> Thanks.
>
Re: [PATCH v2] blk-throttle: fix race between blkcg_bio_issue_check and cgroup_rmdir
Hello, Joseph.

On Thu, Feb 08, 2018 at 10:29:43AM +0800, Joseph Qi wrote:
> So you mean checking css->refcnt to prevent the further use of
> blkg_get? I think it makes sense.

Yes.

> IMO, we should use css_tryget_online instead, and right after taking

Not really. An offline css still can have a vast amount of IO to
drain and write out.

> queue_lock. Because there may be more uses of blkg_get in blk_throtl_bio
> in the future. Actually it already has two now. One is in
> blk_throtl_assoc_bio, and the other is in throtl_qnode_add_bio.
> What do you think of this?

As long as we avoid making the bypass paths expensive, whatever makes
the code easier to read and maintain would be better. css ref ops are
extremely cheap anyway.

Thanks.

--
tejun
Re: [PATCH v2] blk-throttle: fix race between blkcg_bio_issue_check and cgroup_rmdir
Hi Tejun,

Thanks very much for reviewing this patch.

On 18/2/8 05:38, Tejun Heo wrote:
> Hello, Joseph.
>
> On Wed, Feb 07, 2018 at 04:40:02PM +0800, Joseph Qi wrote:
>> writeback kworker
>> blkcg_bio_issue_check
>>   rcu_read_lock
>>   blkg_lookup
>>   <<< *race window*
>>   blk_throtl_bio
>>     spin_lock_irq(q->queue_lock)
>>     spin_unlock_irq(q->queue_lock)
>>   rcu_read_unlock
>>
>> cgroup_rmdir
>>   cgroup_destroy_locked
>>     kill_css
>>       css_killed_ref_fn
>>         css_killed_work_fn
>>           offline_css
>>             blkcg_css_offline
>>               spin_trylock(q->queue_lock)
>>               blkg_destroy
>>               spin_unlock(q->queue_lock)
>
> Ah, right. Thanks for spotting the bug.
>
>> Since rcu can only prevent blkg from being released while it is in use,
>> the blkg->refcnt can be decreased to 0 during blkg_destroy, which
>> schedules the blkg release.
>> Then trying to blkg_get in blk_throtl_bio will complain with a WARNING.
>> And then the corresponding blkg_put will schedule the blkg release again,
>> which results in a double free.
>> This race is introduced by commit ae1188963611 ("blkcg: consolidate blkg
>> creation in blkcg_bio_issue_check()"). Before this commit, it would look
>> up first and then try to lookup/create again with queue_lock held. So
>> revive this logic to fix the race.
>
> The change seems a bit drastic to me. Can't we do something like the
> following instead?
>
> blk_throtl_bio()
> {
>         ... non throttled cases ...
>
>         /* out-of-limit, queue to @tg */
>
>         /*
>          * We can look up and retry but the race window is tiny here.
>          * Just letting it through should be good enough.
>          */
>         if (!css_tryget(blkcg->css))
>                 goto out;
>
>         ... actual queueing ...
>         css_put(blkcg->css);
>         ...
> }

So you mean checking css->refcnt to prevent the further use of
blkg_get? I think it makes sense.

IMO, we should use css_tryget_online instead, and right after taking
queue_lock. Because there may be more uses of blkg_get in blk_throtl_bio
in the future. Actually it already has two now. One is in
blk_throtl_assoc_bio, and the other is in throtl_qnode_add_bio.
What do you think of this?

Thanks,
Joseph
Re: [PATCH v2] blk-throttle: fix race between blkcg_bio_issue_check and cgroup_rmdir
Hello, Joseph.

On Wed, Feb 07, 2018 at 04:40:02PM +0800, Joseph Qi wrote:
> writeback kworker
> blkcg_bio_issue_check
>   rcu_read_lock
>   blkg_lookup
>   <<< *race window*
>   blk_throtl_bio
>     spin_lock_irq(q->queue_lock)
>     spin_unlock_irq(q->queue_lock)
>   rcu_read_unlock
>
> cgroup_rmdir
>   cgroup_destroy_locked
>     kill_css
>       css_killed_ref_fn
>         css_killed_work_fn
>           offline_css
>             blkcg_css_offline
>               spin_trylock(q->queue_lock)
>               blkg_destroy
>               spin_unlock(q->queue_lock)

Ah, right. Thanks for spotting the bug.

> Since rcu can only prevent blkg from being released while it is in use,
> the blkg->refcnt can be decreased to 0 during blkg_destroy, which
> schedules the blkg release.
> Then trying to blkg_get in blk_throtl_bio will complain with a WARNING.
> And then the corresponding blkg_put will schedule the blkg release again,
> which results in a double free.
> This race is introduced by commit ae1188963611 ("blkcg: consolidate blkg
> creation in blkcg_bio_issue_check()"). Before this commit, it would look
> up first and then try to lookup/create again with queue_lock held. So
> revive this logic to fix the race.

The change seems a bit drastic to me. Can't we do something like the
following instead?

blk_throtl_bio()
{
        ... non throttled cases ...

        /* out-of-limit, queue to @tg */

        /*
         * We can look up and retry but the race window is tiny here.
         * Just letting it through should be good enough.
         */
        if (!css_tryget(blkcg->css))
                goto out;

        ... actual queueing ...
        css_put(blkcg->css);
        ...
}

Thanks.

--
tejun