Hi Tejun,

Thanks, I see I missed the RCU part.
I'll try forcing atomic mode with PERCPU_REF_INIT_ATOMIC.
So far, though, I haven't been able to reproduce the hang.

Thanks,
David


2018-03-14 8:43 GMT-07:00 Tejun Heo <t...@kernel.org>:
> Hello, David.
>
> On Tue, Mar 13, 2018 at 03:50:47PM -0700, David Chen wrote:
>> ====
>> CPU A                           CPU B
>> -----                           -----
>> percpu_ref_kill()               percpu_ref_tryget_live()
>> {
>>                                 if (__ref_is_percpu())
>>   set __PERCPU_REF_DEAD;
>>   __percpu_ref_switch_mode();
>>    ^ sum up current percpu_count
>>                                 this_cpu_inc(*percpu_count);
>>                                  ^ this increment got leaked
>>
>> ====
>>
>> So if CPU B later does percpu_ref_put(), ref->count will drop to -1,
>> which would cause the hung task issue above.
>>
>> Do you think this theory is correct, or am I missing something?
>> Please tell me what you think.
>
> Switching to atomic mode does something like the following.
>
> 1. Mark the refcnt so that __ref_is_percpu() is false.
>
> 2. Wait for an RCU grace period so that everyone, including any
>    percpu_ref_tryget_live() caller that has seen __ref_is_percpu()
>    return true, is done with its operation.
>
> 3. Now that it knows nobody is operating on the assumption that the
>    counter is in percpu mode, it adds up all the percpu counters.
>
> So, provided there aren't some silly bugs, what you described
> shouldn't happen.  Can you force the refcnt into atomic mode w/
> PERCPU_REF_INIT_ATOMIC and see whether the problem persists?
>
> Thanks.
>
> --
> tejun
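
The three steps quoted above can be illustrated with a small userspace
model. This is only a sketch under stated assumptions: the names
model_ref, model_reader, reader_check_mode(), reader_increment() and
run_scenario() are hypothetical, the RCU grace period is modeled simply
as "let the in-flight reader finish before summing", and the initial
count of 1 is a simplification. It is not the kernel's percpu-refcount
implementation.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 2

/* Hypothetical userspace model; this is NOT the kernel's struct percpu_ref. */
struct model_ref {
	bool percpu_mode;		/* stands in for __ref_is_percpu() */
	long percpu_count[NR_CPUS];
	long atomic_count;
};

/* A tryget caller that has checked the mode but not yet incremented. */
struct model_reader {
	int cpu;
	bool saw_percpu;
	bool pending;
};

/* First half of a tryget: sample the mode, as CPU B does in the diagram. */
static void reader_check_mode(const struct model_ref *ref,
			      struct model_reader *r, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	r->cpu = cpu;
	r->saw_percpu = ref->percpu_mode;
	r->pending = true;
}

/* Second half of a tryget: increment according to the mode seen earlier. */
static void reader_increment(struct model_ref *ref, struct model_reader *r)
{
	if (!r->pending)
		return;
	if (r->saw_percpu)
		ref->percpu_count[r->cpu]++;
	else
		ref->atomic_count++;
	r->pending = false;
}

/*
 * Returns the atomic count after the switch. wait_for_readers == true
 * models step 2 (the grace period); false models the ordering from the
 * diagram, where the sum happens before CPU B's increment.
 */
static long run_scenario(bool wait_for_readers)
{
	struct model_ref ref = { .percpu_mode = true, .atomic_count = 1 };
	struct model_reader b;
	int cpu;

	reader_check_mode(&ref, &b, 1);		/* CPU B sees percpu mode */

	ref.percpu_mode = false;		/* step 1: mark as not percpu */
	if (wait_for_readers)
		reader_increment(&ref, &b);	/* step 2: in-flight reader finishes */
	for (cpu = 0; cpu < NR_CPUS; cpu++) {	/* step 3: add up percpu counters */
		ref.atomic_count += ref.percpu_count[cpu];
		ref.percpu_count[cpu] = 0;
	}
	reader_increment(&ref, &b);		/* no-op if the reader already finished */
	return ref.atomic_count;
}
```

In this model, run_scenario(true) returns 2 (the initial reference plus
CPU B's), while run_scenario(false) returns 1 because CPU B's increment
lands in the percpu counter after the sum and is never counted.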
