Re: [PATCH v2 4/4] mm/slub: Fix sysfs shrink circular locking dependency

2020-05-16 Thread Qian Cai



> On Apr 27, 2020, at 7:56 PM, Waiman Long  wrote:
> 
> A lockdep splat is observed by echoing "1" to the shrink sysfs file
> and then shutting down the system:
> 
> [  167.473392] Chain exists of:
> [  167.473392]   kn->count#279 --> mem_hotplug_lock.rw_sem --> slab_mutex
> [  167.473392]
> [  167.484323]  Possible unsafe locking scenario:
> [  167.484323]
> [  167.490273]        CPU0                    CPU1
> [  167.494825]        ----                    ----
> [  167.499376]   lock(slab_mutex);
> [  167.502530]                                lock(mem_hotplug_lock.rw_sem);
> [  167.509356]                                lock(slab_mutex);
> [  167.515044]   lock(kn->count#279);
> [  167.518462]
> [  167.518462]  *** DEADLOCK ***
> 
> It is because of the get_online_cpus() and get_online_mems() calls in
> kmem_cache_shrink() invoked via the shrink sysfs file. To fix that, we
> have to use trylock to get the memory and cpu hotplug read locks. Since
> hotplug events are rare, it should be fine to refuse a kmem cache
> shrink operation when some hotplug events are in progress.
> 
> Signed-off-by: Waiman Long 

Feel free to use,

Reviewed-by: Qian Cai 
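
For illustration, the trylock approach described in the commit message can be
sketched in user space as below. This is only a toy model under stated
assumptions, not the patch itself: toy_rwsem, toy_hotplug_lock and
toy_cache_shrink() are hypothetical stand-ins for the kernel's hotplug read
locks and kmem_cache_shrink(), and C11 atomics replace the real rwsem code.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy reader/writer lock (hypothetical): -1 = writer holds it,
 * n >= 0 = n readers hold it. */
struct toy_rwsem {
	atomic_int state;
};

/* Try to take the lock for reading; never blocks. */
static bool toy_read_trylock(struct toy_rwsem *s)
{
	int old = atomic_load(&s->state);

	while (old >= 0) {
		/* On failure, 'old' is reloaded and the writer check repeats. */
		if (atomic_compare_exchange_weak(&s->state, &old, old + 1))
			return true;
	}
	return false;	/* a writer (a "hotplug event") is in progress */
}

static void toy_read_unlock(struct toy_rwsem *s)
{
	assert(atomic_load(&s->state) > 0);
	atomic_fetch_sub(&s->state, 1);
}

/* Writer side, standing in for a hotplug operation. */
static bool toy_write_trylock(struct toy_rwsem *s)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&s->state, &expected, -1);
}

static void toy_write_unlock(struct toy_rwsem *s)
{
	atomic_store(&s->state, 0);
}

static struct toy_rwsem toy_hotplug_lock;	/* stand-in for the hotplug lock */
static int toy_shrink_calls;			/* counts shrinks that actually ran */

/* Refuse the shrink with -EBUSY instead of waiting for hotplug to finish. */
static int toy_cache_shrink(void)
{
	if (!toy_read_trylock(&toy_hotplug_lock))
		return -EBUSY;

	toy_shrink_calls++;	/* the real shrink work would happen here */
	toy_read_unlock(&toy_hotplug_lock);
	return 0;
}
```

In this model, toy_cache_shrink() returns 0 while no writer holds
toy_hotplug_lock, and -EBUSY between toy_write_trylock() and
toy_write_unlock(). The caller never sleeps on the hotplug lock, which is the
property the commit message relies on.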

Re: [PATCH v2 4/4] mm/slub: Fix sysfs shrink circular locking dependency

2020-05-16 Thread Qian Cai



> On Apr 28, 2020, at 10:07 AM, Waiman Long  wrote:
> 
> Trylock is handled differently from lockdep's perspective as trylock can
> fail. When trylock succeeds, the critical section is executed. As long as
> it doesn't try to acquire another lock in the circular chain, the execution 
> will finish at some point and release the lock. On the other hand, if another 
> task has already held all those locks, the trylock will fail and held locks 
> should be released. Again, no deadlock will happen.

OK, I can see that a comment in validate_chain() specifically mentions this:

“Trylock needs to maintain the stack of held locks, but it does not add new 
dependencies, because trylock can be done in any order.”

So I agree that this trylock trick could really work, especially since I don’t
know of any better way to fix this.
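
To make the quoted comment concrete, here is a small model of the idea. It is
a deliberately simplified sketch, not lockdep's actual algorithm: a blocking
acquisition records an edge from every held lock, while a trylock is pushed
onto the held stack without recording edges. All names (dep_graph,
model_acquire() and so on) are made up for illustration, and the path that
takes slab_mutex under the hotplug lock is an assumed generic path.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum { KN_COUNT, MEM_HOTPLUG, SLAB_MUTEX, NR_LOCKS };

/* edge[a][b] is true when b was acquired (blocking) while a was held. */
struct dep_graph {
	bool edge[NR_LOCKS][NR_LOCKS];
};

struct held_stack {
	int lock[NR_LOCKS];
	int depth;
};

/* A trylock joins the held stack but adds no new dependency edges. */
static void model_acquire(struct dep_graph *g, struct held_stack *h,
			  int lock, bool trylock)
{
	assert(h->depth < NR_LOCKS);
	if (!trylock)
		for (int i = 0; i < h->depth; i++)
			g->edge[h->lock[i]][lock] = true;
	h->lock[h->depth++] = lock;
}

/* Depth-first search: can 'to' be reached from 'from' along edges? */
static bool model_reaches(const struct dep_graph *g, int from, int to,
			  bool *seen)
{
	for (int next = 0; next < NR_LOCKS; next++) {
		if (!g->edge[from][next])
			continue;
		if (next == to)
			return true;
		if (!seen[next]) {
			seen[next] = true;
			if (model_reaches(g, next, to, seen))
				return true;
		}
	}
	return false;
}

static bool model_has_cycle(const struct dep_graph *g)
{
	for (int l = 0; l < NR_LOCKS; l++) {
		bool seen[NR_LOCKS] = { false };

		if (model_reaches(g, l, l, seen))
			return true;
	}
	return false;
}

/* Replay the three paths from the splat and report whether a cycle forms. */
static bool model_reports_cycle(bool shrink_uses_trylock)
{
	struct dep_graph g;
	struct held_stack h;

	memset(&g, 0, sizeof(g));

	/* sysfs shrink: kn->count held, then the hotplug lock */
	memset(&h, 0, sizeof(h));
	model_acquire(&g, &h, KN_COUNT, false);
	model_acquire(&g, &h, MEM_HOTPLUG, shrink_uses_trylock);

	/* assumed other path: hotplug lock held, then slab_mutex */
	memset(&h, 0, sizeof(h));
	model_acquire(&g, &h, MEM_HOTPLUG, false);
	model_acquire(&g, &h, SLAB_MUTEX, false);

	/* cache shutdown: slab_mutex held, then kn->count */
	memset(&h, 0, sizeof(h));
	model_acquire(&g, &h, SLAB_MUTEX, false);
	model_acquire(&g, &h, KN_COUNT, false);

	return model_has_cycle(&g);
}
```

With a blocking acquisition in the shrink path, the model closes the 3-lock
loop from the splat. With a trylock there, the kn->count --> hotplug lock edge
is never recorded, so no cycle is found.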

Re: [PATCH v2 4/4] mm/slub: Fix sysfs shrink circular locking dependency

2020-04-28 Thread Qian Cai



> On Apr 28, 2020, at 10:06 AM, Waiman Long  wrote:
> 
> On 4/27/20 10:11 PM, Qian Cai wrote:
>> 
>>> On Apr 27, 2020, at 9:39 PM, Waiman Long  wrote:
>>> 
>>> The sequence that was prevented by this patch is "kn->count --> 
>>> mem_hotplug_lock.rwsem". This sequence isn't directly in the splat. Once 
>>> this link is broken, the 3-lock circular loop cannot be formed. Maybe I 
>>> should modify the commit log to make this point more clear.
>> I don’t know what you are talking about. Once trylock succeeds once, you will
>> have kn->count --> cpu/memory_hotplug_lock.
>> 
> Trylock is handled differently from lockdep's perspective as trylock can
> fail. When trylock succeeds, the critical section is executed. As long as
> it doesn't try to acquire another lock in the circular chain, the execution 
> will finish at some point and release the lock. On the other hand, if another 
> task has already held all those locks, the trylock will fail and held locks 
> should be released. Again, no deadlock will happen.

So once,

CPU0 (trylock succeed):
kn->count --> cpu/memory_hotplug_lock.

Did you mean that lockdep will not record this existing chain?

If it did, then later, are you still sure that CPU1 (via the memcg path below)
cannot trigger a splat just because lockdep can tell that those are
non-exclusive (cpu/memory_hotplug_lock) locks?

 cpu/memory_hotplug_lock -> kn->count

[  290.805818] -> #3 (kn->count#86){}-{0:0}:
[  290.811954]        __kernfs_remove+0x455/0x4c0
[  290.816428]        kernfs_remove+0x23/0x40
[  290.820554]        sysfs_remove_dir+0x74/0x80
[  290.824947]        kobject_del+0x57/0xa0
[  290.828905]        sysfs_slab_unlink+0x1c/0x20
[  290.833377]        shutdown_cache+0x15d/0x1c0
[  290.837964]        kmemcg_cache_shutdown_fn+0xe/0x20
[  290.842963]        kmemcg_workfn+0x35/0x50   <-- cpu/memory_hotplug_lock
[  290.847095]        process_one_work+0x57e/0xb90
[  290.851658]        worker_thread+0x63/0x5b0
[  290.855872]        kthread+0x1f7/0x220
[  290.859653]        ret_from_fork+0x27/0x50

Re: [PATCH v2 4/4] mm/slub: Fix sysfs shrink circular locking dependency

2020-04-28 Thread Waiman Long

On 4/27/20 10:11 PM, Qian Cai wrote:
>
>> On Apr 27, 2020, at 9:39 PM, Waiman Long wrote:
>>
>> The sequence that was prevented by this patch is "kn->count -->
>> mem_hotplug_lock.rwsem". This sequence isn't directly in the splat. Once
>> this link is broken, the 3-lock circular loop cannot be formed. Maybe I
>> should modify the commit log to make this point more clear.
> I don’t know what you are talking about. Once trylock succeeds once, you will
> have kn->count --> cpu/memory_hotplug_lock.
>

Trylock is handled differently from lockdep's perspective as trylock can
fail. When trylock succeeds, the critical section is executed. As long
as it doesn't try to acquire another lock in the circular chain, the 
execution will finish at some point and release the lock. On the other 
hand, if another task has already held all those locks, the trylock will 
fail and held locks should be released. Again, no deadlock will happen.


Regards,
Longman