On 2/23/21 4:05 AM, Peter Zijlstra wrote:
> On Mon, Feb 22, 2021 at 11:00:37PM -0500, Chris Hyser wrote:
>> On 1/22/21 8:17 PM, Joel Fernandes (Google) wrote:
>> While trying to test the new prctl() code I'm working on, I ran into a bug I chased back into this v10 code. Under a fair amount of stress, when the function __sched_core_update_cookie() is ultimately called from sched_core_fork(), the system deadlocks or otherwise non-visibly crashes. I've not had much success figuring out why/what. I'm running with LOCKDEP on and seeing no complaints. Duplicating it only requires setting a cookie on a task and forking a bunch of threads ... all of which then want to update their cookie.
>
> Can you share the code and reproducer?
Attached is a tarball with C source code and scripts. Just run ./setup_bug, which compiles the source and starts a bash with a core sched (cs) cookie. Then run ./show_bug, which dumps the cookie and then fires off some processes and threads. Note that the cs_clone command does not do any core sched prctls for this test (they are not needed, and it is currently coded for a different prctl interface); it just creates processes and threads. I see this hang almost instantly.
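
For readers without the attachment, the shape of the load (a set of processes, each creating a batch of threads) can be roughly sketched as below. This is a hypothetical illustration, not the contents of bug.tar.xz: the names spawn_tree(), spawn_threads() and idle_thread() are made up here, and no cookie is set in the code, since in the test the cookie is assumed to be inherited from the shell started by ./setup_bug.

```c
/* Hypothetical stress sketch; NOT the code shipped in bug.tar.xz. */
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Thread body: nothing to do, creating the thread is the point. */
static void *idle_thread(void *arg)
{
    (void)arg;
    return NULL;
}

/*
 * Create nthreads threads in the calling process and join them.
 * Returns 0 on success, -1 on allocation or creation failure.
 */
static int spawn_threads(int nthreads)
{
    pthread_t *tids;
    int created = 0, rc = 0;

    if (nthreads <= 0)
        return 0;
    tids = calloc((size_t)nthreads, sizeof(*tids));
    if (!tids)
        return -1;
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[i], NULL, idle_thread, NULL) != 0) {
            rc = -1;
            break;
        }
        created++;
    }
    for (int i = 0; i < created; i++)
        pthread_join(tids[i], NULL);
    free(tids);
    return rc;
}

/*
 * Fork nprocs children; each child creates nthreads threads and exits.
 * Returns the number of children that exited with status 0.
 */
int spawn_tree(int nprocs, int nthreads)
{
    int forked = 0, ok = 0;

    for (int i = 0; i < nprocs; i++) {
        pid_t pid = fork();

        if (pid < 0)
            break;              /* stop forking, reap what we have */
        if (pid == 0)
            _exit(spawn_threads(nthreads) == 0 ? 0 : 1);
        forked++;
    }
    for (int i = 0; i < forked; i++) {
        int status;

        if (wait(&status) > 0 && WIFEXITED(status) &&
            WEXITSTATUS(status) == 0)
            ok++;
    }
    return ok;
}
```

Calling something like spawn_tree(8, 64) from main() in a shell that already carries a cookie would approximate the described load; the counts are arbitrary and may need raising to reach "a fair amount of stress".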
Josh, I verified that this occurs on Joel's coresched tree both with and without the kprot patch, and that tree should correspond exactly to these patches.
-chrish
bug.tar.xz
Description: application/xz