Re: rcu_barrier() + membarrier() + scheduler deadlock?

2020-04-30 Thread Paul E. McKenney
On Thu, Apr 30, 2020 at 09:28:32AM -0400, Qian Cai wrote:
> > On Apr 30, 2020, at 9:15 AM, Paul E. McKenney wrote:
> >> https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config
> >> 
> >> [53294.651754][T149877] futex_wake_op: trinity-c25 tries to shift op by 
> >> -17; fix this program
> >> [53323.947396][T150988] futex_wake_op: trinity-c6 tries to shift op by -5; 
> >> fix this program
> >> [53458.295837][  T215] INFO: task kworker/u64:0:8 blocked for more than 
> >> 122 seconds.
> >> [53458.304063][  T215]   Tainted: G   O L
> >> 5.7.0-rc3-next-20200429 #1
> >> [53458.311568][  T215] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
> >> disables this message.
> >> [53458.320190][  T215] kworker/u64:0   D 10584 8  2 0x90004000
> >> [53458.326668][  T215] Workqueue: netns cleanup_net
> >> [53458.331330][  T215] Call Trace:
> >> [53458.334510][  T215]  __schedule+0x47b/0xa50
> >> [53458.338765][  T215]  ? wait_for_completion+0x80/0x120
> >> [53458.343920][  T215]  schedule+0x59/0xd0
> >> [53458.348013][  T215]  schedule_timeout+0x10a/0x150
> >> [53458.352762][  T215]  ? wait_for_completion+0x80/0x120
> >> [53458.357881][  T215]  ? _raw_spin_unlock_irq+0x30/0x40
> >> [53458.362997][  T215]  ? trace_hardirqs_on+0x22/0x100
> >> [53458.367948][  T215]  ? wait_for_completion+0x80/0x120
> >> [53458.373195][  T215]  wait_for_completion+0xb4/0x120
> >> [53458.378149][  T215]  __flush_work+0x3ff/0x6e0
> >> [53458.382586][  T215]  ? init_pwq+0x210/0x210
> >> [53458.386840][  T215]  flush_work+0x20/0x30
> >> [53458.390891][  T215]  rollback_registered_many+0x3d6/0x950
> >> [53458.396438][  T215]  ? mark_held_locks+0x4e/0x80
> >> [53458.401339][  T215]  unregister_netdevice_many+0x5d/0x200
> >> [53458.406816][  T215]  default_device_exit_batch+0x213/0x240
> >> [53458.412348][  T215]  ? do_wait_intr_irq+0xe0/0xe0
> >> [53458.417225][  T215]  ? dev_change_net_namespace+0x6d0/0x6d0
> >> [53458.423000][  T215]  ops_exit_list+0xa2/0xc0
> >> [53458.427367][  T215]  cleanup_net+0x3d0/0x600
> >> [53458.431778][  T215]  process_one_work+0x560/0xba0
> >> [53458.436629][  T215]  worker_thread+0x80/0x5f0
> >> [53458.441078][  T215]  ? process_scheduled_works+0x90/0x90
> >> [53458.446485][  T215]  kthread+0x1de/0x200
> >> [53458.450600][  T215]  ? kthread_unpark+0xd0/0xd0
> >> [53458.455227][  T215]  ret_from_fork+0x27/0x50
> >> [53458.460332][  T215] INFO: task trinity-c17:150651 blocked for more than 
> >> 123 seconds.
> >> [53458.468180][  T215]   Tainted: G   O L
> >> 5.7.0-rc3-next-20200429 #1
> >> [53458.475924][  T215] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
> >> disables this message.
> >> [53458.484511][  T215] trinity-c17 D 12312 150651  93301 0x1004
> >> [53458.490862][  T215] Call Trace:
> >> [53458.494039][  T215]  __schedule+0x47b/0xa50
> >> [53458.498452][  T215]  schedule+0x59/0xd0
> >> [53458.502376][  T215]  schedule_preempt_disabled+0x15/0x20
> >> [53458.507809][  T215]  __mutex_lock+0x6f2/0xbf0
> >> [53458.512296][  T215]  ? rtnl_lock+0x20/0x30
> >> [53458.516484][  T215]  mutex_lock_nested+0x31/0x40
> >> [53458.521157][  T215]  ? mutex_lock_nested+0x31/0x40
> >> [53458.526195][  T215]  rtnl_lock+0x20/0x30
> >> [53458.530251][  T215]  do_ip_setsockopt.isra.12+0xec/0x1b90
> >> [53458.535875][  T215]  ? find_held_lock+0x35/0xa0
> >> [53458.540603][  T215]  ? rb_insert_color+0x10f/0x390
> >> [53458.545436][  T215]  ? lock_acquire+0xcd/0x450
> >> [53458.550126][  T215]  ? find_held_lock+0x35/0xa0
> >> [53458.554717][  T215]  ? __cgroup_bpf_prog_array_is_empty+0x121/0x230
> >> [53458.561127][  T215]  ip_setsockopt+0x3e/0x90
> >> [53458.565511][  T215]  udp_setsockopt+0x49/0x80
> >> [53458.570059][  T215]  sock_common_setsockopt+0x6d/0x90
> >> [53458.575303][  T215]  __sys_setsockopt+0x194/0x2e0
> >> [53458.580128][  T215]  __x64_sys_setsockopt+0x70/0x90
> >> [53458.585066][  T215]  do_syscall_64+0x91/0xb10
> >> [53458.589504][  T215]  ? perf_call_bpf_enter+0x120/0x120
> >> [53458.594741][  T215]  ? trace_hardirqs_off_thunk+0x1a/0x1c
> >> [53458.600467][  T215]  entry_SYSCALL_64_after_hwframe+0x49/0xb3
> >> [53458.606296][  T215] RIP: 0033:0x7f2cbe1b270d
> >> [53458.610611][  T215] Code: Bad RIP value.
> >> [53458.614570][  T215] RSP: 002b:7ffe6a4b18d8 EFLAGS: 0246 
> >> ORIG_RAX: 0036
> >> [53458.623094][  T215] RAX: ffda RBX: 0036 RCX: 
> >> 7f2cbe1b270d
> >> [53458.631024][  T215] RDX: 002a RSI:  RDI: 
> >> 0060
> >> [53458.638954][  T215] RBP: 0036 R08: 0088 R09: 
> >> 00f0
> >> [53458.647044][  T215] R10: 025557f0 R11: 0246 R12: 
> >> 0002
> >> [53458.654960][  T215] R13: 7f2cbcaeb058 R14: 7f2cbe0716c0 R15: 
> >> 7f2cbcaeb000
> >> [53458.662899][  T215] INFO: task trinity-c10:150896 blocked for more than 
> >> 123 seconds.
> >> [53458.670757][  T215]   Tainted: G 

Re: rcu_barrier() + membarrier() + scheduler deadlock?

2020-04-30 Thread Qian Cai



> On Apr 30, 2020, at 9:15 AM, Paul E. McKenney wrote:
>> 
>> https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config
>> 
>> [53294.651754][T149877] futex_wake_op: trinity-c25 tries to shift op by -17; 
>> fix this program
>> [53323.947396][T150988] futex_wake_op: trinity-c6 tries to shift op by -5; 
>> fix this program
>> [53458.295837][  T215] INFO: task kworker/u64:0:8 blocked for more than 122 
>> seconds.
>> [53458.304063][  T215]   Tainted: G   O L
>> 5.7.0-rc3-next-20200429 #1
>> [53458.311568][  T215] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
>> disables this message.
>> [53458.320190][  T215] kworker/u64:0   D 10584 8  2 0x90004000
>> [53458.326668][  T215] Workqueue: netns cleanup_net
>> [53458.331330][  T215] Call Trace:
>> [53458.334510][  T215]  __schedule+0x47b/0xa50
>> [53458.338765][  T215]  ? wait_for_completion+0x80/0x120
>> [53458.343920][  T215]  schedule+0x59/0xd0
>> [53458.348013][  T215]  schedule_timeout+0x10a/0x150
>> [53458.352762][  T215]  ? wait_for_completion+0x80/0x120
>> [53458.357881][  T215]  ? _raw_spin_unlock_irq+0x30/0x40
>> [53458.362997][  T215]  ? trace_hardirqs_on+0x22/0x100
>> [53458.367948][  T215]  ? wait_for_completion+0x80/0x120
>> [53458.373195][  T215]  wait_for_completion+0xb4/0x120
>> [53458.378149][  T215]  __flush_work+0x3ff/0x6e0
>> [53458.382586][  T215]  ? init_pwq+0x210/0x210
>> [53458.386840][  T215]  flush_work+0x20/0x30
>> [53458.390891][  T215]  rollback_registered_many+0x3d6/0x950
>> [53458.396438][  T215]  ? mark_held_locks+0x4e/0x80
>> [53458.401339][  T215]  unregister_netdevice_many+0x5d/0x200
>> [53458.406816][  T215]  default_device_exit_batch+0x213/0x240
>> [53458.412348][  T215]  ? do_wait_intr_irq+0xe0/0xe0
>> [53458.417225][  T215]  ? dev_change_net_namespace+0x6d0/0x6d0
>> [53458.423000][  T215]  ops_exit_list+0xa2/0xc0
>> [53458.427367][  T215]  cleanup_net+0x3d0/0x600
>> [53458.431778][  T215]  process_one_work+0x560/0xba0
>> [53458.436629][  T215]  worker_thread+0x80/0x5f0
>> [53458.441078][  T215]  ? process_scheduled_works+0x90/0x90
>> [53458.446485][  T215]  kthread+0x1de/0x200
>> [53458.450600][  T215]  ? kthread_unpark+0xd0/0xd0
>> [53458.455227][  T215]  ret_from_fork+0x27/0x50
>> [53458.460332][  T215] INFO: task trinity-c17:150651 blocked for more than 
>> 123 seconds.
>> [53458.468180][  T215]   Tainted: G   O L
>> 5.7.0-rc3-next-20200429 #1
>> [53458.475924][  T215] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
>> disables this message.
>> [53458.484511][  T215] trinity-c17 D 12312 150651  93301 0x1004
>> [53458.490862][  T215] Call Trace:
>> [53458.494039][  T215]  __schedule+0x47b/0xa50
>> [53458.498452][  T215]  schedule+0x59/0xd0
>> [53458.502376][  T215]  schedule_preempt_disabled+0x15/0x20
>> [53458.507809][  T215]  __mutex_lock+0x6f2/0xbf0
>> [53458.512296][  T215]  ? rtnl_lock+0x20/0x30
>> [53458.516484][  T215]  mutex_lock_nested+0x31/0x40
>> [53458.521157][  T215]  ? mutex_lock_nested+0x31/0x40
>> [53458.526195][  T215]  rtnl_lock+0x20/0x30
>> [53458.530251][  T215]  do_ip_setsockopt.isra.12+0xec/0x1b90
>> [53458.535875][  T215]  ? find_held_lock+0x35/0xa0
>> [53458.540603][  T215]  ? rb_insert_color+0x10f/0x390
>> [53458.545436][  T215]  ? lock_acquire+0xcd/0x450
>> [53458.550126][  T215]  ? find_held_lock+0x35/0xa0
>> [53458.554717][  T215]  ? __cgroup_bpf_prog_array_is_empty+0x121/0x230
>> [53458.561127][  T215]  ip_setsockopt+0x3e/0x90
>> [53458.565511][  T215]  udp_setsockopt+0x49/0x80
>> [53458.570059][  T215]  sock_common_setsockopt+0x6d/0x90
>> [53458.575303][  T215]  __sys_setsockopt+0x194/0x2e0
>> [53458.580128][  T215]  __x64_sys_setsockopt+0x70/0x90
>> [53458.585066][  T215]  do_syscall_64+0x91/0xb10
>> [53458.589504][  T215]  ? perf_call_bpf_enter+0x120/0x120
>> [53458.594741][  T215]  ? trace_hardirqs_off_thunk+0x1a/0x1c
>> [53458.600467][  T215]  entry_SYSCALL_64_after_hwframe+0x49/0xb3
>> [53458.606296][  T215] RIP: 0033:0x7f2cbe1b270d
>> [53458.610611][  T215] Code: Bad RIP value.
>> [53458.614570][  T215] RSP: 002b:7ffe6a4b18d8 EFLAGS: 0246 ORIG_RAX: 
>> 0036
>> [53458.623094][  T215] RAX: ffda RBX: 0036 RCX: 
>> 7f2cbe1b270d
>> [53458.631024][  T215] RDX: 002a RSI:  RDI: 
>> 0060
>> [53458.638954][  T215] RBP: 0036 R08: 0088 R09: 
>> 00f0
>> [53458.647044][  T215] R10: 025557f0 R11: 0246 R12: 
>> 0002
>> [53458.654960][  T215] R13: 7f2cbcaeb058 R14: 7f2cbe0716c0 R15: 
>> 7f2cbcaeb000
>> [53458.662899][  T215] INFO: task trinity-c10:150896 blocked for more than 
>> 123 seconds.
>> [53458.670757][  T215]   Tainted: G   O L
>> 5.7.0-rc3-next-20200429 #1
>> [53458.678380][  T215] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
>> disables this message.
>> [53458.687007][  T215] trinity-c10 D 11512 150896  93301 0x1004
>> 

Re: rcu_barrier() + membarrier() + scheduler deadlock?

2020-04-30 Thread Paul E. McKenney
On Wed, Apr 29, 2020 at 11:39:48PM -0400, Qian Cai wrote:
> 
> 
> > On Apr 29, 2020, at 11:22 PM, Paul E. McKenney wrote:
> > 
> > On Wed, Apr 29, 2020 at 11:00:58PM -0400, Qian Cai wrote:
> >> Running some fuzzers on many CPUs triggers a deadlock really quickly.
> >> I can see that several tasks have been stuck for a while:
> >> 
> >> CPUA:
> >> slab_caches_to_rcu_destroy_workfn()
> >> rcu_barrier()
> >> wait_for_completion()
> >> 
> >> CPUB:
> >> sched_setaffinity()
> >> __set_cpus_allowed_ptr()
> >> stop_one_cpu()
> >> 
> >> CPUC:
> >> __x64_sys_membarrier
> >> synchronize_rcu()
> >> __wait_rcu_gp()
> >> wait_for_completion()
> >> 
> >> Lockdep did not flag anything useful at all. Any clue?
> > 
> > CPUA and CPUC are most likely both waiting for a grace period to complete.
> > 
> > CPUA will be blocking CPU hotplug (get_online_cpus()) in case that
> > matters.  I am not seeing any obvious wait for an RCU grace period
> > in CPUB.
> > 
> > But which task's affinity is being set?  That of the grace-period kthread?
> 
> I am not sure. One task's sched_setaffinity() call can't die, so here is
> the log from that task.
> 
> # grep setaffinity rootfs/tmp/trinity-child20.log 
> [child20:131] [313] sched_setaffinity(pid=0, len=0x13939393, 
> user_mask_ptr=0x7fa3d7c37000) = -1 (Bad address)
> [child20:131] [574] sched_setaffinity(pid=196, len=0xf000, user_mask_ptr=0x8) 
> = -1 (Bad address)
> [child20:131] [589] sched_setaffinity(pid=0, len=100, user_mask_ptr=0x1) = -1 
> (Bad address)
> [child20:131] [615] sched_setaffinity(pid=0, len=1, user_mask_ptr=0x8) = -1 
> (Bad address)
> [child20:589] [17] sched_setaffinity(pid=0, len=8, 
> user_mask_ptr=0x7fa3d7c39000) = -1 (Invalid argument)
> [child20:589] [346] sched_setaffinity(pid=0, len=167, user_mask_ptr=0x1) = -1 
> (Bad address)
> [child20:926] [44] sched_setaffinity(pid=0, len=4096, 
> user_mask_ptr=0x7fa3d7c3c000) = -1 (Invalid argument)
> [child20:926] [124] sched_setaffinity(pid=0, len=1, 
> user_mask_ptr=0x7fa3d7c38000) = 0
> [child20:1007] [217] sched_setaffinity(pid=0, len=4096, 
> user_mask_ptr=0x7fa3d7c38000) = 0
> [child20:1007] [235] sched_setaffinity(pid=0, len=8, user_mask_ptr=0x0) = -1 
> (Bad address)
> [child20:1122] [63] sched_setaffinity(pid=0, len=777, user_mask_ptr=0x4) = -1 
> (Bad address)
> [child20:1122] [509] sched_setaffinity(pid=0, len=4096, 
> user_mask_ptr=0x7fa3d803c000) = 0
> [child20:1122] [902] sched_setaffinity(pid=1750, len=1, user_mask_ptr=0x8) = 
> -1 (Bad address)
> [child20:1824] [57] sched_setaffinity(pid=1723, len=0x1fa56, 
> user_mask_ptr=0x1) = -1 (Bad address)
> [child20:1853] [92] sched_setaffinity(pid=1741, len=4096, 
> user_mask_ptr=0x7fa3d843c000) = 0
> [child20:2058] [114] sched_setaffinity(pid=2019, len=23, user_mask_ptr=0x8) = 
> -1 (Bad address)

Not sure what to make of this.  Some calls succeeded, others failed with
various error conditions.  I would expect the PIDs of RCU's kthreads to be
smaller than those shown, though that does depend on the hardware and
software configuration.
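
For reference, a well-formed call looks like the minimal sketch below (a
hypothetical standalone program, not taken from trinity).  In the log
above, "Bad address" (EFAULT) means user_mask_ptr was not readable for
len bytes, while "Invalid argument" (EINVAL) typically means the copied-in
mask ended up containing no usable online CPU.

#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	cpu_set_t mask;

	CPU_ZERO(&mask);
	CPU_SET(0, &mask);	/* allow CPU 0 only */

	/* pid 0 targets the calling thread, as in most entries above. */
	if (sched_setaffinity(0, sizeof(mask), &mask) != 0)
		fprintf(stderr, "sched_setaffinity: %s\n", strerror(errno));
	return 0;
}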

> > If not, are there rcuo kthreads present?  Either way, preventing any of
> 
> I did not catch that. It might just be that the workqueue was unable to
> handle the fuzzer load on many CPUs. I had another 32-CPU server stuck in
> flush_work(). Not sure if it is related.

Your .config has CONFIG_PREEMPT_NONE=y and CONFIG_NO_HZ_IDLE=y, and
there is no RCU callback offloading, so there should not have been any
rcuo kthreads.
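
For reference, the relevant fragment would look like the lines below.
This is reconstructed from the description above rather than copied from
the linked file, so treat the CONFIG_RCU_NOCB_CPU line as an assumption;
the rcuo kthreads are only created when that option is enabled and CPUs
are actually designated for callback offloading (e.g. via rcu_nocbs=).

CONFIG_PREEMPT_NONE=y
CONFIG_NO_HZ_IDLE=y
# CONFIG_RCU_NOCB_CPU is not set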

But yes, too much overload might break things.

> > RCU's kthreads from running could block potentially both CPUA and CPUC.
> > Though in the case of the grace-period kthread, I would expect to see
> > an RCU CPU stall warning.
> > 
> > Could you please share your .config?
> 
> https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config
> 
> [53294.651754][T149877] futex_wake_op: trinity-c25 tries to shift op by -17; 
> fix this program
> [53323.947396][T150988] futex_wake_op: trinity-c6 tries to shift op by -5; 
> fix this program
> [53458.295837][  T215] INFO: task kworker/u64:0:8 blocked for more than 122 
> seconds.
> [53458.304063][  T215]   Tainted: G   O L
> 5.7.0-rc3-next-20200429 #1
> [53458.311568][  T215] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
> disables this message.
> [53458.320190][  T215] kworker/u64:0   D 10584 8  2 0x90004000
> [53458.326668][  T215] Workqueue: netns cleanup_net
> [53458.331330][  T215] Call Trace:
> [53458.334510][  T215]  __schedule+0x47b/0xa50
> [53458.338765][  T215]  ? wait_for_completion+0x80/0x120
> [53458.343920][  T215]  schedule+0x59/0xd0
> [53458.348013][  T215]  schedule_timeout+0x10a/0x150
> [53458.352762][  T215]  ? wait_for_completion+0x80/0x120
> [53458.357881][  T215]  ? _raw_spin_unlock_irq+0x30/0x40
> [53458.362997][  T215]  ? trace_hardirqs_on+0x22/0x100
> [53458.367948][  T215]  ? wait_for_completion+0x80/0x120
> [53458.373195][  T215]  wait_for_completion+0xb4/0x120
> [53458.378149][  T215]  __flush_work+0x3ff/0x6e0
> 

Re: rcu_barrier() + membarrier() + scheduler deadlock?

2020-04-29 Thread Qian Cai



> On Apr 29, 2020, at 11:22 PM, Paul E. McKenney wrote:
> 
> On Wed, Apr 29, 2020 at 11:00:58PM -0400, Qian Cai wrote:
>> Running some fuzzers on many CPUs triggers a deadlock really quickly.
>> I can see that several tasks have been stuck for a while:
>> 
>> CPUA:
>> slab_caches_to_rcu_destroy_workfn()
>> rcu_barrier()
>> wait_for_completion()
>> 
>> CPUB:
>> sched_setaffinity()
>> __set_cpus_allowed_ptr()
>> stop_one_cpu()
>> 
>> CPUC:
>> __x64_sys_membarrier
>> synchronize_rcu()
>> __wait_rcu_gp()
>> wait_for_completion()
>> 
>> Lockdep did not flag anything useful at all. Any clue?
> 
> CPUA and CPUC are most likely both waiting for a grace period to complete.
> 
> CPUA will be blocking CPU hotplug (get_online_cpus()) in case that
> matters.  I am not seeing any obvious wait for an RCU grace period
> in CPUB.
> 
> But which task's affinity is being set?  That of the grace-period kthread?

I am not sure. One task's sched_setaffinity() call can't die, so here is the
log from that task.

# grep setaffinity rootfs/tmp/trinity-child20.log 
[child20:131] [313] sched_setaffinity(pid=0, len=0x13939393, 
user_mask_ptr=0x7fa3d7c37000) = -1 (Bad address)
[child20:131] [574] sched_setaffinity(pid=196, len=0xf000, user_mask_ptr=0x8) = 
-1 (Bad address)
[child20:131] [589] sched_setaffinity(pid=0, len=100, user_mask_ptr=0x1) = -1 
(Bad address)
[child20:131] [615] sched_setaffinity(pid=0, len=1, user_mask_ptr=0x8) = -1 
(Bad address)
[child20:589] [17] sched_setaffinity(pid=0, len=8, 
user_mask_ptr=0x7fa3d7c39000) = -1 (Invalid argument)
[child20:589] [346] sched_setaffinity(pid=0, len=167, user_mask_ptr=0x1) = -1 
(Bad address)
[child20:926] [44] sched_setaffinity(pid=0, len=4096, 
user_mask_ptr=0x7fa3d7c3c000) = -1 (Invalid argument)
[child20:926] [124] sched_setaffinity(pid=0, len=1, 
user_mask_ptr=0x7fa3d7c38000) = 0
[child20:1007] [217] sched_setaffinity(pid=0, len=4096, 
user_mask_ptr=0x7fa3d7c38000) = 0
[child20:1007] [235] sched_setaffinity(pid=0, len=8, user_mask_ptr=0x0) = -1 
(Bad address)
[child20:1122] [63] sched_setaffinity(pid=0, len=777, user_mask_ptr=0x4) = -1 
(Bad address)
[child20:1122] [509] sched_setaffinity(pid=0, len=4096, 
user_mask_ptr=0x7fa3d803c000) = 0
[child20:1122] [902] sched_setaffinity(pid=1750, len=1, user_mask_ptr=0x8) = -1 
(Bad address)
[child20:1824] [57] sched_setaffinity(pid=1723, len=0x1fa56, user_mask_ptr=0x1) 
= -1 (Bad address)
[child20:1853] [92] sched_setaffinity(pid=1741, len=4096, 
user_mask_ptr=0x7fa3d843c000) = 0
[child20:2058] [114] sched_setaffinity(pid=2019, len=23, user_mask_ptr=0x8) = 
-1 (Bad address)

> If not, are there rcuo kthreads present?  Either way, preventing any of

I did not catch that. It might just be that the workqueue was unable to
handle the fuzzer load on many CPUs. I had another 32-CPU server stuck in
flush_work(). Not sure if it is related.

> RCU's kthreads from running could block potentially both CPUA and CPUC.
> Though in the case of the grace-period kthread, I would expect to see
> an RCU CPU stall warning.
> 
> Could you please share your .config?

https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config

[53294.651754][T149877] futex_wake_op: trinity-c25 tries to shift op by -17; 
fix this program
[53323.947396][T150988] futex_wake_op: trinity-c6 tries to shift op by -5; fix 
this program
[53458.295837][  T215] INFO: task kworker/u64:0:8 blocked for more than 122 
seconds.
[53458.304063][  T215]   Tainted: G   O L
5.7.0-rc3-next-20200429 #1
[53458.311568][  T215] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[53458.320190][  T215] kworker/u64:0   D 10584 8  2 0x90004000
[53458.326668][  T215] Workqueue: netns cleanup_net
[53458.331330][  T215] Call Trace:
[53458.334510][  T215]  __schedule+0x47b/0xa50
[53458.338765][  T215]  ? wait_for_completion+0x80/0x120
[53458.343920][  T215]  schedule+0x59/0xd0
[53458.348013][  T215]  schedule_timeout+0x10a/0x150
[53458.352762][  T215]  ? wait_for_completion+0x80/0x120
[53458.357881][  T215]  ? _raw_spin_unlock_irq+0x30/0x40
[53458.362997][  T215]  ? trace_hardirqs_on+0x22/0x100
[53458.367948][  T215]  ? wait_for_completion+0x80/0x120
[53458.373195][  T215]  wait_for_completion+0xb4/0x120
[53458.378149][  T215]  __flush_work+0x3ff/0x6e0
[53458.382586][  T215]  ? init_pwq+0x210/0x210
[53458.386840][  T215]  flush_work+0x20/0x30
[53458.390891][  T215]  rollback_registered_many+0x3d6/0x950
[53458.396438][  T215]  ? mark_held_locks+0x4e/0x80
[53458.401339][  T215]  unregister_netdevice_many+0x5d/0x200
[53458.406816][  T215]  default_device_exit_batch+0x213/0x240
[53458.412348][  T215]  ? do_wait_intr_irq+0xe0/0xe0
[53458.417225][  T215]  ? dev_change_net_namespace+0x6d0/0x6d0
[53458.423000][  T215]  ops_exit_list+0xa2/0xc0
[53458.427367][  T215]  cleanup_net+0x3d0/0x600
[53458.431778][  T215]  process_one_work+0x560/0xba0
[53458.436629][  T215]  worker_thread+0x80/0x5f0
[53458.441078][  T215]  ? process_scheduled_works+0x90/0x90

Re: rcu_barrier() + membarrier() + scheduler deadlock?

2020-04-29 Thread Paul E. McKenney
On Wed, Apr 29, 2020 at 08:22:38PM -0700, Paul E. McKenney wrote:
> On Wed, Apr 29, 2020 at 11:00:58PM -0400, Qian Cai wrote:
> > Running some fuzzers on many CPUs triggers a deadlock really quickly.
> > I can see that several tasks have been stuck for a while:
> > 
> > CPUA:
> > slab_caches_to_rcu_destroy_workfn()
> > rcu_barrier()
> > wait_for_completion()
> > 
> > CPUB:
> > sched_setaffinity()
> > __set_cpus_allowed_ptr()
> > stop_one_cpu()
> > 
> > CPUC:
> > __x64_sys_membarrier
> > synchronize_rcu()
> > __wait_rcu_gp()
> > wait_for_completion()
> > 
> > Lockdep did not flag anything useful at all. Any clue?
> 
> CPUA and CPUC are most likely both waiting for a grace period to complete.
> 
> CPUA will be blocking CPU hotplug (get_online_cpus()) in case that
> matters.  I am not seeing any obvious wait for an RCU grace period
> in CPUB.
> 
> But which task's affinity is being set?  That of the grace-period kthread?
> If not, are there rcuo kthreads present?  Either way, preventing any of
> RCU's kthreads from running could block potentially both CPUA and CPUC.
> Though in the case of the grace-period kthread, I would expect to see
> an RCU CPU stall warning.
> 
> Could you please share your .config?

And there are a number of tasks below marked "can't die".  So maybe
something more fundamental is stuck?

Thanx, Paul

> > [52885.040113][T150406] futex_wake_op: trinity-c9 tries to shift op by 
> > -1189; fix this program
> > [52890.366916][T150196] futex_wake_op: trinity-c3 tries to shift op by -1; 
> > fix this program
> > [52896.369779][T151054] futex_wake_op: trinity-c23 tries to shift op by 
> > 710; fix this program
> > [52901.770688][  T310] INFO: task kworker/36:2:142207 blocked for more than 
> > 122 seconds.
> > [52901.811471][  T310]   Tainted: G L
> > 5.7.0-rc3-next-20200429 #2
> > [52901.849148][  T310] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
> > disables this message.
> > [52901.889803][  T310] kworker/36:2 D 29104 142207  2 0x90004000
> > [52901.919461][  T310] Workqueue: events slab_caches_to_rcu_destroy_workfn
> > [52901.950935][  T310] Call Trace:
> > [52901.965369][  T310]  __schedule+0x77e/0x1070
> > [52901.986211][  T310]  ? __sched_text_start+0x8/0x8
> > [52902.008267][  T310]  ? trace_hardirqs_on+0x3a/0x160
> > [52902.031764][  T310]  schedule+0x95/0x160
> > [52902.050729][  T310]  schedule_timeout+0x47c/0x6a0
> > [52902.072897][  T310]  ? print_irqtrace_events+0x110/0x110
> > [52902.098316][  T310]  ? usleep_range+0x100/0x100
> > [52902.119664][  T310]  ? mark_held_locks+0x86/0xb0
> > [52902.141397][  T310]  ? _raw_spin_unlock_irq+0x27/0x40
> > [52902.165326][  T310]  ? _raw_spin_unlock_irq+0x27/0x40
> > [52902.189622][  T310]  ? lockdep_hardirqs_on+0x1b0/0x2c0
> > [52902.213641][  T310]  ? wait_for_completion+0x116/0x1a0
> > [52902.238757][  T310]  ? trace_hardirqs_on+0x3a/0x160
> > [52902.261648][  T310]  wait_for_completion+0x11e/0x1a0
> > [52902.285516][  T310]  ? wait_for_completion_interruptible+0x220/0x220
> > [52902.316342][  T310]  ? rcu_read_lock_held+0xc0/0xc0
> > [52902.340298][  T310]  rcu_barrier+0x2c5/0x440
> > [52902.361736][  T310]  slab_caches_to_rcu_destroy_workfn+0x95/0xe0
> > [52902.391109][  T310]  process_one_work+0x57e/0xb90
> > [52902.413244][  T310]  ? pwq_dec_nr_in_flight+0x170/0x170
> > [52902.437957][  T310]  ? worker_thread+0x14b/0x5b0
> > [52902.460357][  T310]  worker_thread+0x63/0x5b0
> > [52902.481267][  T310]  ? process_one_work+0xb90/0xb90
> > [52902.504818][  T310]  kthread+0x1f7/0x220
> > [52902.523542][  T310]  ? kthread_create_worker_on_cpu+0xc0/0xc0
> > [52902.550801][  T310]  ret_from_fork+0x3a/0x50
> > [52902.560427][T151328] futex_wake_op: trinity-c10 tries to shift op by -1; 
> > fix this program
> > [52902.571250][  T310] INFO: task trinity-c18:145344 can't die for more 
> > than 123 seconds.
> > [52902.648554][  T310] trinity-c18 R  running task 26944 145344  74932 0x10004004
> > [52902.700179][  T310] Call Trace:
> > [52902.715039][  T310]  __schedule+0x77e/0x1070
> > [52902.735531][  T310]  ? __sched_text_start+0x8/0x8
> > [52902.757922][  T310]  ? do_syscall_64+0x28d/0xaf0
> > [52902.779731][  T310]  ? do_syscall_64+0x28d/0xaf0
> > [52902.802215][  T310]  schedule+0x95/0x160
> > [52902.821375][  T310]  do_syscall_64+0x23e/0xaf0
> > [52902.843517][  T310]  ? perf_call_bpf_enter+0x1a0/0x1a0
> > [52902.871003][  T310]  ? ftrace_syscall_enter+0x4b0/0x4b0
> > [52902.896034][  T310]  ? syscall_return_slowpath+0x580/0x580
> > [52902.922153][  T310]  ? entry_SYSCALL_64_after_hwframe+0x3e/0xb3
> > [52902.950409][  T310]  ? trace_hardirqs_off_caller+0x3a/0x150
> > [52902.976721][  T310]  ? trace_hardirqs_off_thunk+0x1a/0x1c
> > [52903.002409][  T310]  entry_SYSCALL_64_after_hwframe+0x49/0xb3
> > [52903.030018][  T310] RIP: 0033:0x7fa3d9cd670d
> > [52903.050021][  T310] Code: Bad RIP value.
> > [52903.068733][  T310] RSP: 

Re: rcu_barrier() + membarrier() + scheduler deadlock?

2020-04-29 Thread Paul E. McKenney
On Wed, Apr 29, 2020 at 11:00:58PM -0400, Qian Cai wrote:
> Running some fuzzers on many CPUs triggers a deadlock really quickly.
> I can see that several tasks have been stuck for a while:
> 
> CPUA:
> slab_caches_to_rcu_destroy_workfn()
> rcu_barrier()
> wait_for_completion()
> 
> CPUB:
> sched_setaffinity()
> __set_cpus_allowed_ptr()
> stop_one_cpu()
> 
> CPUC:
> __x64_sys_membarrier
> synchronize_rcu()
> __wait_rcu_gp()
> wait_for_completion()
> 
> Lockdep did not flag anything useful at all. Any clue?

CPUA and CPUC are most likely both waiting for a grace period to complete.

CPUA will be blocking CPU hotplug (get_online_cpus()) in case that
matters.  I am not seeing any obvious wait for an RCU grace period
in CPUB.

But which task's affinity is being set?  That of the grace-period kthread?
If not, are there rcuo kthreads present?  Either way, preventing any of
RCU's kthreads from running could block potentially both CPUA and CPUC.
Though in the case of the grace-period kthread, I would expect to see
an RCU CPU stall warning.
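
As an aside, the syscall mix in the three stacks can be approximated from
userspace with the sketch below.  It is a hypothetical stress program
(names and structure are mine, not trinity's); it assumes
MEMBARRIER_CMD_GLOBAL, which is the membarrier command that ends up in
synchronize_rcu() per the __x64_sys_membarrier stack above.  CPUA's
rcu_barrier() comes from workqueue-driven slab-cache destruction, which
this sketch does not attempt to drive.

#define _GNU_SOURCE
#include <linux/membarrier.h>
#include <pthread.h>
#include <sched.h>
#include <sys/syscall.h>
#include <unistd.h>

static void *membarrier_loop(void *arg)
{
	(void)arg;
	/* Waits for a full RCU grace period on each call:
	 * __x64_sys_membarrier -> synchronize_rcu -> wait_for_completion. */
	for (;;)
		syscall(__NR_membarrier, MEMBARRIER_CMD_GLOBAL, 0);
	return NULL;
}

static void *affinity_loop(void *arg)
{
	long ncpus = sysconf(_SC_NPROCESSORS_ONLN);
	long cpu = 0;
	cpu_set_t mask;

	(void)arg;
	/* Exercises sched_setaffinity -> __set_cpus_allowed_ptr ->
	 * stop_one_cpu, the path seen on CPUB. */
	for (;;) {
		CPU_ZERO(&mask);
		CPU_SET(cpu++ % ncpus, &mask);
		sched_setaffinity(0, sizeof(mask), &mask);
	}
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, membarrier_loop, NULL);
	pthread_create(&t2, NULL, affinity_loop, NULL);
	pthread_join(t1, NULL);	/* loops never return; run under a timeout */
	return 0;
}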

Could you please share your .config?

Thanx, Paul

> [52885.040113][T150406] futex_wake_op: trinity-c9 tries to shift op by -1189; 
> fix this program
> [52890.366916][T150196] futex_wake_op: trinity-c3 tries to shift op by -1; 
> fix this program
> [52896.369779][T151054] futex_wake_op: trinity-c23 tries to shift op by 710; 
> fix this program
> [52901.770688][  T310] INFO: task kworker/36:2:142207 blocked for more than 
> 122 seconds.
> [52901.811471][  T310]   Tainted: G L
> 5.7.0-rc3-next-20200429 #2
> [52901.849148][  T310] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
> disables this message.
> [52901.889803][  T310] kworker/36:2 D 29104 142207  2 0x90004000
> [52901.919461][  T310] Workqueue: events slab_caches_to_rcu_destroy_workfn
> [52901.950935][  T310] Call Trace:
> [52901.965369][  T310]  __schedule+0x77e/0x1070
> [52901.986211][  T310]  ? __sched_text_start+0x8/0x8
> [52902.008267][  T310]  ? trace_hardirqs_on+0x3a/0x160
> [52902.031764][  T310]  schedule+0x95/0x160
> [52902.050729][  T310]  schedule_timeout+0x47c/0x6a0
> [52902.072897][  T310]  ? print_irqtrace_events+0x110/0x110
> [52902.098316][  T310]  ? usleep_range+0x100/0x100
> [52902.119664][  T310]  ? mark_held_locks+0x86/0xb0
> [52902.141397][  T310]  ? _raw_spin_unlock_irq+0x27/0x40
> [52902.165326][  T310]  ? _raw_spin_unlock_irq+0x27/0x40
> [52902.189622][  T310]  ? lockdep_hardirqs_on+0x1b0/0x2c0
> [52902.213641][  T310]  ? wait_for_completion+0x116/0x1a0
> [52902.238757][  T310]  ? trace_hardirqs_on+0x3a/0x160
> [52902.261648][  T310]  wait_for_completion+0x11e/0x1a0
> [52902.285516][  T310]  ? wait_for_completion_interruptible+0x220/0x220
> [52902.316342][  T310]  ? rcu_read_lock_held+0xc0/0xc0
> [52902.340298][  T310]  rcu_barrier+0x2c5/0x440
> [52902.361736][  T310]  slab_caches_to_rcu_destroy_workfn+0x95/0xe0
> [52902.391109][  T310]  process_one_work+0x57e/0xb90
> [52902.413244][  T310]  ? pwq_dec_nr_in_flight+0x170/0x170
> [52902.437957][  T310]  ? worker_thread+0x14b/0x5b0
> [52902.460357][  T310]  worker_thread+0x63/0x5b0
> [52902.481267][  T310]  ? process_one_work+0xb90/0xb90
> [52902.504818][  T310]  kthread+0x1f7/0x220
> [52902.523542][  T310]  ? kthread_create_worker_on_cpu+0xc0/0xc0
> [52902.550801][  T310]  ret_from_fork+0x3a/0x50
> [52902.560427][T151328] futex_wake_op: trinity-c10 tries to shift op by -1; 
> fix this program
> [52902.571250][  T310] INFO: task trinity-c18:145344 can't die for more than 
> 123 seconds.
> [52902.648554][  T310] trinity-c18 R  running task 26944 145344  74932 0x10004004
> [52902.700179][  T310] Call Trace:
> [52902.715039][  T310]  __schedule+0x77e/0x1070
> [52902.735531][  T310]  ? __sched_text_start+0x8/0x8
> [52902.757922][  T310]  ? do_syscall_64+0x28d/0xaf0
> [52902.779731][  T310]  ? do_syscall_64+0x28d/0xaf0
> [52902.802215][  T310]  schedule+0x95/0x160
> [52902.821375][  T310]  do_syscall_64+0x23e/0xaf0
> [52902.843517][  T310]  ? perf_call_bpf_enter+0x1a0/0x1a0
> [52902.871003][  T310]  ? ftrace_syscall_enter+0x4b0/0x4b0
> [52902.896034][  T310]  ? syscall_return_slowpath+0x580/0x580
> [52902.922153][  T310]  ? entry_SYSCALL_64_after_hwframe+0x3e/0xb3
> [52902.950409][  T310]  ? trace_hardirqs_off_caller+0x3a/0x150
> [52902.976721][  T310]  ? trace_hardirqs_off_thunk+0x1a/0x1c
> [52903.002409][  T310]  entry_SYSCALL_64_after_hwframe+0x49/0xb3
> [52903.030018][  T310] RIP: 0033:0x7fa3d9cd670d
> [52903.050021][  T310] Code: Bad RIP value.
> [52903.068733][  T310] RSP: 002b:7fff1d6588c8 EFLAGS: 0246 ORIG_RAX: 
> 00cb
> [52903.107680][  T310] RAX:  RBX: 00cb RCX: 
> 7fa3d9cd670d
> [52903.144861][  T310] RDX: 7fa3d7c38000 RSI: 0004 RDI: 
> 
> [52903.182189][  T310] RBP: 00cb R08: ca23 R09: 
> 00b3
> [52903.219290][  T310]