[Kernel-packages] [Bug 1968361] Re: rawsock test BUG: soft lockup
** Changed in: stress-ng
   Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1968361

Title:
  rawsock test BUG: soft lockup

Status in Linux:
  Fix Released
Status in Stress-ng:
  Fix Released
Status in linux package in Ubuntu:
  Invalid

Bug description:
  When running the rawsock stressor on a large system with 32 CPUs or
  more, I always hit a soft lockup in the kernel, and sometimes it will
  lock up the system if run for a long time. The issue occurs on all
  major OSes that I tested: Ubuntu 20.04, RHEL 7/8, SUSE 15.

  My system:
  stress-ng V0.13.03-5-g9093bce7

  # lscpu | grep CPU
  CPU(s):              64
  On-line CPU(s) list: 0-63
  NUMA node0 CPU(s):   0-63

  # ./stress-ng --rawsock 20 -t 5
  stress-ng: info: [49748] setting to a 5 second run per stressor
  stress-ng: info: [49748] dispatching hogs: 20 rawsock

  Message from syslogd@rain65 at Apr 8 12:18:26 ...
  kernel:watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [stress-ng:49781]

  If I run with --timeout 60 secs, it will lock up the system.

  The issue is lock starvation in the kernel:

  - When the stressor creates an instance, it forks new child/client and
    parent/server processes and creates sockets for them. The kernel
    acquires the write lock to add each socket to the raw socket hash
    table.
  - The client process immediately starts sending data in a do {} while
    loop. The kernel acquires the read lock to access the raw socket
    hash table and clones the data packets for all raw socket processes.
  - The main stress-ng process may still be creating the remaining
    instances, so the kernel can hit lock starvation (as in the error
    shown above).
  - Similarly, when the timeout expires, the parents try to close their
    sockets, for which the kernel must again acquire the write lock,
    before sending SIGKILL to their child processes. We can hit lock
    starvation here too, since the clients have not yet closed their
    sockets and continue sending data.

  I'm not sure this is intended, but to avoid the kernel lock starvation
  in raw sockets, I propose the simple patch attached. I have tested it
  on a large system with 128 CPUs without hitting any BUG: soft lockup.

  Thanks,
  Thinh Tran

To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/1968361/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1968361] Re: rawsock test BUG: soft lockup
Also there is a re-forking delay added to allow instances to fire up
and back off if resources get low. These changes have been tested with
256, 1024, 4096 and 8192 instances on a 24-thread system with 32GB of
memory.

** Changed in: linux (Ubuntu)
   Status: New => Invalid

** Changed in: linux (Ubuntu)
   Importance: High => Low
[Kernel-packages] [Bug 1968361] Re: rawsock test BUG: soft lockup
This fix will land in the next release of stress-ng at the end of
March 2023.

** Changed in: stress-ng
   Status: New => Fix Committed
[Kernel-packages] [Bug 1968361] Re: rawsock test BUG: soft lockup
Added an ENOBUFS check on the sender, with priority dropping on ENOBUFS
errors and also a timer backoff delay. Added OOM-killer respawning that
can be overridden using --oomable, allowing overcommitted systems to
either respawn OOM'd rawsock instances (the default) or not respawn
them (--oomable).

Fix committed upstream:
https://github.com/ColinIanKing/stress-ng/commit/e4d3b90267243d7505399e7059950097d9bd50ae
[Kernel-packages] [Bug 1968361] Re: rawsock test BUG: soft lockup
Looks like the kernel is running out of resources and is doing
out-of-memory killing of various processes. I think I have ways of
reducing how often this occurs.
[Kernel-packages] [Bug 1968361] Re: rawsock test BUG: soft lockup
Hi Colin, the patch in comment #2 does not work on my system with 128
CPUs. I still got the BUG: soft lockup, and eventually it killed my ssh
session. I think the workaround is to make the client (sender) wait
until the main process is done creating all of the stressor's
instances.
[Kernel-packages] [Bug 1968361] Re: rawsock test BUG: soft lockup
** Changed in: linux
   Status: New => Fix Released
[Kernel-packages] [Bug 1968361] Re: rawsock test BUG: soft lockup
Workaround committed to stress-ng:

commit 69328da97f04745a9da2890c90c131c2322f81e2 (HEAD -> master)
Author: Colin Ian King
Date:   Wed Apr 27 08:49:12 2022

    stress-rawsock: make client wait for server to start
[Kernel-packages] [Bug 1968361] Re: rawsock test BUG: soft lockup
** Tags added: patch
[Kernel-packages] [Bug 1968361] Re: rawsock test BUG: soft lockup
** Changed in: linux
   Status: Unknown => New
[Kernel-packages] [Bug 1968361] Re: rawsock test BUG: soft lockup
** Also affects: linux (Ubuntu)
   Importance: Undecided
   Status: New

** Changed in: linux (Ubuntu)
   Importance: Undecided => High

** Changed in: stress-ng
   Importance: Undecided => Low

** Changed in: stress-ng
   Assignee: (unassigned) => Colin Ian King (colin-king)

** Bug watch added: github.com/ColinIanKing/stress-ng/issues #187
   https://github.com/ColinIanKing/stress-ng/issues/187

** Also affects: linux via
   https://github.com/ColinIanKing/stress-ng/issues/187
   Importance: Unknown
   Status: Unknown