[Bro-Dev] [JIRA] (BIT-1306) bro process would get stuck/freeze with myricom drivers
[ https://bro-tracker.atlassian.net/browse/BIT-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aashish Sharma updated BIT-1306:

Yes, so sorry, I couldn't get to it sooner. Yes, the patch fixes the problem.

Aashish

--
Aashish Sharma (asha...@lbl.gov)
Cyber Security, Lawrence Berkeley National Laboratory
http://go.lbl.gov/pgp-aashish
Office: (510)-495-2680 Cell: (510)-612-7971

bro process would get stuck/freeze with myricom drivers
---

    Key: BIT-1306
    URL: https://bro-tracker.atlassian.net/browse/BIT-1306
    Project: Bro Issue Tracker
    Issue Type: Problem
    Components: Bro
    Affects Versions: git/master
    Environment: OS: FreeBSD 9.3-RELEASE-p5 OS
        bro version 2.3-328
        git log -1 --format=%H
        379593c7fded0f9791ae71a52dd78a4c9d5a2c1f
    Reporter: Aashish Sharma
    Assignee: Robin Sommer
    Labels: bro-git, myricom
    Fix For: 2.4

When I stop bro (in cluster mode), one of the bro worker processes (a random one) gets stuck and won't shut down, stop, or even be killed with kill -s 9. The system ultimately has to be rebooted to remove the stuck bro process.

On running myri_start_stop I see:

    # /usr/local/opt/snf/sbin/myri_start_stop stop
    Removing myri_snf.ko
    kldunload: can't unload file: Device busy

It appears that the myri_snf.ko driver cannot be unloaded because of the stuck bro process. That process still has an open descriptor on the Sniffer device/driver, and the bro process freezes.

More details: the bro process is stuck in the RNE state:

    R  Marks a runnable process.
    N  The process has reduced CPU scheduling priority (see setpriority(2)).
    E  The process is trying to exit.

Here is an example:

    ### stuck process:
    [bro@01 ~]$ ps auxwww | fgrep 1616
    bro 1616 100.0 0.0 758040 60480 ?? RNE 2:57PM 53:50.04 /usr/local/bro-git/bin/bro -i myri0 -U .status -p broctl -p broctl-live -p local -p worker-1-1 mgr.bro broctl base/frameworks/cluster local-worker.bro broctl/auto

When checking for the process in /proc:

    [bro@c ~]$ ls -l /proc/1616
    ls: /proc/1616: No such file or directory

--
This message was sent by Atlassian JIRA (v6.4-OD-16-006#64014)
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev
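The `ls -l /proc/1616` check in the report may say little on its own, since procfs is often not mounted on FreeBSD. As a hedged alternative, the sketch below probes a PID with the POSIX `kill(pid, 0)` call, which sends no signal and only performs the existence and permission checks. The helper name `process_exists` is hypothetical and is not part of Bro or the Myricom tools.

```cpp
#include <cerrno>
#include <csignal>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: reports whether a process with this PID still
// occupies a slot in the process table.
bool process_exists(pid_t pid) {
    // PIDs <= 0 address process groups in kill(2), so reject them here.
    if (pid <= 0)
        return false;

    // Signal 0 delivers nothing; only error checking is performed.
    if (kill(pid, 0) == 0)
        return true;

    // EPERM means the process exists but belongs to another user.
    // ESRCH (or anything else) is treated as "not found".
    return errno == EPERM;
}
```

Note that this also returns true for a process that has exited but has not yet been reaped, so it only answers "is the PID still in use", not "is the process still doing work".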
[Bro-Dev] [JIRA] (BIT-1306) bro process would get stuck/freeze with myricom drivers
[ https://bro-tracker.atlassian.net/browse/BIT-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=20260#comment-20260 ]

klehigh commented on BIT-1306:

This patch works. I'm able to stop without any hanging processes.
[Bro-Dev] [JIRA] (BIT-1306) bro process would get stuck/freeze with myricom drivers
[ https://bro-tracker.atlassian.net/browse/BIT-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robin Sommer updated BIT-1306:

    Resolution: Fixed
    Status: Closed (was: Open)
[Bro-Dev] [JIRA] (BIT-1306) bro process would get stuck/freeze with myricom drivers
[ https://bro-tracker.atlassian.net/browse/BIT-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=20261#comment-20261 ]

Robin Sommer commented on BIT-1306:

Thanks, Keith! Closing ticket.
[Bro-Dev] [JIRA] (BIT-1306) bro process would get stuck/freeze with myricom drivers
[ https://bro-tracker.atlassian.net/browse/BIT-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=20257#comment-20257 ]

klehigh commented on BIT-1306:

Tested the patch on FreeBSD-10.1-p9 with bro 2.3-680 and Myricom SNF v3 drivers, and it resolves this issue.
[Bro-Dev] [JIRA] (BIT-1306) bro process would get stuck/freeze with myricom drivers
[ https://bro-tracker.atlassian.net/browse/BIT-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=20230#comment-20230 ]

Robin Sommer commented on BIT-1306:

Check the change.
[Bro-Dev] [JIRA] (BIT-1306) bro process would get stuck/freeze with myricom drivers
[ https://bro-tracker.atlassian.net/browse/BIT-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robin Sommer reassigned BIT-1306:

    Assignee: Robin Sommer
[Bro-Dev] [JIRA] (BIT-1306) bro process would get stuck/freeze with myricom drivers
[ https://bro-tracker.atlassian.net/browse/BIT-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=20105#comment-20105 ]

Jon Siwek commented on BIT-1306:

Can you check if this small patch helps?

{code}
diff --git a/src/main.cc b/src/main.cc
index fb48bdc..7827302 100644
--- a/src/main.cc
+++ b/src/main.cc
@@ -391,6 +391,7 @@ void terminate_bro()
 	delete event_serializer;
 	delete state_serializer;
 	delete event_registry;
+	delete remote_serializer;
 	delete analyzer_mgr;
 	delete file_mgr;
 	delete log_mgr;
{code}

I'm not sure why that line got removed (it still exists in 2.3.2), but its absence might cause the main Bro process to not reap its child. The main Bro process is the one that opened the network interface; the child is the one doing remote communication, and it inherits the parent's open file descriptors.

So a total guess is that the process forked for remote communication became a zombie (due to the lack of what's in the patch above) and holds an open file descriptor on the network device.
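The descriptor inheritance that Jon Siwek's guess relies on can be illustrated in isolation. The sketch below is not Bro code and does not touch the Myricom driver: it uses a pipe as a stand-in for the capture device, and the names `InheritResult` and `demo_inherited_descriptor` are hypothetical. It shows that a forked child keeps the inherited descriptor open after the parent has closed its own copy, and that the descriptor is released once the parent terminates and reaps the child.

```cpp
#include <cerrno>
#include <csignal>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical result type for this illustration.
struct InheritResult {
    bool held_while_child_alive;  // child still holds the write end
    bool released_after_reap;     // EOF seen after kill + waitpid
};

InheritResult demo_inherited_descriptor() {
    InheritResult result{false, false};
    int fds[2];
    if (pipe(fds) != 0)
        return result;

    pid_t child = fork();
    if (child < 0) {
        close(fds[0]);
        close(fds[1]);
        return result;
    }
    if (child == 0) {
        // Child: inherits the write end and just waits to be killed.
        close(fds[0]);
        for (;;)
            pause();
    }

    // Parent: drop its own write end, so only the child's copy remains.
    close(fds[1]);
    int flags = fcntl(fds[0], F_GETFL);
    fcntl(fds[0], F_SETFL, flags | O_NONBLOCK);

    // No EOF yet: the child's inherited descriptor keeps the pipe open.
    char byte;
    ssize_t n = read(fds[0], &byte, 1);
    result.held_while_child_alive =
        (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK));

    // Terminate and reap the child; its descriptors are closed on exit.
    kill(child, SIGKILL);
    int status = 0;
    while (waitpid(child, &status, 0) == -1 && errno == EINTR) {
    }

    // Now every write end is closed, so read() reports EOF (0).
    n = read(fds[0], &byte, 1);
    result.released_after_reap = (n == 0);
    close(fds[0]);
    return result;
}
```

Calling `demo_inherited_descriptor()` should yield both fields set to true on a POSIX system. Whether `delete remote_serializer` performs an equivalent kill-and-wait on Bro's communication child is an assumption here; the thread only confirms that restoring the line fixed the hang.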
[Bro-Dev] [JIRA] (BIT-1306) bro process would get stuck/freeze with myricom drivers
Aashish Sharma created BIT-1306:

    Summary: bro process would get stuck/freeze with myricom drivers
    Key: BIT-1306
    URL: https://bro-tracker.atlassian.net/browse/BIT-1306