[Bro-Dev] [JIRA] (BIT-1353) BroCtl status/top take excessive amount of time

2015-03-23 Thread Johanna Amann (JIRA)
Johanna Amann created BIT-1353:
--

 Summary: BroCtl status/top take excessive amount of time
 Key: BIT-1353
 URL: https://bro-tracker.atlassian.net/browse/BIT-1353
 Project: Bro Issue Tracker
  Issue Type: Problem
  Components: BroControl
Affects Versions: git/master
Reporter: Johanna Amann
 Fix For: 2.4


After running a large bro cluster for a few days on a FreeBSD system (FreeBSD 
10.1, 28 physical nodes, 81 worker processes), broctl actions that interact 
with all nodes seem to take excessive amounts of time (>2 minutes for a broctl 
status). This was not the case right after starting up the cluster.

If there is any way I can help with more information, please let me know what 
to do.



--
This message was sent by Atlassian JIRA
(v6.4-OD-16-005#64014)
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev


[Bro-Dev] [JIRA] (BIT-1353) BroCtl status/top take excessive amount of time

2015-03-25 Thread Johanna Amann (JIRA)

[ 
https://bro-tracker.atlassian.net/browse/BIT-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=20117#comment-20117
 ] 

Johanna Amann commented on BIT-1353:


I looked into this a tad more - and it seems that two nodes were very slow to 
reply and potentially ran into a timeout. That does not really seem obvious 
from the status output at the moment though (unless I completely missed it) - 
perhaps we should add that.



[Bro-Dev] [JIRA] (BIT-1353) BroCtl status/top take excessive amount of time

2015-03-26 Thread Johanna Amann (JIRA)

[ 
https://bro-tracker.atlassian.net/browse/BIT-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=20118#comment-20118
 ] 

Johanna Amann commented on BIT-1353:


And even more detail: the cause of this was hardware problems on two nodes.
The bro instances on these nodes were still kind-of-running, but I don't think
they were communicating with the master anymore, and they were unkillable (even
with kill -9); probably hanging while waiting for disk I/O (hard drive problems).
Since you could still ssh into the nodes, and they worked normally unless you
tried to do certain file system accesses, broctl apparently listed them as
online, without giving any indication of problems with the nodes, besides the
fact that "status" takes a long time.



[Bro-Dev] [JIRA] (BIT-1353) BroCtl status/top take excessive amount of time

2015-03-26 Thread Adam Slagell (JIRA)

 [ 
https://bro-tracker.atlassian.net/browse/BIT-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Slagell reassigned BIT-1353:
-

Assignee: Daniel Thayer



[Bro-Dev] [JIRA] (BIT-1353) BroCtl status/top take excessive amount of time

2015-03-27 Thread Daniel Thayer (JIRA)

[ 
https://bro-tracker.atlassian.net/browse/BIT-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=20122#comment-20122
 ] 

Daniel Thayer commented on BIT-1353:


I'm not seeing a problem.  As a test, I simulated a slow node by adding a
"sleep" command to one of the scripts that broctl runs on the remote host.
If the sleep is long enough to exceed the timeout, then I see "???" in the
status output (in the "Running", "Peers", and "Started" columns).
Otherwise, broctl status simply gathers information reported by Bro.
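
The test described above can be approximated locally. The following is a
minimal Python sketch, not broctl's actual code: run_with_timeout and
fake_helper are hypothetical names, and the "helper" is just a local Python
process that sleeps before answering. It only illustrates the idea that a
command exceeding the timeout is reported as "???".

```python
import subprocess
import sys

UNKNOWN = "???"  # placeholder shown when a helper does not answer in time


def run_with_timeout(cmd, timeout_secs):
    """Run cmd and return its stripped stdout, or UNKNOWN on timeout."""
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_secs
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing lingers.
        return UNKNOWN
    return result.stdout.strip()


def fake_helper(delay_secs):
    """Hypothetical stand-in for a remote helper script that sleeps first."""
    code = "import time; time.sleep({}); print('running')".format(delay_secs)
    return [sys.executable, "-c", code]


if __name__ == "__main__":
    print(run_with_timeout(fake_helper(0), timeout_secs=5))    # fast node
    print(run_with_timeout(fake_helper(3), timeout_secs=0.5))  # slow node
```

Note that a process stuck in uninterruptible disk I/O, as in the earlier
comment, may not go away even when killed, so a timeout like this bounds the
wait but does not clean up such a process.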




[Bro-Dev] [JIRA] (BIT-1353) BroCtl status/top take excessive amount of time

2015-04-03 Thread Robin Sommer (JIRA)

[ 
https://bro-tracker.atlassian.net/browse/BIT-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=20224#comment-20224
 ] 

Robin Sommer commented on BIT-1353:
---

Set the timeout to 30s and make it configurable; revisit later when Broker is there.
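
As a rough illustration of "configurable with a default of 30s", the sketch
below reads a timeout from "Key = value" style config text and falls back to
30 seconds when the option is absent. It is written in Python under stated
assumptions: the option name StatusTimeout and the function read_timeout are
hypothetical and are not taken from broctl.

```python
DEFAULT_TIMEOUT_SECS = 30  # the default proposed in the comment above


def read_timeout(cfg_text, option="StatusTimeout"):
    """Return the timeout found in 'Key = value' config text, else the default.

    'StatusTimeout' is a hypothetical option name used for illustration only.
    """
    for raw in cfg_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key.lower() == option.lower():
            return int(value)
    return DEFAULT_TIMEOUT_SECS


if __name__ == "__main__":
    print(read_timeout("StatusCmdShowAll = 0\n"))
    print(read_timeout("StatusTimeout = 45  # slower cluster\n"))
```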



[Bro-Dev] [JIRA] (BIT-1353) BroCtl status/top take excessive amount of time

2015-04-16 Thread Daniel Thayer (JIRA)

[ 
https://bro-tracker.atlassian.net/browse/BIT-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=20308#comment-20308
 ] 

Daniel Thayer commented on BIT-1353:


Branch topic/dnthayer/ticket1353 in the broctl repo contains the fix for this 
issue.




[Bro-Dev] [JIRA] (BIT-1353) BroCtl status/top take excessive amount of time

2015-04-16 Thread Daniel Thayer (JIRA)

 [ 
https://bro-tracker.atlassian.net/browse/BIT-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Thayer updated BIT-1353:
---
Status: Merge Request  (was: Open)



[Bro-Dev] [JIRA] (BIT-1353) BroCtl status/top take excessive amount of time

2015-04-16 Thread Daniel Thayer (JIRA)

 [ 
https://bro-tracker.atlassian.net/browse/BIT-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Thayer updated BIT-1353:
---
Status: Open  (was: Merge Request)



[Bro-Dev] [JIRA] (BIT-1353) BroCtl status/top take excessive amount of time

2015-04-16 Thread Daniel Thayer (JIRA)

 [ 
https://bro-tracker.atlassian.net/browse/BIT-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Thayer updated BIT-1353:
---
Status: Merge Request  (was: Open)



[Bro-Dev] [JIRA] (BIT-1353) BroCtl status/top take excessive amount of time

2015-04-20 Thread Robin Sommer (JIRA)

 [ 
https://bro-tracker.atlassian.net/browse/BIT-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robin Sommer reassigned BIT-1353:
-

Assignee: (was: Daniel Thayer)



[Bro-Dev] [JIRA] (BIT-1353) BroCtl status/top take excessive amount of time

2015-04-20 Thread Robin Sommer (JIRA)

[ 
https://bro-tracker.atlassian.net/browse/BIT-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=20408#comment-20408
 ] 

Robin Sommer commented on BIT-1353:
---

This has been merged already.



[Bro-Dev] [JIRA] (BIT-1353) BroCtl status/top take excessive amount of time

2015-04-20 Thread Robin Sommer (JIRA)

 [ 
https://bro-tracker.atlassian.net/browse/BIT-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robin Sommer updated BIT-1353:
--
Resolution: Merged  (was: Fixed)
Status: Closed  (was: Merge Request)



Re: [Bro-Dev] [JIRA] (BIT-1353) BroCtl status/top take excessive amount of time

2015-03-23 Thread Johanna Amann
Hi,

On Mon, Mar 23, 2015 at 03:33:13PM -0500, Daniel Thayer wrote:
> I'm glad to hear that you're testing broctl on FreeBSD (I always
> test on Linux).  Here are my initial ideas:

> How many hosts are in your cluster?  (you mentioned "28 physical nodes",
> does that mean 28 computers?!)

It is 28 computers, each running 3 bro worker processes with 2 more
physical machines running the master and proxies.

> Are you running the git master version of broctl?

It is not quite master; it is currently running 5e2defe, so the state as
of March 13th.

> Is every broctl command slow, or just status and top?

All the ones that I tried are slow. I can upgrade to master and test again
- I just wanted to ask if there is some way to debug what is going on
before restarting the cluster, since the problem took a few days to
manifest itself. Hence I probably will not be able to directly reproduce
it :)

> The broctl status command usually spends most of its time
> waiting for broccoli.  I've added a new option that you
> can set in your etc/broctl.cfg file that will skip
> the broccoli code so that broctl status runs much faster.
> To enable this feature, make sure this line is in your
> broctl.cfg file:
> StatusCmdShowAll = 0
> (after you add this, broctl will say that you have to run
> either "install" or "deploy", but you don't actually
> need to for this particular broctl option).

I added this (without running install / deploy) and it is now faster,
but still takes a while. I examined spool/debug.log a bit and it actually
seems that a significant amount of time is spent getting the process status.
The timeline currently looks like this:

23 Mar 11:53:05 [broctl] status
23 Mar 11:53:05 [broctl] Getting process status ...
23 Mar 11:53:05 [execute] blade26: /xa/bro/master/share/broctl/scripts/helpers/check-pid 2513
[...] (many lines like this and many exit code lines)
23 Mar 11:54:07 [execute] blade15: exit code 0
23 Mar 11:54:07 [execute] blade26: /xa/bro/master/share/broctl/scripts/helpers/cat-file /xa/bro/master/spool/worker-26-0/.startup
[...]
23 Mar 11:54:09 [execute] blade15: exit code 0
23 Mar 11:54:09 [events] broccoli: Control::peer_status_request() to node worker-26-0
[...]
23 Mar 11:54:29 [events] broccoli: Control::peer_status_response(1427136868.812806 [...]
-> status output
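
One way to see where the time goes in a log like this is to compute the gap
between consecutive timestamped lines. The Python sketch below does that on
the excerpt above. Assumptions: the line layout is inferred from the excerpt
only, the year 2015 is supplied because the log lines carry none, and
parse_entries and largest_gaps are hypothetical helper names. Lines without
a leading timestamp (wrapped paths, "[...]") are skipped.

```python
import re
from datetime import datetime

# "23 Mar 11:53:05 [broctl] status" -> timestamp, tag, message
LINE_RE = re.compile(r"^(\d{1,2} \w{3} \d{2}:\d{2}:\d{2}) \[(\w+)\] (.*)$")

SAMPLE = """\
23 Mar 11:53:05 [broctl] status
23 Mar 11:53:05 [broctl] Getting process status ...
23 Mar 11:53:05 [execute] blade26: /xa/bro/master/share/broctl/scripts/helpers/check-pid 2513
[...] (many lines like this and many exit code lines)
23 Mar 11:54:07 [execute] blade15: exit code 0
23 Mar 11:54:07 [execute] blade26: /xa/bro/master/share/broctl/scripts/helpers/cat-file /xa/bro/master/spool/worker-26-0/.startup
[...]
23 Mar 11:54:09 [execute] blade15: exit code 0
23 Mar 11:54:09 [events] broccoli: Control::peer_status_request() to node worker-26-0
[...]
23 Mar 11:54:29 [events] broccoli: Control::peer_status_response(1427136868.812806 [...]
""".splitlines()


def parse_entries(lines, year=2015):
    """Return (datetime, tag, message) for every line with a timestamp."""
    entries = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # wrapped continuation or elided "[...]" line
        stamp = datetime.strptime(
            "{} {}".format(year, match.group(1)), "%Y %d %b %H:%M:%S"
        )
        entries.append((stamp, match.group(2), match.group(3)))
    return entries


def largest_gaps(entries, top=3):
    """Return the `top` largest (seconds, tag, message) gaps between lines."""
    gaps = []
    for (t0, tag, msg), (t1, _, _) in zip(entries, entries[1:]):
        gaps.append(((t1 - t0).total_seconds(), tag, msg))
    gaps.sort(key=lambda gap: gap[0], reverse=True)
    return gaps[:top]


if __name__ == "__main__":
    for secs, tag, msg in largest_gaps(parse_entries(SAMPLE)):
        print("{:5.0f}s after [{}] {}".format(secs, tag, msg))
```

On the excerpt, the largest gaps are 62 seconds in the process status checks
and 20 seconds waiting for the broccoli response. Because the excerpt elides
many lines, these are gaps between the lines shown, not per-command timings.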

Johanna


Re: [Bro-Dev] [JIRA] (BIT-1353) BroCtl status/top take excessive amount of time

2015-03-23 Thread Johanna Amann
On Mon, Mar 23, 2015 at 04:15:12PM -0500, Daniel Thayer wrote:
> When you do a broctl status, does it show a status line for every Bro
> node in your cluster?

Yes, it does. At least I think so, the number is quite large :)

> How are you running broctl status:
> 1) just by typing "broctl status", or
> 2) by running "broctl", then type the "status" command at the BroControl
> prompt.

I run broctl first and then type status.

> When you run "broctl status", it must establish an ssh session to
> every remote machine, which could take awhile when there are 28
> machines.  However, when you run just "broctl", then type "status"
> at the BroControl prompt, it keeps the ssh sessions open, so the 2nd
> time you type "status" should be faster than the 1st time (because
> the 2nd time it doesn't need to do the ssh connections).

There does not seem to be a big speed difference between the first time
and the second time status is run.

Johanna