michael goulish created DISPATCH-2173:
-----------------------------------------

             Summary: 30-Mesh Behaving Badly
                 Key: DISPATCH-2173
                 URL: https://issues.apache.org/jira/browse/DISPATCH-2173
             Project: Qpid Dispatch
          Issue Type: Bug
          Components: Router Node
            Reporter: michael goulish
            Assignee: michael goulish


While testing scale-up of full-mesh networks, I encountered some Bad Behavior
at 30 nodes (435 connections).

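For anyone checking the 435 figure: a full mesh has one connection per pair of
routers, so n routers need n*(n-1)/2 connections. A quick sketch (the function
name is mine, not anything from the router code):

```python
def full_mesh_connections(routers: int) -> int:
    """Number of router-to-router connections in a full mesh (one per pair)."""
    return routers * (routers - 1) // 2


for n in (15, 20, 30):
    print(n, full_mesh_connections(n))  # 15 -> 105, 20 -> 190, 30 -> 435
```
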
On my first try, 15 of the routers died.

On my second try, no nodes died, but the network never converged. It consumed
all available CPU (32 cores) for three minutes, and the 30 routers had printed
a combined total of more than 1000 radius calculations to their logs by the
time I became wrathful and cast them all into the Bitbucket of Woe.

For reference, those radius calculations are how I decide that the network has
converged: every router has settled down, agreed on the topology, and stopped
talking about it. The last thing each router prints to its log is a radius
calculation, and then it's done. This may happen multiple times for each
router, but when the total number of such prints across all routers stops
changing, the network has converged.

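The check described above can be sketched roughly as follows. This is only an
illustration under assumptions, not the script actually used: the log-line
pattern, the *.log file naming, and all function names here are hypothetical
and would need adjusting to the real router logs.

```python
import re
from pathlib import Path

# Assumption: a radius-calculation print can be recognized by the word
# "radius" in the log line. Replace with the real log text as needed.
RADIUS_PATTERN = re.compile(r"radius", re.IGNORECASE)


def count_radius_lines(lines):
    """Count the lines that look like radius-calculation prints."""
    return sum(1 for line in lines if RADIUS_PATTERN.search(line))


def total_radius_prints(log_dir):
    """Sum the radius-calculation prints over every *.log file in log_dir."""
    total = 0
    for path in Path(log_dir).glob("*.log"):
        with path.open(errors="replace") as log_file:
            total += count_radius_lines(log_file)
    return total


def has_converged(samples, stable_polls=3):
    """True when the last `stable_polls` sampled totals are identical."""
    if len(samples) < stable_polls:
        return False
    return len(set(samples[-stable_polls:])) == 1
```

In use, one would call total_radius_prints() every few seconds, append the
result to a list, and stop once has_converged() returns True. A total that
keeps climbing, as it did in the second 30-node run, never satisfies the check.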
 

With 15 or 20 routers, the total number of such prints was around 20 to 40.
When this test exceeded that by 25x, I decided it was never going to quit.

...Now looking at the logs to see if I can figure out what was happening...

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
