bradh352 opened a new issue, #11710:
URL: https://github.com/apache/cloudstack/issues/11710

   ### problem
   
   I've got a fairly slow test cluster that I've been mocking up a CloudStack 
deployment on (8-node Supermicro MicroCloud with Xeon-D processors and 64G RAM 
each).  It's running hyperconverged with Ceph, KVM, and CloudStack manager 
nodes (manager nodes are only on 3 of the members); all are interconnected 
with VXLAN-EVPN on dual 25G Mellanox ConnectX-4 NICs.
   
   During high-load events, for example when I bring up a bunch of network 
tiers in a VPC and also provision 8-20 VM Instances (all with Terraform), 
I'll notice both VPC Virtual Routers go to PRIMARY instead of one being PRIMARY 
and the other being BACKUP, as is normally the case.  
   
   This is really bad because the VIP is then owned by both Virtual Routers, 
which means traffic is getting dropped like crazy and the entire VPC becomes 
unusable.  Restarting the VPC or killing one of the VRs recovers it to a good 
state.
   
   I understand that if the VRs get starved they end up not responding and the 
backup then thinks it should become primary, but realistically, when load 
reduces again, the VRs should recognize this and one of them should demote 
itself back to BACKUP state.
   
   I haven't looked under the covers of what is being used, but I assume it is 
something like keepalived and this feels like it is a configuration issue of 
some sort.
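   
   To make the expected recovery concrete, here is a minimal sketch of a 
VRRP-style election (VRRP is the protocol keepalived implements).  It is not 
taken from CloudStack or keepalived; the `Router` class, the router names, and 
the priority values are all hypothetical, and it assumes the two routers have 
distinct priorities.  The point is only that once two PRIMARY routers can hear 
each other's advertisements again, the lower-priority one should step down.

```python
from dataclasses import dataclass

PRIMARY = "PRIMARY"
BACKUP = "BACKUP"


@dataclass
class Router:
    """Hypothetical model of one redundant virtual router."""
    name: str
    priority: int          # assumption: higher priority wins, no ties
    state: str = BACKUP

    def on_advert_timeout(self) -> None:
        # No advertisement heard from the peer in time (e.g. the peer was
        # starved of CPU): assume it is gone and take over the VIP.
        self.state = PRIMARY

    def on_advert(self, peer_priority: int) -> None:
        # An advertisement arrived from a peer that claims PRIMARY.
        # A PRIMARY that hears a higher-priority PRIMARY steps down.
        if self.state == PRIMARY and peer_priority > self.priority:
            self.state = BACKUP


def exchange_adverts(a: Router, b: Router) -> None:
    # Snapshot who is advertising first so the result does not depend on
    # the order in which the two advertisements are delivered.
    a_advertises = a.state == PRIMARY
    b_advertises = b.state == PRIMARY
    if a_advertises:
        b.on_advert(a.priority)
    if b_advertises:
        a.on_advert(b.priority)


if __name__ == "__main__":
    r1 = Router("r-1-VM", priority=100, state=PRIMARY)  # hypothetical names
    r2 = Router("r-2-VM", priority=90)

    r2.on_advert_timeout()       # load spike: r2 misses r1's advertisements
    print(r1.state, r2.state)    # both claim PRIMARY (the reported symptom)

    exchange_adverts(r1, r2)     # load drops: advertisements flow again
    print(r1.state, r2.state)    # the lower-priority router has stepped down
```

   If the routers never receive each other's advertisements after the load 
drops, or are configured with equal priority and no tie-breaker, both would 
stay PRIMARY in this model, which matches what I'm seeing.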
   
   ### versions
   
   4.21.0
   
   ### The steps to reproduce the bug
   
   Use Terraform to create the VPC, network tiers, and VM Instances.
   Terraform configuration being used is here: 
https://github.com/bradh352/terraform-config
   
   
   ### What to do about it?
   
   _No response_


