jeanvetorello commented on issue #12621:
URL: https://github.com/apache/cloudstack/issues/12621#issuecomment-3877420125

   @weizhouapache Sure, here are the logs and reproduction details.
   
   ### Why this is a major issue
   
   I consider this a major issue because it directly causes **VPC unavailability in production**. The scenario is:
   
   1. You add a second public IP range (on a different VLAN/VXLAN) to a VPC. This works fine and the VPC operates normally
   2. However, if the VPC router happens to **reboot** (e.g., host migration, restart with cleanup, or any event that triggers router re-deployment), the router **fails to come back up**
   3. The VPC becomes **completely unavailable**: all VMs inside lose connectivity
   4. The **only workaround** to recover is to remove public IP ranges until only one remains in the VPC, then restart
   
   This is particularly dangerous because the failure doesn't happen when adding the IP range; at that point everything looks fine. It's a **ticking time bomb** that only triggers on the next router restart/redeployment.
   
   I've applied a patch to my production environment (CloudStack 4.21.0.0) and it resolved the issue. A PR with the fix has been submitted.
   
   ---
   
   ### Reproduction (cloudstack-simulator 4.21.0.0)
   
   **Steps:**
   1. Create a VPC with a tier network
   2. Deploy a VM in the tier (so the VPC router is created)
   3. The `networks` table ends up with comma-separated CIDRs for the VPC's public/control network:
   
   ```sql
   mysql> SELECT id, name, cidr, gateway FROM networks WHERE removed IS NULL;
   
   +-----+----------+-----------------------------------+-------------------------+
   | id  | name     | cidr                              | gateway                 |
   +-----+----------+-----------------------------------+-------------------------+
   | 200 | NULL     | 192.168.2.0/24,160.1.0.0/24       | 192.168.2.1,160.1.0.1   |
   | 204 | test-sub | 10.0.0.0/24                       | 10.0.0.1                |
   +-----+----------+-----------------------------------+-------------------------+
   ```
   
   4. Restart the VPC with `cleanup=true` → **FAILS**
   
   ### API Error Response
   
   ```json
   {
       "queryasyncjobresultresponse": {
           "cmd": "org.apache.cloudstack.api.command.user.vpc.RestartVPCCmd",
           "jobresultcode": 530,
           "jobresult": {
               "errorcode": 530,
               "errortext": "cidr is not formatted correctly: 192.168.2.0/24,160.1.0.0/24"
           },
           "jobstatus": 2
       }
   }
   ```
   
   ### Management Server Stack Trace (vmops.log)
   
   ```
   2026-02-10 12:36:49,746 ERROR [c.c.a.ApiAsyncJobDispatcher] (API-Job-Executor-1:[ctx-a55fcee4, job-44]) (logid:036c8e2e)
   Unexpected exception while executing org.apache.cloudstack.api.command.user.vpc.RestartVPCCmd
   com.cloud.utils.exception.CloudRuntimeException: cidr is not formatted correctly: 192.168.2.0/24,160.1.0.0/24
        at com.cloud.utils.net.NetUtils.cidrToLong(NetUtils.java:911)
        at com.cloud.utils.net.NetUtils.isNetworkAWithinNetworkB(NetUtils.java:894)
        at com.cloud.network.dao.NetworkVO.equals(NetworkVO.java:603)
        at java.base/java.util.HashMap.getNode(HashMap.java:568)
        at java.base/java.util.LinkedHashMap.get(LinkedHashMap.java:440)
        at com.cloud.network.router.VpcNetworkHelperImpl.reallocateRouterNetworks(VpcNetworkHelperImpl.java:162)
        at com.cloud.network.router.NetworkHelperImpl.deployRouterWithTemplates(NetworkHelperImpl.java:542)
        at com.cloud.network.router.NetworkHelperImpl.deployRouter(NetworkHelperImpl.java:602)
        at org.apache.cloudstack.network.router.deployment.VpcRouterDeploymentDefinition.deployAllVirtualRouters(VpcRouterDeploymentDefinition.java:195)
        at org.apache.cloudstack.network.router.deployment.RouterDeploymentDefinition.executeDeployment(RouterDeploymentDefinition.java:393)
        at org.apache.cloudstack.network.router.deployment.RouterDeploymentDefinition.findOrDeployVirtualRouter(RouterDeploymentDefinition.java:255)
        at org.apache.cloudstack.network.router.deployment.VpcRouterDeploymentDefinition.findOrDeployVirtualRouter(VpcRouterDeploymentDefinition.java:157)
        at org.apache.cloudstack.network.router.deployment.RouterDeploymentDefinition.deployVirtualRouter(RouterDeploymentDefinition.java:221)
        at com.cloud.network.element.VpcVirtualRouterElement.implementVpc(VpcVirtualRouterElement.java:165)
        at com.cloud.network.vpc.VpcManagerImpl.startVpc(VpcManagerImpl.java:2039)
        at com.cloud.network.vpc.VpcManagerImpl.rollingRestartVpc(VpcManagerImpl.java:3633)
        at com.cloud.network.vpc.VpcManagerImpl.restartVpc(VpcManagerImpl.java:2422)
        at com.cloud.network.vpc.VpcManagerImpl.restartVpc(VpcManagerImpl.java:2382)
        ...
        at org.apache.cloudstack.api.command.user.vpc.RestartVPCCmd.execute(RestartVPCCmd.java:94)
   ```
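
The trace shows the exception surfacing from a map lookup (`LinkedHashMap.get` → `HashMap.getNode` → `NetworkVO.equals`). A minimal, self-contained sketch of that mechanism follows. `NetworkKey`, `requireSingleCidr`, and `lookupFailure` are hypothetical names invented for illustration and are not CloudStack code; the sketch only assumes that `equals()` validates the CIDR string, as the trace suggests.

```java
import java.util.HashMap;
import java.util.Map;

final class EqualsThrowsSketch {

    /** Hypothetical map key; a stand-in for illustration, not CloudStack's NetworkVO. */
    static final class NetworkKey {
        final long id;
        final String cidr;

        NetworkKey(long id, String cidr) {
            this.id = id;
            this.cidr = cidr;
        }

        @Override
        public int hashCode() {
            return Long.hashCode(id);
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof NetworkKey)) {
                return false;
            }
            NetworkKey other = (NetworkKey) o;
            // Validation inside equals(): a malformed value turns a lookup into a failure.
            requireSingleCidr(cidr);
            requireSingleCidr(other.cidr);
            return id == other.id && cidr.equals(other.cidr);
        }
    }

    /** Simplified check (assumption): reject anything that is not a single CIDR. */
    static void requireSingleCidr(String cidr) {
        if (cidr == null || cidr.contains(",")) {
            throw new IllegalArgumentException("cidr is not formatted correctly: " + cidr);
        }
    }

    /** Returns the exception message raised by the lookup, or null if the lookup succeeded. */
    static String lookupFailure(Map<NetworkKey, String> map, NetworkKey probe) {
        try {
            map.get(probe);
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        Map<NetworkKey, String> nics = new HashMap<>();
        nics.put(new NetworkKey(200, "192.168.2.0/24,160.1.0.0/24"), "public nic");
        // A distinct key object with the same hash makes the map call equals().
        NetworkKey probe = new NetworkKey(200, "192.168.2.0/24,160.1.0.0/24");
        System.out.println(lookupFailure(nics, probe));
    }
}
```

Inserting the key succeeds because no comparison is needed for an empty bucket, which matches the observation that nothing fails until a later lookup.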
   
   ### Root Cause
   
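   The stack trace shows `NetworkVO.equals()` passing the network's `cidr` to `NetUtils.isNetworkAWithinNetworkB()`, which calls `NetUtils.cidrToLong()`. When the `cidr` column holds a comma-separated list such as `192.168.2.0/24,160.1.0.0/24`, `cidrToLong()` rejects it as malformed and the exception aborts router deployment.

   The fix in the PR wraps the `cidr` values with `StringUtils.getFirstValueFromCommaSeparatedString()` before passing them to `NetUtils.isNetworkAWithinNetworkB()` in `NetworkVO.equals()`.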
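
The idea behind the fix can be sketched as follows. This is not the patch itself: `firstValue` is a hypothetical stand-in for `StringUtils.getFirstValueFromCommaSeparatedString()`, and `parseCidr` / `isWithin` are simplified IPv4-only stand-ins written for this example rather than the `NetUtils` implementations.

```java
final class FirstCidrSketch {

    /** Hypothetical stand-in for the helper used in the PR: the text before the first comma. */
    static String firstValue(String commaSeparated) {
        if (commaSeparated == null) {
            return null;
        }
        int comma = commaSeparated.indexOf(',');
        String first = comma < 0 ? commaSeparated : commaSeparated.substring(0, comma);
        return first.trim();
    }

    static IllegalArgumentException bad(String cidr) {
        return new IllegalArgumentException("cidr is not formatted correctly: " + cidr);
    }

    /** Strict IPv4 parser: returns {network address, prefix length} or throws on malformed input. */
    static long[] parseCidr(String cidr) {
        String[] parts = cidr.split("/");
        if (parts.length != 2) {
            throw bad(cidr); // a comma-separated list yields more than two parts
        }
        String[] octets = parts[0].split("\\.");
        if (octets.length != 4) {
            throw bad(cidr);
        }
        long address = 0;
        int prefix;
        try {
            for (String octet : octets) {
                int value = Integer.parseInt(octet);
                if (value < 0 || value > 255) {
                    throw bad(cidr);
                }
                address = (address << 8) | value;
            }
            prefix = Integer.parseInt(parts[1]);
        } catch (NumberFormatException e) {
            throw bad(cidr);
        }
        if (prefix < 0 || prefix > 32) {
            throw bad(cidr);
        }
        return new long[] {address, prefix};
    }

    /** True when network A lies inside network B. */
    static boolean isWithin(String cidrA, String cidrB) {
        long[] a = parseCidr(cidrA);
        long[] b = parseCidr(cidrB);
        int prefixB = (int) b[1];
        long maskB = prefixB == 0 ? 0L : (0xFFFFFFFFL << (32 - prefixB)) & 0xFFFFFFFFL;
        return a[1] >= b[1] && (a[0] & maskB) == (b[0] & maskB);
    }

    public static void main(String[] args) {
        String stored = "192.168.2.0/24,160.1.0.0/24"; // value from the networks table above
        // Passing `stored` directly would throw; reducing it to its first CIDR does not.
        System.out.println(isWithin(firstValue(stored), firstValue(stored)));
    }
}
```

One consequence of this approach, at least in the sketch, is that only the first range takes part in the comparison; the additional ranges are ignored rather than validated.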
   

