Whatsonyourmind commented on issue #12217:
URL: https://github.com/apache/apisix/issues/12217#issuecomment-4181049380

The root cause is that round-robin and other static weighted algorithms only balance *new* connections; they have no visibility into the load from existing ones. After scaling from 2 to 3 nodes, the original nodes keep their long-lived WebSocket connections. The new node only receives its share of new connections, so it stays far less loaded than the other two.
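To see why the imbalance persists, here is a toy bash simulation with hypothetical numbers. Two nodes hold 5000 long-lived connections each when an empty third node is added, and then 3000 new connections arrive. `round_robin` places them in strict rotation. `least_conn` is a least-connections picker added only for contrast, because it looks at current counts. Both function names are illustrative, not APISIX code.

```bash
# Toy simulation (hypothetical numbers): conns holds the open WebSocket
# connections per node right after scaling from 2 to 3 nodes.

# Place $1 new connections in strict rotation, ignoring existing load.
round_robin() {
  local i idx
  for ((i = 0; i < $1; i++)); do
    idx=$(( i % ${#conns[@]} ))
    conns[idx]=$(( conns[idx] + 1 ))
  done
}

# Place $1 new connections one at a time on the node with the fewest
# open connections (ties go to the lowest index).
least_conn() {
  local i j best
  for ((i = 0; i < $1; i++)); do
    best=0
    for ((j = 1; j < ${#conns[@]}; j++)); do
      if (( conns[j] < conns[best] )); then best=$j; fi
    done
    conns[best]=$(( conns[best] + 1 ))
  done
}

conns=(5000 5000 0); round_robin 3000
echo "round-robin: ${conns[*]}"   # round-robin: 6000 6000 1000
conns=(5000 5000 0); least_conn 3000
echo "least-conn:  ${conns[*]}"   # least-conn:  5000 5000 3000
```

Round-robin leaves the new node with 1000 connections against 6000 on each original node. The load-aware picker sends all 3000 to it.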
   
Two things are needed:
1. **Adaptive weight adjustment** that accounts for the existing connections on each node
2. **Connection rebalancing signals** that tell the gateway when to gracefully close connections on overloaded nodes, so that clients reconnect to less loaded ones
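As a much simpler stand-in for (1), and not the bandit approach described next, weights can be derived directly from connection counts. This sketch assumes a hypothetical "deficit-to-mean" rule. Each node's weight is how far it sits below the per-node target ceil(total / n), plus 1 so that every node keeps a nonzero weight. `load_weights` is an illustrative name.

```bash
# Hypothetical "deficit-to-mean" weighting, a simple stand-in for (1);
# this is NOT Thompson sampling or any OraClaw behaviour.
# Usage: load_weights CONNS...   (prints one integer weight per node)
load_weights() {
  local n=$# total=0 target x deficit
  local -a out=()
  if (( n == 0 )); then return 1; fi
  # Per-node target: ceil(total / n) in integer arithmetic.
  for x in "$@"; do total=$(( total + x )); done
  target=$(( (total + n - 1) / n ))
  for x in "$@"; do
    deficit=$(( target - x ))
    if (( deficit < 0 )); then deficit=0; fi
    # +1 keeps every node routable even when it is at or above target.
    out+=( "$(( deficit + 1 ))" )
  done
  echo "${out[*]}"
}

load_weights 5000 4800 200    # prints: 1 1 3135
load_weights 3334 3333 3333   # prints: 1 2 2
```

With the counts 5000/4800/200, node-3 gets weight 3135 against 1 for each original node. Once every node is at or above the target, all weights become 1, which is plain round-robin again.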
   
For (1), a multi-armed bandit approach can adjust routing weights in real time from each node's actual load (connection count plus CPU and memory usage). Run this check every 30 seconds:
   
```bash
curl -X POST https://oraclaw-api.onrender.com/api/v1/optimize/bandit \
  -H "Content-Type: application/json" \
  -d '{
    "options": ["node-1", "node-2", "node-3"],
    "strategy": "thompson_sampling",
    "context": {
      "node-1": {"ws_connections": 5000, "cpu_percent": 72, "memory_percent": 65},
      "node-2": {"ws_connections": 4800, "cpu_percent": 68, "memory_percent": 60},
      "node-3": {"ws_connections": 200, "cpu_percent": 12, "memory_percent": 15}
    }
  }'
# Returns: {"selected": "node-3", "weights": {"node-1": 5, "node-2": 8, "node-3": 87}}
```
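The comment does not show how the returned weights reach the gateway. One option, sketched here under assumptions, is the APISIX Admin API. The address 127.0.0.1:9180 (the 3.x default), the `APISIX_ADMIN_KEY` variable, the upstream id and the node addresses are all hypothetical. The merge behaviour of `PATCH` on `nodes` should be checked against the Admin API docs for your version. `nodes_payload` and `apply_weights` are illustrative helpers.

```bash
# Build {"nodes": {"host:port": weight, ...}} from "host:port=weight" args.
nodes_payload() {
  local pair sep="" body=""
  for pair in "$@"; do
    body+="${sep}\"${pair%%=*}\": ${pair#*=}"
    sep=", "
  done
  printf '{"nodes": {%s}}\n' "$body"
}

# PATCH the listed nodes' weights on upstream $1 (assumed APISIX 3.x Admin
# API on its default port; verify PATCH merge semantics for your version).
apply_weights() {
  local upstream_id=$1
  shift
  curl -sS -X PATCH \
    "http://127.0.0.1:9180/apisix/admin/upstreams/${upstream_id}" \
    -H "X-API-KEY: ${APISIX_ADMIN_KEY:?set APISIX_ADMIN_KEY}" \
    -H "Content-Type: application/json" \
    -d "$(nodes_payload "$@")"
}
```

For example, `apply_weights 1 10.0.0.1:8080=5 10.0.0.2:8080=8 10.0.0.3:8080=87` would push the weights from the sample response to upstream `1`. Weights only affect where *new* connections go, and existing WebSocket connections stay where they are.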
   
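The algorithm sees that node-3 is underloaded and gives it most of the weight (87 of 100 in this response), so nearly all new connections go there until load equalizes. As connections churn (WebSocket clients disconnect and reconnect), the distribution converges toward balance. How fast this happens depends on the reconnect rate.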
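The convergence claim can be checked with a toy model. Its assumptions are hypothetical. In each interval every node loses 10% of its clients to reconnects (integer division). Each reconnect is placed on the currently least-loaded node, which stands in for the bandit-weighted routing; this is not Thompson sampling. The model starts from the counts in the example request.

```bash
# Toy churn model with hypothetical parameters: each interval every node
# loses 10% of its clients (integer division) to reconnects, and each
# reconnect goes to the currently least-loaded node (lowest index on ties).
churn_round() {
  local i j best drop pool=0
  # Disconnect 10% of each node's clients into a reconnect pool.
  for ((i = 0; i < ${#conns[@]}; i++)); do
    drop=$(( conns[i] / 10 ))
    conns[i]=$(( conns[i] - drop ))
    pool=$(( pool + drop ))
  done
  # Re-place every reconnecting client, one at a time.
  for ((i = 0; i < pool; i++)); do
    best=0
    for ((j = 1; j < ${#conns[@]}; j++)); do
      if (( conns[j] < conns[best] )); then best=$j; fi
    done
    conns[best]=$(( conns[best] + 1 ))
  done
}

conns=(5000 4800 200)   # counts from the example request
for round in 1 2 3 4 5; do
  churn_round
  echo "after round $round: ${conns[*]}"
done
```

Under these assumptions the counts are 4500/4320/1180 after round 1 and reach 3334/3333/3333 by round 4. With a lower reconnect rate convergence is slower, and with no churn at all the existing imbalance never changes.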
   
For (2), you can use the constraint solver to work out how many existing connections to close gracefully on each node for the fastest rebalancing:
   
```bash
curl -X POST https://oraclaw-api.onrender.com/api/v1/solve/constraints \
  -H "Content-Type: application/json" \
  -d '{
    "objective": "minimize_rebalance_time",
    "constraints": {"max_concurrent_migrations": 100, "target_balance_ratio": 0.95},
    "nodes": [
      {"id": "node-1", "connections": 5000},
      {"id": "node-2", "connections": 4800},
      {"id": "node-3", "connections": 200}
    ]
  }'
```
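A rough version of the plan in (2) can be computed without a solver. This sketch assumes a hypothetical rule: close every connection above the per-node target ceil(total / n), with at most 100 closures per check interval across all nodes (the `max_concurrent_migrations` value from the request above). `drain_plan` is an illustrative name, and this is not the solver endpoint's behaviour.

```bash
# Hypothetical drain plan, NOT the solver endpoint: close every connection
# above the per-node target ceil(total / n), with at most MAX_PER_ROUND
# closures per check interval across all nodes.
# Usage: drain_plan MAX_PER_ROUND CONNS...
drain_plan() {
  local max_per_round=$1
  shift
  local n=$# total=0 target x excess sum=0
  local -a out=()
  if (( n == 0 || max_per_round <= 0 )); then return 1; fi
  for x in "$@"; do total=$(( total + x )); done
  target=$(( (total + n - 1) / n ))
  for x in "$@"; do
    excess=$(( x - target ))
    if (( excess < 0 )); then excess=0; fi
    out+=( "$excess" )
    sum=$(( sum + excess ))
  done
  echo "close per node: ${out[*]}"
  # Ceiling division: intervals needed to close all excess connections.
  echo "rounds needed: $(( (sum + max_per_round - 1) / max_per_round ))"
}

drain_plan 100 5000 4800 200
```

For the example counts it prints 1666 and 1466 closures for node-1 and node-2 and 32 rounds, about 16 minutes at a 30-second interval. The closed clients only land on node-3 if routing already favours it, so this has to be paired with the weighting from (1).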
   
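The free tier is 25 calls/day, and $9/mo buys 10K calls. A 30-second interval works out to 2,880 calls/day (about 86,400 per 30-day month), so the free tier covers roughly one call per hour and the 10K tier roughly one call every 4.3 minutes. [OraClaw API](https://oraclaw-api.onrender.com)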


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
