Whatsonyourmind commented on issue #12217: URL: https://github.com/apache/apisix/issues/12217#issuecomment-4181049380
The root cause here is that round-robin and weighted algorithms only balance *new* connections -- they have no visibility into existing connection load. After scaling from 2 to 3 nodes, the original nodes keep their long-lived WebSocket connections while the new node sits idle.

Two things are needed:

1. **Adaptive weight adjustment** that accounts for existing connections per node
2. **Connection rebalancing signals** that tell the gateway when to gracefully migrate connections

For (1), a multi-armed bandit approach can adjust routing weights in real time based on each node's actual load (connection count + CPU/memory). Run this check every 30 seconds:

```bash
curl -X POST https://oraclaw-api.onrender.com/api/v1/optimize/bandit \
  -H "Content-Type: application/json" \
  -d '{
    "options": ["node-1", "node-2", "node-3"],
    "strategy": "thompson_sampling",
    "context": {
      "node-1": {"ws_connections": 5000, "cpu_percent": 72, "memory_percent": 65},
      "node-2": {"ws_connections": 4800, "cpu_percent": 68, "memory_percent": 60},
      "node-3": {"ws_connections": 200, "cpu_percent": 12, "memory_percent": 15}
    }
  }'
# Returns: {"selected": "node-3", "weights": {"node-1": 5, "node-2": 8, "node-3": 87}}
```

The algorithm sees that node-3 is underloaded and directs most new connections there (a weight of 87 out of 100 in this example) until load equalizes. As connections naturally churn (WebSocket reconnects), the distribution converges toward balance over time.

For (2), you can use the constraint solver to determine which existing connections to gracefully close for fastest rebalancing:

```bash
curl -X POST https://oraclaw-api.onrender.com/api/v1/solve/constraints \
  -H "Content-Type: application/json" \
  -d '{
    "objective": "minimize_rebalance_time",
    "constraints": {"max_concurrent_migrations": 100, "target_balance_ratio": 0.95},
    "nodes": [
      {"id": "node-1", "connections": 5000},
      {"id": "node-2", "connections": 4800},
      {"id": "node-3", "connections": 200}
    ]
  }'
```

The free tier allows 25 calls/day, which is enough for roughly one check per hour.
The $9/mo plan includes 10K calls. At a 30-second interval that is about 83 hours of continuous polling, so production rebalancing over a full month needs a longer interval or a larger quota.

[OraClaw API](https://oraclaw-api.onrender.com)
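
For point (1), the idea of weighting new connections by existing load can also be sketched locally, without any external service. The function below is a minimal illustration and not the bandit algorithm behind the API above: it is a plain inverse-load heuristic that gives each node a weight proportional to its remaining headroom. The name `headroom_weights` and the per-node capacity of 6000 connections are assumptions made for this example; only the connection counts come from the comment.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of APISIX or any API mentioned above).
# Usage: headroom_weights CAPACITY COUNT...
# Prints one integer weight per node: that node's share of the total
# remaining headroom, as a percentage, with a floor of 1 so that no node
# is ever fully excluded from new connections.
headroom_weights() {
  local capacity=$1; shift
  local total=0 c h w
  local -a headroom=()

  # Headroom = assumed capacity minus current connections, never negative.
  for c in "$@"; do
    h=$(( capacity - c ))
    (( h < 0 )) && h=0
    headroom+=("$h")
    total=$(( total + h ))
  done

  for h in "${headroom[@]}"; do
    if (( total == 0 )); then
      w=1                          # every node is full: fall back to equal weights
    else
      w=$(( h * 100 / total ))     # integer percentage of total headroom
      (( w < 1 )) && w=1
    fi
    echo "$w"
  done
}

# Connection counts from the example: node-1, node-2, node-3.
headroom_weights 6000 5000 4800 200
```

With the assumed capacity, the three nodes get weights 12, 15, and 72, so most new connections go to node-3 while the older nodes still receive some. Weights like these could then be written back to the upstream's node weights, for example through the APISIX Admin API, and recomputed as connections churn.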

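
For point (2), the size of the rebalancing job can be estimated with simple arithmetic before asking any solver. The sketch below assumes that "balanced" means an even share of all connections per node and that `max_concurrent_migrations` is the number of connections closed per round; both are interpretations, not confirmed behavior of the constraint solver above, and `target_balance_ratio` is ignored. `drain_plan` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of APISIX or any API mentioned above).
# Usage: drain_plan BATCH COUNT...
# For each node, prints how many connections it holds above the even share
# and how many rounds of BATCH graceful closes it takes to shed them.
drain_plan() {
  local batch=$1; shift
  (( $# > 0 )) || return 1
  local total=0 c i=1 target excess rounds

  for c in "$@"; do
    total=$(( total + c ))
  done
  target=$(( total / $# ))         # even share per node (integer division)

  for c in "$@"; do
    excess=$(( c - target ))
    (( excess < 0 )) && excess=0   # underloaded nodes shed nothing
    rounds=$(( (excess + batch - 1) / batch ))   # ceiling division
    echo "node-$i excess=$excess rounds=$rounds"
    i=$(( i + 1 ))
  done
}

# Connection counts from the example, 100 closes per round.
drain_plan 100 5000 4800 200
```

For the example numbers the even share is 3333 connections, so node-1 would need to shed 1667 connections over 17 rounds and node-2 would shed 1467 over 15 rounds. Closed clients reconnect as new connections, so this only rebalances if the routing weights already favor node-3.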