loth opened a new issue, #10598:
URL: https://github.com/apache/cloudstack/issues/10598
### problem
Our CS environment has two clusters, one windows and one linux. We have host
tags that allocate where VM's are placed. We are facing an issue when setting
host in maintenance that sometimes it will just fall to error state. Upon
checking the logs I found that while it checked the first cluster (ID 19,
Windows) however it never tried any hosts in the Linux cluster (ID 16).
Here is the log from the management server. As we can see, the allocator
tried to find resources in cluster 19, failed to find any with the Linux tag,
then failed out before trying the other cluster.
`
2025-03-20 20:04:13,017 DEBUG [c.c.v.VmWorkJobDispatcher]
(Work-Job-Executor-145:[ctx-2ec22277, job-3900296/job-3901388])
(logid:30d64f64) Run VM work job: com.cloud.vm.VmWorkMigrateAway for VM 8127,
job origin: 3900296
2025-03-20 20:04:13,018 DEBUG [c.c.v.VmWorkJobHandlerProxy]
(Work-Job-Executor-145:[ctx-2ec22277, job-3900296/job-3901388, ctx-e309f975])
(logid:30d64f64) Execute VM work job:
com.cloud.vm.VmWorkMigrateAway{"srcHostId":982,"userId":1,"accountId":1,"vmId":8127,"handlerName":"VirtualMachineManagerImpl"}
2025-03-20 20:04:13,031 INFO [c.c.v.ClusteredVirtualMachineManagerImpl]
(Work-Job-Executor-145:[ctx-2ec22277, job-3900296/job-3901388, ctx-e309f975])
(logid:30d64f64) Searching for hosts in the zone for vm migration
2025-03-20 20:04:13,034 DEBUG [c.c.d.DeploymentPlanningManagerImpl]
(Work-Job-Executor-145:[ctx-2ec22277, job-3900296/job-3901388, ctx-e309f975])
(logid:30d64f64) Trying to deploy VM [error decoding VM instance
{"id":8127,"instanceName":"i-10989-8127-VM","type":"User","uuid":"ce0c1fb6-65ef-4942-97de-05b255fdcd7d"}]
and details: Plan
[{"_dcId":14,"_poolId":637,"_recreateDisks":false,"preferredHostIds":[],"migrationPlan":true,"hostPriorities":{}}];
avoid list [{"_clusterIds":[24],"_hostIds":[982]}] and planner: [].
2025-03-20 20:04:13,037 DEBUG [c.c.d.DeploymentPlanningManagerImpl]
(Work-Job-Executor-145:[ctx-2ec22277, job-3900296/job-3901388, ctx-e309f975])
(logid:30d64f64) Checking non dedicated resources to deploy VM
[{"instanceName":"i-10989-8127-VM","type":"User","uuid":"ce0c1fb6-65ef-4942-97de-05b255fdcd7d"}].
2025-03-20 20:04:13,046 DEBUG [c.c.d.DeploymentPlanningManagerImpl]
(Work-Job-Executor-145:[ctx-2ec22277, job-3900296/job-3901388, ctx-e309f975])
(logid:30d64f64) Adding pods [[]], clusters [[]] and hosts [[]] to the avoid
list in the deploy process of user VM ["ce0c1fb6-65ef-4942-97de-05b255fdcd7d"],
because this VM is not explicitly dedicated to these components.
2025-03-20 20:04:13,048 DEBUG [o.a.c.u.r.ReflectionToStringBuilderUtils]
(Work-Job-Executor-145:[ctx-2ec22277, job-3900296/job-3901388, ctx-e309f975])
(logid:30d64f64) Object is null, not reflecting it.
2025-03-20 20:04:13,049 DEBUG [o.a.c.u.r.ReflectionToStringBuilderUtils]
(Work-Job-Executor-145:[ctx-2ec22277, job-3900296/job-3901388, ctx-e309f975])
(logid:30d64f64) Object is null, not reflecting it.
2025-03-20 20:04:13,049 DEBUG [c.c.d.DeploymentPlanningManagerImpl]
(Work-Job-Executor-145:[ctx-2ec22277, job-3900296/job-3901388, ctx-e309f975])
(logid:30d64f64) Trying to allocate a host and storage pools from datacenter
[{"name":"CS12","uuid":"81e07cf1-e395-4f5e-aa7e-4e9d80aa3cfb"}], pod [null],
cluster [null], to deploy VM
[{"instanceName":"i-10989-8127-VM","type":"User","uuid":"ce0c1fb6-65ef-4942-97de-05b255fdcd7d"}]
with requested CPU [24000] and requested RAM [(32.00 GB) 34359738368].
2025-03-20 20:04:13,050 DEBUG [c.c.d.DeploymentPlanningManagerImpl]
(Work-Job-Executor-145:[ctx-2ec22277, job-3900296/job-3901388, ctx-e309f975])
(logid:30d64f64) ROOT volume [69d114a8-5673-4ad7-a387-2ac0d11cf3a1] is ready to
deploy VM [ce0c1fb6-65ef-4942-97de-05b255fdcd7d].
2025-03-20 20:04:13,054 DEBUG [c.c.d.DeploymentPlanningManagerImpl]
(Work-Job-Executor-145:[ctx-2ec22277, job-3900296/job-3901388, ctx-e309f975])
(logid:30d64f64) Adding pods [] to the avoid set because these pods are in the
Disabled state.
2025-03-20 20:04:13,057 DEBUG [c.c.d.DeploymentPlanningManagerImpl]
(Work-Job-Executor-145:[ctx-2ec22277, job-3900296/job-3901388, ctx-e309f975])
(logid:30d64f64) Adding clusters [] of pod [5] to the void set because these
clusters are in the Disabled state.
2025-03-20 20:04:13,060 DEBUG [c.c.d.DeploymentPlanningManagerImpl]
(Work-Job-Executor-145:[ctx-2ec22277, job-3900296/job-3901388, ctx-e309f975])
(logid:30d64f64) Adding hosts [] of datacenter
[81e07cf1-e395-4f5e-aa7e-4e9d80aa3cfb] to the avoid set, because these hosts
are in the Disabled state.
2025-03-20 20:04:13,065 DEBUG [c.c.d.DeploymentPlanningManagerImpl]
(Work-Job-Executor-145:[ctx-2ec22277, job-3900296/job-3901388, ctx-e309f975])
(logid:30d64f64) DeploymentPlan [DataCenterDeployment] has not specified host.
Trying to find another destination to deploy VM
[ce0c1fb6-65ef-4942-97de-05b255fdcd7d], avoiding pods [], clusters [24] and
hosts [982].
2025-03-20 20:04:13,065 DEBUG [c.c.d.DeploymentPlanningManagerImpl]
(Work-Job-Executor-145:[ctx-2ec22277, job-3900296/job-3901388, ctx-e309f975])
(logid:30d64f64) Deploy avoids pods: [], clusters: [24], hosts: [982].
2025-03-20 20:04:13,065 DEBUG [c.c.d.DeploymentPlanningManagerImpl]
(Work-Job-Executor-145:[ctx-2ec22277, job-3900296/job-3901388, ctx-e309f975])
(logid:30d64f64) Deploy hosts with priorities {}, hosts have NORMAL priority by
default
2025-03-20 20:04:13,066 DEBUG [c.c.d.DeploymentPlanningManagerImpl]
(Work-Job-Executor-145:[ctx-2ec22277, job-3900296/job-3901388, ctx-e309f975])
(logid:30d64f64) This VM has last host_id specified, trying to choose the same
host: 982
2025-03-20 20:04:13,069 WARN [c.c.d.DeploymentPlanningManagerImpl]
(Work-Job-Executor-145:[ctx-2ec22277, job-3900296/job-3901388, ctx-e309f975])
(logid:30d64f64) The last host [bf145201-bcab-4634-8750-36b064574bc7] of VM
[ce0c1fb6-65ef-4942-97de-05b255fdcd7d] is in the avoid set. Skipping this and
trying other available hosts.
2025-03-20 20:04:13,069 DEBUG [c.c.d.DeploymentPlanningManagerImpl]
(Work-Job-Executor-145:[ctx-2ec22277, job-3900296/job-3901388, ctx-e309f975])
(logid:30d64f64) Cannot choose the last host to deploy this VM VM instance
{"id":8127,"instanceName":"i-10989-8127-VM","type":"User","uuid":"ce0c1fb6-65ef-4942-97de-05b255fdcd7d"}.
2025-03-20 20:04:13,072 DEBUG [c.c.d.FirstFitPlanner]
(Work-Job-Executor-145:[ctx-2ec22277, job-3900296/job-3901388, ctx-e309f975])
(logid:30d64f64) Searching all possible resources under this Zone: 14
2025-03-20 20:04:13,076 DEBUG [c.c.d.FirstFitPlanner]
(Work-Job-Executor-145:[ctx-2ec22277, job-3900296/job-3901388, ctx-e309f975])
(logid:30d64f64) Listing clusters in order of aggregate capacity, that have (at
least one host with) enough CPU and RAM capacity under this Zone: 14
2025-03-20 20:04:13,084 DEBUG [c.c.d.FirstFitPlanner]
(Work-Job-Executor-145:[ctx-2ec22277, job-3900296/job-3901388, ctx-e309f975])
(logid:30d64f64) Removing from the clusterId list these clusters from avoid
set: [24]
2025-03-20 20:04:13,102 DEBUG [c.c.d.FirstFitPlanner]
(Work-Job-Executor-145:[ctx-2ec22277, job-3900296/job-3901388, ctx-e309f975])
(logid:30d64f64) The clusterId list for the given offering tag: [19, 16]
2025-03-20 20:04:13,106 DEBUG [c.c.d.DeploymentPlanningManagerImpl]
(Work-Job-Executor-145:[ctx-2ec22277, job-3900296/job-3901388, ctx-e309f975])
(logid:30d64f64) Checking resources in Cluster: 19 under Pod: 5
2025-03-20 20:04:13,111 INFO [c.c.a.m.a.i.FirstFitRoutingAllocator]
(Work-Job-Executor-145:[ctx-2ec22277, job-3900296/job-3901388, ctx-e309f975,
FirstFitRoutingAllocator]) (logid:30d64f64) Guest VM is requested with
Custom[UEFI] Boot Type false
2025-03-20 20:04:13,111 DEBUG [c.c.a.m.a.i.FirstFitRoutingAllocator]
(Work-Job-Executor-145:[ctx-2ec22277, job-3900296/job-3901388, ctx-e309f975,
FirstFitRoutingAllocator]) (logid:30d64f64) Looking for hosts in zone [14], pod
[5], cluster [19]
2025-03-20 20:04:13,111 DEBUG [c.c.a.m.a.i.FirstFitRoutingAllocator]
(Work-Job-Executor-145:[ctx-2ec22277, job-3900296/job-3901388, ctx-e309f975,
FirstFitRoutingAllocator]) (logid:30d64f64) Looking for hosts having tag
specified on SvcOffering:generic
2025-03-20 20:04:13,118 DEBUG [c.c.a.m.a.i.FirstFitRoutingAllocator]
(Work-Job-Executor-145:[ctx-2ec22277, job-3900296/job-3901388, ctx-e309f975,
FirstFitRoutingAllocator]) (logid:30d64f64) Hosts with tag 'generic' are:[Host
{"id":919,"name":"node87","type":"Routing","uuid":"e5668da2-862f-4cea-bda8-20707525ecda"},
Host
{"id":1126,"name":"node51","type":"Routing","uuid":"be3eb6a2-29b3-4a1e-a125-af8f9135d3a6"},
Host
{"id":1129,"name":"node52","type":"Routing","uuid":"a803eaea-82a1-4c10-a4d2-b0db6c8c15c6"},
Host
{"id":1350,"name":"node84","type":"Routing","uuid":"1a006517-3a4b-4a98-85a2-f693fa4f4501"},
Host
{"id":1368,"name":"node50","type":"Routing","uuid":"bc93e3f6-3751-46fd-8dee-c24f6e8fc922"}]
2025-03-20 20:04:13,118 DEBUG [c.c.a.m.a.i.FirstFitRoutingAllocator]
(Work-Job-Executor-145:[ctx-2ec22277, job-3900296/job-3901388, ctx-e309f975,
FirstFitRoutingAllocator]) (logid:30d64f64) Looking for hosts having tag
specified on Template:linux
2025-03-20 20:04:13,123 DEBUG [c.c.a.m.a.i.FirstFitRoutingAllocator]
(Work-Job-Executor-145:[ctx-2ec22277, job-3900296/job-3901388, ctx-e309f975,
FirstFitRoutingAllocator]) (logid:30d64f64) Hosts with tag 'linux' are:[]
2025-03-20 20:04:13,124 DEBUG [c.c.a.m.a.i.FirstFitRoutingAllocator]
(Work-Job-Executor-145:[ctx-2ec22277, job-3900296/job-3901388, ctx-e309f975,
FirstFitRoutingAllocator]) (logid:30d64f64) Found 0 Hosts satisfying both tags,
host ids are:[]
2025-03-20 20:04:13,126 ERROR [c.c.a.m.a.i.FirstFitRoutingAllocator]
(Work-Job-Executor-145:[ctx-2ec22277, job-3900296/job-3901388, ctx-e309f975,
FirstFitRoutingAllocator]) (logid:30d64f64) No suitable host found for vm [VM
instance
{"id":8127,"instanceName":"i-10989-8127-VM","type":"User","uuid":"ce0c1fb6-65ef-4942-97de-05b255fdcd7d"}]
with tags [generic].
2025-03-20 20:04:13,127 ERROR [c.c.v.VmWorkJobHandlerProxy]
(Work-Job-Executor-145:[ctx-2ec22277, job-3900296/job-3901388, ctx-e309f975])
(logid:30d64f64) Invocation exception, caused by:
com.cloud.utils.exception.CloudRuntimeException: No suitable host found for vm
[VM instance
{"id":8127,"instanceName":"i-10989-8127-VM","type":"User","uuid":"ce0c1fb6-65ef-4942-97de-05b255fdcd7d"}].
2025-03-20 20:04:13,127 INFO [c.c.v.VmWorkJobHandlerProxy]
(Work-Job-Executor-145:[ctx-2ec22277, job-3900296/job-3901388, ctx-e309f975])
(logid:30d64f64) Rethrow exception
com.cloud.utils.exception.CloudRuntimeException: No suitable host found for vm
[VM instance
{"id":8127,"instanceName":"i-10989-8127-VM","type":"User","uuid":"ce0c1fb6-65ef-4942-97de-05b255fdcd7d"}].
2025-03-20 20:04:13,127 DEBUG [c.c.v.VmWorkJobDispatcher]
(Work-Job-Executor-145:[ctx-2ec22277, job-3900296/job-3901388])
(logid:30d64f64) Done with run of VM work job: com.cloud.vm.VmWorkMigrateAway
for VM 8127, job origin: 3900296
2025-03-20 20:04:13,127 ERROR [c.c.v.VmWorkJobDispatcher]
(Work-Job-Executor-145:[ctx-2ec22277, job-3900296/job-3901388])
(logid:30d64f64) Unable to complete AsyncJobVO: {id:3901388, userId: 1,
accountId: 1, instanceType: null, instanceId: null, cmd:
com.cloud.vm.VmWorkMigrateAway, cmdInfo:
rO0ABXNyAB5jb20uY2xvdWQudm0uVm1Xb3JrTWlncmF0ZUF3YXmt4MX4jtcEmwIAAUoACXNyY0hvc3RJZHhyABNjb20uY2xvdWQudm0uVm1Xb3Jrn5m2VvAlZ2sCAARKAAlhY2NvdW50SWRKAAZ1c2VySWRKAAR2bUlkTAALaGFuZGxlck5hbWV0ABJMamF2YS9sYW5nL1N0cmluZzt4cAAAAAAAAAABAAAAAAAAAAEAAAAAAAAfv3QAGVZpcnR1YWxNYWNoaW5lTWFuYWdlckltcGwAAAAAAAAD1g,
cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result:
null, initMsid: 95539959339, completeMsid: null, lastUpdated: null, lastPolled:
null, created: Thu Mar 20 20:04:12 UTC 2025, removed: null}, job origin:
3900296 com.cloud.utils.exception.CloudRuntimeException: No suitable host found
for vm [VM instance
{"id":8127,"instanceName":"i-10989-8127-VM","type":"User","uuid":"ce0c1fb6-65
ef-4942-97de-05b255fdcd7d"}].
2025-03-20 20:04:13,128 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
(Work-Job-Executor-145:[ctx-2ec22277, job-3900296/job-3901388])
(logid:30d64f64) Complete async job-3901388, jobStatus: FAILED, resultCode: 0
`
I also checked the UI upon doing a manual migration, all windows hosts are
invalid destinations in the options to select a host (which I believe
findHostsForMigration uses, so that should be working fine)
### versions
ACS 4.20, KVM with NFS Primary/Secondary storage
### The steps to reproduce the bug
Setup two clusters, with host tags to identify where VM's should go
(windows/linux).
Try to set maintenance on one of the hosts.
Check for clusters in the log to see if it attempts others.
### What to do about it?
Correct the code to try all clusters instead of just the first.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]