Comment #1 on issue 781 by [email protected]: Stalled hbal after long
running replace-disks
https://code.google.com/p/ganeti/issues/detail?id=781
Hi, I ran into the same issue. Here are some details for context:
# gnt-cluster --version
gnt-cluster (ganeti v2.9.3) 2.9.3
# gnt-cluster version
Software version: 2.9.3
Internode protocol: 2090000
Configuration format: 2090000
OS api version: 20
Export interface: 0
VCS version: v2.9.3
# hspace --version
hspace (ganeti) version v2.9.3
compiled with ghc 7.4
running on linux x86_64
The cluster's nodes are running Debian Wheezy 7.8.
What steps will reproduce the problem?
Running 'hbal -L -X' when the computed set of jobs includes a
'replace-disks' job. In this case that job affected an instance with a
200GB disk.
What is the expected output? What do you see instead?
hbal should execute the whole series of jobs it calculated to rebalance
the cluster. Instead, it stalls after successfully executing the first job,
which runs replace-disks on an instance with a 200GB disk and then migrates
it.
Please provide any additional information below:
775710 success INSTANCE_REPLACE_DISKS(problematic.instance),INSTANCE_MIGRATE(problematic.instance)
775711 success INSTANCE_QUERY_DATA
775714 success CLUSTER_VERIFY
775715 success CLUSTER_VERIFY_CONFIG
775716 success CLUSTER_VERIFY_GROUP(5d3aed89-4f19-4a87-8d0d-cff6159a6926)
775717 success CLUSTER_VERIFY_GROUP(e4c3ade3-f126-4d5f-aebe-0d114c9c5006)
775718 success INSTANCE_QUERY_DATA
# gnt-job info 775710
Job ID: 775710
Status: success
Received: 2015-06-22 13:12:17.127187
Processing start: 2015-06-22 13:12:17.262588 (delta 0.135401s)
Processing end: 2015-06-22 13:39:40.079557 (delta 1642.816969s)
Total processing time: 1642.952370 seconds
Opcodes:
  OP_INSTANCE_REPLACE_DISKS
    Status: success
    Processing start: 2015-06-22 13:12:17.262588
    Execution start: 2015-06-22 13:12:17.431253
    Processing end: 2015-06-22 13:37:31.807807
  OP_INSTANCE_MIGRATE
    Status: success
    Processing start: 2015-06-22 13:37:32.058913
    Execution start: 2015-06-22 13:38:17.707905
    Processing end: 2015-06-22 13:39:40.079539
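To put numbers on "long running", the per-opcode durations can be recomputed from the timestamps above. This is only a small Python sketch for the arithmetic; the helper name seconds_between is made up for illustration and is not part of Ganeti.

```python
from datetime import datetime

# Timestamp layout as printed by 'gnt-job info' in the output above.
FMT = "%Y-%m-%d %H:%M:%S.%f"


def seconds_between(start: str, end: str) -> float:
    """Elapsed seconds between two timestamps (hypothetical helper)."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


# "Processing start" / "Processing end" values copied from job 775710.
replace_disks = seconds_between("2015-06-22 13:12:17.262588",
                                "2015-06-22 13:37:31.807807")
migrate = seconds_between("2015-06-22 13:37:32.058913",
                          "2015-06-22 13:39:40.079539")
whole_job = seconds_between("2015-06-22 13:12:17.262588",
                            "2015-06-22 13:39:40.079557")

print(f"replace-disks: {replace_disks:.6f}s")
print(f"migrate:       {migrate:.6f}s")
print(f"whole job:     {whole_job:.6f}s")
```

So the replace-disks opcode accounts for roughly 25 of the job's 27 minutes, and the whole-job figure matches the delta that gnt-job reports.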
I had to manually kill the hbal process at 2015-06-22 14:26 and then re-run
it to execute the remaining jobs.
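For a rough idea of how long hbal sat idle, the gap between the end of job 775710 and the time I killed the process can be worked out as below. This is just a Python sketch of the arithmetic; the kill time is only known to the minute, so the result is approximate.

```python
from datetime import datetime

# "Processing end" of job 775710, taken from the 'gnt-job info' output.
job_end = datetime.strptime("2015-06-22 13:39:40.079557",
                            "%Y-%m-%d %H:%M:%S.%f")
# Time hbal was killed by hand (minute precision only).
killed = datetime.strptime("2015-06-22 14:26", "%Y-%m-%d %H:%M")

stalled_minutes = (killed - job_end).total_seconds() / 60
print(f"hbal was idle for about {stalled_minutes:.0f} minutes")
```

That is about three quarters of an hour with no further rebalancing job submitted.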