Re: Issue 781 in ganeti: Stalled hbal after long running replace-disks

ganeti Mon, 22 Jun 2015 05:52:23 -0700

Comment #1 on issue 781 by [email protected]: Stalled hbal after long running replace-disks

https://code.google.com/p/ganeti/issues/detail?id=781


Hi, I bumped into the same issue. Here are some details for context:

# gnt-cluster --version
gnt-cluster (ganeti v2.9.3) 2.9.3

# gnt-cluster version
Software version: 2.9.3
Internode protocol: 2090000
Configuration format: 2090000
OS api version: 20
Export interface: 0
VCS version: v2.9.3

# hspace --version
hspace (ganeti) version v2.9.3
compiled with ghc 7.4
running on linux x86_64

Cluster's nodes are running Debian Wheezy 7.8


What steps will reproduce the problem?

Running 'hbal -L -X' when the calculated job list includes a 'replace-disks' job. That particular job affected an instance with a 200GB disk.


What is the expected output? What do you see instead?

Expected behavior for hbal would be to execute the whole series of jobs calculated to rebalance the cluster. Instead, hbal stalls after successfully executing the first job, which replaces the disks of and migrates an instance with a 200GB disk.


Please provide any additional information below:

775710 success INSTANCE_REPLACE_DISKS(problematic.instance),INSTANCE_MIGRATE(problematic.instance)
775711 success INSTANCE_QUERY_DATA
775714 success CLUSTER_VERIFY
775715 success CLUSTER_VERIFY_CONFIG
775716 success CLUSTER_VERIFY_GROUP(5d3aed89-4f19-4a87-8d0d-cff6159a6926)
775717 success CLUSTER_VERIFY_GROUP(e4c3ade3-f126-4d5f-aebe-0d114c9c5006)
775718 success INSTANCE_QUERY_DATA

# gnt-job info 775710
Job ID: 775710
  Status: success
  Received:         2015-06-22 13:12:17.127187
  Processing start: 2015-06-22 13:12:17.262588 (delta 0.135401s)
  Processing end:   2015-06-22 13:39:40.079557 (delta 1642.816969s)
  Total processing time: 1642.952370 seconds
  Opcodes:
    OP_INSTANCE_REPLACE_DISKS
      Status: success
      Processing start: 2015-06-22 13:12:17.262588
      Execution start:  2015-06-22 13:12:17.431253
      Processing end:   2015-06-22 13:37:31.807807
    OP_INSTANCE_MIGRATE
      Status: success
      Processing start: 2015-06-22 13:37:32.058913
      Execution start:  2015-06-22 13:38:17.707905
      Processing end:   2015-06-22 13:39:40.079539

I had to manually kill the hbal process at 2015-06-22 14:26, and then re-issue it to execute the rest of the commands.
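
To put numbers on the timeline, here is a small sketch of the arithmetic using the timestamps from the 'gnt-job info' output above. The helper names are only illustrative, and since I noted the kill time to the minute, the seconds part of 14:26 is an assumption. It shows replace-disks itself ran for roughly 25 minutes, and hbal then sat idle for about 46 minutes after the job had already finished successfully.

```python
from datetime import datetime

# Timestamp layout used by 'gnt-job info' in the output above.
FMT = "%Y-%m-%d %H:%M:%S.%f"


def parse_ts(text):
    """Parse one timestamp string in the layout shown above."""
    return datetime.strptime(text, FMT)


def seconds_between(start, end):
    """Return the elapsed seconds between two timestamp strings."""
    return (parse_ts(end) - parse_ts(start)).total_seconds()


# OP_INSTANCE_REPLACE_DISKS: execution start to processing end.
replace_disks = seconds_between("2015-06-22 13:12:17.431253",
                                "2015-06-22 13:37:31.807807")

# Job 775710 processing end, and the time hbal was killed.
job_end = "2015-06-22 13:39:40.079557"
killed = "2015-06-22 14:26:00.000000"  # assumption: seconds were not recorded

stall = seconds_between(job_end, killed)

print("replace-disks ran for %.1f minutes" % (replace_disks / 60))
print("hbal idle after job success for %.1f minutes" % (stall / 60))
```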

