Status: New
Owner: ----

New issue 792 by [email protected]: paused DRBD sync causes flood of requests to noded
http://code.google.com/p/ganeti/issues/detail?id=792

What software version are you running? Please provide the output of "gnt-cluster --version", "gnt-cluster version", and "hspace --version".
gnt-cluster (ganeti v2.10.1) 2.10.1
Software version: 2.10.1
Internode protocol: 2100000
Configuration format: 2100000
OS api version: 20
Export interface: 0
VCS version: (ganeti) version v2.10.1
hspace (ganeti) version v2.10.1
compiled with ghc 6.12
running on linux x86_64

What distribution are you using?
Debian Squeeze

What steps will reproduce the problem?
1. Start a replace-disks operation (`gnt-instance replace-disks`).
2. Pause the DRBD sync operation with `drbdsetup {device} pause-sync`.

What is the expected output? What do you see instead?
To recover quickly from a single-point-of-failure condition, about 8 replace-disks operations were started at once. This put excess load on the cluster, so it was decided to pause all but one of the DRBD syncs to reduce the load. However, the noded processes then began putting a heavy load on the node receiving the DRBD sync streams. node-daemon.log showed a higher-than-usual rate of requests to /blockdev_getmirrorstatus, with many corresponding lvs command executions.
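
To put a number on "higher-than-usual", the log can be bucketed by second. This is a sketch only: it assumes each node-daemon.log line begins with a "YYYY-MM-DD HH:MM:SS" timestamp and contains the RPC path verbatim, which has not been checked against the attached snippet, and `requests_per_second` is a hypothetical helper, not part of Ganeti.

```python
import re
from collections import Counter
from typing import Iterable

# ASSUMPTION: lines start with "YYYY-MM-DD HH:MM:SS"; adjust if the log differs.
_TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")


def requests_per_second(lines: Iterable[str],
                        path: str = "/blockdev_getmirrorstatus") -> Counter:
    """Count log lines mentioning `path`, grouped by whole-second timestamp."""
    counts: Counter = Counter()
    for line in lines:
        if path not in line:
            continue  # ignore other RPCs
        match = _TS_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Usage: `with open("node-daemon.log") as f: print(requests_per_second(f).most_common(5))` lists the five busiest seconds, which can be compared against a period before the syncs were paused.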

Please provide any additional information below.
Notice in the attached job info file that once the "time remaining" estimate for the DRBD sync changes to "no time estimate", the rate at which the job is updated increases dramatically.

I suspect this function may be relevant (http://git.ganeti.org/?p=ganeti.git;a=blob;f=lib/cmdlib/instance_storage.py;hb=6e684281c5fa65261d265e89190407a7ea4f6182#l1199). The sleep on line 1265 depends on the max_time variable, which is initialized to 0 on line 1219 but only updated on line 1247 if mstat.estimated_time is not None. Perhaps, when a DRBD sync is paused, mstat.estimated_time is None, so max_time stays 0 and the loop runs without sleeping between iterations.
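
The suspected behaviour can be reproduced in isolation. The sketch below is a simplified, hypothetical model of the polling loop, not Ganeti's actual code: `MirrorStatus`, `compute_sleep` and `wait_for_sync` are illustrative names, the status fetch and the sleep are injected so the loop runs without a cluster, and the 60-second cap is an assumption. With `min_sleep=0`, a paused sync (sync_percent set, estimated_time None) yields a zero-length sleep, so each iteration issues the next status request immediately. A nonzero `min_sleep` is one possible mitigation, not a confirmed upstream fix.

```python
import time
from typing import Callable, List, NamedTuple, Optional


class MirrorStatus(NamedTuple):
    """Hypothetical stand-in for one disk's mirror status."""
    sync_percent: Optional[float]   # None when the disk is not syncing
    estimated_time: Optional[int]   # None when no estimate is available


def compute_sleep(stats: List[MirrorStatus], min_sleep: float = 0,
                  max_sleep: float = 60) -> float:
    """Return the delay before the next poll (simplified model)."""
    max_time = 0
    for mstat in stats:
        # As described in the report: max_time only changes when an estimate exists.
        if mstat.sync_percent is not None and mstat.estimated_time is not None:
            max_time = max(max_time, mstat.estimated_time)
    # With min_sleep=0, a paused sync leaves max_time at 0, so there is no sleep.
    return max(min_sleep, min(max_sleep, max_time))


def wait_for_sync(fetch_status: Callable[[], List[MirrorStatus]],
                  sleep: Callable[[float], None] = time.sleep,
                  min_sleep: float = 0, max_polls: int = 1000) -> int:
    """Poll until no disk is syncing; return the number of polls made."""
    for polls in range(1, max_polls + 1):
        stats = fetch_status()  # stands in for one getmirrorstatus RPC
        if all(m.sync_percent is None for m in stats):
            return polls
        sleep(compute_sleep(stats, min_sleep=min_sleep))
    return max_polls  # safety cap for this model only
```

For example, with `delays = []`, calling `wait_for_sync(lambda: [MirrorStatus(40.0, None)], sleep=delays.append, max_polls=100)` records 100 zero-second delays, i.e. 100 status requests with no pause between them; passing `min_sleep=5` spreads the same 100 requests over 500 seconds.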


Attachments:
        job.675873  234 KB
        node-daemon.log.snippet  5.7 KB
