Status: New
Owner: ----
New issue 792 by [email protected]: paused DRBD sync causes flood of
requests to noded
http://code.google.com/p/ganeti/issues/detail?id=792
What software version are you running? Please provide the output of
"gnt-cluster --version", "gnt-cluster version", and "hspace --version".
gnt-cluster (ganeti v2.10.1) 2.10.1
Software version: 2.10.1
Internode protocol: 2100000
Configuration format: 2100000
OS api version: 20
Export interface: 0
VCS version: (ganeti) version v2.10.1
hspace (ganeti) version v2.10.1
compiled with ghc 6.12
running on linux x86_64
What distribution are you using?
Debian Squeeze
What steps will reproduce the problem?
1. Initiate a replace-disks operation.
2. Pause the DRBD sync operation with `drbdsetup {device} pause-sync`.
What is the expected output? What do you see instead?
To recover quickly from a single-point-of-failure condition, about 8
replace-disks operations were started at once. This put excess load on
the cluster, so all but one of the DRBD syncs were paused to reduce it.
However, the noded processes then started putting a heavy load on the
node receiving the DRBD sync streams. The node-daemon.log showed a
higher-than-usual rate of requests to /blockdev_getmirrorstatus and many
corresponding lvs command executions.
Please provide any additional information below.
Notice in the attached job info file that once the "time remaining"
estimate from DRBD goes to "no time estimate", the rate at which the job is
updated increases dramatically.
I suspect this function may be relevant
(http://git.ganeti.org/?p=ganeti.git;a=blob;f=lib/cmdlib/instance_storage.py;hb=6e684281c5fa65261d265e89190407a7ea4f6182#l1199).
The sleep on line 1265 depends on the max_time variable, which is
initialized to 0 on line 1219 but only updated on line 1247 if
mstat.estimated_time is not None. Perhaps when DRBD syncs are paused,
mstat.estimated_time is None and max_time stays at 0, so the loop never
sleeps between iterations and polls the node continuously.
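
To make the suspicion concrete, here is a minimal model of the delay
calculation as described above. It is a sketch, not the Ganeti code: the
function names, the 60-second cap and the 1-second floor are assumptions
made for illustration, and the second function is only one possible
mitigation.

```python
from typing import Iterable, Optional


def poll_delay_as_described(estimates: Iterable[Optional[float]]) -> float:
    """Model of the delay before the next mirror-status poll.

    `estimates` stands in for the mstat.estimated_time value of each disk.
    The 60-second cap is an assumption, not taken from the report.
    """
    max_time = 0.0  # starts at 0 on every pass through the loop
    for estimated_time in estimates:
        if estimated_time is not None:
            # Only updated when DRBD reports a time estimate.
            max_time = estimated_time
    return min(60.0, max_time)


def poll_delay_with_floor(estimates: Iterable[Optional[float]],
                          floor: float = 1.0) -> float:
    """Hypothetical mitigation: never poll more often than every `floor` seconds."""
    return max(floor, poll_delay_as_described(estimates))


if __name__ == "__main__":
    paused = [None, None]   # paused syncs: no time estimate available
    running = [120.0]       # active sync with an estimate
    print("as described, paused: ", poll_delay_as_described(paused))
    print("as described, running:", poll_delay_as_described(running))
    print("with floor, paused:   ", poll_delay_with_floor(paused))
```

If the model is accurate, a paused sync yields a delay of 0 and the loop
issues blockdev_getmirrorstatus requests back to back, which would match
the flood seen in node-daemon.log.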
Attachments:
job.675873 234 KB
node-daemon.log.snippet 5.7 KB