> On Aug. 12, 2014, 11:16 a.m., daan Hoogland wrote:
> > c4b78c3aaa8df20c8e892b9d5108d8f34f96ed0c on 4.4
> > 37baddd7212717f259c33b3bb75720d718b92d2c on master

- daan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24598/#review50304
-----------------------------------------------------------


On Aug. 12, 2014, 11:21 a.m., Joris van Lieshout wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/24598/
> -----------------------------------------------------------
> 
> (Updated Aug. 12, 2014, 11:21 a.m.)
> 
> 
> Review request for cloudstack, Alex Huang, anthony xu, daan Hoogland, edison su, Kishan Kavala, Min Chen, Sanjay Tripathi, and Hugo Trippaers.
> 
> 
> Bugs: CLOUDSTACK-7319
>     https://issues.apache.org/jira/browse/CLOUDSTACK-7319
> 
> 
> Repository: cloudstack-git
> 
> 
> Description
> -------
> 
> We noticed that the dd process was way too aggressive on Dom0, causing all kinds of problems on a XenServer with medium workloads.
> ACS uses the dd command to copy incremental snapshots to secondary storage. This process is too heavy on Dom0 resources, impacts DomU performance, and can even lead to domain freezes (including Dom0) of more than a minute. We've found that this is because the Dom0 kernel caches the read and write operations of dd.
> Some of the issues we have seen as a consequence of this are:
> - DomU performance degradation/freezes
> - OVS freezing and not forwarding any traffic
>   - including LACPDUs, resulting in the bond going down
> - keepalived heartbeat packets between RRVMs not being sent/received, resulting in flapping RRVM master state
> - broken snapshot copy processes
> - the XenServer heartbeat script reaching its timeout and fencing the server
> - poolmaster connection loss
> - ACS marking the host as down and fencing the instances even though they are still running on the original host, resulting in the same instance running on two hosts in one cluster
> - VHD corruption as a result of some of the issues mentioned above
> We've developed a patch for the XenServer script /etc/xapi.d/plugins/vmopsSnapshot that adds the direct flag to both the input and output files (iflag=direct oflag=direct).
> Our tests have shown that Dom0 load during snapshot copy is way lower.
> 
> We believe Hotfix 4 for XS62 SP1 contains a similar fix, but for the sparse dd process used for the first copy of a chain.
> 
> http://support.citrix.com/article/CTX140417
> 
> == begin quote ==
> Copying a virtual disk between SRs uses unbuffered I/O to avoid polluting the pagecache in the Control Domain (dom0). This reduces the dom0 vCPU overhead and allows the pagecache to work more effectively for other operations.
> == end quote ==
> 
> 
> Diffs
> -----
> 
>   scripts/vm/hypervisor/xenserver/vmopsSnapshot 5fd69a6
> 
> Diff: https://reviews.apache.org/r/24598/diff/
> 
> 
> Testing
> -------
> 
> We are running this fix in our beta and prod environments (both using ACS 4.3.0) with great success.
> 
> 
> Thanks,
> 
> Joris van Lieshout
> 
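For readers following along, the change described above boils down to opening dd's input and output with O_DIRECT. A minimal sketch of the effect (file names and block size are made up for illustration and are not taken from vmopsSnapshot):

```shell
#!/bin/sh
# Sketch only: shows dd with O_DIRECT on both ends, as the patch does for
# the snapshot copy. Paths and block size here are hypothetical.
SRC=src.img
DST=dst.img

# Create a small test source (4 MiB of zeroes).
dd if=/dev/zero of="$SRC" bs=1M count=4 2>/dev/null

# Buffered copy (pre-patch behaviour): every read and write goes through
# the Dom0 pagecache, evicting hot data and loading Dom0 under pressure:
#   dd if="$SRC" of="$DST" bs=1M

# Unbuffered copy (patched behaviour): iflag=direct/oflag=direct open the
# files with O_DIRECT, bypassing the pagecache. O_DIRECT is not supported
# on every filesystem (e.g. tmpfs), hence the buffered fallback here.
dd if="$SRC" of="$DST" bs=1M iflag=direct oflag=direct 2>/dev/null \
  || dd if="$SRC" of="$DST" bs=1M 2>/dev/null

cmp -s "$SRC" "$DST" && echo "copy ok"
```

Note that O_DIRECT requires the transfer size to be aligned to the underlying block size, which a bs that is a multiple of 512 satisfies; the actual patch applies the flags inside the plugin's existing dd invocation rather than in a standalone script like this.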