-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24598/#review50304
-----------------------------------------------------------

Ship it!


c4b78c3aaa8df20c8e892b9d5108d8f34f96ed0c on 4.4

- daan Hoogland


On Aug. 12, 2014, 11:05 a.m., Joris van Lieshout wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/24598/
> -----------------------------------------------------------
> 
> (Updated Aug. 12, 2014, 11:05 a.m.)
> 
> 
> Review request for cloudstack, Alex Huang, anthony xu, daan Hoogland, edison 
> su, Kishan Kavala, Min Chen, Sanjay Tripathi, and Hugo Trippaers.
> 
> 
> Bugs: CLOUDSTACK-7319
>     https://issues.apache.org/jira/browse/CLOUDSTACK-7319
> 
> 
> Repository: cloudstack-git
> 
> 
> Description
> -------
> 
> We noticed that the dd process was way too aggressive on Dom0, causing all 
> kinds of problems on a XenServer with medium workloads. 
> ACS uses the dd command to copy incremental snapshots to secondary storage. 
> This process is too heavy on Dom0 resources; it impacts DomU performance 
> and can even lead to domain freezes (including Dom0) of more than a minute. 
> We've found that this is because the Dom0 kernel caches the read and write 
> operations of dd.
> Some of the issues we have seen as a consequence of this are:
> - DomU performance degradation/freezes
> - OVS freezing and not forwarding any traffic
> - including LACPDUs, resulting in the bond going down
> - keepalived heartbeat packets between RRVMs not being sent/received, 
> resulting in a flapping RRVM master state
> - breaking snapshot copy processes
> - the XenServer heartbeat script reaching its timeout and fencing the server
> - poolmaster connection loss
> - ACS marking the host as down and fencing the instances even though they 
> are still running on the original host, resulting in the same instance 
> running on two hosts in one cluster
> - VHD corruption as a result of some of the issues mentioned above
> We've developed a patch to the XenServer script 
> /etc/xapi.d/plugins/vmopsSnapshot that adds the direct flag to both the 
> input and output files (iflag=direct oflag=direct).
> Our tests have shown that Dom0 load during snapshot copy is much lower.
> 
> 
> Diffs
> -----
> 
>   scripts/vm/hypervisor/xenserver/vmopsSnapshot 5fd69a6 
> 
> Diff: https://reviews.apache.org/r/24598/diff/
> 
> 
> Testing
> -------
> 
> We are running this fix in our beta and prod environment (both using ACS 
> 4.3.0) with great success.
> 
> 
> Thanks,
> 
> Joris van Lieshout
> 
>
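
For reference, the change described in the quoted request boils down to adding
direct-I/O flags to the dd copy so it bypasses the Dom0 page cache. This is a
minimal sketch of that kind of invocation, not the actual plugin code; the
file names and block size here are illustrative.

```shell
#!/bin/sh
# Illustrative stand-in for a VHD being copied to secondary storage.
# (Hypothetical paths; the real plugin operates on SR-mounted VHD files.)
src=./parent.vhd
dst=./copy.vhd

# Create a small, block-aligned source file to copy.
dd if=/dev/zero of="$src" bs=1M count=4 2>/dev/null

# Before the patch, the copy went through the page cache:
#   dd if="$src" of="$dst"
# With the patch, both sides use O_DIRECT, keeping Dom0 cache pressure low.
# Note that O_DIRECT requires aligned buffer sizes, hence the explicit bs=.
dd if="$src" of="$dst" bs=1M iflag=direct oflag=direct 2>/dev/null

# Verify the direct copy is byte-identical to the source.
cmp "$src" "$dst" && echo "copy OK"
```

The trade-off is that O_DIRECT transfers must be aligned and are not cached,
so the copy itself may run somewhat slower while leaving Dom0 far more
responsive.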
