-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24598/#review50304
-----------------------------------------------------------
Ship it!

c4b78c3aaa8df20c8e892b9d5108d8f34f96ed0c on 4.4

- daan Hoogland


On Aug. 12, 2014, 11:05 a.m., Joris van Lieshout wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/24598/
> -----------------------------------------------------------
> 
> (Updated Aug. 12, 2014, 11:05 a.m.)
> 
> 
> Review request for cloudstack, Alex Huang, anthony xu, daan Hoogland, edison su, Kishan Kavala, Min Chen, Sanjay Tripathi, and Hugo Trippaers.
> 
> 
> Bugs: CLOUDSTACK-7319
>     https://issues.apache.org/jira/browse/CLOUDSTACK-7319
> 
> 
> Repository: cloudstack-git
> 
> 
> Description
> -------
> 
> We noticed that the dd process was far too aggressive on Dom0, causing all kinds of problems on a XenServer with medium workloads.
> ACS uses the dd command to copy incremental snapshots to secondary storage. This process is too heavy on Dom0 resources, impacts DomU performance, and can even lead to domain freezes (including Dom0) of more than a minute. We've found that this is because the Dom0 kernel caches the read and write operations of dd.
> Some of the issues we have seen as a consequence of this are:
> - DomU performance degradation/freezes
> - OVS freezing and not forwarding any traffic
>   - including LACPDUs, resulting in the bond going down
> - keepalived heartbeat packets between RRVMs not being sent/received, resulting in a flapping RRVM master state
> - broken snapshot copy processes
> - the XenServer heartbeat script reaching its timeout and fencing the server
> - poolmaster connection loss
> - ACS marking the host as down and fencing the instances even though they are still running on the original host, resulting in the same instance running on two hosts in one cluster
> - VHD corruption as a result of some of the issues mentioned above
> We've developed a patch for the XenServer script /etc/xapi.d/plugins/vmopsSnapshot that adds the direct flag to both the input and output files (iflag=direct oflag=direct); a sketch of the change follows the quoted review.
> Our tests have shown that Dom0 load during snapshot copy is much lower.
> 
> 
> Diffs
> -----
> 
>   scripts/vm/hypervisor/xenserver/vmopsSnapshot 5fd69a6 
> 
> Diff: https://reviews.apache.org/r/24598/diff/
> 
> 
> Testing
> -------
> 
> We are running this fix in our beta and prod environments (both using ACS 4.3.0) with great success.
> 
> 
> Thanks,
> 
> Joris van Lieshout
> 
>
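
For context, the heart of the change is making dd open both files with O_DIRECT so the copy bypasses the Dom0 page cache. Below is a minimal sketch of such an invocation from a Python XAPI plugin like vmopsSnapshot; the helper name copy_vhd, its arguments, the paths, and the block size are hypothetical illustrations, not the actual patch:

    import subprocess

    def copy_vhd(src, dst, block_size="4M"):
        # iflag=direct / oflag=direct tell dd to use O_DIRECT for its
        # reads and writes, so the transfer does not flood the Dom0
        # page cache and starve other domains.
        cmd = ["dd",
               "if=%s" % src,
               "of=%s" % dst,
               "bs=%s" % block_size,
               "iflag=direct",
               "oflag=direct"]
        subprocess.check_call(cmd)

    # Example usage (hypothetical paths):
    # copy_vhd("/var/run/sr-mount/base.vhd", "/mnt/secondary/snap.vhd")

One caveat with this approach: O_DIRECT requires transfer sizes aligned to the device's logical block size, which is why an explicit bs that is a multiple of 512 bytes is passed rather than relying on dd's default.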