Hi Ilya,

Regarding the live migration: we are using it in production and migrated a couple of VMs until we reached some corner cases, for which I wrote a few fixes. We'll verify them over the coming weeks. The code is based on CS 4.4, but I have started porting it to master. I still have to finish that and merge the fixes too.

Cold migration is already in CS, and we have been using it for a while.

What do you mean by secure KVM migration? My code reads configuration values that let you set up a TLS peer-to-peer connection between the agents and transfer all the data over it, using the features in libvirt. That's the setup we have in production.
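For illustration only, here is a minimal sketch of what a live, peer-to-peer migration tunnelled over a TLS libvirt connection looks like when driven through virsh. It is not the code discussed in this thread: the VM name and destination host are placeholders, the `build_migrate_cmd` helper is hypothetical, and it assumes TLS certificates are already deployed for libvirtd on both hosts.

```shell
#!/bin/bash
# Sketch only: prints a virsh command for a live, peer-to-peer, tunnelled
# migration over TLS. The VM name and host passed in are placeholders.

build_migrate_cmd() {
    local vm="$1" dest_host="$2"
    # qemu+tls:// connects to the destination libvirt daemon over TLS.
    # --p2p lets the source daemon drive the migration itself, and
    # --tunnelled (which requires --p2p) carries the migration data over
    # that same libvirt connection instead of a separate plain socket.
    printf 'virsh migrate --live --p2p --tunnelled %s qemu+tls://%s/system\n' \
        "$vm" "$dest_host"
}

# Example (prints the command; nothing is executed):
# build_migrate_cmd my-vm kvm-host-02.example.org
```

The function only prints the command, so it can be reviewed before anything is run against a real hypervisor.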
For the graceful shutdown, we have an HA proxy in front, so we just edit its configuration to take one MS out of rotation. We also check manually that no snapshot is in progress before launching the stop-start. But I don't find this very robust. Therefore I read a lot of the code that manages the agents and how they connect to the MS. There is already a command to rebalance agents between MSs, so I'm developing a solution around that.

Kind regards,
Marc-Aurèle

> On 02 Jul 2016, at 02:03, ilya <ilya.mailing.li...@gmail.com> wrote:
>
> Marco,
>
> I have written a tiny shell script that does the following:
>
> It makes sure there are no async_jobs still running, and blocks 8080 via
> iptables to keep users from connecting to an MS that's about to go down.
>
> It needs a bit of enhancement, and should look up the MSID of that
> specific server. It looks something like this; consider borrowing
> concepts if applicable.
>
>> #!/bin/bash
>> DATESTAMP=$(date +%m%d%y-%H%M%S)
>> # Read the DB settings from db.properties; the password is stored encrypted.
>> DBPASS=$(java -classpath /usr/share/cloudstack-common/lib/jasypt-1.9.0.jar \
>>     org.jasypt.intf.cli.JasyptPBEStringDecryptionCLI \
>>     input="$(cat /etc/cloudstack/management/db.properties | grep db.cloud.password | awk -F'(' '{print $2}' | sed 's/)//g')" \
>>     password="$(cat /etc/cloudstack/management/key)" | grep -A2 OUTPUT | tail -1)
>> DBHOST=$(cat /etc/cloudstack/management/db.properties | grep db.cloud.host | awk -F'=' '{print $2}' | tail -1)
>> DBUSER=$(cat /etc/cloudstack/management/db.properties | grep db.cloud.username | awk -F'=' '{print $2}')
>> DB=$(cat /etc/cloudstack/management/db.properties | grep db.cloud.name | awk -F'=' '{print $2}')
>> DBPORT=$(cat /etc/cloudstack/management/db.properties | grep db.cloud.port | awk -F'=' '{print $2}')
>> MYSQLCMD="mysql -h $DBHOST -u $DBUSER -P $DBPORT -p$DBPASS $DB"
>> #echo $DBPASS $DBHOST $DBUSER $DB $DBPORT
>>
>> # Count the output lines for jobs still in progress (job_status=0).
>> JOBS=$(echo 'SELECT * FROM cloud.async_job where job_status=0 and job_dispatcher not like "pseudoJobDispatcher"' | $MYSQLCMD | wc -l)
>>
>> if [ "$JOBS" -gt 0 ]
>> then
>>     echo "WARN: Looks like I have active jobs in flight, please try again later"
>>     echo 'SELECT * FROM cloud.async_job where job_status=0 and job_dispatcher not like "pseudoJobDispatcher"' | $MYSQLCMD
>>     exit
>> else
>>     echo "NOTE: No jobs running, good to go!"
>>     echo "NOTE: Blocking incoming 8080"
>>     /sbin/iptables -A INPUT -p tcp --destination-port 8080 -j DROP
>>     service cloudstack-management stop
>>     CSPID=$(cat /var/run/cloudstack-management.pid)
>>     # Force-kill the process if the service stop did not terminate it.
>>     ps -p $CSPID >/dev/null 2>&1 && kill -9 $CSPID
>>     # Braces rather than a subshell, so that "exit 1" really ends the script.
>>     ps -p $CSPID >/dev/null 2>&1 && { echo "ERROR: Could not terminate cloudstack service on $(hostname) with pid $CSPID"; /sbin/iptables -D INPUT -p tcp --destination-port 8080 -j DROP; exit 1; }
>>     service cloudstack-management start
>>     echo "NOTE: Unblocking incoming 8080"
>>     /sbin/iptables -D INPUT -p tcp --destination-port 8080 -j DROP
>> fi
>
> Regards,
> ilya
>
> On 7/1/16 3:30 AM, ma...@exoscale.ch wrote:
>> Hi,
>>
>> I can't edit the page but I'll be glad to put some effort into V5:
>> - Live migration for KVM
>> - Improve logging using UUIDs (as I already did part of that for us at
>> exoscale)
>>
>> I'm in the process of adding another feature we need: graceful shutdown of a
>> management server when running a cluster of MS. The goal is to send a
>> "prepareForShutdown" command to one or more MS and have them rebalance their
>> agents to the ones still running so that no command will be lost. Then there
>> shouldn't be any downtime with any agent during an update.
>>
>> Kind regards,
>> Marc-Aurèle
>>
>> PS: Is there any architectural discussion going on on the Slack channel? I
>> saw that the IRC is not so active...
>>
>>
>>> On 01 Jul 2016, at 11:55, Paul Angus <paul.an...@shapeblue.com> wrote:
>>>
>>> There's not been much response to this, but I'll start clearing away the
>>> unclaimed items, people can always add them back.
>>>
>>> Kind regards,
>>>
>>> Paul Angus
>>>
>>> paul.an...@shapeblue.com
>>> www.shapeblue.com
>>> 53 Chandos Place, Covent Garden, London WC2N 4HS UK
>>> @shapeblue