Hi Evgheni,

Unfortunately, migrating the Jenkins VM failed; luckily (for me), it is back up and running in the old Production cluster. So that we can track this, here again are the steps taken today:

1. Around 18:00 TLV time, I triggered a snapshot of the VM. This not only failed but also left the Jenkins VM non-responsive for a few minutes. More disturbing is that although the 'events' in the engine announced a failure, the new snapshot was listed under 'snapshots' with status 'ok'. This also caused a few CI failures (which were re-triggered).

2. As a snapshot seems to be a non-option, I created a new VM in the production cluster, jenkins-2.phx.ovirt.org, and downloaded the latest backup from backup.phx.ovirt.org, so that in case of a failure we could change the DNS and use it (keep in mind this backup does not have any builds, only logs/configs).

3. I shut down the VM from the engine. It hung for a few minutes in 'shutting down' and then announced 'shutdown failed', after which it appeared again in the 'up' state but was non-responsive. virsh -r list also reported it as up.

4. I triggered another shutdown, which succeeded. As I didn't want to risk it any further, I let it boot in the same cluster, which was also successful.

I've attached some parts of engine.log. From a quick look at vdsm.log I didn't see anything, but it would help if someone else had a look (this is ovirt-srv02). The relevant log times for the shutdown failure are from '2016-06-23 16:15'.

Either way, until we find the problem, I'm not sure we should risk it again before we have a proper recovery plan.

One brute-force option is to rsync jenkins.phx.ovirt.org:/var/lib/data/jenkins to jenkins-2 while the jenkins daemon on jenkins-2 is shut down. We could then schedule a downtime on jenkins.phx.ovirt.org, wait until everything is synced, stop jenkins (and puppet) there, start the jenkins daemon on jenkins-2, and change the DNS CNAME of jenkins.ovirt.org to point to it. If everything goes smoothly it should run fine, and if not, we still have jenkins.phx.ovirt.org running.

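To make the order of operations in the rsync option easier to review, here is a minimal Python sketch that only builds and prints the steps; it does not execute anything. Several details are assumptions and not something already agreed on: that the commands are run on jenkins-2 and pull from the old VM, that the services are controlled with "service jenkins ..." and "service puppet ...", the rsync flags, and the split into an initial sync plus a final sync during the downtime. The function names are hypothetical.

```python
import shlex

SRC_HOST = "jenkins.phx.ovirt.org"      # old VM, keeps serving until the downtime
DST_HOST = "jenkins-2.phx.ovirt.org"    # new VM, assumed to be where commands run
DATA_DIR = "/var/lib/data/jenkins"


def build_cutover_plan(src=SRC_HOST, dst=DST_HOST, data_dir=DATA_DIR):
    """Return the ordered cutover steps as (description, argv) pairs.

    argv is None for manual steps. Service names and rsync flags are
    assumptions for illustration, not verified against the real hosts.
    """
    # Trailing slashes make rsync copy the directory contents,
    # not the directory itself.
    rsync = ["rsync", "-aH", "--delete", f"{src}:{data_dir}/", f"{data_dir}/"]
    return [
        (f"stop jenkins on {dst}", ["service", "jenkins", "stop"]),
        ("initial sync while the old VM still serves", rsync),
        (f"stop jenkins on {src}", ["ssh", src, "service", "jenkins", "stop"]),
        (f"stop puppet on {src}", ["ssh", src, "service", "puppet", "stop"]),
        ("final sync during the downtime", rsync),
        (f"start jenkins on {dst}", ["service", "jenkins", "start"]),
        (f"point the jenkins.ovirt.org CNAME at {dst}", None),
    ]


def print_plan(plan):
    """Print the plan as a numbered checklist (dry run only)."""
    for number, (description, argv) in enumerate(plan, start=1):
        command = shlex.join(argv) if argv else "(manual step)"
        print(f"{number}. {description}: {command}")


if __name__ == "__main__":
    print_plan(build_cutover_plan())
```

Since the old VM is only stopped, never modified, rolling back would amount to starting jenkins (and puppet) on it again and leaving the CNAME as it is.
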
Another option is to unmount /var/lib/data/ and mount it on jenkins-2, though then we might be in trouble if something goes wrong on the way.

Nadav.

engine.log

snapshot event:

2016-06-23 09:06:49,592 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-44) VM jenkins-phx-ovirt-org e7a7b735-0310-4f88-9ed9-4fed85835a01 moved from Up --> Paused
, Custom Event ID: -1, Message: Failed to create live snapshot 'ngoldin_before_cluster_move' for VM 'jenkins-phx-ovirt-org'. VM restart is recommended. Note that using the created snapshot might cause data inconsistency.
, Custom Event ID: -1, Message: Failed to complete snapshot 'ngoldin_before_cluster_move' creation for VM 'jenkins-phx-ovirt-org'.
2016-06-23 09:17:29,020 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-69) VM jenkins-phx-ovirt-org e7a7b735-0310-4f88-9ed9-4fed85835a01 moved from Paused --> Up

failed shutdown:

2016-06-23 15:59:20,348 INFO [org.ovirt.engine.core.bll.ShutdownVmCommand] (org.ovirt.thread.pool-8-thread-25) [52b9dd27] Entered (VM jenkins-phx-ovirt-org).
2016-06-23 15:59:20,349 INFO [org.ovirt.engine.core.bll.ShutdownVmCommand] (org.ovirt.thread.pool-8-thread-25) [52b9dd27] Sending shutdown command for VM jenkins-phx-ovirt-org.
2016-06-23 15:59:20,446 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-8-thread-25) [52b9dd27] Correlation ID: 52b9dd27, Job ID: f1f0d78e-ae68-465e-a3c1-e46d146fc2e7, Call Stack: null, Custom Event ID: -1, Message: VM shutdown initiated by admin on VM jenkins-phx-ovirt-org (Host: ovirt-srv02) (Reason: Not Specified).
2016-06-23 16:04:20,556 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-20) [2d2d1b3a] VM jenkins-phx-ovirt-org e7a7b735-0310-4f88-9ed9-4fed85835a01 moved from PoweringDown --> Up
2016-06-23 16:04:20,628 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-20) [2d2d1b3a] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Shutdown of VM jenkins-phx-ovirt-org failed.
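
For anyone going through the full engine.log, the following is a rough Python sketch that pulls the VM state transitions ("moved from X --> Y") out of lines shaped like the ones in the excerpt above. The regular expression is inferred only from these few lines, so it is an assumption that other engine.log entries follow the same layout; the function name is hypothetical.

```python
import re
from datetime import datetime

# Pattern inferred from the excerpt: timestamp, arbitrary prefix,
# "VM <name> <uuid> moved from <old> --> <new>".
TRANSITION_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} .*? "
    r"VM (\S+) \S+ moved from (\w+) --> (\w+)"
)


def vm_transitions(lines):
    """Return (timestamp, vm_name, old_state, new_state) for matching lines."""
    found = []
    for line in lines:
        match = TRANSITION_RE.match(line.strip())
        if match is None:
            continue  # not a state-transition line
        stamp, vm_name, old_state, new_state = match.groups()
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        found.append((when, vm_name, old_state, new_state))
    return found


if __name__ == "__main__":
    import sys

    # Usage: python vm_transitions.py < engine.log
    for when, vm_name, old_state, new_state in vm_transitions(sys.stdin):
        print(f"{when}  {vm_name}: {old_state} -> {new_state}")
```

Run against the excerpt, this reports Up -> Paused at 09:06:49, Paused -> Up at 09:17:29, and PoweringDown -> Up at 16:04:20 for jenkins-phx-ovirt-org.
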