Hi,
Sounds like libvirt/qemu have a problem restarting the VM. Are the
config files for those daemons under /etc/libvirt set up identically
on all your hosts? Did you make sure that the user that starts the
kvm process is able to read and write to the files? (In my install I
added the user "qemu" to the "oneadmin" group to make that work.)
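
If you want to reason about that permission question away from the host, here is a minimal Python sketch of the classic owner/group/other check. The function names and the numeric ids in the usage note are made up for illustration. It deliberately ignores root's override, ACLs, SELinux/AppArmor and the permissions on parent directories, any of which can also produce "Permission denied".

```python
import os
import stat


def classic_access(mode, file_uid, file_gid, uid, gids):
    """Return (can_read, can_write) from the classic owner/group/other bits.

    Assumption: only the plain mode bits are considered. Root's override,
    ACLs, SELinux/AppArmor and parent-directory permissions are ignored.
    """
    if uid == file_uid:
        read_bit, write_bit = stat.S_IRUSR, stat.S_IWUSR
    elif file_gid in gids:
        read_bit, write_bit = stat.S_IRGRP, stat.S_IWGRP
    else:
        read_bit, write_bit = stat.S_IROTH, stat.S_IWOTH
    return bool(mode & read_bit), bool(mode & write_bit)


def check_path(path, uid, gids):
    """Apply classic_access to a real file on disk."""
    st = os.stat(path)
    return classic_access(st.st_mode, st.st_uid, st.st_gid, uid, gids)
```

For example, with a disk image of mode 0660 owned by a hypothetical uid/gid 1000 ("oneadmin"), a hypothetical "qemu" user with uid 107 gets no access while it is only in its own group 107, and gets read/write access once group 1000 is among its groups. That is the effect of adding "qemu" to the "oneadmin" group.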
You should have a log file for this particular VM in
/var/log/libvirt/qemu/one-[vm-id].log. What does it say?
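
To pick the interesting lines out of that log quickly, something like the following Python sketch could help. The keyword list is only my guess at what is worth looking for, and the helper names are hypothetical; adjust both to taste.

```python
# Assumption: these substrings are just a starting point, not an official list.
KEYWORDS = ("permission denied", "could not open", "error")


def log_path(vm_id, log_dir="/var/log/libvirt/qemu"):
    """Build the per-VM libvirt log path for an OpenNebula VM id."""
    return f"{log_dir}/one-{vm_id}.log"


def suspicious_lines(log_text, keywords=KEYWORDS):
    """Return (line_number, line) pairs whose text contains any keyword."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        lowered = line.lower()
        if any(keyword in lowered for keyword in keywords):
            hits.append((number, line.strip()))
    return hits
```

You would read the file at log_path(383) on the target host (as a user allowed to read it) and pass its contents to suspicious_lines.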
You say it happens occasionally, so not always? Does it always fail
when you migrate to this particular host, or only occasionally? If
occasionally, you have to look for things that change from time to
time. Do you have config files maintained by chef/puppet/cfengine?
Do you have user accounts maintained by LDAP/NIS? If this host
always fails, it must be a bad config on this host; compare it to the
other hosts that do work.
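
One way to do that comparison is to copy the same config file (for instance one of those under /etc/libvirt) from a working host and from the failing host, and diff only the active settings. This is a rough Python sketch using the standard difflib module. The function names and the default host labels are hypothetical, and it assumes "#" starts a comment line.

```python
import difflib


def effective_lines(text):
    """Drop blank lines and '#' comment lines so only active settings remain."""
    lines = []
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#"):
            lines.append(line)
    return lines


def config_diff(good_text, bad_text, good_name="good-host", bad_name="bad-host"):
    """Return unified-diff lines between the active settings of two configs."""
    return list(
        difflib.unified_diff(
            effective_lines(good_text),
            effective_lines(bad_text),
            fromfile=good_name,
            tofile=bad_name,
            lineterm="",
        )
    )
```

An empty result means the active settings are identical, so the difference has to be somewhere else (users, groups, file ownership).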
Where does the "unable to read from monitor" come from? Is it
opennebula? That is kind of normal: it tries to read the status for
the VM but since it is not up, it fails. But actually, that should
not give you a "connection reset"...
What you can do to test your setup when it fails:
Go to the directory with the files and, before you change anything,
do a "virsh create deployment.X" where X is the highest number you
can find in that dir. (In your example, it would be 2.) (You will
need to be root for this or you will not be able to use the "system"
libvirt space.)
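
If there are many deployment files, note that a plain alphabetical listing puts deployment.10 before deployment.2. A small Python sketch for picking the highest number numerically (the function name is hypothetical):

```python
import os
import re

# Matches names like "deployment.0", "deployment.12"; nothing else.
_DEPLOYMENT = re.compile(r"^deployment\.(\d+)$")


def latest_deployment(directory):
    """Return the path of the highest-numbered deployment.N file, or None."""
    best = None  # (number, file name) of the best candidate so far
    for name in os.listdir(directory):
        match = _DEPLOYMENT.match(name)
        if match:
            number = int(match.group(1))
            if best is None or number > best[0]:
                best = (number, name)
    if best is None:
        return None
    return os.path.join(directory, best[1])
```

The path it returns is what you would hand, as root, to "virsh --connect qemu:///system create".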
If that "just works" then you have a really strange problem. If it
gives you errors, try to solve them. :)
(If you do not know "virsh", you should read up a bit on it.)
Hope this is a bit helpful.
Jhon
On 07/04/2012 12:59 PM, David wrote:
Hi all,
I am using OpenNebula version 3.2.1.
When I run a VM migrate operation, the migration occasionally fails
with the following log:
Thu Jun 28 15:16:24 2012 [LCM][I]: New VM state is RUNNING
Thu Jun 28 15:17:09 2012 [LCM][I]: New VM state is SAVE_MIGRATE
Thu Jun 28 15:17:42 2012 [VMM][I]: save: Executed "virsh --connect qemu:///system save one-383 /one_images/383/images/checkpoint".
Thu Jun 28 15:17:42 2012 [VMM][I]: ExitCode: 0
Thu Jun 28 15:17:42 2012 [VMM][I]: Successfully execute virtualization driver operation: save.
Thu Jun 28 15:17:43 2012 [VMM][I]: ExitCode: 0
Thu Jun 28 15:17:43 2012 [VMM][I]: Successfully execute network driver operation: clean.
Thu Jun 28 15:17:43 2012 [LCM][I]: New VM state is PROLOG_MIGRATE
Thu Jun 28 15:56:03 2012 [TM][I]: tm_mv.sh: Moving /one_images/383/images
Thu Jun 28 15:56:03 2012 [TM][I]: tm_mv.sh: Executed "ssh compute-56-5.local mkdir -p /one_images/383".
Thu Jun 28 15:56:03 2012 [TM][I]: tm_mv.sh: Executed "scp -r compute-56-4.local:/one_images/383/images compute-56-5.local:/one_images/383/images".
Thu Jun 28 15:56:03 2012 [TM][I]: tm_mv.sh: Executed "ssh compute-56-4.local rm -rf /one_images/383/images".
Thu Jun 28 15:56:03 2012 [TM][I]: ExitCode: 0
Thu Jun 28 15:56:03 2012 [LCM][I]: New VM state is BOOT
Thu Jun 28 15:56:05 2012 [VMM][I]: ExitCode: 0
Thu Jun 28 15:56:05 2012 [VMM][I]: Successfully execute network driver operation: pre.
Thu Jun 28 15:56:06 2012 [VMM][I]: Command execution fail: /var/tmp/one/vmm/kvm/restore /one_images/383/images/checkpoint compute-56-5.local 383 compute-56-5.local
Thu Jun 28 15:56:06 2012 [VMM][E]: restore: Command "virsh --connect qemu:///system restore /one_images/383/images/checkpoint" failed.
Thu Jun 28 15:56:06 2012 [VMM][E]: restore: error: Failed to restore domain from /one_images/383/images/checkpoint
Thu Jun 28 15:56:06 2012 [VMM][I]: error: internal error process exited while connecting to monitor: qemu-kvm: -drive file=/one_images/383/images/disk.0,if=none,id=drive-virtio-disk0,format=raw: could not open disk image /one_images/383/images/disk.0: Permission denied
Thu Jun 28 15:56:06 2012 [VMM][E]: Could not restore from /one_images/383/images/checkpoint
Thu Jun 28 15:56:06 2012 [VMM][I]: ExitCode: 1
Thu Jun 28 15:56:06 2012 [VMM][I]: Failed to execute virtualization driver operation: restore.
Thu Jun 28 15:56:06 2012 [VMM][E]: Error restoring VM: Could not restore from /one_images/383/images/checkpoint
Thu Jun 28 15:56:06 2012 [DiM][I]: New VM state is FAILED
I executed the command: chmod +x *
but received the following error message:
error: Unable to read from monitor: Connection reset by peer
Is this what causes the problem?
Thanks! Hoping to hear from you.
Regards!
_______________________________________________
Users mailing list
Users@lists.opennebula.org
http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
--
Jhon Masschelein
Senior Systeemprogrammeur
SARA - HPCV
Science Park 140
1098 XG Amsterdam
T +31 (0)20 592 8099
F +31 (0)20 668 3167
M +31 (0)6 4748 9328
E jhon.masschel...@sara.nl
http://www.sara.nl