Re: [one-users] onevm migrate/suspend and checkpoint files

2014-08-08 Thread Steven Timm

We basically decided to go back to the stock RHEL/CentOS/Scientific Linux
kernels.  Performance with the 3.10 kernel was better, but we never
could get migration to work correctly.  Performance of the old kernel
is still very bad, but at least we can migrate cleanly, and
in the past few errata updates Red Hat has fixed it so that at least
the kernel doesn't crash the whole machine under those conditions.

Steve Timm

On Fri, 8 Aug 2014, Jaime Melis wrote:


Hi Steven,
Unfortunately I'm not able to help with most of the email; however, I can
tell you that the underlying libvirt operation for checkpointing is the save
operation.

http://wiki.libvirt.org/page/VM_lifecycle
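
For anyone who wants to reproduce the checkpoint step by hand, here is a minimal sketch of the save/restore pair. It assumes the KVM driver ends up doing the equivalent of "virsh save" followed by "virsh restore". The helper names, the qemu:///system URI, the domain name and the checkpoint path are illustrative assumptions, not copied from the OpenNebula driver scripts.

```python
import subprocess

# Assumed connection URI for a local KVM host; adjust for your site.
LIBVIRT_URI = "qemu:///system"


def save_command(domain, checkpoint_path, uri=LIBVIRT_URI):
    """Build the virsh call that saves a running domain to a state file."""
    return ["virsh", "--connect", uri, "save", domain, checkpoint_path]


def restore_command(checkpoint_path, uri=LIBVIRT_URI):
    """Build the virsh call that restores a domain from a state file."""
    return ["virsh", "--connect", uri, "restore", checkpoint_path]


def run(command, dry_run=True):
    """Return the command line as a string; execute it only if dry_run is False."""
    if not dry_run:
        # Raises CalledProcessError if virsh exits with a non-zero status.
        subprocess.run(command, check=True)
    return " ".join(command)
```

With the default dry_run=True, run(save_command("one-42", "/tmp/checkpoint")) only returns the command line, so it can be inspected before being tried on a test VM.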

regards,
Jaime



On Thu, Jul 24, 2014 at 4:00 PM, Steven Timm t...@fnal.gov wrote:

  When OpenNebula creates a checkpoint file either as part
  of a onevm migrate or onevm suspend, what libvirt function
  is it calling to do the checkpoint?

  We are seeing some issues on our new Ivy Bridge hardware:
  sometimes, in the process of a (non-live) migration,
  the clock gets confused in such a way that when the
  virtual machine starts from the checkpoint file
  it hangs and the kvm process uses 100% of CPU for
  a day or more, after which it usually resolves itself.  In some
  cases we see the clock jump very far into the future (the year 2598),
  which in itself can confuse a Linux VM enough to hang it.

  Any clues on what OpenNebula/libvirt are doing under the
  covers?
  Is there any reason to suspect that on Ivy Bridge hardware,
  which has some 60 different CPU frequencies available
  for CPU scaling, the rapidly fluctuating clock speeds might
  get us into trouble, i.e. suspending the machine at one clock
  frequency and bringing it back at a different clock frequency?

  Does anyone have experience in migrating between hardware
  generations, e.g. Ivy Bridge to Westmere and vice versa?

  Finally, has anyone run a successful combination of kernel 3.10
  or greater and RHEL 6/CentOS 6/Scientific Linux 6?
  (In particular, do the stock versions of libvirt and qemu-kvm
  play nicely with the 3.10 kernel?)
  The 2.6.32 kernel that comes with RHEL 6/CentOS 6/Scientific Linux 6 is
  just not up to dealing with virtualization on Ivy Bridge machines, and it
  has some trouble on Sandy Bridge too.

  Thanks

  Steve Timm



  --
  Steven C. Timm, Ph.D  (630) 840-8525
  t...@fnal.gov  http://home.fnal.gov/~timm/
  Fermilab Scientific Computing Division, Scientific Computing Services Quad.
  Grid and Cloud Services Dept., Associate Dept. Head for Cloud Computing
  ___
  Users mailing list
  Users@lists.opennebula.org
  http://lists.opennebula.org/listinfo.cgi/users-opennebula.org




--
Jaime Melis
Project Engineer
OpenNebula - Flexible Enterprise Cloud Made Simple
www.OpenNebula.org | jme...@opennebula.org




--
Steven C. Timm, Ph.D  (630) 840-8525
t...@fnal.gov  http://home.fnal.gov/~timm/
Fermilab Scientific Computing Division, Scientific Computing Services Quad.
Grid and Cloud Services Dept., Associate Dept. Head for Cloud Computing


[one-users] onevm migrate/suspend and checkpoint files

2014-07-24 Thread Steven Timm


When OpenNebula creates a checkpoint file either as part
of a onevm migrate or onevm suspend, what libvirt function
is it calling to do the checkpoint?

We are seeing some issues on our new Ivy Bridge hardware:
sometimes, in the process of a (non-live) migration,
the clock gets confused in such a way that when the
virtual machine starts from the checkpoint file
it hangs and the kvm process uses 100% of CPU for
a day or more, after which it usually resolves itself.  In some
cases we see the clock jump very far into the future (the year 2598),
which in itself can confuse a Linux VM enough to hang it.

Any clues on what OpenNebula/libvirt are doing under the covers?
Is there any reason to suspect that on Ivy Bridge hardware,
which has some 60 different CPU frequencies available
for CPU scaling, the rapidly fluctuating clock speeds might
get us into trouble, i.e. suspending the machine at one clock
frequency and bringing it back at a different clock frequency?
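
One way to narrow down the frequency-scaling question is to check whether the host CPUs advertise an invariant time-stamp counter. On Linux, the constant_tsc and nonstop_tsc flags in /proc/cpuinfo indicate a TSC that ticks at a fixed rate regardless of frequency scaling and keeps running in deep idle states. The sketch below only parses those flags; the function names are made up for illustration, and a positive result neither proves nor rules out the hang described above.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    # No 'flags' line found (e.g. non-x86 host or empty input).
    return set()


def tsc_report(cpuinfo_text):
    """Report whether the invariant-TSC related flags are present."""
    flags = cpu_flags(cpuinfo_text)
    return {name: name in flags for name in ("constant_tsc", "nonstop_tsc")}


def host_tsc_report(path="/proc/cpuinfo"):
    """Convenience wrapper that reads the flags from the local host."""
    with open(path) as handle:
        return tsc_report(handle.read())
```

Running host_tsc_report() on both the source and the destination host of a migration gives a quick comparison between, say, an Ivy Bridge and a Westmere node.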

Does anyone have experience in migrating between hardware
generations, e.g. Ivy Bridge to Westmere and vice versa?

Finally, has anyone run a successful combination of kernel 3.10
or greater and RHEL 6/CentOS 6/Scientific Linux 6?
(In particular, do the stock versions of libvirt and qemu-kvm
play nicely with the 3.10 kernel?)
The 2.6.32 kernel that comes with RHEL 6/CentOS 6/Scientific Linux 6 is just not
up to dealing with virtualization on Ivy Bridge machines, and it
has some trouble on Sandy Bridge too.

Thanks

Steve Timm



--
Steven C. Timm, Ph.D  (630) 840-8525
t...@fnal.gov  http://home.fnal.gov/~timm/
Fermilab Scientific Computing Division, Scientific Computing Services Quad.
Grid and Cloud Services Dept., Associate Dept. Head for Cloud Computing