Re: [PATCH] kexec/kdump implementation for Xen PV domU

2013-07-30 Thread Daniel Kiper
On Mon, Jul 29, 2013 at 02:44:19PM -0700, Matt Wilson wrote:
 On Mon, Jul 29, 2013 at 07:15:43PM +0200, Daniel Kiper wrote:
  Hi all,
 
  Here I am sending as attachments patches enabling kexec/kdump
  support in Xen PV domU. Only the x86_64 architecture is supported.
  There is no support for i386 but some code could be easily reused.
  Here is a description of patches:

 [...]

- kexec-kernel-only_20121203.patch: this patch fixes timer
  issue on Amazon EC2 machines.

 Hi Daniel,

 Do you know the cause of this issue? Does it have something to do with
 singleshot timer migration when offlining/onlining SMP CPUs?

Sadly, no. I was not able to reproduce this on my machines (I tested
on Xen 4.1). However, as far as I saw, this issue appears on Xen 3.4 and 4.0
(IIRC the versions used on your machines). Additionally, it does not
depend on the CPU model, and it appears quite often but not always.
Maybe it is linked with singleshot timer migration.

Daniel
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: is kexec on Xen domU possible?

2013-07-22 Thread Daniel Kiper
On Fri, Jul 19, 2013 at 01:58:05PM -0700, H. Peter Anvin wrote:
 On 07/19/2013 12:14 PM, Greg KH wrote:
 
  The errors that the kexec tools seem to run into is finding the memory
  to place the new kernel into, is that just an issue that PV guests
  aren't given enough kernel memory in which to replicate themselves from
  dom0?
 
  There are a lot of differences between baremetal machines and PV guests.
  For example, you are not able to do identity mapping per se in PV guests.
  Arguments to the new kernel are passed in a completely different way, etc.
 
  Ok, thanks for confirming that it is possible, but doesn't currently
  work for pv guests.
 

 Also, in any virtualized environment the hypervisor can do a better job
 for things like kdump, simply because it can provide two things that are
 otherwise hard to do:

 1. a known-good system state;
 2. a known-clean kdump image.

 As such, I do encourage the virtualization people to (also) develop
 hypervisor-*aware* solutions for these kinds of things.

In general I agree, but if you cannot change the hypervisor
and/or dom0 (e.g. you are using a cloud provider that is
stuck on an old version of Xen) then you have no choice.

Daniel


Re: is kexec on Xen domU possible?

2013-07-19 Thread Daniel Kiper
On Thu, Jul 18, 2013 at 07:06:03PM -0700, Greg KH wrote:
 Hi all,

 I was messing around with kexec and it seems to work just fine on KVM,
 but for Xen domU images, it doesn't work at all.  Daniel, I saw some
 patches from you back in September 2012 for adding this support for
 Dom0, but they don't seem to have gone into the kernel (but other

At first I was going to use the existing kexec implementation in Xen for Dom0.
However, after some discussion on Xen-devel and LKML we concluded that
this implementation is completely broken and should be rewritten. David
Vrabel from Citrix wrote a new kexec implementation for Xen which does
not rely on the Linux kernel. I hope that it will be included in the Xen 4.4 release.

 patches went into kexec-tools at that time.)  You mention that domU

These are mostly fixes which were needed for the planned Xen kexec support.
IIRC, they are also needed for systems using the ancient Xen Linux kernel 2.6.18.
However, most of this implementation will be replaced by the new one written
by David Vrabel. It will contain support only for the new Xen Dom0 kexec
implementation.

 support is easy after your patches go in, is that because Dom0 needs
 to support this, or is it something specific to only domU?

In the case of domU we should consider the following cases:
  - PV guests: there is no support for kexec at this time.
    I once wrote an implementation for this type of guest
    for one company, but according to our agreement I could not
    publish the code. However, I could use it as a base for
    a publicly available kexec implementation. Currently, I do
    not have any plans to work on this because I have more important
    things to do. However, the question of kexec support for PV
    guests is raised from time to time, and maybe one day this issue
    will become more important than the others,
  - HVM guests: kexec should work without any issue,
  - PVonHVM guests: IIRC, there were some issues with PV
    drivers but they were fixed some time ago by patches
    posted by Olaf Hering,
  - PVH guests: this type of guest is not available in current
    Xen releases yet. However, Konrad Wilk has done some preliminary
    work on kexec support, but there are still some issues to resolve.

I do not know what you are trying to do, but if you would like
to get some crash dumps there is also another solution.
You could use xm/xl dump-core from Dom0 to get dumps of domU memory.

 Also, what's the status of those patches for the kernel, I don't see
 them reposted anywhere, did you drop them?

They were dropped. Please look above for details.

I hope that helps.

Daniel


Re: is kexec on Xen domU possible?

2013-07-19 Thread Daniel Kiper
On Fri, Jul 19, 2013 at 08:12:43AM -0700, Greg KH wrote:
 On Fri, Jul 19, 2013 at 03:18:19PM +0200, Daniel Kiper wrote:
   support is easy after your patches go in, is that because Dom0 needs
   to support this, or is it something specific to only domU?
 
  In the case of domU we should consider the following cases:
    - PV guests: there is no support for kexec at this time.
      I once wrote an implementation for this type of guest
      for one company, but according to our agreement I could not
      publish the code. However, I could use it as a base for
      a publicly available kexec implementation. Currently, I do
      not have any plans to work on this because I have more important
      things to do. However, the question of kexec support for PV
      guests is raised from time to time, and maybe one day this issue
      will become more important than the others,
    - HVM guests: kexec should work without any issue,
    - PVonHVM guests: IIRC, there were some issues with PV
      drivers but they were fixed some time ago by patches
      posted by Olaf Hering,
    - PVH guests: this type of guest is not available in current
      Xen releases yet. However, Konrad Wilk has done some preliminary
      work on kexec support, but there are still some issues to resolve.
 
  I do not know what you are trying to do, but if you would like
  to get some crash dumps there is also another solution.
  You could use xm/xl dump-core from Dom0 to get dumps of domU memory.

 As Brandon said, we were trying to use kexec in a PV guest in domU to
 run another kernel.  I had assumed this wouldn't need support from dom0.

You are right.

 As you have implemented this in the past, did you need to change dom0 in
 order to achieve this, and if so, why?

It was a strong requirement not to change anything in the hypervisor or dom0.
I succeeded in doing that, but it requires changes in the kernel and kexec-tools.

 The errors that the kexec tools seem to run into is finding the memory
 to place the new kernel into, is that just an issue that PV guests
 aren't given enough kernel memory in which to replicate themselves from
 dom0?

There are a lot of differences between baremetal machines and PV guests.
For example, you are not able to do identity mapping per se in PV guests.
Arguments to the new kernel are passed in a completely different way, etc.
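
The identity-mapping point can be illustrated with a toy model. In a PV
guest the kernel's pseudo-physical frame numbers (PFNs) are translated to
machine frame numbers (MFNs) through a physical-to-machine table, so a range
that is contiguous in guest terms need not be contiguous, or identical, in
machine terms. This is only a sketch: the table contents and every toy_*
name are made up for illustration and are not taken from the patches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy p2m table: index = guest PFN, value = machine MFN.
 * Real PV guests keep a much larger structure; these values are made up. */
static const uint64_t toy_p2m[] = { 0x1a3f0, 0x00c12, 0x1a3f1, 0x7be04 };
#define TOY_P2M_ENTRIES (sizeof(toy_p2m) / sizeof(toy_p2m[0]))

/* Translate a guest pseudo-physical frame number to a machine frame number. */
static uint64_t toy_pfn_to_mfn(size_t pfn)
{
    assert(pfn < TOY_P2M_ENTRIES);
    return toy_p2m[pfn];
}

/* Return 1 if the PFN range [start, start + count) is backed by
 * consecutive machine frames, 0 otherwise. */
static int toy_range_is_machine_contiguous(size_t start, size_t count)
{
    for (size_t i = 1; i < count; i++) {
        if (toy_pfn_to_mfn(start + i) != toy_pfn_to_mfn(start) + i)
            return 0;
    }
    return 1;
}
```

With such a table, PFN 0 and PFN 1 are neighbours for the guest but land on
unrelated machine frames, which is why code that assumes "physical address
equals machine address" cannot be reused unchanged.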

Daniel


Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation

2013-01-11 Thread Daniel Kiper
On Thu, Jan 10, 2013 at 02:19:55PM +, David Vrabel wrote:
 On 04/01/13 17:01, Daniel Kiper wrote:
  On Fri, Jan 04, 2013 at 02:38:44PM +, David Vrabel wrote:
  On 04/01/13 14:22, Daniel Kiper wrote:
  On Wed, Jan 02, 2013 at 11:26:43AM +, Andrew Cooper wrote:
  On 27/12/12 18:02, Eric W. Biederman wrote:
  Andrew Cooperandrew.coop...@citrix.com  writes:
 
  On 27/12/2012 07:53, Eric W. Biederman wrote:
  The syscall ABI still has the wrong semantics.
 
  Aka totally unmaintainable and unmergeable.
 
  The concept of domU support is also strange.  What does domU support 
  even mean, when the dom0 support is loading a kernel to pick up Xen 
  when Xen falls over.
  There are two requirements pulling at this patch series, but I agree
  that we need to clarify them.
  It probably makes sense to split them apart a little even.
 
 
 
  Thinking about this split, there might be a way to simplify it even more.
 
  /sbin/kexec can load the Xen crash kernel itself by issuing
  hypercalls using /dev/xen/privcmd.  This would remove the need for
  the dom0 kernel to distinguish between loading a crash kernel for
  itself and loading a kernel for Xen.
 
  Or is this just a silly idea complicating the matter?
 
  This is impossible with current Xen kexec/kdump interface.
  It should be changed to do that. However, I suppose that
  Xen community would not be interested in such changes.
 
  I don't see why the hypercall ABI cannot be extended with new sub-ops
  that do the right thing -- the existing ABI is a bit weird.
 
  I plan to start prototyping something shortly (hopefully next week) for
  the Xen kexec case.
 
  Wow... As I can see, this time the Xen community is interested.
  That is great. I agree that the current kexec interface is not ideal.

 I spent some more time looking at the existing interface and
 implementation and it really is broken.

  David, I am happy to help in that process. However, if you wish I could
  carry it myself. Anyway, it looks that I should hold on with my
  Linux kexec/kdump patches.

 I should be able to post some prototype patches for Xen in a few weeks.
  No guarantees though.

That is great. If you need any help drop me a line.

  My .5 cents:
- We should focus on KEXEC_CMD_kexec_load and KEXEC_CMD_kexec_unload;
  probably we should introduce KEXEC_CMD_kexec_load2 and 
  KEXEC_CMD_kexec_unload2;
  load should __LOAD__ kernel image and other things into hypervisor 
  memory;

 Yes, but I don't see how we can easily support both ABIs easily.  I'd be
 in favour of replacing the existing hypercalls and requiring updated
 kexec tools in dom0 (this isn't that different to requiring the correct
 libxc in dom0).

Why? Just define new structures for the new functions of the kexec hypercall.
That should suffice.

  I suppose that almost all things could be copied from
  linux/kernel/kexec.c,
  linux/arch/x86/kernel/{machine_kexec_$(BITS).c,relocate_kernel_$(BITS).c};
  I think that KEXEC_CMD_kexec should stay as is,

 I don't think we want all the junk from Linux inside Xen -- we only want
 to support the kdump case and do not have to handle returning from the
 kexec image.

I do not want to implement kexec jump or anything like that. However, I think that
it is worth reusing code which can be reused. As far as I know, there is a lot of stuff
which was taken, with smaller or bigger changes, from the Linux kernel.
Why would we reinvent the wheel this time?

Additionally, we should not drop kexec support. It is the main part of kdump.
In the kdump case the new kernel (and other stuff) is placed in preallocated
space, in contrast to kexec. That's all. kexec is useful if you would like
to quickly (skipping the BIOS) switch from Xen to baremetal Linux. If you drop
kexec support from Xen then you need to alter the kexec-tools package in a bunch
of distros to take the new Xen behavior into account.
I think that is not what we want to do.

- Hmmm... Now I think that we should still use kexec syscall to load image
  into Xen memory (with new KEXEC_CMD_kexec_load2) because it establishes
  all things which are needed to call kdump if dom0 crashes; however,
  I could be wrong...

 I don't think we need the kexec syscall.  The kernel can unconditionally
 do the crash hypercall, which will return if the kdump kernel isn't
 loaded and the kernel can fall back to the regular non-kexec panic.

No, please do not do that. When you call HYPERVISOR_kexec_op(KEXEC_CMD_kexec)
the system is completely shut down. Returning from HYPERVISOR_kexec_op(KEXEC_CMD_kexec)
would require restoring some kernel functionality. That may be impossible
in some cases. Additionally, it means that some changes would have to be made
in the generic kexec code path. As far as I know, the kexec maintainers are very
reluctant to make such changes.

 This will allow the kexec syscall to be used only for the domU kexec case.

- last but not least, we should think about support for PV guests
  too.

 I won't be looking at this.

OK.

 To avoid confusion about the two largely

Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation

2013-01-11 Thread Daniel Kiper
On Mon, Jan 07, 2013 at 01:49:44PM +, Ian Campbell wrote:
 On Mon, 2013-01-07 at 12:34 +, Daniel Kiper wrote:
  I think that the new kexec hypercall function should mimic the kexec syscall.
 
 We want to have an interface can be used by non-Linux domains (both dom0
 and domU) as well though, so please bear this in mind.

I agree, but all arguments passed to the kexec syscall are quite generic and they
do not impose any limitations. Just look into include/linux/kexec.h.
That is why I think that a lot of things could be taken from
the Linux kexec implementation.

Daniel


Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation

2013-01-07 Thread Daniel Kiper
On Fri, Jan 04, 2013 at 02:11:46PM -0500, Konrad Rzeszutek Wilk wrote:
 On Fri, Jan 04, 2013 at 06:07:51PM +0100, Daniel Kiper wrote:
  On Fri, Jan 04, 2013 at 02:41:17PM +, Jan Beulich wrote:
On 04.01.13 at 15:22, Daniel Kiper daniel.ki...@oracle.com wrote:
On Wed, Jan 02, 2013 at 11:26:43AM +, Andrew Cooper wrote:
/sbin/kexec can load the Xen crash kernel itself by issuing
hypercalls using /dev/xen/privcmd.  This would remove the need for
the dom0 kernel to distinguish between loading a crash kernel for
itself and loading a kernel for Xen.
   
Or is this just a silly idea complicating the matter?
   
This is impossible with current Xen kexec/kdump interface.
  
   Why?
 
  Because current KEXEC_CMD_kexec_load does not load kernel
  image and other things into Xen memory. It means that it
  should live somewhere in dom0 Linux kernel memory.

 We could have a very simple hypercall which would have:

 struct fancy_new_hypercall {
   xen_pfn_t payload; // IN
   ssize_t len; // IN
 #define DATA (1<<1)
 #define DATA_EOF (1<<2)
 #define DATA_KERNEL (1<<3)
 #define DATA_RAMDISK (1<<4)
   unsigned int flags; // IN
   unsigned int status; // OUT
 };

 which would in a loop just iterate over the payloads and
 let the hypervisor stick it in the crashkernel space.

 This is all hand-waving of course. There probably would be a need
 to figure out how much space you have in the reserved Xen's
 'crashkernel' memory region too.
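
The loop described above can be sketched roughly as follows. This is a
user-space mock, not a real hypercall: toy_hypercall() stands in for the
hypervisor side, a plain pointer stands in for the xen_pfn_t payload, the
status field is a signed int for simplicity, and the size of the reserved
area is made up. All toy_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_DATA         (1u << 1)  /* chunk carries payload bytes       */
#define TOY_DATA_EOF     (1u << 2)  /* last chunk of the current payload */
#define TOY_DATA_KERNEL  (1u << 3)  /* payload is the kernel image       */
#define TOY_DATA_RAMDISK (1u << 4)  /* payload is the ramdisk            */

#define TOY_CRASH_AREA_SIZE 64      /* made-up size of the reserved area */

struct toy_load_chunk {
    const void *payload;   /* IN: stand-in for the xen_pfn_t in the mail */
    size_t len;            /* IN */
    unsigned int flags;    /* IN */
    int status;            /* OUT: 0 on success, -1 if out of space */
};

static unsigned char toy_crash_area[TOY_CRASH_AREA_SIZE];
static size_t toy_crash_used;

/* Mock of the hypervisor side: append one chunk to the reserved area. */
static void toy_hypercall(struct toy_load_chunk *c)
{
    if (c->len > TOY_CRASH_AREA_SIZE - toy_crash_used) {
        c->status = -1;
        return;
    }
    memcpy(toy_crash_area + toy_crash_used, c->payload, c->len);
    toy_crash_used += c->len;
    c->status = 0;
}

/* Caller side: split one payload into chunks and submit them in a loop. */
static int toy_load_payload(const void *buf, size_t len, size_t chunk,
                            unsigned int kind)
{
    const unsigned char *p = buf;

    assert(chunk > 0);
    while (len > 0) {
        size_t n = len < chunk ? len : chunk;
        struct toy_load_chunk c = {
            .payload = p,
            .len = n,
            .flags = TOY_DATA | kind | (n == len ? TOY_DATA_EOF : 0),
        };

        toy_hypercall(&c);
        if (c.status != 0)
            return c.status;
        p += n;
        len -= n;
    }
    return 0;
}
```

The capacity check in the mock corresponds to the open question in the mail
about knowing how much room is left in the reserved 'crashkernel' region.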

I think that the new kexec hypercall function should mimic the kexec syscall.
It means that all arguments passed to the hypercall should have the same types
if possible; if that is not possible, then conversion should be
very easy. Additionally, I think that one call of the new hypercall
load function should load all needed things in the right place and
return a relevant status. Last but not least, the new functionality should
be available through /dev/xen/privcmd or directly from the kernel without
much effort.
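
As a rough illustration of "mimic the kexec syscall", the sketch below
gathers the four kexec_load(2) arguments (entry, nr_segments, segments,
flags) into one structure with unchanged types, and validates all segments
in a single call that returns one status. The structure and function names
are hypothetical and are not an existing Xen interface; the only check
shown is that a source buffer must not be larger than its destination.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as the user-visible struct kexec_segment of kexec_load(2). */
struct toy_kexec_segment {
    const void *buf;    /* source buffer in the caller's memory */
    size_t bufsz;       /* bytes to copy from buf                */
    unsigned long mem;  /* destination physical address          */
    size_t memsz;       /* size of the destination region        */
};

/* Hypothetical argument block for a "load2" sub-op: the four kexec_load(2)
 * arguments gathered into one structure, with unchanged types. */
struct toy_kexec_load2 {
    unsigned long entry;
    unsigned long nr_segments;
    const struct toy_kexec_segment *segments;
    unsigned long flags;
};

/* Trivial conversion from syscall-style arguments to the argument block. */
static struct toy_kexec_load2
toy_pack_load2(unsigned long entry, unsigned long nr_segments,
               const struct toy_kexec_segment *segments, unsigned long flags)
{
    struct toy_kexec_load2 args = {
        .entry = entry,
        .nr_segments = nr_segments,
        .segments = segments,
        .flags = flags,
    };
    return args;
}

/* One call checks everything and reports a single status:
 * 0 if every segment fits its destination, -1 otherwise. */
static int toy_check_load2(const struct toy_kexec_load2 *args)
{
    for (unsigned long i = 0; i < args->nr_segments; i++) {
        if (args->segments[i].bufsz > args->segments[i].memsz)
            return -1;
    }
    return 0;
}
```

Because the field types match the syscall arguments, the conversion is a
plain field-by-field copy, which is the "very easy conversion" case.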

Daniel


Re: [PATCH v3 02/11] x86/kexec: Add extra pointers to transition page table PGD, PUD, PMD and PTE

2013-01-07 Thread Daniel Kiper
On Mon, Jan 07, 2013 at 09:48:20AM +, Jan Beulich wrote:
  On 04.01.13 at 18:25, Daniel Kiper daniel.ki...@oracle.com wrote:
  Right, so where is virtual mapping of control page established?
  I could not find relevant code in SLES kernel which does that.

 In the hypervisor (xen/arch/x86/machine_kexec.c:machine_kexec_load()).
 xen/arch/x86/machine_kexec.c:machine_kexec() then simply uses
 image->page_list[1].

This (xen/arch/x86/machine_kexec.c:machine_kexec_load()) maps the relevant
page (allocated earlier by dom0) in the hypervisor fixmap area. However,
it does not create the relevant mapping in the transition page table, which
leads to a crash when %cr3 is switched from the Xen page table to the
transition page table.

Daniel


Re: [PATCH v3 01/11] kexec: introduce kexec firmware support

2013-01-04 Thread Daniel Kiper
On Thu, Dec 27, 2012 at 07:06:13PM -0800, ebied...@xmission.com wrote:
 Daniel Kiper daniel.ki...@oracle.com writes:

  Daniel Kiper daniel.ki...@oracle.com writes:
 
   Some kexec/kdump implementations (e.g. Xen PVOPS) could not use default
   Linux infrastructure and require some support from firmware and/or 
   hypervisor.
   To cope with that problem kexec firmware infrastructure was introduced.
   It allows a developer to use all kexec/kdump features of given firmware
   or hypervisor.
 
  As this stands this patch is wrong.
 
  You need to pass an additional flag from userspace through /sbin/kexec
  that says load the kexec image in the firmware.  A global variable here
  is not ok.
 
  As I understand it you are loading a kexec on xen panic image.  Which
  is semantically different from a kexec on linux panic image.  It is not
  ok to have a silly global variable kexec_use_firmware.
 
  Earlier we agreed that /sbin/kexec should call the kexec syscall with
  a special flag. However, during work on the Xen kexec/kdump v3 patch
  I found that this is insufficient because e.g. crash_kexec()
  should also execute different code when firmware support is used.

 That implies you have the wrong model of userspace.

 Very simply there is:
 linux kexec pass through to xen kexec.

 And
 linux kexec (ultimately pv kexec because the pv machine is a slightly
 different architecture).

As I understand, in the Xen dom0 kexec/kdump case machine_kexec() should call
a stub which calls the relevant hypercall to initiate kexec/kdump in
Xen itself. Right?

  Sadly syscall does not save this flag anywhere.

  Additionally, I found
  that the kernel itself has the best knowledge of which code path should be
  used (firmware or plain Linux). If this decision is left to userspace
  then a simple kexec syscall could, in the worst case, crash the system (e.g. when
  plain Linux kexec is used where firmware kexec should be
  used).

 And that path selection bit is strongly non-sense.  You are advocating
 hardcoding unnecessary policy in the kernel.

 If for dom0 you need crash_kexec to do something different from domU
 you should be able to load a small piece of code via kexec that makes
 the hypervisor calls you need.

  However, if you wish I could add this flag to syscall.

 I do wish.  We need to distinguish between the kexec firmware pass
 through, and normal kexec.

OK.

  Additionally, I could
  add function which enables firmware support and then kexec_use_firmware
  variable will be global only in kexec.c module.

 No.  kexec_use_firmware is the wrong mental model.

 Do not mix the kexec pass through and the normal kexec case.

 We most definitely need to call different code in the kexec firmware
 pass through case.

 For normal kexec we just need to use a paravirt aware version of
 machine_kexec and machine_kexec_shutdown.

OK, but this solves the problem in crash_kexec() only. However, kernel_kexec()
still calls machine_shutdown(), which breaks kexec on Xen dom0 (to be precise,
it shuts down the machine via a hypercall). Should I add machine_kexec_shutdown()
(like machine_crash_shutdown()) which would call, let's say,
machine_ops.kexec_shutdown()?
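
A minimal sketch of the dispatch being asked about, assuming a toy ops
table with a new kexec_shutdown hook next to the existing shutdown hook.
Everything named toy_* is hypothetical; the real machine_ops has more
members and the real hooks do actual work instead of setting counters.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the kernel's machine_ops table; only the two hooks
 * relevant here are modelled, and kexec_shutdown is the proposed addition. */
struct toy_machine_ops {
    void (*shutdown)(void);
    void (*kexec_shutdown)(void);
};

static int toy_plain_shutdown_calls;    /* how often the generic hook ran */
static int toy_ready_for_kexec;         /* set by the kexec-specific hook */

static void toy_native_shutdown(void)
{
    toy_plain_shutdown_calls++;
}

static void toy_native_kexec_shutdown(void)
{
    /* On bare metal the kexec path can keep reusing the normal shutdown. */
    toy_native_shutdown();
    toy_ready_for_kexec = 1;
}

static void toy_xen_kexec_shutdown(void)
{
    /* In dom0 the generic hook would stop the machine via a hypercall,
     * so this variant deliberately does not call it. */
    toy_ready_for_kexec = 1;
}

static struct toy_machine_ops toy_machine_ops = {
    .shutdown = toy_native_shutdown,
    .kexec_shutdown = toy_native_kexec_shutdown,
};

/* Would be called from the kexec path instead of machine_shutdown(). */
static void toy_machine_kexec_shutdown(void)
{
    assert(toy_machine_ops.kexec_shutdown != NULL);
    toy_machine_ops.kexec_shutdown();
}
```

A Xen dom0 setup path would then only need to replace the kexec_shutdown
pointer (here with toy_xen_kexec_shutdown), leaving the generic kexec code
free of Xen-specific conditionals.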

Additionally, crash_shrink_memory() does not make sense in the Xen dom0 case.
How do you wish to disable it if kexec_use_firmware is the wrong mental model?

  Furthermore it is not ok to have a conditional
  code outside of header files.
 
  I agree, but how should execution be dispatched, e.g. in crash_kexec(),
  if we would like (I suppose) to compile kexec firmware
  support conditionally?

 The classic pattern is to have the #ifdefs in the header and have a
 no-op function that is inlined when the functionality is compiled out.
 This allows all of the logic to always be compiled.
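
The pattern can be sketched like this. The config symbol
CONFIG_TOY_KEXEC_FIRMWARE and the toy_* function names are hypothetical and
only serve the illustration; in a real tree the first part would live in a
header and the second part in a .c file.

```c
#include <assert.h>

/* --- header part: the only place that contains an #ifdef --- */
#ifdef CONFIG_TOY_KEXEC_FIRMWARE            /* hypothetical config symbol */
int toy_firmware_crash_kexec(void);         /* real implementation elsewhere */
#else
/* Feature compiled out: an inline no-op that reports "not handled". */
static inline int toy_firmware_crash_kexec(void)
{
    return 0;
}
#endif

/* --- .c part: always compiled, no #ifdef needed --- */
static int toy_crash_kexec(void)
{
    if (toy_firmware_crash_kexec())
        return 1;   /* firmware path took over */
    return 2;       /* fall through to the ordinary path */
}
```

With the feature compiled out the compiler sees a constant 0 and can drop
the firmware branch, yet the calling code is still parsed and type-checked
in every configuration.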

OK.

Daniel


Re: [PATCH v3 02/11] x86/kexec: Add extra pointers to transition page table PGD, PUD, PMD and PTE

2013-01-04 Thread Daniel Kiper
On Fri, Dec 28, 2012 at 01:59:27PM +0100, Borislav Petkov wrote:
 On Thu, Dec 27, 2012 at 03:19:24PM -0800, Daniel Kiper wrote:
   Hmm... this code is being redone at the moment... this might conflict.
 
  Is this available somewhere? May I have a look at it?

 http://marc.info/?l=linux-kernel&m=135581534620383

 The for-x86-boot-v7 and -v8 branches.

 HTH.

Thanks.

Daniel


Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation

2013-01-04 Thread Daniel Kiper
On Wed, Jan 02, 2013 at 11:26:43AM +, Andrew Cooper wrote:
 On 27/12/12 18:02, Eric W. Biederman wrote:
 Andrew Cooperandrew.coop...@citrix.com  writes:
 
 On 27/12/2012 07:53, Eric W. Biederman wrote:
 The syscall ABI still has the wrong semantics.
 
 Aka totally unmaintainable and unmergeable.
 
 The concept of domU support is also strange.  What does domU support even 
 mean, when the dom0 support is loading a kernel to pick up Xen when Xen 
 falls over.
 There are two requirements pulling at this patch series, but I agree
 that we need to clarify them.
 It probably makes sense to split them apart a little even.
 
 

 Thinking about this split, there might be a way to simplify it even more.

 /sbin/kexec can load the Xen crash kernel itself by issuing
 hypercalls using /dev/xen/privcmd.  This would remove the need for
 the dom0 kernel to distinguish between loading a crash kernel for
 itself and loading a kernel for Xen.

 Or is this just a silly idea complicating the matter?

This is impossible with the current Xen kexec/kdump interface.
It would have to be changed to allow that. However, I suppose that
the Xen community would not be interested in such changes.

Daniel


Re: [PATCH v3 02/11] x86/kexec: Add extra pointers to transition page table PGD, PUD, PMD and PTE

2013-01-04 Thread Daniel Kiper
On Thu, Jan 03, 2013 at 09:34:55AM +, Jan Beulich wrote:
  On 27.12.12 at 03:18, Daniel Kiper daniel.ki...@oracle.com wrote:
  Some implementations (e.g. Xen PVOPS) could not use part of identity page table
  to construct transition page table. It means that they require separate PUDs,
  PMDs and PTEs for virtual and physical (identity) mapping. To satisfy that
  requirement add extra pointer to PGD, PUD, PMD and PTE and align existing
  code.

 So you keep posting this despite it having got pointed out on each
 earlier submission that this is unnecessary, proven by the fact that
 the non-pvops Xen kernels can get away without it. Why?

Sorry, but I forgot to reply to your email last time.

I am still not convinced. I have tested the SUSE kernel itself and it does not work.
Maybe I missed something but... Please check 
arch/x86/kernel/machine_kexec_64.c:init_transition_pgtable()

I can see:

vaddr = (unsigned long)relocate_kernel;

and later:

pgd += pgd_index(vaddr);
...

This is wrong. The virtual address of relocate_kernel() in Xen is different
from its virtual address in the Linux kernel. That is why the transition
page table cannot be established in the Linux kernel, and so on...
How does this work in SUSE? I have no idea.
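
To make the arithmetic concrete: on x86_64 with 4-level paging, pgd_index()
selects the top-level slot from bits 39..47 of the virtual address, so the
same code placed at two different virtual addresses generally needs two
different sets of page table entries. The two addresses below are
illustrative assumptions only and are not taken from any particular Linux
or Xen build.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PGDIR_SHIFT  39     /* x86_64, 4-level paging */
#define TOY_PTRS_PER_PGD 512

/* Same arithmetic as pgd_index() on x86_64 with 4-level paging. */
static unsigned int toy_pgd_index(uint64_t vaddr)
{
    return (unsigned int)((vaddr >> TOY_PGDIR_SHIFT) & (TOY_PTRS_PER_PGD - 1));
}

/* Illustrative addresses only; neither is taken from a real build. */
#define TOY_LINUX_VADDR UINT64_C(0xffffffff81000000)
#define TOY_XEN_VADDR   UINT64_C(0xffff82c480000000)
```

A page table built from the first address populates a different top-level
slot than one built from the second, so a mapping prepared for one side
says nothing about where the other side will look.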

I am happy to fix that, but whatever the fix is,
I would like to be sure that it works.

Daniel


Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation

2013-01-04 Thread Daniel Kiper
On Fri, Jan 04, 2013 at 02:38:44PM +, David Vrabel wrote:
 On 04/01/13 14:22, Daniel Kiper wrote:
  On Wed, Jan 02, 2013 at 11:26:43AM +, Andrew Cooper wrote:
  On 27/12/12 18:02, Eric W. Biederman wrote:
  Andrew Cooperandrew.coop...@citrix.com  writes:
 
  On 27/12/2012 07:53, Eric W. Biederman wrote:
  The syscall ABI still has the wrong semantics.
 
  Aka totally unmaintainable and unmergeable.
 
  The concept of domU support is also strange.  What does domU support 
  even mean, when the dom0 support is loading a kernel to pick up Xen 
  when Xen falls over.
  There are two requirements pulling at this patch series, but I agree
  that we need to clarify them.
  It probably makes sense to split them apart a little even.
 
 
 
  Thinking about this split, there might be a way to simplify it even more.
 
  /sbin/kexec can load the Xen crash kernel itself by issuing
  hypercalls using /dev/xen/privcmd.  This would remove the need for
  the dom0 kernel to distinguish between loading a crash kernel for
  itself and loading a kernel for Xen.
 
  Or is this just a silly idea complicating the matter?
 
  This is impossible with current Xen kexec/kdump interface.
  It should be changed to do that. However, I suppose that
  Xen community would not be interested in such changes.

 I don't see why the hypercall ABI cannot be extended with new sub-ops
 that do the right thing -- the existing ABI is a bit weird.

 I plan to start prototyping something shortly (hopefully next week) for
 the Xen kexec case.

Wow... As I can see, this time the Xen community is interested.
That is great. I agree that the current kexec interface is not ideal.

David, I am happy to help in that process. However, if you wish I could
carry it out myself. Anyway, it looks like I should hold off on my
Linux kexec/kdump patches.

My .5 cents:
  - We should focus on KEXEC_CMD_kexec_load and KEXEC_CMD_kexec_unload;
    probably we should introduce KEXEC_CMD_kexec_load2 and
    KEXEC_CMD_kexec_unload2;
    load should __LOAD__ the kernel image and other things into hypervisor memory;
    I suppose that almost all things could be copied from linux/kernel/kexec.c,
    linux/arch/x86/kernel/{machine_kexec_$(BITS).c,relocate_kernel_$(BITS).c};
    I think that KEXEC_CMD_kexec should stay as is,
  - Hmmm... Now I think that we should still use the kexec syscall to load the image
    into Xen memory (with the new KEXEC_CMD_kexec_load2) because it establishes
    all the things which are needed to call kdump if dom0 crashes; however,
    I could be wrong...
  - last but not least, we should think about support for PV guests too.

Daniel


Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation

2013-01-04 Thread Daniel Kiper
On Fri, Jan 04, 2013 at 02:41:17PM +, Jan Beulich wrote:
  On 04.01.13 at 15:22, Daniel Kiper daniel.ki...@oracle.com wrote:
  On Wed, Jan 02, 2013 at 11:26:43AM +, Andrew Cooper wrote:
  /sbin/kexec can load the Xen crash kernel itself by issuing
  hypercalls using /dev/xen/privcmd.  This would remove the need for
  the dom0 kernel to distinguish between loading a crash kernel for
  itself and loading a kernel for Xen.
 
  Or is this just a silly idea complicating the matter?
 
  This is impossible with current Xen kexec/kdump interface.

 Why?

Because the current KEXEC_CMD_kexec_load does not load the kernel
image and other things into Xen memory. It means that they
have to live somewhere in dom0 Linux kernel memory.

Daniel


Re: [PATCH v3 02/11] x86/kexec: Add extra pointers to transition page table PGD, PUD, PMD and PTE

2013-01-04 Thread Daniel Kiper
On Fri, Jan 04, 2013 at 04:12:32PM +, Jan Beulich wrote:
  On 04.01.13 at 16:15, Daniel Kiper daniel.ki...@oracle.com wrote:
  On Thu, Jan 03, 2013 at 09:34:55AM +, Jan Beulich wrote:
   On 27.12.12 at 03:18, Daniel Kiper daniel.ki...@oracle.com wrote:
   Some implementations (e.g. Xen PVOPS) could not use part of identity page table
   to construct transition page table. It means that they require separate PUDs,
   PMDs and PTEs for virtual and physical (identity) mapping. To satisfy that
   requirement add extra pointer to PGD, PUD, PMD and PTE and align existing
   code.
 
  So you keep posting this despite it having got pointed out on each
  earlier submission that this is unnecessary, proven by the fact that
  the non-pvops Xen kernels can get away without it. Why?
 
  Sorry, but I forgot to reply to your email last time.
 
  I am still not convinced. I have tested the SUSE kernel itself and it does not work.
  Maybe I missed something but... Please check
  arch/x86/kernel/machine_kexec_64.c:init_transition_pgtable()
 
  I can see:
 
  vaddr = (unsigned long)relocate_kernel;
 
  and later:
 
  pgd += pgd_index(vaddr);
  ...

 I think that mapping is simply irrelevant, as the code at
 relocate_kernel gets copied to the control page and
 invoked there (other than in the native case, where
 relocate_kernel() gets invoked directly).

Right, so where is the virtual mapping of the control page established?
I could not find the relevant code in the SLES kernel which does that.

Daniel


Re: [PATCH v3 02/11] x86/kexec: Add extra pointers to transition page table PGD, PUD, PMD and PTE

2012-12-27 Thread Daniel Kiper
 Hmm... this code is being redone at the moment... this might conflict.

Is this available somewhere? May I have a look at it?

Daniel

PS I am on holiday until 02/01/2013 and I will not
   have access to my mailbox. Please be patient.
   In the worst case I will reply when I am
   back at the office.


Re: [PATCH v3 06/11] x86/xen: Add i386 kexec/kdump implementation

2012-12-27 Thread Daniel Kiper
 On 12/26/2012 06:18 PM, Daniel Kiper wrote:
  Add i386 kexec/kdump implementation.
 
  v2 - suggestions/fixes:
  - allocate transition page table pages below 4 GiB
(suggested by Jan Beulich).

 Why?

Sadly, all addresses are passed to the kexec hypercall
via unsigned long variables.
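
A small sketch of why this forces allocations below 4 GiB: on i386 an
unsigned long is 32 bits wide, so any physical address at or above 4 GiB is
truncated when stored in one. uint32_t is used here to model the i386
unsigned long so that the sketch behaves the same on any build host; the
toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* On i386 'unsigned long' is 32 bits wide; uint32_t models that here. */
typedef uint32_t toy_i386_ulong;

/* Return 1 if a physical address survives the trip through a 32-bit
 * 'unsigned long' unchanged, 0 if it would be truncated. */
static int toy_fits_i386_ulong(uint64_t paddr)
{
    return (uint64_t)(toy_i386_ulong)paddr == paddr;
}
```

The last page below 4 GiB passes this check, while the first page at 4 GiB
does not, which matches the constraint on the transition page table pages.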

Daniel


Re: [PATCH v3 00/11] xen: Initial kexec/kdump implementation

2012-12-27 Thread Daniel Kiper
 On 12/26/2012 06:18 PM, Daniel Kiper wrote:
  Hi,
 
  This set of patches contains the initial kexec/kdump implementation for Xen v3.
  Currently only dom0 is supported; however, almost all infrastructure
  required for domU support is ready.
 
  Jan Beulich suggested merging the Xen x86 assembler code with the baremetal x86 code.
  This could simplify the kernel code and reduce its size a bit. However, this solution
  requires some changes in the baremetal x86 code. First of all, the code which establishes
  the transition page table should be moved back from machine_kexec_$(BITS).c to
  relocate_kernel_$(BITS).S. Another important thing which should be changed in that
  case is the format of the page_list array. The Xen kexec hypercall requires physical
  addresses to alternate with virtual ones. These and other required changes have not been
  done in this version because I am not sure that the solution will be accepted by the
  kexec/kdump maintainers. I hope that this email will spark discussion about that topic.

 I want a detailed list of the constraints that this assumes and 
 therefore imposes on the native implementation as a result of this.  We 
 have had way too many patches where Xen PV hacks effectively nailgun 
 arbitrary, and sometimes poor, design decisions in place and now we 
 can't fix them.

OK, but I now think that we should postpone this discussion
until all details of the generic kexec/kdump code
are agreed. Sorry for that.

Daniel


Re: [PATCH v3 01/11] kexec: introduce kexec firmware support

2012-12-27 Thread Daniel Kiper
 Daniel Kiper daniel.ki...@oracle.com writes:

  Some kexec/kdump implementations (e.g. Xen PVOPS) could not use default
  Linux infrastructure and require some support from firmware and/or 
  hypervisor.
  To cope with that problem kexec firmware infrastructure was introduced.
  It allows a developer to use all kexec/kdump features of given firmware
  or hypervisor.

 As this stands this patch is wrong.

 You need to pass an additional flag from userspace through /sbin/kexec
 that says load the kexec image in the firmware.  A global variable here
 is not ok.

 As I understand it you are loading a kexec on xen panic image.  Which
 is semantically different from a kexec on linux panic image.  It is not
 ok to have a silly global variable kexec_use_firmware.

Earlier we agreed that /sbin/kexec should call the kexec syscall with a
special flag. However, while working on the Xen kexec/kdump v3 patch
I concluded that this is insufficient because, e.g., crash_kexec()
must also execute different code when firmware support is in use.
Sadly, the syscall does not save this flag anywhere. Additionally, I concluded
that the kernel itself knows best which code path should be
used (firmware or plain Linux). If this decision is left to userspace,
then a simple kexec syscall could crash the system in the worst case (e.g. when
plain Linux kexec is used where firmware kexec should be used).
However, if you wish I could add this flag to the syscall. Additionally, I could
add a function which enables firmware support, and then the kexec_use_firmware
variable would be visible only within kexec.c.
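
The last option in the reply above (a file-local flag that is set through an
enabling function) could look roughly like the sketch below. All names here
are hypothetical and chosen for illustration; none of them come from the
patch.

```c
#include <stdbool.h>

/* Hypothetical names; this only illustrates the shape of the proposal. */
enum kexec_path {
    KEXEC_PATH_NATIVE,
    KEXEC_PATH_FIRMWARE
};

/* File-local, so no other translation unit can flip it directly. */
static bool use_firmware;

/* Called once by platform setup code that knows firmware kexec is needed. */
void kexec_enable_firmware_sketch(void)
{
    use_firmware = true;
}

/* Every entry point (load, exec, crash) would ask the same question. */
enum kexec_path kexec_select_path_sketch(void)
{
    return use_firmware ? KEXEC_PATH_FIRMWARE : KEXEC_PATH_NATIVE;
}
```

With this shape the decision is made once inside the kernel, so a userspace
caller cannot select the wrong path by omitting a syscall flag.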

 Furthermore it is not ok to have a conditional
 code outside of header files.

I agree, but how should execution be dispatched, e.g. in crash_kexec(),
if we want (I suppose) to compile kexec firmware
support conditionally?
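
One common pattern that addresses the question above is to keep the #ifdef in
a header and give callers a static inline stub when the feature is compiled
out. The sketch below is an assumption about how that could look, not code
from this series; the *_sketch names are hypothetical.

```c
/* --- what would live in a header --- */
#ifdef CONFIG_KEXEC_FIRMWARE
/* Returns nonzero if the firmware path handled the crash. */
int firmware_crash_kexec_sketch(void);
#else
/* Feature compiled out: the stub never handles the crash. */
static inline int firmware_crash_kexec_sketch(void)
{
    return 0;
}
#endif

/* --- what would live in the .c file: no #ifdef at the call site --- */
/* Returns 1 if the firmware path ran, 0 if the native path ran. */
int crash_kexec_sketch(void)
{
    if (firmware_crash_kexec_sketch())
        return 1;
    return 0;
}

#ifdef CONFIG_KEXEC_FIRMWARE
/* Placeholder implementation so the sketch links when the option is set. */
int firmware_crash_kexec_sketch(void)
{
    return 1;
}
#endif
```

When the option is off, the compiler can drop the dead branch entirely, so
the call site costs nothing.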

Daniel


Re: [PATCH v3 00/11] xen: Initial kexec/kdump implementation

2012-12-27 Thread Daniel Kiper
 Andrew Cooper andrew.coop...@citrix.com writes:

  On 27/12/2012 07:53, Eric W. Biederman wrote:
  The syscall ABI still has the wrong semantics.
 
  Aka totally unmaintainable and unmergeable.
 
  The concept of domU support is also strange.  What does domU support even 
  mean, when the dom0  support is loading a kernel to pick up Xen when Xen 
  falls over.
 
  There are two requirements pulling at this patch series, but I agree
  that we need to clarify them.

 It probably makes sense to split them apart a little even.

  When dom0 loads a crash kernel, it is loading one for Xen to use.  As a
  dom0 crash causes a Xen crash, having dom0 set up a kdump kernel for
  itself is completely useless.  This ability is present in classic Xen
  dom0 kernels, but the feature is currently missing in PVOPS.

  Many cloud customers and service providers want the ability for a VM
  administrator to be able to load a kdump/kexec kernel within a
  domain[1].  This allows the VM administrator to take more proactive
  steps to isolate the cause of a crash, the state of which is most likely
  discarded while tearing down the domain.  The result being that as far
  as Xen is concerned, the domain is still alive, while the kdump
  kernel/environment can work its usual magic.  I am not aware of any
  feature like this existing in the past.

 Which makes domU support semantically just the normal kexec/kdump
 support.  Got it.

To some extent. It is true for HVM and PVonHVM guests. However,
PV guests require a somewhat different kexec/kdump implementation
than plain kexec/kdump. The proposed firmware support has almost
all required features. The few PV guest specific features will
be added later (after agreeing on generic firmware support, which
is sufficient at least for dom0).

It looks like I should replace domU with PV guest in the patch description.

 The point of implementing domU is for those times when the hypervisor
 admin and the kernel admin are different.

Right.

 For domU support modifying or adding alternate versions of
 machine_kexec.c and relocate_kernel.S to add paravirtualization support
 make sense.

It is not sufficient. Please look above.

 There is the practical argument that for implementation efficiency of
 crash dumps it would be better if that support came from the hypervisor
 or the hypervisor environment.  But this gets into the practical reality

I am thinking about that.

 that the hypervisor environment does not do that today.  Furthermore
 kexec all by itself working in a paravirtualized environment under Xen
 makes sense.

 domU support is what Peter was worrying about for cleanliness, and
 we need some x86 backend ops there, and generally to be careful.

As far as I know, we do not need any additional pv_ops stuff
if we place all needed things in kexec firmware support.

Daniel


[PATCH v3 00/11] xen: Initial kexec/kdump implementation

2012-12-26 Thread Daniel Kiper

Hi,

This set of patches contains initial kexec/kdump implementation for Xen v3.
Currently only dom0 is supported; however, almost all infrastructure
required for domU support is ready.

Jan Beulich suggested merging the Xen x86 assembler code with the baremetal
x86 code. This could simplify and slightly reduce the size of the kernel code.
However, this solution requires some changes in the baremetal x86 code. First
of all, the code which establishes the transition page table should be moved
back from machine_kexec_$(BITS).c to relocate_kernel_$(BITS).S. Another
important thing which should be changed in that case is the format of the
page_list array: the Xen kexec hypercall requires physical addresses to
alternate with virtual ones. These and other required changes have not been
made in this version because I am not sure that the solution will be accepted
by the kexec/kdump maintainers. I hope that this email sparks discussion
about that topic.

Daniel

 arch/x86/Kconfig |3 +
 arch/x86/include/asm/kexec.h |   10 +-
 arch/x86/include/asm/xen/hypercall.h |6 +
 arch/x86/include/asm/xen/kexec.h |   79 
 arch/x86/kernel/machine_kexec_64.c   |   12 +-
 arch/x86/kernel/vmlinux.lds.S|7 +-
 arch/x86/xen/Kconfig |1 +
 arch/x86/xen/Makefile|3 +
 arch/x86/xen/enlighten.c |   11 +
 arch/x86/xen/kexec.c |  150 +++
 arch/x86/xen/machine_kexec_32.c  |  226 +++
 arch/x86/xen/machine_kexec_64.c  |  318 +++
 arch/x86/xen/relocate_kernel_32.S|  323 +++
 arch/x86/xen/relocate_kernel_64.S|  309 ++
 drivers/xen/sys-hypervisor.c |   42 ++-
 include/linux/kexec.h|   26 ++-
 include/xen/interface/xen.h  |   33 ++
 kernel/Makefile  |1 +
 kernel/kexec-firmware.c  |  743 ++
 kernel/kexec.c   |   46 ++-
 20 files changed, 2331 insertions(+), 18 deletions(-)

Daniel Kiper (11):
  kexec: introduce kexec firmware support
  x86/kexec: Add extra pointers to transition page table PGD, PUD, PMD and PTE
  xen: Introduce architecture independent data for kexec/kdump
  x86/xen: Introduce architecture dependent data for kexec/kdump
  x86/xen: Register resources required by kexec-tools
  x86/xen: Add i386 kexec/kdump implementation
  x86/xen: Add x86_64 kexec/kdump implementation
  x86/xen: Add kexec/kdump Kconfig and makefile rules
  x86/xen/enlighten: Add init and crash kexec/kdump hooks
  drivers/xen: Export vmcoreinfo through sysfs
  x86: Add Xen kexec control code size check to linker script


[PATCH v3 01/11] kexec: introduce kexec firmware support

2012-12-26 Thread Daniel Kiper
Some kexec/kdump implementations (e.g. Xen PVOPS) cannot use the default
Linux infrastructure and require some support from firmware and/or hypervisor.
To cope with that problem, kexec firmware infrastructure was introduced.
It allows a developer to use all kexec/kdump features of a given firmware
or hypervisor.

v3 - suggestions/fixes:
   - replace kexec_ops struct by kexec firmware infrastructure
 (suggested by Eric Biederman).

v2 - suggestions/fixes:
   - add comment for kexec_ops.crash_alloc_temp_store member
 (suggested by Konrad Rzeszutek Wilk),
   - simplify kexec_ops usage
 (suggested by Konrad Rzeszutek Wilk).

Signed-off-by: Daniel Kiper daniel.ki...@oracle.com
---
 include/linux/kexec.h   |   26 ++-
 kernel/Makefile |1 +
 kernel/kexec-firmware.c |  743 +++
 kernel/kexec.c  |   46 +++-
 4 files changed, 809 insertions(+), 7 deletions(-)
 create mode 100644 kernel/kexec-firmware.c

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index d0b8458..9568457 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -116,17 +116,34 @@ struct kimage {
 #endif
 };
 
-
-
 /* kexec interface functions */
 extern void machine_kexec(struct kimage *image);
 extern int machine_kexec_prepare(struct kimage *image);
 extern void machine_kexec_cleanup(struct kimage *image);
+extern struct page *mf_kexec_kimage_alloc_pages(gfp_t gfp_mask,
+   unsigned int order,
+   unsigned long limit);
+extern void mf_kexec_kimage_free_pages(struct page *page);
+extern unsigned long mf_kexec_page_to_pfn(struct page *page);
+extern struct page *mf_kexec_pfn_to_page(unsigned long mfn);
+extern unsigned long mf_kexec_virt_to_phys(volatile void *address);
+extern void *mf_kexec_phys_to_virt(unsigned long address);
+extern int mf_kexec_prepare(struct kimage *image);
+extern int mf_kexec_load(struct kimage *image);
+extern void mf_kexec_cleanup(struct kimage *image);
+extern void mf_kexec_unload(struct kimage *image);
+extern void mf_kexec_shutdown(void);
+extern void mf_kexec(struct kimage *image);
 extern asmlinkage long sys_kexec_load(unsigned long entry,
unsigned long nr_segments,
struct kexec_segment __user *segments,
unsigned long flags);
+extern long firmware_sys_kexec_load(unsigned long entry,
+   unsigned long nr_segments,
+   struct kexec_segment __user *segments,
+   unsigned long flags);
 extern int kernel_kexec(void);
+extern int firmware_kernel_kexec(void);
 #ifdef CONFIG_COMPAT
 extern asmlinkage long compat_sys_kexec_load(unsigned long entry,
unsigned long nr_segments,
@@ -135,7 +152,10 @@ extern asmlinkage long compat_sys_kexec_load(unsigned long entry,
 #endif
 extern struct page *kimage_alloc_control_pages(struct kimage *image,
unsigned int order);
+extern struct page *firmware_kimage_alloc_control_pages(struct kimage *image,
+   unsigned int order);
 extern void crash_kexec(struct pt_regs *);
+extern void firmware_crash_kexec(struct pt_regs *);
 int kexec_should_crash(struct task_struct *);
 void crash_save_cpu(struct pt_regs *regs, int cpu);
 void crash_save_vmcoreinfo(void);
@@ -168,6 +188,8 @@ unsigned long paddr_vmcoreinfo_note(void);
 #define VMCOREINFO_CONFIG(name) \
vmcoreinfo_append_str("CONFIG_%s=y\n", #name)
 
+extern bool kexec_use_firmware;
+
 extern struct kimage *kexec_image;
 extern struct kimage *kexec_crash_image;
 
diff --git a/kernel/Makefile b/kernel/Makefile
index 6c072b6..bc96b2f 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -58,6 +58,7 @@ obj-$(CONFIG_MODULE_SIG) += module_signing.o modsign_pubkey.o modsign_certificat
 obj-$(CONFIG_KALLSYMS) += kallsyms.o
 obj-$(CONFIG_BSD_PROCESS_ACCT) += acct.o
 obj-$(CONFIG_KEXEC) += kexec.o
+obj-$(CONFIG_KEXEC_FIRMWARE) += kexec-firmware.o
 obj-$(CONFIG_BACKTRACE_SELF_TEST) += backtracetest.o
 obj-$(CONFIG_COMPAT) += compat.o
 obj-$(CONFIG_CGROUPS) += cgroup.o
diff --git a/kernel/kexec-firmware.c b/kernel/kexec-firmware.c
new file mode 100644
index 000..f6ddd4c
--- /dev/null
+++ b/kernel/kexec-firmware.c
@@ -0,0 +1,743 @@
+/*
+ * Copyright (C) 2002-2004 Eric Biederman  ebied...@xmission.com
+ * Copyright (C) 2012 Daniel Kiper, Oracle Corporation
+ *
+ * Most of the code here is a copy of kernel/kexec.c.
+ *
+ * This source code is licensed under the GNU General Public License,
+ * Version 2.  See the file COPYING for more details.
+ */
+
+#include <linux/atomic.h>
+#include <linux/errno.h>
+#include <linux/highmem.h>
+#include <linux/kernel.h>
+#include <linux/kexec.h>
+#include <linux/list.h>
+#include <linux/mm.h>

[PATCH v3 02/11] x86/kexec: Add extra pointers to transition page table PGD, PUD, PMD and PTE

2012-12-26 Thread Daniel Kiper
Some implementations (e.g. Xen PVOPS) cannot use part of the identity page table
to construct the transition page table. This means that they require separate
PUDs, PMDs and PTEs for the virtual and physical (identity) mappings. To satisfy
that requirement, add extra pointers to PGD, PUD, PMD and PTE and align existing
code.

Signed-off-by: Daniel Kiper daniel.ki...@oracle.com
---
 arch/x86/include/asm/kexec.h   |   10 +++---
 arch/x86/kernel/machine_kexec_64.c |   12 ++--
 2 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 6080d26..cedd204 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -157,9 +157,13 @@ struct kimage_arch {
 };
 #else
 struct kimage_arch {
-   pud_t *pud;
-   pmd_t *pmd;
-   pte_t *pte;
+   pgd_t *pgd;
+   pud_t *pud0;
+   pud_t *pud1;
+   pmd_t *pmd0;
+   pmd_t *pmd1;
+   pte_t *pte0;
+   pte_t *pte1;
 };
 #endif
 
diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index b3ea9db..976e54b 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -137,9 +137,9 @@ out:
 
 static void free_transition_pgtable(struct kimage *image)
 {
-   free_page((unsigned long)image->arch.pud);
-   free_page((unsigned long)image->arch.pmd);
-   free_page((unsigned long)image->arch.pte);
+   free_page((unsigned long)image->arch.pud0);
+   free_page((unsigned long)image->arch.pmd0);
+   free_page((unsigned long)image->arch.pte0);
 }
 
 static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
@@ -157,7 +157,7 @@ static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
pud = (pud_t *)get_zeroed_page(GFP_KERNEL);
if (!pud)
goto err;
-   image->arch.pud = pud;
+   image->arch.pud0 = pud;
set_pgd(pgd, __pgd(__pa(pud) | _KERNPG_TABLE));
}
pud = pud_offset(pgd, vaddr);
@@ -165,7 +165,7 @@ static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
pmd = (pmd_t *)get_zeroed_page(GFP_KERNEL);
if (!pmd)
goto err;
-   image->arch.pmd = pmd;
+   image->arch.pmd0 = pmd;
set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
}
pmd = pmd_offset(pud, vaddr);
@@ -173,7 +173,7 @@ static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
pte = (pte_t *)get_zeroed_page(GFP_KERNEL);
if (!pte)
goto err;
-   image->arch.pte = pte;
+   image->arch.pte0 = pte;
set_pmd(pmd, __pmd(__pa(pte) | _KERNPG_TABLE));
}
pte = pte_offset_kernel(pmd, vaddr);
-- 
1.5.6.5



[PATCH v3 03/11] xen: Introduce architecture independent data for kexec/kdump

2012-12-26 Thread Daniel Kiper
Introduce architecture independent constants and structures
required by Xen kexec/kdump implementation.

Signed-off-by: Daniel Kiper daniel.ki...@oracle.com
---
 include/xen/interface/xen.h |   33 +
 1 files changed, 33 insertions(+), 0 deletions(-)

diff --git a/include/xen/interface/xen.h b/include/xen/interface/xen.h
index 886a5d8..09c16ab 100644
--- a/include/xen/interface/xen.h
+++ b/include/xen/interface/xen.h
@@ -57,6 +57,7 @@
 #define __HYPERVISOR_event_channel_op 32
 #define __HYPERVISOR_physdev_op   33
 #define __HYPERVISOR_hvm_op   34
+#define __HYPERVISOR_kexec_op 37
 #define __HYPERVISOR_tmem_op  38
 
 /* Architecture-specific hypercall definitions. */
@@ -231,7 +232,39 @@ DEFINE_GUEST_HANDLE_STRUCT(mmuext_op);
 #define VMASST_TYPE_pae_extended_cr3 3
 #define MAX_VMASST_TYPE 3
 
+/*
+ * Commands to HYPERVISOR_kexec_op().
+ */
+#define KEXEC_CMD_kexec0
+#define KEXEC_CMD_kexec_load   1
+#define KEXEC_CMD_kexec_unload 2
+#define KEXEC_CMD_kexec_get_range  3
+
+/*
+ * Memory ranges for kdump (utilized by HYPERVISOR_kexec_op()).
+ */
+#define KEXEC_RANGE_MA_CRASH   0
+#define KEXEC_RANGE_MA_XEN 1
+#define KEXEC_RANGE_MA_CPU 2
+#define KEXEC_RANGE_MA_XENHEAP 3
+#define KEXEC_RANGE_MA_BOOT_PARAM  4
+#define KEXEC_RANGE_MA_EFI_MEMMAP  5
+#define KEXEC_RANGE_MA_VMCOREINFO  6
+
 #ifndef __ASSEMBLY__
+struct xen_kexec_exec {
+   int type;
+};
+
+struct xen_kexec_range {
+   int range;
+   int nr;
+   unsigned long size;
+   unsigned long start;
+};
+
+extern unsigned long xen_vmcoreinfo_maddr;
+extern unsigned long xen_vmcoreinfo_max_size;
 
 typedef uint16_t domid_t;
 
-- 
1.5.6.5



[PATCH v3 10/11] drivers/xen: Export vmcoreinfo through sysfs

2012-12-26 Thread Daniel Kiper
Export vmcoreinfo through sysfs.

Signed-off-by: Daniel Kiper daniel.ki...@oracle.com
---
 drivers/xen/sys-hypervisor.c |   42 +-
 1 files changed, 41 insertions(+), 1 deletions(-)

diff --git a/drivers/xen/sys-hypervisor.c b/drivers/xen/sys-hypervisor.c
index 96453f8..9dd290c 100644
--- a/drivers/xen/sys-hypervisor.c
+++ b/drivers/xen/sys-hypervisor.c
@@ -368,6 +368,41 @@ static void xen_properties_destroy(void)
sysfs_remove_group(hypervisor_kobj, &xen_properties_group);
 }
 
+#ifdef CONFIG_KEXEC_FIRMWARE
+static ssize_t vmcoreinfo_show(struct hyp_sysfs_attr *attr, char *buffer)
+{
+   return sprintf(buffer, "%lx %lx\n", xen_vmcoreinfo_maddr,
+   xen_vmcoreinfo_max_size);
+}
+
+HYPERVISOR_ATTR_RO(vmcoreinfo);
+
+static int __init xen_vmcoreinfo_init(void)
+{
+   if (!xen_vmcoreinfo_max_size)
+   return 0;
+
+   return sysfs_create_file(hypervisor_kobj, &vmcoreinfo_attr.attr);
+}
+
+static void xen_vmcoreinfo_destroy(void)
+{
+   if (!xen_vmcoreinfo_max_size)
+   return;
+
+   sysfs_remove_file(hypervisor_kobj, &vmcoreinfo_attr.attr);
+}
+#else
+static int __init xen_vmcoreinfo_init(void)
+{
+   return 0;
+}
+
+static void xen_vmcoreinfo_destroy(void)
+{
+}
+#endif
+
 static int __init hyper_sysfs_init(void)
 {
int ret;
@@ -390,9 +425,14 @@ static int __init hyper_sysfs_init(void)
ret = xen_properties_init();
if (ret)
goto prop_out;
+   ret = xen_vmcoreinfo_init();
+   if (ret)
+   goto vmcoreinfo_out;
 
goto out;
 
+vmcoreinfo_out:
+   xen_properties_destroy();
 prop_out:
xen_sysfs_uuid_destroy();
 uuid_out:
@@ -407,12 +447,12 @@ out:
 
 static void __exit hyper_sysfs_exit(void)
 {
+   xen_vmcoreinfo_destroy();
xen_properties_destroy();
xen_compilation_destroy();
xen_sysfs_uuid_destroy();
xen_sysfs_version_destroy();
xen_sysfs_type_destroy();
-
 }
 module_init(hyper_sysfs_init);
 module_exit(hyper_sysfs_exit);
-- 
1.5.6.5
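
For readers who want to consume the value exported by vmcoreinfo_show(), a
userspace parser could look like the sketch below. It assumes the attribute
prints two hexadecimal numbers separated by a space (machine address, then
maximum size) followed by a newline, as the sprintf() call in the patch
suggests. The function name is hypothetical and the sysfs path is
deliberately left out.

```c
#include <stdio.h>

/*
 * Parses a line of the assumed form "<hex maddr> <hex max_size>\n".
 * Returns 0 on success and -1 if the line does not contain two hex fields.
 */
int parse_vmcoreinfo_line(const char *line, unsigned long *maddr,
                          unsigned long *max_size)
{
    if (sscanf(line, "%lx %lx", maddr, max_size) != 2)
        return -1;
    return 0;
}
```

The caller would read the attribute into a buffer first and then pass the
buffer to this function.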



[PATCH v3 11/11] x86: Add Xen kexec control code size check to linker script

2012-12-26 Thread Daniel Kiper
Add Xen kexec control code size check to linker script.

Signed-off-by: Daniel Kiper daniel.ki...@oracle.com
---
 arch/x86/kernel/vmlinux.lds.S |7 ++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 22a1530..f18786a 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -360,5 +360,10 @@ INIT_PER_CPU(irq_stack_union);
 
 . = ASSERT(kexec_control_code_size <= KEXEC_CONTROL_CODE_MAX_SIZE,
"kexec control code size is too big");
-#endif
 
+#ifdef CONFIG_XEN
+. = ASSERT(xen_kexec_control_code_size - xen_relocate_kernel <=
+   KEXEC_CONTROL_CODE_MAX_SIZE,
+   "Xen kexec control code size is too big");
+#endif
+#endif
-- 
1.5.6.5



[PATCH v3 09/11] x86/xen/enlighten: Add init and crash kexec/kdump hooks

2012-12-26 Thread Daniel Kiper
Add init and crash kexec/kdump hooks.

Signed-off-by: Daniel Kiper daniel.ki...@oracle.com
---
 arch/x86/xen/enlighten.c |   11 +++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 138e566..5025bba 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -31,6 +31,7 @@
 #include <linux/pci.h>
 #include <linux/gfp.h>
 #include <linux/memblock.h>
+#include <linux/kexec.h>
 
 #include <xen/xen.h>
 #include <xen/events.h>
@@ -1276,6 +1277,12 @@ static void xen_machine_power_off(void)
 
 static void xen_crash_shutdown(struct pt_regs *regs)
 {
+#ifdef CONFIG_KEXEC_FIRMWARE
+   if (kexec_crash_image) {
+   crash_save_cpu(regs, safe_smp_processor_id());
+   return;
+   }
+#endif
xen_reboot(SHUTDOWN_crash);
 }
 
@@ -1353,6 +1360,10 @@ asmlinkage void __init xen_start_kernel(void)
 
xen_init_mmu_ops();
 
+#ifdef CONFIG_KEXEC_FIRMWARE
+   kexec_use_firmware = true;
+#endif
+
/* Prevent unwanted bits from being set in PTEs. */
__supported_pte_mask &= ~_PAGE_GLOBAL;
 #if 0
-- 
1.5.6.5



[PATCH v3 08/11] x86/xen: Add kexec/kdump Kconfig and makefile rules

2012-12-26 Thread Daniel Kiper
Add kexec/kdump Kconfig and makefile rules.

Signed-off-by: Daniel Kiper daniel.ki...@oracle.com
---
 arch/x86/Kconfig  |3 +++
 arch/x86/xen/Kconfig  |1 +
 arch/x86/xen/Makefile |3 +++
 3 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 79795af..e2746c4 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1600,6 +1600,9 @@ config KEXEC_JUMP
  Jump between original kernel and kexeced kernel and invoke
  code in physical address mode via KEXEC
 
+config KEXEC_FIRMWARE
+   def_bool n
+
 config PHYSICAL_START
hex Physical address where the kernel is loaded if (EXPERT || 
CRASH_DUMP)
default 0x100
diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
index 131dacd..8469c1c 100644
--- a/arch/x86/xen/Kconfig
+++ b/arch/x86/xen/Kconfig
@@ -7,6 +7,7 @@ config XEN
select PARAVIRT
select PARAVIRT_CLOCK
select XEN_HAVE_PVMMU
+   select KEXEC_FIRMWARE if KEXEC
depends on X86_64 || (X86_32 && X86_PAE && !X86_VISWS)
depends on X86_TSC
help
diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
index 96ab2c0..99952d7 100644
--- a/arch/x86/xen/Makefile
+++ b/arch/x86/xen/Makefile
@@ -22,3 +22,6 @@ obj-$(CONFIG_PARAVIRT_SPINLOCKS)+= spinlock.o
 obj-$(CONFIG_XEN_DEBUG_FS) += debugfs.o
 obj-$(CONFIG_XEN_DOM0) += apic.o vga.o
 obj-$(CONFIG_SWIOTLB_XEN)  += pci-swiotlb-xen.o
+obj-$(CONFIG_KEXEC_FIRMWARE)   += kexec.o
+obj-$(CONFIG_KEXEC_FIRMWARE)   += machine_kexec_$(BITS).o
+obj-$(CONFIG_KEXEC_FIRMWARE)   += relocate_kernel_$(BITS).o
-- 
1.5.6.5



[PATCH v3 07/11] x86/xen: Add x86_64 kexec/kdump implementation

2012-12-26 Thread Daniel Kiper
Add x86_64 kexec/kdump implementation.

Signed-off-by: Daniel Kiper daniel.ki...@oracle.com
---
 arch/x86/xen/machine_kexec_64.c   |  318 +
 arch/x86/xen/relocate_kernel_64.S |  309 +++
 2 files changed, 627 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/xen/machine_kexec_64.c
 create mode 100644 arch/x86/xen/relocate_kernel_64.S

diff --git a/arch/x86/xen/machine_kexec_64.c b/arch/x86/xen/machine_kexec_64.c
new file mode 100644
index 000..2600342
--- /dev/null
+++ b/arch/x86/xen/machine_kexec_64.c
@@ -0,0 +1,318 @@
+/*
+ * Copyright (c) 2011 Daniel Kiper
+ * Copyright (c) 2012 Daniel Kiper, Oracle Corporation
+ *
+ * kexec/kdump implementation for Xen was written by Daniel Kiper.
+ * Initial work on it was sponsored by Google under Google Summer
+ * of Code 2011 program and Citrix. Konrad Rzeszutek Wilk from Oracle
+ * was the mentor for this project.
+ *
+ * Some ideas are taken from:
+ *   - native kexec/kdump implementation,
+ *   - kexec/kdump implementation for Xen Linux Kernel Ver. 2.6.18,
+ *   - PV-GRUB.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/errno.h>
+#include <linux/kernel.h>
+#include <linux/kexec.h>
+#include <linux/mm.h>
+#include <linux/string.h>
+
+#include <xen/interface/memory.h>
+#include <xen/xen.h>
+
+#include <asm/xen/hypercall.h>
+#include <asm/xen/kexec.h>
+#include <asm/xen/page.h>
+
+#define __ma(vaddr) (virt_to_machine(vaddr).maddr)
+
+static void init_level2_page(pmd_t *pmd, unsigned long addr)
+{
+   unsigned long end_addr = addr + PUD_SIZE;
+
+   while (addr < end_addr) {
+   native_set_pmd(pmd++, native_make_pmd(addr | __PAGE_KERNEL_LARGE_EXEC));
+   addr += PMD_SIZE;
+   }
+}
+
+static int init_level3_page(struct kimage *image, pud_t *pud,
+   unsigned long addr, unsigned long last_addr)
+{
+   pmd_t *pmd;
+   struct page *page;
+   unsigned long end_addr = addr + PGDIR_SIZE;
+
+   while ((addr < last_addr) && (addr < end_addr)) {
+   page = firmware_kimage_alloc_control_pages(image, 0);
+
+   if (!page)
+   return -ENOMEM;
+
+   pmd = page_address(page);
+   init_level2_page(pmd, addr);
+   native_set_pud(pud++, native_make_pud(__ma(pmd) | _KERNPG_TABLE));
+   addr += PUD_SIZE;
+   }
+
+   /* Clear the unused entries. */
+   while (addr < end_addr) {
+   native_pud_clear(pud++);
+   addr += PUD_SIZE;
+   }
+
+   return 0;
+}
+
+
+static int init_level4_page(struct kimage *image, pgd_t *pgd,
+   unsigned long addr, unsigned long last_addr)
+{
+   int rc;
+   pud_t *pud;
+   struct page *page;
+   unsigned long end_addr = addr + PTRS_PER_PGD * PGDIR_SIZE;
+
+   while ((addr < last_addr) && (addr < end_addr)) {
+   page = firmware_kimage_alloc_control_pages(image, 0);
+
+   if (!page)
+   return -ENOMEM;
+
+   pud = page_address(page);
+   rc = init_level3_page(image, pud, addr, last_addr);
+
+   if (rc)
+   return rc;
+
+   native_set_pgd(pgd++, native_make_pgd(__ma(pud) | _KERNPG_TABLE));
+   addr += PGDIR_SIZE;
+   }
+
+   /* Clear the unused entries. */
+   while (addr < end_addr) {
+   native_pgd_clear(pgd++);
+   addr += PGDIR_SIZE;
+   }
+
+   return 0;
+}
+
+static void free_transition_pgtable(struct kimage *image)
+{
+   free_page((unsigned long)image->arch.pgd);
+   free_page((unsigned long)image->arch.pud0);
+   free_page((unsigned long)image->arch.pud1);
+   free_page((unsigned long)image->arch.pmd0);
+   free_page((unsigned long)image->arch.pmd1);
+   free_page((unsigned long)image->arch.pte0);
+   free_page((unsigned long)image->arch.pte1);
+}
+
+static int alloc_transition_pgtable(struct kimage *image)
+{
+   image->arch.pgd = (pgd_t *)get_zeroed_page(GFP_KERNEL);
+
+   if (!image->arch.pgd)
+   goto err;
+
+   image->arch.pud0 = (pud_t *)get_zeroed_page(GFP_KERNEL);
+
+   if (!image->arch.pud0)
+   goto err;
+
+   image->arch.pud1 = (pud_t *)get_zeroed_page(GFP_KERNEL
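
The loop shape used by init_level2_page() in the patch above can be checked
with a small standalone sketch. It assumes the usual x86_64 4-level paging
sizes (2 MiB per PMD entry, 1 GiB per PUD entry); the SKETCH_* constants and
the function name are hypothetical and are not taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed x86_64 4-level paging sizes. */
#define SKETCH_PMD_SIZE (UINT64_C(1) << 21) /* 2 MiB mapped per PMD entry */
#define SKETCH_PUD_SIZE (UINT64_C(1) << 30) /* 1 GiB mapped per PUD entry */

/*
 * Counts how many large-page PMD entries are written when one PUD-sized
 * region starting at addr is identity-mapped, using the same
 * "while (addr < end_addr)" loop shape as init_level2_page().
 */
static size_t count_pmd_entries(uint64_t addr)
{
    uint64_t end_addr = addr + SKETCH_PUD_SIZE;
    size_t entries = 0;

    while (addr < end_addr) {
        entries++;
        addr += SKETCH_PMD_SIZE;
    }

    return entries;
}
```

Under these assumptions each call fills exactly one page of PMD entries,
which is why the caller allocates a single control page per PUD entry.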

[PATCH v3 05/11] x86/xen: Register resources required by kexec-tools

2012-12-26 Thread Daniel Kiper
Register resources required by kexec-tools.

v2 - suggestions/fixes:
   - change logging level
 (suggested by Konrad Rzeszutek Wilk).

Signed-off-by: Daniel Kiper daniel.ki...@oracle.com
---
 arch/x86/xen/kexec.c |  150 ++
 1 files changed, 150 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/xen/kexec.c

diff --git a/arch/x86/xen/kexec.c b/arch/x86/xen/kexec.c
new file mode 100644
index 000..7ec4c45
--- /dev/null
+++ b/arch/x86/xen/kexec.c
@@ -0,0 +1,150 @@
+/*
+ * Copyright (c) 2011 Daniel Kiper
+ * Copyright (c) 2012 Daniel Kiper, Oracle Corporation
+ *
+ * kexec/kdump implementation for Xen was written by Daniel Kiper.
+ * Initial work on it was sponsored by Google under Google Summer
+ * of Code 2011 program and Citrix. Konrad Rzeszutek Wilk from Oracle
+ * was the mentor for this project.
+ *
+ * Some ideas are taken from:
+ *   - native kexec/kdump implementation,
+ *   - kexec/kdump implementation for Xen Linux Kernel Ver. 2.6.18,
+ *   - PV-GRUB.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/errno.h>
+#include <linux/init.h>
+#include <linux/ioport.h>
+#include <linux/kernel.h>
+#include <linux/kexec.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+
+#include <xen/interface/platform.h>
+#include <xen/interface/xen.h>
+#include <xen/xen.h>
+
+#include <asm/xen/hypercall.h>
+
+unsigned long xen_vmcoreinfo_maddr = 0;
+unsigned long xen_vmcoreinfo_max_size = 0;
+
+static int __init xen_init_kexec_resources(void)
+{
+   int rc;
+   static struct resource xen_hypervisor_res = {
+   .name = "Hypervisor code and data",
+   .flags = IORESOURCE_BUSY | IORESOURCE_MEM
+   };
+   struct resource *cpu_res;
+   struct xen_kexec_range xkr;
+   struct xen_platform_op cpuinfo_op;
+   uint32_t cpus, i;
+
+   if (!xen_initial_domain())
+   return 0;
+
+   if (strstr(boot_command_line, "crashkernel="))
+   pr_warn("kexec: Ignoring crashkernel option. "
+   "It should be passed to Xen hypervisor.\n");
+
+   /* Register Crash kernel resource. */
+   xkr.range = KEXEC_RANGE_MA_CRASH;
+   rc = HYPERVISOR_kexec_op(KEXEC_CMD_kexec_get_range, &xkr);
+
+   if (rc) {
+   pr_warn("kexec: %s: HYPERVISOR_kexec_op(KEXEC_RANGE_MA_CRASH)"
+   ": %i\n", __func__, rc);
+   return rc;
+   }
+
+   if (!xkr.size)
+   return 0;
+
+   crashk_res.start = xkr.start;
+   crashk_res.end = xkr.start + xkr.size - 1;
+   insert_resource(&iomem_resource, &crashk_res);
+
+   /* Register Hypervisor code and data resource. */
+   xkr.range = KEXEC_RANGE_MA_XEN;
+   rc = HYPERVISOR_kexec_op(KEXEC_CMD_kexec_get_range, &xkr);
+
+   if (rc) {
+   pr_warn("kexec: %s: HYPERVISOR_kexec_op(KEXEC_RANGE_MA_XEN)"
+   ": %i\n", __func__, rc);
+   return rc;
+   }
+
+   xen_hypervisor_res.start = xkr.start;
+   xen_hypervisor_res.end = xkr.start + xkr.size - 1;
+   insert_resource(&iomem_resource, &xen_hypervisor_res);
+
+   /* Determine maximum number of physical CPUs. */
+   cpuinfo_op.cmd = XENPF_get_cpuinfo;
+   cpuinfo_op.u.pcpu_info.xen_cpuid = 0;
+   rc = HYPERVISOR_dom0_op(&cpuinfo_op);
+
+   if (rc) {
+   pr_warn("kexec: %s: HYPERVISOR_dom0_op(): %i\n", __func__, rc);
+   return rc;
+   }
+
+   cpus = cpuinfo_op.u.pcpu_info.max_present + 1;
+
+   /* Register CPUs Crash note resources. */
+   cpu_res = kcalloc(cpus, sizeof(struct resource), GFP_KERNEL);
+
+   if (!cpu_res) {
+   pr_warn("kexec: %s: kcalloc(): %i\n", __func__, -ENOMEM);
+   return -ENOMEM;
+   }
+
+   for (i = 0; i < cpus; ++i) {
+   xkr.range = KEXEC_RANGE_MA_CPU;
+   xkr.nr = i;
+   rc = HYPERVISOR_kexec_op(KEXEC_CMD_kexec_get_range, &xkr);
+
+   if (rc) {
+   pr_warn("kexec: %s: cpu: %u: HYPERVISOR_kexec_op"
+   "(KEXEC_RANGE_MA_XEN): %i\n", __func__, i, rc);
+   continue;
+   }
+
+   cpu_res->name = "Crash note";
+   cpu_res->start = xkr.start;
+   cpu_res->end = xkr.start + xkr.size - 1;
+   cpu_res->flags

[PATCH v3 06/11] x86/xen: Add i386 kexec/kdump implementation

2012-12-26 Thread Daniel Kiper
Add i386 kexec/kdump implementation.

v2 - suggestions/fixes:
   - allocate transition page table pages below 4 GiB
 (suggested by Jan Beulich).

Signed-off-by: Daniel Kiper <daniel.ki...@oracle.com>
---
 arch/x86/xen/machine_kexec_32.c   |  226 ++
 arch/x86/xen/relocate_kernel_32.S |  323 +
 2 files changed, 549 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/xen/machine_kexec_32.c
 create mode 100644 arch/x86/xen/relocate_kernel_32.S

diff --git a/arch/x86/xen/machine_kexec_32.c b/arch/x86/xen/machine_kexec_32.c
new file mode 100644
index 000..011a5e8
--- /dev/null
+++ b/arch/x86/xen/machine_kexec_32.c
@@ -0,0 +1,226 @@
+/*
+ * Copyright (c) 2011 Daniel Kiper
+ * Copyright (c) 2012 Daniel Kiper, Oracle Corporation
+ *
+ * kexec/kdump implementation for Xen was written by Daniel Kiper.
+ * Initial work on it was sponsored by Google under Google Summer
+ * of Code 2011 program and Citrix. Konrad Rzeszutek Wilk from Oracle
+ * was the mentor for this project.
+ *
+ * Some ideas are taken from:
+ *   - native kexec/kdump implementation,
+ *   - kexec/kdump implementation for Xen Linux Kernel Ver. 2.6.18,
+ *   - PV-GRUB.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/errno.h>
+#include <linux/kernel.h>
+#include <linux/kexec.h>
+#include <linux/mm.h>
+#include <linux/string.h>
+
+#include <xen/xen.h>
+#include <xen/xen-ops.h>
+
+#include <asm/xen/hypercall.h>
+#include <asm/xen/kexec.h>
+#include <asm/xen/page.h>
+
+#define __ma(vaddr) (virt_to_machine(vaddr).maddr)
+
+static void *alloc_pgtable_page(struct kimage *image)
+{
+   struct page *page;
+
+   page = firmware_kimage_alloc_control_pages(image, 0);
+
+   if (!page || !page_address(page))
+   return NULL;
+
+   memset(page_address(page), 0, PAGE_SIZE);
+
+   return page_address(page);
+}
+
+static int alloc_transition_pgtable(struct kimage *image)
+{
+   image->arch.pgd = alloc_pgtable_page(image);
+
+   if (!image->arch.pgd)
+   return -ENOMEM;
+
+   image->arch.pmd0 = alloc_pgtable_page(image);
+
+   if (!image->arch.pmd0)
+   return -ENOMEM;
+
+   image->arch.pmd1 = alloc_pgtable_page(image);
+
+   if (!image->arch.pmd1)
+   return -ENOMEM;
+
+   image->arch.pte0 = alloc_pgtable_page(image);
+
+   if (!image->arch.pte0)
+   return -ENOMEM;
+
+   image->arch.pte1 = alloc_pgtable_page(image);
+
+   if (!image->arch.pte1)
+   return -ENOMEM;
+
+   return 0;
+}
+
+struct page *mf_kexec_kimage_alloc_pages(gfp_t gfp_mask,
+   unsigned int order,
+   unsigned long limit)
+{
+   struct page *pages;
+   unsigned int address_bits, i;
+
+   pages = alloc_pages(gfp_mask, order);
+
+   if (!pages)
+   return NULL;
+
+   address_bits = (limit == ULONG_MAX) ? BITS_PER_LONG : ilog2(limit);
+
+   /* Relocate set of pages below given limit. */
+   if (xen_create_contiguous_region((unsigned long)page_address(pages),
+   order, address_bits)) {
+   __free_pages(pages, order);
+   return NULL;
+   }
+
+   BUG_ON(PagePrivate(pages));
+
+   pages->mapping = NULL;
+   set_page_private(pages, order);
+
+   for (i = 0; i < (1 << order); ++i)
+   SetPageReserved(pages + i);
+
+   return pages;
+}
+
+void mf_kexec_kimage_free_pages(struct page *page)
+{
+   unsigned int i, order;
+
+   order = page_private(page);
+
+   for (i = 0; i < (1 << order); ++i)
+   ClearPageReserved(page + i);
+
+   xen_destroy_contiguous_region((unsigned long)page_address(page), order);
+   __free_pages(page, order);
+}
+
+unsigned long mf_kexec_page_to_pfn(struct page *page)
+{
+   return pfn_to_mfn(page_to_pfn(page));
+}
+
+struct page *mf_kexec_pfn_to_page(unsigned long mfn)
+{
+   return pfn_to_page(mfn_to_pfn(mfn));
+}
+
+unsigned long mf_kexec_virt_to_phys(volatile void *address)
+{
+   return virt_to_machine(address).maddr;
+}
+
+void *mf_kexec_phys_to_virt(unsigned long address)
+{
+   return phys_to_virt(machine_to_phys(XMADDR(address)).paddr);
+}
+
+int mf_kexec_prepare(struct kimage *image)
+{
+#ifdef

[PATCH v3 04/11] x86/xen: Introduce architecture dependent data for kexec/kdump

2012-12-26 Thread Daniel Kiper
Introduce architecture dependent constants, structures and
functions required by Xen kexec/kdump implementation.

Signed-off-by: Daniel Kiper <daniel.ki...@oracle.com>
---
 arch/x86/include/asm/xen/hypercall.h |6 +++
 arch/x86/include/asm/xen/kexec.h |   79 ++
 2 files changed, 85 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/include/asm/xen/kexec.h

diff --git a/arch/x86/include/asm/xen/hypercall.h b/arch/x86/include/asm/xen/hypercall.h
index c20d1ce..e76a1b8 100644
--- a/arch/x86/include/asm/xen/hypercall.h
+++ b/arch/x86/include/asm/xen/hypercall.h
@@ -459,6 +459,12 @@ HYPERVISOR_hvm_op(int op, void *arg)
 }
 
 static inline int
+HYPERVISOR_kexec_op(unsigned long op, void *args)
+{
+   return _hypercall2(int, kexec_op, op, args);
+}
+
+static inline int
 HYPERVISOR_tmem_op(
struct tmem_op *op)
 {
diff --git a/arch/x86/include/asm/xen/kexec.h b/arch/x86/include/asm/xen/kexec.h
new file mode 100644
index 000..d09b52f
--- /dev/null
+++ b/arch/x86/include/asm/xen/kexec.h
@@ -0,0 +1,79 @@
+/*
+ * Copyright (c) 2011 Daniel Kiper
+ * Copyright (c) 2012 Daniel Kiper, Oracle Corporation
+ *
+ * kexec/kdump implementation for Xen was written by Daniel Kiper.
+ * Initial work on it was sponsored by Google under Google Summer
+ * of Code 2011 program and Citrix. Konrad Rzeszutek Wilk from Oracle
+ * was the mentor for this project.
+ *
+ * Some ideas are taken from:
+ *   - native kexec/kdump implementation,
+ *   - kexec/kdump implementation for Xen Linux Kernel Ver. 2.6.18,
+ *   - PV-GRUB.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef _ASM_X86_XEN_KEXEC_H
+#define _ASM_X86_XEN_KEXEC_H
+
+#define KEXEC_XEN_NO_PAGES 17
+
+#define XK_MA_CONTROL_PAGE 0
+#define XK_VA_CONTROL_PAGE 1
+#define XK_MA_PGD_PAGE     2
+#define XK_VA_PGD_PAGE     3
+#define XK_MA_PUD0_PAGE    4
+#define XK_VA_PUD0_PAGE    5
+#define XK_MA_PUD1_PAGE    6
+#define XK_VA_PUD1_PAGE    7
+#define XK_MA_PMD0_PAGE    8
+#define XK_VA_PMD0_PAGE    9
+#define XK_MA_PMD1_PAGE    10
+#define XK_VA_PMD1_PAGE    11
+#define XK_MA_PTE0_PAGE    12
+#define XK_VA_PTE0_PAGE    13
+#define XK_MA_PTE1_PAGE    14
+#define XK_VA_PTE1_PAGE    15
+#define XK_MA_TABLE_PAGE   16
+
+#ifndef __ASSEMBLY__
+struct xen_kexec_image {
+   unsigned long page_list[KEXEC_XEN_NO_PAGES];
+   unsigned long indirection_page;
+   unsigned long start_address;
+};
+
+struct xen_kexec_load {
+   int type;
+   struct xen_kexec_image image;
+};
+
+extern unsigned int xen_kexec_control_code_size;
+
+#ifdef CONFIG_X86_32
+extern void xen_relocate_kernel(unsigned long indirection_page,
+   unsigned long *page_list,
+   unsigned long start_address,
+   unsigned int has_pae,
+   unsigned int preserve_context);
+#else
+extern void xen_relocate_kernel(unsigned long indirection_page,
+   unsigned long *page_list,
+   unsigned long start_address,
+   unsigned int preserve_context);
+#endif
+#endif
+#endif /* _ASM_X86_XEN_KEXEC_H */
-- 
1.5.6.5
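
The index constants in asm/xen/kexec.h above come in pairs: each even slot holds a machine address (XK_MA_*) and the odd slot after it holds the virtual address (XK_VA_*) of the same page, with XK_MA_TABLE_PAGE as the single unpaired last entry. The code that fills page_list is not part of this message, so the following is only a minimal user-space sketch of that layout. The helper set_page_pair() and the translation fake_va_to_ma() are hypothetical names made up for illustration; in the kernel the translation would presumably be something like the __ma() macro from patch 06.

```c
#include <assert.h>
#include <stddef.h>

/* Copied from the header in the patch above. */
#define KEXEC_XEN_NO_PAGES 17
#define XK_MA_CONTROL_PAGE 0
#define XK_VA_CONTROL_PAGE 1
#define XK_MA_PGD_PAGE     2
#define XK_VA_PGD_PAGE     3
#define XK_MA_TABLE_PAGE   16

/* Hypothetical callback type: translate a virtual to a machine address. */
typedef unsigned long (*va_to_ma_fn)(unsigned long va);

/*
 * Hypothetical helper: store one page in its MA/VA slot pair.
 * ma_index must be one of the even XK_MA_* indexes that has a
 * matching XK_VA_* slot right after it.
 */
static void set_page_pair(unsigned long *page_list, size_t ma_index,
                          unsigned long va, va_to_ma_fn va_to_ma)
{
    assert(ma_index % 2 == 0 && ma_index + 1 < KEXEC_XEN_NO_PAGES);

    page_list[ma_index] = va_to_ma(va);   /* XK_MA_* slot */
    page_list[ma_index + 1] = va;         /* XK_VA_* slot */
}

/* Stand-in translation for the sketch only; not how Xen maps pages. */
static unsigned long fake_va_to_ma(unsigned long va)
{
    return va + 0x100000UL;
}
```

A caller would invoke set_page_pair(page_list, XK_MA_PGD_PAGE, pgd_va, fake_va_to_ma) once per page and then fill XK_MA_TABLE_PAGE separately, since that slot has no virtual-address partner.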



Re: [PATCH v2 01/11] kexec: introduce kexec_ops struct

2012-11-23 Thread Daniel Kiper
On Fri, Nov 23, 2012 at 09:53:37AM +, Jan Beulich wrote:
> >>> On 23.11.12 at 02:56, Andrew Cooper <andrew.coop...@citrix.com> wrote:
> > On 23/11/2012 01:38, H. Peter Anvin wrote:
> >> I still don't really get why it can't be isolated from dom0, which would
> >> make more sense to me, even for a Xen crash.
> >>
> >
> > The crash region (as specified by crashkernel= on the Xen command line)
> > is isolated from dom0.
> >
> > dom0 (using the kexec utility etc) has the task of locating the Xen
> > crash notes (using the kexec hypercall interface), constructing a binary
> > blob containing kernel, initram and gubbins, and asking Xen to put this
> > blob in the crash region (again, using the kexec hypercall interface).
> >
> > I do not see how this is very much different from the native case
> > currently (although please correct me if I am misinformed).  Linux has
> > extra work to do by populating /proc/iomem with the Xen crash regions
> > boot (so the kexec utility can reference their physical addresses when
> > constructing the blob), and should just act as a conduit between the
> > kexec system call and the kexec hypercall to load the blob.
>
> But all of this _could_ be done completely independent of the
> Dom0 kernel's kexec infrastructure (i.e. fully from user space,
> invoking the necessary hypercalls through the privcmd driver).

No, this is impossible. The kexec/kdump image lives in dom0 kernel memory
until execution. That is why the privcmd driver by itself is not a solution
in this case.

> It's just that parts of the kexec infrastructure can be re-used
> (and hence that mechanism probably seemed the easier approach
> to the implementer of the original kexec-on-Xen). If the kernel
> folks dislike that re-use (quite understandably looking at how
> much of it needs to be re-done), that shouldn't prevent us from
> looking into the existing alternatives.

This is a last-resort option. First, I think we should try to find
a good solution that reuses existing code as much as possible.

Daniel
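
As a rough illustration of the /proc/iomem point made in the quoted text (the kexec utility looks up the crash regions there to learn their addresses), here is a small user-space sketch. It assumes the usual "start-end : name" line format and that the crash region appears under the name "Crash kernel", which is the name of crashk_res in the kernel. The function names parse_iomem_line() and find_iomem_range() are hypothetical and are not taken from kexec-tools.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one /proc/iomem style line ("  start-end : name").
 * Returns 1 and fills *start and *end only if the line is
 * well formed and its name equals `name`; returns 0 otherwise.
 */
static int parse_iomem_line(const char *line, const char *name,
                            unsigned long long *start,
                            unsigned long long *end)
{
    unsigned long long s, e;
    char label[128];

    assert(line && name && start && end);

    if (sscanf(line, " %llx-%llx : %127[^\n]", &s, &e, label) != 3)
        return 0;
    if (strcmp(label, name) != 0)
        return 0;

    *start = s;
    *end = e;
    return 1;
}

/* Scan an already opened stream for the first range called `name`. */
static int find_iomem_range(FILE *fp, const char *name,
                            unsigned long long *start,
                            unsigned long long *end)
{
    char line[256];

    while (fgets(line, sizeof(line), fp))
        if (parse_iomem_line(line, name, start, end))
            return 1;

    return 0;
}
```

A caller would fopen("/proc/iomem", "r") and pass the stream to find_iomem_range() with the name "Crash kernel". The region size is then end - start + 1, the inverse of the end = start + size - 1 computation in xen_init_kexec_resources().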


Re: [PATCH v2 01/11] kexec: introduce kexec_ops struct

2012-11-23 Thread Daniel Kiper
On Fri, Nov 23, 2012 at 10:51:55AM +, Jan Beulich wrote:
> >>> On 23.11.12 at 11:37, Daniel Kiper <daniel.ki...@oracle.com> wrote:
> > On Fri, Nov 23, 2012 at 09:53:37AM +, Jan Beulich wrote:
> >> >>> On 23.11.12 at 02:56, Andrew Cooper <andrew.coop...@citrix.com> wrote:
> >> > On 23/11/2012 01:38, H. Peter Anvin wrote:
> >> >> I still don't really get why it can't be isolated from dom0, which would
> >> >> make more sense to me, even for a Xen crash.
> >> >>
> >> >
> >> > The crash region (as specified by crashkernel= on the Xen command line)
> >> > is isolated from dom0.
> >> >
> >> > dom0 (using the kexec utility etc) has the task of locating the Xen
> >> > crash notes (using the kexec hypercall interface), constructing a binary
> >> > blob containing kernel, initram and gubbins, and asking Xen to put this
> >> > blob in the crash region (again, using the kexec hypercall interface).
> >> >
> >> > I do not see how this is very much different from the native case
> >> > currently (although please correct me if I am misinformed).  Linux has
> >> > extra work to do by populating /proc/iomem with the Xen crash regions
> >> > boot (so the kexec utility can reference their physical addresses when
> >> > constructing the blob), and should just act as a conduit between the
> >> > kexec system call and the kexec hypercall to load the blob.
> >>
> >> But all of this _could_ be done completely independent of the
> >> Dom0 kernel's kexec infrastructure (i.e. fully from user space,
> >> invoking the necessary hypercalls through the privcmd driver).
> >
> > No, this is impossible. kexec/kdump image lives in dom0 kernel memory
> > until execution. That is why privcmd driver itself is not a solution
> > in this case.
>
> Even if so, there's no fundamental reason why that kernel image
> can't be put into Xen controlled space instead.

Yep, but we must change the Xen kexec interface and/or its behavior first.
If we take that option, then we could also move almost everything needed
from the dom0 kernel to Xen. This way we could simplify the Linux kernel
kexec/kdump infrastructure needed to run on Xen.

Daniel


Re: [Xen-devel] [PATCH v2 01/11] kexec: introduce kexec_ops struct

2012-11-23 Thread Daniel Kiper
On Fri, Nov 23, 2012 at 10:51:08AM +, Ian Campbell wrote:
> On Fri, 2012-11-23 at 10:37 +, Daniel Kiper wrote:
> > On Fri, Nov 23, 2012 at 09:53:37AM +, Jan Beulich wrote:
> > > >>> On 23.11.12 at 02:56, Andrew Cooper <andrew.coop...@citrix.com> wrote:
> > > > The crash region (as specified by crashkernel= on the Xen command line)
> > > > is isolated from dom0.
> > > [...]
> > >
> > > But all of this _could_ be done completely independent of the
> > > Dom0 kernel's kexec infrastructure (i.e. fully from user space,
> > > invoking the necessary hypercalls through the privcmd driver).
> >
> > No, this is impossible. kexec/kdump image lives in dom0 kernel memory
> > until execution.
>
> Are you sure? I could have sworn they lived in the hypervisor owned
> memory set aside by the crashkernel= parameter as Andy suggested.

I am sure. It is moved to its final resting place when
relocate_kernel() is called by the hypervisor.

Daniel
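
The copy that relocate_kernel() performs can be pictured as a walk over the kexec indirection list (the indirection_page field of struct xen_kexec_image refers to it). What follows is a simplified user-space sketch of that walk, not the relocate_kernel assembly from these patches. The IND_* flag values match include/linux/kexec.h; relocate_pages(), SKETCH_PAGE_SIZE and struct sketch_page are hypothetical names, and plain pointers stand in for machine addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Entry flags, stored in the low bits of each list entry. */
#define IND_DESTINATION 0x1  /* entry sets the next copy target     */
#define IND_INDIRECTION 0x2  /* entry points at more list entries   */
#define IND_DONE        0x4  /* end of the list                     */
#define IND_SOURCE      0x8  /* entry names a page to copy          */
#define IND_FLAGS       0xf

#define SKETCH_PAGE_SIZE 4096

/* 16-byte alignment keeps the low four address bits free for flags. */
struct sketch_page {
    _Alignas(16) unsigned char bytes[SKETCH_PAGE_SIZE];
};

/*
 * Walk the list starting at ptr and copy every source page to the
 * current destination, advancing the destination by one page after
 * each copy. Returns the number of pages copied.
 */
static size_t relocate_pages(const uintptr_t *ptr)
{
    unsigned char *dest = NULL;
    size_t copied = 0;
    uintptr_t entry;

    assert(ptr != NULL);

    while ((entry = *ptr) != 0 && !(entry & IND_DONE)) {
        void *addr = (void *)(entry & ~(uintptr_t)IND_FLAGS);

        if (entry & IND_INDIRECTION) {
            ptr = addr;          /* continue in the next list page */
            continue;
        }

        if (entry & IND_DESTINATION) {
            dest = addr;
        } else if ((entry & IND_SOURCE) && dest != NULL) {
            memcpy(dest, addr, SKETCH_PAGE_SIZE);
            dest += SKETCH_PAGE_SIZE;
            copied++;
        }

        ptr++;
    }

    return copied;
}
```

Until such a walk runs, the source pages stay wherever they were allocated, which is the point made above about the image living in dom0 kernel memory until execution.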