Re: [PATCH v5 1/2] arm64: kvm: allows kvm cpu hotplug

2015-10-15 Thread AKASHI Takahiro

James,

I reproduced the problem on Hikey board, but

On 10/13/2015 07:43 PM, James Morse wrote:

Hi,

On 13/10/15 06:38, AKASHI Takahiro wrote:

On 10/12/2015 10:28 PM, James Morse wrote:

On 29/05/15 06:38, AKASHI Takahiro wrote:

The current kvm implementation on arm64 does cpu-specific initialization
at system boot, and has no way to gracefully shutdown a core in terms of
kvm. This prevents, especially, kexec from rebooting the system on a boot
core in EL2.

This patch adds a cpu tear-down function and also puts an existing cpu-init
code into a separate function, kvm_arch_hardware_disable() and
kvm_arch_hardware_enable() respectively.
We don't need arm64-specific cpu hotplug hook any more.


I think we do... on platforms where cpuidle uses psci to temporarily turn
off cores that aren't in use, we lose the el2 state. This hotplug hook
restores the state, even if there a no vms running.


I've just noticed there are two cpu notifiers - we may be referring to
different ones. (hyp_init_cpu_pm_nb and hyp_init_cpu_nb)



If I understand you correctly, with or without my patch, kvm doesn't work
  under cpuidle anyway. Right?


It works with, and without, v4.
This patch v5 causes the problem.



If so, saving/restoring cpu states (or at least, kicking cpu hotplug hooks)
is cpuidle driver's responsibility, isn't it?


Yes - but with v5, (at least one of) the hotplug hooks isn't having the
same effect as before:

Before v5, cpu_init_hyp_mode() is called via cpu_notify() each time
cpu_suspend() suspends/wakes-up the core.

Logically it should be the 'pm' notifier that does this work:

if (cmd == CPU_PM_EXIT &&
__hyp_get_vectors() == hyp_default_vectors) {
cpu_init_hyp_mode(NULL);
return NOTIFY_OK;



With v5, kvm_arch_hardware_enable() isn't called each time cpu_suspend()
cycles the core.


Right. I misunderstood kvm_arm_get_running_vcpu().


The problem appears to be this hunk, affecting the above code:

-   if (cmd == CPU_PM_EXIT &&
-   __hyp_get_vectors() == hyp_default_vectors) {
-   cpu_init_hyp_mode(NULL);
+   if (cmd == CPU_PM_EXIT && kvm_arm_get_running_vcpu()) {
+   kvm_arch_hardware_enable();


Changing this to just rename cpu_init_hyp_mode() to
kvm_arch_hardware_enable() solves the problem.


The change that you suggested won't work well because kvm needs to maintain
cpu state with 'kvm_usage_count' using kvm_arch_hardware_enable/disable().
With this changed applied, you won't be able to do kexec.

I'm going to try more generic PM hook.

Thanks,
-Takahiro AKASHI


Presumably kvm_arm_get_running_vcpu() evaluates to false before the first
vm is started, meaning no vms can be started if pm events occur before
starting the first vm.

Sorry I blamed the wrong cpu notifier hook - I didn't realise there were two!


Thanks,

James



This patch prevents me from running vms on such a platform, qemu gives:

kvm [1500]: Unsupported exception type: 6264688KVM internal error.

Suberror: 0

kvmtool goes with a more dramatic:

KVM exit reason: 17 ("KVM_EXIT_INTERNAL_ERROR")


Disabling CONFIG_ARM_CPUIDLE solves this problem.


(Sorry to revive an old thread - I've been using v4 of this patch for the
hibernate/suspend-to-disk series).



Since this patch modifies common part of code between arm and arm64, one
stub definition, __cpu_reset_hyp_mode(), is added on arm side to avoid
compiling errors.

Signed-off-by: AKASHI Takahiro 



diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
index fd085ec..afe6263 100644
--- a/arch/arm64/kvm/hyp.S
+++ b/arch/arm64/kvm/hyp.S
@@ -1136,6 +1136,11 @@ ENTRY(kvm_call_hyp)
   ret
   ENDPROC(kvm_call_hyp)

+ENTRY(kvm_call_reset)
+hvc#HVC_RESET
+ret
+ENDPROC(kvm_call_reset)
+
   .macro invalid_vectorlabel, target
   .align2
   \label:
@@ -1179,10 +1184,27 @@ el1_sync:// Guest trapped
into EL2
   cmpx18, #HVC_GET_VECTORS
   b.ne1f
   mrsx0, vbar_el2
-b2f
-
-1:/* Default to HVC_CALL_HYP. */
+bdo_eret

+/* jump into trampoline code */
+1:cmpx18, #HVC_RESET
+b.ne2f
+/*
+ * Entry point is:
+ *TRAMPOLINE_VA
+ *+ (__kvm_hyp_reset - (__hyp_idmap_text_start & PAGE_MASK))
+ */
+adrpx2, __kvm_hyp_reset
+addx2, x2, #:lo12:__kvm_hyp_reset
+adrpx3, __hyp_idmap_text_start
+addx3, x3, #:lo12:__hyp_idmap_text_start
+andx3, x3, PAGE_MASK
+subx2, x2, x3
+ldrx3, =TRAMPOLINE_VA
+addx2, x2, x3
+brx2// no return
+
+2:/* Default to HVC_CALL_HYP. */


What was the reason not to use kvm_call_hyp(__kvm_hyp_reset, ...)?
(You mentioned you wanted to at [0] - I can't find the details in the
archive)


Thanks,

James


[0] http://lists.infradead.org/pipermail/kexec/2015-April/335533.html


___
kexec mailing list

Re: [PATCH] kexec: use mmap instead of read for slurp_file()

2015-10-15 Thread Michael Holzheu
On Wed, 14 Oct 2015 17:05:52 -0700
Geoff Levand  wrote:

[snip]

> > if (err < 0)
> >  >  >   >   > die("Can not seek to the begin of file %s: %s\n",
> >  >  >   >   >   >   > filename, strerror(errno));
> > +>  >   > buf = slurp_fd(fd, filename, size, );
> >  >  > } else {
> > ->  >   > size = stats.st_size;
> > +>  >   > size = nread = stats.st_size;
> > +>  >   > buf = mmap(NULL, size,
> 
> With this change the caller can't tell if buf was malloc'ed or mmaped.  The
> only safe thing it can do is to not call free() on the returned buf, this will
> lead to memory leakage for malloc'ed buffers.

I have read the code and have not found any free call. Therefore I assumed
that the kexec approach is to not free the buffer *explicitly* and leave to
the kernel to free it *automatically* at process exit.

@Simon: Was this assumption wrong?

Michael


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH v4 63/79] include/uapi/linux/kexec.h: use __kernel_size_t instead of size_t

2015-10-15 Thread Mikko Rapeli
Fixes userspace compilation error:

error: unknown type name ‘size_t’
  size_t bufsz;

Signed-off-by: Mikko Rapeli 
---
 include/uapi/linux/kexec.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/kexec.h b/include/uapi/linux/kexec.h
index 99048e5..26be201 100644
--- a/include/uapi/linux/kexec.h
+++ b/include/uapi/linux/kexec.h
@@ -50,9 +50,9 @@
  */
 struct kexec_segment {
const void *buf;
-   size_t bufsz;
+   __kernel_size_t bufsz;
const void *mem;
-   size_t memsz;
+   __kernel_size_t memsz;
 };
 
 #endif /* __KERNEL__ */
-- 
2.5.0


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] kexec-tools: do not copy 1st kernel root= param in fs2dt.c

2015-10-15 Thread Simon Horman
Hi Dave,

On Fri, Oct 09, 2015 at 01:57:22PM +0800, Dave Young wrote:
> Jan Stodola  reported ppc64 root= is always added in 
> kexec
> kernel cmdline. But sometimes we need boot without root= for example we use
> kexec to boot into installation initramfs image like below:
> kexec --load vmlinuz --initrd=initrd.img --command-line=\
> "inst.repo=http:Server/ppc64le/os/"
> 
> While creating dtb, in case there's no root= in user provided cmdline params 
> kexec-tools will find the original root= param used in 1st kernel and pass it
> to 2nd kernel. This caused that user have no way to remove root= cmdline.
> 
> Dropping that part of code so that one can get chance to kexec into 2nd kernel
> without root= param. One can still provide root= in --command-line=""

I'm a little concerned about the backwards-compatibility implications of
this change. Though I agree the new behaviour is entirely sane perhaps
it should be activated via a new command line option.

Also, won't this affect other architectures that use DT.
I'm thinking about ARM. If so it might be good to tweak the changelog.

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[ANNOUNCE] kexec-tools v2.0.10 preparation

2015-10-15 Thread Simon Horman
Hi all,

I am planning to release kexec-tools v2.0.11 around the time that
the v4.3 kernel is released.

As I estimate that the latter occur within the few weeks I
would like to ask interested parties to send any patches they would like
included in v2.0.10 within the next week so that I can make an rc release.

For reference the patches queued up since v2.0.10 are as follows:

faef3d02bb0d ppc64: Fix warnings
626b441109b6 configure: Set SUBARCH=BE for powerpc
0974c43a18ca kexec-tools: fix build error with glibc 2.19 and earlier version
4fbf781eb038 Load crash kernel high on x86
6fd80e245e38 fix kexec load hang in case crash notes addr read failure
7ab842d8a004 kexec: use mmap instead of read for slurp_file()
a304e2d82a8c ppc64: purgatory: Reset primary cpu endian to big-endian
97a07e2775ef Drop release date from kexec-tools version output
cb9a818ff2ac Add persistent memory support
c9c21cc107dc kexec: use _DEFAULT_SOURCE instead to remove compiling warning
973426e85377 kexec-tools 2.0.10.git

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v2 1/3] ppc64: Fix warnings

2015-10-15 Thread Simon Horman
On Tue, Oct 06, 2015 at 05:55:48PM -0500, Scott Wood wrote:
> Produce a warning-free build on ppc64 (at least, when built as 64-bit
> userspace -- if a 64-bit binary for ppc64 is a requirement, why is -m64
> set only on purgatory?).  Mostly unused (or write-only) variable
> warnings, but also one nasty one where reserve() was used without a
> prototype, causing long long arguments to be passed as int.
> 
> Signed-off-by: Scott Wood 

Thanks, applied.

I would have slightly preferred one-patch per problem.

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] kexec: determine size of block device

2015-10-15 Thread Simon Horman
Hi Andreas,

On Fri, Oct 09, 2015 at 03:27:18PM +0200, Andreas Fenkart wrote:
> starting 'kexec -l /dev/mmcblk0p1' fails since the size of
> a block device can not be determined with stat

Could you provide some motivation for kexec reading a kernel image
from a block device as per the example above?

> Signed-off-by: Andreas Fenkart 
> ---
>  kexec/kexec.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/kexec/kexec.c b/kexec/kexec.c
> index ff024f3..8b8846e 100644
> --- a/kexec/kexec.c
> +++ b/kexec/kexec.c
> @@ -26,7 +26,9 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -554,6 +556,14 @@ char *slurp_file(const char *filename, off_t *r_size)
>   die("Can not seek to the begin of file %s: %s\n",
>   filename, strerror(errno));
>   buf = slurp_fd(fd, filename, size, );
> +} else if (S_ISBLK(stats.st_mode)) {
> + /* taken from blockdev */

Is "blockdev" Linux kernel code?

> + unsigned long long llu = -1;
> + err = ioctl(fd, BLKGETSIZE64, );
> + if (err < 0)
> + die("Can't retrieve size of block device %s: %s\n",
> + filename, strerror(errno));
> + size = llu;
>   } else {
>   size = nread = stats.st_size;
>   buf = mmap(NULL, size,
> -- 
> 2.5.1
> 
> 
> ___
> kexec mailing list
> kexec@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec
> 

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option

2015-10-15 Thread 河合英宏 / KAWAI,HIDEHIRO
> * Thomas Gleixner  wrote:
> 
> > Borislav,
> >
> > On Mon, 5 Oct 2015, Borislav Petkov wrote:
> > > On Mon, Oct 05, 2015 at 02:03:58AM +, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > > That's different from my point of view.  I'm not going to pass
> > > > some data from the first kernel to the second kernel. I'm just going to
> > > > provide a configurable option for the second kernel to users.
> > >
> > > Dude, WTF?! You're adding a kernel command line which is supposed to
> > > be used *only* by the kdump kernel. But nooo, it is there in the open
> > > and visible to people. And anyone can type it in during boot. AND THAT
> > > SHOULDN'T BE POSSIBLE IN THE FIRST PLACE!
> > >
> > > This information is strictly for the kdump kernel - it shouldn't be a
> > > generic command line option. How hard it is to understand that simple
> > > fact?!
> >
> > Calm down!
> >
> > Disabling that NMI in the first kernel is not going to make the world
> > explode. We have enough command line options a user can type in, which
> > are way worse than that one. If some "expert" types nonsense on the
> > first kernel command line, then it's none of our problems, really.
> >
> > If Kawai would have marked that option as a debug feature, this
> > discussion would have probably never happened.
> >
> > Aside of that, the best way to hand in options for the kdump kernel is
> > the command line. We have an existing interface for that.
> >
> > Let's move on. Nothing to see here.
> 
> So Boris kind of has a point: there are numerous problems with boot options as
> kexec parameter interface:
> 
>  - boot option strings are not a well defined programmatic interface:
> - failures are not obvious (they are often ignored)
> - inserting/setting parameters is awkward as well.
> 
>  - boot options are not an ABI, so when options have dual use with kexec it's 
> easy
>to break things inadvertently: without that failure being apparent other 
> than
>reintroducing obscure kexec failure modes again.
> 
>  - in the booted up kexec kernel it's not really obvious which options got 
> set by
>kexec and which got set by the user - as there's no separation of 
> namespaces.
> 
>  - likewise, if the user specifies an option in conflict with a kexec 
> requirement
>it's not really obvious what's happening: the user setting should probably
>dominate - I'm not sure that's the case with the current kexec code.

Thanks for the detailed explanation.  I can understand the disadvantages
of using boot option.  So these are essential problems with boot options
rather than new boot option added for kexec'ed kernel.
 
> Boot options are basically a user interface.
> 
> On the other hand the hack of (ab-)using boot parameters as kexec parameter
> passing is an existing, many years old practice and we cannot just stop it 
> without
> offering an alternative (and better!) interface first.
> 
> We could improve things by either adding a separate kexec-only parameter 
> passing
> facility (like programmatic boot parameters are) - or by creating some sort of
> boot parameter alias that clearly identifies kexec parameters.
> 
> So for example when introducing 'noextnmi' we'd also add a facility to add a
> 'kexec_noextnmi' alias - which clearly identifies this boot parameter as a 
> kexec
> related one.
> 
> Every 'kexec' inserted parameter would be prefixed by kexec_ - and then the
> separation of namespaces (and the prioritization of user vs. kexec 
> requirements)
> becomes well defined as well..
> 
> We should also probably print a warning if a kexec_* parameter is passed in 
> that
> has no matching handler in the kexec()-ed kernel.

It would be reasonable.  Or we might improve kexec command so that
it removes conflict boot options with warnings.

As I stated in another mail, I'm going to change "noextnmi" to
"apic_extnmi={bsp|all|none}", which can be used both the first and
second kernels.  So, I'll add this option without "kexec_" prefix
at this point.

> I do concur that this patch is probably OK given existing practices.

Thanks!

--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option

2015-10-15 Thread 河合英宏 / KAWAI,HIDEHIRO
> > By the way, I have a pending patch which expands this option like
> > this:
> >
> > apic_extnmi={ bsp | all | none }
> >
> > If apic_extnmi=all is specified, external NMIs are broadcast to
> > all CPUs.  This raises the successful rate of kernel panic in the case
> > where an external NMI to CPU 0 is swallowed by other NMI handlers or
> > blocked due to hang-up in NMI context.  The patch works without any
> > problems, but I'm going to drop the feature if it will cause long
> > discussion.  I'd like to settle this patch set down once.  At least,
> > I'm going to change this option to apic_extnmi={bsp|none} style for
> > the future expansion.
> >
> > How do you think about this?
> 
> Do it right away with all three variants. They make a lot of sense to
> me.

OK. I'll do that.

Regards,

--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group



___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 2/2] util_lib: Silence ar informational message

2015-10-15 Thread Simon Horman
Hi Geoff,

On Thu, Oct 08, 2015 at 04:21:24PM -0700, Geoff Levand wrote:
> Add the -c option to ar to silence the informational
> message 'ar creating libutil.a'.  ar outputs this
> message to stderr, and some tools interpret this as
> a build warning.

I am wondering about the portability of -c.
I tried to find some documentation of it and drew a blank.
Do you have any thoughts on this?

> Signed-off-by: Geoff Levand 
> ---
>  util_lib/Makefile | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/util_lib/Makefile b/util_lib/Makefile
> index 54ccdc5..15b84e5 100644
> --- a/util_lib/Makefile
> +++ b/util_lib/Makefile
> @@ -18,5 +18,5 @@ $(UTIL_LIB): CPPFLAGS += -I$(srcdir)/util_lib/include
>  
>  $(UTIL_LIB): $(UTIL_LIB_OBJS)
>   @$(MKDIR) -p $(@D)
> - $(AR) rs $(UTIL_LIB) $(UTIL_LIB_OBJS)
> + $(AR) rs -c $(UTIL_LIB) $(UTIL_LIB_OBJS)
>  
> -- 
> 2.5.0
> 

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v2 3/3] ppc64: Add a flag to tell the kernel it's booting from kexec

2015-10-15 Thread Simon Horman
On Tue, Oct 06, 2015 at 05:55:50PM -0500, Scott Wood wrote:
> It needs to know this because the SMP release mechanism for Freescale
> book3e is different from when booting with normal hardware.  In theory
> we could simulate the normal spin table mechanism, but not (easily) at
> the addresses U-Boot put in the device tree -- so there'd need to be
> even more communication between the kernel and kexec to set that up.
> 
> Signed-off-by: Scott Wood 
> ---
> v2: Use a device tree property rather than setting a flag in the kernel
> image, as requested by Michael Ellerman.

I'd value a review of this from someone more familiar with ppc than I.

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v2 2/3] ppc64: Avoid rfid if no need to clear MSR_LE

2015-10-15 Thread Simon Horman
On Tue, Oct 06, 2015 at 05:55:49PM -0500, Scott Wood wrote:
> Commit a304e2d82a8c3 ("ppc64: purgatory: Reset primary cpu endian to
> big-endian) changed bctr to rfid.  rfid is book3s-only and will cause a
> fatal exception on book3e.
> 
> Purgatory is an isolated environment which makes importing information
> about the subarch awkward, so instead rely on the fact that MSR_LE
> should never be set on book3e, and the rfid is only needed if MSR_LE is
> set (and thus needs to be cleared).  In theory that MSR bit is reserved
> on book3e, rather than zero, but in practice I have not seen it set.
> 
> Signed-off-by: Scott Wood 
> Cc: Samuel Mendoza-Jonas 
> ---
> v2: new patch

Could I get a review of this from at lest one ppc person?

>  purgatory/arch/ppc64/v2wrap.S | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/purgatory/arch/ppc64/v2wrap.S b/purgatory/arch/ppc64/v2wrap.S
> index 179ade9..3534080 100644
> --- a/purgatory/arch/ppc64/v2wrap.S
> +++ b/purgatory/arch/ppc64/v2wrap.S
> @@ -116,9 +116,17 @@ master:
>   stw 7,0x5c(4)   # and patch it into the kernel
>   mr  3,16# restore dt address
>  
> + mfmsr   5
> + andi.   10,5,1  # test MSR_LE
> + bne little_endian
> +
> + li  5,0 # r5 will be 0 for kernel
> + mtctr   4   # prepare branch to
> + bctr# start kernel
> + 
> +little_endian:   # book3s-only
>   mtsrr0  4   # prepare branch to
>  
> - mfmsr   5
>   clrrdi  5,5,1   # clear MSR_LE
>   mtsrr1  5
>  
> -- 
> 2.1.4
> 
> 
> ___
> kexec mailing list
> kexec@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec
> 

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec