Re: [PATCH v2] powerpc/pseries: Fix maximum memory value

2019-07-04 Thread Aravinda Prasad



On Thursday 04 July 2019 04:43 PM, Michael Ellerman wrote:
> "Naveen N. Rao"  writes:
>> Nathan Lynch wrote:
>>> Aravinda Prasad  writes:
>>>> Calculating the maximum memory based on the number of lmbs
>>>> and lmb size does not account for the RMA region. Hence
>>>> use memory_hotplug_max(), which already accounts for the
>>>> RMA region, to fetch the maximum memory value. Thanks to
>>>> Nathan Lynch for suggesting the memory_hotplug_max()
>>>> function.
>>>
>>> Well, I hope I haven't led you astray... will it give you the desired
>>> result on a kernel configured without memory hotplug support, booted in
>>> an LPAR with some huge pages configured?
>>>
>>> If so, then
>>> Acked-by: Nathan Lynch 
>>>
>>> It would likely help with review and future maintenance if the semantics
>>> and intended use of the MaxMem field are made a little more
>>> explicit. For example, is it supposed to include persistent memory?
>>> Perhaps a follow-up patch could address this. Or maybe I'm overthinking
>>> it.
>>
>> This was primarily aimed at replicating what AIX lparstat does, and the 
>> documentation (*) just says:
>>
>>   Maximum Memory
>>   Maximum possible amount of Memory.
>>
>> I think this mirrors the maximum memory value set in the LPAR profile, 
>> and this provides a way to obtain that value from within the LPAR.
> 
> But the doc string for memory_hotplug_max() says:
> 
>  * memory_hotplug_max - return max address of memory that may be added
> 
> 
> ie. maximum *address* not maximum *amount*.
> 
> Possibly it turns out to be the same value, but that is just because you
> have no holes in your layout.
> 
> So I don't think this patch is correct.

In one of the cases, memory_hotplug_max() takes the value from
"ibm,lrdr-capacity", and according to PAPR:

PAPR section C.6.3.1 ibm,lrdr-capacity:

"The phys (of size #address-cells) communicates the maximum address in
bytes and therefore, the most memory that can be allocated to this
partition."

In the other cases, memory_hotplug_max() calculates the value based on the
number of LMBs assigned to the partition, so it should still give the
maximum memory value.

> 
> cheers
> 

-- 
Regards,
Aravinda



Re: [PATCH v2] powerpc/pseries: Fix maximum memory value

2019-06-30 Thread Aravinda Prasad



On Friday 28 June 2019 10:57 PM, Nathan Lynch wrote:
> Aravinda Prasad  writes:
>> Calculating the maximum memory based on the number of lmbs
>> and lmb size does not account for the RMA region. Hence
>> use memory_hotplug_max(), which already accounts for the
>> RMA region, to fetch the maximum memory value. Thanks to
>> Nathan Lynch for suggesting the memory_hotplug_max()
>> function.
> 
> Well, I hope I haven't led you astray... will it give you the desired
> result on a kernel configured without memory hotplug support, booted in
> an LPAR with some huge pages configured?

Yes. I have tested the patch both with CONFIG_MEMORY_HOTPLUG set and not
set. It is working as expected.

Regards,
Aravinda

> 
> If so, then
> Acked-by: Nathan Lynch 
> 
> It would likely help with review and future maintenance if the semantics
> and intended use of the MaxMem field are made a little more
> explicit. For example, is it supposed to include persistent memory?
> Perhaps a follow-up patch could address this. Or maybe I'm overthinking
> it.
> 

-- 
Regards,
Aravinda



[PATCH v2] powerpc/pseries: Fix maximum memory value

2019-06-28 Thread Aravinda Prasad
Calculating the maximum memory based on the number of lmbs
and lmb size does not account for the RMA region. Hence
use memory_hotplug_max(), which already accounts for the
RMA region, to fetch the maximum memory value. Thanks to
Nathan Lynch for suggesting the memory_hotplug_max()
function.

Fixes: 772b039fd9a7 ("powerpc/pseries: Export maximum memory value")
Signed-off-by: Aravinda Prasad 
---
 arch/powerpc/platforms/pseries/lparcfg.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/lparcfg.c 
b/arch/powerpc/platforms/pseries/lparcfg.c
index e33e8bc..010a41f 100644
--- a/arch/powerpc/platforms/pseries/lparcfg.c
+++ b/arch/powerpc/platforms/pseries/lparcfg.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -33,7 +34,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include "pseries.h"
 
@@ -435,7 +435,7 @@ static void maxmem_data(struct seq_file *m)
 {
unsigned long maxmem = 0;
 
-   maxmem += drmem_info->n_lmbs * drmem_info->lmb_size;
+   maxmem += memory_hotplug_max();
maxmem += hugetlb_total_pages() * PAGE_SIZE;
 
seq_printf(m, "MaxMem=%ld\n", maxmem);



Re: [PATCH] powerpc/pseries: Fix maximum memory value

2019-06-27 Thread Aravinda Prasad



On Thursday 27 June 2019 04:06 AM, Nathan Lynch wrote:
> Aravinda Prasad  writes:
>> Calculating the maximum memory based on the number of lmbs
>> and lmb size does not account for the RMA region. Hence
>> use drmem_lmb_memory_max(), which already accounts for the
>> RMA region, to fetch the maximum memory value.
>>
>> Fixes: 772b039fd9a7: ("powerpc/pseries: Export maximum memory value")
>> Signed-off-by: Aravinda Prasad 
>> ---
>>  arch/powerpc/platforms/pseries/lparcfg.c |2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/platforms/pseries/lparcfg.c 
>> b/arch/powerpc/platforms/pseries/lparcfg.c
>> index e33e8bc..f425842 100644
>> --- a/arch/powerpc/platforms/pseries/lparcfg.c
>> +++ b/arch/powerpc/platforms/pseries/lparcfg.c
>> @@ -435,7 +435,7 @@ static void maxmem_data(struct seq_file *m)
>>  {
>>  unsigned long maxmem = 0;
>>  
>> -maxmem += drmem_info->n_lmbs * drmem_info->lmb_size;
>> +maxmem += drmem_lmb_memory_max();
> 
> Would memory_hotplug_max() be better here? There's no guarantee an LPAR
> will have the device tree node/properties that populate drmem.

I see that memory_hotplug_max() calls drmem_lmb_memory_max() if the "/rtas"
node is not available; otherwise it fetches the value from
"ibm,lrdr-capacity". So I think memory_hotplug_max() can be used.

Regards,
Aravinda

> 

-- 
Regards,
Aravinda


[PATCH] powerpc/pseries: Fix maximum memory value

2019-06-26 Thread Aravinda Prasad
Calculating the maximum memory based on the number of lmbs
and lmb size does not account for the RMA region. Hence
use drmem_lmb_memory_max(), which already accounts for the
RMA region, to fetch the maximum memory value.

Fixes: 772b039fd9a7 ("powerpc/pseries: Export maximum memory value")
Signed-off-by: Aravinda Prasad 
---
 arch/powerpc/platforms/pseries/lparcfg.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/lparcfg.c 
b/arch/powerpc/platforms/pseries/lparcfg.c
index e33e8bc..f425842 100644
--- a/arch/powerpc/platforms/pseries/lparcfg.c
+++ b/arch/powerpc/platforms/pseries/lparcfg.c
@@ -435,7 +435,7 @@ static void maxmem_data(struct seq_file *m)
 {
unsigned long maxmem = 0;
 
-   maxmem += drmem_info->n_lmbs * drmem_info->lmb_size;
+   maxmem += drmem_lmb_memory_max();
maxmem += hugetlb_total_pages() * PAGE_SIZE;
 
seq_printf(m, "MaxMem=%ld\n", maxmem);
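
For context on why this fixes the RMA accounting: n_lmbs * lmb_size only
sums the hot-pluggable LMBs, which start above the RMA region, whereas
drmem_lmb_memory_max() returns the end address of the highest LMB and
therefore covers everything from address zero, RMA included. Roughly
(paraphrasing the helper in arch/powerpc/include/asm/drmem.h):

static inline u64 drmem_lmb_memory_max(void)
{
        struct drmem_lmb *last_lmb;

        /* End address of the last LMB, i.e. the maximum memory address. */
        last_lmb = &drmem_info->lmbs[drmem_info->n_lmbs - 1];
        return last_lmb->base_addr + drmem_info->lmb_size;
}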



Re: [PATCH 2/2] powerpc/64s: Better printing of machine check info for guest MCEs

2019-02-20 Thread Aravinda Prasad



On Wednesday 20 February 2019 06:36 AM, Paul Mackerras wrote:
> This adds an "in_guest" parameter to machine_check_print_event_info()
> so that we can avoid trying to translate guest NIP values into
> symbolic form using the host kernel's symbol table.

Reviewed-by: Aravinda Prasad 

Regards,
Aravinda

> 
> Signed-off-by: Paul Mackerras 
> ---
>  arch/powerpc/include/asm/mce.h| 2 +-
>  arch/powerpc/kernel/mce.c | 8 +---
>  arch/powerpc/kvm/book3s_hv.c  | 4 ++--
>  arch/powerpc/platforms/powernv/opal.c | 2 +-
>  4 files changed, 9 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h
> index a8b8903..17996bc 100644
> --- a/arch/powerpc/include/asm/mce.h
> +++ b/arch/powerpc/include/asm/mce.h
> @@ -209,7 +209,7 @@ extern int get_mce_event(struct machine_check_event *mce, 
> bool release);
>  extern void release_mce_event(void);
>  extern void machine_check_queue_event(void);
>  extern void machine_check_print_event_info(struct machine_check_event *evt,
> -bool user_mode);
> +bool user_mode, bool in_guest);
>  #ifdef CONFIG_PPC_BOOK3S_64
>  void flush_and_reload_slb(void);
>  #endif /* CONFIG_PPC_BOOK3S_64 */
> diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
> index bd933a7..d01b690 100644
> --- a/arch/powerpc/kernel/mce.c
> +++ b/arch/powerpc/kernel/mce.c
> @@ -301,13 +301,13 @@ static void machine_check_process_queued_event(struct 
> irq_work *work)
>   while (__this_cpu_read(mce_queue_count) > 0) {
>   index = __this_cpu_read(mce_queue_count) - 1;
>   evt = this_cpu_ptr(&mce_event_queue[index]);
> - machine_check_print_event_info(evt, false);
> + machine_check_print_event_info(evt, false, false);
>   __this_cpu_dec(mce_queue_count);
>   }
>  }
> 
>  void machine_check_print_event_info(struct machine_check_event *evt,
> - bool user_mode)
> + bool user_mode, bool in_guest)
>  {
>   const char *level, *sevstr, *subtype;
>   static const char *mc_ue_types[] = {
> @@ -387,7 +387,9 @@ void machine_check_print_event_info(struct 
> machine_check_event *evt,
>  evt->disposition == MCE_DISPOSITION_RECOVERED ?
>  "Recovered" : "Not recovered");
> 
> - if (user_mode) {
> + if (in_guest) {
> + printk("%s  Guest NIP: %016llx\n", evt->srr0);
> + } else if (user_mode) {
>   printk("%s  NIP: [%016llx] PID: %d Comm: %s\n", level,
>   evt->srr0, current->pid, current->comm);
>   } else {
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index d8bf05a..81cba4b 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -1216,7 +1216,7 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, 
> struct kvm_vcpu *vcpu,
>   break;
>   case BOOK3S_INTERRUPT_MACHINE_CHECK:
>   /* Print the MCE event to host console. */
> - machine_check_print_event_info(&vcpu->arch.mce_evt, false);
> + machine_check_print_event_info(&vcpu->arch.mce_evt, false, true);
> 
>   /*
>* If the guest can do FWNMI, exit to userspace so it can
> @@ -1406,7 +1406,7 @@ static int kvmppc_handle_nested_exit(struct kvm_run 
> *run, struct kvm_vcpu *vcpu)
>   /* Pass the machine check to the L1 guest */
>   r = RESUME_HOST;
>   /* Print the MCE event to host console. */
> - machine_check_print_event_info(&vcpu->arch.mce_evt, false);
> + machine_check_print_event_info(&vcpu->arch.mce_evt, false, true);
>   break;
>   /*
>* We get these next two if the guest accesses a page which it thinks
> diff --git a/arch/powerpc/platforms/powernv/opal.c 
> b/arch/powerpc/platforms/powernv/opal.c
> index 79586f1..05c85be 100644
> --- a/arch/powerpc/platforms/powernv/opal.c
> +++ b/arch/powerpc/platforms/powernv/opal.c
> @@ -587,7 +587,7 @@ int opal_machine_check(struct pt_regs *regs)
>  evt.version);
>   return 0;
>   }
> - machine_check_print_event_info(&evt, user_mode(regs));
> + machine_check_print_event_info(&evt, user_mode(regs), false);
> 
>   if (opal_recover_mce(regs, &evt))
>   return 1;
> 

-- 
Regards,
Aravinda



Re: [PATCH 1/2] KVM: PPC: Book3S HV: Simplify machine check handling

2019-02-20 Thread Aravinda Prasad
   /* Print the MCE event to host console. */
> + machine_check_print_event_info(&vcpu->arch.mce_evt, false);
> +
> + /*
> +  * If the guest can do FWNMI, exit to userspace so it can
> +  * deliver a FWNMI to the guest.
> +  * Otherwise we synthesize a machine check for the guest
> +  * so that it knows that the machine check occurred.
> +  */
> + if (!vcpu->kvm->arch.fwnmi_enabled) {
> + ulong flags = vcpu->arch.shregs.msr & 0x083c;
> + kvmppc_core_queue_machine_check(vcpu, flags);
> + r = RESUME_GUEST;
> + break;
> + }
> +
>   /* Exit to guest with KVM_EXIT_NMI as exit reason */
>   run->exit_reason = KVM_EXIT_NMI;
>   run->hw.hardware_exit_reason = vcpu->arch.trap;
> @@ -1227,8 +1243,6 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, 
> struct kvm_vcpu *vcpu,
>   run->flags |= KVM_RUN_PPC_NMI_DISP_NOT_RECOV;
> 
>   r = RESUME_HOST;
> - /* Print the MCE event to host console. */
> - machine_check_print_event_info(&vcpu->arch.mce_evt, false);
>   break;
>   case BOOK3S_INTERRUPT_PROGRAM:
>   {
> diff --git a/arch/powerpc/kvm/book3s_hv_ras.c 
> b/arch/powerpc/kvm/book3s_hv_ras.c
> index 0787f12..9aa10b1 100644
> --- a/arch/powerpc/kvm/book3s_hv_ras.c
> +++ b/arch/powerpc/kvm/book3s_hv_ras.c
> @@ -69,7 +69,7 @@ static void reload_slb(struct kvm_vcpu *vcpu)
>   *
>   * Returns: 0 => exit guest, 1 => deliver machine check to guest

You have to remove the above comment as the function now returns nothing.

Reviewed-by: Aravinda Prasad 

Regards,
Aravinda

>   */
> -static long kvmppc_realmode_mc_power7(struct kvm_vcpu *vcpu)
> +static void kvmppc_realmode_mc_power7(struct kvm_vcpu *vcpu)
>  {
>   unsigned long srr1 = vcpu->arch.shregs.msr;
>   struct machine_check_event mce_evt;
> @@ -111,52 +111,24 @@ static long kvmppc_realmode_mc_power7(struct kvm_vcpu 
> *vcpu)
>   }
> 
>   /*
> -  * See if we have already handled the condition in the linux host.
> -  * We assume that if the condition is recovered then linux host
> -  * will have generated an error log event that we will pick
> -  * up and log later.
> -  * Don't release mce event now. We will queue up the event so that
> -  * we can log the MCE event info on host console.
> +  * Now get the event and stash it in the vcpu struct so it can
> +  * be handled by the primary thread in virtual mode.  We can't
> +  * call machine_check_queue_event() here if we are running on
> +  * an offline secondary thread.
>*/
> - if (!get_mce_event(&mce_evt, MCE_EVENT_DONTRELEASE))
> - goto out;
> -
> - if (mce_evt.version == MCE_V1 &&
> - (mce_evt.severity == MCE_SEV_NO_ERROR ||
> -  mce_evt.disposition == MCE_DISPOSITION_RECOVERED))
> - handled = 1;
> -
> -out:
> - /*
> -  * For guest that supports FWNMI capability, hook the MCE event into
> -  * vcpu structure. We are going to exit the guest with KVM_EXIT_NMI
> -  * exit reason. On our way to exit we will pull this event from vcpu
> -  * structure and print it from thread 0 of the core/subcore.
> -  *
> -  * For guest that does not support FWNMI capability (old QEMU):
> -  * We are now going enter guest either through machine check
> -  * interrupt (for unhandled errors) or will continue from
> -  * current HSRR0 (for handled errors) in guest. Hence
> -  * queue up the event so that we can log it from host console later.
> -  */
> - if (vcpu->kvm->arch.fwnmi_enabled) {
> - /*
> -  * Hook up the mce event on to vcpu structure.
> -  * First clear the old event.
> -  */
> - memset(&vcpu->arch.mce_evt, 0, sizeof(vcpu->arch.mce_evt));
> - if (get_mce_event(&mce_evt, MCE_EVENT_RELEASE)) {
> - vcpu->arch.mce_evt = mce_evt;
> - }
> - } else
> - machine_check_queue_event();
> + if (get_mce_event(&mce_evt, MCE_EVENT_RELEASE)) {
> + if (handled && mce_evt.version == MCE_V1)
> + mce_evt.disposition = MCE_DISPOSITION_RECOVERED;
> + } else {
> + memset(&mce_evt, 0, sizeof(mce_evt));
> + }
> 
> - return handled;
> + vcpu->arch.mce_evt = mce_evt;
>  }
> 
> -long kvmppc_realmode_machine_check(struct kvm_vcpu *vcpu)
> +void kvmppc_r

Re: [PATCH] powerpc/pseries: Export maximum memory value

2018-10-22 Thread Aravinda Prasad



On Wednesday 10 October 2018 09:50 PM, Nathan Fontenot wrote:
> On 10/10/2018 05:22 AM, Aravinda Prasad wrote:
>> This patch exports the maximum possible amount of memory
>> configured on the system via /proc/powerpc/lparcfg.
>>
>> Signed-off-by: Aravinda Prasad 
>> ---
>>  arch/powerpc/platforms/pseries/lparcfg.c |   13 +
>>  1 file changed, 13 insertions(+)
>>
>> diff --git a/arch/powerpc/platforms/pseries/lparcfg.c 
>> b/arch/powerpc/platforms/pseries/lparcfg.c
>> index 7c872dc..aa82f55 100644
>> --- a/arch/powerpc/platforms/pseries/lparcfg.c
>> +++ b/arch/powerpc/platforms/pseries/lparcfg.c
>> @@ -26,6 +26,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  #include 
>>  #include 
>>  #include 
>> @@ -36,6 +37,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>
>>  #include "pseries.h"
>>
>> @@ -433,6 +435,16 @@ static void parse_em_data(struct seq_file *m)
>>  seq_printf(m, "power_mode_data=%016lx\n", retbuf[0]);
>>  }
>>
>> +static void maxmem_data(struct seq_file *m)
>> +{
>> +unsigned long maxmem = 0;
>> +
>> +maxmem += drmem_info->n_lmbs * drmem_info->lmb_size;
>> +maxmem += hugetlb_total_pages() * PAGE_SIZE;
>> +
>> +seq_printf(m, "MaxMem=%ld\n", maxmem);
> 
> Should this be MaxPossibleMem?
> 
> At least for the drmem memory the value calculated is the maximum possible
> memory. I wonder if calling it MaxMem would lead users to think they have
> that much memory available to them.

Nathan,

I feel MaxMem makes more sense because we use "Minimum memory",
"Desired memory" and "Maximum memory" in the LPAR profile configuration,
and MaxMem in fact displays the value of this "Maximum memory" profile
setting.

So I want to keep it as "MaxMem". Let me know if you think otherwise.

Regards,
Aravinda


> 
> -Nathan
> 
>> +}
>> +
>>  static int pseries_lparcfg_data(struct seq_file *m, void *v)
>>  {
>>  int partition_potential_processors;
>> @@ -491,6 +503,7 @@ static int pseries_lparcfg_data(struct seq_file *m, void 
>> *v)
>>  seq_printf(m, "slb_size=%d\n", mmu_slb_size);
>>  #endif
>>  parse_em_data(m);
>> +maxmem_data(m);
>>
>>  return 0;
>>  }
>>

-- 
Regards,
Aravinda



[PATCH v3] powerpc/pseries: Export raw per-CPU VPA data via debugfs

2018-10-16 Thread Aravinda Prasad
This patch exports the raw per-CPU VPA data via debugfs.
A per-CPU file is created which exports the VPA data of
that CPU to help debug some of the VPA related issues or
to analyze the per-CPU VPA related statistics.

v3: Removed offline CPU check.

v2: Included offline CPU check and other review comments.

Signed-off-by: Aravinda Prasad 
---
 arch/powerpc/platforms/pseries/lpar.c |   54 +
 1 file changed, 54 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/lpar.c 
b/arch/powerpc/platforms/pseries/lpar.c
index d3992ce..1c757a4 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -48,6 +48,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "pseries.h"
 
@@ -1028,3 +1029,56 @@ static int __init reserve_vrma_context_id(void)
return 0;
 }
 machine_device_initcall(pseries, reserve_vrma_context_id);
+
+#ifdef CONFIG_DEBUG_FS
+/* debugfs file interface for vpa data */
+static ssize_t vpa_file_read(struct file *filp, char __user *buf, size_t len,
+ loff_t *pos)
+{
+   int cpu = (long)filp->private_data;
+   struct lppaca *lppaca = &lppaca_of(cpu);
+
+   return simple_read_from_buffer(buf, len, pos, lppaca,
+   sizeof(struct lppaca));
+}
+
+static const struct file_operations vpa_fops = {
+   .open   = simple_open,
+   .read   = vpa_file_read,
+   .llseek = default_llseek,
+};
+
+static int __init vpa_debugfs_init(void)
+{
+   char name[16];
+   long i;
+   static struct dentry *vpa_dir;
+
+   if (!firmware_has_feature(FW_FEATURE_SPLPAR))
+   return 0;
+
+   vpa_dir = debugfs_create_dir("vpa", powerpc_debugfs_root);
+   if (!vpa_dir) {
+   pr_warn("%s: can't create vpa root dir\n", __func__);
+   return -ENOMEM;
+   }
+
+   /* set up the per-cpu vpa file */
+   for_each_possible_cpu(i) {
+   struct dentry *d;
+
+   sprintf(name, "cpu-%ld", i);
+
+   d = debugfs_create_file(name, 0400, vpa_dir, (void *)i,
+   &vpa_fops);
+   if (!d) {
+   pr_warn("%s: can't create per-cpu vpa file\n",
+   __func__);
+   return -ENOMEM;
+   }
+   }
+
+   return 0;
+}
+machine_arch_initcall(pseries, vpa_debugfs_init);
+#endif /* CONFIG_DEBUG_FS */
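
As a usage illustration (not part of the patch): once debugfs is mounted,
each per-CPU file can be read from user space and decoded offline. The
sketch below assumes the usual /sys/kernel/debug/powerpc location for
powerpc_debugfs_root and simply hex-dumps CPU 0's VPA:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
        /* Example path only; matches the "cpu-%ld" naming used above. */
        const char *path = "/sys/kernel/debug/powerpc/vpa/cpu-0";
        unsigned char buf[1024];   /* the file exposes sizeof(struct lppaca) bytes */
        FILE *f = fopen(path, "rb");
        size_t n, i;

        if (!f) {
                perror(path);
                return EXIT_FAILURE;
        }

        n = fread(buf, 1, sizeof(buf), f);
        fclose(f);

        /* Hex-dump the raw VPA contents for offline analysis. */
        for (i = 0; i < n; i++)
                printf("%02x%c", buf[i], (i % 16 == 15) ? '\n' : ' ');
        putchar('\n');

        return EXIT_SUCCESS;
}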



Re: [PATCH] powerpc/pseries: Export maximum memory value

2018-10-16 Thread Aravinda Prasad



On Wednesday 10 October 2018 10:19 PM, Naveen N. Rao wrote:
> Nathan Fontenot wrote:
>> On 10/10/2018 05:22 AM, Aravinda Prasad wrote:
>>> This patch exports the maximum possible amount of memory
>>> configured on the system via /proc/powerpc/lparcfg.
>>>
>>> Signed-off-by: Aravinda Prasad 
>>> ---
>>>  arch/powerpc/platforms/pseries/lparcfg.c |   13 +
>>>  1 file changed, 13 insertions(+)
>>>
>>> diff --git a/arch/powerpc/platforms/pseries/lparcfg.c
>>> b/arch/powerpc/platforms/pseries/lparcfg.c
>>> index 7c872dc..aa82f55 100644
>>> --- a/arch/powerpc/platforms/pseries/lparcfg.c
>>> +++ b/arch/powerpc/platforms/pseries/lparcfg.c
>>> @@ -26,6 +26,7 @@
>>>  #include 
>>>  #include 
>>>  #include 
>>> +#include 
>>>  #include 
>>>  #include 
>>>  #include 
>>> @@ -36,6 +37,7 @@
>>>  #include 
>>>  #include 
>>>  #include 
>>> +#include 
>>>
>>>  #include "pseries.h"
>>>
>>> @@ -433,6 +435,16 @@ static void parse_em_data(struct seq_file *m)
>>>  seq_printf(m, "power_mode_data=%016lx\n", retbuf[0]);
>>>  }
>>>
>>> +static void maxmem_data(struct seq_file *m)
>>> +{
>>> +    unsigned long maxmem = 0;
>>> +
>>> +    maxmem += drmem_info->n_lmbs * drmem_info->lmb_size;
>>> +    maxmem += hugetlb_total_pages() * PAGE_SIZE;
>>> +
>>> +    seq_printf(m, "MaxMem=%ld\n", maxmem);
>>
>> Should this be MaxPossibleMem?
>>
>> At least for the drmem memory the value calculated is the maximum
>> possible
>> memory. I wonder if calling it MaxMem would lead users to think they have
>> that much memory available to them.
> 
> That's a good point. This seems to be referred to as just 'maximum
> memory' in the LPAR configuration as well as in the lparstat
> documentation, but it shouldn't hurt to rename it here.

As Naveen mentioned, I am using the term referred to in the LPAR
configuration. But I can rename it.

Regards,
Aravinda

> 
> - Naveen
> 

-- 
Regards,
Aravinda



[PATCH] powerpc/pseries: Export maximum memory value

2018-10-10 Thread Aravinda Prasad
This patch exports the maximum possible amount of memory
configured on the system via /proc/powerpc/lparcfg.

Signed-off-by: Aravinda Prasad 
---
 arch/powerpc/platforms/pseries/lparcfg.c |   13 +
 1 file changed, 13 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/lparcfg.c 
b/arch/powerpc/platforms/pseries/lparcfg.c
index 7c872dc..aa82f55 100644
--- a/arch/powerpc/platforms/pseries/lparcfg.c
+++ b/arch/powerpc/platforms/pseries/lparcfg.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -36,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "pseries.h"
 
@@ -433,6 +435,16 @@ static void parse_em_data(struct seq_file *m)
seq_printf(m, "power_mode_data=%016lx\n", retbuf[0]);
 }
 
+static void maxmem_data(struct seq_file *m)
+{
+   unsigned long maxmem = 0;
+
+   maxmem += drmem_info->n_lmbs * drmem_info->lmb_size;
+   maxmem += hugetlb_total_pages() * PAGE_SIZE;
+
+   seq_printf(m, "MaxMem=%ld\n", maxmem);
+}
+
 static int pseries_lparcfg_data(struct seq_file *m, void *v)
 {
int partition_potential_processors;
@@ -491,6 +503,7 @@ static int pseries_lparcfg_data(struct seq_file *m, void *v)
seq_printf(m, "slb_size=%d\n", mmu_slb_size);
 #endif
parse_em_data(m);
+   maxmem_data(m);
 
return 0;
 }



[PATCH v2] powerpc/pseries: Export raw per-CPU VPA data via debugfs

2018-09-25 Thread Aravinda Prasad
This patch exports the raw per-CPU VPA data via debugfs.
A per-CPU file is created which exports the VPA data of
that CPU to help debug some of the VPA related issues or
to analyze the per-CPU VPA related statistics.

Signed-off-by: Aravinda Prasad 
---
 arch/powerpc/platforms/pseries/lpar.c |   57 +
 1 file changed, 57 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/lpar.c 
b/arch/powerpc/platforms/pseries/lpar.c
index d3992ce..8852ebc 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -48,6 +48,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "pseries.h"
 
@@ -1028,3 +1029,59 @@ static int __init reserve_vrma_context_id(void)
return 0;
 }
 machine_device_initcall(pseries, reserve_vrma_context_id);
+
+#ifdef CONFIG_DEBUG_FS
+/* debugfs file interface for vpa data */
+static ssize_t vpa_file_read(struct file *filp, char __user *buf, size_t len,
+ loff_t *pos)
+{
+   int cpu = (long)filp->private_data;
+   struct lppaca *lppaca = &lppaca_of(cpu);
+
+   if (!cpu_online(cpu))
+   return -EINVAL;
+
+   return simple_read_from_buffer(buf, len, pos, lppaca,
+   sizeof(struct lppaca));
+}
+
+static const struct file_operations vpa_fops = {
+   .open   = simple_open,
+   .read   = vpa_file_read,
+   .llseek = default_llseek,
+};
+
+static int __init vpa_debugfs_init(void)
+{
+   char name[16];
+   long i;
+   static struct dentry *vpa_dir;
+
+   if (!firmware_has_feature(FW_FEATURE_SPLPAR))
+   return 0;
+
+   vpa_dir = debugfs_create_dir("vpa", powerpc_debugfs_root);
+   if (!vpa_dir) {
+   pr_warn("%s: can't create vpa root dir\n", __func__);
+   return -ENOMEM;
+   }
+
+   /* set up the per-cpu vpa file */
+   for_each_possible_cpu(i) {
+   struct dentry *d;
+
+   sprintf(name, "cpu-%ld", i);
+
+   d = debugfs_create_file(name, 0400, vpa_dir, (void *)i,
+   &vpa_fops);
+   if (!d) {
+   pr_warn("%s: can't create per-cpu vpa file\n",
+   __func__);
+   return -ENOMEM;
+   }
+   }
+
+   return 0;
+}
+machine_arch_initcall(pseries, vpa_debugfs_init);
+#endif /* CONFIG_DEBUG_FS */



Re: [PATCH] powerpc/pseries: Export raw per-CPU VPA data via debugfs

2018-09-25 Thread Aravinda Prasad
Hi Michael,

On Monday 24 September 2018 05:12 PM, Michael Ellerman wrote:
> Hi Aravinda,
> 
> Aravinda Prasad  writes:
> 
>> This patch exports the raw per-CPU VPA data via debugfs.
>> A per-CPU file is created which exports the VPA data of
>> that CPU to help debug some of the VPA related issues or
>> to analyze the per-CPU VPA related statistics.
> 
> Do we really need this in debugfs? I'm not saying we don't, but I'm also
> not really clear why we need it.

Two reasons. First, if a new statistic from the VPA needs to be exported to
user space, we end up needing a kernel patch each time. By exporting the raw
VPA data, only the user-space tool needs to be updated.

Second, we need this if we want to debug any per-CPU statistics such as
stolen time.

> 
> If there is a good reason for exporting it, do we really want to export
> it in distro kernels, or should it be behind a CONFIG ?

I am fine with putting this behind a CONFIG option.

> 
>> diff --git a/arch/powerpc/platforms/pseries/lpar.c 
>> b/arch/powerpc/platforms/pseries/lpar.c
>> index d3992ce..cc12c12 100644
>> --- a/arch/powerpc/platforms/pseries/lpar.c
>> +++ b/arch/powerpc/platforms/pseries/lpar.c
>> @@ -48,6 +48,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  
>>  #include "pseries.h"
>>  
>> @@ -64,6 +65,16 @@ EXPORT_SYMBOL(plpar_hcall);
>>  EXPORT_SYMBOL(plpar_hcall9);
>>  EXPORT_SYMBOL(plpar_hcall_norets);
>>  
>> +#ifdef CONFIG_DEBUG_FS
>> +struct vpa {
>> +struct dentry   *file;
> 
> You never use this other than temporarily when creating the file.
> 
>> +int cpu;
>> +};
>> +static DEFINE_PER_CPU(struct vpa, cpu_vpa);
> 
> And you never really use the per_cpu() either.
> 
> So you don't need the vpa struct at all AFAICS.

I think so now. I will rework the patch.

> 
>> +
>> +static struct dentry *vpa_dir;
> 
> That can just be a local in vpa_debugfs_init().

ok.

> 
>> +#endif
>> +
>>  void vpa_init(int cpu)
>>  {
>>  int hwcpu = get_hard_smp_processor_id(cpu);
>> @@ -1028,3 +1039,77 @@ static int __init reserve_vrma_context_id(void)
>>  return 0;
>>  }
>>  machine_device_initcall(pseries, reserve_vrma_context_id);
>> +
>> +#ifdef CONFIG_DEBUG_FS
>> +/* debugfs file interface for vpa data */
>> +static ssize_t vpa_file_read(struct file *filp, char __user *buf, size_t 
>> len,
>> +loff_t *pos)
> 
> Like this please:
> 
> static ssize_t vpa_file_read(struct file *filp, char __user *buf, size_t len,
>loff_t *pos)
> 

noted

>> +{
>> +long int rc;
>> +struct vpa *vpa = filp->private_data;
>> +struct lppaca *lppaca = &lppaca_of(vpa->cpu);
> 
> Once you've stashed the CPU number in private_data (see below) you can
> get it back with:
> 
>   int cpu = (long)filp->private_data;
>   struct lppaca *lppaca = &lppaca_of(cpu);

sure

> 
>> +
>> +if (len < sizeof(struct lppaca))
>> +return -EINVAL;
>> +
>> +rc = copy_to_user(buf, lppaca, sizeof(struct lppaca));
>> +if (rc)
>> +return -EFAULT;
> 
> You should use simple_read_from_buffer().

ok

> 
>> +
>> +return 0;
>> +}
>> +
>> +static int vpa_file_open(struct inode *inode, struct file *filp)
>> +{
>> +struct vpa *vpa = inode->i_private;
>> +
>> +filp->private_data = vpa;
>> +return 0;
>> +}
> 
> You can just use simple_open().

ok

> 
>> +static int vpa_file_release(struct inode *inode, struct file *filp)
>> +{
>> +return 0;
>> +}
> 
> I don't think you need release if it's empty.

noted

> 
>> +static const struct file_operations vpa_fops = {
>> +.open   = vpa_file_open,
>> +.release= vpa_file_release,
>> +.read   = vpa_file_read,
>> +.llseek = no_llseek,
>> +};
>> +
>> +static int __init vpa_debugfs_init(void)
>> +{
>> +char name[10];
> 
> That's not big enough if you end up with 4 billion CPUs. Make it 16.

Will make it 16.

> 
>> +int i;
>> +
>> +if (!firmware_has_feature(FW_FEATURE_SPLPAR))
>> +return 0;
>> +
>> +vpa_dir = debugfs_create_dir("vpa", powerpc_debugfs_root);
>> +if (!vpa_dir) {
>> +pr_warn("%s: can't create vpa root dir\n", __func__);
>> +return -ENOMEM;
>> +}
>> +
>> +/* set up the per-cpu vpa file*/
>> +for_e

[PATCH] powerpc/pseries: Export raw per-CPU VPA data via debugfs

2018-09-20 Thread Aravinda Prasad
This patch exports the raw per-CPU VPA data via debugfs.
A per-CPU file is created which exports the VPA data of
that CPU to help debug some of the VPA related issues or
to analyze the per-CPU VPA related statistics.

Signed-off-by: Aravinda Prasad 
---
 arch/powerpc/platforms/pseries/lpar.c |   85 +
 1 file changed, 85 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/lpar.c 
b/arch/powerpc/platforms/pseries/lpar.c
index d3992ce..cc12c12 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -48,6 +48,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "pseries.h"
 
@@ -64,6 +65,16 @@ EXPORT_SYMBOL(plpar_hcall);
 EXPORT_SYMBOL(plpar_hcall9);
 EXPORT_SYMBOL(plpar_hcall_norets);
 
+#ifdef CONFIG_DEBUG_FS
+struct vpa {
+   struct dentry   *file;
+   int cpu;
+};
+static DEFINE_PER_CPU(struct vpa, cpu_vpa);
+
+static struct dentry *vpa_dir;
+#endif
+
 void vpa_init(int cpu)
 {
int hwcpu = get_hard_smp_processor_id(cpu);
@@ -1028,3 +1039,77 @@ static int __init reserve_vrma_context_id(void)
return 0;
 }
 machine_device_initcall(pseries, reserve_vrma_context_id);
+
+#ifdef CONFIG_DEBUG_FS
+/* debugfs file interface for vpa data */
+static ssize_t vpa_file_read(struct file *filp, char __user *buf, size_t len,
+   loff_t *pos)
+{
+   long int rc;
+   struct vpa *vpa = filp->private_data;
+   struct lppaca *lppaca = &lppaca_of(vpa->cpu);
+
+   if (len < sizeof(struct lppaca))
+   return -EINVAL;
+
+   rc = copy_to_user(buf, lppaca, sizeof(struct lppaca));
+   if (rc)
+   return -EFAULT;
+
+   return 0;
+}
+
+static int vpa_file_open(struct inode *inode, struct file *filp)
+{
+   struct vpa *vpa = inode->i_private;
+
+   filp->private_data = vpa;
+   return 0;
+}
+
+static int vpa_file_release(struct inode *inode, struct file *filp)
+{
+   return 0;
+}
+
+static const struct file_operations vpa_fops = {
+   .open   = vpa_file_open,
+   .release= vpa_file_release,
+   .read   = vpa_file_read,
+   .llseek = no_llseek,
+};
+
+static int __init vpa_debugfs_init(void)
+{
+   char name[10];
+   int i;
+
+   if (!firmware_has_feature(FW_FEATURE_SPLPAR))
+   return 0;
+
+   vpa_dir = debugfs_create_dir("vpa", powerpc_debugfs_root);
+   if (!vpa_dir) {
+   pr_warn("%s: can't create vpa root dir\n", __func__);
+   return -ENOMEM;
+   }
+
+   /* set up the per-cpu vpa file */
+   for_each_possible_cpu(i) {
+   struct vpa *vpa = &per_cpu(cpu_vpa, i);
+
+   vpa->cpu = i;
+   sprintf(name, "cpu-%d", i);
+
+   vpa->file = debugfs_create_file(name, 0400, vpa_dir, vpa,
+   &vpa_fops);
+   if (!vpa->file) {
+   pr_warn("%s: can't create per-cpu vpa file\n",
+   __func__);
+   return -ENOMEM;
+   }
+   }
+
+   return 0;
+}
+machine_arch_initcall(pseries, vpa_debugfs_init);
+#endif /* CONFIG_DEBUG_FS */



[PATCH] powerpc/pseries: Export VPA related data

2018-09-04 Thread Aravinda Prasad
This patch exports VPA related data such as stolen and
donated CPU cycles through /proc/powerpc/lparcfg file.

Signed-off-by: Aravinda Prasad 
---
 arch/powerpc/include/asm/lppaca.h|   10 ++-
 arch/powerpc/platforms/pseries/lparcfg.c |   45 ++
 2 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/lppaca.h 
b/arch/powerpc/include/asm/lppaca.h
index 7c23ce8..87e22f6 100644
--- a/arch/powerpc/include/asm/lppaca.h
+++ b/arch/powerpc/include/asm/lppaca.h
@@ -94,7 +94,15 @@ struct lppaca {
volatile __be32 dispersion_count; /* dispatch changed physical cpu */
volatile __be64 cmo_faults; /* CMO page fault count */
volatile __be64 cmo_fault_time; /* CMO page fault time */
-   u8  reserved10[104];
+   volatile __be64 idle_stolen_purr;
+   volatile __be64 idle_stolen_spurr;
+   volatile __be64 busy_stolen_purr;
+   volatile __be64 busy_stolen_spurr;
+   volatile __be64 idle_donated_purr;
+   volatile __be64 idle_donated_spurr;
+   volatile __be64 busy_donated_purr;
+   volatile __be64 busy_donated_spurr;
+   u8  reserved10[40];
 
/* cacheline 4-5 */
 
diff --git a/arch/powerpc/platforms/pseries/lparcfg.c 
b/arch/powerpc/platforms/pseries/lparcfg.c
index 7c872dc..b986551 100644
--- a/arch/powerpc/platforms/pseries/lparcfg.c
+++ b/arch/powerpc/platforms/pseries/lparcfg.c
@@ -433,6 +433,50 @@ static void parse_em_data(struct seq_file *m)
seq_printf(m, "power_mode_data=%016lx\n", retbuf[0]);
 }
 
+static void pseries_vpa_data(struct seq_file *m)
+{
+   int cpu;
+   unsigned long idle_stolen_purr = 0;
+   unsigned long idle_stolen_spurr = 0;
+   unsigned long busy_stolen_purr = 0;
+   unsigned long busy_stolen_spurr = 0;
+   unsigned long idle_donated_purr = 0;
+   unsigned long idle_donated_spurr = 0;
+   unsigned long busy_donated_purr = 0;
+   unsigned long busy_donated_spurr = 0;
+
+   if (!firmware_has_feature(FW_FEATURE_SPLPAR))
+   return;
+
+   for_each_possible_cpu(cpu) {
+   idle_stolen_purr +=
+   be64_to_cpu(lppaca_of(cpu).idle_stolen_purr);
+   idle_stolen_spurr +=
+   be64_to_cpu(lppaca_of(cpu).idle_stolen_spurr);
+   busy_stolen_purr +=
+   be64_to_cpu(lppaca_of(cpu).busy_stolen_purr);
+   busy_stolen_spurr +=
+   be64_to_cpu(lppaca_of(cpu).busy_stolen_spurr);
+   idle_donated_purr +=
+   be64_to_cpu(lppaca_of(cpu).idle_donated_purr);
+   idle_donated_spurr +=
+   be64_to_cpu(lppaca_of(cpu).idle_donated_spurr);
+   busy_donated_purr +=
+   be64_to_cpu(lppaca_of(cpu).busy_donated_purr);
+   busy_donated_spurr +=
+   be64_to_cpu(lppaca_of(cpu).busy_donated_spurr);
+   }
+
+   seq_printf(m, "idle_stolen_purr=%lu\n", idle_stolen_purr);
+   seq_printf(m, "idle_stolen_spurr=%lu\n", idle_stolen_spurr);
+   seq_printf(m, "busy_stolen_purr=%lu\n", busy_stolen_purr);
+   seq_printf(m, "busy_stolen_spurr=%lu\n", busy_stolen_spurr);
+   seq_printf(m, "idle_donated_purr=%lu\n", idle_donated_purr);
+   seq_printf(m, "idle_donated_spurr=%lu\n", idle_donated_spurr);
+   seq_printf(m, "busy_donated_purr=%lu\n", busy_donated_purr);
+   seq_printf(m, "busy_donated_spurr=%lu\n", busy_donated_spurr);
+}
+
 static int pseries_lparcfg_data(struct seq_file *m, void *v)
 {
int partition_potential_processors;
@@ -491,6 +535,7 @@ static int pseries_lparcfg_data(struct seq_file *m, void *v)
seq_printf(m, "slb_size=%d\n", mmu_slb_size);
 #endif
parse_em_data(m);
+   pseries_vpa_data(m);
 
return 0;
 }
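
As a rough illustration of how a user-space tool (an lparstat-like utility,
for example) might consume the new fields. The helper below is hypothetical
and not part of the patch:

#include <stdio.h>
#include <string.h>

/* Return the value of "name=<value>" from /proc/powerpc/lparcfg, or 0. */
static unsigned long lparcfg_value(const char *name)
{
        char line[256];
        unsigned long val = 0;
        size_t len = strlen(name);
        FILE *f = fopen("/proc/powerpc/lparcfg", "r");

        if (!f)
                return 0;
        while (fgets(line, sizeof(line), f)) {
                if (!strncmp(line, name, len) && line[len] == '=') {
                        sscanf(line + len + 1, "%lu", &val);
                        break;
                }
        }
        fclose(f);
        return val;
}

int main(void)
{
        /* A monitoring tool would sample this twice and use the delta. */
        printf("idle_stolen_purr=%lu\n", lparcfg_value("idle_stolen_purr"));
        return 0;
}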



[PATCH v5 2/2] KVM: PPC: Exit guest upon MCE when FWNMI capability is enabled

2017-01-13 Thread Aravinda Prasad
Enhance KVM to cause a guest exit with KVM_EXIT_NMI
exit reason upon a machine check exception (MCE) in
the guest address space if the KVM_CAP_PPC_FWNMI
capability is enabled (instead of delivering a 0x200
interrupt to guest). This enables QEMU to build error
log and deliver machine check exception to guest via
guest registered machine check handler.

This approach simplifies the delivery of machine
check exception to guest OS compared to the earlier
approach of KVM directly invoking 0x200 guest interrupt
vector.

This design/approach is based on the feedback for the
QEMU patches to handle machine check exception. Details
of earlier approach of handling machine check exception
in QEMU and related discussions can be found at:

https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg00813.html

Note:

This patch introduces a hook which is invoked at the time
of guest exit to facilitate the host-side handling of
machine check exception before the exception is passed
on to the guest. Hence, the host-side handling which was
performed earlier via machine_check_fwnmi is removed.

The reasons for this approach are: (i) it is not possible
to distinguish whether the exception occurred in the
guest or the host from the pt_regs passed on the
machine_check_exception(). Hence machine_check_exception()
calls panic, instead of passing on the exception to
the guest, if the machine check exception is not
recoverable. (ii) the approach introduced in this
patch gives opportunity to the host kernel to perform
actions in virtual mode before passing on the exception
to the guest. This approach does not require complex
tweaks to machine_check_fwnmi and friends.

Signed-off-by: Aravinda Prasad <aravi...@linux.vnet.ibm.com>
Reviewed-by: David Gibson <da...@gibson.dropbear.id.au>
---
 arch/powerpc/kvm/book3s_hv.c|   27 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |   47 ---
 arch/powerpc/platforms/powernv/opal.c   |   10 +++
 3 files changed, 54 insertions(+), 30 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 3686471..cae4921 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -123,6 +123,7 @@ MODULE_PARM_DESC(halt_poll_ns_shrink, "Factor halt poll 
time is shrunk by");
 
 static void kvmppc_end_cede(struct kvm_vcpu *vcpu);
 static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu);
+static void kvmppc_machine_check_hook(void);
 
 static inline struct kvm_vcpu *next_runnable_thread(struct kvmppc_vcore *vc,
int *ip)
@@ -954,15 +955,14 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, 
struct kvm_vcpu *vcpu,
r = RESUME_GUEST;
break;
case BOOK3S_INTERRUPT_MACHINE_CHECK:
+   /* Exit to guest with KVM_EXIT_NMI as exit reason */
+   run->exit_reason = KVM_EXIT_NMI;
+   r = RESUME_HOST;
/*
-* Deliver a machine check interrupt to the guest.
-* We have to do this, even if the host has handled the
-* machine check, because machine checks use SRR0/1 and
-* the interrupt might have trashed guest state in them.
+* Invoke host-kernel handler to perform any host-side
+* handling before exiting the guest.
 */
-   kvmppc_book3s_queue_irqprio(vcpu,
-   BOOK3S_INTERRUPT_MACHINE_CHECK);
-   r = RESUME_GUEST;
+   kvmppc_machine_check_hook();
break;
case BOOK3S_INTERRUPT_PROGRAM:
{
@@ -3491,6 +3491,19 @@ static void kvmppc_irq_bypass_del_producer_hv(struct 
irq_bypass_consumer *cons,
 }
 #endif
 
+/*
+ * Hook to handle machine check exceptions occurred inside a guest.
+ * This hook is invoked from host virtual mode from KVM before exiting
+ * the guest with KVM_EXIT_NMI exit reason. This gives an opportunity
+ * for the host to take action (if any) before passing on the machine
+ * check exception to the guest kernel.
+ */
+static void kvmppc_machine_check_hook(void)
+{
+   if (ppc_md.machine_check_exception)
+   ppc_md.machine_check_exception(NULL);
+}
+
 static long kvm_arch_vm_ioctl_hv(struct file *filp,
 unsigned int ioctl, unsigned long arg)
 {
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index c3c1d1b..9b41390 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -134,21 +134,18 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
stb r0, HSTATE_HWTHREAD_REQ(r13)
 
/*
-* For external and machine check interrupts, we need
-* to call the Linux handler to process the interrupt.
-* We do that by jumping to absolute address 0x500 for
-* external interrupts, or the 

[PATCH v5 1/2] KVM: PPC: Add new capability to control MCE behaviour

2017-01-13 Thread Aravinda Prasad
This patch introduces a new KVM capability to control
how KVM behaves on machine check exception (MCE).
Without this capability, KVM redirects machine check
exceptions to guest's 0x200 vector, if the address in
error belongs to the guest. With this capability KVM
causes a guest exit with NMI exit reason.

The new capability is required to avoid problems if
a new kernel/KVM is used with an old QEMU for guests
that don't issue "ibm,nmi-register". As old QEMU does
not understand the NMI exit type, it treats it as a
fatal error. However, the guest could have handled
the machine check error if the exception was delivered
to guest's 0x200 interrupt vector instead of NMI exit
in case of old QEMU.

Signed-off-by: Aravinda Prasad <aravi...@linux.vnet.ibm.com>
Reviewed-by: David Gibson <da...@gibson.dropbear.id.au>
---
 Documentation/virtual/kvm/api.txt   |   11 +++
 arch/powerpc/include/asm/kvm_host.h |1 +
 arch/powerpc/kernel/asm-offsets.c   |1 +
 arch/powerpc/kvm/powerpc.c  |7 +++
 include/uapi/linux/kvm.h|1 +
 5 files changed, 21 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 6bbceb9..748f7f7 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -3911,6 +3911,17 @@ to take care of that.
 This capability can be enabled dynamically even if VCPUs were already
 created and are running.
 
+7.9 KVM_CAP_PPC_FWNMI
+
+Architectures: ppc
+Parameters: none
+
+With this capability a machine check exception in the guest address
+space will cause KVM to exit the guest with NMI exit reason. This
+enables QEMU to build error log and branch to guest kernel registered
+machine check handling routine. Without this capability KVM will
+branch to guests' 0x200 interrupt vector.
+
 8. Other capabilities.
 --
 
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 28350a2..018c684 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -264,6 +264,7 @@ struct kvm_arch {
int hpt_cma_alloc;
struct dentry *debugfs_dir;
struct dentry *htab_dentry;
+   u8 fwnmi_enabled;
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
struct mutex hpt_mutex;
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index caec7bf..3acd503 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -494,6 +494,7 @@ int main(void)
DEFINE(KVM_NEED_FLUSH, offsetof(struct kvm, arch.need_tlb_flush.bits));
DEFINE(KVM_ENABLED_HCALLS, offsetof(struct kvm, arch.enabled_hcalls));
DEFINE(KVM_VRMA_SLB_V, offsetof(struct kvm, arch.vrma_slb_v));
+   DEFINE(KVM_FWNMI, offsetof(struct kvm, arch.fwnmi_enabled));
DEFINE(VCPU_DSISR, offsetof(struct kvm_vcpu, arch.shregs.dsisr));
DEFINE(VCPU_DAR, offsetof(struct kvm_vcpu, arch.shregs.dar));
DEFINE(VCPU_VPA, offsetof(struct kvm_vcpu, arch.vpa.pinned_addr));
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 70963c8..a4405a8 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -604,6 +604,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
r = cpu_has_feature(CPU_FTR_TM_COMP) &&
is_kvmppc_hv_enabled(kvm);
break;
+   case KVM_CAP_PPC_FWNMI:
+   r = 1;
+   break;
default:
r = 0;
break;
@@ -1204,6 +1207,10 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu 
*vcpu,
break;
}
 #endif /* CONFIG_KVM_XICS */
+   case KVM_CAP_PPC_FWNMI:
+   r = 0;
+   vcpu->kvm->arch.fwnmi_enabled = true;
+   break;
default:
r = -EINVAL;
break;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 4ee67cb..3e41c42 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -870,6 +870,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_S390_USER_INSTR0 130
 #define KVM_CAP_MSI_DEVID 131
 #define KVM_CAP_PPC_HTM 132
+#define KVM_CAP_PPC_FWNMI 133
 
 #ifdef KVM_CAP_IRQ_ROUTING
 



[PATCH v5 0/2] KVM: PPC: Add FWNMI support for KVM guests on POWER

2017-01-13 Thread Aravinda Prasad
This series of patches adds FWNMI support for KVM guests
on POWER.

Memory errors such as bit flips that cannot be corrected
by hardware are passed on to the kernel for handling
by raising a machine check exception (an NMI). Upon such
machine check exceptions, if the address in error belongs
to the guest, the error is passed on to the guest
kernel for handling. However, for guest kernels that
have issued "ibm,nmi-register" call, QEMU should build
an error log and pass on the error log to the guest-
kernel registered machine check handler routine.

This patch series adds the functionality to pass on the
machine check exception to the guest kernel by
giving control to QEMU. QEMU builds the error log
and invokes the guest-kernel registered handler.

QEMU part can be found at:
http://lists.nongnu.org/archive/html/qemu-ppc/2015-12/msg00199.html
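
To make the flow concrete, a user-space VMM would opt in to the capability
on each vcpu and then handle the new exit reason in its run loop. The sketch
below is only illustrative and is not taken from the QEMU patches;
KVM_CAP_PPC_FWNMI comes from patch 1/2, the rest is the generic KVM ioctl
interface:

#include <linux/kvm.h>
#include <string.h>
#include <sys/ioctl.h>

/* Illustrative only: enable FWNMI-style machine check exits on a vcpu. */
int enable_fwnmi(int vcpu_fd)
{
        struct kvm_enable_cap cap;

        memset(&cap, 0, sizeof(cap));
        cap.cap = KVM_CAP_PPC_FWNMI;    /* capability added by patch 1/2 */
        return ioctl(vcpu_fd, KVM_ENABLE_CAP, &cap);
}

/* In the vcpu run loop the machine check shows up as a KVM_EXIT_NMI exit. */
void handle_vcpu_exit(struct kvm_run *run)
{
        switch (run->exit_reason) {
        case KVM_EXIT_NMI:
                /*
                 * Machine check in the guest: the VMM builds the RTAS error
                 * log and delivers it to the handler the guest registered
                 * via "ibm,nmi-register".
                 */
                break;
        default:
                break;
        }
}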

Change Log v5:
  - Added capability documentation. No functionality/code change.

Change Log v4:
  - Allow host-side handling of the machine check exception before
passing on the exception to the guest.

Change Log v3:
  - Split the patch into 2. First patch introduces the
new capability while the second one enhances KVM to
redirect MCE.
  - Fix access width bug

Change Log v2:
  - Added KVM capability

---

Aravinda Prasad (2):
  KVM: PPC: Add new capability to control MCE behaviour
  KVM: PPC: Exit guest upon MCE when FWNMI capability is enabled


 Documentation/virtual/kvm/api.txt   |   11 +++
 arch/powerpc/include/asm/kvm_host.h |1 +
 arch/powerpc/kernel/asm-offsets.c   |1 +
 arch/powerpc/kvm/book3s_hv.c|   27 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |   47 ---
 arch/powerpc/kvm/powerpc.c  |7 +
 arch/powerpc/platforms/powernv/opal.c   |   10 +++
 include/uapi/linux/kvm.h|1 +
 8 files changed, 75 insertions(+), 30 deletions(-)

--
Aravinda Prasad



Re: [PATCH v4 1/2] KVM: PPC: Add new capability to control MCE behaviour

2017-01-12 Thread Aravinda Prasad


On Thursday 12 January 2017 03:26 PM, Paul Mackerras wrote:
> On Mon, Jan 09, 2017 at 05:10:35PM +0530, Aravinda Prasad wrote:
>> This patch introduces a new KVM capability to control
>> how KVM behaves on machine check exception (MCE).
>> Without this capability, KVM redirects machine check
>> exceptions to guest's 0x200 vector, if the address in
>> error belongs to the guest. With this capability KVM
>> causes a guest exit with NMI exit reason.
>>
>> The new capability is required to avoid problems if
>> a new kernel/KVM is used with an old QEMU for guests
>> that don't issue "ibm,nmi-register". As old QEMU does
>> not understand the NMI exit type, it treats it as a
>> fatal error. However, the guest could have handled
>> the machine check error if the exception was delivered
>> to guest's 0x200 interrupt vector instead of NMI exit
>> in case of old QEMU.
> 
> You need to add a description of the new capability to
> Documentation/virtual/kvm/api.txt.

sure.

> 
> Paul.
> 

-- 
Regards,
Aravinda



Re: [PATCH v4 2/2] KVM: PPC: Exit guest upon MCE when FWNMI capability is enabled

2017-01-12 Thread Aravinda Prasad


On Thursday 12 January 2017 02:35 PM, Balbir Singh wrote:
> On Mon, Jan 09, 2017 at 05:10:45PM +0530, Aravinda Prasad wrote:

[ . . .]


>> The reasons for this approach is (i) it is not possible
>> to distinguish whether the exception occurred in the
>> guest or the host from the pt_regs passed on the
>> machine_check_exception(). Hence machine_check_exception()
>> calls panic, instead of passing on the exception to
>> the guest, if the machine check exception is not
>> recoverable. (ii) the approach introduced in this
>> patch gives opportunity to the host kernel to perform
>> actions in virtual mode before passing on the exception
>> to the guest. This approach does not require complex
>> tweaks to machine_check_fwnmi and friends.
> 
> It would be good to qualify the different types of MCE
> and what action we expect across hypervisor and guest.

The hypervisor performs actions depending on the type of MCE (SLB
multihit, UEs, etc). If the hypervisor is unable to recover from the MCE
and if the address in error belongs to the guest, then this patch set
forwards the error to the guest kernel for handling.

The main goal of this patch set is to pass on the unrecoverable MCE
errors in the guest address space to the guest kernel, instead of
crashing the hypervisor. The action taken by the hypervisor and the
guest kernel upon MCE remains unchanged.

[ . . . ]

> 
> Shouldn't the host take action for example poison bad pages?
> 

We want to give the guest kernel a chance to recover the clean part of
the page before poisoning, since in the case of a UE only a few bytes of a
page are affected. Hence we don't immediately poison the bad pages in the
host.

It is expected that the guest kernel poisons the bad pages after
completing its recovery action. This prevents the guest from reusing the
bad page.

However, the missing part is communicating back to the host when the guest
is done with the recovery. This is mainly to prevent reuse of bad pages
by the host when the guest shuts down, reboots, crashes or migrates.

We are planning to address this part as a separate patch set.

Regards,
Aravinda

>>  if (opal_recover_mce(regs, &evt))
>>  return 1;
>>  
>>
> 
> Balbir Singh 
> 

-- 
Regards,
Aravinda



[PATCH v4 2/2] KVM: PPC: Exit guest upon MCE when FWNMI capability is enabled

2017-01-09 Thread Aravinda Prasad
Enhance KVM to cause a guest exit with KVM_EXIT_NMI
exit reason upon a machine check exception (MCE) in
the guest address space if the KVM_CAP_PPC_FWNMI
capability is enabled (instead of delivering a 0x200
interrupt to guest). This enables QEMU to build error
log and deliver machine check exception to guest via
guest registered machine check handler.

This approach simplifies the delivery of machine
check exception to guest OS compared to the earlier
approach of KVM directly invoking 0x200 guest interrupt
vector.

This design/approach is based on the feedback for the
QEMU patches to handle machine check exception. Details
of earlier approach of handling machine check exception
in QEMU and related discussions can be found at:

https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg00813.html

Note:

This patch introduces a hook which is invoked at the time
of guest exit to facilitate the host-side handling of
machine check exception before the exception is passed
on to the guest. Hence, the host-side handling which was
performed earlier via machine_check_fwnmi is removed.

The reasons for this approach are: (i) it is not possible
to distinguish whether the exception occurred in the
guest or the host from the pt_regs passed on the
machine_check_exception(). Hence machine_check_exception()
calls panic, instead of passing on the exception to
the guest, if the machine check exception is not
recoverable. (ii) the approach introduced in this
patch gives opportunity to the host kernel to perform
actions in virtual mode before passing on the exception
to the guest. This approach does not require complex
tweaks to machine_check_fwnmi and friends.

Signed-off-by: Aravinda Prasad <aravi...@linux.vnet.ibm.com>
---
 arch/powerpc/kvm/book3s_hv.c|   27 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |   47 ---
 arch/powerpc/platforms/powernv/opal.c   |   10 +++
 3 files changed, 54 insertions(+), 30 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 3686471..cae4921 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -123,6 +123,7 @@ MODULE_PARM_DESC(halt_poll_ns_shrink, "Factor halt poll 
time is shrunk by");
 
 static void kvmppc_end_cede(struct kvm_vcpu *vcpu);
 static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu);
+static void kvmppc_machine_check_hook(void);
 
 static inline struct kvm_vcpu *next_runnable_thread(struct kvmppc_vcore *vc,
int *ip)
@@ -954,15 +955,14 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, 
struct kvm_vcpu *vcpu,
r = RESUME_GUEST;
break;
case BOOK3S_INTERRUPT_MACHINE_CHECK:
+   /* Exit to guest with KVM_EXIT_NMI as exit reason */
+   run->exit_reason = KVM_EXIT_NMI;
+   r = RESUME_HOST;
/*
-* Deliver a machine check interrupt to the guest.
-* We have to do this, even if the host has handled the
-* machine check, because machine checks use SRR0/1 and
-* the interrupt might have trashed guest state in them.
+* Invoke host-kernel handler to perform any host-side
+* handling before exiting the guest.
 */
-   kvmppc_book3s_queue_irqprio(vcpu,
-   BOOK3S_INTERRUPT_MACHINE_CHECK);
-   r = RESUME_GUEST;
+   kvmppc_machine_check_hook();
break;
case BOOK3S_INTERRUPT_PROGRAM:
{
@@ -3491,6 +3491,19 @@ static void kvmppc_irq_bypass_del_producer_hv(struct 
irq_bypass_consumer *cons,
 }
 #endif
 
+/*
+ * Hook to handle machine check exceptions occurred inside a guest.
+ * This hook is invoked from host virtual mode from KVM before exiting
+ * the guest with KVM_EXIT_NMI exit reason. This gives an opportunity
+ * for the host to take action (if any) before passing on the machine
+ * check exception to the guest kernel.
+ */
+static void kvmppc_machine_check_hook(void)
+{
+   if (ppc_md.machine_check_exception)
+   ppc_md.machine_check_exception(NULL);
+}
+
 static long kvm_arch_vm_ioctl_hv(struct file *filp,
 unsigned int ioctl, unsigned long arg)
 {
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index c3c1d1b..9b41390 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -134,21 +134,18 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
stb r0, HSTATE_HWTHREAD_REQ(r13)
 
/*
-* For external and machine check interrupts, we need
-* to call the Linux handler to process the interrupt.
-* We do that by jumping to absolute address 0x500 for
-* external interrupts, or the machine_check_fwnmi label
-* for machine checks (since firmware

[PATCH v4 1/2] KVM: PPC: Add new capability to control MCE behaviour

2017-01-09 Thread Aravinda Prasad
This patch introduces a new KVM capability to control
how KVM behaves on machine check exception (MCE).
Without this capability, KVM redirects machine check
exceptions to guest's 0x200 vector, if the address in
error belongs to the guest. With this capability KVM
causes a guest exit with NMI exit reason.

The new capability is required to avoid problems if
a new kernel/KVM is used with an old QEMU for guests
that don't issue "ibm,nmi-register". As old QEMU does
not understand the NMI exit type, it treats it as a
fatal error. However, the guest could have handled
the machine check error if the exception was delivered
to guest's 0x200 interrupt vector instead of NMI exit
in case of old QEMU.

QEMU part can be found at:
http://lists.nongnu.org/archive/html/qemu-ppc/2015-12/msg00199.html

Change Log v4:
  - Allow host-side handling of the machine check exception before
passing on the exception to the guest.

Change Log v3:
  - Split the patch into 2. First patch introduces the
new capability while the second one enhances KVM to
redirect MCE.
  - Fix access width bug

Change Log v2:
  - Added KVM capability

Signed-off-by: Aravinda Prasad <aravi...@linux.vnet.ibm.com>
Reviewed-by: David Gibson <da...@gibson.dropbear.id.au>
---
 arch/powerpc/include/asm/kvm_host.h |1 +
 arch/powerpc/kernel/asm-offsets.c   |1 +
 arch/powerpc/kvm/powerpc.c  |7 +++
 include/uapi/linux/kvm.h|1 +
 4 files changed, 10 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 28350a2..018c684 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -264,6 +264,7 @@ struct kvm_arch {
int hpt_cma_alloc;
struct dentry *debugfs_dir;
struct dentry *htab_dentry;
+   u8 fwnmi_enabled;
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
struct mutex hpt_mutex;
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index caec7bf..3acd503 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -494,6 +494,7 @@ int main(void)
DEFINE(KVM_NEED_FLUSH, offsetof(struct kvm, arch.need_tlb_flush.bits));
DEFINE(KVM_ENABLED_HCALLS, offsetof(struct kvm, arch.enabled_hcalls));
DEFINE(KVM_VRMA_SLB_V, offsetof(struct kvm, arch.vrma_slb_v));
+   DEFINE(KVM_FWNMI, offsetof(struct kvm, arch.fwnmi_enabled));
DEFINE(VCPU_DSISR, offsetof(struct kvm_vcpu, arch.shregs.dsisr));
DEFINE(VCPU_DAR, offsetof(struct kvm_vcpu, arch.shregs.dar));
DEFINE(VCPU_VPA, offsetof(struct kvm_vcpu, arch.vpa.pinned_addr));
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 70963c8..a4405a8 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -604,6 +604,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
r = cpu_has_feature(CPU_FTR_TM_COMP) &&
is_kvmppc_hv_enabled(kvm);
break;
+   case KVM_CAP_PPC_FWNMI:
+   r = 1;
+   break;
default:
r = 0;
break;
@@ -1204,6 +1207,10 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu 
*vcpu,
break;
}
 #endif /* CONFIG_KVM_XICS */
+   case KVM_CAP_PPC_FWNMI:
+   r = 0;
+   vcpu->kvm->arch.fwnmi_enabled = true;
+   break;
default:
r = -EINVAL;
break;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 4ee67cb..3e41c42 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -870,6 +870,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_S390_USER_INSTR0 130
 #define KVM_CAP_MSI_DEVID 131
 #define KVM_CAP_PPC_HTM 132
+#define KVM_CAP_PPC_FWNMI 133
 
 #ifdef KVM_CAP_IRQ_ROUTING
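
For reference, a minimal userspace sketch of probing and enabling the capability added above. The fd handling is illustrative only (kvm_fd is assumed to be an open /dev/kvm or VM fd and vcpu_fd an already-created vCPU fd); the real QEMU changes are in the patches linked in the changelog.

    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /* Sketch only: kvm_fd and vcpu_fd are assumed to exist already. */
    static int enable_fwnmi(int kvm_fd, int vcpu_fd)
    {
            struct kvm_enable_cap cap;

            /* KVM_CHECK_EXTENSION returns > 0 when the capability exists. */
            if (ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_PPC_FWNMI) <= 0)
                    return -1;      /* old kernel: keep 0x200 delivery */

            memset(&cap, 0, sizeof(cap));
            cap.cap = KVM_CAP_PPC_FWNMI;

            /* Handled by kvm_vcpu_ioctl_enable_cap(), which sets fwnmi_enabled. */
            return ioctl(vcpu_fd, KVM_ENABLE_CAP, &cap);
    }

A VMM that finds the capability absent, or chooses not to enable it, keeps the existing behaviour of machine checks being reflected to the guest's 0x200 vector.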
 



Re: [PATCH v3 2/2] KVM: PPC: Exit guest upon MCE when FWNMI capability is enabled

2016-06-21 Thread Aravinda Prasad


On Monday 20 June 2016 10:48 AM, Paul Mackerras wrote:
> Hi Aravinda,
> 
> On Wed, Jan 13, 2016 at 12:38:09PM +0530, Aravinda Prasad wrote:
>> Enhance KVM to cause a guest exit with KVM_EXIT_NMI
>> exit reasons upon a machine check exception (MCE) in
>> the guest address space if the KVM_CAP_PPC_FWNMI
>> capability is enabled (instead of delivering 0x200
>> interrupt to guest). This enables QEMU to build error
>> log and deliver machine check exception to guest via
>> guest registered machine check handler.
>>
>> This approach simplifies the delivering of machine
>> check exception to guest OS compared to the earlier
>> approach of KVM directly invoking 0x200 guest interrupt
>> vector. In the earlier approach QEMU was enhanced to
>> patch the 0x200 interrupt vector during boot. The
>> patched code at 0x200 issued a private hcall to pass
>> the control to QEMU to build the error log.
>>
>> This design/approach is based on the feedback for the
>> QEMU patches to handle machine check exception. Details
>> of earlier approach of handling machine check exception
>> in QEMU and related discussions can be found at:
>>
>> https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg00813.html
>>
>> Signed-off-by: Aravinda Prasad <aravi...@linux.vnet.ibm.com>
> 
> Are you in the process of doing a new version of this patch with the
> requested changes?

Yes, I am working (intermittently) on the new version, but I have not been
able to finish and post it yet. I will complete it and post the new version.

Regards,
Aravinda

> 
> Paul.
> 

-- 
Regards,
Aravinda


Re: [PATCH v3 2/2] KVM: PPC: Exit guest upon MCE when FWNMI capability is enabled

2016-01-25 Thread Aravinda Prasad


On Sunday 24 January 2016 02:54 AM, Paul Mackerras wrote:
> On Sat, Jan 23, 2016 at 06:23:35PM +0530, Aravinda Prasad wrote:
>>
>>
>> On Saturday 23 January 2016 03:58 PM, Paul Mackerras wrote:
>>> On Wed, Jan 13, 2016 at 12:38:09PM +0530, Aravinda Prasad wrote:
>>>> Enhance KVM to cause a guest exit with KVM_EXIT_NMI
>>>> exit reasons upon a machine check exception (MCE) in
>>>> the guest address space if the KVM_CAP_PPC_FWNMI
>>>> capability is enabled (instead of delivering 0x200
>>>> interrupt to guest). This enables QEMU to build error
>>>> log and deliver machine check exception to guest via
>>>> guest registered machine check handler.
>>>>
>>>> This approach simplifies the delivering of machine
>>>> check exception to guest OS compared to the earlier
>>>> approach of KVM directly invoking 0x200 guest interrupt
>>>> vector. In the earlier approach QEMU was enhanced to
>>>> patch the 0x200 interrupt vector during boot. The
>>>> patched code at 0x200 issued a private hcall to pass
>>>> the control to QEMU to build the error log.
>>>>
>>>> This design/approach is based on the feedback for the
>>>> QEMU patches to handle machine check exception. Details
>>>> of earlier approach of handling machine check exception
>>>> in QEMU and related discussions can be found at:
>>>
>>> [snip]
>>>
>>>> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>>>> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>>>> @@ -133,21 +133,18 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>>>>stb r0, HSTATE_HWTHREAD_REQ(r13)
>>>>  
>>>>/*
>>>> -   * For external and machine check interrupts, we need
>>>> -   * to call the Linux handler to process the interrupt.
>>>> -   * We do that by jumping to absolute address 0x500 for
>>>> -   * external interrupts, or the machine_check_fwnmi label
>>>> -   * for machine checks (since firmware might have patched
>>>> -   * the vector area at 0x200).  The [h]rfid at the end of the
>>>> -   * handler will return to the book3s_hv_interrupts.S code.
>>>> -   * For other interrupts we do the rfid to get back
>>>> -   * to the book3s_hv_interrupts.S code here.
>>>> +   * For external interrupts we need to call the Linux
>>>> +   * handler to process the interrupt. We do that by jumping
>>>> +   * to absolute address 0x500 for external interrupts.
>>>> +   * The [h]rfid at the end of the handler will return to
>>>> +   * the book3s_hv_interrupts.S code. For other interrupts
>>>> +   * we do the rfid to get back to the book3s_hv_interrupts.S
>>>> +   * code here.
>>>> */
>>>>ld  r8, 112+PPC_LR_STKOFF(r1)
>>>>addir1, r1, 112
>>>>ld  r7, HSTATE_HOST_MSR(r13)
>>>>  
>>>> -  cmpwi   cr1, r12, BOOK3S_INTERRUPT_MACHINE_CHECK
>>>>cmpwi   r12, BOOK3S_INTERRUPT_EXTERNAL
>>>>beq 11f
>>>>cmpwi   r12, BOOK3S_INTERRUPT_H_DOORBELL
>>>> @@ -162,7 +159,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>>>>mtmsrd  r6, 1   /* Clear RI in MSR */
>>>>mtsrr0  r8
>>>>mtsrr1  r7
>>>> -  beq cr1, 13f/* machine check */
>>>>RFI
>>>>  
>>>>/* On POWER7, we have external interrupts set to use HSRR0/1 */
>>>> @@ -170,8 +166,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>>>>mtspr   SPRN_HSRR1, r7
>>>>ba  0x500
>>>>  
>>>> -13:   b   machine_check_fwnmi
>>>> -
>>>
>>> So, what you're disabling here is the host-side handling of the
>>> machine check after completing the guest->host switch.  This has
>>> nothing to do with how the machine check gets communicated to the
>>> guest.
>>>
>>> Now, part of the host-side machine check handling has already
>>> happened, but I thought there was more that was done in host kernel
>>> virtual mode.  If this change really is needed then I would want an
>>> ack from Mahesh that this is correct, and it will need to be explained
>>> in detail in the patch description.
>>
>> If we don't do that we will end up running into
>> panic() in opal_machine_check() if UE belonged to guest.
>>
>> Details in this link:
>

Re: [PATCH v3 1/2] KVM: PPC: New capability to control MCE behaviour

2016-01-23 Thread Aravinda Prasad


On Saturday 23 January 2016 03:50 PM, Paul Mackerras wrote:
> On Wed, Jan 13, 2016 at 12:37:59PM +0530, Aravinda Prasad wrote:
>> This patch introduces a new KVM capability to control
>> how KVM behaves on machine check exception (MCE).
>> Without this capability, KVM redirects machine check
>> exceptions to guest's 0x200 vector if the address in
>> error belongs to the guest. With this capability KVM
>> causes a guest exit with NMI exit reason.
>>
>> This is required to avoid problems if a new kernel/KVM
>> is used with an old QEMU for guests that don't issue
>> "ibm,nmi-register". As old QEMU does not understand the
>> NMI exit type, it treats it as a fatal error. However,
>> the guest could have handled the machine check error
>> if the exception was delivered to guest's 0x200 interrupt
>> vector instead of NMI exit in case of old QEMU.
> 
> [snip]
> 
>> @@ -1132,6 +1135,10 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu 
>> *vcpu,
>>  break;
>>  }
>>  #endif /* CONFIG_KVM_XICS */
>> +case KVM_CAP_PPC_FWNMI:
>> +r = 0;
>> +vcpu->kvm->arch.fwnmi_enabled = true;
>> +break;
> 
> Might we ever want to set this flag back to false after setting it to
> true?  If so perhaps we should do vcpu->kvm->arch.fwnmi_enabled =
> !!cap->args[0].  However, I admit I can't actually think of a
> situation where we would need to reset it. :)

I am not able to think of any situation where resetting would be required either.

Regards,
Aravinda

> 
> Paul.
> 

-- 
Regards,
Aravinda
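
For reference, the variant Paul describes above would look roughly like this (a sketch only; the posted patch keeps the unconditional assignment to true):

	case KVM_CAP_PPC_FWNMI:
		r = 0;
		/* a non-zero args[0] enables FWNMI behaviour, zero would reset it */
		vcpu->kvm->arch.fwnmi_enabled = !!cap->args[0];
		break;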


Re: [PATCH v3 2/2] KVM: PPC: Exit guest upon MCE when FWNMI capability is enabled

2016-01-23 Thread Aravinda Prasad


On Saturday 23 January 2016 03:58 PM, Paul Mackerras wrote:
> On Wed, Jan 13, 2016 at 12:38:09PM +0530, Aravinda Prasad wrote:
>> Enhance KVM to cause a guest exit with KVM_EXIT_NMI
>> exit reasons upon a machine check exception (MCE) in
>> the guest address space if the KVM_CAP_PPC_FWNMI
>> capability is enabled (instead of delivering 0x200
>> interrupt to guest). This enables QEMU to build error
>> log and deliver machine check exception to guest via
>> guest registered machine check handler.
>>
>> This approach simplifies the delivering of machine
>> check exception to guest OS compared to the earlier
>> approach of KVM directly invoking 0x200 guest interrupt
>> vector. In the earlier approach QEMU was enhanced to
>> patch the 0x200 interrupt vector during boot. The
>> patched code at 0x200 issued a private hcall to pass
>> the control to QEMU to build the error log.
>>
>> This design/approach is based on the feedback for the
>> QEMU patches to handle machine check exception. Details
>> of earlier approach of handling machine check exception
>> in QEMU and related discussions can be found at:
> 
> [snip]
> 
>> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>> @@ -133,21 +133,18 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>>  stb r0, HSTATE_HWTHREAD_REQ(r13)
>>  
>>  /*
>> - * For external and machine check interrupts, we need
>> - * to call the Linux handler to process the interrupt.
>> - * We do that by jumping to absolute address 0x500 for
>> - * external interrupts, or the machine_check_fwnmi label
>> - * for machine checks (since firmware might have patched
>> - * the vector area at 0x200).  The [h]rfid at the end of the
>> - * handler will return to the book3s_hv_interrupts.S code.
>> - * For other interrupts we do the rfid to get back
>> - * to the book3s_hv_interrupts.S code here.
>> + * For external interrupts we need to call the Linux
>> + * handler to process the interrupt. We do that by jumping
>> + * to absolute address 0x500 for external interrupts.
>> + * The [h]rfid at the end of the handler will return to
>> + * the book3s_hv_interrupts.S code. For other interrupts
>> + * we do the rfid to get back to the book3s_hv_interrupts.S
>> + * code here.
>>   */
>>  ld  r8, 112+PPC_LR_STKOFF(r1)
>>  addir1, r1, 112
>>  ld  r7, HSTATE_HOST_MSR(r13)
>>  
>> -cmpwi   cr1, r12, BOOK3S_INTERRUPT_MACHINE_CHECK
>>  cmpwi   r12, BOOK3S_INTERRUPT_EXTERNAL
>>  beq 11f
>>  cmpwi   r12, BOOK3S_INTERRUPT_H_DOORBELL
>> @@ -162,7 +159,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>>  mtmsrd  r6, 1   /* Clear RI in MSR */
>>  mtsrr0  r8
>>  mtsrr1  r7
>> -beq cr1, 13f/* machine check */
>>  RFI
>>  
>>  /* On POWER7, we have external interrupts set to use HSRR0/1 */
>> @@ -170,8 +166,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>>  mtspr   SPRN_HSRR1, r7
>>  ba  0x500
>>  
>> -13: b   machine_check_fwnmi
>> -
> 
> So, what you're disabling here is the host-side handling of the
> machine check after completing the guest->host switch.  This has
> nothing to do with how the machine check gets communicated to the
> guest.
> 
> Now, part of the host-side machine check handling has already
> happened, but I thought there was more that was done in host kernel
> virtual mode.  If this change really is needed then I would want an
> ack from Mahesh that this is correct, and it will need to be explained
> in detail in the patch description.

If we don't do that, we will end up running into
panic() in opal_machine_check() if the UE belonged to the guest.

Details in this link:
http://marc.info/?l=kvm-ppc&m=144730552720044&w=2


> 
>>  14: mtspr   SPRN_HSRR0, r8
>>  mtspr   SPRN_HSRR1, r7
>>  b   hmi_exception_after_realmode
>> @@ -2390,15 +2384,13 @@ machine_check_realmode:
>>  ld  r9, HSTATE_KVM_VCPU(r13)
>>  li  r12, BOOK3S_INTERRUPT_MACHINE_CHECK
>>  /*
>> - * Deliver unhandled/fatal (e.g. UE) MCE errors to guest through
>> - * machine check interrupt (set HSRR0 to 0x200). And for handled
>> - * errors (no-fatal), just go back to guest execution with current
>> - * HSRR0 instead of exiting guest. This new approach will inject
>> - * machine check to guest for fatal error causing guest to crash.
>> - *
>> - 

[PATCH v3 2/2] KVM: PPC: Exit guest upon MCE when FWNMI capability is enabled

2016-01-12 Thread Aravinda Prasad
Enhance KVM to cause a guest exit with KVM_EXIT_NMI
exit reasons upon a machine check exception (MCE) in
the guest address space if the KVM_CAP_PPC_FWNMI
capability is enabled (instead of delivering 0x200
interrupt to guest). This enables QEMU to build error
log and deliver machine check exception to guest via
guest registered machine check handler.

This approach simplifies the delivering of machine
check exception to guest OS compared to the earlier
approach of KVM directly invoking 0x200 guest interrupt
vector. In the earlier approach QEMU was enhanced to
patch the 0x200 interrupt vector during boot. The
patched code at 0x200 issued a private hcall to pass
the control to QEMU to build the error log.

This design/approach is based on the feedback for the
QEMU patches to handle machine check exception. Details
of earlier approach of handling machine check exception
in QEMU and related discussions can be found at:

https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg00813.html

Signed-off-by: Aravinda Prasad <aravi...@linux.vnet.ibm.com>
---
 arch/powerpc/kvm/book3s_hv.c|   12 ++--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |   48 +++
 2 files changed, 26 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index a7352b5..4fa03d0 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -858,15 +858,9 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, 
struct kvm_vcpu *vcpu,
r = RESUME_GUEST;
break;
case BOOK3S_INTERRUPT_MACHINE_CHECK:
-   /*
-* Deliver a machine check interrupt to the guest.
-* We have to do this, even if the host has handled the
-* machine check, because machine checks use SRR0/1 and
-* the interrupt might have trashed guest state in them.
-*/
-   kvmppc_book3s_queue_irqprio(vcpu,
-   BOOK3S_INTERRUPT_MACHINE_CHECK);
-   r = RESUME_GUEST;
+   /* Exit to guest with KVM_EXIT_NMI as exit reason */
+   run->exit_reason = KVM_EXIT_NMI;
+   r = RESUME_HOST;
break;
case BOOK3S_INTERRUPT_PROGRAM:
{
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 3c6badc..84e32a3 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -133,21 +133,18 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
stb r0, HSTATE_HWTHREAD_REQ(r13)
 
/*
-* For external and machine check interrupts, we need
-* to call the Linux handler to process the interrupt.
-* We do that by jumping to absolute address 0x500 for
-* external interrupts, or the machine_check_fwnmi label
-* for machine checks (since firmware might have patched
-* the vector area at 0x200).  The [h]rfid at the end of the
-* handler will return to the book3s_hv_interrupts.S code.
-* For other interrupts we do the rfid to get back
-* to the book3s_hv_interrupts.S code here.
+* For external interrupts we need to call the Linux
+* handler to process the interrupt. We do that by jumping
+* to absolute address 0x500 for external interrupts.
+* The [h]rfid at the end of the handler will return to
+* the book3s_hv_interrupts.S code. For other interrupts
+* we do the rfid to get back to the book3s_hv_interrupts.S
+* code here.
 */
ld  r8, 112+PPC_LR_STKOFF(r1)
        addi    r1, r1, 112
ld  r7, HSTATE_HOST_MSR(r13)
 
-   cmpwi   cr1, r12, BOOK3S_INTERRUPT_MACHINE_CHECK
cmpwi   r12, BOOK3S_INTERRUPT_EXTERNAL
beq 11f
cmpwi   r12, BOOK3S_INTERRUPT_H_DOORBELL
@@ -162,7 +159,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
mtmsrd  r6, 1   /* Clear RI in MSR */
mtsrr0  r8
mtsrr1  r7
-   beq cr1, 13f/* machine check */
RFI
 
/* On POWER7, we have external interrupts set to use HSRR0/1 */
@@ -170,8 +166,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
mtspr   SPRN_HSRR1, r7
ba  0x500
 
-13:    b   machine_check_fwnmi
-
 14:    mtspr   SPRN_HSRR0, r8
mtspr   SPRN_HSRR1, r7
b   hmi_exception_after_realmode
@@ -2390,15 +2384,13 @@ machine_check_realmode:
ld  r9, HSTATE_KVM_VCPU(r13)
li  r12, BOOK3S_INTERRUPT_MACHINE_CHECK
/*
-* Deliver unhandled/fatal (e.g. UE) MCE errors to guest through
-* machine check interrupt (set HSRR0 to 0x200). And for handled
-* errors (no-fatal), just go back to guest execution with current
-* HSRR0 instead of exiting guest. This new approach will inject
-* machine check

[PATCH v3 1/2] KVM: PPC: New capability to control MCE behaviour

2016-01-12 Thread Aravinda Prasad
This patch introduces a new KVM capability to control
how KVM behaves on machine check exception (MCE).
Without this capability, KVM redirects machine check
exceptions to guest's 0x200 vector if the address in
error belongs to the guest. With this capability KVM
causes a guest exit with NMI exit reason.

This is required to avoid problems if a new kernel/KVM
is used with an old QEMU for guests that don't issue
"ibm,nmi-register". As old QEMU does not understand the
NMI exit type, it treats it as a fatal error. However,
the guest could have handled the machine check error
if the exception was delivered to guest's 0x200 interrupt
vector instead of NMI exit in case of old QEMU.

QEMU part can be found at:
http://lists.nongnu.org/archive/html/qemu-ppc/2015-12/msg00199.html

Change Log v3:
  - Split the patch into 2. First patch introduces the
new capability while the second one enhances KVM to
redirect MCE.
  - Fix access width bug
  - Rebased to v4.4-rc7

Change Log v2:
  - Added KVM capability

Signed-off-by: Aravinda Prasad <aravi...@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/kvm_host.h |1 +
 arch/powerpc/kernel/asm-offsets.c   |1 +
 arch/powerpc/kvm/powerpc.c  |7 +++
 include/uapi/linux/kvm.h|1 +
 4 files changed, 10 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index cfa758c..9ac2b84 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -243,6 +243,7 @@ struct kvm_arch {
int hpt_cma_alloc;
struct dentry *debugfs_dir;
struct dentry *htab_dentry;
+   u8 fwnmi_enabled;
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
struct mutex hpt_mutex;
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 221d584..6a4e81a 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -506,6 +506,7 @@ int main(void)
DEFINE(KVM_ENABLED_HCALLS, offsetof(struct kvm, arch.enabled_hcalls));
DEFINE(KVM_LPCR, offsetof(struct kvm, arch.lpcr));
DEFINE(KVM_VRMA_SLB_V, offsetof(struct kvm, arch.vrma_slb_v));
+   DEFINE(KVM_FWNMI, offsetof(struct kvm, arch.fwnmi_enabled));
DEFINE(VCPU_DSISR, offsetof(struct kvm_vcpu, arch.shregs.dsisr));
DEFINE(VCPU_DAR, offsetof(struct kvm_vcpu, arch.shregs.dar));
DEFINE(VCPU_VPA, offsetof(struct kvm_vcpu, arch.vpa.pinned_addr));
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 6fd2405..a8399b5 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -570,6 +570,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
r = 1;
break;
 #endif
+   case KVM_CAP_PPC_FWNMI:
+   r = 1;
+   break;
default:
r = 0;
break;
@@ -1132,6 +1135,10 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu 
*vcpu,
break;
}
 #endif /* CONFIG_KVM_XICS */
+   case KVM_CAP_PPC_FWNMI:
+   r = 0;
+   vcpu->kvm->arch.fwnmi_enabled = true;
+   break;
default:
r = -EINVAL;
break;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 03f3618..d8a07b5 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -831,6 +831,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_GUEST_DEBUG_HW_WPS 120
 #define KVM_CAP_SPLIT_IRQCHIP 121
 #define KVM_CAP_IOEVENTFD_ANY_LENGTH 122
+#define KVM_CAP_PPC_FWNMI 123
 
 #ifdef KVM_CAP_IRQ_ROUTING
 


Re: [PATCH v2] KVM: PPC: Exit guest upon fatal machine check exception

2015-12-16 Thread Aravinda Prasad


On Thursday 17 December 2015 08:02 AM, David Gibson wrote:
> On Wed, Dec 16, 2015 at 11:26:12AM +0530, Aravinda Prasad wrote:
>> This patch modifies KVM to cause a guest exit with
>> KVM_EXIT_NMI instead of immediately delivering a 0x200
>> interrupt to guest upon machine check exception in
>> guest address. Exiting the guest enables QEMU to build
>> error log and deliver machine check exception to guest
>> OS (either via guest OS registered machine check
>> handler or via 0x200 guest OS interrupt vector).
>>
>> This approach simplifies the delivering of machine
>> check exception to guest OS compared to the earlier approach
>> of KVM directly invoking 0x200 guest interrupt vector.
>> In the earlier approach QEMU patched the 0x200 interrupt
>> vector during boot. The patched code at 0x200 issued a
>> private hcall to pass the control to QEMU to build the
>> error log.
>>
>> This design/approach is based on the feedback for the
>> QEMU patches to handle machine check exception. Details
>> of earlier approach of handling machine check exception
>> in QEMU and related discussions can be found at:
>>
>> https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg00813.html
>>
>> This patch also introduces a new KVM capability to
>> control how KVM behaves on machine check exception.
>> Without this capability, KVM redirects machine check
>> exceptions to guest's 0x200 vector if the address in
>> error belongs to guest. With this capability KVM
>> causes a guest exit with NMI exit reason.
>>
>> This is required to avoid problems if a new kernel/KVM
>> is used with an old QEMU for guests that don't issue
>> "ibm,nmi-register". As old QEMU does not understand the
>> NMI exit type, it treats it as a fatal error. However,
>> the guest could have handled the machine check error
>> if the exception was delivered to guest's 0x200 interrupt
>> vector instead of NMI exit in case of old QEMU.
>>
>> Change Log v2:
>>   - Added KVM capability
>>
>> Signed-off-by: Aravinda Prasad <aravi...@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/include/asm/kvm_host.h |1 +
>>  arch/powerpc/kernel/asm-offsets.c   |1 +
>>  arch/powerpc/kvm/book3s_hv.c|   12 +++---
>>  arch/powerpc/kvm/book3s_hv_rmhandlers.S |   37 
>> +++
>>  arch/powerpc/kvm/powerpc.c  |7 ++
>>  include/uapi/linux/kvm.h|1 +
>>  6 files changed, 31 insertions(+), 28 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/kvm_host.h 
>> b/arch/powerpc/include/asm/kvm_host.h
>> index 827a38d..8a652ba 100644
>> --- a/arch/powerpc/include/asm/kvm_host.h
>> +++ b/arch/powerpc/include/asm/kvm_host.h
>> @@ -243,6 +243,7 @@ struct kvm_arch {
>>  int hpt_cma_alloc;
>>  struct dentry *debugfs_dir;
>>  struct dentry *htab_dentry;
>> +u8 fwnmi_enabled;
>>  #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
>>  #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
>>  struct mutex hpt_mutex;
>> diff --git a/arch/powerpc/kernel/asm-offsets.c 
>> b/arch/powerpc/kernel/asm-offsets.c
>> index 221d584..6a4e81a 100644
>> --- a/arch/powerpc/kernel/asm-offsets.c
>> +++ b/arch/powerpc/kernel/asm-offsets.c
>> @@ -506,6 +506,7 @@ int main(void)
>>  DEFINE(KVM_ENABLED_HCALLS, offsetof(struct kvm, arch.enabled_hcalls));
>>  DEFINE(KVM_LPCR, offsetof(struct kvm, arch.lpcr));
>>  DEFINE(KVM_VRMA_SLB_V, offsetof(struct kvm, arch.vrma_slb_v));
>> +DEFINE(KVM_FWNMI, offsetof(struct kvm, arch.fwnmi_enabled));
>>  DEFINE(VCPU_DSISR, offsetof(struct kvm_vcpu, arch.shregs.dsisr));
>>  DEFINE(VCPU_DAR, offsetof(struct kvm_vcpu, arch.shregs.dar));
>>  DEFINE(VCPU_VPA, offsetof(struct kvm_vcpu, arch.vpa.pinned_addr));
>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>> index 2280497..1b1dff0 100644
>> --- a/arch/powerpc/kvm/book3s_hv.c
>> +++ b/arch/powerpc/kvm/book3s_hv.c
>> @@ -859,15 +859,9 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, 
>> struct kvm_vcpu *vcpu,
>>  r = RESUME_GUEST;
>>  break;
>>  case BOOK3S_INTERRUPT_MACHINE_CHECK:
>> -/*
>> - * Deliver a machine check interrupt to the guest.
>> - * We have to do this, even if the host has handled the
>> - * machine check, because machine checks use SRR0/1 and
>> - * the interrupt might have trashed guest state in them.
>> - */

Re: [PATCH v2] KVM: PPC: Exit guest upon fatal machine check exception

2015-12-16 Thread Aravinda Prasad


On Wednesday 16 December 2015 03:10 PM, Thomas Huth wrote:
> On 16/12/15 06:56, Aravinda Prasad wrote:
>> This patch modifies KVM to cause a guest exit with
>> KVM_EXIT_NMI instead of immediately delivering a 0x200
>> interrupt to guest upon machine check exception in
>> guest address. Exiting the guest enables QEMU to build
>> error log and deliver machine check exception to guest
>> OS (either via guest OS registered machine check
>> handler or via 0x200 guest OS interrupt vector).
>>
>> This approach simplifies the delivering of machine
>> check exception to guest OS compared to the earlier approach
>> of KVM directly invoking 0x200 guest interrupt vector.
>> In the earlier approach QEMU patched the 0x200 interrupt
>> vector during boot. The patched code at 0x200 issued a
>> private hcall to pass the control to QEMU to build the
>> error log.
>>
>> This design/approach is based on the feedback for the
>> QEMU patches to handle machine check exception. Details
>> of earlier approach of handling machine check exception
>> in QEMU and related discussions can be found at:
>>
>> https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg00813.html
>>
>> This patch also introduces a new KVM capability to
>> control how KVM behaves on machine check exception.
>> Without this capability, KVM redirects machine check
>> exceptions to guest's 0x200 vector if the address in
>> error belongs to guest. With this capability KVM
>> causes a guest exit with NMI exit reason.
>>
>> This is required to avoid problems if a new kernel/KVM
>> is used with an old QEMU for guests that don't issue
>> "ibm,nmi-register". As old QEMU does not understand the
>> NMI exit type, it treats it as a fatal error. However,
>> the guest could have handled the machine check error
>> if the exception was delivered to guest's 0x200 interrupt
>> vector instead of NMI exit in case of old QEMU.
>>
>> Change Log v2:
>>   - Added KVM capability
>>
>> Signed-off-by: Aravinda Prasad <aravi...@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/include/asm/kvm_host.h |1 +
>>  arch/powerpc/kernel/asm-offsets.c   |1 +
>>  arch/powerpc/kvm/book3s_hv.c|   12 +++---
>>  arch/powerpc/kvm/book3s_hv_rmhandlers.S |   37 
>> +++
>>  arch/powerpc/kvm/powerpc.c  |7 ++
>>  include/uapi/linux/kvm.h|1 +
>>  6 files changed, 31 insertions(+), 28 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/kvm_host.h 
>> b/arch/powerpc/include/asm/kvm_host.h
>> index 827a38d..8a652ba 100644
>> --- a/arch/powerpc/include/asm/kvm_host.h
>> +++ b/arch/powerpc/include/asm/kvm_host.h
>> @@ -243,6 +243,7 @@ struct kvm_arch {
>>  int hpt_cma_alloc;
>>  struct dentry *debugfs_dir;
>>  struct dentry *htab_dentry;
>> +u8 fwnmi_enabled;
> 
> Here you declare the variable as 8-bits ...
> 
>>  #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
>>  #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
>>  struct mutex hpt_mutex;
>> diff --git a/arch/powerpc/kernel/asm-offsets.c 
>> b/arch/powerpc/kernel/asm-offsets.c
>> index 221d584..6a4e81a 100644
>> --- a/arch/powerpc/kernel/asm-offsets.c
>> +++ b/arch/powerpc/kernel/asm-offsets.c
>> @@ -506,6 +506,7 @@ int main(void)
>>  DEFINE(KVM_ENABLED_HCALLS, offsetof(struct kvm, arch.enabled_hcalls));
>>  DEFINE(KVM_LPCR, offsetof(struct kvm, arch.lpcr));
>>  DEFINE(KVM_VRMA_SLB_V, offsetof(struct kvm, arch.vrma_slb_v));
>> +DEFINE(KVM_FWNMI, offsetof(struct kvm, arch.fwnmi_enabled));
> 
> ... then define an asm-offset for it ...
> 
>>  DEFINE(VCPU_DSISR, offsetof(struct kvm_vcpu, arch.shregs.dsisr));
>>  DEFINE(VCPU_DAR, offsetof(struct kvm_vcpu, arch.shregs.dar));
>>  DEFINE(VCPU_VPA, offsetof(struct kvm_vcpu, arch.vpa.pinned_addr));
>> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
>> b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>> index b98889e..f43c124 100644
>> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> [...]
>> @@ -2381,24 +2377,27 @@ machine_check_realmode:
>>  ld  r9, HSTATE_KVM_VCPU(r13)
>>  li  r12, BOOK3S_INTERRUPT_MACHINE_CHECK
>>  /*
>> - * Deliver unhandled/fatal (e.g. UE) MCE errors to guest through
>> - * machine check interrupt (set HSRR0 to 0x200). And for handled
>> - * errors (no-fatal), just go back to guest execution with current
>> - *

[PATCH v2] KVM: PPC: Exit guest upon fatal machine check exception

2015-12-15 Thread Aravinda Prasad
This patch modifies KVM to cause a guest exit with
KVM_EXIT_NMI instead of immediately delivering a 0x200
interrupt to guest upon machine check exception in
guest address. Exiting the guest enables QEMU to build
error log and deliver machine check exception to guest
OS (either via guest OS registered machine check
handler or via 0x200 guest OS interrupt vector).

This approach simplifies the delivering of machine
check exception to guest OS compared to the earlier approach
of KVM directly invoking 0x200 guest interrupt vector.
In the earlier approach QEMU patched the 0x200 interrupt
vector during boot. The patched code at 0x200 issued a
private hcall to pass the control to QEMU to build the
error log.

This design/approach is based on the feedback for the
QEMU patches to handle machine check exception. Details
of earlier approach of handling machine check exception
in QEMU and related discussions can be found at:

https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg00813.html

This patch also introduces a new KVM capability to
control how KVM behaves on machine check exception.
Without this capability, KVM redirects machine check
exceptions to guest's 0x200 vector if the address in
error belongs to guest. With this capability KVM
causes a guest exit with NMI exit reason.

This is required to avoid problems if a new kernel/KVM
is used with an old QEMU for guests that don't issue
"ibm,nmi-register". As old QEMU does not understand the
NMI exit type, it treats it as a fatal error. However,
the guest could have handled the machine check error
if the exception was delivered to guest's 0x200 interrupt
vector instead of NMI exit in case of old QEMU.

Change Log v2:
  - Added KVM capability

Signed-off-by: Aravinda Prasad <aravi...@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/kvm_host.h |1 +
 arch/powerpc/kernel/asm-offsets.c   |1 +
 arch/powerpc/kvm/book3s_hv.c|   12 +++---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |   37 +++
 arch/powerpc/kvm/powerpc.c  |7 ++
 include/uapi/linux/kvm.h|1 +
 6 files changed, 31 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 827a38d..8a652ba 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -243,6 +243,7 @@ struct kvm_arch {
int hpt_cma_alloc;
struct dentry *debugfs_dir;
struct dentry *htab_dentry;
+   u8 fwnmi_enabled;
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
struct mutex hpt_mutex;
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 221d584..6a4e81a 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -506,6 +506,7 @@ int main(void)
DEFINE(KVM_ENABLED_HCALLS, offsetof(struct kvm, arch.enabled_hcalls));
DEFINE(KVM_LPCR, offsetof(struct kvm, arch.lpcr));
DEFINE(KVM_VRMA_SLB_V, offsetof(struct kvm, arch.vrma_slb_v));
+   DEFINE(KVM_FWNMI, offsetof(struct kvm, arch.fwnmi_enabled));
DEFINE(VCPU_DSISR, offsetof(struct kvm_vcpu, arch.shregs.dsisr));
DEFINE(VCPU_DAR, offsetof(struct kvm_vcpu, arch.shregs.dar));
DEFINE(VCPU_VPA, offsetof(struct kvm_vcpu, arch.vpa.pinned_addr));
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 2280497..1b1dff0 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -859,15 +859,9 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, 
struct kvm_vcpu *vcpu,
r = RESUME_GUEST;
break;
case BOOK3S_INTERRUPT_MACHINE_CHECK:
-   /*
-* Deliver a machine check interrupt to the guest.
-* We have to do this, even if the host has handled the
-* machine check, because machine checks use SRR0/1 and
-* the interrupt might have trashed guest state in them.
-*/
-   kvmppc_book3s_queue_irqprio(vcpu,
-   BOOK3S_INTERRUPT_MACHINE_CHECK);
-   r = RESUME_GUEST;
+   /* Exit to guest with KVM_EXIT_NMI as exit reason */
+   run->exit_reason = KVM_EXIT_NMI;
+   r = RESUME_HOST;
break;
case BOOK3S_INTERRUPT_PROGRAM:
{
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index b98889e..f43c124 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -147,7 +147,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
        addi    r1, r1, 112
ld  r7, HSTATE_HOST_MSR(r13)
 
-   cmpwi   cr1, r12, BOOK3S_INTERRUPT_MACHINE_CHECK
cmpwi   r12, BOOK3S_INTERRUPT_EXTERNAL
beq 11f
cmpwi   cr2, r12, B

Re: [PATCH] KVM: PPC: Exit guest upon fatal machine check exception

2015-11-13 Thread Aravinda Prasad


On Friday 13 November 2015 01:08 PM, Thomas Huth wrote:
> On 13/11/15 07:26, Aravinda Prasad wrote:
>>
>> On Friday 13 November 2015 07:20 AM, David Gibson wrote:
>>> On Thu, Nov 12, 2015 at 11:22:29PM +0530, Aravinda Prasad wrote:
> [...]
>>>> So thinking whether qemu should explicitly enable the new NMI
>>>> behavior.
>>>
>>> So, I think the reasoning above tends towards having qemu control the
>>> MC behaviour.  If qemu does nothing, MCs are delivered direct to
>>> 0x200, if it enables the new handling, they cause a KVM exit and qemu
>>> will deliver the MC.
>>
>> This essentially requires qemu to control how KVM behaves as KVM does
>> the actual redirection of MC either to guest's 0x200 vector or to exit
>> guest. So, if we are running new qemu, then KVM should exit guest and if
>> we are running old qemu, KVM should redirect MC to 0x200. Is there any
>> way to communicate this to KVM? ioctl?
> 
> Simply introduce a KVM capability that can be enabled by userspace.
> See kvm_vcpu_ioctl_enable_cap() in arch/powerpc/kvm/powerpc.c.

Thanks. I will look at it.

Regards,
Aravinda

> 
>  Thomas
> 

-- 
Regards,
Aravinda


Re: [PATCH] KVM: PPC: Exit guest upon fatal machine check exception

2015-11-12 Thread Aravinda Prasad


On Friday 13 November 2015 07:20 AM, David Gibson wrote:
> On Thu, Nov 12, 2015 at 11:22:29PM +0530, Aravinda Prasad wrote:

[...]

>>
>> I overlooked it. I think I need to take into consideration whether guest
>> issued "ibm, nmi-register". If the guest has issued "ibm, nmi-register"
>> then we should not jump to 0x200 upon UE. With the old kernel and old
>> QEMU this is broken as we always jump to 0x200.
>>
>> However, if the guest has not issued "ibm, nmi-register" then upon UE we
>> should jump to 0x200. If new kernel is used with old QEMU this
>> functionality breaks as it causes guest to terminate with unhandled NMI
>> exit.
>>
>> So thinking whether qemu should explicitly enable the new NMI
>> behavior.
> 
> So, I think the reasoning above tends towards having qemu control the
> MC behaviour.  If qemu does nothing, MCs are delivered direct to
> 0x200, if it enables the new handling, they cause a KVM exit and qemu
> will deliver the MC.

This essentially requires qemu to control how KVM behaves, as KVM does
the actual redirection of the MC either to the guest's 0x200 vector or out
to a guest exit. So, if we are running new qemu, KVM should exit the guest,
and if we are running old qemu, KVM should redirect the MC to 0x200. Is
there any way to communicate this to KVM? An ioctl?

However, this should not be a problem (in terms of breaking existing
functionality) with old KVM as it always redirects MC to 0x200
irrespective of old or new qemu.

> 
> Then I'd expect qemu to switch on the new-style handling from
> ibm,nmi-register.
> 
>>

-- 
Regards,
Aravinda


Re: [PATCH] KVM: PPC: Exit guest upon fatal machine check exception

2015-11-12 Thread Aravinda Prasad


On Friday 13 November 2015 03:07 AM, Daniel Axtens wrote:
> Aravinda Prasad <aravi...@linux.vnet.ibm.com> writes:
> 
>>> Yeah, it would be good not to break this.
>>
>> I am not familiar with CAPI. Does this affect CAPI?
> 
> When a CAPI card experiences an EEH event, any cache lines it holds are
> filled with SUEs (Special UEs, interpreted by the kernel the same as
> regular UEs). When these are read, we get an MCE. Currently CAPI does
> not support virtualisation, but that is actively being worked
> on. There's a _very_ good chance it will then be backported to various
> distros, which could have old qemu.
> 
> Therefore, I'd ideally like to make sure UEs in KVM guests work properly
> and continue to do so in all combinations of kernel and qemu. I'm
> following up the email from Mahesh that you linked: I'm not sure I quite
> follow his logic so I'll try to make sense of that and then get back to
> you.

sure. Thanks.

Regards,
Aravinda

> 
> Regards,
> Daniel
> 

-- 
Regards,
Aravinda


Re: [PATCH] KVM: PPC: Exit guest upon fatal machine check exception

2015-11-12 Thread Aravinda Prasad


On Thursday 12 November 2015 10:13 AM, David Gibson wrote:
> On Thu, Nov 12, 2015 at 10:02:10AM +0530, Aravinda Prasad wrote:
>>
>>
>> On Thursday 12 November 2015 09:08 AM, David Gibson wrote:
>>> On Thu, Nov 12, 2015 at 01:24:19PM +1100, Daniel Axtens wrote:
>>>> Aravinda Prasad <aravi...@linux.vnet.ibm.com> writes:
>>>>
>>>>> This patch modifies KVM to cause a guest exit with
>>>>> KVM_EXIT_NMI instead of immediately delivering a 0x200
>>>>> interrupt to guest upon machine check exception in
>>>>> guest address. Exiting the guest enables QEMU to build
>>>>> error log and deliver machine check exception to guest
>>>>> OS (either via guest OS registered machine check
>>>>> handler or via 0x200 guest OS interrupt vector).
>>>>>
>>>>> This approach simplifies the delivering of machine
>>>>> check exception to guest OS compared to the earlier approach
>>>>> of KVM directly invoking 0x200 guest interrupt vector.
>>>>> In the earlier approach QEMU patched the 0x200 interrupt
>>>>> vector during boot. The patched code at 0x200 issued a
>>>>> private hcall to pass the control to QEMU to build the
>>>>> error log.
>>>>>
>>>>> This design/approach is based on the feedback for the
>>>>> QEMU patches to handle machine check exception. Details
>>>>> of earlier approach of handling machine check exception
>>>>> in QEMU and related discussions can be found at:
>>>>>
>>>>> https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg00813.html
>>>>
>>>> I've poked at the MCE code, but not the KVM MCE code, so I may be
>>>> mistaken here, but I'm not clear on how this handles errors that the
>>>> guest can recover without terminating.
>>>>
>>>> For example, a Linux guest can handle a UE in guest userspace by killing
>>>> the guest process. A hypthetical non-linux guest with a microkernel
>>>> could even survive UEs in drivers.
>>>>
>>>> It sounds from your patch like you're changing this behaviour. Is this
>>>> right?
>>>
>>> So, IIUC.  Once the qemu pieces are in place as well it shouldn't
>>> change this behaviour: KVM will exit to qemu, qemu will log the error
>>> information (new), then reinject the MC to the guest which can still
>>> handle it as you describe above.
>>
>> Yes. With KVM and QEMU both in place this will not change the behavior.
>> QEMU will inject the UE to guest and the guest handles the UE based on
>> where it occurred. For example if an UE happens in a guest process
>> address space, that process will be killed.
>>
>>>
>>> But, there could be a problem if you have a new kernel with an old
>>> qemu, in that case qemu might not understand the new exit type and
>>> treat it as a fatal error, even though the guest could actually cope
>>> with it.
>>
>> In case of new kernel and old QEMU, the guest terminates as old QEMU
>> does not understand the NMI exit reason. However, this is the case with
>> old kernel and old QEMU as they do not handle UE belonging to guest. The
>> difference is that the guest kernel terminates with different error
>> code.
> 
> Ok.. assuming the guest has code to handle the UE in 0x200, why would
> the guest terminate with old kernel and old qemu?  I haven't quite
> followed the logic.

I overlooked it. I think I need to take into consideration whether the guest
issued "ibm,nmi-register". If the guest has issued "ibm,nmi-register", then
we should not jump to 0x200 upon a UE. With the old kernel and old QEMU
this is broken, as we always jump to 0x200.

However, if the guest has not issued "ibm,nmi-register", then upon a UE we
should jump to 0x200. If a new kernel is used with an old QEMU, this
functionality breaks, as it causes the guest to terminate with an unhandled
NMI exit.

So I am thinking about whether qemu should explicitly enable the new NMI behavior.
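
For context, "issuing ibm,nmi-register" here refers to the guest kernel's FWNMI registration at boot. A sketch of that registration, loosely based on the pseries fwnmi_init() in the guest kernel (the exact address handling differs across kernel versions, so treat the arguments as assumptions):

    static void __init fwnmi_init(void)
    {
            int token = rtas_token("ibm,nmi-register");

            if (token == RTAS_UNKNOWN_SERVICE)
                    return;

            /* Hand firmware the system-reset and machine-check entry points;
             * the address computation is simplified in this sketch. */
            if (rtas_call(token, 2, 1, NULL,
                          __pa(system_reset_fwnmi),
                          __pa(machine_check_fwnmi)) == 0)
                    fwnmi_active = 1;
    }

Whether this call was made is what decides if the guest expects FWNMI-style delivery or plain 0x200 delivery.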

Regards,
Aravinda

> 
>>
>> old kernel and old QEMU -> guest panics [1] irrespective of where UE
>>happened in guest address space.
>> old kernel and new QEMU -> guest panics. same as above.
>> new kernel and old QEMU -> guest terminates with unhandled NMI error
>>irrespective of where UE happened in guest
>> new kernel and new QEMU -> guest handles UEs in process address space
>>by killing the pr

Re: [PATCH] KVM: PPC: Exit guest upon fatal machine check exception

2015-11-12 Thread Aravinda Prasad


On Thursday 12 November 2015 10:28 AM, Daniel Axtens wrote:
> 
>> So, IIUC.  Once the qemu pieces are in place as well it shouldn't
>> change this behaviour: KVM will exit to qemu, qemu will log the error
>> information (new), then reinject the MC to the guest which can still
>> handle it as you describe above.
> 
> Ah, that makes *much* more sense now! Thanks for the explanation: I
> don't really follow qemu development.
> 
>>
>> But, there could be a problem if you have a new kernel with an old
>> qemu, in that case qemu might not understand the new exit type and
>> treat it as a fatal error, even though the guest could actually cope
>> with it.
>>
>> Aravinda, do we need to change this so that qemu has to explicitly
>> enable the new NMI behaviour?  Or have I missed something that will
>> make that case work already.
> 
> Yeah, it would be good not to break this.

I am not familiar with CAPI. Does this affect CAPI?

Regards,
Aravinda

> 
> Regards,
> Daniel
> 
> 
>> -- 
>> David Gibson | I'll have my music baroque, and my code
>> david AT gibson.dropbear.id.au   | minimalist, thank you.  NOT _the_ 
>> _other_
>>  | _way_ _around_!
>> http://www.ozlabs.org/~dgibson

-- 
Regards,
Aravinda


[PATCH] KVM: PPC: Exit guest upon fatal machine check exception

2015-11-11 Thread Aravinda Prasad
This patch modifies KVM to cause a guest exit with
KVM_EXIT_NMI instead of immediately delivering a 0x200
interrupt to guest upon machine check exception in
guest address. Exiting the guest enables QEMU to build
error log and deliver machine check exception to guest
OS (either via guest OS registered machine check
handler or via 0x200 guest OS interrupt vector).

This approach simplifies the delivering of machine
check exception to guest OS compared to the earlier approach
of KVM directly invoking 0x200 guest interrupt vector.
In the earlier approach QEMU patched the 0x200 interrupt
vector during boot. The patched code at 0x200 issued a
private hcall to pass the control to QEMU to build the
error log.

This design/approach is based on the feedback for the
QEMU patches to handle machine check exception. Details
of earlier approach of handling machine check exception
in QEMU and related discussions can be found at:

https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg00813.html

Signed-off-by: Aravinda Prasad <aravi...@linux.vnet.ibm.com>
---
 arch/powerpc/kvm/book3s_hv.c|   12 +++-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |   31 +++
 2 files changed, 14 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 2280497..1b1dff0 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -859,15 +859,9 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, 
struct kvm_vcpu *vcpu,
r = RESUME_GUEST;
break;
case BOOK3S_INTERRUPT_MACHINE_CHECK:
-   /*
-* Deliver a machine check interrupt to the guest.
-* We have to do this, even if the host has handled the
-* machine check, because machine checks use SRR0/1 and
-* the interrupt might have trashed guest state in them.
-*/
-   kvmppc_book3s_queue_irqprio(vcpu,
-   BOOK3S_INTERRUPT_MACHINE_CHECK);
-   r = RESUME_GUEST;
+   /* Exit to guest with KVM_EXIT_NMI as exit reason */
+   run->exit_reason = KVM_EXIT_NMI;
+   r = RESUME_HOST;
break;
case BOOK3S_INTERRUPT_PROGRAM:
{
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index b98889e..672b4f6 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -147,7 +147,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
        addi    r1, r1, 112
ld  r7, HSTATE_HOST_MSR(r13)
 
-   cmpwi   cr1, r12, BOOK3S_INTERRUPT_MACHINE_CHECK
cmpwi   r12, BOOK3S_INTERRUPT_EXTERNAL
beq 11f
cmpwi   cr2, r12, BOOK3S_INTERRUPT_HMI
@@ -160,7 +159,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
mtmsrd  r6, 1   /* Clear RI in MSR */
mtsrr0  r8
mtsrr1  r7
-   beq cr1, 13f/* machine check */
RFI
 
/* On POWER7, we have external interrupts set to use HSRR0/1 */
@@ -168,8 +166,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
mtspr   SPRN_HSRR1, r7
ba  0x500
 
-13:    b   machine_check_fwnmi
-
 14:    mtspr   SPRN_HSRR0, r8
mtspr   SPRN_HSRR1, r7
b   hmi_exception_after_realmode
@@ -2381,24 +2377,19 @@ machine_check_realmode:
ld  r9, HSTATE_KVM_VCPU(r13)
li  r12, BOOK3S_INTERRUPT_MACHINE_CHECK
/*
-* Deliver unhandled/fatal (e.g. UE) MCE errors to guest through
-* machine check interrupt (set HSRR0 to 0x200). And for handled
-* errors (no-fatal), just go back to guest execution with current
-* HSRR0 instead of exiting guest. This new approach will inject
-* machine check to guest for fatal error causing guest to crash.
-*
-* The old code used to return to host for unhandled errors which
-* was causing guest to hang with soft lockups inside guest and
-* makes it difficult to recover guest instance.
+* Deliver unhandled/fatal (e.g. UE) MCE errors to guest
+* by exiting the guest with KVM_EXIT_NMI exit reason (exit
+* reason set later based on trap). For handled errors
+* (no-fatal), go back to guest execution with current HSRR0
+* instead of exiting the guest. This approach will cause
+* the guest to exit in case of fatal machine check error.
 */
-   ld  r10, VCPU_PC(r9)
+   bne 2f  /* Continue guest execution? */
+   /* If not, exit the guest. SRR0/1 are already set */
+   b   mc_cont
+2: ld  r10, VCPU_PC(r9)
ld  r11, VCPU_MSR(r9)
-   bne 2f  /* Continue guest execution. */
-   /* If not, deliver a machine check.  SRR0/1 are already set */
-   li  r10, BOOK3S_INTERRUPT_MACH

Re: [PATCH] KVM: PPC: Exit guest upon fatal machine check exception

2015-11-11 Thread Aravinda Prasad


On Thursday 12 November 2015 09:04 AM, David Gibson wrote:
> On Wed, Nov 11, 2015 at 10:28:46PM +0530, Aravinda Prasad wrote:
>> This patch modifies KVM to cause a guest exit with
>> KVM_EXIT_NMI instead of immediately delivering a 0x200
>> interrupt to guest upon machine check exception in
>> guest address. Exiting the guest enables QEMU to build
>> error log and deliver machine check exception to guest
>> OS (either via guest OS registered machine check
>> handler or via 0x200 guest OS interrupt vector).
>>
>> This approach simplifies the delivering of machine
>> check exception to guest OS compared to the earlier approach
>> of KVM directly invoking 0x200 guest interrupt vector.
>> In the earlier approach QEMU patched the 0x200 interrupt
>> vector during boot. The patched code at 0x200 issued a
>> private hcall to pass the control to QEMU to build the
>> error log.
>>
>> This design/approach is based on the feedback for the
>> QEMU patches to handle machine check exception. Details
>> of earlier approach of handling machine check exception
>> in QEMU and related discussions can be found at:
>>
>> https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg00813.html
>>
>> Signed-off-by: Aravinda Prasad <aravi...@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/kvm/book3s_hv.c|   12 +++-
>>  arch/powerpc/kvm/book3s_hv_rmhandlers.S |   31 
>> +++
>>  2 files changed, 14 insertions(+), 29 deletions(-)
>>
>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>> index 2280497..1b1dff0 100644
>> --- a/arch/powerpc/kvm/book3s_hv.c
>> +++ b/arch/powerpc/kvm/book3s_hv.c
>> @@ -859,15 +859,9 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, 
>> struct kvm_vcpu *vcpu,
>>  r = RESUME_GUEST;
>>  break;
>>  case BOOK3S_INTERRUPT_MACHINE_CHECK:
>> -/*
>> - * Deliver a machine check interrupt to the guest.
>> - * We have to do this, even if the host has handled the
>> - * machine check, because machine checks use SRR0/1 and
>> - * the interrupt might have trashed guest state in them.
>> - */
>> -kvmppc_book3s_queue_irqprio(vcpu,
>> -BOOK3S_INTERRUPT_MACHINE_CHECK);
>> -r = RESUME_GUEST;
>> +/* Exit to guest with KVM_EXIT_NMI as exit reason */
>> +run->exit_reason = KVM_EXIT_NMI;
>> +r = RESUME_HOST;
>>  break;
>>  case BOOK3S_INTERRUPT_PROGRAM:
>>  {
>> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
>> b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>> index b98889e..672b4f6 100644
>> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>> @@ -147,7 +147,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>>  addir1, r1, 112
>>  ld  r7, HSTATE_HOST_MSR(r13)
>>  
>> -cmpwi   cr1, r12, BOOK3S_INTERRUPT_MACHINE_CHECK
>>  cmpwi   r12, BOOK3S_INTERRUPT_EXTERNAL
>>  beq 11f
>>  cmpwi   cr2, r12, BOOK3S_INTERRUPT_HMI
>> @@ -160,7 +159,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>>  mtmsrd  r6, 1   /* Clear RI in MSR */
>>  mtsrr0  r8
>>  mtsrr1  r7
>> -beq cr1, 13f/* machine check */
>>  RFI
>>  
>>  /* On POWER7, we have external interrupts set to use HSRR0/1 */
>> @@ -168,8 +166,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>>  mtspr   SPRN_HSRR1, r7
>>  ba  0x500
>>  
>> -13: b   machine_check_fwnmi
>> -
> 
> I don't quite understand the 3 hunks above.  The rest seems to change
> the delivery of MCs as I'd expect, but the above seems to change their
> detection and the reason for that isn't obvious to me.

We need to do the guest exit here, and hence we continue with RFI. If we
instead branch to machine_check_fwnmi, control will flow to the
opal_recover_mce routine without causing the guest to exit with the NMI exit
code. And I think opal_recover_mce() is used to recover from UEs in host
user/kernel space and does not have a check for UEs belonging to the guest.
Hence, if we branch to machine_check_fwnmi, we end up running into panic()
in opal_machine_check() if the UE belonged to the guest.

Regards,
Aravinda

> 
> 
>>  14: mtspr   SPRN_HSRR0, r8
>>  mtspr   SPRN_HSRR1, r7
>>  b   hmi_exception_after_realmode
>> @@ -2381,24 +2377,19 @@ machine_check

Re: [PATCH] KVM: PPC: Exit guest upon fatal machine check exception

2015-11-11 Thread Aravinda Prasad


On Thursday 12 November 2015 09:08 AM, David Gibson wrote:
> On Thu, Nov 12, 2015 at 01:24:19PM +1100, Daniel Axtens wrote:
>> Aravinda Prasad <aravi...@linux.vnet.ibm.com> writes:
>>
>>> This patch modifies KVM to cause a guest exit with
>>> KVM_EXIT_NMI instead of immediately delivering a 0x200
>>> interrupt to guest upon machine check exception in
>>> guest address. Exiting the guest enables QEMU to build
>>> error log and deliver machine check exception to guest
>>> OS (either via guest OS registered machine check
>>> handler or via 0x200 guest OS interrupt vector).
>>>
>>> This approach simplifies the delivering of machine
>>> check exception to guest OS compared to the earlier approach
>>> of KVM directly invoking 0x200 guest interrupt vector.
>>> In the earlier approach QEMU patched the 0x200 interrupt
>>> vector during boot. The patched code at 0x200 issued a
>>> private hcall to pass the control to QEMU to build the
>>> error log.
>>>
>>> This design/approach is based on the feedback for the
>>> QEMU patches to handle machine check exception. Details
>>> of earlier approach of handling machine check exception
>>> in QEMU and related discussions can be found at:
>>>
>>> https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg00813.html
>>
>> I've poked at the MCE code, but not the KVM MCE code, so I may be
>> mistaken here, but I'm not clear on how this handles errors that the
>> guest can recover without terminating.
>>
>> For example, a Linux guest can handle a UE in guest userspace by killing
>> the guest process. A hypthetical non-linux guest with a microkernel
>> could even survive UEs in drivers.
>>
>> It sounds from your patch like you're changing this behaviour. Is this
>> right?
> 
> So, IIUC.  Once the qemu pieces are in place as well it shouldn't
> change this behaviour: KVM will exit to qemu, qemu will log the error
> information (new), then reinject the MC to the guest which can still
> handle it as you describe above.

Yes. With KVM and QEMU both in place this will not change the behavior.
QEMU will inject the UE into the guest, and the guest handles the UE based
on where it occurred. For example, if a UE happens in a guest process
address space, that process will be killed.

> 
> But, there could be a problem if you have a new kernel with an old
> qemu, in that case qemu might not understand the new exit type and
> treat it as a fatal error, even though the guest could actually cope
> with it.

In the case of a new kernel and old QEMU, the guest terminates as old QEMU
does not understand the NMI exit reason. However, this is also the case with
an old kernel and old QEMU, as they do not handle a UE belonging to the
guest. The difference is that the guest kernel terminates with a different
error code.

old kernel and old QEMU -> guest panics [1] irrespective of where UE
   happened in guest address space.
old kernel and new QEMU -> guest panics. same as above.
new kernel and old QEMU -> guest terminates with an unhandled NMI error
   irrespective of where the UE happened in the guest
new kernel and new QEMU -> guest handles UEs in process address space
   by killing the process. guest terminates
   for UEs in guest kernel address space.

[1] https://lists.ozlabs.org/pipermail/linuxppc-dev/2014-June/118329.html

> 
> Aravinda, do we need to change this so that qemu has to explicitly
> enable the new NMI behaviour?  Or have I missed something that will
> make that case work already.

I think we don't need to explicitly enable the new behavior. With new
kernel and new QEMU this should just work. As mentioned above this is
already broken for old kernel/QEMU. Any thoughts?

Regards,
Aravinda

> 
> 
> 

-- 
Regards,
Aravinda
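
To make the compatibility matrix above concrete, the run-loop check a new VMM needs is roughly the following sketch. Only KVM_RUN, struct kvm_run and KVM_EXIT_NMI are real interfaces here; report_mce_to_guest() and handle_other_exit() are placeholders rather than actual QEMU functions, and run is assumed to be the mmap'ed struct kvm_run for the vCPU.

    for (;;) {
            if (ioctl(vcpu_fd, KVM_RUN, 0) < 0)
                    break;

            switch (run->exit_reason) {
            case KVM_EXIT_NMI:
                    /* New QEMU: build the error log and deliver the machine
                     * check to the guest (0x200 or the handler registered
                     * via "ibm,nmi-register"). */
                    report_mce_to_guest(vcpu_fd);           /* placeholder */
                    break;
            default:
                    /* Old QEMU lands here for KVM_EXIT_NMI and treats the
                     * unknown exit reason as fatal, which is the
                     * compatibility problem discussed in this thread. */
                    handle_other_exit(run);                  /* placeholder */
            }
    }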


[PATCH] Add private HCALL to inform updated RTAS base and entry

2014-08-25 Thread Aravinda Prasad
This patch adds a private HCALL to inform qemu of the updated
rtas-base and rtas-entry addresses when the OS invokes the
instantiate-rtas call. This is required because qemu allocates the
error reporting structure in RTAS space upon a machine check
exception and hence needs to know the updated RTAS location.

Enhancements to qemu to handle the private HCALL, prepare the
error log and invoke the machine check notification routine
are in a separate patch.

Signed-off-by: Nikunj A. Dadhania nik...@linux.vnet.ibm.com
Signed-off-by: Aravinda Prasad aravi...@linux.vnet.ibm.com
---
 board-qemu/slof/rtas.fs   |6 +-
 lib/libhvcall/hvcall.code |6 ++
 lib/libhvcall/hvcall.in   |1 +
 lib/libhvcall/libhvcall.h |4 +++-
 4 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/board-qemu/slof/rtas.fs b/board-qemu/slof/rtas.fs
index 41e30c2..c081b57 100644
--- a/board-qemu/slof/rtas.fs
+++ b/board-qemu/slof/rtas.fs
@@ -174,7 +174,11 @@ rtas-node set-node
 
 : instantiate-rtas ( adr -- entry )
 dup rtas-base swap rtas-size move
-rtas-entry rtas-base - +
+dup rtas-entry rtas-base - +
+2dup hv-rtas-update 0 <> IF
+." Failed to update RTAS " cr
+THEN
+nip   
 ;
 
 device-end
diff --git a/lib/libhvcall/hvcall.code b/lib/libhvcall/hvcall.code
index 744469f..0ff50f2 100644
--- a/lib/libhvcall/hvcall.code
+++ b/lib/libhvcall/hvcall.code
@@ -111,6 +111,12 @@ PRIM(hv_X2d_cas)
TOS.u = hv_cas(vec, buf, size);
 MIRP
 
+PRIM(hv_X2d_rtas_X2d_update)
+   unsigned long rtas_entry   = TOS.u; POP;
+   unsigned long rtas_base    = TOS.u;
+   TOS.u = hv_generic(KVMPPC_H_RTAS_UPDATE, rtas_base, rtas_entry);
+MIRP
+
 PRIM(get_X2d_print_X2d_version)
unsigned long addr = TOS.u; POP;
get_print_banner(addr);
diff --git a/lib/libhvcall/hvcall.in b/lib/libhvcall/hvcall.in
index e99d6d1..4437b77 100644
--- a/lib/libhvcall/hvcall.in
+++ b/lib/libhvcall/hvcall.in
@@ -30,4 +30,5 @@ cod(RX!)
 
 cod(hv-logical-memop)
 cod(hv-cas)
+cod(hv-rtas-update)
 cod(get-print-version)
diff --git a/lib/libhvcall/libhvcall.h b/lib/libhvcall/libhvcall.h
index 6356a62..193b738 100644
--- a/lib/libhvcall/libhvcall.h
+++ b/lib/libhvcall/libhvcall.h
@@ -24,7 +24,9 @@
 #define KVMPPC_H_LOGICAL_MEMOP  (KVMPPC_HCALL_BASE + 0x1)
 /* Client Architecture support */
 #define KVMPPC_H_CAS            (KVMPPC_HCALL_BASE + 0x2)
-#define KVMPPC_HCALL_MAX        KVMPPC_H_CAS
+#define KVMPPC_H_RTAS_UPDATE    (KVMPPC_HCALL_BASE + 0x3)
+#define KVMPPC_H_REPORT_MC_ERR  (KVMPPC_HCALL_BASE + 0x4)
+#define KVMPPC_HCALL_MAX        KVMPPC_H_NMI_MCE
 
 #ifndef __ASSEMBLY__
 

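For context, the QEMU-side counterpart mentioned in the description
("Enhancements to qemu to handle the private HCALL ... are in a separate
patch") would essentially record the relocated addresses so that a later
machine check can place its error log relative to the new RTAS location.
A rough sketch of the shape such a handler could take is below; the
function and field names are assumptions for illustration, not the
actual QEMU patch.

/*
 * Rough sketch of a QEMU-side handler for KVMPPC_H_RTAS_UPDATE.  The
 * structure and function names are illustrative stand-ins, not the
 * code from the companion QEMU patch.
 */
#include <stdint.h>

#define H_SUCCESS    0
#define H_PARAMETER  (-4)

struct spapr_stub {                 /* stand-in for the machine state */
    uint64_t rtas_base;
    uint64_t rtas_entry;
    uint64_t rtas_size;
    uint64_t ram_size;
};

/* args[0] = new rtas-base, args[1] = new rtas-entry, as pushed by
 * instantiate-rtas in rtas.fs above. */
static long h_rtas_update(struct spapr_stub *spapr, const uint64_t *args)
{
    uint64_t base = args[0], entry = args[1];

    if (base + spapr->rtas_size > spapr->ram_size || entry < base)
        return H_PARAMETER;

    /* Remember where the OS moved RTAS so a machine check can later
     * build the error log relative to the new location. */
    spapr->rtas_base  = base;
    spapr->rtas_entry = entry;
    return H_SUCCESS;
}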

[PATCH] Enable hardware breakpoint upon re-registering

2012-11-05 Thread Aravinda Prasad
On powerpc, ptrace will disable the hardware breakpoint once the
breakpoint is hit. It is the responsibility of the caller to set it
again. However, when the caller sets the hardware breakpoint again
using ptrace(PTRACE_SET_DEBUGREG, child_pid, 0, addr), the hardware
breakpoint is not re-enabled.

While gdb's approach is to unregister and re-register the hardware
breakpoint every time the breakpoint is hit - which works fine -
this could affect other programs that try to re-register the hardware
breakpoint without unregistering it first.

This patch enables the hardware breakpoint if the caller is re-registering it.

Signed-off-by: Aravinda Prasad aravi...@linux.vnet.ibm.com
---
 arch/powerpc/kernel/ptrace.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
index 79d8e56..09371d0 100644
--- a/arch/powerpc/kernel/ptrace.c
+++ b/arch/powerpc/kernel/ptrace.c
@@ -952,6 +952,10 @@ int ptrace_set_debugreg(struct task_struct *task, unsigned long addr,
			arch_bp_generic_fields(data &
					(DABR_DATA_WRITE | DABR_DATA_READ),
					attr.bp_type);
+
+   /* Enable breakpoint */
+   attr.disabled = false;
+
	ret =  modify_user_hw_breakpoint(bp, &attr);
if (ret) {
ptrace_put_breakpoints(task);

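To illustrate the re-registering caller the description talks about, a
tracer that simply sets the breakpoint again after each hit (rather than
gdb's unregister/re-register dance) looks roughly like the sketch below.
It is a minimal sketch with error handling omitted; the request number
and DABR flag bits are spelled out literally only to keep the example
self-contained and should really come from the powerpc uapi headers. The
child is assumed to have been started under PTRACE_TRACEME.

/*
 * Minimal sketch of a powerpc tracer that re-arms a DABR watchpoint with
 * PTRACE_SET_DEBUGREG after every hit.  Error handling is omitted; the
 * request number and flag bits below mirror the powerpc uapi headers and
 * are written out only to keep the example self-contained.
 */
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

#ifndef PTRACE_SET_DEBUGREG
#define PTRACE_SET_DEBUGREG 26           /* powerpc-specific request */
#endif

#define DABR_FLAGS 0x7UL                 /* translation + write + read */

static void watch(pid_t child, unsigned long addr)
{
    int status;

    /* Arm the watchpoint for the first time. */
    ptrace(PTRACE_SET_DEBUGREG, child, (void *)0,
           (void *)(addr | DABR_FLAGS));
    ptrace(PTRACE_CONT, child, (void *)0, (void *)0);

    while (waitpid(child, &status, 0) > 0 && WIFSTOPPED(status)) {
        printf("watchpoint hit at 0x%lx\n", addr);
        /*
         * The kernel disables the breakpoint when it fires.  Without
         * the patch above, setting it again like this does not
         * re-enable it, which is the bug being fixed.
         */
        ptrace(PTRACE_SET_DEBUGREG, child, (void *)0,
               (void *)(addr | DABR_FLAGS));
        ptrace(PTRACE_CONT, child, (void *)0, (void *)0);
    }
}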