Re: [PATCH 2/2] efifb: Avoid reconfiguration of BAR that covers the framebuffer

2017-05-19 Thread Ard Biesheuvel
On 19 May 2017 at 21:44, Bjorn Helgaas  wrote:
> On Fri, May 19, 2017 at 05:37:30PM +0100, Ard Biesheuvel wrote:
>> Hi Bjorn,
>>
>> On 19 May 2017 at 17:27, Bjorn Helgaas  wrote:
>> > [+cc linux-pci]
>> >
>> > On Tue, Apr 04, 2017 at 04:27:44PM +0100, Ard Biesheuvel wrote:
>> >> On UEFI systems, the PCI subsystem is enumerated by the firmware,
>> >> and if a graphical framebuffer is exposed by a PCI device, its base
>> >> address and size are exposed to the OS via the Graphics Output
>> >> Protocol (GOP).
>> >>
>> >> On arm64 PCI systems, the entire PCI hierarchy is reconfigured from
>> >> scratch at boot. This may result in the GOP framebuffer address to
>> >> become stale, if the BAR covering the framebuffer is modified. This
>> >> will cause the framebuffer to become unresponsive, and may in some
>> >> cases result in unpredictable behavior if the range is reassigned to
>> >> another device.
>> >>
>> >> So add a non-x86 quirk to the EFI fb driver to find the BAR associated
>> >> with the GOP base address, and claim the BAR resource so that the PCI
>> >> core will not move it.
>> >
>> > I know this has already been merged as 55d728a40d36 ("efi/fb: Avoid
>> > reconfiguration of BAR that covers the framebuffer"), and I'm not
>> > suggesting that we revert it, but I have some misgivings.
>> ...
>
>> > Another is the use of pci_claim_resource() to express the idea that "this
>> > BAR can not be moved".  We have IORESOURCE_PCI_FIXED for that purpose, and
>> > previous versions of the patch used that.  I understand there was some
>> > problem with that, but I wish we could figure out and fix that problem
>> > instead of using a different mechanism.
>>
>> OK. The problem was that IORESOURCE_PCI_FIXED does not prevent the PCI
>> subsystem from handing out the same range to another device.
>
> Yes, this would definitely be a problem.  There must be a path where
> we start doing the reassignment before we claim the resources that
> have already been assigned.  That's what seems backwards to me -- it
> seems like we should claim things that are valid first so we know to
> avoid them.
>
>> > I'm not even completely sold on the idea that we need to prevent the BAR
>> > from being moved.  I'm not a UEFI expert, but if this requirement is in the
>> > spec, we should reference it.  If not, it should be sufficient to remember
>> > the boot-time BAR value, match the GOP base to *that*, and map it to
>> > whatever the current BAR value is.
>>
>> There is no such requirement in the spec. The graphics output protocol
>> is not described in terms of PCI, BARs etc. The framebuffer is simply
>> a memory range with side effects that is left enabled when handing
>> over to the OS.
>>
>> In summary, I am as unhappy with the patch as you are, but it is still
>> an improvement over the previous situation, so let's simply
>> collaborate to get this into shape going forward.
>>
>> My preference would be to investigate IORESOURCE_PCI_FIXED again,
>> because that still seems like the best way to deal with a live
>> framebuffer on a PCI device that has memory decoding enabled. It
>> should also address the issue with the current version of the patch,
>> i.e., that claiming resources at this point is not possible if the
>> device resides behind a bridge.
>>
>> So is there any guidance you can give as to how to proceed? If we set
>> IORESOURCE_PCI_FIXED, how should be prevent the PCI layer from
>> assigning this resource elsewhere if we cannot claim it yet?
>
> My preference would actually be to remember the boot BAR values and
> map to the current values because that avoids the unnecessary
> restriction.  The IORESOURCE_PCI_FIXED uses that seem legitimate to me
> are the legacy VGA and IDE things (stuff that's not described by BARs
> and *can't* be moved) and the new "Enhanced Allocation" stuff
> (basically a way to describe more non-BAR resources).
>
> We do something sort of similar with pci_revert_fw_address(), although
> it's currently only implemented for x86.  I could imagine a more
> generic solution that could be used for this GOP issue and could also
> replace pci_revert_fw_address().
>

I already proposed something like this a while ago:
http://marc.info//?l=linux-fbdev&m=149190021316410&w=2

> Conceptually it could be as simple as adding 7 u64 boot-time BAR
> values (6 BARs + the ROM) to struct pci_dev.  We went to a lot of
> trouble to implement the pcibios_fwaddrmap stuff on x86, and I can't
> remember why it's x86-specific.  Maybe we thought the extra 56 bytes
> per dev was too much overhead (it does seem like a lot for such a
> limited use case).  Maybe there's a clever way to store just the BARs
> we actually change and keep that long enough for efifb to use it.
>
> It *would* be nice to fix the problem with IORESOURCE_PCI_FIXED, and I
> think it would help clean up PCI resource management a lot.  Ideally
> we would be able to claim host bridge resources even before scanning
> the bus, then claim BARs immediately when we d

Re: [PATCH v5 28/32] x86/mm, kexec: Allow kexec to be used with SME

2017-05-19 Thread Tom Lendacky

On 5/19/2017 4:28 PM, Borislav Petkov wrote:

On Fri, May 19, 2017 at 04:07:24PM -0500, Tom Lendacky wrote:

As long as those never change from static inline everything will be
fine. I can change it, but I really like how it explicitly indicates


I know what you want to do. But you're practically defining a helper
which contains two arbitrary instructions which probably no one else
will need.

So how about we simplify this function even more. We don't need to pay
attention to kexec being in progress because we're halting anyway so who
cares how fast we halt.

Might have to state that in the comment below though, instead of what's
there now.

And for the exact same moot reason, we don't need to look at SME CPUID
feature - we can just as well WBINVD unconditionally.

void stop_this_cpu(void *dummy)
{
local_irq_disable();
/*
 * Remove this CPU:
 */
set_cpu_online(smp_processor_id(), false);
disable_local_APIC();
mcheck_cpu_clear(this_cpu_ptr(&cpu_info));

for (;;) {
/*
 * If we are performing a kexec and the processor supports
 * SME then we need to clear out cache information before
 * halting. With kexec, going from SME inactive to SME active
 * requires clearing cache entries so that addresses without
 * the encryption bit set don't corrupt the same physical
 * address that has the encryption bit set when caches are
 * flushed. Perform a wbinvd followed by a halt to achieve
 * this.
 */
asm volatile("wbinvd; hlt" ::: "memory");
}
}

How's that?


I can live with that!

Thanks,
Tom




--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 28/32] x86/mm, kexec: Allow kexec to be used with SME

2017-05-19 Thread Borislav Petkov
On Fri, May 19, 2017 at 04:07:24PM -0500, Tom Lendacky wrote:
> As long as those never change from static inline everything will be
> fine. I can change it, but I really like how it explicitly indicates

I know what you want to do. But you're practically defining a helper
which contains two arbitrary instructions which probably no one else
will need.

So how about we simplify this function even more. We don't need to pay
attention to kexec being in progress because we're halting anyway so who
cares how fast we halt.

Might have to state that in the comment below though, instead of what's
there now.

And for the exact same moot reason, we don't need to look at SME CPUID
feature - we can just as well WBINVD unconditionally.

void stop_this_cpu(void *dummy)
{
local_irq_disable();
/*
 * Remove this CPU:
 */
set_cpu_online(smp_processor_id(), false);
disable_local_APIC();
mcheck_cpu_clear(this_cpu_ptr(&cpu_info));

for (;;) {
/*
 * If we are performing a kexec and the processor supports
 * SME then we need to clear out cache information before
 * halting. With kexec, going from SME inactive to SME active
 * requires clearing cache entries so that addresses without
 * the encryption bit set don't corrupt the same physical
 * address that has the encryption bit set when caches are
 * flushed. Perform a wbinvd followed by a halt to achieve
 * this.
 */
asm volatile("wbinvd; hlt" ::: "memory");
}
}

How's that?

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 28/32] x86/mm, kexec: Allow kexec to be used with SME

2017-05-19 Thread Tom Lendacky

On 5/19/2017 3:58 PM, Borislav Petkov wrote:

On Fri, May 19, 2017 at 03:45:28PM -0500, Tom Lendacky wrote:

Actually there is.  The above will result in data in the cache because
halt() turns into a function call if CONFIG_PARAVIRT is defined (refer
to the comment above where do_wbinvd_halt is set to true). I could make
this a native_wbinvd() and native_halt()


That's why we have the native_* versions - to bypass paravirt crap.


As long as those never change from static inline everything will be
fine. I can change it, but I really like how it explicitly indicates
what is needed in this case. Even if the function gets changed from
static inline the fact that the instructions are sequential in the
function covers that case.

Thanks,
Tom




--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 28/32] x86/mm, kexec: Allow kexec to be used with SME

2017-05-19 Thread Borislav Petkov
On Fri, May 19, 2017 at 03:45:28PM -0500, Tom Lendacky wrote:
> Actually there is.  The above will result in data in the cache because
> halt() turns into a function call if CONFIG_PARAVIRT is defined (refer
> to the comment above where do_wbinvd_halt is set to true). I could make
> this a native_wbinvd() and native_halt()

That's why we have the native_* versions - to bypass paravirt crap.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 17/32] x86/mm: Add support to access boot related data in the clear

2017-05-19 Thread Tom Lendacky

On 5/18/2017 4:02 AM, Borislav Petkov wrote:

On Wed, May 17, 2017 at 01:54:39PM -0500, Tom Lendacky wrote:

I was worried what the compiler might do when CONFIG_EFI is not set,
but it appears to take care of it. I'll double check though.


There's a efi_enabled() !CONFIG_EFI version too, so should be fine.


I may introduce a length variable to capture data->len right after
paddr_next is set and then have just a single memunmap() call before
the if check.


Yap.


I tried that, but calling an "__init" function (early_memremap()) from
a non "__init" function generated warnings. I suppose I can pass in a
function for the map and unmap but that looks worse to me (also the
unmap functions take different arguments).


No, the other way around: the __init function should call the non-init
one and you need the non-init one anyway for memremap_is_setup_data().



The "worker" function would be doing the loop through the setup data,
but since the setup data is mapped inside the loop I can't do the __init
calling the non-init function and still hope to consolidate the code.
Maybe I'm missing something here...

Thanks,
Tom


This is like the chicken and the egg scenario. In order to determine if
an address is setup data I have to explicitly map the setup data chain
as decrypted. In order to do that I have to supply a flag to explicitly
map the data decrypted otherwise I wind up back in the
memremap_is_setup_data() function again and again and again...


Oh, fun.


--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 28/32] x86/mm, kexec: Allow kexec to be used with SME

2017-05-19 Thread Tom Lendacky

On 5/17/2017 2:17 PM, Borislav Petkov wrote:

On Tue, Apr 18, 2017 at 04:21:21PM -0500, Tom Lendacky wrote:

Provide support so that kexec can be used to boot a kernel when SME is
enabled.

Support is needed to allocate pages for kexec without encryption.  This
is needed in order to be able to reboot in the kernel in the same manner
as originally booted.

Additionally, when shutting down all of the CPUs we need to be sure to
flush the caches and then halt. This is needed when booting from a state
where SME was not active into a state where SME is active (or vice-versa).
Without these steps, it is possible for cache lines to exist for the same
physical location but tagged both with and without the encryption bit. This
can cause random memory corruption when caches are flushed depending on
which cacheline is written last.

Signed-off-by: Tom Lendacky 
---
 arch/x86/include/asm/init.h  |1 +
 arch/x86/include/asm/irqflags.h  |5 +
 arch/x86/include/asm/kexec.h |8 
 arch/x86/include/asm/pgtable_types.h |1 +
 arch/x86/kernel/machine_kexec_64.c   |   35 +-
 arch/x86/kernel/process.c|   26 +++--
 arch/x86/mm/ident_map.c  |   11 +++
 include/linux/kexec.h|   14 ++
 kernel/kexec_core.c  |7 +++
 9 files changed, 101 insertions(+), 7 deletions(-)


...


@@ -86,7 +86,7 @@ static int init_transition_pgtable(struct kimage *image, 
pgd_t *pgd)
set_pmd(pmd, __pmd(__pa(pte) | _KERNPG_TABLE));
}
pte = pte_offset_kernel(pmd, vaddr);
-   set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC));
+   set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC_NOENC));
return 0;
 err:
free_transition_pgtable(image);
@@ -114,6 +114,7 @@ static int init_pgtable(struct kimage *image, unsigned long 
start_pgtable)
.alloc_pgt_page = alloc_pgt_page,
.context= image,
.pmd_flag   = __PAGE_KERNEL_LARGE_EXEC,
+   .kernpg_flag= _KERNPG_TABLE_NOENC,
};
unsigned long mstart, mend;
pgd_t *level4p;
@@ -597,3 +598,35 @@ void arch_kexec_unprotect_crashkres(void)
 {
kexec_mark_crashkres(false);
 }
+
+int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, gfp_t gfp)
+{
+   int ret;
+
+   if (sme_active()) {


if (!sme_active())
return 0;

/*
 * If SME...



Ok.




+   /*
+* If SME is active we need to be sure that kexec pages are
+* not encrypted because when we boot to the new kernel the
+* pages won't be accessed encrypted (initially).
+*/
+   ret = set_memory_decrypted((unsigned long)vaddr, pages);
+   if (ret)
+   return ret;
+
+   if (gfp & __GFP_ZERO)
+   memset(vaddr, 0, pages * PAGE_SIZE);


This function is called after alloc_pages() which already zeroes memory
when __GFP_ZERO is supplied.

If you need to clear the memory *after* set_memory_encrypted() happens,
then you should probably mask out __GFP_ZERO before the alloc_pages()
call so as not to do it twice.


I'll look into that.  I could put the memset() at the end of this
function so that it is done here no matter what.  And update the
default arch_kexec_post_alloc_pages() to also do the memset(). It
just hides the clearing of the pages a bit though by doing that.




+   }
+
+   return 0;
+}
+
+void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages)
+{
+   if (sme_active()) {
+   /*
+* If SME is active we need to reset the pages back to being
+* an encrypted mapping before freeing them.
+*/
+   set_memory_encrypted((unsigned long)vaddr, pages);
+   }
+}
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 0bb8842..f4e5de6 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -355,8 +356,25 @@ bool xen_set_default_idle(void)
return ret;
 }
 #endif
+
 void stop_this_cpu(void *dummy)
 {
+   bool do_wbinvd_halt = false;
+
+   if (kexec_in_progress && boot_cpu_has(X86_FEATURE_SME)) {
+   /*
+* If we are performing a kexec and the processor supports
+* SME then we need to clear out cache information before
+* halting. With kexec, going from SME inactive to SME active
+* requires clearing cache entries so that addresses without
+* the encryption bit set don't corrupt the same physical
+* address that has the encryption bit set when caches are
+* flushed. Perform a wbinvd fol

Re: [PATCH 2/2] efifb: Avoid reconfiguration of BAR that covers the framebuffer

2017-05-19 Thread Bjorn Helgaas
On Fri, May 19, 2017 at 05:37:30PM +0100, Ard Biesheuvel wrote:
> Hi Bjorn,
> 
> On 19 May 2017 at 17:27, Bjorn Helgaas  wrote:
> > [+cc linux-pci]
> >
> > On Tue, Apr 04, 2017 at 04:27:44PM +0100, Ard Biesheuvel wrote:
> >> On UEFI systems, the PCI subsystem is enumerated by the firmware,
> >> and if a graphical framebuffer is exposed by a PCI device, its base
> >> address and size are exposed to the OS via the Graphics Output
> >> Protocol (GOP).
> >>
> >> On arm64 PCI systems, the entire PCI hierarchy is reconfigured from
> >> scratch at boot. This may result in the GOP framebuffer address to
> >> become stale, if the BAR covering the framebuffer is modified. This
> >> will cause the framebuffer to become unresponsive, and may in some
> >> cases result in unpredictable behavior if the range is reassigned to
> >> another device.
> >>
> >> So add a non-x86 quirk to the EFI fb driver to find the BAR associated
> >> with the GOP base address, and claim the BAR resource so that the PCI
> >> core will not move it.
> >
> > I know this has already been merged as 55d728a40d36 ("efi/fb: Avoid
> > reconfiguration of BAR that covers the framebuffer"), and I'm not
> > suggesting that we revert it, but I have some misgivings.
> ...

> > Another is the use of pci_claim_resource() to express the idea that "this
> > BAR can not be moved".  We have IORESOURCE_PCI_FIXED for that purpose, and
> > previous versions of the patch used that.  I understand there was some
> > problem with that, but I wish we could figure out and fix that problem
> > instead of using a different mechanism.
> 
> OK. The problem was that IORESOURCE_PCI_FIXED does not prevent the PCI
> subsystem from handing out the same range to another device.

Yes, this would definitely be a problem.  There must be a path where
we start doing the reassignment before we claim the resources that
have already been assigned.  That's what seems backwards to me -- it
seems like we should claim things that are valid first so we know to
avoid them.

> > I'm not even completely sold on the idea that we need to prevent the BAR
> > from being moved.  I'm not a UEFI expert, but if this requirement is in the
> > spec, we should reference it.  If not, it should be sufficient to remember
> > the boot-time BAR value, match the GOP base to *that*, and map it to
> > whatever the current BAR value is.
> 
> There is no such requirement in the spec. The graphics output protocol
> is not described in terms of PCI, BARs etc. The framebuffer is simply
> a memory range with side effects that is left enabled when handing
> over to the OS.
> 
> In summary, I am as unhappy with the patch as you are, but it is still
> an improvement over the previous situation, so let's simply
> collaborate to get this into shape going forward.
> 
> My preference would be to investigate IORESOURCE_PCI_FIXED again,
> because that still seems like the best way to deal with a live
> framebuffer on a PCI device that has memory decoding enabled. It
> should also address the issue with the current version of the patch,
> i.e., that claiming resources at this point is not possible if the
> device resides behind a bridge.
> 
> So is there any guidance you can give as to how to proceed? If we set
> IORESOURCE_PCI_FIXED, how should be prevent the PCI layer from
> assigning this resource elsewhere if we cannot claim it yet?

My preference would actually be to remember the boot BAR values and
map to the current values because that avoids the unnecessary
restriction.  The IORESOURCE_PCI_FIXED uses that seem legitimate to me
are the legacy VGA and IDE things (stuff that's not described by BARs
and *can't* be moved) and the new "Enhanced Allocation" stuff
(basically a way to describe more non-BAR resources).

We do something sort of similar with pci_revert_fw_address(), although
it's currently only implemented for x86.  I could imagine a more
generic solution that could be used for this GOP issue and could also
replace pci_revert_fw_address().

Conceptually it could be as simple as adding 7 u64 boot-time BAR
values (6 BARs + the ROM) to struct pci_dev.  We went to a lot of
trouble to implement the pcibios_fwaddrmap stuff on x86, and I can't
remember why it's x86-specific.  Maybe we thought the extra 56 bytes
per dev was too much overhead (it does seem like a lot for such a
limited use case).  Maybe there's a clever way to store just the BARs
we actually change and keep that long enough for efifb to use it.

It *would* be nice to fix the problem with IORESOURCE_PCI_FIXED, and I
think it would help clean up PCI resource management a lot.  Ideally
we would be able to claim host bridge resources even before scanning
the bus, then claim BARs immediately when we discover them.  That
would allow us to automatically use any setup done by firmware, and
reassign anything that we couldn't claim.

But I think this will end up being difficult because we currently do
the PCI bus scan before looking at ACPI resources, and sometimes those

[PATCH V17 04/11] efi: parse ARM processor error

2017-05-19 Thread Tyler Baicar
Add support for ARM Common Platform Error Record (CPER).
UEFI 2.6 specification adds support for ARM specific
processor error information to be reported as part of the
CPER records. This provides more detail on for processor error logs.

Signed-off-by: Tyler Baicar 
CC: Jonathan (Zhixiong) Zhang 
Reviewed-by: James Morse 
Reviewed-by: Ard Biesheuvel 
---
 drivers/firmware/efi/cper.c | 129 
 include/linux/cper.h|  54 +++
 2 files changed, 183 insertions(+)

diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index 229cf92..eac0854 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -110,12 +110,15 @@ void cper_print_bits(const char *pfx, unsigned int bits,
 static const char * const proc_type_strs[] = {
"IA32/X64",
"IA64",
+   "ARM",
 };
 
 static const char * const proc_isa_strs[] = {
"IA32",
"IA64",
"X64",
+   "ARM A32/T32",
+   "ARM A64",
 };
 
 static const char * const proc_error_type_strs[] = {
@@ -184,6 +187,122 @@ static void cper_print_proc_generic(const char *pfx,
printk("%s""IP: 0x%016llx\n", pfx, proc->ip);
 }
 
+#if defined(CONFIG_ARM64) || defined(CONFIG_ARM)
+static const char * const arm_reg_ctx_strs[] = {
+   "AArch32 general purpose registers",
+   "AArch32 EL1 context registers",
+   "AArch32 EL2 context registers",
+   "AArch32 secure context registers",
+   "AArch64 general purpose registers",
+   "AArch64 EL1 context registers",
+   "AArch64 EL2 context registers",
+   "AArch64 EL3 context registers",
+   "Misc. system register structure",
+};
+
+static void cper_print_proc_arm(const char *pfx,
+   const struct cper_sec_proc_arm *proc)
+{
+   int i, len, max_ctx_type;
+   struct cper_arm_err_info *err_info;
+   struct cper_arm_ctx_info *ctx_info;
+   char newpfx[64];
+
+   printk("%sMIDR: 0x%016llx\n", pfx, proc->midr);
+
+   len = proc->section_length - (sizeof(*proc) +
+   proc->err_info_num * (sizeof(*err_info)));
+   if (len < 0) {
+   printk("%ssection length: %d\n", pfx, proc->section_length);
+   printk("%ssection length is too small\n", pfx);
+   printk("%sfirmware-generated error record is incorrect\n", pfx);
+   printk("%sERR_INFO_NUM is %d\n", pfx, proc->err_info_num);
+   return;
+   }
+
+   if (proc->validation_bits & CPER_ARM_VALID_MPIDR)
+   printk("%sMultiprocessor Affinity Register (MPIDR): 
0x%016llx\n",
+   pfx, proc->mpidr);
+
+   if (proc->validation_bits & CPER_ARM_VALID_AFFINITY_LEVEL)
+   printk("%serror affinity level: %d\n", pfx,
+   proc->affinity_level);
+
+   if (proc->validation_bits & CPER_ARM_VALID_RUNNING_STATE) {
+   printk("%srunning state: 0x%x\n", pfx, proc->running_state);
+   printk("%sPower State Coordination Interface state: %d\n",
+   pfx, proc->psci_state);
+   }
+
+   snprintf(newpfx, sizeof(newpfx), "%s%s", pfx, INDENT_SP);
+
+   err_info = (struct cper_arm_err_info *)(proc + 1);
+   for (i = 0; i < proc->err_info_num; i++) {
+   printk("%sError info structure %d:\n", pfx, i);
+
+   printk("%snum errors: %d\n", pfx, err_info->multiple_error + 1);
+
+   if (err_info->validation_bits & CPER_ARM_INFO_VALID_FLAGS) {
+   if (err_info->flags & CPER_ARM_INFO_FLAGS_FIRST)
+   printk("%sfirst error captured\n", newpfx);
+   if (err_info->flags & CPER_ARM_INFO_FLAGS_LAST)
+   printk("%slast error captured\n", newpfx);
+   if (err_info->flags & CPER_ARM_INFO_FLAGS_PROPAGATED)
+   printk("%spropagated error captured\n",
+  newpfx);
+   if (err_info->flags & CPER_ARM_INFO_FLAGS_OVERFLOW)
+   printk("%soverflow occurred, error info is 
incomplete\n",
+  newpfx);
+   }
+
+   printk("%serror_type: %d, %s\n", newpfx, err_info->type,
+   err_info->type < ARRAY_SIZE(proc_error_type_strs) ?
+   proc_error_type_strs[err_info->type] : "unknown");
+   if (err_info->validation_bits & CPER_ARM_INFO_VALID_ERR_INFO)
+   printk("%serror_info: 0x%016llx\n", newpfx,
+  err_info->error_info);
+   if (err_info->validation_bits & CPER_ARM_INFO_VALID_VIRT_ADDR)
+   printk("%svirtual fault address: 0x%016llx\n",
+   newpfx, err_info->virt_fault_addr);
+   if (err_info->validation_bits & 
CPER_ARM_INFO_VALID_PHYSIC

[PATCH V17 02/11] ras: acpi/apei: cper: add support for generic data v3 structure

2017-05-19 Thread Tyler Baicar
The ACPI 6.1 spec adds a new revision of the generic error data
entry structure. Add support to handle the new structure as well
as properly verify and iterate through the generic data entries.

Signed-off-by: Tyler Baicar 
CC: Jonathan (Zhixiong) Zhang 
---
 drivers/acpi/apei/ghes.c| 11 +--
 drivers/firmware/efi/cper.c | 37 ++---
 include/acpi/ghes.h | 36 
 3 files changed, 63 insertions(+), 21 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index c072acf..626552e 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -428,8 +428,7 @@ static void ghes_handle_memory_failure(struct 
acpi_hest_generic_data *gdata, int
unsigned long pfn;
int flags = -1;
int sec_sev = ghes_severity(gdata->error_severity);
-   struct cper_sec_mem_err *mem_err;
-   mem_err = (struct cper_sec_mem_err *)(gdata + 1);
+   struct cper_sec_mem_err *mem_err = acpi_hest_get_payload(gdata);
 
if (!(mem_err->validation_bits & CPER_MEM_VALID_PA))
return;
@@ -465,8 +464,8 @@ static void ghes_do_proc(struct ghes *ghes,
sec_sev = ghes_severity(gdata->error_severity);
if (!uuid_le_cmp(*(uuid_le *)gdata->section_type,
 CPER_SEC_PLATFORM_MEM)) {
-   struct cper_sec_mem_err *mem_err;
-   mem_err = (struct cper_sec_mem_err *)(gdata+1);
+   struct cper_sec_mem_err *mem_err = 
acpi_hest_get_payload(gdata);
+
ghes_edac_report_mem_error(ghes, sev, mem_err);
 
arch_apei_report_mem_error(sev, mem_err);
@@ -475,8 +474,8 @@ static void ghes_do_proc(struct ghes *ghes,
 #ifdef CONFIG_ACPI_APEI_PCIEAER
else if (!uuid_le_cmp(*(uuid_le *)gdata->section_type,
  CPER_SEC_PCIE)) {
-   struct cper_sec_pcie *pcie_err;
-   pcie_err = (struct cper_sec_pcie *)(gdata+1);
+   struct cper_sec_pcie *pcie_err = 
acpi_hest_get_payload(gdata);
+
if (sev == GHES_SEV_RECOVERABLE &&
sec_sev == GHES_SEV_RECOVERABLE &&
pcie_err->validation_bits & 
CPER_PCIE_VALID_DEVICE_ID &&
diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index d425374..9024757 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -32,6 +32,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define INDENT_SP  " "
 
@@ -386,8 +387,9 @@ static void cper_print_pcie(const char *pfx, const struct 
cper_sec_pcie *pcie,
pfx, pcie->bridge.secondary_status, pcie->bridge.control);
 }
 
-static void cper_estatus_print_section(
-   const char *pfx, const struct acpi_hest_generic_data *gdata, int sec_no)
+static void
+cper_estatus_print_section(const char *pfx, struct acpi_hest_generic_data 
*gdata,
+  int sec_no)
 {
uuid_le *sec_type = (uuid_le *)gdata->section_type;
__u16 severity;
@@ -403,14 +405,16 @@ static void cper_estatus_print_section(
 
snprintf(newpfx, sizeof(newpfx), "%s%s", pfx, INDENT_SP);
if (!uuid_le_cmp(*sec_type, CPER_SEC_PROC_GENERIC)) {
-   struct cper_sec_proc_generic *proc_err = (void *)(gdata + 1);
+   struct cper_sec_proc_generic *proc_err = 
acpi_hest_get_payload(gdata);
+
printk("%s""section_type: general processor error\n", newpfx);
if (gdata->error_data_length >= sizeof(*proc_err))
cper_print_proc_generic(newpfx, proc_err);
else
goto err_section_too_small;
} else if (!uuid_le_cmp(*sec_type, CPER_SEC_PLATFORM_MEM)) {
-   struct cper_sec_mem_err *mem_err = (void *)(gdata + 1);
+   struct cper_sec_mem_err *mem_err = acpi_hest_get_payload(gdata);
+
printk("%s""section_type: memory error\n", newpfx);
if (gdata->error_data_length >=
sizeof(struct cper_sec_mem_err_old))
@@ -419,7 +423,8 @@ static void cper_estatus_print_section(
else
goto err_section_too_small;
} else if (!uuid_le_cmp(*sec_type, CPER_SEC_PCIE)) {
-   struct cper_sec_pcie *pcie = (void *)(gdata + 1);
+   struct cper_sec_pcie *pcie = acpi_hest_get_payload(gdata);
+
printk("%s""section_type: PCIe error\n", newpfx);
if (gdata->error_data_length >= sizeof(*pcie))
cper_print_pcie(newpfx, pcie, gdata);
@@ -438,7 +443,7 @@ void cper_estatus_print(const char *pfx,
const struct acpi_hest_generic_status *estatus)
 {
struct acpi_hest_generic_data *gdata;
-   unsigned int data_len, gedata_len;
+   unsigned int data_len

[PATCH V17 06/11] acpi: apei: handle SEA notification type for ARMv8

2017-05-19 Thread Tyler Baicar
ARM APEI extension proposal added SEA (Synchronous External Abort)
notification type for ARMv8.
Add a new GHES error source handling function for SEA. If an error
source's notification type is SEA, then this function can be registered
into the SEA exception handler. That way GHES will parse and report
SEA exceptions when they occur.
An SEA can interrupt code that had interrupts masked and is treated as
an NMI. To aid this the page of address space for mapping APEI buffers
while in_nmi() is always reserved, and ghes_ioremap_pfn_nmi() is
changed to use the helper methods to find the prot_t to map with in
the same way as ghes_ioremap_pfn_irq().

Signed-off-by: Tyler Baicar 
CC: Jonathan (Zhixiong) Zhang 
Reviewed-by: James Morse 
Acked-by: Catalin Marinas 
---
 arch/arm64/Kconfig|  2 ++
 arch/arm64/mm/fault.c | 17 
 drivers/acpi/apei/Kconfig | 15 ++
 drivers/acpi/apei/ghes.c  | 70 +++
 include/acpi/ghes.h   |  7 +
 5 files changed, 105 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 3dcd7ec..8055997 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -19,6 +19,7 @@ config ARM64
select ARCH_HAS_STRICT_KERNEL_RWX
select ARCH_HAS_STRICT_MODULE_RWX
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
+   select ARCH_HAVE_NMI_SAFE_CMPXCHG if ACPI_APEI_SEA
select ARCH_USE_CMPXCHG_LOCKREF
select ARCH_SUPPORTS_ATOMIC_RMW
select ARCH_SUPPORTS_NUMA_BALANCING
@@ -92,6 +93,7 @@ config ARM64
select HAVE_IRQ_TIME_ACCOUNTING
select HAVE_MEMBLOCK
select HAVE_MEMBLOCK_NODE_MAP if NUMA
+   select HAVE_NMI if ACPI_APEI_SEA
select HAVE_PATA_PLATFORM
select HAVE_PERF_EVENTS
select HAVE_PERF_REGS
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 6697816..9b4e26f 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -42,6 +42,8 @@
 #include 
 #include 
 
+#include 
+
 struct fault_info {
int (*fn)(unsigned long addr, unsigned int esr,
  struct pt_regs *regs);
@@ -535,6 +537,21 @@ static int do_sea(unsigned long addr, unsigned int esr, 
struct pt_regs *regs)
pr_err("Synchronous External Abort: %s (0x%08x) at 0x%016lx\n",
inf->name, esr, addr);
 
+   /*
+* Synchronous aborts may interrupt code which had interrupts masked.
+* Before calling out into the wider kernel tell the interested
+* subsystems.
+*/
+   if (IS_ENABLED(CONFIG_ACPI_APEI_SEA)) {
+   if (interrupts_enabled(regs))
+   nmi_enter();
+
+   ghes_notify_sea();
+
+   if (interrupts_enabled(regs))
+   nmi_exit();
+   }
+
info.si_signo = SIGBUS;
info.si_errno = 0;
info.si_code  = 0;
diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
index b0140c8..de14d49 100644
--- a/drivers/acpi/apei/Kconfig
+++ b/drivers/acpi/apei/Kconfig
@@ -39,6 +39,21 @@ config ACPI_APEI_PCIEAER
  PCIe AER errors may be reported via APEI firmware first mode.
  Turn on this option to enable the corresponding support.
 
+config ACPI_APEI_SEA
+   bool "APEI Synchronous External Abort logging/recovering support"
+   depends on ARM64 && ACPI_APEI_GHES
+   default y
+   help
+ This option should be enabled if the system supports
+ firmware first handling of SEA (Synchronous External Abort).
+ SEA happens with certain faults of data abort or instruction
+ abort synchronous exceptions on ARMv8 systems. If a system
+ supports firmware first handling of SEA, the platform analyzes
+ and handles hardware error notifications from SEA, and it may then
+ form a HW error record for the OS to parse and handle. This
+ option allows the OS to look for such hardware error record, and
+ take appropriate action.
+
 config ACPI_APEI_MEMORY_FAILURE
bool "APEI memory error recovering support"
depends on ACPI_APEI && MEMORY_FAILURE
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 626552e..7e59a73 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -116,11 +116,7 @@ static inline bool is_hest_type_generic_v2(struct ghes 
*ghes)
  * Two virtual pages are used, one for IRQ/PROCESS context, the other for
  * NMI context (optionally).
  */
-#ifdef CONFIG_HAVE_ACPI_APEI_NMI
 #define GHES_IOREMAP_PAGES   2
-#else
-#define GHES_IOREMAP_PAGES   1
-#endif
 #define GHES_IOREMAP_IRQ_PAGE(base)(base)
 #define GHES_IOREMAP_NMI_PAGE(base)((base) + PAGE_SIZE)
 
@@ -159,10 +155,14 @@ static void ghes_ioremap_exit(void)
 static void __iomem *ghes_ioremap_pfn_nmi(u64 pfn)
 {
unsigned long vaddr;
+   phys_addr_t paddr;
+   pgprot_t prot;
 
vaddr = (unsigned lo

[PATCH V17 05/11] arm64: exception: handle Synchronous External Abort

2017-05-19 Thread Tyler Baicar
SEA exceptions are often caused by an uncorrected hardware
error, and are handled when data abort and instruction abort
exception classes have specific values for their Fault Status
Code.
When SEA occurs, before killing the process, report the error
in the kernel logs.
Update fault_info[] with specific SEA faults so that the
new SEA handler is used.

Signed-off-by: Tyler Baicar 
CC: Jonathan (Zhixiong) Zhang 
Reviewed-by: James Morse 
Acked-by: Catalin Marinas 
---
 arch/arm64/include/asm/esr.h |  1 +
 arch/arm64/mm/fault.c| 45 ++--
 2 files changed, 36 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index 85997c0..28bf02e 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -83,6 +83,7 @@
 #define ESR_ELx_WNR(UL(1) << 6)
 
 /* Shared ISS field definitions for Data/Instruction aborts */
+#define ESR_ELx_FnV(UL(1) << 10)
 #define ESR_ELx_EA (UL(1) << 9)
 #define ESR_ELx_S1PTW  (UL(1) << 7)
 
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 37b95df..6697816 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -522,6 +522,31 @@ static int do_bad(unsigned long addr, unsigned int esr, 
struct pt_regs *regs)
return 1;
 }
 
+/*
+ * This abort handler deals with Synchronous External Abort.
+ * It calls notifiers, and then returns "fault".
+ */
+static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
+{
+   struct siginfo info;
+   const struct fault_info *inf;
+
+   inf = esr_to_fault_info(esr);
+   pr_err("Synchronous External Abort: %s (0x%08x) at 0x%016lx\n",
+   inf->name, esr, addr);
+
+   info.si_signo = SIGBUS;
+   info.si_errno = 0;
+   info.si_code  = 0;
+   if (esr & ESR_ELx_FnV)
+   info.si_addr = 0;
+   else
+   info.si_addr  = (void __user *)addr;
+   arm64_notify_die("", regs, &info, esr);
+
+   return 0;
+}
+
 static const struct fault_info fault_info[] = {
{ do_bad,   SIGBUS,  0, "ttbr address size 
fault"   },
{ do_bad,   SIGBUS,  0, "level 1 address size 
fault"},
@@ -539,22 +564,22 @@ static int do_bad(unsigned long addr, unsigned int esr, 
struct pt_regs *regs)
{ do_page_fault,SIGSEGV, SEGV_ACCERR,   "level 1 permission 
fault"  },
{ do_page_fault,SIGSEGV, SEGV_ACCERR,   "level 2 permission 
fault"  },
{ do_page_fault,SIGSEGV, SEGV_ACCERR,   "level 3 permission 
fault"  },
-   { do_bad,   SIGBUS,  0, "synchronous external 
abort"},
+   { do_sea,   SIGBUS,  0, "synchronous external 
abort"},
{ do_bad,   SIGBUS,  0, "unknown 17"
},
{ do_bad,   SIGBUS,  0, "unknown 18"
},
{ do_bad,   SIGBUS,  0, "unknown 19"
},
-   { do_bad,   SIGBUS,  0, "synchronous external 
abort (translation table walk)" },
-   { do_bad,   SIGBUS,  0, "synchronous external 
abort (translation table walk)" },
-   { do_bad,   SIGBUS,  0, "synchronous external 
abort (translation table walk)" },
-   { do_bad,   SIGBUS,  0, "synchronous external 
abort (translation table walk)" },
-   { do_bad,   SIGBUS,  0, "synchronous parity 
error"  },
+   { do_sea,   SIGBUS,  0, "level 0 (translation 
table walk)"  },
+   { do_sea,   SIGBUS,  0, "level 1 (translation 
table walk)"  },
+   { do_sea,   SIGBUS,  0, "level 2 (translation 
table walk)"  },
+   { do_sea,   SIGBUS,  0, "level 3 (translation 
table walk)"  },
+   { do_sea,   SIGBUS,  0, "synchronous parity or 
ECC error" },
{ do_bad,   SIGBUS,  0, "unknown 25"
},
{ do_bad,   SIGBUS,  0, "unknown 26"
},
{ do_bad,   SIGBUS,  0, "unknown 27"
},
-   { do_bad,   SIGBUS,  0, "synchronous parity 
error (translation table walk)" },
-   { do_bad,   SIGBUS,  0, "synchronous parity 
error (translation table walk)" },
-   { do_bad,   SIGBUS,  0, "synchronous parity 
error (translation table walk)" },
-   { do_bad,   SIGBUS,  0, "synchronous parity 
error (translation table walk)" },
+   { do_sea,   SIGBUS,  0, "level 0 synchronous 
parity error (translatio

[PATCH V17 09/11] ras: acpi / apei: generate trace event for unrecognized CPER section

2017-05-19 Thread Tyler Baicar
The UEFI spec includes non-standard section type support in the
Common Platform Error Record. This is defined in section N.2.3 of
UEFI version 2.5.

Currently if the CPER section's type (UUID) does not match any
section type that the kernel knows how to parse, a trace event is
not generated.

Generate a trace event which contains the raw error data for
non-standard section type error records.

Signed-off-by: Tyler Baicar 
CC: Jonathan (Zhixiong) Zhang 
Tested-by: Shiju Jose 
---
 drivers/acpi/apei/ghes.c  | 27 +++
 drivers/ras/ras.c | 10 +-
 include/linux/ras.h   | 12 
 include/ras/ras_event.h   | 45 +
 include/uapi/linux/uuid.h |  6 --
 5 files changed, 93 insertions(+), 7 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index fadfb43..16adbc8 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -45,11 +45,14 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 #include 
 #include 
 #include 
+#include 
 
 #include "apei-internal.h"
 
@@ -460,12 +463,22 @@ static void ghes_do_proc(struct ghes *ghes,
 {
int sev, sec_sev;
struct acpi_hest_generic_data *gdata;
+   uuid_le sec_type;
+   uuid_le *fru_id = &NULL_UUID_LE;
+   char *fru_text = "";
 
sev = ghes_severity(estatus->error_severity);
apei_estatus_for_each_section(estatus, gdata) {
sec_sev = ghes_severity(gdata->error_severity);
-   if (!uuid_le_cmp(*(uuid_le *)gdata->section_type,
-CPER_SEC_PLATFORM_MEM)) {
+   sec_type = *(uuid_le *)gdata->section_type;
+
+   if (gdata->validation_bits & CPER_SEC_VALID_FRU_ID)
+   fru_id = (uuid_le *)gdata->fru_id;
+
+   if (gdata->validation_bits & CPER_SEC_VALID_FRU_TEXT)
+   fru_text = gdata->fru_text;
+
+   if (!uuid_le_cmp(sec_type, CPER_SEC_PLATFORM_MEM)) {
struct cper_sec_mem_err *mem_err = 
acpi_hest_get_payload(gdata);
 
ghes_edac_report_mem_error(ghes, sev, mem_err);
@@ -474,8 +487,7 @@ static void ghes_do_proc(struct ghes *ghes,
ghes_handle_memory_failure(gdata, sev);
}
 #ifdef CONFIG_ACPI_APEI_PCIEAER
-   else if (!uuid_le_cmp(*(uuid_le *)gdata->section_type,
- CPER_SEC_PCIE)) {
+   else if (!uuid_le_cmp(sec_type, CPER_SEC_PCIE)) {
struct cper_sec_pcie *pcie_err = 
acpi_hest_get_payload(gdata);
 
if (sev == GHES_SEV_RECOVERABLE &&
@@ -506,6 +518,13 @@ static void ghes_do_proc(struct ghes *ghes,
 
}
 #endif
+   else {
+   void *err = acpi_hest_get_payload(gdata);
+
+   log_non_standard_event(&sec_type, fru_id, fru_text,
+  sec_sev, err,
+  gdata->error_data_length);
+   }
}
 }
 
diff --git a/drivers/ras/ras.c b/drivers/ras/ras.c
index 94f8038..e87fd9e 100644
--- a/drivers/ras/ras.c
+++ b/drivers/ras/ras.c
@@ -7,11 +7,19 @@
 
 #include 
 #include 
+#include 
 
 #define CREATE_TRACE_POINTS
 #define TRACE_INCLUDE_PATH ../../include/ras
 #include 
 
+void log_non_standard_event(const uuid_le *sec_type, const uuid_le *fru_id,
+   const char *fru_text, const u8 sev, const u8 *err,
+   const u32 len)
+{
+   trace_non_standard_event(sec_type, fru_id, fru_text, sev, err, len);
+}
+
 static int __init ras_init(void)
 {
int rc = 0;
@@ -27,7 +35,7 @@ static int __init ras_init(void)
 EXPORT_TRACEPOINT_SYMBOL_GPL(extlog_mem_event);
 #endif
 EXPORT_TRACEPOINT_SYMBOL_GPL(mc_event);
-
+EXPORT_TRACEPOINT_SYMBOL_GPL(non_standard_event);
 
 int __init parse_ras_param(char *str)
 {
diff --git a/include/linux/ras.h b/include/linux/ras.h
index ffb1471..a7f3ed3 100644
--- a/include/linux/ras.h
+++ b/include/linux/ras.h
@@ -2,6 +2,7 @@
 #define __RAS_H__
 
 #include 
+#include 
 
 #ifdef CONFIG_DEBUG_FS
 int ras_userspace_consumers(void);
@@ -22,4 +23,15 @@ static inline void __init cec_init(void) { }
 static inline int cec_add_elem(u64 pfn){ return -ENODEV; }
 #endif
 
+#ifdef CONFIG_RAS
+void log_non_standard_event(const uuid_le *sec_type,
+   const uuid_le *fru_id, const char *fru_text,
+   const u8 sev, const u8 *err, const u32 len);
+#else
+static void log_non_standard_event(const uuid_le *sec_type,
+  const uuid_le *fru_id, const char *fru_text,
+  const u8 sev, const u8 *err,
+  const u32 len) { return; }
+#endif
+
 #endif /* __RAS_H__ */
diff --git a/include/ras/ras_event.h b/inclu

[PATCH V17 08/11] efi: print unrecognized CPER section

2017-05-19 Thread Tyler Baicar
UEFI spec allows for non-standard section in Common Platform Error
Record. This is defined in section N.2.3 of UEFI version 2.5.

Currently if the CPER section's type (UUID) does not match with
one of the section types that the kernel knows how to parse, the
section is skipped. Therefore, user is not able to see
such CPER data, for instance, error record of non-standard section.

This change prints out the raw data in hex in the dmesg buffer so
that non-standard sections are reported to the user. Non-standard
section type errors should be reported to the user because these
can include errors which are vendor specific. The data length is
taken from Error Data length field of Generic Error Data Entry.

The following is a sample output from dmesg:
 Hardware error from APEI Generic Hardware Error Source: 2
 It has been corrected by h/w and requires no further action
 event severity: corrected
  time: precise 2017-03-15 20:37:35
  Error 0, type: corrected
   section type: unknown, d2e2621c-f936-468d-0d84-15a4ed015c8b
   section length: 0x238
   : 4d415201 4d492031 453a4d45 435f4343  .RAM1 IMEM:ECC_C
   0010: 53515f45 44525f42    E_QSB_RD
   0020:      
   0030:   0101 0101  
   0040:   0005   
   0050: 0101  0001 0000  
...

The raw data from the error can then be decoded using vendor
specific tools.

Signed-off-by: Tyler Baicar 
CC: Jonathan (Zhixiong) Zhang 
Reviewed-by: James Morse 
---
 drivers/firmware/efi/cper.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index eac0854..d5a5855 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -585,8 +585,15 @@ static void cper_print_tstamp(const char *pfx,
else
goto err_section_too_small;
 #endif
-   } else
-   printk("%s""section type: unknown, %pUl\n", newpfx, sec_type);
+   } else {
+   const void *err = acpi_hest_get_payload(gdata);
+
+   printk("%ssection type: unknown, %pUl\n", newpfx, sec_type);
+   printk("%ssection length: %#x\n", newpfx,
+  gdata->error_data_length);
+   print_hex_dump(newpfx, "", DUMP_PREFIX_OFFSET, 16, 4, err,
+  gdata->error_data_length, true);
+   }
 
return;
 
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH V17 07/11] acpi: apei: panic OS with fatal error status block

2017-05-19 Thread Tyler Baicar
From: "Jonathan (Zhixiong) Zhang" 

Even if an error status block's severity is fatal, the kernel does not
honor the severity level and panic.

With the firmware first model, the platform could inform the OS about a
fatal hardware error through the non-NMI GHES notification type. The OS
should panic when a hardware error record is received with this
severity.

Call panic() after CPER data in error status block is printed if
severity is fatal, before each error section is handled.

Signed-off-by: Jonathan (Zhixiong) Zhang 
Signed-off-by: Tyler Baicar 
Reviewed-by: James Morse 
---
 drivers/acpi/apei/ghes.c | 36 +---
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 7e59a73..fadfb43 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -135,6 +135,8 @@ static inline bool is_hest_type_generic_v2(struct ghes 
*ghes)
 static struct ghes_estatus_cache 
*ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
 static atomic_t ghes_estatus_cache_alloced;
 
+static int ghes_panic_timeout __read_mostly = 30;
+
 static int ghes_ioremap_init(void)
 {
ghes_ioremap_area = __get_vm_area(PAGE_SIZE * GHES_IOREMAP_PAGES,
@@ -691,6 +693,16 @@ static int ghes_ack_error(struct acpi_hest_generic_v2 *gv2)
return apei_write(val, &gv2->read_ack_register);
 }
 
+static void __ghes_panic(struct ghes *ghes)
+{
+   __ghes_print_estatus(KERN_EMERG, ghes->generic, ghes->estatus);
+
+   /* reboot to log the error! */
+   if (!panic_timeout)
+   panic_timeout = ghes_panic_timeout;
+   panic("Fatal hardware error!");
+}
+
 static int ghes_proc(struct ghes *ghes)
 {
int rc;
@@ -698,6 +710,11 @@ static int ghes_proc(struct ghes *ghes)
rc = ghes_read_estatus(ghes, 0);
if (rc)
goto out;
+
+   if (ghes_severity(ghes->estatus->error_severity) >= GHES_SEV_PANIC) {
+   __ghes_panic(ghes);
+   }
+
if (!ghes_estatus_cached(ghes->estatus)) {
if (ghes_print_estatus(NULL, ghes->generic, ghes->estatus))
ghes_estatus_cache_add(ghes->generic, ghes->estatus);
@@ -838,8 +855,6 @@ static inline void ghes_sea_remove(struct ghes *ghes)
 
 static LIST_HEAD(ghes_nmi);
 
-static int ghes_panic_timeout  __read_mostly = 30;
-
 static void ghes_proc_in_irq(struct irq_work *irq_work)
 {
struct llist_node *llnode, *next;
@@ -925,18 +940,6 @@ static void __process_error(struct ghes *ghes)
 #endif
 }
 
-static void __ghes_panic(struct ghes *ghes)
-{
-   oops_begin();
-   ghes_print_queued_estatus();
-   __ghes_print_estatus(KERN_EMERG, ghes->generic, ghes->estatus);
-
-   /* reboot to log the error! */
-   if (panic_timeout == 0)
-   panic_timeout = ghes_panic_timeout;
-   panic("Fatal hardware error!");
-}
-
 static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
 {
struct ghes *ghes;
@@ -954,8 +957,11 @@ static int ghes_notify_nmi(unsigned int cmd, struct 
pt_regs *regs)
}
 
sev = ghes_severity(ghes->estatus->error_severity);
-   if (sev >= GHES_SEV_PANIC)
+   if (sev >= GHES_SEV_PANIC) {
+   oops_begin();
+   ghes_print_queued_estatus();
__ghes_panic(ghes);
+   }
 
if (!(ghes->flags & GHES_TO_CLEAR))
continue;
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH V17 11/11] arm/arm64: KVM: add guest SEA support

2017-05-19 Thread Tyler Baicar
Currently external aborts are unsupported by the guest abort
handling. Add handling for SEAs so that the host kernel reports
SEAs which occur in the guest kernel.

When an SEA occurs in the guest kernel, the guest exits and is
routed to kvm_handle_guest_abort(). Prior to this patch, a print
message of an unsupported FSC would be printed and nothing else
would happen. With this patch, the code gets routed to the APEI
handling of SEAs in the host kernel to report the SEA information.

Signed-off-by: Tyler Baicar 
Acked-by: Catalin Marinas 
Acked-by: Marc Zyngier 
Acked-by: Christoffer Dall 
---
 arch/arm/include/asm/kvm_arm.h   | 10 ++
 arch/arm/include/asm/system_misc.h   |  5 +
 arch/arm64/include/asm/kvm_arm.h | 10 ++
 arch/arm64/include/asm/system_misc.h |  2 ++
 arch/arm64/mm/fault.c| 22 --
 drivers/acpi/apei/ghes.c | 17 +++--
 include/acpi/ghes.h  |  2 +-
 virt/kvm/arm/mmu.c   | 36 +---
 8 files changed, 92 insertions(+), 12 deletions(-)

diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index a3f0b3d..ebf020b 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -187,6 +187,16 @@
 #define FSC_FAULT  (0x04)
 #define FSC_ACCESS (0x08)
 #define FSC_PERM   (0x0c)
+#define FSC_SEA(0x10)
+#define FSC_SEA_TTW0   (0x14)
+#define FSC_SEA_TTW1   (0x15)
+#define FSC_SEA_TTW2   (0x16)
+#define FSC_SEA_TTW3   (0x17)
+#define FSC_SECC   (0x18)
+#define FSC_SECC_TTW0  (0x1c)
+#define FSC_SECC_TTW1  (0x1d)
+#define FSC_SECC_TTW2  (0x1e)
+#define FSC_SECC_TTW3  (0x1f)
 
 /* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */
 #define HPFAR_MASK (~0xf)
diff --git a/arch/arm/include/asm/system_misc.h 
b/arch/arm/include/asm/system_misc.h
index a3d61ad..8c4a89f 100644
--- a/arch/arm/include/asm/system_misc.h
+++ b/arch/arm/include/asm/system_misc.h
@@ -22,6 +22,11 @@
 
 extern unsigned int user_debug;
 
+static inline int handle_guest_sea(phys_addr_t addr, unsigned int esr)
+{
+   return -1;
+}
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __ASM_ARM_SYSTEM_MISC_H */
diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 6e99978..61d694c 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -204,6 +204,16 @@
 #define FSC_FAULT  ESR_ELx_FSC_FAULT
 #define FSC_ACCESS ESR_ELx_FSC_ACCESS
 #define FSC_PERM   ESR_ELx_FSC_PERM
+#define FSC_SEAESR_ELx_FSC_EXTABT
+#define FSC_SEA_TTW0   (0x14)
+#define FSC_SEA_TTW1   (0x15)
+#define FSC_SEA_TTW2   (0x16)
+#define FSC_SEA_TTW3   (0x17)
+#define FSC_SECC   (0x18)
+#define FSC_SECC_TTW0  (0x1c)
+#define FSC_SECC_TTW1  (0x1d)
+#define FSC_SECC_TTW2  (0x1e)
+#define FSC_SECC_TTW3  (0x1f)
 
 /* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */
 #define HPFAR_MASK (~UL(0xf))
diff --git a/arch/arm64/include/asm/system_misc.h 
b/arch/arm64/include/asm/system_misc.h
index bc81243..95aa442 100644
--- a/arch/arm64/include/asm/system_misc.h
+++ b/arch/arm64/include/asm/system_misc.h
@@ -56,6 +56,8 @@ void hook_debug_fault_code(int nr, int (*fn)(unsigned long, 
unsigned int,
__show_ratelimited; \
 })
 
+int handle_guest_sea(phys_addr_t addr, unsigned int esr);
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* __ASM_SYSTEM_MISC_H */
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 9b4e26f..1ce62cc 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -532,6 +532,7 @@ static int do_sea(unsigned long addr, unsigned int esr, 
struct pt_regs *regs)
 {
struct siginfo info;
const struct fault_info *inf;
+   int ret = 0;
 
inf = esr_to_fault_info(esr);
pr_err("Synchronous External Abort: %s (0x%08x) at 0x%016lx\n",
@@ -546,7 +547,7 @@ static int do_sea(unsigned long addr, unsigned int esr, 
struct pt_regs *regs)
if (interrupts_enabled(regs))
nmi_enter();
 
-   ghes_notify_sea();
+   ret = ghes_notify_sea();
 
if (interrupts_enabled(regs))
nmi_exit();
@@ -561,7 +562,7 @@ static int do_sea(unsigned long addr, unsigned int esr, 
struct pt_regs *regs)
info.si_addr  = (void __user *)addr;
arm64_notify_die("", regs, &info, esr);
 
-   return 0;
+   return ret;
 }
 
 static const struct fault_info fault_info[] = {
@@ -632,6 +633,23 @@ static int do_sea(unsigned long addr, unsigned int esr, 
struct pt_regs *regs)
 };
 
 /*
+ * Handle Synchronous External Aborts that occur in a guest kernel.
+ *
+ * The return value will be zero if the SEA was successfully handled
+ * and non-zero if there was an error processing the error or there was
+ * no error to process.
+ */
+int handle_guest_sea(phys_addr_t addr, unsigned int esr)
+{
+

[PATCH V17 10/11] trace, ras: add ARM processor error trace event

2017-05-19 Thread Tyler Baicar
Currently there are trace events for the various RAS
errors with the exception of ARM processor type errors.
Add a new trace event for such errors so that the user
will know when they occur. These trace events are
consistent with the ARM processor error section type
defined in UEFI 2.6 spec section N.2.4.4.

Signed-off-by: Tyler Baicar 
Acked-by: Steven Rostedt 
Reviewed-by: Xie XiuQi 
---
 drivers/acpi/apei/ghes.c|  6 +-
 drivers/firmware/efi/cper.c |  1 +
 drivers/ras/ras.c   |  6 ++
 include/linux/ras.h |  3 +++
 include/ras/ras_event.h | 45 +
 5 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 16adbc8..a0ab5f3 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -518,7 +518,11 @@ static void ghes_do_proc(struct ghes *ghes,
 
}
 #endif
-   else {
+   else if (!uuid_le_cmp(sec_type, CPER_SEC_PROC_ARM)) {
+   struct cper_sec_proc_arm *err = 
acpi_hest_get_payload(gdata);
+
+   log_arm_hw_error(err);
+   } else {
void *err = acpi_hest_get_payload(gdata);
 
log_non_standard_event(&sec_type, fru_id, fru_text,
diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index d5a5855..48a8f69 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define INDENT_SP  " "
 
diff --git a/drivers/ras/ras.c b/drivers/ras/ras.c
index e87fd9e..39701a5 100644
--- a/drivers/ras/ras.c
+++ b/drivers/ras/ras.c
@@ -20,6 +20,11 @@ void log_non_standard_event(const uuid_le *sec_type, const 
uuid_le *fru_id,
trace_non_standard_event(sec_type, fru_id, fru_text, sev, err, len);
 }
 
+void log_arm_hw_error(struct cper_sec_proc_arm *err)
+{
+   trace_arm_event(err);
+}
+
 static int __init ras_init(void)
 {
int rc = 0;
@@ -36,6 +41,7 @@ static int __init ras_init(void)
 #endif
 EXPORT_TRACEPOINT_SYMBOL_GPL(mc_event);
 EXPORT_TRACEPOINT_SYMBOL_GPL(non_standard_event);
+EXPORT_TRACEPOINT_SYMBOL_GPL(arm_event);
 
 int __init parse_ras_param(char *str)
 {
diff --git a/include/linux/ras.h b/include/linux/ras.h
index a7f3ed3..7a14658 100644
--- a/include/linux/ras.h
+++ b/include/linux/ras.h
@@ -3,6 +3,7 @@
 
 #include 
 #include 
+#include 
 
 #ifdef CONFIG_DEBUG_FS
 int ras_userspace_consumers(void);
@@ -27,11 +28,13 @@ static inline void __init cec_init(void){ }
 void log_non_standard_event(const uuid_le *sec_type,
const uuid_le *fru_id, const char *fru_text,
const u8 sev, const u8 *err, const u32 len);
+void log_arm_hw_error(struct cper_sec_proc_arm *err);
 #else
 static void log_non_standard_event(const uuid_le *sec_type,
   const uuid_le *fru_id, const char *fru_text,
   const u8 sev, const u8 *err,
   const u32 len) { return; }
+static void log_arm_hw_error(struct cper_sec_proc_arm *err) { return; }
 #endif
 
 #endif /* __RAS_H__ */
diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
index 4f79ba9..429f46f 100644
--- a/include/ras/ras_event.h
+++ b/include/ras/ras_event.h
@@ -162,6 +162,51 @@
 );
 
 /*
+ * ARM Processor Events Report
+ *
+ * This event is generated when hardware detects an ARM processor error
+ * has occurred. UEFI 2.6 spec section N.2.4.4.
+ */
+TRACE_EVENT(arm_event,
+
+   TP_PROTO(const struct cper_sec_proc_arm *proc),
+
+   TP_ARGS(proc),
+
+   TP_STRUCT__entry(
+   __field(u64, mpidr)
+   __field(u64, midr)
+   __field(u32, running_state)
+   __field(u32, psci_state)
+   __field(u8, affinity)
+   ),
+
+   TP_fast_assign(
+   if (proc->validation_bits & CPER_ARM_VALID_AFFINITY_LEVEL)
+   __entry->affinity = proc->affinity_level;
+   else
+   __entry->affinity = ~0;
+   if (proc->validation_bits & CPER_ARM_VALID_MPIDR)
+   __entry->mpidr = proc->mpidr;
+   else
+   __entry->mpidr = 0ULL;
+   __entry->midr = proc->midr;
+   if (proc->validation_bits & CPER_ARM_VALID_RUNNING_STATE) {
+   __entry->running_state = proc->running_state;
+   __entry->psci_state = proc->psci_state;
+   } else {
+   __entry->running_state = ~0;
+   __entry->psci_state = ~0;
+   }
+   ),
+
+   TP_printk("affinity level: %d; MPIDR: %016llx; MIDR: %016llx; "
+ "running state: %d; PSCI state: %d",
+ __entry->affinity, __entry->mpidr, __entry->midr,
+ __entry->running_state, 

[PATCH V17 03/11] cper: add timestamp print to CPER status printing

2017-05-19 Thread Tyler Baicar
The ACPI 6.1 spec added a timestamp to the generic error data
entry structure. Print the timestamp out when printing out the
error information.

Signed-off-by: Tyler Baicar 
CC: Jonathan (Zhixiong) Zhang 
---
 drivers/firmware/efi/cper.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index 9024757..229cf92 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -32,6 +32,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 
 #define INDENT_SP  " "
@@ -387,6 +389,27 @@ static void cper_print_pcie(const char *pfx, const struct 
cper_sec_pcie *pcie,
pfx, pcie->bridge.secondary_status, pcie->bridge.control);
 }
 
+static void cper_print_tstamp(const char *pfx,
+  struct acpi_hest_generic_data_v300 *gdata)
+{
+   __u8 hour, min, sec, day, mon, year, century, *timestamp;
+
+   if (gdata->validation_bits & ACPI_HEST_GEN_VALID_TIMESTAMP) {
+   timestamp = (__u8 *)&(gdata->time_stamp);
+   sec   = bcd2bin(timestamp[0]);
+   min   = bcd2bin(timestamp[1]);
+   hour  = bcd2bin(timestamp[2]);
+   day   = bcd2bin(timestamp[4]);
+   mon   = bcd2bin(timestamp[5]);
+   year  = bcd2bin(timestamp[6]);
+   century   = bcd2bin(timestamp[7]);
+
+   printk("%s%ststamp: %02d%02d-%02d-%02d %02d:%02d:%02d\n", pfx,
+  (timestamp[3] & 0x1 ? "precise " : "imprecise "),
+  century, year, mon, day, hour, min, sec);
+   }
+}
+
 static void
 cper_estatus_print_section(const char *pfx, struct acpi_hest_generic_data 
*gdata,
   int sec_no)
@@ -395,6 +418,9 @@ static void cper_print_pcie(const char *pfx, const struct 
cper_sec_pcie *pcie,
__u16 severity;
char newpfx[64];
 
+   if (acpi_hest_get_version(gdata) >= 3)
+   cper_print_tstamp(pfx, (struct acpi_hest_generic_data_v300 
*)gdata);
+
severity = gdata->error_severity;
printk("%s""Error %d, type: %s\n", pfx, sec_no,
   cper_severity_str(severity));
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH V17 01/11] acpi: apei: read ack upon ghes record consumption

2017-05-19 Thread Tyler Baicar
A RAS (Reliability, Availability, Serviceability) controller
may be a separate processor running in parallel with OS
execution, and may generate error records for consumption by
the OS. If the RAS controller produces multiple error records,
then they may be overwritten before the OS has consumed them.

The Generic Hardware Error Source (GHES) v2 structure
introduces the capability for the OS to acknowledge the
consumption of the error record generated by the RAS
controller. A RAS controller supporting GHESv2 shall wait for
the acknowledgment before writing a new error record, thus
eliminating the race condition.

Add support for parsing of GHESv2 sub-tables as well.

Signed-off-by: Tyler Baicar 
CC: Jonathan (Zhixiong) Zhang 
Reviewed-by: James Morse 
---
 drivers/acpi/apei/ghes.c | 59 +---
 drivers/acpi/apei/hest.c |  7 --
 include/acpi/ghes.h  |  5 +++-
 3 files changed, 65 insertions(+), 6 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index d0855c0..c072acf 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -46,6 +46,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -80,6 +81,11 @@
((struct acpi_hest_generic_status *)\
 ((struct ghes_estatus_node *)(estatus_node) + 1))
 
+static inline bool is_hest_type_generic_v2(struct ghes *ghes)
+{
+   return ghes->generic->header.type == ACPI_HEST_TYPE_GENERIC_ERROR_V2;
+}
+
 /*
  * This driver isn't really modular, however for the time being,
  * continuing to use module_param is the easiest way to remain
@@ -240,6 +246,16 @@ static int ghes_estatus_pool_expand(unsigned long len)
return 0;
 }
 
+static int map_gen_v2(struct ghes *ghes)
+{
+   return apei_map_generic_address(&ghes->generic_v2->read_ack_register);
+}
+
+static void unmap_gen_v2(struct ghes *ghes)
+{
+   apei_unmap_generic_address(&ghes->generic_v2->read_ack_register);
+}
+
 static struct ghes *ghes_new(struct acpi_hest_generic *generic)
 {
struct ghes *ghes;
@@ -249,10 +265,17 @@ static struct ghes *ghes_new(struct acpi_hest_generic 
*generic)
ghes = kzalloc(sizeof(*ghes), GFP_KERNEL);
if (!ghes)
return ERR_PTR(-ENOMEM);
+
ghes->generic = generic;
+   if (is_hest_type_generic_v2(ghes)) {
+   rc = map_gen_v2(ghes);
+   if (rc)
+   goto err_free;
+   }
+
rc = apei_map_generic_address(&generic->error_status_address);
if (rc)
-   goto err_free;
+   goto err_unmap_read_ack_addr;
error_block_length = generic->error_block_length;
if (error_block_length > GHES_ESTATUS_MAX_SIZE) {
pr_warning(FW_WARN GHES_PFX
@@ -264,13 +287,16 @@ static struct ghes *ghes_new(struct acpi_hest_generic 
*generic)
ghes->estatus = kmalloc(error_block_length, GFP_KERNEL);
if (!ghes->estatus) {
rc = -ENOMEM;
-   goto err_unmap;
+   goto err_unmap_status_addr;
}
 
return ghes;
 
-err_unmap:
+err_unmap_status_addr:
apei_unmap_generic_address(&generic->error_status_address);
+err_unmap_read_ack_addr:
+   if (is_hest_type_generic_v2(ghes))
+   unmap_gen_v2(ghes);
 err_free:
kfree(ghes);
return ERR_PTR(rc);
@@ -280,6 +306,8 @@ static void ghes_fini(struct ghes *ghes)
 {
kfree(ghes->estatus);
apei_unmap_generic_address(&ghes->generic->error_status_address);
+   if (is_hest_type_generic_v2(ghes))
+   unmap_gen_v2(ghes);
 }
 
 static inline int ghes_severity(int severity)
@@ -649,6 +677,21 @@ static void ghes_estatus_cache_add(
rcu_read_unlock();
 }
 
+static int ghes_ack_error(struct acpi_hest_generic_v2 *gv2)
+{
+   int rc;
+   u64 val = 0;
+
+   rc = apei_read(&val, &gv2->read_ack_register);
+   if (rc)
+   return rc;
+
+   val &= gv2->read_ack_preserve << gv2->read_ack_register.bit_offset;
+   val |= gv2->read_ack_write<< gv2->read_ack_register.bit_offset;
+
+   return apei_write(val, &gv2->read_ack_register);
+}
+
 static int ghes_proc(struct ghes *ghes)
 {
int rc;
@@ -661,6 +704,16 @@ static int ghes_proc(struct ghes *ghes)
ghes_estatus_cache_add(ghes->generic, ghes->estatus);
}
ghes_do_proc(ghes, ghes->estatus);
+
+   /*
+* GHESv2 type HEST entries introduce support for error acknowledgment,
+* so only acknowledge the error if this support is present.
+*/
+   if (is_hest_type_generic_v2(ghes)) {
+   rc = ghes_ack_error(ghes->generic_v2);
+   if (rc)
+   return rc;
+   }
 out:
ghes_clear_estatus(ghes);
return rc;
diff --git a/drivers/acpi/apei/hest.c b/drivers/acpi/apei/hest.c
index 8f2a98e..456b488 100644
--- a/drivers/acpi/apei/hest.c
++

[PATCH V17 00/11] Add UEFI 2.6 and ACPI 6.1 updates for RAS on ARM64

2017-05-19 Thread Tyler Baicar
When a memory error, CPU error, PCIe error, or other type of hardware error
that's covered by RAS occurs, firmware should populate the shared GHES memory
location with the proper GHES structures to notify the OS of the error.
For example, platforms that implement firmware first handling may implement
separate GHES sources for corrected errors and uncorrected errors. If the
error is an uncorrectable error, then the firmware will notify the OS
immediately since the error needs to be handled ASAP. The OS will then be able
to take the appropriate action needed such as offlining a page. If the error
is a corrected error, then the firmware will not interrupt the OS immediately.
Instead, the OS will see and report the error the next time it's GHES timer
expires. The kernel will first parse the GHES structures and report the errors
through the kernel logs and then notify the user space through RAS trace
events. This allows user space applications such as RAS Daemon to see the
errors and report them however the user desires. This patchset extends the
kernel functionality for RAS errors based on updates in the UEFI 2.6 and
ACPI 6.1 specifications.

An example flow from firmware to user space could be:

 +---+
   +>|   |
   | |  GHES polling |--+
+-+  |source |  |   +---+   ++
| |  +---+  |   |  Kernel GHES  |   ||
|  Firmware   | +-->|  CPER AER and |-->|  RAS trace |
| |  +---+  |   |  EDAC drivers |   |   event|
+-+  |   |  |   +---+   ++
   | |  GHES sci |--+
   +>|   source  |
 +---+

Add support for Generic Hardware Error Source (GHES) v2, which introduces the
capability for the OS to acknowledge the consumption of the error record
generated by the Reliability, Availability and Serviceability (RAS) controller.
This eliminates potential race conditions between the OS and the RAS controller.

Add support for the timestamp field added to the Generic Error Data Entry v3,
allowing the OS to log the time that the error is generated by the firmware,
rather than the time the error is consumed. This improves the correctness of
event sequences when analyzing error logs. The timestamp is added in
ACPI 6.1, reference Table 18-343 Generic Error Data Entry.

Add support for ARMv8 Common Platform Error Record (CPER) per UEFI 2.6
specification. ARMv8 specific processor error information is reported as part of
the CPER records.  This provides more detail on for processor error logs. This
can help describe ARMv8 cache, tlb, and bus errors.

Synchronous External Abort (SEA) represents a specific processor error condition
in ARM systems. A handler is added to recognize SEA errors, and a notifier is
added to parse and report the errors before the process is killed. Refer to
section N.2.1.1 in the Common Platform Error Record appendix of the UEFI 2.6
specification.

Currently the kernel ignores CPER records that are unrecognized.
On the other hand, UEFI spec allows for non-standard (eg. vendor
proprietary) error section type in CPER (Common Platform Error Record),
as defined in section N2.3 of UEFI version 2.5. Therefore, user
is not able to see hardware error data of non-standard section.

If section Type field of Generic Error Data Entry is unrecognized,
prints out the raw data in dmesg buffer, and also adds a tracepoint
for reporting such hardware errors.

Currently even if an error status block's severity is fatal, the kernel
does not honor the severity level and panic. With the firmware first
model, the platform could inform the OS about a fatal hardware error
through the non-NMI GHES notification type. The OS should panic when a
hardware error record is received with this severity.

Add support to handle SEAs that occur while a KVM guest kernel is
running. Currently these are unsupported by the guest abort handling.

V17:Rebase on tip
Change trace event helper function names
Remove unneeded prefixes from commit text

V16:Rebase on 4.11
Change helper functions from #defines to inline functions
Address checkpatch warnings which make sense
Various parameter/variable name changes and spacing changes for better code 
readibility
Comment why only GHESv2 needs to acknowledge the error records
Define and set error structures on the same line
Change timestamp printing function name to cper_print_tstamp
Update timestamp to be a single print again and specify when it's precise 
or imprecise
Only print section length when the length check fails in the ARM CPER 
record parsing
Remove version and length print of ARM error info structures
Spell out Multiprocessor Affinity Register (MPIDR) and Power State 
Coordination Interface (PSCI)
Combine invalid context prints for ARM context info parsing
  

Re: [PATCH v5 32/32] x86/mm: Add support to make use of Secure Memory Encryption

2017-05-19 Thread Borislav Petkov
On Fri, May 19, 2017 at 03:16:51PM -0500, Josh Poimboeuf wrote:
> I'm the stack validation guy, not the stack protection guy :-)

LOL. I thought you were *the* stacks guy. :-)))

But once you've validated it, you could protect it then too. :-)

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 32/32] x86/mm: Add support to make use of Secure Memory Encryption

2017-05-19 Thread Josh Poimboeuf
On Fri, May 19, 2017 at 01:30:05PM +0200, Borislav Petkov wrote:
> > it is called so early. I can get past it by adding:
> > 
> > CFLAGS_mem_encrypt.o := $(nostackp)
> > 
> > in the arch/x86/mm/Makefile, but that obviously eliminates the support
> > for the whole file.  Would it be better to split out the sme_enable()
> > and other boot routines into a separate file or just apply the
> > $(nostackp) to the whole file?
> 
> Josh might have a better idea here... CCed.

I'm the stack validation guy, not the stack protection guy :-)

But there is a way to disable compiler options on a per-function basis
with the gcc __optimize__ function attribute.  For example:

  __attribute__((__optimize__("no-stack-protector")))

-- 
Josh
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 23/32] swiotlb: Add warnings for use of bounce buffers with SME

2017-05-19 Thread Tom Lendacky

On 5/16/2017 9:52 AM, Borislav Petkov wrote:

On Tue, Apr 18, 2017 at 04:20:19PM -0500, Tom Lendacky wrote:

Add warnings to let the user know when bounce buffers are being used for
DMA when SME is active.  Since the bounce buffers are not in encrypted
memory, these notifications are to allow the user to determine some
appropriate action - if necessary.

Signed-off-by: Tom Lendacky 
---
 arch/x86/include/asm/mem_encrypt.h |   11 +++
 include/linux/dma-mapping.h|   11 +++
 include/linux/mem_encrypt.h|6 ++
 lib/swiotlb.c  |3 +++
 4 files changed, 31 insertions(+)

diff --git a/arch/x86/include/asm/mem_encrypt.h 
b/arch/x86/include/asm/mem_encrypt.h
index 0637b4b..b406df2 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -26,6 +26,11 @@ static inline bool sme_active(void)
return !!sme_me_mask;
 }

+static inline u64 sme_dma_mask(void)
+{
+   return ((u64)sme_me_mask << 1) - 1;
+}
+
 void __init sme_early_encrypt(resource_size_t paddr,
  unsigned long size);
 void __init sme_early_decrypt(resource_size_t paddr,
@@ -50,6 +55,12 @@ static inline bool sme_active(void)
 {
return false;
 }
+
+static inline u64 sme_dma_mask(void)
+{
+   return 0ULL;
+}
+
 #endif

 static inline void __init sme_early_encrypt(resource_size_t paddr,
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 0977317..f825870 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 

 /**
  * List of possible attributes associated with a DMA mapping. The semantics
@@ -577,6 +578,11 @@ static inline int dma_set_mask(struct device *dev, u64 
mask)

if (!dev->dma_mask || !dma_supported(dev, mask))
return -EIO;
+
+   if (sme_active() && (mask < sme_dma_mask()))
+   dev_warn_ratelimited(dev,
+"SME is active, device will require DMA bounce 
buffers\n");


Bah, no need to break that line - just let it stick out. Ditto for the
others.


Ok.

Thanks,
Tom




--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 22/32] x86, swiotlb: DMA support for memory encryption

2017-05-19 Thread Tom Lendacky

On 5/16/2017 9:27 AM, Borislav Petkov wrote:

On Tue, Apr 18, 2017 at 04:20:10PM -0500, Tom Lendacky wrote:

Since DMA addresses will effectively look like 48-bit addresses when the
memory encryption mask is set, SWIOTLB is needed if the DMA mask of the
device performing the DMA does not support 48-bits. SWIOTLB will be
initialized to create decrypted bounce buffers for use by these devices.


Use a verb in the subject:

Subject: x86, swiotlb: Add memory encryption support

or similar.


Will do.

Thanks,
Tom




--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 19/32] x86/mm: Add support to access persistent memory in the clear

2017-05-19 Thread Tom Lendacky

On 5/16/2017 9:04 AM, Borislav Petkov wrote:

On Tue, Apr 18, 2017 at 04:19:42PM -0500, Tom Lendacky wrote:

Persistent memory is expected to persist across reboots. The encryption
key used by SME will change across reboots which will result in corrupted
persistent memory.  Persistent memory is handed out by block devices
through memory remapping functions, so be sure not to map this memory as
encrypted.

Signed-off-by: Tom Lendacky 
---
 arch/x86/mm/ioremap.c |   31 ++-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index bce0604..55317ba 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -425,17 +425,46 @@ void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr)
  * Examine the physical address to determine if it is an area of memory
  * that should be mapped decrypted.  If the memory is not part of the
  * kernel usable area it was accessed and created decrypted, so these
- * areas should be mapped decrypted.
+ * areas should be mapped decrypted. And since the encryption key can
+ * change across reboots, persistent memory should also be mapped
+ * decrypted.
  */
 static bool memremap_should_map_decrypted(resource_size_t phys_addr,
  unsigned long size)
 {
+   int is_pmem;
+
+   /*
+* Check if the address is part of a persistent memory region.
+* This check covers areas added by E820, EFI and ACPI.
+*/
+   is_pmem = region_intersects(phys_addr, size, IORESOURCE_MEM,
+   IORES_DESC_PERSISTENT_MEMORY);
+   if (is_pmem != REGION_DISJOINT)
+   return true;
+
+   /*
+* Check if the non-volatile attribute is set for an EFI
+* reserved area.
+*/
+   if (efi_enabled(EFI_BOOT)) {
+   switch (efi_mem_type(phys_addr)) {
+   case EFI_RESERVED_TYPE:
+   if (efi_mem_attributes(phys_addr) & EFI_MEMORY_NV)
+   return true;
+   break;
+   default:
+   break;
+   }
+   }
+
/* Check if the address is outside kernel usable area */
switch (e820__get_entry_type(phys_addr, phys_addr + size - 1)) {
case E820_TYPE_RESERVED:
case E820_TYPE_ACPI:
case E820_TYPE_NVS:
case E820_TYPE_UNUSABLE:
+   case E820_TYPE_PRAM:


Can't you simply add:

case E820_TYPE_PMEM:

here too and thus get rid of the region_intersects() thing above?

Because, for example, e820_type_to_iores_desc() maps E820_TYPE_PMEM to
IORES_DESC_PERSISTENT_MEMORY so those should be equivalent...


I'll have to double-check this, but I believe that when persistent
memory is identified through the NFIT table it adds it as a resource
but doesn't add it as an e820 entry so I can't rely on the type being
returned as E820_TYPE_PMEM by e820__get_entry_type().

Thanks,
Tom




--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] efi-pstore: Fix write/erase id tracking

2017-05-19 Thread Kees Cook
Prior to the pstore interface refactoring, the "id" generated during
a backend pstore_write() was only retained by the internal pstore
inode tracking list. Additionally the "part" was ignored, so EFI
would encode this in the id. This corrects the misunderstandings
and correctly sets "id" during pstore_write(), and uses "part"
directly during pstore_erase().

Reported-by: Marta Lofstedt 
Fixes: 76cc9580e3fb ("pstore: Replace arguments for write() API")
Fixes: a61072aae693 ("pstore: Replace arguments for erase() API")
Signed-off-by: Kees Cook 
---
Since the pstore refactor broke this, I intend to push this via the
pstore tree.
---
 drivers/firmware/efi/efi-pstore.c | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/drivers/firmware/efi/efi-pstore.c 
b/drivers/firmware/efi/efi-pstore.c
index 44148fd4c9f2..dda2e96328c0 100644
--- a/drivers/firmware/efi/efi-pstore.c
+++ b/drivers/firmware/efi/efi-pstore.c
@@ -53,6 +53,7 @@ static int efi_pstore_read_func(struct efivar_entry *entry,
if (sscanf(name, "dump-type%u-%u-%d-%lu-%c",
   &record->type, &part, &cnt, &time, &data_type) == 5) {
record->id = generic_id(time, part, cnt);
+   record->part = part;
record->count = cnt;
record->time.tv_sec = time;
record->time.tv_nsec = 0;
@@ -64,6 +65,7 @@ static int efi_pstore_read_func(struct efivar_entry *entry,
} else if (sscanf(name, "dump-type%u-%u-%d-%lu",
   &record->type, &part, &cnt, &time) == 4) {
record->id = generic_id(time, part, cnt);
+   record->part = part;
record->count = cnt;
record->time.tv_sec = time;
record->time.tv_nsec = 0;
@@ -77,6 +79,7 @@ static int efi_pstore_read_func(struct efivar_entry *entry,
 * multiple logs, remains.
 */
record->id = generic_id(time, part, 0);
+   record->part = part;
record->count = 0;
record->time.tv_sec = time;
record->time.tv_nsec = 0;
@@ -241,9 +244,15 @@ static int efi_pstore_write(struct pstore_record *record)
efi_guid_t vendor = LINUX_EFI_CRASH_GUID;
int i, ret = 0;
 
+   record->time.tv_sec = get_seconds();
+   record->time.tv_nsec = 0;
+
+   record->id = generic_id(record->time.tv_sec, record->part,
+   record->count);
+
snprintf(name, sizeof(name), "dump-type%u-%u-%d-%lu-%c",
 record->type, record->part, record->count,
-get_seconds(), record->compressed ? 'C' : 'D');
+record->time.tv_sec, record->compressed ? 'C' : 'D');
 
for (i = 0; i < DUMP_NAME_LEN; i++)
efi_name[i] = name[i];
@@ -255,7 +264,6 @@ static int efi_pstore_write(struct pstore_record *record)
if (record->reason == KMSG_DUMP_OOPS)
efivar_run_worker();
 
-   record->id = record->part;
return ret;
 };
 
@@ -287,7 +295,7 @@ static int efi_pstore_erase_func(struct efivar_entry 
*entry, void *data)
 * holding multiple logs, remains.
 */
snprintf(name_old, sizeof(name_old), "dump-type%u-%u-%lu",
-   ed->record->type, (unsigned int)ed->record->id,
+   ed->record->type, ed->record->part,
ed->record->time.tv_sec);
 
for (i = 0; i < DUMP_NAME_LEN; i++)
@@ -320,10 +328,7 @@ static int efi_pstore_erase(struct pstore_record *record)
char name[DUMP_NAME_LEN];
efi_char16_t efi_name[DUMP_NAME_LEN];
int found, i;
-   unsigned int part;
 
-   do_div(record->id, 1000);
-   part = do_div(record->id, 100);
snprintf(name, sizeof(name), "dump-type%u-%u-%d-%lu",
 record->type, record->part, record->count,
 record->time.tv_sec);
-- 
2.7.4


-- 
Kees Cook
Pixel Security
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] efifb: Avoid reconfiguration of BAR that covers the framebuffer

2017-05-19 Thread Ard Biesheuvel
Hi Bjorn,

On 19 May 2017 at 17:27, Bjorn Helgaas  wrote:
> [+cc linux-pci]
>
> On Tue, Apr 04, 2017 at 04:27:44PM +0100, Ard Biesheuvel wrote:
>> On UEFI systems, the PCI subsystem is enumerated by the firmware,
>> and if a graphical framebuffer is exposed by a PCI device, its base
>> address and size are exposed to the OS via the Graphics Output
>> Protocol (GOP).
>>
>> On arm64 PCI systems, the entire PCI hierarchy is reconfigured from
>> scratch at boot. This may result in the GOP framebuffer address to
>> become stale, if the BAR covering the framebuffer is modified. This
>> will cause the framebuffer to become unresponsive, and may in some
>> cases result in unpredictable behavior if the range is reassigned to
>> another device.
>>
>> So add a non-x86 quirk to the EFI fb driver to find the BAR associated
>> with the GOP base address, and claim the BAR resource so that the PCI
>> core will not move it.
>
> I know this has already been merged as 55d728a40d36 ("efi/fb: Avoid
> reconfiguration of BAR that covers the framebuffer"), and I'm not
> suggesting that we revert it, but I have some misgivings.
>

Thanks for taking a look. I have been struggling with this issue for a
while now.

> One is the "#ifndef CONFIG_X86".  In principle, there is nothing arch-
> specific here.  I don't think it's a good idea to build in dependencies on
> things like "this arch preserves (or reprograms) PCI BARs".  The PCI core
> may reprogram all, some, or none of the PCI BARs, depending on what (if
> anything) firmware has done.
>

IIRC this was a late addition. I agree that the issue exists in theory
on x86 as well. However, I was primarily dealing with the reality of
arm64 systems that suddenly explode in inexplicable ways after
upgrading the kernel to one that happens to have EFIFB built in. The
patch went into -stable as well, so I still think adding #ifndef
CONFIG_X86 was the right choice here.

> Another is the use of pci_claim_resource() to express the idea that "this
> BAR can not be moved".  We have IORESOURCE_PCI_FIXED for that purpose, and
> previous versions of the patch used that.  I understand there was some
> problem with that, but I wish we could figure out and fix that problem
> instead of using a different mechanism.
>

OK. The problem was that IORESOURCE_PCI_FIXED does not prevent the PCI
subsystem from handing out the same range to another device.

> I'm not even completely sold on the idea that we need to prevent the BAR
> from being moved.  I'm not a UEFI expert, but if this requirement is in the
> spec, we should reference it.  If not, it should be sufficient to remember
> the boot-time BAR value, match the GOP base to *that*, and map it to
> whatever the current BAR value is.
>

There is no such requirement in the spec. The graphics output protocol
is not described in terms of PCI, BARs etc. The framebuffer is simply
a memory range with side effects that is left enabled when handing
over to the OS.

In summary, I am as unhappy with the patch as you are, but it is still
an improvement over the previous situation, so let's simply
collaborate to get this into shape going forward.

My preference would be to investigate IORESOURCE_PCI_FIXED again,
because that still seems like the best way to deal with a live
framebuffer on a PCI device that has memory decoding enabled. It
should also address the issue with the current version of the patch,
i.e., that claiming resources at this point is not possible if the
device resides behind a bridge.

So is there any guidance you can give as to how to proceed? If we set
IORESOURCE_PCI_FIXED, how should be prevent the PCI layer from
assigning this resource elsewhere if we cannot claim it yet?

Regards,
Ard.


>> Fixes: 9822504c1fa5 ("efifb: Enable the efi-framebuffer platform driver ...")
>> Cc:  # v4.7+
>> Cc: Matt Fleming 
>> Cc: Peter Jones 
>> Signed-off-by: Ard Biesheuvel 
>> ---
>>  drivers/video/fbdev/efifb.c | 66 
>> -
>>  1 file changed, 65 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/video/fbdev/efifb.c b/drivers/video/fbdev/efifb.c
>> index 8c4dc1e1f94f..758960b6aec9 100644
>> --- a/drivers/video/fbdev/efifb.c
>> +++ b/drivers/video/fbdev/efifb.c
>> @@ -10,6 +10,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  #include 
>>  #include 
>>  #include 
>> @@ -143,6 +144,8 @@ static struct attribute *efifb_attrs[] = {
>>  };
>>  ATTRIBUTE_GROUPS(efifb);
>>
>> +static bool pci_dev_disabled;/* FB base matches BAR of a disabled 
>> device */
>> +
>>  static int efifb_probe(struct platform_device *dev)
>>  {
>>   struct fb_info *info;
>> @@ -152,7 +155,7 @@ static int efifb_probe(struct platform_device *dev)
>>   unsigned int size_total;
>>   char *option = NULL;
>>
>> - if (screen_info.orig_video_isVGA != VIDEO_TYPE_EFI)
>> + if (screen_info.orig_video_isVGA != VIDEO_TYPE_EFI || pci_dev_disabled)
>>   return -ENODEV;
>>
>>   if (fb_get_options("efifb", 

Re: [PATCH 2/2] efifb: Avoid reconfiguration of BAR that covers the framebuffer

2017-05-19 Thread Bjorn Helgaas
[+cc linux-pci]

On Tue, Apr 04, 2017 at 04:27:44PM +0100, Ard Biesheuvel wrote:
> On UEFI systems, the PCI subsystem is enumerated by the firmware,
> and if a graphical framebuffer is exposed by a PCI device, its base
> address and size are exposed to the OS via the Graphics Output
> Protocol (GOP).
> 
> On arm64 PCI systems, the entire PCI hierarchy is reconfigured from
> scratch at boot. This may result in the GOP framebuffer address to
> become stale, if the BAR covering the framebuffer is modified. This
> will cause the framebuffer to become unresponsive, and may in some
> cases result in unpredictable behavior if the range is reassigned to
> another device.
> 
> So add a non-x86 quirk to the EFI fb driver to find the BAR associated
> with the GOP base address, and claim the BAR resource so that the PCI
> core will not move it.

I know this has already been merged as 55d728a40d36 ("efi/fb: Avoid
reconfiguration of BAR that covers the framebuffer"), and I'm not
suggesting that we revert it, but I have some misgivings.

One is the "#ifndef CONFIG_X86".  In principle, there is nothing arch-
specific here.  I don't think it's a good idea to build in dependencies on
things like "this arch preserves (or reprograms) PCI BARs".  The PCI core
may reprogram all, some, or none of the PCI BARs, depending on what (if
anything) firmware has done.

Another is the use of pci_claim_resource() to express the idea that "this
BAR can not be moved".  We have IORESOURCE_PCI_FIXED for that purpose, and
previous versions of the patch used that.  I understand there was some
problem with that, but I wish we could figure out and fix that problem
instead of using a different mechanism.

I'm not even completely sold on the idea that we need to prevent the BAR
from being moved.  I'm not a UEFI expert, but if this requirement is in the
spec, we should reference it.  If not, it should be sufficient to remember
the boot-time BAR value, match the GOP base to *that*, and map it to
whatever the current BAR value is.

> Fixes: 9822504c1fa5 ("efifb: Enable the efi-framebuffer platform driver ...")
> Cc:  # v4.7+
> Cc: Matt Fleming 
> Cc: Peter Jones 
> Signed-off-by: Ard Biesheuvel 
> ---
>  drivers/video/fbdev/efifb.c | 66 
> -
>  1 file changed, 65 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/video/fbdev/efifb.c b/drivers/video/fbdev/efifb.c
> index 8c4dc1e1f94f..758960b6aec9 100644
> --- a/drivers/video/fbdev/efifb.c
> +++ b/drivers/video/fbdev/efifb.c
> @@ -10,6 +10,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -143,6 +144,8 @@ static struct attribute *efifb_attrs[] = {
>  };
>  ATTRIBUTE_GROUPS(efifb);
>  
> +static bool pci_dev_disabled;/* FB base matches BAR of a disabled 
> device */
> +
>  static int efifb_probe(struct platform_device *dev)
>  {
>   struct fb_info *info;
> @@ -152,7 +155,7 @@ static int efifb_probe(struct platform_device *dev)
>   unsigned int size_total;
>   char *option = NULL;
>  
> - if (screen_info.orig_video_isVGA != VIDEO_TYPE_EFI)
> + if (screen_info.orig_video_isVGA != VIDEO_TYPE_EFI || pci_dev_disabled)
>   return -ENODEV;
>  
>   if (fb_get_options("efifb", &option))
> @@ -360,3 +363,64 @@ static struct platform_driver efifb_driver = {
>  };
>  
>  builtin_platform_driver(efifb_driver);
> +
> +#ifndef CONFIG_X86
> +
> +static bool pci_bar_found;   /* did we find a BAR matching the efifb base? */
> +
> +static void claim_efifb_bar(struct pci_dev *dev, int idx)
> +{
> + u16 word;
> +
> + pci_bar_found = true;
> +
> + pci_read_config_word(dev, PCI_COMMAND, &word);
> + if (!(word & PCI_COMMAND_MEMORY)) {
> + pci_dev_disabled = true;
> + dev_err(&dev->dev,
> + "BAR %d: assigned to efifb but device is disabled!\n",
> + idx);
> + return;
> + }
> +
> + if (pci_claim_resource(dev, idx)) {
> + pci_dev_disabled = true;
> + dev_err(&dev->dev,
> + "BAR %d: failed to claim resource for efifb!\n", idx);
> + return;
> + }
> +
> + dev_info(&dev->dev, "BAR %d: assigned to efifb\n", idx);
> +}
> +
> +static void efifb_fixup_resources(struct pci_dev *dev)
> +{
> + u64 base = screen_info.lfb_base;
> + u64 size = screen_info.lfb_size;
> + int i;
> +
> + if (pci_bar_found || screen_info.orig_video_isVGA != VIDEO_TYPE_EFI)
> + return;
> +
> + if (screen_info.capabilities & VIDEO_CAPABILITY_64BIT_BASE)
> + base |= (u64)screen_info.ext_lfb_base << 32;
> +
> + if (!base)
> + return;
> +
> + for (i = 0; i < PCI_STD_RESOURCE_END; i++) {
> + struct resource *res = &dev->resource[i];
> +
> + if (!(res->flags & IORESOURCE_MEM))
> + continue;
> +
> + if (res->start <= base && res->end >= base + size - 1) {

Re: [PATCH] efi/reboot: Fall back to original power-off method if EFI_RESET_SHUTDOWN returns

2017-05-19 Thread Ard Biesheuvel
On 23 April 2017 at 13:36, Hans de Goede  wrote:
> Commit 44be28e9dd98 ("x86/reboot: Add EFI reboot quirk for ACPI Hardware
> Reduced flag") sets pm_power_off to efi_power_off() when the
> acpi_gbl_reduced_hardware flag is set.
>
> According to its commit message this is necessary because: "BayTrail-T
> class of hardware requires EFI in order to powerdown and reboot and no
> other reliable method exists"
>
> But I have a Bay Trail CR tablet where the EFI_RESET_SHUTDOWN call does
> not work, it simply returns without doing anything (AFAICT).
>
> So it seems that some Bay Trail devices must use EFI for power-off, while
> for others only ACPI works.
>
> Note that efi_power_off() only gets used if the platform code defines
> efi_poweroff_required() and that returns true, this currently only ever
> happens on x86.
>
> Since on the devices which need ACPI for power-off the EFI_RESET_SHUTDOWN
> call simply returns, this patch makes the efi-reboot code remember the
> old pm_power_off handler and if EFI_RESET_SHUTDOWN returns it falls back
> to calling that.
>
> This seems preferable to dmi-quirking our way out of this, since there
> are likely quite a few devices suffering from this.
>
> Cc: Mark Salter 
> Signed-off-by: Hans de Goede 
> ---
>  drivers/firmware/efi/reboot.c | 12 +++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/firmware/efi/reboot.c b/drivers/firmware/efi/reboot.c
> index 62ead9b..7117e2d 100644
> --- a/drivers/firmware/efi/reboot.c
> +++ b/drivers/firmware/efi/reboot.c
> @@ -5,6 +5,8 @@
>  #include 
>  #include 
>
> +void (*orig_pm_power_off)(void);
> +
>  int efi_reboot_quirk_mode = -1;
>
>  void efi_reboot(enum reboot_mode reboot_mode, const char *__unused)
> @@ -51,6 +53,12 @@ bool __weak efi_poweroff_required(void)
>  static void efi_power_off(void)
>  {
> efi.reset_system(EFI_RESET_SHUTDOWN, EFI_SUCCESS, 0, NULL);
> +   /*
> +* The above call should not return, if it does fall back to
> +* the original power off method (typically ACPI poweroff).
> +*/
> +   if (orig_pm_power_off)
> +   orig_pm_power_off();
>  }
>
>  static int __init efi_shutdown_init(void)
> @@ -58,8 +66,10 @@ static int __init efi_shutdown_init(void)
> if (!efi_enabled(EFI_RUNTIME_SERVICES))
> return -ENODEV;
>
> -   if (efi_poweroff_required())
> +   if (efi_poweroff_required()) {
> +   orig_pm_power_off = pm_power_off;
> pm_power_off = efi_power_off;
> +   }
>
> return 0;
>  }

This does not look unreasonable to me, but this is more Matt's turf so
I will let him handle this one.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/5] efi: Move the x86 secure boot switch to generic code

2017-05-19 Thread Ard Biesheuvel
First of all, apologies for taking so long to respond.

On 6 April 2017 at 13:49, David Howells  wrote:
> Move the switch-statement in x86's setup_arch() that inteprets the
> secure_boot boot parameter to generic code.
>
> Suggested-by: Ard Biesheuvel 
> Signed-off-by: David Howells 
> ---
>
>  arch/x86/kernel/setup.c|   14 +-
>  drivers/firmware/efi/Kconfig   |   23 +++
>  drivers/firmware/efi/Makefile  |3 ++-
>  drivers/firmware/efi/secure_boot.c |   34 ++
>  include/linux/efi.h|6 ++
>  5 files changed, 66 insertions(+), 14 deletions(-)
>  create mode 100644 drivers/firmware/efi/secure_boot.c
>
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 4bf0c8926a1c..b89979ffa6e5 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -1178,19 +1178,7 @@ void __init setup_arch(char **cmdline_p)
> /* Allocate bigger log buffer */
> setup_log_buf(1);
>
> -   if (efi_enabled(EFI_BOOT)) {
> -   switch (boot_params.secure_boot) {
> -   case efi_secureboot_mode_disabled:
> -   pr_info("Secure boot disabled\n");
> -   break;
> -   case efi_secureboot_mode_enabled:
> -   pr_info("Secure boot enabled\n");
> -   break;
> -   default:
> -   pr_info("Secure boot could not be determined\n");
> -   break;
> -   }
> -   }
> +   efi_set_secure_boot(boot_params.secure_boot);
>
> reserve_initrd();
>
> diff --git a/drivers/firmware/efi/Kconfig b/drivers/firmware/efi/Kconfig
> index 2e78b0b96d74..4b902ffbfcf4 100644
> --- a/drivers/firmware/efi/Kconfig
> +++ b/drivers/firmware/efi/Kconfig
> @@ -84,6 +84,29 @@ config EFI_PARAMS_FROM_FDT
>  config EFI_RUNTIME_WRAPPERS
> bool
>
> +config EFI_SECURE_BOOT
> +   bool "Support UEFI Secure Boot and lock down the kernel in secure 
> boot mode"
> +   default n
> +   help
> + UEFI Secure Boot provides a mechanism for ensuring that the firmware
> + will only load signed bootloaders and kernels.  Secure boot mode may
> + be determined from EFI variables provided by the BIOS if not

Please replace 'the BIOS' with something more generic.

> + indicated by the boot parameters.
> +
> + Enabling this option turns on support for UEFI secure boot in the
> + kernel.  This will result in various kernel facilities being locked
> + away from userspace if the kernel detects that it has been booted in
> + secure boot mode.  If it hasn't been booted in secure boot mode, or
> + this cannot be determined, the lock down doesn't occur.
> +
> + The kernel facilities that get locked down include:
> + - Viewing or changing the kernel's memory
> + - Directly accessing ioports
> + - Directly specifying ioports and other hardware parameters to 
> drivers
> + - Storing the kernel image unencrypted for hibernation
> + - Loading unsigned modules
> + - Kexec'ing unsigned images
> +
>  config EFI_ARMSTUB
> bool
>
> diff --git a/drivers/firmware/efi/Makefile b/drivers/firmware/efi/Makefile
> index ad67342313ed..65969f840685 100644
> --- a/drivers/firmware/efi/Makefile
> +++ b/drivers/firmware/efi/Makefile
> @@ -22,7 +22,8 @@ obj-$(CONFIG_EFI_FAKE_MEMMAP) += fake_mem.o
>  obj-$(CONFIG_EFI_BOOTLOADER_CONTROL)   += efibc.o
>  obj-$(CONFIG_EFI_TEST) += test/
>  obj-$(CONFIG_EFI_DEV_PATH_PARSER)  += dev-path-parser.o
> -obj-$(CONFIG_APPLE_PROPERTIES) += apple-properties.o
> +obj-$(CONFIG_EFI_SECURE_BOOT)  += secure_boot.o
> +obj-$(CONFIG_APPLE_PROPERTIES) += apple-properties.oo

Spurious change here

>
>  arm-obj-$(CONFIG_EFI)  := arm-init.o arm-runtime.o
>  obj-$(CONFIG_ARM)  += $(arm-obj-y)
> diff --git a/drivers/firmware/efi/secure_boot.c 
> b/drivers/firmware/efi/secure_boot.c
> new file mode 100644
> index ..cf5bccae15e8
> --- /dev/null
> +++ b/drivers/firmware/efi/secure_boot.c

We have a file called secureboot.c in libstub/, so for consistency,
could you please drop the underscore?

> @@ -0,0 +1,34 @@
> +/* Core kernel secure boot support.
> + *
> + * Copyright (C) 2017 Red Hat, Inc. All Rights Reserved.
> + * Written by David Howells (dhowe...@redhat.com)
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public Licence
> + * as published by the Free Software Foundation; either version
> + * 2 of the Licence, or (at your option) any later version.
> + */
> +
> +#include 
> +#include 
> +#include 
> +
> +/*
> + * Decide what to do when UEFI secure boot mode is enabled.
> + */
> +void __init efi_set_secure_boot(enum efi_secureboot_mode m

Re: [PATCH] efi-pstore: Fix read iter after pstore API refactor

2017-05-19 Thread Ard Biesheuvel
On 18 May 2017 at 17:41, Kees Cook  wrote:
> On Thu, May 18, 2017 at 9:18 AM, Ard Biesheuvel
>  wrote:
>> On 18 May 2017 at 14:01, Kees Cook  wrote:
>>> On Thu, May 18, 2017 at 3:35 AM, Ard Biesheuvel
>>>  wrote:
 On 12 May 2017 at 22:58, Kees Cook  wrote:
> During the internal pstore API refactoring, the EFI vars read entry was
> accidentally made to update a stack variable instead of the pstore
> private data pointer. This corrects the problem (and removes the now
> needless argument).
>
> Signed-off-by: Kees Cook 

 Does this need a cc stable?
>>>
>>> No, the refactor first appeared in 4.12-rc1.
>>>
>>
>> OK. Applied.
>
> Err, eek, no, it's already gone into Linus's tree.
>

OK, I dropped it again :-)
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 32/32] x86/mm: Add support to make use of Secure Memory Encryption

2017-05-19 Thread Borislav Petkov
On Fri, Apr 21, 2017 at 01:56:13PM -0500, Tom Lendacky wrote:
> On 4/18/2017 4:22 PM, Tom Lendacky wrote:
> > Add support to check if SME has been enabled and if memory encryption
> > should be activated (checking of command line option based on the
> > configuration of the default state).  If memory encryption is to be
> > activated, then the encryption mask is set and the kernel is encrypted
> > "in place."
> > 
> > Signed-off-by: Tom Lendacky 
> > ---
> >  arch/x86/kernel/head_64.S |1 +
> >  arch/x86/mm/mem_encrypt.c |   83 
> > +++--
> >  2 files changed, 80 insertions(+), 4 deletions(-)
> > 
> 
> ...
> 
> > 
> > -unsigned long __init sme_enable(void)
> > +unsigned long __init sme_enable(struct boot_params *bp)
> >  {
> > +   const char *cmdline_ptr, *cmdline_arg, *cmdline_on, *cmdline_off;
> > +   unsigned int eax, ebx, ecx, edx;
> > +   unsigned long me_mask;
> > +   bool active_by_default;
> > +   char buffer[16];
> 
> So it turns out that when KASLR is enabled (CONFIG_RAMDOMIZE_BASE=y)
> the stack-protector support causes issues with this function because

What issues?

> it is called so early. I can get past it by adding:
> 
> CFLAGS_mem_encrypt.o := $(nostackp)
> 
> in the arch/x86/mm/Makefile, but that obviously eliminates the support
> for the whole file.  Would it be better to split out the sme_enable()
> and other boot routines into a separate file or just apply the
> $(nostackp) to the whole file?

Josh might have a better idea here... CCed.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 32/32] x86/mm: Add support to make use of Secure Memory Encryption

2017-05-19 Thread Borislav Petkov
On Tue, Apr 18, 2017 at 04:22:23PM -0500, Tom Lendacky wrote:
> Add support to check if SME has been enabled and if memory encryption
> should be activated (checking of command line option based on the
> configuration of the default state).  If memory encryption is to be
> activated, then the encryption mask is set and the kernel is encrypted
> "in place."
> 
> Signed-off-by: Tom Lendacky 
> ---
>  arch/x86/kernel/head_64.S |1 +
>  arch/x86/mm/mem_encrypt.c |   83 
> +++--
>  2 files changed, 80 insertions(+), 4 deletions(-)

...

> +unsigned long __init sme_enable(struct boot_params *bp)
>  {
> + const char *cmdline_ptr, *cmdline_arg, *cmdline_on, *cmdline_off;
> + unsigned int eax, ebx, ecx, edx;
> + unsigned long me_mask;
> + bool active_by_default;
> + char buffer[16];
> + u64 msr;
> +
> + /* Check for the SME support leaf */
> + eax = 0x8000;
> + ecx = 0;
> + native_cpuid(&eax, &ebx, &ecx, &edx);
> + if (eax < 0x801f)
> + goto out;
> +
> + /*
> +  * Check for the SME feature:
> +  *   CPUID Fn8000_001F[EAX] - Bit 0
> +  * Secure Memory Encryption support
> +  *   CPUID Fn8000_001F[EBX] - Bits 5:0
> +  * Pagetable bit position used to indicate encryption
> +  */
> + eax = 0x801f;
> + ecx = 0;
> + native_cpuid(&eax, &ebx, &ecx, &edx);
> + if (!(eax & 1))
> + goto out;

< newline here.

> + me_mask = 1UL << (ebx & 0x3f);
> +
> + /* Check if SME is enabled */
> + msr = __rdmsr(MSR_K8_SYSCFG);
> + if (!(msr & MSR_K8_SYSCFG_MEM_ENCRYPT))
> + goto out;
> +
> + /*
> +  * Fixups have not been applied to phys_base yet, so we must obtain
> +  * the address to the SME command line option data in the following
> +  * way.
> +  */
> + asm ("lea sme_cmdline_arg(%%rip), %0"
> +  : "=r" (cmdline_arg)
> +  : "p" (sme_cmdline_arg));
> + asm ("lea sme_cmdline_on(%%rip), %0"
> +  : "=r" (cmdline_on)
> +  : "p" (sme_cmdline_on));
> + asm ("lea sme_cmdline_off(%%rip), %0"
> +  : "=r" (cmdline_off)
> +  : "p" (sme_cmdline_off));
> +
> + if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT))
> + active_by_default = true;
> + else
> + active_by_default = false;
> +
> + cmdline_ptr = (const char *)((u64)bp->hdr.cmd_line_ptr |
> +  ((u64)bp->ext_cmd_line_ptr << 32));
> +
> + cmdline_find_option(cmdline_ptr, cmdline_arg, buffer, sizeof(buffer));
> +
> + if (strncmp(buffer, cmdline_on, sizeof(buffer)) == 0)
> + sme_me_mask = me_mask;

Why doesn't simply

if (!strncmp(buffer, "on", 2))
...

work?

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html