Re: kexec regression since 4.9 caused by efi

2017-04-04 Thread Dave Young
On 04/04/17 at 02:37pm, Matt Fleming wrote:
> On Mon, 20 Mar, at 10:14:12AM, Dave Young wrote:
> > 
> > Matt, I'm fine if you prefer to capture the range checking errors.
> > Would you like me to post it or just you send it out?
> 
> Can you please send out the patch with the minimal change to
> efi_arch_mem_reserve() and we'll get it into urgent ASAP.

Omar has sent it out, for the lookup function issue I think I can do it
after this one later.

Thanks
Dave


Re: kexec regression since 4.9 caused by efi

2017-04-04 Thread Matt Fleming
On Mon, 20 Mar, at 10:14:12AM, Dave Young wrote:
> 
> Matt, I'm fine if you prefer to capture the range checking errors.
> Would you like me to post it or just you send it out?

Can you please send out the patch with the minimal change to
efi_arch_mem_reserve() and we'll get it into urgent ASAP.


Re: kexec regression since 4.9 caused by efi

2017-04-03 Thread Omar Sandoval
On Thu, Mar 16, 2017 at 10:50:48AM -0700, Omar Sandoval wrote:
> On Thu, Mar 16, 2017 at 12:41:32PM +, Matt Fleming wrote:
> > On Mon, 13 Mar, at 03:37:48PM, Dave Young wrote:
> > > 
> > > Omar, could you try below patch? Looking at the efi_mem_desc_lookup, it 
> > > is not
> > > correct to be used in efi_arch_mem_reserve, if it passed your test, I
> > > can rewrite patch log with more background and send it out:
> > > 
> > > for_each_efi_memory_desc(md) {
> > >   [snip]
> > > if (!(md->attribute & EFI_MEMORY_RUNTIME) &&
> > > md->type != EFI_BOOT_SERVICES_DATA &&
> > > md->type != EFI_RUNTIME_SERVICES_DATA) {
> > > continue;
> > > }
> > > 
> > > In above code, it meant to get a md of EFI_MEMORY_RUNTIME of either boot
> > > data or runtime data, this is wrong for efi_mem_reserve, because we are
> > > reserving boot data which has no EFI_MEMORY_RUNTIME attribute at the
> > > running time. Just is happened to work and we did not capture the error.
> > 
> > Wouldn't something like this be simpler?
> > 
> > ---
> > 
> > diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
> > index 30031d5293c4..cdfe8c628959 100644
> > --- a/arch/x86/platform/efi/quirks.c
> > +++ b/arch/x86/platform/efi/quirks.c
> > @@ -201,6 +201,10 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 
> > size)
> > return;
> > }
> >  
> > +   /* No need to reserve regions that will never be freed. */
> > +   if (md.attribute & EFI_MEMORY_RUNTIME)
> > +   return;
> > +
> > size += addr % EFI_PAGE_SIZE;
> > size = round_up(size, EFI_PAGE_SIZE);
> > addr = round_down(addr, EFI_PAGE_SIZE);
> 
> This works for me.
> 
> Reported-and-tested-by: Omar Sandoval 

Is this going to go in for 4.11?


Re: kexec regression since 4.9 caused by efi

2017-03-22 Thread Dave Young
On 03/22/17 at 04:10pm, Ard Biesheuvel wrote:
> On 21 March 2017 at 07:48, Dave Young  wrote:
> > On 03/20/17 at 10:14am, Dave Young wrote:
> >> On 03/17/17 at 01:32pm, Matt Fleming wrote:
> >> > On Fri, 17 Mar, at 10:09:51AM, Dave Young wrote:
> >> > >
> >> > > Matt, I think it should be fine although I think the md type checking 
> >> > > in
> >> > > efi_mem_desc_lookup() is causing confusion and not easy to understand..
> >> >
> >> > Could you make that a separate patch if you think of improvements
> >> > there?
> >>
> >> Duplicate the lookup function is indeed a little ugly, will do it when I
> >> have a better idea, we can leave it as is since it works.
> >
> > Matt, rethinking about this, how about doint something below, not
> > tested, just seeking for idea and opinons, in this way no need duplicate
> > a function, but there is an assumption that no overlapped mem ranges in
> > efi memmap.
> >
> > Also the case Omar reported is the esrt memory range type is
> > RUNTIME_DATA, that is a little different with the mem attribute of
> > RUNTIME which also includes BOOT_DATA which has been set the RUNTIME
> > attribute, like bgrt in kexec reboot. Should we distinguish the two
> > cases and give out some warnings or debug info?
> >
> >
> > ---
> >  arch/x86/platform/efi/quirks.c |5 +
> >  drivers/firmware/efi/efi.c |6 --
> >  drivers/firmware/efi/esrt.c|7 +++
> >  3 files changed, 12 insertions(+), 6 deletions(-)
> >
> > --- linux-x86.orig/drivers/firmware/efi/efi.c
> > +++ linux-x86/drivers/firmware/efi/efi.c
> > @@ -376,12 +376,6 @@ int __init efi_mem_desc_lookup(u64 phys_
> > u64 size;
> > u64 end;
> >
> > -   if (!(md->attribute & EFI_MEMORY_RUNTIME) &&
> > -   md->type != EFI_BOOT_SERVICES_DATA &&
> > -   md->type != EFI_RUNTIME_SERVICES_DATA) {
> > -   continue;
> > -   }
> > -
> > size = md->num_pages << EFI_PAGE_SHIFT;
> > end = md->phys_addr + size;
> > if (phys_addr >= md->phys_addr && phys_addr < end) {
> > --- linux-x86.orig/drivers/firmware/efi/esrt.c
> > +++ linux-x86/drivers/firmware/efi/esrt.c
> > @@ -258,6 +258,13 @@ void __init efi_esrt_init(void)
> > return;
> > }
> >
> > +   if (!(md->attribute & EFI_MEMORY_RUNTIME) &&
> > + md->type != EFI_BOOT_SERVICES_DATA &&
> > + md->type != EFI_RUNTIME_SERVICES_DATA) {
> > +   pr_err("ESRT header memory map type is invalid\n");
> > +   return;
> > +   }
> > +
> 
> This looks wrong to me. While the meanings get convoluted in practice,
> the EFI_MEMORY_RUNTIME attribute only means that the firmware requests
> a virtual mapping for the region. It is perfectly legal for a
> EFI_RUNTIME_SERVICES_DATA region not to have the EFI_MEMORY_RUNTIME
> attribute, if the region is never accessed by the runtime services
> themselves, and this is not entirely unlikely for tables that the
> firmware exposes to the OS

Thanks for the comment, if so "!(md->attribute & EFI_MEMORY_RUNTIME) &&"
should be dropped.

BTW, md->type should be md.type, bgrt reserving works fine with this
change but I have no esrt machine to test it. I would like to wait for
Matt's opinions about this first before an update. 

Also cc Peter about the esrt piece.
> 
> > max = efi_mem_desc_end(&md);
> > if (max < efi.esrt) {
> > pr_err("EFI memory descriptor is invalid. (esrt: %p max: 
> > %p)\n",
> > --- linux-x86.orig/arch/x86/platform/efi/quirks.c
> > +++ linux-x86/arch/x86/platform/efi/quirks.c
> > @@ -201,6 +201,11 @@ void __init efi_arch_mem_reserve(phys_ad
> > return;
> > }
> >
> > +   if (md->attribute & EFI_MEMORY_RUNTIME ||
> > + md->type != EFI_BOOT_SERVICES_DATA) {
> > +   return;
> > +   }
> > +
> > size += addr % EFI_PAGE_SIZE;
> > size = round_up(size, EFI_PAGE_SIZE);
> > addr = round_down(addr, EFI_PAGE_SIZE);
> >
> >>
> >> >
> >> > > How about move the if chunk early like below because it seems no need
> >> > > to sanity check the addr + size any more if the md is still RUNTIME?
> >> >
> >> > My original version did as you suggest, but I changed it because we
> >> > *really* want to know if someone tries to reserve a range that spans
> >> > regions. That would be totally unexpected and a warning about a
> >> > potential bug/issue.
> >>
> >> Matt, I'm fine if you prefer to capture the range checking errors.
> >> Would you like me to post it or just you send it out?
> >>
> >> Thanks
> >> Dave
> >
> > Thanks
> > Dave
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-efi" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kexec regression since 4.9 caused by efi

2017-03-22 Thread Ard Biesheuvel
On 21 March 2017 at 07:48, Dave Young  wrote:
> On 03/20/17 at 10:14am, Dave Young wrote:
>> On 03/17/17 at 01:32pm, Matt Fleming wrote:
>> > On Fri, 17 Mar, at 10:09:51AM, Dave Young wrote:
>> > >
>> > > Matt, I think it should be fine although I think the md type checking in
>> > > efi_mem_desc_lookup() is causing confusion and not easy to understand..
>> >
>> > Could you make that a separate patch if you think of improvements
>> > there?
>>
>> Duplicate the lookup function is indeed a little ugly, will do it when I
>> have a better idea, we can leave it as is since it works.
>
> Matt, rethinking about this, how about doint something below, not
> tested, just seeking for idea and opinons, in this way no need duplicate
> a function, but there is an assumption that no overlapped mem ranges in
> efi memmap.
>
> Also the case Omar reported is the esrt memory range type is
> RUNTIME_DATA, that is a little different with the mem attribute of
> RUNTIME which also includes BOOT_DATA which has been set the RUNTIME
> attribute, like bgrt in kexec reboot. Should we distinguish the two
> cases and give out some warnings or debug info?
>
>
> ---
>  arch/x86/platform/efi/quirks.c |5 +
>  drivers/firmware/efi/efi.c |6 --
>  drivers/firmware/efi/esrt.c|7 +++
>  3 files changed, 12 insertions(+), 6 deletions(-)
>
> --- linux-x86.orig/drivers/firmware/efi/efi.c
> +++ linux-x86/drivers/firmware/efi/efi.c
> @@ -376,12 +376,6 @@ int __init efi_mem_desc_lookup(u64 phys_
> u64 size;
> u64 end;
>
> -   if (!(md->attribute & EFI_MEMORY_RUNTIME) &&
> -   md->type != EFI_BOOT_SERVICES_DATA &&
> -   md->type != EFI_RUNTIME_SERVICES_DATA) {
> -   continue;
> -   }
> -
> size = md->num_pages << EFI_PAGE_SHIFT;
> end = md->phys_addr + size;
> if (phys_addr >= md->phys_addr && phys_addr < end) {
> --- linux-x86.orig/drivers/firmware/efi/esrt.c
> +++ linux-x86/drivers/firmware/efi/esrt.c
> @@ -258,6 +258,13 @@ void __init efi_esrt_init(void)
> return;
> }
>
> +   if (!(md->attribute & EFI_MEMORY_RUNTIME) &&
> + md->type != EFI_BOOT_SERVICES_DATA &&
> + md->type != EFI_RUNTIME_SERVICES_DATA) {
> +   pr_err("ESRT header memory map type is invalid\n");
> +   return;
> +   }
> +

This looks wrong to me. While the meanings get convoluted in practice,
the EFI_MEMORY_RUNTIME attribute only means that the firmware requests
a virtual mapping for the region. It is perfectly legal for a
EFI_RUNTIME_SERVICES_DATA region not to have the EFI_MEMORY_RUNTIME
attribute, if the region is never accessed by the runtime services
themselves, and this is not entirely unlikely for tables that the
firmware exposes to the OS

> max = efi_mem_desc_end(&md);
> if (max < efi.esrt) {
> pr_err("EFI memory descriptor is invalid. (esrt: %p max: 
> %p)\n",
> --- linux-x86.orig/arch/x86/platform/efi/quirks.c
> +++ linux-x86/arch/x86/platform/efi/quirks.c
> @@ -201,6 +201,11 @@ void __init efi_arch_mem_reserve(phys_ad
> return;
> }
>
> +   if (md->attribute & EFI_MEMORY_RUNTIME ||
> + md->type != EFI_BOOT_SERVICES_DATA) {
> +   return;
> +   }
> +
> size += addr % EFI_PAGE_SIZE;
> size = round_up(size, EFI_PAGE_SIZE);
> addr = round_down(addr, EFI_PAGE_SIZE);
>
>>
>> >
>> > > How about move the if chunk early like below because it seems no need
>> > > to sanity check the addr + size any more if the md is still RUNTIME?
>> >
>> > My original version did as you suggest, but I changed it because we
>> > *really* want to know if someone tries to reserve a range that spans
>> > regions. That would be totally unexpected and a warning about a
>> > potential bug/issue.
>>
>> Matt, I'm fine if you prefer to capture the range checking errors.
>> Would you like me to post it or just you send it out?
>>
>> Thanks
>> Dave
>
> Thanks
> Dave
> --
> To unsubscribe from this list: send the line "unsubscribe linux-efi" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kexec regression since 4.9 caused by efi

2017-03-21 Thread Dave Young
On 03/20/17 at 10:14am, Dave Young wrote:
> On 03/17/17 at 01:32pm, Matt Fleming wrote:
> > On Fri, 17 Mar, at 10:09:51AM, Dave Young wrote:
> > > 
> > > Matt, I think it should be fine although I think the md type checking in
> > > efi_mem_desc_lookup() is causing confusion and not easy to understand..
> >  
> > Could you make that a separate patch if you think of improvements
> > there?
> 
> Duplicate the lookup function is indeed a little ugly, will do it when I
> have a better idea, we can leave it as is since it works.

Matt, rethinking about this, how about doint something below, not
tested, just seeking for idea and opinons, in this way no need duplicate
a function, but there is an assumption that no overlapped mem ranges in
efi memmap.

Also the case Omar reported is the esrt memory range type is
RUNTIME_DATA, that is a little different with the mem attribute of
RUNTIME which also includes BOOT_DATA which has been set the RUNTIME
attribute, like bgrt in kexec reboot. Should we distinguish the two
cases and give out some warnings or debug info?


---
 arch/x86/platform/efi/quirks.c |5 +
 drivers/firmware/efi/efi.c |6 --
 drivers/firmware/efi/esrt.c|7 +++
 3 files changed, 12 insertions(+), 6 deletions(-)

--- linux-x86.orig/drivers/firmware/efi/efi.c
+++ linux-x86/drivers/firmware/efi/efi.c
@@ -376,12 +376,6 @@ int __init efi_mem_desc_lookup(u64 phys_
u64 size;
u64 end;
 
-   if (!(md->attribute & EFI_MEMORY_RUNTIME) &&
-   md->type != EFI_BOOT_SERVICES_DATA &&
-   md->type != EFI_RUNTIME_SERVICES_DATA) {
-   continue;
-   }
-
size = md->num_pages << EFI_PAGE_SHIFT;
end = md->phys_addr + size;
if (phys_addr >= md->phys_addr && phys_addr < end) {
--- linux-x86.orig/drivers/firmware/efi/esrt.c
+++ linux-x86/drivers/firmware/efi/esrt.c
@@ -258,6 +258,13 @@ void __init efi_esrt_init(void)
return;
}
 
+   if (!(md->attribute & EFI_MEMORY_RUNTIME) &&
+ md->type != EFI_BOOT_SERVICES_DATA &&
+ md->type != EFI_RUNTIME_SERVICES_DATA) {
+   pr_err("ESRT header memory map type is invalid\n");
+   return;
+   }
+
max = efi_mem_desc_end(&md);
if (max < efi.esrt) {
pr_err("EFI memory descriptor is invalid. (esrt: %p max: %p)\n",
--- linux-x86.orig/arch/x86/platform/efi/quirks.c
+++ linux-x86/arch/x86/platform/efi/quirks.c
@@ -201,6 +201,11 @@ void __init efi_arch_mem_reserve(phys_ad
return;
}
 
+   if (md->attribute & EFI_MEMORY_RUNTIME ||
+ md->type != EFI_BOOT_SERVICES_DATA) {
+   return;
+   }
+
size += addr % EFI_PAGE_SIZE;
size = round_up(size, EFI_PAGE_SIZE);
addr = round_down(addr, EFI_PAGE_SIZE);

> 
> > 
> > > How about move the if chunk early like below because it seems no need
> > > to sanity check the addr + size any more if the md is still RUNTIME?
> > 
> > My original version did as you suggest, but I changed it because we
> > *really* want to know if someone tries to reserve a range that spans
> > regions. That would be totally unexpected and a warning about a
> > potential bug/issue.
> 
> Matt, I'm fine if you prefer to capture the range checking errors.
> Would you like me to post it or just you send it out?
> 
> Thanks
> Dave

Thanks
Dave


Re: kexec regression since 4.9 caused by efi

2017-03-19 Thread Dave Young
On 03/17/17 at 01:32pm, Matt Fleming wrote:
> On Fri, 17 Mar, at 10:09:51AM, Dave Young wrote:
> > 
> > Matt, I think it should be fine although I think the md type checking in
> > efi_mem_desc_lookup() is causing confusion and not easy to understand..
>  
> Could you make that a separate patch if you think of improvements
> there?

Duplicate the lookup function is indeed a little ugly, will do it when I
have a better idea, we can leave it as is since it works.

> 
> > How about move the if chunk early like below because it seems no need
> > to sanity check the addr + size any more if the md is still RUNTIME?
> 
> My original version did as you suggest, but I changed it because we
> *really* want to know if someone tries to reserve a range that spans
> regions. That would be totally unexpected and a warning about a
> potential bug/issue.

Matt, I'm fine if you prefer to capture the range checking errors.
Would you like me to post it or just you send it out?

Thanks
Dave


Re: kexec regression since 4.9 caused by efi

2017-03-17 Thread Matt Fleming
On Fri, 17 Mar, at 10:09:51AM, Dave Young wrote:
> 
> Matt, I think it should be fine although I think the md type checking in
> efi_mem_desc_lookup() is causing confusion and not easy to understand..
 
Could you make that a separate patch if you think of improvements
there?

> How about move the if chunk early like below because it seems no need
> to sanity check the addr + size any more if the md is still RUNTIME?

My original version did as you suggest, but I changed it because we
*really* want to know if someone tries to reserve a range that spans
regions. That would be totally unexpected and a warning about a
potential bug/issue.


Re: kexec regression since 4.9 caused by efi

2017-03-17 Thread Ard Biesheuvel
On 17 March 2017 at 02:09, Dave Young  wrote:
> On 03/16/17 at 12:41pm, Matt Fleming wrote:
>> On Mon, 13 Mar, at 03:37:48PM, Dave Young wrote:
>> >
>> > Omar, could you try below patch? Looking at the efi_mem_desc_lookup, it is 
>> > not
>> > correct to be used in efi_arch_mem_reserve, if it passed your test, I
>> > can rewrite patch log with more background and send it out:
>> >
>> > for_each_efi_memory_desc(md) {
>> > [snip]
>> > if (!(md->attribute & EFI_MEMORY_RUNTIME) &&
>> > md->type != EFI_BOOT_SERVICES_DATA &&
>> > md->type != EFI_RUNTIME_SERVICES_DATA) {
>> > continue;
>> > }
>> >
>> > In above code, it meant to get a md of EFI_MEMORY_RUNTIME of either boot
>> > data or runtime data, this is wrong for efi_mem_reserve, because we are
>> > reserving boot data which has no EFI_MEMORY_RUNTIME attribute at the
>> > running time. Just is happened to work and we did not capture the error.
>>
>> Wouldn't something like this be simpler?
>>
>> ---
>>
>> diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
>> index 30031d5293c4..cdfe8c628959 100644
>> --- a/arch/x86/platform/efi/quirks.c
>> +++ b/arch/x86/platform/efi/quirks.c
>> @@ -201,6 +201,10 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 
>> size)
>>   return;
>>   }
>>
>> + /* No need to reserve regions that will never be freed. */
>> + if (md.attribute & EFI_MEMORY_RUNTIME)
>> + return;
>> +
>
> Matt, I think it should be fine although I think the md type checking in
> efi_mem_desc_lookup() is causing confusion and not easy to understand..
>
> How about move the if chunk early like below because it seems no need
> to sanity check the addr + size any more if the md is still RUNTIME?
>
> --- linux-x86.orig/arch/x86/platform/efi/quirks.c
> +++ linux-x86/arch/x86/platform/efi/quirks.c
> @@ -196,6 +196,10 @@ void __init efi_arch_mem_reserve(phys_ad
> return;
> }
>
> +   /* No need to reserve regions that will never be freed. */
> +   if (md.attribute & EFI_MEMORY_RUNTIME)
> +   return;
> +
> if (addr + size > md.phys_addr + (md.num_pages << EFI_PAGE_SHIFT)) {
> pr_err("Region spans EFI memory descriptors, %pa\n", &addr);
> return;
>

This way, we suppress the error it the region spans multiple
descriptors, and only the first one has the runtime attribute. So the
two patches are not equivalent. I don't have a strong preference
either way, though.


Re: kexec regression since 4.9 caused by efi

2017-03-16 Thread Dave Young
On 03/16/17 at 12:41pm, Matt Fleming wrote:
> On Mon, 13 Mar, at 03:37:48PM, Dave Young wrote:
> > 
> > Omar, could you try below patch? Looking at the efi_mem_desc_lookup, it is 
> > not
> > correct to be used in efi_arch_mem_reserve, if it passed your test, I
> > can rewrite patch log with more background and send it out:
> > 
> > for_each_efi_memory_desc(md) {
> > [snip]
> > if (!(md->attribute & EFI_MEMORY_RUNTIME) &&
> > md->type != EFI_BOOT_SERVICES_DATA &&
> > md->type != EFI_RUNTIME_SERVICES_DATA) {
> > continue;
> > }
> > 
> > In above code, it meant to get a md of EFI_MEMORY_RUNTIME of either boot
> > data or runtime data, this is wrong for efi_mem_reserve, because we are
> > reserving boot data which has no EFI_MEMORY_RUNTIME attribute at the
> > running time. Just is happened to work and we did not capture the error.
> 
> Wouldn't something like this be simpler?
> 
> ---
> 
> diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
> index 30031d5293c4..cdfe8c628959 100644
> --- a/arch/x86/platform/efi/quirks.c
> +++ b/arch/x86/platform/efi/quirks.c
> @@ -201,6 +201,10 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 
> size)
>   return;
>   }
>  
> + /* No need to reserve regions that will never be freed. */
> + if (md.attribute & EFI_MEMORY_RUNTIME)
> + return;
> +

Matt, I think it should be fine although I think the md type checking in
efi_mem_desc_lookup() is causing confusion and not easy to understand..

How about move the if chunk early like below because it seems no need
to sanity check the addr + size any more if the md is still RUNTIME?

--- linux-x86.orig/arch/x86/platform/efi/quirks.c
+++ linux-x86/arch/x86/platform/efi/quirks.c
@@ -196,6 +196,10 @@ void __init efi_arch_mem_reserve(phys_ad
return;
}
 
+   /* No need to reserve regions that will never be freed. */
+   if (md.attribute & EFI_MEMORY_RUNTIME)
+   return;
+
if (addr + size > md.phys_addr + (md.num_pages << EFI_PAGE_SHIFT)) {
pr_err("Region spans EFI memory descriptors, %pa\n", &addr);
return;

Thanks
Dave


Re: kexec regression since 4.9 caused by efi

2017-03-16 Thread Omar Sandoval
On Thu, Mar 16, 2017 at 12:41:32PM +, Matt Fleming wrote:
> On Mon, 13 Mar, at 03:37:48PM, Dave Young wrote:
> > 
> > Omar, could you try below patch? Looking at the efi_mem_desc_lookup, it is 
> > not
> > correct to be used in efi_arch_mem_reserve, if it passed your test, I
> > can rewrite patch log with more background and send it out:
> > 
> > for_each_efi_memory_desc(md) {
> > [snip]
> > if (!(md->attribute & EFI_MEMORY_RUNTIME) &&
> > md->type != EFI_BOOT_SERVICES_DATA &&
> > md->type != EFI_RUNTIME_SERVICES_DATA) {
> > continue;
> > }
> > 
> > In above code, it meant to get a md of EFI_MEMORY_RUNTIME of either boot
> > data or runtime data, this is wrong for efi_mem_reserve, because we are
> > reserving boot data which has no EFI_MEMORY_RUNTIME attribute at the
> > running time. Just is happened to work and we did not capture the error.
> 
> Wouldn't something like this be simpler?
> 
> ---
> 
> diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
> index 30031d5293c4..cdfe8c628959 100644
> --- a/arch/x86/platform/efi/quirks.c
> +++ b/arch/x86/platform/efi/quirks.c
> @@ -201,6 +201,10 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 
> size)
>   return;
>   }
>  
> + /* No need to reserve regions that will never be freed. */
> + if (md.attribute & EFI_MEMORY_RUNTIME)
> + return;
> +
>   size += addr % EFI_PAGE_SIZE;
>   size = round_up(size, EFI_PAGE_SIZE);
>   addr = round_down(addr, EFI_PAGE_SIZE);

This works for me.

Reported-and-tested-by: Omar Sandoval 


Re: kexec regression since 4.9 caused by efi

2017-03-16 Thread Matt Fleming
On Mon, 13 Mar, at 03:37:48PM, Dave Young wrote:
> 
> Omar, could you try below patch? Looking at the efi_mem_desc_lookup, it is not
> correct to be used in efi_arch_mem_reserve, if it passed your test, I
> can rewrite patch log with more background and send it out:
> 
> for_each_efi_memory_desc(md) {
>   [snip]
> if (!(md->attribute & EFI_MEMORY_RUNTIME) &&
> md->type != EFI_BOOT_SERVICES_DATA &&
> md->type != EFI_RUNTIME_SERVICES_DATA) {
> continue;
> }
> 
> In above code, it meant to get a md of EFI_MEMORY_RUNTIME of either boot
> data or runtime data, this is wrong for efi_mem_reserve, because we are
> reserving boot data which has no EFI_MEMORY_RUNTIME attribute at the
> running time. Just is happened to work and we did not capture the error.

Wouldn't something like this be simpler?

---

diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 30031d5293c4..cdfe8c628959 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -201,6 +201,10 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 
size)
return;
}
 
+   /* No need to reserve regions that will never be freed. */
+   if (md.attribute & EFI_MEMORY_RUNTIME)
+   return;
+
size += addr % EFI_PAGE_SIZE;
size = round_up(size, EFI_PAGE_SIZE);
addr = round_down(addr, EFI_PAGE_SIZE);


Re: kexec regression since 4.9 caused by efi

2017-03-16 Thread Matt Fleming
On Thu, 09 Mar, at 12:53:36PM, Ard Biesheuvel wrote:
> 
> Hi Omar,
> 
> Thanks for tracking this down.
> 
> I wonder if this is an unintended side effect of the way we repurpose
> the EFI_MEMORY_RUNTIME attribute in efi_arch_mem_reserve(). AFAIUI,
> splitting memory map entries should only be necessary for regions that
> are not runtime memory regions to begin with, and so whether their
> virtual mapping address makes sense or not should be irrelevant.
> 
> Perhaps this only illustrates my lack of understanding of the x86 way
> of doing this, so perhaps Matt can shed some light on this?

Sorry for the delay.

Yes, Ard is correct. It's not necessary to split/reserve memory
regions that already have the EFI_MEMORY_RUNTIME attribute.


Re: kexec regression since 4.9 caused by efi

2017-03-13 Thread Dave Young
On 03/09/17 at 01:54am, Omar Sandoval wrote:
> On Thu, Mar 09, 2017 at 02:38:06PM +0800, Dave Young wrote:
> > Add efi/kexec list.
> > 
> > On 03/08/17 at 12:16pm, Omar Sandoval wrote:
> 
> [snip]
> 
> > I have no more clue yet from your provided log, but the runtime value is
> > odd to me. It is set in below code:
> > 
> > arch/x86/platform/efi/efi.c: efi_systab_init()
> > efi_systab.runtime = data ?
> >  (void *)(unsigned long)data->runtime :
> >  (void *)(unsigne long)systab64->runtime;
> > 
> > Here data is the setup_data passed by kexec-tools from normal kernel to
> > kexec kernel, efi_setup_data structure is like below: 
> > struct efi_setup_data {
> > u64 fw_vendor;
> > u64 runtime;
> > u64 tables;
> > u64 smbios;
> > u64 reserved[8];
> > };
> > 
> > kexec-tools get the runtime address from /sys/firmware/efi/runtime
> > 
> > So can you do some debuggin on your side, eg. see the sysfs runtime
> > value is correct or not. And add some printk in efi init path etc.
> 
> The attached patch fixes this for me.

Omar, could you try below patch? Looking at the efi_mem_desc_lookup, it is not
correct to be used in efi_arch_mem_reserve, if it passed your test, I
can rewrite patch log with more background and send it out:

for_each_efi_memory_desc(md) {
[snip]
if (!(md->attribute & EFI_MEMORY_RUNTIME) &&
md->type != EFI_BOOT_SERVICES_DATA &&
md->type != EFI_RUNTIME_SERVICES_DATA) {
continue;
}

In above code, it meant to get a md of EFI_MEMORY_RUNTIME of either boot
data or runtime data, this is wrong for efi_mem_reserve, because we are
reserving boot data which has no EFI_MEMORY_RUNTIME attribute at the
running time. Just is happened to work and we did not capture the error.

Signed-off-by: Dave Young 
---
 arch/x86/platform/efi/quirks.c |4 +++-
 drivers/firmware/efi/efi.c |   39 +++
 include/linux/efi.h|1 +
 3 files changed, 43 insertions(+), 1 deletion(-)

--- linux-x86.orig/arch/x86/platform/efi/quirks.c
+++ linux-x86/arch/x86/platform/efi/quirks.c
@@ -191,7 +191,9 @@ void __init efi_arch_mem_reserve(phys_ad
int num_entries;
void *new;
 
-   if (efi_mem_desc_lookup(addr, &md)) {
+   if (efi_md_lookup_boot_data(addr, &md)) {
+   if (md.attribute & EFI_MEMORY_RUNTIME)
+   return;
pr_err("Failed to lookup EFI memory descriptor for %pa\n", 
&addr);
return;
}
--- linux-x86.orig/drivers/firmware/efi/efi.c
+++ linux-x86/drivers/firmware/efi/efi.c
@@ -353,6 +353,45 @@ err_put:
 subsys_initcall(efisubsys_init);
 
 /*
+ * Find the efi memory descriptor for a given physical address.
+ * Given a physical address, if it exists within an EFI memory map
+ * entry of type EFI_BOOT_SERVICES_DATA and the entry has no attribute
+ * EFI_MEMORY_RUNTIME, then populate the supplied memory descriptor
+ * with the appropriate data.
+ */
+int __init efi_md_lookup_boot_data(u64 phys_addr,
+efi_memory_desc_t *out_md)
+{
+   efi_memory_desc_t *md;
+
+   if (!efi_enabled(EFI_MEMMAP)) {
+   pr_err_once("EFI_MEMMAP is not enabled.\n");
+   return -EINVAL;
+   }
+
+   if (!out_md) {
+   pr_err_once("out_md is null.\n");
+   return -EINVAL;
+   }
+
+   for_each_efi_memory_desc(md) {
+   u64 size;
+   u64 end;
+
+   if (md->type != EFI_BOOT_SERVICES_DATA)
+   continue;
+
+   size = md->num_pages << EFI_PAGE_SHIFT;
+   end = md->phys_addr + size;
+   if (phys_addr >= md->phys_addr && phys_addr < end) {
+   memcpy(out_md, md, sizeof(*out_md));
+   return 0;
+   }
+   }
+   return -ENOENT;
+}
+
+/*
  * Find the efi memory descriptor for a given physical address.  Given a
  * physical address, determine if it exists within an EFI Memory Map entry,
  * and if so, populate the supplied memory descriptor with the appropriate
--- linux-x86.orig/include/linux/efi.h
+++ linux-x86/include/linux/efi.h
@@ -979,6 +979,7 @@ extern u64 efi_mem_attribute (unsigned l
 extern int __init efi_uart_console_only (void);
 extern u64 efi_mem_desc_end(efi_memory_desc_t *md);
 extern int efi_mem_desc_lookup(u64 phys_addr, efi_memory_desc_t *out_md);
+extern int efi_md_lookup_boot_data(u64 phys_addr, efi_memory_desc_t *out_md);
 extern void efi_mem_reserve(phys_addr_t addr, u64 size);
 extern void efi_initialize_iomem_resources(struct resource *code_resource,
struct resource *data_resource, struct resource *bss_resource);


Re: kexec regression since 4.9 caused by efi

2017-03-09 Thread Dave Young
On 03/09/17 at 01:54am, Omar Sandoval wrote:
> On Thu, Mar 09, 2017 at 02:38:06PM +0800, Dave Young wrote:
> > Add efi/kexec list.
> > 
> > On 03/08/17 at 12:16pm, Omar Sandoval wrote:
> 
> [snip]
> 
> > I have no more clue yet from your provided log, but the runtime value is
> > odd to me. It is set in below code:
> > 
> > arch/x86/platform/efi/efi.c: efi_systab_init()
> > efi_systab.runtime = data ?
> >  (void *)(unsigned long)data->runtime :
> >  (void *)(unsigne long)systab64->runtime;
> > 
> > Here data is the setup_data passed by kexec-tools from normal kernel to
> > kexec kernel, efi_setup_data structure is like below: 
> > struct efi_setup_data {
> > u64 fw_vendor;
> > u64 runtime;
> > u64 tables;
> > u64 smbios;
> > u64 reserved[8];
> > };
> > 
> > kexec-tools get the runtime address from /sys/firmware/efi/runtime
> > 
> > So can you do some debuggin on your side, eg. see the sysfs runtime
> > value is correct or not. And add some printk in efi init path etc.
> 
> The attached patch fixes this for me.
> From 4b343f0b0b408469f28c973ea52877797a166313 Mon Sep 17 00:00:00 2001
> Message-Id: 
> <4b343f0b0b408469f28c973ea52877797a166313.1489053164.git.osan...@fb.com>
> From: Omar Sandoval 
> Date: Thu, 9 Mar 2017 01:46:19 -0800
> Subject: [PATCH] efi: adjust virt_addr when splitting descriptors in
>  efi_memmap_insert()
> 
> When we split efi memory descriptors, we adjust the physical address but
> not the virtual address it maps to. This leads to bogus memory mappings
> later when these virtual addresses are used.
> 
> This fixes a kexec boot regression since 8e80632fb23f ("efi/esrt: Use
> efi_mem_reserve() and avoid a kmalloc()"), although the bug was only
> exposed by that commit.
> 
> Signed-off-by: Omar Sandoval 
> ---
>  drivers/firmware/efi/memmap.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c
> index 78686443cb37..ca614db76faf 100644
> --- a/drivers/firmware/efi/memmap.c
> +++ b/drivers/firmware/efi/memmap.c
> @@ -298,6 +298,7 @@ void __init efi_memmap_insert(struct efi_memory_map 
> *old_memmap, void *buf,
>   memcpy(new, old, old_memmap->desc_size);
>   md = new;
>   md->phys_addr = m_end + 1;
> + md->virt_addr += md->phys_addr - start;
>   md->num_pages = (end - md->phys_addr + 1) >>
>   EFI_PAGE_SHIFT;
>   }
> @@ -312,6 +313,7 @@ void __init efi_memmap_insert(struct efi_memory_map 
> *old_memmap, void *buf,
>   md = new;
>   md->attribute |= m_attr;
>   md->phys_addr = m_start;
> + md->virt_addr += md->phys_addr - start;
>   md->num_pages = (m_end - m_start + 1) >>
>   EFI_PAGE_SHIFT;
>   /* last part */
> @@ -319,6 +321,7 @@ void __init efi_memmap_insert(struct efi_memory_map 
> *old_memmap, void *buf,
>   memcpy(new, old, old_memmap->desc_size);
>   md = new;
>   md->phys_addr = m_end + 1;
> + md->virt_addr += md->phys_addr - start;
>   md->num_pages = (end - m_end) >>
>   EFI_PAGE_SHIFT;
>   }
> @@ -333,6 +336,7 @@ void __init efi_memmap_insert(struct efi_memory_map 
> *old_memmap, void *buf,
>   memcpy(new, old, old_memmap->desc_size);
>   md = new;
>   md->phys_addr = m_start;
> + md->virt_addr += md->phys_addr - start;
>   md->num_pages = (end - md->phys_addr + 1) >>
>   EFI_PAGE_SHIFT;
>   md->attribute |= m_attr;
> -- 
> 2.12.0
> 

Nice, thanks for the debugging, so the problem is clear now.

Just Runtime areas are not necessarily to be reserved, for boot areas no
need to update the virt address. But I'm not sure about the fakemem
usage of this.

So need comments from Matt..

Thanks
Dave


Re: kexec regression since 4.9 caused by efi

2017-03-09 Thread Dave Young
On 03/09/17 at 12:53pm, Ard Biesheuvel wrote:
> On 9 March 2017 at 10:54, Omar Sandoval  wrote:
> > On Thu, Mar 09, 2017 at 02:38:06PM +0800, Dave Young wrote:
> >> Add efi/kexec list.
> >>
> >> On 03/08/17 at 12:16pm, Omar Sandoval wrote:
> >
> > [snip]
> >
> >> I have no more clue yet from your provided log, but the runtime value is
> >> odd to me. It is set in below code:
> >>
> >> arch/x86/platform/efi/efi.c: efi_systab_init()
> >>   efi_systab.runtime = data ?
> >>(void *)(unsigned long)data->runtime :
> >>(void *)(unsigne long)systab64->runtime;
> >>
> >> Here data is the setup_data passed by kexec-tools from normal kernel to
> >> kexec kernel, efi_setup_data structure is like below:
> >> struct efi_setup_data {
> >> u64 fw_vendor;
> >> u64 runtime;
> >> u64 tables;
> >> u64 smbios;
> >> u64 reserved[8];
> >> };
> >>
> >> kexec-tools get the runtime address from /sys/firmware/efi/runtime
> >>
> >> So can you do some debuggin on your side, eg. see the sysfs runtime
> >> value is correct or not. And add some printk in efi init path etc.
> >
> > The attached patch fixes this for me.
> 
> Hi Omar,
> 
> Thanks for tracking this down.
> 
> I wonder if this is an unintended side effect of the way we repurpose
> the EFI_MEMORY_RUNTIME attribute in efi_arch_mem_reserve(). AFAIUI,
> splitting memory map entries should only be necessary for regions that
> are not runtime memory regions to begin with, and so whether their
> virtual mapping address makes sense or not should be irrelevant.

In this case the esrt chunk are Runtime Data which is not necessary to
be reserved explicitly. I think efi_arch_mem_reserve are for boot areas.

Probably there could be esrt data which belongs to boot data? If we are
sure they are all runtime, the better fix may be just dropping the
efi_mem_reserve in esrt.c

> 
> Perhaps this only illustrates my lack of understanding of the x86 way
> of doing this, so perhaps Matt can shed some light on this?
> 
> Thanks,
> Ard.

Thanks
Dave


Re: kexec regression since 4.9 caused by efi

2017-03-09 Thread Ard Biesheuvel
On 9 March 2017 at 10:54, Omar Sandoval  wrote:
> On Thu, Mar 09, 2017 at 02:38:06PM +0800, Dave Young wrote:
>> Add efi/kexec list.
>>
>> On 03/08/17 at 12:16pm, Omar Sandoval wrote:
>
> [snip]
>
>> I have no more clue yet from your provided log, but the runtime value is
>> odd to me. It is set in below code:
>>
>> arch/x86/platform/efi/efi.c: efi_systab_init()
>>   efi_systab.runtime = data ?
>>(void *)(unsigned long)data->runtime :
>>(void *)(unsigne long)systab64->runtime;
>>
>> Here data is the setup_data passed by kexec-tools from normal kernel to
>> kexec kernel, efi_setup_data structure is like below:
>> struct efi_setup_data {
>> u64 fw_vendor;
>> u64 runtime;
>> u64 tables;
>> u64 smbios;
>> u64 reserved[8];
>> };
>>
>> kexec-tools get the runtime address from /sys/firmware/efi/runtime
>>
>> So can you do some debuggin on your side, eg. see the sysfs runtime
>> value is correct or not. And add some printk in efi init path etc.
>
> The attached patch fixes this for me.

Hi Omar,

Thanks for tracking this down.

I wonder if this is an unintended side effect of the way we repurpose
the EFI_MEMORY_RUNTIME attribute in efi_arch_mem_reserve(). AFAIUI,
splitting memory map entries should only be necessary for regions that
are not runtime memory regions to begin with, and so whether their
virtual mapping address makes sense or not should be irrelevant.

Perhaps this only illustrates my lack of understanding of the x86 way
of doing this, so perhaps Matt can shed some light on this?

Thanks,
Ard.


Re: kexec regression since 4.9 caused by efi

2017-03-09 Thread Omar Sandoval
On Thu, Mar 09, 2017 at 02:38:06PM +0800, Dave Young wrote:
> Add efi/kexec list.
> 
> On 03/08/17 at 12:16pm, Omar Sandoval wrote:

[snip]

> I have no more clue yet from your provided log, but the runtime value is
> odd to me. It is set in below code:
> 
> arch/x86/platform/efi/efi.c: efi_systab_init()
>   efi_systab.runtime = data ?
>(void *)(unsigned long)data->runtime :
>(void *)(unsigne long)systab64->runtime;
> 
> Here data is the setup_data passed by kexec-tools from normal kernel to
> kexec kernel, efi_setup_data structure is like below: 
> struct efi_setup_data {
> u64 fw_vendor;
> u64 runtime;
> u64 tables;
> u64 smbios;
> u64 reserved[8];
> };
> 
> kexec-tools get the runtime address from /sys/firmware/efi/runtime
> 
> So can you do some debuggin on your side, eg. see the sysfs runtime
> value is correct or not. And add some printk in efi init path etc.

The attached patch fixes this for me.
>From 4b343f0b0b408469f28c973ea52877797a166313 Mon Sep 17 00:00:00 2001
Message-Id: <4b343f0b0b408469f28c973ea52877797a166313.1489053164.git.osan...@fb.com>
From: Omar Sandoval 
Date: Thu, 9 Mar 2017 01:46:19 -0800
Subject: [PATCH] efi: adjust virt_addr when splitting descriptors in
 efi_memmap_insert()

When we split efi memory descriptors, we adjust the physical address but
not the virtual address it maps to. This leads to bogus memory mappings
later when these virtual addresses are used.

This fixes a kexec boot regression since 8e80632fb23f ("efi/esrt: Use
efi_mem_reserve() and avoid a kmalloc()"), although the bug was only
exposed by that commit.

Signed-off-by: Omar Sandoval 
---
 drivers/firmware/efi/memmap.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c
index 78686443cb37..ca614db76faf 100644
--- a/drivers/firmware/efi/memmap.c
+++ b/drivers/firmware/efi/memmap.c
@@ -298,6 +298,7 @@ void __init efi_memmap_insert(struct efi_memory_map *old_memmap, void *buf,
 			memcpy(new, old, old_memmap->desc_size);
 			md = new;
 			md->phys_addr = m_end + 1;
+			md->virt_addr += md->phys_addr - start;
 			md->num_pages = (end - md->phys_addr + 1) >>
 EFI_PAGE_SHIFT;
 		}
@@ -312,6 +313,7 @@ void __init efi_memmap_insert(struct efi_memory_map *old_memmap, void *buf,
 			md = new;
 			md->attribute |= m_attr;
 			md->phys_addr = m_start;
+			md->virt_addr += md->phys_addr - start;
 			md->num_pages = (m_end - m_start + 1) >>
 EFI_PAGE_SHIFT;
 			/* last part */
@@ -319,6 +321,7 @@ void __init efi_memmap_insert(struct efi_memory_map *old_memmap, void *buf,
 			memcpy(new, old, old_memmap->desc_size);
 			md = new;
 			md->phys_addr = m_end + 1;
+			md->virt_addr += md->phys_addr - start;
 			md->num_pages = (end - m_end) >>
 EFI_PAGE_SHIFT;
 		}
@@ -333,6 +336,7 @@ void __init efi_memmap_insert(struct efi_memory_map *old_memmap, void *buf,
 			memcpy(new, old, old_memmap->desc_size);
 			md = new;
 			md->phys_addr = m_start;
+			md->virt_addr += md->phys_addr - start;
 			md->num_pages = (end - md->phys_addr + 1) >>
 EFI_PAGE_SHIFT;
 			md->attribute |= m_attr;
-- 
2.12.0



Re: kexec regression since 4.9 caused by efi

2017-03-08 Thread Omar Sandoval
On Thu, Mar 09, 2017 at 10:21:36AM +0800, Dave Young wrote:
> I have no esrt machine to test, can you share the full kernel log with
> efi=debug in kernel cmdline?
> 
> *) normal boot kernel log without the reverting
> *) kexec boot log with and without the reverting

Attached.
[0.00] Linux version 4.11.0-rc1 (osan...@devbig561.prn1.facebook.com) 
(gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #53 SMP Wed Mar 8 
12:07:16 PST 2017
[0.00] Command line: BOOT_IMAGE=/vmlinuz-4.11.0-rc1 ro root=LABEL=/ 
ipv6.autoconf=0 erst_disable biosdevname=0 net.ifnames=0 fsck.repair=yes 
pcie_pme=nomsi 
netconsole=+@2401:db00:0011:b03e:face::0009:/eth0,1514@2401:db00:eef0:a59::/02:90:fb:5b:b7:1e
 crashkernel=128M console=tty0 console=ttyS1,57600 efi=debug
[0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point 
registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[0.00] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[0.00] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, 
using 'standard' format.
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009] usable
[0.00] BIOS-e820: [mem 0x0010-0x750bdfff] usable
[0.00] BIOS-e820: [mem 0x750be000-0x75ddbfff] reserved
[0.00] BIOS-e820: [mem 0x75ddc000-0x75e32fff] ACPI data
[0.00] BIOS-e820: [mem 0x75e33000-0x7642dfff] ACPI NVS
[0.00] BIOS-e820: [mem 0x7642e000-0x7bcd9fff] reserved
[0.00] BIOS-e820: [mem 0x7bcda000-0x7bcdafff] usable
[0.00] BIOS-e820: [mem 0x7bcdb000-0x7bd60fff] reserved
[0.00] BIOS-e820: [mem 0x7bd61000-0x7bff] usable
[0.00] BIOS-e820: [mem 0x7c00-0x8fff] reserved
[0.00] BIOS-e820: [mem 0xfed1c000-0xfed44fff] reserved
[0.00] BIOS-e820: [mem 0xff00-0x] reserved
[0.00] BIOS-e820: [mem 0x0001-0x00407fff] usable
[0.00] NX (Execute Disable) protection: active
[0.00] extended physical RAM map:
[0.00] reserve setup_data: [mem 0x-0x0009] 
usable
[0.00] reserve setup_data: [mem 0x0010-0x709fb017] 
usable
[0.00] reserve setup_data: [mem 0x709fb018-0x70a4b857] 
usable
[0.00] reserve setup_data: [mem 0x70a4b858-0x70a4c017] 
usable
[0.00] reserve setup_data: [mem 0x70a4c018-0x70a53c57] 
usable
[0.00] reserve setup_data: [mem 0x70a53c58-0x750bdfff] 
usable
[0.00] reserve setup_data: [mem 0x750be000-0x75ddbfff] 
reserved
[0.00] reserve setup_data: [mem 0x75ddc000-0x75e32fff] 
ACPI data
[0.00] reserve setup_data: [mem 0x75e33000-0x7642dfff] 
ACPI NVS
[0.00] reserve setup_data: [mem 0x7642e000-0x7bcd9fff] 
reserved
[0.00] reserve setup_data: [mem 0x7bcda000-0x7bcdafff] 
usable
[0.00] reserve setup_data: [mem 0x7bcdb000-0x7bd60fff] 
reserved
[0.00] reserve setup_data: [mem 0x7bd61000-0x7bff] 
usable
[0.00] reserve setup_data: [mem 0x7c00-0x8fff] 
reserved
[0.00] reserve setup_data: [mem 0xfed1c000-0xfed44fff] 
reserved
[0.00] reserve setup_data: [mem 0xff00-0x] 
reserved
[0.00] reserve setup_data: [mem 0x0001-0x00407fff] 
usable
[0.00] efi: EFI v2.40 by American Megatrends
[0.00] efi:  ACPI=0x75f5c000  ACPI 2.0=0x75f5c000  ESRT=0x7bc4d018  
SMBIOS=0xf05e0  SMBIOS 3.0=0x7bb2f000  MPS=0xfc9e0 
[0.00] efi: mem00: [Boot Code  |   |  |  |  |  |  |  |   
|WB|WT|WC|UC] range=[0x-0x7fff] (0MB)
[0.00] efi: mem01: [Loader Data|   |  |  |  |  |  |  |   
|WB|WT|WC|UC] range=[0x8000-0x8fff] (0MB)
[0.00] efi: mem02: [Conventional Memory|   |  |  |  |  |  |  |   
|WB|WT|WC|UC] range=[0x9000-0x0003] (0MB)
[0.00] efi: mem03: [Boot Code  |   |  |  |  |  |  |  |   
|WB|WT|WC|UC] range=[0x0004-0x0009] (0MB)
[0.00] efi: mem04: [Conventional Memory|   |  |  |  |  |  |  |   
|WB|WT|WC|UC] range=[0x0010-0x00ff] (15MB)
[0.00] efi: mem05: [Loader Data|   |  |  |  |  |  |  |   
|WB|WT|WC|UC] range=[0x0100-0x024cdfff] (20MB)
[0.00] efi: mem06: [Conventional Memory|   |  |  |  |  |  |  |   
|WB|WT|WC|UC] range=[0x024ce000-0x000

Re: kexec regression since 4.9 caused by efi

2017-03-08 Thread Dave Young
Add efi/kexec list.

On 03/08/17 at 12:16pm, Omar Sandoval wrote:
> Hi, everyone,
> 
> Since 4.9, kexec results in the following panic on some of our servers:
> 
> [0.001000] general protection fault:  [#1] SMP
> [0.001000] Modules linked in:
> [0.001000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.11.0-rc1 #53
> [0.001000] Hardware name: Wiwynn Leopard-Orv2/Leopard-DDR BW, BIOS LBM05  
>  09/30/2016
> [0.001000] task: 81e0e4c0 task.stack: 81e0
> [0.001000] RIP: 0010:virt_efi_set_variable+0x85/0x1a0
> [0.001000] RSP: :81e03e18 EFLAGS: 00010202
> [0.001000] RAX: afafafafafafafaf RBX: 81e3a4e0 RCX: 
> 0007
> [0.001000] RDX: 81e03e70 RSI: 81e3a4e0 RDI: 
> 88407f8c2de0
> [0.001000] RBP: 81e03e60 R08:  R09: 
> 
> [0.001000] R10:  R11:  R12: 
> 81e03e70
> [0.001000] R13: 0007 R14:  R15: 
> 
> [0.001000] FS:  () GS:881fff60() 
> knlGS:
> [0.001000] CS:  0010 DS:  ES:  CR0: 80050033
> [0.001000] CR2: 88407f30f000 CR3: 001fff102000 CR4: 
> 000406b0
> [0.001000] DR0:  DR1:  DR2: 
> 
> [0.001000] DR3:  DR6: fffe0ff0 DR7: 
> 0400
> [0.001000] Call Trace:
> [0.001000]  efi_delete_dummy_variable+0x7a/0x80
> [0.001000]  efi_enter_virtual_mode+0x3e2/0x494
> [0.001000]  start_kernel+0x392/0x418
> [0.001000]  ? set_init_arg+0x55/0x55
> [0.001000]  x86_64_start_reservations+0x2a/0x2c
> [0.001000]  x86_64_start_kernel+0xea/0xed
> [0.001000]  start_cpu+0x14/0x14
> [0.001000] Code: 42 25 8d ff 80 3d 43 77 95 00 00 75 68 9c 8f 04 24 48 8b 
> 05 3e 7d 7e 00 48 89 de 4d 89 f9 4d 89 f0 44 89 e9 4c 89 e2 48 8b 40 58 <48> 
> 8b 78 58 31 c0 e8 90 e4 92 ff 48 8b 3c 24 48 c7 c6 2b 0a ca
> [0.001000] RIP: virt_efi_set_variable+0x85/0x1a0 RSP: 81e03e18
> [0.001000] ---[ end trace 0bd213e540e9b19f ]---
> [0.001000] Kernel panic - not syncing: Fatal exception
> [0.001000] ---[ end Kernel panic - not syncing: Fatal exception
> 
> Booting normally (i.e., not kexec) still works.
> 
> The decoded code is:
> 
> 
>0:   42 25 8d ff 80 3d   rex.X and $0x3d80ff8d,%eax
>6:   43 77 95rex.XB ja 0xff9e
>9:   00 00   add%al,(%rax)
>b:   75 68   jne0x75
>d:   9c  pushfq
>e:   8f 04 24popq   (%rsp)
>   11:   48 8b 05 3e 7d 7e 00mov0x7e7d3e(%rip),%rax# 0x7e7d56
>   18:   48 89 demov%rbx,%rsi
>   1b:   4d 89 f9mov%r15,%r9
>   1e:   4d 89 f0mov%r14,%r8
>   21:   44 89 e9mov%r13d,%ecx
>   24:   4c 89 e2mov%r12,%rdx
>   27:   48 8b 40 58 mov0x58(%rax),%rax
>   2b:*  48 8b 78 58 mov0x58(%rax),%rdi  <-- trapping 
> instruction
>   2f:   31 c0   xor%eax,%eax
>   31:   e8 90 e4 92 ff  callq  0xff92e4c6
>   36:   48 8b 3c 24 mov(%rsp),%rdi
>   3a:   48  rex.W
>   3b:   c7  .byte 0xc7
>   3c:   c6  (bad)
>   3d:   2b 0a   sub(%rdx),%ecx
>   3f:   ca  .byte 0xca
> 
> If I'm reading this correctly, efi.systab->runtime == 0xafafafafafafafaf,

I have no more clue yet from your provided log, but the runtime value is
odd to me. It is set in below code:

arch/x86/platform/efi/efi.c: efi_systab_init()
efi_systab.runtime = data ?
 (void *)(unsigned long)data->runtime :
 (void *)(unsigne long)systab64->runtime;

Here data is the setup_data passed by kexec-tools from normal kernel to
kexec kernel, efi_setup_data structure is like below: 
struct efi_setup_data {
u64 fw_vendor;
u64 runtime;
u64 tables;
u64 smbios;
u64 reserved[8];
};

kexec-tools get the runtime address from /sys/firmware/efi/runtime

So can you do some debuggin on your side, eg. see the sysfs runtime
value is correct or not. And add some printk in efi init path etc.

> and we're crashing when we try to dereference that.
> 
> Here is the output of efi=debug from before the crash:
> 
> [0.00] Linux version 4.11.0-rc1 (osan...@devbig561.prn1.facebook.com) 
> (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #53 SMP Wed Mar 8 
> 12:07:16 PST 2017
> [0.00] Command line: BOOT_IMAGE=/vmlinuz-4.6.7-34_fbk7_2504_g8275185 
> ro root=LABEL=/ ipv6.autoconf=0 erst_disable biosdevname=0 net.ifnames=0 
> fsck.repair=yes pcie_pme=nomsi 
> netconsole=+@2401:db00:0011:b03e:face::0009:0

Re: kexec regression since 4.9 caused by efi

2017-03-08 Thread Dave Young
On 03/08/17 at 12:16pm, Omar Sandoval wrote:
> Hi, everyone,
> 
> Since 4.9, kexec results in the following panic on some of our servers:
> 
> [0.001000] general protection fault:  [#1] SMP
> [0.001000] Modules linked in:
> [0.001000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.11.0-rc1 #53
> [0.001000] Hardware name: Wiwynn Leopard-Orv2/Leopard-DDR BW, BIOS LBM05  
>  09/30/2016
> [0.001000] task: 81e0e4c0 task.stack: 81e0
> [0.001000] RIP: 0010:virt_efi_set_variable+0x85/0x1a0
> [0.001000] RSP: :81e03e18 EFLAGS: 00010202
> [0.001000] RAX: afafafafafafafaf RBX: 81e3a4e0 RCX: 
> 0007
> [0.001000] RDX: 81e03e70 RSI: 81e3a4e0 RDI: 
> 88407f8c2de0
> [0.001000] RBP: 81e03e60 R08:  R09: 
> 
> [0.001000] R10:  R11:  R12: 
> 81e03e70
> [0.001000] R13: 0007 R14:  R15: 
> 
> [0.001000] FS:  () GS:881fff60() 
> knlGS:
> [0.001000] CS:  0010 DS:  ES:  CR0: 80050033
> [0.001000] CR2: 88407f30f000 CR3: 001fff102000 CR4: 
> 000406b0
> [0.001000] DR0:  DR1:  DR2: 
> 
> [0.001000] DR3:  DR6: fffe0ff0 DR7: 
> 0400
> [0.001000] Call Trace:
> [0.001000]  efi_delete_dummy_variable+0x7a/0x80
> [0.001000]  efi_enter_virtual_mode+0x3e2/0x494
> [0.001000]  start_kernel+0x392/0x418
> [0.001000]  ? set_init_arg+0x55/0x55
> [0.001000]  x86_64_start_reservations+0x2a/0x2c
> [0.001000]  x86_64_start_kernel+0xea/0xed
> [0.001000]  start_cpu+0x14/0x14
> [0.001000] Code: 42 25 8d ff 80 3d 43 77 95 00 00 75 68 9c 8f 04 24 48 8b 
> 05 3e 7d 7e 00 48 89 de 4d 89 f9 4d 89 f0 44 89 e9 4c 89 e2 48 8b 40 58 <48> 
> 8b 78 58 31 c0 e8 90 e4 92 ff 48 8b 3c 24 48 c7 c6 2b 0a ca
> [0.001000] RIP: virt_efi_set_variable+0x85/0x1a0 RSP: 81e03e18
> [0.001000] ---[ end trace 0bd213e540e9b19f ]---
> [0.001000] Kernel panic - not syncing: Fatal exception
> [0.001000] ---[ end Kernel panic - not syncing: Fatal exception
> 
> Booting normally (i.e., not kexec) still works.
> 
> The decoded code is:
> 
> 
>0:   42 25 8d ff 80 3d   rex.X and $0x3d80ff8d,%eax
>6:   43 77 95rex.XB ja 0xff9e
>9:   00 00   add%al,(%rax)
>b:   75 68   jne0x75
>d:   9c  pushfq
>e:   8f 04 24popq   (%rsp)
>   11:   48 8b 05 3e 7d 7e 00mov0x7e7d3e(%rip),%rax# 0x7e7d56
>   18:   48 89 demov%rbx,%rsi
>   1b:   4d 89 f9mov%r15,%r9
>   1e:   4d 89 f0mov%r14,%r8
>   21:   44 89 e9mov%r13d,%ecx
>   24:   4c 89 e2mov%r12,%rdx
>   27:   48 8b 40 58 mov0x58(%rax),%rax
>   2b:*  48 8b 78 58 mov0x58(%rax),%rdi  <-- trapping 
> instruction
>   2f:   31 c0   xor%eax,%eax
>   31:   e8 90 e4 92 ff  callq  0xff92e4c6
>   36:   48 8b 3c 24 mov(%rsp),%rdi
>   3a:   48  rex.W
>   3b:   c7  .byte 0xc7
>   3c:   c6  (bad)
>   3d:   2b 0a   sub(%rdx),%ecx
>   3f:   ca  .byte 0xca
> 
> If I'm reading this correctly, efi.systab->runtime == 0xafafafafafafafaf,
> and we're crashing when we try to dereference that.
> 
> Here is the output of efi=debug from before the crash:
> 
> [0.00] Linux version 4.11.0-rc1 (osan...@devbig561.prn1.facebook.com) 
> (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #53 SMP Wed Mar 8 
> 12:07:16 PST 2017
> [0.00] Command line: BOOT_IMAGE=/vmlinuz-4.6.7-34_fbk7_2504_g8275185 
> ro root=LABEL=/ ipv6.autoconf=0 erst_disable biosdevname=0 net.ifnames=0 
> fsck.repair=yes pcie_pme=nomsi 
> netconsole=+@2401:db00:0011:b03e:face::0009:/eth0,1514@2401:db00:eef0:a59::/02:90:fb:5b:b7:1e
>  crashkernel=128M console=tty0 co
> nsole=ttyS1,57600 efi=debug
> [0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point 
> registers'
> [0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
> [0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
> [0.00] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
> [0.00] x86/fpu: Enabled xstate features 0x7, context size is 832 
> bytes, using 'standard' format.
> [0.00] e820: BIOS-provided physical RAM map:
> [0.00] BIOS-e820: [mem 0x0100-0x0009] usable
> [0.00] BIOS-e820: [mem 0x0010-0x750bdfff] usable
> [0.00] BIOS-e820: [mem 0x750be000-0x75ddbf