Re: 4.9-rc1 boot regression, ambiguous bisect result

2016-11-02 Thread Dan Williams
On Wed, Nov 2, 2016 at 5:41 PM, Neri, Ricardo  wrote:
> On Sun, 2016-10-30 at 08:59 -0700, Dan Williams wrote:
>> On Sun, Oct 30, 2016 at 5:08 AM, Thorsten Leemhuis
>>  wrote:
>> > JFYI: I added this report to the list of regressions for Linux 4.9. I'll
>> > watch this thread for further updates on this issue to document progress
>> > in my weekly reports. Please let me know via regressi...@leemhuis.info
>> > in case the discussion moves to a different place (bugzilla or another
>> > mail thread for example). tia!
>> >
>> > Current status (afaics) in my report: This looks stuck. Or was it
>> > discussed (or even fixed) somewhere else?
>>
>> Thanks, and no, not fixed yet. I've not found the time to run the
>> experiments Matt needs, but a colleague has offered to look into it.
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-efi" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
> Dan, was there a special configuration that you enabled for this bug?
> The person working on the bug can't reproduce the bug in v4.9-rc1 or
> v4.9-rc3.

I am PXE-booting the platform to an NFS root filesystem.


Re: 4.9-rc1 boot regression, ambiguous bisect result

2016-11-02 Thread Neri, Ricardo
On Sun, 2016-10-30 at 08:59 -0700, Dan Williams wrote:
> On Sun, Oct 30, 2016 at 5:08 AM, Thorsten Leemhuis
>  wrote:
> > JFYI: I added this report to the list of regressions for Linux 4.9. I'll
> > watch this thread for further updates on this issue to document progress
> > in my weekly reports. Please let me know via regressi...@leemhuis.info
> > in case the discussion moves to a different place (bugzilla or another
> > mail thread for example). tia!
> >
> > Current status (afaics) in my report: This looks stuck. Or was it
> > discussed (or even fixed) somewhere else?
> 
> Thanks, and no, not fixed yet. I've not found the time to run the
> experiments Matt needs, but a colleague has offered to look into it.

Dan, was there a special configuration that you enabled for this bug?
The person working on the bug can't reproduce the bug in v4.9-rc1 or
v4.9-rc3.


Re: 4.9-rc1 boot regression, ambiguous bisect result

2016-10-31 Thread Matt Fleming
On Sun, 30 Oct, at 08:59:58AM, Dan Williams wrote:
> On Sun, Oct 30, 2016 at 5:08 AM, Thorsten Leemhuis
>  wrote:
> > JFYI: I added this report to the list of regressions for Linux 4.9. I'll
> > watch this thread for further updates on this issue to document progress
> > in my weekly reports. Please let me know via regressi...@leemhuis.info
> > in case the discussion moves to a different place (bugzilla or another
> > mail thread for example). tia!
> >
> > Current status (afaics) in my report: This looks stuck. Or was it
> > discussed (or even fixed) somewhere else?
> 
> Thanks, and no, not fixed yet. I've not found the time to run the
> experiments Matt needs, but a colleague has offered to look into it.

Of course, if you are willing to help with debugging, Thorsten, it
would be much appreciated and this bug might get fixed sooner.


Re: 4.9-rc1 boot regression, ambiguous bisect result

2016-10-30 Thread Dan Williams
On Sun, Oct 30, 2016 at 5:08 AM, Thorsten Leemhuis
 wrote:
> JFYI: I added this report to the list of regressions for Linux 4.9. I'll
> watch this thread for further updates on this issue to document progress
> in my weekly reports. Please let me know via regressi...@leemhuis.info
> in case the discussion moves to a different place (bugzilla or another
> mail thread for example). tia!
>
> Current status (afaics) in my report: This looks stuck. Or was it
> discussed (or even fixed) somewhere else?

Thanks, and no, not fixed yet. I've not found the time to run the
experiments Matt needs, but a colleague has offered to look into it.


Re: 4.9-rc1 boot regression, ambiguous bisect result

2016-10-30 Thread Thorsten Leemhuis
JFYI: I added this report to the list of regressions for Linux 4.9. I'll
watch this thread for further updates on this issue to document progress
in my weekly reports. Please let me know via regressi...@leemhuis.info
in case the discussion moves to a different place (bugzilla or another
mail thread for example). tia!

Current status (afaics) in my report: This looks stuck. Or was it
discussed (or even fixed) somewhere else?

Ciao, Thorsten

On 22.10.2016 01:20, Dan Williams wrote:
> On Fri, Oct 21, 2016 at 1:20 PM, Matt Fleming  wrote:
>> On Fri, 21 Oct, at 04:41:29PM, Matt Fleming wrote:
>>>
>>> FYI, I've been able to reproduce some crash when using your EFI memory
>>> map layout under Qemu and forcing the ESRT driver to reserve the space.
>>
>> Nope, that was a bug in my hack. I can't get Qemu to crash while using
>> your memory map layout.
>>
>> Any chance you can insert "while(1)" loops into the EFI boot paths for
>> a kernel that is known to reboot or trigger a triple fault in kernels
>> that hang, so that we can narrow in on the issue. See,
>>
>>   http://www.codeblueprint.co.uk/2015/04/early-x86-linux-boot-debug-tricks.html
> 
> I can take a look, but it will not be until Monday when I have
> physical access to the system again.
> 
> http://news.gmane.org/find-root.php?message_id=CAPcyv4jkVcBwecxwt1P+p-fMSuen9B9xHEVf0BjM5uJZ4_jAdw%40mail.gmail.com
>  
> http://mid.gmane.org/CAPcyv4jkVcBwecxwt1P+p-fMSuen9B9xHEVf0BjM5uJZ4_jAdw%40mail.gmail.com
> 


Re: 4.9-rc1 boot regression, ambiguous bisect result

2016-10-21 Thread Dan Williams
On Fri, Oct 21, 2016 at 1:20 PM, Matt Fleming  wrote:
> On Fri, 21 Oct, at 04:41:29PM, Matt Fleming wrote:
>>
>> FYI, I've been able to reproduce some crash when using your EFI memory
>> map layout under Qemu and forcing the ESRT driver to reserve the space.
>
> Nope, that was a bug in my hack. I can't get Qemu to crash while using
> your memory map layout.
>
> Any chance you can insert "while(1)" loops into the EFI boot paths for
> a kernel that is known to reboot or trigger a triple fault in kernels
> that hang, so that we can narrow in on the issue. See,
>
>   http://www.codeblueprint.co.uk/2015/04/early-x86-linux-boot-debug-tricks.html

I can take a look, but it will not be until Monday when I have
physical access to the system again.


Re: 4.9-rc1 boot regression, ambiguous bisect result

2016-10-21 Thread Matt Fleming
On Fri, 21 Oct, at 04:41:29PM, Matt Fleming wrote:
> 
> FYI, I've been able to reproduce some crash when using your EFI memory
> map layout under Qemu and forcing the ESRT driver to reserve the space.
 
Nope, that was a bug in my hack. I can't get Qemu to crash while using
your memory map layout.

Any chance you can insert "while(1)" loops into the EFI boot paths for
a kernel that is known to reboot or trigger a triple fault in kernels
that hang, so that we can narrow in on the issue. See,

  http://www.codeblueprint.co.uk/2015/04/early-x86-linux-boot-debug-tricks.html
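
The while(1) trick is effectively a search over boot-path locations: if a kernel that used to reboot now hangs, execution reached the loop and the fault lies later; if it still reboots, the fault lies earlier. A minimal C sketch of that narrowing loop follows. It is not kernel code: it assumes the candidate spots are numbered in the order the boot path executes them, and `probe_fn`, `find_first_not_reached` and `simulated_probe` are hypothetical stand-ins for the manual "edit, build, boot, observe" step.

```c
#include <assert.h>
#include <stddef.h>

/* What the machine did with a while(1) placed at a given checkpoint. */
enum boot_outcome { OUTCOME_HANG, OUTCOME_REBOOT };

/* Hypothetical probe: build a kernel with the loop at checkpoint k,
 * boot it, and report the observed outcome. */
typedef enum boot_outcome (*probe_fn)(size_t k, void *ctx);

/*
 * Checkpoints 0..n-1 are assumed to lie in boot order on one path.
 * HANG at k means checkpoint k was reached; REBOOT means the fault
 * fired first. Returns r such that the fault lies after checkpoint
 * r-1 and before checkpoint r (r == 0: before the first checkpoint,
 * r == n: after the last one).
 */
size_t find_first_not_reached(size_t n, probe_fn probe, void *ctx)
{
    assert(probe != NULL);
    size_t lo = 0, hi = n;              /* the answer lies in [lo, hi] */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (probe(mid, ctx) == OUTCOME_HANG)
            lo = mid + 1;               /* mid was reached: fault is later */
        else
            hi = mid;                   /* rebooted before mid: fault is earlier */
    }
    return lo;
}

/* Simulated machine for trying the search: ctx points at the number of
 * checkpoints the real boot reaches before it faults. */
enum boot_outcome simulated_probe(size_t k, void *ctx)
{
    size_t reached_count = *(const size_t *)ctx;
    return k < reached_count ? OUTCOME_HANG : OUTCOME_REBOOT;
}
```

For kernels that hang rather than reboot, the same search applies with the signals swapped: a deliberately triggered triple fault that now reboots the machine shows that the spot was reached.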


Re: 4.9-rc1 boot regression, ambiguous bisect result

2016-10-21 Thread Dan Williams
On Fri, Oct 21, 2016 at 12:00 AM, Ingo Molnar  wrote:
>
> * Dan Williams  wrote:
>
>> On Thu, Oct 20, 2016 at 8:22 AM, Dan Williams  wrote:
>> > On Thu, Oct 20, 2016 at 5:29 AM, Matt Fleming  wrote:
>> >> On Wed, 19 Oct, at 09:04:29PM, Dan Williams wrote:
>> >>> Hi,
>> >>>
>> >>> I am currently unable to boot a Yoga 900 with latest mainline, but 4.8 boots.
>> >>>
>> >>> The symptom is a reboot before the video console is available.
>> >>>
>> >>> I bisected to commit 816e76129ed5 "efi: Allow drivers to reserve boot
>> >>> services forever".  However, that commit is known to be broken.  The
>> >>> proposed fix, commit 92dc33501bfb "x86/efi: Round EFI memmap
>> >>> reservations to EFI_PAGE_SIZE", also exhibits the reboot problem.
>> >>>
>> >>> During the bisect some of the stopping points landed on commits that
>> >>> caused the boot process to hang rather than cause a reboot.  The
>> >>> commits that resulted in a hang are marked "git bisect skip" in this
>> >>> log: https://gist.github.com/djbw/1b501daa98192a42ae848f03bb59c30e
>> >>>
>> >>> I'll try treating those hangs as bad bisect results and re-run the
>> >>> full bisect tomorrow.  In the meantime I wonder if the bisect log
>> >>> implicates a better regression candidate?
>> >>
>> >> Could you mail the dmesg output when booting a known working kernel
>> >> with efi=debug ?
>> >
>> > Here it is:
>> >
>> > https://gist.github.com/djbw/cae05e721b159d5ad7b146d7a93f5fa2
>>
>> I am able to build a kernel and boot the platform with the following
>> set of reverts:
>>
>>   Revert "x86/efi: Round EFI memmap reservations to EFI_PAGE_SIZE"
>>   Revert "x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data"
>>   Revert "efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()"
>>   Revert "efi: Allow drivers to reserve boot services forever"
>
> Could you please describe the bootup behavior after each revert? I.e. wild guess:
>
>vanilla kernel:
># spontaneous reboot
>+ Revert "x86/efi: Round EFI memmap reservations to EFI_PAGE_SIZE":
># spontaneous reboot
>    + Revert "x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data":
># hang
>+ Revert "efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()":
># hang
>+ Revert "efi: Allow drivers to reserve boot services forever":
>== works
>
> ?

In this case all but the last revert produce the same result: an instant
reboot after loading the kernel.  I have not been able to pinpoint
what changes that behavior to the hang conditions I saw mid-bisect.

The first three reverts are only there to get the kernel to build
again after reverting "efi: Allow drivers to reserve boot services
forever".


Re: 4.9-rc1 boot regression, ambiguous bisect result

2016-10-21 Thread Matt Fleming
On Thu, 20 Oct, at 12:37:16PM, Dan Williams wrote:
> 
> I am able to build a kernel and boot the platform with the following
> set of reverts:
> 
>   Revert "x86/efi: Round EFI memmap reservations to EFI_PAGE_SIZE"
>   Revert "x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data"
>   Revert "efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()"
>   Revert "efi: Allow drivers to reserve boot services forever"

FYI, I've been able to reproduce some crash when using your EFI memory
map layout under Qemu and forcing the ESRT driver to reserve the space.

It looks like the new EFI memmap we allocate as part of the
reservation is smaller than the old one - which is backwards.
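
The "backwards" observation follows from simple arithmetic: carving a reserved range out of an existing memory-map descriptor can only split it (into at most a head, the reserved part, and a tail), so the rebuilt map needs at least as many entries as the old one. A minimal C sketch of that sizing, assuming 4 KiB EFI pages and inclusive range ends; `round_to_efi_pages`, `split_extra_entries` and `new_memmap_size` are hypothetical helpers, not the kernel's implementation:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_EFI_PAGE_SIZE 4096ULL   /* EFI pages are 4 KiB */

struct mem_range { uint64_t start, end; };   /* end is inclusive */

/* Round [addr, addr + size) outward to whole EFI pages. */
struct mem_range round_to_efi_pages(uint64_t addr, uint64_t size)
{
    const uint64_t mask = SKETCH_EFI_PAGE_SIZE - 1;
    struct mem_range r;
    assert(size > 0);
    r.start = addr & ~mask;
    r.end = ((addr + size + mask) & ~mask) - 1;
    return r;
}

/* Extra entries (0, 1 or 2) needed when range r is carved out of a
 * descriptor covering md; the descriptor itself is replaced in place. */
size_t split_extra_entries(struct mem_range md, struct mem_range r)
{
    if (r.end < md.start || r.start > md.end)
        return 0;                      /* no overlap: descriptor untouched */
    size_t extra = 0;
    if (r.start > md.start)
        extra++;                       /* an unreserved head remains */
    if (r.end < md.end)
        extra++;                       /* an unreserved tail remains */
    return extra;
}

/* Byte size of the rebuilt map: old entries plus the split pieces. */
size_t new_memmap_size(size_t old_nr, size_t desc_size, size_t extra)
{
    size_t new_size = (old_nr + extra) * desc_size;
    assert(new_size >= old_nr * desc_size);   /* never smaller than before */
    return new_size;
}
```

Under this model a new map smaller than the old one cannot be produced by a correct split, which is why the observed shrink points at a sizing bug.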

Still debugging...


Re: 4.9-rc1 boot regression, ambiguous bisect result

2016-10-21 Thread Ingo Molnar

* Dan Williams  wrote:

> On Thu, Oct 20, 2016 at 8:22 AM, Dan Williams  wrote:
> > On Thu, Oct 20, 2016 at 5:29 AM, Matt Fleming  wrote:
> >> On Wed, 19 Oct, at 09:04:29PM, Dan Williams wrote:
> >>> Hi,
> >>>
> >>> I am currently unable to boot a Yoga 900 with latest mainline, but 4.8 boots.
> >>>
> >>> The symptom is a reboot before the video console is available.
> >>>
> >>> I bisected to commit 816e76129ed5 "efi: Allow drivers to reserve boot
> >>> services forever".  However, that commit is known to be broken.  The
> >>> proposed fix, commit 92dc33501bfb "x86/efi: Round EFI memmap
> >>> reservations to EFI_PAGE_SIZE", also exhibits the reboot problem.
> >>>
> >>> During the bisect some of the stopping points landed on commits that
> >>> caused the boot process to hang rather than cause a reboot.  The
> >>> commits that resulted in a hang are marked "git bisect skip" in this
> >>> log: https://gist.github.com/djbw/1b501daa98192a42ae848f03bb59c30e
> >>>
> >>> I'll try treating those hangs as bad bisect results and re-run the
> >>> full bisect tomorrow.  In the meantime I wonder if the bisect log
> >>> implicates a better regression candidate?
> >>
> >> Could you mail the dmesg output when booting a known working kernel
> >> with efi=debug ?
> >
> > Here it is:
> >
> > https://gist.github.com/djbw/cae05e721b159d5ad7b146d7a93f5fa2
> 
> I am able to build a kernel and boot the platform with the following
> set of reverts:
> 
>   Revert "x86/efi: Round EFI memmap reservations to EFI_PAGE_SIZE"
>   Revert "x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data"
>   Revert "efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()"
>   Revert "efi: Allow drivers to reserve boot services forever"

Could you please describe the bootup behavior after each revert? I.e. wild guess:

   vanilla kernel:
   # spontaneous reboot
   + Revert "x86/efi: Round EFI memmap reservations to EFI_PAGE_SIZE":
   # spontaneous reboot
   + Revert "x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data":
   # hang
   + Revert "efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()":
   # hang
   + Revert "efi: Allow drivers to reserve boot services forever":
   == works

?

Thanks,

Ingo


Re: 4.9-rc1 boot regression, ambiguous bisect result

2016-10-20 Thread Dan Williams
On Thu, Oct 20, 2016 at 8:22 AM, Dan Williams  wrote:
> On Thu, Oct 20, 2016 at 5:29 AM, Matt Fleming  wrote:
>> On Wed, 19 Oct, at 09:04:29PM, Dan Williams wrote:
>>> Hi,
>>>
>>> I am currently unable to boot a Yoga 900 with latest mainline, but 4.8 boots.
>>>
>>> The symptom is a reboot before the video console is available.
>>>
>>> I bisected to commit 816e76129ed5 "efi: Allow drivers to reserve boot
>>> services forever".  However, that commit is known to be broken.  The
>>> proposed fix, commit 92dc33501bfb "x86/efi: Round EFI memmap
>>> reservations to EFI_PAGE_SIZE", also exhibits the reboot problem.
>>>
>>> During the bisect some of the stopping points landed on commits that
>>> caused the boot process to hang rather than cause a reboot.  The
>>> commits that resulted in a hang are marked "git bisect skip" in this
>>> log: https://gist.github.com/djbw/1b501daa98192a42ae848f03bb59c30e
>>>
>>> I'll try treating those hangs as bad bisect results and re-run the
>>> full bisect tomorrow.  In the meantime I wonder if the bisect log
>>> implicates a better regression candidate?
>>
>> Could you mail the dmesg output when booting a known working kernel
>> with efi=debug ?
>
> Here it is:
>
> https://gist.github.com/djbw/cae05e721b159d5ad7b146d7a93f5fa2

I am able to build a kernel and boot the platform with the following
set of reverts:

  Revert "x86/efi: Round EFI memmap reservations to EFI_PAGE_SIZE"
  Revert "x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data"
  Revert "efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()"
  Revert "efi: Allow drivers to reserve boot services forever"


Re: 4.9-rc1 boot regression, ambiguous bisect result

2016-10-20 Thread Dan Williams
On Thu, Oct 20, 2016 at 5:29 AM, Matt Fleming  wrote:
> On Wed, 19 Oct, at 09:04:29PM, Dan Williams wrote:
>> Hi,
>>
>> I am currently unable to boot a Yoga 900 with latest mainline, but 4.8 boots.
>>
>> The symptom is a reboot before the video console is available.
>>
>> I bisected to commit 816e76129ed5 "efi: Allow drivers to reserve boot
>> services forever".  However, that commit is known to be broken.  The
>> proposed fix, commit 92dc33501bfb "x86/efi: Round EFI memmap
>> reservations to EFI_PAGE_SIZE", also exhibits the reboot problem.
>>
>> During the bisect some of the stopping points landed on commits that
>> caused the boot process to hang rather than cause a reboot.  The
>> commits that resulted in a hang are marked "git bisect skip" in this
>> log: https://gist.github.com/djbw/1b501daa98192a42ae848f03bb59c30e
>>
>> I'll try treating those hangs as bad bisect results and re-run the
>> full bisect tomorrow.  In the meantime I wonder if the bisect log
>> implicates a better regression candidate?
>
> Could you mail the dmesg output when booting a known working kernel
> with efi=debug ?

Here it is:

https://gist.github.com/djbw/cae05e721b159d5ad7b146d7a93f5fa2


Re: 4.9-rc1 boot regression, ambiguous bisect result

2016-10-20 Thread Matt Fleming
On Wed, 19 Oct, at 09:04:29PM, Dan Williams wrote:
> Hi,
> 
> I am currently unable to boot a Yoga 900 with latest mainline, but 4.8 boots.
> 
> The symptom is a reboot before the video console is available.
> 
> I bisected to commit 816e76129ed5 "efi: Allow drivers to reserve boot
> services forever".  However, that commit is known to be broken.  The
> proposed fix, commit 92dc33501bfb "x86/efi: Round EFI memmap
> reservations to EFI_PAGE_SIZE", also exhibits the reboot problem.
> 
> During the bisect some of the stopping points landed on commits that
> caused the boot process to hang rather than cause a reboot.  The
> commits that resulted in a hang are marked "git bisect skip" in this
> log: https://gist.github.com/djbw/1b501daa98192a42ae848f03bb59c30e
> 
> I'll try treating those hangs as bad bisect results and re-run the
> full bisect tomorrow.  In the meantime I wonder if the bisect log
> implicates a better regression candidate?

Could you mail the dmesg output when booting a known working kernel
with efi=debug ?
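
Before collecting that dmesg, it is worth confirming the option actually reached the kernel. A minimal C sketch, assuming the command line has already been read into a string (for example from /proc/cmdline), that parameters are space-separated without quoting, and that `efi=` takes a comma-separated option list; `cmdline_has_efi_debug` is a hypothetical helper:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if the space-separated command line contains an efi=
 * parameter whose comma-separated value list includes "debug". */
bool cmdline_has_efi_debug(const char *cmdline)
{
    const char *p = cmdline;
    while (*p) {
        while (*p == ' ')
            p++;                              /* skip separators */
        const char *tok = p;
        while (*p && *p != ' ')
            p++;                              /* end of this parameter */
        size_t len = (size_t)(p - tok);
        if (len > 4 && strncmp(tok, "efi=", 4) == 0) {
            const char *v = tok + 4, *end = p;
            while (v < end) {
                const char *comma = memchr(v, ',', (size_t)(end - v));
                const char *opt_end = comma ? comma : end;
                if ((size_t)(opt_end - v) == 5 && strncmp(v, "debug", 5) == 0)
                    return true;              /* exact option match */
                v = comma ? comma + 1 : end;
            }
        }
    }
    return false;
}
```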


Re: 4.9-rc1 boot regression, ambiguous bisect result

2016-10-19 Thread Ingo Molnar

* Dan Williams  wrote:

> Hi,
> 
> I am currently unable to boot a Yoga 900 with latest mainline, but 4.8 boots.
> 
> The symptom is a reboot before the video console is available.
> 
> I bisected to commit 816e76129ed5 "efi: Allow drivers to reserve boot
> services forever".  However, that commit is known to be broken.  The
> proposed fix, commit 92dc33501bfb "x86/efi: Round EFI memmap
> reservations to EFI_PAGE_SIZE", also exhibits the reboot problem.
> 
> During the bisect some of the stopping points landed on commits that
> caused the boot process to hang rather than cause a reboot.  The
> commits that resulted in a hang are marked "git bisect skip" in this
> log: https://gist.github.com/djbw/1b501daa98192a42ae848f03bb59c30e
> 
> I'll try treating those hangs as bad bisect results and re-run the
> full bisect tomorrow.  In the meantime I wonder if the bisect log
> implicates a better regression candidate?

You could also try reverts of the suspicious commits, and then, if the reverted 
kernel works fine, create a more linear history by cherry-picking them in the 
right order - and then be able to pinpoint the bad commit with a higher 
confidence.
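
The revert-then-reapply approach turns an ambiguous bisect over merged history into a walk along a short linear series: start from the kernel with the suspicious commits reverted (known to boot), re-apply them one at a time in their original order, and boot after each step. A minimal C sketch of that walk, assuming every step builds and that once a step fails, later steps keep failing; `boot_ok_fn`, `first_bad_patch` and `simulated_boot` are hypothetical stand-ins for a manual build-and-boot harness:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <assert.h>

/* Returned when even the fully reverted base fails to boot. */
#define BASE_BROKEN SIZE_MAX

/* Hypothetical harness: true if the kernel with the first `applied`
 * patches of the series re-applied boots successfully. */
typedef bool (*boot_ok_fn)(size_t applied, void *ctx);

/* Returns the 1-based index of the first patch whose addition breaks
 * boot, 0 if every step boots, or BASE_BROKEN if the base fails. */
size_t first_bad_patch(size_t n, boot_ok_fn boot_ok, void *ctx)
{
    assert(boot_ok != NULL);
    if (!boot_ok(0, ctx))
        return BASE_BROKEN;          /* the reverts alone did not help */
    for (size_t k = 1; k <= n; k++)
        if (!boot_ok(k, ctx))
            return k;                /* patch k introduced the failure */
    return 0;
}

/* Simulated harness: ctx points at the 1-based index of the patch that
 * introduces the failure, or 0 if none does. */
bool simulated_boot(size_t applied, void *ctx)
{
    size_t culprit = *(const size_t *)ctx;
    return culprit == 0 || applied < culprit;
}
```

With only four patches, a linear walk costs at most five boots (the base plus four steps) and every tested point is a meaningful kernel; for a longer series the same linear history could be bisected instead.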

Thanks,

Ingo


4.9-rc1 boot regression, ambiguous bisect result

2016-10-19 Thread Dan Williams
Hi,

I am currently unable to boot a Yoga 900 with latest mainline, but 4.8 boots.

The symptom is a reboot before the video console is available.

I bisected to commit 816e76129ed5 "efi: Allow drivers to reserve boot
services forever".  However, that commit is known to be broken.  The
proposed fix, commit 92dc33501bfb "x86/efi: Round EFI memmap
reservations to EFI_PAGE_SIZE", also exhibits the reboot problem.

During the bisect some of the stopping points landed on commits that
caused the boot process to hang rather than cause a reboot.  The
commits that resulted in a hang are marked "git bisect skip" in this
log: https://gist.github.com/djbw/1b501daa98192a42ae848f03bb59c30e

I'll try treating those hangs as bad bisect results and re-run the
full bisect tomorrow.  In the meantime I wonder if the bisect log
implicates a better regression candidate?
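
Why the skipped commits make the result ambiguous can be shown on a simplified model: a linear history in which each commit is marked good, bad, or skip (untestable, as with the hangs here). Assuming the regression is monotone (every commit after the first bad one is also bad), the culprit can be any commit after the last known-good one, up to and including the first known-bad one, so every skipped commit in that window stays a candidate. The names below are hypothetical, and real `git bisect` works on a commit graph rather than an array.

```c
#include <assert.h>
#include <stddef.h>

enum verdict { GOOD, BAD, SKIP };

/* Inclusive index range that may contain the first bad commit. */
struct window { size_t first, last; };

/* v[0] is assumed GOOD and at least one entry BAD. */
struct window culprit_window(const enum verdict *v, size_t n)
{
    assert(n > 0 && v[0] == GOOD);

    size_t first_bad = n;
    for (size_t i = 0; i < n; i++)
        if (v[i] == BAD) {
            first_bad = i;
            break;
        }
    assert(first_bad < n);

    size_t last_good = 0;
    for (size_t i = 0; i < first_bad; i++)
        if (v[i] == GOOD)
            last_good = i;

    /* Everything after the last good commit, up to the first bad one,
     * is a candidate, including any SKIPped commits in between. */
    struct window w = { last_good + 1, first_bad };
    return w;
}
```

Re-marking the hangs as bad, as planned above, shrinks the window to one commit, but only under the extra assumption that the hang and the reboot share a cause.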