Re: 4.9-rc1 boot regression, ambiguous bisect result
On Wed, Nov 2, 2016 at 5:41 PM, Neri, Ricardo wrote:
> On Sun, 2016-10-30 at 08:59 -0700, Dan Williams wrote:
>> Thanks, and no, not fixed yet. I've not found the time to run the
>> experiments Matt needs, but a colleague has offered to look into it.
>
> Dan, was there a special configuration that you enabled for this bug?
> The person working on the bug can't reproduce the bug in v4.9-rc1 or
> v4.9-rc3.

I am PXE-booting the platform to an NFS root filesystem.
Re: 4.9-rc1 boot regression, ambiguous bisect result
On Sun, 2016-10-30 at 08:59 -0700, Dan Williams wrote:
> Thanks, and no, not fixed yet. I've not found the time to run the
> experiments Matt needs, but a colleague has offered to look into it.

Dan, was there a special configuration that you enabled for this bug?
The person working on the bug can't reproduce the bug in v4.9-rc1 or
v4.9-rc3.
Re: 4.9-rc1 boot regression, ambiguous bisect result
On Sun, 30 Oct, at 08:59:58AM, Dan Williams wrote:
> Thanks, and no, not fixed yet. I've not found the time to run the
> experiments Matt needs, but a colleague has offered to look into it.

Of course, if you are willing to help with debugging, Thorsten, it would
be much appreciated and this bug might get fixed sooner.
Re: 4.9-rc1 boot regression, ambiguous bisect result
On Sun, Oct 30, 2016 at 5:08 AM, Thorsten Leemhuis wrote:
> JFYI: I added this report to the list of regressions for Linux 4.9. I'll
> watch this thread for further updates on this issue to document progress
> in my weekly reports. Please let me know via regressi...@leemhuis.info
> in case the discussion moves to a different place (bugzilla or another
> mail thread for example). tia!
>
> Current status (afaics) in my report: This looks stuck. Or was it
> discussed (or even fixed) somewhere else?

Thanks, and no, not fixed yet. I've not found the time to run the
experiments Matt needs, but a colleague has offered to look into it.
Re: 4.9-rc1 boot regression, ambiguous bisect result
JFYI: I added this report to the list of regressions for Linux 4.9. I'll
watch this thread for further updates on this issue to document progress
in my weekly reports. Please let me know via regressi...@leemhuis.info
in case the discussion moves to a different place (bugzilla or another
mail thread for example). tia!

Current status (afaics) in my report: This looks stuck. Or was it
discussed (or even fixed) somewhere else?

Ciao, Thorsten

On 22.10.2016 01:20, Dan Williams wrote:
> On Fri, Oct 21, 2016 at 1:20 PM, Matt Fleming wrote:
>> Any chance you can insert "while(1)" loops into the EFI boot paths for
>> a kernel that is known to reboot or trigger a triple fault in kernels
>> that hang, so that we can narrow in on the issue. See,
>>
>> http://www.codeblueprint.co.uk/2015/04/early-x86-linux-boot-debug-tricks.html
>
> I can take a look, but it will not be until Monday when I have
> physical access to the system again.
>
> http://news.gmane.org/find-root.php?message_id=CAPcyv4jkVcBwecxwt1P+p-fMSuen9B9xHEVf0BjM5uJZ4_jAdw%40mail.gmail.com
>
> http://mid.gmane.org/CAPcyv4jkVcBwecxwt1P+p-fMSuen9B9xHEVf0BjM5uJZ4_jAdw%40mail.gmail.com
Re: 4.9-rc1 boot regression, ambiguous bisect result
On Fri, Oct 21, 2016 at 1:20 PM, Matt Fleming wrote:
> On Fri, 21 Oct, at 04:41:29PM, Matt Fleming wrote:
>> FYI, I've been able to reproduce some crash when using your EFI memory
>> map layout under Qemu and forcing the ESRT driver to reserve the space.
>
> Nope, that was a bug in my hack. I can't get Qemu to crash while using
> your memory map layout.
>
> Any chance you can insert "while(1)" loops into the EFI boot paths for
> a kernel that is known to reboot or trigger a triple fault in kernels
> that hang, so that we can narrow in on the issue. See,
>
> http://www.codeblueprint.co.uk/2015/04/early-x86-linux-boot-debug-tricks.html

I can take a look, but it will not be until Monday when I have
physical access to the system again.
Re: 4.9-rc1 boot regression, ambiguous bisect result
On Fri, 21 Oct, at 04:41:29PM, Matt Fleming wrote:
> FYI, I've been able to reproduce some crash when using your EFI memory
> map layout under Qemu and forcing the ESRT driver to reserve the space.

Nope, that was a bug in my hack. I can't get Qemu to crash while using
your memory map layout.

Any chance you can insert "while(1)" loops into the EFI boot paths for
a kernel that is known to reboot, or trigger a triple fault in kernels
that hang, so that we can narrow in on the issue? See:

http://www.codeblueprint.co.uk/2015/04/early-x86-linux-boot-debug-tricks.html
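The narrowing that the "while(1)" trick enables can be sketched as a search over checkpoints. The following is only a user-space model in Python, not kernel code: the stage names, `make_machine()` and `locate_faulting_stage()` are all invented for illustration. It assumes a kernel that spontaneously reboots, so a boot that hangs means the inserted loop was reached, and a boot that still reboots means the fault happened before the loop.

```python
from typing import Callable, List

# Hypothetical, made-up checkpoints along a boot path (not real kernel symbols).
STAGES = ["entry", "parse_memmap", "reserve_regions", "setup_page_tables", "start_kernel"]


def make_machine(fault_index: int) -> Callable[[int], str]:
    """Model a machine that resets while executing STAGES[fault_index]."""
    def boot_with_halt_before(i: int) -> str:
        # A while(1) placed before stage i is reached only if the fault
        # has not happened yet, i.e. the faulting stage is i or later.
        return "hang" if i <= fault_index else "reboot"
    return boot_with_halt_before


def locate_faulting_stage(stages: List[str],
                          boot_with_halt_before: Callable[[int], str]) -> str:
    """Binary-search for the stage that triggers the reset.

    Invariant: a halt before stages[lo] is reached (boot hangs), while a
    halt at position hi is never reached (boot still reboots).
    """
    lo, hi = 0, len(stages)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if boot_with_halt_before(mid) == "hang":
            lo = mid   # execution got this far, fault is at mid or later
        else:
            hi = mid   # reset happened before the halt, fault is earlier
    return stages[lo]


print(locate_faulting_stage(STAGES, make_machine(2)))
```

For a kernel that hangs rather than reboots, the same search applies with the observations swapped: force a triple fault at the checkpoint, and a reboot then means the checkpoint was reached. Each probe costs one rebuild and one boot on the affected hardware, which is why halving the range per attempt matters.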
Re: 4.9-rc1 boot regression, ambiguous bisect result
On Fri, Oct 21, 2016 at 12:00 AM, Ingo Molnar wrote:
> * Dan Williams wrote:
>> I am able to build a kernel and boot the platform with the following
>> set of reverts:
>>
>> Revert "x86/efi: Round EFI memmap reservations to EFI_PAGE_SIZE"
>> Revert "x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data"
>> Revert "efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()"
>> Revert "efi: Allow drivers to reserve boot services forever"
>
> Could you please describe the bootup behavior after each revert? I.e.
> wild guess:
>
>   vanilla kernel:
>     # spontaneous reboot
>   + Revert "x86/efi: Round EFI memmap reservations to EFI_PAGE_SIZE":
>     # spontaneous reboot
>   + Revert "x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data":
>     # hang
>   + Revert "efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()":
>     # hang
>   + Revert "efi: Allow drivers to reserve boot services forever":
>     == works
>
> ?

In this case all but the last revert produce the same result, instant
reboot after loading the kernel. I have not been able to pinpoint what
changes that behavior to the hang conditions I saw mid-bisect. The first
three reverts are just there to get the kernel to build again after
reverting "efi: Allow drivers to reserve boot services forever".
Re: 4.9-rc1 boot regression, ambiguous bisect result
On Thu, 20 Oct, at 12:37:16PM, Dan Williams wrote:
> I am able to build a kernel and boot the platform with the following
> set of reverts:
>
> Revert "x86/efi: Round EFI memmap reservations to EFI_PAGE_SIZE"
> Revert "x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data"
> Revert "efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()"
> Revert "efi: Allow drivers to reserve boot services forever"

FYI, I've been able to reproduce some crash when using your EFI memory
map layout under Qemu and forcing the ESRT driver to reserve the space.

It looks like the new EFI memmap we allocate as part of the reservation
is smaller than the old one - which is backwards.

Still debugging...
Re: 4.9-rc1 boot regression, ambiguous bisect result
* Dan Williams wrote:
> I am able to build a kernel and boot the platform with the following
> set of reverts:
>
> Revert "x86/efi: Round EFI memmap reservations to EFI_PAGE_SIZE"
> Revert "x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data"
> Revert "efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()"
> Revert "efi: Allow drivers to reserve boot services forever"

Could you please describe the bootup behavior after each revert? I.e.
wild guess:

  vanilla kernel:
    # spontaneous reboot
  + Revert "x86/efi: Round EFI memmap reservations to EFI_PAGE_SIZE":
    # spontaneous reboot
  + Revert "x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data":
    # hang
  + Revert "efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()":
    # hang
  + Revert "efi: Allow drivers to reserve boot services forever":
    == works

?

Thanks,

Ingo
Re: 4.9-rc1 boot regression, ambiguous bisect result
On Thu, Oct 20, 2016 at 8:22 AM, Dan Williams wrote:
> On Thu, Oct 20, 2016 at 5:29 AM, Matt Fleming wrote:
>> Could you mail the dmesg output when booting a known working kernel
>> with efi=debug ?
>
> Here it is:
>
> https://gist.github.com/djbw/cae05e721b159d5ad7b146d7a93f5fa2

I am able to build a kernel and boot the platform with the following
set of reverts:

Revert "x86/efi: Round EFI memmap reservations to EFI_PAGE_SIZE"
Revert "x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data"
Revert "efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()"
Revert "efi: Allow drivers to reserve boot services forever"
Re: 4.9-rc1 boot regression, ambiguous bisect result
On Thu, Oct 20, 2016 at 5:29 AM, Matt Fleming wrote:
> Could you mail the dmesg output when booting a known working kernel
> with efi=debug ?

Here it is:

https://gist.github.com/djbw/cae05e721b159d5ad7b146d7a93f5fa2
Re: 4.9-rc1 boot regression, ambiguous bisect result
On Wed, 19 Oct, at 09:04:29PM, Dan Williams wrote:
> I'll try treating those hangs as bad bisect results and re-run the
> full bisect tomorrow. In the meantime I wonder if the bisect log
> implicates a better regression candidate?

Could you mail the dmesg output when booting a known working kernel
with efi=debug?
Re: 4.9-rc1 boot regression, ambiguous bisect result
* Dan Williams wrote:
> During the bisect some of the stopping points landed on commits that
> caused the boot process to hang rather than cause a reboot. The
> commits that resulted in a hang are marked "git bisect skip" in this
> log: https://gist.github.com/djbw/1b501daa98192a42ae848f03bb59c30e
>
> I'll try treating those hangs as bad bisect results and re-run the
> full bisect tomorrow. In the meantime I wonder if the bisect log
> implicates a better regression candidate?

You could also try reverts of the suspicious commits, and then, if the
reverted kernel works fine, create a more linear history by
cherry-picking them in the right order - and then be able to pinpoint
the bad commit with a higher confidence.

Thanks,

Ingo
4.9-rc1 boot regression, ambiguous bisect result
Hi,

I am currently unable to boot a Yoga 900 with latest mainline, but 4.8 boots.

The symptom is a reboot before the video console is available.

I bisected to commit 816e76129ed5 "efi: Allow drivers to reserve boot
services forever". However, that commit is known to be broken. The
proposed fix, commit 92dc33501bfb "x86/efi: Round EFI memmap
reservations to EFI_PAGE_SIZE", also exhibits the reboot problem.

During the bisect some of the stopping points landed on commits that
caused the boot process to hang rather than cause a reboot. The
commits that resulted in a hang are marked "git bisect skip" in this
log: https://gist.github.com/djbw/1b501daa98192a42ae848f03bb59c30e

I'll try treating those hangs as bad bisect results and re-run the
full bisect tomorrow. In the meantime I wonder if the bisect log
implicates a better regression candidate?
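To illustrate why skipping the hanging commits can leave the bisect ambiguous, and what changes when hangs are counted as bad, here is a toy model in Python. It is a sketch under loud assumptions: the history is linear and entirely invented (c0 to c3 boot, c4 to c6 hang, c7 to c9 reboot), and `bisect_with_skips()` is a simplified stand-in, not git's actual algorithm, which works on the commit graph and chooses test points differently.

```python
from typing import Callable, List


def bisect_with_skips(commits: List[str], test: Callable[[str], str]) -> List[str]:
    """Return the commits that could be the first bad one.

    commits is ordered oldest to newest; commits[0] is known good and
    commits[-1] is known bad. test() returns "good", "bad" or "skip".
    """
    lo, hi = 0, len(commits) - 1          # last known good, first known bad
    skipped = set()
    while True:
        untested = [i for i in range(lo + 1, hi) if i not in skipped]
        if not untested:
            break                          # only skipped commits remain
        mid = (lo + hi) // 2
        pick = min(untested, key=lambda i: abs(i - mid))  # closest to midpoint
        verdict = test(commits[pick])
        if verdict == "good":
            lo = pick
        elif verdict == "bad":
            hi = pick
        else:
            skipped.add(pick)
    return commits[lo + 1:hi + 1]


# Invented history for illustration only.
COMMITS = [f"c{i}" for i in range(10)]
OUTCOME = {c: "boots" for c in COMMITS[:4]}
OUTCOME.update({c: "hang" for c in COMMITS[4:7]})
OUTCOME.update({c: "reboot" for c in COMMITS[7:]})


def hang_as_skip(commit: str) -> str:
    return {"boots": "good", "hang": "skip", "reboot": "bad"}[OUTCOME[commit]]


def hang_as_bad(commit: str) -> str:
    return {"boots": "good", "hang": "bad", "reboot": "bad"}[OUTCOME[commit]]


print(bisect_with_skips(COMMITS, hang_as_skip))  # ['c4', 'c5', 'c6', 'c7']
print(bisect_with_skips(COMMITS, hang_as_bad))   # ['c4']
```

With hangs skipped, the model cannot separate the skipped commits from the first rebooting one, so it returns a range. With hangs counted as bad, it converges on the first commit that fails in any way, which may or may not be the same bug as the reboot.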