On Fri, 10 Apr 2020 22:29:13 +0200 Stephen Berman <stephen.ber...@gmx.net> 
wrote:

> I've built current development LFS using jhalfs and when I invoke (via
> sudo or logged in as root) `shutdown -h now', the system appears to hang
[...]

On Sun, 26 Apr 2020 00:33:42 +0200 Stephen Berman <stephen.ber...@gmx.net> 
wrote:

> I've completed the bisection of the mainline kernel between the good
> v5.1 and the bad v5.2, and here's the result:
>
> 6d25be5782e482eb93e3de0c94d0a517879377d0 is the first bad commit
> commit 6d25be5782e482eb93e3de0c94d0a517879377d0
> Author: Thomas Gleixner <t...@linutronix.de>
> Date:   Wed Mar 13 17:55:48 2019 +0100
>
>     sched/core, workqueues: Distangle worker accounting from rq lock
[...]

On Mon, 27 Apr 2020 09:58:58 +0200 Stephen Berman <stephen.ber...@gmx.net> 
wrote:

> On Mon, 27 Apr 2020 03:45:13 -0400 Michael Shell <li...@michaelshell.org> 
> wrote:
>
>> On Sun, 26 Apr 2020 23:53:12 +0200
>> Stephen Berman <stephen.ber...@gmx.net> wrote:
>>
>>> As noted in my reply to Ken, I couldn't cleanly revert the commit in
>>> recent mainline or stable kernel sources.
>>
>>
>> If it were me, in this case, I would even consider manually editing
>> the code. It might come to that to find out exactly where the problem
>> is.
>
> I wanted to do that, but half of the hunks of the patch failed to apply,
> and in at least one case a function was changed that no longer exists,
> at least under the name occurring in the patch, so trying to disentangle
> this is not straightforward, at least for someone like me who's not
> familiar with the code.
>
>> In any case, please do report this issue to the kernel developers
>> and do let us know if you ever find out what the problem was.
>
> Will do.

I did so last May 1, and that began an exchange on the Linux kernel
mailing list with one of the kernel hackers, Sebastian Siewior (see
https://lore.kernel.org/lkml/87bln7ves7....@gmx.net/).  He diagnosed
the problem: ACPI mandates overly aggressive polling of a temperature
sensor on my motherboard, which causes workers to pile up.  These
workers must be flushed on shutdown, and because there are so many,
the system appears to hang (or just takes a very long time to shut
down).  By June 16 he had found a workaround: increase the polling
interval from 1 to 30 seconds by adding this to the kernel command
line at boot (thermal.tzp is given in tenths of a second):

thermal.tzp=300

With this, `shutdown -h now' again promptly powered off my machine, as
it had with earlier, unaffected kernels.  (I was also contacted by
someone who had the same problem and found my post to the LFS support
list.  It turned out we have very similar motherboards: I have a
Gigabyte Z390 M Gaming Rev. 1001 and he has a Gigabyte Z390 Designare
rev 1.0.  He confirmed that the workaround worked for him too.)
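Since thermal.tzp is in tenths of a second, thermal.tzp=300 means a
30-second polling interval.  As a rough illustration (not something
from the kernel thread), here is a minimal Python sketch that takes a
kernel command line string, such as the contents of /proc/cmdline
after rebooting, and reports the requested polling interval in
seconds, so you can check that the parameter actually reached the
kernel.  The helper name thermal_tzp_seconds is hypothetical, and
taking the last occurrence when the parameter is repeated is an
assumption of this sketch.

```python
"""Hypothetical helper: report the ACPI thermal polling interval that a
kernel command line requests via thermal.tzp (units: tenths of a second)."""

from typing import Optional


def thermal_tzp_seconds(cmdline: str) -> Optional[float]:
    """Return the thermal.tzp interval in seconds, or None if it is absent.

    thermal.tzp is given in deciseconds, so thermal.tzp=300 means 30 s.
    Assumption of this sketch: if the parameter is repeated, the last
    occurrence is the one reported.
    """
    interval = None
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "thermal.tzp" and sep:
            interval = int(value) / 10  # deciseconds -> seconds
    return interval


if __name__ == "__main__":
    # On a running Linux system the boot command line is in /proc/cmdline.
    try:
        with open("/proc/cmdline") as f:
            print(thermal_tzp_seconds(f.read()))
    except FileNotFoundError:
        print("no /proc/cmdline (not a Linux system?)")
```

For example, a command line ending in "ro thermal.tzp=300" yields 30.0,
and one without the parameter yields None.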

Subsequently Rafael Wysocki, an ACPI hacker, started looking for a
real fix, and on January 14 he found it
(https://lore.kernel.org/lkml/3391226.KRKnzuvfpg@kreacher/).  I applied
the patch to my local mainline kernel 5.11.0-rc4+ and confirmed that it
fixes the hang.

The patch was committed to the linux-next integration testing tree on
January 25
(https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/drivers/acpi/thermal.c?h=next-20210127&id=81b704d3e4674e09781d331df73d76675d5ad8cb)
and should appear in the mainline kernel shortly.

Steve Berman
-- 
http://lists.linuxfromscratch.org/listinfo/lfs-support
FAQ: http://www.linuxfromscratch.org/blfs/faq.html
Unsubscribe: See the above information page

Do not top post on this list.

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?

http://en.wikipedia.org/wiki/Posting_style
