Mikhail <mp39...@gmail.com> writes:

> Recently I bought the subject laptop, but it has an issue: when I
> git clone any huge tree, like the Linux kernel, it shuts down at the
> beginning of the 'Resolving deltas' stage. I tested Debian 11 and
> OpenBSD (current): Debian shut down almost immediately, while OpenBSD
> ran for a varying time, roughly 3 minutes, before shutting down.
>
> The service guys set up Windows 10 on it and git clone worked fine
> there; they also ran AIDA64 stress testing for a day. I also tested
> the latest Ubuntu 22.04.1, and it worked without the issue. I did some
> digging and came up with this commit:
>
> ---
> commit 086aa750ab8f1698a6c6eaafe1458279776ce66d
> from: tb <t...@openbsd.org>
> date: Wed Aug 11 18:15:50 2021 UTC
>
> Make hw.setperf percentages proportional to the enhanced speed step
> frequencies on intel processors. This way, the default hw.setperf=99
> corresponds to the maximum ordinary speed while setting it to 100
> enables turbo mode.
>
> Tested in snaps for a week, positive feedback from several.
>
> M  sys/arch/amd64/amd64/est.c
> ---
>
> Apparently setperf was set to 100, and if the CPU works in that "turbo
> mode" for long enough it overheats and shuts down (sensors shows
> temperature of ~100C while in 'Resolving deltas').
>
> The workaround was to set:
> hw.perfpolicy=manual
> hw.setperf=99
>
> I watched how Windows 10 behaves and noticed that when a CPU-intensive
> task starts, it boosts the CPU to 100% for about 5 seconds, then drops
> it to 85% and keeps working in this mode until the end of the task.
>
> My previous laptop is around 12 years old, so I don't follow how modern
> HW in this segment works. I was told that current laptops can work in
> high-speed mode only for a while, and then they should drop back to
> normal speeds; otherwise behavior is unpredictable (as with this one).
>
> I'd like to know from your experience: is that a shitty cooling design
> from Lenovo, or should the OSes be more careful about setting turbo
> mode? Since the commit is 1 year old and no one has complained, I
> assume it's the former, but I would like to know the opinion of the
> list.

Both, imo. My Surface Go3 exhibits this, so if I'm compiling a kernel or
running a high-cpu process for more than 30s or so, I manually set
hw.setperf to ~45-55; otherwise it overheats and the firmware does a
safety shutoff.

Some folks, myself included, have experimented with implementing more
"modern" HWP support, but in my experience it's not a silver bullet in
these thermal situations. Intel's SDM isn't very clear, to me, on how
much faith we can put in the cpu to self-regulate, and it's clear from
looking at other OSes that they have to use thermal sensors to do some
of the regulation themselves. Diffs have floated around, but it's not a
simple problem to solve, unfortunately. And while Intel does it one
way...AMD has their own conventions.
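
To make the thermal-sensor idea concrete, here is a minimal sketch in
Python of a step-down/step-up policy with hysteresis. It is not what
any diff or any other OS actually does: the function names, the 85C/70C
thresholds, the step size and the floor are all made up for
illustration, and it only computes a target percentage rather than
touching hw.setperf. The ceiling of 99 follows the commit quoted above,
where 100 is what enables turbo mode.

```python
def next_setperf(temp_c, current, high=85.0, low=70.0,
                 step=10, floor=30, ceiling=99):
    """Pick the next setperf-style percentage from one temperature reading.

    All thresholds are hypothetical. Between `low` and `high` the level
    is held, which avoids flapping when the temperature hovers near a
    single cutoff.
    """
    if temp_c >= high:
        # Too hot: back off, but never below the floor.
        return max(floor, current - step)
    if temp_c <= low:
        # Cool enough: recover, but stay below 100 (turbo).
        return min(ceiling, current + step)
    return current


def simulate(readings, start=99):
    """Apply the policy to a sequence of temperature readings (deg C)."""
    level = start
    levels = []
    for temp_c in readings:
        level = next_setperf(temp_c, level)
        levels.append(level)
    return levels


if __name__ == "__main__":
    print(simulate([60, 90, 95, 80, 65]))
```

Even this toy version already has five tunables, which is a fair
picture of how the knobs multiply once battery life enters the mix.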

We also don't generally have consensus on the actual problem we want to
solve. Some folks care about battery life. I care about thermal issues
and don't want something burning a hole through my lap and killing
itself in the process. This leads to too many knobs and dials in
everyone's proposed diffs.

So the tl;dr: any passively cooled mobile Intel chipset these days is
really a gamble, and the hw vendors are punting responsibility to OS
developers to account for their lack of cooling.

-dv
