@Robie,
1. Is it possible that users are using thermald on hardware not covered by
upstream tests?
[Koba] Per my test cases, machines older than KBL are not covered.
thermald has been enabled since 2016, but I think Intel may not fully support
the older platforms.
If there is a regression, we can ask the user to report it on Launchpad and
help fix it.

2. By "all the unit tests must pass in all the supported
Intel CPUs", who defines "supported"?
[Koba] There is a list of supported CPUs in the thermald source:
~~~
// src/thd_engine.cpp
supported_ids_t id_table[] = {
...
    { 6, 0x97 }, // Alderlake
    { 6, 0x9a }, // Alderlake
    { 6, 0xb7 }, // Raptorlake
    { 6, 0xba }, // Raptorlake
    { 6, 0xbf }, // Raptorlake
...
};
~~~
thermald is maintained by Intel, so Intel defines what "supported" means.
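To illustrate how a table like this is typically consulted, here is a minimal C++ sketch that matches a CPU's family/model pair against a list of entries. The struct `SupportedId`, the function `is_supported`, and the lookup loop are hypothetical; only the Alderlake/Raptorlake IDs are copied from the excerpt above, and this is not thermald's actual code.

```cpp
#include <cassert>

// Hypothetical stand-in for thermald's supported_ids_t: a CPU family/model pair.
struct SupportedId {
    unsigned family;
    unsigned model;
};

// Illustrative subset of the table (IDs copied from the src/thd_engine.cpp excerpt).
constexpr SupportedId kSupportedIds[] = {
    {6, 0x97},  // Alderlake
    {6, 0x9a},  // Alderlake
    {6, 0xb7},  // Raptorlake
    {6, 0xba},  // Raptorlake
    {6, 0xbf},  // Raptorlake
};

// Returns true if (family, model) appears in the table.
bool is_supported(unsigned family, unsigned model) {
    for (const SupportedId &id : kSupportedIds) {
        if (id.family == family && id.model == model) {
            return true;
        }
    }
    return false;
}
```

A model that is absent from the list simply fails the lookup, which is the case question 3 asks about.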
 
3. Is it possible that Ubuntu users have hardware not covered by that 
definition of "supported"? 
[Koba] I don't think an Intel platform can go missing from the supported list
unnoticed. HWE would find it during the development stage, because thermald
would complain about the unknown CPU first, and HWE would then check with Intel.

4. Is there any risk to users of non-Intel hardware? 
[Koba] Only if you pass the '--ignore-cpuid-check' option.
By default, thermald does not run on non-Intel hardware.
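A hedged sketch of that startup decision, assuming the check combines the CPUID vendor string with the supported-model lookup. The function `should_run` and its parameters are hypothetical; only the `--ignore-cpuid-check` option name comes from the answer above, and thermald's real logic may differ.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of the startup check: run only on Intel CPUs whose
// family/model is in the supported list, unless the user explicitly
// passed --ignore-cpuid-check.
bool should_run(const std::string &vendor, bool model_in_table,
                bool ignore_cpuid_check) {
    if (ignore_cpuid_check) {
        return true;  // user opted out of the CPUID check
    }
    // "GenuineIntel" is the vendor string Intel CPUs report via CPUID leaf 0.
    return vendor == "GenuineIntel" && model_in_table;
}
```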

5. How complete is upstream's test coverage?
[Koba] It covers all the modules used and the policy tables loaded:
a. Modules used: rapl_control, intel_pstate, intel_powerclamp, cpufreq,
processor.
b. Loading the policy table from an XML file or from ACPI tables.
c. Evaluating the temperature and checking that the rules act correctly after
activating/escalating/de-escalating the cooling devices.
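To make item c concrete, here is a hypothetical C++ sketch of step-wise cooling: while the temperature is at or above the trip point, the cooling device's state is escalated one step (up to its maximum); once it falls below the trip point minus a hysteresis margin, the state is de-escalated one step. The `CoolingDevice` type, the hysteresis margin, and the one-step policy are assumptions for illustration, not thermald's implementation.

```cpp
#include <cassert>

// Hypothetical cooling device with an integer state in [0, max_state];
// state 0 means "not throttling".
struct CoolingDevice {
    int state = 0;
    int max_state = 10;
};

// Temperatures are in millidegrees Celsius, the unit used by Linux thermal sysfs.
void evaluate(CoolingDevice &cdev, int temp_mc, int trip_mc, int hysteresis_mc) {
    if (temp_mc >= trip_mc) {
        // Activate/escalate: throttle one step harder, capped at max_state.
        if (cdev.state < cdev.max_state) {
            ++cdev.state;
        }
    } else if (temp_mc < trip_mc - hysteresis_mc) {
        // De-escalate: release one step of throttling.
        if (cdev.state > 0) {
            --cdev.state;
        }
    }
    // Inside the hysteresis band the state is left unchanged.
}
```

The hysteresis band keeps the device from toggling on every small fluctuation around the trip point.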
 
6. What assurance is there that there will be no feature
regressions?
[Koba] I can only say that there may be corner cases in the PL1 min/max feature.

---
For this commit:
https://github.com/intel/thermal_daemon/commit/7e490fc79d784b3faf8314af98ec14981ba7fb75

1) Is this safe in relation to Ubuntu kernel versions? 
[Koba] I would say it's safe on Jammy/Focal:
~~~
Jammy,
~~~~~~
TCC adjustment has been offloaded to the kernel driver intel_tcc_cooling,
which is registered as a thermal cooling device:
2eb87d75f980) thermal/drivers/intel: Introduce tcc cooling driver.
This was merged into mainline in 5.13. Focal is using hwe-5.15.
Ref. https://www.phoronix.com/news/Linux-5.13-Intel-Cooling-Driver
~~~~~~

Timo replied for Focal:
~~~~~~
commit fdf4f2fb8e8990c131b2b1a5a9c03681bb16e87a
Author: Srinivas Pandruvada <srinivas.pandruv...@linux.intel.com>
Date: Mon Jul 22 18:03:02 2019 -0700

     drivers: thermal: processor_thermal_device: Export sysfs interface for TCC offset

so a backport to focal (which is planned) should be safe in that regard.
~~~~~~
~~~
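One way to confirm on a given Jammy/Focal kernel that intel_tcc_cooling is active is to look for its cooling device under /sys/class/thermal. The sketch below scans `cooling_device*/type` files under a configurable root so it can also be pointed at a test directory. It assumes the driver registers its device with the type string "TCC Offset" (verify this against the kernel source for the release being checked); `find_cooling_device` is a hypothetical helper, not part of thermald.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <optional>
#include <string>

namespace fs = std::filesystem;

// Scans <root>/cooling_device*/type and returns the directory of a cooling
// device whose type matches `wanted` (iteration order is unspecified).
// `root` is normally "/sys/class/thermal"; a test can use a temporary directory.
std::optional<fs::path> find_cooling_device(const fs::path &root,
                                            const std::string &wanted) {
    std::error_code ec;
    for (const auto &entry : fs::directory_iterator(root, ec)) {
        const std::string name = entry.path().filename().string();
        if (name.rfind("cooling_device", 0) != 0) {
            continue;  // skip thermal_zone* and other entries
        }
        std::ifstream in(entry.path() / "type");
        std::string type;
        if (in && std::getline(in, type) && type == wanted) {
            return entry.path();
        }
    }
    return std::nullopt;  // also returned if root cannot be opened
}
```

Usage: `find_cooling_device("/sys/class/thermal", "TCC Offset")` returns an empty optional when no matching cooling device is registered.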
2) Did this actually get checked before upload?
[Koba] I checked whether the related kernel commits have landed in Focal/Jammy.
 
3) What in your proposed QA process would catch this kind of change to ensure
that the specific requirements for each such deprecation are met in Ubuntu?
[Koba] I have test cases, but they are generic unit tests plus a stress and
monitoring test; see the bug description for details. There may be edge cases
I did not hit. If such an issue is triggered, we can ask the user to report it
and help them fix it.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to thermald in Ubuntu.
https://bugs.launchpad.net/bugs/1995606

Title:
  Upgrade thermald to 2.5.1

Status in thermald package in Ubuntu:
  Fix Released
Status in thermald source package in Jammy:
  Incomplete

Bug description:
  [Justification]
  The purpose of this bug is to prevent regressions in the future.
  Automated test scripts would be better for future SRUs; they are still being planned.

  [Test case]
  For each supported CPU series (RPL/ADL/TGL/CML/CFL/KBL), the following tests will be run on machines in the CI lab:

  1. Run stress-ng and observe the temperature/frequency/power with s-tui.
    - Temperatures should stay just below the trip values.
    - Power/performance profiles should stay roughly the same between the old and the new thermald (unless a change is specifically expected, e.g. to fix premature/insufficient throttling).
  2. Check that thermald can read the rules from /dev/acpi_thermal_rel and correctly generate the XML file in /etc/thermald/.
    - This depends on whether acpi_thermal_rel exists.
    - If the machine supports acpi_thermal_rel, "thermal-conf.xml.auto" should be created in /etc/thermald/.
    - If not, create a user-defined XML file, then skip to (3).
    - Run thermald with --loglevel=debug and compare the log with the xml.auto file to check that the configuration is parsed correctly.
  3. Check that thermal-conf.xml and thermal-cpu-cdev-order.xml are loaded correctly.
    - Run thermald with --loglevel=debug and compare the log with the XML files.
    - If they are parsed correctly, the configuration from the XML files appears in the log.

  4. Run the unit tests (scripts under the test folder). They use emul_temp to simulate high temperatures and check that thermald throttles the CPU through the related cooling device:
    - rapl.sh
    - intel_pstate.sh
    - powerclamp.sh
    - processor.sh
  5. Check that power/frequency is throttled once the temperature reaches the trip points of the thermal zone.
  6. Check whether the system is throttled even when the temperature is below the trip points.
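Steps 4 to 6 come down to injecting a temperature and checking whether throttling follows. Below is a hedged C++ sketch of both halves. `set_emul_temp` writes a value in millidegrees Celsius to a thermal zone's `emul_temp` file (a kernel sysfs attribute available when CONFIG_THERMAL_EMULATION is enabled; writing it needs root, and writing 0 disables emulation). `throttling_matches_trips` checks recorded samples against a trip point, assuming a fully loaded CPU (e.g. under stress-ng) and ignoring hysteresis. The function names, the `Sample` struct, and the "frequency below maximum means throttled" criterion are hypothetical, not taken from thermald's test scripts.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <vector>

// Writes an emulated temperature (millidegrees Celsius) to
// <zone_dir>/emul_temp, e.g. zone_dir = "/sys/class/thermal/thermal_zone0".
// Returns false if the file could not be opened or written.
bool set_emul_temp(const std::filesystem::path &zone_dir, int temp_mc) {
    std::ofstream out(zone_dir / "emul_temp");
    out << temp_mc << '\n';
    return static_cast<bool>(out.flush());
}

// One observation: temperature in millidegrees Celsius, CPU frequency in kHz.
struct Sample {
    int temp_mc;
    int freq_khz;
};

// Step 5: at or above the trip point, frequency should be below the
// unthrottled maximum. Step 6: below the trip point, it should not be
// (no premature throttling). Assumes full load and no hysteresis.
bool throttling_matches_trips(const std::vector<Sample> &samples, int trip_mc,
                              int max_freq_khz) {
    for (const Sample &s : samples) {
        const bool throttled = s.freq_khz < max_freq_khz;
        const bool above_trip = s.temp_mc >= trip_mc;
        if (throttled != above_trip) {
            return false;
        }
    }
    return true;
}
```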

  [ Where problems could occur ]
  Since the PL1 min/max feature has been introduced, we may hit edge cases in the future.



-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
