http://bugzilla.kernel.org/show_bug.cgi?id=11878

------- Comment #15 from [EMAIL PROTECTED]  2008-11-03 09:04 -------
Created an attachment (id=18637)
 --> (http://bugzilla.kernel.org/attachment.cgi?id=18637&action=view)
dmesg-thermtrip.txt

Hi Zhao,
thanks, I see, definitely the BIOS is broken, but ...

The reality is that most of the companies designing this funny stuff are, I
would say, living on some unknown planet. If their gadgets work in the planned
environment (even if only by accident), they stop any further action.
Luckily the Linux kernel is open source, and there must be something in the
kernel disturbing the (broken) BIOS.

If I try some sort of rational reasoning (I don't yet have the time
to RTFSMIM), the facts visible to me are:

* There is no way to manage the fan state directly (which is IMO a good
  thing), but in addition the state is always reported to the OS as on (BUG_1).

* IMO the BIOS, through SMI or whatever, *always* triggers the temperature
  object state update irregularly (BUG_2), far less often than it should.
  But in the normal case the fan sooner or later gets *switched off*.

* In *some* Linux kernels, after an uptime of 5 min to 2 h, the fan gets stuck
  on (BUG_3).

* I patched acpi_video_device_lcd_set_level(), and it turns out that even
  writing a constant, *same* brightness value (so not actually messing with
  the backlight voltage) kicks the SMI system into updating the
  temperature/fan state.

* While running make -j3 on the kernel, one of the cores got rescued from
  meltdown (BUG_4):

> mcelog 
MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 THERMAL EVENT TSC cf9edaa98
Processor core below trip temperature. Throttling disabled
STATUS 882d0100 MCGSTATUS 0

I can only try to guess:
* it is perfectly fine, the CPU is designed for this
* wrong or old microcode:
[   20.406893] IA-32 Microcode Update Driver: v1.14a <[EMAIL PROTECTED]>
[   20.409382] firmware: requesting intel-ucode/06-0f-0d
[   20.441544] firmware: requesting intel-ucode/06-0f-0d
[
* the CPU is buggy
* damage somehow related to this issue

* Well, the kernel has to carry a full load of quirks everywhere anyway ;) It
  may not be such a crazy idea to quirk this too, i.e. set a timer to 'kick
  the SMI' this or some other (yet to be found) way. I think every half second
  would make sense.
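
The 'kick the SMI' idea can be tried from userspace before anyone writes a
kernel quirk. What follows is only a rough sketch, not a tested fix: it assumes
the backlight is exposed under /sys/class/backlight/*/brightness and that a
write there ends up in acpi_video_device_lcd_set_level(), which I have not
verified on this machine. The function names (find_brightness_files, kick, run)
are made up for the illustration, and it needs root to write the sysfs file.

```python
import glob
import os
import time


def find_brightness_files(base="/sys/class/backlight"):
    """Return the brightness attribute of every backlight device under base."""
    return sorted(glob.glob(os.path.join(base, "*", "brightness")))


def kick(path):
    """Read the current brightness and write the very same value back."""
    with open(path) as f:
        value = f.read().strip()
    with open(path, "w") as f:
        f.write(value)
    return value


def run(paths, interval=0.5, rounds=None, sleep=time.sleep):
    """Kick every path once per interval; rounds=None loops until interrupted."""
    done = 0
    while rounds is None or done < rounds:
        for path in paths:
            kick(path)
        sleep(interval)
        done += 1
    return done


if __name__ == "__main__":
    # Assumption: rewriting the unchanged value is enough to trigger the update.
    run(find_brightness_files())
```

The half-second interval is just the value suggested above; if the fan still
gets stuck with this running, the brightness path is probably not the whole
story.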

I am curious enough to try to understand and track down the bug (particularly
BUG_3) to some single option or patch, so any explanations or hints on where
to look or how to instrument the kernel are really appreciated.
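
Until there is a better idea for instrumenting the kernel itself, the irregular
updates (BUG_2) can at least be measured from userspace by polling the reported
temperature and noting when the value changes. This is a minimal sketch under
assumptions: the path /sys/class/thermal/thermal_zone0/temp may not exist on
every kernel (older ones expose /proc/acpi/thermal_zone/*/temperature instead),
and the helper names (read_raw, log_changes, gaps) are invented here.

```python
import time


def read_raw(path):
    """Return the file's content with surrounding whitespace removed."""
    with open(path) as f:
        return f.read().strip()


def log_changes(read, polls, interval=1.0, clock=time.time, sleep=time.sleep):
    """Poll read() `polls` times; return (timestamp, value) for each change."""
    changes = []
    last = None
    for _ in range(polls):
        value = read()
        if value != last:
            changes.append((clock(), value))
            last = value
        sleep(interval)
    return changes


def gaps(changes):
    """Seconds between consecutive recorded changes."""
    return [b[0] - a[0] for a, b in zip(changes, changes[1:])]


if __name__ == "__main__":
    # Assumed path; adjust it to whatever the running kernel exposes.
    path = "/sys/class/thermal/thermal_zone0/temp"
    for stamp, value in log_changes(lambda: read_raw(path), polls=600):
        print(f"{stamp:.1f} {value}")
```

Long or wildly varying gaps under steady load would fit the picture of the
temperature object being updated much less often than it should be.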

After more than 2 h of uptime each, and always using the same (latest) Ubuntu
'boot machinery' IIRC:

* the working kernels are:
  vanilla 2.6.26.7 *2.6.27.4*
  ubuntu 2.6.24-21

* fan stays on:
  vanilla 2.6.28 rc1, rc2
  current ubuntu 2.6.27-7.14



_______________________________________________
acpi-bugzilla mailing list
acpi-bugzilla@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/acpi-bugzilla
