http://bugzilla.kernel.org/show_bug.cgi?id=11878
------- Comment #15 from [EMAIL PROTECTED] 2008-11-03 09:04 -------
Created an attachment (id=18637)
 --> (http://bugzilla.kernel.org/attachment.cgi?id=18637&action=view)
dmesg-thermtrip.txt

Hi Zhao,

thanks, I see, the BIOS is definitely broken, but ...

The reality is that most of the companies designing this funny stuff are, I would say, living on some unknown planet. If their gadgets work in the planned environment (e.g. by accident), they stop any further action.

Luckily the Linux kernel is open source, and there must be something in the kernel disturbing the (broken) BIOS.

If I try some sort of rational reasoning (I don't yet have the time to RTFSMIM), the facts visible to me are:

* There is no way to manage the fan state directly (which is IMO a good thing), but in addition this state is always reported to the OS as on (BUG_1).

* IMO the BIOS, through SMI or whatever, *always* triggers the temperature object state update irregularly (BUG_2), far less often than it should. But in the normal case the fan sooner or later gets *switched off*.

* In *some* Linux kernels the fan gets stuck on after 5 min to 2 h of uptime (BUG_3).

* I patched acpi_video_device_lcd_set_level() and it turns out that even writing a constant, *same* brightness value (so not actually touching the backlight voltage value) kicks the SMI system into updating the temperature/fan state.

* During a make -j3 kernel build, one of the cores got rescued from meltdown (BUG_4):

  > mcelog
  MCE 0
  HARDWARE ERROR. This is *NOT* a software problem!
  Please contact your hardware vendor
  CPU 1 THERMAL EVENT TSC cf9edaa98
  Processor core below trip temperature. Throttling disabled
  STATUS 882d0100 MCGSTATUS 0

  I can only try to guess:
  * it is perfectly fine, the CPU is designed for this
  * wrong or old microcode:
    [ 20.406893] IA-32 Microcode Update Driver: v1.14a <[EMAIL PROTECTED]>
    [ 20.409382] firmware: requesting intel-ucode/06-0f-0d
    [ 20.441544] firmware: requesting intel-ucode/06-0f-0d
    [
  * the CPU is buggy
  * damage somehow related to this issue

* Well, the kernel has to have a full load of quirks everywhere ;) It may not be such a crazy idea to quirk it, i.e. set a timer to 'kick the SMI' this way or some other (yet to be found) way. I think every half second would make sense.

I am curious enough to try to understand the bug (particularly BUG_3) and track it down to a single option or patch, so any explanations or hints on where to look or how to instrument the kernel are really appreciated.

After more than 2 h of uptime, always using the same (latest) Ubuntu 'boot machinery', IIRC:

* the working kernels are:
    vanilla 2.6.26.7 *2.6.27.4*
    ubuntu 2.6.24-21

* fan stays on:
    vanilla 2.6.28 rc1, rc2
    current ubuntu 2.6.27-7.14
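
The 'kick the SMI every half second' idea can also be tried from userspace before writing any kernel quirk. The following is only a rough sketch, under the assumption that the ACPI video backlight is exposed through sysfs and that writing the unchanged value there reaches acpi_video_device_lcd_set_level(). The node name acpi_video0 and the helper names are hypothetical and will differ between machines; whether this actually unsticks the fan is untested.

```python
import time
from pathlib import Path
from typing import Optional

# Assumed sysfs node; the real directory name depends on the backlight driver.
BRIGHTNESS = Path("/sys/class/backlight/acpi_video0/brightness")


def rewrite_same_value(node: Path) -> str:
    """Read the current brightness and write the identical value back."""
    value = node.read_text().strip()
    node.write_text(value + "\n")
    return value


def kick_loop(node: Path, interval: float = 0.5,
              count: Optional[int] = None) -> int:
    """Rewrite the value every `interval` seconds; run forever if count is None."""
    kicks = 0
    while count is None or kicks < count:
        rewrite_same_value(node)
        kicks += 1
        time.sleep(interval)
    return kicks


if __name__ == "__main__":
    # Needs write permission on the sysfs node (typically root).
    kick_loop(BRIGHTNESS)
```

Running it as root while watching the fan and the thermal zone state would at least show whether a periodic write is enough as a workaround on the affected kernels.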