Do you have CONFIG_CPU_FREQ defined in your kernel config?

I have an HP laptop where I have seen similar behavior. After dealing with it for some time, I tracked it down to a problem with changing the CPU's frequency. For a very short period after the clock is changed, the thermal sensor reads back nonsense. I've seen readings like "69... 69... 95... 70..." and that's with 0.5-second sampling. I've found 2 workarounds:

1) The quick and easy way:
       /etc/init.d/powernowd stop
       Now, build x.org
       /etc/init.d/powernowd start

       Of course you'll need to replace powernowd with whatever power management daemon you have emerged.

2) The uglier, but potentially more useful fix:
        Save this as thermal.diff:
-----------------------------------------------------------------------------------------
--- orig/drivers/acpi/thermal.c 2005-07-07 22:37:42.000000000 -0400
+++ new/drivers/acpi/thermal.c  2005-06-15 18:30:43.000000000 -0400
@@ -61,7 +61,8 @@
 #define ACPI_THERMAL_MODE_ACTIVE       0x00
 #define ACPI_THERMAL_MODE_PASSIVE      0x01
 #define ACPI_THERMAL_MODE_CRITICAL     0xff
-#define ACPI_THERMAL_PATH_POWEROFF     "/sbin/poweroff"
+//#define ACPI_THERMAL_PATH_POWEROFF   "/sbin/poweroff"
+#define ACPI_THERMAL_PATH_POWEROFF     "/sbin/overheat"

 #define ACPI_THERMAL_MAX_ACTIVE        10
 #define ACPI_THERMAL_MAX_LIMIT_STR_LEN 65
-----------------------------------------------------------------------------------------
        Patch the kernel by cd'ing to /usr/src/linux and typing:
                patch -p1 < <path-to>/thermal.diff

        This will cause the kernel to call /sbin/overheat instead of /sbin/poweroff if your laptop hits a critical temperature.
        Save this as /sbin/overheat:
-----------------------------------------------------------------------------------------
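#!/bin/bash
# Called by the patched kernel in place of /sbin/poweroff.

# Init script of the power management daemon; adjust to the one you use.
POWER_MGT_COMMAND=/etc/init.d/powernowd

# Act only if the daemon is running, so repeated calls do nothing extra.
if ${POWER_MGT_COMMAND} status > /dev/null ; then
    # Stop the daemon so it no longer changes the CPU frequency.
    ${POWER_MGT_COMMAND} stop

    # Pin the CPU to its lowest supported frequency.
    cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq \
        > /sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed
    # 0 matches ACPI_THERMAL_MODE_ACTIVE in thermal.c.
    echo -n 0 > /proc/acpi/thermal_zone/THRM/cooling_mode
    # Tell logged-in users what happened and show the current reading.
    (
        echo System switched to low power mode for cooling
        cat /proc/acpi/thermal_zone/THRM/temperature
    ) | wall
fi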
-----------------------------------------------------------------------------------------
        Make /sbin/overheat executable by typing:
            chmod 755 /sbin/overheat

        Now, when the thermal sensor reports crazy values, my laptop just slows way down instead of completely stopping.
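
Workaround 1 can also be wrapped in a small helper so that the daemon is started again even if the build exits with an error. This is only a sketch: the function name run_without_scaling is made up for illustration, and it assumes your init script accepts "stop" and "start" the way powernowd's does.

```shell
#!/bin/bash
# Sketch only: run_without_scaling is a hypothetical helper name.
# Override POWER_MGT_COMMAND if you use a different daemon.
POWER_MGT_COMMAND=${POWER_MGT_COMMAND:-/etc/init.d/powernowd}

# Run the given command with the power management daemon stopped,
# then start the daemon again and pass on the command's exit status.
run_without_scaling() {
    local rc=0
    "${POWER_MGT_COMMAND}" stop || return 1
    "$@" || rc=$?
    "${POWER_MGT_COMMAND}" start
    return "$rc"
}

# Example (as root):
#   run_without_scaling emerge xorg-x11
```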

On my todo list:
        o  After the temperature comes down, re-enable power management.
        o  If the temperature does not come down in a reasonable period, then shut it down.
        o  A better patch that takes cpufreq changes into account and disables the thermal faults for a few ms after a frequency change. I need to get a better idea of how long the sensor gives erroneous readings.
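
The first two items could look roughly like the sketch below. None of this is tested on my laptop: the function names, the threshold, and the poll counts are placeholders, and it assumes the temperature file holds a single line of the form "temperature:   69 C".

```shell
#!/bin/bash
# Sketch only: read_temp and wait_for_cooldown are hypothetical names,
# and the numbers in the example at the bottom are placeholders.

# Print the integer reading from a line such as "temperature:   69 C".
read_temp() {
    awk '{ print $2 }' "$1"
}

# Poll the temperature file until the reading is at or below the threshold.
# Returns 0 once the system has cooled, 1 if it is still hot after max_polls.
wait_for_cooldown() {
    local temp_file="$1" threshold="$2" max_polls="$3" interval="$4"
    local polls=0 temp
    while [ "$polls" -lt "$max_polls" ]; do
        temp=$(read_temp "$temp_file")
        if [ "$temp" -le "$threshold" ]; then
            return 0
        fi
        sleep "$interval"
        polls=$((polls + 1))
    done
    return 1
}

# Example, e.g. at the end of /sbin/overheat:
#   if wait_for_cooldown /proc/acpi/thermal_zone/THRM/temperature 70 60 5; then
#       /etc/init.d/powernowd start
#   else
#       /sbin/poweroff
#   fi
```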

dcm

On 12/12/05, Mariusz Pękala <[EMAIL PROTECTED]> wrote:
> On Sunday, 11 December 2005 11:42, C. Beamer wrote:
> > My issue is this:  The computer powered off in the middle of the install
> > of xorg-x11.  This has happened a couple of times.  I haven't been
> > having problems with the laptop, so I'm pretty sure the issue has
> > something to do with power management since I built power management
> > into the kernel, but didn't emerge acpid.  Anyway, since the emerge of
> > xorg-x11 has bombed a couple of times, is there anything that I should
> > do in the way of clean up before trying to emerge it again?
> > Colleen

> On 2005-12-11 17:32:46 +0100 (Sun, Dec), Rafael Fernández López wrote:
> I can't make sense of this issue: I don't understand what would make
> your computer turn off during a compilation.
>
> Well... I'm afraid it's temperature. I hope that's not the reason, but it is
> the first thing that came to my mind. Maybe your laptop (I have a Fujitsu
> Siemens Amilo, and it gets really hot when compiling OO or KDE) turns
> itself off for safety reasons when it reaches a certain temperature.
>
> I cannot find any other reason.

I vote for temperature issues too. That is my experience with an
Aristo laptop - it gets very hot very easily and powers off when
the temperature exceeds 85 C.

You may try to run something like this while emerging:
# while sleep 5 ; do cat /proc/acpi/thermal_zone/THM0/temperature >> /tmp/temper ; done &

and hope that part of that file will survive the poweroff - you will see
whether the temperature was rising before the end.

Or you may put something like:
... do cat /proc/acp..... | tee -a /tmp/temper ; done &
in background in the session in which emerge runs and observe the
temperature between compilation lines.

The exact path to the temperature file may differ; it will be something like
/proc/acpi/thermal_zone/*/temperature, and it will exist only if your
kernel has the necessary drivers compiled in (or modules loaded).
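
If you don't know the zone name, something like the sketch below logs every zone it finds, with a timestamp on each line. The function name log_temps is made up for illustration.

```shell
#!/bin/bash
# Sketch only: log_temps is a hypothetical helper name.

# Append one timestamped line per readable thermal zone to the log file.
log_temps() {
    local zone_root="$1" logfile="$2" f
    for f in "$zone_root"/*/temperature; do
        # Skip the unexpanded pattern when no zone exists.
        [ -r "$f" ] || continue
        printf '%s %s %s\n' "$(date +%T)" "$f" "$(cat "$f")" >> "$logfile"
    done
}

# Example:
#   while sleep 5 ; do log_temps /proc/acpi/thermal_zone /tmp/temper ; done &
```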

The /proc/acpi/thermal_zone/*/temperature file is about 30 bytes, so
35 thousand copies make a 1MB file. Your loop can therefore run for about 9
hours if storing one copy every second, or 48 hours if appending one copy
every 5 seconds.
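
The estimate can be checked with shell integer arithmetic. This is a rough sketch: log_hours is a made-up helper, and 30 bytes per sample is only the approximate size mentioned above.

```shell
#!/bin/bash
# Sketch only: log_hours is a hypothetical helper name.

# Whole hours of logging that fit in max_bytes, given the size of one
# sample in bytes and the sampling interval in seconds.
log_hours() {
    local bytes_per_sample="$1" max_bytes="$2" interval="$3"
    echo $(( max_bytes / bytes_per_sample * interval / 3600 ))
}

log_hours 30 1048576 1   # prints 9
log_hours 30 1048576 5   # prints 48
```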

HTH.

--
No virus found in this outgoing message.
Checked by 'grep -i virus $MESSAGE'
Trust me.


