On Tue, May 14, 2013 at 9:56 PM, Sonny Rao <sonny...@chromium.org> wrote:
> On Tue, May 14, 2013 at 9:34 PM, Sonny Rao <sonny...@chromium.org> wrote:
>> On Tue, May 14, 2013 at 9:29 PM, Zhang Rui <rui.zh...@intel.com> wrote:
>>> On Wed, 2013-05-15 at 12:26 +0800, Zhang Rui wrote:
>>>> please
>>>>
>>>> On Tue, 2013-05-14 at 21:18 -0700, Sonny Rao wrote:
>>>> > Hi, I've seen a regression in kernels since 3.7 on x86 devices where
>>>> > the kernel turns the system fans on to max speed after resuming from
>>>> > ram. Other people have noticed it as well, for example see
>>>> > https://bugzilla.redhat.com/show_bug.cgi?id=895276
>>>> >
>>>> please check if this is a duplicate of bug
>>>> https://bugzilla.kernel.org/show_bug.cgi?id=56591
>>> or you can try 3.10-rc1 to see if the problem still exists or not.
>>
>> Ok, I patched in the fix from that bugzilla --
>> 928c5edbe6f7cb0d1c71bc2353d091bc5b114fe3
>> but I'm still seeing the issue, I'll try 3.10-rc1 next
>>
>
> 3.10-rc1 seems good
> 3.9.2 is okay, though fans do seem to be on more for a while after
> resume, it eventually turns off
> 3.8.13 seems to still be broken, with fans at maximum
>

So, I did a reverse bisect between 3.9 and 3.9.1 and found that the
commit you mentioned does indeed fix the problem on 3.9, and I
double-checked that it doesn't seem to be fixed on 3.8.13.

So, I made a 3.8.13 version of this debug patch in the bugzilla entry
https://bugzilla.kernel.org/attachment.cgi?id=98671
and I never see the thermal_cdev_update getting called for cdev 0 or
cdev 1, yet they are set to 1 after resume. Perhaps something else is
enabling them?

>>>
>>> thanks,
>>> rui
>>>> > For example on the Samsung 550 Chromebook, we have one thermal zone
>>>> > and have 5 cooling_devices, 0-4, which correspond to 5 possible fan
>>>> > speeds. Under typical idle, only cooling_device4 and maybe
>>>> > cooling_device3 are active, depending on temperature:
>>>> >
>>>> > cat /sys/class/thermal/cooling_device[01234]/cur_state /sys/class/thermal/thermal_zone0/temp
>>>> > 0
>>>> > 0
>>>> > 0
>>>> > 0
>>>> > 1
>>>> > 57000
>>>> >
>>>> > however after a suspend/resume, we see that cooling_devices 0 and 1
>>>> > become active:
>>>> > cat /sys/class/thermal/cooling_device[01234]/cur_state /sys/class/thermal/thermal_zone0/temp
>>>> > 1
>>>> > 1
>>>> > 0
>>>> > 0
>>>> > 1
>>>> > 54000
>>>> >
>>>> > and it seems to stay that way, even though the temperature is low
>>>> > enough that the fan shouldn't be running at that speed. If I manually
>>>> > disable cooling_devices 0 and 1 then fan control works normally again.
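
The before/after sysfs readings quoted above can also be compared
programmatically. What follows is a minimal sketch in Python and is not
part of the original thread: the function names read_thermal_state and
newly_active, the sysfs_root parameter, and the default of five cooling
devices are assumptions for illustration, chosen to match the Samsung
550 layout described above. The temp file reports millidegrees Celsius,
so 57000 means 57 C.

```python
from pathlib import Path


def read_thermal_state(sysfs_root="/sys/class/thermal", num_cdevs=5,
                       zone="thermal_zone0"):
    """Read cur_state of cooling_device0..N-1 and the zone temperature.

    Returns (states, temp), where states is a list of ints and temp is
    the zone temperature in millidegrees Celsius. num_cdevs=5 is an
    assumption matching the machine described in the thread.
    """
    root = Path(sysfs_root)
    states = [
        int((root / f"cooling_device{i}" / "cur_state").read_text().strip())
        for i in range(num_cdevs)
    ]
    temp = int((root / zone / "temp").read_text().strip())
    return states, temp


def newly_active(before, after):
    """Indices of cooling devices that were idle before and active after."""
    return [i for i, (b, a) in enumerate(zip(before, after))
            if b == 0 and a != 0]


# Values copied from the two readings quoted in the report.
before_suspend = [0, 0, 0, 0, 1]
after_resume = [1, 1, 0, 0, 1]
print(newly_active(before_suspend, after_resume))  # [0, 1]
```

On an affected kernel, calling read_thermal_state() once before suspend
and once after resume, then passing the two state lists to
newly_active(), would be expected to flag devices 0 and 1.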
>>>> >
>>>> > I started bisecting it and was able to do so up until this commit:
>>>> > commit 29b19e250434c6193c8b8e4c34c9c6284dd4f101
>>>> > Merge: 125c4c7 c072fed
>>>> > Author:     Len Brown <len.br...@intel.com>
>>>> > AuthorDate: Tue Oct 9 01:35:52 2012 -0400
>>>> > Commit:     Len Brown <len.br...@intel.com>
>>>> > CommitDate: Tue Oct 9 01:35:52 2012 -0400
>>>> >
>>>> >     Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux into thermal
>>>> >
>>>> > unfortunately, I'm not able to successfully do a suspend/resume on the
>>>> > commits in that merge, so I wasn't able to bisect down to the exact
>>>> > commit.
>>>> >
>>>> > I did confirm that one parent of the merge is okay: commit
>>>> > 125c4c706b680c7831f0966ff873c1ad0354ec25 idr: rename MAX_LEVEL to
>>>> > MAX_IDR_LEVEL
>>>> >
>>>> > so I think it falls somewhere in this list of commits:
>>>> > c072fed95c9855a920c114d7fa3351f0f54ea06e...e3f25e6e5836c4790fbe395ff42e241f372d859d
>>>> >
>>>> > c072fed9 thermal: Exynos: Fix NULL pointer dereference in exynos_unregister_thermal()
>>>> > a4b6fec9 Thermal: Fix bug on cpu_cooling, cooling device's id conflict problem.
>>>> > 79e093c3 thermal: exynos: Use devm_* functions
>>>> > 17be868e ARM: exynos: add thermal sensor driver platform data support
>>>> > 7e0b55e6 thermal: exynos: register the tmu sensor with the kernel thermal layer
>>>> > f22d9c03c thermal: exynos5: add exynos5250 thermal sensor driver support
>>>> > c48cbba6 hwmon: exynos4: move thermal sensor driver to driver/thermal directory
>>>> > 02361418 thermal: add generic cpufreq cooling implementation
>>>> > a7a3b8c8 Fix a build error.
>>>> > 204dd1d3 thermal: Fix potential NULL pointer accesses
>>>> > 1e426ffdd thermal: add Renesas R-Car thermal sensor support
>>>> > 79a49168 thermal: fix potential out-of-bounds memory access
>>>> > f4a821ce6 Thermal: Introduce locking for cdev.thermal_instances list.
>>>> > 908b9fb79 Thermal: Unify the code for both active and passive cooling
>>>> > ce119f832 Thermal: Introduce simple arbitrator for setting device cooling state
>>>> > b5e4ae62 Thermal: List thermal_instance in thermal_cooling_device.
>>>> > cddf31b3b Thermal: Rename thermal_instance.node to thermal_instance.tz_node.
>>>> > 2d374139 Thermal: Rename thermal_zone_device.cooling_devices
>>>> > b81b6ba3 Thermal: rename structure thermal_cooling_device_instance to thermal_instance
>>>> > 4ae46befb Thermal: Introduce thermal_zone_trip_update()
>>>> > 1b7ddb84 Thermal: Remove tc1/tc2 in generic thermal layer.
>>>> > 601f3d424 Thermal: Introduce .get_trend() callback.
>>>> > 9d99842f9 Thermal: set upper and lower limits
>>>> > 74051ba5 Thermal: Introduce cooling states range support
>>>> >
>>>> > When I get time, I'll try to rebase those commits onto the IDR commit
>>>> > and see if I can get a better bisect. Any insights into the problem
>>>> > would be appreciated, thanks.
>>>>
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-pm" in
>>>> the body of a message to majord...@vger.kernel.org
>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html