I've posted this here simply as google bait for the next poor sod suffering with radeon power management issues.

Why does the in-kernel radeon driver try and cook my machine by default?

Since I bought this machine, I've run with a hard-coded hack to keep the card in low power mode. This never caused any problems before; I could even watch three full-screen HD streams (one on each head).

A recent change to the radeon driver left me with noise on the right-hand side of all three heads, and a plea for assistance was met with a response that multiple heads are not supported in low-power mode (thus my hack in the first place).

The issue here is that all the multi-head profiles run the card in the highest power profile, and the machine ends up sounding like a 747 trying to keep the card cool.

Poking into radeon_asic.c tells me my card's power profiles are set up in r600_pm_init_profile.

Time to have a look at what profiles the card actually has.
Enabling drm debugging and booting up leaves this in my log:

Sep 6 13:47:37 localhost kernel: [ 3.919380] [drm:radeon_pm_print_states], 5 Power State(s)
Sep 6 13:47:37 localhost kernel: [ 3.919381] [drm:radeon_pm_print_states], State 0: Default
Sep 6 13:47:37 localhost kernel: [ 3.919382] [drm:radeon_pm_print_states], Default
Sep 6 13:47:37 localhost kernel: [ 3.919383] [drm:radeon_pm_print_states], 16 PCIE Lanes
Sep 6 13:47:37 localhost kernel: [ 3.919384] [drm:radeon_pm_print_states], 3 Clock Mode(s)
Sep 6 13:47:37 localhost kernel: [ 3.919385] [drm:radeon_pm_print_states], 0 e: 680000 m: 900000 v: 1100 No display only
Sep 6 13:47:37 localhost kernel: [ 3.919386] [drm:radeon_pm_print_states], 1 e: 680000 m: 900000 v: 1100
Sep 6 13:47:37 localhost kernel: [ 3.919387] [drm:radeon_pm_print_states], 2 e: 680000 m: 900000 v: 1100
Sep 6 13:47:37 localhost kernel: [ 3.919388] [drm:radeon_pm_print_states], State 1: Performance
Sep 6 13:47:37 localhost kernel: [ 3.919389] [drm:radeon_pm_print_states], 16 PCIE Lanes
Sep 6 13:47:37 localhost kernel: [ 3.919390] [drm:radeon_pm_print_states], 3 Clock Mode(s)
Sep 6 13:47:37 localhost kernel: [ 3.919391] [drm:radeon_pm_print_states], 0 e: 100000 m: 149000 v: 900 No display only
Sep 6 13:47:37 localhost kernel: [ 3.919392] [drm:radeon_pm_print_states], 1 e: 398000 m: 900000 v: 1000
Sep 6 13:47:37 localhost kernel: [ 3.919393] [drm:radeon_pm_print_states], 2 e: 680000 m: 900000 v: 1100
Sep 6 13:47:37 localhost kernel: [ 3.919395] [drm:radeon_pm_print_states], State 2: Default
Sep 6 13:47:37 localhost kernel: [ 3.919395] [drm:radeon_pm_print_states], 16 PCIE Lanes
Sep 6 13:47:37 localhost kernel: [ 3.919396] [drm:radeon_pm_print_states], 3 Clock Mode(s)
Sep 6 13:47:37 localhost kernel: [ 3.919397] [drm:radeon_pm_print_states], 0 e: 298000 m: 900000 v: 950 No display only
Sep 6 13:47:37 localhost kernel: [ 3.919398] [drm:radeon_pm_print_states], 1 e: 298000 m: 900000 v: 950
Sep 6 13:47:37 localhost kernel: [ 3.919399] [drm:radeon_pm_print_states], 2 e: 680000 m: 900000 v: 1100
Sep 6 13:47:37 localhost kernel: [ 3.919400] [drm:radeon_pm_print_states], State 3: Default
Sep 6 13:47:37 localhost kernel: [ 3.919401] [drm:radeon_pm_print_states], 16 PCIE Lanes
Sep 6 13:47:37 localhost kernel: [ 3.919402] [drm:radeon_pm_print_states], 3 Clock Mode(s)
Sep 6 13:47:37 localhost kernel: [ 3.919403] [drm:radeon_pm_print_states], 0 e: 502000 m: 900000 v: 1050 No display only
Sep 6 13:47:37 localhost kernel: [ 3.919404] [drm:radeon_pm_print_states], 1 e: 502000 m: 900000 v: 1050
Sep 6 13:47:37 localhost kernel: [ 3.919405] [drm:radeon_pm_print_states], 2 e: 680000 m: 900000 v: 1100
Sep 6 13:47:37 localhost kernel: [ 3.919406] [drm:radeon_pm_print_states], State 4: Battery
Sep 6 13:47:37 localhost kernel: [ 3.919407] [drm:radeon_pm_print_states], 16 PCIE Lanes
Sep 6 13:47:37 localhost kernel: [ 3.919408] [drm:radeon_pm_print_states], 3 Clock Mode(s)
Sep 6 13:47:37 localhost kernel: [ 3.919409] [drm:radeon_pm_print_states], 0 e: 100000 m: 149000 v: 900 No display only
Sep 6 13:47:37 localhost kernel: [ 3.919410] [drm:radeon_pm_print_states], 1 e: 100000 m: 149000 v: 900
Sep 6 13:47:37 localhost kernel: [ 3.919411] [drm:radeon_pm_print_states], 2 e: 100000 m: 149000 v: 900
Sep 6 13:47:37 localhost kernel: [ 3.920559] [drm] radeon: power management initialized

So, because my GPU is a mobile device (rdev->flags & RADEON_IS_MOBILITY is true), my single-head profiles are selected from the Battery profile, and because there is no second Battery profile, the multi-head profiles come from profile 0 (Default).
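
For anyone following along, here is a rough standalone sketch of that selection as I understand it. It is not the kernel code: the enum, the table and the functions pick_state() and low_profile_state() are names I made up for illustration, and the table simply mirrors the state types in the order the log prints them. It only models the mobility path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's state types; not kernel definitions. */
enum state_type { ST_DEFAULT, ST_PERFORMANCE, ST_BATTERY };

/* State types in the order printed in the log (states 0..4). */
static const enum state_type log_states[] = {
	ST_DEFAULT, ST_PERFORMANCE, ST_DEFAULT, ST_DEFAULT, ST_BATTERY
};

/*
 * Return the index of the (instance + 1)-th state of the wanted type,
 * or fallback when the table has fewer states of that type.
 */
static int pick_state(const enum state_type *states, size_t n,
		      enum state_type wanted, int instance, int fallback)
{
	int found = 0;
	size_t i;

	assert(states != NULL);
	for (i = 0; i < n; i++) {
		if (states[i] != wanted)
			continue;
		if (found == instance)
			return (int)i;
		found++;
	}
	return fallback;
}

/*
 * Assumed behaviour on a mobility part: single head asks for the first
 * Battery state, multi-head asks for the second, and a miss falls back
 * to state 0 (Default).
 */
static int low_profile_state(int multi_head)
{
	const size_t n = sizeof(log_states) / sizeof(log_states[0]);

	return pick_state(log_states, n, ST_BATTERY, multi_head ? 1 : 0, 0);
}
```

With the table from my log, low_profile_state(0) lands on state 4 (Battery) and low_profile_state(1) falls through to state 0, which is the 680000/900000 state that sets the fans screaming.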

This card actually has 3 profiles called default, and interestingly the second one looks almost sane.

Hard coding the profile indexes works, however even with the low profile there (Profile 2, clock mode 0) the RAM is running flat out and it still generates some not insignificant heat.

So, ultimately I came up with the following hack :

diff -u temp/linux-3.4.4/drivers/gpu/drm/radeon/radeon_drv.c linux-3.4.4/drivers/gpu/drm/radeon/radeon_drv.c
--- temp/linux-3.4.4/drivers/gpu/drm/radeon/radeon_drv.c 2012-09-06 15:27:18.337696944 +0800
+++ linux-3.4.4/drivers/gpu/drm/radeon/radeon_drv.c 2012-09-06 16:06:05.252394033 +0800
@@ -137,6 +137,8 @@
 int radeon_pcie_gen2 = 0;
 int radeon_msi = -1;
 int radeon_lockup_timeout = 10000;
+int radeon_minsclk = 0;
+int radeon_minmclk = 0;

 MODULE_PARM_DESC(no_wb, "Disable AGP writeback for scratch registers");
 module_param_named(no_wb, radeon_no_wb, int, 0444);
@@ -189,6 +191,12 @@
 MODULE_PARM_DESC(lockup_timeout, "GPU lockup timeout in ms (defaul 10000 = 10 seconds, 0 = disable)");
 module_param_named(lockup_timeout, radeon_lockup_timeout, int, 0444);

+MODULE_PARM_DESC(minsclk, "Minimum GPU clock speed");
+module_param_named(minsclk, radeon_minsclk, int, 0644);
+
+MODULE_PARM_DESC(minmclk, "Minimum Memory clock speed");
+module_param_named(minmclk, radeon_minmclk, int, 0644);
+
 static int radeon_suspend(struct drm_device *dev, pm_message_t state)
 {
        drm_radeon_private_t *dev_priv = dev->dev_private;
diff -u temp/linux-3.4.4/drivers/gpu/drm/radeon/radeon.h linux-3.4.4/drivers/gpu/drm/radeon/radeon.h
--- temp/linux-3.4.4/drivers/gpu/drm/radeon/radeon.h 2012-09-06 15:27:13.733803910 +0800
+++ linux-3.4.4/drivers/gpu/drm/radeon/radeon.h 2012-09-06 15:45:06.678661305 +0800
@@ -95,6 +95,8 @@
 extern int radeon_pcie_gen2;
 extern int radeon_msi;
 extern int radeon_lockup_timeout;
+extern int radeon_minsclk;
+extern int radeon_minmclk;

 /*
  * Copy from radeon_drv.h so we don't have to include both and have conflicting
diff -u temp/linux-3.4.4/drivers/gpu/drm/radeon/radeon_pm.c linux-3.4.4/drivers/gpu/drm/radeon/radeon_pm.c
--- temp/linux-3.4.4/drivers/gpu/drm/radeon/radeon_pm.c 2012-09-06 15:27:13.739803773 +0800
+++ linux-3.4.4/drivers/gpu/drm/radeon/radeon_pm.c 2012-09-06 15:43:13.115341210 +0800
@@ -120,7 +120,7 @@
                break;
        case PM_PROFILE_LOW:
                if (rdev->pm.active_crtc_count > 1)
-                       rdev->pm.profile_index = PM_PROFILE_LOW_MH_IDX;
+                       rdev->pm.profile_index = PM_PROFILE_LOW_SH_IDX;
                else
                        rdev->pm.profile_index = PM_PROFILE_LOW_SH_IDX;
                break;
@@ -193,7 +193,10 @@
                        clock_info[rdev->pm.requested_clock_mode_index].mclk;
                if (mclk > rdev->pm.default_mclk)
                        mclk = rdev->pm.default_mclk;
-
+               if (mclk < radeon_minmclk)
+                       mclk = radeon_minmclk;
+               if (sclk < radeon_minsclk)
+                       sclk = radeon_minsclk;
                /* upvolt before raising clocks, downvolt after lowering clocks */
                if (sclk < rdev->pm.current_sclk)
                        misc_after = true;

I can now set a minimum clock speed for both the GPU and RAM and activate it by switching profiles. Turning the GPU clock up to the same speed as the RAM in the lowest profile (150000) and running Clock mode 0 in Profile 5 sees all my visual artefacts go away, and I can resume using the machine without screaming fans.
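
To make the effect of the two new parameters easier to see outside the driver, here is a minimal standalone sketch of the clamping. apply_limits() is a hypothetical helper, not something in the patch or the kernel, and the numbers are taken as printed in my log; the units the driver uses internally may differ. One deliberate difference from the hack: the sketch treats a floor of zero or less as "no floor", whereas the patch compares the int parameter directly against an unsigned clock, so a negative value there would probably not behave sensibly.

```c
#include <assert.h>

/*
 * Hypothetical helper: cap the requested clock at the ceiling first,
 * then raise it to the floor, in the same order as the patched code.
 * A floor <= 0 disables the minimum (an assumption of this sketch).
 */
static unsigned int apply_limits(unsigned int requested,
				 unsigned int ceiling, int floor)
{
	unsigned int clk = requested;

	assert(ceiling > 0);
	if (clk > ceiling)
		clk = ceiling;
	if (floor > 0 && clk < (unsigned int)floor)
		clk = (unsigned int)floor;
	return clk;
}
```

Because the floor is applied after the ceiling, a minimum set above the default clock wins over the cap, so it pays to be careful with the value. Since the parameters are registered with mode 0644, I would expect them to be writable at runtime under /sys/module/radeon/parameters/, taking effect at the next profile switch.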

Obviously the selection of correct default power profiles is a difficult issue and subject to the vagaries of the lunatic who wrote the card's BIOS. I don't pretend to have the answer, but I do have a hack that works for me (ugly as it may be).

I'm happy to work on a fix for this (it's not like I'm an isolated case here, a quick google search turns up plenty of hits) if someone can help me understand the right way to fix it properly.

Regards,
Brad
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
