[PATCH] drm/ttm: Don't evict BOs outside of the requested placement range

2014-10-10 Thread Alan Swanson
On Fri, 2014-10-10 at 12:20 +0900, Michel Dänzer wrote:
> On 09.10.2014 19:22, Alan Swanson wrote:
> > On 2014-10-09 07:02, Michel Dänzer wrote:
> >> From: Michel Dänzer 
> >>
> >> The radeon driver uses placement range restrictions for several reasons,
> >> in particular to make sure BOs in VRAM can be accessed by the CPU, e.g.
> >> during a page fault.
> >>
> >> Without this change, TTM could evict other BOs while trying to satisfy
> >> the requested placement, even if the evicted BOs were outside of the
> >> requested placement range. Doing so didn't free up any space in the
> >> requested placement range, so the (potentially high) eviction cost was
> >> incurred for no benefit.
> >>
> >> Nominating for stable because radeon driver changes in 3.17 made this
> >> much more noticeable than before.
> >>
> >> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84662
> >> Cc: stable@vger.kernel.org
> >> Signed-off-by: Michel Dänzer 
> >> ---
> >>  drivers/gpu/drm/ttm/ttm_bo.c | 20 +++++++++++++++++---
> >>  1 file changed, 17 insertions(+), 3 deletions(-)
> >>
> [...]
> 
> > I believe you need to "s/place/placement/" over this patch.
> 
> The fpfn and lpfn members were moved from struct ttm_placement to a new 
> struct ttm_place in f1217ed09f827e42a49ffa6a5aab672aa6f57a65.
> 
> If you mean something else, please elaborate.

This patch failed to build on 3.17.0, so it wouldn't be a candidate for
stable unless the ttm_place patch (currently only in drm-next) also goes
to stable (or else ttm_place is replaced with ttm_placement in the
version of this patch sent for stable)?

-- 
Alan.




[PATCH] drm/ttm: Don't evict BOs outside of the requested placement range

2014-10-09 Thread Alan Swanson
On 2014-10-09 07:02, Michel Dänzer wrote:
> From: Michel Dänzer 
> 
> The radeon driver uses placement range restrictions for several reasons,
> in particular to make sure BOs in VRAM can be accessed by the CPU, e.g.
> during a page fault.
> 
> Without this change, TTM could evict other BOs while trying to satisfy
> the requested placement, even if the evicted BOs were outside of the
> requested placement range. Doing so didn't free up any space in the
> requested placement range, so the (potentially high) eviction cost was
> incurred for no benefit.
> 
> Nominating for stable because radeon driver changes in 3.17 made this
> much more noticeable than before.
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84662
> Cc: stable@vger.kernel.org
> Signed-off-by: Michel Dänzer 
> ---
>  drivers/gpu/drm/ttm/ttm_bo.c | 20 +++++++++++++++++---
>  1 file changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index 8f5cec6..407fa2d 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -709,6 +709,7 @@ out:
> 
>  static int ttm_mem_evict_first(struct ttm_bo_device *bdev,
>                                 uint32_t mem_type,
> +                               const struct ttm_place *place,
>                                 bool interruptible,
>                                 bool no_wait_gpu)
>  {
> @@ -720,8 +721,21 @@ static int ttm_mem_evict_first(struct ttm_bo_device *bdev,
>      spin_lock(&glob->lru_lock);
>      list_for_each_entry(bo, &man->lru, lru) {
>          ret = __ttm_bo_reserve(bo, false, true, false, NULL);
> -        if (!ret)
> +        if (!ret) {
> +            if (place && (place->fpfn || place->lpfn)) {
> +                /* Don't evict this BO if it's outside of the
> +                 * requested placement range
> +                 */
> +                if (place->fpfn >= (bo->mem.start + bo->mem.size) ||
> +                    (place->lpfn && place->lpfn <= bo->mem.start)) {
> +                    __ttm_bo_unreserve(bo);
> +                    ret = -EBUSY;
> +                    continue;
> +                }
> +            }
> +
>              break;
> +        }
>      }
> 
>      if (ret) {
> @@ -782,7 +796,7 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo,
>              return ret;
>          if (mem->mm_node)
>              break;
> -        ret = ttm_mem_evict_first(bdev, mem_type,
> +        ret = ttm_mem_evict_first(bdev, mem_type, place,
>                                    interruptible, no_wait_gpu);
>          if (unlikely(ret != 0))
>              return ret;
> @@ -1233,7 +1247,7 @@ static int ttm_bo_force_list_clean(struct ttm_bo_device *bdev,
>      spin_lock(&glob->lru_lock);
>      while (!list_empty(&man->lru)) {
>          spin_unlock(&glob->lru_lock);
> -        ret = ttm_mem_evict_first(bdev, mem_type, false, false);
> +        ret = ttm_mem_evict_first(bdev, mem_type, NULL, false, false);
>          if (ret) {
>              if (allow_errors) {
>                  return ret;
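
To make the new test concrete: a BO occupying pages [start, start + size)
is only worth evicting if that interval intersects the requested
[fpfn, lpfn) window, where lpfn == 0 means no upper bound. A minimal
stand-alone sketch of the same check, with simplified stand-in structs
rather than the real ttm_place and ttm_mem_reg:

  #include <stdbool.h>
  #include <stdint.h>
  #include <stdio.h>

  struct place { uint32_t fpfn, lpfn; };  /* lpfn == 0: no upper bound */
  struct mem   { uint32_t start, size; }; /* BO's current page range */

  /* Same logic as the patch: skip BOs wholly outside [fpfn, lpfn). */
  static bool worth_evicting(const struct place *p, const struct mem *m)
  {
      if (p->fpfn >= m->start + m->size)
          return false;               /* BO ends below the window */
      if (p->lpfn && p->lpfn <= m->start)
          return false;               /* BO starts above the window */
      return true;
  }

  int main(void)
  {
      struct place cpu_visible = { .fpfn = 0, .lpfn = 65536 };
      struct mem   high_vram   = { .start = 100000, .size = 4096 };

      /* Evicting this BO would free nothing in the CPU-visible range. */
      printf("%s\n", worth_evicting(&cpu_visible, &high_vram) ? "evict" : "skip");
      return 0;
  }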

I believe you need to "s/place/placement/" over this patch.

-- 
Alan.



[PATCH] drm/radeon: restrict engine clock to fastest power state

2013-05-30 Thread Alan Swanson
Radeon power management restricts the maximum engine clock to the initial
default clock. However, on APUs the default clock is usually not the fastest
allowed by their defined power states. Change the restriction to the fastest
engine clock found in the power states.

Signed-off-by: Alan Swanson 
---
 drivers/gpu/drm/radeon/radeon.h    |  1 +
 drivers/gpu/drm/radeon/radeon_pm.c | 25 ++++++++++++++++++++++---
 2 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 1442ce7..83d1e76 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -,6 +,7 @@ struct radeon_pm {
     u32 default_mclk;
     u16 default_vddc;
     u16 default_vddci;
+    u32 max_sclk;
     struct radeon_i2c_chan *i2c_bus;
     /* selected pm method */
     enum radeon_pm_method pm_method;
diff --git a/drivers/gpu/drm/radeon/radeon_pm.c b/drivers/gpu/drm/radeon/radeon_pm.c
index 788c64c..3512af9 100644
--- a/drivers/gpu/drm/radeon/radeon_pm.c
+++ b/drivers/gpu/drm/radeon/radeon_pm.c
@@ -164,8 +164,8 @@ static void radeon_set_power_state(struct radeon_device *rdev)
     if (radeon_gui_idle(rdev)) {
         sclk = rdev->pm.power_state[rdev->pm.requested_power_state_index].
             clock_info[rdev->pm.requested_clock_mode_index].sclk;
-        if (sclk > rdev->pm.default_sclk)
-            sclk = rdev->pm.default_sclk;
+        if (sclk > rdev->pm.max_sclk)
+            sclk = rdev->pm.max_sclk;
 
         /* starting with BTC, there is one state that is used for both
          * MH and SH.  Difference is that we always use the high clock index for
@@ -307,7 +307,7 @@ static void radeon_pm_print_states(struct radeon_device *rdev)
         DRM_DEBUG_DRIVER("State %d: %s\n", i,
                          radeon_pm_state_type_name[power_state->type]);
         if (i == rdev->pm.default_power_state_index)
-            DRM_DEBUG_DRIVER("\tDefault");
+            DRM_DEBUG_DRIVER("\tDEFAULT\n");
         if ((rdev->flags & RADEON_IS_PCIE) && !(rdev->flags & RADEON_IS_IGP))
             DRM_DEBUG_DRIVER("\t%d PCIE Lanes\n", power_state->pcie_lanes);
         if (power_state->flags & RADEON_PM_STATE_SINGLE_DISPLAY_ONLY)
@@ -329,6 +329,22 @@ static void radeon_pm_print_states(struct radeon_device *rdev)
     }
 }
 
+static void radeon_pm_get_max_clocks(struct radeon_device *rdev)
+{
+    int i, j;
+    struct radeon_power_state *power_state;
+    struct radeon_pm_clock_info *clock_info;
+
+    for (i = 0; i < rdev->pm.num_power_states; i++) {
+        power_state = &rdev->pm.power_state[i];
+        for (j = 0; j < power_state->num_clock_modes; j++) {
+            clock_info = &(power_state->clock_info[j]);
+            if (clock_info->sclk > rdev->pm.max_sclk)
+                rdev->pm.max_sclk = clock_info->sclk;
+        }
+    }
+}
+
 static ssize_t radeon_get_pm_profile(struct device *dev,
                                      struct device_attribute *attr,
                                      char *buf)
@@ -588,6 +604,7 @@ int radeon_pm_init(struct radeon_device *rdev)
     rdev->pm.default_mclk = rdev->clock.default_mclk;
     rdev->pm.current_sclk = rdev->clock.default_sclk;
     rdev->pm.current_mclk = rdev->clock.default_mclk;
+    rdev->pm.max_sclk = rdev->clock.default_sclk;
     rdev->pm.int_thermal_type = THERMAL_TYPE_NONE;
 
     if (rdev->bios) {
@@ -597,6 +614,7 @@ int radeon_pm_init(struct radeon_device *rdev)
         radeon_combios_get_power_modes(rdev);
         radeon_pm_print_states(rdev);
         radeon_pm_init_profile(rdev);
+        radeon_pm_get_max_clocks(rdev);
         /* set up the default clocks if the MC ucode is loaded */
         if ((rdev->family >= CHIP_BARTS) &&
             (rdev->family <= CHIP_CAYMAN) &&
@@ -848,6 +866,7 @@ static int radeon_debugfs_pm_info(struct seq_file *m, void *data)
         seq_printf(m, "current engine clock: %u0 kHz\n", rdev->pm.current_sclk);
     else
         seq_printf(m, "current engine clock: %u0 kHz\n", radeon_get_engine_clock(rdev));
+    seq_printf(m, "maximum engine clock: %u0 kHz\n", rdev->pm.max_sclk);
     seq_printf(m, "default memory clock: %u0 kHz\n", rdev->pm.default_mclk);
     if (rdev->asic->pm.get_memory_clock)
         seq_printf(m, "current memory clock: %u0 kHz\n", radeon_get_memory_clock(rdev));
-- 
1.8.1.5
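
Distilled to its essentials, the change is a one-time scan for the
fastest sclk across all power states plus a clamp against that maximum
instead of the boot-time default. A stand-alone sketch of the same
logic, with simplified structs and the driver's 10 kHz clock units
(not the radeon code itself):

  #include <stdint.h>
  #include <stdio.h>

  struct clock_mode { uint32_t sclk; };  /* engine clock, 10 kHz units */
  struct power_state { int num_modes; struct clock_mode modes[4]; };

  /* One-time scan at init: remember the fastest sclk any state allows. */
  static uint32_t max_sclk(const struct power_state *ps, int n)
  {
      uint32_t max = 0;
      for (int i = 0; i < n; i++)
          for (int j = 0; j < ps[i].num_modes; j++)
              if (ps[i].modes[j].sclk > max)
                  max = ps[i].modes[j].sclk;
      return max;
  }

  int main(void)
  {
      struct power_state states[2] = {
          { 2, { { 20000 }, { 30000 } } },  /* 200 MHz and 300 MHz modes */
          { 1, { { 49200 } } },             /* 492 MHz state */
      };
      uint32_t limit = max_sclk(states, 2);
      uint32_t requested = 60000;
      if (requested > limit)  /* the clamp radeon_set_power_state applies */
          requested = limit;
      printf("engine clock: %u0 kHz\n", requested);  /* 492000 kHz */
      return 0;
  }

On an APU this scanned limit can exceed default_sclk, which is exactly
the case the patch is after.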






[PATCH] drm/ttm: add minimum residency constraint for bo eviction

2012-11-28 Thread Alan Swanson
On Wed, 2012-11-28 at 18:24 -0500, Jerome Glisse wrote:
> On Wed, Nov 28, 2012 at 6:18 PM, Thomas Hellstrom  wrote:
> > On 11/28/2012 04:58 PM, j.glisse@gmail.com wrote:
> >>
> >> From: Jerome Glisse 
> >>
> >> This patch adds a configurable minimum residency time for each memory
> >> pool (VRAM, GTT, ...). The intention is to avoid so much memory
> >> eviction from VRAM that the GPU pretty much spends all its time
> >> moving things in and out.
> >
> >
> > This patch seems odd to me.
> >
> > It seems the net effect is to refuse evictions from VRAM and make buffers go
> > somewhere else, and that makes things faster?
> >
> > Why don't they go there in the first place instead of trying to force
> > them into VRAM, when VRAM is full?
> >
> > /Thomas
> 
> It's mostly a side effect of cs and validating with each cs: if boA is
> in cs1 and not in cs2, and boB is in cs2 but not in cs1, then boA could
> be evicted by cs2 and boB moved in; if the next cs, i.e. cs3, is like
> cs1, then boA moves back in and boB is evicted; then you get cs4, which
> references boB but not boA, so boA gets evicted and boB moves in ... So
> TTM just spends its time doing evictions, but it does so because the
> driver asks it to. Note that what is costly there is not the bo move in
> itself but the page allocation.
> 
> I propose this patch to put a boundary on bo eviction frequency; I
> thought it might help other drivers. If you set the residency time to 0
> you get the current behavior; if you don't, you enforce a minimum
> residency time, which helps drivers like radeon. Of course a proper fix
> to bo eviction for radeon has to be in the radeon code and is mostly an
> overhaul of how we validate bos.
> 
> But I still believe that this patch has value in itself by allowing
> drivers to put a boundary on buffer movement frequency.
> 
> Cheers,
> Jerome

So, a variation on John Carmack's recommendation from 2000 to use MRU,
not LRU, to avoid texture thrashing.

  Mar 07, 2000 - Virtualized video card local memory is The Right Thing.
  http://floodyberry.com/carmack/johnc_plan_2000.html

In fact, this was last discussed in 2005 with a patch for eviction of
textures stale for 1 second, and I (still) wonder why such a method was
never implemented, since it was a clear problem.

  http://thread.gmane.org/gmane.comp.video.dri.devel/17274/focus=17305
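
For what it's worth, the boundary proposed here is easy to picture:
stamp each buffer when it lands in a pool and refuse to evict it until
a configurable time has passed, with 0 preserving the current behaviour.
A rough user-space sketch of such a gate (hypothetical names, not the
actual TTM patch):

  #include <stdbool.h>
  #include <stdint.h>
  #include <time.h>

  struct bo {
      uint64_t placed_ms;  /* when the buffer last moved into this pool */
  };

  static uint64_t now_ms(void)
  {
      struct timespec ts;
      clock_gettime(CLOCK_MONOTONIC, &ts);
      return (uint64_t)ts.tv_sec * 1000 + (uint64_t)ts.tv_nsec / 1000000;
  }

  /* residency_ms == 0 keeps the current behaviour: always evictable. */
  static bool evictable(const struct bo *bo, uint64_t residency_ms)
  {
      return residency_ms == 0 ||
             now_ms() - bo->placed_ms >= residency_ms;
  }

An eviction walk would simply skip buffers for which evictable()
returns false, which is what bounds the ping-pong frequency described
above.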

-- 
Alan.






r600g: Trying to remove the busywait for fence completion, but hitting inexplicable behaviour

2012-01-31 Thread Alan Swanson
On Tue, 2012-01-31 at 13:16 +, Simon Farnsworth wrote:
> Hello,
> 
> When profiling my workload on an AMD E-350 (PALM GPU) to see why it still
> wasn't performing well with Jerome's WIP macrotiling patches, I noticed that
> r600_fence_finish was taking 10% of my CPU time. I determined experimentally
> that changing from sched_yield() to os_time_sleep(10) fixed this and
> resolved my last performance issue on AMD Fusion as compared to Intel Atom,
> but felt that this was hacky.

No, you were right the first time: sched_yield should definitely not be
used in this busy wait under Linux (even with its preceding few spins).

Back in 2003, when Linux 2.5 changed sched_yield to move processes from
the run queue to the expired queue instead, its use was discussed on the
DRI devel list.

http://lwn.net/Articles/31462/
http://thread.gmane.org/gmane.comp.video.dri.devel/5455/focus=6779

There still seem to be five uses that should be checked.

src/gallium/drivers/r600/r600_pipe.c
src/gallium/drivers/nouveau/nouveau_fence.c
src/gallium/auxiliary/pipebuffer/pb_buffer_fenced.c
src/gallium/winsys/radeon/drm/radeon_drm_bo.c (*2)
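
The usual replacement for that pattern is a short bounded spin followed
by real sleeps rather than yielding in a tight loop. A rough sketch of
the shape, where fence_signaled() is a hypothetical stand-in for the
driver's fence query (not Mesa's actual API):

  #include <stdbool.h>
  #include <time.h>

  extern bool fence_signaled(void);  /* hypothetical driver query */

  static void fence_wait(void)
  {
      /* A few cheap polls first, in case the GPU is nearly done... */
      for (int i = 0; i < 64; i++)
          if (fence_signaled())
              return;

      /* ...then sleep between polls instead of spinning on sched_yield(). */
      struct timespec ts = { .tv_sec = 0, .tv_nsec = 10 * 1000 };  /* 10 us */
      while (!fence_signaled())
          nanosleep(&ts, NULL);
  }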

-- 
Alan.


