[PATCH] drm/radeon: fix TOPDOWN handling for bo_create

2015-03-12 Thread Lauri Kasanen
On Thu, 12 Mar 2015 18:02:56 +0900
Michel Dänzer  wrote:

> struct ttm_place::lpfn is honoured even with TTM_PL_FLAG_TOPDOWN, so
> latter should work with RADEON_GEM_CPU_ACCESS. It sounds like the
> problem is really that some BOs are expected to be within a certain
> range from the beginning of VRAM, but lpfn isn't set accordingly. It
> would be better to fix that by setting lpfn directly than indirectly via
> RADEON_GEM_CPU_ACCESS.
> 
> 
> Anyway, since this isn't the first bug which prevents
> TTM_PL_FLAG_TOPDOWN from working as intended in the radeon driver, I
> wonder if its performance impact should be re-evaluated. Lauri?

I'm sorry, I'm not in a place where I could spend the time to redo the
benchmarks.

If it causes too many issues it is of course easy to disable, but so
far the issues shown have not been caused by it - it merely exposed
wrong settings/bugs elsewhere. From this POV I would say it's good to
have it enabled, to stress the various parts.

This doesn't warm the heart of the guy with flicker after suspend, so
perhaps a kernel module parameter to disable it (defaulting to enabled)?

- Lauri


[PATCH 1/5] drm/radeon: Set TTM_PL_FLAG_TOPDOWN also for RADEON_GEM_CPU_ACCESS BOs

2014-10-28 Thread Lauri Kasanen
On Tue, 28 Oct 2014 18:35:02 +0900
Michel Dänzer  wrote:

> From: Michel Dänzer 
> 
> I wasn't sure if TTM_PL_FLAG_TOPDOWN works correctly with non-0 lpfn, but
> AFAICT it does.
> 
> Signed-off-by: Michel Dänzer 

This series is

Reviewed-by: Lauri Kasanen 

- Lauri


[PATCH] drm/radeon: Inline r100_mm_rreg, -wreg, v3

2014-07-10 Thread Lauri Kasanen
On Sun, 20 Apr 2014 19:41:11 +0200
Christian K?nig  wrote:

> Am 20.04.2014 19:29, schrieb Lauri Kasanen:
> > This was originally un-inlined by Andi Kleen in 2011 citing size concerns.
> > Indeed, a first attempt at inlining it grew radeon.ko by 7%.
> >
> > However, 2% of cpu is spent in this function. Simply inlining it gave 1% 
> > more fps
> > in Urban Terror.
> >
> > v2: We know the minimum MMIO size. Adding it to the if allows the compiler 
> > to
> > optimize the branch out, improving both performance and size.
> >
> > The v2 patch decreases radeon.ko size by 2%. I didn't re-benchmark, but 
> > common sense
> > says perf is now more than 1% better.
> >
> > v3: Also change _wreg, make the threshold a define.
> >
> > Inlining _wreg increased the size a bit compared to v2, so now radeon.ko
> > is only 1% smaller.
> >
> > Signed-off-by: Lauri Kasanen 
> 
> Reviewed-by: Christian K?nig 

Ping. Although reviewed in April, seems this wasn't applied to any tree?

- Lauri


[PATCH] drm/radeon: Inline r100_mm_rreg, -wreg, v3

2014-04-20 Thread Lauri Kasanen
This was originally un-inlined by Andi Kleen in 2011 citing size concerns.
Indeed, a first attempt at inlining it grew radeon.ko by 7%.

However, 2% of cpu is spent in this function. Simply inlining it gave 1% more 
fps
in Urban Terror.

v2: We know the minimum MMIO size. Adding it to the if allows the compiler to
optimize the branch out, improving both performance and size.

The v2 patch decreases radeon.ko size by 2%. I didn't re-benchmark, but common 
sense
says perf is now more than 1% better.

v3: Also change _wreg, make the threshold a define.

Inlining _wreg increased the size a bit compared to v2, so now radeon.ko
is only 1% smaller.

Signed-off-by: Lauri Kasanen 
---
 drivers/gpu/drm/radeon/r100.c   | 33 -
 drivers/gpu/drm/radeon/radeon.h | 40 
 2 files changed, 36 insertions(+), 37 deletions(-)

diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c
index b6c3264..a4e7871 100644
--- a/drivers/gpu/drm/radeon/r100.c
+++ b/drivers/gpu/drm/radeon/r100.c
@@ -4086,39 +4086,6 @@ int r100_init(struct radeon_device *rdev)
return 0;
 }

-uint32_t r100_mm_rreg(struct radeon_device *rdev, uint32_t reg,
- bool always_indirect)
-{
-   if (reg < rdev->rmmio_size && !always_indirect)
-   return readl(((void __iomem *)rdev->rmmio) + reg);
-   else {
-   unsigned long flags;
-   uint32_t ret;
-
-   spin_lock_irqsave(&rdev->mmio_idx_lock, flags);
-   writel(reg, ((void __iomem *)rdev->rmmio) + RADEON_MM_INDEX);
-   ret = readl(((void __iomem *)rdev->rmmio) + RADEON_MM_DATA);
-   spin_unlock_irqrestore(&rdev->mmio_idx_lock, flags);
-
-   return ret;
-   }
-}
-
-void r100_mm_wreg(struct radeon_device *rdev, uint32_t reg, uint32_t v,
- bool always_indirect)
-{
-   if (reg < rdev->rmmio_size && !always_indirect)
-   writel(v, ((void __iomem *)rdev->rmmio) + reg);
-   else {
-   unsigned long flags;
-
-   spin_lock_irqsave(&rdev->mmio_idx_lock, flags);
-   writel(reg, ((void __iomem *)rdev->rmmio) + RADEON_MM_INDEX);
-   writel(v, ((void __iomem *)rdev->rmmio) + RADEON_MM_DATA);
-   spin_unlock_irqrestore(&rdev->mmio_idx_lock, flags);
-   }
-}
-
 u32 r100_io_rreg(struct radeon_device *rdev, u32 reg)
 {
if (reg < rdev->rio_mem_size)
diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index f21db7a..a749b6c 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -2328,10 +2328,42 @@ int radeon_device_init(struct radeon_device *rdev,
 void radeon_device_fini(struct radeon_device *rdev);
 int radeon_gpu_wait_for_idle(struct radeon_device *rdev);

-uint32_t r100_mm_rreg(struct radeon_device *rdev, uint32_t reg,
- bool always_indirect);
-void r100_mm_wreg(struct radeon_device *rdev, uint32_t reg, uint32_t v,
- bool always_indirect);
+#define RADEON_MIN_MMIO_SIZE 0x1
+
+static inline uint32_t r100_mm_rreg(struct radeon_device *rdev, uint32_t reg,
+   bool always_indirect)
+{
+   /* The mmio size is 64kb at minimum. Allows the if to be optimized out. 
*/
+   if ((reg < rdev->rmmio_size || reg < RADEON_MIN_MMIO_SIZE) && 
!always_indirect)
+   return readl(((void __iomem *)rdev->rmmio) + reg);
+   else {
+   unsigned long flags;
+   uint32_t ret;
+
+   spin_lock_irqsave(&rdev->mmio_idx_lock, flags);
+   writel(reg, ((void __iomem *)rdev->rmmio) + RADEON_MM_INDEX);
+   ret = readl(((void __iomem *)rdev->rmmio) + RADEON_MM_DATA);
+   spin_unlock_irqrestore(&rdev->mmio_idx_lock, flags);
+
+   return ret;
+   }
+}
+
+static inline void r100_mm_wreg(struct radeon_device *rdev, uint32_t reg, 
uint32_t v,
+   bool always_indirect)
+{
+   if ((reg < rdev->rmmio_size || reg < RADEON_MIN_MMIO_SIZE) && 
!always_indirect)
+   writel(v, ((void __iomem *)rdev->rmmio) + reg);
+   else {
+   unsigned long flags;
+
+   spin_lock_irqsave(&rdev->mmio_idx_lock, flags);
+   writel(reg, ((void __iomem *)rdev->rmmio) + RADEON_MM_INDEX);
+   writel(v, ((void __iomem *)rdev->rmmio) + RADEON_MM_DATA);
+   spin_unlock_irqrestore(&rdev->mmio_idx_lock, flags);
+   }
+}
+
 u32 r100_io_rreg(struct radeon_device *rdev, u32 reg);
 void r100_io_wreg(struct radeon_device *rdev, u32 reg, u32 v);

-- 
1.8.3.1



[PATCH] drm/radeon: Inline r100_mm_rreg, v2

2014-04-19 Thread Lauri Kasanen
On Sat, 19 Apr 2014 11:15:53 -0400
Alex Deucher  wrote:

> On Sat, Apr 19, 2014 at 6:06 AM, Christian K?nig
> >> This was originally un-inlined by Andi Kleen in 2011 citing size concerns.
> >> Indeed, a first attempt at inlining it grew radeon.ko by 7%.
> >>
> >> However, 2% of cpu is spent in this function. Simply inlining it gave 1%
> >> more fps
> >> in Urban Terror.
> >>
> >> v2: We know the minimum MMIO size. Adding it to the if allows the compiler
> >> to
> >> optimize the branch out, improving both performance and size.
> >>
> >> The v2 patch decreases radeon.ko size by 2%. I didn't re-benchmark, but
> >> common sense
> >> says perf is now more than 1% better.
> >
> > Nice!
> >
> > But are you sure that the register PCI bar is always at least 64K in size?
> > Keep in mind that this code is used over all generations since R100.
> > Additional to that we probably should have a define for that and also apply
> > the optimizations to r100_mm_wreg as well.

Yes, I checked the earlier code. It had 64kb hard-coded, and when it
was changed in 2010 to use the dynamic value, the commit said later
asics are larger. (07bec2df01)

A quick google also didn't find any dmesg with smaller values, R100
cards had 64kb.

> If most of the register accesses are for the interrupt setup, I wonder
> if it would be better to just clean up the irq_set functions to reduce
> the register accesses.  E.g., only touch the registers for the
> specific irq masks have changed.

Yes, that should also be done. But as this function is used elsewhere
as well, having it fast (not to mention the size decrease) would be
good.

I think this patch is safe enough for 3.15, but perhaps it's too late
now.

- Lauri


[PATCH] drm/radeon: Inline r100_mm_rreg, v2

2014-04-19 Thread Lauri Kasanen
This was originally un-inlined by Andi Kleen in 2011 citing size concerns.
Indeed, a first attempt at inlining it grew radeon.ko by 7%.

However, 2% of cpu is spent in this function. Simply inlining it gave 1% more 
fps
in Urban Terror.

v2: We know the minimum MMIO size. Adding it to the if allows the compiler to
optimize the branch out, improving both performance and size.

The v2 patch decreases radeon.ko size by 2%. I didn't re-benchmark, but common 
sense
says perf is now more than 1% better.

Signed-off-by: Lauri Kasanen 
---
 drivers/gpu/drm/radeon/r100.c   | 18 --
 drivers/gpu/drm/radeon/radeon.h | 21 +++--
 2 files changed, 19 insertions(+), 20 deletions(-)

The _wreg function could be given similar treatment, but as it's nowhere as hot
(0.009% vs 2%), didn't bother.

diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c
index b6c3264..8169e82 100644
--- a/drivers/gpu/drm/radeon/r100.c
+++ b/drivers/gpu/drm/radeon/r100.c
@@ -4086,24 +4086,6 @@ int r100_init(struct radeon_device *rdev)
return 0;
 }

-uint32_t r100_mm_rreg(struct radeon_device *rdev, uint32_t reg,
- bool always_indirect)
-{
-   if (reg < rdev->rmmio_size && !always_indirect)
-   return readl(((void __iomem *)rdev->rmmio) + reg);
-   else {
-   unsigned long flags;
-   uint32_t ret;
-
-   spin_lock_irqsave(&rdev->mmio_idx_lock, flags);
-   writel(reg, ((void __iomem *)rdev->rmmio) + RADEON_MM_INDEX);
-   ret = readl(((void __iomem *)rdev->rmmio) + RADEON_MM_DATA);
-   spin_unlock_irqrestore(&rdev->mmio_idx_lock, flags);
-
-   return ret;
-   }
-}
-
 void r100_mm_wreg(struct radeon_device *rdev, uint32_t reg, uint32_t v,
  bool always_indirect)
 {
diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index f21db7a..883276a 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -2328,8 +2328,25 @@ int radeon_device_init(struct radeon_device *rdev,
 void radeon_device_fini(struct radeon_device *rdev);
 int radeon_gpu_wait_for_idle(struct radeon_device *rdev);

-uint32_t r100_mm_rreg(struct radeon_device *rdev, uint32_t reg,
- bool always_indirect);
+static inline uint32_t r100_mm_rreg(struct radeon_device *rdev, uint32_t reg,
+   bool always_indirect)
+{
+   /* The mmio size is 64kb at minimum. Allows the if to be optimized out. 
*/
+   if ((reg < rdev->rmmio_size || reg < 0x1) && !always_indirect)
+   return readl(((void __iomem *)rdev->rmmio) + reg);
+   else {
+   unsigned long flags;
+   uint32_t ret;
+
+   spin_lock_irqsave(&rdev->mmio_idx_lock, flags);
+   writel(reg, ((void __iomem *)rdev->rmmio) + RADEON_MM_INDEX);
+   ret = readl(((void __iomem *)rdev->rmmio) + RADEON_MM_DATA);
+   spin_unlock_irqrestore(&rdev->mmio_idx_lock, flags);
+
+   return ret;
+   }
+}
+
 void r100_mm_wreg(struct radeon_device *rdev, uint32_t reg, uint32_t v,
  bool always_indirect);
 u32 r100_io_rreg(struct radeon_device *rdev, u32 reg);
-- 
1.8.3.1



[PATCH] drm/radeon: Inline r100_mm_rreg

2014-04-11 Thread Lauri Kasanen
On Fri, 11 Apr 2014 14:32:20 +0200
Christian K?nig  wrote:

> Am 11.04.2014 11:54, schrieb Lauri Kasanen:
> > On Fri, 11 Apr 2014 10:33:08 +0200
> > Christian K?nig  wrote:
> >
> >>>> Actually direct register access shouldn't be necessary so often. Apart
> >>>> from page flips, write/read pointer updates and irq processing there
> >>>> shouldn't be so many of them. Could you clarify a bit more what issue
> >>>> you are seeing here?
> >>> Too much cpu usage for such a simple function. 2% makes it #2 in top-10
> >>> radeon.ko functions, right after evergreen_cs_parse. For reference, #3
> >>> (radeon_cs_packet_parse) is only 0.5%, one fourth of this function's
> >>> usage.
> >> I think you misunderstood me here. I do believe your numbers that it
> >> makes a noticeable difference.
> >>
> >> But I've did a couple of perf tests recently on SI and CIK while hacking
> >> on VM support, and IIRC r100_mm_rreg didn't showed up in the top 10 on
> >> those systems.
> >>
> >> So what puzzles me is who the hack is calling r100_mm_rreg so often that
> >> it makes a noticeable difference on evergreen/NI?
> > The biggest caller is cayman_cp_int_cntl_setup. Before inlining it took
> > 0.0013%, after it takes 1%.
> 
> Sounds like somebody is constantly turning interrupts on and off.

I instrumented radeon_irq_set. There's about 12 calls to that per sec,
most coming from radeon_irq_kms_sw_irq_get and
radeon_irq_kms_sw_irq_put.

So from your earlier mail I gather this amount of IRQ setting is
normal. It's just taking a lot of cpu for some reason.

- Lauri


[PATCH] drm/radeon: Inline r100_mm_rreg

2014-04-11 Thread Lauri Kasanen
On Fri, 11 Apr 2014 14:32:20 +0200
Christian K?nig  wrote:

> Anyway, I would do like Ilia suggested and only put the else branch into 
> a separate, not inlined function.
> 
> BTW: It's probably a good idea to do the same for the write function as 
> well.

I tested it. The majority of the size increase stayed - the
else/spinlock part as non-inlined functions, radeon.ko was still 5%
larger instead of 7%.

- Lauri


[PATCH] drm/radeon: Inline r100_mm_rreg

2014-04-11 Thread Lauri Kasanen
On Fri, 11 Apr 2014 10:33:08 +0200
Christian K?nig  wrote:

> >> Actually direct register access shouldn't be necessary so often. Apart
> >> from page flips, write/read pointer updates and irq processing there
> >> shouldn't be so many of them. Could you clarify a bit more what issue
> >> you are seeing here?
> > Too much cpu usage for such a simple function. 2% makes it #2 in top-10
> > radeon.ko functions, right after evergreen_cs_parse. For reference, #3
> > (radeon_cs_packet_parse) is only 0.5%, one fourth of this function's
> > usage.
> 
> I think you misunderstood me here. I do believe your numbers that it 
> makes a noticeable difference.
> 
> But I've did a couple of perf tests recently on SI and CIK while hacking 
> on VM support, and IIRC r100_mm_rreg didn't showed up in the top 10 on 
> those systems.
> 
> So what puzzles me is who the hack is calling r100_mm_rreg so often that 
> it makes a noticeable difference on evergreen/NI?

The biggest caller is cayman_cp_int_cntl_setup. Before inlining it took
0.0013%, after it takes 1%.

This is on a Richland APU, so Aruba/Cayman. Urban Terror is an ioq3
game with a lot of cpu-side vertex submissions.

- Lauri


[PATCH] drm/radeon: Inline r100_mm_rreg

2014-04-11 Thread Lauri Kasanen
On Thu, 10 Apr 2014 21:30:03 +0200
Christian K?nig  wrote:

> >>> Quick thought from someone entirely unfamiliar with the hardware:
> >>> perhaps you can get the performance benefit without the size increase
> >>> by moving the else portion into a non-inline function? I'm guessing
> >>> that most accesses happen in the "if" branch.
> >> The function call overhead is about equal to branching overhead, so
> >> splitting it would only help about half that. It's called from many
> >> places, and a lot of calls per sec.
> 
> Actually direct register access shouldn't be necessary so often. Apart 
> from page flips, write/read pointer updates and irq processing there 
> shouldn't be so many of them. Could you clarify a bit more what issue 
> you are seeing here?

Too much cpu usage for such a simple function. 2% makes it #2 in top-10
radeon.ko functions, right after evergreen_cs_parse. For reference, #3
(radeon_cs_packet_parse) is only 0.5%, one fourth of this function's
usage.

As proved by the perf increase, it's called often enough that getting
rid of the function call overhead (and compiling the if out
compile-time) helps measurably.

- Lauri


[PATCH] drm/radeon: Inline r100_mm_rreg

2014-04-10 Thread Lauri Kasanen
On Thu, 10 Apr 2014 12:19:10 -0400
Ilia Mirkin  wrote:

> > +static inline uint32_t r100_mm_rreg(struct radeon_device *rdev, uint32_t 
> > reg,
> > +   bool always_indirect)
> > +{
> > +   if (reg < rdev->rmmio_size && !always_indirect)
> > +   return readl(((void __iomem *)rdev->rmmio) + reg);
> 
> Quick thought from someone entirely unfamiliar with the hardware:
> perhaps you can get the performance benefit without the size increase
> by moving the else portion into a non-inline function? I'm guessing
> that most accesses happen in the "if" branch.

The function call overhead is about equal to branching overhead, so
splitting it would only help about half that. It's called from many
places, and a lot of calls per sec.

Of course the future kernel LTO will all make this go away, but that's
probably two years in the future before it's stable.

- Lauri


[PATCH] drm/radeon: Inline r100_mm_rreg

2014-04-10 Thread Lauri Kasanen
This was originally un-inlined by Andi Kleen in 2011 citing size concerns.
Indeed, inlining it grows radeon.ko by 7%.

However, 2% of cpu is spent in this function. Inlining it gives 1% more fps
in Urban Terror.

Signed-off-by: Lauri Kasanen 
---
 drivers/gpu/drm/radeon/r100.c   | 18 --
 drivers/gpu/drm/radeon/radeon.h | 20 ++--
 2 files changed, 18 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c
index b6c3264..8169e82 100644
--- a/drivers/gpu/drm/radeon/r100.c
+++ b/drivers/gpu/drm/radeon/r100.c
@@ -4086,24 +4086,6 @@ int r100_init(struct radeon_device *rdev)
return 0;
 }

-uint32_t r100_mm_rreg(struct radeon_device *rdev, uint32_t reg,
- bool always_indirect)
-{
-   if (reg < rdev->rmmio_size && !always_indirect)
-   return readl(((void __iomem *)rdev->rmmio) + reg);
-   else {
-   unsigned long flags;
-   uint32_t ret;
-
-   spin_lock_irqsave(&rdev->mmio_idx_lock, flags);
-   writel(reg, ((void __iomem *)rdev->rmmio) + RADEON_MM_INDEX);
-   ret = readl(((void __iomem *)rdev->rmmio) + RADEON_MM_DATA);
-   spin_unlock_irqrestore(&rdev->mmio_idx_lock, flags);
-
-   return ret;
-   }
-}
-
 void r100_mm_wreg(struct radeon_device *rdev, uint32_t reg, uint32_t v,
  bool always_indirect)
 {
diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 5cf10a7..9231100 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -2330,8 +2330,24 @@ int radeon_device_init(struct radeon_device *rdev,
 void radeon_device_fini(struct radeon_device *rdev);
 int radeon_gpu_wait_for_idle(struct radeon_device *rdev);

-uint32_t r100_mm_rreg(struct radeon_device *rdev, uint32_t reg,
- bool always_indirect);
+static inline uint32_t r100_mm_rreg(struct radeon_device *rdev, uint32_t reg,
+   bool always_indirect)
+{
+   if (reg < rdev->rmmio_size && !always_indirect)
+   return readl(((void __iomem *)rdev->rmmio) + reg);
+   else {
+   unsigned long flags;
+   uint32_t ret;
+
+   spin_lock_irqsave(&rdev->mmio_idx_lock, flags);
+   writel(reg, ((void __iomem *)rdev->rmmio) + RADEON_MM_INDEX);
+   ret = readl(((void __iomem *)rdev->rmmio) + RADEON_MM_DATA);
+   spin_unlock_irqrestore(&rdev->mmio_idx_lock, flags);
+
+   return ret;
+   }
+}
+
 void r100_mm_wreg(struct radeon_device *rdev, uint32_t reg, uint32_t v,
  bool always_indirect);
 u32 r100_io_rreg(struct radeon_device *rdev, u32 reg);
-- 
1.8.3.1



[PATCH] drm/radeon: Improve vramlimit module param documentation

2014-04-08 Thread Lauri Kasanen
Signed-off-by: Lauri Kasanen 
---
 drivers/gpu/drm/radeon/radeon_drv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/radeon_drv.c 
b/drivers/gpu/drm/radeon/radeon_drv.c
index d0eba48..cf2721a 100644
--- a/drivers/gpu/drm/radeon/radeon_drv.c
+++ b/drivers/gpu/drm/radeon/radeon_drv.c
@@ -186,7 +186,7 @@ module_param_named(dynclks, radeon_dynclks, int, 0444);
 MODULE_PARM_DESC(r4xx_atom, "Enable ATOMBIOS modesetting for R4xx");
 module_param_named(r4xx_atom, radeon_r4xx_atom, int, 0444);

-MODULE_PARM_DESC(vramlimit, "Restrict VRAM for testing");
+MODULE_PARM_DESC(vramlimit, "Restrict VRAM for testing, in megabytes");
 module_param_named(vramlimit, radeon_vram_limit, int, 0600);

 MODULE_PARM_DESC(agpmode, "AGP Mode (-1 == PCI)");
-- 
1.8.3.1



[RFC 0/3] TTM priority queue logic

2014-04-07 Thread Lauri Kasanen
On Mon, 07 Apr 2014 14:25:28 +0200
Thomas Hellstrom  wrote:

> Hi, Lauri.
> 
> On 04/04/2014 03:52 PM, Lauri Kasanen wrote:
> > Hi list, Thomas,
> >
> > I'd like to know if this is going in the right direction.
> 
> This looks fine with me.
> 
> However, if possible I'd like the drivers to enable both alloc_threshold
> and priority queue on a per-memory-type basis.
> 
> That would mean no new arguments (use_pqueue, alloc_threshold) in
> ttm_bo_device_init(). Instead, set default values in
> ttm_bo_init_mm(), and let the driver change them in the init_mem_type()
> callback.
> 
> Do you think that would work?

Thanks for the review.

Alloc_threshold was removed, and the current patch (in drm-next)
replaced it with a placement flag. So now that logic is in the driver.

Making the pqueue a per-type option would certainly work, I'll edit it
to do so. I don't see why it would improve GTT or SYSTEM given their
perf characteristics, but I suppose future devices can always do weird
things.

- Lauri


Required backwards support level?

2014-04-07 Thread Lauri Kasanen
On Sun, 6 Apr 2014 19:58:48 +0200
Daniel Vetter  wrote:

> On Sun, Apr 6, 2014 at 7:28 PM, Lauri Kasanen  wrote:
> > This is related to my memory management work. As the VRAM queue is
> > global, it cannot be determined on a per-app basis. If at least one
> > client is running old userspace, the new scoring cannot be used.
> >
> > Switching live between the two would bring additional complexity. I'd
> > hope to be able to determine by the first 3d app if the userspace is
> > new enough. But that depends on if it's expected to be able to run
> > mixed loads.
> 
> At least thus far we've worked under the example that any
> explicitly-enabled new behaviour is only enabled per-fd. That's also
> useful when you install all your developer versions into a prefix, so
> still have 2 versions of X driver, libdrm, mesa, everything else lying
> around and want to switch at runtime even.
> 
> Can't you fake a default score somehow for all legacy userspace and
> always run with the new code? Reimplementing the old interfaces in
> terms of the new ones also allows you to kick out duplicated code
> (usually) compared to having both old and new code around.

Certainly, that can be done. 

- Lauri


Required backwards support level?

2014-04-06 Thread Lauri Kasanen
On Sun, 6 Apr 2014 10:53:58 -0400
Rob Clark  wrote:

> On Sun, Apr 6, 2014 at 8:26 AM, Lauri Kasanen  wrote:
> > Hi,
> >
> > Obviously old userspace + new kernel combo needs to be supported. But
> > I'm not so sure about a mixed case, does old userspace + new userspace
> > need to be supported to run at the same time?
> >
> > For example, a new 64-bit mesa+libdrm and a 32-bit old set, both
> > running apps at the same time.
> >
> > Or is it acceptable to assume that once a new userspace has been run,
> > an old one won't be run for this boot?
> 
> old 32b userspace in a chroot, for example?

Yes, for example.

> Maybe you can give a better description of what you are wanting to do?
>  Could you decide on 32b vs 64b userspace on a per 'struct drm_file'
> basis (ie. each time /dev/dri/cardN is opened)?

This is related to my memory management work. As the VRAM queue is
global, it cannot be determined on a per-app basis. If at least one
client is running old userspace, the new scoring cannot be used.

Switching live between the two would bring additional complexity. I'd
hope to be able to determine by the first 3d app if the userspace is
new enough. But that depends on if it's expected to be able to run
mixed loads.

- Lauri


Required backwards support level?

2014-04-06 Thread Lauri Kasanen
Hi,

Obviously old userspace + new kernel combo needs to be supported. But
I'm not so sure about a mixed case, does old userspace + new userspace
need to be supported to run at the same time?

For example, a new 64-bit mesa+libdrm and a 32-bit old set, both
running apps at the same time.

Or is it acceptable to assume that once a new userspace has been run,
an old one won't be run for this boot?

- Lauri


[RFC 3/3] drm/ttm: Enable the priority queue for VRAM

2014-04-04 Thread Lauri Kasanen
Signed-off-by: Lauri Kasanen 
---
 drivers/gpu/drm/ttm/ttm_bo.c | 68 +++-
 1 file changed, 54 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 621151c..80e5856 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -172,7 +172,12 @@ void ttm_bo_add_to_lru(struct ttm_buffer_object *bo)
BUG_ON(!list_empty(&bo->lru));

man = &bdev->man[bo->mem.mem_type];
-   list_add_tail(&bo->lru, &man->lru);
+
+   if (bdev->use_pqueue && bo->mem.mem_type == TTM_PL_VRAM)
+   ttm_prio_add(&man->pqueue, &bo->pqueue);
+   else
+   list_add_tail(&bo->lru, &man->lru);
+
kref_get(&bo->list_kref);

if (bo->ttm != NULL) {
@@ -186,6 +191,8 @@ EXPORT_SYMBOL(ttm_bo_add_to_lru);
 int ttm_bo_del_from_lru(struct ttm_buffer_object *bo)
 {
int put_count = 0;
+   struct ttm_bo_device *bdev = bo->bdev;
+   struct ttm_mem_type_manager *man = &bdev->man[bo->mem.mem_type];

if (!list_empty(&bo->swap)) {
list_del_init(&bo->swap);
@@ -194,6 +201,10 @@ int ttm_bo_del_from_lru(struct ttm_buffer_object *bo)
if (!list_empty(&bo->lru)) {
list_del_init(&bo->lru);
++put_count;
+   } else if (bdev->use_pqueue && bo->mem.mem_type == TTM_PL_VRAM &&
+   ttm_prio_is_queued(&bo->pqueue)) {
+   ttm_prio_remove(&man->pqueue, &bo->pqueue);
+   ++put_count;
}

/*
@@ -725,10 +736,22 @@ static int ttm_mem_evict_first(struct ttm_bo_device *bdev,
int ret = -EBUSY, put_count;

spin_lock(&glob->lru_lock);
-   list_for_each_entry(bo, &man->lru, lru) {
-   ret = ttm_bo_reserve_nolru(bo, false, true, false, 0);
-   if (!ret)
-   break;
+   if (bdev->use_pqueue && mem_type == TTM_PL_VRAM) {
+   struct ttm_pqueue_entry *pe;
+   for (pe = ttm_prio_query_lowest(&man->pqueue); pe;
+   pe = ttm_prio_query_next(pe)) {
+
+   bo = container_of(pe, struct ttm_buffer_object, pqueue);
+   ret = ttm_bo_reserve_nolru(bo, false, true, false, 0);
+   if (!ret)
+   break;
+   }
+   } else {
+   list_for_each_entry(bo, &man->lru, lru) {
+   ret = ttm_bo_reserve_nolru(bo, false, true, false, 0);
+   if (!ret)
+   break;
+   }
}

if (ret) {
@@ -1125,6 +1148,7 @@ int ttm_bo_init(struct ttm_bo_device *bdev,
INIT_LIST_HEAD(&bo->ddestroy);
INIT_LIST_HEAD(&bo->swap);
INIT_LIST_HEAD(&bo->io_reserve_lru);
+   ttm_prio_init_entry(&bo->pqueue);
mutex_init(&bo->wu_mutex);
bo->bdev = bdev;
bo->glob = bdev->glob;
@@ -1243,17 +1267,32 @@ static int ttm_bo_force_list_clean(struct ttm_bo_device 
*bdev,
 */

spin_lock(&glob->lru_lock);
-   while (!list_empty(&man->lru)) {
-   spin_unlock(&glob->lru_lock);
-   ret = ttm_mem_evict_first(bdev, mem_type, false, false);
-   if (ret) {
-   if (allow_errors) {
-   return ret;
-   } else {
-   pr_err("Cleanup eviction failed\n");
+   if (bdev->use_pqueue && mem_type == TTM_PL_VRAM) {
+   while (ttm_prio_query_lowest(&man->pqueue)) {
+   spin_unlock(&glob->lru_lock);
+   ret = ttm_mem_evict_first(bdev, mem_type, false, false);
+   if (ret) {
+   if (allow_errors) {
+   return ret;
+   } else {
+   pr_err("Cleanup eviction failed\n");
+   }
}
+   spin_lock(&glob->lru_lock);
+   }
+   } else {
+   while (!list_empty(&man->lru)) {
+   spin_unlock(&glob->lru_lock);
+   ret = ttm_mem_evict_first(bdev, mem_type, false, false);
+   if (ret) {
+   if (allow_errors) {
+   return ret;
+   } else {
+   pr_err("Cleanup eviction failed

[RFC 2/3] drm/ttm: Add the priority queue to appropriate structs

2014-04-04 Thread Lauri Kasanen
Signed-off-by: Lauri Kasanen 
---
 drivers/gpu/drm/ast/ast_ttm.c | 1 +
 drivers/gpu/drm/bochs/bochs_mm.c  | 1 +
 drivers/gpu/drm/cirrus/cirrus_ttm.c   | 1 +
 drivers/gpu/drm/mgag200/mgag200_ttm.c | 1 +
 drivers/gpu/drm/nouveau/nouveau_ttm.c | 2 +-
 drivers/gpu/drm/qxl/qxl_ttm.c | 2 +-
 drivers/gpu/drm/radeon/radeon_ttm.c   | 1 +
 drivers/gpu/drm/ttm/ttm_bo.c  | 2 ++
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c   | 2 +-
 include/drm/ttm/ttm_bo_api.h  | 2 ++
 include/drm/ttm/ttm_bo_driver.h   | 7 +++
 11 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/ast/ast_ttm.c b/drivers/gpu/drm/ast/ast_ttm.c
index 61f9e39..311f37f 100644
--- a/drivers/gpu/drm/ast/ast_ttm.c
+++ b/drivers/gpu/drm/ast/ast_ttm.c
@@ -263,6 +263,7 @@ int ast_mm_init(struct ast_private *ast)
 dev->anon_inode->i_mapping,
 DRM_FILE_PAGE_OFFSET,
 true,
+false,
 0);
if (ret) {
DRM_ERROR("Error initialising bo driver; %d\n", ret);
diff --git a/drivers/gpu/drm/bochs/bochs_mm.c b/drivers/gpu/drm/bochs/bochs_mm.c
index 9dfd24a..c4aba61 100644
--- a/drivers/gpu/drm/bochs/bochs_mm.c
+++ b/drivers/gpu/drm/bochs/bochs_mm.c
@@ -229,6 +229,7 @@ int bochs_mm_init(struct bochs_device *bochs)
 bochs->dev->anon_inode->i_mapping,
 DRM_FILE_PAGE_OFFSET,
 true,
+false,
 0);
if (ret) {
DRM_ERROR("Error initialising bo driver; %d\n", ret);
diff --git a/drivers/gpu/drm/cirrus/cirrus_ttm.c 
b/drivers/gpu/drm/cirrus/cirrus_ttm.c
index 74e8e21..895d20e 100644
--- a/drivers/gpu/drm/cirrus/cirrus_ttm.c
+++ b/drivers/gpu/drm/cirrus/cirrus_ttm.c
@@ -263,6 +263,7 @@ int cirrus_mm_init(struct cirrus_device *cirrus)
 dev->anon_inode->i_mapping,
 DRM_FILE_PAGE_OFFSET,
 true,
+false,
 0);
if (ret) {
DRM_ERROR("Error initialising bo driver; %d\n", ret);
diff --git a/drivers/gpu/drm/mgag200/mgag200_ttm.c 
b/drivers/gpu/drm/mgag200/mgag200_ttm.c
index 6844b24..591f68e 100644
--- a/drivers/gpu/drm/mgag200/mgag200_ttm.c
+++ b/drivers/gpu/drm/mgag200/mgag200_ttm.c
@@ -263,6 +263,7 @@ int mgag200_mm_init(struct mga_device *mdev)
 dev->anon_inode->i_mapping,
 DRM_FILE_PAGE_OFFSET,
 true,
+false,
 0);
if (ret) {
DRM_ERROR("Error initialising bo driver; %d\n", ret);
diff --git a/drivers/gpu/drm/nouveau/nouveau_ttm.c 
b/drivers/gpu/drm/nouveau/nouveau_ttm.c
index 3fef97c..4b032b4 100644
--- a/drivers/gpu/drm/nouveau/nouveau_ttm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_ttm.c
@@ -384,7 +384,7 @@ nouveau_ttm_init(struct nouveau_drm *drm)
  &nouveau_bo_driver,
  dev->anon_inode->i_mapping,
  DRM_FILE_PAGE_OFFSET,
- bits <= 32 ? true : false, 0);
+ bits <= 32 ? true : false, false, 0);
if (ret) {
NV_ERROR(drm, "error initialising bo driver, %d\n", ret);
return ret;
diff --git a/drivers/gpu/drm/qxl/qxl_ttm.c b/drivers/gpu/drm/qxl/qxl_ttm.c
index 8401a00..88f12e7 100644
--- a/drivers/gpu/drm/qxl/qxl_ttm.c
+++ b/drivers/gpu/drm/qxl/qxl_ttm.c
@@ -495,7 +495,7 @@ int qxl_ttm_init(struct qxl_device *qdev)
   qdev->mman.bo_global_ref.ref.object,
   &qxl_bo_driver,
   qdev->ddev->anon_inode->i_mapping,
-  DRM_FILE_PAGE_OFFSET, 0, 0);
+  DRM_FILE_PAGE_OFFSET, 0, false, 0);
if (r) {
DRM_ERROR("failed initializing buffer object driver(%d).\n", r);
return r;
diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c 
b/drivers/gpu/drm/radeon/radeon_ttm.c
index 1aef339..dd96ed5 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -711,6 +711,7 @@ int radeon_ttm_init(struct radeon_device *rdev)
   rdev->ddev->anon_inode->i_mapping,
   DRM_FILE_PAGE_OFFSET,
   rdev->need_dma32,
+  false,
   512 * 1024);
if (r) {
DRM_ERROR(&q

[RFC 1/3] drm/ttm: Add a priority queue

2014-04-04 Thread Lauri Kasanen
Signed-off-by: Lauri Kasanen 
---
 drivers/gpu/drm/ttm/Makefile   |   2 +-
 drivers/gpu/drm/ttm/ttm_priority.c | 152 +
 include/drm/ttm/ttm_priority.h |  90 ++
 3 files changed, 243 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/ttm/ttm_priority.c
 create mode 100644 include/drm/ttm/ttm_priority.h

diff --git a/drivers/gpu/drm/ttm/Makefile b/drivers/gpu/drm/ttm/Makefile
index b433b9f..4411599 100644
--- a/drivers/gpu/drm/ttm/Makefile
+++ b/drivers/gpu/drm/ttm/Makefile
@@ -5,6 +5,6 @@ ccflags-y := -Iinclude/drm
 ttm-y := ttm_agp_backend.o ttm_memory.o ttm_tt.o ttm_bo.o \
ttm_bo_util.o ttm_bo_vm.o ttm_module.o \
ttm_object.o ttm_lock.o ttm_execbuf_util.o ttm_page_alloc.o \
-   ttm_bo_manager.o ttm_page_alloc_dma.o
+   ttm_bo_manager.o ttm_page_alloc_dma.o ttm_priority.o

 obj-$(CONFIG_DRM_TTM) += ttm.o
diff --git a/drivers/gpu/drm/ttm/ttm_priority.c 
b/drivers/gpu/drm/ttm/ttm_priority.c
new file mode 100644
index 000..a7cf8d1
--- /dev/null
+++ b/drivers/gpu/drm/ttm/ttm_priority.c
@@ -0,0 +1,152 @@
+/**
+ *
+ * Copyright (c) 2014 Lauri Kasanen
+ * All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the
+ * "Software"), to deal in the Software without restriction, including
+ * without limitation the rights to use, copy, modify, merge, publish,
+ * distribute, sub license, and/or sell copies of the Software, and to
+ * permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the
+ * next paragraph) shall be included in all copies or substantial portions
+ * of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDERS, AUTHORS AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM,
+ * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
+ * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
+ * USE OR OTHER DEALINGS IN THE SOFTWARE.
+ *
+ **/
+
+
+#include 
+#include 
+
+static void ttm_prio_insert_rb(struct rb_root * const root,
+  struct ttm_pqueue_entry * const data,
+  struct rb_node *parent,
+  struct rb_node **where)
+{
+   BUG_ON(!where || (!parent && root->rb_node));
+
+   rb_link_node(&data->node, parent, where);
+   rb_insert_color(&data->node, root);
+}
+
+static struct ttm_pqueue_entry *ttm_prio_search_rb(struct rb_root *root,
+  u64 score,
+  struct rb_node **parent,
+  struct rb_node ***where)
+{
+   struct rb_node **new = &root->rb_node;
+
+   while (*new) {
+   struct ttm_pqueue_entry *this = container_of(*new,
+struct 
ttm_pqueue_entry,
+node);
+
+   *parent = *new;
+
+   if (score < this->score)
+   new = &((*new)->rb_left);
+   else if (score > this->score)
+   new = &((*new)->rb_right);
+   else
+   return this;
+
+   BUG_ON(*parent == *new);
+   }
+
+   *where = new;
+
+   return NULL;
+}
+
+void ttm_prio_add(struct ttm_pqueue * const queue,
+ struct ttm_pqueue_entry * const entry)
+{
+   struct rb_root * const tree = &queue->tree;
+   struct rb_node **place = NULL, *parent = NULL;
+   struct ttm_pqueue_entry *test;
+
+   if (ttm_prio_is_queued(entry))
+   ttm_prio_remove(queue, entry);
+
+   test = ttm_prio_search_rb(tree, entry->score, &parent, &place);
+
+   if (!test)
+   ttm_prio_insert_rb(tree, entry, parent, place);
+   else
+   list_add_tail(&entry->list, &test->list);
+}
+
+struct ttm_pqueue_entry *ttm_prio_query_lowest(const struct ttm_pqueue * const 
queue)
+{
+   const struct rb_root * const root = &queue->tree;
+   struct rb_node *node;
+
+   node = rb_first(root);
+   if (!node)
+   return NULL;
+
+   return container_of(node, struct ttm_pqueue_entry, node);
+}
+
+void ttm_prio_remove(struct ttm_pqueue * const queue,
+ 

[RFC 0/3] TTM priority queue logic

2014-04-04 Thread Lauri Kasanen
Hi list, Thomas,

I'd like to know if this is going in the right direction.

I've implemented a priority queue on top of the kernel rb tree and
linked list. It's been tested well in userspace.

I hardcoded radeon to input the buffer size as the score. Nothing blew
up, games ran fine, and even got ~2% more fps on average*.

The only thing missing from this code is the "if score is too low, and
there is no room without eviction, tell driver so" logic.

- Lauri

* This is a fairly bad strategy, and according to my simulator achieves
16% worse results compared to LRU in heavier situations. The games
tested here all fit in VRAM, which should explain the improvement.


[PATCH] drm/radeon: Use two-ended allocation by size, v2

2014-04-02 Thread Lauri Kasanen
This decreases eviction by up to 20%, by improving the fragmentation
quality. No harm in normal cases that fit VRAM fully (PTS gaming suite).

In some cases, even the VRAM-fitting cases improved slightly (openarena, urban 
terror).

512kb was measured as the most optimal threshold for 3d workloads common to 
radeon.
Other drivers may need different thresholds according to their workloads.

v2: Nicer formatting
Signed-off-by: Lauri Kasanen 
---
 drivers/gpu/drm/radeon/radeon_object.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/radeon_object.c 
b/drivers/gpu/drm/radeon/radeon_object.c
index 1375ff8..19bec0d 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -104,7 +104,7 @@ bool radeon_ttm_bo_is_radeon_bo(struct ttm_buffer_object 
*bo)

 void radeon_ttm_placement_from_domain(struct radeon_bo *rbo, u32 domain)
 {
-   u32 c = 0;
+   u32 c = 0, i;

rbo->placement.fpfn = 0;
rbo->placement.lpfn = 0;
@@ -131,6 +131,17 @@ void radeon_ttm_placement_from_domain(struct radeon_bo 
*rbo, u32 domain)
rbo->placements[c++] = TTM_PL_MASK_CACHING | TTM_PL_FLAG_SYSTEM;
rbo->placement.num_placement = c;
rbo->placement.num_busy_placement = c;
+
+   /*
+* Use two-ended allocation depending on the buffer size to
+* improve fragmentation quality.
+* 512kb was measured as the most optimal number.
+*/
+   if (rbo->tbo.mem.size > 512 * 1024) {
+   for (i = 0; i < c; i++) {
+   rbo->placements[i] |= TTM_PL_FLAG_TOPDOWN;
+   }
+   }
 }

 int radeon_bo_create(struct radeon_device *rdev,
-- 
1.8.3.1



[PATCH 2/2] drm/radeon: Use two-ended allocation by size

2014-04-02 Thread Lauri Kasanen
This decreases eviction by up to 20%, by improving the fragmentation
quality. No harm in normal cases that fit VRAM fully (PTS gaming suite).

In some cases, even the VRAM-fitting cases improved slightly (openarena, urban 
terror).

512kb was measured as the most optimal threshold for 3d workloads common to 
radeon.
Other drivers may need different thresholds according to their workloads.

Signed-off-by: Lauri Kasanen 
---
 drivers/gpu/drm/radeon/radeon_object.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/radeon_object.c 
b/drivers/gpu/drm/radeon/radeon_object.c
index 1375ff8..6251456 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -104,7 +104,7 @@ bool radeon_ttm_bo_is_radeon_bo(struct ttm_buffer_object 
*bo)

 void radeon_ttm_placement_from_domain(struct radeon_bo *rbo, u32 domain)
 {
-   u32 c = 0;
+   u32 c = 0, i;

rbo->placement.fpfn = 0;
rbo->placement.lpfn = 0;
@@ -131,6 +131,15 @@ void radeon_ttm_placement_from_domain(struct radeon_bo 
*rbo, u32 domain)
rbo->placements[c++] = TTM_PL_MASK_CACHING | TTM_PL_FLAG_SYSTEM;
rbo->placement.num_placement = c;
rbo->placement.num_busy_placement = c;
+
+   /*
+* Use two-ended allocation depending on the buffer size to
+* improve fragmentation quality.
+* 512kb was measured as the most optimal number.
+*/
+   if (rbo->tbo.mem.size > 512 * 1024) for (i = 0; i < c; i++) {
+   rbo->placements[i] |= TTM_PL_FLAG_TOPDOWN;
+   }
 }

 int radeon_bo_create(struct radeon_device *rdev,
-- 
1.8.3.1



[PATCH 1/2] drm: Add support for two-ended allocation, v3

2014-04-02 Thread Lauri Kasanen
Clients like i915 need to segregate cache domains within the GTT which
can lead to small amounts of fragmentation. By allocating the uncached
buffers from the bottom and the cacheable buffers from the top, we can
reduce the amount of wasted space and also optimize allocation of the
mappable portion of the GTT to only those buffers that require CPU
access through the GTT.

For other drivers, allocating small bos from one end and large ones
from the other helps improve the quality of fragmentation.

Based on drm_mm work by Chris Wilson.

v3: Changed to use a TTM placement flag
v2: Updated kerneldoc

Cc: Chris Wilson 
Cc: Ben Widawsky 
Cc: Christian K?nig 
Signed-off-by: Lauri Kasanen 
---
 drivers/gpu/drm/drm_mm.c | 66 ++--
 drivers/gpu/drm/i915/i915_gem.c  |  3 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c  |  3 +-
 drivers/gpu/drm/ttm/ttm_bo_manager.c | 11 --
 include/drm/drm_mm.h | 32 ++---
 include/drm/ttm/ttm_placement.h  |  3 ++
 6 files changed, 92 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c
index a2d45b74..8f64be4 100644
--- a/drivers/gpu/drm/drm_mm.c
+++ b/drivers/gpu/drm/drm_mm.c
@@ -82,6 +82,10 @@
  * this to implement guard pages between incompatible caching domains in the
  * graphics TT.
  *
+ * Two behaviors are supported for searching and allocating: bottom-up and 
top-down.
+ * The default is bottom-up. Top-down allocation can be used if the memory area
+ * has different restrictions, or just to reduce fragmentation.
+ *
  * Finally iteration helpers to walk all nodes and all holes are provided as 
are
  * some basic allocator dumpers for debugging.
  */
@@ -102,7 +106,8 @@ static struct drm_mm_node 
*drm_mm_search_free_in_range_generic(const struct drm_
 static void drm_mm_insert_helper(struct drm_mm_node *hole_node,
 struct drm_mm_node *node,
 unsigned long size, unsigned alignment,
-unsigned long color)
+unsigned long color,
+enum drm_mm_allocator_flags flags)
 {
struct drm_mm *mm = hole_node->mm;
unsigned long hole_start = drm_mm_hole_node_start(hole_node);
@@ -115,12 +120,22 @@ static void drm_mm_insert_helper(struct drm_mm_node 
*hole_node,
if (mm->color_adjust)
mm->color_adjust(hole_node, color, &adj_start, &adj_end);

+   if (flags & DRM_MM_CREATE_TOP)
+   adj_start = adj_end - size;
+
if (alignment) {
unsigned tmp = adj_start % alignment;
-   if (tmp)
-   adj_start += alignment - tmp;
+   if (tmp) {
+   if (flags & DRM_MM_CREATE_TOP)
+   adj_start -= tmp;
+   else
+   adj_start += alignment - tmp;
+   }
}

+   BUG_ON(adj_start < hole_start);
+   BUG_ON(adj_end > hole_end);
+
if (adj_start == hole_start) {
hole_node->hole_follows = 0;
list_del(&hole_node->hole_stack);
@@ -205,7 +220,8 @@ EXPORT_SYMBOL(drm_mm_reserve_node);
  * @size: size of the allocation
  * @alignment: alignment of the allocation
  * @color: opaque tag value to use for this node
- * @flags: flags to fine-tune the allocation
+ * @sflags: flags to fine-tune the allocation search
+ * @aflags: flags to fine-tune the allocation behavior
  *
  * The preallocated node must be cleared to 0.
  *
@@ -215,16 +231,17 @@ EXPORT_SYMBOL(drm_mm_reserve_node);
 int drm_mm_insert_node_generic(struct drm_mm *mm, struct drm_mm_node *node,
   unsigned long size, unsigned alignment,
   unsigned long color,
-  enum drm_mm_search_flags flags)
+  enum drm_mm_search_flags sflags,
+  enum drm_mm_allocator_flags aflags)
 {
struct drm_mm_node *hole_node;

hole_node = drm_mm_search_free_generic(mm, size, alignment,
-  color, flags);
+  color, sflags);
if (!hole_node)
return -ENOSPC;

-   drm_mm_insert_helper(hole_node, node, size, alignment, color);
+   drm_mm_insert_helper(hole_node, node, size, alignment, color, aflags);
return 0;
 }
 EXPORT_SYMBOL(drm_mm_insert_node_generic);
@@ -233,7 +250,8 @@ static void drm_mm_insert_helper_range(struct drm_mm_node 
*hole_node,
   struct drm_mm_node *node,
   unsigned long size, unsigned alignment,
   unsigned long color,
-  unsigned long start, unsigned long end)
+ 

[PATCH] drm: Add support for two-ended allocation, v2

2014-04-02 Thread Lauri Kasanen
On Wed, 02 Apr 2014 13:00:00 +0200
Christian K?nig  wrote:

> Nice idea, but I wouldn't put the decision where to place the buffer 
> into TTM based on it's size.
> 
> Instead please make that a proper TTM placement flag because for example 
> for VM page tables we want them to be at the end of VRAM, not because 
> they are big (which they are anyway) but because they never accessed by 
> the CPU.

I think there aren't many special cases like that, and they could
already be handled by setting the first and last pages for the bo?

- Lauri


[PATCH] drm: Add support for two-ended allocation, v2

2014-04-02 Thread Lauri Kasanen
Clients like i915 need to segregate cache domains within the GTT which
can lead to small amounts of fragmentation. By allocating the uncached
buffers from the bottom and the cacheable buffers from the top, we can
reduce the amount of wasted space and also optimize allocation of the
mappable portion of the GTT to only those buffers that require CPU
access through the GTT.

For other drivers, allocating small bos from one end and large ones
from the other helps improve the quality of fragmentation.

Only Radeon has behavioral changes in this patch, as I have no Intel hw.

Based on drm_mm work by Chris Wilson.

--

Radeon uses a 512kb threshold.

This decreases eviction by up to 20%, by improving the fragmentation
quality. No harm in normal cases that fit VRAM fully (PTS gaming suite).

In some cases, even the VRAM-fitting cases improved slightly (openarena, urban 
terror).

512kb was measured as the most optimal threshold for 3d workloads common to 
radeon.
Other drivers may need different thresholds according to their workloads.

v2: Updated kerneldoc in more places

Cc: Chris Wilson 
Cc: Ben Widawsky 
Signed-off-by: Lauri Kasanen 
---
 drivers/gpu/drm/ast/ast_ttm.c |  3 +-
 drivers/gpu/drm/bochs/bochs_mm.c  |  3 +-
 drivers/gpu/drm/cirrus/cirrus_ttm.c   |  3 +-
 drivers/gpu/drm/drm_mm.c  | 66 ++-
 drivers/gpu/drm/i915/i915_gem.c   |  3 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c   |  3 +-
 drivers/gpu/drm/mgag200/mgag200_ttm.c |  3 +-
 drivers/gpu/drm/nouveau/nouveau_ttm.c |  2 +-
 drivers/gpu/drm/qxl/qxl_ttm.c |  2 +-
 drivers/gpu/drm/radeon/radeon_ttm.c   |  3 +-
 drivers/gpu/drm/ttm/ttm_bo.c  |  4 ++-
 drivers/gpu/drm/ttm/ttm_bo_manager.c  | 16 +++--
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c   |  2 +-
 include/drm/drm_mm.h  | 32 ++---
 include/drm/ttm/ttm_bo_driver.h   |  9 -
 15 files changed, 118 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/ast/ast_ttm.c b/drivers/gpu/drm/ast/ast_ttm.c
index b824622..61f9e39 100644
--- a/drivers/gpu/drm/ast/ast_ttm.c
+++ b/drivers/gpu/drm/ast/ast_ttm.c
@@ -262,7 +262,8 @@ int ast_mm_init(struct ast_private *ast)
 &ast_bo_driver,
 dev->anon_inode->i_mapping,
 DRM_FILE_PAGE_OFFSET,
-true);
+true,
+0);
if (ret) {
DRM_ERROR("Error initialising bo driver; %d\n", ret);
return ret;
diff --git a/drivers/gpu/drm/bochs/bochs_mm.c b/drivers/gpu/drm/bochs/bochs_mm.c
index f488be5..9dfd24a 100644
--- a/drivers/gpu/drm/bochs/bochs_mm.c
+++ b/drivers/gpu/drm/bochs/bochs_mm.c
@@ -228,7 +228,8 @@ int bochs_mm_init(struct bochs_device *bochs)
 &bochs_bo_driver,
 bochs->dev->anon_inode->i_mapping,
 DRM_FILE_PAGE_OFFSET,
-true);
+true,
+0);
if (ret) {
DRM_ERROR("Error initialising bo driver; %d\n", ret);
return ret;
diff --git a/drivers/gpu/drm/cirrus/cirrus_ttm.c 
b/drivers/gpu/drm/cirrus/cirrus_ttm.c
index 92e6b77..74e8e21 100644
--- a/drivers/gpu/drm/cirrus/cirrus_ttm.c
+++ b/drivers/gpu/drm/cirrus/cirrus_ttm.c
@@ -262,7 +262,8 @@ int cirrus_mm_init(struct cirrus_device *cirrus)
 &cirrus_bo_driver,
 dev->anon_inode->i_mapping,
 DRM_FILE_PAGE_OFFSET,
-true);
+true,
+0);
if (ret) {
DRM_ERROR("Error initialising bo driver; %d\n", ret);
return ret;
diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c
index a2d45b74..8f64be4 100644
--- a/drivers/gpu/drm/drm_mm.c
+++ b/drivers/gpu/drm/drm_mm.c
@@ -82,6 +82,10 @@
  * this to implement guard pages between incompatible caching domains in the
  * graphics TT.
  *
+ * Two behaviors are supported for searching and allocating: bottom-up and 
top-down.
+ * The default is bottom-up. Top-down allocation can be used if the memory area
+ * has different restrictions, or just to reduce fragmentation.
+ *
  * Finally iteration helpers to walk all nodes and all holes are provided as 
are
  * some basic allocator dumpers for debugging.
  */
@@ -102,7 +106,8 @@ static struct drm_mm_node 
*drm_mm_search_free_in_range_generic(const struct drm_
 static void drm_mm_insert_helper(struct drm_mm_node *hole_node,
 struct drm_mm_node *node,
 unsigned long size, unsigned alignment,
-unsigned l

[PATCH] drm: Add support for two-ended allocation

2014-03-31 Thread Lauri Kasanen
Clients like i915 need to segregate cache domains within the GTT which
can lead to small amounts of fragmentation. By allocating the uncached
buffers from the bottom and the cacheable buffers from the top, we can
reduce the amount of wasted space and also optimize allocation of the
mappable portion of the GTT to only those buffers that require CPU
access through the GTT.

For other drivers, allocating small bos from one end and large ones
from the other helps improve the quality of fragmentation.

Based on drm_mm work by Chris Wilson.

--

Radeon uses a 512kb threshold.

This decreases eviction by up to 20%, by improving the fragmentation
quality. No harm in normal cases that fit VRAM fully (PTS gaming suite).

In some cases, even the VRAM-fitting cases improved slightly (openarena, urban 
terror).

512kb was measured as the most optimal threshold for 3d workloads common to 
radeon.
Other drivers may need different thresholds according to their workloads.

Signed-off-by: Lauri Kasanen 
---
 drivers/gpu/drm/ast/ast_ttm.c |  3 +-
 drivers/gpu/drm/bochs/bochs_mm.c  |  3 +-
 drivers/gpu/drm/cirrus/cirrus_ttm.c   |  3 +-
 drivers/gpu/drm/drm_mm.c  | 56 +--
 drivers/gpu/drm/i915/i915_gem.c   |  3 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c   |  3 +-
 drivers/gpu/drm/mgag200/mgag200_ttm.c |  3 +-
 drivers/gpu/drm/nouveau/nouveau_ttm.c |  2 +-
 drivers/gpu/drm/qxl/qxl_ttm.c |  2 +-
 drivers/gpu/drm/radeon/radeon_ttm.c   |  3 +-
 drivers/gpu/drm/ttm/ttm_bo.c  |  4 ++-
 drivers/gpu/drm/ttm/ttm_bo_manager.c  | 16 --
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c   |  2 +-
 include/drm/drm_mm.h  | 29 +++---
 include/drm/ttm/ttm_bo_driver.h   |  7 -
 15 files changed, 105 insertions(+), 34 deletions(-)

diff --git a/drivers/gpu/drm/ast/ast_ttm.c b/drivers/gpu/drm/ast/ast_ttm.c
index b824622..61f9e39 100644
--- a/drivers/gpu/drm/ast/ast_ttm.c
+++ b/drivers/gpu/drm/ast/ast_ttm.c
@@ -262,7 +262,8 @@ int ast_mm_init(struct ast_private *ast)
 &ast_bo_driver,
 dev->anon_inode->i_mapping,
 DRM_FILE_PAGE_OFFSET,
-true);
+true,
+0);
if (ret) {
DRM_ERROR("Error initialising bo driver; %d\n", ret);
return ret;
diff --git a/drivers/gpu/drm/bochs/bochs_mm.c b/drivers/gpu/drm/bochs/bochs_mm.c
index f488be5..9dfd24a 100644
--- a/drivers/gpu/drm/bochs/bochs_mm.c
+++ b/drivers/gpu/drm/bochs/bochs_mm.c
@@ -228,7 +228,8 @@ int bochs_mm_init(struct bochs_device *bochs)
 &bochs_bo_driver,
 bochs->dev->anon_inode->i_mapping,
 DRM_FILE_PAGE_OFFSET,
-true);
+true,
+0);
if (ret) {
DRM_ERROR("Error initialising bo driver; %d\n", ret);
return ret;
diff --git a/drivers/gpu/drm/cirrus/cirrus_ttm.c 
b/drivers/gpu/drm/cirrus/cirrus_ttm.c
index 92e6b77..74e8e21 100644
--- a/drivers/gpu/drm/cirrus/cirrus_ttm.c
+++ b/drivers/gpu/drm/cirrus/cirrus_ttm.c
@@ -262,7 +262,8 @@ int cirrus_mm_init(struct cirrus_device *cirrus)
 &cirrus_bo_driver,
 dev->anon_inode->i_mapping,
 DRM_FILE_PAGE_OFFSET,
-true);
+true,
+0);
if (ret) {
DRM_ERROR("Error initialising bo driver; %d\n", ret);
return ret;
diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c
index a2d45b74..1728bcc 100644
--- a/drivers/gpu/drm/drm_mm.c
+++ b/drivers/gpu/drm/drm_mm.c
@@ -102,7 +102,8 @@ static struct drm_mm_node 
*drm_mm_search_free_in_range_generic(const struct drm_
 static void drm_mm_insert_helper(struct drm_mm_node *hole_node,
 struct drm_mm_node *node,
 unsigned long size, unsigned alignment,
-unsigned long color)
+unsigned long color,
+enum drm_mm_allocator_flags flags)
 {
struct drm_mm *mm = hole_node->mm;
unsigned long hole_start = drm_mm_hole_node_start(hole_node);
@@ -115,12 +116,22 @@ static void drm_mm_insert_helper(struct drm_mm_node 
*hole_node,
if (mm->color_adjust)
mm->color_adjust(hole_node, color, &adj_start, &adj_end);

+   if (flags & DRM_MM_CREATE_TOP)
+   adj_start = adj_end - size;
+
if (alignment) {
unsigned tmp = adj_start % alignment;
-

[PATCH 2/4] drm/ttm: Add optional support for two-ended allocation

2014-03-31 Thread Lauri Kasanen
On Mon, 31 Mar 2014 14:41:05 +0200
Lucas Stach  wrote:

> Am Montag, den 31.03.2014, 15:28 +0300 schrieb Lauri Kasanen:
> > Allocating small bos from one end and large ones from the other helps
> > improve the quality of fragmentation.
> > 
> > This depends on "drm: Optionally create mm blocks from top-to-bottom" by
> > Chris Wilson.
> > 
> You are breaking bisectability here. This patch deliberately introduces
> a build failure that you only fix up in the next two patches. This isn't
> acceptable, you must squash the following patches in to avoid the
> breakage.
> 
> Also if the first patch is from Chris Wilson and you only did some small
> changes this should be reflected in the author name of the patch.

OK, will squash and resend. Then the changes are mostly mine in the
singular patch.

BTW, how would I handle the case of a different author, when I cannot
use git-send-email due to network mail restrictions? I can't send email
from an address I don't own.

- Lauri


[PATCH 4/4] drm/radeon: Use two-ended allocation with a 512kb threshold

2014-03-31 Thread Lauri Kasanen
This decreases eviction by up to 20%, by improving the fragmentation
quality. No harm in normal cases that fit VRAM fully (PTS gaming suite).

In some cases, even the VRAM-fitting cases improved slightly (openarena, urban 
terror).

512kb was measured as the most optimal threshold for 3d workloads common to 
radeon.
Other drivers may need different thresholds according to their workloads.

Signed-off-by: Lauri Kasanen 
---
 drivers/gpu/drm/radeon/radeon_ttm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c 
b/drivers/gpu/drm/radeon/radeon_ttm.c
index c8a8a51..1aef339 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -710,7 +710,8 @@ int radeon_ttm_init(struct radeon_device *rdev)
   &radeon_bo_driver,
   rdev->ddev->anon_inode->i_mapping,
   DRM_FILE_PAGE_OFFSET,
-  rdev->need_dma32);
+  rdev->need_dma32,
+  512 * 1024);
if (r) {
DRM_ERROR("failed initializing buffer object driver(%d).\n", r);
return r;
-- 
1.8.3.1



[PATCH 3/4] drm: Update the drivers for the two-ended allocation param

2014-03-31 Thread Lauri Kasanen
This updates the other drivers to build, no-op behavior-wise.

Signed-off-by: Lauri Kasanen 
---
 drivers/gpu/drm/ast/ast_ttm.c | 3 ++-
 drivers/gpu/drm/bochs/bochs_mm.c  | 3 ++-
 drivers/gpu/drm/cirrus/cirrus_ttm.c   | 3 ++-
 drivers/gpu/drm/i915/i915_gem.c   | 3 ++-
 drivers/gpu/drm/i915/i915_gem_gtt.c   | 3 ++-
 drivers/gpu/drm/mgag200/mgag200_ttm.c | 3 ++-
 drivers/gpu/drm/nouveau/nouveau_ttm.c | 2 +-
 drivers/gpu/drm/qxl/qxl_ttm.c | 2 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c   | 2 +-
 9 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/ast/ast_ttm.c b/drivers/gpu/drm/ast/ast_ttm.c
index b824622..61f9e39 100644
--- a/drivers/gpu/drm/ast/ast_ttm.c
+++ b/drivers/gpu/drm/ast/ast_ttm.c
@@ -262,7 +262,8 @@ int ast_mm_init(struct ast_private *ast)
 &ast_bo_driver,
 dev->anon_inode->i_mapping,
 DRM_FILE_PAGE_OFFSET,
-true);
+true,
+0);
if (ret) {
DRM_ERROR("Error initialising bo driver; %d\n", ret);
return ret;
diff --git a/drivers/gpu/drm/bochs/bochs_mm.c b/drivers/gpu/drm/bochs/bochs_mm.c
index f488be5..9dfd24a 100644
--- a/drivers/gpu/drm/bochs/bochs_mm.c
+++ b/drivers/gpu/drm/bochs/bochs_mm.c
@@ -228,7 +228,8 @@ int bochs_mm_init(struct bochs_device *bochs)
 &bochs_bo_driver,
 bochs->dev->anon_inode->i_mapping,
 DRM_FILE_PAGE_OFFSET,
-true);
+true,
+0);
if (ret) {
DRM_ERROR("Error initialising bo driver; %d\n", ret);
return ret;
diff --git a/drivers/gpu/drm/cirrus/cirrus_ttm.c 
b/drivers/gpu/drm/cirrus/cirrus_ttm.c
index 92e6b77..74e8e21 100644
--- a/drivers/gpu/drm/cirrus/cirrus_ttm.c
+++ b/drivers/gpu/drm/cirrus/cirrus_ttm.c
@@ -262,7 +262,8 @@ int cirrus_mm_init(struct cirrus_device *cirrus)
 &cirrus_bo_driver,
 dev->anon_inode->i_mapping,
 DRM_FILE_PAGE_OFFSET,
-true);
+true,
+0);
if (ret) {
DRM_ERROR("Error initialising bo driver; %d\n", ret);
return ret;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 9c52f68..489bc33 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3270,7 +3270,8 @@ search_free:
ret = drm_mm_insert_node_in_range_generic(&vm->mm, &vma->node,
  size, alignment,
  obj->cache_level, 0, gtt_max,
- DRM_MM_SEARCH_DEFAULT);
+ DRM_MM_SEARCH_DEFAULT,
+ DRM_MM_CREATE_DEFAULT);
if (ret) {
ret = i915_gem_evict_something(dev, vm, size, alignment,
   obj->cache_level, flags);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 63a6dc7..dd72d35 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1072,7 +1072,8 @@ alloc:
  &ppgtt->node, GEN6_PD_SIZE,
  GEN6_PD_ALIGN, 0,
  0, dev_priv->gtt.base.total,
- DRM_MM_SEARCH_DEFAULT);
+ DRM_MM_SEARCH_DEFAULT,
+ DRM_MM_CREATE_DEFAULT);
if (ret == -ENOSPC && !retried) {
ret = i915_gem_evict_something(dev, &dev_priv->gtt.base,
   GEN6_PD_SIZE, GEN6_PD_ALIGN,
diff --git a/drivers/gpu/drm/mgag200/mgag200_ttm.c 
b/drivers/gpu/drm/mgag200/mgag200_ttm.c
index 5a00e90..6844b24 100644
--- a/drivers/gpu/drm/mgag200/mgag200_ttm.c
+++ b/drivers/gpu/drm/mgag200/mgag200_ttm.c
@@ -262,7 +262,8 @@ int mgag200_mm_init(struct mga_device *mdev)
 &mgag200_bo_driver,
 dev->anon_inode->i_mapping,
 DRM_FILE_PAGE_OFFSET,
-true);
+true,
+0);
if (ret) {
DRM_ERROR("Error initialising bo dri

[PATCH 2/4] drm/ttm: Add optional support for two-ended allocation

2014-03-31 Thread Lauri Kasanen
Allocating small bos from one end and large ones from the other helps
improve the quality of fragmentation.

This depends on "drm: Optionally create mm blocks from top-to-bottom" by
Chris Wilson.

Signed-off-by: Lauri Kasanen 
---
 drivers/gpu/drm/ttm/ttm_bo.c |  4 +++-
 drivers/gpu/drm/ttm/ttm_bo_manager.c | 16 +---
 include/drm/ttm/ttm_bo_driver.h  |  7 ++-
 3 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 9df79ac..caf7cd3 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -1453,7 +1453,8 @@ int ttm_bo_device_init(struct ttm_bo_device *bdev,
   struct ttm_bo_driver *driver,
   struct address_space *mapping,
   uint64_t file_page_offset,
-  bool need_dma32)
+  bool need_dma32,
+  uint32_t alloc_threshold)
 {
int ret = -EINVAL;

@@ -1476,6 +1477,7 @@ int ttm_bo_device_init(struct ttm_bo_device *bdev,
bdev->dev_mapping = mapping;
bdev->glob = glob;
bdev->need_dma32 = need_dma32;
+   bdev->alloc_threshold = alloc_threshold;
bdev->val_seq = 0;
spin_lock_init(&bdev->fence_lock);
mutex_lock(&glob->device_list_mutex);
diff --git a/drivers/gpu/drm/ttm/ttm_bo_manager.c 
b/drivers/gpu/drm/ttm/ttm_bo_manager.c
index c58eba33..db9fcb4 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_manager.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_manager.c
@@ -55,6 +55,7 @@ static int ttm_bo_man_get_node(struct ttm_mem_type_manager 
*man,
struct ttm_range_manager *rman = (struct ttm_range_manager *) man->priv;
struct drm_mm *mm = &rman->mm;
struct drm_mm_node *node = NULL;
+   enum drm_mm_allocator_flags aflags = DRM_MM_CREATE_DEFAULT;
unsigned long lpfn;
int ret;

@@ -65,12 +66,21 @@ static int ttm_bo_man_get_node(struct ttm_mem_type_manager 
*man,
node = kzalloc(sizeof(*node), GFP_KERNEL);
if (!node)
return -ENOMEM;
+   /**
+* If the driver requested a threshold, use two-ended allocation.
+* Pinned buffers require bottom-up allocation.
+*/
+   if (man->bdev->alloc_threshold &&
+   !(bo->mem.placement & TTM_PL_FLAG_NO_EVICT) &&
+   man->bdev->alloc_threshold < (mem->num_pages * PAGE_SIZE))
+   aflags = DRM_MM_CREATE_TOP;

spin_lock(&rman->lock);
-   ret = drm_mm_insert_node_in_range(mm, node, mem->num_pages,
- mem->page_alignment,
+   ret = drm_mm_insert_node_in_range_generic(mm, node, mem->num_pages,
+ mem->page_alignment, 0,
  placement->fpfn, lpfn,
- DRM_MM_SEARCH_BEST);
+ DRM_MM_SEARCH_BEST,
+ aflags);
spin_unlock(&rman->lock);

if (unlikely(ret)) {
diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index 5d8aabe..f5fe6df 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -565,6 +565,7 @@ struct ttm_bo_device {
struct delayed_work wq;

bool need_dma32;
+   uint32_t alloc_threshold;
 };

 /**
@@ -751,6 +752,8 @@ extern int ttm_bo_device_release(struct ttm_bo_device 
*bdev);
  * @file_page_offset: Offset into the device address space that is available
  * for buffer data. This ensures compatibility with other users of the
  * address space.
+ * @alloc_threshold: If non-zero, use this as the threshold for two-ended
+ * allocation.
  *
  * Initializes a struct ttm_bo_device:
  * Returns:
@@ -760,7 +763,9 @@ extern int ttm_bo_device_init(struct ttm_bo_device *bdev,
  struct ttm_bo_global *glob,
  struct ttm_bo_driver *driver,
  struct address_space *mapping,
- uint64_t file_page_offset, bool need_dma32);
+ uint64_t file_page_offset,
+ bool need_dma32,
+ uint32_t alloc_threshold);

 /**
  * ttm_bo_unmap_virtual
-- 
1.8.3.1



[PATCH 1/4] drm: Optionally create mm blocks from top-to-bottom

2014-03-31 Thread Lauri Kasanen
Clients like i915 need to segregate cache domains within the GTT which
can lead to small amounts of fragmentation. By allocating the uncached
buffers from the bottom and the cacheable buffers from the top, we can
reduce the amount of wasted space and also optimize allocation of the
mappable portion of the GTT to only those buffers that require CPU
access through the GTT.

For other drivers, allocating small bos from one end and large ones
from the other helps improve the quality of fragmentation.

Original patch by Chris Wilson.

v2 by Ben:
Update callers in i915_gem_object_bind_to_gtt()
Turn search flags and allocation flags into separate enums
Make checkpatch happy where logical/easy

v3 by Ben:
Rebased on top of the many drm_mm changes since the original patches
Remove ATOMIC from allocator flags (Chris)
Reverse order of TOPDOWN and BOTTOMUP

v4 by Lauri:
Remove i915 parts, they don't apply
Respin on top of drm-next

Signed-off-by: Lauri Kasanen 
---
 drivers/gpu/drm/drm_mm.c | 56 +++-
 include/drm/drm_mm.h | 29 +
 2 files changed, 66 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c
index a2d45b74..1728bcc 100644
--- a/drivers/gpu/drm/drm_mm.c
+++ b/drivers/gpu/drm/drm_mm.c
@@ -102,7 +102,8 @@ static struct drm_mm_node 
*drm_mm_search_free_in_range_generic(const struct drm_
 static void drm_mm_insert_helper(struct drm_mm_node *hole_node,
 struct drm_mm_node *node,
 unsigned long size, unsigned alignment,
-unsigned long color)
+unsigned long color,
+enum drm_mm_allocator_flags flags)
 {
struct drm_mm *mm = hole_node->mm;
unsigned long hole_start = drm_mm_hole_node_start(hole_node);
@@ -115,12 +116,22 @@ static void drm_mm_insert_helper(struct drm_mm_node 
*hole_node,
if (mm->color_adjust)
mm->color_adjust(hole_node, color, &adj_start, &adj_end);

+   if (flags & DRM_MM_CREATE_TOP)
+   adj_start = adj_end - size;
+
if (alignment) {
unsigned tmp = adj_start % alignment;
-   if (tmp)
-   adj_start += alignment - tmp;
+   if (tmp) {
+   if (flags & DRM_MM_CREATE_TOP)
+   adj_start -= tmp;
+   else
+   adj_start += alignment - tmp;
+   }
}

+   BUG_ON(adj_start < hole_start);
+   BUG_ON(adj_end > hole_end);
+
if (adj_start == hole_start) {
hole_node->hole_follows = 0;
list_del(&hole_node->hole_stack);
@@ -215,16 +226,17 @@ EXPORT_SYMBOL(drm_mm_reserve_node);
 int drm_mm_insert_node_generic(struct drm_mm *mm, struct drm_mm_node *node,
   unsigned long size, unsigned alignment,
   unsigned long color,
-  enum drm_mm_search_flags flags)
+  enum drm_mm_search_flags sflags,
+  enum drm_mm_allocator_flags aflags)
 {
struct drm_mm_node *hole_node;

hole_node = drm_mm_search_free_generic(mm, size, alignment,
-  color, flags);
+  color, sflags);
if (!hole_node)
return -ENOSPC;

-   drm_mm_insert_helper(hole_node, node, size, alignment, color);
+   drm_mm_insert_helper(hole_node, node, size, alignment, color, aflags);
return 0;
 }
 EXPORT_SYMBOL(drm_mm_insert_node_generic);
@@ -233,7 +245,8 @@ static void drm_mm_insert_helper_range(struct drm_mm_node 
*hole_node,
   struct drm_mm_node *node,
   unsigned long size, unsigned alignment,
   unsigned long color,
-  unsigned long start, unsigned long end)
+  unsigned long start, unsigned long end,
+  enum drm_mm_allocator_flags flags)
 {
struct drm_mm *mm = hole_node->mm;
unsigned long hole_start = drm_mm_hole_node_start(hole_node);
@@ -248,13 +261,20 @@ static void drm_mm_insert_helper_range(struct drm_mm_node 
*hole_node,
if (adj_end > end)
adj_end = end;

+   if (flags & DRM_MM_CREATE_TOP)
+   adj_start = adj_end - size;
+
if (mm->color_adjust)
mm->color_adjust(hole_node, color, &adj_start, &adj_end);

if (alignment) {
unsigned tmp = adj_start % alignment;
-   if (tmp)
-   adj_start += alignment - tmp;
+ 

[PATCH 0/4] Two-ended allocation support

2014-03-31 Thread Lauri Kasanen
Cheers,

This patchset is respun on top of today's drm-next. It has only been
compile-tested, since my test computer is currently on strike in
support of Ukraine.

It was a simple spin though, so Mr. Murphy will likely stay away. Only
Radeon behavior is affected, all other drivers have just compiling
changes.

- Lauri


[PATCH] drm/radeon: TTM must be init with cpu-visible VRAM, v2

2014-03-01 Thread Lauri Kasanen
On Sat, 1 Mar 2014 06:47:41 +1000
Dave Airlie  wrote:

> On Sat, Mar 1, 2014 at 5:56 AM, Christian K?nig  
> wrote:
> > Am 28.02.2014 19:50, schrieb Lauri Kasanen:
> >
> >> Without this, a bo may get created in the cpu-inaccessible vram.
> >> Before the CP engines get setup, all copies are done via cpu memcpy.
> >>
> >> This means that the cpu tries to read from inaccessible memory, fails,
> >> and the radeon module proceeds to disable acceleration.
> >>
> >> Doing this has no downsides, as the real VRAM size gets set as soon as the
> >> CP engines get init.
> >>
> >> This is a candidate for 3.14 fixes.
> >>
> >> v2: Add comment on why the function is used
>
>> Reviewed-by: Christian K?nig 
>>
>> And I suggest to add "Cc: stable at vger.kernel.org" as well.
>
> Won't this create objects that are stuck in the middle of VRAM with
> the new top down approach?
> 
> then when we go to use all the VRAM they'll be pinned in the middle?

Yes, the initial pins would act like that with the top-down code. But
the top-down logic is 3.15 material and still WIP.

Depending on their constraints, I think I'll either add a new flag, or
turn them into FIXED allocations - do they need to be at exact position
foo or only at the beginning. (Christian?)

So sending this fix to stable is safe, as they all use bottom-up.

- Lauri


[PATCH] drm/radeon: TTM must be init with cpu-visible VRAM, v2

2014-02-28 Thread Lauri Kasanen
Without this, a bo may get created in the cpu-inaccessible vram.
Before the CP engines get setup, all copies are done via cpu memcpy.

This means that the cpu tries to read from inaccessible memory, fails,
and the radeon module proceeds to disable acceleration.

Doing this has no downsides, as the real VRAM size gets set as soon as the
CP engines get init.

This is a candidate for 3.14 fixes.

v2: Add comment on why the function is used

Signed-off-by: Lauri Kasanen 
---
 drivers/gpu/drm/radeon/radeon_ttm.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c 
b/drivers/gpu/drm/radeon/radeon_ttm.c
index 3aa853c..b966f85 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -715,6 +715,9 @@ int radeon_ttm_init(struct radeon_device *rdev)
DRM_ERROR("Failed initializing VRAM heap.\n");
return r;
}
+   /* Change the size here instead of the init above so only lpfn is 
affected */
+   radeon_ttm_set_active_vram_size(rdev, rdev->mc.visible_vram_size);
+
r = radeon_bo_create(rdev, 256 * 1024, PAGE_SIZE, true,
 RADEON_GEM_DOMAIN_VRAM,
 NULL, &rdev->stollen_vga_memory);
-- 
1.8.3.1



[PATCH] drm/radeon: TTM must be init with cpu-visible VRAM

2014-02-28 Thread Lauri Kasanen
On Fri, 28 Feb 2014 16:43:54 +0100
Christian K?nig  wrote:

> >>> Am 27.02.2014 22:38, schrieb Lauri Kasanen:
> >>>> Without this, a bo may get created in the cpu-inaccessible vram.
> >>>> Before the CP engines get setup, all copies are done via cpu memcpy.
> >>>>
> >>>> This means that the cpu tries to read from inaccessible memory, fails,
> >>>> and the radeon module proceeds to disable acceleration.
> >>>>
> >>>> Doing this has no downsides, as the real VRAM size gets set as soon as 
> >>>> the
> >>>> CP engines get init.
> >>>>
> >>>> This is a candidate for 3.14 fixes.
> >>> This should be unnecessary, since TTM gets initialized only seeing the
> >>> visible VRAM and later on radeon_ttm_set_active_vram_size gets called to
> >>> increase the limit.
> >>>
> >>> If this isn't the case any more we should figure out why instead of
> >>> working around it like this.
> >> Negative, TTM gets initialized with real_vram just a few lines above
> >> this patch, not visible_vram.
> > git blame shows 7a50f01a from 2009, "drm/radeon/kms: vram sizing on
> > certain r100 chips needs workaround." by Dave Airlie.
> >
> > So the TTM VRAM init has been wrong for five years, and only worked by
> > accident because until now all allocations were done bottom-up.
> 
> Yeah, just came to the same conclusion. We probably never hit the case 
> in the last five years because we don't really access the memory before 
> we start the copy ring.
> 
> Please fix ttm_bo_init_mm to use rdev->mc.visible_vram_size instead.
> 
> That allocations are made bottom-up is relied upon in a couple of other 
> cases as well, the stolen VGA memory and the UVD firmware handling for 
> example.

I initially did that, but it caused some issues. I didn't investigage
it further, but I guess it has to be init to the maximum size, and then
only the size lowered, so that the change only affects lpfn.

- Lauri


[PATCH] drm/radeon: TTM must be init with cpu-visible VRAM

2014-02-28 Thread Lauri Kasanen
On Fri, 28 Feb 2014 17:30:39 +0200
Lauri Kasanen  wrote:

> On Fri, 28 Feb 2014 10:36:59 +0100
> Christian K?nig  wrote:
> 
> > Am 27.02.2014 22:38, schrieb Lauri Kasanen:
> > > Without this, a bo may get created in the cpu-inaccessible vram.
> > > Before the CP engines get setup, all copies are done via cpu memcpy.
> > >
> > > This means that the cpu tries to read from inaccessible memory, fails,
> > > and the radeon module proceeds to disable acceleration.
> > >
> > > Doing this has no downsides, as the real VRAM size gets set as soon as the
> > > CP engines get init.
> > >
> > > This is a candidate for 3.14 fixes.
> > 
> > This should be unnecessary, since TTM gets initialized only seeing the 
> > visible VRAM and later on radeon_ttm_set_active_vram_size gets called to 
> > increase the limit.
> > 
> > If this isn't the case any more we should figure out why instead of 
> > working around it like this.
> 
> Negative, TTM gets initialized with real_vram just a few lines above
> this patch, not visible_vram.

git blame shows 7a50f01a from 2009, "drm/radeon/kms: vram sizing on
certain r100 chips needs workaround." by Dave Airlie.

So the TTM VRAM init has been wrong for five years, and only worked by
accident because until now all allocations were done bottom-up.

- Lauri


[PATCH] drm/radeon: TTM must be init with cpu-visible VRAM

2014-02-28 Thread Lauri Kasanen
On Fri, 28 Feb 2014 10:36:59 +0100
Christian K?nig  wrote:

> Am 27.02.2014 22:38, schrieb Lauri Kasanen:
> > Without this, a bo may get created in the cpu-inaccessible vram.
> > Before the CP engines get setup, all copies are done via cpu memcpy.
> >
> > This means that the cpu tries to read from inaccessible memory, fails,
> > and the radeon module proceeds to disable acceleration.
> >
> > Doing this has no downsides, as the real VRAM size gets set as soon as the
> > CP engines get init.
> >
> > This is a candidate for 3.14 fixes.
> 
> This should be unnecessary, since TTM gets initialized only seeing the 
> visible VRAM and later on radeon_ttm_set_active_vram_size gets called to 
> increase the limit.
> 
> If this isn't the case any more we should figure out why instead of 
> working around it like this.

Negative, TTM gets initialized with real_vram just a few lines above
this patch, not visible_vram.

- Lauri


[RFC] drm/ttm: Add optional support for two-ended allocation

2014-02-27 Thread Lauri Kasanen
Allocating small bos from one end and large ones from the other helps
improve the quality of fragmentation.

I have measured a suitable threshold to reduce eviction by up to 20%,
and to cause no harm in normal cases that fit VRAM fully (PTS gaming suite).
In some cases, even the VRAM-fitting cases improved slightly (openarena, urban 
terror).

This depends on "drm: Optionally create mm blocks from top-to-bottom" by
Chris Wilson, and so is just RFC for now.
---
 drivers/gpu/drm/ttm/ttm_bo.c |  4 +++-
 drivers/gpu/drm/ttm/ttm_bo_manager.c | 12 +---
 include/drm/ttm/ttm_bo_driver.h  |  5 -
 3 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index a066513..7f30c2d 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -1450,7 +1450,8 @@ int ttm_bo_device_init(struct ttm_bo_device *bdev,
   struct ttm_bo_global *glob,
   struct ttm_bo_driver *driver,
   uint64_t file_page_offset,
-  bool need_dma32)
+  bool need_dma32,
+  uint32_t alloc_threshold)
 {
int ret = -EINVAL;

@@ -1473,6 +1474,7 @@ int ttm_bo_device_init(struct ttm_bo_device *bdev,
bdev->dev_mapping = NULL;
bdev->glob = glob;
bdev->need_dma32 = need_dma32;
+   bdev->alloc_threshold = alloc_threshold;
bdev->val_seq = 0;
spin_lock_init(&bdev->fence_lock);
mutex_lock(&glob->device_list_mutex);
diff --git a/drivers/gpu/drm/ttm/ttm_bo_manager.c 
b/drivers/gpu/drm/ttm/ttm_bo_manager.c
index c58eba33..bd1cbf1 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_manager.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_manager.c
@@ -55,6 +55,7 @@ static int ttm_bo_man_get_node(struct ttm_mem_type_manager 
*man,
struct ttm_range_manager *rman = (struct ttm_range_manager *) man->priv;
struct drm_mm *mm = &rman->mm;
struct drm_mm_node *node = NULL;
+   enum drm_mm_allocator_flags aflags = DRM_MM_CREATE_DEFAULT;
unsigned long lpfn;
int ret;

@@ -66,11 +67,16 @@ static int ttm_bo_man_get_node(struct ttm_mem_type_manager 
*man,
if (!node)
return -ENOMEM;

+   if (man->bdev->alloc_threshold &&
+   man->bdev->alloc_threshold < (mem->num_pages * PAGE_SIZE))
+   aflags = DRM_MM_CREATE_TOP;
+
spin_lock(&rman->lock);
-   ret = drm_mm_insert_node_in_range(mm, node, mem->num_pages,
- mem->page_alignment,
+   ret = drm_mm_insert_node_in_range_generic(mm, node, mem->num_pages,
+ mem->page_alignment, 0,
  placement->fpfn, lpfn,
- DRM_MM_SEARCH_BEST);
+ DRM_MM_SEARCH_BEST,
+ aflags);
spin_unlock(&rman->lock);

if (unlikely(ret)) {
diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index 32d34eb..831320e 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -565,6 +565,7 @@ struct ttm_bo_device {
struct delayed_work wq;

bool need_dma32;
+   uint32_t alloc_threshold;
 };

 /**
@@ -758,7 +759,9 @@ extern int ttm_bo_device_release(struct ttm_bo_device 
*bdev);
 extern int ttm_bo_device_init(struct ttm_bo_device *bdev,
  struct ttm_bo_global *glob,
  struct ttm_bo_driver *driver,
- uint64_t file_page_offset, bool need_dma32);
+ uint64_t file_page_offset,
+ bool need_dma32,
+ uint32_t alloc_threshold);

 /**
  * ttm_bo_unmap_virtual
-- 
1.8.3.1



[PATCH] drm/radeon: TTM must be init with cpu-visible VRAM

2014-02-27 Thread Lauri Kasanen
Without this, a bo may get created in the cpu-inaccessible vram.
Before the CP engines get setup, all copies are done via cpu memcpy.

This means that the cpu tries to read from inaccessible memory, fails,
and the radeon module proceeds to disable acceleration.

Doing this has no downsides, as the real VRAM size gets set as soon as the
CP engines get init.

This is a candidate for 3.14 fixes.

Signed-off-by: Lauri Kasanen 
---
 drivers/gpu/drm/radeon/radeon_ttm.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c 
b/drivers/gpu/drm/radeon/radeon_ttm.c
index 3aa853c..35eae91 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -715,6 +715,8 @@ int radeon_ttm_init(struct radeon_device *rdev)
DRM_ERROR("Failed initializing VRAM heap.\n");
return r;
}
+   radeon_ttm_set_active_vram_size(rdev, rdev->mc.visible_vram_size);
+
r = radeon_bo_create(rdev, 256 * 1024, PAGE_SIZE, true,
 RADEON_GEM_DOMAIN_VRAM,
 NULL, &rdev->stollen_vga_memory);
-- 
1.8.3.1



TTM's role in score-based eviction

2013-12-11 Thread Lauri Kasanen
On Wed, 11 Dec 2013 15:46:53 +0100
Thomas Hellstrom  wrote:

> > I think the kernel just has to trust userspace on this. I can't think
> > of any way of not involving userspace, so if somebody really wants to
> > hack mesa to gain some fps advantage on a multiseat system, let them ;)
> >
> > Basically, they already can hack mesa to pass invalid buffers to cause
> > a hang/crash the kernel. So we already trust userspace more than this
> > new functionality would.
> 
> Yes, but these are two different things. Letting user-space pin buffers 
> by design is building in a software DOS in the kernel.
> I don't think even Microsoft is allowing this, and AFAICT we've avoided 
> that since the very dawn of kernel buffer management.
> 
> Not having a perfect command stream parser or proper GPU hang recovery 
> mechanism is something else, and something we wish
> to have but don't at the moment.
> 
> Allowing a new type of DOS just because we have other flaws isn't moving 
> things forward, but i guess in the end it's your choice.

The worst case with the scoring is that a new client will work
somewhat slower than it otherwise would. I wouldn't call this a DOS.

Instead I would compare it to nice levels. Still, I agree with your
concern that a user could disturb another user. This wouldn't be an
issue within a single user environment, as the user obviously wanted
it if he went that far.

Perhaps we could solve that by taking the process's UID into account
inside the kernel. If there are multiple UIDs with 3d processes
running, reserve a chunk of VRAM for each?

- Lauri


TTM's role in score-based eviction

2013-12-11 Thread Lauri Kasanen
On Wed, 11 Dec 2013 12:04:05 +0900
Michel D?nzer  wrote:

> > Of all the worries that exist, this is a non-issue. Userspace can
> > simply queue a lot of draw calls that take 1 second each through the
> > normal command submission methods, why would it need to tweak some
> > obscure number to cause some eviction?
> 
> That's not what I'm concerned about.
> 
> Consider e.g. a multiseat environment: Some users could patch their
> userspace drivers such that their buffers are more likely to stay in
> VRAM than those of other users.
> 
> I agree it's not a huge issue, I'm just saying we should try to make the
> score calculation as much as possible based on the actual usage of the
> buffers instead of on meta data provided by userspace.

We don't have that in the kernel. Only userspace has the accurate stats
on usage. If we instead modified userspace to pass these stats, it
would have the exact same issue of "what if somebody passes false data"?

Maarten:
> Well, the easiest solution is to make the score only count as penalty,
> and set buffers that don't have the meta-data to maximum score. This
> preserves current behavior for clients that aren't score aware.

No, this would be the exact opposite: it would pin the old-userspace
buffers, at the cost of possibly not letting proper-scored buffers in
VRAM.

Thomas:
> I agree with Michel that some mechanism needs to be in place to stop 
> user-space clients from effectively pinning buffers by giving them a certain 
> > score.

I think the kernel just has to trust userspace on this. I can't think
of any way of not involving userspace, so if somebody really wants to
hack mesa to gain some fps advantage on a multiseat system, let them ;)

Basically, they already can hack mesa to pass invalid buffers to cause
a hang/crash the kernel. So we already trust userspace more than this
new functionality would.

> 2) If score is calculated in user-space, how are shared buffers handled?

Good question, I don't know yet.

- Lauri


TTM's role in score-based eviction

2013-12-10 Thread Lauri Kasanen
On Mon, 9 Dec 2013 23:45:12 +0100
Marek Ol??k  wrote:

> > Note that the hotness calculation will be in userspace, as only there
> > are the necessary counters available. So the finished hotness score
> > will be passed to the kernel, instead of sending all the necessary data
> > there. Ought to be less context switches that way.
> 
> This sounds good, but you will also need to update the DDX for
> everything up to and including Cayman. Hopefully the DDX doesn't emit
> IBs outside of glamor on Southern Islands and later chips.

Do you mean to pass an empty score (0) for 2d buffers, or that 2d
buffers should also get the calculation? I suppose this depends on
which ioctl is used to pass it.

IMHO for 2d use the current behavior is ok, so passing a dummy value
and falling back should be enough.

- Lauri


TTM's role in score-based eviction

2013-12-09 Thread Lauri Kasanen
On Mon, 9 Dec 2013 20:28:21 +0100
Marek Ol??k  wrote:

Hi,

> FYI, since the userspace driver sends end-of-frame markers to the
> kernel, the radeon kernel driver knows the current frame number and it
> can also save the frame number of the last use of each buffer. We
> should definitely use that to measure the buffer hotness, or just
> prevent eviction if the buffer was used recently (the last 2 or 3
> frames) and you can drop the hotness calculations entirely.

I think this would result in sub-optimal behavior with one client, but
a workload larger than VRAM. If everything is needed in one frame, then
this logic would almost randomly decide what gets to stay.

> Also, MSAA buffers and depth buffers should have higher probability of
> being placed in VRAM than other buffers, because their placement has
> higher impact on performance. They also tend to contain auxiliary data
> which significantly improve performance, like fast clear data, MSAA
> fragment coverage data, and hierarchical depth and stencil data. We
> can add a new ioctl which sets buffer usage flags.

Thanks, this info will be useful.

Note that the hotness calculation will be in userspace, as only there
are the necessary counters available. So the finished hotness score
will be passed to the kernel, instead of sending all the necessary data
there. Ought to be less context switches that way.

- Lauri


TTM's role in score-based eviction

2013-12-05 Thread Lauri Kasanen
Hi list, Thomas,

I will be investigating the use of a hotness score for each bo, to
replace the ping-pong causing LRU eviction in radeon*.

The goal is to put all bos that fit in VRAM there, in order of hotness;
a new bo should only be placed there if its hotness score is greater
than the lowest VRAM bo's. Then the lowest-hotness-bos in
VRAM should be evicted until the new bo fits. This should result in a
more stable set with less ping-pong.

Jerome advised that the bo placement should be done entirely outside
TTM. As I'm not (yet) too familiar with that side of the kernel, what is
the opinion of TTM folks?

- Lauri

* github.com/clbr/jamkthesis


radeon: Make PM info available to all, not just debug users, yearly re-roll

2013-07-08 Thread Lauri Kasanen
Hi list,

In June 2012 I started a discussion on moving the radeon PM info out
from debugfs, into a proper place available to all, be it proc, sysfs,
or some other way. For the rationale please see the archives:
http://www.mail-archive.com/dri-devel at lists.freedesktop.org/msg23401.html

Jerome's reply at the time was that since big changes are coming, he
didn't want to commit to any ABI at that time. DPM has now landed.

Jerome wanted to implement a custom ioctl for unprivileged apps to
query the power details (and ideally other GPU info too). As now the
big changes have happened, what's the status on this ioctl?


Pre-emptive -enopatch: I'm not the right person to design an ioctl, with
exactly zero such experience in making kernel ABIs. I do volunteer to
making the userspace app that displays the values and info.

- Lauri


radeon: Make PM info available to all, not just debug users, yearly re-roll

2013-07-08 Thread Lauri Kasanen
Hi list,

In June 2012 I started a discussion on moving the radeon PM info out
from debugfs, into a proper place available to all, be it proc, sysfs,
or some other way. For the rationale please see the archives:
http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg23401.html

Jerome's reply at the time was that since big changes are coming, he
didn't want to commit to any ABI at that time. DPM has now landed.

Jerome wanted to implement a custom ioctl for unprivileged apps to
query the power details (and ideally other GPU info too). As now the
big changes have happened, what's the status on this ioctl?


Pre-emptive -enopatch: I'm not the right person to design an ioctl, with
exactly zero such experience in making kernel ABIs. I do volunteer to
making the userspace app that displays the values and info.

- Lauri
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH] drm/radeon: Remove unused functions

2012-07-31 Thread Lauri Kasanen
This applies on top of drm/radeon: Mark all possible functions / structs as 
static.

Signed-off-by: Lauri Kasanen 
---
 drivers/gpu/drm/radeon/r100.c   |   44 ---
 drivers/gpu/drm/radeon/radeon_combios.c |9 -
 drivers/gpu/drm/radeon/radeon_fb.c  |   16 --
 drivers/gpu/drm/radeon/radeon_legacy_crtc.c |5 ---
 4 files changed, 0 insertions(+), 74 deletions(-)

diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c
index b51fc49..ff9b692 100644
--- a/drivers/gpu/drm/radeon/r100.c
+++ b/drivers/gpu/drm/radeon/r100.c
@@ -2544,50 +2544,6 @@ static void r100_errata(struct radeon_device *rdev)
}
 }

-/* Wait for vertical sync on primary CRTC */
-static void r100_gpu_wait_for_vsync(struct radeon_device *rdev)
-{
-   uint32_t crtc_gen_cntl, tmp;
-   int i;
-
-   crtc_gen_cntl = RREG32(RADEON_CRTC_GEN_CNTL);
-   if ((crtc_gen_cntl & RADEON_CRTC_DISP_REQ_EN_B) ||
-   !(crtc_gen_cntl & RADEON_CRTC_EN)) {
-   return;
-   }
-   /* Clear the CRTC_VBLANK_SAVE bit */
-   WREG32(RADEON_CRTC_STATUS, RADEON_CRTC_VBLANK_SAVE_CLEAR);
-   for (i = 0; i < rdev->usec_timeout; i++) {
-   tmp = RREG32(RADEON_CRTC_STATUS);
-   if (tmp & RADEON_CRTC_VBLANK_SAVE) {
-   return;
-   }
-   DRM_UDELAY(1);
-   }
-}
-
-/* Wait for vertical sync on secondary CRTC */
-static void r100_gpu_wait_for_vsync2(struct radeon_device *rdev)
-{
-   uint32_t crtc2_gen_cntl, tmp;
-   int i;
-
-   crtc2_gen_cntl = RREG32(RADEON_CRTC2_GEN_CNTL);
-   if ((crtc2_gen_cntl & RADEON_CRTC2_DISP_REQ_EN_B) ||
-   !(crtc2_gen_cntl & RADEON_CRTC2_EN))
-   return;
-
-   /* Clear the CRTC_VBLANK_SAVE bit */
-   WREG32(RADEON_CRTC2_STATUS, RADEON_CRTC2_VBLANK_SAVE_CLEAR);
-   for (i = 0; i < rdev->usec_timeout; i++) {
-   tmp = RREG32(RADEON_CRTC2_STATUS);
-   if (tmp & RADEON_CRTC2_VBLANK_SAVE) {
-   return;
-   }
-   DRM_UDELAY(1);
-   }
-}
-
 static int r100_rbbm_fifo_wait_for_entry(struct radeon_device *rdev, unsigned 
n)
 {
unsigned i;
diff --git a/drivers/gpu/drm/radeon/radeon_combios.c 
b/drivers/gpu/drm/radeon/radeon_combios.c
index c2bfac2..fe225cc 100644
--- a/drivers/gpu/drm/radeon/radeon_combios.c
+++ b/drivers/gpu/drm/radeon/radeon_combios.c
@@ -3304,15 +3304,6 @@ static void combios_write_ram_size(struct drm_device 
*dev)
WREG32(RADEON_CONFIG_MEMSIZE, mem_size);
 }

-static void radeon_combios_dyn_clk_setup(struct drm_device *dev, int enable)
-{
-   uint16_t dyn_clk_info =
-   combios_get_table_offset(dev, COMBIOS_DYN_CLK_1_TABLE);
-
-   if (dyn_clk_info)
-   combios_parse_pll_table(dev, dyn_clk_info);
-}
-
 void radeon_combios_asic_init(struct drm_device *dev)
 {
struct radeon_device *rdev = dev->dev_private;
diff --git a/drivers/gpu/drm/radeon/radeon_fb.c 
b/drivers/gpu/drm/radeon/radeon_fb.c
index 8b088ca..6ada72c 100644
--- a/drivers/gpu/drm/radeon/radeon_fb.c
+++ b/drivers/gpu/drm/radeon/radeon_fb.c
@@ -316,22 +316,6 @@ static int radeon_fb_find_or_create_single(struct 
drm_fb_helper *helper,
return new_fb;
 }

-static char *mode_option;
-static int radeon_parse_options(char *options)
-{
-   char *this_opt;
-
-   if (!options || !*options)
-   return 0;
-
-   while ((this_opt = strsep(&options, ",")) != NULL) {
-   if (!*this_opt)
-   continue;
-   mode_option = this_opt;
-   }
-   return 0;
-}
-
 void radeon_fb_output_poll_changed(struct radeon_device *rdev)
 {
drm_fb_helper_hotplug_event(&rdev->mode_info.rfbdev->helper);
diff --git a/drivers/gpu/drm/radeon/radeon_legacy_crtc.c 
b/drivers/gpu/drm/radeon/radeon_legacy_crtc.c
index 31022c2..08a60bd 100644
--- a/drivers/gpu/drm/radeon/radeon_legacy_crtc.c
+++ b/drivers/gpu/drm/radeon/radeon_legacy_crtc.c
@@ -206,11 +206,6 @@ static void radeon_legacy_rmx_mode_set(struct drm_crtc 
*crtc,
WREG32(RADEON_FP_CRTC_V_TOTAL_DISP, fp_crtc_v_total_disp);
 }

-static void radeon_restore_common_regs(struct drm_device *dev)
-{
-   /* don't need this yet */
-}
-
 static void radeon_pll_wait_for_read_update_complete(struct drm_device *dev)
 {
struct radeon_device *rdev = dev->dev_private;
-- 
1.7.2.1



[PATCH] drm/radeon: Mark all possible functions / structs as static

2012-07-31 Thread Lauri Kasanen
On Tue, 31 Jul 2012 14:43:34 +1000
Dave Airlie  wrote:

> On Tue, Jul 31, 2012 at 12:45 AM, Alex Deucher  
> wrote:
> > On Fri, Jul 27, 2012 at 5:34 PM, Lauri Kasanen  wrote:
> >> Let's allow GCC to optimize better.
> >>
> >> This exposed some five unused functions, but this patch doesn't remove 
> >> them.
> >>
> >> Signed-off-by: Lauri Kasanen 
> >
> > Reviewed-by: Alex Deucher 
> 
> You might want to take this one step further and fix all the warnings
> it introduces here,
> 
>   CC [M]  drivers/gpu/drm/radeon/radeon_combios.o
> /home/airlied/devel/kernel/drm-next/drivers/gpu/drm/radeon/radeon_combios.c:3307:13:
> warning: ?radeon_combios_dyn_clk_setup? defined but not used
> [-Wunused-function]
>  CC [M]  drivers/gpu/drm/radeon/radeon_legacy_crtc.o
> /home/airlied/devel/kernel/drm-next/drivers/gpu/drm/radeon/radeon_legacy_crtc.c:209:13:
> warning: ?radeon_restore_common_regs? defined but not used
> [-Wunused-function]
> 
>   CC [M]  drivers/gpu/drm/radeon/r100.o
> /home/airlied/devel/kernel/drm-next/drivers/gpu/drm/radeon/r100.c:2548:13:
> warning: ?r100_gpu_wait_for_vsync? defined but not used
> [-Wunused-function]
> /home/airlied/devel/kernel/drm-next/drivers/gpu/drm/radeon/r100.c:2570:13:
> warning: ?r100_gpu_wait_for_vsync2? defined but not used
> [-Wunused-function]
> 
> Not sure we want to lose all these functions or just if 0 them out for
> reference purposes, or even why we might need to call some of them.

That's why I didn't remove them. I don't have the info on whether they are 
needed or not, someone qualified will need to make that decision.

- Lauri


[PATCH] drm/radeon: Remove unused functions

2012-07-31 Thread Lauri Kasanen
This applies on top of drm/radeon: Mark all possible functions / structs as 
static.

Signed-off-by: Lauri Kasanen 
---
 drivers/gpu/drm/radeon/r100.c   |   44 ---
 drivers/gpu/drm/radeon/radeon_combios.c |9 -
 drivers/gpu/drm/radeon/radeon_fb.c  |   16 --
 drivers/gpu/drm/radeon/radeon_legacy_crtc.c |5 ---
 4 files changed, 0 insertions(+), 74 deletions(-)

diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c
index b51fc49..ff9b692 100644
--- a/drivers/gpu/drm/radeon/r100.c
+++ b/drivers/gpu/drm/radeon/r100.c
@@ -2544,50 +2544,6 @@ static void r100_errata(struct radeon_device *rdev)
}
 }
 
-/* Wait for vertical sync on primary CRTC */
-static void r100_gpu_wait_for_vsync(struct radeon_device *rdev)
-{
-   uint32_t crtc_gen_cntl, tmp;
-   int i;
-
-   crtc_gen_cntl = RREG32(RADEON_CRTC_GEN_CNTL);
-   if ((crtc_gen_cntl & RADEON_CRTC_DISP_REQ_EN_B) ||
-   !(crtc_gen_cntl & RADEON_CRTC_EN)) {
-   return;
-   }
-   /* Clear the CRTC_VBLANK_SAVE bit */
-   WREG32(RADEON_CRTC_STATUS, RADEON_CRTC_VBLANK_SAVE_CLEAR);
-   for (i = 0; i < rdev->usec_timeout; i++) {
-   tmp = RREG32(RADEON_CRTC_STATUS);
-   if (tmp & RADEON_CRTC_VBLANK_SAVE) {
-   return;
-   }
-   DRM_UDELAY(1);
-   }
-}
-
-/* Wait for vertical sync on secondary CRTC */
-static void r100_gpu_wait_for_vsync2(struct radeon_device *rdev)
-{
-   uint32_t crtc2_gen_cntl, tmp;
-   int i;
-
-   crtc2_gen_cntl = RREG32(RADEON_CRTC2_GEN_CNTL);
-   if ((crtc2_gen_cntl & RADEON_CRTC2_DISP_REQ_EN_B) ||
-   !(crtc2_gen_cntl & RADEON_CRTC2_EN))
-   return;
-
-   /* Clear the CRTC_VBLANK_SAVE bit */
-   WREG32(RADEON_CRTC2_STATUS, RADEON_CRTC2_VBLANK_SAVE_CLEAR);
-   for (i = 0; i < rdev->usec_timeout; i++) {
-   tmp = RREG32(RADEON_CRTC2_STATUS);
-   if (tmp & RADEON_CRTC2_VBLANK_SAVE) {
-   return;
-   }
-   DRM_UDELAY(1);
-   }
-}
-
 static int r100_rbbm_fifo_wait_for_entry(struct radeon_device *rdev, unsigned 
n)
 {
unsigned i;
diff --git a/drivers/gpu/drm/radeon/radeon_combios.c 
b/drivers/gpu/drm/radeon/radeon_combios.c
index c2bfac2..fe225cc 100644
--- a/drivers/gpu/drm/radeon/radeon_combios.c
+++ b/drivers/gpu/drm/radeon/radeon_combios.c
@@ -3304,15 +3304,6 @@ static void combios_write_ram_size(struct drm_device 
*dev)
WREG32(RADEON_CONFIG_MEMSIZE, mem_size);
 }
 
-static void radeon_combios_dyn_clk_setup(struct drm_device *dev, int enable)
-{
-   uint16_t dyn_clk_info =
-   combios_get_table_offset(dev, COMBIOS_DYN_CLK_1_TABLE);
-
-   if (dyn_clk_info)
-   combios_parse_pll_table(dev, dyn_clk_info);
-}
-
 void radeon_combios_asic_init(struct drm_device *dev)
 {
struct radeon_device *rdev = dev->dev_private;
diff --git a/drivers/gpu/drm/radeon/radeon_fb.c 
b/drivers/gpu/drm/radeon/radeon_fb.c
index 8b088ca..6ada72c 100644
--- a/drivers/gpu/drm/radeon/radeon_fb.c
+++ b/drivers/gpu/drm/radeon/radeon_fb.c
@@ -316,22 +316,6 @@ static int radeon_fb_find_or_create_single(struct 
drm_fb_helper *helper,
return new_fb;
 }
 
-static char *mode_option;
-static int radeon_parse_options(char *options)
-{
-   char *this_opt;
-
-   if (!options || !*options)
-   return 0;
-
-   while ((this_opt = strsep(&options, ",")) != NULL) {
-   if (!*this_opt)
-   continue;
-   mode_option = this_opt;
-   }
-   return 0;
-}
-
 void radeon_fb_output_poll_changed(struct radeon_device *rdev)
 {
drm_fb_helper_hotplug_event(&rdev->mode_info.rfbdev->helper);
diff --git a/drivers/gpu/drm/radeon/radeon_legacy_crtc.c 
b/drivers/gpu/drm/radeon/radeon_legacy_crtc.c
index 31022c2..08a60bd 100644
--- a/drivers/gpu/drm/radeon/radeon_legacy_crtc.c
+++ b/drivers/gpu/drm/radeon/radeon_legacy_crtc.c
@@ -206,11 +206,6 @@ static void radeon_legacy_rmx_mode_set(struct drm_crtc 
*crtc,
WREG32(RADEON_FP_CRTC_V_TOTAL_DISP, fp_crtc_v_total_disp);
 }
 
-static void radeon_restore_common_regs(struct drm_device *dev)
-{
-   /* don't need this yet */
-}
-
 static void radeon_pll_wait_for_read_update_complete(struct drm_device *dev)
 {
struct radeon_device *rdev = dev->dev_private;
-- 
1.7.2.1

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] drm/radeon: Mark all possible functions / structs as static

2012-07-31 Thread Lauri Kasanen
On Tue, 31 Jul 2012 14:43:34 +1000
Dave Airlie  wrote:

> On Tue, Jul 31, 2012 at 12:45 AM, Alex Deucher  wrote:
> > On Fri, Jul 27, 2012 at 5:34 PM, Lauri Kasanen  wrote:
> >> Let's allow GCC to optimize better.
> >>
> >> This exposed some five unused functions, but this patch doesn't remove 
> >> them.
> >>
> >> Signed-off-by: Lauri Kasanen 
> >
> > Reviewed-by: Alex Deucher 
> 
> You might want to take this one step further and fix all the warnings
> it introduces here,
> 
>   CC [M]  drivers/gpu/drm/radeon/radeon_combios.o
> /home/airlied/devel/kernel/drm-next/drivers/gpu/drm/radeon/radeon_combios.c:3307:13:
> warning: ‘radeon_combios_dyn_clk_setup’ defined but not used
> [-Wunused-function]
>  CC [M]  drivers/gpu/drm/radeon/radeon_legacy_crtc.o
> /home/airlied/devel/kernel/drm-next/drivers/gpu/drm/radeon/radeon_legacy_crtc.c:209:13:
> warning: ‘radeon_restore_common_regs’ defined but not used
> [-Wunused-function]
> 
>   CC [M]  drivers/gpu/drm/radeon/r100.o
> /home/airlied/devel/kernel/drm-next/drivers/gpu/drm/radeon/r100.c:2548:13:
> warning: ‘r100_gpu_wait_for_vsync’ defined but not used
> [-Wunused-function]
> /home/airlied/devel/kernel/drm-next/drivers/gpu/drm/radeon/r100.c:2570:13:
> warning: ‘r100_gpu_wait_for_vsync2’ defined but not used
> [-Wunused-function]
> 
> Not sure we want to lose all these functions or just if 0 them out for
> reference purposes, or even why we might need to call some of them.

That's why I didn't remove them. I don't have the info on whether they are 
needed or not, someone qualified will need to make that decision.

- Lauri
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH] drm/radeon: Mark all possible functions / structs as static

2012-07-28 Thread Lauri Kasanen
Let's allow GCC to optimize better.

This exposed some five unused functions, but this patch doesn't remove them.

Signed-off-by: Lauri Kasanen 
---
 drivers/gpu/drm/radeon/atombios_encoders.c  |4 +-
 drivers/gpu/drm/radeon/evergreen.c  |   12 +-
 drivers/gpu/drm/radeon/evergreen_cs.c   |2 +-
 drivers/gpu/drm/radeon/ni.c |8 +++---
 drivers/gpu/drm/radeon/r100.c   |   14 ++--
 drivers/gpu/drm/radeon/r300.c   |4 +-
 drivers/gpu/drm/radeon/r520.c   |4 +-
 drivers/gpu/drm/radeon/r600.c   |   18 
 drivers/gpu/drm/radeon/r600_cs.c|2 +-
 drivers/gpu/drm/radeon/r600_hdmi.c  |2 +-
 drivers/gpu/drm/radeon/radeon_combios.c |2 +-
 drivers/gpu/drm/radeon/radeon_connectors.c  |   28 +-
 drivers/gpu/drm/radeon/radeon_cs.c  |2 +-
 drivers/gpu/drm/radeon/radeon_device.c  |2 +-
 drivers/gpu/drm/radeon/radeon_fb.c  |2 +-
 drivers/gpu/drm/radeon/radeon_fence.c   |2 +-
 drivers/gpu/drm/radeon/radeon_ioc32.c   |2 +-
 drivers/gpu/drm/radeon/radeon_legacy_crtc.c |4 +-
 drivers/gpu/drm/radeon/radeon_ring.c|8 +++---
 drivers/gpu/drm/radeon/radeon_test.c|2 +-
 drivers/gpu/drm/radeon/radeon_ttm.c |2 +-
 drivers/gpu/drm/radeon/rs400.c  |6 ++--
 drivers/gpu/drm/radeon/rs600.c  |   14 ++--
 drivers/gpu/drm/radeon/rs690.c  |6 ++--
 drivers/gpu/drm/radeon/rv515.c  |   18 
 drivers/gpu/drm/radeon/rv770.c  |   10 
 drivers/gpu/drm/radeon/si.c |6 ++--
 27 files changed, 93 insertions(+), 93 deletions(-)

diff --git a/drivers/gpu/drm/radeon/atombios_encoders.c 
b/drivers/gpu/drm/radeon/atombios_encoders.c
index f9bc27f..260fd01 100644
--- a/drivers/gpu/drm/radeon/atombios_encoders.c
+++ b/drivers/gpu/drm/radeon/atombios_encoders.c
@@ -2289,7 +2289,7 @@ static const struct drm_encoder_funcs 
radeon_atom_enc_funcs = {
.destroy = radeon_enc_destroy,
 };

-struct radeon_encoder_atom_dac *
+static struct radeon_encoder_atom_dac *
 radeon_atombios_set_dac_info(struct radeon_encoder *radeon_encoder)
 {
struct drm_device *dev = radeon_encoder->base.dev;
@@ -2303,7 +2303,7 @@ radeon_atombios_set_dac_info(struct radeon_encoder 
*radeon_encoder)
return dac;
 }

-struct radeon_encoder_atom_dig *
+static struct radeon_encoder_atom_dig *
 radeon_atombios_set_dig_info(struct radeon_encoder *radeon_encoder)
 {
int encoder_enum = (radeon_encoder->encoder_enum & ENUM_ID_MASK) >> 
ENUM_ID_SHIFT;
diff --git a/drivers/gpu/drm/radeon/evergreen.c 
b/drivers/gpu/drm/radeon/evergreen.c
index e585a3b..32b0522 100644
--- a/drivers/gpu/drm/radeon/evergreen.c
+++ b/drivers/gpu/drm/radeon/evergreen.c
@@ -1109,7 +1109,7 @@ void evergreen_pcie_gart_tlb_flush(struct radeon_device 
*rdev)
}
 }

-int evergreen_pcie_gart_enable(struct radeon_device *rdev)
+static int evergreen_pcie_gart_enable(struct radeon_device *rdev)
 {
u32 tmp;
int r;
@@ -1168,7 +1168,7 @@ int evergreen_pcie_gart_enable(struct radeon_device *rdev)
return 0;
 }

-void evergreen_pcie_gart_disable(struct radeon_device *rdev)
+static void evergreen_pcie_gart_disable(struct radeon_device *rdev)
 {
u32 tmp;

@@ -1193,7 +1193,7 @@ void evergreen_pcie_gart_disable(struct radeon_device 
*rdev)
radeon_gart_table_vram_unpin(rdev);
 }

-void evergreen_pcie_gart_fini(struct radeon_device *rdev)
+static void evergreen_pcie_gart_fini(struct radeon_device *rdev)
 {
evergreen_pcie_gart_disable(rdev);
radeon_gart_table_vram_free(rdev);
@@ -1201,7 +1201,7 @@ void evergreen_pcie_gart_fini(struct radeon_device *rdev)
 }


-void evergreen_agp_enable(struct radeon_device *rdev)
+static void evergreen_agp_enable(struct radeon_device *rdev)
 {
u32 tmp;

@@ -1614,7 +1614,7 @@ static int evergreen_cp_start(struct radeon_device *rdev)
return 0;
 }

-int evergreen_cp_resume(struct radeon_device *rdev)
+static int evergreen_cp_resume(struct radeon_device *rdev)
 {
struct radeon_ring *ring = &rdev->ring[RADEON_RING_TYPE_GFX_INDEX];
u32 tmp;
@@ -2775,7 +2775,7 @@ static void evergreen_irq_ack(struct radeon_device *rdev)
}
 }

-void evergreen_irq_disable(struct radeon_device *rdev)
+static void evergreen_irq_disable(struct radeon_device *rdev)
 {
r600_disable_interrupts(rdev);
/* Wait and acknowledge irq */
diff --git a/drivers/gpu/drm/radeon/evergreen_cs.c 
b/drivers/gpu/drm/radeon/evergreen_cs.c
index c1655412..b93a617 100644
--- a/drivers/gpu/drm/radeon/evergreen_cs.c
+++ b/drivers/gpu/drm/radeon/evergreen_cs.c
@@ -986,7 +986,7 @@ static int evergreen_cs_track_check(struct radeon_cs_parser 
*p)
  * Assume that chunk_ib_index is properly set. Will return -EI

[PATCH] drm/radeon: Mark all possible functions / structs as static

2012-07-27 Thread Lauri Kasanen
Let's allow GCC to optimize better.

This exposed some five unused functions, but this patch doesn't remove them.

Signed-off-by: Lauri Kasanen 
---
 drivers/gpu/drm/radeon/atombios_encoders.c  |4 +-
 drivers/gpu/drm/radeon/evergreen.c  |   12 +-
 drivers/gpu/drm/radeon/evergreen_cs.c   |2 +-
 drivers/gpu/drm/radeon/ni.c |8 +++---
 drivers/gpu/drm/radeon/r100.c   |   14 ++--
 drivers/gpu/drm/radeon/r300.c   |4 +-
 drivers/gpu/drm/radeon/r520.c   |4 +-
 drivers/gpu/drm/radeon/r600.c   |   18 
 drivers/gpu/drm/radeon/r600_cs.c|2 +-
 drivers/gpu/drm/radeon/r600_hdmi.c  |2 +-
 drivers/gpu/drm/radeon/radeon_combios.c |2 +-
 drivers/gpu/drm/radeon/radeon_connectors.c  |   28 +-
 drivers/gpu/drm/radeon/radeon_cs.c  |2 +-
 drivers/gpu/drm/radeon/radeon_device.c  |2 +-
 drivers/gpu/drm/radeon/radeon_fb.c  |2 +-
 drivers/gpu/drm/radeon/radeon_fence.c   |2 +-
 drivers/gpu/drm/radeon/radeon_ioc32.c   |2 +-
 drivers/gpu/drm/radeon/radeon_legacy_crtc.c |4 +-
 drivers/gpu/drm/radeon/radeon_ring.c|8 +++---
 drivers/gpu/drm/radeon/radeon_test.c|2 +-
 drivers/gpu/drm/radeon/radeon_ttm.c |2 +-
 drivers/gpu/drm/radeon/rs400.c  |6 ++--
 drivers/gpu/drm/radeon/rs600.c  |   14 ++--
 drivers/gpu/drm/radeon/rs690.c  |6 ++--
 drivers/gpu/drm/radeon/rv515.c  |   18 
 drivers/gpu/drm/radeon/rv770.c  |   10 
 drivers/gpu/drm/radeon/si.c |6 ++--
 27 files changed, 93 insertions(+), 93 deletions(-)

diff --git a/drivers/gpu/drm/radeon/atombios_encoders.c 
b/drivers/gpu/drm/radeon/atombios_encoders.c
index f9bc27f..260fd01 100644
--- a/drivers/gpu/drm/radeon/atombios_encoders.c
+++ b/drivers/gpu/drm/radeon/atombios_encoders.c
@@ -2289,7 +2289,7 @@ static const struct drm_encoder_funcs 
radeon_atom_enc_funcs = {
.destroy = radeon_enc_destroy,
 };
 
-struct radeon_encoder_atom_dac *
+static struct radeon_encoder_atom_dac *
 radeon_atombios_set_dac_info(struct radeon_encoder *radeon_encoder)
 {
struct drm_device *dev = radeon_encoder->base.dev;
@@ -2303,7 +2303,7 @@ radeon_atombios_set_dac_info(struct radeon_encoder 
*radeon_encoder)
return dac;
 }
 
-struct radeon_encoder_atom_dig *
+static struct radeon_encoder_atom_dig *
 radeon_atombios_set_dig_info(struct radeon_encoder *radeon_encoder)
 {
int encoder_enum = (radeon_encoder->encoder_enum & ENUM_ID_MASK) >> 
ENUM_ID_SHIFT;
diff --git a/drivers/gpu/drm/radeon/evergreen.c 
b/drivers/gpu/drm/radeon/evergreen.c
index e585a3b..32b0522 100644
--- a/drivers/gpu/drm/radeon/evergreen.c
+++ b/drivers/gpu/drm/radeon/evergreen.c
@@ -1109,7 +1109,7 @@ void evergreen_pcie_gart_tlb_flush(struct radeon_device 
*rdev)
}
 }
 
-int evergreen_pcie_gart_enable(struct radeon_device *rdev)
+static int evergreen_pcie_gart_enable(struct radeon_device *rdev)
 {
u32 tmp;
int r;
@@ -1168,7 +1168,7 @@ int evergreen_pcie_gart_enable(struct radeon_device *rdev)
return 0;
 }
 
-void evergreen_pcie_gart_disable(struct radeon_device *rdev)
+static void evergreen_pcie_gart_disable(struct radeon_device *rdev)
 {
u32 tmp;
 
@@ -1193,7 +1193,7 @@ void evergreen_pcie_gart_disable(struct radeon_device 
*rdev)
radeon_gart_table_vram_unpin(rdev);
 }
 
-void evergreen_pcie_gart_fini(struct radeon_device *rdev)
+static void evergreen_pcie_gart_fini(struct radeon_device *rdev)
 {
evergreen_pcie_gart_disable(rdev);
radeon_gart_table_vram_free(rdev);
@@ -1201,7 +1201,7 @@ void evergreen_pcie_gart_fini(struct radeon_device *rdev)
 }
 
 
-void evergreen_agp_enable(struct radeon_device *rdev)
+static void evergreen_agp_enable(struct radeon_device *rdev)
 {
u32 tmp;
 
@@ -1614,7 +1614,7 @@ static int evergreen_cp_start(struct radeon_device *rdev)
return 0;
 }
 
-int evergreen_cp_resume(struct radeon_device *rdev)
+static int evergreen_cp_resume(struct radeon_device *rdev)
 {
struct radeon_ring *ring = &rdev->ring[RADEON_RING_TYPE_GFX_INDEX];
u32 tmp;
@@ -2775,7 +2775,7 @@ static void evergreen_irq_ack(struct radeon_device *rdev)
}
 }
 
-void evergreen_irq_disable(struct radeon_device *rdev)
+static void evergreen_irq_disable(struct radeon_device *rdev)
 {
r600_disable_interrupts(rdev);
/* Wait and acknowledge irq */
diff --git a/drivers/gpu/drm/radeon/evergreen_cs.c 
b/drivers/gpu/drm/radeon/evergreen_cs.c
index c1655412..b93a617 100644
--- a/drivers/gpu/drm/radeon/evergreen_cs.c
+++ b/drivers/gpu/drm/radeon/evergreen_cs.c
@@ -986,7 +986,7 @@ static int evergreen_cs_track_check(struct radeon_cs_parser 
*p)
  * Assume that chunk_ib_index is properly set. Will retur

[PATCH 2/2] drm/radeon: Make sure the correct TA bit is used for R700

2012-07-07 Thread Lauri Kasanen

The declarations were moved around because of a GCC warning,
"ISO C90 forbids mixed declarations and code". (why so strict?)

Signed-off-by: Lauri Kasanen 
---
 drivers/gpu/drm/radeon/r600.c |   16 
 1 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c
index bff6272..c77cdba 100644
--- a/drivers/gpu/drm/radeon/r600.c
+++ b/drivers/gpu/drm/radeon/r600.c
@@ -1278,15 +1278,24 @@ void r600_vram_scratch_fini(struct radeon_device *rdev)
 int r600_gpu_soft_reset(struct radeon_device *rdev)
 {
struct rv515_mc_save save;
-   u32 grbm_busy_mask = S_008010_VC_BUSY(1) | S_008010_VGT_BUSY_NO_DMA(1) |
-   S_008010_VGT_BUSY(1) | S_008010_TA03_BUSY(1) |
+   u32 grbm_busy_mask;
+   u32 grbm2_busy_mask;
+   u32 tmp;
+
+   u32 ta_busy = S_008010_TA03_BUSY(1);
+
+   if (rdev->family >= CHIP_RV770)
+   ta_busy = S_008010_R700_TA03_BUSY(1);
+
+   grbm_busy_mask = S_008010_VC_BUSY(1) | S_008010_VGT_BUSY_NO_DMA(1) |
+   S_008010_VGT_BUSY(1) | ta_busy |
S_008010_TC_BUSY(1) | S_008010_SX_BUSY(1) |
S_008010_SH_BUSY(1) | S_008010_SPI03_BUSY(1) |
S_008010_SMX_BUSY(1) | S_008010_SC_BUSY(1) |
S_008010_PA_BUSY(1) | S_008010_DB03_BUSY(1) |
S_008010_CR_BUSY(1) | S_008010_CB03_BUSY(1) |
S_008010_GUI_ACTIVE(1);
-   u32 grbm2_busy_mask = S_008014_SPI0_BUSY(1) | S_008014_SPI1_BUSY(1) |
+   grbm2_busy_mask = S_008014_SPI0_BUSY(1) | S_008014_SPI1_BUSY(1) |
S_008014_SPI2_BUSY(1) | S_008014_SPI3_BUSY(1) |
S_008014_TA0_BUSY(1) | S_008014_TA1_BUSY(1) |
S_008014_TA2_BUSY(1) | S_008014_TA3_BUSY(1) |
@@ -1294,7 +1303,6 @@ int r600_gpu_soft_reset(struct radeon_device *rdev)
S_008014_DB2_BUSY(1) | S_008014_DB3_BUSY(1) |
S_008014_CB0_BUSY(1) | S_008014_CB1_BUSY(1) |
S_008014_CB2_BUSY(1) | S_008014_CB3_BUSY(1);
-   u32 tmp;

if (!(RREG32(GRBM_STATUS) & GUI_ACTIVE))
return 0;
-- 
1.7.2.1



[PATCH 1/2] drm/radeon: Document that R700 has a different TA status bit than R600

2012-07-07 Thread Lauri Kasanen
Hi

This info comes from r600_demo, and was confirmed correct on a RV710.

Signed-off-by: Lauri Kasanen 
---
 drivers/gpu/drm/radeon/r600d.h |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/gpu/drm/radeon/r600d.h b/drivers/gpu/drm/radeon/r600d.h
index 025fd5b..b818a5c 100644
--- a/drivers/gpu/drm/radeon/r600d.h
+++ b/drivers/gpu/drm/radeon/r600d.h
@@ -1192,6 +1192,7 @@
 #defineS_008010_VC_BUSY(x) (((x) & 1) << 
11)
 #defineS_008010_DB03_CLEAN(x)  (((x) & 1) << 
12)
 #defineS_008010_CB03_CLEAN(x)  (((x) & 1) << 
13)
+#defineS_008010_R700_TA03_BUSY(x)  (((x) & 1) << 
14)
 #defineS_008010_VGT_BUSY_NO_DMA(x) (((x) & 1) << 
16)
 #defineS_008010_VGT_BUSY(x)(((x) & 1) << 
17)
 #defineS_008010_TA03_BUSY(x)   (((x) & 1) << 
18)
-- 
1.7.2.1



[PATCH 2/2] drm/radeon: Make sure the correct TA bit is used for R700

2012-07-07 Thread Lauri Kasanen

The declarations were moved around because of a GCC warning,
"ISO C90 forbids mixed declarations and code". (why so strict?)

Signed-off-by: Lauri Kasanen 
---
 drivers/gpu/drm/radeon/r600.c |   16 
 1 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c
index bff6272..c77cdba 100644
--- a/drivers/gpu/drm/radeon/r600.c
+++ b/drivers/gpu/drm/radeon/r600.c
@@ -1278,15 +1278,24 @@ void r600_vram_scratch_fini(struct radeon_device *rdev)
 int r600_gpu_soft_reset(struct radeon_device *rdev)
 {
struct rv515_mc_save save;
-   u32 grbm_busy_mask = S_008010_VC_BUSY(1) | S_008010_VGT_BUSY_NO_DMA(1) |
-   S_008010_VGT_BUSY(1) | S_008010_TA03_BUSY(1) |
+   u32 grbm_busy_mask;
+   u32 grbm2_busy_mask;
+   u32 tmp;
+
+   u32 ta_busy = S_008010_TA03_BUSY(1);
+
+   if (rdev->family >= CHIP_RV770)
+   ta_busy = S_008010_R700_TA03_BUSY(1);
+
+   grbm_busy_mask = S_008010_VC_BUSY(1) | S_008010_VGT_BUSY_NO_DMA(1) |
+   S_008010_VGT_BUSY(1) | ta_busy |
S_008010_TC_BUSY(1) | S_008010_SX_BUSY(1) |
S_008010_SH_BUSY(1) | S_008010_SPI03_BUSY(1) |
S_008010_SMX_BUSY(1) | S_008010_SC_BUSY(1) |
S_008010_PA_BUSY(1) | S_008010_DB03_BUSY(1) |
S_008010_CR_BUSY(1) | S_008010_CB03_BUSY(1) |
S_008010_GUI_ACTIVE(1);
-   u32 grbm2_busy_mask = S_008014_SPI0_BUSY(1) | S_008014_SPI1_BUSY(1) |
+   grbm2_busy_mask = S_008014_SPI0_BUSY(1) | S_008014_SPI1_BUSY(1) |
S_008014_SPI2_BUSY(1) | S_008014_SPI3_BUSY(1) |
S_008014_TA0_BUSY(1) | S_008014_TA1_BUSY(1) |
S_008014_TA2_BUSY(1) | S_008014_TA3_BUSY(1) |
@@ -1294,7 +1303,6 @@ int r600_gpu_soft_reset(struct radeon_device *rdev)
S_008014_DB2_BUSY(1) | S_008014_DB3_BUSY(1) |
S_008014_CB0_BUSY(1) | S_008014_CB1_BUSY(1) |
S_008014_CB2_BUSY(1) | S_008014_CB3_BUSY(1);
-   u32 tmp;
 
if (!(RREG32(GRBM_STATUS) & GUI_ACTIVE))
return 0;
-- 
1.7.2.1

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH 1/2] drm/radeon: Document that R700 has a different TA status bit than R600

2012-07-07 Thread Lauri Kasanen
Hi

This info comes from r600_demo, and was confirmed correct on a RV710.

Signed-off-by: Lauri Kasanen 
---
 drivers/gpu/drm/radeon/r600d.h |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/gpu/drm/radeon/r600d.h b/drivers/gpu/drm/radeon/r600d.h
index 025fd5b..b818a5c 100644
--- a/drivers/gpu/drm/radeon/r600d.h
+++ b/drivers/gpu/drm/radeon/r600d.h
@@ -1192,6 +1192,7 @@
 #defineS_008010_VC_BUSY(x) (((x) & 1) << 
11)
 #defineS_008010_DB03_CLEAN(x)  (((x) & 1) << 
12)
 #defineS_008010_CB03_CLEAN(x)  (((x) & 1) << 
13)
+#defineS_008010_R700_TA03_BUSY(x)  (((x) & 1) << 
14)
 #defineS_008010_VGT_BUSY_NO_DMA(x) (((x) & 1) << 
16)
 #defineS_008010_VGT_BUSY(x)(((x) & 1) << 
17)
 #defineS_008010_TA03_BUSY(x)   (((x) & 1) << 
18)
-- 
1.7.2.1

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH libdrm] intel: Fix build failure in test_decode.c

2012-07-03 Thread Lauri Kasanen
On Mon, 2 Jul 2012 14:54:58 -0700
Ben Widawsky  wrote:

> > +#define _GNU_SOURCE
> > +
> >  #include 
> >  #include 
> >  #include 
> 
> I can't reproduce this. Can anyone else confirm this is broken, and if
> so that the above patch fixes it?

See the manpage, it depends on the glibc version. Libdrm is broken on glibc < 
2.10.

- Lauri


Re: [PATCH libdrm] intel: Fix build failure in test_decode.c

2012-07-03 Thread Lauri Kasanen
On Mon, 2 Jul 2012 14:54:58 -0700
Ben Widawsky  wrote:

> > +#define _GNU_SOURCE
> > +
> >  #include 
> >  #include 
> >  #include 
> 
> I can't reproduce this. Can anyone else confirm this is broken, and if
> so that the above patch fixes it?

See the manpage, it depends on the glibc version. Libdrm is broken on glibc < 
2.10.

- Lauri
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH libdrm] intel: Fix build failure in test_decode.c

2012-06-30 Thread Lauri Kasanen
Hi list

The recently released libdrm 2.4.37 does not compile the Intel part:

test_decode.c: In function 'compare_batch':
test_decode.c:107: error: implicit declaration of function 'open_memstream'

PS: Please CC me.

Signed-off-by: Lauri Kasanen 
---
 intel/test_decode.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/intel/test_decode.c b/intel/test_decode.c
index c9ab7ad..0fcdf3b 100644
--- a/intel/test_decode.c
+++ b/intel/test_decode.c
@@ -21,6 +21,8 @@
  * IN THE SOFTWARE.
  */

+#define _GNU_SOURCE
+
 #include 
 #include 
 #include 
-- 
1.7.2.1



[PATCH libdrm] intel: Fix build failure in test_decode.c

2012-06-30 Thread Lauri Kasanen
Hi list

The recently released libdrm 2.4.37 does not compile the Intel part:

test_decode.c: In function 'compare_batch':
test_decode.c:107: error: implicit declaration of function 'open_memstream'

PS: Please CC me.

Signed-off-by: Lauri Kasanen 
---
 intel/test_decode.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/intel/test_decode.c b/intel/test_decode.c
index c9ab7ad..0fcdf3b 100644
--- a/intel/test_decode.c
+++ b/intel/test_decode.c
@@ -21,6 +21,8 @@
  * IN THE SOFTWARE.
  */
 
+#define _GNU_SOURCE
+
 #include 
 #include 
 #include 
-- 
1.7.2.1

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH] radeon: Make PM info available to all, not just debug users

2012-06-05 Thread Lauri Kasanen
On Mon, 4 Jun 2012 13:05:46 -0400
Jerome Glisse  wrote:

> My dream here is to talk with the gnome folks to have them make some
> kind GPU module we could write and that would show in control center.
> I just need to corner some of my gnome coworker to work something out.
> So if you were using nouveau you would a nouveau specific control
> center entry where there could be nouveau specific information (clock
> voltage) and control. Same for radeon, intel ...

As I don't use Gnome or any other heavy DE, I'll end up writing a cli tool for 
that (radeon, at the start if they differ) anyway. So don't worry about the 
userspace part, just expose everything kernel-side :)

- Lauri


Re: [PATCH] radeon: Make PM info available to all, not just debug users

2012-06-05 Thread Lauri Kasanen
On Mon, 4 Jun 2012 13:05:46 -0400
Jerome Glisse  wrote:

> My dream here is to talk with the gnome folks to have them make some
> kind GPU module we could write and that would show in control center.
> I just need to corner some of my gnome coworker to work something out.
> So if you were using nouveau you would a nouveau specific control
> center entry where there could be nouveau specific information (clock
> voltage) and control. Same for radeon, intel ...

As I don't use Gnome or any other heavy DE, I'll end up writing a cli tool for 
that (radeon, at the start if they differ) anyway. So don't worry about the 
userspace part, just expose everything kernel-side :)

- Lauri
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH] radeon: Make PM info available to all, not just debug users

2012-06-04 Thread Lauri Kasanen
On Mon, 04 Jun 2012 18:18:10 +0200
Christian K?nig  wrote:
> > My experience is that things that are true today for GPU, are not
> > tomorrow. Yes there will still be clock/voltage, but there could be
> > new complete different things, like shutting down block.
> >
> > I am not even mentioning things like complex value range dependency
> > btw things (for instance if a domain as clock/voltage in certain range
> > than some other domain can only have clock in a restricted range
> > value).
> >
> > While i agree that sysfs looks easy for user to play with, i believe
> > that gui is what you really after and afaik closed source driver all
> > expose a gui for their power management. Using ioctl allow better
> > overall control, like atomic setting of several domain ...
> 
> Yeah, I think Jerome is right here.
> 
> The internals like different voltage areas, dependencies of clocks, 
> possible powered down chip areas, etc are very complex and actually 
> completely unteresting to the end user. Also the general direction of 
> AMD hardware is going to a complete self containing system, e.g. the 
> chip is handling mostly everything on its own.
> 
> I agree that this is better done in a hardware dependent ioctl and then 
> abstracted in userspace instead of pushing the whole abstraction into 
> the kernel.

Hi

As long as the info is made available better than it is currently, I'm all for 
it.

Though with the direction of ioctl and folks, it is deep outside my expertise. 
With a deep solution like that someone who knows the area well would need to do 
it.

- Lauri


[PATCH] radeon: Make PM info available to all, not just debug users

2012-06-04 Thread Lauri Kasanen
On Sun, 03 Jun 2012 12:54:30 +0200
Christian K?nig  wrote:

> > This moves the pm_info file from debugfs to next to the other two power 
> > files.
> >
> > Requested by several users at Phoronix.
> >
> > PS: Please CC me. Also please be gentle, it's my first step in kernel-land 
> > ;)

> Hui? What should this be good for?
> 
> Sysfs files are for setting driver parameters, like the power management 
> method or profile currently in use. One major advantage of sysfs is the 
> strict rules for a permanent and machine usable interface, for example 
> it is mandatory to only specify one parameter per sysfs file.
> 
> Debugfs on the other hand should be used for human readable 
> informations, e.g. the printing the current clocks in a human readable 
> form. Also you don't need a debug build or turn on any other debugging 
> facility to get those information, just take a look under 
> "sys/kernel/debug/dri/*".

I have no such dir, /sys/kernel/debug. The fact you have it means you have 
CONFIG_DEBUGFS enabled and mounted.

> So the code is actually quite valid as it is.

First, the current location is illogical, and several users have complained 
about it. This info should be right next to where it is tweaked, ie right next 
to power_profile and power_method. That is where it's expected to be by users.

Secondly, checking the clocks is absolutely not a debug operation. Therefore 
requiring a debug option (CONFIG_DEBUGFS) to see this info is plain wrong. This 
info needs to be available to all users, including those on production kernels 
without such debug options.

--

So the issue is the location of the info, not the format. I'd be more than 
happy to split it into six files (default_core_clock, current_core_clock...) 
that each offer just a kHz number, just like the cpufreq scaling_cur_freq do. 
Would that be preferable?

- Lauri


Re: [PATCH] radeon: Make PM info available to all, not just debug users

2012-06-04 Thread Lauri Kasanen
On Mon, 04 Jun 2012 18:18:10 +0200
Christian König  wrote:
> > My experience is that things that are true today for GPU, are not
> > tomorrow. Yes there will still be clock/voltage, but there could be
> > new complete different things, like shutting down block.
> >
> > I am not even mentioning things like complex value range dependency
> > btw things (for instance if a domain as clock/voltage in certain range
> > than some other domain can only have clock in a restricted range
> > value).
> >
> > While i agree that sysfs looks easy for user to play with, i believe
> > that gui is what you really after and afaik closed source driver all
> > expose a gui for their power management. Using ioctl allow better
> > overall control, like atomic setting of several domain ...
> 
> Yeah, I think Jerome is right here.
> 
> The internals like different voltage areas, dependencies of clocks, 
> possible powered down chip areas, etc are very complex and actually 
> completely unteresting to the end user. Also the general direction of 
> AMD hardware is going to a complete self containing system, e.g. the 
> chip is handling mostly everything on its own.
> 
> I agree that this is better done in a hardware dependent ioctl and then 
> abstracted in userspace instead of pushing the whole abstraction into 
> the kernel.

Hi

As long as the info is made available better than it is currently, I'm all for 
it.

Though with the direction of ioctl and folks, it is deep outside my expertise. 
With a deep solution like that someone who knows the area well would need to do 
it.

- Lauri
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] radeon: Make PM info available to all, not just debug users

2012-06-04 Thread Lauri Kasanen
On Sun, 03 Jun 2012 12:54:30 +0200
Christian König  wrote:

> > This moves the pm_info file from debugfs to next to the other two power 
> > files.
> >
> > Requested by several users at Phoronix.
> >
> > PS: Please CC me. Also please be gentle, it's my first step in kernel-land 
> > ;)

> Hui? What should this be good for?
> 
> Sysfs files are for setting driver parameters, like the power management 
> method or profile currently in use. One major advantage of sysfs is the 
> strict rules for a permanent and machine usable interface, for example 
> it is mandatory to only specify one parameter per sysfs file.
> 
> Debugfs on the other hand should be used for human readable 
> informations, e.g. the printing the current clocks in a human readable 
> form. Also you don't need a debug build or turn on any other debugging 
> facility to get those information, just take a look under 
> "sys/kernel/debug/dri/*".

I have no such dir, /sys/kernel/debug. The fact you have it means you have 
CONFIG_DEBUGFS enabled and mounted.

> So the code is actually quite valid as it is.

First, the current location is illogical, and several users have complained 
about it. This info should be right next to where it is tweaked, ie right next 
to power_profile and power_method. That is where it's expected to be by users.

Secondly, checking the clocks is absolutely not a debug operation. Therefore 
requiring a debug option (CONFIG_DEBUGFS) to see this info is plain wrong. This 
info needs to be available to all users, including those on production kernels 
without such debug options.

--

So the issue is the location of the info, not the format. I'd be more than 
happy to split it into six files (default_core_clock, current_core_clock...) 
that each offer just a kHz number, just like the cpufreq scaling_cur_freq do. 
Would that be preferable?

- Lauri
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH] radeon: Make PM info available to all, not just debug users

2012-06-02 Thread Lauri Kasanen
Hi all

This moves the pm_info file from debugfs to next to the other two power files.

Requested by several users at Phoronix.

PS: Please CC me. Also please be gentle, it's my first step in kernel-land ;)

Signed-off-by: Lauri Kasanen 
---
 drivers/gpu/drm/radeon/radeon_pm.c |   86 ++-
 1 files changed, 44 insertions(+), 42 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_pm.c 
b/drivers/gpu/drm/radeon/radeon_pm.c
index 0882554..7aab18f 100644
--- a/drivers/gpu/drm/radeon/radeon_pm.c
+++ b/drivers/gpu/drm/radeon/radeon_pm.c
@@ -45,7 +45,6 @@ static const char *radeon_pm_state_type_name[5] = {
 };

 static void radeon_dynpm_idle_work_handler(struct work_struct *work);
-static int radeon_debugfs_pm_init(struct radeon_device *rdev);
 static bool radeon_pm_in_vbl(struct radeon_device *rdev);
 static bool radeon_pm_debug_check_in_vbl(struct radeon_device *rdev, bool 
finish);
 static void radeon_pm_update_profile(struct radeon_device *rdev);
@@ -437,8 +436,49 @@ fail:
return count;
 }

+static ssize_t radeon_get_pm_info(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+   struct drm_device *ddev = pci_get_drvdata(to_pci_dev(dev));
+   struct radeon_device *rdev = ddev->dev_private;
+
+   ssize_t curpos, len = PAGE_SIZE;
+   char *tmp;
+
+   curpos = snprintf(buf, len,
+ "default engine clock: %u0 kHz\n"
+ "current engine clock: %u0 kHz\n"
+ "default memory clock: %u0 kHz\n",
+ rdev->pm.default_sclk,
+ radeon_get_engine_clock(rdev),
+ rdev->pm.default_mclk);
+   len -= curpos;
+
+   if (rdev->asic->get_memory_clock) {
+   tmp = buf + curpos;
+   curpos += snprintf(tmp, len, "current memory clock: %u0 kHz\n", 
radeon_get_memory_clock(rdev));
+   len = PAGE_SIZE - curpos;
+   }
+
+   if (rdev->pm.current_vddc) {
+   tmp = buf + curpos;
+   curpos += snprintf(tmp, len, "voltage: %u mV\n", 
rdev->pm.current_vddc);
+   len = PAGE_SIZE - curpos;
+   }
+
+   if (rdev->asic->get_pcie_lanes) {
+   tmp = buf + curpos;
+   curpos += snprintf(tmp, len, "PCIE lanes: %d\n", 
radeon_get_pcie_lanes(rdev));
+   len = PAGE_SIZE - curpos;
+   }
+
+   return curpos;
+}
+
 static DEVICE_ATTR(power_profile, S_IRUGO | S_IWUSR, radeon_get_pm_profile, 
radeon_set_pm_profile);
 static DEVICE_ATTR(power_method, S_IRUGO | S_IWUSR, radeon_get_pm_method, 
radeon_set_pm_method);
+static DEVICE_ATTR(power_info, S_IRUGO, radeon_get_pm_info, NULL);

 static ssize_t radeon_hwmon_show_temp(struct device *dev,
  struct device_attribute *attr,
@@ -639,14 +679,14 @@ int radeon_pm_init(struct radeon_device *rdev)
ret = device_create_file(rdev->dev, &dev_attr_power_method);
if (ret)
DRM_ERROR("failed to create device file for power 
method\n");
+   ret = device_create_file(rdev->dev, &dev_attr_power_info);
+   if (ret)
+   DRM_ERROR("failed to create device file for power 
info\n");

 #ifdef CONFIG_ACPI
rdev->acpi_nb.notifier_call = radeon_acpi_event;
register_acpi_notifier(&rdev->acpi_nb);
 #endif
-   if (radeon_debugfs_pm_init(rdev)) {
-   DRM_ERROR("Failed to register debugfs file for PM!\n");
-   }

DRM_INFO("radeon: power management initialized\n");
}
@@ -843,41 +883,3 @@ static void radeon_dynpm_idle_work_handler(struct 
work_struct *work)
mutex_unlock(&rdev->pm.mutex);
ttm_bo_unlock_delayed_workqueue(&rdev->mman.bdev, resched);
 }
-
-/*
- * Debugfs info
- */
-#if defined(CONFIG_DEBUG_FS)
-
-static int radeon_debugfs_pm_info(struct seq_file *m, void *data)
-{
-   struct drm_info_node *node = (struct drm_info_node *) m->private;
-   struct drm_device *dev = node->minor->dev;
-   struct radeon_device *rdev = dev->dev_private;
-
-   seq_printf(m, "default engine clock: %u0 kHz\n", rdev->pm.default_sclk);
-   seq_printf(m, "current engine clock: %u0 kHz\n", 
radeon_get_engine_clock(rdev));
-   seq_printf(m, "default memory clock: %u0 kHz\n", rdev->pm.default_mclk);
-   if (rdev->asic->pm.get_memory_clock)
-   seq_printf(m, "current memory clock: %u0 kHz\n", 
radeon_get_memory_clock(rdev));
-   if (rdev->pm.current_vddc)
-   seq_printf(m, "voltage: %u mV\n", rdev

[PATCH] radeon: Make PM info available to all, not just debug users

2012-06-02 Thread Lauri Kasanen
Hi all

This moves the pm_info file from debugfs to next to the other two power files.

Requested by several users at Phoronix.

PS: Please CC me. Also please be gentle, it's my first step in kernel-land ;)

Signed-off-by: Lauri Kasanen 
---
 drivers/gpu/drm/radeon/radeon_pm.c |   86 ++-
 1 files changed, 44 insertions(+), 42 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_pm.c 
b/drivers/gpu/drm/radeon/radeon_pm.c
index 0882554..7aab18f 100644
--- a/drivers/gpu/drm/radeon/radeon_pm.c
+++ b/drivers/gpu/drm/radeon/radeon_pm.c
@@ -45,7 +45,6 @@ static const char *radeon_pm_state_type_name[5] = {
 };
 
 static void radeon_dynpm_idle_work_handler(struct work_struct *work);
-static int radeon_debugfs_pm_init(struct radeon_device *rdev);
 static bool radeon_pm_in_vbl(struct radeon_device *rdev);
 static bool radeon_pm_debug_check_in_vbl(struct radeon_device *rdev, bool 
finish);
 static void radeon_pm_update_profile(struct radeon_device *rdev);
@@ -437,8 +436,49 @@ fail:
return count;
 }
 
+static ssize_t radeon_get_pm_info(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+   struct drm_device *ddev = pci_get_drvdata(to_pci_dev(dev));
+   struct radeon_device *rdev = ddev->dev_private;
+
+   ssize_t curpos, len = PAGE_SIZE;
+   char *tmp;
+
+   curpos = snprintf(buf, len,
+ "default engine clock: %u0 kHz\n"
+ "current engine clock: %u0 kHz\n"
+ "default memory clock: %u0 kHz\n",
+ rdev->pm.default_sclk,
+ radeon_get_engine_clock(rdev),
+ rdev->pm.default_mclk);
+   len -= curpos;
+
+   if (rdev->asic->get_memory_clock) {
+   tmp = buf + curpos;
+   curpos += snprintf(tmp, len, "current memory clock: %u0 kHz\n", 
radeon_get_memory_clock(rdev));
+   len = PAGE_SIZE - curpos;
+   }
+
+   if (rdev->pm.current_vddc) {
+   tmp = buf + curpos;
+   curpos += snprintf(tmp, len, "voltage: %u mV\n", 
rdev->pm.current_vddc);
+   len = PAGE_SIZE - curpos;
+   }
+
+   if (rdev->asic->get_pcie_lanes) {
+   tmp = buf + curpos;
+   curpos += snprintf(tmp, len, "PCIE lanes: %d\n", 
radeon_get_pcie_lanes(rdev));
+   len = PAGE_SIZE - curpos;
+   }
+
+   return curpos;
+}
+
 static DEVICE_ATTR(power_profile, S_IRUGO | S_IWUSR, radeon_get_pm_profile, 
radeon_set_pm_profile);
 static DEVICE_ATTR(power_method, S_IRUGO | S_IWUSR, radeon_get_pm_method, 
radeon_set_pm_method);
+static DEVICE_ATTR(power_info, S_IRUGO, radeon_get_pm_info, NULL);
 
 static ssize_t radeon_hwmon_show_temp(struct device *dev,
  struct device_attribute *attr,
@@ -639,14 +679,14 @@ int radeon_pm_init(struct radeon_device *rdev)
ret = device_create_file(rdev->dev, &dev_attr_power_method);
if (ret)
DRM_ERROR("failed to create device file for power 
method\n");
+   ret = device_create_file(rdev->dev, &dev_attr_power_info);
+   if (ret)
+   DRM_ERROR("failed to create device file for power 
info\n");
 
 #ifdef CONFIG_ACPI
rdev->acpi_nb.notifier_call = radeon_acpi_event;
register_acpi_notifier(&rdev->acpi_nb);
 #endif
-   if (radeon_debugfs_pm_init(rdev)) {
-   DRM_ERROR("Failed to register debugfs file for PM!\n");
-   }
 
DRM_INFO("radeon: power management initialized\n");
}
@@ -843,41 +883,3 @@ static void radeon_dynpm_idle_work_handler(struct 
work_struct *work)
mutex_unlock(&rdev->pm.mutex);
ttm_bo_unlock_delayed_workqueue(&rdev->mman.bdev, resched);
 }
-
-/*
- * Debugfs info
- */
-#if defined(CONFIG_DEBUG_FS)
-
-static int radeon_debugfs_pm_info(struct seq_file *m, void *data)
-{
-   struct drm_info_node *node = (struct drm_info_node *) m->private;
-   struct drm_device *dev = node->minor->dev;
-   struct radeon_device *rdev = dev->dev_private;
-
-   seq_printf(m, "default engine clock: %u0 kHz\n", rdev->pm.default_sclk);
-   seq_printf(m, "current engine clock: %u0 kHz\n", 
radeon_get_engine_clock(rdev));
-   seq_printf(m, "default memory clock: %u0 kHz\n", rdev->pm.default_mclk);
-   if (rdev->asic->pm.get_memory_clock)
-   seq_printf(m, "current memory clock: %u0 kHz\n", 
radeon_get_memory_clock(rdev));
-   if (rdev->pm.current_vddc)
-   seq_printf(m, "voltage: %u mV\n", rdev