[PATCH] drm/radeon: fix TOPDOWN handling for bo_create
On Thu, 12 Mar 2015 18:02:56 +0900 Michel Dänzer wrote: > struct ttm_place::lpfn is honoured even with TTM_PL_FLAG_TOPDOWN, so > latter should work with RADEON_GEM_CPU_ACCESS. It sounds like the > problem is really that some BOs are expected to be within a certain > range from the beginning of VRAM, but lpfn isn't set accordingly. It > would be better to fix that by setting lpfn directly than indirectly via > RADEON_GEM_CPU_ACCESS. > > > Anyway, since this isn't the first bug which prevents > TTM_PL_FLAG_TOPDOWN from working as intended in the radeon driver, I > wonder if its performance impact should be re-evaluated. Lauri? I'm sorry, I'm not in a place where I could spend the time to redo the benchmarks. If it causes too many issues it is of course easy to disable, but so far the issues shown have not been caused by it - it merely exposed wrong settings/bugs elsewhere. From this POV I would say it's good to have it enabled, to stress the various parts. This doesn't warm the heart of the guy with flicker after suspend, so perhaps a kernel module parameter to disable it (defaulting to enabled)? - Lauri
[PATCH 1/5] drm/radeon: Set TTM_PL_FLAG_TOPDOWN also for RADEON_GEM_CPU_ACCESS BOs
On Tue, 28 Oct 2014 18:35:02 +0900 Michel Dänzer wrote: > From: Michel Dänzer > > I wasn't sure if TTM_PL_FLAG_TOPDOWN works correctly with non-0 lpfn, but > AFAICT it does. > > Signed-off-by: Michel Dänzer This series is Reviewed-by: Lauri Kasanen - Lauri
[PATCH] drm/radeon: Inline r100_mm_rreg, -wreg, v3
On Sun, 20 Apr 2014 19:41:11 +0200 Christian K?nig wrote: > Am 20.04.2014 19:29, schrieb Lauri Kasanen: > > This was originally un-inlined by Andi Kleen in 2011 citing size concerns. > > Indeed, a first attempt at inlining it grew radeon.ko by 7%. > > > > However, 2% of cpu is spent in this function. Simply inlining it gave 1% > > more fps > > in Urban Terror. > > > > v2: We know the minimum MMIO size. Adding it to the if allows the compiler > > to > > optimize the branch out, improving both performance and size. > > > > The v2 patch decreases radeon.ko size by 2%. I didn't re-benchmark, but > > common sense > > says perf is now more than 1% better. > > > > v3: Also change _wreg, make the threshold a define. > > > > Inlining _wreg increased the size a bit compared to v2, so now radeon.ko > > is only 1% smaller. > > > > Signed-off-by: Lauri Kasanen > > Reviewed-by: Christian K?nig Ping. Although reviewed in April, seems this wasn't applied to any tree? - Lauri
[PATCH] drm/radeon: Inline r100_mm_rreg, -wreg, v3
This was originally un-inlined by Andi Kleen in 2011 citing size concerns. Indeed, a first attempt at inlining it grew radeon.ko by 7%. However, 2% of cpu is spent in this function. Simply inlining it gave 1% more fps in Urban Terror. v2: We know the minimum MMIO size. Adding it to the if allows the compiler to optimize the branch out, improving both performance and size. The v2 patch decreases radeon.ko size by 2%. I didn't re-benchmark, but common sense says perf is now more than 1% better. v3: Also change _wreg, make the threshold a define. Inlining _wreg increased the size a bit compared to v2, so now radeon.ko is only 1% smaller. Signed-off-by: Lauri Kasanen --- drivers/gpu/drm/radeon/r100.c | 33 - drivers/gpu/drm/radeon/radeon.h | 40 2 files changed, 36 insertions(+), 37 deletions(-) diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c index b6c3264..a4e7871 100644 --- a/drivers/gpu/drm/radeon/r100.c +++ b/drivers/gpu/drm/radeon/r100.c @@ -4086,39 +4086,6 @@ int r100_init(struct radeon_device *rdev) return 0; } -uint32_t r100_mm_rreg(struct radeon_device *rdev, uint32_t reg, - bool always_indirect) -{ - if (reg < rdev->rmmio_size && !always_indirect) - return readl(((void __iomem *)rdev->rmmio) + reg); - else { - unsigned long flags; - uint32_t ret; - - spin_lock_irqsave(&rdev->mmio_idx_lock, flags); - writel(reg, ((void __iomem *)rdev->rmmio) + RADEON_MM_INDEX); - ret = readl(((void __iomem *)rdev->rmmio) + RADEON_MM_DATA); - spin_unlock_irqrestore(&rdev->mmio_idx_lock, flags); - - return ret; - } -} - -void r100_mm_wreg(struct radeon_device *rdev, uint32_t reg, uint32_t v, - bool always_indirect) -{ - if (reg < rdev->rmmio_size && !always_indirect) - writel(v, ((void __iomem *)rdev->rmmio) + reg); - else { - unsigned long flags; - - spin_lock_irqsave(&rdev->mmio_idx_lock, flags); - writel(reg, ((void __iomem *)rdev->rmmio) + RADEON_MM_INDEX); - writel(v, ((void __iomem *)rdev->rmmio) + RADEON_MM_DATA); - spin_unlock_irqrestore(&rdev->mmio_idx_lock, flags); - } -} - u32 r100_io_rreg(struct radeon_device *rdev, u32 reg) { if (reg < rdev->rio_mem_size) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index f21db7a..a749b6c 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -2328,10 +2328,42 @@ int radeon_device_init(struct radeon_device *rdev, void radeon_device_fini(struct radeon_device *rdev); int radeon_gpu_wait_for_idle(struct radeon_device *rdev); -uint32_t r100_mm_rreg(struct radeon_device *rdev, uint32_t reg, - bool always_indirect); -void r100_mm_wreg(struct radeon_device *rdev, uint32_t reg, uint32_t v, - bool always_indirect); +#define RADEON_MIN_MMIO_SIZE 0x1 + +static inline uint32_t r100_mm_rreg(struct radeon_device *rdev, uint32_t reg, + bool always_indirect) +{ + /* The mmio size is 64kb at minimum. Allows the if to be optimized out. */ + if ((reg < rdev->rmmio_size || reg < RADEON_MIN_MMIO_SIZE) && !always_indirect) + return readl(((void __iomem *)rdev->rmmio) + reg); + else { + unsigned long flags; + uint32_t ret; + + spin_lock_irqsave(&rdev->mmio_idx_lock, flags); + writel(reg, ((void __iomem *)rdev->rmmio) + RADEON_MM_INDEX); + ret = readl(((void __iomem *)rdev->rmmio) + RADEON_MM_DATA); + spin_unlock_irqrestore(&rdev->mmio_idx_lock, flags); + + return ret; + } +} + +static inline void r100_mm_wreg(struct radeon_device *rdev, uint32_t reg, uint32_t v, + bool always_indirect) +{ + if ((reg < rdev->rmmio_size || reg < RADEON_MIN_MMIO_SIZE) && !always_indirect) + writel(v, ((void __iomem *)rdev->rmmio) + reg); + else { + unsigned long flags; + + spin_lock_irqsave(&rdev->mmio_idx_lock, flags); + writel(reg, ((void __iomem *)rdev->rmmio) + RADEON_MM_INDEX); + writel(v, ((void __iomem *)rdev->rmmio) + RADEON_MM_DATA); + spin_unlock_irqrestore(&rdev->mmio_idx_lock, flags); + } +} + u32 r100_io_rreg(struct radeon_device *rdev, u32 reg); void r100_io_wreg(struct radeon_device *rdev, u32 reg, u32 v); -- 1.8.3.1
[PATCH] drm/radeon: Inline r100_mm_rreg, v2
On Sat, 19 Apr 2014 11:15:53 -0400 Alex Deucher wrote: > On Sat, Apr 19, 2014 at 6:06 AM, Christian K?nig > >> This was originally un-inlined by Andi Kleen in 2011 citing size concerns. > >> Indeed, a first attempt at inlining it grew radeon.ko by 7%. > >> > >> However, 2% of cpu is spent in this function. Simply inlining it gave 1% > >> more fps > >> in Urban Terror. > >> > >> v2: We know the minimum MMIO size. Adding it to the if allows the compiler > >> to > >> optimize the branch out, improving both performance and size. > >> > >> The v2 patch decreases radeon.ko size by 2%. I didn't re-benchmark, but > >> common sense > >> says perf is now more than 1% better. > > > > Nice! > > > > But are you sure that the register PCI bar is always at least 64K in size? > > Keep in mind that this code is used over all generations since R100. > > Additional to that we probably should have a define for that and also apply > > the optimizations to r100_mm_wreg as well. Yes, I checked the earlier code. It had 64kb hard-coded, and when it was changed in 2010 to use the dynamic value, the commit said later asics are larger. (07bec2df01) A quick google also didn't find any dmesg with smaller values, R100 cards had 64kb. > If most of the register accesses are for the interrupt setup, I wonder > if it would be better to just clean up the irq_set functions to reduce > the register accesses. E.g., only touch the registers for the > specific irq masks have changed. Yes, that should also be done. But as this function is used elsewhere as well, having it fast (not to mention the size decrease) would be good. I think this patch is safe enough for 3.15, but perhaps it's too late now. - Lauri
[PATCH] drm/radeon: Inline r100_mm_rreg, v2
This was originally un-inlined by Andi Kleen in 2011 citing size concerns. Indeed, a first attempt at inlining it grew radeon.ko by 7%. However, 2% of cpu is spent in this function. Simply inlining it gave 1% more fps in Urban Terror. v2: We know the minimum MMIO size. Adding it to the if allows the compiler to optimize the branch out, improving both performance and size. The v2 patch decreases radeon.ko size by 2%. I didn't re-benchmark, but common sense says perf is now more than 1% better. Signed-off-by: Lauri Kasanen --- drivers/gpu/drm/radeon/r100.c | 18 -- drivers/gpu/drm/radeon/radeon.h | 21 +++-- 2 files changed, 19 insertions(+), 20 deletions(-) The _wreg function could be given similar treatment, but as it's nowhere as hot (0.009% vs 2%), didn't bother. diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c index b6c3264..8169e82 100644 --- a/drivers/gpu/drm/radeon/r100.c +++ b/drivers/gpu/drm/radeon/r100.c @@ -4086,24 +4086,6 @@ int r100_init(struct radeon_device *rdev) return 0; } -uint32_t r100_mm_rreg(struct radeon_device *rdev, uint32_t reg, - bool always_indirect) -{ - if (reg < rdev->rmmio_size && !always_indirect) - return readl(((void __iomem *)rdev->rmmio) + reg); - else { - unsigned long flags; - uint32_t ret; - - spin_lock_irqsave(&rdev->mmio_idx_lock, flags); - writel(reg, ((void __iomem *)rdev->rmmio) + RADEON_MM_INDEX); - ret = readl(((void __iomem *)rdev->rmmio) + RADEON_MM_DATA); - spin_unlock_irqrestore(&rdev->mmio_idx_lock, flags); - - return ret; - } -} - void r100_mm_wreg(struct radeon_device *rdev, uint32_t reg, uint32_t v, bool always_indirect) { diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index f21db7a..883276a 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -2328,8 +2328,25 @@ int radeon_device_init(struct radeon_device *rdev, void radeon_device_fini(struct radeon_device *rdev); int radeon_gpu_wait_for_idle(struct radeon_device *rdev); -uint32_t r100_mm_rreg(struct radeon_device *rdev, uint32_t reg, - bool always_indirect); +static inline uint32_t r100_mm_rreg(struct radeon_device *rdev, uint32_t reg, + bool always_indirect) +{ + /* The mmio size is 64kb at minimum. Allows the if to be optimized out. */ + if ((reg < rdev->rmmio_size || reg < 0x1) && !always_indirect) + return readl(((void __iomem *)rdev->rmmio) + reg); + else { + unsigned long flags; + uint32_t ret; + + spin_lock_irqsave(&rdev->mmio_idx_lock, flags); + writel(reg, ((void __iomem *)rdev->rmmio) + RADEON_MM_INDEX); + ret = readl(((void __iomem *)rdev->rmmio) + RADEON_MM_DATA); + spin_unlock_irqrestore(&rdev->mmio_idx_lock, flags); + + return ret; + } +} + void r100_mm_wreg(struct radeon_device *rdev, uint32_t reg, uint32_t v, bool always_indirect); u32 r100_io_rreg(struct radeon_device *rdev, u32 reg); -- 1.8.3.1
[PATCH] drm/radeon: Inline r100_mm_rreg
On Fri, 11 Apr 2014 14:32:20 +0200 Christian K?nig wrote: > Am 11.04.2014 11:54, schrieb Lauri Kasanen: > > On Fri, 11 Apr 2014 10:33:08 +0200 > > Christian K?nig wrote: > > > >>>> Actually direct register access shouldn't be necessary so often. Apart > >>>> from page flips, write/read pointer updates and irq processing there > >>>> shouldn't be so many of them. Could you clarify a bit more what issue > >>>> you are seeing here? > >>> Too much cpu usage for such a simple function. 2% makes it #2 in top-10 > >>> radeon.ko functions, right after evergreen_cs_parse. For reference, #3 > >>> (radeon_cs_packet_parse) is only 0.5%, one fourth of this function's > >>> usage. > >> I think you misunderstood me here. I do believe your numbers that it > >> makes a noticeable difference. > >> > >> But I've did a couple of perf tests recently on SI and CIK while hacking > >> on VM support, and IIRC r100_mm_rreg didn't showed up in the top 10 on > >> those systems. > >> > >> So what puzzles me is who the hack is calling r100_mm_rreg so often that > >> it makes a noticeable difference on evergreen/NI? > > The biggest caller is cayman_cp_int_cntl_setup. Before inlining it took > > 0.0013%, after it takes 1%. > > Sounds like somebody is constantly turning interrupts on and off. I instrumented radeon_irq_set. There's about 12 calls to that per sec, most coming from radeon_irq_kms_sw_irq_get and radeon_irq_kms_sw_irq_put. So from your earlier mail I gather this amount of IRQ setting is normal. It's just taking a lot of cpu for some reason. - Lauri
[PATCH] drm/radeon: Inline r100_mm_rreg
On Fri, 11 Apr 2014 14:32:20 +0200 Christian K?nig wrote: > Anyway, I would do like Ilia suggested and only put the else branch into > a separate, not inlined function. > > BTW: It's probably a good idea to do the same for the write function as > well. I tested it. The majority of the size increase stayed - the else/spinlock part as non-inlined functions, radeon.ko was still 5% larger instead of 7%. - Lauri
[PATCH] drm/radeon: Inline r100_mm_rreg
On Fri, 11 Apr 2014 10:33:08 +0200 Christian K?nig wrote: > >> Actually direct register access shouldn't be necessary so often. Apart > >> from page flips, write/read pointer updates and irq processing there > >> shouldn't be so many of them. Could you clarify a bit more what issue > >> you are seeing here? > > Too much cpu usage for such a simple function. 2% makes it #2 in top-10 > > radeon.ko functions, right after evergreen_cs_parse. For reference, #3 > > (radeon_cs_packet_parse) is only 0.5%, one fourth of this function's > > usage. > > I think you misunderstood me here. I do believe your numbers that it > makes a noticeable difference. > > But I've did a couple of perf tests recently on SI and CIK while hacking > on VM support, and IIRC r100_mm_rreg didn't showed up in the top 10 on > those systems. > > So what puzzles me is who the hack is calling r100_mm_rreg so often that > it makes a noticeable difference on evergreen/NI? The biggest caller is cayman_cp_int_cntl_setup. Before inlining it took 0.0013%, after it takes 1%. This is on a Richland APU, so Aruba/Cayman. Urban Terror is an ioq3 game with a lot of cpu-side vertex submissions. - Lauri
[PATCH] drm/radeon: Inline r100_mm_rreg
On Thu, 10 Apr 2014 21:30:03 +0200 Christian K?nig wrote: > >>> Quick thought from someone entirely unfamiliar with the hardware: > >>> perhaps you can get the performance benefit without the size increase > >>> by moving the else portion into a non-inline function? I'm guessing > >>> that most accesses happen in the "if" branch. > >> The function call overhead is about equal to branching overhead, so > >> splitting it would only help about half that. It's called from many > >> places, and a lot of calls per sec. > > Actually direct register access shouldn't be necessary so often. Apart > from page flips, write/read pointer updates and irq processing there > shouldn't be so many of them. Could you clarify a bit more what issue > you are seeing here? Too much cpu usage for such a simple function. 2% makes it #2 in top-10 radeon.ko functions, right after evergreen_cs_parse. For reference, #3 (radeon_cs_packet_parse) is only 0.5%, one fourth of this function's usage. As proved by the perf increase, it's called often enough that getting rid of the function call overhead (and compiling the if out compile-time) helps measurably. - Lauri
[PATCH] drm/radeon: Inline r100_mm_rreg
On Thu, 10 Apr 2014 12:19:10 -0400 Ilia Mirkin wrote: > > +static inline uint32_t r100_mm_rreg(struct radeon_device *rdev, uint32_t > > reg, > > + bool always_indirect) > > +{ > > + if (reg < rdev->rmmio_size && !always_indirect) > > + return readl(((void __iomem *)rdev->rmmio) + reg); > > Quick thought from someone entirely unfamiliar with the hardware: > perhaps you can get the performance benefit without the size increase > by moving the else portion into a non-inline function? I'm guessing > that most accesses happen in the "if" branch. The function call overhead is about equal to branching overhead, so splitting it would only help about half that. It's called from many places, and a lot of calls per sec. Of course the future kernel LTO will all make this go away, but that's probably two years in the future before it's stable. - Lauri
[PATCH] drm/radeon: Inline r100_mm_rreg
This was originally un-inlined by Andi Kleen in 2011 citing size concerns. Indeed, inlining it grows radeon.ko by 7%. However, 2% of cpu is spent in this function. Inlining it gives 1% more fps in Urban Terror. Signed-off-by: Lauri Kasanen --- drivers/gpu/drm/radeon/r100.c | 18 -- drivers/gpu/drm/radeon/radeon.h | 20 ++-- 2 files changed, 18 insertions(+), 20 deletions(-) diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c index b6c3264..8169e82 100644 --- a/drivers/gpu/drm/radeon/r100.c +++ b/drivers/gpu/drm/radeon/r100.c @@ -4086,24 +4086,6 @@ int r100_init(struct radeon_device *rdev) return 0; } -uint32_t r100_mm_rreg(struct radeon_device *rdev, uint32_t reg, - bool always_indirect) -{ - if (reg < rdev->rmmio_size && !always_indirect) - return readl(((void __iomem *)rdev->rmmio) + reg); - else { - unsigned long flags; - uint32_t ret; - - spin_lock_irqsave(&rdev->mmio_idx_lock, flags); - writel(reg, ((void __iomem *)rdev->rmmio) + RADEON_MM_INDEX); - ret = readl(((void __iomem *)rdev->rmmio) + RADEON_MM_DATA); - spin_unlock_irqrestore(&rdev->mmio_idx_lock, flags); - - return ret; - } -} - void r100_mm_wreg(struct radeon_device *rdev, uint32_t reg, uint32_t v, bool always_indirect) { diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 5cf10a7..9231100 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -2330,8 +2330,24 @@ int radeon_device_init(struct radeon_device *rdev, void radeon_device_fini(struct radeon_device *rdev); int radeon_gpu_wait_for_idle(struct radeon_device *rdev); -uint32_t r100_mm_rreg(struct radeon_device *rdev, uint32_t reg, - bool always_indirect); +static inline uint32_t r100_mm_rreg(struct radeon_device *rdev, uint32_t reg, + bool always_indirect) +{ + if (reg < rdev->rmmio_size && !always_indirect) + return readl(((void __iomem *)rdev->rmmio) + reg); + else { + unsigned long flags; + uint32_t ret; + + spin_lock_irqsave(&rdev->mmio_idx_lock, flags); + writel(reg, ((void __iomem *)rdev->rmmio) + RADEON_MM_INDEX); + ret = readl(((void __iomem *)rdev->rmmio) + RADEON_MM_DATA); + spin_unlock_irqrestore(&rdev->mmio_idx_lock, flags); + + return ret; + } +} + void r100_mm_wreg(struct radeon_device *rdev, uint32_t reg, uint32_t v, bool always_indirect); u32 r100_io_rreg(struct radeon_device *rdev, u32 reg); -- 1.8.3.1
[PATCH] drm/radeon: Improve vramlimit module param documentation
Signed-off-by: Lauri Kasanen --- drivers/gpu/drm/radeon/radeon_drv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/radeon/radeon_drv.c b/drivers/gpu/drm/radeon/radeon_drv.c index d0eba48..cf2721a 100644 --- a/drivers/gpu/drm/radeon/radeon_drv.c +++ b/drivers/gpu/drm/radeon/radeon_drv.c @@ -186,7 +186,7 @@ module_param_named(dynclks, radeon_dynclks, int, 0444); MODULE_PARM_DESC(r4xx_atom, "Enable ATOMBIOS modesetting for R4xx"); module_param_named(r4xx_atom, radeon_r4xx_atom, int, 0444); -MODULE_PARM_DESC(vramlimit, "Restrict VRAM for testing"); +MODULE_PARM_DESC(vramlimit, "Restrict VRAM for testing, in megabytes"); module_param_named(vramlimit, radeon_vram_limit, int, 0600); MODULE_PARM_DESC(agpmode, "AGP Mode (-1 == PCI)"); -- 1.8.3.1
[RFC 0/3] TTM priority queue logic
On Mon, 07 Apr 2014 14:25:28 +0200 Thomas Hellstrom wrote: > Hi, Lauri. > > On 04/04/2014 03:52 PM, Lauri Kasanen wrote: > > Hi list, Thomas, > > > > I'd like to know if this is going in the right direction. > > This looks fine with me. > > However, if possible I'd like the drivers to enable both alloc_threshold > and priority queue on a per-memory-type basis. > > That would mean no new arguments (use_pqueue, alloc_threshold) in > ttm_bo_device_init(). Instead, set default values in > ttm_bo_init_mm(), and let the driver change them in the init_mem_type() > callback. > > Do you think that would work? Thanks for the review. Alloc_threshold was removed, and the current patch (in drm-next) replaced it with a placement flag. So now that logic is in the driver. Making the pqueue a per-type option would certainly work, I'll edit it to do so. I don't see why it would improve GTT or SYSTEM given their perf characteristics, but I suppose future devices can always do weird things. - Lauri
Required backwards support level?
On Sun, 6 Apr 2014 19:58:48 +0200 Daniel Vetter wrote: > On Sun, Apr 6, 2014 at 7:28 PM, Lauri Kasanen wrote: > > This is related to my memory management work. As the VRAM queue is > > global, it cannot be determined on a per-app basis. If at least one > > client is running old userspace, the new scoring cannot be used. > > > > Switching live between the two would bring additional complexity. I'd > > hope to be able to determine by the first 3d app if the userspace is > > new enough. But that depends on if it's expected to be able to run > > mixed loads. > > At least thus far we've worked under the example that any > explicitly-enabled new behaviour is only enabled per-fd. That's also > useful when you install all your developer versions into a prefix, so > still have 2 versions of X driver, libdrm, mesa, everything else lying > around and want to switch at runtime even. > > Can't you fake a default score somehow for all legacy userspace and > always run with the new code? Reimplementing the old interfaces in > terms of the new ones also allows you to kick out duplicated code > (usually) compared to having both old and new code around. Certainly, that can be done. - Lauri
Required backwards support level?
On Sun, 6 Apr 2014 10:53:58 -0400 Rob Clark wrote: > On Sun, Apr 6, 2014 at 8:26 AM, Lauri Kasanen wrote: > > Hi, > > > > Obviously old userspace + new kernel combo needs to be supported. But > > I'm not so sure about a mixed case, does old userspace + new userspace > > need to be supported to run at the same time? > > > > For example, a new 64-bit mesa+libdrm and a 32-bit old set, both > > running apps at the same time. > > > > Or is it acceptable to assume that once a new userspace has been run, > > an old one won't be run for this boot? > > old 32b userspace in a chroot, for example? Yes, for example. > Maybe you can give a better description of what you are wanting to do? > Could you decide on 32b vs 64b userspace on a per 'struct drm_file' > basis (ie. each time /dev/dri/cardN is opened)? This is related to my memory management work. As the VRAM queue is global, it cannot be determined on a per-app basis. If at least one client is running old userspace, the new scoring cannot be used. Switching live between the two would bring additional complexity. I'd hope to be able to determine by the first 3d app if the userspace is new enough. But that depends on if it's expected to be able to run mixed loads. - Lauri
Required backwards support level?
Hi, Obviously old userspace + new kernel combo needs to be supported. But I'm not so sure about a mixed case, does old userspace + new userspace need to be supported to run at the same time? For example, a new 64-bit mesa+libdrm and a 32-bit old set, both running apps at the same time. Or is it acceptable to assume that once a new userspace has been run, an old one won't be run for this boot? - Lauri
[RFC 3/3] drm/ttm: Enable the priority queue for VRAM
Signed-off-by: Lauri Kasanen --- drivers/gpu/drm/ttm/ttm_bo.c | 68 +++- 1 file changed, 54 insertions(+), 14 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 621151c..80e5856 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -172,7 +172,12 @@ void ttm_bo_add_to_lru(struct ttm_buffer_object *bo) BUG_ON(!list_empty(&bo->lru)); man = &bdev->man[bo->mem.mem_type]; - list_add_tail(&bo->lru, &man->lru); + + if (bdev->use_pqueue && bo->mem.mem_type == TTM_PL_VRAM) + ttm_prio_add(&man->pqueue, &bo->pqueue); + else + list_add_tail(&bo->lru, &man->lru); + kref_get(&bo->list_kref); if (bo->ttm != NULL) { @@ -186,6 +191,8 @@ EXPORT_SYMBOL(ttm_bo_add_to_lru); int ttm_bo_del_from_lru(struct ttm_buffer_object *bo) { int put_count = 0; + struct ttm_bo_device *bdev = bo->bdev; + struct ttm_mem_type_manager *man = &bdev->man[bo->mem.mem_type]; if (!list_empty(&bo->swap)) { list_del_init(&bo->swap); @@ -194,6 +201,10 @@ int ttm_bo_del_from_lru(struct ttm_buffer_object *bo) if (!list_empty(&bo->lru)) { list_del_init(&bo->lru); ++put_count; + } else if (bdev->use_pqueue && bo->mem.mem_type == TTM_PL_VRAM && + ttm_prio_is_queued(&bo->pqueue)) { + ttm_prio_remove(&man->pqueue, &bo->pqueue); + ++put_count; } /* @@ -725,10 +736,22 @@ static int ttm_mem_evict_first(struct ttm_bo_device *bdev, int ret = -EBUSY, put_count; spin_lock(&glob->lru_lock); - list_for_each_entry(bo, &man->lru, lru) { - ret = ttm_bo_reserve_nolru(bo, false, true, false, 0); - if (!ret) - break; + if (bdev->use_pqueue && mem_type == TTM_PL_VRAM) { + struct ttm_pqueue_entry *pe; + for (pe = ttm_prio_query_lowest(&man->pqueue); pe; + pe = ttm_prio_query_next(pe)) { + + bo = container_of(pe, struct ttm_buffer_object, pqueue); + ret = ttm_bo_reserve_nolru(bo, false, true, false, 0); + if (!ret) + break; + } + } else { + list_for_each_entry(bo, &man->lru, lru) { + ret = ttm_bo_reserve_nolru(bo, false, true, false, 0); + if (!ret) + break; + } } if (ret) { @@ -1125,6 +1148,7 @@ int ttm_bo_init(struct ttm_bo_device *bdev, INIT_LIST_HEAD(&bo->ddestroy); INIT_LIST_HEAD(&bo->swap); INIT_LIST_HEAD(&bo->io_reserve_lru); + ttm_prio_init_entry(&bo->pqueue); mutex_init(&bo->wu_mutex); bo->bdev = bdev; bo->glob = bdev->glob; @@ -1243,17 +1267,32 @@ static int ttm_bo_force_list_clean(struct ttm_bo_device *bdev, */ spin_lock(&glob->lru_lock); - while (!list_empty(&man->lru)) { - spin_unlock(&glob->lru_lock); - ret = ttm_mem_evict_first(bdev, mem_type, false, false); - if (ret) { - if (allow_errors) { - return ret; - } else { - pr_err("Cleanup eviction failed\n"); + if (bdev->use_pqueue && mem_type == TTM_PL_VRAM) { + while (ttm_prio_query_lowest(&man->pqueue)) { + spin_unlock(&glob->lru_lock); + ret = ttm_mem_evict_first(bdev, mem_type, false, false); + if (ret) { + if (allow_errors) { + return ret; + } else { + pr_err("Cleanup eviction failed\n"); + } } + spin_lock(&glob->lru_lock); + } + } else { + while (!list_empty(&man->lru)) { + spin_unlock(&glob->lru_lock); + ret = ttm_mem_evict_first(bdev, mem_type, false, false); + if (ret) { + if (allow_errors) { + return ret; + } else { + pr_err("Cleanup eviction failed
[RFC 2/3] drm/ttm: Add the priority queue to appropriate structs
Signed-off-by: Lauri Kasanen --- drivers/gpu/drm/ast/ast_ttm.c | 1 + drivers/gpu/drm/bochs/bochs_mm.c | 1 + drivers/gpu/drm/cirrus/cirrus_ttm.c | 1 + drivers/gpu/drm/mgag200/mgag200_ttm.c | 1 + drivers/gpu/drm/nouveau/nouveau_ttm.c | 2 +- drivers/gpu/drm/qxl/qxl_ttm.c | 2 +- drivers/gpu/drm/radeon/radeon_ttm.c | 1 + drivers/gpu/drm/ttm/ttm_bo.c | 2 ++ drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 2 +- include/drm/ttm/ttm_bo_api.h | 2 ++ include/drm/ttm/ttm_bo_driver.h | 7 +++ 11 files changed, 19 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/ast/ast_ttm.c b/drivers/gpu/drm/ast/ast_ttm.c index 61f9e39..311f37f 100644 --- a/drivers/gpu/drm/ast/ast_ttm.c +++ b/drivers/gpu/drm/ast/ast_ttm.c @@ -263,6 +263,7 @@ int ast_mm_init(struct ast_private *ast) dev->anon_inode->i_mapping, DRM_FILE_PAGE_OFFSET, true, +false, 0); if (ret) { DRM_ERROR("Error initialising bo driver; %d\n", ret); diff --git a/drivers/gpu/drm/bochs/bochs_mm.c b/drivers/gpu/drm/bochs/bochs_mm.c index 9dfd24a..c4aba61 100644 --- a/drivers/gpu/drm/bochs/bochs_mm.c +++ b/drivers/gpu/drm/bochs/bochs_mm.c @@ -229,6 +229,7 @@ int bochs_mm_init(struct bochs_device *bochs) bochs->dev->anon_inode->i_mapping, DRM_FILE_PAGE_OFFSET, true, +false, 0); if (ret) { DRM_ERROR("Error initialising bo driver; %d\n", ret); diff --git a/drivers/gpu/drm/cirrus/cirrus_ttm.c b/drivers/gpu/drm/cirrus/cirrus_ttm.c index 74e8e21..895d20e 100644 --- a/drivers/gpu/drm/cirrus/cirrus_ttm.c +++ b/drivers/gpu/drm/cirrus/cirrus_ttm.c @@ -263,6 +263,7 @@ int cirrus_mm_init(struct cirrus_device *cirrus) dev->anon_inode->i_mapping, DRM_FILE_PAGE_OFFSET, true, +false, 0); if (ret) { DRM_ERROR("Error initialising bo driver; %d\n", ret); diff --git a/drivers/gpu/drm/mgag200/mgag200_ttm.c b/drivers/gpu/drm/mgag200/mgag200_ttm.c index 6844b24..591f68e 100644 --- a/drivers/gpu/drm/mgag200/mgag200_ttm.c +++ b/drivers/gpu/drm/mgag200/mgag200_ttm.c @@ -263,6 +263,7 @@ int mgag200_mm_init(struct mga_device *mdev) dev->anon_inode->i_mapping, DRM_FILE_PAGE_OFFSET, true, +false, 0); if (ret) { DRM_ERROR("Error initialising bo driver; %d\n", ret); diff --git a/drivers/gpu/drm/nouveau/nouveau_ttm.c b/drivers/gpu/drm/nouveau/nouveau_ttm.c index 3fef97c..4b032b4 100644 --- a/drivers/gpu/drm/nouveau/nouveau_ttm.c +++ b/drivers/gpu/drm/nouveau/nouveau_ttm.c @@ -384,7 +384,7 @@ nouveau_ttm_init(struct nouveau_drm *drm) &nouveau_bo_driver, dev->anon_inode->i_mapping, DRM_FILE_PAGE_OFFSET, - bits <= 32 ? true : false, 0); + bits <= 32 ? true : false, false, 0); if (ret) { NV_ERROR(drm, "error initialising bo driver, %d\n", ret); return ret; diff --git a/drivers/gpu/drm/qxl/qxl_ttm.c b/drivers/gpu/drm/qxl/qxl_ttm.c index 8401a00..88f12e7 100644 --- a/drivers/gpu/drm/qxl/qxl_ttm.c +++ b/drivers/gpu/drm/qxl/qxl_ttm.c @@ -495,7 +495,7 @@ int qxl_ttm_init(struct qxl_device *qdev) qdev->mman.bo_global_ref.ref.object, &qxl_bo_driver, qdev->ddev->anon_inode->i_mapping, - DRM_FILE_PAGE_OFFSET, 0, 0); + DRM_FILE_PAGE_OFFSET, 0, false, 0); if (r) { DRM_ERROR("failed initializing buffer object driver(%d).\n", r); return r; diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c index 1aef339..dd96ed5 100644 --- a/drivers/gpu/drm/radeon/radeon_ttm.c +++ b/drivers/gpu/drm/radeon/radeon_ttm.c @@ -711,6 +711,7 @@ int radeon_ttm_init(struct radeon_device *rdev) rdev->ddev->anon_inode->i_mapping, DRM_FILE_PAGE_OFFSET, rdev->need_dma32, + false, 512 * 1024); if (r) { DRM_ERROR(&q
[RFC 1/3] drm/ttm: Add a priority queue
Signed-off-by: Lauri Kasanen --- drivers/gpu/drm/ttm/Makefile | 2 +- drivers/gpu/drm/ttm/ttm_priority.c | 152 + include/drm/ttm/ttm_priority.h | 90 ++ 3 files changed, 243 insertions(+), 1 deletion(-) create mode 100644 drivers/gpu/drm/ttm/ttm_priority.c create mode 100644 include/drm/ttm/ttm_priority.h diff --git a/drivers/gpu/drm/ttm/Makefile b/drivers/gpu/drm/ttm/Makefile index b433b9f..4411599 100644 --- a/drivers/gpu/drm/ttm/Makefile +++ b/drivers/gpu/drm/ttm/Makefile @@ -5,6 +5,6 @@ ccflags-y := -Iinclude/drm ttm-y := ttm_agp_backend.o ttm_memory.o ttm_tt.o ttm_bo.o \ ttm_bo_util.o ttm_bo_vm.o ttm_module.o \ ttm_object.o ttm_lock.o ttm_execbuf_util.o ttm_page_alloc.o \ - ttm_bo_manager.o ttm_page_alloc_dma.o + ttm_bo_manager.o ttm_page_alloc_dma.o ttm_priority.o obj-$(CONFIG_DRM_TTM) += ttm.o diff --git a/drivers/gpu/drm/ttm/ttm_priority.c b/drivers/gpu/drm/ttm/ttm_priority.c new file mode 100644 index 000..a7cf8d1 --- /dev/null +++ b/drivers/gpu/drm/ttm/ttm_priority.c @@ -0,0 +1,152 @@ +/** + * + * Copyright (c) 2014 Lauri Kasanen + * All Rights Reserved. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the + * "Software"), to deal in the Software without restriction, including + * without limitation the rights to use, copy, modify, merge, publish, + * distribute, sub license, and/or sell copies of the Software, and to + * permit persons to whom the Software is furnished to do so, subject to + * the following conditions: + * + * The above copyright notice and this permission notice (including the + * next paragraph) shall be included in all copies or substantial portions + * of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL + * THE COPYRIGHT HOLDERS, AUTHORS AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, + * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR + * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE + * USE OR OTHER DEALINGS IN THE SOFTWARE. + * + **/ + + +#include +#include + +static void ttm_prio_insert_rb(struct rb_root * const root, + struct ttm_pqueue_entry * const data, + struct rb_node *parent, + struct rb_node **where) +{ + BUG_ON(!where || (!parent && root->rb_node)); + + rb_link_node(&data->node, parent, where); + rb_insert_color(&data->node, root); +} + +static struct ttm_pqueue_entry *ttm_prio_search_rb(struct rb_root *root, + u64 score, + struct rb_node **parent, + struct rb_node ***where) +{ + struct rb_node **new = &root->rb_node; + + while (*new) { + struct ttm_pqueue_entry *this = container_of(*new, +struct ttm_pqueue_entry, +node); + + *parent = *new; + + if (score < this->score) + new = &((*new)->rb_left); + else if (score > this->score) + new = &((*new)->rb_right); + else + return this; + + BUG_ON(*parent == *new); + } + + *where = new; + + return NULL; +} + +void ttm_prio_add(struct ttm_pqueue * const queue, + struct ttm_pqueue_entry * const entry) +{ + struct rb_root * const tree = &queue->tree; + struct rb_node **place = NULL, *parent = NULL; + struct ttm_pqueue_entry *test; + + if (ttm_prio_is_queued(entry)) + ttm_prio_remove(queue, entry); + + test = ttm_prio_search_rb(tree, entry->score, &parent, &place); + + if (!test) + ttm_prio_insert_rb(tree, entry, parent, place); + else + list_add_tail(&entry->list, &test->list); +} + +struct ttm_pqueue_entry *ttm_prio_query_lowest(const struct ttm_pqueue * const queue) +{ + const struct rb_root * const root = &queue->tree; + struct rb_node *node; + + node = rb_first(root); + if (!node) + return NULL; + + return container_of(node, struct ttm_pqueue_entry, node); +} + +void ttm_prio_remove(struct ttm_pqueue * const queue, +
[RFC 0/3] TTM priority queue logic
Hi list, Thomas, I'd like to know if this is going in the right direction. I've implemented a priority queue on top of the kernel rb tree and linked list. It's been tested well in userspace. I hardcoded radeon to input the buffer size as the score. Nothing blew up, games ran fine, and even got ~2% more fps on average*. The only thing missing from this code is the "if score is too low, and there is no room without eviction, tell driver so" logic. - Lauri * This is a fairly bad strategy, and according to my simulator achieves 16% worse results compared to LRU in heavier situations. The games tested here all fit in VRAM, which should explain the improvement.
[PATCH] drm/radeon: Use two-ended allocation by size, v2
This decreases eviction by up to 20%, by improving the fragmentation quality. No harm in normal cases that fit VRAM fully (PTS gaming suite). In some cases, even the VRAM-fitting cases improved slightly (openarena, urban terror). 512kb was measured as the most optimal threshold for 3d workloads common to radeon. Other drivers may need different thresholds according to their workloads. v2: Nicer formatting Signed-off-by: Lauri Kasanen --- drivers/gpu/drm/radeon/radeon_object.c | 13 - 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c index 1375ff8..19bec0d 100644 --- a/drivers/gpu/drm/radeon/radeon_object.c +++ b/drivers/gpu/drm/radeon/radeon_object.c @@ -104,7 +104,7 @@ bool radeon_ttm_bo_is_radeon_bo(struct ttm_buffer_object *bo) void radeon_ttm_placement_from_domain(struct radeon_bo *rbo, u32 domain) { - u32 c = 0; + u32 c = 0, i; rbo->placement.fpfn = 0; rbo->placement.lpfn = 0; @@ -131,6 +131,17 @@ void radeon_ttm_placement_from_domain(struct radeon_bo *rbo, u32 domain) rbo->placements[c++] = TTM_PL_MASK_CACHING | TTM_PL_FLAG_SYSTEM; rbo->placement.num_placement = c; rbo->placement.num_busy_placement = c; + + /* +* Use two-ended allocation depending on the buffer size to +* improve fragmentation quality. +* 512kb was measured as the most optimal number. +*/ + if (rbo->tbo.mem.size > 512 * 1024) { + for (i = 0; i < c; i++) { + rbo->placements[i] |= TTM_PL_FLAG_TOPDOWN; + } + } } int radeon_bo_create(struct radeon_device *rdev, -- 1.8.3.1
[PATCH 2/2] drm/radeon: Use two-ended allocation by size
This decreases eviction by up to 20%, by improving the fragmentation quality. No harm in normal cases that fit VRAM fully (PTS gaming suite). In some cases, even the VRAM-fitting cases improved slightly (openarena, urban terror). 512kb was measured as the most optimal threshold for 3d workloads common to radeon. Other drivers may need different thresholds according to their workloads. Signed-off-by: Lauri Kasanen --- drivers/gpu/drm/radeon/radeon_object.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c index 1375ff8..6251456 100644 --- a/drivers/gpu/drm/radeon/radeon_object.c +++ b/drivers/gpu/drm/radeon/radeon_object.c @@ -104,7 +104,7 @@ bool radeon_ttm_bo_is_radeon_bo(struct ttm_buffer_object *bo) void radeon_ttm_placement_from_domain(struct radeon_bo *rbo, u32 domain) { - u32 c = 0; + u32 c = 0, i; rbo->placement.fpfn = 0; rbo->placement.lpfn = 0; @@ -131,6 +131,15 @@ void radeon_ttm_placement_from_domain(struct radeon_bo *rbo, u32 domain) rbo->placements[c++] = TTM_PL_MASK_CACHING | TTM_PL_FLAG_SYSTEM; rbo->placement.num_placement = c; rbo->placement.num_busy_placement = c; + + /* +* Use two-ended allocation depending on the buffer size to +* improve fragmentation quality. +* 512kb was measured as the most optimal number. +*/ + if (rbo->tbo.mem.size > 512 * 1024) for (i = 0; i < c; i++) { + rbo->placements[i] |= TTM_PL_FLAG_TOPDOWN; + } } int radeon_bo_create(struct radeon_device *rdev, -- 1.8.3.1
[PATCH 1/2] drm: Add support for two-ended allocation, v3
Clients like i915 need to segregate cache domains within the GTT which can lead to small amounts of fragmentation. By allocating the uncached buffers from the bottom and the cacheable buffers from the top, we can reduce the amount of wasted space and also optimize allocation of the mappable portion of the GTT to only those buffers that require CPU access through the GTT. For other drivers, allocating small bos from one end and large ones from the other helps improve the quality of fragmentation. Based on drm_mm work by Chris Wilson. v3: Changed to use a TTM placement flag v2: Updated kerneldoc Cc: Chris Wilson Cc: Ben Widawsky Cc: Christian K?nig Signed-off-by: Lauri Kasanen --- drivers/gpu/drm/drm_mm.c | 66 ++-- drivers/gpu/drm/i915/i915_gem.c | 3 +- drivers/gpu/drm/i915/i915_gem_gtt.c | 3 +- drivers/gpu/drm/ttm/ttm_bo_manager.c | 11 -- include/drm/drm_mm.h | 32 ++--- include/drm/ttm/ttm_placement.h | 3 ++ 6 files changed, 92 insertions(+), 26 deletions(-) diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c index a2d45b74..8f64be4 100644 --- a/drivers/gpu/drm/drm_mm.c +++ b/drivers/gpu/drm/drm_mm.c @@ -82,6 +82,10 @@ * this to implement guard pages between incompatible caching domains in the * graphics TT. * + * Two behaviors are supported for searching and allocating: bottom-up and top-down. + * The default is bottom-up. Top-down allocation can be used if the memory area + * has different restrictions, or just to reduce fragmentation. + * * Finally iteration helpers to walk all nodes and all holes are provided as are * some basic allocator dumpers for debugging. */ @@ -102,7 +106,8 @@ static struct drm_mm_node *drm_mm_search_free_in_range_generic(const struct drm_ static void drm_mm_insert_helper(struct drm_mm_node *hole_node, struct drm_mm_node *node, unsigned long size, unsigned alignment, -unsigned long color) +unsigned long color, +enum drm_mm_allocator_flags flags) { struct drm_mm *mm = hole_node->mm; unsigned long hole_start = drm_mm_hole_node_start(hole_node); @@ -115,12 +120,22 @@ static void drm_mm_insert_helper(struct drm_mm_node *hole_node, if (mm->color_adjust) mm->color_adjust(hole_node, color, &adj_start, &adj_end); + if (flags & DRM_MM_CREATE_TOP) + adj_start = adj_end - size; + if (alignment) { unsigned tmp = adj_start % alignment; - if (tmp) - adj_start += alignment - tmp; + if (tmp) { + if (flags & DRM_MM_CREATE_TOP) + adj_start -= tmp; + else + adj_start += alignment - tmp; + } } + BUG_ON(adj_start < hole_start); + BUG_ON(adj_end > hole_end); + if (adj_start == hole_start) { hole_node->hole_follows = 0; list_del(&hole_node->hole_stack); @@ -205,7 +220,8 @@ EXPORT_SYMBOL(drm_mm_reserve_node); * @size: size of the allocation * @alignment: alignment of the allocation * @color: opaque tag value to use for this node - * @flags: flags to fine-tune the allocation + * @sflags: flags to fine-tune the allocation search + * @aflags: flags to fine-tune the allocation behavior * * The preallocated node must be cleared to 0. * @@ -215,16 +231,17 @@ EXPORT_SYMBOL(drm_mm_reserve_node); int drm_mm_insert_node_generic(struct drm_mm *mm, struct drm_mm_node *node, unsigned long size, unsigned alignment, unsigned long color, - enum drm_mm_search_flags flags) + enum drm_mm_search_flags sflags, + enum drm_mm_allocator_flags aflags) { struct drm_mm_node *hole_node; hole_node = drm_mm_search_free_generic(mm, size, alignment, - color, flags); + color, sflags); if (!hole_node) return -ENOSPC; - drm_mm_insert_helper(hole_node, node, size, alignment, color); + drm_mm_insert_helper(hole_node, node, size, alignment, color, aflags); return 0; } EXPORT_SYMBOL(drm_mm_insert_node_generic); @@ -233,7 +250,8 @@ static void drm_mm_insert_helper_range(struct drm_mm_node *hole_node, struct drm_mm_node *node, unsigned long size, unsigned alignment, unsigned long color, - unsigned long start, unsigned long end) +
[PATCH] drm: Add support for two-ended allocation, v2
On Wed, 02 Apr 2014 13:00:00 +0200 Christian K?nig wrote: > Nice idea, but I wouldn't put the decision where to place the buffer > into TTM based on it's size. > > Instead please make that a proper TTM placement flag because for example > for VM page tables we want them to be at the end of VRAM, not because > they are big (which they are anyway) but because they never accessed by > the CPU. I think there aren't many special cases like that, and they could already be handled by setting the first and last pages for the bo? - Lauri
[PATCH] drm: Add support for two-ended allocation, v2
Clients like i915 need to segregate cache domains within the GTT which can lead to small amounts of fragmentation. By allocating the uncached buffers from the bottom and the cacheable buffers from the top, we can reduce the amount of wasted space and also optimize allocation of the mappable portion of the GTT to only those buffers that require CPU access through the GTT. For other drivers, allocating small bos from one end and large ones from the other helps improve the quality of fragmentation. Only Radeon has behavioral changes in this patch, as I have no Intel hw. Based on drm_mm work by Chris Wilson. -- Radeon uses a 512kb threshold. This decreases eviction by up to 20%, by improving the fragmentation quality. No harm in normal cases that fit VRAM fully (PTS gaming suite). In some cases, even the VRAM-fitting cases improved slightly (openarena, urban terror). 512kb was measured as the most optimal threshold for 3d workloads common to radeon. Other drivers may need different thresholds according to their workloads. v2: Updated kerneldoc in more places Cc: Chris Wilson Cc: Ben Widawsky Signed-off-by: Lauri Kasanen --- drivers/gpu/drm/ast/ast_ttm.c | 3 +- drivers/gpu/drm/bochs/bochs_mm.c | 3 +- drivers/gpu/drm/cirrus/cirrus_ttm.c | 3 +- drivers/gpu/drm/drm_mm.c | 66 ++- drivers/gpu/drm/i915/i915_gem.c | 3 +- drivers/gpu/drm/i915/i915_gem_gtt.c | 3 +- drivers/gpu/drm/mgag200/mgag200_ttm.c | 3 +- drivers/gpu/drm/nouveau/nouveau_ttm.c | 2 +- drivers/gpu/drm/qxl/qxl_ttm.c | 2 +- drivers/gpu/drm/radeon/radeon_ttm.c | 3 +- drivers/gpu/drm/ttm/ttm_bo.c | 4 ++- drivers/gpu/drm/ttm/ttm_bo_manager.c | 16 +++-- drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 2 +- include/drm/drm_mm.h | 32 ++--- include/drm/ttm/ttm_bo_driver.h | 9 - 15 files changed, 118 insertions(+), 36 deletions(-) diff --git a/drivers/gpu/drm/ast/ast_ttm.c b/drivers/gpu/drm/ast/ast_ttm.c index b824622..61f9e39 100644 --- a/drivers/gpu/drm/ast/ast_ttm.c +++ b/drivers/gpu/drm/ast/ast_ttm.c @@ -262,7 +262,8 @@ int ast_mm_init(struct ast_private *ast) &ast_bo_driver, dev->anon_inode->i_mapping, DRM_FILE_PAGE_OFFSET, -true); +true, +0); if (ret) { DRM_ERROR("Error initialising bo driver; %d\n", ret); return ret; diff --git a/drivers/gpu/drm/bochs/bochs_mm.c b/drivers/gpu/drm/bochs/bochs_mm.c index f488be5..9dfd24a 100644 --- a/drivers/gpu/drm/bochs/bochs_mm.c +++ b/drivers/gpu/drm/bochs/bochs_mm.c @@ -228,7 +228,8 @@ int bochs_mm_init(struct bochs_device *bochs) &bochs_bo_driver, bochs->dev->anon_inode->i_mapping, DRM_FILE_PAGE_OFFSET, -true); +true, +0); if (ret) { DRM_ERROR("Error initialising bo driver; %d\n", ret); return ret; diff --git a/drivers/gpu/drm/cirrus/cirrus_ttm.c b/drivers/gpu/drm/cirrus/cirrus_ttm.c index 92e6b77..74e8e21 100644 --- a/drivers/gpu/drm/cirrus/cirrus_ttm.c +++ b/drivers/gpu/drm/cirrus/cirrus_ttm.c @@ -262,7 +262,8 @@ int cirrus_mm_init(struct cirrus_device *cirrus) &cirrus_bo_driver, dev->anon_inode->i_mapping, DRM_FILE_PAGE_OFFSET, -true); +true, +0); if (ret) { DRM_ERROR("Error initialising bo driver; %d\n", ret); return ret; diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c index a2d45b74..8f64be4 100644 --- a/drivers/gpu/drm/drm_mm.c +++ b/drivers/gpu/drm/drm_mm.c @@ -82,6 +82,10 @@ * this to implement guard pages between incompatible caching domains in the * graphics TT. * + * Two behaviors are supported for searching and allocating: bottom-up and top-down. + * The default is bottom-up. Top-down allocation can be used if the memory area + * has different restrictions, or just to reduce fragmentation. + * * Finally iteration helpers to walk all nodes and all holes are provided as are * some basic allocator dumpers for debugging. */ @@ -102,7 +106,8 @@ static struct drm_mm_node *drm_mm_search_free_in_range_generic(const struct drm_ static void drm_mm_insert_helper(struct drm_mm_node *hole_node, struct drm_mm_node *node, unsigned long size, unsigned alignment, -unsigned l
[PATCH] drm: Add support for two-ended allocation
Clients like i915 need to segregate cache domains within the GTT which can lead to small amounts of fragmentation. By allocating the uncached buffers from the bottom and the cacheable buffers from the top, we can reduce the amount of wasted space and also optimize allocation of the mappable portion of the GTT to only those buffers that require CPU access through the GTT. For other drivers, allocating small bos from one end and large ones from the other helps improve the quality of fragmentation. Based on drm_mm work by Chris Wilson. -- Radeon uses a 512kb threshold. This decreases eviction by up to 20%, by improving the fragmentation quality. No harm in normal cases that fit VRAM fully (PTS gaming suite). In some cases, even the VRAM-fitting cases improved slightly (openarena, urban terror). 512kb was measured as the most optimal threshold for 3d workloads common to radeon. Other drivers may need different thresholds according to their workloads. Signed-off-by: Lauri Kasanen --- drivers/gpu/drm/ast/ast_ttm.c | 3 +- drivers/gpu/drm/bochs/bochs_mm.c | 3 +- drivers/gpu/drm/cirrus/cirrus_ttm.c | 3 +- drivers/gpu/drm/drm_mm.c | 56 +-- drivers/gpu/drm/i915/i915_gem.c | 3 +- drivers/gpu/drm/i915/i915_gem_gtt.c | 3 +- drivers/gpu/drm/mgag200/mgag200_ttm.c | 3 +- drivers/gpu/drm/nouveau/nouveau_ttm.c | 2 +- drivers/gpu/drm/qxl/qxl_ttm.c | 2 +- drivers/gpu/drm/radeon/radeon_ttm.c | 3 +- drivers/gpu/drm/ttm/ttm_bo.c | 4 ++- drivers/gpu/drm/ttm/ttm_bo_manager.c | 16 -- drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 2 +- include/drm/drm_mm.h | 29 +++--- include/drm/ttm/ttm_bo_driver.h | 7 - 15 files changed, 105 insertions(+), 34 deletions(-) diff --git a/drivers/gpu/drm/ast/ast_ttm.c b/drivers/gpu/drm/ast/ast_ttm.c index b824622..61f9e39 100644 --- a/drivers/gpu/drm/ast/ast_ttm.c +++ b/drivers/gpu/drm/ast/ast_ttm.c @@ -262,7 +262,8 @@ int ast_mm_init(struct ast_private *ast) &ast_bo_driver, dev->anon_inode->i_mapping, DRM_FILE_PAGE_OFFSET, -true); +true, +0); if (ret) { DRM_ERROR("Error initialising bo driver; %d\n", ret); return ret; diff --git a/drivers/gpu/drm/bochs/bochs_mm.c b/drivers/gpu/drm/bochs/bochs_mm.c index f488be5..9dfd24a 100644 --- a/drivers/gpu/drm/bochs/bochs_mm.c +++ b/drivers/gpu/drm/bochs/bochs_mm.c @@ -228,7 +228,8 @@ int bochs_mm_init(struct bochs_device *bochs) &bochs_bo_driver, bochs->dev->anon_inode->i_mapping, DRM_FILE_PAGE_OFFSET, -true); +true, +0); if (ret) { DRM_ERROR("Error initialising bo driver; %d\n", ret); return ret; diff --git a/drivers/gpu/drm/cirrus/cirrus_ttm.c b/drivers/gpu/drm/cirrus/cirrus_ttm.c index 92e6b77..74e8e21 100644 --- a/drivers/gpu/drm/cirrus/cirrus_ttm.c +++ b/drivers/gpu/drm/cirrus/cirrus_ttm.c @@ -262,7 +262,8 @@ int cirrus_mm_init(struct cirrus_device *cirrus) &cirrus_bo_driver, dev->anon_inode->i_mapping, DRM_FILE_PAGE_OFFSET, -true); +true, +0); if (ret) { DRM_ERROR("Error initialising bo driver; %d\n", ret); return ret; diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c index a2d45b74..1728bcc 100644 --- a/drivers/gpu/drm/drm_mm.c +++ b/drivers/gpu/drm/drm_mm.c @@ -102,7 +102,8 @@ static struct drm_mm_node *drm_mm_search_free_in_range_generic(const struct drm_ static void drm_mm_insert_helper(struct drm_mm_node *hole_node, struct drm_mm_node *node, unsigned long size, unsigned alignment, -unsigned long color) +unsigned long color, +enum drm_mm_allocator_flags flags) { struct drm_mm *mm = hole_node->mm; unsigned long hole_start = drm_mm_hole_node_start(hole_node); @@ -115,12 +116,22 @@ static void drm_mm_insert_helper(struct drm_mm_node *hole_node, if (mm->color_adjust) mm->color_adjust(hole_node, color, &adj_start, &adj_end); + if (flags & DRM_MM_CREATE_TOP) + adj_start = adj_end - size; + if (alignment) { unsigned tmp = adj_start % alignment; -
[PATCH 2/4] drm/ttm: Add optional support for two-ended allocation
On Mon, 31 Mar 2014 14:41:05 +0200 Lucas Stach wrote: > Am Montag, den 31.03.2014, 15:28 +0300 schrieb Lauri Kasanen: > > Allocating small bos from one end and large ones from the other helps > > improve the quality of fragmentation. > > > > This depends on "drm: Optionally create mm blocks from top-to-bottom" by > > Chris Wilson. > > > You are breaking bisectability here. This patch deliberately introduces > a build failure that you only fix up in the next two patches. This isn't > acceptable, you must squash the following patches in to avoid the > breakage. > > Also if the first patch is from Chris Wilson and you only did some small > changes this should be reflected in the author name of the patch. OK, will squash and resend. Then the changes are mostly mine in the singular patch. BTW, how would I handle the case of a different author, when I cannot use git-send-email due to network mail restrictions? I can't send email from an address I don't own. - Lauri
[PATCH 4/4] drm/radeon: Use two-ended allocation with a 512kb threshold
This decreases eviction by up to 20%, by improving the fragmentation quality. No harm in normal cases that fit VRAM fully (PTS gaming suite). In some cases, even the VRAM-fitting cases improved slightly (openarena, urban terror). 512kb was measured as the most optimal threshold for 3d workloads common to radeon. Other drivers may need different thresholds according to their workloads. Signed-off-by: Lauri Kasanen --- drivers/gpu/drm/radeon/radeon_ttm.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c index c8a8a51..1aef339 100644 --- a/drivers/gpu/drm/radeon/radeon_ttm.c +++ b/drivers/gpu/drm/radeon/radeon_ttm.c @@ -710,7 +710,8 @@ int radeon_ttm_init(struct radeon_device *rdev) &radeon_bo_driver, rdev->ddev->anon_inode->i_mapping, DRM_FILE_PAGE_OFFSET, - rdev->need_dma32); + rdev->need_dma32, + 512 * 1024); if (r) { DRM_ERROR("failed initializing buffer object driver(%d).\n", r); return r; -- 1.8.3.1
[PATCH 3/4] drm: Update the drivers for the two-ended allocation param
This updates the other drivers to build, no-op behavior-wise. Signed-off-by: Lauri Kasanen --- drivers/gpu/drm/ast/ast_ttm.c | 3 ++- drivers/gpu/drm/bochs/bochs_mm.c | 3 ++- drivers/gpu/drm/cirrus/cirrus_ttm.c | 3 ++- drivers/gpu/drm/i915/i915_gem.c | 3 ++- drivers/gpu/drm/i915/i915_gem_gtt.c | 3 ++- drivers/gpu/drm/mgag200/mgag200_ttm.c | 3 ++- drivers/gpu/drm/nouveau/nouveau_ttm.c | 2 +- drivers/gpu/drm/qxl/qxl_ttm.c | 2 +- drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 2 +- 9 files changed, 15 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/ast/ast_ttm.c b/drivers/gpu/drm/ast/ast_ttm.c index b824622..61f9e39 100644 --- a/drivers/gpu/drm/ast/ast_ttm.c +++ b/drivers/gpu/drm/ast/ast_ttm.c @@ -262,7 +262,8 @@ int ast_mm_init(struct ast_private *ast) &ast_bo_driver, dev->anon_inode->i_mapping, DRM_FILE_PAGE_OFFSET, -true); +true, +0); if (ret) { DRM_ERROR("Error initialising bo driver; %d\n", ret); return ret; diff --git a/drivers/gpu/drm/bochs/bochs_mm.c b/drivers/gpu/drm/bochs/bochs_mm.c index f488be5..9dfd24a 100644 --- a/drivers/gpu/drm/bochs/bochs_mm.c +++ b/drivers/gpu/drm/bochs/bochs_mm.c @@ -228,7 +228,8 @@ int bochs_mm_init(struct bochs_device *bochs) &bochs_bo_driver, bochs->dev->anon_inode->i_mapping, DRM_FILE_PAGE_OFFSET, -true); +true, +0); if (ret) { DRM_ERROR("Error initialising bo driver; %d\n", ret); return ret; diff --git a/drivers/gpu/drm/cirrus/cirrus_ttm.c b/drivers/gpu/drm/cirrus/cirrus_ttm.c index 92e6b77..74e8e21 100644 --- a/drivers/gpu/drm/cirrus/cirrus_ttm.c +++ b/drivers/gpu/drm/cirrus/cirrus_ttm.c @@ -262,7 +262,8 @@ int cirrus_mm_init(struct cirrus_device *cirrus) &cirrus_bo_driver, dev->anon_inode->i_mapping, DRM_FILE_PAGE_OFFSET, -true); +true, +0); if (ret) { DRM_ERROR("Error initialising bo driver; %d\n", ret); return ret; diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 9c52f68..489bc33 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -3270,7 +3270,8 @@ search_free: ret = drm_mm_insert_node_in_range_generic(&vm->mm, &vma->node, size, alignment, obj->cache_level, 0, gtt_max, - DRM_MM_SEARCH_DEFAULT); + DRM_MM_SEARCH_DEFAULT, + DRM_MM_CREATE_DEFAULT); if (ret) { ret = i915_gem_evict_something(dev, vm, size, alignment, obj->cache_level, flags); diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c index 63a6dc7..dd72d35 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c @@ -1072,7 +1072,8 @@ alloc: &ppgtt->node, GEN6_PD_SIZE, GEN6_PD_ALIGN, 0, 0, dev_priv->gtt.base.total, - DRM_MM_SEARCH_DEFAULT); + DRM_MM_SEARCH_DEFAULT, + DRM_MM_CREATE_DEFAULT); if (ret == -ENOSPC && !retried) { ret = i915_gem_evict_something(dev, &dev_priv->gtt.base, GEN6_PD_SIZE, GEN6_PD_ALIGN, diff --git a/drivers/gpu/drm/mgag200/mgag200_ttm.c b/drivers/gpu/drm/mgag200/mgag200_ttm.c index 5a00e90..6844b24 100644 --- a/drivers/gpu/drm/mgag200/mgag200_ttm.c +++ b/drivers/gpu/drm/mgag200/mgag200_ttm.c @@ -262,7 +262,8 @@ int mgag200_mm_init(struct mga_device *mdev) &mgag200_bo_driver, dev->anon_inode->i_mapping, DRM_FILE_PAGE_OFFSET, -true); +true, +0); if (ret) { DRM_ERROR("Error initialising bo dri
[PATCH 2/4] drm/ttm: Add optional support for two-ended allocation
Allocating small bos from one end and large ones from the other helps improve the quality of fragmentation. This depends on "drm: Optionally create mm blocks from top-to-bottom" by Chris Wilson. Signed-off-by: Lauri Kasanen --- drivers/gpu/drm/ttm/ttm_bo.c | 4 +++- drivers/gpu/drm/ttm/ttm_bo_manager.c | 16 +--- include/drm/ttm/ttm_bo_driver.h | 7 ++- 3 files changed, 22 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 9df79ac..caf7cd3 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -1453,7 +1453,8 @@ int ttm_bo_device_init(struct ttm_bo_device *bdev, struct ttm_bo_driver *driver, struct address_space *mapping, uint64_t file_page_offset, - bool need_dma32) + bool need_dma32, + uint32_t alloc_threshold) { int ret = -EINVAL; @@ -1476,6 +1477,7 @@ int ttm_bo_device_init(struct ttm_bo_device *bdev, bdev->dev_mapping = mapping; bdev->glob = glob; bdev->need_dma32 = need_dma32; + bdev->alloc_threshold = alloc_threshold; bdev->val_seq = 0; spin_lock_init(&bdev->fence_lock); mutex_lock(&glob->device_list_mutex); diff --git a/drivers/gpu/drm/ttm/ttm_bo_manager.c b/drivers/gpu/drm/ttm/ttm_bo_manager.c index c58eba33..db9fcb4 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_manager.c +++ b/drivers/gpu/drm/ttm/ttm_bo_manager.c @@ -55,6 +55,7 @@ static int ttm_bo_man_get_node(struct ttm_mem_type_manager *man, struct ttm_range_manager *rman = (struct ttm_range_manager *) man->priv; struct drm_mm *mm = &rman->mm; struct drm_mm_node *node = NULL; + enum drm_mm_allocator_flags aflags = DRM_MM_CREATE_DEFAULT; unsigned long lpfn; int ret; @@ -65,12 +66,21 @@ static int ttm_bo_man_get_node(struct ttm_mem_type_manager *man, node = kzalloc(sizeof(*node), GFP_KERNEL); if (!node) return -ENOMEM; + /** +* If the driver requested a threshold, use two-ended allocation. +* Pinned buffers require bottom-up allocation. +*/ + if (man->bdev->alloc_threshold && + !(bo->mem.placement & TTM_PL_FLAG_NO_EVICT) && + man->bdev->alloc_threshold < (mem->num_pages * PAGE_SIZE)) + aflags = DRM_MM_CREATE_TOP; spin_lock(&rman->lock); - ret = drm_mm_insert_node_in_range(mm, node, mem->num_pages, - mem->page_alignment, + ret = drm_mm_insert_node_in_range_generic(mm, node, mem->num_pages, + mem->page_alignment, 0, placement->fpfn, lpfn, - DRM_MM_SEARCH_BEST); + DRM_MM_SEARCH_BEST, + aflags); spin_unlock(&rman->lock); if (unlikely(ret)) { diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h index 5d8aabe..f5fe6df 100644 --- a/include/drm/ttm/ttm_bo_driver.h +++ b/include/drm/ttm/ttm_bo_driver.h @@ -565,6 +565,7 @@ struct ttm_bo_device { struct delayed_work wq; bool need_dma32; + uint32_t alloc_threshold; }; /** @@ -751,6 +752,8 @@ extern int ttm_bo_device_release(struct ttm_bo_device *bdev); * @file_page_offset: Offset into the device address space that is available * for buffer data. This ensures compatibility with other users of the * address space. + * @alloc_threshold: If non-zero, use this as the threshold for two-ended + * allocation. * * Initializes a struct ttm_bo_device: * Returns: @@ -760,7 +763,9 @@ extern int ttm_bo_device_init(struct ttm_bo_device *bdev, struct ttm_bo_global *glob, struct ttm_bo_driver *driver, struct address_space *mapping, - uint64_t file_page_offset, bool need_dma32); + uint64_t file_page_offset, + bool need_dma32, + uint32_t alloc_threshold); /** * ttm_bo_unmap_virtual -- 1.8.3.1
[PATCH 1/4] drm: Optionally create mm blocks from top-to-bottom
Clients like i915 need to segregate cache domains within the GTT which can lead to small amounts of fragmentation. By allocating the uncached buffers from the bottom and the cacheable buffers from the top, we can reduce the amount of wasted space and also optimize allocation of the mappable portion of the GTT to only those buffers that require CPU access through the GTT. For other drivers, allocating small bos from one end and large ones from the other helps improve the quality of fragmentation. Original patch by Chris Wilson. v2 by Ben: Update callers in i915_gem_object_bind_to_gtt() Turn search flags and allocation flags into separate enums Make checkpatch happy where logical/easy v3 by Ben: Rebased on top of the many drm_mm changes since the original patches Remove ATOMIC from allocator flags (Chris) Reverse order of TOPDOWN and BOTTOMUP v4 by Lauri: Remove i915 parts, they don't apply Respin on top of drm-next Signed-off-by: Lauri Kasanen --- drivers/gpu/drm/drm_mm.c | 56 +++- include/drm/drm_mm.h | 29 + 2 files changed, 66 insertions(+), 19 deletions(-) diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c index a2d45b74..1728bcc 100644 --- a/drivers/gpu/drm/drm_mm.c +++ b/drivers/gpu/drm/drm_mm.c @@ -102,7 +102,8 @@ static struct drm_mm_node *drm_mm_search_free_in_range_generic(const struct drm_ static void drm_mm_insert_helper(struct drm_mm_node *hole_node, struct drm_mm_node *node, unsigned long size, unsigned alignment, -unsigned long color) +unsigned long color, +enum drm_mm_allocator_flags flags) { struct drm_mm *mm = hole_node->mm; unsigned long hole_start = drm_mm_hole_node_start(hole_node); @@ -115,12 +116,22 @@ static void drm_mm_insert_helper(struct drm_mm_node *hole_node, if (mm->color_adjust) mm->color_adjust(hole_node, color, &adj_start, &adj_end); + if (flags & DRM_MM_CREATE_TOP) + adj_start = adj_end - size; + if (alignment) { unsigned tmp = adj_start % alignment; - if (tmp) - adj_start += alignment - tmp; + if (tmp) { + if (flags & DRM_MM_CREATE_TOP) + adj_start -= tmp; + else + adj_start += alignment - tmp; + } } + BUG_ON(adj_start < hole_start); + BUG_ON(adj_end > hole_end); + if (adj_start == hole_start) { hole_node->hole_follows = 0; list_del(&hole_node->hole_stack); @@ -215,16 +226,17 @@ EXPORT_SYMBOL(drm_mm_reserve_node); int drm_mm_insert_node_generic(struct drm_mm *mm, struct drm_mm_node *node, unsigned long size, unsigned alignment, unsigned long color, - enum drm_mm_search_flags flags) + enum drm_mm_search_flags sflags, + enum drm_mm_allocator_flags aflags) { struct drm_mm_node *hole_node; hole_node = drm_mm_search_free_generic(mm, size, alignment, - color, flags); + color, sflags); if (!hole_node) return -ENOSPC; - drm_mm_insert_helper(hole_node, node, size, alignment, color); + drm_mm_insert_helper(hole_node, node, size, alignment, color, aflags); return 0; } EXPORT_SYMBOL(drm_mm_insert_node_generic); @@ -233,7 +245,8 @@ static void drm_mm_insert_helper_range(struct drm_mm_node *hole_node, struct drm_mm_node *node, unsigned long size, unsigned alignment, unsigned long color, - unsigned long start, unsigned long end) + unsigned long start, unsigned long end, + enum drm_mm_allocator_flags flags) { struct drm_mm *mm = hole_node->mm; unsigned long hole_start = drm_mm_hole_node_start(hole_node); @@ -248,13 +261,20 @@ static void drm_mm_insert_helper_range(struct drm_mm_node *hole_node, if (adj_end > end) adj_end = end; + if (flags & DRM_MM_CREATE_TOP) + adj_start = adj_end - size; + if (mm->color_adjust) mm->color_adjust(hole_node, color, &adj_start, &adj_end); if (alignment) { unsigned tmp = adj_start % alignment; - if (tmp) - adj_start += alignment - tmp; +
[PATCH 0/4] Two-ended allocation support
Cheers, This patchset is respun on top of today's drm-next. It has only been compile-tested, since my test computer is currently on strike in support of Ukraine. It was a simple spin though, so Mr. Murphy will likely stay away. Only Radeon behavior is affected, all other drivers have just compiling changes. - Lauri
[PATCH] drm/radeon: TTM must be init with cpu-visible VRAM, v2
On Sat, 1 Mar 2014 06:47:41 +1000 Dave Airlie wrote: > On Sat, Mar 1, 2014 at 5:56 AM, Christian K?nig > wrote: > > Am 28.02.2014 19:50, schrieb Lauri Kasanen: > > > >> Without this, a bo may get created in the cpu-inaccessible vram. > >> Before the CP engines get setup, all copies are done via cpu memcpy. > >> > >> This means that the cpu tries to read from inaccessible memory, fails, > >> and the radeon module proceeds to disable acceleration. > >> > >> Doing this has no downsides, as the real VRAM size gets set as soon as the > >> CP engines get init. > >> > >> This is a candidate for 3.14 fixes. > >> > >> v2: Add comment on why the function is used > >> Reviewed-by: Christian K?nig >> >> And I suggest to add "Cc: stable at vger.kernel.org" as well. > > Won't this create objects that are stuck in the middle of VRAM with > the new top down approach? > > then when we go to use all the VRAM they'll be pinned in the middle? Yes, the initial pins would act like that with the top-down code. But the top-down logic is 3.15 material and still WIP. Depending on their constraints, I think I'll either add a new flag, or turn them into FIXED allocations - do they need to be at exact position foo or only at the beginning. (Christian?) So sending this fix to stable is safe, as they all use bottom-up. - Lauri
[PATCH] drm/radeon: TTM must be init with cpu-visible VRAM, v2
Without this, a bo may get created in the cpu-inaccessible vram. Before the CP engines get setup, all copies are done via cpu memcpy. This means that the cpu tries to read from inaccessible memory, fails, and the radeon module proceeds to disable acceleration. Doing this has no downsides, as the real VRAM size gets set as soon as the CP engines get init. This is a candidate for 3.14 fixes. v2: Add comment on why the function is used Signed-off-by: Lauri Kasanen --- drivers/gpu/drm/radeon/radeon_ttm.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c index 3aa853c..b966f85 100644 --- a/drivers/gpu/drm/radeon/radeon_ttm.c +++ b/drivers/gpu/drm/radeon/radeon_ttm.c @@ -715,6 +715,9 @@ int radeon_ttm_init(struct radeon_device *rdev) DRM_ERROR("Failed initializing VRAM heap.\n"); return r; } + /* Change the size here instead of the init above so only lpfn is affected */ + radeon_ttm_set_active_vram_size(rdev, rdev->mc.visible_vram_size); + r = radeon_bo_create(rdev, 256 * 1024, PAGE_SIZE, true, RADEON_GEM_DOMAIN_VRAM, NULL, &rdev->stollen_vga_memory); -- 1.8.3.1
[PATCH] drm/radeon: TTM must be init with cpu-visible VRAM
On Fri, 28 Feb 2014 16:43:54 +0100 Christian K?nig wrote: > >>> Am 27.02.2014 22:38, schrieb Lauri Kasanen: > >>>> Without this, a bo may get created in the cpu-inaccessible vram. > >>>> Before the CP engines get setup, all copies are done via cpu memcpy. > >>>> > >>>> This means that the cpu tries to read from inaccessible memory, fails, > >>>> and the radeon module proceeds to disable acceleration. > >>>> > >>>> Doing this has no downsides, as the real VRAM size gets set as soon as > >>>> the > >>>> CP engines get init. > >>>> > >>>> This is a candidate for 3.14 fixes. > >>> This should be unnecessary, since TTM gets initialized only seeing the > >>> visible VRAM and later on radeon_ttm_set_active_vram_size gets called to > >>> increase the limit. > >>> > >>> If this isn't the case any more we should figure out why instead of > >>> working around it like this. > >> Negative, TTM gets initialized with real_vram just a few lines above > >> this patch, not visible_vram. > > git blame shows 7a50f01a from 2009, "drm/radeon/kms: vram sizing on > > certain r100 chips needs workaround." by Dave Airlie. > > > > So the TTM VRAM init has been wrong for five years, and only worked by > > accident because until now all allocations were done bottom-up. > > Yeah, just came to the same conclusion. We probably never hit the case > in the last five years because we don't really access the memory before > we start the copy ring. > > Please fix ttm_bo_init_mm to use rdev->mc.visible_vram_size instead. > > That allocations are made bottom-up is relied upon in a couple of other > cases as well, the stolen VGA memory and the UVD firmware handling for > example. I initially did that, but it caused some issues. I didn't investigage it further, but I guess it has to be init to the maximum size, and then only the size lowered, so that the change only affects lpfn. - Lauri
[PATCH] drm/radeon: TTM must be init with cpu-visible VRAM
On Fri, 28 Feb 2014 17:30:39 +0200 Lauri Kasanen wrote: > On Fri, 28 Feb 2014 10:36:59 +0100 > Christian K?nig wrote: > > > Am 27.02.2014 22:38, schrieb Lauri Kasanen: > > > Without this, a bo may get created in the cpu-inaccessible vram. > > > Before the CP engines get setup, all copies are done via cpu memcpy. > > > > > > This means that the cpu tries to read from inaccessible memory, fails, > > > and the radeon module proceeds to disable acceleration. > > > > > > Doing this has no downsides, as the real VRAM size gets set as soon as the > > > CP engines get init. > > > > > > This is a candidate for 3.14 fixes. > > > > This should be unnecessary, since TTM gets initialized only seeing the > > visible VRAM and later on radeon_ttm_set_active_vram_size gets called to > > increase the limit. > > > > If this isn't the case any more we should figure out why instead of > > working around it like this. > > Negative, TTM gets initialized with real_vram just a few lines above > this patch, not visible_vram. git blame shows 7a50f01a from 2009, "drm/radeon/kms: vram sizing on certain r100 chips needs workaround." by Dave Airlie. So the TTM VRAM init has been wrong for five years, and only worked by accident because until now all allocations were done bottom-up. - Lauri
[PATCH] drm/radeon: TTM must be init with cpu-visible VRAM
On Fri, 28 Feb 2014 10:36:59 +0100 Christian K?nig wrote: > Am 27.02.2014 22:38, schrieb Lauri Kasanen: > > Without this, a bo may get created in the cpu-inaccessible vram. > > Before the CP engines get setup, all copies are done via cpu memcpy. > > > > This means that the cpu tries to read from inaccessible memory, fails, > > and the radeon module proceeds to disable acceleration. > > > > Doing this has no downsides, as the real VRAM size gets set as soon as the > > CP engines get init. > > > > This is a candidate for 3.14 fixes. > > This should be unnecessary, since TTM gets initialized only seeing the > visible VRAM and later on radeon_ttm_set_active_vram_size gets called to > increase the limit. > > If this isn't the case any more we should figure out why instead of > working around it like this. Negative, TTM gets initialized with real_vram just a few lines above this patch, not visible_vram. - Lauri
[RFC] drm/ttm: Add optional support for two-ended allocation
Allocating small bos from one end and large ones from the other helps improve the quality of fragmentation. I have measured a suitable threshold to reduce eviction by up to 20%, and to cause no harm in normal cases that fit VRAM fully (PTS gaming suite). In some cases, even the VRAM-fitting cases improved slightly (openarena, urban terror). This depends on "drm: Optionally create mm blocks from top-to-bottom" by Chris Wilson, and so is just RFC for now. --- drivers/gpu/drm/ttm/ttm_bo.c | 4 +++- drivers/gpu/drm/ttm/ttm_bo_manager.c | 12 +--- include/drm/ttm/ttm_bo_driver.h | 5 - 3 files changed, 16 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index a066513..7f30c2d 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -1450,7 +1450,8 @@ int ttm_bo_device_init(struct ttm_bo_device *bdev, struct ttm_bo_global *glob, struct ttm_bo_driver *driver, uint64_t file_page_offset, - bool need_dma32) + bool need_dma32, + uint32_t alloc_threshold) { int ret = -EINVAL; @@ -1473,6 +1474,7 @@ int ttm_bo_device_init(struct ttm_bo_device *bdev, bdev->dev_mapping = NULL; bdev->glob = glob; bdev->need_dma32 = need_dma32; + bdev->alloc_threshold = alloc_threshold; bdev->val_seq = 0; spin_lock_init(&bdev->fence_lock); mutex_lock(&glob->device_list_mutex); diff --git a/drivers/gpu/drm/ttm/ttm_bo_manager.c b/drivers/gpu/drm/ttm/ttm_bo_manager.c index c58eba33..bd1cbf1 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_manager.c +++ b/drivers/gpu/drm/ttm/ttm_bo_manager.c @@ -55,6 +55,7 @@ static int ttm_bo_man_get_node(struct ttm_mem_type_manager *man, struct ttm_range_manager *rman = (struct ttm_range_manager *) man->priv; struct drm_mm *mm = &rman->mm; struct drm_mm_node *node = NULL; + enum drm_mm_allocator_flags aflags = DRM_MM_CREATE_DEFAULT; unsigned long lpfn; int ret; @@ -66,11 +67,16 @@ static int ttm_bo_man_get_node(struct ttm_mem_type_manager *man, if (!node) return -ENOMEM; + if (man->bdev->alloc_threshold && + man->bdev->alloc_threshold < (mem->num_pages * PAGE_SIZE)) + aflags = DRM_MM_CREATE_TOP; + spin_lock(&rman->lock); - ret = drm_mm_insert_node_in_range(mm, node, mem->num_pages, - mem->page_alignment, + ret = drm_mm_insert_node_in_range_generic(mm, node, mem->num_pages, + mem->page_alignment, 0, placement->fpfn, lpfn, - DRM_MM_SEARCH_BEST); + DRM_MM_SEARCH_BEST, + aflags); spin_unlock(&rman->lock); if (unlikely(ret)) { diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h index 32d34eb..831320e 100644 --- a/include/drm/ttm/ttm_bo_driver.h +++ b/include/drm/ttm/ttm_bo_driver.h @@ -565,6 +565,7 @@ struct ttm_bo_device { struct delayed_work wq; bool need_dma32; + uint32_t alloc_threshold; }; /** @@ -758,7 +759,9 @@ extern int ttm_bo_device_release(struct ttm_bo_device *bdev); extern int ttm_bo_device_init(struct ttm_bo_device *bdev, struct ttm_bo_global *glob, struct ttm_bo_driver *driver, - uint64_t file_page_offset, bool need_dma32); + uint64_t file_page_offset, + bool need_dma32, + uint32_t alloc_threshold); /** * ttm_bo_unmap_virtual -- 1.8.3.1
[PATCH] drm/radeon: TTM must be init with cpu-visible VRAM
Without this, a bo may get created in the cpu-inaccessible vram. Before the CP engines get setup, all copies are done via cpu memcpy. This means that the cpu tries to read from inaccessible memory, fails, and the radeon module proceeds to disable acceleration. Doing this has no downsides, as the real VRAM size gets set as soon as the CP engines get init. This is a candidate for 3.14 fixes. Signed-off-by: Lauri Kasanen --- drivers/gpu/drm/radeon/radeon_ttm.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c index 3aa853c..35eae91 100644 --- a/drivers/gpu/drm/radeon/radeon_ttm.c +++ b/drivers/gpu/drm/radeon/radeon_ttm.c @@ -715,6 +715,8 @@ int radeon_ttm_init(struct radeon_device *rdev) DRM_ERROR("Failed initializing VRAM heap.\n"); return r; } + radeon_ttm_set_active_vram_size(rdev, rdev->mc.visible_vram_size); + r = radeon_bo_create(rdev, 256 * 1024, PAGE_SIZE, true, RADEON_GEM_DOMAIN_VRAM, NULL, &rdev->stollen_vga_memory); -- 1.8.3.1
TTM's role in score-based eviction
On Wed, 11 Dec 2013 15:46:53 +0100 Thomas Hellstrom wrote: > > I think the kernel just has to trust userspace on this. I can't think > > of any way of not involving userspace, so if somebody really wants to > > hack mesa to gain some fps advantage on a multiseat system, let them ;) > > > > Basically, they already can hack mesa to pass invalid buffers to cause > > a hang/crash the kernel. So we already trust userspace more than this > > new functionality would. > > Yes, but these are two different things. Letting user-space pin buffers > by design is building in a software DOS in the kernel. > I don't think even Microsoft is allowing this, and AFAICT we've avoided > that since the very dawn of kernel buffer management. > > Not having a perfect command stream parser or proper GPU hang recovery > mechanism is something else, and something we wish > to have but don't at the moment. > > Allowing a new type of DOS just because we have other flaws isn't moving > things forward, but i guess in the end it's your choice. The worst case with the scoring is that a new client will work somewhat slower than it otherwise would. I wouldn't call this a DOS. Instead I would compare it to nice levels. Still, I agree with your concern that a user could disturb another user. This wouldn't be an issue within a single user environment, as the user obviously wanted it if he went that far. Perhaps we could solve that by taking the process's UID into account inside the kernel. If there are multiple UIDs with 3d processes running, reserve a chunk of VRAM for each? - Lauri
TTM's role in score-based eviction
On Wed, 11 Dec 2013 12:04:05 +0900 Michel D?nzer wrote: > > Of all the worries that exist, this is a non-issue. Userspace can > > simply queue a lot of draw calls that take 1 second each through the > > normal command submission methods, why would it need to tweak some > > obscure number to cause some eviction? > > That's not what I'm concerned about. > > Consider e.g. a multiseat environment: Some users could patch their > userspace drivers such that their buffers are more likely to stay in > VRAM than those of other users. > > I agree it's not a huge issue, I'm just saying we should try to make the > score calculation as much as possible based on the actual usage of the > buffers instead of on meta data provided by userspace. We don't have that in the kernel. Only userspace has the accurate stats on usage. If we instead modified userspace to pass these stats, it would have the exact same issue of "what if somebody passes false data"? Maarten: > Well, the easiest solution is to make the score only count as penalty, > and set buffers that don't have the meta-data to maximum score. This > preserves current behavior for clients that aren't score aware. No, this would be the exact opposite: it would pin the old-userspace buffers, at the cost of possibly not letting proper-scored buffers in VRAM. Thomas: > I agree with Michel that some mechanism needs to be in place to stop > user-space clients from effectively pinning buffers by giving them a certain > > score. I think the kernel just has to trust userspace on this. I can't think of any way of not involving userspace, so if somebody really wants to hack mesa to gain some fps advantage on a multiseat system, let them ;) Basically, they already can hack mesa to pass invalid buffers to cause a hang/crash the kernel. So we already trust userspace more than this new functionality would. > 2) If score is calculated in user-space, how are shared buffers handled? Good question, I don't know yet. - Lauri
TTM's role in score-based eviction
On Mon, 9 Dec 2013 23:45:12 +0100 Marek Ol??k wrote: > > Note that the hotness calculation will be in userspace, as only there > > are the necessary counters available. So the finished hotness score > > will be passed to the kernel, instead of sending all the necessary data > > there. Ought to be less context switches that way. > > This sounds good, but you will also need to update the DDX for > everything up to and including Cayman. Hopefully the DDX doesn't emit > IBs outside of glamor on Southern Islands and later chips. Do you mean to pass an empty score (0) for 2d buffers, or that 2d buffers should also get the calculation? I suppose this depends on which ioctl is used to pass it. IMHO for 2d use the current behavior is ok, so passing a dummy value and falling back should be enough. - Lauri
TTM's role in score-based eviction
On Mon, 9 Dec 2013 20:28:21 +0100 Marek Ol??k wrote: Hi, > FYI, since the userspace driver sends end-of-frame markers to the > kernel, the radeon kernel driver knows the current frame number and it > can also save the frame number of the last use of each buffer. We > should definitely use that to measure the buffer hotness, or just > prevent eviction if the buffer was used recently (the last 2 or 3 > frames) and you can drop the hotness calculations entirely. I think this would result in sub-optimal behavior with one client, but a workload larger than VRAM. If everything is needed in one frame, then this logic would almost randomly decide what gets to stay. > Also, MSAA buffers and depth buffers should have higher probability of > being placed in VRAM than other buffers, because their placement has > higher impact on performance. They also tend to contain auxiliary data > which significantly improve performance, like fast clear data, MSAA > fragment coverage data, and hierarchical depth and stencil data. We > can add a new ioctl which sets buffer usage flags. Thanks, this info will be useful. Note that the hotness calculation will be in userspace, as only there are the necessary counters available. So the finished hotness score will be passed to the kernel, instead of sending all the necessary data there. Ought to be less context switches that way. - Lauri
TTM's role in score-based eviction
Hi list, Thomas, I will be investigating the use of a hotness score for each bo, to replace the ping-pong causing LRU eviction in radeon*. The goal is to put all bos that fit in VRAM there, in order of hotness; a new bo should only be placed there if its hotness score is greater than the lowest VRAM bo's. Then the lowest-hotness-bos in VRAM should be evicted until the new bo fits. This should result in a more stable set with less ping-pong. Jerome advised that the bo placement should be done entirely outside TTM. As I'm not (yet) too familiar with that side of the kernel, what is the opinion of TTM folks? - Lauri * github.com/clbr/jamkthesis
radeon: Make PM info available to all, not just debug users, yearly re-roll
Hi list, In June 2012 I started a discussion on moving the radeon PM info out from debugfs, into a proper place available to all, be it proc, sysfs, or some other way. For the rationale please see the archives: http://www.mail-archive.com/dri-devel at lists.freedesktop.org/msg23401.html Jerome's reply at the time was that since big changes are coming, he didn't want to commit to any ABI at that time. DPM has now landed. Jerome wanted to implement a custom ioctl for unprivileged apps to query the power details (and ideally other GPU info too). As now the big changes have happened, what's the status on this ioctl? Pre-emptive -enopatch: I'm not the right person to design an ioctl, with exactly zero such experience in making kernel ABIs. I do volunteer to making the userspace app that displays the values and info. - Lauri
radeon: Make PM info available to all, not just debug users, yearly re-roll
Hi list, In June 2012 I started a discussion on moving the radeon PM info out from debugfs, into a proper place available to all, be it proc, sysfs, or some other way. For the rationale please see the archives: http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg23401.html Jerome's reply at the time was that since big changes are coming, he didn't want to commit to any ABI at that time. DPM has now landed. Jerome wanted to implement a custom ioctl for unprivileged apps to query the power details (and ideally other GPU info too). As now the big changes have happened, what's the status on this ioctl? Pre-emptive -enopatch: I'm not the right person to design an ioctl, with exactly zero such experience in making kernel ABIs. I do volunteer to making the userspace app that displays the values and info. - Lauri ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH] drm/radeon: Remove unused functions
This applies on top of drm/radeon: Mark all possible functions / structs as static. Signed-off-by: Lauri Kasanen --- drivers/gpu/drm/radeon/r100.c | 44 --- drivers/gpu/drm/radeon/radeon_combios.c |9 - drivers/gpu/drm/radeon/radeon_fb.c | 16 -- drivers/gpu/drm/radeon/radeon_legacy_crtc.c |5 --- 4 files changed, 0 insertions(+), 74 deletions(-) diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c index b51fc49..ff9b692 100644 --- a/drivers/gpu/drm/radeon/r100.c +++ b/drivers/gpu/drm/radeon/r100.c @@ -2544,50 +2544,6 @@ static void r100_errata(struct radeon_device *rdev) } } -/* Wait for vertical sync on primary CRTC */ -static void r100_gpu_wait_for_vsync(struct radeon_device *rdev) -{ - uint32_t crtc_gen_cntl, tmp; - int i; - - crtc_gen_cntl = RREG32(RADEON_CRTC_GEN_CNTL); - if ((crtc_gen_cntl & RADEON_CRTC_DISP_REQ_EN_B) || - !(crtc_gen_cntl & RADEON_CRTC_EN)) { - return; - } - /* Clear the CRTC_VBLANK_SAVE bit */ - WREG32(RADEON_CRTC_STATUS, RADEON_CRTC_VBLANK_SAVE_CLEAR); - for (i = 0; i < rdev->usec_timeout; i++) { - tmp = RREG32(RADEON_CRTC_STATUS); - if (tmp & RADEON_CRTC_VBLANK_SAVE) { - return; - } - DRM_UDELAY(1); - } -} - -/* Wait for vertical sync on secondary CRTC */ -static void r100_gpu_wait_for_vsync2(struct radeon_device *rdev) -{ - uint32_t crtc2_gen_cntl, tmp; - int i; - - crtc2_gen_cntl = RREG32(RADEON_CRTC2_GEN_CNTL); - if ((crtc2_gen_cntl & RADEON_CRTC2_DISP_REQ_EN_B) || - !(crtc2_gen_cntl & RADEON_CRTC2_EN)) - return; - - /* Clear the CRTC_VBLANK_SAVE bit */ - WREG32(RADEON_CRTC2_STATUS, RADEON_CRTC2_VBLANK_SAVE_CLEAR); - for (i = 0; i < rdev->usec_timeout; i++) { - tmp = RREG32(RADEON_CRTC2_STATUS); - if (tmp & RADEON_CRTC2_VBLANK_SAVE) { - return; - } - DRM_UDELAY(1); - } -} - static int r100_rbbm_fifo_wait_for_entry(struct radeon_device *rdev, unsigned n) { unsigned i; diff --git a/drivers/gpu/drm/radeon/radeon_combios.c b/drivers/gpu/drm/radeon/radeon_combios.c index c2bfac2..fe225cc 100644 --- a/drivers/gpu/drm/radeon/radeon_combios.c +++ b/drivers/gpu/drm/radeon/radeon_combios.c @@ -3304,15 +3304,6 @@ static void combios_write_ram_size(struct drm_device *dev) WREG32(RADEON_CONFIG_MEMSIZE, mem_size); } -static void radeon_combios_dyn_clk_setup(struct drm_device *dev, int enable) -{ - uint16_t dyn_clk_info = - combios_get_table_offset(dev, COMBIOS_DYN_CLK_1_TABLE); - - if (dyn_clk_info) - combios_parse_pll_table(dev, dyn_clk_info); -} - void radeon_combios_asic_init(struct drm_device *dev) { struct radeon_device *rdev = dev->dev_private; diff --git a/drivers/gpu/drm/radeon/radeon_fb.c b/drivers/gpu/drm/radeon/radeon_fb.c index 8b088ca..6ada72c 100644 --- a/drivers/gpu/drm/radeon/radeon_fb.c +++ b/drivers/gpu/drm/radeon/radeon_fb.c @@ -316,22 +316,6 @@ static int radeon_fb_find_or_create_single(struct drm_fb_helper *helper, return new_fb; } -static char *mode_option; -static int radeon_parse_options(char *options) -{ - char *this_opt; - - if (!options || !*options) - return 0; - - while ((this_opt = strsep(&options, ",")) != NULL) { - if (!*this_opt) - continue; - mode_option = this_opt; - } - return 0; -} - void radeon_fb_output_poll_changed(struct radeon_device *rdev) { drm_fb_helper_hotplug_event(&rdev->mode_info.rfbdev->helper); diff --git a/drivers/gpu/drm/radeon/radeon_legacy_crtc.c b/drivers/gpu/drm/radeon/radeon_legacy_crtc.c index 31022c2..08a60bd 100644 --- a/drivers/gpu/drm/radeon/radeon_legacy_crtc.c +++ b/drivers/gpu/drm/radeon/radeon_legacy_crtc.c @@ -206,11 +206,6 @@ static void radeon_legacy_rmx_mode_set(struct drm_crtc *crtc, WREG32(RADEON_FP_CRTC_V_TOTAL_DISP, fp_crtc_v_total_disp); } -static void radeon_restore_common_regs(struct drm_device *dev) -{ - /* don't need this yet */ -} - static void radeon_pll_wait_for_read_update_complete(struct drm_device *dev) { struct radeon_device *rdev = dev->dev_private; -- 1.7.2.1
[PATCH] drm/radeon: Mark all possible functions / structs as static
On Tue, 31 Jul 2012 14:43:34 +1000 Dave Airlie wrote: > On Tue, Jul 31, 2012 at 12:45 AM, Alex Deucher > wrote: > > On Fri, Jul 27, 2012 at 5:34 PM, Lauri Kasanen wrote: > >> Let's allow GCC to optimize better. > >> > >> This exposed some five unused functions, but this patch doesn't remove > >> them. > >> > >> Signed-off-by: Lauri Kasanen > > > > Reviewed-by: Alex Deucher > > You might want to take this one step further and fix all the warnings > it introduces here, > > CC [M] drivers/gpu/drm/radeon/radeon_combios.o > /home/airlied/devel/kernel/drm-next/drivers/gpu/drm/radeon/radeon_combios.c:3307:13: > warning: ?radeon_combios_dyn_clk_setup? defined but not used > [-Wunused-function] > CC [M] drivers/gpu/drm/radeon/radeon_legacy_crtc.o > /home/airlied/devel/kernel/drm-next/drivers/gpu/drm/radeon/radeon_legacy_crtc.c:209:13: > warning: ?radeon_restore_common_regs? defined but not used > [-Wunused-function] > > CC [M] drivers/gpu/drm/radeon/r100.o > /home/airlied/devel/kernel/drm-next/drivers/gpu/drm/radeon/r100.c:2548:13: > warning: ?r100_gpu_wait_for_vsync? defined but not used > [-Wunused-function] > /home/airlied/devel/kernel/drm-next/drivers/gpu/drm/radeon/r100.c:2570:13: > warning: ?r100_gpu_wait_for_vsync2? defined but not used > [-Wunused-function] > > Not sure we want to lose all these functions or just if 0 them out for > reference purposes, or even why we might need to call some of them. That's why I didn't remove them. I don't have the info on whether they are needed or not, someone qualified will need to make that decision. - Lauri
[PATCH] drm/radeon: Remove unused functions
This applies on top of drm/radeon: Mark all possible functions / structs as static. Signed-off-by: Lauri Kasanen --- drivers/gpu/drm/radeon/r100.c | 44 --- drivers/gpu/drm/radeon/radeon_combios.c |9 - drivers/gpu/drm/radeon/radeon_fb.c | 16 -- drivers/gpu/drm/radeon/radeon_legacy_crtc.c |5 --- 4 files changed, 0 insertions(+), 74 deletions(-) diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c index b51fc49..ff9b692 100644 --- a/drivers/gpu/drm/radeon/r100.c +++ b/drivers/gpu/drm/radeon/r100.c @@ -2544,50 +2544,6 @@ static void r100_errata(struct radeon_device *rdev) } } -/* Wait for vertical sync on primary CRTC */ -static void r100_gpu_wait_for_vsync(struct radeon_device *rdev) -{ - uint32_t crtc_gen_cntl, tmp; - int i; - - crtc_gen_cntl = RREG32(RADEON_CRTC_GEN_CNTL); - if ((crtc_gen_cntl & RADEON_CRTC_DISP_REQ_EN_B) || - !(crtc_gen_cntl & RADEON_CRTC_EN)) { - return; - } - /* Clear the CRTC_VBLANK_SAVE bit */ - WREG32(RADEON_CRTC_STATUS, RADEON_CRTC_VBLANK_SAVE_CLEAR); - for (i = 0; i < rdev->usec_timeout; i++) { - tmp = RREG32(RADEON_CRTC_STATUS); - if (tmp & RADEON_CRTC_VBLANK_SAVE) { - return; - } - DRM_UDELAY(1); - } -} - -/* Wait for vertical sync on secondary CRTC */ -static void r100_gpu_wait_for_vsync2(struct radeon_device *rdev) -{ - uint32_t crtc2_gen_cntl, tmp; - int i; - - crtc2_gen_cntl = RREG32(RADEON_CRTC2_GEN_CNTL); - if ((crtc2_gen_cntl & RADEON_CRTC2_DISP_REQ_EN_B) || - !(crtc2_gen_cntl & RADEON_CRTC2_EN)) - return; - - /* Clear the CRTC_VBLANK_SAVE bit */ - WREG32(RADEON_CRTC2_STATUS, RADEON_CRTC2_VBLANK_SAVE_CLEAR); - for (i = 0; i < rdev->usec_timeout; i++) { - tmp = RREG32(RADEON_CRTC2_STATUS); - if (tmp & RADEON_CRTC2_VBLANK_SAVE) { - return; - } - DRM_UDELAY(1); - } -} - static int r100_rbbm_fifo_wait_for_entry(struct radeon_device *rdev, unsigned n) { unsigned i; diff --git a/drivers/gpu/drm/radeon/radeon_combios.c b/drivers/gpu/drm/radeon/radeon_combios.c index c2bfac2..fe225cc 100644 --- a/drivers/gpu/drm/radeon/radeon_combios.c +++ b/drivers/gpu/drm/radeon/radeon_combios.c @@ -3304,15 +3304,6 @@ static void combios_write_ram_size(struct drm_device *dev) WREG32(RADEON_CONFIG_MEMSIZE, mem_size); } -static void radeon_combios_dyn_clk_setup(struct drm_device *dev, int enable) -{ - uint16_t dyn_clk_info = - combios_get_table_offset(dev, COMBIOS_DYN_CLK_1_TABLE); - - if (dyn_clk_info) - combios_parse_pll_table(dev, dyn_clk_info); -} - void radeon_combios_asic_init(struct drm_device *dev) { struct radeon_device *rdev = dev->dev_private; diff --git a/drivers/gpu/drm/radeon/radeon_fb.c b/drivers/gpu/drm/radeon/radeon_fb.c index 8b088ca..6ada72c 100644 --- a/drivers/gpu/drm/radeon/radeon_fb.c +++ b/drivers/gpu/drm/radeon/radeon_fb.c @@ -316,22 +316,6 @@ static int radeon_fb_find_or_create_single(struct drm_fb_helper *helper, return new_fb; } -static char *mode_option; -static int radeon_parse_options(char *options) -{ - char *this_opt; - - if (!options || !*options) - return 0; - - while ((this_opt = strsep(&options, ",")) != NULL) { - if (!*this_opt) - continue; - mode_option = this_opt; - } - return 0; -} - void radeon_fb_output_poll_changed(struct radeon_device *rdev) { drm_fb_helper_hotplug_event(&rdev->mode_info.rfbdev->helper); diff --git a/drivers/gpu/drm/radeon/radeon_legacy_crtc.c b/drivers/gpu/drm/radeon/radeon_legacy_crtc.c index 31022c2..08a60bd 100644 --- a/drivers/gpu/drm/radeon/radeon_legacy_crtc.c +++ b/drivers/gpu/drm/radeon/radeon_legacy_crtc.c @@ -206,11 +206,6 @@ static void radeon_legacy_rmx_mode_set(struct drm_crtc *crtc, WREG32(RADEON_FP_CRTC_V_TOTAL_DISP, fp_crtc_v_total_disp); } -static void radeon_restore_common_regs(struct drm_device *dev) -{ - /* don't need this yet */ -} - static void radeon_pll_wait_for_read_update_complete(struct drm_device *dev) { struct radeon_device *rdev = dev->dev_private; -- 1.7.2.1 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] drm/radeon: Mark all possible functions / structs as static
On Tue, 31 Jul 2012 14:43:34 +1000 Dave Airlie wrote: > On Tue, Jul 31, 2012 at 12:45 AM, Alex Deucher wrote: > > On Fri, Jul 27, 2012 at 5:34 PM, Lauri Kasanen wrote: > >> Let's allow GCC to optimize better. > >> > >> This exposed some five unused functions, but this patch doesn't remove > >> them. > >> > >> Signed-off-by: Lauri Kasanen > > > > Reviewed-by: Alex Deucher > > You might want to take this one step further and fix all the warnings > it introduces here, > > CC [M] drivers/gpu/drm/radeon/radeon_combios.o > /home/airlied/devel/kernel/drm-next/drivers/gpu/drm/radeon/radeon_combios.c:3307:13: > warning: ‘radeon_combios_dyn_clk_setup’ defined but not used > [-Wunused-function] > CC [M] drivers/gpu/drm/radeon/radeon_legacy_crtc.o > /home/airlied/devel/kernel/drm-next/drivers/gpu/drm/radeon/radeon_legacy_crtc.c:209:13: > warning: ‘radeon_restore_common_regs’ defined but not used > [-Wunused-function] > > CC [M] drivers/gpu/drm/radeon/r100.o > /home/airlied/devel/kernel/drm-next/drivers/gpu/drm/radeon/r100.c:2548:13: > warning: ‘r100_gpu_wait_for_vsync’ defined but not used > [-Wunused-function] > /home/airlied/devel/kernel/drm-next/drivers/gpu/drm/radeon/r100.c:2570:13: > warning: ‘r100_gpu_wait_for_vsync2’ defined but not used > [-Wunused-function] > > Not sure we want to lose all these functions or just if 0 them out for > reference purposes, or even why we might need to call some of them. That's why I didn't remove them. I don't have the info on whether they are needed or not, someone qualified will need to make that decision. - Lauri ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH] drm/radeon: Mark all possible functions / structs as static
Let's allow GCC to optimize better. This exposed some five unused functions, but this patch doesn't remove them. Signed-off-by: Lauri Kasanen --- drivers/gpu/drm/radeon/atombios_encoders.c |4 +- drivers/gpu/drm/radeon/evergreen.c | 12 +- drivers/gpu/drm/radeon/evergreen_cs.c |2 +- drivers/gpu/drm/radeon/ni.c |8 +++--- drivers/gpu/drm/radeon/r100.c | 14 ++-- drivers/gpu/drm/radeon/r300.c |4 +- drivers/gpu/drm/radeon/r520.c |4 +- drivers/gpu/drm/radeon/r600.c | 18 drivers/gpu/drm/radeon/r600_cs.c|2 +- drivers/gpu/drm/radeon/r600_hdmi.c |2 +- drivers/gpu/drm/radeon/radeon_combios.c |2 +- drivers/gpu/drm/radeon/radeon_connectors.c | 28 +- drivers/gpu/drm/radeon/radeon_cs.c |2 +- drivers/gpu/drm/radeon/radeon_device.c |2 +- drivers/gpu/drm/radeon/radeon_fb.c |2 +- drivers/gpu/drm/radeon/radeon_fence.c |2 +- drivers/gpu/drm/radeon/radeon_ioc32.c |2 +- drivers/gpu/drm/radeon/radeon_legacy_crtc.c |4 +- drivers/gpu/drm/radeon/radeon_ring.c|8 +++--- drivers/gpu/drm/radeon/radeon_test.c|2 +- drivers/gpu/drm/radeon/radeon_ttm.c |2 +- drivers/gpu/drm/radeon/rs400.c |6 ++-- drivers/gpu/drm/radeon/rs600.c | 14 ++-- drivers/gpu/drm/radeon/rs690.c |6 ++-- drivers/gpu/drm/radeon/rv515.c | 18 drivers/gpu/drm/radeon/rv770.c | 10 drivers/gpu/drm/radeon/si.c |6 ++-- 27 files changed, 93 insertions(+), 93 deletions(-) diff --git a/drivers/gpu/drm/radeon/atombios_encoders.c b/drivers/gpu/drm/radeon/atombios_encoders.c index f9bc27f..260fd01 100644 --- a/drivers/gpu/drm/radeon/atombios_encoders.c +++ b/drivers/gpu/drm/radeon/atombios_encoders.c @@ -2289,7 +2289,7 @@ static const struct drm_encoder_funcs radeon_atom_enc_funcs = { .destroy = radeon_enc_destroy, }; -struct radeon_encoder_atom_dac * +static struct radeon_encoder_atom_dac * radeon_atombios_set_dac_info(struct radeon_encoder *radeon_encoder) { struct drm_device *dev = radeon_encoder->base.dev; @@ -2303,7 +2303,7 @@ radeon_atombios_set_dac_info(struct radeon_encoder *radeon_encoder) return dac; } -struct radeon_encoder_atom_dig * +static struct radeon_encoder_atom_dig * radeon_atombios_set_dig_info(struct radeon_encoder *radeon_encoder) { int encoder_enum = (radeon_encoder->encoder_enum & ENUM_ID_MASK) >> ENUM_ID_SHIFT; diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeon/evergreen.c index e585a3b..32b0522 100644 --- a/drivers/gpu/drm/radeon/evergreen.c +++ b/drivers/gpu/drm/radeon/evergreen.c @@ -1109,7 +1109,7 @@ void evergreen_pcie_gart_tlb_flush(struct radeon_device *rdev) } } -int evergreen_pcie_gart_enable(struct radeon_device *rdev) +static int evergreen_pcie_gart_enable(struct radeon_device *rdev) { u32 tmp; int r; @@ -1168,7 +1168,7 @@ int evergreen_pcie_gart_enable(struct radeon_device *rdev) return 0; } -void evergreen_pcie_gart_disable(struct radeon_device *rdev) +static void evergreen_pcie_gart_disable(struct radeon_device *rdev) { u32 tmp; @@ -1193,7 +1193,7 @@ void evergreen_pcie_gart_disable(struct radeon_device *rdev) radeon_gart_table_vram_unpin(rdev); } -void evergreen_pcie_gart_fini(struct radeon_device *rdev) +static void evergreen_pcie_gart_fini(struct radeon_device *rdev) { evergreen_pcie_gart_disable(rdev); radeon_gart_table_vram_free(rdev); @@ -1201,7 +1201,7 @@ void evergreen_pcie_gart_fini(struct radeon_device *rdev) } -void evergreen_agp_enable(struct radeon_device *rdev) +static void evergreen_agp_enable(struct radeon_device *rdev) { u32 tmp; @@ -1614,7 +1614,7 @@ static int evergreen_cp_start(struct radeon_device *rdev) return 0; } -int evergreen_cp_resume(struct radeon_device *rdev) +static int evergreen_cp_resume(struct radeon_device *rdev) { struct radeon_ring *ring = &rdev->ring[RADEON_RING_TYPE_GFX_INDEX]; u32 tmp; @@ -2775,7 +2775,7 @@ static void evergreen_irq_ack(struct radeon_device *rdev) } } -void evergreen_irq_disable(struct radeon_device *rdev) +static void evergreen_irq_disable(struct radeon_device *rdev) { r600_disable_interrupts(rdev); /* Wait and acknowledge irq */ diff --git a/drivers/gpu/drm/radeon/evergreen_cs.c b/drivers/gpu/drm/radeon/evergreen_cs.c index c1655412..b93a617 100644 --- a/drivers/gpu/drm/radeon/evergreen_cs.c +++ b/drivers/gpu/drm/radeon/evergreen_cs.c @@ -986,7 +986,7 @@ static int evergreen_cs_track_check(struct radeon_cs_parser *p) * Assume that chunk_ib_index is properly set. Will return -EI
[PATCH] drm/radeon: Mark all possible functions / structs as static
Let's allow GCC to optimize better. This exposed some five unused functions, but this patch doesn't remove them. Signed-off-by: Lauri Kasanen --- drivers/gpu/drm/radeon/atombios_encoders.c |4 +- drivers/gpu/drm/radeon/evergreen.c | 12 +- drivers/gpu/drm/radeon/evergreen_cs.c |2 +- drivers/gpu/drm/radeon/ni.c |8 +++--- drivers/gpu/drm/radeon/r100.c | 14 ++-- drivers/gpu/drm/radeon/r300.c |4 +- drivers/gpu/drm/radeon/r520.c |4 +- drivers/gpu/drm/radeon/r600.c | 18 drivers/gpu/drm/radeon/r600_cs.c|2 +- drivers/gpu/drm/radeon/r600_hdmi.c |2 +- drivers/gpu/drm/radeon/radeon_combios.c |2 +- drivers/gpu/drm/radeon/radeon_connectors.c | 28 +- drivers/gpu/drm/radeon/radeon_cs.c |2 +- drivers/gpu/drm/radeon/radeon_device.c |2 +- drivers/gpu/drm/radeon/radeon_fb.c |2 +- drivers/gpu/drm/radeon/radeon_fence.c |2 +- drivers/gpu/drm/radeon/radeon_ioc32.c |2 +- drivers/gpu/drm/radeon/radeon_legacy_crtc.c |4 +- drivers/gpu/drm/radeon/radeon_ring.c|8 +++--- drivers/gpu/drm/radeon/radeon_test.c|2 +- drivers/gpu/drm/radeon/radeon_ttm.c |2 +- drivers/gpu/drm/radeon/rs400.c |6 ++-- drivers/gpu/drm/radeon/rs600.c | 14 ++-- drivers/gpu/drm/radeon/rs690.c |6 ++-- drivers/gpu/drm/radeon/rv515.c | 18 drivers/gpu/drm/radeon/rv770.c | 10 drivers/gpu/drm/radeon/si.c |6 ++-- 27 files changed, 93 insertions(+), 93 deletions(-) diff --git a/drivers/gpu/drm/radeon/atombios_encoders.c b/drivers/gpu/drm/radeon/atombios_encoders.c index f9bc27f..260fd01 100644 --- a/drivers/gpu/drm/radeon/atombios_encoders.c +++ b/drivers/gpu/drm/radeon/atombios_encoders.c @@ -2289,7 +2289,7 @@ static const struct drm_encoder_funcs radeon_atom_enc_funcs = { .destroy = radeon_enc_destroy, }; -struct radeon_encoder_atom_dac * +static struct radeon_encoder_atom_dac * radeon_atombios_set_dac_info(struct radeon_encoder *radeon_encoder) { struct drm_device *dev = radeon_encoder->base.dev; @@ -2303,7 +2303,7 @@ radeon_atombios_set_dac_info(struct radeon_encoder *radeon_encoder) return dac; } -struct radeon_encoder_atom_dig * +static struct radeon_encoder_atom_dig * radeon_atombios_set_dig_info(struct radeon_encoder *radeon_encoder) { int encoder_enum = (radeon_encoder->encoder_enum & ENUM_ID_MASK) >> ENUM_ID_SHIFT; diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeon/evergreen.c index e585a3b..32b0522 100644 --- a/drivers/gpu/drm/radeon/evergreen.c +++ b/drivers/gpu/drm/radeon/evergreen.c @@ -1109,7 +1109,7 @@ void evergreen_pcie_gart_tlb_flush(struct radeon_device *rdev) } } -int evergreen_pcie_gart_enable(struct radeon_device *rdev) +static int evergreen_pcie_gart_enable(struct radeon_device *rdev) { u32 tmp; int r; @@ -1168,7 +1168,7 @@ int evergreen_pcie_gart_enable(struct radeon_device *rdev) return 0; } -void evergreen_pcie_gart_disable(struct radeon_device *rdev) +static void evergreen_pcie_gart_disable(struct radeon_device *rdev) { u32 tmp; @@ -1193,7 +1193,7 @@ void evergreen_pcie_gart_disable(struct radeon_device *rdev) radeon_gart_table_vram_unpin(rdev); } -void evergreen_pcie_gart_fini(struct radeon_device *rdev) +static void evergreen_pcie_gart_fini(struct radeon_device *rdev) { evergreen_pcie_gart_disable(rdev); radeon_gart_table_vram_free(rdev); @@ -1201,7 +1201,7 @@ void evergreen_pcie_gart_fini(struct radeon_device *rdev) } -void evergreen_agp_enable(struct radeon_device *rdev) +static void evergreen_agp_enable(struct radeon_device *rdev) { u32 tmp; @@ -1614,7 +1614,7 @@ static int evergreen_cp_start(struct radeon_device *rdev) return 0; } -int evergreen_cp_resume(struct radeon_device *rdev) +static int evergreen_cp_resume(struct radeon_device *rdev) { struct radeon_ring *ring = &rdev->ring[RADEON_RING_TYPE_GFX_INDEX]; u32 tmp; @@ -2775,7 +2775,7 @@ static void evergreen_irq_ack(struct radeon_device *rdev) } } -void evergreen_irq_disable(struct radeon_device *rdev) +static void evergreen_irq_disable(struct radeon_device *rdev) { r600_disable_interrupts(rdev); /* Wait and acknowledge irq */ diff --git a/drivers/gpu/drm/radeon/evergreen_cs.c b/drivers/gpu/drm/radeon/evergreen_cs.c index c1655412..b93a617 100644 --- a/drivers/gpu/drm/radeon/evergreen_cs.c +++ b/drivers/gpu/drm/radeon/evergreen_cs.c @@ -986,7 +986,7 @@ static int evergreen_cs_track_check(struct radeon_cs_parser *p) * Assume that chunk_ib_index is properly set. Will retur
[PATCH 2/2] drm/radeon: Make sure the correct TA bit is used for R700
The declarations were moved around because of a GCC warning, "ISO C90 forbids mixed declarations and code". (why so strict?) Signed-off-by: Lauri Kasanen --- drivers/gpu/drm/radeon/r600.c | 16 1 files changed, 12 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c index bff6272..c77cdba 100644 --- a/drivers/gpu/drm/radeon/r600.c +++ b/drivers/gpu/drm/radeon/r600.c @@ -1278,15 +1278,24 @@ void r600_vram_scratch_fini(struct radeon_device *rdev) int r600_gpu_soft_reset(struct radeon_device *rdev) { struct rv515_mc_save save; - u32 grbm_busy_mask = S_008010_VC_BUSY(1) | S_008010_VGT_BUSY_NO_DMA(1) | - S_008010_VGT_BUSY(1) | S_008010_TA03_BUSY(1) | + u32 grbm_busy_mask; + u32 grbm2_busy_mask; + u32 tmp; + + u32 ta_busy = S_008010_TA03_BUSY(1); + + if (rdev->family >= CHIP_RV770) + ta_busy = S_008010_R700_TA03_BUSY(1); + + grbm_busy_mask = S_008010_VC_BUSY(1) | S_008010_VGT_BUSY_NO_DMA(1) | + S_008010_VGT_BUSY(1) | ta_busy | S_008010_TC_BUSY(1) | S_008010_SX_BUSY(1) | S_008010_SH_BUSY(1) | S_008010_SPI03_BUSY(1) | S_008010_SMX_BUSY(1) | S_008010_SC_BUSY(1) | S_008010_PA_BUSY(1) | S_008010_DB03_BUSY(1) | S_008010_CR_BUSY(1) | S_008010_CB03_BUSY(1) | S_008010_GUI_ACTIVE(1); - u32 grbm2_busy_mask = S_008014_SPI0_BUSY(1) | S_008014_SPI1_BUSY(1) | + grbm2_busy_mask = S_008014_SPI0_BUSY(1) | S_008014_SPI1_BUSY(1) | S_008014_SPI2_BUSY(1) | S_008014_SPI3_BUSY(1) | S_008014_TA0_BUSY(1) | S_008014_TA1_BUSY(1) | S_008014_TA2_BUSY(1) | S_008014_TA3_BUSY(1) | @@ -1294,7 +1303,6 @@ int r600_gpu_soft_reset(struct radeon_device *rdev) S_008014_DB2_BUSY(1) | S_008014_DB3_BUSY(1) | S_008014_CB0_BUSY(1) | S_008014_CB1_BUSY(1) | S_008014_CB2_BUSY(1) | S_008014_CB3_BUSY(1); - u32 tmp; if (!(RREG32(GRBM_STATUS) & GUI_ACTIVE)) return 0; -- 1.7.2.1
[PATCH 1/2] drm/radeon: Document that R700 has a different TA status bit than R600
Hi This info comes from r600_demo, and was confirmed correct on a RV710. Signed-off-by: Lauri Kasanen --- drivers/gpu/drm/radeon/r600d.h |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/drivers/gpu/drm/radeon/r600d.h b/drivers/gpu/drm/radeon/r600d.h index 025fd5b..b818a5c 100644 --- a/drivers/gpu/drm/radeon/r600d.h +++ b/drivers/gpu/drm/radeon/r600d.h @@ -1192,6 +1192,7 @@ #defineS_008010_VC_BUSY(x) (((x) & 1) << 11) #defineS_008010_DB03_CLEAN(x) (((x) & 1) << 12) #defineS_008010_CB03_CLEAN(x) (((x) & 1) << 13) +#defineS_008010_R700_TA03_BUSY(x) (((x) & 1) << 14) #defineS_008010_VGT_BUSY_NO_DMA(x) (((x) & 1) << 16) #defineS_008010_VGT_BUSY(x)(((x) & 1) << 17) #defineS_008010_TA03_BUSY(x) (((x) & 1) << 18) -- 1.7.2.1
[PATCH 2/2] drm/radeon: Make sure the correct TA bit is used for R700
The declarations were moved around because of a GCC warning, "ISO C90 forbids mixed declarations and code". (why so strict?) Signed-off-by: Lauri Kasanen --- drivers/gpu/drm/radeon/r600.c | 16 1 files changed, 12 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c index bff6272..c77cdba 100644 --- a/drivers/gpu/drm/radeon/r600.c +++ b/drivers/gpu/drm/radeon/r600.c @@ -1278,15 +1278,24 @@ void r600_vram_scratch_fini(struct radeon_device *rdev) int r600_gpu_soft_reset(struct radeon_device *rdev) { struct rv515_mc_save save; - u32 grbm_busy_mask = S_008010_VC_BUSY(1) | S_008010_VGT_BUSY_NO_DMA(1) | - S_008010_VGT_BUSY(1) | S_008010_TA03_BUSY(1) | + u32 grbm_busy_mask; + u32 grbm2_busy_mask; + u32 tmp; + + u32 ta_busy = S_008010_TA03_BUSY(1); + + if (rdev->family >= CHIP_RV770) + ta_busy = S_008010_R700_TA03_BUSY(1); + + grbm_busy_mask = S_008010_VC_BUSY(1) | S_008010_VGT_BUSY_NO_DMA(1) | + S_008010_VGT_BUSY(1) | ta_busy | S_008010_TC_BUSY(1) | S_008010_SX_BUSY(1) | S_008010_SH_BUSY(1) | S_008010_SPI03_BUSY(1) | S_008010_SMX_BUSY(1) | S_008010_SC_BUSY(1) | S_008010_PA_BUSY(1) | S_008010_DB03_BUSY(1) | S_008010_CR_BUSY(1) | S_008010_CB03_BUSY(1) | S_008010_GUI_ACTIVE(1); - u32 grbm2_busy_mask = S_008014_SPI0_BUSY(1) | S_008014_SPI1_BUSY(1) | + grbm2_busy_mask = S_008014_SPI0_BUSY(1) | S_008014_SPI1_BUSY(1) | S_008014_SPI2_BUSY(1) | S_008014_SPI3_BUSY(1) | S_008014_TA0_BUSY(1) | S_008014_TA1_BUSY(1) | S_008014_TA2_BUSY(1) | S_008014_TA3_BUSY(1) | @@ -1294,7 +1303,6 @@ int r600_gpu_soft_reset(struct radeon_device *rdev) S_008014_DB2_BUSY(1) | S_008014_DB3_BUSY(1) | S_008014_CB0_BUSY(1) | S_008014_CB1_BUSY(1) | S_008014_CB2_BUSY(1) | S_008014_CB3_BUSY(1); - u32 tmp; if (!(RREG32(GRBM_STATUS) & GUI_ACTIVE)) return 0; -- 1.7.2.1 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 1/2] drm/radeon: Document that R700 has a different TA status bit than R600
Hi This info comes from r600_demo, and was confirmed correct on a RV710. Signed-off-by: Lauri Kasanen --- drivers/gpu/drm/radeon/r600d.h |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/drivers/gpu/drm/radeon/r600d.h b/drivers/gpu/drm/radeon/r600d.h index 025fd5b..b818a5c 100644 --- a/drivers/gpu/drm/radeon/r600d.h +++ b/drivers/gpu/drm/radeon/r600d.h @@ -1192,6 +1192,7 @@ #defineS_008010_VC_BUSY(x) (((x) & 1) << 11) #defineS_008010_DB03_CLEAN(x) (((x) & 1) << 12) #defineS_008010_CB03_CLEAN(x) (((x) & 1) << 13) +#defineS_008010_R700_TA03_BUSY(x) (((x) & 1) << 14) #defineS_008010_VGT_BUSY_NO_DMA(x) (((x) & 1) << 16) #defineS_008010_VGT_BUSY(x)(((x) & 1) << 17) #defineS_008010_TA03_BUSY(x) (((x) & 1) << 18) -- 1.7.2.1 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH libdrm] intel: Fix build failure in test_decode.c
On Mon, 2 Jul 2012 14:54:58 -0700 Ben Widawsky wrote: > > +#define _GNU_SOURCE > > + > > #include > > #include > > #include > > I can't reproduce this. Can anyone else confirm this is broken, and if > so that the above patch fixes it? See the manpage, it depends on the glibc version. Libdrm is broken on glibc < 2.10. - Lauri
Re: [PATCH libdrm] intel: Fix build failure in test_decode.c
On Mon, 2 Jul 2012 14:54:58 -0700 Ben Widawsky wrote: > > +#define _GNU_SOURCE > > + > > #include > > #include > > #include > > I can't reproduce this. Can anyone else confirm this is broken, and if > so that the above patch fixes it? See the manpage, it depends on the glibc version. Libdrm is broken on glibc < 2.10. - Lauri ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH libdrm] intel: Fix build failure in test_decode.c
Hi list The recently released libdrm 2.4.37 does not compile the Intel part: test_decode.c: In function 'compare_batch': test_decode.c:107: error: implicit declaration of function 'open_memstream' PS: Please CC me. Signed-off-by: Lauri Kasanen --- intel/test_decode.c |2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/intel/test_decode.c b/intel/test_decode.c index c9ab7ad..0fcdf3b 100644 --- a/intel/test_decode.c +++ b/intel/test_decode.c @@ -21,6 +21,8 @@ * IN THE SOFTWARE. */ +#define _GNU_SOURCE + #include #include #include -- 1.7.2.1
[PATCH libdrm] intel: Fix build failure in test_decode.c
Hi list The recently released libdrm 2.4.37 does not compile the Intel part: test_decode.c: In function 'compare_batch': test_decode.c:107: error: implicit declaration of function 'open_memstream' PS: Please CC me. Signed-off-by: Lauri Kasanen --- intel/test_decode.c |2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/intel/test_decode.c b/intel/test_decode.c index c9ab7ad..0fcdf3b 100644 --- a/intel/test_decode.c +++ b/intel/test_decode.c @@ -21,6 +21,8 @@ * IN THE SOFTWARE. */ +#define _GNU_SOURCE + #include #include #include -- 1.7.2.1 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH] radeon: Make PM info available to all, not just debug users
On Mon, 4 Jun 2012 13:05:46 -0400 Jerome Glisse wrote: > My dream here is to talk with the gnome folks to have them make some > kind GPU module we could write and that would show in control center. > I just need to corner some of my gnome coworker to work something out. > So if you were using nouveau you would a nouveau specific control > center entry where there could be nouveau specific information (clock > voltage) and control. Same for radeon, intel ... As I don't use Gnome or any other heavy DE, I'll end up writing a cli tool for that (radeon, at the start if they differ) anyway. So don't worry about the userspace part, just expose everything kernel-side :) - Lauri
Re: [PATCH] radeon: Make PM info available to all, not just debug users
On Mon, 4 Jun 2012 13:05:46 -0400 Jerome Glisse wrote: > My dream here is to talk with the gnome folks to have them make some > kind GPU module we could write and that would show in control center. > I just need to corner some of my gnome coworker to work something out. > So if you were using nouveau you would a nouveau specific control > center entry where there could be nouveau specific information (clock > voltage) and control. Same for radeon, intel ... As I don't use Gnome or any other heavy DE, I'll end up writing a cli tool for that (radeon, at the start if they differ) anyway. So don't worry about the userspace part, just expose everything kernel-side :) - Lauri ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH] radeon: Make PM info available to all, not just debug users
On Mon, 04 Jun 2012 18:18:10 +0200 Christian K?nig wrote: > > My experience is that things that are true today for GPU, are not > > tomorrow. Yes there will still be clock/voltage, but there could be > > new complete different things, like shutting down block. > > > > I am not even mentioning things like complex value range dependency > > btw things (for instance if a domain as clock/voltage in certain range > > than some other domain can only have clock in a restricted range > > value). > > > > While i agree that sysfs looks easy for user to play with, i believe > > that gui is what you really after and afaik closed source driver all > > expose a gui for their power management. Using ioctl allow better > > overall control, like atomic setting of several domain ... > > Yeah, I think Jerome is right here. > > The internals like different voltage areas, dependencies of clocks, > possible powered down chip areas, etc are very complex and actually > completely unteresting to the end user. Also the general direction of > AMD hardware is going to a complete self containing system, e.g. the > chip is handling mostly everything on its own. > > I agree that this is better done in a hardware dependent ioctl and then > abstracted in userspace instead of pushing the whole abstraction into > the kernel. Hi As long as the info is made available better than it is currently, I'm all for it. Though with the direction of ioctl and folks, it is deep outside my expertise. With a deep solution like that someone who knows the area well would need to do it. - Lauri
[PATCH] radeon: Make PM info available to all, not just debug users
On Sun, 03 Jun 2012 12:54:30 +0200 Christian K?nig wrote: > > This moves the pm_info file from debugfs to next to the other two power > > files. > > > > Requested by several users at Phoronix. > > > > PS: Please CC me. Also please be gentle, it's my first step in kernel-land > > ;) > Hui? What should this be good for? > > Sysfs files are for setting driver parameters, like the power management > method or profile currently in use. One major advantage of sysfs is the > strict rules for a permanent and machine usable interface, for example > it is mandatory to only specify one parameter per sysfs file. > > Debugfs on the other hand should be used for human readable > informations, e.g. the printing the current clocks in a human readable > form. Also you don't need a debug build or turn on any other debugging > facility to get those information, just take a look under > "sys/kernel/debug/dri/*". I have no such dir, /sys/kernel/debug. The fact you have it means you have CONFIG_DEBUGFS enabled and mounted. > So the code is actually quite valid as it is. First, the current location is illogical, and several users have complained about it. This info should be right next to where it is tweaked, ie right next to power_profile and power_method. That is where it's expected to be by users. Secondly, checking the clocks is absolutely not a debug operation. Therefore requiring a debug option (CONFIG_DEBUGFS) to see this info is plain wrong. This info needs to be available to all users, including those on production kernels without such debug options. -- So the issue is the location of the info, not the format. I'd be more than happy to split it into six files (default_core_clock, current_core_clock...) that each offer just a kHz number, just like the cpufreq scaling_cur_freq do. Would that be preferable? - Lauri
Re: [PATCH] radeon: Make PM info available to all, not just debug users
On Mon, 04 Jun 2012 18:18:10 +0200 Christian König wrote: > > My experience is that things that are true today for GPU, are not > > tomorrow. Yes there will still be clock/voltage, but there could be > > new complete different things, like shutting down block. > > > > I am not even mentioning things like complex value range dependency > > btw things (for instance if a domain as clock/voltage in certain range > > than some other domain can only have clock in a restricted range > > value). > > > > While i agree that sysfs looks easy for user to play with, i believe > > that gui is what you really after and afaik closed source driver all > > expose a gui for their power management. Using ioctl allow better > > overall control, like atomic setting of several domain ... > > Yeah, I think Jerome is right here. > > The internals like different voltage areas, dependencies of clocks, > possible powered down chip areas, etc are very complex and actually > completely unteresting to the end user. Also the general direction of > AMD hardware is going to a complete self containing system, e.g. the > chip is handling mostly everything on its own. > > I agree that this is better done in a hardware dependent ioctl and then > abstracted in userspace instead of pushing the whole abstraction into > the kernel. Hi As long as the info is made available better than it is currently, I'm all for it. Though with the direction of ioctl and folks, it is deep outside my expertise. With a deep solution like that someone who knows the area well would need to do it. - Lauri ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] radeon: Make PM info available to all, not just debug users
On Sun, 03 Jun 2012 12:54:30 +0200 Christian König wrote: > > This moves the pm_info file from debugfs to next to the other two power > > files. > > > > Requested by several users at Phoronix. > > > > PS: Please CC me. Also please be gentle, it's my first step in kernel-land > > ;) > Hui? What should this be good for? > > Sysfs files are for setting driver parameters, like the power management > method or profile currently in use. One major advantage of sysfs is the > strict rules for a permanent and machine usable interface, for example > it is mandatory to only specify one parameter per sysfs file. > > Debugfs on the other hand should be used for human readable > informations, e.g. the printing the current clocks in a human readable > form. Also you don't need a debug build or turn on any other debugging > facility to get those information, just take a look under > "sys/kernel/debug/dri/*". I have no such dir, /sys/kernel/debug. The fact you have it means you have CONFIG_DEBUGFS enabled and mounted. > So the code is actually quite valid as it is. First, the current location is illogical, and several users have complained about it. This info should be right next to where it is tweaked, ie right next to power_profile and power_method. That is where it's expected to be by users. Secondly, checking the clocks is absolutely not a debug operation. Therefore requiring a debug option (CONFIG_DEBUGFS) to see this info is plain wrong. This info needs to be available to all users, including those on production kernels without such debug options. -- So the issue is the location of the info, not the format. I'd be more than happy to split it into six files (default_core_clock, current_core_clock...) that each offer just a kHz number, just like the cpufreq scaling_cur_freq do. Would that be preferable? - Lauri ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH] radeon: Make PM info available to all, not just debug users
Hi all This moves the pm_info file from debugfs to next to the other two power files. Requested by several users at Phoronix. PS: Please CC me. Also please be gentle, it's my first step in kernel-land ;) Signed-off-by: Lauri Kasanen --- drivers/gpu/drm/radeon/radeon_pm.c | 86 ++- 1 files changed, 44 insertions(+), 42 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_pm.c b/drivers/gpu/drm/radeon/radeon_pm.c index 0882554..7aab18f 100644 --- a/drivers/gpu/drm/radeon/radeon_pm.c +++ b/drivers/gpu/drm/radeon/radeon_pm.c @@ -45,7 +45,6 @@ static const char *radeon_pm_state_type_name[5] = { }; static void radeon_dynpm_idle_work_handler(struct work_struct *work); -static int radeon_debugfs_pm_init(struct radeon_device *rdev); static bool radeon_pm_in_vbl(struct radeon_device *rdev); static bool radeon_pm_debug_check_in_vbl(struct radeon_device *rdev, bool finish); static void radeon_pm_update_profile(struct radeon_device *rdev); @@ -437,8 +436,49 @@ fail: return count; } +static ssize_t radeon_get_pm_info(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct drm_device *ddev = pci_get_drvdata(to_pci_dev(dev)); + struct radeon_device *rdev = ddev->dev_private; + + ssize_t curpos, len = PAGE_SIZE; + char *tmp; + + curpos = snprintf(buf, len, + "default engine clock: %u0 kHz\n" + "current engine clock: %u0 kHz\n" + "default memory clock: %u0 kHz\n", + rdev->pm.default_sclk, + radeon_get_engine_clock(rdev), + rdev->pm.default_mclk); + len -= curpos; + + if (rdev->asic->get_memory_clock) { + tmp = buf + curpos; + curpos += snprintf(tmp, len, "current memory clock: %u0 kHz\n", radeon_get_memory_clock(rdev)); + len = PAGE_SIZE - curpos; + } + + if (rdev->pm.current_vddc) { + tmp = buf + curpos; + curpos += snprintf(tmp, len, "voltage: %u mV\n", rdev->pm.current_vddc); + len = PAGE_SIZE - curpos; + } + + if (rdev->asic->get_pcie_lanes) { + tmp = buf + curpos; + curpos += snprintf(tmp, len, "PCIE lanes: %d\n", radeon_get_pcie_lanes(rdev)); + len = PAGE_SIZE - curpos; + } + + return curpos; +} + static DEVICE_ATTR(power_profile, S_IRUGO | S_IWUSR, radeon_get_pm_profile, radeon_set_pm_profile); static DEVICE_ATTR(power_method, S_IRUGO | S_IWUSR, radeon_get_pm_method, radeon_set_pm_method); +static DEVICE_ATTR(power_info, S_IRUGO, radeon_get_pm_info, NULL); static ssize_t radeon_hwmon_show_temp(struct device *dev, struct device_attribute *attr, @@ -639,14 +679,14 @@ int radeon_pm_init(struct radeon_device *rdev) ret = device_create_file(rdev->dev, &dev_attr_power_method); if (ret) DRM_ERROR("failed to create device file for power method\n"); + ret = device_create_file(rdev->dev, &dev_attr_power_info); + if (ret) + DRM_ERROR("failed to create device file for power info\n"); #ifdef CONFIG_ACPI rdev->acpi_nb.notifier_call = radeon_acpi_event; register_acpi_notifier(&rdev->acpi_nb); #endif - if (radeon_debugfs_pm_init(rdev)) { - DRM_ERROR("Failed to register debugfs file for PM!\n"); - } DRM_INFO("radeon: power management initialized\n"); } @@ -843,41 +883,3 @@ static void radeon_dynpm_idle_work_handler(struct work_struct *work) mutex_unlock(&rdev->pm.mutex); ttm_bo_unlock_delayed_workqueue(&rdev->mman.bdev, resched); } - -/* - * Debugfs info - */ -#if defined(CONFIG_DEBUG_FS) - -static int radeon_debugfs_pm_info(struct seq_file *m, void *data) -{ - struct drm_info_node *node = (struct drm_info_node *) m->private; - struct drm_device *dev = node->minor->dev; - struct radeon_device *rdev = dev->dev_private; - - seq_printf(m, "default engine clock: %u0 kHz\n", rdev->pm.default_sclk); - seq_printf(m, "current engine clock: %u0 kHz\n", radeon_get_engine_clock(rdev)); - seq_printf(m, "default memory clock: %u0 kHz\n", rdev->pm.default_mclk); - if (rdev->asic->pm.get_memory_clock) - seq_printf(m, "current memory clock: %u0 kHz\n", radeon_get_memory_clock(rdev)); - if (rdev->pm.current_vddc) - seq_printf(m, "voltage: %u mV\n", rdev
[PATCH] radeon: Make PM info available to all, not just debug users
Hi all This moves the pm_info file from debugfs to next to the other two power files. Requested by several users at Phoronix. PS: Please CC me. Also please be gentle, it's my first step in kernel-land ;) Signed-off-by: Lauri Kasanen --- drivers/gpu/drm/radeon/radeon_pm.c | 86 ++- 1 files changed, 44 insertions(+), 42 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_pm.c b/drivers/gpu/drm/radeon/radeon_pm.c index 0882554..7aab18f 100644 --- a/drivers/gpu/drm/radeon/radeon_pm.c +++ b/drivers/gpu/drm/radeon/radeon_pm.c @@ -45,7 +45,6 @@ static const char *radeon_pm_state_type_name[5] = { }; static void radeon_dynpm_idle_work_handler(struct work_struct *work); -static int radeon_debugfs_pm_init(struct radeon_device *rdev); static bool radeon_pm_in_vbl(struct radeon_device *rdev); static bool radeon_pm_debug_check_in_vbl(struct radeon_device *rdev, bool finish); static void radeon_pm_update_profile(struct radeon_device *rdev); @@ -437,8 +436,49 @@ fail: return count; } +static ssize_t radeon_get_pm_info(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct drm_device *ddev = pci_get_drvdata(to_pci_dev(dev)); + struct radeon_device *rdev = ddev->dev_private; + + ssize_t curpos, len = PAGE_SIZE; + char *tmp; + + curpos = snprintf(buf, len, + "default engine clock: %u0 kHz\n" + "current engine clock: %u0 kHz\n" + "default memory clock: %u0 kHz\n", + rdev->pm.default_sclk, + radeon_get_engine_clock(rdev), + rdev->pm.default_mclk); + len -= curpos; + + if (rdev->asic->get_memory_clock) { + tmp = buf + curpos; + curpos += snprintf(tmp, len, "current memory clock: %u0 kHz\n", radeon_get_memory_clock(rdev)); + len = PAGE_SIZE - curpos; + } + + if (rdev->pm.current_vddc) { + tmp = buf + curpos; + curpos += snprintf(tmp, len, "voltage: %u mV\n", rdev->pm.current_vddc); + len = PAGE_SIZE - curpos; + } + + if (rdev->asic->get_pcie_lanes) { + tmp = buf + curpos; + curpos += snprintf(tmp, len, "PCIE lanes: %d\n", radeon_get_pcie_lanes(rdev)); + len = PAGE_SIZE - curpos; + } + + return curpos; +} + static DEVICE_ATTR(power_profile, S_IRUGO | S_IWUSR, radeon_get_pm_profile, radeon_set_pm_profile); static DEVICE_ATTR(power_method, S_IRUGO | S_IWUSR, radeon_get_pm_method, radeon_set_pm_method); +static DEVICE_ATTR(power_info, S_IRUGO, radeon_get_pm_info, NULL); static ssize_t radeon_hwmon_show_temp(struct device *dev, struct device_attribute *attr, @@ -639,14 +679,14 @@ int radeon_pm_init(struct radeon_device *rdev) ret = device_create_file(rdev->dev, &dev_attr_power_method); if (ret) DRM_ERROR("failed to create device file for power method\n"); + ret = device_create_file(rdev->dev, &dev_attr_power_info); + if (ret) + DRM_ERROR("failed to create device file for power info\n"); #ifdef CONFIG_ACPI rdev->acpi_nb.notifier_call = radeon_acpi_event; register_acpi_notifier(&rdev->acpi_nb); #endif - if (radeon_debugfs_pm_init(rdev)) { - DRM_ERROR("Failed to register debugfs file for PM!\n"); - } DRM_INFO("radeon: power management initialized\n"); } @@ -843,41 +883,3 @@ static void radeon_dynpm_idle_work_handler(struct work_struct *work) mutex_unlock(&rdev->pm.mutex); ttm_bo_unlock_delayed_workqueue(&rdev->mman.bdev, resched); } - -/* - * Debugfs info - */ -#if defined(CONFIG_DEBUG_FS) - -static int radeon_debugfs_pm_info(struct seq_file *m, void *data) -{ - struct drm_info_node *node = (struct drm_info_node *) m->private; - struct drm_device *dev = node->minor->dev; - struct radeon_device *rdev = dev->dev_private; - - seq_printf(m, "default engine clock: %u0 kHz\n", rdev->pm.default_sclk); - seq_printf(m, "current engine clock: %u0 kHz\n", radeon_get_engine_clock(rdev)); - seq_printf(m, "default memory clock: %u0 kHz\n", rdev->pm.default_mclk); - if (rdev->asic->pm.get_memory_clock) - seq_printf(m, "current memory clock: %u0 kHz\n", radeon_get_memory_clock(rdev)); - if (rdev->pm.current_vddc) - seq_printf(m, "voltage: %u mV\n", rdev