date:20130826

On Fri, Aug 23, 2013 at 02:08:11AM +0200, Daniel Vetter wrote:
 The important bugfix here is that we must not unlink the vma when
 we keep it around as a placeholder for the execbuf code. Since then we
 won't find it again when execbuf gets interrupt and restarted and
 create a 2nd vma. And since the code as-is isn't fit yet to deal with
 more than one vma, hilarity ensues.
 
 Specifically the dma map/unmap of the sg table isn't adjusted for
 multiple vmas yet and will blow up like this:
 
 BUG: unable to handle kernel NULL pointer dereference at 0008
 IP: [a008fb37] i915_gem_gtt_finish_object+0x73/0xc8 [i915]
 PGD 56bb5067 PUD ad3dd067 PMD 0
 Oops:  [#1] SMP
 Modules linked in: tcp_lp ppdev parport_pc lp parport ipv6 dm_mod dcdbas 
 snd_hda_codec_hdmi pcspkr snd_hda_codec_realtek serio_raw i2c_i801 iTCO_wdt 
 iTCO_vendor_support snd_hda_intel snd_hda_codec lpc_ich snd_hwdep mfd_core 
 snd_pcm snd_page_alloc snd_timer snd soundcore acpi_cpufreq i915 video button 
 drm_kms_helper drm mperf freq_table
 CPU: 1 PID: 16650 Comm: fbo-maxsize Not tainted 
 3.11.0-rc4_nightlytop_d93f59_debug_20130814_+ #6957
 Hardware name: Dell Inc. OptiPlex 9010/03JR84, BIOS A01 05/04/2012
 task: 8800563b3f00 ti: 88004bdf4000 task.ti: 88004bdf4000
 RIP: 0010:[a008fb37]  [a008fb37] 
 i915_gem_gtt_finish_object+0x73/0xc8 [i915]
 RSP: 0018:88004bdf5958  EFLAGS: 00010246
 RAX:  RBX: 8801135e RCX: 8800ad3bf8e0
 RDX: 8800ad3bf8e0 RSI:  RDI: 8801007ee780
 RBP: 88004bdf5978 R08: 8800ad3bf8e0 R09: 
 R10: 86ca1810 R11: 880036a17101 R12: 8801007ee780
 R13: 00018001 R14: 880118c4e000 R15: 8801007ee780
 FS:  7f401a0ce740() GS:88011e28() knlGS:
 CS:  0010 DS:  ES:  CR0: 80050033
 CR2: 0008 CR3: 5635c000 CR4: 001407e0
 Stack:
  8801007ee780 88005c253180 00018000 8801135e
  88004bdf59a8 a0088e55 0011 8801007eec00
  00018000 880036a17101 88004bdf5a08 a0089026
 Call Trace:
  [a0088e55] i915_vma_unbind+0xdf/0x1ab [i915]
  [a0089026] __i915_gem_shrink+0x105/0x177 [i915]
  [a0089452] i915_gem_object_get_pages_gtt+0x108/0x309 [i915]
  [a0085ba9] i915_gem_object_get_pages+0x61/0x90 [i915]
  [a008f22b] ? gen6_ppgtt_insert_entries+0x103/0x125 [i915]
  [a008a113] i915_gem_object_pin+0x1fa/0x5df [i915]
  [a008cdfe] i915_gem_execbuffer_reserve_object.isra.6+0x8d/0x1bc 
 [i915]
  [a008d156] i915_gem_execbuffer_reserve+0x229/0x367 [i915]
  [a008dbf6] i915_gem_do_execbuffer.isra.12+0x4dc/0xf3a [i915]
  [810fc823] ? might_fault+0x40/0x90
  [a008eb89] i915_gem_execbuffer2+0x187/0x222 [i915]
  [a000971c] drm_ioctl+0x308/0x442 [drm]
  [a008ea02] ? i915_gem_execbuffer+0x3ae/0x3ae [i915]
  [817db156] ? __do_page_fault+0x3dd/0x481
  [8112fdba] vfs_ioctl+0x26/0x39
  [811306a2] do_vfs_ioctl+0x40e/0x451
  [817deda7] ? sysret_check+0x1b/0x56
  [8113073c] SyS_ioctl+0x57/0x87
  [8135bbfe] ? trace_hardirqs_on_thunk+0x3a/0x3f
  [817ded82] system_call_fastpath+0x16/0x1b
 Code: 48 c7 c6 84 30 0e a0 31 c0 e8 d0 e9 f7 ff bf c6 a7 00 00 e8 07 af 2c e1 
 41 f6 84 24 03 01 00 00 10 75 44 49 8b 84 24 08 01 00 00 8b 50 08 48 8b 30 
 49 8b 86 b0 04 00 00 48 89 c7 48 81 c7 98 00
 RIP  [a008fb37] i915_gem_gtt_finish_object+0x73/0xc8 [i915]
  RSP 88004bdf5958
 CR2: 0008
 
 As a consequence we need to change the only one vma for now check in
 vma_unbind - since vma_destroy isn't always called the obj-vma_list
 might not be empty. Instead check that the vma list is singular at the
 beginning of vma_unbind. This is also more symmetric with bind_to_vm.
 
 v2:
 - Add a paranoid WARN to mark_free in the eviction code to make sure
   we never try to evict a vma used by the execbuf code right now.
 - Move the check for a temporary execbuf vma into vma_destroy -
   otherwise the failure path cleanup in bind_to_vm will blow up.
 
 Our first attempting at fixing this was
 
 commit 1be81a2f2cfd8789a627401d470423358fba2d76
 Author: Chris Wilson ch...@chris-wilson.co.uk
 Date:   Tue Aug 20 12:56:40 2013 +0100
 
 drm/i915: Don't destroy the vma placeholder during execbuffer reservation
 
 Squash with this when merging!
 
 Cc: Chris Wilson ch...@chris-wilson.co.uk
 Cc: Ben Widawsky b...@bwidawsk.net
 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68298
 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68171
 Signed-off-by: Daniel Vetter daniel.vet...@ffwll.ch

Tested-by: lu hua huax...@intel.com

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
Intel-gfx mailing list

Re: [Intel-gfx] [PATCH] Add second DRI driver name (DRI2DriverVDPAU)

2013-08-26 Thread Rinat Ibragimov

Ping.


Пятница, 16 августа 2013, 14:31 +04:00 от Ибрагимов 
Ринатibragimovri...@mail.ru:
 libvdpau uses second DRI driver name to determine which VDPAU driver
 to use. This patch will allow libvdpau choose libvdpau_i965.so on systems
 with Intel GPUs, libvdpau_nvidia.so on those with nVidia ones, and so on.
 I'm experimenting now with generic vdpau driver using OpenGL/VA-API,
 it would be convenient to have this driver selection working without manual
 driver selection.
 
 Signed-off-by: Rinat ibragimovri...@mail.ru
 ---
  src/sna/sna_dri.c   |5 +++--
  src/uxa/intel_dri.c |5 +++--
  2 files changed, 6 insertions(+), 4 deletions(-)
 
 diff --git a/src/sna/sna_dri.c b/src/sna/sna_dri.c
 index 1569251..0ba373d 100644
 --- a/src/sna/sna_dri.c
 +++ b/src/sna/sna_dri.c
 @@ -2299,7 +2299,7 @@ bool sna_dri_open(struct sna *sna, ScreenPtr screen)
   DRI2InfoRec info;
   int major = 1, minor = 0;
  #if DRI2INFOREC_VERSION = 4
 - const char *driverNames[1];
 + const char *driverNames[2];
  #endif
  
   DBG((%s()\n, __FUNCTION__));
 @@ -2336,9 +2336,10 @@ bool sna_dri_open(struct sna *sna, ScreenPtr screen)
   info.ScheduleSwap = sna_dri_schedule_swap;
   info.GetMSC = sna_dri_get_msc;
   info.ScheduleWaitMSC = sna_dri_schedule_wait_msc;
 - info.numDrivers = 1;
 + info.numDrivers = 2;
   info.driverNames = driverNames;
   driverNames[0] = info.driverName;
 + driverNames[1] = info.driverName;
  #endif
  
  #if DRI2INFOREC_VERSION = 6
 diff --git a/src/uxa/intel_dri.c b/src/uxa/intel_dri.c
 index 0370034..2d33380 100644
 --- a/src/uxa/intel_dri.c
 +++ b/src/uxa/intel_dri.c
 @@ -1553,7 +1553,7 @@ Bool I830DRI2ScreenInit(ScreenPtr screen)
   int dri2_major = 1;
   int dri2_minor = 0;
  #if DRI2INFOREC_VERSION = 4
 - const char *driverNames[1];
 + const char *driverNames[2];
  #endif
  
   if (intel-force_fallback) {
 @@ -1620,9 +1620,10 @@ Bool I830DRI2ScreenInit(ScreenPtr screen)
   info.ScheduleSwap = I830DRI2ScheduleSwap;
   info.GetMSC = I830DRI2GetMSC;
   info.ScheduleWaitMSC = I830DRI2ScheduleWaitMSC;
 - info.numDrivers = 1;
 + info.numDrivers = 2;
   info.driverNames = driverNames;
   driverNames[0] = info.driverName;
 + driverNames[1] = info.driverName;
  #endif
  
   return DRI2ScreenInit(screen, info);
 -- 
 1.7.10.4
 
 ___
 Intel-gfx mailing list
 Intel-gfx@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/intel-gfx
 

---
Rinat
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH 6/6] DRAFT: drm/i915: do adapter power state notification on PC8+ enable/disable

On Fri, Aug 23, 2013 at 05:14:00PM -0300, Paulo Zanoni wrote:
 2013/8/23 Ville Syrjälä ville.syrj...@linux.intel.com:
  On Fri, Aug 23, 2013 at 04:41:57PM -0300, Paulo Zanoni wrote:
  2013/8/23 Kristen Carlson Accardi kristen.c.acca...@intel.com:
   On Fri, 23 Aug 2013 13:44:17 -0300
   Paulo Zanoni przan...@gmail.com wrote:
  
   2013/8/23 Jani Nikula jani.nik...@intel.com:
  
   /* Please insert explanation on why we need this and what changes if
   we do this. */
  
   I applied your patches and booted them. I got into PC8, did the PC8
   test suite and nothing changed. I really don't know what to expect
   from this series and/or how to check what's improving. Also, see
   below:
  
  
   So this is one of these things that will have no visible impact on
   i915, but will impact other parts of the system.  So I think the only
   way to test it is by throwing it on the SIP board and checking the
   power level of the components this impacts (Audio, thermal, KBC/EC,
   LPT).  And without the code which does the actual PCI D3 request from
   i915, nothing will happen.
 
  I was told we could try to verify this by reading PCI config register
  0xd4, but for me it's always 0x0, which means we're probably not
  really getting into D3.
 
  You have to write 0x3 into the PCI PM register to make it enter D3.
  Exactly like you do when you suspend the whole machine.
 
  Not sure 0xd4 is the correct offset in this case, but you can look
  it up from lspci output (remember +4), or in kernel code just use
  pci_dev.pm_cap+4. Hmm, it's 0xd4 in my SNB at least. Maybe it's always
  the same for the GPU.
 
 Check the description of 0xd4 on BSpec.
 
 I actually wrote the get into D3 value to it (using setpci), and
 then when I got out of PC8 I saw tons and tons of error messages on
 dmesg due to the fact that we were failing to write registers. Which
 means it probably works, but we may have more work to do.

My quick and dirty idea for runtime PM would be something like this:
- for ioctls just slap rpm_get_sync/put() around drm_ioctl,
  not optimal but it's very easy for getting started
- all sysfs/debugfs stuff would need to be handled individually
- to deal w/ gtt mmaps just call put_fence or something during suspend,
  grab one ref in fault and probably release it from a delayed work,
  or mabe record a timestamp at last fault time and check it in the idle
  callback
- grab one ref per request (or per active ring maybe?)
- grab one ref per active pipe
- maybe some odd delayed work would need another reference

With that, I think everything of importance would hold a reference,
so the runtime pm idle callback shouldn't really need to do much.

-- 
Ville Syrjälä
Intel OTC
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] Patches for performance improvement

On Sun, Aug 25, 2013 at 04:35:52AM +, Kannan, Vandana wrote:
 Hi,
 
 For sprite performance improvement, instead of the patch 
 drm/i915: Don't wait for vblank for sprite plane flips, we made use of 
 Chris Wilson's patches from RFC asynchronous vblank tasks and verified that 
 it gives the expected performance boost.
 Chris's patches have not been merged yet, specifically,
 drm/i915: Introduce vblank work function
 drm/i915: Asynchronously unpin the old   framebuffer after the 
 next vblank
 drm/i915/sprite: Make plane switching asynchronous
 
 Will the updated patches for these be coming any time soon ? We 
 require these patches as it gives us a very good performance improvement.
 Please let me know, I can verify with the updated patches.

I don't seem to have said patches in my mailbox, so I can't comment on
the details, but...

My main concern is again the contents of the old fb after the setplane
ioctl returns. Due to the current synchronous nature of the ioctl,
userspace knows that when the ioctl returns the flip has completed, and
so can write the old fb again.

If the ioctl is async, userspace has to guess whether the old fb can
be safely written. We could solve this with a new event that gets sent
from setplane. We do have plenty of flags left in setplane so it's
certainly doable.

Also with the async ioctl userspace can issue another setplane before
the previous flip has completed. I personally think that's a nice
feature that allows no-fps-limits triple buffering, but it does make it
a bit more difficult to know which fb is actually being used at a given
time. I have it solved in my atomic page flip code though, as the
drm_flip_helper contraption knows how to track this stuff.

Maybe someone should just start pulling in the drm_flip_helper and
associated code from the atomic code to make this feasible. The new
event type I added there would also be suitable for this purpose as
it carries the plane id and information on which fb is freed by the
operation. The other option is to just add a new event, and disallow
another setplane until the previous has completed (maybe try to
generalize the current page flip code a bit to handle sprites as
well as primary planes?)

-- 
Ville Syrjälä
Intel OTC
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] Patches for performance improvement

On Mon, Aug 26, 2013 at 10:16 AM, Ville Syrjälä
ville.syrj...@linux.intel.com wrote:
 Maybe someone should just start pulling in the drm_flip_helper and
 associated code from the atomic code to make this feasible. The new
 event type I added there would also be suitable for this purpose as
 it carries the plane id and information on which fb is freed by the
 operation. The other option is to just add a new event, and disallow
 another setplane until the previous has completed (maybe try to
 generalize the current page flip code a bit to handle sprites as
 well as primary planes?)

I think extending the current pageflip ioctl to planes would be a
suitable stop-gap measure until we have nuclear pageflips for real.
It's only a 90% solution though since we can't update anything else
than the base address with the current pageflip ioctl.

Actually I think doing this would be a neat way to start pulling in
parts of the nuclear pageflip code and getting all the flip
complete/event signalling straigthened out (and properly tested with
igt like the primary plane flips).
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH 6/6] DRAFT: drm/i915: do adapter power state notification on PC8+ enable/disable

On Mon, Aug 26, 2013 at 9:43 AM, Ville Syrjälä
ville.syrj...@linux.intel.com wrote:
 My quick and dirty idea for runtime PM would be something like this:
 - for ioctls just slap rpm_get_sync/put() around drm_ioctl,
   not optimal but it's very easy for getting started
 - all sysfs/debugfs stuff would need to be handled individually
 - to deal w/ gtt mmaps just call put_fence or something during suspend,
   grab one ref in fault and probably release it from a delayed work,
   or mabe record a timestamp at last fault time and check it in the idle
   callback
 - grab one ref per request (or per active ring maybe?)
 - grab one ref per active pipe
 - maybe some odd delayed work would need another reference

 With that, I think everything of importance would hold a reference,
 so the runtime pm idle callback shouldn't really need to do much.

I think we can do this cheaper:
- Grab a ref everywhere we want to disallow pc8+ currently (by hooking
into the same get/put calls Paulo's patches sprinkled all over). That
should cover 100% of all the runtime stuff - Paulo's pc8+ test is
rather extensive.

That leaves us with the debugfs/sysfs crap doing register access.
Cheap and dirty (and likely to make Chris rage) is to add get/put
calls with a suitable timeout to the register i/o functions. Less
cheap (but I'm unsure whether that's better) would be to sprinkle
get/put calls around all relevant sysfs/debugfs files. Pure s/w stuff
like the error state wouldn't need it though (and we should try hard
not to require it in case runtime pm hangs).
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

[Intel-gfx] [PATCH] drm/i915: More vma fixups around unbind/destroy

The important bugfix here is that we must not unlink the vma when
we keep it around as a placeholder for the execbuf code. Since then we
won't find it again when execbuf gets interrupt and restarted and
create a 2nd vma. And since the code as-is isn't fit yet to deal with
more than one vma, hilarity ensues.

Specifically the dma map/unmap of the sg table isn't adjusted for
multiple vmas yet and will blow up like this:

BUG: unable to handle kernel NULL pointer dereference at 0008
IP: [a008fb37] i915_gem_gtt_finish_object+0x73/0xc8 [i915]
PGD 56bb5067 PUD ad3dd067 PMD 0
Oops:  [#1] SMP
Modules linked in: tcp_lp ppdev parport_pc lp parport ipv6 dm_mod dcdbas 
snd_hda_codec_hdmi pcspkr snd_hda_codec_realtek serio_raw i2c_i801 iTCO_wdt 
iTCO_vendor_support snd_hda_intel snd_hda_codec lpc_ich snd_hwdep mfd_core 
snd_pcm snd_page_alloc snd_timer snd soundcore acpi_cpufreq i915 video button 
drm_kms_helper drm mperf freq_table
CPU: 1 PID: 16650 Comm: fbo-maxsize Not tainted 
3.11.0-rc4_nightlytop_d93f59_debug_20130814_+ #6957
Hardware name: Dell Inc. OptiPlex 9010/03JR84, BIOS A01 05/04/2012
task: 8800563b3f00 ti: 88004bdf4000 task.ti: 88004bdf4000
RIP: 0010:[a008fb37]  [a008fb37] 
i915_gem_gtt_finish_object+0x73/0xc8 [i915]
RSP: 0018:88004bdf5958  EFLAGS: 00010246
RAX:  RBX: 8801135e RCX: 8800ad3bf8e0
RDX: 8800ad3bf8e0 RSI:  RDI: 8801007ee780
RBP: 88004bdf5978 R08: 8800ad3bf8e0 R09: 
R10: 86ca1810 R11: 880036a17101 R12: 8801007ee780
R13: 00018001 R14: 880118c4e000 R15: 8801007ee780
FS:  7f401a0ce740() GS:88011e28() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 0008 CR3: 5635c000 CR4: 001407e0
Stack:
 8801007ee780 88005c253180 00018000 8801135e
 88004bdf59a8 a0088e55 0011 8801007eec00
 00018000 880036a17101 88004bdf5a08 a0089026
Call Trace:
 [a0088e55] i915_vma_unbind+0xdf/0x1ab [i915]
 [a0089026] __i915_gem_shrink+0x105/0x177 [i915]
 [a0089452] i915_gem_object_get_pages_gtt+0x108/0x309 [i915]
 [a0085ba9] i915_gem_object_get_pages+0x61/0x90 [i915]
 [a008f22b] ? gen6_ppgtt_insert_entries+0x103/0x125 [i915]
 [a008a113] i915_gem_object_pin+0x1fa/0x5df [i915]
 [a008cdfe] i915_gem_execbuffer_reserve_object.isra.6+0x8d/0x1bc 
[i915]
 [a008d156] i915_gem_execbuffer_reserve+0x229/0x367 [i915]
 [a008dbf6] i915_gem_do_execbuffer.isra.12+0x4dc/0xf3a [i915]
 [810fc823] ? might_fault+0x40/0x90
 [a008eb89] i915_gem_execbuffer2+0x187/0x222 [i915]
 [a000971c] drm_ioctl+0x308/0x442 [drm]
 [a008ea02] ? i915_gem_execbuffer+0x3ae/0x3ae [i915]
 [817db156] ? __do_page_fault+0x3dd/0x481
 [8112fdba] vfs_ioctl+0x26/0x39
 [811306a2] do_vfs_ioctl+0x40e/0x451
 [817deda7] ? sysret_check+0x1b/0x56
 [8113073c] SyS_ioctl+0x57/0x87
 [8135bbfe] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [817ded82] system_call_fastpath+0x16/0x1b
Code: 48 c7 c6 84 30 0e a0 31 c0 e8 d0 e9 f7 ff bf c6 a7 00 00 e8 07 af 2c e1 
41 f6 84 24 03 01 00 00 10 75 44 49 8b 84 24 08 01 00 00 8b 50 08 48 8b 30 49 
8b 86 b0 04 00 00 48 89 c7 48 81 c7 98 00
RIP  [a008fb37] i915_gem_gtt_finish_object+0x73/0xc8 [i915]
 RSP 88004bdf5958
CR2: 0008

As a consequence we need to change the only one vma for now check in
vma_unbind - since vma_destroy isn't always called the obj-vma_list
might not be empty. Instead check that the vma list is singular at the
beginning of vma_unbind. This is also more symmetric with bind_to_vm.

v2:
- Add a paranoid WARN to mark_free in the eviction code to make sure
  we never try to evict a vma used by the execbuf code right now.
- Move the check for a temporary execbuf vma into vma_destroy -
  otherwise the failure path cleanup in bind_to_vm will blow up.

Our first attempting at fixing this was

commit 1be81a2f2cfd8789a627401d470423358fba2d76
Author: Chris Wilson ch...@chris-wilson.co.uk
Date:   Tue Aug 20 12:56:40 2013 +0100

drm/i915: Don't destroy the vma placeholder during execbuffer reservation

Squash with this when merging!

v3: Improvements suggested in Chris' review:
- Move the WARN_ON in vma_destroy that checks for vmas with an drm_mm
  allocation before the early return.
- Bail out if we hit the WARN in mark_free to hopefully make the
  kernel survive for long enough to capture it.

Cc: Chris Wilson ch...@chris-wilson.co.uk
Cc: Ben Widawsky b...@bwidawsk.net
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68298
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68171
Tested-by: lu hua huax...@intel.com (v2)
Signed-off-by: Daniel Vetter daniel.vet...@ffwll.ch
---
 drivers/gpu/drm/i915/i915_gem.c   | 10 ++

Re: [Intel-gfx] [PATCH] drm/i915: More vma fixups around unbind/destroy

On Mon, Aug 26, 2013 at 11:23:47AM +0200, Daniel Vetter wrote:
 The important bugfix here is that we must not unlink the vma when
 we keep it around as a placeholder for the execbuf code. Since then we
 won't find it again when execbuf gets interrupt and restarted and
 create a 2nd vma. And since the code as-is isn't fit yet to deal with
 more than one vma, hilarity ensues.
 
 Specifically the dma map/unmap of the sg table isn't adjusted for
 multiple vmas yet and will blow up like this:
 
 BUG: unable to handle kernel NULL pointer dereference at 0008
 IP: [a008fb37] i915_gem_gtt_finish_object+0x73/0xc8 [i915]
 PGD 56bb5067 PUD ad3dd067 PMD 0
 Oops:  [#1] SMP
 Modules linked in: tcp_lp ppdev parport_pc lp parport ipv6 dm_mod dcdbas 
 snd_hda_codec_hdmi pcspkr snd_hda_codec_realtek serio_raw i2c_i801 iTCO_wdt 
 iTCO_vendor_support snd_hda_intel snd_hda_codec lpc_ich snd_hwdep mfd_core 
 snd_pcm snd_page_alloc snd_timer snd soundcore acpi_cpufreq i915 video button 
 drm_kms_helper drm mperf freq_table
 CPU: 1 PID: 16650 Comm: fbo-maxsize Not tainted 
 3.11.0-rc4_nightlytop_d93f59_debug_20130814_+ #6957
 Hardware name: Dell Inc. OptiPlex 9010/03JR84, BIOS A01 05/04/2012
 task: 8800563b3f00 ti: 88004bdf4000 task.ti: 88004bdf4000
 RIP: 0010:[a008fb37]  [a008fb37] 
 i915_gem_gtt_finish_object+0x73/0xc8 [i915]
 RSP: 0018:88004bdf5958  EFLAGS: 00010246
 RAX:  RBX: 8801135e RCX: 8800ad3bf8e0
 RDX: 8800ad3bf8e0 RSI:  RDI: 8801007ee780
 RBP: 88004bdf5978 R08: 8800ad3bf8e0 R09: 
 R10: 86ca1810 R11: 880036a17101 R12: 8801007ee780
 R13: 00018001 R14: 880118c4e000 R15: 8801007ee780
 FS:  7f401a0ce740() GS:88011e28() knlGS:
 CS:  0010 DS:  ES:  CR0: 80050033
 CR2: 0008 CR3: 5635c000 CR4: 001407e0
 Stack:
  8801007ee780 88005c253180 00018000 8801135e
  88004bdf59a8 a0088e55 0011 8801007eec00
  00018000 880036a17101 88004bdf5a08 a0089026
 Call Trace:
  [a0088e55] i915_vma_unbind+0xdf/0x1ab [i915]
  [a0089026] __i915_gem_shrink+0x105/0x177 [i915]
  [a0089452] i915_gem_object_get_pages_gtt+0x108/0x309 [i915]
  [a0085ba9] i915_gem_object_get_pages+0x61/0x90 [i915]
  [a008f22b] ? gen6_ppgtt_insert_entries+0x103/0x125 [i915]
  [a008a113] i915_gem_object_pin+0x1fa/0x5df [i915]
  [a008cdfe] i915_gem_execbuffer_reserve_object.isra.6+0x8d/0x1bc 
 [i915]
  [a008d156] i915_gem_execbuffer_reserve+0x229/0x367 [i915]
  [a008dbf6] i915_gem_do_execbuffer.isra.12+0x4dc/0xf3a [i915]
  [810fc823] ? might_fault+0x40/0x90
  [a008eb89] i915_gem_execbuffer2+0x187/0x222 [i915]
  [a000971c] drm_ioctl+0x308/0x442 [drm]
  [a008ea02] ? i915_gem_execbuffer+0x3ae/0x3ae [i915]
  [817db156] ? __do_page_fault+0x3dd/0x481
  [8112fdba] vfs_ioctl+0x26/0x39
  [811306a2] do_vfs_ioctl+0x40e/0x451
  [817deda7] ? sysret_check+0x1b/0x56
  [8113073c] SyS_ioctl+0x57/0x87
  [8135bbfe] ? trace_hardirqs_on_thunk+0x3a/0x3f
  [817ded82] system_call_fastpath+0x16/0x1b
 Code: 48 c7 c6 84 30 0e a0 31 c0 e8 d0 e9 f7 ff bf c6 a7 00 00 e8 07 af 2c e1 
 41 f6 84 24 03 01 00 00 10 75 44 49 8b 84 24 08 01 00 00 8b 50 08 48 8b 30 
 49 8b 86 b0 04 00 00 48 89 c7 48 81 c7 98 00
 RIP  [a008fb37] i915_gem_gtt_finish_object+0x73/0xc8 [i915]
  RSP 88004bdf5958
 CR2: 0008
 
 As a consequence we need to change the only one vma for now check in
 vma_unbind - since vma_destroy isn't always called the obj-vma_list
 might not be empty. Instead check that the vma list is singular at the
 beginning of vma_unbind. This is also more symmetric with bind_to_vm.
 
 v2:
 - Add a paranoid WARN to mark_free in the eviction code to make sure
   we never try to evict a vma used by the execbuf code right now.
 - Move the check for a temporary execbuf vma into vma_destroy -
   otherwise the failure path cleanup in bind_to_vm will blow up.
 
 Our first attempting at fixing this was
 
 commit 1be81a2f2cfd8789a627401d470423358fba2d76
 Author: Chris Wilson ch...@chris-wilson.co.uk
 Date:   Tue Aug 20 12:56:40 2013 +0100
 
 drm/i915: Don't destroy the vma placeholder during execbuffer reservation
 
 Squash with this when merging!
 
 v3: Improvements suggested in Chris' review:
 - Move the WARN_ON in vma_destroy that checks for vmas with an drm_mm
   allocation before the early return.
 - Bail out if we hit the WARN in mark_free to hopefully make the
   kernel survive for long enough to capture it.
 
 Cc: Chris Wilson ch...@chris-wilson.co.uk
 Cc: Ben Widawsky b...@bwidawsk.net
 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68298
 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68171
 Tested-by: lu

Re: [Intel-gfx] [PATCH] drm/i915: More vma fixups around unbind/destroy

On Mon, Aug 26, 2013 at 11:23:47AM +0200, Daniel Vetter wrote:
 The important bugfix here is that we must not unlink the vma when
 we keep it around as a placeholder for the execbuf code. Since then we
 won't find it again when execbuf gets interrupt and restarted and
 create a 2nd vma. And since the code as-is isn't fit yet to deal with
 more than one vma, hilarity ensues.
 
 Specifically the dma map/unmap of the sg table isn't adjusted for
 multiple vmas yet and will blow up like this:
 
 BUG: unable to handle kernel NULL pointer dereference at 0008
 IP: [a008fb37] i915_gem_gtt_finish_object+0x73/0xc8 [i915]
 PGD 56bb5067 PUD ad3dd067 PMD 0
 Oops:  [#1] SMP
 Modules linked in: tcp_lp ppdev parport_pc lp parport ipv6 dm_mod dcdbas 
 snd_hda_codec_hdmi pcspkr snd_hda_codec_realtek serio_raw i2c_i801 iTCO_wdt 
 iTCO_vendor_support snd_hda_intel snd_hda_codec lpc_ich snd_hwdep mfd_core 
 snd_pcm snd_page_alloc snd_timer snd soundcore acpi_cpufreq i915 video button 
 drm_kms_helper drm mperf freq_table
 CPU: 1 PID: 16650 Comm: fbo-maxsize Not tainted 
 3.11.0-rc4_nightlytop_d93f59_debug_20130814_+ #6957
 Hardware name: Dell Inc. OptiPlex 9010/03JR84, BIOS A01 05/04/2012
 task: 8800563b3f00 ti: 88004bdf4000 task.ti: 88004bdf4000
 RIP: 0010:[a008fb37]  [a008fb37] 
 i915_gem_gtt_finish_object+0x73/0xc8 [i915]
 RSP: 0018:88004bdf5958  EFLAGS: 00010246
 RAX:  RBX: 8801135e RCX: 8800ad3bf8e0
 RDX: 8800ad3bf8e0 RSI:  RDI: 8801007ee780
 RBP: 88004bdf5978 R08: 8800ad3bf8e0 R09: 
 R10: 86ca1810 R11: 880036a17101 R12: 8801007ee780
 R13: 00018001 R14: 880118c4e000 R15: 8801007ee780
 FS:  7f401a0ce740() GS:88011e28() knlGS:
 CS:  0010 DS:  ES:  CR0: 80050033
 CR2: 0008 CR3: 5635c000 CR4: 001407e0
 Stack:
  8801007ee780 88005c253180 00018000 8801135e
  88004bdf59a8 a0088e55 0011 8801007eec00
  00018000 880036a17101 88004bdf5a08 a0089026
 Call Trace:
  [a0088e55] i915_vma_unbind+0xdf/0x1ab [i915]
  [a0089026] __i915_gem_shrink+0x105/0x177 [i915]
  [a0089452] i915_gem_object_get_pages_gtt+0x108/0x309 [i915]
  [a0085ba9] i915_gem_object_get_pages+0x61/0x90 [i915]
  [a008f22b] ? gen6_ppgtt_insert_entries+0x103/0x125 [i915]
  [a008a113] i915_gem_object_pin+0x1fa/0x5df [i915]
  [a008cdfe] i915_gem_execbuffer_reserve_object.isra.6+0x8d/0x1bc 
 [i915]
  [a008d156] i915_gem_execbuffer_reserve+0x229/0x367 [i915]
  [a008dbf6] i915_gem_do_execbuffer.isra.12+0x4dc/0xf3a [i915]
  [810fc823] ? might_fault+0x40/0x90
  [a008eb89] i915_gem_execbuffer2+0x187/0x222 [i915]
  [a000971c] drm_ioctl+0x308/0x442 [drm]
  [a008ea02] ? i915_gem_execbuffer+0x3ae/0x3ae [i915]
  [817db156] ? __do_page_fault+0x3dd/0x481
  [8112fdba] vfs_ioctl+0x26/0x39
  [811306a2] do_vfs_ioctl+0x40e/0x451
  [817deda7] ? sysret_check+0x1b/0x56
  [8113073c] SyS_ioctl+0x57/0x87
  [8135bbfe] ? trace_hardirqs_on_thunk+0x3a/0x3f
  [817ded82] system_call_fastpath+0x16/0x1b
 Code: 48 c7 c6 84 30 0e a0 31 c0 e8 d0 e9 f7 ff bf c6 a7 00 00 e8 07 af 2c e1 
 41 f6 84 24 03 01 00 00 10 75 44 49 8b 84 24 08 01 00 00 8b 50 08 48 8b 30 
 49 8b 86 b0 04 00 00 48 89 c7 48 81 c7 98 00
 RIP  [a008fb37] i915_gem_gtt_finish_object+0x73/0xc8 [i915]
  RSP 88004bdf5958
 CR2: 0008
 
 As a consequence we need to change the only one vma for now check in
 vma_unbind - since vma_destroy isn't always called the obj-vma_list
 might not be empty. Instead check that the vma list is singular at the
 beginning of vma_unbind. This is also more symmetric with bind_to_vm.
 
 v2:
 - Add a paranoid WARN to mark_free in the eviction code to make sure
   we never try to evict a vma used by the execbuf code right now.
 - Move the check for a temporary execbuf vma into vma_destroy -
   otherwise the failure path cleanup in bind_to_vm will blow up.
 
 Our first attempting at fixing this was
 
 commit 1be81a2f2cfd8789a627401d470423358fba2d76
 Author: Chris Wilson ch...@chris-wilson.co.uk
 Date:   Tue Aug 20 12:56:40 2013 +0100
 
 drm/i915: Don't destroy the vma placeholder during execbuffer reservation
 
 Squash with this when merging!
 
 v3: Improvements suggested in Chris' review:
 - Move the WARN_ON in vma_destroy that checks for vmas with an drm_mm
   allocation before the early return.
 - Bail out if we hit the WARN in mark_free to hopefully make the
   kernel survive for long enough to capture it.
 
 Cc: Chris Wilson ch...@chris-wilson.co.uk
 Cc: Ben Widawsky b...@bwidawsk.net
 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68298
 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68171
 Tested-by: lu

[Intel-gfx] sysfs: i915_cur_delayinfo is always twice as gt_cur_freq_mhz in 3.8.13.4?

2013-08-26 Thread Cui, Dexuan

Hi all, I'm using haswell with linux-3.8.13.4:
http://kernel.ubuntu.com/git?p=ubuntu/linux.git;a=commitdiff;h=d1baa1360260b5a01938674fc518109a4e5a148d
and I find the CAGF showed by i915_cur_delayinfo is always twice as 
gt_cur_freq_mhz, e.g., when the system is idle, I can get the below log:

- Log begins
# cat /sys/kernel/debug/dri/0/i915_cur_delayinfo; echo;  cat 
/sys/devices/pci:00/:00:02.0/drm/card0/gt_cur_freq_mhz; echo ;
GT_PERF_STATUS: 0x0800
RPSTAT1: 0x00040400
Render p-state ratio: 8
Render p-state VID: 0
Render p-state limit: 255
CAGF: 400MHz
RP CUR UP EI: 55890us
RP CUR UP: 276us
RP PREV UP: 0us
RP CUR DOWN EI: 94295us
RP CUR DOWN: 389us
RP PREV DOWN: 0us
Lowest (RPN) frequency: 200MHz
Nominal (RP1) frequency: 400MHz
Max non-overclocked (RP0) frequency: 1200MHz

200
- Log ends. 

But with the latest 3.10.0, both values are 200, as expected.

I guess there is a fix in 3.10.0 I can pick for my 3.8.13.4?

Thanks,
-- Dexuan

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH] drm/i915: Delay the relase of the forcewake by a jiffie

On Sun, Aug 25, 2013 at 08:45:34PM +0100, Chris Wilson wrote:
 Obtaining the forcwake requires expensive and time consuming
 serialisation. And we often try to obtain the forcewake multiple times
 in very quick succession. We can reduce the overhead of these sequences
 by delaying the forcewake release, and so not hammer the hw quite so
 hard.
 
 I was hoping this would help with the spurious
 [drm:__gen6_gt_force_wake_mt_get] *ERROR* Timed out waiting for forcewake old 
 ack to clear.
 found on Haswell. Alas not.
 
 Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
 ---
  drivers/gpu/drm/i915/i915_dma.c |  2 ++
  drivers/gpu/drm/i915/i915_drv.h |  3 +++
  drivers/gpu/drm/i915/intel_uncore.c | 33 +++--
  3 files changed, 36 insertions(+), 2 deletions(-)
 
 diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
 index 883990f..c0fb23e 100644
 --- a/drivers/gpu/drm/i915/i915_dma.c
 +++ b/drivers/gpu/drm/i915/i915_dma.c
 @@ -1800,6 +1800,8 @@ int i915_driver_unload(struct drm_device *dev)
  
   dev_priv-gtt.base.cleanup(dev_priv-gtt.base);
  
 + intel_uncore_fini(dev);
 +

We've already unmapped the registers.

   if (dev_priv-slab)
   kmem_cache_destroy(dev_priv-slab);
  
 diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
 index a6354c3..8c93d93 100644
 --- a/drivers/gpu/drm/i915/i915_drv.h
 +++ b/drivers/gpu/drm/i915/i915_drv.h
 @@ -406,6 +406,8 @@ struct intel_uncore {
  
   unsigned fifo_count;
   unsigned forcewake_count;
 +
 + struct delayed_work force_wake_work;
  };
  
  #define DEV_INFO_FOR_EACH_FLAG(func, sep) \
 @@ -1792,6 +1794,7 @@ extern void intel_uncore_early_sanitize(struct 
 drm_device *dev);
  extern void intel_uncore_init(struct drm_device *dev);
  extern void intel_uncore_clear_errors(struct drm_device *dev);
  extern void intel_uncore_check_errors(struct drm_device *dev);
 +extern void intel_uncore_fini(struct drm_device *dev);
  
  void
  i915_enable_pipestat(drm_i915_private_t *dev_priv, int pipe, u32 mask);
 diff --git a/drivers/gpu/drm/i915/intel_uncore.c 
 b/drivers/gpu/drm/i915/intel_uncore.c
 index 8f5bc86..50fdad7 100644
 --- a/drivers/gpu/drm/i915/intel_uncore.c
 +++ b/drivers/gpu/drm/i915/intel_uncore.c
 @@ -204,6 +204,18 @@ static void vlv_force_wake_put(struct drm_i915_private 
 *dev_priv)
   gen6_gt_check_fifodbg(dev_priv);
  }
  
 +static void gen6_force_wake_work(struct work_struct *work)
 +{
 + struct drm_i915_private *dev_priv =
 + container_of(work, typeof(*dev_priv), 
 uncore.force_wake_work.work);
 + unsigned long irqflags;
 +
 + spin_lock_irqsave(dev_priv-uncore.lock, irqflags);
 + if (--dev_priv-uncore.forcewake_count == 0)
 + dev_priv-uncore.funcs.force_wake_put(dev_priv);
 + spin_unlock_irqrestore(dev_priv-uncore.lock, irqflags);
 +}
 +
  void intel_uncore_early_sanitize(struct drm_device *dev)
  {
   struct drm_i915_private *dev_priv = dev-dev_private;
 @@ -216,6 +228,9 @@ void intel_uncore_init(struct drm_device *dev)
  {
   struct drm_i915_private *dev_priv = dev-dev_private;
  
 + INIT_DELAYED_WORK(dev_priv-uncore.force_wake_work,
 +   gen6_force_wake_work);
 +
   if (IS_VALLEYVIEW(dev)) {
   dev_priv-uncore.funcs.force_wake_get = vlv_force_wake_get;
   dev_priv-uncore.funcs.force_wake_put = vlv_force_wake_put;
 @@ -261,6 +276,13 @@ void intel_uncore_init(struct drm_device *dev)
   }
  }
  
 +void intel_uncore_fini(struct drm_device *dev)
 +{
 + struct drm_i915_private *dev_priv = dev-dev_private;
 +
 + cancel_delayed_work_sync(dev_priv-uncore.force_wake_work);

Aren't we potentially leaking a forcewake here? Or do we clear it
forcefully somewhere else?

 +}
 +
  void intel_uncore_sanitize(struct drm_device *dev)
  {
   struct drm_i915_private *dev_priv = dev-dev_private;
 @@ -301,8 +323,15 @@ void gen6_gt_force_wake_put(struct drm_i915_private 
 *dev_priv)
   unsigned long irqflags;
  
   spin_lock_irqsave(dev_priv-uncore.lock, irqflags);
 - if (--dev_priv-uncore.forcewake_count == 0)
 - dev_priv-uncore.funcs.force_wake_put(dev_priv);
 + if (--dev_priv-uncore.forcewake_count == 0) {
 + if (dev_priv-uncore.force_wake_work.work.func) {
 + dev_priv-uncore.forcewake_count++;
 + mod_delayed_work(dev_priv-wq,
 +  dev_priv-uncore.force_wake_work,
 +  1);
 + } else
 + dev_priv-uncore.funcs.force_wake_put(dev_priv);
 + }
   spin_unlock_irqrestore(dev_priv-uncore.lock, irqflags);
  }
  
 -- 
 1.8.4.rc3
 
 ___
 Intel-gfx mailing list
 Intel-gfx@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Ville Syrjälä
Intel OTC

[Intel-gfx] [PATCH 2/2] drm/i915: Tweak RPS thresholds to more aggressively downclock

After applying wait-boost we often find ourselves stuck at higher clocks
than required. The current threshold value requires the GPU to be
continuously and completely idle for 313ms before it is dropped by one
bin. Conversely, we require the GPU to be busy for an average of 90% over
a 84ms period before we upclock. So the current thresholds almost never
downclock the GPU, and respond very slowly to sudden demands for more
power. It is easy to observe that we currently lock into the wrong bin
and both underperform in benchmarks and consume more power than optimal
(just by repeating the task and measuring the different results).

An alternative approach, as discussed in the bspec, is to use a
continuous threshold for upclocking, and an average value for downclocking.
This is good for quickly detecting and reacting to state changes within a
frame, however it fails with the common throttling method of waiting
upon the outstanding frame - at least it is difficult to choose a
threshold that works well at 15,000fps and at 60fps. So continue to use
average busy/idle loads to determine frequency change.

v2: Use 3 power zones to keep frequencies low in steady-state mostly
idle (e.g. scrolling, interactive 2D drawing), and frequencies high
for demanding games. In between those end-states, we use a
fast-reclocking algorithm to converge more quickly on the desired bin.

v3: Bug fixes - make sure we reset adj after switching power zones.

v4: Tune - drop the continuous busy thresholds as it prevents us from
choosing the right frequency for glxgears style swap benchmarks. Instead
the goal is to be able to find the right clocks irrespective of the
wait-boost.

Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
Cc: Kenneth Graunke kenn...@whitecape.org
Cc: Stéphane Marchesin stephane.marche...@gmail.com
Cc: Owen Taylor otay...@redhat.com
Cc: Meng, Mengmeng mengmeng.m...@intel.com
Cc: Zhuang, Lena lena.zhu...@intel.com
---
 drivers/gpu/drm/i915/i915_drv.h |   5 ++
 drivers/gpu/drm/i915/i915_irq.c |  46 ++
 drivers/gpu/drm/i915/i915_reg.h |   2 +-
 drivers/gpu/drm/i915/intel_pm.c | 135 ++--
 4 files changed, 141 insertions(+), 47 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 8993a0b..a6354c3 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -835,8 +835,13 @@ struct intel_gen6_power_mgmt {
u8 min_delay;
u8 max_delay;
u8 rpe_delay;
+   u8 rp1_delay;
+   u8 rp0_delay;
u8 hw_max;
 
+   int last_adj;
+   enum { LOW_POWER, BETWEEN, HIGH_POWER } power;
+
struct delayed_work delayed_resume_work;
 
/*
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 875125f..5072a61 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -812,7 +812,7 @@ static void gen6_pm_rps_work(struct work_struct *work)
drm_i915_private_t *dev_priv = container_of(work, drm_i915_private_t,
rps.work);
u32 pm_iir;
-   u8 new_delay;
+   int new_delay, adj;
 
spin_lock_irq(dev_priv-irq_lock);
pm_iir = dev_priv-rps.pm_iir;
@@ -829,29 +829,49 @@ static void gen6_pm_rps_work(struct work_struct *work)
 
mutex_lock(dev_priv-rps.hw_lock);
 
+   adj = dev_priv-rps.last_adj;
if (pm_iir  GEN6_PM_RP_UP_THRESHOLD) {
-   new_delay = dev_priv-rps.cur_delay + 1;
+   if (adj  0)
+   adj *= 2;
+   else
+   adj = 1;
+   new_delay = dev_priv-rps.cur_delay + adj;
 
/*
 * For better performance, jump directly
 * to RPe if we're below it.
 */
-   if (IS_VALLEYVIEW(dev_priv-dev) 
-   dev_priv-rps.cur_delay  dev_priv-rps.rpe_delay)
+   if (new_delay  dev_priv-rps.rpe_delay)
+   new_delay = dev_priv-rps.rpe_delay;
+   } else if (pm_iir  GEN6_PM_RP_DOWN_TIMEOUT) {
+   if (dev_priv-rps.cur_delay  dev_priv-rps.rpe_delay)
new_delay = dev_priv-rps.rpe_delay;
-   } else
-   new_delay = dev_priv-rps.cur_delay - 1;
+   else
+   new_delay = dev_priv-rps.min_delay;
+   adj = 0;
+   } else if (pm_iir  GEN6_PM_RP_DOWN_THRESHOLD) {
+   if (adj  0)
+   adj *= 2;
+   else
+   adj = -1;
+   new_delay = dev_priv-rps.cur_delay + adj;
+   } else { /* unknown event */
+   new_delay = dev_priv-rps.cur_delay;
+   }
 
/* sysfs frequency interfaces may have snuck in while servicing the
 * interrupt
 */
-   if (new_delay = dev_priv-rps.min_delay 
-   new_delay = dev_priv-rps.max_delay) {
-

[Intel-gfx] [PATCH 1/2] drm/i915: Boost RPS frequency for CPU stalls

If we encounter a situation where the CPU blocks waiting for results
from the GPU, give the GPU a kick to boost its the frequency.

This should work to reduce user interface stalls and to quickly promote
mesa to high frequencies - but the cost is that our requested frequency
stalls high (as we do not idle for long enough before rc6 to start
reducing frequencies, nor are we aggressive at down clocking an
underused GPU). However, this should be mitigated by rc6 itself powering
off the GPU when idle, and that energy use is dependent upon the workload
of the GPU in addition to its frequency (e.g. the math or sampler
functions only consume power when used). Still, this is likely to
adversely affect light workloads.

In particular, this nearly eliminates the highly noticeable wake-up lag
in animations from idle. For example, expose or workspace transitions.
(However, given the situation where we fail to downclock, our requested
frequency is almost always the maximum, except for Baytrail where we
manually downclock upon idling. This often masks the latency of
upclocking after being idle, so animations are typically smooth - at the
cost of increased power consumption.)

Stéphane raised the concern that this will punish good applications and
reward bad applications - but due to the nature of how mesa performs its
client throttling, I believe all mesa applications will be roughly
equally affected. To address this concern, and to prevent applications
like compositors from regularly boosting the RPS state, we only allow
each client to receive one boost in each period of activity.

Unfortunately, this techinique is ineffective with Ironlake - which also
has dynamic render power states and suffers just as dramatically. For
Ironlake, the thermal/power headroom is shared with the CPU through
Intelligent Power Sharing and the intel-ips module. This leaves us with
no GPU boost frequencies available when coming out of idle, and due to
hardware limitations we cannot change the arbitration between the CPU and
GPU quickly enough to be effective.

v2: Limit each client to receiving a single boost for each active period.
v3: Tidy up. Allow the device to always boost if it waits outside of
client context.

Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
Cc: Kenneth Graunke kenn...@whitecape.org
Cc: Stéphane Marchesin stephane.marche...@gmail.com
Cc: Owen Taylor otay...@redhat.com
Cc: Meng, Mengmeng mengmeng.m...@intel.com
Cc: Zhuang, Lena lena.zhu...@intel.com
---
 drivers/gpu/drm/i915/i915_dma.c  |  16 ++---
 drivers/gpu/drm/i915/i915_drv.h  |  19 +++--
 drivers/gpu/drm/i915/i915_gem.c  | 136 ---
 drivers/gpu/drm/i915/i915_irq.c  |  11 ---
 drivers/gpu/drm/i915/intel_display.c |   3 +
 drivers/gpu/drm/i915/intel_drv.h |   3 +
 drivers/gpu/drm/i915/intel_pm.c  |  42 ++-
 7 files changed, 139 insertions(+), 91 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 99159c9..883990f 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1811,19 +1811,11 @@ int i915_driver_unload(struct drm_device *dev)
 
 int i915_driver_open(struct drm_device *dev, struct drm_file *file)
 {
-   struct drm_i915_file_private *file_priv;
-
-   DRM_DEBUG_DRIVER(\n);
-   file_priv = kzalloc(sizeof(*file_priv), GFP_KERNEL);
-   if (!file_priv)
-   return -ENOMEM;
-
-   file-driver_priv = file_priv;
-
-   spin_lock_init(file_priv-mm.lock);
-   INIT_LIST_HEAD(file_priv-mm.request_list);
+   int ret;
 
-   idr_init(file_priv-context_idr);
+   ret = i915_gem_open(dev, file);
+   if (ret)
+   return ret;
 
return 0;
 }
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index f626271..8993a0b 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -829,9 +829,6 @@ struct intel_gen6_power_mgmt {
struct work_struct work;
u32 pm_iir;
 
-   /* On vlv we need to manually drop to Vmin with a delayed work. */
-   struct delayed_work vlv_work;
-
/* The below variables an all the rps hw state are protected by
 * dev-struct mutext. */
u8 cur_delay;
@@ -947,6 +944,15 @@ struct i915_gem_mm {
struct delayed_work retire_work;
 
/**
+* When we detect an idle GPU, we want to turn on
+* powersaving features. So once we see that there
+* are no more requests outstanding and no more
+* arrive within a small period of time, we fire
+* off the idle_work.
+*/
+   struct delayed_work idle_work;
+
+   /**
 * Are we in a non-interruptible section of code like
 * modesetting?
 */
@@ -1570,13 +1576,17 @@ struct drm_i915_gem_request {
 };
 
 struct drm_i915_file_private {
+   struct drm_i915_private *dev_priv;
+
struct {
spinlock_t lock;

[Intel-gfx] [PATCH] quick_dump/hsw: support haswell in device auto-detection

2013-08-26 Thread mengdong . lin

From: Mengdong Lin mengdong@intel.com

This patch exposes is_haswell() to python, to be used by device auto-detection.

Signed-off-by: Mengdong Lin mengdong@intel.com

diff --git a/tools/quick_dump/chipset.i b/tools/quick_dump/chipset.i
index 69dc8df..4b13487 100644
--- a/tools/quick_dump/chipset.i
+++ b/tools/quick_dump/chipset.i
@@ -7,6 +7,7 @@
 extern int is_sandybridge(unsigned short pciid);
 extern int is_ivybridge(unsigned short pciid);
 extern int is_valleyview(unsigned short pciid);
+extern int is_haswell(unsigned short pciid);
 extern struct pci_device *intel_get_pci_device();
 extern int intel_register_access_init(struct pci_device *pci_dev, int safe);
 extern uint32_t intel_register_read(uint32_t reg);
@@ -19,6 +20,7 @@ extern uint32_t intel_dpio_reg_read(uint32_t reg);
 
 extern int is_sandybridge(unsigned short pciid);
 extern int is_ivybridge(unsigned short pciid);
+extern int is_haswell(unsigned short pciid);
 extern int is_valleyview(unsigned short pciid);
 extern struct pci_device *intel_get_pci_device();
 extern int intel_register_access_init(struct pci_device *pci_dev, int safe);
-- 
1.8.1.2

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] Patches for performance improvement

On Mon, Aug 26, 2013 at 11:16:40AM +0300, Ville Syrjälä wrote:
 On Sun, Aug 25, 2013 at 04:35:52AM +, Kannan, Vandana wrote:
  Hi,
  
  For sprite performance improvement, instead of the patch 
  drm/i915: Don't wait for vblank for sprite plane flips, we made use of 
  Chris Wilson's patches from RFC asynchronous vblank tasks and verified 
  that it gives the expected performance boost.
  Chris's patches have not been merged yet, specifically,
  drm/i915: Introduce vblank work function
  drm/i915: Asynchronously unpin the old   framebuffer after the 
  next vblank
  drm/i915/sprite: Make plane switching asynchronous
  
  Will the updated patches for these be coming any time soon ? We 
  require these patches as it gives us a very good performance improvement.
  Please let me know, I can verify with the updated patches.
 
 I don't seem to have said patches in my mailbox, so I can't comment on
 the details, but...
 
 My main concern is again the contents of the old fb after the setplane
 ioctl returns. Due to the current synchronous nature of the ioctl,
 userspace knows that when the ioctl returns the flip has completed, and
 so can write the old fb again.
 
 If the ioctl is async, userspace has to guess whether the old fb can
 be safely written. We could solve this with a new event that gets sent
 from setplane. We do have plenty of flags left in setplane so it's
 certainly doable.

The patches were originally motivated because the setplane ioctl was
doing the opposite of the userspace expectations. Its user is quite
happy to have the ioctl pretend it completed immediately without an
event. (Okay, not happy, but the whole setup just ignores the complexity
by have a small ringbuffer of frames and hoping for the best [mplayer].)

The issue and discussion is as old as the original patch for setplane.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

[Intel-gfx] [PATCH] drm/i915: Delay the relase of the forcewake by a jiffie

Obtaining the forcwake requires expensive and time consuming
serialisation. And we often try to obtain the forcewake multiple times
in very quick succession. We can reduce the overhead of these sequences
by delaying the forcewake release, and so not hammer the hw quite so
hard.

I was hoping this would help with the spurious
[drm:__gen6_gt_force_wake_mt_get] *ERROR* Timed out waiting for forcewake old 
ack to clear.
found on Haswell. Alas not.

v2: Fix teardown ordering - unmap the regs after turning off forcewake,
and make sure we do turn off forcewake - both found by Ville.

Note: I have no claims for improved performance, stablity or power
comsumption for this patch. We should not be hitting the registers often
enough for this to improve benchmarks, but given the nature of our hw it
is likely to improve long term stability.

Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
---
 drivers/gpu/drm/i915/i915_dma.c |  6 --
 drivers/gpu/drm/i915/i915_drv.h |  3 +++
 drivers/gpu/drm/i915/intel_uncore.c | 30 --
 3 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 883990f..97a6e22 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1787,8 +1787,6 @@ int i915_driver_unload(struct drm_device *dev)
list_del(dev_priv-gtt.base.global_link);
WARN_ON(!list_empty(dev_priv-vm_list));
drm_mm_takedown(dev_priv-gtt.base.mm);
-   if (dev_priv-regs != NULL)
-   pci_iounmap(dev-pdev, dev_priv-regs);
 
drm_vblank_cleanup(dev);
 
@@ -1800,6 +1798,10 @@ int i915_driver_unload(struct drm_device *dev)
 
dev_priv-gtt.base.cleanup(dev_priv-gtt.base);
 
+   intel_uncore_fini(dev);
+   if (dev_priv-regs != NULL)
+   pci_iounmap(dev-pdev, dev_priv-regs);
+
if (dev_priv-slab)
kmem_cache_destroy(dev_priv-slab);
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index a6354c3..8c93d93 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -406,6 +406,8 @@ struct intel_uncore {
 
unsigned fifo_count;
unsigned forcewake_count;
+
+   struct delayed_work force_wake_work;
 };
 
 #define DEV_INFO_FOR_EACH_FLAG(func, sep) \
@@ -1792,6 +1794,7 @@ extern void intel_uncore_early_sanitize(struct drm_device 
*dev);
 extern void intel_uncore_init(struct drm_device *dev);
 extern void intel_uncore_clear_errors(struct drm_device *dev);
 extern void intel_uncore_check_errors(struct drm_device *dev);
+extern void intel_uncore_fini(struct drm_device *dev);
 
 void
 i915_enable_pipestat(drm_i915_private_t *dev_priv, int pipe, u32 mask);
diff --git a/drivers/gpu/drm/i915/intel_uncore.c 
b/drivers/gpu/drm/i915/intel_uncore.c
index 8649f1c..462cc7f 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -204,6 +204,18 @@ static void vlv_force_wake_put(struct drm_i915_private 
*dev_priv)
gen6_gt_check_fifodbg(dev_priv);
 }
 
+static void gen6_force_wake_work(struct work_struct *work)
+{
+   struct drm_i915_private *dev_priv =
+   container_of(work, typeof(*dev_priv), 
uncore.force_wake_work.work);
+   unsigned long irqflags;
+
+   spin_lock_irqsave(dev_priv-uncore.lock, irqflags);
+   if (--dev_priv-uncore.forcewake_count == 0)
+   dev_priv-uncore.funcs.force_wake_put(dev_priv);
+   spin_unlock_irqrestore(dev_priv-uncore.lock, irqflags);
+}
+
 void intel_uncore_early_sanitize(struct drm_device *dev)
 {
struct drm_i915_private *dev_priv = dev-dev_private;
@@ -216,6 +228,9 @@ void intel_uncore_init(struct drm_device *dev)
 {
struct drm_i915_private *dev_priv = dev-dev_private;
 
+   INIT_DELAYED_WORK(dev_priv-uncore.force_wake_work,
+ gen6_force_wake_work);
+
if (IS_VALLEYVIEW(dev)) {
dev_priv-uncore.funcs.force_wake_get = vlv_force_wake_get;
dev_priv-uncore.funcs.force_wake_put = vlv_force_wake_put;
@@ -261,6 +276,13 @@ void intel_uncore_init(struct drm_device *dev)
}
 }
 
+void intel_uncore_fini(struct drm_device *dev)
+{
+   struct drm_i915_private *dev_priv = dev-dev_private;
+
+   flush_delayed_work(dev_priv-uncore.force_wake_work);
+}
+
 static void intel_uncore_forcewake_reset(struct drm_device *dev)
 {
struct drm_i915_private *dev_priv = dev-dev_private;
@@ -306,8 +328,12 @@ void gen6_gt_force_wake_put(struct drm_i915_private 
*dev_priv)
unsigned long irqflags;
 
spin_lock_irqsave(dev_priv-uncore.lock, irqflags);
-   if (--dev_priv-uncore.forcewake_count == 0)
-   dev_priv-uncore.funcs.force_wake_put(dev_priv);
+   if (--dev_priv-uncore.forcewake_count == 0) {
+   dev_priv-uncore.forcewake_count++;
+   mod_delayed_work(dev_priv-wq,
+

Re: [Intel-gfx] [PATCH] drm/i915: Delay the relase of the forcewake by a jiffie

On Mon, Aug 26, 2013 at 12:06:43PM +0100, Chris Wilson wrote:
 Obtaining the forcwake requires expensive and time consuming
 serialisation. And we often try to obtain the forcewake multiple times
 in very quick succession. We can reduce the overhead of these sequences
 by delaying the forcewake release, and so not hammer the hw quite so
 hard.
 
 I was hoping this would help with the spurious
 [drm:__gen6_gt_force_wake_mt_get] *ERROR* Timed out waiting for forcewake old 
 ack to clear.
 found on Haswell. Alas not.
 
 v2: Fix teardown ordering - unmap the regs after turning off forcewake,
 and make sure we do turn off forcewake - both found by Ville.
 
 Note: I have no claims for improved performance, stablity or power
 comsumption for this patch. We should not be hitting the registers often
 enough for this to improve benchmarks, but given the nature of our hw it
 is likely to improve long term stability.
 
 Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
 ---
  drivers/gpu/drm/i915/i915_dma.c |  6 --
  drivers/gpu/drm/i915/i915_drv.h |  3 +++
  drivers/gpu/drm/i915/intel_uncore.c | 30 --
  3 files changed, 35 insertions(+), 4 deletions(-)
 
 diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
 index 883990f..97a6e22 100644
 --- a/drivers/gpu/drm/i915/i915_dma.c
 +++ b/drivers/gpu/drm/i915/i915_dma.c
 @@ -1787,8 +1787,6 @@ int i915_driver_unload(struct drm_device *dev)
   list_del(dev_priv-gtt.base.global_link);
   WARN_ON(!list_empty(dev_priv-vm_list));
   drm_mm_takedown(dev_priv-gtt.base.mm);
 - if (dev_priv-regs != NULL)
 - pci_iounmap(dev-pdev, dev_priv-regs);
  
   drm_vblank_cleanup(dev);
  
 @@ -1800,6 +1798,10 @@ int i915_driver_unload(struct drm_device *dev)
  
   dev_priv-gtt.base.cleanup(dev_priv-gtt.base);
  
 + intel_uncore_fini(dev);
 + if (dev_priv-regs != NULL)
 + pci_iounmap(dev-pdev, dev_priv-regs);
 +
   if (dev_priv-slab)
   kmem_cache_destroy(dev_priv-slab);
  
 diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
 index a6354c3..8c93d93 100644
 --- a/drivers/gpu/drm/i915/i915_drv.h
 +++ b/drivers/gpu/drm/i915/i915_drv.h
 @@ -406,6 +406,8 @@ struct intel_uncore {
  
   unsigned fifo_count;
   unsigned forcewake_count;
 +
 + struct delayed_work force_wake_work;
  };
  
  #define DEV_INFO_FOR_EACH_FLAG(func, sep) \
 @@ -1792,6 +1794,7 @@ extern void intel_uncore_early_sanitize(struct 
 drm_device *dev);
  extern void intel_uncore_init(struct drm_device *dev);
  extern void intel_uncore_clear_errors(struct drm_device *dev);
  extern void intel_uncore_check_errors(struct drm_device *dev);
 +extern void intel_uncore_fini(struct drm_device *dev);
  
  void
  i915_enable_pipestat(drm_i915_private_t *dev_priv, int pipe, u32 mask);
 diff --git a/drivers/gpu/drm/i915/intel_uncore.c 
 b/drivers/gpu/drm/i915/intel_uncore.c
 index 8649f1c..462cc7f 100644
 --- a/drivers/gpu/drm/i915/intel_uncore.c
 +++ b/drivers/gpu/drm/i915/intel_uncore.c
 @@ -204,6 +204,18 @@ static void vlv_force_wake_put(struct drm_i915_private 
 *dev_priv)
   gen6_gt_check_fifodbg(dev_priv);
  }
  
 +static void gen6_force_wake_work(struct work_struct *work)
 +{
 + struct drm_i915_private *dev_priv =
 + container_of(work, typeof(*dev_priv), 
 uncore.force_wake_work.work);
 + unsigned long irqflags;
 +
 + spin_lock_irqsave(dev_priv-uncore.lock, irqflags);
 + if (--dev_priv-uncore.forcewake_count == 0)
 + dev_priv-uncore.funcs.force_wake_put(dev_priv);
 + spin_unlock_irqrestore(dev_priv-uncore.lock, irqflags);
 +}
 +
  void intel_uncore_early_sanitize(struct drm_device *dev)
  {
   struct drm_i915_private *dev_priv = dev-dev_private;
 @@ -216,6 +228,9 @@ void intel_uncore_init(struct drm_device *dev)
  {
   struct drm_i915_private *dev_priv = dev-dev_private;
  
 + INIT_DELAYED_WORK(dev_priv-uncore.force_wake_work,
 +   gen6_force_wake_work);
 +
   if (IS_VALLEYVIEW(dev)) {
   dev_priv-uncore.funcs.force_wake_get = vlv_force_wake_get;
   dev_priv-uncore.funcs.force_wake_put = vlv_force_wake_put;
 @@ -261,6 +276,13 @@ void intel_uncore_init(struct drm_device *dev)
   }
  }
  
 +void intel_uncore_fini(struct drm_device *dev)
 +{
 + struct drm_i915_private *dev_priv = dev-dev_private;
 +
 + flush_delayed_work(dev_priv-uncore.force_wake_work);

A bit orthogonal, but maybe we should also add
'WARN_ON(dev_priv-uncore.forcewake_count)' just to make sure we
don't leave forcewake enabled by accident? Or if we're really
paranoid, maybe even 'if (WARN_ON(...)) force_wake_reset()'.

But anyways, the patch looks like it should do what it claims, and
I can't see other bugs, so:
Reviewed-by: Ville Syrjälä ville.syrj...@linux.intel.com

 +}
 +
  static void intel_uncore_forcewake_reset(struct drm_device *dev)
  {
   struct drm_i915_private

Re: [Intel-gfx] [PATCH v2] i915: Update VGA arbiter support for newer devices

On Sat, Aug 24, 2013 at 08:53:40AM -0600, Alex Williamson wrote:
 This is intended to add VGA arbiter support for Intel HD graphics on
 Core processors.  The old GMCH registers no longer exist, so even
 though it appears that i915 participates in VGA arbitration, it doesn't
 work.  On Intel HD graphics we already attempt to disable VGA regions
 of the device.  This makes registering as a VGA client unnecessary since
 we don't intend to operate differently depending on how many VGA devices
 are present.  We can disable VGA memory regions by clearing the memory
 enable bit in the VGA MSR.  That only leaves VGA IO, which we update
 the VGA arbiter to know that we don't participate in VGA memory
 arbitration.  We also add a hook on unload to re-enable memory and
 reinstate VGA memory arbitration.
 
 Signed-off-by: Alex Williamson alex.william...@redhat.com
 ---
 
 v2: I915_READ/WRITE accessors don't work in i915_disable_vga, use inb/outb
 directly.  Also, on the driver unbind VGA enable path, acquire legacy
 IO to re-enable VGA memory.  Correct comment.
 
 As with v1, this depends on vgaarb: Fixes for partial VGA opt-out.  With
 all patches I'm able to assign a discrete PEG VGA device to a guest and
 execute the VBIOS w/o interference from IGD or corruption of the IGD
 framebuffer.

I glanced through those before, but I'll try to do a more thorough
review soon.

 
  drivers/gpu/drm/i915/i915_dma.c  |9 ++---
  drivers/gpu/drm/i915/intel_display.c |   24 
  2 files changed, 30 insertions(+), 3 deletions(-)
 
 diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
 index f466980..d9cf216 100644
 --- a/drivers/gpu/drm/i915/i915_dma.c
 +++ b/drivers/gpu/drm/i915/i915_dma.c
 @@ -1287,9 +1287,12 @@ static int i915_load_modeset_init(struct drm_device 
 *dev)
* then we do not take part in VGA arbitration and the
* vga_client_register() fails with -ENODEV.
*/
 - ret = vga_client_register(dev-pdev, dev, NULL, i915_vga_set_decode);
 - if (ret  ret != -ENODEV)
 - goto out;
 + if (!HAS_PCH_SPLIT(dev)) {
 + ret = vga_client_register(dev-pdev, dev, NULL,
 +   i915_vga_set_decode);
 + if (ret  ret != -ENODEV)
 + goto out;
 + }

I think we should do something to fix i915_redisable_vga() for !pch
platforms too. The MSR method should work for everything, so I'm leaning
towards using it all the time. The other option would be add an mmio
codepath into intel_disable_vga(), but then we have several codepaths
to worry about.

  
   intel_register_dsm_handler();
  
 diff --git a/drivers/gpu/drm/i915/intel_display.c 
 b/drivers/gpu/drm/i915/intel_display.c
 index 5fb3058..2bb805c 100644
 --- a/drivers/gpu/drm/i915/intel_display.c
 +++ b/drivers/gpu/drm/i915/intel_display.c
 @@ -9519,6 +9519,15 @@ static void i915_disable_vga(struct drm_device *dev)
   outb(SR01, VGA_SR_INDEX);
   sr1 = inb(VGA_SR_DATA);
   outb(sr1 | 15, VGA_SR_DATA);
 +
 + /* Disable VGA memory on Intel HD */
 + if (HAS_PCH_SPLIT(dev)) {
 + outb(inb(VGA_MSR_READ)  ~VGA_MSR_MEM_EN, VGA_MSR_WRITE);
 + vga_set_legacy_decoding(dev-pdev, VGA_RSRC_LEGACY_IO |
 +VGA_RSRC_NORMAL_IO |
 +VGA_RSRC_NORMAL_MEM);
 + }
 +
   vga_put(dev-pdev, VGA_RSRC_LEGACY_IO);
   udelay(300);
  
 @@ -9526,6 +9535,19 @@ static void i915_disable_vga(struct drm_device *dev)
   POSTING_READ(vga_reg);
  }
  
 +static void i915_enable_vga(struct drm_device *dev)
 +{

Not sure we need this actually. We don't re-enable the VGA display
on unload, so you'll never see anything on the display anyway. But
then again, I don't see any real harm in doing this either.

 + /* Enable VGA memory on Intel HD */
 + if (HAS_PCH_SPLIT(dev)) {
 + vga_get_uninterruptible(dev-pdev, VGA_RSRC_LEGACY_IO);
 + outb(inb(VGA_MSR_READ) | VGA_MSR_MEM_EN, VGA_MSR_WRITE);

I was thinking we might save the MSR state during driver load and
restore to that same state, but it could be a bit overkill.

 + vga_set_legacy_decoding(dev-pdev, VGA_RSRC_LEGACY_MASK |

Maybe spell out IO and MEM for legacy too. Otherwise it looks a
bit cryptic.

 +VGA_RSRC_NORMAL_IO |
 +VGA_RSRC_NORMAL_MEM);
 + vga_put(dev-pdev, VGA_RSRC_LEGACY_IO);
 + }
 +}
 +
  void intel_modeset_init_hw(struct drm_device *dev)
  {
   intel_init_power_well(dev);
 @@ -9983,6 +10005,8 @@ void intel_modeset_cleanup(struct drm_device *dev)
  
   intel_disable_fbc(dev);
  
 + i915_enable_vga(dev);
 +
   intel_disable_gt_powersave(dev);
  
   ironlake_teardown_rc6(dev);

-- 
Ville Syrjälä
Intel OTC
___
Intel-gfx

[Intel-gfx] [PATCH] drm/i915: Initialise min/max frequencies before updating RPS registers

The RPS register writing routines use the current value of min/max to
set certain limits and interrupt gating. If we set those afterwards, we
risk setting up the hw incorrectly and losing power management events,
and worse, trigger some internal assertions.

Reorder the calling sequences to be correct, and remove the then
unrequired clamping from inside set_rps(). And for a bonus, fix the bug
of calling gen6_set_rps() from Valleyview.

Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
---
 drivers/gpu/drm/i915/i915_debugfs.c |  2 +-
 drivers/gpu/drm/i915/i915_sysfs.c   | 16 
 drivers/gpu/drm/i915/intel_pm.c | 19 +--
 3 files changed, 14 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
b/drivers/gpu/drm/i915/i915_debugfs.c
index 2a276c8..b2b1730 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2113,7 +2113,7 @@ i915_max_freq_set(void *data, u64 val)
if (IS_VALLEYVIEW(dev)) {
val = vlv_freq_opcode(dev_priv-mem_freq, val);
dev_priv-rps.max_delay = val;
-   gen6_set_rps(dev, val);
+   valleyview_set_rps(dev, val);
} else {
do_div(val, GT_FREQUENCY_MULTIPLIER);
dev_priv-rps.max_delay = val;
diff --git a/drivers/gpu/drm/i915/i915_sysfs.c 
b/drivers/gpu/drm/i915/i915_sysfs.c
index c8c4112..05195c0 100644
--- a/drivers/gpu/drm/i915/i915_sysfs.c
+++ b/drivers/gpu/drm/i915/i915_sysfs.c
@@ -294,15 +294,15 @@ static ssize_t gt_max_freq_mhz_store(struct device *kdev,
DRM_DEBUG(User requested overclocking to %d\n,
  val * GT_FREQUENCY_MULTIPLIER);
 
+   dev_priv-rps.max_delay = val;
+
if (dev_priv-rps.cur_delay  val) {
-   if (IS_VALLEYVIEW(dev_priv-dev))
-   valleyview_set_rps(dev_priv-dev, val);
+   if (IS_VALLEYVIEW(dev))
+   valleyview_set_rps(dev, val);
else
-   gen6_set_rps(dev_priv-dev, val);
+   gen6_set_rps(dev, val);
}
 
-   dev_priv-rps.max_delay = val;
-
mutex_unlock(dev_priv-rps.hw_lock);
 
return count;
@@ -359,15 +359,15 @@ static ssize_t gt_min_freq_mhz_store(struct device *kdev,
return -EINVAL;
}
 
+   dev_priv-rps.min_delay = val;
+
if (dev_priv-rps.cur_delay  val) {
if (IS_VALLEYVIEW(dev))
valleyview_set_rps(dev, val);
else
-   gen6_set_rps(dev_priv-dev, val);
+   gen6_set_rps(dev, val);
}
 
-   dev_priv-rps.min_delay = val;
-
mutex_unlock(dev_priv-rps.hw_lock);
 
return count;
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 6767e2b..f995dda 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -3263,26 +3263,19 @@ static void ironlake_disable_drps(struct drm_device 
*dev)
  * ourselves, instead of doing a rmw cycle (which might result in us clearing
  * all limits and the gpu stuck at whatever frequency it is at atm).
  */
-static u32 gen6_rps_limits(struct drm_i915_private *dev_priv, u8 *val)
+static u32 gen6_rps_limits(struct drm_i915_private *dev_priv, u8 val)
 {
u32 limits;
 
-   limits = 0;
-
-   if (*val = dev_priv-rps.max_delay)
-   *val = dev_priv-rps.max_delay;
-   limits |= dev_priv-rps.max_delay  24;
-
/* Only set the down limit when we've reached the lowest level to avoid
 * getting more interrupts, otherwise leave this clear. This prevents a
 * race in the hw when coming out of rc6: There's a tiny window where
 * the hw runs at the minimal clock before selecting the desired
 * frequency, if the down threshold expires in that window we will not
 * receive a down interrupt. */
-   if (*val = dev_priv-rps.min_delay) {
-   *val = dev_priv-rps.min_delay;
+   limits = dev_priv-rps.max_delay  24;
+   if (val = dev_priv-rps.min_delay)
limits |= dev_priv-rps.min_delay  16;
-   }
 
return limits;
 }
@@ -3382,7 +3375,6 @@ static void gen6_set_rps_thresholds(struct 
drm_i915_private *dev_priv, u8 val)
 void gen6_set_rps(struct drm_device *dev, u8 val)
 {
struct drm_i915_private *dev_priv = dev-dev_private;
-   u32 limits = gen6_rps_limits(dev_priv, val);
 
WARN_ON(!mutex_is_locked(dev_priv-rps.hw_lock));
WARN_ON(val  dev_priv-rps.max_delay);
@@ -3405,7 +3397,8 @@ void gen6_set_rps(struct drm_device *dev, u8 val)
/* Make sure we continue to get interrupts
 * until we hit the minimum or maximum frequencies.
 */
-   I915_WRITE(GEN6_RP_INTERRUPT_LIMITS, limits);
+   I915_WRITE(GEN6_RP_INTERRUPT_LIMITS,
+  gen6_rps_limits(dev_priv, val));

[Intel-gfx] [PATCH] drm/i915: Delay the relase of the forcewake by a jiffie

Obtaining the forcwake requires expensive and time consuming
serialisation. And we often try to obtain the forcewake multiple times
in very quick succession. We can reduce the overhead of these sequences
by delaying the forcewake release, and so not hammer the hw quite so
hard.

I was hoping this would help with the spurious
[drm:__gen6_gt_force_wake_mt_get] *ERROR* Timed out waiting for forcewake old 
ack to clear.
found on Haswell. Alas not.

v2: Fix teardown ordering - unmap the regs after turning off forcewake,
and make sure we do turn off forcewake - both found by Ville.

v3: As we introduce intel_uncore_fini(), use it to make sure everything
is disabled before we hand back to the BIOS.

Note: I have no claims for improved performance, stablity or power
comsumption for this patch. We should not be hitting the registers often
enough for this to improve benchmarks, but given the nature of our hw it
is likely to improve long term stability.

Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
Reviewed-by: Ville Syrjälä ville.syrj...@linux.intel.com
---
 drivers/gpu/drm/i915/i915_dma.c |  6 --
 drivers/gpu/drm/i915/i915_drv.h |  3 +++
 drivers/gpu/drm/i915/intel_uncore.c | 33 +++--
 3 files changed, 38 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 883990f..97a6e22 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1787,8 +1787,6 @@ int i915_driver_unload(struct drm_device *dev)
list_del(dev_priv-gtt.base.global_link);
WARN_ON(!list_empty(dev_priv-vm_list));
drm_mm_takedown(dev_priv-gtt.base.mm);
-   if (dev_priv-regs != NULL)
-   pci_iounmap(dev-pdev, dev_priv-regs);
 
drm_vblank_cleanup(dev);
 
@@ -1800,6 +1798,10 @@ int i915_driver_unload(struct drm_device *dev)
 
dev_priv-gtt.base.cleanup(dev_priv-gtt.base);
 
+   intel_uncore_fini(dev);
+   if (dev_priv-regs != NULL)
+   pci_iounmap(dev-pdev, dev_priv-regs);
+
if (dev_priv-slab)
kmem_cache_destroy(dev_priv-slab);
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index a6354c3..8c93d93 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -406,6 +406,8 @@ struct intel_uncore {
 
unsigned fifo_count;
unsigned forcewake_count;
+
+   struct delayed_work force_wake_work;
 };
 
 #define DEV_INFO_FOR_EACH_FLAG(func, sep) \
@@ -1792,6 +1794,7 @@ extern void intel_uncore_early_sanitize(struct drm_device 
*dev);
 extern void intel_uncore_init(struct drm_device *dev);
 extern void intel_uncore_clear_errors(struct drm_device *dev);
 extern void intel_uncore_check_errors(struct drm_device *dev);
+extern void intel_uncore_fini(struct drm_device *dev);
 
 void
 i915_enable_pipestat(drm_i915_private_t *dev_priv, int pipe, u32 mask);
diff --git a/drivers/gpu/drm/i915/intel_uncore.c 
b/drivers/gpu/drm/i915/intel_uncore.c
index 8649f1c..f2753d9 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -204,6 +204,18 @@ static void vlv_force_wake_put(struct drm_i915_private 
*dev_priv)
gen6_gt_check_fifodbg(dev_priv);
 }
 
+static void gen6_force_wake_work(struct work_struct *work)
+{
+   struct drm_i915_private *dev_priv =
+   container_of(work, typeof(*dev_priv), 
uncore.force_wake_work.work);
+   unsigned long irqflags;
+
+   spin_lock_irqsave(dev_priv-uncore.lock, irqflags);
+   if (--dev_priv-uncore.forcewake_count == 0)
+   dev_priv-uncore.funcs.force_wake_put(dev_priv);
+   spin_unlock_irqrestore(dev_priv-uncore.lock, irqflags);
+}
+
 void intel_uncore_early_sanitize(struct drm_device *dev)
 {
struct drm_i915_private *dev_priv = dev-dev_private;
@@ -216,6 +228,9 @@ void intel_uncore_init(struct drm_device *dev)
 {
struct drm_i915_private *dev_priv = dev-dev_private;
 
+   INIT_DELAYED_WORK(dev_priv-uncore.force_wake_work,
+ gen6_force_wake_work);
+
if (IS_VALLEYVIEW(dev)) {
dev_priv-uncore.funcs.force_wake_get = vlv_force_wake_get;
dev_priv-uncore.funcs.force_wake_put = vlv_force_wake_put;
@@ -261,6 +276,16 @@ void intel_uncore_init(struct drm_device *dev)
}
 }
 
+void intel_uncore_fini(struct drm_device *dev)
+{
+   struct drm_i915_private *dev_priv = dev-dev_private;
+
+   flush_delayed_work(dev_priv-uncore.force_wake_work);
+
+   /* Paranoia: make sure we have disabled everything before we exit. */
+   intel_uncore_sanitize(dev);
+}
+
 static void intel_uncore_forcewake_reset(struct drm_device *dev)
 {
struct drm_i915_private *dev_priv = dev-dev_private;
@@ -306,8 +331,12 @@ void gen6_gt_force_wake_put(struct drm_i915_private 
*dev_priv)
unsigned long irqflags;
 
spin_lock_irqsave(dev_priv-uncore.lock, irqflags);
-

[Intel-gfx] [PATCH] drm/i915: Add a tracepoint for using a semaphore

So that we can find the callers who introduce a ring stall. A single
ring stall is not too unwelcome, the right issue becomes when they start
to interlock and prevent any concurrent work. That, however, is a little
tricker to detect with a mere tracepoint!

Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
---
 drivers/gpu/drm/i915/i915_gem.c   |  2 ++
 drivers/gpu/drm/i915/i915_trace.h | 19 +++
 2 files changed, 21 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 85537d1..0122278 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2698,6 +2698,8 @@ i915_gem_object_sync(struct drm_i915_gem_object *obj,
if (ret)
return ret;
 
+   trace_i915_gem_object_sync(obj, to);
+
ret = to-sync_to(to, from, seqno);
if (!ret)
/* We use last_read_seqno because sync_to()
diff --git a/drivers/gpu/drm/i915/i915_trace.h 
b/drivers/gpu/drm/i915/i915_trace.h
index feaacbb..728c43b 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -33,6 +33,25 @@ TRACE_EVENT(i915_gem_object_create,
TP_printk(obj=%p, size=%u, __entry-obj, __entry-size)
 );
 
+TRACE_EVENT(i915_gem_object_sync,
+   TP_PROTO(struct drm_i915_gem_object *obj, struct intel_ring_buffer 
*to),
+   TP_ARGS(obj, to),
+
+   TP_STRUCT__entry(
+__field(struct drm_i915_gem_object *, obj)
+__field(u32, sync_from)
+__field(u32, sync_to)
+),
+
+   TP_fast_assign(
+  __entry-obj = obj;
+  __entry-sync_from = obj-ring-id;
+  __entry-sync_to = to-id;
+  ),
+
+   TP_printk(obj=%p, sync-from=%u, sync-to=%u, __entry-obj, 
__entry-sync_from, __entry-sync_to)
+);
+
 TRACE_EVENT(i915_vma_bind,
TP_PROTO(struct i915_vma *vma, bool mappable),
TP_ARGS(vma, mappable),
-- 
1.8.4.rc3

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

[Intel-gfx] [PATCH] tests: Simulate missed breadcrumb irqs

Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
---
 tests/Makefile.am   |  1 +
 tests/ZZ_missed_irq | 70 +
 2 files changed, 71 insertions(+)
 create mode 100755 tests/ZZ_missed_irq

diff --git a/tests/Makefile.am b/tests/Makefile.am
index 9878a25..e258b1f 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -130,6 +130,7 @@ TESTS_scripts = \
test_rte_check \
tools_test \
ZZ_hangman \
+   ZZ_missed_irq \
$(NULL)
 
 # This target contains testcases which support automagic subtest enumeration
diff --git a/tests/ZZ_missed_irq b/tests/ZZ_missed_irq
new file mode 100755
index 000..86c0123
--- /dev/null
+++ b/tests/ZZ_missed_irq
@@ -0,0 +1,70 @@
+#!/bin/bash
+#
+# Testcase: Simulate missed breadcrumb interrupts
+#
+
+SOURCE_DIR=$( dirname ${BASH_SOURCE[0]} )
+test $SOURCE_DIR = .  SOURCE_DIR=`pwd`
+. $SOURCE_DIR/drm_lib.sh
+cd $i915_dfs_path
+
+echo $SOURCE_DIR
+
+function check_for_missed_irq {
+   if test `cat i915_ring_missed_irq` = 0x; then
+   echo missed interrupts undetected
+   exit 1
+   fi
+}
+
+function check_for_hang {
+   if cat i915_error_state | grep -v no error state collected  
/dev/null ; then
+   echo gpu hang reported
+   exit 2
+   fi
+}
+
+if [ ! -f i915_ring_missed_irq ] ; then
+   echo kernel doesn't support interrupt masking
+   exit 77
+fi
+
+# clear error state first
+echo  i915_error_state
+check_for_hang
+
+echo 0xf  i915_ring_test_irq
+echo Interrupts masked
+if test `cat i915_ring_test_irq` != 0x000f; then
+   echo Failed to set interrupt mask
+   exit 3
+fi
+
+$SOURCE_DIR/gem_exec_big  /dev/null
+
+check_for_missed_irq
+check_for_hang
+
+$SOURCE_DIR/gem_exec_big  /dev/null
+
+check_for_hang
+
+echo 0  i915_ring_test_irq
+echo Interrupts unmasked
+if test `cat i915_ring_test_irq` != 0x; then
+   echo Failed to clear interrupt mask
+   exit 3
+fi
+
+$SOURCE_DIR/gem_exec_big  /dev/null
+
+check_for_hang
+
+echo 0  i915_ring_missed_irq
+echo Cleared missed interrupts
+if test `cat i915_ring_missed_irq` != 0x; then
+   echo Failed to clear missed interrupts
+   exit 3
+fi
+
+exit 0
-- 
1.8.4.rc3

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH] drm/i915: Add a tracepoint for using a semaphore

On Mon, Aug 26, 2013 at 01:46:21PM +0100, Chris Wilson wrote:
 So that we can find the callers who introduce a ring stall. A single
 ring stall is not too unwelcome, the right issue becomes when they start
 to interlock and prevent any concurrent work. That, however, is a little
 tricker to detect with a mere tracepoint!

I was staring at the semaphore code a bit recently, and I was wondering
if it would make sense to do a manual seqno check (against the ring's
current seqno, not just the last sync_seqno) before we decide to
add the semaphore into the ring?

 
 Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
 ---
  drivers/gpu/drm/i915/i915_gem.c   |  2 ++
  drivers/gpu/drm/i915/i915_trace.h | 19 +++
  2 files changed, 21 insertions(+)
 
 diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
 index 85537d1..0122278 100644
 --- a/drivers/gpu/drm/i915/i915_gem.c
 +++ b/drivers/gpu/drm/i915/i915_gem.c
 @@ -2698,6 +2698,8 @@ i915_gem_object_sync(struct drm_i915_gem_object *obj,
   if (ret)
   return ret;
  
 + trace_i915_gem_object_sync(obj, to);
 +
   ret = to-sync_to(to, from, seqno);
   if (!ret)
   /* We use last_read_seqno because sync_to()
 diff --git a/drivers/gpu/drm/i915/i915_trace.h 
 b/drivers/gpu/drm/i915/i915_trace.h
 index feaacbb..728c43b 100644
 --- a/drivers/gpu/drm/i915/i915_trace.h
 +++ b/drivers/gpu/drm/i915/i915_trace.h
 @@ -33,6 +33,25 @@ TRACE_EVENT(i915_gem_object_create,
   TP_printk(obj=%p, size=%u, __entry-obj, __entry-size)
  );
  
 +TRACE_EVENT(i915_gem_object_sync,
 + TP_PROTO(struct drm_i915_gem_object *obj, struct intel_ring_buffer 
 *to),
 + TP_ARGS(obj, to),
 +
 + TP_STRUCT__entry(
 +  __field(struct drm_i915_gem_object *, obj)
 +  __field(u32, sync_from)
 +  __field(u32, sync_to)
 +  ),
 +
 + TP_fast_assign(
 +__entry-obj = obj;
 +__entry-sync_from = obj-ring-id;
 +__entry-sync_to = to-id;
 +),
 +
 + TP_printk(obj=%p, sync-from=%u, sync-to=%u, __entry-obj, 
 __entry-sync_from, __entry-sync_to)
 +);
 +
  TRACE_EVENT(i915_vma_bind,
   TP_PROTO(struct i915_vma *vma, bool mappable),
   TP_ARGS(vma, mappable),
 -- 
 1.8.4.rc3
 
 ___
 Intel-gfx mailing list
 Intel-gfx@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Ville Syrjälä
Intel OTC
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH] drm/i915: Add a tracepoint for using a semaphore

On Mon, Aug 26, 2013 at 04:20:06PM +0300, Ville Syrjälä wrote:
 On Mon, Aug 26, 2013 at 01:46:21PM +0100, Chris Wilson wrote:
  So that we can find the callers who introduce a ring stall. A single
  ring stall is not too unwelcome, the right issue becomes when they start
  to interlock and prevent any concurrent work. That, however, is a little
  tricker to detect with a mere tracepoint!
 
 I was staring at the semaphore code a bit recently, and I was wondering
 if it would make sense to do a manual seqno check (against the ring's
 current seqno, not just the last sync_seqno) before we decide to
 add the semaphore into the ring?

If we use the cheap check, yeah. We can also store the cross-sync seqno
as well. Both are going to be unlikely paths though.

Anyway, I'm moving the tracepoint over to the ring rather than the
object for consistency.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

[Intel-gfx] Review me!

Daniel is pushing for something like this upstream so that we can catch
missed interrupts again. This contains the compromise for Ben and Jesse,
and should work even better than before with absent interrupts.

I'm not thrilled by the test_/missed_irq debugfs interface, it is a little
racy, but I didn't want to impact the hotpath.
-Chris

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

[Intel-gfx] [PATCH 2/2] drm/i915: Call io_schedule() whilst whilsting for the GPU

Since we are waiting upon IO completion, inform the kernel through use
of the io_schedule() call rather than the regular schedule(). This
should allow the kernel to make better decisions regarding scheduling
and power management.

Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
---
 drivers/gpu/drm/i915/i915_gem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 4ed782f..408578b 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1063,7 +1063,7 @@ static int __wait_seqno(struct intel_ring_buffer *ring, 
u32 seqno,
mod_timer(timer, expire);
}
 
-   schedule();
+   io_schedule();
 
if (timeout)
timeout_jiffies = expire - jiffies;
-- 
1.8.4.rc3

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

[Intel-gfx] [PATCH 1/2] drm/i915: Fix __wait_seqno to use true infinite timeouts

When we switched to always using a timeout in conjunction with
wait_seqno, we lost the ability to detect missed interrupts. Since, we
have had issues with interrupts on a number of generations, and they are
required to be delivered in a timely fashion for a smooth UX, it is
important that we do log errors found in the wild and prevent the
display stalling for upwards of 1s every time the seqno interrupt is
missed.

Rather than continue to fix up the timeouts to work around the interface
impedence in wait_event_*(), open code the combination of
wait_event[_interruptible][_timeout], and use the exposed timer to
poll for seqno should we detect a lost interrupt.

v2: In order to satisfy the debug requirement of logging missed
interrupts with the real world requirments of making machines work even
if interrupts are hosed, we revert to polling after detecting a missed
interrupt.

v3: Throw in a debugfs interface to simulate broken hw not reporting
interrupts.

Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
---
 drivers/gpu/drm/i915/i915_debugfs.c   |  68 
 drivers/gpu/drm/i915/i915_drv.h   |   5 ++
 drivers/gpu/drm/i915/i915_gem.c   | 114 --
 drivers/gpu/drm/i915/i915_gpu_error.c |   1 +
 drivers/gpu/drm/i915/i915_irq.c   |  11 ++--
 5 files changed, 148 insertions(+), 51 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
b/drivers/gpu/drm/i915/i915_debugfs.c
index 55ab924..e823169 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1876,6 +1876,72 @@ DEFINE_SIMPLE_ATTRIBUTE(i915_ring_stop_fops,
i915_ring_stop_get, i915_ring_stop_set,
0x%08llx\n);
 
+static int
+i915_ring_missed_irq_get(void *data, u64 *val)
+{
+   struct drm_device *dev = data;
+   drm_i915_private_t *dev_priv = dev-dev_private;
+
+   *val = dev_priv-gpu_error.missed_irq_rings;
+   return 0;
+}
+
+static int
+i915_ring_missed_irq_set(void *data, u64 val)
+{
+   struct drm_device *dev = data;
+   struct drm_i915_private *dev_priv = dev-dev_private;
+   int ret;
+
+   /* Lock against concurrent debugfs callers */
+   ret = mutex_lock_interruptible(dev-struct_mutex);
+   if (ret)
+   return ret;
+   dev_priv-gpu_error.missed_irq_rings = val;
+   mutex_unlock(dev-struct_mutex);
+
+   return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(i915_ring_missed_irq_fops,
+   i915_ring_missed_irq_get, i915_ring_missed_irq_set,
+   0x%08llx\n);
+
+static int
+i915_ring_test_irq_get(void *data, u64 *val)
+{
+   struct drm_device *dev = data;
+   drm_i915_private_t *dev_priv = dev-dev_private;
+
+   *val = dev_priv-gpu_error.test_irq_rings;
+
+   return 0;
+}
+
+static int
+i915_ring_test_irq_set(void *data, u64 val)
+{
+   struct drm_device *dev = data;
+   struct drm_i915_private *dev_priv = dev-dev_private;
+   int ret;
+
+   DRM_DEBUG_DRIVER(Masking interrupts on rings 0x%08llx\n, val);
+
+   /* Lock against concurrent debugfs callers */
+   ret = mutex_lock_interruptible(dev-struct_mutex);
+   if (ret)
+   return ret;
+
+   dev_priv-gpu_error.test_irq_rings = val;
+   mutex_unlock(dev-struct_mutex);
+
+   return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(i915_ring_test_irq_fops,
+   i915_ring_test_irq_get, i915_ring_test_irq_set,
+   0x%08llx\n);
+
 #define DROP_UNBOUND 0x1
 #define DROP_BOUND 0x2
 #define DROP_RETIRE 0x4
@@ -2269,6 +2335,8 @@ static struct i915_debugfs_files {
{i915_min_freq, i915_min_freq_fops},
{i915_cache_sharing, i915_cache_sharing_fops},
{i915_ring_stop, i915_ring_stop_fops},
+   {i915_ring_missed_irq, i915_ring_missed_irq_fops},
+   {i915_ring_test_irq, i915_ring_test_irq_fops},
{i915_gem_drop_caches, i915_drop_caches_fops},
{i915_error_state, i915_error_state_fops},
{i915_next_seqno, i915_next_seqno_fops},
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 84b95b1..ac6db6e 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -993,6 +993,8 @@ struct i915_gpu_error {
 
unsigned long last_reset;
 
+   unsigned long missed_irq_rings;
+
/**
 * State variable and reset counter controlling the reset flow
 *
@@ -1031,6 +1033,9 @@ struct i915_gpu_error {
 
/* For gpu hang simulation. */
unsigned int stop_rings;
+
+   /* For missed irq/seqno simulation. */
+   unsigned int test_irq_rings;
 };
 
 enum modeset_restore {
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index ccfebf5..4ed782f 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -970,6 +970,17 @@ i915_gem_check_olr(struct intel_ring_buffer *ring, u32 
seqno)
return ret;
 }

[Intel-gfx] [PATCH] drm/i915: Adjust available RPS information through sysfs for vlv

Valleyview has its own render power state implementation with different
capability knobs - it has no RP0,RP1,RPn but rather RPe.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=67734
Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
Tested-by: kobe@intel.com
Reviewed-by: Jani Nikula jani.nik...@intel.com
---
 drivers/gpu/drm/i915/i915_sysfs.c |   36 +++-
 1 file changed, 31 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_sysfs.c 
b/drivers/gpu/drm/i915/i915_sysfs.c
index a777e7f..c8c4112 100644
--- a/drivers/gpu/drm/i915/i915_sysfs.c
+++ b/drivers/gpu/drm/i915/i915_sysfs.c
@@ -224,6 +224,18 @@ static ssize_t gt_cur_freq_mhz_show(struct device *kdev,
return snprintf(buf, PAGE_SIZE, %d\n, ret);
 }
 
+static ssize_t vlv_rpe_freq_mhz_show(struct device *kdev,
+struct device_attribute *attr, char *buf)
+{
+   struct drm_minor *minor = container_of(kdev, struct drm_minor, kdev);
+   struct drm_device *dev = minor-dev;
+   struct drm_i915_private *dev_priv = dev-dev_private;
+
+   return snprintf(buf, PAGE_SIZE, %d\n,
+   vlv_gpu_freq(dev_priv-mem_freq,
+dev_priv-rps.rpe_delay));
+}
+
 static ssize_t gt_max_freq_mhz_show(struct device *kdev, struct 
device_attribute *attr, char *buf)
 {
struct drm_minor *minor = container_of(kdev, struct drm_minor, kdev);
@@ -366,6 +378,7 @@ static DEVICE_ATTR(gt_cur_freq_mhz, S_IRUGO, 
gt_cur_freq_mhz_show, NULL);
 static DEVICE_ATTR(gt_max_freq_mhz, S_IRUGO | S_IWUSR, gt_max_freq_mhz_show, 
gt_max_freq_mhz_store);
 static DEVICE_ATTR(gt_min_freq_mhz, S_IRUGO | S_IWUSR, gt_min_freq_mhz_show, 
gt_min_freq_mhz_store);
 
+static DEVICE_ATTR(vlv_rpe_freq_mhz, S_IRUGO, vlv_rpe_freq_mhz_show, NULL);
 
 static ssize_t gt_rp_mhz_show(struct device *kdev, struct device_attribute 
*attr, char *buf);
 static DEVICE_ATTR(gt_RP0_freq_mhz, S_IRUGO, gt_rp_mhz_show, NULL);
@@ -409,6 +422,14 @@ static const struct attribute *gen6_attrs[] = {
NULL,
 };
 
+static const struct attribute *vlv_attrs[] = {
+   dev_attr_gt_cur_freq_mhz.attr,
+   dev_attr_gt_max_freq_mhz.attr,
+   dev_attr_gt_min_freq_mhz.attr,
+   dev_attr_vlv_rpe_freq_mhz.attr,
+   NULL,
+};
+
 static ssize_t error_state_read(struct file *filp, struct kobject *kobj,
struct bin_attribute *attr, char *buf,
loff_t off, size_t count)
@@ -492,11 +513,13 @@ void i915_setup_sysfs(struct drm_device *dev)
DRM_ERROR(l3 parity sysfs setup failed\n);
}
 
-   if (INTEL_INFO(dev)-gen = 6) {
+   ret = 0;
+   if (IS_VALLEYVIEW(dev))
+   ret = sysfs_create_files(dev-primary-kdev.kobj, vlv_attrs);
+   else if (INTEL_INFO(dev)-gen = 6)
ret = sysfs_create_files(dev-primary-kdev.kobj, gen6_attrs);
-   if (ret)
-   DRM_ERROR(gen6 sysfs setup failed\n);
-   }
+   if (ret)
+   DRM_ERROR(RPS sysfs setup failed\n);
 
ret = sysfs_create_bin_file(dev-primary-kdev.kobj,
error_state_attr);
@@ -507,7 +530,10 @@ void i915_setup_sysfs(struct drm_device *dev)
 void i915_teardown_sysfs(struct drm_device *dev)
 {
sysfs_remove_bin_file(dev-primary-kdev.kobj, error_state_attr);
-   sysfs_remove_files(dev-primary-kdev.kobj, gen6_attrs);
+   if (IS_VALLEYVIEW(dev))
+   sysfs_remove_files(dev-primary-kdev.kobj, vlv_attrs);
+   else
+   sysfs_remove_files(dev-primary-kdev.kobj, gen6_attrs);
device_remove_bin_file(dev-primary-kdev,  dpf_attrs);
 #ifdef CONFIG_PM
sysfs_unmerge_group(dev-primary-kdev.kobj, rc6_attr_group);
-- 
1.7.9.5

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH v2] i915: Update VGA arbiter support for newer devices

2013-08-26 Thread Alex Williamson

On Mon, 2013-08-26 at 15:23 +0300, Ville Syrjälä wrote:
 On Sat, Aug 24, 2013 at 08:53:40AM -0600, Alex Williamson wrote:
  This is intended to add VGA arbiter support for Intel HD graphics on
  Core processors.  The old GMCH registers no longer exist, so even
  though it appears that i915 participates in VGA arbitration, it doesn't
  work.  On Intel HD graphics we already attempt to disable VGA regions
  of the device.  This makes registering as a VGA client unnecessary since
  we don't intend to operate differently depending on how many VGA devices
  are present.  We can disable VGA memory regions by clearing the memory
  enable bit in the VGA MSR.  That only leaves VGA IO, which we update
  the VGA arbiter to know that we don't participate in VGA memory
  arbitration.  We also add a hook on unload to re-enable memory and
  reinstate VGA memory arbitration.
  
  Signed-off-by: Alex Williamson alex.william...@redhat.com
  ---
  
  v2: I915_READ/WRITE accessors don't work in i915_disable_vga, use inb/outb
  directly.  Also, on the driver unbind VGA enable path, acquire legacy
  IO to re-enable VGA memory.  Correct comment.
  
  As with v1, this depends on vgaarb: Fixes for partial VGA opt-out.  With
  all patches I'm able to assign a discrete PEG VGA device to a guest and
  execute the VBIOS w/o interference from IGD or corruption of the IGD
  framebuffer.
 
 I glanced through those before, but I'll try to do a more thorough
 review soon.

Thanks!

  
   drivers/gpu/drm/i915/i915_dma.c  |9 ++---
   drivers/gpu/drm/i915/intel_display.c |   24 
   2 files changed, 30 insertions(+), 3 deletions(-)
  
  diff --git a/drivers/gpu/drm/i915/i915_dma.c 
  b/drivers/gpu/drm/i915/i915_dma.c
  index f466980..d9cf216 100644
  --- a/drivers/gpu/drm/i915/i915_dma.c
  +++ b/drivers/gpu/drm/i915/i915_dma.c
  @@ -1287,9 +1287,12 @@ static int i915_load_modeset_init(struct drm_device 
  *dev)
   * then we do not take part in VGA arbitration and the
   * vga_client_register() fails with -ENODEV.
   */
  -   ret = vga_client_register(dev-pdev, dev, NULL, i915_vga_set_decode);
  -   if (ret  ret != -ENODEV)
  -   goto out;
  +   if (!HAS_PCH_SPLIT(dev)) {
  +   ret = vga_client_register(dev-pdev, dev, NULL,
  + i915_vga_set_decode);
  +   if (ret  ret != -ENODEV)
  +   goto out;
  +   }
 
 I think we should do something to fix i915_redisable_vga() for !pch
 platforms too. The MSR method should work for everything, so I'm leaning
 towards using it all the time. The other option would be add an mmio
 codepath into intel_disable_vga(), but then we have several codepaths
 to worry about.

Do you have time, systems, and interest to be able to test !pch?  I
assume that at some point in the past VGA arbiter did work on i915 with
the right hardware, so I didn't want to go breaking code that I can't
test.
  
  intel_register_dsm_handler();
   
  diff --git a/drivers/gpu/drm/i915/intel_display.c 
  b/drivers/gpu/drm/i915/intel_display.c
  index 5fb3058..2bb805c 100644
  --- a/drivers/gpu/drm/i915/intel_display.c
  +++ b/drivers/gpu/drm/i915/intel_display.c
  @@ -9519,6 +9519,15 @@ static void i915_disable_vga(struct drm_device *dev)
  outb(SR01, VGA_SR_INDEX);
  sr1 = inb(VGA_SR_DATA);
  outb(sr1 | 15, VGA_SR_DATA);
  +
  +   /* Disable VGA memory on Intel HD */
  +   if (HAS_PCH_SPLIT(dev)) {
  +   outb(inb(VGA_MSR_READ)  ~VGA_MSR_MEM_EN, VGA_MSR_WRITE);
  +   vga_set_legacy_decoding(dev-pdev, VGA_RSRC_LEGACY_IO |
  +  VGA_RSRC_NORMAL_IO |
  +  VGA_RSRC_NORMAL_MEM);
  +   }
  +
  vga_put(dev-pdev, VGA_RSRC_LEGACY_IO);
  udelay(300);
   
  @@ -9526,6 +9535,19 @@ static void i915_disable_vga(struct drm_device *dev)
  POSTING_READ(vga_reg);
   }
   
  +static void i915_enable_vga(struct drm_device *dev)
  +{
 
 Not sure we need this actually. We don't re-enable the VGA display
 on unload, so you'll never see anything on the display anyway. But
 then again, I don't see any real harm in doing this either.

Right, I actually get a flood of DMAR errors from VT-d when I unbind the
device from i915, but that happens regardless of this patch.  I did
confirm that VGA memory is re-added to VGA arbitration though, which is
important because even if i915 doesn't leave the device in a usable
state we might still be able to pass it to a guest, which will rely on
the VGA arbiter control being re-enabled.

  +   /* Enable VGA memory on Intel HD */
  +   if (HAS_PCH_SPLIT(dev)) {
  +   vga_get_uninterruptible(dev-pdev, VGA_RSRC_LEGACY_IO);
  +   outb(inb(VGA_MSR_READ) | VGA_MSR_MEM_EN, VGA_MSR_WRITE);
 
 I was thinking we might save the MSR state during driver load and
 restore to that same state, but it could be a bit overkill.

It seems unnecessary for just this bit, but if

[Intel-gfx] (no subject)

2013-08-26 Thread Michael S. Tsirkin

On Mon, Aug 26, 2013 at 04:05:11PM +0300, Michael S. Tsirkin wrote:
 On Wed, Aug 21, 2013 at 11:22:58AM +0200, Sedat Dilek wrote:
  [ Re: [Intel-gfx] i915 producing warnings with kernel 3.11-rc5 ]
  
  Hi,
  
  saw your posting in [1]... can you try the patches below?
  Not sure if they apply.
  Did you try v3.11-rc6(+)... or drm-intel-nightly?
  
  Regards,
  - Sedat -
  
  [1] http://lists.freedesktop.org/archives/intel-gfx/2013-August/032154.html
 
 Same thing observed with v3.11-rc7.

Looks like when this happens, external monitor does not work.
It shows this message:
not optimum mode: recommended mode 1280x1024 60Hz
while this is exactly what I configured in xrandr.



 
  
  -- Forwarded message --
  From: Sedat Dilek sedat.di...@gmail.com
  Date: Tue, Jul 2, 2013 at 7:31 AM
  Subject: Re: linux-next: Tree for Jul 1 [ drm-intel-next: Several 
  call-traces ]
  To: Daniel Vetter daniel.vet...@ffwll.ch
  Cc: Barnes, Jesse jbar...@virtuousgeek.org, Stephen Rothwell
  s...@canb.auug.org.au, linux-next linux-n...@vger.kernel.org, Linux
  Kernel Mailing List linux-ker...@vger.kernel.org, intel-gfx
  intel-gfx@lists.freedesktop.org, DRI
  dri-de...@lists.freedesktop.org, Dave Airlie airl...@gmail.com
  
  
  On Mon, Jul 1, 2013 at 11:03 AM, Sedat Dilek sedat.di...@gmail.com wrote:
   On Mon, Jul 1, 2013 at 10:52 AM, Daniel Vetter daniel.vet...@ffwll.ch 
   wrote:
   On Mon, Jul 1, 2013 at 10:49 AM, Sedat Dilek sedat.di...@gmail.com 
   wrote:
   On Mon, Jul 1, 2013 at 9:59 AM, Stephen Rothwell 
   s...@canb.auug.org.au wrote:
   Hi all,
  
   Changes since 20130628:
  
   The regulator tree gained a build failure so I used the version from
   next-20130628.
  
   The trivial tree gained a conflict against the fbdev tree.
  
   The arm-soc tree gained a conflict against the net-next tree.
  
   The akpm tree lost a few patches that turned up elsewhere and I 
   removed 2
   that were causing run time problems.
  
  
   [ CC drm and drm-intel folks ]
  
   [ Did not check any relevant MLs ]
  
   Please, see attached dmesg output.
  
   Clock mismatch, one for Jesse to figure out. Note that this patch is
   for 3.12, I simply haven't yet gotten around to properly split my
   patch queue so a few spilled into -next. I'll do that now.
  
   I like lightspeed-fast replies :-).
  
   Guess drm/i915: get mode clock when reading the pipe config v9 [1]
   is the cause.
  
  
  Problem solved by applying these patches to next-20130701 from
  intel-gfx patchwork-service [0]:
  
 [1/2] drm/i915: fixup messages in pipe_config_compare
 [2/2] drm/i915: get clock config when checking CRTC state too
  
  AFAICS 2/2 was folded into updated drm/i915: get mode clock when
  reading the pipe config v9 [3].
  
  It would be kind to be CCed on the patches and get also some credits.
  Also a CC to the report in linux-next should IMHO be done.
  
  - Sedat -
  
  [0] https://patchwork.kernel.org/project/intel-gfx/list/
  [1] https://patchwork.kernel.org/patch/2809031/
  [2] https://patchwork.kernel.org/patch/2809021/
  [3] 
  http://cgit.freedesktop.org/~danvet/drm-intel/commit/?h=drm-intel-nightlyid=f1f644dc66cbaf5a4c7dcde683361536b41885b9
  
   - Sedat -
  
   [1] 
   http://cgit.freedesktop.org/~danvet/drm-intel/commit/?h=drm-intel-next-queuedid=d325d8b4f351f9d45e7c8baabf581fd21f343133
  
   -Daniel
   --
   Daniel Vetter
   Software Engineer, Intel Corporation
   +41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH v2] i915: Update VGA arbiter support for newer devices

On Mon, Aug 26, 2013 at 10:05:36AM -0600, Alex Williamson wrote:
 On Mon, 2013-08-26 at 15:23 +0300, Ville Syrjälä wrote:
  On Sat, Aug 24, 2013 at 08:53:40AM -0600, Alex Williamson wrote:
   This is intended to add VGA arbiter support for Intel HD graphics on
   Core processors.  The old GMCH registers no longer exist, so even
   though it appears that i915 participates in VGA arbitration, it doesn't
   work.  On Intel HD graphics we already attempt to disable VGA regions
   of the device.  This makes registering as a VGA client unnecessary since
   we don't intend to operate differently depending on how many VGA devices
   are present.  We can disable VGA memory regions by clearing the memory
   enable bit in the VGA MSR.  That only leaves VGA IO, which we update
   the VGA arbiter to know that we don't participate in VGA memory
   arbitration.  We also add a hook on unload to re-enable memory and
   reinstate VGA memory arbitration.
   
   Signed-off-by: Alex Williamson alex.william...@redhat.com
   ---
   
   v2: I915_READ/WRITE accessors don't work in i915_disable_vga, use inb/outb
   directly.  Also, on the driver unbind VGA enable path, acquire legacy
   IO to re-enable VGA memory.  Correct comment.
   
   As with v1, this depends on vgaarb: Fixes for partial VGA opt-out.  With
   all patches I'm able to assign a discrete PEG VGA device to a guest and
   execute the VBIOS w/o interference from IGD or corruption of the IGD
   framebuffer.
  
  I glanced through those before, but I'll try to do a more thorough
  review soon.
 
 Thanks!
 
   
drivers/gpu/drm/i915/i915_dma.c  |9 ++---
drivers/gpu/drm/i915/intel_display.c |   24 
2 files changed, 30 insertions(+), 3 deletions(-)
   
   diff --git a/drivers/gpu/drm/i915/i915_dma.c 
   b/drivers/gpu/drm/i915/i915_dma.c
   index f466980..d9cf216 100644
   --- a/drivers/gpu/drm/i915/i915_dma.c
   +++ b/drivers/gpu/drm/i915/i915_dma.c
   @@ -1287,9 +1287,12 @@ static int i915_load_modeset_init(struct 
   drm_device *dev)
  * then we do not take part in VGA arbitration and the
  * vga_client_register() fails with -ENODEV.
  */
   - ret = vga_client_register(dev-pdev, dev, NULL, i915_vga_set_decode);
   - if (ret  ret != -ENODEV)
   - goto out;
   + if (!HAS_PCH_SPLIT(dev)) {
   + ret = vga_client_register(dev-pdev, dev, NULL,
   +   i915_vga_set_decode);
   + if (ret  ret != -ENODEV)
   + goto out;
   + }
  
  I think we should do something to fix i915_redisable_vga() for !pch
  platforms too. The MSR method should work for everything, so I'm leaning
  towards using it all the time. The other option would be add an mmio
  codepath into intel_disable_vga(), but then we have several codepaths
  to worry about.
 
 Do you have time, systems, and interest to be able to test !pch?  I
 assume that at some point in the past VGA arbiter did work on i915 with
 the right hardware, so I didn't want to go breaking code that I can't
 test.

I do have a few gen4ish machines sitting around. I guess I could try it on
them at some point when I have a bit of time. But you're right that
maybe it's better to not touch them before that. i915_disable_vga() is
already potentially broken on them, so your patch is not introducing
any new regressions, just maintaining the status quo.

   
 intel_register_dsm_handler();

   diff --git a/drivers/gpu/drm/i915/intel_display.c 
   b/drivers/gpu/drm/i915/intel_display.c
   index 5fb3058..2bb805c 100644
   --- a/drivers/gpu/drm/i915/intel_display.c
   +++ b/drivers/gpu/drm/i915/intel_display.c
   @@ -9519,6 +9519,15 @@ static void i915_disable_vga(struct drm_device 
   *dev)
 outb(SR01, VGA_SR_INDEX);
 sr1 = inb(VGA_SR_DATA);
 outb(sr1 | 15, VGA_SR_DATA);
   +
   + /* Disable VGA memory on Intel HD */
   + if (HAS_PCH_SPLIT(dev)) {
   + outb(inb(VGA_MSR_READ)  ~VGA_MSR_MEM_EN, VGA_MSR_WRITE);
   + vga_set_legacy_decoding(dev-pdev, VGA_RSRC_LEGACY_IO |
   +VGA_RSRC_NORMAL_IO |
   +VGA_RSRC_NORMAL_MEM);
   + }
   +
 vga_put(dev-pdev, VGA_RSRC_LEGACY_IO);
 udelay(300);

   @@ -9526,6 +9535,19 @@ static void i915_disable_vga(struct drm_device 
   *dev)
 POSTING_READ(vga_reg);
}

   +static void i915_enable_vga(struct drm_device *dev)
   +{
  
  Not sure we need this actually. We don't re-enable the VGA display
  on unload, so you'll never see anything on the display anyway. But
  then again, I don't see any real harm in doing this either.
 
 Right, I actually get a flood of DMAR errors from VT-d when I unbind the
 device from i915, but that happens regardless of this patch.  I did
 confirm that VGA memory is re-added to VGA arbitration though, which is
 important because even if i915 doesn't leave the device in a usable
 state we might still be able to pass it to

Re: [Intel-gfx] [PATCH] drm/i915: More vma fixups around unbind/destroy

On Mon, Aug 26, 2013 at 10:36:35AM +0100, Chris Wilson wrote:
 On Mon, Aug 26, 2013 at 11:23:47AM +0200, Daniel Vetter wrote:
  The important bugfix here is that we must not unlink the vma when
  we keep it around as a placeholder for the execbuf code. Since then we
  won't find it again when execbuf gets interrupt and restarted and
  create a 2nd vma. And since the code as-is isn't fit yet to deal with
  more than one vma, hilarity ensues.
  
  Specifically the dma map/unmap of the sg table isn't adjusted for
  multiple vmas yet and will blow up like this:
  
  BUG: unable to handle kernel NULL pointer dereference at 0008
  IP: [a008fb37] i915_gem_gtt_finish_object+0x73/0xc8 [i915]
  PGD 56bb5067 PUD ad3dd067 PMD 0
  Oops:  [#1] SMP
  Modules linked in: tcp_lp ppdev parport_pc lp parport ipv6 dm_mod dcdbas 
  snd_hda_codec_hdmi pcspkr snd_hda_codec_realtek serio_raw i2c_i801 iTCO_wdt 
  iTCO_vendor_support snd_hda_intel snd_hda_codec lpc_ich snd_hwdep mfd_core 
  snd_pcm snd_page_alloc snd_timer snd soundcore acpi_cpufreq i915 video 
  button drm_kms_helper drm mperf freq_table
  CPU: 1 PID: 16650 Comm: fbo-maxsize Not tainted 
  3.11.0-rc4_nightlytop_d93f59_debug_20130814_+ #6957
  Hardware name: Dell Inc. OptiPlex 9010/03JR84, BIOS A01 05/04/2012
  task: 8800563b3f00 ti: 88004bdf4000 task.ti: 88004bdf4000
  RIP: 0010:[a008fb37]  [a008fb37] 
  i915_gem_gtt_finish_object+0x73/0xc8 [i915]
  RSP: 0018:88004bdf5958  EFLAGS: 00010246
  RAX:  RBX: 8801135e RCX: 8800ad3bf8e0
  RDX: 8800ad3bf8e0 RSI:  RDI: 8801007ee780
  RBP: 88004bdf5978 R08: 8800ad3bf8e0 R09: 
  R10: 86ca1810 R11: 880036a17101 R12: 8801007ee780
  R13: 00018001 R14: 880118c4e000 R15: 8801007ee780
  FS:  7f401a0ce740() GS:88011e28() knlGS:
  CS:  0010 DS:  ES:  CR0: 80050033
  CR2: 0008 CR3: 5635c000 CR4: 001407e0
  Stack:
   8801007ee780 88005c253180 00018000 8801135e
   88004bdf59a8 a0088e55 0011 8801007eec00
   00018000 880036a17101 88004bdf5a08 a0089026
  Call Trace:
   [a0088e55] i915_vma_unbind+0xdf/0x1ab [i915]
   [a0089026] __i915_gem_shrink+0x105/0x177 [i915]
   [a0089452] i915_gem_object_get_pages_gtt+0x108/0x309 [i915]
   [a0085ba9] i915_gem_object_get_pages+0x61/0x90 [i915]
   [a008f22b] ? gen6_ppgtt_insert_entries+0x103/0x125 [i915]
   [a008a113] i915_gem_object_pin+0x1fa/0x5df [i915]
   [a008cdfe] i915_gem_execbuffer_reserve_object.isra.6+0x8d/0x1bc 
  [i915]
   [a008d156] i915_gem_execbuffer_reserve+0x229/0x367 [i915]
   [a008dbf6] i915_gem_do_execbuffer.isra.12+0x4dc/0xf3a [i915]
   [810fc823] ? might_fault+0x40/0x90
   [a008eb89] i915_gem_execbuffer2+0x187/0x222 [i915]
   [a000971c] drm_ioctl+0x308/0x442 [drm]
   [a008ea02] ? i915_gem_execbuffer+0x3ae/0x3ae [i915]
   [817db156] ? __do_page_fault+0x3dd/0x481
   [8112fdba] vfs_ioctl+0x26/0x39
   [811306a2] do_vfs_ioctl+0x40e/0x451
   [817deda7] ? sysret_check+0x1b/0x56
   [8113073c] SyS_ioctl+0x57/0x87
   [8135bbfe] ? trace_hardirqs_on_thunk+0x3a/0x3f
   [817ded82] system_call_fastpath+0x16/0x1b
  Code: 48 c7 c6 84 30 0e a0 31 c0 e8 d0 e9 f7 ff bf c6 a7 00 00 e8 07 af 2c 
  e1 41 f6 84 24 03 01 00 00 10 75 44 49 8b 84 24 08 01 00 00 8b 50 08 48 
  8b 30 49 8b 86 b0 04 00 00 48 89 c7 48 81 c7 98 00
  RIP  [a008fb37] i915_gem_gtt_finish_object+0x73/0xc8 [i915]
   RSP 88004bdf5958
  CR2: 0008
  
  As a consequence we need to change the only one vma for now check in
  vma_unbind - since vma_destroy isn't always called the obj-vma_list
  might not be empty. Instead check that the vma list is singular at the
  beginning of vma_unbind. This is also more symmetric with bind_to_vm.
  
  v2:
  - Add a paranoid WARN to mark_free in the eviction code to make sure
we never try to evict a vma used by the execbuf code right now.
  - Move the check for a temporary execbuf vma into vma_destroy -
otherwise the failure path cleanup in bind_to_vm will blow up.
  
  Our first attempting at fixing this was
  
  commit 1be81a2f2cfd8789a627401d470423358fba2d76
  Author: Chris Wilson ch...@chris-wilson.co.uk
  Date:   Tue Aug 20 12:56:40 2013 +0100
  
  drm/i915: Don't destroy the vma placeholder during execbuffer 
  reservation
  
  Squash with this when merging!
  
  v3: Improvements suggested in Chris' review:
  - Move the WARN_ON in vma_destroy that checks for vmas with an drm_mm
allocation before the early return.
  - Bail out if we hit the WARN in mark_free to hopefully make the
kernel survive for long enough to capture it.
  
  Cc: Chris Wilson ch...@chris-wilson.co.uk
  Cc: Ben

Re: [Intel-gfx] [PATCH] [VPG HSW-A] drm/i915: BUN vol4g[DevHSW]

2013-08-26 Thread Paulo Zanoni

2013/8/26 Koushik Biswas koushikx.bis...@intel.com:
 From: koushik koushikx.bis...@intel.com

 WW43 2012 - DDI buffer translation corrections
 WW36 2012 - Added HDMI voltage swing (not implemented
 for HDMI)

 Added comments with voltage swing, pre-emphasis,
 transition and non-transition values in form of table
 for reference. This values are applicable only for HSW
 platform.


But why exactly do we need this comment? We already have the
intel_hsw_signal_levels() function (inside intel_dp.c) which is
basically the implementation of your comment.



 Signed-off-by: koushik koushikx.bis...@intel.com
 Change-Id: I0cff220c7d047f41b2a96b3e3880b4029550d458
 ---
  drivers/gpu/drm/i915/intel_ddi.c |   31 +++
  drivers/gpu/drm/i915/intel_dp.c  |   32 
  2 files changed, 63 insertions(+)

 diff --git a/drivers/gpu/drm/i915/intel_ddi.c 
 b/drivers/gpu/drm/i915/intel_ddi.c
 index 5131517..0de236e 100644
 --- a/drivers/gpu/drm/i915/intel_ddi.c
 +++ b/drivers/gpu/drm/i915/intel_ddi.c
 @@ -128,6 +128,37 @@ void intel_prepare_ddi(struct drm_device *dev)
 intel_prepare_ddi_buffers(dev, PORT_E, true);
  }

 +
 +/* Updating the new table in comments as it doesn’t cause any logic change */
 +
 +/* For HSW Voltage swing levels ***/
 +/**/
 +/**/
 +/*|Entry|Voltage|Pre-emphasis|Non-Transition|Transition |Pre-emphasis|*/
 +/*| |Swing  |level   |mV diff p-p   |mV diff p-p|db  |*/
 +/*|--|*/
 +/*|0|0  |0   |400   |400|0   |*/
 +/*|--|*/
 +/*|1|0  |1   |400   |600|3.5 |*/
 +/*|--|*/
 +/*|2|0  |2   |400   |800|6   |*/
 +/*|--|*/
 +/*|3|0  |3   |400   |1000   |8   |*/
 +/*|--|*/
 +/*|4|1  |0   |600   |600|0   |*/
 +/*|--|*/
 +/*|5|1  |1   |600   |900|3.5 |*/
 +/*|--|*/
 +/*|6|1  |2   |600   |1000   |4.5 |*/
 +/*|--|*/
 +/*|7|2  |0   |800   |800|0   |*/
 +/*|--|*/
 +/*|8|2  |1   |1000  |1000   |2   |*/
 +/*|--|*/
 +/*|9|   Entry 9 is only used for HDMI and DVI|*/
 +/*|--|*/
 +/**/
 +
  static const long hsw_ddi_buf_ctl_values[] = {
 DDI_BUF_EMP_400MV_0DB_HSW,
 DDI_BUF_EMP_400MV_3_5DB_HSW,
 diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
 index 9fd7f90..fa73fb1 100644
 --- a/drivers/gpu/drm/i915/intel_dp.c
 +++ b/drivers/gpu/drm/i915/intel_dp.c
 @@ -1809,6 +1809,38 @@ intel_gen7_edp_signal_levels(uint8_t train_set)
 }
  }

 +
 +/* Updating the new table in comments as it doesn’t cause any logic change */
 +
 +/* For HSW Voltage swing levels ***/
 +/**/
 +/**/
 +/*|Entry|Voltage|Pre-emphasis|Non-Transition|Transition |Pre-emphasis|*/
 +/*| |Swing  |level   |mV diff p-p   |mV diff p-p|db  |*/
 +/*|--|*/
 +/*|0|0  |0   |400   |400|0   |*/
 +/*|--|*/
 +/*|1|0  |1   |400   |600|3.5 |*/
 +/*|--|*/
 +/*|2|0  |2   |400   |800|6   |*/
 +/*|--|*/
 +/*|3|0  |3   |400   |1000   |8   |*/
 +/*|--|*/
 +/*|4|1  |0   |600   |600|0   |*/
 +/*|--|*/
 +/*|5

[Intel-gfx] [PATCH 1/2] drm/i915: Embed the ring-private within the struct intel_ring_buffer

We now have more devices using ring-private than not, and they all want
the same structure. Worse, I would like to use a scratch page from
outside of intel_ringbuffer.c and so for convenience would like to reuse
ring-private. Embed the object into the struct intel_ringbuffer so that
we can keep the code clean.

Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
---
 drivers/gpu/drm/i915/i915_gpu_error.c   |  2 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c | 99 ++---
 drivers/gpu/drm/i915/intel_ringbuffer.h |  6 +-
 3 files changed, 35 insertions(+), 72 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c 
b/drivers/gpu/drm/i915/i915_gpu_error.c
index e8955e7..3b003af 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -644,7 +644,7 @@ i915_error_first_batchbuffer(struct drm_i915_private 
*dev_priv,
if (WARN_ON(ring-id != RCS))
return NULL;
 
-   obj = ring-private;
+   obj = ring-scratch.obj;
if (acthd = i915_gem_obj_ggtt_offset(obj) 
acthd  i915_gem_obj_ggtt_offset(obj) + obj-base.size)
return i915_error_object_create(dev_priv, obj);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c 
b/drivers/gpu/drm/i915/intel_ringbuffer.c
index b0fb5ce..7fa52bd 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -33,16 +33,6 @@
 #include i915_trace.h
 #include intel_drv.h
 
-/*
- * 965+ support PIPE_CONTROL commands, which provide finer grained control
- * over cache flushing.
- */
-struct pipe_control {
-   struct drm_i915_gem_object *obj;
-   volatile u32 *cpu_page;
-   u32 gtt_offset;
-};
-
 static inline int ring_space(struct intel_ring_buffer *ring)
 {
int space = (ring-head  HEAD_ADDR) - (ring-tail + 
I915_RING_FREE_SPACE);
@@ -185,8 +175,7 @@ gen4_render_ring_flush(struct intel_ring_buffer *ring,
 static int
 intel_emit_post_sync_nonzero_flush(struct intel_ring_buffer *ring)
 {
-   struct pipe_control *pc = ring-private;
-   u32 scratch_addr = pc-gtt_offset + 128;
+   u32 scratch_addr = ring-scratch.gtt_offset + 128;
int ret;
 
 
@@ -223,8 +212,7 @@ gen6_render_ring_flush(struct intel_ring_buffer *ring,
  u32 invalidate_domains, u32 flush_domains)
 {
u32 flags = 0;
-   struct pipe_control *pc = ring-private;
-   u32 scratch_addr = pc-gtt_offset + 128;
+   u32 scratch_addr = ring-scratch.gtt_offset + 128;
int ret;
 
/* Force SNB workarounds for PIPE_CONTROL flushes */
@@ -316,8 +304,7 @@ gen7_render_ring_flush(struct intel_ring_buffer *ring,
   u32 invalidate_domains, u32 flush_domains)
 {
u32 flags = 0;
-   struct pipe_control *pc = ring-private;
-   u32 scratch_addr = pc-gtt_offset + 128;
+   u32 scratch_addr = ring-scratch.gtt_offset + 128;
int ret;
 
/*
@@ -491,68 +478,43 @@ out:
 static int
 init_pipe_control(struct intel_ring_buffer *ring)
 {
-   struct pipe_control *pc;
-   struct drm_i915_gem_object *obj;
int ret;
 
-   if (ring-private)
+   if (ring-scratch.obj)
return 0;
 
-   pc = kmalloc(sizeof(*pc), GFP_KERNEL);
-   if (!pc)
-   return -ENOMEM;
-
-   obj = i915_gem_alloc_object(ring-dev, 4096);
-   if (obj == NULL) {
+   ring-scratch.obj = i915_gem_alloc_object(ring-dev, 4096);
+   if (ring-scratch.obj == NULL) {
DRM_ERROR(Failed to allocate seqno page\n);
ret = -ENOMEM;
goto err;
}
 
-   i915_gem_object_set_cache_level(obj, I915_CACHE_LLC);
+   i915_gem_object_set_cache_level(ring-scratch.obj, I915_CACHE_LLC);
 
-   ret = i915_gem_obj_ggtt_pin(obj, 4096, true, false);
+   ret = i915_gem_obj_ggtt_pin(ring-scratch.obj, 4096, true, false);
if (ret)
goto err_unref;
 
-   pc-gtt_offset = i915_gem_obj_ggtt_offset(obj);
-   pc-cpu_page = kmap(sg_page(obj-pages-sgl));
-   if (pc-cpu_page == NULL) {
+   ring-scratch.gtt_offset = i915_gem_obj_ggtt_offset(ring-scratch.obj);
+   ring-scratch.cpu_page = kmap(sg_page(ring-scratch.obj-pages-sgl));
+   if (ring-scratch.cpu_page == NULL) {
ret = -ENOMEM;
goto err_unpin;
}
 
DRM_DEBUG_DRIVER(%s pipe control offset: 0x%08x\n,
-ring-name, pc-gtt_offset);
-
-   pc-obj = obj;
-   ring-private = pc;
+ring-name, ring-scratch.gtt_offset);
return 0;
 
 err_unpin:
-   i915_gem_object_unpin(obj);
+   i915_gem_object_unpin(ring-scratch.obj);
 err_unref:
-   drm_gem_object_unreference(obj-base);
+   drm_gem_object_unreference(ring-scratch.obj-base);
 err:
-   kfree(pc);
return ret;
 }
 
-static void
-cleanup_pipe_control(struct

[Intel-gfx] [PATCH 2/2] drm/i915: Use RCS flips on Ivybridge+

RCS flips do work on Iybridge+ so long as we can unmask the messages
through DERRMR. However, there are quite a few workarounds mentioned
regarding unmasking more than one event or triggering more than one
message through DERRMR. Those workarounds in principle prevent us from
performing pipelined flips (and asynchronous flips across multiple
planes) and equally apply to the known good BCS ring. Given that it
already appears to work, and also appears to work with unmasking all 3
planes at once (and queuing flips across multiple planes), be brave.

Bugzlla: https://bugs.freedesktop.org/show_bug.cgi?id=67600
Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
Lightly-tested-by: Stephane Marchesin marche...@icps.u-strasbg.fr
Cc: Stephane Marchesin marche...@icps.u-strasbg.fr
Cc: Ben Widawsky b...@bwidawsk.net
---
 drivers/gpu/drm/i915/i915_reg.h  | 18 
 drivers/gpu/drm/i915/intel_display.c | 40 
 2 files changed, 49 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index c6f5009..df168f4 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -230,6 +230,7 @@
  *   address/value pairs. Don't overdue it, though, x = 2^4 must hold!
  */
 #define MI_LOAD_REGISTER_IMM(x)MI_INSTR(0x22, 2*x-1)
+#define MI_STORE_REGISTER_MEM(x) MI_INSTR(0x24, 2*x-1)
 #define MI_FLUSH_DWMI_INSTR(0x26, 1) /* for GEN6 */
 #define   MI_FLUSH_DW_STORE_INDEX  (121)
 #define   MI_INVALIDATE_TLB(118)
@@ -678,6 +679,23 @@
 #define   FPGA_DBG_RM_NOCLAIM  (131)
 
 #define DERRMR 0x44050
+#define   DERRMR_PIPEA_SCANLINE(10)
+#define   DERRMR_PIPEA_PRI_FLIP_DONE   (11)
+#define   DERRMR_PIPEA_SPR_FLIP_DONE   (12)
+#define   DERRMR_PIPEA_VBLANK  (13)
+#define   DERRMR_PIPEA_HBLANK  (15)
+#define   DERRMR_PIPEB_SCANLINE(18)
+#define   DERRMR_PIPEB_PRI_FLIP_DONE   (19)
+#define   DERRMR_PIPEB_SPR_FLIP_DONE   (110)
+#define   DERRMR_PIPEB_VBLANK  (111)
+#define   DERRMR_PIPEB_HBLANK  (113)
+/* Note that PIPEC is not a simple translation of PIPEA/PIPEB */
+#define   DERRMR_PIPEC_SCANLINE(114)
+#define   DERRMR_PIPEC_PRI_FLIP_DONE   (115)
+#define   DERRMR_PIPEC_SPR_FLIP_DONE   (120)
+#define   DERRMR_PIPEC_VBLANK  (121)
+#define   DERRMR_PIPEC_HBLANK  (122)
+
 
 /* GM45+ chicken bits -- debug workaround bits that may be required
  * for various sorts of correct behavior.  The top 16 bits of each are
diff --git a/drivers/gpu/drm/i915/intel_display.c 
b/drivers/gpu/drm/i915/intel_display.c
index 9748dce..ffbcbd1 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -7826,12 +7826,6 @@ err:
return ret;
 }
 
-/*
- * On gen7 we currently use the blit ring because (in early silicon at least)
- * the render ring doesn't give us interrpts for page flip completion, which
- * means clients will hang after the first flip is queued.  Fortunately the
- * blit ring generates interrupts properly, so use it instead.
- */
 static int intel_gen7_queue_flip(struct drm_device *dev,
 struct drm_crtc *crtc,
 struct drm_framebuffer *fb,
@@ -7839,9 +7833,13 @@ static int intel_gen7_queue_flip(struct drm_device *dev,
 {
struct drm_i915_private *dev_priv = dev-dev_private;
struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
-   struct intel_ring_buffer *ring = dev_priv-ring[BCS];
+   struct intel_ring_buffer *ring;
uint32_t plane_bit = 0;
-   int ret;
+   int len, ret;
+
+   ring = obj-ring;
+   if (ring == NULL || ring-id != RCS)
+   ring = dev_priv-ring[BCS];
 
ret = intel_pin_and_fence_fb_obj(dev, obj, ring);
if (ret)
@@ -7863,10 +7861,34 @@ static int intel_gen7_queue_flip(struct drm_device *dev,
goto err_unpin;
}
 
-   ret = intel_ring_begin(ring, 4);
+   len = 4;
+   if (ring-id == RCS)
+   len += 6;
+
+   ret = intel_ring_begin(ring, len);
if (ret)
goto err_unpin;
 
+   /* Unmask the flip-done completion message. Note that the bspec says 
that
+* we should do this for both the BCS and RCS, and that we must not 
unmask
+* more than one flip event at any time (or ensure that one flip message
+* can be sent by waiting for flip-done prior to queueing new flips).
+* Experimentation says that BCS works despite DERRMR masking all
+* flip-done completion events and that unmasking all planes at once
+* for the RCS also doesn't appear to drop events. Setting the DERRMR
+* to zero does lead to lockups within MI_DISPLAY_FLIP.
+*/
+   if (ring-id == RCS) {
+   intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
+   intel_ring_emit(ring, DERRMR);
+

Re: [Intel-gfx] [PATCH] Add second DRI driver name (DRI2DriverVDPAU)

2013-08-26 Thread Eric Anholt

Ибрагимов Ринат ibragimovri...@mail.ru writes:

 libvdpau uses second DRI driver name to determine which VDPAU driver
 to use. This patch will allow libvdpau choose libvdpau_i965.so on systems
 with Intel GPUs, libvdpau_nvidia.so on those with nVidia ones, and so on.
 I'm experimenting now with generic vdpau driver using OpenGL/VA-API,
 it would be convenient to have this driver selection working without manual
 driver selection.

If it's a generic driver, why would it need a i965 string passed over
the DRI2 protocol to find it?

One of the things I'm planning on doing is removing use of the
protocol-provided DRI2 driver name from Mesa -- Mesa knows what drivers
it has built, and it knows how to detect those devices (in the EGL
code), so why would we pay attention to what the X Server thinks our
driver's name is?  Seems like vdpau could probably do the same.


pgpR0lyDH__GV.pgp
Description: PGP signature
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH 2/2] drm/i915: Use RCS flips on Ivybridge+

2013-08-26 Thread Stéphane Marchesin

On Mon, Aug 26, 2013 at 12:58 PM, Chris Wilson ch...@chris-wilson.co.ukwrote:

 RCS flips do work on Iybridge+ so long as we can unmask the messages
 through DERRMR. However, there are quite a few workarounds mentioned
 regarding unmasking more than one event or triggering more than one
 message through DERRMR. Those workarounds in principle prevent us from
 performing pipelined flips (and asynchronous flips across multiple
 planes) and equally apply to the known good BCS ring. Given that it
 already appears to work, and also appears to work with unmasking all 3
 planes at once (and queuing flips across multiple planes), be brave.

 Bugzlla: https://bugs.freedesktop.org/show_bug.cgi?id=67600
 Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
 Lightly-tested-by: Stephane Marchesin marche...@icps.u-strasbg.fr


You can go ahead and say Tested-by, I ran my usual tests for 3 hours and it
didn't crash/show an issue. It would crash in ~10-30 minutes with the other
patches.

Stéphane


 Cc: Stephane Marchesin marche...@icps.u-strasbg.fr
 Cc: Ben Widawsky b...@bwidawsk.net
 ---
  drivers/gpu/drm/i915/i915_reg.h  | 18 
  drivers/gpu/drm/i915/intel_display.c | 40
 
  2 files changed, 49 insertions(+), 9 deletions(-)

 diff --git a/drivers/gpu/drm/i915/i915_reg.h
 b/drivers/gpu/drm/i915/i915_reg.h
 index c6f5009..df168f4 100644
 --- a/drivers/gpu/drm/i915/i915_reg.h
 +++ b/drivers/gpu/drm/i915/i915_reg.h
 @@ -230,6 +230,7 @@
   *   address/value pairs. Don't overdue it, though, x = 2^4 must hold!
   */
  #define MI_LOAD_REGISTER_IMM(x)MI_INSTR(0x22, 2*x-1)
 +#define MI_STORE_REGISTER_MEM(x) MI_INSTR(0x24, 2*x-1)
  #define MI_FLUSH_DWMI_INSTR(0x26, 1) /* for GEN6 */
  #define   MI_FLUSH_DW_STORE_INDEX  (121)
  #define   MI_INVALIDATE_TLB(118)
 @@ -678,6 +679,23 @@
  #define   FPGA_DBG_RM_NOCLAIM  (131)

  #define DERRMR 0x44050
 +#define   DERRMR_PIPEA_SCANLINE(10)
 +#define   DERRMR_PIPEA_PRI_FLIP_DONE   (11)
 +#define   DERRMR_PIPEA_SPR_FLIP_DONE   (12)
 +#define   DERRMR_PIPEA_VBLANK  (13)
 +#define   DERRMR_PIPEA_HBLANK  (15)
 +#define   DERRMR_PIPEB_SCANLINE(18)
 +#define   DERRMR_PIPEB_PRI_FLIP_DONE   (19)
 +#define   DERRMR_PIPEB_SPR_FLIP_DONE   (110)
 +#define   DERRMR_PIPEB_VBLANK  (111)
 +#define   DERRMR_PIPEB_HBLANK  (113)
 +/* Note that PIPEC is not a simple translation of PIPEA/PIPEB */
 +#define   DERRMR_PIPEC_SCANLINE(114)
 +#define   DERRMR_PIPEC_PRI_FLIP_DONE   (115)
 +#define   DERRMR_PIPEC_SPR_FLIP_DONE   (120)
 +#define   DERRMR_PIPEC_VBLANK  (121)
 +#define   DERRMR_PIPEC_HBLANK  (122)
 +

  /* GM45+ chicken bits -- debug workaround bits that may be required
   * for various sorts of correct behavior.  The top 16 bits of each are
 diff --git a/drivers/gpu/drm/i915/intel_display.c
 b/drivers/gpu/drm/i915/intel_display.c
 index 9748dce..ffbcbd1 100644
 --- a/drivers/gpu/drm/i915/intel_display.c
 +++ b/drivers/gpu/drm/i915/intel_display.c
 @@ -7826,12 +7826,6 @@ err:
 return ret;
  }

 -/*
 - * On gen7 we currently use the blit ring because (in early silicon at
 least)
 - * the render ring doesn't give us interrpts for page flip completion,
 which
 - * means clients will hang after the first flip is queued.  Fortunately
 the
 - * blit ring generates interrupts properly, so use it instead.
 - */
  static int intel_gen7_queue_flip(struct drm_device *dev,
  struct drm_crtc *crtc,
  struct drm_framebuffer *fb,
 @@ -7839,9 +7833,13 @@ static int intel_gen7_queue_flip(struct drm_device
 *dev,
  {
 struct drm_i915_private *dev_priv = dev-dev_private;
 struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
 -   struct intel_ring_buffer *ring = dev_priv-ring[BCS];
 +   struct intel_ring_buffer *ring;
 uint32_t plane_bit = 0;
 -   int ret;
 +   int len, ret;
 +
 +   ring = obj-ring;
 +   if (ring == NULL || ring-id != RCS)
 +   ring = dev_priv-ring[BCS];

 ret = intel_pin_and_fence_fb_obj(dev, obj, ring);
 if (ret)
 @@ -7863,10 +7861,34 @@ static int intel_gen7_queue_flip(struct drm_device
 *dev,
 goto err_unpin;
 }

 -   ret = intel_ring_begin(ring, 4);
 +   len = 4;
 +   if (ring-id == RCS)
 +   len += 6;
 +
 +   ret = intel_ring_begin(ring, len);
 if (ret)
 goto err_unpin;

 +   /* Unmask the flip-done completion message. Note that the bspec
 says that
 +* we should do this for both the BCS and RCS, and that we must
 not unmask
 +* more than one flip event at any time (or ensure that one flip
 message
 +* can be sent by waiting for flip-done prior to queueing new
 flips).
 +* Experimentation says that BCS works despite DERRMR masking all
 +* flip-done

Re: [Intel-gfx] [PATCH] drm/i915: Write RING_TAIL once per-request

2013-08-26 Thread Ben Widawsky

On Sat, Aug 10, 2013 at 10:16:32PM +0100, Chris Wilson wrote:
 Ignoring the legacy DRI1 code, and a couple of special cases (to be
 discussed later), all access to the ring is mediated through requests.
 The first write to a ring will grab a seqno and mark the ring as having
 an outstanding_lazy_request. Either through explicitly adding a request
 after an execbuffer or through an implicit wait (either by the CPU or by
 a semaphore), that sequence of writes will be terminated with a request.
 So we can ellide all the intervening writes to the tail register and
 send the entire command stream to the GPU at once. This will reduce the
 number of *serialising* writes to the tail register by a factor or 3-5
 times (depending upon architecture and number of workarounds, context
 switches, etc involved). This becomes even more noticeable when the
 register write is overloaded with a number of debugging tools. The
 astute reader will wonder if it is then possible to overflow the ring
 with a single command. It is not. When we start a command sequence to
 the ring, we check for available space and issue a wait in case we have
 not. The ring wait will in this case be forced to flush the outstanding
 register write and then poll the ACTHD for sufficient space to continue.
 
 The exception to the rule where everything is inside a request are a few
 initialisation cases where we may want to write GPU commands via the CS
 before userspace wakes up and page flips.
 

I'm not a huge fan of having the second intel_ring_advance() that does something
other than it sounds like. I'd *much* prefer to not intel_ring_advance()
if you are certain more emits will be coming like in the case you
mention above. We can add a paranoia check whenever we're about to
return to userspace that tail == RING_TAIL

Also, without performance data, it's hard to say this indirection is
worth it.

 Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
 ---
  drivers/gpu/drm/i915/i915_dma.c |  2 +-
  drivers/gpu/drm/i915/i915_gem_exec.c|  2 +-
  drivers/gpu/drm/i915/intel_display.c| 10 +-
  drivers/gpu/drm/i915/intel_ringbuffer.c | 30 --
  drivers/gpu/drm/i915/intel_ringbuffer.h |  7 ++-
  5 files changed, 29 insertions(+), 22 deletions(-)
 
 diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
 index de0b86a..a7a0eb86 100644
 --- a/drivers/gpu/drm/i915/i915_dma.c
 +++ b/drivers/gpu/drm/i915/i915_dma.c
 @@ -52,7 +52,7 @@
   intel_ring_emit(LP_RING(dev_priv), x)
  
  #define ADVANCE_LP_RING() \
 - intel_ring_advance(LP_RING(dev_priv))
 + __intel_ring_advance(LP_RING(dev_priv))
  
  /**
   * Lock test for when it's just for synchronization of ring access.
 diff --git a/drivers/gpu/drm/i915/i915_gem_exec.c 
 b/drivers/gpu/drm/i915/i915_gem_exec.c
 index a2c6dbf..4da3704 100644
 --- a/drivers/gpu/drm/i915/i915_gem_exec.c
 +++ b/drivers/gpu/drm/i915/i915_gem_exec.c
 @@ -111,7 +111,7 @@ int i915_gem_exec_clear_object(struct drm_i915_gem_object 
 *obj)
   intel_ring_emit(ring, 0);
   intel_ring_emit(ring, MI_NOOP);
  
 - intel_ring_advance(ring);
 + __intel_ring_advance(ring);
   i915_gem_exec_dirty_object(obj, ring);
  
  unpin:
 diff --git a/drivers/gpu/drm/i915/intel_display.c 
 b/drivers/gpu/drm/i915/intel_display.c
 index b26b50b..6b7ce06 100644
 --- a/drivers/gpu/drm/i915/intel_display.c
 +++ b/drivers/gpu/drm/i915/intel_display.c
 @@ -7547,7 +7547,7 @@ static int intel_gen2_queue_flip(struct drm_device *dev,
   intel_ring_emit(ring, 0); /* aux display base address, unused */
  
   intel_mark_page_flip_active(intel_crtc);
 - intel_ring_advance(ring);
 + __intel_ring_advance(ring);
   return 0;
  
  err_unpin:
 @@ -7588,7 +7588,7 @@ static int intel_gen3_queue_flip(struct drm_device *dev,
   intel_ring_emit(ring, MI_NOOP);
  
   intel_mark_page_flip_active(intel_crtc);
 - intel_ring_advance(ring);
 + __intel_ring_advance(ring);
   return 0;
  
  err_unpin:
 @@ -7636,7 +7636,7 @@ static int intel_gen4_queue_flip(struct drm_device *dev,
   intel_ring_emit(ring, pf | pipesrc);
  
   intel_mark_page_flip_active(intel_crtc);
 - intel_ring_advance(ring);
 + __intel_ring_advance(ring);
   return 0;
  
  err_unpin:
 @@ -7680,7 +7680,7 @@ static int intel_gen6_queue_flip(struct drm_device *dev,
   intel_ring_emit(ring, pf | pipesrc);
  
   intel_mark_page_flip_active(intel_crtc);
 - intel_ring_advance(ring);
 + __intel_ring_advance(ring);
   return 0;
  
  err_unpin:
 @@ -7736,7 +7736,7 @@ static int intel_gen7_queue_flip(struct drm_device *dev,
   intel_ring_emit(ring, (MI_NOOP));
  
   intel_mark_page_flip_active(intel_crtc);
 - intel_ring_advance(ring);
 + __intel_ring_advance(ring);
   return 0;
  
  err_unpin:
 diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c 
 b/drivers/gpu/drm/i915/intel_ringbuffer.c
 index 7de95da..8d5dd65 100644
 ---

Re: [Intel-gfx] [PATCH 2/2] drm/i915: Use RCS flips on Ivybridge+

2013-08-26 Thread Stéphane Marchesin

On Mon, Aug 26, 2013 at 1:42 PM, Stéphane Marchesin 
stephane.marche...@gmail.com wrote:




 On Mon, Aug 26, 2013 at 12:58 PM, Chris Wilson 
 ch...@chris-wilson.co.ukwrote:

 RCS flips do work on Iybridge+ so long as we can unmask the messages
 through DERRMR. However, there are quite a few workarounds mentioned
 regarding unmasking more than one event or triggering more than one
 message through DERRMR. Those workarounds in principle prevent us from
 performing pipelined flips (and asynchronous flips across multiple
 planes) and equally apply to the known good BCS ring. Given that it
 already appears to work, and also appears to work with unmasking all 3
 planes at once (and queuing flips across multiple planes), be brave.

 Bugzlla: https://bugs.freedesktop.org/show_bug.cgi?id=67600
 Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
 Lightly-tested-by: Stephane Marchesin marche...@icps.u-strasbg.fr


 You can go ahead and say Tested-by, I ran my usual tests for 3 hours and
 it didn't crash/show an issue. It would crash in ~10-30 minutes with the
 other patches.


Oh actually this one is a bit different... Give me 3 hours :)

Stéphane



 Stéphane


 Cc: Stephane Marchesin marche...@icps.u-strasbg.fr
 Cc: Ben Widawsky b...@bwidawsk.net
 ---
  drivers/gpu/drm/i915/i915_reg.h  | 18 
  drivers/gpu/drm/i915/intel_display.c | 40
 
  2 files changed, 49 insertions(+), 9 deletions(-)

 diff --git a/drivers/gpu/drm/i915/i915_reg.h
 b/drivers/gpu/drm/i915/i915_reg.h
 index c6f5009..df168f4 100644
 --- a/drivers/gpu/drm/i915/i915_reg.h
 +++ b/drivers/gpu/drm/i915/i915_reg.h
 @@ -230,6 +230,7 @@
   *   address/value pairs. Don't overdue it, though, x = 2^4 must hold!
   */
  #define MI_LOAD_REGISTER_IMM(x)MI_INSTR(0x22, 2*x-1)
 +#define MI_STORE_REGISTER_MEM(x) MI_INSTR(0x24, 2*x-1)
  #define MI_FLUSH_DWMI_INSTR(0x26, 1) /* for GEN6 */
  #define   MI_FLUSH_DW_STORE_INDEX  (121)
  #define   MI_INVALIDATE_TLB(118)
 @@ -678,6 +679,23 @@
  #define   FPGA_DBG_RM_NOCLAIM  (131)

  #define DERRMR 0x44050
 +#define   DERRMR_PIPEA_SCANLINE(10)
 +#define   DERRMR_PIPEA_PRI_FLIP_DONE   (11)
 +#define   DERRMR_PIPEA_SPR_FLIP_DONE   (12)
 +#define   DERRMR_PIPEA_VBLANK  (13)
 +#define   DERRMR_PIPEA_HBLANK  (15)
 +#define   DERRMR_PIPEB_SCANLINE(18)
 +#define   DERRMR_PIPEB_PRI_FLIP_DONE   (19)
 +#define   DERRMR_PIPEB_SPR_FLIP_DONE   (110)
 +#define   DERRMR_PIPEB_VBLANK  (111)
 +#define   DERRMR_PIPEB_HBLANK  (113)
 +/* Note that PIPEC is not a simple translation of PIPEA/PIPEB */
 +#define   DERRMR_PIPEC_SCANLINE(114)
 +#define   DERRMR_PIPEC_PRI_FLIP_DONE   (115)
 +#define   DERRMR_PIPEC_SPR_FLIP_DONE   (120)
 +#define   DERRMR_PIPEC_VBLANK  (121)
 +#define   DERRMR_PIPEC_HBLANK  (122)
 +

  /* GM45+ chicken bits -- debug workaround bits that may be required
   * for various sorts of correct behavior.  The top 16 bits of each are
 diff --git a/drivers/gpu/drm/i915/intel_display.c
 b/drivers/gpu/drm/i915/intel_display.c
 index 9748dce..ffbcbd1 100644
 --- a/drivers/gpu/drm/i915/intel_display.c
 +++ b/drivers/gpu/drm/i915/intel_display.c
 @@ -7826,12 +7826,6 @@ err:
 return ret;
  }

 -/*
 - * On gen7 we currently use the blit ring because (in early silicon at
 least)
 - * the render ring doesn't give us interrpts for page flip completion,
 which
 - * means clients will hang after the first flip is queued.  Fortunately
 the
 - * blit ring generates interrupts properly, so use it instead.
 - */
  static int intel_gen7_queue_flip(struct drm_device *dev,
  struct drm_crtc *crtc,
  struct drm_framebuffer *fb,
 @@ -7839,9 +7833,13 @@ static int intel_gen7_queue_flip(struct drm_device
 *dev,
  {
 struct drm_i915_private *dev_priv = dev-dev_private;
 struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
 -   struct intel_ring_buffer *ring = dev_priv-ring[BCS];
 +   struct intel_ring_buffer *ring;
 uint32_t plane_bit = 0;
 -   int ret;
 +   int len, ret;
 +
 +   ring = obj-ring;
 +   if (ring == NULL || ring-id != RCS)
 +   ring = dev_priv-ring[BCS];

 ret = intel_pin_and_fence_fb_obj(dev, obj, ring);
 if (ret)
 @@ -7863,10 +7861,34 @@ static int intel_gen7_queue_flip(struct
 drm_device *dev,
 goto err_unpin;
 }

 -   ret = intel_ring_begin(ring, 4);
 +   len = 4;
 +   if (ring-id == RCS)
 +   len += 6;
 +
 +   ret = intel_ring_begin(ring, len);
 if (ret)
 goto err_unpin;

 +   /* Unmask the flip-done completion message. Note that the bspec
 says that
 +* we should do this for both the BCS and RCS, and that we must
 not unmask
 +* more than one flip event at any time (or ensure that one flip
 message
 +

Re: [Intel-gfx] [PATCH 2/2] drm/i915: Use RCS flips on Ivybridge+

On Mon, Aug 26, 2013 at 01:43:25PM -0700, Stéphane Marchesin wrote:
On Mon, Aug 26, 2013 at 1:42 PM, St�phane Marchesin
[1]stephane.marche...@gmail.com wrote:
 
  On Mon, Aug 26, 2013 at 12:58 PM, Chris Wilson
  [2]ch...@chris-wilson.co.uk wrote:
 
RCS flips do work on Iybridge+ so long as we can unmask the messages
through DERRMR. However, there are quite a few workarounds mentioned
regarding unmasking more than one event or triggering more than one
message through DERRMR. Those workarounds in principle prevent us from
performing pipelined flips (and asynchronous flips across multiple
planes) and equally apply to the known good BCS ring. Given that it
already appears to work, and also appears to work with unmasking all 3
planes at once (and queuing flips across multiple planes), be brave.
 
Bugzlla: [3]https://bugs.freedesktop.org/show_bug.cgi?id=67600
Signed-off-by: Chris Wilson [4]ch...@chris-wilson.co.uk
Lightly-tested-by: Stephane Marchesin [5]marche...@icps.u-strasbg.fr
 
  You can go ahead and say Tested-by, I ran my usual tests for 3 hours and
  it didn't crash/show an issue. It would crash in ~10-30 minutes with the
  other patches.
 
Oh actually this one is a bit different... Give me 3 hours :)
St�phane

I should have mentionted this was v2, and your testing was for v1. Hopefully,
I've moved the code around correctly.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH] Add second DRI driver name (DRI2DriverVDPAU)

2013-08-26 Thread Rinat Ibragimov

Понедельник, 26 августа 2013, 13:40 -07:00 от Eric Anholt e...@anholt.net:
Ибрагимов Ринат ibragimovri...@mail.ru writes:

libvdpau uses second DRI driver name to determine which VDPAU driver
to use. This patch will allow libvdpau choose libvdpau_i965.so on systems
with Intel GPUs, libvdpau_nvidia.so on those with nVidia ones, and so on.
I'm experimenting now with generic vdpau driver using OpenGL/VA-API,
it would be convenient to have this driver selection working without manual
driver selection.

If it's a generic driver, why would it need a i965 string passed over
the DRI2 protocol to find it?

It doesn't actually. It's just to simplify distribution package management.
If one names driver libvdpau_nvidia.so.1, it will break VDPAU on
nVidia equipped machines. With this patch one can name it
libvdpau_i965.so.1 and driver will be loaded on intel equipped
machines only.

One of the things I'm planning on doing is removing use of the
protocol-provided DRI2 driver name from Mesa -- Mesa knows what drivers
it has built, and it knows how to detect those devices (in the EGL
code), so why would we pay attention to what the X Server thinks our
driver's name is? Seems like vdpau could probably do the same.

VDPAU uses entry point library (libvdpau) which loads backend drivers.
libvdpau uses second DRI2 driver name amongst other methods to determine
which driver to load. It can not rely on Mesa's internal knowledge, because
it must work with proprietary nVidia driver too.

---
Rinat
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

[Intel-gfx] [PATCH] drm/i915: Rename __intel_ring_advance() to intel_ring_commit()

Ben complained in his review of write RING_TAIL once per request that
intel_ring_advance() did nothing and was confusing with
__intel_ring_advance() doing the hard work. This patch drops the
intel_ring_advance(), and renames __intel_ring_advance() to
intel_ring_commit(). We go one step further and rename intel_ring_begin()
to intel_ring_get_space() so that the asymmetry looks less startling.

The advantage of intel_ring_begin()/intel_ring_advance() is that that
demarcated a known number of dword writes to the ring, with the
intention of making it easier for the reviewer to check the count.

The advantage of intel_ring_get_space(), ..., intel_ring_commit() is
that the relaxed rules on the TAIL write are more obvious. There is no
implicit nop (intel_ring_advance) to befuddle the reader. If the
commands need to be emitter immediate, the reader knows to look for an
explicit intel_ring_commit().

Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
---
 drivers/gpu/drm/i915/i915_dma.c|4 +-
 drivers/gpu/drm/i915/i915_gem_context.c|7 +--
 drivers/gpu/drm/i915/i915_gem_exec.c   |4 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |9 +--
 drivers/gpu/drm/i915/intel_display.c   |   20 +++
 drivers/gpu/drm/i915/intel_overlay.c   |   12 ++--
 drivers/gpu/drm/i915/intel_ringbuffer.c|   82 +++-
 drivers/gpu/drm/i915/intel_ringbuffer.h|8 +--
 8 files changed, 58 insertions(+), 88 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index c0fb23e..fb50467 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -46,13 +46,13 @@
 #define LP_RING(d) (((struct drm_i915_private *)(d))-ring[RCS])
 
 #define BEGIN_LP_RING(n) \
-   intel_ring_begin(LP_RING(dev_priv), (n))
+   intel_ring_get_space(LP_RING(dev_priv), (n))
 
 #define OUT_RING(x) \
intel_ring_emit(LP_RING(dev_priv), x)
 
 #define ADVANCE_LP_RING() \
-   __intel_ring_advance(LP_RING(dev_priv))
+   intel_ring_commit(LP_RING(dev_priv))
 
 /**
  * Lock test for when it's just for synchronization of ring access.
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
b/drivers/gpu/drm/i915/i915_gem_context.c
index fae2f4d..729b1bb 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -370,7 +370,7 @@ mi_set_context(struct intel_ring_buffer *ring,
if (!new_context-is_initialized)
len += 2;
 
-   ret = intel_ring_begin(ring, len);
+   ret = intel_ring_get_space(ring, len);
if (ret)
return ret;
 
@@ -419,10 +419,7 @@ mi_set_context(struct intel_ring_buffer *ring,
break;
}
 
-   intel_ring_advance(ring);
-
-   return ret;
-
+   return 0;
 }
 
 static int do_switch(struct i915_hw_context *to)
diff --git a/drivers/gpu/drm/i915/i915_gem_exec.c 
b/drivers/gpu/drm/i915/i915_gem_exec.c
index 4da3704..2d434b2 100644
--- a/drivers/gpu/drm/i915/i915_gem_exec.c
+++ b/drivers/gpu/drm/i915/i915_gem_exec.c
@@ -100,7 +100,7 @@ int i915_gem_exec_clear_object(struct drm_i915_gem_object 
*obj)
if (ret)
goto unpin;
 
-   ret = intel_ring_begin(ring, 6);
+   ret = intel_ring_get_space(ring, 6);
if (ret)
goto unpin;
 
@@ -111,7 +111,7 @@ int i915_gem_exec_clear_object(struct drm_i915_gem_object 
*obj)
intel_ring_emit(ring, 0);
intel_ring_emit(ring, MI_NOOP);
 
-   __intel_ring_advance(ring);
+   intel_ring_commit(ring);
i915_gem_exec_dirty_object(obj, ring);
 
 unpin:
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 9b7785d..ca0a2f5 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -899,7 +899,7 @@ i915_reset_gen7_sol_offsets(struct drm_device *dev,
if (!IS_GEN7(dev) || ring != dev_priv-ring[RCS])
return 0;
 
-   ret = intel_ring_begin(ring, 4 * 3);
+   ret = intel_ring_get_space(ring, 4 * 3);
if (ret)
return ret;
 
@@ -909,8 +909,6 @@ i915_reset_gen7_sol_offsets(struct drm_device *dev,
intel_ring_emit(ring, 0);
}
 
-   intel_ring_advance(ring);
-
return 0;
 }
 
@@ -1121,15 +1119,14 @@ i915_gem_do_execbuffer(struct drm_device *dev, void 
*data,
 
if (ring == dev_priv-ring[RCS] 
mode != dev_priv-relative_constants_mode) {
-   ret = intel_ring_begin(ring, 4);
+   ret = intel_ring_get_space(ring, 4);
if (ret)
-   goto err;
+   goto err;
 
intel_ring_emit(ring, MI_NOOP);
intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
intel_ring_emit(ring, INSTPM);
intel_ring_emit(ring, mask  16 | mode);
-   intel_ring_advance(ring);

[Intel-gfx] [PATCH 01/17] drm/i915: Do not add an interrupt for a context switch

From: Chris Wilson ch...@chris-wilson.co.uk

We use the request to ensure we hold a reference to the context for the
duration that it remains in use by the ring. Each request only holds a
reference to the current context, hence we emit a request after
switching contexts with the final reference to the old context. However,
the extra interrupt caused by that request is not useful (no timing
critical function will wait for the context object), instead the overhead
of servicing the IRQ shows up in some (lightweight) benchmarks. In order
to keep the useful property of using the request to manage the context
lifetime, we want to add a dummy request that is associated with the
interrupt from the subsequent real request following the batch.

The extra interrupt was added as a side-effect of using
i915_add_request() in

commit 112522f6789581824903f6f72082b5b841a7f0f9
Author: Chris Wilson ch...@chris-wilson.co.uk
Date:   Thu May 2 16:48:07 2013 +0300

drm/i915: put context upon switching

v2: Daniel convinced me that the request here was solely for context
lifetime tracking and that we have the active ref to keep the object
alive whilst the MI_SET_CONTEXT. So the only concern then is which
context should get the blame for MI_SET_CONTEXT failing. The old scheme
added a request for the old context so that any hang upto and including
the switch away would mark the old context as guilty. Now any hang here
implicates the new context. However since we have already gone through a
complete flush with the last context in its last request, and all that
lies in no-man's-land is an invalidate flush and the MI_SET_CONTEXT, we
should be safe in not unduly placing blame on the new context.

Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
Cc: Ben Widawsky b...@bwidawsk.net
Cc: Paulo Zanoni paulo.r.zan...@intel.com
---
 drivers/gpu/drm/i915/i915_gem.c |  1 -
 drivers/gpu/drm/i915/i915_gem_context.c | 12 +---
 2 files changed, 1 insertion(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 2d1cb10..56c9104 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2046,7 +2046,6 @@ int __i915_add_request(struct intel_ring_buffer *ring,
if (request == NULL)
return -ENOMEM;
 
-
/* Record the position of the start of the request so that
 * should we detect the updated seqno part-way through the
 * GPU processing the request, we never over-estimate the
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
b/drivers/gpu/drm/i915/i915_gem_context.c
index 403309c..b6da70b 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -451,17 +451,7 @@ static int do_switch(struct i915_hw_context *to)
from-obj-dirty = 1;
BUG_ON(from-obj-ring != ring);
 
-   ret = i915_add_request(ring, NULL);
-   if (ret) {
-   /* Too late, we've already scheduled a context switch.
-* Try to undo the change so that the hw state is
-* consistent with out tracking. In case of emergency,
-* scream.
-*/
-   WARN_ON(mi_set_context(ring, from, MI_RESTORE_INHIBIT));
-   return ret;
-   }
-
+   /* obj is kept alive until the next request by its active ref */
i915_gem_object_unpin(from-obj);
i915_gem_context_unreference(from);
}
-- 
1.8.1.4

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

[Intel-gfx] [PATCH 00/17] drm-intel-collector WW34 - Simple patches as series for review

Hi all,

Let me introduce drm-intel-collector branch:
http://cgit.freedesktop.org/~vivijim/drm-intel/log/?h=drm-intel-collector

To describe drm-intel-collector I'll quote Daniel:
The overall idea is to make sure that simple patches don't get lost.
Bigger patch series or feature work tends to not get lost, and really
trivial patches I tend to merge right away. But 1-2 patch stuff in
between is occasionally lost

Process:

1. Daniel pushs drm-intel-testing
2. I rebase drm-intel-collector onto drm-intel-testing
3. I collect all simple (1-2) patches that wasn't yet reviewed and not queued 
by Daniel
4. Request automated QA's PRTS automated i-g-t tests comparing 
drm-intel-testing x drm-intel-collector
5. If tests are ok I send the patches as a series to intel-gfx mailing list for 
better tracking and to be reviewed.

There are some reasons that some patches can be left behind:
1. It was send so long time ago. I started with patches from Jul 26th.
2. Your patch didn't applied cleanly and I couldn't easily solve the conflicts.
3. Kernel didn't compiled with your patch.
4. I simply missed it. If you believe this is the case please warn me.

Please help me to get these patches reviewed and queued by Daniel.

Also, please let me know if you have further ideas how to improve this process.

Thanks in advance,
Rodrigo.

Chris Wilson (13):
  drm/i915: Do not add an interrupt for a context switch
  drm/i915: Rearrange the comments in i915_add_request()
  drm/i915: Pin pages whilst mapping the dma-buf
  drm/i915: Cancel outstanding modeset workers before suspend
  drm/i915: Always prefer CPU relocations with LLC
  drm/i915: Report requested frequency alongside current frequency in
debugfs
  drm/i915: Move the conditional seqno query into the tracepoint
  drm/i915: Add some missing steps to i915_driver_load error path
  drm/i915: Asynchronously perform the set-base for a simple modeset
  drm/i915: Align tiled scanouts from stolen memory to 256k in the GTT
  drm/i915: Apply the force-detect VGA w/a to Valleyview
  drm/i915: Pair seqno completion tracepoint with its dispatch
  RFM drm/i915: Boost RPS frequency for CPU stalls

Daniel Vetter (1):
  drm/i915: check that the i965g/gm 4G limit is really obeyed

Jesse Barnes (2):
  drm/i915: split PCI IDs out into i915_drm.h v4
  x86: add early quirk for reserving Intel graphics stolen memory v5

Rodrigo Vivi (1):
  drm/i915: Enable Lower Slice on Haswell GT3.

 arch/x86/kernel/early-quirks.c | 154 +
 drivers/gpu/drm/i915/i915_debugfs.c|  11 +-
 drivers/gpu/drm/i915/i915_dma.c|  19 ++-
 drivers/gpu/drm/i915/i915_drv.c| 164 +-
 drivers/gpu/drm/i915/i915_drv.h|   3 +
 drivers/gpu/drm/i915/i915_gem.c|  25 +++-
 drivers/gpu/drm/i915/i915_gem_context.c|  12 +-
 drivers/gpu/drm/i915/i915_gem_dmabuf.c |  41 +++---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   7 +-
 drivers/gpu/drm/i915/i915_irq.c|   2 +-
 drivers/gpu/drm/i915/i915_reg.h|  20 +--
 drivers/gpu/drm/i915/i915_trace.h  |  33 +++--
 drivers/gpu/drm/i915/intel_crt.c   |   2 +-
 drivers/gpu/drm/i915/intel_display.c   |  34 +++--
 include/drm/i915_drm.h |  34 +
 include/drm/i915_pciids.h  | 211 +
 16 files changed, 566 insertions(+), 206 deletions(-)
 create mode 100644 include/drm/i915_pciids.h

-- 
1.8.1.4

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

[Intel-gfx] [PATCH 02/17] drm/i915: Rearrange the comments in i915_add_request()

From: Chris Wilson ch...@chris-wilson.co.uk

The comments were a little out-of-sequence with the code, forcing the
reader to jump around whilst reading. Whilst moving the comments around,
add one to explain the context reference.

Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
---
 drivers/gpu/drm/i915/i915_gem.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 56c9104..d5c8a02 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2063,8 +2063,6 @@ int __i915_add_request(struct intel_ring_buffer *ring,
request-ring = ring;
request-head = request_start;
request-tail = request_ring_position;
-   request-ctx = ring-last_context;
-   request-batch_obj = obj;
 
/* Whilst this request exists, batch_obj will be on the
 * active_list, and so will hold the active reference. Only when this
@@ -2072,7 +2070,12 @@ int __i915_add_request(struct intel_ring_buffer *ring,
 * inactive_list and lose its active reference. Hence we do not need
 * to explicitly hold another reference here.
 */
+   request-batch_obj = obj;
 
+   /* Hold a reference to the current context so that we can inspect
+* it later in case a hangcheck error event fires.
+*/
+   request-ctx = ring-last_context;
if (request-ctx)
i915_gem_context_reference(request-ctx);
 
-- 
1.8.1.4

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

[Intel-gfx] [PATCH 03/17] drm/i915: Pin pages whilst mapping the dma-buf

From: Chris Wilson ch...@chris-wilson.co.uk

As we attempt to kmalloc after calling get_pages, there is a possibility
that the shrinker may reap the pages we just acquired. To prevent this
we need to increment the pages_pin_count early, so rearrange the code
and error paths to make it so.

Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
---
 drivers/gpu/drm/i915/i915_gem_dmabuf.c | 41 ++
 1 file changed, 22 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c 
b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
index e918b05..7d5752f 100644
--- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
@@ -42,27 +42,24 @@ static struct sg_table *i915_gem_map_dma_buf(struct 
dma_buf_attachment *attachme
 
ret = i915_mutex_lock_interruptible(obj-base.dev);
if (ret)
-   return ERR_PTR(ret);
+   goto err;
 
ret = i915_gem_object_get_pages(obj);
-   if (ret) {
-   st = ERR_PTR(ret);
-   goto out;
-   }
+   if (ret)
+   goto err_unlock;
+
+   i915_gem_object_pin_pages(obj);
 
/* Copy sg so that we make an independent mapping */
st = kmalloc(sizeof(struct sg_table), GFP_KERNEL);
if (st == NULL) {
-   st = ERR_PTR(-ENOMEM);
-   goto out;
+   ret = -ENOMEM;
+   goto err_unpin;
}
 
ret = sg_alloc_table(st, obj-pages-nents, GFP_KERNEL);
-   if (ret) {
-   kfree(st);
-   st = ERR_PTR(ret);
-   goto out;
-   }
+   if (ret)
+   goto err_free;
 
src = obj-pages-sgl;
dst = st-sgl;
@@ -73,17 +70,23 @@ static struct sg_table *i915_gem_map_dma_buf(struct 
dma_buf_attachment *attachme
}
 
if (!dma_map_sg(attachment-dev, st-sgl, st-nents, dir)) {
-   sg_free_table(st);
-   kfree(st);
-   st = ERR_PTR(-ENOMEM);
-   goto out;
+   ret =-ENOMEM;
+   goto err_free_sg;
}
 
-   i915_gem_object_pin_pages(obj);
-
-out:
mutex_unlock(obj-base.dev-struct_mutex);
return st;
+
+err_free_sg:
+   sg_free_table(st);
+err_free:
+   kfree(st);
+err_unpin:
+   i915_gem_object_unpin_pages(obj);
+err_unlock:
+   mutex_unlock(obj-base.dev-struct_mutex);
+err:
+   return ERR_PTR(ret);
 }
 
 static void i915_gem_unmap_dma_buf(struct dma_buf_attachment *attachment,
-- 
1.8.1.4

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

[Intel-gfx] [PATCH 05/17] drm/i915: Cancel outstanding modeset workers before suspend

From: Chris Wilson ch...@chris-wilson.co.uk

Upon resume we will do a complete restoration of the mode and so reset
all tasks, but during suspend (and unload) we need to make sure that no
workers run concurrently with our suspend code. Or worse just after.

The issue was first raised whilst tackling a suspend issue with Takashi
Iwai (http://lists.freedesktop.org/archives/intel-gfx/2012-April/016738.html)
and then it was independently rediscovered by Chuanshen Lui.

v2: Rebase for the lost year.

Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
Acked-by: Takashi Iwai ti...@suse.de
Cc: Liu, Chuansheng chuansheng@intel.com
---
 drivers/gpu/drm/i915/i915_drv.h  |  1 +
 drivers/gpu/drm/i915/intel_display.c | 21 -
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 8ee15b4..7ac5ca5 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2199,6 +2199,7 @@ extern void intel_modeset_init_hw(struct drm_device *dev);
 extern void intel_modeset_suspend_hw(struct drm_device *dev);
 extern void intel_modeset_init(struct drm_device *dev);
 extern void intel_modeset_gem_init(struct drm_device *dev);
+extern void intel_modeset_quiesce(struct drm_device *dev);
 extern void intel_modeset_cleanup(struct drm_device *dev);
 extern int intel_modeset_vga_set_state(struct drm_device *dev, bool state);
 extern void intel_modeset_setup_hw_state(struct drm_device *dev,
diff --git a/drivers/gpu/drm/i915/intel_display.c 
b/drivers/gpu/drm/i915/intel_display.c
index bcb62fe..74dbc68 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -10030,6 +10030,7 @@ void intel_modeset_init_hw(struct drm_device *dev)
 
 void intel_modeset_suspend_hw(struct drm_device *dev)
 {
+   intel_modeset_quiesce(dev);
intel_suspend_hw(dev);
 }
 
@@ -10474,9 +10475,19 @@ void intel_modeset_gem_init(struct drm_device *dev)
intel_modeset_setup_hw_state(dev, false);
 }
 
-void intel_modeset_cleanup(struct drm_device *dev)
+void intel_modeset_quiesce(struct drm_device *dev)
 {
struct drm_i915_private *dev_priv = dev-dev_private;
+
+   cancel_work_sync(dev_priv-hotplug_work);
+   cancel_work_sync(dev_priv-rps.work);
+
+   /* catch all required for dev_priv-wq */
+   flush_scheduled_work();
+}
+
+void intel_modeset_cleanup(struct drm_device *dev)
+{
struct drm_crtc *crtc;
 
/*
@@ -10485,7 +10496,10 @@ void intel_modeset_cleanup(struct drm_device *dev)
 * experience fancy races otherwise.
 */
drm_irq_uninstall(dev);
-   cancel_work_sync(dev_priv-hotplug_work);
+
+   /* flush any delayed tasks or pending work */
+   intel_modeset_quiesce(dev);
+
/*
 * Due to the hpd irq storm handling the hotplug work can re-arm the
 * poll handlers. Hence disable polling after hpd handling is shut down.
@@ -10512,9 +10526,6 @@ void intel_modeset_cleanup(struct drm_device *dev)
 
mutex_unlock(dev-struct_mutex);
 
-   /* flush any delayed tasks or pending work */
-   flush_scheduled_work();
-
/* destroy backlight, if any, before the connectors */
intel_panel_destroy_backlight(dev);
 
-- 
1.8.1.4

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

[Intel-gfx] [PATCH 04/17] drm/i915: check that the i965g/gm 4G limit is really obeyed

From: Daniel Vetter daniel.vet...@ffwll.ch

In truly crazy circumstances shmem might give us the wrong type of
page. So be a bit paranoid and double check this.

Cc: Rob Clark robdcl...@gmail.com
References: http://lkml.org/lkml/2011/7/11/238
Signed-off-by: Daniel Vetter daniel.vet...@ffwll.ch
---
 drivers/gpu/drm/i915/i915_gem.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d5c8a02..7c92923 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1829,6 +1829,9 @@ i915_gem_object_get_pages_gtt(struct drm_i915_gem_object 
*obj)
sg-length += PAGE_SIZE;
}
last_pfn = page_to_pfn(page);
+
+   /* Check that the i965g/gm workaround works. */
+   WARN_ON((gfp  __GFP_DMA32)  (last_pfn = 0x0010UL));
}
 #ifdef CONFIG_SWIOTLB
if (!swiotlb_nr_tbl())
-- 
1.8.1.4

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

[Intel-gfx] [PATCH 06/17] drm/i915: split PCI IDs out into i915_drm.h v4

From: Jesse Barnes jbar...@virtuousgeek.org

For use by userspace (at some point in the future) and other kernel code.

v2: move PCI IDs to uabi (Chris)
move PCI IDs to drm/ (Dave)
v3: fixup Quanta detection - needs to come first (Daniel)
v4: fix up PCI match structure init for easier use by userspace (Chris)

Signed-off-by: Jesse Barnes jbar...@virtuousgeek.org
---
 drivers/gpu/drm/i915/i915_drv.c | 164 +++
 include/drm/i915_drm.h  |   2 +
 include/drm/i915_pciids.h   | 211 
 3 files changed, 247 insertions(+), 130 deletions(-)
 create mode 100644 include/drm/i915_pciids.h

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 735dd56..3872f66 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -157,25 +157,6 @@ MODULE_PARM_DESC(prefault_disable,
 static struct drm_driver driver;
 extern int intel_agp_enabled;
 
-#define INTEL_VGA_DEVICE(id, info) {   \
-   .class = PCI_BASE_CLASS_DISPLAY  16,  \
-   .class_mask = 0xff, \
-   .vendor = 0x8086,   \
-   .device = id,   \
-   .subvendor = PCI_ANY_ID,\
-   .subdevice = PCI_ANY_ID,\
-   .driver_data = (unsigned long) info }
-
-#define INTEL_QUANTA_VGA_DEVICE(info) {\
-   .class = PCI_BASE_CLASS_DISPLAY  16,  \
-   .class_mask = 0xff, \
-   .vendor = 0x8086,   \
-   .device = 0x16a,\
-   .subvendor = 0x152d,\
-   .subdevice = 0x8990,\
-   .driver_data = (unsigned long) info }
-
-
 static const struct intel_device_info intel_i830_info = {
.gen = 2, .is_mobile = 1, .cursor_needs_physical = 1, .num_pipes = 2,
.has_overlay = 1, .overlay_needs_physical = 1,
@@ -350,118 +331,41 @@ static const struct intel_device_info 
intel_haswell_m_info = {
.has_vebox_ring = 1,
 };
 
+/*
+ * Make sure any device matches here are from most specific to most
+ * general.  For example, since the Quanta match is based on the subsystem
+ * and subvendor IDs, we need it to come before the more general IVB
+ * PCI ID matches, otherwise we'll use the wrong info struct above.
+ */
+#define INTEL_PCI_IDS \
+   INTEL_I830_IDS(intel_i830_info),   \
+   INTEL_I845G_IDS(intel_845g_info),  \
+   INTEL_I85X_IDS(intel_i85x_info),   \
+   INTEL_I865G_IDS(intel_i865g_info), \
+   INTEL_I915G_IDS(intel_i915g_info), \
+   INTEL_I915GM_IDS(intel_i915gm_info),   \
+   INTEL_I945G_IDS(intel_i945g_info), \
+   INTEL_I945GM_IDS(intel_i945gm_info),   \
+   INTEL_I965G_IDS(intel_i965g_info), \
+   INTEL_G33_IDS(intel_g33_info), \
+   INTEL_I965GM_IDS(intel_i965gm_info),   \
+   INTEL_GM45_IDS(intel_gm45_info),   \
+   INTEL_G45_IDS(intel_g45_info), \
+   INTEL_PINEVIEW_IDS(intel_pineview_info),   \
+   INTEL_IRONLAKE_D_IDS(intel_ironlake_d_info),   \
+   INTEL_IRONLAKE_M_IDS(intel_ironlake_m_info),   \
+   INTEL_SNB_D_IDS(intel_sandybridge_d_info), \
+   INTEL_SNB_M_IDS(intel_sandybridge_m_info), \
+   INTEL_IVB_Q_IDS(intel_ivybridge_q_info), /* must be first IVB */ \
+   INTEL_IVB_M_IDS(intel_ivybridge_m_info),   \
+   INTEL_IVB_D_IDS(intel_ivybridge_d_info),   \
+   INTEL_HSW_D_IDS(intel_haswell_d_info), \
+   INTEL_HSW_M_IDS(intel_haswell_m_info), \
+   INTEL_VLV_M_IDS(intel_valleyview_m_info),  \
+   INTEL_VLV_D_IDS(intel_valleyview_d_info)
+
 static const struct pci_device_id pciidlist[] = {  /* aka */
-   INTEL_VGA_DEVICE(0x3577, intel_i830_info), /* I830_M */
-   INTEL_VGA_DEVICE(0x2562, intel_845g_info), /* 845_G */
-   INTEL_VGA_DEVICE(0x3582, intel_i85x_info), /* I855_GM */
-   INTEL_VGA_DEVICE(0x358e, intel_i85x_info),
-   INTEL_VGA_DEVICE(0x2572, intel_i865g_info),/* I865_G */
-   INTEL_VGA_DEVICE(0x2582, intel_i915g_info),/* I915_G */
-   INTEL_VGA_DEVICE(0x258a, intel_i915g_info),/* E7221_G */
-   INTEL_VGA_DEVICE(0x2592, intel_i915gm_info),   /* I915_GM */
-   INTEL_VGA_DEVICE(0x2772, intel_i945g_info),/* I945_G */
-   INTEL_VGA_DEVICE(0x27a2, intel_i945gm_info),   /* I945_GM */
-   INTEL_VGA_DEVICE(0x27ae, intel_i945gm_info),   /* I945_GME */
-   INTEL_VGA_DEVICE(0x2972, intel_i965g_info),/* I946_GZ */
-   INTEL_VGA_DEVICE(0x2982, intel_i965g_info),/* G35_G */
-   INTEL_VGA_DEVICE(0x2992, intel_i965g_info),/* I965_Q */
-   INTEL_VGA_DEVICE(0x29a2, intel_i965g_info),/* I965_G */
-   INTEL_VGA_DEVICE(0x29b2, intel_g33_info),  /* Q35_G */
-

[Intel-gfx] [PATCH 07/17] x86: add early quirk for reserving Intel graphics stolen memory v5

From: Jesse Barnes jbar...@virtuousgeek.org

Systems with Intel graphics controllers set aside memory exclusively for
gfx driver use.  This memory is not always marked in the E820 as
reserved or as RAM, and so is subject to overlap from E820 manipulation
later in the boot process.  On some systems, MMIO space is allocated on
top, despite the efforts of the RAM buffer approach, which simply
rounds memory boundaries up to 64M to try to catch space that may decode
as RAM and so is not suitable for MMIO.

v2: use read_pci_config for 32 bit reads instead of adding a new one
(Chris)
add gen6 stolen size function (Chris)
v3: use a function pointer (Chris)
drop gen2 bits (Daniel)
v4: call e820_sanitize_map after adding the region
v5: fixup comments (Peter)
simplify loop (Chris)

Acked-by: Ingo Molnar mi...@kernel.org
Signed-off-by: Jesse Barnes jbar...@virtuousgeek.org
---
 arch/x86/kernel/early-quirks.c  | 154 
 drivers/gpu/drm/i915/i915_reg.h |  15 
 include/drm/i915_drm.h  |  32 +
 3 files changed, 186 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/early-quirks.c b/arch/x86/kernel/early-quirks.c
index 63bdb29..b3cd3eb 100644
--- a/arch/x86/kernel/early-quirks.c
+++ b/arch/x86/kernel/early-quirks.c
@@ -12,6 +12,7 @@
 #include linux/pci.h
 #include linux/acpi.h
 #include linux/pci_ids.h
+#include drm/i915_drm.h
 #include asm/pci-direct.h
 #include asm/dma.h
 #include asm/io_apic.h
@@ -216,6 +217,157 @@ static void __init intel_remapping_check(int num, int 
slot, int func)
 
 }
 
+/*
+ * Systems with Intel graphics controllers set aside memory exclusively
+ * for gfx driver use.  This memory is not marked in the E820 as reserved
+ * or as RAM, and so is subject to overlap from E820 manipulation later
+ * in the boot process.  On some systems, MMIO space is allocated on top,
+ * despite the efforts of the RAM buffer approach, which simply rounds
+ * memory boundaries up to 64M to try to catch space that may decode
+ * as RAM and so is not suitable for MMIO.
+ *
+ * And yes, so far on current devices the base addr is always under 4G.
+ */
+static u32 __init intel_stolen_base(int num, int slot, int func)
+{
+   u32 base;
+
+   /*
+* For the PCI IDs in this quirk, the stolen base is always
+* in 0x5c, aka the BDSM register (yes that's really what
+* it's called).
+*/
+   base = read_pci_config(num, slot, func, 0x5c);
+   base = ~((120) - 1);
+
+   return base;
+}
+
+#define KB(x)  ((x) * 1024)
+#define MB(x)  (KB (KB (x)))
+#define GB(x)  (MB (KB (x)))
+
+static size_t __init gen3_stolen_size(int num, int slot, int func)
+{
+   size_t stolen_size;
+   u16 gmch_ctrl;
+
+   gmch_ctrl = read_pci_config_16(0, 0, 0, I830_GMCH_CTRL);
+
+   switch (gmch_ctrl  I855_GMCH_GMS_MASK) {
+   case I855_GMCH_GMS_STOLEN_1M:
+   stolen_size = MB(1);
+   break;
+   case I855_GMCH_GMS_STOLEN_4M:
+   stolen_size = MB(4);
+   break;
+   case I855_GMCH_GMS_STOLEN_8M:
+   stolen_size = MB(8);
+   break;
+   case I855_GMCH_GMS_STOLEN_16M:
+   stolen_size = MB(16);
+   break;
+   case I855_GMCH_GMS_STOLEN_32M:
+   stolen_size = MB(32);
+   break;
+   case I915_GMCH_GMS_STOLEN_48M:
+   stolen_size = MB(48);
+   break;
+   case I915_GMCH_GMS_STOLEN_64M:
+   stolen_size = MB(64);
+   break;
+   case G33_GMCH_GMS_STOLEN_128M:
+   stolen_size = MB(128);
+   break;
+   case G33_GMCH_GMS_STOLEN_256M:
+   stolen_size = MB(256);
+   break;
+   case INTEL_GMCH_GMS_STOLEN_96M:
+   stolen_size = MB(96);
+   break;
+   case INTEL_GMCH_GMS_STOLEN_160M:
+   stolen_size = MB(160);
+   break;
+   case INTEL_GMCH_GMS_STOLEN_224M:
+   stolen_size = MB(224);
+   break;
+   case INTEL_GMCH_GMS_STOLEN_352M:
+   stolen_size = MB(352);
+   break;
+   default:
+   stolen_size = 0;
+   break;
+   }
+
+   return stolen_size;
+}
+
+static size_t __init gen6_stolen_size(int num, int slot, int func)
+{
+   u16 gmch_ctrl;
+
+   gmch_ctrl = read_pci_config_16(num, slot, func, SNB_GMCH_CTRL);
+   gmch_ctrl = SNB_GMCH_GMS_SHIFT;
+   gmch_ctrl = SNB_GMCH_GMS_MASK;
+
+   return gmch_ctrl  25; /* 32 MB units */
+}
+
+typedef size_t (*stolen_size_fn)(int num, int slot, int func);
+
+static struct pci_device_id intel_stolen_ids[] __initdata = {
+   INTEL_I915G_IDS(gen3_stolen_size),
+   INTEL_I915GM_IDS(gen3_stolen_size),
+   INTEL_I945G_IDS(gen3_stolen_size),
+   INTEL_I945GM_IDS(gen3_stolen_size),
+   INTEL_VLV_M_IDS(gen3_stolen_size),
+   INTEL_VLV_D_IDS(gen3_stolen_size),
+

[Intel-gfx] [PATCH 08/17] drm/i915: Always prefer CPU relocations with LLC

From: Chris Wilson ch...@chris-wilson.co.uk

A follow-on to the update of the LLC coherency logic is that we can rely
on the LLC being coherent with the CS for rewriting batchbuffers
irrespective of their cache domain. (This should have no effect
currently as all the batch buffers are expected to be I915_CACHE_LLC and
so using the cpu relocation path anyway.)

Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 792c52a..3b64b9f 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -166,7 +166,8 @@ eb_destroy(struct eb_objects *eb)
 
 static inline int use_cpu_reloc(struct drm_i915_gem_object *obj)
 {
-   return (obj-base.write_domain == I915_GEM_DOMAIN_CPU ||
+   return (HAS_LLC(obj-base.dev) ||
+   obj-base.write_domain == I915_GEM_DOMAIN_CPU ||
!obj-map_and_fenceable ||
obj-cache_level != I915_CACHE_NONE);
 }
@@ -179,7 +180,7 @@ relocate_entry_cpu(struct drm_i915_gem_object *obj,
char *vaddr;
int ret = -EINVAL;
 
-   ret = i915_gem_object_set_to_cpu_domain(obj, 1);
+   ret = i915_gem_object_set_to_cpu_domain(obj, true);
if (ret)
return ret;
 
-- 
1.8.1.4

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

[Intel-gfx] [PATCH 10/17] drm/i915: Move the conditional seqno query into the tracepoint

From: Chris Wilson ch...@chris-wilson.co.uk

We only wish to know the value of seqno when emitting the tracepoint, so
move the query from a parameter to the macro to inside the conditional
macro body so that the query is only evaluated when required.

Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
---
 drivers/gpu/drm/i915/i915_irq.c   |  2 +-
 drivers/gpu/drm/i915/i915_trace.h | 21 ++---
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index a03b445..efac65b 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -801,7 +801,7 @@ static void notify_ring(struct drm_device *dev,
if (ring-obj == NULL)
return;
 
-   trace_i915_gem_request_complete(ring, ring-get_seqno(ring, false));
+   trace_i915_gem_request_complete(ring);
 
wake_up_all(ring-irq_queue);
i915_queue_hangcheck(dev);
diff --git a/drivers/gpu/drm/i915/i915_trace.h 
b/drivers/gpu/drm/i915/i915_trace.h
index e2c5ee6..a1797f6 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -304,9 +304,24 @@ DEFINE_EVENT(i915_gem_request, i915_gem_request_add,
TP_ARGS(ring, seqno)
 );
 
-DEFINE_EVENT(i915_gem_request, i915_gem_request_complete,
-   TP_PROTO(struct intel_ring_buffer *ring, u32 seqno),
-   TP_ARGS(ring, seqno)
+TRACE_EVENT(i915_gem_request_complete,
+   TP_PROTO(struct intel_ring_buffer *ring),
+   TP_ARGS(ring),
+
+   TP_STRUCT__entry(
+__field(u32, dev)
+__field(u32, ring)
+__field(u32, seqno)
+),
+
+   TP_fast_assign(
+  __entry-dev = ring-dev-primary-index;
+  __entry-ring = ring-id;
+  __entry-seqno = ring-get_seqno(ring, false);
+  ),
+
+   TP_printk(dev=%u, ring=%u, seqno=%u,
+ __entry-dev, __entry-ring, __entry-seqno)
 );
 
 DEFINE_EVENT(i915_gem_request, i915_gem_request_retire,
-- 
1.8.1.4

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

[Intel-gfx] [PATCH 09/17] drm/i915: Report requested frequency alongside current frequency in debugfs

From: Chris Wilson ch...@chris-wilson.co.uk

It can be useful to compare at times the current vs requested frequency
of the GPU, so provide the contents of RPNSWREQ alonside CAGF.

Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
---
 drivers/gpu/drm/i915/i915_debugfs.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
b/drivers/gpu/drm/i915/i915_debugfs.c
index 55ab924..a6f4cb5 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -857,7 +857,7 @@ static int i915_cur_delayinfo(struct seq_file *m, void 
*unused)
u32 gt_perf_status = I915_READ(GEN6_GT_PERF_STATUS);
u32 rp_state_limits = I915_READ(GEN6_RP_STATE_LIMITS);
u32 rp_state_cap = I915_READ(GEN6_RP_STATE_CAP);
-   u32 rpstat, cagf;
+   u32 rpstat, cagf, reqf;
u32 rpupei, rpcurup, rpprevup;
u32 rpdownei, rpcurdown, rpprevdown;
int max_freq;
@@ -869,6 +869,14 @@ static int i915_cur_delayinfo(struct seq_file *m, void 
*unused)
 
gen6_gt_force_wake_get(dev_priv);
 
+   reqf = I915_READ(GEN6_RPNSWREQ);
+   reqf = ~GEN6_TURBO_DISABLE;
+   if (IS_HASWELL(dev))
+   reqf = 24;
+   else
+   reqf = 25;
+   reqf *= GT_FREQUENCY_MULTIPLIER;
+
rpstat = I915_READ(GEN6_RPSTAT1);
rpupei = I915_READ(GEN6_RP_CUR_UP_EI);
rpcurup = I915_READ(GEN6_RP_CUR_UP);
@@ -893,6 +901,7 @@ static int i915_cur_delayinfo(struct seq_file *m, void 
*unused)
   gt_perf_status  0xff);
seq_printf(m, Render p-state limit: %d\n,
   rp_state_limits  0xff);
+   seq_printf(m, RPNSWREQ: %dMHz\n, reqf);
seq_printf(m, CAGF: %dMHz\n, cagf);
seq_printf(m, RP CUR UP EI: %dus\n, rpupei 
   GEN6_CURICONT_MASK);
-- 
1.8.1.4

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

[Intel-gfx] [PATCH 11/17] drm/i915: Add some missing steps to i915_driver_load error path

From: Chris Wilson ch...@chris-wilson.co.uk

We missed adding a few cleanup steps for recent additions.

Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
---
 drivers/gpu/drm/i915/i915_dma.c | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 4c7669f..236e5e3 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1543,7 +1543,7 @@ int i915_driver_load(struct drm_device *dev, unsigned 
long flags)
 
ret = i915_gem_gtt_init(dev);
if (ret)
-   goto put_bridge;
+   goto out_regs;
 
if (drm_core_check_feature(dev, DRIVER_MODESET))
i915_kick_out_firmware_fb(dev_priv);
@@ -1572,7 +1572,7 @@ int i915_driver_load(struct drm_device *dev, unsigned 
long flags)
 aperture_size);
if (dev_priv-gtt.mappable == NULL) {
ret = -EIO;
-   goto out_rmmap;
+   goto out_gtt;
}
 
dev_priv-gtt.mtrr = arch_phys_wc_add(dev_priv-gtt.mappable_base,
@@ -1646,7 +1646,7 @@ int i915_driver_load(struct drm_device *dev, unsigned 
long flags)
ret = i915_load_modeset_init(dev);
if (ret  0) {
DRM_ERROR(failed to init modeset\n);
-   goto out_gem_unload;
+   goto out_power_well;
}
} else {
/* Start out suspended in ums mode. */
@@ -1666,6 +1666,10 @@ int i915_driver_load(struct drm_device *dev, unsigned 
long flags)
 
return 0;
 
+out_power_well:
+   if (HAS_POWER_WELL(dev))
+   i915_remove_power_well(dev);
+   drm_vblank_cleanup(dev);
 out_gem_unload:
if (dev_priv-mm.inactive_shrinker.shrink)
unregister_shrinker(dev_priv-mm.inactive_shrinker);
@@ -1679,12 +1683,17 @@ out_gem_unload:
 out_mtrrfree:
arch_phys_wc_del(dev_priv-gtt.mtrr);
io_mapping_free(dev_priv-gtt.mappable);
+out_gtt:
+   list_del(dev_priv-gtt.base.global_link);
+   drm_mm_takedown(dev_priv-gtt.base.mm);
dev_priv-gtt.base.cleanup(dev_priv-gtt.base);
-out_rmmap:
+out_regs:
pci_iounmap(dev-pdev, dev_priv-regs);
 put_bridge:
pci_dev_put(dev_priv-bridge_dev);
 free_priv:
+   if (dev_priv-slab)
+   kmem_cache_destroy(dev_priv-slab);
kfree(dev_priv);
return ret;
 }
@@ -1777,6 +1786,8 @@ int i915_driver_unload(struct drm_device *dev)
if (dev_priv-regs != NULL)
pci_iounmap(dev-pdev, dev_priv-regs);
 
+   drm_vblank_cleanup(dev);
+
intel_teardown_gmbus(dev);
intel_teardown_mchbar(dev);
 
-- 
1.8.1.4

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

[Intel-gfx] [PATCH 13/17] drm/i915: Align tiled scanouts from stolen memory to 256k in the GTT

From: Chris Wilson ch...@chris-wilson.co.uk

For unfathomable reasons this alignment appears to be required for tiled
scanouts being read from stolen memory. I can find no reference in the
w/a db to support this requirement, but the evidence of my own eyes says
this prevents many headaches.

Note that I have not tricked anything older than Sandybridge into using
stolen tiled scanouts, so the extra alignment may be required there as
well.

Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
---
 drivers/gpu/drm/i915/intel_display.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_display.c 
b/drivers/gpu/drm/i915/intel_display.c
index 0da0ead1..cee9c0d 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -1837,6 +1837,8 @@ intel_pin_and_fence_fb_obj(struct drm_device *dev,
case I915_TILING_X:
/* pin() will align the object as required by fence */
alignment = 0;
+   if (obj-stolen  INTEL_INFO(dev)-gen = 6)
+   alignment = 256 * 1024;
break;
case I915_TILING_Y:
/* Despite that we check this in framebuffer_init userspace can
-- 
1.8.1.4

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

[Intel-gfx] [PATCH 14/17] drm/i915: Apply the force-detect VGA w/a to Valleyview

From: Chris Wilson ch...@chris-wilson.co.uk

It appears that Valleyview shares its VGA encoder with more recent
siblings and requires the same forced detection cycle after a hardware
reset before we can rely on hotplugging.

Reported-and-tested-by: kobeqin kobe@intel.com
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=67733
Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
---
 drivers/gpu/drm/i915/intel_crt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_crt.c b/drivers/gpu/drm/i915/intel_crt.c
index b5a3875..7475200 100644
--- a/drivers/gpu/drm/i915/intel_crt.c
+++ b/drivers/gpu/drm/i915/intel_crt.c
@@ -688,7 +688,7 @@ static void intel_crt_reset(struct drm_connector *connector)
struct drm_i915_private *dev_priv = dev-dev_private;
struct intel_crt *crt = intel_attached_crt(connector);
 
-   if (HAS_PCH_SPLIT(dev)) {
+   if (HAS_PCH_SPLIT(dev) || IS_VALLEYVIEW(dev)) {
u32 adpa;
 
adpa = I915_READ(crt-adpa_reg);
-- 
1.8.1.4

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

[Intel-gfx] [PATCH 12/17] drm/i915: Asynchronously perform the set-base for a simple modeset

From: Chris Wilson ch...@chris-wilson.co.uk

A simple modeset, where we only wish to switch over to a new framebuffer
such as the transition from fbcon to X, takes around 30-60ms. This is
due to three factors:

1. We need to make sure the fb-obj is in the display domain, which
incurs a cache flush to ensure no dirt is left on the scanout.

2. We need to flush any pending rendering before performing the mmio
so that the frame is complete before it is shown.

3. We currently wait for the vblank after the mmio to be sure that the
old fb is no longer being shown before releasing it.

(1) can only be eliminated by userspace preparing the fb-obj in advance
to already be in the display domain. This can be done through use of the
create2 ioctl, or by reusing an existing fb-obj.

However, (2) and (3) are already solved by the existing page flip
mechanism, and it is surprisingly trivial to wire them up for use in the
set-base fast path. Though it can be argued that this represents a
subtle ABI break in that the set_config ioctl now returns before the old
framebuffer is unpinned. The danger is that userspace will start to
modify it before it is no longer being shown, however we should be able
to prevent that through proper domain tracking.

By combining all of the above, we can achieve an instaneous set_config:

[ 6.601] (II) intel(0): switch to mode 2560x1440@60.0 on pipe 0 using DP2, 
position (0, 0), rotation normal
[ 6.601] (II) intel(0): Setting screen physical size to 677 x 381

Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
---
 drivers/gpu/drm/i915/intel_display.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_display.c 
b/drivers/gpu/drm/i915/intel_display.c
index 74dbc68..0da0ead1 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -9213,10 +9213,13 @@ static int intel_crtc_set_config(struct drm_mode_set 
*set)
ret = intel_set_mode(set-crtc, set-mode,
 set-x, set-y, set-fb);
} else if (config-fb_changed) {
-   intel_crtc_wait_for_pending_flips(set-crtc);
-
-   ret = intel_pipe_set_base(set-crtc,
- set-x, set-y, set-fb);
+   if (to_intel_framebuffer(set-fb)-obj-ring == NULL ||
+   save_set.x != set-x || save_set.y != set-y ||
+   intel_crtc_page_flip(set-crtc, set-fb, NULL)) {
+   intel_crtc_wait_for_pending_flips(set-crtc);
+   ret = intel_pipe_set_base(set-crtc,
+ set-x, set-y, set-fb);
+   }
}
 
if (ret) {
-- 
1.8.1.4

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

[Intel-gfx] [PATCH 15/17] drm/i915: Pair seqno completion tracepoint with its dispatch

From: Chris Wilson ch...@chris-wilson.co.uk

In order to time how long a seqno is executed by a ring, we need to
measure both its insertion and its completion. (Using the completion of
the previous seqno as an estimate for when the GPU starts, if busy.) In
order to get an exact completion timestamp, we need irqs. This is
enabled by trace_i915_gem_ring_dispatch, so it makes more sens to pair
the completion event with that rather than the request.

Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  2 +-
 drivers/gpu/drm/i915/i915_irq.c|  2 +-
 drivers/gpu/drm/i915/i915_trace.h  | 48 +++---
 3 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 3b64b9f..4f90cb3 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1127,7 +1127,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
goto err;
}
 
-   trace_i915_gem_ring_dispatch(ring, intel_ring_get_seqno(ring), flags);
+   trace_i915_gem_ring_dispatch(ring, flags);
 
i915_gem_execbuffer_move_to_active(eb-objects, vm, ring);
i915_gem_execbuffer_retire_commands(dev, file, ring, batch_obj);
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index efac65b..d61bd5b 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -801,7 +801,7 @@ static void notify_ring(struct drm_device *dev,
if (ring-obj == NULL)
return;
 
-   trace_i915_gem_request_complete(ring);
+   trace_i915_gem_ring_complete(ring);
 
wake_up_all(ring-irq_queue);
i915_queue_hangcheck(dev);
diff --git a/drivers/gpu/drm/i915/i915_trace.h 
b/drivers/gpu/drm/i915/i915_trace.h
index a1797f6..5c8e36a 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -234,8 +234,8 @@ TRACE_EVENT(i915_gem_evict_everything,
 );
 
 TRACE_EVENT(i915_gem_ring_dispatch,
-   TP_PROTO(struct intel_ring_buffer *ring, u32 seqno, u32 flags),
-   TP_ARGS(ring, seqno, flags),
+   TP_PROTO(struct intel_ring_buffer *ring, u32 flags),
+   TP_ARGS(ring, flags),
 
TP_STRUCT__entry(
 __field(u32, dev)
@@ -247,15 +247,35 @@ TRACE_EVENT(i915_gem_ring_dispatch,
TP_fast_assign(
   __entry-dev = ring-dev-primary-index;
   __entry-ring = ring-id;
-  __entry-seqno = seqno;
+  __entry-seqno = intel_ring_get_seqno(ring),
   __entry-flags = flags;
-  i915_trace_irq_get(ring, seqno);
+  i915_trace_irq_get(ring, __entry-seqno);
   ),
 
TP_printk(dev=%u, ring=%u, seqno=%u, flags=%x,
  __entry-dev, __entry-ring, __entry-seqno, 
__entry-flags)
 );
 
+TRACE_EVENT(i915_gem_ring_complete,
+   TP_PROTO(struct intel_ring_buffer *ring),
+   TP_ARGS(ring),
+
+   TP_STRUCT__entry(
+__field(u32, dev)
+__field(u32, ring)
+__field(u32, seqno)
+),
+
+   TP_fast_assign(
+  __entry-dev = ring-dev-primary-index;
+  __entry-ring = ring-id;
+  __entry-seqno = ring-get_seqno(ring, false);
+  ),
+
+   TP_printk(dev=%u, ring=%u, seqno=%u,
+ __entry-dev, __entry-ring, __entry-seqno)
+);
+
 TRACE_EVENT(i915_gem_ring_flush,
TP_PROTO(struct intel_ring_buffer *ring, u32 invalidate, u32 flush),
TP_ARGS(ring, invalidate, flush),
@@ -304,26 +324,6 @@ DEFINE_EVENT(i915_gem_request, i915_gem_request_add,
TP_ARGS(ring, seqno)
 );
 
-TRACE_EVENT(i915_gem_request_complete,
-   TP_PROTO(struct intel_ring_buffer *ring),
-   TP_ARGS(ring),
-
-   TP_STRUCT__entry(
-__field(u32, dev)
-__field(u32, ring)
-__field(u32, seqno)
-),
-
-   TP_fast_assign(
-  __entry-dev = ring-dev-primary-index;
-  __entry-ring = ring-id;
-  __entry-seqno = ring-get_seqno(ring, false);
-  ),
-
-   TP_printk(dev=%u, ring=%u, seqno=%u,
- __entry-dev, __entry-ring, __entry-seqno)
-);
-
 DEFINE_EVENT(i915_gem_request, i915_gem_request_retire,
TP_PROTO(struct intel_ring_buffer *ring, u32 seqno),
TP_ARGS(ring, seqno)
-- 
1.8.1.4

___

[Intel-gfx] [PATCH 16/17] RFM drm/i915: Boost RPS frequency for CPU stalls

From: Chris Wilson ch...@chris-wilson.co.uk

If we encounter a situation where the CPU blocks waiting for results
from the GPU, give the GPU a kick to boost its the frequency.

This should work to reduce user interface stalls and to quickly promote
mesa to high frequencies - but the cost is that our requested frequency
stalls high (as we do not idle for long enough before rc6 to start
reducing frequencies, nor are we aggressive at down clocking an
underused GPU). However, this should be mitigated by rc6 itself powering
off the GPU when idle, and that energy use is dependent upon the workload
of the GPU in addition to its frequency (e.g. the math or sampler
functions only consume power when used). Still, this is likely to
adversely affect light workloads.

Stéphane raised the concern that this will punish good applications and
reward bad applications - but due to the nature of how mesa performs its
client throttling, I believe all mesa applications will be roughly
equally affected.

RFM - request for measurement!

Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
Cc: Kenneth Graunke kenn...@whitecape.org
Cc: Stéphane Marchesin stephane.marche...@gmail.com
Cc: Meng, Mengmeng mengmeng.m...@intel.com
Cc: Zhuang, Lena lena.zhu...@intel.com
---
 drivers/gpu/drm/i915/i915_gem.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 7c92923..50a70ee 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1016,6 +1016,15 @@ static int __wait_seqno(struct intel_ring_buffer *ring, 
u32 seqno,
if (WARN_ON(!ring-irq_get(ring)))
return -ENODEV;
 
+   if (dev_priv-info-gen = 6) {
+   mutex_lock(dev_priv-rps.hw_lock);
+   if (dev_priv-info-is_valleyview)
+   valleyview_set_rps(dev_priv-dev, 
dev_priv-rps.max_delay);
+   else
+   gen6_set_rps(dev_priv-dev, dev_priv-rps.max_delay);
+   mutex_unlock(dev_priv-rps.hw_lock);
+   }
+
/* Record current time in case interrupted by signal, or wedged * */
getrawmonotonic(before);
 
-- 
1.8.1.4

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

[Intel-gfx] [PATCH 17/17] drm/i915: Enable Lower Slice on Haswell GT3.

Full HSW GT3 power can only be achieved with all Execution Units turned on.
This patch enables all EUs present on HSW GT3 by enabling lower slice.

Credits-by: Yejun Guo yejun@intel.com
Signed-off-by: Rodrigo Vivi rodrigo.v...@gmail.com
---
 drivers/gpu/drm/i915/i915_drv.h | 2 ++
 drivers/gpu/drm/i915/i915_gem.c | 5 +
 drivers/gpu/drm/i915/i915_reg.h | 5 +
 3 files changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 7ac5ca5..e9b2f86 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1592,6 +1592,8 @@ struct drm_i915_file_private {
 ((dev)-pci_device  0xFF00) == 0x0C00)
 #define IS_ULT(dev)(IS_HASWELL(dev)  \
 ((dev)-pci_device  0xFF00) == 0x0A00)
+#define IS_HSW_GT3(dev)(IS_HASWELL(dev)  \
+((dev)-pci_device  0x00F0) == 0x0020)
 
 /*
  * The genX designation typically refers to the render engine, so render
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 50a70ee..1c72229 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4316,6 +4316,11 @@ i915_gem_init_hw(struct drm_device *dev)
if (dev_priv-ellc_size)
I915_WRITE(HSW_IDICR, I915_READ(HSW_IDICR) | IDIHASHMSK(0xf));
 
+   if (IS_HSW_GT3(dev))
+   I915_WRITE(MI_PREDICATE_RESULT_2, LOWER_SLICE_ENABLED);
+   else
+   I915_WRITE(MI_PREDICATE_RESULT_2, LOWER_SLICE_DISABLED);
+
if (HAS_PCH_NOP(dev)) {
u32 temp = I915_READ(GEN7_MSG_CTL);
temp = ~(WAIT_FOR_PCH_FLR_ACK | WAIT_FOR_PCH_RESET_ACK);
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 4c3cf22..e122b57 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -263,6 +263,11 @@
 #define  MI_SEMAPHORE_SYNC_VVE (116) /* VECS wait for VCS  (VEVSYNC) */
 #define  MI_SEMAPHORE_SYNC_RVE (216) /* VECS wait for RCS  (VERSYNC) */
 #define  MI_SEMAPHORE_SYNC_INVALID  (316)
+
+#define MI_PREDICATE_RESULT_2  (0x2214)
+#define  LOWER_SLICE_ENABLED   (10)
+#define  LOWER_SLICE_DISABLED  (00)
+
 /*
  * 3D instructions used by the kernel
  */
-- 
1.8.1.4

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH 16/17] RFM drm/i915: Boost RPS frequency for CPU stalls