Re: [Intel-gfx] [git pull] drm fixes

2015-11-29 Thread Daniel Vetter
On Sat, Nov 28, 2015 at 05:36:54AM +1000, Dave Airlie wrote:
> On 28 November 2015 at 05:05, Linus Torvalds
>  wrote:
> > On Thu, Nov 19, 2015 at 8:07 PM, Dave Airlie  wrote:
> >>
> >> core: Atomic fixes and Atomic helper fixes
> >> i915: Revert for the backlight regression along with a bunch of fixes.
> >
> > So I have no idea if the GPU updates are my problem, but my main
> > desktop machine has been hanging silently a few times lately.
> >
> > It has never done it while unattended, even if it's doing things like
> > compiling the kernel. So I'm a bit inclined to blame graphics.
> >
> > Sadly, when it hangs, it's a total hang and doesn't leave anything in
> > the logs - I just have to reboot.
> >
> > I'll try to see if I can get anything at all out of the machine, but I
> > thought I'd ask if there is some known issue with Haswell graphics in
> > the 4.4-rc code base?
> >
> > Sorry for the complete lack of details, and really any hard reason to
> > even blame the GPU people.. You may be entirely blameless.
> 
> I've been running this fixes pull on my haswell laptop since I sent it to you,
> 
> and I've been using it, F22 + gnome-shell.
> 
>  05:35:41 up 7 days, 18:31, 35 users,  load average: 0.26, 0.26, 0.23
> 
> So I'm not aware of anything at least with Haswell in general.

Yeah I just hunted down a test infrastructure failure on my Haswell the
past few days with various loads and output configs. Seemed very happy.
And not aware of anything else blowing up (bdw/skl would be less
surprising).
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH] Implement Limited Video Range

2015-11-29 Thread Peter Frühberger
(Resent cause of moderation)

This implements a highly needed feature in a minimal non instructive way.
Consider a Limited Range display (like most TVs) where you want to watch a
decoded video. The TV is set to limited range and the intel driver also
uses full scaling Limited 16:235 mode, e.g. if you display the gray value
16 - the driver will postprocess it to something like ~ 22.

The following happens currently in linux intel video world:
Video is decoded with e.g. vaapi, the decoded surface is either used via
EGL directly in Limited Range. Limited Range processing takes place and at
the end while rendering the intel driver will scale down the limited range
data again as it cannot distunguish of the data it gets when e.g. rendering
with OpenGL like we succesfully do in kodi.

Another use case is vaapi when using the old vaCopySurfaceGLX
(vaPutSurface) which would automatically use BT601 / BT709 matrix to
upscale the content without any dithering to Full Range RGB. Later on this
is scaled down again with the Limited 16:235 setting and output. Quality
degrades two times.

Users and applications widely used want to make sure that colors appear on
screen like they were processed after the last processing step. In video
case two use cases make sense:

a) scaling limited range to full range with dithering applied when you need
to output fullrange
b) already having limited range and outputting that without any further
touching by the driver

Use case a) is already possible. Usecase b) will be possible with the
attached patch. It is a possibility to get Limited Range on screen in
perfect quality in 2k15.

For the future a userspace api to provide info frames and so on in a
generic way should be considered but until we have this I bet we have 2
years to go. This solution also works on X11 (without wayland) and is
useful for existing applications.

Thanks much for consideration.

---
>From deaa9d730c08aefefe3671d073d93d970bb39a31 Mon Sep 17 00:00:00 2001
From: fritsch 
Date: Sun, 29 Nov 2015 16:38:14 +0100
Subject: [PATCH] Intel: Implement Video Color Range

---
 drivers/gpu/drm/i915/i915_drv.h  |  1 +
 drivers/gpu/drm/i915/intel_display.c |  4 ++--
 drivers/gpu/drm/i915/intel_drv.h |  8 
 drivers/gpu/drm/i915/intel_hdmi.c| 17 +++--
 drivers/gpu/drm/i915/intel_modes.c   |  1 +
 5 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h
b/drivers/gpu/drm/i915/i915_drv.h
index 95bb27d..e453461 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3449,6 +3449,7 @@ int intel_freq_opcode(struct drm_i915_private
*dev_priv, int val);
 #define INTEL_BROADCAST_RGB_AUTO 0
 #define INTEL_BROADCAST_RGB_FULL 1
 #define INTEL_BROADCAST_RGB_LIMITED 2
+#define INTEL_BROADCAST_RGB_VIDEO 3

 static inline uint32_t i915_vgacntrl_reg(struct drm_device *dev)
 {
diff --git a/drivers/gpu/drm/i915/intel_display.c
b/drivers/gpu/drm/i915/intel_display.c
index 71860f8..ea40d81 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -8605,7 +8605,7 @@ static void intel_set_pipe_csc(struct drm_crtc *crtc)
   * consideration.
   */

- if (intel_crtc->config->limited_color_range)
+ if (intel_crtc->config->limited_color_range &&
!intel_crtc->config->video_color_range)
  coeff = ((235 - 16) * (1 << 12) / 255) & 0xff8; /* 0.xxx... */

  /*
@@ -8629,7 +8629,7 @@ static void intel_set_pipe_csc(struct drm_crtc *crtc)
  if (INTEL_INFO(dev)->gen > 6) {
  uint16_t postoff = 0;

- if (intel_crtc->config->limited_color_range)
+ if (intel_crtc->config->limited_color_range &&
!intel_crtc->config->video_color_range)
  postoff = (16 * (1 << 12) / 255) & 0x1fff;

  I915_WRITE(PIPE_CSC_POSTOFF_HI(pipe), postoff);
diff --git a/drivers/gpu/drm/i915/intel_drv.h
b/drivers/gpu/drm/i915/intel_drv.h
index 0598932..6940243 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -372,6 +372,13 @@ struct intel_crtc_state {
   */
  bool limited_color_range;

+ /*
+  *
+  * Use reduced/limited/broadcast rgb range without compressing.
+  *
+  */
+ bool video_color_range;
+
  /* DP has a bunch of special case unfortunately, so mark the pipe
   * accordingly. */
  bool has_dp_encoder;
@@ -682,6 +689,7 @@ struct intel_hdmi {
  int ddc_bus;
  bool limited_color_range;
  bool color_range_auto;
+ bool color_range_video;
  bool has_hdmi_sink;
  bool has_audio;
  enum hdmi_force_audio force_audio;
diff --git a/drivers/gpu/drm/i915/intel_hdmi.c
b/drivers/gpu/drm/i915/intel_hdmi.c
index 9eafa19..dc78760 100644
--- a/drivers/gpu/drm/i915/intel_hdmi.c
+++ b/drivers/gpu/drm/i915/intel_hdmi.c
@@ -464,7 +464,8 @@ static void intel_hdmi_set_avi_infoframe(struct
drm_encoder *encoder,
  }

  if (intel_hdmi->rgb_quant_range_selectable) {
- if (intel_crtc->config->limited_color_range)
+ if (intel_crtc->config->limited_color_range ||
+ intel_crtc->config->video_color_range)
  frame.avi.quantization_range =
  HDMI_QUAN

[Intel-gfx] [PATCH v3] drm/i915 : Avoid superfluous invalidation of CPU cache lines

2015-11-29 Thread akash . goel
From: Akash Goel 

When the object is moved out of CPU read domain, the cachelines
are not invalidated immediately. The invalidation is deferred till
next time the object is brought back into CPU read domain.
But the invalidation is done unconditionally, i.e. even for the case
where the cachelines were flushed previously, when the object moved out
of CPU write domain. This is avoidable and would lead to some optimization.
Though this is not a hypothetical case, but is unlikely to occur often.
The aim is to detect changes to the backing storage whilst the
data is potentially in the CPU cache, and only clflush in those case.

v2: Made the comment more verbose (Ville/Chris)
Added doc for 'cache_clean' field (Daniel)

v3: Updated the comment to assuage an apprehension regarding the
speculative-prefetching behavior of HW (Ville/Chris)

Testcase: igt/gem_concurrent_blit
Testcase: igt/benchmarks/gem_set_domain
Signed-off-by: Chris Wilson 
Signed-off-by: Akash Goel 
---
 drivers/gpu/drm/i915/i915_drv.h |  9 +
 drivers/gpu/drm/i915/i915_gem.c | 17 -
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 11ae5a5..f97795e 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2100,6 +2100,15 @@ struct drm_i915_gem_object {
unsigned int cache_level:3;
unsigned int cache_dirty:1;
 
+   /*
+* Tracks if the CPU cache has been completely clflushed.
+* !cache_clean does not imply cache_dirty (there is some data in the
+* CPU cachelines, but has not been dirtied), but cache_clean
+* does imply !cache_dirty (no data in cachelines, so not dirty also).
+* Actually cache_dirty tracks whether we have been omitting clflushes.
+*/
+   unsigned int cache_clean:1;
+
unsigned int frontbuffer_bits:INTEL_FRONTBUFFER_BITS;
 
unsigned int pin_display;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 33adc8f..7376be8 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3552,6 +3552,7 @@ i915_gem_clflush_object(struct drm_i915_gem_object *obj,
trace_i915_gem_object_clflush(obj);
drm_clflush_sg(obj->pages);
obj->cache_dirty = false;
+   obj->cache_clean = true;
 
return true;
 }
@@ -3982,7 +3983,21 @@ i915_gem_object_set_to_cpu_domain(struct 
drm_i915_gem_object *obj, bool write)
 
/* Flush the CPU cache if it's still invalid. */
if ((obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0) {
-   i915_gem_clflush_object(obj, false);
+   /* If an object is moved out of the CPU domain following a
+* CPU write and before a GPU or GTT write, we will clflush
+* it out of the CPU cache, and mark the cache as clean.
+* After clflushing we know that this object cannot be in the
+* CPU cache, nor can it be speculatively loaded into the CPU
+* cache as our objects are page-aligned (& speculation cannot
+* cross page boundaries). Whilst this flag is set, we know
+* that any future access to the object's pages will miss the
+* stale cache and have to be serviced from main memory, i.e.
+* we do not need another clflush to invalidate the CPU cache
+* in preparing to read from the object.
+*/
+   if (!obj->cache_clean)
+   i915_gem_clflush_object(obj, false);
+   obj->cache_clean = false;
 
obj->base.read_domains |= I915_GEM_DOMAIN_CPU;
}
-- 
1.9.2

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915 : Avoid superfluous invalidation of CPU cache lines

2015-11-29 Thread Goel, Akash



On 11/25/2015 3:30 PM, Daniel Vetter wrote:

On Wed, Nov 25, 2015 at 02:57:47PM +0530, Goel, Akash wrote:



On 11/25/2015 2:51 PM, Daniel Vetter wrote:

On Tue, Nov 24, 2015 at 10:39:38PM +, Chris Wilson wrote:

On Tue, Nov 24, 2015 at 07:14:31PM +0100, Daniel Vetter wrote:

On Tue, Nov 24, 2015 at 12:04:06PM +0200, Ville Syrjälä wrote:

On Tue, Nov 24, 2015 at 03:35:24PM +0530, akash.g...@intel.com wrote:

From: Akash Goel 

When the object is moved out of CPU read domain, the cachelines
are not invalidated immediately. The invalidation is deferred till
next time the object is brought back into CPU read domain.
But the invalidation is done unconditionally, i.e. even for the case
where the cachelines were flushed previously, when the object moved out
of CPU write domain. This is avoidable and would lead to some optimization.
Though this is not a hypothetical case, but is unlikely to occur often.
The aim is to detect changes to the backing storage whilst the
data is potentially in the CPU cache, and only clflush in those case.

Signed-off-by: Chris Wilson 
Signed-off-by: Akash Goel 
---
  drivers/gpu/drm/i915/i915_drv.h | 1 +
  drivers/gpu/drm/i915/i915_gem.c | 9 -
  2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index df9316f..fedb71d 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2098,6 +2098,7 @@ struct drm_i915_gem_object {
unsigned long gt_ro:1;
unsigned int cache_level:3;
unsigned int cache_dirty:1;
+   unsigned int cache_clean:1;


So now we have cache_dirty and cache_clean which seems redundant,
except somehow cache_dirty != !cache_clean?


Exactly, not entirely redundant. I did think something along MESI lines
would be useful, but that didn't capture the different meanings we
employ.

cache_dirty tracks whether we have been eliding the clflush.

cache_clean tracks whether we know the cache has been completely
clflushed.

(cache_clean implies !cache_dirty, but
!cache_clean does not imply cache_dirty)


We also have read_domains & DOMAIN_CPU. Which is which?


DOMAIN_CPU implies that the object may be in the cpu cache (modulo the
clflush elision above).

DOMAIN_CPU implies !cache_clean

and even

cache_clean implies !DOMAIN_CPU

but

!DOMAIN_CPU does not imply cache_clean


All the above should be in the kerneldoc (per-struct-member comments
please) of drm_i915_gem_object. Akash, can you please amend your patch?
And please make sure we do include that kerneldoc somewhere ... might need
an upfront patch to do that, for just drm_i915_gem_object.


I floated the amended patch, earlier today,
http://lists.freedesktop.org/archives/intel-gfx/2015-November/081194.html.
Please kindly check that.


Already done and replied here because I think this should be lifted to
kerneldoc for the structure itself. That's why I replied here ;-)
-Daniel

Hi Daniel,

I think the patch to provide a kernel-doc for just the 
drm_i915_gem_object structure can be submitted independently of this 
patch. The kernel-doc part can be done as a follow up patch.


For the current patch, have added the per-struct-member comments for the 
'cache_clean' field.


Best regards
Akash




___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH v4] PCI / PM: tune down RPM suspend error message with EBUSY and EAGAIN retval

2015-11-29 Thread Rafael J. Wysocki
On Saturday, November 28, 2015 10:34:24 AM Imre Deak wrote:
> The runtime PM core doesn't treat EBUSY and EAGAIN retvals from the driver
> suspend hooks as errors, but they still show up as errors in dmesg. Tune
> them down.
> 
> One problem caused by this was noticed by Daniel: the i915 driver
> returns EAGAIN to signal a temporary failure to suspend and as a request
> towards the RPM core for scheduling a suspend again. This is a normal
> event, but the resulting error message flags a breakage during the
> driver's automated testing which parses dmesg and picks up the error.
> 
> v2:
> - fix compile breake when CONFIG_PM_SLEEP=n (0-day builder)
> v3:
> - instead of modifying the suspend_report_result() helper to disinguish
>   between the runtime and system suspend case, inline the error
>   printing, it's not used anywhere else (Rafael)
> v4:
> - don't refer to log levels as flags in code comment (Rafael)
> - use pr_debug(), pr_err() instead of the corresponding printk() (Rafael)
> 
> Reported-by: Daniel Vetter 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92992
> CC: Bjorn Helgaas 
> CC: Rafael J. Wysocki 
> Signed-off-by: Imre Deak 

This is fine by me.

Bjorn, any objections against applying it?  It reduces log noise quite
a bit for i915.

> ---
>  drivers/pci/pci-driver.c | 16 ++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 4446fcb..32a9947 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -1146,9 +1146,21 @@ static int pci_pm_runtime_suspend(struct device *dev)
>   pci_dev->state_saved = false;
>   pci_dev->no_d3cold = false;
>   error = pm->runtime_suspend(dev);
> - suspend_report_result(pm->runtime_suspend, error);
> - if (error)
> + if (error) {
> + /*
> +  * -EBUSY and -EAGAIN is used to request the runtime PM core
> +  * to schedule a new suspend, so log the event only with debug
> +  * log level.
> +  */
> + if (error == -EBUSY || error == -EAGAIN)
> + pr_debug("%s(): %pF returns %d\n", __func__,
> +  pm->runtime_suspend, error);
> + else
> + pr_err("%s(): %pF returns %d\n", __func__,
> +pm->runtime_suspend, error);
> +
>   return error;
> + }
>   if (!pci_dev->d3cold_allowed)
>   pci_dev->no_d3cold = true;
>  
> 

Thanks,
Rafael

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH i-g-t] tests/gem_softpin: New tests for softpin feature

2015-11-29 Thread Vinay Belgaumkar
These tests exercise the userptr ioctl to create shared buffers
between CPU and GPU. They contain error and normal usage scenarios.
They also contain a couple of stress tests which copy buffers between
CPU and GPU. These tests rely on the softpin patch in order to pin buffers
to a certain VA.

Caveat: These tests were designed to run on 64-bit system. Future work
includes adding logic to ensure these tests can run on 32-bit systems with
PPGTT support. Some tests are currently disabled for 32-bit systems for that
reason.

v2: Added cc and signed-off-by fields

v3: Fixed review comments, added helper functions. Removed userptr error
scenarios covered by existing userptr tests. Modified stress test to have
100K buffers, it now runs for ~30 mins, checks every element has been written
to correctly, and pins buffers at different VMAs.

v4: Changed name to gem_softpin

v5: More fixes. Removed the file based tests, will move them to userptr tests.
Added a function that validates appropriate PPGTT support before running tests.
Optimized stack space and memory footprint in stress test. Removed the eviction
test, will add it back after verifying proper functionality.

v6: Split basic test into userptr and bo
Fixed some coding style issues.

Cc: Michel Thierry 
Cc: Tvrtko Ursulin 
Signed-off-by: Vinay Belgaumkar 
---
 tests/.gitignore   |1 +
 tests/Makefile.sources |1 +
 tests/gem_softpin.c| 1003 
 3 files changed, 1005 insertions(+)
 create mode 100644 tests/gem_softpin.c

diff --git a/tests/.gitignore b/tests/.gitignore
index 80af9a7..424870b 100644
--- a/tests/.gitignore
+++ b/tests/.gitignore
@@ -21,6 +21,7 @@ gem_bad_blit
 gem_bad_length
 gem_bad_reloc
 gem_basic
+gem_softpin
 gem_caching
 gem_close_race
 gem_concurrent_all
diff --git a/tests/Makefile.sources b/tests/Makefile.sources
index 8fb2de8..2008d4a 100644
--- a/tests/Makefile.sources
+++ b/tests/Makefile.sources
@@ -11,6 +11,7 @@ TESTS_progs_M = \
drv_hangman \
gem_bad_reloc \
gem_basic \
+   gem_softpin \
gem_caching \
gem_close_race \
gem_concurrent_blit \
diff --git a/tests/gem_softpin.c b/tests/gem_softpin.c
new file mode 100644
index 000..7cec732
--- /dev/null
+++ b/tests/gem_softpin.c
@@ -0,0 +1,1003 @@
+/*
+ * Copyright © 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *Vinay Belgaumkar 
+ *Thomas Daniel 
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "drm.h"
+#include "ioctl_wrappers.h"
+#include "drmtest.h"
+#include "intel_chipset.h"
+#include "intel_io.h"
+#include "i915_drm.h"
+#include 
+#include 
+#include 
+#include 
+#include "igt_kms.h"
+#include 
+#include 
+#include 
+
+#define BO_SIZE 4096
+#define MULTIPAGE_BO_SIZE 2 * BO_SIZE
+#define STORE_BATCH_BUFFER_SIZE 4
+#define EXEC_OBJECT_PINNED (1<<4)
+#define EXEC_OBJECT_SUPPORTS_48B_ADDRESS (1<<3)
+#define SHARED_BUFFER_SIZE 4096
+
+typedef struct drm_i915_gem_userptr i915_gem_userptr;
+
+static uint32_t init_userptr(int fd, i915_gem_userptr *, void *ptr, uint64_t 
size);
+static void *create_mem_buffer(uint64_t size);
+static int gem_call_userptr_ioctl(int fd, i915_gem_userptr *userptr);
+static void gem_pin_userptr_test(void);
+static void gem_pin_bo_test(void);
+static void gem_pin_invalid_vma_test(void);
+static void gem_pin_overlap_test(void);
+static void gem_pin_high_address_test(void);
+
+#define NO_PPGTT 0
+#define ALIASING_PPGTT 1
+#define FULL_32_BIT_PPGTT 2
+#define FULL_48_BIT_PPGTT 3
+/* uses_full_ppgtt
+ * Finds supported PPGTT details. 
+ * @fd DRM fd
+ * @min can be
+ * 0 - No PPGTT
+ * 1 - Aliasing PPGTT
+ * 2 - Full PPGTT (32b)
+ * 3 - Full PPGTT (48b)
+ * RETURNS true/false if min support is present
+*/
+static bool uses_full_ppgtt(in

Re: [Intel-gfx] [PATCH RESEND 2/2] drm/i915: prevent the vgacon from ever reloading

2015-11-29 Thread Emil Velikov
On 29 November 2015 at 12:47, Daniel Vetter  wrote:
> On Fri, Nov 27, 2015 at 07:37:53PM -0500, Ilia Mirkin wrote:
>> On Fri, Nov 27, 2015 at 10:40 AM, Emil Velikov  
>> wrote:
>> > On 27 November 2015 at 15:10, Daniel Vetter  wrote:
>> >> It only leads to bloodshed and tears - we don't bother to restore a
>> >> working legacy vga hw setup.
>> >>
>> >> On haswell with the new dynamic power well code this leads to even
>> >> more hilarity since for some configurations the hardware is simply no
>> >> longer there.
>> >>
>> >> The actual implementation is a bit a hack - we realy on fbcon to kick
>> >> out the vgacon. To make this also work with I915_FBDEV=n (or FBCON=n)
>> >> and VGA_CONSOLE=y i915 already unregisters the vga console manually
>> >> early in the driver load sequence.
>> >>
>> > Interesting... nv50 and later GPUs are in a roughly similar shame
>> > afaict. They lack the dedicated hardware and no one really bothered
>> > figuring out how to restore things to a working shape [1].
>> >
>> > Then again, upon sequential load of the nouveau module the GPU gets
>> > initialised properly, where you can get X (weston?) up and running
>> > without issues. Am I thinking about a different thing ?
>> >
>> > I take it that you guys do less of the (re)initialisation dance, to
>> > ensure faster boot times ?
>> >
>> > Thanks,
>> > Emil
>> >
>> > [1] 
>> > http://nouveau.freedesktop.org/wiki/KernelModeSetting/#deactivatingkmsandunloadingnouveau
>>
>> FWIW it's possible to unload nouveau, at which point your screen turns
>> off (because we can't restore the vga emulation junk... haven't the
>> faintest clue how it works), and then reload it, at which point fbcon
>> gets rebound and is completely happy. I can't tell what this patch is
>> doing, but for your own sanity, you should support unloading/reloading
>> i915 as well.
>
> We do of course support unloading/reloading. And we do the same "shut
> everything down" like nouveau seems to do. The problem though on some hw
> is that if vgacon then tries to access the registers the hw dies. So the
> problem we have with not restoring VGA functionality isn't that the screen
> is black when i915.ko is unloaded, but that the machine dies. This patch
> fixes that.
I see. Thank you for the explanation!

-Emil
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH RESEND 1/2] vgacon: dummy implementation for vgacon_text_force

2015-11-29 Thread Daniel Vetter
On Fri, Nov 27, 2015 at 03:29:52PM +, Emil Velikov wrote:
> On 27 November 2015 at 15:10, Daniel Vetter  wrote:
> > This allows us to ditch a ton of ugly #ifdefs from a bunch of drm modeset
> > drivers.
> >
> > v2: Make the dummy function actually return a sane value, spotted by
> > Ville.
> >
> > Cc: Ville Syrjälä 
> > Cc: Andrew Morton 
> > Cc: Greg Kroah-Hartman 
> > Signed-off-by: Daniel Vetter 
> > ---
> >  drivers/gpu/drm/ast/ast_drv.c | 2 --
> >  drivers/gpu/drm/cirrus/cirrus_drv.c   | 2 --
> >  drivers/gpu/drm/i915/i915_drv.c   | 2 --
> >  drivers/gpu/drm/mgag200/mgag200_drv.c | 2 --
> >  drivers/gpu/drm/nouveau/nouveau_drm.c | 2 --
> >  drivers/gpu/drm/qxl/qxl_drv.c | 2 --
> >  drivers/gpu/drm/radeon/radeon_drv.c   | 2 --
> >  include/linux/console.h   | 2 ++
> 
> The new drivers seems to be missing the conversion. Namely amdgpu and
> virtio_gpu.

Just shows how old this patch is ;-) And it's not going to cause a
problem since this patch simply makes the #ifdef redundant.

Greg, do you prefer a respin or a fixup patch?

Thanks, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH RESEND 2/2] drm/i915: prevent the vgacon from ever reloading

2015-11-29 Thread Daniel Vetter
On Fri, Nov 27, 2015 at 07:37:53PM -0500, Ilia Mirkin wrote:
> On Fri, Nov 27, 2015 at 10:40 AM, Emil Velikov  
> wrote:
> > On 27 November 2015 at 15:10, Daniel Vetter  wrote:
> >> It only leads to bloodshed and tears - we don't bother to restore a
> >> working legacy vga hw setup.
> >>
> >> On haswell with the new dynamic power well code this leads to even
> >> more hilarity since for some configurations the hardware is simply no
> >> longer there.
> >>
> >> The actual implementation is a bit a hack - we realy on fbcon to kick
> >> out the vgacon. To make this also work with I915_FBDEV=n (or FBCON=n)
> >> and VGA_CONSOLE=y i915 already unregisters the vga console manually
> >> early in the driver load sequence.
> >>
> > Interesting... nv50 and later GPUs are in a roughly similar shame
> > afaict. They lack the dedicated hardware and no one really bothered
> > figuring out how to restore things to a working shape [1].
> >
> > Then again, upon sequential load of the nouveau module the GPU gets
> > initialised properly, where you can get X (weston?) up and running
> > without issues. Am I thinking about a different thing ?
> >
> > I take it that you guys do less of the (re)initialisation dance, to
> > ensure faster boot times ?
> >
> > Thanks,
> > Emil
> >
> > [1] 
> > http://nouveau.freedesktop.org/wiki/KernelModeSetting/#deactivatingkmsandunloadingnouveau
> 
> FWIW it's possible to unload nouveau, at which point your screen turns
> off (because we can't restore the vga emulation junk... haven't the
> faintest clue how it works), and then reload it, at which point fbcon
> gets rebound and is completely happy. I can't tell what this patch is
> doing, but for your own sanity, you should support unloading/reloading
> i915 as well.

We do of course support unloading/reloading. And we do the same "shut
everything down" like nouveau seems to do. The problem though on some hw
is that if vgacon then tries to access the registers the hw dies. So the
problem we have with not restoring VGA functionality isn't that the screen
is black when i915.ko is unloaded, but that the machine dies. This patch
fixes that.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [QA 2015-11-20 ww48] Testing report for `drm-intel-testing`

2015-11-29 Thread christophe . prigent
Hello,

We launched Intel GPU Tools on 7 platforms: Skylake-Y, Braswell-M,
Broadwell-U, Baytrail-M, Haswell-U, Ivy Bridge and Sandy Bridge to
validate tag drm-intel-testing-2015-11-20 (kernel 4.4-rc1).


Results:
``
| Avail | Black | Skip  | Not   |Exec   | Exec  | Pass of   
| Pass of
| able  | listed|   | Run   |   | % | Total %   
| executed %
--
SKL-Y
Basic   | 131   | 0 | 2 | 0 | 129   | 98.47%| 93.89%
| 95.35%
Others  | 5542  | 3780  | 271   | 0 | 1491  | 26.90%| 24.11%
| 89.60%
--
BSW-M
Basic   | 131   | 0 | 4 | 0 | 127   | 96.95%| 90.84%
| 93.70%
Others  | 5542  | 3780  | 380   | 340   | 1042  | 18.80%| 12.94%
| 68.81%
--
BDW-U
Basic   | 131   | 0 | 4 | 0 | 127   | 96.95%| 92.37%
| 95.28%
Others  | 5542  | 3780  | 574   | 0 | 1188  | 21.44%| 19.99%
| 93.27%
--
HSW-U
Basic   | 131   | 0 | 11| 0 | 120   | 91.60%| 87.02%
| 95.00%
Others  | 5542  | 3780  | 677   | 0 | 1085  | 19.58%| 17.72%
| 90.51%
--
IVB
Basic   | 131   | 0 | 6 | 0 | 125   | 95.42%| 94.66%
| 99.20%
Others  | 5542  | 3780  | 902   | 0 | 860   | 15.52%| 14.71%
| 94.77%
--
SNB
Basic   | 131   | 0 | 13| 0 | 118   | 90.08%| 90.08%
| 100.00%
Others  | 5542  | 3780  | 895   | 0 | 867   | 15.64%| 14.85%
| 94.93%
--
BYT-M
Basic   | 131   | 0 | 7 | 0 | 124   | 94.66%| 93.89%
| 99.19%
Others  | 5673  | 3780  | 654   | 0 | 1239  | 21.84%| 19.51%
| 89.35%
--


Tag drm-intel-testing-2015-10-23 results:
``
Platform| Avail | Black | Skip  | Exec  | Exec  | Pass of   | Pass 
of
| able  | listed|   |   | % | Total %   | 
executed %
--
SKLY| 5596  | 3780  | 274   | 1542  | 27.56%| 25.05%| 90.92%
BDWU| 5578  | 3781  | 633   | 1164  | 20.87%| 19.51%| 93.47%
BYTM| 5578  | 3798  | 667   | 1113  | 19.95%| 18.48%| 92.63%
HSWU| 5578  | 3727  | 557   | 1294  | 23.20%| 21.30%| 91.81%
BSWM| 6171  | 4321  | 496   | 1354  | 21.94%| 16.95%| 77.25%
--


Test Environment:

Kernel: tag drm-intel-testing-2015-11-20
5074e51ef3a2b0ad4c2354e95aec5380a93966b3 (4.4-rc1) from
git://anongit.freedesktop.org/drm-intel
Mesa: mesa-11.0.5 from http://cgit.freedesktop.org/mesa/mesa
Xf86_video_intel: 2.99.917 from
http://cgit.freedesktop.org/xorg/driver/xf86-video-intel
Libdrm: libdrm-2.4.65 from http://cgit.freedesktop.org/mesa/drm
Cairo: 1.14.2 from http://cgit.freedesktop.org/cairo
libva: libva-1.6.0 from http://cgit.freedesktop.org/libva
intel-driver: 1.6.1. from http://cgit.freedesktop.org/vaapi/intel-driver
xorg: 1.17.99 installed with script git_xorg.sh
Xserver: xorg-server-1.17.2 from http://cgit.freedesktop.org/xorg/xserver
Intel-gpu-tools: 1.12 b68a642 from
http://cgit.freedesktop.org/xorg/app/intel-gpu


New bugs reported:

None


Closed bugs:

https://bugs.freedesktop.org/show_bug.cgi?id=92472
https://bugs.freedesktop.org/show_bug.cgi?id=92737
https://bugs.freedesktop.org/show_bug.cgi?id=92707
https://bugs.freedesktop.org/show_bug.cgi?id=92211


Reproduced bugs:
``
SKL
https://bugs.freedesktop.org/show_bug.cgi?id=92602 [HSW BDW SKL BSW] [IGT
Basic] kms_pipe_crc_basic is epic fail
https://bugs.freedesktop.org/show_bug.cgi?id=93013 [SKL]
igt/drm_import_export/import-close-race-flink result is crash
https://bugs.freedesktop.org/show_bug.cgi?id=88437
[BDW/SKL/BYT]igt/drv_missed_irq_hang fails
https://bugs.freedesktop.org/show_bug.cgi?id=92699 [BDW SKL B

[Intel-gfx] [PATCH 15/15] drm/i915: On GPU reset, set the HWS breadcrumb to the last seqno

2015-11-29 Thread Chris Wilson
After the GPU reset and we discard all of the incomplete requests, mark
the GPU as having advanced to the last_submitted_seqno (as having
completed the requests and ready for fresh work). The impact of this is
negligble, as all the requests will be considered completed by this
point, it just brings the HWS into line with expectations for external
viewers.

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/i915_gem.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 871201713c73..f82ee12c9492 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2793,6 +2793,8 @@ static void i915_gem_reset_ring_cleanup(struct 
drm_i915_private *dev_priv,
buffer->last_retired_head = buffer->tail;
intel_ring_update_space(buffer);
}
+
+   intel_ring_init_seqno(ring, ring->last_submitted_seqno);
 }
 
 void i915_gem_reset(struct drm_device *dev)
-- 
2.6.2

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 01/15] drm/i915: Break busywaiting for requests on pending signals

2015-11-29 Thread Chris Wilson
The busywait in __i915_spin_request() does not respect pending signals
and so may consume the entire timeslice for the task instead of
returning to userspace to handle the signal.

In the worst case this could cause a delay in signal processing of 20ms,
which would be a noticeable jitter in cursor tracking. If a higher
resolution signal was being used, for example to provide fairness of a
server timeslices between clients, we could expect to detect some
unfairness between clients (i.e. some windows not updating as fast as
others). This issue was noticed when inspecting a report of poor
interactivity resulting from excessively high __i915_spin_request usage.

Fixes regression from
commit 2def4ad99befa25775dd2f714fdd4d92faec6e34 [v4.2]
Author: Chris Wilson 
Date:   Tue Apr 7 16:20:41 2015 +0100

 drm/i915: Optimistically spin for the request completion

v2: Try to assess the impact of the bug

Signed-off-by: Chris Wilson 
Cc: Jens Axboe 
Cc; "Rogozhkin, Dmitry V" 
Cc: Daniel Vetter 
Cc: Tvrtko Ursulin 
Cc: Eero Tamminen 
Cc: "Rantala, Valtteri" 
Cc: sta...@kernel.vger.org
---
 drivers/gpu/drm/i915/i915_gem.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 33adc8f8ab20..87fc34f5899f 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1146,7 +1146,7 @@ static bool missed_irq(struct drm_i915_private *dev_priv,
return test_bit(ring->id, &dev_priv->gpu_error.missed_irq_rings);
 }
 
-static int __i915_spin_request(struct drm_i915_gem_request *req)
+static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
 {
unsigned long timeout;
 
@@ -1158,6 +1158,9 @@ static int __i915_spin_request(struct 
drm_i915_gem_request *req)
if (i915_gem_request_completed(req, true))
return 0;
 
+   if (signal_pending_state(state, current))
+   break;
+
if (time_after_eq(jiffies, timeout))
break;
 
@@ -1197,6 +1200,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
struct drm_i915_private *dev_priv = dev->dev_private;
const bool irq_test_in_progress =
ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) & 
intel_ring_flag(ring);
+   int state = interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
DEFINE_WAIT(wait);
unsigned long timeout_expire;
s64 before, now;
@@ -1221,7 +1225,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
before = ktime_get_raw_ns();
 
/* Optimistic spin for the next jiffie before touching IRQs */
-   ret = __i915_spin_request(req);
+   ret = __i915_spin_request(req, state);
if (ret == 0)
goto out;
 
@@ -1233,8 +1237,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
for (;;) {
struct timer_list timer;
 
-   prepare_to_wait(&ring->irq_queue, &wait,
-   interruptible ? TASK_INTERRUPTIBLE : 
TASK_UNINTERRUPTIBLE);
+   prepare_to_wait(&ring->irq_queue, &wait, state);
 
/* We need to check whether any gpu reset happened in between
 * the caller grabbing the seqno and now ... */
@@ -1252,7 +1255,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
break;
}
 
-   if (interruptible && signal_pending(current)) {
+   if (signal_pending_state(state, current)) {
ret = -ERESTARTSYS;
break;
}
-- 
2.6.2

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 09/15] drm/i915: Separate out the seqno-barrier from engine->get_seqno

2015-11-29 Thread Chris Wilson
In order to simplify the next couple of patches, extract the
lazy_coherency optimisation our of the engine->get_seqno() vfunc into
its own callback.

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/i915_debugfs.c | 10 -
 drivers/gpu/drm/i915/i915_drv.h | 12 ++
 drivers/gpu/drm/i915/i915_gpu_error.c   |  2 +-
 drivers/gpu/drm/i915/i915_irq.c |  4 ++--
 drivers/gpu/drm/i915/i915_trace.h   |  2 +-
 drivers/gpu/drm/i915/intel_lrc.c| 39 +
 drivers/gpu/drm/i915/intel_ringbuffer.c | 34 ++--
 drivers/gpu/drm/i915/intel_ringbuffer.h |  4 ++--
 8 files changed, 51 insertions(+), 56 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
b/drivers/gpu/drm/i915/i915_debugfs.c
index 8458447ddc17..1e8aa897673a 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -600,7 +600,7 @@ static int i915_gem_pageflip_info(struct seq_file *m, void 
*data)
   ring->name,
   
i915_gem_request_get_seqno(work->flip_queued_req),
   dev_priv->next_seqno,
-  ring->get_seqno(ring, true),
+  ring->get_seqno(ring),
   
i915_gem_request_completed(work->flip_queued_req, true));
} else
seq_printf(m, "Flip not associated with any 
ring\n");
@@ -730,10 +730,8 @@ static int i915_gem_request_info(struct seq_file *m, void 
*data)
 static void i915_ring_seqno_info(struct seq_file *m,
 struct intel_engine_cs *ring)
 {
-   if (ring->get_seqno) {
-   seq_printf(m, "Current sequence (%s): %x\n",
-  ring->name, ring->get_seqno(ring, false));
-   }
+   seq_printf(m, "Current sequence (%s): %x\n",
+  ring->name, ring->get_seqno(ring));
 }
 
 static int i915_gem_seqno_info(struct seq_file *m, void *data)
@@ -1342,7 +1340,7 @@ static int i915_hangcheck_info(struct seq_file *m, void 
*unused)
intel_runtime_pm_get(dev_priv);
 
for_each_ring(ring, dev_priv, i) {
-   seqno[i] = ring->get_seqno(ring, false);
+   seqno[i] = ring->get_seqno(ring);
acthd[i] = intel_ring_get_active_head(ring);
}
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 782d08ab6264..a9c1785ac08c 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2988,15 +2988,19 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2)
 static inline bool i915_gem_request_started(struct drm_i915_gem_request *req,
   bool lazy_coherency)
 {
-   u32 seqno = req->ring->get_seqno(req->ring, lazy_coherency);
-   return i915_seqno_passed(seqno, req->previous_seqno);
+   if (!lazy_coherency && req->ring->seqno_barrier)
+   req->ring->seqno_barrier(req->ring);
+   return i915_seqno_passed(req->ring->get_seqno(req->ring),
+req->previous_seqno);
 }
 
 static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
  bool lazy_coherency)
 {
-   u32 seqno = req->ring->get_seqno(req->ring, lazy_coherency);
-   return i915_seqno_passed(seqno, req->seqno);
+   if (!lazy_coherency && req->ring->seqno_barrier)
+   req->ring->seqno_barrier(req->ring);
+   return i915_seqno_passed(req->ring->get_seqno(req->ring),
+req->seqno);
 }
 
 int __must_check i915_gem_get_seqno(struct drm_device *dev, u32 *seqno);
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c 
b/drivers/gpu/drm/i915/i915_gpu_error.c
index d1df405b905c..7a427240c813 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -903,8 +903,8 @@ static void i915_record_ring_state(struct drm_device *dev,
 
ering->waiting = 
READ_ONCE(dev_priv->breadcrumbs.engine[ring->id].first);
ering->instpm = I915_READ(RING_INSTPM(ring->mmio_base));
-   ering->seqno = ring->get_seqno(ring, false);
ering->acthd = intel_ring_get_active_head(ring);
+   ering->seqno = ring->get_seqno(ring);
ering->start = I915_READ_START(ring);
ering->head = I915_READ_HEAD(ring);
ering->tail = I915_READ_TAIL(ring);
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 0c2390476bdb..43078e09e1f0 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2871,7 +2871,7 @@ static int semaphore_passed(struct intel_engine_cs *ring)
if (signaller->hangcheck.deadlock >= I915_NUM_RINGS)
return -1;
 
-   if (i915_seqno_passed(signaller->get_seqno(signa

[Intel-gfx] [PATCH 02/15] drm/i915: Limit the busy wait on requests to 10us not 10ms!

2015-11-29 Thread Chris Wilson
When waiting for high frequency requests, the finite amount of time
required to set up the irq and wait upon it limits the response rate. By
busywaiting on the request completion for a short while we can service
the high frequency waits as quick as possible. However, if it is a slow
request, we want to sleep as quickly as possible. The tradeoff between
waiting and sleeping is roughly the time it takes to sleep on a request,
on the order of a microsecond. Based on measurements of synchronous
workloads from across big core and little atom, I have set the limit for
busywaiting as 10 microseconds. In most of the synchronous cases, we can
reduce the limit down to as little as 2 miscroseconds, but that leaves
quite a few test cases regressing by factors of 3 and more.

The code currently uses the jiffie clock, but that is far too coarse (on
the order of 10 milliseconds) and results in poor interactivity as the
CPU ends up being hogged by slow requests. To get microsecond resolution
we need to use a high resolution timer. The cheapest of which is polling
local_clock(), but that is only valid on the same CPU. If we switch CPUs
because the task was preempted, we can also use that as an indicator that
 the system is too busy to waste cycles on spinning and we should sleep
instead.

__i915_spin_request was introduced in
commit 2def4ad99befa25775dd2f714fdd4d92faec6e34 [v4.2]
Author: Chris Wilson 
Date:   Tue Apr 7 16:20:41 2015 +0100

 drm/i915: Optimistically spin for the request completion

v2: Drop full u64 for unsigned long - the timer is 32bit wraparound safe,
so we can use native register sizes on smaller architectures. Mention
the approximate microseconds units for elapsed time and add some extra
comments describing the reason for busywaiting.

v3: Raise the limit to 10us

Reported-by: Jens Axboe 
Link: https://lkml.org/lkml/2015/11/12/621
Cc; "Rogozhkin, Dmitry V" 
Cc: Daniel Vetter 
Cc: Tvrtko Ursulin 
Cc: Eero Tamminen 
Cc: "Rantala, Valtteri" 
Cc: sta...@kernel.vger.org
---
 drivers/gpu/drm/i915/i915_gem.c | 47 +++--
 1 file changed, 45 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 87fc34f5899f..bad112abb16b 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1146,14 +1146,57 @@ static bool missed_irq(struct drm_i915_private 
*dev_priv,
return test_bit(ring->id, &dev_priv->gpu_error.missed_irq_rings);
 }
 
+static unsigned long local_clock_us(unsigned *cpu)
+{
+   unsigned long t;
+
+   /* Cheaply and approximately convert from nanoseconds to microseconds.
+* The result and subsequent calculations are also defined in the same
+* approximate microseconds units. The principal source of timing
+* error here is from the simple truncation.
+*
+* Note that local_clock() is only defined wrt to the current CPU;
+* the comparisons are no longer valid if we switch CPUs. Instead of
+* blocking preemption for the entire busywait, we can detect the CPU
+* switch and use that as indicator of system load and a reason to
+* stop busywaiting, see busywait_stop().
+*/
+   *cpu = get_cpu();
+   t = local_clock() >> 10;
+   put_cpu();
+
+   return t;
+}
+
+static bool busywait_stop(unsigned long timeout, unsigned cpu)
+{
+   unsigned this_cpu;
+
+   if (time_after(local_clock_us(&this_cpu), timeout))
+   return true;
+
+   return this_cpu != cpu;
+}
+
 static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
 {
unsigned long timeout;
+   unsigned cpu;
+
+   /* When waiting for high frequency requests, e.g. during synchronous
+* rendering split between the CPU and GPU, the finite amount of time
+* required to set up the irq and wait upon it limits the response
+* rate. By busywaiting on the request completion for a short while we
+* can service the high frequency waits as quick as possible. However,
+* if it is a slow request, we want to sleep as quickly as possible.
+* The tradeoff between waiting and sleeping is roughly the time it
+* takes to sleep on a request, on the order of a microsecond.
+*/
 
if (i915_gem_request_get_ring(req)->irq_refcount)
return -EBUSY;
 
-   timeout = jiffies + 1;
+   timeout = local_clock_us(&cpu) + 10;
while (!need_resched()) {
if (i915_gem_request_completed(req, true))
return 0;
@@ -1161,7 +1204,7 @@ static int __i915_spin_request(struct 
drm_i915_gem_request *req, int state)
if (signal_pending_state(state, current))
break;
 
-   if (time_after_eq(jiffies, timeout))
+   if (busywait_stop(timeout, cpu))
break;
 
cpu_relax_lowlatency();
-- 
2.6

[Intel-gfx] [PATCH 05/15] drm/i915: Suppress error message when GPU resets are disabled

2015-11-29 Thread Chris Wilson
If we do not have lowlevel support for reseting the GPU, or if the user
has explicitly disabled reseting the device, the failure is expected.
Since it is an expected failure, we should be using a lower priority
message than *ERROR*, perhaps NOTICE. In the absence of DRM_NOTICE, just
emit the expected failure as a DEBUG message.

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/i915_drv.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index eb893a5e00b1..476bb696dbcb 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -959,7 +959,10 @@ int i915_reset(struct drm_device *dev)
pr_notice("drm/i915: Resetting chip after gpu hang\n");
 
if (ret) {
-   DRM_ERROR("Failed to reset chip: %i\n", ret);
+   if (ret != -ENODEV)
+   DRM_ERROR("Failed to reset chip: %i\n", ret);
+   else
+   DRM_DEBUG_DRIVER("GPU reset disabled\n");
goto error;
}
 
-- 
2.6.2

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 06/15] drm/i915: Delay queuing hangcheck to wait-request

2015-11-29 Thread Chris Wilson
We can forgo queuing the hangcheck from the start of every request to
until we wait upon a request. This reduces the overhead of every
request, but may increase the latency of detecting a hang. Howeever, if
nothing every waits upon a hang, did it ever hang? It also improves the
robustness of the wait-request by ensuring that the hangchecker is
indeed running before we sleep indefinitely (and thereby ensuring that
we never actually sleep forever waiting for a dead GPU).

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/i915_drv.h | 2 +-
 drivers/gpu/drm/i915/i915_gem.c | 4 ++--
 drivers/gpu/drm/i915/i915_irq.c | 9 -
 3 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index c24c23d8a0c0..e1980212ba37 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2711,7 +2711,7 @@ void intel_hpd_cancel_work(struct drm_i915_private 
*dev_priv);
 bool intel_hpd_pin_to_port(enum hpd_pin pin, enum port *port);
 
 /* i915_irq.c */
-void i915_queue_hangcheck(struct drm_device *dev);
+void i915_queue_hangcheck(struct drm_i915_private *dev_priv);
 __printf(3, 4)
 void i915_handle_error(struct drm_device *dev, bool wedged,
   const char *fmt, ...);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 646d189e23a1..f33c35c6130f 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1302,6 +1302,8 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
break;
}
 
+   i915_queue_hangcheck(dev_priv);
+
timer.function = NULL;
if (timeout || missed_irq(dev_priv, ring)) {
unsigned long expire;
@@ -2579,8 +2581,6 @@ void __i915_add_request(struct drm_i915_gem_request 
*request,
 
trace_i915_gem_request_add(request);
 
-   i915_queue_hangcheck(ring->dev);
-
queue_delayed_work(dev_priv->wq,
   &dev_priv->mm.retire_work,
   round_jiffies_up_relative(HZ));
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 3b2a8fbe0392..28c0329f8281 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -3066,15 +3066,14 @@ static void i915_hangcheck_elapsed(struct work_struct 
*work)
if (rings_hung)
return i915_handle_error(dev, true, "Ring hung");
 
+   /* Reset timer in case GPU hangs without another request being added */
if (busy_count)
-   /* Reset timer case chip hangs without another request
-* being added */
-   i915_queue_hangcheck(dev);
+   i915_queue_hangcheck(dev_priv);
 }
 
-void i915_queue_hangcheck(struct drm_device *dev)
+void i915_queue_hangcheck(struct drm_i915_private *dev_priv)
 {
-   struct i915_gpu_error *e = &to_i915(dev)->gpu_error;
+   struct i915_gpu_error *e = &dev_priv->gpu_error;
 
if (!i915.enable_hangcheck)
return;
-- 
2.6.2

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 11/15] drm/i915: Use HWS for seqno tracking everywhere

2015-11-29 Thread Chris Wilson
By using the same address for storing the HWS on every platform, we can
remove the platform specific vfuncs and reduce the get-seqno routine to
a single read of a cached memory location.

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/i915_debugfs.c  |  6 +--
 drivers/gpu/drm/i915/i915_drv.h  |  4 +-
 drivers/gpu/drm/i915/i915_gpu_error.c|  2 +-
 drivers/gpu/drm/i915/i915_irq.c  |  4 +-
 drivers/gpu/drm/i915/i915_trace.h|  2 +-
 drivers/gpu/drm/i915/intel_breadcrumbs.c |  2 +-
 drivers/gpu/drm/i915/intel_lrc.c | 46 ++---
 drivers/gpu/drm/i915/intel_ringbuffer.c  | 86 
 drivers/gpu/drm/i915/intel_ringbuffer.h  |  7 +--
 9 files changed, 41 insertions(+), 118 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
b/drivers/gpu/drm/i915/i915_debugfs.c
index 99a79a10ce75..1ced3738281c 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -600,7 +600,7 @@ static int i915_gem_pageflip_info(struct seq_file *m, void 
*data)
   ring->name,
   
i915_gem_request_get_seqno(work->flip_queued_req),
   dev_priv->next_seqno,
-  ring->get_seqno(ring),
+  intel_ring_get_seqno(ring),
   
i915_gem_request_completed(work->flip_queued_req));
} else
seq_printf(m, "Flip not associated with any 
ring\n");
@@ -731,7 +731,7 @@ static void i915_ring_seqno_info(struct seq_file *m,
 struct intel_engine_cs *ring)
 {
seq_printf(m, "Current sequence (%s): %x\n",
-  ring->name, ring->get_seqno(ring));
+  ring->name, intel_ring_get_seqno(ring));
 }
 
 static int i915_gem_seqno_info(struct seq_file *m, void *data)
@@ -1340,7 +1340,7 @@ static int i915_hangcheck_info(struct seq_file *m, void 
*unused)
intel_runtime_pm_get(dev_priv);
 
for_each_ring(ring, dev_priv, i) {
-   seqno[i] = ring->get_seqno(ring);
+   seqno[i] = intel_ring_get_seqno(ring);
acthd[i] = intel_ring_get_active_head(ring);
}
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 8eb0871cd7af..410939c3f05b 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2987,13 +2987,13 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2)
 
 static inline bool i915_gem_request_started(struct drm_i915_gem_request *req)
 {
-   return i915_seqno_passed(req->ring->get_seqno(req->ring),
+   return i915_seqno_passed(intel_ring_get_seqno(req->ring),
 req->previous_seqno);
 }
 
 static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
 {
-   return i915_seqno_passed(req->ring->get_seqno(req->ring),
+   return i915_seqno_passed(intel_ring_get_seqno(req->ring),
 req->seqno);
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c 
b/drivers/gpu/drm/i915/i915_gpu_error.c
index 7a427240c813..b4e4796fac1a 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -904,7 +904,7 @@ static void i915_record_ring_state(struct drm_device *dev,
ering->waiting = 
READ_ONCE(dev_priv->breadcrumbs.engine[ring->id].first);
ering->instpm = I915_READ(RING_INSTPM(ring->mmio_base));
ering->acthd = intel_ring_get_active_head(ring);
-   ering->seqno = ring->get_seqno(ring);
+   ering->seqno = intel_ring_get_seqno(ring);
ering->start = I915_READ_START(ring);
ering->head = I915_READ_HEAD(ring);
ering->tail = I915_READ_TAIL(ring);
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 43078e09e1f0..86f98d8eacc8 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2871,7 +2871,7 @@ static int semaphore_passed(struct intel_engine_cs *ring)
if (signaller->hangcheck.deadlock >= I915_NUM_RINGS)
return -1;
 
-   if (i915_seqno_passed(signaller->get_seqno(signaller), seqno))
+   if (i915_seqno_passed(intel_ring_get_seqno(signaller), seqno))
return 1;
 
/* cursory check for an unkickable deadlock */
@@ -2975,7 +2975,7 @@ static void i915_hangcheck_elapsed(struct work_struct 
*work)
semaphore_clear_deadlocks(dev_priv);
 
acthd = intel_ring_get_active_head(ring);
-   seqno = ring->get_seqno(ring);
+   seqno = intel_ring_get_seqno(ring);
 
if (ring->hangcheck.seqno == seqno) {
if (ring_idle(ring, seqno)) {
diff --git a/drivers/gpu/drm/i915/i915_trace.h 
b/drivers/gpu/drm/i915/i915_trace.h
index cfb5

[Intel-gfx] [PATCH 07/15] drm/i915: Check the timeout passed to i915_wait_request

2015-11-29 Thread Chris Wilson
We have relied upon the sole caller (wait_ioctl) validating the timeout
argument. However, when waiting for multiple requests I forgot to ensure
that the timeout was still positive on the later requests. This is more
simply done inside __i915_wait_request.

Fixes a minor regression introduced in

commit b47161858ba13c9c7e0132230d66e008dd55
Author: Chris Wilson 
Date:   Mon Apr 27 13:41:17 2015 +0100

drm/i915: Implement inter-engine read-read optimisations

where we may end up waiting for an extra jiffie for each active ring
after consuming all of the user's timeout.

Signed-off-by: Chris Wilson 
Cc: Lionel Landwerlin 
Cc: Tvrtko Ursulin 
Cc: Daniel Vetter 
---
 drivers/gpu/drm/i915/i915_gem.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index f33c35c6130f..65d101b47d8e 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1251,8 +1251,16 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
if (i915_gem_request_completed(req, true))
return 0;
 
-   timeout_expire = timeout ?
-   jiffies + nsecs_to_jiffies_timeout((u64)*timeout) : 0;
+   timeout_expire = 0;
+   if (timeout) {
+   if (WARN_ON(*timeout < 0))
+   return -EINVAL;
+
+   if (*timeout == 0)
+   return -ETIME;
+
+   timeout_expire = jiffies + nsecs_to_jiffies_timeout(*timeout);
+   }
 
if (INTEL_INFO(dev_priv)->gen >= 6)
gen6_rps_boost(dev_priv, rps, req->emitted_jiffies);
-- 
2.6.2

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 13/15] drm/i915: Stop setting wraparound seqno on initialisation

2015-11-29 Thread Chris Wilson
We have testcases to ensure that seqno wraparound works fine, so we can
forgo forcing everyone to encounter seqno wraparound during early
uptime. seqno wraparound incurs a full GPU stall so not forcing it
will eliminate one jitter from the early system.

Advancing the global next_seqno after a GPU reset is equally pointless.

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/i915_gem.c | 16 +---
 1 file changed, 1 insertion(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 0cfdd971e8d7..2c3e36e19cb0 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4774,14 +4774,6 @@ i915_gem_init_hw(struct drm_device *dev)
}
}
 
-   /*
-* Increment the next seqno by 0x100 so we have a visible break
-* on re-initialisation
-*/
-   ret = i915_gem_set_seqno(dev, dev_priv->next_seqno+0x100);
-   if (ret)
-   goto out;
-
/* Now it is safe to go back round and do everything else: */
for_each_ring(ring, dev_priv, i) {
struct drm_i915_gem_request *req;
@@ -4969,13 +4961,7 @@ i915_gem_load(struct drm_device *dev)
dev_priv->num_fence_regs =
I915_READ(vgtif_reg(avail_rs.fence_num));
 
-   /*
-* Set initial sequence number for requests.
-* Using this number allows the wraparound to happen early,
-* catching any obvious problems.
-*/
-   dev_priv->next_seqno = ((u32)~0 - 0x1100);
-   dev_priv->last_seqno = ((u32)~0 - 0x1101);
+   dev_priv->next_seqno = 1;
 
/* Initialize fence registers to zero */
INIT_LIST_HEAD(&dev_priv->mm.fence_list);
-- 
2.6.2

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 12/15] drm/i915: Reduce seqno/irq barrier to a clflush on legacy gen6+

2015-11-29 Thread Chris Wilson
Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c 
b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 3d59dd555e64..ccceb43f14ac 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1501,11 +1501,7 @@ pc_render_add_request(struct drm_i915_gem_request *req)
 static void
 gen6_seqno_barrier(struct intel_engine_cs *ring)
 {
-   /* Workaround to force correct ordering between irq and seqno writes on
-* ivb (and maybe also on snb) by reading from a CS register (like
-* ACTHD) before reading the status page. */
-   struct drm_i915_private *dev_priv = ring->i915;
-   POSTING_READ(RING_ACTHD(ring->mmio_base));
+   intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
 }
 
 static bool
-- 
2.6.2

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 04/15] drm/i915: Cache the reset_counter for the request

2015-11-29 Thread Chris Wilson
Instead of querying the reset counter before every access to the ring,
query it the first time we touch the ring, and do a final compare when
submitting the request. For correctness, we need to then sanitize how
the reset_counter is incremented to prevent broken submission and
waiting across resets, in the process fixing the persistent -EIO we
still see today on failed waits.

v2: Rebase
v3: Now with added testcase
v4: Rebase

Testcase: igt/gem_eio
Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/i915_debugfs.c |  2 +-
 drivers/gpu/drm/i915/i915_drv.c | 32 +++-
 drivers/gpu/drm/i915/i915_drv.h | 39 ++
 drivers/gpu/drm/i915/i915_gem.c | 90 +++--
 drivers/gpu/drm/i915/i915_irq.c | 21 +---
 drivers/gpu/drm/i915/intel_display.c| 13 ++---
 drivers/gpu/drm/i915/intel_lrc.c| 13 +++--
 drivers/gpu/drm/i915/intel_ringbuffer.c | 14 ++---
 8 files changed, 111 insertions(+), 113 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
b/drivers/gpu/drm/i915/i915_debugfs.c
index a728ff11e389..8458447ddc17 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -4660,7 +4660,7 @@ i915_wedged_get(void *data, u64 *val)
struct drm_device *dev = data;
struct drm_i915_private *dev_priv = dev->dev_private;
 
-   *val = atomic_read(&dev_priv->gpu_error.reset_counter);
+   *val = i915_terminally_wedged(&dev_priv->gpu_error);
 
return 0;
 }
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 90faa8e03fca..eb893a5e00b1 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -923,23 +923,31 @@ int i915_resume_switcheroo(struct drm_device *dev)
 int i915_reset(struct drm_device *dev)
 {
struct drm_i915_private *dev_priv = dev->dev_private;
+   struct i915_gpu_error *error = &dev_priv->gpu_error;
+   unsigned reset_counter;
bool simulated;
int ret;
 
intel_reset_gt_powersave(dev);
 
mutex_lock(&dev->struct_mutex);
+   atomic_andnot(I915_WEDGED, &error->reset_counter);
+   reset_counter = atomic_inc_return(&error->reset_counter);
+   if (WARN_ON(__i915_reset_in_progress(reset_counter))) {
+   ret = -EIO;
+   goto error;
+   }
 
i915_gem_reset(dev);
 
-   simulated = dev_priv->gpu_error.stop_rings != 0;
+   simulated = error->stop_rings != 0;
 
ret = intel_gpu_reset(dev);
 
/* Also reset the gpu hangman. */
if (simulated) {
DRM_INFO("Simulated gpu hang, resetting stop_rings\n");
-   dev_priv->gpu_error.stop_rings = 0;
+   error->stop_rings = 0;
if (ret == -ENODEV) {
DRM_INFO("Reset not implemented, but ignoring "
 "error for simulated gpu hangs\n");
@@ -952,8 +960,7 @@ int i915_reset(struct drm_device *dev)
 
if (ret) {
DRM_ERROR("Failed to reset chip: %i\n", ret);
-   mutex_unlock(&dev->struct_mutex);
-   return ret;
+   goto error;
}
 
intel_overlay_reset(dev_priv);
@@ -972,20 +979,14 @@ int i915_reset(struct drm_device *dev)
 * was running at the time of the reset (i.e. we weren't VT
 * switched away).
 */
-
-   /* Used to prevent gem_check_wedged returning -EAGAIN during gpu reset 
*/
-   dev_priv->gpu_error.reload_in_reset = true;
-
ret = i915_gem_init_hw(dev);
-
-   dev_priv->gpu_error.reload_in_reset = false;
-
-   mutex_unlock(&dev->struct_mutex);
if (ret) {
DRM_ERROR("Failed hw init on reset %d\n", ret);
-   return ret;
+   goto error;
}
 
+   mutex_unlock(&dev->struct_mutex);
+
/*
 * rps/rc6 re-init is necessary to restore state lost after the
 * reset and the re-install of gt irqs. Skip for ironlake per
@@ -996,6 +997,11 @@ int i915_reset(struct drm_device *dev)
intel_enable_gt_powersave(dev);
 
return 0;
+
+error:
+   atomic_or(I915_WEDGED, &error->reset_counter);
+   mutex_unlock(&dev->struct_mutex);
+   return ret;
 }
 
 static int i915_pci_probe(struct pci_dev *pdev, const struct pci_device_id 
*ent)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index d07041c1729d..c24c23d8a0c0 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1384,9 +1384,6 @@ struct i915_gpu_error {
 
/* For missed irq/seqno simulation. */
unsigned int test_irq_rings;
-
-   /* Used to prevent gem_check_wedged returning -EAGAIN during gpu reset  
 */
-   bool reload_in_reset;
 };
 
 enum modeset_restore {
@@ -2181,6 +2178,7 @@ struct drm_i915_gem_request {
/** On Which ring this request was generated */
struct drm_i915_private 

[Intel-gfx] [PATCH 08/15] drm/i915: Slaughter the thundering i915_wait_request herd

2015-11-29 Thread Chris Wilson
One particularly stressful scenario consists of many independent tasks
all competing for GPU time and waiting upon the results (e.g. realtime
transcoding of many, many streams). One bottleneck in particular is that
each client waits on its own results, but every client is woken up after
every batchbuffer - hence the thunder of hooves as then every client must
do its heavyweight dance to read a coherent seqno to see if it is the
lucky one. Alternatively, we can have one kthread responsible for waking
after an interrupt, checking the seqno and only waking up the waiting
clients who are complete. The disadvantage is that in the uncontended
scenario (i.e. only one waiter) we incur an extra context switch in the
wakeup path - though that should be mitigated somewhat by the busy-wait
we do first before sleeping.

v2: Convert from a kworker per engine into a dedicated kthread for the
bottom-half.

Signed-off-by: Chris Wilson 
Cc: "Rogozhkin, Dmitry V" 
Cc: "Gong, Zhipeng" 
Cc: Tvrtko Ursulin 
---
 drivers/gpu/drm/i915/Makefile|   1 +
 drivers/gpu/drm/i915/i915_dma.c  |   9 +-
 drivers/gpu/drm/i915/i915_drv.h  |  35 -
 drivers/gpu/drm/i915/i915_gem.c  |  99 +++-
 drivers/gpu/drm/i915/i915_gpu_error.c|   8 +-
 drivers/gpu/drm/i915/i915_irq.c  |  17 +--
 drivers/gpu/drm/i915/intel_breadcrumbs.c | 253 +++
 drivers/gpu/drm/i915/intel_lrc.c |   2 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c  |   3 +-
 drivers/gpu/drm/i915/intel_ringbuffer.h  |   3 +-
 10 files changed, 333 insertions(+), 97 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/intel_breadcrumbs.c

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 0851de07bd13..d3b9d3618719 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -35,6 +35,7 @@ i915-y += i915_cmd_parser.o \
  i915_gem_userptr.o \
  i915_gpu_error.o \
  i915_trace_points.o \
+ intel_breadcrumbs.o \
  intel_lrc.o \
  intel_mocs.o \
  intel_ringbuffer.o \
diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index a81c76603544..93c6762d87b4 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -388,12 +388,16 @@ static int i915_load_modeset_init(struct drm_device *dev)
if (ret)
goto cleanup_vga_client;
 
+   ret = intel_breadcrumbs_init(dev_priv);
+   if (ret)
+   goto cleanup_vga_switcheroo;
+
/* Initialise stolen first so that we may reserve preallocated
 * objects for the BIOS to KMS transition.
 */
ret = i915_gem_init_stolen(dev);
if (ret)
-   goto cleanup_vga_switcheroo;
+   goto cleanup_breadcrumbs;
 
intel_power_domains_init_hw(dev_priv, false);
 
@@ -454,6 +458,8 @@ cleanup_irq:
drm_irq_uninstall(dev);
 cleanup_gem_stolen:
i915_gem_cleanup_stolen(dev);
+cleanup_breadcrumbs:
+   intel_breadcrumbs_fini(dev_priv);
 cleanup_vga_switcheroo:
vga_switcheroo_unregister_client(dev->pdev);
 cleanup_vga_client:
@@ -1193,6 +1199,7 @@ int i915_driver_unload(struct drm_device *dev)
mutex_unlock(&dev->struct_mutex);
intel_fbc_cleanup_cfb(dev_priv);
i915_gem_cleanup_stolen(dev);
+   intel_breadcrumbs_fini(dev_priv);
 
intel_csr_ucode_fini(dev_priv);
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index e1980212ba37..782d08ab6264 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -564,6 +564,7 @@ struct drm_i915_error_state {
long jiffies;
u32 seqno;
u32 tail;
+   u32 waiters;
} *requests;
 
struct {
@@ -1383,7 +1384,7 @@ struct i915_gpu_error {
 #define I915_STOP_RING_ALLOW_WARN  (1 << 30)
 
/* For missed irq/seqno simulation. */
-   unsigned int test_irq_rings;
+   unsigned long test_irq_rings;
 };
 
 enum modeset_restore {
@@ -1695,6 +1696,27 @@ struct drm_i915_private {
 
struct intel_uncore uncore;
 
+   /* Rather than have every client wait upon all user interrupts,
+* with the herd waking after every interrupt and each doing the
+* heavyweight seqno dance, we delegate the task to a bottom-half
+* for the user interrupt (one bh handling all engines). Now,
+* just one task is woken up and does the coherent seqno read and
+* then wakes up all the waiters registered with the bottom-half
+* that are complete. This avoid the thundering herd problem
+* with lots of concurrent waiters, at the expense of an extra
+* context switch between the interrupt and the client. That should
+* be mitigated somewhat by the busyspin in the client before we go to
+* sleep waiting 

[Intel-gfx] [PATCH 10/15] drm/i915: Remove the lazy_coherency parameter from request-completed?

2015-11-29 Thread Chris Wilson
Now that we have split out the seqno-barrier from the
engine->get_seqno() callback itself, we can move the users of the
seqno-barrier to the required callsites simplifying the common code and
making the required workaround handling much more explicit.

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/i915_debugfs.c  |  2 +-
 drivers/gpu/drm/i915/i915_drv.h  | 10 ++
 drivers/gpu/drm/i915/i915_gem.c  | 30 ++
 drivers/gpu/drm/i915/intel_breadcrumbs.c |  7 ++-
 drivers/gpu/drm/i915/intel_display.c |  2 +-
 drivers/gpu/drm/i915/intel_pm.c  |  4 ++--
 6 files changed, 30 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
b/drivers/gpu/drm/i915/i915_debugfs.c
index 1e8aa897673a..99a79a10ce75 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -601,7 +601,7 @@ static int i915_gem_pageflip_info(struct seq_file *m, void 
*data)
   
i915_gem_request_get_seqno(work->flip_queued_req),
   dev_priv->next_seqno,
   ring->get_seqno(ring),
-  
i915_gem_request_completed(work->flip_queued_req, true));
+  
i915_gem_request_completed(work->flip_queued_req));
} else
seq_printf(m, "Flip not associated with any 
ring\n");
seq_printf(m, "Flip queued on frame %d, (was ready on 
frame %d), now %d\n",
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index a9c1785ac08c..8eb0871cd7af 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2985,20 +2985,14 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2)
return (int32_t)(seq1 - seq2) >= 0;
 }
 
-static inline bool i915_gem_request_started(struct drm_i915_gem_request *req,
-  bool lazy_coherency)
+static inline bool i915_gem_request_started(struct drm_i915_gem_request *req)
 {
-   if (!lazy_coherency && req->ring->seqno_barrier)
-   req->ring->seqno_barrier(req->ring);
return i915_seqno_passed(req->ring->get_seqno(req->ring),
 req->previous_seqno);
 }
 
-static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
- bool lazy_coherency)
+static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
 {
-   if (!lazy_coherency && req->ring->seqno_barrier)
-   req->ring->seqno_barrier(req->ring);
return i915_seqno_passed(req->ring->get_seqno(req->ring),
 req->seqno);
 }
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 969592fb0969..0cfdd971e8d7 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1173,13 +1173,16 @@ static int __i915_spin_request(struct 
drm_i915_gem_request *req, int state)
 * takes to sleep on a request, on the order of a microsecond.
 */
 
+   if (req->ring->seqno_barrier)
+   req->ring->seqno_barrier(req->ring);
+
/* Only spin if we know the GPU is processing this request */
-   if (!i915_gem_request_started(req, false))
+   if (!i915_gem_request_started(req))
return -EAGAIN;
 
timeout = local_clock_us(&cpu) + 10;
-   while (!need_resched()) {
-   if (i915_gem_request_completed(req, true))
+   do {
+   if (i915_gem_request_completed(req))
return 0;
 
if (signal_pending_state(state, current))
@@ -1189,9 +1192,12 @@ static int __i915_spin_request(struct 
drm_i915_gem_request *req, int state)
break;
 
cpu_relax_lowlatency();
-   }
+   } while (!need_resched());
+
+   if (req->ring->seqno_barrier)
+   req->ring->seqno_barrier(req->ring);
 
-   if (i915_gem_request_completed(req, false))
+   if (i915_gem_request_completed(req))
return 0;
 
return -EAGAIN;
@@ -1227,7 +1233,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
if (list_empty(&req->list))
return 0;
 
-   if (i915_gem_request_completed(req, true))
+   if (i915_gem_request_completed(req))
return 0;
 
timeout_remain = 0;
@@ -1258,7 +1264,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
for (;;) {
prepare_to_wait(&req->wait, &wait, state);
 
-   if (i915_gem_request_completed(req, true) ||
+   if (i915_gem_request_completed(req) ||
req->reset_counter != 
i915_reset_counter(&req->i915->gpu_error)) {
ret = 0;
   

[Intel-gfx] [PATCH 03/15] drm/i915: Only spin whilst waiting on the current request

2015-11-29 Thread Chris Wilson
Limit busywaiting only to the request currently being processed by the
GPU. If the request is not currently being processed by the GPU, there
is a very low likelihood of it being completed within the 2 microsecond
spin timeout and so we will just be wasting CPU cycles.

v2: Check for logical inversion when rebasing - we were incorrectly
checking for this request being active, and instead busywaiting for
when the GPU was not yet processing the request of interest.

v3: Try another colour for the seqno names.
v4: Another colour for the function names.

Signed-off-by: Chris Wilson 
Cc: "Rogozhkin, Dmitry V" 
Cc: Daniel Vetter 
Cc: Tvrtko Ursulin 
Cc: Eero Tamminen 
Cc: "Rantala, Valtteri" 
Cc: sta...@kernel.vger.org
---
 drivers/gpu/drm/i915/i915_drv.h | 27 +++
 drivers/gpu/drm/i915/i915_gem.c |  8 +++-
 2 files changed, 26 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index bc865e234df2..d07041c1729d 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2182,8 +2182,17 @@ struct drm_i915_gem_request {
struct drm_i915_private *i915;
struct intel_engine_cs *ring;
 
-   /** GEM sequence number associated with this request. */
-   uint32_t seqno;
+/** GEM sequence number associated with the previous request,
+ * when the HWS breadcrumb is equal to this the GPU is processing
+ * this request.
+ */
+   u32 previous_seqno;
+
+/** GEM sequence number associated with this request,
+ * when the HWS breadcrumb is equal or greater than this the GPU
+ * has finished processing this request.
+ */
+   u32 seqno;
 
/** Position in the ringbuffer of the start of the request */
u32 head;
@@ -2945,15 +2954,17 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2)
return (int32_t)(seq1 - seq2) >= 0;
 }
 
+static inline bool i915_gem_request_started(struct drm_i915_gem_request *req,
+  bool lazy_coherency)
+{
+   u32 seqno = req->ring->get_seqno(req->ring, lazy_coherency);
+   return i915_seqno_passed(seqno, req->previous_seqno);
+}
+
 static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
  bool lazy_coherency)
 {
-   u32 seqno;
-
-   BUG_ON(req == NULL);
-
-   seqno = req->ring->get_seqno(req->ring, lazy_coherency);
-
+   u32 seqno = req->ring->get_seqno(req->ring, lazy_coherency);
return i915_seqno_passed(seqno, req->seqno);
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index bad112abb16b..ada461e02718 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1193,9 +1193,13 @@ static int __i915_spin_request(struct 
drm_i915_gem_request *req, int state)
 * takes to sleep on a request, on the order of a microsecond.
 */
 
-   if (i915_gem_request_get_ring(req)->irq_refcount)
+   if (req->ring->irq_refcount)
return -EBUSY;
 
+   /* Only spin if we know the GPU is processing this request */
+   if (!i915_gem_request_started(req, false))
+   return -EAGAIN;
+
timeout = local_clock_us(&cpu) + 10;
while (!need_resched()) {
if (i915_gem_request_completed(req, true))
@@ -1209,6 +1213,7 @@ static int __i915_spin_request(struct 
drm_i915_gem_request *req, int state)
 
cpu_relax_lowlatency();
}
+
if (i915_gem_request_completed(req, false))
return 0;
 
@@ -2592,6 +2597,7 @@ void __i915_add_request(struct drm_i915_gem_request 
*request,
request->batch_obj = obj;
 
request->emitted_jiffies = jiffies;
+   request->previous_seqno = ring->last_submitted_seqno;
ring->last_submitted_seqno = request->seqno;
list_add_tail(&request->list, &ring->request_list);
 
-- 
2.6.2

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 14/15] drm/i915: Only query timestamp when measuring elapsed time

2015-11-29 Thread Chris Wilson
Avoid the two calls to ktime_get_raw_ns() (at best it reads the TSC) as
we only need to compute the elapsed time for a timed wait.

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/i915_gem.c | 13 +
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 2c3e36e19cb0..871201713c73 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1227,7 +1227,6 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
int state = interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
DEFINE_WAIT(wait);
unsigned long timeout_remain;
-   s64 before, now;
int ret;
 
if (list_empty(&req->list))
@@ -1244,13 +1243,12 @@ int __i915_wait_request(struct drm_i915_gem_request 
*req,
if (*timeout == 0)
return -ETIME;
 
+   /* Record current time in case interrupted, or wedged */
timeout_remain = nsecs_to_jiffies_timeout(*timeout);
+   *timeout += ktime_get_raw_ns();
}
 
-   /* Record current time in case interrupted by signal, or wedged */
trace_i915_gem_request_wait_begin(req);
-   before = ktime_get_raw_ns();
-
if (INTEL_INFO(req->i915)->gen >= 6)
gen6_rps_boost(req->i915, rps, req->emitted_jiffies);
 
@@ -1286,14 +1284,13 @@ int __i915_wait_request(struct drm_i915_gem_request 
*req,
}
finish_wait(&req->wait, &wait);
 out:
-   now = ktime_get_raw_ns();
intel_breadcrumbs_remove_waiter(req);
trace_i915_gem_request_wait_end(req);
 
if (timeout) {
-   s64 tres = *timeout - (now - before);
-
-   *timeout = tres < 0 ? 0 : tres;
+   *timeout -= ktime_get_raw_ns();
+   if (*timeout < 0)
+   *timeout = 0;
 
/*
 * Apparently ktime isn't accurate enough and occasionally has a
-- 
2.6.2

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] i915_wait_request scaling

2015-11-29 Thread Chris Wilson
The first 3 patches are cc:stable for reducing the negative
side-effects of busywaiting. Following on from them, we have a set of
patches to prevent the thundering herd issue with multiple concurrent
waiters and to optimise the waiters. Since this has been flagged as
severely impacting some client loads, I've moved this forward of the
series to cut the execbuffer overheads.

Available as:
http://cgit.freedesktop.org/~ickle/linux-2.6/log/?h=breadcrumbs
-Chris

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx