date:20150701

Re: [Intel-gfx] [PATCH] drm/i915: apply the PCI_D0/D3 hibernation workaround everywhere on pre GEN6

2015-07-01 Thread shuang . he

Tested-By: Intel Graphics QA PRTS (Patch Regression Test System Contact: shuang...@intel.com)
Task id: 6683
-Summary-
Platform  Delta  drm-intel-nightly  Series Applied
ILK              302/302            302/302
SNB              312/316            312/316
IVB              343/343            343/343
BYT       -2     287/287            285/287
HSW              380/380            380/380
-Detailed-
Platform  Test                                         drm-intel-nightly  Series Applied
*BYT      igt@gem_partial_pwrite_pread@reads-display   PASS(1)            FAIL(1)
*BYT      igt@gem_tiled_partial_pwrite_pread@reads     PASS(1)            FAIL(1)
Note: Pay particular attention to lines starting with '*'.
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH 3/3] drm/i915: Read HDMI EDID only when required

2015-07-01 Thread Sharma, Shashank


Regards
Shashank

On 7/1/2015 6:26 PM, Daniel Vetter wrote:

On Tue, Jun 30, 2015 at 09:49:57PM +0530, Shashank Sharma wrote:

Userspace always sets force. Are you sure this actually improves anything?

Yes we do. We have had this code in commercial projects, and it is really
important for proper interrupt handling, as well as for avoiding race
conditions between multiple HDMI detects from the interrupt handler and
userspace detect calls. This is also a must for HDMI compliance.


There's no race, we have locks for this. And we already have a little bit
of edid caching, and your code won't add more caching for the force =
true case. Which is why I wondered whether you've really seen improvements
with this on latest upstream, and not just an older android tree.

This caching is not useful, as in every detect call we do an unset EDID
and then a set EDID again. The actual reason behind multiple HDMI EDID
reads is detect() getting called from user space. So every time there is
a detect call, there is an HDMI EDID read.


We should read EDID only on hotplug, cache it, and reuse it until hot 
unplug.



Actually the plan is to use this force for GEN < 6 HW only, where the
hotplug doesn't work reliably (I remember our last conversation on some old
HW which doesn't support HPD properly). For vlv+, we can (will) use only
the cached EDID.


Ok, once more: HDMI hpd is unreliable everywhere. I have a gen7 here which
is half-busted it seems, and we've found examples for every single
platform that supports hdmi out there. The problem isn't necessarily spec
compliant HDMI sinks, but all the other crap ppl like to plug in.

Yes this means we'll not be spec compliant, but if we have reality vs.
spec, reality wins. At least in upstream.


We have tested HPD with several HDMI monitors on VLV, CHV, SKL and now
BXT as well. We are getting reliable hotplugs and unplugs across these
platforms, with accurate information on long/short pulses. Can you
please give some details about what you are observing?


It's very important for the Android projects to comply with the spec due
to certification pressure from customers. And we can get a common path
for us, if we know what exactly the problem is. But for sure we can't
ignore the fact that compliance is essential.



Also the goal should be to keep things cached for a few calls from
userspace (since often it pokes a few times in a row unfortunately), for
which we need a proper timeout to clear the edid again.


Can you please let us know why? Why do we need to clear this EDID cache?
We should clear it only on the next hot-unplug, and maintain this cached
EDID for all userspace detect operations. I believe as long as we have the
state machine maintained, we need not clear it.


hotplug is not reliable, at least not outside of labs and validation
testers. And your code here throws the cached edid away every time force =
true is set, which is pretty much always. At least on upstream.

The only place where we don't set force is in the poll worker, and that's
only run when we have a hpd storm.
-Daniel

The new code doesn't throw away the cached EDID for platforms > GEN6,
but the current code does that in every detect call.

The current state machine is:
=============================
1. Hotplug -> unset EDID, read EDID, set EDID
2. All detect() calls -> unset EDID, read EDID, set EDID
3. Hot unplug -> unset EDID

The state machine I am suggesting is:
=====================================
1. Hotplug -> throw away cached EDID, cache a new one by probing DDC
2. All detect() calls -> use cached EDID only
3. Hot unplug -> throw away cached EDID

OR, if you want to support some old platforms:
==============================================
1. Hotplug -> throw away cached EDID, cache a new one by probing DDC
2. All detect() calls -> (support old HW with unstable HPD)
       if (gen > GEN6)
               use cached EDID only
       else
               probe DDC and read EDID, update cache
3. Hot unplug -> throw away cached EDID
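
The last variant of the proposed state machine can be sketched in standalone C roughly as follows. This is only an illustration of the proposal, not i915 code: struct hdmi_conn, probe_ddc(), on_hotplug(), on_detect(), on_hot_unplug() and the ddc_reads counter are all hypothetical names, and the generation test simply mirrors the "gen > GEN6" condition above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical connector model; none of these names exist in i915. */
struct hdmi_conn {
	int gen;            /* hardware generation */
	bool sink_present;  /* whether a monitor is physically attached */
	bool edid_cached;   /* stands in for a cached EDID blob */
	int ddc_reads;      /* counts simulated DDC probes */
};

/* Simulated DDC probe: "reads" an EDID only if a sink is attached. */
static bool probe_ddc(struct hdmi_conn *c)
{
	assert(c);
	c->ddc_reads++;
	return c->sink_present;
}

/* 1. Hotplug: throw away the cached EDID, cache a new one by probing DDC. */
static void on_hotplug(struct hdmi_conn *c)
{
	c->edid_cached = false;
	c->edid_cached = probe_ddc(c);
}

/* 2. detect(): use the cache on gen > 6, re-probe on older hardware. */
static bool on_detect(struct hdmi_conn *c)
{
	if (c->gen > 6)
		return c->edid_cached;

	c->edid_cached = probe_ddc(c);
	return c->edid_cached;
}

/* 3. Hot unplug: throw away the cached EDID. */
static void on_hot_unplug(struct hdmi_conn *c)
{
	c->edid_cached = false;
}
```

Note that in this sketch a gen > 6 connector never touches DDC from on_detect(), so it relies entirely on hotplug and unplug events arriving, which is exactly the assumption Daniel questions above.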





-Daniel


Regards
Shashank

On Tue, Jun 30, 2015 at 4:36 PM, Daniel Vetter  wrote:


On Tue, Jun 30, 2015 at 11:13:58AM +0530, Sonika Jindal wrote:

From: Shashank Sharma 

This patch makes sure that the HDMI detect function
reads EDID only when it's forced to do so. All the other
times, it uses the connector->detect_edid which was cached
during hotplug handling in the hdmi_probe() function. As the
probe function gets called before detect in the interrupt handler
and handles the EDID caching part, it's absolutely safe to assume
that the presence of an EDID means a monitor is connected, and vice versa.

This will save us from many race conditions between hotplug/unplug
detect call handler threads and userspace calls for the same.
The previous patch in this patch series explains this in detail.

Signed-off-by: Shashank Sharma 
---
  drivers/gpu/drm/i915/intel_hdmi.c |   26 --
  1 file changed, 20 insertions(+), 6 deletions(-)

diff -

Re: [Intel-gfx] [PATCH] drm/i915: Clear pipe's pll hw state in hsw_dp_set_ddi_pll_sel()

2015-07-01 Thread shuang . he

Tested-By: Intel Graphics QA PRTS (Patch Regression Test System Contact: shuang...@intel.com)
Task id: 6681
-Summary-
Platform  Delta  drm-intel-nightly  Series Applied
ILK              302/302            302/302
SNB              312/316            312/316
IVB              343/343            343/343
BYT       -2     287/287            285/287
HSW              380/380            380/380
-Detailed-
Platform  Test                                       drm-intel-nightly  Series Applied
*BYT      igt@gem_partial_pwrite_pread@reads         PASS(1)            FAIL(1)
*BYT      igt@gem_tiled_partial_pwrite_pread@reads   PASS(1)            FAIL(1)
Note: Pay particular attention to lines starting with '*'.

[Intel-gfx] [RFC 4/8] drm/i915: Refactor ilk_update_wm (v3)

2015-07-01 Thread Matt Roper

From: Ville Syrjälä 

Split ilk_update_wm() into two parts; one doing the programming
and the other the calculations.

v2: Fix typo in commit message

v3 (by Matt): Heavily rebased for current codebase.

Reviewed-by(v2): Paulo Zanoni 
Signed-off-by: Ville Syrjälä 
Signed-off-by: Matt Roper 
---
 drivers/gpu/drm/i915/intel_pm.c | 60 ++---
 1 file changed, 33 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index d061fcd..44e361c 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -3708,37 +3708,14 @@ skl_update_sprite_wm(struct drm_plane *plane, struct drm_crtc *crtc,
skl_update_wm(crtc);
 }
 
-static void ilk_update_wm(struct drm_crtc *crtc)
+static void ilk_program_watermarks(struct drm_i915_private *dev_priv)
 {
-   struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
-   struct intel_crtc_state *cstate = to_intel_crtc_state(crtc->state);
-   struct drm_device *dev = crtc->dev;
-   struct drm_i915_private *dev_priv = dev->dev_private;
+   struct drm_device *dev = dev_priv->dev;
+   struct intel_pipe_wm lp_wm_1_2 = {}, lp_wm_5_6 = {}, *best_lp_wm;
struct ilk_wm_maximums max;
+   struct intel_wm_config config = {};
struct ilk_wm_values results = {};
enum intel_ddb_partitioning partitioning;
-   struct intel_pipe_wm pipe_wm = {};
-   struct intel_pipe_wm lp_wm_1_2 = {}, lp_wm_5_6 = {}, *best_lp_wm;
-   struct intel_wm_config config = {};
-
-   /*
-* IVB workaround: must disable low power watermarks for at least
-* one frame before enabling scaling.  LP watermarks can be re-enabled
-* when scaling is disabled.
-*
-* WaCxSRDisabledForSpriteScaling:ivb
-*/
-   if (cstate->disable_lp_wm) {
-   ilk_disable_lp_wm(dev);
-   intel_wait_for_vblank(dev, intel_crtc->pipe);
-   }
-
-   intel_compute_pipe_wm(cstate, &pipe_wm);
-
-   if (!memcmp(&intel_crtc->wm.active, &pipe_wm, sizeof(pipe_wm)))
-   return;
-
-   intel_crtc->wm.active = pipe_wm;
 
ilk_compute_wm_config(dev, &config);
 
@@ -3764,6 +3741,35 @@ static void ilk_update_wm(struct drm_crtc *crtc)
ilk_write_wm_values(dev_priv, &results);
 }
 
+static void ilk_update_wm(struct drm_crtc *crtc)
+{
+   struct drm_i915_private *dev_priv = to_i915(crtc->dev);
+   struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
+   struct intel_crtc_state *cstate = to_intel_crtc_state(crtc->state);
+   struct intel_pipe_wm pipe_wm = {};
+
+   /*
+* IVB workaround: must disable low power watermarks for at least
+* one frame before enabling scaling.  LP watermarks can be re-enabled
+* when scaling is disabled.
+*
+* WaCxSRDisabledForSpriteScaling:ivb
+*/
+   if (cstate->disable_lp_wm) {
+   ilk_disable_lp_wm(crtc->dev);
+   intel_wait_for_vblank(crtc->dev, intel_crtc->pipe);
+   }
+
+   intel_compute_pipe_wm(cstate, &pipe_wm);
+
+   if (!memcmp(&intel_crtc->wm.active, &pipe_wm, sizeof(pipe_wm)))
+   return;
+
+   intel_crtc->wm.active = pipe_wm;
+
+   ilk_program_watermarks(dev_priv);
+}
+
 static void skl_pipe_wm_active_state(uint32_t val,
 struct skl_pipe_wm *active,
 bool is_transwm,
-- 
2.1.4
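
For readers skimming the diff, the shape of the split (a per-pipe calculation step that returns early when nothing changed, followed by a separate programming step) can be sketched in standalone C. This is a simplified model under stated assumptions: struct pipe_wm, struct device_state, compute_pipe_wm(), program_watermarks() and update_wm() are hypothetical stand-ins, and the watermark arithmetic is made up purely so that different inputs produce different values.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for struct intel_pipe_wm. */
struct pipe_wm {
	int level[5];
};

struct device_state {
	struct pipe_wm active;  /* last watermarks handed to the programming step */
	int programs;           /* counts calls to program_watermarks() */
};

/* Calculation half: fills *out from the (made-up) input. */
static void compute_pipe_wm(int pixel_rate, struct pipe_wm *out)
{
	/* Zero the whole struct first so that memcmp() below is meaningful. */
	memset(out, 0, sizeof(*out));
	for (int i = 0; i < 5; i++)
		out->level[i] = pixel_rate / (i + 1);
}

/* Programming half: here it only counts how often it ran. */
static void program_watermarks(struct device_state *dev)
{
	dev->programs++;
}

/* Mirrors the structure of the refactored ilk_update_wm() in the patch. */
static void update_wm(struct device_state *dev, int pixel_rate)
{
	struct pipe_wm wm;

	assert(dev);
	compute_pipe_wm(pixel_rate, &wm);

	/* Nothing changed: skip the programming step entirely. */
	if (!memcmp(&dev->active, &wm, sizeof(wm)))
		return;

	dev->active = wm;
	program_watermarks(dev);
}
```

Keeping the programming step free of per-CRTC arguments is what lets later patches call it from other places, for example from deferred work.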


[Intel-gfx] [RFC 3/8] drm/i915/ivb: Move WaCxSRDisabledForSpriteScaling w/a to atomic check

2015-07-01 Thread Matt Roper

Determine whether we need to apply this workaround at atomic check time
and just set a flag that will be used by the main watermark update
routine.

Moving this workaround into the atomic framework reduces
ilk_update_sprite_wm() to just a standard watermark update, so drop it
completely and just ensure that ilk_update_wm() is called whenever a
sprite plane is updated in a way that would affect watermarks.

Signed-off-by: Matt Roper 
---
 drivers/gpu/drm/i915/intel_atomic.c  |  1 +
 drivers/gpu/drm/i915/intel_display.c | 40 +---
 drivers/gpu/drm/i915/intel_drv.h |  3 +++
 drivers/gpu/drm/i915/intel_pm.c  | 35 +++
 drivers/gpu/drm/i915/intel_sprite.c  |  8 
 5 files changed, 49 insertions(+), 38 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_atomic.c b/drivers/gpu/drm/i915/intel_atomic.c
index 0aeced8..6673516 100644
--- a/drivers/gpu/drm/i915/intel_atomic.c
+++ b/drivers/gpu/drm/i915/intel_atomic.c
@@ -230,6 +230,7 @@ intel_crtc_duplicate_state(struct drm_crtc *crtc)
__drm_atomic_helper_crtc_duplicate_state(crtc, &crtc_state->base);
 
crtc_state->base.crtc = crtc;
+   crtc_state->disable_lp_wm = false;
 
return &crtc_state->base;
 }
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 5de1ded..46ef981 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -11560,23 +11560,38 @@ retry:
 static bool intel_wm_need_update(struct drm_plane *plane,
 struct drm_plane_state *state)
 {
-   /* Update watermarks on tiling changes. */
+   struct intel_plane_state *new = to_intel_plane_state(state);
+   struct intel_plane_state *cur = to_intel_plane_state(plane->state);
+
+   /* Update watermarks on tiling or size changes. */
if (!plane->state->fb || !state->fb ||
plane->state->fb->modifier[0] != state->fb->modifier[0] ||
-   plane->state->rotation != state->rotation)
-   return true;
-
-   if (plane->state->crtc_w != state->crtc_w)
+   plane->state->rotation != state->rotation ||
+   drm_rect_width(&new->src) != drm_rect_width(&cur->src) ||
+   drm_rect_height(&new->src) != drm_rect_height(&cur->src) ||
+   drm_rect_width(&new->dst) != drm_rect_width(&cur->dst) ||
+   drm_rect_height(&new->dst) != drm_rect_height(&cur->dst))
return true;
 
return false;
 }
 
+static bool needs_scaling(struct intel_plane_state *state)
+{
+   int src_w = drm_rect_width(&state->src) >> 16;
+   int src_h = drm_rect_height(&state->src) >> 16;
+   int dst_w = drm_rect_width(&state->dst);
+   int dst_h = drm_rect_height(&state->dst);
+
+   return (src_w != dst_w || src_h != dst_h);
+}
+
 int intel_plane_atomic_calc_changes(struct drm_crtc_state *crtc_state,
struct drm_plane_state *plane_state)
 {
struct drm_crtc *crtc = crtc_state->crtc;
struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
+   struct intel_crtc_state *cstate = to_intel_crtc_state(crtc_state);
struct drm_plane *plane = plane_state->plane;
struct drm_device *dev = crtc->dev;
struct drm_i915_private *dev_priv = dev->dev_private;
@@ -11587,7 +11602,6 @@ int intel_plane_atomic_calc_changes(struct drm_crtc_state *crtc_state,
bool mode_changed = needs_modeset(crtc_state);
bool was_crtc_enabled = crtc->state->active;
bool is_crtc_enabled = crtc_state->active;
-
bool turn_off, turn_on, visible, was_visible;
struct drm_framebuffer *fb = plane_state->fb;
 
@@ -11705,11 +11719,23 @@ int intel_plane_atomic_calc_changes(struct drm_crtc_state *crtc_state,
case DRM_PLANE_TYPE_CURSOR:
break;
case DRM_PLANE_TYPE_OVERLAY:
-   if (turn_off && !mode_changed) {
+   /*
+* WaCxSRDisabledForSpriteScaling:ivb
+*
+* atomic.update_wm was already set above, so this flag will
+* take effect when we commit and program watermarks.
+*/
+   if (IS_IVYBRIDGE(dev) &&
+   needs_scaling(to_intel_plane_state(plane_state)) &&
+   !needs_scaling(old_plane_state)) {
+   cstate->disable_lp_wm = true;
+   } else if (turn_off && !mode_changed) {
intel_crtc->atomic.wait_vblank = true;
intel_crtc->atomic.update_sprite_watermarks |=
1 << i;
}
+
+   break;
}
return 0;
 }
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 9ffacc0..cdc7d6d 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -460,6 +460,9 @@ struct intel_crtc_state {
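
The scaling test added in intel_display.c compares a source rectangle held in 16.16 fixed point against a destination rectangle held in whole pixels, and the workaround flag is only set on the transition from unscaled to scaled. A minimal standalone sketch of that logic follows. struct rect, struct plane_state, rect_width(), rect_height() and must_disable_lp_wm() are hypothetical stand-ins for the DRM types and helpers, and the IS_IVYBRIDGE() platform check from the patch is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct drm_rect. */
struct rect {
	int x1, y1, x2, y2;
};

struct plane_state {
	struct rect src;  /* 16.16 fixed point, as for DRM plane sources */
	struct rect dst;  /* whole pixels */
};

static int rect_width(const struct rect *r)
{
	return r->x2 - r->x1;
}

static int rect_height(const struct rect *r)
{
	return r->y2 - r->y1;
}

/* Same comparison as needs_scaling() in the patch. */
static bool needs_scaling(const struct plane_state *s)
{
	int src_w = rect_width(&s->src) >> 16;   /* drop the fractional bits */
	int src_h = rect_height(&s->src) >> 16;
	int dst_w = rect_width(&s->dst);
	int dst_h = rect_height(&s->dst);

	return src_w != dst_w || src_h != dst_h;
}

/* The flag is only needed when scaling is being turned on. */
static bool must_disable_lp_wm(const struct plane_state *old_state,
			       const struct plane_state *new_state)
{
	assert(old_state && new_state);
	return needs_scaling(new_state) && !needs_scaling(old_state);
}
```

Because the fractional bits are discarded before comparing, a sub-pixel difference between source and destination sizes does not count as scaling in this sketch.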

[Intel-gfx] [RFC 6/8] drm/i915: Calculate ILK-style watermarks during atomic check (v2)

2015-07-01 Thread Matt Roper

Calculate pipe watermarks during atomic calculation phase, based on the
contents of the atomic transaction's state structure.  We still program
the watermarks at the same time we did before, but the computation now
happens much earlier.

While this patch isn't too exciting by itself, it paves the way for
future patches.  The eventual goal (which will be realized in future
patches in this series) is to calculate multiple sets of watermark
values up front, and then program them at different times (pre- vs
post-vblank) on the platforms that need a two-step watermark update.

While we're at it, s/intel_compute_pipe_wm/ilk_compute_pipe_wm/ since
this function only applies to ILK-style watermarks and we have a
completely different function for SKL-style watermarks.

Note that the original code had a memcmp() in ilk_update_wm() to avoid
calling ilk_program_watermarks() if the watermarks hadn't changed.  This
memcmp vanishes here, which means we may do some unnecessary result
generation and merging in cases where watermarks didn't change, but the
lower-level function ilk_write_wm_values already makes sure that we
don't actually try to program the watermark registers again.

v2: Squash a few commits from the original series together; no longer
leave pre-calculated wm's in a separate temporary structure since
it's easier to follow the logic if we just cut over to using the
pre-calculated values directly.

Signed-off-by: Matt Roper 
---
 drivers/gpu/drm/i915/i915_drv.h  |  2 +
 drivers/gpu/drm/i915/intel_display.c |  6 +++
 drivers/gpu/drm/i915/intel_drv.h |  2 +
 drivers/gpu/drm/i915/intel_pm.c  | 87 ++--
 4 files changed, 53 insertions(+), 44 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 6aa8083..2774976 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -621,6 +621,8 @@ struct drm_i915_display_funcs {
  int target, int refclk,
  struct dpll *match_clock,
  struct dpll *best_clock);
+   int (*compute_pipe_wm)(struct drm_crtc *crtc,
+  struct drm_atomic_state *state);
void (*update_wm)(struct drm_crtc *crtc);
void (*update_sprite_wm)(struct drm_plane *plane,
 struct drm_crtc *crtc,
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 36ae3f7..46b62cc 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -11857,6 +11857,12 @@ static int intel_crtc_atomic_check(struct drm_crtc *crtc,
return ret;
}
 
+   if (dev_priv->display.compute_pipe_wm) {
+   ret = dev_priv->display.compute_pipe_wm(crtc, state);
+   if (ret)
+   return ret;
+   }
+
return intel_atomic_setup_scalers(dev, intel_crtc, pipe_config);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index c23cf7d..335b24b 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -1445,6 +1445,8 @@ intel_atomic_get_crtc_state(struct drm_atomic_state *state,
 int intel_atomic_setup_scalers(struct drm_device *dev,
struct intel_crtc *intel_crtc,
struct intel_crtc_state *crtc_state);
+int intel_check_crtc(struct drm_crtc *crtc,
+struct drm_crtc_state *state);
 
 /* intel_atomic_plane.c */
 struct intel_plane_state *intel_create_plane_state(struct drm_plane *plane);
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 0e28806..c6e735f 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -2039,9 +2039,11 @@ static void ilk_compute_wm_level(const struct drm_i915_private *dev_priv,
 const struct intel_crtc *intel_crtc,
 int level,
 struct intel_crtc_state *cstate,
+struct intel_plane_state *pristate,
+struct intel_plane_state *sprstate,
+struct intel_plane_state *curstate,
 struct intel_wm_level *result)
 {
-   struct intel_plane *intel_plane;
uint16_t pri_latency = dev_priv->wm.pri_latency[level];
uint16_t spr_latency = dev_priv->wm.spr_latency[level];
uint16_t cur_latency = dev_priv->wm.cur_latency[level];
@@ -2053,29 +2055,11 @@ static void ilk_compute_wm_level(const struct drm_i915_private *dev_priv,
cur_latency *= 5;
}
 
-   for_each_intel_plane_on_crtc(dev_priv->dev, intel_crtc, intel_plane) {
-   struct intel_plane_state *pstate =
-   to_intel_plane_state(intel_plane->base.state);
-
-   switch (intel_plane->base.type) {
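
The new compute_pipe_wm entry in the display vtable is optional: the atomic check path calls it only when a platform has filled it in, and propagates any error. That pattern, reduced to standalone C, looks roughly like the sketch below. struct display_funcs here is a cut-down hypothetical version with a simplified signature, and reject_pipe_1() is an invented callback used only to show error propagation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Cut-down, hypothetical vtable; the real hook takes a CRTC and atomic state. */
struct display_funcs {
	int (*compute_pipe_wm)(int pipe);  /* NULL on platforms not yet converted */
};

/* Invented callback: pretends watermarks cannot be satisfied on pipe 1. */
static int reject_pipe_1(int pipe)
{
	return pipe == 1 ? -EINVAL : 0;
}

/* Mirrors the shape of the check added to intel_crtc_atomic_check(). */
static int crtc_atomic_check(const struct display_funcs *funcs, int pipe)
{
	assert(funcs);

	if (funcs->compute_pipe_wm) {
		int ret = funcs->compute_pipe_wm(pipe);

		if (ret)
			return ret;  /* fail the check before anything is committed */
	}

	return 0;
}
```

Doing the calculation in the check phase means an impossible configuration can be rejected up front instead of being discovered while programming hardware.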

[Intel-gfx] [RFC 7/8] drm/i915: Allow final wm programming to be scheduled after next vblank (v2)

2015-07-01 Thread Matt Roper

Add a simple mechanism to trigger final watermark updates in an
asynchronous manner once the next vblank occurs.  No platform types
actually support atomic watermark programming until a future patch, so
there should be no functional change yet; individual platforms will be
converted to use this mechanism one-by-one in future patches.

Note that we'll probably expand this to cover other post-vblank async
tasks (like unpinning) at some point in the future.

v2: Much simpler vblank mechanism than was used in the previous series;
no need to allocate new heap structures.

Signed-off-by: Matt Roper 
---
 drivers/gpu/drm/i915/i915_drv.h  |  7 +++
 drivers/gpu/drm/i915/i915_irq.c  |  9 +
 drivers/gpu/drm/i915/intel_display.c | 30 ++
 drivers/gpu/drm/i915/intel_drv.h |  4 
 4 files changed, 46 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 2774976..5ad942e 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -628,6 +628,7 @@ struct drm_i915_display_funcs {
 struct drm_crtc *crtc,
 uint32_t sprite_width, uint32_t sprite_height,
 int pixel_size, bool enable, bool scaled);
+   void (*program_watermarks)(struct drm_i915_private *dev_priv);
int (*modeset_calc_cdclk)(struct drm_atomic_state *state);
void (*modeset_commit_cdclk)(struct drm_atomic_state *state);
/* Returns the active state of the crtc, and if the crtc is active,
@@ -2567,6 +2568,12 @@ struct drm_i915_cmd_table {
 #define HAS_L3_DPF(dev) (IS_IVYBRIDGE(dev) || IS_HASWELL(dev))
 #define NUM_L3_SLICES(dev) (IS_HSW_GT3(dev) ? 2 : HAS_L3_DPF(dev))
 
+/*
+ * FIXME:  Not all platforms have been transitioned to atomic watermark
+ * updates yet.
+ */
+#define HAS_ATOMIC_WM(dev_priv) (dev_priv->display.program_watermarks != NULL)
+
 #define GT_FREQUENCY_MULTIPLIER 50
 #define GEN9_FREQ_SCALER 3
 
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index a6fbe64..20c7260 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1452,6 +1452,15 @@ static void gen6_rps_irq_handler(struct drm_i915_private *dev_priv, u32 pm_iir)
 
 static bool intel_pipe_handle_vblank(struct drm_device *dev, enum pipe pipe)
 {
+   struct drm_i915_private *dev_priv = to_i915(dev);
+   struct drm_crtc *crtc = dev_priv->pipe_to_crtc_mapping[pipe];
+   struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
+
+   if (intel_crtc->need_vblank_wm_update) {
+   queue_work(dev_priv->wq, &intel_crtc->wm_work);
+   intel_crtc->need_vblank_wm_update = false;
+   }
+
if (!drm_handle_vblank(dev, pipe))
return false;
 
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 46b62cc..fa4373e 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -4737,6 +4737,10 @@ static void intel_post_plane_update(struct intel_crtc *crtc)
struct drm_device *dev = crtc->base.dev;
struct drm_plane *plane;
 
+   if (HAS_ATOMIC_WM(to_i915(dev)))
+   /* vblank handler will kick off workqueue task to update wm's */
+   crtc->need_vblank_wm_update = true;
+
if (atomic->wait_vblank)
intel_wait_for_vblank(dev, crtc->pipe);
 
@@ -4745,7 +4749,7 @@ static void intel_post_plane_update(struct intel_crtc *crtc)
if (atomic->disable_cxsr)
cstate->wm.cxsr_allowed = true;
 
-   if (crtc->atomic.update_wm_post)
+   if (!HAS_ATOMIC_WM(to_i915(dev)) && crtc->atomic.update_wm_post)
intel_update_watermarks(&crtc->base);
 
if (atomic->update_fbc) {
@@ -4757,9 +4761,10 @@ static void intel_post_plane_update(struct intel_crtc *crtc)
if (atomic->post_enable_primary)
intel_post_enable_primary(&crtc->base);
 
-   drm_for_each_plane_mask(plane, dev, atomic->update_sprite_watermarks)
-   intel_update_sprite_watermarks(plane, &crtc->base,
-  0, 0, 0, false, false);
+   if (!HAS_ATOMIC_WM(to_i915(dev)))
+   drm_for_each_plane_mask(plane, dev, atomic->update_sprite_watermarks)
+   intel_update_sprite_watermarks(plane, &crtc->base,
+  0, 0, 0, false, false);
 
memset(atomic, 0, sizeof(*atomic));
 }
@@ -14070,6 +14075,21 @@ static void skl_init_scalers(struct drm_device *dev, struct intel_crtc *intel_cr
scaler_state->scaler_id = -1;
 }
 
+/* FIXME: This may expand to cover other tasks like unpinning in the future... */
+static void wm_work_func(struct work_struct *work)
+{
+   struct intel_crtc *intel_crtc =
+   container_of(work, struct intel_cr
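
The mechanism in this patch is a three-step handoff: the plane update path sets a flag, the vblank interrupt handler sees the flag and queues work, and the work item later performs the final watermark programming. A standalone model of that handoff is sketched below. It is not kernel code: struct crtc, post_plane_update(), handle_vblank() and run_pending_work() are hypothetical, and a plain boolean stands in for queue_work() and the workqueue.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the per-CRTC bookkeeping. */
struct crtc {
	bool need_vblank_wm_update;  /* set by the plane update path */
	bool work_queued;            /* stands in for a queued work item */
	int wm_updates;              /* how many times the deferred work ran */
};

/* Commit path: request a watermark update after the next vblank. */
static void post_plane_update(struct crtc *c)
{
	assert(c);
	c->need_vblank_wm_update = true;
}

/* Interrupt path: hand off to the worker and clear the request. */
static void handle_vblank(struct crtc *c)
{
	if (c->need_vblank_wm_update) {
		c->work_queued = true;  /* the patch calls queue_work() here */
		c->need_vblank_wm_update = false;
	}
}

/* Worker path: runs outside interrupt context in the real driver. */
static void run_pending_work(struct crtc *c)
{
	if (!c->work_queued)
		return;

	c->work_queued = false;
	c->wm_updates++;  /* the final watermark programming would happen here */
}
```

Since the flag is cleared in the handler, each request produces at most one deferred update, and vblanks with no pending request do nothing extra.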

[Intel-gfx] [RFC 2/8] drm/i915: Eliminate usage of pipe_wm_parameters from ILK-style WM

2015-07-01 Thread Matt Roper

Just pull the info out of the CRTC state structure rather than staging
it in an additional structure.

Signed-off-by: Matt Roper 
---
 drivers/gpu/drm/i915/intel_pm.c | 84 ++---
 1 file changed, 28 insertions(+), 56 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 2f38070..a639cc9 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -1774,12 +1774,6 @@ struct skl_pipe_wm_parameters {
struct intel_plane_wm_parameters cursor;
 };
 
-struct ilk_pipe_wm_parameters {
-   bool active;
-   uint32_t pipe_htotal;
-   uint32_t pixel_rate;
-};
-
 struct ilk_wm_maximums {
uint16_t pri;
uint16_t spr;
@@ -1798,7 +1792,7 @@ struct intel_wm_config {
  * For both WM_PIPE and WM_LP.
  * mem_value must be in 0.1us units.
  */
-static uint32_t ilk_compute_pri_wm(const struct ilk_pipe_wm_parameters *params,
+static uint32_t ilk_compute_pri_wm(const struct intel_crtc_state *cstate,
   const struct intel_plane_state *pstate,
   uint32_t mem_value,
   bool is_lp)
@@ -1806,16 +1800,16 @@ static uint32_t ilk_compute_pri_wm(const struct ilk_pipe_wm_parameters *params,
int bpp = pstate->base.fb ? pstate->base.fb->bits_per_pixel / 8 : 0;
uint32_t method1, method2;
 
-   if (!params->active || !pstate->visible)
+   if (!cstate->base.active || !pstate->visible)
return 0;
 
-   method1 = ilk_wm_method1(params->pixel_rate, bpp, mem_value);
+   method1 = ilk_wm_method1(ilk_pipe_pixel_rate(cstate), bpp, mem_value);
 
if (!is_lp)
return method1;
 
-   method2 = ilk_wm_method2(params->pixel_rate,
-params->pipe_htotal,
+   method2 = ilk_wm_method2(ilk_pipe_pixel_rate(cstate),
+cstate->base.adjusted_mode.crtc_htotal,
 drm_rect_width(&pstate->dst),
 bpp,
 mem_value);
@@ -1827,19 +1821,19 @@ static uint32_t ilk_compute_pri_wm(const struct ilk_pipe_wm_parameters *params,
  * For both WM_PIPE and WM_LP.
  * mem_value must be in 0.1us units.
  */
-static uint32_t ilk_compute_spr_wm(const struct ilk_pipe_wm_parameters *params,
+static uint32_t ilk_compute_spr_wm(const struct intel_crtc_state *cstate,
   const struct intel_plane_state *pstate,
   uint32_t mem_value)
 {
int bpp = pstate->base.fb ? pstate->base.fb->bits_per_pixel / 8 : 0;
uint32_t method1, method2;
 
-   if (!params->active || !pstate->visible)
+   if (!cstate->base.active || !pstate->visible)
return 0;
 
-   method1 = ilk_wm_method1(params->pixel_rate, bpp, mem_value);
-   method2 = ilk_wm_method2(params->pixel_rate,
-params->pipe_htotal,
+   method1 = ilk_wm_method1(ilk_pipe_pixel_rate(cstate), bpp, mem_value);
+   method2 = ilk_wm_method2(ilk_pipe_pixel_rate(cstate),
+cstate->base.adjusted_mode.crtc_htotal,
 drm_rect_width(&pstate->dst),
 bpp,
 mem_value);
@@ -1850,30 +1844,30 @@ static uint32_t ilk_compute_spr_wm(const struct ilk_pipe_wm_parameters *params,
  * For both WM_PIPE and WM_LP.
  * mem_value must be in 0.1us units.
  */
-static uint32_t ilk_compute_cur_wm(const struct ilk_pipe_wm_parameters *params,
+static uint32_t ilk_compute_cur_wm(const struct intel_crtc_state *cstate,
   const struct intel_plane_state *pstate,
   uint32_t mem_value)
 {
int bpp = pstate->base.fb ? pstate->base.fb->bits_per_pixel / 8 : 0;
 
-   if (!params->active || !pstate->visible)
+   if (!cstate->base.active || !pstate->visible)
return 0;
 
-   return ilk_wm_method2(params->pixel_rate,
- params->pipe_htotal,
+   return ilk_wm_method2(ilk_pipe_pixel_rate(cstate),
+ cstate->base.adjusted_mode.crtc_htotal,
  drm_rect_width(&pstate->dst),
  bpp,
  mem_value);
 }
 
 /* Only for WM_LP. */
-static uint32_t ilk_compute_fbc_wm(const struct ilk_pipe_wm_parameters *params,
+static uint32_t ilk_compute_fbc_wm(const struct intel_crtc_state *cstate,
   const struct intel_plane_state *pstate,
   uint32_t pri_val)
 {
int bpp = pstate->base.fb ? pstate->base.fb->bits_per_pixel / 8 : 0;
 
-   if (!params->active || !pstate->visible)
+   if (!cstate->base.active || !pstate->visible)
return 0;
 
return ilk_wm_fbc(pri

[Intel-gfx] [RFC 1/8] drm/i915: Eliminate usage of plane_wm_parameters from ILK-style WM code

2015-07-01 Thread Matt Roper

Just pull the info out of the plane state structure rather than staging
it in an additional structure.

Signed-off-by: Matt Roper 
---
 drivers/gpu/drm/i915/intel_pm.c | 133 +---
 1 file changed, 70 insertions(+), 63 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 6eb5d76..2f38070 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -1778,9 +1778,6 @@ struct ilk_pipe_wm_parameters {
bool active;
uint32_t pipe_htotal;
uint32_t pixel_rate;
-   struct intel_plane_wm_parameters pri;
-   struct intel_plane_wm_parameters spr;
-   struct intel_plane_wm_parameters cur;
 };
 
 struct ilk_wm_maximums {
@@ -1802,25 +1799,25 @@ struct intel_wm_config {
  * mem_value must be in 0.1us units.
  */
 static uint32_t ilk_compute_pri_wm(const struct ilk_pipe_wm_parameters *params,
+  const struct intel_plane_state *pstate,
   uint32_t mem_value,
   bool is_lp)
 {
+   int bpp = pstate->base.fb ? pstate->base.fb->bits_per_pixel / 8 : 0;
uint32_t method1, method2;
 
-   if (!params->active || !params->pri.enabled)
+   if (!params->active || !pstate->visible)
return 0;
 
-   method1 = ilk_wm_method1(params->pixel_rate,
-params->pri.bytes_per_pixel,
-mem_value);
+   method1 = ilk_wm_method1(params->pixel_rate, bpp, mem_value);
 
if (!is_lp)
return method1;
 
method2 = ilk_wm_method2(params->pixel_rate,
 params->pipe_htotal,
-params->pri.horiz_pixels,
-params->pri.bytes_per_pixel,
+drm_rect_width(&pstate->dst),
+bpp,
 mem_value);
 
return min(method1, method2);
@@ -1831,20 +1828,20 @@ static uint32_t ilk_compute_pri_wm(const struct ilk_pipe_wm_parameters *params,
  * mem_value must be in 0.1us units.
  */
 static uint32_t ilk_compute_spr_wm(const struct ilk_pipe_wm_parameters *params,
+  const struct intel_plane_state *pstate,
   uint32_t mem_value)
 {
+   int bpp = pstate->base.fb ? pstate->base.fb->bits_per_pixel / 8 : 0;
uint32_t method1, method2;
 
-   if (!params->active || !params->spr.enabled)
+   if (!params->active || !pstate->visible)
return 0;
 
-   method1 = ilk_wm_method1(params->pixel_rate,
-params->spr.bytes_per_pixel,
-mem_value);
+   method1 = ilk_wm_method1(params->pixel_rate, bpp, mem_value);
method2 = ilk_wm_method2(params->pixel_rate,
 params->pipe_htotal,
-params->spr.horiz_pixels,
-params->spr.bytes_per_pixel,
+drm_rect_width(&pstate->dst),
+bpp,
 mem_value);
return min(method1, method2);
 }
@@ -1854,28 +1851,32 @@ static uint32_t ilk_compute_spr_wm(const struct ilk_pipe_wm_parameters *params,
  * mem_value must be in 0.1us units.
  */
 static uint32_t ilk_compute_cur_wm(const struct ilk_pipe_wm_parameters *params,
+  const struct intel_plane_state *pstate,
   uint32_t mem_value)
 {
-   if (!params->active || !params->cur.enabled)
+   int bpp = pstate->base.fb ? pstate->base.fb->bits_per_pixel / 8 : 0;
+
+   if (!params->active || !pstate->visible)
return 0;
 
return ilk_wm_method2(params->pixel_rate,
  params->pipe_htotal,
- params->cur.horiz_pixels,
- params->cur.bytes_per_pixel,
+ drm_rect_width(&pstate->dst),
+ bpp,
  mem_value);
 }
 
 /* Only for WM_LP. */
 static uint32_t ilk_compute_fbc_wm(const struct ilk_pipe_wm_parameters *params,
+  const struct intel_plane_state *pstate,
   uint32_t pri_val)
 {
-   if (!params->active || !params->pri.enabled)
+   int bpp = pstate->base.fb ? pstate->base.fb->bits_per_pixel / 8 : 0;
+
+   if (!params->active || !pstate->visible)
return 0;
 
-   return ilk_wm_fbc(pri_val,
- params->pri.horiz_pixels,
- params->pri.bytes_per_pixel);
+   return ilk_wm_fbc(pri_val, drm_rect_width(&pstate->dst), bpp);
 }
 
 static unsigned int ilk_display_fifo_size(const struct drm_device *dev)
@@ -2040,10 +2041,12 @@ static

[Intel-gfx] [RFC 5/8] drm/i915: Move active watermarks into CRTC state

2015-07-01 Thread Matt Roper

Since we allocate a few CRTC states on the stack, also switch the 'wm'
struct here to be a union so that we're not wasting stack space with
other platforms' watermark values.

Signed-off-by: Matt Roper 
---
 drivers/gpu/drm/i915/intel_display.c |  8 --
 drivers/gpu/drm/i915/intel_drv.h | 54 +++-
 drivers/gpu/drm/i915/intel_pm.c  | 34 ++-
 3 files changed, 55 insertions(+), 41 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 46ef981..36ae3f7 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -4733,6 +4733,7 @@ intel_pre_disable_primary(struct drm_crtc *crtc)
 static void intel_post_plane_update(struct intel_crtc *crtc)
 {
struct intel_crtc_atomic_commit *atomic = &crtc->atomic;
+   struct intel_crtc_state *cstate = to_intel_crtc_state(crtc->base.state);
struct drm_device *dev = crtc->base.dev;
struct drm_plane *plane;
 
@@ -4742,7 +4743,7 @@ static void intel_post_plane_update(struct intel_crtc *crtc)
intel_frontbuffer_flip(dev, atomic->fb_bits);
 
if (atomic->disable_cxsr)
-   crtc->wm.cxsr_allowed = true;
+   cstate->wm.cxsr_allowed = true;
 
if (crtc->atomic.update_wm_post)
intel_update_watermarks(&crtc->base);
@@ -4766,6 +4767,7 @@ static void intel_post_plane_update(struct intel_crtc *crtc)
 static void intel_pre_plane_update(struct intel_crtc *crtc)
 {
struct drm_device *dev = crtc->base.dev;
+   struct intel_crtc_state *cstate = to_intel_crtc_state(crtc->base.state);
struct drm_i915_private *dev_priv = dev->dev_private;
struct intel_crtc_atomic_commit *atomic = &crtc->atomic;
struct drm_plane *p;
@@ -4798,7 +4800,7 @@ static void intel_pre_plane_update(struct intel_crtc *crtc)
intel_pre_disable_primary(&crtc->base);
 
if (atomic->disable_cxsr) {
-   crtc->wm.cxsr_allowed = false;
+   cstate->wm.cxsr_allowed = false;
intel_set_memory_cxsr(dev_priv, false);
}
 }
@@ -14127,7 +14129,7 @@ static void intel_crtc_init(struct drm_device *dev, int pipe)
intel_crtc->cursor_cntl = ~0;
intel_crtc->cursor_size = ~0;
 
-   intel_crtc->wm.cxsr_allowed = true;
+   crtc_state->wm.cxsr_allowed = true;
 
BUG_ON(pipe >= ARRAY_SIZE(dev_priv->plane_to_crtc_mapping) ||
   dev_priv->plane_to_crtc_mapping[intel_crtc->plane] != NULL);
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index cdc7d6d..c23cf7d 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -328,6 +328,21 @@ struct intel_crtc_scaler_state {
int scaler_id;
 };
 
+struct intel_pipe_wm {
+   struct intel_wm_level wm[5];
+   uint32_t linetime;
+   bool fbc_wm_enabled;
+   bool pipe_enabled;
+   bool sprites_enabled;
+   bool sprites_scaled;
+};
+
+struct skl_pipe_wm {
+   struct skl_wm_level wm[8];
+   struct skl_wm_level trans_wm;
+   uint32_t linetime;
+};
+
 struct intel_crtc_state {
struct drm_crtc_state base;
 
@@ -463,6 +478,20 @@ struct intel_crtc_state {
 
/* IVB sprite scaling w/a (WaCxSRDisabledForSpriteScaling:ivb) */
bool disable_lp_wm;
+
+   struct {
+   /*
+* final watermarks, programmed post-vblank when this state
+* is committed
+*/
+   union {
+   struct intel_pipe_wm ilk;
+   struct skl_pipe_wm skl;
+   } active;
+
+   /* allow CxSR on this pipe */
+   bool cxsr_allowed;
+   } wm;
 };
 
 struct vlv_wm_state {
@@ -474,15 +503,6 @@ struct vlv_wm_state {
bool cxsr;
 };
 
-struct intel_pipe_wm {
-   struct intel_wm_level wm[5];
-   uint32_t linetime;
-   bool fbc_wm_enabled;
-   bool pipe_enabled;
-   bool sprites_enabled;
-   bool sprites_scaled;
-};
-
 struct intel_mmio_flip {
struct work_struct work;
struct drm_i915_private *i915;
@@ -490,12 +510,6 @@ struct intel_mmio_flip {
struct intel_crtc *crtc;
 };
 
-struct skl_pipe_wm {
-   struct skl_wm_level wm[8];
-   struct skl_wm_level trans_wm;
-   uint32_t linetime;
-};
-
 /*
  * Tracking of operations that need to be performed at the beginning/end of an
  * atomic commit, outside the atomic section where interrupts are disabled.
@@ -564,16 +578,6 @@ struct intel_crtc {
bool cpu_fifo_underrun_disabled;
bool pch_fifo_underrun_disabled;
 
-   /* per-pipe watermark state */
-   struct {
-   /* watermarks currently being used  */
-   struct intel_pipe_wm active;
-   /* SKL wm values currently in use */
-   struct skl_pipe_wm skl_active;
-   /* allow CxSR on

[Intel-gfx] [RFC 8/8] drm/i915: Add two-stage ILK-style watermark programming (v2)

2015-07-01 Thread Matt Roper

From: Matt Roper 

In addition to calculating final watermarks, let's also pre-calculate a
set of intermediate watermark values at atomic check time.  These
intermediate watermarks are a combination of the watermarks for the old
state and the new state; they should satisfy the requirements of both
states which means they can be programmed immediately when we commit the
atomic state (without waiting for a vblank).  Once the vblank does
happen, we can then re-program watermarks to the more optimal final
value.
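
One plausible way to build such a combined set is sketched here as standalone
C. This is an assumption about what "combination" could mean, not the code from
this patch: each value is taken as the larger of the old and new requirement,
and a level stays enabled only if it is valid in both states. All names
(wm_level_sketch, pipe_wm_sketch, merge_intermediate) are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define WM_LEVELS 5

/* Hypothetical, simplified watermark level: one value per plane type. */
struct wm_level_sketch {
	bool enable;
	uint32_t pri_val;
	uint32_t spr_val;
	uint32_t cur_val;
};

struct pipe_wm_sketch {
	struct wm_level_sketch wm[WM_LEVELS];
};

static uint32_t max_u32(uint32_t a, uint32_t b)
{
	return a > b ? a : b;
}

/*
 * Build watermarks that are safe for both the old and the new state:
 * take the larger value for each plane, and keep a level enabled only
 * when both states allow it.
 */
static struct pipe_wm_sketch
merge_intermediate(const struct pipe_wm_sketch *old_wm,
		   const struct pipe_wm_sketch *new_wm)
{
	struct pipe_wm_sketch out;

	for (int i = 0; i < WM_LEVELS; i++) {
		const struct wm_level_sketch *a = &old_wm->wm[i];
		const struct wm_level_sketch *b = &new_wm->wm[i];

		out.wm[i].enable = a->enable && b->enable;
		out.wm[i].pri_val = max_u32(a->pri_val, b->pri_val);
		out.wm[i].spr_val = max_u32(a->spr_val, b->spr_val);
		out.wm[i].cur_val = max_u32(a->cur_val, b->cur_val);
	}

	return out;
}
```

Because the merged values are never smaller than what either state needs, they
can be written before the hardware has actually switched over.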

v2: Significant rebasing/rewriting.

Signed-off-by: Matt Roper 
---
 drivers/gpu/drm/i915/i915_drv.h  |  9 +
 drivers/gpu/drm/i915/i915_irq.c  |  7 
 drivers/gpu/drm/i915/intel_display.c | 34 +++-
 drivers/gpu/drm/i915/intel_drv.h | 26 +
 drivers/gpu/drm/i915/intel_pm.c  | 75 ++--
 5 files changed, 130 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 5ad942e..42397e2 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -623,6 +623,9 @@ struct drm_i915_display_funcs {
  struct dpll *best_clock);
int (*compute_pipe_wm)(struct drm_crtc *crtc,
   struct drm_atomic_state *state);
+   void (*compute_intermediate_wm)(struct drm_device *dev,
+   struct intel_crtc_state *newstate,
+   const struct intel_crtc_state *oldstate);
void (*update_wm)(struct drm_crtc *crtc);
void (*update_sprite_wm)(struct drm_plane *plane,
 struct drm_crtc *crtc,
@@ -2574,6 +2577,12 @@ struct drm_i915_cmd_table {
  */
 #define HAS_ATOMIC_WM(dev_priv) (dev_priv->display.program_watermarks != NULL)
 
+/*
+ * Newer platforms have doublebuffered watermark registers and do not need
+ * the two-step watermark programming used by older platforms.
+ */
+#define HAS_DBLBUF_WM(dev_priv) (INTEL_INFO(dev_priv)->gen >= 9)
+
 #define GT_FREQUENCY_MULTIPLIER 50
 #define GEN9_FREQ_SCALER 3
 
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 20c7260..627c56f 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1455,8 +1455,15 @@ static bool intel_pipe_handle_vblank(struct drm_device *dev, enum pipe pipe)
struct drm_i915_private *dev_priv = to_i915(dev);
struct drm_crtc *crtc = dev_priv->pipe_to_crtc_mapping[pipe];
struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
+   struct intel_crtc_state *cstate = to_intel_crtc_state(crtc->state);
 
if (intel_crtc->need_vblank_wm_update) {
+   WARN_ON(HAS_DBLBUF_WM(dev_priv));
+
+   /* Latch final watermarks now that vblank is past */
+   cstate->wm.active = cstate->wm.target;
+
+   /* Queue work to actually program them asynchronously */
queue_work(dev_priv->wq, &intel_crtc->wm_work);
intel_crtc->need_vblank_wm_update = false;
}
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index fa4373e..1616d7f 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -4737,7 +4737,7 @@ static void intel_post_plane_update(struct intel_crtc *crtc)
struct drm_device *dev = crtc->base.dev;
struct drm_plane *plane;
 
-   if (HAS_ATOMIC_WM(to_i915(dev)))
+   if (HAS_ATOMIC_WM(to_i915(dev)) && !HAS_DBLBUF_WM(to_i915(dev)))
/* vblank handler will kick off workqueue task to update wm's */
crtc->need_vblank_wm_update = true;
 
@@ -11833,6 +11833,8 @@ static int intel_crtc_atomic_check(struct drm_crtc *crtc,
struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
struct intel_crtc_state *pipe_config =
to_intel_crtc_state(crtc_state);
+   struct intel_crtc_state *old_pipe_config =
+   to_intel_crtc_state(crtc->state);
struct drm_atomic_state *state = crtc_state->state;
int ret, idx = crtc->base.id;
bool mode_changed = needs_modeset(crtc_state);
@@ -11863,9 +11865,20 @@ static int intel_crtc_atomic_check(struct drm_crtc *crtc,
}
 
if (dev_priv->display.compute_pipe_wm) {
+   if (WARN_ON(!dev_priv->display.compute_intermediate_wm))
+   return 0;
+
ret = dev_priv->display.compute_pipe_wm(crtc, state);
if (ret)
return ret;
+
+   /*
+* Calculate 'intermediate' watermarks that satisfy both the old state
+* and the new state.  We can program these immediately.
+*/
+   dev_priv->display.compute_intermediate_wm(crtc->dev,
+ pipe_config,
+

[Intel-gfx] [RFC 0/8] Atomic watermark updates (v2)

2015-07-01 Thread Matt Roper

Here's a second RFC for transitioning watermark updates to an atomic model.  As
in the first series, I'm only transitioning a single platform style to start
with (ilk-style watermarks).  For pre-gen9 platforms, two sets of watermarks
are pre-computed at atomic 'check' time --- one set that can be programmed
immediately without waiting for a vblank (these will satisfy both the new and
old hardware state) and a second set that should be programmed following the
vblank (optimal values that may not work until the hardware has actually
switched to the new state).
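
The two-step flow described above can be sketched as a tiny standalone model.
It is only an illustration under simplifying assumptions: a single integer
stands in for a whole watermark set, crtc_wm_sketch, commit_wm() and
handle_vblank() are hypothetical names, and the real series defers the final
register write to a workqueue rather than doing it in the vblank handler.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-pipe bookkeeping; one integer models a watermark set. */
struct crtc_wm_sketch {
	uint32_t active;		/* value currently in use */
	uint32_t target;		/* optimal value, valid only after vblank */
	bool need_vblank_wm_update;	/* final value still pending */
};

/*
 * Commit time: the intermediate value is safe for both old and new state,
 * so it is applied immediately; the final value is only remembered.
 */
static void commit_wm(struct crtc_wm_sketch *c,
		      uint32_t intermediate, uint32_t final)
{
	c->active = intermediate;
	c->target = final;
	c->need_vblank_wm_update = true;
}

/* Vblank: the hardware has switched state, so latch the final value. */
static void handle_vblank(struct crtc_wm_sketch *c)
{
	if (!c->need_vblank_wm_update)
		return;

	c->active = c->target;
	c->need_vblank_wm_update = false;
}
```

A vblank that arrives with nothing pending leaves the active value untouched,
which is why the flag is cleared once the final value has been latched.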

A lot of the differences between this series and the first one are just a
matter of rebasing on the latest code; there's been a lot of work by Maarten
and Ville that has significantly changed (for the better) the areas this code
touches.

I'm working on some updates for skl/bxt right now (which don't need the
two-step process used by pre-gen9, but do need some other rework); I'll post
those later once I finish them off and have a chance to test them on real BXT
hardware.

Matt Roper (7):
  drm/i915: Eliminate usage of plane_wm_parameters from ILK-style WM
code
  drm/i915: Eliminate usage of pipe_wm_parameters from ILK-style WM
  drm/i915/ivb: Move WaCxSRDisabledForSpriteScaling w/a to atomic check
  drm/i915: Move active watermarks into CRTC state
  drm/i915: Calculate ILK-style watermarks during atomic check (v2)
  drm/i915: Allow final wm programming to be scheduled after next vblank
(v2)
  drm/i915: Add two-stage ILK-style watermark programming (v2)

Ville Syrjälä (1):
  drm/i915: Refactor ilk_update_wm (v3)

 drivers/gpu/drm/i915/i915_drv.h  |  18 ++
 drivers/gpu/drm/i915/i915_irq.c  |  16 ++
 drivers/gpu/drm/i915/intel_atomic.c  |   1 +
 drivers/gpu/drm/i915/intel_display.c | 116 ++--
 drivers/gpu/drm/i915/intel_drv.h |  73 +---
 drivers/gpu/drm/i915/intel_pm.c  | 330 +++
 drivers/gpu/drm/i915/intel_sprite.c  |   8 -
 7 files changed, 366 insertions(+), 196 deletions(-)

-- 
2.1.4


Re: [Intel-gfx] [PATCH v3] drm/i915: Report correct GGTT space usage

2015-07-01 Thread shuang . he

Tested-By: Intel Graphics QA PRTS (Patch Regression Test System Contact: shuang...@intel.com)
Task id: 6680
-Summary-
Platform  Delta  drm-intel-nightly  Series Applied
ILK  302/302  302/302
SNB  312/316  312/316
IVB  343/343  343/343
BYT -2  287/287  285/287
HSW  380/380  380/380
-Detailed-
Platform  Test  drm-intel-nightly  Series Applied
*BYT  igt@gem_partial_pwrite_pread@reads-display  PASS(1)  FAIL(1)
*BYT  igt@gem_tiled_partial_pwrite_pread@reads  PASS(1)  FAIL(1)
Note: Pay particular attention to lines starting with '*'.

[Intel-gfx] [PATCH i-g-t 2/4] aux: Don't evaluate several times the arguments of min() and max()

2015-07-01 Thread Damien Lespiau

Signed-off-by: Damien Lespiau 
---
 lib/igt_aux.h | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/lib/igt_aux.h b/lib/igt_aux.h
index 9ea50de..2979314 100644
--- a/lib/igt_aux.h
+++ b/lib/igt_aux.h
@@ -91,8 +91,16 @@ void intel_require_memory(uint32_t count, uint32_t size, unsigned mode);
 #define CHECK_SWAP 0x2
 
 
-#define min(a, b) ((a) < (b) ? (a) : (b))
-#define max(a, b) ((a) > (b) ? (a) : (b))
+#define min(a, b) ({   \
+   typeof(a) _a = (a); \
+   typeof(b) _b = (b); \
+   _a < _b ? _a : _b;  \
+})
+#define max(a, b) ({   \
+   typeof(a) _a = (a); \
+   typeof(b) _b = (b); \
+   _a > _b ? _a : _b;  \
+})
 
 #define igt_swap(a, b) do {\
typeof(a) _tmp = (a);   \
-- 
2.1.0
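
The double-evaluation problem this patch addresses can be shown with a small
standalone sketch. NAIVE_MIN and SAFE_MIN are names chosen here for
illustration (the patch itself redefines min() and max() in lib/igt_aux.h), and
next_value() is a hypothetical helper with a side effect. Statement
expressions and typeof are GNU C extensions, so this assumes GCC or Clang.

```c
/* Old form: each argument is expanded twice, so side effects can run twice. */
#define NAIVE_MIN(a, b) ((a) < (b) ? (a) : (b))

/* New form (GNU C statement expression): each argument is evaluated once. */
#define SAFE_MIN(a, b) ({		\
	__typeof__(a) _a = (a);		\
	__typeof__(b) _b = (b);		\
	_a < _b ? _a : _b;		\
})

static int calls;

/* Hypothetical helper with a side effect: counts how often it is called. */
static int next_value(void)
{
	calls++;
	return 1;
}

/* Returns how many times next_value() ran inside NAIVE_MIN(). */
static int naive_call_count(void)
{
	calls = 0;
	(void)NAIVE_MIN(next_value(), 5);
	return calls;
}

/* Returns how many times next_value() ran inside SAFE_MIN(). */
static int safe_call_count(void)
{
	calls = 0;
	(void)SAFE_MIN(next_value(), 5);
	return calls;
}
```

With the naive macro the helper runs once for the comparison and once more to
produce the result; with the statement-expression form it runs exactly once.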


[Intel-gfx] [PATCH i-g-t 3/4] build: Add DEBUG_FLAGS to tools and self-tests

2015-07-01 Thread Damien Lespiau

Makes using GDB better on those binaries.

Signed-off-by: Damien Lespiau 
---
 lib/tests/Makefile.am | 2 +-
 tools/Makefile.am | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/tests/Makefile.am b/lib/tests/Makefile.am
index 938d2ab..f09d2fe 100644
--- a/lib/tests/Makefile.am
+++ b/lib/tests/Makefile.am
@@ -6,7 +6,7 @@ AM_TESTS_ENVIRONMENT = \
 
 EXTRA_DIST = $(check_SCRIPTS)
 
-AM_CFLAGS = $(DRM_CFLAGS) $(CWARNFLAGS) \
+AM_CFLAGS = $(DRM_CFLAGS) $(CWARNFLAGS) $(DEBUG_CFLAGS) \
-I$(srcdir)/../.. \
-I$(srcdir)/.. \
-include "$(srcdir)/../../lib/check-ndebug.h" \
diff --git a/tools/Makefile.am b/tools/Makefile.am
index 8288248..da971c3 100644
--- a/tools/Makefile.am
+++ b/tools/Makefile.am
@@ -7,6 +7,6 @@ SUBDIRS += quick_dump
 endif
 
 AM_CPPFLAGS = -I$(top_srcdir) -I$(top_srcdir)/lib
-AM_CFLAGS = $(DRM_CFLAGS) $(PCIACCESS_CFLAGS) $(CWARNFLAGS) $(CAIRO_CFLAGS) $(LIBUNWIND_CFLAGS)
+AM_CFLAGS = $(DEBUG_CFLAGS) $(DRM_CFLAGS) $(PCIACCESS_CFLAGS) $(CWARNFLAGS) $(CAIRO_CFLAGS) $(LIBUNWIND_CFLAGS)
 LDADD = $(top_builddir)/lib/libintel_tools.la $(DRM_LIBS) $(PCIACCESS_LIBS) $(CAIRO_LIBS) $(LIBUDEV_LIBS) $(LIBUNWIND_LIBS) -lm
 
-- 
2.1.0


[Intel-gfx] [PATCH i-g-t 4/4] build: Add an option to not use the git hash in version

2015-07-01 Thread Damien Lespiau

When developing, it's quite annoying that the version changes every
commit, causing the library to be rebuilt and every single binary
re-linked.

Add a config option to skip that.

I remember Ville asking for this "feature" as well.

Cc: Ville Syrjälä 
Signed-off-by: Damien Lespiau 
---
 configure.ac | 7 +++
 lib/Makefile.sources | 5 +
 2 files changed, 12 insertions(+)

diff --git a/configure.ac b/configure.ac
index 4208f00..caa3f50 100644
--- a/configure.ac
+++ b/configure.ac
@@ -212,6 +212,13 @@ if test "x$enable_debug" = xyes; then
AC_SUBST([DEBUG_CFLAGS])
 fi
 
+# prevent relinking the world on every commit for developpers
+AC_ARG_ENABLE(skip-version,
+ AS_HELP_STRING([--enable-skip-version],
+[Do not use git hash in version]),
+ [skip_version=$enableval], [skip_version=no])
+AM_CONDITIONAL(SKIP_VERSION, [test "x$skip_version" = xyes])
+
 # -
 
 # To build multithread code, gcc uses -pthread, Solaris Studio cc uses -mt
diff --git a/lib/Makefile.sources b/lib/Makefile.sources
index f8a1b92..2148684 100644
--- a/lib/Makefile.sources
+++ b/lib/Makefile.sources
@@ -60,6 +60,10 @@ libintel_tools_la_SOURCES =  \
 
 .PHONY: version.h.tmp
 
+if SKIP_VERSION
+$(IGT_LIB_PATH)/version.h.tmp:
+   @echo '#define IGT_GIT_SHA1 "SKIP"' >> $@
+else
 $(IGT_LIB_PATH)/version.h.tmp:
@touch $@
@if test -d $(GPU_TOOLS_PATH)/.git; then \
@@ -73,6 +77,7 @@ $(IGT_LIB_PATH)/version.h.tmp:
else \
echo '#define IGT_GIT_SHA1 "NOT-GIT"' ; \
fi >> $@
+endif # SKIP_VERSION
 
 
 $(IGT_LIB_PATH)/version.h: $(IGT_LIB_PATH)/version.h.tmp
-- 
2.1.0


[Intel-gfx] [PATCH i-g-t 1/4] stats: Add wikipedia links to get_trimean() and get_iqm()

2015-07-01 Thread Damien Lespiau

Useful knowledge for anyone looking at the documentation and following
the links.

Signed-off-by: Damien Lespiau 
---
 lib/igt_stats.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/lib/igt_stats.c b/lib/igt_stats.c
index b7053c3..70650ec 100644
--- a/lib/igt_stats.c
+++ b/lib/igt_stats.c
@@ -496,11 +496,15 @@ double igt_stats_get_std_deviation(igt_stats_t *stats)
  * igt_stats_get_iqm:
  * @stats: An #igt_stats_t instance
  *
- * Retrieves the interquartile mean of the @stats dataset.
+ * Retrieves the
+ * [interquartile mean](https://en.wikipedia.org/wiki/Interquartile_mean) (IQM)
+ * of the @stats dataset.
  *
  * The interquartile mean is a "statistical measure of central tendency".
  * It is a truncated mean that discards the lowest and highest 25% of values,
  * and calculates the mean value of the remaining central values.
+ *
+ * It's useful to hide outliers in measurements (due to cold cache etc).
  */
 double igt_stats_get_iqm(igt_stats_t *stats)
 {
@@ -533,7 +537,8 @@ double igt_stats_get_iqm(igt_stats_t *stats)
  * igt_stats_get_trimean:
  * @stats: An #igt_stats_t instance
  *
- * Retrieves the trimean of the @stats dataset.
+ * Retrieves the [trimean](https://en.wikipedia.org/wiki/Trimean) of the @stats
+ * dataset.
  *
  * The trimean is a the most efficient 3-point L-estimator, even more
  * robust than the median at estimating the average of a sample population.
-- 
2.1.0
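
For readers unfamiliar with the two statistics documented here, this
standalone sketch shows how they can be computed. It is not the igt_stats
implementation: sketch_iqm() uses a simplified rule (drop floor(n/4) samples
from each end), quantile_sorted() uses linear interpolation between closest
ranks, which is only one of several quantile conventions, and all function
names are hypothetical. Both entry points assume n >= 1.

```c
#include <stdlib.h>
#include <string.h>

static int cmp_double(const void *a, const void *b)
{
	double x = *(const double *)a, y = *(const double *)b;

	return (x > y) - (x < y);
}

/* Returns a sorted heap copy of values[0..n-1]; the caller frees it. */
static double *sorted_copy(const double *values, size_t n)
{
	double *s = malloc(n * sizeof(*s));

	if (!s)
		return NULL;
	memcpy(s, values, n * sizeof(*s));
	qsort(s, n, sizeof(*s), cmp_double);
	return s;
}

/* Quantile p of sorted data, by linear interpolation (assumed convention). */
static double quantile_sorted(const double *s, size_t n, double p)
{
	double pos = p * (double)(n - 1);
	size_t lo = (size_t)pos;
	size_t hi = lo + 1 < n ? lo + 1 : lo;

	return s[lo] + (pos - (double)lo) * (s[hi] - s[lo]);
}

/* Simplified interquartile mean: mean of the samples left after trimming. */
static double sketch_iqm(const double *values, size_t n)
{
	double *s = sorted_copy(values, n);
	size_t drop = n / 4, i;
	double sum = 0.0;

	if (!s)
		return 0.0;
	for (i = drop; i < n - drop; i++)
		sum += s[i];
	free(s);
	return sum / (double)(n - 2 * drop);
}

/* Trimean: (Q1 + 2 * median + Q3) / 4. */
static double sketch_trimean(const double *values, size_t n)
{
	double *s = sorted_copy(values, n);
	double r;

	if (!s)
		return 0.0;
	r = (quantile_sorted(s, n, 0.25) + 2.0 * quantile_sorted(s, n, 0.5) +
	     quantile_sorted(s, n, 0.75)) / 4.0;
	free(s);
	return r;
}
```

For the samples 1, 2, 3, 4, 5, 6, 7 and 100, the trimmed set is 3, 4, 5, 6, so
the single outlier has no influence on the interquartile mean.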


Re: [Intel-gfx] [PATCH] drm/i915: Reserve space improvements

2015-07-01 Thread shuang . he

Tested-By: Intel Graphics QA PRTS (Patch Regression Test System Contact: shuang...@intel.com)
Task id: 6679
-Summary-
Platform  Delta  drm-intel-nightly  Series Applied
ILK  302/302  302/302
SNB  312/316  312/316
IVB  343/343  343/343
BYT -2  287/287  285/287
HSW  380/380  380/380
-Detailed-
Platform  Test  drm-intel-nightly  Series Applied
*BYT  igt@gem_partial_pwrite_pread@reads-display  PASS(1)  FAIL(1)
*BYT  igt@gem_tiled_partial_pwrite_pread@reads  PASS(1)  FAIL(1)
Note: Pay particular attention to lines starting with '*'.

Re: [Intel-gfx] [PATCH 7/7] drm/i915: FBC doesn't need struct_mutex anymore

2015-07-01 Thread Chris Wilson

On Wed, Jul 01, 2015 at 05:15:26PM -0300, Paulo Zanoni wrote:
> From: Paulo Zanoni 
> 
> Everything is covered either by fbc.lock or mm.stolen_lock, and
> intel_fbc.c is already responsible for grabbing the appropriate locks
> when it needs them.
> 
> Signed-off-by: Paulo Zanoni 
5-7 Reviewed-by: Chris Wilson 

They all look to be safely guarded by a specific mutex now.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

Re: [Intel-gfx] [PATCH 4/7] drm/i915: add the FBC mutex

2015-07-01 Thread Chris Wilson

On Wed, Jul 01, 2015 at 05:15:23PM -0300, Paulo Zanoni wrote:
> From: Paulo Zanoni 
> 
> Make sure we're not going to have weird races in really weird cases
> where a lot of different CRTCs are doing rendering and modesets at the
> same time.
> 
> With this change and the stolen_lock from the previous patch, we can start
> removing the struct_mutex locking we have around FBC in the next patches.
> 
> v2:
>  - Rebase (6 months later)
>  - Also lock debugfs and stolen.
> v3:
>  - Don't lock a single value read (Chris).
>  - Replace lockdep assertions with WARNs (Daniel).
>  - Improve commit message.
>  - Don't forget intel_pre_plane_update() locking.
> v4:
>  - Don't remove struct_mutex at intel_pre_plane_update() (Chris).
>  - Add comment regarding locking dependencies (Chris).
>  - Rebase after the stolen code rework.
>  - Rebase again after drm-intel-nightly changes.
> 
> Signed-off-by: Paulo Zanoni 

Reviewed-by: Chris Wilson 

See below.

> @@ -683,6 +721,8 @@ void intel_fbc_update(struct drm_device *dev)
>   const struct drm_display_mode *adjusted_mode;
>   unsigned int max_width, max_height;
>  
> + WARN_ON(!mutex_is_locked(&dev_priv->fbc.lock));
> +
>   if (!HAS_FBC(dev))
>   return;

This is now __intel_fbc_update() inside the fbc.lock. One would think
that the internal functions would only be called when FBC is desired.

That looks to be true, except for the new intel_fbc_update(). You can
upgrade this to if (WARN_ON(!HAS_FBC(dev_priv))) return; after adding
the proper guard to intel_fbc_update().
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

Re: [Intel-gfx] [PATCH 2/7] drm/i915: move FBC code out of i915_gem_stolen.c

2015-07-01 Thread Chris Wilson

On Wed, Jul 01, 2015 at 05:15:21PM -0300, Paulo Zanoni wrote:

Looks much cleaner with the split.

> +void intel_fbc_cleanup_cfb(struct drm_device *dev)
> +{
> + struct drm_i915_private *dev_priv = dev->dev_private;
> +
> + if (dev_priv->fbc.uncompressed_size == 0)
> + return;
> +
> + i915_gem_stolen_remove_node(&dev_priv->fbc.compressed_fb);
> +
> + if (dev_priv->fbc.compressed_llb) {
> + i915_gem_stolen_remove_node(dev_priv->fbc.compressed_llb);
> + kfree(dev_priv->fbc.compressed_llb);
> + }

Any reason why one node is embedded and the other allocated? It feels a
little inconsistent and lacks an explanation. A note that one is always
used, and the other only on rare gens, would probably suffice.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

Re: [Intel-gfx] [PATCH 1/7] drm/i915: add simple wrappers for stolen node insertion/removal

2015-07-01 Thread Chris Wilson

On Wed, Jul 01, 2015 at 05:15:20PM -0300, Paulo Zanoni wrote:
> From: Paulo Zanoni 
> 
> We want to move the FBC code out of i915_gem_stolen.c, but that code
> directly adds/removes stolen memory nodes. Let's create this
> abstraction, so i915_gem_stolen.c is still in control of all the
> stolen memory handling. These abstractions will also allow us to add
> locking assertions later.
> 
> Requested-by: Chris Wilson 
> Signed-off-by: Paulo Zanoni 
> ---
>  drivers/gpu/drm/i915/i915_drv.h|  4 
>  drivers/gpu/drm/i915/i915_gem_stolen.c | 44 
> +-
>  2 files changed, 32 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 1dbd957..b9de374 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -3109,6 +3109,10 @@ static inline void i915_gem_chipset_flush(struct drm_device *dev)
>  }
>  
>  /* i915_gem_stolen.c */
> +int i915_gem_stolen_insert_node(struct drm_i915_private *dev_priv,
> + struct drm_mm_node *node, u64 size,
> + unsigned alignment);
> +void i915_gem_stolen_remove_node(struct drm_mm_node *node);

Might as well pass in dev_priv now to save changing the interface later.


>  int i915_gem_init_stolen(struct drm_device *dev);
>  int i915_gem_stolen_setup_compression(struct drm_device *dev, int size, int fb_cpp);
>  void i915_gem_stolen_cleanup_compression(struct drm_device *dev);
> diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
> index 348ed5a..6b43234 100644
> --- a/drivers/gpu/drm/i915/i915_gem_stolen.c
> +++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
> @@ -42,6 +42,22 @@
>   * for is a boon.
>   */
>  
> +int i915_gem_stolen_insert_node(struct drm_i915_private *dev_priv,
> + struct drm_mm_node *node, u64 size,
> + unsigned alignment)
> +{
> + if (!drm_mm_initialized(&dev_priv->mm.stolen))
> + return -ENODEV;

Might as well take advantage of this test here to remove the same check
from i915_gem_object_create_stolen_for_preallocated and
i915_gem_object_create_stolen

Other than those minor points, looks good.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

Re: [Intel-gfx] [PATCH v2 07/10] drm/i915: Try to make sure cxsr is disabled around plane enable/disable

2015-07-01 Thread Matt Roper

On Wed, Jul 01, 2015 at 10:13:38PM +0300, ville.syrj...@linux.intel.com wrote:
> From: Ville Syrjälä 
> 
> CxSR (or maxfifo on VLV/CHV) blocks some changes to the plane control
> register (enable bit at least, not quite sure about the rest). So in
> order to have the plane enable/disable when we want we need to first
> kick the hardware out of cxsr.
> 
> Unfortunately this requires some extra vblank waits. For the CxSR
> enable after the plane update we should eventually use an async
> vblank worker, but since we don't have that just do sync vblank
> waits. For the disable case we have no choice but to do it
> synchronously.
> 
> v2: Don't add a spurious intel_pre_plane_update() to crtc disable
> 
> Cc: Paulo Zanoni 
> Reviewed-by: Clint Taylor 
> Tested-by: Clint Taylor 
> Signed-off-by: Ville Syrjälä 
> ---
> Paulo noticed some frontbuffer_bits WARNs from this patch, and sure enough
> I accidentally added another intel_pre_plane_update() to the crtc disable
> loop. I failed to notice because I had commented out the frontbuffer_bits
> WARNs earlier from my tree since they were too noisy.

I can confirm that your v2 fixes the warning spam caused by v1.

Regarding the v2 fix:
Tested-by: Matt Roper 

> 
>  drivers/gpu/drm/i915/intel_display.c | 34 +-
>  drivers/gpu/drm/i915/intel_drv.h |  3 +++
>  drivers/gpu/drm/i915/intel_pm.c  | 11 ---
>  3 files changed, 36 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index d67b5f1..defc4ce 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -4716,6 +4716,9 @@ static void intel_post_plane_update(struct intel_crtc *crtc)
>  
>   intel_frontbuffer_flip(dev, atomic->fb_bits);
>  
> + if (atomic->disable_cxsr)
> + crtc->wm.cxsr_allowed = true;
> +
>   if (crtc->atomic.update_wm_post)
>   intel_update_watermarks(&crtc->base);
>  
> @@ -4765,6 +4768,11 @@ static void intel_pre_plane_update(struct intel_crtc *crtc)
>  
>   if (atomic->pre_disable_primary)
>   intel_pre_disable_primary(&crtc->base);
> +
> + if (atomic->disable_cxsr) {
> + crtc->wm.cxsr_allowed = false;
> + intel_set_memory_cxsr(dev_priv, false);
> + }
>  }
>  
>  static void intel_crtc_disable_planes(struct drm_crtc *crtc, unsigned plane_mask)
> @@ -11646,12 +11654,26 @@ int intel_plane_atomic_calc_changes(struct drm_crtc_state *crtc_state,
>plane->base.id, was_visible, visible,
>turn_off, turn_on, mode_changed);
>  
> - if (turn_on)
> + if (turn_on) {
>   intel_crtc->atomic.update_wm_pre = true;
> - else if (turn_off)
> + /* must disable cxsr around plane enable/disable */
> + if (plane->type != DRM_PLANE_TYPE_CURSOR) {
> + intel_crtc->atomic.disable_cxsr = true;
> + /* to potentially re-enable cxsr */
> + intel_crtc->atomic.wait_vblank = true;
> + intel_crtc->atomic.update_wm_post = true;
> + }
> + } else if (turn_off) {
>   intel_crtc->atomic.update_wm_post = true;
> - else if (intel_wm_need_update(plane, plane_state))
> + /* must disable cxsr around plane enable/disable */
> + if (plane->type != DRM_PLANE_TYPE_CURSOR) {
> + if (is_crtc_enabled)
> + intel_crtc->atomic.wait_vblank = true;
> + intel_crtc->atomic.disable_cxsr = true;
> + }
> + } else if (intel_wm_need_update(plane, plane_state)) {
>   intel_crtc->atomic.update_wm_pre = true;
> + }
>  
>   if (visible)
>   intel_crtc->atomic.fb_bits |=
> @@ -11808,8 +11830,8 @@ static int intel_crtc_atomic_check(struct drm_crtc *crtc,
>   if (pipe_config->quirks & PIPE_CONFIG_QUIRK_INITIAL_PLANES)
>   intel_crtc_check_initial_planes(crtc, crtc_state);
>  
> - if (mode_changed)
> - intel_crtc->atomic.update_wm_post = !crtc_state->active;
> + if (mode_changed && !crtc_state->active)
> + intel_crtc->atomic.update_wm_post = true;
>  
>   if (mode_changed && crtc_state->enable &&
>   dev_priv->display.crtc_compute_clock &&
> @@ -14089,6 +14111,8 @@ static void intel_crtc_init(struct drm_device *dev, int pipe)
>   intel_crtc->cursor_cntl = ~0;
>   intel_crtc->cursor_size = ~0;
>  
> + intel_crtc->wm.cxsr_allowed = true;
> +
>   BUG_ON(pipe >= ARRAY_SIZE(dev_priv->plane_to_crtc_mapping) ||
>  dev_priv->plane_to_crtc_mapping[intel_crtc->plane] != NULL);
>   dev_priv->plane_to_crtc_mapping[intel_crtc->plane] = &intel_crtc->base;
> diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> index f26a680..4e8d13e 100644
> --- a/

Re: [Intel-gfx] [PATCH 3/7] drm/i915: add dev_priv->mm.stolen_lock

2015-07-01 Thread Chris Wilson

On Wed, Jul 01, 2015 at 05:15:22PM -0300, Paulo Zanoni wrote:
> From: Paulo Zanoni 
> 
> Which should protect dev_priv->mm.stolen usage. This will allow us to
> simplify the relationship between stolen memory, FBC and struct_mutex.

Too coarse. The locking need only be around the stolen drm_mm, i.e. just
insert/remove node. (And you don't need the lock around drm_mm_initialized,
nor around teardown: like the init, they are serialised through
other means.)
-Chris
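
The narrower locking suggested here can be pictured with a userspace sketch in
which the lock is taken only inside the insert/remove wrappers, instead of by
callers around larger sections. It is an illustration, not the driver code:
pool_sketch is a hypothetical stand-in for the stolen drm_mm, a pthread mutex
stands in for the kernel mutex, and pool_insert()/pool_remove() are made-up
names.

```c
#include <pthread.h>
#include <stdbool.h>

#define POOL_SLOTS 8

/* Hypothetical stand-in for the stolen-memory allocator (drm_mm). */
struct pool_sketch {
	pthread_mutex_t lock;	/* protects used[] and nothing else */
	bool used[POOL_SLOTS];
};

static struct pool_sketch pool = { .lock = PTHREAD_MUTEX_INITIALIZER };

/*
 * Reserve the first free slot. The lock covers only the allocator
 * manipulation, so callers never need to hold it themselves.
 * Returns the slot index, or -1 when the pool is full.
 */
static int pool_insert(struct pool_sketch *p)
{
	int slot = -1;

	pthread_mutex_lock(&p->lock);
	for (int i = 0; i < POOL_SLOTS; i++) {
		if (!p->used[i]) {
			p->used[i] = true;
			slot = i;
			break;
		}
	}
	pthread_mutex_unlock(&p->lock);

	return slot;
}

/* Release a slot previously returned by pool_insert(). */
static void pool_remove(struct pool_sketch *p, int slot)
{
	pthread_mutex_lock(&p->lock);
	p->used[slot] = false;
	pthread_mutex_unlock(&p->lock);
}
```

Keeping the critical section this small means allocation and bookkeeping done
by the caller (such as kzalloc or kfree of the node) stay outside the lock.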

-- 
Chris Wilson, Intel Open Source Technology Centre

[Intel-gfx] [PATCH 3/7] drm/i915: add dev_priv->mm.stolen_lock

2015-07-01 Thread Paulo Zanoni

From: Paulo Zanoni 

Which should protect dev_priv->mm.stolen usage. This will allow us to
simplify the relationship between stolen memory, FBC and struct_mutex.

Cc: Chris Wilson 
Signed-off-by: Paulo Zanoni 
---
 drivers/gpu/drm/i915/i915_drv.h|  7 +++-
 drivers/gpu/drm/i915/i915_gem_stolen.c | 69 +++---
 drivers/gpu/drm/i915/intel_fbc.c   | 29 +++---
 3 files changed, 77 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index c955037..0b908b1 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1245,6 +1245,10 @@ struct intel_l3_parity {
 struct i915_gem_mm {
/** Memory allocator for GTT stolen memory */
struct drm_mm stolen;
+   /** Protects the usage of the GTT stolen memory allocator. This is
+* always the inner lock when overlapping with struct_mutex. */
+   struct mutex stolen_lock;
+
/** List of all objects in gtt_space. Used to restore gtt
 * mappings on resume */
struct list_head bound_list;
@@ -3112,7 +3116,8 @@ static inline void i915_gem_chipset_flush(struct drm_device *dev)
 int i915_gem_stolen_insert_node(struct drm_i915_private *dev_priv,
struct drm_mm_node *node, u64 size,
unsigned alignment);
-void i915_gem_stolen_remove_node(struct drm_mm_node *node);
+void i915_gem_stolen_remove_node(struct drm_i915_private *dev_priv,
+struct drm_mm_node *node);
 int i915_gem_init_stolen(struct drm_device *dev);
 void i915_gem_cleanup_stolen(struct drm_device *dev);
 struct drm_i915_gem_object *
diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
index 0619786..b432085 100644
--- a/drivers/gpu/drm/i915/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
@@ -46,6 +46,8 @@ int i915_gem_stolen_insert_node(struct drm_i915_private *dev_priv,
struct drm_mm_node *node, u64 size,
unsigned alignment)
 {
+   WARN_ON(!mutex_is_locked(&dev_priv->mm.stolen_lock));
+
if (!drm_mm_initialized(&dev_priv->mm.stolen))
return -ENODEV;
 
@@ -53,8 +55,11 @@ int i915_gem_stolen_insert_node(struct drm_i915_private *dev_priv,
  DRM_MM_SEARCH_DEFAULT);
 }
 
-void i915_gem_stolen_remove_node(struct drm_mm_node *node)
+void i915_gem_stolen_remove_node(struct drm_i915_private *dev_priv,
+struct drm_mm_node *node)
 {
+   WARN_ON(!mutex_is_locked(&dev_priv->mm.stolen_lock));
+
drm_mm_remove_node(node);
 }
 
@@ -171,10 +176,15 @@ void i915_gem_cleanup_stolen(struct drm_device *dev)
 {
struct drm_i915_private *dev_priv = dev->dev_private;
 
+   mutex_lock(&dev_priv->mm.stolen_lock);
+
if (!drm_mm_initialized(&dev_priv->mm.stolen))
-   return;
+   goto out;
 
drm_mm_takedown(&dev_priv->mm.stolen);
+
+out:
+   mutex_unlock(&dev_priv->mm.stolen_lock);
 }
 
 int i915_gem_init_stolen(struct drm_device *dev)
@@ -183,6 +193,8 @@ int i915_gem_init_stolen(struct drm_device *dev)
u32 tmp;
int bios_reserved = 0;
 
+   mutex_init(&dev_priv->mm.stolen_lock);
+
 #ifdef CONFIG_INTEL_IOMMU
if (intel_iommu_gfx_mapped && INTEL_INFO(dev)->gen < 8) {
DRM_INFO("DMAR active, disabling use of stolen memory\n");
@@ -273,8 +285,10 @@ static void i915_gem_object_put_pages_stolen(struct drm_i915_gem_object *obj)
 static void
 i915_gem_object_release_stolen(struct drm_i915_gem_object *obj)
 {
+   struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
+
if (obj->stolen) {
-   i915_gem_stolen_remove_node(obj->stolen);
+   i915_gem_stolen_remove_node(dev_priv, obj->stolen);
kfree(obj->stolen);
obj->stolen = NULL;
}
@@ -325,29 +339,36 @@ i915_gem_object_create_stolen(struct drm_device *dev, u32 size)
struct drm_mm_node *stolen;
int ret;
 
+   mutex_lock(&dev_priv->mm.stolen_lock);
+
if (!drm_mm_initialized(&dev_priv->mm.stolen))
-   return NULL;
+   goto out_unlock;
 
DRM_DEBUG_KMS("creating stolen object: size=%x\n", size);
if (size == 0)
-   return NULL;
+   goto out_unlock;
 
stolen = kzalloc(sizeof(*stolen), GFP_KERNEL);
if (!stolen)
-   return NULL;
+   goto out_unlock;
 
ret = i915_gem_stolen_insert_node(dev_priv, stolen, size, 4096);
-   if (ret) {
-   kfree(stolen);
-   return NULL;
-   }
+   if (ret)
+   goto out_free;
 
obj = _i915_gem_object_create_stolen(dev, stolen);
-   if (obj)
-   return obj;
+   if (!obj)
+   goto out_node;
 
-

[Intel-gfx] [PATCH 2/7] drm/i915: move FBC code out of i915_gem_stolen.c

2015-07-01 Thread Paulo Zanoni

From: Paulo Zanoni 

With the abstractions created by the last patch, we can move this code,
and the only thing inside intel_fbc.c that knows about dev_priv->mm is
the code that reads stolen_base.

We also had to move a call to i915_gem_stolen_cleanup_compression()
- now called intel_fbc_cleanup_cfb() - outside i915_gem_stolen.c.

Requested-by: Chris Wilson 
Signed-off-by: Paulo Zanoni 
---
 drivers/gpu/drm/i915/i915_dma.c|   1 +
 drivers/gpu/drm/i915/i915_drv.h|   2 -
 drivers/gpu/drm/i915/i915_gem_stolen.c | 126 
 drivers/gpu/drm/i915/intel_drv.h   |   1 +
 drivers/gpu/drm/i915/intel_fbc.c   | 128 -
 5 files changed, 127 insertions(+), 131 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index c5349fa..1ae9e0b 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1120,6 +1120,7 @@ int i915_driver_unload(struct drm_device *dev)
i915_gem_cleanup_ringbuffer(dev);
i915_gem_context_fini(dev);
mutex_unlock(&dev->struct_mutex);
+   intel_fbc_cleanup_cfb(dev);
i915_gem_cleanup_stolen(dev);
 
intel_csr_ucode_fini(dev);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index b9de374..c955037 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3114,8 +3114,6 @@ int i915_gem_stolen_insert_node(struct drm_i915_private *dev_priv,
unsigned alignment);
 void i915_gem_stolen_remove_node(struct drm_mm_node *node);
 int i915_gem_init_stolen(struct drm_device *dev);
-int i915_gem_stolen_setup_compression(struct drm_device *dev, int size, int fb_cpp);
-void i915_gem_stolen_cleanup_compression(struct drm_device *dev);
 void i915_gem_cleanup_stolen(struct drm_device *dev);
 struct drm_i915_gem_object *
 i915_gem_object_create_stolen(struct drm_device *dev, u32 size);
diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
index 6b43234..0619786 100644
--- a/drivers/gpu/drm/i915/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
@@ -167,131 +167,6 @@ static unsigned long i915_stolen_to_physical(struct drm_device *dev)
return base;
 }
 
-static int find_compression_threshold(struct drm_device *dev,
- struct drm_mm_node *node,
- int size,
- int fb_cpp)
-{
-   struct drm_i915_private *dev_priv = dev->dev_private;
-   int compression_threshold = 1;
-   int ret;
-
-   /* HACK: This code depends on what we will do in *_enable_fbc. If that
-* code changes, this code needs to change as well.
-*
-* The enable_fbc code will attempt to use one of our 2 compression
-* thresholds, therefore, in that case, we only have 1 resort.
-*/
-
-   /* Try to over-allocate to reduce reallocations and fragmentation. */
-   ret = i915_gem_stolen_insert_node(dev_priv, node, size <<= 1, 4096);
-   if (ret == 0)
-   return compression_threshold;
-
-again:
-   /* HW's ability to limit the CFB is 1:4 */
-   if (compression_threshold > 4 ||
-   (fb_cpp == 2 && compression_threshold == 2))
-   return 0;
-
-   ret = i915_gem_stolen_insert_node(dev_priv, node, size >>= 1, 4096);
-   if (ret && INTEL_INFO(dev)->gen <= 4) {
-   return 0;
-   } else if (ret) {
-   compression_threshold <<= 1;
-   goto again;
-   } else {
-   return compression_threshold;
-   }
-}
-
-static int i915_setup_compression(struct drm_device *dev, int size, int fb_cpp)
-{
-   struct drm_i915_private *dev_priv = dev->dev_private;
-   struct drm_mm_node *uninitialized_var(compressed_llb);
-   int ret;
-
-   ret = find_compression_threshold(dev, &dev_priv->fbc.compressed_fb,
-size, fb_cpp);
-   if (!ret)
-   goto err_llb;
-   else if (ret > 1) {
-   DRM_INFO("Reducing the compressed framebuffer size. This may lead to less power savings than a non-reduced-size. Try to increase stolen memory size if available in BIOS.\n");
-
-   }
-
-   dev_priv->fbc.threshold = ret;
-
-   if (INTEL_INFO(dev_priv)->gen >= 5)
-   I915_WRITE(ILK_DPFC_CB_BASE, dev_priv->fbc.compressed_fb.start);
-   else if (IS_GM45(dev)) {
-   I915_WRITE(DPFC_CB_BASE, dev_priv->fbc.compressed_fb.start);
-   } else {
-   compressed_llb = kzalloc(sizeof(*compressed_llb), GFP_KERNEL);
-   if (!compressed_llb)
-   goto err_fb;
-
-   ret = i915_gem_stolen_insert_node(dev_priv, compressed_llb,
- 4096, 4096);
-   if (ret)
-   goto err_f

[Intel-gfx] [PATCH 6/7] drm/i915: intel_unregister_dsm_handler() doesn't need struct_mutex

2015-07-01 Thread Paulo Zanoni

From: Paulo Zanoni 

So don't grab the lock before calling the function.

Signed-off-by: Paulo Zanoni 
---
 drivers/gpu/drm/i915/intel_display.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 219e4c5..01d7cff 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -15603,12 +15603,10 @@ void intel_modeset_cleanup(struct drm_device *dev)
 */
drm_kms_helper_poll_fini(dev);
 
-   mutex_lock(&dev->struct_mutex);
-
intel_unregister_dsm_handler();
 
+   mutex_lock(&dev->struct_mutex);
intel_fbc_disable(dev);
-
mutex_unlock(&dev->struct_mutex);
 
/* flush any delayed tasks or pending work */
-- 
2.1.4


[Intel-gfx] [PATCH 5/7] drm/i915: intel_frontbuffer_flip_prepare() doesn't need struct_mutex

2015-07-01 Thread Paulo Zanoni

From: Paulo Zanoni 

So release the lock earlier.

Signed-off-by: Paulo Zanoni 
---
 drivers/gpu/drm/i915/intel_display.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index e12ed4f..219e4c5 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -11475,9 +11475,9 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
  to_intel_plane(primary)->frontbuffer_bit);
 
intel_fbc_disable(dev);
+   mutex_unlock(&dev->struct_mutex);
intel_frontbuffer_flip_prepare(dev,
  to_intel_plane(primary)->frontbuffer_bit);
-   mutex_unlock(&dev->struct_mutex);
 
trace_i915_flip_request(intel_crtc->plane, obj);
 
-- 
2.1.4


[Intel-gfx] [PATCH 7/7] drm/i915: FBC doesn't need struct_mutex anymore

2015-07-01 Thread Paulo Zanoni

From: Paulo Zanoni 

Everything is covered either by fbc.lock or mm.stolen_lock, and
intel_fbc.c is already responsible for grabbing the appropriate locks
when it needs them.

Signed-off-by: Paulo Zanoni 
---
 drivers/gpu/drm/i915/i915_debugfs.c  |  4 
 drivers/gpu/drm/i915/intel_display.c | 14 +++---
 drivers/gpu/drm/i915/intel_fbc.c |  2 --
 3 files changed, 3 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index b2f3919..98fd3c9 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1660,9 +1660,7 @@ static int i915_fbc_fc_get(void *data, u64 *val)
if (INTEL_INFO(dev)->gen < 7 || !HAS_FBC(dev))
return -ENODEV;
 
-   drm_modeset_lock_all(dev);
*val = dev_priv->fbc.false_color;
-   drm_modeset_unlock_all(dev);
 
return 0;
 }
@@ -1676,7 +1674,6 @@ static int i915_fbc_fc_set(void *data, u64 val)
if (INTEL_INFO(dev)->gen < 7 || !HAS_FBC(dev))
return -ENODEV;
 
-   drm_modeset_lock_all(dev);
mutex_lock(&dev_priv->fbc.lock);
 
reg = I915_READ(ILK_DPFC_CONTROL);
@@ -1687,7 +1684,6 @@ static int i915_fbc_fc_set(void *data, u64 val)
   (reg & ~FBC_CTL_FALSE_COLOR));
 
mutex_unlock(&dev_priv->fbc.lock);
-   drm_modeset_unlock_all(dev);
return 0;
 }
 
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 01d7cff..83d971c 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -4747,11 +4747,8 @@ static void intel_post_plane_update(struct intel_crtc *crtc)
if (crtc->atomic.update_wm_post)
intel_update_watermarks(&crtc->base);
 
-   if (atomic->update_fbc) {
-   mutex_lock(&dev->struct_mutex);
+   if (atomic->update_fbc)
intel_fbc_update(dev);
-   mutex_unlock(&dev->struct_mutex);
-   }
 
if (atomic->post_enable_primary)
intel_post_enable_primary(&crtc->base);
@@ -4783,11 +4780,8 @@ static void intel_pre_plane_update(struct intel_crtc *crtc)
if (atomic->wait_for_flips)
intel_crtc_wait_for_pending_flips(&crtc->base);
 
-   if (atomic->disable_fbc) {
-   mutex_lock(&dev->struct_mutex);
+   if (atomic->disable_fbc)
intel_fbc_disable_crtc(crtc);
-   mutex_unlock(&dev->struct_mutex);
-   }
 
if (crtc->atomic.disable_ips)
hsw_disable_ips(crtc);
@@ -11473,9 +11467,9 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
 
i915_gem_track_fb(intel_fb_obj(work->old_fb), obj,
  to_intel_plane(primary)->frontbuffer_bit);
+   mutex_unlock(&dev->struct_mutex);
 
intel_fbc_disable(dev);
-   mutex_unlock(&dev->struct_mutex);
intel_frontbuffer_flip_prepare(dev,
  to_intel_plane(primary)->frontbuffer_bit);
 
@@ -15605,9 +15599,7 @@ void intel_modeset_cleanup(struct drm_device *dev)
 
intel_unregister_dsm_handler();
 
-   mutex_lock(&dev->struct_mutex);
intel_fbc_disable(dev);
-   mutex_unlock(&dev->struct_mutex);
 
/* flush any delayed tasks or pending work */
flush_scheduled_work();
diff --git a/drivers/gpu/drm/i915/intel_fbc.c b/drivers/gpu/drm/i915/intel_fbc.c
index 3a98bc1..1c9f092 100644
--- a/drivers/gpu/drm/i915/intel_fbc.c
+++ b/drivers/gpu/drm/i915/intel_fbc.c
@@ -335,7 +335,6 @@ static void intel_fbc_work_fn(struct work_struct *__work)
struct drm_device *dev = work->crtc->dev;
struct drm_i915_private *dev_priv = dev->dev_private;
 
-   mutex_lock(&dev->struct_mutex);
mutex_lock(&dev_priv->fbc.lock);
if (work == dev_priv->fbc.fbc_work) {
/* Double check that we haven't switched fb without cancelling
@@ -352,7 +351,6 @@ static void intel_fbc_work_fn(struct work_struct *__work)
dev_priv->fbc.fbc_work = NULL;
}
mutex_unlock(&dev_priv->fbc.lock);
-   mutex_unlock(&dev->struct_mutex);
 
kfree(work);
 }
-- 
2.1.4
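
A note on the lock nesting this series relies on: the comment added to struct
i915_fbc in patch 4/7 says fbc.lock is the outer lock when it overlaps with
stolen_lock. The following is only a userspace sketch of that ordering rule,
not driver code. The rank checker, the function names and the placement of
struct_mutex in the same linear ranking are all assumptions made for
illustration.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical lock ranks: a lock may only be taken while the most
 * recently taken lock has a strictly lower rank (is "more outer").
 */
enum lock_rank {
	RANK_STRUCT_MUTEX = 1,	/* outermost in this model */
	RANK_FBC_LOCK = 2,	/* inner to struct_mutex, outer to stolen_lock */
	RANK_STOLEN_LOCK = 3,	/* innermost */
};

#define MAX_HELD 8

static enum lock_rank held[MAX_HELD];	/* stack of currently held ranks */
static int held_count;
static int order_violations;		/* how many bad acquisitions were seen */

/* Record an acquisition and flag it if it breaks the ranking. */
static void ranked_lock(enum lock_rank rank)
{
	if (held_count > 0 && held[held_count - 1] >= rank) {
		order_violations++;
		fprintf(stderr, "lock order violation: rank %d taken under rank %d\n",
			(int)rank, (int)held[held_count - 1]);
	}
	assert(held_count < MAX_HELD);
	held[held_count++] = rank;
}

/* This model only supports releasing locks in reverse acquisition order. */
static void ranked_unlock(enum lock_rank rank)
{
	assert(held_count > 0 && held[held_count - 1] == rank);
	held_count--;
}

/* Correct nesting: fbc.lock first, then stolen_lock. */
static int fbc_setup_cfb_model(void)
{
	ranked_lock(RANK_FBC_LOCK);
	ranked_lock(RANK_STOLEN_LOCK);
	/* ... the compressed framebuffer node would be inserted here ... */
	ranked_unlock(RANK_STOLEN_LOCK);
	ranked_unlock(RANK_FBC_LOCK);
	return order_violations;
}

/* Inverted nesting, which the checker should flag exactly once. */
static int inverted_order_model(void)
{
	ranked_lock(RANK_STOLEN_LOCK);
	ranked_lock(RANK_FBC_LOCK);
	ranked_unlock(RANK_FBC_LOCK);
	ranked_unlock(RANK_STOLEN_LOCK);
	return order_violations;
}
```

In the kernel itself this kind of inversion is normally caught by lockdep
rather than by a hand-written checker; the sketch only makes the intended
order explicit.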


[Intel-gfx] [PATCH 1/7] drm/i915: add simple wrappers for stolen node insertion/removal

2015-07-01 Thread Paulo Zanoni

From: Paulo Zanoni 

We want to move the FBC code out of i915_gem_stolen.c, but that code
directly adds/removes stolen memory nodes. Let's create this
abstraction, so i915_gem_stolen.c is still in control of all the
stolen memory handling. These abstractions will also allow us to add
locking assertions later.

Requested-by: Chris Wilson 
Signed-off-by: Paulo Zanoni 
---
 drivers/gpu/drm/i915/i915_drv.h|  4 
 drivers/gpu/drm/i915/i915_gem_stolen.c | 44 +-
 2 files changed, 32 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 1dbd957..b9de374 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3109,6 +3109,10 @@ static inline void i915_gem_chipset_flush(struct drm_device *dev)
 }
 
 /* i915_gem_stolen.c */
+int i915_gem_stolen_insert_node(struct drm_i915_private *dev_priv,
+   struct drm_mm_node *node, u64 size,
+   unsigned alignment);
+void i915_gem_stolen_remove_node(struct drm_mm_node *node);
 int i915_gem_init_stolen(struct drm_device *dev);
 int i915_gem_stolen_setup_compression(struct drm_device *dev, int size, int fb_cpp);
 void i915_gem_stolen_cleanup_compression(struct drm_device *dev);
diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
index 348ed5a..6b43234 100644
--- a/drivers/gpu/drm/i915/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
@@ -42,6 +42,22 @@
  * for is a boon.
  */
 
+int i915_gem_stolen_insert_node(struct drm_i915_private *dev_priv,
+   struct drm_mm_node *node, u64 size,
+   unsigned alignment)
+{
+   if (!drm_mm_initialized(&dev_priv->mm.stolen))
+   return -ENODEV;
+
+   return drm_mm_insert_node(&dev_priv->mm.stolen, node, size, alignment,
+ DRM_MM_SEARCH_DEFAULT);
+}
+
+void i915_gem_stolen_remove_node(struct drm_mm_node *node)
+{
+   drm_mm_remove_node(node);
+}
+
 static unsigned long i915_stolen_to_physical(struct drm_device *dev)
 {
struct drm_i915_private *dev_priv = dev->dev_private;
@@ -168,8 +184,7 @@ static int find_compression_threshold(struct drm_device *dev,
 */
 
/* Try to over-allocate to reduce reallocations and fragmentation. */
-   ret = drm_mm_insert_node(&dev_priv->mm.stolen, node,
-size <<= 1, 4096, DRM_MM_SEARCH_DEFAULT);
+   ret = i915_gem_stolen_insert_node(dev_priv, node, size <<= 1, 4096);
if (ret == 0)
return compression_threshold;
 
@@ -179,9 +194,7 @@ again:
(fb_cpp == 2 && compression_threshold == 2))
return 0;
 
-   ret = drm_mm_insert_node(&dev_priv->mm.stolen, node,
-size >>= 1, 4096,
-DRM_MM_SEARCH_DEFAULT);
+   ret = i915_gem_stolen_insert_node(dev_priv, node, size >>= 1, 4096);
if (ret && INTEL_INFO(dev)->gen <= 4) {
return 0;
} else if (ret) {
@@ -218,8 +231,8 @@ static int i915_setup_compression(struct drm_device *dev, int size, int fb_cpp)
if (!compressed_llb)
goto err_fb;
 
-   ret = drm_mm_insert_node(&dev_priv->mm.stolen, compressed_llb,
-4096, 4096, DRM_MM_SEARCH_DEFAULT);
+   ret = i915_gem_stolen_insert_node(dev_priv, compressed_llb,
+ 4096, 4096);
if (ret)
goto err_fb;
 
@@ -240,7 +253,7 @@ static int i915_setup_compression(struct drm_device *dev, int size, int fb_cpp)
 
 err_fb:
kfree(compressed_llb);
-   drm_mm_remove_node(&dev_priv->fbc.compressed_fb);
+   i915_gem_stolen_remove_node(&dev_priv->fbc.compressed_fb);
 err_llb:
pr_info_once("drm: not enough stolen space for compressed buffer (need %d more bytes), disabling. Hint: you may be able to increase stolen memory size in the BIOS to avoid this.\n", size);
return -ENOSPC;
@@ -269,10 +282,10 @@ void i915_gem_stolen_cleanup_compression(struct drm_device *dev)
if (dev_priv->fbc.uncompressed_size == 0)
return;
 
-   drm_mm_remove_node(&dev_priv->fbc.compressed_fb);
+   i915_gem_stolen_remove_node(&dev_priv->fbc.compressed_fb);
 
if (dev_priv->fbc.compressed_llb) {
-   drm_mm_remove_node(dev_priv->fbc.compressed_llb);
+   i915_gem_stolen_remove_node(dev_priv->fbc.compressed_llb);
kfree(dev_priv->fbc.compressed_llb);
}
 
@@ -387,7 +400,7 @@ static void
 i915_gem_object_release_stolen(struct drm_i915_gem_object *obj)
 {
if (obj->stolen) {
-   drm_mm_remove_node(obj->stolen);
+   i915_gem_stolen_remove_node(obj->stolen);
kfree(obj->stolen);

[Intel-gfx] [PATCH 4/7] drm/i915: add the FBC mutex

2015-07-01 Thread Paulo Zanoni

From: Paulo Zanoni 

Make sure we're not going to have weird races in really weird cases
where a lot of different CRTCs are doing rendering and modesets at the
same time.

With this change and the stolen_lock from the previous patch, we can start
removing the struct_mutex locking we have around FBC in the next patches.

v2:
 - Rebase (6 months later)
 - Also lock debugfs and stolen.
v3:
 - Don't lock a single value read (Chris).
 - Replace lockdep assertions with WARNs (Daniel).
 - Improve commit message.
 - Don't forget intel_pre_plane_update() locking.
v4:
 - Don't remove struct_mutex at intel_pre_plane_update() (Chris).
 - Add comment regarding locking dependencies (Chris).
 - Rebase after the stolen code rework.
 - Rebase again after drm-intel-nightly changes.

Signed-off-by: Paulo Zanoni 
---
 drivers/gpu/drm/i915/i915_debugfs.c  |  4 ++
 drivers/gpu/drm/i915/i915_drv.h  |  3 ++
 drivers/gpu/drm/i915/intel_display.c |  6 +--
 drivers/gpu/drm/i915/intel_drv.h |  1 +
 drivers/gpu/drm/i915/intel_fbc.c | 94 +++-
 5 files changed, 91 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 71ba519..b2f3919 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1633,6 +1633,7 @@ static int i915_fbc_status(struct seq_file *m, void *unused)
}
 
intel_runtime_pm_get(dev_priv);
+   mutex_lock(&dev_priv->fbc.lock);
 
if (intel_fbc_enabled(dev))
seq_puts(m, "FBC enabled\n");
@@ -1645,6 +1646,7 @@ static int i915_fbc_status(struct seq_file *m, void *unused)
   yesno(I915_READ(FBC_STATUS2) &
 FBC_COMPRESSION_MASK));
 
+   mutex_unlock(&dev_priv->fbc.lock);
intel_runtime_pm_put(dev_priv);
 
return 0;
@@ -1675,6 +1677,7 @@ static int i915_fbc_fc_set(void *data, u64 val)
return -ENODEV;
 
drm_modeset_lock_all(dev);
+   mutex_lock(&dev_priv->fbc.lock);
 
reg = I915_READ(ILK_DPFC_CONTROL);
dev_priv->fbc.false_color = val;
@@ -1683,6 +1686,7 @@ static int i915_fbc_fc_set(void *data, u64 val)
   (reg | FBC_CTL_FALSE_COLOR) :
   (reg & ~FBC_CTL_FALSE_COLOR));
 
+   mutex_unlock(&dev_priv->fbc.lock);
drm_modeset_unlock_all(dev);
return 0;
 }
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 0b908b1..4d3d4103 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -899,6 +899,9 @@ enum fb_op_origin {
 };
 
 struct i915_fbc {
+   /* This is always the inner lock when overlapping with struct_mutex and
+* it's the outer lock when overlapping with stolen_lock. */
+   struct mutex lock;
unsigned long uncompressed_size;
unsigned threshold;
unsigned int fb_id;
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 5de1ded..e12ed4f 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -4783,11 +4783,9 @@ static void intel_pre_plane_update(struct intel_crtc *crtc)
if (atomic->wait_for_flips)
intel_crtc_wait_for_pending_flips(&crtc->base);
 
-   if (atomic->disable_fbc &&
-   dev_priv->fbc.crtc == crtc) {
+   if (atomic->disable_fbc) {
mutex_lock(&dev->struct_mutex);
-   if (dev_priv->fbc.crtc == crtc)
-   intel_fbc_disable(dev);
+   intel_fbc_disable_crtc(crtc);
mutex_unlock(&dev->struct_mutex);
}
 
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 82abbfa..63d7d32 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -1252,6 +1252,7 @@ bool intel_fbc_enabled(struct drm_device *dev);
 void intel_fbc_update(struct drm_device *dev);
 void intel_fbc_init(struct drm_i915_private *dev_priv);
 void intel_fbc_disable(struct drm_device *dev);
+void intel_fbc_disable_crtc(struct intel_crtc *crtc);
 void intel_fbc_invalidate(struct drm_i915_private *dev_priv,
  unsigned int frontbuffer_bits,
  enum fb_op_origin origin);
diff --git a/drivers/gpu/drm/i915/intel_fbc.c b/drivers/gpu/drm/i915/intel_fbc.c
index dcd83ab..3a98bc1 100644
--- a/drivers/gpu/drm/i915/intel_fbc.c
+++ b/drivers/gpu/drm/i915/intel_fbc.c
@@ -336,6 +336,7 @@ static void intel_fbc_work_fn(struct work_struct *__work)
struct drm_i915_private *dev_priv = dev->dev_private;
 
mutex_lock(&dev->struct_mutex);
+   mutex_lock(&dev_priv->fbc.lock);
if (work == dev_priv->fbc.fbc_work) {
/* Double check that we haven't switched fb without cancelling
 * the prior work.
@@ -350,6 +351,7 @@ static void intel_fbc_work_fn(struct work_struct *__work

[Intel-gfx] [PATCH 0/7] FBC (+stolen) locking, v4

2015-07-01 Thread Paulo Zanoni

From: Paulo Zanoni 

Hi

So, based on the reviews, here's v4 of the series, now with
dev_priv->mm.stolen_lock. This allows us to completely get rid of struct_mutex
locking around FBC calls. Kudos to Chris for the suggestion.

The patch that added intel_fbc_stop() got removed from this series because the
addition of mm.stolen_lock turned it into just an optimization instead of a
bugfix. Let's leave it to another series, where I'll also try to clarify all the
function names involved.

Since we didn't seem to reach any conclusions on the lockdep_assert_held vs
WARN_ON discussion, I kept using WARN_ON since it's what the maintainer is
asking for and since it's what's winning in usage count. If we decide to change it,
we can always do it in a later patch.

Thanks,
Paulo

Paulo Zanoni (7):
  drm/i915: add simple wrappers for stolen node insertion/removal
  drm/i915: move FBC code out of i915_gem_stolen.c
  drm/i915: add dev_priv->mm.stolen_lock
  drm/i915: add the FBC mutex
  drm/i915: intel_frontbuffer_flip_prepare() doesn't need struct_mutex
  drm/i915: intel_unregister_dsm_handler() doesn't need struct_mutex
  drm/i915: FBC doesn't need struct_mutex anymore

 drivers/gpu/drm/i915/i915_debugfs.c|   8 +-
 drivers/gpu/drm/i915/i915_dma.c|   1 +
 drivers/gpu/drm/i915/i915_drv.h|  14 +-
 drivers/gpu/drm/i915/i915_gem_stolen.c | 215 +
 drivers/gpu/drm/i915/intel_display.c   |  20 +--
 drivers/gpu/drm/i915/intel_drv.h   |   2 +
 drivers/gpu/drm/i915/intel_fbc.c   | 239 ++---
 7 files changed, 309 insertions(+), 190 deletions(-)

-- 
2.1.4
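
For readers following the WARN_ON vs lockdep_assert_held point above, here is
a minimal userspace sketch of the assertion pattern the stolen wrappers use.
It is an illustration under stated assumptions, not the driver code: the
model_* names and MODEL_WARN_ON are hypothetical stand-ins for the kernel's
mutex API and WARN_ON(). As in the kernel, an "is locked" check only proves
that somebody holds the lock, whereas lockdep_assert_held() checks that the
current task holds it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a kernel mutex: it only tracks "held or not". */
struct model_mutex {
	bool locked;
};

static int warn_count;	/* number of times a locking assertion fired */

/* Simplified analogue of WARN_ON(): report the problem and keep going. */
#define MODEL_WARN_ON(cond)						\
	do {								\
		if (cond) {						\
			warn_count++;					\
			fprintf(stderr, "WARN at %s:%d: %s\n",		\
				__FILE__, __LINE__, #cond);		\
		}							\
	} while (0)

static void model_mutex_lock(struct model_mutex *m) { m->locked = true; }
static void model_mutex_unlock(struct model_mutex *m) { m->locked = false; }
static bool model_mutex_is_locked(const struct model_mutex *m) { return m->locked; }

/* Hypothetical model of the stolen-memory allocator state. */
struct model_stolen {
	struct model_mutex lock;
	bool initialized;
	int nodes;		/* number of nodes currently inserted */
};

/*
 * Same shape as the wrapper in the series: assert that the caller took
 * the lock, bail out if the allocator was never set up, then insert.
 */
static int model_stolen_insert_node(struct model_stolen *s)
{
	MODEL_WARN_ON(!model_mutex_is_locked(&s->lock));

	if (!s->initialized)
		return -ENODEV;

	s->nodes++;
	return 0;
}
```

Note that the warning does not abort the operation: an unlocked caller still
gets its node inserted, which matches how WARN_ON() behaves.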


Re: [Intel-gfx] [PATCH v2] drm/i915: Report correct GGTT space usage

2015-07-01 Thread shuang . he

Tested-By: Intel Graphics QA PRTS (Patch Regression Test System Contact: shuang...@intel.com)
Task id: 6678
-Summary-
Platform  Delta  drm-intel-nightly  Series Applied
ILK  302/302  302/302
SNB  312/316  312/316
IVB  343/343  343/343
BYT -2  287/287  285/287
HSW  380/380  380/380
-Detailed-
Platform  Test  drm-intel-nightly  Series Applied
*BYT  igt@gem_partial_pwrite_pread@reads-display  PASS(1)  FAIL(1)
*BYT  igt@gem_tiled_partial_pwrite_pread@writes-after-reads  PASS(1)  FAIL(1)
Note: You need to pay more attention to lines starting with '*'

Re: [Intel-gfx] [PATCH v2 07/10] drm/i915: Try to make sure cxsr is disabled around plane enable/disable

2015-07-01 Thread Paulo Zanoni

2015-07-01 16:13 GMT-03:00  :
> From: Ville Syrjälä 
>
> CxSR (or maxfifo on VLV/CHV) blocks some changes to the plane control
> register (enable bit at least, not quite sure about the rest). So in
> order to have the plane enable/disable when we want we need to first
> kick the hardware out of cxsr.
>
> Unfortunately this requires some extra vblank waits. For the CxSR
> enable after the plane update we should eventually use an async
> vblank worker, but since we don't have that just do sync vblank
> waits. For the disable case we have no choice but to do it
> synchronously.
>
> v2: Don't add a spurious intel_pre_plane_update() to crtc disable
>
> Cc: Paulo Zanoni 
> Reviewed-by: Clint Taylor 
> Tested-by: Clint Taylor 
> Signed-off-by: Ville Syrjälä 
> ---
> Paulo noticed some frontbuffer_bits WARNs from this patch, and sure enough
> I accidentally added another intel_pre_plane_update() to the crtc disable loop.
> I failed to notice because I had commented out the frontbuffer_bits WARNs earlier
> from my tree since they were too noisy.

Ok, so I didn't test this exact v2 patch since I'm on -nightly and v1
is already applied there. But I removed the extra
intel_pre_plane_update() added by v1 of the patch, booted, and I can
confirm the WARNs are gone. The test I was using to reproduce the
WARNs was:

sudo ./kms_frontbuffer_tracking --run-subtest fbc-1p-rte

Kinda-tested-by: Paulo Zanoni 

Thanks for the quick response and fix!

>
>  drivers/gpu/drm/i915/intel_display.c | 34 +-
>  drivers/gpu/drm/i915/intel_drv.h |  3 +++
>  drivers/gpu/drm/i915/intel_pm.c  | 11 ---
>  3 files changed, 36 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index d67b5f1..defc4ce 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -4716,6 +4716,9 @@ static void intel_post_plane_update(struct intel_crtc *crtc)
>
> intel_frontbuffer_flip(dev, atomic->fb_bits);
>
> +   if (atomic->disable_cxsr)
> +   crtc->wm.cxsr_allowed = true;
> +
> if (crtc->atomic.update_wm_post)
> intel_update_watermarks(&crtc->base);
>
> @@ -4765,6 +4768,11 @@ static void intel_pre_plane_update(struct intel_crtc *crtc)
>
> if (atomic->pre_disable_primary)
> intel_pre_disable_primary(&crtc->base);
> +
> +   if (atomic->disable_cxsr) {
> +   crtc->wm.cxsr_allowed = false;
> +   intel_set_memory_cxsr(dev_priv, false);
> +   }
>  }
>
>  static void intel_crtc_disable_planes(struct drm_crtc *crtc, unsigned plane_mask)
> @@ -11646,12 +11654,26 @@ int intel_plane_atomic_calc_changes(struct drm_crtc_state *crtc_state,
>  plane->base.id, was_visible, visible,
>  turn_off, turn_on, mode_changed);
>
> -   if (turn_on)
> +   if (turn_on) {
> intel_crtc->atomic.update_wm_pre = true;
> -   else if (turn_off)
> +   /* must disable cxsr around plane enable/disable */
> +   if (plane->type != DRM_PLANE_TYPE_CURSOR) {
> +   intel_crtc->atomic.disable_cxsr = true;
> +   /* to potentially re-enable cxsr */
> +   intel_crtc->atomic.wait_vblank = true;
> +   intel_crtc->atomic.update_wm_post = true;
> +   }
> +   } else if (turn_off) {
> intel_crtc->atomic.update_wm_post = true;
> -   else if (intel_wm_need_update(plane, plane_state))
> +   /* must disable cxsr around plane enable/disable */
> +   if (plane->type != DRM_PLANE_TYPE_CURSOR) {
> +   if (is_crtc_enabled)
> +   intel_crtc->atomic.wait_vblank = true;
> +   intel_crtc->atomic.disable_cxsr = true;
> +   }
> +   } else if (intel_wm_need_update(plane, plane_state)) {
> intel_crtc->atomic.update_wm_pre = true;
> +   }
>
> if (visible)
> intel_crtc->atomic.fb_bits |=
> @@ -11808,8 +11830,8 @@ static int intel_crtc_atomic_check(struct drm_crtc *crtc,
> if (pipe_config->quirks & PIPE_CONFIG_QUIRK_INITIAL_PLANES)
> intel_crtc_check_initial_planes(crtc, crtc_state);
>
> -   if (mode_changed)
> -   intel_crtc->atomic.update_wm_post = !crtc_state->active;
> +   if (mode_changed && !crtc_state->active)
> +   intel_crtc->atomic.update_wm_post = true;
>
> if (mode_changed && crtc_state->enable &&
> dev_priv->display.crtc_compute_clock &&
> @@ -14089,6 +14111,8 @@ static void intel_crtc_init(struct drm_device *dev, int pipe)
> intel_crtc->cursor_cntl = ~0;
> intel_crtc->cursor_size = ~0;
>
> +   intel_crtc->wm.cxsr_allowed = true;
> +
> BUG_ON(pipe >= ARRAY_SIZE(dev_pri

Re: [Intel-gfx] [PATCH] drm/i915: Per-DDI I_boost override

2015-07-01 Thread Paulo Zanoni

2015-07-01 9:19 GMT-03:00 Daniel Vetter :
> On Thu, Jun 25, 2015 at 03:16:37PM +0200, Daniel Vetter wrote:
>> On Thu, Jun 25, 2015 at 02:18:06PM +0300, David Weinehall wrote:
>> > On Thu, Jun 25, 2015 at 09:37:22AM +0200, Daniel Vetter wrote:
>> > > On Thu, Jun 25, 2015 at 09:14:09AM +0300, David Weinehall wrote:
>> > > > Looks good.
>> > > >
>> > > > Reviewed-by: David Weinehall 
>> > > >
>> > > > On Thu, Jun 18, 2015 at 02:23:37PM +0300, Antti Koskipaa wrote:
>> > > > > An OEM may request increased I_boost beyond the recommended values
>> > > > > by specifying an I_boost value to be applied to all swing entries for
>> > > > > a port. These override values are specified in VBT.
>> > > > >
>> > > > > Issue: VIZ-5676
>> > > > > Signed-off-by: Antti Koskipaa 
>> > >
>> > > What dependencies has this patch? Please either mention that or include this
>> > > patch in whatever other patch series it needs, since I can't apply this.
>> > > Maybe it also simply needs a rebase, but conflicts don't look like that.
>> >
>> > As mentioned in Antti's reply to the patch, it depends on
>> > "drm/i915/skl: Buffer translation improvements".
>> >
>> > Antti's patch still applies against v2 of my patch, but with some slight
>> > fuzz.
>>
>> Light fuzz I can handle. In the future can you pls just include Antti's
>> patch when submitting yours in the same series? That way I won't miss it.
>
> Queued for -next, thanks for the patch.

Git bisect tells me this commit introduced the following message when
I boot BDW:

[drm:intel_parse_bios [i915]] *ERROR* General definiton block child device size is too small.


These are the lines around it:

[8.607616] [drm] Driver supports precise vblank timestamp query.
[8.607623] [drm:init_vbt_defaults] Set default to SSC at 12 kHz
[8.607630] [drm:validate_vbt] Using VBT from OpRegion: $VBT HASWELLd
[8.607638] [drm:parse_general_features] BDB_GENERAL_FEATURES int_tv_support 0 int_crt_support 0 lvds_use_ssc 0 lvds_ssc_freq 12 display_clock_mode 0 fdi_rx_polarity_inverted 0
[8.607651] [drm:parse_general_definitions] crt_ddc_bus_pin: 2
[8.607658] [drm:parse_lfp_panel_data] DRRS supported mode is static
[8.607674] [drm:parse_lfp_panel_data] Found panel mode in BIOS VBT tables:
[8.607682] [drm:drm_mode_debug_printmodeline] Modeline 0:"1920x1080" 0 138780 1920 1966 1996 2080 1080 1082 1086 1112 0x8 0xa
[8.607692] [drm:parse_lfp_panel_data] VBT initial LVDS value 30033c
[8.607699] [drm:parse_lfp_backlight] VBT backlight PWM modulation frequency 200 Hz, active high, min brightness 0, level 255
[8.607719] [drm:parse_sdvo_panel_data] Found SDVO panel mode in BIOS VBT tables:
[8.607728] [drm:drm_mode_debug_printmodeline] Modeline 0:"1600x1200" 0 162000 1600 1664 1856 2160 1200 1201 1204 1250 0x8 0xa
[8.607738] [drm:parse_sdvo_device_mapping] different child size is found. Invalid.
[8.607773] [drm:intel_parse_bios [i915]] *ERROR* General definiton block child device size is too small.
[8.607782] [drm:parse_driver_features] DRRS State Enabled:1
[8.609574] [drm:intel_dsm_pci_probe] no _DSM method for intel device
[8.609604] [drm:i915_gem_init_stolen] found 33554432 bytes of stolen memory at ae00


> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
> ___
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx



-- 
Paulo Zanoni

[Intel-gfx] [PATCH v2 07/10] drm/i915: Try to make sure cxsr is disabled around plane enable/disable

2015-07-01 Thread ville . syrjala

From: Ville Syrjälä 

CxSR (or maxfifo on VLV/CHV) blocks some changes to the plane control
register (enable bit at least, not quite sure about the rest). So in
order to have the plane enable/disable when we want we need to first
kick the hardware out of cxsr.

Unfortunately this requires some extra vblank waits. For the CxSR
enable after the plane update we should eventually use an async
vblank worker, but since we don't have that just do sync vblank
waits. For the disable case we have no choice but to do it
synchronously.

v2: Don't add a spurious intel_pre_plane_update() to crtc disable

Cc: Paulo Zanoni 
Reviewed-by: Clint Taylor 
Tested-by: Clint Taylor 
Signed-off-by: Ville Syrjälä 
---
Paulo noticed some frontbuffer_bits WARNs from this patch, and sure enough
I accidentally added another intel_pre_plane_update() to the crtc disable loop.
I failed to notice because I had commented out the frontbuffer_bits WARNs earlier
from my tree since they were too noisy.

 drivers/gpu/drm/i915/intel_display.c | 34 +-
 drivers/gpu/drm/i915/intel_drv.h |  3 +++
 drivers/gpu/drm/i915/intel_pm.c  | 11 ---
 3 files changed, 36 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index d67b5f1..defc4ce 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -4716,6 +4716,9 @@ static void intel_post_plane_update(struct intel_crtc *crtc)
 
intel_frontbuffer_flip(dev, atomic->fb_bits);
 
+   if (atomic->disable_cxsr)
+   crtc->wm.cxsr_allowed = true;
+
if (crtc->atomic.update_wm_post)
intel_update_watermarks(&crtc->base);
 
@@ -4765,6 +4768,11 @@ static void intel_pre_plane_update(struct intel_crtc *crtc)
 
if (atomic->pre_disable_primary)
intel_pre_disable_primary(&crtc->base);
+
+   if (atomic->disable_cxsr) {
+   crtc->wm.cxsr_allowed = false;
+   intel_set_memory_cxsr(dev_priv, false);
+   }
 }
 
 static void intel_crtc_disable_planes(struct drm_crtc *crtc, unsigned plane_mask)
@@ -11646,12 +11654,26 @@ int intel_plane_atomic_calc_changes(struct drm_crtc_state *crtc_state,
 plane->base.id, was_visible, visible,
 turn_off, turn_on, mode_changed);
 
-   if (turn_on)
+   if (turn_on) {
intel_crtc->atomic.update_wm_pre = true;
-   else if (turn_off)
+   /* must disable cxsr around plane enable/disable */
+   if (plane->type != DRM_PLANE_TYPE_CURSOR) {
+   intel_crtc->atomic.disable_cxsr = true;
+   /* to potentially re-enable cxsr */
+   intel_crtc->atomic.wait_vblank = true;
+   intel_crtc->atomic.update_wm_post = true;
+   }
+   } else if (turn_off) {
intel_crtc->atomic.update_wm_post = true;
-   else if (intel_wm_need_update(plane, plane_state))
+   /* must disable cxsr around plane enable/disable */
+   if (plane->type != DRM_PLANE_TYPE_CURSOR) {
+   if (is_crtc_enabled)
+   intel_crtc->atomic.wait_vblank = true;
+   intel_crtc->atomic.disable_cxsr = true;
+   }
+   } else if (intel_wm_need_update(plane, plane_state)) {
intel_crtc->atomic.update_wm_pre = true;
+   }
 
if (visible)
intel_crtc->atomic.fb_bits |=
@@ -11808,8 +11830,8 @@ static int intel_crtc_atomic_check(struct drm_crtc *crtc,
if (pipe_config->quirks & PIPE_CONFIG_QUIRK_INITIAL_PLANES)
intel_crtc_check_initial_planes(crtc, crtc_state);
 
-   if (mode_changed)
-   intel_crtc->atomic.update_wm_post = !crtc_state->active;
+   if (mode_changed && !crtc_state->active)
+   intel_crtc->atomic.update_wm_post = true;
 
if (mode_changed && crtc_state->enable &&
dev_priv->display.crtc_compute_clock &&
@@ -14089,6 +14111,8 @@ static void intel_crtc_init(struct drm_device *dev, int pipe)
intel_crtc->cursor_cntl = ~0;
intel_crtc->cursor_size = ~0;
 
+   intel_crtc->wm.cxsr_allowed = true;
+
BUG_ON(pipe >= ARRAY_SIZE(dev_priv->plane_to_crtc_mapping) ||
   dev_priv->plane_to_crtc_mapping[intel_crtc->plane] != NULL);
dev_priv->plane_to_crtc_mapping[intel_crtc->plane] = &intel_crtc->base;
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index f26a680..4e8d13e 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -507,6 +507,7 @@ struct intel_crtc_atomic_commit {
/* Sleepable operations to perform before commit */
bool wait_for_flips;
bool disable_fbc;
+   bool disable_cxsr;
bool pre_disable_primary;
bool update_wm_pr

Re: [Intel-gfx] [PATCH v2 0/4] drm/i915: Re-enable HDMI 12bpc

2015-07-01 Thread Imre Deak

On Tue, 2015-06-30 at 15:33 +0300, ville.syrj...@linux.intel.com wrote:
> From: Ville Syrjälä 
> 
> Here's my second attempt at flipping HDMI 12bpc back on. In my last attempt [1]
> Imre found that lots of standard CEA modes (1080p60 etc.) no longer work on
> BXT due to the 12bpc port clock landing in a range that the DPLL can't generate.
> CHV has the same limitation but it doesn't do 12bpc so the situation there
> isn't as bad.
> 
> This series attempts to work around this problem by falling back to 8bpc when
> the 1.5x port clock frequency turns out to be bad. Additionally we will from
> now on filter out any mode where both the 8bpc and 12bpc port clocks are bad.
> 
> [1] http://lists.freedesktop.org/archives/intel-gfx/2015-June/068988.html
> 
> Ville Syrjälä (4):
>   drm/i915: Fix HDMI 12bpc and pixel repeat clock readout for DDI
> platforms
>   drm/i915: Bump HDMI min port clock to 25 MHz
>   drm/i915: Account for CHV/BXT DPLL clock limitations
>   Revert "drm/i915: Disable 12bpc hdmi for now"
> 
>  drivers/gpu/drm/i915/intel_ddi.c  | 49 +-
>  drivers/gpu/drm/i915/intel_hdmi.c | 55 
> +++
>  2 files changed, 63 insertions(+), 41 deletions(-)

Looks ok to me. 12bpc modesets and the fall-back logic seem to work
fine on BXT. On the series:
Reviewed-and-tested-by: Imre Deak 
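
For reference, the fall-back described in the cover letter can be sketched
roughly as below. This is a minimal sketch, not the code from the series:
hdmi_pick_bpc() and example_port_clock_valid() are hypothetical names, and
apart from the 25 MHz minimum the clock limits in the example predicate are
made up purely for illustration (the real DPLL ranges are platform specific
and are not spelled out in this thread).

```c
#include <stdbool.h>

/*
 * Example validity check with invented limits. Only the 25 MHz minimum
 * comes from the series; the forbidden window and the upper bound are
 * placeholders.
 */
static bool example_port_clock_valid(int khz)
{
	if (khz < 25000)
		return false;
	if (khz > 216000 && khz < 240000)	/* made-up unreachable window */
		return false;
	return khz <= 300000;			/* made-up upper limit */
}

/*
 * Returns 12 or 8 for the chosen bpc, or 0 when the mode should be
 * filtered out because neither port clock can be generated.
 */
static int hdmi_pick_bpc(int pixel_clock_khz, bool sink_supports_12bpc,
			 bool (*valid)(int khz))
{
	/* 12bpc needs a port clock of 1.5x the pixel clock */
	int clock_12bpc = pixel_clock_khz * 3 / 2;

	if (sink_supports_12bpc && valid(clock_12bpc))
		return 12;
	if (valid(pixel_clock_khz))
		return 8;
	return 0;
}
```

With the placeholder limits above, a 148.5 MHz mode lands at 222.75 MHz for
12bpc, falls inside the invented window, and so drops back to 8bpc.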



Re: [Intel-gfx] [PATCH] drm/i915: Report correct GGTT space usage

2015-07-01 Thread shuang . he

Tested-By: Intel Graphics QA PRTS (Patch Regression Test System Contact: 
shuang...@intel.com)
Task id: 6676
-Summary-
Platform  Delta  drm-intel-nightly  Series Applied
ILK  302/302  302/302
SNB  312/316  312/316
IVB  345/345  345/345
BYT -3  287/287  284/287
HSW  382/382  382/382
-Detailed-
Platform  Test  drm-intel-nightly  Series Applied
*BYT  igt@gem_partial_pwrite_pread@reads  PASS(1)  FAIL(1)
*BYT  igt@gem_partial_pwrite_pread@reads-display  PASS(1)  FAIL(1)
*BYT  igt@gem_partial_pwrite_pread@reads-uncached  PASS(1)  FAIL(1)
Note: You need to pay more attention to lines starting with '*'

Re: [Intel-gfx] [PATCH libdrm v2 1/2] intel: Add EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag.

2015-07-01 Thread Emil Velikov

Hi Michel,

Although I cannot comment on the exact implementation I can give you
some general tips which you might find useful.

On 1 July 2015 at 16:28, Michel Thierry  wrote:
> Gen8+ supports 48-bit virtual addresses, but some objects must always be
> allocated inside the 32-bit address range.
>
> Specifically, any resource used with flat/heapless (0x-0xf000)
> General State Heap (GSH) or Instruction State Heap (ISH) must be in a
> 32-bit range, because the General State Offset and Instruction State Offset
> are limited to 32 bits.
>
> Provide a flag to set when the 4GB limit is not necessary in a given bo.
> 48-bit range will only be used when explicitly requested.
>
> Calls to the new drm_intel_bo_emit_reloc_48bit function will have this flag
> set automatically, while calls to drm_intel_bo_emit_reloc will clear it.
>
> v2: Make set/clear functions nops on pre-gen8 platforms, and use them
> internally in emit_reloc functions (Ben)
> s/48BADDRESS/48B_ADDRESS/ (Dave)
>
> Cc: Ben Widawsky 
> Cc: Dave Gordon 
> Cc: dri-de...@lists.freedesktop.org
> Signed-off-by: Michel Thierry 
> ---
>  include/drm/i915_drm.h|  3 ++-
>  intel/intel_bufmgr.c  | 24 +
>  intel/intel_bufmgr.h  |  8 ++-
>  intel/intel_bufmgr_gem.c  | 54 
> +++
>  intel/intel_bufmgr_priv.h | 11 ++
>  5 files changed, 94 insertions(+), 6 deletions(-)
>
> diff --git a/include/drm/i915_drm.h b/include/drm/i915_drm.h
> index ded43b1..426b25c 100644
> --- a/include/drm/i915_drm.h
> +++ b/include/drm/i915_drm.h
> @@ -680,7 +680,8 @@ struct drm_i915_gem_exec_object2 {
>  #define EXEC_OBJECT_NEEDS_FENCE (1<<0)
>  #define EXEC_OBJECT_NEEDS_GTT  (1<<1)
>  #define EXEC_OBJECT_WRITE  (1<<2)
> -#define __EXEC_OBJECT_UNKNOWN_FLAGS -(EXEC_OBJECT_WRITE<<1)
> +#define EXEC_OBJECT_SUPPORTS_48B_ADDRESS (1<<3)
> +#define __EXEC_OBJECT_UNKNOWN_FLAGS -(EXEC_OBJECT_SUPPORTS_48B_ADDRESS<<1)
Perhaps you already know this but changes like these go in _after_ the
updated kernel header is part of linux-next or a released kernel
version.
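
As an aside on why the __EXEC_OBJECT_UNKNOWN_FLAGS line has to move whenever
a flag is added: negating the bit just above the highest known flag yields a
mask of every bit the kernel does not know about. A minimal sketch follows;
the flag values are copied from the hunk above, while
exec_object_flags_valid() is a hypothetical helper and not the kernel's
actual check.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXEC_OBJECT_NEEDS_FENCE			(1 << 0)
#define EXEC_OBJECT_NEEDS_GTT			(1 << 1)
#define EXEC_OBJECT_WRITE			(1 << 2)
#define EXEC_OBJECT_SUPPORTS_48B_ADDRESS	(1 << 3)
/* -(1 << 4) == ...11110000: all bits above the last known flag */
#define __EXEC_OBJECT_UNKNOWN_FLAGS \
	-(EXEC_OBJECT_SUPPORTS_48B_ADDRESS << 1)

/* Hypothetical validation helper: reject any flag bit we do not know. */
static bool exec_object_flags_valid(uint64_t flags)
{
	return (flags & __EXEC_OBJECT_UNKNOWN_FLAGS) == 0;
}
```

The int mask is sign-extended when combined with the 64-bit flags word, so
the upper 32 bits are covered as well.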

> __u64 flags;
>
> __u64 rsvd1;
> diff --git a/intel/intel_bufmgr.c b/intel/intel_bufmgr.c
> index 14ea9f9..590a855 100644
> --- a/intel/intel_bufmgr.c
> +++ b/intel/intel_bufmgr.c
> @@ -188,6 +188,18 @@ drm_intel_bufmgr_check_aperture_space(drm_intel_bo ** 
> bo_array, int count)
> return bo_array[0]->bufmgr->check_aperture_space(bo_array, count);
>  }
>
> +void drm_intel_bo_set_supports_48b_address(drm_intel_bo *bo)
> +{
> +   if (bo->bufmgr->bo_set_supports_48b_address)
> +   bo->bufmgr->bo_set_supports_48b_address(bo);
> +}
> +
> +void drm_intel_bo_clear_supports_48b_address(drm_intel_bo *bo)
> +{
> +   if (bo->bufmgr->bo_clear_supports_48b_address)
> +   bo->bufmgr->bo_clear_supports_48b_address(bo);
> +}
> +
>  int
>  drm_intel_bo_flink(drm_intel_bo *bo, uint32_t * name)
>  {
> @@ -202,6 +214,18 @@ drm_intel_bo_emit_reloc(drm_intel_bo *bo, uint32_t 
> offset,
> drm_intel_bo *target_bo, uint32_t target_offset,
> uint32_t read_domains, uint32_t write_domain)
>  {
> +   drm_intel_bo_clear_supports_48b_address(target_bo);
> +   return bo->bufmgr->bo_emit_reloc(bo, offset,
> +target_bo, target_offset,
> +read_domains, write_domain);
> +}
> +
> +int
> +drm_intel_bo_emit_reloc_48bit(drm_intel_bo *bo, uint32_t offset,
> +   drm_intel_bo *target_bo, uint32_t target_offset,
> +   uint32_t read_domains, uint32_t write_domain)
> +{
> +   drm_intel_bo_set_supports_48b_address(target_bo);
> return bo->bufmgr->bo_emit_reloc(bo, offset,
>  target_bo, target_offset,
>  read_domains, write_domain);
> diff --git a/intel/intel_bufmgr.h b/intel/intel_bufmgr.h
> index 285919e..62480cb 100644
> --- a/intel/intel_bufmgr.h
> +++ b/intel/intel_bufmgr.h
> @@ -87,7 +87,8 @@ struct _drm_intel_bo {
> /**
>  * Last seen card virtual address (offset from the beginning of the
>  * aperture) for the object.  This should be used to fill relocation
> -* entries when calling drm_intel_bo_emit_reloc()
> +* entries when calling drm_intel_bo_emit_reloc() or
> +* drm_intel_bo_emit_reloc_48bit()
>  */
> uint64_t offset64;
>  };
> @@ -137,6 +138,8 @@ void drm_intel_bo_wait_rendering(drm_intel_bo *bo);
>
>  void drm_intel_bufmgr_set_debug(drm_intel_bufmgr *bufmgr, int enable_debug);
>  void drm_intel_bufmgr_destroy(drm_intel_bufmgr *bufmgr);
> +void drm_intel_bo_set_supports_48b_address(drm_intel_bo *bo);
> +void drm_intel_bo_clear_supports_48b_address(drm_intel_bo *bo);
Are these two internal/implementation-specific functions? If so
please don't include them in this public header a

Re: [Intel-gfx] [PATCH 1/2] drm/i915: Extend GET_APERTURE ioctl to report available map space

2015-07-01 Thread Ankitprasad Sharma

On Wed, 2015-07-01 at 15:39 +0200, Daniel Vetter wrote:
> On Wed, Jul 01, 2015 at 02:55:12PM +0530, ankitprasad.r.sha...@intel.com 
> wrote:
> > From: Rodrigo Vivi 
> > 
> > When constructing a batchbuffer, it is sometimes crucial to know the
> > largest hole into which we can fit a fenceable buffer (for example when
> > handling very large objects on gen2 and gen3). This depends on the
> > fragmentation of pinned buffers inside the aperture, a question only the
> > kernel can easily answer.
> > 
> > This patch extends the current DRM_I915_GEM_GET_APERTURE ioctl to
> > include a couple of new fields in its reply to userspace - the total
> > amount of space available in the mappable region of the aperture and
> > also the single largest block available.
> > 
> > This is not quite what userspace wants to answer the question of whether
> > this batch will fit as fences are also required to meet severe alignment
> > constraints within the batch. For this purpose, a third conservative
> > estimate of largest fence available is also provided. For when userspace
> > needs more than one batch, we also provide the cumulative space
> > available for fences such that it has some additional guidance to how
> > much space it could allocate to fences. Conservatism still wins.
> > 
> > The patch also adds a debugfs file for convenient testing and reporting.
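
The "total available" and "single largest block" figures described in the
commit message amount to a walk over the pinned ranges in address order. A
minimal userspace sketch of that accounting, assuming the ranges are already
sorted and non-overlapping, is given below. All names are hypothetical, and
the fence alignment constraints mentioned above are deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>

/* A pinned range [start, end) inside the aperture. */
struct pinned_range {
	uint64_t start;
	uint64_t end;
};

struct hole_stats {
	uint64_t available;	/* sum of all holes */
	uint64_t largest;	/* single largest hole */
};

/* Ranges must be sorted by start offset and must not overlap. */
static struct hole_stats
scan_holes(const struct pinned_range *r, size_t n, uint64_t aperture_end)
{
	struct hole_stats s = { 0, 0 };
	uint64_t last = 0;	/* end of the previous pinned range */
	size_t i;

	for (i = 0; i < n; i++) {
		if (r[i].start > last) {
			uint64_t hole = r[i].start - last;

			s.available += hole;
			if (hole > s.largest)
				s.largest = hole;
		}
		if (r[i].end > last)
			last = r[i].end;
	}

	/* trailing hole between the last pinned range and the end */
	if (aperture_end > last) {
		uint64_t hole = aperture_end - last;

		s.available += hole;
		if (hole > s.largest)
			s.largest = hole;
	}

	return s;
}
```

With nothing pinned, the whole aperture is reported as one hole.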
> > 
> > v2: The first object cannot end at offset 0, so we can use last==0 to
> > detect the empty list.
> > 
> > v3: Expand all values to 64bit, just in case.
> > Report total mappable aperture size for userspace that cannot easily
> > determine it by inspecting the PCI device.
> > 
> > v4: (Rodrigo) Fixed rebase conflicts.
> > 
> > v5: Rebased to the latest drm-intel-nightly (Ankit)
> > 
> > Signed-off-by: Chris Wilson 
> > Signed-off-by: Rodrigo Vivi 
> > ---
> >  drivers/gpu/drm/i915/i915_debugfs.c |  27 +
> >  drivers/gpu/drm/i915/i915_gem.c | 116 
> > ++--
> >  2 files changed, 139 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
> > b/drivers/gpu/drm/i915/i915_debugfs.c
> > index 31d8768..49ec438 100644
> > --- a/drivers/gpu/drm/i915/i915_debugfs.c
> > +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> > @@ -512,6 +512,32 @@ static int i915_gem_object_info(struct seq_file *m, 
> > void* data)
> > return 0;
> >  }
> >  
> > +static int i915_gem_aperture_info(struct seq_file *m, void *data)
> > +{
> > +   struct drm_info_node *node = m->private;
> > +   struct drm_i915_gem_get_aperture arg;
> > +   int ret;
> > +
> > +   ret = i915_gem_get_aperture_ioctl(node->minor->dev, &arg, NULL);
> > +   if (ret)
> > +   return ret;
> > +
> > +   seq_printf(m, "Total size of the GTT: %llu bytes\n",
> > +  arg.aper_size);
> > +   seq_printf(m, "Available space in the GTT: %llu bytes\n",
> > +  arg.aper_available_size);
> > +   seq_printf(m, "Available space in the mappable aperture: %llu bytes\n",
> > +  arg.map_available_size);
> > +   seq_printf(m, "Single largest space in the mappable aperture: %llu 
> > bytes\n",
> > +  arg.map_largest_size);
> > +   seq_printf(m, "Available space for fences: %llu bytes\n",
> > +  arg.fence_available_size);
> > +   seq_printf(m, "Single largest fence available: %llu bytes\n",
> > +  arg.fence_largest_size);
> > +
> > +   return 0;
> > +}
> > +
> >  static int i915_gem_gtt_info(struct seq_file *m, void *data)
> >  {
> > struct drm_info_node *node = m->private;
> > @@ -5030,6 +5056,7 @@ static int i915_debugfs_create(struct dentry *root,
> >  static const struct drm_info_list i915_debugfs_list[] = {
> > {"i915_capabilities", i915_capabilities, 0},
> > {"i915_gem_objects", i915_gem_object_info, 0},
> > +   {"i915_gem_aperture", i915_gem_aperture_info, 0},
> > {"i915_gem_gtt", i915_gem_gtt_info, 0},
> > {"i915_gem_pinned", i915_gem_gtt_info, 0, (void *) PINNED_LIST},
> > {"i915_gem_active", i915_gem_object_list_info, 0, (void *) ACTIVE_LIST},
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c 
> > b/drivers/gpu/drm/i915/i915_gem.c
> > index a2a4a27..ccfc8d3 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -32,6 +32,7 @@
> >  #include "i915_vgpu.h"
> >  #include "i915_trace.h"
> >  #include "intel_drv.h"
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -143,6 +144,55 @@ int i915_mutex_lock_interruptible(struct drm_device 
> > *dev)
> > return 0;
> >  }
> >  
> > +static inline bool
> > +i915_gem_object_is_inactive(struct drm_i915_gem_object *obj)
> > +{
> > +   return i915_gem_obj_bound_any(obj) && !obj->active;
> > +}
> > +
> > +static int obj_rank_by_ggtt(void *priv,
> > +   struct list_head *A,
> > +   struct list_head *B)
> > +{
> > +   struct drm_i915_gem_object *a = list_entry(A,typeof(*a), obj_exec_link);
> > +   struct drm_i915_gem_objec

Re: [Intel-gfx] [PATCH 1/4] drm/i915: Clearing buffer objects via blitter engine

2015-07-01 Thread Chris Wilson

On Wed, Jul 01, 2015 at 03:54:55PM +0100, Tvrtko Ursulin wrote:
> 
> Hi,
> 
> On 07/01/2015 10:25 AM, ankitprasad.r.sha...@intel.com wrote:
> >From: Ankitprasad Sharma 
> >
> >This patch adds support for clearing buffer objects via blitter
> >engines. This is particularly useful for clearing out the memory
> >from stolen region.
> 
> Because CPU cannot access it? I would put that into the commit
> message since I think cover letter does not go into the git history.
> 
> >v2: Add support for using execlists & PPGTT
> >
> >v3: Fix issues in legacy ringbuffer submission mode
> >
> >v4: Rebased to the latest drm-intel-nightly (Ankit)
> >
> >testcase: igt/gem_stolen
> >
> 
> Nitpick: usually it is "Testcase:" and all tags grouped together.
> 
> >Signed-off-by: Chris Wilson 
> >Signed-off-by: Deepak S 
> >Signed-off-by: Ankitprasad Sharma 
> >---
> >  drivers/gpu/drm/i915/Makefile   |   1 +
> >  drivers/gpu/drm/i915/i915_drv.h |   4 +
> >  drivers/gpu/drm/i915/i915_gem_exec.c| 201 
> > 
> >  drivers/gpu/drm/i915/intel_lrc.c|   4 +-
> >  drivers/gpu/drm/i915/intel_lrc.h|   3 +
> >  drivers/gpu/drm/i915/intel_ringbuffer.c |   2 +-
> >  drivers/gpu/drm/i915/intel_ringbuffer.h |   1 +
> >  7 files changed, 213 insertions(+), 3 deletions(-)
> >  create mode 100644 drivers/gpu/drm/i915/i915_gem_exec.c
> >
> >diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> >index de21965..1959314 100644
> >--- a/drivers/gpu/drm/i915/Makefile
> >+++ b/drivers/gpu/drm/i915/Makefile
> >@@ -24,6 +24,7 @@ i915-y += i915_cmd_parser.o \
> >   i915_gem_debug.o \
> >   i915_gem_dmabuf.o \
> >   i915_gem_evict.o \
> >+  i915_gem_exec.o \
> >   i915_gem_execbuffer.o \
> >   i915_gem_gtt.o \
> >   i915_gem.o \
> >diff --git a/drivers/gpu/drm/i915/i915_drv.h 
> >b/drivers/gpu/drm/i915/i915_drv.h
> >index ea9caf2..d1e151e 100644
> >--- a/drivers/gpu/drm/i915/i915_drv.h
> >+++ b/drivers/gpu/drm/i915/i915_drv.h
> >@@ -3082,6 +3082,10 @@ int __must_check i915_gem_evict_something(struct 
> >drm_device *dev,
> >  int i915_gem_evict_vm(struct i915_address_space *vm, bool do_idle);
> >  int i915_gem_evict_everything(struct drm_device *dev);
> >
> >+/* i915_gem_exec.c */
> >+int i915_gem_exec_clear_object(struct drm_i915_gem_object *obj,
> >+   struct drm_i915_file_private *file_priv);
> >+
> >  /* belongs in i915_gem_gtt.h */
> >  static inline void i915_gem_chipset_flush(struct drm_device *dev)
> >  {
> >diff --git a/drivers/gpu/drm/i915/i915_gem_exec.c 
> >b/drivers/gpu/drm/i915/i915_gem_exec.c
> >new file mode 100644
> >index 000..a07fda0
> >--- /dev/null
> >+++ b/drivers/gpu/drm/i915/i915_gem_exec.c
> >@@ -0,0 +1,201 @@
> >+/*
> >+ * Copyright © 2013 Intel Corporation
> 
> Is the year correct?

Yes, but it should be extended to include the lrc mess.
 
> >+ *
> >+ * Permission is hereby granted, free of charge, to any person obtaining a
> >+ * copy of this software and associated documentation files (the 
> >"Software"),
> >+ * to deal in the Software without restriction, including without limitation
> >+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> >+ * and/or sell copies of the Software, and to permit persons to whom the
> >+ * Software is furnished to do so, subject to the following conditions:
> >+ *
> >+ * The above copyright notice and this permission notice (including the next
> >+ * paragraph) shall be included in all copies or substantial portions of the
> >+ * Software.
> >+ *
> >+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS 
> >OR
> >+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> >+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> >+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR 
> >OTHER
> >+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> >+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 
> >DEALINGS
> >+ * IN THE SOFTWARE.
> >+ *
> >+ * Authors:
> >+ *Chris Wilson 
> 
> And author?

Yes. If we discount the ugly changes to support two parallel ring
interface APIs that are doing identical jobs.
 
> >+ *
> >+ */
> >+
> >+#include 
> >+#include 
> >+#include "i915_drv.h"
> >+
> >+#define GEN8_COLOR_BLT_CMD (2<<29 | 0x50<<22)
> >+
> >+#define BPP_8 0
> >+#define BPP_16 (1<<24)
> >+#define BPP_32 (1<<25 | 1<<24)
> >+
> >+#define ROP_FILL_COPY (0xf0 << 16)
> >+
> >+static int i915_gem_exec_flush_object(struct drm_i915_gem_object *obj,
> >+  struct intel_engine_cs *ring,
> >+  struct intel_context *ctx,
> >+  struct drm_i915_gem_request **req)
> >+{
> >+int ret;
> >+
> >+ret = i915_gem_object_sync(obj, ring, req);
> >+if (ret)
> >+return ret;
> >+
> >+if (obj->base.write_domai

Re: [Intel-gfx] [PATCH 2/4] drm/i915: Support for creating Stolen memory backed objects

2015-07-01 Thread Chris Wilson

On Wed, Jul 01, 2015 at 04:06:49PM +0100, Tvrtko Ursulin wrote:
> 
> 
> On 07/01/2015 10:25 AM, ankitprasad.r.sha...@intel.com wrote:
> >From: Ankitprasad Sharma 
> >
> >Extend the drm_i915_gem_create structure to add support for
> >creating Stolen memory backed objects. Added a new flag through
> >which user can specify the preference to allocate the object from
> >stolen memory, which if set, an attempt will be made to allocate
> >the object from stolen memory subject to the availability of
> >free space in the stolen region.
> >
> >v2: Rebased to the latest drm-intel-nightly (Ankit)
> >
> >testcase: igt/gem_stolen
> >
> >Signed-off-by: Ankitprasad Sharma 
> >---
> >  drivers/gpu/drm/i915/i915_dma.c |  3 +++
> >  drivers/gpu/drm/i915/i915_gem.c | 31 +++
> >  include/uapi/drm/i915_drm.h | 15 +++
> >  3 files changed, 45 insertions(+), 4 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_dma.c 
> >b/drivers/gpu/drm/i915/i915_dma.c
> >index c5349fa..6045749 100644
> >--- a/drivers/gpu/drm/i915/i915_dma.c
> >+++ b/drivers/gpu/drm/i915/i915_dma.c
> >@@ -167,6 +167,9 @@ static int i915_getparam(struct drm_device *dev, void 
> >*data,
> > value = i915.enable_hangcheck &&
> > intel_has_gpu_reset(dev);
> > break;
> >+case I915_PARAM_CREATE_VERSION:
> >+value = 1;
> 
> Shouldn't it be 2?

But 1 is the 2nd number, discounting all those pesky negative versions :)

> > /* Allocate the new object */
> >-obj = i915_gem_alloc_object(dev, size);
> >+if (flags & I915_CREATE_PLACEMENT_STOLEN) {
> >+mutex_lock(&dev->struct_mutex);
> 
> Probably need the interruptible variant so userspace can Ctrl-C if
> things get stuck in submission/waiting.

Paulo has been working on removing this struct_mutex requirement, in
which case internally we will take a stolen_mutex around the drm_mm as
required.
 
> >+obj = i915_gem_object_create_stolen(dev, size);
> >+if (!obj) {
> >+mutex_unlock(&dev->struct_mutex);
> >+return -ENOMEM;
> >+}
> >+

And pushes the struct_mutex to here (one day, one glorious day that will
be a vm->mutex or something!).

And yes, you will want, nay must use,

ret = i915_mutex_lock_interruptible(dev);

before thinking about using the GPU.

> >+ret = i915_gem_exec_clear_object(obj, file->driver_priv);
> 
> I would put a comment here saying why it is important to clear
> stolen memory.

Userspace ABI (and kernel ABI in general) is that we do not hand back
uncleared buffers. Something to do with bank card details I guess.
So just:

/* always clear fresh buffers before handing to userspace */

An alternative is that I've been contemplating a private page pool to
reuse and not clear. It's a trade-off between having a large cache in
userspace, and a less flexible cache in the kernel with the supposed
advantage that the kernel cache could be more space efficient.

> >+if (ret) {
> >+i915_gem_object_free(obj);
> 
> This should probably be drm_gem_object_unreference.
> 
> >+mutex_unlock(&dev->struct_mutex);
> >+return ret;
> >+}
> >+
> >+mutex_unlock(&dev->struct_mutex);
> >+} else
> >+obj = i915_gem_alloc_object(dev, size);
> 
> Need curly braces on both branches.

I am sure someone hacked CODING_STYLE. Or I should.
 
> >@@ -355,6 +355,7 @@ typedef struct drm_i915_irq_wait {
> >  #define I915_PARAM_SUBSLICE_TOTAL   33
> >  #define I915_PARAM_EU_TOTAL 34
> >  #define I915_PARAM_HAS_GPU_RESET35
> >+#define I915_PARAM_CREATE_VERSION36
> >
> >  typedef struct drm_i915_getparam {
> > int param;
> >@@ -450,6 +451,20 @@ struct drm_i915_gem_create {
> >  */
> > __u32 handle;
> > __u32 pad;
> >+/**
> >+ * Requested flags (currently used for placement
> >+ * (which memory domain))
> >+ *
> >+ * You can request that the object be created from special memory
> >+ * rather than regular system pages using this parameter. Such
> >+ * irregular objects may have certain restrictions (such as CPU
> >+ * access to a stolen object is verboten).
> 
> I'd just use English all the way. :)

Heh! English is a highly adaptable language and steals good words all
the time!

> >+ *
> >+ * This can be used in the future for other purposes too
> >+ * e.g. specifying tiling/caching/madvise
> >+ */
> >+__u32 flags;
> >+#define I915_CREATE_PLACEMENT_STOLEN (1<<0) /* Cannot use CPU mmaps or 
> >pread/pwrite */

Note that we dropped the pread/pwrite restriction.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

Re: [Intel-gfx] [PATCH 2/4] drm/i915: Support for creating Stolen memory backed objects

2015-07-01 Thread Tvrtko Ursulin



On 07/01/2015 10:25 AM, ankitprasad.r.sha...@intel.com wrote:

From: Ankitprasad Sharma 

Extend the drm_i915_gem_create structure to add support for
creating Stolen memory backed objects. Added a new flag through
which user can specify the preference to allocate the object from
stolen memory, which if set, an attempt will be made to allocate
the object from stolen memory subject to the availability of
free space in the stolen region.

v2: Rebased to the latest drm-intel-nightly (Ankit)

testcase: igt/gem_stolen

Signed-off-by: Ankitprasad Sharma 
---
  drivers/gpu/drm/i915/i915_dma.c |  3 +++
  drivers/gpu/drm/i915/i915_gem.c | 31 +++
  include/uapi/drm/i915_drm.h | 15 +++
  3 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index c5349fa..6045749 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -167,6 +167,9 @@ static int i915_getparam(struct drm_device *dev, void *data,
value = i915.enable_hangcheck &&
intel_has_gpu_reset(dev);
break;
+   case I915_PARAM_CREATE_VERSION:
+   value = 1;
+   break;
default:
DRM_DEBUG("Unknown parameter %d\n", param->param);
return -EINVAL;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index a2a4a27..4acf331 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -391,7 +391,8 @@ static int
  i915_gem_create(struct drm_file *file,
struct drm_device *dev,
uint64_t size,
-   uint32_t *handle_p)
+   uint32_t *handle_p,
+   uint32_t flags)
  {
struct drm_i915_gem_object *obj;
int ret;
@@ -401,8 +402,29 @@ i915_gem_create(struct drm_file *file,
if (size == 0)
return -EINVAL;

+   if (flags & ~(I915_CREATE_PLACEMENT_STOLEN))
+   return -EINVAL;
+
/* Allocate the new object */
-   obj = i915_gem_alloc_object(dev, size);
+   if (flags & I915_CREATE_PLACEMENT_STOLEN) {
+   mutex_lock(&dev->struct_mutex);
+   obj = i915_gem_object_create_stolen(dev, size);


One more thing here: size is u64 in this function but
i915_gem_object_create_stolen takes u32. Is the compiler not noticing this?


(And i915_gem_alloc_object is size_t for a complete win!) :D
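
To illustrate the concern: C performs the u64 to u32 conversion implicitly,
and GCC and Clang typically only diagnose it when -Wconversion is enabled,
which is not part of -Wall. The sketch below uses hypothetical function
names in place of the driver code, and the choice of -EINVAL for the checked
variant is only one possible option.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical callee with a narrow size parameter. */
static uint32_t create_stolen_size(uint32_t size)
{
	return size;
}

/* The 64-bit size is silently truncated to its low 32 bits here. */
static uint32_t unchecked_create(uint64_t size)
{
	return create_stolen_size(size);
}

/* Reject sizes that do not fit before narrowing explicitly. */
static int checked_create(uint64_t size, uint32_t *out)
{
	if (size > UINT32_MAX)
		return -EINVAL;

	*out = create_stolen_size((uint32_t)size);
	return 0;
}
```

A request for 4 GiB + 4 KiB would come out of unchecked_create() as a plain
4 KiB allocation, which is why an explicit range check seems preferable.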

Regards,

Tvrtko

Re: [Intel-gfx] [PATCH 3/4] drm/i915: Add support for stealing purgable stolen pages

2015-07-01 Thread Tvrtko Ursulin



On 07/01/2015 10:25 AM, ankitprasad.r.sha...@intel.com wrote:

From: Chris Wilson 

If we run out of stolen memory when trying to allocate an object, see if
we can reap enough purgeable objects to free up enough contiguous free
space for the allocation. This is in principle very much like evicting
objects to free up enough contiguous space in the vma when binding
a new object - and you will be forgiven for thinking that the code looks
very similar.

At the moment, we do not allow userspace to allocate objects in stolen,
so there is neither the memory pressure to trigger stolen eviction nor
any purgeable objects inside the stolen arena. However, this will change
in the near future, and so better management and defragmentation of
stolen memory will become a real issue.

v2: Remember to remove the drm_mm_node.

v3: Rebased to the latest drm-intel-nightly (Ankit)

testcase: igt/gem_stolen



Tidy "Testcase:" tag again.


Signed-off-by: Chris Wilson 
---
  drivers/gpu/drm/i915/i915_gem_stolen.c | 121 ++---
  1 file changed, 110 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c 
b/drivers/gpu/drm/i915/i915_gem_stolen.c
index 348ed5a..7e216be 100644
--- a/drivers/gpu/drm/i915/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
@@ -430,18 +430,29 @@ cleanup:
return NULL;
  }

-struct drm_i915_gem_object *
-i915_gem_object_create_stolen(struct drm_device *dev, u32 size)
+static bool mark_free(struct drm_i915_gem_object *obj, struct list_head 
*unwind)
+{
+   if (obj->stolen == NULL)
+   return false;
+
+   if (obj->madv != I915_MADV_DONTNEED)
+   return false;
+
+   if (i915_gem_obj_is_pinned(obj))
+   return false;
+
+   list_add(&obj->obj_exec_link, unwind);
+   return drm_mm_scan_add_block(obj->stolen);
+}
+
+static struct drm_mm_node *
+stolen_alloc(struct drm_i915_private *dev_priv, u32 size)
  {
-   struct drm_i915_private *dev_priv = dev->dev_private;
-   struct drm_i915_gem_object *obj;
struct drm_mm_node *stolen;
+   struct drm_i915_gem_object *obj;
+   struct list_head unwind, evict;
int ret;

-   if (!drm_mm_initialized(&dev_priv->mm.stolen))
-   return NULL;
-
-   DRM_DEBUG_KMS("creating stolen object: size=%x\n", size);
if (size == 0)
return NULL;

@@ -451,11 +462,99 @@ i915_gem_object_create_stolen(struct drm_device *dev, u32 
size)

ret = drm_mm_insert_node(&dev_priv->mm.stolen, stolen, size,
 4096, DRM_MM_SEARCH_DEFAULT);
-   if (ret) {
-   kfree(stolen);
-   return NULL;
+   if (ret == 0)
+   return stolen;
+
+   /* No more stolen memory available, or too fragmented.
+* Try evicting purgeable objects and search again.
+*/
+
+   drm_mm_init_scan(&dev_priv->mm.stolen, size, 4096, 0);
+   INIT_LIST_HEAD(&unwind);
+
+   list_for_each_entry(obj, &dev_priv->mm.unbound_list, global_list)
+   if (mark_free(obj, &unwind))
+   goto found;
+
+   list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list)
+   if (mark_free(obj, &unwind))
+   goto found;
+
+found:
+   INIT_LIST_HEAD(&evict);
+   while (!list_empty(&unwind)) {
+   obj = list_first_entry(&unwind,
+  struct drm_i915_gem_object,
+  obj_exec_link);
+   list_del_init(&obj->obj_exec_link);
+
+   if (drm_mm_scan_remove_block(obj->stolen)) {
+   list_add(&obj->obj_exec_link, &evict);
+   drm_gem_object_reference(&obj->base);
+   }
}

+   ret = 0;
+   while (!list_empty(&evict)) {
+   obj = list_first_entry(&evict,
+  struct drm_i915_gem_object,
+  obj_exec_link);
+   list_del_init(&obj->obj_exec_link);
+
+   if (ret == 0) {
+   struct i915_vma *vma, *vma_next;
+
+   list_for_each_entry_safe(vma, vma_next,
+&obj->vma_list,
+vma_link)
+   if (i915_vma_unbind(vma))
+   break;
+
+   /* Stolen pins its pages to prevent the
+* normal shrinker from processing stolen
+* objects.
+*/
+   i915_gem_object_unpin_pages(obj);
+
+   ret = i915_gem_object_put_pages(obj);
+   if (ret == 0) {
+   i915_gem_object_release_stolen(obj);
+   obj->madv = __I915_MADV_PURGED;
+

[Intel-gfx] [PATCH v5] drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset

2015-07-01 Thread Michel Thierry

There are some allocations that must be only referenced by 32-bit
offsets. To limit the chances of having the first 4GB already full,
objects not requiring this workaround use DRM_MM_SEARCH_BELOW/
DRM_MM_CREATE_TOP flags.

Specifically, any resource used with flat/heapless (0x-0xf000)
General State Heap (GSH) or Instruction State Heap (ISH) must be in a
32-bit range, because the General State Offset and Instruction State
Offset are limited to 32 bits.

Objects must have EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag to indicate if
they can be allocated above the 32-bit address range. To limit the
chances of having the first 4GB already full, objects will use
DRM_MM_SEARCH_BELOW + DRM_MM_CREATE_TOP flags when possible.
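
The placement policy described above can be sketched in isolation as
follows. The flag values are taken from the hunks below, but
pin_flags_for() and search_params_for() are hypothetical helpers, and the
mapping from the exec object flag to the pin flags is an assumption based on
this commit message rather than a copy of the execbuffer change. The sketch
also clamps the range end to 4GB instead of assigning it unconditionally.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXEC_OBJECT_SUPPORTS_48B_ADDRESS	(1u << 3)
#define PIN_ZONE_4G				(1u << 6)
#define PIN_HIGH				(1u << 7)

/* Assumed mapping: no 48b support means "keep it below 4GB". */
static uint32_t pin_flags_for(uint64_t exec_flags)
{
	if (exec_flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS)
		return PIN_HIGH;
	return PIN_ZONE_4G;
}

struct search_params {
	uint64_t end;	/* exclusive upper bound of the search range */
	bool top_down;	/* search below / create top */
};

static struct search_params
search_params_for(uint32_t pin_flags, uint64_t vm_total)
{
	struct search_params p = { vm_total, false };

	/* objects that can live anywhere are placed from the top */
	if (pin_flags & PIN_HIGH)
		p.top_down = true;

	/* workaround objects stay inside the first 4GB */
	if ((pin_flags & PIN_ZONE_4G) && p.end > (1ULL << 32))
		p.end = 1ULL << 32;

	return p;
}
```

Placing unconstrained objects from the top down is what keeps the low 4GB
free for the objects that actually need it.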

v2: Changed flag logic from needs_32b, to supports_48b.
v3: Moved 48-bit support flag back to exec_object. (Chris, Daniel)
v4: Split pin flags into PIN_ZONE_4G and PIN_HIGH; update PIN_OFFSET_MASK
to use last PIN_ defined instead of hard-coded value; use correct limit
check in eb_vma_misplaced. (Chris)
v5: Don't touch PIN_OFFSET_MASK and update workaround comment (Chris)

Cc: Chris Wilson 
Reviewed-by: Chris Wilson  (v4)
Signed-off-by: Michel Thierry 
---
 drivers/gpu/drm/i915/i915_drv.h|  2 ++
 drivers/gpu/drm/i915/i915_gem.c| 14 --
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 13 +
 include/uapi/drm/i915_drm.h|  3 ++-
 4 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 3fbfce5..cda6366 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2768,6 +2768,8 @@ void i915_gem_vma_destroy(struct i915_vma *vma);
 #define PIN_OFFSET_BIAS (1<<3)
 #define PIN_USER        (1<<4)
 #define PIN_UPDATE      (1<<5)
+#define PIN_ZONE_4G     (1<<6)
+#define PIN_HIGH        (1<<7)
 #define PIN_OFFSET_MASK (~4095)
 int __must_check
 i915_gem_object_pin(struct drm_i915_gem_object *obj,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 43719b8..1372259 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3722,6 +3722,8 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object 
*obj,
struct drm_i915_private *dev_priv = dev->dev_private;
u32 fence_alignment, unfenced_alignment;
u64 size, fence_size;
+   u32 search_flag = DRM_MM_SEARCH_DEFAULT;
+   u32 alloc_flag = DRM_MM_CREATE_DEFAULT;
u64 start =
flags & PIN_OFFSET_BIAS ? flags & PIN_OFFSET_MASK : 0;
u64 end =
@@ -3763,6 +3765,14 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object 
*obj,
   obj->tiling_mode,
   false);
size = flags & PIN_MAPPABLE ? fence_size : obj->base.size;
+
+   if (flags & PIN_HIGH) {
+   search_flag = DRM_MM_SEARCH_BELOW;
+   alloc_flag = DRM_MM_CREATE_TOP;
+   }
+
+   if (flags & PIN_ZONE_4G)
+   end = (1ULL << 32);
}
 
if (alignment == 0)
@@ -3805,8 +3815,8 @@ search_free:
  size, alignment,
  obj->cache_level,
  start, end,
- DRM_MM_SEARCH_DEFAULT,
- DRM_MM_CREATE_DEFAULT);
+ search_flag,
+ alloc_flag);
if (ret) {
ret = i915_gem_evict_something(dev, vm, size, alignment,
   obj->cache_level,
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 600db74..ff50619 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -588,11 +588,20 @@ i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
if (entry->flags & EXEC_OBJECT_NEEDS_GTT)
flags |= PIN_GLOBAL;
 
+   /* Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset,
+* limit address to the first 4GBs for unflagged objects.
+*/
+   flags |= PIN_ZONE_4G;
+   if (entry->flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS)
+   flags &= ~PIN_ZONE_4G;
+
if (!drm_mm_node_allocated(&vma->node)) {
if (entry->flags & __EXEC_OBJECT_NEEDS_MAP)
flags |= PIN_GLOBAL | PIN_MAPPABLE;
if (entry->flags & __EXEC_OBJECT_NEEDS_BIAS)
flags |= BATCH_OFFSET_BIAS | PIN_OFFSET_BIAS;
+   if ((flags & PIN_MAPPABLE) == 0)
+   flags |= PIN_HIGH;
}
 
ret = i915_gem_object_pin(obj, vma->vm, entry->

Re: [Intel-gfx] [PATCH v3 14/17] drm/i915: batch_obj vm offset must be u64

2015-07-01 Thread John Harrison


On 01/07/2015 16:27, Michel Thierry wrote:

Otherwise it can overflow in 48-bit mode, and cause an incorrect
exec_start.

Before commit 5f19e2bff ("drm/i915: Merged the many do_execbuf()
parameters into a structure"), it was already a u64, so it could be
seen as a regression (or as an optimization that looked good at that time).

Almost certainly a merge failure. The above patch moved the variable 
when it was a uint32_t, but by the time it got merged another patch had 
updated it to uint64_t. Unfortunately, the merge either didn't raise a 
conflict or the conflict didn't get resolved correctly. Either way, the 
downgrade was certainly not intentional.
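
To make the failure mode concrete, here is a small userspace sketch of what
happens when a batch offset above 4GB is stored in a 32-bit field. The
function names are hypothetical, and the sum of VM offset and batch start
offset is assumed here as a stand-in for how exec_start is derived.

```c
#include <stdint.h>

/* Offset stored in a 32-bit field: the upper 32 bits are silently dropped. */
static uint64_t sketch_exec_start_u32(uint64_t vm_offset, uint32_t batch_start_offset)
{
	uint32_t stored = (uint32_t)vm_offset;

	return (uint64_t)stored + batch_start_offset;
}

/* Offset stored in a 64-bit field: the full address survives. */
static uint64_t sketch_exec_start_u64(uint64_t vm_offset, uint32_t batch_start_offset)
{
	uint64_t stored = vm_offset;

	return stored + batch_start_offset;
}
```

For a batch placed at 0x100001000 (just above 4GB) with a start offset of
0x40, the 32-bit variant yields 0x1040 and the 64-bit variant 0x100001040.
Below 4GB the two agree, which is presumably why the downgrade went
unnoticed until 48-bit addressing came along.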




Cc: John Harrison 
Signed-off-by: Michel Thierry 
---
  drivers/gpu/drm/i915/i915_drv.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index d245c82..c720a18 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1664,7 +1664,7 @@ struct i915_execbuffer_params {
    struct drm_file                 *file;
    uint32_t                        dispatch_flags;
    uint32_t                        args_batch_start_offset;
-   uint32_t                        batch_obj_vm_offset;
+   uint64_t                        batch_obj_vm_offset;
    struct intel_engine_cs          *ring;
    struct drm_i915_gem_object      *batch_obj;
    struct intel_context            *ctx;



Re: [Intel-gfx] [PATCH v3 16/17] drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset

2015-07-01 Thread Michel Thierry


On 7/1/2015 4:43 PM, Chris Wilson wrote:

On Wed, Jul 01, 2015 at 04:27:32PM +0100, Michel Thierry wrote:


+   flags |= PIN_ZONE_4G;
+   if (entry->flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS)
+   flags &= ~PIN_ZONE_4G;
+
if (!drm_mm_node_allocated(&vma->node)) {
if (entry->flags & __EXEC_OBJECT_NEEDS_MAP)
flags |= PIN_GLOBAL | PIN_MAPPABLE;
if (entry->flags & __EXEC_OBJECT_NEEDS_BIAS)
flags |= BATCH_OFFSET_BIAS | PIN_OFFSET_BIAS;
+   if ((flags & PIN_MAPPABLE) == 0)
+   flags |= PIN_HIGH;


I'm still debating the right semantics to use, but I'm happy with this
until I can find something better. (The biggest issue is that drm_mm is
not indexed for fast top-down searching. The current search code I put
into drm_mm is unfortunately broken; the idea I have in mind to fix it is
to add a hole_list into drm_mm/drm_mm_node, so that we can just walk
holes in up/down, recent/old order. And with that, allocating top-down
will not be any more expensive than the current reuse of the recent hole -
though perhaps given a fragmented drm_mm the hole stack probably requires
fewer steps to find a large hole.)

Other than don't touch PIN_OFFSET_MASK and move the w/a note next to
where we tweak the PIN_ZONE_4G, it lgtm, so with those changes,


Also the comment should be "limit address to the first 4GBs for 
_unflagged_ objects". I didn't update it after changing the logic.


I'll resend with these changes.

Thanks,


Reviewed-by: Chris Wilson 
-Chris



Re: [Intel-gfx] [PATCH] drm/i915/bxt: Calculate port clock

2015-07-01 Thread shuang . he

Tested-By: Intel Graphics QA PRTS (Patch Regression Test System Contact: 
shuang...@intel.com)
Task id: 6675
-Summary-
Platform  Delta  drm-intel-nightly  Series Applied
ILK  302/302  302/302
SNB  314/318  314/318
IVB  343/343  343/343
BYT -2  287/287  285/287
HSW  380/380  380/380
-Detailed-
Platform  Test  drm-intel-nightly  Series Applied
*BYT  igt@gem_partial_pwrite_pread@reads  PASS(1)  FAIL(1)
*BYT  igt@gem_partial_pwrite_pread@reads-uncached  PASS(1)  FAIL(1)
Note: Pay particular attention to lines starting with '*'.

Re: [Intel-gfx] [PATCH v3 16/17] drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset

2015-07-01 Thread Chris Wilson

On Wed, Jul 01, 2015 at 04:27:32PM +0100, Michel Thierry wrote:
> Some allocations must only be referenced by 32-bit offsets. To limit the
> chances of having the first 4GB already full, objects not requiring this
> workaround use the DRM_MM_SEARCH_BELOW/DRM_MM_CREATE_TOP flags.
> 
> Specifically, any resource used with flat/heapless (0x-0xf000)
> General State Heap (GSH) or Instruction State Heap (ISH) must be in a
> 32-bit range, because the General State Offset and Instruction State
> Offset are limited to 32 bits.
> 
> Objects must have the EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag to indicate
> that they can be allocated above the 32-bit address range. To limit the
> chances of having the first 4GB already full, objects will use the
> DRM_MM_SEARCH_BELOW + DRM_MM_CREATE_TOP flags when possible.
> 
> v2: Changed flag logic from needs_32b to supports_48b.
> v3: Moved 48-bit support flag back to exec_object. (Chris, Daniel)
> v4: Split pin flags into PIN_ZONE_4G and PIN_HIGH; update PIN_OFFSET_MASK
> to use last PIN_ defined instead of hard-coded value; use correct limit
> check in eb_vma_misplaced. (Chris)
> 
> Cc: Chris Wilson 
> Cc: Ben Widawsky 
> Signed-off-by: Michel Thierry 
> ---
>  drivers/gpu/drm/i915/i915_drv.h|  4 +++-
>  drivers/gpu/drm/i915/i915_gem.c| 17 +++--
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 10 ++
>  include/uapi/drm/i915_drm.h|  3 ++-
>  4 files changed, 30 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index c720a18..aac51fb 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2765,7 +2765,9 @@ void i915_gem_vma_destroy(struct i915_vma *vma);
>  #define PIN_OFFSET_BIAS  (1<<3)
>  #define PIN_USER (1<<4)
>  #define PIN_UPDATE   (1<<5)
> -#define PIN_OFFSET_MASK (~4095)
> +#define PIN_ZONE_4G  (1<<6)
> +#define PIN_HIGH (1<<7)
> +#define PIN_OFFSET_MASK -(PIN_HIGH<<1)

The offset has to be 4096-aligned, which imposes an upper limit on how
many low bits we can use for flags. When we exceed it, it is probably past
time for a params struct!
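
As a rough illustration of that constraint: when a page-aligned offset and
the flags share one 64-bit word, only the low 12 bits are free for flags.
The SKETCH_ names and helpers below are hypothetical, not the driver's code.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE        4096u
#define SKETCH_OFFSET_MASK      (~(uint64_t)(SKETCH_PAGE_SIZE - 1))
#define SKETCH_PIN_OFFSET_BIAS  (1u << 3)
#define SKETCH_PIN_HIGH         (1u << 7)

/* Combine a page-aligned offset with flag bits in a single word. */
static uint64_t sketch_pack(uint64_t offset, uint32_t flags)
{
	return (offset & SKETCH_OFFSET_MASK) | (flags & (SKETCH_PAGE_SIZE - 1));
}

/* Recover the offset: everything above the low 12 bits. */
static uint64_t sketch_offset(uint64_t packed)
{
	return packed & SKETCH_OFFSET_MASK;
}

/* Recover the flags: the low 12 bits only. */
static uint32_t sketch_flags(uint64_t packed)
{
	return (uint32_t)(packed & (SKETCH_PAGE_SIZE - 1));
}

/* A flag bit is usable only if it lies below the alignment boundary. */
static int sketch_flag_fits(uint32_t flag)
{
	return flag < SKETCH_PAGE_SIZE;
}
```

Bit 11 is the last usable flag under this scheme; a flag at bit 12 would
collide with the lowest offset bit, so a mask derived from the highest flag
defined would stop matching the 4096-byte alignment once flags grew that far.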

>  int __must_check
>  i915_gem_object_pin(struct drm_i915_gem_object *obj,
>   struct i915_address_space *vm,
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index eeea748..8aa0189 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3718,6 +3718,8 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object 
> *obj,
>   struct drm_i915_private *dev_priv = dev->dev_private;
>   u32 fence_alignment, unfenced_alignment;
>   u64 size, fence_size;
> + u32 search_flag = DRM_MM_SEARCH_DEFAULT;
> + u32 alloc_flag = DRM_MM_CREATE_DEFAULT;
>   u64 start =
>   flags & PIN_OFFSET_BIAS ? flags & PIN_OFFSET_MASK : 0;
>   u64 end =
> @@ -3759,6 +3761,17 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object 
> *obj,
>  obj->tiling_mode,
>  false);
>   size = flags & PIN_MAPPABLE ? fence_size : obj->base.size;
> +
> + if (flags & PIN_HIGH) {
> + search_flag = DRM_MM_SEARCH_BELOW;
> + alloc_flag = DRM_MM_CREATE_TOP;
> + }
> +
> + /* Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset,
> +  * limit address to the first 4GBs for flagged objects.
> +  */

This note is best next to where we set PIN_ZONE_4G in execbuffer.

> + if (flags & PIN_ZONE_4G)
> + end = (1ULL << 32);
>   }
>  
>   if (alignment == 0)

> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c 
> b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 600db74..f52b736 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -588,11 +588,17 @@ i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
>   if (entry->flags & EXEC_OBJECT_NEEDS_GTT)
>   flags |= PIN_GLOBAL;
>  
> + flags |= PIN_ZONE_4G;
> + if (entry->flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS)
> + flags &= ~PIN_ZONE_4G;
> +
>   if (!drm_mm_node_allocated(&vma->node)) {
>   if (entry->flags & __EXEC_OBJECT_NEEDS_MAP)
>   flags |= PIN_GLOBAL | PIN_MAPPABLE;
>   if (entry->flags & __EXEC_OBJECT_NEEDS_BIAS)
>   flags |= BATCH_OFFSET_BIAS | PIN_OFFSET_BIAS;
> + if ((flags & PIN_MAPPABLE) == 0)
> + flags |= PIN_HIGH;

I'm still debating the right semantics to use, but I'm happy with this
until I can find something better. (The biggest issue is that drm_mm is
not indexed for fast top-down searching. The current search code I put
into drm_mm is unfortunately broken, the idea I have in mind to

Re: [Intel-gfx] [PATCH 6/8] drm/i915: add struct_mutex WARNs to i915_gem_stolen.c

2015-07-01 Thread Daniel Vetter

On Wed, Jul 01, 2015 at 08:17:37AM -0700, Jesse Barnes wrote:
> On 07/01/2015 06:56 AM, Daniel Vetter wrote:
> > On Tue, Jun 30, 2015 at 01:30:27PM -0700, Jesse Barnes wrote:
> >> On 06/30/2015 07:36 AM, Chris Wilson wrote:
> >>> On Tue, Jun 30, 2015 at 11:26:11AM -0300, Paulo Zanoni wrote:
>  2015-06-30 11:15 GMT-03:00 Chris Wilson :
> > On Tue, Jun 30, 2015 at 10:53:10AM -0300, Paulo Zanoni wrote:
> >> From: Paulo Zanoni 
> >>
> >> Let's make sure the future Paulos don't forget that we need
> >> struct_mutex when touching dev_priv->mm.stolen.
> >>
> >> Signed-off-by: Paulo Zanoni 
> >> ---
> >>  drivers/gpu/drm/i915/i915_gem_stolen.c | 13 +
> >>  1 file changed, 13 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c 
> >> b/drivers/gpu/drm/i915/i915_gem_stolen.c
> >> index 793bcba..cac1bce 100644
> >> --- a/drivers/gpu/drm/i915/i915_gem_stolen.c
> >> +++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
> >> @@ -160,6 +160,8 @@ static int find_compression_threshold(struct 
> >> drm_device *dev,
> >>   int compression_threshold = 1;
> >>   int ret;
> >>
> >> + WARN_ON(!mutex_is_locked(&dev->struct_mutex));
> >
> > I'm not a huge fan of vague mutex warnings that don't even check the 
> > owner.
> > > I'm especially not a fan of adding a WARN and not handling the error.
> 
>  But then, what exactly is your proposal? What would you like to see here?
> 
>  We can discard this patch if you want. But I hope you're not
>  advocating for lockdep_assert_held(), because if I switch to lockdep,
>  then Daniel is going to deny it again. Also, this type of WARN_ON is a
>  common pattern on our codebase...
> >>>
> >>> I'm just trying to convince Daniel that blindly using this pattern is
> >>> the wrong approach and encouraging a proliferation of unhandled WARN_ON
> >>> doesn't improve driver robustness.
> >>
> >> I think they serve as useful documentation at the very least, whether in
> >> lockdep form, WARN form, or BUG form.  It's not really something we can
> >> recover from either (maybe returning early before touching data?), so...
> > 
> > Not grabbing a lock is generally a harmless error since real races out
> > there are rare with X being single-threaded and all that. Especially in
> > stuff called from modeset code. Hence I think just WARN_ON plus continuing
> > on with blissful ignorance is the best approach.
> > 
> > I don't like the lockdep versions personally since they don't work when lockdep
> > is disabled, which is pretty much always the case. Might be useful to do
> > an assert_mutex_held which always does the most paranoid check (i.e.
> > WARN_ON without lockdep, lockdep_assert_held with lockdep).
> 
> Maybe we should add WARN_ONs to the lockdep_assert macros in the
> !CONFIG_LOCKDEP case.  That would give us documentation, checking in
> both cases, and everyone would be happy, right?

tbh never tried that ;-)
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

Re: [Intel-gfx] [PATCH v4] drm/i915 : Added Programming of the MOCS

2015-07-01 Thread Daniel Vetter

On Wed, Jul 01, 2015 at 08:14:30AM -0700, Jesse Barnes wrote:
> On 07/01/2015 06:53 AM, Peter Antoine wrote:
> > On Wed, 1 Jul 2015, Francisco Jerez wrote:
> > 
> >> Peter Antoine  writes:
> >>
> >>> On Tue, 30 Jun 2015, Francisco Jerez wrote:
> >>>
>  Francisco Jerez  writes:
> 
> > Peter Antoine  writes:
> >
> >> On Mon, 29 Jun 2015, Peter Antoine wrote:
> >>
> >>> On Thu, 25 Jun 2015, Francisco Jerez wrote:
> >>>
>  Peter Antoine  writes:
>  Mesa will want an additional entry with TC=LLC/eLLC, LeCC=PTE,
>  L3CC=WB,
>  everything else unset, I'll reply with a userspace patch making
>  use of
>  your change if you add such an entry.
> >> Ok. I think what you want is, same as entry two, but use the
> >> underlying
> >> pagetable settings and not specify the EDRAM settings. Please
> >> confirm in
> >> the new patchset.
> >
> > Yeah, that sounds good.
> >
> 
>  Another thing worth mentioning is that entries 0, 2 and 5 seem
>  to do the
>  same thing suspiciously, the only difference is the LRUM field
>  which
>  AFAIK doesn't have any effect for LeCC=UC.  Is my understanding
>  correct?
> 
> >>> These tables are generated via requests and then boiled down to
> >>> the above.
> >>> So some of the entries are by request. Swings and roundabouts,
> >>> can remove
> >>> the ones that look redundant but then the tuning that has been
> >>> done won't
> >>> match. I'll add the new entry at the end of the table.
> 
>  Are you planning to propagate the entry you just added back to the
>  original table this was generated from?  What about new entries we may
>  need to add in the future?  What should be the process to make sure
>  that
>  our table and the master table don't diverge and end up with
>  conflicting
>  entries we cannot remove because of ABI compatibility?  I guess there
>  should be a comment on the top warning that the table is part of the
>  kernel ABI and supposed to be kept in sync with your table, so other
>  people don't change it unknowingly?
> 
>  Thanks.
> >>> I am talking to the team that handles this and see if they will add this
> >>> (so future gens this is baked in) but it is unlikely that the other
> >>> tables
> >>> will stay in step as getting in changes will cause too much grief
> >>> getting
> >>> them upstreamed and as the table is auto-generated we will not be
> >>> able to
> >>> guarantee the ordering. It will have to be manual job for anyone doing
> >>> this. It is required for other platforms for the tables to match the
> >>> userspace for performance reasons, but on Linux it will be by request if
> >>> there is a problem. We will see what happens.
> >>>
> >> I think it only makes sense for Linux to maintain compatibility with
> >> Android's tables if we agree on some straightforward process for us to
> >> allocate new entries without causing conflicts (otherwise people are
> >> likely to ignore the issue completely and let the tables diverge, as you
> >> mentioned yourself), and have some guarantee that any entries ever
> >> contributed by your team to the Linux kernel (and therefore part of our
> >> stable ABI) will never be changed or reordered in the future.
> >>
> > I think internally (and informally) that we cannot keep sync between
> > Android
> > and Linux. We need to keep compatibility with userspace and there is no
> > guarantee of ordering as these tables are generated at runtime. The tables
> > that are in Linux are a snapshot. These changes are supposed to
> > stabilise at
> > PV so they don't change in the future, but if a bug or good performance
> > enhancement occurs I can't imagine that they won't make the changes.
> 
> Wow this discussion just keeps going.  Who'd have thought such a simple
> table would cause so much trouble? :)
> 
> What you mention above is a key point: "these tables are generated at
> runtime. The tables that are in Linux are a snapshot. These changes are
> supposed to stabilise at PV so they don't change in the future, but if a
> bug or good performance enhancement occurs I can't imagine that they
> won't make the changes."
> 
> That really argues for a runtime API that allows the userland drivers to
> load in MOCS values.  I'm not sure if it's practical to make the table
> effectively part of the context (lazily applying new values if we detect
> a change vs the defaults), but that would at least let the different
> user level drivers do whatever they think is ideal...

runtime api needs an open-source user. And it sounds like mesa will be
happy for a long time with just 3 fixed entries.

I guess this mocs upstreaming went nowhere unfortunately :( Imo better to
concentrate efforts in areas where we can get somewhere (guc, preempt,
tdr, whatever).
-Daniel
-- 
Daniel Vetter
Software Engineer, In

Re: [Intel-gfx] [PATCH v3 00/17] 48-bit PPGTT

2015-07-01 Thread Daniel Vetter

On Wed, Jul 01, 2015 at 04:27:16PM +0100, Michel Thierry wrote:
> These are the rebased patches, after Mika's final ppgtt clean-up series landed
> (it relies on the macros added). New functions also follow these changes.
> 
> In order to expand the GPU address space, a 4th level translation is added, the
> Page Map Level 4 (PML4). This PML4 has 256 PML4 Entries (PML4E), PML4[0-255],
> each pointing to a PDP. All the existing "dynamic alloc ppgtt" functions are
> used, only adding the 4th level changes. I also updated some remaining
> variables that were 32b only.
> 
> There are 2 hardware workarounds needed to allow correct operation with 48b
> addresses (Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset). This
> new patchset version includes the comments and suggestions from Chris Wilson.
> A flag (EXEC_OBJECT_SUPPORTS_48B_ADDRESS) will indicate if a given object can 
> be
> allocated outside the first 4 PDPs; if not, the end range is forced to 4GB. 
> Also,
> more objects now use the DRM_MM_CREATE_TOP flag. To maintain compatibility, in
> libdrm I added a new drm_intel_bo_emit_reloc_48bit function that will flag
> these objects, while the existing drm_intel_bo_emit_reloc clears it.
> 
> Finally, this feature is only available in BDW and Gen9, requires LRC 
> submission
> mode (execlists) and it can be detected by i915.enable_ppgtt=3.
> 
> Also note that this expanded address space is only available for full PPGTT,
> aliasing PPGTT and Global GTT remain 32-bit.
> 
> Michel Thierry (17):
>   drm/i915: Remove unnecessary gen8_clamp_pd
>   drm/i915/gen8: Make pdp allocation more dynamic
>   drm/i915/gen8: Abstract PDP usage
>   drm/i915/gen8: Add dynamic page trace events
>   drm/i915/gen8: implement alloc/free for 4lvl
>   drm/i915/gen8: Add 4 level switching infrastructure and lrc support
>   drm/i915/gen8: Generalize PTE writing for GEN8 PPGTT
>   drm/i915/gen8: Pass sg_iter through pte inserts
>   drm/i915/gen8: Add 4 level support in insert_entries and clear_range
>   drm/i915/gen8: Initialize PDPs
>   drm/i915: Expand error state's address width to 64b
>   drm/i915/gen8: Add ppgtt info and debug_dump
>   drm/i915: object size needs to be u64
>   drm/i915: batch_obj vm offset must be u64
>   drm/i915/userptr: Kill user_size limit check
>   drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset
>   drm/i915/gen8: Flip the 48b switch

Please start a new thread when resending the entire patch series. Only
in-reply-to parts of a series. It's harder to piece the series together
this way and hence doesn't really improve things compared to just
in-reply-to all the patches individually. The point of a full resend is to
restart/consolidate the review discussions.

But don't resend this one here now since that will make a discussion split
guaranteed.
-Daniel

> 
>  drivers/gpu/drm/i915/i915_debugfs.c|  18 +-
>  drivers/gpu/drm/i915/i915_drv.h|  17 +-
>  drivers/gpu/drm/i915/i915_gem.c|  22 +-
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |  10 +
>  drivers/gpu/drm/i915/i915_gem_gtt.c| 649 
> -
>  drivers/gpu/drm/i915/i915_gem_gtt.h|  66 ++-
>  drivers/gpu/drm/i915/i915_gem_userptr.c|   4 -
>  drivers/gpu/drm/i915/i915_gpu_error.c  |  17 +-
>  drivers/gpu/drm/i915/i915_params.c |   2 +-
>  drivers/gpu/drm/i915/i915_reg.h|   1 +
>  drivers/gpu/drm/i915/i915_trace.h  |  16 +
>  drivers/gpu/drm/i915/intel_lrc.c   |  65 ++-
>  include/uapi/drm/i915_drm.h|   3 +-
>  13 files changed, 725 insertions(+), 165 deletions(-)
> 
> -- 
> 2.4.5
> 
> ___
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

Re: [Intel-gfx] [PATCH v3 15/17] drm/i915/userptr: Kill user_size limit check

2015-07-01 Thread Chris Wilson

On Wed, Jul 01, 2015 at 04:27:31PM +0100, Michel Thierry wrote:
> GTT was only 32b and its max value is 4GB. In order to allow objects
> bigger than 4GB in 48b PPGTT, i915_gem_userptr_ioctl could check
> against the max 48b range (1ULL << 48).
> 
> But since the check no longer applies, just kill the limit.
> 
> v2: Use the default ctx to infer the ppgtt max size (Akash).
> v3: Just kill the limit, it was only there for early detection of an
> error when used for execbuffer (Chris).
> 
> Cc: Akash Goel 
> Cc: Chris Wilson 
> Signed-off-by: Michel Thierry 
Reviewed-by: Chris Wilson 
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

[Intel-gfx] [PATCH mesa v2] i965/gen8+: bo in state base address must be in 32-bit address range

2015-07-01 Thread Michel Thierry

Gen8+ supports 48-bit virtual addresses, but some objects must always be
allocated inside the 32-bit address range.

Specifically, any resource used with flat/heapless (0x-0xf000)
General State Heap (GSH) or Instruction State Heap (ISH) must be in a
32-bit range, because the General State Offset and Instruction State
Offset are limited to 32 bits.

Use drm_intel_bo_emit_reloc_48bit when the 4GB limit is not necessary, and
the bo can be in the full address space.
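
A toy model of the two emission styles may help. It assumes a 32-bit
relocation writes a single dword and a 64-bit one writes the low then the
high dword; the batch structure and all sketch_ names are hypothetical and
only mirror the shape of the OUT_RELOC + OUT_BATCH(0) pattern used for
STATE_BASE_ADDRESS in this patch.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BATCH_DWORDS 16

struct sketch_batch {
	uint32_t map[SKETCH_BATCH_DWORDS];
	size_t used;
};

/* Append one dword; silently ignore writes past the end of the toy batch. */
static void sketch_out_batch(struct sketch_batch *b, uint32_t dw)
{
	if (b->used < SKETCH_BATCH_DWORDS)
		b->map[b->used++] = dw;
}

/* 32-bit relocation: a single dword, so the target must sit below 4GB. */
static void sketch_out_reloc32(struct sketch_batch *b, uint32_t presumed,
			       uint32_t delta)
{
	sketch_out_batch(b, presumed + delta);
}

/* 64-bit relocation: low dword first, then the high dword. */
static void sketch_out_reloc64(struct sketch_batch *b, uint64_t presumed,
			       uint32_t delta)
{
	uint64_t addr = presumed + delta;

	sketch_out_batch(b, (uint32_t)addr);
	sketch_out_batch(b, (uint32_t)(addr >> 32));
}
```

Calling sketch_out_reloc32() followed by sketch_out_batch(b, 0) fills the
same two dwords as a 64-bit relocation would, with the upper dword fixed at
zero, so the command keeps its length while the address stays in the 32-bit
range.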

This commit introduces a dependency on libdrm 2.4.63, which provides the
drm_intel_bo_emit_reloc_48bit function.

v2: s/48baddress/48b_address/,
Only use in OUT_RELOC64 cases, OUT_RELOC implies a 32-bit address offset
is needed (Ben)

Cc: Ben Widawsky 
Cc: mesa-...@lists.freedesktop.org
Signed-off-by: Michel Thierry 
---
 configure.ac  |  2 +-
 src/mesa/drivers/dri/i965/gen8_misc_state.c   | 23 +++
 src/mesa/drivers/dri/i965/intel_batchbuffer.c |  6 +++---
 3 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/configure.ac b/configure.ac
index af61aa2..c92ca44 100644
--- a/configure.ac
+++ b/configure.ac
@@ -68,7 +68,7 @@ AC_SUBST([OSMESA_VERSION])
 dnl Versions for external dependencies
 LIBDRM_REQUIRED=2.4.38
 LIBDRM_RADEON_REQUIRED=2.4.56
-LIBDRM_INTEL_REQUIRED=2.4.60
+LIBDRM_INTEL_REQUIRED=2.4.63
 LIBDRM_NVVIEUX_REQUIRED=2.4.33
 LIBDRM_NOUVEAU_REQUIRED="2.4.33 libdrm >= 2.4.41"
 LIBDRM_FREEDRENO_REQUIRED=2.4.57
diff --git a/src/mesa/drivers/dri/i965/gen8_misc_state.c 
b/src/mesa/drivers/dri/i965/gen8_misc_state.c
index b20038e..5c8924d 100644
--- a/src/mesa/drivers/dri/i965/gen8_misc_state.c
+++ b/src/mesa/drivers/dri/i965/gen8_misc_state.c
@@ -28,6 +28,11 @@
 
 /**
  * Define the base addresses which some state is referenced from.
+ *
+ * Use OUT_RELOC instead of OUT_RELOC64, because the General State
+ * Offset and Instruction State Offset are limited to 32-bits by
+ * hardware [and add OUT_BATCH(0) after each OUT_RELOC to complete
+ * the number of dwords needed for STATE_BASE_ADDRESS].
  */
 void gen8_upload_state_base_address(struct brw_context *brw)
 {
@@ -41,19 +46,21 @@ void gen8_upload_state_base_address(struct brw_context *brw)
OUT_BATCH(0);
OUT_BATCH(mocs_wb << 16);
/* Surface state base address: */
-   OUT_RELOC64(brw->batch.bo, I915_GEM_DOMAIN_SAMPLER, 0,
-   mocs_wb << 4 | 1);
+   OUT_RELOC(brw->batch.bo, I915_GEM_DOMAIN_SAMPLER, 0,
+ mocs_wb << 4 | 1);
+   OUT_BATCH(0);
/* Dynamic state base address: */
-   OUT_RELOC64(brw->batch.bo,
-   I915_GEM_DOMAIN_RENDER | I915_GEM_DOMAIN_INSTRUCTION, 0,
-   mocs_wb << 4 | 1);
+   OUT_RELOC(brw->batch.bo,
+ I915_GEM_DOMAIN_RENDER | I915_GEM_DOMAIN_INSTRUCTION, 0,
+ mocs_wb << 4 | 1);
+   OUT_BATCH(0);
/* Indirect object base address: MEDIA_OBJECT data */
OUT_BATCH(mocs_wb << 4 | 1);
OUT_BATCH(0);
/* Instruction base address: shader kernels (incl. SIP) */
-   OUT_RELOC64(brw->cache.bo, I915_GEM_DOMAIN_INSTRUCTION, 0,
-   mocs_wb << 4 | 1);
-
+   OUT_RELOC(brw->cache.bo, I915_GEM_DOMAIN_INSTRUCTION, 0,
+ mocs_wb << 4 | 1);
+   OUT_BATCH(0);
/* General state buffer size */
OUT_BATCH(0xf001);
/* Dynamic state buffer size */
diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c 
b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
index 54081a1..220a35b 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
@@ -411,9 +411,9 @@ intel_batchbuffer_emit_reloc64(struct brw_context *brw,
uint32_t read_domains, uint32_t write_domain,
   uint32_t delta)
 {
-   int ret = drm_intel_bo_emit_reloc(brw->batch.bo, 4*brw->batch.used,
- buffer, delta,
- read_domains, write_domain);
+   int ret = drm_intel_bo_emit_reloc_48bit(brw->batch.bo, 4*brw->batch.used,
+   buffer, delta,
+   read_domains, write_domain);
assert(ret == 0);
(void) ret;
 
-- 
2.4.5


[Intel-gfx] [PATCH v3 02/17] drm/i915/gen8: Make pdp allocation more dynamic

2015-07-01 Thread Michel Thierry

This transitional patch doesn't do much for the existing code. However,
it should make upcoming patches to use the full 48b address space a bit
easier. The patch also introduces the PML4, ie. the new top level structure
of the page tables.

v2: Renamed  pdp_free to be similar to  pd/pt (unmap_and_free_pdp).
v3: To facilitate testing, 48b mode will be available on Broadwell and
GEN9+, when i915.enable_ppgtt = 3.
v4: Rebase after s/page_tables/page_table/, added extra information
about 4-level page table formats and use IS_ENABLED macro.
v5: Check CONFIG_X86_64 instead of CONFIG_64BIT.
v6: Rebase after Mika's ppgtt cleanup / scratch merge patch series, and follow
his nomenclature in pdp functions (there is no alloc_pdp yet).
v7: Rebase after merged version of Mika's ppgtt cleanup patch series.
v8: Rebase after final merged version of Mika's ppgtt/scratch patches.

Signed-off-by: Ben Widawsky 
Signed-off-by: Michel Thierry  (v2+)
---
 drivers/gpu/drm/i915/i915_drv.h |   7 ++-
 drivers/gpu/drm/i915/i915_gem_gtt.c | 116 
 drivers/gpu/drm/i915/i915_gem_gtt.h |  41 ++---
 3 files changed, 128 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 1dbd957..7bccfd5 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2490,7 +2490,12 @@ struct drm_i915_cmd_table {
 #define HAS_HW_CONTEXTS(dev)   (INTEL_INFO(dev)->gen >= 6)
 #define HAS_LOGICAL_RING_CONTEXTS(dev) (INTEL_INFO(dev)->gen >= 8)
 #define USES_PPGTT(dev)        (i915.enable_ppgtt)
-#define USES_FULL_PPGTT(dev)   (i915.enable_ppgtt == 2)
+#define USES_FULL_PPGTT(dev)   (i915.enable_ppgtt >= 2)
+#ifdef CONFIG_X86_64
+# define USES_FULL_48BIT_PPGTT(dev)    (i915.enable_ppgtt == 3)
+#else
+# define USES_FULL_48BIT_PPGTT(dev)    false
+#endif
 
 #define HAS_OVERLAY(dev)   (INTEL_INFO(dev)->has_overlay)
 #define OVERLAY_NEEDS_PHYSICAL(dev)    (INTEL_INFO(dev)->overlay_needs_physical)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 712ca34..cdcc778 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -104,9 +104,13 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, 
int enable_ppgtt)
 {
bool has_aliasing_ppgtt;
bool has_full_ppgtt;
+   bool has_full_64bit_ppgtt;
 
has_aliasing_ppgtt = INTEL_INFO(dev)->gen >= 6;
has_full_ppgtt = INTEL_INFO(dev)->gen >= 7;
+   has_full_64bit_ppgtt = IS_ENABLED(CONFIG_X86_64) &&
+  (IS_BROADWELL(dev) ||
+   INTEL_INFO(dev)->gen >= 9) && false; /* FIXME: 
64b */
 
if (intel_vgpu_active(dev))
has_full_ppgtt = false; /* emulation is too hard */
@@ -125,6 +129,9 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, 
int enable_ppgtt)
if (enable_ppgtt == 2 && has_full_ppgtt)
return 2;
 
+   if (enable_ppgtt == 3 && has_full_64bit_ppgtt)
+   return 3;
+
 #ifdef CONFIG_INTEL_IOMMU
/* Disable ppgtt on SNB if VT-d is on. */
if (INTEL_INFO(dev)->gen == 6 && intel_iommu_gfx_mapped) {
@@ -522,6 +529,45 @@ static void gen8_initialize_pd(struct i915_address_space 
*vm,
fill_px(vm->dev, pd, scratch_pde);
 }
 
+static int __pdp_init(struct drm_device *dev,
+ struct i915_page_directory_pointer *pdp)
+{
+   size_t pdpes = I915_PDPES_PER_PDP(dev);
+
+   pdp->used_pdpes = kcalloc(BITS_TO_LONGS(pdpes),
+ sizeof(unsigned long),
+ GFP_KERNEL);
+   if (!pdp->used_pdpes)
+   return -ENOMEM;
+
+   pdp->page_directory = kcalloc(pdpes, sizeof(*pdp->page_directory),
+ GFP_KERNEL);
+   if (!pdp->page_directory) {
+   kfree(pdp->used_pdpes);
+   /* the PDP might be the statically allocated top level. Keep it
+* as clean as possible */
+   pdp->used_pdpes = NULL;
+   return -ENOMEM;
+   }
+
+   return 0;
+}
+
+static void __pdp_fini(struct i915_page_directory_pointer *pdp)
+{
+   kfree(pdp->used_pdpes);
+   kfree(pdp->page_directory);
+   pdp->page_directory = NULL;
+}
+
+static void free_pdp(struct drm_device *dev,
+struct i915_page_directory_pointer *pdp)
+{
+   __pdp_fini(pdp);
+   if (USES_FULL_48BIT_PPGTT(dev))
+   kfree(pdp);
+}
+
 /* Broadwell Page Directory Pointer Descriptors */
 static int gen8_write_pdp(struct drm_i915_gem_request *req,
  unsigned entry,
@@ -634,9 +680,6 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
pt_vaddr = NULL;
 
for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
-   if (WARN_ON(pdpe >= GEN8_LEGACY_PDPES))
-   brea

[Intel-gfx] [PATCH v3 05/17] drm/i915/gen8: implement alloc/free for 4lvl

2015-07-01 Thread Michel Thierry

PML4 has no special attributes, and there will always be a PML4.
So simply initialize it at creation, and destroy it at the end.

The code for 4lvl is able to call into the existing 3lvl page table code
to handle all of the lower levels.
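
The layering described above can be illustrated outside the driver with a toy model. This is only a sketch under stated assumptions: every toy_* name and the tiny TOY_ENTRIES table size are hypothetical, and plain malloc/free stands in for the driver's page and DMA handling. It shows the 4-level teardown walking the top level and handing each populated entry to the 3-level helper.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_ENTRIES 4	/* hypothetical; real tables hold far more entries */

struct toy_pdp {
	void *pd[TOY_ENTRIES];		/* stand-ins for page directories */
};

struct toy_pml4 {
	struct toy_pdp *pdps[TOY_ENTRIES];
};

/* Populate one (pml4e, pdpe) slot, allocating the PDP on first use. */
static int toy_populate(struct toy_pml4 *pml4, int pml4e, int pdpe)
{
	assert(pml4e >= 0 && pml4e < TOY_ENTRIES);
	assert(pdpe >= 0 && pdpe < TOY_ENTRIES);

	if (!pml4->pdps[pml4e]) {
		pml4->pdps[pml4e] = calloc(1, sizeof(struct toy_pdp));
		if (!pml4->pdps[pml4e])
			return -1;
	}
	if (!pml4->pdps[pml4e]->pd[pdpe]) {
		pml4->pdps[pml4e]->pd[pdpe] = malloc(64);
		if (!pml4->pdps[pml4e]->pd[pdpe])
			return -1;
	}
	return 0;
}

/* 3-level teardown: release every populated directory, then the PDP. */
static int toy_cleanup_3lvl(struct toy_pdp *pdp)
{
	int freed = 0;

	for (int i = 0; i < TOY_ENTRIES; i++) {
		if (!pdp->pd[i])
			continue;
		free(pdp->pd[i]);
		pdp->pd[i] = NULL;
		freed++;
	}
	free(pdp);
	return freed;
}

/* 4-level teardown: walk the top level and defer to the 3-level helper. */
static int toy_cleanup_4lvl(struct toy_pml4 *pml4)
{
	int freed = 0;

	for (int i = 0; i < TOY_ENTRIES; i++) {
		if (!pml4->pdps[i])
			continue;
		freed += toy_cleanup_3lvl(pml4->pdps[i]);
		pml4->pdps[i] = NULL;
	}
	return freed;
}
```

The return value (number of directories released) exists only to make the sketch easy to check; the functions in the patch return void.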

v2: Return something at the end of gen8_alloc_va_range_4lvl to keep the
compiler happy. And define ret only in one place.
Updated gen8_ppgtt_unmap_pages and gen8_ppgtt_free to handle 4lvl.
v3: Use i915_dma_unmap_single instead of pci API. Fix a
couple of incorrect checks when unmapping pdp and pd pages (Akash).
v4: Call __pdp_fini also for 32b PPGTT. Clean up alloc_pdp param list.
v5: Prevent (harmless) out of range access in gen8_for_each_pml4e.
v6: Simplify alloc_vma_range_4lvl and gen8_ppgtt_init_common error
paths. (Akash)
v7: Rebase, s/gen8_ppgtt_free_*/gen8_ppgtt_cleanup_*/.
v8: Change location of pml4_init/fini. It will make next patches
cleaner.
v9: Rebase after Mika's ppgtt cleanup / scratch merge patch series, while
trying to reuse as much as possible for pdp alloc. pml4_init/fini
replaced by setup/cleanup_px macros.
v10: Rebase after Mika's merged ppgtt cleanup patch series.
v11: Rebase after final merged version of Mika's ppgtt/scratch patches.

Cc: Akash Goel 
Signed-off-by: Ben Widawsky 
Signed-off-by: Michel Thierry  (v2+)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 162 ++--
 drivers/gpu/drm/i915/i915_gem_gtt.h |  12 ++-
 2 files changed, 146 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 1327e41..d23b0a8 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -584,12 +584,44 @@ static void __pdp_fini(struct i915_page_directory_pointer *pdp)
pdp->page_directory = NULL;
 }
 
+static struct
+i915_page_directory_pointer *alloc_pdp(struct drm_device *dev)
+{
+   struct i915_page_directory_pointer *pdp;
+   int ret = -ENOMEM;
+
+   WARN_ON(!USES_FULL_48BIT_PPGTT(dev));
+
+   pdp = kzalloc(sizeof(*pdp), GFP_KERNEL);
+   if (!pdp)
+   return ERR_PTR(-ENOMEM);
+
+   ret = __pdp_init(dev, pdp);
+   if (ret)
+   goto fail_bitmap;
+
+   ret = setup_px(dev, pdp);
+   if (ret)
+   goto fail_page_m;
+
+   return pdp;
+
+fail_page_m:
+   __pdp_fini(pdp);
+fail_bitmap:
+   kfree(pdp);
+
+   return ERR_PTR(ret);
+}
+
 static void free_pdp(struct drm_device *dev,
 struct i915_page_directory_pointer *pdp)
 {
__pdp_fini(pdp);
-   if (USES_FULL_48BIT_PPGTT(dev))
+   if (USES_FULL_48BIT_PPGTT(dev)) {
+   cleanup_px(dev, pdp);
kfree(pdp);
+   }
 }
 
 /* Broadwell Page Directory Pointer Descriptors */
@@ -783,28 +815,46 @@ static void gen8_free_scratch(struct i915_address_space *vm)
free_scratch_page(dev, vm->scratch_page);
 }
 
-static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
+static void gen8_ppgtt_cleanup_3lvl(struct drm_device *dev,
+   struct i915_page_directory_pointer *pdp)
 {
-   struct i915_hw_ppgtt *ppgtt =
-   container_of(vm, struct i915_hw_ppgtt, base);
int i;
 
-   if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
-   for_each_set_bit(i, ppgtt->pdp.used_pdpes,
-I915_PDPES_PER_PDP(ppgtt->base.dev)) {
-   if (WARN_ON(!ppgtt->pdp.page_directory[i]))
-   continue;
+   for_each_set_bit(i, pdp->used_pdpes, I915_PDPES_PER_PDP(dev)) {
+   if (WARN_ON(!pdp->page_directory[i]))
+   continue;
 
-   gen8_free_page_tables(ppgtt->base.dev,
- ppgtt->pdp.page_directory[i]);
-   free_pd(ppgtt->base.dev,
-   ppgtt->pdp.page_directory[i]);
-   }
-   free_pdp(ppgtt->base.dev, &ppgtt->pdp);
-   } else {
-   WARN_ON(1); /* to be implemented later */
+   gen8_free_page_tables(dev, pdp->page_directory[i]);
+   free_pd(dev, pdp->page_directory[i]);
}
 
+   free_pdp(dev, pdp);
+}
+
+static void gen8_ppgtt_cleanup_4lvl(struct i915_hw_ppgtt *ppgtt)
+{
+   int i;
+
+   for_each_set_bit(i, ppgtt->pml4.used_pml4es, GEN8_PML4ES_PER_PML4) {
+   if (WARN_ON(!ppgtt->pml4.pdps[i]))
+   continue;
+
+   gen8_ppgtt_cleanup_3lvl(ppgtt->base.dev, ppgtt->pml4.pdps[i]);
+   }
+
+   cleanup_px(ppgtt->base.dev, &ppgtt->pml4);
+}
+
+static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
+{
+   struct i915_hw_ppgtt *ppgtt =
+   container_of(vm, struct i915_hw_ppgtt, base);
+
+   if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev))
+   gen8_ppgtt_cleanup_3lvl(ppgtt->base.dev, &ppgtt->pdp);
+   else
+   gen8_ppgtt_

[Intel-gfx] [PATCH v3 09/17] drm/i915/gen8: Add 4 level support in insert_entries and clear_range

2015-07-01 Thread Michel Thierry

When 48b is enabled, gen8_ppgtt_insert_entries needs to read the Page Map
Level 4 (PML4), before it selects which Page Directory Pointer (PDP)
it will write to.

Similarly, gen8_ppgtt_clear_range needs to get the correct PDP/PD range.
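
As a rough illustration of why the PML4 has to be consulted first, the sketch below splits a 48-bit address into its four table indices. It assumes 4 KiB pages and 9 index bits per level (shifts 12/21/30/39); the toy_* helpers are hypothetical stand-ins and are not the driver's gen8_*_index functions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: 4 KiB pages, 512 entries (9 bits) per level. */
#define TOY_PTE_SHIFT	12
#define TOY_PDE_SHIFT	21
#define TOY_PDPE_SHIFT	30
#define TOY_PML4E_SHIFT	39
#define TOY_INDEX_MASK	0x1ffu

/* Extract the 9-bit table index that starts at bit 'shift'. */
static unsigned toy_index(uint64_t addr, unsigned shift)
{
	assert(shift < 64);
	return (unsigned)((addr >> shift) & TOY_INDEX_MASK);
}

static unsigned toy_pml4e_index(uint64_t addr)
{
	return toy_index(addr, TOY_PML4E_SHIFT);
}

static unsigned toy_pdpe_index(uint64_t addr)
{
	return toy_index(addr, TOY_PDPE_SHIFT);
}

static unsigned toy_pde_index(uint64_t addr)
{
	return toy_index(addr, TOY_PDE_SHIFT);
}

static unsigned toy_pte_index(uint64_t addr)
{
	return toy_index(addr, TOY_PTE_SHIFT);
}
```

Under these assumptions the PML4 index picks the PDP, and only then do the PDPE, PDE and PTE indices mean anything, which is the ordering the commit message describes.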

This patch was inspired by Ben's "Depend exclusively on map and
unmap_vma".

v2: Rebase after s/page_tables/page_table/.
v3: Remove unnecessary pdpe loop in gen8_ppgtt_clear_range_4lvl and use
clamp_pdp in gen8_ppgtt_insert_entries (Akash).
v4: Merge gen8_ppgtt_clear_range_4lvl into gen8_ppgtt_clear_range to
maintain symmetry with gen8_ppgtt_insert_entries (Akash).
v5: Do not mix pages and bytes in insert_entries (Akash).
v6: Prevent overflow in sg_nents << PAGE_SHIFT, when inserting 4GB at
once.
v7: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
Use gen8_px_index functions, and remove unnecessary number of pages
parameter in insert_pte_entries.

Cc: Akash Goel 
Signed-off-by: Michel Thierry 
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 51 -
 drivers/gpu/drm/i915/i915_gem_gtt.h | 11 
 2 files changed, 50 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 67d02b9..d16fbce 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -712,9 +712,9 @@ static void gen8_ppgtt_clear_pte_range(struct i915_address_space *vm,
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
gen8_pte_t *pt_vaddr;
-   unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
-   unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
-   unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
+   unsigned pdpe = gen8_pdpe_index(start);
+   unsigned pde = gen8_pde_index(start);
+   unsigned pte = gen8_pte_index(start);
unsigned num_entries = length >> PAGE_SHIFT;
unsigned last_pte, i;
 
@@ -763,12 +763,24 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 {
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
-   struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
-
gen8_pte_t scratch_pte = gen8_pte_encode(px_dma(vm->scratch_page),
 I915_CACHE_LLC, use_scratch);
 
-   gen8_ppgtt_clear_pte_range(vm, pdp, start, length, scratch_pte);
+   if (!USES_FULL_48BIT_PPGTT(vm->dev)) {
+   gen8_ppgtt_clear_pte_range(vm, &ppgtt->pdp, start, length,
+  scratch_pte);
+   } else {
+   uint64_t templ4, pml4e;
+   struct i915_page_directory_pointer *pdp;
+
+   gen8_for_each_pml4e(pdp, &ppgtt->pml4, start, length, templ4, pml4e) {
+   uint64_t pdp_len = gen8_clamp_pdp(start, length);
+   uint64_t pdp_start = start;
+
+   gen8_ppgtt_clear_pte_range(vm, pdp, pdp_start, pdp_len,
+  scratch_pte);
+   }
+   }
 }
 
 static void
@@ -781,9 +793,9 @@ gen8_ppgtt_insert_pte_entries(struct i915_address_space *vm,
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
gen8_pte_t *pt_vaddr;
-   unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
-   unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
-   unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
+   unsigned pdpe = gen8_pdpe_index(start);
+   unsigned pde = gen8_pde_index(start);
+   unsigned pte = gen8_pte_index(start);
 
pt_vaddr = NULL;
 
@@ -801,7 +813,8 @@ gen8_ppgtt_insert_pte_entries(struct i915_address_space *vm,
kunmap_px(ppgtt, pt_vaddr);
pt_vaddr = NULL;
if (++pde == I915_PDES) {
-   pdpe++;
+   if (++pdpe == I915_PDPES_PER_PDP(vm->dev))
+   break;
pde = 0;
}
pte = 0;
@@ -820,11 +833,25 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 {
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
-   struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
struct sg_page_iter sg_iter;
 
__sg_page_iter_start(&sg_iter, pages->sgl, sg_nents(pages->sgl), 0);
-   gen8_ppgtt_insert_pte_entries(vm, pdp, &sg_iter, start, cache_level);
+
+   if (!USES_FULL_48BIT_PPGTT(vm->dev)) {
+   gen8_ppgtt_insert_pte_entries(vm, &ppgtt->pdp, &sg_iter, start,
+ cache_level);
+   } else {
+   struct i915_page_directory_pointer *pdp;
+

[Intel-gfx] [PATCH libdrm v2 1/2] intel: Add EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag.

2015-07-01 Thread Michel Thierry

Gen8+ supports 48-bit virtual addresses, but some objects must always be
allocated inside the 32-bit address range.

Specifically, any resource used with flat/heapless (0x-0xf000)
General State Heap (GSH) or Instruction State Heap (ISH) must be in a
32-bit range, because the General State Offset and Instruction State Offset
are limited to 32 bits.

Provide a flag to set when the 4GB limit is not necessary in a given bo.
48-bit range will only be used when explicitly requested.

Calls to the new drm_intel_bo_emit_reloc_48bit function will have this flag
set automatically, while calls to drm_intel_bo_emit_reloc will clear it.
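
The set/clear behaviour described above can be modelled in isolation. This is a minimal sketch, not libdrm code: the toy_bo struct and toy_* functions are hypothetical, and keeping the flag directly on the buffer object is an assumption made for brevity. Only the bit position (1<<3) is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_SUPPORTS_48B (1u << 3)	/* same bit position as in the patch */

struct toy_bo {
	uint64_t flags;			/* hypothetical per-object exec flags */
};

static void toy_set_48b(struct toy_bo *bo)
{
	bo->flags |= TOY_SUPPORTS_48B;
}

static void toy_clear_48b(struct toy_bo *bo)
{
	bo->flags &= ~(uint64_t)TOY_SUPPORTS_48B;
}

/* Legacy relocation path: the target must stay below 4GB. */
static void toy_emit_reloc(struct toy_bo *target)
{
	assert(target);
	toy_clear_48b(target);
}

/* 48-bit aware relocation path: the target may be placed anywhere. */
static void toy_emit_reloc_48bit(struct toy_bo *target)
{
	assert(target);
	toy_set_48b(target);
}
```

In this model the most recent relocation call decides the flag, so a single legacy toy_emit_reloc() against a target pulls it back into the 32-bit range.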

v2: Make set/clear functions nops on pre-gen8 platforms, and use them
internally in emit_reloc functions (Ben)
s/48BADDRESS/48B_ADDRESS/ (Dave)

Cc: Ben Widawsky 
Cc: Dave Gordon 
Cc: dri-de...@lists.freedesktop.org
Signed-off-by: Michel Thierry 
---
 include/drm/i915_drm.h|  3 ++-
 intel/intel_bufmgr.c  | 24 +
 intel/intel_bufmgr.h  |  8 ++-
 intel/intel_bufmgr_gem.c  | 54 +++
 intel/intel_bufmgr_priv.h | 11 ++
 5 files changed, 94 insertions(+), 6 deletions(-)

diff --git a/include/drm/i915_drm.h b/include/drm/i915_drm.h
index ded43b1..426b25c 100644
--- a/include/drm/i915_drm.h
+++ b/include/drm/i915_drm.h
@@ -680,7 +680,8 @@ struct drm_i915_gem_exec_object2 {
 #define EXEC_OBJECT_NEEDS_FENCE (1<<0)
 #define EXEC_OBJECT_NEEDS_GTT  (1<<1)
 #define EXEC_OBJECT_WRITE  (1<<2)
-#define __EXEC_OBJECT_UNKNOWN_FLAGS -(EXEC_OBJECT_WRITE<<1)
+#define EXEC_OBJECT_SUPPORTS_48B_ADDRESS (1<<3)
+#define __EXEC_OBJECT_UNKNOWN_FLAGS -(EXEC_OBJECT_SUPPORTS_48B_ADDRESS<<1)
__u64 flags;
 
__u64 rsvd1;
diff --git a/intel/intel_bufmgr.c b/intel/intel_bufmgr.c
index 14ea9f9..590a855 100644
--- a/intel/intel_bufmgr.c
+++ b/intel/intel_bufmgr.c
@@ -188,6 +188,18 @@ drm_intel_bufmgr_check_aperture_space(drm_intel_bo ** bo_array, int count)
return bo_array[0]->bufmgr->check_aperture_space(bo_array, count);
 }
 
+void drm_intel_bo_set_supports_48b_address(drm_intel_bo *bo)
+{
+   if (bo->bufmgr->bo_set_supports_48b_address)
+   bo->bufmgr->bo_set_supports_48b_address(bo);
+}
+
+void drm_intel_bo_clear_supports_48b_address(drm_intel_bo *bo)
+{
+   if (bo->bufmgr->bo_clear_supports_48b_address)
+   bo->bufmgr->bo_clear_supports_48b_address(bo);
+}
+
 int
 drm_intel_bo_flink(drm_intel_bo *bo, uint32_t * name)
 {
@@ -202,6 +214,18 @@ drm_intel_bo_emit_reloc(drm_intel_bo *bo, uint32_t offset,
drm_intel_bo *target_bo, uint32_t target_offset,
uint32_t read_domains, uint32_t write_domain)
 {
+   drm_intel_bo_clear_supports_48b_address(target_bo);
+   return bo->bufmgr->bo_emit_reloc(bo, offset,
+target_bo, target_offset,
+read_domains, write_domain);
+}
+
+int
+drm_intel_bo_emit_reloc_48bit(drm_intel_bo *bo, uint32_t offset,
+   drm_intel_bo *target_bo, uint32_t target_offset,
+   uint32_t read_domains, uint32_t write_domain)
+{
+   drm_intel_bo_set_supports_48b_address(target_bo);
return bo->bufmgr->bo_emit_reloc(bo, offset,
 target_bo, target_offset,
 read_domains, write_domain);
diff --git a/intel/intel_bufmgr.h b/intel/intel_bufmgr.h
index 285919e..62480cb 100644
--- a/intel/intel_bufmgr.h
+++ b/intel/intel_bufmgr.h
@@ -87,7 +87,8 @@ struct _drm_intel_bo {
/**
 * Last seen card virtual address (offset from the beginning of the
 * aperture) for the object.  This should be used to fill relocation
-* entries when calling drm_intel_bo_emit_reloc()
+* entries when calling drm_intel_bo_emit_reloc() or
+* drm_intel_bo_emit_reloc_48bit()
 */
uint64_t offset64;
 };
@@ -137,6 +138,8 @@ void drm_intel_bo_wait_rendering(drm_intel_bo *bo);
 
 void drm_intel_bufmgr_set_debug(drm_intel_bufmgr *bufmgr, int enable_debug);
 void drm_intel_bufmgr_destroy(drm_intel_bufmgr *bufmgr);
+void drm_intel_bo_set_supports_48b_address(drm_intel_bo *bo);
+void drm_intel_bo_clear_supports_48b_address(drm_intel_bo *bo);
 int drm_intel_bo_exec(drm_intel_bo *bo, int used,
   struct drm_clip_rect *cliprects, int num_cliprects, int DR4);
 int drm_intel_bo_mrb_exec(drm_intel_bo *bo, int used,
@@ -147,6 +150,9 @@ int drm_intel_bufmgr_check_aperture_space(drm_intel_bo ** bo_array, int count);
 int drm_intel_bo_emit_reloc(drm_intel_bo *bo, uint32_t offset,
drm_intel_bo *target_bo, uint32_t target_offset,
uint32_t read_domains, uint32_t write_domain);
+int drm_intel_bo_emit_reloc_48bit(drm_intel_bo *bo, uint32_t offset,
+ drm_intel_bo *target_bo, uint32_t 
ta

[Intel-gfx] [PATCH v3 07/17] drm/i915/gen8: Generalize PTE writing for GEN8 PPGTT

2015-07-01 Thread Michel Thierry

insert_entries was the function used to write PTEs. For the PPGTT it was
"hardcoded" to only understand two level page tables, which was the case
for GEN7. We can reuse this for 4 level page tables, and remove the
concept of insert_entries, which was never viable past 2 level page
tables anyway, but it requires a bit of rework to make the function more
generic.

This patch begins the generalization work, which will be relied on heavily
once the 48b code is complete. The patch series attempts to keep each
function that touches level-specific code specific to one page table
level, and this patch is no exception.

v2: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
v3: Rebase after final merged version of Mika's ppgtt/scratch patches.

Signed-off-by: Ben Widawsky 
Signed-off-by: Michel Thierry  (v2)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 52 +++--
 1 file changed, 39 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index fcb8c4b..bd31cbc 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -703,24 +703,21 @@ static int gen8_48b_mm_switch(struct i915_hw_ppgtt *ppgtt,
return gen8_write_pdp(req, 0, px_dma(&ppgtt->pml4));
 }
 
-static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
-  uint64_t start,
-  uint64_t length,
-  bool use_scratch)
+static void gen8_ppgtt_clear_pte_range(struct i915_address_space *vm,
+  struct i915_page_directory_pointer *pdp,
+  uint64_t start,
+  uint64_t length,
+  gen8_pte_t scratch_pte)
 {
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
-   struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
-   gen8_pte_t *pt_vaddr, scratch_pte;
+   gen8_pte_t *pt_vaddr;
unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
unsigned num_entries = length >> PAGE_SHIFT;
unsigned last_pte, i;
 
-   scratch_pte = gen8_pte_encode(px_dma(ppgtt->base.scratch_page),
- I915_CACHE_LLC, use_scratch);
-
while (num_entries) {
struct i915_page_directory *pd;
struct i915_page_table *pt;
@@ -759,14 +756,30 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
}
 }
 
-static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
- struct sg_table *pages,
- uint64_t start,
- enum i915_cache_level cache_level, u32 unused)
+static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
+  uint64_t start,
+  uint64_t length,
+  bool use_scratch)
 {
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
+
+   gen8_pte_t scratch_pte = gen8_pte_encode(px_dma(vm->scratch_page),
+I915_CACHE_LLC, use_scratch);
+
+   gen8_ppgtt_clear_pte_range(vm, pdp, start, length, scratch_pte);
+}
+
+static void
+gen8_ppgtt_insert_pte_entries(struct i915_address_space *vm,
+ struct i915_page_directory_pointer *pdp,
+ struct sg_table *pages,
+ uint64_t start,
+ enum i915_cache_level cache_level)
+{
+   struct i915_hw_ppgtt *ppgtt =
+   container_of(vm, struct i915_hw_ppgtt, base);
gen8_pte_t *pt_vaddr;
unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
@@ -800,6 +813,19 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
kunmap_px(ppgtt, pt_vaddr);
 }
 
+static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
+ struct sg_table *pages,
+ uint64_t start,
+ enum i915_cache_level cache_level,
+ u32 unused)
+{
+   struct i915_hw_ppgtt *ppgtt =
+   container_of(vm, struct i915_hw_ppgtt, base);
+   struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
+
+   gen8_ppgtt_insert_pte_entries(vm, pdp, pages, start, cache_level);
+}
+
 static void gen8_free_pag

[Intel-gfx] [PATCH v3 16/17] drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset

2015-07-01 Thread Michel Thierry

There are some allocations that must only be referenced by 32-bit
offsets. To limit the chances of having the first 4GB already full,
objects not requiring this workaround use the DRM_MM_SEARCH_BELOW/
DRM_MM_CREATE_TOP flags.

Specifically, any resource used with flat/heapless (0x-0xf000)
General State Heap (GSH) or Instruction State Heap (ISH) must be in a
32-bit range, because the General State Offset and Instruction State
Offset are limited to 32 bits.

Objects must set the EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag to indicate
that they can be allocated above the 32-bit address range. To limit the
chances of having the first 4GB already full, objects will use the
DRM_MM_SEARCH_BELOW + DRM_MM_CREATE_TOP flags when possible.
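
The placement policy can be sketched as two small pure functions. This is an illustration under stated assumptions, not the driver code: the toy_* names are hypothetical, TOY_PIN_MAPPABLE's bit value is made up, and the 4GB limit is applied as a clamp here, whereas the patch assigns it directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PIN_MAPPABLE (1u << 0)	/* hypothetical bit value */
#define TOY_PIN_ZONE_4G	 (1u << 6)
#define TOY_PIN_HIGH	 (1u << 7)

struct toy_placement {
	uint64_t end;		/* exclusive upper bound of the search range */
	bool top_down;		/* search from the top of the range */
};

/* Derive pin flags from what the exec object declared. */
static unsigned toy_exec_flags(bool supports_48b, bool mappable)
{
	unsigned flags = 0;

	if (!supports_48b)
		flags |= TOY_PIN_ZONE_4G;	/* the workaround is the default */
	if (mappable)
		flags |= TOY_PIN_MAPPABLE;
	else
		flags |= TOY_PIN_HIGH;		/* keep low addresses free */
	return flags;
}

/* Turn pin flags into a search range and direction. */
static struct toy_placement toy_place(unsigned flags, uint64_t vm_total)
{
	struct toy_placement p = { .end = vm_total, .top_down = false };

	assert(vm_total > 0);
	if (flags & TOY_PIN_HIGH)
		p.top_down = true;
	if ((flags & TOY_PIN_ZONE_4G) && p.end > (1ULL << 32))
		p.end = 1ULL << 32;	/* clamp: a choice of this sketch */
	return p;
}
```

Note that an object without the 48b flag still searches top-down, but only within the first 4GB, so the two flags are independent.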

v2: Changed flag logic from needs_32b, to supports_48b.
v3: Moved 48-bit support flag back to exec_object. (Chris, Daniel)
v4: Split pin flags into PIN_ZONE_4G and PIN_HIGH; update PIN_OFFSET_MASK
to use last PIN_ defined instead of hard-coded value; use correct limit
check in eb_vma_misplaced. (Chris)

Cc: Chris Wilson 
Cc: Ben Widawsky 
Signed-off-by: Michel Thierry 
---
 drivers/gpu/drm/i915/i915_drv.h|  4 +++-
 drivers/gpu/drm/i915/i915_gem.c| 17 +++--
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 10 ++
 include/uapi/drm/i915_drm.h|  3 ++-
 4 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index c720a18..aac51fb 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2765,7 +2765,9 @@ void i915_gem_vma_destroy(struct i915_vma *vma);
 #define PIN_OFFSET_BIAS(1<<3)
 #define PIN_USER   (1<<4)
 #define PIN_UPDATE (1<<5)
-#define PIN_OFFSET_MASK (~4095)
+#define PIN_ZONE_4G(1<<6)
+#define PIN_HIGH   (1<<7)
+#define PIN_OFFSET_MASK -(PIN_HIGH<<1)
 int __must_check
 i915_gem_object_pin(struct drm_i915_gem_object *obj,
struct i915_address_space *vm,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index eeea748..8aa0189 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3718,6 +3718,8 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
struct drm_i915_private *dev_priv = dev->dev_private;
u32 fence_alignment, unfenced_alignment;
u64 size, fence_size;
+   u32 search_flag = DRM_MM_SEARCH_DEFAULT;
+   u32 alloc_flag = DRM_MM_CREATE_DEFAULT;
u64 start =
flags & PIN_OFFSET_BIAS ? flags & PIN_OFFSET_MASK : 0;
u64 end =
@@ -3759,6 +3761,17 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
   obj->tiling_mode,
   false);
size = flags & PIN_MAPPABLE ? fence_size : obj->base.size;
+
+   if (flags & PIN_HIGH) {
+   search_flag = DRM_MM_SEARCH_BELOW;
+   alloc_flag = DRM_MM_CREATE_TOP;
+   }
+
+   /* Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset,
+* limit address to the first 4GBs for flagged objects.
+*/
+   if (flags & PIN_ZONE_4G)
+   end = (1ULL << 32);
}
 
if (alignment == 0)
@@ -3801,8 +3814,8 @@ search_free:
  size, alignment,
  obj->cache_level,
  start, end,
- DRM_MM_SEARCH_DEFAULT,
- DRM_MM_CREATE_DEFAULT);
+ search_flag,
+ alloc_flag);
if (ret) {
ret = i915_gem_evict_something(dev, vm, size, alignment,
   obj->cache_level,
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 600db74..f52b736 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -588,11 +588,17 @@ i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
if (entry->flags & EXEC_OBJECT_NEEDS_GTT)
flags |= PIN_GLOBAL;
 
+   flags |= PIN_ZONE_4G;
+   if (entry->flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS)
+   flags &= ~PIN_ZONE_4G;
+
if (!drm_mm_node_allocated(&vma->node)) {
if (entry->flags & __EXEC_OBJECT_NEEDS_MAP)
flags |= PIN_GLOBAL | PIN_MAPPABLE;
if (entry->flags & __EXEC_OBJECT_NEEDS_BIAS)
flags |= BATCH_OFFSET_BIAS | PIN_OFFSET_BIAS;
+   if ((flags & PIN_MAPPABLE) == 0)
+   flags |= PIN_HIGH;
}
 
ret = i915_gem_o

[Intel-gfx] [PATCH v3 17/17] drm/i915/gen8: Flip the 48b switch

2015-07-01 Thread Michel Thierry

Use 48b addresses if hw supports it (i915.enable_ppgtt=3).

Note, aliasing PPGTT remains 32b only.
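
The resulting auto-selection can be summarized with a small sketch. It is a simplification under stated assumptions: toy_auto_ppgtt is a hypothetical name, and the explicit module-parameter overrides, the vGPU case and the VT-d check in the real function are all left out.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Default PPGTT mode when nothing was forced by the user:
 * 3 = full 48b, 2 = full 32b, 1 = aliasing, 0 = disabled.
 */
static int toy_auto_ppgtt(int gen, bool execlists, bool has_full_64bit)
{
	assert(gen > 0);

	if (gen >= 8 && execlists)
		return has_full_64bit ? 3 : 2;

	return gen >= 6 ? 1 : 0;	/* aliasing PPGTT needs gen6+ */
}
```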

Signed-off-by: Michel Thierry 
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 4 ++--
 drivers/gpu/drm/i915/i915_params.c  | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 7712b10..27dc28c 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -110,7 +110,7 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
has_full_ppgtt = INTEL_INFO(dev)->gen >= 7;
has_full_64bit_ppgtt = IS_ENABLED(CONFIG_X86_64) &&
   (IS_BROADWELL(dev) ||
-   INTEL_INFO(dev)->gen >= 9) && false; /* FIXME: 64b */
+   INTEL_INFO(dev)->gen >= 9);
 
if (intel_vgpu_active(dev))
has_full_ppgtt = false; /* emulation is too hard */
@@ -148,7 +148,7 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int enable_ppgtt)
}
 
if (INTEL_INFO(dev)->gen >= 8 && i915.enable_execlists)
-   return 2;
+   return has_full_64bit_ppgtt ? 3 : 2;
else
return has_aliasing_ppgtt ? 1 : 0;
 }
diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
index 7983fe4..ccf3eb2 100644
--- a/drivers/gpu/drm/i915/i915_params.c
+++ b/drivers/gpu/drm/i915/i915_params.c
@@ -110,7 +110,7 @@ MODULE_PARM_DESC(enable_hangcheck,
 module_param_named_unsafe(enable_ppgtt, i915.enable_ppgtt, int, 0400);
 MODULE_PARM_DESC(enable_ppgtt,
"Override PPGTT usage. "
-   "(-1=auto [default], 0=disabled, 1=aliasing, 2=full)");
+   "(-1=auto [default], 0=disabled, 1=aliasing, 2=full, 3=full_64b)");
 
 module_param_named(enable_execlists, i915.enable_execlists, int, 0400);
 MODULE_PARM_DESC(enable_execlists,
-- 
2.4.5

[Intel-gfx] [PATCH libdrm v2 2/2] configure.ac: bump version to 2.4.63

2015-07-01 Thread Michel Thierry

Cc: dri-de...@lists.freedesktop.org
Signed-off-by: Michel Thierry 
---
 configure.ac | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/configure.ac b/configure.ac
index 001fd3d..12b8465 100644
--- a/configure.ac
+++ b/configure.ac
@@ -20,7 +20,7 @@
 
 AC_PREREQ([2.63])
 AC_INIT([libdrm],
-[2.4.62],
+[2.4.63],
 [https://bugs.freedesktop.org/enter_bug.cgi?product=DRI],
 [libdrm])
 
-- 
2.4.5

[Intel-gfx] [PATCH v3 14/17] drm/i915: batch_obj vm offset must be u64

2015-07-01 Thread Michel Thierry

Otherwise it can overflow in 48-bit mode, and cause an incorrect
exec_start.
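
The overflow is easy to reproduce outside the driver. The sketch below is only an illustration: the toy_* names are hypothetical, and exec_start is modelled as the batch object's VM offset plus the batch start offset.

```c
#include <assert.h>
#include <stdint.h>

/* What a 32-bit field keeps of a 64-bit VM offset. */
static uint32_t toy_store_u32(uint64_t vm_offset)
{
	return (uint32_t)vm_offset;	/* upper 32 bits are silently dropped */
}

/* Start address of the batch: object offset plus offset into the object. */
static uint64_t toy_exec_start(uint64_t batch_vm_offset,
			       uint32_t batch_start_offset)
{
	assert(batch_vm_offset <= UINT64_MAX - batch_start_offset);
	return batch_vm_offset + batch_start_offset;
}
```

With a batch placed just above 4GB, the truncated field yields a start address inside the first 4GB, which is the incorrect exec_start the commit message refers to.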

Before commit 5f19e2bff ("drm/i915: Merged the many do_execbuf()
parameters into a structure"), it was already a u64, so it could be
seen as a regression (or as an optimization that looked good at that time).

Cc: John Harrison 
Signed-off-by: Michel Thierry 
---
 drivers/gpu/drm/i915/i915_drv.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index d245c82..c720a18 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1664,7 +1664,7 @@ struct i915_execbuffer_params {
struct drm_file *file;
uint32_tdispatch_flags;
uint32_targs_batch_start_offset;
-   uint32_tbatch_obj_vm_offset;
+   uint64_tbatch_obj_vm_offset;
struct intel_engine_cs  *ring;
struct drm_i915_gem_object  *batch_obj;
struct intel_context*ctx;
-- 
2.4.5

[Intel-gfx] [PATCH v3 15/17] drm/i915/userptr: Kill user_size limit check

2015-07-01 Thread Michel Thierry

The GTT was only 32b, so its max size is 4GB. To allow objects bigger
than 4GB in 48b PPGTT, i915_gem_userptr_ioctl could check against the
max 48b range (1ULL << 48) instead.

But since the check no longer applies, just kill the limit.

v2: Use the default ctx to infer the ppgtt max size (Akash).
v3: Just kill the limit, it was only there for early detection of an
error when used for execbuffer (Chris).

Cc: Akash Goel 
Cc: Chris Wilson 
Signed-off-by: Michel Thierry 
---
 drivers/gpu/drm/i915/i915_gem_userptr.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
index 1f4e5a3..1b66e39 100644
--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
@@ -788,7 +788,6 @@ static const struct drm_i915_gem_object_ops i915_gem_userptr_ops = {
 int
 i915_gem_userptr_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 {
-   struct drm_i915_private *dev_priv = dev->dev_private;
struct drm_i915_gem_userptr *args = data;
struct drm_i915_gem_object *obj;
int ret;
@@ -801,9 +800,6 @@ i915_gem_userptr_ioctl(struct drm_device *dev, void *data, struct drm_file *file
if (offset_in_page(args->user_ptr | args->user_size))
return -EINVAL;
 
-   if (args->user_size > dev_priv->gtt.base.total)
-   return -E2BIG;
-
 if (!access_ok(args->flags & I915_USERPTR_READ_ONLY ? VERIFY_READ : VERIFY_WRITE,
   (char __user *)(unsigned long)args->user_ptr, args->user_size))
return -EFAULT;
-- 
2.4.5

[Intel-gfx] [PATCH v3 01/17] drm/i915: Remove unnecessary gen8_clamp_pd

2015-07-01 Thread Michel Thierry

gen8_clamp_pd clamps to the next page directory boundary, but the macro
gen8_for_each_pde already has a check to stop at the page directory boundary.

Furthermore, i915_pte_count also restricts to the next page table
boundary.
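
To see why the extra clamp is redundant, consider a walk that already splits a range at page directory boundaries, in the way the "temp = min(temp, length)" step of the iterator macro does. The sketch below is an illustration only: the toy_* names are hypothetical, and it assumes each page directory spans 1 GiB (a PDPE shift of 30).

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PD_SHIFT 30				/* assumed: 1 GiB per page directory */
#define TOY_PD_SPAN  (1ULL << TOY_PD_SHIFT)

/* Bytes from start up to the next directory boundary, capped at length. */
static uint64_t toy_chunk(uint64_t start, uint64_t length)
{
	uint64_t to_boundary = TOY_PD_SPAN - (start & (TOY_PD_SPAN - 1));

	return to_boundary < length ? to_boundary : length;
}

/* Walk [start, start + length) one directory at a time; count the steps. */
static unsigned toy_walk(uint64_t start, uint64_t length)
{
	unsigned chunks = 0;

	while (length) {
		uint64_t step = toy_chunk(start, length);

		/* each step stays inside a single page directory */
		assert((start >> TOY_PD_SHIFT) ==
		       ((start + step - 1) >> TOY_PD_SHIFT));
		start += step;
		length -= step;
		chunks++;
	}
	return chunks;
}
```

Because every step is already capped at the boundary, clamping the length a second time inside the loop body changes nothing.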

v2: Rebase after Mika's ppgtt cleanup / scratch merge patch series.

Suggested-by: Akash Goel 
Signed-off-by: Michel Thierry 
---
 drivers/gpu/drm/i915/i915_gem_gtt.c |  2 +-
 drivers/gpu/drm/i915/i915_gem_gtt.h | 11 ---
 2 files changed, 1 insertion(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index b29b73f..712ca34 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -955,7 +955,7 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
gen8_pde_t *const page_directory = kmap_px(pd);
struct i915_page_table *pt;
-   uint64_t pd_len = gen8_clamp_pd(start, length);
+   uint64_t pd_len = length;
uint64_t pd_start = start;
uint32_t pde;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index e1cfa29..d5bf953 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -444,17 +444,6 @@ static inline uint32_t gen6_pde_index(uint32_t addr)
 temp = min(temp, length),  \
 start += temp, length -= temp)
 
-/* Clamp length to the next page_directory boundary */
-static inline uint64_t gen8_clamp_pd(uint64_t start, uint64_t length)
-{
-   uint64_t next_pd = ALIGN(start + 1, 1 << GEN8_PDPE_SHIFT);
-
-   if (next_pd > (start + length))
-   return length;
-
-   return next_pd - start;
-}
-
 static inline uint32_t gen8_pte_index(uint64_t address)
 {
return i915_pte_index(address, GEN8_PDE_SHIFT);
-- 
2.4.5

[Intel-gfx] [PATCH v3 13/17] drm/i915: object size needs to be u64

2015-07-01 Thread Michel Thierry

In a 48b world, users can try to allocate buffers bigger than 4GB; in
these cases it is important that size is a 64b variable.
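
A small sketch shows what goes wrong with a 32-bit size. The toy_* names are hypothetical, and the check is modelled on the "size > end" test visible in the diff: a 5GB object stored in a u32 wraps to 1GB and wrongly passes a 4GB limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Range check with the size held in a 32-bit variable (the old type). */
static bool toy_fits_u32(uint64_t obj_size, uint64_t end)
{
	uint32_t size = (uint32_t)obj_size;	/* wraps modulo 4GB */

	assert(end > 0);
	return !(size > end);
}

/* The same check with a 64-bit size. */
static bool toy_fits_u64(uint64_t obj_size, uint64_t end)
{
	assert(end > 0);
	return !(obj_size > end);
}
```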

Also added a warning for illegal bind with size = 0.

Signed-off-by: Michel Thierry 
---
 drivers/gpu/drm/i915/i915_gem.c | 5 +++--
 drivers/gpu/drm/i915/i915_gem_gtt.c | 3 +++
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index a2a4a27..eeea748 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3716,7 +3716,8 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
 {
struct drm_device *dev = obj->base.dev;
struct drm_i915_private *dev_priv = dev->dev_private;
-   u32 size, fence_size, fence_alignment, unfenced_alignment;
+   u32 fence_alignment, unfenced_alignment;
+   u64 size, fence_size;
u64 start =
flags & PIN_OFFSET_BIAS ? flags & PIN_OFFSET_MASK : 0;
u64 end =
@@ -3775,7 +3776,7 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
 * attempt to find space.
 */
if (size > end) {
-   DRM_DEBUG("Attempting to bind an object (view type=%u) larger than the aperture: size=%u > %s aperture=%llu\n",
+   DRM_DEBUG("Attempting to bind an object (view type=%u) larger than the aperture: size=%llu > %s aperture=%llu\n",
  ggtt_view ? ggtt_view->type : 0,
  size,
  flags & PIN_MAPPABLE ? "mappable" : "total",
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 0c41e5d..7712b10 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -3336,6 +3336,9 @@ int i915_vma_bind(struct i915_vma *vma, enum i915_cache_level cache_level,
if (WARN_ON(flags == 0))
return -EINVAL;
 
+   if (WARN_ON(vma->node.size == 0))
+   return -EINVAL;
+
bind_flags = 0;
if (flags & PIN_GLOBAL)
bind_flags |= GLOBAL_BIND;
-- 
2.4.5

[Intel-gfx] [PATCH v3 11/17] drm/i915: Expand error state's address width to 64b

2015-07-01 Thread Michel Thierry

Signed-off-by: Ben Widawsky 
Signed-off-by: Michel Thierry 
---
 drivers/gpu/drm/i915/i915_drv.h   |  4 ++--
 drivers/gpu/drm/i915/i915_gpu_error.c | 17 +
 2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 7bccfd5..d245c82 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -546,7 +546,7 @@ struct drm_i915_error_state {
 
struct drm_i915_error_object {
int page_count;
-   u32 gtt_offset;
+   u64 gtt_offset;
u32 *pages[0];
} *ringbuffer, *batchbuffer, *wa_batchbuffer, *ctx, *hws_page;
 
@@ -572,7 +572,7 @@ struct drm_i915_error_state {
u32 size;
u32 name;
u32 rseqno[I915_NUM_RINGS], wseqno;
-   u32 gtt_offset;
+   u64 gtt_offset;
u32 read_domains;
u32 write_domain;
s32 fence_reg:I915_MAX_NUM_FENCE_BITS;
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 6f42569..cdbd4c2 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -197,7 +197,7 @@ static void print_error_buffers(struct drm_i915_error_state_buf *m,
err_printf(m, "  %s [%d]:\n", name, count);
 
while (count--) {
-   err_printf(m, "%08x %8u %02x %02x [ ",
+   err_printf(m, "%016llx %8u %02x %02x [ ",
   err->gtt_offset,
   err->size,
   err->read_domains,
@@ -426,7 +426,7 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
err_printf(m, " (submitted by %s [%d])",
   error->ring[i].comm,
   error->ring[i].pid);
-   err_printf(m, " --- gtt_offset = 0x%08x\n",
+   err_printf(m, " --- gtt_offset = 0x%016llx\n",
   obj->gtt_offset);
print_error_obj(m, obj);
}
@@ -434,7 +434,8 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
obj = error->ring[i].wa_batchbuffer;
if (obj) {
err_printf(m, "%s (w/a) --- gtt_offset = 0x%08x\n",
-  dev_priv->ring[i].name, obj->gtt_offset);
+  dev_priv->ring[i].name,
+  lower_32_bits(obj->gtt_offset));
print_error_obj(m, obj);
}
 
@@ -453,14 +454,14 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
if ((obj = error->ring[i].ringbuffer)) {
err_printf(m, "%s --- ringbuffer = 0x%08x\n",
   dev_priv->ring[i].name,
-  obj->gtt_offset);
+  lower_32_bits(obj->gtt_offset));
print_error_obj(m, obj);
}
 
if ((obj = error->ring[i].hws_page)) {
err_printf(m, "%s --- HW Status = 0x%08x\n",
   dev_priv->ring[i].name,
-  obj->gtt_offset);
+  lower_32_bits(obj->gtt_offset));
offset = 0;
for (elt = 0; elt < PAGE_SIZE/16; elt += 4) {
err_printf(m, "[%04x] %08x %08x %08x %08x\n",
@@ -476,13 +477,13 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
if ((obj = error->ring[i].ctx)) {
err_printf(m, "%s --- HW Context = 0x%08x\n",
   dev_priv->ring[i].name,
-  obj->gtt_offset);
+  lower_32_bits(obj->gtt_offset));
print_error_obj(m, obj);
}
}
 
if ((obj = error->semaphore_obj)) {
-   err_printf(m, "Semaphore page = 0x%08x\n", obj->gtt_offset);
+   err_printf(m, "Semaphore page = 0x%016llx\n", obj->gtt_offset);
for (elt = 0; elt < PAGE_SIZE/16; elt += 4) {
err_printf(m, "[%04x] %08x %08x %08x %08x\n",
   elt * 4,
@@ -590,7 +591,7 @@ i915_error_object_create(struct drm_i915_private *dev_priv,
int num_pages;
bool use_ggtt;
int i = 0;
-   u32 reloc_offset;
+   u64 reloc_offset;
 
if (src == NULL || src->pages == NULL)
return NULL;
-- 
2.4.5


[Intel-gfx] [PATCH v3 08/17] drm/i915/gen8: Pass sg_iter through pte inserts

2015-07-01 Thread Michel Thierry

As a step towards implementing 4 levels, while not discarding the
existing pte insert functions, we need to pass the sg_iter through.
The current function only understands page directory granularity.
An object's pages may span the page directory, and so using the iter
directly as we write the PTEs allows the iterator to stay coherent
through a VMA insert operation spanning multiple page table levels.
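
A minimal sketch of the idea above, not code from this series: the caller
owns the page iterator and hands it by pointer to a helper that fills one
table at a time, so the position in the page list survives across table
boundaries. Every name here (sketch_page_iter, sketch_fill_table,
sketch_insert) and the tiny table size are assumptions made for
illustration; the real code uses struct sg_page_iter and the GEN8 tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PTES_PER_TABLE 4 /* tiny on purpose; real tables are much larger */

/* Hypothetical stand-in for struct sg_page_iter: walks a flat array of pages. */
struct sketch_page_iter {
	const uint64_t *pages;
	size_t count;
	size_t next;      /* index of the next page to hand out */
	uint64_t current; /* valid after sketch_iter_next() returned 1 */
};

/* Advance the iterator; returns 0 once all pages have been consumed. */
static int sketch_iter_next(struct sketch_page_iter *it)
{
	assert(it != NULL);
	if (it->next >= it->count)
		return 0;
	it->current = it->pages[it->next++];
	return 1;
}

/*
 * Fill one table starting at slot first_pte. Stops when the table is full or
 * the pages run out. The table-full check comes first, so no page is
 * consumed without being written.
 */
static size_t sketch_fill_table(uint64_t *table, size_t first_pte,
				struct sketch_page_iter *it)
{
	size_t pte = first_pte;
	size_t written = 0;

	while (pte < SKETCH_PTES_PER_TABLE && sketch_iter_next(it)) {
		table[pte++] = it->current;
		written++;
	}
	return written;
}

/* Caller-owned iterator: the same state is reused for every table. */
static size_t sketch_insert(uint64_t tables[][SKETCH_PTES_PER_TABLE],
			    size_t ntables, size_t first_pte,
			    const uint64_t *pages, size_t count)
{
	struct sketch_page_iter it = { pages, count, 0, 0 };
	size_t total = 0;
	size_t t;

	for (t = 0; t < ntables; t++)
		total += sketch_fill_table(tables[t], t == 0 ? first_pte : 0, &it);
	return total;
}
```

If the helper created its own iterator on every call, each table would start
again from the first page; passing the iterator in is what keeps the walk
coherent.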

v2: Rebase after s/page_tables/page_table/.
v3: Rebase after Mika's ppgtt cleanup / scratch merge patch series;
updated commit message (s/map/insert).

Signed-off-by: Ben Widawsky 
Signed-off-by: Michel Thierry  (v2+)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index bd31cbc..67d02b9 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -774,7 +774,7 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 static void
 gen8_ppgtt_insert_pte_entries(struct i915_address_space *vm,
  struct i915_page_directory_pointer *pdp,
- struct sg_table *pages,
+ struct sg_page_iter *sg_iter,
  uint64_t start,
  enum i915_cache_level cache_level)
 {
@@ -784,11 +784,10 @@ gen8_ppgtt_insert_pte_entries(struct i915_address_space *vm,
unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
-   struct sg_page_iter sg_iter;
 
pt_vaddr = NULL;
 
-   for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
+   while (__sg_page_iter_next(sg_iter)) {
if (pt_vaddr == NULL) {
struct i915_page_directory *pd = pdp->page_directory[pdpe];
struct i915_page_table *pt = pd->page_table[pde];
@@ -796,7 +795,7 @@ gen8_ppgtt_insert_pte_entries(struct i915_address_space *vm,
}
 
pt_vaddr[pte] =
-   gen8_pte_encode(sg_page_iter_dma_address(&sg_iter),
+   gen8_pte_encode(sg_page_iter_dma_address(sg_iter),
cache_level, true);
if (++pte == GEN8_PTES) {
kunmap_px(ppgtt, pt_vaddr);
@@ -822,8 +821,10 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
+   struct sg_page_iter sg_iter;
 
-   gen8_ppgtt_insert_pte_entries(vm, pdp, pages, start, cache_level);
+   __sg_page_iter_start(&sg_iter, pages->sgl, sg_nents(pages->sgl), 0);
+   gen8_ppgtt_insert_pte_entries(vm, pdp, &sg_iter, start, cache_level);
 }
 
 static void gen8_free_page_tables(struct drm_device *dev,
-- 
2.4.5


[Intel-gfx] [PATCH v3 12/17] drm/i915/gen8: Add ppgtt info and debug_dump

2015-07-01 Thread Michel Thierry

v2: Clean up patch after rebases.
v3: gen8_dump_ppgtt for 32b and 48b PPGTT.
v4: Use used_pml4es/pdpes (Akash).
v5: Rebase after Mika's ppgtt cleanup / scratch merge patch series.

Signed-off-by: Ben Widawsky 
Signed-off-by: Michel Thierry  (v2+)
---
 drivers/gpu/drm/i915/i915_debugfs.c | 18 
 drivers/gpu/drm/i915/i915_gem_gtt.c | 92 +
 2 files changed, 102 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index ad9a737..8c3dcc9 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2223,7 +2223,6 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
 {
struct drm_i915_private *dev_priv = dev->dev_private;
struct intel_engine_cs *ring;
-   struct drm_file *file;
int i;
 
if (INTEL_INFO(dev)->gen == 6)
@@ -2246,13 +2245,6 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
ppgtt->debug_dump(ppgtt, m);
}
 
-   list_for_each_entry_reverse(file, &dev->filelist, lhead) {
-   struct drm_i915_file_private *file_priv = file->driver_priv;
-
-   seq_printf(m, "proc: %s\n",
-  get_pid_task(file->pid, PIDTYPE_PID)->comm);
-   idr_for_each(&file_priv->context_idr, per_file_ctx, m);
-   }
seq_printf(m, "ECOCHK: 0x%08x\n", I915_READ(GAM_ECOCHK));
 }
 
@@ -2261,6 +2253,7 @@ static int i915_ppgtt_info(struct seq_file *m, void *data)
struct drm_info_node *node = m->private;
struct drm_device *dev = node->minor->dev;
struct drm_i915_private *dev_priv = dev->dev_private;
+   struct drm_file *file;
 
int ret = mutex_lock_interruptible(&dev->struct_mutex);
if (ret)
@@ -2272,6 +2265,15 @@ static int i915_ppgtt_info(struct seq_file *m, void *data)
else if (INTEL_INFO(dev)->gen >= 6)
gen6_ppgtt_info(m, dev);
 
+   list_for_each_entry_reverse(file, &dev->filelist, lhead) {
+   struct drm_i915_file_private *file_priv = file->driver_priv;
+
+   seq_printf(m, "\nproc: %s\n",
+  get_pid_task(file->pid, PIDTYPE_PID)->comm);
+   idr_for_each(&file_priv->context_idr, per_file_ctx,
+(void *)(unsigned long)m);
+   }
+
intel_runtime_pm_put(dev_priv);
mutex_unlock(&dev->struct_mutex);
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index c6fc0d3..0c41e5d 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1337,6 +1337,97 @@ static int gen8_alloc_va_range(struct i915_address_space *vm,
return gen8_alloc_va_range_3lvl(vm, &ppgtt->pdp, start, length);
 }
 
+static void gen8_dump_pdp(struct i915_page_directory_pointer *pdp,
+ uint64_t start, uint64_t length,
+ gen8_pte_t scratch_pte,
+ struct seq_file *m)
+{
+   struct i915_page_directory *pd;
+   uint64_t temp;
+   uint32_t pdpe;
+
+   gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
+   struct i915_page_table *pt;
+   uint64_t pd_len = length;
+   uint64_t pd_start = start;
+   uint32_t pde;
+
+   if (!pd)
+   continue;
+
+   if(!test_bit(pdpe, pdp->used_pdpes))
+   continue;
+
+   seq_printf(m, "\tPDPE #%d\n", pdpe);
+   gen8_for_each_pde(pt, pd, pd_start, pd_len, temp, pde) {
+   uint32_t  pte;
+   gen8_pte_t *pt_vaddr;
+
+   if (!pt)
+   continue;
+
+   pt_vaddr = kmap_px(pt);
+   for (pte = 0; pte < GEN8_PTES; pte+=4) {
+   uint64_t va =
+   (pdpe << GEN8_PDPE_SHIFT) |
+   (pde << GEN8_PDE_SHIFT) |
+   (pte << GEN8_PTE_SHIFT);
+   int i;
+   bool found = false;
+   for (i = 0; i < 4; i++)
+   if (pt_vaddr[pte + i] != scratch_pte)
+   found = true;
+   if (!found)
+   continue;
+
+   seq_printf(m, "\t\t0x%llx [%03d,%03d,%04d]: =", va, pdpe, pde, pte);
+   for (i = 0; i < 4; i++) {
+   if (pt_vaddr[pte + i] != scratch_pte)
+   seq_printf(m, " %llx", pt_vaddr[pte + i]);
+   else
+

[Intel-gfx] [PATCH v3 06/17] drm/i915/gen8: Add 4 level switching infrastructure and lrc support

2015-07-01 Thread Michel Thierry

In 64b (48bit canonical) PPGTT addressing, the PDP0 register contains
the base address to PML4, while the other PDP registers are ignored.
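
As a rough illustration of the statement above (a sketch under stated
assumptions, not the driver's register programming): in 48b mode only PDP0
carries a value, the PML4 base, while legacy 32b mode loads one page
directory address per PDP. The struct and function names below are
hypothetical and deliberately simplified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_LEGACY_PDPES 4 /* legacy 32b mode: 1 PDP with 4 PDPEs */

/* Hypothetical, simplified view of a PPGTT root; not the driver's struct. */
struct sketch_ppgtt {
	int uses_48b;                         /* 1 when 4-level (48b) addressing is on */
	uint64_t pml4_dma;                    /* PML4 base address (48b mode only) */
	uint64_t pd_dma[SKETCH_LEGACY_PDPES]; /* page directory addresses (32b mode) */
};

/*
 * Fill regs[] with the values the PDP registers would be loaded with and
 * return how many of them are meaningful.
 */
static size_t sketch_pdp_values(const struct sketch_ppgtt *ppgtt,
				uint64_t regs[SKETCH_LEGACY_PDPES])
{
	size_t i;

	assert(ppgtt != NULL);
	if (ppgtt->uses_48b) {
		regs[0] = ppgtt->pml4_dma; /* PDP0 holds the PML4 base; PDP1..3 unused */
		return 1;
	}
	for (i = 0; i < SKETCH_LEGACY_PDPES; i++)
		regs[i] = ppgtt->pd_dma[i];
	return SKETCH_LEGACY_PDPES;
}
```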

In LRC, the addressing mode must be specified in every context descriptor.

v2: PML4 update in legacy context switch is left for historic reasons,
the preferred mode of operation is with lrc context based submission.
v3: s/gen8_map_page_directory/gen8_setup_page_directory and
s/gen8_map_page_directory_pointer/gen8_setup_page_directory_pointer.
Also, clflush will be needed for bxt. (Akash)
v4: Squashed lrc-specific code and use a macro to set PML4 register.
v5: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
PDP update in bb_start is only for legacy 32b mode.
v6: Rebase after final merged version of Mika's ppgtt/scratch patches.

Cc: Akash Goel 
Signed-off-by: Ben Widawsky 
Signed-off-by: Michel Thierry  (v2+)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 52 ++---
 drivers/gpu/drm/i915/i915_gem_gtt.h |  2 ++
 drivers/gpu/drm/i915/i915_reg.h |  1 +
 drivers/gpu/drm/i915/intel_lrc.c| 65 +++--
 4 files changed, 97 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index d23b0a8..fcb8c4b 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -211,6 +211,9 @@ static gen8_pde_t gen8_pde_encode(const dma_addr_t addr,
return pde;
 }
 
+#define gen8_pdpe_encode gen8_pde_encode
+#define gen8_pml4e_encode gen8_pde_encode
+
 static gen6_pte_t snb_pte_encode(dma_addr_t addr,
 enum i915_cache_level level,
 bool valid, u32 unused)
@@ -624,6 +627,35 @@ static void free_pdp(struct drm_device *dev,
}
 }
 
+static void
+gen8_setup_page_directory(struct i915_hw_ppgtt *ppgtt,
+ struct i915_page_directory_pointer *pdp,
+ struct i915_page_directory *pd,
+ int index)
+{
+   gen8_ppgtt_pdpe_t *page_directorypo;
+
+   if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev))
+   return;
+
+   page_directorypo = kmap_px(pdp);
+   page_directorypo[index] = gen8_pdpe_encode(px_dma(pd), I915_CACHE_LLC);
+   kunmap_px(ppgtt, page_directorypo);
+}
+
+static void
+gen8_setup_page_directory_pointer(struct i915_hw_ppgtt *ppgtt,
+ struct i915_pml4 *pml4,
+ struct i915_page_directory_pointer *pdp,
+ int index)
+{
+   gen8_ppgtt_pml4e_t *pagemap = kmap_px(pml4);
+
+   WARN_ON(!USES_FULL_48BIT_PPGTT(ppgtt->base.dev));
+   pagemap[index] = gen8_pml4e_encode(px_dma(pdp), I915_CACHE_LLC);
+   kunmap_px(ppgtt, pagemap);
+}
+
 /* Broadwell Page Directory Pointer Descriptors */
 static int gen8_write_pdp(struct drm_i915_gem_request *req,
  unsigned entry,
@@ -649,8 +681,8 @@ static int gen8_write_pdp(struct drm_i915_gem_request *req,
return 0;
 }
 
-static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
- struct drm_i915_gem_request *req)
+static int gen8_legacy_mm_switch(struct i915_hw_ppgtt *ppgtt,
+struct drm_i915_gem_request *req)
 {
int i, ret;
 
@@ -665,6 +697,12 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
return 0;
 }
 
+static int gen8_48b_mm_switch(struct i915_hw_ppgtt *ppgtt,
+ struct drm_i915_gem_request *req)
+{
+   return gen8_write_pdp(req, 0, px_dma(&ppgtt->pml4));
+}
+
 static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
   uint64_t start,
   uint64_t length,
@@ -1112,6 +1150,7 @@ static int gen8_alloc_va_range_3lvl(struct i915_address_space *vm,
 
__set_bit(pdpe, pdp->used_pdpes);
gen8_map_pagetable_range(ppgtt, pd, start, length);
+   gen8_setup_page_directory(ppgtt, pdp, pd, pdpe);
}
 
free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
@@ -1181,6 +1220,8 @@ static int gen8_alloc_va_range_4lvl(struct i915_address_space *vm,
ret = gen8_alloc_va_range_3lvl(vm, pdp, start, length);
if (ret)
goto err_out;
+
+   gen8_setup_page_directory_pointer(ppgtt, pml4, pdp, pml4e);
}
 
bitmap_or(pml4->used_pml4es, new_pdps, pml4->used_pml4es,
@@ -1230,14 +1271,13 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
ppgtt->base.unbind_vma = ppgtt_unbind_vma;
ppgtt->base.bind_vma = ppgtt_bind_vma;
 
-   ppgtt->switch_mm = gen8_mm_switch;
-
if (USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
ret = setup_px(ppgtt->base.dev, &ppgtt->pml4);
if (ret)
goto free_scratch;
 
ppgtt->base.

[Intel-gfx] [PATCH v3 03/17] drm/i915/gen8: Abstract PDP usage

2015-07-01 Thread Michel Thierry

Up until now, ppgtt->pdp has always been the root of our page tables.
Legacy 32b addresses acted like it had 1 PDP with 4 PDPEs.

In preparation for 4 level page tables, we need to stop using ppgtt->pdp
directly unless we know it's what we want. The future structure will use
ppgtt->pml4 for the top level, and the pdp is just one of the entries
being pointed to by a pml4e.

v2: Updated after dynamic page allocation changes.
v3: Rebase after s/page_tables/page_table/.
v4: Rebase after changes in "Dynamic page table allocations" patch.
v5: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
v6: Rebase after final merged version of Mika's ppgtt/scratch patches.

Signed-off-by: Ben Widawsky 
Signed-off-by: Michel Thierry  (v2+)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 136 +++-
 1 file changed, 88 insertions(+), 48 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index cdcc778..41a18ff 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -529,6 +529,25 @@ static void gen8_initialize_pd(struct i915_address_space *vm,
fill_px(vm->dev, pd, scratch_pde);
 }
 
+/* It's likely we'll map more than one page table at a time. This function will
+ * save us unnecessary kmap calls, but do no more functionally than multiple
+ * calls to pde_encode. The ppgtt is only needed to reuse the kunmap macro. */
+static void gen8_map_pagetable_range(struct i915_hw_ppgtt *ppgtt,
+struct i915_page_directory *pd,
+uint64_t start,
+uint64_t length)
+{
+   gen8_pde_t * const page_directory = kmap_px(pd);
+   struct i915_page_table *pt;
+   uint64_t temp, pde;
+
+   gen8_for_each_pde(pt, pd, start, length, temp, pde)
+   page_directory[pde] = gen8_pde_encode(px_dma(pt),
+ I915_CACHE_LLC);
+
+   kunmap_px(ppgtt, page_directory);
+}
+
 static int __pdp_init(struct drm_device *dev,
  struct i915_page_directory_pointer *pdp)
 {
@@ -616,6 +635,7 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
 {
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
+   struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
gen8_pte_t *pt_vaddr, scratch_pte;
unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
@@ -630,10 +650,10 @@ static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
struct i915_page_directory *pd;
struct i915_page_table *pt;
 
-   if (WARN_ON(!ppgtt->pdp.page_directory[pdpe]))
+   if (WARN_ON(!pdp->page_directory[pdpe]))
break;
 
-   pd = ppgtt->pdp.page_directory[pdpe];
+   pd = pdp->page_directory[pdpe];
 
if (WARN_ON(!pd->page_table[pde]))
break;
@@ -671,6 +691,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 {
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
+   struct i915_page_directory_pointer *pdp = &ppgtt->pdp; /* FIXME: 48b */
gen8_pte_t *pt_vaddr;
unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
@@ -681,7 +702,7 @@ static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
 
for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
if (pt_vaddr == NULL) {
-   struct i915_page_directory *pd = ppgtt->pdp.page_directory[pdpe];
+   struct i915_page_directory *pd = pdp->page_directory[pdpe];
struct i915_page_table *pt = pd->page_table[pde];
pt_vaddr = kmap_px(pt);
}
@@ -763,23 +784,28 @@ static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
container_of(vm, struct i915_hw_ppgtt, base);
int i;
 
-   for_each_set_bit(i, ppgtt->pdp.used_pdpes,
-   I915_PDPES_PER_PDP(ppgtt->base.dev)) {
-   if (WARN_ON(!ppgtt->pdp.page_directory[i]))
-   continue;
+   if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
+   for_each_set_bit(i, ppgtt->pdp.used_pdpes,
+I915_PDPES_PER_PDP(ppgtt->base.dev)) {
+   if (WARN_ON(!ppgtt->pdp.page_directory[i]))
+   continue;
 
-   gen8_free_page_tables(ppgtt->base.dev,
- ppgtt->pdp.page_directory[i]);
-   free_pd(ppgtt->base.dev, ppgtt->pdp.page_directory[i]);
+   gen8_fr

[Intel-gfx] [PATCH v3 04/17] drm/i915/gen8: Add dynamic page trace events

2015-07-01 Thread Michel Thierry

The dynamic page allocation patch series added trace events for GEN6; this
patch adds them for GEN8.

v2: Consolidate pagetable/page_directory events
v3: Multiple rebases.
v4: Rebase after s/page_tables/page_table/.
v5: Rebase after Mika's ppgtt cleanup / scratch merge patch series.

Signed-off-by: Ben Widawsky 
Signed-off-by: Michel Thierry  (v3+)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c |  9 -
 drivers/gpu/drm/i915/i915_trace.h   | 16 
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 41a18ff..1327e41 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -541,9 +541,14 @@ static void gen8_map_pagetable_range(struct i915_hw_ppgtt *ppgtt,
struct i915_page_table *pt;
uint64_t temp, pde;
 
-   gen8_for_each_pde(pt, pd, start, length, temp, pde)
+   gen8_for_each_pde(pt, pd, start, length, temp, pde) {
page_directory[pde] = gen8_pde_encode(px_dma(pt),
  I915_CACHE_LLC);
+   trace_i915_page_table_entry_map(&ppgtt->base, pde, pt,
+   gen8_pte_index(start),
+   gen8_pte_count(start, length),
+   GEN8_PTES);
+   }
 
kunmap_px(ppgtt, page_directory);
 }
@@ -849,6 +854,7 @@ static int gen8_ppgtt_alloc_pagetabs(struct i915_address_space *vm,
gen8_initialize_pt(vm, pt);
pd->page_table[pde] = pt;
__set_bit(pde, new_pts);
+   trace_i915_page_table_entry_alloc(vm, pde, start, GEN8_PDE_SHIFT);
}
 
return 0;
@@ -909,6 +915,7 @@ gen8_ppgtt_alloc_page_directories(struct i915_address_space *vm,
gen8_initialize_pd(vm, pd);
pdp->page_directory[pdpe] = pd;
__set_bit(pdpe, new_pds);
+   trace_i915_page_directory_entry_alloc(vm, pdpe, start, GEN8_PDPE_SHIFT);
}
 
return 0;
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 63328b6..15cf1af 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -213,6 +213,22 @@ DEFINE_EVENT(i915_page_table_entry, i915_page_table_entry_alloc,
 TP_ARGS(vm, pde, start, pde_shift)
 );
 
+DEFINE_EVENT_PRINT(i915_page_table_entry, i915_page_directory_entry_alloc,
+  TP_PROTO(struct i915_address_space *vm, u32 pdpe, u64 start, u64 pdpe_shift),
+  TP_ARGS(vm, pdpe, start, pdpe_shift),
+
+  TP_printk("vm=%p, pdpe=%d (0x%llx-0x%llx)",
+__entry->vm, __entry->pde, __entry->start, __entry->end)
+);
+
+DEFINE_EVENT_PRINT(i915_page_table_entry, i915_page_directory_pointer_entry_alloc,
+  TP_PROTO(struct i915_address_space *vm, u32 pml4e, u64 start, u64 pml4e_shift),
+  TP_ARGS(vm, pml4e, start, pml4e_shift),
+
+  TP_printk("vm=%p, pml4e=%d (0x%llx-0x%llx)",
+__entry->vm, __entry->pde, __entry->start, __entry->end)
+);
+
 /* Avoid extra math because we only support two sizes. The format is defined by
  * bitmap_scnprintf. Each 32 bits is 8 HEX digits followed by comma */
 #define TRACE_PT_SIZE(bits) \
-- 
2.4.5


[Intel-gfx] [PATCH v3 10/17] drm/i915/gen8: Initialize PDPs

2015-07-01 Thread Michel Thierry

Similar to PDs, while setting up a page directory pointer, make all entries
of the pdp point to the scratch pdp before mapping (and make all its entries
point to the scratch page); this is to be safe in case of out of bound
access or proactive prefetch.

Although the ggtt is always 32-bit, the scratch_pdp will be
initialized/destroyed at the same time as the other scratch pages, to keep
it consistent.
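
A small sketch of the scratch-fill pattern described above, under stated
assumptions: every slot of a freshly allocated table is first pointed at a
scratch child, and real mappings overwrite individual slots later, so a
stray or prefetched access lands on scratch memory. The names, the 512-entry
table size and the "present" bit are illustrative assumptions, not the
driver's fill_px() or encode helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ENTRIES 512     /* assumed: 4 KiB table of 64-bit entries */
#define SKETCH_PRESENT 0x1ull  /* assumed "valid" bit in an entry */

/* Hypothetical table type used for every level in this sketch. */
struct sketch_table {
	uint64_t entry[SKETCH_ENTRIES];
};

/* Encode a child address as a valid entry (cache attributes omitted). */
static uint64_t sketch_encode(uint64_t child_addr)
{
	return child_addr | SKETCH_PRESENT;
}

/* Point every slot of the table at the same encoded value. */
static void sketch_fill(struct sketch_table *t, uint64_t value)
{
	size_t i;

	assert(t != NULL);
	for (i = 0; i < SKETCH_ENTRIES; i++)
		t->entry[i] = value;
}

/* Count the slots that no longer point at the scratch value. */
static size_t sketch_count_mapped(const struct sketch_table *t, uint64_t scratch)
{
	size_t i, mapped = 0;

	assert(t != NULL);
	for (i = 0; i < SKETCH_ENTRIES; i++)
		if (t->entry[i] != scratch)
			mapped++;
	return mapped;
}
```

The debug dump in this series compares entries against the scratch value in
a similar way to decide what is worth printing.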

v2: Handle scratch_pdp allocation failure correctly, and keep
initialize_px functions together (Akash)
v3: Rebase after Mika's ppgtt cleanup / scratch merge patch series. Rely on
the added macros to initialize the pdps.
v4: Rebase after final merged version of Mika's ppgtt/scratch patches.

Suggested-by: Akash Goel 
Signed-off-by: Michel Thierry 
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 41 +++--
 drivers/gpu/drm/i915/i915_gem_gtt.h |  1 +
 2 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index d16fbce..c6fc0d3 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -627,6 +627,27 @@ static void free_pdp(struct drm_device *dev,
}
 }
 
+static void gen8_initialize_pdp(struct i915_address_space *vm,
+   struct i915_page_directory_pointer *pdp)
+{
+   gen8_ppgtt_pdpe_t scratch_pdpe;
+
+   scratch_pdpe = gen8_pdpe_encode(px_dma(vm->scratch_pd), I915_CACHE_LLC);
+
+   fill_px(vm->dev, pdp, scratch_pdpe);
+}
+
+static void gen8_initialize_pml4(struct i915_address_space *vm,
+struct i915_pml4 *pml4)
+{
+   gen8_ppgtt_pml4e_t scratch_pml4e;
+
+   scratch_pml4e = gen8_pml4e_encode(px_dma(vm->scratch_pdp),
+ I915_CACHE_LLC);
+
+   fill_px(vm->dev, pml4, scratch_pml4e);
+}
+
 static void
 gen8_setup_page_directory(struct i915_hw_ppgtt *ppgtt,
  struct i915_page_directory_pointer *pdp,
@@ -892,8 +913,20 @@ static int gen8_init_scratch(struct i915_address_space *vm)
return PTR_ERR(vm->scratch_pd);
}
 
+   if (USES_FULL_48BIT_PPGTT(dev)) {
+   vm->scratch_pdp = alloc_pdp(dev);
+   if (IS_ERR(vm->scratch_pdp)) {
+   free_pd(dev, vm->scratch_pd);
+   free_pt(dev, vm->scratch_pt);
+   free_scratch_page(dev, vm->scratch_page);
+   return PTR_ERR(vm->scratch_pdp);
+   }
+   }
+
gen8_initialize_pt(vm, vm->scratch_pt);
gen8_initialize_pd(vm, vm->scratch_pd);
+   if (USES_FULL_48BIT_PPGTT(dev))
+   gen8_initialize_pdp(vm, vm->scratch_pdp);
 
return 0;
 }
@@ -902,6 +935,8 @@ static void gen8_free_scratch(struct i915_address_space *vm)
 {
struct drm_device *dev = vm->dev;
 
+   if (USES_FULL_48BIT_PPGTT(dev))
+   free_pdp(dev, vm->scratch_pdp);
free_pd(dev, vm->scratch_pd);
free_pt(dev, vm->scratch_pt);
free_scratch_page(dev, vm->scratch_page);
@@ -1247,12 +1282,12 @@ static int gen8_alloc_va_range_4lvl(struct i915_address_space *vm,
 * and 4 level code. Just allocate the pdps.
 */
gen8_for_each_pml4e(pdp, pml4, start, length, temp, pml4e) {
-   if (!pdp) {
-   WARN_ON(test_bit(pml4e, pml4->used_pml4es));
+   if (!test_bit(pml4e, pml4->used_pml4es)) {
pdp = alloc_pdp(vm->dev);
if (IS_ERR(pdp))
goto err_out;
 
+   gen8_initialize_pdp(vm, pdp);
pml4->pdps[pml4e] = pdp;
__set_bit(pml4e, new_pdps);

trace_i915_page_directory_pointer_entry_alloc(&ppgtt->base, pml4e,
@@ -1330,6 +1365,8 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
if (ret)
goto free_scratch;
 
+   gen8_initialize_pml4(&ppgtt->base, &ppgtt->pml4);
+
ppgtt->base.total = 1ULL << 48;
ppgtt->switch_mm = gen8_48b_mm_switch;
} else {
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index fd61325..2b2505a 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -278,6 +278,7 @@ struct i915_address_space {
struct i915_page_scratch *scratch_page;
struct i915_page_table *scratch_pt;
struct i915_page_directory *scratch_pd;
+   struct i915_page_directory_pointer *scratch_pdp; /* GEN8+ & 48b PPGTT */
 
/**
 * List of objects currently involved in rendering.
-- 
2.4.5


[Intel-gfx] [PATCH v3 00/17] 48-bit PPGTT

2015-07-01 Thread Michel Thierry

These are the rebased patches, after Mika's final ppgtt clean-up series landed
(this series relies on the macros added there). New functions also follow these changes.

In order to expand the GPU address space, a 4th level translation is added, the
Page Map Level 4 (PML4). This PML4 has 256 PML4 Entries (PML4E), PML4[0-255],
each pointing to a PDP. All the existing "dynamic alloc ppgtt" functions are
used, only adding the 4th level changes. I also updated some remaining
variables that were 32b only.
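
To make the 4-level walk concrete, here is a hedged sketch of how a 48b GPU
virtual address could be split into per-level indices. The shift values
(12/21/30/39) and the 9-bit index mask are assumptions chosen to match 4 KiB
pages and tables of 64-bit entries; they are defined locally rather than
taken from the driver headers, and the top-level index is deliberately left
unmasked apart from the 48b limit.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: 4 KiB pages, 9 index bits per lower level. */
#define SKETCH_PTE_SHIFT   12
#define SKETCH_PDE_SHIFT   21
#define SKETCH_PDPE_SHIFT  30
#define SKETCH_PML4E_SHIFT 39
#define SKETCH_INDEX_MASK  0x1ffu
#define SKETCH_ADDR_BITS   48

/* Hypothetical result type: one index per translation level. */
struct sketch_indices {
	unsigned pml4e; /* selects a PDP inside the PML4 */
	unsigned pdpe;  /* selects a page directory inside the PDP */
	unsigned pde;   /* selects a page table inside the page directory */
	unsigned pte;   /* selects a page inside the page table */
};

static struct sketch_indices sketch_decode(uint64_t addr)
{
	struct sketch_indices idx;

	/* Addresses beyond 48 bits are outside the scope of this sketch. */
	assert((addr >> SKETCH_ADDR_BITS) == 0);

	idx.pml4e = (unsigned)(addr >> SKETCH_PML4E_SHIFT);
	idx.pdpe = (unsigned)(addr >> SKETCH_PDPE_SHIFT) & SKETCH_INDEX_MASK;
	idx.pde = (unsigned)(addr >> SKETCH_PDE_SHIFT) & SKETCH_INDEX_MASK;
	idx.pte = (unsigned)(addr >> SKETCH_PTE_SHIFT) & SKETCH_INDEX_MASK;
	return idx;
}
```

Any address below 4GB decodes to pml4e 0 and a pdpe in the range 0..3, which
is the legacy 32b layout of 1 PDP with 4 PDPEs.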

There are 2 hardware workarounds needed to allow correct operation with 48b
addresses (Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset). This
new patchset version includes the comments and suggestions from Chris Wilson.
A flag (EXEC_OBJECT_SUPPORTS_48B_ADDRESS) will indicate if a given object can be
allocated outside the first 4 PDPs; if not, the end range is forced to 4GB.
Also, more objects now use the DRM_MM_CREATE_TOP flag. To maintain compatibility, in
libdrm I added a new drm_intel_bo_emit_reloc_48bit function that will flag
these objects, while the existing drm_intel_bo_emit_reloc clears it.
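
The placement rule in the paragraph above can be sketched as follows. This
is only an illustration under assumptions: SKETCH_SUPPORTS_48B is a
hypothetical bit standing in for EXEC_OBJECT_SUPPORTS_48B_ADDRESS, and
sketch_end_of_range() is not a function from the series.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SUPPORTS_48B (1u << 0)    /* hypothetical per-object flag bit */
#define SKETCH_4GB          (1ull << 32) /* the range the first 4 PDPs cover */

/*
 * Upper bound of the range an object may be placed in: objects that do not
 * opt in to 48b addresses are kept below 4GB, others may use the whole
 * address space.
 */
static uint64_t sketch_end_of_range(uint64_t vm_total, unsigned flags)
{
	assert(vm_total > 0);
	if (!(flags & SKETCH_SUPPORTS_48B) && vm_total > SKETCH_4GB)
		return SKETCH_4GB;
	return vm_total;
}
```

An address space that is already smaller than 4GB is left untouched, so the
clamp only matters once the 48b switch is on.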

Finally, this feature is only available in BDW and Gen9, requires LRC submission
mode (execlists) and it can be detected by i915.enable_ppgtt=3.

Also note that this expanded address space is only available for full PPGTT,
aliasing PPGTT and Global GTT remain 32-bit.

Michel Thierry (17):
  drm/i915: Remove unnecessary gen8_clamp_pd
  drm/i915/gen8: Make pdp allocation more dynamic
  drm/i915/gen8: Abstract PDP usage
  drm/i915/gen8: Add dynamic page trace events
  drm/i915/gen8: implement alloc/free for 4lvl
  drm/i915/gen8: Add 4 level switching infrastructure and lrc support
  drm/i915/gen8: Generalize PTE writing for GEN8 PPGTT
  drm/i915/gen8: Pass sg_iter through pte inserts
  drm/i915/gen8: Add 4 level support in insert_entries and clear_range
  drm/i915/gen8: Initialize PDPs
  drm/i915: Expand error state's address width to 64b
  drm/i915/gen8: Add ppgtt info and debug_dump
  drm/i915: object size needs to be u64
  drm/i915: batch_obj vm offset must be u64
  drm/i915/userptr: Kill user_size limit check
  drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset
  drm/i915/gen8: Flip the 48b switch

 drivers/gpu/drm/i915/i915_debugfs.c|  18 +-
 drivers/gpu/drm/i915/i915_drv.h|  17 +-
 drivers/gpu/drm/i915/i915_gem.c|  22 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  10 +
 drivers/gpu/drm/i915/i915_gem_gtt.c| 649 -
 drivers/gpu/drm/i915/i915_gem_gtt.h|  66 ++-
 drivers/gpu/drm/i915/i915_gem_userptr.c|   4 -
 drivers/gpu/drm/i915/i915_gpu_error.c  |  17 +-
 drivers/gpu/drm/i915/i915_params.c |   2 +-
 drivers/gpu/drm/i915/i915_reg.h|   1 +
 drivers/gpu/drm/i915/i915_trace.h  |  16 +
 drivers/gpu/drm/i915/intel_lrc.c   |  65 ++-
 include/uapi/drm/i915_drm.h|   3 +-
 13 files changed, 725 insertions(+), 165 deletions(-)

-- 
2.4.5


Re: [Intel-gfx] [PATCH 0/2] I915 GEM context updates

2015-07-01 Thread Daniel Vetter

On Wed, Jul 1, 2015 at 2:21 PM, David Weinehall
 wrote:
> On Tue, Jun 30, 2015 at 03:01:06PM +0100, Chris Wilson wrote:
>> On Tue, Jun 30, 2015 at 04:36:55PM +0300, David Weinehall wrote:
>> > On Tue, Jun 30, 2015 at 02:32:19PM +0100, Chris Wilson wrote:
>> > > On Tue, Jun 30, 2015 at 04:01:23PM +0300, David Weinehall wrote:
>> > > > On Tue, Jun 30, 2015 at 01:49:27PM +0100, Chris Wilson wrote:
>> > > > > On Tue, Jun 30, 2015 at 03:24:51PM +0300, David Weinehall wrote:
>> > > > > > This patch contains a few minor updates related to
>> > > > > > I915 GEM context.
>> > > > >
>> > > > > As a kernel API, this is absolutely awful. Can we please correct it
>> > > > > before it is released?
>> > > >
>> > > > Daniel has already merged it and didn't have any objections, so you'll
>> > > > have to convince him, not me.
>> > > >
>> > > > If you believe it's awful, feel free to provide a better 
>> > > > implementation.
>> > >
>> > > As I recall, I did.
>> >
>> > Hmmm, I must've missed your patch -- if so I apologise.  What was the
>> > title of the post, and how come Daniel hasn't merged that one instead?
>>
>> I gave details on a comment to your patch, where I thought the api could
>> be improved.
>
> Yeah, I got the bits about you not liking the approach, but the things
> you write in this e-mail are the first suggestions that I find concrete
> enough for me to actually know what you want instead.

Imo NONZEROMAP is still good to go, and good enough for
opencl/beignet. Allowing more fancy placement constraints might be
useful eventually, but thus far I haven't seen a compelling reason
really. Or not compelling enough at least.

And I don't think there's a point in blocking beignet for something
too fancy. Hence this still has my Ack. It gets the (really specific)
job done for beignet, which seems good enough.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

Re: [Intel-gfx] [PATCH v4] drm/i915 : Added Programming of the MOCS

2015-07-01 Thread Francisco Jerez

Francisco Jerez  writes:

> Peter Antoine  writes:
>
>> On Wed, 1 Jul 2015, Francisco Jerez wrote:
>>
>>> Peter Antoine  writes:
>>>
 On Tue, 30 Jun 2015, Francisco Jerez wrote:

> Francisco Jerez  writes:
>
>> Peter Antoine  writes:
>>
>>> On Mon, 29 Jun 2015, Peter Antoine wrote:
>>>
 On Thu, 25 Jun 2015, Francisco Jerez wrote:

> Peter Antoine  writes:
>
>> This change adds the programming of the MOCS registers to the gen 9+
>> platforms. This change set programs the MOCS register values to a set
>> of values that are defined to be optimal.
>>
>> It creates a fixed register set that is programmed across the
>> different engines so that all engines have the same table. This is done as the
>> main RCS context only holds the registers for itself and the shared
>> L3 values. By trying to keep the registers consistent across the
>> different engines it should make the programming for the registers
>> consistent.
>>
>> v2:
>> -'static const' for private data structures and style changes.(Matt
 Turner)
>> v3:
>> - Make the tables "slightly" more readable. (Damien Lespiau)
>> - Updated tables fix performance regression.
>> v4:
>> - Code formatting. (Chris Wilson)
>> - re-privatised mocs code. (Daniel Vetter)
>>
>> Signed-off-by: Peter Antoine 
>> ---
>>  drivers/gpu/drm/i915/Makefile |   1 +
>>  drivers/gpu/drm/i915/i915_reg.h   |   9 +
>>  drivers/gpu/drm/i915/intel_lrc.c  |  10 +-
>>  drivers/gpu/drm/i915/intel_lrc.h  |   4 +
>>  drivers/gpu/drm/i915/intel_mocs.c | 373
 ++
>>  drivers/gpu/drm/i915/intel_mocs.h |  64 +++
>>  6 files changed, 460 insertions(+), 1 deletion(-)
>>  create mode 100644 drivers/gpu/drm/i915/intel_mocs.c
>>  create mode 100644 drivers/gpu/drm/i915/intel_mocs.h
>>
>> diff --git a/drivers/gpu/drm/i915/Makefile 
>> b/drivers/gpu/drm/i915/Makefile
>> index b7ddf48..c781e19 100644
>> --- a/drivers/gpu/drm/i915/Makefile
>> +++ b/drivers/gpu/drm/i915/Makefile
>> @@ -35,6 +35,7 @@ i915-y += i915_cmd_parser.o \
>>i915_irq.o \
>>i915_trace_points.o \
>>intel_lrc.o \
>> +  intel_mocs.o \
>>intel_ringbuffer.o \
>>intel_uncore.o
>>
>> diff --git a/drivers/gpu/drm/i915/i915_reg.h
 b/drivers/gpu/drm/i915/i915_reg.h
>> index 7213224..3a435b5 100644
>> --- a/drivers/gpu/drm/i915/i915_reg.h
>> +++ b/drivers/gpu/drm/i915/i915_reg.h
>> @@ -7829,4 +7829,13 @@ enum skl_disp_power_wells {
>>  #define _PALETTE_A (dev_priv->info.display_mmio_offset + 0xa000)
>>  #define _PALETTE_B (dev_priv->info.display_mmio_offset + 0xa800)
>>
>> +/* MOCS (Memory Object Control State) registers */
>> +#define GEN9_LNCFCMOCS0 (0xB020)/* L3 Cache 
>> Control
 base */
>> +
>> +#define GEN9_GFX_MOCS_0 (0xc800)/* Graphics 
>> MOCS base
 register*/
>> +#define GEN9_MFX0_MOCS_0(0xc900)/* Media 0 MOCS base
 register*/
>> +#define GEN9_MFX1_MOCS_0(0xcA00)/* Media 1 MOCS base
 register*/
>> +#define GEN9_VEBOX_MOCS_0   (0xcB00)/* Video MOCS base 
>> register*/
>> +#define GEN9_BLT_MOCS_0 (0xcc00)/* Blitter MOCS 
>> base
 register*/
>> +
>>  #endif /* _I915_REG_H_ */
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c
 b/drivers/gpu/drm/i915/intel_lrc.c
>> index 9f5485d..73b919d 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -135,6 +135,7 @@
>>  #include 
>>  #include 
>>  #include "i915_drv.h"
>> +#include "intel_mocs.h"
>>
>>  #define GEN9_LR_CONTEXT_RENDER_SIZE (22 * PAGE_SIZE)
>>  #define GEN8_LR_CONTEXT_RENDER_SIZE (20 * PAGE_SIZE)
>> @@ -796,7 +797,7 @@ static int logical_ring_prepare(struct
 intel_ringbuffer *ringbuf,
>>   *
>>   * Return: non-zero if the ringbuffer is not ready to be written to.
>>   */
>> -static int intel_logical_ring_begin(struct intel_ringbuffer 
>> *ringbuf,
>> +int intel_logical_ring_begin(struct intel_ringbuffer *ringbuf,
>>  struct intel_context *ctx, int
 num_dwords)
>>  {
>>  struct intel_engine_cs *ring = ringbuf->ring;
>> @@ -1379,6 +1380,13 @@ static int gen8_init_rcs_cont

Re: [Intel-gfx] [PATCH 6/8] drm/i915: add struct_mutex WARNs to i915_gem_stolen.c

2015-07-01 Thread Jesse Barnes

On 07/01/2015 06:56 AM, Daniel Vetter wrote:
> On Tue, Jun 30, 2015 at 01:30:27PM -0700, Jesse Barnes wrote:
>> On 06/30/2015 07:36 AM, Chris Wilson wrote:
>>> On Tue, Jun 30, 2015 at 11:26:11AM -0300, Paulo Zanoni wrote:
 2015-06-30 11:15 GMT-03:00 Chris Wilson :
> On Tue, Jun 30, 2015 at 10:53:10AM -0300, Paulo Zanoni wrote:
>> From: Paulo Zanoni 
>>
>> Let's make sure the future Paulos don't forget that we need
>> struct_mutex when touching dev_priv->mm.stolen.
>>
>> Signed-off-by: Paulo Zanoni 
>> ---
>>  drivers/gpu/drm/i915/i915_gem_stolen.c | 13 +
>>  1 file changed, 13 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c 
>> b/drivers/gpu/drm/i915/i915_gem_stolen.c
>> index 793bcba..cac1bce 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_stolen.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
>> @@ -160,6 +160,8 @@ static int find_compression_threshold(struct 
>> drm_device *dev,
>>   int compression_threshold = 1;
>>   int ret;
>>
>> + WARN_ON(!mutex_is_locked(&dev->struct_mutex));
>
> I'm not a huge fan of vague mutex warnings that don't even check the 
> owner.
> I'm especially not a fan of adding a WARN and not handling the error.

 But then, what exactly is your proposal? What would you like to see here?

 We can discard this patch if you want. But I hope you're not
 advocating for lockdep_assert_held(), because if I switch to lockdep,
 then Daniel is going to deny it again. Also, this type of WARN_ON is a
 common pattern on our codebase...
>>>
>>> I'm just trying to convince Daniel that blindly using this pattern is
>>> the wrong approach and encouraging a proliferation of unhandled WARN_ON
>>> doesn't improve driver robustness.
>>
>> I think they serve as useful documentation at the very least, whether in
>> lockdep form, WARN form, or BUG form.  It's not really something we can
>> recover from either (maybe returning early before touching data?), so...
> 
> Not grabbing a lock is generally a harmless error since real races out
> there are rare with X being single-threaded and all that. Especially in
> stuff called from modeset code. Hence I think just WARN_ON plus continuing
> on with blissful ignorance is the best approach.
> 
> I don't like the lockdep versions personally since they don't work when lockdep
> is disabled, which is pretty much always the case. Might be useful to do
> an assert_mutex_held which always does the most paranoid check (i.e.
> WARN_ON without lockdep, lockdep_assert_held with lockdep).

Maybe we should add WARN_ONs to the lockdep_assert macros in the
!CONFIG_LOCKDEP case.  That would give us documentation, checking in
both cases, and everyone would be happy, right?
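To make the idea concrete, here is a rough userspace sketch of an always-on check that falls back to a WARN when lockdep is not available. Everything prefixed with mock_ is a stand-in invented for illustration and is not the kernel's real mutex, WARN_ON() or lockdep code. The macro returns the failed condition, so a caller can either carry on or bail out early.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-in for struct mutex; NOT the kernel type. */
struct mock_mutex {
	bool locked;
};

/* Number of warnings emitted so far, so the effect is observable. */
static int mock_warn_count;

/* Mimics the shape of WARN_ON(): report the problem, return the condition. */
static bool mock_warn(bool cond, const char *what)
{
	assert(what != NULL);
	if (cond) {
		mock_warn_count++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/*
 * Hypothetical always-on fallback for the !CONFIG_LOCKDEP case. With lockdep
 * enabled, a real version would defer to lockdep_assert_held(), which also
 * checks the owner.
 */
#define mock_assert_mutex_held(m) \
	mock_warn(!(m)->locked, "mutex not held: " #m)

/* A caller that handles the failure instead of continuing regardless. */
static int mock_touch_stolen(struct mock_mutex *struct_mutex)
{
	if (mock_assert_mutex_held(struct_mutex))
		return -1;
	/* ... touch data protected by struct_mutex ... */
	return 0;
}
```

Whether a caller should return early or just continue after the warning is exactly the open question in this thread; the sketch only shows that one macro can serve both styles.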

Jesse

Re: [Intel-gfx] [PATCH v4] drm/i915 : Added Programming of the MOCS

2015-07-01 Thread Jesse Barnes

On 07/01/2015 06:53 AM, Peter Antoine wrote:
> On Wed, 1 Jul 2015, Francisco Jerez wrote:
> 
>> Peter Antoine  writes:
>>
>>> On Tue, 30 Jun 2015, Francisco Jerez wrote:
>>>
>>>> Francisco Jerez  writes:
>>>>
>>>>> Peter Antoine  writes:
>>>>>
>>>>>> On Mon, 29 Jun 2015, Peter Antoine wrote:
>>>>>>
>>>>>>> On Thu, 25 Jun 2015, Francisco Jerez wrote:
>>>>>>>
>>>>>>>> Peter Antoine  writes:
>>>>>>>> Mesa will want an additional entry with TC=LLC/eLLC, LeCC=PTE,
>>>>>>>> L3CC=WB, everything else unset, I'll reply with a userspace
>>>>>>>> patch making use of your change if you add such an entry.
>>>>>> Ok. I think what you want is, same as entry two, but use the
>>>>>> underlying pagetable settings and not specify the EDRAM settings.
>>>>>> Please confirm in the new patchset.
>>>>>
>>>>> Yeah, that sounds good.
>>>>>
>>>>>>>>
>>>>>>>> Another thing worth mentioning is that entries 0, 2 and 5 seem
>>>>>>>> to do the same thing suspiciously, the only difference is the
>>>>>>>> LRUM field which AFAIK doesn't have any effect for LeCC=UC.  Is
>>>>>>>> my understanding correct?
>>>>>>>>
>>>>>>> These tables are generated via requests and then boiled down to
>>>>>>> the above. So some of the entries are by request. Swings and
>>>>>>> roundabouts, can remove the ones that look redundant but then the
>>>>>>> tuning that has been done won't match. I'll add the new entry at
>>>>>>> the end of the table.
>>>>
>>>> Are you planning to propagate the entry you just added back to the
>>>> original table this was generated from?  What about new entries we
>>>> may need to add in the future?  What should be the process to make
>>>> sure that our table and the master table don't diverge and end up
>>>> with conflicting entries we cannot remove because of ABI
>>>> compatibility?  I guess there should be a comment on the top warning
>>>> that the table is part of the kernel ABI and supposed to be kept in
>>>> sync with your table, so other people don't change it unknowingly?
>>>>
>>>> Thanks.
>>> I am talking to the team that handles this and see if they will add
>>> this (so future gens this is baked in) but it is unlikely that the
>>> other tables will stay in step as getting in changes will cause too
>>> much grief getting them upstreamed and as the table is auto-generated
>>> we will not be able to guarantee the ordering. It will have to be
>>> manual job for anyone doing this. It is required for other platforms
>>> for the tables to match the userspace for performance reasons, but on
>>> Linux it will be by request if there is a problem. We will see what
>>> happens.
>>>
>> I think it only makes sense for Linux to maintain compatibility with
>> Android's tables if we agree on some straightforward process for us to
>> allocate new entries without causing conflicts (otherwise people are
>> likely to ignore the issue completely and let the tables diverge, as you
>> mentioned yourself), and have some guarantee that any entries ever
>> contributed by your team to the Linux kernel (and therefore part of our
>> stable ABI) will never be changed or reordered in the future.
>>
> I think internally (and informally) that we cannot keep sync between
> Android and Linux. We need to keep compatibility with userspace and
> there is no guarantee of ordering as these tables are generated at
> runtime. The tables that are in Linux are a snapshot. These changes are
> supposed to stabilise at PV so they don't change in the future, but if a
> bug or good performance enhancement occurs I can't imagine that they
> won't make the changes.

Wow this discussion just keeps going.  Who'd have thought such a simple
table would cause so much trouble? :)

What you mention above is a key point: "these tables are generated at
runtime. The tables that are in Linux are a snapshot. These changes are
supposed to stabilise at PV so they don't change in the future, but if a
bug or good performance enhancement occurs I can't imagine that they
won't make the changes."

That really argues for a runtime API that allows the userland drivers to
load in MOCS values.  I'm not sure if it's practical to make the table
effectively part of the context (lazily applying new values if we detect
a change vs the defaults), but that would at least let the different
user level drivers do whatever they think is ideal...
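Purely as a thought experiment, lazily applying a per-context table could look something like the userspace sketch below. None of this exists in i915: the mock_ names, the four-entry table size and the default values are all placeholders invented for illustration. The point is only that the table is rewritten at submission time when it differs from what is currently loaded, and left alone otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MOCK_MOCS_ENTRIES 4 /* placeholder size, not the real table size */

/* Placeholder defaults; the values carry no hardware meaning. */
static const uint32_t mock_mocs_defaults[MOCK_MOCS_ENTRIES] = { 0, 1, 2, 3 };

struct mock_hw {
	uint32_t mocs[MOCK_MOCS_ENTRIES]; /* what is currently programmed */
	int reprograms;                   /* how often the table was rewritten */
};

struct mock_context {
	uint32_t mocs[MOCK_MOCS_ENTRIES]; /* what this context wants */
};

static void mock_hw_init(struct mock_hw *hw)
{
	memcpy(hw->mocs, mock_mocs_defaults, sizeof(hw->mocs));
	hw->reprograms = 0;
}

/* Every context starts out with the kernel defaults. */
static void mock_context_init(struct mock_context *ctx)
{
	memcpy(ctx->mocs, mock_mocs_defaults, sizeof(ctx->mocs));
}

/* Hypothetical userspace-facing call: replace one entry for this context. */
static int mock_context_set_mocs(struct mock_context *ctx,
				 unsigned int index, uint32_t value)
{
	if (index >= MOCK_MOCS_ENTRIES)
		return -1;
	ctx->mocs[index] = value;
	return 0;
}

/* On submission, rewrite the table only if it differs from what is loaded. */
static bool mock_submit(struct mock_hw *hw, const struct mock_context *ctx)
{
	assert(hw != NULL && ctx != NULL);
	if (memcmp(hw->mocs, ctx->mocs, sizeof(hw->mocs)) == 0)
		return false;
	memcpy(hw->mocs, ctx->mocs, sizeof(hw->mocs));
	hw->reprograms++;
	return true;
}
```

Contexts that never touch their table never trigger a reprogram, which is the "lazy" part; the cost shows up only when switching between contexts with different tables.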

>> I have the impression that because of your development model you have
>> far more freedom to make changes in your kernel ABI after the fact than
>> we do -- OTOH we would be locked in if we accept to import Android's
>> tables now, what brings me to the next question: How would you feel
>> about reversing the roles of our tables?  The workflow could be as
>> follows:
> The Android kernel is more flexible, in what it accepts, and secondly (and
> more importantly) you should be using the userspace drivers as this is
> the API and is tuned, so changing the tables are less of a problem

Re: [Intel-gfx] [PATCH 3/3] drm/i915: Read HDMI EDID only when required

2015-07-01 Thread Shashank Sharma

> Userspace always sets force. Are you sure this actually improves anything?
Yes we do. We have had this code for commercial projects, and it's really
important to have proper interrupt handling as well as to avoid race
condition between multiple HDMI detects from interrupt handler and
userspace detect calls. This is a must for HDMI compliance also.

Actually the plan is to use this force for GEN < 6 HW only, where the
hotplug doesn't work reliably (I remember our last conversation on some old
HW which doesn't support HPD properly). For vlv+, we can (will) use only
the cached EDID.

> Also the goal should be to keep things cached for a few calls from
> userspace (since often it pokes a few times in a row unfortunately), for
> which we need a proper timeout to clear the edid again.

Can you please let us know why ? Why do we need to clear this EDID caching
? We should clear it only in the next hot-unplug, and maintain this cached
EDID for all userspace detect operations. I believe as long as we have the
state machine maintained, we need not clear it.
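The state machine meant here can be sketched roughly as follows. This is a userspace illustration only: the mock_ types and functions are made up for the example and are not the intel_hdmi.c code, and the sink_present argument stands in for whatever the hardware reports at that moment. The cache is refreshed on a hotplug or unplug event (or when detect is forced) and every other detect call just looks at the cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mock_connector {
	const unsigned char *cached_edid; /* NULL when no EDID is cached */
	int edid_reads;                   /* counts (slow) reads from the sink */
};

/* The fixed 8-byte EDID header pattern, used here as dummy data. */
static const unsigned char mock_edid[8] = {
	0x00, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00
};

/* Stand-in for the DDC read; returns NULL when nothing is plugged in. */
static const unsigned char *mock_read_edid(struct mock_connector *c,
					   bool sink_present)
{
	assert(c != NULL);
	c->edid_reads++;
	return sink_present ? mock_edid : NULL;
}

/* Hotplug/unplug event: the place where the cache is refreshed. */
static void mock_hotplug_event(struct mock_connector *c, bool sink_present)
{
	c->cached_edid = mock_read_edid(c, sink_present);
}

/* detect(): trust the cache unless the caller forces a re-probe. */
static bool mock_detect(struct mock_connector *c, bool force,
			bool sink_present)
{
	if (force)
		mock_hotplug_event(c, sink_present);
	return c->cached_edid != NULL;
}
```

With this shape, repeated unforced detect calls cost no extra EDID reads, which is the behaviour being argued for. Whether userspace should be allowed to bypass the cache with force is the part still under discussion.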

>-Daniel

Regards
Shashank

On Tue, Jun 30, 2015 at 4:36 PM, Daniel Vetter  wrote:

> On Tue, Jun 30, 2015 at 11:13:58AM +0530, Sonika Jindal wrote:
> > From: Shashank Sharma 
> >
> > This patch makes sure that the HDMI detect function
> > reads EDID only when it's forced to do it. All the other
> > times, it uses the connector->detect_edid which was cached
> > during hotplug handling in the hdmi_probe() function. As the
> > probe function gets called before detect in the interrupt handler
> > and handles the EDID caching part, it's absolutely safe to assume
> > that presence of EDID reflects monitor connected and vice versa.
> >
> > This will save us from many race conditions between hotplug/unplug
> > detect call handler threads and userspace calls for the same.
> > The previous patch in this patch series explains this in detail.
> >
> > Signed-off-by: Shashank Sharma 
> > ---
> >  drivers/gpu/drm/i915/intel_hdmi.c |   26 --
> >  1 file changed, 20 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_hdmi.c
> b/drivers/gpu/drm/i915/intel_hdmi.c
> > index 064ddd8..1fb6919 100644
> > --- a/drivers/gpu/drm/i915/intel_hdmi.c
> > +++ b/drivers/gpu/drm/i915/intel_hdmi.c
> > @@ -1362,19 +1362,33 @@ static enum drm_connector_status
> >  intel_hdmi_detect(struct drm_connector *connector, bool force)
> >  {
> >   enum drm_connector_status status;
> > + struct intel_connector *intel_connector =
> > + to_intel_connector(connector);
> >
> >   DRM_DEBUG_KMS("[CONNECTOR:%d:%s]\n",
> > connector->base.id, connector->name);
> > + /*
> > +  * There are many userspace calls which probe EDID from
> > +  * detect path. In case on multiple hotplug/unplug, these
> > +  * can cause race conditions while probing EDID. Also its
> > +  * waste of CPU cycles to read the EDID again and again
> > +  * unless there is a real hotplug.
> > +  * So until we are forced, check connector status
> > +  * based on availability of cached EDID. This will avoid many of
> > +  * these race conditions and timing problems.
> > +  */
> > + if (force)
>
> Userspace always sets force. Are you sure this actually improves anything?
> Also the goal should be to keep things cached for a few calls from
> userspace (since often it pokes a few times in a row unfortunately), for
> which we need a proper timeout to clear the edid again.
> -Daniel
>
> > + intel_hdmi_probe(intel_connector->encoder);
> >
> > - intel_hdmi_unset_edid(connector);
> > -
> > - if (intel_hdmi_set_edid(connector)) {
> > + if (intel_connector->detect_edid) {
> >   struct intel_hdmi *intel_hdmi =
> intel_attached_hdmi(connector);
> > -
> > - hdmi_to_dig_port(intel_hdmi)->base.type =
> INTEL_OUTPUT_HDMI;
> >   status = connector_status_connected;
> > - } else
> > + hdmi_to_dig_port(intel_hdmi)->base.type =
> INTEL_OUTPUT_HDMI;
> > + DRM_DEBUG_DRIVER("hdmi status = connected\n");
> > + } else {
> >   status = connector_status_disconnected;
> > + DRM_DEBUG_DRIVER("hdmi status = disconnected\n");
> > + }
> >
> >   return status;
> >  }
> > --
> > 1.7.10.4
> >
> > ___
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
>

Re: [Intel-gfx] [PATCH 2/4] drm/i915: Support for creating Stolen memory backed objects

2015-07-01 Thread Tvrtko Ursulin




On 07/01/2015 10:25 AM, ankitprasad.r.sha...@intel.com wrote:

From: Ankitprasad Sharma 

Extend the drm_i915_gem_create structure to add support for
creating Stolen memory backed objects. Added a new flag through
which user can specify the preference to allocate the object from
stolen memory, which if set, an attempt will be made to allocate
the object from stolen memory subject to the availability of
free space in the stolen region.

v2: Rebased to the latest drm-intel-nightly (Ankit)

testcase: igt/gem_stolen

Signed-off-by: Ankitprasad Sharma 
---
  drivers/gpu/drm/i915/i915_dma.c |  3 +++
  drivers/gpu/drm/i915/i915_gem.c | 31 +++
  include/uapi/drm/i915_drm.h | 15 +++
  3 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index c5349fa..6045749 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -167,6 +167,9 @@ static int i915_getparam(struct drm_device *dev, void *data,
value = i915.enable_hangcheck &&
intel_has_gpu_reset(dev);
break;
+   case I915_PARAM_CREATE_VERSION:
+   value = 1;


Shouldn't it be 2?


+   break;
default:
DRM_DEBUG("Unknown parameter %d\n", param->param);
return -EINVAL;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index a2a4a27..4acf331 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -391,7 +391,8 @@ static int
  i915_gem_create(struct drm_file *file,
struct drm_device *dev,
uint64_t size,
-   uint32_t *handle_p)
+   uint32_t *handle_p,
+   uint32_t flags)
  {
struct drm_i915_gem_object *obj;
int ret;
@@ -401,8 +402,29 @@ i915_gem_create(struct drm_file *file,
if (size == 0)
return -EINVAL;

+   if (flags & ~(I915_CREATE_PLACEMENT_STOLEN))
+   return -EINVAL;
+
/* Allocate the new object */
-   obj = i915_gem_alloc_object(dev, size);
+   if (flags & I915_CREATE_PLACEMENT_STOLEN) {
+   mutex_lock(&dev->struct_mutex);


Probably need the interruptible variant so userspace can Ctrl-C if 
things get stuck in submission/waiting.



+   obj = i915_gem_object_create_stolen(dev, size);
+   if (!obj) {
+   mutex_unlock(&dev->struct_mutex);
+   return -ENOMEM;
+   }
+
+   ret = i915_gem_exec_clear_object(obj, file->driver_priv);


I would put a comment here saying why it is important to clear stolen 
memory.



+   if (ret) {
+   i915_gem_object_free(obj);


This should probably be drm_gem_object_unreference.


+   mutex_unlock(&dev->struct_mutex);
+   return ret;
+   }
+
+   mutex_unlock(&dev->struct_mutex);
+   } else
+   obj = i915_gem_alloc_object(dev, size);


Need curly braces on both branches.


if (obj == NULL)
return -ENOMEM;

@@ -425,7 +447,7 @@ i915_gem_dumb_create(struct drm_file *file,
args->pitch = ALIGN(args->width * DIV_ROUND_UP(args->bpp, 8), 64);
args->size = args->pitch * args->height;
return i915_gem_create(file, dev,
-  args->size, &args->handle);
+  args->size, &args->handle, 0);
  }

  /**
@@ -438,7 +460,8 @@ i915_gem_create_ioctl(struct drm_device *dev, void *data,
struct drm_i915_gem_create *args = data;

return i915_gem_create(file, dev,
-  args->size, &args->handle);
+  args->size, &args->handle,
+  args->flags);
  }

  static inline int
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index f88cc1c..87992d1 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -355,6 +355,7 @@ typedef struct drm_i915_irq_wait {
  #define I915_PARAM_SUBSLICE_TOTAL  33
  #define I915_PARAM_EU_TOTAL34
  #define I915_PARAM_HAS_GPU_RESET   35
+#define I915_PARAM_CREATE_VERSION   36

  typedef struct drm_i915_getparam {
int param;
@@ -450,6 +451,20 @@ struct drm_i915_gem_create {
 */
__u32 handle;
__u32 pad;
+   /**
+* Requested flags (currently used for placement
+* (which memory domain))
+*
+* You can request that the object be created from special memory
+* rather than regular system pages using this parameter. Such
+* irregular objects may have certain restrictions (such as CPU
+* access to a stolen object is verboten).


I'd just use English all the way. :)


+*
+* This can be used in the future for other pur

Re: [Intel-gfx] [PATCH v4] drm/i915 : Added Programming of the MOCS

2015-07-01 Thread Francisco Jerez

Peter Antoine  writes:

> On Wed, 1 Jul 2015, Francisco Jerez wrote:
>
>> Peter Antoine  writes:
>>
>>> On Tue, 30 Jun 2015, Francisco Jerez wrote:
>>>
 Francisco Jerez  writes:

> Peter Antoine  writes:
>
>> On Mon, 29 Jun 2015, Peter Antoine wrote:
>>
>>> On Thu, 25 Jun 2015, Francisco Jerez wrote:
>>>
 Peter Antoine  writes:

> This change adds the programming of the MOCS registers to the gen 9+
> platforms. This change set programs the MOCS register values to a set
> of values that are defined to be optimal.
>
> It creates a fixed register set that is programmed across the 
> different
> engines so that all engines have the same table. This is done as the
> main RCS context only holds the registers for itself and the shared
> L3 values. By trying to keep the registers consistent across the
> different engines it should make the programming for the registers
> consistent.
>
> v2:
> -'static const' for private data structures and style changes.(Matt
>>> Turner)
> v3:
> - Make the tables "slightly" more readable. (Damien Lespiau)
> - Updated tables fix performance regression.
> v4:
> - Code formatting. (Chris Wilson)
> - re-privatised mocs code. (Daniel Vetter)
>
> Signed-off-by: Peter Antoine 
> ---
>  drivers/gpu/drm/i915/Makefile |   1 +
>  drivers/gpu/drm/i915/i915_reg.h   |   9 +
>  drivers/gpu/drm/i915/intel_lrc.c  |  10 +-
>  drivers/gpu/drm/i915/intel_lrc.h  |   4 +
>  drivers/gpu/drm/i915/intel_mocs.c | 373
>>> ++
>  drivers/gpu/drm/i915/intel_mocs.h |  64 +++
>  6 files changed, 460 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/gpu/drm/i915/intel_mocs.c
>  create mode 100644 drivers/gpu/drm/i915/intel_mocs.h
>
> diff --git a/drivers/gpu/drm/i915/Makefile 
> b/drivers/gpu/drm/i915/Makefile
> index b7ddf48..c781e19 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -35,6 +35,7 @@ i915-y += i915_cmd_parser.o \
> i915_irq.o \
> i915_trace_points.o \
> intel_lrc.o \
> +   intel_mocs.o \
> intel_ringbuffer.o \
> intel_uncore.o
>
> diff --git a/drivers/gpu/drm/i915/i915_reg.h
>>> b/drivers/gpu/drm/i915/i915_reg.h
> index 7213224..3a435b5 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -7829,4 +7829,13 @@ enum skl_disp_power_wells {
>  #define _PALETTE_A (dev_priv->info.display_mmio_offset + 0xa000)
>  #define _PALETTE_B (dev_priv->info.display_mmio_offset + 0xa800)
>
> +/* MOCS (Memory Object Control State) registers */
> +#define GEN9_LNCFCMOCS0  (0xB020)/* L3 Cache 
> Control
>>> base */
> +
> +#define GEN9_GFX_MOCS_0  (0xc800)/* Graphics 
> MOCS base
>>> register*/
> +#define GEN9_MFX0_MOCS_0 (0xc900)/* Media 0 MOCS base
>>> register*/
> +#define GEN9_MFX1_MOCS_0 (0xcA00)/* Media 1 MOCS base
>>> register*/
> +#define GEN9_VEBOX_MOCS_0(0xcB00)/* Video MOCS base 
> register*/
> +#define GEN9_BLT_MOCS_0  (0xcc00)/* Blitter MOCS 
> base
>>> register*/
> +
>  #endif /* _I915_REG_H_ */
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c
>>> b/drivers/gpu/drm/i915/intel_lrc.c
> index 9f5485d..73b919d 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -135,6 +135,7 @@
>  #include 
>  #include 
>  #include "i915_drv.h"
> +#include "intel_mocs.h"
>
>  #define GEN9_LR_CONTEXT_RENDER_SIZE (22 * PAGE_SIZE)
>  #define GEN8_LR_CONTEXT_RENDER_SIZE (20 * PAGE_SIZE)
> @@ -796,7 +797,7 @@ static int logical_ring_prepare(struct
>>> intel_ringbuffer *ringbuf,
>   *
>   * Return: non-zero if the ringbuffer is not ready to be written to.
>   */
> -static int intel_logical_ring_begin(struct intel_ringbuffer *ringbuf,
> +int intel_logical_ring_begin(struct intel_ringbuffer *ringbuf,
>   struct intel_context *ctx, int
>>> num_dwords)
>  {
>   struct intel_engine_cs *ring = ringbuf->ring;
> @@ -1379,6 +1380,13 @@ static int gen8_init_rcs_context(struct
>>> intel_engine_cs *ring,
>   if (ret)
>   return ret;
>
> + /*
> +

Re: [Intel-gfx] [PATCH] drm/i915: Clear pipe's pll hw state in hsw_dp_set_ddi_pll_sel()

2015-07-01 Thread Daniel Vetter

On Wed, Jul 01, 2015 at 05:54:06PM +0300, Ander Conselvan De Oliveira wrote:
> On Tue, 2015-06-30 at 18:41 +0300, Jani Nikula wrote:
> > On Tue, 30 Jun 2015, Daniel Vetter  wrote:
> > > On Tue, Jun 30, 2015 at 04:47:06PM +0300, Jani Nikula wrote:
> > >> On Tue, 30 Jun 2015, Ander Conselvan de Oliveira 
> > >>  wrote:
> > >> > Similarly to what is done for SKL, clear the dpll_hw_state of the pipe
> > >> > config in hsw_dp_set_ddi_pll_sel(), since it may contain stale values.
> > >> > That can happen if a crtc that was previously driving an HDMI connector
> > >> > switches to a DP connector. In that case, the wrpll field was left with
> > >> > its old value, leading to warnings like the one below:
> > >> >
> > >> > [drm:check_crtc_state [i915]] *ERROR* mismatch in dpll_hw_state.wrpll 
> > >> > (expected 0xb035061f, found 0x)
> > >> > [ cut here ]
> > >> > WARNING: CPU: 1 PID: 767 at drivers/gpu/drm/i915/intel_display.c:12324 
> > >> > check_crtc_state+0x975/0x10b0 [i915]()
> > >> > pipe state doesn't match!
> > >> >
> > >> > This regression was introduced in
> > >> >
> > >> > commit dd3cd74acf12723045a64f1f2c6298ac7b34a5d5
> > >> > Author: Ander Conselvan de Oliveira 
> > >> > 
> > >> > Date:   Fri May 15 13:34:29 2015 +0300
> > >> >
> > >> > drm/i915: Don't overwrite (e)DP PLL selection on SKL
> > >> >
> > >> > Signed-off-by: Ander Conselvan de Oliveira 
> > >> > 
> > >> 
> > >> Reported-by: Linus Torvalds 
> > >> Tested-by: Jani Nikula 
> > >
> > > Yeah makes sense as a fix for 4.2. But for 4.3 I wonder whether the
> > > original commit that started this chain needs to be changed a bit:
> > >
> > > commit 4978cc93d9ac240b435ce60431aef24239b4c270
> > > Author: Ander Conselvan de Oliveira 
> > > 
> > > Date:   Tue Apr 21 17:13:21 2015 +0300
> > >
> > > drm/i915: Preserve shared DPLL information in new pipe_config
> > >
> > > All the trouble this caused is because it not only preserves the sharing
> > > config (in crtc_state->shared_dpll) but also the ->dpll_hw_state. And I
> > > think with Maarten's latest code (for 4.3) we'd just do an unconditional
> > > compute_config (need it for fast pfit updates and fastboot), which means
> > > the bogus values in ->dpll_hw_state aren't a problem any more since we'll
> > > overwrite them again. And then we could remove that sprinkle of memsets we
> > > have all over, which would be good (since the current approach is
> > > obviously a bit fragile). Anyway:
> > >
> > > Reviewed-by: Daniel Vetter 
> > 
> > Pushed to drm-intel-next-fixes, thanks for the patch and review. One
> > down, another one left to fix.
> 
> I made some progress on the second issue, but I'm afraid Jani might have
> found a third bug. The warning he gets happens because we try to wait
> for vblanks while updating the primary plane during the modeset. At that
> point, the crtc is off. The problem is in intel_check_primary_plane(),
> which is called from drm_atomic_helper_check_planes(). That function
> makes decisions about waiting for a vblank based on intel_crtc->active.
> Since the check is called before we disable the crtcs, active might be
> true, even though the plane update is done with crtcs disabled.
> 
> The patch below makes the warning go away, but I still need to figure
> out how to set crtc_state->planes_changed properly if we are going down
> that route.
> 
> diff --git a/drivers/gpu/drm/i915/intel_display.c 
> b/drivers/gpu/drm/i915/intel_display.c
> index dcb1d25..f14727c 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -12480,10 +12480,6 @@ intel_modeset_compute_config(struct drm_crtc *crtc,
>  
> intel_dump_pipe_config(to_intel_crtc(crtc), pipe_config,"[modeset]");
>  
> -   ret = drm_atomic_helper_check_planes(state->dev, state);
> -   if (ret)
> -   return ERR_PTR(ret);
> -
> return pipe_config;
>  }
>  
> 
> The backtrace on Linus' machine is different, though. It comes from the
> call to intel_crtc_disable_planes() in __intel_set_mode(). That would
> indicate we have a crtc with crtc->state->enable == true but that is
> actually inactive. I'm still not sure how we can get in that state.

Using intel_crtc->active to precompute any kind of decisions won't work. I
guess we just need to delay the decision whether to make a vblank wait or
not to where we do the vblank wait, and use the (then current
intel_crtc->active) there. This will be fixed properly in 4.3.

I suspect Linus' backtrace is something similar - we try to precompute
what needs to be updated, get it wrong and then go boom. Sprinkling an

if (!intel_crtc->active)
return;

early return into the set_mode should help. But I haven't looked at what
4.2 looks like precisely yet in this area - too much flux because of the atomic
conversion.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

Re: [Intel-gfx] [PATCH 1/4] drm/i915: Clearing buffer objects via blitter engine

2015-07-01 Thread Tvrtko Ursulin



Hi,

On 07/01/2015 10:25 AM, ankitprasad.r.sha...@intel.com wrote:

From: Ankitprasad Sharma 

This patch adds support for clearing buffer objects via blitter
engines. This is particularly useful for clearing out the memory
from stolen region.


Because CPU cannot access it? I would put that into the commit message 
since I think cover letter does not go into the git history.



v2: Add support for using execlists & PPGTT

v3: Fix issues in legacy ringbuffer submission mode

v4: Rebased to the latest drm-intel-nightly (Ankit)

testcase: igt/gem_stolen



Nitpick: usually it is "Testcase:" and all tags grouped together.


Signed-off-by: Chris Wilson 
Signed-off-by: Deepak S 
Signed-off-by: Ankitprasad Sharma 
---
  drivers/gpu/drm/i915/Makefile   |   1 +
  drivers/gpu/drm/i915/i915_drv.h |   4 +
  drivers/gpu/drm/i915/i915_gem_exec.c| 201 
  drivers/gpu/drm/i915/intel_lrc.c|   4 +-
  drivers/gpu/drm/i915/intel_lrc.h|   3 +
  drivers/gpu/drm/i915/intel_ringbuffer.c |   2 +-
  drivers/gpu/drm/i915/intel_ringbuffer.h |   1 +
  7 files changed, 213 insertions(+), 3 deletions(-)
  create mode 100644 drivers/gpu/drm/i915/i915_gem_exec.c

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index de21965..1959314 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -24,6 +24,7 @@ i915-y += i915_cmd_parser.o \
  i915_gem_debug.o \
  i915_gem_dmabuf.o \
  i915_gem_evict.o \
+ i915_gem_exec.o \
  i915_gem_execbuffer.o \
  i915_gem_gtt.o \
  i915_gem.o \
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index ea9caf2..d1e151e 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3082,6 +3082,10 @@ int __must_check i915_gem_evict_something(struct 
drm_device *dev,
  int i915_gem_evict_vm(struct i915_address_space *vm, bool do_idle);
  int i915_gem_evict_everything(struct drm_device *dev);

+/* i915_gem_exec.c */
+int i915_gem_exec_clear_object(struct drm_i915_gem_object *obj,
+  struct drm_i915_file_private *file_priv);
+
  /* belongs in i915_gem_gtt.h */
  static inline void i915_gem_chipset_flush(struct drm_device *dev)
  {
diff --git a/drivers/gpu/drm/i915/i915_gem_exec.c 
b/drivers/gpu/drm/i915/i915_gem_exec.c
new file mode 100644
index 000..a07fda0
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_gem_exec.c
@@ -0,0 +1,201 @@
+/*
+ * Copyright © 2013 Intel Corporation


Is the year correct?


+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *Chris Wilson 


And author?


+ *
+ */
+
+#include 
+#include 
+#include "i915_drv.h"
+
+#define GEN8_COLOR_BLT_CMD (2<<29 | 0x50<<22)
+
+#define BPP_8 0
+#define BPP_16 (1<<24)
+#define BPP_32 (1<<25 | 1<<24)
+
+#define ROP_FILL_COPY (0xf0 << 16)
+
+static int i915_gem_exec_flush_object(struct drm_i915_gem_object *obj,
+ struct intel_engine_cs *ring,
+ struct intel_context *ctx,
+ struct drm_i915_gem_request **req)
+{
+   int ret;
+
+   ret = i915_gem_object_sync(obj, ring, req);
+   if (ret)
+   return ret;
+
+   if (obj->base.write_domain & I915_GEM_DOMAIN_CPU) {
+   if (i915_gem_clflush_object(obj, false))
+   i915_gem_chipset_flush(obj->base.dev);
+   obj->base.write_domain &= ~I915_GEM_DOMAIN_CPU;
+   }
+   if (obj->base.write_domain & I915_GEM_DOMAIN_GTT) {
+   wmb();
+   obj->base.write_domain &= ~I915_GEM_DOMAIN_GTT;
+   }


All this could be replaced with i915_gem_object_set_to_gtt_domain, no?


+
+   return i915.enable_execlists ?
+   logical_ring_invalidate_all_caches(*req) :
+

Re: [Intel-gfx] [PATCH] drm/i915: Clear pipe's pll hw state in hsw_dp_set_ddi_pll_sel()

2015-07-01 Thread Ander Conselvan De Oliveira

On Tue, 2015-06-30 at 18:41 +0300, Jani Nikula wrote:
> On Tue, 30 Jun 2015, Daniel Vetter wrote:
> > On Tue, Jun 30, 2015 at 04:47:06PM +0300, Jani Nikula wrote:
> >> On Tue, 30 Jun 2015, Ander Conselvan de Oliveira wrote:
> >> > Similarly to what is done for SKL, clear the dpll_hw_state of the pipe
> >> > config in hsw_dp_set_ddi_pll_sel(), since it may contain stale values.
> >> > That can happen if a crtc that was previously driving an HDMI connector
> >> > switches to a DP connector. In that case, the wrpll field was left with
> >> > its old value, leading to warnings like the one below:
> >> >
> >> > [drm:check_crtc_state [i915]] *ERROR* mismatch in dpll_hw_state.wrpll 
> >> > (expected 0xb035061f, found 0x)
> >> > [ cut here ]
> >> > WARNING: CPU: 1 PID: 767 at drivers/gpu/drm/i915/intel_display.c:12324 
> >> > check_crtc_state+0x975/0x10b0 [i915]()
> >> > pipe state doesn't match!
> >> >
> >> > This regression was introduced in
> >> >
> >> > commit dd3cd74acf12723045a64f1f2c6298ac7b34a5d5
> >> > Author: Ander Conselvan de Oliveira 
> >> > 
> >> > Date:   Fri May 15 13:34:29 2015 +0300
> >> >
> >> > drm/i915: Don't overwrite (e)DP PLL selection on SKL
> >> >
> >> > Signed-off-by: Ander Conselvan de Oliveira 
> >> > 
> >> 
> >> Reported-by: Linus Torvalds 
> >> Tested-by: Jani Nikula 
> >
> > Yeah makes sense as a fix for 4.2. But for 4.3 I wonder whether the
> > original commit that started this chain needs to be changed a bit:
> >
> > commit 4978cc93d9ac240b435ce60431aef24239b4c270
> > Author: Ander Conselvan de Oliveira 
> > Date:   Tue Apr 21 17:13:21 2015 +0300
> >
> > drm/i915: Preserve shared DPLL information in new pipe_config
> >
> > All the trouble this caused is because it not only preserves the sharing
> > config (in crtc_state->shared_dpll) but also the ->dpll_hw_state. And I
> > think with Maarten's latest code (for 4.3) we'd just do an unconditional
> > compute_config (need it for fast pfit updates and fastboot), which means
> > the bogus values in ->dpll_hw_state aren't a problem any more since we'll
> > overwrite them again. And then we could remove that sprinkle of memsets we
> > have all over, which would be good (since the current approach is
> > obviously a bit fragile). Anyway:
> >
> > Reviewed-by: Daniel Vetter 
> 
> Pushed to drm-intel-next-fixes, thanks for the patch and review. One
> down, another one left to fix.
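
For readers following the thread, here is a minimal sketch of the stale-state problem described in the quoted commit message. The struct and function names are hypothetical stand-ins, not the driver's own, and the DP path is assumed to write nothing to wrpll. It only illustrates why zeroing dpll_hw_state before a DP configuration removes the leftover HDMI value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the driver structures; only what the sketch needs. */
struct dpll_hw_state_sketch {
	uint32_t wrpll;	/* written by the HDMI path only */
};

struct pipe_config_sketch {
	struct dpll_hw_state_sketch dpll_hw_state;
};

/* HDMI path: stores a WRPLL value in the pipe config. */
static void sketch_select_hdmi_pll(struct pipe_config_sketch *cfg, uint32_t wrpll)
{
	cfg->dpll_hw_state.wrpll = wrpll;
}

/*
 * DP path: writes nothing to wrpll in this sketch. With clear_first set it
 * zeroes the whole dpll_hw_state first, which is the shape of the fix.
 */
static void sketch_select_dp_pll(struct pipe_config_sketch *cfg, int clear_first)
{
	if (clear_first)
		memset(&cfg->dpll_hw_state, 0, sizeof(cfg->dpll_hw_state));
}
```

Without the memset, a state checker comparing the software state against hardware would still see the old HDMI value, which matches the "mismatch in dpll_hw_state.wrpll" warning quoted above.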

I made some progress on the second issue, but I'm afraid Jani might have
found a third bug. The warning he gets happens because we try to wait
for vblanks while updating the primary plane during the modeset. At that
point, the crtc is off. The problem is in intel_check_primary_plane(),
which is called from drm_atomic_helper_check_planes(). That function
makes decisions about waiting for a vblank based on intel_crtc->active.
Since the check is called before we disable the crtcs, active might be
true, even though the plane update is done with crtcs disabled.

The patch below makes the warning go away, but I still need to figure
out how to set crtc_state->planes_changed properly if we are going down
that route.

diff --git a/drivers/gpu/drm/i915/intel_display.c 
b/drivers/gpu/drm/i915/intel_display.c
index dcb1d25..f14727c 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -12480,10 +12480,6 @@ intel_modeset_compute_config(struct drm_crtc *crtc,
 
intel_dump_pipe_config(to_intel_crtc(crtc), pipe_config,"[modeset]");
 
-   ret = drm_atomic_helper_check_planes(state->dev, state);
-   if (ret)
-   return ERR_PTR(ret);
-
return pipe_config;
 }
 

The backtrace on Linus' machine is different, though. It comes from the
call to intel_crtc_disable_planes() in __intel_set_mode(). That would
indicate we have a crtc with crtc->state->enable == true but that is
actually inactive. I'm still not sure how we can get in that state.

Ander


Re: [Intel-gfx] [PATCH 4/4] drm/i915/gtt: Per ppgtt scratch page

2015-07-01 Thread Daniel Vetter

On Wed, Jul 01, 2015 at 03:25:09PM +0100, Michel Thierry wrote:
> On 7/1/2015 3:26 PM, Daniel Vetter wrote:
> >On Wed, Jul 01, 2015 at 03:05:44PM +0100, Michel Thierry wrote:
> >>On 6/30/2015 4:16 PM, Mika Kuoppala wrote:
> >>>Previously we have pointed the page where the individual ppgtt
> >>>scratch structures refer to, to be the instance which GGTT setup has
> >>>allocated. So it has been shared.
> >>>
> >>>To achive full isolation between ppgtts also in this regard,
> >>  ^achieve
> >>
> >>>allocate per ppgtt scratch page.
> >>>
> >>Maybe also say that it moved scratch page/pt/pd operations together
> >>(genx_init/free_scratch functions).
> >>
> >>Daniel, since you requested this, should it get your r-b?
> >>It looks ok to me.
> >
> >Does that count as an r-b? Doing a detailed review is more work than just
> >acking the overall idea ;-)
> 
> Yes, it'd be great if you fix the typo while merging.

Done.

> Reviewed-by: Michel Thierry 

Queued for -next, thanks for the patch.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

Re: [Intel-gfx] [PATCH 1/3] drm/i915: Make fb user dirty operation to invalidate frontbuffer

2015-07-01 Thread Chris Wilson

On Wed, Jul 01, 2015 at 04:09:19PM +0200, Daniel Vetter wrote:
> On Wed, Jul 01, 2015 at 02:21:40PM +0100, Chris Wilson wrote:
> > On Wed, Jul 01, 2015 at 03:19:31PM +0200, Daniel Vetter wrote:
> > > On Wed, Jul 01, 2015 at 09:04:08AM +0100, Chris Wilson wrote:
> > > > On Tue, Jun 30, 2015 at 04:42:00PM -0700, Rodrigo Vivi wrote:
> > > > > Let's do a frontbuffer invalidation on dirty fb.
> > > > > To be used for DIRTYFB drm ioctl.
> > > > > 
> > > > > This patch solves the biggest PSR known issue, that is
> > > > > missed screen updates during boot, mainly when there is a splash
> > > > > screen involved like plymouth.
> > > > > 
> > > > > Plymouth will do a modeset over ioctl that flushes frontbuffer
> > > > > tracking and PSR gets back to work while it cannot track the
> > > > > screen updates and exit properly. However plymouth also uses
> > > > > a dirtyfb ioctl whenever updating the screen. So let's use it
> > > > > to invalidate PSR back again.
> > > > > 
> > > > > v2: Remove ORIGIN_FB_DIRTY and use ORIGIN_GTT instead since dirty
> > > > > callback is only called after a few screen updates and not on
> > > > > every one, as pointed out by Daniel.
> > > > > 
> > > > > Cc: Daniel Vetter 
> > > > > Signed-off-by: Rodrigo Vivi 
> > > > 
> > > > Will it ever grow the ability to handle clip rects? I can detect the
> > > > presence of the syscall and call it appropriately, but I don't want to
> > > > have to start tracking frontbuffer damage unless there's a significant
> > > > advantage in doing so (to offset the cost of the tracking).
> > > 
> > > For now this is just for generic userspace using the dumb mmap ioctls,
> > > which does already dirty everything. For gem/i915 userspace the existing
> > > frontbuffer tracking rules will still apply.
> > 
> > But they are inadequate for the map/set-domain scanout once and write
> > through the GTT for umpteen seconds, which can happen quite frequently.
> > 
> > In that situation, we behave exactly like fbdev/dumb fb.
> 
> Yeah you can use it to flush gtt of course too. And there I'd just
> defensively flush the entire fb until we've grown more clueful in the
> kernel. But for frontbuffer flushing I don't expect that to ever happen
> for i915. It makes more sense ofc for udl/qxl and others where uploads are
> really expensive.

Ha, I thought that MIPI was going off in this direction precisely
because, on high pixel count displays, transferring only the dirty regions
is a big power saving. Likewise I expect that at some point there will be a
chipset mode to transfer only portions of the framebuffer out of the chip
local cache across the bus.

I want to make sure we are not going to shoot ourselves in the foot and
can forward-proof the design so that we can easily detect if we want to
use cliprects.

An easy way would be to return the number of rectangles pushed by
dirtyfb, i.e. if dirtyfb(fb, .num_rects=1) == 0 I know that it just pushes
the whole framebuffer every time and I need not track damage.

The ABI would be a negative error return on failure, 0 or a positive value on
success, where the positive value is the number of rects pushed (which
should be identical to the user request on success). Dumb users then
don't need to care as they can either always request full fb flushes or
always push rects.
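
A rough sketch of that return convention, to make the probe concrete. Everything here is hypothetical: the struct, the function names and the supports_clip_rects flag are illustration only and do not describe the existing ioctl.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of a framebuffer whose driver may or may not honour rects. */
struct fb_sketch {
	int supports_clip_rects;
};

/*
 * Proposed convention: negative errno on failure, 0 when the whole
 * framebuffer was flushed, otherwise the number of rectangles pushed.
 */
static int sketch_dirtyfb(const struct fb_sketch *fb, unsigned int num_rects)
{
	if (fb == NULL)
		return -EINVAL;
	if (!fb->supports_clip_rects || num_rects == 0)
		return 0;
	return (int)num_rects;
}

/* Userspace probe: push a single rect and see whether it was honoured. */
static int sketch_should_track_damage(const struct fb_sketch *fb)
{
	return sketch_dirtyfb(fb, 1) > 0;
}
```

A client would run the probe once and only pay for damage tracking when it returns nonzero.
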
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

Re: [Intel-gfx] [PATCH 4/4] drm/i915/gtt: Per ppgtt scratch page

2015-07-01 Thread Michel Thierry


On 7/1/2015 3:26 PM, Daniel Vetter wrote:
> On Wed, Jul 01, 2015 at 03:05:44PM +0100, Michel Thierry wrote:
> > On 6/30/2015 4:16 PM, Mika Kuoppala wrote:
> > > Previously we have pointed the page where the individual ppgtt
> > > scratch structures refer to, to be the instance which GGTT setup has
> > > allocated. So it has been shared.
> > >
> > > To achive full isolation between ppgtts also in this regard,
> >      ^achieve
> >
> > > allocate per ppgtt scratch page.
> >
> > Maybe also say that it moved scratch page/pt/pd operations together
> > (genx_init/free_scratch functions).
> >
> > Daniel, since you requested this, should it get your r-b?
> > It looks ok to me.
>
> Does that count as an r-b? Doing a detailed review is more work than just
> acking the overall idea ;-)

Yes, it'd be great if you fix the typo while merging.

Reviewed-by: Michel Thierry

> -Daniel



Re: [Intel-gfx] [PATCH 4/4] drm/i915/gtt: Per ppgtt scratch page

2015-07-01 Thread Daniel Vetter

On Wed, Jul 01, 2015 at 03:05:44PM +0100, Michel Thierry wrote:
> On 6/30/2015 4:16 PM, Mika Kuoppala wrote:
> >Previously we have pointed the page where the individual ppgtt
> >scratch structures refer to, to be the instance which GGTT setup has
> >allocated. So it has been shared.
> >
> >To achive full isolation between ppgtts also in this regard,
>  ^achieve
> 
> >allocate per ppgtt scratch page.
> >
> Maybe also say that it moved scratch page/pt/pd operations together
> (genx_init/free_scratch functions).
> 
> Daniel, since you requested this, should it get your r-b?
> It looks ok to me.

Does that count as an r-b? Doing a detailed review is more work than just
acking the overall idea ;-)
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

[Intel-gfx] [drm-intel:drm-intel-next-queued 318/324] drivers/gpu/drm/i915/intel_ddi.c:2094:6: warning: unused variable 'iboost_bit'

2015-07-01 Thread kbuild test robot

tree:   git://anongit.freedesktop.org/drm-intel drm-intel-next-queued
head:   2cc898e05de1d1269ecd2d4208f8e890b4f8adad
commit: 3b7e4f82f3600c2251ea3411052419b7351addd2 [318/324] drm/i915: Per-DDI 
I_boost override
config: i386-randconfig-r0-201526 (attached as .config)
reproduce:
  git checkout 3b7e4f82f3600c2251ea3411052419b7351addd2
  # save the attached .config to linux build tree
  make ARCH=i386 

All warnings (new ones prefixed by >>):

   drivers/gpu/drm/i915/intel_ddi.c: In function 'skl_ddi_set_iboost':
>> drivers/gpu/drm/i915/intel_ddi.c:2094:6: warning: unused variable 
>> 'iboost_bit' [-Wunused-variable]
 u32 iboost_bit = 0;
 ^

vim +/iboost_bit +2094 drivers/gpu/drm/i915/intel_ddi.c

  2078  enum transcoder cpu_transcoder = 
intel_crtc->config->cpu_transcoder;
  2079  
  2080  if (cpu_transcoder != TRANSCODER_EDP)
  2081  I915_WRITE(TRANS_CLK_SEL(cpu_transcoder),
  2082 TRANS_CLK_SEL_DISABLED);
  2083  }
  2084  
  2085  static void skl_ddi_set_iboost(struct drm_device *dev, u32 level,
  2086 enum port port, int type)
  2087  {
  2088  struct drm_i915_private *dev_priv = dev->dev_private;
  2089  const struct ddi_buf_trans *ddi_translations;
  2090  uint8_t iboost;
  2091  uint8_t dp_iboost, hdmi_iboost;
  2092  int n_entries;
  2093  u32 reg;
> 2094  u32 iboost_bit = 0;
  2095  
  2096  /* VBT may override standard boost values */
  2097  dp_iboost = dev_priv->vbt.ddi_port_info[port].dp_boost_level;
  2098  hdmi_iboost = 
dev_priv->vbt.ddi_port_info[port].hdmi_boost_level;
  2099  
  2100  if (type == INTEL_OUTPUT_DISPLAYPORT) {
  2101  if (dp_iboost) {
  2102  iboost = dp_iboost;

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
#
# Automatically generated file; DO NOT EDIT.
# Linux/i386 4.1.0-rc6 Kernel Configuration
#
# CONFIG_64BIT is not set
CONFIG_X86_32=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_PERF_EVENTS_INTEL_UNCORE=y
CONFIG_OUTPUT_FORMAT="elf32-i386"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_MMU=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_X86_32_SMP=y
CONFIG_X86_HT=y
CONFIG_X86_32_LAZY_GS=y
CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-ecx -fcall-saved-edx"
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_PGTABLE_LEVELS=2
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
# CONFIG_KERNEL_GZIP is not set
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
CONFIG_KERNEL_LZO=y
# CONFIG_KERNEL_LZ4 is not set
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
CONFIG_CROSS_MEMORY_ATTACH=y
CONFIG_FHANDLE=y
CONFIG_USELIB=y
CONFIG_AUDIT=y
CONFIG_HAVE_ARCH_AUDITSYSCALL=y
# CONFIG_AUDITSYSCALL is not set

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_LEGACY_ALLOC_HWIRQ=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_GENERIC_IRQ_CHIP=y
CONFIG_IRQ_DOMAIN=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_IRQ_DOMAIN_DEBUG=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# Timers subsystem
#
CONFIG_HZ_PERIODIC=y
# CONFIG_NO_HZ_IDLE is not set
CONFIG_NO_HZ=y
# CONFIG_HIGH_RES_TIMERS is not set

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_IRQ_TIME_ACCOUNTING is not set
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
CONFIG_SRCU=y
CONFIG_TASKS

Re: [Intel-gfx] [PATCH] drm/i915: Asynchronously initialise the GPU state

2015-07-01 Thread Chris Wilson

On Wed, Jul 01, 2015 at 04:07:08PM +0200, Daniel Vetter wrote:
> On Wed, Jul 01, 2015 at 02:17:28PM +0100, Chris Wilson wrote:
> > On Wed, Jul 01, 2015 at 03:07:18PM +0200, Daniel Vetter wrote:
> > > On Wed, Jul 01, 2015 at 10:27:21AM +0100, Chris Wilson wrote:
> > > > Dave Gordon made the good suggestion that once the ringbuffers were
> > > > setup, the actual queuing of commands to program the initial GPU state
> > > > could be deferred. Since that initial state contains instructions for
> > > > setting up the first power context, we want to execute that as early
> > > > as possible, preferably in the background to userspace. Then when
> > > > userspace does wake up, the first time it opens the device we just need
> > > > to flush the work to be sure that our commands are queued before any of
> > > > userspace's. (Hooking into the device open should mean we have to check
> > > > less often than say hooking into execbuffer.)
> > > > 
> > > > Suggested-by: Dave Gordon 
> > > > Signed-off-by: Chris Wilson 
> > > > Cc: Dave Gordon 
> > > 
> > > Just before this gets a bit out of hand with various patches floating
> > > around ... I really meant it when I said that we should have a proper
> > > design discussion about this in Jesse's meeting first.
> > 
> > What more is there to design? Asynchronously loading the submission port
> > is orthogonal to the task of queuing requests for it, and need not block
> > request construction (be it kernel or userspace). Dave just identified
> > some work that we didn't need to do during module load. I don't think he
> > would propose using it for loading guc firmware, that would just be
> > silly...
> 
> set_wedged in your patch doesn't have the wakeup to kick waiters.

True. But it has to be impossible for a waiter to exist at this point,
or else the entire async GPU init is broken. These commands have to be
the first requests we send to the GPU. Everything else must wait before
it is allowed to start queuing.
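
To spell out the ordering argument, here is a single-threaded sketch under stated assumptions. None of the names come from the patch, the deferred work is modelled as a plain flag rather than a real work item, and returning -EIO from a wedged device is an assumption of the model.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the device; none of these names come from the patch. */
struct gpu_sketch {
	bool init_pending;	/* initial-state work queued but not yet run */
	bool wedged;		/* set if the initial-state work failed */
	int nrequests;		/* requests queued so far */
	int init_position;	/* 1-based queue position of the init request, 0 if none */
};

/* Module load: only record that the work is outstanding. */
static void sketch_defer_init(struct gpu_sketch *gpu)
{
	gpu->init_pending = true;
}

/* The deferred work itself; 'fail' simulates an initialisation error. */
static void sketch_run_init(struct gpu_sketch *gpu, bool fail)
{
	if (!gpu->init_pending)
		return;
	gpu->init_pending = false;
	if (fail) {
		gpu->wedged = true;
		return;
	}
	gpu->init_position = ++gpu->nrequests;
}

/* Device open: flush the deferred work before userspace can queue anything. */
static void sketch_open(struct gpu_sketch *gpu, bool fail)
{
	sketch_run_init(gpu, fail);
}

/* Userspace submission: refused with -EIO once the GPU is wedged. */
static int sketch_submit(struct gpu_sketch *gpu)
{
	if (gpu->wedged)
		return -EIO;
	return ++gpu->nrequests;
}
```

Because open always flushes first, the init request lands at position 1 and no userspace request (and hence no waiter) can exist before it.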

> And
> maybe we want to be somewhat more synchronous with init fail than gpu
> hangs, for userspace to make better decisions.

The init is still synchronous with userspace using the device, just (and
this is no change) the only communication with userspace that GEM
initialisation failed is the wedged GPU.

> Also we still have that
> issue that sometimes an -EIO escapes into modeset code.

But there are no new wait requests running concurrent with GEM init,
so this patch doesn't alter that.

> And yes this is
> meant to provide the async init for the request firmware.

It is an inappropriate juncture for async request firmware. I can keep
repeating that enabling the submission ports is orthogonal to setting up
the CS engines and allowing requests to be queued, because it is...

There is no need to modify the higher levels for async GuC
initialisation. The serialisation there is when to start feeding requests
into the submission port. At the moment we do that immediately when it
is idle, but the GuC is not idle until it is loaded. As soon as it is
loaded it can simply feed in the first set of requests and start on
its merry way. This also allows a GuC failure to always fall back
transparently to execlists.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

Re: [Intel-gfx] [PATCH RESEND 1/3] drm/i915/dsi: abstract dsi bpp derivation from pixel format

2015-07-01 Thread Daniel Vetter

On Wed, Jul 01, 2015 at 03:58:50PM +0300, Jani Nikula wrote:
> Nuke three copies of the same switch case.
> 
> Hopefully we can switch to a drm generic function later on, but that
> will require us to switch to enum mipi_dsi_pixel_format first.
> 
> Reviewed-by: Ville Syrjälä 
> Signed-off-by: Jani Nikula 

All merged, thanks for resending.
-Daniel

> ---
>  drivers/gpu/drm/i915/intel_dsi_pll.c | 67 
> +---
>  1 file changed, 24 insertions(+), 43 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_dsi_pll.c 
> b/drivers/gpu/drm/i915/intel_dsi_pll.c
> index d20cf37b6901..49ae821e82d8 100644
> --- a/drivers/gpu/drm/i915/intel_dsi_pll.c
> +++ b/drivers/gpu/drm/i915/intel_dsi_pll.c
> @@ -38,6 +38,27 @@
>  #define DSI_HFP_PACKET_EXTRA_SIZE 6
>  #define DSI_EOTP_PACKET_SIZE 4
>  
> +static int dsi_pixel_format_bpp(int pixel_format)
> +{
> + int bpp;
> +
> + switch (pixel_format) {
> + default:
> + case VID_MODE_FORMAT_RGB888:
> + case VID_MODE_FORMAT_RGB666_LOOSE:
> + bpp = 24;
> + break;
> + case VID_MODE_FORMAT_RGB666:
> + bpp = 18;
> + break;
> + case VID_MODE_FORMAT_RGB565:
> + bpp = 16;
> + break;
> + }
> +
> + return bpp;
> +}
> +
>  struct dsi_mnp {
>   u32 dsi_pll_ctrl;
>   u32 dsi_pll_div;
> @@ -65,19 +86,7 @@ static u32 dsi_rr_formula(const struct drm_display_mode 
> *mode,
>   u32 dsi_bit_clock_hz;
>   u32 dsi_clk;
>  
> - switch (pixel_format) {
> - default:
> - case VID_MODE_FORMAT_RGB888:
> - case VID_MODE_FORMAT_RGB666_LOOSE:
> - bpp = 24;
> - break;
> - case VID_MODE_FORMAT_RGB666:
> - bpp = 18;
> - break;
> - case VID_MODE_FORMAT_RGB565:
> - bpp = 16;
> - break;
> - }
> + bpp = dsi_pixel_format_bpp(pixel_format);
>  
>   hactive = mode->hdisplay;
>   vactive = mode->vdisplay;
> @@ -137,21 +146,7 @@ static u32 dsi_rr_formula(const struct drm_display_mode 
> *mode,
>  static u32 dsi_clk_from_pclk(u32 pclk, int pixel_format, int lane_count)
>  {
>   u32 dsi_clk_khz;
> - u32 bpp;
> -
> - switch (pixel_format) {
> - default:
> - case VID_MODE_FORMAT_RGB888:
> - case VID_MODE_FORMAT_RGB666_LOOSE:
> - bpp = 24;
> - break;
> - case VID_MODE_FORMAT_RGB666:
> - bpp = 18;
> - break;
> - case VID_MODE_FORMAT_RGB565:
> - bpp = 16;
> - break;
> - }
> + u32 bpp = dsi_pixel_format_bpp(pixel_format);
>  
>   /* DSI data rate = pixel clock * bits per pixel / lane count
>  pixel clock is converted from KHz to Hz */
> @@ -286,21 +281,7 @@ void vlv_disable_dsi_pll(struct intel_encoder *encoder)
>  
>  static void assert_bpp_mismatch(int pixel_format, int pipe_bpp)
>  {
> - int bpp;
> -
> - switch (pixel_format) {
> - default:
> - case VID_MODE_FORMAT_RGB888:
> - case VID_MODE_FORMAT_RGB666_LOOSE:
> - bpp = 24;
> - break;
> - case VID_MODE_FORMAT_RGB666:
> - bpp = 18;
> - break;
> - case VID_MODE_FORMAT_RGB565:
> - bpp = 16;
> - break;
> - }
> + int bpp = dsi_pixel_format_bpp(pixel_format);
>  
>   WARN(bpp != pipe_bpp,
>"bpp match assertion failure (expected %d, current %d)\n",
> -- 
> 2.1.4
> 
> ___
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
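
As a standalone illustration of the refactoring in the quoted patch, the sketch below puts the format-to-bpp switch in one helper and derives a clock from it. The enum values are stand-ins (the driver defines the real VID_MODE_FORMAT_* constants), the function names are hypothetical, and the clock stays in kHz with truncating division, which may differ from the driver's rounding.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in values; the driver defines the real VID_MODE_FORMAT_* constants. */
enum sketch_pixel_format {
	SKETCH_FORMAT_RGB565,
	SKETCH_FORMAT_RGB666,
	SKETCH_FORMAT_RGB666_LOOSE,
	SKETCH_FORMAT_RGB888,
};

/* Single place mapping a pixel format to bits per pixel; unknown formats get 24. */
static int sketch_pixel_format_bpp(int pixel_format)
{
	switch (pixel_format) {
	default:
	case SKETCH_FORMAT_RGB888:
	case SKETCH_FORMAT_RGB666_LOOSE:
		return 24;
	case SKETCH_FORMAT_RGB666:
		return 18;
	case SKETCH_FORMAT_RGB565:
		return 16;
	}
}

/* DSI data rate = pixel clock * bits per pixel / lane count (truncating). */
static uint32_t sketch_dsi_clk_khz(uint32_t pclk_khz, int pixel_format, int lane_count)
{
	uint64_t bits = (uint64_t)pclk_khz * (uint32_t)sketch_pixel_format_bpp(pixel_format);

	return (uint32_t)(bits / (uint32_t)lane_count);
}
```

With the mapping in one helper, each caller shrinks to a single line and a new format only has to be added in one switch.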

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

Re: [Intel-gfx] [PATCH 1/3] drm/i915: Make fb user dirty operation to invalidate frontbuffer

2015-07-01 Thread Daniel Vetter

On Wed, Jul 01, 2015 at 02:21:40PM +0100, Chris Wilson wrote:
> On Wed, Jul 01, 2015 at 03:19:31PM +0200, Daniel Vetter wrote:
> > On Wed, Jul 01, 2015 at 09:04:08AM +0100, Chris Wilson wrote:
> > > On Tue, Jun 30, 2015 at 04:42:00PM -0700, Rodrigo Vivi wrote:
> > > > Let's do a frontbuffer invalidation on dirty fb.
> > > > To be used for DIRTYFB drm ioctl.
> > > > 
> > > > This patch solves the biggest PSR known issue, that is
> > > > missed screen updates during boot, mainly when there is a splash
> > > > screen involved like plymouth.
> > > > 
> > > > Plymouth will do a modeset over ioctl that flushes frontbuffer
> > > > tracking and PSR gets back to work while it cannot track the
> > > > screen updates and exit properly. However plymouth also uses
> > > > a dirtyfb ioctl whenever updating the screen. So let's use it
> > > > to invalidate PSR back again.
> > > > 
> > > > v2: Remove ORIGIN_FB_DIRTY and use ORIGIN_GTT instead since dirty
> > > > callback is only called after a few screen updates and not on
> > > > every one, as pointed out by Daniel.
> > > > 
> > > > Cc: Daniel Vetter 
> > > > Signed-off-by: Rodrigo Vivi 
> > > 
> > > Will it ever grow the ability to handle clip rects? I can detect the
> > > presence of the syscall and call it appropriately, but I don't want to
> > > have to start tracking frontbuffer damage unless there's a significant
> > > advantage in doing so (to offset the cost of the tracking).
> > 
> > For now this is just for generic userspace using the dumb mmap ioctls,
> > which does already dirty everything. For gem/i915 userspace the existing
> > frontbuffer tracking rules will still apply.
> 
> But they are inadequate for the map/set-domain scanout once and write
> through the GTT for umpteen seconds, which can happen quite frequently.
> 
> In that situation, we behave exactly like fbdev/dumb fb.

Yeah you can use it to flush gtt of course too. And there I'd just
defensively flush the entire fb until we've grown more clueful in the
kernel. But for frontbuffer flushing I don't expect that to ever happen
for i915. It makes more sense ofc for udl/qxl and others where uploads are
really expensive.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

Re: [Intel-gfx] [PATCH 4/4] drm/i915/gtt: Per ppgtt scratch page

2015-07-01 Thread Michel Thierry


On 6/30/2015 4:16 PM, Mika Kuoppala wrote:

Previously we have pointed the page where the individual ppgtt
scratch structures refer to, to be the instance which GGTT setup has
allocated. So it has been shared.

To achive full isolation between ppgtts also in this regard,

 ^achieve


allocate per ppgtt scratch page.

Maybe also say that it moved scratch page/pt/pd operations together 
(genx_init/free_scratch functions).


Daniel, since you requested this, should it get your r-b?
It looks ok to me.

-Michel


Cc: Michel Thierry 
Cc: Daniel Vetter 
Signed-off-by: Mika Kuoppala 
---
  drivers/gpu/drm/i915/i915_gem_gtt.c | 94 +
  1 file changed, 74 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 402d6d3..b1a8fc4 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -682,6 +682,42 @@ static void gen8_free_page_tables(struct drm_device *dev,
}
  }

+static int gen8_init_scratch(struct i915_address_space *vm)
+{
+   struct drm_device *dev = vm->dev;
+
+   vm->scratch_page = alloc_scratch_page(dev);
+   if (IS_ERR(vm->scratch_page))
+   return PTR_ERR(vm->scratch_page);
+
+   vm->scratch_pt = alloc_pt(dev);
+   if (IS_ERR(vm->scratch_pt)) {
+   free_scratch_page(dev, vm->scratch_page);
+   return PTR_ERR(vm->scratch_pt);
+   }
+
+   vm->scratch_pd = alloc_pd(dev);
+   if (IS_ERR(vm->scratch_pd)) {
+   free_pt(dev, vm->scratch_pt);
+   free_scratch_page(dev, vm->scratch_page);
+   return PTR_ERR(vm->scratch_pd);
+   }
+
+   gen8_initialize_pt(vm, vm->scratch_pt);
+   gen8_initialize_pd(vm, vm->scratch_pd);
+
+   return 0;
+}
+
+static void gen8_free_scratch(struct i915_address_space *vm)
+{
+   struct drm_device *dev = vm->dev;
+
+   free_pd(dev, vm->scratch_pd);
+   free_pt(dev, vm->scratch_pt);
+   free_scratch_page(dev, vm->scratch_page);
+}
+
  static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
  {
struct i915_hw_ppgtt *ppgtt =
@@ -697,8 +733,7 @@ static void gen8_ppgtt_cleanup(struct i915_address_space 
*vm)
free_pd(ppgtt->base.dev, ppgtt->pdp.page_directory[i]);
}

-   free_pd(vm->dev, vm->scratch_pd);
-   free_pt(vm->dev, vm->scratch_pt);
+   gen8_free_scratch(vm);
  }

  /**
@@ -985,16 +1020,11 @@ err_out:
   */
  static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
  {
-   ppgtt->base.scratch_pt = alloc_pt(ppgtt->base.dev);
-   if (IS_ERR(ppgtt->base.scratch_pt))
-   return PTR_ERR(ppgtt->base.scratch_pt);
-
-   ppgtt->base.scratch_pd = alloc_pd(ppgtt->base.dev);
-   if (IS_ERR(ppgtt->base.scratch_pd))
-   return PTR_ERR(ppgtt->base.scratch_pd);
+   int ret;

-   gen8_initialize_pt(&ppgtt->base, ppgtt->base.scratch_pt);
-   gen8_initialize_pd(&ppgtt->base, ppgtt->base.scratch_pd);
+   ret = gen8_init_scratch(&ppgtt->base);
+   if (ret)
+   return ret;

ppgtt->base.start = 0;
ppgtt->base.total = 1ULL << 32;
@@ -1410,6 +1440,33 @@ unwind_out:
return ret;
  }

+static int gen6_init_scratch(struct i915_address_space *vm)
+{
+   struct drm_device *dev = vm->dev;
+
+   vm->scratch_page = alloc_scratch_page(dev);
+   if (IS_ERR(vm->scratch_page))
+   return PTR_ERR(vm->scratch_page);
+
+   vm->scratch_pt = alloc_pt(dev);
+   if (IS_ERR(vm->scratch_pt)) {
+   free_scratch_page(dev, vm->scratch_page);
+   return PTR_ERR(vm->scratch_pt);
+   }
+
+   gen6_initialize_pt(vm, vm->scratch_pt);
+
+   return 0;
+}
+
+static void gen6_free_scratch(struct i915_address_space *vm)
+{
+   struct drm_device *dev = vm->dev;
+
+   free_pt(dev, vm->scratch_pt);
+   free_scratch_page(dev, vm->scratch_page);
+}
+
  static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
  {
struct i915_hw_ppgtt *ppgtt =
@@ -1424,11 +1481,12 @@ static void gen6_ppgtt_cleanup(struct 
i915_address_space *vm)
free_pt(ppgtt->base.dev, pt);
}

-   free_pt(vm->dev, vm->scratch_pt);
+   gen6_free_scratch(vm);
  }

  static int gen6_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt)
  {
+   struct i915_address_space *vm = &ppgtt->base;
struct drm_device *dev = ppgtt->base.dev;
struct drm_i915_private *dev_priv = dev->dev_private;
bool retried = false;
@@ -1439,11 +1497,10 @@ static int gen6_ppgtt_allocate_page_directories(struct 
i915_hw_ppgtt *ppgtt)
 * size. We allocate at the top of the GTT to avoid fragmentation.
 */
BUG_ON(!drm_mm_initialized(&dev_priv->gtt.base.mm));
-   ppgtt->base.scratch_pt = alloc_pt(ppgtt->base.dev);
-   if (IS_ERR(ppgtt->base.scratch_pt))
-   return PTR_ERR(ppgtt->
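
For reference, the allocate-and-unwind pattern used by gen8_init_scratch() in the patch above can be sketched in userspace as follows. This is an illustration under assumptions: the allocators are stubs with failure injection, failures are reported as NULL rather than ERR_PTR values, and goto labels are used for the unwinding where the patch frees inline.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Stub allocator with failure injection; all names here are hypothetical. */
static int sketch_live_allocs;	/* allocations not yet freed */
static int sketch_alloc_calls;	/* calls since the last reset */
static int sketch_fail_at;	/* fail the Nth call (1-based), 0 = never */

static void sketch_reset(int fail_at)
{
	sketch_live_allocs = 0;
	sketch_alloc_calls = 0;
	sketch_fail_at = fail_at;
}

static void *sketch_alloc(void)
{
	if (++sketch_alloc_calls == sketch_fail_at)
		return NULL;
	sketch_live_allocs++;
	return malloc(1);
}

static void sketch_free(void *p)
{
	sketch_live_allocs--;
	free(p);
}

struct vm_sketch {
	void *scratch_page;
	void *scratch_pt;
	void *scratch_pd;
};

/* Allocate page, then pt, then pd; on failure release what was already taken. */
static int sketch_init_scratch(struct vm_sketch *vm)
{
	vm->scratch_page = sketch_alloc();
	if (!vm->scratch_page)
		return -ENOMEM;

	vm->scratch_pt = sketch_alloc();
	if (!vm->scratch_pt)
		goto err_page;

	vm->scratch_pd = sketch_alloc();
	if (!vm->scratch_pd)
		goto err_pt;

	return 0;

err_pt:
	sketch_free(vm->scratch_pt);
err_page:
	sketch_free(vm->scratch_page);
	return -ENOMEM;
}

/* Teardown mirrors the allocation order in reverse. */
static void sketch_free_scratch(struct vm_sketch *vm)
{
	sketch_free(vm->scratch_pd);
	sketch_free(vm->scratch_pt);
	sketch_free(vm->scratch_page);
}
```

Whichever allocation fails, nothing taken earlier is left behind, which is the property the init/free pairing in the patch is meant to provide.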

Re: [Intel-gfx] [PATCH] drm/i915: Asynchronously initialise the GPU state

2015-07-01 Thread Daniel Vetter

On Wed, Jul 01, 2015 at 02:17:28PM +0100, Chris Wilson wrote:
> On Wed, Jul 01, 2015 at 03:07:18PM +0200, Daniel Vetter wrote:
> > On Wed, Jul 01, 2015 at 10:27:21AM +0100, Chris Wilson wrote:
> > > Dave Gordon made the good suggestion that once the ringbuffers were
> > > setup, the actual queuing of commands to program the initial GPU state
> > > could be deferred. Since that initial state contains instructions for
> > > setting up the first power context, we want to execute that as early
> > > as possible, preferably in the background to userspace. Then when
> > > userspace does wake up, the first time it opens the device we just need
> > > to flush the work to be sure that our commands are queued before any of
> > > userspace's. (Hooking into the device open should mean we have to check
> > > less often than say hooking into execbuffer.)
> > > 
> > > Suggested-by: Dave Gordon 
> > > Signed-off-by: Chris Wilson 
> > > Cc: Dave Gordon 
> > 
> > Just before this gets a bit out of hand with various patches floating
> > around ... I really meant it when I said that we should have a proper
> > design discussion about this in Jesse's meeting first.
> 
> What more is there to design? Asynchronously loading the submission port
> is orthogonal to the task of queuing requests for it, and need not block
> request construction (be it kernel or userspace). Dave just identified
> some work that we didn't need to do during module load. I don't think he
> would propose using it for loading guc firmware, that would just be
> silly...

set_wedged in your patch doesn't have the wakeup to kick waiters. And
maybe we want to be somewhat more synchronous with init failures than with
gpu hangs, for userspace to make better decisions. Also we still have that
issue that sometimes an -EIO escapes into modeset code. And yes this is
meant to provide the async init for the request firmware.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
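
The ordering argument in this thread (defer queuing the initial GPU state at load, then flush that work on first open so it lands ahead of any userspace commands, and surface an init failure to the opener) can be modelled in a few lines of userspace C. This is only a single-threaded sketch under stated assumptions: struct gpu_model, gpu_load(), flush_init_work() and gpu_open() are hypothetical names invented for illustration, not i915 functions, and the "flush" here simply runs the pending work instead of waiting on a real worker.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CMDS 8

enum cmd { CMD_INIT_STATE = 1, CMD_USER = 2 };

/* Hypothetical model of a device with one submission queue. */
struct gpu_model {
	enum cmd queue[MAX_CMDS];
	size_t count;
	bool init_pending;	/* deferred init work not yet run */
	bool init_fails;	/* test knob: make the init work fail */
	int init_result;	/* result of the init work once run */
};

/* Append a command to the queue. */
static int submit(struct gpu_model *gpu, enum cmd c)
{
	if (gpu->count == MAX_CMDS)
		return -ENOSPC;
	gpu->queue[gpu->count++] = c;
	return 0;
}

/* "Module load": set up the queue but only schedule the initial state. */
static void gpu_load(struct gpu_model *gpu, bool init_fails)
{
	gpu->count = 0;
	gpu->init_pending = true;
	gpu->init_fails = init_fails;
	gpu->init_result = 0;
}

/* The deferred work: queue the initial state, or record the failure. */
static void init_work_fn(struct gpu_model *gpu)
{
	gpu->init_result = gpu->init_fails ? -EIO
					   : submit(gpu, CMD_INIT_STATE);
	gpu->init_pending = false;
}

/* Make sure the init work has run; report how it went. */
static int flush_init_work(struct gpu_model *gpu)
{
	if (gpu->init_pending)
		init_work_fn(gpu);
	return gpu->init_result;
}

/* "Device open": flush first, so init commands precede userspace's. */
static int gpu_open(struct gpu_model *gpu)
{
	return flush_init_work(gpu);
}
```

In this model a caller that does gpu_load(), gpu_open() and then submit(CMD_USER) always finds CMD_INIT_STATE at the head of the queue, and a failed init is reported synchronously from gpu_open() rather than discovered later.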

Re: [Intel-gfx] [PATCH 5/8] drm/i915: simplify FBC start/stop at invalidate/flush

2015-07-01 Thread Chris Wilson

On Tue, Jun 30, 2015 at 06:12:59PM -0300, Paulo Zanoni wrote:
> 2015-06-30 11:34 GMT-03:00 Chris Wilson :
> > I presume that start/stop are the highest, and control the sw state. And
> > that enable/disable are just hw interaction. And who sets fbc.enabled?
> > start()? enable()? disable()? stop()?
> >
> > In confusion,
> 
> I understand your concerns and I agree with you that this is
> confusing. I also agree that the addition of stop() makes things even
> worse. One of the problems is that intel_fbc_update() does
> "everything": it picks the CRTC, it can enable FBC, it can disable
> FBC, it can change the CRTC, etc. So we have: update(), enable(),
> disable(), flush() and invalidate(), and the patch added stop().
> 
> I had some patches that would move us to enable/disable (high level)
> activate/deactivate (low level), flush/invalidate (wrappers for
> activate/deactivate) and update (highest level). This would make the
> naming scheme similar to PSR. I wanted to merge the locking fixes
> first, but I can put everything on the same series if you want. Or
> leave this patch out of the "locking" series and add it to the next
> series...

To keep it minimal, a quick outline comment telling me the layers and
ordering would be of use right now to review the patches, and longer
term to review the code.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
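
As a rough illustration of the kind of layer outline being asked for, the naming scheme Paulo describes (update at the top, enable/disable as the high level, activate/deactivate as the low level, flush/invalidate as wrappers around activate/deactivate) might be sketched as below. Everything here is an assumption made for illustration: struct fbc_model, its fields and the fbc_* functions are hypothetical and do not reflect the actual driver code or the patches under review.

```c
#include <stdbool.h>

/* Hypothetical model of the FBC state, split into software and hardware. */
struct fbc_model {
	bool enabled;		/* sw state: FBC is assigned to a CRTC */
	bool active;		/* hw state: compression is running */
	int crtc;		/* owning CRTC, -1 when none */
	unsigned int busy_bits;	/* frontbuffer bits currently invalidated */
};

/* Low level: touch only the (modelled) hardware state. */
static void fbc_activate(struct fbc_model *fbc)   { fbc->active = true; }
static void fbc_deactivate(struct fbc_model *fbc) { fbc->active = false; }

/* High level: own the software state, call down into the low level. */
static void fbc_enable(struct fbc_model *fbc, int crtc)
{
	fbc->enabled = true;
	fbc->crtc = crtc;
	if (!fbc->busy_bits)
		fbc_activate(fbc);
}

static void fbc_disable(struct fbc_model *fbc)
{
	fbc_deactivate(fbc);
	fbc->enabled = false;
	fbc->crtc = -1;
}

/* Wrappers: frontbuffer tracking only activates or deactivates. */
static void fbc_invalidate(struct fbc_model *fbc, unsigned int bits)
{
	fbc->busy_bits |= bits;
	if (fbc->enabled)
		fbc_deactivate(fbc);
}

static void fbc_flush(struct fbc_model *fbc, unsigned int bits)
{
	fbc->busy_bits &= ~bits;
	if (fbc->enabled && !fbc->busy_bits)
		fbc_activate(fbc);
}

/* Highest level: pick the CRTC and decide between enable and disable. */
static void fbc_update(struct fbc_model *fbc, int wanted_crtc)
{
	if (fbc->enabled && fbc->crtc != wanted_crtc)
		fbc_disable(fbc);
	if (wanted_crtc >= 0 && !fbc->enabled)
		fbc_enable(fbc, wanted_crtc);
}
```

The point of the split in this sketch is that only enable/disable write the enabled flag, which answers the "who sets fbc.enabled?" question by construction.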

Re: [Intel-gfx] [PATCH 6/8] drm/i915: add struct_mutex WARNs to i915_gem_stolen.c

2015-07-01 Thread Paulo Zanoni

2015-07-01 11:02 GMT-03:00 Chris Wilson :
> On Wed, Jul 01, 2015 at 04:00:23PM +0200, Daniel Vetter wrote:
>> On Tue, Jun 30, 2015 at 03:34:55PM +0100, Chris Wilson wrote:
>> > On Tue, Jun 30, 2015 at 10:53:10AM -0300, Paulo Zanoni wrote:
>> > > From: Paulo Zanoni 
>> > >
>> > > Let's make sure the future Paulos don't forget that we need
>> > > struct_mutex when touching dev_priv->mm.stolen.
>> >
>> > As I alluded to in patch 5, I think the stolen warns are a misstep.
>>
>> Imo switching to a separate stolen_mutex should be a separate patch, this
>> just documents the current rules. Which seems fine to me.
>
> Introducing a stolen mutex won't be a very much larger patch, and the
> current locking rules are an impediment for use elsewhere.

I wrote the stolen_mutex patches yesterday, I'll send them soon.

> -Chris
>
> --
> Chris Wilson, Intel Open Source Technology Centre



-- 
Paulo Zanoni

Re: [Intel-gfx] [PATCH 6/8] drm/i915: add struct_mutex WARNs to i915_gem_stolen.c

2015-07-01 Thread Chris Wilson

On Wed, Jul 01, 2015 at 04:00:23PM +0200, Daniel Vetter wrote:
> On Tue, Jun 30, 2015 at 03:34:55PM +0100, Chris Wilson wrote:
> > On Tue, Jun 30, 2015 at 10:53:10AM -0300, Paulo Zanoni wrote:
> > > From: Paulo Zanoni 
> > > 
> > > Let's make sure the future Paulos don't forget that we need
> > > struct_mutex when touching dev_priv->mm.stolen.
> > 
> > As I alluded to in patch 5, I think the stolen warns are a misstep.
> 
> Imo switching to a separate stolen_mutex should be a separate patch, this
> just documents the current rules. Which seems fine to me.

Introducing a stolen mutex won't be a very much larger patch, and the
current locking rules are an impediment for use elsewhere.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

Re: [Intel-gfx] [PATCH 6/8] drm/i915: add struct_mutex WARNs to i915_gem_stolen.c

2015-07-01 Thread Daniel Vetter

On Tue, Jun 30, 2015 at 03:34:55PM +0100, Chris Wilson wrote:
> On Tue, Jun 30, 2015 at 10:53:10AM -0300, Paulo Zanoni wrote:
> > From: Paulo Zanoni 
> > 
> > Let's make sure the future Paulos don't forget that we need
> > struct_mutex when touching dev_priv->mm.stolen.
> 
> As I alluded to in patch 5, I think the stolen warns are a misstep.

Imo switching to a separate stolen_mutex should be a separate patch, this
just documents the current rules. Which seems fine to me.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

Re: [Intel-gfx] [PATCH 6/8] drm/i915: add struct_mutex WARNs to i915_gem_stolen.c

2015-07-01 Thread Daniel Vetter

On Tue, Jun 30, 2015 at 01:30:27PM -0700, Jesse Barnes wrote:
> On 06/30/2015 07:36 AM, Chris Wilson wrote:
> > On Tue, Jun 30, 2015 at 11:26:11AM -0300, Paulo Zanoni wrote:
> >> 2015-06-30 11:15 GMT-03:00 Chris Wilson :
> >>> On Tue, Jun 30, 2015 at 10:53:10AM -0300, Paulo Zanoni wrote:
> >>>> From: Paulo Zanoni 
> >>>>
> >>>> Let's make sure the future Paulos don't forget that we need
> >>>> struct_mutex when touching dev_priv->mm.stolen.
> >>>>
> >>>> Signed-off-by: Paulo Zanoni 
> >>>> ---
> >>>>  drivers/gpu/drm/i915/i915_gem_stolen.c | 13 +
> >>>>  1 file changed, 13 insertions(+)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c 
> >>>> b/drivers/gpu/drm/i915/i915_gem_stolen.c
> >>>> index 793bcba..cac1bce 100644
> >>>> --- a/drivers/gpu/drm/i915/i915_gem_stolen.c
> >>>> +++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
> >>>> @@ -160,6 +160,8 @@ static int find_compression_threshold(struct 
> >>>> drm_device *dev,
> >>>>   int compression_threshold = 1;
> >>>>   int ret;
> >>>>
> >>>> + WARN_ON(!mutex_is_locked(&dev->struct_mutex));
> >>>
> >>> I'm not a huge fan of vague mutex warnings that don't even check the 
> >>> owner.
> >>> I'm especially not a fan of adding a WARN and not handling the error.
> >>
> >> But then, what exactly is your proposal? What would you like to see here?
> >>
> >> We can discard this patch if you want. But I hope you're not
> >> advocating for lockdep_assert_held(), because if I switch to lockdep,
> >> then Daniel is going to deny it again. Also, this type of WARN_ON is a
> >> common pattern on our codebase...
> > 
> > I'm just trying to convince Daniel that blindly using this pattern is
> > the wrong approach and encouraging a proliferation of unhandled WARN_ON
> > doesn't improve driver robustness.
> 
> I think they serve as useful documentation at the very least, whether in
> lockdep form, WARN form, or BUG form.  It's not really something we can
> recover from either (maybe returning early before touching data?), so...

Not grabbing a lock is generally a harmless error since real races out
there are rare with X being single-threaded and all that. Especially in
stuff called from modeset code. Hence I think just WARN_ON plus continuing
on with blissful ignorance is the best approach.

I don't like the lockdep versions personally since they don't work when lockdep
is disabled, which is pretty much always the case. Might be useful to do
an assert_mutex_held which always does the most paranoid check (i.e.
WARN_ON without lockdep, lockdep_assert_held with lockdep).
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
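
To make the two positions in this thread concrete, here is a small userspace model contrasting a "locked by anyone" check (what WARN_ON(!mutex_is_locked()) amounts to) with an owner-aware check in the spirit of lockdep_assert_held(). It is a sketch under stated assumptions: struct model_mutex, model_warn_on() and the assert_* helpers are hypothetical stand-ins, tasks are plain non-zero integers, and nothing here is kernel code or the proposed assert_mutex_held implementation.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical mutex model that records its owner; 0 means unlocked. */
struct model_mutex {
	int owner;
};

static int warn_count;	/* number of warnings emitted so far */

/* Model of WARN_ON(): log, count and carry on; returns the condition. */
static bool model_warn_on(bool cond, const char *what)
{
	if (cond) {
		fprintf(stderr, "WARNING: %s\n", what);
		warn_count++;
	}
	return cond;
}

/* Task ids must be non-zero, since 0 is reserved for "unlocked". */
static void model_lock(struct model_mutex *m, int task)
{
	m->owner = task;
}

static void model_unlock(struct model_mutex *m)
{
	m->owner = 0;
}

/* Weak check: satisfied as long as somebody holds the mutex. */
static bool assert_locked_by_anyone(const struct model_mutex *m)
{
	return !model_warn_on(m->owner == 0, "mutex not locked");
}

/* Paranoid check: satisfied only if the given task is the holder. */
static bool assert_held_by(const struct model_mutex *m, int task)
{
	return !model_warn_on(m->owner != task, "mutex not held by caller");
}
```

With task 1 holding the mutex, assert_locked_by_anyone() passes even when the caller is task 2, while assert_held_by(&m, 2) warns. That is the gap behind the objection to warnings that "don't even check the owner"; both helpers still warn and continue rather than handling the error.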

Re: [Intel-gfx] [PATCH 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno

2015-07-01 Thread Mika Kuoppala

Mika Kuoppala  writes:

> Imre Deak  writes:
>
>> By running igt/store_dword_loop_render on BXT we can hit a coherency
>> problem where the seqno written at GPU command completion time is not
>> seen by the CPU. This results in __i915_wait_request seeing the stale
>> seqno and not completing the request (not considering the lost
>> interrupt/GPU reset mechanism). I also verified that this isn't a case
>> of a lost interrupt, or that the command didn't complete somehow: when
>> the coherency issue occurred I read the seqno via an uncached GTT mapping
>> too. While the cached version of the seqno still showed the stale value
>> the one read via the uncached mapping was the correct one.
>>
>> Work around this issue by clflushing the corresponding CPU cacheline
>> following any store of the seqno and preceding any reading of it. When
>> reading it do this only when the caller expects a coherent view.
>>
>> Testcase: igt/store_dword_loop_render
>> Signed-off-by: Imre Deak 
>> ---
>>  drivers/gpu/drm/i915/intel_lrc.c| 17 +
>>  drivers/gpu/drm/i915/intel_ringbuffer.h |  7 +++
>>  2 files changed, 24 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c 
>> b/drivers/gpu/drm/i915/intel_lrc.c
>> index 9f5485d..88bc5525 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -1288,12 +1288,29 @@ static int gen8_emit_flush_render(struct 
>> intel_ringbuffer *ringbuf,
>>  
>>  static u32 gen8_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
>>  {
>> +/*
>> + * On BXT-A1 there is a coherency issue whereby the MI_STORE_DATA_IMM
>> + * storing the completed request's seqno occasionally doesn't
>> + * invalidate the CPU cache. Work around this by clflushing the
>> + * corresponding cacheline whenever the caller wants the coherency to
>> + * be guaranteed. Note that this cacheline is known to be
>> + * clean at this point, since we only write it in gen8_set_seqno(),
>> + * where we also do a clflush after the write. So this clflush in
>> + * practice becomes an invalidate operation.
>> + */
>> +if (IS_BROXTON(ring->dev) & !lazy_coherency)
>
> s/&/&& ?

s//Read The Whole Thread Before Replying

-Mika

> -Mika
>
>> +intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
>> +
>>  return intel_read_status_page(ring, I915_GEM_HWS_INDEX);
>>  }
>>  
>>  static void gen8_set_seqno(struct intel_engine_cs *ring, u32 seqno)
>>  {
>>  intel_write_status_page(ring, I915_GEM_HWS_INDEX, seqno);
>> +
>> +/* See gen8_get_seqno() explaining the reason for the clflush. */
>> +if (IS_BROXTON(ring->dev))
>> +intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
>>  }
>>  
>>  static int gen8_emit_request(struct intel_ringbuffer *ringbuf,
>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h 
>> b/drivers/gpu/drm/i915/intel_ringbuffer.h
>> index 39f6dfc..224a25b 100644
>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
>> @@ -352,6 +352,13 @@ intel_ring_sync_index(struct intel_engine_cs *ring,
>>  return idx;
>>  }
>>  
>> +static inline void
>> +intel_flush_status_page(struct intel_engine_cs *ring, int reg)
>> +{
>> +drm_clflush_virt_range(&ring->status_page.page_addr[reg],
>> +   sizeof(uint32_t));
>> +}
>> +
>>  static inline u32
>>  intel_read_status_page(struct intel_engine_cs *ring,
>> int reg)
>> -- 
>> 2.1.4
>>
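
For readers wondering why the s/&/&&/ comment matters, the sketch below shows how a bitwise & between a truthy value and a logical negation can silently evaluate to 0. It assumes a hypothetical predicate is_platform_raw() that returns a raw flag bit instead of 0 or 1; whether IS_BROXTON() behaves that way is not stated in this thread, so treat this purely as an illustration of the C semantics, not as a claim about the macro.

```c
#include <stdbool.h>

/* Hypothetical predicate returning a raw flag bit (0 or 0x4), not 0/1. */
static int is_platform_raw(unsigned int flags)
{
	return flags & 0x4;
}

/* Bitwise AND: 0x4 & 1 == 0, so the flush is skipped by mistake. */
static bool needs_flush_bitwise(unsigned int flags, bool lazy_coherency)
{
	return is_platform_raw(flags) & !lazy_coherency;
}

/* Logical AND: any non-zero left operand counts as true. */
static bool needs_flush_logical(unsigned int flags, bool lazy_coherency)
{
	return is_platform_raw(flags) && !lazy_coherency;
}
```

The two variants agree whenever the left operand is already normalised to 0 or 1, which is why the bitwise form can appear to work; && does not depend on that.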
