Re: [Intel-gfx] [PATCH] drm/i915: Wait for PP cycle delay only if panel is in power off sequence

2015-12-11 Thread Thulasimani, Sivakumar



On 12/10/2015 8:32 PM, Ville Syrjälä wrote:

On Thu, Dec 10, 2015 at 08:09:01PM +0530, Thulasimani, Sivakumar wrote:


On 12/10/2015 7:08 PM, Ville Syrjälä wrote:

On Thu, Dec 10, 2015 at 03:15:37PM +0200, Ville Syrjälä wrote:

On Thu, Dec 10, 2015 at 03:01:02PM +0530, Kumar, Shobhit wrote:

On 12/09/2015 09:35 PM, Ville Syrjälä wrote:

On Wed, Dec 09, 2015 at 08:59:26PM +0530, Shobhit Kumar wrote:

On Wed, Dec 9, 2015 at 8:34 PM, Chris Wilson  wrote:

On Wed, Dec 09, 2015 at 08:07:10PM +0530, Shobhit Kumar wrote:

On Wed, Dec 9, 2015 at 7:27 PM, Ville Syrjälä
 wrote:

On Wed, Dec 09, 2015 at 06:51:48PM +0530, Shobhit Kumar wrote:

During resume, while turning the EDP panel power on, we need not wait
blindly for panel_power_cycle_delay. Check if a panel power down sequence
is in progress and only then wait. This improves our resume time significantly.

Signed-off-by: Shobhit Kumar 
---
drivers/gpu/drm/i915/intel_dp.c | 17 -
1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
index f335c92..10ec669 100644
--- a/drivers/gpu/drm/i915/intel_dp.c
+++ b/drivers/gpu/drm/i915/intel_dp.c
@@ -617,6 +617,20 @@ static bool edp_have_panel_power(struct intel_dp *intel_dp)
 return (I915_READ(_pp_stat_reg(intel_dp)) & PP_ON) != 0;
}

+static bool edp_panel_off_seq(struct intel_dp *intel_dp)
+{
+ struct drm_device *dev = intel_dp_to_dev(intel_dp);
+ struct drm_i915_private *dev_priv = dev->dev_private;
+
+ lockdep_assert_held(&dev_priv->pps_mutex);
+
+ if (IS_VALLEYVIEW(dev) &&
+ intel_dp->pps_pipe == INVALID_PIPE)
+ return false;
+
+ return (I915_READ(_pp_stat_reg(intel_dp)) & PP_SEQUENCE_POWER_DOWN) != 0;
+}

This doesn't make sense to me. The power down cycle may have
completed just before, and so this would claim we don't have to
wait for the power_cycle_delay.

Not sure I understand your concern correctly. You are right, power
down cycle may have completed just before and if it has then we don't
need to wait. But in case the power down cycle is in progress as per
internal state, then we need to wait for it to complete. This will
happen for example in non-suspend disable path and will be handled
correctly. In case of actual suspend/resume, this would have
successfully completed and will skip the wait as it is not needed
before enabling panel power.


+
static bool edp_have_panel_vdd(struct intel_dp *intel_dp)
{
 struct drm_device *dev = intel_dp_to_dev(intel_dp);
@@ -2025,7 +2039,8 @@ static void edp_panel_on(struct intel_dp *intel_dp)
  port_name(dp_to_dig_port(intel_dp)->port)))
 return;

- wait_panel_power_cycle(intel_dp);
+ if (edp_panel_off_seq(intel_dp))
+ wait_panel_power_cycle(intel_dp);

Looking in from the side, I have no idea what this is meant to do. At
the very least you need your explanatory paragraph here which would
include what exactly you are waiting for at the start of edp_panel_on
(and please try and find a better name for edp_panel_off_seq()).

I will add a comment. Basically I am not additionally waiting, but
converting the wait which was already there to a conditional wait. The
edp_panel_off_seq() checks if the panel power down sequence is in progress.
In that case we need to wait for the panel power cycle delay. If it is
not in that sequence, there is no need to wait. I will make an attempt
again on the naming in next patch update.

As far as I remember you need to wait for power_cycle_delay between the power
down cycle and the power up cycle. You're trying to throw that wait away
entirely, unless the function happens to get called while the power down

Yes, you are right, and I realize I made a mistake in my patch: it is
not checking the PP_CYCLE_DELAY_ACTIVE bit.


cycle is still in progress. We should already optimize away redundant
waits by tracking the end of the power down cycle with the jiffies
tracking.

Actually looking at the code the power_cycle_delay gets counted from
the start of the last power down cycle, so supposedly it's always at
least as long as the power down cycle, and typically it's quite a bit
longer than that. But that doesn't change the fact that you can't
just skip it because the power down cycle delay happened to end
already.

So what we do now is:
1. initiate power down cycle
2. last_power_cycle=jiffies
3. wait for power down (I suppose this actually waits
  until the power down delay has passed since that's
  programmed into the PPS).
4. wait for power_cycle_delay from last_power_cycle
5. initiate power up cycle

I think with your patch step 4 would always be skipped since the
power down cycle has already ended, and then we fail to honor the
power cycle delay.

Yes, I agree. I missed checking for PP_CYCLE_DELAY_ACTIVE. Adding that
check will take care of this scenario I guess ?

Nope. The 
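For reference, the check being discussed would look something like the sketch
below (hypothetical name; PP_SEQUENCE_POWER_DOWN and PP_CYCLE_DELAY_ACTIVE are
the PPS status bits from i915_reg.h, and Ville's reply above suggests even this
may not be the whole story):

static bool edp_panel_power_cycle_pending(struct intel_dp *intel_dp)
{
	struct drm_device *dev = intel_dp_to_dev(intel_dp);
	struct drm_i915_private *dev_priv = dev->dev_private;
	u32 status;

	lockdep_assert_held(&dev_priv->pps_mutex);

	status = I915_READ(_pp_stat_reg(intel_dp));

	/* Wait only while the PPS is still running either the power down
	 * sequence or the subsequent power cycle delay. */
	return (status & (PP_SEQUENCE_POWER_DOWN | PP_CYCLE_DELAY_ACTIVE)) != 0;
}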

Re: [Intel-gfx] [PATCH 2/6] drm/i915: Support for creating Stolen memory backed objects

2015-12-11 Thread Tvrtko Ursulin


On 11/12/15 11:22, Ankitprasad Sharma wrote:

On Wed, 2015-12-09 at 14:06 +, Tvrtko Ursulin wrote:

Hi,

On 09/12/15 12:46, ankitprasad.r.sha...@intel.com wrote:

From: Ankitprasad Sharma 

Extend the drm_i915_gem_create structure to add support for
creating Stolen memory backed objects. Added a new flag through
which the user can specify a preference to allocate the object from
stolen memory; if set, an attempt will be made to allocate the
object from stolen memory, subject to the availability of free
space in the stolen region.

v2: Rebased to the latest drm-intel-nightly (Ankit)

v3: Changed versioning of GEM_CREATE param, added new comments (Tvrtko)

v4: Changed size from 32b to 64b to prevent userspace overflow (Tvrtko)
Corrected function arguments ordering (Chris)

v5: Corrected function name (Chris)

Testcase: igt/gem_stolen

Signed-off-by: Ankitprasad Sharma 
Reviewed-by: Tvrtko Ursulin 
---
   drivers/gpu/drm/i915/i915_dma.c|  3 +++
   drivers/gpu/drm/i915/i915_drv.h|  2 +-
   drivers/gpu/drm/i915/i915_gem.c| 30 +++---
   drivers/gpu/drm/i915/i915_gem_stolen.c |  4 ++--
   include/uapi/drm/i915_drm.h| 16 
   5 files changed, 49 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index ffcb9c6..6927c7e 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -170,6 +170,9 @@ static int i915_getparam(struct drm_device *dev, void *data,
case I915_PARAM_HAS_RESOURCE_STREAMER:
value = HAS_RESOURCE_STREAMER(dev);
break;
+   case I915_PARAM_CREATE_VERSION:
+   value = 2;
+   break;
default:
DRM_DEBUG("Unknown parameter %d\n", param->param);
return -EINVAL;
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 8e554d3..d45274e 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3213,7 +3213,7 @@ void i915_gem_stolen_remove_node(struct drm_i915_private *dev_priv,
   int i915_gem_init_stolen(struct drm_device *dev);
   void i915_gem_cleanup_stolen(struct drm_device *dev);
   struct drm_i915_gem_object *
-i915_gem_object_create_stolen(struct drm_device *dev, u32 size);
+i915_gem_object_create_stolen(struct drm_device *dev, u64 size);
   struct drm_i915_gem_object *
   i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
   u32 stolen_offset,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d57e850..296e63f 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -375,6 +375,7 @@ static int
   i915_gem_create(struct drm_file *file,
struct drm_device *dev,
uint64_t size,
+   uint32_t flags,
uint32_t *handle_p)
   {
struct drm_i915_gem_object *obj;
@@ -385,8 +386,31 @@ i915_gem_create(struct drm_file *file,
if (size == 0)
return -EINVAL;

+   if (flags & __I915_CREATE_UNKNOWN_FLAGS)
+   return -EINVAL;
+
/* Allocate the new object */
-   obj = i915_gem_alloc_object(dev, size);
+   if (flags & I915_CREATE_PLACEMENT_STOLEN) {
+   mutex_lock(&dev->struct_mutex);
+   obj = i915_gem_object_create_stolen(dev, size);
+   if (!obj) {
+   mutex_unlock(&dev->struct_mutex);
+   return -ENOMEM;
+   }
+
+   /* Always clear fresh buffers before handing to userspace */
+   ret = i915_gem_object_clear(obj);
+   if (ret) {
+   drm_gem_object_unreference(&obj->base);
+   mutex_unlock(&dev->struct_mutex);
+   return ret;
+   }
+
+   mutex_unlock(&dev->struct_mutex);
+   } else {
+   obj = i915_gem_alloc_object(dev, size);
+   }
+
if (obj == NULL)
return -ENOMEM;

@@ -409,7 +433,7 @@ i915_gem_dumb_create(struct drm_file *file,
args->pitch = ALIGN(args->width * DIV_ROUND_UP(args->bpp, 8), 64);
args->size = args->pitch * args->height;
return i915_gem_create(file, dev,
-  args->size, &args->handle);
+  args->size, 0, &args->handle);
   }

   /**
@@ -422,7 +446,7 @@ i915_gem_create_ioctl(struct drm_device *dev, void *data,
struct drm_i915_gem_create *args = data;

return i915_gem_create(file, dev,
-  args->size, &args->handle);
+  args->size, args->flags, &args->handle);
   }

   static inline int
diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
index 
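For context, userspace would exercise the new flag roughly as in the sketch
below (this assumes the extended drm_i915_gem_create layout and the
I915_CREATE_PLACEMENT_STOLEN flag added by this series; the fallback policy is
illustrative, not part of the patch):

	struct drm_i915_gem_create create = {
		.size = 4096,
		.flags = I915_CREATE_PLACEMENT_STOLEN,
	};

	if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CREATE, &create)) {
		/* stolen space exhausted (or an old kernel): fall back
		 * to a regular shmem-backed object */
		create.flags = 0;
		if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CREATE, &create))
			return -errno;
	}
	/* create.handle now names the new object */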

Re: [Intel-gfx] [PATCH] Always mark GEM objects as dirty when written by the CPU

2015-12-11 Thread Chris Wilson
On Fri, Dec 11, 2015 at 12:19:09PM +, Dave Gordon wrote:
> On 10/12/15 08:58, Daniel Vetter wrote:
> >On Mon, Dec 07, 2015 at 12:51:49PM +, Dave Gordon wrote:
> >>I think I missed i915_gem_phys_pwrite().
> >>
> >>i915_gem_gtt_pwrite_fast() marks the object dirty for most cases (via
> >>set_to_gtt_domain()), but isn't called for all cases (or can return before
> >>the set_domain). Then we try i915_gem_shmem_pwrite() for non-phys
> >>objects (no check for stolen!) and that already marks the object dirty
> >>[aside: we might be able to change that to page-by-page?], but
> >>i915_gem_phys_pwrite() doesn't mark the object dirty, so we might lose
> >>updates there?
> >>
> >>Or maybe we should move the marking up into i915_gem_pwrite_ioctl() instead.
> >>The target object is surely going to be dirtied, whatever type it is.
> >
> >phys objects are special, and when binding we allocate new
> >(contiguous) storage. In put_pages_phys that gets copied back and pages
> >marked as dirty. While a phys object is pinned it's a kernel bug to look
> >at the shmem pages and a userspace bug to touch the cpu mmap (since that
> >data will simply be overwritten whenever the kernel feels like).
> >
> >phys objects are only used for cursors on old crap though, so ok if we
> >don't streamline this fairly quirky old ABI.
> >-Daniel
> 
> So is pread broken already for 'phys' ?

Yes. A completely unused corner of the API.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


Re: [Intel-gfx] [PATCH v2] PM / Runtime: Introduce pm_runtime_get_noidle

2015-12-11 Thread Dave Gordon

On 10/12/15 22:14, Rafael J. Wysocki wrote:

On Thursday, December 10, 2015 11:20:40 PM Imre Deak wrote:

On Thu, 2015-12-10 at 22:42 +0100, Rafael J. Wysocki wrote:

On Thursday, December 10, 2015 10:36:37 PM Rafael J. Wysocki wrote:

On Thursday, December 10, 2015 11:43:50 AM Imre Deak wrote:

On Thu, 2015-12-10 at 01:58 +0100, Rafael J. Wysocki wrote:

On Wednesday, December 09, 2015 06:22:19 PM Joonas Lahtinen
wrote:

Introduce pm_runtime_get_noidle for situations where it is not
desirable to touch an idling device. One use scenario is periodic
hangchecks performed by the drm/i915 driver which can be omitted
on a device in a runtime idle state.

v2:
- Fix inconsistent return value when !CONFIG_PM.
- Update documentation for bool return value

Signed-off-by: Joonas Lahtinen 
Reported-by: Chris Wilson 
Cc: Chris Wilson 
Cc: "Rafael J. Wysocki" 
Cc: linux...@vger.kernel.org


Well, I don't quite see how this can be used in a non-racy way
without doing an additional pm_runtime_resume() or something like
that in the same code path.


We don't want to resume, that would be the whole point. We'd like to
ensure that we hold a reference _and_ the device is already active. So
AFAICS we'd need to check runtime_status == RPM_ACTIVE in addition
after taking the reference.


Right, and that under the lock.


Which basically means you can call pm_runtime_resume() just fine,
because it will do nothing if the status is RPM_ACTIVE already.

So really, why don't you use pm_runtime_get_sync()?


The difference would be that if the status is not RPM_ACTIVE already we
would drop the reference and report an error. The caller would in this
case forgo doing something, since the device is suspended or on the way
to being suspended. One example of such a scenario is a watchdog-like
functionality: the watchdog work would call pm_runtime_get_noidle() and
check if the device is ok by doing some HW access, but only if the
device is powered. Otherwise the work item would do nothing (meaning it
also won't reschedule itself). The watchdog work would get rescheduled
next time the device is woken up and some work is submitted to the
device.


So first of all the name "pm_runtime_get_noidle" doesn't make sense.


How about pm_runtime_get_unless_idle(), which would be analogous to 
kref_get_unless_zero() ?


.Dave.


I guess what you need is something like

bool pm_runtime_get_if_active(struct device *dev)
{
	unsigned long flags;
	bool ret;

	spin_lock_irqsave(&dev->power.lock, flags);

	if (dev->power.runtime_status == RPM_ACTIVE) {
		atomic_inc(&dev->power.usage_count);
		ret = true;
	} else {
		ret = false;
	}

	spin_unlock_irqrestore(&dev->power.lock, flags);

	return ret;
}

and the caller will simply bail out if "false" is returned, but if "true"
is returned, it will have to drop the usage count, right?

Thanks,
Rafael
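For Imre's watchdog scenario the caller side would then look roughly like the
sketch below (names hypothetical, built on the function above; the requeue
policy mirrors what Imre describes):

static void i915_hangcheck_work(struct work_struct *work)
{
	struct drm_i915_private *dev_priv =
		container_of(work, typeof(*dev_priv),
			     gpu_error.hangcheck_work.work);
	struct device *dev = dev_priv->dev->dev;

	if (!pm_runtime_get_if_active(dev))
		return; /* suspended or suspending: skip the HW check and
			 * don't reschedule; the next wake-up requeues us */

	/* ... poke the hardware and check for hangs ... */

	pm_runtime_put(dev);
	i915_queue_hangcheck(dev_priv);
}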




[Intel-gfx] [PULL] drm-intel-fixes

2015-12-11 Thread Jani Nikula

Hi Dave -

Here are some i915 fixes for v4.4, sorry for being late this week.

BR,
Jani.

The following changes since commit 527e9316f8ec44bd53d90fb9f611fa752bb9:

  Linux 4.4-rc4 (2015-12-06 15:43:12 -0800)

are available in the git repository at:

  git://anongit.freedesktop.org/drm-intel tags/drm-intel-fixes-2015-12-11

for you to fetch changes up to 634b3a4a476e96816d5d6cd5bb9f8900a53f56ba:

  drm/i915: Do a better job at disabling primary plane in the noatomic case. 
(2015-12-10 13:33:42 +0200)


Maarten Lankhorst (1):
  drm/i915: Do a better job at disabling primary plane in the noatomic case.

Mika Kuoppala (2):
  drm/i915/skl: Disable coarse power gating up until F0
  drm/i915/skl: Double RC6 WRL always on

Tvrtko Ursulin (1):
  drm/i915: Remove incorrect warning in context cleanup

 drivers/gpu/drm/i915/i915_gem_context.c | 2 --
 drivers/gpu/drm/i915/intel_display.c    | 4 +++-
 drivers/gpu/drm/i915/intel_pm.c         | 5 ++---
 3 files changed, 5 insertions(+), 6 deletions(-)

-- 
Jani Nikula, Intel Open Source Technology Center


[Intel-gfx] [PATCH 17/32] drm/i915: Remove the lazy_coherency parameter from request-completed?

2015-12-11 Thread Chris Wilson
Now that we have split out the seqno-barrier from the
engine->get_seqno() callback itself, we can move the users of the
seqno-barrier to the required callsites simplifying the common code and
making the required workaround handling much more explicit.

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/i915_debugfs.c  |  4 ++--
 drivers/gpu/drm/i915/i915_drv.h  | 10 ++
 drivers/gpu/drm/i915/i915_gem.c  | 24 +++-
 drivers/gpu/drm/i915/intel_display.c |  2 +-
 drivers/gpu/drm/i915/intel_pm.c  |  4 ++--
 5 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 6344fe69ab82..8860dec36aae 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -601,7 +601,7 @@ static int i915_gem_pageflip_info(struct seq_file *m, void *data)
 			   i915_gem_request_get_seqno(work->flip_queued_req),
 			   dev_priv->next_seqno,
 			   ring->get_seqno(ring),
-			   i915_gem_request_completed(work->flip_queued_req, true));
+			   i915_gem_request_completed(work->flip_queued_req));
 		} else
 			seq_printf(m, "Flip not associated with any ring\n");
 		seq_printf(m, "Flip queued on frame %d, (was ready on frame %d), now %d\n",
@@ -1353,8 +1353,8 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
intel_runtime_pm_get(dev_priv);
 
for_each_ring(ring, dev_priv, i) {
-   seqno[i] = ring->get_seqno(ring);
acthd[i] = intel_ring_get_active_head(ring);
+   seqno[i] = ring->get_seqno(ring);
}
 
intel_runtime_pm_put(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index ff83f148658f..d099e960f9b8 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2978,20 +2978,14 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2)
return (int32_t)(seq1 - seq2) >= 0;
 }
 
-static inline bool i915_gem_request_started(struct drm_i915_gem_request *req,
-  bool lazy_coherency)
+static inline bool i915_gem_request_started(struct drm_i915_gem_request *req)
 {
-   if (!lazy_coherency && req->ring->seqno_barrier)
-   req->ring->seqno_barrier(req->ring);
return i915_seqno_passed(req->ring->get_seqno(req->ring),
 req->previous_seqno);
 }
 
-static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
- bool lazy_coherency)
+static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
 {
-   if (!lazy_coherency && req->ring->seqno_barrier)
-   req->ring->seqno_barrier(req->ring);
return i915_seqno_passed(req->ring->get_seqno(req->ring),
 req->seqno);
 }
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index fa0cf6c9f4d0..f3c1e268f614 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1173,12 +1173,12 @@ static bool __i915_spin_request(struct drm_i915_gem_request *req,
 */
 
/* Only spin if we know the GPU is processing this request */
-   if (!i915_gem_request_started(req, true))
+   if (!i915_gem_request_started(req))
return false;
 
timeout = local_clock_us() + 5;
do {
-   if (i915_gem_request_completed(req, true))
+   if (i915_gem_request_completed(req))
return true;
 
if (signal_pending_state(state, wait->task))
@@ -1230,7 +1230,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
	if (list_empty(&req->list))
return 0;
 
-   if (i915_gem_request_completed(req, true))
+   if (i915_gem_request_completed(req))
return 0;
 
timeout_remain = MAX_SCHEDULE_TIMEOUT;
@@ -1299,7 +1299,10 @@ wakeup:  set_task_state(wait.task, state);
 * but it is easier and safer to do it every time the waiter
 * is woken.
 */
-   if (i915_gem_request_completed(req, false))
+   if (req->ring->seqno_barrier)
+   req->ring->seqno_barrier(req->ring);
+
+   if (i915_gem_request_completed(req))
break;
 
/* We need to check whether any gpu reset happened in between
@@ -2731,8 +2734,11 @@ i915_gem_find_active_request(struct intel_engine_cs *ring)
 {
struct drm_i915_gem_request *request;
 
+   if (ring->seqno_barrier)
+   

[Intel-gfx] [PATCH 18/32] drm/i915: Use HWS for seqno tracking everywhere

2015-12-11 Thread Chris Wilson
By using the same address for storing the HWS on every platform, we can
remove the platform specific vfuncs and reduce the get-seqno routine to
a single read of a cached memory location.
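Concretely, the per-ring get_seqno() vfuncs presumably collapse into a single
helper along the lines of this sketch (inferred from the diff below;
intel_read_status_page() and I915_GEM_HWS_INDEX are the existing helpers from
intel_ringbuffer.h):

static inline u32 intel_ring_get_seqno(struct intel_engine_cs *ring)
{
	return intel_read_status_page(ring, I915_GEM_HWS_INDEX);
}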

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/i915_debugfs.c  | 10 ++--
 drivers/gpu/drm/i915/i915_drv.h  |  4 +-
 drivers/gpu/drm/i915/i915_gpu_error.c|  2 +-
 drivers/gpu/drm/i915/i915_irq.c  |  4 +-
 drivers/gpu/drm/i915/i915_trace.h|  2 +-
 drivers/gpu/drm/i915/intel_breadcrumbs.c |  4 +-
 drivers/gpu/drm/i915/intel_lrc.c | 46 ++---
 drivers/gpu/drm/i915/intel_ringbuffer.c  | 86 
 drivers/gpu/drm/i915/intel_ringbuffer.h  |  7 +--
 9 files changed, 43 insertions(+), 122 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 8860dec36aae..a03ed9e38499 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -600,7 +600,7 @@ static int i915_gem_pageflip_info(struct seq_file *m, void *data)
 			   ring->name,
 			   i915_gem_request_get_seqno(work->flip_queued_req),
 			   dev_priv->next_seqno,
-			   ring->get_seqno(ring),
+			   intel_ring_get_seqno(ring),
 			   i915_gem_request_completed(work->flip_queued_req));
 	} else
 		seq_printf(m, "Flip not associated with any ring\n");
@@ -732,10 +732,8 @@ static void i915_ring_seqno_info(struct seq_file *m,
 {
struct rb_node *rb;
 
-   if (ring->get_seqno) {
-   seq_printf(m, "Current sequence (%s): %x\n",
-  ring->name, ring->get_seqno(ring));
-   }
+   seq_printf(m, "Current sequence (%s): %x\n",
+  ring->name, intel_ring_get_seqno(ring));
 
	spin_lock(&ring->breadcrumbs.lock);
	for (rb = rb_first(&ring->breadcrumbs.requests);
@@ -1354,7 +1352,7 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
 
for_each_ring(ring, dev_priv, i) {
acthd[i] = intel_ring_get_active_head(ring);
-   seqno[i] = ring->get_seqno(ring);
+   seqno[i] = intel_ring_get_seqno(ring);
}
 
intel_runtime_pm_put(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index d099e960f9b8..37f4ef59fb4a 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2980,13 +2980,13 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2)
 
 static inline bool i915_gem_request_started(struct drm_i915_gem_request *req)
 {
-   return i915_seqno_passed(req->ring->get_seqno(req->ring),
+   return i915_seqno_passed(intel_ring_get_seqno(req->ring),
 req->previous_seqno);
 }
 
 static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
 {
-   return i915_seqno_passed(req->ring->get_seqno(req->ring),
+   return i915_seqno_passed(intel_ring_get_seqno(req->ring),
 req->seqno);
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 01d0206ca4dd..3e137fc701cf 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -903,7 +903,7 @@ static void i915_record_ring_state(struct drm_device *dev,
ering->waiting = intel_engine_has_waiter(ring);
ering->instpm = I915_READ(RING_INSTPM(ring->mmio_base));
ering->acthd = intel_ring_get_active_head(ring);
-   ering->seqno = ring->get_seqno(ring);
+   ering->seqno = intel_ring_get_seqno(ring);
ering->start = I915_READ_START(ring);
ering->head = I915_READ_HEAD(ring);
ering->tail = I915_READ_TAIL(ring);
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index da3c8aaa50a3..64502c0d2a81 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2875,7 +2875,7 @@ static int semaphore_passed(struct intel_engine_cs *ring)
if (signaller->hangcheck.deadlock >= I915_NUM_RINGS)
return -1;
 
-   if (i915_seqno_passed(signaller->get_seqno(signaller), seqno))
+   if (i915_seqno_passed(intel_ring_get_seqno(signaller), seqno))
return 1;
 
/* cursory check for an unkickable deadlock */
@@ -2979,7 +2979,7 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
semaphore_clear_deadlocks(dev_priv);
 
acthd = intel_ring_get_active_head(ring);
-   seqno = ring->get_seqno(ring);
+   seqno = intel_ring_get_seqno(ring);
 
if (ring->hangcheck.seqno == seqno) {
if (ring_idle(ring, 

[Intel-gfx] Slaughter the thundering i915_wait_request, v3?

2015-12-11 Thread Chris Wilson
The biggest change is the revised bottom-half for handling user
interrupts (now we use the waiter on the oldest request as the
bottom-half). That, and the review feedback from Daniel on handling resets
(and hangcheck) during the wait. Oh, and some interrupt/seqno timing review.

Available from
http://cgit.freedesktop.org/~ickle/linux-2.6/log/?h=breadcrumbs
-Chris



[Intel-gfx] [PATCH 01/32] drm/i915: Break busywaiting for requests on pending signals

2015-12-11 Thread Chris Wilson
The busywait in __i915_spin_request() does not respect pending signals
and so may consume the entire timeslice for the task instead of
returning to userspace to handle the signal.

In the worst case this could cause a delay in signal processing of 20ms,
which would be a noticeable jitter in cursor tracking. If a higher
resolution signal was being used, for example to provide fairness of a
server's timeslices between clients, we could expect to detect some
unfairness between clients (i.e. some windows not updating as fast as
others). This issue was noticed when inspecting a report of poor
interactivity resulting from excessively high __i915_spin_request usage.

Fixes regression from
commit 2def4ad99befa25775dd2f714fdd4d92faec6e34 [v4.2]
Author: Chris Wilson 
Date:   Tue Apr 7 16:20:41 2015 +0100

 drm/i915: Optimistically spin for the request completion

v2: Try to assess the impact of the bug

Signed-off-by: Chris Wilson 
Reviewed-by: Tvrtko Ursulin 
Cc: Jens Axboe 
Cc: "Rogozhkin, Dmitry V" 
Cc: Daniel Vetter 
Cc: Tvrtko Ursulin 
Cc: Eero Tamminen 
Cc: "Rantala, Valtteri" 
Cc: sta...@vger.kernel.org
---
 drivers/gpu/drm/i915/i915_gem.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 8e2acdebc74a..7e1246410afc 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1146,7 +1146,7 @@ static bool missed_irq(struct drm_i915_private *dev_priv,
	return test_bit(ring->id, &dev_priv->gpu_error.missed_irq_rings);
 }
 
-static int __i915_spin_request(struct drm_i915_gem_request *req)
+static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
 {
unsigned long timeout;
 
@@ -1158,6 +1158,9 @@ static int __i915_spin_request(struct drm_i915_gem_request *req)
if (i915_gem_request_completed(req, true))
return 0;
 
+   if (signal_pending_state(state, current))
+   break;
+
if (time_after_eq(jiffies, timeout))
break;
 
@@ -1197,6 +1200,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
struct drm_i915_private *dev_priv = dev->dev_private;
const bool irq_test_in_progress =
		ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) & intel_ring_flag(ring);
+   int state = interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
DEFINE_WAIT(wait);
unsigned long timeout_expire;
s64 before, now;
@@ -1229,7 +1233,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
before = ktime_get_raw_ns();
 
/* Optimistic spin for the next jiffie before touching IRQs */
-   ret = __i915_spin_request(req);
+   ret = __i915_spin_request(req, state);
if (ret == 0)
goto out;
 
@@ -1241,8 +1245,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
for (;;) {
struct timer_list timer;
 
-   prepare_to_wait(&ring->irq_queue, &wait,
-   interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE);
+   prepare_to_wait(&ring->irq_queue, &wait, state);
 
/* We need to check whether any gpu reset happened in between
 * the caller grabbing the seqno and now ... */
@@ -1260,7 +1263,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
break;
}
 
-   if (interruptible && signal_pending(current)) {
+   if (signal_pending_state(state, current)) {
ret = -ERESTARTSYS;
break;
}
-- 
2.6.3



[Intel-gfx] [PATCH 12/32] drm/i915: Remove the dedicated hangcheck workqueue

2015-12-11 Thread Chris Wilson
The queue only ever contains at most one item and has no special flags.
It is just a very simple wrapper around the system-wq - a complication
with no benefits.

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/i915_dma.c | 11 ---
 drivers/gpu/drm/i915/i915_drv.h |  1 -
 drivers/gpu/drm/i915/i915_irq.c |  6 +++---
 3 files changed, 3 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 84e2b202ecb5..1fdb52048cea 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1013,14 +1013,6 @@ int i915_driver_load(struct drm_device *dev, unsigned long flags)
goto out_freewq;
}
 
-   dev_priv->gpu_error.hangcheck_wq =
-   alloc_ordered_workqueue("i915-hangcheck", 0);
-   if (dev_priv->gpu_error.hangcheck_wq == NULL) {
-   DRM_ERROR("Failed to create our hangcheck workqueue.\n");
-   ret = -ENOMEM;
-   goto out_freedpwq;
-   }
-
intel_irq_init(dev_priv);
intel_uncore_sanitize(dev);
 
@@ -1100,8 +1092,6 @@ out_gem_unload:
intel_teardown_gmbus(dev);
intel_teardown_mchbar(dev);
	pm_qos_remove_request(&dev_priv->pm_qos);
-   destroy_workqueue(dev_priv->gpu_error.hangcheck_wq);
-out_freedpwq:
destroy_workqueue(dev_priv->hotplug.dp_wq);
 out_freewq:
destroy_workqueue(dev_priv->wq);
@@ -1201,7 +1191,6 @@ int i915_driver_unload(struct drm_device *dev)
 
destroy_workqueue(dev_priv->hotplug.dp_wq);
destroy_workqueue(dev_priv->wq);
-   destroy_workqueue(dev_priv->gpu_error.hangcheck_wq);
	pm_qos_remove_request(&dev_priv->pm_qos);
 
i915_global_gtt_cleanup(dev);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 987a35c5af72..9304ecfa05d4 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1333,7 +1333,6 @@ struct i915_gpu_error {
/* Hang gpu twice in this window and your context gets banned */
 #define DRM_I915_CTX_BAN_PERIOD DIV_ROUND_UP(8*DRM_I915_HANGCHECK_PERIOD, 1000)
 
-   struct workqueue_struct *hangcheck_wq;
struct delayed_work hangcheck_work;
 
/* For reset and error_state handling. */
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 21089ac5dd58..afe04aeb858d 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -3073,7 +3073,7 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 
 void i915_queue_hangcheck(struct drm_i915_private *dev_priv)
 {
-   struct i915_gpu_error *e = &dev_priv->gpu_error;
+   unsigned long delay;
 
if (!i915.enable_hangcheck)
return;
@@ -3083,8 +3083,8 @@ void i915_queue_hangcheck(struct drm_i915_private *dev_priv)
 * we will ignore a hung ring if a second ring is kept busy.
 */
 
-   queue_delayed_work(e->hangcheck_wq, &e->hangcheck_work,
-  round_jiffies_up_relative(DRM_I915_HANGCHECK_JIFFIES));
+   delay = round_jiffies_up_relative(DRM_I915_HANGCHECK_JIFFIES);
+   schedule_delayed_work(&dev_priv->gpu_error.hangcheck_work, delay);
 }
 
 static void ibx_irq_reset(struct drm_device *dev)
-- 
2.6.3



[Intel-gfx] [PATCH 19/32] drm/i915: Check the CPU cached value of seqno after waking the waiter

2015-12-11 Thread Chris Wilson
If we have multiple waiters, we may find that many complete on the same
wake up. If we first inspect the seqno from the CPU cache, we may reduce
the number of heavyweight coherent seqno reads we require.

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/i915_gem.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index f3c1e268f614..15495b8112f9 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1288,6 +1288,12 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 
wakeup:	set_task_state(wait.task, state);
 
+   /* Before we do the heavier coherent read of the seqno,
+* check the value (hopefully) in the CPU cacheline.
+*/
+   if (i915_gem_request_completed(req))
+   break;
+
/* Ensure our read of the seqno is coherent so that we
 * do not "miss an interrupt" (i.e. if this is the last
 * request and the seqno write from the GPU is not visible
@@ -1299,11 +1305,11 @@ wakeup: set_task_state(wait.task, state);
 * but it is easier and safer to do it every time the waiter
 * is woken.
 */
-   if (req->ring->seqno_barrier)
+   if (req->ring->seqno_barrier) {
req->ring->seqno_barrier(req->ring);
-
-   if (i915_gem_request_completed(req))
-   break;
+   if (i915_gem_request_completed(req))
+   break;
+   }
 
/* We need to check whether any gpu reset happened in between
 * the request being submitted and now. If a reset has occurred,
-- 
2.6.3



[Intel-gfx] [PATCH 07/32] drm/i915: Store the reset counter when constructing a request

2015-12-11 Thread Chris Wilson
As the request is only valid during the same global reset epoch, we can
record the current reset_counter when constructing the request and reuse
it when waiting upon that request in future. This removes a very hairy
atomic check serialised by the struct_mutex at the time of waiting and
allows us to transfer those waits to a central dispatcher for all
waiters and all requests.
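The construction side then reduces to a one-time snapshot when the request is
created, roughly (sketch; the exact spot in the request-allocation path is an
assumption):

	req->reset_counter = i915_reset_counter(&dev_priv->gpu_error);

after which every waiter compares req->reset_counter against the current
counter, as the diff below does.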

Signed-off-by: Chris Wilson 
Cc: Daniel Vetter 
---
 drivers/gpu/drm/i915/i915_drv.h |  2 +-
 drivers/gpu/drm/i915/i915_gem.c | 40 +++--
 drivers/gpu/drm/i915/intel_display.c|  7 +-
 drivers/gpu/drm/i915/intel_lrc.c|  7 --
 drivers/gpu/drm/i915/intel_ringbuffer.c |  6 -
 5 files changed, 15 insertions(+), 47 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 1043ddd670a5..f30c305a6889 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2178,6 +2178,7 @@ struct drm_i915_gem_request {
/** On Which ring this request was generated */
struct drm_i915_private *i915;
struct intel_engine_cs *ring;
+   unsigned reset_counter;
 
 /** GEM sequence number associated with the previous request,
  * when the HWS breadcrumb is equal to this the GPU is processing
@@ -3059,7 +3060,6 @@ void __i915_add_request(struct drm_i915_gem_request *req,
 #define i915_add_request_no_flush(req) \
__i915_add_request(req, NULL, false)
 int __i915_wait_request(struct drm_i915_gem_request *req,
-   unsigned reset_counter,
bool interruptible,
s64 *timeout,
struct intel_rps_client *rps);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 27e617b76418..b17cc0e42a4f 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1214,7 +1214,6 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
 /**
  * __i915_wait_request - wait until execution of request has finished
  * @req: duh!
- * @reset_counter: reset sequence associated with the given request
  * @interruptible: do an interruptible wait (normally yes)
  * @timeout: in - how long to wait (NULL forever); out - how much time remaining
  *
@@ -1229,7 +1228,6 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
  * errno with remaining time filled in timeout argument.
  */
 int __i915_wait_request(struct drm_i915_gem_request *req,
-   unsigned reset_counter,
bool interruptible,
s64 *timeout,
struct intel_rps_client *rps)
@@ -1288,7 +1286,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 
/* We need to check whether any gpu reset happened in between
 * the caller grabbing the seqno and now ... */
-   if (reset_counter != i915_reset_counter(&dev_priv->gpu_error)) {
+   if (req->reset_counter != i915_reset_counter(&dev_priv->gpu_error)) {
/* ... but upgrade the -EAGAIN to an -EIO if the gpu
 * is truely gone. */
			ret = i915_gem_check_wedge(&dev_priv->gpu_error, interruptible);
@@ -1461,13 +1459,7 @@ i915_wait_request(struct drm_i915_gem_request *req)
 
BUG_ON(!mutex_is_locked(>struct_mutex));
 
-   ret = i915_gem_check_wedge(&dev_priv->gpu_error, interruptible);
-   if (ret)
-   return ret;
-
-   ret = __i915_wait_request(req,
- i915_reset_counter(&dev_priv->gpu_error),
- interruptible, NULL, NULL);
+   ret = __i915_wait_request(req, interruptible, NULL, NULL);
if (ret)
return ret;
 
@@ -1542,7 +1534,6 @@ i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
struct drm_device *dev = obj->base.dev;
struct drm_i915_private *dev_priv = dev->dev_private;
struct drm_i915_gem_request *requests[I915_NUM_RINGS];
-   unsigned reset_counter;
int ret, i, n = 0;
 
BUG_ON(!mutex_is_locked(>struct_mutex));
@@ -1551,12 +1542,6 @@ i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
if (!obj->active)
return 0;
 
-   ret = i915_gem_check_wedge(&dev_priv->gpu_error, true);
-   if (ret)
-   return ret;
-
-   reset_counter = i915_reset_counter(&dev_priv->gpu_error);
-
if (readonly) {
struct drm_i915_gem_request *req;
 
@@ -1578,9 +1563,9 @@ i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
}
 
	mutex_unlock(&dev->struct_mutex);
+   ret = 0;
for (i = 0; ret == 0 && i < n; i++)
-   ret = __i915_wait_request(requests[i], reset_counter, true,
-

[Intel-gfx] [PATCH 05/32] drm/i915: Simplify checking of GPU reset_counter in display pageflips

2015-12-11 Thread Chris Wilson
If, when we store the reset_counter for the operation, we ensure that
the GPU is neither wedged nor in the middle of a reset, we can then assert
that if any reset occurs the reset_counter must change. Later we can just
compare the operation's reset epoch against the current counter to see
if we need to abort the operation (to handle the hang).

Signed-off-by: Chris Wilson 
Cc: Daniel Vetter 
---
 drivers/gpu/drm/i915/intel_display.c | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index cc47c0206294..8b6028cd619f 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -3283,14 +3283,12 @@ void intel_finish_reset(struct drm_device *dev)
 static bool intel_crtc_has_pending_flip(struct drm_crtc *crtc)
 {
struct drm_device *dev = crtc->dev;
-   struct drm_i915_private *dev_priv = dev->dev_private;
struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
unsigned reset_counter;
bool pending;
 
-   reset_counter = i915_reset_counter(&dev_priv->gpu_error);
-   if (intel_crtc->reset_counter != reset_counter ||
-   __i915_reset_in_progress_or_wedged(reset_counter))
+   reset_counter = i915_reset_counter(&to_i915(dev)->gpu_error);
+   if (intel_crtc->reset_counter != reset_counter)
return false;
 
spin_lock_irq(>event_lock);
@@ -10947,8 +10945,7 @@ static bool page_flip_finished(struct intel_crtc *crtc)
unsigned reset_counter;
 
	reset_counter = i915_reset_counter(&dev_priv->gpu_error);
-   if (crtc->reset_counter != reset_counter ||
-   __i915_reset_in_progress_or_wedged(reset_counter))
+   if (crtc->reset_counter != reset_counter)
return true;
 
/*
@@ -11604,8 +11601,13 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
if (ret)
goto cleanup;
 
-   atomic_inc(&intel_crtc->unpin_work_count);
	intel_crtc->reset_counter = i915_reset_counter(&dev_priv->gpu_error);
+   if (__i915_reset_in_progress_or_wedged(intel_crtc->reset_counter)) {
+   ret = -EIO;
+   goto cleanup;
+   }
+
+   atomic_inc(&intel_crtc->unpin_work_count);
 
if (INTEL_INFO(dev)->gen >= 5 || IS_G4X(dev))
work->flip_count = I915_READ(PIPE_FLIPCOUNT_G4X(pipe)) + 1;
-- 
2.6.3



[Intel-gfx] [PATCH 14/32] drm/i915: Remove forcewake dance from seqno/irq barrier on legacy gen6+

2015-12-11 Thread Chris Wilson
In order to ensure seqno/irq coherency, we currently read a ring register.
We are not quite sure how it works, only that it does. Experiments show
that e.g. doing a clflush(seqno) instead is not sufficient, but we can
remove the forcewake dance from the mmio access.

v2: Baytrail wants a clflush too.

Signed-off-by: Chris Wilson 
Cc: Daniel Vetter 
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 6cecc15ec01b..69dd69e46fa9 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1490,10 +1490,21 @@ gen6_ring_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
 {
/* Workaround to force correct ordering between irq and seqno writes on
 * ivb (and maybe also on snb) by reading from a CS register (like
-* ACTHD) before reading the status page. */
+* ACTHD) before reading the status page.
+*
+* Note that this effectively stalls the read by the time
+* it takes to do a memory transaction, which more or less ensures
+* that the write from the GPU has sufficient time to invalidate
+* the CPU cacheline. Alternatively we could delay the interrupt from
+* the CS ring to give the write time to land, but that would incur
+* a delay after every batch i.e. much more frequent than a delay
+* when waiting for the interrupt (with the same net latency).
+*/
if (!lazy_coherency) {
struct drm_i915_private *dev_priv = ring->dev->dev_private;
-   POSTING_READ(RING_ACTHD(ring->mmio_base));
+   POSTING_READ_FW(RING_ACTHD(ring->mmio_base));
+
+   intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
}
 
return intel_read_status_page(ring, I915_GEM_HWS_INDEX);
-- 
2.6.3



Re: [Intel-gfx] [RFC 08/12] drm/i915: Interrupt driven fences

2015-12-11 Thread Tvrtko Ursulin


Hi,

Some random comments, mostly from the point of view of solving the 
thundering herd problem.


On 23/11/15 11:34, john.c.harri...@intel.com wrote:

From: John Harrison 

The intended usage model for struct fence is that the signalled status
should be set on demand rather than polled. That is, there should not
be a need for a 'signaled' function to be called every time the status
is queried. Instead, 'something' should be done to enable a signal
callback from the hardware which will update the state directly. In
the case of requests, this is the seqno update interrupt. The idea is
that this callback will only be enabled on demand when something
actually tries to wait on the fence.

This change removes the polling test and replaces it with the callback
scheme. Each fence is added to a 'please poke me' list at the start of
i915_add_request(). The interrupt handler then scans through the 'poke
me' list when a new seqno pops out and signals any matching
fence/request. The fence is then removed from the list so the entire
request stack does not need to be scanned every time. Note that the
fence is added to the list before the commands to generate the seqno
interrupt are added to the ring. Thus the sequence is guaranteed to be
race free if the interrupt is already enabled.

Note that the interrupt is only enabled on demand (i.e. when
__wait_request() is called). Thus there is still a potential race when
enabling the interrupt as the request may already have completed.
However, this is simply solved by calling the interrupt processing
code immediately after enabling the interrupt and thereby checking for
already completed requests.
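In code, the enable path described above amounts to something like this
(sketch; i915_gem_request_notify() is the helper this patch adds, and the
irq_get() call stands in for whichever interrupt-enable hook is used):

	ring->irq_get(ring);
	/* The request may have completed before the interrupt was
	 * enabled; run the notify path once to catch that case. */
	i915_gem_request_notify(ring, false);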

Lastly, the ring clean up code has the possibility to cancel
outstanding requests (e.g. because TDR has reset the ring). These
requests will never get signalled and so must be removed from the
signal list manually. This is done by setting a 'cancelled' flag and
then calling the regular notify/retire code path rather than
attempting to duplicate the list manipulation and clean up code in
multiple places. This also avoids any race condition where the
cancellation request might occur after/during the completion interrupt
actually arriving.

v2: Updated to take advantage of the request unreference no longer
requiring the mutex lock.

v3: Move the signal list processing around to prevent unsubmitted
requests being added to the list. This was occurring on Android
because the native sync implementation calls the
fence->enable_signalling API immediately on fence creation.

Updated after review comments by Tvrtko Ursulin. Renamed list nodes to
'link' instead of 'list'. Added support for returning an error code on
a cancelled fence. Update list processing to be more efficient/safer
with respect to spinlocks.

For: VIZ-5190
Signed-off-by: John Harrison 
Cc: Tvrtko Ursulin 
---
  drivers/gpu/drm/i915/i915_drv.h |  10 ++
  drivers/gpu/drm/i915/i915_gem.c | 187 ++--
  drivers/gpu/drm/i915/i915_irq.c |   2 +
  drivers/gpu/drm/i915/intel_lrc.c|   2 +
  drivers/gpu/drm/i915/intel_ringbuffer.c |   2 +
  drivers/gpu/drm/i915/intel_ringbuffer.h |   2 +
  6 files changed, 196 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index fbf591f..d013c6d 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2187,7 +2187,12 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
  struct drm_i915_gem_request {
/** Underlying object for implementing the signal/wait stuff. */
struct fence fence;
+   struct list_head signal_link;
+   struct list_head unsignal_link;
struct list_head delayed_free_link;
+   bool cancelled;
+   bool irq_enabled;
+   bool signal_requested;

/** On Which ring this request was generated */
struct drm_i915_private *i915;
@@ -2265,6 +2270,11 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
   struct drm_i915_gem_request **req_out);
  void i915_gem_request_cancel(struct drm_i915_gem_request *req);

+void i915_gem_request_submit(struct drm_i915_gem_request *req);
+void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req,
+  bool fence_locked);
+void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked);
+
  int i915_create_fence_timeline(struct drm_device *dev,
   struct intel_context *ctx,
   struct intel_engine_cs *ring);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 171ae5f..2a0b346 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1165,6 +1165,8 @@ static int __i915_spin_request(struct 
drm_i915_gem_request *req)

timeout = jiffies + 1;
while 

[Intel-gfx] [PATCH] tests/kms_color: Color IGT

2015-12-11 Thread Dhanya Pillai
From: Dhanya 

This patch will verify the color correction capability of a display driver.
Gamma/CSC/De-gamma are supported for SKL/BXT.

Signed-off-by: Dhanya 
---
 tests/.gitignore   |   1 +
 tests/Makefile.sources |   1 +
 tests/kms_color.c  | 684 +
 3 files changed, 686 insertions(+)
 create mode 100644 tests/kms_color.c

diff --git a/tests/.gitignore b/tests/.gitignore
index 80af9a7..58c79e2 100644
--- a/tests/.gitignore
+++ b/tests/.gitignore
@@ -127,6 +127,7 @@ gen7_forcewake_mt
 kms_3d
 kms_addfb_basic
 kms_atomic
+kms_color
 kms_crtc_background_color
 kms_cursor_crc
 kms_draw_crc
diff --git a/tests/Makefile.sources b/tests/Makefile.sources
index 8fb2de8..906c14f 100644
--- a/tests/Makefile.sources
+++ b/tests/Makefile.sources
@@ -64,6 +64,7 @@ TESTS_progs_M = \
gem_write_read_ring_switch \
kms_addfb_basic \
kms_atomic \
+   kms_color \
kms_cursor_crc \
kms_draw_crc \
kms_fbc_crc \
diff --git a/tests/kms_color.c b/tests/kms_color.c
new file mode 100644
index 000..b5d199b
--- /dev/null
+++ b/tests/kms_color.c
@@ -0,0 +1,684 @@
+/*
+ * Copyright © 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#include 
+#include "drmtest.h"
+#include "drm.h"
+#include "igt_debugfs.h"
+#include "igt_kms.h"
+#include "igt_core.h"
+#include "intel_io.h"
+#include "intel_chipset.h"
+#include "igt_aux.h"
+#include
+#include
+#include 
+#include 
+
+
+IGT_TEST_DESCRIPTION("Test Color Features at Pipe level");
+/*
+This tool tests the following color features:
+   1.csc-red
+   2.csc-green
+   3.csc-blue
+   4.gamma-legacy
+   5.gamma-8bit
+   6.gamma-10bit
+   7.gamma-12bit
+   8.gamma-split
+
+Verification is done by CRC checks.
+
+*/
+
+#define CSC_MAX_VALS 9
+#define GEN9_SPLITGAMMA_MAX_VALS 512
+#define GEN9_8BIT_GAMMA_MAX_VALS 256
+#define GEN9_10BIT_GAMMA_MAX_VALS 1024
+#define GEN9_12BIT_GAMMA_MAX_VALS 513
+#define GEN9_MAX_GAMMA ((1 << 24) - 1)
+#define GEN9_MIN_GAMMA 0
+#define RED_CSC 0
+#define GREEN_CSC 1
+#define BLUE_CSC 2
+#define RED_FB 0
+#define GREEN_FB 1
+#define BLUE_FB 2
+
+struct _drm_r32g32b32 {
+   __u32 r32;
+   __u32 g32;
+   __u32 b32;
+   __u32 reserved;
+};
+
+struct _drm_palette {
+   struct _drm_r32g32b32 lut[0];
+};
+
+struct _drm_ctm {
+   __s64 ctm_coeff[9];
+};
+
+float ctm_red[9] = {1, 1, 1, 0, 0, 0, 0, 0, 0};
+float ctm_green[9] = {0, 0, 0, 1, 1, 1, 0, 0, 0};
+float ctm_blue[9] = {0, 0, 0, 0, 0, 0, 1, 1, 1};
+float ctm_unity[9] = {1, 0, 0, 0, 1, 0, 0, 0, 1};
+
+struct framebuffer_color {
+   int red;
+   int green;
+   int blue;
+};
+struct framebuffer_color fb_color = {0,0,0};
+
+igt_crc_t crc_reference, crc_reference_black, crc_reference_white;
+igt_crc_t crc_black, crc_white, crc_current;
+
+struct data_t {
+   int fb_initial;
+   int drm_fd;
+   int gen;
+   int w, h;
+   igt_display_t display;
+   struct igt_fb fb_prep;
+   struct igt_fb fb, fb1;
+   igt_pipe_crc_t *pipe_crc;
+   enum pipe pipe;
+
+};
+
+
+static int create_blob(int fd, uint64_t *data, int length)
+{
+   struct drm_mode_create_blob blob;
+   int ret = -1;
+
+   blob.data = (uint64_t)data;
+   blob.length = length;
+   blob.blob_id = -1;
+   ret = ioctl(fd, DRM_IOCTL_MODE_CREATEPROPBLOB, &blob);
+   if (!ret)
+   return blob.blob_id;
+   igt_fail(IGT_EXIT_FAILURE);
+   return ret;
+}
+
+static void prepare_crtc(struct data_t *data, igt_output_t *output,
+			 enum pipe pipe1, igt_plane_t *plane, drmModeModeInfo *mode,
+			 enum igt_commit_style s)
+{
+   igt_display_t 

Re: [Intel-gfx] [PATCH V4 2/2] drm/i915: start adding dp mst audio

2015-12-11 Thread Takashi Iwai
On Fri, 11 Dec 2015 07:07:53 +0100,
Libin Yang wrote:
> 
> >>> diff --git a/drivers/gpu/drm/i915/intel_audio.c b/drivers/gpu/drm/i915/intel_audio.c
> >>> index 9aa83e7..5ad2e66 100644
> >>> --- a/drivers/gpu/drm/i915/intel_audio.c
> >>> +++ b/drivers/gpu/drm/i915/intel_audio.c
> >>> @@ -262,7 +262,8 @@ static void hsw_audio_codec_disable(struct intel_encoder *encoder)
> >>>   tmp |= AUD_CONFIG_N_PROG_ENABLE;
> >>>   tmp &= ~AUD_CONFIG_UPPER_N_MASK;
> >>>   tmp &= ~AUD_CONFIG_LOWER_N_MASK;
> >>> - if (intel_pipe_has_type(intel_crtc, INTEL_OUTPUT_DISPLAYPORT))
> >>> + if (intel_pipe_has_type(intel_crtc, INTEL_OUTPUT_DISPLAYPORT) ||
> >>> + intel_pipe_has_type(intel_crtc, INTEL_OUTPUT_DP_MST))
> >>>   tmp |= AUD_CONFIG_N_VALUE_INDEX;

The same check is missing in hsw_audio_codec_enable()?

> >>>   I915_WRITE(HSW_AUD_CFG(pipe), tmp);
> >>>
> >>> @@ -474,7 +475,8 @@ static void ilk_audio_codec_enable(struct drm_connector *connector,
> >>>   tmp &= ~AUD_CONFIG_N_VALUE_INDEX;
> >>>   tmp &= ~AUD_CONFIG_N_PROG_ENABLE;
> >>>   tmp &= ~AUD_CONFIG_PIXEL_CLOCK_HDMI_MASK;
> >>> - if (intel_pipe_has_type(intel_crtc, INTEL_OUTPUT_DISPLAYPORT))
> >>> + if (intel_pipe_has_type(intel_crtc, INTEL_OUTPUT_DISPLAYPORT) ||
> >>> + intel_pipe_has_type(intel_crtc, INTEL_OUTPUT_DP_MST))
> >>>   tmp |= AUD_CONFIG_N_VALUE_INDEX;

... and missing for ilk_audio_codec_disable()?


> >>>   else
> >>>   tmp |= audio_config_hdmi_pixel_clock(adjusted_mode);
> >>> @@ -512,7 +514,8 @@ void intel_audio_codec_enable(struct intel_encoder *intel_encoder)
> >>>
> >>>   /* ELD Conn_Type */
> >>>   connector->eld[5] &= ~(3 << 2);
> >>> - if (intel_pipe_has_type(crtc, INTEL_OUTPUT_DISPLAYPORT))
> >>> + if (intel_pipe_has_type(crtc, INTEL_OUTPUT_DISPLAYPORT) ||
> >>> + intel_pipe_has_type(crtc, INTEL_OUTPUT_DP_MST))

IMO, it's better to have a macro to cover this two-line check instead
of open-coding at each place.  We'll have 5 places in the end.
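Something like the sketch below, say (the macro name is only a suggestion):

#define intel_pipe_has_dp(crtc) \
	(intel_pipe_has_type((crtc), INTEL_OUTPUT_DISPLAYPORT) || \
	 intel_pipe_has_type((crtc), INTEL_OUTPUT_DP_MST))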


thanks,

Takashi


Re: [Intel-gfx] [PATCH] drm/i915: Wait for PP cycle delay only if panel is in power off sequence

2015-12-11 Thread Kumar, Shobhit

On 12/11/2015 04:55 PM, Thulasimani, Sivakumar wrote:



On 12/10/2015 8:32 PM, Ville Syrjälä wrote:

On Thu, Dec 10, 2015 at 08:09:01PM +0530, Thulasimani, Sivakumar wrote:


On 12/10/2015 7:08 PM, Ville Syrjälä wrote:

On Thu, Dec 10, 2015 at 03:15:37PM +0200, Ville Syrjälä wrote:

On Thu, Dec 10, 2015 at 03:01:02PM +0530, Kumar, Shobhit wrote:

On 12/09/2015 09:35 PM, Ville Syrjälä wrote:

On Wed, Dec 09, 2015 at 08:59:26PM +0530, Shobhit Kumar wrote:

On Wed, Dec 9, 2015 at 8:34 PM, Chris Wilson
 wrote:

On Wed, Dec 09, 2015 at 08:07:10PM +0530, Shobhit Kumar wrote:

On Wed, Dec 9, 2015 at 7:27 PM, Ville Syrjälä
 wrote:

On Wed, Dec 09, 2015 at 06:51:48PM +0530, Shobhit Kumar wrote:

During resume, while turning the EDP panel power on, we need not wait
blindly for panel_power_cycle_delay. Check if a panel power down
sequence is in progress and only then wait. This improves our resume
time significantly.

Signed-off-by: Shobhit Kumar 
---
drivers/gpu/drm/i915/intel_dp.c | 17 -
1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
index f335c92..10ec669 100644
--- a/drivers/gpu/drm/i915/intel_dp.c
+++ b/drivers/gpu/drm/i915/intel_dp.c
@@ -617,6 +617,20 @@ static bool edp_have_panel_power(struct intel_dp *intel_dp)
 	return (I915_READ(_pp_stat_reg(intel_dp)) & PP_ON) != 0;
}

+static bool edp_panel_off_seq(struct intel_dp *intel_dp)
+{
+ struct drm_device *dev = intel_dp_to_dev(intel_dp);
+ struct drm_i915_private *dev_priv = dev->dev_private;
+
+ lockdep_assert_held(&dev_priv->pps_mutex);
+
+ if (IS_VALLEYVIEW(dev) &&
+ intel_dp->pps_pipe == INVALID_PIPE)
+ return false;
+
+ return (I915_READ(_pp_stat_reg(intel_dp)) & PP_SEQUENCE_POWER_DOWN) != 0;
+}

This doesn't make sense to me. The power down cycle may have
completed just before, and so this would claim we don't have to
wait for the power_cycle_delay.

Not sure I understand your concern correctly. You are right, power
down cycle may have completed just before and if it has then we don't
need to wait. But in case the power down cycle is in progress as per
internal state, then we need to wait for it to complete. This will
happen for example in non-suspend disable path and will be handled
correctly. In case of actual suspend/resume, this would have
successfully completed and will skip the wait as it is not needed
before enabling panel power.


+
static bool edp_have_panel_vdd(struct intel_dp *intel_dp)
{
 struct drm_device *dev = intel_dp_to_dev(intel_dp);
@@ -2025,7 +2039,8 @@ static void edp_panel_on(struct intel_dp *intel_dp)
  port_name(dp_to_dig_port(intel_dp)->port)))
 return;

- wait_panel_power_cycle(intel_dp);
+ if (edp_panel_off_seq(intel_dp))
+ wait_panel_power_cycle(intel_dp);

Looking in from the side, I have no idea what this is meant to do. At
the very least you need your explanatory paragraph here which would
include what exactly you are waiting for at the start of edp_panel_on
(and please try and find a better name for edp_panel_off_seq()).

I will add a comment. Basically I am not additionally waiting, but
converting the wait which was already there to a conditional wait.
edp_panel_off_seq() checks if the panel power down sequence is in
progress. In that case we need to wait for the panel power cycle delay.
If it is not in that sequence, there is no need to wait. I will make an
attempt again on the naming in the next patch update.

As far as I remember you need to wait for power_cycle_delay between the
power down cycle and the power up cycle. You're trying to throw that
wait away entirely, unless the function happens to get called while the
power down

Yes, you are right, and I realize I made a mistake in my patch: it is
not checking the PP_CYCLE_DELAY_ACTIVE bit.


cycle is still in progress. We should already optimize away
redundant
waits by tracking the end of the power down cycle with the jiffies
tracking.

Actually looking at the code the power_cycle_delay gets counted from
the start of the last power down cycle, so supposedly it's always at
least as long as the power down cycle, and typically it's quite a
bit
longer than that. But that doesn't change the fact that you can't
just skip it because the power down cycle delay happened to end
already.

So what we do now is:
1. initiate power down cycle
2. last_power_cycle=jiffies
3. wait for power down (I suppose this actually waits
  until the power down delay has passed since that's
  programmed into the PPS).
4. wait for power_cycle_delay from last_power_cycle
5. initiate power up cycle

I think with your patch step 4 would always be skipped since the
power down cycle has already ended, and then we fail to honor the
power cycle delay.

Yes, I agree. I missed checking for PP_CYCLE_DELAY_ACTIVE. Adding
that
check 
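
For illustration, folding the missing PP_CYCLE_DELAY_ACTIVE test into the
check could look roughly like this (a sketch against the helpers in the
patch above; the function name is made up and this is not the final
revision):

static bool edp_panel_need_power_cycle_wait(struct intel_dp *intel_dp)
{
	struct drm_device *dev = intel_dp_to_dev(intel_dp);
	struct drm_i915_private *dev_priv = dev->dev_private;

	lockdep_assert_held(&dev_priv->pps_mutex);

	if (IS_VALLEYVIEW(dev) &&
	    intel_dp->pps_pipe == INVALID_PIPE)
		return false;

	/* Wait if the power down sequence is still running, or if the
	 * PPS is still counting down the power cycle delay.
	 */
	return (I915_READ(_pp_stat_reg(intel_dp)) &
		(PP_SEQUENCE_POWER_DOWN | PP_CYCLE_DELAY_ACTIVE)) != 0;
}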

Re: [Intel-gfx] [PATCH 2/6] drm/i915: Support for creating Stolen memory backed objects

2015-12-11 Thread Ankitprasad Sharma
On Wed, 2015-12-09 at 14:06 +, Tvrtko Ursulin wrote:
> Hi,
> 
> On 09/12/15 12:46, ankitprasad.r.sha...@intel.com wrote:
> > From: Ankitprasad Sharma 
> >
> > Extend the drm_i915_gem_create structure to add support for
> > creating stolen memory backed objects. Add a new flag through
> > which userspace can specify a preference to allocate the object
> > from stolen memory; if set, an attempt will be made to allocate
> > the object from stolen memory, subject to the availability of
> > free space in the stolen region.
> >
> > v2: Rebased to the latest drm-intel-nightly (Ankit)
> >
> > v3: Changed versioning of GEM_CREATE param, added new comments (Tvrtko)
> >
> > v4: Changed size from 32b to 64b to prevent userspace overflow (Tvrtko)
> > Corrected function arguments ordering (Chris)
> >
> > v5: Corrected function name (Chris)
> >
> > Testcase: igt/gem_stolen
> >
> > Signed-off-by: Ankitprasad Sharma 
> > Reviewed-by: Tvrtko Ursulin 
> > ---
> >   drivers/gpu/drm/i915/i915_dma.c|  3 +++
> >   drivers/gpu/drm/i915/i915_drv.h|  2 +-
> >   drivers/gpu/drm/i915/i915_gem.c| 30 +++---
> >   drivers/gpu/drm/i915/i915_gem_stolen.c |  4 ++--
> >   include/uapi/drm/i915_drm.h| 16 
> >   5 files changed, 49 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_dma.c 
> > b/drivers/gpu/drm/i915/i915_dma.c
> > index ffcb9c6..6927c7e 100644
> > --- a/drivers/gpu/drm/i915/i915_dma.c
> > +++ b/drivers/gpu/drm/i915/i915_dma.c
> > @@ -170,6 +170,9 @@ static int i915_getparam(struct drm_device *dev, void 
> > *data,
> > case I915_PARAM_HAS_RESOURCE_STREAMER:
> > value = HAS_RESOURCE_STREAMER(dev);
> > break;
> > +   case I915_PARAM_CREATE_VERSION:
> > +   value = 2;
> > +   break;
> > default:
> > DRM_DEBUG("Unknown parameter %d\n", param->param);
> > return -EINVAL;
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h 
> > b/drivers/gpu/drm/i915/i915_drv.h
> > index 8e554d3..d45274e 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -3213,7 +3213,7 @@ void i915_gem_stolen_remove_node(struct 
> > drm_i915_private *dev_priv,
> >   int i915_gem_init_stolen(struct drm_device *dev);
> >   void i915_gem_cleanup_stolen(struct drm_device *dev);
> >   struct drm_i915_gem_object *
> > -i915_gem_object_create_stolen(struct drm_device *dev, u32 size);
> > +i915_gem_object_create_stolen(struct drm_device *dev, u64 size);
> >   struct drm_i915_gem_object *
> >   i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
> >u32 stolen_offset,
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c 
> > b/drivers/gpu/drm/i915/i915_gem.c
> > index d57e850..296e63f 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -375,6 +375,7 @@ static int
> >   i915_gem_create(struct drm_file *file,
> > struct drm_device *dev,
> > uint64_t size,
> > +   uint32_t flags,
> > uint32_t *handle_p)
> >   {
> > struct drm_i915_gem_object *obj;
> > @@ -385,8 +386,31 @@ i915_gem_create(struct drm_file *file,
> > if (size == 0)
> > return -EINVAL;
> >
> > +   if (flags & __I915_CREATE_UNKNOWN_FLAGS)
> > +   return -EINVAL;
> > +
> > /* Allocate the new object */
> > -   obj = i915_gem_alloc_object(dev, size);
> > +   if (flags & I915_CREATE_PLACEMENT_STOLEN) {
> > +   mutex_lock(&dev->struct_mutex);
> > +   obj = i915_gem_object_create_stolen(dev, size);
> > +   if (!obj) {
> > +   mutex_unlock(&dev->struct_mutex);
> > +   return -ENOMEM;
> > +   }
> > +
> > +   /* Always clear fresh buffers before handing to userspace */
> > +   ret = i915_gem_object_clear(obj);
> > +   if (ret) {
> > +   drm_gem_object_unreference(&obj->base);
> > +   mutex_unlock(&dev->struct_mutex);
> > +   return ret;
> > +   }
> > +
> > +   mutex_unlock(&dev->struct_mutex);
> > +   } else {
> > +   obj = i915_gem_alloc_object(dev, size);
> > +   }
> > +
> > if (obj == NULL)
> > return -ENOMEM;
> >
> > @@ -409,7 +433,7 @@ i915_gem_dumb_create(struct drm_file *file,
> > args->pitch = ALIGN(args->width * DIV_ROUND_UP(args->bpp, 8), 64);
> > args->size = args->pitch * args->height;
> > return i915_gem_create(file, dev,
> > -  args->size, &args->handle);
> > +  args->size, 0, &args->handle);
> >   }
> >
> >   /**
> > @@ -422,7 +446,7 @@ i915_gem_create_ioctl(struct drm_device *dev, void 
> > *data,
> > struct drm_i915_gem_create *args = data;
> >
> > return i915_gem_create(file, dev,
> > -  
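
For context, a hypothetical userspace call site for the new flag might look
like the following sketch (drmIoctl() is the libdrm wrapper; the flags field
follows this patch's v2 uapi, and the helper name is made up):

#include <errno.h>
#include <stdint.h>
#include <xf86drm.h>
#include <drm/i915_drm.h>

/* Allocate a 4 KiB stolen-backed object; the ioctl fails with ENOMEM
 * if the stolen region has no free space.
 */
static int create_stolen_bo(int fd, uint32_t *handle)
{
	struct drm_i915_gem_create create = {
		.size = 4096,
		.flags = I915_CREATE_PLACEMENT_STOLEN,
	};

	if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CREATE, &create))
		return -errno;

	*handle = create.handle;
	return 0;
}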

Re: [Intel-gfx] [PATCH] Always mark GEM objects as dirty when written by the CPU

2015-12-11 Thread Dave Gordon

On 10/12/15 08:58, Daniel Vetter wrote:

On Mon, Dec 07, 2015 at 12:51:49PM +, Dave Gordon wrote:

I think I missed i915_gem_phys_pwrite().

i915_gem_gtt_pwrite_fast() marks the object dirty for most cases (via
set_to_gtt_domain()), but isn't called in all cases (or can return before
the set_domain). Then we try i915_gem_shmem_pwrite() for non-phys
objects (no check for stolen!) and that already marks the object dirty
[aside: we might be able to change that to page-by-page?], but
i915_gem_phys_pwrite() doesn't mark the object dirty, so we might lose
updates there?

Or maybe we should move the marking up into i915_gem_pwrite_ioctl() instead.
The target object is surely going to be dirtied, whatever type it is.


phys objects are special, and when binding we allocate new
(contiguous) storage. In put_pages_phys that gets copied back and pages
marked as dirty. While a phys object is pinned it's a kernel bug to look
at the shmem pages and a userspace bug to touch the cpu mmap (since that
data will simply be overwritten whenever the kernel feels like).

phys objects are only used for cursors on old crap though, so ok if we
don't streamline this fairly quirky old ABI.
-Daniel


So is pread broken already for 'phys'? In the pwrite code, we have
i915_gem_phys_pwrite() which looks OK, but there isn't a corresponding
i915_gem_phys_pread(), instead it will call i915_gem_shmem_pread(), and 
I'm not sure that will work! The question being, does the kernel have 
page table slots corresponding to the DMA area allocated, otherwise
the for_each_sg_page()/sg_page_iter_page() in i915_gem_shmem_pread() 
isn't going to give meaningful results. And I found this comment in 
drm_pci_alloc() (called from i915_gem_object_attach_phys()):


/* XXX - Is virt_to_page() legal for consistent mem? */
/* Reserve */
for (addr = (unsigned long)dmah->vaddr, sz = size;
 sz > 0; addr += PAGE_SIZE, sz -= PAGE_SIZE) {
SetPageReserved(virt_to_page((void *)addr));
}

(and does it depend on which memory configuration is selected?).

See also current thread on "Support for pread/pwrite from/to non shmem 
backed objects" ...


.Dave.
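
For reference, a phys-aware pread would presumably mirror
i915_gem_phys_pwrite() and read straight from the contiguous DMA allocation
instead of the shmem pages; a minimal sketch (hypothetical, not in the tree)
might be:

static int
i915_gem_phys_pread(struct drm_i915_gem_object *obj,
		    struct drm_i915_gem_pread *args,
		    struct drm_file *file)
{
	/* Read from the phys object's contiguous backing store rather
	 * than the (possibly stale) shmem pages.
	 */
	void *vaddr = obj->phys_handle->vaddr + args->offset;
	char __user *user_data = to_user_ptr(args->data_ptr);

	if (copy_to_user(user_data, vaddr, args->size))
		return -EFAULT;

	return 0;
}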


Re: [Intel-gfx] [PATCH i-g-t] gem_flink_race/prime_self_import: Improve test reliability

2015-12-11 Thread Morton, Derek J
>
>
>-Original Message-
>From: Daniel Vetter [mailto:daniel.vet...@ffwll.ch] On Behalf Of Daniel Vetter
>Sent: Thursday, December 10, 2015 12:53 PM
>To: Morton, Derek J
>Cc: Daniel Vetter; intel-gfx@lists.freedesktop.org; Wood, Thomas
>Subject: Re: [Intel-gfx] [PATCH i-g-t] gem_flink_race/prime_self_import: 
>Improve test reliability
>
>On Thu, Dec 10, 2015 at 11:51:29AM +, Morton, Derek J wrote:
>> >
>> >
>> >-Original Message-
>> >From: Daniel Vetter [mailto:daniel.vet...@ffwll.ch] On Behalf Of 
>> >Daniel Vetter
>> >Sent: Thursday, December 10, 2015 10:13 AM
>> >To: Morton, Derek J
>> >Cc: intel-gfx@lists.freedesktop.org; Wood, Thomas
>> >Subject: Re: [Intel-gfx] [PATCH i-g-t] 
>> >gem_flink_race/prime_self_import: Improve test reliability
>> >
>> >On Tue, Dec 08, 2015 at 12:44:44PM +, Derek Morton wrote:
>> >> gem_flink_race and prime_self_import have subtests which read the 
>> >> number of open gem objects from debugfs to determine if objects 
>> >> have leaked during the test. However the test can fail sporadically 
>> >> if the number of gem objects changes due to other process activity.
>> >> This patch introduces a change to check the number of gem objects 
>> >> several times to filter out any fluctuations.
>> >
>> >Why exactly does this happen? IGT tests should be run on bare metal, 
>> >with everything else killed/subdued/shutup. If there's still things 
>> >going on that create objects, we need to stop them from doing that.
>> >
>> >If this only applies to Android, or some special Android deamon them 
>> >imo check for that at runtime and igt_skip("your setup is invalid, 
>> >deamon %s running\n"); is the correct fix. After all just because you 
>> >sampled for a bit doesn't mean that it wont still change right when 
>> >you start running the test for real, so this is still fragile.
>> 
>> Before running tests on android we do stop everything possible. I 
>> suspect the culprit is coreu getting automatically restarted after it 
>> is stopped. I had additional debug while developing this patch and 
>> what I saw was the system being mostly quiescent but with some very 
>> low level background activity. 1 extra object being created and then 
>> deleted occasionally. Depending on whether it occurred at the start or 
>> end of the test it was resulting in a reported leak of either 1 or -1 
>> objects.
>> The patch fixes that issue by taking several samples and requiring 
>> them to be the same, therefore filtering out the low level background noise.
>> It would not help if something in the background allocated an object
>> and kept it allocated, but I have not seen that happen. I only once saw
>> the object count increasing for 2 consecutive reads, hence the sample
>> count of 4 to give a margin. The test was failing about 10% of the
>> time. With this patch I got a 100% pass rate across 300 runs of each
>> of the tests.
>
>Hm, piglit checks that there's no other drm clients running. Have you tried 
>re-running that check to zero in on the culprit?

We don't use piglit to run IGT tests on Android. I have had a look at what
piglit does and added the same check to our scripts. (It reads a list of
clients from /sys/kernel/debug/dri/0/clients)
For CHV it shows a process called 'y', though that seems to be a CHV-specific
issue where all driver clients are reported as 'y'. I checked on BXT, which
properly shows the process names, and it looks like it is the binder process
(which handles some inter-process communication). I don't think this is
something we can stop.

>
>> If you are concerned about the behaviour when running the test with a
>> load of background activity, I could add code to limit the resets of
>> the count and fail the test in that instance. That would give the
>> benefit of distinguishing a test failure due to excessive background
>> activity from a detected leak.
>
>I'm also concerned for the overhead this causes everyone else. If this really 
>is some Android trouble then I think it'd be good to only compile this on 
>Android. But would still be much better if you can get to a reliably clean 
>test environment.

I will make the loop part android specific.


//Derek

>
>> I would not want to just have the test skip as that introduces a hole 
>> in our test coverage.
>> 
>> >Also would be good to extract get_stable_obj_count to a proper igt 
>> >library function, if it indeed needs to be this tricky. And then add 
>> >the explanation for why we need this in the gtkdoc.
>> 
>> I can move the code to an igt library. Which library would you suggest?
>> igt_debugfs?
>
>Hm yeah, it's a bit the dumping ground for all things debugfs access ;-) 
>-Daniel
>--
>Daniel Vetter
>Software Engineer, Intel Corporation
>http://blog.ffwll.ch
>
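
For illustration, the sampling approach described above might look roughly
like the following sketch (get_object_count() stands in for the tests'
existing debugfs read; the four-sample threshold follows the description
above and the settle time is arbitrary):

#include <unistd.h>

/* Re-read the object count until four consecutive samples agree,
 * filtering out transient allocations from background activity.
 */
static int get_stable_obj_count(void)
{
	int obj_count = get_object_count(); /* assumed existing helper */
	int stable = 1;

	while (stable < 4) {
		usleep(50000); /* settle time between samples */
		int n = get_object_count();

		if (n == obj_count) {
			stable++;
		} else {
			obj_count = n;
			stable = 1;
		}
	}

	return obj_count;
}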


[Intel-gfx] [PATCH 22/32] drm/i915: Stop setting wraparound seqno on initialisation

2015-12-11 Thread Chris Wilson
We have testcases to ensure that seqno wraparound works fine, so we can
forgo forcing everyone to encounter seqno wraparound during early
uptime. Seqno wraparound incurs a full GPU stall, so not forcing it
eliminates one source of jitter from the early system. The testcases
give us very deterministic coverage; given how difficult it would be to
debug an issue (GPU hang) stemming from a wraparound using pure
postmortem analysis, I see no value in forcing a wrap during boot.

Advancing the global next_seqno after a GPU reset is equally pointless.
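
For reference, a testcase can still provoke a wrap on demand by seeding the
counter; something like the following sketch (the writable i915_next_seqno
debugfs file and its path are assumptions based on the debugfs of this era):

#include <stdio.h>

/* Park next_seqno just below the 32-bit boundary so the next few
 * requests exercise the wraparound path.
 */
static void force_imminent_seqno_wrap(void)
{
	FILE *f = fopen("/sys/kernel/debug/dri/0/i915_next_seqno", "w");

	if (f) {
		fprintf(f, "0xfffffff0\n");
		fclose(f);
	}
}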

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/i915_gem.c | 16 +---
 1 file changed, 1 insertion(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 15495b8112f9..d595d72e53b1 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4831,14 +4831,6 @@ i915_gem_init_hw(struct drm_device *dev)
}
}
 
-   /*
-* Increment the next seqno by 0x100 so we have a visible break
-* on re-initialisation
-*/
-   ret = i915_gem_set_seqno(dev, dev_priv->next_seqno+0x100);
-   if (ret)
-   goto out;
-
/* Now it is safe to go back round and do everything else: */
for_each_ring(ring, dev_priv, i) {
struct drm_i915_gem_request *req;
@@ -5018,13 +5010,7 @@ i915_gem_load(struct drm_device *dev)
dev_priv->num_fence_regs =
I915_READ(vgtif_reg(avail_rs.fence_num));
 
-   /*
-* Set initial sequence number for requests.
-* Using this number allows the wraparound to happen early,
-* catching any obvious problems.
-*/
-   dev_priv->next_seqno = ((u32)~0 - 0x1100);
-   dev_priv->last_seqno = ((u32)~0 - 0x1101);
+   dev_priv->next_seqno = 1;
 
/* Initialize fence registers to zero */
	INIT_LIST_HEAD(&dev_priv->mm.fence_list);
-- 
2.6.3



[Intel-gfx] [PATCH 15/32] drm/i915: Slaughter the thundering i915_wait_request herd

2015-12-11 Thread Chris Wilson
One particularly stressful scenario consists of many independent tasks
all competing for GPU time and waiting upon the results (e.g. realtime
transcoding of many, many streams). One bottleneck in particular is that
each client waits on its own results, but every client is woken up after
every batchbuffer - hence the thunder of hooves as every client must then
do its heavyweight dance to read a coherent seqno to see if it is the
lucky one.

Ideally, we only want one client to wake up after the interrupt and
check its request for completion. Since the requests must retire in
order, we can select the first client on the oldest request to be woken.
Once that client has completed his wait, we can then wake up the
next client and so on. However, all clients then incur latency as every
process in the chain may be delayed for scheduling - this may also then
cause some priority inversion. To reduce the latency, when a client
is added or removed from the list, we scan the tree for completed
seqno and wake up all the completed waiters in parallel.
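
To make the wake-up policy concrete, here is a sketch of the interrupt-side
selection (illustrative only; the breadcrumbs field and struct names follow
the debugfs hunk below and may not match the final patch):

/* Wake only the oldest waiter; it hands over to the next waiter in
 * seqno order once its own request has completed.
 */
static void wake_first_waiter(struct intel_engine_cs *ring)
{
	struct rb_node *rb;

	spin_lock(&ring->breadcrumbs.lock);
	rb = rb_first(&ring->breadcrumbs.requests); /* lowest seqno first */
	if (rb) {
		struct intel_breadcrumb *b =
			container_of(rb, struct intel_breadcrumb, node);
		wake_up_process(b->task);
	}
	spin_unlock(&ring->breadcrumbs.lock);
}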

v2: Convert from a kworker per engine into a dedicated kthread for the
bottom-half.
v3: Rename request members and tweak comments.
v4: Use a per-engine spinlock in the breadcrumbs bottom-half.
v5: Fix race in locklessly checking waiter status and kicking the task on
adding a new waiter.
v6: Fix deciding when to force the timer to hide missing interrupts.
v7: Move the bottom-half from the kthread to the first client process.
v8: Reword a few comments
v9: Break the busy loop when the interrupt is unmasked or has fired.
v10: Comments, unnecessary churn, better debugging from Tvrtko
v11: Wake all completed waiters on removing the current bottom-half to
reduce the latency of waking up a herd of clients all waiting on the
same request.
v12: Rearrange missed-interrupt fault injection so that it works with
igt/drv_missed_irq_hang

Signed-off-by: Chris Wilson 
Cc: "Rogozhkin, Dmitry V" 
Cc: "Gong, Zhipeng" 
Cc: Tvrtko Ursulin 
Cc: Dave Gordon 
---
 drivers/gpu/drm/i915/Makefile|   1 +
 drivers/gpu/drm/i915/i915_debugfs.c  |  19 ++-
 drivers/gpu/drm/i915/i915_drv.h  |   3 +-
 drivers/gpu/drm/i915/i915_gem.c  | 152 -
 drivers/gpu/drm/i915/i915_gpu_error.c|   2 +-
 drivers/gpu/drm/i915/i915_irq.c  |  14 +-
 drivers/gpu/drm/i915/intel_breadcrumbs.c | 274 +++
 drivers/gpu/drm/i915/intel_lrc.c |   5 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c  |   5 +-
 drivers/gpu/drm/i915/intel_ringbuffer.h  |  63 ++-
 10 files changed, 436 insertions(+), 102 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/intel_breadcrumbs.c

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 0851de07bd13..d3b9d3618719 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -35,6 +35,7 @@ i915-y += i915_cmd_parser.o \
  i915_gem_userptr.o \
  i915_gpu_error.o \
  i915_trace_points.o \
+ intel_breadcrumbs.o \
  intel_lrc.o \
  intel_mocs.o \
  intel_ringbuffer.o \
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
b/drivers/gpu/drm/i915/i915_debugfs.c
index d5f66bbdb160..48e574247a30 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -730,10 +730,22 @@ static int i915_gem_request_info(struct seq_file *m, void 
*data)
 static void i915_ring_seqno_info(struct seq_file *m,
 struct intel_engine_cs *ring)
 {
+   struct rb_node *rb;
+
if (ring->get_seqno) {
seq_printf(m, "Current sequence (%s): %x\n",
   ring->name, ring->get_seqno(ring, false));
}
+
+   spin_lock(&ring->breadcrumbs.lock);
+   for (rb = rb_first(&ring->breadcrumbs.requests);
+rb != NULL;
+rb = rb_next(rb)) {
+   struct intel_breadcrumb *b = container_of(rb, typeof(*b), node);
+   seq_printf(m, "Waiting (%s): %s [%d] on %x\n",
+  ring->name, b->task->comm, b->task->pid, b->seqno);
+   }
+   spin_unlock(&ring->breadcrumbs.lock);
 }
 
 static int i915_gem_seqno_info(struct seq_file *m, void *data)
@@ -1356,8 +1368,9 @@ static int i915_hangcheck_info(struct seq_file *m, void 
*unused)
 
for_each_ring(ring, dev_priv, i) {
seq_printf(m, "%s:\n", ring->name);
-   seq_printf(m, "\tseqno = %x [current %x]\n",
-  ring->hangcheck.seqno, seqno[i]);
+   seq_printf(m, "\tseqno = %x [current %x], waiters? %d\n",
+  ring->hangcheck.seqno, seqno[i],
+  intel_engine_has_waiter(ring));
seq_printf(m, "\tACTHD = 0x%08llx [current 0x%08llx]\n",
   (long 

[Intel-gfx] [PATCH 04/32] drm/i915: Hide the atomic_read(reset_counter) behind a helper

2015-12-11 Thread Chris Wilson
This is principally a little bit of syntactic sugar to hide the
atomic_read()s throughout the code to retrieve the current reset_counter.
It also provides the other utility functions to check the reset state on the
already read reset_counter, so that (in later patches) we can read it once
and do multiple tests rather than risk the value changing between tests.
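
For illustration, the read-once pattern the new helpers enable looks like
this (a sketch; check_reset_state() is a made-up caller):

static int check_reset_state(struct drm_i915_private *dev_priv)
{
	/* Sample once; every test below uses the same snapshot. */
	u32 reset = i915_reset_counter(&dev_priv->gpu_error);

	if (__i915_terminally_wedged(reset))
		return -EIO;

	if (__i915_reset_in_progress(reset))
		return -EAGAIN;

	return 0;
}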

v2: Be strictly on converting existing i915_reset_in_progress() over to
the more verbose i915_reset_in_progress_or_wedged().

Signed-off-by: Chris Wilson 
Cc: Daniel Vetter 
---
 drivers/gpu/drm/i915/i915_debugfs.c |  4 ++--
 drivers/gpu/drm/i915/i915_drv.h | 32 
 drivers/gpu/drm/i915/i915_gem.c | 16 
 drivers/gpu/drm/i915/i915_irq.c |  2 +-
 drivers/gpu/drm/i915/intel_display.c| 18 +++---
 drivers/gpu/drm/i915/intel_lrc.c|  2 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c |  4 ++--
 7 files changed, 53 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
b/drivers/gpu/drm/i915/i915_debugfs.c
index 24318b79bcfc..c26a4c087f49 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -4672,7 +4672,7 @@ i915_wedged_get(void *data, u64 *val)
struct drm_device *dev = data;
struct drm_i915_private *dev_priv = dev->dev_private;
 
-   *val = atomic_read(&dev_priv->gpu_error.reset_counter);
+   *val = i915_reset_counter(&dev_priv->gpu_error);
 
return 0;
 }
@@ -4691,7 +4691,7 @@ i915_wedged_set(void *data, u64 val)
 * while it is writing to 'i915_wedged'
 */
 
-   if (i915_reset_in_progress(&dev_priv->gpu_error))
+   if (i915_reset_in_progress_or_wedged(&dev_priv->gpu_error))
return -EAGAIN;
 
intel_runtime_pm_get(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 8c4303b664d9..466caa0bc043 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2992,20 +2992,44 @@ void i915_gem_retire_requests_ring(struct 
intel_engine_cs *ring);
 int __must_check i915_gem_check_wedge(struct i915_gpu_error *error,
  bool interruptible);
 
+static inline u32 i915_reset_counter(struct i915_gpu_error *error)
+{
+   return atomic_read(>reset_counter);
+}
+
+static inline bool __i915_reset_in_progress(u32 reset)
+{
+   return unlikely(reset & I915_RESET_IN_PROGRESS_FLAG);
+}
+
+static inline bool __i915_reset_in_progress_or_wedged(u32 reset)
+{
+   return unlikely(reset & (I915_RESET_IN_PROGRESS_FLAG | I915_WEDGED));
+}
+
+static inline bool __i915_terminally_wedged(u32 reset)
+{
+   return unlikely(reset & I915_WEDGED);
+}
+
 static inline bool i915_reset_in_progress(struct i915_gpu_error *error)
 {
-   return unlikely(atomic_read(&error->reset_counter)
-   & (I915_RESET_IN_PROGRESS_FLAG | I915_WEDGED));
+   return __i915_reset_in_progress(i915_reset_counter(error));
+}
+
+static inline bool i915_reset_in_progress_or_wedged(struct i915_gpu_error 
*error)
+{
+   return __i915_reset_in_progress_or_wedged(i915_reset_counter(error));
 }
 
 static inline bool i915_terminally_wedged(struct i915_gpu_error *error)
 {
-   return atomic_read(&error->reset_counter) & I915_WEDGED;
+   return __i915_terminally_wedged(i915_reset_counter(error));
 }
 
 static inline u32 i915_reset_count(struct i915_gpu_error *error)
 {
-   return ((atomic_read(&error->reset_counter) & ~I915_WEDGED) + 1) / 2;
+   return ((i915_reset_counter(error) & ~I915_WEDGED) + 1) / 2;
 }
 
 static inline bool i915_stop_ring_allow_ban(struct drm_i915_private *dev_priv)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 29d98ddbbc80..0b3e0534baa3 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -85,7 +85,7 @@ i915_gem_wait_for_error(struct i915_gpu_error *error)
 {
int ret;
 
-#define EXIT_COND (!i915_reset_in_progress(error) || \
+#define EXIT_COND (!i915_reset_in_progress_or_wedged(error) || \
   i915_terminally_wedged(error))
if (EXIT_COND)
return 0;
@@ -1113,7 +1113,7 @@ int
 i915_gem_check_wedge(struct i915_gpu_error *error,
 bool interruptible)
 {
-   if (i915_reset_in_progress(error)) {
+   if (i915_reset_in_progress_or_wedged(error)) {
/* Non-interruptible callers can't handle -EAGAIN, hence return
 * -EIO unconditionally for these. */
if (!interruptible)
@@ -1297,7 +1297,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 
/* We need to check whether any gpu reset happened in between
 * the caller grabbing the seqno and now ... */
-   if (reset_counter != 
atomic_read(&dev_priv->gpu_error.reset_counter)) {
+   if (reset_counter != 

[Intel-gfx] [PATCH 03/32] drm/i915: Only spin whilst waiting on the current request

2015-12-11 Thread Chris Wilson
Limit busywaiting only to the request currently being processed by the
GPU. If the request is not currently being processed by the GPU, there
is a very low likelihood of it being completed within the 2 microsecond
spin timeout and so we will just be wasting CPU cycles.

v2: Check for logical inversion when rebasing - we were incorrectly
checking for this request being active, and instead busywaiting for
when the GPU was not yet processing the request of interest.

v3: Try another colour for the seqno names.
v4: Another colour for the function names.

v5: Remove the forced coherency when checking for the active request. On
reflection and plenty of recent experimentation, the issue is not a
cache coherency problem - but an irq/seqno ordering problem (timing issue).
Here, we do not need the w/a to force ordering of the read with an
interrupt.

Signed-off-by: Chris Wilson 
Reviewed-by: Tvrtko Ursulin 
Cc: "Rogozhkin, Dmitry V" 
Cc: Daniel Vetter 
Cc: Tvrtko Ursulin 
Cc: Eero Tamminen 
Cc: "Rantala, Valtteri" 
Cc: sta...@vger.kernel.org
---
 drivers/gpu/drm/i915/i915_drv.h | 27 +++
 drivers/gpu/drm/i915/i915_gem.c |  8 +++-
 2 files changed, 26 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 5edd39352e97..8c4303b664d9 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2182,8 +2182,17 @@ struct drm_i915_gem_request {
struct drm_i915_private *i915;
struct intel_engine_cs *ring;
 
-   /** GEM sequence number associated with this request. */
-   uint32_t seqno;
+/** GEM sequence number associated with the previous request,
+ * when the HWS breadcrumb is equal to this the GPU is processing
+ * this request.
+ */
+   u32 previous_seqno;
+
+/** GEM sequence number associated with this request,
+ * when the HWS breadcrumb is equal or greater than this the GPU
+ * has finished processing this request.
+ */
+   u32 seqno;
 
/** Position in the ringbuffer of the start of the request */
u32 head;
@@ -2958,15 +2967,17 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2)
return (int32_t)(seq1 - seq2) >= 0;
 }
 
+static inline bool i915_gem_request_started(struct drm_i915_gem_request *req,
+  bool lazy_coherency)
+{
+   u32 seqno = req->ring->get_seqno(req->ring, lazy_coherency);
+   return i915_seqno_passed(seqno, req->previous_seqno);
+}
+
 static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
  bool lazy_coherency)
 {
-   u32 seqno;
-
-   BUG_ON(req == NULL);
-
-   seqno = req->ring->get_seqno(req->ring, lazy_coherency);
-
+   u32 seqno = req->ring->get_seqno(req->ring, lazy_coherency);
return i915_seqno_passed(seqno, req->seqno);
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 46a84c447d8f..29d98ddbbc80 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1193,9 +1193,13 @@ static int __i915_spin_request(struct 
drm_i915_gem_request *req, int state)
 * takes to sleep on a request, on the order of a microsecond.
 */
 
-   if (i915_gem_request_get_ring(req)->irq_refcount)
+   if (req->ring->irq_refcount)
return -EBUSY;
 
+   /* Only spin if we know the GPU is processing this request */
+   if (!i915_gem_request_started(req, true))
+   return -EAGAIN;
+
timeout = local_clock_us() + 5;
while (!need_resched()) {
if (i915_gem_request_completed(req, true))
@@ -1209,6 +1213,7 @@ static int __i915_spin_request(struct 
drm_i915_gem_request *req, int state)
 
cpu_relax_lowlatency();
}
+
if (i915_gem_request_completed(req, false))
return 0;
 
@@ -2600,6 +2605,7 @@ void __i915_add_request(struct drm_i915_gem_request 
*request,
request->batch_obj = obj;
 
request->emitted_jiffies = jiffies;
+   request->previous_seqno = ring->last_submitted_seqno;
ring->last_submitted_seqno = request->seqno;
list_add_tail(>list, >request_list);
 
-- 
2.6.3



[Intel-gfx] [PATCH 20/32] drm/i915: Replace manual barrier() with READ_ONCE() in HWS accessor

2015-12-11 Thread Chris Wilson
When reading from the HWS page, we use barrier() to prevent the compiler
optimising away the read from the volatile (may be updated by the GPU)
memory address. This is more suited to READ_ONCE(); make it so.

Signed-off-by: Chris Wilson 
Cc: Daniel Vetter 
---
 drivers/gpu/drm/i915/intel_ringbuffer.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h 
b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 7ad06cbef6be..a35c17106f4b 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -416,8 +416,7 @@ intel_read_status_page(struct intel_engine_cs *ring,
   int reg)
 {
/* Ensure that the compiler doesn't optimize away the load. */
-   barrier();
-   return ring->status_page.page_addr[reg];
+   return READ_ONCE(ring->status_page.page_addr[reg]);
 }
 
 static inline void
-- 
2.6.3



[Intel-gfx] [PATCH 10/32] drm/i915: Suppress error message when GPU resets are disabled

2015-12-11 Thread Chris Wilson
If we do not have low-level support for resetting the GPU, or if the user
has explicitly disabled resetting the device, the failure is expected.
Since it is an expected failure, we should be using a lower priority
message than *ERROR*, perhaps NOTICE. In the absence of DRM_NOTICE, just
emit the expected failure as a DEBUG message.

Signed-off-by: Chris Wilson 
Cc: Daniel Vetter 
Reviewed-by: Daniel Vetter 
---
 drivers/gpu/drm/i915/i915_drv.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 8bdc51bc00a4..ba91f65b6082 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -895,7 +895,10 @@ int i915_reset(struct drm_device *dev)
pr_notice("drm/i915: Resetting chip after gpu hang\n");
 
if (ret) {
-   DRM_ERROR("Failed to reset chip: %i\n", ret);
+   if (ret != -ENODEV)
+   DRM_ERROR("Failed to reset chip: %i\n", ret);
+   else
+   DRM_DEBUG_DRIVER("GPU reset disabled\n");
goto error;
}
 
-- 
2.6.3



[Intel-gfx] [PATCH 21/32] drm/i915: Broadwell execlists needs exactly the same seqno w/a as legacy

2015-12-11 Thread Chris Wilson
In legacy mode, we use the gen6 seqno barrier to insert a delay after
the interrupt before reading the seqno (as the seqno write is not
flushed before the interrupt is sent, the interrupt arrives before the
seqno is visible). Execlists ignored the evidence of igt.

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/intel_lrc.c | 39 +--
 1 file changed, 21 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 91e5ed6867e5..a73c5e671423 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1745,18 +1745,24 @@ static int gen8_emit_flush_render(struct 
drm_i915_gem_request *request,
return 0;
 }
 
-static void bxt_seqno_barrier(struct intel_engine_cs *ring)
+static void
+gen6_seqno_barrier(struct intel_engine_cs *ring)
 {
-   /*
-* On BXT A steppings there is a HW coherency issue whereby the
-* MI_STORE_DATA_IMM storing the completed request's seqno
-* occasionally doesn't invalidate the CPU cache. Work around this by
-* clflushing the corresponding cacheline whenever the caller wants
-* the coherency to be guaranteed. Note that this cacheline is known
-* to be clean at this point, since we only write it in
-* bxt_a_set_seqno(), where we also do a clflush after the write. So
-* this clflush in practice becomes an invalidate operation.
+   /* Workaround to force correct ordering between irq and seqno writes on
+* ivb (and maybe also on snb) by reading from a CS register (like
+* ACTHD) before reading the status page.
+*
+* Note that this effectively stalls the read by the time
+* it takes to do a memory transaction, which more or less ensures
+* that the write from the GPU has sufficient time to invalidate
+* the CPU cacheline. Alternatively we could delay the interrupt from
+* the CS ring to give the write time to land, but that would incur
+* a delay after every batch i.e. much more frequent than a delay
+* when waiting for the interrupt (with the same net latency).
 */
+   struct drm_i915_private *dev_priv = ring->i915;
+   POSTING_READ_FW(RING_ACTHD(ring->mmio_base));
+
intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
 }
 
@@ -1954,8 +1960,7 @@ static int logical_render_ring_init(struct drm_device 
*dev)
ring->init_hw = gen8_init_render_ring;
ring->init_context = gen8_init_rcs_context;
ring->cleanup = intel_fini_pipe_control;
-   if (IS_BXT_REVID(dev, 0, BXT_REVID_A1))
-   ring->seqno_barrier = bxt_seqno_barrier;
+   ring->seqno_barrier = gen6_seqno_barrier;
ring->emit_request = gen8_emit_request;
ring->emit_flush = gen8_emit_flush_render;
ring->irq_get = gen8_logical_ring_get_irq;
@@ -2001,8 +2006,7 @@ static int logical_bsd_ring_init(struct drm_device *dev)
GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS1_IRQ_SHIFT;
 
ring->init_hw = gen8_init_common_ring;
-   if (IS_BXT_REVID(dev, 0, BXT_REVID_A1))
-   ring->seqno_barrier = bxt_seqno_barrier;
+   ring->seqno_barrier = gen6_seqno_barrier;
ring->emit_request = gen8_emit_request;
ring->emit_flush = gen8_emit_flush;
ring->irq_get = gen8_logical_ring_get_irq;
@@ -2026,6 +2030,7 @@ static int logical_bsd2_ring_init(struct drm_device *dev)
GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS2_IRQ_SHIFT;
 
ring->init_hw = gen8_init_common_ring;
+   ring->seqno_barrier = gen6_seqno_barrier;
ring->emit_request = gen8_emit_request;
ring->emit_flush = gen8_emit_flush;
ring->irq_get = gen8_logical_ring_get_irq;
@@ -2049,8 +2054,7 @@ static int logical_blt_ring_init(struct drm_device *dev)
GT_CONTEXT_SWITCH_INTERRUPT << GEN8_BCS_IRQ_SHIFT;
 
ring->init_hw = gen8_init_common_ring;
-   if (IS_BXT_REVID(dev, 0, BXT_REVID_A1))
-   ring->seqno_barrier = bxt_seqno_barrier;
+   ring->seqno_barrier = gen6_seqno_barrier;
ring->emit_request = gen8_emit_request;
ring->emit_flush = gen8_emit_flush;
ring->irq_get = gen8_logical_ring_get_irq;
@@ -2074,8 +2078,7 @@ static int logical_vebox_ring_init(struct drm_device *dev)
GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VECS_IRQ_SHIFT;
 
ring->init_hw = gen8_init_common_ring;
-   if (IS_BXT_REVID(dev, 0, BXT_REVID_A1))
-   ring->seqno_barrier = bxt_seqno_barrier;
+   ring->seqno_barrier = gen6_seqno_barrier;
ring->emit_request = gen8_emit_request;
ring->emit_flush = gen8_emit_flush;
ring->irq_get = gen8_logical_ring_get_irq;
-- 
2.6.3


[Intel-gfx] [PATCH 16/32] drm/i915: Separate out the seqno-barrier from engine->get_seqno

2015-12-11 Thread Chris Wilson
In order to simplify the next couple of patches, extract the
lazy_coherency optimisation out of the engine->get_seqno() vfunc into
its own callback.

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/i915_debugfs.c  |  6 ++---
 drivers/gpu/drm/i915/i915_drv.h  | 12 ++
 drivers/gpu/drm/i915/i915_gpu_error.c|  2 +-
 drivers/gpu/drm/i915/i915_irq.c  |  4 ++--
 drivers/gpu/drm/i915/i915_trace.h|  2 +-
 drivers/gpu/drm/i915/intel_breadcrumbs.c |  4 ++--
 drivers/gpu/drm/i915/intel_lrc.c | 39 
 drivers/gpu/drm/i915/intel_ringbuffer.c  | 36 +++--
 drivers/gpu/drm/i915/intel_ringbuffer.h  |  4 ++--
 9 files changed, 53 insertions(+), 56 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
b/drivers/gpu/drm/i915/i915_debugfs.c
index 48e574247a30..6344fe69ab82 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -600,7 +600,7 @@ static int i915_gem_pageflip_info(struct seq_file *m, void 
*data)
   ring->name,
   
i915_gem_request_get_seqno(work->flip_queued_req),
   dev_priv->next_seqno,
-  ring->get_seqno(ring, true),
+  ring->get_seqno(ring),
   
i915_gem_request_completed(work->flip_queued_req, true));
} else
seq_printf(m, "Flip not associated with any 
ring\n");
@@ -734,7 +734,7 @@ static void i915_ring_seqno_info(struct seq_file *m,
 
if (ring->get_seqno) {
seq_printf(m, "Current sequence (%s): %x\n",
-  ring->name, ring->get_seqno(ring, false));
+  ring->name, ring->get_seqno(ring));
}
 
spin_lock(&ring->breadcrumbs.lock);
@@ -1353,7 +1353,7 @@ static int i915_hangcheck_info(struct seq_file *m, void 
*unused)
intel_runtime_pm_get(dev_priv);
 
for_each_ring(ring, dev_priv, i) {
-   seqno[i] = ring->get_seqno(ring, false);
+   seqno[i] = ring->get_seqno(ring);
acthd[i] = intel_ring_get_active_head(ring);
}
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 830d760aa562..ff83f148658f 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2981,15 +2981,19 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2)
 static inline bool i915_gem_request_started(struct drm_i915_gem_request *req,
   bool lazy_coherency)
 {
-   u32 seqno = req->ring->get_seqno(req->ring, lazy_coherency);
-   return i915_seqno_passed(seqno, req->previous_seqno);
+   if (!lazy_coherency && req->ring->seqno_barrier)
+   req->ring->seqno_barrier(req->ring);
+   return i915_seqno_passed(req->ring->get_seqno(req->ring),
+req->previous_seqno);
 }
 
 static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
  bool lazy_coherency)
 {
-   u32 seqno = req->ring->get_seqno(req->ring, lazy_coherency);
-   return i915_seqno_passed(seqno, req->seqno);
+   if (!lazy_coherency && req->ring->seqno_barrier)
+   req->ring->seqno_barrier(req->ring);
+   return i915_seqno_passed(req->ring->get_seqno(req->ring),
+req->seqno);
 }
 
 int __must_check i915_gem_get_seqno(struct drm_device *dev, u32 *seqno);
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c 
b/drivers/gpu/drm/i915/i915_gpu_error.c
index f805d117f3d1..01d0206ca4dd 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -902,8 +902,8 @@ static void i915_record_ring_state(struct drm_device *dev,
 
ering->waiting = intel_engine_has_waiter(ring);
ering->instpm = I915_READ(RING_INSTPM(ring->mmio_base));
-   ering->seqno = ring->get_seqno(ring, false);
ering->acthd = intel_ring_get_active_head(ring);
+   ering->seqno = ring->get_seqno(ring);
ering->start = I915_READ_START(ring);
ering->head = I915_READ_HEAD(ring);
ering->tail = I915_READ_TAIL(ring);
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index d250b4721a6a..da3c8aaa50a3 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2875,7 +2875,7 @@ static int semaphore_passed(struct intel_engine_cs *ring)
if (signaller->hangcheck.deadlock >= I915_NUM_RINGS)
return -1;
 
-   if (i915_seqno_passed(signaller->get_seqno(signaller, false), seqno))
+   if (i915_seqno_passed(signaller->get_seqno(signaller), seqno))
return 1;
 
/* cursory check for 

[Intel-gfx] [PATCH 08/32] drm/i915: Simplify reset_counter handling during atomic modesetting

2015-12-11 Thread Chris Wilson
Now that the reset_counter is stored on the request, we can rearrange
the code that handles reading the counter versus waiting during atomic
modesetting for readability (by deleting the hairiest of the code).

Signed-off-by: Chris Wilson 
Cc: Daniel Vetter 
---
 drivers/gpu/drm/i915/intel_display.c | 18 +++---
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_display.c 
b/drivers/gpu/drm/i915/intel_display.c
index d59beca928b7..d7bbd015de35 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -13393,9 +13393,9 @@ static int intel_atomic_prepare_commit(struct 
drm_device *dev,
return ret;
 
ret = drm_atomic_helper_prepare_planes(dev, state);
-   if (!ret && !async && 
!i915_reset_in_progress_or_wedged(&dev_priv->gpu_error)) {
-   mutex_unlock(&dev->struct_mutex);
+   mutex_unlock(&dev->struct_mutex);
 
+   if (!ret && !async) {
for_each_plane_in_state(state, plane, plane_state, i) {
struct intel_plane_state *intel_plane_state =
to_intel_plane_state(plane_state);
@@ -13409,19 +13409,15 @@ static int intel_atomic_prepare_commit(struct 
drm_device *dev,
/* Swallow -EIO errors to allow updates during hw 
lockup. */
if (ret == -EIO)
ret = 0;
-
-   if (ret)
+   if (ret) {
+   mutex_lock(&dev->struct_mutex);
+   drm_atomic_helper_cleanup_planes(dev, state);
+   mutex_unlock(&dev->struct_mutex);
break;
+   }
}
-
-   if (!ret)
-   return 0;
-
-   mutex_lock(&dev->struct_mutex);
-   drm_atomic_helper_cleanup_planes(dev, state);
}
 
-   mutex_unlock(&dev->struct_mutex);
return ret;
 }
 
-- 
2.6.3



[Intel-gfx] [PATCH 11/32] drm/i915: Delay queuing hangcheck to wait-request

2015-12-11 Thread Chris Wilson
We can forgo queuing the hangcheck from the start of every request
until we wait upon a request. This reduces the overhead of every
request, but may increase the latency of detecting a hang. However, if
nothing ever waits upon a hang, did it ever hang? It also improves the
robustness of the wait-request by ensuring that the hangchecker is
indeed running before we sleep indefinitely (and thereby ensuring that
we never actually sleep forever waiting for a dead GPU).

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/i915_drv.h | 2 +-
 drivers/gpu/drm/i915/i915_gem.c | 5 +++--
 drivers/gpu/drm/i915/i915_irq.c | 9 -
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 7acbc072973a..987a35c5af72 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2723,7 +2723,7 @@ void intel_hpd_cancel_work(struct drm_i915_private 
*dev_priv);
 bool intel_hpd_pin_to_port(enum hpd_pin pin, enum port *port);
 
 /* i915_irq.c */
-void i915_queue_hangcheck(struct drm_device *dev);
+void i915_queue_hangcheck(struct drm_i915_private *dev_priv);
 __printf(3, 4)
 void i915_handle_error(struct drm_device *dev, bool wedged,
   const char *fmt, ...);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index f5760869a17c..0340a5fe9cda 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1308,6 +1308,9 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
break;
}
 
+   /* Ensure that even if the GPU hangs, we get woken up. */
+   i915_queue_hangcheck(dev_priv);
+
timer.function = NULL;
if (timeout || missed_irq(dev_priv, ring)) {
unsigned long expire;
@@ -2584,8 +2587,6 @@ void __i915_add_request(struct drm_i915_gem_request 
*request,
 
trace_i915_gem_request_add(request);
 
-   i915_queue_hangcheck(ring->dev);
-
queue_delayed_work(dev_priv->wq,
   &dev_priv->mm.retire_work,
   round_jiffies_up_relative(HZ));
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 88206c0404d7..21089ac5dd58 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -3066,15 +3066,14 @@ static void i915_hangcheck_elapsed(struct work_struct 
*work)
if (rings_hung)
return i915_handle_error(dev, true, "Ring hung");
 
+   /* Reset timer in case GPU hangs without another request being added */
if (busy_count)
-   /* Reset timer case chip hangs without another request
-* being added */
-   i915_queue_hangcheck(dev);
+   i915_queue_hangcheck(dev_priv);
 }
 
-void i915_queue_hangcheck(struct drm_device *dev)
+void i915_queue_hangcheck(struct drm_i915_private *dev_priv)
 {
-   struct i915_gpu_error *e = &to_i915(dev)->gpu_error;
+   struct i915_gpu_error *e = &dev_priv->gpu_error;
 
if (!i915.enable_hangcheck)
return;
-- 
2.6.3



[Intel-gfx] [PATCH 13/32] drm/i915: Make queueing the hangcheck work inline

2015-12-11 Thread Chris Wilson
Since the function is a small wrapper around schedule_delayed_work(),
move it inline to remove the function call overhead for the principal
caller.

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/i915_drv.h | 17 -
 drivers/gpu/drm/i915/i915_irq.c | 16 
 2 files changed, 16 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 9304ecfa05d4..f82e8fb19c9b 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2722,7 +2722,22 @@ void intel_hpd_cancel_work(struct drm_i915_private 
*dev_priv);
 bool intel_hpd_pin_to_port(enum hpd_pin pin, enum port *port);
 
 /* i915_irq.c */
-void i915_queue_hangcheck(struct drm_i915_private *dev_priv);
+static inline void i915_queue_hangcheck(struct drm_i915_private *dev_priv)
+{
+   unsigned long delay;
+
+   if (unlikely(!i915.enable_hangcheck))
+   return;
+
+   /* Don't continually defer the hangcheck so that it is always run at
+* least once after work has been scheduled on any ring. Otherwise,
+* we will ignore a hung ring if a second ring is kept busy.
+*/
+
+   delay = round_jiffies_up_relative(DRM_I915_HANGCHECK_JIFFIES);
+   schedule_delayed_work(&dev_priv->gpu_error.hangcheck_work, delay);
+}
+
 __printf(3, 4)
 void i915_handle_error(struct drm_device *dev, bool wedged,
   const char *fmt, ...);
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index afe04aeb858d..5f88869e2207 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -3071,22 +3071,6 @@ static void i915_hangcheck_elapsed(struct work_struct 
*work)
i915_queue_hangcheck(dev_priv);
 }
 
-void i915_queue_hangcheck(struct drm_i915_private *dev_priv)
-{
-   unsigned long delay;
-
-   if (!i915.enable_hangcheck)
-   return;
-
-   /* Don't continually defer the hangcheck so that it is always run at
-* least once after work has been scheduled on any ring. Otherwise,
-* we will ignore a hung ring if a second ring is kept busy.
-*/
-
-   delay = round_jiffies_up_relative(DRM_I915_HANGCHECK_JIFFIES);
-   schedule_delayed_work(&dev_priv->gpu_error.hangcheck_work, delay);
-}
-
 static void ibx_irq_reset(struct drm_device *dev)
 {
struct drm_i915_private *dev_priv = dev->dev_private;
-- 
2.6.3



[Intel-gfx] [PATCH 29/32] drm/i915: Only start retire worker when idle

2015-12-11 Thread Chris Wilson
The retire worker is a low frequency task that makes sure we retire
outstanding requests if userspace is being lax. We only need to start it
once as it remains active until the GPU is idle, so do a cheap test
before the more expensive queue_work(). A consequence of this is that we
need correct locking in the worker to make the hot path of request
submission cheap. To keep the symmetry and keep hangcheck strictly bound
by the GPU's wakelock, we move the cancel_sync(hangcheck) to the idle
worker before dropping the wakelock.

Signed-off-by: Chris Wilson 
References: https://bugs.freedesktop.org/show_bug.cgi?id=88437
---
 drivers/gpu/drm/i915/i915_drv.c  |  2 -
 drivers/gpu/drm/i915/i915_drv.h  |  2 +-
 drivers/gpu/drm/i915/i915_gem.c  | 97 +---
 drivers/gpu/drm/i915/intel_display.c | 29 ---
 4 files changed, 69 insertions(+), 61 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index ba91f65b6082..0f79ee1d35a2 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1472,8 +1472,6 @@ static int intel_runtime_suspend(struct device *device)
i915_gem_release_all_mmaps(dev_priv);
mutex_unlock(>struct_mutex);
 
-   cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work);
-
intel_guc_suspend(dev);
 
intel_suspend_gt_powersave(dev);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index dabfb043362f..834cc779a9db 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2996,7 +2996,7 @@ int __must_check i915_gem_set_seqno(struct drm_device 
*dev, u32 seqno);
 struct drm_i915_gem_request *
 i915_gem_find_active_request(struct intel_engine_cs *ring);
 
-bool i915_gem_retire_requests(struct drm_device *dev);
+void i915_gem_retire_requests(struct drm_device *dev);
 void i915_gem_retire_requests_ring(struct intel_engine_cs *ring);
 
 static inline u32 i915_reset_counter(struct i915_gpu_error *error)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index fdd9dd5296e9..d1a7a7f8f3ad 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2495,6 +2495,51 @@ i915_gem_get_seqno(struct drm_device *dev, u32 *seqno)
return 0;
 }
 
+static void i915_gem_mark_busy(struct drm_i915_private *dev_priv)
+{
+   if (dev_priv->mm.busy)
+   return;
+
+   intel_runtime_pm_get_noresume(dev_priv);
+
+   i915_update_gfx_val(dev_priv);
+   if (INTEL_INFO(dev_priv)->gen >= 6)
+   gen6_rps_busy(dev_priv);
+
+   queue_delayed_work(dev_priv->wq,
+  &dev_priv->mm.retire_work,
+  round_jiffies_up_relative(HZ));
+
+   dev_priv->mm.busy = true;
+}
+
+static void kick_waiters(struct drm_i915_private *dev_priv)
+{
+   struct intel_engine_cs *ring;
+   int i;
+
+   for_each_ring(ring, dev_priv, i) {
+   if (!intel_engine_has_waiter(ring))
+   continue;
+
+   set_bit(ring->id, &dev_priv->gpu_error.missed_irq_rings);
+   intel_engine_wakeup(ring);
+   }
+}
+
+static void i915_gem_mark_idle(struct drm_i915_private *dev_priv)
+{
+   dev_priv->mm.busy = false;
+
+   if (cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work))
+   kick_waiters(dev_priv);
+
+   if (INTEL_INFO(dev_priv)->gen >= 6)
+   gen6_rps_idle(dev_priv);
+
+   intel_runtime_pm_put(dev_priv);
+}
+
 /*
  * NB: This function is not allowed to fail. Doing so would mean that the
  * request is not being tracked for completion but the work itself is
@@ -2575,10 +2620,7 @@ void __i915_add_request(struct drm_i915_gem_request 
*request,
 
trace_i915_gem_request_add(request);
 
-   queue_delayed_work(dev_priv->wq,
-  &dev_priv->mm.retire_work,
-  round_jiffies_up_relative(HZ));
-   intel_mark_busy(dev_priv->dev);
+   i915_gem_mark_busy(dev_priv);
 
/* Sanity check that the reserved size was large enough. */
intel_ring_reserved_space_end(ringbuf);
@@ -2910,7 +2952,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs 
*ring)
WARN_ON(i915_verify_lists(ring->dev));
 }
 
-bool
+void
 i915_gem_retire_requests(struct drm_device *dev)
 {
struct drm_i915_private *dev_priv = dev->dev_private;
@@ -2934,10 +2976,8 @@ i915_gem_retire_requests(struct drm_device *dev)
 
if (idle)
mod_delayed_work(dev_priv->wq,
-  &dev_priv->mm.idle_work,
-  msecs_to_jiffies(100));
-
-   return idle;
+&dev_priv->mm.idle_work,
+msecs_to_jiffies(100));
 }
 
 static void
@@ -2946,16 +2986,20 @@ i915_gem_retire_work_handler(struct work_struct *work)
struct drm_i915_private 

[Intel-gfx] [PATCH 02/32] drm/i915: Limit the busy wait on requests to 5us not 10ms!

2015-12-11 Thread Chris Wilson
When waiting for high frequency requests, the finite amount of time
required to set up the irq and wait upon it limits the response rate. By
busywaiting on the request completion for a short while we can service
the high frequency waits as quick as possible. However, if it is a slow
request, we want to sleep as quickly as possible. The tradeoff between
waiting and sleeping is roughly the time it takes to sleep on a request,
on the order of a microsecond. Based on measurements of synchronous
workloads from across big core and little atom, I have set the limit for
busywaiting as 10 microseconds. In most of the synchronous cases, we can
reduce the limit down to as little as 2 microseconds, but that leaves
quite a few test cases regressing by factors of 3 and more.

The code currently uses the jiffie clock, but that is far too coarse (on
the order of 10 milliseconds) and results in poor interactivity as the
CPU ends up being hogged by slow requests. To get microsecond resolution
we need to use a high resolution timer. The cheapest of which is polling
local_clock(), but that is only valid on the same CPU. If we switch CPUs
because the task was preempted, we can also use that as an indicator that
 the system is too busy to waste cycles on spinning and we should sleep
instead.

__i915_spin_request was introduced in
commit 2def4ad99befa25775dd2f714fdd4d92faec6e34 [v4.2]
Author: Chris Wilson 
Date:   Tue Apr 7 16:20:41 2015 +0100

 drm/i915: Optimistically spin for the request completion

v2: Drop full u64 for unsigned long - the timer is 32bit wraparound safe,
so we can use native register sizes on smaller architectures. Mention
the approximate microseconds units for elapsed time and add some extra
comments describing the reason for busywaiting.

v3: Raise the limit to 10us
v4: Now 5us.

Reported-by: Jens Axboe 
Link: https://lkml.org/lkml/2015/11/12/621
Reviewed-by: Tvrtko Ursulin 
Cc: "Rogozhkin, Dmitry V" 
Cc: Daniel Vetter 
Cc: Tvrtko Ursulin 
Cc: Eero Tamminen 
Cc: "Rantala, Valtteri" 
Cc: sta...@vger.kernel.org
---
 drivers/gpu/drm/i915/i915_gem.c | 47 +++--
 1 file changed, 45 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 7e1246410afc..46a84c447d8f 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1146,14 +1146,57 @@ static bool missed_irq(struct drm_i915_private 
*dev_priv,
return test_bit(ring->id, _priv->gpu_error.missed_irq_rings);
 }
 
+static unsigned long local_clock_us(unsigned *cpu)
+{
+   unsigned long t;
+
+   /* Cheaply and approximately convert from nanoseconds to microseconds.
+* The result and subsequent calculations are also defined in the same
+* approximate microseconds units. The principal source of timing
+* error here is from the simple truncation.
+*
+* Note that local_clock() is only defined wrt to the current CPU;
+* the comparisons are no longer valid if we switch CPUs. Instead of
+* blocking preemption for the entire busywait, we can detect the CPU
+* switch and use that as indicator of system load and a reason to
+* stop busywaiting, see busywait_stop().
+*/
+   *cpu = get_cpu();
+   t = local_clock() >> 10;
+   put_cpu();
+
+   return t;
+}
+
+static bool busywait_stop(unsigned long timeout, unsigned cpu)
+{
+   unsigned this_cpu;
+
+   if (time_after(local_clock_us(&this_cpu), timeout))
+   return true;
+
+   return this_cpu != cpu;
+}
+
 static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
 {
unsigned long timeout;
+   unsigned cpu;
+
+   /* When waiting for high frequency requests, e.g. during synchronous
+* rendering split between the CPU and GPU, the finite amount of time
+* required to set up the irq and wait upon it limits the response
+* rate. By busywaiting on the request completion for a short while we
+* can service the high frequency waits as quick as possible. However,
+* if it is a slow request, we want to sleep as quickly as possible.
+* The tradeoff between waiting and sleeping is roughly the time it
+* takes to sleep on a request, on the order of a microsecond.
+*/
 
if (i915_gem_request_get_ring(req)->irq_refcount)
return -EBUSY;
 
-   timeout = jiffies + 1;
+   timeout = local_clock_us(&cpu) + 5;
while (!need_resched()) {
if (i915_gem_request_completed(req, true))
return 0;
@@ -1161,7 +1204,7 @@ static int __i915_spin_request(struct 
drm_i915_gem_request *req, int state)
if 

[Intel-gfx] [PATCH 06/32] drm/i915: Tighten reset_counter for reset status

2015-12-11 Thread Chris Wilson
In the reset_counter, we use two bits to track a GPU hang and reset. The
low bit is a "reset-in-progress" flag that we set to signal when we need
to break waiters in order for the recovery task to grab the mutex. As
soon as the recovery task has the mutex, we can clear that flag (which
we do by incrementing the reset_counter, thereby incrementing the global
reset epoch). By clearing that flag when the recovery task holds the
struct_mutex, we can forgo a second flag that simply tells GEM to ignore
the "reset-in-progress" flag.

The second flag we store in the reset_counter is whether the
reset failed and we consider the GPU terminally wedged. Whilst this flag
is set, all access to the GPU (at least through GEM rather than direct mmio
access) is verboten.

Signed-off-by: Chris Wilson 
Cc: Daniel Vetter 
---
 drivers/gpu/drm/i915/i915_debugfs.c |  4 ++--
 drivers/gpu/drm/i915/i915_drv.c | 39 ++---
 drivers/gpu/drm/i915/i915_drv.h |  3 ---
 drivers/gpu/drm/i915/i915_gem.c | 27 +
 drivers/gpu/drm/i915/i915_irq.c | 21 ++--
 5 files changed, 36 insertions(+), 58 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
b/drivers/gpu/drm/i915/i915_debugfs.c
index c26a4c087f49..d5f66bbdb160 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -4672,7 +4672,7 @@ i915_wedged_get(void *data, u64 *val)
struct drm_device *dev = data;
struct drm_i915_private *dev_priv = dev->dev_private;
 
-	*val = i915_reset_counter(&dev_priv->gpu_error);
+	*val = i915_terminally_wedged(&dev_priv->gpu_error);
 
return 0;
 }
@@ -4691,7 +4691,7 @@ i915_wedged_set(void *data, u64 val)
 * while it is writing to 'i915_wedged'
 */
 
-	if (i915_reset_in_progress_or_wedged(&dev_priv->gpu_error))
+	if (i915_reset_in_progress(&dev_priv->gpu_error))
return -EAGAIN;
 
intel_runtime_pm_get(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 8ddfcce92cf1..8bdc51bc00a4 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -858,23 +858,32 @@ int i915_resume_switcheroo(struct drm_device *dev)
 int i915_reset(struct drm_device *dev)
 {
struct drm_i915_private *dev_priv = dev->dev_private;
-   bool simulated;
+	struct i915_gpu_error *error = &dev_priv->gpu_error;
+   unsigned reset_counter;
int ret;
 
intel_reset_gt_powersave(dev);
 
	mutex_lock(&dev->struct_mutex);
 
-   i915_gem_reset(dev);
+   /* Clear any previous failed attempts at recovery. Time to try again. */
+	atomic_andnot(I915_WEDGED, &error->reset_counter);
 
-   simulated = dev_priv->gpu_error.stop_rings != 0;
+   /* Clear the reset-in-progress flag and increment the reset epoch. */
+	reset_counter = atomic_inc_return(&error->reset_counter);
+   if (WARN_ON(__i915_reset_in_progress(reset_counter))) {
+   ret = -EIO;
+   goto error;
+   }
+
+   i915_gem_reset(dev);
 
ret = intel_gpu_reset(dev);
 
/* Also reset the gpu hangman. */
-   if (simulated) {
+   if (error->stop_rings != 0) {
DRM_INFO("Simulated gpu hang, resetting stop_rings\n");
-   dev_priv->gpu_error.stop_rings = 0;
+   error->stop_rings = 0;
if (ret == -ENODEV) {
DRM_INFO("Reset not implemented, but ignoring "
 "error for simulated gpu hangs\n");
@@ -887,8 +896,7 @@ int i915_reset(struct drm_device *dev)
 
if (ret) {
DRM_ERROR("Failed to reset chip: %i\n", ret);
-	mutex_unlock(&dev->struct_mutex);
-	return ret;
+   goto error;
}
 
intel_overlay_reset(dev_priv);
@@ -907,20 +915,14 @@ int i915_reset(struct drm_device *dev)
 * was running at the time of the reset (i.e. we weren't VT
 * switched away).
 */
-
-	/* Used to prevent gem_check_wedged returning -EAGAIN during gpu reset */
-   dev_priv->gpu_error.reload_in_reset = true;
-
ret = i915_gem_init_hw(dev);
-
-   dev_priv->gpu_error.reload_in_reset = false;
-
-	mutex_unlock(&dev->struct_mutex);
if (ret) {
DRM_ERROR("Failed hw init on reset %d\n", ret);
-   return ret;
+   goto error;
}
 
+	mutex_unlock(&dev->struct_mutex);
+
/*
 * rps/rc6 re-init is necessary to restore state lost after the
 * reset and the re-install of gt irqs. Skip for ironlake per
@@ -931,6 +933,11 @@ int i915_reset(struct drm_device *dev)
intel_enable_gt_powersave(dev);
 
return 0;
+
+error:
+	atomic_or(I915_WEDGED, &error->reset_counter);
+	mutex_unlock(&dev->struct_mutex);
+   return ret;
 }
 
 static int i915_pci_probe(struct pci_dev 
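For reference, the two-bit scheme described in the commit message amounts to something like the following; the flag names match the helpers used in the diff, but their exact definitions (in i915_drv.h) are not shown here, so treat this as a sketch:

/* sketch; actual values live in i915_drv.h */
#define I915_RESET_IN_PROGRESS_FLAG	1
#define I915_WEDGED			(1 << 31)

static inline bool __i915_reset_in_progress(u32 reset)
{
	return reset & I915_RESET_IN_PROGRESS_FLAG;
}

static inline bool i915_reset_in_progress(struct i915_gpu_error *error)
{
	return __i915_reset_in_progress(atomic_read(&error->reset_counter));
}

static inline bool i915_terminally_wedged(struct i915_gpu_error *error)
{
	return atomic_read(&error->reset_counter) & I915_WEDGED;
}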

[Intel-gfx] [PATCH 31/32] drm/i915: Add background commentary to "waitboosting"

2015-12-11 Thread Chris Wilson
Describe the intent of boosting the GPU frequency to maximum before
waiting on the GPU.

RPS waitboosting was introduced with

commit b29c19b645287f7062e17d70fa4e9781a01a5d88
Author: Chris Wilson 
Date:   Wed Sep 25 17:34:56 2013 +0100

drm/i915: Boost RPS frequency for CPU stalls

but lacked a concise comment in the code to explain itself.

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/i915_gem.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index a0584cffa7cd..56b00bf69d89 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1246,6 +1246,22 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
}
 
trace_i915_gem_request_wait_begin(req);
+
+   /* This client is about to stall waiting for the GPU. In many cases
+* this is undesirable and limits the throughput of the system, as
+* many clients cannot continue processing user input/output whilst
+* asleep. RPS autotuning may take tens of milliseconds to respond
+* to the GPU load and thus incurs additional latency for the client.
+* We can circumvent that by promoting the GPU frequency to maximum
+* before we wait. This makes the GPU throttle up much more quickly
+* (good for benchmarks), but at a cost of spending more power
+* processing the workload (bad for battery). Not all clients even
+* want their results immediately and for them we should just let
+* the GPU select its own frequency to maximise efficiency.
+* To prevent a single client from forcing the clocks too high for
+* the whole system, we only allow each client to waitboost once
+* in a busy period.
+*/
if (INTEL_INFO(req->i915)->gen >= 6)
gen6_rps_boost(req->i915, rps, req->emitted_jiffies);
 
-- 
2.6.3

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 27/32] drm/i915: Harden detection of missed interrupts

2015-12-11 Thread Chris Wilson
Only declare a missed interrupt if we find that the GPU is idle with
waiters and a hangcheck interval has passed in which no new user
interrupts have been raised.

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/i915_debugfs.c | 6 ++
 drivers/gpu/drm/i915/i915_irq.c | 7 ++-
 drivers/gpu/drm/i915/intel_ringbuffer.h | 2 ++
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index a03ed9e38499..78506abe7882 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -735,6 +735,9 @@ static void i915_ring_seqno_info(struct seq_file *m,
seq_printf(m, "Current sequence (%s): %x\n",
   ring->name, intel_ring_get_seqno(ring));
 
+   seq_printf(m, "Current user interrupts (%s): %x\n",
+  ring->name, READ_ONCE(ring->user_interrupts));
+
	spin_lock(&ring->breadcrumbs.lock);
	for (rb = rb_first(&ring->breadcrumbs.requests);
 rb != NULL;
@@ -1369,6 +1372,9 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
seq_printf(m, "\tseqno = %x [current %x], waiters? %d\n",
   ring->hangcheck.seqno, seqno[i],
   intel_engine_has_waiter(ring));
+   seq_printf(m, "\tuser interrupts = %x [current %x]\n",
+  ring->hangcheck.user_interrupts,
+  ring->user_interrupts);
seq_printf(m, "\tACTHD = 0x%08llx [current 0x%08llx]\n",
   (long long)ring->hangcheck.acthd,
   (long long)acthd[i]);
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 64502c0d2a81..e864ebeef4ef 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1000,6 +1000,7 @@ static void notify_ring(struct intel_engine_cs *ring)
return;
 
trace_i915_gem_request_notify(ring);
+   ring->user_interrupts++;
intel_engine_wakeup(ring);
 }
 
@@ -2974,12 +2975,14 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
for_each_ring(ring, dev_priv, i) {
u64 acthd;
u32 seqno;
+   unsigned user_interrupts;
bool busy = true;
 
semaphore_clear_deadlocks(dev_priv);
 
acthd = intel_ring_get_active_head(ring);
seqno = intel_ring_get_seqno(ring);
+   user_interrupts = ring->user_interrupts;
 
if (ring->hangcheck.seqno == seqno) {
if (ring_idle(ring, seqno)) {
@@ -2987,7 +2990,8 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 
			if (intel_engine_has_waiter(ring)) {
				/* Issue a wake-up to catch stuck h/w. */
-				if (!test_and_set_bit(ring->id, &dev_priv->gpu_error.missed_irq_rings)) {
+				if (ring->hangcheck.user_interrupts == user_interrupts &&
+				    !test_and_set_bit(ring->id, &dev_priv->gpu_error.missed_irq_rings)) {
					if (!test_bit(ring->id, &dev_priv->gpu_error.test_irq_rings))
						DRM_ERROR("Hangcheck timer elapsed... %s idle\n",
							  ring->name);
@@ -3051,6 +3055,7 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 
ring->hangcheck.seqno = seqno;
ring->hangcheck.acthd = acthd;
+   ring->hangcheck.user_interrupts = user_interrupts;
busy_count += busy;
}
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 33780fad6a30..1b4aa59c4d21 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -90,6 +90,7 @@ struct intel_ring_hangcheck {
u64 acthd;
u64 max_acthd;
u32 seqno;
+   unsigned user_interrupts;
int score;
enum intel_ring_hangcheck_action action;
int deadlock;
@@ -323,6 +324,7 @@ struct  intel_engine_cs {
 * inspecting request list.
 */
u32 last_submitted_seqno;
+   unsigned user_interrupts;
 
bool gpu_caches_dirty;
 
-- 
2.6.3

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
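Condensed, the hardened check only fires when three observations coincide across a full hangcheck interval; a sketch of the predicate, with missed_irq_detected() as a hypothetical stand-in for the existing handling:

	/* idle engine, someone waiting, and no user interrupt seen since
	 * the previous hangcheck sample: only then report a missed irq */
	if (ring_idle(ring, seqno) &&
	    intel_engine_has_waiter(ring) &&
	    ring->hangcheck.user_interrupts == user_interrupts)
		missed_irq_detected(ring);	/* hypothetical stand-in */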


[Intel-gfx] [PATCH 32/32] drm/i915: Flush the RPS bottom-half when the GPU idles

2015-12-11 Thread Chris Wilson
Make sure that the RPS bottom-half is flushed before we set the idle
frequency when we decide the GPU is idle. This should prevent any races
with the bottom-half and setting the idle frequency, and ensures that
the bottom-half is bounded by the GPU's rpm reference taken for when it
is active (i.e. between gen6_rps_busy() and gen6_rps_idle()).

v2: Avoid recursively using the i915->wq - RPS does not touch the
struct_mutex so has no place being on the ordered i915->wq.

Signed-off-by: Chris Wilson 
Cc: Imre Deak 
Cc: Jesse Barnes 
---
 drivers/gpu/drm/i915/i915_irq.c |  2 +-
 drivers/gpu/drm/i915/intel_pm.c | 10 +++---
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index e5e307654c66..4cfbd694b3a8 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1609,7 +1609,7 @@ static void gen6_rps_irq_handler(struct drm_i915_private *dev_priv, u32 pm_iir)
gen6_disable_pm_irq(dev_priv, pm_iir & dev_priv->pm_rps_events);
if (dev_priv->rps.interrupts_enabled) {
			dev_priv->rps.pm_iir |= pm_iir & dev_priv->pm_rps_events;
-			queue_work(dev_priv->wq, &dev_priv->rps.work);
+			schedule_work(&dev_priv->rps.work);
}
spin_unlock(_priv->irq_lock);
}
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 570628628a90..f543f897c516 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -4401,11 +4401,15 @@ void gen6_rps_busy(struct drm_i915_private *dev_priv)
 
 void gen6_rps_idle(struct drm_i915_private *dev_priv)
 {
-   struct drm_device *dev = dev_priv->dev;
+   /* Flush our bottom-half so that it does not race with us
+* setting the idle frequency and so that it is bounded by
+* our rpm wakeref.
+*/
+	flush_work(&dev_priv->rps.work);
 
	mutex_lock(&dev_priv->rps.hw_lock);
if (dev_priv->rps.enabled) {
-   if (IS_VALLEYVIEW(dev) || IS_CHERRYVIEW(dev))
+   if (IS_VALLEYVIEW(dev_priv) || IS_CHERRYVIEW(dev_priv))
vlv_set_rps_idle(dev_priv);
else
gen6_set_rps(dev_priv->dev, dev_priv->rps.idle_freq);
@@ -4443,7 +4447,7 @@ void gen6_rps_boost(struct drm_i915_private *dev_priv,
	spin_lock_irq(&dev_priv->irq_lock);
if (dev_priv->rps.interrupts_enabled) {
dev_priv->rps.client_boost = true;
-		queue_work(dev_priv->wq, &dev_priv->rps.work);
+		schedule_work(&dev_priv->rps.work);
}
	spin_unlock_irq(&dev_priv->irq_lock);
 
-- 
2.6.3

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 25/32] drm/i915: Convert trace-irq to the breadcrumb waiter

2015-12-11 Thread Chris Wilson
If we convert the tracing over from direct use of ring->irq_get() and
over to the breadcrumb infrastructure, we only have a single user of the
ring->irq_get and so we will be able to simplify the driver routines
(eliminating the redundant validation and irq refcounting).

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/i915_drv.h  |  8 ---
 drivers/gpu/drm/i915/i915_gem.c  |  6 -
 drivers/gpu/drm/i915/i915_trace.h|  2 +-
 drivers/gpu/drm/i915/intel_breadcrumbs.c | 39 
 drivers/gpu/drm/i915/intel_ringbuffer.h  |  4 +++-
 5 files changed, 43 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 37f4ef59fb4a..dabfb043362f 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3625,12 +3625,4 @@ wait_remaining_ms_from_jiffies(unsigned long timestamp_jiffies, int to_wait_ms)
schedule_timeout_uninterruptible(remaining_jiffies);
}
 }
-
-static inline void i915_trace_irq_get(struct intel_engine_cs *ring,
- struct drm_i915_gem_request *req)
-{
-   if (ring->trace_irq_req == NULL && ring->irq_get(ring))
-		i915_gem_request_assign(&ring->trace_irq_req, req);
-}
-
 #endif
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 78bcd231b100..fdd9dd5296e9 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2907,12 +2907,6 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
i915_gem_object_retire__read(obj, ring->id);
}
 
-	if (unlikely(ring->trace_irq_req &&
-		     i915_gem_request_completed(ring->trace_irq_req))) {
-		ring->irq_put(ring);
-		i915_gem_request_assign(&ring->trace_irq_req, NULL);
-   }
-
WARN_ON(i915_verify_lists(ring->dev));
 }
 
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index efca75bcace3..628008e6c24f 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -503,7 +503,7 @@ TRACE_EVENT(i915_gem_ring_dispatch,
   __entry->ring = ring->id;
   __entry->seqno = i915_gem_request_get_seqno(req);
   __entry->flags = flags;
-  i915_trace_irq_get(ring, req);
+  intel_breadcrumbs_enable_trace(req);
   ),
 
TP_printk("dev=%u, ring=%u, seqno=%u, flags=%x",
diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
index 69b966b4f71b..ea5ee3f7fe01 100644
--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -258,17 +258,56 @@ void intel_engine_remove_breadcrumb(struct intel_engine_cs *engine,
	spin_unlock(&b->lock);
 }
 
+static void intel_breadcrumbs_tracer(struct work_struct *work)
+{
+   struct intel_breadcrumbs *b =
+   container_of(work, struct intel_breadcrumbs, trace);
+   struct intel_rps_client rps;
+
+	INIT_LIST_HEAD(&rps.link);
+
+   do {
+   struct drm_i915_gem_request *request;
+
+		spin_lock(&b->lock);
+		request = b->trace_request;
+		b->trace_request = NULL;
+		spin_unlock(&b->lock);
+   if (request == NULL)
+   return;
+
+		__i915_wait_request(request, true, NULL, &rps);
+   i915_gem_request_unreference__unlocked(request);
+   } while (1);
+}
+
+void intel_breadcrumbs_enable_trace(struct drm_i915_gem_request *request)
+{
+	struct intel_breadcrumbs *b = &request->ring->breadcrumbs;
+
+	spin_lock(&b->lock);
+	if (b->trace_request == NULL) {
+		b->trace_request = i915_gem_request_reference(request);
+		queue_work(system_long_wq, &b->trace);
+	}
+	spin_unlock(&b->lock);
+}
+
 void intel_engine_init_breadcrumbs(struct intel_engine_cs *engine)
 {
	struct intel_breadcrumbs *b = &engine->breadcrumbs;

	spin_lock_init(&b->lock);
	setup_timer(&b->fake_irq, intel_breadcrumbs_fake_irq, (unsigned long)b);
+	INIT_WORK(&b->trace, intel_breadcrumbs_tracer);
 }
 
 void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine)
 {
	struct intel_breadcrumbs *b = &engine->breadcrumbs;

+	cancel_work_sync(&b->trace);
+	if (b->trace_request)
+		i915_gem_request_unreference(b->trace_request);
+
	del_timer_sync(&b->fake_irq);
 }
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index a35c17106f4b..0fd6395f1a1b 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -182,6 +182,8 @@ struct  intel_engine_cs {
struct rb_root requests; /* sorted by retirement */
struct task_struct *first_waiter; /* 
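The intel_ringbuffer.h hunk is truncated; judging from the code above, the fields this patch adds to struct intel_breadcrumbs are presumably along these lines (a sketch, not the verbatim patch):

struct intel_breadcrumbs {
	spinlock_t lock;		/* protects the request tree and tracer */
	struct rb_root requests;	/* sorted by retirement */
	struct task_struct *first_waiter;
	struct timer_list fake_irq;
	struct work_struct trace;	/* added: worker that waits on traced requests */
	struct drm_i915_gem_request *trace_request;	/* added: next request to trace */
};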

[Intel-gfx] [PATCH 30/32] drm/i915: Restore waitboost credit to the synchronous waiter

2015-12-11 Thread Chris Wilson
Ideally, we want to automagically have the GPU respond to the
instantaneous load by reclocking itself. However, reclocking occurs
relatively slowly, and to the client waiting for a result from the GPU,
too late. To compensate and reduce the client latency, we allow the
first wait from a client to boost the GPU clocks to maximum. This
overcomes the lag in autoreclocking, at the expense of forcing the GPU
clocks too high. So to offset the excessive power usage, we currently
allow a client to only boost the clocks once before we detect the GPU
is idle again. This works reasonably for, say, the first frame in a
benchmark, but for many more synchronous workloads (like OpenCL) we find
the GPU clocks remain too low. By noting a wait which would idle the GPU
(i.e. we just waited upon the last known request), we can give that
client the idle boost credit (for their next wait) without the 100ms
delay required for us to detect the GPU idle state. The intention is to
boost clients that are stalling in the process of feeding the GPU more
work (and who in doing so let the GPU idle), without granting boost
credits to clients that are throttling themselves (such as compositors).

Signed-off-by: Chris Wilson 
Cc: "Zou, Nanhai" 
Cc: Jesse Barnes 
Reviewed-by: Jesse Barnes 
---
 drivers/gpu/drm/i915/i915_gem.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d1a7a7f8f3ad..a0584cffa7cd 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1340,6 +1340,22 @@ out:
*timeout = 0;
}
 
+   if (ret == 0 && rps && req->seqno == req->ring->last_submitted_seqno) {
+   /* The GPU is now idle and this client has stalled.
+* Since no other client has submitted a request in the
+* meantime, assume that this client is the only one
+* supplying work to the GPU but is unable to keep that
+* work supplied because it is waiting. Since the GPU is
+* then never kept fully busy, RPS autoclocking will
+* keep the clocks relatively low, causing further delays.
+* Compensate by giving the synchronous client credit for
+* a waitboost next time.
+*/
+		spin_lock(&req->i915->rps.client_lock);
+		list_del_init(&rps->link);
+		spin_unlock(&req->i915->rps.client_lock);
+   }
+
return ret;
 }
 
-- 
2.6.3

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 28/32] drm/i915: Remove debug noise on detecting fault-injection of missed interrupts

2015-12-11 Thread Chris Wilson
Since the tests can and do explicitly check debugfs/i915_ring_missed_irqs
for the handling of a "missed interrupt", adding it to the dmesg at INFO
is just noise. When it happens for real, we still class it as an ERROR.

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/i915_irq.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index e864ebeef4ef..e5e307654c66 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2995,9 +2995,6 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
				if (!test_bit(ring->id, &dev_priv->gpu_error.test_irq_rings))
					DRM_ERROR("Hangcheck timer elapsed... %s idle\n",
						  ring->name);
-				else
-					DRM_INFO("Fake missed irq on %s\n",
-						 ring->name);

				intel_engine_enable_fake_irq(ring);
			}
-- 
2.6.3

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 24/32] drm/i915: On GPU reset, set the HWS breadcrumb to the last seqno

2015-12-11 Thread Chris Wilson
After the GPU reset and we discard all of the incomplete requests, mark
the GPU as having advanced to the last_submitted_seqno (as having
completed the requests and ready for fresh work). The impact of this is
negligible, as all the requests will be considered completed by this
point; it just brings the HWS into line with expectations for external
viewers.

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/i915_gem.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index ca327c0e73f1..78bcd231b100 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2836,6 +2836,8 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
buffer->last_retired_head = buffer->tail;
intel_ring_update_space(buffer);
}
+
+   intel_ring_init_seqno(ring, ring->last_submitted_seqno);
 }
 
 void i915_gem_reset(struct drm_device *dev)
-- 
2.6.3

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 23/32] drm/i915: Only query timestamp when measuring elapsed time

2015-12-11 Thread Chris Wilson
Avoid the two calls to ktime_get_raw_ns() (at best it reads the TSC) as
we only need to compute the elapsed time for a timed wait.

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/i915_gem.c | 13 +
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d595d72e53b1..ca327c0e73f1 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1222,7 +1222,6 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
int state = interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
struct intel_breadcrumb wait;
unsigned long timeout_remain;
-   s64 before, now;
int ret = 0;
 
might_sleep();
@@ -1241,13 +1240,12 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
if (*timeout == 0)
return -ETIME;
 
+   /* Record current time in case interrupted, or wedged */
timeout_remain = nsecs_to_jiffies_timeout(*timeout);
+   *timeout += ktime_get_raw_ns();
}
 
-   /* Record current time in case interrupted by signal, or wedged */
trace_i915_gem_request_wait_begin(req);
-   before = ktime_get_raw_ns();
-
if (INTEL_INFO(req->i915)->gen >= 6)
gen6_rps_boost(req->i915, rps, req->emitted_jiffies);
 
@@ -1324,13 +1322,12 @@ wakeup: set_task_state(wait.task, state);
 out:
	intel_engine_remove_breadcrumb(req->ring, &wait);
__set_task_state(wait.task, TASK_RUNNING);
-   now = ktime_get_raw_ns();
trace_i915_gem_request_wait_end(req);
 
if (timeout) {
-   s64 tres = *timeout - (now - before);
-
-   *timeout = tres < 0 ? 0 : tres;
+   *timeout -= ktime_get_raw_ns();
+   if (*timeout < 0)
+   *timeout = 0;
 
/*
 * Apparently ktime isn't accurate enough and occasionally has a
-- 
2.6.3

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH V4 2/2] drm/i915: start adding dp mst audio

2015-12-11 Thread Takashi Iwai
On Fri, 11 Dec 2015 11:43:51 +0100,
Takashi Iwai wrote:
> 
> On Fri, 11 Dec 2015 07:07:53 +0100,
> Libin Yang wrote:
> > 
> > >>> diff --git a/drivers/gpu/drm/i915/intel_audio.c 
> > >>> b/drivers/gpu/drm/i915/intel_audio.c
> > >>> index 9aa83e7..5ad2e66 100644
> > >>> --- a/drivers/gpu/drm/i915/intel_audio.c
> > >>> +++ b/drivers/gpu/drm/i915/intel_audio.c
> > >>> @@ -262,7 +262,8 @@ static void hsw_audio_codec_disable(struct 
> > >>> intel_encoder *encoder)
> > >>> tmp |= AUD_CONFIG_N_PROG_ENABLE;
> > >>> tmp &= ~AUD_CONFIG_UPPER_N_MASK;
> > >>> tmp &= ~AUD_CONFIG_LOWER_N_MASK;
> > >>> -   if (intel_pipe_has_type(intel_crtc, INTEL_OUTPUT_DISPLAYPORT))
> > >>> +   if (intel_pipe_has_type(intel_crtc, INTEL_OUTPUT_DISPLAYPORT) ||
> > >>> +   intel_pipe_has_type(intel_crtc, INTEL_OUTPUT_DP_MST))
> > >>> tmp |= AUD_CONFIG_N_VALUE_INDEX;
> 
> The same check is missing in hsw_audio_codec_enable()?
> 
> > >>> I915_WRITE(HSW_AUD_CFG(pipe), tmp);
> > >>>
> > >>> @@ -474,7 +475,8 @@ static void ilk_audio_codec_enable(struct 
> > >>> drm_connector *connector,
> > >>> tmp &= ~AUD_CONFIG_N_VALUE_INDEX;
> > >>> tmp &= ~AUD_CONFIG_N_PROG_ENABLE;
> > >>> tmp &= ~AUD_CONFIG_PIXEL_CLOCK_HDMI_MASK;
> > >>> -   if (intel_pipe_has_type(intel_crtc, INTEL_OUTPUT_DISPLAYPORT))
> > >>> +   if (intel_pipe_has_type(intel_crtc, INTEL_OUTPUT_DISPLAYPORT) ||
> > >>> +   intel_pipe_has_type(intel_crtc, INTEL_OUTPUT_DP_MST))
> > >>> tmp |= AUD_CONFIG_N_VALUE_INDEX;
> 
> ... and missing for ilk_audio_codec_disable()?
> 
> 
> > >>> else
> > >>> tmp |= audio_config_hdmi_pixel_clock(adjusted_mode);
> > >>> @@ -512,7 +514,8 @@ void intel_audio_codec_enable(struct intel_encoder 
> > >>> *intel_encoder)
> > >>>
> > >>> /* ELD Conn_Type */
> > >>> connector->eld[5] &= ~(3 << 2);
> > >>> -   if (intel_pipe_has_type(crtc, INTEL_OUTPUT_DISPLAYPORT))
> > >>> +   if (intel_pipe_has_type(crtc, INTEL_OUTPUT_DISPLAYPORT) ||
> > >>> +   intel_pipe_has_type(crtc, INTEL_OUTPUT_DP_MST))
> 
> IMO, it's better to have a macro to cover this two-line check instead
> of open-coding at each place.  We'll have 5 places in the end.

Also, this patch still has an issue about the encoder type, namely, it
passes intel_encoder from MST, where you can't apply
enc_to_dig_port().  We need another help to get the digital port
depending on the encoder type, e.g.

static struct intel_digital_port *
intel_encoder_to_dig_port(struct intel_encoder *intel_encoder)
{
	struct drm_encoder *encoder = &intel_encoder->base;

if (intel_encoder->type == INTEL_OUTPUT_DP_MST)
return enc_to_mst(encoder)->primary;
return enc_to_dig_port(encoder);
}


Takashi
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
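A minimal version of the macro/helper suggested above could look like the following (the name is illustrative, not part of the posted patch):

static inline bool intel_crtc_has_dp_encoder(struct intel_crtc *crtc)
{
	return intel_pipe_has_type(crtc, INTEL_OUTPUT_DISPLAYPORT) ||
	       intel_pipe_has_type(crtc, INTEL_OUTPUT_DP_MST);
}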


[Intel-gfx] [PATCH] drm/i915: Allow objects to go back above 4GB in the address range

2015-12-11 Thread Michel Thierry
We detected if objects should be moved to the lower parts when 48-bit
support flag was not set, but not the other way around.

This handles the case in which an object was allocated in the 32-bit
address range, but it has been marked as safe to move above it, which
theoretically would help to keep the lower addresses available for
objects which really need to be there.

Cc: Daniele Ceraolo Spurio 
Signed-off-by: Michel Thierry 
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 8df5b96..a83916e 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -696,6 +696,11 @@ eb_vma_misplaced(struct i915_vma *vma)
(vma->node.start + vma->node.size - 1) >> 32)
return true;
 
+   /* keep the lower addresses free of unnecessary objects */
+   if ((entry->flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS) &&
+   !((vma->node.start + vma->node.size - 1) >> 32))
+   return true;
+
return false;
 }
 
-- 
2.6.3

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
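Taken together with the pre-existing test, eb_vma_misplaced() now polices placement in both directions; a condensed sketch of the pair of rules from the diff above:

	/* must stay below 4GB unless flagged for full 48-bit addressing */
	if (!(entry->flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS) &&
	    (vma->node.start + vma->node.size - 1) >> 32)
		return true;

	/* and, with this patch, vacate the low 4GB when so flagged */
	if ((entry->flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS) &&
	    !((vma->node.start + vma->node.size - 1) >> 32))
		return true;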


Re: [Intel-gfx] [PATCH v2] PM / Runtime: Introduce pm_runtime_get_noidle

2015-12-11 Thread Rafael J. Wysocki
On Friday, December 11, 2015 01:03:50 PM Ulf Hansson wrote:
> [...]
> 
> >> >
> >> > Which basically means you can call pm_runtime_resume() just fine,
> >> > because it will do nothing if the status is RPM_ACTIVE already.
> >> >
> >> > So really, why don't you use pm_runtime_get_sync()?
> >>
> >> The difference would be that if the status is not RPM_ACTIVE already we
> >> would drop the reference and report error. The caller would in this
> >> case forego of doing something, since we the device is suspended or on
> >> the way to being suspended. One example of such a scenario is a
> >> watchdog like functionality: the watchdog work would
> >> call pm_runtime_get_noidle() and check if the device is ok by doing
> >> some HW access, but only if the device is powered. Otherwise the work
> >> item would do nothing (meaning it also won't reschedule itself). The
> >> watchdog work would get rescheduled next time the device is woken up
> >> and some work is submitted to the device.
> >
> > So first of all the name "pm_runtime_get_noidle" doesn't make sense.
> >
> > I guess what you need is something like
> >
> > bool pm_runtime_get_if_active(struct device *dev)
> > {
> > unsigned long flags;
> > bool ret;
> >
> > spin_lock_irqsave(&dev->power.lock, flags);
> >
> > if (dev->power.runtime_status == RPM_ACTIVE) {
> > atomic_inc(&dev->power.usage_count);
> > ret = true;
> > } else {
> > ret = false;
> > }
> >
> > spin_unlock_irqrestore(&dev->power.lock, flags);
> >
> > return ret;
> > }
> >
> > and the caller will simply bail out if "false" is returned, but if "true"
> > is returned, it will have to drop the usage count, right?
> >
> > Thanks,
> > Rafael
> >
> 
> Why not just:
> 
> pm_runtime_get_noresume():
> if (RPM_ACTIVE)
>   "do some actions"
> pm_runtime_put();

Because that's racy?

What if the rpm_suspend() is running for the device, but it hasn't changed
the status yet?

Thanks,
Rafael

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
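The watchdog scenario described earlier maps onto such a primitive roughly as follows; a sketch assuming pm_runtime_get_if_active() as proposed above, with the device structure and helper names purely illustrative:

static void watchdog_work(struct work_struct *work)
{
	struct my_device *mydev =	/* hypothetical driver structure */
		container_of(work, struct my_device, watchdog.work);

	/* Device suspended or suspending: do nothing and don't reschedule;
	 * the next wake-up of the device re-arms the watchdog. */
	if (!pm_runtime_get_if_active(mydev->dev))
		return;

	check_hw_health(mydev);		/* illustrative HW access */
	schedule_delayed_work(&mydev->watchdog, WATCHDOG_INTERVAL);

	pm_runtime_put(mydev->dev);
}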


Re: [Intel-gfx] [PATCH v3] drm/i915: Avoid writing relocs with addresses in non-canonical form

2015-12-11 Thread Michel Thierry

On 12/11/2015 2:13 PM, Michał Winiarski wrote:

According to bspec, some parts of HW require the addresses to be in
a canonical form, where bits [63:48] == [47]. Let's convert addresses to
canonical form prior to relocating and return converted offsets to
userspace. We also need to make sure that userspace is using addresses
in canonical form in case of softpin.

v2: Whitespace fixup, gen8_canonical_addr description (Chris, Ville)
v3: Rebase on top of softpin, fix a hole in relocate_entry,
 s/expect/require (Chris)

Cc: Chris Wilson 
Cc: Michel Thierry 
Cc: Ville Syrjälä 
Signed-off-by: Michał Winiarski 


With updated gem_softpin 
[http://patchwork.freedesktop.org/patch/msgid/1449843255-32640-1-git-send-email-michel.thie...@intel.com]


Tested-by: Michel Thierry 


---
  drivers/gpu/drm/i915/i915_gem.c|  9 +++--
  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 21 +++--
  drivers/gpu/drm/i915/i915_gem_gtt.h| 12 
  3 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 8e2acde..b83207b 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3482,12 +3482,17 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,

if (flags & PIN_OFFSET_FIXED) {
uint64_t offset = flags & PIN_OFFSET_MASK;
+   uint64_t noncanonical_offset = offset & ((1ULL << 48) - 1);

-   if (offset & (alignment - 1) || offset + size > end) {
+   if (offset & (alignment - 1) ||
+   noncanonical_offset + size > end ||
+   offset != gen8_canonical_addr(offset)) {
ret = -EINVAL;
goto err_free_vma;
}
-   vma->node.start = offset;
+   /* While userspace is using addresses in canonical form, our
+* allocator is unaware of this */
+   vma->node.start = noncanonical_offset;
vma->node.size = size;
vma->node.color = obj->cache_level;
ret = drm_mm_reserve_node(>mm, >node);
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 48ec484..445ccc7 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -249,6 +249,13 @@ static inline int use_cpu_reloc(struct drm_i915_gem_object *obj)
obj->cache_level != I915_CACHE_NONE);
  }

+static inline uint64_t
+relocation_target(struct drm_i915_gem_relocation_entry *reloc,
+ uint64_t target_offset)
+{
+   return gen8_canonical_addr((int)reloc->delta + target_offset);
+}
+
  static int
  relocate_entry_cpu(struct drm_i915_gem_object *obj,
   struct drm_i915_gem_relocation_entry *reloc,
@@ -256,7 +263,7 @@ relocate_entry_cpu(struct drm_i915_gem_object *obj,
  {
struct drm_device *dev = obj->base.dev;
uint32_t page_offset = offset_in_page(reloc->offset);
-   uint64_t delta = reloc->delta + target_offset;
+   uint64_t delta = relocation_target(reloc, target_offset);
char *vaddr;
int ret;

@@ -292,7 +299,7 @@ relocate_entry_gtt(struct drm_i915_gem_object *obj,
  {
struct drm_device *dev = obj->base.dev;
struct drm_i915_private *dev_priv = dev->dev_private;
-   uint64_t delta = reloc->delta + target_offset;
+   uint64_t delta = relocation_target(reloc, target_offset);
uint64_t offset;
void __iomem *reloc_page;
int ret;
@@ -347,7 +354,7 @@ relocate_entry_clflush(struct drm_i915_gem_object *obj,
  {
struct drm_device *dev = obj->base.dev;
uint32_t page_offset = offset_in_page(reloc->offset);
-   uint64_t delta = (int)reloc->delta + target_offset;
+   uint64_t delta = relocation_target(reloc, target_offset);
char *vaddr;
int ret;

@@ -395,7 +402,7 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
target_i915_obj = target_vma->obj;
	target_obj = &target_vma->obj->base;

-   target_offset = target_vma->node.start;
+   target_offset = gen8_canonical_addr(target_vma->node.start);

/* Sandybridge PPGTT errata: We need a global gtt mapping for MI and
 * pipe_control writes because the gpu doesn't properly redirect them
@@ -583,6 +590,7 @@ i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
struct drm_i915_gem_object *obj = vma->obj;
struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
uint64_t flags;
+   uint64_t offset;
int ret;

flags = PIN_USER;
@@ -625,8 +633,9 @@ i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
entry->flags |= __EXEC_OBJECT_HAS_FENCE;
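The i915_gem_gtt.h hunk that defines the helper is truncated from the diff; from the description above (sign-extend bit 47 into bits [63:48]) it reduces to roughly:

/* sketch of the helper assumed by the hunks above */
static inline uint64_t gen8_canonical_addr(uint64_t address)
{
	return sign_extend64(address, 47);
}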
 

[Intel-gfx] [RFC 31/38] drm/i915/preempt: scheduler logic for landing preemptive requests

2015-12-11 Thread John . C . Harrison
From: Dave Gordon 

This patch adds the GEM & scheduler logic for detection and first-stage
processing of completed preemption requests. Similar to regular batches,
they deposit their sequence number in the hardware status page when
starting and again when finished, but using different locations so that
information pertaining to a preempted batch is not overwritten. Also,
the in-progress flag is not cleared by the GPU at the end of the batch;
instead driver software is responsible for clearing this once the
request completion has been noticed.

Actually-preemptive requests are still disabled via a module parameter
at this early stage, as the rest of the logic to deal with the
consequences of preemption isn't in place yet.

v2: Re-worked to simplify 'pre-emption in progress' logic.

For: VIZ-2021
Signed-off-by: Dave Gordon 
---
 drivers/gpu/drm/i915/i915_gem.c | 55 --
 drivers/gpu/drm/i915/i915_scheduler.c   | 70 +
 drivers/gpu/drm/i915/i915_scheduler.h   |  3 +-
 drivers/gpu/drm/i915/intel_ringbuffer.h |  1 +
 4 files changed, 107 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 66c9a58..ea3d224 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2489,6 +2489,14 @@ i915_gem_init_seqno(struct drm_device *dev, u32 seqno)
ring->last_irq_seqno = 0;
}
 
+   /* Also reset sw batch tracking state */
+   for_each_ring(ring, dev_priv, i) {
+   intel_write_status_page(ring, I915_BATCH_DONE_SEQNO, 0);
+   intel_write_status_page(ring, I915_BATCH_ACTIVE_SEQNO, 0);
+   intel_write_status_page(ring, I915_PREEMPTIVE_DONE_SEQNO, 0);
+   intel_write_status_page(ring, I915_PREEMPTIVE_ACTIVE_SEQNO, 0);
+   }
+
return 0;
 }
 
@@ -2831,15 +2839,18 @@ void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked)
return;
}
 
-   seqno = ring->get_seqno(ring, false);
+   seqno   = ring->get_seqno(ring, false);
trace_i915_gem_request_notify(ring, seqno);
-   if (seqno == ring->last_irq_seqno)
+
+   /* Is there anything new to process? */
+	if ((seqno == ring->last_irq_seqno) && !i915_scheduler_is_ring_preempting(ring))
return;
-   ring->last_irq_seqno = seqno;
 
if (!fence_locked)
		spin_lock_irqsave(&ring->fence_lock, flags);
 
+   ring->last_irq_seqno = seqno;
+
	list_for_each_entry_safe(req, req_next, &ring->fence_signal_list, signal_link) {
if (!req->cancelled) {
/* How can this happen? */
@@ -2861,7 +2872,7 @@ void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked)
 * and call scheduler_clean() while the scheduler
 * thinks it is still active.
 */
-   wake_sched |= i915_scheduler_notify_request(req);
+   wake_sched |= i915_scheduler_notify_request(req, false);
 
if (!req->cancelled) {
fence_signal_locked(>fence);
@@ -2877,6 +2888,42 @@ void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked)
	list_add_tail(&req->unsignal_link, &ring->fence_unsignal_list);
}
 
+   if (i915_scheduler_is_ring_preempting(ring)) {
+   u32 preempt_start, preempt_done;
+
+		preempt_start = intel_read_status_page(ring, I915_PREEMPTIVE_ACTIVE_SEQNO);
+		preempt_done = intel_read_status_page(ring, I915_PREEMPTIVE_DONE_SEQNO);
+
+		/*
+		 * A preemption request leaves both ACTIVE and DONE set to the same
+		 * seqno.  If we find ACTIVE set but DONE is different, the preemption
+		 * has started but not yet completed, so leave it until next time.
+		 * After successfully processing a preemption request, we clear ACTIVE
+		 * below to ensure we don't see it again.
+		 */
+   if (preempt_start && preempt_done == preempt_start) {
+   bool sched_ack = false;
+
+			list_for_each_entry_safe(req, req_next, &ring->fence_signal_list, signal_link) {
+				if (req->seqno == preempt_done) {
+					/* De-list and notify the scheduler, but don't signal yet */
+					list_del_init(&req->signal_link);
+					sched_ack = i915_scheduler_notify_request(req, true);
+   break;
+   }
+   }
+
+   WARN_ON(!sched_ack);
+   wake_sched = true;
+
+   /* Capture BATCH ACTIVE to determine whether a batch 
was in 
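The ACTIVE/DONE handshake described in the comment can be summarised as follows (slot names from the diff; process_completed_preemption() is a hypothetical stand-in for the list walk above):

	u32 start = intel_read_status_page(ring, I915_PREEMPTIVE_ACTIVE_SEQNO);
	u32 done  = intel_read_status_page(ring, I915_PREEMPTIVE_DONE_SEQNO);

	if (start && start == done) {
		/* preemption finished: consume it exactly once */
		process_completed_preemption(ring, done);
		intel_write_status_page(ring, I915_PREEMPTIVE_ACTIVE_SEQNO, 0);
	}
	/* start != done: preemption still in flight, check again later */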

[Intel-gfx] [RFC 32/38] drm/i915/preempt: add hook to catch 'unexpected' ring submissions

2015-12-11 Thread John . C . Harrison
From: Dave Gordon 

Author: John Harrison 
Date:   Thu Apr 10 10:41:06 2014 +0100

The scheduler needs to know what each seqno that pops out of the ring is
referring to. This change adds a hook into the 'submit some random
work that got forgotten about' clean up code to inform the scheduler
that a new seqno has been sent to the ring for some non-batch buffer
operation.

Reworked for latest scheduler+preemption by Dave Gordon: with the newer
implementation, knowing about untracked requests is merely helpful for
debugging rather than being mandatory, as we have already taken steps to
prevent untracked requests intruding at awkward moments!

v2: Removed unnecessary debug spew.

For: VIZ-2021
Signed-off-by: John Harrison 
Signed-off-by: Dave Gordon 
---
 drivers/gpu/drm/i915/i915_gem.c   |  4 
 drivers/gpu/drm/i915/i915_gpu_error.c |  2 ++
 drivers/gpu/drm/i915/i915_scheduler.c | 21 +
 drivers/gpu/drm/i915/i915_scheduler.h |  1 +
 4 files changed, 28 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index ea3d224..a91b916 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2597,6 +2597,10 @@ void __i915_add_request(struct drm_i915_gem_request 
*request,
WARN_ON(request->seqno != dev_priv->last_seqno);
}
 
+   /* Notify the scheduler, if it doesn't already track this request */
+   if (!request->scheduler_qe)
+   i915_scheduler_fly_request(request);
+
/* Record the position of the start of the request so that
 * should we detect the updated seqno part-way through the
 * GPU processing the request, we never over-estimate the
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 2d9dd3f..72c861e 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1331,6 +1331,8 @@ static void i915_gem_record_rings(struct drm_device *dev,
		erq->ringbuffer_gtt = i915_gem_obj_ggtt_offset(request->ringbuf->obj);
erq->scheduler_state = !sqe ? 'u' :
i915_scheduler_queue_status_chr(sqe->status);
+   if (request->scheduler_flags & i915_req_sf_untracked)
+   erq->scheduler_state = 'U';
}
}
 }
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 54b6c32..8cd89d2 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -455,6 +455,27 @@ int i915_scheduler_queue_execbuffer(struct 
i915_scheduler_queue_entry *qe)
return 0;
 }
 
+/* An untracked request is being launched ... */
+void i915_scheduler_fly_request(struct drm_i915_gem_request *req)
+{
+   struct drm_i915_private *dev_priv = req->i915;
+   struct i915_scheduler *scheduler = dev_priv->scheduler;
+
+   BUG_ON(!scheduler);
	BUG_ON(!mutex_is_locked(&dev_priv->dev->struct_mutex));
+
+   /* This shouldn't happen */
+   WARN_ON(i915_scheduler_is_ring_busy(req->ring));
+
+   /* We don't expect to see nodes that are already tracked */
+   if (!WARN_ON(req->scheduler_qe)) {
+		/* Untracked node, must not be inside scheduler submission path */
+   WARN_ON((scheduler->flags[req->ring->id] & i915_sf_submitting));
+   scheduler->stats[req->ring->id].non_batch++;
+   req->scheduler_flags |= i915_req_sf_untracked;
+   }
+}
+
 static int i915_scheduler_fly_node(struct i915_scheduler_queue_entry *node)
 {
struct drm_i915_private *dev_priv = node->params.dev->dev_private;
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 5b871b0..7e7e974 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -199,6 +199,7 @@ booli915_scheduler_is_ring_flying(struct 
intel_engine_cs *ring);
 booli915_scheduler_is_ring_preempting(struct intel_engine_cs *ring);
 booli915_scheduler_is_ring_busy(struct intel_engine_cs *ring);
 voidi915_gem_scheduler_work_handler(struct work_struct *work);
+voidi915_scheduler_fly_request(struct drm_i915_gem_request *req);
 int i915_scheduler_flush(struct intel_engine_cs *ring, bool is_locked);
 int i915_scheduler_flush_stamp(struct intel_engine_cs *ring,
   unsigned long stamp, bool is_locked);
-- 
1.9.1

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 12/40] drm/i915: Added scheduler hook when closing DRM file handles

2015-12-11 Thread John . C . Harrison
From: John Harrison 

The scheduler decouples the submission of batch buffers to the driver
with submission of batch buffers to the hardware. Thus it is possible
for an application to close its DRM file handle while there is still
work outstanding. That means the scheduler needs to know about file
close events so it can remove the file pointer from such orphaned
batch buffers and not attempt to dereference it later.

v3: Updated to not wait for outstanding work to complete but merely
remove the file handle reference. The wait was getting excessively
complicated with inter-ring dependencies, pre-emption, and other such
issues.

Change-Id: I24ac056c062b075ff1cc5e2ed2d3fa8e17e85951
For: VIZ-1587
Signed-off-by: John Harrison 
---
 drivers/gpu/drm/i915/i915_dma.c   |  3 +++
 drivers/gpu/drm/i915/i915_scheduler.c | 35 +++
 drivers/gpu/drm/i915/i915_scheduler.h |  2 ++
 3 files changed, 40 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 731cf31..c2f9c03 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -46,6 +46,7 @@
 #include 
 #include 
 #include 
+#include "i915_scheduler.h"
 #include 
 #include 
 #include 
@@ -1250,6 +1251,8 @@ void i915_driver_lastclose(struct drm_device *dev)
 
 void i915_driver_preclose(struct drm_device *dev, struct drm_file *file)
 {
+   i915_scheduler_closefile(dev, file);
+
	mutex_lock(&dev->struct_mutex);
i915_gem_context_close(dev, file);
i915_gem_release(dev, file);
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 344760e..5aafc96 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -768,3 +768,38 @@ static int i915_scheduler_remove_dependent(struct 
i915_scheduler *scheduler,
 
return 0;
 }
+
+int i915_scheduler_closefile(struct drm_device *dev, struct drm_file *file)
+{
+   struct i915_scheduler_queue_entry  *node;
+   struct drm_i915_private*dev_priv = dev->dev_private;
+   struct i915_scheduler  *scheduler = dev_priv->scheduler;
+   struct intel_engine_cs  *ring;
+   int i;
+   unsigned long   flags;
+
+   if (!scheduler)
+   return 0;
+
+	spin_lock_irqsave(&scheduler->lock, flags);
+
+   for_each_ring(ring, dev_priv, i) {
+		list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
+   if (node->params.file != file)
+   continue;
+
+			if (!I915_SQS_IS_COMPLETE(node))
+				DRM_DEBUG_DRIVER("Closing file handle with outstanding work: %d:%d/%d on %s\n",
+node->params.request->uniq,
+node->params.request->seqno,
+node->status,
+ring->name);
+
+   node->params.file = NULL;
+   }
+   }
+
+	spin_unlock_irqrestore(&scheduler->lock, flags);
+
+   return 0;
+}
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 2d50d83..02ac6f2 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -86,6 +86,8 @@ enum {
 
 booli915_scheduler_is_enabled(struct drm_device *dev);
 int i915_scheduler_init(struct drm_device *dev);
+int i915_scheduler_closefile(struct drm_device *dev,
+struct drm_file *file);
 int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry 
*qe);
 booli915_scheduler_notify_request(struct drm_i915_gem_request *req);
 
-- 
1.9.1

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 17/40] drm/i915: Added tracking/locking of batch buffer objects

2015-12-11 Thread John . C . Harrison
From: John Harrison 

The scheduler needs to track interdependencies between batch buffers.
These are calculated by analysing the object lists of the buffers and
looking for commonality. The scheduler also needs to keep those
buffers locked long after the initial IOCTL call has returned to user
land.

v3: Updated to support read-read optimisation.

Change-Id: I31e3677ecfc2c9b5a908bda6acc4850432d55f1e
For: VIZ-1587
Signed-off-by: John Harrison 
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 52 --
 drivers/gpu/drm/i915/i915_scheduler.c  | 33 +--
 2 files changed, 80 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 2c7a395..0908699 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1418,7 +1418,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
	struct i915_execbuffer_params *params = &qe.params;
const u32 ctx_id = i915_execbuffer2_get_context_id(*args);
u32 dispatch_flags;
-   int ret;
+   int ret, i;
bool need_relocs;
int fd_fence_complete = -1;
int fd_fence_wait = lower_32_bits(args->rsvd2);
@@ -1553,6 +1553,14 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
goto pre_mutex_err;
}
 
+   qe.saved_objects = kzalloc(
+   sizeof(*qe.saved_objects) * args->buffer_count,
+   GFP_KERNEL);
+   if (!qe.saved_objects) {
+   ret = -ENOMEM;
+   goto err;
+   }
+
/* Look up object handles */
ret = eb_lookup_vmas(eb, exec, args, vm, file);
if (ret)
@@ -1673,7 +1681,30 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
params->args_DR1= args->DR1;
params->args_DR4= args->DR4;
params->batch_obj   = batch_obj;
-   params->ctx = ctx;
+
+   /*
+* Save away the list of objects used by this batch buffer for the
+* purpose of tracking inter-buffer dependencies.
+*/
+   for (i = 0; i < args->buffer_count; i++) {
+   struct drm_i915_gem_object *obj;
+
+   /*
+* NB: 'drm_gem_object_lookup()' increments the object's
+* reference count and so must be matched by a
+* 'drm_gem_object_unreference' call.
+*/
+   obj = to_intel_bo(drm_gem_object_lookup(dev, file,
+ exec[i].handle));
+   qe.saved_objects[i].obj   = obj;
+		qe.saved_objects[i].read_only = obj->base.pending_write_domain == 0;
+
+   }
+   qe.num_objs = i;
+
+   /* Lock and save the context object as well. */
+   i915_gem_context_reference(ctx);
+   params->ctx = ctx;
 
if (args->flags & I915_EXEC_CREATE_FENCE) {
/*
@@ -1738,6 +1769,23 @@ err:
i915_gem_context_unreference(ctx);
eb_destroy(eb);
 
+   if (qe.saved_objects) {
+   /* Need to release the objects: */
+   for (i = 0; i < qe.num_objs; i++) {
+   if (!qe.saved_objects[i].obj)
+   continue;
+
+			drm_gem_object_unreference(
+					&qe.saved_objects[i].obj->base);
+   }
+
+   kfree(qe.saved_objects);
+
+   /* Context too */
+   if (params->ctx)
+   i915_gem_context_unreference(params->ctx);
+   }
+
/*
 * If the request was created but not successfully submitted then it
 * must be freed again. If it was submitted then it is being tracked
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 9d1475f..300cd89 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -158,7 +158,23 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
if (ret)
return ret;
 
-   /* Free everything that is owned by the QE structure: */
+   /* Need to release the objects: */
+   for (i = 0; i < qe->num_objs; i++) {
+   if (!qe->saved_objects[i].obj)
+   continue;
+
+		drm_gem_object_unreference(&qe->saved_objects[i].obj->base);
+   }
+
+   kfree(qe->saved_objects);
+   qe->saved_objects = NULL;
+   qe->num_objs = 0;
+
+   /* Free the context object too: */
+   if (qe->params.ctx)
+   i915_gem_context_unreference(qe->params.ctx);
+
+   /* And anything else owned by the 

[Intel-gfx] [PATCH 32/40] drm/i915: Add early exit to execbuff_final() if insufficient ring space

2015-12-11 Thread John . C . Harrison
From: John Harrison 

One of the major purposes of the GPU scheduler is to avoid stalling
the CPU when the GPU is busy and unable to accept more work. This
change adds support to the ring submission code to allow a ring space
check to be performed before attempting to submit a batch buffer to
the hardware. If insufficient space is available then the scheduler
can go away and come back later, letting the CPU get on with other
work, rather than stalling and waiting for the hardware to catch up.

v3: Updated to use locally cached request pointer.

Change-Id: I267159ce1150cb6714d34a49b841bcbe4bf66326
For: VIZ-1587
Signed-off-by: John Harrison 
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 42 --
 drivers/gpu/drm/i915/intel_lrc.c   | 57 +++---
 drivers/gpu/drm/i915/intel_ringbuffer.c| 24 +
 drivers/gpu/drm/i915/intel_ringbuffer.h|  1 +
 4 files changed, 109 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 8ba426f..bf9d804 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1101,25 +1101,19 @@ i915_reset_gen7_sol_offsets(struct drm_device *dev,
 {
struct intel_engine_cs *ring = req->ring;
struct drm_i915_private *dev_priv = dev->dev_private;
-   int ret, i;
+   int i;
 
	if (!IS_GEN7(dev) || ring != &dev_priv->ring[RCS]) {
DRM_DEBUG("sol reset is gen7/rcs only\n");
return -EINVAL;
}
 
-   ret = intel_ring_begin(req, 4 * 3);
-   if (ret)
-   return ret;
-
for (i = 0; i < 4; i++) {
intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
intel_ring_emit(ring, GEN7_SO_WRITE_OFFSET(i));
intel_ring_emit(ring, 0);
}
 
-   intel_ring_advance(ring);
-
return 0;
 }
 
@@ -1247,6 +1241,7 @@ int i915_gem_ringbuffer_submission_final(struct i915_execbuffer_params *params)
struct intel_engine_cs  *ring = params->ring;
u64 exec_start, exec_len;
int ret;
+   uint32_t min_space;
 
/* The mutex must be acquired before calling this function */
	BUG_ON(!mutex_is_locked(&params->dev->struct_mutex));
@@ -1268,8 +1263,36 @@ int i915_gem_ringbuffer_submission_final(struct i915_execbuffer_params *params)
if (ret)
return ret;
 
+	/*
+	 * It would be a bad idea to run out of space while writing commands
+	 * to the ring. One of the major aims of the scheduler is to not stall
+	 * at any point for any reason. However, doing an early exit half way
+	 * through submission could result in a partial sequence being written
+	 * which would leave the engine in an unknown state. Therefore, check in
+	 * advance that there will be enough space for the entire submission,
+	 * whether emitted by the code below OR by any other functions that may
+	 * be executed before the end of final().
+	 *
+	 * NB: This test deliberately overestimates, because that's easier than
+	 * tracing every potential path that could be taken!
+	 *
+	 * Current measurements suggest that we may need to emit up to 744 bytes
+	 * (186 dwords), so this is rounded up to 256 dwords here. Then we double
+	 * that to get the free space requirement, because the block isn't allowed
+	 * to span the transition from the end to the beginning of the ring.
+	 */
+#define I915_BATCH_EXEC_MAX_LEN 256	/* max dwords emitted here */
+   min_space = I915_BATCH_EXEC_MAX_LEN * 2 * sizeof(uint32_t);
+   ret = intel_ring_test_space(req->ringbuf, min_space);
+   if (ret)
+   goto early_error;
+
intel_runtime_pm_get(dev_priv);
 
+   ret = intel_ring_begin(req, I915_BATCH_EXEC_MAX_LEN);
+   if (ret)
+   goto error;
+
/*
 * Unconditionally invalidate gpu caches and ensure that we do flush
 * any residual writes from the previous batch.
@@ -1288,10 +1311,6 @@ int i915_gem_ringbuffer_submission_final(struct i915_execbuffer_params *params)
 
if (ring == _priv->ring[RCS] &&
params->instp_mode != dev_priv->relative_constants_mode) {
-   ret = intel_ring_begin(req, 4);
-   if (ret)
-   goto error;
-
intel_ring_emit(ring, MI_NOOP);
intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
intel_ring_emit(ring, INSTPM);
@@ -1328,6 +1347,7 @@ error:
 */
intel_runtime_pm_put(dev_priv);
 
+early_error:
if (ret)
intel_ring_reserved_space_cancel(req->ringbuf);
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 1fa3228..d6acd2d6 100644
--- 
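The intel_ringbuffer.c side of the patch is truncated above; given the call site, intel_ring_test_space() presumably reduces to something like this (a sketch under that assumption, not the posted code):

int intel_ring_test_space(struct intel_ringbuffer *ringbuf, uint32_t min_space)
{
	if (intel_ring_space(ringbuf) < min_space) {
		/* Not enough room: let the scheduler back off and retry
		 * later instead of stalling for the GPU to catch up. */
		return -EAGAIN;
	}

	return 0;
}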

[Intel-gfx] [PATCH 33/40] drm/i915: Added scheduler statistic reporting to debugfs

2015-12-11 Thread John . C . Harrison
From: John Harrison 

It is useful for know what the scheduler is doing for both debugging
and performance analysis purposes. This change adds a bunch of
counters and such that keep track of various scheduler operations
(batches submitted, completed, flush requests, etc.). The data can
then be read in userland via the debugfs mechanism.

v2: Updated to match changes to scheduler implementation.

v3: Updated for changes to kill code and flush code.

Change-Id: I3266c631cd70c9eeb2c235f88f493e60462f85d7
For: VIZ-1587
Signed-off-by: John Harrison 
---
 drivers/gpu/drm/i915/i915_debugfs.c| 77 +++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 11 +++-
 drivers/gpu/drm/i915/i915_scheduler.c  | 85 +++---
 drivers/gpu/drm/i915/i915_scheduler.h  | 36 +
 4 files changed, 200 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 8f1c10c..9e7d67d 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -3603,6 +3603,82 @@ static int i915_drrs_status(struct seq_file *m, void *unused)
return 0;
 }
 
+static int i915_scheduler_info(struct seq_file *m, void *unused)
+{
+   struct drm_info_node *node = (struct drm_info_node *) m->private;
+   struct drm_device *dev = node->minor->dev;
+   struct drm_i915_private *dev_priv = dev->dev_private;
+   struct i915_scheduler   *scheduler = dev_priv->scheduler;
+   struct i915_scheduler_stats *stats = scheduler->stats;
+   struct i915_scheduler_stats_nodes node_stats[I915_NUM_RINGS];
+   struct intel_engine_cs *ring;
+   char   str[50 * (I915_NUM_RINGS + 1)], name[50], *ptr;
+   int ret, i, r;
+
+	ret = mutex_lock_interruptible(&dev->mode_config.mutex);
+   if (ret)
+   return ret;
+
+#define PRINT_VAR(name, fmt, var)  \
+   do {\
+   sprintf(str, "%-22s", name);\
+   ptr = str + strlen(str);\
+   for_each_ring(ring, dev_priv, r) {  \
+   sprintf(ptr, " %10" fmt, var);  \
+   ptr += strlen(ptr); \
+   }   \
+   seq_printf(m, "%s\n", str); \
+   } while (0)
+
+   PRINT_VAR("Ring name:", "s", dev_priv->ring[r].name);
+   PRINT_VAR("  Ring seqno",   "d", ring->get_seqno(ring, false));
+   seq_putc(m, '\n');
+
+   seq_puts(m, "Batch submissions:\n");
+   PRINT_VAR("  Queued",   "u", stats[r].queued);
+   PRINT_VAR("  Submitted","u", stats[r].submitted);
+   PRINT_VAR("  Completed","u", stats[r].completed);
+   PRINT_VAR("  Expired",  "u", stats[r].expired);
+   seq_putc(m, '\n');
+
+   seq_puts(m, "Flush counts:\n");
+   PRINT_VAR("  By object","u", stats[r].flush_obj);
+   PRINT_VAR("  By request",   "u", stats[r].flush_req);
+   PRINT_VAR("  By stamp", "u", stats[r].flush_stamp);
+   PRINT_VAR("  Blanket",  "u", stats[r].flush_all);
+   PRINT_VAR("  Entries bumped",   "u", stats[r].flush_bump);
+   PRINT_VAR("  Entries submitted","u", stats[r].flush_submit);
+   seq_putc(m, '\n');
+
+   seq_puts(m, "Miscellaneous:\n");
+   PRINT_VAR("  ExecEarly retry",  "u", stats[r].exec_early);
+   PRINT_VAR("  ExecFinal requeue","u", stats[r].exec_again);
+   PRINT_VAR("  ExecFinal killed", "u", stats[r].exec_dead);
+   PRINT_VAR("  Fence wait",   "u", stats[r].fence_wait);
+   PRINT_VAR("  Fence wait again", "u", stats[r].fence_again);
+   PRINT_VAR("  Fence wait ignore","u", stats[r].fence_ignore);
+   PRINT_VAR("  Fence supplied",   "u", stats[r].fence_got);
+   PRINT_VAR("  Hung flying",  "u", stats[r].kill_flying);
+   PRINT_VAR("  Hung queued",  "u", stats[r].kill_queued);
+   seq_putc(m, '\n');
+
+   seq_puts(m, "Queue contents:\n");
+   for_each_ring(ring, dev_priv, i)
+   i915_scheduler_query_stats(ring, node_stats + ring->id);
+
+   for (i = 0; i < (i915_sqs_MAX + 1); i++) {
+   sprintf(name, "  %s", i915_scheduler_queue_status_str(i));
+   PRINT_VAR(name, "d", node_stats[r].counts[i]);
+   }
+   seq_putc(m, '\n');
+
+#undef PRINT_VAR
+
+   mutex_unlock(&dev->mode_config.mutex);
+
+   return 0;
+}
+
 struct pipe_crc_info {
const char *name;
struct drm_device *dev;
@@ -5571,6 +5647,7 @@ static const struct drm_info_list i915_debugfs_list[] = {
   

[Intel-gfx] [RFC 38/38] drm/i915: Added preemption info to various trace points

2015-12-11 Thread John . C . Harrison
From: John Harrison 

v2: Fixed a typo (and improved the names in general). Updated for
changes to notify() code.

For: VIZ-2021
Signed-off-by: Dave Gordon 
---
 drivers/gpu/drm/i915/i915_gem.c   |  5 +++--
 drivers/gpu/drm/i915/i915_scheduler.c |  2 +-
 drivers/gpu/drm/i915/i915_trace.h | 30 +++---
 3 files changed, 23 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 68bf8ce..d90b12c 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2872,12 +2872,12 @@ void i915_gem_request_notify(struct intel_engine_cs 
*ring, bool fence_locked)
u32 seqno;
 
	if (list_empty(&ring->fence_signal_list)) {
-   trace_i915_gem_request_notify(ring, 0);
+   trace_i915_gem_request_notify(ring, 0, 0, 0);
return;
}
 
seqno   = ring->get_seqno(ring, false);
-   trace_i915_gem_request_notify(ring, seqno);
+   trace_i915_gem_request_notify(ring, seqno, 0, 0);
 
/* Is there anything new to process? */
if ((seqno == ring->last_irq_seqno) && 
!i915_scheduler_is_ring_preempting(ring))
@@ -2930,6 +2930,7 @@ void i915_gem_request_notify(struct intel_engine_cs 
*ring, bool fence_locked)
 
preempt_start = intel_read_status_page(ring, 
I915_PREEMPTIVE_ACTIVE_SEQNO);
preempt_done = intel_read_status_page(ring, 
I915_PREEMPTIVE_DONE_SEQNO);
+   trace_i915_gem_request_notify(ring, seqno, preempt_start, 
preempt_done);
 
/*
 * A preemption request leaves both ACTIVE and DONE set to the 
same
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c 
b/drivers/gpu/drm/i915/i915_scheduler.c
index e0db268..37fcd7c 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -616,7 +616,7 @@ bool i915_scheduler_notify_request(struct 
drm_i915_gem_request *req,
unsigned long flags;
bool result;
 
-   trace_i915_scheduler_landing(req);
+   trace_i915_scheduler_landing(req, preempt);
 
	spin_lock_irqsave(&scheduler->lock, flags);
 
diff --git a/drivers/gpu/drm/i915/i915_trace.h 
b/drivers/gpu/drm/i915/i915_trace.h
index 73b0ee9..5725cfa 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -569,13 +569,16 @@ DEFINE_EVENT(i915_gem_request, i915_gem_request_add,
 );
 
 TRACE_EVENT(i915_gem_request_notify,
-   TP_PROTO(struct intel_engine_cs *ring, uint32_t seqno),
-   TP_ARGS(ring, seqno),
+   TP_PROTO(struct intel_engine_cs *ring, uint32_t seqno,
+uint32_t preempt_start, uint32_t preempt_done),
+   TP_ARGS(ring, seqno, preempt_start, preempt_done),
 
TP_STRUCT__entry(
 __field(u32, dev)
 __field(u32, ring)
 __field(u32, seqno)
+__field(u32, preempt_start)
+__field(u32, preempt_done)
 __field(bool, is_empty)
 ),
 
@@ -583,11 +586,14 @@ TRACE_EVENT(i915_gem_request_notify,
   __entry->dev = ring->dev->primary->index;
   __entry->ring = ring->id;
   __entry->seqno = seqno;
+  __entry->preempt_start = preempt_start;
+  __entry->preempt_done = preempt_done;
   __entry->is_empty = 
list_empty(&ring->fence_signal_list);
   ),
 
-   TP_printk("dev=%u, ring=%u, seqno=%u, empty=%d",
+   TP_printk("dev=%u, ring=%u, seqno=%u, preempt_start=%u, 
preempt_done=%u, empty=%d",
  __entry->dev, __entry->ring, __entry->seqno,
+ __entry->preempt_start, __entry->preempt_done,
  __entry->is_empty)
 );
 
@@ -887,25 +893,27 @@ TRACE_EVENT(i915_scheduler_unfly,
 );
 
 TRACE_EVENT(i915_scheduler_landing,
-   TP_PROTO(struct drm_i915_gem_request *req),
-   TP_ARGS(req),
+   TP_PROTO(struct drm_i915_gem_request *req, bool preempt),
+   TP_ARGS(req, preempt),
 
TP_STRUCT__entry(
 __field(u32, ring)
 __field(u32, uniq)
 __field(u32, seqno)
 __field(u32, status)
+__field(bool, preempt)
 ),
 
TP_fast_assign(
-  __entry->ring   = req->ring->id;
-  __entry->uniq   = req->uniq;
-  __entry->seqno  = req->seqno;
-  __entry->status = req->scheduler_qe ? 
req->scheduler_qe->status : ~0U;
+  __entry->ring= req->ring->id;
+   

Re: [Intel-gfx] [PATCH 13/13] drm/i915: Cache last IRQ seqno to reduce IRQ overhead

2015-12-11 Thread Chris Wilson
On Fri, Dec 11, 2015 at 01:12:01PM +, john.c.harri...@intel.com wrote:
> From: John Harrison 
> 
> The notify function can be called many times without the seqno
> changing. Many of these duplicate calls exist to prevent races due to the
> requirement of not enabling interrupts until requested. However, when
> interrupts are enabled the IRQ handler can be called multiple times
> without the ring's seqno value changing. This patch reduces the
> overhead of these extra calls by caching the last processed seqno
> value and early exiting if it has not changed.

This is just plain wrong. Every user-interrupt is preceded by a seqno
update.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 08/40] drm/i915: Start of GPU scheduler

2015-12-11 Thread John . C . Harrison
From: John Harrison 

Initial creation of scheduler source files. Note that this patch
implements most of the scheduler functionality but does not hook it in
to the driver yet. It also leaves the scheduler code in 'pass through'
mode so that even when it is hooked in, it will not actually do very
much. This allows the hooks to be added one at a time in bite-sized
chunks and only when the scheduler is finally enabled at the end does
anything start happening.

The general theory of operation is that when batch buffers are
submitted to the driver, the execbuffer() code assigns a unique
request and then packages up all the information required to execute
the batch buffer at a later time. This package is given over to the
scheduler which adds it to an internal node list. The scheduler also
scans the list of objects associated with the batch buffer and
compares them against the objects already in use by other buffers in
the node list. If matches are found then the new batch buffer node is
marked as being dependent upon the matching node. The same is done for
the context object. The scheduler also bumps up the priority of such
matching nodes on the grounds that the more dependencies a given batch
buffer has the more important it is likely to be.

The scheduler aims to have a given (tuneable) number of batch buffers
in flight on the hardware at any given time. If fewer than this are
currently executing when a new node is queued, then the node is passed
straight through to the submit function. Otherwise it is simply added
to the queue and the driver returns back to user land.

As each batch buffer completes, it raises an interrupt which wakes up
the scheduler. Note that it is possible for multiple buffers to
complete before the IRQ handler gets to run. Further, it is possible
for the seqno values to be un-ordered (particularly once pre-emption
is enabled). However, the scheduler keeps the list of executing
buffers in order of hardware submission. Thus it can scan through the
list until a matching seqno is found and then mark all in flight nodes
from that point on as completed.

A deferred work queue is also poked by the interrupt handler. When
this wakes up it can do more involved processing such as actually
removing completed nodes from the queue and freeing up the resources
associated with them (internal memory allocations, DRM object
references, context reference, etc.). The work handler also checks the
in flight count and calls the submission code if a new slot has
appeared.

When the scheduler's submit code is called, it scans the queued node
list for the highest priority node that has no unmet dependencies.
Note that the dependency calculation is complex as it must take
inter-ring dependencies and potential preemptions into account. Note
also that in the future this will be extended to include external
dependencies such as the Android Native Sync file descriptors and/or
the Linux dma-buf synchronisation scheme.

If a suitable node is found then it is sent to execbuff_final() for
submission to the hardware. The in flight count is then re-checked and
a new node popped from the list if appropriate.

Note that this patch does not implement pre-emptive scheduling. Only
basic scheduling by re-ordering batch buffer submission is currently
implemented.
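
To make the queue/flight policy described above concrete, below is a small
self-contained userspace model (invented names, not driver code): nodes sit
in a queue with a count of unmet dependencies, and whenever fewer than the
target number are in flight, the highest priority ready node is submitted.

#include <stdbool.h>
#include <stdio.h>

#define MAX_NODES	8
#define TARGET_FLYING	2	/* tuneable number of batches in flight */

struct node {
	int	priority;
	int	unmet_deps;	/* dependencies not yet completed */
	bool	queued;
	bool	flying;
};

static struct node queue[MAX_NODES];
static int nflying;

/* Highest-priority queued node with no outstanding dependencies */
static struct node *select_next(void)
{
	struct node *best = NULL;
	int i;

	for (i = 0; i < MAX_NODES; i++) {
		struct node *n = &queue[i];

		if (!n->queued || n->unmet_deps)
			continue;
		if (!best || n->priority > best->priority)
			best = n;
	}
	return best;
}

/* Called on queue, and again whenever a completion frees a slot */
static void submit_if_room(void)
{
	struct node *n;

	while (nflying < TARGET_FLYING && (n = select_next())) {
		n->queued = false;
		n->flying = true;
		nflying++;
		printf("submitted node, prio %d\n", n->priority);
	}
}

int main(void)
{
	queue[0] = (struct node){ .priority = 10, .queued = true };
	queue[1] = (struct node){ .priority = 5, .unmet_deps = 1, .queued = true };
	queue[2] = (struct node){ .priority = 1, .queued = true };
	submit_if_room();	/* prio 10 and prio 1 go; prio 5 must wait */
	return 0;
}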

v2: Changed priority levels to +/-1023 due to feedback from Chris
Wilson.

Removed redundant index from scheduler node.

Changed time stamps to use jiffies instead of raw monotonic. This
provides lower resolution but improved compatibility with other i915
code.

Major re-write of completion tracking code due to struct fence
conversion. The scheduler no longer has its own private IRQ handler
but just lets the existing request code handle completion events.
Instead, the scheduler now hooks into the request notify code to be
told when a request has completed.

Reduced driver mutex locking scope. Removal of scheduler nodes no
longer grabs the mutex lock.

v3: Refactor of dependency generation to make the code more readable.
Also added in read-read optimisation support - i.e., don't treat a
shared read-only buffer as being a dependency.

Allowed the killing of queued nodes rather than only flying ones.

Change-Id: I1e08f59e650a3c2bbaaa9de7627da33849b06106
For: VIZ-1587
Signed-off-by: John Harrison 
---
 drivers/gpu/drm/i915/Makefile |   1 +
 drivers/gpu/drm/i915/i915_drv.h   |   4 +
 drivers/gpu/drm/i915/i915_gem.c   |   5 +
 drivers/gpu/drm/i915/i915_scheduler.c | 763 ++
 drivers/gpu/drm/i915/i915_scheduler.h |  91 
 5 files changed, 864 insertions(+)
 create mode 100644 drivers/gpu/drm/i915/i915_scheduler.c
 create mode 100644 drivers/gpu/drm/i915/i915_scheduler.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 15398c5..79cb38b 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -10,6 +10,7 @@ 

[Intel-gfx] [PATCH 06/40] drm/i915: Cache request pointer in *_submission_final()

2015-12-11 Thread John . C . Harrison
From: Dave Gordon 

Keep a local copy of the request pointer in the _final() functions
rather than dereferencing the params block repeatedly.

v3: New patch in series.

For: VIZ-1587
Signed-off-by: Dave Gordon 
Signed-off-by: John Harrison 
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 13 +++--
 drivers/gpu/drm/i915/intel_lrc.c   | 11 ++-
 2 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 05c9de6..e38310f 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1245,6 +1245,7 @@ i915_gem_ringbuffer_submission(struct 
i915_execbuffer_params *params,
 int i915_gem_ringbuffer_submission_final(struct i915_execbuffer_params *params)
 {
struct drm_i915_private *dev_priv = params->dev->dev_private;
+   struct drm_i915_gem_request *req = params->request;
struct intel_engine_cs  *ring = params->ring;
u64 exec_start, exec_len;
int ret;
@@ -1258,12 +1259,12 @@ int i915_gem_ringbuffer_submission_final(struct 
i915_execbuffer_params *params)
 * Unconditionally invalidate gpu caches and ensure that we do flush
 * any residual writes from the previous batch.
 */
-   ret = intel_ring_invalidate_all_caches(params->request);
+   ret = intel_ring_invalidate_all_caches(req);
if (ret)
goto error;
 
/* Switch to the correct context for the batch */
-   ret = i915_switch_context(params->request);
+   ret = i915_switch_context(req);
if (ret)
goto error;
 
@@ -1272,7 +1273,7 @@ int i915_gem_ringbuffer_submission_final(struct 
i915_execbuffer_params *params)
 
	if (ring == &dev_priv->ring[RCS] &&
params->instp_mode != dev_priv->relative_constants_mode) {
-   ret = intel_ring_begin(params->request, 4);
+   ret = intel_ring_begin(req, 4);
if (ret)
goto error;
 
@@ -1286,7 +1287,7 @@ int i915_gem_ringbuffer_submission_final(struct 
i915_execbuffer_params *params)
}
 
if (params->args_flags & I915_EXEC_GEN7_SOL_RESET) {
-   ret = i915_reset_gen7_sol_offsets(params->dev, params->request);
+   ret = i915_reset_gen7_sol_offsets(params->dev, req);
if (ret)
goto error;
}
@@ -1295,13 +1296,13 @@ int i915_gem_ringbuffer_submission_final(struct 
i915_execbuffer_params *params)
exec_start = params->batch_obj_vm_offset +
 params->args_batch_start_offset;
 
-   ret = ring->dispatch_execbuffer(params->request,
+   ret = ring->dispatch_execbuffer(req,
exec_start, exec_len,
params->dispatch_flags);
if (ret)
goto error;
 
-   trace_i915_gem_ring_dispatch(params->request, params->dispatch_flags);
+   trace_i915_gem_ring_dispatch(req, params->dispatch_flags);
 
i915_gem_execbuffer_retire_commands(params);
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 88d57b7..b98ea3d 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -929,7 +929,8 @@ int intel_execlists_submission(struct 
i915_execbuffer_params *params,
 int intel_execlists_submission_final(struct i915_execbuffer_params *params)
 {
struct drm_i915_private *dev_priv = params->dev->dev_private;
-   struct intel_ringbuffer *ringbuf = params->request->ringbuf;
+   struct drm_i915_gem_request *req = params->request;
+   struct intel_ringbuffer *ringbuf = req->ringbuf;
struct intel_engine_cs *ring = params->ring;
u64 exec_start;
int ret;
@@ -941,13 +942,13 @@ int intel_execlists_submission_final(struct 
i915_execbuffer_params *params)
 * Unconditionally invalidate gpu caches and ensure that we do flush
 * any residual writes from the previous batch.
 */
-   ret = logical_ring_invalidate_all_caches(params->request);
+   ret = logical_ring_invalidate_all_caches(req);
if (ret)
return ret;
 
	if (ring == &dev_priv->ring[RCS] &&
params->instp_mode != dev_priv->relative_constants_mode) {
-   ret = intel_logical_ring_begin(params->request, 4);
+   ret = intel_logical_ring_begin(req, 4);
if (ret)
return ret;
 
@@ -963,11 +964,11 @@ int intel_execlists_submission_final(struct 
i915_execbuffer_params *params)
exec_start = params->batch_obj_vm_offset +
 params->args_batch_start_offset;
 
-   ret = ring->emit_bb_start(params->request, exec_start, 
params->dispatch_flags);
+   ret = ring->emit_bb_start(req, exec_start, 

[Intel-gfx] [PATCH 31/40] drm/i915: Added debug state dump facilities to scheduler

2015-12-11 Thread John . C . Harrison
From: John Harrison 

When debugging batch buffer submission issues, it is useful to be able
to see what the current state of the scheduler is. This change adds
functions for decoding the internal scheduler state and reporting it.

v3: Updated a debug message with the new state_str() function.

Change-Id: I0634168e3f3465ff023f5a673165c90b07e535b6
For: VIZ-1587
Signed-off-by: John Harrison 
---
 drivers/gpu/drm/i915/i915_scheduler.c | 280 +-
 drivers/gpu/drm/i915/i915_scheduler.h |  14 ++
 2 files changed, 292 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_scheduler.c 
b/drivers/gpu/drm/i915/i915_scheduler.c
index e6e1bd967..be2430d 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -36,6 +36,9 @@ static int i915_scheduler_submit_max_priority(struct 
intel_engine_cs *ri
  bool is_locked);
 static uint32_ti915_scheduler_count_flying(struct i915_scheduler 
*scheduler,
   struct intel_engine_cs *ring);
+static int i915_scheduler_dump_locked(struct intel_engine_cs *ring,
+ const char *msg);
+static int i915_scheduler_dump_all_locked(struct drm_device *dev, 
const char *msg);
 static voidi915_scheduler_priority_bump_clear(struct i915_scheduler 
*scheduler);
 static int i915_scheduler_priority_bump(struct i915_scheduler 
*scheduler,
struct 
i915_scheduler_queue_entry *target,
@@ -53,6 +56,116 @@ bool i915_scheduler_is_enabled(struct drm_device *dev)
return dev_priv->scheduler != NULL;
 }
 
+const char *i915_qe_state_str(struct i915_scheduler_queue_entry *node)
+{
+   static char str[50];
+   char*ptr = str;
+
+   *(ptr++) = node->bumped ? 'B' : '-',
+   *(ptr++) = i915_gem_request_completed(node->params.request) ? 'C' : '-';
+
+   *ptr = 0;
+
+   return str;
+}
+
+char i915_scheduler_queue_status_chr(enum i915_scheduler_queue_status status)
+{
+   switch (status) {
+   case i915_sqs_none:
+   return 'N';
+
+   case i915_sqs_queued:
+   return 'Q';
+
+   case i915_sqs_popped:
+   return 'X';
+
+   case i915_sqs_flying:
+   return 'F';
+
+   case i915_sqs_complete:
+   return 'C';
+
+   case i915_sqs_dead:
+   return 'D';
+
+   default:
+   break;
+   }
+
+   return '?';
+}
+
+const char *i915_scheduler_queue_status_str(
+   enum i915_scheduler_queue_status status)
+{
+   static char str[50];
+
+   switch (status) {
+   case i915_sqs_none:
+   return "None";
+
+   case i915_sqs_queued:
+   return "Queued";
+
+   case i915_sqs_popped:
+   return "Popped";
+
+   case i915_sqs_flying:
+   return "Flying";
+
+   case i915_sqs_complete:
+   return "Complete";
+
+   case i915_sqs_dead:
+   return "Dead";
+
+   default:
+   break;
+   }
+
+   sprintf(str, "[Unknown_%d!]", status);
+   return str;
+}
+
+const char *i915_scheduler_flag_str(uint32_t flags)
+{
+   static char str[100];
+   char   *ptr = str;
+
+   *ptr = 0;
+
+#define TEST_FLAG(flag, msg)   \
+   do {\
+   if (flags & (flag)) {   \
+   strcpy(ptr, msg);   \
+   ptr += strlen(ptr); \
+   flags &= ~(flag);   \
+   }   \
+   } while (0)
+
+   TEST_FLAG(i915_sf_interrupts_enabled, "IntOn|");
+   TEST_FLAG(i915_sf_submitting, "Submitting|");
+   TEST_FLAG(i915_sf_dump_force, "DumpForce|");
+   TEST_FLAG(i915_sf_dump_details,   "DumpDetails|");
+   TEST_FLAG(i915_sf_dump_dependencies,  "DumpDeps|");
+
+#undef TEST_FLAG
+
+   if (flags) {
+   sprintf(ptr, "Unknown_0x%X!", flags);
+   ptr += strlen(ptr);
+   }
+
+   if (ptr == str)
+   strcpy(str, "-");
+   else
+   ptr[-1] = 0;
+
+   return str;
+};
+
 int i915_scheduler_init(struct drm_device *dev)
 {
struct drm_i915_private *dev_priv = dev->dev_private;
@@ -631,6 +744,169 @@ void i915_gem_scheduler_work_handler(struct work_struct 
*work)
}
 }
 
+int i915_scheduler_dump_all(struct drm_device *dev, const char *msg)
+{
+   struct drm_i915_private *dev_priv = dev->dev_private;
+   struct i915_scheduler   *scheduler = dev_priv->scheduler;
+   unsigned long   flags;
+   int ret;
+
+   

[Intel-gfx] [PATCH 29/40] drm/i915: Added scheduler queue throttling by DRM file handle

2015-12-11 Thread John . C . Harrison
From: John Harrison 

The scheduler decouples the submission of batch buffers to the driver
from their subsequent submission to the hardware. This means that an
application which is continuously submitting buffers as fast as it can
could potentially flood the driver. To prevent this, the driver now
tracks how many buffers are in progress (queued in software or
executing in hardware) and limits this to a given (tunable) number. If
this number is exceeded then the queue to the driver will return
EAGAIN and thus prevent the scheduler's queue becoming arbitrarily
large.
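
The body of i915_scheduler_file_queue_is_full() is clipped from this
archive, but given the per-file counter and the file_queue_max tunable in
the hunks below, it presumably reduces to a bounds check along these lines
(a sketch, not the actual hunk):

static bool i915_scheduler_file_queue_is_full(struct drm_file *file)
{
	struct drm_i915_file_private *file_priv = file->driver_priv;
	struct drm_i915_private *dev_priv = file_priv->dev_priv;
	struct i915_scheduler *scheduler = dev_priv->scheduler;

	return file_priv->scheduler_queue_length >= scheduler->file_queue_max;
}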

v3: Added a missing decrement of the file queue counter.

Change-Id: I83258240aec7c810db08c006a3062d46aa91363f
For: VIZ-1587
Signed-off-by: John Harrison 
---
 drivers/gpu/drm/i915/i915_drv.h|  2 ++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  8 +++
 drivers/gpu/drm/i915/i915_scheduler.c  | 35 ++
 drivers/gpu/drm/i915/i915_scheduler.h  |  2 ++
 4 files changed, 47 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 4187e75..4ecb6e4 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -342,6 +342,8 @@ struct drm_i915_file_private {
} rps;
 
struct intel_engine_cs *bsd_ring;
+
+   u32 scheduler_queue_length;
 };
 
 enum intel_dpll_id {
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index b358b21..8ba426f 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1862,6 +1862,10 @@ i915_gem_execbuffer(struct drm_device *dev, void *data,
return -EINVAL;
}
 
+   /* Throttle batch requests per device file */
+   if (i915_scheduler_file_queue_is_full(file))
+   return -EAGAIN;
+
/* Copy in the exec list from userland */
exec_list = drm_malloc_ab(sizeof(*exec_list), args->buffer_count);
exec2_list = drm_malloc_ab(sizeof(*exec2_list), args->buffer_count);
@@ -1945,6 +1949,10 @@ i915_gem_execbuffer2(struct drm_device *dev, void *data,
return -EINVAL;
}
 
+   /* Throttle batch requests per device file */
+   if (i915_scheduler_file_queue_is_full(file))
+   return -EAGAIN;
+
exec2_list = kmalloc(sizeof(*exec2_list)*args->buffer_count,
 GFP_TEMPORARY | __GFP_NOWARN | __GFP_NORETRY);
if (exec2_list == NULL)
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c 
b/drivers/gpu/drm/i915/i915_scheduler.c
index 4736f0f..e6e1bd967 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -40,6 +40,8 @@ static voidi915_scheduler_priority_bump_clear(struct 
i915_scheduler *sch
 static int i915_scheduler_priority_bump(struct i915_scheduler 
*scheduler,
struct 
i915_scheduler_queue_entry *target,
uint32_t bump);
+static voidi915_scheduler_file_queue_inc(struct drm_file *file);
+static voidi915_scheduler_file_queue_dec(struct drm_file *file);
 
 bool i915_scheduler_is_enabled(struct drm_device *dev)
 {
@@ -74,6 +76,7 @@ int i915_scheduler_init(struct drm_device *dev)
scheduler->priority_level_max = 1023;
scheduler->priority_level_preempt = 900;
scheduler->min_flying = 2;
+   scheduler->file_queue_max = 64;
 
dev_priv->scheduler = scheduler;
 
@@ -267,6 +270,8 @@ int i915_scheduler_queue_execbuffer(struct 
i915_scheduler_queue_entry *qe)
 
	list_add_tail(&node->link, &scheduler->node_queue[ring->id]);
 
+   i915_scheduler_file_queue_inc(node->params.file);
+
if (i915.scheduler_override & i915_so_submit_on_queue)
not_flying = true;
else
@@ -551,6 +556,12 @@ static int i915_scheduler_remove(struct intel_engine_cs 
*ring)
/* Strip the dependency info while the mutex is still locked */
i915_scheduler_remove_dependent(scheduler, node);
 
+   /* Likewise clean up the file descriptor before it might 
disappear. */
+   if (node->params.file) {
+   i915_scheduler_file_queue_dec(node->params.file);
+   node->params.file = NULL;
+   }
+
continue;
}
 
@@ -1194,6 +1205,7 @@ int i915_scheduler_closefile(struct drm_device *dev, 
struct drm_file *file)
 node->status,
 ring->name);
 
+   i915_scheduler_file_queue_dec(node->params.file);
node->params.file = NULL;
}
}
@@ -1202,3 +1214,26 @@ int i915_scheduler_closefile(struct drm_device *dev, 
struct drm_file *file)
 
return 0;
 }
+

Re: [Intel-gfx] [PATCH V4 2/2] drm/i915: start adding dp mst audio

2015-12-11 Thread Takashi Iwai
On Fri, 11 Dec 2015 07:07:53 +0100,
Libin Yang wrote:
> 
> Add Takashi and ALSA mail list.
> 
> On 12/10/2015 05:02 PM, Daniel Vetter wrote:
> > On Tue, Dec 08, 2015 at 04:01:20PM +0800, Libin Yang wrote:
> >> Hi all,
> >>
> >> Any comments on the patches?
> >
> > Sorry, simply fell through the cracks since Ander is on vacation. Takashi
> > is working on some cleanup patches to have a port->encoder mapping for the
> > audio side of i915. His patch cleans up all the existing audio code in
> > i915, but please work together with him to align mst code with the new
> > style.
> >
> > Both patches queued for next.
> 
> Yes, I have seen Takashi's patches. I will check the patches.

A patch like the one below should work; it sets/clears the reverse mapping
dynamically for the MST encoder.

At least, now I could get a proper ELD from a docking station.  But
the audio itself doesn't seem to work yet, missing something...

FWIW, the fixed patches are found in my test/hdmi-jack branch.
It contains my previous get_eld patchset, HD-audio side changes,
Libin's this patchset, plus Libin's HD-audio MST patchset and some
fixes.


Takashi

---
diff --git a/drivers/gpu/drm/i915/intel_dp_mst.c 
b/drivers/gpu/drm/i915/intel_dp_mst.c
index 8b608c2cd070..87dad62fd10b 100644
--- a/drivers/gpu/drm/i915/intel_dp_mst.c
+++ b/drivers/gpu/drm/i915/intel_dp_mst.c
@@ -108,6 +108,7 @@ static void intel_mst_disable_dp(struct intel_encoder 
*encoder)
struct drm_i915_private *dev_priv = dev->dev_private;
struct drm_crtc *crtc = encoder->base.crtc;
struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
+   enum port port = intel_dig_port->port;
 
int ret;
 
@@ -122,6 +123,9 @@ static void intel_mst_disable_dp(struct intel_encoder 
*encoder)
if (intel_crtc->config->has_audio) {
intel_audio_codec_disable(encoder);
intel_display_power_put(dev_priv, POWER_DOMAIN_AUDIO);
+   mutex_lock(&dev_priv->av_mutex);
+   dev_priv->dig_port_map[port] = NULL;
+   mutex_unlock(&dev_priv->av_mutex);
}
 }
 
@@ -236,6 +240,9 @@ static void intel_mst_enable_dp(struct intel_encoder 
*encoder)
if (crtc->config->has_audio) {
DRM_DEBUG_DRIVER("Enabling DP audio on pipe %c\n",
 pipe_name(crtc->pipe));
+   mutex_lock(&dev_priv->av_mutex);
+   dev_priv->dig_port_map[port] = encoder;
+   mutex_unlock(&dev_priv->av_mutex);
intel_display_power_get(dev_priv, POWER_DOMAIN_AUDIO);
intel_audio_codec_enable(encoder);
}
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 2/6] drm/i915: Support for creating Stolen memory backed objects

2015-12-11 Thread Dave Gordon

On 11/12/15 12:19, Tvrtko Ursulin wrote:


On 11/12/15 11:22, Ankitprasad Sharma wrote:

On Wed, 2015-12-09 at 14:06 +, Tvrtko Ursulin wrote:

Hi,

On 09/12/15 12:46, ankitprasad.r.sha...@intel.com wrote:

From: Ankitprasad Sharma 


[snip!]

+/**
+ * Requested flags (currently used for placement
+ * (which memory domain))
+ *
+ * You can request that the object be created from special memory
+ * rather than regular system pages using this parameter. Such
+ * irregular objects may have certain restrictions (such as CPU
+ * access to a stolen object is verboten).
+ *
+ * This can be used in the future for other purposes too
+ * e.g. specifying tiling/caching/madvise
+ */
+__u32 flags;
+#define I915_CREATE_PLACEMENT_STOLEN (1<<0) /* Cannot use CPU
mmaps */
+#define __I915_CREATE_UNKNOWN_FLAGS
-(I915_CREATE_PLACEMENT_STOLEN << 1)


I've asked in another reply, now that userspace can create a stolen
object, what happens if it tries to use it for a batch buffer?

Can it end up in the relocate_entry_cpu with a batch buffer allocated
from stolen, which would then call i915_gem_object_get_page and crash?

Thanks for pointing it out.
Yes, this is definitely a possibility, if we allocate batchbuffers from
the stolen region. I have started working on that, to do
relocate_entry_stolen() if the object is allocated from stolen.


Or perhaps it would be OK to just fail the execbuf?

Just thinking about how to simplify things. Is it required (or expected) that
users will need or want to create batch buffers from stolen?

Regards,
Tvrtko


Let's NOT have batchbuffers in stolen. Or anywhere else exotic, just in 
regular shmfs-backed GEM objects (no phys, userptr, or dma_buf either).
And I'd rather contexts and ringbuffers weren't placed there either, 
because the CPU needs to write those all the time. All special-purpose 
GEM objects should be usable ONLY as data buffers for the GPU, or for 
CPU access with pread/pwrite. The objects that the kernel needs to 
understand and manipulate (contexts, ringbuffers, and batches) should 
always be default (shmfs-backed) GEM objects, so that we don't have to 
propagate the understanding of all the exceptional cases into a 
multitude of different kernel functions.


Oh, and I'd suggest that once we have more than two GEM object types, 
the pread/pwrite operations should be extracted and turned into vfuncs 
rather than adding complexity to the common ioctl/shmfs path.
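
A sketch of what that vfunc split might look like (names invented for
illustration; nothing of the sort exists in the tree yet):

/* Hypothetical per-object-type hooks, installed at object creation */
struct drm_i915_gem_object_rw_ops {
	int (*pread)(struct drm_i915_gem_object *obj,
		     struct drm_i915_gem_pread *args,
		     struct drm_file *file);
	int (*pwrite)(struct drm_i915_gem_object *obj,
		      struct drm_i915_gem_pwrite *args,
		      struct drm_file *file);
};

/* The common ioctl path then dispatches instead of testing object types */
static int i915_gem_do_pread(struct drm_i915_gem_object *obj,
			     struct drm_i915_gem_pread *args,
			     struct drm_file *file)
{
	return obj->rw_ops->pread(obj, args, file);
}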


.Dave.

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH] drm/i915: Fix context/engine cleanup order

2015-12-11 Thread Nick Hoath
Swap the order of context & engine cleanup, so that it is now
contexts, then engines.
This allows the context clean up code to do things like confirm
that ring->dev->struct_mutex is locked without a NULL pointer
dereference.
This came about as a result of the 'intel_ring_initialized() must
be simple and inline' patch now using ring->dev as an initialised
flag.
Rename the cleanup function to reflect what it actually does.
Also clean up some very annoying whitespace issues at the same time.

Signed-off-by: Nick Hoath 
Cc: Mika Kuoppala 
Cc: Daniel Vetter 
Cc: David Gordon 
Cc: Chris Wilson 
---
 drivers/gpu/drm/i915/i915_dma.c |  4 ++--
 drivers/gpu/drm/i915/i915_drv.h |  2 +-
 drivers/gpu/drm/i915/i915_gem.c | 23 ---
 3 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 84e2b20..a2857b0 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -449,7 +449,7 @@ static int i915_load_modeset_init(struct drm_device *dev)
 
 cleanup_gem:
	mutex_lock(&dev->struct_mutex);
-   i915_gem_cleanup_ringbuffer(dev);
+   i915_gem_cleanup_engines(dev);
i915_gem_context_fini(dev);
	mutex_unlock(&dev->struct_mutex);
 cleanup_irq:
@@ -1188,8 +1188,8 @@ int i915_driver_unload(struct drm_device *dev)
 
intel_guc_ucode_fini(dev);
	mutex_lock(&dev->struct_mutex);
-   i915_gem_cleanup_ringbuffer(dev);
i915_gem_context_fini(dev);
+   i915_gem_cleanup_engines(dev);
	mutex_unlock(&dev->struct_mutex);
intel_fbc_cleanup_cfb(dev_priv);
i915_gem_cleanup_stolen(dev);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 5edd393..e317f88 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3016,7 +3016,7 @@ int i915_gem_init_rings(struct drm_device *dev);
 int __must_check i915_gem_init_hw(struct drm_device *dev);
 int i915_gem_l3_remap(struct drm_i915_gem_request *req, int slice);
 void i915_gem_init_swizzling(struct drm_device *dev);
-void i915_gem_cleanup_ringbuffer(struct drm_device *dev);
+void i915_gem_cleanup_engines(struct drm_device *dev);
 int __must_check i915_gpu_idle(struct drm_device *dev);
 int __must_check i915_gem_suspend(struct drm_device *dev);
 void __i915_add_request(struct drm_i915_gem_request *req,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 8e2acde..04a22db 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4823,7 +4823,7 @@ i915_gem_init_hw(struct drm_device *dev)
 
	ret = i915_gem_request_alloc(ring, ring->default_context, &req);
if (ret) {
-   i915_gem_cleanup_ringbuffer(dev);
+   i915_gem_cleanup_engines(dev);
goto out;
}
 
@@ -4836,7 +4836,7 @@ i915_gem_init_hw(struct drm_device *dev)
if (ret && ret != -EIO) {
DRM_ERROR("PPGTT enable ring #%d failed %d\n", i, ret);
i915_gem_request_cancel(req);
-   i915_gem_cleanup_ringbuffer(dev);
+   i915_gem_cleanup_engines(dev);
goto out;
}
 
@@ -4844,7 +4844,7 @@ i915_gem_init_hw(struct drm_device *dev)
if (ret && ret != -EIO) {
DRM_ERROR("Context enable ring #%d failed %d\n", i, 
ret);
i915_gem_request_cancel(req);
-   i915_gem_cleanup_ringbuffer(dev);
+   i915_gem_cleanup_engines(dev);
goto out;
}
 
@@ -4919,7 +4919,7 @@ out_unlock:
 }
 
 void
-i915_gem_cleanup_ringbuffer(struct drm_device *dev)
+i915_gem_cleanup_engines(struct drm_device *dev)
 {
struct drm_i915_private *dev_priv = dev->dev_private;
struct intel_engine_cs *ring;
@@ -4928,13 +4928,14 @@ i915_gem_cleanup_ringbuffer(struct drm_device *dev)
for_each_ring(ring, dev_priv, i)
dev_priv->gt.cleanup_ring(ring);
 
-if (i915.enable_execlists)
-/*
- * Neither the BIOS, ourselves or any other kernel
- * expects the system to be in execlists mode on startup,
- * so we need to reset the GPU back to legacy mode.
- */
-intel_gpu_reset(dev);
+   if (i915.enable_execlists) {
+   /*
+* Neither the BIOS, ourselves or any other kernel
+* expects the system to be in execlists mode on startup,
+* so we need to reset the GPU back to legacy mode.
+*/
+   intel_gpu_reset(dev);
+   }
 }
 
 static void
-- 
1.9.1

___
Intel-gfx 

[Intel-gfx] [RFC 36/38] drm/i915/preempt: update (LRC) ringbuffer-filling code to create preemptive requests

2015-12-11 Thread John . C . Harrison
From: Dave Gordon 

This patch refactors the ringbuffer-level code (in execlists/GuC mode
only) and enhances it so that it can emit the proper sequence of opcodes
for preemption requests.

A preemption request is similar to a batch submission, but doesn't
actually invoke a batchbuffer, the purpose being simply to get the
engine to stop what it's doing so that the scheduler can then send it a
new workload instead.

Preemption requests use different locations in the hardware status page
to hold the 'active' and 'done' seqnos than regular batches do, so that
information pertaining to a preempted batch is not overwritten. Also,
whereas a regular batch clears its 'active' flag when it finishes (so
that TDR knows it's no longer to blame), preemption requests leave this
set and the driver clears it once the completion of the preemption
request has been noticed. Only one preemption (per ring) can be in
progress at one time, so this handshake ensures correct sequencing of
the request between the GPU and CPU.
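
Given that handshake (and the notify-path hunk elsewhere in this series),
the CPU side amounts to comparing the two status-page dwords; schematically
(a sketch with an invented function name, not a hunk from this patch):

/*
 * A regular batch clears its ACTIVE seqno when it retires, but a
 * preemption request leaves ACTIVE == DONE == its seqno until the
 * driver acknowledges it by clearing ACTIVE itself.
 */
static void check_preemption_done(struct intel_engine_cs *ring)
{
	u32 start = intel_read_status_page(ring, I915_PREEMPTIVE_ACTIVE_SEQNO);
	u32 done = intel_read_status_page(ring, I915_PREEMPTIVE_DONE_SEQNO);

	if (start && start == done) {
		/* Preemption finished: requeue preempted work, then ack */
		intel_write_status_page(ring, I915_PREEMPTIVE_ACTIVE_SEQNO, 0);
	}
}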

Actually-preemptive requests are still disabled via a module parameter
at this stage, but all the components should now be ready for us to turn
it on :)

v2: Updated to use locally cached request pointer and to fix the
location of the dispatch trace point.

For: VIZ-2021
Signed-off-by: Dave Gordon 
---
 drivers/gpu/drm/i915/intel_lrc.c | 177 ++-
 1 file changed, 136 insertions(+), 41 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 36d63b7..31645a3 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -748,7 +748,7 @@ intel_logical_ring_advance_and_submit(struct 
drm_i915_gem_request *request)
struct drm_i915_private *dev_priv = request->i915;
struct i915_guc_client *client = dev_priv->guc.execbuf_client;
const static bool fake = false; /* true => only pretend to preempt */
-   bool preemptive = false;/* for now */
+   bool preemptive;
 
intel_logical_ring_advance(request->ringbuf);
 
@@ -757,6 +757,7 @@ intel_logical_ring_advance_and_submit(struct 
drm_i915_gem_request *request)
if (intel_ring_stopped(ring))
return;
 
+   preemptive = (request->scheduler_flags & i915_req_sf_preempt) != 0;
if (preemptive && dev_priv->guc.preempt_client && !fake)
client = dev_priv->guc.preempt_client;
 
@@ -951,6 +952,117 @@ int intel_execlists_submission(struct 
i915_execbuffer_params *params,
 }
 
 /*
+ * This function stores the specified constant value in the (index)th DWORD of 
the
+ * hardware status page (execlist mode only). See separate code for legacy 
mode.
+ */
+static void
+emit_store_dw_index(struct drm_i915_gem_request *req, uint32_t value, uint32_t 
index)
+{
+   struct intel_ringbuffer *ringbuf = req->ringbuf;
+   uint64_t hwpa = req->ring->status_page.gfx_addr;
+   hwpa += index << MI_STORE_DWORD_INDEX_SHIFT;
+
+   intel_logical_ring_emit(ringbuf, MI_STORE_DWORD_IMM_GEN4 | 
MI_GLOBAL_GTT);
+   intel_logical_ring_emit(ringbuf, lower_32_bits(hwpa));
+   intel_logical_ring_emit(ringbuf, upper_32_bits(hwpa)); /* GEN8+ */
+   intel_logical_ring_emit(ringbuf, value);
+
+   req->ring->gpu_caches_dirty = true;
+}
+
+/*
+ * This function stores the specified register value in the (index)th DWORD
+ * of the hardware status page (execlist mode only). See separate code for
+ * legacy mode.
+ */
+static void
+emit_store_reg_index(struct drm_i915_gem_request *req, uint32_t reg, uint32_t 
index)
+{
+   struct intel_ringbuffer *ringbuf = req->ringbuf;
+   uint64_t hwpa = req->ring->status_page.gfx_addr;
+   hwpa += index << MI_STORE_DWORD_INDEX_SHIFT;
+
+   intel_logical_ring_emit(ringbuf, (MI_STORE_REG_MEM+1) | MI_GLOBAL_GTT);
+   intel_logical_ring_emit(ringbuf, reg);
+   intel_logical_ring_emit(ringbuf, lower_32_bits(hwpa));
+   intel_logical_ring_emit(ringbuf, upper_32_bits(hwpa)); /* GEN8+ */
+
+   req->ring->gpu_caches_dirty = true;
+}
+
+/*
+ * Emit the commands to execute when preparing to start a batch
+ *
+ * The GPU will log the seqno of the batch before it starts
+ * running any of the commands to actually execute that batch
+ */
+static void
+emit_preamble(struct drm_i915_gem_request *req)
+{
+   struct intel_ringbuffer *ringbuf = req->ringbuf;
+   uint32_t seqno = i915_gem_request_get_seqno(req);
+
+   BUG_ON(!seqno);
+   if (req->scheduler_flags & i915_req_sf_preempt)
+   emit_store_dw_index(req, seqno, I915_PREEMPTIVE_ACTIVE_SEQNO);
+   else
+   emit_store_dw_index(req, seqno, I915_BATCH_ACTIVE_SEQNO);
+
+   intel_logical_ring_emit(ringbuf, MI_REPORT_HEAD);
+   intel_logical_ring_emit(ringbuf, MI_NOOP);
+
+   req->ring->gpu_caches_dirty = true;
+}
+
+static void
+emit_relconsts_mode(struct i915_execbuffer_params *params)
+{
+  

Re: [Intel-gfx] [PATCH] drm/i915: Allow objects to go back above 4GB in the address range

2015-12-11 Thread Chris Wilson
On Fri, Dec 11, 2015 at 02:34:13PM +, Michel Thierry wrote:
> We detected if objects should be moved to the lower parts when 48-bit
> support flag was not set, but not the other way around.
> 
> This handles the case in which an object was allocated in the 32-bit
> address range, but it has been marked as safe to move above it, which
> theoretically would help to keep the lower addresses available for
> objects which really need to be there.
> 
> Cc: Daniele Ceraolo Spurio 
> Signed-off-by: Michel Thierry 

No. This is not lazy. When we run out of low space, we evict. Until then
don't cause extra work for no reason.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 10/13] drm/i915: Updated request structure tracing

2015-12-11 Thread John . C . Harrison
From: John Harrison 

Added the '_complete' trace event which occurs when a fence/request is
signaled as complete. Also moved the notify event from the IRQ handler
code to inside the notify function itself.

v3: Added the current ring seqno to the notify trace point.

For: VIZ-5190
Signed-off-by: John Harrison 
---
 drivers/gpu/drm/i915/i915_gem.c   |  6 +-
 drivers/gpu/drm/i915/i915_irq.c   |  2 --
 drivers/gpu/drm/i915/i915_trace.h | 13 -
 3 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index f71215f..4817015 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2776,13 +2776,16 @@ void i915_gem_request_notify(struct intel_engine_cs 
*ring, bool fence_locked)
unsigned long flags;
u32 seqno;
 
-   if (list_empty(&ring->fence_signal_list))
+   if (list_empty(&ring->fence_signal_list)) {
+   trace_i915_gem_request_notify(ring, 0);
return;
+   }
 
if (!fence_locked)
	spin_lock_irqsave(&ring->fence_lock, flags);
 
seqno = ring->get_seqno(ring, false);
+   trace_i915_gem_request_notify(ring, seqno);
 
	list_for_each_entry_safe(req, req_next, &ring->fence_signal_list, 
signal_link) {
if (!req->cancelled) {
@@ -2798,6 +2801,7 @@ void i915_gem_request_notify(struct intel_engine_cs 
*ring, bool fence_locked)
 
if (!req->cancelled) {
	fence_signal_locked(&req->fence);
+   trace_i915_gem_request_complete(req);
}
 
if (req->irq_enabled) {
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 74f8552..d280e05 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -979,8 +979,6 @@ static void notify_ring(struct intel_engine_cs *ring)
if (!intel_ring_initialized(ring))
return;
 
-   trace_i915_gem_request_notify(ring);
-
i915_gem_request_notify(ring, false);
 
	wake_up_all(&ring->irq_queue);
diff --git a/drivers/gpu/drm/i915/i915_trace.h 
b/drivers/gpu/drm/i915/i915_trace.h
index 04fe849..41a026d 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -561,23 +561,26 @@ DEFINE_EVENT(i915_gem_request, i915_gem_request_add,
 );
 
 TRACE_EVENT(i915_gem_request_notify,
-   TP_PROTO(struct intel_engine_cs *ring),
-   TP_ARGS(ring),
+   TP_PROTO(struct intel_engine_cs *ring, uint32_t seqno),
+   TP_ARGS(ring, seqno),
 
TP_STRUCT__entry(
 __field(u32, dev)
 __field(u32, ring)
 __field(u32, seqno)
+__field(bool, is_empty)
 ),
 
TP_fast_assign(
   __entry->dev = ring->dev->primary->index;
   __entry->ring = ring->id;
-  __entry->seqno = ring->get_seqno(ring, false);
+  __entry->seqno = seqno;
+  __entry->is_empty = 
list_empty(&ring->fence_signal_list);
   ),
 
-   TP_printk("dev=%u, ring=%u, seqno=%u",
- __entry->dev, __entry->ring, __entry->seqno)
+   TP_printk("dev=%u, ring=%u, seqno=%u, empty=%d",
+ __entry->dev, __entry->ring, __entry->seqno,
+ __entry->is_empty)
 );
 
 DEFINE_EVENT(i915_gem_request, i915_gem_request_retire,
-- 
1.9.1

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 05/13] drm/i915: Convert requests to use struct fence

2015-12-11 Thread John . C . Harrison
From: John Harrison 

There is a construct in the linux kernel called 'struct fence' that is
intended to keep track of work that is executed on hardware. I.e. it
solves the basic problem that the driver's 'struct
drm_i915_gem_request' is trying to address. The request structure does
quite a lot more than simply track the execution progress so is very
definitely still required. However, the basic completion status side
could be updated to use the ready made fence implementation and gain
all the advantages that provides.

This patch makes the first step of integrating a struct fence into the
request. It replaces the explicit reference count with that of the
fence. It also replaces the 'is completed' test with the fence's
equivalent. Currently, that simply chains on to the original request
implementation. A future patch will improve this.
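
For orientation, embedding a struct fence means supplying a small ops table
and initialising the fence at request-allocation time. Schematically (a
sketch against the fence API of this era; apart from the two get_*_name
callbacks, the hook and field names here are assumptions, and the full
patch below is authoritative):

static const struct fence_ops i915_gem_request_fops = {
	.get_driver_name	= i915_gem_request_get_driver_name,
	.get_timeline_name	= i915_gem_request_get_timeline_name,
	.enable_signaling	= i915_gem_request_enable_signaling,	/* assumed name */
	.signaled		= i915_gem_request_is_signaled,		/* assumed name */
	.wait			= fence_default_wait,
	.release		= i915_gem_request_release,		/* assumed name */
};

/* At request allocation time (lock and context fields assumed): */
fence_init(&req->fence, &i915_gem_request_fops, &ring->fence_lock,
	   ring->fence_context, seqno);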

v3: Updated after review comments by Tvrtko Ursulin. Added fence
context/seqno pair to the debugfs request info. Renamed fence 'driver
name' to just 'i915'. Removed BUG_ONs.

For: VIZ-5190
Signed-off-by: John Harrison 
Cc: Tvrtko Ursulin 
---
 drivers/gpu/drm/i915/i915_debugfs.c |  5 +--
 drivers/gpu/drm/i915/i915_drv.h | 45 +-
 drivers/gpu/drm/i915/i915_gem.c | 56 ++---
 drivers/gpu/drm/i915/intel_lrc.c|  1 +
 drivers/gpu/drm/i915/intel_ringbuffer.c |  1 +
 drivers/gpu/drm/i915/intel_ringbuffer.h |  3 ++
 6 files changed, 81 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
b/drivers/gpu/drm/i915/i915_debugfs.c
index 7415606..5b31186 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -709,11 +709,12 @@ static int i915_gem_request_info(struct seq_file *m, void 
*data)
task = NULL;
if (req->pid)
task = pid_task(req->pid, PIDTYPE_PID);
-   seq_printf(m, "%x @ %d: %s [%d]\n",
+   seq_printf(m, "%x @ %d: %s [%d], fence = %u.%u\n",
   req->seqno,
   (int) (jiffies - req->emitted_jiffies),
   task ? task->comm : "",
-  task ? task->pid : -1);
+  task ? task->pid : -1,
+  req->fence.context, req->fence.seqno);
rcu_read_unlock();
}
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 436149e..aa5cba7 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -51,6 +51,7 @@
 #include 
 #include 
 #include "intel_guc.h"
+#include 
 
 /* General customization:
  */
@@ -2174,7 +2175,17 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
  * initial reference taken using kref_init
  */
 struct drm_i915_gem_request {
-   struct kref ref;
+   /**
+* Underlying object for implementing the signal/wait stuff.
+* NB: Never call fence_later() or return this fence object to user
+* land! Due to lazy allocation, scheduler re-ordering, pre-emption,
+* etc., there is no guarantee at all about the validity or
+* sequentiality of the fence's seqno! It is also unsafe to let
+* anything outside of the i915 driver get hold of the fence object
+* as the clean up when decrementing the reference count requires
+* holding the driver mutex lock.
+*/
+   struct fence fence;
 
/** On Which ring this request was generated */
struct drm_i915_private *i915;
@@ -2251,7 +2262,13 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
   struct intel_context *ctx,
   struct drm_i915_gem_request **req_out);
 void i915_gem_request_cancel(struct drm_i915_gem_request *req);
-void i915_gem_request_free(struct kref *req_ref);
+
+static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
+ bool lazy_coherency)
+{
+   return fence_is_signaled(&req->fence);
+}
+
 int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
   struct drm_file *file);
 
@@ -2271,7 +2288,7 @@ static inline struct drm_i915_gem_request *
 i915_gem_request_reference(struct drm_i915_gem_request *req)
 {
if (req)
-   kref_get(&req->ref);
+   fence_get(&req->fence);
return req;
 }
 
@@ -2279,7 +2296,7 @@ static inline void
 i915_gem_request_unreference(struct drm_i915_gem_request *req)
 {
	WARN_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
-   kref_put(&req->ref, i915_gem_request_free);
+   fence_put(&req->fence);
 }
 
 static inline void
@@ -2291,7 +2308,7 @@ i915_gem_request_unreference__unlocked(struct 

[Intel-gfx] [PATCH 00/13] Convert requests to use struct fence

2015-12-11 Thread John . C . Harrison
From: John Harrison 

There is a construct in the linux kernel called 'struct fence' that is
intended to keep track of work that is executed on hardware. I.e. it
solves the basic problem that the driver's 'struct
drm_i915_gem_request' is trying to address. The request structure does
quite a lot more than simply track the execution progress so is very
definitely still required. However, the basic completion status side
could be updated to use the ready made fence implementation and gain
all the advantages that provides.

Using the struct fence object also has the advantage that the fence
can be used outside of the i915 driver (by other drivers or by
userland applications). That is the basis of the dma-buf
synchronisation API and allows asynchronous tracking of work
completion. In this case, it allows applications to be signalled
directly when a batch buffer completes without having to make an IOCTL
call into the driver.
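
As a rough illustration of that userland side (assuming the execbuf IOCTL
hands back a sync fence fd, as the 'Add sync framework support to execbuff
IOCTL' patch in this series arranges), waiting asynchronously is an
ordinary poll():

#include <poll.h>
#include <stdio.h>

/* fence_fd: sync fence returned alongside the batch submission */
static int wait_batch(int fence_fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fence_fd, .events = POLLIN };
	int ret = poll(&pfd, 1, timeout_ms);

	if (ret < 0) {
		perror("poll");
		return -1;
	}
	return ret ? 0 : 1;	/* 0: batch complete, 1: still running */
}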

This is work that was planned since the conversion of the driver from
being seqno value based to being request structure based. This patch
series does that work.

An IGT test to exercise the fence support from user land is in
progress and will follow. Android already makes extensive use of
fences for display composition. Real world linux usage is planned in
the form of Jesse's page table sharing / bufferless execbuf support.
There is also a plan that Wayland (and others) could make use of it in
a similar manner to Android.

v2: Updated for review comments by various people and to add support
for Android style 'native sync'.

v3: Updated from review comments by Tvrtko Ursulin. Also moved sync
framework out of staging and improved request completion handling.

v4: Fixed patch tag (should have been PATCH not RFC). Corrected
ownership of one patch which had passed through many hands before
reaching me. Fixed a bug introduced in v3 and updated for review
comments.

[Patches against drm-intel-nightly tree fetched 17/11/2015]

John Harrison (10):
  staging/android/sync: Move sync framework out of staging
  android/sync: Improved debug dump to dmesg
  drm/i915: Convert requests to use struct fence
  drm/i915: Removed now redudant parameter to i915_gem_request_completed()
  drm/i915: Add per context timelines to fence object
  drm/i915: Delay the freeing of requests until retire time
  drm/i915: Interrupt driven fences
  drm/i915: Updated request structure tracing
  drm/i915: Add sync framework support to execbuff IOCTL
  drm/i915: Cache last IRQ seqno to reduce IRQ overhead

Maarten Lankhorst (2):
  staging/android/sync: Support sync points created from dma-fences
  staging/android/sync: add sync_fence_create_dma

Peter Lawthers (1):
  android/sync: Fix reversed sense of signaled fence

 drivers/android/Kconfig|  28 ++
 drivers/android/Makefile   |   2 +
 drivers/android/sw_sync.c  | 260 ++
 drivers/android/sw_sync.h  |  59 +++
 drivers/android/sync.c | 739 +
 drivers/android/sync.h | 388 +++
 drivers/android/sync_debug.c   | 280 +++
 drivers/android/trace/sync.h   |  82 
 drivers/gpu/drm/i915/Kconfig   |   3 +
 drivers/gpu/drm/i915/i915_debugfs.c|   7 +-
 drivers/gpu/drm/i915/i915_drv.h|  75 +--
 drivers/gpu/drm/i915/i915_gem.c| 438 -
 drivers/gpu/drm/i915/i915_gem_context.c|  15 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  95 +++-
 drivers/gpu/drm/i915/i915_irq.c|   2 +-
 drivers/gpu/drm/i915/i915_trace.h  |  13 +-
 drivers/gpu/drm/i915/intel_display.c   |   4 +-
 drivers/gpu/drm/i915/intel_lrc.c   |  13 +
 drivers/gpu/drm/i915/intel_pm.c|   6 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c|   5 +
 drivers/gpu/drm/i915/intel_ringbuffer.h|   9 +
 drivers/staging/android/Kconfig|  28 --
 drivers/staging/android/Makefile   |   2 -
 drivers/staging/android/sw_sync.c  | 260 --
 drivers/staging/android/sw_sync.h  |  59 ---
 drivers/staging/android/sync.c | 729 
 drivers/staging/android/sync.h | 356 --
 drivers/staging/android/sync_debug.c   | 254 --
 drivers/staging/android/trace/sync.h   |  82 
 drivers/staging/android/uapi/sw_sync.h |  32 --
 drivers/staging/android/uapi/sync.h|  97 
 include/uapi/Kbuild|   1 +
 include/uapi/drm/i915_drm.h|  16 +-
 include/uapi/sync/Kbuild   |   3 +
 include/uapi/sync/sw_sync.h|  32 ++
 include/uapi/sync/sync.h   |  97 
 36 files changed, 2600 insertions(+), 1971 deletions(-)
 create mode 100644 drivers/android/sw_sync.c
 create mode 100644 drivers/android/sw_sync.h
 create mode 100644 

[Intel-gfx] [PATCH 07/13] drm/i915: Add per context timelines to fence object

2015-12-11 Thread John . C . Harrison
From: John Harrison 

The fence object used inside the request structure requires a sequence
number. Although this is not used by the i915 driver itself, it could
potentially be used by non-i915 code if the fence is passed outside of
the driver. This is the intention as it allows external kernel drivers
and user applications to wait on batch buffer completion
asynchronously via the dma-buf fence API.

To ensure that such external users are not confused by strange things
happening with the seqno, this patch adds in a per context timeline
that can provide a guaranteed in-order seqno value for the fence. This
is safe because the scheduler will not re-order batch buffers within a
context - they are considered to be mutually dependent.
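
With that guarantee, the timeline itself only has to hand out monotonically
increasing values; conceptually something like the following (a sketch; the
real helper may differ in naming and locking):

static uint32_t i915_fence_timeline_next_seqno(struct i915_fence_timeline *timeline)
{
	/* Seqno zero is reserved as an 'invalid' marker, so skip it */
	if (++timeline->next == 0)
		timeline->next = 1;

	return timeline->next;
}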

v2: New patch in series.

v3: Renamed/retyped timeline structure fields after review comments by
Tvrtko Ursulin.

Added context information to the timeline's name string for better
identification in debugfs output.

For: VIZ-5190
Signed-off-by: John Harrison 
Cc: Tvrtko Ursulin 
---
 drivers/gpu/drm/i915/i915_drv.h | 25 ---
 drivers/gpu/drm/i915/i915_gem.c | 80 +
 drivers/gpu/drm/i915/i915_gem_context.c | 15 ++-
 drivers/gpu/drm/i915/intel_lrc.c|  8 
 drivers/gpu/drm/i915/intel_ringbuffer.h |  1 -
 5 files changed, 111 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index caf7897..7d6a7c0 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -841,6 +841,15 @@ struct i915_ctx_hang_stats {
bool banned;
 };
 
+struct i915_fence_timeline {
+   charname[32];
+   unsignedfence_context;
+   unsignednext;
+
+   struct intel_context *ctx;
+   struct intel_engine_cs *ring;
+};
+
 /* This must match up with the value previously used for execbuf2.rsvd1. */
 #define DEFAULT_CONTEXT_HANDLE 0
 
@@ -885,6 +894,7 @@ struct intel_context {
struct drm_i915_gem_object *state;
struct intel_ringbuffer *ringbuf;
int pin_count;
+   struct i915_fence_timeline fence_timeline;
} engine[I915_NUM_RINGS];
 
struct list_head link;
@@ -2177,13 +2187,10 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
 struct drm_i915_gem_request {
/**
 * Underlying object for implementing the signal/wait stuff.
-* NB: Never call fence_later() or return this fence object to user
-* land! Due to lazy allocation, scheduler re-ordering, pre-emption,
-* etc., there is no guarantee at all about the validity or
-* sequentiality of the fence's seqno! It is also unsafe to let
-* anything outside of the i915 driver get hold of the fence object
-* as the clean up when decrementing the reference count requires
-* holding the driver mutex lock.
+* NB: Never return this fence object to user land! It is unsafe to
+* let anything outside of the i915 driver get hold of the fence
+* object as the clean up when decrementing the reference count
+* requires holding the driver mutex lock.
 */
struct fence fence;
 
@@ -2263,6 +2270,10 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
   struct drm_i915_gem_request **req_out);
 void i915_gem_request_cancel(struct drm_i915_gem_request *req);
 
+int i915_create_fence_timeline(struct drm_device *dev,
+  struct intel_context *ctx,
+  struct intel_engine_cs *ring);
+
 static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
 {
return fence_is_signaled(>fence);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 0801738..7a37fb7 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2665,9 +2665,32 @@ static const char 
*i915_gem_request_get_driver_name(struct fence *req_fence)
 
 static const char *i915_gem_request_get_timeline_name(struct fence *req_fence)
 {
-   struct drm_i915_gem_request *req = container_of(req_fence,
-typeof(*req), fence);
-   return req->ring->name;
+   struct drm_i915_gem_request *req;
+   struct i915_fence_timeline *timeline;
+
+   req = container_of(req_fence, typeof(*req), fence);
+   timeline = &req->ctx->engine[req->ring->id].fence_timeline;
+
+   return timeline->name;
+}
+
+static void i915_gem_request_timeline_value_str(struct fence *req_fence, char 
*str, int size)
+{
+   struct drm_i915_gem_request *req;
+
+   req = container_of(req_fence, typeof(*req), fence);
+
+   /* Last signalled timeline value ??? */
+   snprintf(str, size, "? [%d]"/*, timeline->value*/, 
req->ring->get_seqno(req->ring, true));
+}
+
+static 

[Intel-gfx] [PATCH 06/13] drm/i915: Removed now redudant parameter to i915_gem_request_completed()

2015-12-11 Thread John . C . Harrison
From: John Harrison 

The change to the implementation of i915_gem_request_completed() means
that the lazy coherency flag is no longer used. This can now be
removed to simplify the interface.

For: VIZ-5190
Signed-off-by: John Harrison 
---
 drivers/gpu/drm/i915/i915_debugfs.c  |  2 +-
 drivers/gpu/drm/i915/i915_drv.h  |  3 +--
 drivers/gpu/drm/i915/i915_gem.c  | 18 +-
 drivers/gpu/drm/i915/intel_display.c |  2 +-
 drivers/gpu/drm/i915/intel_pm.c  |  4 ++--
 5 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
b/drivers/gpu/drm/i915/i915_debugfs.c
index 5b31186..18dfb56 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -601,7 +601,7 @@ static int i915_gem_pageflip_info(struct seq_file *m, void 
*data)
   
i915_gem_request_get_seqno(work->flip_queued_req),
   dev_priv->next_seqno,
   ring->get_seqno(ring, true),
-  
i915_gem_request_completed(work->flip_queued_req, true));
+  
i915_gem_request_completed(work->flip_queued_req));
} else
seq_printf(m, "Flip not associated with any 
ring\n");
seq_printf(m, "Flip queued on frame %d, (was ready on 
frame %d), now %d\n",
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index aa5cba7..caf7897 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2263,8 +2263,7 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
   struct drm_i915_gem_request **req_out);
 void i915_gem_request_cancel(struct drm_i915_gem_request *req);
 
-static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
- bool lazy_coherency)
+static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
 {
	return fence_is_signaled(&req->fence);
 }
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index a1b4dbd..0801738 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1165,7 +1165,7 @@ static int __i915_spin_request(struct drm_i915_gem_request *req)
 
timeout = jiffies + 1;
while (!need_resched()) {
-   if (i915_gem_request_completed(req, true))
+   if (i915_gem_request_completed(req))
return 0;
 
if (time_after_eq(jiffies, timeout))
@@ -1173,7 +1173,7 @@ static int __i915_spin_request(struct drm_i915_gem_request *req)
 
cpu_relax_lowlatency();
}
-   if (i915_gem_request_completed(req, false))
+   if (i915_gem_request_completed(req))
return 0;
 
return -EAGAIN;
@@ -1217,7 +1217,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
	if (list_empty(&req->list))
return 0;
 
-   if (i915_gem_request_completed(req, true))
+   if (i915_gem_request_completed(req))
return 0;
 
timeout_expire = timeout ?
@@ -1257,7 +1257,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
break;
}
 
-   if (i915_gem_request_completed(req, false)) {
+   if (i915_gem_request_completed(req)) {
ret = 0;
break;
}
@@ -2758,7 +2758,7 @@ i915_gem_find_active_request(struct intel_engine_cs *ring)
struct drm_i915_gem_request *request;
 
	list_for_each_entry(request, &ring->request_list, list) {
-   if (i915_gem_request_completed(request, false))
+   if (i915_gem_request_completed(request))
continue;
 
return request;
@@ -2899,7 +2899,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
   struct drm_i915_gem_request,
   list);
 
-   if (!i915_gem_request_completed(request, true))
+   if (!i915_gem_request_completed(request))
break;
 
i915_gem_request_retire(request);
@@ -2923,7 +2923,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
}
 
if (unlikely(ring->trace_irq_req &&
-i915_gem_request_completed(ring->trace_irq_req, true))) {
+i915_gem_request_completed(ring->trace_irq_req))) {
ring->irq_put(ring);
i915_gem_request_assign(>trace_irq_req, NULL);
}
@@ -3029,7 +3029,7 @@ i915_gem_object_flush_active(struct drm_i915_gem_object *obj)
if (list_empty(>list))

[Intel-gfx] [PATCH 08/13] drm/i915: Delay the freeing of requests until retire time

2015-12-11 Thread John . C . Harrison
From: John Harrison 

The request structure is reference counted. When the count reached
zero, the request was immediately freed and all associated objects
were unreferenced/deallocated. This meant that the driver mutex lock
must be held at the point where the count reaches zero. This was fine
while all references were held internally to the driver. However, the
plan is to allow the underlying fence object (and hence the request
itself) to be returned to other drivers and to userland. External
users cannot be expected to acquire a driver private mutex lock.

Rather than attempt to disentangle the request structure from the
driver mutex lock, the decision was to defer the free code until a
later (safer) point. Hence this patch changes the unreference callback
to merely move the request onto a delayed free list. The driver's
retire worker thread will then process the list and actually call the
free function on the requests.
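
For illustration, the retire-side draining might look roughly like this
(a sketch only: the function name here is made up, the fields are the
ones this patch introduces):

static void i915_gem_request_process_delayed_free(struct intel_engine_cs *ring)
{
	struct drm_i915_gem_request *req, *req_next;
	LIST_HEAD(free_list);

	/* Detach the whole list under the spinlock, then free outside it;
	 * the retire worker already holds the driver mutex. */
	spin_lock(&ring->delayed_free_lock);
	list_splice_init(&ring->delayed_free_list, &free_list);
	spin_unlock(&ring->delayed_free_lock);

	list_for_each_entry_safe(req, req_next, &free_list, delayed_free_link) {
		list_del(&req->delayed_free_link);
		i915_gem_request_free(req);
	}
}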

v2: New patch in series.

v3: Updated after review comments by Tvrtko Ursulin. Rename list nodes
to 'link' rather than 'list'. Update list processing to be more
efficient/safer with respect to spinlocks.

v4: Changed to use basic spinlocks rather than IRQ ones - missed
update from earlier feedback by Tvrtko.

For: VIZ-5190
Signed-off-by: John Harrison 
Cc: Tvrtko Ursulin 
---
 drivers/gpu/drm/i915/i915_drv.h | 22 +++-
 drivers/gpu/drm/i915/i915_gem.c | 37 +
 drivers/gpu/drm/i915/intel_display.c|  2 +-
 drivers/gpu/drm/i915/intel_lrc.c|  2 ++
 drivers/gpu/drm/i915/intel_pm.c |  2 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c |  2 ++
 drivers/gpu/drm/i915/intel_ringbuffer.h |  4 
 7 files changed, 46 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 7d6a7c0..fbf591f 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2185,14 +2185,9 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
  * initial reference taken using kref_init
  */
 struct drm_i915_gem_request {
-   /**
-* Underlying object for implementing the signal/wait stuff.
-* NB: Never return this fence object to user land! It is unsafe to
-* let anything outside of the i915 driver get hold of the fence
-* object as the clean up when decrementing the reference count
-* requires holding the driver mutex lock.
-*/
+   /** Underlying object for implementing the signal/wait stuff. */
struct fence fence;
+   struct list_head delayed_free_link;
 
/** On Which ring this request was generated */
struct drm_i915_private *i915;
@@ -2305,21 +2300,10 @@ i915_gem_request_reference(struct drm_i915_gem_request *req)
 static inline void
 i915_gem_request_unreference(struct drm_i915_gem_request *req)
 {
-	WARN_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
-	fence_put(&req->fence);
-}
-
-static inline void
-i915_gem_request_unreference__unlocked(struct drm_i915_gem_request *req)
-{
-   struct drm_device *dev;
-
if (!req)
return;
 
-	dev = req->ring->dev;
-	if (kref_put_mutex(&req->fence.refcount, fence_release, &dev->struct_mutex))
-		mutex_unlock(&dev->struct_mutex);
+	fence_put(&req->fence);
 }
 
 static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 7a37fb7..f6c3e96 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2617,10 +2617,26 @@ static void i915_set_reset_status(struct drm_i915_private *dev_priv,
}
 }
 
-static void i915_gem_request_free(struct fence *req_fence)
+static void i915_gem_request_release(struct fence *req_fence)
 {
struct drm_i915_gem_request *req = container_of(req_fence,
 typeof(*req), fence);
+   struct intel_engine_cs *ring = req->ring;
+   struct drm_i915_private *dev_priv = to_i915(ring->dev);
+
+   /*
+* Need to add the request to a deferred dereference list to be
+* processed at a mutex lock safe time.
+*/
+	spin_lock(&ring->delayed_free_lock);
+	list_add_tail(&req->delayed_free_link, &ring->delayed_free_list);
+	spin_unlock(&ring->delayed_free_lock);
+
+	queue_delayed_work(dev_priv->wq, &dev_priv->mm.retire_work, 0);
+}
+
+static void i915_gem_request_free(struct drm_i915_gem_request *req)
+{
struct intel_context *ctx = req->ctx;
 
	WARN_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
@@ -2697,7 +2713,7 @@ static const struct fence_ops i915_gem_request_fops = {
.enable_signaling   = i915_gem_request_enable_signaling,
.signaled   = i915_gem_request_is_completed,
.wait   = fence_default_wait,
-   .release 

[Intel-gfx] [PATCH 11/13] android/sync: Fix reversed sense of signaled fence

2015-12-11 Thread John . C . Harrison
From: Peter Lawthers 

In the 3.14 kernel, a signaled fence was indicated by the status field
== 1. In 4.x, a status == 0 indicates signaled, status < 0 indicates error,
and status > 0 indicates active.

This patch wraps the check for a signaled fence in a function so that
callers no longer need to know the underlying implementation.
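
For example, a caller that previously open-coded the 3.14 semantics
(status == 1) would now simply do:

	if (sync_fence_is_signaled(fence)) {
		/* fence has signalled; safe to consume the result */
	}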

v3: New patch for series.

Change-Id: I8e565e49683e3efeb9474656cd84cf4add6ad6a2
Tracked-On: https://jira01.devtools.intel.com/browse/ACD-308
Signed-off-by: Peter Lawthers 
---
 drivers/android/sync.h | 21 +
 1 file changed, 21 insertions(+)

diff --git a/drivers/android/sync.h b/drivers/android/sync.h
index d57fa0a..75532d8 100644
--- a/drivers/android/sync.h
+++ b/drivers/android/sync.h
@@ -345,6 +345,27 @@ int sync_fence_cancel_async(struct sync_fence *fence,
  */
 int sync_fence_wait(struct sync_fence *fence, long timeout);
 
+/**
+ * sync_fence_is_signaled() - Return an indication if the fence is signaled
+ * @fence: fence to check
+ *
+ * returns 1 if fence is signaled
+ * returns 0 if fence is not signaled
+ * returns < 0 if fence is in error state
+ */
+static inline int
+sync_fence_is_signaled(struct sync_fence *fence)
+{
+   int status;
+
	status = atomic_read(&fence->status);
+   if (status == 0)
+   return 1;
+   if (status > 0)
+   return 0;
+   return status;
+}
+
 #ifdef CONFIG_DEBUG_FS
 
 void sync_timeline_debug_add(struct sync_timeline *obj);
-- 
1.9.1



[Intel-gfx] [PATCH 01/13] staging/android/sync: Support sync points created from dma-fences

2015-12-11 Thread John . C . Harrison
From: Maarten Lankhorst 

Debug output assumes all sync points are built on top of Android sync points;
once we start creating them from dma-fences, it will NULL-pointer deref unless
taught about this.

v4: Corrected patch ownership.

Signed-off-by: Maarten Lankhorst 
Signed-off-by: Tvrtko Ursulin 
Cc: Maarten Lankhorst 
Cc: de...@driverdev.osuosl.org
Cc: Riley Andrews 
Cc: Greg Kroah-Hartman 
Cc: Arve Hjønnevåg 
---
 drivers/staging/android/sync_debug.c | 42 +++-
 1 file changed, 22 insertions(+), 20 deletions(-)

diff --git a/drivers/staging/android/sync_debug.c 
b/drivers/staging/android/sync_debug.c
index 91ed2c4..f45d13c 100644
--- a/drivers/staging/android/sync_debug.c
+++ b/drivers/staging/android/sync_debug.c
@@ -82,36 +82,42 @@ static const char *sync_status_str(int status)
return "error";
 }
 
-static void sync_print_pt(struct seq_file *s, struct sync_pt *pt, bool fence)
+static void sync_print_pt(struct seq_file *s, struct fence *pt, bool fence)
 {
int status = 1;
-   struct sync_timeline *parent = sync_pt_parent(pt);
 
-	if (fence_is_signaled_locked(&pt->base))
-   status = pt->base.status;
+   if (fence_is_signaled_locked(pt))
+   status = pt->status;
 
seq_printf(s, "  %s%spt %s",
-  fence ? parent->name : "",
+  fence && pt->ops->get_timeline_name ?
+  pt->ops->get_timeline_name(pt) : "",
   fence ? "_" : "",
   sync_status_str(status));
 
if (status <= 0) {
struct timespec64 ts64 =
-   ktime_to_timespec64(pt->base.timestamp);
+   ktime_to_timespec64(pt->timestamp);
 
seq_printf(s, "@%lld.%09ld", (s64)ts64.tv_sec, ts64.tv_nsec);
}
 
-   if (parent->ops->timeline_value_str &&
-   parent->ops->pt_value_str) {
+   if ((!fence || pt->ops->timeline_value_str) &&
+   pt->ops->fence_value_str) {
char value[64];
+   bool success;
 
-   parent->ops->pt_value_str(pt, value, sizeof(value));
-   seq_printf(s, ": %s", value);
-   if (fence) {
-   parent->ops->timeline_value_str(parent, value,
-   sizeof(value));
-   seq_printf(s, " / %s", value);
+   pt->ops->fence_value_str(pt, value, sizeof(value));
+   success = strlen(value);
+
+   if (success)
+   seq_printf(s, ": %s", value);
+
+   if (success && fence) {
+   pt->ops->timeline_value_str(pt, value, sizeof(value));
+
+   if (strlen(value))
+   seq_printf(s, " / %s", value);
}
}
 
@@ -138,7 +144,7 @@ static void sync_print_obj(struct seq_file *s, struct sync_timeline *obj)
	list_for_each(pos, &obj->child_list_head) {
struct sync_pt *pt =
container_of(pos, struct sync_pt, child_list);
-   sync_print_pt(s, pt, false);
+		sync_print_pt(s, &pt->base, false);
}
	spin_unlock_irqrestore(&obj->child_list_lock, flags);
 }
@@ -153,11 +159,7 @@ static void sync_print_fence(struct seq_file *s, struct sync_fence *fence)
		   sync_status_str(atomic_read(&fence->status)));
 
for (i = 0; i < fence->num_fences; ++i) {
-   struct sync_pt *pt =
-   container_of(fence->cbs[i].sync_pt,
-struct sync_pt, base);
-
-   sync_print_pt(s, pt, true);
+   sync_print_pt(s, fence->cbs[i].sync_pt, true);
}
 
	spin_lock_irqsave(&fence->wq.lock, flags);
-- 
1.9.1



[Intel-gfx] [PATCH 02/13] staging/android/sync: add sync_fence_create_dma

2015-12-11 Thread John . C . Harrison
From: Maarten Lankhorst 

This allows users of dma-fences to create an Android fence.
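
For example, a driver that owns a plain struct fence could wrap it like this
(illustrative caller, not part of this patch):

	struct fence *f = ...;	/* any dma-fence the driver holds */
	struct sync_fence *sf;

	sf = sync_fence_create_dma("my-fence", f);
	if (!sf)
		return -ENOMEM;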

v2: Added kerneldoc. (Tvrtko Ursulin).

v4: Updated comments from review feedback by Maarten.

Signed-off-by: Maarten Lankhorst 
Signed-off-by: Tvrtko Ursulin 
Cc: Maarten Lankhorst 
Cc: Daniel Vetter 
Cc: Jesse Barnes 
Cc: de...@driverdev.osuosl.org
Cc: Riley Andrews 
Cc: Greg Kroah-Hartman 
Cc: Arve Hjønnevåg 
---
 drivers/staging/android/sync.c | 13 +
 drivers/staging/android/sync.h | 10 ++
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/android/sync.c b/drivers/staging/android/sync.c
index f83e00c..7f0e919 100644
--- a/drivers/staging/android/sync.c
+++ b/drivers/staging/android/sync.c
@@ -188,7 +188,7 @@ static void fence_check_cb_func(struct fence *f, struct fence_cb *cb)
 }
 
 /* TODO: implement a create which takes more that one sync_pt */
-struct sync_fence *sync_fence_create(const char *name, struct sync_pt *pt)
+struct sync_fence *sync_fence_create_dma(const char *name, struct fence *pt)
 {
struct sync_fence *fence;
 
@@ -199,16 +199,21 @@ struct sync_fence *sync_fence_create(const char *name, struct sync_pt *pt)
fence->num_fences = 1;
	atomic_set(&fence->status, 1);
 
-	fence->cbs[0].sync_pt = &pt->base;
+   fence->cbs[0].sync_pt = pt;
fence->cbs[0].fence = fence;
-	if (fence_add_callback(&pt->base, &fence->cbs[0].cb,
-			       fence_check_cb_func))
+	if (fence_add_callback(pt, &fence->cbs[0].cb, fence_check_cb_func))
 		atomic_dec(&fence->status);
 
sync_fence_debug_add(fence);
 
return fence;
 }
+EXPORT_SYMBOL(sync_fence_create_dma);
+
+struct sync_fence *sync_fence_create(const char *name, struct sync_pt *pt)
+{
+	return sync_fence_create_dma(name, &pt->base);
+}
 EXPORT_SYMBOL(sync_fence_create);
 
 struct sync_fence *sync_fence_fdget(int fd)
diff --git a/drivers/staging/android/sync.h b/drivers/staging/android/sync.h
index 61f8a3a..afa0752 100644
--- a/drivers/staging/android/sync.h
+++ b/drivers/staging/android/sync.h
@@ -254,6 +254,16 @@ void sync_pt_free(struct sync_pt *pt);
  */
 struct sync_fence *sync_fence_create(const char *name, struct sync_pt *pt);
 
+/**
+ * sync_fence_create_dma() - creates a sync fence from dma-fence
+ * @name:  name of fence to create
+ * @pt:dma-fence to add to the fence
+ *
+ * Creates a fence containing @pt.  Once this is called, the fence takes
+ * ownership of @pt.
+ */
+struct sync_fence *sync_fence_create_dma(const char *name, struct fence *pt);
+
 /*
  * API for sync_fence consumers
  */
-- 
1.9.1



[Intel-gfx] [PATCH 00/40] GPU scheduler for i915 driver

2015-12-11 Thread John . C . Harrison
From: John Harrison 

Implemented a batch buffer submission scheduler for the i915 DRM driver.

The general theory of operation is that when batch buffers are
submitted to the driver, the execbuffer() code assigns a unique seqno
value and then packages up all the information required to execute the
batch buffer at a later time. This package is given over to the
scheduler which adds it to an internal node list. The scheduler also
scans the list of objects associated with the batch buffer and
compares them against the objects already in use by other buffers in
the node list. If matches are found then the new batch buffer node is
marked as being dependent upon the matching node. The same is done for
the context object. The scheduler also bumps up the priority of such
matching nodes on the grounds that the more dependencies a given batch
buffer has the more important it is likely to be.

The scheduler aims to have a given (tuneable) number of batch buffers
in flight on the hardware at any given time. If fewer than this are
currently executing when a new node is queued, then the node is passed
straight through to the submit function. Otherwise it is simply added
to the queue and the driver returns back to user land.

As each batch buffer completes, it raises an interrupt which wakes up
the scheduler. Note that it is possible for multiple buffers to
complete before the IRQ handler gets to run. Further, the seqno values
of the individual buffers are not necessarily incrementing as the
scheduler may have re-ordered their submission. However, the scheduler
keeps the list of executing buffers in order of hardware submission.
Thus it can scan through the list until a matching seqno is found and
then mark all in flight nodes from that point on as completed.
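
Roughly (an illustrative sketch; the real list and state names differ):

	/*
	 * The in-flight list is kept in hardware submission order, so every
	 * node up to and including the one owning 'seqno' has now completed.
	 */
	list_for_each_entry(node, &scheduler->flying[ring->id], link) {
		node->status = i915_sqs_complete;
		if (node->params.request->seqno == seqno)
			break;
	}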

A deferred work queue is also poked by the interrupt handler. When
this wakes up it can do more involved processing such as actually
removing completed nodes from the queue and freeing up the resources
associated with them (internal memory allocations, DRM object
references, context reference, etc.). The work handler also checks the
in flight count and calls the submission code if a new slot has
appeared.

When the scheduler's submit code is called, it scans the queued node
list for the highest priority node that has no unmet dependencies.
Note that the dependency calculation is complex as it must take
inter-ring dependencies and potential preemptions into account. Note
also that in the future this will be extended to include external
dependencies such as the Android Native Sync file descriptors and/or
the Linux dma-buf synchronisation scheme.

If a suitable node is found then it is sent to execbuff_final() for
submission to the hardware. The in flight count is then re-checked and
a new node popped from the list if appropriate.
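
In outline, the submit path behaves like this (a simplified sketch:
find_best_ready_node() is an illustrative stand-in, and the real
dependency check also handles the inter-ring and preemption cases
mentioned above):

	while (i915_scheduler_count_flying(scheduler, ring) < scheduler->min_flying) {
		/* Highest priority queued node with no unmet dependencies. */
		node = find_best_ready_node(scheduler, ring);
		if (!node)
			break;

		ret = dev_priv->gt.execbuf_final(&node->params);
		if (ret)
			break;
	}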

The scheduler also allows high priority batch buffers (e.g. from a
desktop compositor) to jump ahead of whatever is already running if
the underlying hardware supports pre-emption. In this situation, any
work that was pre-empted is returned to the queued list ready to be
resubmitted when no more high priority work is outstanding.

Various IGT tests are in progress to test the scheduler's operation
and will follow.

v2: Updated for changes in struct fence patch series and other changes
to underlying tree (e.g. removal of cliprects). Also changed priority
levels to be signed +/-1023 range and reduced mutex lock usage.

v3: More reuse of cached pointers rather than repeated dereferencing
(David Gordon).

Moved the dependency generation code out to a separate function for
easier readability. Also added in support for the read-read
optimisation.

Major simplification of the DRM file close handler.

Fixed up an overzealous WARN.

Removed unnecessary flushing of the scheduler queue when waiting for a
request.


[Patches against drm-intel-nightly tree fetched 17/11/2015 with struct
fence conversion patches applied]

Dave Gordon (3):
  drm/i915: Updating assorted register and status page definitions
  drm/i915: Cache request pointer in *_submission_final()
  drm/i915: Add scheduling priority to per-context parameters

John Harrison (37):
  drm/i915: Add total count to context status debugfs output
  drm/i915: Explicit power enable during deferred context initialisation
  drm/i915: Prelude to splitting i915_gem_do_execbuffer in two
  drm/i915: Split i915_gem_do_execbuffer() in half
  drm/i915: Re-instate request->uniq because it is extremely useful
  drm/i915: Start of GPU scheduler
  drm/i915: Prepare retire_requests to handle out-of-order seqnos
  drm/i915: Disable hardware semaphores when GPU scheduler is enabled
  drm/i915: Force MMIO flips when scheduler enabled
  drm/i915: Added scheduler hook when closing DRM file handles
  drm/i915: Added scheduler hook into i915_gem_request_notify()
  drm/i915: Added deferred work handler for scheduler
  drm/i915: Redirect execbuffer_final() via scheduler
  drm/i915: Keep the 

Re: [Intel-gfx] [PATCH] drm/i915: Update to post-reset execlist queue clean-up

2015-12-11 Thread Dave Gordon

On 01/12/15 11:46, Tvrtko Ursulin wrote:


On 23/10/15 18:02, Tomas Elf wrote:

When clearing an execlist queue, instead of traversing it and unreferencing
all requests while holding the spinlock (which might lead to the thread
sleeping with IRQs turned off - bad news!), just move all requests to the
retire request list while holding the spinlock, then drop the spinlock and
invoke the execlists request retirement path, which already deals with the
intricacies of purging/dereferencing execlist queue requests.

This patch can be considered v3 of:

commit b96db8b81c54ef30485ddb5992d63305d86ea8d3
Author: Tomas Elf 
drm/i915: Grab execlist spinlock to avoid post-reset concurrency
issues

This patch assumes v2 of the above patch is part of the baseline,
reverts v2
and adds changes on top to turn it into v3.

Signed-off-by: Tomas Elf 
Cc: Tvrtko Ursulin 
Cc: Chris Wilson 
---
  drivers/gpu/drm/i915/i915_gem.c | 15 ---
  1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c
b/drivers/gpu/drm/i915/i915_gem.c
index 2c7a0b7..b492603 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2756,20 +2756,13 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,

 	if (i915.enable_execlists) {
 		spin_lock_irq(&ring->execlist_lock);
-		while (!list_empty(&ring->execlist_queue)) {
-			struct drm_i915_gem_request *submit_req;
-
-			submit_req = list_first_entry(&ring->execlist_queue,
-					struct drm_i915_gem_request,
-					execlist_link);
-			list_del(&submit_req->execlist_link);
+		/* list_splice_tail_init checks for empty lists */
+		list_splice_tail_init(&ring->execlist_queue,
+				      &ring->execlist_retired_req_list);

-			if (submit_req->ctx != ring->default_context)
-				intel_lr_context_unpin(submit_req);
-
-			i915_gem_request_unreference(submit_req);
-		}
 		spin_unlock_irq(&ring->execlist_lock);
+		intel_execlists_retire_requests(ring);
 	}

  /*


Fallen through the cracks..

This looks to be even more serious, since lockdep notices possible
deadlock involving vmap_area_lock:

  Possible interrupt unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
                                lock(vmap_area_lock);
   local_irq_disable();
   lock(&(&ring->execlist_lock)->rlock);
   lock(vmap_area_lock);
                                <Interrupt>
                                  lock(&(&ring->execlist_lock)->rlock);

   *** DEADLOCK ***

Because it unpins LRC context and ringbuffer which ends up in the VM
code under the execlist_lock.

intel_execlists_retire_requests is slightly different from the code in
the reset handler because it concerns itself with ctx_obj existence
which the other one doesn't.

Could people more knowledgeable of this code check if it is OK and R-B?

Regards,

Tvrtko


Hi Tvrtko,

I didn't understand this message at first; I thought you'd found a problem
with this ("v3") patch. But now I see what you actually meant: there is
indeed a problem with the (v2) that got merged - not the original question
about unreferencing an object while holding a spinlock (that cannot be the
last reference), but rather the unpin, which can indeed deadlock against a
non-i915-defined kernel lock.


So we should certainly update the current (v2) upstream with this.
Thomas Daniel already R-B'd this code on 23rd October, when it was:

[PATCH v3 7/8] drm/i915: Grab execlist spinlock to avoid post-reset 
concurrency issues.


and it hasn't changed in substance since then, so you can carry his R-B 
over, plus I said on that same day that this was a better solution. So:


Reviewed-by: Thomas Daniel 
Reviewed-by: Dave Gordon 



Re: [Intel-gfx] [PATCH v2] PM / Runtime: Introduce pm_runtime_get_noidle

2015-12-11 Thread Imre Deak
On to, 2015-12-10 at 23:14 +0100, Rafael J. Wysocki wrote:
> On Thursday, December 10, 2015 11:20:40 PM Imre Deak wrote:
> > On Thu, 2015-12-10 at 22:42 +0100, Rafael J. Wysocki wrote:
> > > On Thursday, December 10, 2015 10:36:37 PM Rafael J. Wysocki
> > > wrote:
> > > > On Thursday, December 10, 2015 11:43:50 AM Imre Deak wrote:
> > > > > On Thu, 2015-12-10 at 01:58 +0100, Rafael J. Wysocki wrote:
> > > > > > On Wednesday, December 09, 2015 06:22:19 PM Joonas Lahtinen
> > > > > > wrote:
> > > > > > > Introduce pm_runtime_get_noidle for situations where it is
> > > > > > > not desirable to touch an idling device. One use scenario is
> > > > > > > periodic hangchecks performed by the drm/i915 driver which
> > > > > > > can be omitted on a device in a runtime idle state.
> > > > > > > 
> > > > > > > v2:
> > > > > > > - Fix inconsistent return value when !CONFIG_PM.
> > > > > > > - Update documentation for bool return value
> > > > > > > 
> > > > > > > Signed-off-by: Joonas Lahtinen 
> > > > > > > Reported-by: Chris Wilson 
> > > > > > > Cc: Chris Wilson 
> > > > > > > Cc: "Rafael J. Wysocki" 
> > > > > > > Cc: linux...@vger.kernel.org
> > > > > > 
> > > > > > Well, I don't quite see how this can be used in a non-racy
> > > > > > way without doing an additional pm_runtime_resume() or
> > > > > > something like that in the same code path.
> > > > > 
> > > > > We don't want to resume, that would be the whole point. We'd
> > > > > like to ensure that we hold a reference _and_ the device is
> > > > > already active. So AFAICS we'd need to check runtime_status ==
> > > > > RPM_ACTIVE in addition after taking the reference.
> > > > 
> > > > Right, and that under the lock.
> > > 
> > > Which basically means you can call pm_runtime_resume() just fine,
> > > because it will do nothing if the status is RPM_ACTIVE already.
> > > 
> > > So really, why don't you use pm_runtime_get_sync()?
> > 
> > The difference would be that if the status is not RPM_ACTIVE already
> > we would drop the reference and report error. The caller would in this
> > case forego doing something, since the device is suspended or on the
> > way to being suspended. One example of such a scenario is a watchdog
> > like functionality: the watchdog work would call
> > pm_runtime_get_noidle() and check if the device is ok by doing some HW
> > access, but only if the device is powered. Otherwise the work item
> > would do nothing (meaning it also won't reschedule itself). The
> > watchdog work would get rescheduled next time the device is woken up
> > and some work is submitted to the device.
> 
> So first of all the name "pm_runtime_get_noidle" doesn't make sense.
> 
> I guess what you need is something like
> 
> bool pm_runtime_get_if_active(struct device *dev)
> {
>   unsigned long flags;
>   bool ret;
> 
>   spin_lock_irqsave(&dev->power.lock, flags);
> 
>   if (dev->power.runtime_status == RPM_ACTIVE) {

But here usage_count could be zero, meaning that the device is already
on the way to be suspended (autosuspend or ASYNC suspend), no? In that
case we don't want to return success. That would unnecessarily prolong
the time the device is kept active.

>   atomic_inc(&dev->power.usage_count);
>   ret = true;
>   } else {
>   ret = false;
>   }
> 
>   spin_unlock_irqrestore(&dev->power.lock, flags);
> 
>   return ret;
> }
> 
> and the caller will simply bail out if "false" is returned, but if
> "true" is returned, it will have to drop the usage count, right?

Yes.
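
For completeness, a minimal sketch of that combination - Rafael's locking
plus the usage_count test - might be (hypothetical name; nothing like this
exists in the PM core today):

	static bool pm_runtime_get_if_in_use(struct device *dev)
	{
		unsigned long flags;
		bool ret;

		spin_lock_irqsave(&dev->power.lock, flags);

		/* Succeed only if the device is active and someone already
		 * holds a usage reference, i.e. it is not about to suspend. */
		ret = dev->power.runtime_status == RPM_ACTIVE &&
		      atomic_inc_not_zero(&dev->power.usage_count);

		spin_unlock_irqrestore(&dev->power.lock, flags);

		return ret;
	}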

--Imre


[Intel-gfx] [PATCH i-g-t] tests/gem_softpin: Use offset addresses in canonical form

2015-12-11 Thread Michel Thierry
i915 validates that the requested offset is in canonical form, so tests need
to convert the offsets as required.

Also add a test to verify that a non-canonical 48-bit address is rejected.
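
For example, with the gen8_canonical_addr() helper added below, bit 47 is
sign-extended into bits [63:48]:

	gen8_canonical_addr(0x0000800000000000ULL); /* -> 0xffff800000000000 */
	gen8_canonical_addr(0x00007ffffffff000ULL); /* -> unchanged */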

Signed-off-by: Michel Thierry 
---
 tests/gem_softpin.c | 66 +
 1 file changed, 46 insertions(+), 20 deletions(-)

diff --git a/tests/gem_softpin.c b/tests/gem_softpin.c
index 7bee16b..2981b30 100644
--- a/tests/gem_softpin.c
+++ b/tests/gem_softpin.c
@@ -67,7 +67,7 @@ static void *create_mem_buffer(uint64_t size);
 static int gem_call_userptr_ioctl(int fd, i915_gem_userptr *userptr);
 static void gem_pin_userptr_test(void);
 static void gem_pin_bo_test(void);
-static void gem_pin_invalid_vma_test(bool test_decouple_flags);
+static void gem_pin_invalid_vma_test(bool test_decouple_flags, bool test_canonical_offset);
 static void gem_pin_overlap_test(void);
 static void gem_pin_high_address_test(void);
 
@@ -198,6 +198,15 @@ static void setup_exec_obj(struct 
drm_i915_gem_exec_object2 *exec,
exec->offset = offset;
 }
 
+/* gen8_canonical_addr
+ * Used to convert any address into canonical form, i.e. [63:48] == [47].
+ * @address - a virtual address
+*/
+static uint64_t gen8_canonical_addr(uint64_t address)
+{
+   return ((int64_t)address << 16) >> 16;
+}
+
 /* gem_store_data_svm
  * populate batch buffer with MI_STORE_DWORD_IMM command
  * @fd: drm file descriptor
@@ -630,6 +639,7 @@ static void gem_pin_overlap_test(void)
  * Share with GPU using userptr ioctl
  * Create batch buffer to write DATA in first element of each buffer
  * Pin each buffer to varying addresses starting from 0x800000000000 going below
+ * (requires offsets in canonical form)
  * Execute Batch Buffer on Blit ring STRESS_NUM_LOOPS times
  * Validate every buffer has DATA in first element
  * Rinse and Repeat on Render ring
@@ -637,7 +647,7 @@ static void gem_pin_overlap_test(void)
 #define STRESS_NUM_BUFFERS 10
 #define STRESS_NUM_LOOPS 100
 #define STRESS_STORE_COMMANDS 4 * STRESS_NUM_BUFFERS
-
+#define STRESS_START_ADDRESS 0x800000000000
 static void gem_softpin_stress_test(void)
 {
i915_gem_userptr userptr;
@@ -650,7 +660,7 @@ static void gem_softpin_stress_test(void)
uint32_t batch_buf_handle;
int ring, len;
int buf, loop;
-	uint64_t pinning_offset = 0x800000000000;
+   uint64_t pinning_offset = STRESS_START_ADDRESS;
 
fd = drm_open_driver(DRIVER_INTEL);
igt_require(uses_full_ppgtt(fd, FULL_48_BIT_PPGTT));
@@ -680,10 +690,10 @@ static void gem_softpin_stress_test(void)
setup_exec_obj(_object2[buf], shared_handle[buf],
   EXEC_OBJECT_PINNED |
   EXEC_OBJECT_SUPPORTS_48B_ADDRESS,
-  pinning_offset);
+  gen8_canonical_addr(pinning_offset));
len += gem_store_data_svm(fd, batch_buffer + (len/4),
- pinning_offset, buf,
- (buf == STRESS_NUM_BUFFERS-1)? \
+ gen8_canonical_addr(pinning_offset),
+ buf, (buf == STRESS_NUM_BUFFERS-1)? \
  true:false);
 
/* decremental 4K aligned address */
@@ -705,10 +715,11 @@ static void gem_softpin_stress_test(void)
for (loop = 0; loop < STRESS_NUM_LOOPS; loop++) {
submit_and_sync(fd, , batch_buf_handle);
/* Set pinning offset back to original value */
-		pinning_offset = 0x800000000000;
+   pinning_offset = STRESS_START_ADDRESS;
for(buf = 0; buf < STRESS_NUM_BUFFERS; buf++) {
gem_userptr_sync(fd, shared_handle[buf]);
-			igt_assert(exec_object2[buf].offset == pinning_offset);
+   igt_assert(exec_object2[buf].offset ==
+   gen8_canonical_addr(pinning_offset));
igt_fail_on_f(*shared_buffer[buf] != buf, \
"Mismatch in buffer %d, iteration %d: 
0x%08X\n", \
buf, loop, *shared_buffer[buf]);
@@ -727,10 +738,11 @@ static void gem_softpin_stress_test(void)
 STRESS_NUM_BUFFERS + 1, len);
for (loop = 0; loop < STRESS_NUM_LOOPS; loop++) {
submit_and_sync(fd, , batch_buf_handle);
-		pinning_offset = 0x800000000000;
+   pinning_offset = STRESS_START_ADDRESS;
for(buf = 0; buf < STRESS_NUM_BUFFERS; buf++) {
gem_userptr_sync(fd, shared_handle[buf]);
-   igt_assert(exec_object2[buf].offset == pinning_offset);
+   

Re: [Intel-gfx] [PATCH 13/13] drm/i915: Cache last IRQ seqno to reduce IRQ overhead

2015-12-11 Thread Tvrtko Ursulin



On 11/12/15 13:12, john.c.harri...@intel.com wrote:

From: John Harrison 

The notify function can be called many times without the seqno
changing. A large number of duplicate calls exist to prevent races due to the
requirement of not enabling interrupts until requested. However, when
interrupts are enabled the IRQ handler can be called multiple times
without the ring's seqno value changing. This patch reduces the
overhead of these extra calls by caching the last processed seqno
value and early exiting if it has not changed.

v3: New patch for series.

For: VIZ-5190
Signed-off-by: John Harrison 
---
  drivers/gpu/drm/i915/i915_gem.c | 14 +++---
  drivers/gpu/drm/i915/intel_ringbuffer.h |  1 +
  2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 279d79f..3c88678 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2457,6 +2457,8 @@ i915_gem_init_seqno(struct drm_device *dev, u32 seqno)

for (j = 0; j < ARRAY_SIZE(ring->semaphore.sync_seqno); j++)
ring->semaphore.sync_seqno[j] = 0;
+
+   ring->last_irq_seqno = 0;
}

return 0;
@@ -2788,11 +2790,14 @@ void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked)
return;
}

-   if (!fence_locked)
-		spin_lock_irqsave(&ring->fence_lock, flags);
-
seqno = ring->get_seqno(ring, false);
trace_i915_gem_request_notify(ring, seqno);
+   if (seqno == ring->last_irq_seqno)
+   return;
+   ring->last_irq_seqno = seqno;


Hmmm.. do you want to make the check "seqno <= ring->last_irq_seqno" ?

Is there a possibility for some weird timing or caching issue where two 
callers get in and last_irq_seqno goes backwards? Not sure that it would 
cause a problem, but the pattern is unusual and hard for me to understand.


Also, the check and the assignment would need to be under the spinlock, I think.
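
I.e. something along these lines (sketch):

	if (!fence_locked)
		spin_lock_irqsave(&ring->fence_lock, flags);

	/* Test and update the cache only while holding the lock. */
	if (seqno == ring->last_irq_seqno)
		goto out_unlock;
	ring->last_irq_seqno = seqno;

	...

out_unlock:
	if (!fence_locked)
		spin_unlock_irqrestore(&ring->fence_lock, flags);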


+
+   if (!fence_locked)
+		spin_lock_irqsave(&ring->fence_lock, flags);

	list_for_each_entry_safe(req, req_next, &ring->fence_signal_list, signal_link) {
if (!req->cancelled) {
@@ -3163,7 +3168,10 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
 * Tidy up anything left over. This includes a call to
 * i915_gem_request_notify() which will make sure that any requests
 * that were on the signal pending list get also cleaned up.
+* NB: The seqno cache must be cleared otherwise the notify call will
+* simply return immediately.
 */
+   ring->last_irq_seqno = 0;
i915_gem_retire_requests_ring(ring);

/* Having flushed all requests from all queues, we know that all
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h 
b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 9d09edb..1987abd 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -356,6 +356,7 @@ struct  intel_engine_cs {
spinlock_t fence_lock;
struct list_head fence_signal_list;
struct list_head fence_unsignal_list;
+   uint32_t last_irq_seqno;
  };

  bool intel_ring_initialized(struct intel_engine_cs *ring);



Regards,

Tvrtko


Re: [Intel-gfx] [PATCH] drm/i915: Correct max delay for HDMI hotplug live status checking

2015-12-11 Thread Ville Syrjälä
On Fri, Dec 11, 2015 at 05:05:11AM +, Jindal, Sonika wrote:
> How about following instead of two levels of check in the while loop:
> 
> unsigned int retry = 3;
> 
> do {
>   live_status = intel_digital_port_connected(dev_priv,
>   hdmi_to_dig_port(intel_hdmi));
>   if (live_status)
>   break;
>   mdelay(10);
> } while (--retry);

How about a straight up for loop instead?
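
E.g. (untested):

	for (retry = 0; retry < 3; retry++) {
		live_status = intel_digital_port_connected(dev_priv,
					hdmi_to_dig_port(intel_hdmi));
		if (live_status)
			break;
		mdelay(10);
	}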

> 
> Regards,
> Sonika
> 
> -Original Message-
> From: Intel-gfx [mailto:intel-gfx-boun...@lists.freedesktop.org] On Behalf Of 
> Wang, Gary C
> Sent: Friday, December 11, 2015 7:39 AM
> To: intel-gfx@lists.freedesktop.org
> Subject: [Intel-gfx] [PATCH] drm/i915: Correct max delay for HDMI hotplug 
> live status checking
> 
> The total 30ms delay for HDMI hotplug detection should have been split
> into 3 retries of 10ms each for the worst case, but intel_hdmi_detect()
> still waited only 10ms at most. This patch corrects that by reading the
> hotplug status up to 4 times across the 30ms delay.
> 
> Reviewed-by: Cooper Chiou 
> Cc: Gavin Hindman 
> Signed-off-by: Gary Wang 
> ---
>  drivers/gpu/drm/i915/intel_hdmi.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)  mode change 100644 => 
> 100755 drivers/gpu/drm/i915/intel_hdmi.c
> 
> diff --git a/drivers/gpu/drm/i915/intel_hdmi.c 
> b/drivers/gpu/drm/i915/intel_hdmi.c
> old mode 100644
> new mode 100755
> index be7fab9..888401b
> --- a/drivers/gpu/drm/i915/intel_hdmi.c
> +++ b/drivers/gpu/drm/i915/intel_hdmi.c
> @@ -1387,16 +1387,19 @@ intel_hdmi_detect(struct drm_connector *connector, 
> bool force)
>   struct intel_hdmi *intel_hdmi = intel_attached_hdmi(connector);
>   struct drm_i915_private *dev_priv = to_i915(connector->dev);
>   bool live_status = false;
> - unsigned int retry = 3;
> +	// read hotplug status 4 times at most for 30ms delay (3 retries of 10ms each)
> + unsigned int retry = 4;
>  
>   DRM_DEBUG_KMS("[CONNECTOR:%d:%s]\n",
> connector->base.id, connector->name);
>  
>   intel_display_power_get(dev_priv, POWER_DOMAIN_GMBUS);
>  
> - while (!live_status && --retry) {
> + while (!live_status && retry--) {
>   live_status = intel_digital_port_connected(dev_priv,
>   hdmi_to_dig_port(intel_hdmi));
> + if (live_status || !retry)
> + break;
>   mdelay(10);
>   }
>  
> --
> 1.9.1
> 

-- 
Ville Syrjälä
Intel OTC


[Intel-gfx] [PATCH 05/40] drm/i915: Split i915_gem_do_execbuffer() in half

2015-12-11 Thread John . C . Harrison
From: John Harrison 

Split the execbuffer() function in half. The first half collects and
validates all the information required to process the batch buffer. It
also does all the object pinning, relocations, active list management,
etc - basically anything that must be done upfront before the IOCTL
returns and allows the user land side to start changing/freeing
things. The second half does the actual ring submission.

This change implements the split but leaves the back half being called
directly from the end of the front half.
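
Schematically (a sketch of the shape, not the full function):

int i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
				   struct drm_i915_gem_execbuffer2 *args,
				   struct list_head *vmas)
{
	/* ... validate args, pin objects, apply relocations, move objects
	 * to the active list, fill in *params ... */

	/* Back half: for now invoked directly from the end of the front
	 * half; a scheduler can later defer this call. */
	return i915_gem_ringbuffer_submission_final(params);
}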

v2: Updated due to changes in underlying tree - addition of sync fence
support and removal of cliprects.

v3: Moved local 'ringbuf' variable to make later patches in the
series a bit neater.

Change-Id: I5e1c77639ce526ab2401b0323186c518bf13da0a
For: VIZ-1587
Signed-off-by: John Harrison 
---
 drivers/gpu/drm/i915/i915_drv.h|  11 +++
 drivers/gpu/drm/i915/i915_gem.c|   2 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 130 -
 drivers/gpu/drm/i915/intel_lrc.c   |  57 +
 drivers/gpu/drm/i915/intel_lrc.h   |   1 +
 5 files changed, 145 insertions(+), 56 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 194bca0..eb00454 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1679,10 +1679,18 @@ struct i915_execbuffer_params {
struct drm_device   *dev;
struct drm_file *file;
uint32_tdispatch_flags;
+   uint32_targs_flags;
uint32_targs_batch_start_offset;
+   uint32_targs_batch_len;
+   uint32_targs_num_cliprects;
+   uint32_targs_DR1;
+   uint32_targs_DR4;
uint64_tbatch_obj_vm_offset;
struct intel_engine_cs  *ring;
struct drm_i915_gem_object  *batch_obj;
+   struct drm_clip_rect*cliprects;
+   uint32_tinstp_mask;
+   int instp_mode;
struct intel_context*ctx;
struct drm_i915_gem_request *request;
 };
@@ -1944,6 +1952,7 @@ struct drm_i915_private {
int (*execbuf_submit)(struct i915_execbuffer_params *params,
  struct drm_i915_gem_execbuffer2 *args,
  struct list_head *vmas);
+   int (*execbuf_final)(struct i915_execbuffer_params *params);
int (*init_rings)(struct drm_device *dev);
void (*cleanup_ring)(struct intel_engine_cs *ring);
void (*stop_ring)(struct intel_engine_cs *ring);
@@ -2798,9 +2807,11 @@ int i915_gem_sw_finish_ioctl(struct drm_device *dev, void *data,
 void i915_gem_execbuffer_move_to_active(struct list_head *vmas,
struct drm_i915_gem_request *req);
 void i915_gem_execbuffer_retire_commands(struct i915_execbuffer_params *params);
+void i915_gem_execbuff_release_batch_obj(struct drm_i915_gem_object *batch_obj);
 int i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
   struct drm_i915_gem_execbuffer2 *args,
   struct list_head *vmas);
+int i915_gem_ringbuffer_submission_final(struct i915_execbuffer_params *params);
 int i915_gem_execbuffer(struct drm_device *dev, void *data,
struct drm_file *file_priv);
 int i915_gem_execbuffer2(struct drm_device *dev, void *data,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 3c88678..b9501ca 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -5257,11 +5257,13 @@ int i915_gem_init(struct drm_device *dev)
 
if (!i915.enable_execlists) {
dev_priv->gt.execbuf_submit = i915_gem_ringbuffer_submission;
+		dev_priv->gt.execbuf_final = i915_gem_ringbuffer_submission_final;
dev_priv->gt.init_rings = i915_gem_init_rings;
dev_priv->gt.cleanup_ring = intel_cleanup_ring_buffer;
dev_priv->gt.stop_ring = intel_stop_ring_buffer;
} else {
dev_priv->gt.execbuf_submit = intel_execlists_submission;
+   dev_priv->gt.execbuf_final = intel_execlists_submission_final;
dev_priv->gt.init_rings = intel_logical_rings_init;
dev_priv->gt.cleanup_ring = intel_logical_ring_cleanup;
dev_priv->gt.stop_ring = intel_logical_ring_stop;
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index f7f1057..05c9de6 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ 

[Intel-gfx] [PATCH 18/40] drm/i915: Hook scheduler node clean up into retire requests

2015-12-11 Thread John . C . Harrison
From: John Harrison 

The scheduler keeps its own lock on various DRM objects in order to
guarantee safe access long after the original execbuff IOCTL has
completed. This is especially important when pre-emption is enabled as
the batch buffer might need to be submitted to the hardware multiple
times. This patch hooks the clean up of these locks into the request
retire function. The request can only be retired after it has
completed on the hardware and thus is no longer eligible for
re-submission. Thus there is no point holding on to the locks beyond
that time.

v3: Updated to not WARN when cleaning a node that is being cancelled.
The clean will happen later so skipping it at the point of
cancellation is fine.

For: VIZ-1587
Signed-off-by: John Harrison 
---
 drivers/gpu/drm/i915/i915_gem.c   |  3 ++
 drivers/gpu/drm/i915/i915_scheduler.c | 54 ---
 drivers/gpu/drm/i915/i915_scheduler.h |  1 +
 3 files changed, 42 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index dc5f3fe..349ff58 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1402,6 +1402,9 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
		fence_signal_locked(&request->fence);
}
 
+   if (request->scheduler_qe)
+   i915_gem_scheduler_clean_node(request->scheduler_qe);
+
i915_gem_request_unreference(request);
 }
 
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c 
b/drivers/gpu/drm/i915/i915_scheduler.c
index 300cd89..f88c871 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -406,6 +406,41 @@ void i915_scheduler_wakeup(struct drm_device *dev)
	queue_work(dev_priv->wq, &dev_priv->mm.scheduler_work);
 }
 
+void i915_gem_scheduler_clean_node(struct i915_scheduler_queue_entry *node)
+{
+   uint32_t i;
+
+   if (!I915_SQS_IS_COMPLETE(node)) {
+   WARN(!node->params.request->cancelled,
+"Cleaning active node: %d!\n", node->status);
+   return;
+   }
+
+   if (node->params.batch_obj) {
+   /* The batch buffer must be unpinned before it is unreferenced
+* otherwise the unpin fails with a missing vma!? */
+   if (node->params.dispatch_flags & I915_DISPATCH_SECURE)
+			i915_gem_execbuff_release_batch_obj(node->params.batch_obj);
+
+   node->params.batch_obj = NULL;
+   }
+
+   /* Release the locked buffers: */
+   for (i = 0; i < node->num_objs; i++) {
+		drm_gem_object_unreference(
+			&node->saved_objects[i].obj->base);
+   }
+   kfree(node->saved_objects);
+   node->saved_objects = NULL;
+   node->num_objs = 0;
+
+   /* Context too: */
+   if (node->params.ctx) {
+   i915_gem_context_unreference(node->params.ctx);
+   node->params.ctx = NULL;
+   }
+}
+
 static int i915_scheduler_remove(struct intel_engine_cs *ring)
 {
struct drm_i915_private *dev_priv = ring->dev->dev_private;
@@ -415,7 +450,7 @@ static int i915_scheduler_remove(struct intel_engine_cs *ring)
int flying = 0, queued = 0;
int ret = 0;
booldo_submit;
-   uint32_ti, min_seqno;
+   uint32_tmin_seqno;
struct list_headremove;
 
if (list_empty(>node_queue[ring->id]))
@@ -514,21 +549,8 @@ static int i915_scheduler_remove(struct intel_engine_cs *ring)
node = list_first_entry(, typeof(*node), link);
list_del(>link);
 
-   /* The batch buffer must be unpinned before it is unreferenced
-* otherwise the unpin fails with a missing vma!? */
-   if (node->params.dispatch_flags & I915_DISPATCH_SECURE)
-			i915_gem_execbuff_release_batch_obj(node->params.batch_obj);
-
-   /* Release the locked buffers: */
-   for (i = 0; i < node->num_objs; i++) {
-		drm_gem_object_unreference(
-			&node->saved_objects[i].obj->base);
-   }
-   kfree(node->saved_objects);
-
-   /* Context too: */
-   if (node->params.ctx)
-   i915_gem_context_unreference(node->params.ctx);
+   /* Free up all the DRM object references */
+   i915_gem_scheduler_clean_node(node);
 
/* And anything else owned by the node: */
node->params.request->scheduler_qe = NULL;
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h 
b/drivers/gpu/drm/i915/i915_scheduler.h
index 56f68e5..54d87fb 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -88,6 +88,7 @@ 

[Intel-gfx] [PATCH 21/40] drm/i915: Added scheduler flush calls to ring throttle and idle functions

2015-12-11 Thread John . C . Harrison
From: John Harrison 

When requesting that all GPU work is completed, it is now necessary to
get the scheduler involved in order to flush out work that is queued
but not yet submitted.

v2: Updated to add support for flushing the scheduler queue by time
stamp rather than just doing a blanket flush.

v3: Moved submit_max_priority() to this patch from an earlier patch
as it is no longer required in the other.

Change-Id: I95dcc2a2ee5c1a844748621c333994ddd6cf6a66
For: VIZ-1587
Signed-off-by: John Harrison 
---
 drivers/gpu/drm/i915/i915_gem.c   |  24 ++-
 drivers/gpu/drm/i915/i915_scheduler.c | 132 ++
 drivers/gpu/drm/i915/i915_scheduler.h |   3 +
 3 files changed, 158 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 1a05c97..541ed9a 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3765,6 +3765,10 @@ int i915_gpu_idle(struct drm_device *dev)
 
/* Flush everything onto the inactive list. */
for_each_ring(ring, dev_priv, i) {
+   ret = i915_scheduler_flush(ring, true);
+   if (ret < 0)
+   return ret;
+
if (!i915.enable_execlists) {
struct drm_i915_gem_request *req;
 
@@ -4478,7 +4482,8 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
unsigned long recent_enough = jiffies - DRM_I915_THROTTLE_JIFFIES;
struct drm_i915_gem_request *request, *target = NULL;
unsigned reset_counter;
-   int ret;
+   int i, ret;
+   struct intel_engine_cs *ring;
 
	ret = i915_gem_wait_for_error(&dev_priv->gpu_error);
if (ret)
@@ -4488,6 +4493,23 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
if (ret)
return ret;
 
+   for_each_ring(ring, dev_priv, i) {
+   /*
+* Flush out scheduler entries that are getting 'stale'. Note
+* that the following recent_enough test will only check
+* against the time at which the request was submitted to the
+* hardware (i.e. when it left the scheduler) not the time it
+* was submitted to the driver.
+*
+	 * Also, there is not much point worrying about busy return
+* codes from the scheduler flush call. Even if more work
+* cannot be submitted right now for whatever reason, we
+* still want to throttle against stale work that has already
+* been submitted.
+*/
+   i915_scheduler_flush_stamp(ring, recent_enough, false);
+   }
+
	spin_lock(&file_priv->mm.lock);
	list_for_each_entry(request, &file_priv->mm.request_list, client_list) {
if (time_after_eq(request->emitted_jiffies, recent_enough))
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c 
b/drivers/gpu/drm/i915/i915_scheduler.c
index 386f157..c13dbc3 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -31,6 +31,8 @@ static int i915_scheduler_remove_dependent(struct i915_scheduler *scheduler,
					   struct i915_scheduler_queue_entry *remove);
 static int i915_scheduler_submit(struct intel_engine_cs *ring,
 bool is_locked);
+static int i915_scheduler_submit_max_priority(struct intel_engine_cs 
*ring,
+ bool is_locked);
 static uint32_ti915_scheduler_count_flying(struct i915_scheduler 
*scheduler,
   struct intel_engine_cs *ring);
 static voidi915_scheduler_priority_bump_clear(struct i915_scheduler 
*scheduler);
@@ -580,6 +582,98 @@ void i915_gem_scheduler_work_handler(struct work_struct *work)
}
 }
 
+int i915_scheduler_flush_stamp(struct intel_engine_cs *ring,
+  unsigned long target,
+  bool is_locked)
+{
+   struct i915_scheduler_queue_entry *node;
+   struct drm_i915_private   *dev_priv;
+   struct i915_scheduler *scheduler;
+   unsigned long   flags;
+   int flush_count = 0;
+
+   if (!ring)
+   return -EINVAL;
+
+   dev_priv  = ring->dev->dev_private;
+   scheduler = dev_priv->scheduler;
+
+   if (!scheduler)
+   return 0;
+
+   if (is_locked && (scheduler->flags[ring->id] & i915_sf_submitting)) {
+   /* Scheduler is busy already submitting another batch,
+* come back later rather than going recursive... */
+   return -EAGAIN;
+   }
+
+	spin_lock_irqsave(&scheduler->lock, flags);
+   i915_scheduler_priority_bump_clear(scheduler);
+   

[Intel-gfx] [PATCH 24/40] drm/i915: Defer seqno allocation until actual hardware submission time

2015-12-11 Thread John . C . Harrison
From: John Harrison 

The seqno value is now only used for the final test for completion of
a request. It is no longer used to track the request through the
software stack. Thus it is no longer necessary to allocate the seqno
immediately with the request. Instead, it can be done lazily and left
until the request is actually sent to the hardware. This is particularly
advantageous with a GPU scheduler as the requests can then be
re-ordered between their creation and their hardware submission
without having out of order seqnos.

v2: i915_add_request() can't fail!

Combine with 'drm/i915: Assign seqno at start of exec_final()'
Various bits of code during the execbuf code path need a seqno value
to be assigned to the request. This change makes this assignment
explicit at the start of submission_final() rather than relying on an
auto-generated seqno to have happened already. This is in preparation
for a future patch which changes seqno values to be assigned lazily
(during add_request).

v3: Updated to use locally cached request pointer.

Change-Id: I0d922b84c517611a79fa6c2b9e730d4fe3671d6a
For: VIZ-1587
Signed-off-by: John Harrison 
---
 drivers/gpu/drm/i915/i915_drv.h|  1 +
 drivers/gpu/drm/i915/i915_gem.c| 21 -
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 13 +
 drivers/gpu/drm/i915/intel_lrc.c   | 13 +
 4 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 5b893a6..15dee41 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2225,6 +2225,7 @@ struct drm_i915_gem_request {
 
/** GEM sequence number associated with this request. */
uint32_t seqno;
+   uint32_t reserved_seqno;
 
/* Unique identifier which can be used for trace points & debug */
uint32_t uniq;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 99e5b1d0..1fb45c2 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2525,6 +2525,9 @@ i915_gem_get_seqno(struct drm_device *dev, u32 *seqno)
 
/* reserve 0 for non-seqno */
if (dev_priv->next_seqno == 0) {
+   /* Why is the full re-initialisation required? Is it only for
+* hardware semaphores? If so, could skip it in the case where
+* semaphores are disabled? */
int ret = i915_gem_init_seqno(dev, 0);
if (ret)
return ret;
@@ -2582,6 +2585,12 @@ void __i915_add_request(struct drm_i915_gem_request *request,
WARN(ret, "*_ring_flush_all_caches failed: %d!\n", ret);
}
 
+   /* Make the request's seqno 'live': */
+	if (!request->seqno) {
+   request->seqno = request->reserved_seqno;
+   WARN_ON(request->seqno != dev_priv->last_seqno);
+   }
+
/* Record the position of the start of the request so that
 * should we detect the updated seqno part-way through the
 * GPU processing the request, we never over-estimate the
@@ -2830,6 +2839,9 @@ void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked)
 
	list_for_each_entry_safe(req, req_next, &ring->fence_signal_list, signal_link) {
if (!req->cancelled) {
+   /* How can this happen? */
+   WARN_ON(req->seqno == 0);
+
if (!i915_seqno_passed(seqno, req->seqno))
break;
}
@@ -3054,7 +3066,14 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
if (req == NULL)
return -ENOMEM;
 
-	ret = i915_gem_get_seqno(ring->dev, &req->seqno);
+   /*
+* Assign an identifier to track this request through the hardware
+* but don't make it live yet. It could change in the future if this
+* request gets overtaken. However, it still needs to be allocated
+* in advance because the point of submission must not fail and seqno
+* allocation can fail.
+*/
+	ret = i915_gem_get_seqno(ring->dev, &req->reserved_seqno);
if (ret)
goto err;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 0908699..7970958 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1249,6 +1249,19 @@ int i915_gem_ringbuffer_submission_final(struct i915_execbuffer_params *params)
/* The mutex must be acquired before calling this function */
	BUG_ON(!mutex_is_locked(&params->dev->struct_mutex));
 
+   /* Make sure the request's seqno is the latest and greatest: */
+	if (req->reserved_seqno != dev_priv->last_seqno) {
+		ret = i915_gem_get_seqno(ring->dev, &req->reserved_seqno);
+   if 

[Intel-gfx] [PATCH 28/40] drm/i915: Added trace points to scheduler

2015-12-11 Thread John . C . Harrison
From: John Harrison 

Added trace points to the scheduler to track all the various events,
node state transitions and other interesting things that occur.

v2: Updated for new request completion tracking implementation.

v3: Updated for changes to node kill code.

Change-Id: I9886390cfc7897bc1faf50a104bc651d8baed8a5
For: VIZ-1587
Signed-off-by: John Harrison 
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   2 +
 drivers/gpu/drm/i915/i915_scheduler.c  |  26 
 drivers/gpu/drm/i915/i915_trace.h  | 190 +
 drivers/gpu/drm/i915/intel_lrc.c   |   2 +
 4 files changed, 220 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index fdaede3..b358b21 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1226,6 +1226,8 @@ i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
 
i915_gem_execbuffer_move_to_active(vmas, params->request);
 
+   trace_i915_gem_ring_queue(ring, params);
+
qe = container_of(params, typeof(*qe), params);
ret = i915_scheduler_queue_execbuffer(qe);
if (ret)
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c 
b/drivers/gpu/drm/i915/i915_scheduler.c
index 39aa702..4736f0f 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -151,6 +151,8 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
if (i915.scheduler_override & i915_so_direct_submit) {
int ret;
 
+   trace_i915_scheduler_queue(qe->params.ring, qe);
+
WARN_ON(qe->params.fence_wait &&
(!sync_fence_is_signaled(qe->params.fence_wait)));
 
@@ -271,6 +273,9 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe)
not_flying = i915_scheduler_count_flying(scheduler, ring) <
 scheduler->min_flying;
 
+   trace_i915_scheduler_queue(ring, node);
+   trace_i915_scheduler_node_state_change(ring, node);
+
	spin_unlock_irqrestore(&scheduler->lock, flags);
 
if (not_flying)
@@ -298,6 +303,9 @@ static int i915_scheduler_fly_node(struct i915_scheduler_queue_entry *node)
 
node->status = i915_sqs_flying;
 
+   trace_i915_scheduler_fly(ring, node);
+   trace_i915_scheduler_node_state_change(ring, node);
+
if (!(scheduler->flags[ring->id] & i915_sf_interrupts_enabled)) {
boolsuccess = true;
 
@@ -363,6 +371,8 @@ static void i915_scheduler_node_requeue(struct i915_scheduler_queue_entry *node)
 
node->status = i915_sqs_queued;
node->params.request->seqno = 0;
+   trace_i915_scheduler_unfly(node->params.ring, node);
+   trace_i915_scheduler_node_state_change(node->params.ring, node);
 }
 
 /* Give up on a node completely. For example, because it is causing the
@@ -372,7 +382,11 @@ static void i915_scheduler_node_kill(struct 
i915_scheduler_queue_entry *node)
BUG_ON(!node);
BUG_ON(I915_SQS_IS_COMPLETE(node));
 
+   if (I915_SQS_IS_FLYING(node))
+   trace_i915_scheduler_unfly(node->params.ring, node);
+
node->status = i915_sqs_dead;
+   trace_i915_scheduler_node_state_change(node->params.ring, node);
 }
 
 /*
@@ -392,6 +406,8 @@ bool i915_scheduler_notify_request(struct 
drm_i915_gem_request *req)
struct i915_scheduler_queue_entry *node = req->scheduler_qe;
unsigned long   flags;
 
+   trace_i915_scheduler_landing(req);
+
if (!node)
return false;
 
@@ -405,6 +421,8 @@ bool i915_scheduler_notify_request(struct 
drm_i915_gem_request *req)
else
node->status = i915_sqs_complete;
 
+   trace_i915_scheduler_node_state_change(req->ring, node);
+
spin_unlock_irqrestore(&scheduler->lock, flags);
 
return true;
@@ -550,6 +568,8 @@ static int i915_scheduler_remove(struct intel_engine_cs 
*ring)
/* Launch more packets now? */
do_submit = (queued > 0) && (flying < scheduler->min_flying);
 
+   trace_i915_scheduler_remove(ring, min_seqno, do_submit);
+
spin_unlock_irqrestore(&scheduler->lock, flags);
 
if (!do_submit && list_empty(&remove))
@@ -564,6 +584,8 @@ static int i915_scheduler_remove(struct intel_engine_cs 
*ring)
node = list_first_entry(&remove, typeof(*node), link);
list_del(&node->link);
 
+   trace_i915_scheduler_destroy(ring, node);
+
if (node->params.fence_wait)
sync_fence_put(node->params.fence_wait);
 
@@ -927,6 +949,8 @@ static int i915_scheduler_pop_from_queue_locked(struct 
intel_engine_cs *ring,
INIT_LIST_HEAD(>link);
best->status  = i915_sqs_popped;
 
+   trace_i915_scheduler_node_state_change(ring, best);
+
  

[Intel-gfx] [PATCH 16/40] drm/i915: Keep the reserved space mechanism happy

2015-12-11 Thread John . C . Harrison
From: John Harrison 

Ring space is reserved when constructing a request to ensure that the
subsequent 'add_request()' call cannot fail due to waiting for space
on a busy or broken GPU. However, the scheduler jumps in to the middle
of the execbuffer process between request creation and request
submission. Thus it needs to cancel the reserved space when the
request is simply added to the scheduler's queue and not yet
submitted. Similarly, it needs to re-reserve the space when it finally
does want to send the batch buffer to the hardware.

v3: Updated to use locally cached request pointer.

For: VIZ-1587
Signed-off-by: John Harrison 
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  7 +++
 drivers/gpu/drm/i915/i915_scheduler.c  |  4 
 drivers/gpu/drm/i915/intel_lrc.c   | 13 +++--
 3 files changed, 22 insertions(+), 2 deletions(-)

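In outline, the ring-space lifecycle under the scheduler becomes the
following (a sketch built from the calls in the hunks below; the
wrapper function itself is illustrative, not part of the patch):

static int example_queue_then_submit(struct i915_scheduler_queue_entry *qe)
{
	int ret;

	/* Request creation already reserved ring space; the scheduler
	 * is only queueing here, not submitting, so give it back. */
	intel_ring_reserved_space_cancel(qe->params.request->ringbuf);

	/* ... later, when the scheduler decides to submit ... */

	/* Re-reserve so the eventual add_request() cannot fail. */
	ret = intel_ring_reserve_space(qe->params.request);
	if (ret)
		return ret;

	return i915_gem_ringbuffer_submission_final(&qe->params);
}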
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index b5d618a..2c7a395 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1249,6 +1249,10 @@ int i915_gem_ringbuffer_submission_final(struct 
i915_execbuffer_params *params)
/* The mutex must be acquired before calling this function */
BUG_ON(!mutex_is_locked(&params->dev->struct_mutex));
 
+   ret = intel_ring_reserve_space(req);
+   if (ret)
+   return ret;
+
intel_runtime_pm_get(dev_priv);
 
/*
@@ -1309,6 +1313,9 @@ error:
 */
intel_runtime_pm_put(dev_priv);
 
+   if (ret)
+   intel_ring_reserved_space_cancel(req->ringbuf);
+
return ret;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c 
b/drivers/gpu/drm/i915/i915_scheduler.c
index 0e657cf..9d1475f 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -145,6 +145,8 @@ int i915_scheduler_queue_execbuffer(struct 
i915_scheduler_queue_entry *qe)
if (1/*i915.scheduler_override & i915_so_direct_submit*/) {
int ret;
 
+   intel_ring_reserved_space_cancel(qe->params.request->ringbuf);
+
scheduler->flags[qe->params.ring->id] |= i915_sf_submitting;
ret = dev_priv->gt.execbuf_final(&qe->params);
scheduler->flags[qe->params.ring->id] &= ~i915_sf_submitting;
@@ -174,6 +176,8 @@ int i915_scheduler_queue_execbuffer(struct 
i915_scheduler_queue_entry *qe)
node->stamp  = jiffies;
i915_gem_request_reference(node->params.request);
 
+   intel_ring_reserved_space_cancel(node->params.request->ringbuf);
+
BUG_ON(node->params.request->scheduler_qe);
node->params.request->scheduler_qe = node;
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index f14d9b2..ebc951e 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -934,13 +934,17 @@ int intel_execlists_submission_final(struct 
i915_execbuffer_params *params)
/* The mutex must be acquired before calling this function */
BUG_ON(!mutex_is_locked(&params->dev->struct_mutex));
 
+   ret = intel_logical_ring_reserve_space(req);
+   if (ret)
+   return ret;
+
/*
 * Unconditionally invalidate gpu caches and ensure that we do flush
 * any residual writes from the previous batch.
 */
ret = logical_ring_invalidate_all_caches(req);
if (ret)
-   return ret;
+   goto err;
 
if (ring == &dev_priv->ring[RCS] &&
params->instp_mode != dev_priv->relative_constants_mode) {
@@ -962,13 +966,18 @@ int intel_execlists_submission_final(struct 
i915_execbuffer_params *params)
 
ret = ring->emit_bb_start(req, exec_start, params->dispatch_flags);
if (ret)
-   return ret;
+   goto err;
 
trace_i915_gem_ring_dispatch(req, params->dispatch_flags);
 
i915_gem_execbuffer_retire_commands(params);
 
return 0;
+
+err:
+   intel_ring_reserved_space_cancel(params->request->ringbuf);
+
+   return ret;
 }
 
 void intel_execlists_retire_requests(struct intel_engine_cs *ring)
-- 
1.9.1



[Intel-gfx] [PATCH 19/40] drm/i915: Added scheduler support to __wait_request() calls

2015-12-11 Thread John . C . Harrison
From: John Harrison 

The scheduler can cause batch buffers, and hence requests, to be
submitted to the ring out of order and asynchronously to their
submission to the driver. Thus at the point of waiting for the
completion of a given request, it is not even guaranteed that the
request has actually been sent to the hardware yet. Even if it has
been sent, it is possible that it could be pre-empted and thus
'unsent'.

This means that it is necessary to be able to submit requests to the
hardware during the wait call itself. Unfortunately, while some
callers of __wait_request() release the mutex lock first, others do
not (and apparently cannot). Hence there is the potential to deadlock
as the wait stalls for submission but the asynchronous submission is
stalled for the mutex lock.

This change hooks the scheduler in to the __wait_request() code to
ensure correct behaviour. That is, flush the target batch buffer
through to the hardware and do not deadlock waiting for something that
cannot currently be submitted. Instead, the wait call must return
EAGAIN at least as far back as necessary to release the mutex lock and
allow the scheduler's asynchronous processing to get in and handle the
pre-emption operation and eventually (re-)submit the work.

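A sketch of the resulting caller-side pattern (the retry loop below is
an assumption for illustration, not a hunk from this patch): a caller
holding struct_mutex must treat -EAGAIN as "drop the lock, let the
scheduler make progress, retry" rather than as a fatal error.

for (;;) {
	ret = __i915_wait_request(req, reset_counter, true,
				  NULL, NULL, true /* is_locked */);
	if (ret != -EAGAIN)
		break;

	mutex_unlock(&dev->struct_mutex);
	/* The scheduler's asynchronous work can now take the mutex
	 * and (re-)submit the request. */
	mutex_lock(&dev->struct_mutex);
}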
v3: Removed the explicit scheduler flush from i915_wait_request().
This is no longer necessary and was causing unintended changes to the
scheduler priority level which broke a validation team test.

Change-Id: I31fe6bc7e38f6ffdd843fcae16e7cc8b1e52a931
For: VIZ-1587
Signed-off-by: John Harrison 
---
 drivers/gpu/drm/i915/i915_drv.h |  3 ++-
 drivers/gpu/drm/i915/i915_gem.c | 33 ++---
 drivers/gpu/drm/i915/i915_scheduler.c   | 20 
 drivers/gpu/drm/i915/i915_scheduler.h   |  2 ++
 drivers/gpu/drm/i915/intel_display.c|  5 +++--
 drivers/gpu/drm/i915/intel_ringbuffer.c |  2 +-
 6 files changed, 54 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 9a67f7c..5ed600c 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3029,7 +3029,8 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
unsigned reset_counter,
bool interruptible,
s64 *timeout,
-   struct intel_rps_client *rps);
+   struct intel_rps_client *rps,
+   bool is_locked);
 int __must_check i915_wait_request(struct drm_i915_gem_request *req);
 int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf);
 int __must_check
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 349ff58..784000b 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1207,7 +1207,8 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
unsigned reset_counter,
bool interruptible,
s64 *timeout,
-   struct intel_rps_client *rps)
+   struct intel_rps_client *rps,
+   bool is_locked)
 {
struct intel_engine_cs *ring = i915_gem_request_get_ring(req);
struct drm_device *dev = ring->dev;
@@ -1217,8 +1218,10 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
DEFINE_WAIT(wait);
unsigned long timeout_expire;
s64 before, now;
-   int ret;
+   int ret = 0;
+   bool busy;
 
+   might_sleep();
WARN(!intel_irqs_enabled(dev_priv), "IRQs disabled");
 
if (i915_gem_request_completed(req))
@@ -1269,6 +1272,22 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
break;
}
 
+   if (is_locked) {
+   /* If this request is being processed by the scheduler
+* then it is unsafe to sleep with the mutex lock held
+* as the scheduler may require the lock in order to
+* progress the request. */
+   if (i915_scheduler_is_request_tracked(req, NULL, &busy)) {
+   if (busy) {
+   ret = -EAGAIN;
+   break;
+   }
+   }
+
+   /* If the request is not tracked by the scheduler then the
+* regular test can be done. */
+   }
+
if (i915_gem_request_completed(req)) {
ret = 0;
break;
@@ -1455,7 +1474,7 @@ i915_wait_request(struct drm_i915_gem_request *req)
 
ret = __i915_wait_request(req,
  
  atomic_read(&dev_priv->gpu_error.reset_counter),
- 

Re: [Intel-gfx] [PATCH] drm/i915: Fix context/engine cleanup order

2015-12-11 Thread Chris Wilson
On Fri, Dec 11, 2015 at 02:36:36PM +, Nick Hoath wrote:
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index 84e2b20..a2857b0 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -449,7 +449,7 @@ static int i915_load_modeset_init(struct drm_device *dev)
>  
>  cleanup_gem:
>   mutex_lock(&dev->struct_mutex);
> - i915_gem_cleanup_ringbuffer(dev);
> + i915_gem_cleanup_engines(dev);
>   i915_gem_context_fini(dev);
>   mutex_unlock(&dev->struct_mutex);
>  cleanup_irq:
> @@ -1188,8 +1188,8 @@ int i915_driver_unload(struct drm_device *dev)
>  
>   intel_guc_ucode_fini(dev);
>   mutex_lock(&dev->struct_mutex);
> - i915_gem_cleanup_ringbuffer(dev);
>   i915_gem_context_fini(dev);
> + i915_gem_cleanup_engines(dev);
>   mutex_unlock(&dev->struct_mutex);
>   intel_fbc_cleanup_cfb(dev_priv);
>   i915_gem_cleanup_stolen(dev);

Choose!

Anyway contexts should be shutdown before the engines, so with the above
fixed
Reviewed-by: Chris Wilson 
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


[Intel-gfx] [RFC 35/38] drm/i915/preempt: Implement mid-batch preemption support

2015-12-11 Thread John . C . Harrison
From: Dave Gordon 

Batch buffers which have been pre-empted mid-way through execution
must be handled separately. Rather than simply re-submitting the batch
as a brand new piece of work, the driver only needs to requeue the
context. The hardware will take care of picking up where it left off.

v2: New patch in series.

For: VIZ-2021
Signed-off-by: Dave Gordon 
---
 drivers/gpu/drm/i915/i915_debugfs.c   |  1 +
 drivers/gpu/drm/i915/i915_scheduler.c | 55 +++
 drivers/gpu/drm/i915/i915_scheduler.h |  3 ++
 drivers/gpu/drm/i915/intel_lrc.c  | 51 
 drivers/gpu/drm/i915/intel_lrc.h  |  1 +
 5 files changed, 105 insertions(+), 6 deletions(-)

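The core distinction, reduced to a sketch using the names from the
hunks below:

if (req->seqno == ring->last_batch_start) {
	/* Mid-batch: the saved hardware context holds the resume
	 * point, so keep the ringbuffer and just flag a restart. */
	intel_lr_context_resync_req(req);
	req->scheduler_flags |= i915_req_sf_restart;
} else {
	/* Between batches: empty the preempted ringbuffer and
	 * resubmit the work from scratch. */
	intel_lr_context_resync(req->ctx, ring, false);
}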
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
b/drivers/gpu/drm/i915/i915_debugfs.c
index 7137439..6798f9c 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -3722,6 +3722,7 @@ static int i915_scheduler_info(struct seq_file *m, void 
*unused)
PRINT_VAR("  Queued",   "u", stats[r].queued);
PRINT_VAR("  Submitted","u", stats[r].submitted);
PRINT_VAR("  Preempted","u", stats[r].preempted);
+   PRINT_VAR("  Midbatch preempted",   "u", stats[r].mid_preempted);
PRINT_VAR("  Completed","u", stats[r].completed);
PRINT_VAR("  Expired",  "u", stats[r].expired);
seq_putc(m, '\n');
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c 
b/drivers/gpu/drm/i915/i915_scheduler.c
index d0c4b46..d96eefb 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -743,6 +743,7 @@ i915_scheduler_preemption_postprocess(struct 
intel_engine_cs *ring)
struct i915_scheduler *scheduler = dev_priv->scheduler;
struct i915_scheduler_queue_entry *pnode = NULL;
struct drm_i915_gem_request *preq = NULL;
+   struct drm_i915_gem_request *midp = NULL;
struct i915_scheduler_stats *stats;
unsigned long flags;
int preempted = 0, preemptive = 0;
@@ -806,8 +807,12 @@ i915_scheduler_preemption_postprocess(struct 
intel_engine_cs *ring)
node->status = i915_sqs_preempted;
trace_i915_scheduler_unfly(ring, node);
trace_i915_scheduler_node_state_change(ring, node);
-   /* Empty the preempted ringbuffer */
-   intel_lr_context_resync(req->ctx, ring, false);
+
+   /* Identify a mid-batch preemption */
+   if (req->seqno == ring->last_batch_start) {
+   WARN(midp, "Multiple 
mid-batch-preempted requests?\n");
+   midp = req;
+   }
}
 
i915_gem_request_dequeue(req);
@@ -821,11 +826,47 @@ i915_scheduler_preemption_postprocess(struct 
intel_engine_cs *ring)
if (stats->max_preempted < preempted)
stats->max_preempted = preempted;
 
+   /* Now fix up the contexts of all preempt{ive,ed} requests */
{
-   /* XXX: Sky should be empty now */
+   struct intel_context *mid_ctx = NULL;
struct i915_scheduler_queue_entry *node;
list_for_each_entry(node, &scheduler->node_queue[ring->id], link)
-   WARN_ON(I915_SQS_IS_FLYING(node));
+   u32 started = ring->last_batch_start;
+
+   /*
+* Iff preemption was mid-batch, we should have found a
+* mid-batch-preempted request
+*/
+   if (started && started != ring->last_irq_seqno)
+   WARN(!midp, "Mid-batch preempted, but request not found\n");
+   else
+   WARN(midp, "Found unexpected mid-batch preemption?\n");
+
+   if (midp) {
+   /* Rewrite this context rather than emptying it */
+   intel_lr_context_resync_req(midp);
+   midp->scheduler_flags |= i915_req_sf_restart;
+   mid_ctx = midp->ctx;
+   stats->mid_preempted += 1;
+   WARN_ON(preq == midp);
+   }
+
+   list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
+   /* XXX: Sky should be empty now */
+   if (WARN_ON(I915_SQS_IS_FLYING(node)))
+   continue;
+
+   /* Clean up preempted contexts */
+   if (node->status != i915_sqs_preempted)
+   continue;
+
+   if (node->params.ctx != mid_ctx) {
+   /* Empty the preempted ringbuffer */
+ 

[Intel-gfx] [RFC 00/38] Preemption support for GPU scheduler

2015-12-11 Thread John . C . Harrison
From: John Harrison 

Added pre-emption support to the i915 GPU scheduler.

Note that this patch series was written by David Gordon. I have simply
ported it onto a more recent set of scheduler patches and am uploading
it as part of that work so that everything can be viewed at once. Also
because David is on extended vacation at the moment. Note that the
series is being sent as an RFC as there are still some things to be
tidied up. Most notably the commit messages are missing in a few
places. I am leaving those to be filled in by David when he returns.

Also, the series includes a few general fix up and improvement patches
that are not directly related to pre-emption. E.g. for improving the
error capture state. However, the pre-emption code is built upon them
so right now it is much simpler to just send the whole lot out as a
single series. It can be broken up into separate patch sets if/when
people decide it is all good stuff to be doing.

Re the pre-emption itself. It is functional and working but with the
caveat that it requires the GuC. Hence it is only operational on SKL or
later hardware. If the GuC is not available then the pre-emption
support is simply disabled in the scheduler.

v2: Updated for changes to scheduler - use locally cached request
pointer.

Re-worked the 'pre-emption in progress' logic inside the notify code
to simplify it.

Implemented support for mid-batch pre-emption. This must be treated
differently to between-batch pre-emption.

Fixed a couple of trace point issues.

[Patches against drm-intel-nightly tree fetched 17/11/2015 with struct fence
conversion and GPU scheduler patches applied]

Dave Gordon (37):
  drm/i915: update ring space correctly
  drm/i915: recalculate ring space after reset
  drm/i915: hangcheck=idle should wake_up_all every time, not just once
  drm/i915/error: capture execlist state on error
  drm/i915/error: capture ringbuffer pointed to by START
  drm/i915/error: report ctx id & desc for each request in the queue
  drm/i915/error: improve CSB reporting
  drm/i915/error: report size in pages for each object dumped
  drm/i915/error: track, capture & print ringbuffer submission activity
  drm/i915/guc: Tidy up GuC proc/ctx descriptor setup
  drm/i915/guc: Add a second client, to be used for preemption
  drm/i915/guc: implement submission via REQUEST_PREEMPTION action
  drm/i915/guc: Improve action error reporting, add preemption debug
  drm/i915/guc: Expose GuC-maintained statistics
  drm/i915: add i915_wait_request() call after i915_add_request_no_flush()
  drm/i915/guc: Expose (intel)_lr_context_size()
  drm/i915/guc: Add support for GuC ADS (Additional Data Structure)
  drm/i915/guc: Fill in (part of?) the ADS whitelist
  drm/i915/error: capture errored context based on request context-id
  drm/i915/error: enhanced error capture of requests
  drm/i915/error: add GuC state error capture & decode
  drm/i915: track relative-constants-mode per-context not per-device
  drm/i915: set request 'head' on allocation not in add_request()
  drm/i915/sched: set request 'head' on at start of ring submission
  drm/i915/sched: include scheduler state in error capture
  drm/i915/preempt: preemption-related definitions and statistics
  drm/i915/preempt: scheduler logic for queueing preemptive requests
  drm/i915/preempt: scheduler logic for selecting preemptive requests
  drm/i915/preempt: scheduler logic for preventing recursive preemption
  drm/i915/preempt: don't allow nonbatch ctx init when the scheduler is
busy
  drm/i915/preempt: scheduler logic for landing preemptive requests
  drm/i915/preempt: add hook to catch 'unexpected' ring submissions
  drm/i915/preempt: Refactor intel_lr_context_reset()
  drm/i915/preempt: scheduler logic for postprocessing preemptive
requests
  drm/i915/preempt: Implement mid-batch preemption support
  drm/i915/preempt: update (LRC) ringbuffer-filling code to create
preemptive requests
  drm/i915/preempt: update scheduler parameters to enable preemption

John Harrison (1):
  drm/i915: Added preemption info to various trace points

 drivers/gpu/drm/i915/i915_debugfs.c|  50 ++-
 drivers/gpu/drm/i915/i915_drv.h|  34 +-
 drivers/gpu/drm/i915/i915_gem.c| 122 +++-
 drivers/gpu/drm/i915/i915_gem_context.c|   5 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   9 +-
 drivers/gpu/drm/i915/i915_gpu_error.c  | 307 --
 drivers/gpu/drm/i915/i915_guc_reg.h|   1 +
 drivers/gpu/drm/i915/i915_guc_submission.c | 243 +++---
 drivers/gpu/drm/i915/i915_irq.c|  23 +-
 drivers/gpu/drm/i915/i915_scheduler.c  | 487 ++---
 drivers/gpu/drm/i915/i915_scheduler.h  |  49 ++-
 drivers/gpu/drm/i915/i915_trace.h  |  30 +-
 drivers/gpu/drm/i915/intel_guc.h   |  31 +-
 drivers/gpu/drm/i915/intel_guc_fwif.h  |  93 +-
 drivers/gpu/drm/i915/intel_guc_loader.c|  14 +-
 drivers/gpu/drm/i915/intel_lrc.c 

Re: [Intel-gfx] [PATCH v2] drm/i915: Fix context/engine cleanup order

2015-12-11 Thread Daniel Vetter
On Fri, Dec 11, 2015 at 02:59:09PM +, Nick Hoath wrote:
> Swap the order of context & engine cleanup, so that it is now
> contexts, then engines.
> This allows the context clean up code to do things like confirm
> that ring->dev->struct_mutex is locked without a NULL pointer
> dereference.
> This came about as a result of the 'intel_ring_initialized() must
> be simple and inline' patch now using ring->dev as an initialised
> flag.
> Rename the cleanup function to reflect what it actually does.
> Also clean up some very annoying whitespace issues at the same time.
> 
> v2: Also make the fix in i915_load_modeset_init, not just
> in i915_driver_unload (Chris Wilson)
> 
> Signed-off-by: Nick Hoath 
> Reviewed-by: Chris Wilson 
> 
> Cc: Mika Kuoppala 
> Cc: Daniel Vetter 
> Cc: David Gordon 
> Cc: Chris Wilson 

Queued for -next, thanks for the patch.
-Daniel
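
The net effect of the reordering, distilled from the patch quoted
below into a sketch:

mutex_lock(&dev->struct_mutex);
i915_gem_context_fini(dev);	/* contexts first: may still use ring->dev */
i915_gem_cleanup_engines(dev);	/* engines torn down last */
mutex_unlock(&dev->struct_mutex);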

> ---
>  drivers/gpu/drm/i915/i915_dma.c |  4 ++--
>  drivers/gpu/drm/i915/i915_drv.h |  2 +-
>  drivers/gpu/drm/i915/i915_gem.c | 23 ---
>  3 files changed, 15 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index 84e2b20..4dad121 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -449,8 +449,8 @@ static int i915_load_modeset_init(struct drm_device *dev)
>  
>  cleanup_gem:
>   mutex_lock(&dev->struct_mutex);
> - i915_gem_cleanup_ringbuffer(dev);
>   i915_gem_context_fini(dev);
> + i915_gem_cleanup_engines(dev);
>   mutex_unlock(&dev->struct_mutex);
>  cleanup_irq:
>   intel_guc_ucode_fini(dev);
> @@ -1188,8 +1188,8 @@ int i915_driver_unload(struct drm_device *dev)
>  
>   intel_guc_ucode_fini(dev);
>   mutex_lock(&dev->struct_mutex);
> - i915_gem_cleanup_ringbuffer(dev);
>   i915_gem_context_fini(dev);
> + i915_gem_cleanup_engines(dev);
>   mutex_unlock(&dev->struct_mutex);
>   intel_fbc_cleanup_cfb(dev_priv);
>   i915_gem_cleanup_stolen(dev);
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 5edd393..e317f88 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -3016,7 +3016,7 @@ int i915_gem_init_rings(struct drm_device *dev);
>  int __must_check i915_gem_init_hw(struct drm_device *dev);
>  int i915_gem_l3_remap(struct drm_i915_gem_request *req, int slice);
>  void i915_gem_init_swizzling(struct drm_device *dev);
> -void i915_gem_cleanup_ringbuffer(struct drm_device *dev);
> +void i915_gem_cleanup_engines(struct drm_device *dev);
>  int __must_check i915_gpu_idle(struct drm_device *dev);
>  int __must_check i915_gem_suspend(struct drm_device *dev);
>  void __i915_add_request(struct drm_i915_gem_request *req,
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 8e2acde..04a22db 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -4823,7 +4823,7 @@ i915_gem_init_hw(struct drm_device *dev)
>  
>   ret = i915_gem_request_alloc(ring, ring->default_context, &req);
>   if (ret) {
> - i915_gem_cleanup_ringbuffer(dev);
> + i915_gem_cleanup_engines(dev);
>   goto out;
>   }
>  
> @@ -4836,7 +4836,7 @@ i915_gem_init_hw(struct drm_device *dev)
>   if (ret && ret != -EIO) {
>   DRM_ERROR("PPGTT enable ring #%d failed %d\n", i, ret);
>   i915_gem_request_cancel(req);
> - i915_gem_cleanup_ringbuffer(dev);
> + i915_gem_cleanup_engines(dev);
>   goto out;
>   }
>  
> @@ -4844,7 +4844,7 @@ i915_gem_init_hw(struct drm_device *dev)
>   if (ret && ret != -EIO) {
>   DRM_ERROR("Context enable ring #%d failed %d\n", i, 
> ret);
>   i915_gem_request_cancel(req);
> - i915_gem_cleanup_ringbuffer(dev);
> + i915_gem_cleanup_engines(dev);
>   goto out;
>   }
>  
> @@ -4919,7 +4919,7 @@ out_unlock:
>  }
>  
>  void
> -i915_gem_cleanup_ringbuffer(struct drm_device *dev)
> +i915_gem_cleanup_engines(struct drm_device *dev)
>  {
>   struct drm_i915_private *dev_priv = dev->dev_private;
>   struct intel_engine_cs *ring;
> @@ -4928,13 +4928,14 @@ i915_gem_cleanup_ringbuffer(struct drm_device *dev)
>   for_each_ring(ring, dev_priv, i)
>   dev_priv->gt.cleanup_ring(ring);
>  
> -if (i915.enable_execlists)
> -/*
> - * Neither the BIOS, ourselves or any other kernel
> - * expects the system to be in execlists mode on startup,
> - * so we need to reset the GPU back to legacy mode.
> - */
> -

[Intel-gfx] [PATCH] drm/i915: Instrument PSR parameter for possible quirks with link standby.

2015-12-11 Thread Rodrigo Vivi
Link standby support has been deprecated with 'commit 89251b177
("drm/i915: PSR: deprecate link_standby support for core platforms.")'

The reason for that is that the main link fully off offers more power
savings, and some platform implementations on the source side had known
bugs with link standby.

However we don't know all panels out there and we don't fully rely
on the VBT information after the case, found with the commit above,
that made us deprecate link standby.

So, before enabling PSR by default, let's instrument the PSR parameter
in a way that we can identify different panels out there that might
require or work better with link standby mode.

It is also useful to say that for backward compatibility I'm not
changing the meaning of this flag. So "0" still means disabled
and "1" means enabled with full support and maximum power savings.

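In other words the intended mapping is as follows. The intel_psr_init()
hunk implementing it is truncated below, so the exact code is an
assumption:

/* 0=disabled, 1=enabled with main link off, 2=enabled with link standby */
dev_priv->psr.link_standby = (i915.enable_psr == 2);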
v2: Use positive value instead of negative for different operation mode
as suggested by Daniel.

Cc: Paulo Zanoni 
Cc: Daniel Vetter 
Signed-off-by: Rodrigo Vivi 
---
 drivers/gpu/drm/i915/i915_debugfs.c |  5 +
 drivers/gpu/drm/i915/i915_drv.h |  1 +
 drivers/gpu/drm/i915/i915_params.c  |  7 ++-
 drivers/gpu/drm/i915/intel_psr.c| 13 -
 4 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
b/drivers/gpu/drm/i915/i915_debugfs.c
index 24318b7..efe973b 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2567,6 +2567,10 @@ static int i915_edp_psr_status(struct seq_file *m, void 
*data)
enabled = true;
}
}
+
+   seq_printf(m, "Forcing main link standby: %s\n",
+  yesno(dev_priv->psr.link_standby));
+
seq_printf(m, "HW Enabled & Active bit: %s", yesno(enabled));
 
if (!HAS_DDI(dev))
@@ -2587,6 +2591,7 @@ static int i915_edp_psr_status(struct seq_file *m, void 
*data)
 
seq_printf(m, "Performance_Counter: %u\n", psrperf);
}
+
mutex_unlock(&dev_priv->psr.lock);
 
intel_runtime_pm_put(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 5edd393..de086f0 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -969,6 +969,7 @@ struct i915_psr {
unsigned busy_frontbuffer_bits;
bool psr2_support;
bool aux_frame_sync;
+   bool link_standby;
 };
 
 enum intel_pch {
diff --git a/drivers/gpu/drm/i915/i915_params.c 
b/drivers/gpu/drm/i915/i915_params.c
index 835d609..6dd39f0 100644
--- a/drivers/gpu/drm/i915/i915_params.c
+++ b/drivers/gpu/drm/i915/i915_params.c
@@ -126,7 +126,12 @@ MODULE_PARM_DESC(enable_execlists,
"(-1=auto [default], 0=disabled, 1=enabled)");
 
 module_param_named_unsafe(enable_psr, i915.enable_psr, int, 0600);
-MODULE_PARM_DESC(enable_psr, "Enable PSR (default: false)");
+MODULE_PARM_DESC(enable_psr, "Enable PSR "
+"(0=disabled [default], 1=link-off maximum power-savings, 
2=link-standby mode)"
+"In case you needed to force it on standby or disabled, please 
"
+"report PCI device ID, subsystem vendor and subsystem device 
ID "
+"to intel-gfx@lists.freedesktop.org, if your machine needs it. 
"
+"It will then be included in an upcoming module version.");
 
 module_param_named_unsafe(preliminary_hw_support, i915.preliminary_hw_support, 
int, 0600);
 MODULE_PARM_DESC(preliminary_hw_support,
diff --git a/drivers/gpu/drm/i915/intel_psr.c b/drivers/gpu/drm/i915/intel_psr.c
index 9ccff30..bcc85fd 100644
--- a/drivers/gpu/drm/i915/intel_psr.c
+++ b/drivers/gpu/drm/i915/intel_psr.c
@@ -225,7 +225,12 @@ static void hsw_psr_enable_sink(struct intel_dp *intel_dp)
   (aux_clock_divider << DP_AUX_CH_CTL_BIT_CLOCK_2X_SHIFT));
}
 
-   drm_dp_dpcd_writeb(&intel_dp->aux, DP_PSR_EN_CFG, DP_PSR_ENABLE);
+   if (dev_priv->psr.link_standby)
+   drm_dp_dpcd_writeb(&intel_dp->aux, DP_PSR_EN_CFG,
+  DP_PSR_ENABLE | DP_PSR_MAIN_LINK_ACTIVE);
+   else
+   drm_dp_dpcd_writeb(&intel_dp->aux, DP_PSR_EN_CFG,
+  DP_PSR_ENABLE);
 }
 
 static void vlv_psr_enable_source(struct intel_dp *intel_dp)
@@ -280,6 +285,9 @@ static void hsw_psr_enable_source(struct intel_dp *intel_dp)
if (IS_HASWELL(dev))
val |= EDP_PSR_MIN_LINK_ENTRY_TIME_8_LINES;
 
+   if (dev_priv->psr.link_standby)
+   val |= EDP_PSR_LINK_STANDBY;
+
I915_WRITE(EDP_PSR_CTL, val |
   max_sleep_time << EDP_PSR_MAX_SLEEP_TIME_SHIFT |
   idle_frames << EDP_PSR_IDLE_FRAME_SHIFT |
@@ -763,6 +771,9 @@ void intel_psr_init(struct drm_device *dev)
dev_priv->psr_mmio_base = IS_HASWELL(dev_priv) ?
HSW_EDP_PSR_BASE : BDW_EDP_PSR_BASE;
 
+   if 

Re: [Intel-gfx] [PATCH v2] PM / Runtime: Introduce pm_runtime_get_noidle

2015-12-11 Thread Imre Deak
On pe, 2015-12-11 at 16:40 +0100, Rafael J. Wysocki wrote:
> On Friday, December 11, 2015 02:54:45 PM Imre Deak wrote:
> > On to, 2015-12-10 at 23:14 +0100, Rafael J. Wysocki wrote:
> > > On Thursday, December 10, 2015 11:20:40 PM Imre Deak wrote:
> > > > On Thu, 2015-12-10 at 22:42 +0100, Rafael J. Wysocki wrote:
> > > > > On Thursday, December 10, 2015 10:36:37 PM Rafael J. Wysocki
> > > > > wrote:
> > > > > > On Thursday, December 10, 2015 11:43:50 AM Imre Deak wrote:
> > > > > > > On Thu, 2015-12-10 at 01:58 +0100, Rafael J. Wysocki
> > > > > > > wrote:
> > > > > > > > On Wednesday, December 09, 2015 06:22:19 PM Joonas
> > > > > > > > Lahtinen
> > > > > > > > wrote:
> > > > > > > > > Introduce pm_runtime_get_noidle for situations
> > > > > > > > > where
> > > > > > > > > it is
> > > > > > > > > not
> > > > > > > > > desirable to touch an idling device. One use
> > > > > > > > > scenario is
> > > > > > > > > periodic
> > > > > > > > > hangchecks performed by the drm/i915 driver which can
> > > > > > > > > be
> > > > > > > > > omitted
> > > > > > > > > on a device in a runtime idle state.
> > > > > > > > > 
> > > > > > > > > v2:
> > > > > > > > > - Fix inconsistent return value when !CONFIG_PM.
> > > > > > > > > - Update documentation for bool return value
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Joonas Lahtinen 
> > > > > > > > > Reported-by: Chris Wilson 
> > > > > > > > > Cc: Chris Wilson 
> > > > > > > > > Cc: "Rafael J. Wysocki" 
> > > > > > > > > Cc: linux...@vger.kernel.org
> > > > > > > > 
> > > > > > > > Well, I don't quite see how this can be used in a non-
> > > > > > > > racy
> > > > > > > > way
> > > > > > > > without doing an additional pm_runtime_resume() or
> > > > > > > > something
> > > > > > > > like
> > > > > > > > that in the same code path.
> > > > > > > 
> > > > > > > We don't want to resume, that would be the whole point.
> > > > > > > We'd
> > > > > > > like
> > > > > > > to
> > > > > > > ensure that we hold a reference _and_ the device is
> > > > > > > already
> > > > > > > active. So
> > > > > > > AFAICS we'd need to check runtime_status == RPM_ACTIVE in
> > > > > > > addition
> > > > > > > after taking the reference.
> > > > > > 
> > > > > > Right, and that under the lock.
> > > > > 
> > > > > Which basically means you can call pm_runtime_resume() just
> > > > > fine,
> > > > > because it will do nothing if the status is RPM_ACTIVE
> > > > > already.
> > > > > 
> > > > > So really, why don't you use pm_runtime_get_sync()?
> > > > 
> > > > The difference would be that if the status is not RPM_ACTIVE
> > > > already we
> > > > would drop the reference and report error. The caller would in
> > > > this
> > > > case forego of doing something, since we the device is
> > > > suspended or
> > > > on
> > > > the way to being suspended. One example of such a scenario is a
> > > > watchdog like functionality: the watchdog work would
> > > > call pm_runtime_get_noidle() and check if the device is ok by
> > > > doing
> > > > some HW access, but only if the device is powered. Otherwise
> > > > the
> > > > work
> > > > item would do nothing (meaning it also won't reschedule
> > > > itself).
> > > > The
> > > > watchdog work would get rescheduled next time the device is
> > > > woken
> > > > up
> > > > and some work is submitted to the device.
> > > 
> > > So first of all the name "pm_runtime_get_noidle" doesn't make
> > > sense.
> > > 
> > > I guess what you need is something like
> > > 
> > > bool pm_runtime_get_if_active(struct device *dev)
> > > {
> > >   unsigned long flags;
> > >   bool ret;
> > > 
> > >   spin_lock_irqsave(&dev->power.lock, flags);
> > > 
> > >   if (dev->power.runtime_status == RPM_ACTIVE) {
> > 
> > But here usage_count could be zero, meaning that the device is
> > already
> > on the way to be suspended (autosuspend or ASYNC suspend), no?
> 
> The usage counter equal to 0 need not mean that the device is being
> suspended
> right now.

From the driver's point of view it means there is no need to keep the
device active, and that's the only thing that matters for the driver.
It doesn't matter at what exact point the actual suspend will happen
after the 1->0 transition.

> Also even if that's the case, the usage counter may be incremented at
> this very
> moment by a concurrent thread and you'll lose the opportunity to do
> what you
> want.

In that case the other thread makes sure that the work that we want to
do (run the watchdog check) is rescheduled. We need to handle that kind
of race anyway, since an increment from 0->1 and setting runtime_status
to RPM_ACTIVE could happen even after we have already determined here
that the device is not active and so we return failure.

> > In that case we don't want to return success. That would
> > unnecessarily prolong
> > the time the device is kept 

Re: [Intel-gfx] [PATCH v2] PM / Runtime: Introduce pm_runtime_get_noidle

2015-12-11 Thread Ulf Hansson
On 11 December 2015 at 16:13, Rafael J. Wysocki  wrote:
> On Friday, December 11, 2015 01:03:50 PM Ulf Hansson wrote:
>> [...]
>>
>> >> >
>> >> > Which basically means you can call pm_runtime_resume() just fine,
>> >> > because it will do nothing if the status is RPM_ACTIVE already.
>> >> >
>> >> > So really, why don't you use pm_runtime_get_sync()?
>> >>
>> >> The difference would be that if the status is not RPM_ACTIVE already we
>> >> would drop the reference and report error. The caller would in this
>> >> case forego doing something, since the device is suspended or on
>> >> the way to being suspended. One example of such a scenario is a
>> >> watchdog like functionality: the watchdog work would
>> >> call pm_runtime_get_noidle() and check if the device is ok by doing
>> >> some HW access, but only if the device is powered. Otherwise the work
>> >> item would do nothing (meaning it also won't reschedule itself). The
>> >> watchdog work would get rescheduled next time the device is woken up
>> >> and some work is submitted to the device.
>> >
>> > So first of all the name "pm_runtime_get_noidle" doesn't make sense.
>> >
>> > I guess what you need is something like
>> >
>> > bool pm_runtime_get_if_active(struct device *dev)
>> > {
>> > unsigned long flags;
>> > bool ret;
>> >
>> > spin_lock_irqsave(&dev->power.lock, flags);
>> >
>> > if (dev->power.runtime_status == RPM_ACTIVE) {
>> > atomic_inc(&dev->power.usage_count);
>> > ret = true;
>> > } else {
>> > ret = false;
>> > }
>> >
>> > spin_unlock_irqrestore(&dev->power.lock, flags);
>> >
>> > return ret;
>> > }
>> >
>> > and the caller will simply bail out if "false" is returned, but if "true"
>> > is returned, it will have to drop the usage count, right?
>> >
>> > Thanks,
>> > Rafael
>> >
>>
>> Why not just:
>>
>> pm_runtime_get_noresume():
>> if (RPM_ACTIVE)
>>   "do some actions"
>> pm_runtime_put();
>
> Because that's racy?

Right, that was too easy. :-)

>
> What if the rpm_suspend() is running for the device, but it hasn't changed
> the status yet?

So if we can add a pm_runtime_barrier(), or even simpler, just hold
the spin_lock when checking if the rpm status is RPM_ACTIVE.

Kind regards
Uffe


Re: [Intel-gfx] [PATCH v2] PM / Runtime: Introduce pm_runtime_get_noidle

2015-12-11 Thread Ulf Hansson
[...]

>> >
>> > Which basically means you can call pm_runtime_resume() just fine,
>> > because it will do nothing if the status is RPM_ACTIVE already.
>> >
>> > So really, why don't you use pm_runtime_get_sync()?
>>
>> The difference would be that if the status is not RPM_ACTIVE already we
>> would drop the reference and report error. The caller would in this
>> case forego doing something, since the device is suspended or on
>> the way to being suspended. One example of such a scenario is a
>> watchdog like functionality: the watchdog work would
>> call pm_runtime_get_noidle() and check if the device is ok by doing
>> some HW access, but only if the device is powered. Otherwise the work
>> item would do nothing (meaning it also won't reschedule itself). The
>> watchdog work would get rescheduled next time the device is woken up
>> and some work is submitted to the device.
>
> So first of all the name "pm_runtime_get_noidle" doesn't make sense.
>
> I guess what you need is something like
>
> bool pm_runtime_get_if_active(struct device *dev)
> {
> unsigned long flags;
> bool ret;
>
> spin_lock_irqsave(&dev->power.lock, flags);
>
> if (dev->power.runtime_status == RPM_ACTIVE) {
> atomic_inc(&dev->power.usage_count);
> ret = true;
> } else {
> ret = false;
> }
>
> spin_unlock_irqrestore(&dev->power.lock, flags);
>
> return ret;
> }
>
> and the caller will simply bail out if "false" is returned, but if "true"
> is returned, it will have to drop the usage count, right?
>
> Thanks,
> Rafael
>

Why not just:

pm_runtime_get_noresume():
if (RPM_ACTIVE)
  "do some actions"
pm_runtime_put();

Kind regards
Uffe
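
For the watchdog-style use case discussed earlier in the thread, the
caller pattern on top of the proposed helper would look roughly like
this (a sketch; pm_runtime_get_if_active() is only a proposal at this
point, not an existing API):

static void hangcheck_work_example(struct device *dev)
{
	if (!pm_runtime_get_if_active(dev))
		return;	/* idle or suspending: skip, don't reschedule */

	/* ... safe to touch the hardware here ... */

	pm_runtime_put(dev);	/* drop the usage count taken above */
}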


Re: [Intel-gfx] [PATCH] Always mark GEM objects as dirty when written by the CPU

2015-12-11 Thread Daniel Vetter
On Fri, Dec 11, 2015 at 12:29:40PM +, Chris Wilson wrote:
> On Fri, Dec 11, 2015 at 12:19:09PM +, Dave Gordon wrote:
> > On 10/12/15 08:58, Daniel Vetter wrote:
> > >On Mon, Dec 07, 2015 at 12:51:49PM +, Dave Gordon wrote:
> > >>I think I missed i915_gem_phys_pwrite().
> > >>
> > >>i915_gem_gtt_pwrite_fast() marks the object dirty for most cases (vit
> > >>set_to_gtt_domain(), but isn't called for all cases (or can return before
> > >>the set_domain). Then we try i915_gem_shmem_pwrite() for non-phys
> > >>objects (no check for stolen!) and that already marks the object dirty
> > >>[aside: we might be able to change that to page-by-page?], but
> > >>i915_gem_phys_pwrite() doesn't mark the object dirty, so we might lose
> > >>updates there?
> > >>
> > >>Or maybe we should move the marking up into i915_gem_pwrite_ioctl() 
> > >>instead.
> > >>The target object is surely going to be dirtied, whatever type it is.
> > >
> > >phys objects are special, and when binding we create allocate new
> > >(contiguous) storage. In put_pages_phys that gets copied back and pages
> > >marked as dirty. While a phys object is pinned it's a kernel bug to look
> > >at the shmem pages and a userspace bug to touch the cpu mmap (since that
> > >data will simply be overwritten whenever the kernel feels like).
> > >
> > >phys objects are only used for cursors on old crap though, so ok if we
> > >don't streamline this fairly quirky old ABI.
> > >-Daniel
> > 
> > So is pread broken already for 'phys' ?
> 
> Yes. A completely unused corner of the API.

I think it would be useful to extract all the phys object stuff into
i915_gem_phys_obj.c, add minimal kerneldoc for the functions, and then an
overview section which explains in detail how fucked up this little bit of
ABI history lore is. I can do the overview section, but the
extraction/basic kerneldoc will probably take a bit longer to get around
to.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [Intel-gfx] [PATCH] drm/i915: Wait for PP cycle delay only if panel is in power off sequence

2015-12-11 Thread Daniel Vetter
On Fri, Dec 11, 2015 at 05:11:23PM +0530, Kumar, Shobhit wrote:
> On 12/11/2015 04:55 PM, Thulasimani, Sivakumar wrote:
> >
> >
> >On 12/10/2015 8:32 PM, Ville Syrjälä wrote:
> >>On Thu, Dec 10, 2015 at 08:09:01PM +0530, Thulasimani, Sivakumar wrote:
> >>>
> >>>On 12/10/2015 7:08 PM, Ville Syrjälä wrote:
> On Thu, Dec 10, 2015 at 03:15:37PM +0200, Ville Syrjälä wrote:
> >On Thu, Dec 10, 2015 at 03:01:02PM +0530, Kumar, Shobhit wrote:
> >>On 12/09/2015 09:35 PM, Ville Syrjälä wrote:
> >>>On Wed, Dec 09, 2015 at 08:59:26PM +0530, Shobhit Kumar wrote:
> On Wed, Dec 9, 2015 at 8:34 PM, Chris Wilson
>  wrote:
> >On Wed, Dec 09, 2015 at 08:07:10PM +0530, Shobhit Kumar wrote:
> >>On Wed, Dec 9, 2015 at 7:27 PM, Ville Syrjälä
> >> wrote:
> >>>On Wed, Dec 09, 2015 at 06:51:48PM +0530, Shobhit Kumar wrote:
> During resume, while turning the EDP panel power on, we need
> not wait
> blindly for panel_power_cycle_delay. Check if panel power
> down sequence
> in progress and then only wait. This improves our resume
> time significantly.
> 
> Signed-off-by: Shobhit Kumar 
> ---
> drivers/gpu/drm/i915/intel_dp.c | 17 -
> 1 file changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_dp.c
> b/drivers/gpu/drm/i915/intel_dp.c
> index f335c92..10ec669 100644
> --- a/drivers/gpu/drm/i915/intel_dp.c
> +++ b/drivers/gpu/drm/i915/intel_dp.c
> @@ -617,6 +617,20 @@ static bool edp_have_panel_power(struct
> intel_dp *intel_dp)
>  return (I915_READ(_pp_stat_reg(intel_dp)) & PP_ON)
> != 0;
> }
> 
> +static bool edp_panel_off_seq(struct intel_dp *intel_dp)
> +{
> + struct drm_device *dev = intel_dp_to_dev(intel_dp);
> + struct drm_i915_private *dev_priv = dev->dev_private;
> +
> + lockdep_assert_held(&dev_priv->pps_mutex);
> +
> + if (IS_VALLEYVIEW(dev) &&
> + intel_dp->pps_pipe == INVALID_PIPE)
> + return false;
> +
> + return (I915_READ(_pp_stat_reg(intel_dp)) &
> PP_SEQUENCE_POWER_DOWN) != 0;
> +}
> >>>This doens't make sense to me. The power down cycle may have
> >>>completed just before, and so this would claim we don't have to
> >>>wait for the power_cycle_delay.
> >>Not sure I understand your concern correctly. You are right,
> >>power
> >>down cycle may have completed just before and if it has then
> >>we don't
> >>need to wait. But in case the power down cycle is in progress
> >>as per
> >>internal state, then we need to wait for it to complete. This
> >>will
> >>happen for example in non-suspend disable path and will be
> >>handled
> >>correctly. In case of actual suspend/resume, this would have
> >>successfully completed and will skip the wait as it is not needed
> >>before enabling panel power.
> >>
> +
> static bool edp_have_panel_vdd(struct intel_dp *intel_dp)
> {
>  struct drm_device *dev = intel_dp_to_dev(intel_dp);
> @@ -2025,7 +2039,8 @@ static void edp_panel_on(struct
> intel_dp *intel_dp)
>   port_name(dp_to_dig_port(intel_dp)->port)))
>  return;
> 
> - wait_panel_power_cycle(intel_dp);
> + if (edp_panel_off_seq(intel_dp))
> + wait_panel_power_cycle(intel_dp);
> >Looking in from the side, I have no idea what this is meant to
> >do. At
> >the very least you need your explanatory paragraph here which
> >would
> >include what exactly you are waiting for at the start of
> >edp_panel_on
> >(and please try and find a better name for edp_panel_off_seq()).
> I will add a comment. Basically I am not additionally waiting, but
> converting the wait which was already there to a conditional
> wait. The
> edp_panel_off_seq, checks if panel power down sequence is in
> progress.
> In that case we need to wait for the panel power cycle delay. If
> it is
> not in that sequence, there is no need to wait. I will make an
> attempt
> again on the naming in next patch update.
> >>>As far I remeber you need to wait for power_cycle_delay between
> >>>power
> >>>down cycle 

Re: [Intel-gfx] [PATCH i-g-t] RFC: split PM workarounds into separate lib

2015-12-11 Thread Daniel Vetter
On Thu, Dec 10, 2015 at 06:01:28PM +0200, David Weinehall wrote:
> On Tue, Dec 08, 2015 at 03:42:27PM +0200, Ville Syrjälä wrote:
> > On Tue, Dec 08, 2015 at 10:50:39AM +0200, David Weinehall wrote:
> > > Since the defaults for some external power management related settings
> > > prevents us from testing our power management functionality properly,
> > > we have to work around it. Currently this is done from the individual
> > > test cases, but this is sub-optimal.  This patch moves the PM-related
> > > workarounds into a separate library, and adds some code to restore the
> > > previous settings for the SATA link power management while at it.
> > 
> > Why is it called "workarounds"? That gives me the impression we're
> > working around something that's supposed to work but doesn't. That's not
> > the case here.
> 
> Workarounds was because we are working around "imperfect" settings
> in other components. At least to me power management should be enabled
> out of the box, not something that requires admin-level workarounds.
> Since we're not in control of said defaults, we have to modify the
> settings when we run our tests, hence workarounds.

Fully agreed that power tuning should be applied by default, but that's a
loong process to convince all the other kernel maintainers. And we
need to get our own house in order first too, but that's in progress.

> That said, as I've replied to a later post, igt_pm is fine by me.

One more: Please namespace all the library functions you're adding and
exporting to tests with igt_pm_. Static/internal functions can still be
named however you feel like.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [Intel-gfx] [PATCH i-g-t] gem_flink_race/prime_self_import: Improve test reliability

2015-12-11 Thread Daniel Vetter
On Fri, Dec 11, 2015 at 10:33:46AM +, Morton, Derek J wrote:
> >
> >
> >-Original Message-
> >From: Daniel Vetter [mailto:daniel.vet...@ffwll.ch] On Behalf Of Daniel 
> >Vetter
> >Sent: Thursday, December 10, 2015 12:53 PM
> >To: Morton, Derek J
> >Cc: Daniel Vetter; intel-gfx@lists.freedesktop.org; Wood, Thomas
> >Subject: Re: [Intel-gfx] [PATCH i-g-t] gem_flink_race/prime_self_import: 
> >Improve test reliability
> >
> >On Thu, Dec 10, 2015 at 11:51:29AM +, Morton, Derek J wrote:
> >> >
> >> >
> >> >-Original Message-
> >> >From: Daniel Vetter [mailto:daniel.vet...@ffwll.ch] On Behalf Of 
> >> >Daniel Vetter
> >> >Sent: Thursday, December 10, 2015 10:13 AM
> >> >To: Morton, Derek J
> >> >Cc: intel-gfx@lists.freedesktop.org; Wood, Thomas
> >> >Subject: Re: [Intel-gfx] [PATCH i-g-t] 
> >> >gem_flink_race/prime_self_import: Improve test reliability
> >> >
> >> >On Tue, Dec 08, 2015 at 12:44:44PM +, Derek Morton wrote:
> >> >> gem_flink_race and prime_self_import have subtests which read the 
> >> >> number of open gem objects from debugfs to determine if objects 
> >> >> have leaked during the test. However the test can fail sporadically 
> >> >> if the number of gem objects changes due to other process activity.
> >> >> This patch introduces a change to check the number of gem objects 
> >> >> several times to filter out any fluctuations.
> >> >
> >> >Why exactly does this happen? IGT tests should be run on bare metal, 
> >> >with everything else killed/subdued/shutup. If there's still things 
> >> >going on that create objects, we need to stop them from doing that.
> >> >
> >> >If this only applies to Android, or some special Android deamon them 
> >> >imo check for that at runtime and igt_skip("your setup is invalid, 
> >> >deamon %s running\n"); is the correct fix. After all just because you 
> >> >sampled for a bit doesn't mean that it wont still change right when 
> >> >you start running the test for real, so this is still fragile.
> >> 
> >> Before running tests on android we do stop everything possible. I 
> >> suspect the culprit is coreu getting automatically restarted after it 
> >> is stopped. I had additional debug while developing this patch and 
> >> what I saw was the system being mostly quiescent but with some very 
> >> low level background activity. 1 extra object being created and then 
> >> deleted occasionally. Depending on whether it occurred at the start or 
> >> end of the test it was resulting in a reported leak of either 1 or -1 
> >> objects.
> >> The patch fixes that issue by taking several samples and requiring 
> >> them to be the same, therefore filtering out the low level background 
> >> noise.
> >> It would not help if something in the background allocated an object 
> >> and kept it allocated, but I have not seen that happen. I only saw 
> >> once the object count increasing for 2 consecutive reads hence the 
> >> count to 4 to give a margin. The test was failing about 10%. With this 
> >> patch I got 100% pass across 300 runs of each of the tests.
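
The sampling idea described above reduces to something like the
following sketch (the helper names are assumptions, not the actual IGT
patch; four matching samples is the margin mentioned above):

static int read_gem_object_count(int fd);	/* assumed debugfs reader */

static int get_stable_obj_count(int fd)
{
	int count = read_gem_object_count(fd);
	int matches = 1;

	/* Re-read until four consecutive samples agree, filtering out
	 * transient +/-1 blips caused by background activity. */
	while (matches < 4) {
		int next = read_gem_object_count(fd);

		if (next == count) {
			matches++;
		} else {
			count = next;
			matches = 1;
		}
	}

	return count;
}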
> >
> >Hm, piglit checks that there's no other drm clients running. Have you tried 
> >re-running that check to zero in on the culprit?
> 
> We don't use piglit to run IGT tests on Android. I have had a look at what 
> piglit does and added the same check to our scripts. (It reads a list of 
> clients from /sys/kernel/debug/dri/0/clients)
> For CHV it shows a process called 'y', though that seems to be some issue on 
> CHV that all driver clients are called 'y'. I checked on BXT which properly 
> shows the process names and it looks like it is the binder process (which  is 
> handling some inter process communication). I don't think this is something 
> we can stop. 

Nah, you definitely can't stop binder, won't have an android left after
that ;-)

But it is strange that binder owns these buffers. Binder is just IPC, but
like unix domain sockets you can also throw around file descriptors. So
something on your system is still moving open drm fds around. I
don't have an idea what kind of audit/debug tooling binder offers, but
there should be a way to figure out who really owns that file descriptor.
If you're lucky lsof (if android has that, otherwise walk /proc/*/fd/*
symlinks manually) should help.

Cheers, Daniel

> >> If you are concerned about the behaviour when running the test with a 
> >> load of background activity I could add code to limit to the reset of 
> >> the count and fail the test in that instance. That would give a 
> >> benefit of distinguishing a test fail due to excessive background 
> >> activity from a detected leak.
> >
> >I'm also concerned for the overhead this causes everyone else. If this 
> >really is some Android trouble then I think it'd be good to only compile 
> >this on Android. But would still be much better if you can get to a reliably 
> >clean test environment.
> 
> I will make the loop part android specific.
> 
> 
> //Derek
> 
> >
> >> I would not want to 

Re: [Intel-gfx] [PATCH] drm/i915: Update to post-reset execlist queue clean-up

2015-12-11 Thread Daniel Vetter
On Fri, Dec 11, 2015 at 02:14:00PM +, Dave Gordon wrote:
> On 01/12/15 11:46, Tvrtko Ursulin wrote:
> >
> >On 23/10/15 18:02, Tomas Elf wrote:
> >>When clearing an execlist queue, instead of traversing it and
> >>unreferencing all
> >>requests while holding the spinlock (which might lead to thread
> >>sleeping with
> >>IRQs turned off - bad news!), just move all requests to the retire
> >>request
> >>list while holding spinlock and then drop spinlock and invoke the
> >>execlists
> >>request retirement path, which already deals with the intricacies of
> >>purging/dereferencing execlist queue requests.
> >>
> >>This patch can be considered v3 of:
> >>
> >>commit b96db8b81c54ef30485ddb5992d63305d86ea8d3
> >>Author: Tomas Elf 
> >>drm/i915: Grab execlist spinlock to avoid post-reset concurrency
> >>issues
> >>
> >>This patch assumes v2 of the above patch is part of the baseline,
> >>reverts v2
> >>and adds changes on top to turn it into v3.
> >>
> >>Signed-off-by: Tomas Elf 
> >>Cc: Tvrtko Ursulin 
> >>Cc: Chris Wilson 
> >>---
> >>  drivers/gpu/drm/i915/i915_gem.c | 15 ---
> >>  1 file changed, 4 insertions(+), 11 deletions(-)
> >>
> >>diff --git a/drivers/gpu/drm/i915/i915_gem.c
> >>b/drivers/gpu/drm/i915/i915_gem.c
> >>index 2c7a0b7..b492603 100644
> >>--- a/drivers/gpu/drm/i915/i915_gem.c
> >>+++ b/drivers/gpu/drm/i915/i915_gem.c
> >>@@ -2756,20 +2756,13 @@ static void i915_gem_reset_ring_cleanup(struct
> >>drm_i915_private *dev_priv,
> >>
> >>  if (i915.enable_execlists) {
> >>  spin_lock_irq(&ring->execlist_lock);
> >>-while (!list_empty(>execlist_queue)) {
> >>-struct drm_i915_gem_request *submit_req;
> >>
> >>-submit_req = list_first_entry(&ring->execlist_queue,
> >>-struct drm_i915_gem_request,
> >>-execlist_link);
> >>-list_del(&submit_req->execlist_link);
> >>+/* list_splice_tail_init checks for empty lists */
> >>+list_splice_tail_init(&ring->execlist_queue,
> >>+  &ring->execlist_retired_req_list);
> >>
> >>-if (submit_req->ctx != ring->default_context)
> >>-intel_lr_context_unpin(submit_req);
> >>-
> >>-i915_gem_request_unreference(submit_req);
> >>-}
> >>  spin_unlock_irq(&ring->execlist_lock);
> >>+intel_execlists_retire_requests(ring);
> >>  }
> >>
> >>  /*
> >
> >Fallen through the cracks..
> >
> >This looks to be even more serious, since lockdep notices possible
> >deadlock involving vmap_area_lock:
> >
> >  Possible interrupt unsafe locking scenario:
> >
> >CPU0CPU1
> >
> >   lock(vmap_area_lock);
> >local_irq_disable();
> >lock(&(&ring->execlist_lock)->rlock);
> >lock(vmap_area_lock);
> >   <Interrupt>
> > lock(&(&ring->execlist_lock)->rlock);
> >
> >  *** DEADLOCK ***
> >
> >Because it unpins LRC context and ringbuffer which ends up in the VM
> >code under the execlist_lock.
> >
> >intel_execlists_retire_requests is slightly different from the code in
> >the reset handler because it concerns itself with ctx_obj existence
> >which the other one doesn't.
> >
> >Could people more knowledgeable of this code check if it is OK and R-B?
> >
> >Regards,
> >
> >Tvrtko
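
The fix reduces to the following pattern (a sketch of the diff quoted
above): only list surgery happens under the IRQ-off execlist_lock, and
the unpin/unreference work, which can reach sleeping VM code, runs
after the lock is dropped.

spin_lock_irq(&ring->execlist_lock);
list_splice_tail_init(&ring->execlist_queue,
		      &ring->execlist_retired_req_list);
spin_unlock_irq(&ring->execlist_lock);

intel_execlists_retire_requests(ring);	/* may sleep; safe here */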
> 
> Hi Tvrtko,
> 
> I didn't understand this message at first, I thought you'd found a problem
> with this ("v3") patch, but now I see what you actually meant is that there
> is indeed a problem with the (v2) that got merged, not the original question
> about unreferencing an object while holding a spinlock (because it can't be
> the last reference), but rather because of the unpin, which can indeed cause
> a problem with a non-i915-defined kernel lock.
> 
> So we should certainly update the current (v2) upstream with this.
> Thomas Daniel already R-B'd this code on 23rd October, when it was:
> 
> [PATCH v3 7/8] drm/i915: Grab execlist spinlock to avoid post-reset
> concurrency issues.
> 
> and it hasn't changed in substance since then, so you can carry his R-B
> over, plus I said on that same day that this was a better solution. So:
> 
> Reviewed-by: Thomas Daniel 
> Reviewed-by: Dave Gordon 

Indeed, fell through the cracks more than once :(

Sorry about that, picked up now.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [Intel-gfx] [PATCH 3/3] drm/i915: Prevent leaking of -EIO from i915_wait_request()

2015-12-11 Thread Daniel Vetter
On Fri, Dec 11, 2015 at 09:02:18AM +, Chris Wilson wrote:
> On Thu, Dec 03, 2015 at 10:14:54AM +0100, Daniel Vetter wrote:
> > On Tue, Dec 01, 2015 at 11:05:35AM +, Chris Wilson wrote:
> > > diff --git a/drivers/gpu/drm/i915/intel_display.c 
> > > b/drivers/gpu/drm/i915/intel_display.c
> > > index 4447e73b54db..73c61b94f7fd 100644
> > > --- a/drivers/gpu/drm/i915/intel_display.c
> > > +++ b/drivers/gpu/drm/i915/intel_display.c
> > > @@ -13315,23 +13309,15 @@ static int intel_atomic_prepare_commit(struct 
> > > drm_device *dev,
> > >  
> > >   ret = __i915_wait_request(intel_plane_state->wait_req,
> > > true, NULL, NULL);
> > > -
> > > - /* Swallow -EIO errors to allow updates during hw 
> > > lockup. */
> > > - if (ret == -EIO)
> > > - ret = 0;
> > > -
> > > - if (ret)
> > > + if (ret) {
> > > + mutex_lock(&dev->struct_mutex);
> > > + drm_atomic_helper_cleanup_planes(dev, state);
> > > + mutex_unlock(&dev->struct_mutex);
> > >   break;
> > > + }
> > >   }
> > > -
> > > - if (!ret)
> > > - return 0;
> > > -
> > > - mutex_lock(&dev->struct_mutex);
> > > - drm_atomic_helper_cleanup_planes(dev, state);
> > >   }
> > >  
> > > - mutex_unlock(&dev->struct_mutex);
> > 
> > Sneaking in lockless waits! Separate patch please.
> 
> No, it is just badly written code. The wait is already lockless but the
> lock is dropped and retaken around the error paths in such a manner that
> you cannot see this from a glimpse.

Indeed lack of diff context made me all confused, I stand corrected. Looks
good.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


  1   2   >