Re: [Intel-gfx] [PATCH] drm/i915: Wait for PP cycle delay only if panel is in power off sequence
On 12/10/2015 8:32 PM, Ville Syrjälä wrote: On Thu, Dec 10, 2015 at 08:09:01PM +0530, Thulasimani, Sivakumar wrote: On 12/10/2015 7:08 PM, Ville Syrjälä wrote: On Thu, Dec 10, 2015 at 03:15:37PM +0200, Ville Syrjälä wrote: On Thu, Dec 10, 2015 at 03:01:02PM +0530, Kumar, Shobhit wrote: On 12/09/2015 09:35 PM, Ville Syrjälä wrote: On Wed, Dec 09, 2015 at 08:59:26PM +0530, Shobhit Kumar wrote: On Wed, Dec 9, 2015 at 8:34 PM, Chris Wilson wrote: On Wed, Dec 09, 2015 at 08:07:10PM +0530, Shobhit Kumar wrote: On Wed, Dec 9, 2015 at 7:27 PM, Ville Syrjälä wrote: On Wed, Dec 09, 2015 at 06:51:48PM +0530, Shobhit Kumar wrote:

During resume, while turning the EDP panel power on, we need not wait blindly for panel_power_cycle_delay. Check whether a panel power down sequence is in progress and only then wait. This improves our resume time significantly.

Signed-off-by: Shobhit Kumar
---
 drivers/gpu/drm/i915/intel_dp.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
index f335c92..10ec669 100644
--- a/drivers/gpu/drm/i915/intel_dp.c
+++ b/drivers/gpu/drm/i915/intel_dp.c
@@ -617,6 +617,20 @@ static bool edp_have_panel_power(struct intel_dp *intel_dp)
 	return (I915_READ(_pp_stat_reg(intel_dp)) & PP_ON) != 0;
 }

+static bool edp_panel_off_seq(struct intel_dp *intel_dp)
+{
+	struct drm_device *dev = intel_dp_to_dev(intel_dp);
+	struct drm_i915_private *dev_priv = dev->dev_private;
+
+	lockdep_assert_held(&dev_priv->pps_mutex);
+
+	if (IS_VALLEYVIEW(dev) &&
+	    intel_dp->pps_pipe == INVALID_PIPE)
+		return false;
+
+	return (I915_READ(_pp_stat_reg(intel_dp)) & PP_SEQUENCE_POWER_DOWN) != 0;
+}

This doesn't make sense to me. The power down cycle may have completed just before, and so this would claim we don't have to wait for the power_cycle_delay.

Not sure I understand your concern correctly. You are right, the power down cycle may have completed just before, and if it has then we don't need to wait.
But in case the power down cycle is in progress as per internal state, then we need to wait for it to complete. This will happen, for example, in the non-suspend disable path and will be handled correctly. In case of actual suspend/resume, this would have successfully completed and the wait will be skipped, as it is not needed before enabling panel power.

+
 static bool edp_have_panel_vdd(struct intel_dp *intel_dp)
 {
 	struct drm_device *dev = intel_dp_to_dev(intel_dp);
@@ -2025,7 +2039,8 @@ static void edp_panel_on(struct intel_dp *intel_dp)
 		      port_name(dp_to_dig_port(intel_dp)->port)))
 		return;

-	wait_panel_power_cycle(intel_dp);
+	if (edp_panel_off_seq(intel_dp))
+		wait_panel_power_cycle(intel_dp);

Looking in from the side, I have no idea what this is meant to do. At the very least you need your explanatory paragraph here, which would include what exactly you are waiting for at the start of edp_panel_on() (and please try and find a better name for edp_panel_off_seq()).

I will add a comment. Basically I am not adding an extra wait, but converting the wait which was already there into a conditional wait. edp_panel_off_seq() checks whether a panel power down sequence is in progress; in that case we need to wait for the panel power cycle delay. If it is not in that sequence, there is no need to wait. I will make another attempt at the naming in the next patch update.

As far as I remember you need to wait for power_cycle_delay between the power down cycle and the power up cycle. You're trying to throw that wait away entirely, unless the function happens to get called while the power down

Yes, you are right, and I realize I made a mistake in my patch, which is not checking the PP_CYCLE_DELAY_ACTIVE bit.

cycle is still in progress. We should already optimize away redundant waits by tracking the end of the power down cycle with the jiffies tracking.
Actually, looking at the code, the power_cycle_delay gets counted from the start of the last power down cycle, so supposedly it's always at least as long as the power down cycle, and typically it's quite a bit longer than that. But that doesn't change the fact that you can't just skip it because the power down cycle delay happened to end already. So what we do now is:
1. initiate power down cycle
2. last_power_cycle = jiffies
3. wait for power down (I suppose this actually waits until the power down delay has passed, since that's programmed into the PPS)
4. wait for power_cycle_delay from last_power_cycle
5. initiate power up cycle
I think with your patch step 4 would always be skipped since the power down cycle has already ended, and then we fail to honor the power cycle delay.

Yes, I agree. I missed checking for PP_CYCLE_DELAY_ACTIVE. Adding that check will take care of this scenario I guess?

Nope. The
Re: [Intel-gfx] [PATCH 2/6] drm/i915: Support for creating Stolen memory backed objects
On 11/12/15 11:22, Ankitprasad Sharma wrote: On Wed, 2015-12-09 at 14:06 +, Tvrtko Ursulin wrote: Hi, On 09/12/15 12:46, ankitprasad.r.sha...@intel.com wrote: From: Ankitprasad Sharma

Extend the drm_i915_gem_create structure to add support for creating stolen memory backed objects. Added a new flag through which the user can specify a preference to allocate the object from stolen memory; if set, an attempt will be made to allocate the object from stolen memory, subject to the availability of free space in the stolen region.

v2: Rebased to the latest drm-intel-nightly (Ankit) v3: Changed versioning of GEM_CREATE param, added new comments (Tvrtko) v4: Changed size from 32b to 64b to prevent userspace overflow (Tvrtko) Corrected function arguments ordering (Chris) v5: Corrected function name (Chris)

Testcase: igt/gem_stolen Signed-off-by: Ankitprasad Sharma Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/i915_dma.c | 3 +++ drivers/gpu/drm/i915/i915_drv.h | 2 +- drivers/gpu/drm/i915/i915_gem.c | 30 +++--- drivers/gpu/drm/i915/i915_gem_stolen.c | 4 ++-- include/uapi/drm/i915_drm.h | 16 5 files changed, 49 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c index ffcb9c6..6927c7e 100644 --- a/drivers/gpu/drm/i915/i915_dma.c +++ b/drivers/gpu/drm/i915/i915_dma.c @@ -170,6 +170,9 @@ static int i915_getparam(struct drm_device *dev, void *data, case I915_PARAM_HAS_RESOURCE_STREAMER: value = HAS_RESOURCE_STREAMER(dev); break; + case I915_PARAM_CREATE_VERSION: + value = 2; + break; default: DRM_DEBUG("Unknown parameter %d\n", param->param); return -EINVAL; diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 8e554d3..d45274e 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -3213,7 +3213,7 @@ void i915_gem_stolen_remove_node(struct drm_i915_private *dev_priv, int i915_gem_init_stolen(struct drm_device *dev); void i915_gem_cleanup_stolen(struct
drm_device *dev);
 struct drm_i915_gem_object *
-i915_gem_object_create_stolen(struct drm_device *dev, u32 size);
+i915_gem_object_create_stolen(struct drm_device *dev, u64 size);
 struct drm_i915_gem_object *
 i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
 					       u32 stolen_offset,

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d57e850..296e63f 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -375,6 +375,7 @@ static int
 i915_gem_create(struct drm_file *file,
 		struct drm_device *dev,
 		uint64_t size,
+		uint32_t flags,
 		uint32_t *handle_p)
 {
 	struct drm_i915_gem_object *obj;
@@ -385,8 +386,31 @@ i915_gem_create(struct drm_file *file,
 	if (size == 0)
 		return -EINVAL;

+	if (flags & __I915_CREATE_UNKNOWN_FLAGS)
+		return -EINVAL;
+
 	/* Allocate the new object */
-	obj = i915_gem_alloc_object(dev, size);
+	if (flags & I915_CREATE_PLACEMENT_STOLEN) {
+		mutex_lock(&dev->struct_mutex);
+		obj = i915_gem_object_create_stolen(dev, size);
+		if (!obj) {
+			mutex_unlock(&dev->struct_mutex);
+			return -ENOMEM;
+		}
+
+		/* Always clear fresh buffers before handing to userspace */
+		ret = i915_gem_object_clear(obj);
+		if (ret) {
+			drm_gem_object_unreference(&obj->base);
+			mutex_unlock(&dev->struct_mutex);
+			return ret;
+		}
+
+		mutex_unlock(&dev->struct_mutex);
+	} else {
+		obj = i915_gem_alloc_object(dev, size);
+	}
+
 	if (obj == NULL)
 		return -ENOMEM;
@@ -409,7 +433,7 @@ i915_gem_dumb_create(struct drm_file *file,
 	args->pitch = ALIGN(args->width * DIV_ROUND_UP(args->bpp, 8), 64);
 	args->size = args->pitch * args->height;
 	return i915_gem_create(file, dev,
-			       args->size, &args->handle);
+			       args->size, 0, &args->handle);
 }

 /**
@@ -422,7 +446,7 @@ i915_gem_create_ioctl(struct drm_device *dev, void *data,
 	struct drm_i915_gem_create *args = data;

 	return i915_gem_create(file, dev,
-			       args->size, &args->handle);
+			       args->size, args->flags, &args->handle);
 }

 static inline int

diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
index
Re: [Intel-gfx] [PATCH] Always mark GEM objects as dirty when written by the CPU
On Fri, Dec 11, 2015 at 12:19:09PM +, Dave Gordon wrote:
> On 10/12/15 08:58, Daniel Vetter wrote:
> > On Mon, Dec 07, 2015 at 12:51:49PM +, Dave Gordon wrote:
> >> I think I missed i915_gem_phys_pwrite().
> >>
> >> i915_gem_gtt_pwrite_fast() marks the object dirty for most cases (via
> >> set_to_gtt_domain()), but isn't called for all cases (or can return before
> >> the set_domain). Then we try i915_gem_shmem_pwrite() for non-phys
> >> objects (no check for stolen!) and that already marks the object dirty
> >> [aside: we might be able to change that to page-by-page?], but
> >> i915_gem_phys_pwrite() doesn't mark the object dirty, so we might lose
> >> updates there?
> >>
> >> Or maybe we should move the marking up into i915_gem_pwrite_ioctl() instead.
> >> The target object is surely going to be dirtied, whatever type it is.
> >
> > phys objects are special, and when binding we allocate new
> > (contiguous) storage. In put_pages_phys that gets copied back and pages
> > marked as dirty. While a phys object is pinned it's a kernel bug to look
> > at the shmem pages and a userspace bug to touch the cpu mmap (since that
> > data will simply be overwritten whenever the kernel feels like).
> >
> > phys objects are only used for cursors on old crap though, so ok if we
> > don't streamline this fairly quirky old ABI.
> > -Daniel
>
> So is pread broken already for 'phys' ?

Yes. A completely unused corner of the API.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH v2] PM / Runtime: Introduce pm_runtime_get_noidle
On 10/12/15 22:14, Rafael J. Wysocki wrote: On Thursday, December 10, 2015 11:20:40 PM Imre Deak wrote: On Thu, 2015-12-10 at 22:42 +0100, Rafael J. Wysocki wrote: On Thursday, December 10, 2015 10:36:37 PM Rafael J. Wysocki wrote: On Thursday, December 10, 2015 11:43:50 AM Imre Deak wrote: On Thu, 2015-12-10 at 01:58 +0100, Rafael J. Wysocki wrote: On Wednesday, December 09, 2015 06:22:19 PM Joonas Lahtinen wrote:

Introduce pm_runtime_get_noidle for situations where it is not desirable to touch an idling device. One use scenario is the periodic hangcheck performed by the drm/i915 driver, which can be omitted on a device in a runtime idle state.

v2: - Fix inconsistent return value when !CONFIG_PM. - Update documentation for bool return value

Signed-off-by: Joonas Lahtinen Reported-by: Chris Wilson Cc: Chris Wilson Cc: "Rafael J. Wysocki" Cc: linux...@vger.kernel.org

Well, I don't quite see how this can be used in a non-racy way without doing an additional pm_runtime_resume() or something like that in the same code path.

We don't want to resume, that would be the whole point. We'd like to ensure that we hold a reference _and_ the device is already active. So AFAICS we'd need to check runtime_status == RPM_ACTIVE in addition after taking the reference.

Right, and that under the lock. Which basically means you can call pm_runtime_resume() just fine, because it will do nothing if the status is RPM_ACTIVE already. So really, why don't you use pm_runtime_get_sync()?

The difference would be that if the status is not RPM_ACTIVE already, we would drop the reference and report an error. The caller would in this case forgo doing anything, since the device is suspended or on the way to being suspended. One example of such a scenario is watchdog-like functionality: the watchdog work would call pm_runtime_get_noidle() and check if the device is ok by doing some HW access, but only if the device is powered.
Otherwise the work item would do nothing (meaning it also won't reschedule itself). The watchdog work would get rescheduled the next time the device is woken up and some work is submitted to the device.

So first of all the name "pm_runtime_get_noidle" doesn't make sense. How about pm_runtime_get_unless_idle(), which would be analogous to kref_get_unless_zero()? -Dave.

I guess what you need is something like

bool pm_runtime_get_if_active(struct device *dev)
{
	unsigned long flags;
	bool ret;

	spin_lock_irqsave(&dev->power.lock, flags);
	if (dev->power.runtime_status == RPM_ACTIVE) {
		atomic_inc(&dev->power.usage_count);
		ret = true;
	} else {
		ret = false;
	}
	spin_unlock_irqrestore(&dev->power.lock, flags);
	return ret;
}

and the caller will simply bail out if "false" is returned, but if "true" is returned, it will have to drop the usage count, right? Thanks, Rafael
[Intel-gfx] [PULL] drm-intel-fixes
Hi Dave - Here are some i915 fixes for v4.4, sorry for being late this week. BR, Jani. The following changes since commit 527e9316f8ec44bd53d90fb9f611fa752bb9: Linux 4.4-rc4 (2015-12-06 15:43:12 -0800) are available in the git repository at: git://anongit.freedesktop.org/drm-intel tags/drm-intel-fixes-2015-12-11 for you to fetch changes up to 634b3a4a476e96816d5d6cd5bb9f8900a53f56ba: drm/i915: Do a better job at disabling primary plane in the noatomic case. (2015-12-10 13:33:42 +0200) Maarten Lankhorst (1): drm/i915: Do a better job at disabling primary plane in the noatomic case. Mika Kuoppala (2): drm/i915/skl: Disable coarse power gating up until F0 drm/i915/skl: Double RC6 WRL always on Tvrtko Ursulin (1): drm/i915: Remove incorrect warning in context cleanup drivers/gpu/drm/i915/i915_gem_context.c | 2 -- drivers/gpu/drm/i915/intel_display.c| 4 +++- drivers/gpu/drm/i915/intel_pm.c | 5 ++--- 3 files changed, 5 insertions(+), 6 deletions(-) -- Jani Nikula, Intel Open Source Technology Center
[Intel-gfx] [PATCH 17/32] drm/i915: Remove the lazy_coherency parameter from request-completed?
Now that we have split out the seqno-barrier from the engine->get_seqno() callback itself, we can move the users of the seqno-barrier to the required callsites, simplifying the common code and making the required workaround handling much more explicit. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_debugfs.c | 4 ++-- drivers/gpu/drm/i915/i915_drv.h | 10 ++ drivers/gpu/drm/i915/i915_gem.c | 24 +++- drivers/gpu/drm/i915/intel_display.c | 2 +- drivers/gpu/drm/i915/intel_pm.c | 4 ++-- 5 files changed, 22 insertions(+), 22 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index 6344fe69ab82..8860dec36aae 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -601,7 +601,7 @@ static int i915_gem_pageflip_info(struct seq_file *m, void *data) i915_gem_request_get_seqno(work->flip_queued_req), dev_priv->next_seqno, ring->get_seqno(ring), - i915_gem_request_completed(work->flip_queued_req, true)); + i915_gem_request_completed(work->flip_queued_req)); } else seq_printf(m, "Flip not associated with any ring\n"); seq_printf(m, "Flip queued on frame %d, (was ready on frame %d), now %d\n", @@ -1353,8 +1353,8 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused) intel_runtime_pm_get(dev_priv); for_each_ring(ring, dev_priv, i) { - seqno[i] = ring->get_seqno(ring); acthd[i] = intel_ring_get_active_head(ring); + seqno[i] = ring->get_seqno(ring); } intel_runtime_pm_put(dev_priv); diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index ff83f148658f..d099e960f9b8 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2978,20 +2978,14 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2) return (int32_t)(seq1 - seq2) >= 0; } -static inline bool i915_gem_request_started(struct drm_i915_gem_request *req, - bool lazy_coherency) +static inline bool i915_gem_request_started(struct drm_i915_gem_request *req) { - if
(!lazy_coherency && req->ring->seqno_barrier) - req->ring->seqno_barrier(req->ring); return i915_seqno_passed(req->ring->get_seqno(req->ring), req->previous_seqno); } -static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req, - bool lazy_coherency) +static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req) { - if (!lazy_coherency && req->ring->seqno_barrier) - req->ring->seqno_barrier(req->ring); return i915_seqno_passed(req->ring->get_seqno(req->ring), req->seqno); } diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index fa0cf6c9f4d0..f3c1e268f614 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1173,12 +1173,12 @@ static bool __i915_spin_request(struct drm_i915_gem_request *req, */ /* Only spin if we know the GPU is processing this request */ - if (!i915_gem_request_started(req, true)) + if (!i915_gem_request_started(req)) return false; timeout = local_clock_us() + 5; do { - if (i915_gem_request_completed(req, true)) + if (i915_gem_request_completed(req)) return true; if (signal_pending_state(state, wait->task)) @@ -1230,7 +1230,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req, if (list_empty(>list)) return 0; - if (i915_gem_request_completed(req, true)) + if (i915_gem_request_completed(req)) return 0; timeout_remain = MAX_SCHEDULE_TIMEOUT; @@ -1299,7 +1299,10 @@ wakeup: set_task_state(wait.task, state); * but it is easier and safer to do it every time the waiter * is woken. */ - if (i915_gem_request_completed(req, false)) + if (req->ring->seqno_barrier) + req->ring->seqno_barrier(req->ring); + + if (i915_gem_request_completed(req)) break; /* We need to check whether any gpu reset happened in between @@ -2731,8 +2734,11 @@ i915_gem_find_active_request(struct intel_engine_cs *ring) { struct drm_i915_gem_request *request; + if (ring->seqno_barrier) +
[Intel-gfx] [PATCH 18/32] drm/i915: Use HWS for seqno tracking everywhere
By using the same address for storing the HWS on every platform, we can remove the platform specific vfuncs and reduce the get-seqno routine to a single read of a cached memory location. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_debugfs.c | 10 ++-- drivers/gpu/drm/i915/i915_drv.h | 4 +- drivers/gpu/drm/i915/i915_gpu_error.c| 2 +- drivers/gpu/drm/i915/i915_irq.c | 4 +- drivers/gpu/drm/i915/i915_trace.h| 2 +- drivers/gpu/drm/i915/intel_breadcrumbs.c | 4 +- drivers/gpu/drm/i915/intel_lrc.c | 46 ++--- drivers/gpu/drm/i915/intel_ringbuffer.c | 86 drivers/gpu/drm/i915/intel_ringbuffer.h | 7 +-- 9 files changed, 43 insertions(+), 122 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index 8860dec36aae..a03ed9e38499 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -600,7 +600,7 @@ static int i915_gem_pageflip_info(struct seq_file *m, void *data) ring->name, i915_gem_request_get_seqno(work->flip_queued_req), dev_priv->next_seqno, - ring->get_seqno(ring), + intel_ring_get_seqno(ring), i915_gem_request_completed(work->flip_queued_req)); } else seq_printf(m, "Flip not associated with any ring\n"); @@ -732,10 +732,8 @@ static void i915_ring_seqno_info(struct seq_file *m, { struct rb_node *rb; - if (ring->get_seqno) { - seq_printf(m, "Current sequence (%s): %x\n", - ring->name, ring->get_seqno(ring)); - } + seq_printf(m, "Current sequence (%s): %x\n", + ring->name, intel_ring_get_seqno(ring)); spin_lock(&ring->breadcrumbs.lock); for (rb = rb_first(&ring->breadcrumbs.requests); @@ -1354,7 +1352,7 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused) for_each_ring(ring, dev_priv, i) { acthd[i] = intel_ring_get_active_head(ring); - seqno[i] = ring->get_seqno(ring); + seqno[i] = intel_ring_get_seqno(ring); } intel_runtime_pm_put(dev_priv); diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index d099e960f9b8..37f4ef59fb4a 100644 ---
a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2980,13 +2980,13 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2) static inline bool i915_gem_request_started(struct drm_i915_gem_request *req) { - return i915_seqno_passed(req->ring->get_seqno(req->ring), + return i915_seqno_passed(intel_ring_get_seqno(req->ring), req->previous_seqno); } static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req) { - return i915_seqno_passed(req->ring->get_seqno(req->ring), + return i915_seqno_passed(intel_ring_get_seqno(req->ring), req->seqno); } diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c index 01d0206ca4dd..3e137fc701cf 100644 --- a/drivers/gpu/drm/i915/i915_gpu_error.c +++ b/drivers/gpu/drm/i915/i915_gpu_error.c @@ -903,7 +903,7 @@ static void i915_record_ring_state(struct drm_device *dev, ering->waiting = intel_engine_has_waiter(ring); ering->instpm = I915_READ(RING_INSTPM(ring->mmio_base)); ering->acthd = intel_ring_get_active_head(ring); - ering->seqno = ring->get_seqno(ring); + ering->seqno = intel_ring_get_seqno(ring); ering->start = I915_READ_START(ring); ering->head = I915_READ_HEAD(ring); ering->tail = I915_READ_TAIL(ring); diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index da3c8aaa50a3..64502c0d2a81 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -2875,7 +2875,7 @@ static int semaphore_passed(struct intel_engine_cs *ring) if (signaller->hangcheck.deadlock >= I915_NUM_RINGS) return -1; - if (i915_seqno_passed(signaller->get_seqno(signaller), seqno)) + if (i915_seqno_passed(intel_ring_get_seqno(signaller), seqno)) return 1; /* cursory check for an unkickable deadlock */ @@ -2979,7 +2979,7 @@ static void i915_hangcheck_elapsed(struct work_struct *work) semaphore_clear_deadlocks(dev_priv); acthd = intel_ring_get_active_head(ring); - seqno = ring->get_seqno(ring); + seqno = intel_ring_get_seqno(ring); if 
(ring->hangcheck.seqno == seqno) { if (ring_idle(ring,
[Intel-gfx] Slaughter the thundering i915_wait_request, v3?
The biggest change is the revised bottom-half for handling user interrupts (now we use the waiter on the oldest request as the bottom-half). That, and the review feedback from Daniel on handling resets (and hangcheck) during the wait. Oh, and some interrupt/seqno timing review. Available from http://cgit.freedesktop.org/~ickle/linux-2.6/log/?h=breadcrumbs -Chris
[Intel-gfx] [PATCH 01/32] drm/i915: Break busywaiting for requests on pending signals
The busywait in __i915_spin_request() does not respect pending signals and so may consume the entire timeslice for the task instead of returning to userspace to handle the signal. In the worst case this could cause a delay in signal processing of 20ms, which would be a noticeable jitter in cursor tracking. If a higher resolution signal was being used, for example to provide fairness of server timeslices between clients, we could expect to detect some unfairness between clients (i.e. some windows not updating as fast as others). This issue was noticed when inspecting a report of poor interactivity resulting from excessively high __i915_spin_request usage. Fixes regression from commit 2def4ad99befa25775dd2f714fdd4d92faec6e34 [v4.2] Author: Chris Wilson Date: Tue Apr 7 16:20:41 2015 +0100 drm/i915: Optimistically spin for the request completion v2: Try to assess the impact of the bug Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin Cc: Jens Axboe Cc: "Rogozhkin, Dmitry V" Cc: Daniel Vetter Cc: Tvrtko Ursulin Cc: Eero Tamminen Cc: "Rantala, Valtteri" Cc: sta...@vger.kernel.org --- drivers/gpu/drm/i915/i915_gem.c | 13 - 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 8e2acdebc74a..7e1246410afc 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1146,7 +1146,7 @@ static bool missed_irq(struct drm_i915_private *dev_priv, return test_bit(ring->id, &dev_priv->gpu_error.missed_irq_rings); } -static int __i915_spin_request(struct drm_i915_gem_request *req) +static int __i915_spin_request(struct drm_i915_gem_request *req, int state) { unsigned long timeout; @@ -1158,6 +1158,9 @@ static int __i915_spin_request(struct drm_i915_gem_request *req) if (i915_gem_request_completed(req, true)) return 0; + if (signal_pending_state(state, current)) + break; + if (time_after_eq(jiffies, timeout)) break; @@ -1197,6 +1200,7 @@ int __i915_wait_request(struct
drm_i915_gem_request *req, struct drm_i915_private *dev_priv = dev->dev_private; const bool irq_test_in_progress = ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) & intel_ring_flag(ring); + int state = interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE; DEFINE_WAIT(wait); unsigned long timeout_expire; s64 before, now; @@ -1229,7 +1233,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req, before = ktime_get_raw_ns(); /* Optimistic spin for the next jiffie before touching IRQs */ - ret = __i915_spin_request(req); + ret = __i915_spin_request(req, state); if (ret == 0) goto out; @@ -1241,8 +1245,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req, for (;;) { struct timer_list timer; - prepare_to_wait(&ring->irq_queue, &wait, - interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE); + prepare_to_wait(&ring->irq_queue, &wait, state); /* We need to check whether any gpu reset happened in between * the caller grabbing the seqno and now ... */ @@ -1260,7 +1263,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req, break; } - if (interruptible && signal_pending(current)) { + if (signal_pending_state(state, current)) { ret = -ERESTARTSYS; break; } -- 2.6.3
[Intel-gfx] [PATCH 12/32] drm/i915: Remove the dedicated hangcheck workqueue
The queue only ever contains at most one item and has no special flags. It is just a very simple wrapper around the system-wq - a complication with no benefits. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_dma.c | 11 --- drivers/gpu/drm/i915/i915_drv.h | 1 - drivers/gpu/drm/i915/i915_irq.c | 6 +++--- 3 files changed, 3 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c index 84e2b202ecb5..1fdb52048cea 100644 --- a/drivers/gpu/drm/i915/i915_dma.c +++ b/drivers/gpu/drm/i915/i915_dma.c @@ -1013,14 +1013,6 @@ int i915_driver_load(struct drm_device *dev, unsigned long flags) goto out_freewq; } - dev_priv->gpu_error.hangcheck_wq = - alloc_ordered_workqueue("i915-hangcheck", 0); - if (dev_priv->gpu_error.hangcheck_wq == NULL) { - DRM_ERROR("Failed to create our hangcheck workqueue.\n"); - ret = -ENOMEM; - goto out_freedpwq; - } - intel_irq_init(dev_priv); intel_uncore_sanitize(dev); @@ -1100,8 +1092,6 @@ out_gem_unload: intel_teardown_gmbus(dev); intel_teardown_mchbar(dev); pm_qos_remove_request(&dev_priv->pm_qos); - destroy_workqueue(dev_priv->gpu_error.hangcheck_wq); -out_freedpwq: destroy_workqueue(dev_priv->hotplug.dp_wq); out_freewq: destroy_workqueue(dev_priv->wq); @@ -1201,7 +1191,6 @@ int i915_driver_unload(struct drm_device *dev) destroy_workqueue(dev_priv->hotplug.dp_wq); destroy_workqueue(dev_priv->wq); - destroy_workqueue(dev_priv->gpu_error.hangcheck_wq); pm_qos_remove_request(&dev_priv->pm_qos); i915_global_gtt_cleanup(dev); diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 987a35c5af72..9304ecfa05d4 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1333,7 +1333,6 @@ struct i915_gpu_error { /* Hang gpu twice in this window and your context gets banned */ #define DRM_I915_CTX_BAN_PERIOD DIV_ROUND_UP(8*DRM_I915_HANGCHECK_PERIOD, 1000) - struct workqueue_struct *hangcheck_wq; struct delayed_work hangcheck_work; /* For reset and
error_state handling. */ diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index 21089ac5dd58..afe04aeb858d 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -3073,7 +3073,7 @@ static void i915_hangcheck_elapsed(struct work_struct *work) void i915_queue_hangcheck(struct drm_i915_private *dev_priv) { - struct i915_gpu_error *e = &dev_priv->gpu_error; + unsigned long delay; if (!i915.enable_hangcheck) return; @@ -3083,8 +3083,8 @@ void i915_queue_hangcheck(struct drm_i915_private *dev_priv) * we will ignore a hung ring if a second ring is kept busy. */ - queue_delayed_work(e->hangcheck_wq, &e->hangcheck_work, - round_jiffies_up_relative(DRM_I915_HANGCHECK_JIFFIES)); + delay = round_jiffies_up_relative(DRM_I915_HANGCHECK_JIFFIES); + schedule_delayed_work(&dev_priv->gpu_error.hangcheck_work, delay); } static void ibx_irq_reset(struct drm_device *dev) -- 2.6.3
[Intel-gfx] [PATCH 19/32] drm/i915: Check the CPU cached value of seqno after waking the waiter
If we have multiple waiters, we may find that many complete on the same wake up. If we first inspect the seqno from the CPU cache, we may reduce the number of heavyweight coherent seqno reads we require. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_gem.c | 14 ++ 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index f3c1e268f614..15495b8112f9 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1288,6 +1288,12 @@ int __i915_wait_request(struct drm_i915_gem_request *req, wakeup: set_task_state(wait.task, state); + /* Before we do the heavier coherent read of the seqno, +* check the value (hopefully) in the CPU cacheline. +*/ + if (i915_gem_request_completed(req)) + break; + /* Ensure our read of the seqno is coherent so that we * do not "miss an interrupt" (i.e. if this is the last * request and the seqno write from the GPU is not visible @@ -1299,11 +1305,11 @@ wakeup: set_task_state(wait.task, state); * but it is easier and safer to do it every time the waiter * is woken. */ - if (req->ring->seqno_barrier) + if (req->ring->seqno_barrier) { req->ring->seqno_barrier(req->ring); - - if (i915_gem_request_completed(req)) - break; + if (i915_gem_request_completed(req)) + break; + } /* We need to check whether any gpu reset happened in between * the request being submitted and now. If a reset has occurred, -- 2.6.3
[Intel-gfx] [PATCH 07/32] drm/i915: Store the reset counter when constructing a request
As the request is only valid during the same global reset epoch, we can record the current reset_counter when constructing the request and reuse it when waiting upon that request in future. This removes a very hairy atomic check serialised by the struct_mutex at the time of waiting and allows us to transfer those waits to a central dispatcher for all waiters and all requests. Signed-off-by: Chris Wilson Cc: Daniel Vetter --- drivers/gpu/drm/i915/i915_drv.h | 2 +- drivers/gpu/drm/i915/i915_gem.c | 40 +++-- drivers/gpu/drm/i915/intel_display.c| 7 +- drivers/gpu/drm/i915/intel_lrc.c| 7 -- drivers/gpu/drm/i915/intel_ringbuffer.c | 6 - 5 files changed, 15 insertions(+), 47 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 1043ddd670a5..f30c305a6889 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2178,6 +2178,7 @@ struct drm_i915_gem_request { /** On Which ring this request was generated */ struct drm_i915_private *i915; struct intel_engine_cs *ring; + unsigned reset_counter; /** GEM sequence number associated with the previous request, * when the HWS breadcrumb is equal to this the GPU is processing @@ -3059,7 +3060,6 @@ void __i915_add_request(struct drm_i915_gem_request *req, #define i915_add_request_no_flush(req) \ __i915_add_request(req, NULL, false) int __i915_wait_request(struct drm_i915_gem_request *req, - unsigned reset_counter, bool interruptible, s64 *timeout, struct intel_rps_client *rps); diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 27e617b76418..b17cc0e42a4f 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1214,7 +1214,6 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state) /** * __i915_wait_request - wait until execution of request has finished * @req: duh!
- * @reset_counter: reset sequence associated with the given request * @interruptible: do an interruptible wait (normally yes) * @timeout: in - how long to wait (NULL forever); out - how much time remaining * @@ -1229,7 +1228,6 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state) * errno with remaining time filled in timeout argument. */ int __i915_wait_request(struct drm_i915_gem_request *req, - unsigned reset_counter, bool interruptible, s64 *timeout, struct intel_rps_client *rps) @@ -1288,7 +1286,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req, /* We need to check whether any gpu reset happened in between * the caller grabbing the seqno and now ... */ - if (reset_counter != i915_reset_counter(_priv->gpu_error)) { + if (req->reset_counter != i915_reset_counter(_priv->gpu_error)) { /* ... but upgrade the -EAGAIN to an -EIO if the gpu * is truly gone. */ ret = i915_gem_check_wedge(_priv->gpu_error, interruptible); @@ -1461,13 +1459,7 @@ i915_wait_request(struct drm_i915_gem_request *req) BUG_ON(!mutex_is_locked(>struct_mutex)); - ret = i915_gem_check_wedge(_priv->gpu_error, interruptible); - if (ret) - return ret; - - ret = __i915_wait_request(req, - i915_reset_counter(_priv->gpu_error), - interruptible, NULL, NULL); + ret = __i915_wait_request(req, interruptible, NULL, NULL); if (ret) return ret; @@ -1542,7 +1534,6 @@ i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj, struct drm_device *dev = obj->base.dev; struct drm_i915_private *dev_priv = dev->dev_private; struct drm_i915_gem_request *requests[I915_NUM_RINGS]; - unsigned reset_counter; int ret, i, n = 0; BUG_ON(!mutex_is_locked(>struct_mutex)); @@ -1551,12 +1542,6 @@ i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj, if (!obj->active) return 0; - ret = i915_gem_check_wedge(_priv->gpu_error, true); - if (ret) - return ret; - - reset_counter = i915_reset_counter(_priv->gpu_error); - if (readonly) { struct
drm_i915_gem_request *req; @@ -1578,9 +1563,9 @@ i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj, } mutex_unlock(>struct_mutex); + ret = 0; for (i = 0; ret == 0 && i < n; i++) - ret = __i915_wait_request(requests[i], reset_counter, true, -
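The reset-epoch idea in the patch reduces to a few lines of plain C. The sketch below uses simplified stand-in types (not the driver's actual structs): the counter is recorded when the request is constructed and any later mismatch means a reset happened in between, so the wait must bail out.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for dev_priv->gpu_error and the request. */
struct gpu_error { unsigned reset_counter; };
struct request { unsigned reset_counter; };

/* Record the reset epoch when the request is constructed... */
static void request_init(struct request *req, const struct gpu_error *error)
{
	req->reset_counter = error->reset_counter;
}

/* ...and later just compare it against the live counter: a mismatch
 * means a reset happened since the request was built. */
static bool request_epoch_valid(const struct request *req,
				const struct gpu_error *error)
{
	return req->reset_counter == error->reset_counter;
}
```

Because the epoch travels with the request, the wait no longer needs the hairy struct_mutex-serialised check at wait time.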
[Intel-gfx] [PATCH 05/32] drm/i915: Simplify checking of GPU reset_counter in display pageflips
If, when we store the reset_counter for the operation, we ensure that the GPU is not wedged or in the middle of a reset, we can then assert that if any reset occurs the reset_counter must change. Later we can just compare the operation's reset epoch against the current counter to see if we need to abort the operation (to handle the hang). Signed-off-by: Chris Wilson Cc: Daniel Vetter --- drivers/gpu/drm/i915/intel_display.c | 16 +--- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index cc47c0206294..8b6028cd619f 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -3283,14 +3283,12 @@ void intel_finish_reset(struct drm_device *dev) static bool intel_crtc_has_pending_flip(struct drm_crtc *crtc) { struct drm_device *dev = crtc->dev; - struct drm_i915_private *dev_priv = dev->dev_private; struct intel_crtc *intel_crtc = to_intel_crtc(crtc); unsigned reset_counter; bool pending; - reset_counter = i915_reset_counter(_priv->gpu_error); - if (intel_crtc->reset_counter != reset_counter || - __i915_reset_in_progress_or_wedged(reset_counter)) + reset_counter = i915_reset_counter(_i915(dev)->gpu_error); + if (intel_crtc->reset_counter != reset_counter) return false; spin_lock_irq(>event_lock); @@ -10947,8 +10945,7 @@ static bool page_flip_finished(struct intel_crtc *crtc) unsigned reset_counter; reset_counter = i915_reset_counter(_priv->gpu_error); - if (crtc->reset_counter != reset_counter || - __i915_reset_in_progress_or_wedged(reset_counter)) + if (crtc->reset_counter != reset_counter) return true; /* @@ -11604,8 +11601,13 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc, if (ret) goto cleanup; - atomic_inc(_crtc->unpin_work_count); intel_crtc->reset_counter = i915_reset_counter(_priv->gpu_error); + if (__i915_reset_in_progress_or_wedged(intel_crtc->reset_counter)) { + ret = -EIO; + goto cleanup; + } + +
atomic_inc(_crtc->unpin_work_count); if (INTEL_INFO(dev)->gen >= 5 || IS_G4X(dev)) work->flip_count = I915_READ(PIPE_FLIPCOUNT_G4X(pipe)) + 1; -- 2.6.3
[Intel-gfx] [PATCH 14/32] drm/i915: Remove forcewake dance from seqno/irq barrier on legacy gen6+
In order to ensure seqno/irq coherency, we currently read a ring register. We are not sure quite how it works, only that it does. Experiments show that e.g. doing a clflush(seqno) instead is not sufficient, but we can remove the forcewake dance from the mmio access. v2: Baytrail wants a clflush too. Signed-off-by: Chris Wilson Cc: Daniel Vetter --- drivers/gpu/drm/i915/intel_ringbuffer.c | 15 +-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index 6cecc15ec01b..69dd69e46fa9 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -1490,10 +1490,21 @@ gen6_ring_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency) { /* Workaround to force correct ordering between irq and seqno writes on * ivb (and maybe also on snb) by reading from a CS register (like -* ACTHD) before reading the status page. */ +* ACTHD) before reading the status page. +* +* Note that this effectively stalls the read by the time +* it takes to do a memory transaction, which more or less ensures +* that the write from the GPU has sufficient time to invalidate +* the CPU cacheline. Alternatively we could delay the interrupt from +* the CS ring to give the write time to land, but that would incur +* a delay after every batch i.e. much more frequent than a delay +* when waiting for the interrupt (with the same net latency). +*/ if (!lazy_coherency) { struct drm_i915_private *dev_priv = ring->dev->dev_private; - POSTING_READ(RING_ACTHD(ring->mmio_base)); + POSTING_READ_FW(RING_ACTHD(ring->mmio_base)); + + intel_flush_status_page(ring, I915_GEM_HWS_INDEX); } return intel_read_status_page(ring, I915_GEM_HWS_INDEX); -- 2.6.3
Re: [Intel-gfx] [RFC 08/12] drm/i915: Interrupt driven fences
Hi, Some random comments, mostly from the point of view of solving the thundering herd problem. On 23/11/15 11:34, john.c.harri...@intel.com wrote: From: John Harrison The intended usage model for struct fence is that the signalled status should be set on demand rather than polled. That is, there should not be a need for a 'signaled' function to be called every time the status is queried. Instead, 'something' should be done to enable a signal callback from the hardware which will update the state directly. In the case of requests, this is the seqno update interrupt. The idea is that this callback will only be enabled on demand when something actually tries to wait on the fence. This change removes the polling test and replaces it with the callback scheme. Each fence is added to a 'please poke me' list at the start of i915_add_request(). The interrupt handler then scans through the 'poke me' list when a new seqno pops out and signals any matching fence/request. The fence is then removed from the list so the entire request stack does not need to be scanned every time. Note that the fence is added to the list before the commands to generate the seqno interrupt are added to the ring. Thus the sequence is guaranteed to be race free if the interrupt is already enabled. Note that the interrupt is only enabled on demand (i.e. when __wait_request() is called). Thus there is still a potential race when enabling the interrupt as the request may already have completed. However, this is simply solved by calling the interrupt processing code immediately after enabling the interrupt and thereby checking for already completed requests. Lastly, the ring clean up code has the possibility to cancel outstanding requests (e.g. because TDR has reset the ring). These requests will never get signalled and so must be removed from the signal list manually.
This is done by setting a 'cancelled' flag and then calling the regular notify/retire code path rather than attempting to duplicate the list manipulation and clean up code in multiple places. This also avoids any race condition where the cancellation request might occur after/during the completion interrupt actually arriving. v2: Updated to take advantage of the request unreference no longer requiring the mutex lock. v3: Move the signal list processing around to prevent unsubmitted requests being added to the list. This was occurring on Android because the native sync implementation calls the fence->enable_signalling API immediately on fence creation. Updated after review comments by Tvrtko Ursulin. Renamed list nodes to 'link' instead of 'list'. Added support for returning an error code on a cancelled fence. Update list processing to be more efficient/safer with respect to spinlocks. For: VIZ-5190 Signed-off-by: John Harrison Cc: Tvrtko Ursulin --- drivers/gpu/drm/i915/i915_drv.h | 10 ++ drivers/gpu/drm/i915/i915_gem.c | 187 ++-- drivers/gpu/drm/i915/i915_irq.c | 2 + drivers/gpu/drm/i915/intel_lrc.c| 2 + drivers/gpu/drm/i915/intel_ringbuffer.c | 2 + drivers/gpu/drm/i915/intel_ringbuffer.h | 2 + 6 files changed, 196 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index fbf591f..d013c6d 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2187,7 +2187,12 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old, struct drm_i915_gem_request { /** Underlying object for implementing the signal/wait stuff.
*/ struct fence fence; + struct list_head signal_link; + struct list_head unsignal_link; struct list_head delayed_free_link; + bool cancelled; + bool irq_enabled; + bool signal_requested; /** On Which ring this request was generated */ struct drm_i915_private *i915; @@ -2265,6 +2270,11 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring, struct drm_i915_gem_request **req_out); void i915_gem_request_cancel(struct drm_i915_gem_request *req); +void i915_gem_request_submit(struct drm_i915_gem_request *req); +void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req, + bool fence_locked); +void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked); + int i915_create_fence_timeline(struct drm_device *dev, struct intel_context *ctx, struct intel_engine_cs *ring); diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 171ae5f..2a0b346 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1165,6 +1165,8 @@ static int __i915_spin_request(struct drm_i915_gem_request *req) timeout = jiffies + 1; while
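The 'please poke me' list behaviour described above can be modelled in a few lines. This is a toy sketch (illustrative names, locking omitted) of registering requests and having the interrupt notifier signal and unlink everything the new seqno has completed, so the list never has to be rescanned for already-signalled entries:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of the signal list; not the driver's actual structures. */
struct toy_request {
	uint32_t seqno;
	bool signaled;
	struct toy_request *next;
};

struct signal_list { struct toy_request *head; };

/* Requests register themselves before their seqno can appear. */
static void signal_list_add(struct signal_list *list, struct toy_request *req)
{
	req->next = list->head;
	list->head = req;
}

/* Called from the (modelled) interrupt handler with the latest seqno:
 * signal and unlink every request the new seqno has completed. */
static void notify(struct signal_list *list, uint32_t completed)
{
	struct toy_request **p = &list->head;

	while (*p) {
		struct toy_request *req = *p;

		if ((int32_t)(completed - req->seqno) >= 0) {
			req->signaled = true;
			*p = req->next;	/* unlink so we never rescan it */
		} else {
			p = &req->next;
		}
	}
}
```

The real patch additionally handles the enable-interrupt race by running the notifier once immediately after enabling the interrupt, exactly as the commit message explains.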
[Intel-gfx] [PATCH] tests/kms_color:Color IGT
From: Dhanya This patch will verify color correction capability of a display driver. Gamma/CSC/De-gamma for SKL/BXT supported. Signed-off-by: Dhanya --- tests/.gitignore | 1 + tests/Makefile.sources | 1 + tests/kms_color.c | 684 + 3 files changed, 686 insertions(+) create mode 100644 tests/kms_color.c diff --git a/tests/.gitignore b/tests/.gitignore index 80af9a7..58c79e2 100644 --- a/tests/.gitignore +++ b/tests/.gitignore @@ -127,6 +127,7 @@ gen7_forcewake_mt kms_3d kms_addfb_basic kms_atomic +kms_color kms_crtc_background_color kms_cursor_crc kms_draw_crc diff --git a/tests/Makefile.sources b/tests/Makefile.sources index 8fb2de8..906c14f 100644 --- a/tests/Makefile.sources +++ b/tests/Makefile.sources @@ -64,6 +64,7 @@ TESTS_progs_M = \ gem_write_read_ring_switch \ kms_addfb_basic \ kms_atomic \ + kms_color \ kms_cursor_crc \ kms_draw_crc \ kms_fbc_crc \ diff --git a/tests/kms_color.c b/tests/kms_color.c new file mode 100644 index 000..b5d199b --- /dev/null +++ b/tests/kms_color.c @@ -0,0 +1,684 @@ +/* + * Copyright © 2015 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + * + */ + +#include +#include "drmtest.h" +#include "drm.h" +#include "igt_debugfs.h" +#include "igt_kms.h" +#include "igt_core.h" +#include "intel_io.h" +#include "intel_chipset.h" +#include "igt_aux.h" +#include +#include +#include +#include + + +IGT_TEST_DESCRIPTION("Test Color Features at Pipe level"); +/* +This tool tests the following color features: + 1.csc-red + 2.csc-green + 3.csc-blue + 4.gamma-legacy + 5.gamma-8bit + 6.gamma-10bit + 7.gamma-12bit + 8.gamma-split + +Verification is done by CRC checks. + +*/ + +#define CSC_MAX_VALS 9 +#define GEN9_SPLITGAMMA_MAX_VALS 512 +#define GEN9_8BIT_GAMMA_MAX_VALS 256 +#define GEN9_10BIT_GAMMA_MAX_VALS 1024 +#define GEN9_12BIT_GAMMA_MAX_VALS 513 +#define GEN9_MAX_GAMMA ((1 << 24) - 1) +#define GEN9_MIN_GAMMA 0 +#define RED_CSC 0 +#define GREEN_CSC 1 +#define BLUE_CSC 2 +#define RED_FB 0 +#define GREEN_FB 1 +#define BLUE_FB 2 + +struct _drm_r32g32b32 { + __u32 r32; + __u32 g32; + __u32 b32; + __u32 reserved; +}; + +struct _drm_palette { + struct _drm_r32g32b32 lut[0]; +}; + +struct _drm_ctm { + __s64 ctm_coeff[9]; +}; + +float ctm_red[9] = {1, 1, 1, 0, 0, 0, 0, 0, 0}; +float ctm_green[9] = {0, 0, 0, 1, 1, 1, 0, 0, 0}; +float ctm_blue[9] = {0, 0, 0, 0, 0, 0, 1, 1, 1}; +float ctm_unity[9] = {1, 0, 0, 0, 1, 0, 0, 0, 1}; + +struct framebuffer_color { + int red; + int green; + int blue; +}; +struct framebuffer_color fb_color = {0,0,0}; + +igt_crc_t crc_reference, crc_reference_black, crc_reference_white; +igt_crc_t crc_black, crc_white, crc_current; + +struct data_t { + int fb_initial; + int drm_fd; + int gen; + int w, h; + igt_display_t display; + struct igt_fb fb_prep; + struct igt_fb fb, fb1; + igt_pipe_crc_t *pipe_crc; + enum pipe pipe; + +}; + +
+static int create_blob(int fd, uint64_t *data, int length) +{ + struct drm_mode_create_blob blob; + int ret = -1; + + blob.data = (uint64_t)data; + blob.length = length; + blob.blob_id = -1; + ret = ioctl(fd, DRM_IOCTL_MODE_CREATEPROPBLOB, ); + if (!ret) + return blob.blob_id; + igt_fail(IGT_EXIT_FAILURE); + return ret; +} + +static void prepare_crtc(struct data_t *data, igt_output_t *output, +enum pipe pipe1, igt_plane_t *plane, drmModeModeInfo *mode, +enum igt_commit_style s) +{ + igt_display_t
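The test stores its CTM coefficients as floats, while the `_drm_ctm` blob carries `__s64` values. A plausible conversion helper is sketched below; the exact fixed-point format (32 fractional bits, and whether negative coefficients are two's complement or sign-magnitude) is an assumption to check against the kernel uapi headers for the target branch, not something this patch excerpt confirms.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical conversion from a float CTM coefficient to a 64-bit
 * fixed-point value with 32 fractional bits. Negative values are emitted
 * as two's complement here; verify against the kernel's expected format. */
static int64_t ctm_coeff_to_fixed(double coeff)
{
	int negative = coeff < 0;
	double mag = negative ? -coeff : coeff;
	int64_t fixed = (int64_t)(mag * 4294967296.0 + 0.5);	/* * 2^32 */

	return negative ? -fixed : fixed;
}
```

With this encoding the `ctm_unity` matrix becomes 1<<32 on the diagonal and zero elsewhere.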
Re: [Intel-gfx] [PATCH V4 2/2] drm/i915: start adding dp mst audio
On Fri, 11 Dec 2015 07:07:53 +0100, Libin Yang wrote: > > >>> diff --git a/drivers/gpu/drm/i915/intel_audio.c > >>> b/drivers/gpu/drm/i915/intel_audio.c > >>> index 9aa83e7..5ad2e66 100644 > >>> --- a/drivers/gpu/drm/i915/intel_audio.c > >>> +++ b/drivers/gpu/drm/i915/intel_audio.c > >>> @@ -262,7 +262,8 @@ static void hsw_audio_codec_disable(struct > >>> intel_encoder *encoder) > >>> tmp |= AUD_CONFIG_N_PROG_ENABLE; > >>> tmp &= ~AUD_CONFIG_UPPER_N_MASK; > >>> tmp &= ~AUD_CONFIG_LOWER_N_MASK; > >>> - if (intel_pipe_has_type(intel_crtc, INTEL_OUTPUT_DISPLAYPORT)) > >>> + if (intel_pipe_has_type(intel_crtc, INTEL_OUTPUT_DISPLAYPORT) || > >>> + intel_pipe_has_type(intel_crtc, INTEL_OUTPUT_DP_MST)) > >>> tmp |= AUD_CONFIG_N_VALUE_INDEX; The same check is missing in hsw_audio_codec_enable()? > >>> I915_WRITE(HSW_AUD_CFG(pipe), tmp); > >>> > >>> @@ -474,7 +475,8 @@ static void ilk_audio_codec_enable(struct > >>> drm_connector *connector, > >>> tmp &= ~AUD_CONFIG_N_VALUE_INDEX; > >>> tmp &= ~AUD_CONFIG_N_PROG_ENABLE; > >>> tmp &= ~AUD_CONFIG_PIXEL_CLOCK_HDMI_MASK; > >>> - if (intel_pipe_has_type(intel_crtc, INTEL_OUTPUT_DISPLAYPORT)) > >>> + if (intel_pipe_has_type(intel_crtc, INTEL_OUTPUT_DISPLAYPORT) || > >>> + intel_pipe_has_type(intel_crtc, INTEL_OUTPUT_DP_MST)) > >>> tmp |= AUD_CONFIG_N_VALUE_INDEX; ... and missing for ilk_audio_codec_disable()? > >>> else > >>> tmp |= audio_config_hdmi_pixel_clock(adjusted_mode); > >>> @@ -512,7 +514,8 @@ void intel_audio_codec_enable(struct intel_encoder > >>> *intel_encoder) > >>> > >>> /* ELD Conn_Type */ > >>> connector->eld[5] &= ~(3 << 2); > >>> - if (intel_pipe_has_type(crtc, INTEL_OUTPUT_DISPLAYPORT)) > >>> + if (intel_pipe_has_type(crtc, INTEL_OUTPUT_DISPLAYPORT) || > >>> + intel_pipe_has_type(crtc, INTEL_OUTPUT_DP_MST)) IMO, it's better to have a macro to cover this two-line check instead of open-coding at each place. We'll have 5 places in the end. 
thanks, Takashi
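Takashi's suggestion, folding the repeated two-line check into a single predicate, might look like the sketch below. The enum values are stand-ins for the driver's INTEL_OUTPUT_* constants, and the real helper would operate on the crtc via intel_pipe_has_type() rather than a bare type.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the driver's INTEL_OUTPUT_* constants. */
enum intel_output_type {
	INTEL_OUTPUT_HDMI,
	INTEL_OUTPUT_DISPLAYPORT,
	INTEL_OUTPUT_EDP,
	INTEL_OUTPUT_DP_MST,
};

/* One predicate instead of open-coding the two-line check five times. */
static bool intel_output_is_dp_or_mst(enum intel_output_type type)
{
	return type == INTEL_OUTPUT_DISPLAYPORT ||
	       type == INTEL_OUTPUT_DP_MST;
}
```

Any future DP-like output type then needs a change in exactly one place.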
Re: [Intel-gfx] [PATCH] drm/i915: Wait for PP cycle delay only if panel is in power off sequence
On 12/11/2015 04:55 PM, Thulasimani, Sivakumar wrote: On 12/10/2015 8:32 PM, Ville Syrjälä wrote: On Thu, Dec 10, 2015 at 08:09:01PM +0530, Thulasimani, Sivakumar wrote: On 12/10/2015 7:08 PM, Ville Syrjälä wrote: On Thu, Dec 10, 2015 at 03:15:37PM +0200, Ville Syrjälä wrote: On Thu, Dec 10, 2015 at 03:01:02PM +0530, Kumar, Shobhit wrote: On 12/09/2015 09:35 PM, Ville Syrjälä wrote: On Wed, Dec 09, 2015 at 08:59:26PM +0530, Shobhit Kumar wrote: On Wed, Dec 9, 2015 at 8:34 PM, Chris Wilsonwrote: On Wed, Dec 09, 2015 at 08:07:10PM +0530, Shobhit Kumar wrote: On Wed, Dec 9, 2015 at 7:27 PM, Ville Syrjälä wrote: On Wed, Dec 09, 2015 at 06:51:48PM +0530, Shobhit Kumar wrote: During resume, while turning the EDP panel power on, we need not wait blindly for panel_power_cycle_delay. Check if panel power down sequence in progress and then only wait. This improves our resume time significantly. Signed-off-by: Shobhit Kumar --- drivers/gpu/drm/i915/intel_dp.c | 17 - 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c index f335c92..10ec669 100644 --- a/drivers/gpu/drm/i915/intel_dp.c +++ b/drivers/gpu/drm/i915/intel_dp.c @@ -617,6 +617,20 @@ static bool edp_have_panel_power(struct intel_dp *intel_dp) return (I915_READ(_pp_stat_reg(intel_dp)) & PP_ON) != 0; } +static bool edp_panel_off_seq(struct intel_dp *intel_dp) +{ + struct drm_device *dev = intel_dp_to_dev(intel_dp); + struct drm_i915_private *dev_priv = dev->dev_private; + + lockdep_assert_held(_priv->pps_mutex); + + if (IS_VALLEYVIEW(dev) && + intel_dp->pps_pipe == INVALID_PIPE) + return false; + + return (I915_READ(_pp_stat_reg(intel_dp)) & PP_SEQUENCE_POWER_DOWN) != 0; +} This doens't make sense to me. The power down cycle may have completed just before, and so this would claim we don't have to wait for the power_cycle_delay. Not sure I understand your concern correctly. 
You are right, power down cycle may have completed just before and if it has then we don't need to wait. But in case the power down cycle is in progress as per internal state, then we need to wait for it to complete. This will happen for example in non-suspend disable path and will be handled correctly. In case of actual suspend/resume, this would have successfully completed and will skip the wait as it is not needed before enabling panel power. + static bool edp_have_panel_vdd(struct intel_dp *intel_dp) { struct drm_device *dev = intel_dp_to_dev(intel_dp); @@ -2025,7 +2039,8 @@ static void edp_panel_on(struct intel_dp *intel_dp) port_name(dp_to_dig_port(intel_dp)->port))) return; - wait_panel_power_cycle(intel_dp); + if (edp_panel_off_seq(intel_dp)) + wait_panel_power_cycle(intel_dp); Looking in from the side, I have no idea what this is meant to do. At the very least you need your explanatory paragraph here which would include what exactly you are waiting for at the start of edp_panel_on (and please try and find a better name for edp_panel_off_seq()). I will add a comment. Basically I am not additionally waiting, but converting the wait which was already there to a conditional wait. edp_panel_off_seq() checks if panel power down sequence is in progress. In that case we need to wait for the panel power cycle delay. If it is not in that sequence, there is no need to wait. I will make an attempt again on the naming in the next patch update. As far as I remember you need to wait for power_cycle_delay between power down cycle and power up cycle. You're trying to throw that wait away entirely, unless the function happens to get called while the power down
Actually looking at the code the power_cycle_delay gets counted from the start of the last power down cycle, so supposedly it's always at least as long as the power down cycle, and typically it's quite a bit longer than that. But that doesn't change the fact that you can't just skip it because the power down cycle delay happened to end already. So what we do now is:
1. initiate power down cycle
2. last_power_cycle=jiffies
3. wait for power down (I suppose this actually waits until the power down delay has passed since that's programmed into the PPS).
4. wait for power_cycle_delay from last_power_cycle
5. initiate power up cycle
I think with your patch step 4 would always be skipped since the power down cycle has already ended, and then we fail to honor the power cycle delay. Yes, I agree. I missed checking for PP_CYCLE_DELAY_ACTIVE. Adding that check
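The review's conclusion — wait whenever either the power-down sequence or the cycle delay is still active, and only then skip — can be expressed as a small predicate. The bit positions below are illustrative, not the real PP_STATUS register layout:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative status bits; the actual PP_STATUS layout differs. */
#define PP_SEQUENCE_POWER_DOWN	(1u << 1)
#define PP_CYCLE_DELAY_ACTIVE	(1u << 0)

/* Skip the power-cycle wait only when neither the power-down sequence
 * nor the post-power-down cycle delay is still running. */
static bool must_wait_power_cycle(uint32_t pp_status)
{
	return (pp_status & (PP_SEQUENCE_POWER_DOWN |
			     PP_CYCLE_DELAY_ACTIVE)) != 0;
}
```

Checking only PP_SEQUENCE_POWER_DOWN, as the original patch did, is exactly the bug discussed: once the power-down sequence finishes, the cycle delay can still be counting and must not be skipped.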
Re: [Intel-gfx] [PATCH 2/6] drm/i915: Support for creating Stolen memory backed objects
On Wed, 2015-12-09 at 14:06 +, Tvrtko Ursulin wrote: > Hi, > > On 09/12/15 12:46, ankitprasad.r.sha...@intel.com wrote: > > From: Ankitprasad Sharma> > > > Extend the drm_i915_gem_create structure to add support for > > creating Stolen memory backed objects. Added a new flag through > > which user can specify the preference to allocate the object from > > stolen memory, which if set, an attempt will be made to allocate > > the object from stolen memory subject to the availability of > > free space in the stolen region. > > > > v2: Rebased to the latest drm-intel-nightly (Ankit) > > > > v3: Changed versioning of GEM_CREATE param, added new comments (Tvrtko) > > > > v4: Changed size from 32b to 64b to prevent userspace overflow (Tvrtko) > > Corrected function arguments ordering (Chris) > > > > v5: Corrected function name (Chris) > > > > Testcase: igt/gem_stolen > > > > Signed-off-by: Ankitprasad Sharma > > Reviewed-by: Tvrtko Ursulin > > --- > > drivers/gpu/drm/i915/i915_dma.c| 3 +++ > > drivers/gpu/drm/i915/i915_drv.h| 2 +- > > drivers/gpu/drm/i915/i915_gem.c| 30 +++--- > > drivers/gpu/drm/i915/i915_gem_stolen.c | 4 ++-- > > include/uapi/drm/i915_drm.h| 16 > > 5 files changed, 49 insertions(+), 6 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/i915_dma.c > > b/drivers/gpu/drm/i915/i915_dma.c > > index ffcb9c6..6927c7e 100644 > > --- a/drivers/gpu/drm/i915/i915_dma.c > > +++ b/drivers/gpu/drm/i915/i915_dma.c > > @@ -170,6 +170,9 @@ static int i915_getparam(struct drm_device *dev, void > > *data, > > case I915_PARAM_HAS_RESOURCE_STREAMER: > > value = HAS_RESOURCE_STREAMER(dev); > > break; > > + case I915_PARAM_CREATE_VERSION: > > + value = 2; > > + break; > > default: > > DRM_DEBUG("Unknown parameter %d\n", param->param); > > return -EINVAL; > > diff --git a/drivers/gpu/drm/i915/i915_drv.h > > b/drivers/gpu/drm/i915/i915_drv.h > > index 8e554d3..d45274e 100644 > > --- a/drivers/gpu/drm/i915/i915_drv.h > > +++ b/drivers/gpu/drm/i915/i915_drv.h > > @@ -3213,7 
+3213,7 @@ void i915_gem_stolen_remove_node(struct > > drm_i915_private *dev_priv, > > int i915_gem_init_stolen(struct drm_device *dev); > > void i915_gem_cleanup_stolen(struct drm_device *dev); > > struct drm_i915_gem_object * > > -i915_gem_object_create_stolen(struct drm_device *dev, u32 size); > > +i915_gem_object_create_stolen(struct drm_device *dev, u64 size); > > struct drm_i915_gem_object * > > i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev, > >u32 stolen_offset, > > diff --git a/drivers/gpu/drm/i915/i915_gem.c > > b/drivers/gpu/drm/i915/i915_gem.c > > index d57e850..296e63f 100644 > > --- a/drivers/gpu/drm/i915/i915_gem.c > > +++ b/drivers/gpu/drm/i915/i915_gem.c > > @@ -375,6 +375,7 @@ static int > > i915_gem_create(struct drm_file *file, > > struct drm_device *dev, > > uint64_t size, > > + uint32_t flags, > > uint32_t *handle_p) > > { > > struct drm_i915_gem_object *obj; > > @@ -385,8 +386,31 @@ i915_gem_create(struct drm_file *file, > > if (size == 0) > > return -EINVAL; > > > > + if (flags & __I915_CREATE_UNKNOWN_FLAGS) > > + return -EINVAL; > > + > > /* Allocate the new object */ > > - obj = i915_gem_alloc_object(dev, size); > > + if (flags & I915_CREATE_PLACEMENT_STOLEN) { > > + mutex_lock(>struct_mutex); > > + obj = i915_gem_object_create_stolen(dev, size); > > + if (!obj) { > > + mutex_unlock(>struct_mutex); > > + return -ENOMEM; > > + } > > + > > + /* Always clear fresh buffers before handing to userspace */ > > + ret = i915_gem_object_clear(obj); > > + if (ret) { > > + drm_gem_object_unreference(>base); > > + mutex_unlock(>struct_mutex); > > + return ret; > > + } > > + > > + mutex_unlock(>struct_mutex); > > + } else { > > + obj = i915_gem_alloc_object(dev, size); > > + } > > + > > if (obj == NULL) > > return -ENOMEM; > > > > @@ -409,7 +433,7 @@ i915_gem_dumb_create(struct drm_file *file, > > args->pitch = ALIGN(args->width * DIV_ROUND_UP(args->bpp, 8), 64); > > args->size = args->pitch * args->height; > > return 
i915_gem_create(file, dev, > > - args->size, >handle); > > + args->size, 0, >handle); > > } > > > > /** > > @@ -422,7 +446,7 @@ i915_gem_create_ioctl(struct drm_device *dev, void > > *data, > > struct drm_i915_gem_create *args = data; > > > > return i915_gem_create(file, dev, > > -
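The flag validation at the top of the patched i915_gem_create() follows a standard uapi pattern: reject anything outside the known flag mask so the bits stay available for future extensions. A standalone model is below; the mask definition here is an assumption, the authoritative one being in the patch's i915_drm.h hunk, which is not quoted in full above.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define I915_CREATE_PLACEMENT_STOLEN	(1u << 0)
/* Assumed definition: everything outside the known flags is unknown. */
#define __I915_CREATE_UNKNOWN_FLAGS	(~I915_CREATE_PLACEMENT_STOLEN)

/* Mirror of the ioctl's early check: unknown flags fail with -EINVAL. */
static int validate_create_flags(uint32_t flags)
{
	if (flags & __I915_CREATE_UNKNOWN_FLAGS)
		return -EINVAL;
	return 0;
}
```

Old userspace passing zero keeps working, and new userspace can probe I915_PARAM_CREATE_VERSION before setting the stolen-placement bit.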
Re: [Intel-gfx] [PATCH] Always mark GEM objects as dirty when written by the CPU
On 10/12/15 08:58, Daniel Vetter wrote: On Mon, Dec 07, 2015 at 12:51:49PM +, Dave Gordon wrote: I think I missed i915_gem_phys_pwrite(). i915_gem_gtt_pwrite_fast() marks the object dirty for most cases (via set_to_gtt_domain()), but isn't called for all cases (or can return before the set_domain). Then we try i915_gem_shmem_pwrite() for non-phys objects (no check for stolen!) and that already marks the object dirty [aside: we might be able to change that to page-by-page?], but i915_gem_phys_pwrite() doesn't mark the object dirty, so we might lose updates there? Or maybe we should move the marking up into i915_gem_pwrite_ioctl() instead. The target object is surely going to be dirtied, whatever type it is. phys objects are special, and when binding we allocate new (contiguous) storage. In put_pages_phys that gets copied back and pages marked as dirty. While a phys object is pinned it's a kernel bug to look at the shmem pages and a userspace bug to touch the cpu mmap (since that data will simply be overwritten whenever the kernel feels like). phys objects are only used for cursors on old crap though, so ok if we don't streamline this fairly quirky old ABI. -Daniel So is pread broken already for 'phys' ? In the pwrite code, we have i915_gem_phys_pwrite() which looks OK, but there isn't a corresponding i915_gem_phys_pread(), instead it will call i915_gem_shmem_pread(), and I'm not sure that will work! The question being, does the kernel have page table slots corresponding to the DMA area allocated, otherwise the for_each_sg_page()/sg_page_iter_page() in i915_gem_shmem_pread() isn't going to give meaningful results. And I found this comment in drm_pci_alloc() (called from i915_gem_object_attach_phys()): /* XXX - Is virt_to_page() legal for consistent mem?
*/ /* Reserve */ for (addr = (unsigned long)dmah->vaddr, sz = size; sz > 0; addr += PAGE_SIZE, sz -= PAGE_SIZE) { SetPageReserved(virt_to_page((void *)addr)); } (and does it depend on which memory configuration is selected?). See also current thread on "Support for pread/pwrite from/to non shmem backed objects" ... .Dave.
Re: [Intel-gfx] [PATCH i-g-t] gem_flink_race/prime_self_import: Improve test reliability
> > >-Original Message- >From: Daniel Vetter [mailto:daniel.vet...@ffwll.ch] On Behalf Of Daniel Vetter >Sent: Thursday, December 10, 2015 12:53 PM >To: Morton, Derek J >Cc: Daniel Vetter; intel-gfx@lists.freedesktop.org; Wood, Thomas >Subject: Re: [Intel-gfx] [PATCH i-g-t] gem_flink_race/prime_self_import: >Improve test reliability > >On Thu, Dec 10, 2015 at 11:51:29AM +, Morton, Derek J wrote: >> > >> > >> >-Original Message- >> >From: Daniel Vetter [mailto:daniel.vet...@ffwll.ch] On Behalf Of >> >Daniel Vetter >> >Sent: Thursday, December 10, 2015 10:13 AM >> >To: Morton, Derek J >> >Cc: intel-gfx@lists.freedesktop.org; Wood, Thomas >> >Subject: Re: [Intel-gfx] [PATCH i-g-t] >> >gem_flink_race/prime_self_import: Improve test reliability >> > >> >On Tue, Dec 08, 2015 at 12:44:44PM +, Derek Morton wrote: >> >> gem_flink_race and prime_self_import have subtests which read the >> >> number of open gem objects from debugfs to determine if objects >> >> have leaked during the test. However the test can fail sporadically >> >> if the number of gem objects changes due to other process activity. >> >> This patch introduces a change to check the number of gem objects >> >> several times to filter out any fluctuations. >> > >> >Why exactly does this happen? IGT tests should be run on bare metal, >> >with everything else killed/subdued/shutup. If there's still things >> >going on that create objects, we need to stop them from doing that. >> > >> >If this only applies to Android, or some special Android deamon them >> >imo check for that at runtime and igt_skip("your setup is invalid, >> >deamon %s running\n"); is the correct fix. After all just because you >> >sampled for a bit doesn't mean that it wont still change right when >> >you start running the test for real, so this is still fragile. >> >> Before running tests on android we do stop everything possible. I >> suspect the culprit is coreu getting automatically restarted after it >> is stopped. 
I had additional debug while developing this patch and >> what I saw was the system being mostly quiescent but with some very >> low level background activity. 1 extra object being created and then >> deleted occasionally. Depending on whether it occurred at the start or >> end of the test it was resulting in a reported leak of either 1 or -1 >> objects. >> The patch fixes that issue by taking several samples and requiring >> them to be the same, therefore filtering out the low level background noise. >> It would not help if something in the background allocated an object >> and kept it allocated, but I have not seen that happen. I only saw >> once the object count increasing for 2 consecutive reads hence the >> count to 4 to give a margin. The test was failing about 10% of the time. With this >> patch I got 100% pass across 300 runs of each of the tests. > >Hm, piglit checks that there's no other drm clients running. Have you tried >re-running that check to zero in on the culprit? We don't use piglit to run IGT tests on Android. I have had a look at what piglit does and added the same check to our scripts. (It reads a list of clients from /sys/kernel/debug/dri/0/clients) For CHV it shows a process called 'y', though that seems to be some issue on CHV that all driver clients are called 'y'. I checked on BXT which properly shows the process names and it looks like it is the binder process (which is handling some inter process communication). I don't think this is something we can stop. > >> If you are concerned about the behaviour when running the test with a >> load of background activity I could add code to limit to the reset of >> the count and fail the test in that instance. That would give a >> benefit of distinguishing a test fail due to excessive background >> activity from a detected leak. >> >I'm also concerned for the overhead this causes everyone else. If this really >is some Android trouble then I think it'd be good to only compile this on >Android.
But would still be much better if you can get to a reliably clean >test environment. I will make the loop part android specific. //Derek > >> I would not want to just have the test skip as that introduces a hole >> in our test coverage. >> >> >Also would be good to extract get_stable_obj_count to a proper igt >> >library function, if it indeed needs to be this tricky. And then add >> >the explanation for why we need this in the gtkdoc. >> >> I can move the code to an igt library. Which library would you suggest? >> Igt_debugfs ? > >Hm yeah, it's a bit the dumping ground for all things debugfs access ;-) >-Daniel >-- >Daniel Vetter >Software Engineer, Intel Corporation >http://blog.ffwll.ch > ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 22/32] drm/i915: Stop setting wraparound seqno on initialisation
We have testcases to ensure that seqno wraparound works fine, so we can forgo forcing everyone to encounter seqno wraparound during early uptime. Seqno wraparound incurs a full GPU stall, so not forcing it eliminates one source of jitter from the early system. The testcases give us very deterministic coverage, and given how difficult it would be to debug an issue (GPU hang) stemming from a wraparound using pure postmortem analysis, I see no value in forcing a wrap during boot. Advancing the global next_seqno after a GPU reset is equally pointless. Signed-off-by: Chris Wilson--- drivers/gpu/drm/i915/i915_gem.c | 16 +--- 1 file changed, 1 insertion(+), 15 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 15495b8112f9..d595d72e53b1 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -4831,14 +4831,6 @@ i915_gem_init_hw(struct drm_device *dev) } } - /* -* Increment the next seqno by 0x100 so we have a visible break -* on re-initialisation -*/ - ret = i915_gem_set_seqno(dev, dev_priv->next_seqno+0x100); - if (ret) - goto out; - /* Now it is safe to go back round and do everything else: */ for_each_ring(ring, dev_priv, i) { struct drm_i915_gem_request *req; @@ -5018,13 +5010,7 @@ i915_gem_load(struct drm_device *dev) dev_priv->num_fence_regs = I915_READ(vgtif_reg(avail_rs.fence_num)); - /* -* Set initial sequence number for requests. -* Using this number allows the wraparound to happen early, -* catching any obvious problems. -*/ - dev_priv->next_seqno = ((u32)~0 - 0x1100); - dev_priv->last_seqno = ((u32)~0 - 0x1101); + dev_priv->next_seqno = 1; /* Initialize fence registers to zero */ INIT_LIST_HEAD(&dev_priv->mm.fence_list); -- 2.6.3 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 15/32] drm/i915: Slaughter the thundering i915_wait_request herd
One particularly stressful scenario consists of many independent tasks all competing for GPU time and waiting upon the results (e.g. realtime transcoding of many, many streams). One bottleneck in particular is that each client waits on its own results, but every client is woken up after every batchbuffer - hence the thunder of hooves as then every client must do its heavyweight dance to read a coherent seqno to see if it is the lucky one. Ideally, we only want one client to wake up after the interrupt and check its request for completion. Since the requests must retire in order, we can select the first client on the oldest request to be woken. Once that client has completed his wait, we can then wake up the next client and so on. However, all clients then incur latency as every process in the chain may be delayed for scheduling - this may also then cause some priority inversion. To reduce the latency, when a client is added or removed from the list, we scan the tree for completed seqno and wake up all the completed waiters in parallel. v2: Convert from a kworker per engine into a dedicated kthread for the bottom-half. v3: Rename request members and tweak comments. v4: Use a per-engine spinlock in the breadcrumbs bottom-half. v5: Fix race in locklessly checking waiter status and kicking the task on adding a new waiter. v6: Fix deciding when to force the timer to hide missing interrupts. v7: Move the bottom-half from the kthread to the first client process. v8: Reword a few comments v9: Break the busy loop when the interrupt is unmasked or has fired. v10: Comments, unnecessary churn, better debugging from Tvrtko v11: Wake all completed waiters on removing the current bottom-half to reduce the latency of waking up a herd of clients all waiting on the same request. 
v12: Rearrange missed-interrupt fault injection so that it works with igt/drv_missed_irq_hang Signed-off-by: Chris WilsonCc: "Rogozhkin, Dmitry V" Cc: "Gong, Zhipeng" Cc: Tvrtko Ursulin Cc: Dave Gordon --- drivers/gpu/drm/i915/Makefile| 1 + drivers/gpu/drm/i915/i915_debugfs.c | 19 ++- drivers/gpu/drm/i915/i915_drv.h | 3 +- drivers/gpu/drm/i915/i915_gem.c | 152 - drivers/gpu/drm/i915/i915_gpu_error.c| 2 +- drivers/gpu/drm/i915/i915_irq.c | 14 +- drivers/gpu/drm/i915/intel_breadcrumbs.c | 274 +++ drivers/gpu/drm/i915/intel_lrc.c | 5 +- drivers/gpu/drm/i915/intel_ringbuffer.c | 5 +- drivers/gpu/drm/i915/intel_ringbuffer.h | 63 ++- 10 files changed, 436 insertions(+), 102 deletions(-) create mode 100644 drivers/gpu/drm/i915/intel_breadcrumbs.c diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index 0851de07bd13..d3b9d3618719 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -35,6 +35,7 @@ i915-y += i915_cmd_parser.o \ i915_gem_userptr.o \ i915_gpu_error.o \ i915_trace_points.o \ + intel_breadcrumbs.o \ intel_lrc.o \ intel_mocs.o \ intel_ringbuffer.o \ diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index d5f66bbdb160..48e574247a30 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -730,10 +730,22 @@ static int i915_gem_request_info(struct seq_file *m, void *data) static void i915_ring_seqno_info(struct seq_file *m, struct intel_engine_cs *ring) { + struct rb_node *rb; + if (ring->get_seqno) { seq_printf(m, "Current sequence (%s): %x\n", ring->name, ring->get_seqno(ring, false)); } + + spin_lock(>breadcrumbs.lock); + for (rb = rb_first(>breadcrumbs.requests); +rb != NULL; +rb = rb_next(rb)) { + struct intel_breadcrumb *b = container_of(rb, typeof(*b), node); + seq_printf(m, "Waiting (%s): %s [%d] on %x\n", + ring->name, b->task->comm, b->task->pid, b->seqno); + } + spin_unlock(>breadcrumbs.lock); } static int 
i915_gem_seqno_info(struct seq_file *m, void *data) @@ -1356,8 +1368,9 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused) for_each_ring(ring, dev_priv, i) { seq_printf(m, "%s:\n", ring->name); - seq_printf(m, "\tseqno = %x [current %x]\n", - ring->hangcheck.seqno, seqno[i]); + seq_printf(m, "\tseqno = %x [current %x], waiters? %d\n", + ring->hangcheck.seqno, seqno[i], + intel_engine_has_waiter(ring)); seq_printf(m, "\tACTHD = 0x%08llx [current 0x%08llx]\n", (long
[Intel-gfx] [PATCH 04/32] drm/i915: Hide the atomic_read(reset_counter) behind a helper
This is principally a little bit of syntatic sugar to hide the atomic_read()s throughout the code to retrieve the current reset_counter. It also provides the other utility functions to check the reset state on the already read reset_counter, so that (in later patches) we can read it once and do multiple tests rather than risk the value changing between tests. v2: Be strictly on converting existing i915_reset_in_progress() over to the more verbose i915_reset_in_progress_or_wedged(). Signed-off-by: Chris WilsonCc: Daniel Vetter --- drivers/gpu/drm/i915/i915_debugfs.c | 4 ++-- drivers/gpu/drm/i915/i915_drv.h | 32 drivers/gpu/drm/i915/i915_gem.c | 16 drivers/gpu/drm/i915/i915_irq.c | 2 +- drivers/gpu/drm/i915/intel_display.c| 18 +++--- drivers/gpu/drm/i915/intel_lrc.c| 2 +- drivers/gpu/drm/i915/intel_ringbuffer.c | 4 ++-- 7 files changed, 53 insertions(+), 25 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index 24318b79bcfc..c26a4c087f49 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -4672,7 +4672,7 @@ i915_wedged_get(void *data, u64 *val) struct drm_device *dev = data; struct drm_i915_private *dev_priv = dev->dev_private; - *val = atomic_read(_priv->gpu_error.reset_counter); + *val = i915_reset_counter(_priv->gpu_error); return 0; } @@ -4691,7 +4691,7 @@ i915_wedged_set(void *data, u64 val) * while it is writing to 'i915_wedged' */ - if (i915_reset_in_progress(_priv->gpu_error)) + if (i915_reset_in_progress_or_wedged(_priv->gpu_error)) return -EAGAIN; intel_runtime_pm_get(dev_priv); diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 8c4303b664d9..466caa0bc043 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2992,20 +2992,44 @@ void i915_gem_retire_requests_ring(struct intel_engine_cs *ring); int __must_check i915_gem_check_wedge(struct i915_gpu_error *error, bool interruptible); +static inline u32 
i915_reset_counter(struct i915_gpu_error *error) +{ + return atomic_read(>reset_counter); +} + +static inline bool __i915_reset_in_progress(u32 reset) +{ + return unlikely(reset & I915_RESET_IN_PROGRESS_FLAG); +} + +static inline bool __i915_reset_in_progress_or_wedged(u32 reset) +{ + return unlikely(reset & (I915_RESET_IN_PROGRESS_FLAG | I915_WEDGED)); +} + +static inline bool __i915_terminally_wedged(u32 reset) +{ + return unlikely(reset & I915_WEDGED); +} + static inline bool i915_reset_in_progress(struct i915_gpu_error *error) { - return unlikely(atomic_read(>reset_counter) - & (I915_RESET_IN_PROGRESS_FLAG | I915_WEDGED)); + return __i915_reset_in_progress(i915_reset_counter(error)); +} + +static inline bool i915_reset_in_progress_or_wedged(struct i915_gpu_error *error) +{ + return __i915_reset_in_progress_or_wedged(i915_reset_counter(error)); } static inline bool i915_terminally_wedged(struct i915_gpu_error *error) { - return atomic_read(>reset_counter) & I915_WEDGED; + return __i915_terminally_wedged(i915_reset_counter(error)); } static inline u32 i915_reset_count(struct i915_gpu_error *error) { - return ((atomic_read(>reset_counter) & ~I915_WEDGED) + 1) / 2; + return ((i915_reset_counter(error) & ~I915_WEDGED) + 1) / 2; } static inline bool i915_stop_ring_allow_ban(struct drm_i915_private *dev_priv) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 29d98ddbbc80..0b3e0534baa3 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -85,7 +85,7 @@ i915_gem_wait_for_error(struct i915_gpu_error *error) { int ret; -#define EXIT_COND (!i915_reset_in_progress(error) || \ +#define EXIT_COND (!i915_reset_in_progress_or_wedged(error) || \ i915_terminally_wedged(error)) if (EXIT_COND) return 0; @@ -1113,7 +1113,7 @@ int i915_gem_check_wedge(struct i915_gpu_error *error, bool interruptible) { - if (i915_reset_in_progress(error)) { + if (i915_reset_in_progress_or_wedged(error)) { /* Non-interruptible callers 
can't handle -EAGAIN, hence return * -EIO unconditionally for these. */ if (!interruptible) @@ -1297,7 +1297,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req, /* We need to check whether any gpu reset happened in between * the caller grabbing the seqno and now ... */ - if (reset_counter != atomic_read(_priv->gpu_error.reset_counter)) { + if (reset_counter !=
[Intel-gfx] [PATCH 03/32] drm/i915: Only spin whilst waiting on the current request
Limit busywaiting only to the request currently being processed by the GPU. If the request is not currently being processed by the GPU, there is a very low likelihood of it being completed within the 2 microsecond spin timeout and so we will just be wasting CPU cycles. v2: Check for logical inversion when rebasing - we were incorrectly checking for this request being active, and instead busywaiting for when the GPU was not yet processing the request of interest. v3: Try another colour for the seqno names. v4: Another colour for the function names. v5: Remove the forced coherency when checking for the active request. On reflection and plenty of recent experimentation, the issue is not a cache coherency problem - but an irq/seqno ordering problem (timing issue). Here, we do not need the w/a to force ordering of the read with an interrupt. Signed-off-by: Chris WilsonReviewed-by: Tvrtko Ursulin Cc: "Rogozhkin, Dmitry V" Cc: Daniel Vetter Cc: Tvrtko Ursulin Cc: Eero Tamminen Cc: "Rantala, Valtteri" Cc: sta...@vger.kernel.org --- drivers/gpu/drm/i915/i915_drv.h | 27 +++ drivers/gpu/drm/i915/i915_gem.c | 8 +++- 2 files changed, 26 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 5edd39352e97..8c4303b664d9 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2182,8 +2182,17 @@ struct drm_i915_gem_request { struct drm_i915_private *i915; struct intel_engine_cs *ring; - /** GEM sequence number associated with this request. */ - uint32_t seqno; +/** GEM sequence number associated with the previous request, + * when the HWS breadcrumb is equal to this the GPU is processing + * this request. + */ + u32 previous_seqno; + +/** GEM sequence number associated with this request, + * when the HWS breadcrumb is equal or greater than this the GPU + * has finished processing this request. 
+ */ + u32 seqno; /** Position in the ringbuffer of the start of the request */ u32 head; @@ -2958,15 +2967,17 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2) return (int32_t)(seq1 - seq2) >= 0; } +static inline bool i915_gem_request_started(struct drm_i915_gem_request *req, + bool lazy_coherency) +{ + u32 seqno = req->ring->get_seqno(req->ring, lazy_coherency); + return i915_seqno_passed(seqno, req->previous_seqno); +} + static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req, bool lazy_coherency) { - u32 seqno; - - BUG_ON(req == NULL); - - seqno = req->ring->get_seqno(req->ring, lazy_coherency); - + u32 seqno = req->ring->get_seqno(req->ring, lazy_coherency); return i915_seqno_passed(seqno, req->seqno); } diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 46a84c447d8f..29d98ddbbc80 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1193,9 +1193,13 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state) * takes to sleep on a request, on the order of a microsecond. 
*/ - if (i915_gem_request_get_ring(req)->irq_refcount) + if (req->ring->irq_refcount) return -EBUSY; + /* Only spin if we know the GPU is processing this request */ + if (!i915_gem_request_started(req, true)) + return -EAGAIN; + timeout = local_clock_us() + 5; while (!need_resched()) { if (i915_gem_request_completed(req, true)) @@ -1209,6 +1213,7 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state) cpu_relax_lowlatency(); } + if (i915_gem_request_completed(req, false)) return 0; @@ -2600,6 +2605,7 @@ void __i915_add_request(struct drm_i915_gem_request *request, request->batch_obj = obj; request->emitted_jiffies = jiffies; + request->previous_seqno = ring->last_submitted_seqno; ring->last_submitted_seqno = request->seqno; list_add_tail(>list, >request_list); -- 2.6.3 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 20/32] drm/i915: Replace manual barrier() with READ_ONCE() in HWS accessor
When reading from the HWS page, we use barrier() to prevent the compiler optimising away the read from the volatile (may be updated by the GPU) memory address. This is more suited to READ_ONCE(); make it so. Signed-off-by: Chris WilsonCc: Daniel Vetter --- drivers/gpu/drm/i915/intel_ringbuffer.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h index 7ad06cbef6be..a35c17106f4b 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.h +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h @@ -416,8 +416,7 @@ intel_read_status_page(struct intel_engine_cs *ring, int reg) { /* Ensure that the compiler doesn't optimize away the load. */ - barrier(); - return ring->status_page.page_addr[reg]; + return READ_ONCE(ring->status_page.page_addr[reg]); } static inline void -- 2.6.3 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 10/32] drm/i915: Suppress error message when GPU resets are disabled
If we do not have low-level support for resetting the GPU, or if the user has explicitly disabled resetting the device, the failure is expected. Since it is an expected failure, we should be using a lower priority message than *ERROR*, perhaps NOTICE. In the absence of DRM_NOTICE, just emit the expected failure as a DEBUG message. Signed-off-by: Chris WilsonCc: Daniel Vetter Reviewed-by: Daniel Vetter --- drivers/gpu/drm/i915/i915_drv.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 8bdc51bc00a4..ba91f65b6082 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -895,7 +895,10 @@ int i915_reset(struct drm_device *dev) pr_notice("drm/i915: Resetting chip after gpu hang\n"); if (ret) { - DRM_ERROR("Failed to reset chip: %i\n", ret); + if (ret != -ENODEV) + DRM_ERROR("Failed to reset chip: %i\n", ret); + else + DRM_DEBUG_DRIVER("GPU reset disabled\n"); goto error; } -- 2.6.3 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 21/32] drm/i915: Broadwell execlists needs exactly the same seqno w/a as legacy
In legacy mode, we use the gen6 seqno barrier to insert a delay after the interrupt before reading the seqno (as the seqno write is not flushed before the interrupt is sent, the interrupt arrives before the seqno is visible). Execlists ignored the evidence of igt. Signed-off-by: Chris Wilson--- drivers/gpu/drm/i915/intel_lrc.c | 39 +-- 1 file changed, 21 insertions(+), 18 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index 91e5ed6867e5..a73c5e671423 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -1745,18 +1745,24 @@ static int gen8_emit_flush_render(struct drm_i915_gem_request *request, return 0; } -static void bxt_seqno_barrier(struct intel_engine_cs *ring) +static void +gen6_seqno_barrier(struct intel_engine_cs *ring) { - /* -* On BXT A steppings there is a HW coherency issue whereby the -* MI_STORE_DATA_IMM storing the completed request's seqno -* occasionally doesn't invalidate the CPU cache. Work around this by -* clflushing the corresponding cacheline whenever the caller wants -* the coherency to be guaranteed. Note that this cacheline is known -* to be clean at this point, since we only write it in -* bxt_a_set_seqno(), where we also do a clflush after the write. So -* this clflush in practice becomes an invalidate operation. + /* Workaround to force correct ordering between irq and seqno writes on +* ivb (and maybe also on snb) by reading from a CS register (like +* ACTHD) before reading the status page. +* +* Note that this effectively stalls the read by the time +* it takes to do a memory transaction, which more or less ensures +* that the write from the GPU has sufficient time to invalidate +* the CPU cacheline. Alternatively we could delay the interrupt from +* the CS ring to give the write time to land, but that would incur +* a delay after every batch i.e. much more frequent than a delay +* when waiting for the interrupt (with the same net latency). 
*/ + struct drm_i915_private *dev_priv = ring->i915; + POSTING_READ_FW(RING_ACTHD(ring->mmio_base)); + intel_flush_status_page(ring, I915_GEM_HWS_INDEX); } @@ -1954,8 +1960,7 @@ static int logical_render_ring_init(struct drm_device *dev) ring->init_hw = gen8_init_render_ring; ring->init_context = gen8_init_rcs_context; ring->cleanup = intel_fini_pipe_control; - if (IS_BXT_REVID(dev, 0, BXT_REVID_A1)) - ring->seqno_barrier = bxt_seqno_barrier; + ring->seqno_barrier = gen6_seqno_barrier; ring->emit_request = gen8_emit_request; ring->emit_flush = gen8_emit_flush_render; ring->irq_get = gen8_logical_ring_get_irq; @@ -2001,8 +2006,7 @@ static int logical_bsd_ring_init(struct drm_device *dev) GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS1_IRQ_SHIFT; ring->init_hw = gen8_init_common_ring; - if (IS_BXT_REVID(dev, 0, BXT_REVID_A1)) - ring->seqno_barrier = bxt_seqno_barrier; + ring->seqno_barrier = gen6_seqno_barrier; ring->emit_request = gen8_emit_request; ring->emit_flush = gen8_emit_flush; ring->irq_get = gen8_logical_ring_get_irq; @@ -2026,6 +2030,7 @@ static int logical_bsd2_ring_init(struct drm_device *dev) GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS2_IRQ_SHIFT; ring->init_hw = gen8_init_common_ring; + ring->seqno_barrier = gen6_seqno_barrier; ring->emit_request = gen8_emit_request; ring->emit_flush = gen8_emit_flush; ring->irq_get = gen8_logical_ring_get_irq; @@ -2049,8 +2054,7 @@ static int logical_blt_ring_init(struct drm_device *dev) GT_CONTEXT_SWITCH_INTERRUPT << GEN8_BCS_IRQ_SHIFT; ring->init_hw = gen8_init_common_ring; - if (IS_BXT_REVID(dev, 0, BXT_REVID_A1)) - ring->seqno_barrier = bxt_seqno_barrier; + ring->seqno_barrier = gen6_seqno_barrier; ring->emit_request = gen8_emit_request; ring->emit_flush = gen8_emit_flush; ring->irq_get = gen8_logical_ring_get_irq; @@ -2074,8 +2078,7 @@ static int logical_vebox_ring_init(struct drm_device *dev) GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VECS_IRQ_SHIFT; ring->init_hw = gen8_init_common_ring; - if (IS_BXT_REVID(dev, 0, BXT_REVID_A1)) 
- ring->seqno_barrier = bxt_seqno_barrier; + ring->seqno_barrier = gen6_seqno_barrier; ring->emit_request = gen8_emit_request; ring->emit_flush = gen8_emit_flush; ring->irq_get = gen8_logical_ring_get_irq; -- 2.6.3 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org
[Intel-gfx] [PATCH 16/32] drm/i915: Separate out the seqno-barrier from engine->get_seqno
In order to simplify the next couple of patches, extract the lazy_coherency optimisation our of the engine->get_seqno() vfunc into its own callback. Signed-off-by: Chris Wilson--- drivers/gpu/drm/i915/i915_debugfs.c | 6 ++--- drivers/gpu/drm/i915/i915_drv.h | 12 ++ drivers/gpu/drm/i915/i915_gpu_error.c| 2 +- drivers/gpu/drm/i915/i915_irq.c | 4 ++-- drivers/gpu/drm/i915/i915_trace.h| 2 +- drivers/gpu/drm/i915/intel_breadcrumbs.c | 4 ++-- drivers/gpu/drm/i915/intel_lrc.c | 39 drivers/gpu/drm/i915/intel_ringbuffer.c | 36 +++-- drivers/gpu/drm/i915/intel_ringbuffer.h | 4 ++-- 9 files changed, 53 insertions(+), 56 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index 48e574247a30..6344fe69ab82 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -600,7 +600,7 @@ static int i915_gem_pageflip_info(struct seq_file *m, void *data) ring->name, i915_gem_request_get_seqno(work->flip_queued_req), dev_priv->next_seqno, - ring->get_seqno(ring, true), + ring->get_seqno(ring), i915_gem_request_completed(work->flip_queued_req, true)); } else seq_printf(m, "Flip not associated with any ring\n"); @@ -734,7 +734,7 @@ static void i915_ring_seqno_info(struct seq_file *m, if (ring->get_seqno) { seq_printf(m, "Current sequence (%s): %x\n", - ring->name, ring->get_seqno(ring, false)); + ring->name, ring->get_seqno(ring)); } spin_lock(>breadcrumbs.lock); @@ -1353,7 +1353,7 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused) intel_runtime_pm_get(dev_priv); for_each_ring(ring, dev_priv, i) { - seqno[i] = ring->get_seqno(ring, false); + seqno[i] = ring->get_seqno(ring); acthd[i] = intel_ring_get_active_head(ring); } diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 830d760aa562..ff83f148658f 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2981,15 +2981,19 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2) static 
inline bool i915_gem_request_started(struct drm_i915_gem_request *req, bool lazy_coherency) { - u32 seqno = req->ring->get_seqno(req->ring, lazy_coherency); - return i915_seqno_passed(seqno, req->previous_seqno); + if (!lazy_coherency && req->ring->seqno_barrier) + req->ring->seqno_barrier(req->ring); + return i915_seqno_passed(req->ring->get_seqno(req->ring), +req->previous_seqno); } static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req, bool lazy_coherency) { - u32 seqno = req->ring->get_seqno(req->ring, lazy_coherency); - return i915_seqno_passed(seqno, req->seqno); + if (!lazy_coherency && req->ring->seqno_barrier) + req->ring->seqno_barrier(req->ring); + return i915_seqno_passed(req->ring->get_seqno(req->ring), +req->seqno); } int __must_check i915_gem_get_seqno(struct drm_device *dev, u32 *seqno); diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c index f805d117f3d1..01d0206ca4dd 100644 --- a/drivers/gpu/drm/i915/i915_gpu_error.c +++ b/drivers/gpu/drm/i915/i915_gpu_error.c @@ -902,8 +902,8 @@ static void i915_record_ring_state(struct drm_device *dev, ering->waiting = intel_engine_has_waiter(ring); ering->instpm = I915_READ(RING_INSTPM(ring->mmio_base)); - ering->seqno = ring->get_seqno(ring, false); ering->acthd = intel_ring_get_active_head(ring); + ering->seqno = ring->get_seqno(ring); ering->start = I915_READ_START(ring); ering->head = I915_READ_HEAD(ring); ering->tail = I915_READ_TAIL(ring); diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index d250b4721a6a..da3c8aaa50a3 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -2875,7 +2875,7 @@ static int semaphore_passed(struct intel_engine_cs *ring) if (signaller->hangcheck.deadlock >= I915_NUM_RINGS) return -1; - if (i915_seqno_passed(signaller->get_seqno(signaller, false), seqno)) + if (i915_seqno_passed(signaller->get_seqno(signaller), seqno)) return 1; /* cursory check for
[Intel-gfx] [PATCH 08/32] drm/i915: Simplify reset_counter handling during atomic modesetting
Now that the reset_counter is stored on the request, we can rearrange the code to handle reading the counter versus waiting during the atomic modesetting for readibility (by deleting the hairiest of codes). Signed-off-by: Chris WilsonCc: Daniel Vetter --- drivers/gpu/drm/i915/intel_display.c | 18 +++--- 1 file changed, 7 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index d59beca928b7..d7bbd015de35 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -13393,9 +13393,9 @@ static int intel_atomic_prepare_commit(struct drm_device *dev, return ret; ret = drm_atomic_helper_prepare_planes(dev, state); - if (!ret && !async && !i915_reset_in_progress_or_wedged(_priv->gpu_error)) { - mutex_unlock(>struct_mutex); + mutex_unlock(>struct_mutex); + if (!ret && !async) { for_each_plane_in_state(state, plane, plane_state, i) { struct intel_plane_state *intel_plane_state = to_intel_plane_state(plane_state); @@ -13409,19 +13409,15 @@ static int intel_atomic_prepare_commit(struct drm_device *dev, /* Swallow -EIO errors to allow updates during hw lockup. */ if (ret == -EIO) ret = 0; - - if (ret) + if (ret) { + mutex_lock(>struct_mutex); + drm_atomic_helper_cleanup_planes(dev, state); + mutex_unlock(>struct_mutex); break; + } } - - if (!ret) - return 0; - - mutex_lock(>struct_mutex); - drm_atomic_helper_cleanup_planes(dev, state); } - mutex_unlock(>struct_mutex); return ret; } -- 2.6.3 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 11/32] drm/i915: Delay queuing hangcheck to wait-request
We can forgo queuing the hangcheck from the start of every request until we wait upon a request. This reduces the overhead of every request, but may increase the latency of detecting a hang. However, if nothing ever waits upon a hang, did it ever hang? It also improves the robustness of the wait-request by ensuring that the hangchecker is indeed running before we sleep indefinitely (and thereby ensuring that we never actually sleep forever waiting for a dead GPU). Signed-off-by: Chris Wilson--- drivers/gpu/drm/i915/i915_drv.h | 2 +- drivers/gpu/drm/i915/i915_gem.c | 5 +++-- drivers/gpu/drm/i915/i915_irq.c | 9 - 3 files changed, 8 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 7acbc072973a..987a35c5af72 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2723,7 +2723,7 @@ void intel_hpd_cancel_work(struct drm_i915_private *dev_priv); bool intel_hpd_pin_to_port(enum hpd_pin pin, enum port *port); /* i915_irq.c */ -void i915_queue_hangcheck(struct drm_device *dev); +void i915_queue_hangcheck(struct drm_i915_private *dev_priv); __printf(3, 4) void i915_handle_error(struct drm_device *dev, bool wedged, const char *fmt, ...); diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index f5760869a17c..0340a5fe9cda 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1308,6 +1308,9 @@ int __i915_wait_request(struct drm_i915_gem_request *req, break; } + /* Ensure that even if the GPU hangs, we get woken up. 
*/ + i915_queue_hangcheck(dev_priv); + timer.function = NULL; if (timeout || missed_irq(dev_priv, ring)) { unsigned long expire; @@ -2584,8 +2587,6 @@ void __i915_add_request(struct drm_i915_gem_request *request, trace_i915_gem_request_add(request); - i915_queue_hangcheck(ring->dev); - queue_delayed_work(dev_priv->wq, _priv->mm.retire_work, round_jiffies_up_relative(HZ)); diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index 88206c0404d7..21089ac5dd58 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -3066,15 +3066,14 @@ static void i915_hangcheck_elapsed(struct work_struct *work) if (rings_hung) return i915_handle_error(dev, true, "Ring hung"); + /* Reset timer in case GPU hangs without another request being added */ if (busy_count) - /* Reset timer case chip hangs without another request -* being added */ - i915_queue_hangcheck(dev); + i915_queue_hangcheck(dev_priv); } -void i915_queue_hangcheck(struct drm_device *dev) +void i915_queue_hangcheck(struct drm_i915_private *dev_priv) { - struct i915_gpu_error *e = _i915(dev)->gpu_error; + struct i915_gpu_error *e = _priv->gpu_error; if (!i915.enable_hangcheck) return; -- 2.6.3 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 13/32] drm/i915: Make queueing the hangcheck work inline
Since the function is a small wrapper around schedule_delayed_work(), move it inline to remove the function call overhead for the principle caller. Signed-off-by: Chris Wilson--- drivers/gpu/drm/i915/i915_drv.h | 17 - drivers/gpu/drm/i915/i915_irq.c | 16 2 files changed, 16 insertions(+), 17 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 9304ecfa05d4..f82e8fb19c9b 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2722,7 +2722,22 @@ void intel_hpd_cancel_work(struct drm_i915_private *dev_priv); bool intel_hpd_pin_to_port(enum hpd_pin pin, enum port *port); /* i915_irq.c */ -void i915_queue_hangcheck(struct drm_i915_private *dev_priv); +static inline void i915_queue_hangcheck(struct drm_i915_private *dev_priv) +{ + unsigned long delay; + + if (unlikely(!i915.enable_hangcheck)) + return; + + /* Don't continually defer the hangcheck so that it is always run at +* least once after work has been scheduled on any ring. Otherwise, +* we will ignore a hung ring if a second ring is kept busy. +*/ + + delay = round_jiffies_up_relative(DRM_I915_HANGCHECK_JIFFIES); + schedule_delayed_work(_priv->gpu_error.hangcheck_work, delay); +} + __printf(3, 4) void i915_handle_error(struct drm_device *dev, bool wedged, const char *fmt, ...); diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index afe04aeb858d..5f88869e2207 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -3071,22 +3071,6 @@ static void i915_hangcheck_elapsed(struct work_struct *work) i915_queue_hangcheck(dev_priv); } -void i915_queue_hangcheck(struct drm_i915_private *dev_priv) -{ - unsigned long delay; - - if (!i915.enable_hangcheck) - return; - - /* Don't continually defer the hangcheck so that it is always run at -* least once after work has been scheduled on any ring. Otherwise, -* we will ignore a hung ring if a second ring is kept busy. 
- */ - - delay = round_jiffies_up_relative(DRM_I915_HANGCHECK_JIFFIES); - schedule_delayed_work(&dev_priv->gpu_error.hangcheck_work, delay); -} - static void ibx_irq_reset(struct drm_device *dev) { struct drm_i915_private *dev_priv = dev->dev_private; -- 2.6.3 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 29/32] drm/i915: Only start retire worker when idle
The retire worker is a low frequency task that makes sure we retire outstanding requests if userspace is being lax. We only need to start it once as it remains active until the GPU is idle, so do a cheap test before the more expensive queue_work(). A consequence of this is that we need correct locking in the worker to make the hot path of request submission cheap. To keep the symmetry and keep hangcheck strictly bound by the GPU's wakelock, we move the cancel_sync(hangcheck) to the idle worker before dropping the wakelock. Signed-off-by: Chris WilsonReferences: https://bugs.freedesktop.org/show_bug.cgi?id=88437 --- drivers/gpu/drm/i915/i915_drv.c | 2 - drivers/gpu/drm/i915/i915_drv.h | 2 +- drivers/gpu/drm/i915/i915_gem.c | 97 +--- drivers/gpu/drm/i915/intel_display.c | 29 --- 4 files changed, 69 insertions(+), 61 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index ba91f65b6082..0f79ee1d35a2 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -1472,8 +1472,6 @@ static int intel_runtime_suspend(struct device *device) i915_gem_release_all_mmaps(dev_priv); mutex_unlock(>struct_mutex); - cancel_delayed_work_sync(_priv->gpu_error.hangcheck_work); - intel_guc_suspend(dev); intel_suspend_gt_powersave(dev); diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index dabfb043362f..834cc779a9db 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2996,7 +2996,7 @@ int __must_check i915_gem_set_seqno(struct drm_device *dev, u32 seqno); struct drm_i915_gem_request * i915_gem_find_active_request(struct intel_engine_cs *ring); -bool i915_gem_retire_requests(struct drm_device *dev); +void i915_gem_retire_requests(struct drm_device *dev); void i915_gem_retire_requests_ring(struct intel_engine_cs *ring); static inline u32 i915_reset_counter(struct i915_gpu_error *error) diff --git a/drivers/gpu/drm/i915/i915_gem.c 
b/drivers/gpu/drm/i915/i915_gem.c index fdd9dd5296e9..d1a7a7f8f3ad 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -2495,6 +2495,51 @@ i915_gem_get_seqno(struct drm_device *dev, u32 *seqno) return 0; } +static void i915_gem_mark_busy(struct drm_i915_private *dev_priv) +{ + if (dev_priv->mm.busy) + return; + + intel_runtime_pm_get_noresume(dev_priv); + + i915_update_gfx_val(dev_priv); + if (INTEL_INFO(dev_priv)->gen >= 6) + gen6_rps_busy(dev_priv); + + queue_delayed_work(dev_priv->wq, + _priv->mm.retire_work, + round_jiffies_up_relative(HZ)); + + dev_priv->mm.busy = true; +} + +static void kick_waiters(struct drm_i915_private *dev_priv) +{ + struct intel_engine_cs *ring; + int i; + + for_each_ring(ring, dev_priv, i) { + if (!intel_engine_has_waiter(ring)) + continue; + + set_bit(ring->id, _priv->gpu_error.missed_irq_rings); + intel_engine_wakeup(ring); + } +} + +static void i915_gem_mark_idle(struct drm_i915_private *dev_priv) +{ + dev_priv->mm.busy = false; + + if (cancel_delayed_work_sync(_priv->gpu_error.hangcheck_work)) + kick_waiters(dev_priv); + + if (INTEL_INFO(dev_priv)->gen >= 6) + gen6_rps_idle(dev_priv); + + intel_runtime_pm_put(dev_priv); +} + /* * NB: This function is not allowed to fail. Doing so would mean the the * request is not being tracked for completion but the work itself is @@ -2575,10 +2620,7 @@ void __i915_add_request(struct drm_i915_gem_request *request, trace_i915_gem_request_add(request); - queue_delayed_work(dev_priv->wq, - _priv->mm.retire_work, - round_jiffies_up_relative(HZ)); - intel_mark_busy(dev_priv->dev); + i915_gem_mark_busy(dev_priv); /* Sanity check that the reserved size was large enough. 
*/ intel_ring_reserved_space_end(ringbuf); @@ -2910,7 +2952,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring) WARN_ON(i915_verify_lists(ring->dev)); } -bool +void i915_gem_retire_requests(struct drm_device *dev) { struct drm_i915_private *dev_priv = dev->dev_private; @@ -2934,10 +2976,8 @@ i915_gem_retire_requests(struct drm_device *dev) if (idle) mod_delayed_work(dev_priv->wq, - _priv->mm.idle_work, - msecs_to_jiffies(100)); - - return idle; +_priv->mm.idle_work, +msecs_to_jiffies(100)); } static void @@ -2946,16 +2986,20 @@ i915_gem_retire_work_handler(struct work_struct *work) struct drm_i915_private
[Intel-gfx] [PATCH 02/32] drm/i915: Limit the busy wait on requests to 5us not 10ms!
When waiting for high frequency requests, the finite amount of time required to set up the irq and wait upon it limits the response rate. By busywaiting on the request completion for a short while we can service the high frequency waits as quickly as possible. However, if it is a slow request, we want to sleep as quickly as possible. The tradeoff between waiting and sleeping is roughly the time it takes to sleep on a request, on the order of a microsecond. Based on measurements of synchronous workloads from across big core and little atom, I have set the limit for busywaiting as 10 microseconds. In most of the synchronous cases, we can reduce the limit down to as little as 2 microseconds, but that leaves quite a few test cases regressing by factors of 3 and more. The code currently uses the jiffies clock, but that is far too coarse (on the order of 10 milliseconds) and results in poor interactivity as the CPU ends up being hogged by slow requests. To get microsecond resolution we need to use a high resolution timer, the cheapest of which is polling local_clock(). However, that is only valid on the same CPU; if we switch CPUs because the task was preempted, we can also use that as an indicator that the system is too busy to waste cycles on spinning and we should sleep instead. __i915_spin_request was introduced in commit 2def4ad99befa25775dd2f714fdd4d92faec6e34 [v4.2] Author: Chris Wilson Date: Tue Apr 7 16:20:41 2015 +0100 drm/i915: Optimistically spin for the request completion v2: Drop full u64 for unsigned long - the timer is 32bit wraparound safe, so we can use native register sizes on smaller architectures. Mention the approximate microseconds units for elapsed time and add some extra comments describing the reason for busywaiting. v3: Raise the limit to 10us v4: Now 5us.
Reported-by: Jens Axboe Link: https://lkml.org/lkml/2015/11/12/621 Reviewed-by: Tvrtko Ursulin Cc: "Rogozhkin, Dmitry V" Cc: Daniel Vetter Cc: Tvrtko Ursulin Cc: Eero Tamminen Cc: "Rantala, Valtteri" Cc: sta...@vger.kernel.org --- drivers/gpu/drm/i915/i915_gem.c | 47 +++-- 1 file changed, 45 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 7e1246410afc..46a84c447d8f 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1146,14 +1146,57 @@ static bool missed_irq(struct drm_i915_private *dev_priv, return test_bit(ring->id, _priv->gpu_error.missed_irq_rings); } +static unsigned long local_clock_us(unsigned *cpu) +{ + unsigned long t; + + /* Cheaply and approximately convert from nanoseconds to microseconds. +* The result and subsequent calculations are also defined in the same +* approximate microseconds units. The principal source of timing +* error here is from the simple truncation. +* +* Note that local_clock() is only defined wrt to the current CPU; +* the comparisons are no longer valid if we switch CPUs. Instead of +* blocking preemption for the entire busywait, we can detect the CPU +* switch and use that as indicator of system load and a reason to +* stop busywaiting, see busywait_stop(). +*/ + *cpu = get_cpu(); + t = local_clock() >> 10; + put_cpu(); + + return t; +} + +static bool busywait_stop(unsigned long timeout, unsigned cpu) +{ + unsigned this_cpu; + + if (time_after(local_clock_us(_cpu), timeout)) + return true; + + return this_cpu != cpu; +} + static int __i915_spin_request(struct drm_i915_gem_request *req, int state) { unsigned long timeout; + unsigned cpu; + + /* When waiting for high frequency requests, e.g. during synchronous +* rendering split between the CPU and GPU, the finite amount of time +* required to set up the irq and wait upon it limits the response +* rate. 
By busywaiting on the request completion for a short while we + * can service the high frequency waits as quickly as possible. However, + * if it is a slow request, we want to sleep as quickly as possible. + * The tradeoff between waiting and sleeping is roughly the time it + * takes to sleep on a request, on the order of a microsecond. + */ if (i915_gem_request_get_ring(req)->irq_refcount) return -EBUSY; - timeout = jiffies + 1; + timeout = local_clock_us(&cpu) + 5; while (!need_resched()) { if (i915_gem_request_completed(req, true)) return 0; @@ -1161,7 +1204,7 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state) if
[Intel-gfx] [PATCH 06/32] drm/i915: Tighten reset_counter for reset status
In the reset_counter, we use two bits to track a GPU hang and reset. The low bit is a "reset-in-progress" flag that we set to signal when we need to break waiters in order for the recovery task to grab the mutex. As soon as the recovery task has the mutex, we can clear that flag (which we do by incrementing the reset_counter thereby incrementing the global reset epoch). By clearing that flag when the recovery task holds the struct_mutex, we can forgo a second flag that simply tells GEM to ignore the "reset-in-progress" flag. The second flag we store in the reset_counter is whether the reset failed and we consider the GPU terminally wedged. Whilst this flag is set, all access to the GPU (at least through GEM rather than direct mmio access) is verboten. Signed-off-by: Chris Wilson Cc: Daniel Vetter --- drivers/gpu/drm/i915/i915_debugfs.c | 4 ++-- drivers/gpu/drm/i915/i915_drv.c | 39 ++--- drivers/gpu/drm/i915/i915_drv.h | 3 --- drivers/gpu/drm/i915/i915_gem.c | 27 + drivers/gpu/drm/i915/i915_irq.c | 21 ++-- 5 files changed, 36 insertions(+), 58 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index c26a4c087f49..d5f66bbdb160 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -4672,7 +4672,7 @@ i915_wedged_get(void *data, u64 *val) struct drm_device *dev = data; struct drm_i915_private *dev_priv = dev->dev_private; - *val = i915_reset_counter(&dev_priv->gpu_error); + *val = i915_terminally_wedged(&dev_priv->gpu_error); return 0; } @@ -4691,7 +4691,7 @@ i915_wedged_set(void *data, u64 val) * while it is writing to 'i915_wedged' */ - if (i915_reset_in_progress_or_wedged(&dev_priv->gpu_error)) + if (i915_reset_in_progress(&dev_priv->gpu_error)) return -EAGAIN; intel_runtime_pm_get(dev_priv); diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 8ddfcce92cf1..8bdc51bc00a4 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -858,23
+858,32 @@ int i915_resume_switcheroo(struct drm_device *dev) int i915_reset(struct drm_device *dev) { struct drm_i915_private *dev_priv = dev->dev_private; - bool simulated; + struct i915_gpu_error *error = _priv->gpu_error; + unsigned reset_counter; int ret; intel_reset_gt_powersave(dev); mutex_lock(>struct_mutex); - i915_gem_reset(dev); + /* Clear any previous failed attempts at recovery. Time to try again. */ + atomic_andnot(I915_WEDGED, >reset_counter); - simulated = dev_priv->gpu_error.stop_rings != 0; + /* Clear the reset-in-progress flag and increment the reset epoch. */ + reset_counter = atomic_inc_return(>reset_counter); + if (WARN_ON(__i915_reset_in_progress(reset_counter))) { + ret = -EIO; + goto error; + } + + i915_gem_reset(dev); ret = intel_gpu_reset(dev); /* Also reset the gpu hangman. */ - if (simulated) { + if (error->stop_rings != 0) { DRM_INFO("Simulated gpu hang, resetting stop_rings\n"); - dev_priv->gpu_error.stop_rings = 0; + error->stop_rings = 0; if (ret == -ENODEV) { DRM_INFO("Reset not implemented, but ignoring " "error for simulated gpu hangs\n"); @@ -887,8 +896,7 @@ int i915_reset(struct drm_device *dev) if (ret) { DRM_ERROR("Failed to reset chip: %i\n", ret); - mutex_unlock(>struct_mutex); - return ret; + goto error; } intel_overlay_reset(dev_priv); @@ -907,20 +915,14 @@ int i915_reset(struct drm_device *dev) * was running at the time of the reset (i.e. we weren't VT * switched away). */ - - /* Used to prevent gem_check_wedged returning -EAGAIN during gpu reset */ - dev_priv->gpu_error.reload_in_reset = true; - ret = i915_gem_init_hw(dev); - - dev_priv->gpu_error.reload_in_reset = false; - - mutex_unlock(>struct_mutex); if (ret) { DRM_ERROR("Failed hw init on reset %d\n", ret); - return ret; + goto error; } + mutex_unlock(>struct_mutex); + /* * rps/rc6 re-init is necessary to restore state lost after the * reset and the re-install of gt irqs. 
Skip for ironlake per @@ -931,6 +933,11 @@ int i915_reset(struct drm_device *dev) intel_enable_gt_powersave(dev); return 0; + +error: + atomic_or(I915_WEDGED, >reset_counter); + mutex_unlock(>struct_mutex); + return ret; } static int i915_pci_probe(struct pci_dev
[Intel-gfx] [PATCH 31/32] drm/i915: Add background commentary to "waitboosting"
Describe the intent of boosting the GPU frequency to maximum before waiting on the GPU. RPS waitboosting was introduced with commit b29c19b645287f7062e17d70fa4e9781a01a5d88 Author: Chris Wilson Date: Wed Sep 25 17:34:56 2013 +0100 drm/i915: Boost RPS frequency for CPU stalls but lacked a concise comment in the code to explain itself. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_gem.c | 16 1 file changed, 16 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index a0584cffa7cd..56b00bf69d89 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1246,6 +1246,22 @@ int __i915_wait_request(struct drm_i915_gem_request *req, } trace_i915_gem_request_wait_begin(req); + + /* This client is about to stall waiting for the GPU. In many cases + * this is undesirable and limits the throughput of the system, as + * many clients cannot continue processing user input/output whilst + * asleep. RPS autotuning may take tens of milliseconds to respond + * to the GPU load and thus incurs additional latency for the client. + * We can circumvent that by promoting the GPU frequency to maximum + * before we wait. This makes the GPU throttle up much more quickly + * (good for benchmarks), but at a cost of spending more power + * processing the workload (bad for battery). Not all clients even + * want their results immediately and for them we should just let + * the GPU select its own frequency to maximise efficiency. + * To prevent a single client from forcing the clocks too high for + * the whole system, we only allow each client to waitboost once + * in a busy period. + */ if (INTEL_INFO(req->i915)->gen >= 6) gen6_rps_boost(req->i915, rps, req->emitted_jiffies); -- 2.6.3
[Intel-gfx] [PATCH 27/32] drm/i915: Harden detection of missed interrupts
Only declare a missed interrupt if we find that the GPU is idle with waiters and a hangcheck interval has passed in which no new user interrupts have been raised. Signed-off-by: Chris Wilson--- drivers/gpu/drm/i915/i915_debugfs.c | 6 ++ drivers/gpu/drm/i915/i915_irq.c | 7 ++- drivers/gpu/drm/i915/intel_ringbuffer.h | 2 ++ 3 files changed, 14 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index a03ed9e38499..78506abe7882 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -735,6 +735,9 @@ static void i915_ring_seqno_info(struct seq_file *m, seq_printf(m, "Current sequence (%s): %x\n", ring->name, intel_ring_get_seqno(ring)); + seq_printf(m, "Current user interrupts (%s): %x\n", + ring->name, READ_ONCE(ring->user_interrupts)); + spin_lock(>breadcrumbs.lock); for (rb = rb_first(>breadcrumbs.requests); rb != NULL; @@ -1369,6 +1372,9 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused) seq_printf(m, "\tseqno = %x [current %x], waiters? 
%d\n", ring->hangcheck.seqno, seqno[i], intel_engine_has_waiter(ring)); + seq_printf(m, "\tuser interrupts = %x [current %x]\n", + ring->hangcheck.user_interrupts, + ring->user_interrupts); seq_printf(m, "\tACTHD = 0x%08llx [current 0x%08llx]\n", (long long)ring->hangcheck.acthd, (long long)acthd[i]); diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index 64502c0d2a81..e864ebeef4ef 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -1000,6 +1000,7 @@ static void notify_ring(struct intel_engine_cs *ring) return; trace_i915_gem_request_notify(ring); + ring->user_interrupts++; intel_engine_wakeup(ring); } @@ -2974,12 +2975,14 @@ static void i915_hangcheck_elapsed(struct work_struct *work) for_each_ring(ring, dev_priv, i) { u64 acthd; u32 seqno; + unsigned user_interrupts; bool busy = true; semaphore_clear_deadlocks(dev_priv); acthd = intel_ring_get_active_head(ring); seqno = intel_ring_get_seqno(ring); + user_interrupts = ring->user_interrupts; if (ring->hangcheck.seqno == seqno) { if (ring_idle(ring, seqno)) { @@ -2987,7 +2990,8 @@ static void i915_hangcheck_elapsed(struct work_struct *work) if (intel_engine_has_waiter(ring)) { /* Issue a wake-up to catch stuck h/w. */ - if (!test_and_set_bit(ring->id, _priv->gpu_error.missed_irq_rings)) { + if (ring->hangcheck.user_interrupts == user_interrupts && + !test_and_set_bit(ring->id, _priv->gpu_error.missed_irq_rings)) { if (!test_bit(ring->id, _priv->gpu_error.test_irq_rings)) DRM_ERROR("Hangcheck timer elapsed... 
%s idle\n", ring->name); @@ -3051,6 +3055,7 @@ static void i915_hangcheck_elapsed(struct work_struct *work) ring->hangcheck.seqno = seqno; ring->hangcheck.acthd = acthd; + ring->hangcheck.user_interrupts = user_interrupts; busy_count += busy; } diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h index 33780fad6a30..1b4aa59c4d21 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.h +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h @@ -90,6 +90,7 @@ struct intel_ring_hangcheck { u64 acthd; u64 max_acthd; u32 seqno; + unsigned user_interrupts; int score; enum intel_ring_hangcheck_action action; int deadlock; @@ -323,6 +324,7 @@ struct intel_engine_cs { * inspecting request list. */ u32 last_submitted_seqno; + unsigned user_interrupts; bool gpu_caches_dirty; -- 2.6.3 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 32/32] drm/i915: Flush the RPS bottom-half when the GPU idles
Make sure that the RPS bottom-half is flushed before we set the idle frequency when we decide the GPU is idle. This should prevent any races with the bottom-half and setting the idle frequency, and ensures that the bottom-half is bounded by the GPU's rpm reference taken for when it is active (i.e. between gen6_rps_busy() and gen6_rps_idle()). v2: Avoid recursively using the i915->wq - RPS does not touch the struct_mutex so has no place being on the ordered i915->wq. Signed-off-by: Chris WilsonCc: Imre Deak Cc: Jesse Barnes --- drivers/gpu/drm/i915/i915_irq.c | 2 +- drivers/gpu/drm/i915/intel_pm.c | 10 +++--- 2 files changed, 8 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index e5e307654c66..4cfbd694b3a8 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -1609,7 +1609,7 @@ static void gen6_rps_irq_handler(struct drm_i915_private *dev_priv, u32 pm_iir) gen6_disable_pm_irq(dev_priv, pm_iir & dev_priv->pm_rps_events); if (dev_priv->rps.interrupts_enabled) { dev_priv->rps.pm_iir |= pm_iir & dev_priv->pm_rps_events; - queue_work(dev_priv->wq, _priv->rps.work); + schedule_work(_priv->rps.work); } spin_unlock(_priv->irq_lock); } diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c index 570628628a90..f543f897c516 100644 --- a/drivers/gpu/drm/i915/intel_pm.c +++ b/drivers/gpu/drm/i915/intel_pm.c @@ -4401,11 +4401,15 @@ void gen6_rps_busy(struct drm_i915_private *dev_priv) void gen6_rps_idle(struct drm_i915_private *dev_priv) { - struct drm_device *dev = dev_priv->dev; + /* Flush our bottom-half so that it does not race with us +* setting the idle frequency and so that it is bounded by +* our rpm wakeref. 
+*/ + flush_work(_priv->rps.work); mutex_lock(_priv->rps.hw_lock); if (dev_priv->rps.enabled) { - if (IS_VALLEYVIEW(dev) || IS_CHERRYVIEW(dev)) + if (IS_VALLEYVIEW(dev_priv) || IS_CHERRYVIEW(dev_priv)) vlv_set_rps_idle(dev_priv); else gen6_set_rps(dev_priv->dev, dev_priv->rps.idle_freq); @@ -4443,7 +4447,7 @@ void gen6_rps_boost(struct drm_i915_private *dev_priv, spin_lock_irq(_priv->irq_lock); if (dev_priv->rps.interrupts_enabled) { dev_priv->rps.client_boost = true; - queue_work(dev_priv->wq, _priv->rps.work); + schedule_work(_priv->rps.work); } spin_unlock_irq(_priv->irq_lock); -- 2.6.3 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 25/32] drm/i915: Convert trace-irq to the breadcrumb waiter
If we convert the tracing over from direct use of ring->irq_get() and over to the breadcrumb infrastructure, we only have a single user of the ring->irq_get and so we will be able to simplify the driver routines (eliminating the redundant validation and irq refcounting). Signed-off-by: Chris Wilson--- drivers/gpu/drm/i915/i915_drv.h | 8 --- drivers/gpu/drm/i915/i915_gem.c | 6 - drivers/gpu/drm/i915/i915_trace.h| 2 +- drivers/gpu/drm/i915/intel_breadcrumbs.c | 39 drivers/gpu/drm/i915/intel_ringbuffer.h | 4 +++- 5 files changed, 43 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 37f4ef59fb4a..dabfb043362f 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -3625,12 +3625,4 @@ wait_remaining_ms_from_jiffies(unsigned long timestamp_jiffies, int to_wait_ms) schedule_timeout_uninterruptible(remaining_jiffies); } } - -static inline void i915_trace_irq_get(struct intel_engine_cs *ring, - struct drm_i915_gem_request *req) -{ - if (ring->trace_irq_req == NULL && ring->irq_get(ring)) - i915_gem_request_assign(>trace_irq_req, req); -} - #endif diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 78bcd231b100..fdd9dd5296e9 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -2907,12 +2907,6 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring) i915_gem_object_retire__read(obj, ring->id); } - if (unlikely(ring->trace_irq_req && -i915_gem_request_completed(ring->trace_irq_req))) { - ring->irq_put(ring); - i915_gem_request_assign(>trace_irq_req, NULL); - } - WARN_ON(i915_verify_lists(ring->dev)); } diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h index efca75bcace3..628008e6c24f 100644 --- a/drivers/gpu/drm/i915/i915_trace.h +++ b/drivers/gpu/drm/i915/i915_trace.h @@ -503,7 +503,7 @@ TRACE_EVENT(i915_gem_ring_dispatch, __entry->ring = ring->id; __entry->seqno = 
i915_gem_request_get_seqno(req); __entry->flags = flags; - i915_trace_irq_get(ring, req); + intel_breadcrumbs_enable_trace(req); ), TP_printk("dev=%u, ring=%u, seqno=%u, flags=%x", diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c index 69b966b4f71b..ea5ee3f7fe01 100644 --- a/drivers/gpu/drm/i915/intel_breadcrumbs.c +++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c @@ -258,17 +258,56 @@ void intel_engine_remove_breadcrumb(struct intel_engine_cs *engine, spin_unlock(>lock); } +static void intel_breadcrumbs_tracer(struct work_struct *work) +{ + struct intel_breadcrumbs *b = + container_of(work, struct intel_breadcrumbs, trace); + struct intel_rps_client rps; + + INIT_LIST_HEAD(); + + do { + struct drm_i915_gem_request *request; + + spin_lock(>lock); + request = b->trace_request; + b->trace_request = NULL; + spin_unlock(>lock); + if (request == NULL) + return; + + __i915_wait_request(request, true, NULL, ); + i915_gem_request_unreference__unlocked(request); + } while (1); +} + +void intel_breadcrumbs_enable_trace(struct drm_i915_gem_request *request) +{ + struct intel_breadcrumbs *b = >ring->breadcrumbs; + + spin_lock(>lock); + if (b->trace_request == NULL) { + b->trace_request = i915_gem_request_reference(request); + queue_work(system_long_wq, >trace); + } + spin_unlock(>lock); +} + void intel_engine_init_breadcrumbs(struct intel_engine_cs *engine) { struct intel_breadcrumbs *b = >breadcrumbs; spin_lock_init(>lock); setup_timer(>fake_irq, intel_breadcrumbs_fake_irq, (unsigned long)b); + INIT_WORK(>trace, intel_breadcrumbs_tracer); } void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine) { struct intel_breadcrumbs *b = >breadcrumbs; + cancel_work_sync(>trace); + i915_gem_request_unreference(b->trace_request); + del_timer_sync(>fake_irq); } diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h index a35c17106f4b..0fd6395f1a1b 100644 --- 
a/drivers/gpu/drm/i915/intel_ringbuffer.h +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h @@ -182,6 +182,8 @@ struct intel_engine_cs { struct rb_root requests; /* sorted by retirement */ struct task_struct *first_waiter; /*
[Intel-gfx] [PATCH 30/32] drm/i915: Restore waitboost credit to the synchronous waiter
Ideally, we want to automagically have the GPU respond to the instantaneous load by reclocking itself. However, reclocking occurs relatively slowly, and to the client waiting for a result from the GPU, too late. To compensate and reduce the client latency, we allow the first wait from a client to boost the GPU clocks to maximum. This overcomes the lag in autoreclocking, at the expense of forcing the GPU clocks too high. So to offset the excessive power usage, we currently allow a client to only boost the clocks once before we detect the GPU is idle again. This works reasonably for say the first frame in a benchmark, but for many more synchronous workloads (like OpenCL) we find the GPU clocks remain too low. By noting a wait which would idle the GPU (i.e. we just waited upon the last known request), we can give that client the idle boost credit (for their next wait) without the 100ms delay required for us to detect the GPU idle state. The intention is to boost clients that are stalling in the process of feeding the GPU more work (and who in doing so let the GPU idle), without granting boost credits to clients that are throttling themselves (such as compositors). Signed-off-by: Chris WilsonCc: "Zou, Nanhai" Cc: Jesse Barnes Reviewed-by: Jesse Barnes --- drivers/gpu/drm/i915/i915_gem.c | 16 1 file changed, 16 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index d1a7a7f8f3ad..a0584cffa7cd 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1340,6 +1340,22 @@ out: *timeout = 0; } + if (ret == 0 && rps && req->seqno == req->ring->last_submitted_seqno) { + /* The GPU is now idle and this client has stalled. +* Since no other client has submitted a request in the +* meantime, assume that this client is the only one +* supplying work to the GPU but is unable to keep that +* work supplied because it is waiting. 
Since the GPU is +* then never kept fully busy, RPS autoclocking will +* keep the clocks relatively low, causing further delays. +* Compensate by giving the synchronous client credit for +* a waitboost next time. +*/ + spin_lock(>i915->rps.client_lock); + list_del_init(>link); + spin_unlock(>i915->rps.client_lock); + } + return ret; } -- 2.6.3 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 28/32] drm/i915: Remove debug noise on detecting fault-injection of missed interrupts
Since the tests can and do explicitly check debugfs/i915_ring_missed_irqs for the handling of a "missed interrupt", adding it to the dmesg at INFO is just noise. When it happens for real, we still class it as an ERROR. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_irq.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index e864ebeef4ef..e5e307654c66 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -2995,9 +2995,6 @@ static void i915_hangcheck_elapsed(struct work_struct *work) if (!test_bit(ring->id, &dev_priv->gpu_error.test_irq_rings)) DRM_ERROR("Hangcheck timer elapsed... %s idle\n", ring->name); - else - DRM_INFO("Fake missed irq on %s\n", -ring->name); intel_engine_enable_fake_irq(ring); } -- 2.6.3
[Intel-gfx] [PATCH 24/32] drm/i915: On GPU reset, set the HWS breadcrumb to the last seqno
After a GPU reset, once we discard all of the incomplete requests, mark the GPU as having advanced to the last_submitted_seqno (as having completed the requests and ready for fresh work). The impact of this is negligible, as all the requests will be considered completed by this point; it just brings the HWS into line with expectations for external viewers. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_gem.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index ca327c0e73f1..78bcd231b100 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -2836,6 +2836,8 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv, buffer->last_retired_head = buffer->tail; intel_ring_update_space(buffer); } + + intel_ring_init_seqno(ring, ring->last_submitted_seqno); } void i915_gem_reset(struct drm_device *dev) -- 2.6.3
[Intel-gfx] [PATCH 23/32] drm/i915: Only query timestamp when measuring elapsed time
Avoid the two calls to ktime_get_raw_ns() (at best it reads the TSC) as we only need to compute the elapsed time for a timed wait. Signed-off-by: Chris Wilson--- drivers/gpu/drm/i915/i915_gem.c | 13 + 1 file changed, 5 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index d595d72e53b1..ca327c0e73f1 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1222,7 +1222,6 @@ int __i915_wait_request(struct drm_i915_gem_request *req, int state = interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE; struct intel_breadcrumb wait; unsigned long timeout_remain; - s64 before, now; int ret = 0; might_sleep(); @@ -1241,13 +1240,12 @@ int __i915_wait_request(struct drm_i915_gem_request *req, if (*timeout == 0) return -ETIME; + /* Record current time in case interrupted, or wedged */ timeout_remain = nsecs_to_jiffies_timeout(*timeout); + *timeout += ktime_get_raw_ns(); } - /* Record current time in case interrupted by signal, or wedged */ trace_i915_gem_request_wait_begin(req); - before = ktime_get_raw_ns(); - if (INTEL_INFO(req->i915)->gen >= 6) gen6_rps_boost(req->i915, rps, req->emitted_jiffies); @@ -1324,13 +1322,12 @@ wakeup: set_task_state(wait.task, state); out: intel_engine_remove_breadcrumb(req->ring, ); __set_task_state(wait.task, TASK_RUNNING); - now = ktime_get_raw_ns(); trace_i915_gem_request_wait_end(req); if (timeout) { - s64 tres = *timeout - (now - before); - - *timeout = tres < 0 ? 0 : tres; + *timeout -= ktime_get_raw_ns(); + if (*timeout < 0) + *timeout = 0; /* * Apparently ktime isn't accurate enough and occasionally has a -- 2.6.3 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH V4 2/2] drm/i915: start adding dp mst audio
On Fri, 11 Dec 2015 11:43:51 +0100, Takashi Iwai wrote: > > On Fri, 11 Dec 2015 07:07:53 +0100, > Libin Yang wrote: > > > > >>> diff --git a/drivers/gpu/drm/i915/intel_audio.c > > >>> b/drivers/gpu/drm/i915/intel_audio.c > > >>> index 9aa83e7..5ad2e66 100644 > > >>> --- a/drivers/gpu/drm/i915/intel_audio.c > > >>> +++ b/drivers/gpu/drm/i915/intel_audio.c > > >>> @@ -262,7 +262,8 @@ static void hsw_audio_codec_disable(struct > > >>> intel_encoder *encoder) > > >>> tmp |= AUD_CONFIG_N_PROG_ENABLE; > > >>> tmp &= ~AUD_CONFIG_UPPER_N_MASK; > > >>> tmp &= ~AUD_CONFIG_LOWER_N_MASK; > > >>> - if (intel_pipe_has_type(intel_crtc, INTEL_OUTPUT_DISPLAYPORT)) > > >>> + if (intel_pipe_has_type(intel_crtc, INTEL_OUTPUT_DISPLAYPORT) || > > >>> + intel_pipe_has_type(intel_crtc, INTEL_OUTPUT_DP_MST)) > > >>> tmp |= AUD_CONFIG_N_VALUE_INDEX; > > The same check is missing in hsw_audio_codec_enable()? > > > >>> I915_WRITE(HSW_AUD_CFG(pipe), tmp); > > >>> > > >>> @@ -474,7 +475,8 @@ static void ilk_audio_codec_enable(struct > > >>> drm_connector *connector, > > >>> tmp &= ~AUD_CONFIG_N_VALUE_INDEX; > > >>> tmp &= ~AUD_CONFIG_N_PROG_ENABLE; > > >>> tmp &= ~AUD_CONFIG_PIXEL_CLOCK_HDMI_MASK; > > >>> - if (intel_pipe_has_type(intel_crtc, INTEL_OUTPUT_DISPLAYPORT)) > > >>> + if (intel_pipe_has_type(intel_crtc, INTEL_OUTPUT_DISPLAYPORT) || > > >>> + intel_pipe_has_type(intel_crtc, INTEL_OUTPUT_DP_MST)) > > >>> tmp |= AUD_CONFIG_N_VALUE_INDEX; > > ... and missing for ilk_audio_codec_disable()? 
> > > >>> else
> > >>> tmp |= audio_config_hdmi_pixel_clock(adjusted_mode);
> > >>> @@ -512,7 +514,8 @@ void intel_audio_codec_enable(struct intel_encoder
> > >>> *intel_encoder)
> > >>>
> > >>> /* ELD Conn_Type */
> > >>> connector->eld[5] &= ~(3 << 2);
> > >>> - if (intel_pipe_has_type(crtc, INTEL_OUTPUT_DISPLAYPORT))
> > >>> + if (intel_pipe_has_type(crtc, INTEL_OUTPUT_DISPLAYPORT) ||
> > >>> + intel_pipe_has_type(crtc, INTEL_OUTPUT_DP_MST))

> IMO, it's better to have a macro to cover this two-line check instead
> of open-coding it at each place. We'll have 5 places in the end.

Also, this patch still has an issue with the encoder type, namely, it passes intel_encoder from MST, where you can't apply enc_to_dig_port(). We need another helper to get the digital port depending on the encoder type, e.g.

static struct intel_digital_port *
intel_encoder_to_dig_port(struct intel_encoder *intel_encoder)
{
	struct drm_encoder *encoder = &intel_encoder->base;

	if (intel_encoder->type == INTEL_OUTPUT_DP_MST)
		return enc_to_mst(encoder)->primary;
	return enc_to_dig_port(encoder);
}

Takashi
[Intel-gfx] [PATCH] drm/i915: Allow objects to go back above 4GB in the address range
We detected whether objects should be moved to the lower part of the address space when the 48-bit support flag was not set, but not the other way around. This handles the case in which an object was allocated in the 32-bit address range but has been marked as safe to move above it, which should help to keep the lower addresses available for objects that really need to be there.

Cc: Daniele Ceraolo Spurio
Signed-off-by: Michel Thierry
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 8df5b96..a83916e 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -696,6 +696,11 @@ eb_vma_misplaced(struct i915_vma *vma)
 	    (vma->node.start + vma->node.size - 1) >> 32)
 		return true;
 
+	/* keep the lower addresses free of unnecessary objects */
+	if ((entry->flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS) &&
+	    !((vma->node.start + vma->node.size - 1) >> 32))
+		return true;
+
 	return false;
 }
-- 
2.6.3
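The two complementary placement rules can be modelled in isolation as follows. This is an illustrative sketch, not the driver code: the execbuffer entry flags and VMA fields are reduced to plain parameters.

```c
#include <stdbool.h>
#include <stdint.h>

/* An object that cannot use 48-bit addresses must end below 4GiB; one
 * flagged as 48-bit safe is reported as misplaced when it occupies the
 * low range, so rebinding moves it up and keeps the lower addresses
 * free for objects that need them. */
static bool vma_misplaced_sketch(uint64_t start, uint64_t size,
				 bool supports_48b)
{
	bool ends_above_4g = ((start + size - 1) >> 32) != 0;

	if (!supports_48b && ends_above_4g)
		return true;	/* must be moved down below 4GiB */
	if (supports_48b && !ends_above_4g)
		return true;	/* the new rule: move it out of the low range */
	return false;
}
```

Before this patch only the first test existed; the second is the one the diff above adds.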
Re: [Intel-gfx] [PATCH v2] PM / Runtime: Introduce pm_runtime_get_noidle
On Friday, December 11, 2015 01:03:50 PM Ulf Hansson wrote:
> [...]
>
> >> > Which basically means you can call pm_runtime_resume() just fine,
> >> > because it will do nothing if the status is RPM_ACTIVE already.
> >> >
> >> > So really, why don't you use pm_runtime_get_sync()?
> >>
> >> The difference would be that if the status is not RPM_ACTIVE already we
> >> would drop the reference and report an error. The caller would in this
> >> case forego doing something, since the device is suspended or on
> >> the way to being suspended. One example of such a scenario is a
> >> watchdog-like functionality: the watchdog work would
> >> call pm_runtime_get_noidle() and check if the device is ok by doing
> >> some HW access, but only if the device is powered. Otherwise the work
> >> item would do nothing (meaning it also won't reschedule itself). The
> >> watchdog work would get rescheduled next time the device is woken up
> >> and some work is submitted to the device.
> >
> > So first of all the name "pm_runtime_get_noidle" doesn't make sense.
> >
> > I guess what you need is something like
> >
> > bool pm_runtime_get_if_active(struct device *dev)
> > {
> > 	unsigned long flags;
> > 	bool ret;
> >
> > 	spin_lock_irqsave(&dev->power.lock, flags);
> > 	if (dev->power.runtime_status == RPM_ACTIVE) {
> > 		atomic_inc(&dev->power.usage_count);
> > 		ret = true;
> > 	} else {
> > 		ret = false;
> > 	}
> > 	spin_unlock_irqrestore(&dev->power.lock, flags);
> >
> > 	return ret;
> > }
> >
> > and the caller will simply bail out if "false" is returned, but if "true"
> > is returned, it will have to drop the usage count, right?
> >
> > Thanks,
> > Rafael
>
> Why not just:
>
> pm_runtime_get_noresume():
> if (RPM_ACTIVE)
> 	"do some actions"
> pm_runtime_put();

Because that's racy? What if rpm_suspend() is running for the device, but it hasn't changed the status yet?

Thanks,
Rafael
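The race being pointed out is that a status check and a usage-count increment done as two separate steps leave a window for a suspend to slip in between them. A toy single-lock model of the atomic check-and-get pattern (names invented here, not the real runtime-PM API):

```c
#include <pthread.h>
#include <stdbool.h>

/* Toy device: status == 1 models RPM_ACTIVE. */
struct toy_dev {
	pthread_mutex_t lock;
	int status;
	int usage_count;
};

/* Check the status and take the reference under one lock, so no
 * concurrent suspend can interleave between the check and the get. */
static bool toy_get_if_active(struct toy_dev *dev)
{
	bool ret = false;

	pthread_mutex_lock(&dev->lock);
	if (dev->status == 1) {
		dev->usage_count++;	/* taken atomically with the check */
		ret = true;
	}
	pthread_mutex_unlock(&dev->lock);
	return ret;
}
```

The get_noresume()-then-check variant does the increment before looking at the status, so a suspend that is already past the status check but not yet visible can still race with the caller's hardware access.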
Re: [Intel-gfx] [PATCH v3] drm/i915: Avoid writing relocs with addresses in non-canonical form
On 12/11/2015 2:13 PM, Michał Winiarski wrote: According to bspec, some parts of HW require the addresses to be in a canonical form, where bits [63:48] == [47]. Let's convert addresses to canonical form prior to relocating and return converted offsets to userspace. We also need to make sure that userspace is using addresses in canonical form in case of softpin. v2: Whitespace fixup, gen8_canonical_addr description (Chris, Ville) v3: Rebase on top of softpin, fix a hole in relocate_entry, s/expect/require (Chris) Cc: Chris WilsonCc: Michel Thierry Cc: Ville Syrjälä Signed-off-by: Michał Winiarski With updated gem_softpin [http://patchwork.freedesktop.org/patch/msgid/1449843255-32640-1-git-send-email-michel.thie...@intel.com] Tested-by: Michel Thierry --- drivers/gpu/drm/i915/i915_gem.c| 9 +++-- drivers/gpu/drm/i915/i915_gem_execbuffer.c | 21 +++-- drivers/gpu/drm/i915/i915_gem_gtt.h| 12 3 files changed, 34 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 8e2acde..b83207b 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -3482,12 +3482,17 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj, if (flags & PIN_OFFSET_FIXED) { uint64_t offset = flags & PIN_OFFSET_MASK; + uint64_t noncanonical_offset = offset & ((1ULL << 48) - 1); - if (offset & (alignment - 1) || offset + size > end) { + if (offset & (alignment - 1) || + noncanonical_offset + size > end || + offset != gen8_canonical_addr(offset)) { ret = -EINVAL; goto err_free_vma; } - vma->node.start = offset; + /* While userspace is using addresses in canonical form, our +* allocator is unaware of this */ + vma->node.start = noncanonical_offset; vma->node.size = size; vma->node.color = obj->cache_level; ret = drm_mm_reserve_node(>mm, >node); diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c index 48ec484..445ccc7 100644 --- 
a/drivers/gpu/drm/i915/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c @@ -249,6 +249,13 @@ static inline int use_cpu_reloc(struct drm_i915_gem_object *obj) obj->cache_level != I915_CACHE_NONE); } +static inline uint64_t +relocation_target(struct drm_i915_gem_relocation_entry *reloc, + uint64_t target_offset) +{ + return gen8_canonical_addr((int)reloc->delta + target_offset); +} + static int relocate_entry_cpu(struct drm_i915_gem_object *obj, struct drm_i915_gem_relocation_entry *reloc, @@ -256,7 +263,7 @@ relocate_entry_cpu(struct drm_i915_gem_object *obj, { struct drm_device *dev = obj->base.dev; uint32_t page_offset = offset_in_page(reloc->offset); - uint64_t delta = reloc->delta + target_offset; + uint64_t delta = relocation_target(reloc, target_offset); char *vaddr; int ret; @@ -292,7 +299,7 @@ relocate_entry_gtt(struct drm_i915_gem_object *obj, { struct drm_device *dev = obj->base.dev; struct drm_i915_private *dev_priv = dev->dev_private; - uint64_t delta = reloc->delta + target_offset; + uint64_t delta = relocation_target(reloc, target_offset); uint64_t offset; void __iomem *reloc_page; int ret; @@ -347,7 +354,7 @@ relocate_entry_clflush(struct drm_i915_gem_object *obj, { struct drm_device *dev = obj->base.dev; uint32_t page_offset = offset_in_page(reloc->offset); - uint64_t delta = (int)reloc->delta + target_offset; + uint64_t delta = relocation_target(reloc, target_offset); char *vaddr; int ret; @@ -395,7 +402,7 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj, target_i915_obj = target_vma->obj; target_obj = _vma->obj->base; - target_offset = target_vma->node.start; + target_offset = gen8_canonical_addr(target_vma->node.start); /* Sandybridge PPGTT errata: We need a global gtt mapping for MI and * pipe_control writes because the gpu doesn't properly redirect them @@ -583,6 +590,7 @@ i915_gem_execbuffer_reserve_vma(struct i915_vma *vma, struct drm_i915_gem_object *obj = vma->obj; struct drm_i915_gem_exec_object2 
*entry = vma->exec_entry; uint64_t flags; + uint64_t offset; int ret; flags = PIN_USER; @@ -625,8 +633,9 @@ i915_gem_execbuffer_reserve_vma(struct i915_vma *vma, entry->flags |= __EXEC_OBJECT_HAS_FENCE;
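The sign-extension at the heart of the patch (bits [63:48] must equal bit [47]) can be written as a standalone helper. This is a sketch of the conversion described above, shown out of context:

```c
#include <stdint.h>

/* Canonical form requires bits [63:48] to copy bit [47]: shift bit 47
 * up to the sign bit while the value is still unsigned, then let an
 * arithmetic right shift replicate it back down. */
static uint64_t canonical_addr_sketch(uint64_t address)
{
	return (uint64_t)((int64_t)(address << 16) >> 16);
}
```

Addresses already below bit 47 pass through unchanged, which is why the allocator can keep working on the non-canonical (masked) form while userspace sees canonical offsets.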
[Intel-gfx] [RFC 31/38] drm/i915/preempt: scheduler logic for landing preemptive requests
From: Dave GordonThis patch adds the GEM & scheduler logic for detection and first-stage processing of completed preemption requests. Similar to regular batches, they deposit their sequence number in the hardware status page when starting and again when finished, but using different locations so that information pertaining to a preempted batch is not overwritten. Also, the in-progress flag is not by the GPU cleared at the end of the batch; instead driver software is responsible for clearing this once the request completion has been noticed. Actually-preemptive requests are still disabled via a module parameter at this early stage, as the rest of the logic to deal with the consequences of preemption isn't in place yet. v2: Re-worked to simplify 'pre-emption in progress' logic. For: VIZ-2021 Signed-off-by: Dave Gordon --- drivers/gpu/drm/i915/i915_gem.c | 55 -- drivers/gpu/drm/i915/i915_scheduler.c | 70 + drivers/gpu/drm/i915/i915_scheduler.h | 3 +- drivers/gpu/drm/i915/intel_ringbuffer.h | 1 + 4 files changed, 107 insertions(+), 22 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 66c9a58..ea3d224 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -2489,6 +2489,14 @@ i915_gem_init_seqno(struct drm_device *dev, u32 seqno) ring->last_irq_seqno = 0; } + /* Also reset sw batch tracking state */ + for_each_ring(ring, dev_priv, i) { + intel_write_status_page(ring, I915_BATCH_DONE_SEQNO, 0); + intel_write_status_page(ring, I915_BATCH_ACTIVE_SEQNO, 0); + intel_write_status_page(ring, I915_PREEMPTIVE_DONE_SEQNO, 0); + intel_write_status_page(ring, I915_PREEMPTIVE_ACTIVE_SEQNO, 0); + } + return 0; } @@ -2831,15 +2839,18 @@ void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked) return; } - seqno = ring->get_seqno(ring, false); + seqno = ring->get_seqno(ring, false); trace_i915_gem_request_notify(ring, seqno); - if (seqno == ring->last_irq_seqno) + + /* Is there anything 
new to process? */ + if ((seqno == ring->last_irq_seqno) && !i915_scheduler_is_ring_preempting(ring)) return; - ring->last_irq_seqno = seqno; if (!fence_locked) spin_lock_irqsave(>fence_lock, flags); + ring->last_irq_seqno = seqno; + list_for_each_entry_safe(req, req_next, >fence_signal_list, signal_link) { if (!req->cancelled) { /* How can this happen? */ @@ -2861,7 +2872,7 @@ void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked) * and call scheduler_clean() while the scheduler * thinks it is still active. */ - wake_sched |= i915_scheduler_notify_request(req); + wake_sched |= i915_scheduler_notify_request(req, false); if (!req->cancelled) { fence_signal_locked(>fence); @@ -2877,6 +2888,42 @@ void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked) list_add_tail(>unsignal_link, >fence_unsignal_list); } + if (i915_scheduler_is_ring_preempting(ring)) { + u32 preempt_start, preempt_done; + + preempt_start = intel_read_status_page(ring, I915_PREEMPTIVE_ACTIVE_SEQNO); + preempt_done = intel_read_status_page(ring, I915_PREEMPTIVE_DONE_SEQNO); + + /* +* A preemption request leaves both ACTIVE and DONE set to the same +* seqno. If we find ACTIVE set but DONE is different, the preemption +* has started but not yet completed, so leave it until next time. +* After successfully processing a preemption request, we clear ACTIVE +* below to ensure we don't see it again. +*/ + if (preempt_start && preempt_done == preempt_start) { + bool sched_ack = false; + + list_for_each_entry_safe(req, req_next, >fence_signal_list, signal_link) { + if (req->seqno == preempt_done) { + /* De-list and notify the scheduler, but don't signal yet */ + list_del_init(>signal_link); + sched_ack = i915_scheduler_notify_request(req, true); + break; + } + } + + WARN_ON(!sched_ack); + wake_sched = true; + + /* Capture BATCH ACTIVE to determine whether a batch was in
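The ACTIVE/DONE protocol described in the comment above reduces to a small predicate. A hedged standalone sketch (the function name is invented here):

```c
#include <stdbool.h>
#include <stdint.h>

/* A preemption request writes the same seqno to ACTIVE when it starts
 * and to DONE when it finishes. ACTIVE set with a different DONE means
 * the preemption is still in flight; ACTIVE == 0 means none is
 * pending (the driver clears ACTIVE after processing a completion). */
static bool preemption_completed_sketch(uint32_t active, uint32_t done)
{
	return active != 0 && done == active;
}
```

Using two separate status-page slots is what lets the driver distinguish "started but not finished" from "finished and awaiting software acknowledgement".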
[Intel-gfx] [RFC 32/38] drm/i915/preempt: add hook to catch 'unexpected' ring submissions
From: Dave GordonAuthor: John Harrison Date: Thu Apr 10 10:41:06 2014 +0100 The scheduler needs to know what each seqno that pops out of the ring is referring to. This change adds a hook into the the 'submit some random work that got forgotten about' clean up code to inform the scheduler that a new seqno has been sent to the ring for some non-batch buffer operation. Reworked for latest scheduler+preemption by Dave Gordon: with the newer implementation, knowing about untracked requests is merely helpful for debugging rather than being mandatory, as we have already taken steps to prevent untracked requests intruding at awkward moments! v2: Removed unnecessary debug spew. For: VIZ-2021 Signed-off-by: John Harrison Signed-off-by: Dave Gordon --- drivers/gpu/drm/i915/i915_gem.c | 4 drivers/gpu/drm/i915/i915_gpu_error.c | 2 ++ drivers/gpu/drm/i915/i915_scheduler.c | 21 + drivers/gpu/drm/i915/i915_scheduler.h | 1 + 4 files changed, 28 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index ea3d224..a91b916 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -2597,6 +2597,10 @@ void __i915_add_request(struct drm_i915_gem_request *request, WARN_ON(request->seqno != dev_priv->last_seqno); } + /* Notify the scheduler, if it doesn't already track this request */ + if (!request->scheduler_qe) + i915_scheduler_fly_request(request); + /* Record the position of the start of the request so that * should we detect the updated seqno part-way through the * GPU processing the request, we never over-estimate the diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c index 2d9dd3f..72c861e 100644 --- a/drivers/gpu/drm/i915/i915_gpu_error.c +++ b/drivers/gpu/drm/i915/i915_gpu_error.c @@ -1331,6 +1331,8 @@ static void i915_gem_record_rings(struct drm_device *dev, erq->ringbuffer_gtt = i915_gem_obj_ggtt_offset(request->ringbuf->obj); erq->scheduler_state = !sqe ? 
'u' : i915_scheduler_queue_status_chr(sqe->status); + if (request->scheduler_flags & i915_req_sf_untracked) + erq->scheduler_state = 'U'; } } } diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index 54b6c32..8cd89d2 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -455,6 +455,27 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe) return 0; } +/* An untracked request is being launched ... */ +void i915_scheduler_fly_request(struct drm_i915_gem_request *req) +{ + struct drm_i915_private *dev_priv = req->i915; + struct i915_scheduler *scheduler = dev_priv->scheduler; + + BUG_ON(!scheduler); + BUG_ON(!mutex_is_locked(_priv->dev->struct_mutex)); + + /* This shouldn't happen */ + WARN_ON(i915_scheduler_is_ring_busy(req->ring)); + + /* We don't expect to see nodes that are already tracked */ + if (!WARN_ON(req->scheduler_qe)) { + /* Untracked node, must not be inside scheduler submission path */ + WARN_ON((scheduler->flags[req->ring->id] & i915_sf_submitting)); + scheduler->stats[req->ring->id].non_batch++; + req->scheduler_flags |= i915_req_sf_untracked; + } +} + static int i915_scheduler_fly_node(struct i915_scheduler_queue_entry *node) { struct drm_i915_private *dev_priv = node->params.dev->dev_private; diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h index 5b871b0..7e7e974 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.h +++ b/drivers/gpu/drm/i915/i915_scheduler.h @@ -199,6 +199,7 @@ booli915_scheduler_is_ring_flying(struct intel_engine_cs *ring); booli915_scheduler_is_ring_preempting(struct intel_engine_cs *ring); booli915_scheduler_is_ring_busy(struct intel_engine_cs *ring); voidi915_gem_scheduler_work_handler(struct work_struct *work); +voidi915_scheduler_fly_request(struct drm_i915_gem_request *req); int i915_scheduler_flush(struct intel_engine_cs *ring, bool is_locked); int 
i915_scheduler_flush_stamp(struct intel_engine_cs *ring, unsigned long stamp, bool is_locked);
-- 
1.9.1
[Intel-gfx] [PATCH 12/40] drm/i915: Added scheduler hook when closing DRM file handles
From: John HarrisonThe scheduler decouples the submission of batch buffers to the driver with submission of batch buffers to the hardware. Thus it is possible for an application to close its DRM file handle while there is still work outstanding. That means the scheduler needs to know about file close events so it can remove the file pointer from such orphaned batch buffers and not attempt to dereference it later. v3: Updated to not wait for outstanding work to complete but merely remove the file handle reference. The wait was getting excessively complicated with inter-ring dependencies, pre-emption, and other such issues. Change-Id: I24ac056c062b075ff1cc5e2ed2d3fa8e17e85951 For: VIZ-1587 Signed-off-by: John Harrison --- drivers/gpu/drm/i915/i915_dma.c | 3 +++ drivers/gpu/drm/i915/i915_scheduler.c | 35 +++ drivers/gpu/drm/i915/i915_scheduler.h | 2 ++ 3 files changed, 40 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c index 731cf31..c2f9c03 100644 --- a/drivers/gpu/drm/i915/i915_dma.c +++ b/drivers/gpu/drm/i915/i915_dma.c @@ -46,6 +46,7 @@ #include #include #include +#include "i915_scheduler.h" #include #include #include @@ -1250,6 +1251,8 @@ void i915_driver_lastclose(struct drm_device *dev) void i915_driver_preclose(struct drm_device *dev, struct drm_file *file) { + i915_scheduler_closefile(dev, file); + mutex_lock(>struct_mutex); i915_gem_context_close(dev, file); i915_gem_release(dev, file); diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index 344760e..5aafc96 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -768,3 +768,38 @@ static int i915_scheduler_remove_dependent(struct i915_scheduler *scheduler, return 0; } + +int i915_scheduler_closefile(struct drm_device *dev, struct drm_file *file) +{ + struct i915_scheduler_queue_entry *node; + struct drm_i915_private*dev_priv = dev->dev_private; + struct i915_scheduler *scheduler = 
dev_priv->scheduler; + struct intel_engine_cs *ring; + int i; + unsigned long flags; + + if (!scheduler) + return 0; + + spin_lock_irqsave(>lock, flags); + + for_each_ring(ring, dev_priv, i) { + list_for_each_entry(node, >node_queue[ring->id], link) { + if (node->params.file != file) + continue; + + if(!I915_SQS_IS_COMPLETE(node)) + DRM_DEBUG_DRIVER("Closing file handle with outstanding work: %d:%d/%d on %s\n", +node->params.request->uniq, +node->params.request->seqno, +node->status, +ring->name); + + node->params.file = NULL; + } + } + + spin_unlock_irqrestore(>lock, flags); + + return 0; +} diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h index 2d50d83..02ac6f2 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.h +++ b/drivers/gpu/drm/i915/i915_scheduler.h @@ -86,6 +86,8 @@ enum { booli915_scheduler_is_enabled(struct drm_device *dev); int i915_scheduler_init(struct drm_device *dev); +int i915_scheduler_closefile(struct drm_device *dev, +struct drm_file *file); int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe); booli915_scheduler_notify_request(struct drm_i915_gem_request *req); -- 1.9.1 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 17/40] drm/i915: Added tracking/locking of batch buffer objects
From: John HarrisonThe scheduler needs to track interdependencies between batch buffers. These are calculated by analysing the object lists of the buffers and looking for commonality. The scheduler also needs to keep those buffers locked long after the initial IOCTL call has returned to user land. v3: Updated to support read-read optimisation. Change-Id: I31e3677ecfc2c9b5a908bda6acc4850432d55f1e For: VIZ-1587 Signed-off-by: John Harrison --- drivers/gpu/drm/i915/i915_gem_execbuffer.c | 52 -- drivers/gpu/drm/i915/i915_scheduler.c | 33 +-- 2 files changed, 80 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c index 2c7a395..0908699 100644 --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c @@ -1418,7 +1418,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data, struct i915_execbuffer_params *params = const u32 ctx_id = i915_execbuffer2_get_context_id(*args); u32 dispatch_flags; - int ret; + int ret, i; bool need_relocs; int fd_fence_complete = -1; int fd_fence_wait = lower_32_bits(args->rsvd2); @@ -1553,6 +1553,14 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data, goto pre_mutex_err; } + qe.saved_objects = kzalloc( + sizeof(*qe.saved_objects) * args->buffer_count, + GFP_KERNEL); + if (!qe.saved_objects) { + ret = -ENOMEM; + goto err; + } + /* Look up object handles */ ret = eb_lookup_vmas(eb, exec, args, vm, file); if (ret) @@ -1673,7 +1681,30 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data, params->args_DR1= args->DR1; params->args_DR4= args->DR4; params->batch_obj = batch_obj; - params->ctx = ctx; + + /* +* Save away the list of objects used by this batch buffer for the +* purpose of tracking inter-buffer dependencies. 
+*/ + for (i = 0; i < args->buffer_count; i++) { + struct drm_i915_gem_object *obj; + + /* +* NB: 'drm_gem_object_lookup()' increments the object's +* reference count and so must be matched by a +* 'drm_gem_object_unreference' call. +*/ + obj = to_intel_bo(drm_gem_object_lookup(dev, file, + exec[i].handle)); + qe.saved_objects[i].obj = obj; + qe.saved_objects[i].read_only = obj->base.pending_write_domain == 0; + + } + qe.num_objs = i; + + /* Lock and save the context object as well. */ + i915_gem_context_reference(ctx); + params->ctx = ctx; if (args->flags & I915_EXEC_CREATE_FENCE) { /* @@ -1738,6 +1769,23 @@ err: i915_gem_context_unreference(ctx); eb_destroy(eb); + if (qe.saved_objects) { + /* Need to release the objects: */ + for (i = 0; i < qe.num_objs; i++) { + if (!qe.saved_objects[i].obj) + continue; + + drm_gem_object_unreference( + _objects[i].obj->base); + } + + kfree(qe.saved_objects); + + /* Context too */ + if (params->ctx) + i915_gem_context_unreference(params->ctx); + } + /* * If the request was created but not successfully submitted then it * must be freed again. If it was submitted then it is being tracked diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index 9d1475f..300cd89 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -158,7 +158,23 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe) if (ret) return ret; - /* Free everything that is owned by the QE structure: */ + /* Need to release the objects: */ + for (i = 0; i < qe->num_objs; i++) { + if (!qe->saved_objects[i].obj) + continue; + + drm_gem_object_unreference(>saved_objects[i].obj->base); + } + + kfree(qe->saved_objects); + qe->saved_objects = NULL; + qe->num_objs = 0; + + /* Free the context object too: */ + if (qe->params.ctx) + i915_gem_context_unreference(qe->params.ctx); + + /* And anything else owned by the
[Intel-gfx] [PATCH 32/40] drm/i915: Add early exit to execbuff_final() if insufficient ring space
From: John HarrisonOne of the major purposes of the GPU scheduler is to avoid stalling the CPU when the GPU is busy and unable to accept more work. This change adds support to the ring submission code to allow a ring space check to be performed before attempting to submit a batch buffer to the hardware. If insufficient space is available then the scheduler can go away and come back later, letting the CPU get on with other work, rather than stalling and waiting for the hardware to catch up. v3: Updated to use locally cached request pointer. Change-Id: I267159ce1150cb6714d34a49b841bcbe4bf66326 For: VIZ-1587 Signed-off-by: John Harrison --- drivers/gpu/drm/i915/i915_gem_execbuffer.c | 42 -- drivers/gpu/drm/i915/intel_lrc.c | 57 +++--- drivers/gpu/drm/i915/intel_ringbuffer.c| 24 + drivers/gpu/drm/i915/intel_ringbuffer.h| 1 + 4 files changed, 109 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c index 8ba426f..bf9d804 100644 --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c @@ -1101,25 +1101,19 @@ i915_reset_gen7_sol_offsets(struct drm_device *dev, { struct intel_engine_cs *ring = req->ring; struct drm_i915_private *dev_priv = dev->dev_private; - int ret, i; + int i; if (!IS_GEN7(dev) || ring != _priv->ring[RCS]) { DRM_DEBUG("sol reset is gen7/rcs only\n"); return -EINVAL; } - ret = intel_ring_begin(req, 4 * 3); - if (ret) - return ret; - for (i = 0; i < 4; i++) { intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1)); intel_ring_emit(ring, GEN7_SO_WRITE_OFFSET(i)); intel_ring_emit(ring, 0); } - intel_ring_advance(ring); - return 0; } @@ -1247,6 +1241,7 @@ int i915_gem_ringbuffer_submission_final(struct i915_execbuffer_params *params) struct intel_engine_cs *ring = params->ring; u64 exec_start, exec_len; int ret; + uint32_t min_space; /* The mutex must be acquired before calling this function */ BUG_ON(!mutex_is_locked(>dev->struct_mutex)); @@ 
-1268,8 +1263,36 @@ int i915_gem_ringbuffer_submission_final(struct i915_execbuffer_params *params) if (ret) return ret; + /* +* It would be a bad idea to run out of space while writing commands +* to the ring. One of the major aims of the scheduler is to not stall +* at any point for any reason. However, doing an early exit half way +* through submission could result in a partial sequence being written +* which would leave the engine in an unknown state. Therefore, check in +* advance that there will be enough space for the entire submission +* whether emitted by the code below OR by any other functions that may +* be executed before the end of final(). +* +* NB: This test deliberately overestimates, because that's easier than +* tracing every potential path that could be taken! +* +* Current measurements suggest that we may need to emit up to 744 bytes +* (186 dwords), so this is rounded up to 256 dwords here. Then we double +* that to get the free space requirement, because the block isn't allowed +* to span the transition from the end to the beginning of the ring. +*/ +#define I915_BATCH_EXEC_MAX_LEN 256/* max dwords emitted here */ + min_space = I915_BATCH_EXEC_MAX_LEN * 2 * sizeof(uint32_t); + ret = intel_ring_test_space(req->ringbuf, min_space); + if (ret) + goto early_error; + intel_runtime_pm_get(dev_priv); + ret = intel_ring_begin(req, I915_BATCH_EXEC_MAX_LEN); + if (ret) + goto error; + /* * Unconditionally invalidate gpu caches and ensure that we do flush * any residual writes from the previous batch. 
@@ -1288,10 +1311,6 @@ int i915_gem_ringbuffer_submission_final(struct i915_execbuffer_params *params) if (ring == _priv->ring[RCS] && params->instp_mode != dev_priv->relative_constants_mode) { - ret = intel_ring_begin(req, 4); - if (ret) - goto error; - intel_ring_emit(ring, MI_NOOP); intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1)); intel_ring_emit(ring, INSTPM); @@ -1328,6 +1347,7 @@ error: */ intel_runtime_pm_put(dev_priv); +early_error: if (ret) intel_ring_reserved_space_cancel(req->ringbuf); diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index 1fa3228..d6acd2d6 100644 ---
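The pre-flight test this patch relies on amounts to a wraparound-aware free-space computation on the ring. A minimal model (intel_ring_test_space() itself is added elsewhere in the series, so these names are illustrative, and the error value is a plain -1 rather than a kernel errno):

```c
#include <stdint.h>

/* Free space in a circular ring, keeping one slot unused so that
 * head == tail unambiguously means "empty". Unsigned subtraction
 * handles the wrapped case via modular arithmetic. */
static uint32_t ring_space_sketch(uint32_t head, uint32_t tail, uint32_t size)
{
	uint32_t space = head - tail;

	if ((int32_t)space <= 0)
		space += size;
	return space - 1;
}

/* Early-exit check: fail fast instead of stalling the CPU when the
 * worst-case emit would not fit in the ring right now. */
static int ring_test_space_sketch(uint32_t head, uint32_t tail,
				  uint32_t size, uint32_t min_space)
{
	return ring_space_sketch(head, tail, size) >= min_space ? 0 : -1;
}
```

The doubling of the worst-case emit size in the patch comes from the extra constraint it states: a command block may not span the end-to-beginning wrap of the ring.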
[Intel-gfx] [PATCH 33/40] drm/i915: Added scheduler statistic reporting to debugfs
From: John HarrisonIt is useful for know what the scheduler is doing for both debugging and performance analysis purposes. This change adds a bunch of counters and such that keep track of various scheduler operations (batches submitted, completed, flush requests, etc.). The data can then be read in userland via the debugfs mechanism. v2: Updated to match changes to scheduler implementation. v3: Updated for changes to kill code and flush code. Change-Id: I3266c631cd70c9eeb2c235f88f493e60462f85d7 For: VIZ-1587 Signed-off-by: John Harrison --- drivers/gpu/drm/i915/i915_debugfs.c| 77 +++ drivers/gpu/drm/i915/i915_gem_execbuffer.c | 11 +++- drivers/gpu/drm/i915/i915_scheduler.c | 85 +++--- drivers/gpu/drm/i915/i915_scheduler.h | 36 + 4 files changed, 200 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index 8f1c10c..9e7d67d 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -3603,6 +3603,82 @@ static int i915_drrs_status(struct seq_file *m, void *unused) return 0; } +static int i915_scheduler_info(struct seq_file *m, void *unused) +{ + struct drm_info_node *node = (struct drm_info_node *) m->private; + struct drm_device *dev = node->minor->dev; + struct drm_i915_private *dev_priv = dev->dev_private; + struct i915_scheduler *scheduler = dev_priv->scheduler; + struct i915_scheduler_stats *stats = scheduler->stats; + struct i915_scheduler_stats_nodes node_stats[I915_NUM_RINGS]; + struct intel_engine_cs *ring; + char str[50 * (I915_NUM_RINGS + 1)], name[50], *ptr; + int ret, i, r; + + ret = mutex_lock_interruptible(>mode_config.mutex); + if (ret) + return ret; + +#define PRINT_VAR(name, fmt, var) \ + do {\ + sprintf(str, "%-22s", name);\ + ptr = str + strlen(str);\ + for_each_ring(ring, dev_priv, r) { \ + sprintf(ptr, " %10" fmt, var); \ + ptr += strlen(ptr); \ + } \ + seq_printf(m, "%s\n", str); \ + } while (0) + + PRINT_VAR("Ring name:", "s", 
dev_priv->ring[r].name); + PRINT_VAR(" Ring seqno", "d", ring->get_seqno(ring, false)); + seq_putc(m, '\n'); + + seq_puts(m, "Batch submissions:\n"); + PRINT_VAR(" Queued", "u", stats[r].queued); + PRINT_VAR(" Submitted","u", stats[r].submitted); + PRINT_VAR(" Completed","u", stats[r].completed); + PRINT_VAR(" Expired", "u", stats[r].expired); + seq_putc(m, '\n'); + + seq_puts(m, "Flush counts:\n"); + PRINT_VAR(" By object","u", stats[r].flush_obj); + PRINT_VAR(" By request", "u", stats[r].flush_req); + PRINT_VAR(" By stamp", "u", stats[r].flush_stamp); + PRINT_VAR(" Blanket", "u", stats[r].flush_all); + PRINT_VAR(" Entries bumped", "u", stats[r].flush_bump); + PRINT_VAR(" Entries submitted","u", stats[r].flush_submit); + seq_putc(m, '\n'); + + seq_puts(m, "Miscellaneous:\n"); + PRINT_VAR(" ExecEarly retry", "u", stats[r].exec_early); + PRINT_VAR(" ExecFinal requeue","u", stats[r].exec_again); + PRINT_VAR(" ExecFinal killed", "u", stats[r].exec_dead); + PRINT_VAR(" Fence wait", "u", stats[r].fence_wait); + PRINT_VAR(" Fence wait again", "u", stats[r].fence_again); + PRINT_VAR(" Fence wait ignore","u", stats[r].fence_ignore); + PRINT_VAR(" Fence supplied", "u", stats[r].fence_got); + PRINT_VAR(" Hung flying", "u", stats[r].kill_flying); + PRINT_VAR(" Hung queued", "u", stats[r].kill_queued); + seq_putc(m, '\n'); + + seq_puts(m, "Queue contents:\n"); + for_each_ring(ring, dev_priv, i) + i915_scheduler_query_stats(ring, node_stats + ring->id); + + for (i = 0; i < (i915_sqs_MAX + 1); i++) { + sprintf(name, " %s", i915_scheduler_queue_status_str(i)); + PRINT_VAR(name, "d", node_stats[r].counts[i]); + } + seq_putc(m, '\n'); + +#undef PRINT_VAR + + mutex_unlock(>mode_config.mutex); + + return 0; +} + struct pipe_crc_info { const char *name; struct drm_device *dev; @@ -5571,6 +5647,7 @@ static const struct drm_info_list i915_debugfs_list[] = {
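The PRINT_VAR macro in the patch above builds each debugfs row by appending one formatted column per ring to a string buffer. Below is a hypothetical standalone sketch of that pattern; the engine count, struct layout, and function name are illustrative stand-ins, not the actual i915 structures.

```c
#include <stdio.h>
#include <string.h>
#include <stddef.h>

#define NUM_RINGS 3

/* Made-up per-ring counters for the example. */
struct ring_stats {
	unsigned int queued;
	unsigned int submitted;
	unsigned int completed;
};

/* Build "label  v0  v1  v2" into out, mirroring the sprintf/ptr
 * accumulation that PRINT_VAR performs for each ring. */
static void print_row_u(char *out, const char *label,
			const struct ring_stats *stats, size_t field_offset)
{
	char *ptr = out;
	int r;

	sprintf(ptr, "%-22s", label);
	ptr += strlen(ptr);
	for (r = 0; r < NUM_RINGS; r++) {
		const unsigned int *field = (const unsigned int *)
			((const char *)&stats[r] + field_offset);
		sprintf(ptr, " %10u", *field);
		ptr += strlen(ptr);
	}
}
```

The real patch wraps this in a macro so each statistic stays a one-line call site; the sketch uses an `offsetof`-based field selector instead to avoid the macro.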
[Intel-gfx] [RFC 38/38] drm/i915: Added preemption info to various trace points
From: John Harrisonv2: Fixed a typo (and improved the names in general). Updated for changes to notify() code. For: VIZ-2021 Signed-off-by: Dave Gordon --- drivers/gpu/drm/i915/i915_gem.c | 5 +++-- drivers/gpu/drm/i915/i915_scheduler.c | 2 +- drivers/gpu/drm/i915/i915_trace.h | 30 +++--- 3 files changed, 23 insertions(+), 14 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 68bf8ce..d90b12c 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -2872,12 +2872,12 @@ void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked) u32 seqno; if (list_empty(>fence_signal_list)) { - trace_i915_gem_request_notify(ring, 0); + trace_i915_gem_request_notify(ring, 0, 0, 0); return; } seqno = ring->get_seqno(ring, false); - trace_i915_gem_request_notify(ring, seqno); + trace_i915_gem_request_notify(ring, seqno, 0, 0); /* Is there anything new to process? */ if ((seqno == ring->last_irq_seqno) && !i915_scheduler_is_ring_preempting(ring)) @@ -2930,6 +2930,7 @@ void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked) preempt_start = intel_read_status_page(ring, I915_PREEMPTIVE_ACTIVE_SEQNO); preempt_done = intel_read_status_page(ring, I915_PREEMPTIVE_DONE_SEQNO); + trace_i915_gem_request_notify(ring, seqno, preempt_start, preempt_done); /* * A preemption request leaves both ACTIVE and DONE set to the same diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index e0db268..37fcd7c 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -616,7 +616,7 @@ bool i915_scheduler_notify_request(struct drm_i915_gem_request *req, unsigned long flags; bool result; - trace_i915_scheduler_landing(req); + trace_i915_scheduler_landing(req, preempt); spin_lock_irqsave(>lock, flags); diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h index 73b0ee9..5725cfa 100644 --- 
a/drivers/gpu/drm/i915/i915_trace.h +++ b/drivers/gpu/drm/i915/i915_trace.h @@ -569,13 +569,16 @@ DEFINE_EVENT(i915_gem_request, i915_gem_request_add, ); TRACE_EVENT(i915_gem_request_notify, - TP_PROTO(struct intel_engine_cs *ring, uint32_t seqno), - TP_ARGS(ring, seqno), + TP_PROTO(struct intel_engine_cs *ring, uint32_t seqno, +uint32_t preempt_start, uint32_t preempt_done), + TP_ARGS(ring, seqno, preempt_start, preempt_done), TP_STRUCT__entry( __field(u32, dev) __field(u32, ring) __field(u32, seqno) +__field(u32, preempt_start) +__field(u32, preempt_done) __field(bool, is_empty) ), @@ -583,11 +586,14 @@ TRACE_EVENT(i915_gem_request_notify, __entry->dev = ring->dev->primary->index; __entry->ring = ring->id; __entry->seqno = seqno; + __entry->preempt_start = preempt_start; + __entry->preempt_done = preempt_done; __entry->is_empty = list_empty(>fence_signal_list); ), - TP_printk("dev=%u, ring=%u, seqno=%u, empty=%d", + TP_printk("dev=%u, ring=%u, seqno=%u, preempt_start=%u, preempt_done=%u, empty=%d", __entry->dev, __entry->ring, __entry->seqno, + __entry->preempt_start, __entry->preempt_done, __entry->is_empty) ); @@ -887,25 +893,27 @@ TRACE_EVENT(i915_scheduler_unfly, ); TRACE_EVENT(i915_scheduler_landing, - TP_PROTO(struct drm_i915_gem_request *req), - TP_ARGS(req), + TP_PROTO(struct drm_i915_gem_request *req, bool preempt), + TP_ARGS(req, preempt), TP_STRUCT__entry( __field(u32, ring) __field(u32, uniq) __field(u32, seqno) __field(u32, status) +__field(bool, preempt) ), TP_fast_assign( - __entry->ring = req->ring->id; - __entry->uniq = req->uniq; - __entry->seqno = req->seqno; - __entry->status = req->scheduler_qe ? req->scheduler_qe->status : ~0U; + __entry->ring= req->ring->id; +
Re: [Intel-gfx] [PATCH 13/13] drm/i915: Cache last IRQ seqno to reduce IRQ overhead
On Fri, Dec 11, 2015 at 01:12:01PM +, john.c.harri...@intel.com wrote: > From: John Harrison> > The notify function can be called many times without the seqno > changing. A large number of these duplicate calls exist to prevent races due to the > requirement of not enabling interrupts until requested. However, when > interrupts are enabled the IRQ handler can be called multiple times > without the ring's seqno value changing. This patch reduces the > overhead of these extra calls by caching the last processed seqno > value and early exiting if it has not changed. This is just plain wrong. Every user-interrupt is preceded by a seqno update. -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
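Independent of whether the early-exit is justified (Chris's objection above), the caching mechanism the commit message describes is simple to illustrate. A hypothetical sketch with illustrative field and function names, not the real i915 request code:

```c
#include <stdint.h>

struct ring {
	uint32_t hw_seqno;       /* stand-in for ring->get_seqno() */
	uint32_t last_irq_seqno; /* cached value from the previous notify */
	unsigned int work_done;  /* counts how often the real work ran */
};

static void notify(struct ring *ring)
{
	uint32_t seqno = ring->hw_seqno;

	/* Early exit: nothing new has completed since the last call. */
	if (seqno == ring->last_irq_seqno)
		return;
	ring->last_irq_seqno = seqno;

	/* ...scan completion lists, signal fences, etc. */
	ring->work_done++;
}
```

The cost of the cache is one compare per interrupt; the saving is skipping the whole completion scan when a duplicate interrupt arrives with an unchanged seqno.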
[Intel-gfx] [PATCH 08/40] drm/i915: Start of GPU scheduler
From: John Harrison. Initial creation of scheduler source files. Note that this patch implements most of the scheduler functionality but does not hook it into the driver yet. It also leaves the scheduler code in 'pass through' mode so that even when it is hooked in, it will not actually do very much. This allows the hooks to be added one at a time in bite-size chunks and only when the scheduler is finally enabled at the end does anything start happening. The general theory of operation is that when batch buffers are submitted to the driver, the execbuffer() code assigns a unique request and then packages up all the information required to execute the batch buffer at a later time. This package is given over to the scheduler which adds it to an internal node list. The scheduler also scans the list of objects associated with the batch buffer and compares them against the objects already in use by other buffers in the node list. If matches are found then the new batch buffer node is marked as being dependent upon the matching node. The same is done for the context object. The scheduler also bumps up the priority of such matching nodes on the grounds that the more dependencies a given batch buffer has the more important it is likely to be. The scheduler aims to have a given (tuneable) number of batch buffers in flight on the hardware at any given time. If fewer than this are currently executing when a new node is queued, then the node is passed straight through to the submit function. Otherwise it is simply added to the queue and the driver returns back to user land. As each batch buffer completes, it raises an interrupt which wakes up the scheduler. Note that it is possible for multiple buffers to complete before the IRQ handler gets to run. Further, it is possible for the seqno values to be un-ordered (particularly once pre-emption is enabled). However, the scheduler keeps the list of executing buffers in order of hardware submission. 
Thus it can scan through the list until a matching seqno is found and then mark all in flight nodes from that point on as completed. A deferred work queue is also poked by the interrupt handler. When this wakes up it can do more involved processing such as actually removing completed nodes from the queue and freeing up the resources associated with them (internal memory allocations, DRM object references, context reference, etc.). The work handler also checks the in flight count and calls the submission code if a new slot has appeared. When the scheduler's submit code is called, it scans the queued node list for the highest priority node that has no unmet dependencies. Note that the dependency calculation is complex as it must take inter-ring dependencies and potential preemptions into account. Note also that in the future this will be extended to include external dependencies such as the Android Native Sync file descriptors and/or the Linux dma-buf synchronisation scheme. If a suitable node is found then it is sent to execbuff_final() for submission to the hardware. The in flight count is then re-checked and a new node popped from the list if appropriate. Note that this patch does not implement pre-emptive scheduling. Only basic scheduling by re-ordering batch buffer submission is currently implemented. v2: Changed priority levels to +/-1023 due to feedback from Chris Wilson. Removed redundant index from scheduler node. Changed time stamps to use jiffies instead of raw monotonic. This provides lower resolution but improved compatibility with other i915 code. Major re-write of completion tracking code due to struct fence conversion. The scheduler no longer has its own private IRQ handler but just lets the existing request code handle completion events. Instead, the scheduler now hooks into the request notify code to be told when a request has completed. Reduced driver mutex locking scope. Removal of scheduler nodes no longer grabs the mutex lock. 
v3: Refactor of dependency generation to make the code more readable. Also added in read-read optimisation support - i.e., don't treat a shared read-only buffer as being a dependency. Allowed the killing of queued nodes rather than only flying ones. Change-Id: I1e08f59e650a3c2bbaaa9de7627da33849b06106 For: VIZ-1587 Signed-off-by: John Harrison --- drivers/gpu/drm/i915/Makefile | 1 + drivers/gpu/drm/i915/i915_drv.h | 4 + drivers/gpu/drm/i915/i915_gem.c | 5 + drivers/gpu/drm/i915/i915_scheduler.c | 763 ++ drivers/gpu/drm/i915/i915_scheduler.h | 91 5 files changed, 864 insertions(+) create mode 100644 drivers/gpu/drm/i915/i915_scheduler.c create mode 100644 drivers/gpu/drm/i915/i915_scheduler.h diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index 15398c5..79cb38b 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -10,6 +10,7 @@
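The "given (tuneable) number of batch buffers in flight" behaviour described in the commit message can be sketched in isolation. Everything here is an illustrative stand-in, not the real i915_scheduler API: a new batch passes straight through while capacity remains, queues otherwise, and a completion frees a slot and promotes queued work.

```c
#include <stdbool.h>

struct sched {
	int flying;      /* batches currently on the hardware */
	int min_flying;  /* tunable in-flight target */
	int queued;      /* batches held back in software */
};

/* Returns true if the batch was sent to the hardware immediately. */
static bool sched_queue_batch(struct sched *s)
{
	if (s->flying < s->min_flying) {
		s->flying++;
		return true;   /* pass through to the submit path */
	}
	s->queued++;
	return false;          /* left on the software queue */
}

/* Completion side: retire one flying batch and, if anything is queued,
 * promote a ready node into the freed slot (priority selection and
 * dependency checks omitted for brevity). */
static void sched_batch_complete(struct sched *s)
{
	if (s->flying > 0)
		s->flying--;
	if (s->queued > 0 && s->flying < s->min_flying) {
		s->queued--;
		s->flying++;
	}
}
```

The real code layers dependency resolution and priority bumping on top of this skeleton; the pass-through/queue split is the core flow control.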
[Intel-gfx] [PATCH 06/40] drm/i915: Cache request pointer in *_submission_final()
From: Dave GordonKeep a local copy of the request pointer in the _final() functions rather than dereferencing the params block repeatedly. v3: New patch in series. For: VIZ-1587 Signed-off-by: Dave Gordon Signed-off-by: John Harrison --- drivers/gpu/drm/i915/i915_gem_execbuffer.c | 13 +++-- drivers/gpu/drm/i915/intel_lrc.c | 11 ++- 2 files changed, 13 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c index 05c9de6..e38310f 100644 --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c @@ -1245,6 +1245,7 @@ i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params, int i915_gem_ringbuffer_submission_final(struct i915_execbuffer_params *params) { struct drm_i915_private *dev_priv = params->dev->dev_private; + struct drm_i915_gem_request *req = params->request; struct intel_engine_cs *ring = params->ring; u64 exec_start, exec_len; int ret; @@ -1258,12 +1259,12 @@ int i915_gem_ringbuffer_submission_final(struct i915_execbuffer_params *params) * Unconditionally invalidate gpu caches and ensure that we do flush * any residual writes from the previous batch. 
*/ - ret = intel_ring_invalidate_all_caches(params->request); + ret = intel_ring_invalidate_all_caches(req); if (ret) goto error; /* Switch to the correct context for the batch */ - ret = i915_switch_context(params->request); + ret = i915_switch_context(req); if (ret) goto error; @@ -1272,7 +1273,7 @@ int i915_gem_ringbuffer_submission_final(struct i915_execbuffer_params *params) if (ring == _priv->ring[RCS] && params->instp_mode != dev_priv->relative_constants_mode) { - ret = intel_ring_begin(params->request, 4); + ret = intel_ring_begin(req, 4); if (ret) goto error; @@ -1286,7 +1287,7 @@ int i915_gem_ringbuffer_submission_final(struct i915_execbuffer_params *params) } if (params->args_flags & I915_EXEC_GEN7_SOL_RESET) { - ret = i915_reset_gen7_sol_offsets(params->dev, params->request); + ret = i915_reset_gen7_sol_offsets(params->dev, req); if (ret) goto error; } @@ -1295,13 +1296,13 @@ int i915_gem_ringbuffer_submission_final(struct i915_execbuffer_params *params) exec_start = params->batch_obj_vm_offset + params->args_batch_start_offset; - ret = ring->dispatch_execbuffer(params->request, + ret = ring->dispatch_execbuffer(req, exec_start, exec_len, params->dispatch_flags); if (ret) goto error; - trace_i915_gem_ring_dispatch(params->request, params->dispatch_flags); + trace_i915_gem_ring_dispatch(req, params->dispatch_flags); i915_gem_execbuffer_retire_commands(params); diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index 88d57b7..b98ea3d 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -929,7 +929,8 @@ int intel_execlists_submission(struct i915_execbuffer_params *params, int intel_execlists_submission_final(struct i915_execbuffer_params *params) { struct drm_i915_private *dev_priv = params->dev->dev_private; - struct intel_ringbuffer *ringbuf = params->request->ringbuf; + struct drm_i915_gem_request *req = params->request; + struct intel_ringbuffer *ringbuf = req->ringbuf; struct 
intel_engine_cs *ring = params->ring; u64 exec_start; int ret; @@ -941,13 +942,13 @@ int intel_execlists_submission_final(struct i915_execbuffer_params *params) * Unconditionally invalidate gpu caches and ensure that we do flush * any residual writes from the previous batch. */ - ret = logical_ring_invalidate_all_caches(params->request); + ret = logical_ring_invalidate_all_caches(req); if (ret) return ret; if (ring == _priv->ring[RCS] && params->instp_mode != dev_priv->relative_constants_mode) { - ret = intel_logical_ring_begin(params->request, 4); + ret = intel_logical_ring_begin(req, 4); if (ret) return ret; @@ -963,11 +964,11 @@ int intel_execlists_submission_final(struct i915_execbuffer_params *params) exec_start = params->batch_obj_vm_offset + params->args_batch_start_offset; - ret = ring->emit_bb_start(params->request, exec_start, params->dispatch_flags); + ret = ring->emit_bb_start(req, exec_start,
[Intel-gfx] [PATCH 31/40] drm/i915: Added debug state dump facilities to scheduler
From: John HarrisonWhen debugging batch buffer submission issues, it is useful to be able to see what the current state of the scheduler is. This change adds functions for decoding the internal scheduler state and reporting it. v3: Updated a debug message with the new state_str() function. Change-Id: I0634168e3f3465ff023f5a673165c90b07e535b6 For: VIZ-1587 Signed-off-by: John Harrison --- drivers/gpu/drm/i915/i915_scheduler.c | 280 +- drivers/gpu/drm/i915/i915_scheduler.h | 14 ++ 2 files changed, 292 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index e6e1bd967..be2430d 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -36,6 +36,9 @@ static int i915_scheduler_submit_max_priority(struct intel_engine_cs *ri bool is_locked); static uint32_ti915_scheduler_count_flying(struct i915_scheduler *scheduler, struct intel_engine_cs *ring); +static int i915_scheduler_dump_locked(struct intel_engine_cs *ring, + const char *msg); +static int i915_scheduler_dump_all_locked(struct drm_device *dev, const char *msg); static voidi915_scheduler_priority_bump_clear(struct i915_scheduler *scheduler); static int i915_scheduler_priority_bump(struct i915_scheduler *scheduler, struct i915_scheduler_queue_entry *target, @@ -53,6 +56,116 @@ bool i915_scheduler_is_enabled(struct drm_device *dev) return dev_priv->scheduler != NULL; } +const char *i915_qe_state_str(struct i915_scheduler_queue_entry *node) +{ + static char str[50]; + char*ptr = str; + + *(ptr++) = node->bumped ? 'B' : '-', + *(ptr++) = i915_gem_request_completed(node->params.request) ? 
'C' : '-'; + + *ptr = 0; + + return str; +} + +char i915_scheduler_queue_status_chr(enum i915_scheduler_queue_status status) +{ + switch (status) { + case i915_sqs_none: + return 'N'; + + case i915_sqs_queued: + return 'Q'; + + case i915_sqs_popped: + return 'X'; + + case i915_sqs_flying: + return 'F'; + + case i915_sqs_complete: + return 'C'; + + case i915_sqs_dead: + return 'D'; + + default: + break; + } + + return '?'; +} + +const char *i915_scheduler_queue_status_str( + enum i915_scheduler_queue_status status) +{ + static char str[50]; + + switch (status) { + case i915_sqs_none: + return "None"; + + case i915_sqs_queued: + return "Queued"; + + case i915_sqs_popped: + return "Popped"; + + case i915_sqs_flying: + return "Flying"; + + case i915_sqs_complete: + return "Complete"; + + case i915_sqs_dead: + return "Dead"; + + default: + break; + } + + sprintf(str, "[Unknown_%d!]", status); + return str; +} + +const char *i915_scheduler_flag_str(uint32_t flags) +{ + static char str[100]; + char *ptr = str; + + *ptr = 0; + +#define TEST_FLAG(flag, msg) \ + do {\ + if (flags & (flag)) { \ + strcpy(ptr, msg); \ + ptr += strlen(ptr); \ + flags &= ~(flag); \ + } \ + } while (0) + + TEST_FLAG(i915_sf_interrupts_enabled, "IntOn|"); + TEST_FLAG(i915_sf_submitting, "Submitting|"); + TEST_FLAG(i915_sf_dump_force, "DumpForce|"); + TEST_FLAG(i915_sf_dump_details, "DumpDetails|"); + TEST_FLAG(i915_sf_dump_dependencies, "DumpDeps|"); + +#undef TEST_FLAG + + if (flags) { + sprintf(ptr, "Unknown_0x%X!", flags); + ptr += strlen(ptr); + } + + if (ptr == str) + strcpy(str, "-"); + else + ptr[-1] = 0; + + return str; +}; + int i915_scheduler_init(struct drm_device *dev) { struct drm_i915_private *dev_priv = dev->dev_private; @@ -631,6 +744,169 @@ void i915_gem_scheduler_work_handler(struct work_struct *work) } } +int i915_scheduler_dump_all(struct drm_device *dev, const char *msg) +{ + struct drm_i915_private *dev_priv = dev->dev_private; + struct i915_scheduler *scheduler = 
dev_priv->scheduler; + unsigned long flags; + int ret; + +
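The TEST_FLAG macro visible in i915_scheduler_flag_str() above is a reusable bitmask-decoding pattern: consume each known bit, append its name, and report any leftover bits as unknown. A self-contained version follows, with made-up flag values rather than the i915_sf_* ones:

```c
#include <stdio.h>
#include <string.h>
#include <stdint.h>

#define SF_INTERRUPTS  (1u << 0)  /* illustrative flag bits */
#define SF_SUBMITTING  (1u << 1)

static const char *flag_str(uint32_t flags)
{
	static char str[100];
	char *ptr = str;

	*ptr = 0;

#define TEST_FLAG(flag, msg) \
	do { \
		if (flags & (flag)) { \
			strcpy(ptr, msg); \
			ptr += strlen(ptr); \
			flags &= ~(flag); \
		} \
	} while (0)

	TEST_FLAG(SF_INTERRUPTS, "IntOn|");
	TEST_FLAG(SF_SUBMITTING, "Submitting|");

#undef TEST_FLAG

	/* Anything left is a bit we do not have a name for. */
	if (flags) {
		sprintf(ptr, "Unknown_0x%X!", flags);
		ptr += strlen(ptr);
	}

	if (ptr == str)
		strcpy(str, "-");
	else
		ptr[-1] = 0;   /* strip the trailing '|' (or '!') */

	return str;
}
```

As in the patch, the static buffer makes this convenient for debug printing but not thread-safe; that trade-off is acceptable for a debugfs/dump helper.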
[Intel-gfx] [PATCH 29/40] drm/i915: Added scheduler queue throttling by DRM file handle
From: John Harrison. The scheduler decouples the submission of batch buffers to the driver from their subsequent submission to the hardware. This means that an application which is continuously submitting buffers as fast as it can could potentially flood the driver. To prevent this, the driver now tracks how many buffers are in progress (queued in software or executing in hardware) and limits this to a given (tunable) number. If this number is exceeded then the queue to the driver will return -EAGAIN and thus prevent the scheduler's queue becoming arbitrarily large. v3: Added a missing decrement of the file queue counter. Change-Id: I83258240aec7c810db08c006a3062d46aa91363f For: VIZ-1587 Signed-off-by: John Harrison --- drivers/gpu/drm/i915/i915_drv.h| 2 ++ drivers/gpu/drm/i915/i915_gem_execbuffer.c | 8 +++ drivers/gpu/drm/i915/i915_scheduler.c | 35 ++ drivers/gpu/drm/i915/i915_scheduler.h | 2 ++ 4 files changed, 47 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 4187e75..4ecb6e4 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -342,6 +342,8 @@ struct drm_i915_file_private { } rps; struct intel_engine_cs *bsd_ring; + + u32 scheduler_queue_length; }; enum intel_dpll_id { diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c index b358b21..8ba426f 100644 --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c @@ -1862,6 +1862,10 @@ i915_gem_execbuffer(struct drm_device *dev, void *data, return -EINVAL; } + /* Throttle batch requests per device file */ + if (i915_scheduler_file_queue_is_full(file)) + return -EAGAIN; + /* Copy in the exec list from userland */ exec_list = drm_malloc_ab(sizeof(*exec_list), args->buffer_count); exec2_list = drm_malloc_ab(sizeof(*exec2_list), args->buffer_count); @@ -1945,6 +1949,10 @@ i915_gem_execbuffer2(struct drm_device *dev, void *data, return -EINVAL; } + /* 
Throttle batch requests per device file */ + if (i915_scheduler_file_queue_is_full(file)) + return -EAGAIN; + exec2_list = kmalloc(sizeof(*exec2_list)*args->buffer_count, GFP_TEMPORARY | __GFP_NOWARN | __GFP_NORETRY); if (exec2_list == NULL) diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index 4736f0f..e6e1bd967 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -40,6 +40,8 @@ static voidi915_scheduler_priority_bump_clear(struct i915_scheduler *sch static int i915_scheduler_priority_bump(struct i915_scheduler *scheduler, struct i915_scheduler_queue_entry *target, uint32_t bump); +static voidi915_scheduler_file_queue_inc(struct drm_file *file); +static voidi915_scheduler_file_queue_dec(struct drm_file *file); bool i915_scheduler_is_enabled(struct drm_device *dev) { @@ -74,6 +76,7 @@ int i915_scheduler_init(struct drm_device *dev) scheduler->priority_level_max = 1023; scheduler->priority_level_preempt = 900; scheduler->min_flying = 2; + scheduler->file_queue_max = 64; dev_priv->scheduler = scheduler; @@ -267,6 +270,8 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe) list_add_tail(>link, >node_queue[ring->id]); + i915_scheduler_file_queue_inc(node->params.file); + if (i915.scheduler_override & i915_so_submit_on_queue) not_flying = true; else @@ -551,6 +556,12 @@ static int i915_scheduler_remove(struct intel_engine_cs *ring) /* Strip the dependency info while the mutex is still locked */ i915_scheduler_remove_dependent(scheduler, node); + /* Likewise clean up the file descriptor before it might disappear. 
*/ + if (node->params.file) { + i915_scheduler_file_queue_dec(node->params.file); + node->params.file = NULL; + } + continue; } @@ -1194,6 +1205,7 @@ int i915_scheduler_closefile(struct drm_device *dev, struct drm_file *file) node->status, ring->name); + i915_scheduler_file_queue_dec(node->params.file); node->params.file = NULL; } } @@ -1202,3 +1214,26 @@ int i915_scheduler_closefile(struct drm_device *dev, struct drm_file *file) return 0; } +
Re: [Intel-gfx] [PATCH V4 2/2] drm/i915: start adding dp mst audio
On Fri, 11 Dec 2015 07:07:53 +0100, Libin Yang wrote: > > Add Takashi and ALSA mail list. > > On 12/10/2015 05:02 PM, Daniel Vetter wrote: > > On Tue, Dec 08, 2015 at 04:01:20PM +0800, Libin Yang wrote: > >> Hi all, > >> > >> Any comments on the patches? > > > > Sorry, simply fell through the cracks since Ander is on vacation. Takashi > > is working on some cleanup patches to have a port->encoder mapping for the > > audio side of i915. His patch cleans up all the existing audio code in > > i915, but please work together with him to align mst code with the new > > style. > > > > Both patches queued for next. > > Yes, I have seen Takashi's patches. I will check the patches. A patch like the one below should work; it sets/clears the reverse mapping dynamically for the MST encoder. At least, now I could get a proper ELD from a docking station. But the audio itself doesn't seem to be working yet; something is still missing... FWIW, the fixed patches are found in my test/hdmi-jack branch. It contains my previous get_eld patchset, HD-audio side changes, this patchset of Libin's, plus Libin's HD-audio MST patchset and some fixes. 
Takashi --- diff --git a/drivers/gpu/drm/i915/intel_dp_mst.c b/drivers/gpu/drm/i915/intel_dp_mst.c index 8b608c2cd070..87dad62fd10b 100644 --- a/drivers/gpu/drm/i915/intel_dp_mst.c +++ b/drivers/gpu/drm/i915/intel_dp_mst.c @@ -108,6 +108,7 @@ static void intel_mst_disable_dp(struct intel_encoder *encoder) struct drm_i915_private *dev_priv = dev->dev_private; struct drm_crtc *crtc = encoder->base.crtc; struct intel_crtc *intel_crtc = to_intel_crtc(crtc); + enum port port = intel_dig_port->port; int ret; @@ -122,6 +123,9 @@ static void intel_mst_disable_dp(struct intel_encoder *encoder) if (intel_crtc->config->has_audio) { intel_audio_codec_disable(encoder); intel_display_power_put(dev_priv, POWER_DOMAIN_AUDIO); + mutex_lock(_priv->av_mutex); + dev_priv->dig_port_map[port] = NULL; + mutex_unlock(_priv->av_mutex); } } @@ -236,6 +240,9 @@ static void intel_mst_enable_dp(struct intel_encoder *encoder) if (crtc->config->has_audio) { DRM_DEBUG_DRIVER("Enabling DP audio on pipe %c\n", pipe_name(crtc->pipe)); + mutex_lock(_priv->av_mutex); + dev_priv->dig_port_map[port] = encoder; + mutex_unlock(_priv->av_mutex); intel_display_power_get(dev_priv, POWER_DOMAIN_AUDIO); intel_audio_codec_enable(encoder); } ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
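The essence of Takashi's diff above is that the MST enable path publishes the encoder for a port under av_mutex and the disable path clears it, so the audio side always sees a consistent port-to-encoder mapping. A simplified userspace sketch of that protocol, with a pthread mutex standing in for the kernel mutex and made-up types:

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_PORTS 5

struct encoder { int id; };

static pthread_mutex_t av_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct encoder *dig_port_map[NUM_PORTS];

/* Display side: publish the mapping when audio is enabled on the port. */
static void mst_enable_audio(int port, struct encoder *enc)
{
	pthread_mutex_lock(&av_mutex);
	dig_port_map[port] = enc;
	pthread_mutex_unlock(&av_mutex);
}

/* ...and clear it again on disable, before the encoder goes away. */
static void mst_disable_audio(int port)
{
	pthread_mutex_lock(&av_mutex);
	dig_port_map[port] = NULL;
	pthread_mutex_unlock(&av_mutex);
}

/* Audio side: look up the encoder for a port (e.g. to fetch the ELD). */
static struct encoder *lookup_encoder(int port)
{
	struct encoder *enc;

	pthread_mutex_lock(&av_mutex);
	enc = dig_port_map[port];
	pthread_mutex_unlock(&av_mutex);
	return enc;
}
```

Doing the set/clear dynamically (rather than once at init) matters for MST because the encoder backing a port can change as streams come and go.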
Re: [Intel-gfx] [PATCH 2/6] drm/i915: Support for creating Stolen memory backed objects
On 11/12/15 12:19, Tvrtko Ursulin wrote: On 11/12/15 11:22, Ankitprasad Sharma wrote: On Wed, 2015-12-09 at 14:06 +, Tvrtko Ursulin wrote: Hi, On 09/12/15 12:46, ankitprasad.r.sha...@intel.com wrote: From: Ankitprasad Sharma[snip!] +/** + * Requested flags (currently used for placement + * (which memory domain)) + * + * You can request that the object be created from special memory + * rather than regular system pages using this parameter. Such + * irregular objects may have certain restrictions (such as CPU + * access to a stolen object is verboten). + * + * This can be used in the future for other purposes too + * e.g. specifying tiling/caching/madvise + */ +__u32 flags; +#define I915_CREATE_PLACEMENT_STOLEN (1<<0) /* Cannot use CPU mmaps */ +#define __I915_CREATE_UNKNOWN_FLAGS -(I915_CREATE_PLACEMENT_STOLEN << 1) I've asked in another reply, now that userspace can create a stolen object, what happens if it tries to use it for a batch buffer? Can it end up in the relocate_entry_cpu with a batch buffer allocated from stolen, which would then call i915_gem_object_get_page and crash? Thanks for pointing it out. Yes, this is definitely a possibility, if we allocate batchbuffers from the stolen region. I have started working on that, to do relocate_entry_stolen() if the object is allocated from stolen. Or perhaps it would be OK to just fail the execbuf? Just thinking to simplify things. Is it required (or expected) that users will need or want to create batch buffers from stolen? Regards, Tvrtko Let's NOT have batchbuffers in stolen. Or anywhere else exotic, just in regular shmfs-backed GEM objects (no phys, userptr, or dma_buf either). And I'd rather contexts and ringbuffers weren't placed there either, because the CPU needs to write those all the time. All special-purpose GEM objects should be usable ONLY as data buffers for the GPU, or for CPU access with pread/pwrite. 
The objects that the kernel needs to understand and manipulate (contexts, ringbuffers, and batches) should always be default (shmfs-backed) GEM objects, so that we don't have to propagate the understanding of all the exceptional cases into a multitude of different kernel functions. Oh, and I'd suggest that once we have more than two GEM object types, the pread/pwrite operations should be extracted and turned into vfuncs rather than adding complexity to the common ioctl/shmfs path. .Dave.
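As an aside, the __I915_CREATE_UNKNOWN_FLAGS definition quoted earlier in this thread relies on a common idiom: when valid flag bits are packed contiguously from bit 0, negating (last_flag << 1) yields a mask with every invalid bit set, so validation is a single AND. A hypothetical sketch (flag names and the -EINVAL stand-in are illustrative):

```c
#include <stdint.h>

#define CREATE_PLACEMENT_STOLEN  (1u << 0)  /* last (and only) valid flag */
/* All bits above the last valid flag: -(2u) == 0xFFFFFFFE. */
#define CREATE_UNKNOWN_FLAGS     (-(CREATE_PLACEMENT_STOLEN << 1))

#define EINVAL_ERR (-22)  /* stand-in for -EINVAL */

static int validate_create_flags(uint32_t flags)
{
	if (flags & CREATE_UNKNOWN_FLAGS)
		return EINVAL_ERR;  /* reject anything we don't understand */
	return 0;
}
```

The nice property is that adding a new flag only requires shifting the "last flag" used in the UNKNOWN mask; no hand-maintained list of valid bits can drift out of date.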
[Intel-gfx] [PATCH] drm/i915: Fix context/engine cleanup order
Swap the order of context & engine cleanup, so that it is now contexts, then engines. This allows the context clean up code to do things like confirm that ring->dev->struct_mutex is locked without a NULL pointer dereference. This came about as a result of the 'intel_ring_initialized() must be simple and inline' patch now using ring->dev as an initialised flag. Rename the cleanup function to reflect what it actually does. Also clean up some very annoying whitespace issues at the same time. Signed-off-by: Nick HoathCc: Mika Kuoppala Cc: Daniel Vetter Cc: David Gordon Cc: Chris Wilson --- drivers/gpu/drm/i915/i915_dma.c | 4 ++-- drivers/gpu/drm/i915/i915_drv.h | 2 +- drivers/gpu/drm/i915/i915_gem.c | 23 --- 3 files changed, 15 insertions(+), 14 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c index 84e2b20..a2857b0 100644 --- a/drivers/gpu/drm/i915/i915_dma.c +++ b/drivers/gpu/drm/i915/i915_dma.c @@ -449,7 +449,7 @@ static int i915_load_modeset_init(struct drm_device *dev) cleanup_gem: mutex_lock(>struct_mutex); - i915_gem_cleanup_ringbuffer(dev); + i915_gem_cleanup_engines(dev); i915_gem_context_fini(dev); mutex_unlock(>struct_mutex); cleanup_irq: @@ -1188,8 +1188,8 @@ int i915_driver_unload(struct drm_device *dev) intel_guc_ucode_fini(dev); mutex_lock(>struct_mutex); - i915_gem_cleanup_ringbuffer(dev); i915_gem_context_fini(dev); + i915_gem_cleanup_engines(dev); mutex_unlock(>struct_mutex); intel_fbc_cleanup_cfb(dev_priv); i915_gem_cleanup_stolen(dev); diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 5edd393..e317f88 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -3016,7 +3016,7 @@ int i915_gem_init_rings(struct drm_device *dev); int __must_check i915_gem_init_hw(struct drm_device *dev); int i915_gem_l3_remap(struct drm_i915_gem_request *req, int slice); void i915_gem_init_swizzling(struct drm_device *dev); -void i915_gem_cleanup_ringbuffer(struct 
drm_device *dev); +void i915_gem_cleanup_engines(struct drm_device *dev); int __must_check i915_gpu_idle(struct drm_device *dev); int __must_check i915_gem_suspend(struct drm_device *dev); void __i915_add_request(struct drm_i915_gem_request *req, diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 8e2acde..04a22db 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -4823,7 +4823,7 @@ i915_gem_init_hw(struct drm_device *dev) ret = i915_gem_request_alloc(ring, ring->default_context, ); if (ret) { - i915_gem_cleanup_ringbuffer(dev); + i915_gem_cleanup_engines(dev); goto out; } @@ -4836,7 +4836,7 @@ i915_gem_init_hw(struct drm_device *dev) if (ret && ret != -EIO) { DRM_ERROR("PPGTT enable ring #%d failed %d\n", i, ret); i915_gem_request_cancel(req); - i915_gem_cleanup_ringbuffer(dev); + i915_gem_cleanup_engines(dev); goto out; } @@ -4844,7 +4844,7 @@ i915_gem_init_hw(struct drm_device *dev) if (ret && ret != -EIO) { DRM_ERROR("Context enable ring #%d failed %d\n", i, ret); i915_gem_request_cancel(req); - i915_gem_cleanup_ringbuffer(dev); + i915_gem_cleanup_engines(dev); goto out; } @@ -4919,7 +4919,7 @@ out_unlock: } void -i915_gem_cleanup_ringbuffer(struct drm_device *dev) +i915_gem_cleanup_engines(struct drm_device *dev) { struct drm_i915_private *dev_priv = dev->dev_private; struct intel_engine_cs *ring; @@ -4928,13 +4928,14 @@ i915_gem_cleanup_ringbuffer(struct drm_device *dev) for_each_ring(ring, dev_priv, i) dev_priv->gt.cleanup_ring(ring); -if (i915.enable_execlists) -/* - * Neither the BIOS, ourselves or any other kernel - * expects the system to be in execlists mode on startup, - * so we need to reset the GPU back to legacy mode. - */ -intel_gpu_reset(dev); + if (i915.enable_execlists) { + /* +* Neither the BIOS, ourselves or any other kernel +* expects the system to be in execlists mode on startup, +* so we need to reset the GPU back to legacy mode. 
+*/ + intel_gpu_reset(dev); + } } static void -- 1.9.1
[Intel-gfx] [RFC 36/38] drm/i915/preempt: update (LRC) ringbuffer-filling code to create preemptive requests
From: Dave Gordon

This patch refactors the ringbuffer-level code (in execlists/GuC mode only) and enhances it so that it can emit the proper sequence of opcodes for preemption requests. A preemption request is similar to a batch submission, but doesn't actually invoke a batchbuffer; the purpose is simply to get the engine to stop what it's doing so that the scheduler can then send it a new workload instead. Preemption requests use different locations in the hardware status page than regular batches to hold their 'active' and 'done' seqnos, so that information pertaining to a preempted batch is not overwritten. Also, whereas a regular batch clears its 'active' flag when it finishes (so that TDR knows it's no longer to blame), preemption requests leave this set and the driver clears it once the completion of the preemption request has been noticed. Only one preemption (per ring) can be in progress at one time, so this handshake ensures correct sequencing of the request between the GPU and CPU. Actually-preemptive requests are still disabled via a module parameter at this stage, but all the components should now be ready for us to turn it on :)

v2: Updated to use locally cached request pointer and to fix the location of the dispatch trace point.
For: VIZ-2021 Signed-off-by: Dave Gordon --- drivers/gpu/drm/i915/intel_lrc.c | 177 ++- 1 file changed, 136 insertions(+), 41 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index 36d63b7..31645a3 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -748,7 +748,7 @@ intel_logical_ring_advance_and_submit(struct drm_i915_gem_request *request) struct drm_i915_private *dev_priv = request->i915; struct i915_guc_client *client = dev_priv->guc.execbuf_client; const static bool fake = false; /* true => only pretend to preempt */ - bool preemptive = false;/* for now */ + bool preemptive; intel_logical_ring_advance(request->ringbuf); @@ -757,6 +757,7 @@ intel_logical_ring_advance_and_submit(struct drm_i915_gem_request *request) if (intel_ring_stopped(ring)) return; + preemptive = (request->scheduler_flags & i915_req_sf_preempt) != 0; if (preemptive && dev_priv->guc.preempt_client && !fake) client = dev_priv->guc.preempt_client; @@ -951,6 +952,117 @@ int intel_execlists_submission(struct i915_execbuffer_params *params, } /* + * This function stores the specified constant value in the (index)th DWORD of the + * hardware status page (execlist mode only). See separate code for legacy mode. + */ +static void +emit_store_dw_index(struct drm_i915_gem_request *req, uint32_t value, uint32_t index) +{ + struct intel_ringbuffer *ringbuf = req->ringbuf; + uint64_t hwpa = req->ring->status_page.gfx_addr; + hwpa += index << MI_STORE_DWORD_INDEX_SHIFT; + + intel_logical_ring_emit(ringbuf, MI_STORE_DWORD_IMM_GEN4 | MI_GLOBAL_GTT); + intel_logical_ring_emit(ringbuf, lower_32_bits(hwpa)); + intel_logical_ring_emit(ringbuf, upper_32_bits(hwpa)); /* GEN8+ */ + intel_logical_ring_emit(ringbuf, value); + + req->ring->gpu_caches_dirty = true; +} + +/* + * This function stores the specified register value in the (index)th DWORD + * of the hardware status page (execlist mode only). See separate code for + * legacy mode. 
+ */ +static void +emit_store_reg_index(struct drm_i915_gem_request *req, uint32_t reg, uint32_t index) +{ + struct intel_ringbuffer *ringbuf = req->ringbuf; + uint64_t hwpa = req->ring->status_page.gfx_addr; + hwpa += index << MI_STORE_DWORD_INDEX_SHIFT; + + intel_logical_ring_emit(ringbuf, (MI_STORE_REG_MEM+1) | MI_GLOBAL_GTT); + intel_logical_ring_emit(ringbuf, reg); + intel_logical_ring_emit(ringbuf, lower_32_bits(hwpa)); + intel_logical_ring_emit(ringbuf, upper_32_bits(hwpa)); /* GEN8+ */ + + req->ring->gpu_caches_dirty = true; +} + +/* + * Emit the commands to execute when preparing to start a batch + * + * The GPU will log the seqno of the batch before it starts + * running any of the commands to actually execute that batch + */ +static void +emit_preamble(struct drm_i915_gem_request *req) +{ + struct intel_ringbuffer *ringbuf = req->ringbuf; + uint32_t seqno = i915_gem_request_get_seqno(req); + + BUG_ON(!seqno); + if (req->scheduler_flags & i915_req_sf_preempt) + emit_store_dw_index(req, seqno, I915_PREEMPTIVE_ACTIVE_SEQNO); + else + emit_store_dw_index(req, seqno, I915_BATCH_ACTIVE_SEQNO); + + intel_logical_ring_emit(ringbuf, MI_REPORT_HEAD); + intel_logical_ring_emit(ringbuf, MI_NOOP); + + req->ring->gpu_caches_dirty = true; +} + +static void +emit_relconsts_mode(struct i915_execbuffer_params *params) +{ +
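The 'active'/'done' handshake described in the commit message above can be modelled outside the kernel. The sketch below is a user-space illustration only: the status-page slot indices and all function names are invented for the example and do not match the driver's actual offsets or API.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative status-page slots; the real driver uses different offsets. */
enum { BATCH_ACTIVE, BATCH_DONE, PREEMPT_ACTIVE, PREEMPT_DONE };

static uint32_t status_page[4];

/* "GPU" side: a regular batch clears its own 'active' slot when it
 * finishes, so TDR knows it is no longer to blame. */
static void gpu_run_batch(uint32_t seqno)
{
	status_page[BATCH_ACTIVE] = seqno;
	status_page[BATCH_DONE] = seqno;
	status_page[BATCH_ACTIVE] = 0;
}

/* "GPU" side: a preemption request deliberately leaves 'active' set. */
static void gpu_run_preempt(uint32_t seqno)
{
	status_page[PREEMPT_ACTIVE] = seqno;
	status_page[PREEMPT_DONE] = seqno;
}

/* "CPU" side: only the driver clears the preemptive 'active' flag once it
 * notices completion; while it is set, no new preemption may be issued,
 * which is what limits a ring to one preemption in flight. */
static int cpu_complete_preempt(uint32_t seqno)
{
	if (status_page[PREEMPT_DONE] != seqno)
		return 0;	/* preemption not finished yet */
	status_page[PREEMPT_ACTIVE] = 0;
	return 1;
}
```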
Re: [Intel-gfx] [PATCH] drm/i915: Allow objects to go back above 4GB in the address range
On Fri, Dec 11, 2015 at 02:34:13PM +, Michel Thierry wrote: > We detected if objects should be moved to the lower parts when 48-bit > support flag was not set, but not the other way around. > > This handles the case in which an object was allocated in the 32-bit > address range, but it has been marked as safe to move above it, which > theoretically would help to keep the lower addresses available for > objects which really need to be there. > > Cc: Daniele Ceraolo Spurio> Signed-off-by: Michel Thierry No. This is not lazy. When we run out of low space, we evict. Until then don't cause extra work for no reason. -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 10/13] drm/i915: Updated request structure tracing
From: John HarrisonAdded the '_complete' trace event which occurs when a fence/request is signaled as complete. Also moved the notify event from the IRQ handler code to inside the notify function itself. v3: Added the current ring seqno to the notify trace point. For: VIZ-5190 Signed-off-by: John Harrison --- drivers/gpu/drm/i915/i915_gem.c | 6 +- drivers/gpu/drm/i915/i915_irq.c | 2 -- drivers/gpu/drm/i915/i915_trace.h | 13 - 3 files changed, 13 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index f71215f..4817015 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -2776,13 +2776,16 @@ void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked) unsigned long flags; u32 seqno; - if (list_empty(>fence_signal_list)) + if (list_empty(>fence_signal_list)) { + trace_i915_gem_request_notify(ring, 0); return; + } if (!fence_locked) spin_lock_irqsave(>fence_lock, flags); seqno = ring->get_seqno(ring, false); + trace_i915_gem_request_notify(ring, seqno); list_for_each_entry_safe(req, req_next, >fence_signal_list, signal_link) { if (!req->cancelled) { @@ -2798,6 +2801,7 @@ void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked) if (!req->cancelled) { fence_signal_locked(>fence); + trace_i915_gem_request_complete(req); } if (req->irq_enabled) { diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index 74f8552..d280e05 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -979,8 +979,6 @@ static void notify_ring(struct intel_engine_cs *ring) if (!intel_ring_initialized(ring)) return; - trace_i915_gem_request_notify(ring); - i915_gem_request_notify(ring, false); wake_up_all(>irq_queue); diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h index 04fe849..41a026d 100644 --- a/drivers/gpu/drm/i915/i915_trace.h +++ b/drivers/gpu/drm/i915/i915_trace.h @@ -561,23 
+561,26 @@ DEFINE_EVENT(i915_gem_request, i915_gem_request_add, ); TRACE_EVENT(i915_gem_request_notify, - TP_PROTO(struct intel_engine_cs *ring), - TP_ARGS(ring), + TP_PROTO(struct intel_engine_cs *ring, uint32_t seqno), + TP_ARGS(ring, seqno), TP_STRUCT__entry( __field(u32, dev) __field(u32, ring) __field(u32, seqno) +__field(bool, is_empty) ), TP_fast_assign( __entry->dev = ring->dev->primary->index; __entry->ring = ring->id; - __entry->seqno = ring->get_seqno(ring, false); + __entry->seqno = seqno; + __entry->is_empty = list_empty(>fence_signal_list); ), - TP_printk("dev=%u, ring=%u, seqno=%u", - __entry->dev, __entry->ring, __entry->seqno) + TP_printk("dev=%u, ring=%u, seqno=%u, empty=%d", + __entry->dev, __entry->ring, __entry->seqno, + __entry->is_empty) ); DEFINE_EVENT(i915_gem_request, i915_gem_request_retire, -- 1.9.1 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 05/13] drm/i915: Convert requests to use struct fence
From: John HarrisonThere is a construct in the linux kernel called 'struct fence' that is intended to keep track of work that is executed on hardware. I.e. it solves the basic problem that the drivers 'struct drm_i915_gem_request' is trying to address. The request structure does quite a lot more than simply track the execution progress so is very definitely still required. However, the basic completion status side could be updated to use the ready made fence implementation and gain all the advantages that provides. This patch makes the first step of integrating a struct fence into the request. It replaces the explicit reference count with that of the fence. It also replaces the 'is completed' test with the fence's equivalent. Currently, that simply chains on to the original request implementation. A future patch will improve this. v3: Updated after review comments by Tvrtko Ursulin. Added fence context/seqno pair to the debugfs request info. Renamed fence 'driver name' to just 'i915'. Removed BUG_ONs. For: VIZ-5190 Signed-off-by: John Harrison Cc: Tvrtko Ursulin --- drivers/gpu/drm/i915/i915_debugfs.c | 5 +-- drivers/gpu/drm/i915/i915_drv.h | 45 +- drivers/gpu/drm/i915/i915_gem.c | 56 ++--- drivers/gpu/drm/i915/intel_lrc.c| 1 + drivers/gpu/drm/i915/intel_ringbuffer.c | 1 + drivers/gpu/drm/i915/intel_ringbuffer.h | 3 ++ 6 files changed, 81 insertions(+), 30 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index 7415606..5b31186 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -709,11 +709,12 @@ static int i915_gem_request_info(struct seq_file *m, void *data) task = NULL; if (req->pid) task = pid_task(req->pid, PIDTYPE_PID); - seq_printf(m, "%x @ %d: %s [%d]\n", + seq_printf(m, "%x @ %d: %s [%d], fence = %u.%u\n", req->seqno, (int) (jiffies - req->emitted_jiffies), task ? task->comm : "", - task ? task->pid : -1); + task ? 
task->pid : -1, + req->fence.context, req->fence.seqno); rcu_read_unlock(); } diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 436149e..aa5cba7 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -51,6 +51,7 @@ #include #include #include "intel_guc.h" +#include /* General customization: */ @@ -2174,7 +2175,17 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old, * initial reference taken using kref_init */ struct drm_i915_gem_request { - struct kref ref; + /** +* Underlying object for implementing the signal/wait stuff. +* NB: Never call fence_later() or return this fence object to user +* land! Due to lazy allocation, scheduler re-ordering, pre-emption, +* etc., there is no guarantee at all about the validity or +* sequentiality of the fence's seqno! It is also unsafe to let +* anything outside of the i915 driver get hold of the fence object +* as the clean up when decrementing the reference count requires +* holding the driver mutex lock. 
+*/ + struct fence fence; /** On Which ring this request was generated */ struct drm_i915_private *i915; @@ -2251,7 +2262,13 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring, struct intel_context *ctx, struct drm_i915_gem_request **req_out); void i915_gem_request_cancel(struct drm_i915_gem_request *req); -void i915_gem_request_free(struct kref *req_ref); + +static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req, + bool lazy_coherency) +{ + return fence_is_signaled(>fence); +} + int i915_gem_request_add_to_client(struct drm_i915_gem_request *req, struct drm_file *file); @@ -2271,7 +2288,7 @@ static inline struct drm_i915_gem_request * i915_gem_request_reference(struct drm_i915_gem_request *req) { if (req) - kref_get(>ref); + fence_get(>fence); return req; } @@ -2279,7 +2296,7 @@ static inline void i915_gem_request_unreference(struct drm_i915_gem_request *req) { WARN_ON(!mutex_is_locked(>ring->dev->struct_mutex)); - kref_put(>ref, i915_gem_request_free); + fence_put(>fence); } static inline void @@ -2291,7 +2308,7 @@ i915_gem_request_unreference__unlocked(struct
[Intel-gfx] [PATCH 00/13] Convert requests to use struct fence
From: John Harrison

There is a construct in the Linux kernel called 'struct fence' that is intended to keep track of work that is executed on hardware. I.e. it solves the basic problem that the driver's 'struct drm_i915_gem_request' is trying to address. The request structure does quite a lot more than simply track the execution progress so is very definitely still required. However, the basic completion status side could be updated to use the ready-made fence implementation and gain all the advantages that provides. Using the struct fence object also has the advantage that the fence can be used outside of the i915 driver (by other drivers or by userland applications). That is the basis of the dma-buf synchronisation API and allows asynchronous tracking of work completion. In this case, it allows applications to be signalled directly when a batch buffer completes without having to make an IOCTL call into the driver. This is work that was planned since the conversion of the driver from being seqno value based to being request structure based. This patch series does that work. An IGT test to exercise the fence support from user land is in progress and will follow. Android already makes extensive use of fences for display composition. Real-world Linux usage is planned in the form of Jesse's page table sharing / bufferless execbuf support. There is also a plan that Wayland (and others) could make use of it in a similar manner to Android.

v2: Updated for review comments by various people and to add support for Android style 'native sync'.

v3: Updated from review comments by Tvrtko Ursulin. Also moved sync framework out of staging and improved request completion handling.

v4: Fixed patch tag (should have been PATCH not RFC). Corrected ownership of one patch which had passed through many hands before reaching me. Fixed a bug introduced in v3 and updated for review comments.
[Patches against drm-intel-nightly tree fetched 17/11/2015] John Harrison (10): staging/android/sync: Move sync framework out of staging android/sync: Improved debug dump to dmesg drm/i915: Convert requests to use struct fence drm/i915: Removed now redudant parameter to i915_gem_request_completed() drm/i915: Add per context timelines to fence object drm/i915: Delay the freeing of requests until retire time drm/i915: Interrupt driven fences drm/i915: Updated request structure tracing drm/i915: Add sync framework support to execbuff IOCTL drm/i915: Cache last IRQ seqno to reduce IRQ overhead Maarten Lankhorst (2): staging/android/sync: Support sync points created from dma-fences staging/android/sync: add sync_fence_create_dma Peter Lawthers (1): android/sync: Fix reversed sense of signaled fence drivers/android/Kconfig| 28 ++ drivers/android/Makefile | 2 + drivers/android/sw_sync.c | 260 ++ drivers/android/sw_sync.h | 59 +++ drivers/android/sync.c | 739 + drivers/android/sync.h | 388 +++ drivers/android/sync_debug.c | 280 +++ drivers/android/trace/sync.h | 82 drivers/gpu/drm/i915/Kconfig | 3 + drivers/gpu/drm/i915/i915_debugfs.c| 7 +- drivers/gpu/drm/i915/i915_drv.h| 75 +-- drivers/gpu/drm/i915/i915_gem.c| 438 - drivers/gpu/drm/i915/i915_gem_context.c| 15 +- drivers/gpu/drm/i915/i915_gem_execbuffer.c | 95 +++- drivers/gpu/drm/i915/i915_irq.c| 2 +- drivers/gpu/drm/i915/i915_trace.h | 13 +- drivers/gpu/drm/i915/intel_display.c | 4 +- drivers/gpu/drm/i915/intel_lrc.c | 13 + drivers/gpu/drm/i915/intel_pm.c| 6 +- drivers/gpu/drm/i915/intel_ringbuffer.c| 5 + drivers/gpu/drm/i915/intel_ringbuffer.h| 9 + drivers/staging/android/Kconfig| 28 -- drivers/staging/android/Makefile | 2 - drivers/staging/android/sw_sync.c | 260 -- drivers/staging/android/sw_sync.h | 59 --- drivers/staging/android/sync.c | 729 drivers/staging/android/sync.h | 356 -- drivers/staging/android/sync_debug.c | 254 -- drivers/staging/android/trace/sync.h | 82 drivers/staging/android/uapi/sw_sync.h | 32 -- 
drivers/staging/android/uapi/sync.h| 97 include/uapi/Kbuild| 1 + include/uapi/drm/i915_drm.h| 16 +- include/uapi/sync/Kbuild | 3 + include/uapi/sync/sw_sync.h| 32 ++ include/uapi/sync/sync.h | 97 36 files changed, 2600 insertions(+), 1971 deletions(-) create mode 100644 drivers/android/sw_sync.c create mode 100644 drivers/android/sw_sync.h create mode 100644
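As a rough user-space analogue of what this series does to the request's lifetime, the sketch below swaps a private reference count for one embedded in a fence-like object. All the type and function names here are invented for illustration and do not match the kernel's struct fence API.

```c
#include <assert.h>
#include <stdlib.h>

struct fence {
	int refcount;
	void (*release)(struct fence *f);
};

/* The fence is the first member, so a plain cast stands in for the
 * kernel's container_of() in this toy model. */
struct request {
	struct fence fence;
};

static int requests_released;	/* counts completed teardowns */

static void fence_get(struct fence *f)
{
	f->refcount++;
}

static void fence_put(struct fence *f)
{
	if (--f->refcount == 0)
		f->release(f);
}

static void request_release(struct fence *f)
{
	free((struct request *)f);
	requests_released++;
}

static struct request *request_alloc(void)
{
	struct request *req = calloc(1, sizeof(*req));

	req->fence.refcount = 1;	/* initial reference, as with kref_init() */
	req->fence.release = request_release;
	return req;
}
```
The point of the model: once the completion/refcount side lives in the fence, anything that can hold a fence pointer can keep the request alive, without knowing it is a request.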
[Intel-gfx] [PATCH 07/13] drm/i915: Add per context timelines to fence object
From: John HarrisonThe fence object used inside the request structure requires a sequence number. Although this is not used by the i915 driver itself, it could potentially be used by non-i915 code if the fence is passed outside of the driver. This is the intention as it allows external kernel drivers and user applications to wait on batch buffer completion asynchronously via the dma-buff fence API. To ensure that such external users are not confused by strange things happening with the seqno, this patch adds in a per context timeline that can provide a guaranteed in-order seqno value for the fence. This is safe because the scheduler will not re-order batch buffers within a context - they are considered to be mutually dependent. v2: New patch in series. v3: Renamed/retyped timeline structure fields after review comments by Tvrtko Ursulin. Added context information to the timeline's name string for better identification in debugfs output. For: VIZ-5190 Signed-off-by: John Harrison Cc: Tvrtko Ursulin --- drivers/gpu/drm/i915/i915_drv.h | 25 --- drivers/gpu/drm/i915/i915_gem.c | 80 + drivers/gpu/drm/i915/i915_gem_context.c | 15 ++- drivers/gpu/drm/i915/intel_lrc.c| 8 drivers/gpu/drm/i915/intel_ringbuffer.h | 1 - 5 files changed, 111 insertions(+), 18 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index caf7897..7d6a7c0 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -841,6 +841,15 @@ struct i915_ctx_hang_stats { bool banned; }; +struct i915_fence_timeline { + charname[32]; + unsignedfence_context; + unsignednext; + + struct intel_context *ctx; + struct intel_engine_cs *ring; +}; + /* This must match up with the value previously used for execbuf2.rsvd1. 
*/ #define DEFAULT_CONTEXT_HANDLE 0 @@ -885,6 +894,7 @@ struct intel_context { struct drm_i915_gem_object *state; struct intel_ringbuffer *ringbuf; int pin_count; + struct i915_fence_timeline fence_timeline; } engine[I915_NUM_RINGS]; struct list_head link; @@ -2177,13 +2187,10 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old, struct drm_i915_gem_request { /** * Underlying object for implementing the signal/wait stuff. -* NB: Never call fence_later() or return this fence object to user -* land! Due to lazy allocation, scheduler re-ordering, pre-emption, -* etc., there is no guarantee at all about the validity or -* sequentiality of the fence's seqno! It is also unsafe to let -* anything outside of the i915 driver get hold of the fence object -* as the clean up when decrementing the reference count requires -* holding the driver mutex lock. +* NB: Never return this fence object to user land! It is unsafe to +* let anything outside of the i915 driver get hold of the fence +* object as the clean up when decrementing the reference count +* requires holding the driver mutex lock. 
*/ struct fence fence; @@ -2263,6 +2270,10 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring, struct drm_i915_gem_request **req_out); void i915_gem_request_cancel(struct drm_i915_gem_request *req); +int i915_create_fence_timeline(struct drm_device *dev, + struct intel_context *ctx, + struct intel_engine_cs *ring); + static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req) { return fence_is_signaled(>fence); diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 0801738..7a37fb7 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -2665,9 +2665,32 @@ static const char *i915_gem_request_get_driver_name(struct fence *req_fence) static const char *i915_gem_request_get_timeline_name(struct fence *req_fence) { - struct drm_i915_gem_request *req = container_of(req_fence, -typeof(*req), fence); - return req->ring->name; + struct drm_i915_gem_request *req; + struct i915_fence_timeline *timeline; + + req = container_of(req_fence, typeof(*req), fence); + timeline = >ctx->engine[req->ring->id].fence_timeline; + + return timeline->name; +} + +static void i915_gem_request_timeline_value_str(struct fence *req_fence, char *str, int size) +{ + struct drm_i915_gem_request *req; + + req = container_of(req_fence, typeof(*req), fence); + + /* Last signalled timeline value ??? */ + snprintf(str, size, "? [%d]"/*, timeline->value*/, req->ring->get_seqno(req->ring, true)); +} + +static
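The per-(context, engine) timeline the patch above introduces can be sketched as a counter pair: each timeline gets a unique context id, and seqnos handed out on one timeline are strictly in-order regardless of how batches from different contexts are scheduled. The names below are illustrative, not the driver's.

```c
#include <assert.h>

struct fence_timeline {
	unsigned fence_context;	/* globally unique timeline id */
	unsigned next;		/* next seqno to hand out on this timeline */
};

static unsigned next_context_id;

static void timeline_init(struct fence_timeline *tl)
{
	tl->fence_context = ++next_context_id;
	tl->next = 0;
}

/* Safe to be monotonic because, as the commit message notes, batches
 * within one context are never re-ordered by the scheduler. */
static unsigned timeline_next_seqno(struct fence_timeline *tl)
{
	return ++tl->next;
}
```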
[Intel-gfx] [PATCH 06/13] drm/i915: Removed now redudant parameter to i915_gem_request_completed()
From: John HarrisonThe change to the implementation of i915_gem_request_completed() means that the lazy coherency flag is no longer used. This can now be removed to simplify the interface. For: VIZ-5190 Signed-off-by: John Harrison --- drivers/gpu/drm/i915/i915_debugfs.c | 2 +- drivers/gpu/drm/i915/i915_drv.h | 3 +-- drivers/gpu/drm/i915/i915_gem.c | 18 +- drivers/gpu/drm/i915/intel_display.c | 2 +- drivers/gpu/drm/i915/intel_pm.c | 4 ++-- 5 files changed, 14 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index 5b31186..18dfb56 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -601,7 +601,7 @@ static int i915_gem_pageflip_info(struct seq_file *m, void *data) i915_gem_request_get_seqno(work->flip_queued_req), dev_priv->next_seqno, ring->get_seqno(ring, true), - i915_gem_request_completed(work->flip_queued_req, true)); + i915_gem_request_completed(work->flip_queued_req)); } else seq_printf(m, "Flip not associated with any ring\n"); seq_printf(m, "Flip queued on frame %d, (was ready on frame %d), now %d\n", diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index aa5cba7..caf7897 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2263,8 +2263,7 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring, struct drm_i915_gem_request **req_out); void i915_gem_request_cancel(struct drm_i915_gem_request *req); -static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req, - bool lazy_coherency) +static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req) { return fence_is_signaled(>fence); } diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index a1b4dbd..0801738 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1165,7 +1165,7 @@ static int __i915_spin_request(struct drm_i915_gem_request *req) 
timeout = jiffies + 1; while (!need_resched()) { - if (i915_gem_request_completed(req, true)) + if (i915_gem_request_completed(req)) return 0; if (time_after_eq(jiffies, timeout)) @@ -1173,7 +1173,7 @@ static int __i915_spin_request(struct drm_i915_gem_request *req) cpu_relax_lowlatency(); } - if (i915_gem_request_completed(req, false)) + if (i915_gem_request_completed(req)) return 0; return -EAGAIN; @@ -1217,7 +1217,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req, if (list_empty(>list)) return 0; - if (i915_gem_request_completed(req, true)) + if (i915_gem_request_completed(req)) return 0; timeout_expire = timeout ? @@ -1257,7 +1257,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req, break; } - if (i915_gem_request_completed(req, false)) { + if (i915_gem_request_completed(req)) { ret = 0; break; } @@ -2758,7 +2758,7 @@ i915_gem_find_active_request(struct intel_engine_cs *ring) struct drm_i915_gem_request *request; list_for_each_entry(request, >request_list, list) { - if (i915_gem_request_completed(request, false)) + if (i915_gem_request_completed(request)) continue; return request; @@ -2899,7 +2899,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring) struct drm_i915_gem_request, list); - if (!i915_gem_request_completed(request, true)) + if (!i915_gem_request_completed(request)) break; i915_gem_request_retire(request); @@ -2923,7 +2923,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring) } if (unlikely(ring->trace_irq_req && -i915_gem_request_completed(ring->trace_irq_req, true))) { +i915_gem_request_completed(ring->trace_irq_req))) { ring->irq_put(ring); i915_gem_request_assign(>trace_irq_req, NULL); } @@ -3029,7 +3029,7 @@ i915_gem_object_flush_active(struct drm_i915_gem_object *obj) if (list_empty(>list))
[Intel-gfx] [PATCH 08/13] drm/i915: Delay the freeing of requests until retire time
From: John Harrison

The request structure is reference counted. When the count reached zero, the request was immediately freed and all associated objects were unreferenced/deallocated. This meant that the driver mutex lock had to be held at the point where the count reaches zero. This was fine while all references were held internally to the driver. However, the plan is to allow the underlying fence object (and hence the request itself) to be returned to other drivers and to userland. External users cannot be expected to acquire a driver-private mutex lock. Rather than attempt to disentangle the request structure from the driver mutex lock, the decision was to defer the free code until a later (safer) point. Hence this patch changes the unreference callback to merely move the request onto a delayed-free list. The driver's retire worker thread will then process the list and actually call the free function on the requests.

v2: New patch in series.

v3: Updated after review comments by Tvrtko Ursulin. Rename list nodes to 'link' rather than 'list'. Update list processing to be more efficient/safer with respect to spinlocks.

v4: Changed to use basic spinlocks rather than IRQ ones - missed update from earlier feedback by Tvrtko.
For: VIZ-5190 Signed-off-by: John Harrison Cc: Tvrtko Ursulin --- drivers/gpu/drm/i915/i915_drv.h | 22 +++- drivers/gpu/drm/i915/i915_gem.c | 37 + drivers/gpu/drm/i915/intel_display.c| 2 +- drivers/gpu/drm/i915/intel_lrc.c| 2 ++ drivers/gpu/drm/i915/intel_pm.c | 2 +- drivers/gpu/drm/i915/intel_ringbuffer.c | 2 ++ drivers/gpu/drm/i915/intel_ringbuffer.h | 4 7 files changed, 46 insertions(+), 25 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 7d6a7c0..fbf591f 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2185,14 +2185,9 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old, * initial reference taken using kref_init */ struct drm_i915_gem_request { - /** -* Underlying object for implementing the signal/wait stuff. -* NB: Never return this fence object to user land! It is unsafe to -* let anything outside of the i915 driver get hold of the fence -* object as the clean up when decrementing the reference count -* requires holding the driver mutex lock. -*/ + /** Underlying object for implementing the signal/wait stuff. 
*/ struct fence fence; + struct list_head delayed_free_link; /** On Which ring this request was generated */ struct drm_i915_private *i915; @@ -2305,21 +2300,10 @@ i915_gem_request_reference(struct drm_i915_gem_request *req) static inline void i915_gem_request_unreference(struct drm_i915_gem_request *req) { - WARN_ON(!mutex_is_locked(>ring->dev->struct_mutex)); - fence_put(>fence); -} - -static inline void -i915_gem_request_unreference__unlocked(struct drm_i915_gem_request *req) -{ - struct drm_device *dev; - if (!req) return; - dev = req->ring->dev; - if (kref_put_mutex(>fence.refcount, fence_release, >struct_mutex)) - mutex_unlock(>struct_mutex); + fence_put(>fence); } static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst, diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 7a37fb7..f6c3e96 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -2617,10 +2617,26 @@ static void i915_set_reset_status(struct drm_i915_private *dev_priv, } } -static void i915_gem_request_free(struct fence *req_fence) +static void i915_gem_request_release(struct fence *req_fence) { struct drm_i915_gem_request *req = container_of(req_fence, typeof(*req), fence); + struct intel_engine_cs *ring = req->ring; + struct drm_i915_private *dev_priv = to_i915(ring->dev); + + /* +* Need to add the request to a deferred dereference list to be +* processed at a mutex lock safe time. 
+*/ + spin_lock(>delayed_free_lock); + list_add_tail(>delayed_free_link, >delayed_free_list); + spin_unlock(>delayed_free_lock); + + queue_delayed_work(dev_priv->wq, _priv->mm.retire_work, 0); +} + +static void i915_gem_request_free(struct drm_i915_gem_request *req) +{ struct intel_context *ctx = req->ctx; WARN_ON(!mutex_is_locked(>ring->dev->struct_mutex)); @@ -2697,7 +2713,7 @@ static const struct fence_ops i915_gem_request_fops = { .enable_signaling = i915_gem_request_enable_signaling, .signaled = i915_gem_request_is_completed, .wait = fence_default_wait, - .release
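The deferred-free pattern in the patch above boils down to a two-phase teardown: the release callback parks the dead request on a list (no driver mutex required), and a retire worker later frees the list entries at a lock-safe time. A minimal user-space sketch, with invented names:

```c
#include <assert.h>
#include <stdlib.h>

struct dead_request {
	struct dead_request *next;
};

static struct dead_request *delayed_free_list;

/* Phase 1: last reference dropped -- just park the request.  The real
 * driver also takes a spinlock here and kicks the retire work item. */
static void request_deferred_release(struct dead_request *req)
{
	req->next = delayed_free_list;
	delayed_free_list = req;
}

/* Phase 2: retire worker actually frees, at a mutex-safe time.
 * Returns how many requests it reaped. */
static int retire_worker(void)
{
	int n = 0;

	while (delayed_free_list) {
		struct dead_request *req = delayed_free_list;

		delayed_free_list = req->next;
		free(req);
		n++;
	}
	return n;
}
```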
[Intel-gfx] [PATCH 11/13] android/sync: Fix reversed sense of signaled fence
From: Peter Lawthers

In the 3.14 kernel, a signaled fence was indicated by the status field == 1. In 4.x, a status == 0 indicates signaled, status < 0 indicates error, and status > 0 indicates active. This patch wraps the check for a signaled fence in a function so that callers no longer need to know the underlying implementation.

v3: New patch for series.

Change-Id: I8e565e49683e3efeb9474656cd84cf4add6ad6a2
Tracked-On: https://jira01.devtools.intel.com/browse/ACD-308
Signed-off-by: Peter Lawthers
---
 drivers/android/sync.h | 21 +
 1 file changed, 21 insertions(+)

diff --git a/drivers/android/sync.h b/drivers/android/sync.h
index d57fa0a..75532d8 100644
--- a/drivers/android/sync.h
+++ b/drivers/android/sync.h
@@ -345,6 +345,27 @@ int sync_fence_cancel_async(struct sync_fence *fence,
  */
 int sync_fence_wait(struct sync_fence *fence, long timeout);

+/**
+ * sync_fence_is_signaled() - Return an indication if the fence is signaled
+ * @fence: fence to check
+ *
+ * returns 1 if fence is signaled
+ * returns 0 if fence is not signaled
+ * returns < 0 if fence is in error state
+ */
+static inline int
+sync_fence_is_signaled(struct sync_fence *fence)
+{
+	int status;
+
+	status = atomic_read(&fence->status);
+	if (status == 0)
+		return 1;
+	if (status > 0)
+		return 0;
+	return status;
+}
+
 #ifdef CONFIG_DEBUG_FS
 void sync_timeline_debug_add(struct sync_timeline *obj);
--
1.9.1
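The status encoding the patch describes can be restated as a tiny pure function. This is a stand-alone illustration of the mapping only, not the header's actual code (which reads the status field atomically from the fence):

```c
#include <assert.h>

/* 4.x sync fence status: 0 = signaled, >0 = active, <0 = error. */
static int status_to_signaled(int status)
{
	if (status == 0)
		return 1;	/* signaled */
	if (status > 0)
		return 0;	/* still active */
	return status;		/* error code passed through */
}
```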
[Intel-gfx] [PATCH 01/13] staging/android/sync: Support sync points created from dma-fences
From: Maarten LankhorstDebug output assumes all sync points are built on top of Android sync points and when we start creating them from dma-fences will NULL ptr deref unless taught about this. v4: Corrected patch ownership. Signed-off-by: Maarten Lankhorst Signed-off-by: Tvrtko Ursulin Cc: Maarten Lankhorst Cc: de...@driverdev.osuosl.org Cc: Riley Andrews Cc: Greg Kroah-Hartman Cc: Arve Hjønnevåg --- drivers/staging/android/sync_debug.c | 42 +++- 1 file changed, 22 insertions(+), 20 deletions(-) diff --git a/drivers/staging/android/sync_debug.c b/drivers/staging/android/sync_debug.c index 91ed2c4..f45d13c 100644 --- a/drivers/staging/android/sync_debug.c +++ b/drivers/staging/android/sync_debug.c @@ -82,36 +82,42 @@ static const char *sync_status_str(int status) return "error"; } -static void sync_print_pt(struct seq_file *s, struct sync_pt *pt, bool fence) +static void sync_print_pt(struct seq_file *s, struct fence *pt, bool fence) { int status = 1; - struct sync_timeline *parent = sync_pt_parent(pt); - if (fence_is_signaled_locked(>base)) - status = pt->base.status; + if (fence_is_signaled_locked(pt)) + status = pt->status; seq_printf(s, " %s%spt %s", - fence ? parent->name : "", + fence && pt->ops->get_timeline_name ? + pt->ops->get_timeline_name(pt) : "", fence ? 
"_" : "", sync_status_str(status)); if (status <= 0) { struct timespec64 ts64 = - ktime_to_timespec64(pt->base.timestamp); + ktime_to_timespec64(pt->timestamp); seq_printf(s, "@%lld.%09ld", (s64)ts64.tv_sec, ts64.tv_nsec); } - if (parent->ops->timeline_value_str && - parent->ops->pt_value_str) { + if ((!fence || pt->ops->timeline_value_str) && + pt->ops->fence_value_str) { char value[64]; + bool success; - parent->ops->pt_value_str(pt, value, sizeof(value)); - seq_printf(s, ": %s", value); - if (fence) { - parent->ops->timeline_value_str(parent, value, - sizeof(value)); - seq_printf(s, " / %s", value); + pt->ops->fence_value_str(pt, value, sizeof(value)); + success = strlen(value); + + if (success) + seq_printf(s, ": %s", value); + + if (success && fence) { + pt->ops->timeline_value_str(pt, value, sizeof(value)); + + if (strlen(value)) + seq_printf(s, " / %s", value); } } @@ -138,7 +144,7 @@ static void sync_print_obj(struct seq_file *s, struct sync_timeline *obj) list_for_each(pos, >child_list_head) { struct sync_pt *pt = container_of(pos, struct sync_pt, child_list); - sync_print_pt(s, pt, false); + sync_print_pt(s, >base, false); } spin_unlock_irqrestore(>child_list_lock, flags); } @@ -153,11 +159,7 @@ static void sync_print_fence(struct seq_file *s, struct sync_fence *fence) sync_status_str(atomic_read(>status))); for (i = 0; i < fence->num_fences; ++i) { - struct sync_pt *pt = - container_of(fence->cbs[i].sync_pt, -struct sync_pt, base); - - sync_print_pt(s, pt, true); + sync_print_pt(s, fence->cbs[i].sync_pt, true); } spin_lock_irqsave(>wq.lock, flags); -- 1.9.1 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 02/13] staging/android/sync: add sync_fence_create_dma
From: Maarten Lankhorst

This allows users of dma fences to create an android fence. v2: Added kerneldoc. (Tvrtko Ursulin). v4: Updated comments from review feedback by Maarten. Signed-off-by: Maarten Lankhorst Signed-off-by: Tvrtko Ursulin Cc: Maarten Lankhorst Cc: Daniel Vetter Cc: Jesse Barnes Cc: de...@driverdev.osuosl.org Cc: Riley Andrews Cc: Greg Kroah-Hartman Cc: Arve Hjønnevåg --- drivers/staging/android/sync.c | 13 + drivers/staging/android/sync.h | 10 ++ 2 files changed, 19 insertions(+), 4 deletions(-) diff --git a/drivers/staging/android/sync.c b/drivers/staging/android/sync.c index f83e00c..7f0e919 100644 --- a/drivers/staging/android/sync.c +++ b/drivers/staging/android/sync.c @@ -188,7 +188,7 @@ static void fence_check_cb_func(struct fence *f, struct fence_cb *cb) } /* TODO: implement a create which takes more that one sync_pt */ -struct sync_fence *sync_fence_create(const char *name, struct sync_pt *pt) +struct sync_fence *sync_fence_create_dma(const char *name, struct fence *pt) { struct sync_fence *fence; @@ -199,16 +199,21 @@ struct sync_fence *sync_fence_create(const char *name, struct sync_pt *pt) fence->num_fences = 1; atomic_set(&fence->status, 1); - fence->cbs[0].sync_pt = &pt->base; + fence->cbs[0].sync_pt = pt; fence->cbs[0].fence = fence; - if (fence_add_callback(&pt->base, &fence->cbs[0].cb, - fence_check_cb_func)) + if (fence_add_callback(pt, &fence->cbs[0].cb, fence_check_cb_func)) atomic_dec(&fence->status); sync_fence_debug_add(fence); return fence; } +EXPORT_SYMBOL(sync_fence_create_dma); + +struct sync_fence *sync_fence_create(const char *name, struct sync_pt *pt) +{ + return sync_fence_create_dma(name, &pt->base); +} EXPORT_SYMBOL(sync_fence_create); struct sync_fence *sync_fence_fdget(int fd) diff --git a/drivers/staging/android/sync.h b/drivers/staging/android/sync.h index 61f8a3a..afa0752 100644 --- a/drivers/staging/android/sync.h +++ b/drivers/staging/android/sync.h @@ -254,6 +254,16 @@ void sync_pt_free(struct sync_pt *pt); */ struct sync_fence *sync_fence_create(const char *name, struct sync_pt *pt); +/** + * sync_fence_create_dma() - creates a sync fence from dma-fence + * @name: name of fence to create + * @pt: dma-fence to add to the fence + * + * Creates a fence containing @pt. Once this is called, the fence takes + * ownership of @pt. + */ +struct sync_fence *sync_fence_create_dma(const char *name, struct fence *pt); + /* * API for sync_fence consumers */ -- 1.9.1 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 00/40] GPU scheduler for i915 driver
From: John Harrison

Implemented a batch buffer submission scheduler for the i915 DRM driver. The general theory of operation is that when batch buffers are submitted to the driver, the execbuffer() code assigns a unique seqno value and then packages up all the information required to execute the batch buffer at a later time. This package is given over to the scheduler which adds it to an internal node list. The scheduler also scans the list of objects associated with the batch buffer and compares them against the objects already in use by other buffers in the node list. If matches are found then the new batch buffer node is marked as being dependent upon the matching node. The same is done for the context object. The scheduler also bumps up the priority of such matching nodes on the grounds that the more dependencies a given batch buffer has the more important it is likely to be. The scheduler aims to have a given (tuneable) number of batch buffers in flight on the hardware at any given time. If fewer than this are currently executing when a new node is queued, then the node is passed straight through to the submit function. Otherwise it is simply added to the queue and the driver returns to user land. As each batch buffer completes, it raises an interrupt which wakes up the scheduler. Note that it is possible for multiple buffers to complete before the IRQ handler gets to run. Further, the seqno values of the individual buffers are not necessarily incrementing as the scheduler may have re-ordered their submission. However, the scheduler keeps the list of executing buffers in order of hardware submission. Thus it can scan through the list until a matching seqno is found and then mark all in flight nodes from that point on as completed. A deferred work queue is also poked by the interrupt handler. 
When this wakes up it can do more involved processing such as actually removing completed nodes from the queue and freeing up the resources associated with them (internal memory allocations, DRM object references, context reference, etc.). The work handler also checks the in flight count and calls the submission code if a new slot has appeared. When the scheduler's submit code is called, it scans the queued node list for the highest priority node that has no unmet dependencies. Note that the dependency calculation is complex as it must take inter-ring dependencies and potential preemptions into account. Note also that in the future this will be extended to include external dependencies such as the Android Native Sync file descriptors and/or the Linux dma-buf synchronisation scheme. If a suitable node is found then it is sent to execbuff_final() for submission to the hardware. The in flight count is then re-checked and a new node popped from the list if appropriate. The scheduler also allows high priority batch buffers (e.g. from a desktop compositor) to jump ahead of whatever is already running if the underlying hardware supports pre-emption. In this situation, any work that was pre-empted is returned to the queued list ready to be resubmitted when no more high priority work is outstanding. Various IGT tests are in progress to test the scheduler's operation and will follow. v2: Updated for changes in struct fence patch series and other changes to underlying tree (e.g. removal of cliprects). Also changed priority levels to be signed +/-1023 range and reduced mutex lock usage. v3: More reuse of cached pointers rather than repeated dereferencing (David Gordon). Moved the dependency generation code out to a separate function for easier readability. Also added in support for the read-read optimisation. Major simplification of the DRM file close handler. Fixed up an overzealous WARN. Removed unnecessary flushing of the scheduler queue when waiting for a request. 
[Patches against drm-intel-nightly tree fetched 17/11/2015 with struct fence conversion patches applied] Dave Gordon (3): drm/i915: Updating assorted register and status page definitions drm/i915: Cache request pointer in *_submission_final() drm/i915: Add scheduling priority to per-context parameters John Harrison (37): drm/i915: Add total count to context status debugfs output drm/i915: Explicit power enable during deferred context initialisation drm/i915: Prelude to splitting i915_gem_do_execbuffer in two drm/i915: Split i915_dem_do_execbuffer() in half drm/i915: Re-instate request->uniq because it is extremely useful drm/i915: Start of GPU scheduler drm/i915: Prepare retire_requests to handle out-of-order seqnos drm/i915: Disable hardware semaphores when GPU scheduler is enabled drm/i915: Force MMIO flips when scheduler enabled drm/i915: Added scheduler hook when closing DRM file handles drm/i915: Added scheduler hook into i915_gem_request_notify() drm/i915: Added deferred work handler for scheduler drm/i915: Redirect execbuffer_final() via scheduler drm/i915: Keep the
Re: [Intel-gfx] [PATCH] drm/i915: Update to post-reset execlist queue clean-up
On 01/12/15 11:46, Tvrtko Ursulin wrote: On 23/10/15 18:02, Tomas Elf wrote: When clearing an execlist queue, instead of traversing it and unreferencing all requests while holding the spinlock (which might lead to thread sleeping with IRQs are turned off - bad news!), just move all requests to the retire request list while holding spinlock and then drop spinlock and invoke the execlists request retirement path, which already deals with the intricacies of purging/dereferencing execlist queue requests. This patch can be considered v3 of: commit b96db8b81c54ef30485ddb5992d63305d86ea8d3 Author: Tomas Elfdrm/i915: Grab execlist spinlock to avoid post-reset concurrency issues This patch assumes v2 of the above patch is part of the baseline, reverts v2 and adds changes on top to turn it into v3. Signed-off-by: Tomas Elf Cc: Tvrtko Ursulin Cc: Chris Wilson --- drivers/gpu/drm/i915/i915_gem.c | 15 --- 1 file changed, 4 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 2c7a0b7..b492603 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -2756,20 +2756,13 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv, if (i915.enable_execlists) { spin_lock_irq(>execlist_lock); -while (!list_empty(>execlist_queue)) { -struct drm_i915_gem_request *submit_req; -submit_req = list_first_entry(>execlist_queue, -struct drm_i915_gem_request, -execlist_link); -list_del(_req->execlist_link); +/* list_splice_tail_init checks for empty lists */ +list_splice_tail_init(>execlist_queue, + >execlist_retired_req_list); -if (submit_req->ctx != ring->default_context) -intel_lr_context_unpin(submit_req); - -i915_gem_request_unreference(submit_req); -} spin_unlock_irq(>execlist_lock); +intel_execlists_retire_requests(ring); } /* Fallen through the cracks.. 
This looks to be even more serious, since lockdep notices possible deadlock involving vmap_area_lock: Possible interrupt unsafe locking scenario: CPU0CPU1 lock(vmap_area_lock); local_irq_disable(); lock(&(>execlist_lock)->rlock); lock(vmap_area_lock); lock(&(>execlist_lock)->rlock); *** DEADLOCK *** Because it unpins LRC context and ringbuffer which ends up in the VM code under the execlist_lock. intel_execlists_retire_requests is slightly different from the code in the reset handler because it concerns itself with ctx_obj existence which the other one doesn't. Could people more knowledgeable of this code check if it is OK and R-B? Regards, Tvrtko Hi Tvrtko, I didn't understand this message at first, I thought you'd found a problem with this ("v3") patch, but now I see what you actually meant is that there is indeed a problem with the (v2) that got merged, not the original question about unreferencing an object while holding a spinlock (because it can't be the last reference), but rather because of the unpin, which can indeed cause a problem with a non-i915-defined kernel lock. So we should certainly update the current (v2) upstream with this. Thomas Daniel already R-B'd this code on 23rd October, when it was: [PATCH v3 7/8] drm/i915: Grab execlist spinlock to avoid post-reset concurrency issues. and it hasn't changed in substance since then, so you can carry his R-B over, plus I said on that same day that this was a better solution. So: Reviewed-by: Thomas Daniel Reviewed-by: Dave Gordon ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH v2] PM / Runtime: Introduce pm_runtime_get_noidle
On to, 2015-12-10 at 23:14 +0100, Rafael J. Wysocki wrote: > On Thursday, December 10, 2015 11:20:40 PM Imre Deak wrote: > > On Thu, 2015-12-10 at 22:42 +0100, Rafael J. Wysocki wrote: > > > On Thursday, December 10, 2015 10:36:37 PM Rafael J. Wysocki > > > wrote: > > > > On Thursday, December 10, 2015 11:43:50 AM Imre Deak wrote: > > > > > On Thu, 2015-12-10 at 01:58 +0100, Rafael J. Wysocki wrote: > > > > > > On Wednesday, December 09, 2015 06:22:19 PM Joonas Lahtinen > > > > > > wrote: > > > > > > > Introduce pm_runtime_get_noidle to for situations where > > > > > > > it is > > > > > > > not > > > > > > > desireable to touch an idling device. One use scenario is > > > > > > > periodic > > > > > > > hangchecks performed by the drm/i915 driver which can be > > > > > > > omitted > > > > > > > on a device in a runtime idle state. > > > > > > > > > > > > > > v2: > > > > > > > - Fix inconsistent return value when !CONFIG_PM. > > > > > > > - Update documentation for bool return value > > > > > > > > > > > > > > Signed-off-by: Joonas Lahtinen> > > > > > el.c > > > > > > > om> > > > > > > > Reported-by: Chris Wilson > > > > > > > Cc: Chris Wilson > > > > > > > Cc: "Rafael J. Wysocki" > > > > > > > Cc: linux...@vger.kernel.org > > > > > > > > > > > > Well, I don't quite see how this can be used in a non-racy > > > > > > way > > > > > > without doing an additional pm_runtime_resume() or > > > > > > something > > > > > > like > > > > > > that in the same code path. > > > > > > > > > > We don't want to resume, that would be the whole point. We'd > > > > > like > > > > > to > > > > > ensure that we hold a reference _and_ the device is already > > > > > active. So > > > > > AFAICS we'd need to check runtime_status == RPM_ACTIVE in > > > > > addition > > > > > after taking the reference. > > > > > > > > Right, and that under the lock. 
> > > Which basically means you can call pm_runtime_resume() just fine, > because it will do nothing if the status is RPM_ACTIVE already. > > > > So really, why don't you use pm_runtime_get_sync()? > > The difference would be that if the status is not RPM_ACTIVE > already we > would drop the reference and report error. The caller would in this > case forego doing something, since the device is suspended or > on > the way to being suspended. One example of such a scenario is a > watchdog-like functionality: the watchdog work would > call pm_runtime_get_noidle() and check if the device is ok by doing > some HW access, but only if the device is powered. Otherwise the > work > item would do nothing (meaning it also won't reschedule itself). > The > watchdog work would get rescheduled next time the device is woken > up > and some work is submitted to the device. > > So first of all the name "pm_runtime_get_noidle" doesn't make sense. > > I guess what you need is something like > > bool pm_runtime_get_if_active(struct device *dev) > { > unsigned long flags; > bool ret; > > spin_lock_irqsave(&dev->power.lock, flags); > > if (dev->power.runtime_status == RPM_ACTIVE) { But here usage_count could be zero, meaning that the device is already on the way to be suspended (autosuspend or ASYNC suspend), no? In that case we don't want to return success. That would unnecessarily prolong the time the device is kept active. > atomic_inc(&dev->power.usage_count); > ret = true; > } else { > ret = false; > } > > spin_unlock_irqrestore(&dev->power.lock, flags); > > return ret; > } > > and the caller will simply bail out if "false" is returned, but if > "true" > is returned, it will have to drop the usage count, right? Yes. --Imre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH i-g-t] tests/gem_softpin: Use offset addresses in canonical form
i915 validates that requested offset is in canonical form, so tests need to convert the offsets as required. Also add test to verify non-canonical 48-bit address will be rejected. Signed-off-by: Michel Thierry--- tests/gem_softpin.c | 66 + 1 file changed, 46 insertions(+), 20 deletions(-) diff --git a/tests/gem_softpin.c b/tests/gem_softpin.c index 7bee16b..2981b30 100644 --- a/tests/gem_softpin.c +++ b/tests/gem_softpin.c @@ -67,7 +67,7 @@ static void *create_mem_buffer(uint64_t size); static int gem_call_userptr_ioctl(int fd, i915_gem_userptr *userptr); static void gem_pin_userptr_test(void); static void gem_pin_bo_test(void); -static void gem_pin_invalid_vma_test(bool test_decouple_flags); +static void gem_pin_invalid_vma_test(bool test_decouple_flags, bool test_canonical_offset); static void gem_pin_overlap_test(void); static void gem_pin_high_address_test(void); @@ -198,6 +198,15 @@ static void setup_exec_obj(struct drm_i915_gem_exec_object2 *exec, exec->offset = offset; } +/* gen8_canonical_addr + * Used to convert any address into canonical form, i.e. [63:48] == [47]. 
+ * @address - a virtual address +*/ +static uint64_t gen8_canonical_addr(uint64_t address) +{ + return ((int64_t)address << 16) >> 16; +} + /* gem_store_data_svm * populate batch buffer with MI_STORE_DWORD_IMM command * @fd: drm file descriptor @@ -630,6 +639,7 @@ static void gem_pin_overlap_test(void) * Share with GPU using userptr ioctl * Create batch buffer to write DATA in first element of each buffer * Pin each buffer to varying addresses starting from 0x8000 going below + * (requires offsets in canonical form) * Execute Batch Buffer on Blit ring STRESS_NUM_LOOPS times * Validate every buffer has DATA in first element * Rinse and Repeat on Render ring @@ -637,7 +647,7 @@ static void gem_pin_overlap_test(void) #define STRESS_NUM_BUFFERS 10 #define STRESS_NUM_LOOPS 100 #define STRESS_STORE_COMMANDS 4 * STRESS_NUM_BUFFERS - +#define STRESS_START_ADDRESS 0x8000 static void gem_softpin_stress_test(void) { i915_gem_userptr userptr; @@ -650,7 +660,7 @@ static void gem_softpin_stress_test(void) uint32_t batch_buf_handle; int ring, len; int buf, loop; - uint64_t pinning_offset = 0x8000; + uint64_t pinning_offset = STRESS_START_ADDRESS; fd = drm_open_driver(DRIVER_INTEL); igt_require(uses_full_ppgtt(fd, FULL_48_BIT_PPGTT)); @@ -680,10 +690,10 @@ static void gem_softpin_stress_test(void) setup_exec_obj(_object2[buf], shared_handle[buf], EXEC_OBJECT_PINNED | EXEC_OBJECT_SUPPORTS_48B_ADDRESS, - pinning_offset); + gen8_canonical_addr(pinning_offset)); len += gem_store_data_svm(fd, batch_buffer + (len/4), - pinning_offset, buf, - (buf == STRESS_NUM_BUFFERS-1)? \ + gen8_canonical_addr(pinning_offset), + buf, (buf == STRESS_NUM_BUFFERS-1)? 
\ true:false); /* decremental 4K aligned address */ @@ -705,10 +715,11 @@ static void gem_softpin_stress_test(void) for (loop = 0; loop < STRESS_NUM_LOOPS; loop++) { submit_and_sync(fd, , batch_buf_handle); /* Set pinning offset back to original value */ - pinning_offset = 0x8000; + pinning_offset = STRESS_START_ADDRESS; for(buf = 0; buf < STRESS_NUM_BUFFERS; buf++) { gem_userptr_sync(fd, shared_handle[buf]); - igt_assert(exec_object2[buf].offset == pinning_offset); + igt_assert(exec_object2[buf].offset == + gen8_canonical_addr(pinning_offset)); igt_fail_on_f(*shared_buffer[buf] != buf, \ "Mismatch in buffer %d, iteration %d: 0x%08X\n", \ buf, loop, *shared_buffer[buf]); @@ -727,10 +738,11 @@ static void gem_softpin_stress_test(void) STRESS_NUM_BUFFERS + 1, len); for (loop = 0; loop < STRESS_NUM_LOOPS; loop++) { submit_and_sync(fd, , batch_buf_handle); - pinning_offset = 0x8000; + pinning_offset = STRESS_START_ADDRESS; for(buf = 0; buf < STRESS_NUM_BUFFERS; buf++) { gem_userptr_sync(fd, shared_handle[buf]); - igt_assert(exec_object2[buf].offset == pinning_offset); +
Re: [Intel-gfx] [PATCH 13/13] drm/i915: Cache last IRQ seqno to reduce IRQ overhead
On 11/12/15 13:12, john.c.harri...@intel.com wrote: From: John HarrisonThe notify function can be called many times without the seqno changing. A large number of duplicates are to prevent races due to the requirement of not enabling interrupts until requested. However, when interrupts are enabled the IRQ handle can be called multiple times without the ring's seqno value changing. This patch reduces the overhead of these extra calls by caching the last processed seqno value and early exiting if it has not changed. v3: New patch for series. For: VIZ-5190 Signed-off-by: John Harrison --- drivers/gpu/drm/i915/i915_gem.c | 14 +++--- drivers/gpu/drm/i915/intel_ringbuffer.h | 1 + 2 files changed, 12 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 279d79f..3c88678 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -2457,6 +2457,8 @@ i915_gem_init_seqno(struct drm_device *dev, u32 seqno) for (j = 0; j < ARRAY_SIZE(ring->semaphore.sync_seqno); j++) ring->semaphore.sync_seqno[j] = 0; + + ring->last_irq_seqno = 0; } return 0; @@ -2788,11 +2790,14 @@ void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked) return; } - if (!fence_locked) - spin_lock_irqsave(>fence_lock, flags); - seqno = ring->get_seqno(ring, false); trace_i915_gem_request_notify(ring, seqno); + if (seqno == ring->last_irq_seqno) + return; + ring->last_irq_seqno = seqno; Hmmm.. do you want to make the check "seqno <= ring->last_irq_seqno" ? Is there a possibility for some weird timing or caching issue where two callers get in and last_irq_seqno goes backwards? Not sure that it would cause a problem, but pattern is unusual and hard to understand for me. Also check and the assignment would need to be under the spinlock I think. 
+ + if (!fence_locked) + spin_lock_irqsave(>fence_lock, flags); list_for_each_entry_safe(req, req_next, >fence_signal_list, signal_link) { if (!req->cancelled) { @@ -3163,7 +3168,10 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv, * Tidy up anything left over. This includes a call to * i915_gem_request_notify() which will make sure that any requests * that were on the signal pending list get also cleaned up. +* NB: The seqno cache must be cleared otherwise the notify call will +* simply return immediately. */ + ring->last_irq_seqno = 0; i915_gem_retire_requests_ring(ring); /* Having flushed all requests from all queues, we know that all diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h index 9d09edb..1987abd 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.h +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h @@ -356,6 +356,7 @@ struct intel_engine_cs { spinlock_t fence_lock; struct list_head fence_signal_list; struct list_head fence_unsignal_list; + uint32_t last_irq_seqno; }; bool intel_ring_initialized(struct intel_engine_cs *ring); Regards, Tvrtko ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: Correct max delay for HDMI hotplug live status checking
On Fri, Dec 11, 2015 at 05:05:11AM +, Jindal, Sonika wrote: > How about following instead of two levels of check in the while loop: > > unsigned int retry = 3; > > do { > live_status = intel_digital_port_connected(dev_priv, > hdmi_to_dig_port(intel_hdmi)); > if (live_status) > break; > mdelay(10); > } while (--retry); How about a straight up for loop instead? > > Regards, > Sonika > > -Original Message- > From: Intel-gfx [mailto:intel-gfx-boun...@lists.freedesktop.org] On Behalf Of > Wang, Gary C > Sent: Friday, December 11, 2015 7:39 AM > To: intel-gfx@lists.freedesktop.org > Subject: [Intel-gfx] [PATCH] drm/i915: Correct max delay for HDMI hotplug > live status checking > > The total delay of HDMI hotplug detecting with 30ms should have been split > into a resolution of 3 retries of 10ms each, for the worst cases. But it > still suffered from only waiting 10ms at most in intel_hdmi_detect(). This > patch corrects it by reading hotplug status with 4 times at most for 30ms > delay. > > Reviewed-by: Cooper Chiou> Cc: Gavin Hindman > Signed-off-by: Gary Wang > --- > drivers/gpu/drm/i915/intel_hdmi.c | 7 +-- > 1 file changed, 5 insertions(+), 2 deletions(-) mode change 100644 => > 100755 drivers/gpu/drm/i915/intel_hdmi.c > > diff --git a/drivers/gpu/drm/i915/intel_hdmi.c > b/drivers/gpu/drm/i915/intel_hdmi.c > old mode 100644 > new mode 100755 > index be7fab9..888401b > --- a/drivers/gpu/drm/i915/intel_hdmi.c > +++ b/drivers/gpu/drm/i915/intel_hdmi.c > @@ -1387,16 +1387,19 @@ intel_hdmi_detect(struct drm_connector *connector, > bool force) > struct intel_hdmi *intel_hdmi = intel_attached_hdmi(connector); > struct drm_i915_private *dev_priv = to_i915(connector->dev); > bool live_status = false; > - unsigned int retry = 3; > + // read hotplug status 4 times at most for 30ms delay (3 retries of > 10ms each) > + unsigned int retry = 4; > > DRM_DEBUG_KMS("[CONNECTOR:%d:%s]\n", > connector->base.id, connector->name); > > intel_display_power_get(dev_priv, 
POWER_DOMAIN_GMBUS); > > - while (!live_status && --retry) { > + while (!live_status && retry--) { > live_status = intel_digital_port_connected(dev_priv, > hdmi_to_dig_port(intel_hdmi)); > + if (live_status || !retry) > + break; > mdelay(10); > } > > -- > 1.9.1 > > ___ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/intel-gfx > ___ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/intel-gfx -- Ville Syrjälä Intel OTC ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 05/40] drm/i915: Split i915_dem_do_execbuffer() in half
From: John HarrisonSplit the execbuffer() function in half. The first half collects and validates all the information requried to process the batch buffer. It also does all the object pinning, relocations, active list management, etc - basically anything that must be done upfront before the IOCTL returns and allows the user land side to start changing/freeing things. The second half does the actual ring submission. This change implements the split but leaves the back half being called directly from the end of the front half. v2: Updated due to changes in underlying tree - addition of sync fence support and removal of cliprects. v3: Moved local 'ringbuf' variable to make later patches in the series a bit neater. Change-Id: I5e1c77639ce526ab2401b0323186c518bf13da0a For: VIZ-1587 Signed-off-by: John Harrison --- drivers/gpu/drm/i915/i915_drv.h| 11 +++ drivers/gpu/drm/i915/i915_gem.c| 2 + drivers/gpu/drm/i915/i915_gem_execbuffer.c | 130 - drivers/gpu/drm/i915/intel_lrc.c | 57 + drivers/gpu/drm/i915/intel_lrc.h | 1 + 5 files changed, 145 insertions(+), 56 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 194bca0..eb00454 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1679,10 +1679,18 @@ struct i915_execbuffer_params { struct drm_device *dev; struct drm_file *file; uint32_tdispatch_flags; + uint32_targs_flags; uint32_targs_batch_start_offset; + uint32_targs_batch_len; + uint32_targs_num_cliprects; + uint32_targs_DR1; + uint32_targs_DR4; uint64_tbatch_obj_vm_offset; struct intel_engine_cs *ring; struct drm_i915_gem_object *batch_obj; + struct drm_clip_rect*cliprects; + uint32_tinstp_mask; + int instp_mode; struct intel_context*ctx; struct drm_i915_gem_request *request; }; @@ -1944,6 +1952,7 @@ struct drm_i915_private { int (*execbuf_submit)(struct i915_execbuffer_params *params, struct drm_i915_gem_execbuffer2 *args, struct list_head *vmas); + int (*execbuf_final)(struct 
i915_execbuffer_params *params); int (*init_rings)(struct drm_device *dev); void (*cleanup_ring)(struct intel_engine_cs *ring); void (*stop_ring)(struct intel_engine_cs *ring); @@ -2798,9 +2807,11 @@ int i915_gem_sw_finish_ioctl(struct drm_device *dev, void *data, void i915_gem_execbuffer_move_to_active(struct list_head *vmas, struct drm_i915_gem_request *req); void i915_gem_execbuffer_retire_commands(struct i915_execbuffer_params *params); +void i915_gem_execbuff_release_batch_obj(struct drm_i915_gem_object *batch_obj); int i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params, struct drm_i915_gem_execbuffer2 *args, struct list_head *vmas); +int i915_gem_ringbuffer_submission_final(struct i915_execbuffer_params *params); int i915_gem_execbuffer(struct drm_device *dev, void *data, struct drm_file *file_priv); int i915_gem_execbuffer2(struct drm_device *dev, void *data, diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 3c88678..b9501ca 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -5257,11 +5257,13 @@ int i915_gem_init(struct drm_device *dev) if (!i915.enable_execlists) { dev_priv->gt.execbuf_submit = i915_gem_ringbuffer_submission; + dev_priv->gt.execbuf_final = i915_gem_ringbuffer_submission_final; dev_priv->gt.init_rings = i915_gem_init_rings; dev_priv->gt.cleanup_ring = intel_cleanup_ring_buffer; dev_priv->gt.stop_ring = intel_stop_ring_buffer; } else { dev_priv->gt.execbuf_submit = intel_execlists_submission; + dev_priv->gt.execbuf_final = intel_execlists_submission_final; dev_priv->gt.init_rings = intel_logical_rings_init; dev_priv->gt.cleanup_ring = intel_logical_ring_cleanup; dev_priv->gt.stop_ring = intel_logical_ring_stop; diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c index f7f1057..05c9de6 100644 --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c +++
[Intel-gfx] [PATCH 18/40] drm/i915: Hook scheduler node clean up into retire requests
From: John HarrisonThe scheduler keeps its own lock on various DRM objects in order to guarantee safe access long after the original execbuff IOCTL has completed. This is especially important when pre-emption is enabled as the batch buffer might need to be submitted to the hardware multiple times. This patch hooks the clean up of these locks into the request retire function. The request can only be retired after it has completed on the hardware and thus is no longer eligible for re-submission. Thus there is no point holding on to the locks beyond that time. v3: Updated to not WARN when cleaning a node that is being cancelled. The clean will happen later so skipping it at the point of cancellation is fine. For: VIZ-1587 Signed-off-by: John Harrison --- drivers/gpu/drm/i915/i915_gem.c | 3 ++ drivers/gpu/drm/i915/i915_scheduler.c | 54 --- drivers/gpu/drm/i915/i915_scheduler.h | 1 + 3 files changed, 42 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index dc5f3fe..349ff58 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1402,6 +1402,9 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request) fence_signal_locked(>fence); } + if (request->scheduler_qe) + i915_gem_scheduler_clean_node(request->scheduler_qe); + i915_gem_request_unreference(request); } diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index 300cd89..f88c871 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -406,6 +406,41 @@ void i915_scheduler_wakeup(struct drm_device *dev) queue_work(dev_priv->wq, _priv->mm.scheduler_work); } +void i915_gem_scheduler_clean_node(struct i915_scheduler_queue_entry *node) +{ + uint32_t i; + + if (!I915_SQS_IS_COMPLETE(node)) { + WARN(!node->params.request->cancelled, +"Cleaning active node: %d!\n", node->status); + return; + } + + if (node->params.batch_obj) { + /* The 
batch buffer must be unpinned before it is unreferenced +* otherwise the unpin fails with a missing vma!? */ + if (node->params.dispatch_flags & I915_DISPATCH_SECURE) + i915_gem_execbuff_release_batch_obj(node->params.batch_obj); + + node->params.batch_obj = NULL; + } + + /* Release the locked buffers: */ + for (i = 0; i < node->num_objs; i++) { + drm_gem_object_unreference( + >saved_objects[i].obj->base); + } + kfree(node->saved_objects); + node->saved_objects = NULL; + node->num_objs = 0; + + /* Context too: */ + if (node->params.ctx) { + i915_gem_context_unreference(node->params.ctx); + node->params.ctx = NULL; + } +} + static int i915_scheduler_remove(struct intel_engine_cs *ring) { struct drm_i915_private *dev_priv = ring->dev->dev_private; @@ -415,7 +450,7 @@ static int i915_scheduler_remove(struct intel_engine_cs *ring) int flying = 0, queued = 0; int ret = 0; booldo_submit; - uint32_ti, min_seqno; + uint32_tmin_seqno; struct list_headremove; if (list_empty(>node_queue[ring->id])) @@ -514,21 +549,8 @@ static int i915_scheduler_remove(struct intel_engine_cs *ring) node = list_first_entry(, typeof(*node), link); list_del(>link); - /* The batch buffer must be unpinned before it is unreferenced -* otherwise the unpin fails with a missing vma!? 
*/ - if (node->params.dispatch_flags & I915_DISPATCH_SECURE) - i915_gem_execbuff_release_batch_obj(node->params.batch_obj); - - /* Release the locked buffers: */ - for (i = 0; i < node->num_objs; i++) { - drm_gem_object_unreference( - >saved_objects[i].obj->base); - } - kfree(node->saved_objects); - - /* Context too: */ - if (node->params.ctx) - i915_gem_context_unreference(node->params.ctx); + /* Free up all the DRM object references */ + i915_gem_scheduler_clean_node(node); /* And anything else owned by the node: */ node->params.request->scheduler_qe = NULL; diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h index 56f68e5..54d87fb 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.h +++ b/drivers/gpu/drm/i915/i915_scheduler.h @@ -88,6 +88,7 @@
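The rule the patch above implements — a scheduler node's object references may only be dropped once the request has completed on the hardware (or died), because until then it remains eligible for re-submission — can be sketched in a few lines. This is an illustrative standalone model in plain C, not i915 code; all type and function names here are invented:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of i915_gem_scheduler_clean_node()'s rule: references held
 * by a scheduler node survive until the node is complete or dead. A
 * cancelled-but-active node is skipped too (it is cleaned later), which
 * mirrors the v3 note in the commit message. */
enum node_status { SQS_QUEUED, SQS_FLYING, SQS_COMPLETE, SQS_DEAD };

struct queue_entry {
	enum node_status status;
	bool cancelled;
	int *obj_refs;   /* simulated reference counts, one per object */
	int num_objs;
};

/* Returns true if the node's references were actually released. */
static bool clean_node(struct queue_entry *node)
{
	/* An active node's references must survive: the batch may yet
	 * be (re-)submitted to the hardware. */
	if (node->status != SQS_COMPLETE && node->status != SQS_DEAD)
		return false;

	for (int i = 0; i < node->num_objs; i++)
		node->obj_refs[i]--;   /* drop each held reference */
	node->num_objs = 0;
	return true;
}
```

Hooking such a helper into request retirement is safe precisely because retirement already implies hardware completion.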
[Intel-gfx] [PATCH 21/40] drm/i915: Added scheduler flush calls to ring throttle and idle functions
From: John Harrison

When requesting that all GPU work is completed, it is now necessary to get the scheduler involved in order to flush out work that is queued but not yet submitted. v2: Updated to add support for flushing the scheduler queue by time stamp rather than just doing a blanket flush. v3: Moved submit_max_priority() to this patch from an earlier patch as it is no longer required in the other. Change-Id: I95dcc2a2ee5c1a844748621c333994ddd6cf6a66 For: VIZ-1587 Signed-off-by: John Harrison --- drivers/gpu/drm/i915/i915_gem.c | 24 ++- drivers/gpu/drm/i915/i915_scheduler.c | 132 ++ drivers/gpu/drm/i915/i915_scheduler.h | 3 + 3 files changed, 158 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 1a05c97..541ed9a 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -3765,6 +3765,10 @@ int i915_gpu_idle(struct drm_device *dev) /* Flush everything onto the inactive list. */ for_each_ring(ring, dev_priv, i) { + ret = i915_scheduler_flush(ring, true); + if (ret < 0) + return ret; + if (!i915.enable_execlists) { struct drm_i915_gem_request *req; @@ -4478,7 +4482,8 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file) unsigned long recent_enough = jiffies - DRM_I915_THROTTLE_JIFFIES; struct drm_i915_gem_request *request, *target = NULL; unsigned reset_counter; - int ret; + int i, ret; + struct intel_engine_cs *ring; ret = i915_gem_wait_for_error(&dev_priv->gpu_error); if (ret) @@ -4488,6 +4493,23 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file) if (ret) return ret; + for_each_ring(ring, dev_priv, i) { + /* +* Flush out scheduler entries that are getting 'stale'. Note +* that the following recent_enough test will only check +* against the time at which the request was submitted to the +* hardware (i.e. when it left the scheduler) not the time it +* was submitted to the driver.
+* +* Also, there is not much point worrying about busy return +* codes from the scheduler flush call. Even if more work +* cannot be submitted right now for whatever reason, we +* still want to throttle against stale work that has already +* been submitted. +*/ + i915_scheduler_flush_stamp(ring, recent_enough, false); + } + spin_lock(&dev_priv->mm.lock); list_for_each_entry(request, &dev_priv->mm.request_list, client_list) { if (time_after_eq(request->emitted_jiffies, recent_enough)) diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index 386f157..c13dbc3 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -31,6 +31,8 @@ static int i915_scheduler_remove_dependent(struct i915_scheduler *scheduler, struct i915_scheduler_queue_entry *remove); static int i915_scheduler_submit(struct intel_engine_cs *ring, bool is_locked); +static int i915_scheduler_submit_max_priority(struct intel_engine_cs *ring, + bool is_locked); static uint32_t i915_scheduler_count_flying(struct i915_scheduler *scheduler, struct intel_engine_cs *ring); static void i915_scheduler_priority_bump_clear(struct i915_scheduler *scheduler); @@ -580,6 +582,98 @@ void i915_gem_scheduler_work_handler(struct work_struct *work) } } +int i915_scheduler_flush_stamp(struct intel_engine_cs *ring, + unsigned long target, + bool is_locked) +{ + struct i915_scheduler_queue_entry *node; + struct drm_i915_private *dev_priv; + struct i915_scheduler *scheduler; + unsigned long flags; + int flush_count = 0; + + if (!ring) + return -EINVAL; + + dev_priv = ring->dev->dev_private; + scheduler = dev_priv->scheduler; + + if (!scheduler) + return 0; + + if (is_locked && (scheduler->flags[ring->id] & i915_sf_submitting)) { + /* Scheduler is busy already submitting another batch, +* come back later rather than going recursive... */ + return -EAGAIN; + } + + spin_lock_irqsave(&scheduler->lock, flags); + i915_scheduler_priority_bump_clear(scheduler); +
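The flush-by-timestamp idea above — push out only the entries that have gone 'stale' relative to a target time, rather than blanket-flushing the queue — can be modelled with the kernel's wrap-safe jiffies comparison trick. A minimal sketch (plain C, invented names, not the i915 implementation):

```c
#include <assert.h>

/* Illustrative sketch of flushing only queue entries whose submission
 * stamp is at or before 'target', in the spirit of
 * i915_scheduler_flush_stamp(). Returns the number flushed. */
struct entry {
	unsigned long stamp;   /* when the work was queued, in 'jiffies' */
	int flushed;
};

static int flush_stamp(struct entry *q, int n, unsigned long target)
{
	int count = 0;

	for (int i = 0; i < n; i++) {
		if (q[i].flushed)
			continue;
		/* Wrap-safe time_before_eq(stamp, target) comparison,
		 * as done with jiffies in the kernel. */
		if ((long)(q[i].stamp - target) <= 0) {
			q[i].flushed = 1;
			count++;
		}
	}
	return count;
}
```

Throttling can then call this with `target = now - THROTTLE_WINDOW` and ignore busy return codes, since the point is only to age out stale work.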
[Intel-gfx] [PATCH 24/40] drm/i915: Defer seqno allocation until actual hardware submission time
From: John Harrison

The seqno value is now only used for the final test for completion of a request. It is no longer used to track the request through the software stack. Thus it is no longer necessary to allocate the seqno immediately with the request. Instead, it can be done lazily and left until the request is actually sent to the hardware. This is particularly advantageous with a GPU scheduler as the requests can then be re-ordered between their creation and their hardware submission without having out of order seqnos. v2: i915_add_request() can't fail! Combine with 'drm/i915: Assign seqno at start of exec_final()' Various bits of code during the execbuf code path need a seqno value to be assigned to the request. This change makes this assignment explicit at the start of submission_final() rather than relying on an auto-generated seqno to have happened already. This is in preparation for a future patch which changes seqno values to be assigned lazily (during add_request). v3: Updated to use locally cached request pointer. Change-Id: I0d922b84c517611a79fa6c2b9e730d4fe3671d6a For: VIZ-1587 Signed-off-by: John Harrison --- drivers/gpu/drm/i915/i915_drv.h | 1 + drivers/gpu/drm/i915/i915_gem.c | 21 - drivers/gpu/drm/i915/i915_gem_execbuffer.c | 13 + drivers/gpu/drm/i915/intel_lrc.c | 13 + 4 files changed, 47 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 5b893a6..15dee41 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2225,6 +2225,7 @@ struct drm_i915_gem_request { /** GEM sequence number associated with this request.
*/ uint32_t seqno; + uint32_t reserved_seqno; /* Unique identifier which can be used for trace points & debug */ uint32_t uniq; diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 99e5b1d0..1fb45c2 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -2525,6 +2525,9 @@ i915_gem_get_seqno(struct drm_device *dev, u32 *seqno) /* reserve 0 for non-seqno */ if (dev_priv->next_seqno == 0) { + /* Why is the full re-initialisation required? Is it only for +* hardware semaphores? If so, could skip it in the case where +* semaphores are disabled? */ int ret = i915_gem_init_seqno(dev, 0); if (ret) return ret; @@ -2582,6 +2585,12 @@ void __i915_add_request(struct drm_i915_gem_request *request, WARN(ret, "*_ring_flush_all_caches failed: %d!\n", ret); } + /* Make the request's seqno 'live': */ + if(!request->seqno) { + request->seqno = request->reserved_seqno; + WARN_ON(request->seqno != dev_priv->last_seqno); + } + /* Record the position of the start of the request so that * should we detect the updated seqno part-way through the * GPU processing the request, we never over-estimate the @@ -2830,6 +2839,9 @@ void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked) list_for_each_entry_safe(req, req_next, >fence_signal_list, signal_link) { if (!req->cancelled) { + /* How can this happen? */ + WARN_ON(req->seqno == 0); + if (!i915_seqno_passed(seqno, req->seqno)) break; } @@ -3054,7 +3066,14 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring, if (req == NULL) return -ENOMEM; - ret = i915_gem_get_seqno(ring->dev, >seqno); + /* +* Assign an identifier to track this request through the hardware +* but don't make it live yet. It could change in the future if this +* request gets overtaken. However, it still needs to be allocated +* in advance because the point of submission must not fail and seqno +* allocation can fail. 
+*/ + ret = i915_gem_get_seqno(ring->dev, >reserved_seqno); if (ret) goto err; diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c index 0908699..7970958 100644 --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c @@ -1249,6 +1249,19 @@ int i915_gem_ringbuffer_submission_final(struct i915_execbuffer_params *params) /* The mutex must be acquired before calling this function */ BUG_ON(!mutex_is_locked(>dev->struct_mutex)); + /* Make sure the request's seqno is the latest and greatest: */ + if(req->reserved_seqno != dev_priv->last_seqno) { + ret = i915_gem_get_seqno(ring->dev, >reserved_seqno); + if
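The lazy-seqno scheme the patch describes — reserve a seqno at request creation so that submission cannot fail, but only make it 'live' (and refresh it if the request was overtaken) at actual hardware submission — can be modelled as a toy state machine. Names below are illustrative, not the i915 ones:

```c
#include <assert.h>

/* Toy model of reserved vs. live seqnos: live seqnos always end up in
 * hardware-submission order even when requests are re-ordered between
 * creation and submission by a scheduler. */
struct seqno_ctx {
	unsigned next_seqno;   /* next value to hand out */
	unsigned last_seqno;   /* most recently reserved value */
};

struct toy_request {
	unsigned reserved_seqno;   /* allocated early, may go stale */
	unsigned seqno;            /* 0 until made live at submission */
};

static void request_alloc(struct seqno_ctx *c, struct toy_request *r)
{
	/* Reserve in advance: allocation can fail (in the real driver),
	 * submission must not. */
	r->reserved_seqno = c->next_seqno++;
	c->last_seqno = r->reserved_seqno;
	r->seqno = 0;
}

static void request_submit(struct seqno_ctx *c, struct toy_request *r)
{
	/* Another request overtook us: replace the stale reservation
	 * with a fresh, latest-and-greatest one. */
	if (r->reserved_seqno != c->last_seqno) {
		r->reserved_seqno = c->next_seqno++;
		c->last_seqno = r->reserved_seqno;
	}
	r->seqno = r->reserved_seqno;   /* now 'live' */
}
```

The completion test can then keep using a simple `seqno_passed()`-style comparison, because out-of-order submission never produces out-of-order live seqnos.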
[Intel-gfx] [PATCH 28/40] drm/i915: Added trace points to scheduler
From: John HarrisonAdded trace points to the scheduler to track all the various events, node state transitions and other interesting things that occur. v2: Updated for new request completion tracking implementation. v3: Updated for changes to node kill code. Change-Id: I9886390cfc7897bc1faf50a104bc651d8baed8a5 For: VIZ-1587 Signed-off-by: John Harrison --- drivers/gpu/drm/i915/i915_gem_execbuffer.c | 2 + drivers/gpu/drm/i915/i915_scheduler.c | 26 drivers/gpu/drm/i915/i915_trace.h | 190 + drivers/gpu/drm/i915/intel_lrc.c | 2 + 4 files changed, 220 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c index fdaede3..b358b21 100644 --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c @@ -1226,6 +1226,8 @@ i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params, i915_gem_execbuffer_move_to_active(vmas, params->request); + trace_i915_gem_ring_queue(ring, params); + qe = container_of(params, typeof(*qe), params); ret = i915_scheduler_queue_execbuffer(qe); if (ret) diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index 39aa702..4736f0f 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -151,6 +151,8 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe) if (i915.scheduler_override & i915_so_direct_submit) { int ret; + trace_i915_scheduler_queue(qe->params.ring, qe); + WARN_ON(qe->params.fence_wait && (!sync_fence_is_signaled(qe->params.fence_wait))); @@ -271,6 +273,9 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe) not_flying = i915_scheduler_count_flying(scheduler, ring) < scheduler->min_flying; + trace_i915_scheduler_queue(ring, node); + trace_i915_scheduler_node_state_change(ring, node); + spin_unlock_irqrestore(>lock, flags); if (not_flying) @@ -298,6 +303,9 @@ static int i915_scheduler_fly_node(struct 
i915_scheduler_queue_entry *node) node->status = i915_sqs_flying; + trace_i915_scheduler_fly(ring, node); + trace_i915_scheduler_node_state_change(ring, node); + if (!(scheduler->flags[ring->id] & i915_sf_interrupts_enabled)) { boolsuccess = true; @@ -363,6 +371,8 @@ static void i915_scheduler_node_requeue(struct i915_scheduler_queue_entry *node) node->status = i915_sqs_queued; node->params.request->seqno = 0; + trace_i915_scheduler_unfly(node->params.ring, node); + trace_i915_scheduler_node_state_change(node->params.ring, node); } /* Give up on a node completely. For example, because it is causing the @@ -372,7 +382,11 @@ static void i915_scheduler_node_kill(struct i915_scheduler_queue_entry *node) BUG_ON(!node); BUG_ON(I915_SQS_IS_COMPLETE(node)); + if (I915_SQS_IS_FLYING(node)) + trace_i915_scheduler_unfly(node->params.ring, node); + node->status = i915_sqs_dead; + trace_i915_scheduler_node_state_change(node->params.ring, node); } /* @@ -392,6 +406,8 @@ bool i915_scheduler_notify_request(struct drm_i915_gem_request *req) struct i915_scheduler_queue_entry *node = req->scheduler_qe; unsigned long flags; + trace_i915_scheduler_landing(req); + if (!node) return false; @@ -405,6 +421,8 @@ bool i915_scheduler_notify_request(struct drm_i915_gem_request *req) else node->status = i915_sqs_complete; + trace_i915_scheduler_node_state_change(req->ring, node); + spin_unlock_irqrestore(>lock, flags); return true; @@ -550,6 +568,8 @@ static int i915_scheduler_remove(struct intel_engine_cs *ring) /* Launch more packets now? 
*/ do_submit = (queued > 0) && (flying < scheduler->min_flying); + trace_i915_scheduler_remove(ring, min_seqno, do_submit); + spin_unlock_irqrestore(>lock, flags); if (!do_submit && list_empty()) @@ -564,6 +584,8 @@ static int i915_scheduler_remove(struct intel_engine_cs *ring) node = list_first_entry(, typeof(*node), link); list_del(>link); + trace_i915_scheduler_destroy(ring, node); + if (node->params.fence_wait) sync_fence_put(node->params.fence_wait); @@ -927,6 +949,8 @@ static int i915_scheduler_pop_from_queue_locked(struct intel_engine_cs *ring, INIT_LIST_HEAD(>link); best->status = i915_sqs_popped; + trace_i915_scheduler_node_state_change(ring, best); +
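The pattern in the patch above is that every node status transition funnels through a single state-change trace event, so a node's life cycle (queued → flying → complete, unfly, kill, ...) can be reconstructed from the trace afterwards. A minimal standalone sketch of that idea (plain C, invented names, not the i915 tracepoint machinery):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Minimal model: one helper performs every status change and appends a
 * trace record, so no transition can be missed. */
enum status { ST_QUEUED, ST_POPPED, ST_FLYING, ST_COMPLETE, ST_DEAD };

static char trace_log[256];   /* stand-in for the kernel trace buffer */

static void set_status(enum status *cur, enum status next, const char *who)
{
	static const char *names[] = {
		"queued", "popped", "flying", "complete", "dead"
	};
	char ev[64];

	*cur = next;
	snprintf(ev, sizeof(ev), "[%s->%s]", who, names[next]);
	strncat(trace_log, ev, sizeof(trace_log) - strlen(trace_log) - 1);
}
```

Centralising the transition like this is what makes a blanket `node_state_change` tracepoint cheap to add at every site in the scheduler.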
[Intel-gfx] [PATCH 16/40] drm/i915: Keep the reserved space mechanism happy
From: John Harrison

Ring space is reserved when constructing a request to ensure that the subsequent 'add_request()' call cannot fail due to waiting for space on a busy or broken GPU. However, the scheduler jumps into the middle of the execbuffer process between request creation and request submission. Thus it needs to cancel the reserved space when the request is simply added to the scheduler's queue and not yet submitted. Similarly, it needs to re-reserve the space when it finally does want to send the batch buffer to the hardware. v3: Updated to use locally cached request pointer. For: VIZ-1587 Signed-off-by: John Harrison --- drivers/gpu/drm/i915/i915_gem_execbuffer.c | 7 +++ drivers/gpu/drm/i915/i915_scheduler.c | 4 drivers/gpu/drm/i915/intel_lrc.c | 13 +++-- 3 files changed, 22 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c index b5d618a..2c7a395 100644 --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c @@ -1249,6 +1249,10 @@ int i915_gem_ringbuffer_submission_final(struct i915_execbuffer_params *params) /* The mutex must be acquired before calling this function */ BUG_ON(!mutex_is_locked(&params->dev->struct_mutex)); + ret = intel_ring_reserve_space(req); + if (ret) + return ret; + intel_runtime_pm_get(dev_priv); /* @@ -1309,6 +1313,9 @@ error: */ intel_runtime_pm_put(dev_priv); + if (ret) + intel_ring_reserved_space_cancel(req->ringbuf); + return ret; } diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index 0e657cf..9d1475f 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -145,6 +145,8 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe) if (1/*i915.scheduler_override & i915_so_direct_submit*/) { int ret; + intel_ring_reserved_space_cancel(qe->params.request->ringbuf); + scheduler->flags[qe->params.ring->id] |=
i915_sf_submitting; ret = dev_priv->gt.execbuf_final(>params); scheduler->flags[qe->params.ring->id] &= ~i915_sf_submitting; @@ -174,6 +176,8 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe) node->stamp = jiffies; i915_gem_request_reference(node->params.request); + intel_ring_reserved_space_cancel(node->params.request->ringbuf); + BUG_ON(node->params.request->scheduler_qe); node->params.request->scheduler_qe = node; diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index f14d9b2..ebc951e 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -934,13 +934,17 @@ int intel_execlists_submission_final(struct i915_execbuffer_params *params) /* The mutex must be acquired before calling this function */ BUG_ON(!mutex_is_locked(>dev->struct_mutex)); + ret = intel_logical_ring_reserve_space(req); + if (ret) + return ret; + /* * Unconditionally invalidate gpu caches and ensure that we do flush * any residual writes from the previous batch. */ ret = logical_ring_invalidate_all_caches(req); if (ret) - return ret; + goto err; if (ring == _priv->ring[RCS] && params->instp_mode != dev_priv->relative_constants_mode) { @@ -962,13 +966,18 @@ int intel_execlists_submission_final(struct i915_execbuffer_params *params) ret = ring->emit_bb_start(req, exec_start, params->dispatch_flags); if (ret) - return ret; + goto err; trace_i915_gem_ring_dispatch(req, params->dispatch_flags); i915_gem_execbuffer_retire_commands(params); return 0; + +err: + intel_ring_reserved_space_cancel(params->request->ringbuf); + + return ret; } void intel_execlists_retire_requests(struct intel_engine_cs *ring) -- 1.9.1 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
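The reserve/cancel/re-reserve dance described above can be reduced to a toy accounting model: space is reserved at request creation so the final add cannot fail, cancelled when the scheduler merely queues the work, and taken out again in the `*_submission_final()` path. A hedged standalone sketch (plain C, invented names):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy ring-space accounting. A real ring would wait for the consumer
 * to free space instead of failing, but the invariant is the same:
 * never promise more reserved space than the ring can hold. */
struct toy_ring {
	int space;      /* total usable bytes */
	int reserved;   /* bytes promised to in-flight requests */
};

static bool reserve_space(struct toy_ring *r, int n)
{
	if (r->space - r->reserved < n)
		return false;   /* would wait (or fail) here */
	r->reserved += n;
	return true;
}

static void cancel_reservation(struct toy_ring *r, int n)
{
	r->reserved -= n;
}
```

Without the cancel step, every batch parked in the scheduler's queue would pin ring space it is not yet using, and the reservation mechanism would falsely report the ring as full.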
[Intel-gfx] [PATCH 19/40] drm/i915: Added scheduler support to __wait_request() calls
From: John Harrison

The scheduler can cause batch buffers, and hence requests, to be submitted to the ring out of order and asynchronously to their submission to the driver. Thus at the point of waiting for the completion of a given request, it is not even guaranteed that the request has actually been sent to the hardware yet. Even if it has been sent, it is possible that it could be pre-empted and thus 'unsent'. This means that it is necessary to be able to submit requests to the hardware during the wait call itself. Unfortunately, while some callers of __wait_request() release the mutex lock first, others do not (and apparently cannot). Hence there is the ability to deadlock as the wait stalls for submission but the asynchronous submission is stalled for the mutex lock. This change hooks the scheduler into the __wait_request() code to ensure correct behaviour. That is, flush the target batch buffer through to the hardware and do not deadlock waiting for something that cannot currently be submitted. Instead, the wait call must return -EAGAIN at least as far back as necessary to release the mutex lock and allow the scheduler's asynchronous processing to get in and handle the pre-emption operation and eventually (re-)submit the work. v3: Removed the explicit scheduler flush from i915_wait_request(). This is no longer necessary and was causing unintended changes to the scheduler priority level which broke a validation team test.
Change-Id: I31fe6bc7e38f6ffdd843fcae16e7cc8b1e52a931 For: VIZ-1587 Signed-off-by: John Harrison --- drivers/gpu/drm/i915/i915_drv.h | 3 ++- drivers/gpu/drm/i915/i915_gem.c | 33 ++--- drivers/gpu/drm/i915/i915_scheduler.c | 20 drivers/gpu/drm/i915/i915_scheduler.h | 2 ++ drivers/gpu/drm/i915/intel_display.c| 5 +++-- drivers/gpu/drm/i915/intel_ringbuffer.c | 2 +- 6 files changed, 54 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 9a67f7c..5ed600c 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -3029,7 +3029,8 @@ int __i915_wait_request(struct drm_i915_gem_request *req, unsigned reset_counter, bool interruptible, s64 *timeout, - struct intel_rps_client *rps); + struct intel_rps_client *rps, + bool is_locked); int __must_check i915_wait_request(struct drm_i915_gem_request *req); int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf); int __must_check diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 349ff58..784000b 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1207,7 +1207,8 @@ int __i915_wait_request(struct drm_i915_gem_request *req, unsigned reset_counter, bool interruptible, s64 *timeout, - struct intel_rps_client *rps) + struct intel_rps_client *rps, + bool is_locked) { struct intel_engine_cs *ring = i915_gem_request_get_ring(req); struct drm_device *dev = ring->dev; @@ -1217,8 +1218,10 @@ int __i915_wait_request(struct drm_i915_gem_request *req, DEFINE_WAIT(wait); unsigned long timeout_expire; s64 before, now; - int ret; + int ret = 0; + boolbusy; + might_sleep(); WARN(!intel_irqs_enabled(dev_priv), "IRQs disabled"); if (i915_gem_request_completed(req)) @@ -1269,6 +1272,22 @@ int __i915_wait_request(struct drm_i915_gem_request *req, break; } + if (is_locked) { + /* If this request is being processed by the scheduler +* then it is unsafe to sleep with the mutex lock held 
+* as the scheduler may require the lock in order to +* progress the request. */ + if (i915_scheduler_is_request_tracked(req, NULL, )) { + if (busy) { + ret = -EAGAIN; + break; + } + } + + /* If the request is not tracked by the scheduler then the +* regular test can be done. */ + } + if (i915_gem_request_completed(req)) { ret = 0; break; @@ -1455,7 +1474,7 @@ i915_wait_request(struct drm_i915_gem_request *req) ret = __i915_wait_request(req, atomic_read(_priv->gpu_error.reset_counter), -
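The deadlock-avoidance rule added to the wait path — if the caller holds the mutex and the request is still in the scheduler's hands, do not sleep; bail out with -EAGAIN so the caller can drop the lock and retry — boils down to a small decision function. An illustrative standalone model (plain C, invented names, not the driver code):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy model of the is_locked check in __i915_wait_request(): sleeping
 * with the lock held would deadlock if the scheduler needs that same
 * lock to push the request to the hardware. */
struct toy_request {
	bool completed;        /* finished on the "hardware" */
	bool scheduler_busy;   /* still tracked/unsubmitted by scheduler */
};

static int wait_request(struct toy_request *req, bool is_locked)
{
	if (req->completed)
		return 0;

	if (is_locked && req->scheduler_busy)
		return -EAGAIN;   /* unsafe to sleep holding the lock */

	/* A real implementation sleeps on the seqno here; this model
	 * just pretends the hardware finishes the work. */
	req->completed = true;
	return 0;
}
```

The -EAGAIN must propagate far enough up for some caller to actually be able to release the mutex, which is why the signature change threads `is_locked` through every `__i915_wait_request()` call site.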
Re: [Intel-gfx] [PATCH] drm/i915: Fix context/engine cleanup order
On Fri, Dec 11, 2015 at 02:36:36PM +, Nick Hoath wrote:
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c > index 84e2b20..a2857b0 100644 > --- a/drivers/gpu/drm/i915/i915_dma.c > +++ b/drivers/gpu/drm/i915/i915_dma.c > @@ -449,7 +449,7 @@ static int i915_load_modeset_init(struct drm_device *dev) > > cleanup_gem: > mutex_lock(&dev->struct_mutex); > - i915_gem_cleanup_ringbuffer(dev); > + i915_gem_cleanup_engines(dev); > i915_gem_context_fini(dev); > mutex_unlock(&dev->struct_mutex); > cleanup_irq: > @@ -1188,8 +1188,8 @@ int i915_driver_unload(struct drm_device *dev) > > intel_guc_ucode_fini(dev); > mutex_lock(&dev->struct_mutex); > - i915_gem_cleanup_ringbuffer(dev); > i915_gem_context_fini(dev); > + i915_gem_cleanup_engines(dev); > mutex_unlock(&dev->struct_mutex); > intel_fbc_cleanup_cfb(dev_priv); > i915_gem_cleanup_stolen(dev);

Choose! Anyway, contexts should be shut down before the engines, so with the above fixed:

Reviewed-by: Chris Wilson

-Chris
--
Chris Wilson, Intel Open Source Technology Centre
[Intel-gfx] [RFC 35/38] drm/i915/preempt: Implement mid-batch preemption support
From: Dave Gordon

Batch buffers which have been pre-empted mid-way through execution must be handled separately. Rather than simply re-submitting the batch as a brand new piece of work, the driver only needs to requeue the context. The hardware will take care of picking up where it left off. v2: New patch in series. For: VIZ-2021 Signed-off-by: Dave Gordon --- drivers/gpu/drm/i915/i915_debugfs.c | 1 + drivers/gpu/drm/i915/i915_scheduler.c | 55 +++ drivers/gpu/drm/i915/i915_scheduler.h | 3 ++ drivers/gpu/drm/i915/intel_lrc.c | 51 drivers/gpu/drm/i915/intel_lrc.h | 1 + 5 files changed, 105 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index 7137439..6798f9c 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -3722,6 +3722,7 @@ static int i915_scheduler_info(struct seq_file *m, void *unused) PRINT_VAR(" Queued", "u", stats[r].queued); PRINT_VAR(" Submitted","u", stats[r].submitted); PRINT_VAR(" Preempted","u", stats[r].preempted); + PRINT_VAR(" Midbatch preempted", "u", stats[r].mid_preempted); PRINT_VAR(" Completed","u", stats[r].completed); PRINT_VAR(" Expired", "u", stats[r].expired); seq_putc(m, '\n'); diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index d0c4b46..d96eefb 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -743,6 +743,7 @@ i915_scheduler_preemption_postprocess(struct intel_engine_cs *ring) struct i915_scheduler *scheduler = dev_priv->scheduler; struct i915_scheduler_queue_entry *pnode = NULL; struct drm_i915_gem_request *preq = NULL; + struct drm_i915_gem_request *midp = NULL; struct i915_scheduler_stats *stats; unsigned long flags; int preempted = 0, preemptive = 0; @@ -806,8 +807,12 @@ i915_scheduler_preemption_postprocess(struct intel_engine_cs *ring) node->status = i915_sqs_preempted; trace_i915_scheduler_unfly(ring, node);
trace_i915_scheduler_node_state_change(ring, node); - /* Empty the preempted ringbuffer */ - intel_lr_context_resync(req->ctx, ring, false); + + /* Identify a mid-batch preemption */ + if (req->seqno == ring->last_batch_start) { + WARN(midp, "Multiple mid-batch-preempted requests?\n"); + midp = req; + } } i915_gem_request_dequeue(req); @@ -821,11 +826,47 @@ i915_scheduler_preemption_postprocess(struct intel_engine_cs *ring) if (stats->max_preempted < preempted) stats->max_preempted = preempted; + /* Now fix up the contexts of all preempt{ive,ed} requests */ { - /* XXX: Sky should be empty now */ + struct intel_context *mid_ctx = NULL; struct i915_scheduler_queue_entry *node; - list_for_each_entry(node, >node_queue[ring->id], link) - WARN_ON(I915_SQS_IS_FLYING(node)); + u32 started = ring->last_batch_start; + + /* +* Iff preemption was mid-batch, we should have found a +* mid-batch-preempted request +*/ + if (started && started != ring->last_irq_seqno) + WARN(!midp, "Mid-batch preempted, but request not found\n"); + else + WARN(midp, "Found unexpected mid-batch preemption?\n"); + + if (midp) { + /* Rewrite this context rather than emptying it */ + intel_lr_context_resync_req(midp); + midp->scheduler_flags |= i915_req_sf_restart; + mid_ctx = midp->ctx; + stats->mid_preempted += 1; + WARN_ON(preq == midp); + } + + list_for_each_entry(node, >node_queue[ring->id], link) { + /* XXX: Sky should be empty now */ + if (WARN_ON(I915_SQS_IS_FLYING(node))) + continue; + + /* Clean up preempted contexts */ + if (node->status != i915_sqs_preempted) + continue; + + if (node->params.ctx != mid_ctx) { + /* Empty the preempted ringbuffer */ +
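The mid-batch test used in the postprocess code above — a preempted request was interrupted mid-batch exactly when its seqno matches the seqno of the batch the ring had last *started* — can be isolated into one predicate. A hedged standalone sketch (plain C, invented names mirroring `last_batch_start`):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: last_batch_start is 0 when preemption landed between
 * batches; otherwise it holds the seqno of the batch that was in
 * progress. Only the request carrying that seqno was pre-empted
 * mid-batch and should have its context restarted rather than its
 * batch resubmitted. */
struct toy_ring_state {
	unsigned last_batch_start;
};

static bool preempted_mid_batch(const struct toy_ring_state *ring,
				unsigned req_seqno)
{
	return ring->last_batch_start != 0 &&
	       req_seqno == ring->last_batch_start;
}
```

At most one request can satisfy this per preemption event, which is why the driver WARNs on "Multiple mid-batch-preempted requests?".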
[Intel-gfx] [RFC 00/38] Preemption support for GPU scheduler
From: John Harrison

Added pre-emption support to the i915 GPU scheduler. Note that this patch series was written by David Gordon. I have simply ported it onto a more recent set of scheduler patches and am uploading it as part of that work so that everything can be viewed at once. Also because David is on extended vacation at the moment. Note that the series is being sent as an RFC as there are still some things to be tidied up. Most notably the commit messages are missing in a few places. I am leaving those to be filled in by David when he returns. Also, the series includes a few general fix-up and improvement patches that are not directly related to pre-emption. E.g. for improving the error capture state. However, the pre-emption code is built upon them so right now it is much simpler to just send the whole lot out as a single series. It can be broken up into separate patch sets if/when people decide it is all good stuff to be doing. Re the pre-emption itself: it is functional and working, but with the caveat that it requires the GuC. Hence it is only operational on SKL or later hardware. If the GuC is not available then the pre-emption support is simply disabled in the scheduler. v2: Updated for changes to scheduler - use locally cached request pointer. Re-worked the 'pre-emption in progress' logic inside the notify code to simplify it. Implemented support for mid-batch pre-emption. This must be treated differently to between-batch pre-emption. Fixed a couple of trace point issues.
[Patches against drm-intel-nightly tree fetched 17/11/2015 with struct fence conversion and GPU scheduler patches applied] Dave Gordon (37): drm/i915: update ring space correctly drm/i915: recalculate ring space after reset drm/i915: hangcheck=idle should wake_up_all every time, not just once drm/i915/error: capture execlist state on error drm/i915/error: capture ringbuffer pointed to by START drm/i915/error: report ctx id & desc for each request in the queue drm/i915/error: improve CSB reporting drm/i915/error: report size in pages for each object dumped drm/i915/error: track, capture & print ringbuffer submission activity drm/i915/guc: Tidy up GuC proc/ctx descriptor setup drm/i915/guc: Add a second client, to be used for preemption drm/i915/guc: implement submission via REQUEST_PREEMPTION action drm/i915/guc: Improve action error reporting, add preemption debug drm/i915/guc: Expose GuC-maintained statistics drm/i915: add i915_wait_request() call after i915_add_request_no_flush() drm/i915/guc: Expose (intel)_lr_context_size() drm/i915/guc: Add support for GuC ADS (Addition Data Structure) drm/i915/guc: Fill in (part of?) 
the ADS whitelist
drm/i915/error: capture errored context based on request context-id
drm/i915/error: enhanced error capture of requests
drm/i915/error: add GuC state error capture & decode
drm/i915: track relative-constants-mode per-context not per-device
drm/i915: set request 'head' on allocation not in add_request()
drm/i915/sched: set request 'head' on at start of ring submission
drm/i915/sched: include scheduler state in error capture
drm/i915/preempt: preemption-related definitions and statistics
drm/i915/preempt: scheduler logic for queueing preemptive requests
drm/i915/preempt: scheduler logic for selecting preemptive requests
drm/i915/preempt: scheduler logic for preventing recursive preemption
drm/i915/preempt: don't allow nonbatch ctx init when the scheduler is busy
drm/i915/preempt: scheduler logic for landing preemptive requests
drm/i915/preempt: add hook to catch 'unexpected' ring submissions
drm/i915/preempt: Refactor intel_lr_context_reset()
drm/i915/preempt: scheduler logic for postprocessing preemptive requests
drm/i915/preempt: Implement mid-batch preemption support
drm/i915/preempt: update (LRC) ringbuffer-filling code to create preemptive requests
drm/i915/preempt: update scheduler parameters to enable preemption

John Harrison (1):
drm/i915: Added preemption info to various trace points

drivers/gpu/drm/i915/i915_debugfs.c        | 50 ++-
drivers/gpu/drm/i915/i915_drv.h            | 34 +-
drivers/gpu/drm/i915/i915_gem.c            | 122 +++-
drivers/gpu/drm/i915/i915_gem_context.c    | 5 +-
drivers/gpu/drm/i915/i915_gem_execbuffer.c | 9 +-
drivers/gpu/drm/i915/i915_gpu_error.c      | 307 --
drivers/gpu/drm/i915/i915_guc_reg.h        | 1 +
drivers/gpu/drm/i915/i915_guc_submission.c | 243 +++---
drivers/gpu/drm/i915/i915_irq.c            | 23 +-
drivers/gpu/drm/i915/i915_scheduler.c      | 487 ++---
drivers/gpu/drm/i915/i915_scheduler.h      | 49 ++-
drivers/gpu/drm/i915/i915_trace.h          | 30 +-
drivers/gpu/drm/i915/intel_guc.h           | 31 +-
drivers/gpu/drm/i915/intel_guc_fwif.h      | 93 +-
drivers/gpu/drm/i915/intel_guc_loader.c    | 14 +-
drivers/gpu/drm/i915/intel_lrc.c
Re: [Intel-gfx] [PATCH v2] drm/i915: Fix context/engine cleanup order
On Fri, Dec 11, 2015 at 02:59:09PM +, Nick Hoath wrote: > Swap the order of context & engine cleanup, so that it is now > contexts, then engines. > This allows the context clean up code to do things like confirm > that ring->dev->struct_mutex is locked without a NULL pointer > dereference. > This came about as a result of the 'intel_ring_initialized() must > be simple and inline' patch now using ring->dev as an initialised > flag. > Rename the cleanup function to reflect what it actually does. > Also clean up some very annoying whitespace issues at the same time. > > v2: Also make the fix in i915_load_modeset_init, not just > in i915_driver_unload (Chris Wilson) > > Signed-off-by: Nick Hoath> Reviewed-by: Chris Wilson > > Cc: Mika Kuoppala > Cc: Daniel Vetter > Cc: David Gordon > Cc: Chris Wilson Queued for -next, thanks for the patch. -Daniel > --- > drivers/gpu/drm/i915/i915_dma.c | 4 ++-- > drivers/gpu/drm/i915/i915_drv.h | 2 +- > drivers/gpu/drm/i915/i915_gem.c | 23 --- > 3 files changed, 15 insertions(+), 14 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c > index 84e2b20..4dad121 100644 > --- a/drivers/gpu/drm/i915/i915_dma.c > +++ b/drivers/gpu/drm/i915/i915_dma.c > @@ -449,8 +449,8 @@ static int i915_load_modeset_init(struct drm_device *dev) > > cleanup_gem: > mutex_lock(>struct_mutex); > - i915_gem_cleanup_ringbuffer(dev); > i915_gem_context_fini(dev); > + i915_gem_cleanup_engines(dev); > mutex_unlock(>struct_mutex); > cleanup_irq: > intel_guc_ucode_fini(dev); > @@ -1188,8 +1188,8 @@ int i915_driver_unload(struct drm_device *dev) > > intel_guc_ucode_fini(dev); > mutex_lock(>struct_mutex); > - i915_gem_cleanup_ringbuffer(dev); > i915_gem_context_fini(dev); > + i915_gem_cleanup_engines(dev); > mutex_unlock(>struct_mutex); > intel_fbc_cleanup_cfb(dev_priv); > i915_gem_cleanup_stolen(dev); > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h > index 5edd393..e317f88 100644 > --- 
a/drivers/gpu/drm/i915/i915_drv.h > +++ b/drivers/gpu/drm/i915/i915_drv.h > @@ -3016,7 +3016,7 @@ int i915_gem_init_rings(struct drm_device *dev); > int __must_check i915_gem_init_hw(struct drm_device *dev); > int i915_gem_l3_remap(struct drm_i915_gem_request *req, int slice); > void i915_gem_init_swizzling(struct drm_device *dev); > -void i915_gem_cleanup_ringbuffer(struct drm_device *dev); > +void i915_gem_cleanup_engines(struct drm_device *dev); > int __must_check i915_gpu_idle(struct drm_device *dev); > int __must_check i915_gem_suspend(struct drm_device *dev); > void __i915_add_request(struct drm_i915_gem_request *req, > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > index 8e2acde..04a22db 100644 > --- a/drivers/gpu/drm/i915/i915_gem.c > +++ b/drivers/gpu/drm/i915/i915_gem.c > @@ -4823,7 +4823,7 @@ i915_gem_init_hw(struct drm_device *dev) > > ret = i915_gem_request_alloc(ring, ring->default_context, ); > if (ret) { > - i915_gem_cleanup_ringbuffer(dev); > + i915_gem_cleanup_engines(dev); > goto out; > } > > @@ -4836,7 +4836,7 @@ i915_gem_init_hw(struct drm_device *dev) > if (ret && ret != -EIO) { > DRM_ERROR("PPGTT enable ring #%d failed %d\n", i, ret); > i915_gem_request_cancel(req); > - i915_gem_cleanup_ringbuffer(dev); > + i915_gem_cleanup_engines(dev); > goto out; > } > > @@ -4844,7 +4844,7 @@ i915_gem_init_hw(struct drm_device *dev) > if (ret && ret != -EIO) { > DRM_ERROR("Context enable ring #%d failed %d\n", i, > ret); > i915_gem_request_cancel(req); > - i915_gem_cleanup_ringbuffer(dev); > + i915_gem_cleanup_engines(dev); > goto out; > } > > @@ -4919,7 +4919,7 @@ out_unlock: > } > > void > -i915_gem_cleanup_ringbuffer(struct drm_device *dev) > +i915_gem_cleanup_engines(struct drm_device *dev) > { > struct drm_i915_private *dev_priv = dev->dev_private; > struct intel_engine_cs *ring; > @@ -4928,13 +4928,14 @@ i915_gem_cleanup_ringbuffer(struct drm_device *dev) > for_each_ring(ring, dev_priv, i) > 
> 		dev_priv->gt.cleanup_ring(ring);
>
> -	if (i915.enable_execlists)
> -		/*
> -		 * Neither the BIOS, ourselves or any other kernel
> -		 * expects the system to be in execlists mode on startup,
> -		 * so we need to reset the GPU back to legacy mode.
> -		 */
> -
[Intel-gfx] [PATCH] drm/i915: Instrument PSR parameter for possible quirks with link standby.
Link standby support has been deprecated with 'commit 89251b177 ("drm/i915: PSR: deprecate link_standby support for core platforms.")' The reason for that is that main link in full off offers more power savings and some platforms implementations on source side had known bugs with link standby. However we don't know all panels out there and we don't fully rely on the VBT information after the case found with the commit that made us to deprecate link standby. So, before enable PSR by default let's instrument the PSR parameter in a way that we can identify different panels out there that might require or work better with link standby mode. It is also useful to say that for backward compatibility I'm not changing the meaning of this flag. So "0" still means disabled and "1" means enabled with full support and maximum power savings. v2: Use positive value instead of negative for different operation mode as suggested by Daniel. Cc: Paulo ZanoniCc: Daniel Vetter Signed-off-by: Rodrigo Vivi --- drivers/gpu/drm/i915/i915_debugfs.c | 5 + drivers/gpu/drm/i915/i915_drv.h | 1 + drivers/gpu/drm/i915/i915_params.c | 7 ++- drivers/gpu/drm/i915/intel_psr.c| 13 - 4 files changed, 24 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index 24318b7..efe973b 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -2567,6 +2567,10 @@ static int i915_edp_psr_status(struct seq_file *m, void *data) enabled = true; } } + + seq_printf(m, "Forcing main link standby: %s\n", + yesno(dev_priv->psr.link_standby)); + seq_printf(m, "HW Enabled & Active bit: %s", yesno(enabled)); if (!HAS_DDI(dev)) @@ -2587,6 +2591,7 @@ static int i915_edp_psr_status(struct seq_file *m, void *data) seq_printf(m, "Performance_Counter: %u\n", psrperf); } + mutex_unlock(_priv->psr.lock); intel_runtime_pm_put(dev_priv); diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 
5edd393..de086f0 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -969,6 +969,7 @@ struct i915_psr { unsigned busy_frontbuffer_bits; bool psr2_support; bool aux_frame_sync; + bool link_standby; }; enum intel_pch { diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c index 835d609..6dd39f0 100644 --- a/drivers/gpu/drm/i915/i915_params.c +++ b/drivers/gpu/drm/i915/i915_params.c @@ -126,7 +126,12 @@ MODULE_PARM_DESC(enable_execlists, "(-1=auto [default], 0=disabled, 1=enabled)"); module_param_named_unsafe(enable_psr, i915.enable_psr, int, 0600); -MODULE_PARM_DESC(enable_psr, "Enable PSR (default: false)"); +MODULE_PARM_DESC(enable_psr, "Enable PSR " +"(0=disabled [default], 1=link-off maximum power-savings, 2=link-standby mode)" +"In case you needed to force it on standby or disabled, please " +"report PCI device ID, subsystem vendor and subsystem device ID " +"to intel-gfx@lists.freedesktop.org, if your machine needs it. 
" +"It will then be included in an upcoming module version."); module_param_named_unsafe(preliminary_hw_support, i915.preliminary_hw_support, int, 0600); MODULE_PARM_DESC(preliminary_hw_support, diff --git a/drivers/gpu/drm/i915/intel_psr.c b/drivers/gpu/drm/i915/intel_psr.c index 9ccff30..bcc85fd 100644 --- a/drivers/gpu/drm/i915/intel_psr.c +++ b/drivers/gpu/drm/i915/intel_psr.c @@ -225,7 +225,12 @@ static void hsw_psr_enable_sink(struct intel_dp *intel_dp) (aux_clock_divider << DP_AUX_CH_CTL_BIT_CLOCK_2X_SHIFT)); } - drm_dp_dpcd_writeb(_dp->aux, DP_PSR_EN_CFG, DP_PSR_ENABLE); + if (dev_priv->psr.link_standby) + drm_dp_dpcd_writeb(_dp->aux, DP_PSR_EN_CFG, + DP_PSR_ENABLE | DP_PSR_MAIN_LINK_ACTIVE); + else + drm_dp_dpcd_writeb(_dp->aux, DP_PSR_EN_CFG, + DP_PSR_ENABLE); } static void vlv_psr_enable_source(struct intel_dp *intel_dp) @@ -280,6 +285,9 @@ static void hsw_psr_enable_source(struct intel_dp *intel_dp) if (IS_HASWELL(dev)) val |= EDP_PSR_MIN_LINK_ENTRY_TIME_8_LINES; + if (dev_priv->psr.link_standby) + val |= EDP_PSR_LINK_STANDBY; + I915_WRITE(EDP_PSR_CTL, val | max_sleep_time << EDP_PSR_MAX_SLEEP_TIME_SHIFT | idle_frames << EDP_PSR_IDLE_FRAME_SHIFT | @@ -763,6 +771,9 @@ void intel_psr_init(struct drm_device *dev) dev_priv->psr_mmio_base = IS_HASWELL(dev_priv) ? HSW_EDP_PSR_BASE : BDW_EDP_PSR_BASE; + if
Re: [Intel-gfx] [PATCH v2] PM / Runtime: Introduce pm_runtime_get_noidle
On pe, 2015-12-11 at 16:40 +0100, Rafael J. Wysocki wrote: > On Friday, December 11, 2015 02:54:45 PM Imre Deak wrote: > > On to, 2015-12-10 at 23:14 +0100, Rafael J. Wysocki wrote: > > > On Thursday, December 10, 2015 11:20:40 PM Imre Deak wrote: > > > > On Thu, 2015-12-10 at 22:42 +0100, Rafael J. Wysocki wrote: > > > > > On Thursday, December 10, 2015 10:36:37 PM Rafael J. Wysocki > > > > > wrote: > > > > > > On Thursday, December 10, 2015 11:43:50 AM Imre Deak wrote: > > > > > > > On Thu, 2015-12-10 at 01:58 +0100, Rafael J. Wysocki > > > > > > > wrote: > > > > > > > > On Wednesday, December 09, 2015 06:22:19 PM Joonas > > > > > > > > Lahtinen > > > > > > > > wrote: > > > > > > > > > Introduce pm_runtime_get_noidle to for situations > > > > > > > > > where > > > > > > > > > it is > > > > > > > > > not > > > > > > > > > desireable to touch an idling device. One use > > > > > > > > > scenario is > > > > > > > > > periodic > > > > > > > > > hangchecks performed by the drm/i915 driver which can > > > > > > > > > be > > > > > > > > > omitted > > > > > > > > > on a device in a runtime idle state. > > > > > > > > > > > > > > > > > > v2: > > > > > > > > > - Fix inconsistent return value when !CONFIG_PM. > > > > > > > > > - Update documentation for bool return value > > > > > > > > > > > > > > > > > > Signed-off-by: Joonas Lahtinen> > > > > > > > .int > > > > > > > > > el.c > > > > > > > > > om> > > > > > > > > > Reported-by: Chris Wilson > > > > > > > > > Cc: Chris Wilson > > > > > > > > > Cc: "Rafael J. Wysocki" > > > > > > > > > Cc: linux...@vger.kernel.org > > > > > > > > > > > > > > > > Well, I don't quite see how this can be used in a non- > > > > > > > > racy > > > > > > > > way > > > > > > > > without doing an additional pm_runtime_resume() or > > > > > > > > something > > > > > > > > like > > > > > > > > that in the same code path. > > > > > > > > > > > > > > We don't want to resume, that would be the whole point. 
> > > > > > > We'd > > > > > > > like > > > > > > > to > > > > > > > ensure that we hold a reference _and_ the device is > > > > > > > already > > > > > > > active. So > > > > > > > AFAICS we'd need to check runtime_status == RPM_ACTIVE in > > > > > > > addition > > > > > > > after taking the reference. > > > > > > > > > > > > Right, and that under the lock. > > > > > > > > > > Which basically means you can call pm_runtime_resume() just > > > > > fine, > > > > > because it will do nothing if the status is RPM_ACTIVE > > > > > already. > > > > > > > > > > So really, why don't you use pm_runtime_get_sync()? > > > > > > > > The difference would be that if the status is not RPM_ACTIVE > > > > already we > > > > would drop the reference and report error. The caller would in > > > > this > > > > case forego of doing something, since we the device is > > > > suspended or > > > > on > > > > the way to being suspended. One example of such a scenario is a > > > > watchdog like functionality: the watchdog work would > > > > call pm_runtime_get_noidle() and check if the device is ok by > > > > doing > > > > some HW access, but only if the device is powered. Otherwise > > > > the > > > > work > > > > item would do nothing (meaning it also won't reschedule > > > > itself). > > > > The > > > > watchdog work would get rescheduled next time the device is > > > > woken > > > > up > > > > and some work is submitted to the device. > > > > > > So first of all the name "pm_runtime_get_noidle" doesn't make > > > sense. > > > > > > I guess what you need is something like > > > > > > bool pm_runtime_get_if_active(struct device *dev) > > > { > > > unsigned log flags; > > > bool ret; > > > > > > spin_lock_irqsave(>power.lock, flags); > > > > > > if (dev->power.runtime_status == RPM_ACTIVE) { > > > > But here usage_count could be zero, meaning that the device is > > already > > on the way to be suspended (autosuspend or ASYNC suspend), no? 
> > The usage counter equal to 0 need not mean that the device is being > suspended > right now. From the driver's point of view it means there is no need to keep the device active, and that's the only thing that matters for the driver. It doesn't matter at what exact point the actual suspend will happen after the 1->0 transition. > Also even if that's the case, the usage counter may be incremented at > this very > moment by a concurrent thread and you'll lose the opportunity to do > what you > want. In that case the other thread makes sure that the work what we want to do (run the watchdog check) is rescheduled. We need to handle that kind of race anyway, since an increment from 0->1 and setting runtime_status to RPM_ACTIVE could happen even after we have already determined here that the device is not active and so we return failure. > > In that case we don't want to return success. That would > > unnecessarily prolong > > the time the device is kept
Re: [Intel-gfx] [PATCH v2] PM / Runtime: Introduce pm_runtime_get_noidle
On 11 December 2015 at 16:13, Rafael J. Wysockiwrote: > On Friday, December 11, 2015 01:03:50 PM Ulf Hansson wrote: >> [...] >> >> >> > >> >> > Which basically means you can call pm_runtime_resume() just fine, >> >> > because it will do nothing if the status is RPM_ACTIVE already. >> >> > >> >> > So really, why don't you use pm_runtime_get_sync()? >> >> >> >> The difference would be that if the status is not RPM_ACTIVE already we >> >> would drop the reference and report error. The caller would in this >> >> case forego of doing something, since we the device is suspended or on >> >> the way to being suspended. One example of such a scenario is a >> >> watchdog like functionality: the watchdog work would >> >> call pm_runtime_get_noidle() and check if the device is ok by doing >> >> some HW access, but only if the device is powered. Otherwise the work >> >> item would do nothing (meaning it also won't reschedule itself). The >> >> watchdog work would get rescheduled next time the device is woken up >> >> and some work is submitted to the device. >> > >> > So first of all the name "pm_runtime_get_noidle" doesn't make sense. >> > >> > I guess what you need is something like >> > >> > bool pm_runtime_get_if_active(struct device *dev) >> > { >> > unsigned log flags; >> > bool ret; >> > >> > spin_lock_irqsave(>power.lock, flags); >> > >> > if (dev->power.runtime_status == RPM_ACTIVE) { >> > atomic_inc(>power.usage_count); >> > ret = true; >> > } else { >> > ret = false; >> > } >> > >> > spin_unlock_irqrestore(>power.lock, flags); >> > } >> > >> > and the caller will simply bail out if "false" is returned, but if "true" >> > is returned, it will have to drop the usage count, right? >> > >> > Thanks, >> > Rafael >> > >> >> Why not just: >> >> pm_runtime_get_noresume(): >> if (RPM_ACTIVE) >> "do some actions" >> pm_runtime_put(); > > Because that's racy? Right, that was too easy. 
:-) > > What if the rpm_suspend() is running for the device, but it hasn't changed > the status yet? So if we can add a pm_runtime_barrier() or even simplifier, just hold the spin_lock when checking if the rpm status is RPM_ACTIVE. Kind regards Uffe ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH v2] PM / Runtime: Introduce pm_runtime_get_noidle
[...] >> > >> > Which basically means you can call pm_runtime_resume() just fine, >> > because it will do nothing if the status is RPM_ACTIVE already. >> > >> > So really, why don't you use pm_runtime_get_sync()? >> >> The difference would be that if the status is not RPM_ACTIVE already we >> would drop the reference and report error. The caller would in this >> case forego of doing something, since we the device is suspended or on >> the way to being suspended. One example of such a scenario is a >> watchdog like functionality: the watchdog work would >> call pm_runtime_get_noidle() and check if the device is ok by doing >> some HW access, but only if the device is powered. Otherwise the work >> item would do nothing (meaning it also won't reschedule itself). The >> watchdog work would get rescheduled next time the device is woken up >> and some work is submitted to the device. > > So first of all the name "pm_runtime_get_noidle" doesn't make sense. > > I guess what you need is something like > > bool pm_runtime_get_if_active(struct device *dev) > { > unsigned log flags; > bool ret; > > spin_lock_irqsave(>power.lock, flags); > > if (dev->power.runtime_status == RPM_ACTIVE) { > atomic_inc(>power.usage_count); > ret = true; > } else { > ret = false; > } > > spin_unlock_irqrestore(>power.lock, flags); > } > > and the caller will simply bail out if "false" is returned, but if "true" > is returned, it will have to drop the usage count, right? > > Thanks, > Rafael > Why not just: pm_runtime_get_noresume(): if (RPM_ACTIVE) "do some actions" pm_runtime_put(); Kind regards Uffe ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] Always mark GEM objects as dirty when written by the CPU
On Fri, Dec 11, 2015 at 12:29:40PM +, Chris Wilson wrote: > On Fri, Dec 11, 2015 at 12:19:09PM +, Dave Gordon wrote: > > On 10/12/15 08:58, Daniel Vetter wrote: > > >On Mon, Dec 07, 2015 at 12:51:49PM +, Dave Gordon wrote: > > >>I think I missed i915_gem_phys_pwrite(). > > >> > > >>i915_gem_gtt_pwrite_fast() marks the object dirty for most cases (vit > > >>set_to_gtt_domain(), but isn't called for all cases (or can return before > > >>the set_domain). Then we try i915_gem_shmem_pwrite() for non-phys > > >>objects (no check for stolen!) and that already marks the object dirty > > >>[aside: we might be able to change that to page-by-page?], but > > >>i915_gem_phys_pwrite() doesn't mark the object dirty, so we might lose > > >>updates there? > > >> > > >>Or maybe we should move the marking up into i915_gem_pwrite_ioctl() > > >>instead. > > >>The target object is surely going to be dirtied, whatever type it is. > > > > > >phys objects are special, and when binding we create allocate new > > >(contiguous) storage. In put_pages_phys that gets copied back and pages > > >marked as dirty. While a phys object is pinned it's a kernel bug to look > > >at the shmem pages and a userspace bug to touch the cpu mmap (since that > > >data will simply be overwritten whenever the kernel feels like). > > > > > >phys objects are only used for cursors on old crap though, so ok if we > > >don't streamline this fairly quirky old ABI. > > >-Daniel > > > > So is pread broken already for 'phys' ? > > Yes. A completely unused corner of the API. I think it would be useful to extract all the phys object stuff into i915_gem_phys_obj.c, add minimal kerneldoc for the functions, and then an overview section which explains in detail how fucked up this little bit of ABI history lore is. I can do the overview section, but the extraction/basic kerneldoc will probably take a bit longer to get around to. 
-Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Re: [Intel-gfx] [PATCH] drm/i915: Wait for PP cycle delay only if panel is in power off sequence
On Fri, Dec 11, 2015 at 05:11:23PM +0530, Kumar, Shobhit wrote: > On 12/11/2015 04:55 PM, Thulasimani, Sivakumar wrote: > > > > > >On 12/10/2015 8:32 PM, Ville Syrjälä wrote: > >>On Thu, Dec 10, 2015 at 08:09:01PM +0530, Thulasimani, Sivakumar wrote: > >>> > >>>On 12/10/2015 7:08 PM, Ville Syrjälä wrote: > On Thu, Dec 10, 2015 at 03:15:37PM +0200, Ville Syrjälä wrote: > >On Thu, Dec 10, 2015 at 03:01:02PM +0530, Kumar, Shobhit wrote: > >>On 12/09/2015 09:35 PM, Ville Syrjälä wrote: > >>>On Wed, Dec 09, 2015 at 08:59:26PM +0530, Shobhit Kumar wrote: > On Wed, Dec 9, 2015 at 8:34 PM, Chris Wilson >wrote: > >On Wed, Dec 09, 2015 at 08:07:10PM +0530, Shobhit Kumar wrote: > >>On Wed, Dec 9, 2015 at 7:27 PM, Ville Syrjälä > >> wrote: > >>>On Wed, Dec 09, 2015 at 06:51:48PM +0530, Shobhit Kumar wrote: > During resume, while turning the EDP panel power on, we need > not wait > blindly for panel_power_cycle_delay. Check if panel power > down sequence > in progress and then only wait. This improves our resume > time significantly. > > Signed-off-by: Shobhit Kumar > --- > drivers/gpu/drm/i915/intel_dp.c | 17 - > 1 file changed, 16 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/i915/intel_dp.c > b/drivers/gpu/drm/i915/intel_dp.c > index f335c92..10ec669 100644 > --- a/drivers/gpu/drm/i915/intel_dp.c > +++ b/drivers/gpu/drm/i915/intel_dp.c > @@ -617,6 +617,20 @@ static bool edp_have_panel_power(struct > intel_dp *intel_dp) > return (I915_READ(_pp_stat_reg(intel_dp)) & PP_ON) > != 0; > } > > +static bool edp_panel_off_seq(struct intel_dp *intel_dp) > +{ > + struct drm_device *dev = intel_dp_to_dev(intel_dp); > + struct drm_i915_private *dev_priv = dev->dev_private; > + > + lockdep_assert_held(_priv->pps_mutex); > + > + if (IS_VALLEYVIEW(dev) && > + intel_dp->pps_pipe == INVALID_PIPE) > + return false; > + > + return (I915_READ(_pp_stat_reg(intel_dp)) & > PP_SEQUENCE_POWER_DOWN) != 0; > +} > >>>This doens't make sense to me. 
The power down cycle may have > >>>completed just before, and so this would claim we don't have to > >>>wait for the power_cycle_delay. > >>Not sure I understand your concern correctly. You are right, > >>power > >>down cycle may have completed just before and if it has then > >>we don't > >>need to wait. But in case the power down cycle is in progress > >>as per > >>internal state, then we need to wait for it to complete. This > >>will > >>happen for example in non-suspend disable path and will be > >>handled > >>correctly. In case of actual suspend/resume, this would have > >>successfully completed and will skip the wait as it is not needed > >>before enabling panel power. > >> > + > static bool edp_have_panel_vdd(struct intel_dp *intel_dp) > { > struct drm_device *dev = intel_dp_to_dev(intel_dp); > @@ -2025,7 +2039,8 @@ static void edp_panel_on(struct > intel_dp *intel_dp) > port_name(dp_to_dig_port(intel_dp)->port))) > return; > > - wait_panel_power_cycle(intel_dp); > + if (edp_panel_off_seq(intel_dp)) > + wait_panel_power_cycle(intel_dp); > >Looking in from the side, I have no idea what this is meant to > >do. At > >the very least you need your explanatory paragraph here which > >would > >include what exactly you are waiting for at the start of > >edp_panel_on > >(and please try and find a better name for edp_panel_off_seq()). > I will add a comment. Basically I am not additionally waiting, but > converting the wait which was already there to a conditional > wait. The > edp_panel_off_seq, checks if panel power down sequence is in > progress. > In that case we need to wait for the panel power cycle delay. If > it is > not in that sequence, there is no need to wait. I will make an > attempt > again on the naming in next patch update. > >>>As far I remeber you need to wait for power_cycle_delay between > >>>power > >>>down cycle
Re: [Intel-gfx] [PATCH i-g-t] RFC: split PM workarounds into separate lib
On Thu, Dec 10, 2015 at 06:01:28PM +0200, David Weinehall wrote: > On Tue, Dec 08, 2015 at 03:42:27PM +0200, Ville Syrjälä wrote: > > On Tue, Dec 08, 2015 at 10:50:39AM +0200, David Weinehall wrote: > > > Since the defaults for some external power management related settings > > > prevents us from testing our power management functionality properly, > > > we have to work around it. Currently this is done from the individual > > > test cases, but this is sub-optimal. This patch moves the PM-related > > > workarounds into a separate library, and adds some code to restore the > > > previous settings for the SATA link power management while at it. > > > > Why is it called "workarounds"? That gives me the impression we're > > working around something that's supposed to work but doesn't. That's not > > the case here. > > Workarounds was because we are working around "imperfect" settings > in other components. At least to me power management should be enabled > out of the box, not something that requires admin-level workarounds. > Since we're not in control of said defaults, we have to modify the > settings when we run our tests, hence workarounds. Fully agreed that power tuning should be applied by default, but that's a loong process to convince all the other kernel maintainers. And we need to get our own house in order first too, but that's in progress. > That said, as I've replied to a later post, igt_pm is fine by me. One more: Please namespace all the library functions you're adding and exporting to tests with igt_pm_. Static/internal functions can still be named however you feel like. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Re: [Intel-gfx] [PATCH i-g-t] gem_flink_race/prime_self_import: Improve test reliability
On Fri, Dec 11, 2015 at 10:33:46AM +, Morton, Derek J wrote: > > > > > >-Original Message- > >From: Daniel Vetter [mailto:daniel.vet...@ffwll.ch] On Behalf Of Daniel > >Vetter > >Sent: Thursday, December 10, 2015 12:53 PM > >To: Morton, Derek J > >Cc: Daniel Vetter; intel-gfx@lists.freedesktop.org; Wood, Thomas > >Subject: Re: [Intel-gfx] [PATCH i-g-t] gem_flink_race/prime_self_import: > >Improve test reliability > > > >On Thu, Dec 10, 2015 at 11:51:29AM +, Morton, Derek J wrote: > >> > > >> > > >> >-Original Message- > >> >From: Daniel Vetter [mailto:daniel.vet...@ffwll.ch] On Behalf Of > >> >Daniel Vetter > >> >Sent: Thursday, December 10, 2015 10:13 AM > >> >To: Morton, Derek J > >> >Cc: intel-gfx@lists.freedesktop.org; Wood, Thomas > >> >Subject: Re: [Intel-gfx] [PATCH i-g-t] > >> >gem_flink_race/prime_self_import: Improve test reliability > >> > > >> >On Tue, Dec 08, 2015 at 12:44:44PM +, Derek Morton wrote: > >> >> gem_flink_race and prime_self_import have subtests which read the > >> >> number of open gem objects from debugfs to determine if objects > >> >> have leaked during the test. However the test can fail sporadically > >> >> if the number of gem objects changes due to other process activity. > >> >> This patch introduces a change to check the number of gem objects > >> >> several times to filter out any fluctuations. > >> > > >> >Why exactly does this happen? IGT tests should be run on bare metal, > >> >with everything else killed/subdued/shutup. If there's still things > >> >going on that create objects, we need to stop them from doing that. > >> > > >> >If this only applies to Android, or some special Android deamon them > >> >imo check for that at runtime and igt_skip("your setup is invalid, > >> >deamon %s running\n"); is the correct fix. After all just because you > >> >sampled for a bit doesn't mean that it wont still change right when > >> >you start running the test for real, so this is still fragile. 
> >> > >> Before running tests on android we do stop everything possible. I > >> suspect the culprit is coreu getting automatically restarted after it > >> is stopped. I had additional debug while developing this patch and > >> what I saw was the system being mostly quiescent but with some very > >> low level background activity. 1 extra object being created and then > >> deleted occasionally. Depending on whether it occurred at the start or > >> end of the test it was resulting in a reported leak of either 1 or -1 > >> objects. > >> The patch fixes that issue by taking several samples and requiring > >> them to be the same, therefore filtering out the low level background > >> noise. > >> It would not help if something in the background allocated an object > >> and kept it allocated, but I have not seen that happen. I only saw > >> once the object count increasing for 2 consecutive reads hence the > >> count to 4 to give a margin. The test was failing about 10%. With this > >> patch I got 100% pass across 300 runs of each of the tests. > > > >Hm, piglit checks that there's no other drm clients running. Have you tried > >re-running that check to zero in on the culprit? > > We don't use piglet to run IGT tests on Android. I have had a look at what > piglet does and added the same check to our scripts. (It reads a list of > clients from /sys/kernel/debug/dri/0/clients) > For CHV it shows a process called 'y', though that seems to be some issue on > CHV that all driver clients are called 'y'. I checked on BXT which properly > shows the process names and it looks like it is the binder process (which is > handling some inter process communication). I don't think this is something > we can stop. Nah, you definitely can't stop binder, won't have an android left after that ;-) But it is strange that binder owns these buffers. Binder is just IPC, but like unix domain sockets you can also throw around file descriptors. 
So something on your system is moving open drm fd devices still around. I don't have an idea what kind of audit/debug tooling binder offers, but there should be a way to figure out who really owns that file descriptor. If you're lucky lsof (if android has that, otherwise walk /proc/*/fd/* symlinks manually) should help. Cheers, Daniel > >> If you are concerned about the behaviour when running the test with a > >> load of background activity I could add code to limit to the reset of > >> the count and fail the test in that instance. That would give a > >> benefit of distinguishing a test fail due to excessive background > >> activity from a detected leak. > > > >I'm also concerned for the overhead this causes everyone else. If this > >really is some Android trouble then I think it'd be good to only compile > >this on Android. But would still be much better if you can get to a reliably > >clean test environment. > > I will make the loop part android specific. > > > //Derek > > > > >> I would not want to
Re: [Intel-gfx] [PATCH] drm/i915: Update to post-reset execlist queue clean-up
On Fri, Dec 11, 2015 at 02:14:00PM +, Dave Gordon wrote:
> On 01/12/15 11:46, Tvrtko Ursulin wrote:
> >
> > On 23/10/15 18:02, Tomas Elf wrote:
> >> When clearing an execlist queue, instead of traversing it and
> >> unreferencing all requests while holding the spinlock (which might
> >> lead to the thread sleeping with IRQs turned off - bad news!), just
> >> move all requests to the retire request list while holding the
> >> spinlock, then drop the spinlock and invoke the execlists request
> >> retirement path, which already deals with the intricacies of
> >> purging/dereferencing execlist queue requests.
> >>
> >> This patch can be considered v3 of:
> >>
> >> commit b96db8b81c54ef30485ddb5992d63305d86ea8d3
> >> Author: Tomas Elf
> >>
> >>     drm/i915: Grab execlist spinlock to avoid post-reset concurrency
> >>     issues
> >>
> >> This patch assumes v2 of the above patch is part of the baseline,
> >> reverts v2 and adds changes on top to turn it into v3.
> >>
> >> Signed-off-by: Tomas Elf
> >> Cc: Tvrtko Ursulin
> >> Cc: Chris Wilson
> >> ---
> >>  drivers/gpu/drm/i915/i915_gem.c | 15 ++++-----------
> >>  1 file changed, 4 insertions(+), 11 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> >> index 2c7a0b7..b492603 100644
> >> --- a/drivers/gpu/drm/i915/i915_gem.c
> >> +++ b/drivers/gpu/drm/i915/i915_gem.c
> >> @@ -2756,20 +2756,13 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
> >>
> >>  	if (i915.enable_execlists) {
> >>  		spin_lock_irq(&ring->execlist_lock);
> >> -		while (!list_empty(&ring->execlist_queue)) {
> >> -			struct drm_i915_gem_request *submit_req;
> >>
> >> -			submit_req = list_first_entry(&ring->execlist_queue,
> >> -					struct drm_i915_gem_request,
> >> -					execlist_link);
> >> -			list_del(&submit_req->execlist_link);
> >> +		/* list_splice_tail_init checks for empty lists */
> >> +		list_splice_tail_init(&ring->execlist_queue,
> >> +				      &ring->execlist_retired_req_list);
> >>
> >> -			if (submit_req->ctx != ring->default_context)
> >> -				intel_lr_context_unpin(submit_req);
> >> -
> >> -			i915_gem_request_unreference(submit_req);
> >> -		}
> >>  		spin_unlock_irq(&ring->execlist_lock);
> >> +		intel_execlists_retire_requests(ring);
> >>  	}
> >>
> >>  	/*
> >
> > Fallen through the cracks..
> >
> > This looks to be even more serious, since lockdep notices a possible
> > deadlock involving vmap_area_lock:
> >
> >  Possible interrupt unsafe locking scenario:
> >
> >        CPU0                    CPU1
> >        ----                    ----
> >   lock(vmap_area_lock);
> >                                local_irq_disable();
> >                                lock(&(&ring->execlist_lock)->rlock);
> >                                lock(vmap_area_lock);
> >
> >   lock(&(&ring->execlist_lock)->rlock);
> >
> >  *** DEADLOCK ***
> >
> > Because it unpins the LRC context and ringbuffer, which ends up in the
> > VM code under the execlist_lock.
> >
> > intel_execlists_retire_requests is slightly different from the code in
> > the reset handler because it concerns itself with ctx_obj existence,
> > which the other one doesn't.
> >
> > Could people more knowledgeable of this code check if it is OK and R-B?
> >
> > Regards,
> >
> > Tvrtko
>
> Hi Tvrtko,
>
> I didn't understand this message at first; I thought you'd found a
> problem with this ("v3") patch, but now I see what you actually meant is
> that there is indeed a problem with the (v2) that got merged - not the
> original question about unreferencing an object while holding a spinlock
> (because it can't be the last reference), but rather the unpin, which can
> indeed cause a problem with a non-i915-defined kernel lock.
>
> So we should certainly update the current (v2) upstream with this.
> Thomas Daniel already R-B'd this code on 23rd October, when it was:
>
> [PATCH v3 7/8] drm/i915: Grab execlist spinlock to avoid post-reset
> concurrency issues.
>
> and it hasn't changed in substance since then, so you can carry his R-B
> over, plus I said on that same day that this was a better solution. So:
>
> Reviewed-by: Thomas Daniel
> Reviewed-by: Dave Gordon

Indeed, fell through the cracks more than once :( Sorry about that,
picked up now.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 3/3] drm/i915: Prevent leaking of -EIO from i915_wait_request()
On Fri, Dec 11, 2015 at 09:02:18AM +, Chris Wilson wrote:
> On Thu, Dec 03, 2015 at 10:14:54AM +0100, Daniel Vetter wrote:
> > On Tue, Dec 01, 2015 at 11:05:35AM +, Chris Wilson wrote:
> > > diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> > > index 4447e73b54db..73c61b94f7fd 100644
> > > --- a/drivers/gpu/drm/i915/intel_display.c
> > > +++ b/drivers/gpu/drm/i915/intel_display.c
> > > @@ -13315,23 +13309,15 @@ static int intel_atomic_prepare_commit(struct drm_device *dev,
> > >
> > >  			ret = __i915_wait_request(intel_plane_state->wait_req,
> > >  						  true, NULL, NULL);
> > > -
> > > -			/* Swallow -EIO errors to allow updates during hw lockup. */
> > > -			if (ret == -EIO)
> > > -				ret = 0;
> > > -
> > > -			if (ret)
> > > +			if (ret) {
> > > +				mutex_lock(&dev->struct_mutex);
> > > +				drm_atomic_helper_cleanup_planes(dev, state);
> > > +				mutex_unlock(&dev->struct_mutex);
> > >  				break;
> > > +			}
> > >  		}
> > > -
> > > -		if (!ret)
> > > -			return 0;
> > > -
> > > -		mutex_lock(&dev->struct_mutex);
> > > -		drm_atomic_helper_cleanup_planes(dev, state);
> > >  	}
> > >
> > > -	mutex_unlock(&dev->struct_mutex);
> >
> > Sneaking in lockless waits! Separate patch please.
>
> No, it is just badly written code. The wait is already lockless, but the
> lock is dropped and retaken around the error paths in such a manner that
> you cannot see this at a glance.

Indeed, the lack of diff context left me confused; I stand corrected.
Looks good.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch