[Intel-gfx] [PATCH 09/10] drm/i915: Migrate stolen objects before hibernation
From: Chris Wilson

Ville reminded us that stolen memory is not preserved across hibernation, and a result of this was that context objects now being allocated from stolen were being corrupted on S4 and promptly hanging the GPU on resume. We want to utilise stolen for as much as possible (nothing else will use that wasted memory otherwise), so we need a strategy for handling general objects allocated from stolen and hibernation. A simple solution is to do a CPU copy through the GTT of the stolen object into a fresh shmemfs backing store and thenceforth treat it as a normal object. This can be refined in future either to use a GPU copy to avoid the slow uncached reads (though it's hibernation!) or to recreate stolen objects upon resume/first-use. For now, a simple approach should suffice for testing the object migration.

v2: Swap PTEs for pinned bindings over to the shmemfs. This adds a complicated dance, but is required as many stolen objects are likely to be pinned for use by the hardware. Swapping the PTEs should not result in externally visible behaviour, as each PTE update should be atomic and the two pages identical. (danvet)

Safe-by-default, or the principle of least surprise: we need a new flag to mark objects that we can wilfully discard and recreate across hibernation. (danvet)

Just use the global_list rather than invent a new stolen_list. This is the slowpath hibernate and so adding a new list and the associated complexity isn't worth it.
v3: Rebased on drm-intel-nightly (Ankit)
v4: Use insert_page to map stolen memory backed pages for migration to shmem (Chris)
v5: Acquire mutex lock while copying stolen buffer objects to shmem (Chris)
v6: Handled file leak, split the object migration function, added kerneldoc for migrate_stolen_to_shmemfs() (Tvrtko) Use i915 wrapper function for drm_mm_insert_node_in_range()
v7: Keep the object in cpu domain after get_pages, remove the object from the unbound list only when marked PURGED, corrected split of object migration function (Chris)
v8: Split i915_gem_freeze(), removed redundant use of barrier, corrected use of set_to_cpu_domain() (Chris)
v9: Replaced WARN_ON by BUG_ON and added a comment explaining it (Daniel/Tvrtko)
v10: Document use of barriers (Chris)

Signed-off-by: Chris Wilson
Signed-off-by: Ankitprasad Sharma
Reviewed-by: Tvrtko Ursulin
---
 drivers/gpu/drm/i915/i915_drv.c         | 17 ++-
 drivers/gpu/drm/i915/i915_drv.h         | 10 ++
 drivers/gpu/drm/i915/i915_gem.c         | 198 ++--
 drivers/gpu/drm/i915/i915_gem_stolen.c  | 49
 drivers/gpu/drm/i915/intel_display.c    | 3 +
 drivers/gpu/drm/i915/intel_fbdev.c      | 6 +
 drivers/gpu/drm/i915/intel_pm.c         | 2 +
 drivers/gpu/drm/i915/intel_ringbuffer.c | 6 +
 8 files changed, 279 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 11d8414..cfa44af 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -996,6 +996,21 @@ static int i915_pm_suspend(struct device *dev) return i915_drm_suspend(drm_dev); } +static int i915_pm_freeze(struct device *dev) +{ + int ret; + + ret = i915_gem_freeze(pci_get_drvdata(to_pci_dev(dev))); + if (ret) + return ret; + + ret = i915_pm_suspend(dev); + if (ret) + return ret; + + return 0; +} + static int i915_pm_suspend_late(struct device *dev) { struct drm_device *drm_dev = dev_to_i915(dev)->dev; @@ -1643,7 +1658,7 @@ static const struct dev_pm_ops i915_pm_ops = { * @restore, @restore_early : called after
rebooting and restoring the *hibernation image [PMSG_RESTORE] */ - .freeze = i915_pm_suspend, + .freeze = i915_pm_freeze, .freeze_late = i915_pm_suspend_late, .thaw_early = i915_pm_resume_early, .thaw = i915_pm_resume, diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 943b301..16f2f94 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2137,6 +2137,12 @@ struct drm_i915_gem_object { * Advice: are the backing pages purgeable? */ unsigned int madv:2; + /** +* Whereas madv is for userspace, there are certain situations +* where we want I915_MADV_DONTNEED behaviour on internal objects +* without conflating the userspace setting. +*/ + unsigned int internal_volatile:1; /** * Current tiling mode for the object. @@ -3093,6 +3099,9 @@ int i915_gem_l3_remap(struct drm_i915_gem_request *req, int slice); void i915_gem_init_swizzling(struct drm_device *dev); void i915_gem_cleanup_ringbuffer(struct drm_device *dev); int __must_check i915_gpu_idle(struct drm_device *dev); +int __must_check
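The copy-then-swap migration described above can be sketched in plain userspace C (all names here are hypothetical, not the driver's): each stolen-backed page is copied into a freshly allocated backing page, and the per-page pointer — standing in for the PTE — is then replaced in a single store. Because the two pages hold identical bytes at the moment of the swap, a concurrent reader of the mapping cannot observe the change, which is the property the commit message relies on.

```c
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Hypothetical sketch of migrating an object page by page: copy each
 * "stolen" page into fresh memory (a slow uncached read through the GTT
 * in the real driver), then swap the per-page pointer, our stand-in for
 * the PTE. Each swap is one store and the pages are identical, so the
 * change is externally invisible. Error unwinding ("the onion") elided. */
static int migrate_pages(unsigned char **pages, size_t npages)
{
    for (size_t i = 0; i < npages; i++) {
        unsigned char *fresh = malloc(PAGE_SIZE);
        if (!fresh)
            return -1;
        memcpy(fresh, pages[i], PAGE_SIZE); /* CPU copy of the contents */
        unsigned char *old = pages[i];
        pages[i] = fresh;                   /* the "PTE" swap */
        free(old);
    }
    return 0;
}
```

After migration the object is indistinguishable from one that was always shmemfs-backed, which is why the patch can thereafter treat it as a normal object.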
[Intel-gfx] [PATCH 10/10] drm/i915: Disable use of stolen area by User when Intel RST is present
From: Ankitprasad Sharma

The BIOS RapidStart Technology may corrupt the stolen memory across S3 suspend due to unalarmed hibernation, in which case we will not be able to preserve the User data stored in the stolen region. Hence this patch tries to identify the presence of the RST device on the ACPI bus, and disables use of stolen memory (for persistent data) if found.

v2: Updated comment, updated/corrected new functions private to driver (Chris/Tvrtko)
v3: Disabling stolen by default, wait till required acpi changes to detect device presence are pulled in (Ankit)
v4: Enabled stolen by default as required acpi changes are merged (Ankit)
v5: Renamed variable, use IS_ENABLED() in place of #ifdef, use char* instead of structures (Lukas)

Signed-off-by: Ankitprasad Sharma
Cc: Lukas Wunner
---
 drivers/gpu/drm/i915/i915_drv.h        | 11 +++
 drivers/gpu/drm/i915/i915_gem.c        | 8
 drivers/gpu/drm/i915/i915_gem_stolen.c | 12
 drivers/gpu/drm/i915/intel_acpi.c      | 7 +++
 4 files changed, 38 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 16f2f94..75e6935 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1349,6 +1349,16 @@ struct i915_gem_mm { */ bool busy; + /** +* Stolen will be lost upon hibernate (as the memory is unpowered). +* Across resume, we expect stolen to be intact - however, it may +* also be utilised by third parties (e.g. Intel RapidStart +* Technology) and if so we have to assume that any data stored in +* stolen across resume is lost and we set this flag to indicate that +* the stolen memory is volatile.
+*/ + bool volatile_stolen; + /* the indicator for dispatch video commands on two BSD rings */ unsigned int bsd_ring_dispatch_index; @@ -3465,6 +3475,7 @@ intel_opregion_notify_adapter(struct drm_device *dev, pci_power_t state) #endif /* intel_acpi.c */ +bool intel_detect_acpi_rst(void); #ifdef CONFIG_ACPI extern void intel_register_dsm_handler(void); extern void intel_unregister_dsm_handler(void); diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 587beea..8e5fce4 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -396,8 +396,16 @@ static struct drm_i915_gem_object * i915_gem_alloc_object_stolen(struct drm_device *dev, size_t size) { struct drm_i915_gem_object *obj; + struct drm_i915_private *dev_priv = dev->dev_private; int ret; + if (dev_priv->mm.volatile_stolen) { + /* Stolen may be overwritten by external parties +* so unsuitable for persistent user data. +*/ + return ERR_PTR(-ENODEV); + } + mutex_lock(&dev->struct_mutex); obj = i915_gem_object_create_stolen(dev, size); if (IS_ERR(obj)) diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c index 335a1ef..88ee036 100644 --- a/drivers/gpu/drm/i915/i915_gem_stolen.c +++ b/drivers/gpu/drm/i915/i915_gem_stolen.c @@ -482,6 +482,18 @@ int i915_gem_init_stolen(struct drm_device *dev) */ drm_mm_init(&dev_priv->mm.stolen, 0, dev_priv->gtt.stolen_usable_size); + /* If the stolen region can be modified behind our backs upon suspend, +* then we cannot use it to store nonvolatile contents (i.e user data) +* as it will be corrupted upon resume. +*/ + dev_priv->mm.volatile_stolen = false; + if (IS_ENABLED(CONFIG_SUSPEND)) { + /* BIOSes using RapidStart Technology have been reported +* to overwrite stolen across S3, not just S4.
+*/ + dev_priv->mm.volatile_stolen = intel_detect_acpi_rst(); + } + return 0; } diff --git a/drivers/gpu/drm/i915/intel_acpi.c b/drivers/gpu/drm/i915/intel_acpi.c index eb638a1..05fd67f 100644 --- a/drivers/gpu/drm/i915/intel_acpi.c +++ b/drivers/gpu/drm/i915/intel_acpi.c @@ -23,6 +23,8 @@ static const u8 intel_dsm_guid[] = { 0x0f, 0x13, 0x17, 0xb0, 0x1c, 0x2c }; +static const char *irst_id = "INT3392"; + static char *intel_dsm_port_name(u8 id) { switch (id) { @@ -162,3 +164,8 @@ void intel_register_dsm_handler(void) void intel_unregister_dsm_handler(void) { } + +bool intel_detect_acpi_rst(void) +{ + return acpi_dev_present(irst_id); +} -- 1.9.1 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 03/10] drm/i915: Use insert_page for pwrite_fast
From: Ankitprasad Sharma

In pwrite_fast, map an object page by page if obj_ggtt_pin fails. First, we try a nonblocking pin for the whole object (since that is fastest if reused), then failing that we try to grab one page in the mappable aperture. It also allows us to handle objects larger than the mappable aperture (e.g. if we need to pwrite with vGPU restricting the aperture to a measly 8MiB or something like that).

v2: Pin pages before starting pwrite, combined duplicate loops (Chris)
v3: Combined loops based on local patch by Chris (Chris)
v4: Added i915 wrapper function for drm_mm_insert_node_in_range (Chris)
v5: Renamed wrapper function for drm_mm_insert_node_in_range (Chris)
v5: Added wrapper for drm_mm_remove_node() (Chris)
v6: Added get_pages call before pinning the pages (Tvrtko) Added remove_mappable_node() wrapper for drm_mm_remove_node() (Chris)
v7: Added size argument for insert_mappable_node (Tvrtko)
v8: Do not put_pages after pwrite, do memset of node in the wrapper function (insert_mappable_node) (Chris)

Signed-off-by: Ankitprasad Sharma
Signed-off-by: Chris Wilson
Reviewed-by: Tvrtko Ursulin
---
 drivers/gpu/drm/i915/i915_gem.c | 92 +++--
 1 file changed, 70 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index a928823..49a03f2 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -61,6 +61,24 @@ static bool cpu_write_needs_clflush(struct drm_i915_gem_object *obj) return obj->pin_display; } +static int +insert_mappable_node(struct drm_i915_private *i915, + struct drm_mm_node *node, u32 size) +{ + memset(node, 0, sizeof(*node)); + return drm_mm_insert_node_in_range_generic(&i915->gtt.base.mm, node, + size, 0, 0, 0, + i915->gtt.mappable_end, + DRM_MM_SEARCH_DEFAULT, + DRM_MM_CREATE_DEFAULT); +} + +static void +remove_mappable_node(struct drm_mm_node *node) +{ + drm_mm_remove_node(node); +} + /* some bookkeeping */ static void i915_gem_info_add_obj(struct
drm_i915_private *dev_priv, size_t size) @@ -760,20 +778,33 @@ fast_user_write(struct io_mapping *mapping, * user into the GTT, uncached. */ static int -i915_gem_gtt_pwrite_fast(struct drm_device *dev, +i915_gem_gtt_pwrite_fast(struct drm_i915_private *i915, struct drm_i915_gem_object *obj, struct drm_i915_gem_pwrite *args, struct drm_file *file) { - struct drm_i915_private *dev_priv = dev->dev_private; - ssize_t remain; - loff_t offset, page_base; + struct drm_mm_node node; + uint64_t remain, offset; char __user *user_data; - int page_offset, page_length, ret; + int ret; ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE | PIN_NONBLOCK); - if (ret) - goto out; + if (ret) { + ret = insert_mappable_node(i915, &node, PAGE_SIZE); + if (ret) + goto out; + + ret = i915_gem_object_get_pages(obj); + if (ret) { + remove_mappable_node(&node); + goto out; + } + + i915_gem_object_pin_pages(obj); + } else { + node.start = i915_gem_obj_ggtt_offset(obj); + node.allocated = false; + } ret = i915_gem_object_set_to_gtt_domain(obj, true); if (ret) @@ -783,31 +814,39 @@ i915_gem_gtt_pwrite_fast(struct drm_device *dev, if (ret) goto out_unpin; - user_data = to_user_ptr(args->data_ptr); - remain = args->size; - - offset = i915_gem_obj_ggtt_offset(obj) + args->offset; - intel_fb_obj_invalidate(obj, ORIGIN_GTT); + obj->dirty = true; - while (remain > 0) { + user_data = to_user_ptr(args->data_ptr); + offset = args->offset; + remain = args->size; + while (remain) { /* Operation in this page * * page_base = page offset within aperture * page_offset = offset within page * page_length = bytes to copy for this page */ - page_base = offset & PAGE_MASK; - page_offset = offset_in_page(offset); - page_length = remain; - if ((page_offset + remain) > PAGE_SIZE) - page_length = PAGE_SIZE - page_offset; - + u32 page_base = node.start; + unsigned page_offset = offset_in_page(offset); +
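The per-page split performed inside the pwrite loop can be isolated into a tiny pure function (a userspace sketch; the names are hypothetical, not the driver's): given the current offset and the bytes remaining, it returns how many bytes may be copied without crossing a page boundary — exactly the `page_offset`/`page_length` arithmetic shown in the diff.

```c
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Offset of 'offset' within its page, mirroring the kernel's
 * offset_in_page() for a 4 KiB page size. */
static inline uint32_t offset_in_page(uint64_t offset)
{
    return (uint32_t)(offset & (PAGE_SIZE - 1));
}

/* Bytes that may be copied in one iteration without crossing a page
 * boundary: the lesser of what remains and the space left in the page. */
static inline uint32_t page_copy_length(uint64_t offset, uint64_t remain)
{
    uint64_t space = PAGE_SIZE - offset_in_page(offset);

    return (uint32_t)(remain < space ? remain : space);
}
```

Each iteration then advances `offset` and decrements `remain` by the returned length, so a write of any size and alignment decomposes into at most one partial leading page, whole pages, and one partial trailing page.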
[Intel-gfx] [PATCH v17 0/10] Support for creating/using Stolen memory backed objects
From: Ankitprasad Sharma

This patch series adds support for creating/using Stolen memory backed objects. Despite being a unified memory architecture (UMA) some bits of memory are more equal than others. In particular we have the thorny issue of stolen memory, memory stolen from the system by the BIOS and reserved for igfx use. Stolen memory is required for some functions of the GPU and display engine, but in general it goes wasted. Whilst we cannot return it back to the system, we need to find some other method for utilising it. As we do not support direct access to the physical address in the stolen region, it behaves like a different class of memory, closer in kin to local GPU memory. This strongly suggests that we need a placement model like TTM if we are to fully utilize these discrete chunks of differing memory.

To add support for creating Stolen memory backed objects, we extend the drm_i915_gem_create structure by adding a new flag through which the user can specify a preference to allocate the object from stolen memory; if set, an attempt will be made to allocate the object from stolen memory, subject to the availability of free space in the stolen region.

This patch series also adds support for clearing buffer objects via CPU/GTT. This is particularly useful for clearing out the memory from the stolen region, but can also be used for other shmem allocated objects. Currently it is used for buffers allocated in the stolen region. Also adding support for stealing purgable stolen pages, if we run out of stolen memory when trying to allocate an object.

v2: Added support for read/write from/to objects not backed by shmem using the pread/pwrite interface. Also extended the current get_aperture ioctl to retrieve the total and available size of the stolen region.

v3: Removed the extended get_aperture ioctl patch 5 (to be submitted as part of another patch series), addressed comments by Chris about pread/pwrite for non shmem backed objects.
v4: Rebased to the latest drm-intel-nightly. v5: Addressed comments, replaced patch 1/4 "Clearing buffers via blitter engine" by "Clearing buffers via CPU/GTT". v6: Rebased to the latest drm-intel-nightly, Addressed comments, updated stolen memory purging logic by maintaining a list for purgable stolen memory objects, enabled pread/pwrite for all non-shmem backed objects without tiling restrictions. v7: Addressed comments, compiler optimization, new patch added for correct error code propagation to the userspace. v8: Added a new patch to the series to Migrate stolen objects before hibernation, as stolen memory is not preserved across hibernation. Added correct error propagation for shmem as well non-shmem backed object allocation. v9: Addressed comments, use of insert_page helper function to map object page by page which can be helpful in low aperture space availability. v10: Addressed comments, use insert_page for clearing out the stolen memory v11: Addressed comments, 3 new patches added to support allocation from Stolen memory 1. Allow use of i915_gem_object_get_dma_address for stolen backed objects 2. Use insert_page for pwrite_fast 3. Fail the execbuff using stolen objects as batchbuffers v12: Addressed comments, Removed patch "Fail the execbuff using stolen objects as batchbuffers" v13: Addressed comments, Added 2 patches to detect Intel RST and disable stolen for persistent data if RST device found 1. acpi: Export acpi_bus_type 2. drm/i915: Disable use of stolen area by User when Intel RST is present v14: Addressed comments, Added 2 base patches to the series 1. drm/i915: Add support for mapping an object page by page 2. 
drm/i915: Introduce i915_gem_object_get_dma_address() v15: Addressed comments, Disabled stolen memory by default v16: Addressed comments, Added low level rpm assertions, Enabled stolen memory v17: Addressed comments This can be verified using IGT tests: igt/gem_stolen, igt/gem_create, igt/gem_pread, igt/gem_pwrite Ankitprasad Sharma (6): drm/i915: Use insert_page for pwrite_fast drm/i915: Clearing buffer objects via CPU/GTT drm/i915: Support for creating Stolen memory backed objects drm/i915: Propagating correct error codes to the userspace drm/i915: Support for pread/pwrite from/to non shmem backed objects drm/i915: Disable use of stolen area by User when Intel RST is present Chris Wilson (4): drm/i915: Add support for mapping an object page by page drm/i915: Introduce i915_gem_object_get_dma_address() drm/i915: Add support for stealing purgable stolen pages drm/i915: Migrate stolen objects before hibernation drivers/char/agp/intel-gtt.c | 9 + drivers/gpu/drm/i915/i915_debugfs.c | 6 +- drivers/gpu/drm/i915/i915_dma.c | 3 + drivers/gpu/drm/i915/i915_drv.c | 17 +- drivers/gpu/drm/i915/i915_drv.h | 58 ++- drivers/gpu/drm/i915/i915_gem.c | 631 ---
[Intel-gfx] [PATCH 06/10] drm/i915: Propagating correct error codes to the userspace
From: Ankitprasad Sharma

Propagating correct error codes to userspace by using the ERR_PTR and PTR_ERR macros for stolen memory based object allocation. We generally return -ENOMEM to the user whenever there is a failure in object allocation. This patch helps the user to identify the correct reason for the failure and not just -ENOMEM each time.

v2: Moved the patch up in the series, added error propagation for i915_gem_alloc_object too (Chris)
v3: Removed storing of error pointer inside structs, corrected error propagation in caller functions (Chris)
v4: Remove assignments inside the predicate (Chris)
v5: Removed unnecessary initializations, updated kerneldoc for i915_guc_client, corrected missed error pointer handling (Tvrtko)
v6: Use ERR_CAST/temporary variable to avoid storing invalid pointer in a common field (Chris)
v7: Resolved rebasing conflicts (Ankit)
v8: Removed redundant code (Chris)

Signed-off-by: Ankitprasad Sharma
Reviewed-by: Chris Wilson
---
 drivers/gpu/drm/i915/i915_gem.c              | 23 ++--
 drivers/gpu/drm/i915/i915_gem_batch_pool.c   | 4 +--
 drivers/gpu/drm/i915/i915_gem_context.c      | 4 +--
 drivers/gpu/drm/i915/i915_gem_render_state.c | 7 ++--
 drivers/gpu/drm/i915/i915_gem_stolen.c       | 53 +++-
 drivers/gpu/drm/i915/i915_guc_submission.c   | 52 +--
 drivers/gpu/drm/i915/intel_display.c         | 2 +-
 drivers/gpu/drm/i915/intel_fbdev.c           | 6 ++--
 drivers/gpu/drm/i915/intel_lrc.c             | 10 +++---
 drivers/gpu/drm/i915/intel_overlay.c         | 4 +--
 drivers/gpu/drm/i915/intel_pm.c              | 7 ++--
 drivers/gpu/drm/i915/intel_ringbuffer.c      | 21 +--
 12 files changed, 110 insertions(+), 83 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 60d27fe..d63f18c 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -397,19 +397,18 @@ i915_gem_alloc_object_stolen(struct drm_device *dev, size_t size) mutex_lock(&dev->struct_mutex); obj = i915_gem_object_create_stolen(dev, size); - if (!obj) { - mutex_unlock(&dev->struct_mutex); - return NULL; - } + if
(IS_ERR(obj)) + goto out; /* Always clear fresh buffers before handing to userspace */ ret = i915_gem_object_clear(obj); if (ret) { drm_gem_object_unreference(&obj->base); - mutex_unlock(&dev->struct_mutex); - return NULL; + obj = ERR_PTR(ret); + goto out; } +out: mutex_unlock(&dev->struct_mutex); return obj; } @@ -444,8 +443,8 @@ i915_gem_create(struct drm_file *file, return -EINVAL; } - if (obj == NULL) - return -ENOMEM; + if (IS_ERR(obj)) + return PTR_ERR(obj); ret = drm_gem_handle_create(file, &obj->base, &handle); /* drop reference from allocate - handle holds it now */ @@ -4562,14 +4561,16 @@ struct drm_i915_gem_object *i915_gem_alloc_object(struct drm_device *dev, struct drm_i915_gem_object *obj; struct address_space *mapping; gfp_t mask; + int ret; obj = i915_gem_object_alloc(dev); if (obj == NULL) - return NULL; + return ERR_PTR(-ENOMEM); - if (drm_gem_object_init(dev, &obj->base, size) != 0) { + ret = drm_gem_object_init(dev, &obj->base, size); + if (ret) { i915_gem_object_free(obj); - return NULL; + return ERR_PTR(ret); } mask = GFP_HIGHUSER | __GFP_RECLAIMABLE; diff --git a/drivers/gpu/drm/i915/i915_gem_batch_pool.c b/drivers/gpu/drm/i915/i915_gem_batch_pool.c index 7bf2f3f..d79caa2 100644 --- a/drivers/gpu/drm/i915/i915_gem_batch_pool.c +++ b/drivers/gpu/drm/i915/i915_gem_batch_pool.c @@ -135,8 +135,8 @@ i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool, int ret; obj = i915_gem_alloc_object(pool->dev, size); - if (obj == NULL) - return ERR_PTR(-ENOMEM); + if (IS_ERR(obj)) + return obj; ret = i915_gem_object_get_pages(obj); if (ret) diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c index 83a097c..2dd5fed 100644 --- a/drivers/gpu/drm/i915/i915_gem_context.c +++ b/drivers/gpu/drm/i915/i915_gem_context.c @@ -179,8 +179,8 @@ i915_gem_alloc_context_obj(struct drm_device *dev, size_t size) int ret; obj = i915_gem_alloc_object(dev, size); - if (obj == NULL) - return ERR_PTR(-ENOMEM); + if (IS_ERR(obj)) + return obj; /* * Try to make the context
utilize L3 as well as LLC. diff --git
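The ERR_PTR convention this patch adopts can be reproduced in a few lines of userspace C (a simplified model of the kernel's include/linux/err.h, not the kernel source itself): negative errno values are encoded directly into the pointer, occupying the top 4095 values of the address range, so a single return slot carries either a valid object pointer or a precise error code instead of an information-free NULL.

```c
#include <stdint.h>
#include <stddef.h>

#define MAX_ERRNO 4095

/* Encode a negative errno as a pointer. */
static inline void *ERR_PTR(long error)
{
    return (void *)(intptr_t)error;
}

/* Decode the errno back out of an error pointer. */
static inline long PTR_ERR(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* An error pointer lives in the top MAX_ERRNO values of the address
 * space, a region no valid allocation can occupy. NULL is not an error. */
static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}
```

This is why the patch can change callers from `if (obj == NULL) return -ENOMEM;` to `if (IS_ERR(obj)) return PTR_ERR(obj);` — the allocation path now reports -ENODEV, -ENOSPC, and so on, not just -ENOMEM.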
[Intel-gfx] [PATCH 08/10] drm/i915: Support for pread/pwrite from/to non shmem backed objects
From: Ankitprasad Sharma

This patch adds support for extending the pread/pwrite functionality to objects not backed by shmem. The access will be made through the gtt interface. This will cover objects backed by stolen memory as well as other non-shmem backed objects.

v2: Drop locks around slow_user_access, prefault the pages before access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop, corrected data types for size and offset variables, corrected if-else braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, renamed i915_gem_gtt_read to i915_gem_gtt_copy, added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed), call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, avoid use of WARN_ON, put_fence only if whole obj pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)

Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite

Signed-off-by: Ankitprasad Sharma
---
 drivers/gpu/drm/i915/i915_gem.c | 221 ++--
 1 file changed, 189 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index ed8ae5d..0938ab1 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -55,6 +55,9 @@ static bool cpu_cache_is_coherent(struct drm_device *dev, static bool
cpu_write_needs_clflush(struct drm_i915_gem_object *obj) { + if (obj->base.write_domain == I915_GEM_DOMAIN_CPU) + return false; + if (!cpu_cache_is_coherent(obj->base.dev, obj->cache_level)) return true; @@ -646,6 +649,141 @@ shmem_pread_slow(struct page *page, int shmem_page_offset, int page_length, return ret ? -EFAULT : 0; } +static inline uint64_t +slow_user_access(struct io_mapping *mapping, +uint64_t page_base, int page_offset, +char __user *user_data, +unsigned long length, bool pwrite) +{ + void __iomem *ioaddr; + void *vaddr; + uint64_t unwritten; + + ioaddr = io_mapping_map_wc(mapping, page_base); + /* We can use the cpu mem copy function because this is X86. */ + vaddr = (void __force *)ioaddr + page_offset; + if (pwrite) + unwritten = __copy_from_user(vaddr, user_data, length); + else + unwritten = __copy_to_user(user_data, vaddr, length); + + io_mapping_unmap(ioaddr); + return unwritten; +} + +static int +i915_gem_gtt_pread(struct drm_device *dev, + struct drm_i915_gem_object *obj, uint64_t size, + uint64_t data_offset, uint64_t data_ptr) +{ + struct drm_i915_private *dev_priv = dev->dev_private; + struct drm_mm_node node; + char __user *user_data; + uint64_t remain; + uint64_t offset; + int ret; + + ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE); + if (ret) { + ret = insert_mappable_node(dev_priv, &node, PAGE_SIZE); + if (ret) + goto out; + + ret = i915_gem_object_get_pages(obj); + if (ret) { + remove_mappable_node(&node); + goto out; + } + + i915_gem_object_pin_pages(obj); + } else { + node.start = i915_gem_obj_ggtt_offset(obj); + node.allocated = false; + ret = i915_gem_object_put_fence(obj); + if (ret) + goto out_unpin; + } + + ret = i915_gem_object_set_to_gtt_domain(obj, false); + if (ret) + goto out_unpin; + + user_data = to_user_ptr(data_ptr); + remain = size; + offset = data_offset; + + mutex_unlock(&dev->struct_mutex); + if (likely(!i915.prefault_disable)) { + ret = fault_in_multipages_writeable(user_data, remain); + if (ret) {
mutex_lock(&dev->struct_mutex); + goto out_unpin; + } + } + + while (remain > 0) { + /* Operation in this page +* +* page_base = page offset within aperture +* page_offset =
[Intel-gfx] [PATCH 04/10] drm/i915: Clearing buffer objects via CPU/GTT
From: Ankitprasad Sharma

This patch adds support for clearing buffer objects via CPU/GTT. This is particularly useful for clearing out non shmem backed objects. Currently we intend to use this only for buffers allocated from the stolen region.

v2: Added kernel doc for i915_gem_clear_object(), corrected/removed variable assignments (Tvrtko)
v3: Map object page by page to the gtt if the pinning of the whole object to the ggtt fails, corrected function name (Chris)
v4: Clear the buffer page by page, and not map the whole object in the gtt aperture. Use i915 wrapper function in place of drm_mm_insert_node_in_range.
v5: Use renamed wrapper function for drm_mm_insert_node_in_range, updated barrier positioning (Chris)
v6: Use PAGE_SIZE instead of 4096, use get_pages call before pinning pages (Tvrtko)
v7: Fixed the onion (undo operation in reverse order) (Chris)

Testcase: igt/gem_stolen

Signed-off-by: Ankitprasad Sharma
Reviewed-by: Tvrtko Ursulin
Reviewed-by: Chris Wilson
---
 drivers/gpu/drm/i915/i915_drv.h | 1 +
 drivers/gpu/drm/i915/i915_gem.c | 47 +
 2 files changed, 48 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index e4c25c6..1122e1b 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2938,6 +2938,7 @@ int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj, int *needs_clflush); int __must_check i915_gem_object_get_pages(struct drm_i915_gem_object *obj); +int i915_gem_object_clear(struct drm_i915_gem_object *obj); static inline int __sg_page_count(struct scatterlist *sg) { diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 49a03f2..1aa4fc9 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -5405,3 +5405,50 @@ fail: drm_gem_object_unreference(&obj->base); return ERR_PTR(ret); } + +/** + * i915_gem_object_clear() - Clear buffer object via CPU/GTT + * @obj: Buffer object to be cleared + * + * Return: 0 - success,
non-zero - failure + */ +int i915_gem_object_clear(struct drm_i915_gem_object *obj) +{ + struct drm_i915_private *i915 = to_i915(obj->base.dev); + struct drm_mm_node node; + char __iomem *base; + uint64_t size = obj->base.size; + int ret, i; + + lockdep_assert_held(&obj->base.dev->struct_mutex); + ret = insert_mappable_node(i915, &node, PAGE_SIZE); + if (ret) + return ret; + + ret = i915_gem_object_get_pages(obj); + if (ret) + goto err_remove_node; + + i915_gem_object_pin_pages(obj); + base = io_mapping_map_wc(i915->gtt.mappable, node.start); + + for (i = 0; i < size/PAGE_SIZE; i++) { + i915->gtt.base.insert_page(&i915->gtt.base, + i915_gem_object_get_dma_address(obj, i), + node.start, + I915_CACHE_NONE, 0); + wmb(); /* flush modifications to the GGTT (insert_page) */ + memset_io(base, 0, PAGE_SIZE); + wmb(); /* flush the write before we modify the GGTT */ + } + + io_mapping_unmap(base); + i915->gtt.base.clear_range(&i915->gtt.base, + node.start, node.size, + true); + i915_gem_object_unpin_pages(obj); + +err_remove_node: + remove_mappable_node(&node); + return ret; +} -- 1.9.1
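The single-window trick the clear loop uses — reserving one PAGE_SIZE slot in the aperture and rebinding each object page into it in turn, rather than mapping the whole object — can be sketched in userspace C (all names hypothetical; `window_insert_page` stands in for `insert_page` plus the `wmb()` barriers, and `memset` for `memset_io`):

```c
#include <string.h>

#define PAGE_SIZE 4096

/* The one reserved aperture slot, modelled as a pointer to whichever
 * object page is currently bound behind it. */
struct window {
    unsigned char *backing;
};

/* Rebind the slot to a new object page (insert_page in the real code,
 * bracketed there by write barriers to order GGTT updates vs. stores). */
static void window_insert_page(struct window *w, unsigned char *page)
{
    w->backing = page;
}

/* Clear every page of the object through the single window. */
static void clear_object(struct window *w,
                         unsigned char **pages, size_t npages)
{
    for (size_t i = 0; i < npages; i++) {
        window_insert_page(w, pages[i]);
        memset(w->backing, 0, PAGE_SIZE); /* memset_io via the mapping */
    }
}
```

The design point, as the changelog notes for v3/v4, is that only PAGE_SIZE of mappable aperture is ever consumed, so clearing works even when aperture space is scarce or the object is larger than the aperture.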
[Intel-gfx] [PATCH 07/10] drm/i915: Add support for stealing purgable stolen pages
From: Chris Wilson

If we run out of stolen memory when trying to allocate an object, see if we can reap enough purgeable objects to free up enough contiguous free space for the allocation. This is in principle very much like evicting objects to free up enough contiguous space in the vma when binding a new object - and you will be forgiven for thinking that the code looks very similar. At the moment, we do not allow userspace to allocate objects in stolen, so there is neither the memory pressure to trigger stolen eviction nor any purgeable objects inside the stolen arena. However, this will change in the near future, and so better management and defragmentation of stolen memory will become a real issue.

v2: Remember to remove the drm_mm_node.
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Corrected if-else braces format (Tvrtko/kerneldoc)
v5: Rebased to the latest drm-intel-nightly (Ankit) Added a separate list to maintain purgable objects from stolen memory region (Chris/Daniel)
v6: Compiler optimization (merging 2 single loops into one for() loop), corrected code for object eviction, retire_requests before starting object eviction (Chris)
v7: Added kernel doc for i915_gem_object_create_stolen()
v8: Check for struct_mutex lock before creating object from stolen region (Tvrtko)
v9: Renamed variables to make usage clear, added comment, removed onetime used macro (Tvrtko)
v10: Avoid masking of error when stolen_alloc fails (Tvrtko)
v11: Renamed stolen_link to tmp_link, as it may be used for other purposes too (Chris) Used ERR_CAST to cast error pointers while returning
v12: Added lockdep_assert before starting stolen-backed object eviction (Chris)

Testcase: igt/gem_stolen

Signed-off-by: Chris Wilson
Signed-off-by: Ankitprasad Sharma
Reviewed-by: Tvrtko Ursulin
---
 drivers/gpu/drm/i915/i915_debugfs.c    | 6 +-
 drivers/gpu/drm/i915/i915_drv.h        | 17 +++-
 drivers/gpu/drm/i915/i915_gem.c        | 15 +++
 drivers/gpu/drm/i915/i915_gem_stolen.c | 171 +
 drivers/gpu/drm/i915/intel_pm.c        |   4 +-
 5 files changed, 188 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index ec0c2a05e..aa7c7a3 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -174,7 +174,7 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 		seq_puts(m, ")");
 	}
 	if (obj->stolen)
-		seq_printf(m, " (stolen: %08llx)", obj->stolen->start);
+		seq_printf(m, " (stolen: %08llx)", obj->stolen->base.start);
 	if (obj->pin_display || obj->fault_mappable) {
 		char s[3], *t = s;
 		if (obj->pin_display)
@@ -253,9 +253,9 @@ static int obj_rank_by_stolen(void *priv,
 	struct drm_i915_gem_object *b =
 		container_of(B, struct drm_i915_gem_object, obj_exec_link);
 
-	if (a->stolen->start < b->stolen->start)
+	if (a->stolen->base.start < b->stolen->base.start)
 		return -1;
-	if (a->stolen->start > b->stolen->start)
+	if (a->stolen->base.start > b->stolen->base.start)
 		return 1;
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 55f2de9..943b301 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -840,6 +840,12 @@ struct i915_ctx_hang_stats {
 	bool banned;
 };
 
+struct i915_stolen_node {
+	struct drm_mm_node base;
+	struct list_head mm_link;
+	struct drm_i915_gem_object *obj;
+};
+
 /* This must match up with the value previously used for execbuf2.rsvd1. */
 #define DEFAULT_CONTEXT_HANDLE 0
 
@@ -1291,6 +1297,13 @@ struct i915_gem_mm {
 	 */
 	struct list_head unbound_list;
 
+	/**
+	 * List of stolen objects that have been marked as purgeable and
+	 * thus available for reaping if we need more space for a new
+	 * allocation. Ordered by time of marking purgeable.
+	 */
+	struct list_head stolen_list;
+
 	/** Usable portion of the GTT for GEM */
 	unsigned long stolen_base; /* limited to low memory (32-bit) */
 
@@ -2089,7 +2102,7 @@ struct drm_i915_gem_object {
 	struct list_head vma_list;
 
 	/** Stolen memory for this object, instead of being backed by shmem. */
-	struct drm_mm_node *stolen;
+	struct i915_stolen_node *stolen;
 
 	struct list_head global_list;
 	struct list_head ring_list[I915_NUM_RINGS];
 
@@ -2097,6 +2110,8 @@ struct drm_i915_gem_object {
 	struct list_head obj_exec_link;
 	struct list_head batch_pool_link;
 
+	/** Used to link an object to a list temporarily */
+	struct list_head tmp_link;
 
 	/**
 	 * This is set if the object is on the active lists (has pending
diff --git
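The reap-then-retry strategy this patch describes — evict purgeable stolen objects, oldest first, until a contiguous hole large enough for the new allocation appears — can be sketched in miniature. This is an illustrative userspace model only; the slot array and function names are invented stand-ins for the driver's drm_mm arena and stolen_list:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the stolen arena: an array of page slots, each free,
 * pinned in use, or purgeable. */
enum { SLOT_FREE = 0, SLOT_USED = 1, SLOT_PURGEABLE = 2 };

/* Return the start index of a contiguous run of 'want' free slots, or -1. */
static int find_hole(const unsigned char *slots, size_t n, size_t want)
{
    size_t run = 0;

    for (size_t i = 0; i < n; i++) {
        run = (slots[i] == SLOT_FREE) ? run + 1 : 0;
        if (run == want)
            return (int)(i - want + 1);
    }
    return -1;
}

/* Reap purgeable slots one at a time (oldest first in the real driver's
 * stolen_list) until the allocation fits; return its start or -1. */
static int alloc_with_eviction(unsigned char *slots, size_t n, size_t want)
{
    int start = find_hole(slots, n, want);

    for (size_t i = 0; start < 0 && i < n; i++) {
        if (slots[i] == SLOT_PURGEABLE) {
            slots[i] = SLOT_FREE;            /* reap one purgeable object */
            start = find_hole(slots, n, want); /* retry the allocation */
        }
    }
    return start;
}
```

The key property, as with vma eviction, is that reaping stops as soon as a contiguous hole of the requested size exists, so no more purgeable objects are destroyed than necessary.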
[Intel-gfx] [PATCH 02/10] drm/i915: Introduce i915_gem_object_get_dma_address()
From: Chris Wilson

This utility function is a companion to i915_gem_object_get_page() that uses the same cached iterator for the scatterlist to perform fast sequential lookup of the dma address associated with any page within the object.

Signed-off-by: Chris Wilson
Signed-off-by: Ankitprasad Sharma
Reviewed-by: Tvrtko Ursulin
---
 drivers/gpu/drm/i915/i915_drv.h | 17 +
 1 file changed, 17 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 65a2cd0..e4c25c6 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2947,6 +2947,23 @@ static inline int __sg_page_count(struct scatterlist *sg)
 struct page *
 i915_gem_object_get_dirty_page(struct drm_i915_gem_object *obj, int n);
 
+static inline dma_addr_t
+i915_gem_object_get_dma_address(struct drm_i915_gem_object *obj, int n)
+{
+	if (n < obj->get_page.last) {
+		obj->get_page.sg = obj->pages->sgl;
+		obj->get_page.last = 0;
+	}
+
+	while (obj->get_page.last + __sg_page_count(obj->get_page.sg) <= n) {
+		obj->get_page.last += __sg_page_count(obj->get_page.sg++);
+		if (unlikely(sg_is_chain(obj->get_page.sg)))
+			obj->get_page.sg = sg_chain_ptr(obj->get_page.sg);
+	}
+
+	return sg_dma_address(obj->get_page.sg) +
+		((n - obj->get_page.last) << PAGE_SHIFT);
+}
+
 static inline struct page *
 i915_gem_object_get_page(struct drm_i915_gem_object *obj, int n)
 {
--
1.9.1
___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
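The cached-iterator idea behind this helper can be demonstrated with a small userspace model: segments of varying page counts stand in for the scatterlist, and a cursor remembers which segment satisfied the last lookup so sequential lookups are O(1). All names here are illustrative, not the driver's structures:

```c
#include <assert.h>
#include <stddef.h>

/* One "scatterlist entry": a DMA start address covering npages pages. */
struct seg {
    unsigned long dma;
    size_t npages;
};

/* The cached iterator: which segment last matched, and the page index of
 * that segment's first page. */
struct cursor {
    size_t seg;
    size_t first;
};

/* Look up the dma address of page n, reusing the cursor like
 * i915_gem_object_get_dma_address() reuses obj->get_page. */
static unsigned long lookup_dma(const struct seg *segs, struct cursor *c,
                                size_t n)
{
    if (n < c->first) {          /* walked backwards: restart from segment 0 */
        c->seg = 0;
        c->first = 0;
    }

    while (c->first + segs[c->seg].npages <= n) {   /* advance the cache */
        c->first += segs[c->seg].npages;
        c->seg++;
    }

    return segs[c->seg].dma + ((n - c->first) << 12);  /* 4 KiB pages */
}
```

Sequential callers never rescan from the head; only a lookup of an earlier page resets the cursor, which mirrors the `n < obj->get_page.last` check in the patch.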
[Intel-gfx] [PATCH 01/10] drm/i915: Add support for mapping an object page by page
From: Chris Wilson

Introduced a new vm specific callback insert_page() to program a single pte in ggtt or ppgtt. This allows us to map a single page into the mappable aperture space. This can be iterated over to access the whole object by using space as meagre as page size.

v2: Added low level rpm assertions to insert_page routines (Chris)

v3: Added POSTING_READ post register write (Tvrtko)

Signed-off-by: Chris Wilson
Signed-off-by: Ankitprasad Sharma
---
 drivers/char/agp/intel-gtt.c        |  9 +
 drivers/gpu/drm/i915/i915_gem_gtt.c | 67 +
 drivers/gpu/drm/i915/i915_gem_gtt.h |  5 +++
 include/drm/intel-gtt.h             |  3 ++
 4 files changed, 84 insertions(+)

diff --git a/drivers/char/agp/intel-gtt.c b/drivers/char/agp/intel-gtt.c
index 1341a94..7c68576 100644
--- a/drivers/char/agp/intel-gtt.c
+++ b/drivers/char/agp/intel-gtt.c
@@ -838,6 +838,15 @@ static bool i830_check_flags(unsigned int flags)
 	return false;
 }
 
+void intel_gtt_insert_page(dma_addr_t addr,
+			   unsigned int pg,
+			   unsigned int flags)
+{
+	intel_private.driver->write_entry(addr, pg, flags);
+	wmb();
+}
+EXPORT_SYMBOL(intel_gtt_insert_page);
+
 void intel_gtt_insert_sg_entries(struct sg_table *st,
 				 unsigned int pg_start,
 				 unsigned int flags)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 715a771..6586525 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2341,6 +2341,29 @@ static void gen8_set_pte(void __iomem *addr, gen8_pte_t pte)
 #endif
 }
 
+static void gen8_ggtt_insert_page(struct i915_address_space *vm,
+				  dma_addr_t addr,
+				  uint64_t offset,
+				  enum i915_cache_level level,
+				  u32 unused)
+{
+	struct drm_i915_private *dev_priv = to_i915(vm->dev);
+	gen8_pte_t __iomem *pte =
+		(gen8_pte_t __iomem *)dev_priv->gtt.gsm +
+		(offset >> PAGE_SHIFT);
+	int rpm_atomic_seq;
+
+	rpm_atomic_seq = assert_rpm_atomic_begin(dev_priv);
+
+	gen8_set_pte(pte, gen8_pte_encode(addr, level, true));
+	wmb();
+
+	I915_WRITE(GFX_FLSH_CNTL_GEN6, GFX_FLSH_CNTL_EN);
+	POSTING_READ(GFX_FLSH_CNTL_GEN6);
+
+	assert_rpm_atomic_end(dev_priv, rpm_atomic_seq);
+}
+
 static void gen8_ggtt_insert_entries(struct i915_address_space *vm,
 				     struct sg_table *st,
 				     uint64_t start,
@@ -2412,6 +2435,29 @@ static void gen8_ggtt_insert_entries__BKL(struct i915_address_space *vm,
 	stop_machine(gen8_ggtt_insert_entries__cb, &arg, NULL);
 }
 
+static void gen6_ggtt_insert_page(struct i915_address_space *vm,
+				  dma_addr_t addr,
+				  uint64_t offset,
+				  enum i915_cache_level level,
+				  u32 flags)
+{
+	struct drm_i915_private *dev_priv = to_i915(vm->dev);
+	gen6_pte_t __iomem *pte =
+		(gen6_pte_t __iomem *)dev_priv->gtt.gsm +
+		(offset >> PAGE_SHIFT);
+	int rpm_atomic_seq;
+
+	rpm_atomic_seq = assert_rpm_atomic_begin(dev_priv);
+
+	iowrite32(vm->pte_encode(addr, level, true, flags), pte);
+	wmb();
+
+	I915_WRITE(GFX_FLSH_CNTL_GEN6, GFX_FLSH_CNTL_EN);
+	POSTING_READ(GFX_FLSH_CNTL_GEN6);
+
+	assert_rpm_atomic_end(dev_priv, rpm_atomic_seq);
+}
+
 /*
  * Binds an object into the global gtt with the specified cache level. The object
  * will be accessible to the GPU via commands whose operands reference offsets
@@ -2523,6 +2569,24 @@ static void gen6_ggtt_clear_range(struct i915_address_space *vm,
 	assert_rpm_atomic_end(dev_priv, rpm_atomic_seq);
 }
 
+static void i915_ggtt_insert_page(struct i915_address_space *vm,
+				  dma_addr_t addr,
+				  uint64_t offset,
+				  enum i915_cache_level cache_level,
+				  u32 unused)
+{
+	struct drm_i915_private *dev_priv = to_i915(vm->dev);
+	unsigned int flags = (cache_level == I915_CACHE_NONE) ?
+		AGP_USER_MEMORY : AGP_USER_CACHED_MEMORY;
+	int rpm_atomic_seq;
+
+	rpm_atomic_seq = assert_rpm_atomic_begin(dev_priv);
+
+	intel_gtt_insert_page(addr, offset >> PAGE_SHIFT, flags);
+
+	assert_rpm_atomic_end(dev_priv, rpm_atomic_seq);
+}
+
 static void i915_ggtt_insert_entries(struct i915_address_space *vm,
 				     struct sg_table *pages,
 				     uint64_t start,
@@ -3054,6 +3118,7 @@ static int
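The "iterate over the whole object through a single-page mapping" idea the commit message describes can be sketched in userspace: copy a multi-page object through a one-page window by remapping the window to each source page in turn. Plain memcpy stands in for the PTE write plus the read through the GTT aperture; sizes and names are illustrative:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { PG = 16 };   /* toy page size standing in for 4 KiB */

/* Copy an npages-long object into dst using only one page of "aperture":
 * each iteration remaps the window (insert_page) and then reads via it. */
static void copy_through_window(const char *obj, size_t npages, char *dst)
{
    char window[PG];   /* the single reusable mapping */

    for (size_t i = 0; i < npages; i++) {
        memcpy(window, obj + i * PG, PG);    /* "insert_page": remap window */
        memcpy(dst + i * PG, window, PG);    /* access the object via window */
    }
}
```

The point is the space cost: the whole object is reachable while only one page of mappable aperture is consumed at any instant, which is what makes insert_page useful on aperture-starved configurations.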
[Intel-gfx] [PATCH 05/10] drm/i915: Support for creating Stolen memory backed objects
From: Ankitprasad Sharma

Extend the drm_i915_gem_create structure to add support for creating stolen memory backed objects. Added a new flag through which the user can specify the preference to allocate the object from stolen memory; if set, an attempt will be made to allocate the object from stolen memory, subject to the availability of free space in the stolen region.

v2: Rebased to the latest drm-intel-nightly (Ankit)

v3: Changed versioning of GEM_CREATE param, added new comments (Tvrtko)

v4: Changed size from 32b to 64b to prevent userspace overflow (Tvrtko)
Corrected function arguments ordering (Chris)

v5: Corrected function name (Chris)

v6: Updated datatype for flags to keep sizeof(drm_i915_gem_create) u64 aligned (Chris)

v7: Use first 8 bits of gem_create flags for placement (Chris), Add helper function for object allocation from stolen region (Ankit)

v8: Added comment explaining STOLEN placement flag (Chris)

Testcase: igt/gem_stolen
Signed-off-by: Ankitprasad Sharma
Reviewed-by: Chris Wilson
---
 drivers/gpu/drm/i915/i915_dma.c        |  3 +++
 drivers/gpu/drm/i915/i915_drv.h        |  2 +-
 drivers/gpu/drm/i915/i915_gem.c        | 45 +++---
 drivers/gpu/drm/i915/i915_gem_stolen.c |  4 +--
 include/uapi/drm/i915_drm.h            | 41 +++
 5 files changed, 89 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index a42eb58..1aa2cb6 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -172,6 +172,9 @@ static int i915_getparam(struct drm_device *dev, void *data,
 	case I915_PARAM_HAS_EXEC_SOFTPIN:
 		value = 1;
 		break;
+	case I915_PARAM_CREATE_VERSION:
+		value = 2;
+		break;
 	default:
 		DRM_DEBUG("Unknown parameter %d\n", param->param);
 		return -EINVAL;
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 1122e1b..55f2de9 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3301,7 +3301,7 @@ void i915_gem_stolen_remove_node(struct drm_i915_private *dev_priv,
 int i915_gem_init_stolen(struct drm_device *dev);
 void i915_gem_cleanup_stolen(struct drm_device *dev);
 struct drm_i915_gem_object *
-i915_gem_object_create_stolen(struct drm_device *dev, u32 size);
+i915_gem_object_create_stolen(struct drm_device *dev, u64 size);
 struct drm_i915_gem_object *
 i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
 					       u32 stolen_offset,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 1aa4fc9..60d27fe 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -389,10 +389,36 @@ void i915_gem_object_free(struct drm_i915_gem_object *obj)
 	kmem_cache_free(dev_priv->objects, obj);
 }
 
+static struct drm_i915_gem_object *
+i915_gem_alloc_object_stolen(struct drm_device *dev, size_t size)
+{
+	struct drm_i915_gem_object *obj;
+	int ret;
+
+	mutex_lock(&dev->struct_mutex);
+	obj = i915_gem_object_create_stolen(dev, size);
+	if (!obj) {
+		mutex_unlock(&dev->struct_mutex);
+		return NULL;
+	}
+
+	/* Always clear fresh buffers before handing to userspace */
+	ret = i915_gem_object_clear(obj);
+	if (ret) {
+		drm_gem_object_unreference(&obj->base);
+		mutex_unlock(&dev->struct_mutex);
+		return NULL;
+	}
+
+	mutex_unlock(&dev->struct_mutex);
+	return obj;
+}
+
 static int
 i915_gem_create(struct drm_file *file,
 		struct drm_device *dev,
 		uint64_t size,
+		uint64_t flags,
 		uint32_t *handle_p)
 {
 	struct drm_i915_gem_object *obj;
@@ -403,8 +429,21 @@ i915_gem_create(struct drm_file *file,
 	if (size == 0)
 		return -EINVAL;
 
+	if (flags & __I915_CREATE_UNKNOWN_FLAGS)
+		return -EINVAL;
+
 	/* Allocate the new object */
-	obj = i915_gem_alloc_object(dev, size);
+	switch (flags & I915_CREATE_PLACEMENT_MASK) {
+	case I915_CREATE_PLACEMENT_NORMAL:
+		obj = i915_gem_alloc_object(dev, size);
+		break;
+	case I915_CREATE_PLACEMENT_STOLEN:
+		obj = i915_gem_alloc_object_stolen(dev, size);
+		break;
+	default:
+		return -EINVAL;
+	}
+
 	if (obj == NULL)
 		return -ENOMEM;
 
@@ -427,7 +466,7 @@ i915_gem_dumb_create(struct drm_file *file,
 	args->pitch = ALIGN(args->width * DIV_ROUND_UP(args->bpp, 8), 64);
 	args->size = args->pitch * args->height;
 	return i915_gem_create(file, dev,
-			       args->size, &args->handle);
+
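The flag screening this patch adds to i915_gem_create() — reject unknown bits, then dispatch on the low placement byte — reduces to a small pure function. The mask and constant values below are illustrative stand-ins, not the real uapi definitions:

```c
#include <assert.h>
#include <stdint.h>

/* Toy versions of the placement flags: the low byte selects placement,
 * every other bit must be zero (reserved for future use). */
#define TOY_PLACEMENT_MASK   0xffull
#define TOY_PLACEMENT_NORMAL 0x0ull
#define TOY_PLACEMENT_STOLEN 0x1ull
#define TOY_UNKNOWN_FLAGS    (~TOY_PLACEMENT_MASK)

/* Returns 0 for a shmem-backed object, 1 for stolen-backed, or
 * -22 (standing in for -EINVAL) when the flags are rejected. */
static int choose_backing(uint64_t flags)
{
    if (flags & TOY_UNKNOWN_FLAGS)     /* reserved bits must be clear */
        return -22;

    switch (flags & TOY_PLACEMENT_MASK) {
    case TOY_PLACEMENT_NORMAL:
        return 0;
    case TOY_PLACEMENT_STOLEN:
        return 1;
    default:                           /* undefined placement value */
        return -22;
    }
}
```

Rejecting every undefined bit up front is what lets the I915_PARAM_CREATE_VERSION getparam be bumped safely later: old kernels fail new flags with -EINVAL instead of silently ignoring them.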
Re: [Intel-gfx] [PATCH 06/11] drm/i915: Framework for capturing command stream based OA reports
On Wed, 2016-02-17 at 23:00 +0530, Robert Bragg wrote: > Hi Sourab, > > > As Sergio Martinez has started experimenting with this in gputop and > reported seeing lots of ENOSPC errors being reported when reading I > had a look into this and saw a few issues with how we check that > there's data available to read in command stream mode, and a I think > there's a possibility of incorrectly sorting the samples sometimes... Hi Robert, Thanks for spotting this anomaly. I'll have this fixed in the next version of patch set. > > On Tue, Feb 16, 2016 at 5:27 AM,wrote: > From: Sourab Gupta > > > -static bool i915_oa_can_read(struct i915_perf_stream *stream) > +static bool append_oa_rcs_sample(struct i915_perf_stream > *stream, > +struct i915_perf_read_state > *read_state, > +struct i915_perf_cs_data_node > *node) > +{ > + struct drm_i915_private *dev_priv = stream->dev_priv; > + struct oa_sample_data data = { 0 }; > + const u8 *report = > dev_priv->perf.command_stream_buf.addr + > + node->offset; > + u32 sample_flags = stream->sample_flags; > + u32 report_ts; > + > + /* > +* Forward the periodic OA samples which have the > timestamp lower > +* than timestamp of this sample, before forwarding > this sample. > +* This ensures samples read by user are order acc. 
to > their timestamps > +*/ > + report_ts = *(u32 *)(report + 4); > + dev_priv->perf.oa.ops.read(stream, read_state, > report_ts); > + > + if (sample_flags & SAMPLE_OA_SOURCE_INFO) > + data.source = I915_PERF_OA_EVENT_SOURCE_RCS; > + > + if (sample_flags & SAMPLE_CTX_ID) > + data.ctx_id = node->ctx_id; > + > + if (sample_flags & SAMPLE_OA_REPORT) > + data.report = report; > + > + append_oa_sample(stream, read_state, ); > + > + return true; > +} > + > +static void oa_rcs_append_reports(struct i915_perf_stream > *stream, > + struct i915_perf_read_state > *read_state) > +{ > + struct drm_i915_private *dev_priv = stream->dev_priv; > + struct i915_perf_cs_data_node *entry, *next; > + > + list_for_each_entry_safe(entry, next, > +_priv->perf.node_list, > link) { > + if (! > i915_gem_request_completed(entry->request, true)) > + break; > + > + if (!append_oa_rcs_sample(stream, read_state, > entry)) > + break; > + > + spin_lock(_priv->perf.node_list_lock); > + list_del(>link); > + spin_unlock(_priv->perf.node_list_lock); > + > + > i915_gem_request_unreference__unlocked(entry->request); > + kfree(entry); > + } > + > + /* Flush any remaining periodic reports */ > + dev_priv->perf.oa.ops.read(stream, read_state, > U32_MAX); > > I don't think we can flush all remaining periodic reports here - at > least not if we have any in-flight MI_RPC commands - in case the next > request to complete might have reports with earlier timestamps than > some of these periodic reports. > > > Even if we have periodic reports available I think we need to throttle > forwarding them based on the command stream requests completing. > > > This is something that userspace should understand when it explicitly > decides to use command stream mode in conjunction with periodic > sampling. > I agree, there shouldn't be any flushing of remaining periodic reports here, instead any periodic reports remaining here should be taken care of during the next processing of command stream samples. 
> > +} > + > +static bool command_stream_buf_is_empty(struct > i915_perf_stream *stream) > { > struct drm_i915_private *dev_priv = stream->dev_priv; > > - return ! > dev_priv->perf.oa.ops.oa_buffer_is_empty(dev_priv); > + if (stream->cs_mode) > + return list_empty(_priv->perf.node_list); > + else > + return true; > } > > > I think this list_empty() check needs a lock around it, as it's called > from
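The ordering rule agreed in this exchange — before forwarding a command-stream (CS) sample, drain every periodic OA report with an older timestamp, and hold back the rest rather than flushing them — can be sketched with arrays standing in for the OA buffer and the CS node list (names invented for illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Merge periodic OA report timestamps with CS sample timestamps so the
 * output stream is sorted by time.  Returns the number of samples
 * forwarded; periodic reports newer than the last completed CS sample
 * are intentionally NOT flushed, matching the review feedback (a later
 * CS request may still complete with an earlier timestamp). */
static size_t merge_samples(const uint32_t *periodic, size_t np,
                            const uint32_t *cs, size_t nc, uint32_t *out)
{
    size_t p = 0, n = 0;

    for (size_t c = 0; c < nc; c++) {
        while (p < np && periodic[p] < cs[c])   /* drain older periodic */
            out[n++] = periodic[p++];
        out[n++] = cs[c];                       /* then the CS sample */
    }
    return n;
}
```

In the toy run below, the periodic report at t=50 stays buffered because no CS sample newer than it has completed yet, which is exactly the throttling behaviour Robert asked for.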
Re: [Intel-gfx] [PATCH 08/10] drm/i915: Support for pread/pwrite from/to non shmem backed objects
Hi, On Thu, 2016-02-11 at 11:40 +, Tvrtko Ursulin wrote: > > + > > + mutex_unlock(>struct_mutex); > > + if (likely(!i915.prefault_disable)) { > > + ret = fault_in_multipages_writeable(user_data, remain); > > + if (ret) { > > + mutex_lock(>struct_mutex); > > + goto out_unpin; > > + } > > + } > > + > > + while (remain > 0) { > > + /* Operation in this page > > +* > > +* page_base = page offset within aperture > > +* page_offset = offset within page > > +* page_length = bytes to copy for this page > > +*/ > > + u32 page_base = node.start; > > + unsigned page_offset = offset_in_page(offset); > > + unsigned page_length = PAGE_SIZE - page_offset; > > + page_length = remain < page_length ? remain : page_length; > > + if (node.allocated) { > > + wmb(); > > + dev_priv->gtt.base.insert_page(_priv->gtt.base, > > + > > i915_gem_object_get_dma_address(obj, offset >> PAGE_SHIFT), > > + node.start, > > + I915_CACHE_NONE, 0); > > + wmb(); > > + } else { > > + page_base += offset & PAGE_MASK; > > + } > > + /* This is a slow read/write as it tries to read from > > +* and write to user memory which may result into page > > +* faults, and so we cannot perform this under struct_mutex. > > +*/ > > + if (slow_user_access(dev_priv->gtt.mappable, page_base, > > +page_offset, user_data, > > +page_length, false)) { > > + ret = -EFAULT; > > + break; > > + } > > Read does not want to try the fast access first, equivalent to pwrite ? Using fast access means we will be unable to handle faults, which are more frequent in a pread case. > > > + > > + remain -= page_length; > > + user_data += page_length; > > + offset += page_length; > > + } > > + > > > > @@ -870,24 +1012,36 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_private > > *i915, > > unsigned page_length = PAGE_SIZE - page_offset; > > page_length = remain < page_length ? 
remain : page_length; > > if (node.allocated) { > > - wmb(); > > + wmb(); /* flush the write before we modify the GGTT */ > > i915->gtt.base.insert_page(>gtt.base, > > > > i915_gem_object_get_dma_address(obj, offset >> PAGE_SHIFT), > >node.start, > >I915_CACHE_NONE, > >0); > > - wmb(); > > + wmb(); /* flush modifications to the GGTT (insert_page) > > */ > > } else { > > page_base += offset & PAGE_MASK; > > } > > /* If we get a fault while copying data, then (presumably) our > > * source page isn't available. Return the error and we'll > > * retry in the slow path. > > +* If the object is non-shmem backed, we retry again with the > > +* path that handles page fault. > > */ > > if (fast_user_write(i915->gtt.mappable, page_base, > > page_offset, user_data, page_length)) { > > - ret = -EFAULT; > > - goto out_flush; > > + hit_slow_path = true; > > + mutex_unlock(>struct_mutex); > > + if (slow_user_access(i915->gtt.mappable, > > +page_base, > > +page_offset, user_data, > > +page_length, true)) { > > + ret = -EFAULT; > > + mutex_lock(>struct_mutex); > > + goto out_flush; > > + } > > I think the function now be called i915_gem_gtt_pwrite. > > Would it also need the same pre-fault as in i915_gem_gtt_pread ? I do not think pre-fault is needed here, as in pread we are dealing with a read from the obj and to the user buffer (which has more chances of faulting). While in the pwrite case, we are optimistic that the user would have already mapped/accessed the buffer before using it to write the buffer contents into the object. > > > + > > + mutex_lock(>struct_mutex); > > } > > > > remain -= page_length; > > @@ -896,6 +1050,9 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_private *i915,
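The fast-then-slow copy pattern debated in this review can be reduced to its control flow: attempt the no-fault copy while holding struct_mutex; if it fails, drop the mutex, perform the faulting copy (which may sleep), and retake the mutex before the next page. The booleans below are illustrative stand-ins for lock state and page residency, not the driver's real API:

```c
#include <assert.h>
#include <stdbool.h>

struct ctx {
    bool locked;   /* models struct_mutex being held */
};

/* Models fast_user_write()/fast copy: succeeds only if the user page is
 * already resident, since it must not fault under the mutex. */
static bool fast_copy(bool src_resident)
{
    return src_resident;
}

/* Returns 0 when the fast path sufficed, 1 when the slow (faulting)
 * path was taken -- the hit_slow_path flag in the patch. */
static int copy_page(struct ctx *c, bool src_resident)
{
    if (fast_copy(src_resident))
        return 0;               /* fast path: lock held throughout */

    c->locked = false;          /* mutex_unlock: slow path may fault/sleep */
    /* slow_user_access() would run here, outside the mutex */
    c->locked = true;           /* mutex_lock before continuing the loop */
    return 1;
}
```

This also shows why pread skips the fast attempt entirely, as Ankitprasad argues above: reads into a fresh user buffer fault often enough that trying the no-fault path first would mostly waste work.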
Re: [Intel-gfx] [PATCH] drm/atomic: Allow for holes in connector state.
On 16 February 2016 at 21:37, Ville Syrjälä wrote:
> On Mon, Feb 15, 2016 at 02:17:01PM +0100, Maarten Lankhorst wrote:
>> Because we record connector_mask using 1 << drm_connector_index now
>> the connector_mask should stay the same even when other connectors
>> are removed. This was not the case with MST, in that case when removing
>> a connector all other connectors may change their index.
>>
>> This is fixed by waiting until the first get_connector_state to allocate
>> connector_state, and force reallocation when state is too small.
>>
>> As a side effect connector arrays no longer have to be preallocated,
>> and can be allocated on first use which means a less allocations in
>> the page flip only path.

Daniel, you said something on irc about v2 of this for -fixes? Did I miss v2?

Dave.
Re: [Intel-gfx] [PATCH 5/6] drm/i915: Implement color management on bdw/skl/bxt/kbl
On Tue, Feb 09, 2016 at 12:19:17PM +, Lionel Landwerlin wrote: > Patch based on a previous series by Shashank Sharma. > > v2: Do not read GAMMA_MODE register to figure what mode we're in > > v3: Program PREC_PAL_GC_MAX to clamp pixel values > 1.0 > > Add documentation on how the Broadcast RGB property is affected by > CTM_MATRIX > > v4: Update contributors > > Signed-off-by: Shashank Sharma> Signed-off-by: Lionel Landwerlin > Signed-off-by: Kumar, Kiran S > Signed-off-by: Kausal Malladi > --- > Documentation/DocBook/gpu.tmpl | 6 +- > drivers/gpu/drm/i915/i915_drv.c | 24 ++- > drivers/gpu/drm/i915/i915_drv.h | 9 + > drivers/gpu/drm/i915/i915_reg.h | 22 +++ > drivers/gpu/drm/i915/intel_color.c | 367 > ++- > drivers/gpu/drm/i915/intel_display.c | 22 ++- > drivers/gpu/drm/i915/intel_drv.h | 6 +- > 7 files changed, 396 insertions(+), 60 deletions(-) > > diff --git a/Documentation/DocBook/gpu.tmpl b/Documentation/DocBook/gpu.tmpl > index 7c49a92..78b8877 100644 > --- a/Documentation/DocBook/gpu.tmpl > +++ b/Documentation/DocBook/gpu.tmpl > @@ -2152,7 +2152,11 @@ void intel_crt_init(struct drm_device *dev) > ENUM > { "Automatic", "Full", "Limited 16:235" } > Connector > - TBD > + When this property is set to Limited 16:235 > + and CTM_MATRIX is set, the hardware will be programmed with > + the result of the multiplication of CTM_MATRIX by the limited > + range matrix to ensure the pixels normaly in the range 0..1.0 > + are remapped to the range 16/255..235/255. 
> > > “audio” > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c > index 44912ec..b65aa20 100644 > --- a/drivers/gpu/drm/i915/i915_drv.c > +++ b/drivers/gpu/drm/i915/i915_drv.c > @@ -66,6 +66,9 @@ static struct drm_driver driver; > #define IVB_CURSOR_OFFSETS \ > .cursor_offsets = { CURSOR_A_OFFSET, IVB_CURSOR_B_OFFSET, > IVB_CURSOR_C_OFFSET } > > +#define BDW_COLORS \ > + .color = { .degamma_lut_size = 512, .gamma_lut_size = 512 } > + > static const struct intel_device_info intel_i830_info = { > .gen = 2, .is_mobile = 1, .cursor_needs_physical = 1, .num_pipes = 2, > .has_overlay = 1, .overlay_needs_physical = 1, > @@ -288,24 +291,28 @@ static const struct intel_device_info > intel_haswell_m_info = { > .is_mobile = 1, > }; > > +#define BDW_FEATURES \ > + HSW_FEATURES, \ > + BDW_COLORS > + > static const struct intel_device_info intel_broadwell_d_info = { > - HSW_FEATURES, > + BDW_FEATURES, > .gen = 8, > }; > > static const struct intel_device_info intel_broadwell_m_info = { > - HSW_FEATURES, > + BDW_FEATURES, > .gen = 8, .is_mobile = 1, > }; > > static const struct intel_device_info intel_broadwell_gt3d_info = { > - HSW_FEATURES, > + BDW_FEATURES, > .gen = 8, > .ring_mask = RENDER_RING | BSD_RING | BLT_RING | VEBOX_RING | BSD2_RING, > }; > > static const struct intel_device_info intel_broadwell_gt3m_info = { > - HSW_FEATURES, > + BDW_FEATURES, > .gen = 8, .is_mobile = 1, > .ring_mask = RENDER_RING | BSD_RING | BLT_RING | VEBOX_RING | BSD2_RING, > }; > @@ -321,13 +328,13 @@ static const struct intel_device_info > intel_cherryview_info = { > }; > > static const struct intel_device_info intel_skylake_info = { > - HSW_FEATURES, > + BDW_FEATURES, > .is_skylake = 1, > .gen = 9, > }; > > static const struct intel_device_info intel_skylake_gt3_info = { > - HSW_FEATURES, > + BDW_FEATURES, > .is_skylake = 1, > .gen = 9, > .ring_mask = RENDER_RING | BSD_RING | BLT_RING | VEBOX_RING | BSD2_RING, > @@ -345,17 +352,18 @@ static const struct 
intel_device_info > intel_broxton_info = { > .has_fbc = 1, > GEN_DEFAULT_PIPEOFFSETS, > IVB_CURSOR_OFFSETS, > + BDW_COLORS, > }; > > static const struct intel_device_info intel_kabylake_info = { > - HSW_FEATURES, > + BDW_FEATURES, > .is_preliminary = 1, > .is_kabylake = 1, > .gen = 9, > }; > > static const struct intel_device_info intel_kabylake_gt3_info = { > - HSW_FEATURES, > + BDW_FEATURES, > .is_preliminary = 1, > .is_kabylake = 1, > .gen = 9, > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h > index 8216665..c1ca4d0 100644 > --- a/drivers/gpu/drm/i915/i915_drv.h > +++ b/drivers/gpu/drm/i915/i915_drv.h > @@ -659,6 +659,10 @@ struct drm_i915_display_funcs { > /* render clock increase/decrease */ > /* display clock increase/decrease */ > /* pll clock increase/decrease */ > + > + void (*load_degamma_lut)(struct drm_crtc *crtc); > + void (*load_csc_matrix)(struct drm_crtc *crtc); > + void
[Intel-gfx] [PATCH] drm/i915: Before waiting for a vblank update drm frame counter.
Whenever power wells are disabled, like when entering DC5/DC6, all display registers are zeroed. DMC firmware restores them on DC5/DC6 exit. However, the frame counter register is read-only, so DMC cannot restore it. So we start facing some funny errors where drm was waiting for vblank 500 while the hardware counter got reset and not restored, so the wait for vblank was returning 500 vblanks later, like 8 seconds later.

Since we have no visibility of when DMC is restoring the registers, the quick and dirty way is to update the drm layer counter with the latest counter we know. At least we don't stay hundreds of vblanks behind.

FIXME: A proper solution would involve power domain handling to avoid DC off when a vblank is waited on. However, due to the spin locks in drm vblank handling and the mutex sleeps on the power domain handling side, we cannot do this. One alternative would be to create a pre_enable_vblank and post_disable_vblank outside the spin lock regions. But unfortunately this is also not trivial because of many asynchronous drm_vblank_get and drm_vblank_put calls. Any other idea or help is very welcome.

Signed-off-by: Rodrigo Vivi
---
 drivers/gpu/drm/i915/i915_irq.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 25a8937..e67fae4 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2744,6 +2744,20 @@ static int gen8_enable_vblank(struct drm_device *dev, unsigned int pipe)
 	unsigned long irqflags;
 
 	spin_lock_irqsave(&dev_priv->irq_lock, irqflags);
+	/*
+	 * DMC firmware can't restore the frame counter register, which is
+	 * read-only, so we need to force the drm layer to know what is our
+	 * latest frame counter.
+	 * FIXME: We might face some funny race condition with DC states
+	 * entering after this restore. Unfortunately a power domain to avoid
+	 * DC off is not possible at this point due to all spin locks the drm
+	 * layer does with vblanks. Another idea was to add pre-enable and
+	 * post-disable functions at vblank, but at drm layer there are many
+	 * asynchronous vblank puts so that it is not possible without a bigger
+	 * rework.
+	 */
+	if (HAS_CSR(dev))
+		dev->vblank[pipe].last = g4x_get_vblank_counter(dev, pipe);
 	bdw_enable_pipe_irq(dev_priv, pipe, GEN8_PIPE_VBLANK);
 	spin_unlock_irqrestore(&dev_priv->irq_lock, irqflags);
--
2.4.3
Re: [Intel-gfx] [PATCH v5 10/12] drm/i915: Defer probe if gmux is present but its driver isn't
> -Original Message- > From: daniel.vet...@ffwll.ch [mailto:daniel.vet...@ffwll.ch] On Behalf Of > Daniel Vetter > Sent: Thursday, February 18, 2016 6:11 PM > To: Lukas Wunner > Cc: dri-devel; platform-driver-...@vger.kernel.org; intel-gfx; Ben Skeggs; > Deucher, Alexander > Subject: Re: [PATCH v5 10/12] drm/i915: Defer probe if gmux is present but > its driver isn't > > On Thu, Feb 18, 2016 at 11:20 PM, Lukas Wunnerwrote: > > Hi, > > > > On Thu, Feb 18, 2016 at 10:39:05PM +0100, Daniel Vetter wrote: > >> On Thu, Feb 18, 2016 at 9:34 PM, Lukas Wunner > wrote: > >> > > >> >> Ok, makes sense. I still think adding the check to the client_register > >> >> function would be good, just as a safety measure. > >> > > >> > Hm, the idea of calling vga_switcheroo_client_probe_defer() twice > >> > causes me pain in the stomach. It's surprising for drivers which > >> > just don't need it at the moment (amdgpu and snd_hda_intel) and > >> > it feels like overengineering and pampering driver developers > >> > beyond reasonable measure. Also while the single existing check is > >> > cheap, we might later on add checks that take more time and slow > >> > things down. > >> > >> It's motivated by Rusty's API Manifesto: > >> > >> http://sweng.the-davies.net/Home/rustys-api-design-manifesto > > > > Interesting, thank you. > > > > > >> With the mandatory check in _register we reach level 5 - it'll blow up > >> at runtime when we try to register it. > > > > The manifesto says "5. Do it right or it will always break at runtime". > > > > However even if we add a > WARN_ON(vga_switcheroo_client_probe_defer(pdev)) > > to register_client(), it will not *always* spew a stacktrace but only on > > the machines this concerns (MacBook Pros). Since failure to defer breaks > > GPU switching, level 5 is already reached. Chances are this won't go > > unnoticed by the user. > > If we fail the register hopefully the driver checks for that and might > blow up somewhere in untested error handling code. 
But there's a good > chance it'll fail (we can encourage that more by adding must_check to > the function declaration). In that case you get a nice bug report with > splat from users hitting this. > > Without this it'll silently work, and all the reports you get is > "linux is shit, gpu switching doesn't work". > > In both cases it can sometimes succeed, which is not great indeed. But > I'm trying to fix that by injection EDEFER points artificially > somehow. Not yet figured out that one. > > But irrespective of the precise failure mode making the defer check > mandatory by just including it in _register() is better since it makes > it impossible to forget to call it when its needed. So imo clearly the > more robust API. And that's my metric for evaluating new API - how > easy/hard is it to abuse/get wrong. > > >> For more context: We have tons of fun with EPROBE_DEFER handling > >> between i915 and snd-hda > > > > I don't understand, there is currently not a single occurrence of > > EPROBE_DEFER in i915, apart from the one I added. > > > > In sound/ there are 88 occurrences of EPROBE_DEFER in soc/, plus 1 in > > ppc/ and that's it. So not a single one in pci/hda/ where hda_intel.c > > resides. > > > > Is the fun with EPROBE_DEFER handling caused by the lack thereof? > > Yes, there's one instance where i915 shoudl defer missing. The real > trouble is that snd-hda has some really close ties with i915, and > resolves those with probe-defer. And blows up all the time since we > started using this, and with hdmi/dp you really always have to test > both together in CI, snd-hda is pretty much a part of the intel gfx > driver nowadays. Deferred probing is ime real trouble. To further complicate things, AMD dGPUs have HDA audio on board as well. Alex ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 3/4] drm/i915/gen9: Extend dmc debug mask to include cores
>-Original Message-
>From: Deak, Imre
...
>The BSpec "Sequence to Allow DC5 or DC6" requires this only for BXT
>(looks like a recent addition to work around something), but it doesn't
>say it's needed for other platforms. The register description doesn't
>make a difference though.
>
>Perhaps Art has more info on this, adding him.
>

Only BXT needs it programmed to 1b at the moment. Other products should keep the default.
Re: [Intel-gfx] [PATCH v5 10/12] drm/i915: Defer probe if gmux is present but its driver isn't
On Thu, Feb 18, 2016 at 11:20 PM, Lukas Wunnerwrote: > Hi, > > On Thu, Feb 18, 2016 at 10:39:05PM +0100, Daniel Vetter wrote: >> On Thu, Feb 18, 2016 at 9:34 PM, Lukas Wunner wrote: >> > >> >> Ok, makes sense. I still think adding the check to the client_register >> >> function would be good, just as a safety measure. >> > >> > Hm, the idea of calling vga_switcheroo_client_probe_defer() twice >> > causes me pain in the stomach. It's surprising for drivers which >> > just don't need it at the moment (amdgpu and snd_hda_intel) and >> > it feels like overengineering and pampering driver developers >> > beyond reasonable measure. Also while the single existing check is >> > cheap, we might later on add checks that take more time and slow >> > things down. >> >> It's motivated by Rusty's API Manifesto: >> >> http://sweng.the-davies.net/Home/rustys-api-design-manifesto > > Interesting, thank you. > > >> With the mandatory check in _register we reach level 5 - it'll blow up >> at runtime when we try to register it. > > The manifesto says "5. Do it right or it will always break at runtime". > > However even if we add a WARN_ON(vga_switcheroo_client_probe_defer(pdev)) > to register_client(), it will not *always* spew a stacktrace but only on > the machines this concerns (MacBook Pros). Since failure to defer breaks > GPU switching, level 5 is already reached. Chances are this won't go > unnoticed by the user. If we fail the register hopefully the driver checks for that and might blow up somewhere in untested error handling code. But there's a good chance it'll fail (we can encourage that more by adding must_check to the function declaration). In that case you get a nice bug report with splat from users hitting this. Without this it'll silently work, and all the reports you get is "linux is shit, gpu switching doesn't work". In both cases it can sometimes succeed, which is not great indeed. But I'm trying to fix that by injection EDEFER points artificially somehow. 
Not yet figured out that one. But irrespective of the precise failure mode, making the defer check mandatory by just including it in _register() is better since it makes it impossible to forget to call it when it's needed. So imo clearly the more robust API. And that's my metric for evaluating new API - how easy/hard is it to abuse/get wrong.

>> For more context: We have tons of fun with EPROBE_DEFER handling
>> between i915 and snd-hda
>
> I don't understand, there is currently not a single occurrence of
> EPROBE_DEFER in i915, apart from the one I added.
>
> In sound/ there are 88 occurrences of EPROBE_DEFER in soc/, plus 1 in
> ppc/ and that's it. So not a single one in pci/hda/ where hda_intel.c
> resides.
>
> Is the fun with EPROBE_DEFER handling caused by the lack thereof?

Yes, there's one instance where i915 should defer that is still missing. The real trouble is that snd-hda has some really close ties with i915, and resolves those with probe-defer. And it blows up all the time since we started using this, and with hdmi/dp you really always have to test both together in CI; snd-hda is pretty much a part of the intel gfx driver nowadays. Deferred probing is, in my experience, real trouble.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
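Daniel's "safe by default" argument above can be sketched in isolation. The following is a hypothetical userspace model, not the actual vga_switcheroo code: the gmux/VGA-default conditions are reduced to plain booleans, and only the control flow matters — folding the defer check into registration makes forgetting it fail loudly at register time instead of silently breaking GPU switching.

```c
#include <stdbool.h>
#include <stddef.h>

#define EPROBE_DEFER 517	/* same numeric value as the kernel's */

struct pdev {
	bool is_default_vga;	/* stands in for pdev == vga_default_device() */
};

static bool gmux_present;	/* stands in for apple_gmux_present() */
static bool handler_flags;	/* stands in for vga_switcheroo_handler_flags() */

/* true means the caller should return -EPROBE_DEFER from its probe */
static bool client_probe_defer(const struct pdev *pdev)
{
	return gmux_present && !pdev->is_default_vga && !handler_flags;
}

/* "Safe by default": registration itself refuses clients that should
 * have deferred, so forgetting the probe-time check cannot silently
 * break switching -- it fails loudly here instead. */
static int register_client(const struct pdev *pdev)
{
	if (client_probe_defer(pdev))
		return -EPROBE_DEFER;
	/* ...actual registration would happen here... */
	return 0;
}
```

A driver that ignores the probe-time helper now gets an error (ideally annotated __must_check in the real API) rather than a working-looking but broken registration.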
Re: [Intel-gfx] [PATCH v5 10/12] drm/i915: Defer probe if gmux is present but its driver isn't
Hi,

On Thu, Feb 18, 2016 at 10:39:05PM +0100, Daniel Vetter wrote:
> On Thu, Feb 18, 2016 at 9:34 PM, Lukas Wunner wrote:
> >
> >> Ok, makes sense. I still think adding the check to the client_register
> >> function would be good, just as a safety measure.
> >
> > Hm, the idea of calling vga_switcheroo_client_probe_defer() twice
> > causes me pain in the stomach. It's surprising for drivers which
> > just don't need it at the moment (amdgpu and snd_hda_intel) and
> > it feels like overengineering and pampering driver developers
> > beyond reasonable measure. Also while the single existing check is
> > cheap, we might later on add checks that take more time and slow
> > things down.
>
> It's motivated by Rusty's API Manifesto:
>
> http://sweng.the-davies.net/Home/rustys-api-design-manifesto

Interesting, thank you.

> With the mandatory check in _register we reach level 5 - it'll blow up
> at runtime when we try to register it.

The manifesto says "5. Do it right or it will always break at runtime".

However even if we add a WARN_ON(vga_switcheroo_client_probe_defer(pdev)) to register_client(), it will not *always* spew a stacktrace but only on the machines this concerns (MacBook Pros). Since failure to defer breaks GPU switching, level 5 is already reached. Chances are this won't go unnoticed by the user.

> For more context: We have tons of fun with EPROBE_DEFER handling
> between i915 and snd-hda

I don't understand, there is currently not a single occurrence of EPROBE_DEFER in i915, apart from the one I added.

In sound/ there are 88 occurrences of EPROBE_DEFER in soc/, plus 1 in ppc/ and that's it. So not a single one in pci/hda/ where hda_intel.c resides.

Is the fun with EPROBE_DEFER handling caused by the lack thereof?

Best regards,

Lukas
Re: [Intel-gfx] [PATCH v5 10/12] drm/i915: Defer probe if gmux is present but its driver isn't
On Thu, Feb 18, 2016 at 9:34 PM, Lukas Wunner wrote:
>
>> Ok, makes sense. I still think adding the check to the client_register
>> function would be good, just as a safety measure.
>
> Hm, the idea of calling vga_switcheroo_client_probe_defer() twice
> causes me pain in the stomach. It's surprising for drivers which
> just don't need it at the moment (amdgpu and snd_hda_intel) and
> it feels like overengineering and pampering driver developers
> beyond reasonable measure. Also while the single existing check is
> cheap, we might later on add checks that take more time and slow
> things down.

It's motivated by Rusty's API Manifesto:

http://sweng.the-davies.net/Home/rustys-api-design-manifesto

With the mandatory check in _register we reach level 5 - it'll blow up at runtime when we try to register it. Without that the failure is completely silent, and you need to read the right mailing list thread (level 1), but at least the kerneldocs lift it up to level 3.

For more context: We have tons of fun with EPROBE_DEFER handling between i915 and snd-hda, and I'm looking into all possible means to make any api/subsystem using deferred probing as robust as possible by default. One of the ideas is to inject deferred probe failures at runtime, but that's kinda hard to do in a generic way. At least making it as close to impossible to abuse as feasible is the next best option.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
Re: [Intel-gfx] [PATCH v2 1/2] drm/i915: Add i915_gem_object_vmap to map GEM object to virtual space
On 02/18/2016 01:05 PM, Chris Wilson wrote:

On Thu, Feb 18, 2016 at 10:31:37AM -0800, yu@intel.com wrote:
> From: Alex Dai
>
> There are several places inside driver where a GEM object is mapped to
> kernel virtual space. The mapping is either done for the whole object
> or certain page range of it.
>
> This patch introduces a function i915_gem_object_vmap to do such job.
>
> v2: Use obj->pages->nents for iteration within i915_gem_object_vmap;
> break when it finishes all desired pages. The caller need to pass
> in actual page number. (Tvrtko Ursulin)

Who owns the pages? vmap doesn't increase the page refcount nor mapcount, so it is the caller's responsibility to keep the pages alive for the duration of the vmapping. I suggested i915_gem_object_pin_vmap/unpin_vmap for that reason and that also provides the foundation for undoing one of the more substantial performance regressions from vmap_batch().

OK, found it at 050/190 of your patch series. That is a huge list of patches. :-)

The code I put here does not change (at least tries to keep) the current code logic or driver behavior. I am not opposed to using i915_gem_object_pin_vmap/unpin_vmap at all. I will now just keep eyes on that patch.

Alex
Re: [Intel-gfx] [RFC i-g-t] tests/drv_hangman: test for acthd increasing through invalid VM space
On Thu, Feb 18, 2016 at 05:34:50PM +, daniele.ceraolospu...@intel.com wrote:
> +static void ppgtt_walking(void)
> +{
> +	memset(&execbuf, 0, sizeof(execbuf));
> +	execbuf.buffers_ptr = (uintptr_t)&gem_exec;
> +	execbuf.buffer_count = 1;
> +	execbuf.batch_len = 8;
> +
> +	gem_execbuf(fd, &execbuf);
> +
> +	while (gem_bo_busy(fd, handle) && timeout > 0) {
> +		igt_debug("decreasing timeout to %u\n", --timeout);
> +		sleep(1);
> +	}

See gem_wait()
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
Re: [Intel-gfx] [PATCH v2 1/2] drm/i915: Add i915_gem_object_vmap to map GEM object to virtual space
On Thu, Feb 18, 2016 at 10:31:37AM -0800, yu@intel.com wrote:
> From: Alex Dai
>
> There are several places inside driver where a GEM object is mapped to
> kernel virtual space. The mapping is either done for the whole object
> or certain page range of it.
>
> This patch introduces a function i915_gem_object_vmap to do such job.
>
> v2: Use obj->pages->nents for iteration within i915_gem_object_vmap;
> break when it finishes all desired pages. The caller need to pass
> in actual page number. (Tvrtko Ursulin)

Who owns the pages? vmap doesn't increase the page refcount nor mapcount, so it is the caller's responsibility to keep the pages alive for the duration of the vmapping. I suggested i915_gem_object_pin_vmap/unpin_vmap for that reason and that also provides the foundation for undoing one of the more substantial performance regressions from vmap_batch().
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
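Chris's pin_vmap/unpin_vmap suggestion is essentially a refcounted mapping cache. A minimal userspace sketch of that idea follows, with malloc/free standing in for vmap()/vunmap() and all names (struct fake_obj, obj_pin_vmap, obj_unpin_vmap) purely illustrative — the real kernel variant would also have to pin the backing pages, which is exactly the ownership question raised above.

```c
#include <stdlib.h>
#include <stddef.h>

/* One cached mapping per object, torn down only when the last user
 * unpins.  While vmap_pin_count > 0 the "pages" (here a heap buffer)
 * are guaranteed to stay alive for everyone holding the vaddr. */
struct fake_obj {
	int vmap_pin_count;	/* users currently holding the mapping */
	void *vaddr;		/* cached mapping, NULL when unmapped */
	size_t size;
};

static void *obj_pin_vmap(struct fake_obj *obj)
{
	if (obj->vmap_pin_count++ == 0)
		obj->vaddr = malloc(obj->size);	/* stands in for vmap() */
	return obj->vaddr;	/* repeat pins reuse the cached mapping */
}

static void obj_unpin_vmap(struct fake_obj *obj)
{
	if (--obj->vmap_pin_count == 0) {
		free(obj->vaddr);		/* stands in for vunmap() */
		obj->vaddr = NULL;
	}
}
```

This also shows why the scheme undoes the vmap_batch() regression mentioned above: repeat users get the cached mapping back instead of building a fresh one each time.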
Re: [Intel-gfx] [PATCH v4 7/8] drm/i915: Do not compute watermarks on a noop.
On Wed, 2016-02-10 at 13:49 +0100, Maarten Lankhorst wrote:
> No atomic state should be included after all validation when nothing
> has changed. During modeset all active planes will be added to the
> state, while disabled planes won't change their state.

As someone who is also not super familiar with the new watermarks code, I really really wish I had a more detailed commit message to allow me to understand your train of thought. I'll ask some questions below to validate my understanding.

> Signed-off-by: Maarten Lankhorst
> Cc: Matt Roper
> ---
>  drivers/gpu/drm/i915/intel_display.c | 3 +-
>  drivers/gpu/drm/i915/intel_drv.h | 13
>  drivers/gpu/drm/i915/intel_pm.c | 61 +---
>  3 files changed, 51 insertions(+), 26 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 00cb261c6787..6bb1f5dbc7a0 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -11910,7 +11910,8 @@ static int intel_crtc_atomic_check(struct drm_crtc *crtc,
>  	}
>
>  	ret = 0;
> -	if (dev_priv->display.compute_pipe_wm) {
> +	if (dev_priv->display.compute_pipe_wm &&
> +	    (mode_changed || pipe_config->update_pipe || crtc_state->planes_changed)) {
>  		ret = dev_priv->display.compute_pipe_wm(intel_crtc, state);
>  		if (ret)
>  			return ret;

Can't this chunk be on its own separate commit? I'm not sure why the rest of the commit is related to this change. It seems the rest of the commit is aimed at reducing the number of planes we have to lock, not about not computing WMs if nothing in the pipe has changed.
> diff --git a/drivers/gpu/drm/i915/intel_drv.h > b/drivers/gpu/drm/i915/intel_drv.h > index 8effb9ece21e..144597ac74e3 100644 > --- a/drivers/gpu/drm/i915/intel_drv.h > +++ b/drivers/gpu/drm/i915/intel_drv.h > @@ -1583,6 +1583,19 @@ intel_atomic_get_crtc_state(struct > drm_atomic_state *state, > > return to_intel_crtc_state(crtc_state); > } > + > +static inline struct intel_plane_state * > +intel_atomic_get_existing_plane_state(struct drm_atomic_state > *state, > + struct intel_plane *plane) > +{ > + struct drm_plane_state *plane_state; > + > + plane_state = drm_atomic_get_existing_plane_state(state, > >base); > + > + return to_intel_plane_state(plane_state); > +} > + > + Two newlines above. It seems you'll be able to simplify a lot of stuff with this new function. I'm looking forward to review that patch :) > int intel_atomic_setup_scalers(struct drm_device *dev, > struct intel_crtc *intel_crtc, > struct intel_crtc_state *crtc_state); > diff --git a/drivers/gpu/drm/i915/intel_pm.c > b/drivers/gpu/drm/i915/intel_pm.c > index 379eabe093cb..8fb8c6891ae6 100644 > --- a/drivers/gpu/drm/i915/intel_pm.c > +++ b/drivers/gpu/drm/i915/intel_pm.c > @@ -2010,11 +2010,18 @@ static void ilk_compute_wm_level(const struct > drm_i915_private *dev_priv, > cur_latency *= 5; > } > > - result->pri_val = ilk_compute_pri_wm(cstate, pristate, > - pri_latency, level); > - result->spr_val = ilk_compute_spr_wm(cstate, sprstate, > spr_latency); > - result->cur_val = ilk_compute_cur_wm(cstate, curstate, > cur_latency); > - result->fbc_val = ilk_compute_fbc_wm(cstate, pristate, > result->pri_val); > + if (pristate) { > + result->pri_val = ilk_compute_pri_wm(cstate, > pristate, > + pri_latency, > level); > + result->fbc_val = ilk_compute_fbc_wm(cstate, > pristate, result->pri_val); > + } > + > + if (sprstate) > + result->spr_val = ilk_compute_spr_wm(cstate, > sprstate, spr_latency); > + > + if (curstate) > + result->cur_val = ilk_compute_cur_wm(cstate, > curstate, cur_latency); > + > 
result->enable = true; > } > > @@ -2287,7 +2294,6 @@ static int ilk_compute_pipe_wm(struct > intel_crtc *intel_crtc, > const struct drm_i915_private *dev_priv = dev->dev_private; > struct intel_crtc_state *cstate = NULL; > struct intel_plane *intel_plane; > - struct drm_plane_state *ps; > struct intel_plane_state *pristate = NULL; > struct intel_plane_state *sprstate = NULL; > struct intel_plane_state *curstate = NULL; > @@ -2306,30 +2312,37 @@ static int ilk_compute_pipe_wm(struct > intel_crtc *intel_crtc, > memset(pipe_wm, 0, sizeof(*pipe_wm)); > > for_each_intel_plane_on_crtc(dev, intel_crtc, intel_plane) { > - ps = drm_atomic_get_plane_state(state, > - _plane->base); > - if (IS_ERR(ps)) > - return PTR_ERR(ps); > + struct intel_plane_state *ps; > + > +
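The shape of the guarded computation in the ilk_compute_wm_level() hunk above can be modelled in a few lines. The watermark formula below is a made-up stand-in; only the NULL-guarding pattern matches the patch: each plane state pointer may now legitimately be NULL (the plane was not part of the atomic state), and its contribution is simply left at zero instead of the pointer being dereferenced unconditionally.

```c
#include <stddef.h>

struct plane_state { int active; int bytes_per_pixel; };
struct wm_level { int pri_val, spr_val, cur_val; };

/* toy formula, not the real ILK watermark math */
static int toy_compute_wm(const struct plane_state *ps)
{
	return ps->active ? ps->bytes_per_pixel * 4 : 0;
}

static void compute_wm_level(const struct plane_state *pristate,
			     const struct plane_state *sprstate,
			     const struct plane_state *curstate,
			     struct wm_level *result)
{
	*result = (struct wm_level){ 0, 0, 0 };
	if (pristate)
		result->pri_val = toy_compute_wm(pristate);
	if (sprstate)
		result->spr_val = toy_compute_wm(sprstate);
	if (curstate)
		result->cur_val = toy_compute_wm(curstate);
}
```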
Re: [Intel-gfx] [PATCH v5 10/12] drm/i915: Defer probe if gmux is present but its driver isn't
Hi,

On Tue, Feb 16, 2016 at 05:08:40PM +0100, Daniel Vetter wrote:
> On Tue, Feb 16, 2016 at 04:58:20PM +0100, Lukas Wunner wrote:
> > On Sun, Feb 14, 2016 at 01:46:28PM +0100, Daniel Vetter wrote:
> > > On Sun, Feb 14, 2016 at 1:10 PM, Lukas Wunner wrote:
> > > > + * DRM drivers should invoke this early on in their ->probe callback and return
> > > > + * %-EPROBE_DEFER if it evaluates to %true. The GPU need not be registered with
> > > > + * vga_switcheroo_register_client() beforehand.
> > >
> > > s/need not/must not/ ... is your native language German by any chance?
> >
> > In principle there's no harm in registering the client first,
> > then checking if probing should be deferred, as long as the
> > client is unregistered before deferring. Thus the language
> > above is intentionally "need not" (muss nicht) rather than
> > "must not" (darf nicht). I didn't want to mandate something
> > that isn't actually required. The above sentence is merely
> > an aid for driver developers who might be confused in which
> > order to call what.
>
> I'd reject any driver that does this, registering, then checking, then
> unregistering seems extremely unsafe. I'd really stick with mandatory
> language here to make this clear.

Ok, I've made it mandatory in the kerneldoc, updated patch follows below.

> Ok, makes sense. I still think adding the check to the client_register
> function would be good, just as a safety measure.

Hm, the idea of calling vga_switcheroo_client_probe_defer() twice causes me pain in the stomach. It's surprising for drivers which just don't need it at the moment (amdgpu and snd_hda_intel) and it feels like overengineering and pampering driver developers beyond reasonable measure. Also while the single existing check is cheap, we might later on add checks that take more time and slow things down.
Best regards, Lukas -- >8 -- Subject: [PATCH] vga_switcheroo: Add helper for deferred probing So far we've got one condition when DRM drivers need to defer probing on a dual GPU system and it's coded separately into each of the relevant drivers. As suggested by Daniel Vetter, deduplicate that code in the drivers and move it to a new vga_switcheroo helper. This yields better encapsulation of concepts and lets us add further checks in a central place. (The existing check pertains to pre-retina MacBook Pros and an additional check is expected to be needed for retinas.) v2: This helper could eventually be used by audio clients as well, so rephrase kerneldoc to refer to "client" instead of "GPU" and move the single existing check in an if block specific to PCI_CLASS_DISPLAY_VGA devices. Move documentation on that check from kerneldoc to a comment. (Daniel Vetter) v3: Mandate in kerneldoc that registration of client shall only happen after calling this helper. (Daniel Vetter) Cc: Daniel Vetter Cc: Ben Skeggs Cc: Alex Deucher Signed-off-by: Lukas Wunner --- drivers/gpu/drm/i915/i915_drv.c | 10 +- drivers/gpu/drm/nouveau/nouveau_drm.c | 10 +- drivers/gpu/drm/radeon/radeon_drv.c | 10 +- drivers/gpu/vga/vga_switcheroo.c | 34 -- include/linux/vga_switcheroo.h| 2 ++ 5 files changed, 37 insertions(+), 29 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 44912ec..80cfd73 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -35,11 +35,9 @@ #include "i915_trace.h" #include "intel_drv.h" -#include #include #include #include -#include #include #include @@ -972,13 +970,7 @@ static int i915_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent) if (PCI_FUNC(pdev->devfn)) return -ENODEV; - /* -* apple-gmux is needed on dual GPU MacBook Pro -* to probe the panel if we're the inactive GPU. 
-*/ - if (IS_ENABLED(CONFIG_VGA_ARB) && IS_ENABLED(CONFIG_VGA_SWITCHEROO) && - apple_gmux_present() && pdev != vga_default_device() && - !vga_switcheroo_handler_flags()) + if (vga_switcheroo_client_probe_defer(pdev)) return -EPROBE_DEFER; return drm_get_pci_dev(pdev, ent, ); diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c index bb8498c..9141bcd 100644 --- a/drivers/gpu/drm/nouveau/nouveau_drm.c +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c @@ -22,13 +22,11 @@ * Authors: Ben Skeggs */ -#include #include #include #include #include #include -#include #include #include "drmP.h" @@ -314,13 +312,7 @@ static int nouveau_drm_probe(struct pci_dev *pdev, bool boot = false; int ret; - /* -* apple-gmux is needed on dual GPU MacBook Pro -* to probe the panel if we're the inactive GPU. -
[Intel-gfx] [PATCH] drm/i915: Skip PIPESTAT reads from irq handler on VLV/CHV when power well is down
From: Ville Syrjälä

PIPESTAT registers live in the display power well on VLV/CHV, so we shouldn't access them when things are powered down. Let's check whether the display interrupts are on or off before accessing the PIPESTAT registers.

Another option would be to read the PIPESTAT registers only when the IIR register indicates that there's a pending pipe event. But that would mean we might miss even more underrun reports than we do now, because the underrun status bit lives in PIPESTAT but doesn't actually generate an interrupt.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93738
Cc: Chris Wilson
Tested-by: Chris Wilson
Signed-off-by: Ville Syrjälä
---
 drivers/gpu/drm/i915/i915_irq.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 25a89373df63..d56c261ad867 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1651,6 +1651,12 @@ static void valleyview_pipestat_irq_handler(struct drm_device *dev, u32 iir)
 	int pipe;
 
 	spin_lock(&dev_priv->irq_lock);
+
+	if (!dev_priv->display_irqs_enabled) {
+		spin_unlock(&dev_priv->irq_lock);
+		return;
+	}
+
 	for_each_pipe(dev_priv, pipe) {
 		i915_reg_t reg;
 		u32 mask, iir_bit = 0;
-- 
2.4.10
[Intel-gfx] [PULL] topic/drm-misc
Hi Dave, Misc stuff all over: - more mode_fixup removal from Carlos, there's another final pile still left. - final bits of vgaswitcheroo from Lukas for apple gmux, we're still discussing an api cleanup patch to make it a bit more abuse-safe as a follow-up - dp aux interface for userspace for tools from Rafael Antognolli - actual interface parts for dma-buf flushing for userspace mmap - few small bits all over ... plus all the bits from last pull req. Why do I split them up ;-) I'll be on vacation for 1 week now, will send final intel pull for 4.6 and probably more drm-misc when I'm back. Cheers, Daniel The following changes since commit 10c1b6183a163aca59ba92b88f2b4c4cecd20d4c: drm/tegra: drop unused variable. (2016-02-09 11:17:37 +1000) are available in the git repository at: git://anongit.freedesktop.org/drm-intel tags/topic/drm-misc-2016-02-18 for you to fetch changes up to a6ddd2f1b99f1c00b4e00289b13c3e451c7130b0: drm/udl: Use module_usb_driver (2016-02-17 14:19:30 +0100) Amitoj Kaur Chawla (1): drm/udl: Use module_usb_driver Arnd Bergmann (1): drm/msm: remove unused variable Carlos Palminha (22): drm: fixes when i2c encoder slave mode_fixup is null. drm: fixes crct set_mode when encoder mode_fixup is null. drm/i2c/sil164: removed unnecessary code, mode_fixup is now optional. drm/i2c/tda998x: removed unnecessary code, mode_fixup is now optional. drm/bridge: removed dummy mode_fixup function from dw-hdmi. drm/virtio: removed optional dummy encoder mode_fixup function. drm/udl: removed optional dummy encoder mode_fixup function. drm/exynos: removed optional dummy encoder mode_fixup function. drm/amdgpu: removed optional dummy encoder mode_fixup function. drm/ast: removed optional dummy encoder mode_fixup function. drm/bochs: removed optional dummy encoder mode_fixup function. drm/cirrus: removed optional dummy encoder mode_fixup function. drm/radeon: removed optional dummy encoder mode_fixup function. 
drm/gma500: removed optional dummy encoder mode_fixup function. drm/imx: removed optional dummy encoder mode_fixup function. drm/msm/mdp: removed optional dummy encoder mode_fixup function. drm/mgag200: removed optional dummy encoder mode_fixup function. drm/qxl: removed optional dummy encoder mode_fixup function. drm/rockchip: removed optional dummy encoder mode_fixup function. drm/sti: removed optional dummy encoder mode_fixup function. drm/tilcdc: removed optional dummy encoder mode_fixup function. drm: fixes crct set_mode when crtc mode_fixup is null. Daniel Thompson (1): drm: prime: Honour O_RDWR during prime-handle-to-fd Daniel Vetter (2): dma-buf: Add ioctls to allow userspace to flush Merge branch 'topic/mode_fixup-optional' into topic/drm-misc Haixia Shi (1): drm/msm: remove the drm_device_is_unplugged check Insu Yun (1): ch7006: correctly handling failed allocation LABBE Corentin (1): drm: modes: add missing [drm] to message printing Lukas Wunner (13): vga_switcheroo: Add handler flags infrastructure vga_switcheroo: Add support for switching only the DDC apple-gmux: Track switch state apple-gmux: Add switch_ddc support drm/edid: Switch DDC when reading the EDID drm/i915: Switch DDC when reading the EDID drm/nouveau: Switch DDC when reading the EDID drm/radeon: Switch DDC when reading the EDID apple-gmux: Add helper for presence detect drm/i915: Defer probe if gmux is present but its driver isn't drm/nouveau: Defer probe if gmux is present but its driver isn't drm/radeon: Defer probe if gmux is present but its driver isn't apple-gmux: Fix build breakage if !CONFIG_ACPI Maarten Lankhorst (7): drm/core: Add drm_encoder_index. drm/core: Add drm_for_each_encoder_mask, v2. drm/i915: Do not touch best_encoder for load detect. drm/atomic: Do not unset crtc when an encoder is stolen drm/atomic: Add encoder_mask to crtc_state, v3. drm/fb_helper: Use correct allocation count for arrays. drm/fb_helper: Use add_one_connector in add_all_connectors. 
Rafael Antognolli (3): drm/kms_helper: Add a common place to call init and exit functions. drm/dp: Add a drm_aux-dev module for reading/writing dpcd registers. drm/i915: Set aux.dev to the drm_connector device, instead of drm_device. Rasmus Villemoes (1): drm/gma500: fix error path in gma_intel_setup_gmbus() Tiago Vignatti (3): dma-buf: Remove range-based flush drm/i915: Implement end_cpu_access drm/i915: Use CPU mapping for userspace dma-buf mmap() Ville Syrjälä (1): drm: Add drm_format_plane_width() and drm_format_plane_height() Documentation/DocBook/gpu.tmpl | 5 + Documentation/dma-buf-sharing.txt
[Intel-gfx] [PATCH v2 1/2] drm/i915: Add i915_gem_object_vmap to map GEM object to virtual space
From: Alex Dai

There are several places inside driver where a GEM object is mapped to kernel virtual space. The mapping is either done for the whole object or certain page range of it.

This patch introduces a function i915_gem_object_vmap to do such job.

v2: Use obj->pages->nents for iteration within i915_gem_object_vmap; break when it finishes all desired pages. The caller needs to pass in the actual page number. (Tvrtko Ursulin)

Signed-off-by: Alex Dai
Cc: Dave Gordon
Cc: Daniel Vetter
Cc: Tvrtko Ursulin
Cc: Chris Wilson
Signed-off-by: Alex Dai
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 28 +---
 drivers/gpu/drm/i915/i915_drv.h | 3 +++
 drivers/gpu/drm/i915/i915_gem.c | 47 +
 drivers/gpu/drm/i915/i915_gem_dmabuf.c | 16 +++
 drivers/gpu/drm/i915/intel_ringbuffer.c | 24 ++---
 5 files changed, 56 insertions(+), 62 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 814d894..915e8c1 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -863,37 +863,11 @@ find_reg(const struct drm_i915_reg_descriptor *table,
 static u32 *vmap_batch(struct drm_i915_gem_object *obj,
 		       unsigned start, unsigned len)
 {
-	int i;
-	void *addr = NULL;
-	struct sg_page_iter sg_iter;
 	int first_page = start >> PAGE_SHIFT;
 	int last_page = (len + start + 4095) >> PAGE_SHIFT;
 	int npages = last_page - first_page;
-	struct page **pages;
-
-	pages = drm_malloc_ab(npages, sizeof(*pages));
-	if (pages == NULL) {
-		DRM_DEBUG_DRIVER("Failed to get space for pages\n");
-		goto finish;
-	}
-
-	i = 0;
-	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, first_page) {
-		pages[i++] = sg_page_iter_page(&sg_iter);
-		if (i == npages)
-			break;
-	}
-
-	addr = vmap(pages, i, 0, PAGE_KERNEL);
-	if (addr == NULL) {
-		DRM_DEBUG_DRIVER("Failed to vmap pages\n");
-		goto finish;
-	}
-finish:
-	if (pages)
-		drm_free_large(pages);
-	return (u32*)addr;
+	return (u32*)i915_gem_object_vmap(obj, first_page, npages);
 }
 
 /* Returns a
vmap'd pointer to dest_obj, which the caller must unmap */ diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 6644c2e..5b00a6a 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2899,6 +2899,9 @@ struct drm_i915_gem_object *i915_gem_object_create_from_data( struct drm_device *dev, const void *data, size_t size); void i915_gem_free_object(struct drm_gem_object *obj); void i915_gem_vma_destroy(struct i915_vma *vma); +void *i915_gem_object_vmap(struct drm_i915_gem_object *obj, + unsigned int first, + unsigned int npages); /* Flags used by pin/bind */ #define PIN_MAPPABLE (1<<0) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index f68f346..4bc0ce7 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -5356,3 +5356,50 @@ fail: drm_gem_object_unreference(>base); return ERR_PTR(ret); } + +/** + * i915_gem_object_vmap - map a GEM obj into kernel virtual space + * @obj: the GEM obj to be mapped + * @first: index of the first page where mapping starts + * @npages: how many pages to be mapped, starting from first page + * + * Map a given page range of GEM obj into kernel virtual space. The caller must + * make sure the associated pages are gathered and pinned before calling this + * function. vunmap should be called after use. + * + * NULL will be returned if fails. 
+ */
+void *i915_gem_object_vmap(struct drm_i915_gem_object *obj,
+			   unsigned int first,
+			   unsigned int npages)
+{
+	struct sg_page_iter sg_iter;
+	struct page **pages;
+	void *addr;
+	int i;
+
+	if (first + npages > obj->pages->nents) {
+		DRM_DEBUG_DRIVER("Invalid page count\n");
+		return NULL;
+	}
+
+	pages = drm_malloc_ab(npages, sizeof(*pages));
+	if (pages == NULL) {
+		DRM_DEBUG_DRIVER("Failed to get space for pages\n");
+		return NULL;
+	}
+
+	i = 0;
+	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, first) {
+		pages[i++] = sg_page_iter_page(&sg_iter);
+		if (i == npages)
+			break;
+	}
+
+	addr = vmap(pages, npages, 0, PAGE_KERNEL);
+	if (addr == NULL)
+		DRM_DEBUG_DRIVER("Failed
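The page-range arithmetic shared by vmap_batch() and the new i915_gem_object_vmap() caller is worth spelling out: the start offset is rounded down to a page, start + len is rounded up (that is what the "+ 4095" does), and the difference is the page count. A standalone sketch, assuming the usual 4 KiB pages:

```c
#define PAGE_SHIFT 12
#define PAGE_SIZE (1u << PAGE_SHIFT)

/* A byte range [start, start + len) is widened to whole pages. */
static unsigned int range_first_page(unsigned int start)
{
	return start >> PAGE_SHIFT;	/* round start down */
}

static unsigned int range_last_page(unsigned int start, unsigned int len)
{
	/* "+ PAGE_SIZE - 1" (the 4095 in the patch) rounds the end up */
	return (start + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
}

static unsigned int range_npages(unsigned int start, unsigned int len)
{
	return range_last_page(start, len) - range_first_page(start);
}
```

Note a range of len bytes needs two pages as soon as it straddles a page boundary, even when len is tiny — which is why the round-up matters.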
[Intel-gfx] [PATCH v2 0/2] Add i915_gem_object_vmap
From: Alex Dai

There are several places in driver that a GEM object is mapped to kernel virtual space. Now add a common function i915_gem_object_vmap to do the vmap work for such use case.

Alex Dai (2):
  drm/i915: Add i915_gem_object_vmap to map GEM object to virtual space
  drm/i915/guc: Simplify code by keeping vmap of guc_client object

 drivers/gpu/drm/i915/i915_cmd_parser.c | 28 +--
 drivers/gpu/drm/i915/i915_drv.h | 3 ++
 drivers/gpu/drm/i915/i915_gem.c | 47 +
 drivers/gpu/drm/i915/i915_gem_dmabuf.c | 16 ++---
 drivers/gpu/drm/i915/i915_guc_submission.c | 56 ++
 drivers/gpu/drm/i915/intel_guc.h | 3 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c | 24 ++---
 7 files changed, 77 insertions(+), 100 deletions(-)

-- 
2.5.0
[Intel-gfx] [PATCH v2 2/2] drm/i915/guc: Simplify code by keeping vmap of guc_client object
From: Alex Dai

GuC client object is always pinned during its life cycle. We cache the vmap of the client object, which includes guc_process_desc, doorbell and work queue. By doing so, we can simplify the code where the driver communicates with GuC.

As a result, this patch removes the kmap_atomic in wq_check_space, where usleep_range could be called while kmap_atomic is held. This fixes the issue below.

v2: Pass actual page numbers to i915_gem_object_vmap(). Also, check return value for error handling. (Tvrtko Ursulin)
v1: vmap is done by i915_gem_object_vmap().

[ 34.098798] BUG: scheduling while atomic: gem_close_race/1941/0x0002
[ 34.098822] Modules linked in: hid_generic usbhid i915 asix usbnet libphy mii i2c_algo_bit drm_kms_helper cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt fb_sys_fops cfbcopyarea drm coretemp i2c_hid hid video pinctrl_sunrisepoint pinctrl_intel acpi_pad nls_iso8859_1 e1000e ptp psmouse pps_core ahci libahci
[ 34.098824] CPU: 0 PID: 1941 Comm: gem_close_race Tainted: G U 4.4.0-160121+ #123
[ 34.098824] Hardware name: Intel Corporation Skylake Client platform/Skylake AIO DDR3L RVP10, BIOS SKLSE2R1.R00.X100.B01.1509220551 09/22/2015
[ 34.098825] 00013e40 880166c27a78 81280d02 880172c13e40
[ 34.098826] 880166c27a88 810c203a 880166c27ac8 814ec808
[ 34.098827] 88016b7c6000 880166c28000 000f4240 0001
[ 34.098827] Call Trace:
[ 34.098831] [] dump_stack+0x4b/0x79
[ 34.098833] [] __schedule_bug+0x41/0x4f
[ 34.098834] [] __schedule+0x5a8/0x690
[ 34.098835] [] schedule+0x37/0x80
[ 34.098836] [] schedule_hrtimeout_range_clock+0xad/0x130
[ 34.098837] [] ? hrtimer_init+0x10/0x10
[ 34.098838] [] ?
schedule_hrtimeout_range_clock+0xa1/0x130 [ 34.098839] [] schedule_hrtimeout_range+0xe/0x10 [ 34.098840] [] usleep_range+0x3b/0x40 [ 34.098853] [] i915_guc_wq_check_space+0x119/0x210 [i915] [ 34.098861] [] intel_logical_ring_alloc_request_extras+0x5c/0x70 [i915] [ 34.098869] [] i915_gem_request_alloc+0x91/0x170 [i915] [ 34.098875] [] i915_gem_do_execbuffer.isra.25+0xbc7/0x12a0 [i915] [ 34.098882] [] ? i915_gem_object_get_pages_gtt+0x225/0x3c0 [i915] [ 34.098889] [] ? i915_gem_pwrite_ioctl+0xd6/0x9f0 [i915] [ 34.098895] [] i915_gem_execbuffer2+0xa8/0x250 [i915] [ 34.098900] [] drm_ioctl+0x258/0x4f0 [drm] [ 34.098906] [] ? i915_gem_execbuffer+0x340/0x340 [i915] [ 34.098908] [] do_vfs_ioctl+0x2cd/0x4a0 [ 34.098909] [] ? __fget+0x72/0xb0 [ 34.098910] [] SyS_ioctl+0x3c/0x70 [ 34.098911] [] entry_SYSCALL_64_fastpath+0x12/0x6a [ 34.100208] [ cut here ] Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93847 Cc: Dave Gordon Cc: Daniel Vetter Cc: Tvrtko Ursulin Signed-off-by: Alex Dai --- drivers/gpu/drm/i915/i915_guc_submission.c | 56 ++ drivers/gpu/drm/i915/intel_guc.h | 3 +- 2 files changed, 21 insertions(+), 38 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c index d7543ef..3e2ea42 100644 --- a/drivers/gpu/drm/i915/i915_guc_submission.c +++ b/drivers/gpu/drm/i915/i915_guc_submission.c @@ -195,11 +195,9 @@ static int guc_ring_doorbell(struct i915_guc_client *gc) struct guc_process_desc *desc; union guc_doorbell_qw db_cmp, db_exc, db_ret; union guc_doorbell_qw *db; - void *base; int attempt = 2, ret = -EAGAIN; - base = kmap_atomic(i915_gem_object_get_page(gc->client_obj, 0)); - desc = base + gc->proc_desc_offset; + desc = gc->client_base + gc->proc_desc_offset; /* Update the tail so it is visible to GuC */ desc->tail = gc->wq_tail; @@ -215,7 +213,7 @@ static int guc_ring_doorbell(struct i915_guc_client *gc) db_exc.cookie = 1; /* pointer of current doorbell cacheline */ - db = base + gc->doorbell_offset; 
+ db = gc->client_base + gc->doorbell_offset; while (attempt--) { /* lets ring the doorbell */ @@ -244,10 +242,6 @@ static int guc_ring_doorbell(struct i915_guc_client *gc) db_exc.cookie = 1; } - /* Finally, update the cached copy of the GuC's WQ head */ - gc->wq_head = desc->head; - - kunmap_atomic(base); return ret; } @@ -341,10 +335,8 @@ static void guc_init_proc_desc(struct intel_guc *guc, struct i915_guc_client *client) { struct guc_process_desc *desc; - void *base; - base = kmap_atomic(i915_gem_object_get_page(client->client_obj, 0)); - desc = base + client->proc_desc_offset; + desc = client->client_base + client->proc_desc_offset; memset(desc, 0, sizeof(*desc)); @@ -361,8 +353,6 @@
Re: [Intel-gfx] [PATCH 3/6] drm/i915: Remove the SPLL==270Mhz assumption from intel_fdi_link_freq()
On ke, 2016-02-17 at 21:41 +0200, ville.syrj...@linux.intel.com wrote: > From: Ville Syrjälä> > Instead of assuming we've correctly set up SPLL to run at 270Mhz for > FDI, let's use the port_clock from pipe_config which should be what > we want. This would catch problems if someone misconfigures SPLL for > whatever reason. > > Signed-off-by: Ville Syrjälä Reviewed-by: Imre Deak > --- > drivers/gpu/drm/i915/intel_display.c | 17 ++--- > 1 file changed, 10 insertions(+), 7 deletions(-) > > diff --git a/drivers/gpu/drm/i915/intel_display.c > b/drivers/gpu/drm/i915/intel_display.c > index 99001e117517..a3c959cd8b3b 100644 > --- a/drivers/gpu/drm/i915/intel_display.c > +++ b/drivers/gpu/drm/i915/intel_display.c > @@ -224,12 +224,15 @@ static void intel_update_czclk(struct > drm_i915_private *dev_priv) > } > > static inline u32 /* units of 100MHz */ > -intel_fdi_link_freq(struct drm_i915_private *dev_priv) > +intel_fdi_link_freq(struct drm_i915_private *dev_priv, > + const struct intel_crtc_state *pipe_config) > { > - if (IS_GEN5(dev_priv)) > - return (I915_READ(FDI_PLL_BIOS_0) & > FDI_PLL_FB_CLOCK_MASK) + 2; > + if (HAS_DDI(dev_priv)) > + return pipe_config->port_clock; /* SPLL */ > + else if (IS_GEN5(dev_priv)) > + return ((I915_READ(FDI_PLL_BIOS_0) & > FDI_PLL_FB_CLOCK_MASK) + 2) * 1; > else > - return 27; > + return 27; > } > > static const intel_limit_t intel_limits_i8xx_dac = { > @@ -6588,7 +6591,7 @@ retry: > * Hence the bw of each lane in terms of the mode signal > * is: > */ > - link_bw = intel_fdi_link_freq(to_i915(dev)) * > MHz(100)/KHz(1)/10; > + link_bw = intel_fdi_link_freq(to_i915(dev), pipe_config); > > fdi_dotclock = adjusted_mode->crtc_clock; > > @@ -10774,7 +10777,7 @@ static void ironlake_pch_clock_get(struct > intel_crtc *crtc, > * Calculate one based on the FDI configuration. 
> */ > pipe_config->base.adjusted_mode.crtc_clock = > - intel_dotclock_calculate(intel_fdi_link_freq(dev_pri > v) * 1, > + intel_dotclock_calculate(intel_fdi_link_freq(dev_pri > v, pipe_config), > _config->fdi_m_n); > } > > @@ -12789,7 +12792,7 @@ static void > intel_pipe_config_sanity_check(struct drm_i915_private *dev_priv, > const struct > intel_crtc_state *pipe_config) > { > if (pipe_config->has_pch_encoder) { > - int fdi_dotclock = > intel_dotclock_calculate(intel_fdi_link_freq(dev_priv) * 1, > + int fdi_dotclock = > intel_dotclock_calculate(intel_fdi_link_freq(dev_priv, pipe_config), > _co > nfig->fdi_m_n); > int dotclock = pipe_config- > >base.adjusted_mode.crtc_clock; > ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 2/6] drm/i915: Move the encoder vs. FDI dotclock check out from encoder .get_config()
On ke, 2016-02-17 at 21:41 +0200, ville.syrj...@linux.intel.com wrote: > From: Ville Syrjälä> > Currently we check if the encoder's idea of dotclock agrees with what > we calculated based on the FDI parameters. We do this in the encoder > .get_config() hooks, which isn't so nice in case the BIOS (or some > other > outside party) made a mess of the state and we're just trying to take > over. > > So as a prep step to being able sanitize such a bogus state, move the > the sanity check to just after we've read out the entire state. If > we then need to sanitize a bad state, it should be easier to move the > sanity check to occur after sanitation instead of before it. > > Signed-off-by: Ville Syrjälä Separating the get-config and check steps makes things more logical in any case. Looks ok to me: Reviewed-by: Imre Deak > --- > drivers/gpu/drm/i915/intel_crt.c | 10 +-- > drivers/gpu/drm/i915/intel_display.c | 57 > > drivers/gpu/drm/i915/intel_dp.c | 11 ++- > drivers/gpu/drm/i915/intel_drv.h | 3 -- > drivers/gpu/drm/i915/intel_hdmi.c| 3 -- > drivers/gpu/drm/i915/intel_lvds.c| 8 + > drivers/gpu/drm/i915/intel_sdvo.c| 4 +-- > 7 files changed, 38 insertions(+), 58 deletions(-) > > diff --git a/drivers/gpu/drm/i915/intel_crt.c > b/drivers/gpu/drm/i915/intel_crt.c > index e686a91a416e..f4c88d93a164 100644 > --- a/drivers/gpu/drm/i915/intel_crt.c > +++ b/drivers/gpu/drm/i915/intel_crt.c > @@ -120,17 +120,9 @@ static unsigned int intel_crt_get_flags(struct > intel_encoder *encoder) > static void intel_crt_get_config(struct intel_encoder *encoder, > struct intel_crtc_state > *pipe_config) > { > - struct drm_device *dev = encoder->base.dev; > - int dotclock; > - > pipe_config->base.adjusted_mode.flags |= > intel_crt_get_flags(encoder); > > - dotclock = pipe_config->port_clock; > - > - if (HAS_PCH_SPLIT(dev)) > - ironlake_check_encoder_dotclock(pipe_config, > dotclock); > - > - pipe_config->base.adjusted_mode.crtc_clock = dotclock; > + pipe_config->base.adjusted_mode.crtc_clock = 
pipe_config- > >port_clock; > } > > static void hsw_crt_get_config(struct intel_encoder *encoder, > diff --git a/drivers/gpu/drm/i915/intel_display.c > b/drivers/gpu/drm/i915/intel_display.c > index f0f88061a9e5..99001e117517 100644 > --- a/drivers/gpu/drm/i915/intel_display.c > +++ b/drivers/gpu/drm/i915/intel_display.c > @@ -224,12 +224,11 @@ static void intel_update_czclk(struct > drm_i915_private *dev_priv) > } > > static inline u32 /* units of 100MHz */ > -intel_fdi_link_freq(struct drm_device *dev) > +intel_fdi_link_freq(struct drm_i915_private *dev_priv) > { > - if (IS_GEN5(dev)) { > - struct drm_i915_private *dev_priv = dev- > >dev_private; > + if (IS_GEN5(dev_priv)) > return (I915_READ(FDI_PLL_BIOS_0) & > FDI_PLL_FB_CLOCK_MASK) + 2; > - } else > + else > return 27; > } > > @@ -6589,7 +6588,7 @@ retry: > * Hence the bw of each lane in terms of the mode signal > * is: > */ > - link_bw = intel_fdi_link_freq(dev) * MHz(100)/KHz(1)/10; > + link_bw = intel_fdi_link_freq(to_i915(dev)) * > MHz(100)/KHz(1)/10; > > fdi_dotclock = adjusted_mode->crtc_clock; > > @@ -6601,8 +6600,7 @@ retry: > intel_link_compute_m_n(pipe_config->pipe_bpp, lane, > fdi_dotclock, > link_bw, _config->fdi_m_n); > > - ret = ironlake_check_fdi_lanes(intel_crtc->base.dev, > - intel_crtc->pipe, > pipe_config); > + ret = ironlake_check_fdi_lanes(dev, intel_crtc->pipe, > pipe_config); > if (ret == -EINVAL && pipe_config->pipe_bpp > 6*3) { > pipe_config->pipe_bpp -= 2*3; > DRM_DEBUG_KMS("fdi link bw constraint, reducing pipe > bpp to %i\n", > @@ -10765,19 +10763,18 @@ int intel_dotclock_calculate(int link_freq, > static void ironlake_pch_clock_get(struct intel_crtc *crtc, > struct intel_crtc_state > *pipe_config) > { > - struct drm_device *dev = crtc->base.dev; > + struct drm_i915_private *dev_priv = to_i915(crtc->base.dev); > > /* read out port_clock from the DPLL */ > i9xx_crtc_clock_get(crtc, pipe_config); > > /* > - * This value does not include pixel_multiplier. 
> - * We will check that port_clock and > adjusted_mode.crtc_clock > - * agree once we know their relationship in the encoder's > - * get_config() function. > + * In case there is an active pipe without active ports, > + * we may need some idea for the dotclock anyway. > + * Calculate one based on the FDI configuration. > */ > pipe_config->base.adjusted_mode.crtc_clock = > - intel_dotclock_calculate(intel_fdi_link_freq(dev) *
Re: [Intel-gfx] Fwd: [PATCH] drm/i915: Avoid vblank counter for gen9+
On to, 2016-02-18 at 08:56 -0800, Rodrigo Vivi wrote: > Imre, Patrik, do you know if I'm missing something or what I'm doing > wrong with this power domain handler for vblanks to avoid DC states > when we need a reliable frame counter in place. The WARN is due to the spin_lock() in drm_vblank_enable(), you can't call power domain functions in atomic context, due to the mutex the power domain and runtime PM fw uses. --Imre > > Do you have better ideas? > > Thanks, > Rodrigo. > > -- Forwarded message -- > From: Rodrigo Vivi> Date: Wed, Feb 17, 2016 at 3:14 PM > Subject: Re: [Intel-gfx] [PATCH] drm/i915: Avoid vblank counter for > gen9+ > To: Daniel Vetter , Patrik Jakobsson > > Cc: Rodrigo Vivi , intel-gfx > > > > On Tue, Feb 16, 2016 at 7:50 AM, Daniel Vetter > wrote: > > On Thu, Feb 11, 2016 at 09:00:47AM -0800, Rodrigo Vivi wrote: > > > Framecounter register is read-only so DMC cannot restore it > > > after exiting DC5 and DC6. > > > > > > Easiest way to go is to avoid the counter and use vblank > > > interruptions for this platform and for all the following > > > ones since DMC came to stay. At least while we can't change > > > this register to read-write. > > > > > > Signed-off-by: Rodrigo Vivi > > > > Now my comments also in public: > > - Do we still get reasonable dc5 residency with this - it means > > we'll keep > > vblank irq running forever. > > > > - I'm a bit unclear on what exactly this fixes - have you tested > > that > > long-lasting vblank waits are still accurate? Just want to make > > sure we > > don't just paper over the issue and desktops can still get stuck > > waiting > > for a vblank. > > apparently no... so please just ignore this patch for now... after a > while with that patch I was seeing the issue again... > > > > > Just a bit suprised that the only problem is the framecounter, and > > not > > that vblanks stop happening too. 
> > > > We need to also know these details for the proper fix, which will > > involve > > grabbing power well references (might need a new one for vblank > > interrupts) to make sure. > > Yeap, I liked this idea... so combining a power domain reference with > a vblank count restore once we know the dc off is blocked we could > workaround this case... something like: > > diff --git a/drivers/gpu/drm/i915/i915_irq.c > b/drivers/gpu/drm/i915/i915_irq.c > index 25a8937..2b18778 100644 > --- a/drivers/gpu/drm/i915/i915_irq.c > +++ b/drivers/gpu/drm/i915/i915_irq.c > @@ -2743,7 +2743,10 @@ static int gen8_enable_vblank(struct > drm_device > *dev, unsigned int pipe) > struct drm_i915_private *dev_priv = dev->dev_private; > unsigned long irqflags; > > + intel_display_power_get(dev_priv, POWER_DOMAIN_VBLANK); > + > spin_lock_irqsave(_priv->irq_lock, irqflags); > + dev->vblank[pipe].last = g4x_get_vblank_counter(dev, pipe); > bdw_enable_pipe_irq(dev_priv, pipe, GEN8_PIPE_VBLANK); > spin_unlock_irqrestore(_priv->irq_lock, irqflags); > > @@ -2796,6 +2799,8 @@ static void gen8_disable_vblank(struct > drm_device *dev, unsigned int pipe) > spin_lock_irqsave(_priv->irq_lock, irqflags); > bdw_disable_pipe_irq(dev_priv, pipe, GEN8_PIPE_VBLANK); > spin_unlock_irqrestore(_priv->irq_lock, irqflags); > + > + intel_display_power_put(dev_priv, POWER_DOMAIN_VBLANK); > } > > where POWER_DOMAIN_VBLANK is part of: > #define SKL_DISPLAY_DC_OFF_POWER_DOMAINS ( \ > BIT(POWER_DOMAIN_VBLANK) | \ > > > However I have my dmesg flooded by: > > > [ 69.025562] BUG: sleeping function called from invalid context at > drivers/base/power/runtime.c:955 > [ 69.025576] in_atomic(): 1, irqs_disabled(): 1, pid: 995, name: > Xorg > [ 69.025582] Preemption disabled at:[] > drm_vblank_get+0x4e/0xd0 > > [ 69.025619] CPU: 3 PID: 995 Comm: Xorg Tainted: G U W > 4.5.0-rc4+ #11 > [ 69.025628] Hardware name: Intel Corporation Kabylake Client > platform/Skylake U DDR3L RVP7, BIOS KBLSE2R1.R00.X019.B01.1512230743 > 
12/23/2015 > [ 69.025637] 88003f0bfbb0 8148e983 > > [ 69.025653] 880085b04200 88003f0bfbd0 81133ece > 81d77f23 > [ 69.025667] 03bb 88003f0bfbf8 81133f89 > 88016913a098 > [ 69.025680] Call Trace: > [ 69.025697] [] dump_stack+0x65/0x92 > [ 69.025711] [] ___might_sleep+0x10e/0x180 > [ 69.025722] [] __might_sleep+0x49/0x80 > [ 69.025739] [] __pm_runtime_resume+0x79/0x80 > [ 69.025841] [] intel_runtime_pm_get+0x28/0x90 > [i915] > [ 69.025924] [] > intel_display_power_get+0x19/0x50 [i915] > [ 69.025995] [] gen8_enable_vblank+0x34/0xc0 > [i915] > [ 69.026016] [] drm_vblank_enable+0x76/0xd0 > > > > > Another thing
Re: [Intel-gfx] [PATCH RESEND FOR CI *AGAIN*] drm/i915/bxt: Remove DSP CLK_GATE programming for BXT
On Thu, 18 Feb 2016, Jani Nikula wrote:
> From: Uma Shankar
>
> DSP CLK_GATE registers are specific to BYT and CHT.
> Avoid programming the same for BXT platform.
>
> v2: Rebased on latest drm nightly branch.
>
> v3: Fixed Jani's review comments
>
> Signed-off-by: Uma Shankar
> Signed-off-by: Jani Nikula

I gave up hoping to get CI results for this one, after two attempts. We have no coverage for this function anyway, and I've tested this before to not break BYT. Thus pushed to drm-intel-next-queued.

BR, Jani.

> ---
>  drivers/gpu/drm/i915/intel_dsi.c | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_dsi.c b/drivers/gpu/drm/i915/intel_dsi.c
> index fcd746c55abd..b928c503df24 100644
> --- a/drivers/gpu/drm/i915/intel_dsi.c
> +++ b/drivers/gpu/drm/i915/intel_dsi.c
> @@ -634,7 +634,6 @@ static void intel_dsi_post_disable(struct intel_encoder *encoder)
>  {
>  	struct drm_i915_private *dev_priv = encoder->base.dev->dev_private;
>  	struct intel_dsi *intel_dsi = enc_to_intel_dsi(&encoder->base);
> -	u32 val;
>
>  	DRM_DEBUG_KMS("\n");
>
> @@ -642,9 +641,13 @@ static void intel_dsi_post_disable(struct intel_encoder *encoder)
>
>  	intel_dsi_clear_device_ready(encoder);
>
> -	val = I915_READ(DSPCLK_GATE_D);
> -	val &= ~DPOUNIT_CLOCK_GATE_DISABLE;
> -	I915_WRITE(DSPCLK_GATE_D, val);
> +	if (!IS_BROXTON(dev_priv)) {
> +		u32 val;
> +
> +		val = I915_READ(DSPCLK_GATE_D);
> +		val &= ~DPOUNIT_CLOCK_GATE_DISABLE;
> +		I915_WRITE(DSPCLK_GATE_D, val);
> +	}
>
> 	drm_panel_unprepare(intel_dsi->panel);

-- 
Jani Nikula, Intel Open Source Technology Center
Re: [Intel-gfx] [PATCH v4 3/8] drm/i915: Kill off intel_crtc->atomic.wait_vblank, v4.
Em Qui, 2016-02-18 às 15:46 +0100, Maarten Lankhorst escreveu: > Op 18-02-16 om 15:14 schreef Zanoni, Paulo R: > > Em Qui, 2016-02-18 às 14:22 +0100, Maarten Lankhorst escreveu: > > > Op 17-02-16 om 22:20 schreef Zanoni, Paulo R: > > > > Em Qua, 2016-02-10 às 13:49 +0100, Maarten Lankhorst escreveu: > > > > > Currently we perform our own wait in post_plane_update, > > > > > but the atomic core performs another one in wait_for_vblanks. > > > > > This means that 2 vblanks are done when a fb is changed, > > > > > which is a bit overkill. > > > > > > > > > > Merge them by creating a helper function that takes a crtc > > > > > mask > > > > > for the planes to wait on. > > > > > > > > > > The broadwell vblank workaround may look gone entirely but > > > > > this > > > > > is > > > > > not the case. pipe_config->wm_changed is set to true > > > > > when any plane is turned on, which forces a vblank wait. > > > > > > > > > > Changes since v1: > > > > > - Removing the double vblank wait on broadwell moved to its > > > > > own > > > > > commit. > > > > > Changes since v2: > > > > > - Move out POWER_DOMAIN_MODESET handling to its own commit. > > > > > Changes since v3: > > > > > - Do not wait for vblank on legacy cursor updates. (Ville) > > > > > - Move broadwell vblank workaround comment to > > > > > page_flip_finished. > > > > > (Ville) > > > > > Changes since v4: > > > > > - Compile fix, legacy_cursor_flip -> *_update. 
> > > > > > > > > > Signed-off-by: Maarten Lankhorst> > > > el.c > > > > > om> > > > > > --- > > > > > drivers/gpu/drm/i915/intel_atomic.c | 1 + > > > > > drivers/gpu/drm/i915/intel_display.c | 86 > > > > > +++- > > > > > drivers/gpu/drm/i915/intel_drv.h | 2 +- > > > > > 3 files changed, 67 insertions(+), 22 deletions(-) > > > > > > > > > > diff --git a/drivers/gpu/drm/i915/intel_atomic.c > > > > > b/drivers/gpu/drm/i915/intel_atomic.c > > > > > index 4625f8a9ba12..8e579a8505ac 100644 > > > > > --- a/drivers/gpu/drm/i915/intel_atomic.c > > > > > +++ b/drivers/gpu/drm/i915/intel_atomic.c > > > > > @@ -97,6 +97,7 @@ intel_crtc_duplicate_state(struct drm_crtc > > > > > *crtc) > > > > > crtc_state->disable_lp_wm = false; > > > > > crtc_state->disable_cxsr = false; > > > > > crtc_state->wm_changed = false; > > > > > + crtc_state->fb_changed = false; > > > > > > > > > > return _state->base; > > > > > } > > > > > diff --git a/drivers/gpu/drm/i915/intel_display.c > > > > > b/drivers/gpu/drm/i915/intel_display.c > > > > > index 804f2c6f260d..4d4dddc1f970 100644 > > > > > --- a/drivers/gpu/drm/i915/intel_display.c > > > > > +++ b/drivers/gpu/drm/i915/intel_display.c > > > > > @@ -4785,9 +4785,6 @@ static void > > > > > intel_post_plane_update(struct > > > > > intel_crtc *crtc) > > > > > to_intel_crtc_state(crtc->base.state); > > > > > struct drm_device *dev = crtc->base.dev; > > > > > > > > > > - if (atomic->wait_vblank) > > > > > - intel_wait_for_vblank(dev, crtc->pipe); > > > > > - > > > > > intel_frontbuffer_flip(dev, atomic->fb_bits); > > > > > > > > > > crtc->wm.cxsr_allowed = true; > > > > > @@ -10902,6 +10899,12 @@ static bool > > > > > page_flip_finished(struct > > > > > intel_crtc *crtc) > > > > > return true; > > > > > > > > > > /* > > > > > + * BDW signals flip done immediately if the plane > > > > > + * is disabled, even if the plane enable is already > > > > > + * armed to occur at the next vblank :( > > > > > + */ > > > > Having this comment here is just... 
weird. I think it removes a > > > > lot > > > > of > > > > the context that was present before. > > > > > > > > > + > > > > > + /* > > > > > * A DSPSURFLIVE check isn't enough in case the mmio > > > > > and > > > > > CS > > > > > flips > > > > > * used the same base address. In that case the mmio > > > > > flip > > > > > might > > > > > * have completed, but the CS hasn't even executed > > > > > the > > > > > flip > > > > > yet. > > > > > @@ -11778,6 +11781,9 @@ int > > > > > intel_plane_atomic_calc_changes(struct > > > > > drm_crtc_state *crtc_state, > > > > > if (!was_visible && !visible) > > > > > return 0; > > > > > > > > > > + if (fb != old_plane_state->base.fb) > > > > > + pipe_config->fb_changed = true; > > > > > + > > > > > turn_off = was_visible && (!visible || > > > > > mode_changed); > > > > > turn_on = visible && (!was_visible || mode_changed); > > > > > > > > > > @@ -11793,8 +11799,6 @@ int > > > > > intel_plane_atomic_calc_changes(struct > > > > > drm_crtc_state *crtc_state, > > > > > > > > > > /* must disable cxsr around plane > > > > > enable/disable > > > > > */ > > > > > if (plane->type != DRM_PLANE_TYPE_CURSOR) { > > > > > - if (is_crtc_enabled) > > > > > - intel_crtc- >
[Intel-gfx] Fwd: [PATCH] drm/i915: Avoid vblank counter for gen9+
Imre, Patrik, do you know if I'm missing something or what I'm doing wrong with this power domain handler for vblanks to avoid DC states when we need a reliable frame counter in place. Do you have better ideas? Thanks, Rodrigo. -- Forwarded message -- From: Rodrigo ViviDate: Wed, Feb 17, 2016 at 3:14 PM Subject: Re: [Intel-gfx] [PATCH] drm/i915: Avoid vblank counter for gen9+ To: Daniel Vetter , Patrik Jakobsson Cc: Rodrigo Vivi , intel-gfx On Tue, Feb 16, 2016 at 7:50 AM, Daniel Vetter wrote: > On Thu, Feb 11, 2016 at 09:00:47AM -0800, Rodrigo Vivi wrote: >> Framecounter register is read-only so DMC cannot restore it >> after exiting DC5 and DC6. >> >> Easiest way to go is to avoid the counter and use vblank >> interruptions for this platform and for all the following >> ones since DMC came to stay. At least while we can't change >> this register to read-write. >> >> Signed-off-by: Rodrigo Vivi > > Now my comments also in public: > - Do we still get reasonable dc5 residency with this - it means we'll keep > vblank irq running forever. > > - I'm a bit unclear on what exactly this fixes - have you tested that > long-lasting vblank waits are still accurate? Just want to make sure we > don't just paper over the issue and desktops can still get stuck waiting > for a vblank. apparently no... so please just ignore this patch for now... after a while with that patch I was seeing the issue again... > > Just a bit suprised that the only problem is the framecounter, and not > that vblanks stop happening too. > > We need to also know these details for the proper fix, which will involve > grabbing power well references (might need a new one for vblank > interrupts) to make sure. Yeap, I liked this idea... so combining a power domain reference with a vblank count restore once we know the dc off is blocked we could workaround this case... 
something like: diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index 25a8937..2b18778 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -2743,7 +2743,10 @@ static int gen8_enable_vblank(struct drm_device *dev, unsigned int pipe) struct drm_i915_private *dev_priv = dev->dev_private; unsigned long irqflags; + intel_display_power_get(dev_priv, POWER_DOMAIN_VBLANK); + spin_lock_irqsave(_priv->irq_lock, irqflags); + dev->vblank[pipe].last = g4x_get_vblank_counter(dev, pipe); bdw_enable_pipe_irq(dev_priv, pipe, GEN8_PIPE_VBLANK); spin_unlock_irqrestore(_priv->irq_lock, irqflags); @@ -2796,6 +2799,8 @@ static void gen8_disable_vblank(struct drm_device *dev, unsigned int pipe) spin_lock_irqsave(_priv->irq_lock, irqflags); bdw_disable_pipe_irq(dev_priv, pipe, GEN8_PIPE_VBLANK); spin_unlock_irqrestore(_priv->irq_lock, irqflags); + + intel_display_power_put(dev_priv, POWER_DOMAIN_VBLANK); } where POWER_DOMAIN_VBLANK is part of: #define SKL_DISPLAY_DC_OFF_POWER_DOMAINS ( \ BIT(POWER_DOMAIN_VBLANK) | \ However I have my dmesg flooded by: [ 69.025562] BUG: sleeping function called from invalid context at drivers/base/power/runtime.c:955 [ 69.025576] in_atomic(): 1, irqs_disabled(): 1, pid: 995, name: Xorg [ 69.025582] Preemption disabled at:[] drm_vblank_get+0x4e/0xd0 [ 69.025619] CPU: 3 PID: 995 Comm: Xorg Tainted: G U W 4.5.0-rc4+ #11 [ 69.025628] Hardware name: Intel Corporation Kabylake Client platform/Skylake U DDR3L RVP7, BIOS KBLSE2R1.R00.X019.B01.1512230743 12/23/2015 [ 69.025637] 88003f0bfbb0 8148e983 [ 69.025653] 880085b04200 88003f0bfbd0 81133ece 81d77f23 [ 69.025667] 03bb 88003f0bfbf8 81133f89 88016913a098 [ 69.025680] Call Trace: [ 69.025697] [] dump_stack+0x65/0x92 [ 69.025711] [] ___might_sleep+0x10e/0x180 [ 69.025722] [] __might_sleep+0x49/0x80 [ 69.025739] [] __pm_runtime_resume+0x79/0x80 [ 69.025841] [] intel_runtime_pm_get+0x28/0x90 [i915] [ 69.025924] [] intel_display_power_get+0x19/0x50 
[i915] [ 69.025995] [] gen8_enable_vblank+0x34/0xc0 [i915] [ 69.026016] [] drm_vblank_enable+0x76/0xd0 Another thing that I search in the spec was for an Interrupt to know when we came back from DC5 or DC6 or got power well re-enabled, so we would be able to restore the drm last counter... but I couldn't find any... Any other idea? > > Cheers, Daniel > >> --- >> drivers/gpu/drm/i915/i915_irq.c | 7 +-- >> 1 file changed, 5 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/gpu/drm/i915/i915_irq.c >> b/drivers/gpu/drm/i915/i915_irq.c >> index 25a8937..c294a4b 100644 >> --- a/drivers/gpu/drm/i915/i915_irq.c >> +++
[Intel-gfx] [maintainer-tools PATCH 2/8] dim: add list-branches subcommand to list nightly branches
Helper for bash completion. Where to get the information depends on user's dim configuration.

Signed-off-by: Jani Nikula
---
 dim | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/dim b/dim
index c004bc75ca06..33ef8288a291 100755
--- a/dim
+++ b/dim
@@ -972,6 +972,12 @@ function dim_pull_request_next_fixes
 	dim_pull_request drm-intel-next-fixes $upstream
 }
 
+# Note: used by bash completion
+function dim_list_branches
+{
+	echo $dim_branches | sed 's/ /\n/g'
+}
+
 dim_alias_ub=update-branches
 function dim_update_branches
 {
-- 
2.1.4
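The new subcommand is a one-liner; assuming `$dim_branches` holds a space-separated branch list (the branch names below are made up — real dim reads this from its config), it just reflows the list to one branch per line for the completion code:

```shell
# Stand-alone model of dim's list-branches helper.
dim_branches="drm-intel-next-queued drm-intel-fixes drm-intel-next-fixes"

dim_list_branches () {
	# reflow the space-separated list to one branch per line (GNU sed)
	echo $dim_branches | sed 's/ /\n/g'
}

dim_list_branches
```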
[Intel-gfx] [maintainer-tools PATCH 4/8] completion: use the dim helpers to complete nightly and upstream branches
Use the user's configured directories and remotes via dim.

Signed-off-by: Jani Nikula
---
 bash_completion | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/bash_completion b/bash_completion
index 6a3a88cc80f8..f89764e3947d 100644
--- a/bash_completion
+++ b/bash_completion
@@ -27,13 +27,8 @@ _dim ()
 	# args = number of arguments
 	_count_args
 
-	if [ -f ~/linux/drm-intel-rerere/nightly.conf ] ; then
-		local nightly_branches=`(source ~/linux/drm-intel-rerere/nightly.conf ; echo $nightly_branches) | \
-			xargs -n 1 echo | grep '^origin' | sed -e 's/^origin\///'`
-	else
-		local nightly_branches=""
-	fi
-	local upstream_branches="origin/master airlied/drm-next airlied/drm-fixes"
+	local nightly_branches="$(dim list-branches)"
+	local upstream_branches="$(dim list-upstreams)"
 
 	cmds="setup nightly-forget update-branches"
 	cmds="$cmds rebuild-nightly cat-to-fixup"
-- 
2.1.4
[Intel-gfx] [maintainer-tools PATCH 6/8] dim: rename alias subcommand to list-aliases
Also drop leading tab and fix underscores in output. Helper for bash completion.

Signed-off-by: Jani Nikula
---
 dim | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/dim b/dim
index 2f6e6151a4b2..1addd6f6a0e9 100755
--- a/dim
+++ b/dim
@@ -1121,11 +1121,12 @@ function dim_list_commands
 	declare -F | grep -o " dim_[a-zA-Z_]*" | sed 's/^ dim_//;s/_/-/g'
 }
 
-function dim_alias
+# Note: used by bash completion
+function dim_list_aliases
 {
 	# use posix mode to omit functions in set output
 	( set -o posix; set ) | grep "^dim_alias_[a-zA-Z0-9_]*=" |\
-		sed 's/^dim_alias_/\t/;s/=/\t/'
+		sed 's/^dim_alias_//;s/=/\t/;s/_/-/g'
 }
 
 function dim_cat_to_fixup
-- 
2.1.4
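The reworked sed pipeline — strip the `dim_alias_` prefix, turn `=` into a tab, map `_` back to `-` in the alias name — can be exercised on canned input. The aliases below are made up, and the variables are fed in literally instead of being extracted from `set` output as real dim does, so only the transformation itself is shown:

```shell
# Canned model of the dim_list_aliases output format.
list_aliases_model () {
	printf '%s\n' 'dim_alias_pq=push-queued' 'dim_alias_apply_nf=apply-next-fixes' |
		sed 's/^dim_alias_//;s/=/\t/;s/_/-/g'
}

# Emits "<alias><TAB><command>", one per line, with '-' alias names.
list_aliases_model
```

The tab separator is what lets later consumers (like the completion's alias lookup) split the name from the command unambiguously.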
[Intel-gfx] [maintainer-tools PATCH 5/8] dim: add list-commands subcommand to list all subcommands
Helper for completion.

Signed-off-by: Jani Nikula
---
 dim | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/dim b/dim
index 6fb496ea4192..2f6e6151a4b2 100755
--- a/dim
+++ b/dim
@@ -1115,6 +1115,12 @@ function assert_branch
 	fi
 }
 
+# Note: used by bash completion
+function dim_list_commands
+{
+	declare -F | grep -o " dim_[a-zA-Z_]*" | sed 's/^ dim_//;s/_/-/g'
+}
+
 function dim_alias
 {
 	# use posix mode to omit functions in set output
@@ -1178,7 +1184,7 @@ function dim_usage
 	echo "usage: $0 [OPTIONS] SUBCOMMAND [ARGUMENTS]"
 	echo
 	echo "The available subcommands are:"
-	declare -F | grep -o " dim_[a-zA-Z_]*" | sed 's/^ dim_/\t/'
+	dim_list_commands | sed 's/^/\t/'
 	echo
 	echo "See '$0 help' for more information."
 }
-- 
2.1.4
[Intel-gfx] [maintainer-tools PATCH 3/8] dim: add list-upstreams subcommand to list upstream branches
Helper for bash completion. The result depends on user's dim configuration.

Signed-off-by: Jani Nikula
---
 dim | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/dim b/dim
index 33ef8288a291..6fb496ea4192 100755
--- a/dim
+++ b/dim
@@ -973,6 +973,14 @@ function dim_pull_request_next_fixes
 }
 
 # Note: used by bash completion
+function dim_list_upstreams
+{
+	echo origin/master
+	echo $DIM_DRM_UPSTREAM_REMOTE/drm-next
+	echo $DIM_DRM_UPSTREAM_REMOTE/drm-fixes
+}
+
+# Note: used by bash completion
 function dim_list_branches
 {
 	echo $dim_branches | sed 's/ /\n/g'
-- 
2.1.4
[Intel-gfx] [maintainer-tools PATCH 8/8] completion: complete aliases like the actual command
Map aliases to the actual commands. No need to know all the aliases. Signed-off-by: Jani Nikula--- bash_completion | 18 -- 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/bash_completion b/bash_completion index 4a9d981709a0..9f659b4ebcce 100644 --- a/bash_completion +++ b/bash_completion @@ -44,20 +44,26 @@ _dim () return 0 fi + # complete aliases like the actual command + local aliasref=$(dim list-aliases | sed -n "s/^${arg}\t\(.*\)/\1/p") + if [[ -n "$aliasref" ]]; then + arg="$aliasref" + fi + case "${arg}" in push-branch) COMPREPLY=( $( compgen -W "-f $nightly_branches" -- $cur ) ) ;; - push-queued|pq|push-fixes|pf|push-next-fixes|pnf) + push-queued|push-fixes|push-next-fixes) COMPREPLY=( $( compgen -W "-f" -- $cur ) ) ;; - apply-branch|ab|sob) + apply-branch) COMPREPLY=( $( compgen -W "-s $nightly_branches" -- $cur ) ) ;; - apply-queued|aq|apply-fixes|af|apply-next-fixes|anf) + apply-queued|apply-fixes|apply-next-fixes) COMPREPLY=( $( compgen -W "-s" -- $cur ) ) ;; - magic-patch|mp) + magic-patch) if [[ $args == 2 ]]; then COMPREPLY=( $( compgen -o nospace -W "-a" -- $cur ) ) fi @@ -65,7 +71,7 @@ _dim () tc|fixes) # FIXME needs a git sha1 ;; - check-patch|cp) + checkpatch) # FIXME needs a git sha1 ;; pull-request) @@ -85,7 +91,7 @@ _dim () COMPREPLY=( $( compgen -o nospace -W "drm- topic/" -- $cur ) ) fi ;; - checkout|co) + checkout) if [[ $args == 2 ]]; then COMPREPLY=( $( compgen -W "$nightly_branches" -- $cur ) ) fi -- 2.1.4 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
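The alias-resolution step added to the completion can be sketched stand-alone. `list_aliases` below is a canned stand-in for `dim list-aliases` (the table entries are made up); the sed expression matching `^<arg><TAB>` is the one from the patch:

```shell
# Stand-in for `dim list-aliases`: "<alias><TAB><command>" per line.
list_aliases () {
	printf 'pq\tpush-queued\nab\tapply-branch\n'
}

# Map an alias to its command; unknown names pass through unchanged,
# so the completion code only ever has to know the real command names.
resolve_arg () {
	arg=$1
	aliasref=$(list_aliases | sed -n "s/^${arg}\t\(.*\)/\1/p")
	if [ -n "$aliasref" ]; then
		arg=$aliasref	# complete the alias like the actual command
	fi
	echo "$arg"
}

resolve_arg pq		# -> push-queued
resolve_arg push-branch	# -> push-branch (no alias, unchanged)
```

This is why the case arms in the patch can drop the `|pq|pf|...` alternatives: by the time `arg` reaches the case statement it has already been normalized to the canonical command name.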
[Intel-gfx] [maintainer-tools PATCH 7/8] completion: use the dim helpers to complete subcommands and aliases
Autodiscover everything, including user's configured aliases.

Signed-off-by: Jani Nikula
---
 bash_completion | 16 +---------------
 1 file changed, 1 insertion(+), 15 deletions(-)

diff --git a/bash_completion b/bash_completion
index f89764e3947d..4a9d981709a0 100644
--- a/bash_completion
+++ b/bash_completion
@@ -12,7 +12,6 @@ dim ()
 _dim ()
 {
 	local args arg cur prev words cword split
-	local cmds
 
 	# require bash-completion with _init_completion
 	type -t _init_completion >/dev/null 2>&1 || return
@@ -30,20 +29,6 @@ _dim ()
 	local nightly_branches="$(dim list-branches)"
 	local upstream_branches="$(dim list-upstreams)"
 
-	cmds="setup nightly-forget update-branches"
-	cmds="$cmds rebuild-nightly cat-to-fixup"
-	cmds="$cmds push-queued pq push-fixes pf push-next-fixes pnf push-branch"
-	cmds="$cmds checkout co conq cof conf"
-	cmds="$cmds apply-branch ab sob apply-queued aq apply-fixes af apply-next-fixes anf"
-	cmds="$cmds magic-patch mp cd"
-	cmds="$cmds magic-rebase-resolve mrr"
-	cmds="$cmds apply-igt ai"
-	cmds="$cmds apply-resolved ar tc fixes check-patch cp cherry-pick"
-	cmds="$cmds pull-request pull-request-fixes pull-request-next pull-request-next-fixes"
-	cmds="$cmds update-next"
-	cmds="$cmds create-branch remove-branch create-workdir for-each-workdirs fw"
-	cmds="$cmds tag-next checker"
-
 	if [ -z "${arg}" ]; then
 		# top level completion
 		case "${cur}" in
@@ -52,6 +37,7 @@ _dim ()
 			COMPREPLY=( $(compgen -W "${opts}" -- ${cur}) )
 			;;
 		*)
+			local cmds="$(dim list-commands) $(dim list-aliases | sed 's/\t.*//')"
 			COMPREPLY=( $(compgen -W "${cmds}" -- ${cur}) )
 			;;
 		esac
-- 
2.1.4
[Intel-gfx] [maintainer-tools PATCH 1/8] completion: require bash completion package and use it
The bash completion package makes life a whole lot easier than using the
builtin bash completion features. It's quite likely anyone using
completion in bash already has it installed.

Signed-off-by: Jani Nikula
---
 bash_completion | 62 -
 1 file changed, 35 insertions(+), 27 deletions(-)

diff --git a/bash_completion b/bash_completion
index e44e5fc844b4..6a3a88cc80f8 100644
--- a/bash_completion
+++ b/bash_completion
@@ -11,7 +11,21 @@ dim ()
 _dim ()
 {
-	local cur cmds opts i
+	local args arg cur prev words cword split
+	local cmds
+
+	# require bash-completion with _init_completion
+	type -t _init_completion >/dev/null 2>&1 || return
+
+	_init_completion || return
+
+	COMPREPLY=()
+
+	# arg = subcommand
+	_get_first_arg
+
+	# args = number of arguments
+	_count_args

 	if [ -f ~/linux/drm-intel-rerere/nightly.conf ] ; then
 		local nightly_branches=`(source ~/linux/drm-intel-rerere/nightly.conf ; echo $nightly_branches) | \
@@ -35,27 +49,21 @@ _dim ()
 	cmds="$cmds create-branch remove-branch create-workdir for-each-workdirs fw"
 	cmds="$cmds tag-next checker"

-	opts="-d -f -i"
-
-	i=1
-
-	COMPREPLY=()   # Array variable storing the possible completions.
-	cur=${COMP_WORDS[COMP_CWORD]}
-
-	for comp in "${COMP_WORDS[@]}" ; do
-		for opt in $opts ; do
-			if [[ $opt = $comp ]] ; then
-				i=$((i+1))
-			fi
-		done
-	done
-
-	if [[ $COMP_CWORD == "$i" ]] ; then
-		COMPREPLY=( $( compgen -W "$cmds $opts" -- $cur ) )
+	if [ -z "${arg}" ]; then
+		# top level completion
+		case "${cur}" in
+		-*)
+			local opts="-d -f -i"
+			COMPREPLY=( $(compgen -W "${opts}" -- ${cur}) )
+			;;
+		*)
+			COMPREPLY=( $(compgen -W "${cmds}" -- ${cur}) )
+			;;
+		esac
 		return 0
 	fi

-	case "${COMP_WORDS[i]}" in
+	case "${arg}" in
 	push-branch)
 		COMPREPLY=( $( compgen -W "-f $nightly_branches" -- $cur ) )
 		;;
@@ -69,7 +77,7 @@ _dim ()
 		COMPREPLY=( $( compgen -W "-s" -- $cur ) )
 		;;
 	magic-patch|mp)
-		if [[ $COMP_CWORD == "$((i+1))" ]] ; then
+		if [[ $args == 2 ]]; then
 			COMPREPLY=( $( compgen -o nospace -W "-a" -- $cur ) )
 		fi
 		;;
@@ -80,34 +88,34 @@ _dim ()
 		# FIXME needs a git sha1
 		;;
 	pull-request)
-		if [[ $COMP_CWORD == "$((i+1))" ]] ; then
+		if [[ $args == 2 ]]; then
 			COMPREPLY=( $( compgen -W "$nightly_branches" -- $cur ) )
-		elif [[ $COMP_CWORD == "$((i+2))" ]] ; then
+		elif [[ $args == 3 ]]; then
 			COMPREPLY=( $( compgen -W "$upstream_branches" -- $cur ) )
 		fi
 		;;
 	pull-request-next|pull-request-fixes|pull-request-next-fixes)
-		if [[ $COMP_CWORD == "$((i+1))" ]] ; then
+		if [[ $args == 2 ]]; then
 			COMPREPLY=( $( compgen -W "$upstream_branches" -- $cur ) )
 		fi
 		;;
 	create-branch)
-		if [[ $COMP_CWORD == "$((i+1))" ]] ; then
+		if [[ $args == 2 ]]; then
 			COMPREPLY=( $( compgen -o nospace -W "drm- topic/" -- $cur ) )
 		fi
 		;;
 	checkout|co)
-		if [[ $COMP_CWORD == "$((i+1))" ]] ; then
+		if [[ $args == 2 ]]; then
 			COMPREPLY=( $( compgen -W "$nightly_branches" -- $cur ) )
 		fi
 		;;
 	remove-branch)
-		if [[ $COMP_CWORD == "$((i+1))" ]] ; then
+		if [[ $args == 2 ]]; then
 			COMPREPLY=( $( compgen -W "$nightly_branches" -- $cur ) )
 		fi
 		;;
 	create-workdir)
-		if [[ $COMP_CWORD == "$((i+1))" ]] ; then
+		if [[ $args == 2 ]]; then
 			COMPREPLY=( $( compgen -W "$nightly_branches all" -- $cur ) )
 		fi
[Intel-gfx] [PATCH 2/4] drm/i915/gen9: Verify and enforce dc6 state writes
It has been observed that sometimes disabling DC6 fails and the DC6
state pops back up a brief moment after disabling. This has to be a DMC
save/restore timing issue or another bug in the way DC states are
handled.

Try to work around this issue, as we don't have a firmware fix
available yet. Verify that the value we wrote for the DMC sticks, and
also enforce it by rewriting it if it didn't.

v2: Zero rereads on rewrite for extra paranoia (Imre)

Testcase: kms_flip/basic-flip-vs-dpms
References: https://bugs.freedesktop.org/show_bug.cgi?id=93768
Cc: Patrik Jakobsson
Cc: Rodrigo Vivi
Cc: Imre Deak
Signed-off-by: Mika Kuoppala
Reviewed-by: Imre Deak
---
 drivers/gpu/drm/i915/intel_runtime_pm.c | 41 +++--
 1 file changed, 39 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c b/drivers/gpu/drm/i915/intel_runtime_pm.c
index 8b9290fdb3b2..814cf5ac1ef0 100644
--- a/drivers/gpu/drm/i915/intel_runtime_pm.c
+++ b/drivers/gpu/drm/i915/intel_runtime_pm.c
@@ -470,6 +470,43 @@ static void gen9_set_dc_state_debugmask_memory_up(
 	}
 }

+static void gen9_write_dc_state(struct drm_i915_private *dev_priv,
+				u32 state)
+{
+	int rewrites = 0;
+	int rereads = 0;
+	u32 v;
+
+	I915_WRITE(DC_STATE_EN, state);
+
+	/* It has been observed that disabling the dc6 state sometimes
+	 * doesn't stick and dmc keeps returning old value. Make sure
+	 * the write really sticks enough times and also force rewrite until
+	 * we are confident that state is exactly what we want.
+	 */
+	do {
+		v = I915_READ(DC_STATE_EN);
+
+		if (v != state) {
+			I915_WRITE(DC_STATE_EN, state);
+			rewrites++;
+			rereads = 0;
+		} else if (rereads++ > 5) {
+			break;
+		}
+
+	} while (rewrites < 100);
+
+	if (v != state)
+		DRM_ERROR("Writing dc state to 0x%x failed, now 0x%x\n",
+			  state, v);
+
+	/* Most of the times we need one retry, avoid spam */
+	if (rewrites > 1)
+		DRM_DEBUG_KMS("Rewrote dc state to 0x%x %d times\n",
+			      state, rewrites);
+}
+
 static void gen9_set_dc_state(struct drm_i915_private *dev_priv, uint32_t state)
 {
 	uint32_t val;
@@ -502,8 +539,8 @@ static void gen9_set_dc_state(struct drm_i915_private *dev_priv, uint32_t state)

 	val &= ~mask;
 	val |= state;
-	I915_WRITE(DC_STATE_EN, val);
-	POSTING_READ(DC_STATE_EN);
+
+	gen9_write_dc_state(dev_priv, val);

 	dev_priv->csr.dc_state = val & mask;
 }
-- 
2.5.0
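The verify-and-enforce loop can be modeled in isolation. Below is a minimal Python sketch of the same rereads/rewrites bookkeeping; this is a model only, not kernel code, and the `FlakyReg` class (a register that silently drops a bounded number of writes, mimicking the DMC restoring a stale `DC_STATE_EN` value) is invented for illustration:

```python
def write_verified(reg, state, max_rewrites=100, stable_reads=5):
    """Model of the verify-and-enforce write: write once, then re-read
    until the value has stayed correct for several consecutive reads,
    rewriting whenever the hardware snaps back to the old value."""
    rewrites = 0
    rereads = 0
    reg.write(state)
    while True:
        v = reg.read()
        if v != state:
            reg.write(state)
            rewrites += 1
            rereads = 0          # v2: restart the stability count
        else:
            rereads += 1
            if rereads > stable_reads:
                break            # value stuck long enough, trust it
        if rewrites >= max_rewrites:
            break                # give up, caller logs an error
    return v == state, rewrites

class FlakyReg:
    """Hypothetical register that ignores the first few writes."""
    def __init__(self, initial, drop_writes):
        self.value = initial
        self.drops = drop_writes

    def write(self, v):
        if self.drops > 0:
            self.drops -= 1      # write doesn't stick
        else:
            self.value = v

    def read(self):
        return self.value

ok, rewrites = write_verified(FlakyReg(0x2, drop_writes=2), 0x0)
```

Resetting `rereads` on every rewrite (the v2 change) is what makes the loop demand a full run of stable reads after the last correction, rather than accepting a value that only briefly matched.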
Re: [Intel-gfx] [PATCH 2/4] drm/i915/gen9: Verify and enforce dc6 state writes
Imre Deakwrites: > On to, 2016-02-18 at 17:21 +0200, Mika Kuoppala wrote: >> It has been observed that sometimes disabling the dc6 fails >> and dc6 state pops back up, brief moment after disabling. This >> has to be dmc save/restore timing issue or other bug in the >> way dc states are handled. >> >> Try to work around this issue as we don't have firmware fix >> yet available. Verify that the value we wrote for the dmc sticks, >> and also enforce it by rewriting it, if it didn't. >> >> Testcase: kms_flip/basic-flip-vs-dpms >> References: https://bugs.freedesktop.org/show_bug.cgi?id=93768 >> Cc: Patrik Jakobsson >> Cc: Rodrigo Vivi >> Cc: Imre Deak >> Signed-off-by: Mika Kuoppala >> --- >> drivers/gpu/drm/i915/intel_runtime_pm.c | 40 >> +++-- >> 1 file changed, 38 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c >> b/drivers/gpu/drm/i915/intel_runtime_pm.c >> index 8b9290fdb3b2..cb91540cfbad 100644 >> --- a/drivers/gpu/drm/i915/intel_runtime_pm.c >> +++ b/drivers/gpu/drm/i915/intel_runtime_pm.c >> @@ -470,6 +470,42 @@ static void >> gen9_set_dc_state_debugmask_memory_up( >> } >> } >> >> +static void gen9_write_dc_state(struct drm_i915_private *dev_priv, >> +u32 state) >> +{ >> +int rewrites = 0; >> +int rereads = 0; >> +u32 v; >> + >> +I915_WRITE(DC_STATE_EN, state); >> + >> +/* It has been observed that disabling the dc6 state >> sometimes >> + * doesn't stick and dmc keeps returning old value. Make >> sure >> + * the write really sticks enough times and also force >> rewrite until >> + * we are confident that state is exactly what we want. >> + */ >> +do { >> +v = I915_READ(DC_STATE_EN); >> + >> +if (v != state) { >> +I915_WRITE(DC_STATE_EN, state); >> +rewrites++; > > Could be rereads = 0; for extra paranoia. Either way: Oh yes, extra paranoia in here is warranted. I will add that. 
> Reviewed-by: Imre Deak Thanks, -Mika > >> +} else if (rereads++ > 5) { >> +break; >> +} >> + >> +} while (rewrites < 100); >> + >> +if (v != state) >> +DRM_ERROR("Writing dc state to 0x%x failed, now >> 0x%x\n", >> + state, v); >> + >> +/* Most of the times we need one retry, avoid spam */ >> +if (rewrites > 1) >> +DRM_DEBUG_KMS("Rewrote dc state to 0x%x %d times\n", >> + state, rewrites); >> +} >> + >> static void gen9_set_dc_state(struct drm_i915_private *dev_priv, >> uint32_t state) >> { >> uint32_t val; >> @@ -502,8 +538,8 @@ static void gen9_set_dc_state(struct >> drm_i915_private *dev_priv, uint32_t state) >> >> val &= ~mask; >> val |= state; >> -I915_WRITE(DC_STATE_EN, val); >> -POSTING_READ(DC_STATE_EN); >> + >> +gen9_write_dc_state(dev_priv, val); >> >> dev_priv->csr.dc_state = val & mask; >> } ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 4/4] drm/i915/gen9: Write dc state debugmask bits only once
On to, 2016-02-18 at 17:21 +0200, Mika Kuoppala wrote: > DMC debugmask bits should stick so no need to write them > everytime dc state is changed. > > v2: Write after firmware has been successfully loaded (Ville) > > Signed-off-by: Mika KuoppalaReviewed-by: Imre Deak > --- > drivers/gpu/drm/i915/intel_csr.c| 8 +--- > drivers/gpu/drm/i915/intel_drv.h| 2 +- > drivers/gpu/drm/i915/intel_runtime_pm.c | 7 ++- > 3 files changed, 8 insertions(+), 9 deletions(-) > > diff --git a/drivers/gpu/drm/i915/intel_csr.c > b/drivers/gpu/drm/i915/intel_csr.c > index b453fccfa25d..902054efb902 100644 > --- a/drivers/gpu/drm/i915/intel_csr.c > +++ b/drivers/gpu/drm/i915/intel_csr.c > @@ -220,19 +220,19 @@ static const struct stepping_info > *intel_get_stepping_info(struct drm_device *de > * Everytime display comes back from low power state this function > is called to > * copy the firmware from internal memory to registers. > */ > -void intel_csr_load_program(struct drm_i915_private *dev_priv) > +bool intel_csr_load_program(struct drm_i915_private *dev_priv) > { > u32 *payload = dev_priv->csr.dmc_payload; > uint32_t i, fw_size; > > if (!IS_GEN9(dev_priv)) { > DRM_ERROR("No CSR support available for this > platform\n"); > - return; > + return false; > } > > if (!dev_priv->csr.dmc_payload) { > DRM_ERROR("Tried to program CSR with empty > payload\n"); > - return; > + return false; > } > > fw_size = dev_priv->csr.dmc_fw_size; > @@ -245,6 +245,8 @@ void intel_csr_load_program(struct > drm_i915_private *dev_priv) > } > > dev_priv->csr.dc_state = 0; > + > + return true; > } > > static uint32_t *parse_csr_fw(struct drm_i915_private *dev_priv, > diff --git a/drivers/gpu/drm/i915/intel_drv.h > b/drivers/gpu/drm/i915/intel_drv.h > index 285b0570be9c..c208ca630e99 100644 > --- a/drivers/gpu/drm/i915/intel_drv.h > +++ b/drivers/gpu/drm/i915/intel_drv.h > @@ -1225,7 +1225,7 @@ u32 skl_plane_ctl_rotation(unsigned int > rotation); > > /* intel_csr.c */ > void intel_csr_ucode_init(struct 
drm_i915_private *); > -void intel_csr_load_program(struct drm_i915_private *); > +bool intel_csr_load_program(struct drm_i915_private *); > void intel_csr_ucode_fini(struct drm_i915_private *); > > /* intel_dp.c */ > diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c > b/drivers/gpu/drm/i915/intel_runtime_pm.c > index 1b490c7e4020..7f0577ca900e 100644 > --- a/drivers/gpu/drm/i915/intel_runtime_pm.c > +++ b/drivers/gpu/drm/i915/intel_runtime_pm.c > @@ -526,9 +526,6 @@ static void gen9_set_dc_state(struct > drm_i915_private *dev_priv, uint32_t state) > else if (i915.enable_dc == 1 && state > > DC_STATE_EN_UPTO_DC5) > state = DC_STATE_EN_UPTO_DC5; > > - if (state & DC_STATE_EN_UPTO_DC5_DC6_MASK) > - gen9_set_dc_state_debugmask(dev_priv); > - > val = I915_READ(DC_STATE_EN); > DRM_DEBUG_KMS("Setting DC state from %02x to %02x\n", > val & mask, state); > @@ -2119,8 +2116,8 @@ static void skl_display_core_init(struct > drm_i915_private *dev_priv, > > skl_init_cdclk(dev_priv); > > - if (dev_priv->csr.dmc_payload) > - intel_csr_load_program(dev_priv); > + if (dev_priv->csr.dmc_payload && > intel_csr_load_program(dev_priv)) > + gen9_set_dc_state_debugmask(dev_priv); > } > > static void skl_display_core_uninit(struct drm_i915_private > *dev_priv) ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 2/4] drm/i915/gen9: Verify and enforce dc6 state writes
On to, 2016-02-18 at 17:21 +0200, Mika Kuoppala wrote: > It has been observed that sometimes disabling the dc6 fails > and dc6 state pops back up, brief moment after disabling. This > has to be dmc save/restore timing issue or other bug in the > way dc states are handled. > > Try to work around this issue as we don't have firmware fix > yet available. Verify that the value we wrote for the dmc sticks, > and also enforce it by rewriting it, if it didn't. > > Testcase: kms_flip/basic-flip-vs-dpms > References: https://bugs.freedesktop.org/show_bug.cgi?id=93768 > Cc: Patrik Jakobsson> Cc: Rodrigo Vivi > Cc: Imre Deak > Signed-off-by: Mika Kuoppala > --- > drivers/gpu/drm/i915/intel_runtime_pm.c | 40 > +++-- > 1 file changed, 38 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c > b/drivers/gpu/drm/i915/intel_runtime_pm.c > index 8b9290fdb3b2..cb91540cfbad 100644 > --- a/drivers/gpu/drm/i915/intel_runtime_pm.c > +++ b/drivers/gpu/drm/i915/intel_runtime_pm.c > @@ -470,6 +470,42 @@ static void > gen9_set_dc_state_debugmask_memory_up( > } > } > > +static void gen9_write_dc_state(struct drm_i915_private *dev_priv, > + u32 state) > +{ > + int rewrites = 0; > + int rereads = 0; > + u32 v; > + > + I915_WRITE(DC_STATE_EN, state); > + > + /* It has been observed that disabling the dc6 state > sometimes > + * doesn't stick and dmc keeps returning old value. Make > sure > + * the write really sticks enough times and also force > rewrite until > + * we are confident that state is exactly what we want. > + */ > + do { > + v = I915_READ(DC_STATE_EN); > + > + if (v != state) { > + I915_WRITE(DC_STATE_EN, state); > + rewrites++; Could be rereads = 0; for extra paranoia. 
Either way: Reviewed-by: Imre Deak > + } else if (rereads++ > 5) { > + break; > + } > + > + } while (rewrites < 100); > + > + if (v != state) > + DRM_ERROR("Writing dc state to 0x%x failed, now > 0x%x\n", > + state, v); > + > + /* Most of the times we need one retry, avoid spam */ > + if (rewrites > 1) > + DRM_DEBUG_KMS("Rewrote dc state to 0x%x %d times\n", > + state, rewrites); > +} > + > static void gen9_set_dc_state(struct drm_i915_private *dev_priv, > uint32_t state) > { > uint32_t val; > @@ -502,8 +538,8 @@ static void gen9_set_dc_state(struct > drm_i915_private *dev_priv, uint32_t state) > > val &= ~mask; > val |= state; > - I915_WRITE(DC_STATE_EN, val); > - POSTING_READ(DC_STATE_EN); > + > + gen9_write_dc_state(dev_priv, val); > > dev_priv->csr.dc_state = val & mask; > } ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 3/4] drm/i915/gen9: Extend dmc debug mask to include cores
On to, 2016-02-18 at 17:21 +0200, Mika Kuoppala wrote: > Cores need to be included into the debug mask. We don't exactly > know what it does but the spec says it must be enabled. So obey. > > Signed-off-by: Mika Kuoppala> --- > drivers/gpu/drm/i915/i915_reg.h | 1 + > drivers/gpu/drm/i915/intel_runtime_pm.c | 14 -- > 2 files changed, 9 insertions(+), 6 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_reg.h > b/drivers/gpu/drm/i915/i915_reg.h > index 3774870477c1..f76cbf3e5d1e 100644 > --- a/drivers/gpu/drm/i915/i915_reg.h > +++ b/drivers/gpu/drm/i915/i915_reg.h > @@ -7568,6 +7568,7 @@ enum skl_disp_power_wells { > #define DC_STATE_EN_UPTO_DC5_DC6_MASK 0x3 > > #define DC_STATE_DEBUG _MMIO(0x45520) > +#define DC_STATE_DEBUG_MASK_CORES (1<<0) > #define DC_STATE_DEBUG_MASK_MEMORY_UP (1<<1) > > /* Please see hsw_read_dcomp() and hsw_write_dcomp() before using > this register, > diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c > b/drivers/gpu/drm/i915/intel_runtime_pm.c > index cb91540cfbad..1b490c7e4020 100644 > --- a/drivers/gpu/drm/i915/intel_runtime_pm.c > +++ b/drivers/gpu/drm/i915/intel_runtime_pm.c > @@ -456,15 +456,17 @@ static void assert_can_disable_dc9(struct > drm_i915_private *dev_priv) > */ > } > > -static void gen9_set_dc_state_debugmask_memory_up( > - struct drm_i915_private *dev_priv) > +static void gen9_set_dc_state_debugmask(struct drm_i915_private > *dev_priv) > { > - uint32_t val; > + uint32_t val, mask; > + > + mask = DC_STATE_DEBUG_MASK_MEMORY_UP | > + DC_STATE_DEBUG_MASK_CORES; The BSpec "Sequence to Allow DC5 or DC6" requires this only for BXT (looks like a recent addition to work around something), but it doesn't say it's needed for other platforms. The register description doesn't make a difference though. Perhaps Art has more info on this, adding him. 
> > /* The below bit doesn't need to be cleared ever afterwards > */ > val = I915_READ(DC_STATE_DEBUG); > - if (!(val & DC_STATE_DEBUG_MASK_MEMORY_UP)) { > - val |= DC_STATE_DEBUG_MASK_MEMORY_UP; > + if ((val & mask) != mask) { > + val |= mask; > I915_WRITE(DC_STATE_DEBUG, val); > POSTING_READ(DC_STATE_DEBUG); > } > @@ -525,7 +527,7 @@ static void gen9_set_dc_state(struct > drm_i915_private *dev_priv, uint32_t state) > state = DC_STATE_EN_UPTO_DC5; > > if (state & DC_STATE_EN_UPTO_DC5_DC6_MASK) > - gen9_set_dc_state_debugmask_memory_up(dev_priv); > + gen9_set_dc_state_debugmask(dev_priv); > > val = I915_READ(DC_STATE_EN); > DRM_DEBUG_KMS("Setting DC state from %02x to %02x\n", ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 4/4] drm/i915/gen9: Write dc state debugmask bits only once
DMC debugmask bits should stick, so there is no need to write them
every time the DC state is changed.

v2: Write after firmware has been successfully loaded (Ville)

Signed-off-by: Mika Kuoppala
---
 drivers/gpu/drm/i915/intel_csr.c        | 8 +---
 drivers/gpu/drm/i915/intel_drv.h        | 2 +-
 drivers/gpu/drm/i915/intel_runtime_pm.c | 7 ++-
 3 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_csr.c b/drivers/gpu/drm/i915/intel_csr.c
index b453fccfa25d..902054efb902 100644
--- a/drivers/gpu/drm/i915/intel_csr.c
+++ b/drivers/gpu/drm/i915/intel_csr.c
@@ -220,19 +220,19 @@ static const struct stepping_info *intel_get_stepping_info(struct drm_device *de
  * Everytime display comes back from low power state this function is called to
  * copy the firmware from internal memory to registers.
  */
-void intel_csr_load_program(struct drm_i915_private *dev_priv)
+bool intel_csr_load_program(struct drm_i915_private *dev_priv)
 {
 	u32 *payload = dev_priv->csr.dmc_payload;
 	uint32_t i, fw_size;

 	if (!IS_GEN9(dev_priv)) {
 		DRM_ERROR("No CSR support available for this platform\n");
-		return;
+		return false;
 	}

 	if (!dev_priv->csr.dmc_payload) {
 		DRM_ERROR("Tried to program CSR with empty payload\n");
-		return;
+		return false;
 	}

 	fw_size = dev_priv->csr.dmc_fw_size;
@@ -245,6 +245,8 @@ void intel_csr_load_program(struct drm_i915_private *dev_priv)
 	}

 	dev_priv->csr.dc_state = 0;
+
+	return true;
 }

 static uint32_t *parse_csr_fw(struct drm_i915_private *dev_priv,
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 285b0570be9c..c208ca630e99 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -1225,7 +1225,7 @@ u32 skl_plane_ctl_rotation(unsigned int rotation);

 /* intel_csr.c */
 void intel_csr_ucode_init(struct drm_i915_private *);
-void intel_csr_load_program(struct drm_i915_private *);
+bool intel_csr_load_program(struct drm_i915_private *);
 void intel_csr_ucode_fini(struct drm_i915_private *);

 /* intel_dp.c */
diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c b/drivers/gpu/drm/i915/intel_runtime_pm.c
index 1b490c7e4020..7f0577ca900e 100644
--- a/drivers/gpu/drm/i915/intel_runtime_pm.c
+++ b/drivers/gpu/drm/i915/intel_runtime_pm.c
@@ -526,9 +526,6 @@ static void gen9_set_dc_state(struct drm_i915_private *dev_priv, uint32_t state)
 	else if (i915.enable_dc == 1 && state > DC_STATE_EN_UPTO_DC5)
 		state = DC_STATE_EN_UPTO_DC5;

-	if (state & DC_STATE_EN_UPTO_DC5_DC6_MASK)
-		gen9_set_dc_state_debugmask(dev_priv);
-
 	val = I915_READ(DC_STATE_EN);
 	DRM_DEBUG_KMS("Setting DC state from %02x to %02x\n",
 		      val & mask, state);
@@ -2119,8 +2116,8 @@ static void skl_display_core_init(struct drm_i915_private *dev_priv,

 	skl_init_cdclk(dev_priv);

-	if (dev_priv->csr.dmc_payload)
-		intel_csr_load_program(dev_priv);
+	if (dev_priv->csr.dmc_payload && intel_csr_load_program(dev_priv))
+		gen9_set_dc_state_debugmask(dev_priv);
 }

 static void skl_display_core_uninit(struct drm_i915_private *dev_priv)
-- 
2.5.0
[Intel-gfx] [PATCH 3/4] drm/i915/gen9: Extend dmc debug mask to include cores
The cores bit needs to be included in the debug mask. We don't know
exactly what it does, but the spec says it must be enabled. So obey.

Signed-off-by: Mika Kuoppala
---
 drivers/gpu/drm/i915/i915_reg.h         |  1 +
 drivers/gpu/drm/i915/intel_runtime_pm.c | 14 --
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 3774870477c1..f76cbf3e5d1e 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -7568,6 +7568,7 @@ enum skl_disp_power_wells {
 #define  DC_STATE_EN_UPTO_DC5_DC6_MASK   0x3

 #define  DC_STATE_DEBUG			_MMIO(0x45520)
+#define  DC_STATE_DEBUG_MASK_CORES	(1<<0)
 #define  DC_STATE_DEBUG_MASK_MEMORY_UP	(1<<1)

 /* Please see hsw_read_dcomp() and hsw_write_dcomp() before using this register,
diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c b/drivers/gpu/drm/i915/intel_runtime_pm.c
index cb91540cfbad..1b490c7e4020 100644
--- a/drivers/gpu/drm/i915/intel_runtime_pm.c
+++ b/drivers/gpu/drm/i915/intel_runtime_pm.c
@@ -456,15 +456,17 @@ static void assert_can_disable_dc9(struct drm_i915_private *dev_priv)
 	  */
 }

-static void gen9_set_dc_state_debugmask_memory_up(
-			struct drm_i915_private *dev_priv)
+static void gen9_set_dc_state_debugmask(struct drm_i915_private *dev_priv)
 {
-	uint32_t val;
+	uint32_t val, mask;
+
+	mask = DC_STATE_DEBUG_MASK_MEMORY_UP |
+		DC_STATE_DEBUG_MASK_CORES;

 	/* The below bit doesn't need to be cleared ever afterwards */
 	val = I915_READ(DC_STATE_DEBUG);
-	if (!(val & DC_STATE_DEBUG_MASK_MEMORY_UP)) {
-		val |= DC_STATE_DEBUG_MASK_MEMORY_UP;
+	if ((val & mask) != mask) {
+		val |= mask;
 		I915_WRITE(DC_STATE_DEBUG, val);
 		POSTING_READ(DC_STATE_DEBUG);
 	}
@@ -525,7 +527,7 @@ static void gen9_set_dc_state(struct drm_i915_private *dev_priv, uint32_t state)
 		state = DC_STATE_EN_UPTO_DC5;

 	if (state & DC_STATE_EN_UPTO_DC5_DC6_MASK)
-		gen9_set_dc_state_debugmask_memory_up(dev_priv);
+		gen9_set_dc_state_debugmask(dev_priv);

 	val = I915_READ(DC_STATE_EN);
 	DRM_DEBUG_KMS("Setting DC state from %02x to %02x\n",
-- 
2.5.0
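The debugmask update follows a common pattern for sticky bits: read the register, and only perform the (posted) write when some of the wanted bits are missing. A minimal Python sketch of that check-before-write pattern — a model only, with register access abstracted into `read`/`write` callables and bit positions mirroring the patch:

```python
DC_STATE_DEBUG_MASK_CORES = 1 << 0
DC_STATE_DEBUG_MASK_MEMORY_UP = 1 << 1

def set_debug_mask(read, write):
    """Set both sticky debugmask bits, touching the register only if
    some of them are not already set (the bits never need clearing)."""
    mask = DC_STATE_DEBUG_MASK_MEMORY_UP | DC_STATE_DEBUG_MASK_CORES
    val = read()
    if (val & mask) != mask:
        write(val | mask)
        return True              # register was written
    return False                 # already fully set, skip the write

# Hypothetical register backing store for demonstration.
reg = {"v": DC_STATE_DEBUG_MASK_MEMORY_UP}
wrote = set_debug_mask(lambda: reg["v"], lambda v: reg.update(v=v))
```

Skipping the write when all bits are already set is what makes the helper cheap to call repeatedly, which in turn is why a later patch can move the single required call to firmware-load time.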
[Intel-gfx] [PATCH 1/4] drm/i915/gen9: Check for DC state mismatch
From: Patrik Jakobsson

The DMC can incorrectly run off and allow DC states on its own. We
don't know the root cause for this yet, but this patch makes it more
visible.

Reviewed-by: Mika Kuoppala
Signed-off-by: Patrik Jakobsson
---
 drivers/gpu/drm/i915/i915_drv.h         | 1 +
 drivers/gpu/drm/i915/intel_csr.c        | 2 ++
 drivers/gpu/drm/i915/intel_runtime_pm.c | 8 ++++++++
 3 files changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 6644c2e354c1..9cbcb5d80b3c 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -746,6 +746,7 @@ struct intel_csr {
 	uint32_t mmio_count;
 	i915_reg_t mmioaddr[8];
 	uint32_t mmiodata[8];
+	uint32_t dc_state;
 };

 #define DEV_INFO_FOR_EACH_FLAG(func, sep) \
diff --git a/drivers/gpu/drm/i915/intel_csr.c b/drivers/gpu/drm/i915/intel_csr.c
index 2a7ec3141c8d..b453fccfa25d 100644
--- a/drivers/gpu/drm/i915/intel_csr.c
+++ b/drivers/gpu/drm/i915/intel_csr.c
@@ -243,6 +243,8 @@ void intel_csr_load_program(struct drm_i915_private *dev_priv)
 		I915_WRITE(dev_priv->csr.mmioaddr[i],
 			   dev_priv->csr.mmiodata[i]);
 	}
+
+	dev_priv->csr.dc_state = 0;
 }

 static uint32_t *parse_csr_fw(struct drm_i915_private *dev_priv,
diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c b/drivers/gpu/drm/i915/intel_runtime_pm.c
index a2e367cf99a2..8b9290fdb3b2 100644
--- a/drivers/gpu/drm/i915/intel_runtime_pm.c
+++ b/drivers/gpu/drm/i915/intel_runtime_pm.c
@@ -494,10 +494,18 @@ static void gen9_set_dc_state(struct drm_i915_private *dev_priv, uint32_t state)
 	val = I915_READ(DC_STATE_EN);
 	DRM_DEBUG_KMS("Setting DC state from %02x to %02x\n",
 		      val & mask, state);
+
+	/* Check if DMC is ignoring our DC state requests */
+	if ((val & mask) != dev_priv->csr.dc_state)
+		DRM_ERROR("DC state mismatch (0x%x -> 0x%x)\n",
+			  dev_priv->csr.dc_state, val & mask);
+
 	val &= ~mask;
 	val |= state;
 	I915_WRITE(DC_STATE_EN, val);
 	POSTING_READ(DC_STATE_EN);
+
+	dev_priv->csr.dc_state = val & mask;
 }

 void bxt_enable_dc9(struct drm_i915_private *dev_priv)
-- 
2.5.0
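The mismatch check amounts to keeping a software copy of the last programmed DC state and comparing it against the hardware value read back on the next transition. A small Python model of that bookkeeping — the `DcStateTracker` class is invented for illustration, while the 0x3 mask mirrors `DC_STATE_EN_UPTO_DC5_DC6_MASK` from the patch:

```python
class DcStateTracker:
    """Track what we last programmed into the DC state field and flag
    it when the hardware no longer matches, i.e. the DMC (or anything
    else) changed it behind the driver's back."""
    MASK = 0x3  # DC_STATE_EN_UPTO_DC5_DC6_MASK

    def __init__(self):
        self.expected = 0  # zeroed when the firmware is (re)loaded

    def check_and_update(self, hw_value, new_state):
        """Return True if the current hardware value drifted from what
        we last wrote, then record the newly programmed state."""
        mismatch = (hw_value & self.MASK) != self.expected
        self.expected = new_state & self.MASK
        return mismatch
```

The tracked copy must be reset whenever the firmware is reloaded (as the patch does in `intel_csr_load_program()`), otherwise the first transition after resume would be reported as a false mismatch.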
[Intel-gfx] [PATCH 2/4] drm/i915/gen9: Verify and enforce dc6 state writes
It has been observed that sometimes disabling DC6 fails and the DC6
state pops back up a brief moment after disabling. This has to be a DMC
save/restore timing issue or another bug in the way DC states are
handled.

Try to work around this issue, as we don't have a firmware fix
available yet. Verify that the value we wrote for the DMC sticks, and
also enforce it by rewriting it if it didn't.

Testcase: kms_flip/basic-flip-vs-dpms
References: https://bugs.freedesktop.org/show_bug.cgi?id=93768
Cc: Patrik Jakobsson
Cc: Rodrigo Vivi
Cc: Imre Deak
Signed-off-by: Mika Kuoppala
---
 drivers/gpu/drm/i915/intel_runtime_pm.c | 40 +++--
 1 file changed, 38 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c b/drivers/gpu/drm/i915/intel_runtime_pm.c
index 8b9290fdb3b2..cb91540cfbad 100644
--- a/drivers/gpu/drm/i915/intel_runtime_pm.c
+++ b/drivers/gpu/drm/i915/intel_runtime_pm.c
@@ -470,6 +470,42 @@ static void gen9_set_dc_state_debugmask_memory_up(
 	}
 }

+static void gen9_write_dc_state(struct drm_i915_private *dev_priv,
+				u32 state)
+{
+	int rewrites = 0;
+	int rereads = 0;
+	u32 v;
+
+	I915_WRITE(DC_STATE_EN, state);
+
+	/* It has been observed that disabling the dc6 state sometimes
+	 * doesn't stick and dmc keeps returning old value. Make sure
+	 * the write really sticks enough times and also force rewrite until
+	 * we are confident that state is exactly what we want.
+	 */
+	do {
+		v = I915_READ(DC_STATE_EN);
+
+		if (v != state) {
+			I915_WRITE(DC_STATE_EN, state);
+			rewrites++;
+		} else if (rereads++ > 5) {
+			break;
+		}
+
+	} while (rewrites < 100);
+
+	if (v != state)
+		DRM_ERROR("Writing dc state to 0x%x failed, now 0x%x\n",
+			  state, v);
+
+	/* Most of the times we need one retry, avoid spam */
+	if (rewrites > 1)
+		DRM_DEBUG_KMS("Rewrote dc state to 0x%x %d times\n",
+			      state, rewrites);
+}
+
 static void gen9_set_dc_state(struct drm_i915_private *dev_priv, uint32_t state)
 {
 	uint32_t val;
@@ -502,8 +538,8 @@ static void gen9_set_dc_state(struct drm_i915_private *dev_priv, uint32_t state)

 	val &= ~mask;
 	val |= state;
-	I915_WRITE(DC_STATE_EN, val);
-	POSTING_READ(DC_STATE_EN);
+
+	gen9_write_dc_state(dev_priv, val);

 	dev_priv->csr.dc_state = val & mask;
 }
-- 
2.5.0
[Intel-gfx] [PATCH 0/4] gen9 dmc state hardening
There have been problems with losing state sync between the DMC and the
driver. I believe the interplay of racy hardware access via
intel_display_power_is_enabled() with overlapping reprogramming of the
allowed DC states (DC_STATE_EN) made the DMC very confused.

Imre has now gotten rid of the troublesome
intel_display_power_is_enabled(). In my tests, that is a prerequisite
for keeping the DMC healthy. But as we can see from CI/BAT, it is still
not enough. Sometimes the write still doesn't stick.

So here are DMC state tracking patches. With these on top of Imre's
patches, I have been able to make skl/dmc (v1.23) symptom free on DC
state keeping, with the exception that sometimes we still need to write
DC_STATE_EN twice. The runaway situation of the DMC not obeying the
write, stalling the flip and eventually killing the GPU, is gone.

Thanks,
-Mika

Mika Kuoppala (3):
  drm/i915/gen9: Verify and enforce dc6 state writes
  drm/i915/gen9: Extend dmc debug mask to include cores
  drm/i915/gen9: Write dc state debugmask bits only once

Patrik Jakobsson (1):
  drm/i915/gen9: Check for DC state mismatch

 drivers/gpu/drm/i915/i915_drv.h         |  1 +
 drivers/gpu/drm/i915/i915_reg.h         |  1 +
 drivers/gpu/drm/i915/intel_csr.c        | 10 +++--
 drivers/gpu/drm/i915/intel_drv.h        |  2 +-
 drivers/gpu/drm/i915/intel_runtime_pm.c | 67 +++++++++++++++++-----
 5 files changed, 65 insertions(+), 16 deletions(-)

-- 
2.5.0
Re: [Intel-gfx] [PATCH v6 0/7] Convert requests to use struct fence
On Thu, Feb 18, 2016 at 02:24:03PM +, john.c.harri...@intel.com wrote:
> From: John Harrison

Does this pass igt? If so, which are the bug fixes for the current
regressions from the request conversion?
-Chris
-- 
Chris Wilson, Intel Open Source Technology Centre
Re: [Intel-gfx] [PATCH v6 3/7] drm/i915: Add per context timelines to fence object
On Thu, Feb 18, 2016 at 02:24:06PM +, john.c.harri...@intel.com wrote:
> From: John Harrison
>
> The fence object used inside the request structure requires a sequence
> number. Although this is not used by the i915 driver itself, it could
> potentially be used by non-i915 code if the fence is passed outside of
> the driver. This is the intention, as it allows external kernel drivers
> and user applications to wait on batch buffer completion
> asynchronously via the dma-buf fence API.
>
> To ensure that such external users are not confused by strange things
> happening with the seqno, this patch adds in a per-context timeline
> that can provide a guaranteed in-order seqno value for the fence. This
> is safe because the scheduler will not re-order batch buffers within a
> context - they are considered to be mutually dependent.

This is still nonsense. Just implement per-context seqno.
-Chris
-- 
Chris Wilson, Intel Open Source Technology Centre
Re: [Intel-gfx] [PATCH v6 4/7] drm/i915: Delay the freeing of requests until retire time
On Thu, Feb 18, 2016 at 02:24:07PM +, john.c.harri...@intel.com wrote:
> From: John Harrison

As I said, and have shown in patches several months ago, just fix the
underlying bug to remove the struct_mutex requirement for freeing the
request.
-Chris
-- 
Chris Wilson, Intel Open Source Technology Centre
Re: [Intel-gfx] [PATCH v4 3/8] drm/i915: Kill off intel_crtc->atomic.wait_vblank, v4.
Op 18-02-16 om 15:14 schreef Zanoni, Paulo R: > Em Qui, 2016-02-18 às 14:22 +0100, Maarten Lankhorst escreveu: >> Op 17-02-16 om 22:20 schreef Zanoni, Paulo R: >>> Em Qua, 2016-02-10 às 13:49 +0100, Maarten Lankhorst escreveu: Currently we perform our own wait in post_plane_update, but the atomic core performs another one in wait_for_vblanks. This means that 2 vblanks are done when a fb is changed, which is a bit overkill. Merge them by creating a helper function that takes a crtc mask for the planes to wait on. The broadwell vblank workaround may look gone entirely but this is not the case. pipe_config->wm_changed is set to true when any plane is turned on, which forces a vblank wait. Changes since v1: - Removing the double vblank wait on broadwell moved to its own commit. Changes since v2: - Move out POWER_DOMAIN_MODESET handling to its own commit. Changes since v3: - Do not wait for vblank on legacy cursor updates. (Ville) - Move broadwell vblank workaround comment to page_flip_finished. (Ville) Changes since v4: - Compile fix, legacy_cursor_flip -> *_update. 
Signed-off-by: Maarten Lankhorst--- drivers/gpu/drm/i915/intel_atomic.c | 1 + drivers/gpu/drm/i915/intel_display.c | 86 +++- drivers/gpu/drm/i915/intel_drv.h | 2 +- 3 files changed, 67 insertions(+), 22 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_atomic.c b/drivers/gpu/drm/i915/intel_atomic.c index 4625f8a9ba12..8e579a8505ac 100644 --- a/drivers/gpu/drm/i915/intel_atomic.c +++ b/drivers/gpu/drm/i915/intel_atomic.c @@ -97,6 +97,7 @@ intel_crtc_duplicate_state(struct drm_crtc *crtc) crtc_state->disable_lp_wm = false; crtc_state->disable_cxsr = false; crtc_state->wm_changed = false; + crtc_state->fb_changed = false; return _state->base; } diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index 804f2c6f260d..4d4dddc1f970 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -4785,9 +4785,6 @@ static void intel_post_plane_update(struct intel_crtc *crtc) to_intel_crtc_state(crtc->base.state); struct drm_device *dev = crtc->base.dev; - if (atomic->wait_vblank) - intel_wait_for_vblank(dev, crtc->pipe); - intel_frontbuffer_flip(dev, atomic->fb_bits); crtc->wm.cxsr_allowed = true; @@ -10902,6 +10899,12 @@ static bool page_flip_finished(struct intel_crtc *crtc) return true; /* + * BDW signals flip done immediately if the plane + * is disabled, even if the plane enable is already + * armed to occur at the next vblank :( + */ >>> Having this comment here is just... weird. I think it removes a lot >>> of >>> the context that was present before. >>> + + /* * A DSPSURFLIVE check isn't enough in case the mmio and CS flips * used the same base address. In that case the mmio flip might * have completed, but the CS hasn't even executed the flip yet. 
@@ -11778,6 +11781,9 @@ int intel_plane_atomic_calc_changes(struct drm_crtc_state *crtc_state, if (!was_visible && !visible) return 0; + if (fb != old_plane_state->base.fb) + pipe_config->fb_changed = true; + turn_off = was_visible && (!visible || mode_changed); turn_on = visible && (!was_visible || mode_changed); @@ -11793,8 +11799,6 @@ int intel_plane_atomic_calc_changes(struct drm_crtc_state *crtc_state, /* must disable cxsr around plane enable/disable */ if (plane->type != DRM_PLANE_TYPE_CURSOR) { - if (is_crtc_enabled) - intel_crtc->atomic.wait_vblank = true; pipe_config->disable_cxsr = true; } >>> We could have killed the brackets here :) >> Indeed, will do so in next version. } else if (intel_wm_need_update(plane, plane_state)) { @@ -11810,14 +11814,6 @@ int intel_plane_atomic_calc_changes(struct drm_crtc_state *crtc_state, intel_crtc->atomic.post_enable_primary = turn_on; intel_crtc->atomic.update_fbc = true; - /* - * BDW signals flip done immediately if the plane - * is disabled, even if the plane enable is already - * armed to occur at the next vblank :( - */ - if (turn_on && IS_BROADWELL(dev)) -
[Intel-gfx] [PATCH v5 06/35] drm/i915: Start of GPU scheduler
From: John HarrisonInitial creation of scheduler source files. Note that this patch implements most of the scheduler functionality but does not hook it in to the driver yet. It also leaves the scheduler code in 'pass through' mode so that even when it is hooked in, it will not actually do very much. This allows the hooks to be added one at a time in bite size chunks and only when the scheduler is finally enabled at the end does anything start happening. The general theory of operation is that when batch buffers are submitted to the driver, the execbuffer() code packages up all the information required to execute the batch buffer at a later time. This package is given over to the scheduler which adds it to an internal node list. The scheduler also scans the list of objects associated with the batch buffer and compares them against the objects already in use by other buffers in the node list. If matches are found then the new batch buffer node is marked as being dependent upon the matching node. The same is done for the context object. The scheduler also bumps up the priority of such matching nodes on the grounds that the more dependencies a given batch buffer has the more important it is likely to be. The scheduler aims to have a given (tuneable) number of batch buffers in flight on the hardware at any given time. If fewer than this are currently executing when a new node is queued, then the node is passed straight through to the submit function. Otherwise it is simply added to the queue and the driver returns back to user land. The scheduler is notified when each batch buffer completes and updates its internal tracking accordingly. At the end of the completion interrupt processing, if any scheduler tracked batches were processed, the scheduler's deferred worker thread is woken up. 
This can do more involved processing such as actually removing completed nodes from the queue and freeing up the resources associated with them (internal memory allocations, DRM object references, context reference, etc.). The work handler also checks the in-flight count and calls the submission code if a new slot has appeared. When the scheduler's submit code is called, it scans the queued node list for the highest priority node that has no unmet dependencies. Note that the dependency calculation is complex as it must take inter-ring dependencies and potential preemptions into account. Note also that in the future this will be extended to include external dependencies such as the Android Native Sync file descriptors and/or the Linux dma-buf synchronisation scheme. If a suitable node is found then it is sent to execbuff_final() for submission to the hardware. The in-flight count is then re-checked and a new node popped from the list if appropriate. All nodes that are not submitted have their priority bumped. This ensures that low priority tasks do not get starved out by busy higher priority ones - everything will eventually get its turn to run. Note that this patch does not implement pre-emptive scheduling. Only basic scheduling by re-ordering batch buffer submission is currently implemented. Pre-emption of actively executing batch buffers comes in the next patch series.

v2: Changed priority levels to +/-1023 due to feedback from Chris Wilson. Removed redundant index from scheduler node. Changed time stamps to use jiffies instead of raw monotonic. This provides lower resolution but improved compatibility with other i915 code. Major re-write of completion tracking code due to struct fence conversion. The scheduler no longer has its own private IRQ handler but just lets the existing request code handle completion events. Instead, the scheduler now hooks into the request notify code to be told when a request has completed. Reduced driver mutex locking scope.
Removal of scheduler nodes no longer grabs the mutex lock.

v3: Refactor of dependency generation to make the code more readable. Also added in read-read optimisation support - i.e., don't treat a shared read-only buffer as being a dependency. Allowed the killing of queued nodes rather than only flying ones.

v4: Updated the commit message to better reflect the current state of the code. Downgraded some BUG_ONs to WARN_ONs. Used the correct array memory allocator function (kmalloc_array instead of kmalloc). Corrected the format of some comments. Wrapped some lines differently to keep the style checker happy. Fixed a WARN_ON when killing nodes. The dependency removal code checks that nodes being destroyed do not have any outstanding dependencies (which would imply they should not have been executed yet). In the case of nodes being destroyed, e.g. due to context banning, then this might well be the case - they have not been executed and do indeed have outstanding dependencies. Re-instated the code to disable interrupts when not in use. The underlying problem causing broken IRQ reference counts seems to have been fixed now.

v5: Shuffled
[Intel-gfx] [PATCH v6 6/7] drm/i915: Updated request structure tracing
From: John Harrison

Added the '_complete' trace event which occurs when a fence/request is signaled as complete. Also moved the notify event from the IRQ handler code to inside the notify function itself.

v3: Added the current ring seqno to the notify trace point.

v5: Line wrapping to keep the style checker happy.

For: VIZ-5190
Signed-off-by: John Harrison
---
 drivers/gpu/drm/i915/i915_gem.c   |  9 +++--
 drivers/gpu/drm/i915/i915_irq.c   |  2 --
 drivers/gpu/drm/i915/i915_trace.h | 14 +-
 3 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 635729e..f7858ea 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2870,13 +2870,16 @@ void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked)
 	unsigned long flags;
 	u32 seqno;
 
-	if (list_empty(&ring->fence_signal_list))
+	if (list_empty(&ring->fence_signal_list)) {
+		trace_i915_gem_request_notify(ring, 0);
 		return;
+	}
 
 	if (!fence_locked)
 		spin_lock_irqsave(&ring->fence_lock, flags);
 
 	seqno = ring->get_seqno(ring, false);
+	trace_i915_gem_request_notify(ring, seqno);
 
 	list_for_each_entry_safe(req, req_next, &ring->fence_signal_list, signal_link) {
 		if (!req->cancelled) {
@@ -2890,8 +2893,10 @@ void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked)
 			 */
 			list_del_init(&req->signal_link);
 
-			if (!req->cancelled)
+			if (!req->cancelled) {
 				fence_signal_locked(&req->fence);
+				trace_i915_gem_request_complete(req);
+			}
 
 			if (req->irq_enabled) {
 				req->ring->irq_put(req->ring);
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index a5f64aa..20c6a90 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -999,8 +999,6 @@ static void notify_ring(struct intel_engine_cs *ring)
 	if (!intel_ring_initialized(ring))
 		return;
 
-	trace_i915_gem_request_notify(ring);
-
 	i915_gem_request_notify(ring, false);
 
 	wake_up_all(&ring->irq_queue);
diff --git a/drivers/gpu/drm/i915/i915_trace.h
b/drivers/gpu/drm/i915/i915_trace.h
index 52b2d40..cfe4f03 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -561,23 +561,27 @@ DEFINE_EVENT(i915_gem_request, i915_gem_request_add,
 );
 
 TRACE_EVENT(i915_gem_request_notify,
-	    TP_PROTO(struct intel_engine_cs *ring),
-	    TP_ARGS(ring),
+	    TP_PROTO(struct intel_engine_cs *ring, uint32_t seqno),
+	    TP_ARGS(ring, seqno),
 
 	    TP_STRUCT__entry(
 			     __field(u32, dev)
 			     __field(u32, ring)
 			     __field(u32, seqno)
+			     __field(bool, is_empty)
 			     ),
 
 	    TP_fast_assign(
 			   __entry->dev = ring->dev->primary->index;
 			   __entry->ring = ring->id;
-			   __entry->seqno = ring->get_seqno(ring, false);
+			   __entry->seqno = seqno;
+			   __entry->is_empty =
+				list_empty(&ring->fence_signal_list);
 			   ),
 
-	    TP_printk("dev=%u, ring=%u, seqno=%u",
-		      __entry->dev, __entry->ring, __entry->seqno)
+	    TP_printk("dev=%u, ring=%u, seqno=%u, empty=%d",
+		      __entry->dev, __entry->ring, __entry->seqno,
+		      __entry->is_empty)
 );
 
 DEFINE_EVENT(i915_gem_request, i915_gem_request_retire,
--
1.9.1
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH v6 2/7] drm/i915: Removed now-redundant parameter to i915_gem_request_completed()
From: John Harrison

The change to the implementation of i915_gem_request_completed() means that the lazy coherency flag is no longer used. This can now be removed to simplify the interface.

v6: Updated to newer nightly and resolved conflicts.

For: VIZ-5190
Signed-off-by: John Harrison
--- drivers/gpu/drm/i915/i915_debugfs.c | 2 +- drivers/gpu/drm/i915/i915_drv.h | 3 +-- drivers/gpu/drm/i915/i915_gem.c | 14 +++--- drivers/gpu/drm/i915/intel_display.c | 2 +- drivers/gpu/drm/i915/intel_pm.c | 4 ++-- 5 files changed, 12 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index d032e9f..b90d6ea 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -601,7 +601,7 @@ static int i915_gem_pageflip_info(struct seq_file *m, void *data) i915_gem_request_get_seqno(work->flip_queued_req), dev_priv->next_seqno, ring->get_seqno(ring, true), - i915_gem_request_completed(work->flip_queued_req, true)); + i915_gem_request_completed(work->flip_queued_req)); } else seq_printf(m, "Flip not associated with any ring\n"); seq_printf(m, "Flip queued on frame %d, (was ready on frame %d), now %d\n", diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 7c64cc1..86ef0b4 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2295,8 +2295,7 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring, struct drm_i915_gem_request **req_out); void i915_gem_request_cancel(struct drm_i915_gem_request *req); -static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req, - bool lazy_coherency) +static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req) { return fence_is_signaled(&req->fence); } diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 901be6c..e170732 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1273,7 +1273,7 @@ int
__i915_wait_request(struct drm_i915_gem_request *req, if (list_empty(&req->list)) return 0; - if (i915_gem_request_completed(req, true)) + if (i915_gem_request_completed(req)) return 0; timeout_expire = 0; @@ -1323,7 +1323,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req, break; } - if (i915_gem_request_completed(req, false)) { + if (i915_gem_request_completed(req)) { ret = 0; break; } @@ -2825,7 +2825,7 @@ i915_gem_find_active_request(struct intel_engine_cs *ring) struct drm_i915_gem_request *request; list_for_each_entry(request, &ring->request_list, list) { - if (i915_gem_request_completed(request, false)) + if (i915_gem_request_completed(request)) continue; return request; @@ -2959,7 +2959,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring) struct drm_i915_gem_request, list); - if (!i915_gem_request_completed(request, true)) + if (!i915_gem_request_completed(request)) break; i915_gem_request_retire(request); @@ -2983,7 +2983,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring) } if (unlikely(ring->trace_irq_req && -i915_gem_request_completed(ring->trace_irq_req, true))) { +i915_gem_request_completed(ring->trace_irq_req))) { ring->irq_put(ring); i915_gem_request_assign(&ring->trace_irq_req, NULL); } @@ -3093,7 +3093,7 @@ i915_gem_object_flush_active(struct drm_i915_gem_object *obj) if (list_empty(&req->list)) goto retire; - if (i915_gem_request_completed(req, true)) { + if (i915_gem_request_completed(req)) { __i915_gem_request_retire__upto(req); retire: i915_gem_object_retire__read(obj, i); @@ -3205,7 +3205,7 @@ __i915_gem_object_sync(struct drm_i915_gem_object *obj, if (to == from) return 0; - if (i915_gem_request_completed(from_req, true)) + if (i915_gem_request_completed(from_req)) return 0; if (!i915_semaphore_is_enabled(obj->base.dev)) { diff
[Intel-gfx] [PATCH v6 3/7] drm/i915: Add per context timelines to fence object
From: John Harrison

The fence object used inside the request structure requires a sequence number. Although this is not used by the i915 driver itself, it could potentially be used by non-i915 code if the fence is passed outside of the driver. This is the intention as it allows external kernel drivers and user applications to wait on batch buffer completion asynchronously via the dma-buf fence API. To ensure that such external users are not confused by strange things happening with the seqno, this patch adds in a per context timeline that can provide a guaranteed in-order seqno value for the fence. This is safe because the scheduler will not re-order batch buffers within a context - they are considered to be mutually dependent.

v2: New patch in series.

v3: Renamed/retyped timeline structure fields after review comments by Tvrtko Ursulin. Added context information to the timeline's name string for better identification in debugfs output.

v5: Line wrapping and other white space fixes to keep style checker happy.

For: VIZ-5190
Signed-off-by: John Harrison
Cc: Tvrtko Ursulin
--- drivers/gpu/drm/i915/i915_drv.h | 25 +++--- drivers/gpu/drm/i915/i915_gem.c | 83 + drivers/gpu/drm/i915/i915_gem_context.c | 16 ++- drivers/gpu/drm/i915/intel_lrc.c| 8 drivers/gpu/drm/i915/intel_ringbuffer.h | 1 - 5 files changed, 115 insertions(+), 18 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 86ef0b4..62dbdf2 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -845,6 +845,15 @@ struct i915_ctx_hang_stats { bool banned; };
+struct i915_fence_timeline {
+	char name[32];
+	unsigned fence_context;
+	unsigned next;
+
+	struct intel_context *ctx;
+	struct intel_engine_cs *ring;
+};
+
 /* This must match up with the value previously used for execbuf2.rsvd1.
*/ #define DEFAULT_CONTEXT_HANDLE 0 @@ -892,6 +901,7 @@ struct intel_context { struct i915_vma *lrc_vma; u64 lrc_desc; uint32_t *lrc_reg_state; + struct i915_fence_timeline fence_timeline; } engine[I915_NUM_RINGS]; struct list_head link; @@ -2200,13 +2210,10 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old, struct drm_i915_gem_request { /** * Underlying object for implementing the signal/wait stuff. -* NB: Never call fence_later() or return this fence object to user -* land! Due to lazy allocation, scheduler re-ordering, pre-emption, -* etc., there is no guarantee at all about the validity or -* sequentiality of the fence's seqno! It is also unsafe to let -* anything outside of the i915 driver get hold of the fence object -* as the clean up when decrementing the reference count requires -* holding the driver mutex lock. +* NB: Never return this fence object to user land! It is unsafe to +* let anything outside of the i915 driver get hold of the fence +* object as the clean up when decrementing the reference count +* requires holding the driver mutex lock. 
*/ struct fence fence; @@ -2295,6 +2302,10 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring, struct drm_i915_gem_request **req_out); void i915_gem_request_cancel(struct drm_i915_gem_request *req); +int i915_create_fence_timeline(struct drm_device *dev, + struct intel_context *ctx, + struct intel_engine_cs *ring); + static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req) { return fence_is_signaled(>fence); diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index e170732..2d50287 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -2731,9 +2731,35 @@ static const char *i915_gem_request_get_driver_name(struct fence *req_fence) static const char *i915_gem_request_get_timeline_name(struct fence *req_fence) { - struct drm_i915_gem_request *req = container_of(req_fence, -typeof(*req), fence); - return req->ring->name; + struct drm_i915_gem_request *req; + struct i915_fence_timeline *timeline; + + req = container_of(req_fence, typeof(*req), fence); + timeline = >ctx->engine[req->ring->id].fence_timeline; + + return timeline->name; +} + +static void i915_gem_request_timeline_value_str(struct fence *req_fence, + char *str, int size) +{ + struct drm_i915_gem_request *req; + + req = container_of(req_fence, typeof(*req), fence); + + /* Last signalled timeline value ??? */ +
[Intel-gfx] [PATCH v6 5/7] drm/i915: Interrupt driven fences
From: John Harrison

The intended usage model for struct fence is that the signalled status should be set on demand rather than polled. That is, there should not be a need for a 'signaled' function to be called every time the status is queried. Instead, 'something' should be done to enable a signal callback from the hardware which will update the state directly. In the case of requests, this is the seqno update interrupt. The idea is that this callback will only be enabled on demand when something actually tries to wait on the fence. This change removes the polling test and replaces it with the callback scheme. Each fence is added to a 'please poke me' list at the start of i915_add_request(). The interrupt handler then scans through the 'poke me' list when a new seqno pops out and signals any matching fence/request. The fence is then removed from the list so the entire request stack does not need to be scanned every time. Note that the fence is added to the list before the commands to generate the seqno interrupt are added to the ring. Thus the sequence is guaranteed to be race free if the interrupt is already enabled. Note that the interrupt is only enabled on demand (i.e. when __wait_request() is called). Thus there is still a potential race when enabling the interrupt as the request may already have completed. However, this is simply solved by calling the interrupt processing code immediately after enabling the interrupt and thereby checking for already completed requests. Lastly, the ring clean up code has the possibility to cancel outstanding requests (e.g. because TDR has reset the ring). These requests will never get signalled and so must be removed from the signal list manually. This is done by setting a 'cancelled' flag and then calling the regular notify/retire code path rather than attempting to duplicate the list manipulation and clean up code in multiple places.
This also avoids any race condition where the cancellation request might occur after/during the completion interrupt actually arriving.

v2: Updated to take advantage of the request unreference no longer requiring the mutex lock.

v3: Move the signal list processing around to prevent unsubmitted requests being added to the list. This was occurring on Android because the native sync implementation calls the fence->enable_signalling API immediately on fence creation. Updated after review comments by Tvrtko Ursulin. Renamed list nodes to 'link' instead of 'list'. Added support for returning an error code on a cancelled fence. Update list processing to be more efficient/safer with respect to spinlocks.

v5: Made i915_gem_request_submit a static as it is only ever called from one place. Fixed up the low latency wait optimisation. The time delay between the seqno value being written to memory and the driver's ISR running can be significant, at least for the wait request micro-benchmark. This can be greatly improved by explicitly checking for seqno updates in the pre-wait busy poll loop. Also added some documentation comments to the busy poll code. Fixed up support for the faking of lost interrupts (test_irq_rings/missed_irq_rings). That is, there is an IGT test that tells the driver to lose interrupts deliberately and then check that everything still works as expected (albeit much slower). Updates from review comments: use non IRQ-save spinlocking, early exit on WARN and improved comments (Tvrtko Ursulin).

v6: Updated to newer nightly and resolved conflicts around the wait_request busy spin optimisation. Also fixed a race condition between this early exit path and the regular completion path.
For: VIZ-5190 Signed-off-by: John Harrison Cc: Tvrtko Ursulin --- drivers/gpu/drm/i915/i915_drv.h | 8 ++ drivers/gpu/drm/i915/i915_gem.c | 240 +--- drivers/gpu/drm/i915/i915_irq.c | 2 + drivers/gpu/drm/i915/intel_lrc.c| 2 + drivers/gpu/drm/i915/intel_ringbuffer.c | 2 + drivers/gpu/drm/i915/intel_ringbuffer.h | 2 + 6 files changed, 234 insertions(+), 22 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 2c6aefba..0584846 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2210,7 +2210,12 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old, struct drm_i915_gem_request { /** Underlying object for implementing the signal/wait stuff. */ struct fence fence; + struct list_head signal_link; + struct list_head unsignal_link; struct list_head delayed_free_link; + bool cancelled; + bool irq_enabled; + bool signal_requested; /** On Which ring this request was generated */ struct drm_i915_private *i915; @@ -2296,6 +2301,9 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring, struct intel_context *ctx,
[Intel-gfx] [PATCH v6 1/7] drm/i915: Convert requests to use struct fence
From: John Harrison

There is a construct in the Linux kernel called 'struct fence' that is intended to keep track of work that is executed on hardware. I.e. it solves the basic problem that the driver's 'struct drm_i915_gem_request' is trying to address. The request structure does quite a lot more than simply track the execution progress so is very definitely still required. However, the basic completion status side could be updated to use the ready made fence implementation and gain all the advantages that provides. This patch makes the first step of integrating a struct fence into the request. It replaces the explicit reference count with that of the fence. It also replaces the 'is completed' test with the fence's equivalent. Currently, that simply chains on to the original request implementation. A future patch will improve this.

v3: Updated after review comments by Tvrtko Ursulin. Added fence context/seqno pair to the debugfs request info. Renamed fence 'driver name' to just 'i915'. Removed BUG_ONs.

v5: Changed seqno format in debugfs to %x rather than %u as that is apparently the preferred appearance. Line wrapped some long lines to keep the style checker happy.

v6: Updated to newer nightly and resolved conflicts. The biggest issue was with the re-worked busy spin precursor to waiting on a request. In particular, the addition of a 'request_started' helper function. This has no corresponding concept within the fence framework. However, it is only ever used in one place and the whole point of that place is to always directly read the seqno for absolutely lowest latency possible. So the simple solution is to just make the seqno test explicit at that point now rather than later in the series (it was previously being done anyway when fences become interrupt driven).
For: VIZ-5190 Signed-off-by: John Harrison Cc: Tvrtko Ursulin Reviewed-by: Jesse Barnes --- drivers/gpu/drm/i915/i915_debugfs.c | 5 ++- drivers/gpu/drm/i915/i915_drv.h | 47 +++ drivers/gpu/drm/i915/i915_gem.c | 67 + drivers/gpu/drm/i915/intel_lrc.c| 1 + drivers/gpu/drm/i915/intel_ringbuffer.c | 1 + drivers/gpu/drm/i915/intel_ringbuffer.h | 3 ++ 6 files changed, 89 insertions(+), 35 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index ebe7063..d032e9f 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -709,11 +709,12 @@ static int i915_gem_request_info(struct seq_file *m, void *data) task = NULL; if (req->pid) task = pid_task(req->pid, PIDTYPE_PID); - seq_printf(m, "%x @ %d: %s [%d]\n", + seq_printf(m, "%x @ %d: %s [%d], fence = %x:%x\n", req->seqno, (int) (jiffies - req->emitted_jiffies), task ? task->comm : "", - task ? task->pid : -1); + task ? task->pid : -1, + req->fence.context, req->fence.seqno); rcu_read_unlock(); } diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 351308f..7c64cc1 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -53,6 +53,7 @@ #include #include #include "intel_guc.h" +#include /* General customization: */ @@ -2197,7 +2198,17 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old, * initial reference taken using kref_init */ struct drm_i915_gem_request { - struct kref ref; + /** +* Underlying object for implementing the signal/wait stuff. +* NB: Never call fence_later() or return this fence object to user +* land! Due to lazy allocation, scheduler re-ordering, pre-emption, +* etc., there is no guarantee at all about the validity or +* sequentiality of the fence's seqno! It is also unsafe to let +* anything outside of the i915 driver get hold of the fence object +* as the clean up when decrementing the reference count requires +* holding the driver mutex lock. 
+*/ + struct fence fence; /** On Which ring this request was generated */ struct drm_i915_private *i915; @@ -2283,7 +2294,13 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring, struct intel_context *ctx, struct drm_i915_gem_request **req_out); void i915_gem_request_cancel(struct drm_i915_gem_request *req); -void i915_gem_request_free(struct kref *req_ref); + +static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
Re: [Intel-gfx] [PATCH v4 07/38] drm/i915: Start of GPU scheduler
On 20/01/2016 13:18, Joonas Lahtinen wrote: Hi, Comments below this pre text. Many of the comments are related to the indent and style of the code. That stuff is important to fix for future maintainability. In order for the future review to be more effective, I'd like to next see a v5 of the series where the code quality concerns have been addressed, patches squashed to be actual reviewable chunks and appropriate kerneldoc being added. To give an idea of proper slicing of patches, first produce a no-op scheduler, adding the extra function calls where needed and still keeping the scheduling completely linear. Second patch could introduce out of order submitting, third one priority bumping, fourth pre-empting and so on. That way, each patch extends the functionality and is itself already mergeable. That way I've been able to go through and understand the existing code, and I can actually review (other than just nag about indent and coding style) if the changes are appropriate to bring in the functionality desired. In the current split, for me or anyone who did not participate writing the code, it is otherwise too confusing to try to guess what future changes might make each piece of code make sense, and which will be redundant in the future too. There is no value in splitting code to chunks that are not itself functional. Regards, Joonas On Mon, 2016-01-11 at 18:42 +, john.c.harri...@intel.com wrote: From: John HarrisonInitial creation of scheduler source files. Note that this patch implements most of the scheduler functionality but does not hook it in to the driver yet. It also leaves the scheduler code in 'pass through' mode so that even when it is hooked in, it will not actually do very much. This allows the hooks to be added one at a time in bite size chunks and only when the scheduler is finally enabled at the end does anything start happening. 
The general theory of operation is that when batch buffers are submitted to the driver, the execbuffer() code packages up all the information required to execute the batch buffer at a later time. This package is given over to the scheduler which adds it to an internal node list. The scheduler also scans the list of objects associated with the batch buffer and compares them against the objects already in use by other buffers in the node list. If matches are found then the new batch buffer node is marked as being dependent upon the matching node. The same is done for the context object. The scheduler also bumps up the priority of such matching nodes on the grounds that the more dependencies a given batch buffer has the more important it is likely to be. The scheduler aims to have a given (tuneable) number of batch buffers in flight on the hardware at any given time. If fewer than this are currently executing when a new node is queued, then the node is passed straight through to the submit function. Otherwise it is simply added to the queue and the driver returns back to user land. The scheduler is notified when each batch buffer completes and updates its internal tracking accordingly. At the end of the completion interrupt processing, if any scheduler tracked batches were processed, the scheduler's deferred worker thread is woken up. This can do more involved processing such as actually removing completed nodes from the queue and freeing up the resources associated with them (internal memory allocations, DRM object references, context reference, etc.). The work handler also checks the in flight count and calls the submission code if a new slot has appeared. When the scheduler's submit code is called, it scans the queued node list for the highest priority node that has no unmet dependencies. Note that the dependency calculation is complex as it must take inter-ring dependencies and potential preemptions into account. 
Note also that in the future this will be extended to include external dependencies such as the Android Native Sync file descriptors and/or the linux dma-buff synchronisation scheme. If a suitable node is found then it is sent to execbuff_final() for submission to the hardware. The in flight count is then re-checked and a new node popped from the list if appropriate. Note that this patch does not implement pre-emptive scheduling. Only basic scheduling by re-ordering batch buffer submission is currently implemented. Pre-emption of actively executing batch buffers comes in the next patch series. v2: Changed priority levels to +/-1023 due to feedback from Chris Wilson. Removed redundant index from scheduler node. Changed time stamps to use jiffies instead of raw monotonic. This provides lower resolution but improved compatibility with other i915 code. Major re-write of completion tracking code due to struct fence conversion. The scheduler no longer has it's own private IRQ handler but just lets the existing request code handle completion events. Instead, the scheduler now hooks into the request notify
[Intel-gfx] [PATCH v6 0/7] Convert requests to use struct fence
From: John Harrison

There is a construct in the Linux kernel called 'struct fence' that is intended to keep track of work that is executed on hardware. I.e. it solves the basic problem that the driver's 'struct drm_i915_gem_request' is trying to address. The request structure does quite a lot more than simply track the execution progress so is very definitely still required. However, the basic completion status side could be updated to use the ready made fence implementation and gain all the advantages that provides. Using the struct fence object also has the advantage that the fence can be used outside of the i915 driver (by other drivers or by userland applications). That is the basis of the dma-buf synchronisation API and allows asynchronous tracking of work completion. In this case, it allows applications to be signalled directly when a batch buffer completes without having to make an IOCTL call into the driver. This is work that was planned since the conversion of the driver from being seqno value based to being request structure based. This patch series does that work. An IGT test to exercise the fence support from user land is in progress and will follow. Android already makes extensive use of fences for display composition. Real world linux usage is planned in the form of Jesse's page table sharing / bufferless execbuf support. There is also a plan that Wayland (and others) could make use of it in a similar manner to Android.

v2: Updated for review comments by various people and to add support for Android style 'native sync'.

v3: Updated from review comments by Tvrtko Ursulin. Also moved sync framework out of staging and improved request completion handling.

v4: Fixed patch tag (should have been PATCH not RFC). Corrected ownership of one patch which had passed through many hands before reaching me. Fixed a bug introduced in v3 and updated for review comments.

v5: Removed de-staging and further updates to Android sync code.
The de-stage is now being handled by someone else. The sync integration to the i915 driver will be a separate patch set that can only land after the external de-stage has been completed. Assorted changes based on review comments and style checker fixes. Most significant change is fixing up the fake lost interrupt support for the 'drv_missed_irq_hang' IGT test and improving the wait request latency. v6: Updated to newer nightly and resolved conflicts around updates to the wait_request optimisations. [Patches against drm-intel-nightly tree fetched 19/01/2016] John Harrison (7): drm/i915: Convert requests to use struct fence drm/i915: Removed now redundant parameter to i915_gem_request_completed() drm/i915: Add per context timelines to fence object drm/i915: Delay the freeing of requests until retire time drm/i915: Interrupt driven fences drm/i915: Updated request structure tracing drm/i915: Cache last IRQ seqno to reduce IRQ overhead drivers/gpu/drm/i915/i915_debugfs.c | 7 +- drivers/gpu/drm/i915/i915_drv.h | 69 +++--- drivers/gpu/drm/i915/i915_gem.c | 423 +--- drivers/gpu/drm/i915/i915_gem_context.c | 16 +- drivers/gpu/drm/i915/i915_irq.c | 2 +- drivers/gpu/drm/i915/i915_trace.h | 14 +- drivers/gpu/drm/i915/intel_display.c| 4 +- drivers/gpu/drm/i915/intel_lrc.c| 13 + drivers/gpu/drm/i915/intel_pm.c | 6 +- drivers/gpu/drm/i915/intel_ringbuffer.c | 5 + drivers/gpu/drm/i915/intel_ringbuffer.h | 12 + 11 files changed, 491 insertions(+), 80 deletions(-) -- 1.9.1 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
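The completion-tracking model the cover letter describes can be sketched in miniature: a refcounted object whose last reference-put runs a release callback, and whose completion is reported via a signalled flag. This is an illustrative userspace analogue, not the kernel's actual 'struct fence' API; all names here are invented:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Miniature analogue of a refcounted fence: the last reference put runs a
 * release callback, and completion is reported via a 'signaled' flag. */
struct mini_fence {
    int refcount;
    bool signaled;
    void (*release)(struct mini_fence *f); /* invoked when refcount hits 0 */
};

static void mini_fence_init(struct mini_fence *f,
                            void (*release)(struct mini_fence *))
{
    f->refcount = 1;   /* the creator starts with one reference */
    f->signaled = false;
    f->release  = release;
}

static void mini_fence_get(struct mini_fence *f) { f->refcount++; }

static void mini_fence_put(struct mini_fence *f)
{
    if (--f->refcount == 0 && f->release)
        f->release(f);
}

/* The producer (e.g. request completion) signals the fence exactly once;
 * waiters observe the flag rather than polling a raw seqno. */
static void mini_fence_signal(struct mini_fence *f) { f->signaled = true; }
```

The key property the series relies on is that signalling and reference management are decoupled: external users can hold a reference and wait on the flag without knowing anything about the driver's locking.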
[Intel-gfx] [PATCH v6 4/7] drm/i915: Delay the freeing of requests until retire time
From: John Harrison

The request structure is reference counted. When the count reached zero, the request was immediately freed and all associated objects were unreferenced/deallocated. This meant that the driver mutex lock had to be held at the point where the count reached zero. This was fine while all references were held internally to the driver. However, the plan is to allow the underlying fence object (and hence the request itself) to be returned to other drivers and to userland. External users cannot be expected to acquire a driver private mutex lock. Rather than attempt to disentangle the request structure from the driver mutex lock, the decision was to defer the free code until a later (safer) point. Hence this patch changes the unreference callback to merely move the request onto a delayed free list. The driver's retire worker thread will then process the list and actually call the free function on the requests. v2: New patch in series. v3: Updated after review comments by Tvrtko Ursulin. Rename list nodes to 'link' rather than 'list'. Update list processing to be more efficient/safer with respect to spinlocks. v4: Changed to use basic spinlocks rather than IRQ ones - missed update from earlier feedback by Tvrtko. v5: Improved a comment to keep the style checker happy. 
For: VIZ-5190 Signed-off-by: John Harrison Cc: Tvrtko Ursulin --- drivers/gpu/drm/i915/i915_drv.h | 22 +++- drivers/gpu/drm/i915/i915_gem.c | 37 + drivers/gpu/drm/i915/intel_display.c| 2 +- drivers/gpu/drm/i915/intel_lrc.c| 2 ++ drivers/gpu/drm/i915/intel_pm.c | 2 +- drivers/gpu/drm/i915/intel_ringbuffer.c | 2 ++ drivers/gpu/drm/i915/intel_ringbuffer.h | 7 +++ 7 files changed, 49 insertions(+), 25 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 62dbdf2..2c6aefba 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2208,14 +2208,9 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old, * initial reference taken using kref_init */ struct drm_i915_gem_request { - /** -* Underlying object for implementing the signal/wait stuff. -* NB: Never return this fence object to user land! It is unsafe to -* let anything outside of the i915 driver get hold of the fence -* object as the clean up when decrementing the reference count -* requires holding the driver mutex lock. -*/ + /** Underlying object for implementing the signal/wait stuff. 
*/ struct fence fence; + struct list_head delayed_free_link; /** On Which ring this request was generated */ struct drm_i915_private *i915; @@ -2337,21 +2332,10 @@ i915_gem_request_reference(struct drm_i915_gem_request *req) static inline void i915_gem_request_unreference(struct drm_i915_gem_request *req) { - WARN_ON(!mutex_is_locked(&req->ring->dev->struct_mutex)); - fence_put(&req->fence); -} - -static inline void -i915_gem_request_unreference__unlocked(struct drm_i915_gem_request *req) -{ - struct drm_device *dev; - if (!req) return; - dev = req->ring->dev; - if (kref_put_mutex(&req->fence.refcount, fence_release, &dev->struct_mutex)) - mutex_unlock(&dev->struct_mutex); + fence_put(&req->fence); } static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst, diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 2d50287..aca9fcd 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -2683,10 +2683,26 @@ static void i915_set_reset_status(struct drm_i915_private *dev_priv, } } -static void i915_gem_request_free(struct fence *req_fence) +static void i915_gem_request_release(struct fence *req_fence) { struct drm_i915_gem_request *req = container_of(req_fence, typeof(*req), fence); + struct intel_engine_cs *ring = req->ring; + struct drm_i915_private *dev_priv = to_i915(ring->dev); + + /* +* Need to add the request to a deferred dereference list to be +* processed at a mutex lock safe time. 
+*/ + spin_lock(&ring->delayed_free_lock); + list_add_tail(&req->delayed_free_link, &ring->delayed_free_list); + spin_unlock(&ring->delayed_free_lock); + + queue_delayed_work(dev_priv->wq, &dev_priv->mm.retire_work, 0); +} + +static void i915_gem_request_free(struct drm_i915_gem_request *req) +{ struct intel_context *ctx = req->ctx; WARN_ON(!mutex_is_locked(&req->ring->dev->struct_mutex)); @@ -2766,7 +2782,7 @@ static const struct fence_ops i915_gem_request_fops = { .enable_signaling = i915_gem_request_enable_signaling, .signaled = i915_gem_request_is_completed, .wait
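The deferred-free dance above can be modelled in plain C: the final unreference only parks the request on a per-ring list, and a later retire pass (which in the driver runs with struct_mutex held) does the actual freeing. This is a simplified, single-threaded sketch with invented names; the real code also takes delayed_free_lock around the list operations and kicks the retire worker:

```c
#include <assert.h>
#include <stddef.h>

struct sk_request {
    int refcount;
    int freed;                    /* stands in for the real free work */
    struct sk_request *delayed_free_link;
};

struct sk_ring {
    struct sk_request *delayed_free_list; /* spinlock-protected in the driver */
};

/* The last put does NOT free: it queues the request on the ring's list,
 * which is safe from any context (no driver mutex needed). */
static void sk_request_unreference(struct sk_ring *ring, struct sk_request *req)
{
    if (--req->refcount == 0) {
        req->delayed_free_link = ring->delayed_free_list;
        ring->delayed_free_list = req;
    }
}

/* Retire-time pass: the only place requests are really freed; returns how
 * many were processed. */
static int sk_ring_drain_delayed_free(struct sk_ring *ring)
{
    int n = 0;
    while (ring->delayed_free_list) {
        struct sk_request *req = ring->delayed_free_list;
        ring->delayed_free_list = req->delayed_free_link;
        req->freed = 1;           /* i915_gem_request_free() equivalent */
        n++;
    }
    return n;
}
```

The point of the split is that `sk_request_unreference()` never touches anything that needs the big lock, so it is safe to call from external fence users.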
[Intel-gfx] [PATCH v6 7/7] drm/i915: Cache last IRQ seqno to reduce IRQ overhead
From: John Harrison

The notify function can be called many times without the seqno changing. Many of these duplicate calls exist to prevent races arising from the requirement of not enabling interrupts until requested. However, when interrupts are enabled the IRQ handler can be called multiple times without the ring's seqno value changing. This patch reduces the overhead of these extra calls by caching the last processed seqno value and early exiting if it has not changed. v3: New patch for series. v5: Added comment about last_irq_seqno usage due to code review feedback (Tvrtko Ursulin). v6: Minor update to resolve a race condition with the wait_request optimisation. For: VIZ-5190 Signed-off-by: John Harrison --- drivers/gpu/drm/i915/i915_gem.c | 21 +++-- drivers/gpu/drm/i915/intel_ringbuffer.h | 1 + 2 files changed, 20 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index f7858ea..72a37d6 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1386,6 +1386,7 @@ out: * request has not actually been fully processed yet. */ spin_lock_irq(&req->ring->fence_lock); + req->ring->last_irq_seqno = 0; i915_gem_request_notify(req->ring, true); spin_unlock_irq(&req->ring->fence_lock); } @@ -2543,6 +2544,8 @@ i915_gem_init_seqno(struct drm_device *dev, u32 seqno) for (j = 0; j < ARRAY_SIZE(ring->semaphore.sync_seqno); j++) ring->semaphore.sync_seqno[j] = 0; + + ring->last_irq_seqno = 0; } return 0; @@ -2875,11 +2878,22 @@ void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked) return; } + /* +* Check for a new seqno. If it hasn't actually changed then early +* exit without even grabbing the spinlock. Note that this is safe +* because any corruption of last_irq_seqno merely results in doing +* the full processing when there is potentially no work to be done. +* It can never lead to not processing work that does need to happen. 
+*/ + seqno = ring->get_seqno(ring, false); + trace_i915_gem_request_notify(ring, seqno); + if (seqno == ring->last_irq_seqno) + return; + if (!fence_locked) spin_lock_irqsave(&ring->fence_lock, flags); - seqno = ring->get_seqno(ring, false); - trace_i915_gem_request_notify(ring, seqno); + ring->last_irq_seqno = seqno; list_for_each_entry_safe(req, req_next, &ring->fence_signal_list, signal_link) { if (!req->cancelled) { @@ -3167,7 +3181,10 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv, * Tidy up anything left over. This includes a call to * i915_gem_request_notify() which will make sure that any requests * that were on the signal pending list get also cleaned up. +* NB: The seqno cache must be cleared otherwise the notify call will +* simply return immediately. */ + ring->last_irq_seqno = 0; i915_gem_retire_requests_ring(ring); /* Having flushed all requests from all queues, we know that all diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h index 6a7968b..ada93a9 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.h +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h @@ -363,6 +363,7 @@ struct intel_engine_cs { spinlock_t fence_lock; struct list_head fence_signal_list; struct list_head fence_unsignal_list; + uint32_t last_irq_seqno; }; static inline bool -- 1.9.1
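The caching optimisation is small enough to model directly: do the full signal-list walk only when the hardware seqno has moved on, and reset the cache to 0 wherever a forced re-run is needed (request cancellation, reset cleanup). An illustrative sketch with invented names:

```c
#include <assert.h>

struct sk_engine {
    unsigned int last_irq_seqno; /* last value fully processed */
    int full_passes;             /* counts walks of the signal list */
};

/* Analogue of i915_gem_request_notify(): early-exit when nothing changed.
 * A stale (e.g. racily-read) cache value only costs one redundant full
 * pass, never a missed one -- which is why the unlocked comparison is
 * tolerable in the real code. */
static void sk_notify(struct sk_engine *e, unsigned int hw_seqno)
{
    if (hw_seqno == e->last_irq_seqno)
        return;                  /* duplicate IRQ: nothing new to do */
    e->last_irq_seqno = hw_seqno;
    e->full_passes++;            /* walk fence_signal_list here */
}
```

Note the failure mode the patch comments warn about: if the cache is *not* cleared before a deliberate re-notify (as in reset cleanup), the early exit silently skips the work.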
[Intel-gfx] [PATCH v5 35/35] drm/i915: Allow scheduler to manage inter-ring object synchronisation
From: John Harrison

The scheduler has always tracked batch buffer dependencies based on DRM object usage. This means that it will not submit a batch on one ring that has outstanding dependencies still executing on other rings. This is exactly the same synchronisation performed by i915_gem_object_sync() using hardware semaphores where available and CPU stalls where not (e.g. in execlist mode and/or on Gen8 hardware). Unfortunately, when a batch buffer is submitted to the driver the _object_sync() call happens first. Thus, in the case where hardware semaphores are disabled, the driver has already stalled until the dependency has been resolved. This patch adds an optimisation to _object_sync() to ignore the synchronisation in the case where it will subsequently be handled by the scheduler. This removes the driver stall and (in the single application case) provides near hardware semaphore performance even when hardware semaphores are disabled. In a busy system where there is other work that can be executed on the stalling ring, it provides better than hardware semaphore performance as it removes the stall from both the driver and from the hardware. There is also a theory that this method should improve power usage as hardware semaphores are apparently not very power efficient - the stalled ring does not go into as low a power state as when it is genuinely idle. The optimisation is to check whether both ends of the synchronisation are batch buffer requests. If they are, then the scheduler will have the inter-dependency tracked and managed. If one or other end is not a batch buffer request (e.g. a page flip) then the code falls back to the CPU stall or hardware semaphore as appropriate. To check whether the existing usage is a batch buffer, the code simply calls the 'are you tracking this request' function of the scheduler on the object's last_read_req member. To check whether the new usage is a batch buffer, a flag is passed in from the caller. 
Issue: VIZ-5566 Signed-off-by: John Harrison --- drivers/gpu/drm/i915/i915_drv.h| 2 +- drivers/gpu/drm/i915/i915_gem.c| 17 ++--- drivers/gpu/drm/i915/i915_gem_execbuffer.c | 2 +- drivers/gpu/drm/i915/intel_display.c | 2 +- drivers/gpu/drm/i915/intel_lrc.c | 2 +- 5 files changed, 18 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 5d02f44..207ac16 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -3011,7 +3011,7 @@ int __must_check i915_mutex_lock_interruptible(struct drm_device *dev); #endif int i915_gem_object_sync(struct drm_i915_gem_object *obj, struct intel_engine_cs *to, -struct drm_i915_gem_request **to_req); +struct drm_i915_gem_request **to_req, bool to_batch); void i915_vma_move_to_active(struct i915_vma *vma, struct drm_i915_gem_request *req); int i915_gem_dumb_create(struct drm_file *file_priv, diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index a2c136d..b14e384 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -3577,7 +3577,7 @@ static int __i915_gem_object_sync(struct drm_i915_gem_object *obj, struct intel_engine_cs *to, struct drm_i915_gem_request *from_req, - struct drm_i915_gem_request **to_req) + struct drm_i915_gem_request **to_req, bool to_batch) { struct intel_engine_cs *from; int ret; @@ -3589,6 +3589,15 @@ __i915_gem_object_sync(struct drm_i915_gem_object *obj, if (i915_gem_request_completed(from_req)) return 0; + /* +* The scheduler will manage inter-ring object dependencies +* as long as both to and from requests are scheduler managed +* (i.e. batch buffers). 
+*/ + if (to_batch && + i915_scheduler_is_request_tracked(from_req, NULL, NULL)) + return 0; + if (!i915_semaphore_is_enabled(obj->base.dev)) { struct drm_i915_private *i915 = to_i915(obj->base.dev); ret = __i915_wait_request(from_req, @@ -3639,6 +3648,8 @@ __i915_gem_object_sync(struct drm_i915_gem_object *obj, * @to_req: request we wish to use the object for. See below. * This will be allocated and returned if a request is * required but not passed in. + * @to_batch: is the sync request on behalf of batch buffer submission? + * If so then the scheduler can (potentially) manage the synchronisation. * * This code is meant to abstract object synchronization with the GPU. * Calling with NULL implies synchronizing the object with the CPU @@ -3669,7 +3680,7 @@ __i915_gem_object_sync(struct drm_i915_gem_object *obj, int
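The decision logic the commit message describes reduces to a three-way choice. A hedged sketch (the enum and function names are invented; the real `__i915_gem_object_sync()` returns early rather than returning an enum):

```c
#include <assert.h>
#include <stdbool.h>

enum sync_method {
    SYNC_SKIP,      /* scheduler already tracks the dependency */
    SYNC_SEMAPHORE, /* hardware ring-to-ring semaphore */
    SYNC_CPU_STALL, /* __i915_wait_request() in the driver */
};

static enum sync_method pick_sync(bool to_is_batch, bool from_is_tracked,
                                  bool semaphores_enabled)
{
    /* Both ends are scheduler-managed batch buffers: no stall needed,
     * the scheduler will simply not submit the batch too early. */
    if (to_is_batch && from_is_tracked)
        return SYNC_SKIP;
    return semaphores_enabled ? SYNC_SEMAPHORE : SYNC_CPU_STALL;
}
```

A page flip, for example, has `to_is_batch == false`, so it still takes the semaphore/stall path exactly as before the patch.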
[Intel-gfx] [PATCH 01/20] igt/gem_ctx_param_basic: Updated to support scheduler priority interface
From: John Harrison

The GPU scheduler has added an execution priority level to the context object. There is an IOCTL interface to allow user apps/libraries to set this priority. This patch updates the context parameter IOCTL test to include the new interface. For: VIZ-1587 Signed-off-by: John Harrison --- lib/ioctl_wrappers.h| 1 + tests/gem_ctx_param_basic.c | 34 +- 2 files changed, 34 insertions(+), 1 deletion(-) diff --git a/lib/ioctl_wrappers.h b/lib/ioctl_wrappers.h index 214ec78..e650b8f 100644 --- a/lib/ioctl_wrappers.h +++ b/lib/ioctl_wrappers.h @@ -105,6 +105,7 @@ struct local_i915_gem_context_param { #define LOCAL_CONTEXT_PARAM_BAN_PERIOD 0x1 #define LOCAL_CONTEXT_PARAM_NO_ZEROMAP 0x2 #define LOCAL_CONTEXT_PARAM_GTT_SIZE 0x3 +#define LOCAL_CONTEXT_PARAM_PRIORITY 0x4 uint64_t value; }; void gem_context_require_ban_period(int fd); diff --git a/tests/gem_ctx_param_basic.c b/tests/gem_ctx_param_basic.c index b75800c..585a1a8 100644 --- a/tests/gem_ctx_param_basic.c +++ b/tests/gem_ctx_param_basic.c @@ -147,10 +147,42 @@ igt_main TEST_SUCCESS(LOCAL_IOCTL_I915_GEM_CONTEXT_SETPARAM); } + ctx_param.param = LOCAL_CONTEXT_PARAM_PRIORITY; + + igt_subtest("priority-root-set") { + ctx_param.context = ctx; + ctx_param.value = 2048; + TEST_FAIL(LOCAL_IOCTL_I915_GEM_CONTEXT_SETPARAM, EINVAL); + ctx_param.value = -2048; + TEST_FAIL(LOCAL_IOCTL_I915_GEM_CONTEXT_SETPARAM, EINVAL); + ctx_param.value = 512; + TEST_SUCCESS(LOCAL_IOCTL_I915_GEM_CONTEXT_SETPARAM); + ctx_param.value = -512; + TEST_SUCCESS(LOCAL_IOCTL_I915_GEM_CONTEXT_SETPARAM); + ctx_param.value = 0; + TEST_SUCCESS(LOCAL_IOCTL_I915_GEM_CONTEXT_SETPARAM); + } + + igt_subtest("priority-non-root-set") { + igt_fork(child, 1) { + igt_drop_root(); + + ctx_param.context = ctx; + ctx_param.value = 512; + TEST_FAIL(LOCAL_IOCTL_I915_GEM_CONTEXT_SETPARAM, EPERM); + ctx_param.value = -512; + TEST_SUCCESS(LOCAL_IOCTL_I915_GEM_CONTEXT_SETPARAM); + ctx_param.value = 0; + TEST_SUCCESS(LOCAL_IOCTL_I915_GEM_CONTEXT_SETPARAM); + } + + 
igt_waitchildren(); + } + /* NOTE: This testcase intentionally tests for the next free parameter * to catch ABI extensions. Don't "fix" this testcase without adding all * the tests for the new param first. */ - ctx_param.param = LOCAL_CONTEXT_PARAM_GTT_SIZE + 1; + ctx_param.param = LOCAL_CONTEXT_PARAM_PRIORITY + 1; igt_subtest("invalid-param-get") { ctx_param.context = ctx; -- 1.9.1
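The policy this test exercises can be summarised as a small validation function. This is inferred from the test's expectations, not taken from the kernel patch, and the exact range bound (±1024 here) is an assumption -- the test only shows that ±2048 is rejected and ±512 accepted:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical sketch of the priority-setting policy implied by the
 * subtests above: values must lie within an assumed (-1024, 1024) range,
 * raising priority above the default 0 requires root, and lowering it
 * (or setting 0) is always permitted. */
static int sk_set_priority(long long prio, bool is_root)
{
    if (prio >= 1024 || prio <= -1024)
        return -EINVAL;       /* out of range: the test's +/-2048 cases */
    if (prio > 0 && !is_root)
        return -EPERM;        /* only root may boost a context */
    return 0;
}
```

This matches the subtests: root may set ±512 and 0 but not ±2048; a non-root process may lower its priority but not raise it.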
[Intel-gfx] [PATCH v5 34/35] drm/i915: Add support for retro-actively banning batch buffers
From: John Harrison

If a given context submits too many hanging batch buffers then it will be banned and no further batch buffers will be accepted for it. However, it is possible that a large number of buffers may already have been accepted and are sitting in the scheduler waiting to be executed. This patch adds a late ban check to ensure that these will also be discarded. v4: New patch in series. For: VIZ-1587 Signed-off-by: John Harrison --- drivers/gpu/drm/i915/i915_gem_execbuffer.c | 6 ++ drivers/gpu/drm/i915/intel_lrc.c | 6 ++ 2 files changed, 12 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c index 793fbce..0b8c61e 100644 --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c @@ -1292,6 +1292,12 @@ int i915_gem_ringbuffer_submission_final(struct i915_execbuffer_params *params) /* The mutex must be acquired before calling this function */ WARN_ON(!mutex_is_locked(&params->dev->struct_mutex)); + /* Check the context wasn't banned between submission and execution: */ + if (params->ctx->hang_stats.banned) { + DRM_DEBUG("Trying to execute for banned context!\n"); + return -ENOENT; + } + /* Make sure the request's seqno is the latest and greatest: */ if (req->reserved_seqno != dev_priv->last_seqno) { ret = i915_gem_get_seqno(ring->dev, &req->reserved_seqno); diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index e124443..5fbeb0e 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -1002,6 +1002,12 @@ int intel_execlists_submission_final(struct i915_execbuffer_params *params) /* The mutex must be acquired before calling this function */ WARN_ON(!mutex_is_locked(&params->dev->struct_mutex)); + /* Check the context wasn't banned between submission and execution: */ + if (params->ctx->hang_stats.banned) { + DRM_DEBUG("Trying to execute for banned context!\n"); + return -ENOENT; + } + /* Make sure the request's seqno is 
the latest and greatest: */ if (req->reserved_seqno != dev_priv->last_seqno) { ret = i915_gem_get_seqno(ring->dev, &req->reserved_seqno); -- 1.9.1
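The race the patch closes is between acceptance and execution: a batch can be queued while its context is in good standing and only reach the hardware after the context has been banned. A minimal model with invented names:

```c
#include <assert.h>
#include <errno.h>

struct sk_ctx   { int banned; };
struct sk_batch { struct sk_ctx *ctx; int executed; };

/* Submission-time check: rejects new work for an already-banned context. */
static int sk_accept(struct sk_batch *b)
{
    return b->ctx->banned ? -ENOENT : 0;
}

/* Execution-time check (the one this patch adds): discards work that was
 * accepted before the ban landed but is only now being executed. */
static int sk_execute_final(struct sk_batch *b)
{
    if (b->ctx->banned)
        return -ENOENT;
    b->executed = 1;
    return 0;
}
```

Without the second check, every batch already sitting in the scheduler queue at ban time would still run.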
[Intel-gfx] [PATCH v5 16/35] drm/i915: Hook scheduler node clean up into retire requests
From: John Harrison

The scheduler keeps its own lock on various DRM objects in order to guarantee safe access long after the original execbuff IOCTL has completed. This is especially important when pre-emption is enabled as the batch buffer might need to be submitted to the hardware multiple times. This patch hooks the clean up of these locks into the request retire function. The request can only be retired after it has completed on the hardware and thus is no longer eligible for re-submission. Thus there is no point holding on to the locks beyond that time. v3: Updated to not WARN when cleaning a node that is being cancelled. The clean will happen later so skipping it at the point of cancellation is fine. v5: Squashed the i915_scheduler.c portions down into the 'start of scheduler' patch. [Joonas Lahtinen] For: VIZ-1587 Signed-off-by: John Harrison Cc: Joonas Lahtinen --- drivers/gpu/drm/i915/i915_gem.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 1ab7256..2dd9b55 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1489,6 +1489,9 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request) fence_signal_locked(&request->fence); } + if (request->scheduler_qe) + i915_scheduler_clean_node(request->scheduler_qe); + i915_gem_request_unreference(request); } -- 1.9.1
[Intel-gfx] [PATCH v5 18/35] drm/i915: Added scheduler support to page fault handler
From: John Harrison

GPU page faults can now require scheduler operation in order to complete. For example, in order to free up sufficient memory to handle the fault the handler must wait for a batch buffer to complete that has not even been sent to the hardware yet. Thus EAGAIN no longer means a GPU hang, it can occur under normal operation. For: VIZ-1587 Signed-off-by: John Harrison --- drivers/gpu/drm/i915/i915_gem.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 17b44b3..a47a495 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -2003,10 +2003,15 @@ out: } case -EAGAIN: /* -* EAGAIN means the gpu is hung and we'll wait for the error -* handler to reset everything when re-faulting in +* EAGAIN can mean the gpu is hung and we'll have to wait for +* the error handler to reset everything when re-faulting in * i915_mutex_lock_interruptible. +* +* It can also indicate various other nonfatal errors for which +* the best response is to give other threads a chance to run, +* and then retry the failing operation in its entirety. */ + /*FALLTHRU*/ case 0: case -ERESTARTSYS: case -EINTR: -- 1.9.1
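The new EAGAIN semantics imply retry-in-full at the caller: back off, let other work make progress, then redo the whole operation. The fault handler itself retries via the page-fault mechanism rather than an explicit loop, but the shape is the same; an illustrative userspace sketch with invented names:

```c
#include <assert.h>
#include <errno.h>

/* A stand-in operation that needs a few scheduler "ticks" before the
 * resources it is waiting on (e.g. memory held by an unsubmitted batch)
 * become available. */
static int sk_try_op(int *ticks_until_ready)
{
    if (*ticks_until_ready > 0) {
        (*ticks_until_ready)--;
        return -EAGAIN;       /* nonfatal: retry the whole operation */
    }
    return 0;
}

static int sk_op_with_retry(int *ticks_until_ready, int max_retries)
{
    int ret;
    do {
        ret = sk_try_op(ticks_until_ready);
        /* real code would yield here so other threads can run */
    } while (ret == -EAGAIN && max_retries-- > 0);
    return ret;
}
```

The crucial change in meaning: EAGAIN is now ordinary back-pressure, so treating it as a hang (and triggering error handling) would be wrong.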
[Intel-gfx] [PATCH v5 12/35] drm/i915: Added deferred work handler for scheduler
From: John Harrison

The scheduler needs to do interrupt triggered work that is too complex to do in the interrupt handler. Thus it requires a deferred work handler to process such tasks asynchronously. v2: Updated to reduce mutex lock usage. The lock is now only held for the minimum time within the remove function rather than for the whole of the worker thread's operation. v5: Removed objectionable white space and added some documentation. [Joonas Lahtinen] For: VIZ-1587 Signed-off-by: John Harrison Cc: Joonas Lahtinen --- drivers/gpu/drm/i915/i915_dma.c | 3 +++ drivers/gpu/drm/i915/i915_drv.h | 10 ++ drivers/gpu/drm/i915/i915_gem.c | 2 ++ drivers/gpu/drm/i915/i915_scheduler.c | 29 +++-- drivers/gpu/drm/i915/i915_scheduler.h | 1 + 5 files changed, 43 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c index 678adc7..c3d382d 100644 --- a/drivers/gpu/drm/i915/i915_dma.c +++ b/drivers/gpu/drm/i915/i915_dma.c @@ -1158,6 +1158,9 @@ int i915_driver_unload(struct drm_device *dev) WARN_ON(unregister_oom_notifier(&dev_priv->mm.oom_notifier)); unregister_shrinker(&dev_priv->mm.shrinker); + /* Cancel the scheduler work handler, which should be idle now. */ + cancel_work_sync(&dev_priv->mm.scheduler_work); + io_mapping_free(dev_priv->gtt.mappable); arch_phys_wc_del(dev_priv->gtt.mtrr); diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 03add1a..4d544f1 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1291,6 +1291,16 @@ struct i915_gem_mm { struct delayed_work retire_work; /** +* New scheme is to get an interrupt after every work packet +* in order to allow the low latency scheduling of pending +* packets. The idea behind adding new packets to a pending +* queue rather than directly into the hardware ring buffer +* is to allow high priority packets to overtake low priority +* ones. 
+*/ + struct work_struct scheduler_work; + + /** * When we detect an idle GPU, we want to turn on * powersaving features. So once we see that there * are no more requests outstanding and no more diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index c3b7def..1ab7256 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -5427,6 +5427,8 @@ i915_gem_load(struct drm_device *dev) i915_gem_retire_work_handler); INIT_DELAYED_WORK(&dev_priv->mm.idle_work, i915_gem_idle_work_handler); + INIT_WORK(&dev_priv->mm.scheduler_work, + i915_scheduler_work_handler); init_waitqueue_head(&dev_priv->gpu_error.reset_queue); dev_priv->relative_constants_mode = I915_EXEC_CONSTANTS_REL_GENERAL; diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index ab5007a..3986890 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -697,7 +697,9 @@ static int i915_scheduler_remove_dependent(struct i915_scheduler *scheduler, */ void i915_scheduler_wakeup(struct drm_device *dev) { - /* XXX: Need to call i915_scheduler_remove() via work handler. */ + struct drm_i915_private *dev_priv = to_i915(dev); + + queue_work(dev_priv->wq, &dev_priv->mm.scheduler_work); } /** @@ -827,7 +829,7 @@ static bool i915_scheduler_remove(struct i915_scheduler *scheduler, return do_submit; } -void i915_scheduler_process_work(struct intel_engine_cs *ring) +static void i915_scheduler_process_work(struct intel_engine_cs *ring) { struct drm_i915_private *dev_priv = ring->dev->dev_private; struct i915_scheduler *scheduler = dev_priv->scheduler; @@ -874,6 +876,29 @@ void i915_scheduler_process_work(struct intel_engine_cs *ring) } /** + * i915_scheduler_work_handler - scheduler's work handler callback. + * @work: Work structure + * A lot of the scheduler's work must be done asynchronously in response to + * an interrupt or other event. 
However, that work cannot be done at + * interrupt time or in the context of the event signaller (which might in + * fact be an interrupt). Thus a worker thread is required. This function + * will cause the thread to wake up and do its processing. + */ +void i915_scheduler_work_handler(struct work_struct *work) +{ + struct intel_engine_cs *ring; + struct drm_i915_private *dev_priv; + struct drm_device *dev; + int i; + + dev_priv = container_of(work, struct drm_i915_private, mm.scheduler_work); + dev = dev_priv->dev; + + for_each_ring(ring, dev_priv, i) + i915_scheduler_process_work(ring); +} +
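The rationale in the new kerneldoc comment (queue packets in software first so higher-priority ones can overtake) is easy to illustrate with a priority-ordered singly linked list. This is a sketch of the idea only, not the scheduler's actual data structure:

```c
#include <assert.h>
#include <stddef.h>

struct sk_pkt { int prio; struct sk_pkt *next; };

/* Insert in descending priority order; equal priorities keep FIFO order,
 * so same-priority work is never reordered. */
static void sk_queue_insert(struct sk_pkt **head, struct sk_pkt *p)
{
    while (*head && (*head)->prio >= p->prio)
        head = &(*head)->next;
    p->next = *head;
    *head = p;
}

/* Pop the highest-priority packet for submission to the hardware ring. */
static struct sk_pkt *sk_queue_pop(struct sk_pkt **head)
{
    struct sk_pkt *p = *head;
    if (p)
        *head = p->next;
    return p;
}
```

Had the packets gone straight into the hardware ring buffer, the late-arriving high-priority packet could not jump ahead of work already written there.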
[Intel-gfx] [PATCH v5 26/35] drm/i915: Added debugfs interface to scheduler tuning parameters
From: John HarrisonThere are various parameters within the scheduler which can be tuned to improve performance, reduce memory footprint, etc. This change adds support for altering these via debugfs. v2: Updated for priorities now being signed values. v5: Squashed priority bumping entries into this patch rather than a separate patch all of their own. For: VIZ-1587 Signed-off-by: John Harrison --- drivers/gpu/drm/i915/i915_debugfs.c | 169 1 file changed, 169 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index b923949..7d01c07 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -39,6 +39,7 @@ #include "intel_ringbuffer.h" #include #include "i915_drv.h" +#include "i915_scheduler.h" enum { ACTIVE_LIST, @@ -1122,6 +1123,168 @@ DEFINE_SIMPLE_ATTRIBUTE(i915_next_seqno_fops, i915_next_seqno_get, i915_next_seqno_set, "0x%llx\n"); +static int +i915_scheduler_priority_min_get(void *data, u64 *val) +{ + struct drm_device *dev = data; + struct drm_i915_private *dev_priv = dev->dev_private; + struct i915_scheduler *scheduler = dev_priv->scheduler; + + *val = (u64) scheduler->priority_level_min; + return 0; +} + +static int +i915_scheduler_priority_min_set(void *data, u64 val) +{ + struct drm_device *dev = data; + struct drm_i915_private *dev_priv = dev->dev_private; + struct i915_scheduler *scheduler = dev_priv->scheduler; + + scheduler->priority_level_min = (int32_t) val; + return 0; +} + +DEFINE_SIMPLE_ATTRIBUTE(i915_scheduler_priority_min_fops, + i915_scheduler_priority_min_get, + i915_scheduler_priority_min_set, + "%lld\n"); + +static int +i915_scheduler_priority_max_get(void *data, u64 *val) +{ + struct drm_device *dev = data; + struct drm_i915_private *dev_priv = dev->dev_private; + struct i915_scheduler *scheduler = dev_priv->scheduler; + + *val = (u64) scheduler->priority_level_max; + return 0; +} + +static int +i915_scheduler_priority_max_set(void *data, u64 val) +{ + struct 
drm_device *dev = data; + struct drm_i915_private *dev_priv = dev->dev_private; + struct i915_scheduler *scheduler = dev_priv->scheduler; + + scheduler->priority_level_max = (int32_t) val; + return 0; +} + +DEFINE_SIMPLE_ATTRIBUTE(i915_scheduler_priority_max_fops, + i915_scheduler_priority_max_get, + i915_scheduler_priority_max_set, + "%lld\n"); + +static int +i915_scheduler_priority_bump_get(void *data, u64 *val) +{ + struct drm_device *dev = data; + struct drm_i915_private *dev_priv = dev->dev_private; + struct i915_scheduler *scheduler = dev_priv->scheduler; + + *val = (u64) scheduler->priority_level_bump; + return 0; +} + +static int +i915_scheduler_priority_bump_set(void *data, u64 val) +{ + struct drm_device *dev = data; + struct drm_i915_private *dev_priv = dev->dev_private; + struct i915_scheduler *scheduler = dev_priv->scheduler; + + scheduler->priority_level_bump = (u32) val; + return 0; +} + +DEFINE_SIMPLE_ATTRIBUTE(i915_scheduler_priority_bump_fops, + i915_scheduler_priority_bump_get, + i915_scheduler_priority_bump_set, + "%lld\n"); + +static int +i915_scheduler_priority_preempt_get(void *data, u64 *val) +{ + struct drm_device *dev = data; + struct drm_i915_private *dev_priv = dev->dev_private; + struct i915_scheduler *scheduler = dev_priv->scheduler; + + *val = (u64) scheduler->priority_level_preempt; + return 0; +} + +static int +i915_scheduler_priority_preempt_set(void *data, u64 val) +{ + struct drm_device *dev = data; + struct drm_i915_private *dev_priv = dev->dev_private; + struct i915_scheduler *scheduler = dev_priv->scheduler; + + scheduler->priority_level_preempt = (u32) val; + return 0; +} + +DEFINE_SIMPLE_ATTRIBUTE(i915_scheduler_priority_preempt_fops, + i915_scheduler_priority_preempt_get, + i915_scheduler_priority_preempt_set, + "%lld\n"); + +static int +i915_scheduler_min_flying_get(void *data, u64 *val) +{ + struct drm_device *dev = data; + struct drm_i915_private *dev_priv = dev->dev_private; + struct i915_scheduler *scheduler = 
dev_priv->scheduler; + + *val = (u64) scheduler->min_flying; + return 0; +} + +static int +i915_scheduler_min_flying_set(void *data, u64 val) +{ + struct drm_device
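The debugfs patch above repeats an almost identical get/set pair for each tunable. As an editorial aside (not something the patch itself does), such pairs can be generated with a macro; here is a hedged userspace sketch using a plain struct in place of the scheduler:

```c
#include <assert.h>

/* Invented stand-in for the scheduler's tunable fields. */
struct sk_tunables {
    int priority_level_min;
    int priority_level_max;
};

/* Generate a matching getter/setter pair for one tunable field, mirroring
 * the u64-based debugfs accessor signatures used in the patch. */
#define SK_DEFINE_TUNABLE(field)                                          \
static int sk_##field##_get(struct sk_tunables *t, long long *val)        \
{ *val = t->field; return 0; }                                            \
static int sk_##field##_set(struct sk_tunables *t, long long val)         \
{ t->field = (int)val; return 0; }

SK_DEFINE_TUNABLE(priority_level_min)
SK_DEFINE_TUNABLE(priority_level_max)
```

One generator keeps all accessors structurally identical, so a fix to one (e.g. range clamping) cannot be forgotten in the others.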
[Intel-gfx] [PATCH v5 30/35] drm/i915: Add scheduler support functions for TDR
From: John Harrison

The TDR code needs to know what the scheduler is up to in order to work out whether a ring is really hung or not. v4: Removed some unnecessary braces to keep the style checker happy. v5: Removed white space and added documentation. [Joonas Lahtinen] Also updated for new module parameter. For: VIZ-1587 Signed-off-by: John Harrison Cc: Joonas Lahtinen --- drivers/gpu/drm/i915/i915_scheduler.c | 33 + drivers/gpu/drm/i915/i915_scheduler.h | 1 + 2 files changed, 34 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index 0068d03..c69e2b8 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -1627,3 +1627,36 @@ int i915_scheduler_closefile(struct drm_device *dev, struct drm_file *file) return 0; } + +/** + * i915_scheduler_is_ring_flying - does the given ring have in flight batches? + * @ring: Ring to query + * Used by TDR to distinguish hung rings (not moving but with work to do) + * from idle rings (not moving because there is nothing to do). Returns true + * if the given ring has batches currently executing on the hardware. + */ +bool i915_scheduler_is_ring_flying(struct intel_engine_cs *ring) +{ + struct drm_i915_private *dev_priv = ring->dev->dev_private; + struct i915_scheduler *scheduler = dev_priv->scheduler; + struct i915_scheduler_queue_entry *node; + unsigned long flags; + bool found = false; + + /* With the scheduler in bypass mode, no information can be returned. 
*/ + if (!i915.enable_scheduler) + return true; + + spin_lock_irqsave(&scheduler->lock, flags); + + list_for_each_entry(node, &scheduler->node_queue[ring->id], link) { + if (I915_SQS_IS_FLYING(node)) { + found = true; + break; + } + } + + spin_unlock_irqrestore(&scheduler->lock, flags); + + return found; +} diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h index 065f2a3..dcf1f05 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.h +++ b/drivers/gpu/drm/i915/i915_scheduler.h @@ -136,6 +136,7 @@ void i915_scheduler_clean_node(struct i915_scheduler_queue_entry *node); int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe); bool i915_scheduler_notify_request(struct drm_i915_gem_request *req); void i915_scheduler_wakeup(struct drm_device *dev); +bool i915_scheduler_is_ring_flying(struct intel_engine_cs *ring); void i915_scheduler_work_handler(struct work_struct *work); int i915_scheduler_flush(struct intel_engine_cs *ring, bool is_locked); int i915_scheduler_flush_stamp(struct intel_engine_cs *ring, -- 1.9.1
[Intel-gfx] [PATCH v5 29/35] drm/i915: Added scheduler statistic reporting to debugfs
From: John Harrison

It is useful to know what the scheduler is doing for both debugging and performance analysis purposes. This change adds a bunch of counters and such that keep track of various scheduler operations (batches submitted, completed, flush requests, etc.). The data can then be read in userland via the debugfs mechanism. v2: Updated to match changes to scheduler implementation. v3: Updated for changes to kill code and flush code. v4: Removed the fence/sync code as that will be part of a separate patch series. Wrapped a long line to keep the style checker happy. v5: Updated to remove forward declarations and white space. Added documentation. [Joonas Lahtinen] Used lighter weight spinlocks. For: VIZ-1587 Signed-off-by: John Harrison Cc: Joonas Lahtinen --- drivers/gpu/drm/i915/i915_debugfs.c| 73 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 3 ++ drivers/gpu/drm/i915/i915_scheduler.c | 78 -- drivers/gpu/drm/i915/i915_scheduler.h | 31 4 files changed, 180 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index 7d01c07..2c8b00f 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -3595,6 +3595,78 @@ static int i915_drrs_status(struct seq_file *m, void *unused) return 0; } +static int i915_scheduler_info(struct seq_file *m, void *unused) +{ + struct drm_info_node *node = (struct drm_info_node *) m->private; + struct drm_device *dev = node->minor->dev; + struct drm_i915_private *dev_priv = dev->dev_private; + struct i915_scheduler *scheduler = dev_priv->scheduler; + struct i915_scheduler_stats *stats = scheduler->stats; + struct i915_scheduler_stats_nodes node_stats[I915_NUM_RINGS]; + struct intel_engine_cs *ring; + char str[50 * (I915_NUM_RINGS + 1)], name[50], *ptr; + int ret, i, r; + + ret = mutex_lock_interruptible(&dev->mode_config.mutex); + if (ret) + return ret; + +#define PRINT_VAR(name, fmt, var) \ + do {\ + sprintf(str, "%-22s", name);\ + ptr = str + strlen(str);\ + for_each_ring(ring, dev_priv, r) { \ + sprintf(ptr, " %10" fmt, var); \ + ptr += strlen(ptr); \ + } \ + seq_printf(m, "%s\n", str); \ + } while (0) + + PRINT_VAR("Ring name:", "s", dev_priv->ring[r].name); + PRINT_VAR(" Ring seqno", "d", ring->get_seqno(ring, false)); + seq_putc(m, '\n'); + + seq_puts(m, "Batch submissions:\n"); + PRINT_VAR(" Queued", "u", stats[r].queued); + PRINT_VAR(" Submitted","u", stats[r].submitted); + PRINT_VAR(" Completed","u", stats[r].completed); + PRINT_VAR(" Expired", "u", stats[r].expired); + seq_putc(m, '\n'); + + seq_puts(m, "Flush counts:\n"); + PRINT_VAR(" By object","u", stats[r].flush_obj); + PRINT_VAR(" By request", "u", stats[r].flush_req); + PRINT_VAR(" By stamp", "u", stats[r].flush_stamp); + PRINT_VAR(" Blanket", "u", stats[r].flush_all); + PRINT_VAR(" Entries bumped", "u", stats[r].flush_bump); + PRINT_VAR(" Entries submitted","u", stats[r].flush_submit); + seq_putc(m, '\n'); + + seq_puts(m, "Miscellaneous:\n"); + PRINT_VAR(" ExecEarly retry", "u", stats[r].exec_early); + PRINT_VAR(" ExecFinal requeue","u", stats[r].exec_again); + PRINT_VAR(" ExecFinal killed", "u", stats[r].exec_dead); + PRINT_VAR(" Hung flying", "u", stats[r].kill_flying); + PRINT_VAR(" Hung queued", "u", stats[r].kill_queued); + seq_putc(m, '\n'); + + seq_puts(m, "Queue contents:\n"); + for_each_ring(ring, dev_priv, i) + i915_scheduler_query_stats(ring, node_stats + ring->id); + + for (i = 0; i < (i915_sqs_MAX + 1); i++) { + sprintf(name, " %s", i915_scheduler_queue_status_str(i)); + PRINT_VAR(name, "d", node_stats[r].counts[i]); + } + seq_putc(m, '\n'); + +#undef PRINT_VAR + + mutex_unlock(&dev->mode_config.mutex); + + return 0; +} + struct pipe_crc_info { const char *name; struct drm_device *dev; @@ -5565,6 +5637,7 @@ static const struct drm_info_list i915_debugfs_list[] = {
[Intel-gfx] [PATCH v5 14/35] drm/i915: Keep the reserved space mechanism happy
From: John Harrison

Ring space is reserved when constructing a request to ensure that the subsequent 'add_request()' call cannot fail due to waiting for space on a busy or broken GPU. However, the scheduler jumps into the middle of the execbuffer process between request creation and request submission. Thus it needs to cancel the reserved space when the request is simply added to the scheduler's queue and not yet submitted. Similarly, it needs to re-reserve the space when it finally does want to send the batch buffer to the hardware. v3: Updated to use locally cached request pointer. v5: Updated due to changes to earlier patches in series - for runtime PM calls and splitting bypass mode into a separate function. For: VIZ-1587 Signed-off-by: John Harrison --- drivers/gpu/drm/i915/i915_gem_execbuffer.c | 20 ++-- drivers/gpu/drm/i915/i915_scheduler.c | 4 drivers/gpu/drm/i915/intel_lrc.c | 13 +++-- 3 files changed, 29 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c index 09c5ce9..11bea8d 100644 --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c @@ -1295,18 +1295,22 @@ int i915_gem_ringbuffer_submission_final(struct i915_execbuffer_params *params) /* The mutex must be acquired before calling this function */ WARN_ON(!mutex_is_locked(&params->dev->struct_mutex)); + ret = intel_ring_reserve_space(req); + if (ret) + goto error; + /* * Unconditionally invalidate gpu caches and ensure that we do flush * any residual writes from the previous batch. */ ret = intel_ring_invalidate_all_caches(req); if (ret) - return ret; + goto error; /* Switch to the correct context for the batch */ ret = i915_switch_context(req); if (ret) - return ret; + goto error; WARN(params->ctx->ppgtt && params->ctx->ppgtt->pd_dirty_rings & (1 << ring->id), "%s didn't clear reload\n", ring->name); @@ -1315,7 +1319,7 @@ int i915_gem_ringbuffer_submission_final(struct i915_execbuffer_params *params) params->instp_mode != dev_priv->relative_constants_mode) { ret = intel_ring_begin(req, 4); if (ret) - return ret; + goto error; intel_ring_emit(ring, MI_NOOP); intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1)); @@ -1329,7 +1333,7 @@ int i915_gem_ringbuffer_submission_final(struct i915_execbuffer_params *params) if (params->args_flags & I915_EXEC_GEN7_SOL_RESET) { ret = i915_reset_gen7_sol_offsets(params->dev, req); if (ret) - return ret; + goto error; } exec_len = params->args_batch_len; @@ -1343,13 +1347,17 @@ int i915_gem_ringbuffer_submission_final(struct i915_execbuffer_params *params) exec_start, exec_len, params->dispatch_flags); if (ret) - return ret; + goto error; trace_i915_gem_ring_dispatch(req, params->dispatch_flags); i915_gem_execbuffer_retire_commands(params); - return 0; +error: + if (ret) + intel_ring_reserved_space_cancel(req->ringbuf); + + return ret; } /** diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index 3986890..a3ffd04 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -483,6 +483,8 @@ static int i915_scheduler_queue_execbuffer_bypass(struct i915_scheduler_queue_en struct i915_scheduler *scheduler = dev_priv->scheduler; int ret; + intel_ring_reserved_space_cancel(qe->params.request->ringbuf); + scheduler->flags[qe->params.ring->id] |= i915_sf_submitting; ret = dev_priv->gt.execbuf_final(&qe->params); scheduler->flags[qe->params.ring->id] &= ~i915_sf_submitting; @@ -539,6 +541,8 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe) node->stamp = jiffies;
i915_gem_request_reference(node->params.request); + intel_ring_reserved_space_cancel(node->params.request->ringbuf); + WARN_ON(node->params.request->scheduler_qe); node->params.request->scheduler_qe = node; diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index ff4565f..f4bab82 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -978,13 +978,17 @@ int intel_execlists_submission_final(struct i915_execbuffer_params *params) /* The mutex must be acquired before calling this function */
[Intel-gfx] [PATCH v5 25/35] drm/i915: Added scheduler queue throttling by DRM file handle
From: John Harrison

The scheduler decouples the submission of batch buffers to the driver from their subsequent submission to the hardware. This means that an application which is continuously submitting buffers as fast as it can could potentially flood the driver. To prevent this, the driver now tracks how many buffers are in progress (queued in software or executing in hardware) and limits this to a given (tunable) number. If this number is exceeded then the queue to the driver will return EAGAIN and thus prevent the scheduler's queue becoming arbitrarily large. v3: Added a missing decrement of the file queue counter. v4: Updated a comment. v5: Updated due to changes to earlier patches in series - removing forward declarations and white space. Also added some documentation. [Joonas Lahtinen] For: VIZ-1587 Signed-off-by: John Harrison --- drivers/gpu/drm/i915/i915_drv.h| 2 ++ drivers/gpu/drm/i915/i915_gem_execbuffer.c | 8 + drivers/gpu/drm/i915/i915_scheduler.c | 48 ++ drivers/gpu/drm/i915/i915_scheduler.h | 2 ++ 4 files changed, 60 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 071a27b..3f4c4f0 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -336,6 +336,8 @@ struct drm_i915_file_private { } rps; struct intel_engine_cs *bsd_ring; + + u32 scheduler_queue_length; }; enum intel_dpll_id { diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c index d4de8c7..dff120c 100644 --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c @@ -1803,6 +1803,10 @@ i915_gem_execbuffer(struct drm_device *dev, void *data, return -EINVAL; } + /* Throttle batch requests per device file */ + if (i915_scheduler_file_queue_is_full(file)) + return -EAGAIN; + /* Copy in the exec list from userland */ exec_list = drm_malloc_ab(sizeof(*exec_list), args->buffer_count); exec2_list = drm_malloc_ab(sizeof(*exec2_list),
args->buffer_count); @@ -1893,6 +1897,10 @@ i915_gem_execbuffer2(struct drm_device *dev, void *data, return -EINVAL; } + /* Throttle batch requests per device file */ + if (i915_scheduler_file_queue_is_full(file)) + return -EAGAIN; + exec2_list = kmalloc(sizeof(*exec2_list)*args->buffer_count, GFP_TEMPORARY | __GFP_NOWARN | __GFP_NORETRY); if (exec2_list == NULL) diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index e56ce08..f7f29d5 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -69,6 +69,7 @@ int i915_scheduler_init(struct drm_device *dev) scheduler->priority_level_bump= 50; scheduler->priority_level_preempt = 900; scheduler->min_flying = 2; + scheduler->file_queue_max = 64; dev_priv->scheduler = scheduler; @@ -464,6 +465,44 @@ static int i915_scheduler_submit_unlocked(struct intel_engine_cs *ring) return ret; } +/** + * i915_scheduler_file_queue_is_full - Returns true if the queue is full. + * @file: File object to query. + * This allows throttling of applications by limiting the total number of + * outstanding requests to a specified level. Once that limit is reached, + * this call will return true and no more requests should be accepted. + */ +bool i915_scheduler_file_queue_is_full(struct drm_file *file) +{ + struct drm_i915_file_private *file_priv = file->driver_priv; + struct drm_i915_private *dev_priv = file_priv->dev_priv; + struct i915_scheduler *scheduler = dev_priv->scheduler; + + return file_priv->scheduler_queue_length >= scheduler->file_queue_max; +} + +/** + * i915_scheduler_file_queue_inc - Increment the file's request queue count. + * @file: File object to process. + */ +static void i915_scheduler_file_queue_inc(struct drm_file *file) +{ + struct drm_i915_file_private *file_priv = file->driver_priv; + + file_priv->scheduler_queue_length++; +} + +/** + * i915_scheduler_file_queue_dec - Decrement the file's request queue count. 
+ * @file: File object to process. + */ +static void i915_scheduler_file_queue_dec(struct drm_file *file) +{ + struct drm_i915_file_private *file_priv = file->driver_priv; + + file_priv->scheduler_queue_length--; +} + static void i915_generate_dependencies(struct i915_scheduler *scheduler, struct i915_scheduler_queue_entry *node, uint32_t ring) @@ -640,6 +679,8 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe) list_add_tail(&node->link, &scheduler->node_queue[ring->id]); +
[Intel-gfx] [PATCH v5 22/35] drm/i915: Support for 'unflushed' ring idle
From: John HarrisonWhen the seqno wraps around zero, the entire GPU is forced to be idle for some reason (possibly only to work around issues with hardware semaphores but no-one seems too sure!). This causes a problem if the force idle occurs at an inopportune moment such as in the middle of submitting a batch buffer. Specifically, it would lead to recursive submits - submitting work requires a new seqno, the new seqno requires idling the ring, idling the ring requires submitting work, submitting work requires a new seqno... This change adds a 'flush' parameter to the idle function call which specifies whether the scheduler queues should be flushed out. I.e. is the call intended to just idle the ring as it is right now (no flush) or is it intended to force all outstanding work out of the system (with flush). In the seqno wrap case, pending work is not an issue because the next operation will be to submit it. However, in other cases, the intention is to make sure everything that could be done has been done. 
For: VIZ-1587 Signed-off-by: John Harrison --- drivers/gpu/drm/i915/i915_gem.c | 4 ++-- drivers/gpu/drm/i915/intel_lrc.c| 2 +- drivers/gpu/drm/i915/intel_ringbuffer.c | 17 +++-- drivers/gpu/drm/i915/intel_ringbuffer.h | 2 +- 4 files changed, 19 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index d7f7f7a..a249e52 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -2564,7 +2564,7 @@ i915_gem_init_seqno(struct drm_device *dev, u32 seqno) /* Carefully retire all requests without writing to the rings */ for_each_ring(ring, dev_priv, i) { - ret = intel_ring_idle(ring); + ret = intel_ring_idle(ring, false); if (ret) return ret; } @@ -3808,7 +3808,7 @@ int i915_gpu_idle(struct drm_device *dev) i915_add_request_no_flush(req); } - ret = intel_ring_idle(ring); + ret = intel_ring_idle(ring, true); if (ret) return ret; } diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index f4bab82..e056875 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -1058,7 +1058,7 @@ void intel_logical_ring_stop(struct intel_engine_cs *ring) if (!intel_ring_initialized(ring)) return; - ret = intel_ring_idle(ring); + ret = intel_ring_idle(ring, true); if (ret && !i915_reset_in_progress(&to_i915(ring->dev)->gpu_error)) DRM_ERROR("failed to quiesce %s whilst cleaning up: %d\n", ring->name, ret); diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index a2093f5..70ef9f0 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -2288,9 +2288,22 @@ static void __wrap_ring_buffer(struct intel_ringbuffer *ringbuf) intel_ring_update_space(ringbuf); } -int intel_ring_idle(struct intel_engine_cs *ring) +int intel_ring_idle(struct intel_engine_cs *ring, bool flush) { struct drm_i915_gem_request *req; + int ret; + + /* +* NB: Must not flush the scheduler if this idle request is from +* within an execbuff submission (i.e. due to 'get_seqno' calling +* 'wrap_seqno' calling 'idle'). As that would lead to recursive +* flushes! +*/ + if (flush) { + ret = i915_scheduler_flush(ring, true); + if (ret) + return ret; + } /* Wait upon the last request to be completed */ if (list_empty(&ring->request_list)) @@ -3095,7 +3108,7 @@ intel_stop_ring_buffer(struct intel_engine_cs *ring) if (!intel_ring_initialized(ring)) return; - ret = intel_ring_idle(ring); + ret = intel_ring_idle(ring, true); if (ret && !i915_reset_in_progress(&to_i915(ring->dev)->gpu_error)) DRM_ERROR("failed to quiesce %s whilst cleaning up: %d\n", ring->name, ret); diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h index ada93a9..cca476f 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.h +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h @@ -478,7 +478,7 @@ void intel_ring_update_space(struct intel_ringbuffer *ringbuf); int intel_ring_space(struct intel_ringbuffer *ringbuf); bool intel_ring_stopped(struct intel_engine_cs *ring); -int __must_check intel_ring_idle(struct intel_engine_cs *ring); +int __must_check intel_ring_idle(struct intel_engine_cs *ring, bool flush); void intel_ring_init_seqno(struct intel_engine_cs *ring, u32 seqno); int intel_ring_flush_all_caches(struct
[Intel-gfx] [PATCH v5 33/35] drm/i915: Add scheduling priority to per-context parameters
From: Dave GordonAdded an interface for user land applications/libraries/services to set their GPU scheduler priority. This extends the existing context parameter IOCTL interface to add a scheduler priority parameter. The range is +/-1023 with +ve numbers meaning higher priority. Only system processes may set a higher priority than the default (zero), normal applications may only lower theirs. v2: New patch in series. For: VIZ-1587 Signed-off-by: Dave Gordon Signed-off-by: John Harrison --- drivers/gpu/drm/i915/i915_drv.h| 14 ++ drivers/gpu/drm/i915/i915_gem_context.c| 24 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 3 +++ include/uapi/drm/i915_drm.h| 1 + 4 files changed, 42 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 3f4c4f0..5d02f44 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -847,6 +847,19 @@ struct i915_ctx_hang_stats { bool banned; }; +/* + * User-settable GFX scheduler priorities are on a scale of -1023 (I don't + * care about running) to +1023 (I'm the most important thing in existence) + * with zero being the default. Any process may decrease its scheduling + * priority, but only a sufficiently privileged process may increase it + * beyond zero. 
+ */ + +struct i915_ctx_sched_info { + /* Scheduling priority */ + int32_t priority; +}; + struct i915_fence_timeline { charname[32]; unsignedfence_context; @@ -887,6 +900,7 @@ struct intel_context { int flags; struct drm_i915_file_private *file_priv; struct i915_ctx_hang_stats hang_stats; + struct i915_ctx_sched_info sched_info; struct i915_hw_ppgtt *ppgtt; /* Legacy ring buffer submission */ diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c index 3dcb2f4..6ac03e8 100644 --- a/drivers/gpu/drm/i915/i915_gem_context.c +++ b/drivers/gpu/drm/i915/i915_gem_context.c @@ -956,6 +956,9 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data, else args->value = to_i915(dev)->gtt.base.total; break; + case I915_CONTEXT_PARAM_PRIORITY: + args->value = (__u64) ctx->sched_info.priority; + break; default: ret = -EINVAL; break; @@ -993,6 +996,7 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data, else ctx->hang_stats.ban_period_seconds = args->value; break; + case I915_CONTEXT_PARAM_NO_ZEROMAP: if (args->size) { ret = -EINVAL; @@ -1001,6 +1005,26 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data, ctx->flags |= args->value ? 
CONTEXT_NO_ZEROMAP : 0; } break; + + case I915_CONTEXT_PARAM_PRIORITY: + { + int32_t priority = (int32_t) args->value; + struct drm_i915_private *dev_priv = dev->dev_private; + struct i915_scheduler *scheduler = dev_priv->scheduler; + + if (args->size) + ret = -EINVAL; + else if ((priority > scheduler->priority_level_max) || +(priority < scheduler->priority_level_min)) + ret = -EINVAL; + else if ((priority > 0) && +!capable(CAP_SYS_ADMIN)) + ret = -EPERM; + else + ctx->sched_info.priority = priority; + break; + } + default: ret = -EINVAL; break; diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c index a42a13e..793fbce 100644 --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c @@ -1707,6 +1707,9 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data, params->args_DR4= args->DR4; params->batch_obj = batch_obj; + /* Start with the context's priority level */ + qe.priority = ctx->sched_info.priority; + /* * Save away the list of objects used by this batch buffer for the * purpose of tracking inter-buffer dependencies. diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index acf2102..8a01a47 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -1140,6 +1140,7 @@ struct drm_i915_gem_context_param { #define I915_CONTEXT_PARAM_BAN_PERIOD 0x1 #define I915_CONTEXT_PARAM_NO_ZEROMAP 0x2 #define I915_CONTEXT_PARAM_GTT_SIZE0x3 +#define I915_CONTEXT_PARAM_PRIORITY0x4
[Intel-gfx] [PATCH v5 19/35] drm/i915: Added scheduler flush calls to ring throttle and idle functions
From: John Harrison

When requesting that all GPU work is completed, it is now necessary to get the scheduler involved in order to flush out work that is queued but not yet submitted. v2: Updated to add support for flushing the scheduler queue by time stamp rather than just doing a blanket flush. v3: Moved submit_max_priority() to this patch from an earlier patch as it is no longer required in the other. v4: Corrected the format of a comment to keep the style checker happy. Downgraded a BUG_ON to a WARN_ON as the latter is preferred. v5: Shuffled functions around to remove forward prototypes, removed similarly offensive white space and added documentation. Re-worked the mutex locking around the submit function. [Joonas Lahtinen] Used lighter weight spinlocks. For: VIZ-1587 Signed-off-by: John Harrison Cc: Joonas Lahtinen --- drivers/gpu/drm/i915/i915_gem.c | 24 - drivers/gpu/drm/i915/i915_scheduler.c | 178 ++ drivers/gpu/drm/i915/i915_scheduler.h | 3 + 3 files changed, 204 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index a47a495..d946f53 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -3786,6 +3786,10 @@ int i915_gpu_idle(struct drm_device *dev) /* Flush everything onto the inactive list. */ for_each_ring(ring, dev_priv, i) { + ret = i915_scheduler_flush(ring, true); + if (ret < 0) + return ret; + if (!i915.enable_execlists) { struct drm_i915_gem_request *req; @@ -4519,7 +4523,8 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file) unsigned long recent_enough = jiffies - DRM_I915_THROTTLE_JIFFIES; struct drm_i915_gem_request *request, *target = NULL; unsigned reset_counter; - int ret; + int i, ret; + struct intel_engine_cs *ring; ret = i915_gem_wait_for_error(&dev_priv->gpu_error); if (ret) @@ -4529,6 +4534,23 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file) if (ret) return ret; + for_each_ring(ring, dev_priv, i) { + /* +* Flush out scheduler entries that are getting 'stale'. Note +* that the following recent_enough test will only check +* against the time at which the request was submitted to the +* hardware (i.e. when it left the scheduler) not the time it +* was submitted to the driver. +* +* Also, there is not much point worrying about busy return +* codes from the scheduler flush call. Even if more work +* cannot be submitted right now for whatever reason, we +* still want to throttle against stale work that has already +* been submitted. +*/ + i915_scheduler_flush_stamp(ring, recent_enough, false); + } + spin_lock(&file_priv->mm.lock); list_for_each_entry(request, &file_priv->mm.request_list, client_list) { if (time_after_eq(request->emitted_jiffies, recent_enough)) diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index edab63d..8130a9c 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -304,6 +304,10 @@ static int i915_scheduler_pop_from_queue_locked(struct intel_engine_cs *ring, * attempting to acquire a mutex while holding a spin lock is a Bad Idea. * And releasing the one before acquiring the other leads to other code * being run and interfering.
+ * + * Hence any caller that does not already have the mutex lock for other + * reasons should call i915_scheduler_submit_unlocked() instead in order to + * obtain the lock first. */ static int i915_scheduler_submit(struct intel_engine_cs *ring) { @@ -428,6 +432,22 @@ error: return ret; } +static int i915_scheduler_submit_unlocked(struct intel_engine_cs *ring) +{ + struct drm_device *dev = ring->dev; + int ret; + + ret = i915_mutex_lock_interruptible(dev); + if (ret) + return ret; + + ret = i915_scheduler_submit(ring); + + mutex_unlock(>struct_mutex); + + return ret; +} + static void i915_generate_dependencies(struct i915_scheduler *scheduler, struct i915_scheduler_queue_entry *node, uint32_t ring) @@ -917,6 +937,164 @@ void i915_scheduler_work_handler(struct work_struct *work) i915_scheduler_process_work(ring); } +static int i915_scheduler_submit_max_priority(struct intel_engine_cs *ring, + bool is_locked) +{ + struct
[Intel-gfx] [PATCH v5 20/35] drm/i915: Add scheduler hook to GPU reset
From: John HarrisonWhen the watchdog resets the GPU, all interrupts get disabled despite the reference count remaining. As the scheduler probably had interrupts enabled during the reset (it would have been waiting for the bad batch to complete), it must be poked to tell it that the interrupt has been disabled. v5: New patch in series. For: VIZ-1587 Signed-off-by: John Harrison --- drivers/gpu/drm/i915/i915_gem.c | 2 ++ drivers/gpu/drm/i915/i915_scheduler.c | 11 +++ drivers/gpu/drm/i915/i915_scheduler.h | 1 + 3 files changed, 14 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index d946f53..d7f7f7a 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -3248,6 +3248,8 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv, buffer->last_retired_head = buffer->tail; intel_ring_update_space(buffer); } + + i915_scheduler_reset_cleanup(ring); } void i915_gem_reset(struct drm_device *dev) diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index 8130a9c..4f25bf2 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -778,6 +778,17 @@ void i915_scheduler_clean_node(struct i915_scheduler_queue_entry *node) } } +void i915_scheduler_reset_cleanup(struct intel_engine_cs *ring) +{ + struct drm_i915_private *dev_priv = ring->dev->dev_private; + struct i915_scheduler *scheduler = dev_priv->scheduler; + + if (scheduler->flags[ring->id] & i915_sf_interrupts_enabled) { + ring->irq_put(ring); + scheduler->flags[ring->id] &= ~i915_sf_interrupts_enabled; + } +} + static bool i915_scheduler_remove(struct i915_scheduler *scheduler, struct intel_engine_cs *ring, struct list_head *remove) diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h index 839b048..075befb 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.h +++ b/drivers/gpu/drm/i915/i915_scheduler.h @@ -89,6 +89,7 @@ 
bool i915_scheduler_is_enabled(struct drm_device *dev); int i915_scheduler_init(struct drm_device *dev); int i915_scheduler_closefile(struct drm_device *dev, struct drm_file *file); +void i915_scheduler_reset_cleanup(struct intel_engine_cs *ring); void i915_scheduler_clean_node(struct i915_scheduler_queue_entry *node); int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe); bool i915_scheduler_notify_request(struct drm_i915_gem_request *req); -- 1.9.1 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH v5 21/35] drm/i915: Added a module parameter to allow the scheduler to be disabled
From: John HarrisonIt can be useful to be able to disable the GPU scheduler via a module parameter for debugging purposes. v5: Converted from a multi-feature 'overrides' mask to a single 'enable' boolean. Further features (e.g. pre-emption) will now be separate 'enable' booleans added later. [Chris Wilson] For: VIZ-1587 Signed-off-by: John Harrison --- drivers/gpu/drm/i915/i915_params.c| 4 drivers/gpu/drm/i915/i915_params.h| 1 + drivers/gpu/drm/i915/i915_scheduler.c | 5 - 3 files changed, 9 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c index d0eba58..0ef3159 100644 --- a/drivers/gpu/drm/i915/i915_params.c +++ b/drivers/gpu/drm/i915/i915_params.c @@ -57,6 +57,7 @@ struct i915_params i915 __read_mostly = { .edp_vswing = 0, .enable_guc_submission = true, .guc_log_level = -1, + .enable_scheduler = 0, }; module_param_named(modeset, i915.modeset, int, 0400); @@ -203,3 +204,6 @@ MODULE_PARM_DESC(enable_guc_submission, "Enable GuC submission (default:false)") module_param_named(guc_log_level, i915.guc_log_level, int, 0400); MODULE_PARM_DESC(guc_log_level, "GuC firmware logging level (-1:disabled (default), 0-3:enabled)"); + +module_param_named_unsafe(enable_scheduler, i915.enable_scheduler, int, 0600); +MODULE_PARM_DESC(enable_scheduler, "Enable scheduler (0 = disable [default], 1 = enable)"); diff --git a/drivers/gpu/drm/i915/i915_params.h b/drivers/gpu/drm/i915/i915_params.h index 5299290..f855c86 100644 --- a/drivers/gpu/drm/i915/i915_params.h +++ b/drivers/gpu/drm/i915/i915_params.h @@ -60,6 +60,7 @@ struct i915_params { bool enable_guc_submission; bool verbose_state_checks; bool nuclear_pageflip; + int enable_scheduler; }; extern struct i915_params i915 __read_mostly; diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index 4f25bf2..47d7de4 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -34,6 +34,9 @@ bool 
i915_scheduler_is_enabled(struct drm_device *dev) { struct drm_i915_private *dev_priv = dev->dev_private; + if (!i915.enable_scheduler) + return false; + return dev_priv->scheduler != NULL; } @@ -548,7 +551,7 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe) WARN_ON(!scheduler); - if (1/*!i915.enable_scheduler*/) + if (!i915.enable_scheduler) return i915_scheduler_queue_execbuffer_bypass(qe); node = kmalloc(sizeof(*node), GFP_KERNEL); -- 1.9.1 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH v5 17/35] drm/i915: Added scheduler support to __wait_request() calls
From: John Harrison

The scheduler can cause batch buffers, and hence requests, to be submitted to the ring out of order and asynchronously to their submission to the driver. Thus at the point of waiting for the completion of a given request, it is not even guaranteed that the request has actually been sent to the hardware yet. Even if it has been sent, it is possible that it could be pre-empted and thus 'unsent'. This means that it is necessary to be able to submit requests to the hardware during the wait call itself. Unfortunately, while some callers of __wait_request() release the mutex lock first, others do not (and apparently cannot). Hence there is the potential for deadlock: the wait stalls waiting for submission, while the asynchronous submission is stalled waiting for the mutex lock. This change hooks the scheduler into the __wait_request() code to ensure correct behaviour. That is, flush the target batch buffer through to the hardware and do not deadlock waiting for something that cannot currently be submitted. Instead, the wait call must return -EAGAIN at least as far back as necessary to release the mutex lock and allow the scheduler's asynchronous processing to get in, handle the pre-emption operation and eventually (re-)submit the work.

v3: Removed the explicit scheduler flush from i915_wait_request(). This is no longer necessary and was causing unintended changes to the scheduler priority level which broke a validation team test.

v4: Corrected the format of some comments to keep the style checker happy.

v5: Added function description.
[Joonas Lahtinen] For: VIZ-1587 Signed-off-by: John Harrison Cc: Joonas Lahtinen --- drivers/gpu/drm/i915/i915_drv.h | 3 ++- drivers/gpu/drm/i915/i915_gem.c | 37 ++--- drivers/gpu/drm/i915/i915_scheduler.c | 31 +++ drivers/gpu/drm/i915/i915_scheduler.h | 2 ++ drivers/gpu/drm/i915/intel_display.c| 5 +++-- drivers/gpu/drm/i915/intel_ringbuffer.c | 2 +- 6 files changed, 69 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 4d544f1..5eeeced 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -3071,7 +3071,8 @@ int __i915_wait_request(struct drm_i915_gem_request *req, unsigned reset_counter, bool interruptible, s64 *timeout, - struct intel_rps_client *rps); + struct intel_rps_client *rps, + bool is_locked); int __must_check i915_wait_request(struct drm_i915_gem_request *req); int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf); int __must_check diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 2dd9b55..17b44b3 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1258,7 +1258,8 @@ int __i915_wait_request(struct drm_i915_gem_request *req, unsigned reset_counter, bool interruptible, s64 *timeout, - struct intel_rps_client *rps) + struct intel_rps_client *rps, + bool is_locked) { struct intel_engine_cs *ring = i915_gem_request_get_ring(req); struct drm_device *dev = ring->dev; @@ -1268,8 +1269,10 @@ int __i915_wait_request(struct drm_i915_gem_request *req, DEFINE_WAIT(wait); unsigned long timeout_expire; s64 before = 0; /* Only to silence a compiler warning. 
*/ - int ret; + int ret = 0; + bool busy; + might_sleep(); WARN(!intel_irqs_enabled(dev_priv), "IRQs disabled"); if (i915_gem_request_completed(req)) @@ -1324,6 +1327,26 @@ int __i915_wait_request(struct drm_i915_gem_request *req, break; } + if (is_locked) { + /* +* If this request is being processed by the scheduler +* then it is unsafe to sleep with the mutex lock held +* as the scheduler may require the lock in order to +* progress the request. +*/ + if (i915_scheduler_is_request_tracked(req, NULL, &busy)) { + if (busy) { + ret = -EAGAIN; + break; + } + } + + /* +* If the request is not tracked by the scheduler +* then the regular test can be done. +*/ + } + if (i915_gem_request_completed(req)) {
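The locking rule in the hunk above can be modelled in a few lines of standalone userspace C. Everything here is illustrative: `struct request`, `wait_request()` and the `scheduler_busy` flag are stand-ins for the driver's real request and scheduler state, not its API. The point is the decision itself: a wait that holds the mutex must bail out with -EAGAIN rather than sleep on a request the scheduler still has queued, because the scheduler needs that same mutex to submit it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the wait-path decision. */
struct request {
	bool completed;      /* has the hardware finished it? */
	bool scheduler_busy; /* still queued/flying in the scheduler? */
};

static int wait_request(const struct request *req, bool is_locked)
{
	if (req->completed)
		return 0;

	/* Sleeping here with the mutex held could deadlock: the scheduler
	 * may need that same mutex to submit the request being waited on,
	 * so back out and let the caller drop the lock and retry. */
	if (is_locked && req->scheduler_busy)
		return -EAGAIN;

	/* ... real code would now sleep until the seqno passes ... */
	return 0;
}
```

A caller that receives -EAGAIN is expected to unwind far enough to release the mutex, let the scheduler's worker run, and then retry the wait.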
[Intel-gfx] [PATCH v5 31/35] drm/i915: Scheduler state dump via debugfs
From: John HarrisonAdded a facility for triggering the scheduler state dump via a debugfs entry. v2: New patch in series. For: VIZ-1587 Signed-off-by: John Harrison --- drivers/gpu/drm/i915/i915_debugfs.c | 33 + drivers/gpu/drm/i915/i915_scheduler.c | 9 + drivers/gpu/drm/i915/i915_scheduler.h | 6 ++ 3 files changed, 44 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index 2c8b00f..e0dc06d77 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -1285,6 +1285,38 @@ DEFINE_SIMPLE_ATTRIBUTE(i915_scheduler_file_queue_max_fops, i915_scheduler_file_queue_max_set, "%llu\n"); +static int +i915_scheduler_dump_flags_get(void *data, u64 *val) +{ + struct drm_device *dev = data; + struct drm_i915_private *dev_priv = dev->dev_private; + struct i915_scheduler *scheduler = dev_priv->scheduler; + + *val = scheduler->dump_flags; + + return 0; +} + +static int +i915_scheduler_dump_flags_set(void *data, u64 val) +{ + struct drm_device *dev = data; + struct drm_i915_private *dev_priv = dev->dev_private; + struct i915_scheduler *scheduler = dev_priv->scheduler; + + scheduler->dump_flags = lower_32_bits(val) & i915_sf_dump_mask; + + if (val & 1) + i915_scheduler_dump_all(dev, "DebugFS"); + + return 0; +} + +DEFINE_SIMPLE_ATTRIBUTE(i915_scheduler_dump_flags_fops, + i915_scheduler_dump_flags_get, + i915_scheduler_dump_flags_set, + "0x%llx\n"); + static int i915_frequency_info(struct seq_file *m, void *unused) { struct drm_info_node *node = m->private; @@ -5666,6 +5698,7 @@ static const struct i915_debugfs_files { {"i915_scheduler_priority_preempt", _scheduler_priority_preempt_fops}, {"i915_scheduler_min_flying", _scheduler_min_flying_fops}, {"i915_scheduler_file_queue_max", _scheduler_file_queue_max_fops}, + {"i915_scheduler_dump_flags", _scheduler_dump_flags_fops}, {"i915_display_crc_ctl", _display_crc_ctl_fops}, {"i915_pri_wm_latency", _pri_wm_latency_fops}, 
{"i915_spr_wm_latency", _spr_wm_latency_fops}, diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index c69e2b8..b738e0b 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -184,6 +184,10 @@ int i915_scheduler_init(struct drm_device *dev) scheduler->priority_level_preempt = 900; scheduler->min_flying = 2; scheduler->file_queue_max = 64; + scheduler->dump_flags = i915_sf_dump_force | + i915_sf_dump_details | + i915_sf_dump_seqno | + i915_sf_dump_dependencies; dev_priv->scheduler = scheduler; @@ -1311,10 +1315,7 @@ static int i915_scheduler_dump_all_locked(struct drm_device *dev, int i, r, ret = 0; for_each_ring(ring, dev_priv, i) { - scheduler->flags[ring->id] |= i915_sf_dump_force | - i915_sf_dump_details | - i915_sf_dump_seqno | - i915_sf_dump_dependencies; + scheduler->flags[ring->id] |= scheduler->dump_flags & i915_sf_dump_mask; r = i915_scheduler_dump_locked(ring, msg); if (ret == 0) ret = r; diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h index dcf1f05..47c7951 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.h +++ b/drivers/gpu/drm/i915/i915_scheduler.h @@ -108,6 +108,7 @@ struct i915_scheduler { int32_t priority_level_preempt; uint32_t min_flying; uint32_t file_queue_max; + uint32_t dump_flags; /* Statistics: */ struct i915_scheduler_stats stats[I915_NUM_RINGS]; @@ -124,6 +125,11 @@ enum { i915_sf_dump_details = (1 << 9), i915_sf_dump_dependencies = (1 << 10), i915_sf_dump_seqno = (1 << 11), + + i915_sf_dump_mask = i915_sf_dump_force | + i915_sf_dump_details | + i915_sf_dump_dependencies | + i915_sf_dump_seqno, }; const char *i915_scheduler_flag_str(uint32_t flags); -- 1.9.1
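The write side of the new debugfs file combines two behaviours: the stored flags are masked down to the known dump bits, and bit 0 of the written value acts as an immediate dump trigger. A simplified standalone model (the names and a plain counter for the dump are illustrative, not the driver code) captures the semantics:

```c
#include <assert.h>
#include <stdint.h>

/* Mirror of the i915_sf_dump_* bit layout, illustrative only. */
#define DUMP_FORCE        (1u << 8)
#define DUMP_DETAILS      (1u << 9)
#define DUMP_DEPENDENCIES (1u << 10)
#define DUMP_SEQNO        (1u << 11)
#define DUMP_MASK (DUMP_FORCE | DUMP_DETAILS | DUMP_DEPENDENCIES | DUMP_SEQNO)

struct scheduler_model {
	uint32_t dump_flags;
	int dumps_triggered; /* stands in for i915_scheduler_dump_all() */
};

static void dump_flags_set(struct scheduler_model *s, uint64_t val)
{
	/* lower_32_bits() plus masking to the known dump bits */
	s->dump_flags = (uint32_t)val & DUMP_MASK;

	/* bit 0 is not a stored flag: it requests a dump right now */
	if (val & 1)
		s->dumps_triggered++;
}
```

Note that because bit 0 is outside the mask, writing `0x301` both stores the two dump bits and fires a dump, but the trigger bit itself is never retained.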
[Intel-gfx] [PATCH v5 32/35] drm/i915: Enable GPU scheduler by default
From: John Harrison

Now that all the scheduler patches have been applied, it is safe to enable. v5: Updated for new module parameter. For: VIZ-1587 Signed-off-by: John Harrison --- drivers/gpu/drm/i915/i915_params.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c index 0ef3159..9be486f 100644 --- a/drivers/gpu/drm/i915/i915_params.c +++ b/drivers/gpu/drm/i915/i915_params.c @@ -57,7 +57,7 @@ struct i915_params i915 __read_mostly = { .edp_vswing = 0, .enable_guc_submission = true, .guc_log_level = -1, - .enable_scheduler = 0, + .enable_scheduler = 1, }; module_param_named(modeset, i915.modeset, int, 0400); @@ -206,4 +206,4 @@ MODULE_PARM_DESC(guc_log_level, "GuC firmware logging level (-1:disabled (default), 0-3:enabled)"); module_param_named_unsafe(enable_scheduler, i915.enable_scheduler, int, 0600); -MODULE_PARM_DESC(enable_scheduler, "Enable scheduler (0 = disable [default], 1 = enable)"); +MODULE_PARM_DESC(enable_scheduler, "Enable scheduler (0 = disable, 1 = enable [default])"); -- 1.9.1
[Intel-gfx] [PATCH v5 24/35] drm/i915: Added trace points to scheduler
From: John HarrisonAdded trace points to the scheduler to track all the various events, node state transitions and other interesting things that occur. v2: Updated for new request completion tracking implementation. v3: Updated for changes to node kill code. v4: Wrapped some long lines to keep the style checker happy. For: VIZ-1587 Signed-off-by: John Harrison --- drivers/gpu/drm/i915/i915_gem_execbuffer.c | 2 + drivers/gpu/drm/i915/i915_scheduler.c | 26 drivers/gpu/drm/i915/i915_trace.h | 196 + drivers/gpu/drm/i915/intel_lrc.c | 2 + 4 files changed, 226 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c index b9ad0fd..d4de8c7 100644 --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c @@ -1272,6 +1272,8 @@ i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params, i915_gem_execbuffer_move_to_active(vmas, params->request); + trace_i915_gem_ring_queue(ring, params); + qe = container_of(params, typeof(*qe), params); ret = i915_scheduler_queue_execbuffer(qe); if (ret) diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index 47d7de4..e56ce08 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -88,6 +88,8 @@ static void i915_scheduler_node_requeue(struct i915_scheduler_queue_entry *node) /* Seqno will be reassigned on relaunch */ node->params.request->seqno = 0; node->status = i915_sqs_queued; + trace_i915_scheduler_unfly(node->params.ring, node); + trace_i915_scheduler_node_state_change(node->params.ring, node); } /* @@ -99,7 +101,11 @@ static void i915_scheduler_node_kill(struct i915_scheduler_queue_entry *node) WARN_ON(!node); WARN_ON(I915_SQS_IS_COMPLETE(node)); + if (I915_SQS_IS_FLYING(node)) + trace_i915_scheduler_unfly(node->params.ring, node); + node->status = i915_sqs_dead; + trace_i915_scheduler_node_state_change(node->params.ring, node); } /* Mark a node as 
in flight on the hardware. */ @@ -124,6 +130,9 @@ static int i915_scheduler_node_fly(struct i915_scheduler_queue_entry *node) node->status = i915_sqs_flying; + trace_i915_scheduler_fly(ring, node); + trace_i915_scheduler_node_state_change(ring, node); + if (!(scheduler->flags[ring->id] & i915_sf_interrupts_enabled)) { bool success = true; @@ -280,6 +289,8 @@ static int i915_scheduler_pop_from_queue_locked(struct intel_engine_cs *ring, INIT_LIST_HEAD(>link); best->status = i915_sqs_popped; + trace_i915_scheduler_node_state_change(ring, best); + ret = 0; } else { /* Can only get here if: @@ -297,6 +308,8 @@ static int i915_scheduler_pop_from_queue_locked(struct intel_engine_cs *ring, } } + trace_i915_scheduler_pop_from_queue(ring, best); + *pop_node = best; return ret; } @@ -506,6 +519,8 @@ static int i915_scheduler_queue_execbuffer_bypass(struct i915_scheduler_queue_en struct i915_scheduler *scheduler = dev_priv->scheduler; int ret; + trace_i915_scheduler_queue(qe->params.ring, qe); + intel_ring_reserved_space_cancel(qe->params.request->ringbuf); scheduler->flags[qe->params.ring->id] |= i915_sf_submitting; @@ -628,6 +643,9 @@ int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe) not_flying = i915_scheduler_count_flying(scheduler, ring) < scheduler->min_flying; + trace_i915_scheduler_queue(ring, node); + trace_i915_scheduler_node_state_change(ring, node); + spin_unlock_irq(>lock); if (not_flying) @@ -657,6 +675,8 @@ bool i915_scheduler_notify_request(struct drm_i915_gem_request *req) struct i915_scheduler_queue_entry *node = req->scheduler_qe; unsigned long flags; + trace_i915_scheduler_landing(req); + if (!node) return false; @@ -670,6 +690,8 @@ bool i915_scheduler_notify_request(struct drm_i915_gem_request *req) else node->status = i915_sqs_complete; + trace_i915_scheduler_node_state_change(req->ring, node); + spin_unlock_irqrestore(>lock, flags); return true; @@ -877,6 +899,8 @@ static bool i915_scheduler_remove(struct i915_scheduler 
*scheduler, /* Launch more packets now? */ do_submit = (queued > 0) && (flying < scheduler->min_flying); + trace_i915_scheduler_remove(ring, min_seqno, do_submit); + spin_unlock_irq(>lock); return do_submit; @@ -912,6 +936,8 @@ static void
[Intel-gfx] [PATCH v5 15/35] drm/i915: Added tracking/locking of batch buffer objects
From: John Harrison

The scheduler needs to track interdependencies between batch buffers. These are calculated by analysing the object lists of the buffers and looking for commonality. The scheduler also needs to keep those buffers locked long after the initial IOCTL call has returned to user land.

v3: Updated to support read-read optimisation.

v5: Updated due to changes to earlier patches in series for splitting bypass mode into a separate function and consolidating the clean up code.

For: VIZ-1587 Signed-off-by: John Harrison --- drivers/gpu/drm/i915/i915_gem_execbuffer.c | 48 -- drivers/gpu/drm/i915/i915_scheduler.c | 15 ++ 2 files changed, 61 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c index 11bea8d..f45f4dc 100644 --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c @@ -1428,7 +1428,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data, struct i915_execbuffer_params *params = const u32 ctx_id = i915_execbuffer2_get_context_id(*args); u32 dispatch_flags; - int ret; + int ret, i; bool need_relocs; if (!i915_gem_check_execbuffer(args)) @@ -1543,6 +1543,14 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data, goto pre_mutex_err; } + qe.saved_objects = kzalloc( + sizeof(*qe.saved_objects) * args->buffer_count, + GFP_KERNEL); + if (!qe.saved_objects) { + ret = -ENOMEM; + goto err; + } + /* Look up object handles */ ret = eb_lookup_vmas(eb, exec, args, vm, file); if (ret) @@ -1663,7 +1671,30 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data, params->args_DR1= args->DR1; params->args_DR4= args->DR4; params->batch_obj = batch_obj; - params->ctx = ctx; + + /* +* Save away the list of objects used by this batch buffer for the +* purpose of tracking inter-buffer dependencies.
+*/ + for (i = 0; i < args->buffer_count; i++) { + struct drm_i915_gem_object *obj; + + /* +* NB: 'drm_gem_object_lookup()' increments the object's +* reference count and so must be matched by a +* 'drm_gem_object_unreference' call. +*/ + obj = to_intel_bo(drm_gem_object_lookup(dev, file, + exec[i].handle)); + qe.saved_objects[i].obj = obj; + qe.saved_objects[i].read_only = obj->base.pending_write_domain == 0; + + } + qe.num_objs = i; + + /* Lock and save the context object as well. */ + i915_gem_context_reference(ctx); + params->ctx = ctx; ret = dev_priv->gt.execbuf_submit(params, args, >vmas); if (ret) @@ -1696,6 +1727,19 @@ err: i915_gem_context_unreference(ctx); eb_destroy(eb); + /* Need to release the objects: */ + if (qe.saved_objects) { + for (i = 0; i < qe.num_objs; i++) + drm_gem_object_unreference( + _objects[i].obj->base); + + kfree(qe.saved_objects); + } + + /* Context too */ + if (params->ctx) + i915_gem_context_unreference(params->ctx); + /* * If the request was created but not successfully submitted then it * must be freed again. 
If it was submitted then it is being tracked diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index a3ffd04..60a59d3 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -719,6 +719,8 @@ void i915_scheduler_wakeup(struct drm_device *dev) */ void i915_scheduler_clean_node(struct i915_scheduler_queue_entry *node) { + int i; + if (!I915_SQS_IS_COMPLETE(node)) { WARN(!node->params.request->cancelled, "Cleaning active node: %d!\n", node->status); @@ -736,6 +738,19 @@ void i915_scheduler_clean_node(struct i915_scheduler_queue_entry *node) node->params.batch_obj = NULL; } + /* Release the locked buffers: */ + for (i = 0; i < node->num_objs; i++) + drm_gem_object_unreference(>saved_objects[i].obj->base); + kfree(node->saved_objects); + node->saved_objects = NULL; + node->num_objs = 0; + + /* Context too: */ + if (node->params.ctx) { + i915_gem_context_unreference(node->params.ctx); + node->params.ctx = NULL; + } + /* And anything else owned by the
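The patch pairs every drm_gem_object_lookup() in the execbuffer path with a later unreference in the clean-up path, so the saved objects stay pinned for exactly as long as the queue entry holds them. The same acquire/release discipline, modelled with plain counters in standalone C (all names are illustrative stand-ins for the GEM refcounting, not the driver's API):

```c
#include <assert.h>
#include <stdlib.h>

struct obj { int refcount; };

struct queue_entry {
	struct obj **saved_objects;
	int num_objs;
};

/* Save the batch's object list, taking one extra reference per object,
 * like the drm_gem_object_lookup() calls in the patch. */
static void entry_save_objects(struct queue_entry *qe, struct obj **objs, int n)
{
	qe->saved_objects = calloc(n, sizeof(*qe->saved_objects));
	for (int i = 0; i < n; i++) {
		objs[i]->refcount++;            /* lookup takes a reference */
		qe->saved_objects[i] = objs[i];
	}
	qe->num_objs = n;
}

/* Clean-up path: drop exactly the references taken above. */
static void entry_clean(struct queue_entry *qe)
{
	for (int i = 0; i < qe->num_objs; i++)
		qe->saved_objects[i]->refcount--; /* matching unreference */
	free(qe->saved_objects);
	qe->saved_objects = NULL;
	qe->num_objs = 0;
}
```

The invariant worth testing is that refcounts return to their pre-IOCTL values once the node is cleaned, whether the submission succeeded or failed.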
[Intel-gfx] [PATCH v5 27/35] drm/i915: Added debug state dump facilities to scheduler
From: John HarrisonWhen debugging batch buffer submission issues, it is useful to be able to see what the current state of the scheduler is. This change adds functions for decoding the internal scheduler state and reporting it. v3: Updated a debug message with the new state_str() function. v4: Wrapped some long lines to keep the style checker happy. Removed the fence/sync code as that will now be part of a separate patch series. v5: Removed forward declarations and white space. Added documentation. [Joonas Lahtinen] Also squashed in later patch to add seqno information from the start. It was only being added in a separate patch due to historical reasons which have since gone away. For: VIZ-1587 Signed-off-by: John Harrison Cc: Joonas Lahtinen --- drivers/gpu/drm/i915/i915_scheduler.c | 302 +- drivers/gpu/drm/i915/i915_scheduler.h | 15 ++ 2 files changed, 315 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index f7f29d5..d0eed52 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -40,6 +40,117 @@ bool i915_scheduler_is_enabled(struct drm_device *dev) return dev_priv->scheduler != NULL; } +const char *i915_qe_state_str(struct i915_scheduler_queue_entry *node) +{ + static char str[50]; + char*ptr = str; + + *(ptr++) = node->bumped ? 'B' : '-', + *(ptr++) = i915_gem_request_completed(node->params.request) ? 
'C' : '-'; + + *ptr = 0; + + return str; +} + +char i915_scheduler_queue_status_chr(enum i915_scheduler_queue_status status) +{ + switch (status) { + case i915_sqs_none: + return 'N'; + + case i915_sqs_queued: + return 'Q'; + + case i915_sqs_popped: + return 'X'; + + case i915_sqs_flying: + return 'F'; + + case i915_sqs_complete: + return 'C'; + + case i915_sqs_dead: + return 'D'; + + default: + break; + } + + return '?'; +} + +const char *i915_scheduler_queue_status_str( + enum i915_scheduler_queue_status status) +{ + static char str[50]; + + switch (status) { + case i915_sqs_none: + return "None"; + + case i915_sqs_queued: + return "Queued"; + + case i915_sqs_popped: + return "Popped"; + + case i915_sqs_flying: + return "Flying"; + + case i915_sqs_complete: + return "Complete"; + + case i915_sqs_dead: + return "Dead"; + + default: + break; + } + + sprintf(str, "[Unknown_%d!]", status); + return str; +} + +const char *i915_scheduler_flag_str(uint32_t flags) +{ + static char str[100]; + char *ptr = str; + + *ptr = 0; + +#define TEST_FLAG(flag, msg) \ + do {\ + if (flags & (flag)) { \ + strcpy(ptr, msg); \ + ptr += strlen(ptr); \ + flags &= ~(flag); \ + } \ + } while (0) + + TEST_FLAG(i915_sf_interrupts_enabled, "IntOn|"); + TEST_FLAG(i915_sf_submitting, "Submitting|"); + TEST_FLAG(i915_sf_dump_force, "DumpForce|"); + TEST_FLAG(i915_sf_dump_details, "DumpDetails|"); + TEST_FLAG(i915_sf_dump_dependencies, "DumpDeps|"); + TEST_FLAG(i915_sf_dump_seqno, "DumpSeqno|"); + +#undef TEST_FLAG + + if (flags) { + sprintf(ptr, "Unknown_0x%X!", flags); + ptr += strlen(ptr); + } + + if (ptr == str) + strcpy(str, "-"); + else + ptr[-1] = 0; + + return str; +}; + /** * i915_scheduler_init - Initialise the scheduler. 
* @dev: DRM device @@ -1024,6 +1135,193 @@ void i915_scheduler_work_handler(struct work_struct *work) i915_scheduler_process_work(ring); } +static int i915_scheduler_dump_locked(struct intel_engine_cs *ring, + const char *msg) +{ + struct drm_i915_private *dev_priv = ring->dev->dev_private; + struct i915_scheduler *scheduler = dev_priv->scheduler; + struct i915_scheduler_queue_entry *node; + int flying = 0, queued = 0, complete = 0, other = 0; + static int old_flying = -1, old_queued = -1, old_complete = -1; + bool b_dump; + char brkt[2] = { '<', '>' }; + + if (!ring) + return -EINVAL; + + list_for_each_entry(node, >node_queue[ring->id], link) { + if
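The TEST_FLAG pattern in i915_scheduler_flag_str() above is self-contained enough to lift out and exercise on its own. A userspace version with two example bits (the bit assignments here are arbitrary placeholders, not the driver's) behaves like this:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Decode a flags word into a '|'-separated token string, reporting any
 * bits that were not consumed by a known flag. Mirrors the patch's
 * TEST_FLAG macro structure; bit meanings are illustrative. */
static const char *flag_str(unsigned int flags)
{
	static char str[100];
	char *ptr = str;

	*ptr = 0;
#define TEST_FLAG(flag, msg) \
	do { \
		if (flags & (flag)) { \
			strcpy(ptr, msg); \
			ptr += strlen(ptr); \
			flags &= ~(flag); \
		} \
	} while (0)

	TEST_FLAG(1u << 0, "IntOn|");
	TEST_FLAG(1u << 1, "Submitting|");
#undef TEST_FLAG

	if (flags) {
		sprintf(ptr, "Unknown_0x%X!", flags);
		ptr += strlen(ptr);
	}

	if (ptr == str)
		strcpy(str, "-");
	else
		ptr[-1] = 0; /* drop the trailing '|' (or '!') */

	return str;
}
```

As in the patch, the static buffer means the result is only valid until the next call, which is acceptable for a debug-print helper.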
[Intel-gfx] [PATCH v5 23/35] drm/i915: Defer seqno allocation until actual hardware submission time
From: John Harrison

The seqno value is now only used for the final test for completion of a request. It is no longer used to track the request through the software stack. Thus it is no longer necessary to allocate the seqno immediately with the request. Instead, it can be done lazily and left until the request is actually sent to the hardware. This is particularly advantageous with a GPU scheduler as the requests can then be re-ordered between their creation and their hardware submission without having out of order seqnos.

v2: i915_add_request() can't fail! Combine with 'drm/i915: Assign seqno at start of exec_final()'. Various bits of code during the execbuf code path need a seqno value to be assigned to the request. This change makes this assignment explicit at the start of submission_final() rather than relying on an auto-generated seqno to have happened already. This is in preparation for a future patch which changes seqno values to be assigned lazily (during add_request).

v3: Updated to use locally cached request pointer.

v4: Changed some white space and comment formatting to keep the style checker happy.

For: VIZ-1587 Signed-off-by: John Harrison --- drivers/gpu/drm/i915/i915_drv.h| 1 + drivers/gpu/drm/i915/i915_gem.c| 23 ++- drivers/gpu/drm/i915/i915_gem_execbuffer.c | 14 ++ drivers/gpu/drm/i915/intel_lrc.c | 14 ++ 4 files changed, 51 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 5eeeced..071a27b 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2257,6 +2257,7 @@ struct drm_i915_gem_request { * has finished processing this request.
*/ u32 seqno; + u32 reserved_seqno; /* Unique identifier which can be used for trace points & debug */ uint32_t uniq; diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index a249e52..a2c136d 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -2616,6 +2616,11 @@ i915_gem_get_seqno(struct drm_device *dev, u32 *seqno) /* reserve 0 for non-seqno */ if (dev_priv->next_seqno == 0) { + /* +* Why is the full re-initialisation required? Is it only for +* hardware semaphores? If so, could skip it in the case where +* semaphores are disabled? +*/ int ret = i915_gem_init_seqno(dev, 0); if (ret) return ret; @@ -2673,6 +2678,12 @@ void __i915_add_request(struct drm_i915_gem_request *request, WARN(ret, "*_ring_flush_all_caches failed: %d!\n", ret); } + /* Make the request's seqno 'live': */ + if (!request->seqno) { + request->seqno = request->reserved_seqno; + WARN_ON(request->seqno != dev_priv->last_seqno); + } + /* Record the position of the start of the request so that * should we detect the updated seqno part-way through the * GPU processing the request, we never over-estimate the @@ -2930,6 +2941,9 @@ void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked) list_for_each_entry_safe(req, req_next, >fence_signal_list, signal_link) { if (!req->cancelled) { + /* How can this happen? */ + WARN_ON(req->seqno == 0); + if (!i915_seqno_passed(seqno, req->seqno)) break; } @@ -3079,7 +3093,14 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring, if (req == NULL) return -ENOMEM; - ret = i915_gem_get_seqno(ring->dev, >seqno); + /* +* Assign an identifier to track this request through the hardware +* but don't make it live yet. It could change in the future if this +* request gets overtaken. However, it still needs to be allocated +* in advance because the point of submission must not fail and seqno +* allocation can fail. 
+*/ + ret = i915_gem_get_seqno(ring->dev, >reserved_seqno); if (ret) goto err; diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c index f45f4dc..b9ad0fd 100644 --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c @@ -1295,6 +1295,20 @@ int i915_gem_ringbuffer_submission_final(struct i915_execbuffer_params *params) /* The mutex must be acquired before calling this function */ WARN_ON(!mutex_is_locked(>dev->struct_mutex)); + /* Make sure the request's seqno is the latest and greatest: */ + if (req->reserved_seqno != dev_priv->last_seqno) { + ret =
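The reserved-versus-live seqno scheme can be sketched in standalone C. This is a toy model, not the driver's code (the real version re-reserves via i915_gem_get_seqno() and compares against dev_priv->last_seqno): allocation hands out a reserved seqno, which may go stale if the scheduler reorders requests, and submission refreshes it so live seqnos always increase in hardware submission order.

```c
#include <assert.h>
#include <stdint.h>

struct request {
	uint32_t seqno;          /* live seqno, 0 until submission */
	uint32_t reserved_seqno; /* allocated up front, may be refreshed */
};

static uint32_t next_seqno = 1;   /* allocator state */
static uint32_t last_live_seqno;  /* last seqno made live */

/* Request creation: reserve a seqno now (allocation can fail in the real
 * driver, submission must not), but do not make it visible yet. */
static void request_alloc(struct request *req)
{
	req->seqno = 0;
	req->reserved_seqno = next_seqno++;
}

/* Hardware submission: if another request went live since this one made
 * its reservation, take a fresh seqno so live values never go backwards. */
static void request_make_live(struct request *req)
{
	if (req->reserved_seqno <= last_live_seqno)
		req->reserved_seqno = next_seqno++;
	req->seqno = req->reserved_seqno;
	last_live_seqno = req->seqno;
}
```

The test case that matters is the reordered one: allocate A then B, submit B first, and check that A still ends up with the higher live seqno.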
[Intel-gfx] [PATCH v5 28/35] drm/i915: Add early exit to execbuff_final() if insufficient ring space
From: John HarrisonOne of the major purposes of the GPU scheduler is to avoid stalling the CPU when the GPU is busy and unable to accept more work. This change adds support to the ring submission code to allow a ring space check to be performed before attempting to submit a batch buffer to the hardware. If insufficient space is available then the scheduler can go away and come back later, letting the CPU get on with other work, rather than stalling and waiting for the hardware to catch up. v3: Updated to use locally cached request pointer. v4: Line wrapped some comments differently to keep the style checker happy. Downgraded a BUG_ON to a WARN_ON as the latter is preferred. Removed some obsolete, commented out code. For: VIZ-1587 Signed-off-by: John Harrison --- drivers/gpu/drm/i915/i915_gem_execbuffer.c | 41 +-- drivers/gpu/drm/i915/intel_lrc.c | 54 +++--- drivers/gpu/drm/i915/intel_ringbuffer.c| 26 ++ drivers/gpu/drm/i915/intel_ringbuffer.h| 1 + 4 files changed, 107 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c index dff120c..83ce94d 100644 --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c @@ -1147,25 +1147,19 @@ i915_reset_gen7_sol_offsets(struct drm_device *dev, { struct intel_engine_cs *ring = req->ring; struct drm_i915_private *dev_priv = dev->dev_private; - int ret, i; + int i; if (!IS_GEN7(dev) || ring != _priv->ring[RCS]) { DRM_DEBUG("sol reset is gen7/rcs only\n"); return -EINVAL; } - ret = intel_ring_begin(req, 4 * 3); - if (ret) - return ret; - for (i = 0; i < 4; i++) { intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1)); intel_ring_emit_reg(ring, GEN7_SO_WRITE_OFFSET(i)); intel_ring_emit(ring, 0); } - intel_ring_advance(ring); - return 0; } @@ -1293,6 +1287,7 @@ int i915_gem_ringbuffer_submission_final(struct i915_execbuffer_params *params) struct intel_engine_cs *ring = params->ring; u64 exec_start, exec_len; int ret; + 
uint32_t min_space; /* The mutex must be acquired before calling this function */ WARN_ON(!mutex_is_locked(>dev->struct_mutex)); @@ -1316,6 +1311,34 @@ int i915_gem_ringbuffer_submission_final(struct i915_execbuffer_params *params) goto error; /* +* It would be a bad idea to run out of space while writing commands +* to the ring. One of the major aims of the scheduler is to not +* stall at any point for any reason. However, doing an early exit +* half way through submission could result in a partial sequence +* being written which would leave the engine in an unknown state. +* Therefore, check in advance that there will be enough space for +* the entire submission whether emitted by the code below OR by any +* other functions that may be executed before the end of final(). +* +* NB: This test deliberately overestimates, because that's easier +* than tracing every potential path that could be taken! +* +* Current measurements suggest that we may need to emit up to 186 +* dwords, so this is rounded up to 256 here. Then double that to get +* the free space requirement, because the block is not allowed to +* span the transition from the end to the beginning of the ring. +*/ +#define I915_BATCH_EXEC_MAX_LEN 256/* max dwords emitted here */ + min_space = I915_BATCH_EXEC_MAX_LEN * 2 * sizeof(uint32_t); + ret = intel_ring_test_space(req->ringbuf, min_space); + if (ret) + goto error; + + ret = intel_ring_begin(req, I915_BATCH_EXEC_MAX_LEN); + if (ret) + goto error; + + /* * Unconditionally invalidate gpu caches and ensure that we do flush * any residual writes from the previous batch. 
*/ @@ -1333,10 +1356,6 @@ int i915_gem_ringbuffer_submission_final(struct i915_execbuffer_params *params) if (ring == _priv->ring[RCS] && params->instp_mode != dev_priv->relative_constants_mode) { - ret = intel_ring_begin(req, 4); - if (ret) - goto error; - intel_ring_emit(ring, MI_NOOP); intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1)); intel_ring_emit_reg(ring, INSTPM); diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index 2b9f49c..e124443 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -231,6 +231,27 @@ static void
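The overestimate-then-reserve rule in the comment above can be modelled in a few lines of standalone C. The types here are hypothetical: the real intel_ring_test_space() works on the ring's head/tail pointers rather than a simple used-bytes counter, but the arithmetic is the same, and the doubling accounts for the block not being allowed to span the end-to-start wrap of the ring.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Max dwords the submission path may emit, deliberately rounded up. */
#define BATCH_EXEC_MAX_DWORDS 256

struct ringbuf {
	uint32_t size; /* total bytes */
	uint32_t used; /* bytes of outstanding commands */
};

static int ring_test_space(const struct ringbuf *rb, uint32_t min_space)
{
	/* Not enough room: tell the scheduler to come back later rather
	 * than stalling the CPU waiting for the GPU to drain. */
	return (rb->size - rb->used >= min_space) ? 0 : -EAGAIN;
}

static int submit_precheck(const struct ringbuf *rb)
{
	/* Double the worst case so the reservation never spans the wrap. */
	uint32_t min_space = BATCH_EXEC_MAX_DWORDS * 2 * sizeof(uint32_t);

	return ring_test_space(rb, min_space);
}
```

With a 4 KiB ring the precheck needs 2048 free bytes, so a ring with 3000 bytes outstanding is rejected up front instead of failing half way through emission.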
[Intel-gfx] [PATCH v5 11/35] drm/i915: Added scheduler hook into i915_gem_request_notify()
From: John Harrison

The scheduler needs to know when requests have completed so that it can keep its own internal state up to date and can submit new requests to the hardware from its queue.

v2: Updated due to changes in request handling. The operation is now reversed from before. Rather than the scheduler being in control of completion events, it is now the request code itself. The scheduler merely receives a notification event. It can then optionally request that its worker thread be woken up after all completion processing is complete.

v4: Downgraded a BUG_ON to a WARN_ON as the latter is preferred.

v5: Squashed the i915_scheduler.c portions down into the 'start of scheduler' patch. [Joonas Lahtinen]

For: VIZ-1587 Signed-off-by: John Harrison Cc: Joonas Lahtinen --- drivers/gpu/drm/i915/i915_gem.c | 16 1 file changed, 16 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 0003cfc..c3b7def 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -2872,6 +2872,7 @@ void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked) { struct drm_i915_gem_request *req, *req_next; unsigned long flags; + bool wake_sched = false; u32 seqno; if (list_empty(>fence_signal_list)) { @@ -2908,6 +2909,14 @@ void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked) */ list_del_init(>signal_link); + /* +* NB: Must notify the scheduler before signalling +* the node. Otherwise the node can get retired first +* and call scheduler_clean() while the scheduler +* thinks it is still active. +*/ + wake_sched |= i915_scheduler_notify_request(req); + if (!req->cancelled) { fence_signal_locked(>fence); trace_i915_gem_request_complete(req); @@ -2924,6 +2933,13 @@ void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked) if (!fence_locked) spin_unlock_irqrestore(>fence_lock, flags); + + /* Necessary? Or does the fence_signal() call do an implicit wakeup?
*/ + wake_up_all(>irq_queue); + + /* Final scheduler processing after all individual updates are done. */ + if (wake_sched) + i915_scheduler_wakeup(ring->dev); } static const char *i915_gem_request_get_driver_name(struct fence *req_fence) -- 1.9.1 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH v5 08/35] drm/i915: Disable hardware semaphores when GPU scheduler is enabled
From: John Harrison

Hardware semaphores require seqno values to be continuously incrementing. However, the scheduler's reordering of batch buffers means that the seqno values going through the hardware could be out of order. Thus semaphores cannot be used. On the other hand, the scheduler supersedes the need for hardware semaphores anyway. Having one ring stall waiting for something to complete on another ring is inefficient if that ring could be working on some other, independent task. This is what the scheduler is meant to do - keep the hardware as busy as possible by reordering batch buffers to avoid dependency stalls.

v4: Downgraded a BUG_ON to WARN_ON as the latter is preferred.

v5: Squashed the i915_scheduler.c portions down into the 'start of scheduler' patch. [Joonas Lahtinen]

For: VIZ-1587
Signed-off-by: John Harrison
Cc: Joonas Lahtinen
---
 drivers/gpu/drm/i915/i915_drv.c         | 9 +++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.c | 4 ++++
 2 files changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 975af35..5760a17 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -34,6 +34,7 @@
 #include "i915_drv.h"
 #include "i915_trace.h"
 #include "intel_drv.h"
+#include "i915_scheduler.h"
 
 #include <...>
 #include <...>
@@ -517,6 +518,14 @@ void intel_detect_pch(struct drm_device *dev)
 
 bool i915_semaphore_is_enabled(struct drm_device *dev)
 {
+	/* Hardware semaphores are not compatible with the scheduler due to the
+	 * seqno values being potentially out of order. However, semaphores are
+	 * also not required as the scheduler will handle inter-ring dependencies
+	 * and try to do so in a way that does not cause dead time on the
+	 * hardware.
+	 */
+	if (i915_scheduler_is_enabled(dev))
+		return false;
+
 	if (INTEL_INFO(dev)->gen < 6)
 		return false;
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 9d4f19d..ca7b8af 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -33,6 +33,7 @@
 #include <...>
 #include "i915_trace.h"
 #include "intel_drv.h"
+#include "i915_scheduler.h"
 
 int __intel_ring_space(int head, int tail, int size)
 {
@@ -1400,6 +1401,9 @@ gen6_ring_sync(struct drm_i915_gem_request *waiter_req,
 	u32 wait_mbox = signaller->semaphore.mbox.wait[waiter->id];
 	int ret;
 
+	/* Arithmetic on sequence numbers is unreliable with a scheduler. */
+	WARN_ON(i915_scheduler_is_enabled(signaller->dev));
+
 	/* Throughout all of the GEM code, seqno passed implies our current
 	 * seqno is >= the last seqno executed. However for hardware the
 	 * comparison is strictly greater than.
-- 
1.9.1
[Intel-gfx] [PATCH v5 13/35] drm/i915: Redirect execbuffer_final() via scheduler
From: John Harrison

Updated the execbuffer() code to pass the packaged up batch buffer information to the scheduler rather than calling execbuffer_final() directly. The scheduler queue() code is currently a stub which simply chains on to _final() immediately.

For: VIZ-1587
Signed-off-by: John Harrison
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 18 +++++-------------
 drivers/gpu/drm/i915/intel_lrc.c           | 12 ++++--------
 2 files changed, 11 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 7978dae..09c5ce9 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -33,6 +33,7 @@
 #include "intel_drv.h"
 #include <...>
 #include <...>
+#include "i915_scheduler.h"
 
 #define __EXEC_OBJECT_HAS_PIN (1<<31)
 #define __EXEC_OBJECT_HAS_FENCE (1<<30)
@@ -1226,6 +1227,7 @@ i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
 			       struct drm_i915_gem_execbuffer2 *args,
 			       struct list_head *vmas)
 {
+	struct i915_scheduler_queue_entry *qe;
 	struct drm_device *dev = params->dev;
 	struct intel_engine_cs *ring = params->ring;
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -1270,17 +1272,11 @@ i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
 
 	i915_gem_execbuffer_move_to_active(vmas, params->request);
 
-	ret = dev_priv->gt.execbuf_final(params);
+	qe = container_of(params, typeof(*qe), params);
+	ret = i915_scheduler_queue_execbuffer(qe);
 	if (ret)
 		return ret;
 
-	/*
-	 * Free everything that was stored in the QE structure (until the
-	 * scheduler arrives and does it instead):
-	 */
-	if (params->dispatch_flags & I915_DISPATCH_SECURE)
-		i915_gem_execbuff_release_batch_obj(params->batch_obj);
-
 	return 0;
 }
 
@@ -1420,8 +1416,8 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	struct intel_engine_cs *ring;
 	struct intel_context *ctx;
 	struct i915_address_space *vm;
-	struct i915_execbuffer_params params_master; /* XXX: will be removed later */
-	struct i915_execbuffer_params *params = &params_master;
+	struct i915_scheduler_queue_entry qe;
+	struct i915_execbuffer_params *params = &qe.params;
 	const u32 ctx_id = i915_execbuffer2_get_context_id(*args);
 	u32 dispatch_flags;
 	int ret;
@@ -1529,7 +1525,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	else
 		vm = &dev_priv->gtt.base;
 
-	memset(&params_master, 0x00, sizeof(params_master));
+	memset(&qe, 0x00, sizeof(qe));
 
 	eb = eb_create(args);
 	if (eb == NULL) {
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 12e8949..ff4565f 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -136,6 +136,7 @@
 #include "i915_drv.h"
 #include "intel_mocs.h"
+#include "i915_scheduler.h"
 
 #define GEN9_LR_CONTEXT_RENDER_SIZE (22 * PAGE_SIZE)
 #define GEN8_LR_CONTEXT_RENDER_SIZE (20 * PAGE_SIZE)
@@ -910,6 +911,7 @@ int intel_execlists_submission(struct i915_execbuffer_params *params,
 			       struct drm_i915_gem_execbuffer2 *args,
 			       struct list_head *vmas)
 {
+	struct i915_scheduler_queue_entry *qe;
 	struct drm_device *dev = params->dev;
 	struct intel_engine_cs *ring = params->ring;
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -952,17 +954,11 @@ int intel_execlists_submission(struct i915_execbuffer_params *params,
 
 	i915_gem_execbuffer_move_to_active(vmas, params->request);
 
-	ret = dev_priv->gt.execbuf_final(params);
+	qe = container_of(params, typeof(*qe), params);
+	ret = i915_scheduler_queue_execbuffer(qe);
 	if (ret)
 		return ret;
 
-	/*
-	 * Free everything that was stored in the QE structure (until the
-	 * scheduler arrives and does it instead):
-	 */
-	if (params->dispatch_flags & I915_DISPATCH_SECURE)
-		i915_gem_execbuff_release_batch_obj(params->batch_obj);
-
 	return 0;
 }
-- 
1.9.1
[Intel-gfx] [PATCH v5 05/35] drm/i915: Re-instate request->uniq because it is extremely useful
From: John Harrison

The seqno value cannot always be used when debugging issues via trace points. This is because it can be reset back to the start, especially during TDR type tests. Also, when the scheduler arrives the seqno is only valid while a given request is executing on the hardware. While the request is simply queued waiting for submission, its seqno value will be zero (meaning invalid).

v4: Wrapped a long line to keep the style checker happy.

v5: Added uniq to the dispatch trace point. [Svetlana Kukanova]

For: VIZ-5115
Signed-off-by: John Harrison
Reviewed-by: Tomas Elf
---
 drivers/gpu/drm/i915/i915_drv.h   |  5 +++++
 drivers/gpu/drm/i915/i915_gem.c   |  4 +++-
 drivers/gpu/drm/i915/i915_trace.h | 32 ++++++++++++++++++++++++--------
 3 files changed, 30 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 8dd811e..f4487b9 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1986,6 +1986,8 @@ struct drm_i915_private {
 	struct intel_encoder *dig_port_map[I915_MAX_PORTS];
 
+	uint32_t request_uniq;
+
 	/*
 	 * NOTE: This is the dri1/ums dungeon, don't add stuff here. Your patch
 	 * will be rejected. Instead look for a better place.
@@ -2242,6 +2244,9 @@ struct drm_i915_gem_request {
 	 */
 	u32 seqno;
 
+	/* Unique identifier which can be used for trace points & debug */
+	uint32_t uniq;
+
 	/** Position in the ringbuffer of the start of the request */
 	u32 head;
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index bf39ca4..dfe43ea 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2960,7 +2960,8 @@ static void i915_gem_request_fence_value_str(struct fence *req_fence,
 
 	req = container_of(req_fence, typeof(*req), fence);
 
-	snprintf(str, size, "%d [%d]", req->fence.seqno, req->seqno);
+	snprintf(str, size, "%d [%d:%d]", req->fence.seqno, req->uniq,
+		 req->seqno);
 }
 
 static const struct fence_ops i915_gem_request_fops = {
@@ -3036,6 +3037,7 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 
 	req->i915 = dev_priv;
 	req->ring = ring;
+	req->uniq = dev_priv->request_uniq++;
 	req->ctx = ctx;
 	i915_gem_context_reference(req->ctx);
 
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index cfe4f03..455c215 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -469,6 +469,7 @@ TRACE_EVENT(i915_gem_ring_sync_to,
 		__field(u32, dev)
 		__field(u32, sync_from)
 		__field(u32, sync_to)
+		__field(u32, uniq_to)
 		__field(u32, seqno)
 	),
 
@@ -476,13 +477,14 @@ TRACE_EVENT(i915_gem_ring_sync_to,
 		__entry->dev = from->dev->primary->index;
 		__entry->sync_from = from->id;
 		__entry->sync_to = to_req->ring->id;
+		__entry->uniq_to = to_req->uniq;
 		__entry->seqno = i915_gem_request_get_seqno(req);
 	),
 
-	TP_printk("dev=%u, sync-from=%u, sync-to=%u, seqno=%u",
+	TP_printk("dev=%u, sync-from=%u, sync-to=%u, seqno=%u, to_uniq=%u",
 		  __entry->dev, __entry->sync_from, __entry->sync_to,
-		  __entry->seqno)
+		  __entry->seqno, __entry->uniq_to)
 );
 
 TRACE_EVENT(i915_gem_ring_dispatch,
@@ -492,6 +494,7 @@ TRACE_EVENT(i915_gem_ring_dispatch,
 	TP_STRUCT__entry(
 		__field(u32, dev)
 		__field(u32, ring)
+		__field(u32, uniq)
 		__field(u32, seqno)
 		__field(u32, flags)
 	),
 
@@ -501,13 +504,15 @@ TRACE_EVENT(i915_gem_ring_dispatch,
 		struct intel_engine_cs *ring = i915_gem_request_get_ring(req);
 		__entry->dev = ring->dev->primary->index;
 		__entry->ring = ring->id;
+		__entry->uniq = req->uniq;
 		__entry->seqno = i915_gem_request_get_seqno(req);
 		__entry->flags = flags;
 		i915_trace_irq_get(ring, req);
 	),
 
-	TP_printk("dev=%u, ring=%u, seqno=%u, flags=%x",
-		  __entry->dev, __entry->ring, __entry->seqno, __entry->flags)
+	TP_printk("dev=%u, ring=%u, uniq=%u, seqno=%u, flags=%x",
+		  __entry->dev, __entry->ring, __entry->uniq,
[Intel-gfx] [PATCH v5 10/35] drm/i915: Added scheduler hook when closing DRM file handles
From: John Harrison

The scheduler decouples the submission of batch buffers to the driver with submission of batch buffers to the hardware. Thus it is possible for an application to close its DRM file handle while there is still work outstanding. That means the scheduler needs to know about file close events so it can remove the file pointer from such orphaned batch buffers and not attempt to dereference it later.

v3: Updated to not wait for outstanding work to complete but merely remove the file handle reference. The wait was getting excessively complicated with inter-ring dependencies, pre-emption, and other such issues.

v4: Changed some white space to keep the style checker happy.

v5: Added function documentation and removed apparently objectionable white space. [Joonas Lahtinen] Used lighter weight spinlocks.

For: VIZ-1587
Signed-off-by: John Harrison
Cc: Joonas Lahtinen
---
 drivers/gpu/drm/i915/i915_dma.c       |  3 +++
 drivers/gpu/drm/i915/i915_scheduler.c | 48 +++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_scheduler.h |  2 ++
 3 files changed, 53 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index a0f5659..678adc7 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -46,6 +46,7 @@
 #include <...>
 #include <...>
 #include <...>
+#include "i915_scheduler.h"
 #include <...>
 #include <...>
 #include <...>
@@ -1258,6 +1259,8 @@ void i915_driver_lastclose(struct drm_device *dev)
 void i915_driver_preclose(struct drm_device *dev, struct drm_file *file)
 {
+	i915_scheduler_closefile(dev, file);
+
 	mutex_lock(&dev->struct_mutex);
 	i915_gem_context_close(dev, file);
 	i915_gem_release(dev, file);
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index fc23ee7..ab5007a 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -872,3 +872,51 @@ void i915_scheduler_process_work(struct intel_engine_cs *ring)
 	if (do_submit)
 		intel_runtime_pm_put(dev_priv);
 }
+
+/**
+ * i915_scheduler_closefile - notify the scheduler that a DRM file handle
+ * has been closed.
+ * @dev: DRM device
+ * @file: file being closed
+ *
+ * Goes through the scheduler's queues and removes all connections to the
+ * disappearing file handle that still exist. There is an argument to say
+ * that this should also flush such outstanding work through the hardware.
+ * However, with pre-emption, TDR and other such complications doing so
+ * becomes a locking nightmare. So instead, just warn with a debug message
+ * if the application is leaking uncompleted work and make sure a null
+ * pointer dereference will not follow.
+ */
+int i915_scheduler_closefile(struct drm_device *dev, struct drm_file *file)
+{
+	struct i915_scheduler_queue_entry *node;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct i915_scheduler *scheduler = dev_priv->scheduler;
+	struct intel_engine_cs *ring;
+	int i;
+
+	if (!scheduler)
+		return 0;
+
+	spin_lock_irq(&scheduler->lock);
+
+	for_each_ring(ring, dev_priv, i) {
+		list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
+			if (node->params.file != file)
+				continue;
+
+			if (!I915_SQS_IS_COMPLETE(node))
+				DRM_DEBUG_DRIVER("Closing file handle with outstanding work: %d:%d/%d on %s\n",
+						 node->params.request->uniq,
+						 node->params.request->seqno,
+						 node->status,
+						 ring->name);
+
+			node->params.file = NULL;
+		}
+	}
+
+	spin_unlock_irq(&scheduler->lock);
+
+	return 0;
+}
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 415fec8..0e8b6a9 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -87,6 +87,8 @@ enum {
 
 bool i915_scheduler_is_enabled(struct drm_device *dev);
 int i915_scheduler_init(struct drm_device *dev);
+int i915_scheduler_closefile(struct drm_device *dev,
+			     struct drm_file *file);
 void i915_scheduler_clean_node(struct i915_scheduler_queue_entry *node);
 int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe);
 bool i915_scheduler_notify_request(struct drm_i915_gem_request *req);
-- 
1.9.1
[Intel-gfx] [PATCH v5 09/35] drm/i915: Force MMIO flips when scheduler enabled
From: John Harrison

MMIO flips are the preferred mechanism now but, more importantly, pipe based flips cause issues for the scheduler. Specifically, submitting work to the rings around the side of the scheduler could cause that work to be lost if the scheduler generates a pre-emption event on that ring.

For: VIZ-1587
Signed-off-by: John Harrison
---
 drivers/gpu/drm/i915/intel_display.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 6e12ed7..731d20a 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -46,6 +46,7 @@
 #include <...>
 #include <...>
 #include <...>
+#include "i915_scheduler.h"
 
 /* Primary plane formats for gen <= 3 */
 static const uint32_t i8xx_primary_formats[] = {
@@ -11330,6 +11331,8 @@ static bool use_mmio_flip(struct intel_engine_cs *ring,
 		return true;
 	else if (i915.enable_execlists)
 		return true;
+	else if (i915_scheduler_is_enabled(ring->dev))
+		return true;
 	else if (obj->base.dma_buf &&
 		 !reservation_object_test_signaled_rcu(obj->base.dma_buf->resv,
 						       false))
-- 
1.9.1
[Intel-gfx] [PATCH v5 07/35] drm/i915: Prepare retire_requests to handle out-of-order seqnos
From: John Harrison

A major point of the GPU scheduler is that it re-orders batch buffers after they have been submitted to the driver. This leads to requests completing out of order. In turn, this means that the retire processing can no longer assume that all completed entries are at the front of the list. Rather than attempting to re-order the request list on a regular basis, it is better to simply scan the entire list.

v2: Removed deferred free code as no longer necessary due to request handling updates.

For: VIZ-1587
Signed-off-by: John Harrison
---
 drivers/gpu/drm/i915/i915_gem.c | 31 +++++++++++++------------------
 1 file changed, 13 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 7d9aa24..0003cfc 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3233,6 +3233,7 @@ void i915_gem_reset(struct drm_device *dev)
 void
 i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 {
+	struct drm_i915_gem_object *obj, *obj_next;
 	struct drm_i915_gem_request *req, *req_next;
 	LIST_HEAD(list_head);
 
@@ -3245,37 +3246,31 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 	 */
 	i915_gem_request_notify(ring, false);
 
+	/*
+	 * Note that request entries might be out of order due to rescheduling
+	 * and pre-emption. Thus both lists must be processed in their entirety
+	 * rather than stopping at the first non-complete entry.
+	 */
+
 	/* Retire requests first as we use it above for the early return.
 	 * If we retire requests last, we may use a later seqno and so clear
 	 * the requests lists without clearing the active list, leading to
 	 * confusion.
 	 */
-	while (!list_empty(&ring->request_list)) {
-		struct drm_i915_gem_request *request;
-
-		request = list_first_entry(&ring->request_list,
-					   struct drm_i915_gem_request,
-					   list);
-
-		if (!i915_gem_request_completed(request))
-			break;
+	list_for_each_entry_safe(req, req_next, &ring->request_list, list) {
+		if (!i915_gem_request_completed(req))
+			continue;
 
-		i915_gem_request_retire(request);
+		i915_gem_request_retire(req);
 	}
 
 	/* Move any buffers on the active list that are no longer referenced
 	 * by the ringbuffer to the flushing/inactive lists as appropriate,
 	 * before we free the context associated with the requests.
 	 */
-	while (!list_empty(&ring->active_list)) {
-		struct drm_i915_gem_object *obj;
-
-		obj = list_first_entry(&ring->active_list,
-				       struct drm_i915_gem_object,
-				       ring_list[ring->id]);
-
+	list_for_each_entry_safe(obj, obj_next, &ring->active_list,
+				 ring_list[ring->id]) {
 		if (!list_empty(&obj->last_read_req[ring->id]->list))
-			break;
+			continue;
 
 		i915_gem_object_retire__read(obj, ring->id);
 	}
-- 
1.9.1

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx