[Intel-gfx] [PATCH 09/10] drm/i915: Migrate stolen objects before hibernation

2016-02-18 Thread ankitprasad . r . sharma
From: Chris Wilson 

Ville reminded us that stolen memory is not preserved across
hibernation, and as a result the context objects now being allocated
from stolen were being corrupted on S4, promptly hanging the GPU on
resume.

We want to utilise stolen for as much as possible (nothing else will use
that wasted memory otherwise), so we need a strategy for handling
general objects allocated from stolen across hibernation. A simple
solution is to do a CPU copy through the GTT of the stolen object into a
fresh shmemfs backing store and thenceforth treat it as a normal object.
This can be refined in future either to use a GPU copy to avoid the slow
uncached reads (though it's hibernation!) or to recreate stolen objects
upon resume/first-use. For now, a simple approach should suffice for
testing the object migration.
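The copy-through-GTT strategy described above can be sketched in plain
userspace C. Here a single PAGE_SIZE buffer stands in for the mappable
aperture window that insert_page() retargets at each stolen page; all
names are illustrative, not the actual i915 entry points.

```c
#include <assert.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Copy n_pages of a "stolen" object into a fresh backing store, page
 * by page, through one reusable aperture window (illustrative model,
 * not the i915 code). */
static void migrate_pages(const char *stolen, char *shmem,
                          char *window, int n_pages)
{
    for (int i = 0; i < n_pages; i++) {
        /* stands in for insert_page(): window now shows stolen page i */
        memcpy(window, stolen + (size_t)i * PAGE_SIZE, PAGE_SIZE);
        /* the (slow, uncached in reality) read through the window
         * into the new shmemfs-like store */
        memcpy(shmem + (size_t)i * PAGE_SIZE, window, PAGE_SIZE);
    }
}
```

The point of the window is that the whole object never needs a
contiguous GTT mapping; only one page is mapped at any time.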

v2:
Swap PTE for pinned bindings over to the shmemfs. This adds a
complicated dance, but is required as many stolen objects are likely to
be pinned for use by the hardware. Swapping the PTEs should not result
in externally visible behaviour, as each PTE update should be atomic and
the two pages identical. (danvet)

Be safe by default, per the principle of least surprise: we need a new
flag to mark objects that we can wilfully discard and recreate across
hibernation. (danvet)

Just use the global_list rather than invent a new stolen_list. This is
the slowpath hibernate and so adding a new list and the associated
complexity isn't worth it.

v3: Rebased on drm-intel-nightly (Ankit)

v4: Use insert_page to map stolen memory backed pages for migration to
shmem (Chris)

v5: Acquire mutex lock while copying stolen buffer objects to shmem (Chris)

v6: Handled file leak, split the object migration function, added
kerneldoc for the migrate_stolen_to_shmemfs() function (Tvrtko)
Use i915 wrapper function for drm_mm_insert_node_in_range()

v7: Keep the object in the CPU domain after get_pages, remove the object
from the unbound list only when marked PURGED, corrected split of the
object migration function (Chris)

v8: Split i915_gem_freeze(), removed redundant use of barrier, corrected
use of set_to_cpu_domain() (Chris)

v9: Replaced WARN_ON by BUG_ON and added a comment explaining it
(Daniel/Tvrtko)

v10: Document use of barriers (Chris)

Signed-off-by: Chris Wilson 
Signed-off-by: Ankitprasad Sharma 
Reviewed-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/i915/i915_drv.c |  17 ++-
 drivers/gpu/drm/i915/i915_drv.h |  10 ++
 drivers/gpu/drm/i915/i915_gem.c | 198 ++--
 drivers/gpu/drm/i915/i915_gem_stolen.c  |  49 
 drivers/gpu/drm/i915/intel_display.c|   3 +
 drivers/gpu/drm/i915/intel_fbdev.c  |   6 +
 drivers/gpu/drm/i915/intel_pm.c |   2 +
 drivers/gpu/drm/i915/intel_ringbuffer.c |   6 +
 8 files changed, 279 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 11d8414..cfa44af 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -996,6 +996,21 @@ static int i915_pm_suspend(struct device *dev)
return i915_drm_suspend(drm_dev);
 }
 
+static int i915_pm_freeze(struct device *dev)
+{
+   int ret;
+
+   ret = i915_gem_freeze(pci_get_drvdata(to_pci_dev(dev)));
+   if (ret)
+   return ret;
+
+   ret = i915_pm_suspend(dev);
+   if (ret)
+   return ret;
+
+   return 0;
+}
+
 static int i915_pm_suspend_late(struct device *dev)
 {
struct drm_device *drm_dev = dev_to_i915(dev)->dev;
@@ -1643,7 +1658,7 @@ static const struct dev_pm_ops i915_pm_ops = {
 * @restore, @restore_early : called after rebooting and restoring the
 *hibernation image [PMSG_RESTORE]
 */
-   .freeze = i915_pm_suspend,
+   .freeze = i915_pm_freeze,
.freeze_late = i915_pm_suspend_late,
.thaw_early = i915_pm_resume_early,
.thaw = i915_pm_resume,
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 943b301..16f2f94 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2137,6 +2137,12 @@ struct drm_i915_gem_object {
 * Advice: are the backing pages purgeable?
 */
unsigned int madv:2;
+   /**
+* Whereas madv is for userspace, there are certain situations
+* where we want I915_MADV_DONTNEED behaviour on internal objects
+* without conflating the userspace setting.
+*/
+   unsigned int internal_volatile:1;
 
/**
 * Current tiling mode for the object.
@@ -3093,6 +3099,9 @@ int i915_gem_l3_remap(struct drm_i915_gem_request *req, 
int slice);
 void i915_gem_init_swizzling(struct drm_device *dev);
 void i915_gem_cleanup_ringbuffer(struct drm_device *dev);
 int __must_check i915_gpu_idle(struct drm_device *dev);
+int __must_check 

[Intel-gfx] [PATCH 10/10] drm/i915: Disable use of stolen area by User when Intel RST is present

2016-02-18 Thread ankitprasad . r . sharma
From: Ankitprasad Sharma 

The BIOS Rapid Start Technology may corrupt the stolen memory across S3
suspend, as it may silently enter hibernation behind our back, in which
case we will not be able to preserve the user data stored in the stolen
region. Hence this patch tries to detect the presence of the RST device
on the ACPI bus, and disables use of stolen memory (for persistent data)
if it is found.
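The gating this adds is small enough to model in userspace: when the
stolen region is flagged volatile (RST detected at init), user-requested
stolen allocations are refused with -ENODEV. The names below are
illustrative stand-ins for the driver state, not the i915 structures.

```c
#include <assert.h>
#include <errno.h>

/* Illustrative stand-in for the relevant bit of i915 driver state. */
struct mm_state {
    int volatile_stolen;   /* set when the RST ACPI device is present */
};

static int stolen_alloc_check(const struct mm_state *mm)
{
    /* Stolen may be overwritten by external parties, so it is
     * unsuitable for persistent user data. */
    if (mm->volatile_stolen)
        return -ENODEV;
    return 0;
}
```

Returning a specific errno (rather than silently falling back) lets
userspace distinguish "stolen unavailable on this system" from an
out-of-space failure.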

v2: Updated comment, updated/corrected new functions private to driver
(Chris/Tvrtko)

v3: Disabling stolen by default, wait till required acpi changes to
detect device presence are pulled in (Ankit)

v4: Enabled stolen by default as required acpi changes are merged
(Ankit)

v5: Renamed variable, used IS_ENABLED() in place of #ifdef, use char *
instead of structures (Lukas)

Signed-off-by: Ankitprasad Sharma 
Cc: Lukas Wunner 
---
 drivers/gpu/drm/i915/i915_drv.h| 11 +++
 drivers/gpu/drm/i915/i915_gem.c|  8 
 drivers/gpu/drm/i915/i915_gem_stolen.c | 12 
 drivers/gpu/drm/i915/intel_acpi.c  |  7 +++
 4 files changed, 38 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 16f2f94..75e6935 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1349,6 +1349,16 @@ struct i915_gem_mm {
 */
bool busy;
 
+   /**
+* Stolen will be lost upon hibernate (as the memory is unpowered).
+* Across resume, we expect stolen to be intact - however, it may
+* also be utilised by third parties (e.g. Intel RapidStart
+* Technology) and if so we have to assume that any data stored in
+* stolen across resume is lost and we set this flag to indicate that
+* the stolen memory is volatile.
+*/
+   bool volatile_stolen;
+
/* the indicator for dispatch video commands on two BSD rings */
unsigned int bsd_ring_dispatch_index;
 
@@ -3465,6 +3475,7 @@ intel_opregion_notify_adapter(struct drm_device *dev, 
pci_power_t state)
 #endif
 
 /* intel_acpi.c */
+bool intel_detect_acpi_rst(void);
 #ifdef CONFIG_ACPI
 extern void intel_register_dsm_handler(void);
 extern void intel_unregister_dsm_handler(void);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 587beea..8e5fce4 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -396,8 +396,16 @@ static struct drm_i915_gem_object *
 i915_gem_alloc_object_stolen(struct drm_device *dev, size_t size)
 {
struct drm_i915_gem_object *obj;
+   struct drm_i915_private *dev_priv = dev->dev_private;
int ret;
 
+   if (dev_priv->mm.volatile_stolen) {
+   /* Stolen may be overwritten by external parties
+* so unsuitable for persistent user data.
+*/
+   return ERR_PTR(-ENODEV);
+   }
+
mutex_lock(&dev->struct_mutex);
obj = i915_gem_object_create_stolen(dev, size);
if (IS_ERR(obj))
diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c 
b/drivers/gpu/drm/i915/i915_gem_stolen.c
index 335a1ef..88ee036 100644
--- a/drivers/gpu/drm/i915/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
@@ -482,6 +482,18 @@ int i915_gem_init_stolen(struct drm_device *dev)
 */
drm_mm_init(&dev_priv->mm.stolen, 0, dev_priv->gtt.stolen_usable_size);
 
+   /* If the stolen region can be modified behind our backs upon suspend,
+* then we cannot use it to store nonvolatile contents (i.e user data)
+* as it will be corrupted upon resume.
+*/
+   dev_priv->mm.volatile_stolen = false;
+   if (IS_ENABLED(CONFIG_SUSPEND)) {
+   /* BIOSes using RapidStart Technology have been reported
+* to overwrite stolen across S3, not just S4.
+*/
+   dev_priv->mm.volatile_stolen = intel_detect_acpi_rst();
+   }
+
return 0;
 }
 
diff --git a/drivers/gpu/drm/i915/intel_acpi.c 
b/drivers/gpu/drm/i915/intel_acpi.c
index eb638a1..05fd67f 100644
--- a/drivers/gpu/drm/i915/intel_acpi.c
+++ b/drivers/gpu/drm/i915/intel_acpi.c
@@ -23,6 +23,8 @@ static const u8 intel_dsm_guid[] = {
0x0f, 0x13, 0x17, 0xb0, 0x1c, 0x2c
 };
 
+static const char *irst_id = "INT3392";
+
 static char *intel_dsm_port_name(u8 id)
 {
switch (id) {
@@ -162,3 +164,8 @@ void intel_register_dsm_handler(void)
 void intel_unregister_dsm_handler(void)
 {
 }
+
+bool intel_detect_acpi_rst(void)
+{
+   return acpi_dev_present(irst_id);
+}
-- 
1.9.1

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 03/10] drm/i915: Use insert_page for pwrite_fast

2016-02-18 Thread ankitprasad . r . sharma
From: Ankitprasad Sharma 

In pwrite_fast, map an object page by page if obj_ggtt_pin fails. First,
we try a nonblocking pin for the whole object (since that is fastest if
reused), then failing that we try to grab one page in the mappable
aperture. It also allows us to handle objects larger than the mappable
aperture (e.g. if we need to pwrite with vGPU restricting the aperture
to a measly 8MiB or something like that).
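The per-page split that drives the rewritten copy loop is simple
arithmetic: each iteration handles the rest of the request, capped at
the remainder of the current page. A minimal sketch of that calculation
(illustrative, extracted from the loop's logic):

```c
#include <assert.h>

#define PAGE_SIZE 4096u

/* How many bytes one iteration of the pwrite loop handles: the rest
 * of the request, capped at the remainder of the current page. */
static unsigned int page_chunk(unsigned long long offset,
                               unsigned long long remain)
{
    unsigned int page_offset = (unsigned int)(offset & (PAGE_SIZE - 1));

    if (page_offset + remain > PAGE_SIZE)
        return PAGE_SIZE - page_offset;
    return (unsigned int)remain;
}
```

Only the first and last iterations can be partial pages; every middle
iteration copies a full PAGE_SIZE.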

v2: Pin pages before starting pwrite, Combined duplicate loops (Chris)

v3: Combined loops based on local patch by Chris (Chris)

v4: Added i915 wrapper function for drm_mm_insert_node_in_range (Chris)

v5: Renamed wrapper function for drm_mm_insert_node_in_range (Chris)
Added wrapper for drm_mm_remove_node() (Chris)

v6: Added get_pages call before pinning the pages (Tvrtko)
Added remove_mappable_node() wrapper for drm_mm_remove_node() (Chris)

v7: Added size argument for insert_mappable_node (Tvrtko)

v8: Do not put_pages after pwrite, do memset of node in the wrapper
function (insert_mappable_node) (Chris)

Signed-off-by: Ankitprasad Sharma 
Signed-off-by: Chris Wilson 
Reviewed-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/i915/i915_gem.c | 92 +++--
 1 file changed, 70 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index a928823..49a03f2 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -61,6 +61,24 @@ static bool cpu_write_needs_clflush(struct 
drm_i915_gem_object *obj)
return obj->pin_display;
 }
 
+static int
+insert_mappable_node(struct drm_i915_private *i915,
+ struct drm_mm_node *node, u32 size)
+{
+   memset(node, 0, sizeof(*node));
+   return drm_mm_insert_node_in_range_generic(&i915->gtt.base.mm, node,
+  size, 0, 0, 0,
+  i915->gtt.mappable_end,
+  DRM_MM_SEARCH_DEFAULT,
+  DRM_MM_CREATE_DEFAULT);
+}
+
+static void
+remove_mappable_node(struct drm_mm_node *node)
+{
+   drm_mm_remove_node(node);
+}
+
 /* some bookkeeping */
 static void i915_gem_info_add_obj(struct drm_i915_private *dev_priv,
  size_t size)
@@ -760,20 +778,33 @@ fast_user_write(struct io_mapping *mapping,
  * user into the GTT, uncached.
  */
 static int
-i915_gem_gtt_pwrite_fast(struct drm_device *dev,
+i915_gem_gtt_pwrite_fast(struct drm_i915_private *i915,
 struct drm_i915_gem_object *obj,
 struct drm_i915_gem_pwrite *args,
 struct drm_file *file)
 {
-   struct drm_i915_private *dev_priv = dev->dev_private;
-   ssize_t remain;
-   loff_t offset, page_base;
+   struct drm_mm_node node;
+   uint64_t remain, offset;
char __user *user_data;
-   int page_offset, page_length, ret;
+   int ret;
 
ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE | PIN_NONBLOCK);
-   if (ret)
-   goto out;
+   if (ret) {
+   ret = insert_mappable_node(i915, &node, PAGE_SIZE);
+   if (ret)
+   goto out;
+
+   ret = i915_gem_object_get_pages(obj);
+   if (ret) {
+   remove_mappable_node(&node);
+   goto out;
+   }
+
+   i915_gem_object_pin_pages(obj);
+   } else {
+   node.start = i915_gem_obj_ggtt_offset(obj);
+   node.allocated = false;
+   }
 
ret = i915_gem_object_set_to_gtt_domain(obj, true);
if (ret)
@@ -783,31 +814,39 @@ i915_gem_gtt_pwrite_fast(struct drm_device *dev,
if (ret)
goto out_unpin;
 
-   user_data = to_user_ptr(args->data_ptr);
-   remain = args->size;
-
-   offset = i915_gem_obj_ggtt_offset(obj) + args->offset;
-
intel_fb_obj_invalidate(obj, ORIGIN_GTT);
+   obj->dirty = true;
 
-   while (remain > 0) {
+   user_data = to_user_ptr(args->data_ptr);
+   offset = args->offset;
+   remain = args->size;
+   while (remain) {
/* Operation in this page
 *
 * page_base = page offset within aperture
 * page_offset = offset within page
 * page_length = bytes to copy for this page
 */
-   page_base = offset & PAGE_MASK;
-   page_offset = offset_in_page(offset);
-   page_length = remain;
-   if ((page_offset + remain) > PAGE_SIZE)
-   page_length = PAGE_SIZE - page_offset;
-
+   u32 page_base = node.start;
+   unsigned page_offset = offset_in_page(offset);
+   

[Intel-gfx] [PATCH v17 0/10] Support for creating/using Stolen memory backed objects

2016-02-18 Thread ankitprasad . r . sharma
From: Ankitprasad Sharma 

This patch series adds support for creating/using Stolen memory backed
objects.

Despite being a unified memory architecture (UMA) some bits of memory
are more equal than others. In particular we have the thorny issue of
stolen memory, memory stolen from the system by the BIOS and reserved
for igfx use. Stolen memory is required for some functions of the GPU
and display engine, but in general it goes wasted. Whilst we cannot
return it back to the system, we need to find some other method for
utilising it. As we do not support direct access to the physical address
in the stolen region, it behaves like a different class of memory,
closer in kin to local GPU memory. This strongly suggests that we need a
placement model like TTM if we are to fully utilize these discrete
chunks of differing memory.

To add support for creating stolen memory backed objects, we extend the
drm_i915_gem_create structure with a new flag through which the user can
express a preference for allocating the object from stolen memory. If
the flag is set, an attempt is made to allocate the object from stolen
memory, subject to the availability of free space in the stolen region.

This patch series also adds support for clearing buffer objects via
CPU/GTT. This is particularly useful for clearing out the memory from
the stolen region, but can also be used for other shmem-allocated
objects; it is currently used for buffers allocated in the stolen
region. Finally, it adds support for stealing purgeable stolen pages if
we run out of stolen memory when trying to allocate an object.

v2: Added support for read/write from/to objects not backed by
shmem using the pread/pwrite interface.
Also extended the current get_aperture ioctl to retrieve the
total and available size of the stolen region.

v3: Removed the extended get_aperture ioctl patch 5 (to be submitted as
part of other patch series), addressed comments by Chris about pread/pwrite
for non shmem backed objects.

v4: Rebased to the latest drm-intel-nightly.

v5: Addressed comments, replaced patch 1/4 "Clearing buffers via blitter
engine" by "Clearing buffers via CPU/GTT".

v6: Rebased to the latest drm-intel-nightly, addressed comments, updated
stolen memory purging logic by maintaining a list for purgeable stolen
memory objects, enabled pread/pwrite for all non-shmem backed objects
without tiling restrictions.

v7: Addressed comments, compiler optimization, new patch added for correct
error code propagation to the userspace.

v8: Added a new patch to the series to migrate stolen objects before
hibernation, as stolen memory is not preserved across hibernation. Added
correct error propagation for shmem as well as non-shmem backed object
allocation.

v9: Addressed comments, use of insert_page helper function to map object page
by page which can be helpful in low aperture space availability.

v10: Addressed comments, use insert_page for clearing out the stolen memory

v11: Addressed comments, 3 new patches added to support allocation from Stolen
memory
1. Allow use of i915_gem_object_get_dma_address for stolen backed objects
2. Use insert_page for pwrite_fast
3. Fail the execbuff using stolen objects as batchbuffers

v12: Addressed comments, Removed patch "Fail the execbuff using stolen objects
as batchbuffers"

v13: Addressed comments, Added 2 patches to detect Intel RST and disable stolen
for persistent data if RST device found
1. acpi: Export acpi_bus_type
2. drm/i915: Disable use of stolen area by User when Intel RST is present

v14: Addressed comments, Added 2 base patches to the series
1. drm/i915: Add support for mapping an object page by page
2. drm/i915: Introduce i915_gem_object_get_dma_address()

v15: Addressed comments, Disabled stolen memory by default

v16: Addressed comments, Added low level rpm assertions, Enabled stolen
memory

v17: Addressed comments

This can be verified using IGT tests:
igt/gem_stolen, igt/gem_create, igt/gem_pread, igt/gem_pwrite

Ankitprasad Sharma (6):
  drm/i915: Use insert_page for pwrite_fast
  drm/i915: Clearing buffer objects via CPU/GTT
  drm/i915: Support for creating Stolen memory backed objects
  drm/i915: Propagating correct error codes to the userspace
  drm/i915: Support for pread/pwrite from/to non shmem backed objects
  drm/i915: Disable use of stolen area by User when Intel RST is present

Chris Wilson (4):
  drm/i915: Add support for mapping an object page by page
  drm/i915: Introduce i915_gem_object_get_dma_address()
  drm/i915: Add support for stealing purgable stolen pages
  drm/i915: Migrate stolen objects before hibernation

 drivers/char/agp/intel-gtt.c |   9 +
 drivers/gpu/drm/i915/i915_debugfs.c  |   6 +-
 drivers/gpu/drm/i915/i915_dma.c  |   3 +
 drivers/gpu/drm/i915/i915_drv.c  |  17 +-
 drivers/gpu/drm/i915/i915_drv.h  |  58 ++-
 drivers/gpu/drm/i915/i915_gem.c  | 631 ---
 

[Intel-gfx] [PATCH 06/10] drm/i915: Propagating correct error codes to the userspace

2016-02-18 Thread ankitprasad . r . sharma
From: Ankitprasad Sharma 

Propagating correct error codes to userspace by using ERR_PTR and
PTR_ERR macros for stolen memory based object allocation. We generally
return -ENOMEM to the user whenever there is a failure in object
allocation. This patch helps the user identify the actual reason for the
failure, rather than receiving -ENOMEM each time.
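The ERR_PTR/PTR_ERR/IS_ERR convention this patch adopts encodes an
errno in the top of the pointer range, so one return value carries
either a valid object or a precise error instead of a bare NULL. A
userspace model of the kernel macros (the real definitions live in
include/linux/err.h):

```c
#include <assert.h>
#include <errno.h>

#define MAX_ERRNO 4095

/* Userspace model of the kernel's err.h helpers: errno values are
 * encoded as the last MAX_ERRNO addresses of the pointer space,
 * which no valid kernel pointer can occupy. */
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
    return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}
```

A caller thus writes `if (IS_ERR(obj)) return PTR_ERR(obj);` and the
original errno survives all the way out to the ioctl return value.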

v2: Moved the patch up in the series, added error propagation for
i915_gem_alloc_object too (Chris)

v3: Removed storing of error pointer inside structs, Corrected error
propagation in caller functions (Chris)

v4: Remove assignments inside the predicate (Chris)

v5: Removed unnecessary initializations, updated kerneldoc for
i915_guc_client, corrected missed error pointer handling (Tvrtko)

v6: Use ERR_CAST/temporary variable to avoid storing invalid pointer
in a common field (Chris)

v7: Resolved rebasing conflicts (Ankit)

v8: Removed redundant code (Chris)

Signed-off-by: Ankitprasad Sharma 
Reviewed-by: Chris Wilson 
---
 drivers/gpu/drm/i915/i915_gem.c  | 23 ++--
 drivers/gpu/drm/i915/i915_gem_batch_pool.c   |  4 +--
 drivers/gpu/drm/i915/i915_gem_context.c  |  4 +--
 drivers/gpu/drm/i915/i915_gem_render_state.c |  7 ++--
 drivers/gpu/drm/i915/i915_gem_stolen.c   | 53 +++-
 drivers/gpu/drm/i915/i915_guc_submission.c   | 52 +--
 drivers/gpu/drm/i915/intel_display.c |  2 +-
 drivers/gpu/drm/i915/intel_fbdev.c   |  6 ++--
 drivers/gpu/drm/i915/intel_lrc.c | 10 +++---
 drivers/gpu/drm/i915/intel_overlay.c |  4 +--
 drivers/gpu/drm/i915/intel_pm.c  |  7 ++--
 drivers/gpu/drm/i915/intel_ringbuffer.c  | 21 +--
 12 files changed, 110 insertions(+), 83 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 60d27fe..d63f18c 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -397,19 +397,18 @@ i915_gem_alloc_object_stolen(struct drm_device *dev, 
size_t size)
 
mutex_lock(&dev->struct_mutex);
obj = i915_gem_object_create_stolen(dev, size);
-   if (!obj) {
-   mutex_unlock(&dev->struct_mutex);
-   return NULL;
-   }
+   if (IS_ERR(obj))
+   goto out;
 
/* Always clear fresh buffers before handing to userspace */
ret = i915_gem_object_clear(obj);
if (ret) {
drm_gem_object_unreference(&obj->base);
-   mutex_unlock(&dev->struct_mutex);
-   return NULL;
+   obj = ERR_PTR(ret);
+   goto out;
}
 
+out:
mutex_unlock(&dev->struct_mutex);
return obj;
 }
@@ -444,8 +443,8 @@ i915_gem_create(struct drm_file *file,
return -EINVAL;
}
 
-   if (obj == NULL)
-   return -ENOMEM;
+   if (IS_ERR(obj))
+   return PTR_ERR(obj);
 
ret = drm_gem_handle_create(file, &obj->base, &handle);
/* drop reference from allocate - handle holds it now */
@@ -4562,14 +4561,16 @@ struct drm_i915_gem_object 
*i915_gem_alloc_object(struct drm_device *dev,
struct drm_i915_gem_object *obj;
struct address_space *mapping;
gfp_t mask;
+   int ret;
 
obj = i915_gem_object_alloc(dev);
if (obj == NULL)
-   return NULL;
+   return ERR_PTR(-ENOMEM);
 
-   if (drm_gem_object_init(dev, &obj->base, size) != 0) {
+   ret = drm_gem_object_init(dev, >base, size);
+   if (ret) {
i915_gem_object_free(obj);
-   return NULL;
+   return ERR_PTR(ret);
}
 
mask = GFP_HIGHUSER | __GFP_RECLAIMABLE;
diff --git a/drivers/gpu/drm/i915/i915_gem_batch_pool.c 
b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
index 7bf2f3f..d79caa2 100644
--- a/drivers/gpu/drm/i915/i915_gem_batch_pool.c
+++ b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
@@ -135,8 +135,8 @@ i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool,
int ret;
 
obj = i915_gem_alloc_object(pool->dev, size);
-   if (obj == NULL)
-   return ERR_PTR(-ENOMEM);
+   if (IS_ERR(obj))
+   return obj;
 
ret = i915_gem_object_get_pages(obj);
if (ret)
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
b/drivers/gpu/drm/i915/i915_gem_context.c
index 83a097c..2dd5fed 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -179,8 +179,8 @@ i915_gem_alloc_context_obj(struct drm_device *dev, size_t 
size)
int ret;
 
obj = i915_gem_alloc_object(dev, size);
-   if (obj == NULL)
-   return ERR_PTR(-ENOMEM);
+   if (IS_ERR(obj))
+   return obj;
 
/*
 * Try to make the context utilize L3 as well as LLC.
diff --git 

[Intel-gfx] [PATCH 08/10] drm/i915: Support for pread/pwrite from/to non shmem backed objects

2016-02-18 Thread ankitprasad . r . sharma
From: Ankitprasad Sharma 

This patch adds support for extending the pread/pwrite functionality
to objects not backed by shmem. The access will be made through the
GTT interface. This covers objects backed by stolen memory as well
as other non-shmem backed objects.
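The slow-path helper introduced below serves both directions with a
single boolean, mirroring the shape of slow_user_access(). A userspace
sketch (memcpy stands in for __copy_{from,to}_user, which in the kernel
returns the number of bytes left uncopied):

```c
#include <assert.h>
#include <string.h>
#include <stddef.h>

/* One helper for both pread and pwrite, selected by a flag
 * (illustrative model of the slow_user_access() shape). */
static size_t gtt_user_copy(char *obj_vaddr, char *user_data,
                            size_t length, int pwrite)
{
    if (pwrite)
        memcpy(obj_vaddr, user_data, length);  /* user -> object */
    else
        memcpy(user_data, obj_vaddr, length);  /* object -> user */
    return 0;  /* memcpy cannot fault here, so nothing is left over */
}
```

Folding both directions into one helper keeps the per-page mapping and
unmapping logic in a single place.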

v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)

v3: Rebased to the latest drm-intel-nightly (Ankit)

v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)

v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)

v6: Using pwrite_fast for non-shmem backed objects as well (Chris)

v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)

v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)

v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)

v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)

v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration

v12: Use page-by-page copy for slow user access too (Chris)

v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)

v14: Corrected datatypes/initializations (Tvrtko)

Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite

Signed-off-by: Ankitprasad Sharma 
---
 drivers/gpu/drm/i915/i915_gem.c | 221 ++--
 1 file changed, 189 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index ed8ae5d..0938ab1 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -55,6 +55,9 @@ static bool cpu_cache_is_coherent(struct drm_device *dev,
 
 static bool cpu_write_needs_clflush(struct drm_i915_gem_object *obj)
 {
+   if (obj->base.write_domain == I915_GEM_DOMAIN_CPU)
+   return false;
+
if (!cpu_cache_is_coherent(obj->base.dev, obj->cache_level))
return true;
 
@@ -646,6 +649,141 @@ shmem_pread_slow(struct page *page, int 
shmem_page_offset, int page_length,
return ret ? - EFAULT : 0;
 }
 
+static inline uint64_t
+slow_user_access(struct io_mapping *mapping,
+uint64_t page_base, int page_offset,
+char __user *user_data,
+unsigned long length, bool pwrite)
+{
+   void __iomem *ioaddr;
+   void *vaddr;
+   uint64_t unwritten;
+
+   ioaddr = io_mapping_map_wc(mapping, page_base);
+   /* We can use the cpu mem copy function because this is X86. */
+   vaddr = (void __force *)ioaddr + page_offset;
+   if (pwrite)
+   unwritten = __copy_from_user(vaddr, user_data, length);
+   else
+   unwritten = __copy_to_user(user_data, vaddr, length);
+
+   io_mapping_unmap(ioaddr);
+   return unwritten;
+}
+
+static int
+i915_gem_gtt_pread(struct drm_device *dev,
+  struct drm_i915_gem_object *obj, uint64_t size,
+  uint64_t data_offset, uint64_t data_ptr)
+{
+   struct drm_i915_private *dev_priv = dev->dev_private;
+   struct drm_mm_node node;
+   char __user *user_data;
+   uint64_t remain;
+   uint64_t offset;
+   int ret;
+
+   ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE);
+   if (ret) {
+   ret = insert_mappable_node(dev_priv, &node, PAGE_SIZE);
+   if (ret)
+   goto out;
+
+   ret = i915_gem_object_get_pages(obj);
+   if (ret) {
+   remove_mappable_node(&node);
+   goto out;
+   }
+
+   i915_gem_object_pin_pages(obj);
+   } else {
+   node.start = i915_gem_obj_ggtt_offset(obj);
+   node.allocated = false;
+   ret = i915_gem_object_put_fence(obj);
+   if (ret)
+   goto out_unpin;
+   }
+
+   ret = i915_gem_object_set_to_gtt_domain(obj, false);
+   if (ret)
+   goto out_unpin;
+
+   user_data = to_user_ptr(data_ptr);
+   remain = size;
+   offset = data_offset;
+
+   mutex_unlock(&dev->struct_mutex);
+   if (likely(!i915.prefault_disable)) {
+   ret = fault_in_multipages_writeable(user_data, remain);
+   if (ret) {
+   mutex_lock(&dev->struct_mutex);
+   goto out_unpin;
+   }
+   }
+
+   while (remain > 0) {
+   /* Operation in this page
+*
+* page_base = page offset within aperture
+* page_offset = 

[Intel-gfx] [PATCH 04/10] drm/i915: Clearing buffer objects via CPU/GTT

2016-02-18 Thread ankitprasad . r . sharma
From: Ankitprasad Sharma 

This patch adds support for clearing buffer objects via CPU/GTT. This
is particularly useful for clearing out non-shmem backed objects. We
currently intend to use this only for buffers allocated from the stolen
region.
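The clearing loop reuses one PAGE_SIZE window for every page instead of
mapping the whole object at once. A userspace sketch of that loop
(illustrative names; memcpy through the window stands in for the
insert_page()/memset_io() pair):

```c
#include <assert.h>
#include <string.h>
#include <stddef.h>

#define PAGE_SIZE 4096

/* Zero an object page by page through one reusable window
 * (illustrative model of the i915_gem_object_clear() loop). */
static void clear_pages(char *pages, size_t n_pages, char *window)
{
    for (size_t i = 0; i < n_pages; i++) {
        memset(window, 0, PAGE_SIZE);   /* memset_io() on the mapping */
        /* insert_page() retargeting: window writes land in page i */
        memcpy(pages + i * PAGE_SIZE, window, PAGE_SIZE);
    }
}
```

Since only one page is mapped at a time, the clear works even when the
mappable aperture is nearly exhausted.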

v2: Added kernel doc for i915_gem_clear_object(), corrected/removed
variable assignments (Tvrtko)

v3: Map object page by page to the gtt if the pinning of the whole object
to the ggtt fails, Corrected function name (Chris)

v4: Clear the buffer page by page, and not map the whole object in the gtt
aperture. Use i915 wrapper function in place of drm_mm_insert_node_in_range.

v5: Use renamed wrapper function for drm_mm_insert_node_in_range,
updated barrier positioning (Chris)

v6: Use PAGE_SIZE instead of 4096, use get_pages call before pinning pages
(Tvrtko)

v7: Fixed the onion (undo operation in reverse order) (Chris)

Testcase: igt/gem_stolen

Signed-off-by: Ankitprasad Sharma 
Reviewed-by: Tvrtko Ursulin 
Reviewed-by: Chris Wilson 
---
 drivers/gpu/drm/i915/i915_drv.h |  1 +
 drivers/gpu/drm/i915/i915_gem.c | 47 +
 2 files changed, 48 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index e4c25c6..1122e1b 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2938,6 +2938,7 @@ int i915_gem_obj_prepare_shmem_read(struct 
drm_i915_gem_object *obj,
int *needs_clflush);
 
 int __must_check i915_gem_object_get_pages(struct drm_i915_gem_object *obj);
+int i915_gem_object_clear(struct drm_i915_gem_object *obj);
 
 static inline int __sg_page_count(struct scatterlist *sg)
 {
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 49a03f2..1aa4fc9 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -5405,3 +5405,50 @@ fail:
drm_gem_object_unreference(&obj->base);
return ERR_PTR(ret);
 }
+
+/**
+ * i915_gem_object_clear() - Clear buffer object via CPU/GTT
+ * @obj: Buffer object to be cleared
+ *
+ * Return: 0 - success, non-zero - failure
+ */
+int i915_gem_object_clear(struct drm_i915_gem_object *obj)
+{
+   struct drm_i915_private *i915 = to_i915(obj->base.dev);
+   struct drm_mm_node node;
+   char __iomem *base;
+   uint64_t size = obj->base.size;
+   int ret, i;
+
+   lockdep_assert_held(&obj->base.dev->struct_mutex);
+   ret = insert_mappable_node(i915, &node, PAGE_SIZE);
+   if (ret)
+   return ret;
+
+   ret = i915_gem_object_get_pages(obj);
+   if (ret)
+   goto err_remove_node;
+
+   i915_gem_object_pin_pages(obj);
+   base = io_mapping_map_wc(i915->gtt.mappable, node.start);
+
+   for (i = 0; i < size/PAGE_SIZE; i++) {
+   i915->gtt.base.insert_page(>t.base,
+  i915_gem_object_get_dma_address(obj, i),
+  node.start,
+  I915_CACHE_NONE, 0);
+   wmb(); /* flush modifications to the GGTT (insert_page) */
+   memset_io(base, 0, PAGE_SIZE);
+   wmb(); /* flush the write before we modify the GGTT */
+   }
+
+   io_mapping_unmap(base);
+   i915->gtt.base.clear_range(&i915->gtt.base,
+   node.start, node.size,
+   true);
+   i915_gem_object_unpin_pages(obj);
+
+err_remove_node:
+   remove_mappable_node(&node);
+   return ret;
+}
-- 
1.9.1



[Intel-gfx] [PATCH 07/10] drm/i915: Add support for stealing purgable stolen pages

2016-02-18 Thread ankitprasad . r . sharma
From: Chris Wilson 

If we run out of stolen memory when trying to allocate an object, see if
we can reap enough purgeable objects to free up enough contiguous free
space for the allocation. This is in principle very much like evicting
objects to free up enough contiguous space in the vma when binding
a new object - and you will be forgiven for thinking that the code looks
very similar.

At the moment, we do not allow userspace to allocate objects in stolen,
so there is neither the memory pressure to trigger stolen eviction nor
any purgeable objects inside the stolen arena. However, this will change
in the near future, and so better management and defragmentation of
stolen memory will become a real issue.
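The eviction pass described above can be modelled as a walk over the
stolen object list, purging I915_MADV_DONTNEED objects until enough
space is released. This toy model only tracks totals; the real code
must also ensure the freed space is contiguous, and the struct and
function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <errno.h>

/* Toy stand-in for a stolen-backed object on the mm list. */
struct stolen_obj {
    size_t size;
    int purgeable;   /* userspace marked it DONTNEED */
    int purged;      /* set once we reap its pages */
};

/* Purge purgeable objects until `need` bytes have been released. */
static int evict_for(struct stolen_obj *objs, int n, size_t need)
{
    size_t freed = 0;

    for (int i = 0; i < n && freed < need; i++) {
        if (objs[i].purgeable && !objs[i].purged) {
            objs[i].purged = 1;
            freed += objs[i].size;
        }
    }
    return freed >= need ? 0 : -ENOSPC;
}
```

As the text notes, this is structurally the same idea as evicting VMAs
to make room for a new binding, just applied to the stolen arena.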

v2: Remember to remove the drm_mm_node.

v3: Rebased to the latest drm-intel-nightly (Ankit)

v4: corrected if-else braces format (Tvrtko/kerneldoc)

v5: Rebased to the latest drm-intel-nightly (Ankit)
Added a separate list to maintain purgeable objects from the stolen
memory region (Chris/Daniel)

v6: Compiler optimization (merging 2 single loops into one for() loop),
corrected code for object eviction, retire_requests before starting
object eviction (Chris)

v7: Added kernel doc for i915_gem_object_create_stolen()

v8: Check for struct_mutex lock before creating object from stolen
region (Tvrtko)

v9: Renamed variables to make usage clear, added a comment, removed a
macro used only once (Tvrtko)

v10: Avoid masking of error when stolen_alloc fails (Tvrtko)

v11: Renamed stolen_link to tmp_link, as it may be used for other
purposes too (Chris)
Used ERR_CAST to cast error pointers while returning

v12: Added lockdep_assert before starting stolen-backed object
eviction (Chris)

Testcase: igt/gem_stolen

Signed-off-by: Chris Wilson 
Signed-off-by: Ankitprasad Sharma 
Reviewed-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/i915/i915_debugfs.c|   6 +-
 drivers/gpu/drm/i915/i915_drv.h|  17 +++-
 drivers/gpu/drm/i915/i915_gem.c|  15 +++
 drivers/gpu/drm/i915/i915_gem_stolen.c | 171 +
 drivers/gpu/drm/i915/intel_pm.c|   4 +-
 5 files changed, 188 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
b/drivers/gpu/drm/i915/i915_debugfs.c
index ec0c2a05e..aa7c7a3 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -174,7 +174,7 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object 
*obj)
seq_puts(m, ")");
}
if (obj->stolen)
-   seq_printf(m, " (stolen: %08llx)", obj->stolen->start);
+   seq_printf(m, " (stolen: %08llx)", obj->stolen->base.start);
if (obj->pin_display || obj->fault_mappable) {
char s[3], *t = s;
if (obj->pin_display)
@@ -253,9 +253,9 @@ static int obj_rank_by_stolen(void *priv,
struct drm_i915_gem_object *b =
container_of(B, struct drm_i915_gem_object, obj_exec_link);
 
-   if (a->stolen->start < b->stolen->start)
+   if (a->stolen->base.start < b->stolen->base.start)
return -1;
-   if (a->stolen->start > b->stolen->start)
+   if (a->stolen->base.start > b->stolen->base.start)
return 1;
return 0;
 }
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 55f2de9..943b301 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -840,6 +840,12 @@ struct i915_ctx_hang_stats {
bool banned;
 };
 
+struct i915_stolen_node {
+   struct drm_mm_node base;
+   struct list_head mm_link;
+   struct drm_i915_gem_object *obj;
+};
+
 /* This must match up with the value previously used for execbuf2.rsvd1. */
 #define DEFAULT_CONTEXT_HANDLE 0
 
@@ -1291,6 +1297,13 @@ struct i915_gem_mm {
 */
struct list_head unbound_list;
 
+   /**
+* List of stolen objects that have been marked as purgeable and
+* thus available for reaping if we need more space for a new
+* allocation. Ordered by time of marking purgeable.
+*/
+   struct list_head stolen_list;
+
/** Usable portion of the GTT for GEM */
unsigned long stolen_base; /* limited to low memory (32-bit) */
 
@@ -2089,7 +2102,7 @@ struct drm_i915_gem_object {
struct list_head vma_list;
 
/** Stolen memory for this object, instead of being backed by shmem. */
-   struct drm_mm_node *stolen;
+   struct i915_stolen_node *stolen;
struct list_head global_list;
 
struct list_head ring_list[I915_NUM_RINGS];
@@ -2097,6 +2110,8 @@ struct drm_i915_gem_object {
struct list_head obj_exec_link;
 
struct list_head batch_pool_link;
+   /** Used to link an object to a list temporarily */
+   struct list_head tmp_link;
 
/**
 * This is set if the object is on the active lists (has pending
diff --git 

[Intel-gfx] [PATCH 02/10] drm/i915: Introduce i915_gem_object_get_dma_address()

2016-02-18 Thread ankitprasad . r . sharma
From: Chris Wilson 

This utility function is a companion to i915_gem_object_get_page() that
uses the same cached iterator for the scatterlist to perform fast
sequential lookup of the dma address associated with any page within the
object.

Signed-off-by: Chris Wilson 
Signed-off-by: Ankitprasad Sharma 
Reviewed-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/i915/i915_drv.h | 17 +
 1 file changed, 17 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 65a2cd0..e4c25c6 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2947,6 +2947,23 @@ static inline int __sg_page_count(struct scatterlist *sg)
 struct page *
 i915_gem_object_get_dirty_page(struct drm_i915_gem_object *obj, int n);
 
+static inline dma_addr_t
+i915_gem_object_get_dma_address(struct drm_i915_gem_object *obj, int n)
+{
+   if (n < obj->get_page.last) {
+   obj->get_page.sg = obj->pages->sgl;
+   obj->get_page.last = 0;
+   }
+
+   while (obj->get_page.last + __sg_page_count(obj->get_page.sg) <= n) {
+   obj->get_page.last += __sg_page_count(obj->get_page.sg++);
+   if (unlikely(sg_is_chain(obj->get_page.sg)))
+   obj->get_page.sg = sg_chain_ptr(obj->get_page.sg);
+   }
+
+   return sg_dma_address(obj->get_page.sg) +
+  ((n - obj->get_page.last) << PAGE_SHIFT);
+}
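
The cached-iterator idea behind i915_gem_object_get_dma_address() can be sketched outside the kernel with a simplified segment list (illustrative types, not the scatterlist API): the iterator remembers the segment that satisfied the last lookup, so monotonically increasing indices resolve in amortised O(1), and a backwards seek rewinds to the start — exactly the `get_page.sg`/`get_page.last` pattern above.

```c
#include <assert.h>
#include <stddef.h>

/* Each segment covers a run of pages starting at `base`. */
struct segment { size_t npages; unsigned long base; };

struct cached_iter {
	const struct segment *seg; /* segment containing page `first` */
	size_t first;              /* index of the first page in *seg */
};

static unsigned long lookup_page(struct cached_iter *it,
				 const struct segment *segs, size_t n)
{
	/* Rewind only when the caller seeks backwards. */
	if (n < it->first) {
		it->seg = segs;
		it->first = 0;
	}
	/* Walk forward from the cached position. */
	while (it->first + it->seg->npages <= n)
		it->first += it->seg++->npages;
	return it->seg->base + (n - it->first);
}
```

The kernel version additionally hops over chain entries with `sg_chain_ptr()`, which this flat-array model doesn't need.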
+
 static inline struct page *
 i915_gem_object_get_page(struct drm_i915_gem_object *obj, int n)
 {
-- 
1.9.1



[Intel-gfx] [PATCH 01/10] drm/i915: Add support for mapping an object page by page

2016-02-18 Thread ankitprasad . r . sharma
From: Chris Wilson 

Introduced a new vm-specific callback insert_page() to program a single PTE in
ggtt or ppgtt. This allows us to map a single page into the mappable aperture
space. This can be iterated over to access the whole object using space as
meagre as page size.
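
The page-by-page access pattern this enables can be sketched with a userspace model (hypothetical helper names; the real code programs a GGTT PTE and reads through the mappable aperture): remap a single-page window onto each backing page in turn and copy through it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PG 64 /* toy page size */

/* Model of insert_page(): point the single mappable "window" at one
 * backing page of the object. */
static const unsigned char *window;

static void insert_page(const unsigned char *page)
{
	window = page;
}

/* Read an arbitrarily large object through the one-page window by
 * remapping it for every page — the iteration the commit describes. */
static void read_through_window(const unsigned char *obj, size_t len,
				unsigned char *out)
{
	for (size_t off = 0; off < len; off += PG) {
		size_t chunk = len - off < PG ? len - off : PG;

		insert_page(obj + off);           /* program one PTE */
		memcpy(out + off, window, chunk); /* access via the aperture */
	}
}
```

Aperture cost stays constant at one page regardless of object size, which is the point of the callback.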

v2: Added low level rpm assertions to insert_page routines (Chris)

v3: Added POSTING_READ post register write (Tvrtko)

Signed-off-by: Chris Wilson 
Signed-off-by: Ankitprasad Sharma 
---
 drivers/char/agp/intel-gtt.c|  9 +
 drivers/gpu/drm/i915/i915_gem_gtt.c | 67 +
 drivers/gpu/drm/i915/i915_gem_gtt.h |  5 +++
 include/drm/intel-gtt.h |  3 ++
 4 files changed, 84 insertions(+)

diff --git a/drivers/char/agp/intel-gtt.c b/drivers/char/agp/intel-gtt.c
index 1341a94..7c68576 100644
--- a/drivers/char/agp/intel-gtt.c
+++ b/drivers/char/agp/intel-gtt.c
@@ -838,6 +838,15 @@ static bool i830_check_flags(unsigned int flags)
return false;
 }
 
+void intel_gtt_insert_page(dma_addr_t addr,
+  unsigned int pg,
+  unsigned int flags)
+{
+   intel_private.driver->write_entry(addr, pg, flags);
+   wmb();
+}
+EXPORT_SYMBOL(intel_gtt_insert_page);
+
 void intel_gtt_insert_sg_entries(struct sg_table *st,
 unsigned int pg_start,
 unsigned int flags)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 715a771..6586525 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2341,6 +2341,29 @@ static void gen8_set_pte(void __iomem *addr, gen8_pte_t 
pte)
 #endif
 }
 
+static void gen8_ggtt_insert_page(struct i915_address_space *vm,
+ dma_addr_t addr,
+ uint64_t offset,
+ enum i915_cache_level level,
+ u32 unused)
+{
+   struct drm_i915_private *dev_priv = to_i915(vm->dev);
+   gen8_pte_t __iomem *pte =
+   (gen8_pte_t __iomem *)dev_priv->gtt.gsm +
+   (offset >> PAGE_SHIFT);
+   int rpm_atomic_seq;
+
+   rpm_atomic_seq = assert_rpm_atomic_begin(dev_priv);
+
+   gen8_set_pte(pte, gen8_pte_encode(addr, level, true));
+   wmb();
+
+   I915_WRITE(GFX_FLSH_CNTL_GEN6, GFX_FLSH_CNTL_EN);
+   POSTING_READ(GFX_FLSH_CNTL_GEN6);
+
+   assert_rpm_atomic_end(dev_priv, rpm_atomic_seq);
+}
+
 static void gen8_ggtt_insert_entries(struct i915_address_space *vm,
 struct sg_table *st,
 uint64_t start,
@@ -2412,6 +2435,29 @@ static void gen8_ggtt_insert_entries__BKL(struct 
i915_address_space *vm,
stop_machine(gen8_ggtt_insert_entries__cb, &arg, NULL);
 }
 
+static void gen6_ggtt_insert_page(struct i915_address_space *vm,
+ dma_addr_t addr,
+ uint64_t offset,
+ enum i915_cache_level level,
+ u32 flags)
+{
+   struct drm_i915_private *dev_priv = to_i915(vm->dev);
+   gen6_pte_t __iomem *pte =
+   (gen6_pte_t __iomem *)dev_priv->gtt.gsm +
+   (offset >> PAGE_SHIFT);
+   int rpm_atomic_seq;
+
+   rpm_atomic_seq = assert_rpm_atomic_begin(dev_priv);
+
+   iowrite32(vm->pte_encode(addr, level, true, flags), pte);
+   wmb();
+
+   I915_WRITE(GFX_FLSH_CNTL_GEN6, GFX_FLSH_CNTL_EN);
+   POSTING_READ(GFX_FLSH_CNTL_GEN6);
+
+   assert_rpm_atomic_end(dev_priv, rpm_atomic_seq);
+}
+
 /*
  * Binds an object into the global gtt with the specified cache level. The 
object
  * will be accessible to the GPU via commands whose operands reference offsets
@@ -2523,6 +2569,24 @@ static void gen6_ggtt_clear_range(struct 
i915_address_space *vm,
assert_rpm_atomic_end(dev_priv, rpm_atomic_seq);
 }
 
+static void i915_ggtt_insert_page(struct i915_address_space *vm,
+ dma_addr_t addr,
+ uint64_t offset,
+ enum i915_cache_level cache_level,
+ u32 unused)
+{
+   struct drm_i915_private *dev_priv = to_i915(vm->dev);
+   unsigned int flags = (cache_level == I915_CACHE_NONE) ?
+   AGP_USER_MEMORY : AGP_USER_CACHED_MEMORY;
+   int rpm_atomic_seq;
+
+   rpm_atomic_seq = assert_rpm_atomic_begin(dev_priv);
+
+   intel_gtt_insert_page(addr, offset >> PAGE_SHIFT, flags);
+
+   assert_rpm_atomic_end(dev_priv, rpm_atomic_seq);
+}
+
 static void i915_ggtt_insert_entries(struct i915_address_space *vm,
 struct sg_table *pages,
 uint64_t start,
@@ -3054,6 +3118,7 @@ static int 

[Intel-gfx] [PATCH 05/10] drm/i915: Support for creating Stolen memory backed objects

2016-02-18 Thread ankitprasad . r . sharma
From: Ankitprasad Sharma 

Extend the drm_i915_gem_create structure to add support for
creating Stolen memory backed objects. Added a new flag through
which userspace can specify a preference to allocate the object
from stolen memory; if set, an attempt will be made to allocate
the object from stolen memory, subject to the availability of
free space in the stolen region.
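
A sketch of the flag validation this implies (illustrative names and values — the real flags are the I915_CREATE_PLACEMENT_* defines in the uapi header, whose exact layout this digest truncates): the low bits select the placement, and any unknown bit must reject the call so the uapi can grow safely.

```c
#include <assert.h>

/* Illustrative layout: low 8 bits select placement. */
#define CREATE_PLACEMENT_MASK   0xffULL
#define CREATE_PLACEMENT_NORMAL 0
#define CREATE_PLACEMENT_STOLEN 1
#define CREATE_UNKNOWN_FLAGS    (~CREATE_PLACEMENT_MASK)

/* Returns 0 and fills *placement on success, -1 (-EINVAL in the
 * kernel) for any flag bit the implementation doesn't understand. */
static int decode_placement(unsigned long long flags, int *placement)
{
	if (flags & CREATE_UNKNOWN_FLAGS)
		return -1;
	switch (flags & CREATE_PLACEMENT_MASK) {
	case CREATE_PLACEMENT_NORMAL:
	case CREATE_PLACEMENT_STOLEN:
		*placement = (int)(flags & CREATE_PLACEMENT_MASK);
		return 0;
	default:
		return -1;
	}
}
```

Rejecting unknown bits up front is what lets I915_PARAM_CREATE_VERSION advertise new placements later without ambiguity.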

v2: Rebased to the latest drm-intel-nightly (Ankit)

v3: Changed versioning of GEM_CREATE param, added new comments (Tvrtko)

v4: Changed size from 32b to 64b to prevent userspace overflow (Tvrtko)
Corrected function arguments ordering (Chris)

v5: Corrected function name (Chris)

v6: Updated datatype for flags to keep sizeof(drm_i915_gem_create) u64
aligned (Chris)

v7: Use first 8 bits of gem_create flags for placement (Chris), Add helper
function for object allocation from stolen region (Ankit)

v8: Added comment explaining STOLEN placement flag (Chris)

Testcase: igt/gem_stolen

Signed-off-by: Ankitprasad Sharma 
Reviewed-by: Chris Wilson 
---
 drivers/gpu/drm/i915/i915_dma.c|  3 +++
 drivers/gpu/drm/i915/i915_drv.h|  2 +-
 drivers/gpu/drm/i915/i915_gem.c| 45 +++---
 drivers/gpu/drm/i915/i915_gem_stolen.c |  4 +--
 include/uapi/drm/i915_drm.h| 41 +++
 5 files changed, 89 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index a42eb58..1aa2cb6 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -172,6 +172,9 @@ static int i915_getparam(struct drm_device *dev, void *data,
case I915_PARAM_HAS_EXEC_SOFTPIN:
value = 1;
break;
+   case I915_PARAM_CREATE_VERSION:
+   value = 2;
+   break;
default:
DRM_DEBUG("Unknown parameter %d\n", param->param);
return -EINVAL;
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 1122e1b..55f2de9 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3301,7 +3301,7 @@ void i915_gem_stolen_remove_node(struct drm_i915_private 
*dev_priv,
 int i915_gem_init_stolen(struct drm_device *dev);
 void i915_gem_cleanup_stolen(struct drm_device *dev);
 struct drm_i915_gem_object *
-i915_gem_object_create_stolen(struct drm_device *dev, u32 size);
+i915_gem_object_create_stolen(struct drm_device *dev, u64 size);
 struct drm_i915_gem_object *
 i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
   u32 stolen_offset,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 1aa4fc9..60d27fe 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -389,10 +389,36 @@ void i915_gem_object_free(struct drm_i915_gem_object *obj)
kmem_cache_free(dev_priv->objects, obj);
 }
 
+static struct drm_i915_gem_object *
+i915_gem_alloc_object_stolen(struct drm_device *dev, size_t size)
+{
+   struct drm_i915_gem_object *obj;
+   int ret;
+
+   mutex_lock(&dev->struct_mutex);
+   obj = i915_gem_object_create_stolen(dev, size);
+   if (!obj) {
+   mutex_unlock(&dev->struct_mutex);
+   return NULL;
+   }
+
+   /* Always clear fresh buffers before handing to userspace */
+   ret = i915_gem_object_clear(obj);
+   if (ret) {
+   drm_gem_object_unreference(&obj->base);
+   mutex_unlock(&dev->struct_mutex);
+   return NULL;
+   }
+
+   mutex_unlock(&dev->struct_mutex);
+   return obj;
+}
+
 static int
 i915_gem_create(struct drm_file *file,
struct drm_device *dev,
uint64_t size,
+   uint64_t flags,
uint32_t *handle_p)
 {
struct drm_i915_gem_object *obj;
@@ -403,8 +429,21 @@ i915_gem_create(struct drm_file *file,
if (size == 0)
return -EINVAL;
 
+   if (flags & __I915_CREATE_UNKNOWN_FLAGS)
+   return -EINVAL;
+
/* Allocate the new object */
-   obj = i915_gem_alloc_object(dev, size);
+   switch (flags & I915_CREATE_PLACEMENT_MASK) {
+   case I915_CREATE_PLACEMENT_NORMAL:
+   obj = i915_gem_alloc_object(dev, size);
+   break;
+   case I915_CREATE_PLACEMENT_STOLEN:
+   obj = i915_gem_alloc_object_stolen(dev, size);
+   break;
+   default:
+   return -EINVAL;
+   }
+
if (obj == NULL)
return -ENOMEM;
 
@@ -427,7 +466,7 @@ i915_gem_dumb_create(struct drm_file *file,
args->pitch = ALIGN(args->width * DIV_ROUND_UP(args->bpp, 8), 64);
args->size = args->pitch * args->height;
return i915_gem_create(file, dev,
-  args->size, &args->handle);
+  

Re: [Intel-gfx] [PATCH 06/11] drm/i915: Framework for capturing command stream based OA reports

2016-02-18 Thread sourab gupta
On Wed, 2016-02-17 at 23:00 +0530, Robert Bragg wrote:
> Hi Sourab,
> 
> 
> As Sergio Martinez has started experimenting with this in gputop and
> reported seeing lots of ENOSPC errors when reading, I had a look
> into this and saw a few issues with how we check that there's
> data available to read in command stream mode, and I think
> there's a possibility of incorrectly sorting the samples sometimes...

Hi Robert,
Thanks for spotting this anomaly. I'll have this fixed in the next
version of patch set.
> 
> On Tue, Feb 16, 2016 at 5:27 AM,  wrote:
> From: Sourab Gupta 
> 
> 
> -static bool i915_oa_can_read(struct i915_perf_stream *stream)
> +static bool append_oa_rcs_sample(struct i915_perf_stream *stream,
> +struct i915_perf_read_state *read_state,
> +struct i915_perf_cs_data_node *node)
> +{
> +   struct drm_i915_private *dev_priv = stream->dev_priv;
> +   struct oa_sample_data data = { 0 };
> +   const u8 *report = dev_priv->perf.command_stream_buf.addr +
> +  node->offset;
> +   u32 sample_flags = stream->sample_flags;
> +   u32 report_ts;
> +
> +   /*
> +* Forward the periodic OA samples which have a timestamp
> +* lower than the timestamp of this sample, before forwarding
> +* this sample. This ensures samples read by the user are
> +* ordered according to their timestamps.
> +*/
> +   report_ts = *(u32 *)(report + 4);
> +   dev_priv->perf.oa.ops.read(stream, read_state, report_ts);
> +
> +   if (sample_flags & SAMPLE_OA_SOURCE_INFO)
> +   data.source = I915_PERF_OA_EVENT_SOURCE_RCS;
> +
> +   if (sample_flags & SAMPLE_CTX_ID)
> +   data.ctx_id = node->ctx_id;
> +
> +   if (sample_flags & SAMPLE_OA_REPORT)
> +   data.report = report;
> +
> +   append_oa_sample(stream, read_state, &data);
> +
> +   return true;
> +}
> +
> +static void oa_rcs_append_reports(struct i915_perf_stream *stream,
> + struct i915_perf_read_state *read_state)
> +{
> +   struct drm_i915_private *dev_priv = stream->dev_priv;
> +   struct i915_perf_cs_data_node *entry, *next;
> +
> +   list_for_each_entry_safe(entry, next,
> +&dev_priv->perf.node_list, link) {
> +   if (!i915_gem_request_completed(entry->request, true))
> +   break;
> +
> +   if (!append_oa_rcs_sample(stream, read_state, entry))
> +   break;
> +
> +   spin_lock(&dev_priv->perf.node_list_lock);
> +   list_del(&entry->link);
> +   spin_unlock(&dev_priv->perf.node_list_lock);
> +
> +   i915_gem_request_unreference__unlocked(entry->request);
> +   kfree(entry);
> +   }
> +
> +   /* Flush any remaining periodic reports */
> +   dev_priv->perf.oa.ops.read(stream, read_state, U32_MAX);
>  
> I don't think we can flush all remaining periodic reports here - at
> least not if we have any in-flight MI_RPC commands - in case the next
> request to complete might have reports with earlier timestamps than
> some of these periodic reports.
> 
> 
> Even if we have periodic reports available I think we need to throttle
> forwarding them based on the command stream requests completing.
> 
> 
> This is something that userspace should understand when it explicitly
> decides to use command stream mode in conjunction with periodic
> sampling.
> 
I agree, there shouldn't be any flushing of remaining periodic reports
here, instead any periodic reports remaining here should be taken care
of during the next processing of command stream samples.
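
The ordering rule being discussed can be sketched as a bounded merge (plain integers standing in for report timestamps, not the driver's structures): periodic samples are forwarded only up to the timestamp of each completed command-stream sample, and any periodic samples newer than the last completed CS sample stay queued for the next pass rather than being flushed.

```c
#include <assert.h>
#include <stddef.h>

/* Merge two timestamp-sorted streams (periodic OA reports and
 * command-stream reports) into `out`, emitting periodic samples only
 * up to each completed command-stream sample's timestamp. Returns the
 * number of samples emitted. */
static size_t merge_samples(const unsigned *periodic, size_t np,
			    const unsigned *cs, size_t nc,
			    unsigned *out)
{
	size_t i = 0, o = 0;

	for (size_t j = 0; j < nc; j++) {
		/* Flush periodic samples older than this CS sample. */
		while (i < np && periodic[i] < cs[j])
			out[o++] = periodic[i++];
		out[o++] = cs[j];
	}
	/* Remaining periodic samples stay queued: a later in-flight CS
	 * sample may carry an earlier timestamp, so flushing them now
	 * could mis-order the stream. */
	return o;
}
```

Note how the trailing periodic sample is deliberately withheld — the exact point both sides of the thread agree on above.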
>  
> +}
> +
> +static bool command_stream_buf_is_empty(struct i915_perf_stream *stream)
>  {
> struct drm_i915_private *dev_priv = stream->dev_priv;
> 
> -   return !dev_priv->perf.oa.ops.oa_buffer_is_empty(dev_priv);
> +   if (stream->cs_mode)
> +   return list_empty(&dev_priv->perf.node_list);
> +   else
> +   return true;
>  }
> 
> 
> I think this list_empty() check needs a lock around it, as it's called
> from 

Re: [Intel-gfx] [PATCH 08/10] drm/i915: Support for pread/pwrite from/to non shmem backed objects

2016-02-18 Thread Ankitprasad Sharma
Hi,
On Thu, 2016-02-11 at 11:40 +, Tvrtko Ursulin wrote:

> > +
> > +   mutex_unlock(&dev->struct_mutex);
> > +   if (likely(!i915.prefault_disable)) {
> > +   ret = fault_in_multipages_writeable(user_data, remain);
> > +   if (ret) {
> > +   mutex_lock(&dev->struct_mutex);
> > +   goto out_unpin;
> > +   }
> > +   }
> > +
> > +   while (remain > 0) {
> > +   /* Operation in this page
> > +*
> > +* page_base = page offset within aperture
> > +* page_offset = offset within page
> > +* page_length = bytes to copy for this page
> > +*/
> > +   u32 page_base = node.start;
> > +   unsigned page_offset = offset_in_page(offset);
> > +   unsigned page_length = PAGE_SIZE - page_offset;
> > +   page_length = remain < page_length ? remain : page_length;
> > +   if (node.allocated) {
> > +   wmb();
> > +   dev_priv->gtt.base.insert_page(&dev_priv->gtt.base,
> > +  i915_gem_object_get_dma_address(obj, offset >> PAGE_SHIFT),
> > +  node.start,
> > +  I915_CACHE_NONE, 0);
> > +   wmb();
> > +   } else {
> > +   page_base += offset & PAGE_MASK;
> > +   }
> > +   /* This is a slow read/write as it tries to read from
> > +* and write to user memory which may result into page
> > +* faults, and so we cannot perform this under struct_mutex.
> > +*/
> > +   if (slow_user_access(dev_priv->gtt.mappable, page_base,
> > +page_offset, user_data,
> > +page_length, false)) {
> > +   ret = -EFAULT;
> > +   break;
> > +   }
> 
> Read does not want to try the fast access first, equivalent to pwrite ?
Using fast access means we will be unable to handle faults, which are
more frequent in a pread case.
> 
> > +
> > +   remain -= page_length;
> > +   user_data += page_length;
> > +   offset += page_length;
> > +   }
> > +
> >
> > @@ -870,24 +1012,36 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_private 
> > *i915,
> > unsigned page_length = PAGE_SIZE - page_offset;
> > page_length = remain < page_length ? remain : page_length;
> > if (node.allocated) {
> > -   wmb();
> > +   wmb(); /* flush the write before we modify the GGTT */
> > i915->gtt.base.insert_page(&i915->gtt.base,
> >i915_gem_object_get_dma_address(obj, offset >> PAGE_SHIFT),
> >node.start,
> >I915_CACHE_NONE,
> >0);
> > -   wmb();
> > +   wmb(); /* flush modifications to the GGTT (insert_page) */
> > } else {
> > page_base += offset & PAGE_MASK;
> > }
> > /* If we get a fault while copying data, then (presumably) our
> >  * source page isn't available.  Return the error and we'll
> >  * retry in the slow path.
> > +* If the object is non-shmem backed, we retry again with the
> > +* path that handles page fault.
> >  */
> > if (fast_user_write(i915->gtt.mappable, page_base,
> > page_offset, user_data, page_length)) {
> > -   ret = -EFAULT;
> > -   goto out_flush;
> > +   hit_slow_path = true;
> > +   mutex_unlock(&dev->struct_mutex);
> > +   if (slow_user_access(i915->gtt.mappable,
> > +page_base,
> > +page_offset, user_data,
> > +page_length, true)) {
> > +   ret = -EFAULT;
> > +   mutex_lock(&dev->struct_mutex);
> > +   goto out_flush;
> > +   }
> 
> I think the function now be called i915_gem_gtt_pwrite.
> 
> Would it also need the same pre-fault as in i915_gem_gtt_pread ?
I do not think pre-fault is needed here, as in pread we are dealing with
a read from the obj and to the user buffer (which has more chances of
faulting).
While in the pwrite case, we are optimistic that the user would have
already mapped/accessed the buffer before using it to write the buffer
contents into the object.
> 
> > +
> > +   mutex_lock(&dev->struct_mutex);
> > }
> >
> > remain -= page_length;
> > @@ -896,6 +1050,9 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_private *i915,

Re: [Intel-gfx] [PATCH] drm/atomic: Allow for holes in connector state.

2016-02-18 Thread Dave Airlie
On 16 February 2016 at 21:37, Ville Syrjälä
 wrote:
> On Mon, Feb 15, 2016 at 02:17:01PM +0100, Maarten Lankhorst wrote:
>> Because we record connector_mask using 1 << drm_connector_index now
>> the connector_mask should stay the same even when other connectors
>> are removed. This was not the case with MST, in that case when removing
>> a connector all other connectors may change their index.
>>
>> This is fixed by waiting until the first get_connector_state to allocate
>> connector_state, and force reallocation when state is too small.
>>
>> As a side effect connector arrays no longer have to be preallocated,
>> and can be allocated on first use which means a less allocations in
>> the page flip only path.

Daniel you said something on irc about v2 of this for -fixes? Did I miss v2?

Dave.


Re: [Intel-gfx] [PATCH 5/6] drm/i915: Implement color management on bdw/skl/bxt/kbl

2016-02-18 Thread Matt Roper
On Tue, Feb 09, 2016 at 12:19:17PM +, Lionel Landwerlin wrote:
> Patch based on a previous series by Shashank Sharma.
> 
> v2: Do not read GAMMA_MODE register to figure what mode we're in
> 
> v3: Program PREC_PAL_GC_MAX to clamp pixel values > 1.0
> 
> Add documentation on how the Broadcast RGB property is affected by
> CTM_MATRIX
> 
> v4: Update contributors
> 
> Signed-off-by: Shashank Sharma 
> Signed-off-by: Lionel Landwerlin 
> Signed-off-by: Kumar, Kiran S 
> Signed-off-by: Kausal Malladi 
> ---
>  Documentation/DocBook/gpu.tmpl   |   6 +-
>  drivers/gpu/drm/i915/i915_drv.c  |  24 ++-
>  drivers/gpu/drm/i915/i915_drv.h  |   9 +
>  drivers/gpu/drm/i915/i915_reg.h  |  22 +++
>  drivers/gpu/drm/i915/intel_color.c   | 367 
> ++-
>  drivers/gpu/drm/i915/intel_display.c |  22 ++-
>  drivers/gpu/drm/i915/intel_drv.h |   6 +-
>  7 files changed, 396 insertions(+), 60 deletions(-)
> 
> diff --git a/Documentation/DocBook/gpu.tmpl b/Documentation/DocBook/gpu.tmpl
> index 7c49a92..78b8877 100644
> --- a/Documentation/DocBook/gpu.tmpl
> +++ b/Documentation/DocBook/gpu.tmpl
> @@ -2152,7 +2152,11 @@ void intel_crt_init(struct drm_device *dev)
>   ENUM
>   { "Automatic", "Full", "Limited 16:235" }
>   Connector
> - TBD
> + When this property is set to Limited 16:235
> + and CTM_MATRIX is set, the hardware will be programmed with
> + the result of the multiplication of CTM_MATRIX by the limited
> + range matrix to ensure the pixels normally in the range 0..1.0
> + are remapped to the range 16/255..235/255.
>   
>   
>   “audio”
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index 44912ec..b65aa20 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -66,6 +66,9 @@ static struct drm_driver driver;
>  #define IVB_CURSOR_OFFSETS \
>   .cursor_offsets = { CURSOR_A_OFFSET, IVB_CURSOR_B_OFFSET, 
> IVB_CURSOR_C_OFFSET }
>  
> +#define BDW_COLORS \
> + .color = { .degamma_lut_size = 512, .gamma_lut_size = 512 }
> +
>  static const struct intel_device_info intel_i830_info = {
>   .gen = 2, .is_mobile = 1, .cursor_needs_physical = 1, .num_pipes = 2,
>   .has_overlay = 1, .overlay_needs_physical = 1,
> @@ -288,24 +291,28 @@ static const struct intel_device_info 
> intel_haswell_m_info = {
>   .is_mobile = 1,
>  };
>  
> +#define BDW_FEATURES \
> + HSW_FEATURES, \
> + BDW_COLORS
> +
>  static const struct intel_device_info intel_broadwell_d_info = {
> - HSW_FEATURES,
> + BDW_FEATURES,
>   .gen = 8,
>  };
>  
>  static const struct intel_device_info intel_broadwell_m_info = {
> - HSW_FEATURES,
> + BDW_FEATURES,
>   .gen = 8, .is_mobile = 1,
>  };
>  
>  static const struct intel_device_info intel_broadwell_gt3d_info = {
> - HSW_FEATURES,
> + BDW_FEATURES,
>   .gen = 8,
>   .ring_mask = RENDER_RING | BSD_RING | BLT_RING | VEBOX_RING | BSD2_RING,
>  };
>  
>  static const struct intel_device_info intel_broadwell_gt3m_info = {
> - HSW_FEATURES,
> + BDW_FEATURES,
>   .gen = 8, .is_mobile = 1,
>   .ring_mask = RENDER_RING | BSD_RING | BLT_RING | VEBOX_RING | BSD2_RING,
>  };
> @@ -321,13 +328,13 @@ static const struct intel_device_info 
> intel_cherryview_info = {
>  };
>  
>  static const struct intel_device_info intel_skylake_info = {
> - HSW_FEATURES,
> + BDW_FEATURES,
>   .is_skylake = 1,
>   .gen = 9,
>  };
>  
>  static const struct intel_device_info intel_skylake_gt3_info = {
> - HSW_FEATURES,
> + BDW_FEATURES,
>   .is_skylake = 1,
>   .gen = 9,
>   .ring_mask = RENDER_RING | BSD_RING | BLT_RING | VEBOX_RING | BSD2_RING,
> @@ -345,17 +352,18 @@ static const struct intel_device_info 
> intel_broxton_info = {
>   .has_fbc = 1,
>   GEN_DEFAULT_PIPEOFFSETS,
>   IVB_CURSOR_OFFSETS,
> + BDW_COLORS,
>  };
>  
>  static const struct intel_device_info intel_kabylake_info = {
> - HSW_FEATURES,
> + BDW_FEATURES,
>   .is_preliminary = 1,
>   .is_kabylake = 1,
>   .gen = 9,
>  };
>  
>  static const struct intel_device_info intel_kabylake_gt3_info = {
> - HSW_FEATURES,
> + BDW_FEATURES,
>   .is_preliminary = 1,
>   .is_kabylake = 1,
>   .gen = 9,
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 8216665..c1ca4d0 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -659,6 +659,10 @@ struct drm_i915_display_funcs {
>   /* render clock increase/decrease */
>   /* display clock increase/decrease */
>   /* pll clock increase/decrease */
> +
> + void (*load_degamma_lut)(struct drm_crtc *crtc);
> + void (*load_csc_matrix)(struct drm_crtc *crtc);
> + void 

[Intel-gfx] [PATCH] drm/i915: Before waiting for a vblank update drm frame counter.

2016-02-18 Thread Rodrigo Vivi
Whenever power wells are disabled, like when entering DC5/DC6,
all display registers are zeroed. The DMC firmware restores them
on DC5/DC6 exit. However, the frame counter register is read-only,
so the DMC cannot restore it. We then start facing funny errors
where drm was waiting for vblank 500, the hardware counter got
reset and not restored, and the wait for vblank returned 500
vblanks later - around 8 seconds later.

Since we have no visibility into when the DMC restores the
registers, the quick and dirty way is to update the drm layer
counter with the latest counter we know. At least we don't fall
hundreds of vblanks behind.

FIXME: A proper solution would involve power domain handling to
avoid DC off while a vblank is being waited on. However, due to
the spin locks in drm vblank handling and the mutex sleeps on the
power domain handling side, we cannot do this. One alternative
would be to create pre_enable_vblank and post_disable_vblank
hooks outside the spin lock regions. But unfortunately this is
also not trivial because of the many asynchronous drm_vblank_get
and drm_vblank_put calls.

Any other idea or help is very welcome.

Signed-off-by: Rodrigo Vivi 
---
 drivers/gpu/drm/i915/i915_irq.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 25a8937..e67fae4 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2744,6 +2744,20 @@ static int gen8_enable_vblank(struct drm_device *dev, 
unsigned int pipe)
unsigned long irqflags;
 
spin_lock_irqsave(&dev_priv->irq_lock, irqflags);
+   /*
+* DMC firmware can't restore frame counter register that is read-only
+* so we need to force the drm layer to know what is our latest
+* frame counter.
+* FIXME: We might face some funny race condition with DC states
+* entering after this restore. Unfortunately a power domain to avoid
+* DC off is not possible at this point due to all spin locks drm layer
+* does with vblanks. Another idea was to add pre-enable and
+* post-disable functions at vblank, but at drm layer there are many
+* asynchronous vblank puts that it is not possible with a bigger
+* rework.
+*/
+   if (HAS_CSR(dev))
+   dev->vblank[pipe].last = g4x_get_vblank_counter(dev, pipe);
bdw_enable_pipe_irq(dev_priv, pipe, GEN8_PIPE_VBLANK);
spin_unlock_irqrestore(&dev_priv->irq_lock, irqflags);
 
-- 
2.4.3



Re: [Intel-gfx] [PATCH v5 10/12] drm/i915: Defer probe if gmux is present but its driver isn't

2016-02-18 Thread Deucher, Alexander
> -Original Message-
> From: daniel.vet...@ffwll.ch [mailto:daniel.vet...@ffwll.ch] On Behalf Of
> Daniel Vetter
> Sent: Thursday, February 18, 2016 6:11 PM
> To: Lukas Wunner
> Cc: dri-devel; platform-driver-...@vger.kernel.org; intel-gfx; Ben Skeggs;
> Deucher, Alexander
> Subject: Re: [PATCH v5 10/12] drm/i915: Defer probe if gmux is present but
> its driver isn't
> 
> On Thu, Feb 18, 2016 at 11:20 PM, Lukas Wunner  wrote:
> > Hi,
> >
> > On Thu, Feb 18, 2016 at 10:39:05PM +0100, Daniel Vetter wrote:
> >> On Thu, Feb 18, 2016 at 9:34 PM, Lukas Wunner 
> wrote:
> >> >
> >> >> Ok, makes sense. I still think adding the check to the client_register
> >> >> function would be good, just as a safety measure.
> >> >
> >> > Hm, the idea of calling vga_switcheroo_client_probe_defer() twice
> >> > causes me pain in the stomach. It's surprising for drivers which
> >> > just don't need it at the moment (amdgpu and snd_hda_intel) and
> >> > it feels like overengineering and pampering driver developers
> >> > beyond reasonable measure. Also while the single existing check is
> >> > cheap, we might later on add checks that take more time and slow
> >> > things down.
> >>
> >> It's motivated by Rusty's API Manifesto:
> >>
> >> http://sweng.the-davies.net/Home/rustys-api-design-manifesto
> >
> > Interesting, thank you.
> >
> >
> >> With the mandatory check in _register we reach level 5 - it'll blow up
> >> at runtime when we try to register it.
> >
> > The manifesto says "5. Do it right or it will always break at runtime".
> >
> > However even if we add a
> WARN_ON(vga_switcheroo_client_probe_defer(pdev))
> > to register_client(), it will not *always* spew a stacktrace but only on
> > the machines this concerns (MacBook Pros). Since failure to defer breaks
> > GPU switching, level 5 is already reached. Chances are this won't go
> > unnoticed by the user.
> 
> If we fail the register hopefully the driver checks for that and might
> blow up somewhere in untested error handling code. But there's a good
> chance it'll fail (we can encourage that more by adding must_check to
> the function declaration). In that case you get a nice bug report with
> splat from users hitting this.
> 
> Without this it'll silently work, and all the reports you get is
> "linux is shit, gpu switching doesn't work".
> 
> In both cases it can sometimes succeed, which is not great indeed. But
> I'm trying to fix that by injection EDEFER points artificially
> somehow. Not yet figured out that one.
> 
> But irrespective of the precise failure mode making the defer check
> mandatory by just including it in _register() is better since it makes
> it impossible to forget to call it when its needed. So imo clearly the
> more robust API. And that's my metric for evaluating new API - how
> easy/hard is it to abuse/get wrong.
> 
> >> For more context: We have tons of fun with EPROBE_DEFER handling
> >> between i915 and snd-hda
> >
> > I don't understand, there is currently not a single occurrence of
> > EPROBE_DEFER in i915, apart from the one I added.
> >
> > In sound/ there are 88 occurrences of EPROBE_DEFER in soc/, plus 1 in
> > ppc/ and that's it. So not a single one in pci/hda/ where hda_intel.c
> > resides.
> >
> > Is the fun with EPROBE_DEFER handling caused by the lack thereof?
> 
> Yes, there's one instance where i915 should defer that is missing. The real
> trouble is that snd-hda has some really close ties with i915, and
> resolves those with probe-defer. And blows up all the time since we
> started using this, and with hdmi/dp you really always have to test
> both together in CI, snd-hda is pretty much a part of the intel gfx
> driver nowadays. Deferred probing is ime real trouble.

To further complicate things, AMD dGPUs have HDA audio on board as well.

Alex

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 3/4] drm/i915/gen9: Extend dmc debug mask to include cores

2016-02-18 Thread Runyan, Arthur J
>-Original Message-
>From: Deak, Imre
...
>The BSpec "Sequence to Allow DC5 or DC6" requires this only for BXT
>(looks like a recent addition to work around something), but it doesn't
>say it's needed for other platforms. The register description doesn't
>make a difference though.
>
>Perhaps Art has more info on this, adding him.
>

Only BXT needs it programmed to 1b at the moment.  Other products should keep 
the default.


Re: [Intel-gfx] [PATCH v5 10/12] drm/i915: Defer probe if gmux is present but its driver isn't

2016-02-18 Thread Daniel Vetter
On Thu, Feb 18, 2016 at 11:20 PM, Lukas Wunner  wrote:
> Hi,
>
> On Thu, Feb 18, 2016 at 10:39:05PM +0100, Daniel Vetter wrote:
>> On Thu, Feb 18, 2016 at 9:34 PM, Lukas Wunner  wrote:
>> >
>> >> Ok, makes sense. I still think adding the check to the client_register
>> >> function would be good, just as a safety measure.
>> >
>> > Hm, the idea of calling vga_switcheroo_client_probe_defer() twice
>> > causes me pain in the stomach. It's surprising for drivers which
>> > just don't need it at the moment (amdgpu and snd_hda_intel) and
>> > it feels like overengineering and pampering driver developers
>> > beyond reasonable measure. Also while the single existing check is
>> > cheap, we might later on add checks that take more time and slow
>> > things down.
>>
>> It's motivated by Rusty's API Manifesto:
>>
>> http://sweng.the-davies.net/Home/rustys-api-design-manifesto
>
> Interesting, thank you.
>
>
>> With the mandatory check in _register we reach level 5 - it'll blow up
>> at runtime when we try to register it.
>
> The manifesto says "5. Do it right or it will always break at runtime".
>
> However even if we add a WARN_ON(vga_switcheroo_client_probe_defer(pdev))
> to register_client(), it will not *always* spew a stacktrace but only on
> the machines this concerns (MacBook Pros). Since failure to defer breaks
> GPU switching, level 5 is already reached. Chances are this won't go
> unnoticed by the user.

If we fail the register hopefully the driver checks for that and might
blow up somewhere in untested error handling code. But there's a good
chance it'll fail (we can encourage that more by adding must_check to
the function declaration). In that case you get a nice bug report with
splat from users hitting this.

Without this it'll silently work, and all the reports you get are
"linux is shit, gpu switching doesn't work".

In both cases it can sometimes succeed, which is not great indeed. But
I'm trying to fix that by injecting EPROBE_DEFER points artificially
somehow. Not yet figured out that one.

But irrespective of the precise failure mode making the defer check
mandatory by just including it in _register() is better since it makes
it impossible to forget to call it when it's needed. So imo clearly the
more robust API. And that's my metric for evaluating new API - how
easy/hard is it to abuse/get wrong.

>> For more context: We have tons of fun with EPROBE_DEFER handling
>> between i915 and snd-hda
>
> I don't understand, there is currently not a single occurrence of
> EPROBE_DEFER in i915, apart from the one I added.
>
> In sound/ there are 88 occurrences of EPROBE_DEFER in soc/, plus 1 in
> ppc/ and that's it. So not a single one in pci/hda/ where hda_intel.c
> resides.
>
> Is the fun with EPROBE_DEFER handling caused by the lack thereof?

Yes, there's one instance where i915 should defer that is missing. The real
trouble is that snd-hda has some really close ties with i915, and
resolves those with probe-defer. And blows up all the time since we
started using this, and with hdmi/dp you really always have to test
both together in CI, snd-hda is pretty much a part of the intel gfx
driver nowadays. Deferred probing is ime real trouble.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
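The robustness argument above (fold the defer check into _register() itself so a driver cannot forget it) can be sketched as a self-contained userspace model. All the names below, probe_defer(), do_register() and client_register(), are illustrative stand-ins, not the real vga_switcheroo API:

```c
/* Toy model of a register call that performs the probe-defer check
 * itself, so forgetting the check is impossible by construction.
 * All names here are hypothetical stand-ins. */
#include <assert.h>
#include <stdbool.h>

#define EPROBE_DEFER 517	/* same numeric value the Linux kernel uses */

/* stand-in for "gmux is present but its driver isn't loaded yet" */
static bool probe_defer(int devid)
{
	return devid == 42;
}

static int do_register(int devid)
{
	(void)devid;
	return 0;
}

/* warn_unused_result plays the role of the kernel's __must_check, so a
 * caller that silently ignores the error gets a compiler warning. */
__attribute__((warn_unused_result))
static int client_register(int devid)
{
	if (probe_defer(devid))
		return -EPROBE_DEFER;	/* fail loudly at register time */
	return do_register(devid);
}
```

With this shape a client cannot register successfully while the defer condition holds, which is the "blow up at runtime" level-5 behaviour argued for above.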


Re: [Intel-gfx] [PATCH v5 10/12] drm/i915: Defer probe if gmux is present but its driver isn't

2016-02-18 Thread Lukas Wunner
Hi,

On Thu, Feb 18, 2016 at 10:39:05PM +0100, Daniel Vetter wrote:
> On Thu, Feb 18, 2016 at 9:34 PM, Lukas Wunner  wrote:
> >
> >> Ok, makes sense. I still think adding the check to the client_register
> >> function would be good, just as a safety measure.
> >
> > Hm, the idea of calling vga_switcheroo_client_probe_defer() twice
> > causes me pain in the stomach. It's surprising for drivers which
> > just don't need it at the moment (amdgpu and snd_hda_intel) and
> > it feels like overengineering and pampering driver developers
> > beyond reasonable measure. Also while the single existing check is
> > cheap, we might later on add checks that take more time and slow
> > things down.
> 
> It's motivated by Rusty's API Manifesto:
> 
> http://sweng.the-davies.net/Home/rustys-api-design-manifesto

Interesting, thank you.


> With the mandatory check in _register we reach level 5 - it'll blow up
> at runtime when we try to register it.

The manifesto says "5. Do it right or it will always break at runtime".

However even if we add a WARN_ON(vga_switcheroo_client_probe_defer(pdev))
to register_client(), it will not *always* spew a stacktrace but only on
the machines this concerns (MacBook Pros). Since failure to defer breaks
GPU switching, level 5 is already reached. Chances are this won't go
unnoticed by the user.


> For more context: We have tons of fun with EPROBE_DEFER handling
> between i915 and snd-hda

I don't understand, there is currently not a single occurrence of
EPROBE_DEFER in i915, apart from the one I added.

In sound/ there are 88 occurrences of EPROBE_DEFER in soc/, plus 1 in
ppc/ and that's it. So not a single one in pci/hda/ where hda_intel.c
resides.

Is the fun with EPROBE_DEFER handling caused by the lack thereof?


Best regards,

Lukas


Re: [Intel-gfx] [PATCH v5 10/12] drm/i915: Defer probe if gmux is present but its driver isn't

2016-02-18 Thread Daniel Vetter
On Thu, Feb 18, 2016 at 9:34 PM, Lukas Wunner  wrote:
>
>> Ok, makes sense. I still think adding the check to the client_register
>> function would be good, just as a safety measure.
>
> Hm, the idea of calling vga_switcheroo_client_probe_defer() twice
> causes me pain in the stomach. It's surprising for drivers which
> just don't need it at the moment (amdgpu and snd_hda_intel) and
> it feels like overengineering and pampering driver developers
> beyond reasonable measure. Also while the single existing check is
> cheap, we might later on add checks that take more time and slow
> things down.

It's motivated by Rusty's API Manifesto:

http://sweng.the-davies.net/Home/rustys-api-design-manifesto

With the mandatory check in _register we reach level 5 - it'll blow up
at runtime when we try to register it. Without that the failure is
completely silent, and you need to read the right mailing list thread
(level 1), but at least the kerneldocs lift it up to level 3.

For more context: We have tons of fun with EPROBE_DEFER handling
between i915 and snd-hda, and I'm looking into all possible means to
make any api/subsystem using deferred probing as robust as possible by
default. One of the ideas is to inject deferred probe failures at
runtime, but that's kinda hard to do in a generic way. At least making
it as close to impossible to abuse as feasible is the next best
option.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


Re: [Intel-gfx] [PATCH v2 1/2] drm/i915: Add i915_gem_object_vmap to map GEM object to virtual space

2016-02-18 Thread Yu Dai



On 02/18/2016 01:05 PM, Chris Wilson wrote:

> On Thu, Feb 18, 2016 at 10:31:37AM -0800, yu@intel.com wrote:
> > From: Alex Dai 
> >
> > There are several places inside the driver where a GEM object is mapped to
> > kernel virtual space. The mapping is either done for the whole object
> > or a certain page range of it.
> >
> > This patch introduces a function i915_gem_object_vmap to do such a job.
> >
> > v2: Use obj->pages->nents for iteration within i915_gem_object_vmap;
> > break when it finishes all desired pages. The caller needs to pass
> > in the actual page number. (Tvrtko Ursulin)
> 
> Who owns the pages? vmap doesn't increase the page refcount nor
> mapcount, so it is the caller's responsibility to keep the pages alive
> for the duration of the vmapping.
> 
> I suggested i915_gem_object_pin_vmap/unpin_vmap for that reason and that
> also provides the foundation for undoing one of the more substantial
> performance regressions from vmap_batch().

OK, found it at 050/190 of your patch series. That is a huge list of 
patches. :-) The code I put here does not change (at least tries to 
keep) the current code logic or driver behavior. I am not opposed to 
using i915_gem_object_pin_vmap/unpin_vmap at all. I will now just keep 
an eye on that patch.


Alex


Re: [Intel-gfx] [RFC i-g-t] tests/drv_hangman: test for acthd increasing through invalid VM space

2016-02-18 Thread Chris Wilson
On Thu, Feb 18, 2016 at 05:34:50PM +, daniele.ceraolospu...@intel.com wrote:
> +static void ppgtt_walking(void)
> +{
> + memset(, 0, sizeof(execbuf));
> + execbuf.buffers_ptr = (uintptr_t)_exec;
> + execbuf.buffer_count = 1;
> + execbuf.batch_len = 8;
> +
> + gem_execbuf(fd, );
> +
> + while (gem_bo_busy(fd, handle) && timeout > 0) {
> + igt_debug("decreasing timeout to %u\n", --timeout);
> + sleep(1);
> + }

See gem_wait()
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


Re: [Intel-gfx] [PATCH v2 1/2] drm/i915: Add i915_gem_object_vmap to map GEM object to virtual space

2016-02-18 Thread Chris Wilson
On Thu, Feb 18, 2016 at 10:31:37AM -0800, yu@intel.com wrote:
> From: Alex Dai 
> 
> There are several places inside the driver where a GEM object is mapped to
> kernel virtual space. The mapping is either done for the whole object
> or a certain page range of it.
> 
> This patch introduces a function i915_gem_object_vmap to do such a job.
> 
> v2: Use obj->pages->nents for iteration within i915_gem_object_vmap;
> break when it finishes all desired pages. The caller needs to pass
> in the actual page number. (Tvrtko Ursulin)

Who owns the pages? vmap doesn't increase the page refcount nor
mapcount, so it is the caller's responsibility to keep the pages alive
for the duration of the vmapping.

I suggested i915_gem_object_pin_vmap/unpin_vmap for that reason and that
also provides the foundation for undoing one of the more substantial
performance regressions from vmap_batch().
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
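Chris's point that the mapping itself must pin its backing pages can be modelled in a few lines. toy_obj, toy_pin_vmap and toy_unpin_vmap below are illustrative stand-ins for the suggested i915_gem_object_pin_vmap/unpin_vmap, not the real i915 code:

```c
/* Toy model of the pin_vmap/unpin_vmap pairing: the mapping call pins
 * the backing pages, and only the unmap call unpins them, so a live
 * mapping can never outlive the pages it maps. Names and types are
 * illustrative stand-ins. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_obj {
	int pin_count;	/* pages may be reclaimed only when this is 0 */
	void *vaddr;	/* cached "vmapping" of the pages */
};

static void *toy_pin_vmap(struct toy_obj *obj)
{
	obj->pin_count++;		/* keep the pages alive for the mapping */
	if (!obj->vaddr)
		obj->vaddr = malloc(4096);	/* stands in for vmap() */
	return obj->vaddr;
}

static void toy_unpin_vmap(struct toy_obj *obj)
{
	assert(obj->pin_count > 0);
	if (--obj->pin_count == 0) {	/* last user: drop the mapping */
		free(obj->vaddr);
		obj->vaddr = NULL;
	}
}
```

The invariant is that vaddr is only ever valid while pin_count > 0, which is exactly the ownership question raised above.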


Re: [Intel-gfx] [PATCH v4 7/8] drm/i915: Do not compute watermarks on a noop.

2016-02-18 Thread Zanoni, Paulo R
Em Qua, 2016-02-10 às 13:49 +0100, Maarten Lankhorst escreveu:
> No atomic state should be included after all validation when nothing
> has
> changed. During modeset all active planes will be added to the state,
> while disabled planes won't change their state.

As someone who is also not super familiar with the new watermarks code,
I really really wish I had a more detailed commit message to allow me
to understand your train of thought. I'll ask some questions below to
validate my understanding.

> 
> Signed-off-by: Maarten Lankhorst 
> Cc: Matt Roper 
> ---
>  drivers/gpu/drm/i915/intel_display.c |  3 +-
>  drivers/gpu/drm/i915/intel_drv.h | 13 
>  drivers/gpu/drm/i915/intel_pm.c  | 61 +-
> --
>  3 files changed, 51 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_display.c
> b/drivers/gpu/drm/i915/intel_display.c
> index 00cb261c6787..6bb1f5dbc7a0 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -11910,7 +11910,8 @@ static int intel_crtc_atomic_check(struct
> drm_crtc *crtc,
>   }
>  
>   ret = 0;
> - if (dev_priv->display.compute_pipe_wm) {
> + if (dev_priv->display.compute_pipe_wm &&
> + (mode_changed || pipe_config->update_pipe || crtc_state-
> >planes_changed)) {
>   ret = dev_priv->display.compute_pipe_wm(intel_crtc,
> state);
>   if (ret)
>   return ret;

Can't this chunk be on its own separate commit? I'm not sure why the
rest of the commit is related to this change.

It seems the rest of the commit is aimed at reducing the number of planes
we have to lock, not about not computing WMs if nothing in the pipe has
changed.

> diff --git a/drivers/gpu/drm/i915/intel_drv.h
> b/drivers/gpu/drm/i915/intel_drv.h
> index 8effb9ece21e..144597ac74e3 100644
> --- a/drivers/gpu/drm/i915/intel_drv.h
> +++ b/drivers/gpu/drm/i915/intel_drv.h
> @@ -1583,6 +1583,19 @@ intel_atomic_get_crtc_state(struct
> drm_atomic_state *state,
>  
>   return to_intel_crtc_state(crtc_state);
>  }
> +
> +static inline struct intel_plane_state *
> +intel_atomic_get_existing_plane_state(struct drm_atomic_state
> *state,
> +   struct intel_plane *plane)
> +{
> + struct drm_plane_state *plane_state;
> +
> + plane_state = drm_atomic_get_existing_plane_state(state,
> >base);
> +
> + return to_intel_plane_state(plane_state);
> +}
> +
> +

Two newlines above.

It seems you'll be able to simplify a lot of stuff with this new
function. I'm looking forward to review that patch :)


>  int intel_atomic_setup_scalers(struct drm_device *dev,
>   struct intel_crtc *intel_crtc,
>   struct intel_crtc_state *crtc_state);
> diff --git a/drivers/gpu/drm/i915/intel_pm.c
> b/drivers/gpu/drm/i915/intel_pm.c
> index 379eabe093cb..8fb8c6891ae6 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -2010,11 +2010,18 @@ static void ilk_compute_wm_level(const struct
> drm_i915_private *dev_priv,
>   cur_latency *= 5;
>   }
>  
> - result->pri_val = ilk_compute_pri_wm(cstate, pristate,
> -  pri_latency, level);
> - result->spr_val = ilk_compute_spr_wm(cstate, sprstate,
> spr_latency);
> - result->cur_val = ilk_compute_cur_wm(cstate, curstate,
> cur_latency);
> - result->fbc_val = ilk_compute_fbc_wm(cstate, pristate,
> result->pri_val);
> + if (pristate) {
> + result->pri_val = ilk_compute_pri_wm(cstate,
> pristate,
> +  pri_latency,
> level);
> + result->fbc_val = ilk_compute_fbc_wm(cstate,
> pristate, result->pri_val);
> + }
> +
> + if (sprstate)
> + result->spr_val = ilk_compute_spr_wm(cstate,
> sprstate, spr_latency);
> +
> + if (curstate)
> + result->cur_val = ilk_compute_cur_wm(cstate,
> curstate, cur_latency);
> +
>   result->enable = true;
>  }
>  
> @@ -2287,7 +2294,6 @@ static int ilk_compute_pipe_wm(struct
> intel_crtc *intel_crtc,
>   const struct drm_i915_private *dev_priv = dev->dev_private;
>   struct intel_crtc_state *cstate = NULL;
>   struct intel_plane *intel_plane;
> - struct drm_plane_state *ps;
>   struct intel_plane_state *pristate = NULL;
>   struct intel_plane_state *sprstate = NULL;
>   struct intel_plane_state *curstate = NULL;
> @@ -2306,30 +2312,37 @@ static int ilk_compute_pipe_wm(struct
> intel_crtc *intel_crtc,
>   memset(pipe_wm, 0, sizeof(*pipe_wm));
>  
>   for_each_intel_plane_on_crtc(dev, intel_crtc, intel_plane) {
> - ps = drm_atomic_get_plane_state(state,
> - _plane->base);
> - if (IS_ERR(ps))
> - return PTR_ERR(ps);
> + struct intel_plane_state *ps;
> +
> + 

Re: [Intel-gfx] [PATCH v5 10/12] drm/i915: Defer probe if gmux is present but its driver isn't

2016-02-18 Thread Lukas Wunner
Hi,

On Tue, Feb 16, 2016 at 05:08:40PM +0100, Daniel Vetter wrote:
> On Tue, Feb 16, 2016 at 04:58:20PM +0100, Lukas Wunner wrote:
> > On Sun, Feb 14, 2016 at 01:46:28PM +0100, Daniel Vetter wrote:
> > > On Sun, Feb 14, 2016 at 1:10 PM, Lukas Wunner  wrote:
> > > > + * DRM drivers should invoke this early on in their ->probe callback 
> > > > and return
> > > > + * %-EPROBE_DEFER if it evaluates to %true. The GPU need not be 
> > > > registered with
> > > > + * vga_switcheroo_register_client() beforehand.
> > > 
> > > s/need not/must not/ ... is your native language German by any chance?
> > 
> > In principle there's no harm in registering the client first,
> > then checking if probing should be deferred, as long as the
> > client is unregistered before deferring. Thus the language
> > above is intentionally "need not" (muss nicht) rather than
> > "must not" (darf nicht). I didn't want to mandate something
> > that isn't actually required. The above sentence is merely
> > an aid for driver developers who might be confused in which
> > order to call what.
> 
> I'd reject any driver that does this, registering, then checking, then
> unregistering seems extremely unsafe. I'd really stick with mandatory
> language here to make this clear.

Ok, I've made it mandatory in the kerneldoc, updated patch follows below.


> Ok, makes sense. I still think adding the check to the client_register
> function would be good, just as a safety measure.

Hm, the idea of calling vga_switcheroo_client_probe_defer() twice
causes me pain in the stomach. It's surprising for drivers which
just don't need it at the moment (amdgpu and snd_hda_intel) and
it feels like overengineering and pampering driver developers
beyond reasonable measure. Also while the single existing check is
cheap, we might later on add checks that take more time and slow
things down.

Best regards,

Lukas

-- >8 --
Subject: [PATCH] vga_switcheroo: Add helper for deferred probing

So far we've got one condition when DRM drivers need to defer probing
on a dual GPU system and it's coded separately into each of the relevant
drivers. As suggested by Daniel Vetter, deduplicate that code in the
drivers and move it to a new vga_switcheroo helper. This yields better
encapsulation of concepts and lets us add further checks in a central
place. (The existing check pertains to pre-retina MacBook Pros and an
additional check is expected to be needed for retinas.)

v2: This helper could eventually be used by audio clients as well,
so rephrase kerneldoc to refer to "client" instead of "GPU"
and move the single existing check in an if block specific
to PCI_CLASS_DISPLAY_VGA devices. Move documentation on
that check from kerneldoc to a comment. (Daniel Vetter)

v3: Mandate in kerneldoc that registration of client shall only
happen after calling this helper. (Daniel Vetter)

Cc: Daniel Vetter 
Cc: Ben Skeggs 
Cc: Alex Deucher 
Signed-off-by: Lukas Wunner 
---
 drivers/gpu/drm/i915/i915_drv.c   | 10 +-
 drivers/gpu/drm/nouveau/nouveau_drm.c | 10 +-
 drivers/gpu/drm/radeon/radeon_drv.c   | 10 +-
 drivers/gpu/vga/vga_switcheroo.c  | 34 --
 include/linux/vga_switcheroo.h|  2 ++
 5 files changed, 37 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 44912ec..80cfd73 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -35,11 +35,9 @@
 #include "i915_trace.h"
 #include "intel_drv.h"
 
-#include 
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 
@@ -972,13 +970,7 @@ static int i915_pci_probe(struct pci_dev *pdev, const 
struct pci_device_id *ent)
if (PCI_FUNC(pdev->devfn))
return -ENODEV;
 
-   /*
-* apple-gmux is needed on dual GPU MacBook Pro
-* to probe the panel if we're the inactive GPU.
-*/
-   if (IS_ENABLED(CONFIG_VGA_ARB) && IS_ENABLED(CONFIG_VGA_SWITCHEROO) &&
-   apple_gmux_present() && pdev != vga_default_device() &&
-   !vga_switcheroo_handler_flags())
+   if (vga_switcheroo_client_probe_defer(pdev))
return -EPROBE_DEFER;
 
return drm_get_pci_dev(pdev, ent, );
diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c 
b/drivers/gpu/drm/nouveau/nouveau_drm.c
index bb8498c..9141bcd 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -22,13 +22,11 @@
  * Authors: Ben Skeggs
  */
 
-#include 
 #include 
 #include 
 #include 
 #include 
 #include 
-#include 
 #include 
 
 #include "drmP.h"
@@ -314,13 +312,7 @@ static int nouveau_drm_probe(struct pci_dev *pdev,
bool boot = false;
int ret;
 
-   /*
-* apple-gmux is needed on dual GPU MacBook Pro
-* to probe the panel if we're the inactive GPU.
-  

[Intel-gfx] [PATCH] drm/i915: Skip PIPESTAT reads from irq handler on VLV/CHV when power well is down

2016-02-18 Thread ville . syrjala
From: Ville Syrjälä 

PIPESTAT registers live in the display power well on VLV/CHV, so we
shouldn't access them when things are powered down. Let's check
whether the display interrupts are on or off before accessing the
PIPESTAT registers.

Another option would be to read the PIPESTAT registers only when
the IIR register indicates that there's a pending pipe event. But
that would mean we might miss even more underrun reports than we
do now, because the underrun status bit lives in PIPESTAT but doesn't
actually generate an interrupt.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93738
Cc: Chris Wilson 
Tested-by: Chris Wilson 
Signed-off-by: Ville Syrjälä 
---
 drivers/gpu/drm/i915/i915_irq.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 25a89373df63..d56c261ad867 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1651,6 +1651,12 @@ static void valleyview_pipestat_irq_handler(struct 
drm_device *dev, u32 iir)
int pipe;
 
spin_lock(_priv->irq_lock);
+
+   if (!dev_priv->display_irqs_enabled) {
+   spin_unlock(_priv->irq_lock);
+   return;
+   }
+
for_each_pipe(dev_priv, pipe) {
i915_reg_t reg;
u32 mask, iir_bit = 0;
-- 
2.4.10
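The locking pattern in the diff above (take the irq lock, bail out early if display interrupts are off, and only then touch PIPESTAT) can be sketched in a self-contained model. The lock stubs below are no-ops and all names are illustrative stand-ins for the i915 ones:

```c
/* Userspace analogue of the early-return-under-lock pattern: check the
 * "display irqs enabled" state before touching any (simulated)
 * PIPESTAT register. All names are illustrative stand-ins. */
#include <assert.h>
#include <stdbool.h>

/* no-op stand-ins for spin_lock(&dev_priv->irq_lock)/spin_unlock() */
static void irq_lock(void)   { }
static void irq_unlock(void) { }

static bool display_irqs_enabled;	/* mirrors dev_priv->display_irqs_enabled */
static int  pipestat_reads;		/* counts simulated register accesses */

static void pipestat_irq_handler(void)
{
	irq_lock();
	if (!display_irqs_enabled) {
		/* power well may be down: don't touch PIPESTAT */
		irq_unlock();
		return;
	}
	pipestat_reads++;	/* stands in for reading the PIPESTAT registers */
	irq_unlock();
}
```

The key detail, as in the patch, is that the check happens under the same lock that guards the register accesses, so enable/disable cannot race with the handler.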



[Intel-gfx] [PULL] topic/drm-misc

2016-02-18 Thread Daniel Vetter
Hi Dave,

Misc stuff all over:
- more mode_fixup removal from Carlos, there's another final pile still
  left.
- final bits of vgaswitcheroo from Lukas for apple gmux, we're still
  discussing an api cleanup patch to make it a bit more abuse-safe as a
  follow-up
- dp aux interface for userspace for tools from Rafael Antognolli 
- actual interface parts for dma-buf flushing for userspace mmap
- few small bits all over

... plus all the bits from last pull req. Why do I split them up ;-)

I'll be on vacation for 1 week now, will send final intel pull for 4.6 and
probably more drm-misc when I'm back.

Cheers, Daniel


The following changes since commit 10c1b6183a163aca59ba92b88f2b4c4cecd20d4c:

  drm/tegra: drop unused variable. (2016-02-09 11:17:37 +1000)

are available in the git repository at:

  git://anongit.freedesktop.org/drm-intel tags/topic/drm-misc-2016-02-18

for you to fetch changes up to a6ddd2f1b99f1c00b4e00289b13c3e451c7130b0:

  drm/udl: Use module_usb_driver (2016-02-17 14:19:30 +0100)


Amitoj Kaur Chawla (1):
  drm/udl: Use module_usb_driver

Arnd Bergmann (1):
  drm/msm: remove unused variable

Carlos Palminha (22):
  drm: fixes when i2c encoder slave mode_fixup is null.
  drm: fixes crct set_mode when encoder mode_fixup is null.
  drm/i2c/sil164: removed unnecessary code, mode_fixup is now optional.
  drm/i2c/tda998x: removed unnecessary code, mode_fixup is now optional.
  drm/bridge: removed dummy mode_fixup function from dw-hdmi.
  drm/virtio: removed optional dummy encoder mode_fixup function.
  drm/udl: removed optional dummy encoder mode_fixup function.
  drm/exynos: removed optional dummy encoder mode_fixup function.
  drm/amdgpu: removed optional dummy encoder mode_fixup function.
  drm/ast: removed optional dummy encoder mode_fixup function.
  drm/bochs: removed optional dummy encoder mode_fixup function.
  drm/cirrus: removed optional dummy encoder mode_fixup function.
  drm/radeon: removed optional dummy encoder mode_fixup function.
  drm/gma500: removed optional dummy encoder mode_fixup function.
  drm/imx: removed optional dummy encoder mode_fixup function.
  drm/msm/mdp: removed optional dummy encoder mode_fixup function.
  drm/mgag200: removed optional dummy encoder mode_fixup function.
  drm/qxl: removed optional dummy encoder mode_fixup function.
  drm/rockchip: removed optional dummy encoder mode_fixup function.
  drm/sti: removed optional dummy encoder mode_fixup function.
  drm/tilcdc: removed optional dummy encoder mode_fixup function.
  drm: fixes crct set_mode when crtc mode_fixup is null.

Daniel Thompson (1):
  drm: prime: Honour O_RDWR during prime-handle-to-fd

Daniel Vetter (2):
  dma-buf: Add ioctls to allow userspace to flush
  Merge branch 'topic/mode_fixup-optional' into topic/drm-misc

Haixia Shi (1):
  drm/msm: remove the drm_device_is_unplugged check

Insu Yun (1):
  ch7006: correctly handling failed allocation

LABBE Corentin (1):
  drm: modes: add missing [drm] to message printing

Lukas Wunner (13):
  vga_switcheroo: Add handler flags infrastructure
  vga_switcheroo: Add support for switching only the DDC
  apple-gmux: Track switch state
  apple-gmux: Add switch_ddc support
  drm/edid: Switch DDC when reading the EDID
  drm/i915: Switch DDC when reading the EDID
  drm/nouveau: Switch DDC when reading the EDID
  drm/radeon: Switch DDC when reading the EDID
  apple-gmux: Add helper for presence detect
  drm/i915: Defer probe if gmux is present but its driver isn't
  drm/nouveau: Defer probe if gmux is present but its driver isn't
  drm/radeon: Defer probe if gmux is present but its driver isn't
  apple-gmux: Fix build breakage if !CONFIG_ACPI

Maarten Lankhorst (7):
  drm/core: Add drm_encoder_index.
  drm/core: Add drm_for_each_encoder_mask, v2.
  drm/i915: Do not touch best_encoder for load detect.
  drm/atomic: Do not unset crtc when an encoder is stolen
  drm/atomic: Add encoder_mask to crtc_state, v3.
  drm/fb_helper: Use correct allocation count for arrays.
  drm/fb_helper: Use add_one_connector in add_all_connectors.

Rafael Antognolli (3):
  drm/kms_helper: Add a common place to call init and exit functions.
  drm/dp: Add a drm_aux-dev module for reading/writing dpcd registers.
  drm/i915: Set aux.dev to the drm_connector device, instead of drm_device.

Rasmus Villemoes (1):
  drm/gma500: fix error path in gma_intel_setup_gmbus()

Tiago Vignatti (3):
  dma-buf: Remove range-based flush
  drm/i915: Implement end_cpu_access
  drm/i915: Use CPU mapping for userspace dma-buf mmap()

Ville Syrjälä (1):
  drm: Add drm_format_plane_width() and drm_format_plane_height()

 Documentation/DocBook/gpu.tmpl   |   5 +
 Documentation/dma-buf-sharing.txt 

[Intel-gfx] [PATCH v2 1/2] drm/i915: Add i915_gem_object_vmap to map GEM object to virtual space

2016-02-18 Thread yu . dai
From: Alex Dai 

There are several places inside the driver where a GEM object is mapped to
kernel virtual space. The mapping is either done for the whole object
or a certain page range of it.

This patch introduces a function i915_gem_object_vmap to do such a job.

v2: Use obj->pages->nents for iteration within i915_gem_object_vmap;
break when it finishes all desired pages. The caller needs to pass
in the actual page number. (Tvrtko Ursulin)

Signed-off-by: Alex Dai 
Cc: Dave Gordon 
Cc: Daniel Vetter 
Cc: Tvrtko Ursulin 
Cc: Chris Wilson 
Signed-off-by: Alex Dai 
---
 drivers/gpu/drm/i915/i915_cmd_parser.c  | 28 +---
 drivers/gpu/drm/i915/i915_drv.h |  3 +++
 drivers/gpu/drm/i915/i915_gem.c | 47 +
 drivers/gpu/drm/i915/i915_gem_dmabuf.c  | 16 +++
 drivers/gpu/drm/i915/intel_ringbuffer.c | 24 ++---
 5 files changed, 56 insertions(+), 62 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c 
b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 814d894..915e8c1 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -863,37 +863,11 @@ find_reg(const struct drm_i915_reg_descriptor *table,
 static u32 *vmap_batch(struct drm_i915_gem_object *obj,
   unsigned start, unsigned len)
 {
-   int i;
-   void *addr = NULL;
-   struct sg_page_iter sg_iter;
int first_page = start >> PAGE_SHIFT;
int last_page = (len + start + 4095) >> PAGE_SHIFT;
int npages = last_page - first_page;
-   struct page **pages;
-
-   pages = drm_malloc_ab(npages, sizeof(*pages));
-   if (pages == NULL) {
-   DRM_DEBUG_DRIVER("Failed to get space for pages\n");
-   goto finish;
-   }
-
-   i = 0;
-   for_each_sg_page(obj->pages->sgl, _iter, obj->pages->nents, 
first_page) {
-   pages[i++] = sg_page_iter_page(_iter);
-   if (i == npages)
-   break;
-   }
-
-   addr = vmap(pages, i, 0, PAGE_KERNEL);
-   if (addr == NULL) {
-   DRM_DEBUG_DRIVER("Failed to vmap pages\n");
-   goto finish;
-   }
 
-finish:
-   if (pages)
-   drm_free_large(pages);
-   return (u32*)addr;
+   return (u32*)i915_gem_object_vmap(obj, first_page, npages);
 }
 
 /* Returns a vmap'd pointer to dest_obj, which the caller must unmap */
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 6644c2e..5b00a6a 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2899,6 +2899,9 @@ struct drm_i915_gem_object 
*i915_gem_object_create_from_data(
struct drm_device *dev, const void *data, size_t size);
 void i915_gem_free_object(struct drm_gem_object *obj);
 void i915_gem_vma_destroy(struct i915_vma *vma);
+void *i915_gem_object_vmap(struct drm_i915_gem_object *obj,
+  unsigned int first,
+  unsigned int npages);
 
 /* Flags used by pin/bind */
 #define PIN_MAPPABLE   (1<<0)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index f68f346..4bc0ce7 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -5356,3 +5356,50 @@ fail:
drm_gem_object_unreference(>base);
return ERR_PTR(ret);
 }
+
+/**
+ * i915_gem_object_vmap - map a GEM obj into kernel virtual space
+ * @obj: the GEM obj to be mapped
+ * @first: index of the first page where mapping starts
+ * @npages: how many pages to be mapped, starting from first page
+ *
+ * Map a given page range of GEM obj into kernel virtual space. The caller must
+ * make sure the associated pages are gathered and pinned before calling this
+ * function. vunmap should be called after use.
+ *
+ * NULL will be returned on failure.
+ */
+void *i915_gem_object_vmap(struct drm_i915_gem_object *obj,
+  unsigned int first,
+  unsigned int npages)
+{
+   struct sg_page_iter sg_iter;
+   struct page **pages;
+   void *addr;
+   int i;
+
+   if (first + npages > obj->pages->nents) {
+   DRM_DEBUG_DRIVER("Invalid page count\n");
+   return NULL;
+   }
+
+   pages = drm_malloc_ab(npages, sizeof(*pages));
+   if (pages == NULL) {
+   DRM_DEBUG_DRIVER("Failed to get space for pages\n");
+   return NULL;
+   }
+
+   i = 0;
+   for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, first) {
+   pages[i++] = sg_page_iter_page(&sg_iter);
+   if (i == npages)
+   break;
+   }
+
+   addr = vmap(pages, npages, 0, PAGE_KERNEL);
+   if (addr == NULL)
+   DRM_DEBUG_DRIVER("Failed to vmap pages\n");

[Intel-gfx] [PATCH v2 0/2] Add i915_gem_object_vmap

2016-02-18 Thread yu . dai
From: Alex Dai 

There are several places in the driver where a GEM object is mapped to
kernel virtual space. Add a common function, i915_gem_object_vmap, to do
the vmap work for such use cases.

Alex Dai (2):
  drm/i915: Add i915_gem_object_vmap to map GEM object to virtual space
  drm/i915/guc: Simplify code by keeping vmap of guc_client object

 drivers/gpu/drm/i915/i915_cmd_parser.c | 28 +--
 drivers/gpu/drm/i915/i915_drv.h|  3 ++
 drivers/gpu/drm/i915/i915_gem.c| 47 +
 drivers/gpu/drm/i915/i915_gem_dmabuf.c | 16 ++---
 drivers/gpu/drm/i915/i915_guc_submission.c | 56 ++
 drivers/gpu/drm/i915/intel_guc.h   |  3 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c| 24 ++---
 7 files changed, 77 insertions(+), 100 deletions(-)

-- 
2.5.0

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH v2 2/2] drm/i915/guc: Simplify code by keeping vmap of guc_client object

2016-02-18 Thread yu . dai
From: Alex Dai 

The GuC client object is always pinned during its life cycle. We cache
the vmap of the client object, which includes the guc_process_desc,
doorbell and work queue. By doing so, we can simplify the code where
the driver communicates with the GuC.

As a result, this patch removes the kmap_atomic in wq_check_space,
where usleep_range could be called while a kmap_atomic mapping is
held. This fixes the issue below.

v2: Pass page actual numbers to i915_gem_object_vmap(). Also, check
return value for error handling. (Tvrtko Ursulin)
v1: vmap is done by i915_gem_object_vmap().

[   34.098798] BUG: scheduling while atomic: gem_close_race/1941/0x0002
[   34.098822] Modules linked in: hid_generic usbhid i915 asix usbnet libphy 
mii i2c_algo_bit drm_kms_helper cfbfillrect syscopyarea cfbimgblt sysfillrect 
sysimgblt fb_sys_fops cfbcopyarea drm coretemp i2c_hid hid video 
pinctrl_sunrisepoint pinctrl_intel acpi_pad nls_iso8859_1 e1000e ptp psmouse 
pps_core ahci libahci
[   34.098824] CPU: 0 PID: 1941 Comm: gem_close_race Tainted: G U  
4.4.0-160121+ #123
[   34.098824] Hardware name: Intel Corporation Skylake Client platform/Skylake 
AIO DDR3L RVP10, BIOS SKLSE2R1.R00.X100.B01.1509220551 09/22/2015
[   34.098825]  00013e40 880166c27a78 81280d02 
880172c13e40
[   34.098826]  880166c27a88 810c203a 880166c27ac8 
814ec808
[   34.098827]  88016b7c6000 880166c28000 000f4240 
0001
[   34.098827] Call Trace:
[   34.098831]  [] dump_stack+0x4b/0x79
[   34.098833]  [] __schedule_bug+0x41/0x4f
[   34.098834]  [] __schedule+0x5a8/0x690
[   34.098835]  [] schedule+0x37/0x80
[   34.098836]  [] schedule_hrtimeout_range_clock+0xad/0x130
[   34.098837]  [] ? hrtimer_init+0x10/0x10
[   34.098838]  [] ? schedule_hrtimeout_range_clock+0xa1/0x130
[   34.098839]  [] schedule_hrtimeout_range+0xe/0x10
[   34.098840]  [] usleep_range+0x3b/0x40
[   34.098853]  [] i915_guc_wq_check_space+0x119/0x210 [i915]
[   34.098861]  [] 
intel_logical_ring_alloc_request_extras+0x5c/0x70 [i915]
[   34.098869]  [] i915_gem_request_alloc+0x91/0x170 [i915]
[   34.098875]  [] 
i915_gem_do_execbuffer.isra.25+0xbc7/0x12a0 [i915]
[   34.098882]  [] ? 
i915_gem_object_get_pages_gtt+0x225/0x3c0 [i915]
[   34.098889]  [] ? i915_gem_pwrite_ioctl+0xd6/0x9f0 [i915]
[   34.098895]  [] i915_gem_execbuffer2+0xa8/0x250 [i915]
[   34.098900]  [] drm_ioctl+0x258/0x4f0 [drm]
[   34.098906]  [] ? i915_gem_execbuffer+0x340/0x340 [i915]
[   34.098908]  [] do_vfs_ioctl+0x2cd/0x4a0
[   34.098909]  [] ? __fget+0x72/0xb0
[   34.098910]  [] SyS_ioctl+0x3c/0x70
[   34.098911]  [] entry_SYSCALL_64_fastpath+0x12/0x6a
[   34.100208] [ cut here ]

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93847
Cc: Dave Gordon 
Cc: Daniel Vetter 
Cc: Tvrtko Ursulin 
Signed-off-by: Alex Dai 
---
 drivers/gpu/drm/i915/i915_guc_submission.c | 56 ++
 drivers/gpu/drm/i915/intel_guc.h   |  3 +-
 2 files changed, 21 insertions(+), 38 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c 
b/drivers/gpu/drm/i915/i915_guc_submission.c
index d7543ef..3e2ea42 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -195,11 +195,9 @@ static int guc_ring_doorbell(struct i915_guc_client *gc)
struct guc_process_desc *desc;
union guc_doorbell_qw db_cmp, db_exc, db_ret;
union guc_doorbell_qw *db;
-   void *base;
int attempt = 2, ret = -EAGAIN;
 
-   base = kmap_atomic(i915_gem_object_get_page(gc->client_obj, 0));
-   desc = base + gc->proc_desc_offset;
+   desc = gc->client_base + gc->proc_desc_offset;
 
/* Update the tail so it is visible to GuC */
desc->tail = gc->wq_tail;
@@ -215,7 +213,7 @@ static int guc_ring_doorbell(struct i915_guc_client *gc)
db_exc.cookie = 1;
 
/* pointer of current doorbell cacheline */
-   db = base + gc->doorbell_offset;
+   db = gc->client_base + gc->doorbell_offset;
 
while (attempt--) {
/* lets ring the doorbell */
@@ -244,10 +242,6 @@ static int guc_ring_doorbell(struct i915_guc_client *gc)
db_exc.cookie = 1;
}
 
-   /* Finally, update the cached copy of the GuC's WQ head */
-   gc->wq_head = desc->head;
-
-   kunmap_atomic(base);
return ret;
 }
 
@@ -341,10 +335,8 @@ static void guc_init_proc_desc(struct intel_guc *guc,
   struct i915_guc_client *client)
 {
struct guc_process_desc *desc;
-   void *base;
 
-   base = kmap_atomic(i915_gem_object_get_page(client->client_obj, 0));
-   desc = base + client->proc_desc_offset;
+   desc = client->client_base + client->proc_desc_offset;
 
memset(desc, 0, sizeof(*desc));
 
@@ -361,8 +353,6 @@ 

Re: [Intel-gfx] [PATCH 3/6] drm/i915: Remove the SPLL==270Mhz assumption from intel_fdi_link_freq()

2016-02-18 Thread Imre Deak
On ke, 2016-02-17 at 21:41 +0200, ville.syrj...@linux.intel.com wrote:
> From: Ville Syrjälä 
> 
> Instead of assuming we've correctly set up SPLL to run at 270Mhz for
> FDI, let's use the port_clock from pipe_config which should be what
> we want. This would catch problems if someone misconfigures SPLL for
> whatever reason.
> 
> Signed-off-by: Ville Syrjälä 

Reviewed-by: Imre Deak 

> ---
>  drivers/gpu/drm/i915/intel_display.c | 17 ++---
>  1 file changed, 10 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_display.c
> b/drivers/gpu/drm/i915/intel_display.c
> index 99001e117517..a3c959cd8b3b 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -224,12 +224,15 @@ static void intel_update_czclk(struct
> drm_i915_private *dev_priv)
>  }
>  
>  static inline u32 /* units of 100MHz */
> -intel_fdi_link_freq(struct drm_i915_private *dev_priv)
> +intel_fdi_link_freq(struct drm_i915_private *dev_priv,
> + const struct intel_crtc_state *pipe_config)
>  {
> - if (IS_GEN5(dev_priv))
> - return (I915_READ(FDI_PLL_BIOS_0) & FDI_PLL_FB_CLOCK_MASK) + 2;
> + if (HAS_DDI(dev_priv))
> + return pipe_config->port_clock; /* SPLL */
> + else if (IS_GEN5(dev_priv))
> + return ((I915_READ(FDI_PLL_BIOS_0) & FDI_PLL_FB_CLOCK_MASK) + 2) * 10000;
>   else
> - return 27;
> + return 270000;
>  }
>  
>  static const intel_limit_t intel_limits_i8xx_dac = {
> @@ -6588,7 +6591,7 @@ retry:
>    * Hence the bw of each lane in terms of the mode signal
>    * is:
>    */
> - link_bw = intel_fdi_link_freq(to_i915(dev)) * MHz(100)/KHz(1)/10;
> + link_bw = intel_fdi_link_freq(to_i915(dev), pipe_config);
>  
>   fdi_dotclock = adjusted_mode->crtc_clock;
>  
> @@ -10774,7 +10777,7 @@ static void ironlake_pch_clock_get(struct
> intel_crtc *crtc,
>    * Calculate one based on the FDI configuration.
>    */
>   pipe_config->base.adjusted_mode.crtc_clock =
> - intel_dotclock_calculate(intel_fdi_link_freq(dev_priv) * 10000,
> + intel_dotclock_calculate(intel_fdi_link_freq(dev_priv, pipe_config),
>    &pipe_config->fdi_m_n);
>  }
>  
> @@ -12789,7 +12792,7 @@ static void
> intel_pipe_config_sanity_check(struct drm_i915_private *dev_priv,
>      const struct
> intel_crtc_state *pipe_config)
>  {
>   if (pipe_config->has_pch_encoder) {
> - int fdi_dotclock = intel_dotclock_calculate(intel_fdi_link_freq(dev_priv) * 10000,
> + int fdi_dotclock = intel_dotclock_calculate(intel_fdi_link_freq(dev_priv, pipe_config),
>   &pipe_config->fdi_m_n);
>   int dotclock = pipe_config->base.adjusted_mode.crtc_clock;
>  


Re: [Intel-gfx] [PATCH 2/6] drm/i915: Move the encoder vs. FDI dotclock check out from encoder .get_config()

2016-02-18 Thread Imre Deak
On ke, 2016-02-17 at 21:41 +0200, ville.syrj...@linux.intel.com wrote:
> From: Ville Syrjälä 
> 
> Currently we check if the encoder's idea of dotclock agrees with what
> we calculated based on the FDI parameters. We do this in the encoder
> .get_config() hooks, which isn't so nice in case the BIOS (or some
> other
> outside party) made a mess of the state and we're just trying to take
> over.
> 
> So as a prep step to being able to sanitize such a bogus state, move
> the sanity check to just after we've read out the entire state. If
> we then need to sanitize a bad state, it should be easier to move the
> sanity check to occur after sanitation instead of before it.
> 
> Signed-off-by: Ville Syrjälä 

Separating the get-config and check steps makes things more logical in
any case. Looks ok to me:
Reviewed-by: Imre Deak 

> ---
>  drivers/gpu/drm/i915/intel_crt.c | 10 +--
>  drivers/gpu/drm/i915/intel_display.c | 57 
> 
>  drivers/gpu/drm/i915/intel_dp.c  | 11 ++-
>  drivers/gpu/drm/i915/intel_drv.h |  3 --
>  drivers/gpu/drm/i915/intel_hdmi.c|  3 --
>  drivers/gpu/drm/i915/intel_lvds.c|  8 +
>  drivers/gpu/drm/i915/intel_sdvo.c|  4 +--
>  7 files changed, 38 insertions(+), 58 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_crt.c
> b/drivers/gpu/drm/i915/intel_crt.c
> index e686a91a416e..f4c88d93a164 100644
> --- a/drivers/gpu/drm/i915/intel_crt.c
> +++ b/drivers/gpu/drm/i915/intel_crt.c
> @@ -120,17 +120,9 @@ static unsigned int intel_crt_get_flags(struct
> intel_encoder *encoder)
>  static void intel_crt_get_config(struct intel_encoder *encoder,
>    struct intel_crtc_state
> *pipe_config)
>  {
> - struct drm_device *dev = encoder->base.dev;
> - int dotclock;
> -
>   pipe_config->base.adjusted_mode.flags |=
> intel_crt_get_flags(encoder);
>  
> - dotclock = pipe_config->port_clock;
> -
> - if (HAS_PCH_SPLIT(dev))
> - ironlake_check_encoder_dotclock(pipe_config,
> dotclock);
> -
> - pipe_config->base.adjusted_mode.crtc_clock = dotclock;
> + pipe_config->base.adjusted_mode.crtc_clock = pipe_config-
> >port_clock;
>  }
>  
>  static void hsw_crt_get_config(struct intel_encoder *encoder,
> diff --git a/drivers/gpu/drm/i915/intel_display.c
> b/drivers/gpu/drm/i915/intel_display.c
> index f0f88061a9e5..99001e117517 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -224,12 +224,11 @@ static void intel_update_czclk(struct
> drm_i915_private *dev_priv)
>  }
>  
>  static inline u32 /* units of 100MHz */
> -intel_fdi_link_freq(struct drm_device *dev)
> +intel_fdi_link_freq(struct drm_i915_private *dev_priv)
>  {
> - if (IS_GEN5(dev)) {
> - struct drm_i915_private *dev_priv = dev->dev_private;
> + if (IS_GEN5(dev_priv))
>   return (I915_READ(FDI_PLL_BIOS_0) & FDI_PLL_FB_CLOCK_MASK) + 2;
> - } else
> + else
>   return 27;
>  }
>  
> @@ -6589,7 +6588,7 @@ retry:
>    * Hence the bw of each lane in terms of the mode signal
>    * is:
>    */
> - link_bw = intel_fdi_link_freq(dev) * MHz(100)/KHz(1)/10;
> + link_bw = intel_fdi_link_freq(to_i915(dev)) * MHz(100)/KHz(1)/10;
>  
>   fdi_dotclock = adjusted_mode->crtc_clock;
>  
> @@ -6601,8 +6600,7 @@ retry:
>   intel_link_compute_m_n(pipe_config->pipe_bpp, lane, fdi_dotclock,
>      link_bw, &pipe_config->fdi_m_n);
>  
> - ret = ironlake_check_fdi_lanes(intel_crtc->base.dev,
> -    intel_crtc->pipe, pipe_config);
> + ret = ironlake_check_fdi_lanes(dev, intel_crtc->pipe, pipe_config);
>   if (ret == -EINVAL && pipe_config->pipe_bpp > 6*3) {
>   pipe_config->pipe_bpp -= 2*3;
>   DRM_DEBUG_KMS("fdi link bw constraint, reducing pipe
> bpp to %i\n",
> @@ -10765,19 +10763,18 @@ int intel_dotclock_calculate(int link_freq,
>  static void ironlake_pch_clock_get(struct intel_crtc *crtc,
>      struct intel_crtc_state
> *pipe_config)
>  {
> - struct drm_device *dev = crtc->base.dev;
> + struct drm_i915_private *dev_priv = to_i915(crtc->base.dev);
>  
>   /* read out port_clock from the DPLL */
>   i9xx_crtc_clock_get(crtc, pipe_config);
>  
>   /*
> -  * This value does not include pixel_multiplier.
> -  * We will check that port_clock and
> adjusted_mode.crtc_clock
> -  * agree once we know their relationship in the encoder's
> -  * get_config() function.
> +  * In case there is an active pipe without active ports,
> +  * we may need some idea for the dotclock anyway.
> +  * Calculate one based on the FDI configuration.
>    */
>   pipe_config->base.adjusted_mode.crtc_clock =
> - intel_dotclock_calculate(intel_fdi_link_freq(dev) *

Re: [Intel-gfx] Fwd: [PATCH] drm/i915: Avoid vblank counter for gen9+

2016-02-18 Thread Imre Deak
On to, 2016-02-18 at 08:56 -0800, Rodrigo Vivi wrote:
> Imre, Patrik, do you know if I'm missing something or what I'm doing
> wrong with this power domain handler for vblanks to avoid DC states
> when we need a reliable frame counter in place?

The WARN is due to the spin_lock() in drm_vblank_enable(), you can't
call power domain functions in atomic context, due to the mutex the
power domain and runtime PM fw uses.

--Imre

> 
> Do you have better ideas?
> 
> Thanks,
> Rodrigo.
> 
> -- Forwarded message --
> From: Rodrigo Vivi 
> Date: Wed, Feb 17, 2016 at 3:14 PM
> Subject: Re: [Intel-gfx] [PATCH] drm/i915: Avoid vblank counter for
> gen9+
> To: Daniel Vetter , Patrik Jakobsson
> 
> Cc: Rodrigo Vivi , intel-gfx
> 
> 
> 
> On Tue, Feb 16, 2016 at 7:50 AM, Daniel Vetter 
> wrote:
> > On Thu, Feb 11, 2016 at 09:00:47AM -0800, Rodrigo Vivi wrote:
> > > Framecounter register is read-only so DMC cannot restore it
> > > after exiting DC5 and DC6.
> > > 
> > > Easiest way to go is to avoid the counter and use vblank
> > > interruptions for this platform and for all the following
> > > ones since DMC came to stay. At least while we can't change
> > > this register to read-write.
> > > 
> > > Signed-off-by: Rodrigo Vivi 
> > 
> > Now my comments also in public:
> > - Do we still get reasonable dc5 residency with this - it means
> > we'll keep
> >   vblank irq running forever.
> > 
> > - I'm a bit unclear on what exactly this fixes - have you tested
> > that
> >   long-lasting vblank waits are still accurate? Just want to make
> > sure we
> >   don't just paper over the issue and desktops can still get stuck
> > waiting
> >   for a vblank.
> 
> apparently no... so please just ignore this patch for now... after a
> while with that patch I was seeing the issue again...
> 
> > 
> > Just a bit suprised that the only problem is the framecounter, and
> > not
> > that vblanks stop happening too.
> > 
> > We need to also know these details for the proper fix, which will
> > involve
> > grabbing power well references (might need a new one for vblank
> > interrupts) to make sure.
> 
> Yeap, I liked this idea... so combining a power domain reference with
> a vblank count restore once we know the dc off is blocked we could
> workaround this case... something like:
> 
> diff --git a/drivers/gpu/drm/i915/i915_irq.c
> b/drivers/gpu/drm/i915/i915_irq.c
> index 25a8937..2b18778 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -2743,7 +2743,10 @@ static int gen8_enable_vblank(struct
> drm_device
> *dev, unsigned int pipe)
> struct drm_i915_private *dev_priv = dev->dev_private;
> unsigned long irqflags;
> 
> +   intel_display_power_get(dev_priv, POWER_DOMAIN_VBLANK);
> +
> spin_lock_irqsave(&dev_priv->irq_lock, irqflags);
> +   dev->vblank[pipe].last = g4x_get_vblank_counter(dev, pipe);
> bdw_enable_pipe_irq(dev_priv, pipe, GEN8_PIPE_VBLANK);
> spin_unlock_irqrestore(&dev_priv->irq_lock, irqflags);
> 
> @@ -2796,6 +2799,8 @@ static void gen8_disable_vblank(struct
> drm_device *dev, unsigned int pipe)
> spin_lock_irqsave(&dev_priv->irq_lock, irqflags);
> bdw_disable_pipe_irq(dev_priv, pipe, GEN8_PIPE_VBLANK);
> spin_unlock_irqrestore(&dev_priv->irq_lock, irqflags);
> +
> +   intel_display_power_put(dev_priv, POWER_DOMAIN_VBLANK);
>  }
> 
> where POWER_DOMAIN_VBLANK is part of:
> #define SKL_DISPLAY_DC_OFF_POWER_DOMAINS (  \
> BIT(POWER_DOMAIN_VBLANK) |  \
> 
> 
> However I have my dmesg flooded by:
> 
> 
> [   69.025562] BUG: sleeping function called from invalid context at
> drivers/base/power/runtime.c:955
> [   69.025576] in_atomic(): 1, irqs_disabled(): 1, pid: 995, name:
> Xorg
> [   69.025582] Preemption disabled at:[]
> drm_vblank_get+0x4e/0xd0
> 
> [   69.025619] CPU: 3 PID: 995 Comm: Xorg Tainted: G U  W
> 4.5.0-rc4+ #11
> [   69.025628] Hardware name: Intel Corporation Kabylake Client
> platform/Skylake U DDR3L RVP7, BIOS KBLSE2R1.R00.X019.B01.1512230743
> 12/23/2015
> [   69.025637]   88003f0bfbb0 8148e983
> 
> [   69.025653]  880085b04200 88003f0bfbd0 81133ece
> 81d77f23
> [   69.025667]  03bb 88003f0bfbf8 81133f89
> 88016913a098
> [   69.025680] Call Trace:
> [   69.025697]  [] dump_stack+0x65/0x92
> [   69.025711]  [] ___might_sleep+0x10e/0x180
> [   69.025722]  [] __might_sleep+0x49/0x80
> [   69.025739]  [] __pm_runtime_resume+0x79/0x80
> [   69.025841]  [] intel_runtime_pm_get+0x28/0x90
> [i915]
> [   69.025924]  []
> intel_display_power_get+0x19/0x50 [i915]
> [   69.025995]  [] gen8_enable_vblank+0x34/0xc0
> [i915]
> [   69.026016]  [] drm_vblank_enable+0x76/0xd0
> 
> 
> 
> 
> Another thing 

Re: [Intel-gfx] [PATCH RESSEND FOR CI *AGAIN*] drm/i915/bxt: Remove DSP CLK_GATE programming for BXT

2016-02-18 Thread Jani Nikula
On Thu, 18 Feb 2016, Jani Nikula  wrote:
> From: Uma Shankar 
>
> DSP CLK_GATE registers are specific to BYT and CHT.
> Avoid programming the same for BXT platform.
>
> v2: Rebased on latest drm nightly branch.
>
> v3: Fixed Jani's review comments
>
> Signed-off-by: Uma Shankar 
> Signed-off-by: Jani Nikula 

I gave up hoping to get CI results for this one, after two attempts. We
have no coverage for this function anyway, and I've tested this before
to not break BYT. Thus pushed to drm-intel-next-queued.

BR,
Jani.


> ---
>  drivers/gpu/drm/i915/intel_dsi.c | 11 +++
>  1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_dsi.c 
> b/drivers/gpu/drm/i915/intel_dsi.c
> index fcd746c55abd..b928c503df24 100644
> --- a/drivers/gpu/drm/i915/intel_dsi.c
> +++ b/drivers/gpu/drm/i915/intel_dsi.c
> @@ -634,7 +634,6 @@ static void intel_dsi_post_disable(struct intel_encoder 
> *encoder)
>  {
>   struct drm_i915_private *dev_priv = encoder->base.dev->dev_private;
>   struct intel_dsi *intel_dsi = enc_to_intel_dsi(&encoder->base);
> - u32 val;
>  
>   DRM_DEBUG_KMS("\n");
>  
> @@ -642,9 +641,13 @@ static void intel_dsi_post_disable(struct intel_encoder 
> *encoder)
>  
>   intel_dsi_clear_device_ready(encoder);
>  
> - val = I915_READ(DSPCLK_GATE_D);
> - val &= ~DPOUNIT_CLOCK_GATE_DISABLE;
> - I915_WRITE(DSPCLK_GATE_D, val);
> + if (!IS_BROXTON(dev_priv)) {
> + u32 val;
> +
> + val = I915_READ(DSPCLK_GATE_D);
> + val &= ~DPOUNIT_CLOCK_GATE_DISABLE;
> + I915_WRITE(DSPCLK_GATE_D, val);
> + }
>  
>   drm_panel_unprepare(intel_dsi->panel);

-- 
Jani Nikula, Intel Open Source Technology Center


Re: [Intel-gfx] [PATCH v4 3/8] drm/i915: Kill off intel_crtc->atomic.wait_vblank, v4.

2016-02-18 Thread Zanoni, Paulo R
Em Qui, 2016-02-18 às 15:46 +0100, Maarten Lankhorst escreveu:
> Op 18-02-16 om 15:14 schreef Zanoni, Paulo R:
> > Em Qui, 2016-02-18 às 14:22 +0100, Maarten Lankhorst escreveu:
> > > Op 17-02-16 om 22:20 schreef Zanoni, Paulo R:
> > > > Em Qua, 2016-02-10 às 13:49 +0100, Maarten Lankhorst escreveu:
> > > > > Currently we perform our own wait in post_plane_update,
> > > > > but the atomic core performs another one in wait_for_vblanks.
> > > > > This means that 2 vblanks are done when a fb is changed,
> > > > > which is a bit overkill.
> > > > > 
> > > > > Merge them by creating a helper function that takes a crtc
> > > > > mask
> > > > > for the planes to wait on.
> > > > > 
> > > > > The broadwell vblank workaround may look gone entirely but
> > > > > this
> > > > > is
> > > > > not the case. pipe_config->wm_changed is set to true
> > > > > when any plane is turned on, which forces a vblank wait.
> > > > > 
> > > > > Changes since v1:
> > > > > - Removing the double vblank wait on broadwell moved to its
> > > > > own
> > > > > commit.
> > > > > Changes since v2:
> > > > > - Move out POWER_DOMAIN_MODESET handling to its own commit.
> > > > > Changes since v3:
> > > > > - Do not wait for vblank on legacy cursor updates. (Ville)
> > > > > - Move broadwell vblank workaround comment to
> > > > > page_flip_finished.
> > > > > (Ville)
> > > > > Changes since v4:
> > > > > - Compile fix, legacy_cursor_flip -> *_update.
> > > > > 
> > > > > Signed-off-by: Maarten Lankhorst
> > > > > ---
> > > > >  drivers/gpu/drm/i915/intel_atomic.c  |  1 +
> > > > >  drivers/gpu/drm/i915/intel_display.c | 86
> > > > > +++-
> > > > >  drivers/gpu/drm/i915/intel_drv.h |  2 +-
> > > > >  3 files changed, 67 insertions(+), 22 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/i915/intel_atomic.c
> > > > > b/drivers/gpu/drm/i915/intel_atomic.c
> > > > > index 4625f8a9ba12..8e579a8505ac 100644
> > > > > --- a/drivers/gpu/drm/i915/intel_atomic.c
> > > > > +++ b/drivers/gpu/drm/i915/intel_atomic.c
> > > > > @@ -97,6 +97,7 @@ intel_crtc_duplicate_state(struct drm_crtc
> > > > > *crtc)
> > > > >   crtc_state->disable_lp_wm = false;
> > > > >   crtc_state->disable_cxsr = false;
> > > > >   crtc_state->wm_changed = false;
> > > > > + crtc_state->fb_changed = false;
> > > > >  
> > > > >   return &crtc_state->base;
> > > > >  }
> > > > > diff --git a/drivers/gpu/drm/i915/intel_display.c
> > > > > b/drivers/gpu/drm/i915/intel_display.c
> > > > > index 804f2c6f260d..4d4dddc1f970 100644
> > > > > --- a/drivers/gpu/drm/i915/intel_display.c
> > > > > +++ b/drivers/gpu/drm/i915/intel_display.c
> > > > > @@ -4785,9 +4785,6 @@ static void
> > > > > intel_post_plane_update(struct
> > > > > intel_crtc *crtc)
> > > > >   to_intel_crtc_state(crtc->base.state);
> > > > >   struct drm_device *dev = crtc->base.dev;
> > > > >  
> > > > > - if (atomic->wait_vblank)
> > > > > - intel_wait_for_vblank(dev, crtc->pipe);
> > > > > -
> > > > >   intel_frontbuffer_flip(dev, atomic->fb_bits);
> > > > >  
> > > > >   crtc->wm.cxsr_allowed = true;
> > > > > @@ -10902,6 +10899,12 @@ static bool
> > > > > page_flip_finished(struct
> > > > > intel_crtc *crtc)
> > > > >   return true;
> > > > >  
> > > > >   /*
> > > > > +  * BDW signals flip done immediately if the plane
> > > > > +  * is disabled, even if the plane enable is already
> > > > > +  * armed to occur at the next vblank :(
> > > > > +  */
> > > > Having this comment here is just... weird. I think it removes a
> > > > lot
> > > > of
> > > > the context that was present before.
> > > > 
> > > > > +
> > > > > + /*
> > > > >    * A DSPSURFLIVE check isn't enough in case the mmio
> > > > > and
> > > > > CS
> > > > > flips
> > > > >    * used the same base address. In that case the mmio
> > > > > flip
> > > > > might
> > > > >    * have completed, but the CS hasn't even executed
> > > > > the
> > > > > flip
> > > > > yet.
> > > > > @@ -11778,6 +11781,9 @@ int
> > > > > intel_plane_atomic_calc_changes(struct
> > > > > drm_crtc_state *crtc_state,
> > > > >   if (!was_visible && !visible)
> > > > >   return 0;
> > > > >  
> > > > > + if (fb != old_plane_state->base.fb)
> > > > > + pipe_config->fb_changed = true;
> > > > > +
> > > > >   turn_off = was_visible && (!visible ||
> > > > > mode_changed);
> > > > >   turn_on = visible && (!was_visible || mode_changed);
> > > > >  
> > > > > @@ -11793,8 +11799,6 @@ int
> > > > > intel_plane_atomic_calc_changes(struct
> > > > > drm_crtc_state *crtc_state,
> > > > >  
> > > > >   /* must disable cxsr around plane
> > > > > enable/disable
> > > > > */
> > > > >   if (plane->type != DRM_PLANE_TYPE_CURSOR) {
> > > > > - if (is_crtc_enabled)
> > > > > - intel_crtc-
> 

[Intel-gfx] Fwd: [PATCH] drm/i915: Avoid vblank counter for gen9+

2016-02-18 Thread Rodrigo Vivi
Imre, Patrik, do you know if I'm missing something or what I'm doing
wrong with this power domain handler for vblanks to avoid DC states
when we need a reliable frame counter in place?

Do you have better ideas?

Thanks,
Rodrigo.

-- Forwarded message --
From: Rodrigo Vivi 
Date: Wed, Feb 17, 2016 at 3:14 PM
Subject: Re: [Intel-gfx] [PATCH] drm/i915: Avoid vblank counter for gen9+
To: Daniel Vetter , Patrik Jakobsson

Cc: Rodrigo Vivi , intel-gfx



On Tue, Feb 16, 2016 at 7:50 AM, Daniel Vetter  wrote:
> On Thu, Feb 11, 2016 at 09:00:47AM -0800, Rodrigo Vivi wrote:
>> Framecounter register is read-only so DMC cannot restore it
>> after exiting DC5 and DC6.
>>
>> Easiest way to go is to avoid the counter and use vblank
>> interruptions for this platform and for all the following
>> ones since DMC came to stay. At least while we can't change
>> this register to read-write.
>>
>> Signed-off-by: Rodrigo Vivi 
>
> Now my comments also in public:
> - Do we still get reasonable dc5 residency with this - it means we'll keep
>   vblank irq running forever.
>
> - I'm a bit unclear on what exactly this fixes - have you tested that
>   long-lasting vblank waits are still accurate? Just want to make sure we
>   don't just paper over the issue and desktops can still get stuck waiting
>   for a vblank.

apparently no... so please just ignore this patch for now... after a
while with that patch I was seeing the issue again...

>
> Just a bit suprised that the only problem is the framecounter, and not
> that vblanks stop happening too.
>
> We need to also know these details for the proper fix, which will involve
> grabbing power well references (might need a new one for vblank
> interrupts) to make sure.

Yeap, I liked this idea... so combining a power domain reference with
a vblank count restore once we know the dc off is blocked we could
workaround this case... something like:

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 25a8937..2b18778 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2743,7 +2743,10 @@ static int gen8_enable_vblank(struct drm_device
*dev, unsigned int pipe)
struct drm_i915_private *dev_priv = dev->dev_private;
unsigned long irqflags;

+   intel_display_power_get(dev_priv, POWER_DOMAIN_VBLANK);
+
	spin_lock_irqsave(&dev_priv->irq_lock, irqflags);
+   dev->vblank[pipe].last = g4x_get_vblank_counter(dev, pipe);
bdw_enable_pipe_irq(dev_priv, pipe, GEN8_PIPE_VBLANK);
	spin_unlock_irqrestore(&dev_priv->irq_lock, irqflags);

@@ -2796,6 +2799,8 @@ static void gen8_disable_vblank(struct
drm_device *dev, unsigned int pipe)
	spin_lock_irqsave(&dev_priv->irq_lock, irqflags);
	bdw_disable_pipe_irq(dev_priv, pipe, GEN8_PIPE_VBLANK);
	spin_unlock_irqrestore(&dev_priv->irq_lock, irqflags);
+
+   intel_display_power_put(dev_priv, POWER_DOMAIN_VBLANK);
 }

where POWER_DOMAIN_VBLANK is part of:
#define SKL_DISPLAY_DC_OFF_POWER_DOMAINS (  \
BIT(POWER_DOMAIN_VBLANK) |  \


However I have my dmesg flooded by:


[   69.025562] BUG: sleeping function called from invalid context at
drivers/base/power/runtime.c:955
[   69.025576] in_atomic(): 1, irqs_disabled(): 1, pid: 995, name: Xorg
[   69.025582] Preemption disabled at:[]
drm_vblank_get+0x4e/0xd0

[   69.025619] CPU: 3 PID: 995 Comm: Xorg Tainted: G U  W
4.5.0-rc4+ #11
[   69.025628] Hardware name: Intel Corporation Kabylake Client
platform/Skylake U DDR3L RVP7, BIOS KBLSE2R1.R00.X019.B01.1512230743
12/23/2015
[   69.025637]   88003f0bfbb0 8148e983

[   69.025653]  880085b04200 88003f0bfbd0 81133ece
81d77f23
[   69.025667]  03bb 88003f0bfbf8 81133f89
88016913a098
[   69.025680] Call Trace:
[   69.025697]  [] dump_stack+0x65/0x92
[   69.025711]  [] ___might_sleep+0x10e/0x180
[   69.025722]  [] __might_sleep+0x49/0x80
[   69.025739]  [] __pm_runtime_resume+0x79/0x80
[   69.025841]  [] intel_runtime_pm_get+0x28/0x90 [i915]
[   69.025924]  [] intel_display_power_get+0x19/0x50 [i915]
[   69.025995]  [] gen8_enable_vblank+0x34/0xc0 [i915]
[   69.026016]  [] drm_vblank_enable+0x76/0xd0




Another thing that I search in the spec was for an Interrupt to know
when we came back from DC5 or DC6 or got power well re-enabled, so we
would be able to restore the drm last counter... but I couldn't find
any...


Any other idea?


>
> Cheers, Daniel
>
>> ---
>>  drivers/gpu/drm/i915/i915_irq.c | 7 +--
>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_irq.c 
>> b/drivers/gpu/drm/i915/i915_irq.c
>> index 25a8937..c294a4b 100644
>> --- a/drivers/gpu/drm/i915/i915_irq.c
>> +++ 

[Intel-gfx] [maintainer-tools PATCH 2/8] dim: add list-branches subcommand to list nightly branches

2016-02-18 Thread Jani Nikula
Helper for bash completion. Where to get the information from depends on
the user's dim configuration.

Signed-off-by: Jani Nikula 
---
 dim | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/dim b/dim
index c004bc75ca06..33ef8288a291 100755
--- a/dim
+++ b/dim
@@ -972,6 +972,12 @@ function dim_pull_request_next_fixes
dim_pull_request drm-intel-next-fixes $upstream
 }
 
+# Note: used by bash completion
+function dim_list_branches
+{
+   echo $dim_branches | sed 's/ /\n/g'
+}
+
 dim_alias_ub=update-branches
 function dim_update_branches
 {
-- 
2.1.4

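The substitution in dim_list_branches above is easy to sanity-check outside dim. A minimal sketch follows; the value of $dim_branches is made up for illustration, and the `\n` in the sed replacement relies on GNU sed:

```shell
#!/bin/sh
# Hypothetical branch list, standing in for what dim assembles from nightly.conf
dim_branches="drm-intel-next-queued drm-intel-fixes drm-intel-next-fixes"

# Same pipeline as dim_list_branches: print one branch name per line
echo $dim_branches | sed 's/ /\n/g'
```

With GNU sed this prints the three branch names on separate lines; `tr ' ' '\n'` would be a portable alternative on systems where sed does not expand `\n` in the replacement.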


[Intel-gfx] [maintainer-tools PATCH 4/8] completion: use the dim helpers to complete nightly and upstream branches

2016-02-18 Thread Jani Nikula
Use the user's configured directories and remotes via dim.

Signed-off-by: Jani Nikula 
---
 bash_completion | 9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/bash_completion b/bash_completion
index 6a3a88cc80f8..f89764e3947d 100644
--- a/bash_completion
+++ b/bash_completion
@@ -27,13 +27,8 @@ _dim ()
# args = number of arguments
_count_args
 
-   if [ -f ~/linux/drm-intel-rerere/nightly.conf ] ; then
-   local nightly_branches=`(source ~/linux/drm-intel-rerere/nightly.conf ; echo $nightly_branches) | \
-   xargs -n 1 echo | grep '^origin' | sed -e 's/^origin\///'`
-   else
-   local nightly_branches=""
-   fi
-   local upstream_branches="origin/master airlied/drm-next airlied/drm-fixes"
+   local nightly_branches="$(dim list-branches)"
+   local upstream_branches="$(dim list-upstreams)"
 
cmds="setup nightly-forget update-branches"
cmds="$cmds rebuild-nightly cat-to-fixup"
-- 
2.1.4



[Intel-gfx] [maintainer-tools PATCH 6/8] dim: rename alias subcommand to list-aliases

2016-02-18 Thread Jani Nikula
Also drop the leading tab and fix underscores in the output. This is a
helper for bash completion.

Signed-off-by: Jani Nikula 
---
 dim | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/dim b/dim
index 2f6e6151a4b2..1addd6f6a0e9 100755
--- a/dim
+++ b/dim
@@ -1121,11 +1121,12 @@ function dim_list_commands
declare -F | grep -o " dim_[a-zA-Z_]*" | sed 's/^ dim_//;s/_/-/g'
 }
 
-function dim_alias
+# Note: used by bash completion
+function dim_list_aliases
 {
# use posix mode to omit functions in set output
( set -o posix; set ) | grep "^dim_alias_[a-zA-Z0-9_]*=" |\
-   sed 's/^dim_alias_/\t/;s/=/\t/'
+   sed 's/^dim_alias_//;s/=/\t/;s/_/-/g'
 }
 
 function dim_cat_to_fixup
-- 
2.1.4



[Intel-gfx] [maintainer-tools PATCH 5/8] dim: add list-commands subcommand to list all subcommands

2016-02-18 Thread Jani Nikula
Helper for completion.

Signed-off-by: Jani Nikula 
---
 dim | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/dim b/dim
index 6fb496ea4192..2f6e6151a4b2 100755
--- a/dim
+++ b/dim
@@ -1115,6 +1115,12 @@ function assert_branch
fi
 }
 
+# Note: used by bash completion
+function dim_list_commands
+{
+   declare -F | grep -o " dim_[a-zA-Z_]*" | sed 's/^ dim_//;s/_/-/g'
+}
+
 function dim_alias
 {
# use posix mode to omit functions in set output
@@ -1178,7 +1184,7 @@ function dim_usage
echo "usage: $0 [OPTIONS] SUBCOMMAND [ARGUMENTS]"
echo
echo "The available subcommands are:"
-   declare -F | grep -o " dim_[a-zA-Z_]*" | sed 's/^ dim_/\t/'
+   dim_list_commands | sed 's/^/\t/'
echo
echo "See '$0 help' for more information."
 }
-- 
2.1.4



[Intel-gfx] [maintainer-tools PATCH 3/8] dim: add list-upstreams subcommand to list upstream branches

2016-02-18 Thread Jani Nikula
Helper for bash completion. The result depends on the user's dim
configuration.

Signed-off-by: Jani Nikula 
---
 dim | 8 
 1 file changed, 8 insertions(+)

diff --git a/dim b/dim
index 33ef8288a291..6fb496ea4192 100755
--- a/dim
+++ b/dim
@@ -973,6 +973,14 @@ function dim_pull_request_next_fixes
 }
 
 # Note: used by bash completion
+function dim_list_upstreams
+{
+   echo origin/master
+   echo $DIM_DRM_UPSTREAM_REMOTE/drm-next
+   echo $DIM_DRM_UPSTREAM_REMOTE/drm-fixes
+}
+
+# Note: used by bash completion
 function dim_list_branches
 {
echo $dim_branches | sed 's/ /\n/g'
-- 
2.1.4



[Intel-gfx] [maintainer-tools PATCH 8/8] completion: complete aliases like the actual command

2016-02-18 Thread Jani Nikula
Map aliases to the actual commands. No need to know all the aliases.
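The lookup can be sketched as a small standalone shell snippet. The alias table below is a made-up stand-in for real `dim list-aliases` output (which prints tab-separated "alias command" pairs), so the names here are illustrative only:

```shell
# Hypothetical sketch of the alias lookup: list_aliases fakes the
# "alias<TAB>command" output, and resolve_alias maps a typed alias
# back to the command it stands for, as the completion code does.
list_aliases() {
    printf 'aq\tapply-queued\n'
    printf 'co\tcheckout\n'
}

resolve_alias() {
    local arg=$1
    local aliasref
    # pick the command whose alias matches the typed word, if any
    aliasref=$(list_aliases | sed -n "s/^${arg}\t\(.*\)/\1/p")
    # fall back to the name itself when it is not an alias
    echo "${aliasref:-$arg}"
}
```

With this shape, completion only needs one code path: resolve first, then complete the resolved command name.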

Signed-off-by: Jani Nikula 
---
 bash_completion | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/bash_completion b/bash_completion
index 4a9d981709a0..9f659b4ebcce 100644
--- a/bash_completion
+++ b/bash_completion
@@ -44,20 +44,26 @@ _dim ()
return 0
fi
 
+   # complete aliases like the actual command
+   local aliasref=$(dim list-aliases | sed -n "s/^${arg}\t\(.*\)/\1/p")
+   if [[ -n "$aliasref" ]]; then
+   arg="$aliasref"
+   fi
+
case "${arg}" in
push-branch)
COMPREPLY=( $( compgen -W "-f $nightly_branches" -- 
$cur ) )
;;
-   push-queued|pq|push-fixes|pf|push-next-fixes|pnf)
+   push-queued|push-fixes|push-next-fixes)
COMPREPLY=( $( compgen -W "-f" -- $cur ) )
;;
-   apply-branch|ab|sob)
+   apply-branch)
COMPREPLY=( $( compgen -W "-s $nightly_branches" -- 
$cur ) )
;;
-   apply-queued|aq|apply-fixes|af|apply-next-fixes|anf)
+   apply-queued|apply-fixes|apply-next-fixes)
COMPREPLY=( $( compgen -W "-s" -- $cur ) )
;;
-   magic-patch|mp)
+   magic-patch)
if [[ $args == 2 ]]; then
COMPREPLY=( $( compgen -o nospace -W "-a" -- 
$cur ) )
fi
@@ -65,7 +71,7 @@ _dim ()
tc|fixes)
# FIXME needs a git sha1
;;
-   check-patch|cp)
+   checkpatch)
# FIXME needs a git sha1
;;
pull-request)
@@ -85,7 +91,7 @@ _dim ()
COMPREPLY=( $( compgen -o nospace -W "drm- 
topic/" -- $cur ) )
fi
;;
-   checkout|co)
+   checkout)
if [[ $args == 2 ]]; then
COMPREPLY=( $( compgen -W "$nightly_branches" 
-- $cur ) )
fi
-- 
2.1.4



[Intel-gfx] [maintainer-tools PATCH 7/8] completion: use the dim helpers to complete subcommands and aliases

2016-02-18 Thread Jani Nikula
Autodiscover everything, including the user's configured aliases.

Signed-off-by: Jani Nikula 
---
 bash_completion | 16 +---
 1 file changed, 1 insertion(+), 15 deletions(-)

diff --git a/bash_completion b/bash_completion
index f89764e3947d..4a9d981709a0 100644
--- a/bash_completion
+++ b/bash_completion
@@ -12,7 +12,6 @@ dim ()
 _dim ()
 {
local args arg cur prev words cword split
-   local cmds
 
# require bash-completion with _init_completion
type -t _init_completion >/dev/null 2>&1 || return
@@ -30,20 +29,6 @@ _dim ()
local nightly_branches="$(dim list-branches)"
local upstream_branches="$(dim list-upstreams)"
 
-   cmds="setup nightly-forget update-branches"
-   cmds="$cmds rebuild-nightly cat-to-fixup"
-   cmds="$cmds push-queued pq push-fixes pf push-next-fixes pnf 
push-branch"
-   cmds="$cmds checkout co conq cof conf"
-   cmds="$cmds apply-branch ab sob apply-queued aq apply-fixes af 
apply-next-fixes anf"
-   cmds="$cmds magic-patch mp cd"
-   cmds="$cmds magic-rebase-resolve mrr"
-   cmds="$cmds apply-igt ai"
-   cmds="$cmds apply-resolved ar tc fixes check-patch cp cherry-pick"
-   cmds="$cmds pull-request pull-request-fixes pull-request-next 
pull-request-next-fixes"
-   cmds="$cmds update-next"
-   cmds="$cmds create-branch remove-branch create-workdir 
for-each-workdirs fw"
-   cmds="$cmds tag-next checker"
-
if [ -z "${arg}" ]; then
# top level completion
case "${cur}" in
@@ -52,6 +37,7 @@ _dim ()
COMPREPLY=( $(compgen -W "${opts}" -- ${cur}) )
;;
*)
+   local cmds="$(dim list-commands) $(dim 
list-aliases | sed 's/\t.*//')"
COMPREPLY=( $(compgen -W "${cmds}" -- ${cur}) )
;;
esac
-- 
2.1.4



[Intel-gfx] [maintainer-tools PATCH 1/8] completion: require bash completion package and use it

2016-02-18 Thread Jani Nikula
The bash completion package makes life a whole lot easier than using the
built-in bash completion features. It's quite likely that anyone using
completion in bash already has it installed.

Signed-off-by: Jani Nikula 
---
 bash_completion | 62 -
 1 file changed, 35 insertions(+), 27 deletions(-)

diff --git a/bash_completion b/bash_completion
index e44e5fc844b4..6a3a88cc80f8 100644
--- a/bash_completion
+++ b/bash_completion
@@ -11,7 +11,21 @@ dim ()
 
 _dim ()
 {
-   local cur cmds opts i
+   local args arg cur prev words cword split
+   local cmds
+
+   # require bash-completion with _init_completion
+   type -t _init_completion >/dev/null 2>&1 || return
+
+   _init_completion || return
+
+   COMPREPLY=()
+
+   # arg = subcommand
+   _get_first_arg
+
+   # args = number of arguments
+   _count_args
 
if [ -f ~/linux/drm-intel-rerere/nightly.conf ] ; then
local nightly_branches=`(source 
~/linux/drm-intel-rerere/nightly.conf ; echo $nightly_branches) | \
@@ -35,27 +49,21 @@ _dim ()
cmds="$cmds create-branch remove-branch create-workdir 
for-each-workdirs fw"
cmds="$cmds tag-next checker"
 
-   opts="-d -f -i"
-
-   i=1
-
-   COMPREPLY=()   # Array variable storing the possible completions.
-   cur=${COMP_WORDS[COMP_CWORD]}
-
-   for comp in "${COMP_WORDS[@]}" ; do
-   for opt in $opts ; do
-   if [[ $opt = $comp ]] ; then
-   i=$((i+1))
-   fi
-   done
-   done
-
-   if [[ $COMP_CWORD == "$i" ]] ; then
-   COMPREPLY=( $( compgen -W "$cmds $opts" -- $cur ) )
+   if [ -z "${arg}" ]; then
+   # top level completion
+   case "${cur}" in
+   -*)
+   local opts="-d -f -i"
+   COMPREPLY=( $(compgen -W "${opts}" -- ${cur}) )
+   ;;
+   *)
+   COMPREPLY=( $(compgen -W "${cmds}" -- ${cur}) )
+   ;;
+   esac
return 0
fi
 
-   case "${COMP_WORDS[i]}" in
+   case "${arg}" in
push-branch)
COMPREPLY=( $( compgen -W "-f $nightly_branches" -- 
$cur ) )
;;
@@ -69,7 +77,7 @@ _dim ()
COMPREPLY=( $( compgen -W "-s" -- $cur ) )
;;
magic-patch|mp)
-   if [[ $COMP_CWORD == "$((i+1))" ]] ; then
+   if [[ $args == 2 ]]; then
COMPREPLY=( $( compgen -o nospace -W "-a" -- 
$cur ) )
fi
;;
@@ -80,34 +88,34 @@ _dim ()
# FIXME needs a git sha1
;;
pull-request)
-   if [[ $COMP_CWORD == "$((i+1))" ]] ; then
+   if [[ $args == 2 ]]; then
COMPREPLY=( $( compgen -W "$nightly_branches" 
-- $cur ) )
-   elif [[ $COMP_CWORD == "$((i+2))" ]] ; then
+   elif [[ $args == 3 ]]; then
COMPREPLY=( $( compgen -W "$upstream_branches" 
-- $cur ) )
fi
;;
pull-request-next|pull-request-fixes|pull-request-next-fixes)
-   if [[ $COMP_CWORD == "$((i+1))" ]] ; then
+   if [[ $args == 2 ]]; then
COMPREPLY=( $( compgen -W "$upstream_branches" 
-- $cur ) )
fi
;;
create-branch)
-   if [[ $COMP_CWORD == "$((i+1))" ]] ; then
+   if [[ $args == 2 ]]; then
COMPREPLY=( $( compgen -o nospace -W "drm- 
topic/" -- $cur ) )
fi
;;
checkout|co)
-   if [[ $COMP_CWORD == "$((i+1))" ]] ; then
+   if [[ $args == 2 ]]; then
COMPREPLY=( $( compgen -W "$nightly_branches" 
-- $cur ) )
fi
;;
remove-branch)
-   if [[ $COMP_CWORD == "$((i+1))" ]] ; then
+   if [[ $args == 2 ]]; then
COMPREPLY=( $( compgen -W "$nightly_branches" 
-- $cur ) )
fi
;;
create-workdir)
-   if [[ $COMP_CWORD == "$((i+1))" ]] ; then
+   if [[ $args == 2 ]]; then
COMPREPLY=( $( compgen -W "$nightly_branches 
all" -- $cur ) )
fi
 

[Intel-gfx] [PATCH 2/4] drm/i915/gen9: Verify and enforce dc6 state writes

2016-02-18 Thread Mika Kuoppala
It has been observed that disabling dc6 sometimes fails and the dc6
state pops back up a brief moment after disabling. This has to be a
dmc save/restore timing issue or another bug in the way dc states
are handled.

Try to work around this issue as we don't have a firmware fix
available yet. Verify that the value we wrote for the dmc sticks,
and also enforce it by rewriting it if it didn't.

v2: Zero rereads on rewrite for extra paranoia (Imre)
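The retry pattern itself is independent of the hardware and can be modeled with a plain shell sketch. The "register" below is just a variable that returns a stale value for the first couple of reads; the numbers and names are illustrative, not driver code:

```shell
# Toy model of the verify-and-rewrite loop: write, re-read, and rewrite
# until the value sticks for several consecutive reads (or we give up).
reg=0            # stands in for DC_STATE_EN
flaky_reads=2    # the first N reads return a stale value

read_reg() {     # sets $v instead of echoing, to avoid a subshell
    if [ "$flaky_reads" -gt 0 ]; then
        flaky_reads=$((flaky_reads - 1))
        v=0      # stale value pops back up
    else
        v=$reg
    fi
}

write_dc_state() {
    local state=$1
    rewrites=0
    local rereads=0
    reg=$state                     # initial write
    while [ "$rewrites" -lt 100 ]; do
        read_reg
        if [ "$v" != "$state" ]; then
            reg=$state             # force a rewrite
            rewrites=$((rewrites + 1))
            rereads=0              # v2: restart the confirmation count
        elif [ "$rereads" -gt 5 ]; then
            break                  # value stuck for enough reads
        else
            rereads=$((rereads + 1))
        fi
    done
}

write_dc_state 3
```

With two stale reads, the loop rewrites twice and then confirms the value over several consecutive reads before giving the all-clear, which is exactly the shape of the workaround above.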

Testcase: kms_flip/basic-flip-vs-dpms
References: https://bugs.freedesktop.org/show_bug.cgi?id=93768
Cc: Patrik Jakobsson 
Cc: Rodrigo Vivi 
Cc: Imre Deak 
Signed-off-by: Mika Kuoppala 
Reviewed-by: Imre Deak 
---
 drivers/gpu/drm/i915/intel_runtime_pm.c | 41 +++--
 1 file changed, 39 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c 
b/drivers/gpu/drm/i915/intel_runtime_pm.c
index 8b9290fdb3b2..814cf5ac1ef0 100644
--- a/drivers/gpu/drm/i915/intel_runtime_pm.c
+++ b/drivers/gpu/drm/i915/intel_runtime_pm.c
@@ -470,6 +470,43 @@ static void gen9_set_dc_state_debugmask_memory_up(
}
 }
 
+static void gen9_write_dc_state(struct drm_i915_private *dev_priv,
+   u32 state)
+{
+   int rewrites = 0;
+   int rereads = 0;
+   u32 v;
+
+   I915_WRITE(DC_STATE_EN, state);
+
+   /* It has been observed that disabling the dc6 state sometimes
+* doesn't stick and dmc keeps returning old value. Make sure
+* the write really sticks enough times and also force rewrite until
+* we are confident that state is exactly what we want.
+*/
+   do  {
+   v = I915_READ(DC_STATE_EN);
+
+   if (v != state) {
+   I915_WRITE(DC_STATE_EN, state);
+   rewrites++;
+   rereads = 0;
+   } else if (rereads++ > 5) {
+   break;
+   }
+
+   } while (rewrites < 100);
+
+   if (v != state)
+   DRM_ERROR("Writing dc state to 0x%x failed, now 0x%x\n",
+ state, v);
+
+   /* Most of the times we need one retry, avoid spam */
+   if (rewrites > 1)
+   DRM_DEBUG_KMS("Rewrote dc state to 0x%x %d times\n",
+ state, rewrites);
+}
+
 static void gen9_set_dc_state(struct drm_i915_private *dev_priv, uint32_t 
state)
 {
uint32_t val;
@@ -502,8 +539,8 @@ static void gen9_set_dc_state(struct drm_i915_private 
*dev_priv, uint32_t state)
 
val &= ~mask;
val |= state;
-   I915_WRITE(DC_STATE_EN, val);
-   POSTING_READ(DC_STATE_EN);
+
+   gen9_write_dc_state(dev_priv, val);
 
dev_priv->csr.dc_state = val & mask;
 }
-- 
2.5.0



Re: [Intel-gfx] [PATCH 2/4] drm/i915/gen9: Verify and enforce dc6 state writes

2016-02-18 Thread Mika Kuoppala
Imre Deak  writes:

> On to, 2016-02-18 at 17:21 +0200, Mika Kuoppala wrote:
>> It has been observed that sometimes disabling the dc6 fails
>> and dc6 state pops back up, brief moment after disabling. This
>> has to be dmc save/restore timing issue or other bug in the
>> way dc states are handled.
>> 
>> Try to work around this issue as we don't have firmware fix
>> yet available. Verify that the value we wrote for the dmc sticks,
>> and also enforce it by rewriting it, if it didn't.
>> 
>> Testcase: kms_flip/basic-flip-vs-dpms
>> References: https://bugs.freedesktop.org/show_bug.cgi?id=93768
>> Cc: Patrik Jakobsson 
>> Cc: Rodrigo Vivi 
>> Cc: Imre Deak 
>> Signed-off-by: Mika Kuoppala 
>> ---
>>  drivers/gpu/drm/i915/intel_runtime_pm.c | 40
>> +++--
>>  1 file changed, 38 insertions(+), 2 deletions(-)
>> 
>> diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c
>> b/drivers/gpu/drm/i915/intel_runtime_pm.c
>> index 8b9290fdb3b2..cb91540cfbad 100644
>> --- a/drivers/gpu/drm/i915/intel_runtime_pm.c
>> +++ b/drivers/gpu/drm/i915/intel_runtime_pm.c
>> @@ -470,6 +470,42 @@ static void
>> gen9_set_dc_state_debugmask_memory_up(
>>  }
>>  }
>>  
>> +static void gen9_write_dc_state(struct drm_i915_private *dev_priv,
>> +u32 state)
>> +{
>> +int rewrites = 0;
>> +int rereads = 0;
>> +u32 v;
>> +
>> +I915_WRITE(DC_STATE_EN, state);
>> +
>> +/* It has been observed that disabling the dc6 state
>> sometimes
>> + * doesn't stick and dmc keeps returning old value. Make
>> sure
>> + * the write really sticks enough times and also force
>> rewrite until
>> + * we are confident that state is exactly what we want.
>> + */
>> +do  {
>> +v = I915_READ(DC_STATE_EN);
>> +
>> +if (v != state) {
>> +I915_WRITE(DC_STATE_EN, state);
>> +rewrites++;
>
> Could be rereads = 0; for extra paranoia. Either way:

Oh yes, extra paranoia in here is warranted. I will
add that.

> Reviewed-by: Imre Deak 

Thanks,
-Mika

>
>> +} else if (rereads++ > 5) {
>> +break;
>> +}
>> +
>> +} while (rewrites < 100);
>> +
>> +if (v != state)
>> +DRM_ERROR("Writing dc state to 0x%x failed, now
>> 0x%x\n",
>> +  state, v);
>> +
>> +/* Most of the times we need one retry, avoid spam */
>> +if (rewrites > 1)
>> +DRM_DEBUG_KMS("Rewrote dc state to 0x%x %d times\n",
>> +  state, rewrites);
>> +}
>> +
>>  static void gen9_set_dc_state(struct drm_i915_private *dev_priv,
>> uint32_t state)
>>  {
>>  uint32_t val;
>> @@ -502,8 +538,8 @@ static void gen9_set_dc_state(struct
>> drm_i915_private *dev_priv, uint32_t state)
>>  
>>  val &= ~mask;
>>  val |= state;
>> -I915_WRITE(DC_STATE_EN, val);
>> -POSTING_READ(DC_STATE_EN);
>> +
>> +gen9_write_dc_state(dev_priv, val);
>>  
>>  dev_priv->csr.dc_state = val & mask;
>>  }


Re: [Intel-gfx] [PATCH 4/4] drm/i915/gen9: Write dc state debugmask bits only once

2016-02-18 Thread Imre Deak
On to, 2016-02-18 at 17:21 +0200, Mika Kuoppala wrote:
> DMC debugmask bits should stick so no need to write them
> everytime dc state is changed.
> 
> v2: Write after firmware has been successfully loaded (Ville)
> 
> Signed-off-by: Mika Kuoppala 

Reviewed-by: Imre Deak 

> ---
>  drivers/gpu/drm/i915/intel_csr.c| 8 +---
>  drivers/gpu/drm/i915/intel_drv.h| 2 +-
>  drivers/gpu/drm/i915/intel_runtime_pm.c | 7 ++-
>  3 files changed, 8 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_csr.c
> b/drivers/gpu/drm/i915/intel_csr.c
> index b453fccfa25d..902054efb902 100644
> --- a/drivers/gpu/drm/i915/intel_csr.c
> +++ b/drivers/gpu/drm/i915/intel_csr.c
> @@ -220,19 +220,19 @@ static const struct stepping_info
> *intel_get_stepping_info(struct drm_device *de
>   * Everytime display comes back from low power state this function
> is called to
>   * copy the firmware from internal memory to registers.
>   */
> -void intel_csr_load_program(struct drm_i915_private *dev_priv)
> +bool intel_csr_load_program(struct drm_i915_private *dev_priv)
>  {
>   u32 *payload = dev_priv->csr.dmc_payload;
>   uint32_t i, fw_size;
>  
>   if (!IS_GEN9(dev_priv)) {
>   DRM_ERROR("No CSR support available for this
> platform\n");
> - return;
> + return false;
>   }
>  
>   if (!dev_priv->csr.dmc_payload) {
>   DRM_ERROR("Tried to program CSR with empty
> payload\n");
> - return;
> + return false;
>   }
>  
>   fw_size = dev_priv->csr.dmc_fw_size;
> @@ -245,6 +245,8 @@ void intel_csr_load_program(struct
> drm_i915_private *dev_priv)
>   }
>  
>   dev_priv->csr.dc_state = 0;
> +
> + return true;
>  }
>  
>  static uint32_t *parse_csr_fw(struct drm_i915_private *dev_priv,
> diff --git a/drivers/gpu/drm/i915/intel_drv.h
> b/drivers/gpu/drm/i915/intel_drv.h
> index 285b0570be9c..c208ca630e99 100644
> --- a/drivers/gpu/drm/i915/intel_drv.h
> +++ b/drivers/gpu/drm/i915/intel_drv.h
> @@ -1225,7 +1225,7 @@ u32 skl_plane_ctl_rotation(unsigned int
> rotation);
>  
>  /* intel_csr.c */
>  void intel_csr_ucode_init(struct drm_i915_private *);
> -void intel_csr_load_program(struct drm_i915_private *);
> +bool intel_csr_load_program(struct drm_i915_private *);
>  void intel_csr_ucode_fini(struct drm_i915_private *);
>  
>  /* intel_dp.c */
> diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c
> b/drivers/gpu/drm/i915/intel_runtime_pm.c
> index 1b490c7e4020..7f0577ca900e 100644
> --- a/drivers/gpu/drm/i915/intel_runtime_pm.c
> +++ b/drivers/gpu/drm/i915/intel_runtime_pm.c
> @@ -526,9 +526,6 @@ static void gen9_set_dc_state(struct
> drm_i915_private *dev_priv, uint32_t state)
>   else if (i915.enable_dc == 1 && state >
> DC_STATE_EN_UPTO_DC5)
>   state = DC_STATE_EN_UPTO_DC5;
>  
> - if (state & DC_STATE_EN_UPTO_DC5_DC6_MASK)
> - gen9_set_dc_state_debugmask(dev_priv);
> -
>   val = I915_READ(DC_STATE_EN);
>   DRM_DEBUG_KMS("Setting DC state from %02x to %02x\n",
>     val & mask, state);
> @@ -2119,8 +2116,8 @@ static void skl_display_core_init(struct
> drm_i915_private *dev_priv,
>  
>   skl_init_cdclk(dev_priv);
>  
> - if (dev_priv->csr.dmc_payload)
> - intel_csr_load_program(dev_priv);
> + if (dev_priv->csr.dmc_payload &&
> intel_csr_load_program(dev_priv))
> + gen9_set_dc_state_debugmask(dev_priv);
>  }
>  
>  static void skl_display_core_uninit(struct drm_i915_private
> *dev_priv)


Re: [Intel-gfx] [PATCH 2/4] drm/i915/gen9: Verify and enforce dc6 state writes

2016-02-18 Thread Imre Deak
On to, 2016-02-18 at 17:21 +0200, Mika Kuoppala wrote:
> It has been observed that sometimes disabling the dc6 fails
> and dc6 state pops back up, brief moment after disabling. This
> has to be dmc save/restore timing issue or other bug in the
> way dc states are handled.
> 
> Try to work around this issue as we don't have firmware fix
> yet available. Verify that the value we wrote for the dmc sticks,
> and also enforce it by rewriting it, if it didn't.
> 
> Testcase: kms_flip/basic-flip-vs-dpms
> References: https://bugs.freedesktop.org/show_bug.cgi?id=93768
> Cc: Patrik Jakobsson 
> Cc: Rodrigo Vivi 
> Cc: Imre Deak 
> Signed-off-by: Mika Kuoppala 
> ---
>  drivers/gpu/drm/i915/intel_runtime_pm.c | 40
> +++--
>  1 file changed, 38 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c
> b/drivers/gpu/drm/i915/intel_runtime_pm.c
> index 8b9290fdb3b2..cb91540cfbad 100644
> --- a/drivers/gpu/drm/i915/intel_runtime_pm.c
> +++ b/drivers/gpu/drm/i915/intel_runtime_pm.c
> @@ -470,6 +470,42 @@ static void
> gen9_set_dc_state_debugmask_memory_up(
>   }
>  }
>  
> +static void gen9_write_dc_state(struct drm_i915_private *dev_priv,
> + u32 state)
> +{
> + int rewrites = 0;
> + int rereads = 0;
> + u32 v;
> +
> + I915_WRITE(DC_STATE_EN, state);
> +
> + /* It has been observed that disabling the dc6 state
> sometimes
> +  * doesn't stick and dmc keeps returning old value. Make
> sure
> +  * the write really sticks enough times and also force
> rewrite until
> +  * we are confident that state is exactly what we want.
> +  */
> + do  {
> + v = I915_READ(DC_STATE_EN);
> +
> + if (v != state) {
> + I915_WRITE(DC_STATE_EN, state);
> + rewrites++;

Could be rereads = 0; for extra paranoia. Either way:
Reviewed-by: Imre Deak 

> + } else if (rereads++ > 5) {
> + break;
> + }
> +
> + } while (rewrites < 100);
> +
> + if (v != state)
> + DRM_ERROR("Writing dc state to 0x%x failed, now
> 0x%x\n",
> +   state, v);
> +
> + /* Most of the times we need one retry, avoid spam */
> + if (rewrites > 1)
> + DRM_DEBUG_KMS("Rewrote dc state to 0x%x %d times\n",
> +   state, rewrites);
> +}
> +
>  static void gen9_set_dc_state(struct drm_i915_private *dev_priv,
> uint32_t state)
>  {
>   uint32_t val;
> @@ -502,8 +538,8 @@ static void gen9_set_dc_state(struct
> drm_i915_private *dev_priv, uint32_t state)
>  
>   val &= ~mask;
>   val |= state;
> - I915_WRITE(DC_STATE_EN, val);
> - POSTING_READ(DC_STATE_EN);
> +
> + gen9_write_dc_state(dev_priv, val);
>  
>   dev_priv->csr.dc_state = val & mask;
>  }


Re: [Intel-gfx] [PATCH 3/4] drm/i915/gen9: Extend dmc debug mask to include cores

2016-02-18 Thread Imre Deak
On to, 2016-02-18 at 17:21 +0200, Mika Kuoppala wrote:
> Cores need to be included into the debug mask. We don't exactly
> know what it does but the spec says it must be enabled. So obey.
> 
> Signed-off-by: Mika Kuoppala 
> ---
>  drivers/gpu/drm/i915/i915_reg.h |  1 +
>  drivers/gpu/drm/i915/intel_runtime_pm.c | 14 --
>  2 files changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_reg.h
> b/drivers/gpu/drm/i915/i915_reg.h
> index 3774870477c1..f76cbf3e5d1e 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -7568,6 +7568,7 @@ enum skl_disp_power_wells {
>  #define  DC_STATE_EN_UPTO_DC5_DC6_MASK   0x3
>  
>  #define  DC_STATE_DEBUG  _MMIO(0x45520)
> +#define  DC_STATE_DEBUG_MASK_CORES   (1<<0)
>  #define  DC_STATE_DEBUG_MASK_MEMORY_UP   (1<<1)
>  
>  /* Please see hsw_read_dcomp() and hsw_write_dcomp() before using
> this register,
> diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c
> b/drivers/gpu/drm/i915/intel_runtime_pm.c
> index cb91540cfbad..1b490c7e4020 100644
> --- a/drivers/gpu/drm/i915/intel_runtime_pm.c
> +++ b/drivers/gpu/drm/i915/intel_runtime_pm.c
> @@ -456,15 +456,17 @@ static void assert_can_disable_dc9(struct
> drm_i915_private *dev_priv)
>     */
>  }
>  
> -static void gen9_set_dc_state_debugmask_memory_up(
> - struct drm_i915_private *dev_priv)
> +static void gen9_set_dc_state_debugmask(struct drm_i915_private
> *dev_priv)
>  {
> - uint32_t val;
> + uint32_t val, mask;
> +
> + mask = DC_STATE_DEBUG_MASK_MEMORY_UP |
> + DC_STATE_DEBUG_MASK_CORES;

The BSpec "Sequence to Allow DC5 or DC6" requires this only for BXT
(looks like a recent addition to work around something), but it doesn't
say it's needed for other platforms. The register description doesn't
make a difference though.

Perhaps Art has more info on this, adding him.

>  
>   /* The below bit doesn't need to be cleared ever afterwards
> */
>   val = I915_READ(DC_STATE_DEBUG);
> - if (!(val & DC_STATE_DEBUG_MASK_MEMORY_UP)) {
> - val |= DC_STATE_DEBUG_MASK_MEMORY_UP;
> + if ((val & mask) != mask) {
> + val |= mask;
>   I915_WRITE(DC_STATE_DEBUG, val);
>   POSTING_READ(DC_STATE_DEBUG);
>   }
> @@ -525,7 +527,7 @@ static void gen9_set_dc_state(struct
> drm_i915_private *dev_priv, uint32_t state)
>   state = DC_STATE_EN_UPTO_DC5;
>  
>   if (state & DC_STATE_EN_UPTO_DC5_DC6_MASK)
> - gen9_set_dc_state_debugmask_memory_up(dev_priv);
> + gen9_set_dc_state_debugmask(dev_priv);
>  
>   val = I915_READ(DC_STATE_EN);
>   DRM_DEBUG_KMS("Setting DC state from %02x to %02x\n",


[Intel-gfx] [PATCH 4/4] drm/i915/gen9: Write dc state debugmask bits only once

2016-02-18 Thread Mika Kuoppala
DMC debugmask bits should stick, so there is no need to write them
every time the dc state is changed.

v2: Write after firmware has been successfully loaded (Ville)

Signed-off-by: Mika Kuoppala 
---
 drivers/gpu/drm/i915/intel_csr.c| 8 +---
 drivers/gpu/drm/i915/intel_drv.h| 2 +-
 drivers/gpu/drm/i915/intel_runtime_pm.c | 7 ++-
 3 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_csr.c b/drivers/gpu/drm/i915/intel_csr.c
index b453fccfa25d..902054efb902 100644
--- a/drivers/gpu/drm/i915/intel_csr.c
+++ b/drivers/gpu/drm/i915/intel_csr.c
@@ -220,19 +220,19 @@ static const struct stepping_info 
*intel_get_stepping_info(struct drm_device *de
  * Everytime display comes back from low power state this function is called to
  * copy the firmware from internal memory to registers.
  */
-void intel_csr_load_program(struct drm_i915_private *dev_priv)
+bool intel_csr_load_program(struct drm_i915_private *dev_priv)
 {
u32 *payload = dev_priv->csr.dmc_payload;
uint32_t i, fw_size;
 
if (!IS_GEN9(dev_priv)) {
DRM_ERROR("No CSR support available for this platform\n");
-   return;
+   return false;
}
 
if (!dev_priv->csr.dmc_payload) {
DRM_ERROR("Tried to program CSR with empty payload\n");
-   return;
+   return false;
}
 
fw_size = dev_priv->csr.dmc_fw_size;
@@ -245,6 +245,8 @@ void intel_csr_load_program(struct drm_i915_private 
*dev_priv)
}
 
dev_priv->csr.dc_state = 0;
+
+   return true;
 }
 
 static uint32_t *parse_csr_fw(struct drm_i915_private *dev_priv,
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 285b0570be9c..c208ca630e99 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -1225,7 +1225,7 @@ u32 skl_plane_ctl_rotation(unsigned int rotation);
 
 /* intel_csr.c */
 void intel_csr_ucode_init(struct drm_i915_private *);
-void intel_csr_load_program(struct drm_i915_private *);
+bool intel_csr_load_program(struct drm_i915_private *);
 void intel_csr_ucode_fini(struct drm_i915_private *);
 
 /* intel_dp.c */
diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c 
b/drivers/gpu/drm/i915/intel_runtime_pm.c
index 1b490c7e4020..7f0577ca900e 100644
--- a/drivers/gpu/drm/i915/intel_runtime_pm.c
+++ b/drivers/gpu/drm/i915/intel_runtime_pm.c
@@ -526,9 +526,6 @@ static void gen9_set_dc_state(struct drm_i915_private 
*dev_priv, uint32_t state)
else if (i915.enable_dc == 1 && state > DC_STATE_EN_UPTO_DC5)
state = DC_STATE_EN_UPTO_DC5;
 
-   if (state & DC_STATE_EN_UPTO_DC5_DC6_MASK)
-   gen9_set_dc_state_debugmask(dev_priv);
-
val = I915_READ(DC_STATE_EN);
DRM_DEBUG_KMS("Setting DC state from %02x to %02x\n",
  val & mask, state);
@@ -2119,8 +2116,8 @@ static void skl_display_core_init(struct drm_i915_private 
*dev_priv,
 
skl_init_cdclk(dev_priv);
 
-   if (dev_priv->csr.dmc_payload)
-   intel_csr_load_program(dev_priv);
+   if (dev_priv->csr.dmc_payload && intel_csr_load_program(dev_priv))
+   gen9_set_dc_state_debugmask(dev_priv);
 }
 
 static void skl_display_core_uninit(struct drm_i915_private *dev_priv)
-- 
2.5.0



[Intel-gfx] [PATCH 3/4] drm/i915/gen9: Extend dmc debug mask to include cores

2016-02-18 Thread Mika Kuoppala
Cores need to be included in the debug mask. We don't know exactly
what it does, but the spec says it must be enabled. So obey.
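The write-once guard in the patch's gen9_set_dc_state_debugmask() boils down to a read-modify-write that skips the register write when all required mask bits are already set. A toy shell model of that guard (variables standing in for MMIO, values illustrative):

```shell
# Toy read-modify-write guard: only touch the "register" when some
# required mask bit is still clear, and count the writes to show it.
reg=0
writes=0

set_debugmask() {
    local mask=$1
    local val=$reg
    if [ $((val & mask)) -ne "$mask" ]; then
        reg=$((val | mask))
        writes=$((writes + 1))
    fi
}

set_debugmask 3   # e.g. MEMORY_UP | CORES: bits clear, so one write
set_debugmask 3   # bits already set, no write happens
```

The second call is a no-op, which is why the bits "don't need to be cleared ever afterwards" and can later be written just once after firmware load.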

Signed-off-by: Mika Kuoppala 
---
 drivers/gpu/drm/i915/i915_reg.h |  1 +
 drivers/gpu/drm/i915/intel_runtime_pm.c | 14 --
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 3774870477c1..f76cbf3e5d1e 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -7568,6 +7568,7 @@ enum skl_disp_power_wells {
 #define  DC_STATE_EN_UPTO_DC5_DC6_MASK   0x3
 
 #define  DC_STATE_DEBUG  _MMIO(0x45520)
+#define  DC_STATE_DEBUG_MASK_CORES (1<<0)
 #define  DC_STATE_DEBUG_MASK_MEMORY_UP (1<<1)
 
 /* Please see hsw_read_dcomp() and hsw_write_dcomp() before using this 
register,
diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c 
b/drivers/gpu/drm/i915/intel_runtime_pm.c
index cb91540cfbad..1b490c7e4020 100644
--- a/drivers/gpu/drm/i915/intel_runtime_pm.c
+++ b/drivers/gpu/drm/i915/intel_runtime_pm.c
@@ -456,15 +456,17 @@ static void assert_can_disable_dc9(struct 
drm_i915_private *dev_priv)
  */
 }
 
-static void gen9_set_dc_state_debugmask_memory_up(
-   struct drm_i915_private *dev_priv)
+static void gen9_set_dc_state_debugmask(struct drm_i915_private *dev_priv)
 {
-   uint32_t val;
+   uint32_t val, mask;
+
+   mask = DC_STATE_DEBUG_MASK_MEMORY_UP |
+   DC_STATE_DEBUG_MASK_CORES;
 
/* The below bit doesn't need to be cleared ever afterwards */
val = I915_READ(DC_STATE_DEBUG);
-   if (!(val & DC_STATE_DEBUG_MASK_MEMORY_UP)) {
-   val |= DC_STATE_DEBUG_MASK_MEMORY_UP;
+   if ((val & mask) != mask) {
+   val |= mask;
I915_WRITE(DC_STATE_DEBUG, val);
POSTING_READ(DC_STATE_DEBUG);
}
@@ -525,7 +527,7 @@ static void gen9_set_dc_state(struct drm_i915_private 
*dev_priv, uint32_t state)
state = DC_STATE_EN_UPTO_DC5;
 
if (state & DC_STATE_EN_UPTO_DC5_DC6_MASK)
-   gen9_set_dc_state_debugmask_memory_up(dev_priv);
+   gen9_set_dc_state_debugmask(dev_priv);
 
val = I915_READ(DC_STATE_EN);
DRM_DEBUG_KMS("Setting DC state from %02x to %02x\n",
-- 
2.5.0



[Intel-gfx] [PATCH 1/4] drm/i915/gen9: Check for DC state mismatch

2016-02-18 Thread Mika Kuoppala
From: Patrik Jakobsson 

The DMC can incorrectly run off and allow DC states on its own. We
don't know the root cause for this yet, but this patch makes it more
visible.
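The check amounts to keeping a software shadow of the last programmed DC bits and comparing it against the value read back on the next transition. A shell sketch of that idea (simulated register, hypothetical mask value):

```shell
# Shadow-state sketch: remember the DC bits we last programmed and
# flag a mismatch when the readback no longer agrees, i.e. the "DMC"
# changed them behind our back.
reg=0       # simulated DC_STATE_EN
shadow=0    # last programmed bits (csr.dc_state equivalent)
mask=3      # stands in for DC_STATE_EN_UPTO_DC5_DC6_MASK

set_dc_state() {
    local state=$1
    local val=$reg
    mismatch=""
    if [ $((val & mask)) -ne "$shadow" ]; then
        mismatch="DC state mismatch ($shadow -> $((val & mask)))"
    fi
    val=$(( (val & ~mask) | state ))
    reg=$val
    shadow=$((val & mask))
}

set_dc_state 2      # normal transition, no mismatch
reg=0               # simulate the DMC clearing the state on its own
set_dc_state 2      # now the readback disagrees with the shadow
```

Nothing is fixed by the check itself; it only surfaces the rogue transitions so later workarounds (like the verify-and-rewrite loop) have something to key off.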

Reviewed-by: Mika Kuoppala 
Signed-off-by: Patrik Jakobsson 
---
 drivers/gpu/drm/i915/i915_drv.h | 1 +
 drivers/gpu/drm/i915/intel_csr.c| 2 ++
 drivers/gpu/drm/i915/intel_runtime_pm.c | 8 
 3 files changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 6644c2e354c1..9cbcb5d80b3c 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -746,6 +746,7 @@ struct intel_csr {
uint32_t mmio_count;
i915_reg_t mmioaddr[8];
uint32_t mmiodata[8];
+   uint32_t dc_state;
 };
 
 #define DEV_INFO_FOR_EACH_FLAG(func, sep) \
diff --git a/drivers/gpu/drm/i915/intel_csr.c b/drivers/gpu/drm/i915/intel_csr.c
index 2a7ec3141c8d..b453fccfa25d 100644
--- a/drivers/gpu/drm/i915/intel_csr.c
+++ b/drivers/gpu/drm/i915/intel_csr.c
@@ -243,6 +243,8 @@ void intel_csr_load_program(struct drm_i915_private 
*dev_priv)
I915_WRITE(dev_priv->csr.mmioaddr[i],
   dev_priv->csr.mmiodata[i]);
}
+
+   dev_priv->csr.dc_state = 0;
 }
 
 static uint32_t *parse_csr_fw(struct drm_i915_private *dev_priv,
diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c 
b/drivers/gpu/drm/i915/intel_runtime_pm.c
index a2e367cf99a2..8b9290fdb3b2 100644
--- a/drivers/gpu/drm/i915/intel_runtime_pm.c
+++ b/drivers/gpu/drm/i915/intel_runtime_pm.c
@@ -494,10 +494,18 @@ static void gen9_set_dc_state(struct drm_i915_private 
*dev_priv, uint32_t state)
val = I915_READ(DC_STATE_EN);
DRM_DEBUG_KMS("Setting DC state from %02x to %02x\n",
  val & mask, state);
+
+   /* Check if DMC is ignoring our DC state requests */
+   if ((val & mask) != dev_priv->csr.dc_state)
+   DRM_ERROR("DC state mismatch (0x%x -> 0x%x)\n",
+ dev_priv->csr.dc_state, val & mask);
+
val &= ~mask;
val |= state;
I915_WRITE(DC_STATE_EN, val);
POSTING_READ(DC_STATE_EN);
+
+   dev_priv->csr.dc_state = val & mask;
 }
 
 void bxt_enable_dc9(struct drm_i915_private *dev_priv)
-- 
2.5.0



[Intel-gfx] [PATCH 2/4] drm/i915/gen9: Verify and enforce dc6 state writes

2016-02-18 Thread Mika Kuoppala
It has been observed that disabling dc6 sometimes fails and the dc6
state pops back up a brief moment after disabling. This has to be a
DMC save/restore timing issue or some other bug in the way DC states
are handled.

Try to work around this issue as we don't have a firmware fix
available yet. Verify that the value we wrote for the DMC sticks,
and enforce it by rewriting it if it didn't.

Testcase: kms_flip/basic-flip-vs-dpms
References: https://bugs.freedesktop.org/show_bug.cgi?id=93768
Cc: Patrik Jakobsson 
Cc: Rodrigo Vivi 
Cc: Imre Deak 
Signed-off-by: Mika Kuoppala 
---
 drivers/gpu/drm/i915/intel_runtime_pm.c | 40 +++--
 1 file changed, 38 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c 
b/drivers/gpu/drm/i915/intel_runtime_pm.c
index 8b9290fdb3b2..cb91540cfbad 100644
--- a/drivers/gpu/drm/i915/intel_runtime_pm.c
+++ b/drivers/gpu/drm/i915/intel_runtime_pm.c
@@ -470,6 +470,42 @@ static void gen9_set_dc_state_debugmask_memory_up(
}
 }
 
+static void gen9_write_dc_state(struct drm_i915_private *dev_priv,
+   u32 state)
+{
+   int rewrites = 0;
+   int rereads = 0;
+   u32 v;
+
+   I915_WRITE(DC_STATE_EN, state);
+
+   /* It has been observed that disabling the dc6 state sometimes
+* doesn't stick and dmc keeps returning old value. Make sure
+* the write really sticks enough times and also force rewrite until
+* we are confident that state is exactly what we want.
+*/
+   do  {
+   v = I915_READ(DC_STATE_EN);
+
+   if (v != state) {
+   I915_WRITE(DC_STATE_EN, state);
+   rewrites++;
+   } else if (rereads++ > 5) {
+   break;
+   }
+
+   } while (rewrites < 100);
+
+   if (v != state)
+   DRM_ERROR("Writing dc state to 0x%x failed, now 0x%x\n",
+ state, v);
+
+   /* Most of the times we need one retry, avoid spam */
+   if (rewrites > 1)
+   DRM_DEBUG_KMS("Rewrote dc state to 0x%x %d times\n",
+ state, rewrites);
+}
+
 static void gen9_set_dc_state(struct drm_i915_private *dev_priv, uint32_t 
state)
 {
uint32_t val;
@@ -502,8 +538,8 @@ static void gen9_set_dc_state(struct drm_i915_private 
*dev_priv, uint32_t state)
 
val &= ~mask;
val |= state;
-   I915_WRITE(DC_STATE_EN, val);
-   POSTING_READ(DC_STATE_EN);
+
+   gen9_write_dc_state(dev_priv, val);
 
dev_priv->csr.dc_state = val & mask;
 }
-- 
2.5.0



[Intel-gfx] [PATCH 0/4] gen9 dmc state hardening

2016-02-18 Thread Mika Kuoppala
There have been problems with losing state sync between the DMC
and the driver. I believe the interplay of racy hw access via
intel_display_power_is_enabled() with overlapping reprogramming of the
allowed DC states (DC_STATE_EN) made the DMC very confused.

Imre has now gotten rid of the troublesome intel_display_power_is_enabled().
In my tests, that is a prerequisite for keeping the DMC healthy. But as we
can see from CI/BAT, it is still not enough: sometimes the write still
doesn't stick. So here are the DMC state tracking patches.

With these on top of Imre's patches, I have been able to make skl/dmc (v1.23)
symptom free with respect to DC state keeping, with the exception that we
sometimes still need to write DC_STATE_EN twice. The runaway situation of the
DMC not obeying the write, getting the flip stuck and eventually killing the
GPU is gone.

Thanks,
-Mika

Mika Kuoppala (3):
  drm/i915/gen9: Verify and enforce dc6 state writes
  drm/i915/gen9: Extend dmc debug mask to include cores
  drm/i915/gen9: Write dc state debugmask bits only once

Patrik Jakobsson (1):
  drm/i915/gen9: Check for DC state mismatch

 drivers/gpu/drm/i915/i915_drv.h |  1 +
 drivers/gpu/drm/i915/i915_reg.h |  1 +
 drivers/gpu/drm/i915/intel_csr.c| 10 +++--
 drivers/gpu/drm/i915/intel_drv.h|  2 +-
 drivers/gpu/drm/i915/intel_runtime_pm.c | 67 +++--
 5 files changed, 65 insertions(+), 16 deletions(-)

-- 
2.5.0



Re: [Intel-gfx] [PATCH v6 0/7] Convert requests to use struct fence

2016-02-18 Thread Chris Wilson
On Thu, Feb 18, 2016 at 02:24:03PM +, john.c.harri...@intel.com wrote:
> From: John Harrison 

Does this pass igt? If so, which are the bug fixes for the current
regressions from the request conversion?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


Re: [Intel-gfx] [PATCH v6 3/7] drm/i915: Add per context timelines to fence object

2016-02-18 Thread Chris Wilson
On Thu, Feb 18, 2016 at 02:24:06PM +, john.c.harri...@intel.com wrote:
> From: John Harrison 
> 
> The fence object used inside the request structure requires a sequence
> number. Although this is not used by the i915 driver itself, it could
> potentially be used by non-i915 code if the fence is passed outside of
> the driver. This is the intention as it allows external kernel drivers
> and user applications to wait on batch buffer completion
> asynchronously via the dma-buf fence API.
> 
> To ensure that such external users are not confused by strange things
> happening with the seqno, this patch adds in a per context timeline
> that can provide a guaranteed in-order seqno value for the fence. This
> is safe because the scheduler will not re-order batch buffers within a
> context - they are considered to be mutually dependent.

This is still nonsense. Just implement per-context seqno.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


Re: [Intel-gfx] [PATCH v6 4/7] drm/i915: Delay the freeing of requests until retire time

2016-02-18 Thread Chris Wilson
On Thu, Feb 18, 2016 at 02:24:07PM +, john.c.harri...@intel.com wrote:
> From: John Harrison 

As I said, and have shown in patches several months ago, just fix the
underlying bug to remove the struct_mutex requirement for freeing the
request.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


Re: [Intel-gfx] [PATCH v4 3/8] drm/i915: Kill off intel_crtc->atomic.wait_vblank, v4.

2016-02-18 Thread Maarten Lankhorst
Op 18-02-16 om 15:14 schreef Zanoni, Paulo R:
> Em Qui, 2016-02-18 às 14:22 +0100, Maarten Lankhorst escreveu:
>> Op 17-02-16 om 22:20 schreef Zanoni, Paulo R:
>>> Em Qua, 2016-02-10 às 13:49 +0100, Maarten Lankhorst escreveu:
 Currently we perform our own wait in post_plane_update,
 but the atomic core performs another one in wait_for_vblanks.
 This means that 2 vblanks are done when a fb is changed,
 which is a bit overkill.

 Merge them by creating a helper function that takes a crtc mask
 for the planes to wait on.

 The broadwell vblank workaround may look gone entirely but this
 is
 not the case. pipe_config->wm_changed is set to true
 when any plane is turned on, which forces a vblank wait.

 Changes since v1:
 - Removing the double vblank wait on broadwell moved to its own
 commit.
 Changes since v2:
 - Move out POWER_DOMAIN_MODESET handling to its own commit.
 Changes since v3:
 - Do not wait for vblank on legacy cursor updates. (Ville)
 - Move broadwell vblank workaround comment to page_flip_finished.
 (Ville)
 Changes since v4:
 - Compile fix, legacy_cursor_flip -> *_update.

 Signed-off-by: Maarten Lankhorst 
 ---
  drivers/gpu/drm/i915/intel_atomic.c  |  1 +
  drivers/gpu/drm/i915/intel_display.c | 86
 +++-
  drivers/gpu/drm/i915/intel_drv.h |  2 +-
  3 files changed, 67 insertions(+), 22 deletions(-)

 diff --git a/drivers/gpu/drm/i915/intel_atomic.c
 b/drivers/gpu/drm/i915/intel_atomic.c
 index 4625f8a9ba12..8e579a8505ac 100644
 --- a/drivers/gpu/drm/i915/intel_atomic.c
 +++ b/drivers/gpu/drm/i915/intel_atomic.c
 @@ -97,6 +97,7 @@ intel_crtc_duplicate_state(struct drm_crtc
 *crtc)
crtc_state->disable_lp_wm = false;
crtc_state->disable_cxsr = false;
crtc_state->wm_changed = false;
 +  crtc_state->fb_changed = false;
  
	return &crtc_state->base;
  }
 diff --git a/drivers/gpu/drm/i915/intel_display.c
 b/drivers/gpu/drm/i915/intel_display.c
 index 804f2c6f260d..4d4dddc1f970 100644
 --- a/drivers/gpu/drm/i915/intel_display.c
 +++ b/drivers/gpu/drm/i915/intel_display.c
 @@ -4785,9 +4785,6 @@ static void intel_post_plane_update(struct
 intel_crtc *crtc)
to_intel_crtc_state(crtc->base.state);
struct drm_device *dev = crtc->base.dev;
  
 -  if (atomic->wait_vblank)
 -  intel_wait_for_vblank(dev, crtc->pipe);
 -
intel_frontbuffer_flip(dev, atomic->fb_bits);
  
crtc->wm.cxsr_allowed = true;
 @@ -10902,6 +10899,12 @@ static bool page_flip_finished(struct
 intel_crtc *crtc)
return true;
  
/*
 +   * BDW signals flip done immediately if the plane
 +   * is disabled, even if the plane enable is already
 +   * armed to occur at the next vblank :(
 +   */
>>> Having this comment here is just... weird. I think it removes a lot
>>> of
>>> the context that was present before.
>>>
 +
 +  /*
 * A DSPSURFLIVE check isn't enough in case the mmio and
 CS
 flips
 * used the same base address. In that case the mmio
 flip
 might
 * have completed, but the CS hasn't even executed the
 flip
 yet.
 @@ -11778,6 +11781,9 @@ int
 intel_plane_atomic_calc_changes(struct
 drm_crtc_state *crtc_state,
if (!was_visible && !visible)
return 0;
  
 +  if (fb != old_plane_state->base.fb)
 +  pipe_config->fb_changed = true;
 +
turn_off = was_visible && (!visible || mode_changed);
turn_on = visible && (!was_visible || mode_changed);
  
 @@ -11793,8 +11799,6 @@ int
 intel_plane_atomic_calc_changes(struct
 drm_crtc_state *crtc_state,
  
/* must disable cxsr around plane enable/disable
 */
if (plane->type != DRM_PLANE_TYPE_CURSOR) {
 -  if (is_crtc_enabled)
 -  intel_crtc->atomic.wait_vblank =
 true;
pipe_config->disable_cxsr = true;
}
>>> We could have killed the brackets here :)
>> Indeed, will do so in next version.
} else if (intel_wm_need_update(plane, plane_state)) {
 @@ -11810,14 +11814,6 @@ int
 intel_plane_atomic_calc_changes(struct
 drm_crtc_state *crtc_state,
intel_crtc->atomic.post_enable_primary =
 turn_on;
intel_crtc->atomic.update_fbc = true;
  
 -  /*
 -   * BDW signals flip done immediately if the
 plane
 -   * is disabled, even if the plane enable is
 already
 -   * armed to occur at the next vblank :(
 -   */
 -  if (turn_on && IS_BROADWELL(dev))
 -  

[Intel-gfx] [PATCH v5 06/35] drm/i915: Start of GPU scheduler

2016-02-18 Thread John . C . Harrison
From: John Harrison 

Initial creation of scheduler source files. Note that this patch
implements most of the scheduler functionality but does not hook it in
to the driver yet. It also leaves the scheduler code in 'pass through'
mode so that even when it is hooked in, it will not actually do very
much. This allows the hooks to be added one at a time in bite size
chunks and only when the scheduler is finally enabled at the end does
anything start happening.

The general theory of operation is that when batch buffers are
submitted to the driver, the execbuffer() code packages up all the
information required to execute the batch buffer at a later time. This
package is given over to the scheduler which adds it to an internal
node list. The scheduler also scans the list of objects associated
with the batch buffer and compares them against the objects already in
use by other buffers in the node list. If matches are found then the
new batch buffer node is marked as being dependent upon the matching
node. The same is done for the context object. The scheduler also
bumps up the priority of such matching nodes on the grounds that the
more dependencies a given batch buffer has the more important it is
likely to be.
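The dependency scan described above can be sketched as a small standalone C model. This is illustrative only; every name here (sched_node, scan_dependencies, etc.) is hypothetical and not the driver's actual code, which works on lists of i915 objects rather than arrays of ids.

```c
#include <stdbool.h>

#define MAX_OBJS 8

/* Toy stand-in for a queued batch buffer node. */
struct sched_node {
	int objs[MAX_OBJS];	/* ids of objects the batch references */
	int num_objs;
	int ctx_id;		/* context the batch belongs to */
	int priority;
	int num_deps;		/* dependencies this node must wait on */
};

static bool nodes_share_object(const struct sched_node *a,
			       const struct sched_node *b)
{
	for (int i = 0; i < a->num_objs; i++)
		for (int j = 0; j < b->num_objs; j++)
			if (a->objs[i] == b->objs[j])
				return true;
	return false;
}

/* Compare a newly queued node against every node already in the list:
 * a shared object or a shared context marks a dependency, and the
 * older node's priority is bumped because more batches depend on it. */
static void scan_dependencies(struct sched_node *new_node,
			      struct sched_node *queue, int count)
{
	for (int i = 0; i < count; i++) {
		if (nodes_share_object(new_node, &queue[i]) ||
		    new_node->ctx_id == queue[i].ctx_id) {
			new_node->num_deps++;	/* must wait for the older node */
			queue[i].priority++;	/* more dependants, more important */
		}
	}
}
```

Note that the real scan is more involved (read-read optimisations, inter-ring dependencies), but the shape is the same.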

The scheduler aims to have a given (tuneable) number of batch buffers
in flight on the hardware at any given time. If fewer than this are
currently executing when a new node is queued, then the node is passed
straight through to the submit function. Otherwise it is simply added
to the queue and the driver returns back to user land.

The scheduler is notified when each batch buffer completes and updates
its internal tracking accordingly. At the end of the completion
interrupt processing, if any scheduler tracked batches were processed,
the scheduler's deferred worker thread is woken up. This can do more
involved processing such as actually removing completed nodes from the
queue and freeing up the resources associated with them (internal
memory allocations, DRM object references, context reference, etc.).
The work handler also checks the in flight count and calls the
submission code if a new slot has appeared.

When the scheduler's submit code is called, it scans the queued node
list for the highest priority node that has no unmet dependencies.
Note that the dependency calculation is complex as it must take
inter-ring dependencies and potential preemptions into account. Note
also that in the future this will be extended to include external
dependencies such as the Android Native Sync file descriptors and/or
the Linux dma-buf synchronisation scheme.

If a suitable node is found then it is sent to execbuff_final() for
submission to the hardware. The in flight count is then re-checked and
a new node popped from the list if appropriate. All nodes that are not
submitted have their priority bumped. This ensures that low priority
tasks do not get starved out by busy higher priority ones - everything
will eventually get its turn to run.
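As a rough model of that submit-time selection and anti-starvation bump, here is a self-contained sketch (made-up names, not the scheduler's actual data structures):

```c
#include <stdbool.h>

struct qnode {
	int priority;
	int unmet_deps;		/* > 0 means not yet eligible */
	bool submitted;
};

/* Pick the highest-priority eligible node, mark it submitted, and bump
 * the priority of every node passed over so low-priority work cannot
 * starve. Returns the chosen index, or -1 if nothing is eligible. */
static int pick_and_bump(struct qnode *q, int count)
{
	int best = -1;

	for (int i = 0; i < count; i++) {
		if (q[i].submitted || q[i].unmet_deps)
			continue;
		if (best < 0 || q[i].priority > q[best].priority)
			best = i;
	}

	if (best >= 0) {
		q[best].submitted = true;
		for (int i = 0; i < count; i++)
			if (i != best && !q[i].submitted)
				q[i].priority++;	/* anti-starvation bump */
	}
	return best;
}
```

A node with a high priority but unmet dependencies is skipped in favour of a lower-priority node that is ready to run, exactly as the text describes.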

Note that this patch does not implement pre-emptive scheduling. Only
basic scheduling by re-ordering batch buffer submission is currently
implemented. Pre-emption of actively executing batch buffers comes in
the next patch series.

v2: Changed priority levels to +/-1023 due to feedback from Chris
Wilson.

Removed redundant index from scheduler node.

Changed time stamps to use jiffies instead of raw monotonic. This
provides lower resolution but improved compatibility with other i915
code.

Major re-write of completion tracking code due to struct fence
conversion. The scheduler no longer has its own private IRQ handler
but just lets the existing request code handle completion events.
Instead, the scheduler now hooks into the request notify code to be
told when a request has completed.

Reduced driver mutex locking scope. Removal of scheduler nodes no
longer grabs the mutex lock.

v3: Refactor of dependency generation to make the code more readable.
Also added in read-read optimisation support - i.e., don't treat a
shared read-only buffer as being a dependency.

Allowed the killing of queued nodes rather than only flying ones.

v4: Updated the commit message to better reflect the current state of
the code. Downgraded some BUG_ONs to WARN_ONs. Used the correct array
memory allocator function (kmalloc_array instead of kmalloc).
Corrected the format of some comments. Wrapped some lines differently
to keep the style checker happy.

Fixed a WARN_ON when killing nodes. The dependency removal code checks
that nodes being destroyed do not have any outstanding dependencies
(which would imply they should not have been executed yet). In the
case of nodes being destroyed, e.g. due to context banning, this
might well be the case - they have not been executed and do indeed
have outstanding dependencies.

Re-instated the code to disable interrupts when not in use. The
underlying problem causing broken IRQ reference counts seems to have
been fixed now.

v5: Shuffled 

[Intel-gfx] [PATCH v6 6/7] drm/i915: Updated request structure tracing

2016-02-18 Thread John . C . Harrison
From: John Harrison 

Added the '_complete' trace event which occurs when a fence/request is
signaled as complete. Also moved the notify event from the IRQ handler
code to inside the notify function itself.

v3: Added the current ring seqno to the notify trace point.

v5: Line wrapping to keep the style checker happy.

For: VIZ-5190
Signed-off-by: John Harrison 
---
 drivers/gpu/drm/i915/i915_gem.c   |  9 +++--
 drivers/gpu/drm/i915/i915_irq.c   |  2 --
 drivers/gpu/drm/i915/i915_trace.h | 14 +-
 3 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 635729e..f7858ea 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2870,13 +2870,16 @@ void i915_gem_request_notify(struct intel_engine_cs 
*ring, bool fence_locked)
unsigned long flags;
u32 seqno;
 
-   if (list_empty(&ring->fence_signal_list))
+   if (list_empty(&ring->fence_signal_list)) {
+   trace_i915_gem_request_notify(ring, 0);
return;
+   }
 
if (!fence_locked)
		spin_lock_irqsave(&ring->fence_lock, flags);
 
seqno = ring->get_seqno(ring, false);
+   trace_i915_gem_request_notify(ring, seqno);
 
	list_for_each_entry_safe(req, req_next, &ring->fence_signal_list,
signal_link) {
if (!req->cancelled) {
@@ -2890,8 +2893,10 @@ void i915_gem_request_notify(struct intel_engine_cs 
*ring, bool fence_locked)
 */
		list_del_init(&req->signal_link);
 
-   if (!req->cancelled)
+   if (!req->cancelled) {
			fence_signal_locked(&req->fence);
+   trace_i915_gem_request_complete(req);
+   }
 
if (req->irq_enabled) {
req->ring->irq_put(req->ring);
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index a5f64aa..20c6a90 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -999,8 +999,6 @@ static void notify_ring(struct intel_engine_cs *ring)
if (!intel_ring_initialized(ring))
return;
 
-   trace_i915_gem_request_notify(ring);
-
i915_gem_request_notify(ring, false);
 
	wake_up_all(&ring->irq_queue);
diff --git a/drivers/gpu/drm/i915/i915_trace.h 
b/drivers/gpu/drm/i915/i915_trace.h
index 52b2d40..cfe4f03 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -561,23 +561,27 @@ DEFINE_EVENT(i915_gem_request, i915_gem_request_add,
 );
 
 TRACE_EVENT(i915_gem_request_notify,
-   TP_PROTO(struct intel_engine_cs *ring),
-   TP_ARGS(ring),
+   TP_PROTO(struct intel_engine_cs *ring, uint32_t seqno),
+   TP_ARGS(ring, seqno),
 
TP_STRUCT__entry(
 __field(u32, dev)
 __field(u32, ring)
 __field(u32, seqno)
+__field(bool, is_empty)
 ),
 
TP_fast_assign(
   __entry->dev = ring->dev->primary->index;
   __entry->ring = ring->id;
-  __entry->seqno = ring->get_seqno(ring, false);
+  __entry->seqno = seqno;
+  __entry->is_empty =
+   list_empty(&ring->fence_signal_list);
   ),
 
-   TP_printk("dev=%u, ring=%u, seqno=%u",
- __entry->dev, __entry->ring, __entry->seqno)
+   TP_printk("dev=%u, ring=%u, seqno=%u, empty=%d",
+ __entry->dev, __entry->ring, __entry->seqno,
+ __entry->is_empty)
 );
 
 DEFINE_EVENT(i915_gem_request, i915_gem_request_retire,
-- 
1.9.1



[Intel-gfx] [PATCH v6 2/7] drm/i915: Removed now redundant parameter to i915_gem_request_completed()

2016-02-18 Thread John . C . Harrison
From: John Harrison 

The change to the implementation of i915_gem_request_completed() means
that the lazy coherency flag is no longer used. This can now be
removed to simplify the interface.

v6: Updated to newer nightly and resolved conflicts.

For: VIZ-5190
Signed-off-by: John Harrison 
---
 drivers/gpu/drm/i915/i915_debugfs.c  |  2 +-
 drivers/gpu/drm/i915/i915_drv.h  |  3 +--
 drivers/gpu/drm/i915/i915_gem.c  | 14 +++---
 drivers/gpu/drm/i915/intel_display.c |  2 +-
 drivers/gpu/drm/i915/intel_pm.c  |  4 ++--
 5 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
b/drivers/gpu/drm/i915/i915_debugfs.c
index d032e9f..b90d6ea 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -601,7 +601,7 @@ static int i915_gem_pageflip_info(struct seq_file *m, void 
*data)
   
i915_gem_request_get_seqno(work->flip_queued_req),
   dev_priv->next_seqno,
   ring->get_seqno(ring, true),
-  
i915_gem_request_completed(work->flip_queued_req, true));
+  
i915_gem_request_completed(work->flip_queued_req));
} else
seq_printf(m, "Flip not associated with any 
ring\n");
seq_printf(m, "Flip queued on frame %d, (was ready on 
frame %d), now %d\n",
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 7c64cc1..86ef0b4 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2295,8 +2295,7 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
   struct drm_i915_gem_request **req_out);
 void i915_gem_request_cancel(struct drm_i915_gem_request *req);
 
-static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
- bool lazy_coherency)
+static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
 {
	return fence_is_signaled(&req->fence);
 }
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 901be6c..e170732 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1273,7 +1273,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
	if (list_empty(&req->list))
return 0;
 
-   if (i915_gem_request_completed(req, true))
+   if (i915_gem_request_completed(req))
return 0;
 
timeout_expire = 0;
@@ -1323,7 +1323,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
break;
}
 
-   if (i915_gem_request_completed(req, false)) {
+   if (i915_gem_request_completed(req)) {
ret = 0;
break;
}
@@ -2825,7 +2825,7 @@ i915_gem_find_active_request(struct intel_engine_cs *ring)
struct drm_i915_gem_request *request;
 
	list_for_each_entry(request, &ring->request_list, list) {
-   if (i915_gem_request_completed(request, false))
+   if (i915_gem_request_completed(request))
continue;
 
return request;
@@ -2959,7 +2959,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs 
*ring)
   struct drm_i915_gem_request,
   list);
 
-   if (!i915_gem_request_completed(request, true))
+   if (!i915_gem_request_completed(request))
break;
 
i915_gem_request_retire(request);
@@ -2983,7 +2983,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs 
*ring)
}
 
if (unlikely(ring->trace_irq_req &&
-i915_gem_request_completed(ring->trace_irq_req, true))) {
+i915_gem_request_completed(ring->trace_irq_req))) {
ring->irq_put(ring);
		i915_gem_request_assign(&ring->trace_irq_req, NULL);
}
@@ -3093,7 +3093,7 @@ i915_gem_object_flush_active(struct drm_i915_gem_object 
*obj)
	if (list_empty(&req->list))
goto retire;
 
-   if (i915_gem_request_completed(req, true)) {
+   if (i915_gem_request_completed(req)) {
__i915_gem_request_retire__upto(req);
 retire:
i915_gem_object_retire__read(obj, i);
@@ -3205,7 +3205,7 @@ __i915_gem_object_sync(struct drm_i915_gem_object *obj,
if (to == from)
return 0;
 
-   if (i915_gem_request_completed(from_req, true))
+   if (i915_gem_request_completed(from_req))
return 0;
 
if (!i915_semaphore_is_enabled(obj->base.dev)) {
diff 

[Intel-gfx] [PATCH v6 3/7] drm/i915: Add per context timelines to fence object

2016-02-18 Thread John . C . Harrison
From: John Harrison 

The fence object used inside the request structure requires a sequence
number. Although this is not used by the i915 driver itself, it could
potentially be used by non-i915 code if the fence is passed outside of
the driver. This is the intention as it allows external kernel drivers
and user applications to wait on batch buffer completion
asynchronously via the dma-buf fence API.

To ensure that such external users are not confused by strange things
happening with the seqno, this patch adds in a per context timeline
that can provide a guaranteed in-order seqno value for the fence. This
is safe because the scheduler will not re-order batch buffers within a
context - they are considered to be mutually dependent.
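The per-context timeline idea can be modelled as a short standalone C sketch: each (context, engine) pair gets its own timeline, and every request allocated on it receives the next in-order seqno. Names here are simplified stand-ins for the i915_fence_timeline added by this patch; the helper functions are made up for illustration.

```c
/* Simplified model of a per-context fence timeline: seqno values are
 * guaranteed in-order because they are drawn from a private counter
 * rather than from the global, reorderable ring seqno. */
struct fence_timeline {
	unsigned int fence_context;	/* globally unique timeline id */
	unsigned int next;		/* next seqno to hand out */
};

static unsigned int global_context_count;

static void timeline_init(struct fence_timeline *tl)
{
	tl->fence_context = ++global_context_count;
	tl->next = 0;
}

/* In the real driver this allocation would be done under a lock; since
 * batches within one context are not reordered, the values a fence
 * exports are strictly increasing from an external observer's view. */
static unsigned int timeline_get_next_seqno(struct fence_timeline *tl)
{
	return ++tl->next;
}
```

Two different contexts advance independently, so external users waiting on exported fences never see a seqno jump backwards within one timeline.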

v2: New patch in series.

v3: Renamed/retyped timeline structure fields after review comments by
Tvrtko Ursulin.

Added context information to the timeline's name string for better
identification in debugfs output.

v5: Line wrapping and other white space fixes to keep style checker
happy.

For: VIZ-5190
Signed-off-by: John Harrison 
Cc: Tvrtko Ursulin 
---
 drivers/gpu/drm/i915/i915_drv.h | 25 +++---
 drivers/gpu/drm/i915/i915_gem.c | 83 +
 drivers/gpu/drm/i915/i915_gem_context.c | 16 ++-
 drivers/gpu/drm/i915/intel_lrc.c|  8 
 drivers/gpu/drm/i915/intel_ringbuffer.h |  1 -
 5 files changed, 115 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 86ef0b4..62dbdf2 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -845,6 +845,15 @@ struct i915_ctx_hang_stats {
bool banned;
 };
 
+struct i915_fence_timeline {
+   charname[32];
+   unsignedfence_context;
+   unsignednext;
+
+   struct intel_context *ctx;
+   struct intel_engine_cs *ring;
+};
+
 /* This must match up with the value previously used for execbuf2.rsvd1. */
 #define DEFAULT_CONTEXT_HANDLE 0
 
@@ -892,6 +901,7 @@ struct intel_context {
struct i915_vma *lrc_vma;
u64 lrc_desc;
uint32_t *lrc_reg_state;
+   struct i915_fence_timeline fence_timeline;
} engine[I915_NUM_RINGS];
 
struct list_head link;
@@ -2200,13 +2210,10 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
 struct drm_i915_gem_request {
/**
 * Underlying object for implementing the signal/wait stuff.
-* NB: Never call fence_later() or return this fence object to user
-* land! Due to lazy allocation, scheduler re-ordering, pre-emption,
-* etc., there is no guarantee at all about the validity or
-* sequentiality of the fence's seqno! It is also unsafe to let
-* anything outside of the i915 driver get hold of the fence object
-* as the clean up when decrementing the reference count requires
-* holding the driver mutex lock.
+* NB: Never return this fence object to user land! It is unsafe to
+* let anything outside of the i915 driver get hold of the fence
+* object as the clean up when decrementing the reference count
+* requires holding the driver mutex lock.
 */
struct fence fence;
 
@@ -2295,6 +2302,10 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
   struct drm_i915_gem_request **req_out);
 void i915_gem_request_cancel(struct drm_i915_gem_request *req);
 
+int i915_create_fence_timeline(struct drm_device *dev,
+  struct intel_context *ctx,
+  struct intel_engine_cs *ring);
+
 static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
 {
	return fence_is_signaled(&req->fence);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index e170732..2d50287 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2731,9 +2731,35 @@ static const char 
*i915_gem_request_get_driver_name(struct fence *req_fence)
 
 static const char *i915_gem_request_get_timeline_name(struct fence *req_fence)
 {
-   struct drm_i915_gem_request *req = container_of(req_fence,
-typeof(*req), fence);
-   return req->ring->name;
+   struct drm_i915_gem_request *req;
+   struct i915_fence_timeline *timeline;
+
+   req = container_of(req_fence, typeof(*req), fence);
	timeline = &req->ctx->engine[req->ring->id].fence_timeline;
+
+   return timeline->name;
+}
+
+static void i915_gem_request_timeline_value_str(struct fence *req_fence,
+   char *str, int size)
+{
+   struct drm_i915_gem_request *req;
+
+   req = container_of(req_fence, typeof(*req), fence);
+
+   /* Last signalled timeline value ??? */
+   

[Intel-gfx] [PATCH v6 5/7] drm/i915: Interrupt driven fences

2016-02-18 Thread John . C . Harrison
From: John Harrison 

The intended usage model for struct fence is that the signalled status
should be set on demand rather than polled. That is, there should not
be a need for a 'signaled' function to be called every time the status
is queried. Instead, 'something' should be done to enable a signal
callback from the hardware which will update the state directly. In
the case of requests, this is the seqno update interrupt. The idea is
that this callback will only be enabled on demand when something
actually tries to wait on the fence.

This change removes the polling test and replaces it with the callback
scheme. Each fence is added to a 'please poke me' list at the start of
i915_add_request(). The interrupt handler then scans through the 'poke
me' list when a new seqno pops out and signals any matching
fence/request. The fence is then removed from the list so the entire
request stack does not need to be scanned every time. Note that the
fence is added to the list before the commands to generate the seqno
interrupt are added to the ring. Thus the sequence is guaranteed to be
race free if the interrupt is already enabled.
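A toy model of that 'please poke me' list may make the flow clearer. This is a standalone sketch with hypothetical names, not the driver's list handling (which uses kernel linked lists under a spinlock):

```c
#include <stdbool.h>

#define MAX_PENDING 16

struct poke_entry {
	unsigned int seqno;	/* seqno this request completes at */
	bool signalled;
};

/* Requests are appended here at add-request time, before the seqno
 * write is emitted to the ring. */
static struct poke_entry *signal_list[MAX_PENDING];
static int signal_count;

static void add_to_signal_list(struct poke_entry *e)
{
	signal_list[signal_count++] = e;
}

/* Modelled seqno interrupt: signal every satisfied request and unlink
 * it, so the whole request stack is not rescanned on later interrupts. */
static void irq_notify(unsigned int hw_seqno)
{
	int kept = 0;

	for (int i = 0; i < signal_count; i++) {
		if (hw_seqno >= signal_list[i]->seqno)
			signal_list[i]->signalled = true;	/* signal + drop */
		else
			signal_list[kept++] = signal_list[i];	/* still waiting */
	}
	signal_count = kept;
}
```

Because an entry is added before the seqno can possibly pop out of the hardware, no completion can slip between queueing and the first interrupt scan.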

Note that the interrupt is only enabled on demand (i.e. when
__wait_request() is called). Thus there is still a potential race when
enabling the interrupt as the request may already have completed.
However, this is simply solved by calling the interrupt processing
code immediately after enabling the interrupt and thereby checking for
already completed requests.
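That enable-then-recheck pattern can be modelled in isolation (sketch only, all names invented for illustration):

```c
#include <stdbool.h>

struct model_req {
	unsigned int seqno;
	bool signalled;
};

static unsigned int hw_seqno;	/* stands in for the ring's current seqno */
static bool irq_enabled;

/* Modelled notify path, normally driven by the completion interrupt. */
static void notify(struct model_req *req)
{
	if (!irq_enabled)
		return;
	if (!req->signalled && hw_seqno >= req->seqno)
		req->signalled = true;
}

/* Key step: after arming the interrupt, run the notify path once
 * immediately, so a request that completed before the interrupt was
 * enabled is still caught rather than waiting forever. */
static void enable_signaling(struct model_req *req)
{
	irq_enabled = true;
	notify(req);
}
```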

Lastly, the ring clean up code has the possibility to cancel
outstanding requests (e.g. because TDR has reset the ring). These
requests will never get signalled and so must be removed from the
signal list manually. This is done by setting a 'cancelled' flag and
then calling the regular notify/retire code path rather than
attempting to duplicate the list manipulation and clean up code in
multiple places. This also avoids any race condition where the
cancellation request might occur after/during the completion interrupt
actually arriving.

v2: Updated to take advantage of the request unreference no longer
requiring the mutex lock.

v3: Move the signal list processing around to prevent unsubmitted
requests being added to the list. This was occurring on Android
because the native sync implementation calls the
fence->enable_signalling API immediately on fence creation.

Updated after review comments by Tvrtko Ursulin. Renamed list nodes to
'link' instead of 'list'. Added support for returning an error code on
a cancelled fence. Update list processing to be more efficient/safer
with respect to spinlocks.

v5: Made i915_gem_request_submit a static as it is only ever called
from one place.

Fixed up the low latency wait optimisation. The time delay between the
seqno value being written to memory and the driver's ISR running can be
significant, at least for the wait request micro-benchmark. This can
be greatly improved by explicitly checking for seqno updates in the
pre-wait busy poll loop. Also added some documentation comments to the
busy poll code.

Fixed up support for the faking of lost interrupts
(test_irq_rings/missed_irq_rings). That is, there is an IGT test that
tells the driver to lose interrupts deliberately and then checks that
everything still works as expected (albeit much slower).

Updates from review comments: use non IRQ-save spinlocking, early exit
on WARN and improved comments (Tvrtko Ursulin).

v6: Updated to newer nightly and resolved conflicts around the
wait_request busy spin optimisation. Also fixed a race condition
between this early exit path and the regular completion path.

For: VIZ-5190
Signed-off-by: John Harrison 
Cc: Tvrtko Ursulin 
---
 drivers/gpu/drm/i915/i915_drv.h |   8 ++
 drivers/gpu/drm/i915/i915_gem.c | 240 +---
 drivers/gpu/drm/i915/i915_irq.c |   2 +
 drivers/gpu/drm/i915/intel_lrc.c|   2 +
 drivers/gpu/drm/i915/intel_ringbuffer.c |   2 +
 drivers/gpu/drm/i915/intel_ringbuffer.h |   2 +
 6 files changed, 234 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 2c6aefba..0584846 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2210,7 +2210,12 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
 struct drm_i915_gem_request {
/** Underlying object for implementing the signal/wait stuff. */
struct fence fence;
+   struct list_head signal_link;
+   struct list_head unsignal_link;
struct list_head delayed_free_link;
+   bool cancelled;
+   bool irq_enabled;
+   bool signal_requested;
 
/** On Which ring this request was generated */
struct drm_i915_private *i915;
@@ -2296,6 +2301,9 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
   struct intel_context *ctx,
   

[Intel-gfx] [PATCH v6 1/7] drm/i915: Convert requests to use struct fence

2016-02-18 Thread John . C . Harrison
From: John Harrison 

There is a construct in the linux kernel called 'struct fence' that is
intended to keep track of work that is executed on hardware. I.e. it
solves the basic problem that the driver's 'struct
drm_i915_gem_request' is trying to address. The request structure does
quite a lot more than simply track the execution progress so is very
definitely still required. However, the basic completion status side
could be updated to use the ready made fence implementation and gain
all the advantages that provides.

This patch makes the first step of integrating a struct fence into the
request. It replaces the explicit reference count with that of the
fence. It also replaces the 'is completed' test with the fence's
equivalent. Currently, that simply chains on to the original request
implementation. A future patch will improve this.

v3: Updated after review comments by Tvrtko Ursulin. Added fence
context/seqno pair to the debugfs request info. Renamed fence 'driver
name' to just 'i915'. Removed BUG_ONs.

v5: Changed seqno format in debugfs to %x rather than %u as that is
apparently the preferred appearance. Line wrapped some long lines to
keep the style checker happy.

v6: Updated to newer nightly and resolved conflicts. The biggest issue
was with the re-worked busy spin precursor to waiting on a request. In
particular, the addition of a 'request_started' helper function. This
has no corresponding concept within the fence framework. However, it
is only ever used in one place and the whole point of that place is to
always directly read the seqno for absolutely lowest latency possible.
So the simple solution is to just make the seqno test explicit at that
point now rather than later in the series (it was previously being
done anyway when fences become interrupt driven).

For: VIZ-5190
Signed-off-by: John Harrison 
Cc: Tvrtko Ursulin 
Reviewed-by: Jesse Barnes 
---
 drivers/gpu/drm/i915/i915_debugfs.c |  5 ++-
 drivers/gpu/drm/i915/i915_drv.h | 47 +++
 drivers/gpu/drm/i915/i915_gem.c | 67 +
 drivers/gpu/drm/i915/intel_lrc.c|  1 +
 drivers/gpu/drm/i915/intel_ringbuffer.c |  1 +
 drivers/gpu/drm/i915/intel_ringbuffer.h |  3 ++
 6 files changed, 89 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index ebe7063..d032e9f 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -709,11 +709,12 @@ static int i915_gem_request_info(struct seq_file *m, void *data)
task = NULL;
if (req->pid)
task = pid_task(req->pid, PIDTYPE_PID);
-   seq_printf(m, "%x @ %d: %s [%d]\n",
+   seq_printf(m, "%x @ %d: %s [%d], fence = %x:%x\n",
   req->seqno,
   (int) (jiffies - req->emitted_jiffies),
   task ? task->comm : "",
-  task ? task->pid : -1);
+  task ? task->pid : -1,
+  req->fence.context, req->fence.seqno);
rcu_read_unlock();
}
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 351308f..7c64cc1 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -53,6 +53,7 @@
 #include 
 #include 
 #include "intel_guc.h"
+#include 
 
 /* General customization:
  */
@@ -2197,7 +2198,17 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
  * initial reference taken using kref_init
  */
 struct drm_i915_gem_request {
-   struct kref ref;
+   /**
+* Underlying object for implementing the signal/wait stuff.
+* NB: Never call fence_later() or return this fence object to user
+* land! Due to lazy allocation, scheduler re-ordering, pre-emption,
+* etc., there is no guarantee at all about the validity or
+* sequentiality of the fence's seqno! It is also unsafe to let
+* anything outside of the i915 driver get hold of the fence object
+* as the clean up when decrementing the reference count requires
+* holding the driver mutex lock.
+*/
+   struct fence fence;
 
/** On Which ring this request was generated */
struct drm_i915_private *i915;
@@ -2283,7 +2294,13 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
   struct intel_context *ctx,
   struct drm_i915_gem_request **req_out);
 void i915_gem_request_cancel(struct drm_i915_gem_request *req);
-void i915_gem_request_free(struct kref *req_ref);
+
+static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,

Re: [Intel-gfx] [PATCH v4 07/38] drm/i915: Start of GPU scheduler

2016-02-18 Thread John Harrison

On 20/01/2016 13:18, Joonas Lahtinen wrote:

Hi,

Comments below this pre text.

Many of the comments are related to the indent and style of the code.
That stuff is important to fix for future maintainability. In order for
the future review to be more effective, I'd like to next see a v5 of
the series where the code quality concerns have been addressed, patches
squashed to be actual reviewable chunks and appropriate kerneldoc being
added.

To give an idea of proper slicing of patches, first produce a no-op
scheduler, adding the extra function calls where needed and still
keeping the scheduling completely linear. Second patch could introduce
out of order submitting, third one priority bumping, fourth pre-empting
and so on. That way, each patch extends the functionality and is itself
already mergeable. That way I've been able to go through and understand
the existing code, and I can actually review (other than just nag about
indent and coding style) if the changes are appropriate to bring in the
functionality desired.

In the current split, for me or anyone who did not participate writing
the code, it is otherwise too confusing to try to guess what future
changes might make each piece of code make sense, and which will be
redundant in the future too. There is no value in splitting code into
chunks that are not themselves functional.

Regards, Joonas

On Mon, 2016-01-11 at 18:42 +, john.c.harri...@intel.com wrote:

From: John Harrison 

Initial creation of scheduler source files. Note that this patch
implements most of the scheduler functionality but does not hook it in
to the driver yet. It also leaves the scheduler code in 'pass through'
mode so that even when it is hooked in, it will not actually do very
much. This allows the hooks to be added one at a time in bite size
chunks and only when the scheduler is finally enabled at the end does
anything start happening.

The general theory of operation is that when batch buffers are
submitted to the driver, the execbuffer() code packages up all the
information required to execute the batch buffer at a later time. This
package is given over to the scheduler which adds it to an internal
node list. The scheduler also scans the list of objects associated
with the batch buffer and compares them against the objects already in
use by other buffers in the node list. If matches are found then the
new batch buffer node is marked as being dependent upon the matching
node. The same is done for the context object. The scheduler also
bumps up the priority of such matching nodes on the grounds that the
more dependencies a given batch buffer has the more important it is
likely to be.
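As a rough illustration of the dependency scan described above, a
self-contained simulation (the structures and names here are
hypothetical and heavily simplified; the real scheduler nodes, lists
and locking are far richer):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_OBJS 8
#define MAX_DEPS 8

/* Hypothetical, heavily simplified scheduler node. */
struct sched_node {
	const void *objs[MAX_OBJS];	/* DRM objects the batch references */
	int num_objs;
	struct sched_node *deps[MAX_DEPS];	/* nodes this one waits on */
	int num_deps;
	int priority;
	struct sched_node *next;	/* link in the queued node list */
};

static bool node_uses(const struct sched_node *n, const void *obj)
{
	for (int i = 0; i < n->num_objs; i++)
		if (n->objs[i] == obj)
			return true;
	return false;
}

/*
 * Queue-time scan: any object shared with an already queued node makes
 * the incoming node dependent on it, and each match bumps the priority
 * of the node being depended upon.
 */
static void sched_track_dependencies(struct sched_node *incoming,
				     struct sched_node *queue)
{
	for (struct sched_node *n = queue; n; n = n->next) {
		for (int i = 0; i < incoming->num_objs; i++) {
			if (node_uses(n, incoming->objs[i])) {
				incoming->deps[incoming->num_deps++] = n;
				n->priority++;
				break;
			}
		}
	}
}
```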

The scheduler aims to have a given (tuneable) number of batch buffers
in flight on the hardware at any given time. If fewer than this are
currently executing when a new node is queued, then the node is passed
straight through to the submit function. Otherwise it is simply added
to the queue and the driver returns back to user land.
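The queue-versus-direct-submit decision can be sketched as follows
(illustrative names only; the real tuneable and in-flight accounting
live in the scheduler's own state):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative only: the real tuneable lives in the scheduler state. */
#define SCHED_FLYING_LIMIT 2

struct sched_state {
	int flying;	/* batches currently on the hardware */
	int queued;	/* batches waiting in the node list */
};

/*
 * Queue-time decision: if the hardware is not yet saturated, submit
 * the new node immediately; otherwise just queue it and return to
 * user land. Returns true when the node went straight to submit.
 */
static bool sched_queue_node(struct sched_state *s)
{
	if (s->flying < SCHED_FLYING_LIMIT) {
		s->flying++;
		return true;	/* passed through to submit function */
	}
	s->queued++;
	return false;
}
```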

The scheduler is notified when each batch buffer completes and updates
its internal tracking accordingly. At the end of the completion
interrupt processing, if any scheduler tracked batches were processed,
the scheduler's deferred worker thread is woken up. This can do more
involved processing such as actually removing completed nodes from the
queue and freeing up the resources associated with them (internal
memory allocations, DRM object references, context reference, etc.).
The work handler also checks the in flight count and calls the
submission code if a new slot has appeared.

When the scheduler's submit code is called, it scans the queued node
list for the highest priority node that has no unmet dependencies.
Note that the dependency calculation is complex as it must take
inter-ring dependencies and potential preemptions into account. Note
also that in the future this will be extended to include external
dependencies such as the Android Native Sync file descriptors and/or
the Linux dma-buf synchronisation scheme.

If a suitable node is found then it is sent to execbuff_final() for
submission to the hardware. The in flight count is then re-checked and
a new node popped from the list if appropriate.
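The submit-side selection described above, reduced to a minimal toy
sketch (the 'ready' flag stands in for the full inter-ring dependency
and preemption calculation, which this version elides):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy queued node: 'ready' collapses the whole dependency calculation. */
struct queued_node {
	int priority;
	bool ready;	/* all dependencies met */
	struct queued_node *next;
};

/*
 * Walk the queued list and pick the highest-priority node whose
 * dependencies have all been satisfied; NULL if nothing is eligible.
 */
static struct queued_node *pick_next_node(struct queued_node *queue)
{
	struct queued_node *best = NULL;

	for (struct queued_node *n = queue; n; n = n->next) {
		if (!n->ready)
			continue;
		if (!best || n->priority > best->priority)
			best = n;
	}
	return best;
}
```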

Note that this patch does not implement pre-emptive scheduling. Only
basic scheduling by re-ordering batch buffer submission is currently
implemented. Pre-emption of actively executing batch buffers comes in
the next patch series.

v2: Changed priority levels to +/-1023 due to feedback from Chris
Wilson.

Removed redundant index from scheduler node.

Changed time stamps to use jiffies instead of raw monotonic. This
provides lower resolution but improved compatibility with other i915
code.

Major re-write of completion tracking code due to struct fence
conversion. The scheduler no longer has it's own private IRQ handler
but just lets the existing request code handle completion events.
Instead, the scheduler now hooks into the request notify 

[Intel-gfx] [PATCH v6 0/7] Convert requests to use struct fence

2016-02-18 Thread John . C . Harrison
From: John Harrison 

There is a construct in the linux kernel called 'struct fence' that is
intended to keep track of work that is executed on hardware. I.e. it
solves the basic problem that the driver's 'struct
drm_i915_gem_request' is trying to address. The request structure does
quite a lot more than simply track the execution progress so is very
definitely still required. However, the basic completion status side
could be updated to use the ready made fence implementation and gain
all the advantages that provides.

Using the struct fence object also has the advantage that the fence
can be used outside of the i915 driver (by other drivers or by
userland applications). That is the basis of the dma-buf
synchronisation API and allows asynchronous tracking of work
completion. In this case, it allows applications to be signalled
directly when a batch buffer completes without having to make an IOCTL
call into the driver.

This is work that was planned since the conversion of the driver from
being seqno value based to being request structure based. This patch
series does that work.

An IGT test to exercise the fence support from user land is in
progress and will follow. Android already makes extensive use of
fences for display composition. Real world linux usage is planned in
the form of Jesse's page table sharing / bufferless execbuf support.
There is also a plan that Wayland (and others) could make use of it in
a similar manner to Android.

v2: Updated for review comments by various people and to add support
for Android style 'native sync'.

v3: Updated from review comments by Tvrtko Ursulin. Also moved sync
framework out of staging and improved request completion handling.

v4: Fixed patch tag (should have been PATCH not RFC). Corrected
ownership of one patch which had passed through many hands before
reaching me. Fixed a bug introduced in v3 and updated for review
comments.

v5: Removed de-staging and further updates to Android sync code. The
de-stage is now being handled by someone else. The sync integration to
the i915 driver will be a separate patch set that can only land after
the external de-stage has been completed.

Assorted changes based on review comments and style checker fixes.
Most significant change is fixing up the fake lost interrupt support
for the 'drv_missed_irq_hang' IGT test and improving the wait request
latency.

v6: Updated to newer nightly and resolved conflicts around updates
to the wait_request optimisations.

[Patches against drm-intel-nightly tree fetched 19/01/2016]

John Harrison (7):
  drm/i915: Convert requests to use struct fence
  drm/i915: Removed now redundant parameter to i915_gem_request_completed()
  drm/i915: Add per context timelines to fence object
  drm/i915: Delay the freeing of requests until retire time
  drm/i915: Interrupt driven fences
  drm/i915: Updated request structure tracing
  drm/i915: Cache last IRQ seqno to reduce IRQ overhead

 drivers/gpu/drm/i915/i915_debugfs.c |   7 +-
 drivers/gpu/drm/i915/i915_drv.h |  69 +++---
 drivers/gpu/drm/i915/i915_gem.c | 423 +---
 drivers/gpu/drm/i915/i915_gem_context.c |  16 +-
 drivers/gpu/drm/i915/i915_irq.c |   2 +-
 drivers/gpu/drm/i915/i915_trace.h   |  14 +-
 drivers/gpu/drm/i915/intel_display.c|   4 +-
 drivers/gpu/drm/i915/intel_lrc.c|  13 +
 drivers/gpu/drm/i915/intel_pm.c |   6 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c |   5 +
 drivers/gpu/drm/i915/intel_ringbuffer.h |  12 +
 11 files changed, 491 insertions(+), 80 deletions(-)

-- 
1.9.1

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH v6 4/7] drm/i915: Delay the freeing of requests until retire time

2016-02-18 Thread John . C . Harrison
From: John Harrison 

The request structure is reference counted. When the count reached
zero, the request was immediately freed and all associated objects
were unreferenced/deallocated. This meant that the driver mutex lock
must be held at the point where the count reaches zero. This was fine
while all references were held internally to the driver. However, the
plan is to allow the underlying fence object (and hence the request
itself) to be returned to other drivers and to userland. External
users cannot be expected to acquire a driver private mutex lock.

Rather than attempt to disentangle the request structure from the
driver mutex lock, the decision was to defer the free code until a
later (safer) point. Hence this patch changes the unreference callback
to merely move the request onto a delayed free list. The driver's
retire worker thread will then process the list and actually call the
free function on the requests.
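The deferred-free scheme can be illustrated with a small user-space
model (single-threaded, so the spinlock the driver needs is omitted;
request_release() models the fence release callback and
retire_process_delayed_free() the retire worker; all names are
illustrative, not the driver's):

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative request with an intrusive delayed-free link. */
struct request {
	struct request *delayed_free_link;
	int freed;
};

/* In the driver this list is per-ring and spinlock protected. */
static struct request *delayed_free_list;

/* Called from the fence release path: defer, don't free. */
static void request_release(struct request *req)
{
	req->delayed_free_link = delayed_free_list;
	delayed_free_list = req;
	/* ...then kick the retire worker */
}

/* Retire worker: runs with the driver mutex held, safe to free now. */
static void retire_process_delayed_free(void)
{
	struct request *req = delayed_free_list;

	delayed_free_list = NULL;
	while (req) {
		struct request *next = req->delayed_free_link;
		req->freed = 1;	/* stands in for the real free function */
		req = next;
	}
}
```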

v2: New patch in series.

v3: Updated after review comments by Tvrtko Ursulin. Rename list nodes
to 'link' rather than 'list'. Update list processing to be more
efficient/safer with respect to spinlocks.

v4: Changed to use basic spinlocks rather than IRQ ones - missed
update from earlier feedback by Tvrtko.

v5: Improved a comment to keep the style checker happy.

For: VIZ-5190
Signed-off-by: John Harrison 
Cc: Tvrtko Ursulin 
---
 drivers/gpu/drm/i915/i915_drv.h | 22 +++-
 drivers/gpu/drm/i915/i915_gem.c | 37 +
 drivers/gpu/drm/i915/intel_display.c|  2 +-
 drivers/gpu/drm/i915/intel_lrc.c|  2 ++
 drivers/gpu/drm/i915/intel_pm.c |  2 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c |  2 ++
 drivers/gpu/drm/i915/intel_ringbuffer.h |  7 +++
 7 files changed, 49 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 62dbdf2..2c6aefba 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2208,14 +2208,9 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
  * initial reference taken using kref_init
  */
 struct drm_i915_gem_request {
-   /**
-* Underlying object for implementing the signal/wait stuff.
-* NB: Never return this fence object to user land! It is unsafe to
-* let anything outside of the i915 driver get hold of the fence
-* object as the clean up when decrementing the reference count
-* requires holding the driver mutex lock.
-*/
+   /** Underlying object for implementing the signal/wait stuff. */
struct fence fence;
+   struct list_head delayed_free_link;
 
/** On Which ring this request was generated */
struct drm_i915_private *i915;
@@ -2337,21 +2332,10 @@ i915_gem_request_reference(struct drm_i915_gem_request *req)
 static inline void
 i915_gem_request_unreference(struct drm_i915_gem_request *req)
 {
-   WARN_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
-   fence_put(&req->fence);
-}
-
-static inline void
-i915_gem_request_unreference__unlocked(struct drm_i915_gem_request *req)
-{
-   struct drm_device *dev;
-
if (!req)
return;
 
-   dev = req->ring->dev;
-   if (kref_put_mutex(&req->fence.refcount, fence_release, &dev->struct_mutex))
-   mutex_unlock(&dev->struct_mutex);
+   fence_put(&req->fence);
 }
 
 static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 2d50287..aca9fcd 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2683,10 +2683,26 @@ static void i915_set_reset_status(struct drm_i915_private *dev_priv,
}
 }
 
-static void i915_gem_request_free(struct fence *req_fence)
+static void i915_gem_request_release(struct fence *req_fence)
 {
struct drm_i915_gem_request *req = container_of(req_fence,
 typeof(*req), fence);
+   struct intel_engine_cs *ring = req->ring;
+   struct drm_i915_private *dev_priv = to_i915(ring->dev);
+
+   /*
+* Need to add the request to a deferred dereference list to be
+* processed at a mutex lock safe time.
+*/
+   spin_lock(&ring->delayed_free_lock);
+   list_add_tail(&req->delayed_free_link, &ring->delayed_free_list);
+   spin_unlock(&ring->delayed_free_lock);
+
+   queue_delayed_work(dev_priv->wq, &dev_priv->mm.retire_work, 0);
+}
+
+static void i915_gem_request_free(struct drm_i915_gem_request *req)
+{
struct intel_context *ctx = req->ctx;
 
WARN_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
@@ -2766,7 +2782,7 @@ static const struct fence_ops i915_gem_request_fops = {
.enable_signaling   = i915_gem_request_enable_signaling,
.signaled   = i915_gem_request_is_completed,
.wait 

[Intel-gfx] [PATCH v6 7/7] drm/i915: Cache last IRQ seqno to reduce IRQ overhead

2016-02-18 Thread John . C . Harrison
From: John Harrison 

The notify function can be called many times without the seqno
changing. Many of these duplicate calls exist to prevent races due to
the requirement of not enabling interrupts until requested. However,
when interrupts are enabled the IRQ handler can be called multiple
times without the ring's seqno value changing. This patch reduces the
overhead of these extra calls by caching the last processed seqno
value and early exiting if it has not changed.

v3: New patch for series.

v5: Added comment about last_irq_seqno usage due to code review
feedback (Tvrtko Ursulin).

v6: Minor update to resolve a race condition with the wait_request
optimisation.

For: VIZ-5190
Signed-off-by: John Harrison 
---
 drivers/gpu/drm/i915/i915_gem.c | 21 +++--
 drivers/gpu/drm/i915/intel_ringbuffer.h |  1 +
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index f7858ea..72a37d6 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1386,6 +1386,7 @@ out:
 * request has not actually been fully processed yet.
 */
spin_lock_irq(&req->ring->fence_lock);
+   req->ring->last_irq_seqno = 0;
i915_gem_request_notify(req->ring, true);
spin_unlock_irq(&req->ring->fence_lock);
}
@@ -2543,6 +2544,8 @@ i915_gem_init_seqno(struct drm_device *dev, u32 seqno)
 
for (j = 0; j < ARRAY_SIZE(ring->semaphore.sync_seqno); j++)
ring->semaphore.sync_seqno[j] = 0;
+
+   ring->last_irq_seqno = 0;
}
 
return 0;
@@ -2875,11 +2878,22 @@ void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked)
return;
}
 
+   /*
+* Check for a new seqno. If it hasn't actually changed then early
+* exit without even grabbing the spinlock. Note that this is safe
+* because any corruption of last_irq_seqno merely results in doing
+* the full processing when there is potentially no work to be done.
+* It can never lead to not processing work that does need to happen.
+*/
+   seqno = ring->get_seqno(ring, false);
+   trace_i915_gem_request_notify(ring, seqno);
+   if (seqno == ring->last_irq_seqno)
+   return;
+
if (!fence_locked)
spin_lock_irqsave(&ring->fence_lock, flags);
 
-   seqno = ring->get_seqno(ring, false);
-   trace_i915_gem_request_notify(ring, seqno);
+   ring->last_irq_seqno = seqno;
 
list_for_each_entry_safe(req, req_next, &ring->fence_signal_list, signal_link) {
if (!req->cancelled) {
@@ -3167,7 +3181,10 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
 * Tidy up anything left over. This includes a call to
 * i915_gem_request_notify() which will make sure that any requests
 * that were on the signal pending list get also cleaned up.
+* NB: The seqno cache must be cleared otherwise the notify call will
+* simply return immediately.
 */
+   ring->last_irq_seqno = 0;
i915_gem_retire_requests_ring(ring);
 
/* Having flushed all requests from all queues, we know that all
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 6a7968b..ada93a9 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -363,6 +363,7 @@ struct  intel_engine_cs {
spinlock_t fence_lock;
struct list_head fence_signal_list;
struct list_head fence_unsignal_list;
+   uint32_t last_irq_seqno;
 };
 
 static inline bool
-- 
1.9.1



[Intel-gfx] [PATCH v5 35/35] drm/i915: Allow scheduler to manage inter-ring object synchronisation

2016-02-18 Thread John . C . Harrison
From: John Harrison 

The scheduler has always tracked batch buffer dependencies based on
DRM object usage. This means that it will not submit a batch on one
ring that has outstanding dependencies still executing on other rings.
This is exactly the same synchronisation performed by
i915_gem_object_sync() using hardware semaphores where available and
CPU stalls where not (e.g. in execlist mode and/or on Gen8 hardware).

Unfortunately, when a batch buffer is submitted to the driver the
_object_sync() call happens first. Thus in the case where hardware
semaphores are disabled, the driver has already stalled until the
dependency has been resolved.

This patch adds an optimisation to _object_sync() to ignore the
synchronisation in the case where it will subsequently be handled by
the scheduler. This removes the driver stall and (in the single
application case) provides near hardware semaphore performance even
when hardware semaphores are disabled. In a busy system where there is
other work that can be executed on the stalling ring, it provides
better than hardware semaphore performance as it removes the stall
from both the driver and from the hardware. There is also a theory
that this method should improve power usage as hardware semaphores are
apparently not very power efficient - the stalled ring does not go
into as low a power state as when it is genuinely idle.

The optimisation is to check whether both ends of the synchronisation
are batch buffer requests. If they are, then the scheduler will have
the inter-dependency tracked and managed. If one or other end is not a
batch buffer request (e.g. a page flip) then the code falls back to
the CPU stall or hardware semaphore as appropriate.

To check whether the existing usage is a batch buffer, the code simply
calls the 'are you tracking this request' function of the scheduler on
the object's last_read_req member. To check whether the new usage is a
batch buffer, a flag is passed in from the caller.

Issue: VIZ-5566
Signed-off-by: John Harrison 
---
 drivers/gpu/drm/i915/i915_drv.h|  2 +-
 drivers/gpu/drm/i915/i915_gem.c| 17 ++---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  2 +-
 drivers/gpu/drm/i915/intel_display.c   |  2 +-
 drivers/gpu/drm/i915/intel_lrc.c   |  2 +-
 5 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 5d02f44..207ac16 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3011,7 +3011,7 @@ int __must_check i915_mutex_lock_interruptible(struct drm_device *dev);
 #endif
 int i915_gem_object_sync(struct drm_i915_gem_object *obj,
 struct intel_engine_cs *to,
-struct drm_i915_gem_request **to_req);
+struct drm_i915_gem_request **to_req, bool to_batch);
 void i915_vma_move_to_active(struct i915_vma *vma,
 struct drm_i915_gem_request *req);
 int i915_gem_dumb_create(struct drm_file *file_priv,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index a2c136d..b14e384 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3577,7 +3577,7 @@ static int
 __i915_gem_object_sync(struct drm_i915_gem_object *obj,
   struct intel_engine_cs *to,
   struct drm_i915_gem_request *from_req,
-  struct drm_i915_gem_request **to_req)
+  struct drm_i915_gem_request **to_req, bool to_batch)
 {
struct intel_engine_cs *from;
int ret;
@@ -3589,6 +3589,15 @@ __i915_gem_object_sync(struct drm_i915_gem_object *obj,
if (i915_gem_request_completed(from_req))
return 0;
 
+   /*
+* The scheduler will manage inter-ring object dependencies
+* as long as both to and from requests are scheduler managed
+* (i.e. batch buffers).
+*/
+   if (to_batch &&
+   i915_scheduler_is_request_tracked(from_req, NULL, NULL))
+   return 0;
+
if (!i915_semaphore_is_enabled(obj->base.dev)) {
struct drm_i915_private *i915 = to_i915(obj->base.dev);
ret = __i915_wait_request(from_req,
@@ -3639,6 +3648,8 @@ __i915_gem_object_sync(struct drm_i915_gem_object *obj,
  * @to_req: request we wish to use the object for. See below.
  *  This will be allocated and returned if a request is
  *  required but not passed in.
+ * @to_batch: is the sync request on behalf of batch buffer submission?
+ * If so then the scheduler can (potentially) manage the synchronisation.
  *
  * This code is meant to abstract object synchronization with the GPU.
  * Calling with NULL implies synchronizing the object with the CPU
@@ -3669,7 +3680,7 @@ __i915_gem_object_sync(struct drm_i915_gem_object *obj,
 int
 

[Intel-gfx] [PATCH 01/20] igt/gem_ctx_param_basic: Updated to support scheduler priority interface

2016-02-18 Thread John . C . Harrison
From: John Harrison 

The GPU scheduler has added an execution priority level to the context
object. There is an IOCTL interface to allow user apps/libraries to
set this priority. This patch updates the context parameter IOCTL test
to include the new interface.

For: VIZ-1587
Signed-off-by: John Harrison 
---
 lib/ioctl_wrappers.h|  1 +
 tests/gem_ctx_param_basic.c | 34 +-
 2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/lib/ioctl_wrappers.h b/lib/ioctl_wrappers.h
index 214ec78..e650b8f 100644
--- a/lib/ioctl_wrappers.h
+++ b/lib/ioctl_wrappers.h
@@ -105,6 +105,7 @@ struct local_i915_gem_context_param {
 #define LOCAL_CONTEXT_PARAM_BAN_PERIOD 0x1
 #define LOCAL_CONTEXT_PARAM_NO_ZEROMAP 0x2
 #define LOCAL_CONTEXT_PARAM_GTT_SIZE   0x3
+#define LOCAL_CONTEXT_PARAM_PRIORITY   0x4
uint64_t value;
 };
 void gem_context_require_ban_period(int fd);
diff --git a/tests/gem_ctx_param_basic.c b/tests/gem_ctx_param_basic.c
index b75800c..585a1a8 100644
--- a/tests/gem_ctx_param_basic.c
+++ b/tests/gem_ctx_param_basic.c
@@ -147,10 +147,42 @@ igt_main
TEST_SUCCESS(LOCAL_IOCTL_I915_GEM_CONTEXT_SETPARAM);
}
 
+   ctx_param.param = LOCAL_CONTEXT_PARAM_PRIORITY;
+
+   igt_subtest("priority-root-set") {
+   ctx_param.context = ctx;
+   ctx_param.value = 2048;
+   TEST_FAIL(LOCAL_IOCTL_I915_GEM_CONTEXT_SETPARAM, EINVAL);
+   ctx_param.value = -2048;
+   TEST_FAIL(LOCAL_IOCTL_I915_GEM_CONTEXT_SETPARAM, EINVAL);
+   ctx_param.value = 512;
+   TEST_SUCCESS(LOCAL_IOCTL_I915_GEM_CONTEXT_SETPARAM);
+   ctx_param.value = -512;
+   TEST_SUCCESS(LOCAL_IOCTL_I915_GEM_CONTEXT_SETPARAM);
+   ctx_param.value = 0;
+   TEST_SUCCESS(LOCAL_IOCTL_I915_GEM_CONTEXT_SETPARAM);
+   }
+
+   igt_subtest("priority-non-root-set") {
+   igt_fork(child, 1) {
+   igt_drop_root();
+
+   ctx_param.context = ctx;
+   ctx_param.value = 512;
+   TEST_FAIL(LOCAL_IOCTL_I915_GEM_CONTEXT_SETPARAM, EPERM);
+   ctx_param.value = -512;
+   TEST_SUCCESS(LOCAL_IOCTL_I915_GEM_CONTEXT_SETPARAM);
+   ctx_param.value = 0;
+   TEST_SUCCESS(LOCAL_IOCTL_I915_GEM_CONTEXT_SETPARAM);
+   }
+
+   igt_waitchildren();
+   }
+
/* NOTE: This testcase intentionally tests for the next free parameter
 * to catch ABI extensions. Don't "fix" this testcase without adding all
 * the tests for the new param first. */
-   ctx_param.param = LOCAL_CONTEXT_PARAM_GTT_SIZE + 1;
+   ctx_param.param = LOCAL_CONTEXT_PARAM_PRIORITY + 1;
 
igt_subtest("invalid-param-get") {
ctx_param.context = ctx;
-- 
1.9.1



[Intel-gfx] [PATCH v5 34/35] drm/i915: Add support for retro-actively banning batch buffers

2016-02-18 Thread John . C . Harrison
From: John Harrison 

If a given context submits too many hanging batch buffers then it will
be banned and no further batch buffers will be accepted for it.
However, it is possible that a large number of buffers may already
have been accepted and are sat in the scheduler waiting to be
executed. This patch adds a late ban check to ensure that these will
also be discarded.

v4: New patch in series.

For: VIZ-1587
Signed-off-by: John Harrison 
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 6 ++
 drivers/gpu/drm/i915/intel_lrc.c   | 6 ++
 2 files changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 793fbce..0b8c61e 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1292,6 +1292,12 @@ int i915_gem_ringbuffer_submission_final(struct i915_execbuffer_params *params)
/* The mutex must be acquired before calling this function */
WARN_ON(!mutex_is_locked(&params->dev->struct_mutex));
 
+   /* Check the context wasn't banned between submission and execution: */
+   if (params->ctx->hang_stats.banned) {
+   DRM_DEBUG("Trying to execute for banned context!\n");
+   return -ENOENT;
+   }
+
/* Make sure the request's seqno is the latest and greatest: */
if (req->reserved_seqno != dev_priv->last_seqno) {
ret = i915_gem_get_seqno(ring->dev, &req->reserved_seqno);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index e124443..5fbeb0e 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1002,6 +1002,12 @@ int intel_execlists_submission_final(struct i915_execbuffer_params *params)
/* The mutex must be acquired before calling this function */
WARN_ON(!mutex_is_locked(&params->dev->struct_mutex));
 
+   /* Check the context wasn't banned between submission and execution: */
+   if (params->ctx->hang_stats.banned) {
+   DRM_DEBUG("Trying to execute for banned context!\n");
+   return -ENOENT;
+   }
+
/* Make sure the request's seqno is the latest and greatest: */
if (req->reserved_seqno != dev_priv->last_seqno) {
ret = i915_gem_get_seqno(ring->dev, &req->reserved_seqno);
-- 
1.9.1



[Intel-gfx] [PATCH v5 16/35] drm/i915: Hook scheduler node clean up into retire requests

2016-02-18 Thread John . C . Harrison
From: John Harrison 

The scheduler keeps its own lock on various DRM objects in order to
guarantee safe access long after the original execbuff IOCTL has
completed. This is especially important when pre-emption is enabled as
the batch buffer might need to be submitted to the hardware multiple
times. This patch hooks the clean up of these locks into the request
retire function. The request can only be retired after it has
completed on the hardware and thus is no longer eligible for
re-submission. Thus there is no point holding on to the locks beyond
that time.

v3: Updated to not WARN when cleaning a node that is being cancelled.
The clean will happen later so skipping it at the point of
cancellation is fine.

v5: Squashed the i915_scheduler.c portions down into the 'start of
scheduler' patch. [Joonas Lahtinen]

For: VIZ-1587
Signed-off-by: John Harrison 
Cc: Joonas Lahtinen 
---
 drivers/gpu/drm/i915/i915_gem.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 1ab7256..2dd9b55 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1489,6 +1489,9 @@ static void i915_gem_request_retire(struct 
drm_i915_gem_request *request)
fence_signal_locked(&request->fence);
}
 
+   if (request->scheduler_qe)
+   i915_scheduler_clean_node(request->scheduler_qe);
+
i915_gem_request_unreference(request);
 }
 
-- 
1.9.1



[Intel-gfx] [PATCH v5 18/35] drm/i915: Added scheduler support to page fault handler

2016-02-18 Thread John . C . Harrison
From: John Harrison 

GPU page faults can now require scheduler operation in order to
complete. For example, in order to free up sufficient memory to handle
the fault, the handler must wait for a batch buffer to complete that
has not even been sent to the hardware yet. Thus EAGAIN no longer
necessarily means a GPU hang; it can also occur under normal operation.
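To illustrate the new -EAGAIN semantics, a caller would now treat the
error as transient: yield, then retry the whole operation rather than
assuming a hang. The sketch below is a stand-alone user-space mock
(mock_gem_fault(), attempts_left and the retry limit are all invented
for illustration), not the driver's actual fault path:

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>

/* Hypothetical fault-path operation: fails with -EAGAIN a few times
 * (e.g. while the scheduler frees up memory), then succeeds. */
static int attempts_left = 3;

static int mock_gem_fault(void)
{
	if (attempts_left > 0) {
		attempts_left--;
		return -EAGAIN;
	}
	return 0;
}

/* Retry loop: -EAGAIN is not treated as fatal; give other threads a
 * chance to run, then retry the failing operation in its entirety. */
static int fault_with_retry(int max_retries)
{
	int ret;

	do {
		ret = mock_gem_fault();
		if (ret == -EAGAIN)
			sched_yield();	/* let the scheduler's worker make progress */
	} while (ret == -EAGAIN && max_retries-- > 0);

	return ret;
}
```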

For: VIZ-1587
Signed-off-by: John Harrison 
---
 drivers/gpu/drm/i915/i915_gem.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 17b44b3..a47a495 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2003,10 +2003,15 @@ out:
}
case -EAGAIN:
/*
-* EAGAIN means the gpu is hung and we'll wait for the error
-* handler to reset everything when re-faulting in
+* EAGAIN can mean the gpu is hung and we'll have to wait for
+* the error handler to reset everything when re-faulting in
 * i915_mutex_lock_interruptible.
+*
+* It can also indicate various other nonfatal errors for which
+* the best response is to give other threads a chance to run,
+* and then retry the failing operation in its entirety.
 */
+   /*FALLTHRU*/
case 0:
case -ERESTARTSYS:
case -EINTR:
-- 
1.9.1



[Intel-gfx] [PATCH v5 12/35] drm/i915: Added deferred work handler for scheduler

2016-02-18 Thread John . C . Harrison
From: John Harrison 

The scheduler needs to do interrupt triggered work that is too complex
to do in the interrupt handler. Thus it requires a deferred work
handler to process such tasks asynchronously.
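The pattern added here (signal from an awkward context, process in a
worker) has a familiar user-space analogue: a condition variable stands
in for the kernel's work_struct, and a dedicated thread stands in for
the workqueue. The sketch below is purely illustrative; none of these
names are the i915 API, and the real code uses queue_work(), not pthreads:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static bool pending, done;
static int processed;

/* Worker thread: drains all pending work before exiting, doing the
 * heavy lifting outside the signalling ("interrupt") context. */
static void *worker(void *arg)
{
	pthread_mutex_lock(&lock);
	for (;;) {
		while (!pending && !done)
			pthread_cond_wait(&cond, &lock);
		if (pending) {
			pending = false;
			processed++;	/* the deferred processing */
			continue;
		}
		break;			/* done && nothing pending */
	}
	pthread_mutex_unlock(&lock);
	return NULL;
}

/* Analogue of i915_scheduler_wakeup(): cheap enough to call from the
 * event source; the actual processing happens in the worker. */
static void queue_scheduler_work(void)
{
	pthread_mutex_lock(&lock);
	pending = true;
	pthread_cond_signal(&cond);
	pthread_mutex_unlock(&lock);
}

/* Analogue of cancel_work_sync() at unload: stop and join the worker. */
static void stop_worker(pthread_t t)
{
	pthread_mutex_lock(&lock);
	done = true;
	pthread_cond_broadcast(&cond);
	pthread_mutex_unlock(&lock);
	pthread_join(t, NULL);
}
```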

v2: Updated to reduce mutex lock usage. The lock is now only held for
the minimum time within the remove function rather than for the whole
of the worker thread's operation.

v5: Removed objectionable white space and added some documentation.
[Joonas Lahtinen]

For: VIZ-1587
Signed-off-by: John Harrison 
Cc: Joonas Lahtinen 
---
 drivers/gpu/drm/i915/i915_dma.c   |  3 +++
 drivers/gpu/drm/i915/i915_drv.h   | 10 ++
 drivers/gpu/drm/i915/i915_gem.c   |  2 ++
 drivers/gpu/drm/i915/i915_scheduler.c | 29 +++--
 drivers/gpu/drm/i915/i915_scheduler.h |  1 +
 5 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 678adc7..c3d382d 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1158,6 +1158,9 @@ int i915_driver_unload(struct drm_device *dev)
WARN_ON(unregister_oom_notifier(&dev_priv->mm.oom_notifier));
unregister_shrinker(&dev_priv->mm.shrinker);
 
+   /* Cancel the scheduler work handler, which should be idle now. */
+   cancel_work_sync(&dev_priv->mm.scheduler_work);
+
io_mapping_free(dev_priv->gtt.mappable);
arch_phys_wc_del(dev_priv->gtt.mtrr);
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 03add1a..4d544f1 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1291,6 +1291,16 @@ struct i915_gem_mm {
struct delayed_work retire_work;
 
/**
+* New scheme is to get an interrupt after every work packet
+* in order to allow the low latency scheduling of pending
+* packets. The idea behind adding new packets to a pending
+* queue rather than directly into the hardware ring buffer
+* is to allow high priority packets to overtake low priority
+* ones.
+*/
+   struct work_struct scheduler_work;
+
+   /**
 * When we detect an idle GPU, we want to turn on
 * powersaving features. So once we see that there
 * are no more requests outstanding and no more
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index c3b7def..1ab7256 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -5427,6 +5427,8 @@ i915_gem_load(struct drm_device *dev)
  i915_gem_retire_work_handler);
INIT_DELAYED_WORK(&dev_priv->mm.idle_work,
  i915_gem_idle_work_handler);
+   INIT_WORK(&dev_priv->mm.scheduler_work,
+   i915_scheduler_work_handler);
init_waitqueue_head(&dev_priv->gpu_error.reset_queue);
 
dev_priv->relative_constants_mode = I915_EXEC_CONSTANTS_REL_GENERAL;
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c 
b/drivers/gpu/drm/i915/i915_scheduler.c
index ab5007a..3986890 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -697,7 +697,9 @@ static int i915_scheduler_remove_dependent(struct 
i915_scheduler *scheduler,
  */
 void i915_scheduler_wakeup(struct drm_device *dev)
 {
-   /* XXX: Need to call i915_scheduler_remove() via work handler. */
+   struct drm_i915_private *dev_priv = to_i915(dev);
+
+   queue_work(dev_priv->wq, &dev_priv->mm.scheduler_work);
 }
 
 /**
@@ -827,7 +829,7 @@ static bool i915_scheduler_remove(struct i915_scheduler 
*scheduler,
return do_submit;
 }
 
-void i915_scheduler_process_work(struct intel_engine_cs *ring)
+static void i915_scheduler_process_work(struct intel_engine_cs *ring)
 {
struct drm_i915_private *dev_priv = ring->dev->dev_private;
struct i915_scheduler *scheduler = dev_priv->scheduler;
@@ -874,6 +876,29 @@ void i915_scheduler_process_work(struct intel_engine_cs 
*ring)
 }
 
 /**
+ * i915_scheduler_work_handler - scheduler's work handler callback.
+ * @work: Work structure
+ * A lot of the scheduler's work must be done asynchronously in response to
+ * an interrupt or other event. However, that work cannot be done at
+ * interrupt time or in the context of the event signaller (which might in
+ * fact be an interrupt). Thus a worker thread is required. This function
+ * will cause the thread to wake up and do its processing.
+ */
+void i915_scheduler_work_handler(struct work_struct *work)
+{
+   struct intel_engine_cs *ring;
+   struct drm_i915_private *dev_priv;
+   struct drm_device *dev;
+   int i;
+
+   dev_priv = container_of(work, struct drm_i915_private, 
mm.scheduler_work);
+   dev = dev_priv->dev;
+
+   for_each_ring(ring, dev_priv, i)
+   i915_scheduler_process_work(ring);
+}
+

[Intel-gfx] [PATCH v5 26/35] drm/i915: Added debugfs interface to scheduler tuning parameters

2016-02-18 Thread John . C . Harrison
From: John Harrison 

There are various parameters within the scheduler which can be tuned
to improve performance, reduce memory footprint, etc. This change adds
support for altering these via debugfs.

v2: Updated for priorities now being signed values.

v5: Squashed priority bumping entries into this patch rather than a
separate patch all of their own.

For: VIZ-1587
Signed-off-by: John Harrison 
---
 drivers/gpu/drm/i915/i915_debugfs.c | 169 
 1 file changed, 169 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
b/drivers/gpu/drm/i915/i915_debugfs.c
index b923949..7d01c07 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -39,6 +39,7 @@
 #include "intel_ringbuffer.h"
 #include 
 #include "i915_drv.h"
+#include "i915_scheduler.h"
 
 enum {
ACTIVE_LIST,
@@ -1122,6 +1123,168 @@ DEFINE_SIMPLE_ATTRIBUTE(i915_next_seqno_fops,
i915_next_seqno_get, i915_next_seqno_set,
"0x%llx\n");
 
+static int
+i915_scheduler_priority_min_get(void *data, u64 *val)
+{
+   struct drm_device   *dev   = data;
+   struct drm_i915_private *dev_priv  = dev->dev_private;
+   struct i915_scheduler   *scheduler = dev_priv->scheduler;
+
+   *val = (u64) scheduler->priority_level_min;
+   return 0;
+}
+
+static int
+i915_scheduler_priority_min_set(void *data, u64 val)
+{
+   struct drm_device   *dev   = data;
+   struct drm_i915_private *dev_priv  = dev->dev_private;
+   struct i915_scheduler   *scheduler = dev_priv->scheduler;
+
+   scheduler->priority_level_min = (int32_t) val;
+   return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(i915_scheduler_priority_min_fops,
+   i915_scheduler_priority_min_get,
+   i915_scheduler_priority_min_set,
+   "%lld\n");
+
+static int
+i915_scheduler_priority_max_get(void *data, u64 *val)
+{
+   struct drm_device   *dev   = data;
+   struct drm_i915_private *dev_priv  = dev->dev_private;
+   struct i915_scheduler   *scheduler = dev_priv->scheduler;
+
+   *val = (u64) scheduler->priority_level_max;
+   return 0;
+}
+
+static int
+i915_scheduler_priority_max_set(void *data, u64 val)
+{
+   struct drm_device   *dev   = data;
+   struct drm_i915_private *dev_priv  = dev->dev_private;
+   struct i915_scheduler   *scheduler = dev_priv->scheduler;
+
+   scheduler->priority_level_max = (int32_t) val;
+   return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(i915_scheduler_priority_max_fops,
+   i915_scheduler_priority_max_get,
+   i915_scheduler_priority_max_set,
+   "%lld\n");
+
+static int
+i915_scheduler_priority_bump_get(void *data, u64 *val)
+{
+   struct drm_device   *dev   = data;
+   struct drm_i915_private *dev_priv  = dev->dev_private;
+   struct i915_scheduler   *scheduler = dev_priv->scheduler;
+
+   *val = (u64) scheduler->priority_level_bump;
+   return 0;
+}
+
+static int
+i915_scheduler_priority_bump_set(void *data, u64 val)
+{
+   struct drm_device   *dev   = data;
+   struct drm_i915_private *dev_priv  = dev->dev_private;
+   struct i915_scheduler   *scheduler = dev_priv->scheduler;
+
+   scheduler->priority_level_bump = (u32) val;
+   return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(i915_scheduler_priority_bump_fops,
+   i915_scheduler_priority_bump_get,
+   i915_scheduler_priority_bump_set,
+   "%lld\n");
+
+static int
+i915_scheduler_priority_preempt_get(void *data, u64 *val)
+{
+   struct drm_device   *dev   = data;
+   struct drm_i915_private *dev_priv  = dev->dev_private;
+   struct i915_scheduler   *scheduler = dev_priv->scheduler;
+
+   *val = (u64) scheduler->priority_level_preempt;
+   return 0;
+}
+
+static int
+i915_scheduler_priority_preempt_set(void *data, u64 val)
+{
+   struct drm_device   *dev   = data;
+   struct drm_i915_private *dev_priv  = dev->dev_private;
+   struct i915_scheduler   *scheduler = dev_priv->scheduler;
+
+   scheduler->priority_level_preempt = (u32) val;
+   return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(i915_scheduler_priority_preempt_fops,
+   i915_scheduler_priority_preempt_get,
+   i915_scheduler_priority_preempt_set,
+   "%lld\n");
+
+static int
+i915_scheduler_min_flying_get(void *data, u64 *val)
+{
+   struct drm_device   *dev   = data;
+   struct drm_i915_private *dev_priv  = dev->dev_private;
+   struct i915_scheduler   *scheduler = dev_priv->scheduler;
+
+   *val = (u64) scheduler->min_flying;
+   return 0;
+}
+
+static int
+i915_scheduler_min_flying_set(void *data, u64 val)
+{
+   struct drm_device   

[Intel-gfx] [PATCH v5 30/35] drm/i915: Add scheduler support functions for TDR

2016-02-18 Thread John . C . Harrison
From: John Harrison 

The TDR code needs to know what the scheduler is up to in order to
work out whether a ring is really hung or not.

v4: Removed some unnecessary braces to keep the style checker happy.

v5: Removed white space and added documentation. [Joonas Lahtinen]

Also updated for new module parameter.

For: VIZ-1587
Signed-off-by: John Harrison 
Cc: Joonas Lahtinen 
---
 drivers/gpu/drm/i915/i915_scheduler.c | 33 +
 drivers/gpu/drm/i915/i915_scheduler.h |  1 +
 2 files changed, 34 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_scheduler.c 
b/drivers/gpu/drm/i915/i915_scheduler.c
index 0068d03..c69e2b8 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -1627,3 +1627,36 @@ int i915_scheduler_closefile(struct drm_device *dev, 
struct drm_file *file)
 
return 0;
 }
+
+/**
+ * i915_scheduler_is_ring_flying - does the given ring have in flight batches?
+ * @ring: Ring to query
+ * Used by TDR to distinguish hung rings (not moving but with work to do)
+ * from idle rings (not moving because there is nothing to do). Returns true
+ * if the given ring has batches currently executing on the hardware.
+ */
+bool i915_scheduler_is_ring_flying(struct intel_engine_cs *ring)
+{
+   struct drm_i915_private *dev_priv = ring->dev->dev_private;
+   struct i915_scheduler *scheduler = dev_priv->scheduler;
+   struct i915_scheduler_queue_entry *node;
+   unsigned long flags;
+   bool found = false;
+
+   /* With the scheduler in bypass mode, no information can be returned. */
+   if (!i915.enable_scheduler)
+   return true;
+
+   spin_lock_irqsave(&scheduler->lock, flags);
+
+   list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
+   if (I915_SQS_IS_FLYING(node)) {
+   found = true;
+   break;
+   }
+   }
+
+   spin_unlock_irqrestore(&scheduler->lock, flags);
+
+   return found;
+}
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h 
b/drivers/gpu/drm/i915/i915_scheduler.h
index 065f2a3..dcf1f05 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -136,6 +136,7 @@ void i915_scheduler_clean_node(struct 
i915_scheduler_queue_entry *node);
 int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe);
 bool i915_scheduler_notify_request(struct drm_i915_gem_request *req);
 void i915_scheduler_wakeup(struct drm_device *dev);
+bool i915_scheduler_is_ring_flying(struct intel_engine_cs *ring);
 void i915_scheduler_work_handler(struct work_struct *work);
 int i915_scheduler_flush(struct intel_engine_cs *ring, bool is_locked);
 int i915_scheduler_flush_stamp(struct intel_engine_cs *ring,
-- 
1.9.1



[Intel-gfx] [PATCH v5 29/35] drm/i915: Added scheduler statistic reporting to debugfs

2016-02-18 Thread John . C . Harrison
From: John Harrison 

It is useful to know what the scheduler is doing, for both debugging
and performance analysis purposes. This change adds a bunch of
counters and such that keep track of various scheduler operations
(batches submitted, completed, flush requests, etc.). The data can
then be read in userland via the debugfs mechanism.

v2: Updated to match changes to scheduler implementation.

v3: Updated for changes to kill code and flush code.

v4: Removed the fence/sync code as that will be part of a separate
patch series. Wrapped a long line to keep the style checker happy.

v5: Updated to remove forward declarations and white space. Added
documentation. [Joonas Lahtinen]

Used lighter weight spinlocks.

For: VIZ-1587
Signed-off-by: John Harrison 
Cc: Joonas Lahtinen 
---
 drivers/gpu/drm/i915/i915_debugfs.c| 73 
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  3 ++
 drivers/gpu/drm/i915/i915_scheduler.c  | 78 --
 drivers/gpu/drm/i915/i915_scheduler.h  | 31 
 4 files changed, 180 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
b/drivers/gpu/drm/i915/i915_debugfs.c
index 7d01c07..2c8b00f 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -3595,6 +3595,78 @@ static int i915_drrs_status(struct seq_file *m, void 
*unused)
return 0;
 }
 
+static int i915_scheduler_info(struct seq_file *m, void *unused)
+{
+   struct drm_info_node *node = (struct drm_info_node *) m->private;
+   struct drm_device *dev = node->minor->dev;
+   struct drm_i915_private *dev_priv = dev->dev_private;
+   struct i915_scheduler   *scheduler = dev_priv->scheduler;
+   struct i915_scheduler_stats *stats = scheduler->stats;
+   struct i915_scheduler_stats_nodes node_stats[I915_NUM_RINGS];
+   struct intel_engine_cs *ring;
+   char   str[50 * (I915_NUM_RINGS + 1)], name[50], *ptr;
+   int ret, i, r;
+
+   ret = mutex_lock_interruptible(&dev->mode_config.mutex);
+   if (ret)
+   return ret;
+
+#define PRINT_VAR(name, fmt, var)  \
+   do {\
+   sprintf(str, "%-22s", name);\
+   ptr = str + strlen(str);\
+   for_each_ring(ring, dev_priv, r) {  \
+   sprintf(ptr, " %10" fmt, var);  \
+   ptr += strlen(ptr); \
+   }   \
+   seq_printf(m, "%s\n", str); \
+   } while (0)
+
+   PRINT_VAR("Ring name:", "s", dev_priv->ring[r].name);
+   PRINT_VAR("  Ring seqno",   "d", ring->get_seqno(ring, false));
+   seq_putc(m, '\n');
+
+   seq_puts(m, "Batch submissions:\n");
+   PRINT_VAR("  Queued",   "u", stats[r].queued);
+   PRINT_VAR("  Submitted","u", stats[r].submitted);
+   PRINT_VAR("  Completed","u", stats[r].completed);
+   PRINT_VAR("  Expired",  "u", stats[r].expired);
+   seq_putc(m, '\n');
+
+   seq_puts(m, "Flush counts:\n");
+   PRINT_VAR("  By object","u", stats[r].flush_obj);
+   PRINT_VAR("  By request",   "u", stats[r].flush_req);
+   PRINT_VAR("  By stamp", "u", stats[r].flush_stamp);
+   PRINT_VAR("  Blanket",  "u", stats[r].flush_all);
+   PRINT_VAR("  Entries bumped",   "u", stats[r].flush_bump);
+   PRINT_VAR("  Entries submitted","u", stats[r].flush_submit);
+   seq_putc(m, '\n');
+
+   seq_puts(m, "Miscellaneous:\n");
+   PRINT_VAR("  ExecEarly retry",  "u", stats[r].exec_early);
+   PRINT_VAR("  ExecFinal requeue","u", stats[r].exec_again);
+   PRINT_VAR("  ExecFinal killed", "u", stats[r].exec_dead);
+   PRINT_VAR("  Hung flying",  "u", stats[r].kill_flying);
+   PRINT_VAR("  Hung queued",  "u", stats[r].kill_queued);
+   seq_putc(m, '\n');
+
+   seq_puts(m, "Queue contents:\n");
+   for_each_ring(ring, dev_priv, i)
+   i915_scheduler_query_stats(ring, node_stats + ring->id);
+
+   for (i = 0; i < (i915_sqs_MAX + 1); i++) {
+   sprintf(name, "  %s", i915_scheduler_queue_status_str(i));
+   PRINT_VAR(name, "d", node_stats[r].counts[i]);
+   }
+   seq_putc(m, '\n');
+
+#undef PRINT_VAR
+
+   mutex_unlock(&dev->mode_config.mutex);
+
+   return 0;
+}
+
 struct pipe_crc_info {
const char *name;
struct drm_device *dev;
@@ -5565,6 +5637,7 @@ static const struct drm_info_list i915_debugfs_list[] = {

[Intel-gfx] [PATCH v5 14/35] drm/i915: Keep the reserved space mechanism happy

2016-02-18 Thread John . C . Harrison
From: John Harrison 

Ring space is reserved when constructing a request to ensure that the
subsequent 'add_request()' call cannot fail due to waiting for space
on a busy or broken GPU. However, the scheduler jumps into the middle
of the execbuffer process between request creation and request
submission. Thus it needs to cancel the reserved space when the
request is simply added to the scheduler's queue and not yet
submitted. Similarly, it needs to re-reserve the space when it finally
does want to send the batch buffer to the hardware.
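The reserve/cancel/re-reserve dance described above can be modelled
with a simple counter. The names below are invented for illustration
(the real calls are intel_ring_reserve_space() and
intel_ring_reserved_space_cancel()), and 'reserved' stands in for the
ring space held for the eventual add_request():

```c
#include <assert.h>

static int reserved;	/* ring space currently held for add_request() */

static void ring_reserve(void)	{ reserved++; }
static void ring_cancel(void)	{ reserved--; }

/* execbuf ioctl: request creation reserves space up front. */
static void create_request(void)	{ ring_reserve(); }

/* Batch only queued in software, not submitted: give the space back. */
static void scheduler_queue(void)	{ ring_cancel(); }

/* Scheduler finally sends the batch to the hardware: re-reserve. */
static void scheduler_submit(void)	{ ring_reserve(); }

/* add_request() consumes the reservation as it writes to the ring. */
static void add_request(void)		{ ring_cancel(); }
```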

v3: Updated to use locally cached request pointer.

v5: Updated due to changes to earlier patches in series - for runtime
PM calls and splitting bypass mode into a separate function.

For: VIZ-1587
Signed-off-by: John Harrison 
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 20 ++--
 drivers/gpu/drm/i915/i915_scheduler.c  |  4 
 drivers/gpu/drm/i915/intel_lrc.c   | 13 +++--
 3 files changed, 29 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 09c5ce9..11bea8d 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1295,18 +1295,22 @@ int i915_gem_ringbuffer_submission_final(struct 
i915_execbuffer_params *params)
/* The mutex must be acquired before calling this function */
WARN_ON(!mutex_is_locked(>dev->struct_mutex));
 
+   ret = intel_ring_reserve_space(req);
+   if (ret)
+   goto error;
+
/*
 * Unconditionally invalidate gpu caches and ensure that we do flush
 * any residual writes from the previous batch.
 */
ret = intel_ring_invalidate_all_caches(req);
if (ret)
-   return ret;
+   goto error;
 
/* Switch to the correct context for the batch */
ret = i915_switch_context(req);
if (ret)
-   return ret;
+   goto error;
 
WARN(params->ctx->ppgtt && params->ctx->ppgtt->pd_dirty_rings & 
(1 << ring->id), "%s didn't clear reload\n", ring->name);
@@ -1315,7 +1319,7 @@ int i915_gem_ringbuffer_submission_final(struct 
i915_execbuffer_params *params)
params->instp_mode != dev_priv->relative_constants_mode) {
ret = intel_ring_begin(req, 4);
if (ret)
-   return ret;
+   goto error;
 
intel_ring_emit(ring, MI_NOOP);
intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
@@ -1329,7 +1333,7 @@ int i915_gem_ringbuffer_submission_final(struct 
i915_execbuffer_params *params)
if (params->args_flags & I915_EXEC_GEN7_SOL_RESET) {
ret = i915_reset_gen7_sol_offsets(params->dev, req);
if (ret)
-   return ret;
+   goto error;
}
 
exec_len   = params->args_batch_len;
@@ -1343,13 +1347,17 @@ int i915_gem_ringbuffer_submission_final(struct 
i915_execbuffer_params *params)
exec_start, exec_len,
params->dispatch_flags);
if (ret)
-   return ret;
+   goto error;
 
trace_i915_gem_ring_dispatch(req, params->dispatch_flags);
 
i915_gem_execbuffer_retire_commands(params);
 
-   return 0;
+error:
+   if (ret)
+   intel_ring_reserved_space_cancel(req->ringbuf);
+
+   return ret;
 }
 
 /**
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c 
b/drivers/gpu/drm/i915/i915_scheduler.c
index 3986890..a3ffd04 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -483,6 +483,8 @@ static int i915_scheduler_queue_execbuffer_bypass(struct 
i915_scheduler_queue_en
struct i915_scheduler *scheduler = dev_priv->scheduler;
int ret;
 
+   intel_ring_reserved_space_cancel(qe->params.request->ringbuf);
+
scheduler->flags[qe->params.ring->id] |= i915_sf_submitting;
ret = dev_priv->gt.execbuf_final(>params);
scheduler->flags[qe->params.ring->id] &= ~i915_sf_submitting;
@@ -539,6 +541,8 @@ int i915_scheduler_queue_execbuffer(struct 
i915_scheduler_queue_entry *qe)
node->stamp  = jiffies;
i915_gem_request_reference(node->params.request);
 
+   intel_ring_reserved_space_cancel(node->params.request->ringbuf);
+
WARN_ON(node->params.request->scheduler_qe);
node->params.request->scheduler_qe = node;
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index ff4565f..f4bab82 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -978,13 +978,17 @@ int intel_execlists_submission_final(struct 
i915_execbuffer_params *params)
/* The mutex must be acquired before calling this function */

[Intel-gfx] [PATCH v5 25/35] drm/i915: Added scheduler queue throttling by DRM file handle

2016-02-18 Thread John . C . Harrison
From: John Harrison 

The scheduler decouples the submission of batch buffers to the driver
from their subsequent submission to the hardware. This means that an
application which is continuously submitting buffers as fast as it can
could potentially flood the driver. To prevent this, the driver now
tracks how many buffers are in progress (queued in software or
executing in hardware) and limits this to a given (tunable) number. If
this number is exceeded then the queue to the driver will return
EAGAIN and thus prevent the scheduler's queue becoming arbitrarily
large.
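The counting scheme amounts to a per-file counter checked against a
tunable cap. A minimal user-space sketch follows; the struct and
function names are illustrative, not the driver's actual types (the
real cap lives in scheduler->file_queue_max, default 64):

```c
#include <assert.h>
#include <errno.h>

/* Per-file throttle: batches queued in software or executing in
 * hardware count against 'length'; new ones bounce once 'max' is hit. */
struct file_queue {
	unsigned int length;
	unsigned int max;
};

static int file_queue_add(struct file_queue *q)
{
	if (q->length >= q->max)
		return -EAGAIN;		/* caller should back off and retry */
	q->length++;
	return 0;
}

/* Batch retired: free a slot so the application can submit again. */
static void file_queue_complete(struct file_queue *q)
{
	q->length--;
}
```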

v3: Added a missing decrement of the file queue counter.

v4: Updated a comment.

v5: Updated due to changes to earlier patches in series - removing
forward declarations and white space. Also added some documentation.
[Joonas Lahtinen]

For: VIZ-1587
Signed-off-by: John Harrison 
---
 drivers/gpu/drm/i915/i915_drv.h|  2 ++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  8 +
 drivers/gpu/drm/i915/i915_scheduler.c  | 48 ++
 drivers/gpu/drm/i915/i915_scheduler.h  |  2 ++
 4 files changed, 60 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 071a27b..3f4c4f0 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -336,6 +336,8 @@ struct drm_i915_file_private {
} rps;
 
struct intel_engine_cs *bsd_ring;
+
+   u32 scheduler_queue_length;
 };
 
 enum intel_dpll_id {
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index d4de8c7..dff120c 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1803,6 +1803,10 @@ i915_gem_execbuffer(struct drm_device *dev, void *data,
return -EINVAL;
}
 
+   /* Throttle batch requests per device file */
+   if (i915_scheduler_file_queue_is_full(file))
+   return -EAGAIN;
+
/* Copy in the exec list from userland */
exec_list = drm_malloc_ab(sizeof(*exec_list), args->buffer_count);
exec2_list = drm_malloc_ab(sizeof(*exec2_list), args->buffer_count);
@@ -1893,6 +1897,10 @@ i915_gem_execbuffer2(struct drm_device *dev, void *data,
return -EINVAL;
}
 
+   /* Throttle batch requests per device file */
+   if (i915_scheduler_file_queue_is_full(file))
+   return -EAGAIN;
+
exec2_list = kmalloc(sizeof(*exec2_list)*args->buffer_count,
 GFP_TEMPORARY | __GFP_NOWARN | __GFP_NORETRY);
if (exec2_list == NULL)
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c 
b/drivers/gpu/drm/i915/i915_scheduler.c
index e56ce08..f7f29d5 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -69,6 +69,7 @@ int i915_scheduler_init(struct drm_device *dev)
scheduler->priority_level_bump= 50;
scheduler->priority_level_preempt = 900;
scheduler->min_flying = 2;
+   scheduler->file_queue_max = 64;
 
dev_priv->scheduler = scheduler;
 
@@ -464,6 +465,44 @@ static int i915_scheduler_submit_unlocked(struct 
intel_engine_cs *ring)
return ret;
 }
 
+/**
+ * i915_scheduler_file_queue_is_full - Returns true if the queue is full.
+ * @file: File object to query.
+ * This allows throttling of applications by limiting the total number of
+ * outstanding requests to a specified level. Once that limit is reached,
+ * this call will return true and no more requests should be accepted.
+ */
+bool i915_scheduler_file_queue_is_full(struct drm_file *file)
+{
+   struct drm_i915_file_private *file_priv = file->driver_priv;
+   struct drm_i915_private *dev_priv  = file_priv->dev_priv;
+   struct i915_scheduler *scheduler = dev_priv->scheduler;
+
+   return file_priv->scheduler_queue_length >= scheduler->file_queue_max;
+}
+
+/**
+ * i915_scheduler_file_queue_inc - Increment the file's request queue count.
+ * @file: File object to process.
+ */
+static void i915_scheduler_file_queue_inc(struct drm_file *file)
+{
+   struct drm_i915_file_private *file_priv = file->driver_priv;
+
+   file_priv->scheduler_queue_length++;
+}
+
+/**
+ * i915_scheduler_file_queue_dec - Decrement the file's request queue count.
+ * @file: File object to process.
+ */
+static void i915_scheduler_file_queue_dec(struct drm_file *file)
+{
+   struct drm_i915_file_private *file_priv = file->driver_priv;
+
+   file_priv->scheduler_queue_length--;
+}
+
 static void i915_generate_dependencies(struct i915_scheduler *scheduler,
   struct i915_scheduler_queue_entry *node,
   uint32_t ring)
@@ -640,6 +679,8 @@ int i915_scheduler_queue_execbuffer(struct 
i915_scheduler_queue_entry *qe)
 
list_add_tail(&node->link, &scheduler->node_queue[ring->id]);
 
+  

[Intel-gfx] [PATCH v5 22/35] drm/i915: Support for 'unflushed' ring idle

2016-02-18 Thread John . C . Harrison
From: John Harrison 

When the seqno wraps around zero, the entire GPU is forced to be idle
for some reason (possibly only to work around issues with hardware
semaphores but no-one seems too sure!). This causes a problem if the
force idle occurs at an inopportune moment such as in the middle of
submitting a batch buffer. Specifically, it would lead to recursive
submits - submitting work requires a new seqno, the new seqno requires
idling the ring, idling the ring requires submitting work, submitting
work requires a new seqno...

This change adds a 'flush' parameter to the idle function call which
specifies whether the scheduler queues should be flushed out. I.e. is
the call intended to just idle the ring as it is right now (no flush)
or is it intended to force all outstanding work out of the system
(with flush).

In the seqno wrap case, pending work is not an issue because the next
operation will be to submit it. However, in other cases, the intention
is to make sure everything that could be done has been done.
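The recursion hazard can be shown with a toy model: flushing drains the
software queue by submitting, submitting may itself need to idle the
ring (the seqno-wrap path), and that inner idle must not flush again.
All names below are invented for illustration:

```c
#include <assert.h>
#include <stdbool.h>

static int queued;		/* batches waiting in the software queue */
static int depth, max_depth;	/* track idle() nesting */

static void submit_one(void);

static void ring_idle(bool flush)
{
	depth++;
	if (depth > max_depth)
		max_depth = depth;
	if (flush) {
		/* Force all outstanding work out of the system. If
		 * submit_one() idled with flush=true, this would recurse
		 * without bound. */
		while (queued > 0)
			submit_one();
	}
	depth--;
}

static void submit_one(void)
{
	/* Seqno-wrap path inside submission: idle the ring WITHOUT
	 * flushing the scheduler queue. */
	ring_idle(false);
	queued--;
}
```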

For: VIZ-1587
Signed-off-by: John Harrison 
---
 drivers/gpu/drm/i915/i915_gem.c |  4 ++--
 drivers/gpu/drm/i915/intel_lrc.c|  2 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c | 17 +++--
 drivers/gpu/drm/i915/intel_ringbuffer.h |  2 +-
 4 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d7f7f7a..a249e52 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2564,7 +2564,7 @@ i915_gem_init_seqno(struct drm_device *dev, u32 seqno)
 
/* Carefully retire all requests without writing to the rings */
for_each_ring(ring, dev_priv, i) {
-   ret = intel_ring_idle(ring);
+   ret = intel_ring_idle(ring, false);
if (ret)
return ret;
}
@@ -3808,7 +3808,7 @@ int i915_gpu_idle(struct drm_device *dev)
i915_add_request_no_flush(req);
}
 
-   ret = intel_ring_idle(ring);
+   ret = intel_ring_idle(ring, true);
if (ret)
return ret;
}
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index f4bab82..e056875 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1058,7 +1058,7 @@ void intel_logical_ring_stop(struct intel_engine_cs *ring)
if (!intel_ring_initialized(ring))
return;
 
-   ret = intel_ring_idle(ring);
+   ret = intel_ring_idle(ring, true);
if (ret && !i915_reset_in_progress(&to_i915(ring->dev)->gpu_error))
DRM_ERROR("failed to quiesce %s whilst cleaning up: %d\n",
  ring->name, ret);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c 
b/drivers/gpu/drm/i915/intel_ringbuffer.c
index a2093f5..70ef9f0 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2288,9 +2288,22 @@ static void __wrap_ring_buffer(struct intel_ringbuffer 
*ringbuf)
intel_ring_update_space(ringbuf);
 }
 
-int intel_ring_idle(struct intel_engine_cs *ring)
+int intel_ring_idle(struct intel_engine_cs *ring, bool flush)
 {
struct drm_i915_gem_request *req;
+   int ret;
+
+   /*
+* NB: Must not flush the scheduler if this idle request is from
+* within an execbuff submission (i.e. due to 'get_seqno' calling
+* 'wrap_seqno' calling 'idle'). As that would lead to recursive
+* flushes!
+*/
+   if (flush) {
+   ret = i915_scheduler_flush(ring, true);
+   if (ret)
+   return ret;
+   }
 
/* Wait upon the last request to be completed */
if (list_empty(>request_list))
@@ -3095,7 +3108,7 @@ intel_stop_ring_buffer(struct intel_engine_cs *ring)
if (!intel_ring_initialized(ring))
return;
 
-   ret = intel_ring_idle(ring);
+   ret = intel_ring_idle(ring, true);
if (ret && !i915_reset_in_progress(&to_i915(ring->dev)->gpu_error))
DRM_ERROR("failed to quiesce %s whilst cleaning up: %d\n",
  ring->name, ret);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h 
b/drivers/gpu/drm/i915/intel_ringbuffer.h
index ada93a9..cca476f 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -478,7 +478,7 @@ void intel_ring_update_space(struct intel_ringbuffer 
*ringbuf);
 int intel_ring_space(struct intel_ringbuffer *ringbuf);
 bool intel_ring_stopped(struct intel_engine_cs *ring);
 
-int __must_check intel_ring_idle(struct intel_engine_cs *ring);
+int __must_check intel_ring_idle(struct intel_engine_cs *ring, bool flush);
 void intel_ring_init_seqno(struct intel_engine_cs *ring, u32 seqno);
 int intel_ring_flush_all_caches(struct 

[Intel-gfx] [PATCH v5 33/35] drm/i915: Add scheduling priority to per-context parameters

2016-02-18 Thread John . C . Harrison
From: Dave Gordon 

Added an interface for user land applications/libraries/services to
set their GPU scheduler priority. This extends the existing context
parameter IOCTL interface to add a scheduler priority parameter. The
range is +/-1023 with +ve numbers meaning higher priority. Only
system processes may set a higher priority than the default (zero);
normal applications may only lower theirs.

v2: New patch in series.

For: VIZ-1587
Signed-off-by: Dave Gordon 
Signed-off-by: John Harrison 
---
 drivers/gpu/drm/i915/i915_drv.h| 14 ++
 drivers/gpu/drm/i915/i915_gem_context.c| 24 
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  3 +++
 include/uapi/drm/i915_drm.h|  1 +
 4 files changed, 42 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 3f4c4f0..5d02f44 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -847,6 +847,19 @@ struct i915_ctx_hang_stats {
bool banned;
 };
 
+/*
+ * User-settable GFX scheduler priorities are on a scale of -1023 (I don't
+ * care about running) to +1023 (I'm the most important thing in existence)
+ * with zero being the default. Any process may decrease its scheduling
+ * priority, but only a sufficiently privileged process may increase it
+ * beyond zero.
+ */
+
+struct i915_ctx_sched_info {
+   /* Scheduling priority */
+   int32_t priority;
+};
+
 struct i915_fence_timeline {
char name[32];
unsigned fence_context;
@@ -887,6 +900,7 @@ struct intel_context {
int flags;
struct drm_i915_file_private *file_priv;
struct i915_ctx_hang_stats hang_stats;
+   struct i915_ctx_sched_info sched_info;
struct i915_hw_ppgtt *ppgtt;
 
/* Legacy ring buffer submission */
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
b/drivers/gpu/drm/i915/i915_gem_context.c
index 3dcb2f4..6ac03e8 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -956,6 +956,9 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, 
void *data,
else
args->value = to_i915(dev)->gtt.base.total;
break;
+   case I915_CONTEXT_PARAM_PRIORITY:
+   args->value = (__u64) ctx->sched_info.priority;
+   break;
default:
ret = -EINVAL;
break;
@@ -993,6 +996,7 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, 
void *data,
else
ctx->hang_stats.ban_period_seconds = args->value;
break;
+
case I915_CONTEXT_PARAM_NO_ZEROMAP:
if (args->size) {
ret = -EINVAL;
@@ -1001,6 +1005,26 @@ int i915_gem_context_setparam_ioctl(struct drm_device 
*dev, void *data,
ctx->flags |= args->value ? CONTEXT_NO_ZEROMAP : 0;
}
break;
+
+   case I915_CONTEXT_PARAM_PRIORITY:
+   {
+   int32_t priority = (int32_t) args->value;
+   struct drm_i915_private *dev_priv  = dev->dev_private;
+   struct i915_scheduler   *scheduler = dev_priv->scheduler;
+
+   if (args->size)
+   ret = -EINVAL;
+   else if ((priority > scheduler->priority_level_max) ||
+(priority < scheduler->priority_level_min))
+   ret = -EINVAL;
+   else if ((priority > 0) &&
+!capable(CAP_SYS_ADMIN))
+   ret = -EPERM;
+   else
+   ctx->sched_info.priority = priority;
+   break;
+   }
+
default:
ret = -EINVAL;
break;
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index a42a13e..793fbce 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1707,6 +1707,9 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
params->args_DR4= args->DR4;
params->batch_obj   = batch_obj;
 
+   /* Start with the context's priority level */
+   qe.priority = ctx->sched_info.priority;
+
/*
 * Save away the list of objects used by this batch buffer for the
 * purpose of tracking inter-buffer dependencies.
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index acf2102..8a01a47 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1140,6 +1140,7 @@ struct drm_i915_gem_context_param {
 #define I915_CONTEXT_PARAM_BAN_PERIOD  0x1
 #define I915_CONTEXT_PARAM_NO_ZEROMAP  0x2
 #define I915_CONTEXT_PARAM_GTT_SIZE0x3
+#define I915_CONTEXT_PARAM_PRIORITY0x4
   

[Intel-gfx] [PATCH v5 19/35] drm/i915: Added scheduler flush calls to ring throttle and idle functions

2016-02-18 Thread John . C . Harrison
From: John Harrison 

When requesting that all GPU work is completed, it is now necessary to
get the scheduler involved in order to flush out work that has been
queued but not yet submitted.

v2: Updated to add support for flushing the scheduler queue by time
stamp rather than just doing a blanket flush.

v3: Moved submit_max_priority() to this patch from an earlier patch
as it is no longer required in the other.

v4: Corrected the format of a comment to keep the style checker happy.
Downgraded a BUG_ON to a WARN_ON as the latter is preferred.

v5: Shuffled functions around to remove forward prototypes, removed
similarly offensive white space and added documentation. Re-worked the
mutex locking around the submit function. [Joonas Lahtinen]

Used lighter weight spinlocks.

For: VIZ-1587
Signed-off-by: John Harrison 
Cc: Joonas Lahtinen 
---
 drivers/gpu/drm/i915/i915_gem.c   |  24 -
 drivers/gpu/drm/i915/i915_scheduler.c | 178 ++
 drivers/gpu/drm/i915/i915_scheduler.h |   3 +
 3 files changed, 204 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index a47a495..d946f53 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3786,6 +3786,10 @@ int i915_gpu_idle(struct drm_device *dev)
 
/* Flush everything onto the inactive list. */
for_each_ring(ring, dev_priv, i) {
+   ret = i915_scheduler_flush(ring, true);
+   if (ret < 0)
+   return ret;
+
if (!i915.enable_execlists) {
struct drm_i915_gem_request *req;
 
@@ -4519,7 +4523,8 @@ i915_gem_ring_throttle(struct drm_device *dev, struct 
drm_file *file)
unsigned long recent_enough = jiffies - DRM_I915_THROTTLE_JIFFIES;
struct drm_i915_gem_request *request, *target = NULL;
unsigned reset_counter;
-   int ret;
+   int i, ret;
+   struct intel_engine_cs *ring;
 
ret = i915_gem_wait_for_error(&dev_priv->gpu_error);
if (ret)
@@ -4529,6 +4534,23 @@ i915_gem_ring_throttle(struct drm_device *dev, struct 
drm_file *file)
if (ret)
return ret;
 
+   for_each_ring(ring, dev_priv, i) {
+   /*
+* Flush out scheduler entries that are getting 'stale'. Note
+* that the following recent_enough test will only check
+* against the time at which the request was submitted to the
+* hardware (i.e. when it left the scheduler) not the time it
+* was submitted to the driver.
+*
+* Also, there is not much point worrying about busy return
+* codes from the scheduler flush call. Even if more work
+* cannot be submitted right now for whatever reason, we
+* still want to throttle against stale work that has already
+* been submitted.
+*/
+   i915_scheduler_flush_stamp(ring, recent_enough, false);
+   }
+
spin_lock(&file_priv->mm.lock);
list_for_each_entry(request, &file_priv->mm.request_list, client_list) {
if (time_after_eq(request->emitted_jiffies, recent_enough))
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c 
b/drivers/gpu/drm/i915/i915_scheduler.c
index edab63d..8130a9c 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -304,6 +304,10 @@ static int i915_scheduler_pop_from_queue_locked(struct 
intel_engine_cs *ring,
  * attempting to acquire a mutex while holding a spin lock is a Bad Idea.
  * And releasing the one before acquiring the other leads to other code
  * being run and interfering.
+ *
+ * Hence any caller that does not already have the mutex lock for other
+ * reasons should call i915_scheduler_submit_unlocked() instead in order to
+ * obtain the lock first.
  */
 static int i915_scheduler_submit(struct intel_engine_cs *ring)
 {
@@ -428,6 +432,22 @@ error:
return ret;
 }
 
+static int i915_scheduler_submit_unlocked(struct intel_engine_cs *ring)
+{
+   struct drm_device *dev = ring->dev;
+   int ret;
+
+   ret = i915_mutex_lock_interruptible(dev);
+   if (ret)
+   return ret;
+
+   ret = i915_scheduler_submit(ring);
+
+   mutex_unlock(&dev->struct_mutex);
+
+   return ret;
+}
+
 static void i915_generate_dependencies(struct i915_scheduler *scheduler,
   struct i915_scheduler_queue_entry *node,
   uint32_t ring)
@@ -917,6 +937,164 @@ void i915_scheduler_work_handler(struct work_struct *work)
i915_scheduler_process_work(ring);
 }
 
+static int i915_scheduler_submit_max_priority(struct intel_engine_cs *ring,
+ bool is_locked)
+{
+   struct 

[Intel-gfx] [PATCH v5 20/35] drm/i915: Add scheduler hook to GPU reset

2016-02-18 Thread John . C . Harrison
From: John Harrison 

When the watchdog resets the GPU, all interrupts get disabled despite
the reference count remaining. As the scheduler probably had
interrupts enabled during the reset (it would have been waiting for
the bad batch to complete), it must be poked to tell it that the
interrupt has been disabled.

v5: New patch in series.

For: VIZ-1587
Signed-off-by: John Harrison 
---
 drivers/gpu/drm/i915/i915_gem.c   |  2 ++
 drivers/gpu/drm/i915/i915_scheduler.c | 11 +++
 drivers/gpu/drm/i915/i915_scheduler.h |  1 +
 3 files changed, 14 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d946f53..d7f7f7a 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3248,6 +3248,8 @@ static void i915_gem_reset_ring_cleanup(struct 
drm_i915_private *dev_priv,
buffer->last_retired_head = buffer->tail;
intel_ring_update_space(buffer);
}
+
+   i915_scheduler_reset_cleanup(ring);
 }
 
 void i915_gem_reset(struct drm_device *dev)
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c 
b/drivers/gpu/drm/i915/i915_scheduler.c
index 8130a9c..4f25bf2 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -778,6 +778,17 @@ void i915_scheduler_clean_node(struct 
i915_scheduler_queue_entry *node)
}
 }
 
+void i915_scheduler_reset_cleanup(struct intel_engine_cs *ring)
+{
+   struct drm_i915_private *dev_priv = ring->dev->dev_private;
+   struct i915_scheduler *scheduler = dev_priv->scheduler;
+
+   if (scheduler->flags[ring->id] & i915_sf_interrupts_enabled) {
+   ring->irq_put(ring);
+   scheduler->flags[ring->id] &= ~i915_sf_interrupts_enabled;
+   }
+}
+
 static bool i915_scheduler_remove(struct i915_scheduler *scheduler,
  struct intel_engine_cs *ring,
  struct list_head *remove)
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h 
b/drivers/gpu/drm/i915/i915_scheduler.h
index 839b048..075befb 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -89,6 +89,7 @@ bool i915_scheduler_is_enabled(struct drm_device *dev);
 int i915_scheduler_init(struct drm_device *dev);
 int i915_scheduler_closefile(struct drm_device *dev,
 struct drm_file *file);
+void i915_scheduler_reset_cleanup(struct intel_engine_cs *ring);
 void i915_scheduler_clean_node(struct i915_scheduler_queue_entry *node);
 int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe);
 bool i915_scheduler_notify_request(struct drm_i915_gem_request *req);
-- 
1.9.1

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH v5 21/35] drm/i915: Added a module parameter to allow the scheduler to be disabled

2016-02-18 Thread John . C . Harrison
From: John Harrison 

It can be useful to be able to disable the GPU scheduler via a module
parameter for debugging purposes.

v5: Converted from a multi-feature 'overrides' mask to a single
'enable' boolean. Further features (e.g. pre-emption) will now be
separate 'enable' booleans added later. [Chris Wilson]

For: VIZ-1587
Signed-off-by: John Harrison 
---
 drivers/gpu/drm/i915/i915_params.c| 4 
 drivers/gpu/drm/i915/i915_params.h| 1 +
 drivers/gpu/drm/i915/i915_scheduler.c | 5 -
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_params.c 
b/drivers/gpu/drm/i915/i915_params.c
index d0eba58..0ef3159 100644
--- a/drivers/gpu/drm/i915/i915_params.c
+++ b/drivers/gpu/drm/i915/i915_params.c
@@ -57,6 +57,7 @@ struct i915_params i915 __read_mostly = {
.edp_vswing = 0,
.enable_guc_submission = true,
.guc_log_level = -1,
+   .enable_scheduler = 0,
 };
 
 module_param_named(modeset, i915.modeset, int, 0400);
@@ -203,3 +204,6 @@ MODULE_PARM_DESC(enable_guc_submission, "Enable GuC 
submission (default:false)")
 module_param_named(guc_log_level, i915.guc_log_level, int, 0400);
 MODULE_PARM_DESC(guc_log_level,
"GuC firmware logging level (-1:disabled (default), 0-3:enabled)");
+
+module_param_named_unsafe(enable_scheduler, i915.enable_scheduler, int, 0600);
+MODULE_PARM_DESC(enable_scheduler, "Enable scheduler (0 = disable [default], 1 
= enable)");
diff --git a/drivers/gpu/drm/i915/i915_params.h 
b/drivers/gpu/drm/i915/i915_params.h
index 5299290..f855c86 100644
--- a/drivers/gpu/drm/i915/i915_params.h
+++ b/drivers/gpu/drm/i915/i915_params.h
@@ -60,6 +60,7 @@ struct i915_params {
bool enable_guc_submission;
bool verbose_state_checks;
bool nuclear_pageflip;
+   int enable_scheduler;
 };
 
 extern struct i915_params i915 __read_mostly;
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c 
b/drivers/gpu/drm/i915/i915_scheduler.c
index 4f25bf2..47d7de4 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -34,6 +34,9 @@ bool i915_scheduler_is_enabled(struct drm_device *dev)
 {
struct drm_i915_private *dev_priv = dev->dev_private;
 
+   if (!i915.enable_scheduler)
+   return false;
+
return dev_priv->scheduler != NULL;
 }
 
@@ -548,7 +551,7 @@ int i915_scheduler_queue_execbuffer(struct 
i915_scheduler_queue_entry *qe)
 
WARN_ON(!scheduler);
 
-   if (1/*!i915.enable_scheduler*/)
+   if (!i915.enable_scheduler)
return i915_scheduler_queue_execbuffer_bypass(qe);
 
node = kmalloc(sizeof(*node), GFP_KERNEL);
-- 
1.9.1



[Intel-gfx] [PATCH v5 17/35] drm/i915: Added scheduler support to __wait_request() calls

2016-02-18 Thread John . C . Harrison
From: John Harrison 

The scheduler can cause batch buffers, and hence requests, to be
submitted to the ring out of order and asynchronously to their
submission to the driver. Thus at the point of waiting for the
completion of a given request, it is not even guaranteed that the
request has actually been sent to the hardware yet. Even if it has
been sent, it is possible that it could be pre-empted and thus
'unsent'.

This means that it is necessary to be able to submit requests to the
hardware during the wait call itself. Unfortunately, while some
callers of __wait_request() release the mutex lock first, others do
not (and apparently can not). Hence there is the ability to deadlock
as the wait stalls for submission but the asynchronous submission is
stalled for the mutex lock.

This change hooks the scheduler in to the __wait_request() code to
ensure correct behaviour. That is, flush the target batch buffer
through to the hardware and do not deadlock waiting for something that
cannot currently be submitted. Instead, the wait call must return
EAGAIN at least as far back as necessary to release the mutex lock and
allow the scheduler's asynchronous processing to get in and handle the
pre-emption operation and eventually (re-)submit the work.

v3: Removed the explicit scheduler flush from i915_wait_request().
This is no longer necessary and was causing unintended changes to the
scheduler priority level which broke a validation team test.

v4: Corrected the format of some comments to keep the style checker
happy.

v5: Added function description. [Joonas Lahtinen]

For: VIZ-1587
Signed-off-by: John Harrison 
Cc: Joonas Lahtinen 
---
 drivers/gpu/drm/i915/i915_drv.h |  3 ++-
 drivers/gpu/drm/i915/i915_gem.c | 37 ++---
 drivers/gpu/drm/i915/i915_scheduler.c   | 31 +++
 drivers/gpu/drm/i915/i915_scheduler.h   |  2 ++
 drivers/gpu/drm/i915/intel_display.c|  5 +++--
 drivers/gpu/drm/i915/intel_ringbuffer.c |  2 +-
 6 files changed, 69 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 4d544f1..5eeeced 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3071,7 +3071,8 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
unsigned reset_counter,
bool interruptible,
s64 *timeout,
-   struct intel_rps_client *rps);
+   struct intel_rps_client *rps,
+   bool is_locked);
 int __must_check i915_wait_request(struct drm_i915_gem_request *req);
 int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf);
 int __must_check
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 2dd9b55..17b44b3 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1258,7 +1258,8 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
unsigned reset_counter,
bool interruptible,
s64 *timeout,
-   struct intel_rps_client *rps)
+   struct intel_rps_client *rps,
+   bool is_locked)
 {
struct intel_engine_cs *ring = i915_gem_request_get_ring(req);
struct drm_device *dev = ring->dev;
@@ -1268,8 +1269,10 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
DEFINE_WAIT(wait);
unsigned long timeout_expire;
s64 before = 0; /* Only to silence a compiler warning. */
-   int ret;
+   int ret = 0;
+   bool busy;
 
+   might_sleep();
WARN(!intel_irqs_enabled(dev_priv), "IRQs disabled");
 
if (i915_gem_request_completed(req))
@@ -1324,6 +1327,26 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
break;
}
 
+   if (is_locked) {
+   /*
+* If this request is being processed by the scheduler
+* then it is unsafe to sleep with the mutex lock held
+* as the scheduler may require the lock in order to
+* progress the request.
+*/
+   if (i915_scheduler_is_request_tracked(req, NULL, &busy)) {
+   if (busy) {
+   ret = -EAGAIN;
+   break;
+   }
+   }
+
+   /*
+* If the request is not tracked by the scheduler
+* then the regular test can be done.
+*/
+   }
+
if (i915_gem_request_completed(req)) {
  

[Intel-gfx] [PATCH v5 31/35] drm/i915: Scheduler state dump via debugfs

2016-02-18 Thread John . C . Harrison
From: John Harrison 

Added a facility for triggering the scheduler state dump via a debugfs
entry.

v2: New patch in series.

For: VIZ-1587
Signed-off-by: John Harrison 
---
 drivers/gpu/drm/i915/i915_debugfs.c   | 33 +
 drivers/gpu/drm/i915/i915_scheduler.c |  9 +
 drivers/gpu/drm/i915/i915_scheduler.h |  6 ++
 3 files changed, 44 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
b/drivers/gpu/drm/i915/i915_debugfs.c
index 2c8b00f..e0dc06d77 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1285,6 +1285,38 @@ 
DEFINE_SIMPLE_ATTRIBUTE(i915_scheduler_file_queue_max_fops,
i915_scheduler_file_queue_max_set,
"%llu\n");
 
+static int
+i915_scheduler_dump_flags_get(void *data, u64 *val)
+{
+   struct drm_device   *dev   = data;
+   struct drm_i915_private *dev_priv  = dev->dev_private;
+   struct i915_scheduler   *scheduler = dev_priv->scheduler;
+
+   *val = scheduler->dump_flags;
+
+   return 0;
+}
+
+static int
+i915_scheduler_dump_flags_set(void *data, u64 val)
+{
+   struct drm_device   *dev   = data;
+   struct drm_i915_private *dev_priv  = dev->dev_private;
+   struct i915_scheduler   *scheduler = dev_priv->scheduler;
+
+   scheduler->dump_flags = lower_32_bits(val) & i915_sf_dump_mask;
+
+   if (val & 1)
+   i915_scheduler_dump_all(dev, "DebugFS");
+
+   return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(i915_scheduler_dump_flags_fops,
+   i915_scheduler_dump_flags_get,
+   i915_scheduler_dump_flags_set,
+   "0x%llx\n");
+
 static int i915_frequency_info(struct seq_file *m, void *unused)
 {
struct drm_info_node *node = m->private;
@@ -5666,6 +5698,7 @@ static const struct i915_debugfs_files {
	{"i915_scheduler_priority_preempt", &i915_scheduler_priority_preempt_fops},
	{"i915_scheduler_min_flying", &i915_scheduler_min_flying_fops},
	{"i915_scheduler_file_queue_max", &i915_scheduler_file_queue_max_fops},
+	{"i915_scheduler_dump_flags", &i915_scheduler_dump_flags_fops},
	{"i915_display_crc_ctl", &i915_display_crc_ctl_fops},
	{"i915_pri_wm_latency", &i915_pri_wm_latency_fops},
	{"i915_spr_wm_latency", &i915_spr_wm_latency_fops},
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c 
b/drivers/gpu/drm/i915/i915_scheduler.c
index c69e2b8..b738e0b 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -184,6 +184,10 @@ int i915_scheduler_init(struct drm_device *dev)
scheduler->priority_level_preempt = 900;
scheduler->min_flying = 2;
scheduler->file_queue_max = 64;
+   scheduler->dump_flags = i915_sf_dump_force   |
+   i915_sf_dump_details |
+   i915_sf_dump_seqno   |
+   i915_sf_dump_dependencies;
 
dev_priv->scheduler = scheduler;
 
@@ -1311,10 +1315,7 @@ static int i915_scheduler_dump_all_locked(struct 
drm_device *dev,
int i, r, ret = 0;
 
for_each_ring(ring, dev_priv, i) {
-   scheduler->flags[ring->id] |= i915_sf_dump_force   |
- i915_sf_dump_details |
- i915_sf_dump_seqno   |
- i915_sf_dump_dependencies;
+   scheduler->flags[ring->id] |= scheduler->dump_flags & 
i915_sf_dump_mask;
r = i915_scheduler_dump_locked(ring, msg);
if (ret == 0)
ret = r;
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h 
b/drivers/gpu/drm/i915/i915_scheduler.h
index dcf1f05..47c7951 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -108,6 +108,7 @@ struct i915_scheduler {
int32_t priority_level_preempt;
uint32_t min_flying;
uint32_t file_queue_max;
+   uint32_t dump_flags;
 
/* Statistics: */
struct i915_scheduler_stats stats[I915_NUM_RINGS];
@@ -124,6 +125,11 @@ enum {
i915_sf_dump_details= (1 << 9),
i915_sf_dump_dependencies   = (1 << 10),
i915_sf_dump_seqno  = (1 << 11),
+
+   i915_sf_dump_mask   = i915_sf_dump_force|
+ i915_sf_dump_details  |
+ i915_sf_dump_dependencies |
+ i915_sf_dump_seqno,
 };
 const char *i915_scheduler_flag_str(uint32_t flags);
 
-- 
1.9.1


[Intel-gfx] [PATCH v5 32/35] drm/i915: Enable GPU scheduler by default

2016-02-18 Thread John . C . Harrison
From: John Harrison 

Now that all the scheduler patches have been applied, it is safe to enable.

v5: Updated for new module parameter.

For: VIZ-1587
Signed-off-by: John Harrison 
---
 drivers/gpu/drm/i915/i915_params.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_params.c 
b/drivers/gpu/drm/i915/i915_params.c
index 0ef3159..9be486f 100644
--- a/drivers/gpu/drm/i915/i915_params.c
+++ b/drivers/gpu/drm/i915/i915_params.c
@@ -57,7 +57,7 @@ struct i915_params i915 __read_mostly = {
.edp_vswing = 0,
.enable_guc_submission = true,
.guc_log_level = -1,
-   .enable_scheduler = 0,
+   .enable_scheduler = 1,
 };
 
 module_param_named(modeset, i915.modeset, int, 0400);
@@ -206,4 +206,4 @@ MODULE_PARM_DESC(guc_log_level,
"GuC firmware logging level (-1:disabled (default), 0-3:enabled)");
 
 module_param_named_unsafe(enable_scheduler, i915.enable_scheduler, int, 0600);
-MODULE_PARM_DESC(enable_scheduler, "Enable scheduler (0 = disable [default], 1 
= enable)");
+MODULE_PARM_DESC(enable_scheduler, "Enable scheduler (0 = disable, 1 = enable 
[default])");
-- 
1.9.1



[Intel-gfx] [PATCH v5 24/35] drm/i915: Added trace points to scheduler

2016-02-18 Thread John . C . Harrison
From: John Harrison 

Added trace points to the scheduler to track all the various events,
node state transitions and other interesting things that occur.

v2: Updated for new request completion tracking implementation.

v3: Updated for changes to node kill code.

v4: Wrapped some long lines to keep the style checker happy.

For: VIZ-1587
Signed-off-by: John Harrison 
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   2 +
 drivers/gpu/drm/i915/i915_scheduler.c  |  26 
 drivers/gpu/drm/i915/i915_trace.h  | 196 +
 drivers/gpu/drm/i915/intel_lrc.c   |   2 +
 4 files changed, 226 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index b9ad0fd..d4de8c7 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1272,6 +1272,8 @@ i915_gem_ringbuffer_submission(struct 
i915_execbuffer_params *params,
 
i915_gem_execbuffer_move_to_active(vmas, params->request);
 
+   trace_i915_gem_ring_queue(ring, params);
+
qe = container_of(params, typeof(*qe), params);
ret = i915_scheduler_queue_execbuffer(qe);
if (ret)
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c 
b/drivers/gpu/drm/i915/i915_scheduler.c
index 47d7de4..e56ce08 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -88,6 +88,8 @@ static void i915_scheduler_node_requeue(struct 
i915_scheduler_queue_entry *node)
/* Seqno will be reassigned on relaunch */
node->params.request->seqno = 0;
node->status = i915_sqs_queued;
+   trace_i915_scheduler_unfly(node->params.ring, node);
+   trace_i915_scheduler_node_state_change(node->params.ring, node);
 }
 
 /*
@@ -99,7 +101,11 @@ static void i915_scheduler_node_kill(struct 
i915_scheduler_queue_entry *node)
WARN_ON(!node);
WARN_ON(I915_SQS_IS_COMPLETE(node));
 
+   if (I915_SQS_IS_FLYING(node))
+   trace_i915_scheduler_unfly(node->params.ring, node);
+
node->status = i915_sqs_dead;
+   trace_i915_scheduler_node_state_change(node->params.ring, node);
 }
 
 /* Mark a node as in flight on the hardware. */
@@ -124,6 +130,9 @@ static int i915_scheduler_node_fly(struct 
i915_scheduler_queue_entry *node)
 
node->status = i915_sqs_flying;
 
+   trace_i915_scheduler_fly(ring, node);
+   trace_i915_scheduler_node_state_change(ring, node);
+
if (!(scheduler->flags[ring->id] & i915_sf_interrupts_enabled)) {
bool success = true;
 
@@ -280,6 +289,8 @@ static int i915_scheduler_pop_from_queue_locked(struct 
intel_engine_cs *ring,
INIT_LIST_HEAD(&best->link);
best->status  = i915_sqs_popped;
 
+   trace_i915_scheduler_node_state_change(ring, best);
+
ret = 0;
} else {
/* Can only get here if:
@@ -297,6 +308,8 @@ static int i915_scheduler_pop_from_queue_locked(struct 
intel_engine_cs *ring,
}
}
 
+   trace_i915_scheduler_pop_from_queue(ring, best);
+
*pop_node = best;
return ret;
 }
@@ -506,6 +519,8 @@ static int i915_scheduler_queue_execbuffer_bypass(struct 
i915_scheduler_queue_en
struct i915_scheduler *scheduler = dev_priv->scheduler;
int ret;
 
+   trace_i915_scheduler_queue(qe->params.ring, qe);
+
intel_ring_reserved_space_cancel(qe->params.request->ringbuf);
 
scheduler->flags[qe->params.ring->id] |= i915_sf_submitting;
@@ -628,6 +643,9 @@ int i915_scheduler_queue_execbuffer(struct 
i915_scheduler_queue_entry *qe)
not_flying = i915_scheduler_count_flying(scheduler, ring) <
 scheduler->min_flying;
 
+   trace_i915_scheduler_queue(ring, node);
+   trace_i915_scheduler_node_state_change(ring, node);
+
spin_unlock_irq(&scheduler->lock);
 
if (not_flying)
@@ -657,6 +675,8 @@ bool i915_scheduler_notify_request(struct 
drm_i915_gem_request *req)
struct i915_scheduler_queue_entry *node = req->scheduler_qe;
unsigned long flags;
 
+   trace_i915_scheduler_landing(req);
+
if (!node)
return false;
 
@@ -670,6 +690,8 @@ bool i915_scheduler_notify_request(struct 
drm_i915_gem_request *req)
else
node->status = i915_sqs_complete;
 
+   trace_i915_scheduler_node_state_change(req->ring, node);
+
spin_unlock_irqrestore(&scheduler->lock, flags);
 
return true;
@@ -877,6 +899,8 @@ static bool i915_scheduler_remove(struct i915_scheduler 
*scheduler,
/* Launch more packets now? */
do_submit = (queued > 0) && (flying < scheduler->min_flying);
 
+   trace_i915_scheduler_remove(ring, min_seqno, do_submit);
+
spin_unlock_irq(&scheduler->lock);
 
return do_submit;
@@ -912,6 +936,8 @@ static void 

[Intel-gfx] [PATCH v5 15/35] drm/i915: Added tracking/locking of batch buffer objects

2016-02-18 Thread John . C . Harrison
From: John Harrison 

The scheduler needs to track interdependencies between batch buffers.
These are calculated by analysing the object lists of the buffers and
looking for commonality. The scheduler also needs to keep those
buffers locked long after the initial IOCTL call has returned to user
land.

v3: Updated to support read-read optimisation.

v5: Updated due to changes to earlier patches in series for splitting
bypass mode into a separate function and consolidating the clean up code.

For: VIZ-1587
Signed-off-by: John Harrison 
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 48 --
 drivers/gpu/drm/i915/i915_scheduler.c  | 15 ++
 2 files changed, 61 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 11bea8d..f45f4dc 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1428,7 +1428,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
struct i915_execbuffer_params *params = &qe.params;
const u32 ctx_id = i915_execbuffer2_get_context_id(*args);
u32 dispatch_flags;
-   int ret;
+   int ret, i;
bool need_relocs;
 
if (!i915_gem_check_execbuffer(args))
@@ -1543,6 +1543,14 @@ i915_gem_do_execbuffer(struct drm_device *dev, void 
*data,
goto pre_mutex_err;
}
 
+   qe.saved_objects = kzalloc(
+   sizeof(*qe.saved_objects) * args->buffer_count,
+   GFP_KERNEL);
+   if (!qe.saved_objects) {
+   ret = -ENOMEM;
+   goto err;
+   }
+
/* Look up object handles */
ret = eb_lookup_vmas(eb, exec, args, vm, file);
if (ret)
@@ -1663,7 +1671,30 @@ i915_gem_do_execbuffer(struct drm_device *dev, void 
*data,
params->args_DR1= args->DR1;
params->args_DR4= args->DR4;
params->batch_obj   = batch_obj;
-   params->ctx = ctx;
+
+   /*
+* Save away the list of objects used by this batch buffer for the
+* purpose of tracking inter-buffer dependencies.
+*/
+   for (i = 0; i < args->buffer_count; i++) {
+   struct drm_i915_gem_object *obj;
+
+   /*
+* NB: 'drm_gem_object_lookup()' increments the object's
+* reference count and so must be matched by a
+* 'drm_gem_object_unreference' call.
+*/
+   obj = to_intel_bo(drm_gem_object_lookup(dev, file,
+ exec[i].handle));
+   qe.saved_objects[i].obj   = obj;
+   qe.saved_objects[i].read_only = obj->base.pending_write_domain 
== 0;
+
+   }
+   qe.num_objs = i;
+
+   /* Lock and save the context object as well. */
+   i915_gem_context_reference(ctx);
+   params->ctx = ctx;
 
ret = dev_priv->gt.execbuf_submit(params, args, >vmas);
if (ret)
@@ -1696,6 +1727,19 @@ err:
i915_gem_context_unreference(ctx);
eb_destroy(eb);
 
+   /* Need to release the objects: */
+   if (qe.saved_objects) {
+   for (i = 0; i < qe.num_objs; i++)
+   drm_gem_object_unreference(
+   &qe.saved_objects[i].obj->base);
+
+   kfree(qe.saved_objects);
+   }
+
+   /* Context too */
+   if (params->ctx)
+   i915_gem_context_unreference(params->ctx);
+
/*
 * If the request was created but not successfully submitted then it
 * must be freed again. If it was submitted then it is being tracked
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c 
b/drivers/gpu/drm/i915/i915_scheduler.c
index a3ffd04..60a59d3 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -719,6 +719,8 @@ void i915_scheduler_wakeup(struct drm_device *dev)
  */
 void i915_scheduler_clean_node(struct i915_scheduler_queue_entry *node)
 {
+   int i;
+
if (!I915_SQS_IS_COMPLETE(node)) {
WARN(!node->params.request->cancelled,
 "Cleaning active node: %d!\n", node->status);
@@ -736,6 +738,19 @@ void i915_scheduler_clean_node(struct i915_scheduler_queue_entry *node)
node->params.batch_obj = NULL;
}
 
+   /* Release the locked buffers: */
+   for (i = 0; i < node->num_objs; i++)
+   drm_gem_object_unreference(&node->saved_objects[i].obj->base);
+   kfree(node->saved_objects);
+   node->saved_objects = NULL;
+   node->num_objs = 0;
+
+   /* Context too: */
+   if (node->params.ctx) {
+   i915_gem_context_unreference(node->params.ctx);
+   node->params.ctx = NULL;
+   }
+
/* And anything else owned by the 

[Intel-gfx] [PATCH v5 27/35] drm/i915: Added debug state dump facilities to scheduler

2016-02-18 Thread John . C . Harrison
From: John Harrison 

When debugging batch buffer submission issues, it is useful to be able
to see what the current state of the scheduler is. This change adds
functions for decoding the internal scheduler state and reporting it.

v3: Updated a debug message with the new state_str() function.

v4: Wrapped some long lines to keep the style checker happy. Removed
the fence/sync code as that will now be part of a separate patch series.

v5: Removed forward declarations and white space. Added documentation.
[Joonas Lahtinen]

Also squashed in later patch to add seqno information from the start.
It was only being added in a separate patch due to historical reasons
which have since gone away.

For: VIZ-1587
Signed-off-by: John Harrison 
Cc: Joonas Lahtinen 
---
 drivers/gpu/drm/i915/i915_scheduler.c | 302 +-
 drivers/gpu/drm/i915/i915_scheduler.h |  15 ++
 2 files changed, 315 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index f7f29d5..d0eed52 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -40,6 +40,117 @@ bool i915_scheduler_is_enabled(struct drm_device *dev)
return dev_priv->scheduler != NULL;
 }
 
+const char *i915_qe_state_str(struct i915_scheduler_queue_entry *node)
+{
+   static char str[50];
+   char*ptr = str;
+
+   *(ptr++) = node->bumped ? 'B' : '-';
+   *(ptr++) = i915_gem_request_completed(node->params.request) ? 'C' : '-';
+
+   *ptr = 0;
+
+   return str;
+}
+
+char i915_scheduler_queue_status_chr(enum i915_scheduler_queue_status status)
+{
+   switch (status) {
+   case i915_sqs_none:
+   return 'N';
+
+   case i915_sqs_queued:
+   return 'Q';
+
+   case i915_sqs_popped:
+   return 'X';
+
+   case i915_sqs_flying:
+   return 'F';
+
+   case i915_sqs_complete:
+   return 'C';
+
+   case i915_sqs_dead:
+   return 'D';
+
+   default:
+   break;
+   }
+
+   return '?';
+}
+
+const char *i915_scheduler_queue_status_str(
+   enum i915_scheduler_queue_status status)
+{
+   static char str[50];
+
+   switch (status) {
+   case i915_sqs_none:
+   return "None";
+
+   case i915_sqs_queued:
+   return "Queued";
+
+   case i915_sqs_popped:
+   return "Popped";
+
+   case i915_sqs_flying:
+   return "Flying";
+
+   case i915_sqs_complete:
+   return "Complete";
+
+   case i915_sqs_dead:
+   return "Dead";
+
+   default:
+   break;
+   }
+
+   sprintf(str, "[Unknown_%d!]", status);
+   return str;
+}
+
+const char *i915_scheduler_flag_str(uint32_t flags)
+{
+   static char str[100];
+   char *ptr = str;
+
+   *ptr = 0;
+
+#define TEST_FLAG(flag, msg)   \
+   do {\
+   if (flags & (flag)) {   \
+   strcpy(ptr, msg);   \
+   ptr += strlen(ptr); \
+   flags &= ~(flag);   \
+   }   \
+   } while (0)
+
+   TEST_FLAG(i915_sf_interrupts_enabled, "IntOn|");
+   TEST_FLAG(i915_sf_submitting, "Submitting|");
+   TEST_FLAG(i915_sf_dump_force, "DumpForce|");
+   TEST_FLAG(i915_sf_dump_details,   "DumpDetails|");
+   TEST_FLAG(i915_sf_dump_dependencies,  "DumpDeps|");
+   TEST_FLAG(i915_sf_dump_seqno, "DumpSeqno|");
+
+#undef TEST_FLAG
+
+   if (flags) {
+   sprintf(ptr, "Unknown_0x%X!", flags);
+   ptr += strlen(ptr);
+   }
+
+   if (ptr == str)
+   strcpy(str, "-");
+   else
+   ptr[-1] = 0;
+
+   return str;
+}
+
 /**
  * i915_scheduler_init - Initialise the scheduler.
  * @dev: DRM device
@@ -1024,6 +1135,193 @@ void i915_scheduler_work_handler(struct work_struct *work)
i915_scheduler_process_work(ring);
 }
 
+static int i915_scheduler_dump_locked(struct intel_engine_cs *ring,
+ const char *msg)
+{
+   struct drm_i915_private *dev_priv;
+   struct i915_scheduler *scheduler;
+   struct i915_scheduler_queue_entry *node;
+   int flying = 0, queued = 0, complete = 0, other = 0;
+   static int old_flying = -1, old_queued = -1, old_complete = -1;
+   bool b_dump;
+   char brkt[2] = { '<', '>' };
+
+   if (!ring)
+   return -EINVAL;
+
+   dev_priv = ring->dev->dev_private;
+   scheduler = dev_priv->scheduler;
+
+   list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
+   if 

[Intel-gfx] [PATCH v5 23/35] drm/i915: Defer seqno allocation until actual hardware submission time

2016-02-18 Thread John . C . Harrison
From: John Harrison 

The seqno value is now only used for the final test for completion of
a request. It is no longer used to track the request through the
software stack. Thus it is no longer necessary to allocate the seqno
immediately with the request. Instead, it can be done lazily and left
until the request is actually sent to the hardware. This is particularly
advantageous with a GPU scheduler as the requests can then be
re-ordered between their creation and their hardware submission
without having out of order seqnos.
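
The reserve-then-assign scheme described above can be illustrated with a small userspace model. This is a sketch only: the struct layout, the counter handling, and the function names here are illustrative assumptions, not the driver's actual symbols (the real code lives in i915_gem_request_alloc() and __i915_add_request()).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the reserve-then-assign seqno scheme. */
struct request {
	uint32_t seqno;          /* 0 (invalid) until hardware submission */
	uint32_t reserved_seqno; /* allocated up front at request creation */
};

static uint32_t next_seqno = 1; /* 0 is reserved as "invalid" */

/* Allocate an identifier at creation time, because allocation can fail
 * and the eventual point of submission must not. */
static void request_alloc(struct request *req)
{
	req->seqno = 0;
	req->reserved_seqno = next_seqno++;
	if (next_seqno == 0)
		next_seqno = 1; /* skip the invalid value on wrap */
}

/* Make the seqno live only when the request actually reaches the
 * hardware, so scheduler reordering cannot produce out-of-order
 * live seqnos. */
static void request_submit(struct request *req)
{
	if (!req->seqno)
		req->seqno = req->reserved_seqno;
}
```

The key property is that a queued-but-unsubmitted request always reports a seqno of zero, so nothing downstream can do arithmetic on a value that reordering might later invalidate.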

v2: i915_add_request() can't fail!

Combine with 'drm/i915: Assign seqno at start of exec_final()'
Various bits of code during the execbuf code path need a seqno value
to be assigned to the request. This change makes this assignment
explicit at the start of submission_final() rather than relying on an
auto-generated seqno to have happened already. This is in preparation
for a future patch which changes seqno values to be assigned lazily
(during add_request).

v3: Updated to use locally cached request pointer.

v4: Changed some white space and comment formatting to keep the style
checker happy.

For: VIZ-1587
Signed-off-by: John Harrison 
---
 drivers/gpu/drm/i915/i915_drv.h|  1 +
 drivers/gpu/drm/i915/i915_gem.c| 23 ++-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 14 ++
 drivers/gpu/drm/i915/intel_lrc.c   | 14 ++
 4 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 5eeeced..071a27b 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2257,6 +2257,7 @@ struct drm_i915_gem_request {
  * has finished processing this request.
  */
u32 seqno;
+   u32 reserved_seqno;
 
/* Unique identifier which can be used for trace points & debug */
uint32_t uniq;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index a249e52..a2c136d 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2616,6 +2616,11 @@ i915_gem_get_seqno(struct drm_device *dev, u32 *seqno)
 
/* reserve 0 for non-seqno */
if (dev_priv->next_seqno == 0) {
+   /*
+* Why is the full re-initialisation required? Is it only for
+* hardware semaphores? If so, could skip it in the case where
+* semaphores are disabled?
+*/
int ret = i915_gem_init_seqno(dev, 0);
if (ret)
return ret;
@@ -2673,6 +2678,12 @@ void __i915_add_request(struct drm_i915_gem_request *request,
WARN(ret, "*_ring_flush_all_caches failed: %d!\n", ret);
}
 
+   /* Make the request's seqno 'live': */
+   if (!request->seqno) {
+   request->seqno = request->reserved_seqno;
+   WARN_ON(request->seqno != dev_priv->last_seqno);
+   }
+
/* Record the position of the start of the request so that
 * should we detect the updated seqno part-way through the
 * GPU processing the request, we never over-estimate the
@@ -2930,6 +2941,9 @@ void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked)
 
	list_for_each_entry_safe(req, req_next, &ring->fence_signal_list, signal_link) {
if (!req->cancelled) {
+   /* How can this happen? */
+   WARN_ON(req->seqno == 0);
+
if (!i915_seqno_passed(seqno, req->seqno))
break;
}
@@ -3079,7 +3093,14 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
if (req == NULL)
return -ENOMEM;
 
-   ret = i915_gem_get_seqno(ring->dev, &req->seqno);
+   /*
+* Assign an identifier to track this request through the hardware
+* but don't make it live yet. It could change in the future if this
+* request gets overtaken. However, it still needs to be allocated
+* in advance because the point of submission must not fail and seqno
+* allocation can fail.
+*/
+   ret = i915_gem_get_seqno(ring->dev, &req->reserved_seqno);
if (ret)
goto err;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index f45f4dc..b9ad0fd 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1295,6 +1295,20 @@ int i915_gem_ringbuffer_submission_final(struct i915_execbuffer_params *params)
/* The mutex must be acquired before calling this function */
	WARN_ON(!mutex_is_locked(&params->dev->struct_mutex));
 
+   /* Make sure the request's seqno is the latest and greatest: */
+   if (req->reserved_seqno != dev_priv->last_seqno) {
+   ret = 

[Intel-gfx] [PATCH v5 28/35] drm/i915: Add early exit to execbuff_final() if insufficient ring space

2016-02-18 Thread John . C . Harrison
From: John Harrison 

One of the major purposes of the GPU scheduler is to avoid stalling
the CPU when the GPU is busy and unable to accept more work. This
change adds support to the ring submission code to allow a ring space
check to be performed before attempting to submit a batch buffer to
the hardware. If insufficient space is available then the scheduler
can go away and come back later, letting the CPU get on with other
work, rather than stalling and waiting for the hardware to catch up.
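
The space-check logic can be modelled in a few lines. This is a sketch under assumed names and a simplified free-space formula; the driver's real helpers are __intel_ring_space() and the intel_ring_test_space() call introduced by this patch, and the real constant is I915_BATCH_EXEC_MAX_LEN.

```c
#include <assert.h>
#include <stdint.h>

/* Generous upper bound on dwords emitted during one submission
 * (illustrative; the patch uses 256 for a measured ~186). */
#define BATCH_EXEC_MAX_DWORDS 256

/* Simplified free-space calculation for a circular ring buffer. */
static int ring_free_bytes(int head, int tail, int size)
{
	int space = head - tail;

	if (space <= 0)
		space += size;
	return space - 1; /* never fill the ring completely */
}

/* Return 0 if a full submission is guaranteed to fit, -1 otherwise
 * (the driver would return -EAGAIN and retry later). The requirement
 * is doubled because the command block may not span the end-to-start
 * wrap of the ring. */
static int test_ring_space(int head, int tail, int size)
{
	int min_space = BATCH_EXEC_MAX_DWORDS * 2 * (int)sizeof(uint32_t);

	return ring_free_bytes(head, tail, size) >= min_space ? 0 : -1;
}
```

Checking once, up front, for the worst case means no intel_ring_begin() call inside the submission path can stall or fail part-way through an emitted command sequence.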

v3: Updated to use locally cached request pointer.

v4: Line wrapped some comments differently to keep the style checker
happy. Downgraded a BUG_ON to a WARN_ON as the latter is preferred.

Removed some obsolete, commented out code.

For: VIZ-1587
Signed-off-by: John Harrison 
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 41 +--
 drivers/gpu/drm/i915/intel_lrc.c   | 54 +++---
 drivers/gpu/drm/i915/intel_ringbuffer.c| 26 ++
 drivers/gpu/drm/i915/intel_ringbuffer.h|  1 +
 4 files changed, 107 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index dff120c..83ce94d 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1147,25 +1147,19 @@ i915_reset_gen7_sol_offsets(struct drm_device *dev,
 {
struct intel_engine_cs *ring = req->ring;
struct drm_i915_private *dev_priv = dev->dev_private;
-   int ret, i;
+   int i;
 
	if (!IS_GEN7(dev) || ring != &dev_priv->ring[RCS]) {
DRM_DEBUG("sol reset is gen7/rcs only\n");
return -EINVAL;
}
 
-   ret = intel_ring_begin(req, 4 * 3);
-   if (ret)
-   return ret;
-
for (i = 0; i < 4; i++) {
intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
intel_ring_emit_reg(ring, GEN7_SO_WRITE_OFFSET(i));
intel_ring_emit(ring, 0);
}
 
-   intel_ring_advance(ring);
-
return 0;
 }
 
@@ -1293,6 +1287,7 @@ int i915_gem_ringbuffer_submission_final(struct i915_execbuffer_params *params)
struct intel_engine_cs  *ring = params->ring;
u64 exec_start, exec_len;
int ret;
+   uint32_t min_space;
 
/* The mutex must be acquired before calling this function */
	WARN_ON(!mutex_is_locked(&params->dev->struct_mutex));
@@ -1316,6 +1311,34 @@ int i915_gem_ringbuffer_submission_final(struct i915_execbuffer_params *params)
goto error;
 
/*
+* It would be a bad idea to run out of space while writing commands
+* to the ring. One of the major aims of the scheduler is to not
+* stall at any point for any reason. However, doing an early exit
+* half way through submission could result in a partial sequence
+* being written which would leave the engine in an unknown state.
+* Therefore, check in advance that there will be enough space for
+* the entire submission whether emitted by the code below OR by any
+* other functions that may be executed before the end of final().
+*
+* NB: This test deliberately overestimates, because that's easier
+* than tracing every potential path that could be taken!
+*
+* Current measurements suggest that we may need to emit up to 186
+* dwords, so this is rounded up to 256 here. Then double that to get
+* the free space requirement, because the block is not allowed to
+* span the transition from the end to the beginning of the ring.
+*/
+#define I915_BATCH_EXEC_MAX_LEN 256   /* max dwords emitted here */
+   min_space = I915_BATCH_EXEC_MAX_LEN * 2 * sizeof(uint32_t);
+   ret = intel_ring_test_space(req->ringbuf, min_space);
+   if (ret)
+   goto error;
+
+   ret = intel_ring_begin(req, I915_BATCH_EXEC_MAX_LEN);
+   if (ret)
+   goto error;
+
+   /*
 * Unconditionally invalidate gpu caches and ensure that we do flush
 * any residual writes from the previous batch.
 */
@@ -1333,10 +1356,6 @@ int i915_gem_ringbuffer_submission_final(struct i915_execbuffer_params *params)
 
	if (ring == &dev_priv->ring[RCS] &&
params->instp_mode != dev_priv->relative_constants_mode) {
-   ret = intel_ring_begin(req, 4);
-   if (ret)
-   goto error;
-
intel_ring_emit(ring, MI_NOOP);
intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
intel_ring_emit_reg(ring, INSTPM);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 2b9f49c..e124443 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -231,6 +231,27 @@ static void 

[Intel-gfx] [PATCH v5 11/35] drm/i915: Added scheduler hook into i915_gem_request_notify()

2016-02-18 Thread John . C . Harrison
From: John Harrison 

The scheduler needs to know when requests have completed so that it
can keep its own internal state up to date and can submit new requests
to the hardware from its queue.

v2: Updated due to changes in request handling. The operation is now
reversed from before. Rather than the scheduler being in control of
completion events, it is now the request code itself. The scheduler
merely receives a notification event. It can then optionally request
its worker thread be woken up after all completion processing is
complete.

v4: Downgraded a BUG_ON to a WARN_ON as the latter is preferred.

v5: Squashed the i915_scheduler.c portions down into the 'start of
scheduler' patch. [Joonas Lahtinen]

For: VIZ-1587
Signed-off-by: John Harrison 
Cc: Joonas Lahtinen 
---
 drivers/gpu/drm/i915/i915_gem.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 0003cfc..c3b7def 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2872,6 +2872,7 @@ void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked)
 {
struct drm_i915_gem_request *req, *req_next;
unsigned long flags;
+   bool wake_sched = false;
u32 seqno;
 
	if (list_empty(&ring->fence_signal_list)) {
@@ -2908,6 +2909,14 @@ void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked)
 */
		list_del_init(&req->signal_link);
 
+   /*
+* NB: Must notify the scheduler before signalling
+* the node. Otherwise the node can get retired first
+* and call scheduler_clean() while the scheduler
+* thinks it is still active.
+*/
+   wake_sched |= i915_scheduler_notify_request(req);
+
if (!req->cancelled) {
			fence_signal_locked(&req->fence);
trace_i915_gem_request_complete(req);
@@ -2924,6 +2933,13 @@ void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked)
 
if (!fence_locked)
		spin_unlock_irqrestore(&ring->fence_lock, flags);
+
+   /* Necessary? Or does the fence_signal() call do an implicit wakeup? */
+   wake_up_all(&ring->irq_queue);
+
+   /* Final scheduler processing after all individual updates are done. */
+   if (wake_sched)
+   i915_scheduler_wakeup(ring->dev);
 }
 
 static const char *i915_gem_request_get_driver_name(struct fence *req_fence)
-- 
1.9.1

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH v5 08/35] drm/i915: Disable hardware semaphores when GPU scheduler is enabled

2016-02-18 Thread John . C . Harrison
From: John Harrison 

Hardware semaphores require seqno values to be continuously
incrementing. However, the scheduler's reordering of batch buffers
means that the seqno values going through the hardware could be out of
order. Thus semaphores can not be used.

On the other hand, the scheduler supersedes the need for hardware
semaphores anyway. Having one ring stall waiting for something to
complete on another ring is inefficient if that ring could be working
on some other, independent task. This is what the scheduler is meant
to do - keep the hardware as busy as possible by reordering batch
buffers to avoid dependency stalls.

v4: Downgraded a BUG_ON to WARN_ON as the latter is preferred.

v5: Squashed the i915_scheduler.c portions down into the 'start of
scheduler' patch. [Joonas Lahtinen]

For: VIZ-1587
Signed-off-by: John Harrison 
Cc: Joonas Lahtinen 
---
 drivers/gpu/drm/i915/i915_drv.c | 9 +
 drivers/gpu/drm/i915/intel_ringbuffer.c | 4 
 2 files changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 975af35..5760a17 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -34,6 +34,7 @@
 #include "i915_drv.h"
 #include "i915_trace.h"
 #include "intel_drv.h"
+#include "i915_scheduler.h"
 
 #include 
 #include 
@@ -517,6 +518,14 @@ void intel_detect_pch(struct drm_device *dev)
 
 bool i915_semaphore_is_enabled(struct drm_device *dev)
 {
+   /* Hardware semaphores are not compatible with the scheduler due to the
+* seqno values being potentially out of order. However, semaphores are
+* also not required as the scheduler will handle inter-ring dependencies
+* and try to do so in a way that does not cause dead time on the hardware.
+*/
+   if (i915_scheduler_is_enabled(dev))
+   return false;
+
if (INTEL_INFO(dev)->gen < 6)
return false;
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 9d4f19d..ca7b8af 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -33,6 +33,7 @@
 #include 
 #include "i915_trace.h"
 #include "intel_drv.h"
+#include "i915_scheduler.h"
 
 int __intel_ring_space(int head, int tail, int size)
 {
@@ -1400,6 +1401,9 @@ gen6_ring_sync(struct drm_i915_gem_request *waiter_req,
u32 wait_mbox = signaller->semaphore.mbox.wait[waiter->id];
int ret;
 
+   /* Arithmetic on sequence numbers is unreliable with a scheduler. */
+   WARN_ON(i915_scheduler_is_enabled(signaller->dev));
+
/* Throughout all of the GEM code, seqno passed implies our current
 * seqno is >= the last seqno executed. However for hardware the
 * comparison is strictly greater than.
-- 
1.9.1



[Intel-gfx] [PATCH v5 13/35] drm/i915: Redirect execbuffer_final() via scheduler

2016-02-18 Thread John . C . Harrison
From: John Harrison 

Updated the execbuffer() code to pass the packaged up batch buffer
information to the scheduler rather than calling execbuffer_final()
directly. The scheduler queue() code is currently a stub which simply
chains on to _final() immediately.

For: VIZ-1587
Signed-off-by: John Harrison 
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 18 +++---
 drivers/gpu/drm/i915/intel_lrc.c   | 12 
 2 files changed, 11 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 7978dae..09c5ce9 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -33,6 +33,7 @@
 #include "intel_drv.h"
 #include 
 #include 
+#include "i915_scheduler.h"
 
 #define  __EXEC_OBJECT_HAS_PIN (1<<31)
 #define  __EXEC_OBJECT_HAS_FENCE (1<<30)
@@ -1226,6 +1227,7 @@ i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
   struct drm_i915_gem_execbuffer2 *args,
   struct list_head *vmas)
 {
+   struct i915_scheduler_queue_entry *qe;
struct drm_device *dev = params->dev;
struct intel_engine_cs *ring = params->ring;
struct drm_i915_private *dev_priv = dev->dev_private;
@@ -1270,17 +1272,11 @@ i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
 
i915_gem_execbuffer_move_to_active(vmas, params->request);
 
-   ret = dev_priv->gt.execbuf_final(params);
+   qe = container_of(params, typeof(*qe), params);
+   ret = i915_scheduler_queue_execbuffer(qe);
if (ret)
return ret;
 
-   /*
-* Free everything that was stored in the QE structure (until the
-* scheduler arrives and does it instead):
-*/
-   if (params->dispatch_flags & I915_DISPATCH_SECURE)
-   i915_gem_execbuff_release_batch_obj(params->batch_obj);
-
return 0;
 }
 
@@ -1420,8 +1416,8 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
struct intel_engine_cs *ring;
struct intel_context *ctx;
struct i915_address_space *vm;
-   struct i915_execbuffer_params params_master; /* XXX: will be removed later */
-   struct i915_execbuffer_params *params = &params_master;
+   struct i915_scheduler_queue_entry qe;
+   struct i915_execbuffer_params *params = &qe.params;
const u32 ctx_id = i915_execbuffer2_get_context_id(*args);
u32 dispatch_flags;
int ret;
@@ -1529,7 +1525,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
else
vm = _priv->gtt.base;
 
-   memset(&params_master, 0x00, sizeof(params_master));
+   memset(&qe, 0x00, sizeof(qe));
 
eb = eb_create(args);
if (eb == NULL) {
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 12e8949..ff4565f 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -136,6 +136,7 @@
 #include 
 #include "i915_drv.h"
 #include "intel_mocs.h"
+#include "i915_scheduler.h"
 
 #define GEN9_LR_CONTEXT_RENDER_SIZE (22 * PAGE_SIZE)
 #define GEN8_LR_CONTEXT_RENDER_SIZE (20 * PAGE_SIZE)
@@ -910,6 +911,7 @@ int intel_execlists_submission(struct i915_execbuffer_params *params,
   struct drm_i915_gem_execbuffer2 *args,
   struct list_head *vmas)
 {
+   struct i915_scheduler_queue_entry *qe;
struct drm_device   *dev = params->dev;
struct intel_engine_cs  *ring = params->ring;
struct drm_i915_private *dev_priv = dev->dev_private;
@@ -952,17 +954,11 @@ int intel_execlists_submission(struct i915_execbuffer_params *params,
 
i915_gem_execbuffer_move_to_active(vmas, params->request);
 
-   ret = dev_priv->gt.execbuf_final(params);
+   qe = container_of(params, typeof(*qe), params);
+   ret = i915_scheduler_queue_execbuffer(qe);
if (ret)
return ret;
 
-   /*
-* Free everything that was stored in the QE structure (until the
-* scheduler arrives and does it instead):
-*/
-   if (params->dispatch_flags & I915_DISPATCH_SECURE)
-   i915_gem_execbuff_release_batch_obj(params->batch_obj);
-
return 0;
 }
 
-- 
1.9.1



[Intel-gfx] [PATCH v5 05/35] drm/i915: Re-instate request->uniq because it is extremely useful

2016-02-18 Thread John . C . Harrison
From: John Harrison 

The seqno value cannot always be used when debugging issues via trace
points. This is because it can be reset back to start, especially
during TDR type tests. Also, when the scheduler arrives the seqno is
only valid while a given request is executing on the hardware. While
the request is simply queued waiting for submission, its seqno value
will be zero (meaning invalid).

v4: Wrapped a long line to keep the style checker happy.

v5: Added uniq to the dispatch trace point [Svetlana Kukanova]

For: VIZ-5115
Signed-off-by: John Harrison 
Reviewed-by: Tomas Elf 
---
 drivers/gpu/drm/i915/i915_drv.h   |  5 +
 drivers/gpu/drm/i915/i915_gem.c   |  4 +++-
 drivers/gpu/drm/i915/i915_trace.h | 32 ++--
 3 files changed, 30 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 8dd811e..f4487b9 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1986,6 +1986,8 @@ struct drm_i915_private {
 
struct intel_encoder *dig_port_map[I915_MAX_PORTS];
 
+   uint32_t request_uniq;
+
/*
 * NOTE: This is the dri1/ums dungeon, don't add stuff here. Your patch
 * will be rejected. Instead look for a better place.
@@ -2242,6 +2244,9 @@ struct drm_i915_gem_request {
  */
u32 seqno;
 
+   /* Unique identifier which can be used for trace points & debug */
+   uint32_t uniq;
+
/** Position in the ringbuffer of the start of the request */
u32 head;
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index bf39ca4..dfe43ea 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2960,7 +2960,8 @@ static void i915_gem_request_fence_value_str(struct fence *req_fence,
 
req = container_of(req_fence, typeof(*req), fence);
 
-   snprintf(str, size, "%d [%d]", req->fence.seqno, req->seqno);
+   snprintf(str, size, "%d [%d:%d]", req->fence.seqno, req->uniq,
+req->seqno);
 }
 
 static const struct fence_ops i915_gem_request_fops = {
@@ -3036,6 +3037,7 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 
req->i915 = dev_priv;
req->ring = ring;
+   req->uniq = dev_priv->request_uniq++;
req->ctx  = ctx;
i915_gem_context_reference(req->ctx);
 
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index cfe4f03..455c215 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -469,6 +469,7 @@ TRACE_EVENT(i915_gem_ring_sync_to,
 __field(u32, dev)
 __field(u32, sync_from)
 __field(u32, sync_to)
+__field(u32, uniq_to)
 __field(u32, seqno)
 ),
 
@@ -476,13 +477,14 @@ TRACE_EVENT(i915_gem_ring_sync_to,
   __entry->dev = from->dev->primary->index;
   __entry->sync_from = from->id;
   __entry->sync_to = to_req->ring->id;
+  __entry->uniq_to = to_req->uniq;
   __entry->seqno = i915_gem_request_get_seqno(req);
   ),
 
-   TP_printk("dev=%u, sync-from=%u, sync-to=%u, seqno=%u",
+   TP_printk("dev=%u, sync-from=%u, sync-to=%u, seqno=%u, to_uniq=%u",
  __entry->dev,
  __entry->sync_from, __entry->sync_to,
- __entry->seqno)
+ __entry->seqno, __entry->uniq_to)
 );
 
 TRACE_EVENT(i915_gem_ring_dispatch,
@@ -492,6 +494,7 @@ TRACE_EVENT(i915_gem_ring_dispatch,
TP_STRUCT__entry(
 __field(u32, dev)
 __field(u32, ring)
+__field(u32, uniq)
 __field(u32, seqno)
 __field(u32, flags)
 ),
@@ -501,13 +504,15 @@ TRACE_EVENT(i915_gem_ring_dispatch,
i915_gem_request_get_ring(req);
   __entry->dev = ring->dev->primary->index;
   __entry->ring = ring->id;
+  __entry->uniq = req->uniq;
   __entry->seqno = i915_gem_request_get_seqno(req);
   __entry->flags = flags;
   i915_trace_irq_get(ring, req);
   ),
 
-   TP_printk("dev=%u, ring=%u, seqno=%u, flags=%x",
- __entry->dev, __entry->ring, __entry->seqno, __entry->flags)
+   TP_printk("dev=%u, ring=%u, uniq=%u, seqno=%u, flags=%x",
+ __entry->dev, __entry->ring, __entry->uniq,
+   

[Intel-gfx] [PATCH v5 10/35] drm/i915: Added scheduler hook when closing DRM file handles

2016-02-18 Thread John . C . Harrison
From: John Harrison 

The scheduler decouples the submission of batch buffers to the driver
with submission of batch buffers to the hardware. Thus it is possible
for an application to close its DRM file handle while there is still
work outstanding. That means the scheduler needs to know about file
close events so it can remove the file pointer from such orphaned
batch buffers and not attempt to dereference it later.

v3: Updated to not wait for outstanding work to complete but merely
remove the file handle reference. The wait was getting excessively
complicated with inter-ring dependencies, pre-emption, and other such
issues.

v4: Changed some white space to keep the style checker happy.

v5: Added function documentation and removed apparently objectionable
white space. [Joonas Lahtinen]

Used lighter weight spinlocks.

For: VIZ-1587
Signed-off-by: John Harrison 
Cc: Joonas Lahtinen 
---
 drivers/gpu/drm/i915/i915_dma.c   |  3 +++
 drivers/gpu/drm/i915/i915_scheduler.c | 48 +++
 drivers/gpu/drm/i915/i915_scheduler.h |  2 ++
 3 files changed, 53 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index a0f5659..678adc7 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -46,6 +46,7 @@
 #include 
 #include 
 #include 
+#include "i915_scheduler.h"
 #include 
 #include 
 #include 
@@ -1258,6 +1259,8 @@ void i915_driver_lastclose(struct drm_device *dev)
 
 void i915_driver_preclose(struct drm_device *dev, struct drm_file *file)
 {
+   i915_scheduler_closefile(dev, file);
+
	mutex_lock(&dev->struct_mutex);
i915_gem_context_close(dev, file);
i915_gem_release(dev, file);
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index fc23ee7..ab5007a 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -872,3 +872,51 @@ void i915_scheduler_process_work(struct intel_engine_cs *ring)
if (do_submit)
intel_runtime_pm_put(dev_priv);
 }
+
+/**
+ * i915_scheduler_closefile - notify the scheduler that a DRM file handle
+ * has been closed.
+ * @dev: DRM device
+ * @file: file being closed
+ *
+ * Goes through the scheduler's queues and removes all connections to the
+ * disappearing file handle that still exist. There is an argument to say
+ * that this should also flush such outstanding work through the hardware.
+ * However, with pre-emption, TDR and other such complications doing so
+ * becomes a locking nightmare. So instead, just warn with a debug message
+ * if the application is leaking uncompleted work and make sure a null
+ * pointer dereference will not follow.
+ */
+int i915_scheduler_closefile(struct drm_device *dev, struct drm_file *file)
+{
+   struct i915_scheduler_queue_entry *node;
+   struct drm_i915_private *dev_priv = dev->dev_private;
+   struct i915_scheduler *scheduler = dev_priv->scheduler;
+   struct intel_engine_cs *ring;
+   int i;
+
+   if (!scheduler)
+   return 0;
+
+   spin_lock_irq(&scheduler->lock);
+
+   for_each_ring(ring, dev_priv, i) {
+   list_for_each_entry(node, &scheduler->node_queue[ring->id], link) {
+   if (node->params.file != file)
+   continue;
+
+   if (!I915_SQS_IS_COMPLETE(node))
+   DRM_DEBUG_DRIVER("Closing file handle with outstanding work: %d:%d/%d on %s\n",
+node->params.request->uniq,
+node->params.request->seqno,
+node->status,
+ring->name);
+
+   node->params.file = NULL;
+   }
+   }
+
+   spin_unlock_irq(&scheduler->lock);
+
+   return 0;
+}
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 415fec8..0e8b6a9 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -87,6 +87,8 @@ enum {
 
 bool i915_scheduler_is_enabled(struct drm_device *dev);
 int i915_scheduler_init(struct drm_device *dev);
+int i915_scheduler_closefile(struct drm_device *dev,
+struct drm_file *file);
 void i915_scheduler_clean_node(struct i915_scheduler_queue_entry *node);
 int i915_scheduler_queue_execbuffer(struct i915_scheduler_queue_entry *qe);
 bool i915_scheduler_notify_request(struct drm_i915_gem_request *req);
-- 
1.9.1



[Intel-gfx] [PATCH v5 09/35] drm/i915: Force MMIO flips when scheduler enabled

2016-02-18 Thread John . C . Harrison
From: John Harrison 

MMIO flips are the preferred mechanism now but more importantly, pipe
based flips cause issues for the scheduler. Specifically, submitting
work to the rings around the side of the scheduler could cause that
work to be lost if the scheduler generates a pre-emption event on that
ring.

For: VIZ-1587
Signed-off-by: John Harrison 
---
 drivers/gpu/drm/i915/intel_display.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 6e12ed7..731d20a 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -46,6 +46,7 @@
 #include 
 #include 
 #include 
+#include "i915_scheduler.h"
 
 /* Primary plane formats for gen <= 3 */
 static const uint32_t i8xx_primary_formats[] = {
@@ -11330,6 +11331,8 @@ static bool use_mmio_flip(struct intel_engine_cs *ring,
return true;
else if (i915.enable_execlists)
return true;
+   else if (i915_scheduler_is_enabled(ring->dev))
+   return true;
else if (obj->base.dma_buf &&
 !reservation_object_test_signaled_rcu(obj->base.dma_buf->resv,
   false))
-- 
1.9.1



[Intel-gfx] [PATCH v5 07/35] drm/i915: Prepare retire_requests to handle out-of-order seqnos

2016-02-18 Thread John . C . Harrison
From: John Harrison 

A major point of the GPU scheduler is that it re-orders batch buffers
after they have been submitted to the driver. This leads to requests
completing out of order. In turn, this means that the retire
processing can no longer assume that all completed entries are at the
front of the list. Rather than attempting to re-order the request list
on a regular basis, it is better to simply scan the entire list.

v2: Removed deferred free code as no longer necessary due to request
handling updates.

For: VIZ-1587
Signed-off-by: John Harrison 
---
 drivers/gpu/drm/i915/i915_gem.c | 31 +--
 1 file changed, 13 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 7d9aa24..0003cfc 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3233,6 +3233,7 @@ void i915_gem_reset(struct drm_device *dev)
 void
 i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 {
+   struct drm_i915_gem_object *obj, *obj_next;
struct drm_i915_gem_request *req, *req_next;
LIST_HEAD(list_head);
 
@@ -3245,37 +3246,31 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 */
i915_gem_request_notify(ring, false);
 
+   /*
+    * Note that request entries might be out of order due to rescheduling
+    * and pre-emption. Thus both lists must be processed in their entirety
+    * rather than stopping at the first non-complete entry.
+    */
+
/* Retire requests first as we use it above for the early return.
 * If we retire requests last, we may use a later seqno and so clear
 * the requests lists without clearing the active list, leading to
 * confusion.
 */
-   while (!list_empty(&ring->request_list)) {
-   struct drm_i915_gem_request *request;
-
-   request = list_first_entry(&ring->request_list,
-  struct drm_i915_gem_request,
-  list);
-
-   if (!i915_gem_request_completed(request))
-   break;
+   list_for_each_entry_safe(req, req_next, &ring->request_list, list) {
+   if (!i915_gem_request_completed(req))
+   continue;
 
-   i915_gem_request_retire(request);
+   i915_gem_request_retire(req);
}
 
/* Move any buffers on the active list that are no longer referenced
 * by the ringbuffer to the flushing/inactive lists as appropriate,
 * before we free the context associated with the requests.
 */
-   while (!list_empty(&ring->active_list)) {
-   struct drm_i915_gem_object *obj;
-
-   obj = list_first_entry(&ring->active_list,
- struct drm_i915_gem_object,
- ring_list[ring->id]);
-
+   list_for_each_entry_safe(obj, obj_next, &ring->active_list, ring_list[ring->id]) {
if (!list_empty(&obj->last_read_req[ring->id]->list))
-   break;
+   continue;
 
i915_gem_object_retire__read(obj, ring->id);
}
-- 
1.9.1


