Re: [Intel-gfx] [PATCH v3] drm/i915: Debugfs disable RPS boost and idle

2014-05-09 Thread Deepak S


On Tuesday 06 May 2014 03:20 AM, Daisy Sun wrote:

RP frequency request is affected by 2 modules: the normal turbo
algorithm and the RPS boost algorithm. With the RPS boost algorithm
in the mix, the final frequency becomes relatively unpredictable.
Add a switch to enable/disable the RPS boost functionality. When
disabled, the RP frequency will follow the normal turbo algorithm only.

Intention: when boost and idle are disabled, we have a clear view of
the turbo algorithm. It's very helpful for verifying that the turbo
algorithm is working as expected.
Without these debugfs hooks, RPS boost or idle may kick in at
any time and under any circumstances.
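
A minimal way to exercise the knob from a test program (the debugfs
mount point and DRM minor are assumptions; /sys/kernel/debug/dri/0 is
the usual location):

	/* hypothetical test snippet, not part of this patch */
	int fd = open("/sys/kernel/debug/dri/0/i915_rps_disable_boost",
		      O_WRONLY);
	write(fd, "1", 1);	/* 1 = boost/idle disabled, 0 = normal */
	close(fd);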

V1->V2: Follow Daniel's comment and explain the intention.
V2->V3: Abandon flush_delayed_work; abandon taking rps.hw_lock
during get/set of rps.debugfs_disable_boost.

Signed-off-by: Daisy Sun 
---
  drivers/gpu/drm/i915/i915_debugfs.c | 29 +
  drivers/gpu/drm/i915/i915_drv.h |  1 +
  drivers/gpu/drm/i915/intel_pm.c |  8 ++--
  3 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
b/drivers/gpu/drm/i915/i915_debugfs.c
index 1e83ae4..685f7e5 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -3486,6 +3486,34 @@ DEFINE_SIMPLE_ATTRIBUTE(i915_drop_caches_fops,
i915_drop_caches_get, i915_drop_caches_set,
"0x%08llx\n");
  
+static int i915_rps_disable_boost_get(void *data, u64 *val)
+{
+   struct drm_device *dev = data;
+   struct drm_i915_private *dev_priv = dev->dev_private;
+
+   *val = dev_priv->rps.debugfs_disable_boost;
+
+   return 0;
+}
+
+static int i915_rps_disable_boost_set(void *data, u64 val)
+{
+   struct drm_device *dev = data;
+   struct drm_i915_private *dev_priv = dev->dev_private;
+
+   DRM_DEBUG_DRIVER("%s RPS Boost-Idle mode\n",
+val ? "Disable" : "Enable");
+
+   dev_priv->rps.debugfs_disable_boost = val;
+
+   return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(i915_rps_disable_boost_fops,
+   i915_rps_disable_boost_get, i915_rps_disable_boost_set,
+   "%llu\n");
+
  static int
  i915_max_freq_get(void *data, u64 *val)
  {
@@ -3821,6 +3849,7 @@ static const struct i915_debugfs_files {
{"i915_wedged", &i915_wedged_fops},
{"i915_max_freq", &i915_max_freq_fops},
{"i915_min_freq", &i915_min_freq_fops},
+   {"i915_rps_disable_boost", &i915_rps_disable_boost_fops},
{"i915_cache_sharing", &i915_cache_sharing_fops},
{"i915_ring_stop", &i915_ring_stop_fops},
{"i915_ring_missed_irq", &i915_ring_missed_irq_fops},
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 272aa7a..9c427da 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -847,6 +847,7 @@ struct intel_gen6_power_mgmt {
int last_adj;
enum { LOW_POWER, BETWEEN, HIGH_POWER } power;
  
+	bool debugfs_disable_boost;

bool enabled;
struct delayed_work delayed_resume_work;
  
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c

index 75c1c76..6acac14 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -3163,7 +3163,9 @@ void gen6_rps_idle(struct drm_i915_private *dev_priv)
struct drm_device *dev = dev_priv->dev;
  
	mutex_lock(&dev_priv->rps.hw_lock);
-	if (dev_priv->rps.enabled) {
+
+	if (dev_priv->rps.enabled
+	    && !dev_priv->rps.debugfs_disable_boost) {

On VLV, when the system is idle we won't get down-threshold interrupts,
so disabling this will not help you test the algorithm. I think we need
to retain gen6_rps_idle.

if (IS_VALLEYVIEW(dev))
vlv_set_rps_idle(dev_priv);
else
@@ -3178,7 +3180,9 @@ void gen6_rps_boost(struct drm_i915_private *dev_priv)
struct drm_device *dev = dev_priv->dev;
  
	mutex_lock(&dev_priv->rps.hw_lock);
-	if (dev_priv->rps.enabled) {
+
+	if (dev_priv->rps.enabled
+	    && !dev_priv->rps.debugfs_disable_boost) {
if (IS_VALLEYVIEW(dev))
			valleyview_set_rps(dev_priv->dev,
					   dev_priv->rps.max_freq_softlimit);
else




[Intel-gfx] [PATCH 53/56] TESTME: Always force invalidate

2014-05-09 Thread Ben Widawsky
---
 drivers/gpu/drm/i915/i915_gem_context.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
b/drivers/gpu/drm/i915/i915_gem_context.c
index fec8114..a4ea50a 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -681,7 +681,7 @@ static int do_switch(struct intel_ring_buffer *ring,
 * it must avoid lite restores in HW by programming "Force Restore" bit
* to '1' in context descriptor during context submission
 */
-   if (IS_GEN8(ring->dev) && i915_semaphore_is_enabled(ring->dev))
+   if (IS_GEN8(ring->dev) && to->is_initialized)
hw_flags |= MI_FORCE_RESTORE;
 
ret = mi_set_context(ring, to, hw_flags);
-- 
1.9.2



[Intel-gfx] [PATCH 54/56] drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl

2014-05-09 Thread Ben Widawsky
From: Chris Wilson 

By exporting the ability to map user address and inserting PTEs
representing their backing pages into the GTT, we can exploit UMA in order
to utilize normal application data as a texture source or even as a
render target (depending upon the capabilities of the chipset). This has
a number of uses, with zero-copy downloads to the GPU and efficient
readback making the intermixed streaming of CPU and GPU operations
fairly efficient. This ability has many widespread implications from
faster rendering of client-side software rasterisers (chromium),
mitigation of stalls due to read back (firefox) and to faster pipelining
of texture data (such as pixel buffer objects in GL or data blobs in CL).
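
For reference, a rough userspace sketch of driving the new ioctl
(struct and ioctl names follow the uapi additions in this patch; error
handling is elided and the helper name is made up):

	#include <stdint.h>
	#include <string.h>
	#include <sys/ioctl.h>
	#include <drm/i915_drm.h>

	/* Wrap a page-aligned user buffer in a GEM handle. Both pointer
	 * and size must be page aligned (v9), and READ_ONLY is currently
	 * refused (v19/v20). */
	static uint32_t userptr_handle(int drm_fd, void *ptr, uint64_t size)
	{
		struct drm_i915_gem_userptr arg;

		memset(&arg, 0, sizeof(arg));
		arg.user_ptr = (uintptr_t)ptr;
		arg.user_size = size;
		arg.flags = 0;

		if (ioctl(drm_fd, DRM_IOCTL_I915_GEM_USERPTR, &arg))
			return 0;
		return arg.handle;	/* usable like any other GEM handle */
	}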

v2: Compile with CONFIG_MMU_NOTIFIER
v3: We can sleep while performing invalidate-range, which we can utilise
to drop our page references prior to the kernel manipulating the vma
(for either discard or cloning) and so protect normal users.
v4: Only run the invalidate notifier if the range intercepts the bo.
v5: Prevent userspace from attempting to GTT mmap non-page aligned buffers
v6: Recheck after reacquire mutex for lost mmu.
v7: Fix implicit padding of ioctl struct by rounding to next 64bit boundary.
v8: Fix rebasing error after forward porting the back port.
v9: Limit the userptr to page aligned entries. We now expect userspace
to handle all the offset-in-page adjustments itself.
v10: Prevent vma from being copied across fork to avoid issues with cow.
v11: Drop vma behaviour changes -- locking is nigh on impossible.
 Use a worker to load user pages to avoid lock inversions.
v12: Use get_task_mm()/mmput() for correct refcounting of mm.
v13: Use a worker to release the mmu_notifier to avoid lock inversion
v14: Decouple mmu_notifier from struct_mutex using a custom mmu_notifer
 with its own locking and tree of objects for each mm/mmu_notifier.
v15: Prevent overlapping userptr objects, and invalidate all objects
 within the mmu_notifier range
v16: Fix a typo for iterating over multiple objects in the range and
 rearrange error path to destroy the mmu_notifier locklessly.
 Also close a race between invalidate_range and the get_pages_worker.
v17: Close a race between get_pages_worker/invalidate_range and fresh
 allocations of the same userptr range - and notice that
 struct_mutex was presumed to be held when during creation it wasn't.
v18: Sigh. Fix the refactor of st_set_pages() to allocate enough memory
 for the struct sg_table and to clear it before reporting an error.
v19: Always error out on read-only userptr requests as we don't have the
 hardware infrastructure to support them at the moment.
v20: Refuse to implement read-only support until we have the required
 infrastructure - but reserve the bit in flags for future use.
v21: use_mm() is not required for get_user_pages(). It is only meant to
 be used to fix up the kernel thread's current->mm for use with
 copy_user().
v22: Use sg_alloc_table_from_pages for that chunky feeling

Signed-off-by: Chris Wilson 
Cc: Tvrtko Ursulin 
Cc: "Gong, Zhipeng" 
Cc: Akash Goel 
Cc: "Volkin, Bradley D" 
Reviewed-by: Tvrtko Ursulin 

Conflicts:
drivers/gpu/drm/i915/i915_dma.c
drivers/gpu/drm/i915/i915_drv.h
include/uapi/drm/i915_drm.h
---
 drivers/gpu/drm/i915/Kconfig|   1 +
 drivers/gpu/drm/i915/Makefile   |   1 +
 drivers/gpu/drm/i915/i915_dma.c |   1 +
 drivers/gpu/drm/i915/i915_drv.h |  24 +-
 drivers/gpu/drm/i915/i915_gem.c |   4 +
 drivers/gpu/drm/i915/i915_gem_dmabuf.c  |   5 +
 drivers/gpu/drm/i915/i915_gem_userptr.c | 701 
 drivers/gpu/drm/i915/i915_gpu_error.c   |   2 +
 include/uapi/drm/i915_drm.h |  16 +
 9 files changed, 754 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/i915/i915_gem_userptr.c

diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
index e4e3c01..437e182 100644
--- a/drivers/gpu/drm/i915/Kconfig
+++ b/drivers/gpu/drm/i915/Kconfig
@@ -5,6 +5,7 @@ config DRM_I915
depends on (AGP || AGP=n)
select INTEL_GTT
select AGP_INTEL if AGP
+   select INTERVAL_TREE
# we need shmfs for the swappable backing store, and in particular
# the shmem_readpage() which depends upon tmpfs
select SHMEM
diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index b5d4029..e548f4e 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -27,6 +27,7 @@ i915-y += i915_cmd_parser.o \
  i915_gem.o \
  i915_gem_stolen.o \
  i915_gem_tiling.o \
+ i915_gem_userptr.o \
  i915_gpu_error.o \
  i915_irq.o \
  i915_trace_points.o \
diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 54a08a9..00ae6d6 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1995,6 

[Intel-gfx] [PATCH 52/56] TESTME: GFX_TLB_INVALIDATE_EXPLICIT

2014-05-09 Thread Ben Widawsky
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c 
b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 33f9abd..15ede8e 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -630,7 +630,7 @@ static int init_render_ring(struct intel_ring_buffer *ring)
   _MASKED_BIT_ENABLE(GFX_TLB_INVALIDATE_EXPLICIT));
 
/* WaBCSVCSTlbInvalidationMode:ivb,vlv,hsw */
-   if (IS_GEN7(dev))
+   if (IS_GEN7(dev) || IS_GEN8(dev))
I915_WRITE(GFX_MODE_GEN7,
   _MASKED_BIT_ENABLE(GFX_TLB_INVALIDATE_EXPLICIT) |
   _MASKED_BIT_ENABLE(GFX_REPLAY_MODE));
-- 
1.9.2



[Intel-gfx] [PATCH 55/56] drm/i915: Track userptr VMAs

2014-05-09 Thread Ben Widawsky
This HACK allows users to reuse the userptr ioctl in order to
pre-reserve the VMA at a specific location. The vma will follow all the
same paths as other userptr objects - only the drm_mm node is actually
allocated.

Again, this patch is a big HACK to unblock some other people who are
currently using userptr.

Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/i915/i915_drv.h|  1 +
 drivers/gpu/drm/i915/i915_gem.c| 22 +++---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  3 +++
 drivers/gpu/drm/i915/i915_gem_gtt.h|  4 
 4 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 60513e7..71e39ff 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2166,6 +2166,7 @@ void i915_gem_vma_destroy(struct i915_vma *vma);
 #define PIN_GLOBAL 0x4
 #define PIN_ALIASING 0x8
 #define PIN_GLOBAL_ALIASED (PIN_ALIASING | PIN_GLOBAL)
+#define PIN_BOUND  0x10
 int __must_check i915_gem_object_pin(struct drm_i915_gem_object *obj,
 struct i915_address_space *vm,
 uint32_t alignment,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 287d48e..ff75971 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3279,7 +3279,13 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object 
*obj,
if (IS_ERR(vma))
goto err_unpin;
 
+   if (flags & PIN_BOUND) {
+   WARN_ON(!vma->node.allocated && !vma->obj->userptr.ptr);
+   goto skip_search;
+   }
+
 search_free:
+   WARN_ON(vma->node.allocated);
ret = drm_mm_insert_node_in_range_generic(&vm->mm, &vma->node,
  size, alignment,
  obj->cache_level, 0, gtt_max,
@@ -3293,6 +3299,7 @@ search_free:
 
goto err_free_vma;
}
+skip_search:
if (WARN_ON(!i915_gem_valid_gtt_space(dev, &vma->node,
  obj->cache_level))) {
ret = -EINVAL;
@@ -3329,10 +3336,13 @@ search_free:
i915_gem_vma_bind(vma, obj->cache_level, vma_bind_flags);
 
i915_gem_verify_gtt(dev);
+   if (flags & PIN_BOUND)
+   vma->uptr_bind = 1;
return vma;
 
 err_remove_node:
-   drm_mm_remove_node(&vma->node);
+   if ((flags & PIN_BOUND) == 0)
+   drm_mm_remove_node(&vma->node);
 err_free_vma:
i915_gem_vma_destroy(vma);
vma = ERR_PTR(ret);
@@ -3875,6 +3885,11 @@ i915_gem_object_pin(struct drm_i915_gem_object *obj,
if (WARN_ON(flags & (PIN_GLOBAL | PIN_MAPPABLE) && !i915_is_ggtt(vm)))
return -EINVAL;
 
+   if (flags & PIN_BOUND) {
+   if (WARN_ON(flags & ~PIN_BOUND))
+   return -EINVAL;
+   }
+
vma = i915_gem_obj_to_vma(obj, vm);
if (vma) {
if (WARN_ON(vma->pin_count == 
DRM_I915_GEM_OBJECT_MAX_PIN_COUNT))
@@ -3898,7 +3913,8 @@ i915_gem_object_pin(struct drm_i915_gem_object *obj,
}
}
 
-   if (vma == NULL || !drm_mm_node_allocated(&vma->node)) {
+   if (vma == NULL || !drm_mm_node_allocated(&vma->node) ||
+   ((flags & PIN_BOUND) && !vma->uptr_bind)) {
vma = i915_gem_object_bind_to_vm(obj, vm, alignment, flags);
if (IS_ERR(vma))
return PTR_ERR(vma);
@@ -4265,7 +4281,7 @@ struct i915_vma *i915_gem_obj_to_vma(struct 
drm_i915_gem_object *obj,
 
 void i915_gem_vma_destroy(struct i915_vma *vma)
 {
-   WARN_ON(vma->node.allocated);
+   WARN_ON(vma->node.allocated && !vma->uptr);
 
/* Keep the vma as a placeholder in the execbuffer reservation lists */
if (!list_empty(&vma->exec_list))
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 08fde7d..596e51e 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -566,6 +566,9 @@ i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
if (entry->flags & EXEC_OBJECT_NEEDS_GTT)
flags |= PIN_GLOBAL;
 
+   if (vma->uptr)
+   flags |= PIN_BOUND;
+
ret = i915_gem_object_pin(obj, vma->vm, entry->alignment, flags);
if (ret)
return ret;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h 
b/drivers/gpu/drm/i915/i915_gem_gtt.h
index c265c23..bdb4b05 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -171,6 +171,10 @@ struct i915_vma {
unsigned int pin_count:4;
 #define DRM_I915_GEM_OBJECT_MAX_PIN_COUNT 0xf
 
+   /* FIXME: */
+   unsigned int uptr:1; /* Whether this VMA has been userptr'd */
+   unsigned int uptr_bind:1; /* Whether we've actually bound it

[Intel-gfx] [PATCH 56/56] drm/i915/userptr: Mirror GPU addr at ioctl (HACK/POC)

2014-05-09 Thread Ben Widawsky
This is needed for the proof of concept work that will allow mirrored
GPU addressing via the existing userptr interface. Part of the hack
involves passing the context ID to the ioctl in order to get a VM.
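
A hedged sketch of the intended call, based on the uapi changes below
(args->ctx_id and I915_USERPTR_GPU_MIRROR appear in the diff; the rest
of the field usage here is an assumption):

	/* hypothetical: mirror a CPU buffer at the same address on the GPU */
	struct drm_i915_gem_userptr arg;	/* as extended by this patch */

	memset(&arg, 0, sizeof(arg));
	arg.user_ptr = (uintptr_t)ptr;	/* doubles as the desired GPU address */
	arg.user_size = size;
	arg.flags = I915_USERPTR_GPU_MIRROR;
	arg.ctx_id = ctx_id;		/* context supplying the target VM */
	ioctl(drm_fd, DRM_IOCTL_I915_GEM_USERPTR, &arg);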

Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/i915/i915_gem_userptr.c | 120 +---
 include/uapi/drm/i915_drm.h |   7 +-
 2 files changed, 98 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c 
b/drivers/gpu/drm/i915/i915_gem_userptr.c
index 5da37cc..795ea3e 100644
--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
@@ -224,10 +224,6 @@ i915_mmu_notifier_add(struct i915_mmu_notifier *mmu,
 * remove the objects from the interval tree) before we do
 * the check for overlapping objects.
 */
-   ret = i915_mutex_lock_interruptible(mmu->dev);
-   if (ret)
-   return ret;
-
i915_gem_retire_requests(mmu->dev);
 
/* Disallow overlapping userptr objects */
@@ -253,7 +249,6 @@ i915_mmu_notifier_add(struct i915_mmu_notifier *mmu,
ret = 0;
}
spin_unlock(&mmu->lock);
-   mutex_unlock(&mmu->dev->struct_mutex);
 
return ret;
 }
@@ -283,19 +278,12 @@ i915_gem_userptr_init__mmu_notifier(struct 
drm_i915_gem_object *obj,
return capable(CAP_SYS_ADMIN) ? 0 : -EPERM;
 
down_write(&obj->userptr.mm->mmap_sem);
-   ret = i915_mutex_lock_interruptible(obj->base.dev);
-   if (ret == 0) {
-   mmu = i915_mmu_notifier_get(obj->base.dev, obj->userptr.mm);
-   if (!IS_ERR(mmu))
-   mmu->count++; /* preemptive add to act as a refcount */
-   else
-   ret = PTR_ERR(mmu);
-   mutex_unlock(&obj->base.dev->struct_mutex);
-   }
+   mmu = i915_mmu_notifier_get(obj->base.dev, obj->userptr.mm);
+   if (!IS_ERR(mmu))
+   mmu->count++; /* preemptive add to act as a refcount */
+   else
+   ret = PTR_ERR(mmu);
up_write(&obj->userptr.mm->mmap_sem);
-   if (ret)
-   return ret;
-
mn = kzalloc(sizeof(*mn), GFP_KERNEL);
if (mn == NULL) {
ret = -ENOMEM;
@@ -588,6 +576,52 @@ i915_gem_userptr_release(struct drm_i915_gem_object *obj)
}
 }
 
+/* Carve out the address space for later use */
+static int i915_gem_userptr_reserve_vma(struct drm_i915_gem_object *obj,
+   struct i915_address_space *vm,
+   uint64_t offset,
+   uint64_t size)
+{
+   struct i915_vma *vma;
+   int ret;
+
+   vma = i915_gem_obj_to_vma(obj, vm);
+   if (vma)
+   return -ENXIO;
+
+   vma = i915_gem_obj_lookup_or_create_vma(obj, vm);
+   if (IS_ERR(vma))
+   return PTR_ERR(vma);
+
+   BUG_ON(!drm_mm_initialized(&vm->mm));
+
+   if (vma->uptr) {
+   DRM_INFO("Already had a userptr\n");
+   return 0;
+   }
+   if (vma->node.allocated) {
+   DRM_INFO("Node was previously allocated\n");
+   return -EBUSY;
+   }
+
+   vma->node.start = offset;
+   vma->node.size = size;
+   vma->node.color = 0;
+   ret = drm_mm_reserve_node(&vm->mm, &vma->node);
+   if (ret) {
+   /* There are two reasons this can fail.
+* 1. The user is using a mix of relocs and userptr, and a reloc
+* won.
+* TODO: handle better.
+*/
+   return ret;
+   }
+
+   vma->uptr = 1;
+
+   return 0;
+}
+
 static const struct drm_i915_gem_object_ops i915_gem_userptr_ops = {
.get_pages = i915_gem_userptr_get_pages,
.put_pages = i915_gem_userptr_put_pages,
@@ -630,37 +664,62 @@ static const struct drm_i915_gem_object_ops 
i915_gem_userptr_ops = {
 int
 i915_gem_userptr_ioctl(struct drm_device *dev, void *data, struct drm_file 
*file)
 {
-   struct drm_i915_private *dev_priv = dev->dev_private;
+   struct drm_i915_file_private *file_priv = file->driver_priv;
struct drm_i915_gem_userptr *args = data;
struct drm_i915_gem_object *obj;
+   struct i915_hw_context *ctx;
+   struct i915_address_space *vm;
int ret;
u32 handle;
 
+   ret = i915_mutex_lock_interruptible(dev);
+   if (ret)
+   return ret;
+
+#define goto_err(__err) do { \
+   ret = (__err); \
+   goto out; \
+} while (0)
+
+   ctx = i915_gem_context_get(file_priv, args->ctx_id);
+   if (IS_ERR(ctx))
+   goto_err(PTR_ERR(ctx));
+
+   /* i915_gem_context_reference(ctx); */
+
if (args->flags & ~(I915_USERPTR_READ_ONLY |
+   I915_USERPTR_GPU_MIRROR |
I915_USERPTR_UNSYNCHRONIZED))
-   return -EINVAL;
+   goto_err(-EINVA

[Intel-gfx] [PATCH 51/56] drm/i915/bdw: Flip the 48b switch

2014-05-09 Thread Ben Widawsky
Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/i915/i915_drv.h | 2 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c | 3 ---
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index b3b52cf..0848638 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1924,7 +1924,7 @@ struct drm_i915_cmd_table {
 #ifdef CONFIG_32BIT
 # define HAS_48B_PPGTT(dev)false
 #else
-# define HAS_48B_PPGTT(dev)(IS_BROADWELL(dev) && false)
+# define HAS_48B_PPGTT(dev)IS_BROADWELL(dev)
 #endif
 
 #define HAS_OVERLAY(dev)   (INTEL_INFO(dev)->has_overlay)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 959054c..d73a132 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -982,9 +982,6 @@ static int gen8_ppgtt_alloc_pagedirs(struct 
i915_address_space *vm,
 
BUG_ON(!bitmap_empty(new_pds, pdpes));
 
-   /* FIXME: PPGTT container_of won't work for 64b */
-   BUG_ON((start + length) > 0x8ULL);
-
gen8_for_each_pdpe(unused, pdp, start, length, temp, pdpe) {
struct i915_pagedir *pd;
if (unused)
-- 
1.9.2



[Intel-gfx] [PATCH 45/56] drm/i915/bdw: Abstract PDP usage

2014-05-09 Thread Ben Widawsky
Up until now, ppgtt->pdp has always been the root of our page tables.
Legacy 32b addresses acted like it had 1 PDP with 4 PDPEs.

In preparation for 4-level page tables, we need to stop using ppgtt->pdp
directly unless we know it's what we want. The future structure will use
ppgtt->pml4 for the top level, and the pdp is just one of the entries
pointed to by a pml4e.

This patch addresses some carelessness done throughout development wrt
assumptions made of the root page tables.
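
To make the new distinction concrete, the lookup this series is heading
towards looks roughly like this (gen8_pml4e_index() and the pml4 pdp
array are assumptions at this point in the series):

	static struct i915_pagedirpo *gen8_get_pdp(struct i915_hw_ppgtt *ppgtt,
						   uint64_t addr)
	{
		/* Legacy 32b: the single PDP embedded in the ppgtt */
		if (!HAS_48B_PPGTT(ppgtt->base.dev))
			return &ppgtt->pdp;

		/* 48b: the PDP is one of the entries under the PML4 */
		return ppgtt->pml4.pdps[gen8_pml4e_index(addr)];
	}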

Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 129 
 1 file changed, 85 insertions(+), 44 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index df3cd41..c4b53ef 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -499,6 +499,7 @@ static void gen8_ppgtt_clear_range(struct 
i915_address_space *vm,
 {
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
+   struct i915_pagedirpo *pdp = &ppgtt->pdp; /* FIXME: 48b */
gen8_gtt_pte_t *pt_vaddr, scratch_pte;
unsigned pdpe = gen8_pdpe_index(start);
unsigned pde = gen8_pde_index(start);
@@ -510,7 +511,7 @@ static void gen8_ppgtt_clear_range(struct 
i915_address_space *vm,
  I915_CACHE_LLC, use_scratch);
 
while (num_entries) {
-   struct i915_pagedir *pd = ppgtt->pdp.pagedirs[pdpe];
+   struct i915_pagedir *pd = pdp->pagedirs[pdpe];
struct i915_pagetab *pt = pd->page_tables[pde];
struct page *page_table = pt->page;
 
@@ -544,6 +545,7 @@ static void gen8_ppgtt_insert_entries(struct 
i915_address_space *vm,
 {
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
+   struct i915_pagedirpo *pdp = &ppgtt->pdp; /* FIXME: 48b */
gen8_gtt_pte_t *pt_vaddr;
unsigned pdpe = gen8_pdpe_index(start);
unsigned pde = gen8_pde_index(start);
@@ -554,7 +556,7 @@ static void gen8_ppgtt_insert_entries(struct 
i915_address_space *vm,
 
for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
if (pt_vaddr == NULL) {
-   struct i915_pagedir *pd = ppgtt->pdp.pagedirs[pdpe];
+   struct i915_pagedir *pd = pdp->pagedirs[pdpe];
struct i915_pagetab *pt = pd->page_tables[pde];
struct page *page_table = pt->page;
pt_vaddr = kmap_atomic(page_table);
@@ -636,23 +638,22 @@ static void gen8_unmap_pagetable(struct i915_hw_ppgtt 
*ppgtt,
gen8_map_pagedir(pd, ppgtt->scratch_pt, pde, ppgtt->base.dev);
 }
 
-static void gen8_teardown_va_range(struct i915_address_space *vm,
-  uint64_t start, uint64_t length)
+static void gen8_teardown_va_range_3lvl(struct i915_address_space *vm,
+   struct i915_pagedirpo *pdp,
+   uint64_t start, uint64_t length)
 {
-   struct i915_hw_ppgtt *ppgtt =
-   container_of(vm, struct i915_hw_ppgtt, base);
struct drm_device *dev = vm->dev;
struct i915_pagedir *pd;
struct i915_pagetab *pt;
uint64_t temp;
uint32_t pdpe, pde, orig_start = start;
 
-   if (!ppgtt->pdp.pagedirs) {
+   if (!pdp || !pdp->pagedirs) {
/* If pagedirs are already free, there is nothing to do.*/
return;
}
 
-   gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
+   gen8_for_each_pdpe(pd, pdp, start, length, temp, pdpe) {
uint64_t pd_len = gen8_clamp_pd(start, length);
uint64_t pd_start = start;
 
@@ -660,12 +661,12 @@ static void gen8_teardown_va_range(struct 
i915_address_space *vm,
 * down, and up.
 */
if (!pd) {
-   WARN(test_bit(pdpe, ppgtt->pdp.used_pdpes),
+   WARN(test_bit(pdpe, pdp->used_pdpes),
 "PDPE %d is not allocated, but is reserved (%p)\n",
 pdpe, vm);
continue;
} else {
-   WARN(!test_bit(pdpe, ppgtt->pdp.used_pdpes),
+   WARN(!test_bit(pdpe, pdp->used_pdpes),
 "PDPE %d not reserved, but is allocated (%p)",
 pdpe, vm);
}
@@ -691,6 +692,8 @@ static void gen8_teardown_va_range(struct 
i915_address_space *vm,
 gen8_pte_count(pd_start, pd_len));
 
if (bitmap_empty(pt->used_ptes, GEN8_PTES_PER_PT)) {
+   struct i915_hw_ppgtt *ppgtt =
+   container_of(vm, struct i915_hw_ppgtt, 
base);
  

[Intel-gfx] [PATCH 48/56] drm/i915: Restructure map vs. insert entries

2014-05-09 Thread Ben Widawsky
After this change, the old GGTT keeps its insert_entries/clear_range
functions as we don't expect those to ever change in terms of page table
levels. The address space now gets map_vma/unmap VMA. It better reflects
the operations we actually want to support for a VMA.

I was too lazy, but the GGTT should really use these new functions as
well.
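
Roughly, the address space ends up with hooks of this shape (unmap_vma
matches gen8_unmap_vma() below; map_vma's exact arguments are an
assumption):

	struct i915_address_space {
		/* ... existing fields ... */
		void (*map_vma)(struct i915_vma *vma,
				enum i915_cache_level cache_level,
				u32 flags);
		void (*unmap_vma)(struct i915_vma *vma);
	};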

BISECT WARNING: This commit breaks aliasing PPGTT as is. If you see this
during bisect, please skip. There was no other way I could find to make
these changes remotely readable.

Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/i915/i915_drv.h |   1 +
 drivers/gpu/drm/i915/i915_gem_gtt.c | 223 +++-
 drivers/gpu/drm/i915/i915_gem_gtt.h |  24 ++--
 3 files changed, 126 insertions(+), 122 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 4d53728..a043941 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -571,6 +571,7 @@ enum i915_cache_level {
  large Last-Level-Cache. LLC is coherent with
  the CPU, but L3 is only visible to the GPU. */
I915_CACHE_WT, /* hsw:gt3e WriteThrough for scanouts */
+   I915_CACHE_MAX,
 };
 
 struct i915_ctx_hang_stats {
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 15e61d8..d67d803 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -730,9 +730,9 @@ static void gen8_map_page_directory_pointer(struct 
i915_pml4 *pml4,
kunmap_atomic(pagemap);
 }
 
-static void gen8_teardown_va_range_3lvl(struct i915_address_space *vm,
-   struct i915_pagedirpo *pdp,
-   uint64_t start, uint64_t length)
+static void gen8_unmap_vma_3lvl(struct i915_address_space *vm,
+   struct i915_pagedirpo *pdp,
+   uint64_t start, uint64_t length)
 {
struct drm_device *dev = vm->dev;
struct i915_pagedir *pd;
@@ -817,38 +817,43 @@ static void gen8_teardown_va_range_3lvl(struct 
i915_address_space *vm,
}
 }
 
-static void gen8_teardown_va_range_4lvl(struct i915_address_space *vm,
-   struct i915_pml4 *pml4,
-   uint64_t start, uint64_t length)
+static void gen8_unmap_vma_4lvl(struct i915_address_space *vm,
+   struct i915_pml4 *pml4,
+   uint64_t start, uint64_t length)
 {
struct i915_pagedirpo *pdp;
uint64_t temp, pml4e;
 
gen8_for_each_pml4e(pdp, pml4, start, length, temp, pml4e) {
-   gen8_teardown_va_range_3lvl(vm, pdp, start, length);
+   gen8_unmap_vma_3lvl(vm, pdp, start, length);
if (bitmap_empty(pdp->used_pdpes, I915_PDPES_PER_PDP(vm->dev)))
clear_bit(pml4e, pml4->used_pml4es);
}
 }
 
-static void gen8_teardown_va_range(struct i915_address_space *vm,
-  uint64_t start, uint64_t length)
+static void __gen8_teardown_va_range(struct i915_address_space *vm,
+uint64_t start, uint64_t length)
 {
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
 
if (HAS_48B_PPGTT(vm->dev))
-   gen8_teardown_va_range_4lvl(vm, &ppgtt->pml4, start, length);
+   gen8_unmap_vma_4lvl(vm, &ppgtt->pml4, start, length);
else
-   gen8_teardown_va_range_3lvl(vm, &ppgtt->pdp, start, length);
+   gen8_unmap_vma_3lvl(vm, &ppgtt->pdp, start, length);
+}
+
+static void gen8_unmap_vma(struct i915_vma *vma)
+{
+   __gen8_teardown_va_range(vma->vm, vma->node.start, vma->node.size);
 }
 
 static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
trace_i915_va_teardown(&ppgtt->base,
   ppgtt->base.start, ppgtt->base.total);
-   gen8_teardown_va_range(&ppgtt->base,
-  ppgtt->base.start, ppgtt->base.total);
+   __gen8_teardown_va_range(&ppgtt->base,
+ppgtt->base.start, ppgtt->base.total);
 
WARN_ON(!bitmap_empty(ppgtt->pdp.used_pdpes,
  I915_PDPES_PER_PDP(ppgtt->base.dev)));
@@ -1188,15 +1193,15 @@ err_out:
start = orig_start;
length = orig_length;
gen8_for_each_pml4e(pdp, pml4, start, length, temp, pml4e)
-   gen8_teardown_va_range_3lvl(vm, pdp, start, length);
+   gen8_unmap_vma_3lvl(vm, pdp, start, length);
 
 err_alloc:
for_each_set_bit(pml4e, new_pdps, GEN8_PML4ES_PER_PML4)
free_pdp_single(pdp, vm->dev);
 }
 
-static int gen8_alloc_va_range(struct i915_address_space *vm,
-  uint64_t start, uint64_t length)
+static int __gen8_alloc_va_ran

[Intel-gfx] [PATCH 47/56] drm/i915/bdw: 4 level pages tables

2014-05-09 Thread Ben Widawsky
Mapping is easy: it's the same register as PDP descriptor 0, but it only
has one entry. Also, the mapping code is now trivial thanks to all of
the prep patches.

Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 53 +
 drivers/gpu/drm/i915/i915_gem_gtt.h |  4 ++-
 drivers/gpu/drm/i915/i915_reg.h |  1 +
 3 files changed, 52 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 3478bf5..15e61d8 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -528,9 +528,9 @@ static int gen8_write_pdp(struct intel_ring_buffer *ring,
return 0;
 }
 
-static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
- struct intel_ring_buffer *ring,
- bool synchronous)
+static int gen8_legacy_mm_switch(struct i915_hw_ppgtt *ppgtt,
+struct intel_ring_buffer *ring,
+bool synchronous)
 {
int i, ret;
 
@@ -547,6 +547,13 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
return 0;
 }
 
+static int gen8_48b_mm_switch(struct i915_hw_ppgtt *ppgtt,
+ struct intel_ring_buffer *ring,
+ bool synchronous)
+{
+   return gen8_write_pdp(ring, 0, ppgtt->pml4.daddr, synchronous);
+}
+
 static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
   uint64_t start,
   uint64_t length,
@@ -674,6 +681,7 @@ static void gen8_map_pagetable_range(struct 
i915_address_space *vm,
kunmap_atomic(pagedir);
 }
 
+
 static void gen8_map_pagedir(struct i915_pagedir *pd,
 struct i915_pagetab *pt,
 int entry,
@@ -693,6 +701,35 @@ static void gen8_unmap_pagetable(struct i915_hw_ppgtt 
*ppgtt,
gen8_map_pagedir(pd, ppgtt->scratch_pt, pde, ppgtt->base.dev);
 }
 
+static void gen8_map_page_directory(struct i915_pagedirpo *pdp,
+   struct i915_pagedir *pd,
+   int index,
+   struct drm_device *dev)
+{
+   gen8_ppgtt_pdpe_t *pagedirpo;
+   gen8_ppgtt_pdpe_t pdpe;
+
+   if (!HAS_48B_PPGTT(dev))
+   return;
+
+   pagedirpo = kmap_atomic(pdp->page);
+   pdpe = gen8_pde_encode(dev, pd->daddr, I915_CACHE_LLC);
+   pagedirpo[index] = pdpe;
+   kunmap_atomic(pagedirpo);
+}
+
+static void gen8_map_page_directory_pointer(struct i915_pml4 *pml4,
+   struct i915_pagedirpo *pdp,
+   int index,
+   struct drm_device *dev)
+{
+   gen8_ppgtt_pml4e_t *pagemap = kmap_atomic(pml4->page);
+   gen8_ppgtt_pml4e_t pml4e = gen8_pde_encode(dev, pdp->daddr, 
I915_CACHE_LLC);
+   BUG_ON(!HAS_48B_PPGTT(dev));
+   pagemap[index] = pml4e;
+   kunmap_atomic(pagemap);
+}
+
 static void gen8_teardown_va_range_3lvl(struct i915_address_space *vm,
struct i915_pagedirpo *pdp,
uint64_t start, uint64_t length)
@@ -1065,6 +1102,7 @@ static int gen8_alloc_va_range_3lvl(struct 
i915_address_space *vm,
set_bit(pdpe, pdp->used_pdpes);
 
gen8_map_pagetable_range(vm, pd, start, length);
+   gen8_map_page_directory(pdp, pd, pdpe, dev);
}
 
free_gen8_temp_bitmaps(new_page_dirs, new_page_tables, pdpes);
@@ -1132,6 +1170,8 @@ static int gen8_alloc_va_range_4lvl(struct 
i915_address_space *vm,
ret = gen8_alloc_va_range_3lvl(vm, pdp, start, length);
if (ret)
goto err_out;
+
+   gen8_map_page_directory_pointer(pml4, pdp, pml4e, vm->dev);
}
 
WARN(bitmap_weight(pml4->used_pml4es, GEN8_PML4ES_PER_PML4) > 2,
@@ -1201,6 +1241,7 @@ static int gen8_ppgtt_init_common(struct i915_hw_ppgtt 
*ppgtt, uint64_t size)
free_pt_scratch(ppgtt->scratch_pd, ppgtt->base.dev);
return ret;
}
+   ppgtt->switch_mm = gen8_48b_mm_switch;
} else {
int ret = __pdp_init(&ppgtt->pdp, false);
if (ret) {
@@ -1208,7 +1249,7 @@ static int gen8_ppgtt_init_common(struct i915_hw_ppgtt 
*ppgtt, uint64_t size)
return ret;
}
 
-   ppgtt->switch_mm = gen8_mm_switch;
+   ppgtt->switch_mm = gen8_legacy_mm_switch;
trace_i915_pagedirpo_alloc(&ppgtt->base, 0, 0, 
GEN8_PML4E_SHIFT);
}
 
@@ -1235,6 +1276,7 @@ static int gen8_aliasing_ppgtt_init(struct i915_hw_ppgtt 
*ppgtt)
return ret;
}
 
+   /* FIXME: PML4 */
gen8_for_each_pdpe(pd, pdp, st

[Intel-gfx] [PATCH 49/56] drm/i915/bdw: make aliasing PPGTT dynamic

2014-05-09 Thread Ben Widawsky
There is no need to preallocate the aliasing PPGTT. The code is properly
plumbed now to treat this address space like any other.

v2: Updated for CHV. Note CHV doesn't support 64b address space.

Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 281 
 1 file changed, 153 insertions(+), 128 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index d67d803..959054c 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -554,14 +554,14 @@ static int gen8_48b_mm_switch(struct i915_hw_ppgtt *ppgtt,
return gen8_write_pdp(ring, 0, ppgtt->pml4.daddr, synchronous);
 }
 
-static void gen8_ppgtt_clear_range(struct i915_address_space *vm,
-  uint64_t start,
-  uint64_t length,
-  bool use_scratch)
+/* Helper function to clear a range of PTEs. The range may span multiple
+ * page tables. */
+static void gen8_ppgtt_clear_pte_range(struct i915_hw_ppgtt *ppgtt,
+  struct i915_pagedirpo *pdp,
+  uint64_t start,
+  uint64_t length,
+  bool scratch)
 {
-   struct i915_hw_ppgtt *ppgtt =
-   container_of(vm, struct i915_hw_ppgtt, base);
-   struct i915_pagedirpo *pdp = &ppgtt->pdp; /* FIXME: 48b */
gen8_gtt_pte_t *pt_vaddr, scratch_pte;
unsigned pdpe = gen8_pdpe_index(start);
unsigned pde = gen8_pde_index(start);
@@ -570,7 +570,7 @@ static void gen8_ppgtt_clear_range(struct 
i915_address_space *vm,
unsigned last_pte, i;
 
scratch_pte = gen8_pte_encode(ppgtt->base.scratch.addr,
- I915_CACHE_LLC, use_scratch);
+ I915_CACHE_LLC, scratch);
 
while (num_entries) {
struct i915_pagedir *pd = pdp->pagedirs[pdpe];
@@ -600,23 +600,21 @@ static void gen8_ppgtt_clear_range(struct 
i915_address_space *vm,
}
 }
 
-static void gen8_ppgtt_insert_entries(struct i915_address_space *vm,
- struct sg_table *pages,
- uint64_t start,
- enum i915_cache_level cache_level)
+static void gen8_ppgtt_insert_pte_entries(struct i915_pagedirpo *pdp,
+ struct sg_page_iter *sg_iter,
+ uint64_t start,
+ size_t pages,
+ enum i915_cache_level cache_level,
+ bool flush_pt)
 {
-   struct i915_hw_ppgtt *ppgtt =
-   container_of(vm, struct i915_hw_ppgtt, base);
-   struct i915_pagedirpo *pdp = &ppgtt->pdp; /* FIXME: 48b */
gen8_gtt_pte_t *pt_vaddr;
unsigned pdpe = gen8_pdpe_index(start);
unsigned pde = gen8_pde_index(start);
unsigned pte = gen8_pte_index(start);
-   struct sg_page_iter sg_iter;
 
pt_vaddr = NULL;
 
-   for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
+   while (pages-- && __sg_page_iter_next(sg_iter)) {
if (pt_vaddr == NULL) {
struct i915_pagedir *pd = pdp->pagedirs[pdpe];
struct i915_pagetab *pt = pd->page_tables[pde];
@@ -625,10 +623,10 @@ static void gen8_ppgtt_insert_entries(struct 
i915_address_space *vm,
}
 
pt_vaddr[pte] =
-   gen8_pte_encode(sg_page_iter_dma_address(&sg_iter),
+   gen8_pte_encode(sg_page_iter_dma_address(sg_iter),
cache_level, true);
if (++pte == GEN8_PTES_PER_PT) {
-   if (!HAS_LLC(ppgtt->base.dev))
+   if (flush_pt)
drm_clflush_virt_range(pt_vaddr, PAGE_SIZE);
kunmap_atomic(pt_vaddr);
pt_vaddr = NULL;
@@ -640,7 +638,7 @@ static void gen8_ppgtt_insert_entries(struct 
i915_address_space *vm,
}
}
if (pt_vaddr) {
-   if (!HAS_LLC(ppgtt->base.dev))
+   if (flush_pt)
drm_clflush_virt_range(pt_vaddr, PAGE_SIZE);
kunmap_atomic(pt_vaddr);
}
@@ -730,10 +728,14 @@ static void gen8_map_page_directory_pointer(struct 
i915_pml4 *pml4,
kunmap_atomic(pagemap);
 }
 
-static void gen8_unmap_vma_3lvl(struct i915_address_space *vm,
-   struct i915_pagedirpo *pdp,
-   uint64_t start, uint64_t length)
+/* Returns 1 if a PDP has been freed and the caller could potentially
+ * clean up. */
+static int gen8_unmap_vma_3lvl(struct i915_address_spac

[Intel-gfx] [PATCH 50/56] drm/i915: Expand error state's address width to 64b

2014-05-09 Thread Ben Widawsky
v2: Zero-pad the new 8B fields or else intel_error_decode has a hard time.
Note, regardless we need an igt update.

v3: Make reloc_offset 64b also.

Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/i915/i915_drv.h   |  4 ++--
 drivers/gpu/drm/i915/i915_gpu_error.c | 18 ++
 2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index a043941..b3b52cf 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -365,7 +365,7 @@ struct drm_i915_error_state {
 
struct drm_i915_error_object {
int page_count;
-   u32 gtt_offset;
+   u64 gtt_offset;
u32 *pages[0];
} *ringbuffer, *batchbuffer, *wa_batchbuffer, *ctx, *hws_page;
 
@@ -390,7 +390,7 @@ struct drm_i915_error_state {
u32 size;
u32 name;
u32 rseqno, wseqno;
-   u32 gtt_offset;
+   u64 gtt_offset;
u32 read_domains;
u32 write_domain;
s32 fence_reg:I915_MAX_NUM_FENCE_BITS;
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c 
b/drivers/gpu/drm/i915/i915_gpu_error.c
index 5d691cd..d639d6f 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -195,7 +195,7 @@ static void print_error_buffers(struct 
drm_i915_error_state_buf *m,
err_printf(m, "%s [%d]:\n", name, count);
 
while (count--) {
-   err_printf(m, "  %08x %8u %02x %02x %x %x",
+   err_printf(m, "  %016llx %8u %02x %02x %x %x",
   err->gtt_offset,
   err->size,
   err->read_domains,
@@ -402,7 +402,7 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf 
*m,
err_printf(m, " (submitted by %s [%d])",
   error->ring[i].comm,
   error->ring[i].pid);
-   err_printf(m, " --- gtt_offset = 0x%08x\n",
+   err_printf(m, " --- gtt_offset = 0x%016llx\n",
   obj->gtt_offset);
print_error_obj(m, obj);
}
@@ -410,7 +410,8 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf 
*m,
obj = error->ring[i].wa_batchbuffer;
if (obj) {
err_printf(m, "%s (w/a) --- gtt_offset = 0x%08x\n",
-  dev_priv->ring[i].name, obj->gtt_offset);
+  dev_priv->ring[i].name,
+  lower_32_bits(obj->gtt_offset));
print_error_obj(m, obj);
}
 
@@ -429,14 +430,14 @@ int i915_error_state_to_str(struct 
drm_i915_error_state_buf *m,
if ((obj = error->ring[i].ringbuffer)) {
err_printf(m, "%s --- ringbuffer = 0x%08x\n",
   dev_priv->ring[i].name,
-  obj->gtt_offset);
+  lower_32_bits(obj->gtt_offset));
print_error_obj(m, obj);
}
 
if ((obj = error->ring[i].hws_page)) {
err_printf(m, "%s --- HW Status = 0x%08x\n",
   dev_priv->ring[i].name,
-  obj->gtt_offset);
+  lower_32_bits(obj->gtt_offset));
offset = 0;
for (elt = 0; elt < PAGE_SIZE/16; elt += 4) {
err_printf(m, "[%04x] %08x %08x %08x %08x\n",
@@ -452,13 +453,14 @@ int i915_error_state_to_str(struct 
drm_i915_error_state_buf *m,
if ((obj = error->ring[i].ctx)) {
err_printf(m, "%s --- HW Context = 0x%08x\n",
   dev_priv->ring[i].name,
-  obj->gtt_offset);
+  lower_32_bits(obj->gtt_offset));
print_error_obj(m, obj);
}
}
 
if ((obj = error->semaphore_obj)) {
-   err_printf(m, "Semaphore page = 0x%08x\n", obj->gtt_offset);
+   err_printf(m, "Semaphore page = 0x%08x\n",
+  lower_32_bits(obj->gtt_offset));
for (elt = 0; elt < PAGE_SIZE/16; elt += 4) {
err_printf(m, "[%04x] %08x %08x %08x %08x\n",
   elt * 4,
@@ -554,7 +556,7 @@ i915_error_object_create_sized(struct drm_i915_private 
*dev_priv,
 {
struct drm_i915_error_object *dst;
int i;
-   u32 reloc_offset;
+   u64 reloc_offset;
 
if (src == NULL || src->pages == NULL)
return NULL;
-- 
1.9.

[Intel-gfx] [PATCH 41/56] drm/i915/bdw: Optimize PDP loads

2014-05-09 Thread Ben Widawsky
Don't reload the PDPs when it isn't necessary; for the RCS it isn't,
under certain conditions.

Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 20 
 drivers/gpu/drm/i915/i915_gem_gtt.h |  1 +
 2 files changed, 21 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index d8bb4dc..3ea0c7d 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -438,8 +438,20 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
  struct intel_ring_buffer *ring,
  bool synchronous)
 {
+   struct drm_i915_private *dev_priv = ring->dev->dev_private;
int i, ret;
 
+	/* The RCS ring gets reloaded by the hardware context state. So we only
+	 * need to actually reload if one of the page directory pointers has
+	 * changed, or it's !RCS.
+	 *
+	 * Aliasing PPGTT remains special, as we do not track its
+	 * reloading needs.
+	 */
+   if (ppgtt != dev_priv->mm.aliasing_ppgtt &&
+   ring->id == RCS && !ppgtt->pdp.needs_reload)
+   return 0;
+
for (i = GEN8_LEGACY_PDPES - 1; i >= 0; i--) {
struct i915_pagedir *pd = ppgtt->pdp.pagedirs[i];
dma_addr_t addr = pd ? pd->daddr : ppgtt->scratch_pt->daddr;
@@ -450,6 +462,9 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
return ret;
}
 
+
+   ppgtt->pdp.needs_reload = 0;
+
return 0;
 }
 
@@ -651,6 +666,7 @@ static void gen8_teardown_va_range(struct 
i915_address_space *vm,
free_pd_single(pd, vm->dev);
ppgtt->pdp.pagedirs[pdpe] = NULL;
WARN_ON(!test_and_clear_bit(pdpe, 
ppgtt->pdp.used_pdpes));
+   ppgtt->pdp.needs_reload = 1;
}
}
 }
@@ -901,6 +917,8 @@ static int gen8_alloc_va_range(struct i915_address_space 
*vm,
}
 
set_bit(pdpe, ppgtt->pdp.used_pdpes);
+   if (test_and_set_bit(pdpe, ppgtt->pdp.used_pdpes))
+   ppgtt->pdp.needs_reload = 1;
 
gen8_map_pagetable_range(pd, start, length, ppgtt->base.dev);
}
@@ -937,6 +955,8 @@ static int gen8_ppgtt_init_common(struct i915_hw_ppgtt 
*ppgtt, uint64_t size)
ppgtt->switch_mm = gen8_mm_switch;
ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
 
+   ppgtt->pdp.needs_reload = 1;
+
ppgtt->scratch_pd = alloc_pt_scratch(ppgtt->base.dev);
if (IS_ERR(ppgtt->scratch_pd))
return PTR_ERR(ppgtt->scratch_pd);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h 
b/drivers/gpu/drm/i915/i915_gem_gtt.h
index b3d0776..dd561f3 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -198,6 +198,7 @@ struct i915_pagedirpo {
/* struct page *page; */
DECLARE_BITMAP(used_pdpes, GEN8_LEGACY_PDPES);
struct i915_pagedir *pagedirs[GEN8_LEGACY_PDPES];
+   unsigned needs_reload:1;
 };
 
 struct i915_address_space {
-- 
1.9.2



[Intel-gfx] [PATCH 46/56] drm/i915/bdw: implement alloc/teardown for 4lvl

2014-05-09 Thread Ben Widawsky
The code for 4lvl works just as one would expect, and nicely it is able
to call into the existing 3lvl page table code to handle all of the
lower levels.

PML4 has no special attributes.

Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 170 
 drivers/gpu/drm/i915/i915_gem_gtt.h |  12 ++-
 2 files changed, 163 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index c4b53ef..3478bf5 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -413,9 +413,12 @@ static void __pdp_fini(struct i915_pagedirpo *pdp)
 static void free_pdp_single(struct i915_pagedirpo *pdp,
struct drm_device *dev)
 {
-   __pdp_fini(pdp);
-   if (HAS_48B_PPGTT(dev))
+   if (HAS_48B_PPGTT(dev)) {
+   __pdp_fini(pdp);
+   i915_dma_unmap_single(pdp, dev);
+   __free_page(pdp->page);
kfree(pdp);
+   }
 }
 
 static int __pdp_init(struct i915_pagedirpo *pdp,
@@ -441,6 +444,58 @@ static int __pdp_init(struct i915_pagedirpo *pdp,
return 0;
 }
 
+static struct i915_pagedirpo *alloc_pdp_single(struct i915_hw_ppgtt *ppgtt,
+  struct i915_pml4 *pml4)
+{
+   struct drm_device *dev = ppgtt->base.dev;
+   struct i915_pagedirpo *pdp;
+   int ret;
+
+   BUG_ON(!HAS_48B_PPGTT(dev));
+
+   pdp = kmalloc(sizeof(*pdp), GFP_KERNEL);
+   if (!pdp)
+   return ERR_PTR(-ENOMEM);
+
+   pdp->page = alloc_page(GFP_KERNEL | GFP_DMA32 | __GFP_ZERO);
+   if (!pdp->page) {
+   kfree(pdp);
+   return ERR_PTR(-ENOMEM);
+   }
+
+   ret = __pdp_init(pdp, dev);
+   if (ret) {
+   __free_page(pdp->page);
+   kfree(pdp);
+   return ERR_PTR(ret);
+   }
+
+   i915_dma_map_px_single(pdp, dev);
+
+   return pdp;
+}
+
+static void pml4_fini(struct i915_pml4 *pml4)
+{
+   struct i915_hw_ppgtt *ppgtt =
+   container_of(pml4, struct i915_hw_ppgtt, pml4);
+   i915_dma_unmap_single(pml4, ppgtt->base.dev);
+   __free_page(pml4->page);
+}
+
+static int pml4_init(struct i915_hw_ppgtt *ppgtt)
+{
+   struct i915_pml4 *pml4 = &ppgtt->pml4;
+
+   pml4->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+   if (!pml4->page)
+   return -ENOMEM;
+
+   i915_dma_map_px_single(pml4, ppgtt->base.dev);
+
+   return 0;
+}
+
 /* Broadwell Page Directory Pointer Descriptors */
 static int gen8_write_pdp(struct intel_ring_buffer *ring,
  unsigned entry,
@@ -729,7 +784,14 @@ static void gen8_teardown_va_range_4lvl(struct 
i915_address_space *vm,
struct i915_pml4 *pml4,
uint64_t start, uint64_t length)
 {
-   BUG();
+   struct i915_pagedirpo *pdp;
+   uint64_t temp, pml4e;
+
+   gen8_for_each_pml4e(pdp, pml4, start, length, temp, pml4e) {
+   gen8_teardown_va_range_3lvl(vm, pdp, start, length);
+   if (bitmap_empty(pdp->used_pdpes, I915_PDPES_PER_PDP(vm->dev)))
+   clear_bit(pml4e, pml4->used_pml4es);
+   }
 }
 
 static void gen8_teardown_va_range(struct i915_address_space *vm,
@@ -738,10 +800,10 @@ static void gen8_teardown_va_range(struct 
i915_address_space *vm,
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
 
-   if (!HAS_48B_PPGTT(vm->dev))
-   gen8_teardown_va_range_3lvl(vm, &ppgtt->pdp, start, length);
-   else
+   if (HAS_48B_PPGTT(vm->dev))
gen8_teardown_va_range_4lvl(vm, &ppgtt->pml4, start, length);
+   else
+   gen8_teardown_va_range_3lvl(vm, &ppgtt->pdp, start, length);
 }
 
 static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
@@ -1021,12 +1083,76 @@ err_out:
return ret;
 }
 
-static int __noreturn gen8_alloc_va_range_4lvl(struct i915_address_space *vm,
-  struct i915_pml4 *pml4,
-  uint64_t start,
-  uint64_t length)
+static int gen8_alloc_va_range_4lvl(struct i915_address_space *vm,
+   struct i915_pml4 *pml4,
+   uint64_t start,
+   uint64_t length)
 {
-   BUG();
+   DECLARE_BITMAP(new_pdps, GEN8_PML4ES_PER_PML4);
+   struct i915_hw_ppgtt *ppgtt =
+   container_of(vm, struct i915_hw_ppgtt, base);
+   struct i915_pagedirpo *pdp;
+   const uint64_t orig_start = start;
+   const uint64_t orig_length = length;
+   uint64_t temp, pml4e;
+
+   /* Do the pml4 allocations first, so we don't need to track the newly
+* allocated tables below the pdp */
+   

[Intel-gfx] [PATCH 44/56] drm/i915/bdw: Make pdp allocation more dynamic

2014-05-09 Thread Ben Widawsky
This transitional patch doesn't do much for the existing code. However,
it should make the upcoming patches that use the full 48b address space
a bit easier to swallow. The patch also introduces the PML4, i.e. the new
top-level structure of the page tables.
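
For orientation, a 48b address decomposes into 9-bit indices per level
over 4KB pages. Illustrative helpers (the shifts match the
GEN8_PML4E_SHIFT/GEN8_PDPE_SHIFT/GEN8_PDE_SHIFT names used later in the
series; the functions themselves are sketches):

	/* addr bits:  47..39 | 38..30 | 29..21 | 20..12 | 11..0
	 *             PML4E    PDPE     PDE      PTE      page offset */
	static inline uint32_t gen8_pml4e_index(uint64_t addr)
	{
		return (addr >> 39) & 0x1ff;	/* selects one of 512 PDPs */
	}

	static inline uint32_t gen8_pdpe_index(uint64_t addr)
	{
		return (addr >> 30) & 0x1ff;	/* selects a page directory */
	}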

Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/i915/i915_drv.h |   5 ++
 drivers/gpu/drm/i915/i915_gem_gtt.c | 122 +---
 drivers/gpu/drm/i915/i915_gem_gtt.h |  40 +---
 drivers/gpu/drm/i915/i915_trace.h   |  16 +
 4 files changed, 151 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 29bf034..4d53728 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1920,6 +1920,11 @@ struct drm_i915_cmd_table {
 #define HAS_PPGTT(dev) (INTEL_INFO(dev)->gen >= 7 && 
!IS_VALLEYVIEW(dev))
 #define USES_PPGTT(dev)intel_enable_ppgtt(dev, false)
 #define USES_FULL_PPGTT(dev)   intel_enable_ppgtt(dev, true)
+#ifdef CONFIG_32BIT
+# define HAS_48B_PPGTT(dev)false
+#else
+# define HAS_48B_PPGTT(dev)(IS_BROADWELL(dev) && false)
+#endif
 
 #define HAS_OVERLAY(dev)   (INTEL_INFO(dev)->has_overlay)
 #define OVERLAY_NEEDS_PHYSICAL(dev)
(INTEL_INFO(dev)->overlay_needs_physical)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 4d01d4e..df3cd41 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -402,6 +402,45 @@ free_pd:
return ERR_PTR(ret);
 }
 
+static void __pdp_fini(struct i915_pagedirpo *pdp)
+{
+   kfree(pdp->used_pdpes);
+   kfree(pdp->pagedirs);
+   /* HACK */
+   pdp->pagedirs = NULL;
+}
+
+static void free_pdp_single(struct i915_pagedirpo *pdp,
+   struct drm_device *dev)
+{
+   __pdp_fini(pdp);
+   if (HAS_48B_PPGTT(dev))
+   kfree(pdp);
+}
+
+static int __pdp_init(struct i915_pagedirpo *pdp,
+ struct drm_device *dev)
+{
+   size_t pdpes = I915_PDPES_PER_PDP(dev);
+
+   pdp->used_pdpes = kcalloc(BITS_TO_LONGS(pdpes),
+ sizeof(unsigned long),
+ GFP_KERNEL);
+   if (!pdp->used_pdpes)
+   return -ENOMEM;
+
+   pdp->pagedirs = kcalloc(pdpes, sizeof(*pdp->pagedirs), GFP_KERNEL);
+   if (!pdp->pagedirs) {
+   kfree(pdp->used_pdpes);
+   /* the PDP might be the statically allocated top level. Keep it
+* as clean as possible */
+   pdp->used_pdpes = NULL;
+   return -ENOMEM;
+   }
+
+   return 0;
+}
+
 /* Broadwell Page Directory Pointer Descriptors */
 static int gen8_write_pdp(struct intel_ring_buffer *ring,
  unsigned entry,
@@ -440,7 +479,7 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
 {
int i, ret;
 
-   for (i = GEN8_LEGACY_PDPES - 1; i >= 0; i--) {
+   for (i = 3; i >= 0; i--) {
struct i915_pagedir *pd = ppgtt->pdp.pagedirs[i];
dma_addr_t addr = pd ? pd->daddr : ppgtt->scratch_pt->daddr;
/* The page directory might be NULL, but we need to clear out
@@ -514,9 +553,6 @@ static void gen8_ppgtt_insert_entries(struct 
i915_address_space *vm,
pt_vaddr = NULL;
 
for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
-   if (WARN_ON(pdpe >= GEN8_LEGACY_PDPES))
-   break;
-
if (pt_vaddr == NULL) {
struct i915_pagedir *pd = ppgtt->pdp.pagedirs[pdpe];
struct i915_pagetab *pt = pd->page_tables[pde];
@@ -605,10 +641,16 @@ static void gen8_teardown_va_range(struct 
i915_address_space *vm,
 {
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
+   struct drm_device *dev = vm->dev;
struct i915_pagedir *pd;
struct i915_pagetab *pt;
uint64_t temp;
-   uint32_t pdpe, pde;
+   uint32_t pdpe, pde, orig_start = start;
+
+   if (!ppgtt->pdp.pagedirs) {
+   /* If pagedirs are already free, there is nothing to do.*/
+   return;
+   }
 
gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
uint64_t pd_len = gen8_clamp_pd(start, length);
@@ -653,7 +695,7 @@ static void gen8_teardown_va_range(struct 
i915_address_space *vm,
 pde,
 pd_start & 
GENMASK_ULL(64, GEN8_PDE_SHIFT),
 GEN8_PDE_SHIFT);
-   free_pt_single(pt, vm->dev);
+   free_pt_single(pt, dev);
/* This may be nixed later. Optimize? */
gen8_unmap_pagetab

[Intel-gfx] [PATCH 42/56] TESTME: Either drop the last patch or fix it.

2014-05-09 Thread Ben Widawsky
I was getting unexplainable hangs with the last patch, even though I
think it should be correct. As the subject says, should this ever get
merged, it needs to be coordinated with the patch this reverts.

Revert "drm/i915/bdw: Optimize PDP loads"

This reverts commit 64053129b5cbd3a5f87dab27d026c17efbdf0387.
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 20 
 drivers/gpu/drm/i915/i915_gem_gtt.h |  1 -
 2 files changed, 21 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 3ea0c7d..d8bb4dc 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -438,20 +438,8 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
  struct intel_ring_buffer *ring,
  bool synchronous)
 {
-   struct drm_i915_private *dev_priv = ring->dev->dev_private;
int i, ret;
 
-	/* The RCS ring gets reloaded by the hardware context state. So we only
-	 * need to actually reload if one of the page directory pointers has
-	 * changed, or it's !RCS.
-	 *
-	 * Aliasing PPGTT remains special, as we do not track its
-	 * reloading needs.
-	 */
-   if (ppgtt != dev_priv->mm.aliasing_ppgtt &&
-   ring->id == RCS && !ppgtt->pdp.needs_reload)
-   return 0;
-
for (i = GEN8_LEGACY_PDPES - 1; i >= 0; i--) {
struct i915_pagedir *pd = ppgtt->pdp.pagedirs[i];
dma_addr_t addr = pd ? pd->daddr : ppgtt->scratch_pt->daddr;
@@ -462,9 +450,6 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
return ret;
}
 
-
-   ppgtt->pdp.needs_reload = 0;
-
return 0;
 }
 
@@ -666,7 +651,6 @@ static void gen8_teardown_va_range(struct 
i915_address_space *vm,
free_pd_single(pd, vm->dev);
ppgtt->pdp.pagedirs[pdpe] = NULL;
WARN_ON(!test_and_clear_bit(pdpe, 
ppgtt->pdp.used_pdpes));
-   ppgtt->pdp.needs_reload = 1;
}
}
 }
@@ -917,8 +901,6 @@ static int gen8_alloc_va_range(struct i915_address_space 
*vm,
}
 
set_bit(pdpe, ppgtt->pdp.used_pdpes);
-   if (test_and_set_bit(pdpe, ppgtt->pdp.used_pdpes))
-   ppgtt->pdp.needs_reload = 1;
 
gen8_map_pagetable_range(pd, start, length, ppgtt->base.dev);
}
@@ -955,8 +937,6 @@ static int gen8_ppgtt_init_common(struct i915_hw_ppgtt 
*ppgtt, uint64_t size)
ppgtt->switch_mm = gen8_mm_switch;
ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
 
-   ppgtt->pdp.needs_reload = 1;
-
ppgtt->scratch_pd = alloc_pt_scratch(ppgtt->base.dev);
if (IS_ERR(ppgtt->scratch_pd))
return PTR_ERR(ppgtt->scratch_pd);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h 
b/drivers/gpu/drm/i915/i915_gem_gtt.h
index dd561f3..b3d0776 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -198,7 +198,6 @@ struct i915_pagedirpo {
/* struct page *page; */
DECLARE_BITMAP(used_pdpes, GEN8_LEGACY_PDPES);
struct i915_pagedir *pagedirs[GEN8_LEGACY_PDPES];
-   unsigned needs_reload:1;
 };
 
 struct i915_address_space {
-- 
1.9.2



[Intel-gfx] [PATCH 43/56] drm/i915/bdw: Add dynamic page trace events

2014-05-09 Thread Ben Widawsky
This works the same as GEN6.

I was disappointed that I need to pass vm around now, but it's not so
much uglier than the drm_device, and having the vm in trace events is
hugely important.

v2: Consolidate pagetable/pagedirectory events

Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 41 -
 drivers/gpu/drm/i915/i915_trace.h   | 16 +++
 2 files changed, 47 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index d8bb4dc..4d01d4e 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -558,19 +558,24 @@ static void __gen8_do_map_pt(gen8_ppgtt_pde_t *pde,
 /* It's likely we'll map more than one pagetable at a time. This function will
  * save us unnecessary kmap calls, but do no more functionally than multiple
  * calls to map_pt. */
-static void gen8_map_pagetable_range(struct i915_pagedir *pd,
+static void gen8_map_pagetable_range(struct i915_address_space *vm,
+struct i915_pagedir *pd,
 uint64_t start,
-uint64_t length,
-struct drm_device *dev)
+uint64_t length)
 {
gen8_ppgtt_pde_t *pagedir = kmap_atomic(pd->page);
struct i915_pagetab *pt;
uint64_t temp, pde;
 
-   gen8_for_each_pde(pt, pd, start, length, temp, pde)
-   __gen8_do_map_pt(pagedir + pde, pt, dev);
+   gen8_for_each_pde(pt, pd, start, length, temp, pde) {
+   __gen8_do_map_pt(pagedir + pde, pt, vm->dev);
+   trace_i915_pagetable_map(vm, pde, pt,
+gen8_pte_index(start),
+gen8_pte_count(start, length),
+GEN8_PTES_PER_PT);
+   }
 
-   if (!HAS_LLC(dev))
+   if (!HAS_LLC(vm->dev))
drm_clflush_virt_range(pagedir, PAGE_SIZE);
 
kunmap_atomic(pagedir);
@@ -634,11 +639,20 @@ static void gen8_teardown_va_range(struct 
i915_address_space *vm,
 "PDE %d not reserved, but is allocated 
(%p)",
 pde, vm);
 
+   trace_i915_pagetable_unmap(vm, pde, pt,
+  gen8_pte_index(pd_start),
+  gen8_pte_count(pd_start, 
pd_len),
+  GEN8_PTES_PER_PT);
+
bitmap_clear(pt->used_ptes,
 gen8_pte_index(pd_start),
 gen8_pte_count(pd_start, pd_len));
 
if (bitmap_empty(pt->used_ptes, GEN8_PTES_PER_PT)) {
+   trace_i915_pagetable_destroy(vm,
+pde,
+pd_start & 
GENMASK_ULL(64, GEN8_PDE_SHIFT),
+GEN8_PDE_SHIFT);
free_pt_single(pt, vm->dev);
/* This may be nixed later. Optimize? */
gen8_unmap_pagetable(ppgtt, pd, pde);
@@ -650,6 +664,9 @@ static void gen8_teardown_va_range(struct 
i915_address_space *vm,
if (bitmap_empty(pd->used_pdes, I915_PDES_PER_PD)) {
free_pd_single(pd, vm->dev);
ppgtt->pdp.pagedirs[pdpe] = NULL;
+   trace_i915_pagedirectory_destroy(vm, pdpe,
+start & 
GENMASK_ULL(64, GEN8_PDPE_SHIFT),
+GEN8_PDPE_SHIFT);
WARN_ON(!test_and_clear_bit(pdpe, 
ppgtt->pdp.used_pdpes));
}
}
@@ -698,6 +715,7 @@ static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt 
*ppgtt,
 uint64_t length,
 unsigned long *new_pts)
 {
+   struct drm_device *dev = ppgtt->base.dev;
struct i915_pagetab *unused;
uint64_t temp;
uint32_t pde;
@@ -706,19 +724,20 @@ static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt 
*ppgtt,
if (unused)
continue;
 
-   pd->page_tables[pde] = alloc_pt_single(ppgtt->base.dev);
+   pd->page_tables[pde] = alloc_pt_single(dev);
 
if (IS_ERR(pd->page_tables[pde]))
goto unwind_out;
 
set_bit(pde, new_pts);
+   trace_i915_pagetable_alloc(&ppgtt->base, pde, start, 
GEN8_PDE_SHIFT);
}
 
return 0;
 
 unwind_out:
for_each_set_bit(pde, new_pts, I915_PDES_PER_PD)
-

[Intel-gfx] [PATCH 25/56] drm/i915: Always dma map page directory allocations

2014-05-09 Thread Ben Widawsky
Similar to the patch a few back in the series, we can always map and
unmap page directories when we do their allocation and teardown. Page
directory pages only exist on gen8+, so this should only affect behavior
on those platforms.
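
The net shape after this patch is the usual allocate-then-map pattern
with unwind on failure. A condensed sketch using the helper names from
this series (illustrative, not the final code):

static struct i915_pagedir *alloc_pd_single(struct drm_device *dev)
{
	struct i915_pagedir *pd = kzalloc(sizeof(*pd), GFP_KERNEL);
	int ret;

	if (!pd)
		return ERR_PTR(-ENOMEM);

	pd->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
	if (!pd->page) {
		kfree(pd);
		return ERR_PTR(-ENOMEM);
	}

	/* The DMA mapping is created here, at allocation time, so any pd
	 * handed back to a caller is already GPU-visible; free_pd_single()
	 * unmaps in the mirrored order. */
	ret = i915_dma_map_px_single(pd, dev);
	if (ret) {
		__free_page(pd->page);
		kfree(pd);
		return ERR_PTR(ret);
	}

	return pd;
}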

Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 79 +
 1 file changed, 19 insertions(+), 60 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index bb909e9..51fc036 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -311,21 +311,23 @@ err_out:
return ret;
 }
 
-static void __free_pd_single(struct i915_pagedir *pd)
+static void __free_pd_single(struct i915_pagedir *pd, struct drm_device *dev)
 {
+   i915_dma_unmap_single(pd, dev);
__free_page(pd->page);
kfree(pd);
 }
 
-#define free_pd_single(pd) do { \
+#define free_pd_single(pd, dev) do { \
if ((pd)->page) { \
-   __free_pd_single(pd); \
+   __free_pd_single(pd, dev); \
} \
 } while (0)
 
-static struct i915_pagedir *alloc_pd_single(void)
+static struct i915_pagedir *alloc_pd_single(struct drm_device *dev)
 {
struct i915_pagedir *pd;
+   int ret;
 
pd = kzalloc(sizeof(*pd), GFP_KERNEL);
if (!pd)
@@ -337,6 +339,13 @@ static struct i915_pagedir *alloc_pd_single(void)
return ERR_PTR(-ENOMEM);
}
 
+   ret = i915_dma_map_px_single(pd, dev);
+   if (ret) {
+   __free_page(pd->page);
+   kfree(pd);
+   return ERR_PTR(ret);
+   }
+
return pd;
 }
 
@@ -501,30 +510,7 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 
for (i = 0; i < ppgtt->num_pd_pages; i++) {
gen8_free_page_tables(ppgtt->pdp.pagedir[i], ppgtt->base.dev);
-   free_pd_single(ppgtt->pdp.pagedir[i]);
-   }
-}
-
-static void gen8_ppgtt_dma_unmap_pages(struct i915_hw_ppgtt *ppgtt)
-{
-   struct drm_device *dev = ppgtt->base.dev;
-   int i, j;
-
-   for (i = 0; i < ppgtt->num_pd_pages; i++) {
-   /* TODO: In the future we'll support sparse mappings, so this
-* will have to change. */
-   if (!ppgtt->pdp.pagedir[i]->daddr)
-   continue;
-
-   i915_dma_unmap_single(ppgtt->pdp.pagedir[i], dev);
-
-   for (j = 0; j < I915_PDES_PER_PD; j++) {
-   struct i915_pagedir *pd = ppgtt->pdp.pagedir[i];
-   struct i915_pagetab *pt =  pd->page_tables[j];
-   dma_addr_t addr = pt->daddr;
-   if (addr)
-   i915_dma_unmap_single(pt, dev);
-   }
+   free_pd_single(ppgtt->pdp.pagedir[i], ppgtt->base.dev);
}
 }
 
@@ -536,7 +522,6 @@ static void gen8_ppgtt_cleanup(struct i915_address_space 
*vm)
list_del(&vm->global_link);
drm_mm_takedown(&vm->mm);
 
-   gen8_ppgtt_dma_unmap_pages(ppgtt);
gen8_ppgtt_free(ppgtt);
 }
 
@@ -566,7 +551,7 @@ static int gen8_ppgtt_allocate_page_directories(struct 
i915_hw_ppgtt *ppgtt,
int i;
 
for (i = 0; i < max_pdp; i++) {
-   ppgtt->pdp.pagedir[i] = alloc_pd_single();
+   ppgtt->pdp.pagedir[i] = alloc_pd_single(ppgtt->base.dev);
if (IS_ERR(ppgtt->pdp.pagedir[i]))
goto unwind_out;
}
@@ -578,7 +563,8 @@ static int gen8_ppgtt_allocate_page_directories(struct 
i915_hw_ppgtt *ppgtt,
 
 unwind_out:
while (i--)
-   free_pd_single(ppgtt->pdp.pagedir[i]);
+   free_pd_single(ppgtt->pdp.pagedir[i],
+  ppgtt->base.dev);
 
return -ENOMEM;
 }
@@ -606,19 +592,6 @@ err_out:
return ret;
 }
 
-static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
-const int pdpe)
-{
-   int ret;
-
-   ret = i915_dma_map_px_single(ppgtt->pdp.pagedir[pdpe],
-ppgtt->base.dev);
-   if (ret)
-   return ret;
-
-   return 0;
-}
-
 /**
  * GEN8 legacy ppgtt programming is accomplished through a max 4 PDP registers
  * with a net effect resembling a 2-level page table in normal x86 terms. Each
@@ -644,16 +617,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, 
uint64_t size)
return ret;
 
/*
-* 2. Create DMA mappings for the page directories and page tables.
-*/
-   for (i = 0; i < max_pdp; i++) {
-   ret = gen8_ppgtt_setup_page_directories(ppgtt, i);
-   if (ret)
-   goto bail;
-   }
-
-   /*
-* 3. Map all the page directory entries to point to the page tables
+* 2. Map all the page directory entries to point to the page tables
 * we've allocated.
 *
   

[Intel-gfx] [PATCH 24/56] drm/i915: Consolidate dma mappings

2014-05-09 Thread Ben Widawsky
With a little bit of macro magic, and the fact that every page
table/dir/etc. we wish to map has a page and a daddr member, we can
greatly simplify and reduce code.

The patch introduces i915_dma_map/unmap macros which have the same
semantics as pci_map_page, but are one line each and don't require
newlines or local variables to fit cleanly.

Notice that even the page allocation shares this same attribute. For
now, I am leaving that code untouched because the macro version would be
a bit on the big side - but it's a nice cleanup as well (IMO).
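
The "macro magic" is really just structural typing: any struct exposing
a page and a daddr member works with the same macro body. A reduced
sketch of the idea (types abbreviated; the full definitions are in the
diff below):

/* Both structs expose the same two members, so one macro serves both. */
struct i915_pagetab { struct page *page; dma_addr_t daddr; /* ... */ };
struct i915_pagedir { struct page *page; dma_addr_t daddr; /* ... */ };

#define i915_dma_map_px_single(px, dev) \
	pci_dma_mapping_error((dev)->pdev, \
			      (px)->daddr = pci_map_page((dev)->pdev, \
							 (px)->page, 0, 4096, \
							 PCI_DMA_BIDIRECTIONAL))

/* Usage is identical for a page table or a page directory: */
ret = i915_dma_map_px_single(pt, dev);
ret = i915_dma_map_px_single(pd, dev);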

Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 56 -
 1 file changed, 18 insertions(+), 38 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 92ffee7..bb909e9 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -211,45 +211,33 @@ static gen6_gtt_pte_t iris_pte_encode(dma_addr_t addr,
return pte;
 }
 
-#define dma_unmap_pt_single(pt, dev) do { \
-   pci_unmap_page((dev)->pdev, (pt)->daddr, 4096, PCI_DMA_BIDIRECTIONAL); \
+#define i915_dma_unmap_single(px, dev) do { \
+   pci_unmap_page((dev)->pdev, (px)->daddr, 4096, PCI_DMA_BIDIRECTIONAL); \
 } while (0);
 
 /**
- * dma_map_pt_single() - Create a dma mapping for a page table
- * @pt: Page table to get a DMA map for
+ * i915_dma_map_px_single() - Create a dma mapping for a page table/dir/etc.
+ * @px: Page table/dir/etc to get a DMA map for
  * @dev:   drm device
  *
  * Page table allocations are unified across all gens. They always require a
- * single 4k allocation, as well as a DMA mapping.
+ * single 4k allocation, as well as a DMA mapping. If we keep the structs
+ * symmetric here, the simple macro covers us for every page table type.
  *
  * Return: 0 if success.
  */
-static int dma_map_pt_single(struct i915_pagetab *pt, struct drm_device *dev)
-{
-   struct page *page;
-   dma_addr_t pt_addr;
-   int ret;
-
-   page = pt->page;
-   pt_addr = pci_map_page(dev->pdev, page, 0, 4096,
-  PCI_DMA_BIDIRECTIONAL);
-
-   ret = pci_dma_mapping_error(dev->pdev, pt_addr);
-   if (ret)
-   return ret;
-
-   pt->daddr = pt_addr;
-
-   return 0;
-}
+#define i915_dma_map_px_single(px, dev) \
+   pci_dma_mapping_error((dev)->pdev, \
+ (px)->daddr = pci_map_page((dev)->pdev, \
+(px)->page, 0, 4096, \
+PCI_DMA_BIDIRECTIONAL))
 
 static void free_pt_single(struct i915_pagetab *pt, struct drm_device *dev)
 {
if (WARN_ON(!pt->page))
return;
 
-   dma_unmap_pt_single(pt, dev);
+   i915_dma_unmap_single(pt, dev);
__free_page(pt->page);
kfree(pt);
 }
@@ -269,7 +257,7 @@ static struct i915_pagetab *alloc_pt_single(struct 
drm_device *dev)
return ERR_PTR(-ENOMEM);
}
 
-   ret = dma_map_pt_single(pt, dev);
+   ret = i915_dma_map_px_single(pt, dev);
if (ret) {
__free_page(pt->page);
kfree(pt);
@@ -519,7 +507,7 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 
 static void gen8_ppgtt_dma_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 {
-   struct pci_dev *hwdev = ppgtt->base.dev->pdev;
+   struct drm_device *dev = ppgtt->base.dev;
int i, j;
 
for (i = 0; i < ppgtt->num_pd_pages; i++) {
@@ -528,16 +516,14 @@ static void gen8_ppgtt_dma_unmap_pages(struct 
i915_hw_ppgtt *ppgtt)
if (!ppgtt->pdp.pagedir[i]->daddr)
continue;
 
-   pci_unmap_page(hwdev, ppgtt->pdp.pagedir[i]->daddr, PAGE_SIZE,
-  PCI_DMA_BIDIRECTIONAL);
+   i915_dma_unmap_single(ppgtt->pdp.pagedir[i], dev);
 
for (j = 0; j < I915_PDES_PER_PD; j++) {
struct i915_pagedir *pd = ppgtt->pdp.pagedir[i];
struct i915_pagetab *pt =  pd->page_tables[j];
dma_addr_t addr = pt->daddr;
if (addr)
-   pci_unmap_page(hwdev, addr, PAGE_SIZE,
-  PCI_DMA_BIDIRECTIONAL);
+   i915_dma_unmap_single(pt, dev);
}
}
 }
@@ -623,19 +609,13 @@ err_out:
 static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
 const int pdpe)
 {
-   dma_addr_t pd_addr;
int ret;
 
-   pd_addr = pci_map_page(ppgtt->base.dev->pdev,
-  ppgtt->pdp.pagedir[pdpe]->page, 0,
-  PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
-
-   ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
+   ret = i915_dma_map_

[Intel-gfx] [PATCH 32/56] drm/i915/bdw: pagetable allocation rework

2014-05-09 Thread Ben Widawsky
Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 54 -
 drivers/gpu/drm/i915/i915_gem_gtt.h | 10 +++
 2 files changed, 39 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 10cfad8..041ddca 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -553,14 +553,6 @@ static void gen8_teardown_va_range(struct 
i915_address_space *vm,
}
 }
 
-/* This function will die soon */
-static void gen8_free_full_pagedir(struct i915_hw_ppgtt *ppgtt, int i)
-{
-   gen8_teardown_va_range(&ppgtt->base,
-  i << GEN8_PDPE_SHIFT,
-  (1 << GEN8_PDPE_SHIFT));
-}
-
 static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
trace_i915_va_teardown(&ppgtt->base,
@@ -580,22 +572,27 @@ static void gen8_ppgtt_cleanup(struct i915_address_space 
*vm)
gen8_ppgtt_free(ppgtt);
 }
 
-static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
+static int gen8_ppgtt_alloc_pagetabs(struct i915_pagedir *pd,
+uint64_t start,
+uint64_t length,
+struct drm_device *dev)
 {
-   int i, ret;
+   struct i915_pagetab *unused;
+   uint64_t temp;
+   uint32_t pde;
 
-   for (i = 0; i < ppgtt->num_pd_pages; i++) {
-   ret = alloc_pt_range(ppgtt->pdp.pagedirs[i],
-0, I915_PDES_PER_PD, ppgtt->base.dev);
-   if (ret)
+   gen8_for_each_pde(unused, pd, start, length, temp, pde) {
+   BUG_ON(unused);
+   pd->page_tables[pde] = alloc_pt_single(dev);
+   if (IS_ERR(pd->page_tables[pde]))
goto unwind_out;
}
 
return 0;
 
 unwind_out:
-   while (i--)
-   gen8_free_full_pagedir(ppgtt, i);
+   while (pde--)
+   free_pt_single(pd->page_tables[pde], dev);
 
return -ENOMEM;
 }
@@ -639,20 +636,28 @@ unwind_out:
 }
 
 static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
-   const int max_pdp)
+   uint64_t start,
+   uint64_t length)
 {
+   struct i915_pagedir *pd;
+   uint64_t temp;
+   uint32_t pdpe;
int ret;
 
-   ret = gen8_ppgtt_alloc_pagedirs(&ppgtt->pdp, ppgtt->base.start,
-   ppgtt->base.total);
+   ret = gen8_ppgtt_alloc_pagedirs(&ppgtt->pdp, start, length);
if (ret)
return ret;
 
-   ret = gen8_ppgtt_allocate_page_tables(ppgtt);
-   if (ret)
-   goto err_out;
+   gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
+   ret = gen8_ppgtt_alloc_pagetabs(pd, start, length,
+   ppgtt->base.dev);
+   if (ret)
+   goto err_out;
+
+   ppgtt->num_pd_entries += I915_PDES_PER_PD;
+   }
 
-   ppgtt->num_pd_entries = max_pdp * I915_PDES_PER_PD;
+   BUG_ON(pdpe > ppgtt->num_pd_pages);
 
return 0;
 
@@ -683,10 +688,9 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, 
uint64_t size)
 
ppgtt->base.start = 0;
ppgtt->base.total = size;
-   BUG_ON(ppgtt->base.total == 0);
 
/* 1. Do all our allocations for page directories and page tables. */
-   ret = gen8_ppgtt_alloc(ppgtt, max_pdp);
+   ret = gen8_ppgtt_alloc(ppgtt, ppgtt->base.start, ppgtt->base.total);
if (ret)
return ret;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h 
b/drivers/gpu/drm/i915/i915_gem_gtt.h
index f81b26a..fae0867 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -411,6 +411,16 @@ static inline size_t gen6_pde_count(uint32_t addr, 
uint32_t length)
 temp = min(temp, length),  \
 start += temp, length -= temp)
 
+/* Clamp length to the next pagetab boundary */
+static inline uint64_t gen8_clamp_pt(uint64_t start, uint64_t length)
+{
+   uint64_t next_pt = ALIGN(start + 1, 1 << GEN8_PDE_SHIFT);
+   if (next_pt > (start + length))
+   return length;
+
+   return next_pt - start;
+}
+
 /* Clamp length to the next pagedir boundary */
 static inline uint64_t gen8_clamp_pd(uint64_t start, uint64_t length)
 {
-- 
1.9.2



[Intel-gfx] [PATCH 28/56] drm/i915: Force pd restore when PDEs change, gen6-7

2014-05-09 Thread Ben Widawsky
The docs say you cannot change the PDEs of a currently running context.
We never map new PDEs of a running context, and expect them to be
present - so I think this is okay. (We can unmap, but this should also
be okay since we only unmap unreferenced objects that the GPU shouldn't
be trying to va->pa xlate.) The MI_SET_CONTEXT command does have a flag
to signal that even if the context is the same, a reload should be
forced. It's unclear exactly what this does, but I have a hunch it's the
right thing to do.

The logic assumes that we always emit a context switch after mapping new
PDEs, and before we submit a batch. This is the case today, and has been
the case since the inception of hardware contexts. A note in the comment
lets the user know.

NOTE: I have no evidence to suggest this is actually needed other than a
few tidbits which lead me to believe there are some corner cases that
will require it. I'm mostly depending on the reload of DCLV to
invalidate the old TLBs. We can try to remove this patch and see what
happens.
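
Pulling the two halves of the mechanism together (the diff spreads them
across files): the GTT code dirties every ring when PDEs change, and the
context switch path consumes the bit and upgrades to a forced restore. A
condensed sketch, assuming the names introduced in this patch (pd_dirty
is a hypothetical wrapper around the check in should_skip_switch()):

/* Writer side (i915_gem_gtt.c): mark all rings when PDEs change. */
#define ppgtt_invalidate_tlbs(vm) do { \
	if (INTEL_INFO((vm)->dev)->gen < 8) \
		(vm)->pd_reload_mask = INTEL_INFO((vm)->dev)->ring_mask; \
} while (0)

/* Reader side (i915_gem_context.c): a set bit forbids skipping the
 * switch and forces MI_SET_CONTEXT to reload, even for the same LRCA. */
static bool pd_dirty(struct intel_ring_buffer *ring,
		     struct i915_hw_context *to, u32 *flags)
{
	if (test_and_clear_bit(ring->id, &to->vm->pd_reload_mask)) {
		*flags |= MI_FORCE_RESTORE;
		return true;
	}
	return false;
}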

Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/i915/i915_gem_context.c| 15 ---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  5 +
 drivers/gpu/drm/i915/i915_gem_gtt.c| 17 -
 drivers/gpu/drm/i915/i915_gem_gtt.h|  2 ++
 4 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
b/drivers/gpu/drm/i915/i915_gem_context.c
index 7eb4091..5155d09 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -596,9 +596,18 @@ mi_set_context(struct intel_ring_buffer *ring,
 
 static inline bool should_skip_switch(struct intel_ring_buffer *ring,
  struct i915_hw_context *from,
- struct i915_hw_context *to)
+ struct i915_hw_context *to,
+ u32 *flags)
 {
-   if (from == to && from->last_ring == ring && !to->remap_slice)
+   if (test_and_clear_bit(ring->id, &to->vm->pd_reload_mask)) {
+   *flags |= MI_FORCE_RESTORE;
+   return false;
+   }
+
+   if (to->remap_slice)
+   return false;
+
+   if (from == to && from->last_ring == ring)
return true;
 
return false;
@@ -618,7 +627,7 @@ static int do_switch(struct intel_ring_buffer *ring,
BUG_ON(!i915_gem_obj_is_pinned(from->obj));
}
 
-   if (should_skip_switch(ring, from, to))
+   if (should_skip_switch(ring, from, to, &hw_flags))
return 0;
 
/* Trying to pin first makes error handling easier. */
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 3c3aba7..08fde7d 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1224,6 +1224,10 @@ i915_gem_do_execbuffer(struct drm_device *dev, void 
*data,
if (ret)
goto err;
 
+   /* XXX: Reserve has possibly changed PDEs which means we must do a
+* context switch before we can coherently read some of the reserved
+* VMAs. */
+
/* The objects are in their final locations, apply the relocations. */
if (need_relocs)
ret = i915_gem_execbuffer_relocate(eb);
@@ -1328,6 +1332,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
goto err;
}
} else {
+   WARN_ON(vm->pd_reload_mask & (1 << ring->id));
ret = ring->dispatch_execbuffer(ring,
exec_start, exec_len,
flags);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index b7a0232..1d459e3 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1268,6 +1268,16 @@ int i915_gem_init_ppgtt(struct drm_device *dev, struct 
i915_hw_ppgtt *ppgtt)
return 0;
 }
 
+/* PDE TLBs are a pain to invalidate pre GEN8. It requires a context reload. If we
+ * are switching between contexts with the same LRCA, we also must do a force
+ * restore.
+ */
+#define ppgtt_invalidate_tlbs(vm) do {\
+   if (INTEL_INFO(vm->dev)->gen < 8) { \
+   vm->pd_reload_mask = INTEL_INFO(vm->dev)->ring_mask; \
+   } \
+} while(0)
+
 static int
 ppgtt_bind_vma(struct i915_vma *vma,
   enum i915_cache_level cache_level,
@@ -1282,10 +1292,13 @@ ppgtt_bind_vma(struct i915_vma *vma,
 vma->node.size);
if (ret)
return ret;
+
+   ppgtt_invalidate_tlbs(vma->vm);
}
 
vma->vm->insert_entries(vma->vm, vma->obj->pages, vma->node.start,
cache

[Intel-gfx] [PATCH 38/56] drm/i915/bdw: Dynamic page table allocations

2014-05-09 Thread Ben Widawsky
This finishes off the dynamic page table allocations, in the legacy 3
level style that already exists. Most everything has already been set up
to this point; the patch finishes off the enabling by setting the
appropriate function pointers.
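
"Setting the appropriate function pointers" presumably reduces to
routing the vm callbacks to the new dynamic paths at init time. A
hypothetical sketch (the allocate/teardown member names are assumptions,
modeled on the gen6 equivalents; the last three appear in the diffs):

/* Hypothetical hookup in gen8_ppgtt_init(); member names assumed. */
ppgtt->base.allocate_va_range = gen8_alloc_va_range;
ppgtt->base.teardown_va_range = gen8_teardown_va_range;
ppgtt->base.clear_range = gen8_ppgtt_clear_range;
ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
ppgtt->base.cleanup = gen8_ppgtt_cleanup;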

Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 261 +---
 1 file changed, 216 insertions(+), 45 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 82b98ea..66ed943 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -653,58 +653,160 @@ static void gen8_ppgtt_cleanup(struct i915_address_space 
*vm)
gen8_ppgtt_free(ppgtt);
 }
 
-static int gen8_ppgtt_alloc_pagetabs(struct i915_pagedir *pd,
+/**
+ * gen8_ppgtt_alloc_pagetabs() - Allocate page tables for VA range.
+ * @ppgtt: Master ppgtt structure.
+ * @pd:Page directory for this address range.
+ * @start: Starting virtual address to begin allocations.
+ * @length: Size of the allocations.
+ * @new_pts:   Bitmap set by function with new allocations. Likely used by the
+ * caller to free on error.
+ *
+ * Allocate the required number of page tables. Extremely similar to
+ * gen8_ppgtt_alloc_pagedirs(). The main difference is here we are limited by
+ * the page directory boundary (instead of the page directory pointer). That
+ * boundary is 1GB virtual. Therefore, unlike gen8_ppgtt_alloc_pagedirs(), it 
is
+ * possible, and likely that the caller will need to use multiple calls of this
+ * function to achieve the appropriate allocation.
+ *
+ * Return: 0 if success; negative error code otherwise.
+ */
+static int gen8_ppgtt_alloc_pagetabs(struct i915_hw_ppgtt *ppgtt,
+struct i915_pagedir *pd,
 uint64_t start,
 uint64_t length,
-struct drm_device *dev)
+unsigned long *new_pts)
 {
struct i915_pagetab *unused;
uint64_t temp;
uint32_t pde;
 
gen8_for_each_pde(unused, pd, start, length, temp, pde) {
-   BUG_ON(unused);
-   pd->page_tables[pde] = alloc_pt_single(dev);
+   if (unused)
+   continue;
+
+   pd->page_tables[pde] = alloc_pt_single(ppgtt->base.dev);
+
if (IS_ERR(pd->page_tables[pde]))
goto unwind_out;
+
+   set_bit(pde, new_pts);
}
 
return 0;
 
 unwind_out:
-   while (pde--)
-   free_pt_single(pd->page_tables[pde], dev);
+   for_each_set_bit(pde, new_pts, I915_PDES_PER_PD)
+   free_pt_single(pd->page_tables[pde], ppgtt->base.dev);
 
return -ENOMEM;
 }
 
-/* bitmap of new pagedirs */
-static int gen8_ppgtt_alloc_pagedirs(struct i915_pagedirpo *pdp,
+/**
+ * gen8_ppgtt_alloc_pagedirs() - Allocate page directories for VA range.
+ * @ppgtt: Master ppgtt structure.
+ * @pdp:   Page directory pointer for this address range.
+ * @start: Starting virtual address to begin allocations.
+ * @length: Size of the allocations.
+ * @new_pds: Bitmap set by function with new allocations. Likely used by the
+ * caller to free on error.
+ *
+ * Allocate the required number of page directories starting at the pde index 
of
+ * @start, and ending at the pde index @start + @length. This function will 
skip
+ * over already allocated page directories within the range, and only allocate
+ * new ones, setting the appropriate pointer within the pdp as well as the
+ * correct position in the bitmap @new_pds.
+ *
+ * The function will only allocate the pages within the range for a given page
+ * directory pointer. In other words, if @start + @length straddles a virtually
+ * addressed PDP boundary (512GB for 4k pages), there will be more allocations
+ * required by the caller. This is not currently possible, and the BUG in the
+ * code will prevent it.
+ *
+ * Return: 0 if success; negative error code otherwise.
+ */
+static int gen8_ppgtt_alloc_pagedirs(struct i915_hw_ppgtt *ppgtt,
+struct i915_pagedirpo *pdp,
 uint64_t start,
 uint64_t length,
-struct drm_device *dev)
+unsigned long *new_pds)
 {
struct i915_pagedir *unused;
uint64_t temp;
uint32_t pdpe;
 
+   BUG_ON(!bitmap_empty(new_pds, GEN8_LEGACY_PDPES));
+
/* FIXME: PPGTT container_of won't work for 64b */
BUG_ON((start + length) > 0x8ULL);
 
gen8_for_each_pdpe(unused, pdp, start, length, temp, pdpe) {
-   BUG_ON(unused);
-   pdp->pagedirs[pdpe] = alloc_pd_single(dev);
+   struct i915_pagedir *pd;
+   

[Intel-gfx] [PATCH 26/56] drm/i915: Track GEN6 page table usage

2014-05-09 Thread Ben Widawsky
Instead of implementing the full tracking + dynamic allocation, this
patch does a bit less than half of the work by tracking and warning on
unexpected conditions. The tracking itself follows which PTEs within a
page table are currently being used for objects. The next patch will
modify this to actually allocate the page tables only when necessary.
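
The tracking itself is the standard kernel bitmap API with one bit per
PTE. A minimal sketch of the bookkeeping, assuming the used_ptes field
added here and the gen6_pte_index()/gen6_pte_count() helpers from this
series:

/* On bind: record which PTEs in this table now back an object. */
bitmap_set(pt->used_ptes, gen6_pte_index(start),
	   gen6_pte_count(start, length));

/* On unbind: release the range again. */
bitmap_clear(pt->used_ptes, gen6_pte_index(start),
	     gen6_pte_count(start, length));

/* On free: an occupied table is the unexpected condition to warn on. */
WARN(!bitmap_empty(pt->used_ptes, GEN6_PTES_PER_PT),
     "Free page table with %d used pages\n",
     bitmap_weight(pt->used_ptes, GEN6_PTES_PER_PT));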

With the current patch there isn't much in the way of making a gen
agnostic range allocation function. However, in the next patch we'll add
more specificity which makes having separate functions a bit easier to
manage.

Notice that aliasing PPGTT is not managed here. The patch which actually
begins dynamic allocation/teardown explains the reasoning for this.

v2: s/pdp.pagedir/pdp.pagedirs
Make a scratch page allocation helper

Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 203 
 drivers/gpu/drm/i915/i915_gem_gtt.h | 117 +
 2 files changed, 231 insertions(+), 89 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 51fc036..b7a0232 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -66,10 +66,9 @@ static int sanitize_enable_ppgtt(struct drm_device *dev, int 
enable_ppgtt)
return HAS_ALIASING_PPGTT(dev) ? 1 : 0;
 }
 
-
-static void ppgtt_bind_vma(struct i915_vma *vma,
-  enum i915_cache_level cache_level,
-  u32 flags);
+static int ppgtt_bind_vma(struct i915_vma *vma,
+ enum i915_cache_level cache_level,
+ u32 flags);
 static void ppgtt_unbind_vma(struct i915_vma *vma);
 static int gen8_ppgtt_enable(struct i915_hw_ppgtt *ppgtt);
 
@@ -232,37 +231,78 @@ static gen6_gtt_pte_t iris_pte_encode(dma_addr_t addr,
 (px)->page, 0, 4096, \
 PCI_DMA_BIDIRECTIONAL))
 
-static void free_pt_single(struct i915_pagetab *pt, struct drm_device *dev)
+static void __free_pt_single(struct i915_pagetab *pt, struct drm_device *dev,
+int scratch)
 {
+   if (WARN(scratch ^ pt->scratch,
+"Tried to free scratch = %d. Is scratch = %d\n",
+scratch, pt->scratch))
+   return;
+
if (WARN_ON(!pt->page))
return;
 
+   if (!scratch) {
+   const size_t count = INTEL_INFO(dev)->gen >= 8 ?
+   GEN8_PTES_PER_PT : GEN6_PTES_PER_PT;
+   WARN(!bitmap_empty(pt->used_ptes, count),
+"Free page table with %d used pages\n",
+bitmap_weight(pt->used_ptes, count));
+   }
+
i915_dma_unmap_single(pt, dev);
__free_page(pt->page);
+   kfree(pt->used_ptes);
kfree(pt);
 }
 
+#define free_pt_single(pt, dev) \
+   __free_pt_single(pt, dev, false)
+#define free_pt_scratch(pt, dev) \
+   __free_pt_single(pt, dev, true)
+
 static struct i915_pagetab *alloc_pt_single(struct drm_device *dev)
 {
struct i915_pagetab *pt;
-   int ret;
+   const size_t count = INTEL_INFO(dev)->gen >= 8 ?
+   GEN8_PTES_PER_PT : GEN6_PTES_PER_PT;
+   int ret = -ENOMEM;
 
pt = kzalloc(sizeof(*pt), GFP_KERNEL);
if (!pt)
return ERR_PTR(-ENOMEM);
 
+   pt->used_ptes = kcalloc(BITS_TO_LONGS(count), sizeof(*pt->used_ptes),
+   GFP_KERNEL);
+
+   if (!pt->used_ptes)
+   goto fail_bitmap;
+
pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
-   if (!pt->page) {
-   kfree(pt);
-   return ERR_PTR(-ENOMEM);
-   }
+   if (!pt->page)
+   goto fail_page;
 
ret = i915_dma_map_px_single(pt, dev);
-   if (ret) {
-   __free_page(pt->page);
-   kfree(pt);
-   return ERR_PTR(ret);
-   }
+   if (ret)
+   goto fail_dma;
+
+   return pt;
+
+fail_dma:
+   __free_page(pt->page);
+fail_page:
+   kfree(pt->used_ptes);
+fail_bitmap:
+   kfree(pt);
+
+   return ERR_PTR(ret);
+}
+
+static inline struct i915_pagetab *alloc_pt_scratch(struct drm_device *dev)
+{
+   struct i915_pagetab *pt = alloc_pt_single(dev);
+   if (!IS_ERR(pt))
+   pt->scratch = 1;
 
return pt;
 }
@@ -389,7 +429,7 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
int used_pd = ppgtt->num_pd_entries / I915_PDES_PER_PD;
 
for (i = used_pd - 1; i >= 0; i--) {
-   dma_addr_t addr = ppgtt->pdp.pagedir[i]->daddr;
+   dma_addr_t addr = ppgtt->pdp.pagedirs[i]->daddr;
ret = gen8_write_pdp(ring, i, addr, synchronous);
if (ret)
return ret;
@@ -416,7 +456,7 @@ static void gen8_ppgtt_clear_range(struct 
i915_address_space *vm,
 

[Intel-gfx] [PATCH 39/56] drm/i915/bdw: Scratch unused pages

2014-05-09 Thread Ben Widawsky
This is probably not required since BDW is hopefully a bit more robust
than previous generations. Realize also that scratch will not exist for
every entry within the page table structure. Doing this would waste
an extraordinary amount of space when we move to 4-level page tables.
Therefore, the scratch pages/tables will only be pointed to by page
tables which have less than all of the entries filled.
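
Concretely, teardown now has two outcomes per page table: a fully empty
table is freed and its PDE re-pointed at the scratch table, while a
partially used table only gets the affected PTEs scratched. The branch,
condensed from the diff below:

if (bitmap_empty(pt->used_ptes, GEN8_PTES_PER_PT)) {
	/* Nothing left in this table: free it and point the PDE back
	 * at the shared scratch page table. */
	free_pt_single(pt, vm->dev);
	gen8_unmap_pagetable(ppgtt, pd, pde);
} else {
	/* Still partially used: scratch only the torn-down PTEs. */
	gen8_ppgtt_clear_range(vm, pd_start, pd_len, true);
}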

I wrote the patch while debugging so I figured why not put it in the
series.

Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 25 +++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 66ed943..2b732ca 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -576,6 +576,25 @@ static void gen8_map_pagetable_range(struct i915_pagedir 
*pd,
kunmap_atomic(pagedir);
 }
 
+static void gen8_map_pagedir(struct i915_pagedir *pd,
+struct i915_pagetab *pt,
+int entry,
+struct drm_device *dev)
+{
+   gen8_ppgtt_pde_t *pagedir = kmap_atomic(pd->page);
+   __gen8_do_map_pt(pagedir + entry, pt, dev);
+   kunmap_atomic(pagedir);
+}
+
+static void gen8_unmap_pagetable(struct i915_hw_ppgtt *ppgtt,
+struct i915_pagedir *pd,
+int pde)
+{
+   pd->page_tables[pde] = NULL;
+   WARN_ON(!test_and_clear_bit(pde, pd->used_pdes));
+   gen8_map_pagedir(pd, ppgtt->scratch_pt, pde, ppgtt->base.dev);
+}
+
 static void gen8_teardown_va_range(struct i915_address_space *vm,
   uint64_t start, uint64_t length)
 {
@@ -621,8 +640,10 @@ static void gen8_teardown_va_range(struct 
i915_address_space *vm,
 
if (bitmap_empty(pt->used_ptes, GEN8_PTES_PER_PT)) {
free_pt_single(pt, vm->dev);
-   pd->page_tables[pde] = NULL;
-   WARN_ON(!test_and_clear_bit(pde, 
pd->used_pdes));
+   /* This may be nixed later. Optimize? */
+   gen8_unmap_pagetable(ppgtt, pd, pde);
+   } else {
+   gen8_ppgtt_clear_range(vm, pd_start, pd_len, 
true);
}
}
 
-- 
1.9.2



[Intel-gfx] [PATCH 29/56] drm/i915: Finish gen6/7 dynamic page table allocation

2014-05-09 Thread Ben Widawsky
This patch continues on the idea from the previous patch. From here on,
in the steady state, PDEs are all pointing to the scratch page table (as
recommended in the spec). When an object is allocated in the VA range,
the code will determine if we need to allocate a page for the page
table. Similarly, when the object is destroyed, we will remove and free
the page table, pointing the PDE back to the scratch page.
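
In that steady state, "does this range need a page table?" reduces to
comparing the PDE slot against the scratch table. A rough sketch of the
allocation-side check, assuming the gen6 iterators from earlier in the
series (the unwind label is hypothetical):

gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) {
	if (pt != ppgtt->scratch_pt)
		continue;	/* already backed by a real table */

	pt = alloc_pt_single(ppgtt->base.dev);
	if (IS_ERR(pt))
		goto unwind;

	ppgtt->pd.page_tables[pde] = pt;
	/* ...write the new PDE, replacing the scratch pointer... */
}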

Following patches will work to unify the code a bit as we bring in GEN8
support. GEN6 and GEN8 are different enough that I had a hard time
getting to this point with as much common code as I do.

The aliasing PPGTT must pre-allocate all of the page tables. There are a
few reasons for this. Two trivial ones: aliasing ppgtt goes through the
ggtt paths, so it's hard to maintain; and we currently do not restore
default context (assuming the previous force reload is indeed
necessary). Most importantly though, the only way (it seems from
empirical evidence) to invalidate the CS TLBs on non-render ring is to
either use ring sync (which requires actually stopping the rings in
order to synchronize when the sync completes vs. where you are in
execution), or to reload DCLV.  Since without full PPGTT we do not ever
reload the DCLV register, there is no good way to achieve this. The
simplest solution is just to not support dynamic page table
creation/destruction in the aliasing PPGTT.

We could always reload DCLV, but this seems like quite a bit of excess
overhead only to save at most 2MB-4k of memory for the aliasing PPGTT
page tables.

v2: Make the page table bitmap declared inside the function (Chris)
Simplify the way scratching address space works.
Move the alloc/teardown tracepoints up a level in the call stack so that
both all implementations get the trace.

Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/i915/i915_debugfs.c |  19 -
 drivers/gpu/drm/i915/i915_gem_context.c |   2 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c | 118 +---
 drivers/gpu/drm/i915/i915_gem_gtt.h |   2 +-
 drivers/gpu/drm/i915/i915_trace.h   | 108 +
 5 files changed, 238 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
b/drivers/gpu/drm/i915/i915_debugfs.c
index 64051b0..921d898 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1812,10 +1812,26 @@ static int i915_swizzle_info(struct seq_file *m, void 
*data)
return 0;
 }
 
+static size_t gen6_ppgtt_count_pt_pages(struct i915_hw_ppgtt *ppgtt)
+{
+   struct i915_pagedir *pd = &ppgtt->pd;
+   struct i915_pagetab **pt = &pd->page_tables[0];
+   size_t cnt = 0;
+   int i;
+
+   for (i = 0; i < ppgtt->num_pd_entries; i++) {
+   if (pt[i] != ppgtt->scratch_pt)
+   cnt++;
+   }
+
+   return cnt;
+}
+
 static void print_ppgtt(struct seq_file *m, struct i915_hw_ppgtt *ppgtt, const 
char *name)
 {
seq_printf(m, "%s:\n", name);
seq_printf(m, "pd gtt offset: 0x%08x\n", ppgtt->pd.pd_offset);
+   seq_printf(m, "\tpd pages: %zu\n", gen6_ppgtt_count_pt_pages(ppgtt));
 }
 
 static void gen8_ppgtt_info(struct seq_file *m, struct drm_device *dev, int 
verbose)
@@ -1874,6 +1890,8 @@ static void gen6_ppgtt_info(struct seq_file *m, struct 
drm_device *dev, bool ver
seq_printf(m, "PP_DIR_BASE_READ: 0x%08x\n", 
I915_READ(RING_PP_DIR_BASE_READ(ring)));
seq_printf(m, "PP_DIR_DCLV: 0x%08x\n", 
I915_READ(RING_PP_DIR_DCLV(ring)));
}
+   seq_printf(m, "ECOCHK: 0x%08x\n\n", I915_READ(GAM_ECOCHK));
+
if (dev_priv->mm.aliasing_ppgtt) {
struct i915_hw_ppgtt *ppgtt = dev_priv->mm.aliasing_ppgtt;
 
@@ -1894,7 +1912,6 @@ static void gen6_ppgtt_info(struct seq_file *m, struct 
drm_device *dev, bool ver
idr_for_each(&file_priv->context_idr, per_file_ctx,
 (void *)((unsigned long)m | verbose));
}
-   seq_printf(m, "ECOCHK: 0x%08x\n", I915_READ(GAM_ECOCHK));
 }
 
 static int i915_ppgtt_info(struct seq_file *m, void *data)
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
b/drivers/gpu/drm/i915/i915_gem_context.c
index 5155d09..fec8114 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -208,7 +208,7 @@ create_vm_for_ctx(struct drm_device *dev, struct 
i915_hw_context *ctx)
if (!ppgtt)
return ERR_PTR(-ENOMEM);
 
-   ret = i915_gem_init_ppgtt(dev, ppgtt);
+   ret = i915_gem_init_ppgtt(dev, ppgtt, ctx->file_priv == NULL);
if (ret) {
kfree(ppgtt);
return ERR_PTR(ret);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 1d459e3..68cc1ab 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1064,10 +1064,47 @@ static void gen6_ppgtt_insert_entries(struct 
i915_address_sp

[Intel-gfx] [PATCH 33/56] drm/i915/bdw: Make the pdp switch a bit less hacky

2014-05-09 Thread Ben Widawsky
One important part of this patch is we now write a scratch page
directory into any unused PDP descriptors. This matters for 2 reasons:
first, it's not clear we're allowed to just use 0 or an invalid
pointer; and second, we must wipe out any previous contents from the
last context.

The latter point only matters with full PPGTT. The former point would
only affect 32b platforms, or platforms with less than 4GB memory.

Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 32 
 drivers/gpu/drm/i915/i915_gem_gtt.h |  5 -
 2 files changed, 24 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 041ddca..a895f4b 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -390,8 +390,10 @@ static struct i915_pagedir *alloc_pd_single(struct 
drm_device *dev)
 }
 
 /* Broadwell Page Directory Pointer Descriptors */
-static int gen8_write_pdp(struct intel_ring_buffer *ring, unsigned entry,
-  uint64_t val, bool synchronous)
+static int gen8_write_pdp(struct intel_ring_buffer *ring,
+ unsigned entry,
+ dma_addr_t addr,
+ bool synchronous)
 {
struct drm_i915_private *dev_priv = ring->dev->dev_private;
int ret;
@@ -399,8 +401,8 @@ static int gen8_write_pdp(struct intel_ring_buffer *ring, 
unsigned entry,
BUG_ON(entry >= 4);
 
if (synchronous) {
-   I915_WRITE(GEN8_RING_PDP_UDW(ring, entry), val >> 32);
-   I915_WRITE(GEN8_RING_PDP_LDW(ring, entry), (u32)val);
+   I915_WRITE(GEN8_RING_PDP_UDW(ring, entry), upper_32_bits(addr));
+   I915_WRITE(GEN8_RING_PDP_LDW(ring, entry), lower_32_bits(addr));
return 0;
}
 
@@ -410,10 +412,10 @@ static int gen8_write_pdp(struct intel_ring_buffer *ring, 
unsigned entry,
 
intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
intel_ring_emit(ring, GEN8_RING_PDP_UDW(ring, entry));
-   intel_ring_emit(ring, (u32)(val >> 32));
+   intel_ring_emit(ring, upper_32_bits(addr));
intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(1));
intel_ring_emit(ring, GEN8_RING_PDP_LDW(ring, entry));
-   intel_ring_emit(ring, (u32)(val));
+   intel_ring_emit(ring, lower_32_bits(addr));
intel_ring_advance(ring);
 
return 0;
@@ -425,11 +427,11 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
 {
int i, ret;
 
-   /* bit of a hack to find the actual last used pd */
-   int used_pd = ppgtt->num_pd_entries / I915_PDES_PER_PD;
-
-   for (i = used_pd - 1; i >= 0; i--) {
-   dma_addr_t addr = ppgtt->pdp.pagedirs[i]->daddr;
+   for (i = GEN8_LEGACY_PDPES - 1; i >= 0; i--) {
+   struct i915_pagedir *pd = ppgtt->pdp.pagedirs[i];
+   dma_addr_t addr = pd ? pd->daddr : ppgtt->scratch_pt->daddr;
+   /* The page directory might be NULL, but we need to clear out
+* whatever the previous context might have used. */
ret = gen8_write_pdp(ring, i, addr, synchronous);
if (ret)
return ret;
@@ -689,10 +691,16 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, 
uint64_t size)
ppgtt->base.start = 0;
ppgtt->base.total = size;
 
+   ppgtt->scratch_pd = alloc_pt_scratch(ppgtt->base.dev);
+   if (IS_ERR(ppgtt->scratch_pd))
+   return PTR_ERR(ppgtt->scratch_pd);
+
/* 1. Do all our allocations for page directories and page tables. */
ret = gen8_ppgtt_alloc(ppgtt, ppgtt->base.start, ppgtt->base.total);
-   if (ret)
+   if (ret) {
+   free_pt_scratch(ppgtt->scratch_pd, ppgtt->base.dev);
return ret;
+   }
 
/*
 * 2. Map all the page directory entries to point to the page tables
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h 
b/drivers/gpu/drm/i915/i915_gem_gtt.h
index fae0867..5c6db90 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -267,7 +267,10 @@ struct i915_hw_ppgtt {
struct i915_pagedir pd;
};
 
-   struct i915_pagetab *scratch_pt;
+   union {
+   struct i915_pagetab *scratch_pt;
+   struct i915_pagetab *scratch_pd; /* Just need the daddr */
+   };
 
struct i915_hw_context *ctx;
 
-- 
1.9.2



[Intel-gfx] [PATCH 40/56] drm/i915/bdw: Add ppgtt info for dynamic pages

2014-05-09 Thread Ben Widawsky
Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/i915/i915_debugfs.c | 59 +++--
 drivers/gpu/drm/i915/i915_gem_gtt.c | 32 
 drivers/gpu/drm/i915/i915_gem_gtt.h |  9 ++
 3 files changed, 85 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
b/drivers/gpu/drm/i915/i915_debugfs.c
index 40aca7f..c29c71a 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1826,11 +1826,40 @@ static size_t gen6_ppgtt_count_pt_pages(struct 
i915_hw_ppgtt *ppgtt)
return cnt;
 }
 
+static void gen8_ppgtt_debugfs_counter(struct i915_pagedirpo *pdp,
+  struct i915_pagedir *pd,
+  struct i915_pagetab *pt,
+  unsigned pdpe,
+  unsigned pde,
+  void *data)
+{
+   if (!pd || !pt)
+   return;
+
+   (*(size_t *)data)++;
+}
+
+static size_t gen8_ppgtt_count_pt_pages(struct i915_hw_ppgtt *ppgtt)
+{
+   size_t count = 0;
+
+   gen8_for_every_pdpe_pde(ppgtt, gen8_ppgtt_debugfs_counter, &count);
+
+   return count;
+}
+
 static void print_ppgtt(struct seq_file *m, struct i915_hw_ppgtt *ppgtt, const 
char *name)
 {
+   struct drm_device *dev = ppgtt->base.dev;
+
seq_printf(m, "%s:\n", name);
-   seq_printf(m, "pd gtt offset: 0x%08x\n", ppgtt->pd.pd_offset);
-   seq_printf(m, "\tpd pages: %zu\n", gen6_ppgtt_count_pt_pages(ppgtt));
+
+   if (INTEL_INFO(dev)->gen < 8) {
+   seq_printf(m, "\tpd pages: %zu\n", 
gen6_ppgtt_count_pt_pages(ppgtt));
+   seq_printf(m, "pd gtt offset: 0x%08x\n", ppgtt->pd.pd_offset);
+   } else {
+   seq_printf(m, "\tpage table overhead: %zu pages\n", 
gen8_ppgtt_count_pt_pages(ppgtt));
+   }
 }
 
 static void gen8_ppgtt_info(struct seq_file *m, struct drm_device *dev, int 
verbose)
@@ -1873,7 +1902,6 @@ static void gen6_ppgtt_info(struct seq_file *m, struct 
drm_device *dev, bool ver
 {
struct drm_i915_private *dev_priv = dev->dev_private;
struct intel_ring_buffer *ring;
-   struct drm_file *file;
int i;
 
if (INTEL_INFO(dev)->gen == 6)
@@ -1897,18 +1925,6 @@ static void gen6_ppgtt_info(struct seq_file *m, struct 
drm_device *dev, bool ver
ppgtt->debug_dump(ppgtt, m);
} else
return;
-
-   list_for_each_entry_reverse(file, &dev->filelist, lhead) {
-   struct drm_i915_file_private *file_priv = file->driver_priv;
-   struct i915_hw_ppgtt *pvt_ppgtt;
-
-   pvt_ppgtt = ctx_to_ppgtt(file_priv->private_default_ctx);
-   seq_printf(m, "\nproc: %s\n",
-  get_pid_task(file->pid, PIDTYPE_PID)->comm);
-   print_ppgtt(m, pvt_ppgtt, "Default context");
-   idr_for_each(&file_priv->context_idr, per_file_ctx,
-(void *)((unsigned long)m | verbose));
-   }
 }
 
 static int i915_ppgtt_info(struct seq_file *m, void *data)
@@ -1917,6 +1933,7 @@ static int i915_ppgtt_info(struct seq_file *m, void *data)
struct drm_device *dev = node->minor->dev;
struct drm_i915_private *dev_priv = dev->dev_private;
bool verbose = node->info_ent->data ? true : false;
+   struct drm_file *file;
 
int ret = mutex_lock_interruptible(&dev->struct_mutex);
if (ret)
@@ -1928,6 +1945,18 @@ static int i915_ppgtt_info(struct seq_file *m, void 
*data)
else if (INTEL_INFO(dev)->gen >= 6)
gen6_ppgtt_info(m, dev, verbose);
 
+   list_for_each_entry_reverse(file, &dev->filelist, lhead) {
+   struct drm_i915_file_private *file_priv = file->driver_priv;
+   struct i915_hw_ppgtt *pvt_ppgtt;
+
+   pvt_ppgtt = ctx_to_ppgtt(file_priv->private_default_ctx);
+   seq_printf(m, "\nproc: %s\n",
+  get_pid_task(file->pid, PIDTYPE_PID)->comm);
+   print_ppgtt(m, pvt_ppgtt, "Default context");
+   idr_for_each(&file_priv->context_idr, per_file_ctx,
+(void *)((unsigned long)m | verbose));
+   }
+
intel_runtime_pm_put(dev_priv);
mutex_unlock(&dev->struct_mutex);
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 2b732ca..d8bb4dc 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1993,6 +1993,38 @@ static void gen8_ggtt_clear_range(struct 
i915_address_space *vm,
readl(gtt_base);
 }
 
+void gen8_for_every_pdpe_pde(struct i915_hw_ppgtt *ppgtt,
+void (*callback)(struct i915_pagedirpo *pdp,
+ struct i915_pagedir *pd,
+ struct i915_pagetab *pt,
+   

[Intel-gfx] [PATCH 23/56] drm/i915: Always dma map page table allocations

2014-05-09 Thread Ben Widawsky
There is never a case where we don't want to do it. Since we've broken
up the allocations into nice clean helper functions, it's both easy and
obvious to do the dma mapping at the same time.

Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 78 -
 1 file changed, 17 insertions(+), 61 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index e8d4dfa..92ffee7 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -215,20 +215,6 @@ static gen6_gtt_pte_t iris_pte_encode(dma_addr_t addr,
pci_unmap_page((dev)->pdev, (pt)->daddr, 4096, PCI_DMA_BIDIRECTIONAL); \
 } while (0);
 
-
-static void dma_unmap_pt_range(struct i915_pagedir *pd,
-  unsigned pde, size_t n,
-  struct drm_device *dev)
-{
-   if (WARN_ON(pde + n > I915_PDES_PER_PD))
-   n = I915_PDES_PER_PD - pde;
-
-   n += pde;
-
-   for (; pde < n; pde++)
-   dma_unmap_pt_single(pd->page_tables[pde], dev);
-}
-
 /**
  * dma_map_pt_single() - Create a dma mapping for a page table
 * @pt: Page table to get a DMA map for
@@ -258,33 +244,12 @@ static int dma_map_pt_single(struct i915_pagetab *pt, 
struct drm_device *dev)
return 0;
 }
 
-static int dma_map_pt_range(struct i915_pagedir *pd,
-   unsigned pde, size_t n,
-   struct drm_device *dev)
-{
-   const int first = pde;
-
-   if (WARN_ON(pde + n > I915_PDES_PER_PD))
-   n = I915_PDES_PER_PD - pde;
-
-   n += pde;
-
-   for (; pde < n; pde++) {
-   int ret;
-   ret = dma_map_pt_single(pd->page_tables[pde], dev);
-   if (ret) {
-   dma_unmap_pt_range(pd, first, pde, dev);
-   return ret;
-   }
-   }
-
-   return 0;
-}
-
-static void free_pt_single(struct i915_pagetab *pt)
+static void free_pt_single(struct i915_pagetab *pt, struct drm_device *dev)
 {
if (WARN_ON(!pt->page))
return;
+
+   dma_unmap_pt_single(pt, dev);
__free_page(pt->page);
kfree(pt);
 }
@@ -292,6 +257,7 @@ static void free_pt_single(struct i915_pagetab *pt)
 static struct i915_pagetab *alloc_pt_single(struct drm_device *dev)
 {
struct i915_pagetab *pt;
+   int ret;
 
pt = kzalloc(sizeof(*pt), GFP_KERNEL);
if (!pt)
@@ -303,6 +269,13 @@ static struct i915_pagetab *alloc_pt_single(struct 
drm_device *dev)
return ERR_PTR(-ENOMEM);
}
 
+   ret = dma_map_pt_single(pt, dev);
+   if (ret) {
+   __free_page(pt->page);
+   kfree(pt);
+   return ERR_PTR(ret);
+   }
+
return pt;
 }
 
@@ -346,7 +319,7 @@ static int alloc_pt_range(struct i915_pagedir *pd, uint16_t 
pde, size_t count,
 
 err_out:
while (i--)
-   free_pt_single(pd->page_tables[i]);
+   free_pt_single(pd->page_tables[i], dev);
return ret;
 }
 
@@ -521,7 +494,7 @@ static void gen8_ppgtt_insert_entries(struct 
i915_address_space *vm,
}
 }
 
-static void gen8_free_page_tables(struct i915_pagedir *pd)
+static void gen8_free_page_tables(struct i915_pagedir *pd, struct drm_device 
*dev)
 {
int i;
 
@@ -529,7 +502,7 @@ static void gen8_free_page_tables(struct i915_pagedir *pd)
return;
 
for (i = 0; i < I915_PDES_PER_PD; i++) {
-   free_pt_single(pd->page_tables[i]);
+   free_pt_single(pd->page_tables[i], dev);
pd->page_tables[i] = NULL;
}
 }
@@ -539,7 +512,7 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
int i;
 
for (i = 0; i < ppgtt->num_pd_pages; i++) {
-   gen8_free_page_tables(ppgtt->pdp.pagedir[i]);
+   gen8_free_page_tables(ppgtt->pdp.pagedir[i], ppgtt->base.dev);
free_pd_single(ppgtt->pdp.pagedir[i]);
}
 }
@@ -596,7 +569,7 @@ static int gen8_ppgtt_allocate_page_tables(struct 
i915_hw_ppgtt *ppgtt)
 
 unwind_out:
while (i--)
-   gen8_free_page_tables(ppgtt->pdp.pagedir[i]);
+   gen8_free_page_tables(ppgtt->pdp.pagedir[i], ppgtt->base.dev);
 
return -ENOMEM;
 }
@@ -694,18 +667,9 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, 
uint64_t size)
 * 2. Create DMA mappings for the page directories and page tables.
 */
for (i = 0; i < max_pdp; i++) {
-   struct i915_pagedir *pd;
ret = gen8_ppgtt_setup_page_directories(ppgtt, i);
if (ret)
goto bail;
-
-   pd = ppgtt->pdp.pagedir[i];
-
-   for (j = 0; j < I915_PDES_PER_PD; j++) {
-   ret = dma_map_pt_single(pd->page_tables[j], 
ppgtt->base.dev);
-   if (ret)
-   

[Intel-gfx] [PATCH 34/56] drm/i915: num_pd_pages/num_pd_entries isn't useful

2014-05-09 Thread Ben Widawsky
These values are never quite useful for dynamic allocations of the page
tables. Getting rid of them will help prevent later confusion.

TODO: this probably needs to be earlier in the series

Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/i915/i915_debugfs.c | 11 -
 drivers/gpu/drm/i915/i915_gem_gtt.c | 45 ++---
 drivers/gpu/drm/i915/i915_gem_gtt.h |  7 --
 3 files changed, 21 insertions(+), 42 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
b/drivers/gpu/drm/i915/i915_debugfs.c
index 921d898..40aca7f 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1814,13 +1814,12 @@ static int i915_swizzle_info(struct seq_file *m, void 
*data)
 
 static size_t gen6_ppgtt_count_pt_pages(struct i915_hw_ppgtt *ppgtt)
 {
-   struct i915_pagedir *pd = &ppgtt->pd;
-   struct i915_pagetab **pt = &pd->page_tables[0];
+   struct i915_pagetab *pt;
size_t cnt = 0;
-   int i;
+   uint32_t useless;
 
-   for (i = 0; i < ppgtt->num_pd_entries; i++) {
-   if (pt[i] != ppgtt->scratch_pt)
+   gen6_for_all_pdes(pt, ppgtt, useless) {
+   if (pt != ppgtt->scratch_pt)
cnt++;
}
 
@@ -1844,8 +1843,6 @@ static void gen8_ppgtt_info(struct seq_file *m, struct 
drm_device *dev, int verb
if (!ppgtt)
return;
 
-   seq_printf(m, "Page directories: %d\n", ppgtt->num_pd_pages);
-   seq_printf(m, "Page tables: %d\n", ppgtt->num_pd_entries);
for_each_ring(ring, dev_priv, unused) {
seq_printf(m, "%s\n", ring->name);
for (i = 0; i < 4; i++) {
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index a895f4b..a646475 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -617,22 +617,14 @@ static int gen8_ppgtt_alloc_pagedirs(struct 
i915_pagedirpo *pdp,
pdp->pagedirs[pdpe] = alloc_pd_single(ppgtt->base.dev);
if (IS_ERR(ppgtt->pdp.pagedirs[pdpe]))
goto unwind_out;
-
-   ppgtt->num_pd_pages++;
}
 
-   BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPES);
-
return 0;
 
 unwind_out:
-   while (pdpe--) {
+   while (pdpe--)
free_pd_single(ppgtt->pdp.pagedirs[pdpe],
   ppgtt->base.dev);
-   ppgtt->num_pd_pages--;
-   }
-
-   WARN_ON(ppgtt->num_pd_pages);
 
return -ENOMEM;
 }
@@ -655,12 +647,8 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
ppgtt->base.dev);
if (ret)
goto err_out;
-
-   ppgtt->num_pd_entries += I915_PDES_PER_PD;
}
 
-   BUG_ON(pdpe > ppgtt->num_pd_pages);
-
return 0;
 
/* TODO: Check this for all cases */
@@ -682,7 +670,6 @@ err_out:
 static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 {
const int max_pdp = DIV_ROUND_UP(size, 1 << 30);
-   const int min_pt_pages = I915_PDES_PER_PD * max_pdp;
int i, j, ret;
 
if (size % (1<<30))
@@ -731,27 +718,21 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, 
uint64_t size)
ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
ppgtt->base.cleanup = gen8_ppgtt_cleanup;
 
-   DRM_DEBUG_DRIVER("Allocated %d pages for page directories (%d 
wasted)\n",
-ppgtt->num_pd_pages, ppgtt->num_pd_pages - max_pdp);
-   DRM_DEBUG_DRIVER("Allocated %d pages for page tables (%lld wasted)\n",
-ppgtt->num_pd_entries,
-(ppgtt->num_pd_entries - min_pt_pages) + size % 
(1<<30));
return 0;
 }
 
 static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 {
struct i915_address_space *vm = &ppgtt->base;
+   struct i915_pagetab *unused;
gen6_gtt_pte_t scratch_pte;
uint32_t pd_entry;
-   int pte, pde;
+   uint32_t  pte, pde, temp;
+   uint32_t start = ppgtt->base.start, length = ppgtt->base.total;
 
scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC, true);
 
-   seq_printf(m, "  VM %p (pd_offset %x-%x):\n", vm,
-  ppgtt->pd.pd_offset,
-  ppgtt->pd.pd_offset + ppgtt->num_pd_entries);
-   for (pde = 0; pde < ppgtt->num_pd_entries; pde++) {
+   gen6_for_each_pde(unused, &ppgtt->pd, start, length, temp, pde) {
u32 expected;
gen6_gtt_pte_t *pt_vaddr;
dma_addr_t pt_addr = ppgtt->pd.page_tables[pde]->daddr;
@@ -1229,12 +1210,12 @@ static void gen6_teardown_va_range(struct 
i915_address_space *vm,
 
 static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
-   int i;
+   struct i915_pagetab *pt;
+   uint32_t pde;
 
-   for (i = 0; i < ppgtt->num_pd_entries; i++)

[Intel-gfx] [PATCH 30/56] drm/i915/bdw: Use dynamic allocation idioms on free

2014-05-09 Thread Ben Widawsky
The page directory freer is left here for now as it's still useful given
that GEN8 still preallocates. Once the allocation functions are broken
up into more discrete chunks, we'll follow suit and destroy this
leftover piece.


Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 45 -
 drivers/gpu/drm/i915/i915_gem_gtt.h | 26 +
 2 files changed, 55 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 68cc1ab..14aae05 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -531,27 +531,40 @@ static void gen8_ppgtt_insert_entries(struct 
i915_address_space *vm,
}
 }
 
-static void gen8_free_page_tables(struct i915_pagedir *pd, struct drm_device 
*dev)
+static void gen8_teardown_va_range(struct i915_address_space *vm,
+  uint64_t start, uint64_t length)
 {
-   int i;
-
-   if (!pd->page)
-   return;
-
-   for (i = 0; i < I915_PDES_PER_PD; i++) {
-   free_pt_single(pd->page_tables[i], dev);
-   pd->page_tables[i] = NULL;
+   struct i915_hw_ppgtt *ppgtt =
+   container_of(vm, struct i915_hw_ppgtt, base);
+   struct i915_pagedir *pd;
+   struct i915_pagetab *pt;
+   uint64_t temp;
+   uint32_t pdpe, pde;
+
+   gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
+   uint64_t pd_len = gen8_clamp_pd(start, length);
+   uint64_t pd_start = start;
+   gen8_for_each_pde(pt, pd, pd_start, pd_len, temp, pde) {
+   free_pt_single(pt, vm->dev);
+   }
+   free_pd_single(pd, vm->dev);
}
 }
 
-static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
+/* This function will die soon */
+static void gen8_free_full_pagedir(struct i915_hw_ppgtt *ppgtt, int i)
 {
-   int i;
+   gen8_teardown_va_range(&ppgtt->base,
+  i << GEN8_PDPE_SHIFT,
+  (1 << GEN8_PDPE_SHIFT));
+}
 
-   for (i = 0; i < ppgtt->num_pd_pages; i++) {
-   gen8_free_page_tables(ppgtt->pdp.pagedirs[i], ppgtt->base.dev);
-   free_pd_single(ppgtt->pdp.pagedirs[i], ppgtt->base.dev);
-   }
+static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
+{
+   trace_i915_va_teardown(&ppgtt->base,
+  ppgtt->base.start, ppgtt->base.total);
+   gen8_teardown_va_range(&ppgtt->base,
+  ppgtt->base.start, ppgtt->base.total);
 }
 
 static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
@@ -580,7 +593,7 @@ static int gen8_ppgtt_allocate_page_tables(struct 
i915_hw_ppgtt *ppgtt)
 
 unwind_out:
while (i--)
-   gen8_free_page_tables(ppgtt->pdp.pagedirs[i], ppgtt->base.dev);
+   gen8_free_full_pagedir(ppgtt, i);
 
return -ENOMEM;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h 
b/drivers/gpu/drm/i915/i915_gem_gtt.h
index d8a990e..f81b26a 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -395,6 +395,32 @@ static inline size_t gen6_pde_count(uint32_t addr, 
uint32_t length)
return i915_pde_count(addr, length, GEN6_PDE_SHIFT);
 }
 
+#define gen8_for_each_pde(pt, pd, start, length, temp, iter)   \
+   for (iter = gen8_pde_index(start), pt = (pd)->page_tables[iter]; \
+length > 0 && iter < I915_PDES_PER_PD; \
+pt = (pd)->page_tables[++iter],\
+temp = ALIGN(start+1, 1 << GEN8_PDE_SHIFT) - start,\
+temp = min(temp, length),  \
+start += temp, length -= temp)
+
+#define gen8_for_each_pdpe(pd, pdp, start, length, temp, iter) \
+   for (iter = gen8_pdpe_index(start), pd = (pdp)->pagedirs[iter]; \
+length > 0 && iter < GEN8_LEGACY_PDPES;\
+pd = (pdp)->pagedirs[++iter],  \
+temp = ALIGN(start+1, 1 << GEN8_PDPE_SHIFT) - start,   \
+temp = min(temp, length),  \
+start += temp, length -= temp)
+
+/* Clamp length to the next pagedir boundary */
+static inline uint64_t gen8_clamp_pd(uint64_t start, uint64_t length)
+{
+   uint64_t next_pd = ALIGN(start + 1, 1 << GEN8_PDPE_SHIFT);
+   if (next_pd > (start + length))
+   return length;
+
+   return next_pd - start;
+}
+
 static inline uint32_t gen8_pte_index(uint64_t address)
 {
return i915_pte_index(address, GEN8_PDE_SHIFT);
-- 
1.9.2



[Intel-gfx] [PATCH 22/56] drm/i915: Clean up pagetable DMA map & unmap

2014-05-09 Thread Ben Widawsky
Map and unmap are common operations across all generations for
pagetables. With a simple helper, we can get a nice net code reduction
as well as reduced complexity.

There is some room for optimization here, for instance with the multiple
page mapping, which can be done in one pci_map operation. In that case
however, the max value we'll ever see there is 512, and so I believe the
simpler code makes this a worthwhile trade-off. Also, the range mapping
functions are placeholders to help transition the code. Eventually,
mapping will only occur during a page allocation which will always be a
discrete operation.
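
The one-shot mapping alluded to above would presumably use a scatterlist
rather than up to 512 individual pci_map_page() calls. A hypothetical
sketch of that rejected alternative (not what this patch does):

/* Hypothetical: map all page tables of a page directory in one DMA
 * operation via a scatterlist. */
struct scatterlist *sg_list;
int i, mapped;

sg_list = kmalloc_array(I915_PDES_PER_PD, sizeof(*sg_list), GFP_KERNEL);
if (!sg_list)
	return -ENOMEM;

sg_init_table(sg_list, I915_PDES_PER_PD);
for (i = 0; i < I915_PDES_PER_PD; i++)
	sg_set_page(&sg_list[i], pd->page_tables[i]->page, 4096, 0);

mapped = pci_map_sg(dev->pdev, sg_list, I915_PDES_PER_PD,
		    PCI_DMA_BIDIRECTIONAL);

The cost is keeping the scatterlist around for the eventual unmap; with
at most 512 cheap map calls, the simpler per-page loop wins.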

Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 147 +---
 1 file changed, 85 insertions(+), 62 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index e396b89..e8d4dfa 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -211,6 +211,76 @@ static gen6_gtt_pte_t iris_pte_encode(dma_addr_t addr,
return pte;
 }
 
+#define dma_unmap_pt_single(pt, dev) do { \
+   pci_unmap_page((dev)->pdev, (pt)->daddr, 4096, PCI_DMA_BIDIRECTIONAL); \
+} while (0);
+
+
+static void dma_unmap_pt_range(struct i915_pagedir *pd,
+  unsigned pde, size_t n,
+  struct drm_device *dev)
+{
+   if (WARN_ON(pde + n > I915_PDES_PER_PD))
+   n = I915_PDES_PER_PD - pde;
+
+   n += pde;
+
+   for (; pde < n; pde++)
+   dma_unmap_pt_single(pd->page_tables[pde], dev);
+}
+
+/**
+ * dma_map_pt_single() - Create a dma mapping for a page table
+ * @pt: Page table to get a DMA map for
+ * @dev:   drm device
+ *
+ * Page table allocations are unified across all gens. They always require a
+ * single 4k allocation, as well as a DMA mapping.
+ *
+ * Return: 0 if success.
+ */
+static int dma_map_pt_single(struct i915_pagetab *pt, struct drm_device *dev)
+{
+   struct page *page;
+   dma_addr_t pt_addr;
+   int ret;
+
+   page = pt->page;
+   pt_addr = pci_map_page(dev->pdev, page, 0, 4096,
+  PCI_DMA_BIDIRECTIONAL);
+
+   ret = pci_dma_mapping_error(dev->pdev, pt_addr);
+   if (ret)
+   return ret;
+
+   pt->daddr = pt_addr;
+
+   return 0;
+}
+
+static int dma_map_pt_range(struct i915_pagedir *pd,
+   unsigned pde, size_t n,
+   struct drm_device *dev)
+{
+   const int first = pde;
+
+   if (WARN_ON(pde + n > I915_PDES_PER_PD))
+   n = I915_PDES_PER_PD - pde;
+
+   n += pde;
+
+   for (; pde < n; pde++) {
+   int ret;
+   ret = dma_map_pt_single(pd->page_tables[pde], dev);
+   if (ret) {
+   dma_unmap_pt_range(pd, first, pde - first, dev);
+   return ret;
+   }
+   }
+
+   return 0;
+}
+
 static void free_pt_single(struct i915_pagetab *pt)
 {
if (WARN_ON(!pt->page))
@@ -219,7 +289,7 @@ static void free_pt_single(struct i915_pagetab *pt)
kfree(pt);
 }
 
-static struct i915_pagetab *alloc_pt_single(void)
+static struct i915_pagetab *alloc_pt_single(struct drm_device *dev)
 {
struct i915_pagetab *pt;
 
@@ -242,6 +312,7 @@ static struct i915_pagetab *alloc_pt_single(void)
  * available to point to the allocated page tables.
  * @pde:   First page directory entry for which we are allocating.
  * @count: Number of pages to allocate.
+ * @dev:   DRM device used for DMA mapping.
  *
  * Allocates multiple page table pages and sets the appropriate entries in the
  * page table structure within the page directory. Function cleans up after
@@ -249,7 +320,8 @@ static struct i915_pagetab *alloc_pt_single(void)
  *
  * Return: 0 if allocation succeeded.
  */
-static int alloc_pt_range(struct i915_pagedir *pd, uint16_t pde, size_t count)
+static int alloc_pt_range(struct i915_pagedir *pd, uint16_t pde, size_t count,
+ struct drm_device *dev)
 {
int i, ret;
 
@@ -259,7 +331,7 @@ static int alloc_pt_range(struct i915_pagedir *pd, uint16_t 
pde, size_t count)
BUG_ON(pde + count > I915_PDES_PER_PD);
 
for (i = pde; i < pde + count; i++) {
-   struct i915_pagetab *pt = alloc_pt_single();
+   struct i915_pagetab *pt = alloc_pt_single(dev);
if (IS_ERR(pt)) {
ret = PTR_ERR(pt);
goto err_out;
@@ -515,7 +587,7 @@ static int gen8_ppgtt_allocate_page_tables(struct 
i915_hw_ppgtt *ppgtt)
 
for (i = 0; i < ppgtt->num_pd_pages; i++) {
ret = alloc_pt_range(ppgtt->pdp.pagedir[i],
-0, I915_PDES_PER_PD);
+0, I915_PDES_PER_PD, ppgtt->base.dev);
if (ret)
	goto unwind_out;

[Intel-gfx] [PATCH 37/56] drm/i915/bdw: begin bitmap tracking

2014-05-09 Thread Ben Widawsky
Like with gen6/7, we can enable bitmap tracking with all the
preallocations to make sure things actually don't blow up.
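
The tracking itself is the standard kernel bitmap pattern. A minimal
sketch of the lifecycle this patch wires up (illustrative only; the
set side lands in the allocation paths, not all of which are visible
here):

/* one bit per PDE: set on alloc, cleared on free */
unsigned long *used_pdes = kcalloc(BITS_TO_LONGS(I915_PDES_PER_PD),
				   sizeof(*used_pdes), GFP_KERNEL);

__set_bit(pde, used_pdes);			/* page table allocated */
WARN_ON(!test_and_clear_bit(pde, used_pdes));	/* page table freed */

if (bitmap_empty(used_pdes, I915_PDES_PER_PD))
	; /* safe to free the whole page directory */
kfree(used_pdes);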

Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 101 +++-
 drivers/gpu/drm/i915/i915_gem_gtt.h |  12 +
 2 files changed, 99 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index e2bc274..82b98ea 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -353,8 +353,12 @@ err_out:
 
 static void __free_pd_single(struct i915_pagedir *pd, struct drm_device *dev)
 {
+   WARN(!bitmap_empty(pd->used_pdes, I915_PDES_PER_PD),
+"Free page directory with %d used pages\n",
+bitmap_weight(pd->used_pdes, I915_PDES_PER_PD));
i915_dma_unmap_single(pd, dev);
__free_page(pd->page);
+   kfree(pd->used_pdes);
kfree(pd);
 }
 
@@ -367,26 +371,35 @@ static void __free_pd_single(struct i915_pagedir *pd, 
struct drm_device *dev)
 static struct i915_pagedir *alloc_pd_single(struct drm_device *dev)
 {
struct i915_pagedir *pd;
-   int ret;
+   int ret = -ENOMEM;
 
pd = kzalloc(sizeof(*pd), GFP_KERNEL);
if (!pd)
return ERR_PTR(-ENOMEM);
 
+   pd->used_pdes = kcalloc(BITS_TO_LONGS(I915_PDES_PER_PD),
+   sizeof(*pd->used_pdes), GFP_KERNEL);
+   if (!pd->used_pdes)
+   goto free_pd;
+
pd->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
-   if (!pd->page) {
-   kfree(pd);
-   return ERR_PTR(-ENOMEM);
-   }
+   if (!pd->page)
+   goto free_bitmap;
 
ret = i915_dma_map_px_single(pd, dev);
-   if (ret) {
-   __free_page(pd->page);
-   kfree(pd);
-   return ERR_PTR(ret);
-   }
+   if (ret)
+   goto free_page;
 
return pd;
+
+free_page:
+   __free_page(pd->page);
+free_bitmap:
+   kfree(pd->used_pdes);
+free_pd:
+   kfree(pd);
+
+   return ERR_PTR(ret);
 }
 
 /* Broadwell Page Directory Pointer Descriptors */
@@ -576,12 +589,48 @@ static void gen8_teardown_va_range(struct 
i915_address_space *vm,
gen8_for_each_pdpe(pd, &ppgtt->pdp, start, length, temp, pdpe) {
uint64_t pd_len = gen8_clamp_pd(start, length);
uint64_t pd_start = start;
+
+   /* Page directories might not be present, since the macro rounds
+* start down and length up.
+*/
+   if (!pd) {
+   WARN(test_bit(pdpe, ppgtt->pdp.used_pdpes),
+"PDPE %d is not allocated, but is reserved (%p)\n",
+pdpe, vm);
+   continue;
+   } else {
+   WARN(!test_bit(pdpe, ppgtt->pdp.used_pdpes),
+"PDPE %d not reserved, but is allocated (%p)",
+pdpe, vm);
+   }
+
gen8_for_each_pde(pt, pd, pd_start, pd_len, temp, pde) {
-   free_pt_single(pt, vm->dev);
-   pd->page_tables[pde] = NULL;
+   if (!pt) {
+   WARN(test_bit(pde, pd->used_pdes),
+"PDE %d is not allocated, but is reserved 
(%p)\n",
+pde, vm);
+   continue;
+   } else
+   WARN(!test_bit(pde, pd->used_pdes),
+"PDE %d not reserved, but is allocated 
(%p)",
+pde, vm);
+
+   bitmap_clear(pt->used_ptes,
+gen8_pte_index(pd_start),
+gen8_pte_count(pd_start, pd_len));
+
+   if (bitmap_empty(pt->used_ptes, GEN8_PTES_PER_PT)) {
+   free_pt_single(pt, vm->dev);
+   pd->page_tables[pde] = NULL;
+   WARN_ON(!test_and_clear_bit(pde, pd->used_pdes));
+   }
+   }
+
+   if (bitmap_empty(pd->used_pdes, I915_PDES_PER_PD)) {
+   free_pd_single(pd, vm->dev);
+   ppgtt->pdp.pagedirs[pdpe] = NULL;
+   WARN_ON(!test_and_clear_bit(pdpe, ppgtt->pdp.used_pdpes));
}
-   free_pd_single(pd, vm->dev);
-   ppgtt->pdp.pagedirs[pdpe] = NULL;
}
 }
 
@@ -629,6 +678,7 @@ unwind_out:
return -ENOMEM;
 }
 
+/* bitmap of new pagedirs */
 static int gen8_ppgtt_alloc_pagedirs(struct i915_pagedirpo *pdp,
 uint64_t start,
 uint64_t length,
@@ -644,6 +694,7 @@ static int gen8_ppgtt_alloc_pagedirs(struct i915

[Intel-gfx] [PATCH 35/56] drm/i915: Extract PPGTT param from pagedir alloc

2014-05-09 Thread Ben Widawsky
Now that we don't need to trace num_pd_pages, we may as well kill all
need for the PPGTT structure in alloc_pagedirs. This is very useful
for when we move to 48b addressing, where the PDP isn't the root of the
page table structure.

The param is replaced with drm_device, which is an unavoidable wart
throughout the series (in other words, it's not extra flagrant).

Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index a646475..eded6a1 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -601,10 +601,9 @@ unwind_out:
 
 static int gen8_ppgtt_alloc_pagedirs(struct i915_pagedirpo *pdp,
 uint64_t start,
-uint64_t length)
+uint64_t length,
+struct drm_device *dev)
 {
-   struct i915_hw_ppgtt *ppgtt =
-   container_of(pdp, struct i915_hw_ppgtt, pdp);
struct i915_pagedir *unused;
uint64_t temp;
uint32_t pdpe;
@@ -614,8 +613,8 @@ static int gen8_ppgtt_alloc_pagedirs(struct i915_pagedirpo 
*pdp,
 
gen8_for_each_pdpe(unused, pdp, start, length, temp, pdpe) {
BUG_ON(unused);
-   pdp->pagedirs[pdpe] = alloc_pd_single(ppgtt->base.dev);
-   if (IS_ERR(ppgtt->pdp.pagedirs[pdpe]))
+   pdp->pagedirs[pdpe] = alloc_pd_single(dev);
+   if (IS_ERR(pdp->pagedirs[pdpe]))
goto unwind_out;
}
 
@@ -623,8 +622,7 @@ static int gen8_ppgtt_alloc_pagedirs(struct i915_pagedirpo 
*pdp,
 
 unwind_out:
while (pdpe--)
-   free_pd_single(ppgtt->pdp.pagedirs[pdpe],
-  ppgtt->base.dev);
+   free_pd_single(pdp->pagedirs[pdpe], dev);
 
return -ENOMEM;
 }
@@ -638,7 +636,8 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
uint32_t pdpe;
int ret;
 
-   ret = gen8_ppgtt_alloc_pagedirs(&ppgtt->pdp, start, length);
+   ret = gen8_ppgtt_alloc_pagedirs(&ppgtt->pdp, start, length,
+   ppgtt->base.dev);
if (ret)
return ret;
 
-- 
1.9.2



[Intel-gfx] [PATCH 21/56] drm/i915: Generalize GEN6 mapping

2014-05-09 Thread Ben Widawsky
Having a more general way of doing mappings will make it easy to map
and unmap a specific page table. Specifically, in this case we pass
down the page directory + entry, and the page table to map. This
works similarly to the x86 code.

The same work will need to happen for GEN8. At that point I will try to
combine functionality.
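
One subtlety carried over from the old code is worth spelling out: the
PDEs are written through write-combined GTT space, so after a batch of
writel()s a readl() is needed to make sure the writes have actually
landed before anyone uses the page tables. Schematically:

writel(pd_entry, ppgtt->pd_addr + pde);	/* may still be posted/buffered */
readl(dev_priv->gtt.gsm);		/* flush the posted writes */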

Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 61 +++--
 drivers/gpu/drm/i915/i915_gem_gtt.h |  2 ++
 2 files changed, 34 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 35370eb..e396b89 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -700,18 +700,13 @@ bail:
 
 static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 {
-   struct drm_i915_private *dev_priv = ppgtt->base.dev->dev_private;
struct i915_address_space *vm = &ppgtt->base;
-   gen6_gtt_pte_t __iomem *pd_addr;
gen6_gtt_pte_t scratch_pte;
uint32_t pd_entry;
int pte, pde;
 
scratch_pte = vm->pte_encode(vm->scratch.addr, I915_CACHE_LLC, true);
 
-   pd_addr = (gen6_gtt_pte_t __iomem *)dev_priv->gtt.gsm +
-   ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
-
seq_printf(m, "  VM %p (pd_offset %x-%x):\n", vm,
   ppgtt->pd.pd_offset,
   ppgtt->pd.pd_offset + ppgtt->num_pd_entries);
@@ -719,7 +714,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, 
struct seq_file *m)
u32 expected;
gen6_gtt_pte_t *pt_vaddr;
dma_addr_t pt_addr = ppgtt->pd.page_tables[pde]->daddr;
-   pd_entry = readl(pd_addr + pde);
+   pd_entry = readl(ppgtt->pd_addr + pde);
expected = (GEN6_PDE_ADDR_ENCODE(pt_addr) | GEN6_PDE_VALID);
 
if (pd_entry != expected)
@@ -755,39 +750,43 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, 
struct seq_file *m)
}
 }
 
-static void gen6_map_single(struct i915_hw_ppgtt *ppgtt,
-   const unsigned pde_index,
-   dma_addr_t daddr)
+/* Map pde (index) from the page directory @pd to the page table @pt */
+static void gen6_map_single(struct i915_pagedir *pd,
+   const int pde, struct i915_pagetab *pt)
 {
-   struct drm_i915_private *dev_priv = ppgtt->base.dev->dev_private;
-   uint32_t pd_entry;
-   gen6_gtt_pte_t __iomem *pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm;
-   pd_addr += ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
+   struct i915_hw_ppgtt *ppgtt =
+   container_of(pd, struct i915_hw_ppgtt, pd);
+   u32 pd_entry;
 
-   pd_entry = GEN6_PDE_ADDR_ENCODE(daddr);
+   pd_entry = GEN6_PDE_ADDR_ENCODE(pt->daddr);
pd_entry |= GEN6_PDE_VALID;
 
-   writel(pd_entry, pd_addr + pde_index);
+   writel(pd_entry, ppgtt->pd_addr + pde);
+
+   /* XXX: Caller needs to make sure the write completes if necessary */
 }
 
 /* Map all the page tables found in the ppgtt structure to incrementing page
  * directories. */
-static void gen6_map_page_tables(struct i915_hw_ppgtt *ppgtt)
+static void gen6_map_page_range(struct drm_i915_private *dev_priv,
+   struct i915_pagedir *pd, unsigned pde, size_t n)
 {
-   struct drm_i915_private *dev_priv = ppgtt->base.dev->dev_private;
-   int i;
+   if (WARN_ON(pde + n > I915_PDES_PER_PD))
+   n = I915_PDES_PER_PD - pde;
 
-   WARN_ON(ppgtt->pd.pd_offset & 0x3f);
-   for (i = 0; i < ppgtt->num_pd_entries; i++)
-   gen6_map_single(ppgtt, i, ppgtt->pd.page_tables[i]->daddr);
+   n += pde;
+
+   for (; pde < n; pde++)
+   gen6_map_single(pd, pde, pd->page_tables[pde]);
 
+   /* Make sure the write is complete before other code can use this page
+* table. Also required for WC mapped PTEs */
readl(dev_priv->gtt.gsm);
 }
 
 static uint32_t get_pd_offset(struct i915_hw_ppgtt *ppgtt)
 {
BUG_ON(ppgtt->pd.pd_offset & 0x3f);
-
return (ppgtt->pd.pd_offset / 64) << 16;
 }
 
@@ -1219,7 +1218,10 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
ppgtt->pd.pd_offset =
ppgtt->node.start / PAGE_SIZE * sizeof(gen6_gtt_pte_t);
 
-   gen6_map_page_tables(ppgtt);
+   ppgtt->pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm +
+   ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t);
+
+   gen6_map_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->num_pd_entries);
 
DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
 ppgtt->node.size >> 20,
@@ -1405,13 +1407,14 @@ void i915_gem_restore_gtt_mappings(struct drm_device 
*dev)
 
list_for_each_entry(vm, &dev_priv->vm_list, global_link) {
/* TODO: Perhaps it shouldn't be gen6 specific */

[Intel-gfx] [PATCH 36/56] drm/i915/bdw: Split out mappings

2014-05-09 Thread Ben Widawsky
When we do dynamic page table allocations for gen8, we'll need to have
more control over how and when we map page tables, similar to gen6.

This patch adds the functionality and calls it at init, so it should
introduce no functional change.

The PDPEs are still a special case for now. We'll need a function for
that in the future as well.
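
One thing to note for review: the page directory is written through a
CPU kmap, and parts without a shared LLC do not snoop the CPU caches,
so the writes must be flushed explicitly. That is the reason for the
clflush in the new helper. The pattern, as in gen8_map_pagetable_range():

gen8_ppgtt_pde_t *pagedir = kmap_atomic(pd->page);

/* ... write the PDEs through the CPU mapping ... */

if (!HAS_LLC(dev))
	drm_clflush_virt_range(pagedir, PAGE_SIZE);
kunmap_atomic(pagedir);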

Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 94 -
 1 file changed, 52 insertions(+), 42 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index eded6a1..e2bc274 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -533,6 +533,36 @@ static void gen8_ppgtt_insert_entries(struct 
i915_address_space *vm,
}
 }
 
+static void __gen8_do_map_pt(gen8_ppgtt_pde_t *pde,
+struct i915_pagetab *pt,
+struct drm_device *dev)
+{
+   gen8_ppgtt_pde_t entry =
+   gen8_pde_encode(dev, pt->daddr, I915_CACHE_LLC);
+   *pde = entry;
+}
+
+/* It's likely we'll map more than one pagetable at a time. This function
+ * saves us unnecessary kmap calls, but does no more functionally than
+ * multiple calls to map_pt. */
+static void gen8_map_pagetable_range(struct i915_pagedir *pd,
+uint64_t start,
+uint64_t length,
+struct drm_device *dev)
+{
+   gen8_ppgtt_pde_t *pagedir = kmap_atomic(pd->page);
+   struct i915_pagetab *pt;
+   uint64_t temp, pde;
+
+   gen8_for_each_pde(pt, pd, start, length, temp, pde)
+   __gen8_do_map_pt(pagedir + pde, pt, dev);
+
+   if (!HAS_LLC(dev))
+   drm_clflush_virt_range(pagedir, PAGE_SIZE);
+
+   kunmap_atomic(pagedir);
+}
+
 static void gen8_teardown_va_range(struct i915_address_space *vm,
   uint64_t start, uint64_t length)
 {
@@ -627,11 +657,14 @@ unwind_out:
return -ENOMEM;
 }
 
-static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
-   uint64_t start,
-   uint64_t length)
+static int gen8_alloc_va_range(struct i915_address_space *vm,
+  uint64_t start,
+  uint64_t length)
 {
+   struct i915_hw_ppgtt *ppgtt =
+   container_of(vm, struct i915_hw_ppgtt, base);
struct i915_pagedir *pd;
+   const uint64_t orig_start = start;
uint64_t temp;
uint32_t pdpe;
int ret;
@@ -650,9 +683,8 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
 
return 0;
 
-   /* TODO: Check this for all cases */
 err_out:
-   gen8_ppgtt_free(ppgtt);
+   gen8_teardown_va_range(vm, orig_start, start);
return ret;
 }
 
@@ -662,60 +694,38 @@ err_out:
  * PDP represents 1GB of memory 4 * 512 * 512 * 4096 = 4GB legacy 32b address
  * space.
  *
- * FIXME: split allocation into smaller pieces. For now we only ever do this
- * once, but with full PPGTT, the multiple contiguous allocations will be bad.
- * TODO: Do something with the size parameter
  */
 static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 {
-   const int max_pdp = DIV_ROUND_UP(size, 1 << 30);
-   int i, j, ret;
-
-   if (size % (1<<30))
-   DRM_INFO("Pages will be wasted unless GTT size (%llu) is 
divisible by 1GB\n", size);
+   struct i915_pagedir *pd;
+   uint64_t temp, start = 0;
+   const uint64_t orig_length = size;
+   uint32_t pdpe;
+   int ret;
 
ppgtt->base.start = 0;
ppgtt->base.total = size;
+   ppgtt->base.clear_range = gen8_ppgtt_clear_range;
+   ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
+   ppgtt->base.cleanup = gen8_ppgtt_cleanup;
+   ppgtt->enable = gen8_ppgtt_enable;
+   ppgtt->switch_mm = gen8_mm_switch;
 
ppgtt->scratch_pd = alloc_pt_scratch(ppgtt->base.dev);
if (IS_ERR(ppgtt->scratch_pd))
return PTR_ERR(ppgtt->scratch_pd);
 
-   /* 1. Do all our allocations for page directories and page tables. */
-   ret = gen8_ppgtt_alloc(ppgtt, ppgtt->base.start, ppgtt->base.total);
+   ret = gen8_alloc_va_range(&ppgtt->base, start, size);
if (ret) {
free_pt_scratch(ppgtt->scratch_pd, ppgtt->base.dev);
return ret;
}
 
-   /*
-* 2. Map all the page directory entires to point to the page tables
-* we've allocated.
-*
-* For now, the PPGTT helper functions all require that the PDEs are
-* plugged in correctly. So we do that now/here. For aliasing PPGTT, we
-* will never need to touch the PDEs again.
-*/
-   for (i = 0; i < max_pdp; i++) {
-   struct i915_pagedir *pd = ppgtt->pdp.pagedirs[i];
-   gen8_ppgtt_pde_t *pd_vaddr;
-  

[Intel-gfx] [PATCH 31/56] drm/i915/bdw: pagedirs rework allocation

2014-05-09 Thread Ben Widawsky
Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 43 ++---
 1 file changed, 31 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 14aae05..10cfad8 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -546,8 +546,10 @@ static void gen8_teardown_va_range(struct 
i915_address_space *vm,
uint64_t pd_start = start;
gen8_for_each_pde(pt, pd, pd_start, pd_len, temp, pde) {
free_pt_single(pt, vm->dev);
+   pd->page_tables[pde] = NULL;
}
free_pd_single(pd, vm->dev);
+   ppgtt->pdp.pagedirs[pdpe] = NULL;
}
 }
 
@@ -598,26 +600,40 @@ unwind_out:
return -ENOMEM;
 }
 
-static int gen8_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt,
-   const int max_pdp)
+static int gen8_ppgtt_alloc_pagedirs(struct i915_pagedirpo *pdp,
+uint64_t start,
+uint64_t length)
 {
-   int i;
+   struct i915_hw_ppgtt *ppgtt =
+   container_of(pdp, struct i915_hw_ppgtt, pdp);
+   struct i915_pagedir *unused;
+   uint64_t temp;
+   uint32_t pdpe;
 
-   for (i = 0; i < max_pdp; i++) {
-   ppgtt->pdp.pagedirs[i] = alloc_pd_single(ppgtt->base.dev);
-   if (IS_ERR(ppgtt->pdp.pagedirs[i]))
+   /* FIXME: PPGTT container_of won't work for 64b */
+   BUG_ON((start + length) > 0x8ULL);
+
+   gen8_for_each_pdpe(unused, pdp, start, length, temp, pdpe) {
+   BUG_ON(unused);
+   pdp->pagedirs[pdpe] = alloc_pd_single(ppgtt->base.dev);
+   if (IS_ERR(ppgtt->pdp.pagedirs[pdpe]))
goto unwind_out;
+
+   ppgtt->num_pd_pages++;
}
 
-   ppgtt->num_pd_pages = max_pdp;
BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPES);
 
return 0;
 
 unwind_out:
-   while (i--)
-   free_pd_single(ppgtt->pdp.pagedirs[i],
+   while (pdpe--) {
+   free_pd_single(ppgtt->pdp.pagedirs[pdpe],
   ppgtt->base.dev);
+   ppgtt->num_pd_pages--;
+   }
+
+   WARN_ON(ppgtt->num_pd_pages);
 
return -ENOMEM;
 }
@@ -627,7 +643,8 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
 {
int ret;
 
-   ret = gen8_ppgtt_allocate_page_directories(ppgtt, max_pdp);
+   ret = gen8_ppgtt_alloc_pagedirs(&ppgtt->pdp, ppgtt->base.start,
+   ppgtt->base.total);
if (ret)
return ret;
 
@@ -664,6 +681,10 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, 
uint64_t size)
if (size % (1<<30))
DRM_INFO("Pages will be wasted unless GTT size (%llu) is 
divisible by 1GB\n", size);
 
+   ppgtt->base.start = 0;
+   ppgtt->base.total = size;
+   BUG_ON(ppgtt->base.total == 0);
+
/* 1. Do all our allocations for page directories and page tables. */
ret = gen8_ppgtt_alloc(ppgtt, max_pdp);
if (ret)
@@ -697,8 +718,6 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, 
uint64_t size)
ppgtt->base.clear_range = gen8_ppgtt_clear_range;
ppgtt->base.insert_entries = gen8_ppgtt_insert_entries;
ppgtt->base.cleanup = gen8_ppgtt_cleanup;
-   ppgtt->base.start = 0;
-   ppgtt->base.total = ppgtt->num_pd_entries * GEN8_PTES_PER_PT * PAGE_SIZE;
 
DRM_DEBUG_DRIVER("Allocated %d pages for page directories (%d 
wasted)\n",
 ppgtt->num_pd_pages, ppgtt->num_pd_pages - max_pdp);
-- 
1.9.2



[Intel-gfx] [PATCH 27/56] drm/i915: Extract context switch skip logic

2014-05-09 Thread Ben Widawsky
We have some fanciness coming up. This patch just breaks out the logic.

Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/i915/i915_gem_context.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
b/drivers/gpu/drm/i915/i915_gem_context.c
index f2dc17a..7eb4091 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -594,6 +594,16 @@ mi_set_context(struct intel_ring_buffer *ring,
return ret;
 }
 
+static inline bool should_skip_switch(struct intel_ring_buffer *ring,
+ struct i915_hw_context *from,
+ struct i915_hw_context *to)
+{
+   if (from == to && from->last_ring == ring && !to->remap_slice)
+   return true;
+
+   return false;
+}
+
 static int do_switch(struct intel_ring_buffer *ring,
 struct i915_hw_context *to)
 {
@@ -608,7 +618,7 @@ static int do_switch(struct intel_ring_buffer *ring,
BUG_ON(!i915_gem_obj_is_pinned(from->obj));
}
 
-   if (from == to && from->last_ring == ring && !to->remap_slice)
+   if (should_skip_switch(ring, from, to))
return 0;
 
/* Trying to pin first makes error handling easier. */
-- 
1.9.2



[Intel-gfx] [PATCH 16/56] drm/i915: Range clearing is PPGTT agnostic

2014-05-09 Thread Ben Widawsky
Therefore we can do it from our general init function. Eventually, I
hope to have a lot more commonality like this. That isn't all here yet,
but this was a nice easy one.

Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index bfa9811..086c533 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -621,8 +621,6 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, 
uint64_t size)
ppgtt->base.start = 0;
	ppgtt->base.total = ppgtt->num_pd_entries * GEN8_PTES_PER_PAGE * PAGE_SIZE;
 
-   ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
-
DRM_DEBUG_DRIVER("Allocated %d pages for page directories (%d 
wasted)\n",
 ppgtt->num_pd_pages, ppgtt->num_pd_pages - max_pdp);
DRM_DEBUG_DRIVER("Allocated %d pages for page tables (%lld wasted)\n",
@@ -1189,8 +1187,6 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
 
gen6_map_page_tables(ppgtt);
 
-   ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
-
DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
 ppgtt->node.size >> 20,
 ppgtt->node.start / PAGE_SIZE);
@@ -1218,6 +1214,7 @@ int i915_gem_init_ppgtt(struct drm_device *dev, struct 
i915_hw_ppgtt *ppgtt)
 
kref_init(&ppgtt->ref);
drm_mm_init(&ppgtt->base.mm, ppgtt->base.start, ppgtt->base.total);
+   ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
i915_init_vm(dev_priv, &ppgtt->base);
 
return 0;
-- 
1.9.2



[Intel-gfx] [PATCH 17/56] drm/i915: Page table helpers, and define renames

2014-05-09 Thread Ben Widawsky
These page table helpers make the code much cleaner. There is some
room to use the arch/x86 header files. The reason I've opted not to is
that in several cases the definitions are dictated by CONFIG_ options,
which do not always reflect the restrictions in the GPU. While here,
clean up the defines to have more concise names, and consolidate between
gen6 and gen8 where appropriate.

v2: Use I915_PAGE_SIZE to remove PAGE_SIZE dep in the new code (Jesse)
Fix bugged I915_PTE_MASK define, which was unused (Chris)
BUG_ON bad length/size - taking directly from Chris (Chris)
define NUM_PTE (Chris)

I've made a lot of tiny errors in these helpers. Often I'd correct an
error only to introduce another one. While IGT was capable of catching
them, the tests often took a while to catch, and were hard/slow to
debug in the kernel. As a result, to test this, I compiled
i915_gem_gtt.h in userspace, and ran tests from userspace. What follows
isn't by any means complete, but it was able to catch a lot of bugs. Gen8
is also untested, but since the current code is almost identical, I feel
pretty comfortable with that.

void test_pte(uint32_t base) {
uint32_t ret;
assert_pte_index((base + 0), 0);
assert_pte_index((base + 1), 0);
assert_pte_index((base + 0x1000), 1);
assert_pte_index((base + (1<<22)), 0);
assert_pte_index((base + ((1<<22) - 1)), 1023);
assert_pte_index((base + (1<<21)), 512);

assert_pte_count(base + 0, 0, 0);
assert_pte_count(base + 0, 1, 1);
assert_pte_count(base + 0, 0x1000, 1);
assert_pte_count(base + 0, 0x1001, 2);
assert_pte_count(base + 0, 1<<21, 512);

assert_pte_count(base + 0, 1<<22, 1024);
assert_pte_count(base + 0, (1<<22) - 1, 1024);
assert_pte_count(base + (1<<21), 1<<22, 512);
assert_pte_count(base + (1<<21), (1<<22)+1, 512);
assert_pte_count(base + (1<<21), 10<<22, 512);
}

void test_pde(uint32_t base) {
assert(gen6_pde_index(base + 0) == 0);
assert(gen6_pde_index(base + 1) == 0);
assert(gen6_pde_index(base + (1<<21)) == 0);
assert(gen6_pde_index(base + (1<<22)) == 1);
assert(gen6_pde_index(base + ((256<<22))) == 256);
assert(gen6_pde_index(base + ((512<<22))) == 0);
assert(gen6_pde_index(base + ((513<<22))) == 1); /* This is
actually not possible on gen6 */

assert(gen6_pde_count(base + 0, 0) == 0);
assert(gen6_pde_count(base + 0, 1) == 1);
assert(gen6_pde_count(base + 0, 1<<21) == 1);
assert(gen6_pde_count(base + 0, 1<<22) == 1);
assert(gen6_pde_count(base + 0, (1<<22) + 0x1000) == 2);
assert(gen6_pde_count(base + 0x1000, 1<<22) == 2);
assert(gen6_pde_count(base + 0, 511<<22) == 511);
assert(gen6_pde_count(base + 0, 512<<22) == 512);
assert(gen6_pde_count(base + 0x1000, 512<<22) == 512);
assert(gen6_pde_count(base + (1<<22), 512<<22) == 511);
}

int main()
{
test_pde(0);
while (1)
test_pte(rand() & ~((1<<22) - 1));

return 0;
}
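
The assert_* helpers are not part of the patch; a plausible
reconstruction, assuming they are thin wrappers over the index/count
helpers being tested (hypothetical, for running the above):

#include <assert.h>

#define assert_pte_index(addr, expected) \
	assert(gen6_pte_index(addr) == (expected))
#define assert_pte_count(addr, length, expected) \
	assert(gen6_pte_count((addr), (length)) == (expected))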

Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/i915/i915_gem_gtt.c |  88 +-
 drivers/gpu/drm/i915/i915_gem_gtt.h | 123 +---
 2 files changed, 156 insertions(+), 55 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 086c533..a8eb077 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -248,7 +248,7 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
int i, ret;
 
/* bit of a hack to find the actual last used pd */
-   int used_pd = ppgtt->num_pd_entries / GEN8_PDES_PER_PAGE;
+   int used_pd = ppgtt->num_pd_entries / I915_PDES_PER_PD;
 
for (i = used_pd - 1; i >= 0; i--) {
dma_addr_t addr = ppgtt->pd_dma_addr[i];
@@ -268,9 +268,9 @@ static void gen8_ppgtt_clear_range(struct 
i915_address_space *vm,
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
gen8_gtt_pte_t *pt_vaddr, scratch_pte;
-   unsigned pdpe = start >> GEN8_PDPE_SHIFT & GEN8_PDPE_MASK;
-   unsigned pde = start >> GEN8_PDE_SHIFT & GEN8_PDE_MASK;
-   unsigned pte = start >> GEN8_PTE_SHIFT & GEN8_PTE_MASK;
+   unsigned pdpe = gen8_pdpe_index(start);
+   unsigned pde = gen8_pde_index(start);
+   unsigned pte = gen8_pte_index(start);
unsigned num_entries = length >> PAGE_SHIFT;
unsigned last_pte, i;
 
@@ -281,8 +281,8 @@ static void gen8_ppgtt_clear_range(struct 
i915_address_space *vm,
struct page *page_table = ppgtt->gen8_pt_pages[pdpe][pde];
 
last_pte = pte + num_entries;
-   if (last_pte > GEN8_PTES_PER_PAGE)
-   last_pte = GEN8_PTES_PER_PAGE;
+   if (last_pte > GEN8_PTES_PER_PT)
+   last_pte = GEN8_PTES_PER_PT;

[Intel-gfx] [PATCH 07/56] drm/i915: fix gtt_total_entries()

2014-05-09 Thread Ben Widawsky
It's useful to have this as a function rather than a macro for some
upcoming work. Since we generally try to avoid macros anyway, I think
it doesn't hurt to put this in as its own patch.

Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 9 +++--
 drivers/gpu/drm/i915/i915_gem_gtt.h | 2 --
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index bec637b..33cac92 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -73,6 +73,11 @@ static void ppgtt_bind_vma(struct i915_vma *vma,
 static void ppgtt_unbind_vma(struct i915_vma *vma);
 static int gen8_ppgtt_enable(struct i915_hw_ppgtt *ppgtt);
 
+static size_t gtt_total_entries(struct i915_gtt *gtt)
+{
+   return gtt->base.total >> PAGE_SHIFT;
+}
+
 static inline gen8_gtt_pte_t gen8_pte_encode(dma_addr_t addr,
 enum i915_cache_level level,
 bool valid)
@@ -1491,7 +1496,7 @@ static void gen8_ggtt_clear_range(struct 
i915_address_space *vm,
unsigned num_entries = length >> PAGE_SHIFT;
gen8_gtt_pte_t scratch_pte, __iomem *gtt_base =
(gen8_gtt_pte_t __iomem *) dev_priv->gtt.gsm + first_entry;
-   const int max_entries = gtt_total_entries(dev_priv->gtt) - first_entry;
+   const int max_entries = gtt_total_entries(&dev_priv->gtt) - first_entry;
int i;
 
if (WARN(num_entries > max_entries,
@@ -1517,7 +1522,7 @@ static void gen6_ggtt_clear_range(struct 
i915_address_space *vm,
unsigned num_entries = length >> PAGE_SHIFT;
gen6_gtt_pte_t scratch_pte, __iomem *gtt_base =
(gen6_gtt_pte_t __iomem *) dev_priv->gtt.gsm + first_entry;
-   const int max_entries = gtt_total_entries(dev_priv->gtt) - first_entry;
+   const int max_entries = gtt_total_entries(&dev_priv->gtt) - first_entry;
int i;
 
if (WARN(num_entries > max_entries,
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h 
b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 5635c65..ad68079 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -38,8 +38,6 @@ typedef uint32_t gen6_gtt_pte_t;
 typedef uint64_t gen8_gtt_pte_t;
 typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
 
-#define gtt_total_entries(gtt) ((gtt).base.total >> PAGE_SHIFT)
-
 #define I915_PPGTT_PT_ENTRIES  (PAGE_SIZE / sizeof(gen6_gtt_pte_t))
 /* gen6-hsw has bit 11-4 for physical addr bit 39-32 */
 #define GEN6_GTT_ADDR_ENCODE(addr) ((addr) | (((addr) >> 28) & 0xff0))
-- 
1.9.2



[Intel-gfx] [PATCH 18/56] drm/i915: construct page table abstractions

2014-05-09 Thread Ben Widawsky
Thus far we've opted for complex code that requires difficult review. In
the future, the code is only going to become more complex, and as such
we'll take the hit now and start to encapsulate things.

To help transition the code nicely there is some wasted space in gen6/7.
This will be ameliorated shortly.

NOTE: The pun in the subject was intentional.
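
Since the i915_gem_gtt.h hunk is cut off below, the rough shape of the
new abstractions, as implied by the .c hunks, is approximately the
following (a reconstruction, not the verbatim header; daddr is filled
in by the next patch):

struct i915_pagetab {
	struct page *page;
	dma_addr_t daddr;
};

struct i915_pagedir {
	struct page *page;
	dma_addr_t daddr;
	/* array of I915_PDES_PER_PD page tables, one per PDE */
	struct i915_pagetab *page_tables;
};

struct i915_pagedirpo {
	struct i915_pagedir pagedir[GEN8_LEGACY_PDPES];
};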

Signed-off-by: Ben Widawsky 

Conflicts:
drivers/gpu/drm/i915/i915_drv.h
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 175 ++--
 drivers/gpu/drm/i915/i915_gem_gtt.h |  24 +++--
 2 files changed, 104 insertions(+), 95 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index a8eb077..f2478c9 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -278,7 +278,8 @@ static void gen8_ppgtt_clear_range(struct 
i915_address_space *vm,
  I915_CACHE_LLC, use_scratch);
 
while (num_entries) {
-   struct page *page_table = ppgtt->gen8_pt_pages[pdpe][pde];
+   struct i915_pagedir *pd = &ppgtt->pdp.pagedir[pdpe];
+   struct page *page_table = pd->page_tables[pde].page;
 
last_pte = pte + num_entries;
if (last_pte > GEN8_PTES_PER_PT)
@@ -322,8 +323,11 @@ static void gen8_ppgtt_insert_entries(struct 
i915_address_space *vm,
if (WARN_ON(pdpe >= GEN8_LEGACY_PDPES))
break;
 
-   if (pt_vaddr == NULL)
-   pt_vaddr = kmap_atomic(ppgtt->gen8_pt_pages[pdpe][pde]);
+   if (pt_vaddr == NULL) {
+   struct i915_pagedir *pd = &ppgtt->pdp.pagedir[pdpe];
+   struct page *page_table = pd->page_tables[pde].page;
+   pt_vaddr = kmap_atomic(page_table);
+   }
 
pt_vaddr[pte] =
gen8_pte_encode(sg_page_iter_dma_address(&sg_iter),
@@ -347,29 +351,33 @@ static void gen8_ppgtt_insert_entries(struct 
i915_address_space *vm,
}
 }
 
-static void gen8_free_page_tables(struct page **pt_pages)
+static void gen8_free_page_tables(struct i915_pagedir *pd)
 {
int i;
 
-   if (pt_pages == NULL)
+   if (pd->page_tables == NULL)
return;
 
for (i = 0; i < I915_PDES_PER_PD; i++)
-   if (pt_pages[i])
-   __free_pages(pt_pages[i], 0);
+   if (pd->page_tables[i].page)
+   __free_page(pd->page_tables[i].page);
+}
+
+static void gen8_free_page_directories(struct i915_pagedir *pd)
+{
+   kfree(pd->page_tables);
+   __free_page(pd->page);
 }
 
-static void gen8_ppgtt_free(const struct i915_hw_ppgtt *ppgtt)
+static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
 {
int i;
 
for (i = 0; i < ppgtt->num_pd_pages; i++) {
-   gen8_free_page_tables(ppgtt->gen8_pt_pages[i]);
-   kfree(ppgtt->gen8_pt_pages[i]);
+   gen8_free_page_tables(&ppgtt->pdp.pagedir[i]);
+   gen8_free_page_directories(&ppgtt->pdp.pagedir[i]);
kfree(ppgtt->gen8_pt_dma_addr[i]);
}
-
-   __free_pages(ppgtt->pd_pages, get_order(ppgtt->num_pd_pages << PAGE_SHIFT));
 }
 
 static void gen8_ppgtt_dma_unmap_pages(struct i915_hw_ppgtt *ppgtt)
@@ -407,87 +415,73 @@ static void gen8_ppgtt_cleanup(struct i915_address_space 
*vm)
gen8_ppgtt_free(ppgtt);
 }
 
-static struct page **__gen8_alloc_page_tables(void)
+static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
 {
-   struct page **pt_pages;
int i;
 
-   pt_pages = kcalloc(I915_PDES_PER_PD, sizeof(struct page *), GFP_KERNEL);
-   if (!pt_pages)
-   return ERR_PTR(-ENOMEM);
-
-   for (i = 0; i < I915_PDES_PER_PD; i++) {
-   pt_pages[i] = alloc_page(GFP_KERNEL | __GFP_ZERO);
-   if (!pt_pages[i])
-   goto bail;
+   for (i = 0; i < ppgtt->num_pd_pages; i++) {
+   ppgtt->gen8_pt_dma_addr[i] = kcalloc(I915_PDES_PER_PD,
+sizeof(dma_addr_t),
+GFP_KERNEL);
+   if (!ppgtt->gen8_pt_dma_addr[i])
+   return -ENOMEM;
}
 
-   return pt_pages;
-
-bail:
-   gen8_free_page_tables(pt_pages);
-   kfree(pt_pages);
-   return ERR_PTR(-ENOMEM);
+   return 0;
 }
 
-static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt,
-  const int max_pdp)
+static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 {
-   struct page **pt_pages[GEN8_LEGACY_PDPES];
-   int i, ret;
+   int i, j;
 
-   for (i = 0; i < max_pdp; i++) {
-   pt_pages[i] = __gen8_alloc_page_tables();
-   if (IS_ERR(pt_pages[i])) {
-   ret = PTR_ERR(pt_pages[i]);

[Intel-gfx] [PATCH 10/56] drm/i915: s/pd/pdpe, s/pt/pde

2014-05-09 Thread Ben Widawsky
With the new style of page table data structures, the correct way to
think about these variables is as the actual entry being indexed into
the array. "pd" and "pt" aren't representative of what the operation is
doing.

The clarity here will improve the readability of future patches.
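
For reference, the renamed indices map onto the gen8 legacy 32b
address layout roughly as follows (a sketch consistent with the
shifts and masks in i915_gem_gtt.h, assuming PDPE/PDE/PTE shifts of
30/21/12):

/* 31:30 pdpe | 29:21 pde | 20:12 pte | 11:0 offset into the 4K page */
pdpe = (addr >> GEN8_PDPE_SHIFT) & GEN8_PDPE_MASK;	/* which pagedir */
pde  = (addr >> GEN8_PDE_SHIFT) & GEN8_PDE_MASK;	/* which pagetab */
pte  = (addr >> GEN8_PTE_SHIFT) & GEN8_PTE_MASK;	/* which page */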

Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index d3c52b1..0869e54 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -515,40 +515,40 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
 }
 
 static int gen8_ppgtt_setup_page_directories(struct i915_hw_ppgtt *ppgtt,
-const int pd)
+const int pdpe)
 {
dma_addr_t pd_addr;
int ret;
 
pd_addr = pci_map_page(ppgtt->base.dev->pdev,
-  &ppgtt->pd_pages[pd], 0,
+  &ppgtt->pd_pages[pdpe], 0,
   PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
 
ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pd_addr);
if (ret)
return ret;
 
-   ppgtt->pd_dma_addr[pd] = pd_addr;
+   ppgtt->pd_dma_addr[pdpe] = pd_addr;
 
return 0;
 }
 
 static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
-   const int pd,
-   const int pt)
+   const int pdpe,
+   const int pde)
 {
dma_addr_t pt_addr;
struct page *p;
int ret;
 
-   p = ppgtt->gen8_pt_pages[pd][pt];
+   p = ppgtt->gen8_pt_pages[pdpe][pde];
pt_addr = pci_map_page(ppgtt->base.dev->pdev,
   p, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
ret = pci_dma_mapping_error(ppgtt->base.dev->pdev, pt_addr);
if (ret)
return ret;
 
-   ppgtt->gen8_pt_dma_addr[pd][pt] = pt_addr;
+   ppgtt->gen8_pt_dma_addr[pdpe][pde] = pt_addr;
 
return 0;
 }
-- 
1.9.2



[Intel-gfx] [PATCH 11/56] drm/i915: rename map/unmap to dma_map/unmap

2014-05-09 Thread Ben Widawsky
Upcoming patches will use the terms map and unmap in reference to the
page table entries. Having this distinction will really help with code
clarity at that point.

Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 0869e54..d772577 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -372,7 +372,7 @@ static void gen8_ppgtt_free(const struct i915_hw_ppgtt 
*ppgtt)
	__free_pages(ppgtt->pd_pages, get_order(ppgtt->num_pd_pages << PAGE_SHIFT));
 }
 
-static void gen8_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
+static void gen8_ppgtt_dma_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 {
struct pci_dev *hwdev = ppgtt->base.dev->pdev;
int i, j;
@@ -403,7 +403,7 @@ static void gen8_ppgtt_cleanup(struct i915_address_space 
*vm)
list_del(&vm->global_link);
drm_mm_takedown(&vm->mm);
 
-   gen8_ppgtt_unmap_pages(ppgtt);
+   gen8_ppgtt_dma_unmap_pages(ppgtt);
gen8_ppgtt_free(ppgtt);
 }
 
@@ -631,7 +631,7 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, 
uint64_t size)
return 0;
 
 bail:
-   gen8_ppgtt_unmap_pages(ppgtt);
+   gen8_ppgtt_dma_unmap_pages(ppgtt);
gen8_ppgtt_free(ppgtt);
return ret;
 }
@@ -999,7 +999,7 @@ static void gen6_ppgtt_insert_entries(struct 
i915_address_space *vm,
kunmap_atomic(pt_vaddr);
 }
 
-static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt *ppgtt)
+static void gen6_ppgtt_dma_unmap_pages(struct i915_hw_ppgtt *ppgtt)
 {
int i;
 
@@ -1030,7 +1030,7 @@ static void gen6_ppgtt_cleanup(struct i915_address_space 
*vm)
drm_mm_takedown(&ppgtt->base.mm);
drm_mm_remove_node(&ppgtt->node);
 
-   gen6_ppgtt_unmap_pages(ppgtt);
+   gen6_ppgtt_dma_unmap_pages(ppgtt);
gen6_ppgtt_free(ppgtt);
 }
 
@@ -1128,7 +1128,7 @@ static int gen6_ppgtt_setup_page_tables(struct 
i915_hw_ppgtt *ppgtt)
   PCI_DMA_BIDIRECTIONAL);
 
if (pci_dma_mapping_error(dev->pdev, pt_addr)) {
-   gen6_ppgtt_unmap_pages(ppgtt);
+   gen6_ppgtt_dma_unmap_pages(ppgtt);
return -EIO;
}
 
-- 
1.9.2



[Intel-gfx] [PATCH 15/56] drm/i915: Make gen6_write_pdes gen6_map_page_tables

2014-05-09 Thread Ben Widawsky
Split out single mappings, which will help with upcoming work. Also,
while here, rename the function because the new name is a better
description; note that this function is going away soon anyway.

Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 39 ++---
 1 file changed, 23 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 08b1b25..bfa9811 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -692,26 +692,33 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, 
struct seq_file *m)
}
 }
 
-static void gen6_write_pdes(struct i915_hw_ppgtt *ppgtt)
+static void gen6_map_single(struct i915_hw_ppgtt *ppgtt,
+   const unsigned pde_index,
+   dma_addr_t daddr)
 {
struct drm_i915_private *dev_priv = ppgtt->base.dev->dev_private;
-   gen6_gtt_pte_t __iomem *pd_addr;
uint32_t pd_entry;
+   gen6_gtt_pte_t __iomem *pd_addr =
+   (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm + ppgtt->pd_offset / sizeof(gen6_gtt_pte_t);
+
+   pd_entry = GEN6_PDE_ADDR_ENCODE(daddr);
+   pd_entry |= GEN6_PDE_VALID;
+
+   writel(pd_entry, pd_addr + pde_index);
+}
+
+/* Map all the page tables found in the ppgtt structure to incrementing page
+ * directories. */
+static void gen6_map_page_tables(struct i915_hw_ppgtt *ppgtt)
+{
+   struct drm_i915_private *dev_priv = ppgtt->base.dev->dev_private;
int i;
 
WARN_ON(ppgtt->pd_offset & 0x3f);
-   pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm +
-   ppgtt->pd_offset / sizeof(gen6_gtt_pte_t);
-   for (i = 0; i < ppgtt->num_pd_entries; i++) {
-   dma_addr_t pt_addr;
-
-   pt_addr = ppgtt->pt_dma_addr[i];
-   pd_entry = GEN6_PDE_ADDR_ENCODE(pt_addr);
-   pd_entry |= GEN6_PDE_VALID;
+   for (i = 0; i < ppgtt->num_pd_entries; i++)
+   gen6_map_single(ppgtt, i, ppgtt->pt_dma_addr[i]);
 
-   writel(pd_entry, pd_addr + i);
-   }
-   readl(pd_addr);
+   readl(dev_priv->gtt.gsm);
 }
 
 static uint32_t get_pd_offset(struct i915_hw_ppgtt *ppgtt)
@@ -1180,7 +1187,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
ppgtt->pd_offset =
ppgtt->node.start / PAGE_SIZE * sizeof(gen6_gtt_pte_t);
 
-   gen6_write_pdes(ppgtt);
+   gen6_map_page_tables(ppgtt);
 
ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
 
@@ -1369,11 +1376,11 @@ void i915_gem_restore_gtt_mappings(struct drm_device 
*dev)
/* TODO: Perhaps it shouldn't be gen6 specific */
if (i915_is_ggtt(vm)) {
if (dev_priv->mm.aliasing_ppgtt)
-   gen6_write_pdes(dev_priv->mm.aliasing_ppgtt);
+   gen6_map_page_tables(dev_priv->mm.aliasing_ppgtt);
continue;
}
 
-   gen6_write_pdes(container_of(vm, struct i915_hw_ppgtt, base));
+   gen6_map_page_tables(container_of(vm, struct i915_hw_ppgtt, base));
}
 
i915_gem_chipset_flush(dev);
-- 
1.9.2



[Intel-gfx] [PATCH 09/56] drm/i915: Split out verbose PPGTT dumping

2014-05-09 Thread Ben Widawsky
There often is not enough memory to dump the full contents of the PPGTT.
As a temporary bandage, to continue getting valuable basic PPGTT info,
wrap the dangerous, memory-hungry part inside a new verbose version
of the debugfs file.

Also while here we can split out the PPGTT print function so it's more
reusable.

I'd really like to get PPGTT info into our error state, but I found it too
difficult to make work in the limited time I have. Maybe Mika can find a way.

v2: Get the info for the non-default contexts. Merge a patch from Chris
into this patch (Chris). All credit goes to him.
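
Implementation note: the verbose flag is smuggled to the idr_for_each()
callback by tagging bit 0 of the (at least word-aligned) seq_file
pointer, so the callback has to mask that bit off again before
dereferencing. Schematically:

/* pack: the pointer is aligned, so bit 0 is free for a flag */
void *cookie = (void *)((unsigned long)m | verbose);

/* unpack, inside the callback */
bool verbose = (unsigned long)data & 1;
struct seq_file *m = (void *)((unsigned long)data & ~1UL);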

References: 20140320115742.ga4...@nuc-i3427.alporthouse.com
Cc: Mika Kuoppala 
Cc: Chris Wilson 
Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/i915/i915_debugfs.c | 49 +++--
 1 file changed, 30 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
b/drivers/gpu/drm/i915/i915_debugfs.c
index d9c1414..4a0b1c8 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1812,18 +1812,13 @@ static int i915_swizzle_info(struct seq_file *m, void 
*data)
return 0;
 }
 
-static int per_file_ctx(int id, void *ptr, void *data)
+static void print_ppgtt(struct seq_file *m, struct i915_hw_ppgtt *ppgtt, const char *name)
 {
-   struct i915_hw_context *ctx = ptr;
-   struct seq_file *m = data;
-   struct i915_hw_ppgtt *ppgtt = ctx_to_ppgtt(ctx);
-
-   ppgtt->debug_dump(ppgtt, m);
-
-   return 0;
+   seq_printf(m, "%s:\n", name);
+   seq_printf(m, "pd gtt offset: 0x%08x\n", ppgtt->pd_offset);
 }
 
-static void gen8_ppgtt_info(struct seq_file *m, struct drm_device *dev)
+static void gen8_ppgtt_info(struct seq_file *m, struct drm_device *dev, int verbose)
 {
struct drm_i915_private *dev_priv = dev->dev_private;
struct intel_ring_buffer *ring;
@@ -1847,7 +1842,21 @@ static void gen8_ppgtt_info(struct seq_file *m, struct 
drm_device *dev)
}
 }
 
-static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
+static int per_file_ctx(int id, void *ptr, void *data)
+{
+   struct i915_hw_context *ctx = ptr;
+   struct seq_file *m = (void *)((unsigned long)data & ~1UL);
+   bool verbose = (unsigned long)data & 1;
+   struct i915_hw_ppgtt *ppgtt = ctx_to_ppgtt(ctx);
+
+   print_ppgtt(m, ppgtt, ctx->id == DEFAULT_CONTEXT_ID ? "Default context" : "User context");
+   if (verbose)
+   ppgtt->debug_dump(ppgtt, m);
+
+   return 0;
+}
+
+static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev, bool verbose)
 {
struct drm_i915_private *dev_priv = dev->dev_private;
struct intel_ring_buffer *ring;
@@ -1868,10 +1877,9 @@ static void gen6_ppgtt_info(struct seq_file *m, struct 
drm_device *dev)
if (dev_priv->mm.aliasing_ppgtt) {
struct i915_hw_ppgtt *ppgtt = dev_priv->mm.aliasing_ppgtt;
 
-   seq_puts(m, "aliasing PPGTT:\n");
-   seq_printf(m, "pd gtt offset: 0x%08x\n", ppgtt->pd_offset);
-
-   ppgtt->debug_dump(ppgtt, m);
+   print_ppgtt(m, ppgtt, "Aliasing PPGTT");
+   if (verbose)
+   ppgtt->debug_dump(ppgtt, m);
} else
return;
 
@@ -1880,10 +1888,11 @@ static void gen6_ppgtt_info(struct seq_file *m, struct 
drm_device *dev)
struct i915_hw_ppgtt *pvt_ppgtt;
 
pvt_ppgtt = ctx_to_ppgtt(file_priv->private_default_ctx);
-   seq_printf(m, "proc: %s\n",
+   seq_printf(m, "\nproc: %s\n",
   get_pid_task(file->pid, PIDTYPE_PID)->comm);
-   seq_puts(m, "  default context:\n");
-   idr_for_each(&file_priv->context_idr, per_file_ctx, m);
+   print_ppgtt(m, pvt_ppgtt, "Default context");
+   idr_for_each(&file_priv->context_idr, per_file_ctx,
+(void *)((unsigned long)m | verbose));
}
seq_printf(m, "ECOCHK: 0x%08x\n", I915_READ(GAM_ECOCHK));
 }
@@ -1893,6 +1902,7 @@ static int i915_ppgtt_info(struct seq_file *m, void *data)
struct drm_info_node *node = (struct drm_info_node *) m->private;
struct drm_device *dev = node->minor->dev;
struct drm_i915_private *dev_priv = dev->dev_private;
+   bool verbose = node->info_ent->data ? true : false;
 
int ret = mutex_lock_interruptible(&dev->struct_mutex);
if (ret)
@@ -1900,9 +1910,9 @@ static int i915_ppgtt_info(struct seq_file *m, void *data)
intel_runtime_pm_get(dev_priv);
 
if (INTEL_INFO(dev)->gen >= 8)
-   gen8_ppgtt_info(m, dev);
+   gen8_ppgtt_info(m, dev, verbose);
else if (INTEL_INFO(dev)->gen >= 6)
-   gen6_ppgtt_info(m, dev);
+   gen6_ppgtt_info(m, dev, verbose);
 
intel_runtime_pm_put(dev_priv);
mutex_unlock(&dev->struct_mutex);
@@ -3843,6 +3853,7 @@ static const struct drm_info_list i91

[Intel-gfx] [PATCH 20/56] drm/i915: Create page table allocators

2014-05-09 Thread Ben Widawsky
As we move toward dynamic page table allocation, it becomes much easier
to manage our data structures if we do things less coarsely, by
breaking up all of our actions into individual tasks.  This makes the
code easier to write, read, and verify.

Aside from the dissection of the allocation functions, the patch
statically allocates the page table structures without a page directory.
This remains the same for all platforms.

The patch itself should not have much functional difference. The primary
noticeable difference is the fact that page tables are no longer
allocated, but rather statically declared as part of the page directory.
This has non-zero overhead, but things gain non-trivial complexity as a
result.

This patch exists for a few reasons:
1. Splitting out the functions allows easily combining GEN6 and GEN8
code. Page tables have no differences based on GEN. As we'll see in a
future patch when we add the DMA mappings to the allocations, it
requires only one small change to make work, and error handling should
just fall into place.

2. Unless we always want to allocate all page tables under a given PDE,
we'll have to eventually break this up into an array of pointers (or
pointer to pointer).

3. Having the discrete functions is easier to review, and understand.
All allocations and frees now take place in just a couple of locations.
Reviewing, and catching leaks should be easy.

4. Less important: the GFP flags are confined to one location, which
makes playing around with such things trivial.

v2: Updated commit message to explain why this patch exists

Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 226 +++-
 drivers/gpu/drm/i915/i915_gem_gtt.h |   4 +-
 2 files changed, 147 insertions(+), 83 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 1f186d3..35370eb 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -211,6 +211,102 @@ static gen6_gtt_pte_t iris_pte_encode(dma_addr_t addr,
return pte;
 }
 
+static void free_pt_single(struct i915_pagetab *pt)
+{
+   if (WARN_ON(!pt->page))
+   return;
+   __free_page(pt->page);
+   kfree(pt);
+}
+
+static struct i915_pagetab *alloc_pt_single(void)
+{
+   struct i915_pagetab *pt;
+
+   pt = kzalloc(sizeof(*pt), GFP_KERNEL);
+   if (!pt)
+   return ERR_PTR(-ENOMEM);
+
+   pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+   if (!pt->page) {
+   kfree(pt);
+   return ERR_PTR(-ENOMEM);
+   }
+
+   return pt;
+}
+
+/**
+ * alloc_pt_range() - Allocate multiple page tables
+ * @pd: The page directory which will have at least @count entries
+ * available to point to the allocated page tables.
+ * @pde:   First page directory entry for which we are allocating.
+ * @count: Number of pages to allocate.
+ *
+ * Allocates multiple page table pages and sets the appropriate entries in the
+ * page table structure within the page directory. Function cleans up after
+ * itself on any failures.
+ *
+ * Return: 0 if allocation succeeded.
+ */
+static int alloc_pt_range(struct i915_pagedir *pd, uint16_t pde, size_t count)
+{
+   int i, ret;
+
+   /* 512 is the max page tables per pagedir on any platform.
+* TODO: make WARN after patch series is done
+*/
+   BUG_ON(pde + count > I915_PDES_PER_PD);
+
+   for (i = pde; i < pde + count; i++) {
+   struct i915_pagetab *pt = alloc_pt_single();
+   if (IS_ERR(pt)) {
+   ret = PTR_ERR(pt);
+   goto err_out;
+   }
+   WARN(pd->page_tables[i],
+"Leaking page directory entry %d (%pa)\n",
+i, pd->page_tables[i]);
+   pd->page_tables[i] = pt;
+   }
+
+   return 0;
+
+err_out:
+   while (i--)
+   free_pt_single(pd->page_tables[i]);
+   return ret;
+}
+
+static void __free_pd_single(struct i915_pagedir *pd)
+{
+   __free_page(pd->page);
+   kfree(pd);
+}
+
+#define free_pd_single(pd) do { \
+   if ((pd)->page) { \
+   __free_pd_single(pd); \
+   } \
+} while (0)
+
+static struct i915_pagedir *alloc_pd_single(void)
+{
+   struct i915_pagedir *pd;
+
+   pd = kzalloc(sizeof(*pd), GFP_KERNEL);
+   if (!pd)
+   return ERR_PTR(-ENOMEM);
+
+   pd->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+   if (!pd->page) {
+   kfree(pd);
+   return ERR_PTR(-ENOMEM);
+   }
+
+   return pd;
+}
+
 /* Broadwell Page Directory Pointer Descriptors */
 static int gen8_write_pdp(struct intel_ring_buffer *ring, unsigned entry,
   uint64_t val, bool synchronous)
@@ -251,7 +347,7 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
	int used_pd = ppgtt->num_pd_entries / I915_PDES_PER_PD;

[Intel-gfx] [PATCH 19/56] drm/i915: Complete page table structures

2014-05-09 Thread Ben Widawsky
Move the remaining members over to the new page table structures.

This can be squashed with the previous commit if desired. The reasoning
is the same as for that patch. I simply felt it is easier to review if split.

Signed-off-by: Ben Widawsky 

Conflicts:
drivers/gpu/drm/i915/i915_drv.h
drivers/gpu/drm/i915/i915_gem_gtt.c
---
 drivers/gpu/drm/i915/i915_debugfs.c   |  2 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c   | 85 +--
 drivers/gpu/drm/i915/i915_gem_gtt.h   | 15 +++
 drivers/gpu/drm/i915/i915_gpu_error.c |  1 -
 4 files changed, 38 insertions(+), 65 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
b/drivers/gpu/drm/i915/i915_debugfs.c
index 4a0b1c8..64051b0 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1815,7 +1815,7 @@ static int i915_swizzle_info(struct seq_file *m, void 
*data)
 static void print_ppgtt(struct seq_file *m, struct i915_hw_ppgtt *ppgtt, const char *name)
 {
seq_printf(m, "%s:\n", name);
-   seq_printf(m, "pd gtt offset: 0x%08x\n", ppgtt->pd_offset);
+   seq_printf(m, "pd gtt offset: 0x%08x\n", ppgtt->pd.pd_offset);
 }
 
 static void gen8_ppgtt_info(struct seq_file *m, struct drm_device *dev, int verbose)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index f2478c9..1f186d3 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -251,7 +251,7 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
int used_pd = ppgtt->num_pd_entries / I915_PDES_PER_PD;
 
for (i = used_pd - 1; i >= 0; i--) {
-   dma_addr_t addr = ppgtt->pd_dma_addr[i];
+   dma_addr_t addr = ppgtt->pdp.pagedir[i].daddr;
ret = gen8_write_pdp(ring, i, addr, synchronous);
if (ret)
return ret;
@@ -376,7 +376,6 @@ static void gen8_ppgtt_free(struct i915_hw_ppgtt *ppgtt)
for (i = 0; i < ppgtt->num_pd_pages; i++) {
gen8_free_page_tables(&ppgtt->pdp.pagedir[i]);
gen8_free_page_directories(&ppgtt->pdp.pagedir[i]);
-   kfree(ppgtt->gen8_pt_dma_addr[i]);
}
 }
 
@@ -388,14 +387,14 @@ static void gen8_ppgtt_dma_unmap_pages(struct 
i915_hw_ppgtt *ppgtt)
for (i = 0; i < ppgtt->num_pd_pages; i++) {
/* TODO: In the future we'll support sparse mappings, so this
 * will have to change. */
-   if (!ppgtt->pd_dma_addr[i])
+   if (!ppgtt->pdp.pagedir[i].daddr)
continue;
 
-   pci_unmap_page(hwdev, ppgtt->pd_dma_addr[i], PAGE_SIZE,
+   pci_unmap_page(hwdev, ppgtt->pdp.pagedir[i].daddr, PAGE_SIZE,
   PCI_DMA_BIDIRECTIONAL);
 
for (j = 0; j < I915_PDES_PER_PD; j++) {
-   dma_addr_t addr = ppgtt->gen8_pt_dma_addr[i][j];
+   dma_addr_t addr = ppgtt->pdp.pagedir[i].page_tables[j].daddr;
if (addr)
pci_unmap_page(hwdev, addr, PAGE_SIZE,
   PCI_DMA_BIDIRECTIONAL);
@@ -415,31 +414,18 @@ static void gen8_ppgtt_cleanup(struct i915_address_space 
*vm)
gen8_ppgtt_free(ppgtt);
 }
 
-static int gen8_ppgtt_allocate_dma(struct i915_hw_ppgtt *ppgtt)
-{
-   int i;
-
-   for (i = 0; i < ppgtt->num_pd_pages; i++) {
-   ppgtt->gen8_pt_dma_addr[i] = kcalloc(I915_PDES_PER_PD,
-sizeof(dma_addr_t),
-GFP_KERNEL);
-   if (!ppgtt->gen8_pt_dma_addr[i])
-   return -ENOMEM;
-   }
-
-   return 0;
-}
-
 static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
 {
int i, j;
 
for (i = 0; i < ppgtt->num_pd_pages; i++) {
+   struct i915_pagedir *pd = &ppgtt->pdp.pagedir[i];
for (j = 0; j < I915_PDES_PER_PD; j++) {
-   struct i915_pagetab *pt = &ppgtt->pdp.pagedir[i].page_tables[j];
+   struct i915_pagetab *pt = &pd->page_tables[j];
pt->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
if (!pt->page)
goto unwind_out;
+
}
}
 
@@ -499,9 +485,7 @@ static int gen8_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt,
 
ppgtt->num_pd_entries = max_pdp * I915_PDES_PER_PD;
 
-   ret = gen8_ppgtt_allocate_dma(ppgtt);
-   if (!ret)
-   return ret;
+   return 0;
 
/* TODO: Check this for all cases */
 err_out:
@@ -523,7 +507,7 @@ static int gen8_ppgtt_setup_page_directories(struct 
i915_hw_ppgtt *ppgtt,
if (ret)
return ret;
 
-   ppgtt->pd_dma_addr[pdpe] = pd_addr;
+   ppgtt->pdp.pagedir[pdpe].daddr = pd_addr;
 
	return 0;

[Intel-gfx] [PATCH 14/56] drm/i915: Un-hardcode number of page directories

2014-05-09 Thread Ben Widawsky
trivial.

Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/i915/i915_gem_gtt.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h 
b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 7c06c43..2002393 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -256,7 +256,7 @@ struct i915_hw_ppgtt {
};
union {
dma_addr_t *pt_dma_addr;
-   dma_addr_t *gen8_pt_dma_addr[4];
+   dma_addr_t *gen8_pt_dma_addr[GEN8_LEGACY_PDPES];
};
 
struct i915_hw_context *ctx;
-- 
1.9.2



[Intel-gfx] [PATCH 08/56] drm/i915: Rename to GEN8_LEGACY_PDPES

2014-05-09 Thread Ben Widawsky
In gen8, 32b PPGTT has always had one "pdp" (it doesn't actually have
one, but it resembles having one). The #define was confusing as-is, and
using "PDPE" is a much better description.

sed -i 's/GEN8_LEGACY_PDPS/GEN8_LEGACY_PDPES/' drivers/gpu/drm/i915/*.[ch]

Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 6 +++---
 drivers/gpu/drm/i915/i915_gem_gtt.h | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 33cac92..d3c52b1 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -319,7 +319,7 @@ static void gen8_ppgtt_insert_entries(struct 
i915_address_space *vm,
pt_vaddr = NULL;
 
for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
-   if (WARN_ON(pdpe >= GEN8_LEGACY_PDPS))
+   if (WARN_ON(pdpe >= GEN8_LEGACY_PDPES))
break;
 
if (pt_vaddr == NULL)
@@ -433,7 +433,7 @@ bail:
 static int gen8_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt,
   const int max_pdp)
 {
-   struct page **pt_pages[GEN8_LEGACY_PDPS];
+   struct page **pt_pages[GEN8_LEGACY_PDPES];
int i, ret;
 
for (i = 0; i < max_pdp; i++) {
@@ -485,7 +485,7 @@ static int gen8_ppgtt_allocate_page_directories(struct 
i915_hw_ppgtt *ppgtt,
return -ENOMEM;
 
ppgtt->num_pd_pages = 1 << get_order(max_pdp << PAGE_SHIFT);
-   BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPS);
+   BUG_ON(ppgtt->num_pd_pages > GEN8_LEGACY_PDPES);
 
return 0;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h 
b/drivers/gpu/drm/i915/i915_gem_gtt.h
index ad68079..7c06c43 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -84,7 +84,7 @@ typedef gen8_gtt_pte_t gen8_ppgtt_pde_t;
 #define GEN8_PDE_MASK  0x1ff
 #define GEN8_PTE_SHIFT 12
 #define GEN8_PTE_MASK  0x1ff
-#define GEN8_LEGACY_PDPS   4
+#define GEN8_LEGACY_PDPES  4
 #define GEN8_PTES_PER_PAGE (PAGE_SIZE / sizeof(gen8_gtt_pte_t))
 #define GEN8_PDES_PER_PAGE (PAGE_SIZE / sizeof(gen8_ppgtt_pde_t))
 
@@ -247,12 +247,12 @@ struct i915_hw_ppgtt {
unsigned num_pd_pages; /* gen8+ */
union {
struct page **pt_pages;
-   struct page **gen8_pt_pages[GEN8_LEGACY_PDPS];
+   struct page **gen8_pt_pages[GEN8_LEGACY_PDPES];
};
struct page *pd_pages;
union {
uint32_t pd_offset;
-   dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPS];
+   dma_addr_t pd_dma_addr[GEN8_LEGACY_PDPES];
};
union {
dma_addr_t *pt_dma_addr;
-- 
1.9.2



[Intel-gfx] [PATCH 04/56] drm/i915: Wrap VMA binding

2014-05-09 Thread Ben Widawsky
This will be useful for some upcoming patches which do more
platform-specific work. Having it in one central place just makes things
a bit cleaner and easier.

NOTE: I didn't actually end up using this patch for the intended
purpose, but I thought it was a nice patch to keep around.

v2: s/i915_gem_bind_vma/i915_gem_vma_bind/
s/i915_gem_unbind_vma/i915_gem_vma_unbind/
(Chris)

v3: Missed one spot

v4: Don't change the trace events (Daniel)

Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/i915/i915_drv.h|  3 +++
 drivers/gpu/drm/i915/i915_gem.c| 12 ++--
 drivers/gpu/drm/i915/i915_gem_context.c|  2 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  5 +++--
 drivers/gpu/drm/i915/i915_gem_gtt.c| 13 -
 5 files changed, 25 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 1a190a1..88d3d82 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2312,6 +2312,9 @@ bool i915_gem_obj_bound(struct drm_i915_gem_object *o,
struct i915_address_space *vm);
 unsigned long i915_gem_obj_size(struct drm_i915_gem_object *o,
struct i915_address_space *vm);
+void i915_gem_vma_bind(struct i915_vma *vma, enum i915_cache_level,
+  unsigned flags);
+void i915_gem_vma_unbind(struct i915_vma *vma);
 struct i915_vma *i915_gem_obj_to_vma(struct drm_i915_gem_object *obj,
 struct i915_address_space *vm);
 struct i915_vma *
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 8fd1824..59b0e67 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2771,7 +2771,7 @@ int i915_vma_unbind(struct i915_vma *vma)
 
trace_i915_vma_unbind(vma);
 
-   vma->unbind_vma(vma);
+   i915_gem_vma_unbind(vma);
 
i915_gem_gtt_finish_object(obj);
 
@@ -3317,8 +3317,8 @@ search_free:
WARN_ON(flags & PIN_MAPPABLE && !obj->map_and_fenceable);
 
trace_i915_vma_bind(vma, flags);
-   vma->bind_vma(vma, obj->cache_level,
- flags & (PIN_MAPPABLE | PIN_GLOBAL) ? GLOBAL_BIND : 0);
+   i915_gem_vma_bind(vma, obj->cache_level,
+ flags & (PIN_MAPPABLE | PIN_GLOBAL) ? GLOBAL_BIND : 
0);
 
i915_gem_verify_gtt(dev);
return vma;
@@ -3522,8 +3522,8 @@ int i915_gem_object_set_cache_level(struct 
drm_i915_gem_object *obj,
 
list_for_each_entry(vma, &obj->vma_list, vma_link)
if (drm_mm_node_allocated(&vma->node))
-   vma->bind_vma(vma, cache_level,
- obj->has_global_gtt_mapping ? 
GLOBAL_BIND : 0);
+   i915_gem_vma_bind(vma, cache_level,
+ obj->has_global_gtt_mapping ? 
GLOBAL_BIND : 0);
}
 
list_for_each_entry(vma, &obj->vma_list, vma_link)
@@ -3892,7 +3892,7 @@ i915_gem_object_pin(struct drm_i915_gem_object *obj,
}
 
if (flags & PIN_GLOBAL && !obj->has_global_gtt_mapping)
-   vma->bind_vma(vma, obj->cache_level, GLOBAL_BIND);
+   i915_gem_vma_bind(vma, obj->cache_level, GLOBAL_BIND);
 
vma->pin_count++;
if (flags & PIN_MAPPABLE)
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
b/drivers/gpu/drm/i915/i915_gem_context.c
index 29dd825..f2dc17a 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -652,7 +652,7 @@ static int do_switch(struct intel_ring_buffer *ring,
if (!to->obj->has_global_gtt_mapping) {
struct i915_vma *vma = i915_gem_obj_to_vma(to->obj,
   &dev_priv->gtt.base);
-   vma->bind_vma(vma, to->obj->cache_level, GLOBAL_BIND);
+   i915_gem_vma_bind(vma, to->obj->cache_level, GLOBAL_BIND);
}
 
if (!to->is_initialized || i915_gem_context_is_default(to))
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 47fe8ec..cd9b932 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -373,7 +373,8 @@ i915_gem_execbuffer_relocate_entry(struct 
drm_i915_gem_object *obj,
struct i915_vma *vma =
list_first_entry(&target_i915_obj->vma_list,
 typeof(*vma), vma_link);
-   vma->bind_vma(vma, target_i915_obj->cache_level, GLOBAL_BIND);
+   i915_gem_vma_bind(vma, target_i915_obj->cache_level,
+ GLOBAL_BIND);
}
 
/* Validate that the target is in a valid r/w GPU domain */
@@ -1269,7 +1270,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 * allocate spa

[Intel-gfx] [PATCH 12/56] drm/i915: Setup less PPGTT on failed pagedir

2014-05-09 Thread Ben Widawsky
The current code will both potentially print a WARN and set up part of
the PPGTT structure on failure. Neither of these harms the current code;
the change is simply for clarity, and perhaps to prevent later bugs or
weird debug messages.

Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index d772577..5ca8208 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1063,11 +1063,14 @@ alloc:
goto alloc;
}
 
+   if (ret)
+   return ret;
+
if (ppgtt->node.start < dev_priv->gtt.mappable_end)
DRM_DEBUG("Forced to use aperture for PDEs\n");
 
ppgtt->num_pd_entries = GEN6_PPGTT_PD_ENTRIES;
-   return ret;
+   return 0;
 }
 
 static int gen6_ppgtt_allocate_page_tables(struct i915_hw_ppgtt *ppgtt)
-- 
1.9.2



[Intel-gfx] [PATCH 13/56] drm/i915: clean up PPGTT init error path

2014-05-09 Thread Ben Widawsky
The old code (I'm having trouble finding the commit) had a reason for
doing things when there was an error and would continue on, hence the
!ret. For the newer code, however, this looks completely silly.

Follow the normal idiom of if (ret) return ret.

Also, put the pde wiring in the gen specific init, now that GEN8 exists.

Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 22 +-
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 5ca8208..08b1b25 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1180,6 +1180,8 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt)
ppgtt->pd_offset =
ppgtt->node.start / PAGE_SIZE * sizeof(gen6_gtt_pte_t);
 
+   gen6_write_pdes(ppgtt);
+
ppgtt->base.clear_range(&ppgtt->base, 0, ppgtt->base.total, true);
 
DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n",
@@ -1204,20 +1206,14 @@ int i915_gem_init_ppgtt(struct drm_device *dev, struct 
i915_hw_ppgtt *ppgtt)
else
BUG();
 
-   if (!ret) {
-   struct drm_i915_private *dev_priv = dev->dev_private;
-   kref_init(&ppgtt->ref);
-   drm_mm_init(&ppgtt->base.mm, ppgtt->base.start,
-   ppgtt->base.total);
-   i915_init_vm(dev_priv, &ppgtt->base);
-   if (INTEL_INFO(dev)->gen < 8) {
-   gen6_write_pdes(ppgtt);
-   DRM_DEBUG("Adding PPGTT at offset %x\n",
- ppgtt->pd_offset << 10);
-   }
-   }
+   if (ret)
+   return ret;
 
-   return ret;
+   kref_init(&ppgtt->ref);
+   drm_mm_init(&ppgtt->base.mm, ppgtt->base.start, ppgtt->base.total);
+   i915_init_vm(dev_priv, &ppgtt->base);
+
+   return 0;
 }
 
 static void
-- 
1.9.2



[Intel-gfx] [PATCH 06/56] drm/i915: Split out aliasing binds

2014-05-09 Thread Ben Widawsky
This patch finishes off actually separating the aliasing and global
binds. Prior to this, all global binds would also be aliased. Now if
aliasing binds are required, they must be explicitly asked for. So far
we have no users of this outside of execbuf - but Mika has already
submitted a patch requiring just this.

A nice benefit of this is we should no longer be able to clobber GTT
only objects from the aliasing PPGTT.

v2: Only add aliasing binds for the GGTT/Aliasing PPGTT at execbuf

Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/i915/i915_drv.h| 2 +-
 drivers/gpu/drm/i915/i915_gem.c| 6 --
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 5 +++--
 drivers/gpu/drm/i915/i915_gem_gtt.c| 3 +++
 4 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 62e1ecb..29bf034 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2364,7 +2364,7 @@ i915_gem_obj_ggtt_pin(struct drm_i915_gem_object *obj,
  uint32_t alignment,
  unsigned flags)
 {
-   return i915_gem_object_pin(obj, obj_to_ggtt(obj), alignment, flags | 
PIN_GLOBAL_ALIASED);
+   return i915_gem_object_pin(obj, obj_to_ggtt(obj), alignment, flags | 
PIN_GLOBAL);
 }
 
 static inline int
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index e3ac643..320d6b0 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3320,8 +3320,10 @@ search_free:
 
WARN_ON(flags & PIN_MAPPABLE && !obj->map_and_fenceable);
 
-   if (flags & PIN_GLOBAL_ALIASED)
-   vma_bind_flags = GLOBAL_BIND | ALIASING_BIND;
+   if (flags & PIN_ALIASING)
+   vma_bind_flags = ALIASING_BIND;
+   if (flags & PIN_GLOBAL)
+   vma_bind_flags = GLOBAL_BIND;
 
trace_i915_vma_bind(vma, flags);
i915_gem_vma_bind(vma, obj->cache_level, vma_bind_flags);
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 7cad10f..3c3aba7 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -549,10 +549,11 @@ i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
bool has_fenced_gpu_access = INTEL_INFO(ring->dev)->gen < 4;
bool need_fence;
-   unsigned flags;
+   unsigned flags = 0;
int ret;
 
-   flags = 0;
+   if (i915_is_ggtt(vma->vm))
+   flags = PIN_ALIASING;
 
need_fence =
has_fenced_gpu_access &&
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 226afea..bec637b 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1595,6 +1595,9 @@ static void ggtt_bind_vma(struct i915_vma *vma,
}
}
 
+   if (!(flags & ALIASING_BIND))
+   return;
+
if (dev_priv->mm.aliasing_ppgtt &&
(!obj->has_aliasing_ppgtt_mapping ||
 (cache_level != obj->cache_level))) {
-- 
1.9.2



[Intel-gfx] [PATCH 05/56] drm/i915: Make pin global flags explicit

2014-05-09 Thread Ben Widawsky
The driver currently lets callers pin global, and then tries to do
things correctly inside the function. Doing so has two downsides:
1. It's not possible to exclusively pin to a global, or an aliasing
address space.
2. It's difficult to read and understand.

The eventual goal, when realized, should fix both of these issues. This
patch, which should have no functional impact, begins to address them
without intentionally breaking things.
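
The eventual exclusive-pin usage, sketched with the flags introduced
here (the aliasing-only case only becomes meaningful once the follow-up
patches land):

        /* global GTT binding only */
        i915_gem_object_pin(obj, obj_to_ggtt(obj), 4096, PIN_GLOBAL);
        /* aliasing PPGTT binding only */
        i915_gem_object_pin(obj, obj_to_ggtt(obj), 4096, PIN_ALIASING);
        /* both, matching the old implicit behaviour */
        i915_gem_object_pin(obj, obj_to_ggtt(obj), 4096, PIN_GLOBAL_ALIASED);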

v2: Replace PIN_GLOBAL with PIN_ALIASING in _pin(). Copy paste error

Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/i915/i915_drv.h|  4 +++-
 drivers/gpu/drm/i915/i915_gem.c| 31 +++---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  8 ++--
 drivers/gpu/drm/i915/i915_gem_gtt.c| 12 ++--
 drivers/gpu/drm/i915/i915_gem_gtt.h|  4 
 5 files changed, 47 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 88d3d82..62e1ecb 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2136,6 +2136,8 @@ void i915_gem_vma_destroy(struct i915_vma *vma);
 #define PIN_MAPPABLE 0x1
 #define PIN_NONBLOCK 0x2
 #define PIN_GLOBAL 0x4
+#define PIN_ALIASING 0x8
+#define PIN_GLOBAL_ALIASED (PIN_ALIASING | PIN_GLOBAL)
 int __must_check i915_gem_object_pin(struct drm_i915_gem_object *obj,
 struct i915_address_space *vm,
 uint32_t alignment,
@@ -2362,7 +2364,7 @@ i915_gem_obj_ggtt_pin(struct drm_i915_gem_object *obj,
  uint32_t alignment,
  unsigned flags)
 {
-   return i915_gem_object_pin(obj, obj_to_ggtt(obj), alignment, flags | 
PIN_GLOBAL);
+   return i915_gem_object_pin(obj, obj_to_ggtt(obj), alignment, flags | 
PIN_GLOBAL_ALIASED);
 }
 
 static inline int
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 59b0e67..e3ac643 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3231,8 +3231,12 @@ i915_gem_object_bind_to_vm(struct drm_i915_gem_object 
*obj,
size_t gtt_max =
flags & PIN_MAPPABLE ? dev_priv->gtt.mappable_end : vm->total;
struct i915_vma *vma;
+   u32 vma_bind_flags = 0;
int ret;
 
+   if (WARN_ON((flags & (PIN_MAPPABLE | PIN_GLOBAL)) == PIN_MAPPABLE))
+   flags |= PIN_GLOBAL;
+
fence_size = i915_gem_get_gtt_size(dev,
   obj->base.size,
   obj->tiling_mode);
@@ -3316,9 +3320,11 @@ search_free:
 
WARN_ON(flags & PIN_MAPPABLE && !obj->map_and_fenceable);
 
+   if (flags & PIN_GLOBAL_ALIASED)
+   vma_bind_flags = GLOBAL_BIND | ALIASING_BIND;
+
trace_i915_vma_bind(vma, flags);
-   i915_gem_vma_bind(vma, obj->cache_level,
- flags & (PIN_MAPPABLE | PIN_GLOBAL) ? GLOBAL_BIND : 
0);
+   i915_gem_vma_bind(vma, obj->cache_level, vma_bind_flags);
 
i915_gem_verify_gtt(dev);
return vma;
@@ -3521,9 +3527,14 @@ int i915_gem_object_set_cache_level(struct 
drm_i915_gem_object *obj,
}
 
list_for_each_entry(vma, &obj->vma_list, vma_link)
-   if (drm_mm_node_allocated(&vma->node))
-   i915_gem_vma_bind(vma, cache_level,
- obj->has_global_gtt_mapping ? 
GLOBAL_BIND : 0);
+   if (drm_mm_node_allocated(&vma->node)) {
+   u32 bind_flags = 0;
+   if (obj->has_global_gtt_mapping)
+   bind_flags |= GLOBAL_BIND;
+   if (obj->has_aliasing_ppgtt_mapping)
+   bind_flags |= ALIASING_BIND;
+   i915_gem_vma_bind(vma, cache_level, bind_flags);
+   }
}
 
list_for_each_entry(vma, &obj->vma_list, vma_link)
@@ -3891,8 +3902,14 @@ i915_gem_object_pin(struct drm_i915_gem_object *obj,
return PTR_ERR(vma);
}
 
-   if (flags & PIN_GLOBAL && !obj->has_global_gtt_mapping)
-   i915_gem_vma_bind(vma, obj->cache_level, GLOBAL_BIND);
+   if (flags & PIN_GLOBAL_ALIASED) {
+   u32 bind_flags = 0;
+   if (flags & PIN_GLOBAL && !obj->has_global_gtt_mapping)
+   bind_flags |= GLOBAL_BIND;
+   if (flags & PIN_ALIASING && !obj->has_aliasing_ppgtt_mapping)
+   bind_flags |= ALIASING_BIND;
+   i915_gem_vma_bind(vma, obj->cache_level, bind_flags);
+   }
 
vma->pin_count++;
if (flags & PIN_MAPPABLE)
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index cd9b932..7cad10f 100644
--- a/drivers/g

[Intel-gfx] [PATCH 03/56] drm/i915: Prevent signals from interrupting close()

2014-05-09 Thread Ben Widawsky
From: Chris Wilson 

We do not report any unfinished operations when releasing GEM objects
associated with the file, and even if we did, it would be bad form to
report -EINTR from a close().

The root cause of the bug that first showed itself during close is that
we do not do proper live tracking of vmas and contexts under full-ppgtt,
but this is a useful piece of defensive programming enforcing our
userspace API contract.

Cc: Ben Widawsky 
Signed-off-by: Chris Wilson 
Reviewed-by: Ben Widawsky 
---
 drivers/gpu/drm/i915/i915_dma.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index d10ddcc..54a08a9 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1921,9 +1921,18 @@ void i915_driver_lastclose(struct drm_device * dev)
 
 void i915_driver_preclose(struct drm_device * dev, struct drm_file *file_priv)
 {
+   struct drm_i915_private *dev_priv = to_i915(dev);
+   bool was_interruptible;
+
mutex_lock(&dev->struct_mutex);
+   was_interruptible = dev_priv->mm.interruptible;
+   WARN_ON(!was_interruptible);
+   dev_priv->mm.interruptible = false;
+
i915_gem_context_close(dev, file_priv);
i915_gem_release(dev, file_priv);
+
+   dev_priv->mm.interruptible = was_interruptible;
mutex_unlock(&dev->struct_mutex);
 }
 
-- 
1.9.2



[Intel-gfx] [PATCH 01/56] drm/i915: Fix flush before context switch comment

2014-05-09 Thread Ben Widawsky
Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/i915/i915_gem_context.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
b/drivers/gpu/drm/i915/i915_gem_context.c
index 6e2145b..29dd825 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -553,9 +553,7 @@ mi_set_context(struct intel_ring_buffer *ring,
int ret;
 
/* w/a: If Flush TLB Invalidation Mode is enabled, driver must do a TLB
-* invalidation prior to MI_SET_CONTEXT. On GEN6 we don't set the value
-* explicitly, so we rely on the value at ring init, stored in
-* itlb_before_ctx_switch.
+* invalidation prior to MI_SET_CONTEXT.
 */
if (IS_GEN6(ring->dev)) {
ret = ring->flush(ring, I915_GEM_GPU_DOMAINS, 0);
-- 
1.9.2



[Intel-gfx] [PATCH 02/56] Revert "drm/i915: Drop I915_PARAM_HAS_FULL_PPGTT again"

2014-05-09 Thread Ben Widawsky
This reverts commit 7d9c477966e739a52d4c9655149958a2671ef376.

Conflicts:
drivers/gpu/drm/i915/i915_dma.c
include/uapi/drm/i915_drm.h
---
 drivers/gpu/drm/i915/i915_dma.c | 5 -
 include/uapi/drm/i915_drm.h | 1 +
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index d02c8de..d10ddcc 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -994,7 +994,7 @@ static int i915_getparam(struct drm_device *dev, void *data,
value = HAS_WT(dev);
break;
case I915_PARAM_HAS_ALIASING_PPGTT:
-   value = dev_priv->mm.aliasing_ppgtt || USES_FULL_PPGTT(dev);
+   value = dev_priv->mm.aliasing_ppgtt ? 1 : 0;
break;
case I915_PARAM_HAS_WAIT_TIMEOUT:
value = 1;
@@ -1020,6 +1020,9 @@ static int i915_getparam(struct drm_device *dev, void 
*data,
case I915_PARAM_CMD_PARSER_VERSION:
value = i915_cmd_parser_get_version();
break;
+   case I915_PARAM_HAS_FULL_PPGTT:
+   value = USES_FULL_PPGTT(dev);
+   break;
default:
DRM_DEBUG("Unknown parameter %d\n", param->param);
return -EINVAL;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 8a3e4ef00..6306a84 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -338,6 +338,7 @@ typedef struct drm_i915_irq_wait {
 #define I915_PARAM_HAS_EXEC_HANDLE_LUT   26
 #define I915_PARAM_HAS_WT   27
 #define I915_PARAM_CMD_PARSER_VERSION   28
+#define I915_PARAM_HAS_FULL_PPGTT   29
 
 typedef struct drm_i915_getparam {
int param;
-- 
1.9.2



[Intel-gfx] [PATCH 00/56] [RFCish] Dynamic page table alloc, 64b, and GPU/CPU mirror

2014-05-09 Thread Ben Widawsky
Just as before, these patches are living based off of my Broadwell
branch, here:
http://cgit.freedesktop.org/~bwidawsk/drm-intel/log/?h=gpu_mirror

This is the follow-on patches for [1]

This patch series brings 3 things:
1. Dynamic page table allocation for gen6-8
2. 64b (48b canonical) graphics virtual address space for Broadwell
3. An interface to specify a specific offset for a BO.

It's taken way longer than I thought to get this work done, and given
the current state of our driver, I fear I may not have time to see this
through to the end before I am pulled onto other things. If people want
to send me smallish bugfixes, I will gladly do my best to fix them
quickly. If there are more substantial change requests wrt design or
patch reorganization, I will not be able to accommodate. Someone else
must take over this patch series at that point if they want these
features. I do believe that everything up until the userptr patch is in
decent shape though, so we'll see, I guess. (If you are qualified to
take this over and have interest, please let me know.)

The patch series is highly volatile and not manicured. I've run exactly
1 test on the GPU mirror (see below for what that means), though many
more on the prior stuff. The series depends on full PPGTT, which is not
yet enabled by default, and has a few outstanding issues. It also has
been developed exclusively on pre-production hardware. I am only sending
out now because I will be on vacation for the next 10 days, and I know
there are people that can benefit from this code before I return. With
that, I got the last parts of this working very recently, and they're
very hackish. The reason for this lack of refinement is I expect the
interfaces for letting userspace dictate things to change (more on this
later), and the other part is I just ran out of time before my vacation.
Throughout development, I've been hitting issues which I am not yet sure
if they are bugs in my code, bugs in full PPGTT, bugs in userptr, or
generally flakiness. There are a few patches in here which say TESTME
reflecting upon this. Also, if you want to run this, I highly recommend
turning off semaphores, and rc6. (To be honest, I've not tried it
recently). You also need to turn on PPGTT since it is disabled by
default.

modprobe i915 enable_ppgtt=2 semaphores=0 enable_rc6=0

What you get in this series is what I'm going to coin, GPU mirror. This
patch series allows one to allocate an arbitrary address for your GPU
buffer object, and map it to a specific space within the GPUs address
space. This is only possible because on Broadwell we get a 64b canonical
GPU address space, and this allows us to map any CPU address as a GPU
address. The obvious usage here is malloc(). malloc() returns a pointer
that is valid on the CPU. Now that address can be identical on the GPU.

The interface provided is identical to the userptr interface previously
posted by Chris Wilson. I've added a flag to that interface that
indicates this new functionality. This is not necessarily the final
version, and it's arguably not the best idea either. The reason for this
choice is we had users of userptr that wanted to try out this concept
and not have to do much porting.
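
As a rough illustration (only the userptr struct fields below are from
Chris' interface; the mirror flag name is invented for the example),
userspace would do something like:

        void *ptr;
        posix_memalign(&ptr, 4096, size);       /* page-aligned CPU address */
        struct drm_i915_gem_userptr arg = {
                .user_ptr  = (uintptr_t)ptr,
                .user_size = size,
                .flags     = I915_USERPTR_GPU_MIRROR,   /* hypothetical flag */
        };
        drmIoctl(fd, DRM_IOCTL_I915_GEM_USERPTR, &arg);
        /* arg.handle now names a BO whose GPU address matches ptr */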

To get to the userptr interface, I had to make a few things happen
first. I needed to get dynamic page table allocation and teardown
working. This was posted previously for gen6-7 [1] (with very rough code
for gen8). I've now added more robust support for gen8 dynamic page
table allocations. Doing the allocations dynamically was important
because preallocating all 4 levels of page tables is not feasible in a
real system. 4 level page tables are required in order to be able to
support the 64b canonical address space.

With that all done, I was able to make a few minor hacks to userptr,
take the intel-gpu-tools test from Tvrtko, and see at least one pass.
FWIW, I am currently running,
./tests/gem_userptr_blits --run-subtest coherency-unsync

Since I feel the interface will likely change, I do not feel compelled
to post either my libdrm or my IGT changes. If you want the modified
test, let me know, as I don't think it's really relevant here.

One last thing. Intel GPU tools, as it stands today, makes a lot of
assumptions about using an address space > 32b. I have not had time to
fix this. It is something which needs fixing before this series could
even be considered testable.

[1] http://lists.freedesktop.org/archives/intel-gfx/2014-March/041814.html

Ben Widawsky (54):
  drm/i915: Fix flush before context switch comment
  Revert "drm/i915: Drop I915_PARAM_HAS_FULL_PPGTT again"
  drm/i915: Wrap VMA binding
  drm/i915: Make pin global flags explicit
  drm/i915: Split out aliasing binds
  drm/i915: fix gtt_total_entries()
  drm/i915: Rename to GEN8_LEGACY_PDPES
  drm/i915: Split out verbose PPGTT dumping
  drm/i915: s/pd/pdpe, s/pt/pde
  drm/i915: rename map/unmap to dma_map/unmap
  drm/i915: Setup less PPGTT on failed pagedir
  drm/i915: clean up PPG

Re: [Intel-gfx] 82845G/GL[Brookdale-G]/GE support for more resolution and external display

2014-05-09 Thread Felix Miata

On 2014-05-09 16:25 (GMT-0400) Brandon composed:


I have an old laptop with on board 82845G/GL[Brookdale-G]/GE Chipset Integrated 
Graphics Device (rev 03) which only has 640x480 available using the 
xf86-video-intel v2.99.911 driver with xorg-server 1.15.1. Additionally, I 
cannot use an external monitor attached to the onboard VGA out which I know is 
good since I can use it under another OS. Is this old chipset still supported?


The 845G has consistently been a thorn in developers' and users' sides since it 
was new, but it still works here in post-14.x versions from Fedora, Mandriva and 
openSUSE at least. Here's a log from a freshly updated Rawhide system on a 
1600x1200 CRT:

http://fm.no-ip.com/Tmp/Linux/Xorg/xorg.0.log-gx260-i845G-fc21-1405

Note that it's not from a laptop, so it only has one real video output. And it 
is configured via xorg.conf, not automagic.

--
"The wise are known for their understanding, and pleasant
words are persuasive." Proverbs 16:21 (New Living Translation)

 Team OS/2 ** Reg. Linux User #211409 ** a11y rocks!

Felix Miata  ***  http://fm.no-ip.com/


Re: [Intel-gfx] [RFC] drm/i915: Add variable gem object size support to i915

2014-05-09 Thread Volkin, Bradley D
On Mon, Apr 28, 2014 at 08:01:29AM -0700, arun.siluv...@linux.intel.com wrote:
> From: "Siluvery, Arun" 
> 
> This patch adds support to have gem objects of variable size.
> The size of the gem object obj->size is always constant and this fact
> is tightly coupled in the driver; this implementation allows one to vary
> its effective size using an interface similar to fallocate().
> 
> A new ioctl() is introduced to mark a range as scratch/usable.
> Once marked as scratch, associated backing store is released and the
> region is filled with scratch pages. The region can also be unmarked
> at a later point in which case new backing pages are created.
> The range can be anywhere within the object space, it can have multiple
> ranges possibly overlapping forming a large contiguous range.
> 
> There is only a single scratch page, and the kernel allows writes to this
> page; userspace needs to keep track of scratch page ranges, otherwise any
> subsequent writes to these pages will overwrite previous content.
> 
> This feature is useful where the exact size of the object is not clear
> at the time of its creation, in such case we usually create an object
> with more than the required size but end up using it partially.
> In devices where there are tight memory constraints it would be useful
> to release that additional space which is currently unused. Using this
> interface the region can be simply marked as scratch which releases
> its backing store thus reducing the memory pressure on the kernel.
> 
> Many thanks to Daniel, ChrisW, Tvrtko, Bob for the idea and feedback
> on this implementation.
> 
> v2: fix holes in error handling and use consistent data types (Tvrtko)
>  - If page allocation fails simply return error; do not try to invoke
>shrinker to free backing store.
>  - Release new pages created by us in case of error during page allocation
>or sg_table update.
>  - Use 64-bit data types for start and length values to avoid truncation.
> 
> Change-Id: Id3339be95dbb6b5c69c39d751986c40ec0ccdaf8
> Signed-off-by: Siluvery, Arun 
> ---
> 
> Please let me know if I need to submit this as PATCH instead of RFC.
> Since this is RFC I have included all changes as a single patch.

Hi Arun,

For a change of this size, one patch seems fine to me. I think RFC vs.
PATCH is more a comment of what state you think the patch is in and
what level of feedback you're looking for.
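
For reference, a rough sketch of how userspace might drive the new
ioctl (struct and field names below are guesses from the helper's
signature; the full i915_drm.h additions aren't visible here):

        struct drm_i915_gem_fallocate arg = {   /* hypothetical uapi struct */
                .handle = handle,       /* GEM handle of the object */
                .start  = start,        /* byte offset of the range */
                .length = length,       /* extent of the range */
                .mode   = 1,            /* hypothetical: 1 = mark scratch */
        };
        drmIoctl(fd, DRM_IOCTL_I915_GEM_FALLOCATE, &arg);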

> 
>  drivers/gpu/drm/i915/i915_dma.c |   1 +
>  drivers/gpu/drm/i915/i915_drv.h |   2 +
>  drivers/gpu/drm/i915/i915_gem.c | 205 
> 
>  include/uapi/drm/i915_drm.h |  31 ++
>  4 files changed, 239 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index 31c499f..3dd4b1a 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -2000,6 +2000,7 @@ const struct drm_ioctl_desc i915_ioctls[] = {
>   DRM_IOCTL_DEF_DRV(I915_GET_RESET_STATS, i915_get_reset_stats_ioctl, 
> DRM_UNLOCKED|DRM_RENDER_ALLOW),
>   DRM_IOCTL_DEF_DRV(I915_GEM_USERPTR, i915_gem_userptr_ioctl, \
>   DRM_UNLOCKED|DRM_RENDER_ALLOW),
> + DRM_IOCTL_DEF_DRV(I915_GEM_FALLOCATE, i915_gem_fallocate_ioctl, 
> DRM_UNLOCKED|DRM_RENDER_ALLOW),
>   DRM_IOCTL_DEF_DRV(I915_SET_PLANE_180_ROTATION, \
>   i915_set_plane_180_rotation, DRM_AUTH | DRM_UNLOCKED),
>   DRM_IOCTL_DEF_DRV(I915_ENABLE_PLANE_RESERVED_REG_BIT_2,
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 4069800..1f30fb6 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2210,6 +2210,8 @@ int i915_gem_get_tiling(struct drm_device *dev, void 
> *data,
>  int i915_gem_init_userptr(struct drm_device *dev);
>  int i915_gem_userptr_ioctl(struct drm_device *dev, void *data,
>   struct drm_file *file);
> +int i915_gem_fallocate_ioctl(struct drm_device *dev, void *data,
> + struct drm_file *file);
>  int i915_gem_get_aperture_ioctl(struct drm_device *dev, void *data,
>   struct drm_file *file_priv);
>  int i915_gem_wait_ioctl(struct drm_device *dev, void *data,
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 6153e01..a0188ee 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -317,6 +317,211 @@ i915_gem_create_ioctl(struct drm_device *dev, void 
> *data,
>  args->size, &args->handle);
>  }
>  
> +static int i915_gem_obj_fallocate(struct drm_i915_gem_object *obj,
> +   bool mark_scratch, uint64_t start,
> +   uint64_t length)
> +{
> + int i, j;
> + int ret;
> + uint32_t start_page, end_page;
> + uint32_t page_count;
> + gfp_t gfp;
> + bool update_sg_table = false;
> + unsigned long scratch_pfn

Re: [Intel-gfx] [3.14.0-rc4] regression: drm FIFO underruns

2014-05-09 Thread Dave Airlie
On 10 May 2014 03:03, Ville Syrjälä  wrote:
> On Fri, May 09, 2014 at 05:14:38PM +0100, Damien Lespiau wrote:
>> On Fri, May 09, 2014 at 06:11:37PM +0200, Jörg Otte wrote:
>> > > Jörg, can you please boot with drm.debug=0xe, reproduce the issue and
>> > > then attach the complete dmesg? Please make sure that the dmesg
>> > > contains the boot-up stuff too.
>> > >
>> > > Thanks, Daniel
>> > Here it is. I should mention it only happens at boot-up.
>>
>> [0.374095] [drm] Wrong MCH_SSKPD value: 0x20100406
>> [0.374096] [drm] This can cause pipe underruns and display issues.
>> [0.374097] [drm] Please upgrade your BIOS to fix this.
>
> That can be a factor, but I think we may have some more general issue
> in the modeset sequence which causes these to get reported. I'm getting
> some on my machine as well where SSKPD looks more sane. Maybe we turn on
> the error reporting too early or something.
>
> But I'm not going to spend time worrying about these before my previous
> watermark stuff gets merged. Also the underrun reporting code itself
> would need some kind of rewrite to be really useful.
>
> If the display doesn't blank out during use everything is more or less
> fine and you can ignore these errors. It's quite likely that the
> errors were always present and you didn't know it. We just made them
> more prominent recently.

Please don't make things more prominent if the fixes can't be merged
without rewriting the world,

Distros have auto reporting tools for the major backtrace warnings,
and releasing kernels with unfixable ones in it make it hard to know
what is real and what isn't.

Dave.


[Intel-gfx] 82845G/GL[Brookdale-G]/GE support for more resolution and external display

2014-05-09 Thread Brandon
I have an old laptop with on board 82845G/GL[Brookdale-G]/GE Chipset Integrated 
Graphics Device (rev 03) which only has 640x480 available using the 
xf86-video-intel v2.99.911 driver with xorg-server 1.15.1.  Additionally, I 
cannot use an external monitor attached to the onboard VGA out which I know is 
good since I can use it under another OS.  Is this old chipset still supported?

I attached Xorg.0.log.  Thank you for any advice I can use to get the full set 
of resolutions since running in 640x480 is not workable.

% xrandr
Screen 0: minimum 8 x 8, current 640 x 480, maximum 32767 x 32767
LVDS1 connected 640x480+0+0 (normal left inverted right x axis y axis) 0mm x 0mm
   640x480       60.00*+  59.94  
VGA1 unknown connection (normal left inverted right x axis y axis)
   1024x768      60.00  
   800x600       60.32  
   640x480       59.94  
VIRTUAL1 disconnected (normal left inverted right x axis y axis)

% xrandr --output LVDS1 --auto --output VGA1 --auto
xrandr: cannot find crtc for output VGA1

% xrandr --output LVDS1 --mode 640x480 --output VGA1 --mode 1024x768
xrandr: cannot find crtc for output VGA1
[   417.374] 
X.Org X Server 1.15.1
Release Date: 2014-04-13
[   417.374] X Protocol Version 11, Revision 0
[   417.374] Build Operating System: Linux 3.14.0-4-ARCH i686 
[   417.374] Current Operating System: Linux ramshackle 3.10.39-1-lts #1 SMP Tue May 6 15:42:03 UTC 2014 i686
[   417.374] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-linux-lts root=UUID=235a1c8d-ac77-4491-b0a7-8749dd3d58d1 rw
[   417.375] Build Date: 14 April 2014  08:41:02AM
[   417.375]  
[   417.375] Current version of pixman: 0.32.4
[   417.375] 	Before reporting problems, check http://wiki.x.org
	to make sure that you have the latest version.
[   417.375] Markers: (--) probed, (**) from config file, (==) default setting,
	(++) from command line, (!!) notice, (II) informational,
	(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
[   417.375] (==) Log file: "/var/log/Xorg.0.log", Time: Fri May  9 16:01:16 2014
[   417.376] (==) Using config directory: "/etc/X11/xorg.conf.d"
[   417.376] (==) Using system config directory "/usr/share/X11/xorg.conf.d"
[   417.377] (==) No Layout section.  Using the first Screen section.
[   417.377] (==) No screen section available. Using defaults.
[   417.377] (**) |-->Screen "Default Screen Section" (0)
[   417.377] (**) |   |-->Monitor ""
[   417.377] (==) No monitor specified for screen "Default Screen Section".
	Using a default monitor configuration.
[   417.377] (==) Automatically adding devices
[   417.377] (==) Automatically enabling devices
[   417.377] (==) Automatically adding GPU devices
[   417.378] (WW) The directory "/usr/share/fonts/OTF/" does not exist.
[   417.378] 	Entry deleted from font path.
[   417.378] (WW) The directory "/usr/share/fonts/Type1/" does not exist.
[   417.378] 	Entry deleted from font path.
[   417.378] (WW) `fonts.dir' not found (or not valid) in "/usr/share/fonts/100dpi/".
[   417.378] 	Entry deleted from font path.
[   417.378] 	(Run 'mkfontdir' on "/usr/share/fonts/100dpi/").
[   417.378] (WW) `fonts.dir' not found (or not valid) in "/usr/share/fonts/75dpi/".
[   417.378] 	Entry deleted from font path.
[   417.378] 	(Run 'mkfontdir' on "/usr/share/fonts/75dpi/").
[   417.378] (==) FontPath set to:
	/usr/share/fonts/misc/,
	/usr/share/fonts/TTF/
[   417.378] (==) ModulePath set to "/usr/lib/xorg/modules"
[   417.378] (II) The server relies on udev to provide the list of input devices.
	If no devices become available, reconfigure udev or disable AutoAddDevices.
[   417.378] (II) Loader magic: 0x8264660
[   417.378] (II) Module ABI versions:
[   417.378] 	X.Org ANSI C Emulation: 0.4
[   417.378] 	X.Org Video Driver: 15.0
[   417.378] 	X.Org XInput driver : 20.0
[   417.378] 	X.Org Server Extension : 8.0
[   417.379] (II) xfree86: Adding drm device (/dev/dri/card0)
[   417.384] (--) PCI:*(0:0:2:0) 8086:2562:1028:0149 rev 3, Mem @ 0xe000/134217728, 0xf6f8/524288
[   417.384] (WW) Open ACPI failed (/var/run/acpid.socket) (No such file or directory)
[   417.384] Initializing built-in extension Generic Event Extension
[   417.384] Initializing built-in extension SHAPE
[   417.384] Initializing built-in extension MIT-SHM
[   417.384] Initializing built-in extension XInputExtension
[   417.384] Initializing built-in extension XTEST
[   417.384] Initializing built-in extension BIG-REQUESTS
[   417.385] Initializing built-in extension SYNC
[   417.385] Initializing built-in extension XKEYBOARD
[   417.385] Initializing built-in extension XC-MISC
[   417.385] Initializing built-in extension SECURITY
[   417.385] Initializing built-in extension XINERAMA
[   417.385] Initializing built-in extension XFIXES
[   417.385] Initializing built-in extension RENDER
[   417.385] Initializing built-in extension RANDR
[   417.385] Initializing built-in extension COMPOSITE
[   417.385] Initializing built-in extension DAMAGE
[   417.385] Initializing built-in ex

[Intel-gfx] [PATCH] i915: Add module option to support VGA arbiter on HD devices

2014-05-09 Thread Alex Williamson
Commit 81b5c7bc found that the current VGA arbiter support in i915
only works for ancient GMCH-based IGD devices and attempted to update
support for newer HD devices.  Unfortunately newer devices cannot
completely opt-out of VGA arbitration like the old devices could.
The VGA I/O space cannot be disabled internally.  The only way to
route VGA I/O elsewhere is by disabling I/O at the device PCI command
register.  This means that with commit 81b5c7bc and multiple VGA
adapters, the VGA arbiter will report that multiple VGA devices are
participating in arbitration, Xorg will notice this and disable DRI.
Therefore, 81b5c7bc was reverted because DRI is more important than
being correct.

There is however an actual need for i915 to correctly participate in
VGA arbitration; VGA device assignment.  If we want to use VFIO to
assign a VGA device to a virtual machine, we need to be able to
access the VGA resources of that device.  By adding an i915 module
option we can allow i915 to continue with its charade by default, but
also allow an easy path for users who require working VGA arbitration.
Hopefully Xorg can someday be taught to behave better with multiple
VGA devices.
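
With the parameter added below, opting in would look like:

        modprobe i915 enable_hd_vgaarb=1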

This also rolls in reverted commit 6e1b4fda, which corrected an
ordering issue with 81b5c7bc by delaying the disabling of VGA memory
until after vgacon->fbcon handoff.

Signed-off-by: Alex Williamson 
Cc: Ville Syrjälä 
Cc: Daniel Vetter 
Cc: Dave Airlie 
---

This should be a nop with the default module setting, so if there's
any opportunity to get this into v3.15, it would be appreciated.

 drivers/gpu/drm/i915/i915_dma.c  |   22 +++---
 drivers/gpu/drm/i915/i915_drv.h  |1 +
 drivers/gpu/drm/i915/i915_params.c   |5 +
 drivers/gpu/drm/i915/intel_display.c |   30 ++
 drivers/gpu/drm/i915/intel_drv.h |2 ++
 include/linux/vgaarb.h   |7 +++
 6 files changed, 64 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 96177ee..c0d0c03 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1306,10 +1306,20 @@ static int i915_load_modeset_init(struct drm_device 
*dev)
 * If we are a secondary display controller (!PCI_DISPLAY_CLASS_VGA),
 * then we do not take part in VGA arbitration and the
 * vga_client_register() fails with -ENODEV.
+*
+* NB.  The set_decode callback here actually only works on GMCH
+* devices, on newer HD devices we can only disable VGA MMIO space.
+* Disabling VGA I/O space requires disabling I/O in the PCI command
+* register.  Nonetheless, we like to pretend that we participate in
+* VGA arbitration and can dynamically disable VGA I/O space because
+* this makes X happy, even though it's a complete lie.
 */
-   ret = vga_client_register(dev->pdev, dev, NULL, i915_vga_set_decode);
-   if (ret && ret != -ENODEV)
-   goto out;
+   if (!i915.enable_hd_vgaarb || !HAS_PCH_SPLIT(dev)) {
+   ret = vga_client_register(dev->pdev, dev, NULL,
+ i915_vga_set_decode);
+   if (ret && ret != -ENODEV)
+   goto out;
+   }
 
intel_register_dsm_handler();
 
@@ -1369,6 +1379,12 @@ static int i915_load_modeset_init(struct drm_device *dev)
 */
intel_fbdev_initial_config(dev);
 
+   /*
+* Must do this after fbcon init so that
+* vgacon_save_screen() works during the handover.
+*/
+   i915_disable_vga_mem(dev);
+
/* Only enable hotplug handling once the fbdev is fully set up. */
dev_priv->enable_hotplug_processing = true;
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index ec82f6b..f3908f6 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2080,6 +2080,7 @@ struct i915_params {
bool prefault_disable;
bool reset;
bool disable_display;
+   bool enable_hd_vgaarb;
 };
 extern struct i915_params i915 __read_mostly;
 
diff --git a/drivers/gpu/drm/i915/i915_params.c 
b/drivers/gpu/drm/i915/i915_params.c
index d1d7980..64d96c6 100644
--- a/drivers/gpu/drm/i915/i915_params.c
+++ b/drivers/gpu/drm/i915/i915_params.c
@@ -47,6 +47,7 @@ struct i915_params i915 __read_mostly = {
.invert_brightness = 0,
.disable_display = 0,
.enable_cmd_parser = 0,
+   .enable_hd_vgaarb = false,
 };
 
 module_param_named(modeset, i915.modeset, int, 0400);
@@ -152,3 +153,7 @@ MODULE_PARM_DESC(disable_display, "Disable display 
(default: false)");
 module_param_named(enable_cmd_parser, i915.enable_cmd_parser, int, 0600);
 MODULE_PARM_DESC(enable_cmd_parser,
 "Enable command parsing (1=enabled, 0=disabled [default])");
+
+module_param_named(enable_hd_vgaarb, i915.enable_hd_vgaarb, bool, 0444);
+MODULE_PA

[Intel-gfx] Problems with 82845G/GL[Brookdale-G]/GE

2014-05-09 Thread Brandon Stone
I have a very old laptop with 



Re: [Intel-gfx] [PATCH] drm/i915/vlv: reset VLV media force wake request register

2014-05-09 Thread Jani Nikula
On Fri, 09 May 2014, Darren Hart  wrote:
> On 5/9/14, 5:41, "Jani Nikula"  wrote:
>
>>Media force wake get hangs the machine when the system is booted without
>>displays attached. The assumption is that (at least some versions of)
>>the firmware has skipped some initialization in that case.
>>
>>Empirical evidence suggests we need to reset the media force wake
>>request register in addition to the render one to avoid hangs.
>>
>>Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75895
>>Reported-by: Imre Deak 
>>Reported-by: Darren Hart 
>>Cc: sta...@vger.kernel.org
>>Signed-off-by: Jani Nikula 
>
> Applied to 3.14.2 and tested on MinnowBoardMax A0 hardware (BayTrail-I,
> Atom E3825).
>
> * With no display connected, the boot no longer hangs and DRM prints
> sensible messages during boot:
> [5.968837] [drm] GMBUS [i915 gmbus vga] timed out, falling back to bit
> banging on pin 2
> [5.988037] i915 :00:02.0: No connectors reported connected with
> modes
> [5.995744] [drm] Cannot find any crtc or sizes - going 1024x768
> [6.004716] fbcon: inteldrmfb (fb0) is primary device
> [6.013066] Console: switching to colour frame buffer device 128x48
> [6.034147] i915 :00:02.0: fb0: inteldrmfb frame buffer device
> [6.041168] i915 :00:02.0: registered panic notifier
> [6.049820] ACPI: Video Device [GFX0] (multi-head: yes  rom: no  post:
> no)
> [6.058788] acpi device:30: registered as cooling_device3
> [6.065111] input: Video Bus as
> /devices/LNXSYSTM:00/device:00/PNP0A08:00/LNXVIDEO:00/input/input4
> [6.075370] [drm] Initialized i915 1.6.0 20080730 for :00:02.0 on
> minor 0
>
>
> * If a display is subsequently connected and X is restarted, it behaves as
> expected.
>
> * Booting with a display connected continue to work as expected.
>
> Tested-by: Darren Hart 
>
> Thank you very much Jani!

Pushed to -fixes. Thanks for testing, Darren, and thanks for review,
Mika.

BR,
Jani.



>
> -- 
> Darren Hart   Open Source Technology Center
> darren.h...@intel.com Intel Corporation
>
>
>

-- 
Jani Nikula, Intel Open Source Technology Center


Re: [Intel-gfx] [PATCH] drm/i915/vlv: reset VLV media force wake request register

2014-05-09 Thread Darren Hart
On 5/9/14, 5:41, "Jani Nikula"  wrote:

>Media force wake get hangs the machine when the system is booted without
>displays attached. The assumption is that (at least some versions of)
>the firmware has skipped some initialization in that case.
>
>Empirical evidence suggests we need to reset the media force wake
>request register in addition to the render one to avoid hangs.
>
>Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75895
>Reported-by: Imre Deak 
>Reported-by: Darren Hart 
>Cc: sta...@vger.kernel.org
>Signed-off-by: Jani Nikula 

Applied to 3.14.2 and tested on MinnowBoardMax A0 hardware (BayTrail-I,
Atom E3825).

* With no display connected, the boot no longer hangs and DRM prints
sensible messages during boot:
[5.968837] [drm] GMBUS [i915 gmbus vga] timed out, falling back to bit
banging on pin 2
[5.988037] i915 :00:02.0: No connectors reported connected with
modes
[5.995744] [drm] Cannot find any crtc or sizes - going 1024x768
[6.004716] fbcon: inteldrmfb (fb0) is primary device
[6.013066] Console: switching to colour frame buffer device 128x48
[6.034147] i915 :00:02.0: fb0: inteldrmfb frame buffer device
[6.041168] i915 :00:02.0: registered panic notifier
[6.049820] ACPI: Video Device [GFX0] (multi-head: yes  rom: no  post:
no)
[6.058788] acpi device:30: registered as cooling_device3
[6.065111] input: Video Bus as
/devices/LNXSYSTM:00/device:00/PNP0A08:00/LNXVIDEO:00/input/input4
[6.075370] [drm] Initialized i915 1.6.0 20080730 for :00:02.0 on
minor 0


* If a display is subsequently connected and X is restarted, it behaves as
expected.

* Booting with a display connected continue to work as expected.

Tested-by: Darren Hart 

Thank you very much Jani!

-- 
Darren Hart Open Source Technology Center
darren.h...@intel.com   Intel Corporation





Re: [Intel-gfx] [PATCH v2] drm/i915: Increase WM memory latency values on SNB

2014-05-09 Thread Ville Syrjälä
On Fri, May 09, 2014 at 05:46:43PM +, Robert Navarro wrote:
> Ville Syrjälä  linux.intel.com> writes:
> 
> > I think it should apply to 3.13+. If not directly then with a bit of
> > manual frobbery. Which reminds me that we should perhaps slap a cc
> > stable on it to get it included in 3.13+. For older kernels the patch
> > would have to look totally different, so I'm not going to bother
> > about those.
> > 
> 
> Sounds good, one more question. Does this replace the previous patch or are 
> they used together?

Replace.

-- 
Ville Syrjälä
Intel OTC


Re: [Intel-gfx] [PATCH v2] drm/i915: Increase WM memory latency values on SNB

2014-05-09 Thread Robert Navarro
Ville Syrjälä  linux.intel.com> writes:

> I think it should apply to 3.13+. If not directly then with a bit of
> manual frobbery. Which reminds me that we should perhaps slap a cc
> stable on it to get it included in 3.13+. For older kernels the patch
> would have to look totally different, so I'm not going to bother
> about those.
> 

Sounds good, one more question. Does this replace the previous patch or are 
they used together?






Re: [Intel-gfx] [PATCH v3] drm/i915: Replaced Blitter ring based flips with MMIO flips for VLV

2014-05-09 Thread Ville Syrjälä
On Fri, May 09, 2014 at 02:59:42PM +0300, Ville Syrjälä wrote:
> On Sun, Mar 23, 2014 at 02:31:05PM +0530, sourab.gu...@intel.com wrote:
> > +   intel_do_mmio_flip(dev, crtc);
> > +   mmio_flip_data->seqno = 0;
> > +   ring->irq_put(ring);
> > +   }
> > +   }
> > +
> > +   spin_unlock_irqrestore(&dev_priv->mmio_flip_lock, irq_flags);
> > +}
> > +
> > +/* Using MMIO based flips starting from VLV, for Media power well
> > + * residency optimization. The other alternative of having Render
> > + * ring based flip calls is not being used, as the performance
> > + * (FPS) of certain 3D Apps was getting severely affected.
> > + */
> > +static int intel_gen7_queue_mmio_flip(struct drm_device *dev,
> > +   struct drm_crtc *crtc,
> > +   struct drm_framebuffer *fb,
> > +   struct drm_i915_gem_object *obj,
> > +   uint32_t flags)
> 
> There's nothing gen7 specific here. So you could just rename the
> function to eg. intel_queue_mmio_flip(). Maybe also move the
> comment about VLV to where you set up the function pointer.

Actually this code isn't entirely gen agnostic. It should work on gen5+
since all of those have a flip done interrupt. For older platforms we
use some clever tricks involving the flip_pending status bits and vblank
irqs. That code won't work for mmio flips. We'd need to add another way
to complete the flips based. That would involve using the frame counter
to make it accurate. To avoid races there we'd definitely need to use
the vblank evade mechanism to make sure we sample the frame counter
within the same frame as when we write the registers. Also gen2 has
the extra complication that it lacks a hardware frame counter.

So I think we can start off with limiting this to gen5+, and later we
can extend it to cover the older platforms since we anyway need to do
that work to get the nuclear flips working.
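
A minimal sketch of that gen5+ gating (assuming the driver's existing
queue_flip hook and the rename suggested above):

        /* only take the MMIO flip path where a flip-done interrupt exists */
        if (INTEL_INFO(dev)->gen >= 5)
                dev_priv->display.queue_flip = intel_queue_mmio_flip;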

BTW I gave this code a whirl on my IVB and everything seems to work
fine.

-- 
Ville Syrjälä
Intel OTC


Re: [Intel-gfx] [3.14.0-rc4] regression: drm FIFO underruns

2014-05-09 Thread Ville Syrjälä
On Fri, May 09, 2014 at 05:14:38PM +0100, Damien Lespiau wrote:
> On Fri, May 09, 2014 at 06:11:37PM +0200, Jörg Otte wrote:
> > > Jörg, can you please boot with drm.debug=0xe, reproduce the issue and
> > > then attach the complete dmesg? Please make sure that the dmesg
> > > contains the boot-up stuff too.
> > >
> > > Thanks, Daniel
> > Here it is. I should mention it only happens at boot-up.
> 
> [0.374095] [drm] Wrong MCH_SSKPD value: 0x20100406
> [0.374096] [drm] This can cause pipe underruns and display issues.
> [0.374097] [drm] Please upgrade your BIOS to fix this.

That can be a factor, but I think we may have some more general issue
in the modeset sequence which causes these to get reported. I'm getting
some on my machine as well where SSKPD looks more sane. Maybe we turn on
the error reporting too early or something.

But I'm not going to spend time worrying about these before my previous
watermark stuff gets merged. Also the underrun reporting code itself
would need some kind of rewrite to be really useful.

If the display doesn't blank out during use everything is more or less
fine and you can ignore these errors. It's quite likely that the
errors were always present and you didn't know it. We just made them
more prominent recently.

-- 
Ville Syrjälä
Intel OTC


Re: [Intel-gfx] [3.14.0-rc4] regression: drm FIFO underruns

2014-05-09 Thread Damien Lespiau
On Fri, May 09, 2014 at 06:11:37PM +0200, Jörg Otte wrote:
> > Jörg, can you please boot with drm.debug=0xe, reproduce the issue and
> > then attach the complete dmesg? Please make sure that the dmesg
> > contains the boot-up stuff too.
> >
> > Thanks, Daniel
> Here it is. I should mention it only happens at boot-up.

[0.374095] [drm] Wrong MCH_SSKPD value: 0x20100406
[0.374096] [drm] This can cause pipe underruns and display issues.
[0.374097] [drm] Please upgrade your BIOS to fix this.

-- 
Damien


Re: [Intel-gfx] [PATCH v2] drm/i915: Increase WM memory latency values on SNB

2014-05-09 Thread Ville Syrjälä
On Fri, May 09, 2014 at 03:23:41PM +, Robert Navarro wrote:
> Thanks for this Ville.
> 
> Should this apply to 3.14 and 3.15?
> 
> I'll try it on 3.15 first and report back.

I think it should apply to 3.13+. If not directly then with a bit of
manual frobbery. Which reminds me that we should perhaps slap a cc
stable on it to get it included in 3.13+. For older kernels the patch
would have to look totally different, so I'm not going to bother
about those.

-- 
Ville Syrjälä
Intel OTC


Re: [Intel-gfx] [PATCH v2] drm/i915: Increase WM memory latency values on SNB

2014-05-09 Thread Robert Navarro
Thanks for this Ville.

Should this apply to 3.14 and 3.15?

I'll try it on 3.15 first and report back.



Re: [Intel-gfx] [PATCH] tools/null_state_gen: generate null render state

2014-05-09 Thread Damien Lespiau
On Tue, May 06, 2014 at 02:47:40PM +0100, Chris Wilson wrote:
> Why does this work? It is neither the most minimal batch, nor the
> maximal. Which state is truly required? It looks like cargo-culted
> Chinese.

I'll have to echo this. It's really not obvious why this is needed.
If you look at the render engine power context for instance, it's just a
list of registers. So if we do:

  - init_clock_gating() (bad name!)
  - enable_rc6()

The render power context should contain the W/A we setup.

Would we do:

  - enable_rc6()
  -> enter rc6
  -> power context save
  - init_clock_gating()
  -> exit rc6
  -> power context restore

We'd end up restoring the reset values of the registers we touch in
init_clock_gating() (or the values after BIOS, really), i.e. the saved
context may not contain all the W/As, and we could hang?

Note that init_clock_gating() is not the only place where we touch the
registers that are part of the power context(s); we need to ensure rc6
is only enabled after we set up those registers.
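
In other words, the safe init ordering would be something like this
(function names illustrative, not a literal patch):

        /* program the W/A registers first so they are captured in the
         * power context that rc6 entry saves */
        intel_init_clock_gating(dev);
        /* ... */
        /* only then allow rc6, which snapshots the power context */
        intel_enable_gt_powersave(dev);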

It could also be that something else than saving/restoring the power
contexts is happening at rc6 entry/exit, but the documentation is rather
sparse here and so we need to try talking to the hardware engineers
again.

So yes, this very much feels like cargo culting. It'd be nice to really
understand what's happening.

Now, a rather pragmatic approach would be to take those patches if they
actually paper over an issue, but the Tested-by: tags are not legion,
Mika didn't reproduce the issue on his BDW (IIRC) and Ben was saying
Kristen didn't confirm it was these exact patches that were solving
hangs for her (if I understood correctly on the call).

I do have to point out that it's a lot of code to review and rather full
of details, i.e. we'll get it wrong-ish (but not enough to break
anything, hopefully).

-- 
Damien


Re: [Intel-gfx] [PATCH] tools/null_state_gen: generate null render state

2014-05-09 Thread Damien Lespiau
On Tue, May 06, 2014 at 04:39:01PM +0300, Mika Kuoppala wrote:
> diff --git a/tools/null_state_gen/intel_renderstate_gen8.c 
> b/tools/null_state_gen/intel_renderstate_gen8.c
> new file mode 100644
> index 000..7e22b24
> --- /dev/null
> +++ b/tools/null_state_gen/intel_renderstate_gen8.c

[...]

> +static void
> +gen7_emit_urb(struct intel_batchbuffer *batch) {
> + /* XXX: Min valid values from mesa */
> + const int vs_entries = 64;
> + const int vs_size = 2;
> + const int vs_start = 2;
> +
> + OUT_BATCH(GEN7_3DSTATE_URB_VS);
> + OUT_BATCH(vs_entries | ((vs_size - 1) << 16) | (vs_start << 25));
> + OUT_BATCH(GEN7_3DSTATE_URB_GS);
> + OUT_BATCH(vs_start << 25);
> + OUT_BATCH(GEN7_3DSTATE_URB_HS);
> + OUT_BATCH(vs_start << 25);
> + OUT_BATCH(GEN7_3DSTATE_URB_DS);
> + OUT_BATCH(vs_start << 25);
> +}

It seems that for BDW GT3, the minimum start is documented as 4. Mesa
has actually been updated to do the right thing now (push constants take
32KB on GT3) and vs_start is 4 on GT3.

Same story for the other URB allocations, but as they are disabled, it
may not matter much. We don't set up the PUSH_CONSTANT state, so it's
possible the VS is able to address the start of the URB. Meh?

I'd still put vs_start to 4 I guess.
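
As a sketch, the generator could pick the start offset per SKU like this
(is_bdw_gt3() is an assumed helper, not existing tool code; the units
follow the 3DSTATE_URB_VS layout used above):

	/* BDW GT3 reserves 32KB for push constants, so the first valid
	 * URB start offset is 4 rather than 2. */
	const int vs_entries = 64;
	const int vs_size = 2;
	const int vs_start = is_bdw_gt3(devid) ? 4 : 2;

	OUT_BATCH(GEN7_3DSTATE_URB_VS);
	OUT_BATCH(vs_entries | ((vs_size - 1) << 16) | (vs_start << 25));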

I'm quite puzzled by why we need to do all that, but let's not go there
again.

-- 
Damien


Re: [Intel-gfx] [PATCH 24/50] drm/i915/bdw: Populate LR contexts (somewhat)

2014-05-09 Thread Damien Lespiau
On Fri, May 09, 2014 at 01:08:54PM +0100, oscar.ma...@intel.com wrote:
> + if (ring->id == RCS) {
> + reg_state[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
> + reg_state[CTX_LRI_HEADER_2] |= MI_LRI_FORCE_POSTED;

This header doesn't have bit 12 set in BSpec.

> + reg_state[CTX_R_PWR_CLK_STATE] = 0x20c8;
> + reg_state[CTX_R_PWR_CLK_STATE+1] = 0;
> +#if 0
> + /* Offsets not yet defined for these */
> + reg_state[CTX_GPGPU_CSR_BASE_ADDRESS] = 0;
> + reg_state[CTX_GPGPU_CSR_BASE_ADDRESS+1] = 0;
> +#endif

Remove dead code?

-- 
Damien


Re: [Intel-gfx] [PATCH] drm/i915/vlv: reset VLV media force wake request register

2014-05-09 Thread Mika Kuoppala
Jani Nikula  writes:

> Media force wake get hangs the machine when the system is booted without
> displays attached. The assumption is that (at least some versions of)
> the firmware has skipped some initialization in that case.
>
> Empirical evidence suggests we need to reset the media force wake
> request register in addition to the render one to avoid hangs.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75895
> Reported-by: Imre Deak 
> Reported-by: Darren Hart 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Jani Nikula 

Reviewed-by: Mika Kuoppala 

> ---
>
> Darren, a Tested-by would be much appreciated!
>
> Thanks,
> Jani.
> ---
>  drivers/gpu/drm/i915/intel_uncore.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/intel_uncore.c 
> b/drivers/gpu/drm/i915/intel_uncore.c
> index 76dc185793ce..27fe2df47d73 100644
> --- a/drivers/gpu/drm/i915/intel_uncore.c
> +++ b/drivers/gpu/drm/i915/intel_uncore.c
> @@ -185,6 +185,8 @@ static void vlv_force_wake_reset(struct drm_i915_private 
> *dev_priv)
>  {
>   __raw_i915_write32(dev_priv, FORCEWAKE_VLV,
>  _MASKED_BIT_DISABLE(0x));
> + __raw_i915_write32(dev_priv, FORCEWAKE_MEDIA_VLV,
> +_MASKED_BIT_DISABLE(0x));
>   /* something from same cacheline, but !FORCEWAKE_VLV */
>   __raw_posting_read(dev_priv, FORCEWAKE_ACK_VLV);
>  }
> -- 
> 1.9.1
>


Re: [Intel-gfx] [PATCH v3] drm/i915: Replaced Blitter ring based flips with MMIO flips for VLV

2014-05-09 Thread Ville Syrjälä
On Fri, May 09, 2014 at 02:59:42PM +0300, Ville Syrjälä wrote:
> On Sun, Mar 23, 2014 at 02:31:05PM +0530, sourab.gu...@intel.com wrote:
> > From: Sourab Gupta 
> > 
> > Using MMIO based flips on VLV for Media power well residency optimization.
> > The blitter ring is currently being used just for command streamer based
> > flip calls. For pure 3D workloads, with MMIO flips, there will be no use
> > of blitter ring and this will ensure the 100% residency for Media well.
> 
> Sorry for dragging my feet with reviewing this. I'm hoping this is the
> latest version...
> 
> > 
> > v2: The MMIO flips now use the interrupt driven mechanism for issuing the
> > flips when target seqno is reached. (Incorporating Ville's idea)
> > 
> > v3: Rebasing on latest code. Code restructuring after incorporating
> > Damien's comments
> > 
> > Signed-off-by: Sourab Gupta 
> > Signed-off-by: Akash Goel 
> > ---
> >  drivers/gpu/drm/i915/i915_dma.c  |   1 +
> >  drivers/gpu/drm/i915/i915_drv.h  |   7 ++
> >  drivers/gpu/drm/i915/i915_irq.c  |   2 +
> >  drivers/gpu/drm/i915/intel_display.c | 124 
> > +++
> >  drivers/gpu/drm/i915/intel_drv.h |   7 ++
> >  5 files changed, 141 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_dma.c 
> > b/drivers/gpu/drm/i915/i915_dma.c
> > index 4e0a26a..bca3c5a 100644
> > --- a/drivers/gpu/drm/i915/i915_dma.c
> > +++ b/drivers/gpu/drm/i915/i915_dma.c
> > @@ -1569,6 +1569,7 @@ int i915_driver_load(struct drm_device *dev, unsigned 
> > long flags)
> > spin_lock_init(&dev_priv->backlight_lock);
> > spin_lock_init(&dev_priv->uncore.lock);
> > spin_lock_init(&dev_priv->mm.object_stat_lock);
> > +   spin_lock_init(&dev_priv->mmio_flip_lock);
> > mutex_init(&dev_priv->dpio_lock);
> > mutex_init(&dev_priv->modeset_restore_lock);
> >  
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h 
> > b/drivers/gpu/drm/i915/i915_drv.h
> > index 3f62be0..678d31d 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -1621,6 +1621,10 @@ typedef struct drm_i915_private {
> > struct i915_dri1_state dri1;
> > /* Old ums support infrastructure, same warning applies. */
> > struct i915_ums_state ums;
> > +
> > +   /* protects the mmio flip data */
> > +   spinlock_t mmio_flip_lock;
> > +
> >  } drm_i915_private_t;
> >  
> >  static inline struct drm_i915_private *to_i915(const struct drm_device 
> > *dev)
> > @@ -2657,6 +2661,9 @@ int i915_reg_read_ioctl(struct drm_device *dev, void 
> > *data,
> >  int i915_get_reset_stats_ioctl(struct drm_device *dev, void *data,
> >struct drm_file *file);
> >  
> > +void intel_notify_mmio_flip(struct drm_device *dev,
> > +   struct intel_ring_buffer *ring);
> > +
> >  /* overlay */
> >  extern struct intel_overlay_error_state 
> > *intel_overlay_capture_error_state(struct drm_device *dev);
> >  extern void intel_overlay_print_error_state(struct 
> > drm_i915_error_state_buf *e,
> > diff --git a/drivers/gpu/drm/i915/i915_irq.c 
> > b/drivers/gpu/drm/i915/i915_irq.c
> > index 4b4aeb3..ad26abe 100644
> > --- a/drivers/gpu/drm/i915/i915_irq.c
> > +++ b/drivers/gpu/drm/i915/i915_irq.c
> > @@ -1080,6 +1080,8 @@ static void notify_ring(struct drm_device *dev,
> >  
> > trace_i915_gem_request_complete(ring);
> >  
> > +   intel_notify_mmio_flip(dev, ring);
> > +
> > wake_up_all(&ring->irq_queue);
> > i915_queue_hangcheck(dev);
> >  }
> > diff --git a/drivers/gpu/drm/i915/intel_display.c 
> > b/drivers/gpu/drm/i915/intel_display.c
> > index 7e4ea8d..19004bf 100644
> > --- a/drivers/gpu/drm/i915/intel_display.c
> > +++ b/drivers/gpu/drm/i915/intel_display.c
> > @@ -8782,6 +8782,125 @@ err:
> > return ret;
> >  }
> >  
> > +static int intel_do_mmio_flip(struct drm_device *dev,
> > +   struct drm_crtc *crtc)
> > +{
> > +   struct drm_i915_private *dev_priv = dev->dev_private;
> > +   struct intel_crtc *intel_crtc;
> > +
> > +   intel_crtc = to_intel_crtc(crtc);
> 
> nit: could be part of intel_crtc declaration
> 
> > +
> > +   intel_mark_page_flip_active(intel_crtc);
> > +   return dev_priv->display.update_primary_plane(crtc, crtc->fb, 0, 0);
> 
> Needs to pass crtc->{x,y} instead of 0,0.
> 
> I was a bit worried crtc->fb might be changed already at this point, but
> after thinking a bit it should be fine since the presense of unpin_work
> will keep intel_crtc_page_flip() from frobbing with it and we always
> call intel_crtc_wait_for_pending_flips() before set_base.
> 
> Just need to update to use crtc->primary->fb now.
> 
> I'm thinking we also have a small race here with a flip done interrupt
> from a previous set_base. Probably we need to sort it out using the 
> SURFLIVE and/or flip counter like I did for the mmio vs. cs flip
> race. But I need to think on this a bit more. Perhaps you want to also
> look at those patches a bit.

Oh another thing here is that we update_primary_plane() isn'

Re: [Intel-gfx] [PATCH 03/10] drm/i915/chv: Enable Render Standby (RC6) for Cherryview

2014-05-09 Thread Mika Kuoppala

Hi Deepak,

deepa...@linux.intel.com writes:

> From: Deepak S 
>
> v2: Configure PCBR if BIOS fails allocate pcbr (deepak)
>
> v3: Fix PCBR condition check during CHV RC6 Enable flag set
>
> v4: Fixup PCBR comment msg. (Chris)
> Rebase against latest code (Deak)
> Fixup Spurious hunk (Ben)
>
> Signed-off-by: Deepak S 
> Acked-by: Ben Widawsky 
> ---
>  drivers/gpu/drm/i915/i915_reg.h |   2 +
>  drivers/gpu/drm/i915/intel_pm.c | 115 
> +---
>  2 files changed, 111 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index c850254..b4074fd 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -962,6 +962,8 @@ enum punit_power_well {
>  #define VLV_IMR  (VLV_DISPLAY_BASE + 0x20a8)
>  #define VLV_ISR  (VLV_DISPLAY_BASE + 0x20ac)
>  #define VLV_PCBR (VLV_DISPLAY_BASE + 0x2120)
> +#define VLV_PCBR_ADDR_SHIFT  12
> +
>  #define   DISPLAY_PLANE_FLIP_PENDING(plane) (1<<(11-(plane))) /* A and B 
> only */
>  #define EIR  0x020b0
>  #define EMR  0x020b4
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index ebb5c88..f0359b6 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -3300,6 +3300,13 @@ static void gen6_disable_rps(struct drm_device *dev)
>   gen6_disable_rps_interrupts(dev);
>  }
>  
> +static void cherryview_disable_rps(struct drm_device *dev)
> +{
> + struct drm_i915_private *dev_priv = dev->dev_private;
> +
> + I915_WRITE(GEN6_RC_CONTROL, 0);
> +}
> +
>  static void valleyview_disable_rps(struct drm_device *dev)
>  {
>   struct drm_i915_private *dev_priv = dev->dev_private;
> @@ -3722,6 +3729,33 @@ static void valleyview_check_pctx(struct 
> drm_i915_private *dev_priv)
>dev_priv->vlv_pctx->stolen->start);
>  }
>  
> +
> +/* Check that the pcbr address is not empty. */
> +static void cherryview_check_pctx(struct drm_i915_private *dev_priv)
> +{
> + unsigned long pctx_addr = I915_READ(VLV_PCBR) & ~4095;
> +
> + WARN_ON((pctx_addr >> VLV_PCBR_ADDR_SHIFT) == 0);
> +}
> +
> +static void cherryview_setup_pctx(struct drm_device *dev)
> +{
> + struct drm_i915_private *dev_priv = dev->dev_private;
> + unsigned long pctx_paddr;
> + struct i915_gtt *gtt = &dev_priv->gtt;
> + u32 pcbr;
> + int pctx_size = 32*1024;
> +
> + WARN_ON(!mutex_is_locked(&dev->struct_mutex));
> +
> + pcbr = I915_READ(VLV_PCBR);
> + if ((pcbr >> VLV_PCBR_ADDR_SHIFT) == 0) {

I admit that address zero being locked by the BIOS is probably in the
realms of paranoia, but I would still omit the shift here so that the
lock bit is taken into consideration.

> + pctx_paddr = (dev_priv->mm.stolen_base +
> +   (gtt->stolen_size - pctx_size));
> + I915_WRITE(VLV_PCBR, pctx_paddr);

Here, though, I would mask the low bits out, just to be on the safe
side: if we get the stolen calculation off by one, we end up setting
the lock bit.
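
Putting both suggestions together, a sketch of what the check and the
write could look like (same names as in the patch; the masking is the
point, not the exact constants):

	pcbr = I915_READ(VLV_PCBR);
	if (pcbr == 0) {	/* no shift: a BIOS-set lock bit also counts */
		pctx_paddr = (dev_priv->mm.stolen_base +
			      (gtt->stolen_size - pctx_size)) & ~4095;
		/* low bits masked out, so we can never set the lock bit */
		I915_WRITE(VLV_PCBR, pctx_paddr);
	}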

I am thinking that we should just sanity check that the BIOS has set
this up and that it seems to be in the correct place. If not, spit a
warning and leave rc6 disabled.

The BIOS should have set up everything for us. Why do we need this
PCBR setup?

> + }
> +}
> +
>  static void valleyview_setup_pctx(struct drm_device *dev)
>  {
>   struct drm_i915_private *dev_priv = dev->dev_private;
> @@ -3811,11 +3845,72 @@ static void valleyview_init_gt_powersave(struct 
> drm_device *dev)
>   mutex_unlock(&dev_priv->rps.hw_lock);
>  }
>  
> +static void cherryview_init_gt_powersave(struct drm_device *dev)
> +{
> + cherryview_setup_pctx(dev);
> +}
> +
>  static void valleyview_cleanup_gt_powersave(struct drm_device *dev)
>  {
>   valleyview_cleanup_pctx(dev);
>  }
>  
> +static void cherryview_enable_rps(struct drm_device *dev)
> +{
> + struct drm_i915_private *dev_priv = dev->dev_private;
> + struct intel_ring_buffer *ring;
> + u32 gtfifodbg, rc6_mode = 0, pcbr;
> + int i;
> +
> + WARN_ON(!mutex_is_locked(&dev_priv->rps.hw_lock));
> +
> + gtfifodbg = I915_READ(GTFIFODBG);
> + if (gtfifodbg) {
> + DRM_DEBUG_DRIVER("GT fifo had a previous error %x\n",
> +  gtfifodbg);
> + I915_WRITE(GTFIFODBG, gtfifodbg);
> + }
> +
> + cherryview_check_pctx(dev_priv);
> +
> + /* 1a & 1b: Get forcewake during program sequence. Although the driver
> +  * hasn't enabled a state yet where we need forcewake, BIOS may have.*/
> + gen6_gt_force_wake_get(dev_priv, FORCEWAKE_ALL);
> +

I915_WRITE(GEN6_RC_CONTROL, 0);

> + /* 2a: Program RC6 thresholds.*/
> + I915_WRITE(GEN6_RC6_WAKE_RATE_LIMIT, 40 << 16);
> + I915_WRITE(GEN6_RC_EVALUATION_INTERVAL, 125000); /* 12500 * 1280ns */
> + I915_WRITE(GEN6_RC_IDLE_HYSTERSIS, 25); /* 25 * 1280ns */
> +
> + for_each_ring(ring, dev_

[Intel-gfx] [PATCH] drm/i915: Ringbuffer signal func for the second BSD ring

2014-05-09 Thread oscar . mateo
From: Oscar Mateo 

This is missing in:

commit 78325f2d270897c9ee0887125b7abb963eb8efea
Author: Ben Widawsky 
Date:   Tue Apr 29 14:52:29 2014 -0700

drm/i915: Virtualize the ringbuffer signal func

Looks to me like a rebase side-effect...

Signed-off-by: Oscar Mateo 
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c 
b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 9907d66..203fa2b 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2182,6 +2182,7 @@ int intel_init_bsd2_ring_buffer(struct drm_device *dev)
ring->dispatch_execbuffer =
gen8_ring_dispatch_execbuffer;
ring->semaphore.sync_to = gen6_ring_sync;
+   ring->semaphore.signal = gen6_signal;
/*
 * The current semaphore is only applied on the pre-gen8. And there
 * is no bsd2 ring on the pre-gen8. So now the semaphore_register
-- 
1.9.0



[Intel-gfx] [PATCH] drm/i915/vlv: reset VLV media force wake request register

2014-05-09 Thread Jani Nikula
Media force wake get hangs the machine when the system is booted without
displays attached. The assumption is that (at least some versions of)
the firmware has skipped some initialization in that case.

Empirical evidence suggests we need to reset the media force wake
request register in addition to the render one to avoid hangs.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75895
Reported-by: Imre Deak 
Reported-by: Darren Hart 
Cc: sta...@vger.kernel.org
Signed-off-by: Jani Nikula 

---

Darren, a Tested-by would be much appreciated!

Thanks,
Jani.
---
 drivers/gpu/drm/i915/intel_uncore.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_uncore.c 
b/drivers/gpu/drm/i915/intel_uncore.c
index 76dc185793ce..27fe2df47d73 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -185,6 +185,8 @@ static void vlv_force_wake_reset(struct drm_i915_private 
*dev_priv)
 {
__raw_i915_write32(dev_priv, FORCEWAKE_VLV,
   _MASKED_BIT_DISABLE(0x));
+   __raw_i915_write32(dev_priv, FORCEWAKE_MEDIA_VLV,
+  _MASKED_BIT_DISABLE(0x));
/* something from same cacheline, but !FORCEWAKE_VLV */
__raw_posting_read(dev_priv, FORCEWAKE_ACK_VLV);
 }
-- 
1.9.1



Re: [Intel-gfx] [3.14.0-rc4] regression: drm FIFO underruns

2014-05-09 Thread Daniel Vetter
Adding mailing lists.

Jörg, can you please boot with drm.debug=0xe, reproduce the issue and
then attach the complete dmesg? Please make sure that the dmesg
contains the boot-up stuff too.

Thanks, Daniel

On Mon, May 5, 2014 at 9:51 AM, Jörg Otte  wrote:
> I still have FIFO underruns in drm:
> [drm:ivb_err_int_handler] *ERROR* Pipe B FIFO underrun
> [drm:cpt_serr_int_handler] *ERROR* PCH transcoder A FIFO underrun
> [drm:cpt_serr_int_handler] *ERROR* PCH transcoder B FIFO underrun
>
> which I already reported here:
> https://lkml.org/lkml/2014/4/9/127
>
> and which is still unanswered!
>
> I tried to bisect the thing, but I ran into compile errors. So I
> can only say it came in with
> e9f37d3 "Merge branch 'drm-next' of 
> git://people.freedesktop.org/~airlied/linux"
>
> Jörg



-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


[Intel-gfx] [PATCH 49/50] drm/i915/bdw: Help out the ctx switch interrupt handler

2014-05-09 Thread oscar . mateo
From: Oscar Mateo 

If we receive a storm of requests for the same context (see gem_storedw_loop_*)
we might end up iterating over too many elements at interrupt time, looking for
contexts to squash together. Instead, share the burden by giving more
intelligence to the queue function. At most, the interrupt will iterate over
three elements.

Signed-off-by: Oscar Mateo 
---
 drivers/gpu/drm/i915/intel_lrc.c | 23 ---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index d9edd10..0aad721 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -410,9 +410,11 @@ int gen8_switch_context_queue(struct intel_engine *ring,
  struct i915_hw_context *to,
  u32 tail)
 {
+   struct drm_i915_private *dev_priv = ring->dev->dev_private;
struct drm_i915_gem_request *req = NULL;
unsigned long flags;
-   bool was_empty;
+   struct drm_i915_gem_request *cursor;
+   int num_elements = 0;
 
req = kzalloc(sizeof(*req), GFP_KERNEL);
if (req == NULL)
@@ -425,9 +427,24 @@ int gen8_switch_context_queue(struct intel_engine *ring,
 
spin_lock_irqsave(&ring->execlist_lock, flags);
 
-   was_empty = list_empty(&ring->execlist_queue);
+   list_for_each_entry(cursor, &ring->execlist_queue, execlist_link)
+   if (++num_elements > 2)
+   break;
+
+   if (num_elements > 2) {
+   struct drm_i915_gem_request *tail_req =
+   list_last_entry(&ring->execlist_queue,
+   struct drm_i915_gem_request, 
execlist_link);
+   if (to == tail_req->ctx) {
+   WARN(tail_req->elsp_submitted != 0,
+   "More than 2 already-submitted reqs 
queued\n");
+   list_del(&tail_req->execlist_link);
+   queue_work(dev_priv->wq, &tail_req->work);
+   }
+   }
+
list_add_tail(&req->execlist_link, &ring->execlist_queue);
-   if (was_empty)
+   if (num_elements == 0)
gen8_switch_context_unqueue(ring);
 
spin_unlock_irqrestore(&ring->execlist_lock, flags);
-- 
1.9.0



[Intel-gfx] [PATCH 50/50] drm/i915/bdw: Enable logical ring contexts

2014-05-09 Thread oscar . mateo
From: Oscar Mateo 

The time has come, the Walrus said, to talk of many things.

Signed-off-by: Oscar Mateo 
---
 drivers/gpu/drm/i915/i915_drv.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index c797e63..969962c 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1937,7 +1937,7 @@ struct drm_i915_cmd_table {
 #define I915_NEED_GFX_HWS(dev) (INTEL_INFO(dev)->need_gfx_hws)
 
 #define HAS_HW_CONTEXTS(dev)   (INTEL_INFO(dev)->gen >= 6)
-#define HAS_LOGICAL_RING_CONTEXTS(dev) 0
+#define HAS_LOGICAL_RING_CONTEXTS(dev) (INTEL_INFO(dev)->gen >= 8)
 #define HAS_ALIASING_PPGTT(dev)(INTEL_INFO(dev)->gen >= 6 && \
 (!IS_VALLEYVIEW(dev) || IS_CHERRYVIEW(dev)))
 #define HAS_PPGTT(dev) (INTEL_INFO(dev)->gen >= 7 \
-- 
1.9.0



[Intel-gfx] [PATCH 48/50] drm/i915/bdw: Make sure error capture keeps working with Execlists

2014-05-09 Thread oscar . mateo
From: Oscar Mateo 

Since the ringbuffer no longer belongs to the engine, we have to
make sure that we are always recording the correct ringbuffer.

TODO: This is only a small fix to keep basic error capture working, but
we need to add more information for it to be useful (e.g. dump the
context being executed).

Signed-off-by: Oscar Mateo 
---
 drivers/gpu/drm/i915/i915_gpu_error.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c 
b/drivers/gpu/drm/i915/i915_gpu_error.c
index 6724e32..31ff7e1 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -823,9 +823,6 @@ static void i915_record_ring_state(struct drm_device *dev,
ering->hws = I915_READ(mmio);
}
 
-   ering->cpu_ring_head = ring->default_ringbuf.head;
-   ering->cpu_ring_tail = ring->default_ringbuf.tail;
-
ering->hangcheck_score = ring->hangcheck.score;
ering->hangcheck_action = ring->hangcheck.action;
 
@@ -881,6 +878,7 @@ static void i915_gem_record_rings(struct drm_device *dev,
 {
struct drm_i915_private *dev_priv = dev->dev_private;
struct drm_i915_gem_request *request;
+   struct intel_ringbuffer *ringbuf;
int i, count;
 
for (i = 0; i < I915_NUM_RINGS; i++) {
@@ -927,8 +925,13 @@ static void i915_gem_record_rings(struct drm_device *dev,
}
}
 
+   ringbuf = intel_ringbuffer_get(ring,
+   request ? request->ctx : ring->default_context);
+   error->ring[i].cpu_ring_head = ringbuf->head;
+   error->ring[i].cpu_ring_tail = ringbuf->tail;
+
error->ring[i].ringbuffer =
-   i915_error_ggtt_object_create(dev_priv, 
ring->default_ringbuf.obj);
+   i915_error_ggtt_object_create(dev_priv, ringbuf->obj);
 
if (ring->status_page.obj)
error->ring[i].hws_page =
-- 
1.9.0



[Intel-gfx] [PATCH 47/50] drm/i915/bdw: Make sure gpu reset still works with Execlists

2014-05-09 Thread oscar . mateo
From: Oscar Mateo 

If we reset a ring after a hang, we have to make sure that we clear
out all queued Execlists requests and that we re-program the ring for
execution. Also, reset the hangcheck counters.

Signed-off-by: Oscar Mateo 
---
 drivers/gpu/drm/i915/i915_gem.c | 13 +
 drivers/gpu/drm/i915/intel_lrc.c| 10 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c |  8 
 3 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index e2d2edb..4f1bb46 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2400,6 +2400,19 @@ static void i915_gem_reset_ring_cleanup(struct 
drm_i915_private *dev_priv,
i915_gem_free_request(request);
}
 
+   if (dev_priv->lrc_enabled) {
+   while (!list_empty(&ring->execlist_queue)) {
+   struct drm_i915_gem_request *request;
+
+   request = list_first_entry(&ring->execlist_queue,
+  struct drm_i915_gem_request,
+  execlist_link);
+   list_del(&request->execlist_link);
+   i915_gem_context_unreference(request->ctx);
+   kfree(request);
+   }
+   }
+
/* These may not have been flush before the reset, do so now */
kfree(ring->preallocated_lazy_request);
ring->preallocated_lazy_request = NULL;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index a13a570..d9edd10 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -700,17 +700,9 @@ int gen8_gem_context_init(struct drm_device *dev)
goto err_out;
}
 
-   for_each_ring(ring, dev_priv, ring_id) {
+   for_each_ring(ring, dev_priv, ring_id)
ring->default_context = ctx;
 
-   I915_WRITE(RING_MODE_GEN7(ring),
-   _MASKED_BIT_DISABLE(GFX_REPLAY_MODE) |
-   _MASKED_BIT_ENABLE(GFX_RUN_LIST_ENABLE));
-   POSTING_READ(RING_MODE_GEN7(ring));
-   DRM_DEBUG_DRIVER("Execlists enabled for %s\n",
-   ring->name);
-   }
-
DRM_DEBUG_DRIVER("LR context support initialized\n");
return 0;
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c 
b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 94c1716..9c0deb2 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -573,6 +573,14 @@ static int init_ring_common_lrc(struct intel_engine *ring)
struct drm_device *dev = ring->dev;
struct drm_i915_private *dev_priv = dev->dev_private;
 
+   I915_WRITE(RING_MODE_GEN7(ring),
+   _MASKED_BIT_DISABLE(GFX_REPLAY_MODE) |
+   _MASKED_BIT_ENABLE(GFX_RUN_LIST_ENABLE));
+   POSTING_READ(RING_MODE_GEN7(ring));
+   DRM_DEBUG_DRIVER("Execlists enabled for %s\n", ring->name);
+
+   memset(&ring->hangcheck, 0, sizeof(ring->hangcheck));
+
I915_WRITE_IMR(ring, ~(ring->irq_enable_mask | ring->irq_keep_mask));
I915_WRITE(RING_HWSTAM(ring->mmio_base), 0x);
 
-- 
1.9.0



[Intel-gfx] [PATCH 46/50] drm/i915/bdw: Avoid non-lite-restore preemptions

2014-05-09 Thread oscar . mateo
From: Oscar Mateo 

In the current Execlists feeding mechanism, full preemption is not
supported yet: only lite-restores are allowed (this is: the GPU
simply samples a new tail pointer for the context currently in
execution).

But we have identified a scenario in which a full preemption occurs:
1) We submit two contexts for execution (A & B).
2) The GPU finishes with the first one (A), switches to the second one
(B) and informs us.
3) We submit B again (hoping to cause a lite restore) together with C,
but in the time we spend writing to the ELSP, the GPU finishes B.
4) The GPU starts executing B again (since we told it so).
5) We receive a B finished interrupt and, mistakenly, we submit C (again)
and D, causing a full preemption of B.

By keeping a better track of our submissions, we can avoid the scenario
described above.
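
In terms of the per-request counter introduced below, the scenario plays
out roughly like this (an illustrative trace, not driver output):

	submit A, B        -> A.elsp_submitted = 1, B.elsp_submitted = 1
	A completes        -> A retired, B now at the head of the queue
	resubmit B (+ C)   -> lite restore: B.elsp_submitted becomes 2
	B completes twice  -> each event decrements the count; B is only
	                      retired when it reaches 0, so C and D are not
	                      mistakenly submitted while B still executes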

Signed-off-by: Oscar Mateo 
---
 drivers/gpu/drm/i915/i915_drv.h  |  3 +++
 drivers/gpu/drm/i915/intel_lrc.c | 28 
 2 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 07b8bdc..c797e63 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1750,6 +1750,9 @@ struct drm_i915_gem_request {
struct list_head execlist_link;
/** Struct to handle this request in the bottom half of an interrupt */
struct work_struct work;
+
+   /** No. of times this request has been sent to the ELSP */
+   int elsp_submitted;
 };
 
 struct drm_i915_file_private {
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 49f6c9d..a13a570 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -294,6 +294,7 @@ static void gen8_switch_context_unqueue(struct intel_engine 
*ring)
else if (req0->ctx == cursor->ctx) {
/* Same ctx: ignore first request, as second request
 * will update tail past first request's workload */
+   cursor->elsp_submitted = req0->elsp_submitted;
list_del(&req0->execlist_link);
queue_work(dev_priv->wq, &req0->work);
req0 = cursor;
@@ -303,8 +304,14 @@ static void gen8_switch_context_unqueue(struct 
intel_engine *ring)
}
}
 
+   WARN_ON(req1 && req1->elsp_submitted);
+
BUG_ON(gen8_switch_context(ring, req0->ctx, req0->tail,
req1? req1->ctx : NULL, req1? req1->tail : 0));
+
+   req0->elsp_submitted++;
+   if (req1)
+   req1->elsp_submitted++;
 }
 
 static bool check_remove_request(struct intel_engine *ring, u32 request_id)
@@ -320,9 +327,13 @@ static bool check_remove_request(struct intel_engine 
*ring, u32 request_id)
struct drm_i915_gem_object *ctx_obj =
head_req->ctx->engine[ring->id].obj;
if (intel_get_lr_contextid(ctx_obj) == request_id) {
-   list_del(&head_req->execlist_link);
-   queue_work(dev_priv->wq, &head_req->work);
-   return true;
+   WARN(head_req->elsp_submitted == 0,
+   "Never submitted head request\n");
+   if (--head_req->elsp_submitted <= 0) {
+   list_del(&head_req->execlist_link);
+   queue_work(dev_priv->wq, &head_req->work);
+   return true;
+   }
}
}
 
@@ -355,7 +366,16 @@ void gen8_handle_context_events(struct intel_engine *ring)
status_id = I915_READ(RING_CONTEXT_STATUS_BUF(ring) +
(read_pointer % 6) * 8 + 4);
 
-   if (status & GEN8_CTX_STATUS_COMPLETE) {
+   if (status & GEN8_CTX_STATUS_PREEMPTED) {
+   if (status & GEN8_CTX_STATUS_LITE_RESTORE) {
+   if (check_remove_request(ring, status_id))
+   WARN(1, "Lite Restored request removed 
from queue\n");
+   } else
+   WARN(1, "Preemption without Lite Restore\n");
+   }
+
+if ((status & GEN8_CTX_STATUS_ACTIVE_IDLE) ||
+(status & GEN8_CTX_STATUS_ELEMENT_SWITCH)) {
if (check_remove_request(ring, status_id))
submit_contexts++;
}
-- 
1.9.0



[Intel-gfx] [PATCH 44/50] drm/i915/bdw: Print context state in debugfs

2014-05-09 Thread oscar . mateo
From: Ben Widawsky 

This has turned out to be really handy in debug so far.

Update:
Since writing this patch, I've gotten similar code upstream for error
state. I've used it quite a bit in debugfs however, and I'd like to keep
it here at least until preemption is working.

Signed-off-by: Ben Widawsky 

This patch was accidentally dropped in the first Execlists version, and
it has been very useful indeed. Put it back again, but as a standalone
debugfs file.

Signed-off-by: Oscar Mateo 
---
 drivers/gpu/drm/i915/i915_debugfs.c | 52 +
 1 file changed, 52 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
b/drivers/gpu/drm/i915/i915_debugfs.c
index c99a872..7f661bf 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1738,6 +1738,57 @@ static int i915_context_status(struct seq_file *m, void 
*unused)
return 0;
 }
 
+static int i915_dump_lrc(struct seq_file *m, void *unused)
+{
+   struct drm_info_node *node = (struct drm_info_node *) m->private;
+   struct drm_device *dev = node->minor->dev;
+   struct drm_i915_private *dev_priv = dev->dev_private;
+   struct intel_engine *ring;
+   struct i915_hw_context *ctx;
+   int ret, i;
+
+   if (!dev_priv->lrc_enabled) {
+   seq_printf(m, "Logical Ring Contexts are disabled\n");
+   return 0;
+   }
+
+   ret = mutex_lock_interruptible(&dev->mode_config.mutex);
+   if (ret)
+   return ret;
+
+   list_for_each_entry(ctx, &dev_priv->context_list, link) {
+   for_each_active_ring(ring, dev_priv, i) {
+   struct drm_i915_gem_object *ctx_obj = 
ctx->engine[i].obj;
+
+   if (ring->default_context == ctx)
+   continue;
+
+   if (ctx_obj) {
+   struct page *page = 
i915_gem_object_get_page(ctx_obj, 1);
+   uint32_t *reg_state = kmap_atomic(page);
+   int j;
+
+   seq_printf(m, "CONTEXT: %s %u\n", ring->name,
+   
intel_get_lr_contextid(ctx_obj));
+
+   for (j = 0; j < 0x600 / sizeof(u32) / 4; j += 
4) {
+   seq_printf(m, "\t[0x%08lx] 0x%08x 
0x%08x 0x%08x 0x%08x\n",
+   i915_gem_obj_ggtt_offset(ctx_obj) + 
4096 + (j * 4),
+   reg_state[j], reg_state[j + 1],
+   reg_state[j + 2], reg_state[j + 3]);
+   }
+   kunmap_atomic(reg_state);
+
+   seq_putc(m, '\n');
+   }
+   }
+   }
+
+   mutex_unlock(&dev->mode_config.mutex);
+
+   return 0;
+}
+
 static int i915_execlists(struct seq_file *m, void *data)
 {
struct drm_info_node *node = (struct drm_info_node *) m->private;
@@ -3866,6 +3917,7 @@ static const struct drm_info_list i915_debugfs_list[] = {
{"i915_opregion", i915_opregion, 0},
{"i915_gem_framebuffer", i915_gem_framebuffer_info, 0},
{"i915_context_status", i915_context_status, 0},
+   {"i915_dump_lrc", i915_dump_lrc, 0},
{"i915_execlists", i915_execlists, 0},
{"i915_gen6_forcewake_count", i915_gen6_forcewake_count_info, 0},
{"i915_swizzle_info", i915_swizzle_info, 0},
-- 
1.9.0



[Intel-gfx] [PATCH 45/50] drm/i915/bdw: Document execlists and logical ring contexts

2014-05-09 Thread oscar . mateo
From: Oscar Mateo 

Explain intel_lrc.c with some execlists notes

Signed-off-by: Thomas Daniel 

v2: Add notes on logical ring context creation.

Signed-off-by: Oscar Mateo 
---
 drivers/gpu/drm/i915/intel_lrc.c | 67 
 1 file changed, 67 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index dc1ab25..49f6c9d 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -33,8 +33,75 @@
  * These expanded contexts enable a number of new abilities, especially
  * "Execlists" (also implemented in this file).
  *
+ * One of the main differences with the legacy HW contexts is that logical
+ * ring contexts incorporate many more things to the context's state, like
+ * PDPs or ringbuffer control registers.
+ *
+ * Regarding the creation of contexts, we have:
+ *
+ * - One global default context.
+ * - One local default context for each opened fd.
+ * - One local extra context for each context create ioctl call.
+ *
+ * Now that ringbuffers belong per-context (and not per-engine, like before)
+ * and that contexts are uniquely tied to a given engine (and not reusable,
+ * like before) we need:
+ *
+ * - One ringbuffer per-engine inside each context.
+ * - One backing object per-engine inside each context.
+ *
+ * The global default context starts its life with these new objects fully
+ * allocated and populated. Regarding non-global contexts, we don't know
+ * at creation time which engine is going to use them, so we have implemented
+ * a deferred creation of LR contexts: the local context starts its life as a
+ * hollow or blank holder, that gets populated for a given engine once we 
receive
+ * an execbuffer. If later on we receive another execbuffer ioctl for the same
+ * context but a different engine, we allocate/populate a new ringbuffer and
+ * context backing object and so on.
+ *
  * Execlists are the new method by which, on gen8+ hardware, workloads are
  * submitted for execution (as opposed to the legacy, ringbuffer-based, 
method).
+ * This method works as follows:
+ *
+ * When a request is committed, its commands (the BB start and any leading or
+ * trailing commands, like the seqno breadcrumbs) are placed in the ringbuffer
+ * for the appropriate context. The tail pointer in the hardware context is not
+ * updated at this time, but instead, kept by the driver in the ringbuffer
+ * structure. A structure representing this request is added to a request queue
+ * for the appropriate engine: this structure contains a copy of the context's
+ * tail after the request was written to the ring buffer and a pointer to the
+ * context itself.
+ *
+ * If the engine's request queue was empty before the request was added, the
+ * queue is processed immediately. Otherwise the queue will be processed during
+ * a context switch interrupt. In any case, elements on the queue will get sent
+ * (in pairs) to the GPU's ExecLists Submit Port (ELSP, for short) with a
+ * globally unique 20-bits submission ID.
+ *
+ * When execution of a request completes, the GPU updates the context status
+ * buffer with a context complete event and generates a context switch 
interrupt.
+ * During the interrupt handling, the driver examines the events in the buffer:
+ * for each context complete event, if the announced ID matches that on the 
head
+ * of the request queue then that request is retired and removed from the 
queue.
+ *
+ * After processing, if any requests were retired and the queue is not empty
+ * then a new execution list can be submitted. The two requests at the front of
+ * the queue are next to be submitted but since a context may not occur twice 
in
+ * an execution list, if subsequent requests have the same ID as the first then
+ * the two requests must be combined. This is done simply by discarding 
requests
+ * at the head of the queue until either only one request is left (in which 
case
+ * we use a NULL second context) or the first two requests have unique IDs.
+ *
+ * By always executing the first two requests in the queue the driver ensures
+ * that the GPU is kept as busy as possible. In the case where a single context
+ * completes but a second context is still executing, the request for the 
second
+ * context will be at the head of the queue when we remove the first one. This
+ * request will then be resubmitted along with a new request for a different 
context,
+ * which will cause the hardware to continue executing the second request and 
queue
+ * the new request (the GPU detects the condition of a context getting 
preempted
+ * with the same context and optimizes the context switch flow by not doing
+ * preemption, but just sampling the new tail pointer).
+ *
  */
 
 #include 
-- 
1.9.0



[Intel-gfx] [PATCH 43/50] drm/i915/bdw: Display context backing obj & ringbuffer info in debugfs

2014-05-09 Thread oscar . mateo
From: Oscar Mateo 

Signed-off-by: Oscar Mateo 
---
 drivers/gpu/drm/i915/i915_debugfs.c | 26 --
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
b/drivers/gpu/drm/i915/i915_debugfs.c
index cc212df..c99a872 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1671,6 +1671,12 @@ static int i915_gem_framebuffer_info(struct seq_file *m, 
void *data)
 
return 0;
 }
+static void describe_ctx_ringbuf(struct seq_file *m, struct intel_ringbuffer 
*ringbuf)
+{
+   seq_printf(m, " (ringbuffer, space: %d, head: %u, tail: %u, last head: 
%d)",
+   ringbuf->space, ringbuf->head, ringbuf->tail,
+   ringbuf->last_retired_head);
+}
 
 static int i915_context_status(struct seq_file *m, void *unused)
 {
@@ -1698,7 +1704,7 @@ static int i915_context_status(struct seq_file *m, void 
*unused)
}
 
list_for_each_entry(ctx, &dev_priv->context_list, link) {
-   if (ctx->engine[RCS].obj == NULL)
+   if (!dev_priv->lrc_enabled && ctx->engine[RCS].obj == NULL)
continue;
 
seq_puts(m, "HW context ");
@@ -1707,7 +1713,23 @@ static int i915_context_status(struct seq_file *m, void 
*unused)
if (ring->default_context == ctx)
seq_printf(m, "(default context %s) ", 
ring->name);
 
-   describe_obj(m, ctx->engine[RCS].obj);
+   if (dev_priv->lrc_enabled) {
+   seq_putc(m, '\n');
+   for_each_active_ring(ring, dev_priv, i) {
+   struct drm_i915_gem_object *ctx_obj = 
ctx->engine[i].obj;
+   struct intel_ringbuffer *ringbuf = 
ctx->engine[i].ringbuf;
+
+   seq_printf(m, "%s: ", ring->name);
+   if (ctx_obj)
+   describe_obj(m, ctx_obj);
+   if (ringbuf)
+   describe_ctx_ringbuf(m, ringbuf);
+   seq_putc(m, '\n');
+   }
+   } else {
+   describe_obj(m, ctx->engine[RCS].obj);
+   }
+
seq_putc(m, '\n');
}
 
-- 
1.9.0



[Intel-gfx] [PATCH 29/50] drm/i915/bdw: Execlists ring tail writing

2014-05-09 Thread oscar . mateo
From: Oscar Mateo 

The write tail function is a very special place for execlists: since
all access to the ring is mediated through requests (thanks to
Chris Wilson's "Write RING_TAIL once per-request" for that) and every
request ends in a tail write, this is the place we are going to
use to submit contexts for execution.

For the moment, just mark the place (we still need to do a lot of
preparation before execlists are ready to start submitting things).

Signed-off-by: Oscar Mateo 
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 22 --
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c 
b/drivers/gpu/drm/i915/intel_ringbuffer.c
index eef7094..03719b0 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -430,6 +430,12 @@ static void ring_write_tail(struct intel_engine *ring,
I915_WRITE_TAIL(ring, value);
 }
 
+static void gen8_submit_ctx(struct intel_engine *ring,
+   struct i915_hw_context *ctx, u32 value)
+{
+   DRM_ERROR("Execlists still not ready!\n");
+}
+
 u64 intel_ring_get_active_head(struct intel_engine *ring)
 {
struct drm_i915_private *dev_priv = ring->dev->dev_private;
@@ -1983,12 +1989,15 @@ int intel_init_render_ring(struct drm_device *dev)
struct drm_i915_private *dev_priv = dev->dev_private;
struct intel_engine *ring = &dev_priv->ring[RCS];
 
+   ring->submit = ring_write_tail;
if (INTEL_INFO(dev)->gen >= 6) {
ring->add_request = gen6_add_request;
ring->flush = gen7_render_ring_flush;
if (INTEL_INFO(dev)->gen == 6)
ring->flush = gen6_render_ring_flush;
if (INTEL_INFO(dev)->gen >= 8) {
+   if (dev_priv->lrc_enabled)
+   ring->submit = gen8_submit_ctx;
ring->flush = gen8_render_ring_flush;
ring->irq_get = gen8_ring_get_irq;
ring->irq_put = gen8_ring_put_irq;
@@ -2043,7 +2052,7 @@ int intel_init_render_ring(struct drm_device *dev)
}
ring->irq_enable_mask = I915_USER_INTERRUPT;
}
-   ring->submit = ring_write_tail;
+
if (IS_HASWELL(dev))
ring->dispatch_execbuffer = hsw_ring_dispatch_execbuffer;
else if (IS_GEN8(dev))
@@ -2163,6 +2172,8 @@ int intel_init_bsd_ring(struct drm_device *dev)
ring->get_seqno = gen6_ring_get_seqno;
ring->set_seqno = ring_set_seqno;
if (INTEL_INFO(dev)->gen >= 8) {
+   if (dev_priv->lrc_enabled)
+   ring->submit = gen8_submit_ctx;
ring->irq_enable_mask =
GT_RENDER_USER_INTERRUPT << GEN8_VCS1_IRQ_SHIFT;
ring->irq_get = gen8_ring_get_irq;
@@ -2229,7 +2240,10 @@ int intel_init_bsd2_ring(struct drm_device *dev)
return -EINVAL;
}
 
-   ring->submit = ring_write_tail;
+   if (dev_priv->lrc_enabled)
+   ring->submit = gen8_submit_ctx;
+   else
+   ring->submit = ring_write_tail;
ring->flush = gen6_bsd_ring_flush;
ring->add_request = gen6_add_request;
ring->get_seqno = gen6_ring_get_seqno;
@@ -2274,6 +2288,8 @@ int intel_init_blt_ring(struct drm_device *dev)
ring->get_seqno = gen6_ring_get_seqno;
ring->set_seqno = ring_set_seqno;
if (INTEL_INFO(dev)->gen >= 8) {
+   if (dev_priv->lrc_enabled)
+   ring->submit = gen8_submit_ctx;
ring->irq_enable_mask =
GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT;
ring->irq_get = gen8_ring_get_irq;
@@ -2320,6 +2336,8 @@ int intel_init_vebox_ring(struct drm_device *dev)
ring->set_seqno = ring_set_seqno;
 
if (INTEL_INFO(dev)->gen >= 8) {
+   if (dev_priv->lrc_enabled)
+   ring->submit = gen8_submit_ctx;
ring->irq_enable_mask =
GT_RENDER_USER_INTERRUPT << GEN8_VECS_IRQ_SHIFT;
ring->irq_get = gen8_ring_get_irq;
-- 
1.9.0



[Intel-gfx] [PATCH 35/50] drm/i915/bdw: Add forcewake lock around ELSP writes

2014-05-09 Thread oscar . mateo
From: Thomas Daniel 

BSPEC says: SW must set Force Wakeup bit to prevent GT from
entering C6 while ELSP writes are in progress.

Signed-off-by: Thomas Daniel 
Acked-by: Oscar Mateo 
---
 drivers/gpu/drm/i915/i915_drv.h  |  1 +
 drivers/gpu/drm/i915/intel_lrc.c | 15 +++
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 208a4bd..6b39fed 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2694,6 +2694,7 @@ int vlv_freq_opcode(struct drm_i915_private *dev_priv, 
int val);
 
 #define I915_READ(reg) dev_priv->uncore.funcs.mmio_readl(dev_priv, 
(reg), true)
 #define I915_WRITE(reg, val)   dev_priv->uncore.funcs.mmio_writel(dev_priv, 
(reg), (val), true)
+#define I915_RAW_WRITE(reg, val)   writel(val, dev_priv->regs + reg)
 #define I915_READ_NOTRACE(reg) 
dev_priv->uncore.funcs.mmio_readl(dev_priv, (reg), false)
 #define I915_WRITE_NOTRACE(reg, val)   
dev_priv->uncore.funcs.mmio_writel(dev_priv, (reg), (val), false)
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 2eb1c28..54cbb4b 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -141,14 +141,21 @@ static void submit_execlist(struct intel_engine *ring,
desc[3] = (u32)(temp >> 32);
desc[2] = (u32)temp;
 
-   I915_WRITE(RING_ELSP(ring), desc[1]);
-   I915_WRITE(RING_ELSP(ring), desc[0]);
-   I915_WRITE(RING_ELSP(ring), desc[3]);
+   /* Set Force Wakeup bit to prevent GT from entering C6 while
+* ELSP writes are in progress */
+   gen6_gt_force_wake_get(dev_priv, FORCEWAKE_ALL);
+
+   I915_RAW_WRITE(RING_ELSP(ring), desc[1]);
+   I915_RAW_WRITE(RING_ELSP(ring), desc[0]);
+   I915_RAW_WRITE(RING_ELSP(ring), desc[3]);
/* The context is automatically loaded after the following */
-   I915_WRITE(RING_ELSP(ring), desc[2]);
+   I915_RAW_WRITE(RING_ELSP(ring), desc[2]);
 
/* ELSP is a write only register, so this serves as a posting read */
POSTING_READ(RING_EXECLIST_STATUS(ring));
+
+   /* Release Force Wakeup */
+   gen6_gt_force_wake_put(dev_priv, FORCEWAKE_ALL);
 }
 
 static int gen8_switch_context(struct intel_engine *ring,
-- 
1.9.0



[Intel-gfx] [PATCH 42/50] drm/i915/bdw: Display execlists info in debugfs

2014-05-09 Thread oscar . mateo
From: Oscar Mateo 

v2: Warn and return if LRCs are not enabled.

Signed-off-by: Oscar Mateo 
---
 drivers/gpu/drm/i915/i915_debugfs.c | 74 +
 drivers/gpu/drm/i915/i915_reg.h |  7 
 drivers/gpu/drm/i915/intel_lrc.c|  6 ---
 3 files changed, 81 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
b/drivers/gpu/drm/i915/i915_debugfs.c
index 204b432..cc212df 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1716,6 +1716,79 @@ static int i915_context_status(struct seq_file *m, void 
*unused)
return 0;
 }
 
+static int i915_execlists(struct seq_file *m, void *data)
+{
+   struct drm_info_node *node = (struct drm_info_node *) m->private;
+   struct drm_device *dev = node->minor->dev;
+   struct drm_i915_private *dev_priv = dev->dev_private;
+   struct intel_engine *ring;
+   u32 status_pointer;
+   u8 read_pointer;
+   u8 write_pointer;
+   u32 status;
+   u32 ctx_id;
+   struct list_head *cursor;
+   struct drm_i915_gem_request *head_req;
+   int ring_id, i;
+
+   if (!dev_priv->lrc_enabled) {
+   seq_printf(m, "Logical Ring Contexts are disabled\n");
+   return 0;
+   }
+
+   for_each_active_ring(ring, dev_priv, ring_id) {
+   int count = 0;
+
+   seq_printf(m, "%s\n", ring->name);
+
+   status = I915_READ(RING_EXECLIST_STATUS(ring));
+   ctx_id = I915_READ(RING_EXECLIST_STATUS(ring) + 4);
+   seq_printf(m, "\tExeclist status: 0x%08X, context: %u\n",
+   status, ctx_id);
+
+   status_pointer = I915_READ(RING_CONTEXT_STATUS_PTR(ring));
+   seq_printf(m, "\tStatus pointer: 0x%08X\n", status_pointer);
+
+   read_pointer = ring->next_context_status_buffer;
+   write_pointer = status_pointer & 0x07;
+   if (read_pointer > write_pointer)
+   write_pointer += 6;
+   seq_printf(m, "\tRead pointer: 0x%08X, write pointer 0x%08X\n",
+   read_pointer, write_pointer);
+
+   for (i = 0; i < 6; i++) {
+   status = I915_READ(RING_CONTEXT_STATUS_BUF(ring) + 8*i);
+   ctx_id = I915_READ(RING_CONTEXT_STATUS_BUF(ring) + 8*i 
+ 4);
+
+   seq_printf(m, "\tStatus buffer %d: 0x%08X, context: 
%u\n",
+   i, status, ctx_id);
+   }
+
+   list_for_each(cursor, &ring->execlist_queue) {
+   count++;
+   }
+   seq_printf(m, "\t%d requests in queue\n", count);
+
+   if (count > 0) {
+   struct drm_i915_gem_object *ctx_obj;
+
+   head_req = list_first_entry(&ring->execlist_queue,
+   struct drm_i915_gem_request, 
execlist_link);
+
+   ctx_obj = head_req->ctx->engine[ring_id].obj;
+   seq_printf(m, "\tHead request id: %u\n",
+   intel_get_lr_contextid(ctx_obj));
+   seq_printf(m, "\tHead request seqno: %u\n", 
head_req->seqno);
+   seq_printf(m, "\tHead request tail: %u\n", 
head_req->tail);
+
+   }
+
+   seq_putc(m, '\n');
+   }
+
+   return 0;
+}
+
 static int i915_gen6_forcewake_count_info(struct seq_file *m, void *data)
 {
struct drm_info_node *node = (struct drm_info_node *) m->private;
@@ -3771,6 +3844,7 @@ static const struct drm_info_list i915_debugfs_list[] = {
{"i915_opregion", i915_opregion, 0},
{"i915_gem_framebuffer", i915_gem_framebuffer_info, 0},
{"i915_context_status", i915_context_status, 0},
+   {"i915_execlists", i915_execlists, 0},
{"i915_gen6_forcewake_count", i915_gen6_forcewake_count_info, 0},
{"i915_swizzle_info", i915_swizzle_info, 0},
{"i915_ppgtt_info", i915_ppgtt_info, 0},
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 97a51f8..ab3a650 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -116,6 +116,13 @@
 #define GEN8_RING_PDP_UDW(ring, n) ((ring)->mmio_base+0x270 + ((n) * 8 + 
4))
 #define GEN8_RING_PDP_LDW(ring, n) ((ring)->mmio_base+0x270 + (n) * 8)
 
+/* Execlists regs */
+#define RING_ELSP(ring)((ring)->mmio_base+0x230)
+#define RING_EXECLIST_STATUS(ring) ((ring)->mmio_base+0x234)
+#define RING_CONTEXT_CONTROL(ring) ((ring)->mmio_base+0x244)
+#define RING_CONTEXT_STATUS_BUF(ring)  ((ring)->mmio_base+0x370)
+#define RING_CONTEXT_STATUS_PTR(ring)  ((ring)->mmio_base+0x3a0)
+
 #define GAM_ECOCHK 0x4090
 #define   ECOCHK_SNB_BIT   (1<<10)
 #define   HSW_ECOCHK_ARB_PRIO_SOL  (1<<6)
diff --git a/

[Intel-gfx] [PATCH 39/50] drm/i915/bdw: Get prepared for a two-stage execlist submit process

2014-05-09 Thread oscar . mateo
From: Michel Thierry 

Context switch (and execlist submission) should happen only when
other contexts are not active, otherwise pre-emption occurs.

To ensure this, we place context switch requests in a queue; those
requests are later consumed when the right context switch interrupt is
received.

Signed-off-by: Michel Thierry 

v2: Use a spinlock, do not remove the requests on unqueue (wait for
context switch completion).

Signed-off-by: Thomas Daniel 

v3: Several rebases and code changes. Use unique ID.

v4:
- Move the queue/lock init to the late ring initialization.
- Damien's kmalloc review comments: check return, use sizeof(*req),
do not cast.

Signed-off-by: Oscar Mateo 
---
 drivers/gpu/drm/i915/i915_drv.h |  6 
 drivers/gpu/drm/i915/intel_lrc.c| 57 +
 drivers/gpu/drm/i915/intel_ringbuffer.c |  3 ++
 drivers/gpu/drm/i915/intel_ringbuffer.h |  3 ++
 4 files changed, 69 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 6b39fed..f2aae6a 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1745,6 +1745,9 @@ struct drm_i915_gem_request {
struct drm_i915_file_private *file_priv;
/** file_priv list entry for this request */
struct list_head client_list;
+
+   /** execlist queue entry for this request */
+   struct list_head execlist_link;
 };
 
 struct drm_i915_file_private {
@@ -2443,6 +2446,9 @@ static inline u32 intel_get_lr_contextid(struct 
drm_i915_gem_object *ctx_obj)
 * (which leaves one HwCtxId bit free) */
return lrca >> 13;
 }
+int gen8_switch_context_queue(struct intel_engine *ring,
+ struct i915_hw_context *to,
+ u32 tail);
 
 /* i915_gem_evict.c */
 int __must_check i915_gem_evict_something(struct drm_device *dev,
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index b06098e..6da7db9 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -199,6 +199,63 @@ static int gen8_switch_context(struct intel_engine *ring,
return 0;
 }
 
+static void gen8_switch_context_unqueue(struct intel_engine *ring)
+{
+   struct drm_i915_gem_request *req0 = NULL, *req1 = NULL;
+   struct drm_i915_gem_request *cursor = NULL, *tmp = NULL;
+
+   if (list_empty(&ring->execlist_queue))
+   return;
+
+   /* Try to read in pairs */
+   list_for_each_entry_safe(cursor, tmp, &ring->execlist_queue, 
execlist_link) {
+   if (!req0)
+   req0 = cursor;
+   else if (req0->ctx == cursor->ctx) {
+   /* Same ctx: ignore first request, as second request
+* will update tail past first request's workload */
+   list_del(&req0->execlist_link);
+   i915_gem_context_unreference(req0->ctx);
+   kfree(req0);
+   req0 = cursor;
+   } else {
+   req1 = cursor;
+   break;
+   }
+   }
+
+   BUG_ON(gen8_switch_context(ring, req0->ctx, req0->tail,
+   req1? req1->ctx : NULL, req1? req1->tail : 0));
+}
+
+int gen8_switch_context_queue(struct intel_engine *ring,
+ struct i915_hw_context *to,
+ u32 tail)
+{
+   struct drm_i915_gem_request *req = NULL;
+   unsigned long flags;
+   bool was_empty;
+
+   req = kzalloc(sizeof(*req), GFP_KERNEL);
+   if (req == NULL)
+   return -ENOMEM;
+   req->ring = ring;
+   req->ctx = to;
+   i915_gem_context_reference(req->ctx);
+   req->tail = tail;
+
+   spin_lock_irqsave(&ring->execlist_lock, flags);
+
+   was_empty = list_empty(&ring->execlist_queue);
+   list_add_tail(&req->execlist_link, &ring->execlist_queue);
+   if (was_empty)
+   gen8_switch_context_unqueue(ring);
+
+   spin_unlock_irqrestore(&ring->execlist_lock, flags);
+
+   return 0;
+}
+
 struct i915_hw_context *
 gen8_gem_validate_context(struct drm_device *dev, struct drm_file *file,
  struct intel_engine *ring, const u32 ctx_id)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c 
b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 847fec5..35ced7c 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1554,6 +1554,9 @@ static int intel_init_ring(struct drm_device *dev,
 
init_waitqueue_head(&ring->irq_queue);
 
+   INIT_LIST_HEAD(&ring->execlist_queue);
+   spin_lock_init(&ring->execlist_lock);
+
if (dev_priv->lrc_enabled) {
struct drm_i915_gem_object *obj;
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h 
b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 709b1f1..daf91de 100644
--- a/drivers/gpu/dr

[Intel-gfx] [PATCH 34/50] drm/i915/bdw: Implement context switching (somewhat)

2014-05-09 Thread oscar . mateo
From: Ben Widawsky 

A context switch occurs by submitting a context descriptor to the
ExecList Submission Port. Given that we can now initialize a context,
it's possible to begin implementing the context switch by creating the
descriptor and submitting it to ELSP (actually two descriptors, since
the ELSP has two ports).

The context object must be mapped in the GGTT, which means it must exist
in the 0-4GB graphics VA range.

Signed-off-by: Ben Widawsky 

v2: This code has changed quite a lot in various rebases. Of particular
importance is that now we use the globally unique Submission ID to send
to the hardware. Also, context pages are now pinned unconditionally to
GGTT, so there is no need to bind them.

v3: Use LRCA[31:11] as hwCtxId[18:0]. This guarantees that the HW context
ID we submit to the ELSP is globally unique and != 0 (Bspec requirements
of the software use-only bits of the Context ID in the Context Descriptor
Format) without the hassle of the previous submission Id construction.
Also, re-add the ELSP posting read (it was dropped somewhere during the
rebases).
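
For illustration, the ID construction boils down to this (the offset is
made up; the shift matches the helper added below):

	/* LRCA is 4K aligned and the context image is at least 2 pages,
	 * so the top 19 bits of the GGTT offset are unique per context
	 * and the resulting ID is guaranteed to be non-zero. */
	u32 lrca = 0x0012a000;		/* hypothetical GGTT offset */
	u32 hw_ctx_id = lrca >> 13;	/* 0x95 */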

Signed-off-by: Oscar Mateo 
---
 drivers/gpu/drm/i915/i915_drv.h  |  9 
 drivers/gpu/drm/i915/intel_lrc.c | 95 
 2 files changed, 104 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 7d06a66..208a4bd 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2434,6 +2434,15 @@ int gen8_create_lr_context(struct i915_hw_context *ctx,
 struct i915_hw_context *
 gen8_gem_validate_context(struct drm_device *dev, struct drm_file *file,
  struct intel_engine *ring, const u32 ctx_id);
+static inline u32 intel_get_lr_contextid(struct drm_i915_gem_object *ctx_obj)
+{
+   u32 lrca = i915_gem_obj_ggtt_offset(ctx_obj);
+
+   /* LRCA is required to be 4K aligned and LRCA context image is always at
+* least 2 pages, so the more significant 19 bits are globally unique
+* (which leaves one HwCtxId bit free) */
+   return lrca >> 13;
+}
 
 /* i915_gem_evict.c */
 int __must_check i915_gem_evict_something(struct drm_device *dev,
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index a85f91c..2eb1c28 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -47,6 +47,7 @@
 #define GEN8_LR_CONTEXT_ALIGN 4096
 
 #define RING_ELSP(ring)((ring)->mmio_base+0x230)
+#define RING_EXECLIST_STATUS(ring) ((ring)->mmio_base+0x234)
 #define RING_CONTEXT_CONTROL(ring) ((ring)->mmio_base+0x244)
 
 #define CTX_LRI_HEADER_0   0x01
@@ -78,6 +79,100 @@
 #define CTX_R_PWR_CLK_STATE0x42
 #define CTX_GPGPU_CSR_BASE_ADDRESS 0x44
 
+#define GEN8_CTX_VALID (1<<0)
+#define GEN8_CTX_FORCE_PD_RESTORE (1<<1)
+#define GEN8_CTX_FORCE_RESTORE (1<<2)
+#define GEN8_CTX_L3LLC_COHERENT (1<<5)
+#define GEN8_CTX_PRIVILEGE (1<<8)
+enum {
+   ADVANCED_CONTEXT=0,
+   LEGACY_CONTEXT,
+   ADVANCED_AD_CONTEXT,
+   LEGACY_64B_CONTEXT
+};
+#define GEN8_CTX_MODE_SHIFT 3
+enum {
+   FAULT_AND_HANG=0,
+   FAULT_AND_HALT, /* Debug only */
+   FAULT_AND_STREAM,
+   FAULT_AND_CONTINUE /* Unsupported */
+};
+#define GEN8_CTX_FAULT_SHIFT 6
+#define GEN8_CTX_LRCA_SHIFT 12
+#define GEN8_CTX_UNUSED_SHIFT 32
+
+static inline uint64_t get_descriptor(struct drm_i915_gem_object *ctx_obj)
+{
+   uint64_t desc;
+
+   BUG_ON(i915_gem_obj_ggtt_offset(ctx_obj) & 0xFFFFFFFF00000000ULL);
+
+   desc = GEN8_CTX_VALID;
+   desc |= LEGACY_CONTEXT << GEN8_CTX_MODE_SHIFT;
+   desc |= i915_gem_obj_ggtt_offset(ctx_obj);
+   desc |= GEN8_CTX_L3LLC_COHERENT;
+   desc |= (u64)intel_get_lr_contextid(ctx_obj) << GEN8_CTX_UNUSED_SHIFT;
+   desc |= GEN8_CTX_PRIVILEGE;
+
+   /* TODO: WaDisableLiteRestore when we start using semaphore
+* signalling between Command Streamers */
+   /* desc |= GEN8_CTX_FORCE_RESTORE; */
+
+   return desc;
+}
+
+static void submit_execlist(struct intel_engine *ring,
+   struct drm_i915_gem_object *ctx_obj0,
+   struct drm_i915_gem_object *ctx_obj1)
+{
+   struct drm_i915_private *dev_priv = ring->dev->dev_private;
+   uint64_t temp = 0;
+   uint32_t desc[4];
+
+   /* XXX: You must always write both descriptors in the order below. */
+   if (ctx_obj1)
+   temp = get_descriptor(ctx_obj1);
+   else
+   temp = 0;
+   desc[1] = (u32)(temp >> 32);
+   desc[0] = (u32)temp;
+
+   temp = get_descriptor(ctx_obj0);
+   desc[3] = (u32)(temp >> 32);
+   desc[2] = (u32)temp;
+
+   I915_WRITE(RING_ELSP(ring), desc[1]);
+   I915_WRITE(RING_ELSP(ring), desc[0]);
+   I915_WRITE(RING_ELSP(ring), desc[3]);
+   /* The context is automatically loaded after the following */
+   I915_WRITE(RING_ELSP(ring), desc[2]);
+
+   /* ELSP is a write only register, so use another nearby reg for
+    * posting instead */
+   I915_READ(RING_EXECLIST_STATUS(ring));

[Intel-gfx] [PATCH 38/50] drm/i915/bdw: LR context switch interrupts

2014-05-09 Thread oscar . mateo
From: Thomas Daniel 

We need to handle context switch interrupts from all rings. Also, fixed
the IMR/IER writes and added HWSTAM setup at ring init time.

Notice that, if added to irq_enable_mask, the context switch interrupts
would be incorrectly masked out whenever user interrupts are disabled
because no one is waiting on a sequence number. Therefore, this commit
adds a bitmask of interrupts that must be kept unmasked at all times.
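
A minimal sketch of the idea (irq_keep_mask is assumed to be the name of
the one-line addition to intel_ringbuffer.h in this patch; the helper
itself is hypothetical):

	static void gen8_update_imr(struct intel_engine *ring)
	{
		struct drm_i915_private *dev_priv = ring->dev->dev_private;
		u32 enabled = ring->irq_refcount ? ring->irq_enable_mask : 0;

		/* Bits in irq_keep_mask (e.g. the context switch interrupt)
		 * stay unmasked even when nobody is waiting on the ring */
		I915_WRITE_IMR(ring, ~(ring->irq_keep_mask | enabled));
	}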

Signed-off-by: Thomas Daniel 

v2: Disable HWSTAM, as suggested by Damien (nobody listens to these interrupts,
anyway).

Signed-off-by: Oscar Mateo 
---
 drivers/gpu/drm/i915/i915_irq.c | 27 ++---
 drivers/gpu/drm/i915/i915_reg.h |  2 ++
 drivers/gpu/drm/i915/intel_ringbuffer.c | 36 -
 drivers/gpu/drm/i915/intel_ringbuffer.h |  1 +
 4 files changed, 45 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 873ae50..a28cf6c 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1300,7 +1300,7 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device 
*dev,
   struct drm_i915_private *dev_priv,
   u32 master_ctl)
 {
-   u32 rcs, bcs, vcs;
+   u32 rcs, bcs, vcs, vecs;
uint32_t tmp = 0;
irqreturn_t ret = IRQ_NONE;
 
@@ -1314,6 +1314,8 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device 
*dev,
notify_ring(dev, &dev_priv->ring[RCS]);
if (bcs & GT_RENDER_USER_INTERRUPT)
notify_ring(dev, &dev_priv->ring[BCS]);
+   if ((rcs | bcs) & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
+   DRM_DEBUG_DRIVER("TODO: Context switch\n");
I915_WRITE(GEN8_GT_IIR(0), tmp);
} else
DRM_ERROR("The master control interrupt lied (GT0)!\n");
@@ -1326,9 +1328,13 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device 
*dev,
vcs = tmp >> GEN8_VCS1_IRQ_SHIFT;
if (vcs & GT_RENDER_USER_INTERRUPT)
notify_ring(dev, &dev_priv->ring[VCS]);
+   if (vcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
+   DRM_DEBUG_DRIVER("TODO: Context switch\n");
vcs = tmp >> GEN8_VCS2_IRQ_SHIFT;
if (vcs & GT_RENDER_USER_INTERRUPT)
notify_ring(dev, &dev_priv->ring[VCS2]);
+   if (vcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
+   DRM_DEBUG_DRIVER("TODO: Context switch\n");
I915_WRITE(GEN8_GT_IIR(1), tmp);
} else
DRM_ERROR("The master control interrupt lied (GT1)!\n");
@@ -1338,9 +1344,11 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device 
*dev,
tmp = I915_READ(GEN8_GT_IIR(3));
if (tmp) {
ret = IRQ_HANDLED;
-   vcs = tmp >> GEN8_VECS_IRQ_SHIFT;
-   if (vcs & GT_RENDER_USER_INTERRUPT)
+   vecs = tmp >> GEN8_VECS_IRQ_SHIFT;
+   if (vecs & GT_RENDER_USER_INTERRUPT)
notify_ring(dev, &dev_priv->ring[VECS]);
+   if (vecs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
+   DRM_DEBUG_DRIVER("TODO: Context switch\n");
I915_WRITE(GEN8_GT_IIR(3), tmp);
} else
DRM_ERROR("The master control interrupt lied (GT3)!\n");
@@ -3243,12 +3251,17 @@ static void gen8_gt_irq_postinstall(struct 
drm_i915_private *dev_priv)
/* These are interrupts we'll toggle with the ring mask register */
uint32_t gt_interrupts[] = {
GT_RENDER_USER_INTERRUPT << GEN8_RCS_IRQ_SHIFT |
-   GT_RENDER_L3_PARITY_ERROR_INTERRUPT |
-   GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT,
+   GT_RENDER_L3_PARITY_ERROR_INTERRUPT |
+   GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_RCS_IRQ_SHIFT |
+   GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT |
+   GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_BCS_IRQ_SHIFT,
GT_RENDER_USER_INTERRUPT << GEN8_VCS1_IRQ_SHIFT |
-   GT_RENDER_USER_INTERRUPT << GEN8_VCS2_IRQ_SHIFT,
+   GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS1_IRQ_SHIFT |
+   GT_RENDER_USER_INTERRUPT << GEN8_VCS2_IRQ_SHIFT |
+   GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS2_IRQ_SHIFT,
0,
-   GT_RENDER_USER_INTERRUPT << GEN8_VECS_IRQ_SHIFT
+   GT_RENDER_USER_INTERRUPT << GEN8_VECS_IRQ_SHIFT |
+   GEN8_GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VECS_IRQ_SHIFT,
};
 
  

[Intel-gfx] [PATCH 41/50] drm/i915/bdw: Start queueing contexts to be submitted

2014-05-09 Thread oscar . mateo
From: Oscar Mateo 

Finally, start queueing requests on ring->submit. Also, remove the
remaining legacy context switches.
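
For reference, a sketch of what the queueing path introduced earlier in
the series does when gen8_submit_ctx() (below) calls it; the field names
follow this series, but treat the body as an approximation:

	int gen8_switch_context_queue(struct intel_engine *ring,
				      struct i915_hw_context *to, u32 tail)
	{
		struct drm_i915_gem_request *req;
		unsigned long flags;
		bool was_empty;

		req = kzalloc(sizeof(*req), GFP_KERNEL);
		if (req == NULL)
			return -ENOMEM;
		req->ring = ring;
		req->ctx = to;
		req->tail = tail;
		/* Keep the context alive until the request is retired */
		i915_gem_context_reference(to);

		spin_lock_irqsave(&ring->execlist_lock, flags);

		was_empty = list_empty(&ring->execlist_queue);
		list_add_tail(&req->execlist_link, &ring->execlist_queue);
		/* Submit directly only if the ELSP is idle; otherwise the
		 * context switch interrupt handler picks this request up */
		if (was_empty)
			gen8_switch_context(ring, to, tail, NULL, 0);

		spin_unlock_irqrestore(&ring->execlist_lock, flags);

		return 0;
	}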

Signed-off-by: Oscar Mateo 
---
 drivers/gpu/drm/i915/i915_gem.c|  9 ++---
 drivers/gpu/drm/i915/i915_gem_context.c| 10 ++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  8 +---
 drivers/gpu/drm/i915/intel_ringbuffer.c|  5 -
 4 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index f9ed89e..e2d2edb 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2813,9 +2813,12 @@ int i915_gpu_idle(struct drm_device *dev)
 
/* Flush everything onto the inactive list. */
for_each_active_ring(ring, dev_priv, i) {
-   ret = i915_switch_context(ring, ring->default_context);
-   if (ret)
-   return ret;
+   if (!dev_priv->lrc_enabled) {
+   ret = i915_switch_context(ring,
+   ring->default_context);
+   if (ret)
+   return ret;
+   }
 
ret = intel_ring_idle(ring);
if (ret)
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
b/drivers/gpu/drm/i915/i915_gem_context.c
index d4c6863..bf6264a 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -532,10 +532,12 @@ int i915_gem_context_enable(struct drm_i915_private 
*dev_priv)
 
BUG_ON(!dev_priv->ring[RCS].default_context);
 
-   for_each_active_ring(ring, dev_priv, i) {
-   ret = i915_switch_context(ring, ring->default_context);
-   if (ret)
-   return ret;
+   if (!dev_priv->lrc_enabled) {
+   for_each_active_ring(ring, dev_priv, i) {
+   ret = i915_switch_context(ring, ring->default_context);
+   if (ret)
+   return ret;
+   }
}
 
return 0;
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index f7dad8c..9d17bd8 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1288,9 +1288,11 @@ i915_gem_do_execbuffer(struct drm_device *dev, void 
*data,
if (ret)
goto err;
 
-   ret = i915_switch_context(ring, ctx);
-   if (ret)
-   goto err;
+   if (!dev_priv->lrc_enabled) {
+   ret = i915_switch_context(ring, ctx);
+   if (ret)
+   goto err;
+   }
 
if (ring == &dev_priv->ring[RCS] &&
mode != dev_priv->relative_constants_mode) {
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c 
b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 9cd6ee8..94c1716 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -433,7 +433,10 @@ static void ring_write_tail(struct intel_engine *ring,
 static void gen8_submit_ctx(struct intel_engine *ring,
struct i915_hw_context *ctx, u32 value)
 {
-   DRM_ERROR("Execlists still not ready!\n");
+   if (WARN_ON(ctx == NULL))
+   ctx = ring->default_context;
+
+   gen8_switch_context_queue(ring, ctx, value);
 }
 
 u64 intel_ring_get_active_head(struct intel_engine *ring)
-- 
1.9.0



[Intel-gfx] [PATCH 40/50] drm/i915/bdw: Handle context switch events

2014-05-09 Thread oscar . mateo
From: Thomas Daniel 

Handle all context status events in the context status buffer on every
context switch interrupt. We only remove work from the execlist queue
after the context status buffer reports that the context has completed,
and we only attempt to schedule new contexts on interrupt when a
previously submitted context completes (unless no contexts are queued,
which means the GPU is free).
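
The CSB walk at the heart of this might look like the following sketch
(register offsets and bit names follow the BDW layout but are
assumptions here; gen8_switch_context_unqueue is the assumed name of
the helper that submits the next queued contexts):

	#define RING_CONTEXT_STATUS_BUF(ring)  ((ring)->mmio_base+0x370)
	#define RING_CONTEXT_STATUS_PTR(ring)  ((ring)->mmio_base+0x3a0)
	#define GEN8_CSB_ENTRIES 6

	void gen8_handle_context_events(struct intel_engine *ring)
	{
		struct drm_i915_private *dev_priv = ring->dev->dev_private;
		u32 status_ptr, status;
		u8 read_ptr, write_ptr;
		bool submit = false;

		status_ptr = I915_READ(RING_CONTEXT_STATUS_PTR(ring));
		read_ptr = ring->next_context_status_buffer; /* field assumed */
		write_ptr = status_ptr & 0x07;
		if (read_ptr > write_ptr)
			write_ptr += GEN8_CSB_ENTRIES;

		while (read_ptr < write_ptr) {
			read_ptr++;
			status = I915_READ(RING_CONTEXT_STATUS_BUF(ring) +
					   (read_ptr % GEN8_CSB_ENTRIES) * 8);
			if (status & GEN8_CTX_STATUS_COMPLETE) /* bit assumed */
				submit = true;
		}

		if (submit)
			gen8_switch_context_unqueue(ring);

		/* Update the read pointer so stale events are not re-read */
		ring->next_context_status_buffer = read_ptr % GEN8_CSB_ENTRIES;
		I915_WRITE(RING_CONTEXT_STATUS_PTR(ring),
			   ((u32)ring->next_context_status_buffer & 0x07) << 8);
	}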

Signed-off-by: Thomas Daniel 

v2: Unreferencing the context when we are freeing the request might free
the backing bo, which requires the struct_mutex to be grabbed, so defer
unreferencing and freeing to a bottom half.
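
A minimal sketch of that deferral (the work field is the one added to
drm_i915_gem_request below; the function name is hypothetical):

	static void gen8_free_request_work(struct work_struct *work)
	{
		struct drm_i915_gem_request *req =
			container_of(work, struct drm_i915_gem_request, work);
		struct drm_device *dev = req->ring->dev;

		/* Dropping the context reference can free the backing bo,
		 * which needs struct_mutex, so it cannot run in irq context */
		mutex_lock(&dev->struct_mutex);
		i915_gem_context_unreference(req->ctx);
		mutex_unlock(&dev->struct_mutex);

		kfree(req);
	}

	/* ...and from the interrupt path, once the request is retired
	 * from the execlist queue:
	 *	INIT_WORK(&req->work, gen8_free_request_work);
	 *	queue_work(dev_priv->wq, &req->work);
	 */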

v3:
- Ack the interrupt immediately, before trying to handle it (fix for
missing interrupts by Bob Beckett ).
- Update the Context Status Buffer Read Pointer, just in case (spotted
by Damien Lespiau).

Signed-off-by: Oscar Mateo 
---
 drivers/gpu/drm/i915/i915_drv.h |   3 +
 drivers/gpu/drm/i915/i915_irq.c |  38 +++-
 drivers/gpu/drm/i915/intel_lrc.c| 102 +++-
 drivers/gpu/drm/i915/intel_ringbuffer.c |   1 +
 drivers/gpu/drm/i915/intel_ringbuffer.h |   1 +
 5 files changed, 129 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index f2aae6a..07b8bdc 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1748,6 +1748,8 @@ struct drm_i915_gem_request {
 
/** execlist queue entry for this request */
struct list_head execlist_link;
+   /** Struct to handle this request in the bottom half of an interrupt */
+   struct work_struct work;
 };
 
 struct drm_i915_file_private {
@@ -2449,6 +2451,7 @@ static inline u32 intel_get_lr_contextid(struct 
drm_i915_gem_object *ctx_obj)
 int gen8_switch_context_queue(struct intel_engine *ring,
  struct i915_hw_context *to,
  u32 tail);
+void gen8_handle_context_events(struct intel_engine *ring);
 
 /* i915_gem_evict.c */
 int __must_check i915_gem_evict_something(struct drm_device *dev,
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index a28cf6c..fbffead 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1300,6 +1300,7 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_device 
*dev,
   struct drm_i915_private *dev_priv,
   u32 master_ctl)
 {
+   struct intel_engine *ring;
u32 rcs, bcs, vcs, vecs;
uint32_t tmp = 0;
irqreturn_t ret = IRQ_NONE;
@@ -1307,16 +1308,22 @@ static irqreturn_t gen8_gt_irq_handler(struct 
drm_device *dev,
if (master_ctl & (GEN8_GT_RCS_IRQ | GEN8_GT_BCS_IRQ)) {
tmp = I915_READ(GEN8_GT_IIR(0));
if (tmp) {
+   I915_WRITE(GEN8_GT_IIR(0), tmp);
ret = IRQ_HANDLED;
+
rcs = tmp >> GEN8_RCS_IRQ_SHIFT;
-   bcs = tmp >> GEN8_BCS_IRQ_SHIFT;
+   ring = &dev_priv->ring[RCS];
if (rcs & GT_RENDER_USER_INTERRUPT)
-   notify_ring(dev, &dev_priv->ring[RCS]);
+   notify_ring(dev, ring);
+   if (rcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
+   gen8_handle_context_events(ring);
+
+   bcs = tmp >> GEN8_BCS_IRQ_SHIFT;
+   ring = &dev_priv->ring[BCS];
if (bcs & GT_RENDER_USER_INTERRUPT)
-   notify_ring(dev, &dev_priv->ring[BCS]);
-   if ((rcs | bcs) & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
-   DRM_DEBUG_DRIVER("TODO: Context switch\n");
-   I915_WRITE(GEN8_GT_IIR(0), tmp);
+   notify_ring(dev, ring);
+   if (bcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
+   gen8_handle_context_events(ring);
} else
DRM_ERROR("The master control interrupt lied (GT0)!\n");
}
@@ -1324,18 +1331,20 @@ static irqreturn_t gen8_gt_irq_handler(struct 
drm_device *dev,
if (master_ctl & (GEN8_GT_VCS1_IRQ | GEN8_GT_VCS2_IRQ)) {
tmp = I915_READ(GEN8_GT_IIR(1));
if (tmp) {
+   I915_WRITE(GEN8_GT_IIR(1), tmp);
ret = IRQ_HANDLED;
vcs = tmp >> GEN8_VCS1_IRQ_SHIFT;
+   ring = &dev_priv->ring[VCS];
if (vcs & GT_RENDER_USER_INTERRUPT)
-   notify_ring(dev, &dev_priv->ring[VCS]);
+   notify_ring(dev, ring);
if (vcs & GEN8_GT_CONTEXT_SWITCH_INTERRUPT)
-   DRM_DEBUG_DRIVER("TODO: Context switch\n");
+   gen8_handle_context_events(ring);

[Intel-gfx] [PATCH 37/50] drm/i915/bdw: Don't write PDP in the legacy way when using LRCs

2014-05-09 Thread oscar . mateo
From: Oscar Mateo 

This is mostly for correctness, so that we know we are running the LR
context correctly (that is, the PDPs are contained inside the context
object).
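
For illustration, with LRCs the PDP roots are restored from the context
image itself; populating them when the image is created might look like
this sketch (the CTX_PDP* indices are assumptions, in the style of the
CTX_* offsets used elsewhere in the series):

	static void lr_context_write_pdps(uint32_t *reg_state,
					  struct i915_hw_ppgtt *ppgtt)
	{
		int i;

		/* One upper/lower dword pair per PDP root; "+1" points at
		 * the value slot after the register address, as with
		 * CTX_RING_TAIL. Indices are hypothetical. */
		for (i = 3; i >= 0; i--) {
			dma_addr_t addr = ppgtt->pd_dma_addr[i];

			reg_state[CTX_PDP0_UDW + i * 4 + 1] = upper_32_bits(addr);
			reg_state[CTX_PDP0_LDW + i * 4 + 1] = lower_32_bits(addr);
		}
	}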

Signed-off-by: Oscar Mateo 
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index a0993c0..de4a982 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -240,11 +240,15 @@ static int gen8_mm_switch(struct i915_hw_ppgtt *ppgtt,
  struct intel_engine *ring,
  bool synchronous)
 {
+   struct drm_i915_private *dev_priv = ring->dev->dev_private;
int i, ret;
 
/* bit of a hack to find the actual last used pd */
int used_pd = ppgtt->num_pd_entries / GEN8_PDES_PER_PAGE;
 
+   if (dev_priv->lrc_enabled)
+   return 0;
+
for (i = used_pd - 1; i >= 0; i--) {
dma_addr_t addr = ppgtt->pd_dma_addr[i];
ret = gen8_write_pdp(ring, i, addr, synchronous);
-- 
1.9.0



[Intel-gfx] [PATCH 36/50] drm/i915/bdw: Write the tail pointer, LRC style

2014-05-09 Thread oscar . mateo
From: Oscar Mateo 

Each logical ring context has the tail pointer in the context object,
so update it before submission.

Signed-off-by: Oscar Mateo 
---
 drivers/gpu/drm/i915/intel_lrc.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 54cbb4b..b06098e 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -158,6 +158,21 @@ static void submit_execlist(struct intel_engine *ring,
gen6_gt_force_wake_put(dev_priv, FORCEWAKE_ALL);
 }
 
+static int lr_context_write_tail(struct drm_i915_gem_object *ctx_obj, u32 tail)
+{
+   struct page *page;
+   uint32_t *reg_state;
+
+   page = i915_gem_object_get_page(ctx_obj, 1);
+   reg_state = kmap_atomic(page);
+
+   reg_state[CTX_RING_TAIL+1] = tail;
+
+   kunmap_atomic(reg_state);
+
+   return 0;
+}
+
 static int gen8_switch_context(struct intel_engine *ring,
struct i915_hw_context *to0, u32 tail0,
struct i915_hw_context *to1, u32 tail1)
@@ -169,10 +184,14 @@ static int gen8_switch_context(struct intel_engine *ring,
BUG_ON(!ctx_obj0);
BUG_ON(!i915_gem_obj_is_pinned(ctx_obj0));
 
+   lr_context_write_tail(ctx_obj0, tail0);
+
if (to1) {
ctx_obj1 = to1->engine[ring->id].obj;
BUG_ON(!ctx_obj1);
BUG_ON(!i915_gem_obj_is_pinned(ctx_obj1));
+
+   lr_context_write_tail(ctx_obj1, tail1);
}
 
submit_execlist(ring, ctx_obj0, ctx_obj1);
-- 
1.9.0


