Re: [Intel-gfx] [PATCH v2 14/24] drm/i915: Finish gen6/7 dynamic page table allocation

2015-01-13 Thread Michel Thierry

On 1/5/2015 2:45 PM, Daniel Vetter wrote:

On Tue, Dec 23, 2014 at 05:16:17PM +, Michel Thierry wrote:

From: Ben Widawsky benjamin.widaw...@intel.com

This patch continues on the idea from the previous patch. From here on,
in the steady state, PDEs are all pointing to the scratch page table (as
recommended in the spec). When an object is allocated in the VA range,
the code will determine if we need to allocate a page for the page
table. Similarly, when the object is destroyed, we will remove and free
the page table, pointing the PDE back to the scratch page.

Following patches will work to unify the code a bit as we bring in GEN8
support. GEN6 and GEN8 are different enough that I had a hard time
getting to this point with as much common code as I did.

The aliasing PPGTT must pre-allocate all of the page tables. There are a
few reasons for this. Two trivial ones: the aliasing ppgtt goes through the
ggtt paths, so it's hard to maintain; and we currently do not restore the
default context (assuming the previous force reload is indeed
necessary). Most importantly though, the only way (it seems from
empirical evidence) to invalidate the CS TLBs on non-render ring is to
either use ring sync (which requires actually stopping the rings in
order to synchronize when the sync completes vs. where you are in
execution), or to reload DCLV.  Since without full PPGTT we do not ever
reload the DCLV register, there is no good way to achieve this. The
simplest solution is just to not support dynamic page table
creation/destruction in the aliasing PPGTT.

We could always reload DCLV, but this seems like quite a bit of excess
overhead only to save at most 2MB-4k of memory for the aliasing PPGTT
page tables.
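
(For reference, the 2MB figure follows from the usual gen6/7 PPGTT layout,
assuming 512 PDEs with one 4 KiB page table each:

    512 PDEs x 4 KiB per page table = 2 MiB of page-table memory,
    minus the single 4 KiB scratch page table that is allocated anyway,

which is where the "at most 2MB - 4k" of potential savings comes from.)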

v2: Make the page table bitmap declared inside the function (Chris)
Simplify the way scratching address space works.
Move the alloc/teardown tracepoints up a level in the call stack so that
all implementations get the trace.

v3: Updated trace event to spit out a name

v4: Aliasing ppgtt is now initialized differently (in setup global gtt)

v5: Rebase to latest code. Also removed unnecessary aliasing ppgtt check for
trace, as it is no longer possible after the PPGTT cleanup patch series
of a couple of months ago (Daniel).

Cc: Daniel Vetter dan...@ffwll.ch
Signed-off-by: Ben Widawsky b...@bwidawsk.net
Signed-off-by: Michel Thierry michel.thie...@intel.com (v4+)

The tracepoints should be split into a separate patch. Although the
teardown stuff will likely disappear I guess ...

Two more comments below.
-Daniel


---
  drivers/gpu/drm/i915/i915_debugfs.c |   3 +-
  drivers/gpu/drm/i915/i915_gem.c |   2 +
  drivers/gpu/drm/i915/i915_gem_gtt.c | 128 
  drivers/gpu/drm/i915/i915_trace.h   | 115 
  4 files changed, 236 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
b/drivers/gpu/drm/i915/i915_debugfs.c
index 60f91bc..0f63076 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2149,6 +2149,8 @@ static void gen6_ppgtt_info(struct seq_file *m, struct 
drm_device *dev)
		seq_printf(m, "PP_DIR_BASE_READ: 0x%08x\n",
			   I915_READ(RING_PP_DIR_BASE_READ(ring)));
		seq_printf(m, "PP_DIR_DCLV: 0x%08x\n",
			   I915_READ(RING_PP_DIR_DCLV(ring)));
	}
+	seq_printf(m, "ECOCHK: 0x%08x\n\n", I915_READ(GAM_ECOCHK));
+
	if (dev_priv->mm.aliasing_ppgtt) {
		struct i915_hw_ppgtt *ppgtt = dev_priv->mm.aliasing_ppgtt;

@@ -2165,7 +2167,6 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)

			   get_pid_task(file->pid, PIDTYPE_PID)->comm);
		idr_for_each(file_priv->context_idr, per_file_ctx, m);
	}
-	seq_printf(m, "ECOCHK: 0x%08x\n", I915_READ(GAM_ECOCHK));
  }
  
  static int i915_ppgtt_info(struct seq_file *m, void *data)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 5d52990..1649fb2 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3599,6 +3599,8 @@ search_free:
  
  	/*  allocate before insert / bind */

	if (vma->vm->allocate_va_range) {
+		trace_i915_va_alloc(vma->vm, vma->node.start, vma->node.size,
+				    VM_TO_TRACE_NAME(vma->vm));
		ret = vma->vm->allocate_va_range(vma->vm,
						 vma->node.start,
						 vma->node.size);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 54c7ca7..32a355a 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1138,10 +1138,47 @@ static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt 
*ppgtt)
  static int gen6_alloc_va_range(struct i915_address_space *vm,
   uint64_t start, uint64_t length)
  {
+	DECLARE_BITMAP(new_page_tables, GEN6_PPGTT_PD_ENTRIES);

Re: [Intel-gfx] [PATCH v2 14/24] drm/i915: Finish gen6/7 dynamic page table allocation

2015-01-13 Thread Daniel Vetter
On Tue, Jan 13, 2015 at 11:53:22AM +, Michel Thierry wrote:
 On 1/5/2015 2:45 PM, Daniel Vetter wrote:
 Aside: Should we only allocate the scratch_pt for !aliasing?
 The next patch version will have the changes.
 About the scratch_pt, I'm not sure if it's a requirement in gen6/7 (to point
 unused page tables to the scratch, e.g. if there's less than 2GB).
 We know that's the case on gen8, and systems with less than 4GB must have
 the remaining PDPs set to the scratch page.

Assuming I understand things correctly we need the scratch_pt for
replacing the intermediate levels to collapse the tree a bit. Which is
useful both on gen7 and gen8+.

My question was to move the allocation into the full-ppgtt code (i.e.
i915.enable_ppgtt=2, not =1) since we don't really need this for aliasing
ppgtt mode. That should help a bit with readability since it makes it
clearer which things are used in which modes of the driver.
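
A minimal sketch of what that split could look like, purely to illustrate the
suggestion (alloc_pt_single() and scratch_pt are assumed names here, not the
actual patch code):

/* Sketch only: allocate the scratch page table just for full PPGTT
 * (i915.enable_ppgtt=2).  Aliasing PPGTT (=1) pre-allocates every page
 * table, so its PDEs never point at scratch and the allocation can be
 * skipped. */
static int gen6_alloc_scratch_pt(struct i915_hw_ppgtt *ppgtt, bool aliasing)
{
	if (aliasing)
		return 0;	/* all PDEs will get real page tables anyway */

	ppgtt->scratch_pt = alloc_pt_single(ppgtt->base.dev);
	return IS_ERR(ppgtt->scratch_pt) ? PTR_ERR(ppgtt->scratch_pt) : 0;
}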
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


Re: [Intel-gfx] [PATCH v2 14/24] drm/i915: Finish gen6/7 dynamic page table allocation

2015-01-05 Thread Daniel Vetter
On Tue, Dec 23, 2014 at 05:16:17PM +, Michel Thierry wrote:
 From: Ben Widawsky benjamin.widaw...@intel.com
 
 This patch continues on the idea from the previous patch. From here on,
 in the steady state, PDEs are all pointing to the scratch page table (as
 recommended in the spec). When an object is allocated in the VA range,
 the code will determine if we need to allocate a page for the page
 table. Similarly, when the object is destroyed, we will remove and free
 the page table, pointing the PDE back to the scratch page.
 
 Following patches will work to unify the code a bit as we bring in GEN8
 support. GEN6 and GEN8 are different enough that I had a hard time
 getting to this point with as much common code as I did.
 
 The aliasing PPGTT must pre-allocate all of the page tables. There are a
 few reasons for this. Two trivial ones: the aliasing ppgtt goes through the
 ggtt paths, so it's hard to maintain; and we currently do not restore the
 default context (assuming the previous force reload is indeed
 necessary). Most importantly though, the only way (it seems from
 empirical evidence) to invalidate the CS TLBs on non-render ring is to
 either use ring sync (which requires actually stopping the rings in
 order to synchronize when the sync completes vs. where you are in
 execution), or to reload DCLV.  Since without full PPGTT we do not ever
 reload the DCLV register, there is no good way to achieve this. The
 simplest solution is just to not support dynamic page table
 creation/destruction in the aliasing PPGTT.
 
 We could always reload DCLV, but this seems like quite a bit of excess
 overhead only to save at most 2MB-4k of memory for the aliasing PPGTT
 page tables.
 
 v2: Make the page table bitmap declared inside the function (Chris)
 Simplify the way scratching address space works.
 Move the alloc/teardown tracepoints up a level in the call stack so that
 all implementations get the trace.
 
 v3: Updated trace event to spit out a name
 
 v4: Aliasing ppgtt is now initialized differently (in setup global gtt)
 
 v5: Rebase to latest code. Also removed unnecessary aliasing ppgtt check for
 trace, as it is no longer possible after the PPGTT cleanup patch series
 of a couple of months ago (Daniel).
 
 Cc: Daniel Vetter dan...@ffwll.ch
 Signed-off-by: Ben Widawsky b...@bwidawsk.net
 Signed-off-by: Michel Thierry michel.thie...@intel.com (v4+)

The tracepoints should be split into a separate patch. Although the
teardown stuff will likely disappear I guess ...

Two more comments below.
-Daniel

 ---
  drivers/gpu/drm/i915/i915_debugfs.c |   3 +-
  drivers/gpu/drm/i915/i915_gem.c |   2 +
  drivers/gpu/drm/i915/i915_gem_gtt.c | 128 
 
  drivers/gpu/drm/i915/i915_trace.h   | 115 
  4 files changed, 236 insertions(+), 12 deletions(-)
 
 diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
 b/drivers/gpu/drm/i915/i915_debugfs.c
 index 60f91bc..0f63076 100644
 --- a/drivers/gpu/drm/i915/i915_debugfs.c
 +++ b/drivers/gpu/drm/i915/i915_debugfs.c
 @@ -2149,6 +2149,8 @@ static void gen6_ppgtt_info(struct seq_file *m, struct 
 drm_device *dev)
 		seq_printf(m, "PP_DIR_BASE_READ: 0x%08x\n",
 			   I915_READ(RING_PP_DIR_BASE_READ(ring)));
 		seq_printf(m, "PP_DIR_DCLV: 0x%08x\n",
 			   I915_READ(RING_PP_DIR_DCLV(ring)));
 	}
 +	seq_printf(m, "ECOCHK: 0x%08x\n\n", I915_READ(GAM_ECOCHK));
 +
 	if (dev_priv->mm.aliasing_ppgtt) {
 		struct i915_hw_ppgtt *ppgtt = dev_priv->mm.aliasing_ppgtt;

 @@ -2165,7 +2167,6 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
 			   get_pid_task(file->pid, PIDTYPE_PID)->comm);
 		idr_for_each(file_priv->context_idr, per_file_ctx, m);
 	}
 -	seq_printf(m, "ECOCHK: 0x%08x\n", I915_READ(GAM_ECOCHK));
  }
  
  static int i915_ppgtt_info(struct seq_file *m, void *data)
 diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
 index 5d52990..1649fb2 100644
 --- a/drivers/gpu/drm/i915/i915_gem.c
 +++ b/drivers/gpu/drm/i915/i915_gem.c
 @@ -3599,6 +3599,8 @@ search_free:
  
   /*  allocate before insert / bind */
 	if (vma->vm->allocate_va_range) {
 +		trace_i915_va_alloc(vma->vm, vma->node.start, vma->node.size,
 +				    VM_TO_TRACE_NAME(vma->vm));
 		ret = vma->vm->allocate_va_range(vma->vm,
 						 vma->node.start,
 						 vma->node.size);
 diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
 b/drivers/gpu/drm/i915/i915_gem_gtt.c
 index 54c7ca7..32a355a 100644
 --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
 +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
 @@ -1138,10 +1138,47 @@ static void gen6_ppgtt_unmap_pages(struct 
 i915_hw_ppgtt *ppgtt)
  static int gen6_alloc_va_range(struct i915_address_space *vm,
  uint64_t start, uint64_t length)
  {
 +	DECLARE_BITMAP(new_page_tables, GEN6_PPGTT_PD_ENTRIES);

[Intel-gfx] [PATCH v2 14/24] drm/i915: Finish gen6/7 dynamic page table allocation

2014-12-23 Thread Michel Thierry
From: Ben Widawsky benjamin.widaw...@intel.com

This patch continues on the idea from the previous patch. From here on,
in the steady state, PDEs are all pointing to the scratch page table (as
recommended in the spec). When an object is allocated in the VA range,
the code will determine if we need to allocate a page for the page
table. Similarly, when the object is destroyed, we will remove and free
the page table, pointing the PDE back to the scratch page.
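
In code terms, the steady state described above amounts to something like the
following (a sketch with assumed names — pd.page_tables[], scratch_pt,
alloc_pt_single(), unmap_and_free_pt() — not the literal patch code):

/* Sketch only: unused PDEs reference the scratch page table; a real page
 * table is materialised on demand when an object is bound into that slice
 * of the VA range, and torn down again (PDE pointed back at scratch) when
 * it is no longer needed. */
static int pde_make_real(struct i915_hw_ppgtt *ppgtt, unsigned int pde)
{
	if (ppgtt->pd.page_tables[pde] != ppgtt->scratch_pt)
		return 0;			/* already backed by a real PT */

	ppgtt->pd.page_tables[pde] = alloc_pt_single(ppgtt->base.dev);
	if (IS_ERR(ppgtt->pd.page_tables[pde])) {
		ppgtt->pd.page_tables[pde] = ppgtt->scratch_pt;
		return -ENOMEM;
	}
	return 0;
}

static void pde_make_scratch(struct i915_hw_ppgtt *ppgtt, unsigned int pde)
{
	struct i915_pagetab *pt = ppgtt->pd.page_tables[pde];

	if (pt == ppgtt->scratch_pt)
		return;				/* nothing was ever allocated */

	ppgtt->pd.page_tables[pde] = ppgtt->scratch_pt;	/* PDE back to scratch */
	unmap_and_free_pt(pt, ppgtt->base.dev);
}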

Following patches will work to unify the code a bit as we bring in GEN8
support. GEN6 and GEN8 are different enough that I had a hard time
getting to this point with as much common code as I did.

The aliasing PPGTT must pre-allocate all of the page tables. There are a
few reasons for this. Two trivial ones: the aliasing ppgtt goes through the
ggtt paths, so it's hard to maintain; and we currently do not restore the
default context (assuming the previous force reload is indeed
necessary). Most importantly though, the only way (it seems from
empirical evidence) to invalidate the CS TLBs on non-render ring is to
either use ring sync (which requires actually stopping the rings in
order to synchronize when the sync completes vs. where you are in
execution), or to reload DCLV.  Since without full PPGTT we do not ever
reload the DCLV register, there is no good way to achieve this. The
simplest solution is just to not support dynamic page table
creation/destruction in the aliasing PPGTT.

We could always reload DCLV, but this seems like quite a bit of excess
overhead only to save at most 2MB-4k of memory for the aliasing PPGTT
page tables.

v2: Make the page table bitmap declared inside the function (Chris)
Simplify the way scratching address space works.
Move the alloc/teardown tracepoints up a level in the call stack so that
all implementations get the trace.

v3: Updated trace event to spit out a name

v4: Aliasing ppgtt is now initialized differently (in setup global gtt)

v5: Rebase to latest code. Also removed unnecessary aliasing ppgtt check for
trace, as it is no longer possible after the PPGTT cleanup patch series
of a couple of months ago (Daniel).

Cc: Daniel Vetter dan...@ffwll.ch
Signed-off-by: Ben Widawsky b...@bwidawsk.net
Signed-off-by: Michel Thierry michel.thie...@intel.com (v4+)
---
 drivers/gpu/drm/i915/i915_debugfs.c |   3 +-
 drivers/gpu/drm/i915/i915_gem.c |   2 +
 drivers/gpu/drm/i915/i915_gem_gtt.c | 128 
 drivers/gpu/drm/i915/i915_trace.h   | 115 
 4 files changed, 236 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
b/drivers/gpu/drm/i915/i915_debugfs.c
index 60f91bc..0f63076 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2149,6 +2149,8 @@ static void gen6_ppgtt_info(struct seq_file *m, struct 
drm_device *dev)
		seq_printf(m, "PP_DIR_BASE_READ: 0x%08x\n",
			   I915_READ(RING_PP_DIR_BASE_READ(ring)));
		seq_printf(m, "PP_DIR_DCLV: 0x%08x\n",
			   I915_READ(RING_PP_DIR_DCLV(ring)));
	}
+	seq_printf(m, "ECOCHK: 0x%08x\n\n", I915_READ(GAM_ECOCHK));
+
	if (dev_priv->mm.aliasing_ppgtt) {
		struct i915_hw_ppgtt *ppgtt = dev_priv->mm.aliasing_ppgtt;

@@ -2165,7 +2167,6 @@ static void gen6_ppgtt_info(struct seq_file *m, struct drm_device *dev)
			   get_pid_task(file->pid, PIDTYPE_PID)->comm);
		idr_for_each(file_priv->context_idr, per_file_ctx, m);
	}
-	seq_printf(m, "ECOCHK: 0x%08x\n", I915_READ(GAM_ECOCHK));
 }
 
 static int i915_ppgtt_info(struct seq_file *m, void *data)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 5d52990..1649fb2 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3599,6 +3599,8 @@ search_free:
 
/*  allocate before insert / bind */
	if (vma->vm->allocate_va_range) {
+		trace_i915_va_alloc(vma->vm, vma->node.start, vma->node.size,
+				    VM_TO_TRACE_NAME(vma->vm));
		ret = vma->vm->allocate_va_range(vma->vm,
						 vma->node.start,
						 vma->node.size);
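
The i915_trace.h hunk is not included in this excerpt; very roughly, the new
trace event used above could look something like the following. This is a
sketch only — the field layout and the VM_TO_TRACE_NAME definition are
assumptions, not the actual 115-line hunk:

/* Sketch only: plausible shape of the i915_va_alloc tracepoint used in the
 * i915_gem.c hunk above. */
#define VM_TO_TRACE_NAME(vm) \
	(i915_is_ggtt(vm) ? "G" : "P")	/* global vs. per-process address space */

TRACE_EVENT(i915_va_alloc,
	TP_PROTO(struct i915_address_space *vm, u64 start, u64 length,
		 const char *name),
	TP_ARGS(vm, start, length, name),

	TP_STRUCT__entry(
		__field(struct i915_address_space *, vm)
		__field(u64, start)
		__field(u64, end)
		__string(name, name)
	),

	TP_fast_assign(
		__entry->vm = vm;
		__entry->start = start;
		__entry->end = start + length;
		__assign_str(name, name);
	),

	TP_printk("vm=%p (%s), 0x%llx-0x%llx",
		  __entry->vm, __get_str(name), __entry->start, __entry->end)
);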
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 54c7ca7..32a355a 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1138,10 +1138,47 @@ static void gen6_ppgtt_unmap_pages(struct i915_hw_ppgtt 
*ppgtt)
 static int gen6_alloc_va_range(struct i915_address_space *vm,
   uint64_t start, uint64_t length)
 {
+   DECLARE_BITMAP(new_page_tables, GEN6_PPGTT_PD_ENTRIES);
+	struct drm_device *dev = vm->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
struct i915_hw_ppgtt *ppgtt =
container_of(vm, struct i915_hw_ppgtt, base);
struct i915_pagetab
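
The quoted hunk is cut off at this point in the archive. Based on the commit
message and the new_page_tables bitmap declared above, the allocation path
presumably walks the PDEs covered by [start, start + length), allocates a real
page table wherever the PDE still points at the scratch page table, and records
it in the bitmap so a failure can be unwound. A sketch along those lines, with
assumed helpers (alloc_pt_single(), pd.page_tables[]) rather than the actual
patch body:

	uint32_t pde;

	bitmap_zero(new_page_tables, GEN6_PPGTT_PD_ENTRIES);

	/* Sketch: materialise a page table for every PDE in the range that is
	 * still backed by scratch, remembering the new ones for error unwind. */
	for (pde = start >> GEN6_PDE_SHIFT;
	     pde <= (start + length - 1) >> GEN6_PDE_SHIFT; pde++) {
		struct i915_pagetab *pt = ppgtt->pd.page_tables[pde];

		if (pt != ppgtt->scratch_pt)
			continue;		/* this PDE already has a real page table */

		pt = alloc_pt_single(dev);	/* assumed helper */
		if (IS_ERR(pt)) {
			/* unwind: free only the page tables recorded in
			 * new_page_tables and point their PDEs back at scratch */
			return PTR_ERR(pt);
		}

		ppgtt->pd.page_tables[pde] = pt;
		__set_bit(pde, new_page_tables);
	}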