Re: intel hw and caching interface to TTM..

2007-10-31 Thread Thomas Hellström
Dave Airlie wrote:

  
  

OK. We're using this functionality in Poulsbo, so we should probably
sort this out to avoid breaking things.



Okay I'll try and fix it back up tomorrow.. 

  


I've attached my first attempt at fixing this up, but it messes up: by the 
time the flags hit the ttm binding function they've lost the CACHED_MAPPED 
bit and I end up with snooped bindings. drm_bo_mt_compatible seems to be 
where it disappears, and also where my sanity started to end.. are you sure 
there isn't a nicer way of doing those flags/masks? The current 
system is vastly confusing to trace through..

Can you suggest what I'm missing to fix this up?

  

Dave, as a general rule, bo_mem->mask is the request and bo_mem->flags 
is the code's response to that request. The move code deals with 
DRM_BO_MASK_MEMTYPE, whereas the rest of the flags (~DRM_BO_MASK_MEMTYPE) 
are transferred from the mask in drm_buffer_object_validate, which is 
after the move and bind.

So I guess you should extend DRM_BO_MASK_MEMTYPE and do a quick sanity 
check (perhaps just a passthrough) in drm_bo_mt_compatible. That should 
do the trick.

That said, it's probably possible to do this in a nicer way, yes.

Dave.




Re: intel hw and caching interface to TTM..

2007-10-31 Thread Dave Airlie


  OK. We're using this functionality in Poulsbo, so we should probably
  sort this out to avoid breaking things.
  
 
  Okay I'll try and fix it back up tomorrow.. 
 

I've attached my first attempt at fixing this up, but it messes up: by the 
time the flags hit the ttm binding function they've lost the CACHED_MAPPED 
bit and I end up with snooped bindings. drm_bo_mt_compatible seems to be 
where it disappears, and also where my sanity started to end.. are you sure 
there isn't a nicer way of doing those flags/masks? The current 
system is vastly confusing to trace through..

Can you suggest what I'm missing to fix this up?

Dave.

diff --git a/linux-core/drm_agpsupport.c b/linux-core/drm_agpsupport.c
index 8c7f570..4015659 100644
--- a/linux-core/drm_agpsupport.c
+++ b/linux-core/drm_agpsupport.c
@@ -541,11 +541,17 @@ static int drm_agp_bind_ttm(struct drm_ttm_backend *backend,
 		container_of(backend, struct drm_agp_ttm_backend, backend);
 	DRM_AGP_MEM *mem = agp_be->mem;
 	int ret;
+	int snooped = (bo_mem->flags & DRM_BO_FLAG_CACHED) && !(bo_mem->flags & DRM_BO_FLAG_CACHED_MAPPED);
 
 	DRM_DEBUG("drm_agp_bind_ttm\n");
 	mem->is_flushed = TRUE;
-	mem->type = (bo_mem->flags & DRM_BO_FLAG_CACHED) ? AGP_USER_CACHED_MEMORY :
-		AGP_USER_MEMORY;
+	mem->type = AGP_USER_MEMORY;
+	/* CACHED_MAPPED implies not snooped memory */
+	if (snooped) {
+		DRM_ERROR("snooped is set oh no 0x%016llx\n", (unsigned long long)bo_mem->flags);
+		mem->type = AGP_USER_CACHED_MEMORY;
+	}
+
 	ret = drm_agp_bind_memory(mem, bo_mem->mm_node->start);
 	if (ret) {
 		DRM_ERROR("AGP Bind memory failed\n");
diff --git a/linux-core/drm_bo.c b/linux-core/drm_bo.c
index 16203c7..05b9f5a 100644
--- a/linux-core/drm_bo.c
+++ b/linux-core/drm_bo.c
@@ -1029,7 +1029,7 @@ static int drm_bo_busy(struct drm_buffer_object * bo)
 	return 0;
 }
 
-static int drm_bo_read_cached(struct drm_buffer_object * bo)
+static int drm_bo_evict_cached(struct drm_buffer_object * bo)
 {
 	int ret = 0;
 
@@ -1177,15 +1177,11 @@ static int drm_buffer_object_map(struct drm_file *file_priv, uint32_t handle,
 			goto out;
 		}
 
-		if ((map_flags & DRM_BO_FLAG_READ) &&
-		    (bo->mem.flags & DRM_BO_FLAG_READ_CACHED) &&
-		    (!(bo->mem.flags & DRM_BO_FLAG_CACHED))) {
-			drm_bo_read_cached(bo);
-		}
+		if (bo->mem.flags & DRM_BO_FLAG_CACHED_MAPPED)
+			drm_bo_evict_cached(bo);
+
 		break;
-	} else if ((map_flags & DRM_BO_FLAG_READ) &&
-		   (bo->mem.flags & DRM_BO_FLAG_READ_CACHED) &&
-		   (!(bo->mem.flags & DRM_BO_FLAG_CACHED))) {
+	} else if (bo->mem.flags & DRM_BO_FLAG_CACHED_MAPPED) {
 
 		/*
 		 * We are already mapped with different flags.
@@ -1292,6 +1288,7 @@ int drm_bo_move_buffer(struct drm_buffer_object * bo, uint64_t new_mem_flags,
 	if (ret)
 		return ret;
 
+	DRM_DEBUG("new mem flags 0x%016llx\n", new_mem_flags);
 	mem.num_pages = bo->num_pages;
 	mem.size = mem.num_pages << PAGE_SHIFT;
 	mem.mask = new_mem_flags;
@@ -1666,7 +1663,6 @@ int drm_buffer_object_create(struct drm_device *dev,
 		DRM_BO_FLAG_MAPPABLE;
 	atomic_inc(&bm->count);
 	ret = drm_bo_new_mask(bo, mask, hint);
-
 	if (ret)
 		goto out_err;
 
diff --git a/linux-core/drm_ttm.c b/linux-core/drm_ttm.c
index df9e7e4..83aa28a 100644
--- a/linux-core/drm_ttm.c
+++ b/linux-core/drm_ttm.c
@@ -327,9 +327,10 @@ int drm_bind_ttm(struct drm_ttm * ttm, struct drm_bo_mem_reg *bo_mem)
 	if (ret)
 		return ret;
 
+	DRM_DEBUG("bo->mem flags is 0x%016llx\n", bo_mem->flags);
 	if (ttm->state == ttm_unbound && !(bo_mem->flags & DRM_BO_FLAG_CACHED)) {
 		drm_set_caching(ttm, DRM_TTM_PAGE_UNCACHED);
-	} else if ((bo_mem->flags & DRM_BO_FLAG_CACHED) &&
+	} else if ((bo_mem->flags & DRM_BO_FLAG_CACHED_MAPPED) &&
 		   bo_driver->ttm_cache_flush)
 		bo_driver->ttm_cache_flush(ttm);
 
diff --git a/shared-core/drm.h b/shared-core/drm.h
index 3a10273..9d00e9f 100644
--- a/shared-core/drm.h
+++ b/shared-core/drm.h
@@ -700,10 +700,12 @@ struct drm_fence_arg {
  */
 #define DRM_BO_FLAG_NO_MOVE     (1ULL << 8)
 
-/* Mask: Make sure the buffer is in cached memory when mapped for reading.
+/* Mask: Make sure the buffer is in cached memory when mapped.
  * Flags: Acknowledge.
+ * Buffers allocated with this flag should not be used for suballocators.
  */
-#define DRM_BO_FLAG_READ_CACHED    (1ULL << 19)
+#define DRM_BO_FLAG_CACHED_MAPPED  (1ULL << 19)
+
 
 /* Mask: Force DRM_BO_FLAG_CACHED flag strictly also if it is set.
 * 

Re: intel hw and caching interface to TTM..

2007-10-30 Thread Thomas Hellström
Dave Airlie wrote:

Hi,

So currently the TTM interface allows the user to specify a cacheable 
allocation, and on Intel hardware this gets conflated with using the Intel 
snooped memory type in the GART. This is bad, as the Intel snooped memory 
type comes with its own set of special rules and sucks for lots of things.

However I want to be able to use the cacheable CPU memory with the GPU to
a) make things go fast
b) avoid lots of SMP cross-talking and cache flushing (see a)
c) make buffer object creation faster
  

Cool.

So this led me to the patch at:
http://cgit.freedesktop.org/~airlied/drm/diff/?h=intel-hackeryid=da14a0bbb8849cdc91ca87786fde90ac36fe1198

I could add back the snooped option, if we want, using a driver-private 
flag.
  

Dave, I'd like to see the flag DRM_BO_FLAG_CACHED really mean 
cache-coherent memory, that is, cache coherent also while visible to the 
GPU. There are HW implementations out there (Poulsbo at least) where 
this option actually seems to work, although it's considerably slower 
for things like texturing. It's also a requirement for user bo's, since 
they will have VMAs that we can't kill and remap.

Could we perhaps change the flag DRM_BO_FLAG_READ_CACHED to mean 
DRM_BO_FLAG_MAPPED_CACHED to implement the behaviour you describe? This 
will also indicate that the buffer cannot be used for user-space 
sub-allocators, as we in that case must be able to guarantee that the 
CPU can access parts of the buffer while other parts are validated for 
the GPU.

This patch evicts the buffer on mapping so the GPU doesn't see anything 
via the aperture or otherwise, and flushes before validating into the 
aperture. It doesn't contain the chipset flush patch yet, which is required 
to actually make it work (add agp_chipset_flush to i915_dma.c before 
submitting the batchbuffer).
  

OK.

This works and appears to be nice and fast; all userspace buffers can be 
allocated _LOCAL | _CACHED and validated to _TT later without any major 
cache flushing overhead when we have clflush, and without SMP overhead at 
all, as cache flushing is cache coherent on the Intel chipsets I've played 
with so far (CPU coherent - not GPU).
  

Does this mean that clflush() on one processor flushes the cache line on 
all processors in an SMP system? No need for preemption guarding and IPIs?

I of course need to make this code not so x86 specific, so I might add a 
page flush hook to the driver interface and put the flushing code on the 
driver side.

This also leads me into backwards compatibility; the chipset flushing 
changes to AGP are required for all of this good stuff. Options are:
1) resurrect linux-agp-compat and add chipset flushing code - easier
2) try and hack chipset flushing into drm_compat.c - probably more 
difficult than I would like to bother with..

So comments please on whether a comeback for linux-agp-compat is a good or 
bad thing..

  

Perhaps an intel-specific TTM backend?

Dave.


  

/Thomas



-
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now  http://get.splunk.com/
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: intel hw and caching interface to TTM..

2007-10-30 Thread Dave Airlie

 Dave, I'd like to see the flag DRM_BO_FLAG_CACHED really mean cache-coherent
 memory, that is cache coherent also while visible to the GPU. There are HW
 implementations out there (Poulsbo at least) where this option actually seems
 to work, although it's considerably slower for things like texturing. It's
 also a requirement for user bo's since they will have VMAs that we can't kill
 and remap.

Most PCIE cards will be cache coherent, however AGP cards not so much, so we 
need to think about whether a generic _CACHED makes sense, especially for 
something like radeon. Will I have to pass different flags depending on the 
GART type? This seems like uggh.. so maybe a separate flag makes more 
sense..

 
 Could we perhaps change the flag DRM_BO_FLAG_READ_CACHED to mean
 DRM_BO_FLAG_MAPPED_CACHED to implement the behaviour you describe. This will
 also indicate that the buffer cannot be used for user-space sub-allocators, as
 we in that case must be able to guarantee that the CPU can access parts of the
 buffer while other parts are validated for the GPU.

Yes, to be honest sub-allocators for most use-cases should be avoided if 
possible; we should be able to make the kernel interface fast enough for 
most things if we don't have to switch caching flags on the fly at 
map/destroy etc.. 
 
 Does this mean that clflush() on one processor flushes the cache line on all
 processors in an SMP system? No need for preemption guarding and IPIs?

Yes, this is specified to do this; TLB flushing would require an IPI, but we 
should mostly be able to avoid doing TLB flushes from what I can see. The 
problem I was seeing on i9xx was that the chipset had its own Global Write 
Buffer which wasn't that well documented.. so stuff could hide in it..

  So comments please on whether a comeback for linux-agp-compat is a good or
  bad thing..
  
   
 Perhaps an intel-specific TTM backend?

Actually I did a backport into an i915_compat.c and it seems like I can 
make that work.

However something in the changes you made in your branch broke something 
on me, so I'll have to spend tomorrow rebasing my tree with your patches 
to see what went wrong; something doesn't appear to be getting flushed at 
the right time.

Dave.



Re: intel hw and caching interface to TTM..

2007-10-30 Thread Keith Whitwell
Dave Airlie wrote:
 Dave, I'd like to see the flag DRM_BO_FLAG_CACHED really mean cache-coherent
 memory, that is cache coherent also while visible to the GPU. There are HW
 implementations out there (Poulsbo at least) where this option actually seems
 to work, although it's considerably slower for things like texturing. It's
 also a requirement for user bo's since they will have VMAs that we can't kill
 and remap.
 
 Most PCIE cards will be cache coherent, however AGP cards not so much, so 
 need to think if a generic _CACHED makes sense especially for something 
 like radeon, will I have to pass different flags depending on the GART 
 type this seems like uggh.. so maybe a separate flag makes more 
 sense..
 
 Could we perhaps change the flag DRM_BO_FLAG_READ_CACHED to mean
 DRM_BO_FLAG_MAPPED_CACHED to implement the behaviour you describe. This will
 also indicate that the buffer cannot be used for user-space sub-allocators, 
 as
 we in that case must be able to guarantee that the CPU can access parts of 
 the
 buffer while other parts are validated for the GPU.
 
 Yes, to be honest sub-allocators for most use-cases should be avoided if 
 possible; we should be able to make the kernel interface fast enough for 
 most things if we don't have to switch caching flags on the fly at 
 map/destroy etc.. 

Hmm - if that was true why do we have malloc() and friends - aren't they 
just sub-allocators for brk() and mmap()?

There is more to this than performance - applications out there can 
allocate extraordinarily large numbers of small textures, which can only 
sanely be dealt with as light-weight userspace suballocations of a 
sensible-sized buffer.  (We don't do this yet, but will need to at some 
point!)  The reasons for this are granularity (ie wasted space in the 
allocation), the memory overhead of managing all these allocations, and 
perhaps, thirdly, performance.

If you think about what goes on in a 3d driver, you are always doing 
sub-allocations of some sort or another, though that's more obvious when 
you start doing state objects that have an independent lifecycle as 
opposed to just emitting state linearly into a command buffer.  For 
managing objects of a few dozen bytes, obviously you are going to want 
to do that in userspace.

So there is a continuum where successively larger buffers increasingly 
justify whatever additional cost there is to go directly to the kernel 
to allocate them.  But for sufficiently small or frequently allocated 
buffers, there will always be a crossover point where it is faster to do 
it in userspace.

It certainly makes sense to speed up the kernel paths, but that won't 
make the crossover point go away - it'll just shift it more or less 
depending on how successful you are.

Keith





Re: intel hw and caching interface to TTM..

2007-10-30 Thread Thomas Hellström
Dave Airlie wrote:
 Dave, I'd like to see the flag DRM_BO_FLAG_CACHED really mean cache-coherent
 memory, that is cache coherent also while visible to the GPU. There are HW
 implementations out there (Poulsbo at least) where this option actually seems
 to work, although it's considerably slower for things like texturing. It's
 also a requirement for user bo's since they will have VMAs that we can't kill
 and remap.
 

 Most PCIE cards will be cache coherent, however AGP cards not so much, so 
 need to think if a generic _CACHED makes sense especially for something 
 like radeon, will I have to pass different flags depending on the GART 
 type this seems like uggh.. so maybe a separate flag makes more 
 sense..

   
OK. We're using this functionality in Poulsbo, so we should probably
sort this out to avoid breaking things.
 Could we perhaps change the flag DRM_BO_FLAG_READ_CACHED to mean
 DRM_BO_FLAG_MAPPED_CACHED to implement the behaviour you describe. This will
 also indicate that the buffer cannot be used for user-space sub-allocators, 
 as
 we in that case must be able to guarantee that the CPU can access parts of 
 the
 buffer while other parts are validated for the GPU.
 

 Yes, to be honest sub-allocators for most use-cases should be avoided if 
 possible; we should be able to make the kernel interface fast enough for 
 most things if we don't have to switch caching flags on the fly at 
 map/destroy etc.. 
   
Yes, Eric seems to have the same opinion. I'm not quite sure I 
understand the reasoning behind it. Is it the added complexity, or 
something else?

While it's super to have a fast kernel interface, the inherent latency 
and allocation granularity will probably always make a user-space 
sub-allocator a desirable thing. Particularly something like a slab 
allocator that would also to some extent avoid fragmentation.

My view of TTM has changed to be a bit from the opposite side: let's say 
we have a fast user-space per-client allocator. What kernel functionality 
would we require to make sure that it can assume it's the sole owner of 
the memory it manages?

For a repeated usage pattern like batch-buffers we end up allocating 
pages from the kernel, setting up one VMA per buffer, modifying GART and 
page tables, and in the worst case even changing caching policy for each 
and every use. Even if this can be made reasonably fast, I think it's a 
CPU overhead we really shouldn't be paying.

/Thomas












Re: intel hw and caching interface to TTM..

2007-10-30 Thread Dave Airlie


 OK. We're using this functionality in Poulsbo, so we should probably
 sort this out to avoid breaking things.

Okay I'll try and fix it back up tomorrow.. 

 Yes, Eric seems to have the same opinion. I'm not quite sure I understand the
 reasoning behind it.
 Is it the added complexity or something else?
 
 While it's super to have a fast kernel interface, the inherent latency and
 allocation granularity
 will probably always make a user-space sub-allocator a desirable thing.
 Particularly something like a slab
 allocator that would also to some extent avoid fragmentation.
 
 My view of TTM has changed to be a bit from the opposite side:
 Let's say we have a fast user-space per-client allocator.
 What kernel functionality would we require to make sure that it
 can assume it's the sole owner of the memory it manages?

We have slightly different use-cases; I'm probably more targeting 3D 
desktop uses, where sharing buffers is more important (think pixmaps etc), 
so making the allocator go as fast as possible makes sense for this case, 
as we need to have it in the kernel to do the sharing ...

 For a repeated usage pattern like batch-buffers we end up allocating pages
 from the kernel,
 setting up one VMA per buffer, modifying  gart- and page tables and in the
 worst case even caching policy for each and every use.
 Even if this can be made reasonably fast, I think it's a CPU overhead we
 really shouldn't be paying??

Or we end up allocating a large amount of space to store on-the-fly 
buffers, which requires more tuning per application; some apps may not 
need 65k of batchbuffer space etc..

So while I see the need for suballocators (we need one for glyphs and 
small pixmaps on the 2D side in any case) I also see the need to make 
things faster for the non-sub-allocated use case...

So I don't think the goals of 3D driver vs 3D desktop are mutually 
exclusive; I think your work has gone too far towards the single app case 
and our work is trying to pull it back towards our use case..

Dave.



Re: intel hw and caching interface to TTM..

2007-10-30 Thread Thomas Hellström
Dave Airlie wrote:
   
   
 OK. We're using this functionality in Poulsbo, so we should probably
 sort this out to avoid breaking things.
 

 Okay I'll try and fix it back up tomorrow.. 

   
 Yes, Eric seems to have the same opinion. I'm not quite sure I understand the
 reasoning behind it.
 Is it the added complexity or something else?

 While it's super to have a fast kernel interface, the inherent latency and
 allocation granularity
 will probably always make a user-space sub-allocator a desirable thing.
 Particularly something like a slab
 allocator that would also to some extent avoid fragmentation.

 My view of TTM has changed to be a bit from the opposite side:
 Let's say we have a fast user-space per-client allocator.
 What kernel functionality would we require to make sure that it
 can assume it's the sole owner of the memory it manages?
 

 We have slightly different use-cases, I'm probably more targeting 3d 
 desktop uses where sharing buffers is more important (think pixmaps etc) 
 so making the allocator go as fast as possible make sense for this case as 
 we need to have it in the kernel to do the sharing ...
   
Agreed.
   
 For a repeated usage pattern like batch-buffers we end up allocating pages
 from the kernel,
 setting up one VMA per buffer, modifying  gart- and page tables and in the
 worst case even caching policy for each and every use.
 Even if this can be made reasonably fast, I think it's a CPU overhead we
 really shouldn't be paying??
 

 Or we end up allocating a large amount of space to store on-the-fly 
 buffers, which requires more tuning per application, some apps may not 
 need 65k of batchbuffer space etc..

 So while I see the need for suballocators (we need one for glyphs and 
 small pixmaps on 2D side in any case) I also see the need to make things 
 faster for the non-sub-allocated use case...

 So I don't think the goals of 3D driver vs 3D desktop are mutually 
 exclusive, I think your work has gone too far towards the single app case 
 and our work is trying to pull it back towards our use case..
   
Indeed. There are certainly different use-cases. In some cases a 
sub-allocator is beneficial, and in some cases we gain more by not using 
them. I had a concern that they would be considered generally useless and 
that the needed kernel support would go away.

/Thomas


 Dave.
   



