Re: [Mesa3d-dev] Memory management - an AGP manager

2006-01-09 Thread Keith Whitwell

Roland Scheidegger wrote:

Keith Whitwell wrote:


Right now, I'm primarily concerned with unified memory chipsets, like
i915 and via. This memory manager would be suitable for managing the
AGP memory on non-unified chipsets, but a different implementation
would be needed for the on-card video ram, based more on dma and
copying than map/unmapping as will be seen below.


I'm not sure I quite understand that, do you propose completely separate 
managers for handling agp and video ram?


You're right, I had been thinking as if the two managers could be made disjoint.
I guess there will need to be either a mechanism for multiple backends
behind the common API, or else the future Video RAM manager I refer to will
actually be an extension of the AGP manager I propose, one that also handles
Video RAM.


In any case, I feel that there are benefits in tackling the AGP-only case first:
getting something up and running in a simplified environment to
validate aspects of the design before tackling the full problem.


Keith



---
This SF.net email is sponsored by: Splunk Inc. Do you grep through log files
for problems?  Stop!  Download the new AJAX search engine that makes
searching your log files as easy as surfing the  web.  DOWNLOAD SPLUNK!
http://ads.osdn.com/?ad_id=7637alloc_id=16865op=click
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: Memory management - an AGP manager

2006-01-09 Thread Ian Romanick

Benjamin Herrenschmidt wrote:

 - Caching issues. On a lot of chipsets, AGP memory must be mapped
 non-cacheable. This isn't trivial on all architectures and it's not
 always feasible to do with userland buffers. That means that either the
 cache must be flushed at the time of the mapping _and_ the buffer not
 touched at all by the CPU until it's unmapped, or the map call must
 change the userland mapping to the buffer to mark it uncached. The
 problems of course start popping up if this buffer happens to be shared
 between multiple processes... also, that user memory will also be mapped
 in the kernel as part of the kernel's linear mapping, which is
 cacheable. Thus you'll end up with pages mapped both cacheable and
 non-cacheable in different contexts. This is a good way to cause
 checkstops with a number of CPUs (certainly with PowerPC and I think
 also with AMD's). (At this point, for those who didn't figure out yet
 that AGP was just a piece of crap in the first place, welcome to the
 real world)

There must be some way to deal with all this sanely on PPC.  Apple has a
number of OpenGL extensions for making user memory directly accessible
to the graphics engine.  Perhaps their specs can provide some clues as
to how they do it?

http://oss.sgi.com/projects/ogl-sample/registry/APPLE/client_storage.txt
http://oss.sgi.com/projects/ogl-sample/registry/APPLE/vertex_array_range.txt
http://developer.apple.com/graphicsimaging/opengl/extensions/apple_texture_range.html




Re: Memory management - an AGP manager

2006-01-09 Thread Benjamin Herrenschmidt

 There must be some way to deal with all this sanely on PPC.  Apple has a
 number of OpenGL extensions for making user memory directly accessible
 to the graphics engine.  Perhaps their specs can provide some clues as
 to how they do it?
 
 http://oss.sgi.com/projects/ogl-sample/registry/APPLE/client_storage.txt
 http://oss.sgi.com/projects/ogl-sample/registry/APPLE/vertex_array_range.txt
 http://developer.apple.com/graphicsimaging/opengl/extensions/apple_texture_range.html

They probably make the user memory non-cacheable... or maybe they just
flush the cache region occupied by the texture when it's submitted, since I
don't think one is allowed to modify it after it's been submitted, unless
I misread something. MacOS X doesn't have a linear mapping of memory
afaik and doesn't use large pages, so they can more easily play with
individual page cacheability without creating paradoxes (though their
kernel is slower than Linux overall).

I'm certain that Apple's AGP host bridge doesn't support cached memory, and
most revisions of it don't support stores to AGP from the GPU either.

Ben.






Memory management - an AGP manager

2006-01-08 Thread Keith Whitwell

This follows on from the previous post to discuss an implementation of
the memory manager for managing only AGP memory.

Right now, I'm primarily concerned with unified memory chipsets, like
i915 and via. This memory manager would be suitable for managing the
AGP memory on non-unified chipsets, but a different implementation
would be needed for the on-card video ram, based more on dma and
copying than map/unmapping as will be seen below.

In the past there has been talk about mapping user memory into the GTT
aperture as a mechanism to avoid copy-based uploading.  What I'm
proposing is that this type of mapping becomes the only or at least
primary way of getting data and memory into the GART aperture.

In the simplest case, the API functions would have approximately the
following implementations:


CreateBuffer(GLuint size)

- Allocate size bytes of virtual memory.

BufferData(void *ptr)
BufferSubData(...)

- Simplest implementation:  MapBuffer(), memcpy, UnmapBuffer()
- May be possible to do better, but get something working first.

ValidateBuffer()

 - Recall that this is the function called by the driver
   inside LOCK_HARDWARE() to indicate it wants to use the
   buffer.  So this will trigger the upload, which in this
   implementation is always handled by mapping the data into
   the GART.

 - Search the agp address space for contiguous unused range
   matching buffer size.  If none available, attempt to
   rearrange and evict buffers to free space.

 - Mlock buffer pages.
 - Map buffer pages into GART.
 - Perhaps take some steps to ensure they aren't accessed
   except through the gart.

MapBuffer()

- If the buffer is mlocked and mapped into the GART, unmap it.
- Return a virtual address space pointer to the buffer.

UnmapBuffer()

- Could be a noop.  But perhaps take some steps to ensure
  the pointer above is invalidated.

DestroyBuffer()

- If buffer is mlocked and mapped to GART, unmap it.
- Free the virtual memory allocated above.
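
The lifecycle above can be sketched as a small state machine. This is a toy Python model, not driver code: the names `Aperture`, `Buffer`, `PAGE`, the `locked` and `gart_offset` fields, and the page-granular first-fit search are all assumptions made for illustration. Real mlock and GART binding would happen in the kernel, and the eviction step of ValidateBuffer() is left out here.

```python
PAGE = 4096  # assumed page size, for illustration only


class Aperture:
    """Toy model of the AGP address space, measured in pages."""

    def __init__(self, num_pages):
        self.num_pages = num_pages
        self.bound = {}  # start page -> Buffer mapped at that offset

    def find_free(self, npages):
        """First-fit search for a contiguous unused range, or None."""
        cursor = 0
        for start in sorted(self.bound):
            if start - cursor >= npages:
                return cursor
            cursor = start + self.bound[start].npages
        return cursor if self.num_pages - cursor >= npages else None


class Buffer:
    def __init__(self, npages):
        # CreateBuffer: plain virtual memory, nothing locked or bound yet.
        self.npages = npages
        self.data = bytearray(npages * PAGE)
        self.locked = False        # stands in for mlock()
        self.gart_offset = None    # None means "not mapped into the GART"

    def validate(self, aperture):
        # ValidateBuffer: find a range, lock the pages, bind them.
        if self.gart_offset is None:
            offset = aperture.find_free(self.npages)
            if offset is None:
                raise MemoryError("aperture full; a real manager would evict")
            self.locked = True
            aperture.bound[offset] = self
            self.gart_offset = offset
        return self.gart_offset

    def unbind(self, aperture):
        if self.gart_offset is not None:
            del aperture.bound[self.gart_offset]
            self.gart_offset = None
            self.locked = False

    def map(self, aperture):
        # MapBuffer: CPU access implies the buffer leaves the GART.
        self.unbind(aperture)
        return self.data

    def buffer_data(self, aperture, payload):
        # BufferData: MapBuffer(), memcpy, UnmapBuffer() (a no-op here).
        if len(payload) > len(self.data):
            raise ValueError("payload larger than buffer")
        self.map(aperture)[:len(payload)] = payload

    def destroy(self, aperture):
        # DestroyBuffer: unbind if needed, then release the memory.
        self.unbind(aperture)
        self.data = None


agp = Aperture(num_pages=8)
tex = Buffer(npages=4)
tex.buffer_data(agp, b"texels")
offset = tex.validate(agp)   # bound at the start of the aperture
view = tex.map(agp)          # unbound again; contents untouched
```

The point of the model is that a buffer is in exactly one of two states: visible to the CPU, or bound into the aperture. Every transition goes through validate() or map().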

The idea of a fixed pool of AGP memory sitting there whether in use or
not is something I'd like to move away from, and this ties in very
well with some of the questions that an AGP memory manager raises:

1) When buffers are evicted from AGP memory, how are the contents of
   the buffers preserved?

- The naive answer is that we must allocate memory somewhere and
  then back-copy from AGP to that allocated memory.  This is
  deadly slow and would probably mean that any such design would
  actually be slower than the current memory manager under texture
  swapping.

- My proposal is that because the buffer is simply user memory
  that has been mapped into AGP, the way to free up AGP memory is
  to just unmap that buffer. There is no need to copy any data as
  the client process still holds the pages.

- Note that in this proposal AGP memory isn't a constrained
  resource, because AGP memory is really just mlocked client
  memory which has been mapped into the GART.  The constraints are
  1) mlocked client memory (not our problem) and 2) the AGP address
  space.
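
A minimal sketch of why eviction needs no copy under this proposal. The `ClientBuffer` and `Gart` classes and the fixed-size slots are hypothetical simplifications; the only thing the model is meant to show is that eviction drops a translation entry while the data stays in the client's own pages.

```python
class ClientBuffer:
    """User memory owned by the client; the pages never move."""

    def __init__(self, payload):
        self.pages = bytearray(payload)  # stand-in for the client's pages
        self.gart_slot = None            # aperture slot, or None if unbound


class Gart:
    """Toy aperture with a fixed number of equal-sized slots (an assumption)."""

    def __init__(self, num_slots):
        self.slots = [None] * num_slots

    def evict(self, buf):
        # Only the translation entry goes away. The contents are still in
        # buf.pages, so there is nothing to copy back anywhere.
        self.slots[buf.gart_slot] = None
        buf.gart_slot = None

    def bind(self, buf):
        if buf.gart_slot is None:
            if None not in self.slots:
                self.evict(self.slots[0])  # arbitrary victim for the sketch
            buf.gart_slot = self.slots.index(None)
            self.slots[buf.gart_slot] = buf
        return buf.gart_slot


gart = Gart(num_slots=2)
a, b, c = ClientBuffer(b"AAAA"), ClientBuffer(b"BBBB"), ClientBuffer(b"CCCC")
gart.bind(a)
gart.bind(b)
gart.bind(c)   # aperture full: a is evicted, its contents are untouched
```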


2) What happens when free space becomes fragmented?

- The naive answer is that you use the blitter to copy data within
  the static AGP memory pool to reduce fragmentation.

- My proposal is that you never attempt to rearrange memory in
  that way, because doing so would remove the data from the pages
  the client allocated to hold it - you'd then have to tell the
  client that the data had moved to a new bunch of pages and it
  would have to somehow map them into its memory.  A big hassle.

- Instead you have two options:

1) Unmap buffers from their old AGP addresses and re-map them
   into new contiguous addresses starting.

2) Alternately, just unmap the offending buffers and let the
   client re-map them when they are next required.
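
Option 1 can be sketched as a pure bookkeeping operation. The function names and the `name -> (offset, npages)` representation below are assumptions made for the example; the sketch only recomputes aperture offsets and never touches the pages behind a buffer.

```python
def largest_free_run(bindings, aperture_pages):
    """Size in pages of the largest contiguous hole in the aperture."""
    best, cursor = 0, 0
    for offset, npages in sorted(bindings.values()):
        best = max(best, offset - cursor)
        cursor = offset + npages
    return max(best, aperture_pages - cursor)


def compact(bindings):
    """Return new offsets that pack the bound buffers from offset 0 upward.

    bindings maps a buffer name to (offset, npages). Only offsets change;
    each buffer is still backed by the same client pages as before.
    """
    packed = {}
    cursor = 0
    for name, (_, npages) in sorted(bindings.items(), key=lambda kv: kv[1][0]):
        packed[name] = (cursor, npages)
        cursor += npages
    return packed


before = {"a": (0, 2), "b": (4, 2), "c": (8, 2)}
after = compact(before)
hole_before = largest_free_run(before, aperture_pages=12)
hole_after = largest_free_run(after, aperture_pages=12)
```

Option 2 needs even less: drop the offending entries from `bindings` and let each client bind again the next time it validates the buffer.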


3) What about back and depth buffers?

- These are buffer objects just like any other, no special
  treatment is required.  The X server on initialization simply
  creates a buffer of the required size.  The physical memory won't be
  allocated, let alone mapped into the GART, until the first time
  a client does ValidateBuffer( backbuffer ).

4) What about the front buffer?

- OK, you got me there.  There needs to be some way of tagging
  this buffer as special so it doesn't get moved or evicted except
  by the X server (eg. during rotate events).
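
One way to tag the front buffer, sketched under assumptions: the `pinned` flag and the `requester_is_x_server` parameter are invented for this example and are not part of the proposal.

```python
from dataclasses import dataclass


@dataclass
class Bound:
    name: str
    npages: int
    pinned: bool = False   # hypothetical tag, set for the front buffer


def eviction_candidates(bound, requester_is_x_server=False):
    """Buffers the manager may move or unmap on behalf of the requester.

    Pinned buffers are offered only when the X server itself is asking,
    for instance during a rotate event.
    """
    return [b for b in bound if requester_is_x_server or not b.pinned]


bound = [Bound("front", 8, pinned=True), Bound("back", 8), Bound("tex", 4)]
for_client = [b.name for b in eviction_candidates(bound)]
for_server = [b.name for b in eviction_candidates(bound, True)]
```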


What does all this mean?

1) I think this is the first solution for memory management that I can
   imagine implementing.  Also it's one which gives reasonable
   performance when data is being evicted from the GART.

2) We need to understand mapping and unmapping from the GART 

Re: [Mesa3d-dev] Memory management - an AGP manager

2006-01-08 Thread Keith Whitwell

Keith Whitwell wrote:


1) I think this is the first solution for memory management that I can
   imagine implementing.  Also it's one which gives reasonable
   performance when data is being evicted from the GART.


This sounds a little trite reading it back.

This is a function of two things: firstly, that I've thought more about this
implementation than any other, but also, crucially, that I've been able to find
straightforward answers to most of the problems that have come up about the
behaviour of the manager.


It seems to me that this could be implemented as a small to mid-sized project
and that, beyond the extensions to the AGP module for mapping and unmapping, it
doesn't have too many scary implementation black holes.


It also seems like the drivers wouldn't change too drastically as a result of 
adopting this memory manager.


I've not mentioned replacement algorithms at all, as I feel that the design
doesn't place any constraints on the choice of replacement algorithm. Pretty
much anything can be implemented inside ValidateBuffer(), and the rest of the
design would be unaffected.
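
To illustrate that the replacement policy is a detachable piece, here is a sketch in which the policy is just a function handed to the eviction loop. Everything in it (`Tracked`, `lru`, `largest_first`, `make_room`) is hypothetical, and for brevity it counts free pages only and ignores contiguity.

```python
import itertools

_clock = itertools.count()   # monotonically increasing "time" for the sketch


class Tracked:
    def __init__(self, name, npages):
        self.name = name
        self.npages = npages
        self.last_used = next(_clock)

    def touch(self):
        self.last_used = next(_clock)


def lru(bound):
    """Policy: evict the least recently used buffer."""
    return min(bound, key=lambda b: b.last_used)


def largest_first(bound):
    """Policy: evict the biggest buffer."""
    return max(bound, key=lambda b: b.npages)


def make_room(bound, free_pages, needed, pick_victim):
    """Evict until `needed` pages are free.

    Returns (names evicted, free pages afterwards). The caller's list is
    not modified.
    """
    bound = list(bound)
    evicted = []
    while free_pages < needed and bound:
        victim = pick_victim(bound)
        bound.remove(victim)
        free_pages += victim.npages
        evicted.append(victim.name)
    return evicted, free_pages


a, b, c = Tracked("a", 1), Tracked("b", 2), Tracked("c", 4)
a.touch()   # a is now the most recently used
by_lru = make_room([a, b, c], free_pages=0, needed=2, pick_victim=lru)
by_size = make_room([a, b, c], free_pages=0, needed=2, pick_victim=largest_first)
```

Swapping `lru` for `largest_first` changes which buffers go, while the loop that would live inside ValidateBuffer() stays the same.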


Keith






Re: Memory management - an AGP manager

2006-01-08 Thread Benjamin Herrenschmidt
On Sun, 2006-01-08 at 18:17 +, Keith Whitwell wrote:

 In the past there has been talk about mapping user memory into the GTT
 aperture as a mechanism to avoid copy-based uploading.  What I'm
 proposing is that this type of mapping becomes the only or at least
 primary way of getting data and memory into the GART aperture.

Hi Keith !

A couple of things with this approach that come to mind..

- Caching issues. On a lot of chipsets, AGP memory must be mapped
non-cacheable. This isn't trivial on all architectures and it's not
always feasible to do with userland buffers. That means that either the
cache must be flushed at the time of the mapping _and_ the buffer not
touched at all by the CPU until it's unmapped, or the map call must
change the userland mapping to the buffer to mark it uncached. The
problems of course start popping up if this buffer happens to be shared
between multiple processes... also, that user memory will also be mapped
in the kernel as part of the kernel's linear mapping, which is
cacheable. Thus you'll end up with pages mapped both cacheable and
non-cacheable in different contexts. This is a good way to cause
checkstops with a number of CPUs (certainly with PowerPC and I think
also with AMD's). (At this point, for those who didn't figure out yet
that AGP was just a piece of crap in the first place, welcome to the
real world)

 - I would separate the mechanism for locking and preparing user buffers
for DMA from the mechanism for binding them into a GART, whether it's AGP, a
card-specific one like ATI's PCI GART, or a card's SG DMA engine, like nvidia
objects. That is, on one hand, you have the user program, which has buffers
(textures, fbo's, whatever...) that it wants to make available for
access by the engine; on the other, you have whatever mechanism your card
provides to access system memory with address translation. The latter
changes. AGP GART is one way but is slowly going away; ATIs use a
similar mechanism on PCI and PCI-E; nVidia, afaik, can store in the card
descriptors for objects in system memory associated with a scatter-gather
list (or a page table); on ATI, I've also had good results using
the scatter-gather DMA engine instead of the GART for some types of
transfers (like card -> memory, a direction for which AGP often doesn't
work at all), etc. All those mechanisms, however, rely on the same
basic mechanism on the user-buffer side for locking and mapping the buffers
for DMA, which also requires some resource tracking to make sure a simple DRI
client can't hog the whole system by pinning down too much user memory,
etc.
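
The split described above can be sketched as one shared pinning layer with per-client accounting, plus interchangeable binding backends. All names here (`PinAccounting`, `AgpGartBackend`, `ScatterGatherBackend`, `make_accessible`) and the return values are invented for illustration; none of this reflects an existing kernel interface.

```python
class PinLimitExceeded(Exception):
    pass


class PinAccounting:
    """Per-client cap on pinned pages, shared by every backend."""

    def __init__(self, max_pages_per_client):
        self.limit = max_pages_per_client
        self.pinned = {}   # client id -> pages currently pinned

    def pin(self, client, npages):
        used = self.pinned.get(client, 0)
        if used + npages > self.limit:
            raise PinLimitExceeded(f"{client} would pin {used + npages} pages")
        self.pinned[client] = used + npages

    def unpin(self, client, npages):
        self.pinned[client] -= npages


class AgpGartBackend:
    """Card-specific half: expose pinned pages through an AGP GART range."""

    def bind(self, pages):
        return ("gart-range", len(pages))


class ScatterGatherBackend:
    """Card-specific half: describe pinned pages as a scatter-gather list."""

    def bind(self, pages):
        return ("sg-list", list(pages))


def make_accessible(accounting, backend, client, pages):
    accounting.pin(client, len(pages))   # common step, with resource tracking
    return backend.bind(pages)           # whatever the card provides


accounting = PinAccounting(max_pages_per_client=4)
via_gart = make_accessible(accounting, AgpGartBackend(), "client-1", [10, 11])
via_sg = make_accessible(accounting, ScatterGatherBackend(), "client-1", [20, 21])
```

A third request from `client-1` would now raise `PinLimitExceeded`, regardless of which backend it targets.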

Ben.





Re: Memory management - an AGP manager

2006-01-08 Thread Keith Whitwell

Benjamin Herrenschmidt wrote:

On Sun, 2006-01-08 at 18:17 +, Keith Whitwell wrote:



In the past there has been talk about mapping user memory into the GTT
aperture as a mechanism to avoid copy-based uploading.  What I'm
proposing is that this type of mapping becomes the only or at least
primary way of getting data and memory into the GART aperture.



Hi Keith !

A couple of things with this approach that come to mind..

- Caching issues. On a lot of chipsets, AGP memory must be mapped
non-cacheable. This isn't trivial on all architectures and it's not
always feasible to do with userland buffers. That means that either the
cache must be flushed at the time of the mapping _and_ the buffer not
touched at all by the CPU until it's unmapped, or the map call must
change the userland mapping to the buffer to mark it uncached. The
problems of course start popping up if this buffer happens to be shared
between multiple processes... also, that user memory will also be mapped
in the kernel as part of the kernel's linear mapping, which is
cacheable. Thus you'll end up with pages mapped both cacheable and
non-cacheable in different contexts. This is a good way to cause
checkstops with a number of CPUs (certainly with PowerPC and I think
also with AMD's). (At this point, for those who didn't figure out yet
that AGP was just a piece of crap in the first place, welcome to the
real world)


Yes, this I think is addressed by the Map/Unmap semantics from ARB_vbo and the
additional constraints I included in the design, i.e. that the only time the
buffer contents are meant to be available as user memory is when they are Mapped
(in the ARB_vbo sense, which means *not* mapped in the GART).  Thus it's
allowable to tear down the cached access to these pages on upload, and
especially so if required for system stability.


I.e., in systems where stability is threatened by the cacheable access to the
pages, it would be necessary to extract the uploaded pages from those cacheable
maps and have them solely in the GTT. If it is possible to populate the GTT at
all, it must be possible to populate it with these pages?


If this is an absolute killer for some systems, then I guess there would have to 
be a different AGP memory manager on those systems, based more on the sort of 
copying and DMA operations that I expect a Video RAM implementation would use. 
I feel however that the ARB_vbo APIs and semantics are strong enough to cover 
both cases.



 - I would separate the mechanism for locking and preparing user buffers
for DMA from the mechanism for binding them into a GART, whether it's AGP, a
card-specific one like ATI's PCI GART, or a card's SG DMA engine, like nvidia
objects. That is, on one hand, you have the user program, which has buffers
(textures, fbo's, whatever...) that it wants to make available for
access by the engine; on the other, you have whatever mechanism your card
provides to access system memory with address translation. The latter
changes. AGP GART is one way but is slowly going away; ATIs use a
similar mechanism on PCI and PCI-E; nVidia, afaik, can store in the card
descriptors for objects in system memory associated with a scatter-gather
list (or a page table); on ATI, I've also had good results using
the scatter-gather DMA engine instead of the GART for some types of
transfers (like card -> memory, a direction for which AGP often doesn't
work at all), etc. All those mechanisms, however, rely on the same
basic mechanism on the user-buffer side for locking and mapping the buffers
for DMA, which also requires some resource tracking to make sure a simple DRI
client can't hog the whole system by pinning down too much user memory,
etc.


Indeed.  I think that taking the API from ARB_vbo makes these different 
implementations entirely possible.  The implementation I am interested in right 
now is the AGP one, but don't take that to imply that other implementations and 
backends are excluded.


I've tried to provide exactly the separation that I think you're talking about 
by sticking closely to the buffer_object semantics from ARB_vbo/ARB_fbo for the 
interface.


Keith




Re: Memory management - an AGP manager

2006-01-08 Thread Benjamin Herrenschmidt
On Sun, 2006-01-08 at 22:55 +, Keith Whitwell wrote:

 Yes, this I think is addressed by the Map/Unmap semantics from ARB_vbo and the
 additional constraints I included in the design, ie that the only time the
 buffer contents are meant to be available as user memory is when they are Mapped
 (in the ARB_vbo sense, which means *not* mapped in the GART).  Thus it's
 allowable to tear down the cached access to these pages on upload, and
 especially so if required for system stability.

 IE, in systems where stability is threatened by the cacheable access to the
 pages, it would be necessary to extract the uploaded pages from those cacheable
 maps and have them solely in the GTT - if it is possible to populate the GTT at
 all, it must be possible to populate it with these pages?

Yup, I suppose so, at least in the case of an AGP GART... PCI GARTs and
friends have other issues (read: 32-bit GARTs on 64-bit systems
without IOMMUs, for example).

 If this is an absolute killer for some systems, then I guess there would have to
 be a different AGP memory manager on those systems, based more on the sort of
 copying and DMA operations that I expect a Video RAM implementation would use.
 I feel however that the ARB_vbo API's and semantics are strong enough to cover
 both cases.

We can always do ugly memcpy to a pre-allocated AGP area on systems
where that wouldn't work at all...

One issue is that you can't necessarily tear down the cacheable mapping.
For example, in some systems like G5s, the kernel linear mapping of all
RAM is done using large pages (16M pages). But the AGP pages are
individual 4k pages. Unfortunately, you can't easily just break up one
large page into small pages on those systems. However, they have
fine-grained cache manipulation instructions too, so what we can do is tear
down the user mapping completely (or rather make it non-accessible
though still present for memory accounting purposes; those pages are
still owned by the user, just neither readable nor writeable) and keep
the kernel mapping. That would have no adverse side effect if we take
care of flushing the cache for the relevant pages when doing so. The cacheable
kernel mapping will still be there, but it shouldn't be a problem.

In fact, the above might well be good for performance too, since it
would allow the user mapping to be fully cacheable (since we can
explicitly tear it down and flush it when mapping into card space). In
fact, tearing it down isn't even necessary. All that is required is that
we have a checkpoint when the buffer changes ownership where we can do
the appropriate cache flush/invalidate depending on the direction of the
transfer. That's it. If the user doesn't respect that and touches the
buffer at the wrong times, it will get corrupted pixels, but that's a
programming error. We can enforce it with MMU protection, but that would
be more costly in terms of performance.
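
The ownership checkpoint can be sketched as a two-state handoff. This is a model only: `SharedBuffer`, `Owner`, and the `cache_ops` log are invented for the example, and the choice of "flush before the device reads, invalidate after the device writes" is one reading of "depending on the direction of the transfer", not something spelled out above.

```python
from enum import Enum


class Owner(Enum):
    CPU = "cpu"
    DEVICE = "device"


class SharedBuffer:
    def __init__(self):
        self.owner = Owner.CPU
        self.cache_ops = []   # log of cache maintenance, for illustration

    def hand_to_device(self):
        """CPU -> device: write dirty lines back so the card sees the data."""
        if self.owner is Owner.CPU:
            self.cache_ops.append("flush")
            self.owner = Owner.DEVICE

    def hand_to_cpu(self, device_wrote):
        """Device -> CPU: drop stale lines only if the card wrote the buffer."""
        if self.owner is Owner.DEVICE:
            if device_wrote:
                self.cache_ops.append("invalidate")
            self.owner = Owner.CPU


buf = SharedBuffer()
buf.hand_to_device()                  # upload a texture: flush
buf.hand_to_cpu(device_wrote=False)   # card only read it: nothing to do
buf.hand_to_device()                  # render target: flush again
buf.hand_to_cpu(device_wrote=True)    # card wrote pixels: invalidate
```

Nothing in the model stops the CPU from touching the buffer while the device owns it, which matches the "programming error" case above; enforcing it would mean changing page protections at each handoff.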

 Indeed.  I think that taking the API from ARB_vbo makes these different
 implementations entirely possible.  The implementation I am interested in right
 now is the AGP one, but don't take that to imply that other implementations and
 backends are excluded.

 I've tried to provide exactly the separation that I think you're talking about
 by sticking closely to the buffer_object semantics from ARB_vbo/ARB_fbo for the
 interface.

Ok, I'm not familiar with those semantics (as I'm not too familiar with
such high-level things as OpenGL :) but I'll try to have a look one of
these days).

Cheers,
Ben.






Re: Memory management - an AGP manager

2006-01-08 Thread Keith Whitwell


Indeed.  I think that taking the API from ARB_vbo makes these different 
implementations entirely possible.  The implementation I am interested in right 
now is the AGP one, but don't take that to imply that other implementations and 
backends are excluded.


I've tried to provide exactly the separation that I think you're talking about 
by sticking closely to the buffer_object semantics from ARB_vbo/ARB_fbo for the 
interface.



Ok, I'm not familiar with those semantics (as I'm not too familiar with
such high-level things as OpenGL :) but I'll try to have a look one of
these days).


It's usually very worthwhile to track down the GL extension that exposes the HW
functionality in question. The extensions have been written by people who have
already solved whatever problem you're considering, usually on multiple
systems, so they often contain a wealth of distilled knowledge applicable to the
low-level design.


Keith




Re: [Mesa3d-dev] Memory management - an AGP manager

2006-01-08 Thread Roland Scheidegger

Keith Whitwell wrote:

Right now, I'm primarily concerned with unified memory chipsets, like
i915 and via. This memory manager would be suitable for managing the
AGP memory on non-unified chipsets, but a different implementation
would be needed for the on-card video ram, based more on dma and
copying than map/unmapping as will be seen below.

I'm not sure I quite understand that. Do you propose completely separate
managers for handling agp and video ram?
I think this introduces some problems for things which can be in either
agp or video ram. Some cards, for instance, may be able to render to agp
memory, though the performance hit may be so big that you really want to
use video memory unless you simply don't have enough of it.
A very common case which can use both video memory and agp memory is
textures: for performance reasons you usually want to place them in
video memory, but putting some textures into agp memory is a lot faster
than texture thrashing. I guess you could handle that at the driver
level (since it always has access to all ram), but performance may
suffer quite a bit if multiple apps running at the same
time stomp over each other's buffers (which could be better avoided
with a coordinated scheme that allocates some buffers in agp memory
instead of video ram).
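
A minimal sketch of the kind of coordinated placement meant here, assuming a single manager that sees both pools. The `place` function, the pool names, and the greedy "video ram first, agp as overflow" rule are illustrative assumptions, not an existing interface.

```python
def place(buffers, vram_free, agp_free):
    """Place (name, npages) pairs in priority order.

    Returns a dict mapping each name to "vram", "agp", or None. None means
    the buffer fits in neither pool and would force an eviction.
    """
    placement = {}
    for name, npages in buffers:
        if npages <= vram_free:
            placement[name] = "vram"
            vram_free -= npages
        elif npages <= agp_free:
            # Slower than video ram, but far cheaper than thrashing.
            placement[name] = "agp"
            agp_free -= npages
        else:
            placement[name] = None
    return placement


requests = [("framebuffer", 4), ("tex0", 3), ("tex1", 3), ("tex2", 6)]
result = place(requests, vram_free=8, agp_free=4)
```

With two separate managers, `tex1` could only be placed by evicting something from video ram; a manager that sees both pools can put it in agp memory instead.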


Roland

