Re: [Mesa3d-dev] Memory management - an AGP manager
Roland Scheidegger wrote:
> Keith Whitwell wrote:
>> Right now, I'm primarily concerned with unified memory chipsets, like i915 and via. This memory manager would be suitable for managing the AGP memory on non-unified chipsets, but a different implementation would be needed for the on-card video ram, based more on dma and copying than map/unmapping as will be seen below.
>
> I'm not sure I quite understand that, do you propose completely separate managers for handling agp and video ram?

You're right, I had been thinking as if the two managers could be made disjoint. I guess there will need to be either a mechanism for multiple backends behind the common API, or else the future Video RAM manager I refer to will actually be an extension of the AGP manager I propose which includes handling of Video RAM as well.

In any case, I feel that there are benefits in tackling the AGP only case first, basically getting something up and running in a simplified environment to validate aspects of the design before tackling the full problem.

Keith

--- This SF.net email is sponsored by: Splunk Inc. Do you grep through log files for problems? Stop! Download the new AJAX search engine that makes searching your log files as easy as surfing the web. DOWNLOAD SPLUNK! http://ads.osdn.com/?ad_id=7637alloc_id=16865op=click -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: Memory management - an AGP manager
Benjamin Herrenschmidt wrote:
> - Caching issues. On a lot of chipsets, AGP memory must be mapped non-cacheable. This isn't trivial on all architectures and it's not always feasible to do with userland buffers. That means that either the cache must be flushed at the time of the mapping _and_ the buffer not touched at all by the CPU until it's unmapped, or the map call must change the userland mapping to the buffer to mark it uncached. The problems of course start popping up if this buffer happens to be shared between multiple processes... also, that user memory will also be mapped in the kernel as part of the kernel's linear mapping, which is cacheable. Thus you'll end up with pages mapped both cacheable and non-cacheable in different contexts. This is a good way to cause checkstops with a number of CPUs (certainly with PowerPC and I think also with AMD's). (At this point, for those who didn't figure out yet that AGP was just a piece of crap in the first place, welcome to the real world)

There must be some way to deal with all this sanely on PPC. Apple has a number of OpenGL extensions for making user memory directly accessible to the graphics engine. Perhaps their specs can provide some clues as to how they do it?

http://oss.sgi.com/projects/ogl-sample/registry/APPLE/client_storage.txt
http://oss.sgi.com/projects/ogl-sample/registry/APPLE/vertex_array_range.txt
http://developer.apple.com/graphicsimaging/opengl/extensions/apple_texture_range.html
Re: Memory management - an AGP manager
> There must be some way to deal with all this sanely on PPC. Apple has a number of OpenGL extensions for making user memory directly accessible to the graphics engine. Perhaps their specs can provide some clues as to how they do it?
>
> http://oss.sgi.com/projects/ogl-sample/registry/APPLE/client_storage.txt
> http://oss.sgi.com/projects/ogl-sample/registry/APPLE/vertex_array_range.txt
> http://developer.apple.com/graphicsimaging/opengl/extensions/apple_texture_range.html

They probably make the user memory non-cacheable... or maybe they just flush the cache region occupied by the texture when it is submitted, since I don't think one is allowed to modify it after it's been submitted, unless I misread something.

MacOS X doesn't have a linear mapping of memory afaik and doesn't use large pages, so they can more easily play with individual page cacheability without creating paradoxes (though their kernel is slower than Linux overall).

I'm certain that Apple's AGP host bridge doesn't support cached memory, and most revisions of it don't support stores to AGP from the GPU either.

Ben.
Memory management - an AGP manager
This follows on from the previous post to discuss an implementation of the memory manager for managing only AGP memory.

Right now, I'm primarily concerned with unified memory chipsets, like i915 and via. This memory manager would be suitable for managing the AGP memory on non-unified chipsets, but a different implementation would be needed for the on-card video ram, based more on dma and copying than map/unmapping as will be seen below.

In the past there has been talk about mapping user memory into the GTT aperture as a mechanism to avoid copy-based uploading. What I'm proposing is that this type of mapping becomes the only or at least primary way of getting data and memory into the GART aperture.

In the simplest case, the API functions would have approximately the following implementations:

CreateBuffer(GLuint size)
  - Allocate size bytes of virtual memory.

BufferData(void *ptr)
BufferSubData(...)
  - Simplest implementation: MapBuffer(), memcpy, UnmapBuffer()
  - May be possible to do better, but get something working first.

ValidateBuffer()
  - Recall that this is the function called by the driver inside LOCK_HARDWARE() to indicate it wants to use the buffer. So this will trigger the upload, which in this implementation is always handled by mapping the data into the GART table.
  - Search the agp address space for contiguous unused range matching buffer size. If none available, attempt to rearrange and evict buffers to free space.
  - Mlock buffer pages.
  - Map buffer pages into GART.
  - Perhaps take some steps to ensure they aren't accessed except through the gart.

MapBuffer()
  - If the buffer is mlocked and mapped into the GART, unmap it.
  - Return a virtual address space pointer to the buffer.

UnmapBuffer()
  - Could be a noop. But perhaps take some steps to ensure the pointer above is invalidated.

DestroyBuffer()
  - If buffer is mlocked and mapped to GART, unmap it.
  - Free the virtual memory allocated above.
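To make the lifecycle above concrete, here is a minimal user-space sketch in C. It is only an illustration under stated assumptions, not code from this thread or from any driver: the names (`struct buffer`, `create_buffer`, `validate_buffer` and so on) are hypothetical, and the mlock and GART programming steps are replaced by plain state flags so the state transitions can be followed without kernel support.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model: a buffer is either untouched, mapped for CPU access,
 * or bound into the GART. It is never CPU-mapped and GART-bound at once. */
enum buf_state { BUF_UNBOUND, BUF_CPU_MAPPED, BUF_GART_BOUND };

struct buffer {
    void          *pages;   /* ordinary virtual memory owned by the client */
    size_t         size;
    enum buf_state state;
    int            locked;  /* stands in for the pages being mlocked */
};

/* CreateBuffer: allocate virtual memory only, nothing is bound yet. */
struct buffer *create_buffer(size_t size)
{
    struct buffer *b = calloc(1, sizeof *b);
    if (!b)
        return NULL;
    b->pages = calloc(1, size);
    if (!b->pages) {
        free(b);
        return NULL;
    }
    b->size = size;
    b->state = BUF_UNBOUND;
    return b;
}

/* A real implementation would clear GART entries and munlock here. */
static void gart_unbind(struct buffer *b)
{
    if (b->state == BUF_GART_BOUND) {
        b->locked = 0;
        b->state = BUF_UNBOUND;
    }
}

/* MapBuffer: drop any GART binding, then hand out a CPU pointer. */
void *map_buffer(struct buffer *b)
{
    gart_unbind(b);
    b->state = BUF_CPU_MAPPED;
    return b->pages;
}

/* UnmapBuffer: nearly a no-op; the pointer is treated as stale afterwards. */
void unmap_buffer(struct buffer *b)
{
    if (b->state == BUF_CPU_MAPPED)
        b->state = BUF_UNBOUND;
}

/* ValidateBuffer: lock the pages and bind them (simulated by flags). */
int validate_buffer(struct buffer *b)
{
    if (b->state == BUF_CPU_MAPPED)
        return -1;              /* client still holds a CPU mapping */
    b->locked = 1;
    b->state = BUF_GART_BOUND;
    return 0;
}

/* BufferSubData, simplest form: MapBuffer(), memcpy, UnmapBuffer(). */
int buffer_sub_data(struct buffer *b, size_t off, const void *src, size_t n)
{
    if (off > b->size || n > b->size - off)
        return -1;
    char *dst = map_buffer(b);
    memcpy(dst + off, src, n);
    unmap_buffer(b);
    return 0;
}

/* DestroyBuffer: unbind if needed, then free the virtual memory. */
void destroy_buffer(struct buffer *b)
{
    gart_unbind(b);
    free(b->pages);
    free(b);
}
```

In this sketch a caller would fill the buffer with `buffer_sub_data()`, call `validate_buffer()` before use by the hardware, and call `map_buffer()` again whenever the CPU needs access, which silently drops the binding. Refusing to validate a buffer that is still CPU-mapped is a choice made for the sketch; the post does not say how that case should be handled.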
The idea of a fixed pool of AGP memory sitting there whether in use or not is something I'd like to move away from, and this ties in very well with some of the questions that an AGP memory manager raises:

1) When buffers are evicted from AGP memory, how are the contents of the buffers preserved?
  - The naive answer is that we must allocate memory somewhere and then back-copy from AGP to that allocated memory. This is deadly slow and would probably mean that any such design would actually be slower than the current memory manager under texture swapping.
  - My proposal is that because the buffer is simply user memory that has been mapped into AGP, the way to free up AGP memory is to just unmap that buffer. There is no need to copy any data as the client process still holds the pages.
  - Note that in this proposal AGP memory isn't a constrained resource, because AGP memory is really just mlocked client memory which has been mapped into the GART. The constraints are 1) mlocked client memory (not our problem) and 2) the AGP address space.

2) What happens when free space becomes fragmented?
  - The naive answer is that you use the blitter to copy data within the static AGP memory pool to reduce fragmentation.
  - My proposal is that you never attempt to rearrange memory in that way, because doing so would remove the data from the pages the client allocated to hold it - you'd then have to tell the client that the data had moved to a new bunch of pages and it would have to somehow map them into its memory. A big hassle.
  - Instead you have two options:
    1) Unmap buffers from their old AGP addresses and re-map them into new, contiguous addresses.
    2) Alternatively, just unmap the offending buffers and let the client re-map them when they are next required.

3) What about back and depth buffers?
  - These are buffer objects just like any other, no special treatment is required. The X server on initialization simply creates a buffer of this size. The physical memory won't be allocated, let alone mapped into the GART, until the first time a client does ValidateBuffer( backbuffer ).

4) What about the front buffer?
  - OK, you got me there. There needs to be some way of tagging this buffer as special so it doesn't get moved or evicted except by the X server (eg. during rotate events).

What does all this mean?

1) I think this is the first solution for memory management that I can imagine implementing. Also it's one which gives reasonable performance when data is being evicted from the GART.

2) We need to understand mapping and unmapping from the GART
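The answers to questions 1) and 2) can be sketched as an aperture allocator in which eviction and defragmentation only change bindings and never copy data. This is a toy model in C under my own assumptions: `struct aperture`, `bind_buffer` and the fixed sizes are hypothetical, and the victim selection (lowest buffer id first) is just a placeholder, since the design does not prescribe a replacement policy.

```c
#include <assert.h>
#include <stddef.h>

#define APERTURE_PAGES 16   /* size of the toy AGP address space, in pages */
#define MAX_BUFFERS     8

/* One entry per buffer: where (if anywhere) it is bound in the aperture.
 * The backing pages stay with the client and are not modelled here. */
struct binding { int bound; size_t offset; size_t npages; };

struct aperture { struct binding buf[MAX_BUFFERS]; };

/* First-fit search: lowest offset with npages free, or -1 if none. */
static long find_hole(const struct aperture *ap, size_t npages)
{
    size_t start = 0;
    while (start + npages <= APERTURE_PAGES) {
        size_t next = start;
        for (int i = 0; i < MAX_BUFFERS; i++) {
            const struct binding *b = &ap->buf[i];
            if (b->bound && b->offset < start + npages &&
                start < b->offset + b->npages &&
                b->offset + b->npages > next)
                next = b->offset + b->npages;   /* skip past the overlap */
        }
        if (next == start)
            return (long)start;
        start = next;
    }
    return -1;
}

/* Option 1 for fragmentation: re-bind buffers at contiguous offsets.
 * Only the aperture offsets change; no data is copied. */
static void compact(struct aperture *ap)
{
    size_t cursor = 0;
    for (;;) {
        struct binding *lowest = NULL;
        for (int i = 0; i < MAX_BUFFERS; i++) {
            struct binding *b = &ap->buf[i];
            if (b->bound && b->offset >= cursor &&
                (!lowest || b->offset < lowest->offset))
                lowest = b;
        }
        if (!lowest)
            break;
        lowest->offset = cursor;
        cursor += lowest->npages;
    }
}

/* Eviction is just an unbind: the client still holds the pages. */
void unbind_buffer(struct aperture *ap, int id)
{
    ap->buf[id].bound = 0;
}

/* Bind buffer `id`: try first fit, then compaction, then eviction. */
long bind_buffer(struct aperture *ap, int id, size_t npages)
{
    if (id < 0 || id >= MAX_BUFFERS || npages == 0 || npages > APERTURE_PAGES)
        return -1;
    ap->buf[id].bound = 0;                  /* drop any previous binding */
    long off = find_hole(ap, npages);
    if (off < 0) {
        compact(ap);
        off = find_hole(ap, npages);
    }
    for (int victim = 0; off < 0 && victim < MAX_BUFFERS; victim++) {
        if (victim != id && ap->buf[victim].bound) {
            unbind_buffer(ap, victim);      /* placeholder eviction policy */
            compact(ap);
            off = find_hole(ap, npages);
        }
    }
    if (off < 0)
        return -1;
    ap->buf[id] = (struct binding){ 1, (size_t)off, npages };
    return off;
}
```

A pinned buffer such as the front buffer from question 4) would need an extra flag that both `compact()` and the eviction loop respect; that is left out here to keep the sketch short.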
Re: [Mesa3d-dev] Memory management - an AGP manager
Keith Whitwell wrote:
> 1) I think this is the first solution for memory management that I can imagine implementing. Also it's one which gives reasonable performance when data is being evicted from the GART.

This sounds a little trite reading it back. This is a function of two things: firstly, that I've thought more about this implementation than any other, but also, crucially, that I've been able to find straightforward answers to most of the problems that have come up about the behaviour of the manager.

It seems to me that this could be implemented in a small to mid sized project and, beyond the extensions to the AGP module for map/unmapping, it doesn't have too many scary implementation black holes. It also seems like the drivers wouldn't change too drastically as a result of adopting this memory manager.

I've not mentioned replacement algorithms at all, as I feel that the design doesn't place any constraints on the choice of replacement algorithm; pretty much anything can be implemented inside of ValidateBuffer() and the rest of the design would be unaffected.

Keith
Re: Memory management - an AGP manager
On Sun, 2006-01-08 at 18:17 +, Keith Whitwell wrote:
> In the past there has been talk about mapping user memory into the GTT aperture as a mechanism to avoid copy-based uploading. What I'm proposing is that this type of mapping becomes the only or at least primary way of getting data and memory into the GART aperture.

Hi Keith ! A couple of things with this approach that come to mind..

- Caching issues. On a lot of chipsets, AGP memory must be mapped non-cacheable. This isn't trivial on all architectures and it's not always feasible to do with userland buffers. That means that either the cache must be flushed at the time of the mapping _and_ the buffer not touched at all by the CPU until it's unmapped, or the map call must change the userland mapping to the buffer to mark it uncached. The problems of course start popping up if this buffer happens to be shared between multiple processes... also, that user memory will also be mapped in the kernel as part of the kernel's linear mapping, which is cacheable. Thus you'll end up with pages mapped both cacheable and non-cacheable in different contexts. This is a good way to cause checkstops with a number of CPUs (certainly with PowerPC and I think also with AMD's). (At this point, for those who didn't figure out yet that AGP was just a piece of crap in the first place, welcome to the real world)

- I would separate the mechanism for locking/preparing user buffers for DMA from binding them into a GART, whether it's AGP, or a card specific one like ATI's PCI GART, or a card's SG DMA engine, like nvidia objects. That is, on one hand, you have the user program which has buffers (textures, fbo's, whatever... ) that it wants to make available for access by the engine; on the other, you have whatever mechanism your card provides to access system memory with address translation. The latter changes. AGP GART is one way but is slowly going away, ATIs use a similar mechanism on PCI and PCI-E, nVidia, afaik, can store in the card descriptors for objects in system memory associated with a scatter gather list (or a page table), on ATI, I've also had good results using the scatter-gather DMA engine instead of the GART for some types of transfers (like card -> memory, a direction for which AGP often doesn't work at all) etc... All those mechanisms however rely on the same left hand basic mechanism for locking/mapping the user buffers for DMA, which also requires some resource tracking to make sure a simple DRI client can't hog the whole system by pinning down too much user memory, etc...

Ben.
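The separation suggested in the second point could look roughly like the following C sketch: one layer pins user buffers and enforces a per-client quota, and a second layer is a table of interchangeable translation backends. Everything here is hypothetical (the struct names, the quota scheme, and the made-up device addresses); real pinning and real GART or scatter-gather programming are only represented by counters and constants.

```c
#include <assert.h>
#include <stddef.h>

/* Layer 1: pinning user buffers for DMA, with per-client accounting. */
struct pin_account {
    size_t pinned_pages;   /* pages this client currently has pinned */
    size_t limit_pages;    /* cap so one client cannot hog the system */
};

struct pinned_buf {
    struct pin_account *acct;
    size_t npages;         /* 0 means "not pinned" */
};

int pin_user_buffer(struct pin_account *acct, struct pinned_buf *pb,
                    size_t npages)
{
    assert(acct->pinned_pages <= acct->limit_pages);
    if (npages == 0 || npages > acct->limit_pages - acct->pinned_pages)
        return -1;                    /* over quota: refuse to pin */
    acct->pinned_pages += npages;     /* a real driver would lock pages here */
    pb->acct = acct;
    pb->npages = npages;
    return 0;
}

void unpin_user_buffer(struct pinned_buf *pb)
{
    if (pb->npages) {
        pb->acct->pinned_pages -= pb->npages;
        pb->npages = 0;
    }
}

/* Layer 2: address translation backends. Each one only needs a pinned
 * buffer; it does not care how the pinning was done. */
struct xlate_backend {
    const char *name;
    int (*bind)(const struct pinned_buf *pb, unsigned long *dev_addr);
};

static int agp_gart_bind(const struct pinned_buf *pb, unsigned long *dev_addr)
{
    if (pb->npages == 0)
        return -1;                    /* must be pinned first */
    *dev_addr = 0x1000UL;             /* made-up aperture offset */
    return 0;
}

static int sg_dma_bind(const struct pinned_buf *pb, unsigned long *dev_addr)
{
    if (pb->npages == 0)
        return -1;
    *dev_addr = 0x2000UL;             /* made-up descriptor handle */
    return 0;
}

const struct xlate_backend agp_gart_backend = { "agp-gart", agp_gart_bind };
const struct xlate_backend sg_dma_backend   = { "sg-dma",   sg_dma_bind };
```

The point of the split is that the quota check lives in exactly one place, while adding a new translation mechanism means adding one more `struct xlate_backend` instance.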
Re: Memory management - an AGP manager
Benjamin Herrenschmidt wrote:
> On Sun, 2006-01-08 at 18:17 +, Keith Whitwell wrote:
>> In the past there has been talk about mapping user memory into the GTT aperture as a mechanism to avoid copy-based uploading. What I'm proposing is that this type of mapping becomes the only or at least primary way of getting data and memory into the GART aperture.
>
> Hi Keith ! A couple of things with this approach that come to mind..
>
> - Caching issues. On a lot of chipsets, AGP memory must be mapped non-cacheable. This isn't trivial on all architectures and it's not always feasible to do with userland buffers. That means that either the cache must be flushed at the time of the mapping _and_ the buffer not touched at all by the CPU until it's unmapped, or the map call must change the userland mapping to the buffer to mark it uncached. The problems of course start popping up if this buffer happens to be shared between multiple processes... also, that user memory will also be mapped in the kernel as part of the kernel's linear mapping, which is cacheable. Thus you'll end up with pages mapped both cacheable and non-cacheable in different contexts. This is a good way to cause checkstops with a number of CPUs (certainly with PowerPC and I think also with AMD's). (At this point, for those who didn't figure out yet that AGP was just a piece of crap in the first place, welcome to the real world)

Yes, this I think is addressed by the Map/Unmap semantics from ARB_vbo and the additional constraints I included in the design, ie that the only time the buffer contents are meant to be available as user memory is when they are Mapped (in the ARB_vbo sense, which means *not* mapped in the GART). Thus it's allowable to tear down the cached access to these pages on upload, and especially so if required for system stability.

IE, in systems where stability is threatened by the cacheable access to the pages, it would be necessary to extract the uploaded pages from those cacheable maps and have them solely in the GTT - if it is possible to populate the GTT at all, it must be possible to populate it with these pages?

If this is an absolute killer for some systems, then I guess there would have to be a different AGP memory manager on those systems, based more on the sort of copying and DMA operations that I expect a Video RAM implementation would use. I feel however that the ARB_vbo APIs and semantics are strong enough to cover both cases.

> - I would separate the mechanism for locking/preparing user buffers for DMA from binding them into a GART, whether it's AGP, or a card specific one like ATI's PCI GART, or a card's SG DMA engine, like nvidia objects. That is, on one hand, you have the user program which has buffers (textures, fbo's, whatever... ) that it wants to make available for access by the engine; on the other, you have whatever mechanism your card provides to access system memory with address translation. The latter changes. AGP GART is one way but is slowly going away, ATIs use a similar mechanism on PCI and PCI-E, nVidia, afaik, can store in the card descriptors for objects in system memory associated with a scatter gather list (or a page table), on ATI, I've also had good results using the scatter-gather DMA engine instead of the GART for some types of transfers (like card -> memory, a direction for which AGP often doesn't work at all) etc... All those mechanisms however rely on the same left hand basic mechanism for locking/mapping the user buffers for DMA, which also requires some resource tracking to make sure a simple DRI client can't hog the whole system by pinning down too much user memory, etc...

Indeed. I think that taking the API from ARB_vbo makes these different implementations entirely possible. The implementation I am interested in right now is the AGP one, but don't take that to imply that other implementations and backends are excluded. I've tried to provide exactly the separation that I think you're talking about by sticking closely to the buffer_object semantics from ARB_vbo/ARB_fbo for the interface.

Keith
Re: Memory management - an AGP manager
On Sun, 2006-01-08 at 22:55 +, Keith Whitwell wrote:
> Yes, this I think is addressed by the Map/Unmap semantics from ARB_vbo and the additional constraints I included in the design, ie that the only time the buffer contents are meant to be available as user memory is when they are Mapped (in the ARB_vbo sense, which means *not* mapped in the GART). Thus it's allowable to tear down the cached access to these pages on upload, and especially so if required for system stability. IE, in systems where stability is threatened by the cacheable access to the pages, it would be necessary to extract the uploaded pages from those cacheable maps and have them solely in the GTT - if it is possible to populate the GTT at all, it must be possible to populate it with these pages?

Yup, I suppose so, at least in the case of an AGP GART... PCI GARTs & friends have other issues (read: 32-bit GARTs on 64-bit systems without IOMMUs, for example).

> If this is an absolute killer for some systems, then I guess there would have to be a different AGP memory manager on those systems, based more on the sort of copying and DMA operations that I expect a Video RAM implementation would use. I feel however that the ARB_vbo APIs and semantics are strong enough to cover both cases.

We can always do an ugly memcpy to a pre-allocated AGP area on systems where that wouldn't work at all...

One issue is you can't necessarily tear down the cacheable mapping. For example, in some systems like G5s, the kernel linear mapping of all RAM is done using large pages (16M pages). But the AGP pages are individual 4k pages. Unfortunately, you can't easily just break up one large page into small pages on those systems. However, they have fine grained cache manipulation instructions too, thus what we can do is tear down the user mapping completely (or rather make it non-accessible though still present for memory accounting purposes; those pages are still owned by the user, just neither readable nor writeable) and keep the kernel mapping. That would have no adverse side effect if we take care of flushing the relevant cache pages when doing so. The cacheable kernel mapping will still be there, but it shouldn't be a problem.

In fact, the above might well be good for performance too ! It would allow the user mapping to be fully cacheable (since we can explicitly tear it down / flush it when mapping into card space).

In fact, tearing it down isn't even necessary. All that is required is that we have a checkpoint when the buffer changes ownership where we can do the appropriate cache flush/invalidate depending on the direction of the transfer. That's it. If the user doesn't respect that and taps the buffer at the wrong times, it will have corrupted pixels, but that's a programming error. We can enforce it with MMU protection but that would be more costly in terms of performance.

> Indeed. I think that taking the API from ARB_vbo makes these different implementations entirely possible. The implementation I am interested in right now is the AGP one, but don't take that to imply that other implementations and backends are excluded. I've tried to provide exactly the separation that I think you're talking about by sticking closely to the buffer_object semantics from ARB_vbo/ARB_fbo for the interface.

Ok, I'm not familiar with those semantics (as I'm not too familiar with such high-level things as OpenGL :) but I'll try to have a look one of these days).

Cheers,
Ben.
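The "checkpoint when the buffer changes ownership" idea from the message above can be sketched in C as follows. This is an assumption-laden illustration, not an actual kernel interface: the names are hypothetical, and the cache operations are replaced by counters so the ordering can be checked. The sketch writes back dirty lines on every hand-over to the device (so the device sees current data, and so no stale dirty line is written over data the device produces), and invalidates on the way back only when the device was the writer.

```c
#include <assert.h>
#include <stddef.h>

enum owner    { OWNER_CPU, OWNER_DEVICE };
enum xfer_dir { TO_DEVICE, FROM_DEVICE };

struct shared_buf {
    enum owner    owner;
    enum xfer_dir dir;          /* direction of the transfer in flight */
    unsigned      writebacks;   /* counters stand in for real cache ops */
    unsigned      invalidates;
};

/* Placeholders: on real hardware these would be architecture-specific
 * cache maintenance operations over the buffer's address range. */
static void cache_writeback(struct shared_buf *b)  { b->writebacks++; }
static void cache_invalidate(struct shared_buf *b) { b->invalidates++; }

/* Checkpoint 1: CPU gives the buffer to the device. */
int hand_to_device(struct shared_buf *b, enum xfer_dir dir)
{
    if (b->owner != OWNER_CPU)
        return -1;              /* already owned by the device */
    cache_writeback(b);         /* push dirty CPU lines out to memory */
    b->dir = dir;
    b->owner = OWNER_DEVICE;
    return 0;
}

/* Checkpoint 2: device gives the buffer back to the CPU. */
int hand_to_cpu(struct shared_buf *b)
{
    if (b->owner != OWNER_DEVICE)
        return -1;
    if (b->dir == FROM_DEVICE)
        cache_invalidate(b);    /* discard stale lines before CPU reads */
    b->owner = OWNER_CPU;
    return 0;
}
```

Nothing in this sketch stops the CPU from touching the buffer while the device owns it. As the message says, that is treated as a programming error unless MMU protection is added at extra cost.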
Re: Memory management - an AGP manager
>> Indeed. I think that taking the API from ARB_vbo makes these different implementations entirely possible. The implementation I am interested in right now is the AGP one, but don't take that to imply that other implementations and backends are excluded. I've tried to provide exactly the separation that I think you're talking about by sticking closely to the buffer_object semantics from ARB_vbo/ARB_fbo for the interface.
>
> Ok, I'm not familiar with those semantics (as I'm not too familiar with such high-level things as OpenGL :) but I'll try to have a look one of these days).

It's usually very worthwhile to track down the GL extension that exposes the HW functionality in question - the extensions have been written by people who have already implemented a solution to whatever problem you're considering, usually on multiple systems, so they often contain a wealth of distilled knowledge applicable to the low-level design.

Keith
Re: [Mesa3d-dev] Memory management - an AGP manager
Keith Whitwell wrote:
> Right now, I'm primarily concerned with unified memory chipsets, like i915 and via. This memory manager would be suitable for managing the AGP memory on non-unified chipsets, but a different implementation would be needed for the on-card video ram, based more on dma and copying than map/unmapping as will be seen below.

I'm not sure I quite understand that, do you propose completely separate managers for handling agp and video ram? I think this introduces some problems for things which can be in either agp or video ram. Some cards, for instance, may be able to render to agp memory, though the performance hit may be so big that you really want to use video memory unless you simply don't have enough video mem.

A very common case which can use both video memory and agp memory is textures: for performance reasons you usually want to place them in video memory, but putting some textures into agp memory is a lot faster than texture thrashing. I guess you could handle that at the driver level (since it always has access to all ram), but performance may potentially suffer quite a bit if multiple apps are running at the same time and stomp over each other's buffers (which could be better avoided with a coordinated scheme that allocates some buffers in agp memory instead of video ram).

Roland
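The placement preference described above (video memory first, AGP as a fallback rather than thrashing) can be written down as a very small C sketch. It is only an illustration of the policy under my own assumptions: the names are hypothetical, free space is tracked as plain byte counts, and a real coordinated manager would also have to track per-client usage and fragmentation.

```c
#include <assert.h>
#include <stddef.h>

enum pool { POOL_VRAM, POOL_AGP, POOL_NONE };

struct pools {
    size_t vram_free;   /* bytes still free in on-card video memory */
    size_t agp_free;    /* bytes still free in the AGP aperture */
};

/* Prefer video memory; fall back to AGP memory before giving up. */
enum pool place_texture(struct pools *p, size_t bytes)
{
    if (bytes <= p->vram_free) {
        p->vram_free -= bytes;
        return POOL_VRAM;
    }
    if (bytes <= p->agp_free) {
        p->agp_free -= bytes;
        return POOL_AGP;
    }
    return POOL_NONE;   /* caller has to evict something first */
}
```

With 8 bytes of video memory and 8 of AGP free, three 6-byte requests in a row would land in video memory, then AGP, and then fail, at which point an eviction policy has to take over.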