Stephane Marchesin wrote:

>Stephane Marchesin wrote:
>>Keith Whitwell wrote:
>>>We've just merged the texmem-0-3-branch code.  This has been a major 
>>>project, probably much bigger than we realized when we started on it.
>>>
>>>The fundamental technology underpinning the changes is Thomas 
>>>Hellstrom's new TTM memory manager which dynamically maps buffers into 
>>>and out of the GART aperture as required for rendering.
>>>
>>>The benefits that flow from this are:
>>>     - Vastly increased amount of memory available for texturing.
>>>     - No need to reserve/waste memory for texture pool when not rendering.
>>>     - Fast transfers to *and* from video memory.
>>>
>>>As a result we've been able to build a whole bunch of features into the 
>>>i915tex driver that haven't been present in DRI-based drivers previously:
>>>
>>>     - EXT_framebuffer_objects, render to texture
>>>     - ARB_pixel_buffer_objects
>>>     - Accelerated
>>>             - CopyTexSubimage
>>>             - DrawPixels
>>>             - ReadPixels
>>>             - CopyPixels
>>>     - Accelerated texture uploads from pixel buffer objects
>>>     - Potentially texturing directly from the pixel buffer object (zero 
>>>copy texturing).
>>>
>>>If/when other drivers are ported to the memory manager, it will be easy 
>>>to support VBOs in the same way.
>>>
>>Hello,
>>
>>Nice work on the code and design! It seems to me that this can really 
>>provide significant speedups for AGP access.
>>
>>Now, I'm interested in knowing what will happen next. In particular, I 
>>have two questions:
>>- The current design always assumes that memory chunks are mapped into 
>>the process address space, which is not always possible with, say, VRAM 
>>above 128 MB on radeon. If the design is to be unified, that's an 
>>assumption that can cause some trouble. How will that be done?
>>- Second, no real solution was proposed for cards which have multiple 
>>hardware contexts 
>>(http://www.mail-archive.com/dri-devel@lists.sourceforge.net/msg28472.html). 
>>I expect this hardware model to become more widespread in the future. How 
>>should we handle it? In my opinion, it's not only about supporting a 
>>single kind of hardware (in which case we'd happily write our own memory 
>>manager), but about being future-proof.
>>
>Anyone? It would be nice to know the goals of the current memory 
>manager. If it is only tailored for low-end graphics, we will simply 
>write our own system. But we need to know.
>
>Stephane
>
Hi.

First, extending the current manager to handle VRAM and non-mappable VRAM
is possible and will probably be done in the not too distant future.

The design does not assume that physical memory chunks are mappable.
Before you can have CPU access to a buffer, it must be mapped. If it
resides in a non-mappable part of memory at that point, it is first moved
to a part of memory that is mappable. The buffer then remains accessible
to the CPU until it is unmapped. If needed, the manager waits for the GPU
to finish its access before completing the mapping. In the radeon case, I
suspect the best option would be to have the blitter move the buffer
contents from the unmappable part of VRAM to a mappable one.
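In rough code, the CPU-access flow I have in mind looks something like the
sketch below. All of the names (bo_is_mappable, bo_migrate_to_mappable,
bo_map, bo_unmap) are made up for illustration; they are not the actual
TTM entry points:

#include <string.h>

struct buffer_object;                        /* opaque handle, hypothetical */

extern int   bo_is_mappable(struct buffer_object *bo);
extern int   bo_migrate_to_mappable(struct buffer_object *bo);
extern void *bo_map(struct buffer_object *bo);
extern void  bo_unmap(struct buffer_object *bo);

int cpu_fill_buffer(struct buffer_object *bo, const void *data, size_t size)
{
        void *ptr;

        /* If the buffer currently sits in a non-mappable part of memory,
         * the manager first moves it somewhere mappable, waiting for the
         * GPU to finish with it if necessary. */
        if (!bo_is_mappable(bo) && bo_migrate_to_mappable(bo) != 0)
                return -1;

        ptr = bo_map(bo);        /* CPU access is legal from here ...       */
        if (!ptr)
                return -1;
        memcpy(ptr, data, size);
        bo_unmap(bo);            /* ... until the buffer is unmapped again. */
        return 0;
}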

One interesting aspect of the memory manager is that when space is 
needed, a buffer may be evicted while it is still mapped. User space 
will never notice that the buffer has moved physically. This should be 
true for both VRAM and AGP memory.

To obtain a physical address for a buffer (which is needed for GPU 
access), the buffer is "validated". Validating includes placing the 
buffer in a GPU-accessible part of memory and handing back a physical 
offset. This can either be done before the command stream accessing the 
buffer is constructed, or, as in the intel case, just before the command 
stream is sent to the hardware. In the intel case, offsets to buffers 
are fixed up by the user-space driver once it knows exactly what they 
are; there is no command validation included in the process.

The caller must then associate the buffer with a "fence object" that can 
tell the kernel when the GPU is finished with the buffer. The caller 
then has two options. One is to mark the buffer as NO_MOVE, which means 
that it will keep its current location until the caller says otherwise *). 
This should not really be needed, as it fragments memory. The other 
option is for the caller to rely on the kernel not to move the buffer 
until the fence object has signaled that it is OK. This option is used 
by the intel driver.

A fence object is usually implemented as a sequence number submitted to 
the command stream, a user interrupt that occurs when the command stream 
has passed the sequence number, and a means to flush the rendering 
caches. A command-buffer fence, for example, does not need the flushing 
operation, whereas a texturing operation needs to make sure that all 
operations in the command stream prior to the fence have finished and 
that their results have been flushed to memory.
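The validate-and-fence sequence then looks roughly like this. Again,
every name here (bo_validate, emit_state_with_offset, emit_fence,
bo_set_fence) is invented for illustration and is not the real drm
interface:

#include <stdint.h>

struct buffer_object;                  /* opaque handle, hypothetical        */
struct fence;                          /* signals when the GPU has passed it */

extern uint32_t      bo_validate(struct buffer_object *bo); /* GPU offset */
extern void          emit_state_with_offset(uint32_t gpu_offset);
extern struct fence *emit_fence(int needs_cache_flush);
extern void          bo_set_fence(struct buffer_object *bo, struct fence *f);

void draw_with_texture(struct buffer_object *tex)
{
        /* Validation places the buffer in a GPU-accessible part of memory
         * and hands back its current physical offset. */
        uint32_t offset = bo_validate(tex);

        /* The user-space driver fixes the offset up in the command stream
         * once it knows exactly what it is. */
        emit_state_with_offset(offset);

        /* A texturing fence must also flush the render caches; a pure
         * command-buffer fence would not need that. */
        struct fence *f = emit_fence(1 /* needs_cache_flush */);

        /* The kernel will not move the buffer until the fence signals,
         * unless the buffer was marked NO_MOVE instead. */
        bo_set_fence(tex, f);
}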

Given this description, I cannot see how hardware with multiple 
rendering contexts would need anything more. What can be a bit difficult 
to implement is hardware with a per-context translation table, but with 
the current implementation that is not strictly necessary.

Also remember that some kernel memory manager operations can be slow. 
This includes copying and changing the page caching policy. The optimal 
use is to have user space create its own memory manager within a buffer 
handed to it by the kernel memory manager. For example, buffers can be 
allocated from the kernel in 2 MB chunks that are managed from user 
space. This is NOT currently done in the intel driver except for batch 
buffers. Still, performance is quite impressive.
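As a sketch of what such a user-space sub-allocator could look like
(CHUNK_SIZE, struct chunk and chunk_alloc are hypothetical, not something
that exists in the driver today):

#include <stddef.h>

struct buffer_object;                    /* opaque kernel-managed buffer */

#define CHUNK_SIZE (2 * 1024 * 1024)     /* one 2 MB buffer from the kernel */

struct chunk {
        struct buffer_object *bo;        /* backing buffer from the kernel manager */
        size_t                used;      /* simple bump pointer inside the chunk   */
};

/* Returns an offset inside the chunk, or (size_t)-1 when the chunk is
 * full and the caller has to ask the kernel manager for a new one.
 * 'align' is assumed to be a power of two. */
static size_t chunk_alloc(struct chunk *c, size_t size, size_t align)
{
        size_t offset = (c->used + align - 1) & ~(align - 1);

        if (offset + size > CHUNK_SIZE)
                return (size_t)-1;

        c->used = offset + size;
        return offset;
}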

*) A NO_MOVE buffer can actually lose its contents temporarily when its 
fence object has expired, but it is always guaranteed to be there if the 
owner's hardware lock is held, and it is always guaranteed to be in the 
same physical location. The X server will, for example, move all managed 
buffers out when it leaves the VT and takes the hardware lock, but will 
reload them before releasing the hardware lock.
Potential scan-out buffers should be marked NO_EVICT and are available 
only to the X server. Trying to leave the VT with active NO_EVICT buffers 
is an error.
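For the VT-switch case, the intent is roughly the following (again just a
sketch; hardware_lock, evict_all_managed_buffers and the other names are
invented to describe the behaviour, not real functions):

extern void hardware_lock(void);
extern void hardware_unlock(void);
extern void evict_all_managed_buffers(void);  /* NO_MOVE contents may go away here   */
extern void reload_managed_buffers(void);     /* restore them to the same locations  */

void xserver_leave_vt(void)
{
        hardware_lock();            /* NO_MOVE contents are only guaranteed
                                       while the owner holds the lock */
        evict_all_managed_buffers();
        /* The lock stays held while we are switched away. */
}

void xserver_enter_vt(void)
{
        reload_managed_buffers();
        hardware_unlock();
}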

/Thomas
