Jerome Glisse wrote:
> On Tue, 13 May 2008 21:35:16 +0100 (IST)
> Dave Airlie <[EMAIL PROTECTED]> wrote:
>
>   
>> 1) I feel there hasn't been enough open driver coverage to prove it. So
>> far we have done an Intel IGD; we have a lot of code that isn't required
>> for these devices, so the question is how much code exists purely to
>> support the Poulsbo closed-source userspace, and why we need to live
>> with it. Both radeon and nouveau developers have expressed frustration
>> about the fencing internals being really hard to work with, which doesn't
>> bode well for maintainability in the future.
>>     
>
> Well, my TTM experiment brought me up to EXA with radeon, and I also did
> several small 3D tests to see how I want to send commands. So from my
> experiments, here are the things that are becoming painful for me.
>
> On some radeon hardware (most newer cards with large amounts of RAM) you
> can't map VRAM beyond the aperture; well, you can, but you need to
> reprogram the card's aperture, and that's not something you want to do.
> TTM's assumption is that memory accesses are done through a mapping of
> the buffer, so in this situation it becomes cumbersome. We already
> discussed this, and the idea was to split VRAM, but I don't like that
> solution. So in the end I am more and more convinced that we should
> avoid mapping objects into the client's VMA. I see two advantages to
> this: no TLB flushes on the VMA, and no hard-to-solve page-mapping
> aliasing.
>
> On the fence side, I hoped that I could have reasonable code using IRQs
> working reliably, but after discussion with AMD, what I was doing was
> obviously not recommended and prone to hard GPU lockups, which is a
> no-go for me. The last solution I have in mind for synchronization,
> i.e. knowing when the GPU is done with a buffer, could not use IRQs, at
> least not on all the hardware I am interested in (r3xx/r4xx). Of course
> I don't want to busy-wait to find out when the GPU is done. Also, the
> fence code puts too many assumptions on what we should provide; while
> fencing might prove useful, I think it is better served by
> driver-specific ioctls than by a common infrastructure that the
> hardware obviously doesn't fit well, given its differences.
>
> And like Stephane, I think GPU virtual memory can't be used to its best
> in this scheme.
>
> That said, I also share some concerns about GEM, like the highmem page
> issue, but I think that one is workable with the help of kernel people.
> For VRAM, the solution discussed so far, and which I like, is to have
> the driver choose, based on client requests, which objects to put
> there, and to treat VRAM as a cache. So we will have all objects backed
> by a RAM copy (which can be swapped out); then it's all a matter of
> syncing the VRAM copy and the RAM copy when necessary. Domain and
> pread/pwrite access let you easily do this sync only on the necessary
> area. Suspend also becomes easier: just sync the objects whose write
> domain is the GPU. So all in all, I agree that GEM might ask each
> driver to redo some things, but I think a large set of helper functions
> can ease that; more importantly, I see this as freedom for each driver
> and the only way to cope with hardware differences.
>
> Cheers,
> Jerome Glisse <[EMAIL PROTECTED]>
>   
Jerome, Dave, Keith,

It's hard to argue against people trying things out and finding it's not 
really what they want, so I'm not going to do that.

The biggest argument (apart from the fencing) seems to be that people
think TTM stops them from doing what they want with the hardware,
although it seems like the Nouveau needs and the Intel UMA needs are
quite opposite. In an open-source community where people work on things
because they want to, not being able to do what you want is a bad thing.

OTOH, a stall and a disagreement about what's best to use is even
worse. It confuses users, and it's particularly bad for people trying
to write drivers on a commercial basis.

I've looked through KeithP's mail for a way to use GEM for future
development. Since many things will be device-dependent, I think it's
possible for us to work around some of the issues I see, but a couple
of big things remain.

1) The inability to map device memory. The design arguments and the
proposed solution for VRAM are not really valid. Think of this probably
not-too-uncommon scenario (sketched in code after the list): a
single-pixel fallback composite to a scanout buffer in VRAM, or a
texture or video-frame upload:

A) Page in all GEM pages, because they've been paged out.
B) Copy the complete scanout buffer to GEM because it's dirty. Untile.
C) Write the pixel.
D) Copy the complete buffer back while tiling.
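
To make the cost concrete, here is a rough userspace sketch of that
scenario through a copy-based interface. The wrappers drm_gem_pread()
and drm_gem_pwrite() and the buffer geometry are made up for
illustration; they stand in for whatever pread/pwrite-style ioctls a
driver would expose, and the point is only the asymmetry between the
work requested and the work done:

#include <stdint.h>
#include <stdlib.h>

#define WIDTH   1280
#define HEIGHT  1024
#define PITCH   (WIDTH * 4)             /* 32 bpp */

/* Hypothetical wrappers around pread/pwrite-style object ioctls. */
extern int drm_gem_pread(int fd, uint32_t handle, uint64_t offset,
                         uint64_t size, void *dst);
extern int drm_gem_pwrite(int fd, uint32_t handle, uint64_t offset,
                          uint64_t size, const void *src);

static void put_pixel_copy_path(int fd, uint32_t scanout,
                                int x, int y, uint32_t argb)
{
    uint32_t *shadow = malloc(PITCH * HEIGHT); /* error handling elided */

    /* A + B: fault in the backing pages and copy (and untile) the
     * complete ~5 MB scanout buffer, because it is dirty in VRAM. */
    drm_gem_pread(fd, scanout, 0, PITCH * HEIGHT, shadow);

    /* C: the actual work -- a four-byte store. */
    shadow[y * WIDTH + x] = argb;

    /* D: copy (and retile) the complete buffer back. */
    drm_gem_pwrite(fd, scanout, 0, PITCH * HEIGHT, shadow);
    free(shadow);
}

/* With a direct mapping of the object in VRAM, the same operation is a
 * single store through the aperture: */
static void put_pixel_mapped(volatile uint32_t *vram,
                             int x, int y, uint32_t argb)
{
    vram[y * (PITCH / 4) + x] = argb;
}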

2) Reserving pages when allocating VRAM buffers is also a very bad
solution, particularly on systems with a lot of VRAM and little system
RAM (multiple-card machines?). GEM basically needs to reserve swap
space when buffers are created and put a limit on the number of pinned
physical pages. We basically should not be able to fail a memory
allocation during execbuf, because we cannot recover from that.
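
A minimal sketch of the kind of accounting I mean, with all names
hypothetical: swap space is reserved up front when an object is
created, and pinning pages for the GPU is charged against a hard cap,
so that by the time execbuf runs there is no fallible allocation left
to do:

#include <linux/errno.h>
#include <linux/spinlock.h>

struct drm_mem_accounting {
    spinlock_t    lock;          /* spin_lock_init() at device setup */
    unsigned long swap_reserved; /* pages of swap promised to objects */
    unsigned long swap_limit;
    unsigned long pinned;        /* pages currently pinned for the GPU */
    unsigned long pinned_limit;
};

/* Called at object creation: fail here, where userspace can cope. */
static int drm_reserve_backing(struct drm_mem_accounting *acc,
                               unsigned long npages)
{
    int ret = 0;

    spin_lock(&acc->lock);
    if (acc->swap_reserved + npages > acc->swap_limit)
        ret = -ENOSPC;
    else
        acc->swap_reserved += npages;
    spin_unlock(&acc->lock);
    return ret;
}

/* Called when validating buffers for execution: pinning is pure
 * bookkeeping against the cap, never an allocation that can fail. */
static int drm_pin_pages(struct drm_mem_accounting *acc,
                         unsigned long npages)
{
    int ret = 0;

    spin_lock(&acc->lock);
    if (acc->pinned + npages > acc->pinned_limit)
        ret = -EAGAIN;           /* caller evicts or waits, then retries */
    else
        acc->pinned += npages;
    spin_unlock(&acc->lock);
    return ret;
}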

Other things, like GFP_HIGHUSER etc., are probably fixable if there is
a will to do it.

So if GEM is the future, these shortcomings must IMHO be addressed. In
particular, GEM should not stop people from mapping device memory
directly, especially in view of the arguments against TTM previously
outlined.

This means that the dependency on SHMEMFS probably needs to be dropped
and replaced with some sort of DRMFS that allows overloading mmap,
handles swap correctly, addresses the caching issue, and also avoids
the driver do_mmap().
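
As a rough idea of what that could look like, here is a sketch of a
DRMFS mmap hook; everything in it, including the drm_gem_object field,
is hypothetical. The point is that mmap hands control to the driver,
so device-memory objects get a direct, correctly-cached mapping
instead of being forced through shmem pages:

#include <linux/fs.h>
#include <linux/mm.h>

struct drm_gem_object {
    /* Minimal stand-in: set by the driver for objects that live in
     * device memory and need their own fault/caching behaviour. */
    struct vm_operations_struct *driver_vm_ops;
};

static int drmfs_mmap(struct file *filp, struct vm_area_struct *vma)
{
    struct drm_gem_object *obj = filp->private_data;

    if (obj->driver_vm_ops) {
        /* Device memory: the driver's vm_ops fault pages straight
         * from the aperture, with write-combined PTEs. */
        vma->vm_ops = obj->driver_vm_ops;
        vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
        vma->vm_flags |= VM_IO;
        return 0;
    }

    /* System-memory objects would get ordinary pagecache semantics
     * here, with a ->writepage that gives correct swap handling. */
    return -ENODEV;
}

static const struct file_operations drmfs_fops = {
    .mmap = drmfs_mmap,
};

That keeps both the mapping and the swap path under DRM control, so
caching attributes and eviction policy can follow the object rather
than the filesystem.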

If we're taking another round at this, there's a need to get it more
right than the old solution.

/Thomas



