Re: uncached page allocator
Peter Zijlstra wrote:
> On Tue, 2007-08-21 at 16:05 +1000, Dave Airlie wrote:
>
>> So you can see why some sort of uncached+writecombined page cache
>> would be useful. I could just allocate a bunch of pages at startup as
>> uncached+writecombined, and allocate pixmaps from them, and when I
>> bind/free the pixmap I don't need the flush at all. Now I'd really
>> like this to be part of the VM so that under memory pressure it can
>> just take the pages I've got in my cache back and, after flushing,
>> turn them back into cached pages; the other option is for the DRM to
>> do this on its own and penalise the whole system.
>
> Can't you make these pages part of the regular VM by sticking them all
> into an address_space?
>
> And for this reclaim behaviour you'd only need to set PG_private and
> have a_ops->releasepage() dtrt.

I'd suggest Dave just registers a shrinker to start with. You really want
to be able to batch TLB flushes as well, which ->releasepage may not be
so good at (you could add more machinery behind the releasepage to build
batches and so on, but anyway, a shrinker might be the quickest way to
get something working).

--
SUSE Labs, Novell Inc.

___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: uncached page allocator
There is an uncached allocator in the IA64 arch code
(linux/arch/ia64/kernel/uncached.c). Maybe having a look at that will
help? Jes wrote it.
Re: uncached page allocator
On Tue, 2007-08-21 at 16:05 +1000, Dave Airlie wrote:
> So you can see why some sort of uncached+writecombined page cache
> would be useful. I could just allocate a bunch of pages at startup as
> uncached+writecombined, and allocate pixmaps from them, and when I
> bind/free the pixmap I don't need the flush at all. Now I'd really
> like this to be part of the VM so that under memory pressure it can
> just take the pages I've got in my cache back and, after flushing,
> turn them back into cached pages; the other option is for the DRM to
> do this on its own and penalise the whole system.

Can't you make these pages part of the regular VM by sticking them all
into an address_space?

And for this reclaim behaviour you'd only need to set PG_private and
have a_ops->releasepage() dtrt.
Re: uncached page allocator
> Blame intel ;)
>
> > Any other ideas and suggestions?
>
> Without knowing exactly what you are doing:
>
> - Copies to uncached memory are very expensive on an x86 processor
>   (so it might be faster not to write and flush)
> - It's not clear from your description how intelligent your transfer
>   system is.

It is still possible to change the transfer system, but it should be
intelligent enough, or possible to make it more intelligent. I also
realise I need PAT + write combining, but I believe this problem is
orthogonal...

> I'd expect for example that the process was something like
>
> Parse pending commands until either
> 1. Queue empties
> 2. A time target passes
>
> For each command we need to shove a pixmap over, add it
> to the buffer to transfer
>
> Do a single CLFLUSH and maybe IPI
>
> Fire up the command queue
>
> Keep the buffers hanging around until there is memory pressure
> if we may reuse that pixmap
>
> Can you clarify that?

So at the moment a pixmap maps directly to a kernel buffer object, which
is a bunch of pages that get faulted in on the CPU or allocated when the
buffer is to be used by the GPU. So when a pixmap is created a buffer
object is created, and when a pixmap is destroyed a buffer object is
destroyed. Perhaps I could cache a bunch of buffer objects in userspace
for re-use as pixmaps, but I'm not really sure that will scale too well.

When X wishes the GPU to access a buffer (pixmap), it calls into the
kernel with a single ioctl with a list of all buffers the GPU is going
to access, along with a buffer containing the commands to do the access.
At the moment, when each of those buffers is bound into the GART for the
first time, the system does a change_page_attr for each page and calls
the global flush [1].

Now if a buffer is bound into the GART and gets accessed from the CPU
later again (software fallback), we have the choice of taking it back
out of the GART and letting the nopfn call fault the pages back in
uncached, or we can flush the tlb and bring them back in cached. We are
hoping to avoid software fallbacks as much as possible on the hardware
platforms we want to work on.

Finally, when a buffer is destroyed, the pages are released back to the
system, so of course the pages are set back to cached and we need
another tlb/cache flush per pixmap buffer destructor.

So you can see why some sort of uncached+writecombined page cache would
be useful. I could just allocate a bunch of pages at startup as
uncached+writecombined, and allocate pixmaps from them, and when I
bind/free the pixmap I don't need the flush at all. Now I'd really like
this to be part of the VM so that under memory pressure it can just take
the pages I've got in my cache back and, after flushing, turn them back
into cached pages; the other option is for the DRM to do this on its own
and penalise the whole system.

[1] This is one inefficiency, in that if multiple buffers are being
bound in for the first time it'll flush for each of them. I'm trying to
get rid of this inefficiency, but I may need to tweak the order of
things, as at the moment it crashes hard if I try to leave the cache/tlb
flush until later.

Dave.
Re: uncached page allocator
> allocate pixmap - gets cached memory
> copy data into the pixmap
> pre-use from hardware we flush the cache lines and tlb
> use the pixmap in hardware
> pre-free we need to set the page back to cached so we flush the tlb
> free the memory.
>
> Now the big issue here on SMP is that the cache and/or tlb flushes
> require IPIs and they are very noticeable on the profiles,

Blame intel ;)

> Any other ideas and suggestions?

Without knowing exactly what you are doing:

- Copies to uncached memory are very expensive on an x86 processor
  (so it might be faster not to write and flush)
- It's not clear from your description how intelligent your transfer
  system is.

I'd expect for example that the process was something like:

Parse pending commands until either
1. Queue empties
2. A time target passes

For each command we need to shove a pixmap over, add it
to the buffer to transfer

Do a single CLFLUSH and maybe IPI

Fire up the command queue

Keep the buffers hanging around until there is memory pressure
if we may reuse that pixmap

Can you clarify that?

If the hugepage anti-frag stuff ever gets merged this would also help,
as you could possibly grab a huge page from the allocator for this
purpose and have to flip only one TLB entry.

Alan
uncached page allocator
Hi all,

I've started doing some work with using the new DRM memory manager from
TG for pixmaps in the X server using Intel 9xx series hardware. The
Intel hardware pretty much requires pages to be uncached for the GPU to
access them. It can use cached memory for some operations, but it isn't
very useful, and my attempts to use it ended in a lot of crashiness.

Now one of the major usage patterns for pixmaps is:

allocate pixmap
copy data into pixmap
use pixmap from hardware
free pixmap

With the current memory manager + updated change_page_attr (to use
clflush when we have it) fixes from Andi Kleen, it operates something
like this:

allocate pixmap - gets cached memory
copy data into the pixmap
pre-use from hardware we flush the cache lines and tlb
use the pixmap in hardware
pre-free we need to set the page back to cached so we flush the tlb
free the memory.

The other path, if we don't ever want to use the memory cached, is just:

allocate pixmap
flush cache lines/tlb
use uncached from CPU
use uncached from GPU
pre-free set the page back to cached, flush the TLB
free the page

Now the big issue here on SMP is that the cache and/or tlb flushes
require IPIs and they are very noticeable on the profiles. So after all
that, I'd like to have some sort of uncached page list I can allocate
pages from, so with frequent pixmap creation/destruction I don't spend a
lot of time in the cache flushing routines and avoid the IPI in
particular.

The options I can sorta see roughly are:

1. The DRM just allocates a bunch of uncached pages and manages a cache
of them for interacting with the hardware. This sounds wrong, and we run
into how-do-we-correctly-size-the-pool issues.

2. (Is this idea crazy??) We modify the VM somehow so we have an
uncached list; when we first allocate pages with GFP_UNCACHED they get
migrated to the uncached zone, and the pages use a page flag to say they
are uncached. Then the DRM just re-uses things from that list. If later
we end up with memory pressure, the free pages on the uncached list
could be migrated back to the normal page lists by modifying the page
attributes and flushing the tlb.

Any other ideas and suggestions?

Dave.