> allocate pixmap gets cached memory
> copy data into the pixmap
> pre-use from hardware we flush the cache lines and tlb
> use the pixmap in hardware
> pre-free we need to set the page back to cached so we flush the tlb
> free the memory.
>
> Now the big issue here on SMP is that the cache and/or tlb flushes
> require IPIs and they are very noticeable on the profiles,

Blame Intel ;)

> Any other ideas and suggestions?

Without knowing exactly what you are doing:

- Copies to uncached memory are very expensive on an x86 processor (so it
  might be faster not to write and flush)
- It's not clear from your description how intelligent your transfer
  system is.

I'd expect, for example, that the process was something like

	Parse pending commands until either
		1. Queue empties
		2. A time target passes

	For each command that needs a pixmap shoved over, add it
	to the buffer to transfer

	Do a single CLFLUSH and maybe IPI

	Fire up the command queue

	Keep the buffers hanging around until there is memory pressure
	if we may reuse that pixmap

Can you clarify that?

If the hugepage anti-frag stuff ever gets merged this would also help, as
you could possibly grab a huge page from the allocator for this purpose
and have to flip only one TLB entry.

Alan
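
The batching loop outlined above could be sketched roughly as below. This is a user-space illustration only, not driver code: `struct command`, `struct batch_stats`, `flush_once()`, `fire_command_queue()`, `process_batch()` and `BATCH_MAX` are all hypothetical names, the "time target" is approximated by a simple command budget, and the flush is a counter standing in for the real CLFLUSH and possible IPI. Buffer reuse under memory pressure is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BATCH_MAX 64   /* hypothetical cap on pixmaps per batch */

/* Hypothetical command record; not a real driver structure. */
struct command {
	bool needs_pixmap;   /* true if a pixmap must be handed to the hardware */
	int  pixmap_id;
};

struct batch_stats {
	size_t commands_parsed;
	size_t pixmaps_queued;
	size_t flushes;       /* stands in for CLFLUSH plus a possible IPI */
	size_t submissions;   /* times the command queue was fired */
};

/* Stand-in for the expensive flush: one call covers the whole batch. */
static void flush_once(struct batch_stats *st, const int *ids, size_t n)
{
	(void)ids;            /* a real version would flush these buffers */
	assert(n <= BATCH_MAX);
	st->flushes++;
}

/* Stand-in for kicking the hardware command queue. */
static void fire_command_queue(struct batch_stats *st)
{
	st->submissions++;
}

/*
 * Parse pending commands until the queue empties, the budget (an assumed
 * proxy for the time target) is used up, or the transfer buffer is full.
 * Pixmaps are only collected in the loop; the flush happens once after it.
 * Returns the number of commands consumed.
 */
static size_t process_batch(const struct command *queue, size_t len,
			    size_t budget, struct batch_stats *st)
{
	int transfer[BATCH_MAX];
	size_t n_transfer = 0;
	size_t i = 0;

	while (i < len && i < budget && n_transfer < BATCH_MAX) {
		if (queue[i].needs_pixmap)
			transfer[n_transfer++] = queue[i].pixmap_id;
		i++;
	}

	st->commands_parsed += i;
	st->pixmaps_queued += n_transfer;

	if (n_transfer > 0)
		flush_once(st, transfer, n_transfer);   /* single flush per batch */
	if (i > 0)
		fire_command_queue(st);

	return i;
}
```

The point of the sketch is only that the flush count scales with the number of batches rather than the number of pixmaps: a batch of five commands carrying three pixmaps still costs one flush.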