Re: [Linux-fbdev-devel] Re: radeon, apertures memory mapping
Benjamin Herrenschmidt [EMAIL PROTECTED] writes:

> In an ideal world ... However, since we are planning to move the memory manager to the kernel, that would mean a kernel access (syscall, ioctl, whatever...) twice per access to AGP memory. Not realistic.

Could the user space driver batch many such accesses together and use a lock_many()/unlock_many() API?

Søren

---
SF email is sponsored by - The IT Product Guide
Read honest candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://ads.osdn.com/?ad_id=6595alloc_id=14396op=click
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel
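The batching question above comes down to counting kernel entries. The following is a minimal sketch of that count, not a real DRM interface: `KernelMemoryManager`, `per_access` and `batched` are hypothetical names, and each lock or unlock call is assumed to cost exactly one syscall/ioctl.

```python
class KernelMemoryManager:
    """Toy stand-in for an in-kernel memory manager; it only counts kernel entries."""

    def __init__(self):
        self.kernel_entries = 0  # each call below models one syscall/ioctl
        self.locked = set()

    def lock(self, buf):
        self.kernel_entries += 1
        self.locked.add(buf)

    def unlock(self, buf):
        self.kernel_entries += 1
        self.locked.discard(buf)

    def lock_many(self, bufs):
        self.kernel_entries += 1  # one kernel entry covers the whole batch
        self.locked.update(bufs)

    def unlock_many(self, bufs):
        self.kernel_entries += 1
        self.locked.difference_update(bufs)


def per_access(mm, bufs):
    """Lock and unlock around every single buffer access."""
    for buf in bufs:
        mm.lock(buf)
        # ... CPU touches buf here ...
        mm.unlock(buf)
    return mm.kernel_entries


def batched(mm, bufs):
    """Lock everything once, do all the work, unlock once."""
    mm.lock_many(bufs)
    # ... CPU touches every buffer in bufs here ...
    mm.unlock_many(bufs)
    return mm.kernel_entries


buffers = ["vertex", "texture0", "texture1", "cmd"]
print(per_access(KernelMemoryManager(), buffers))  # 8
print(batched(KernelMemoryManager(), buffers))     # 2
```

In this model the per-access scheme costs two kernel entries per buffer, while the batched scheme costs two in total, whatever the batch size.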
Re: [Linux-fbdev-devel] Re: radeon, apertures memory mapping
On Mon, Mar 14, 2005 at 05:30:04PM +0100, Soeren Sandmann wrote:
> Benjamin Herrenschmidt [EMAIL PROTECTED] writes:
>
> > In an ideal world ... However, since we are planning to move the memory manager to the kernel, that would mean a kernel access (syscall, ioctl, whatever...) twice per access to AGP memory. Not realistic.
>
> Could the user space driver batch many such accesses together and use a lock_many()/unlock_many() API?

Naturally it should try to do as much as possible during the lock()/unlock() sequence.

-- 
Ville Syrjälä
[EMAIL PROTECTED]
http://www.sci.fi/~syrjala/
Re: [Linux-fbdev-devel] Re: radeon, apertures memory mapping
On Mon, 2005-03-14 at 17:30 +0100, Soeren Sandmann wrote:
> Benjamin Herrenschmidt [EMAIL PROTECTED] writes:
>
> > In an ideal world ... However, since we are planning to move the memory manager to the kernel, that would mean a kernel access (syscall, ioctl, whatever...) twice per access to AGP memory. Not realistic.
>
> Could the user space driver batch many such accesses together and use a lock_many()/unlock_many() API?

We may in fact have to use a lock/unlock API anyway, due to interaction with the VGA arbiter. If for some reason the card can't completely disable decoding of VGA and IO space, it needs to bracket any access to the framebuffer with something.

That is unusable for things like MOL though. We are giving the framebuffer to some foreign OS in an emulation shell that doesn't know how to do anything but blit directly at any time.

Oh well, I need to think a bit more about those scenarios.

Ben.
Re: [Linux-fbdev-devel] Re: radeon, apertures memory mapping
On Sun, Mar 13, 2005 at 11:19:35AM -0500, Jon Smirl wrote:
> On Sun, 13 Mar 2005 23:04:59 +1100, Benjamin Herrenschmidt [EMAIL PROTECTED] wrote:
> > > AGP as it's currently used is pretty much pointless for software fallbacks since reading from AGP memory is nearly as slow as reading from video memory.
> >
> > Hrm.. I wouldn't expect _that_ slow. It's uncacheable, right, but still on a faster bus. Especially if we use it the way we do on ppc where we actually map the RAM pages directly instead of having processes go through the GART.
>
> I asked at the Xdev conference if there were page table tricks that would work for accessing GART memory. Everybody said no but I'm still wondering if there are any.
>
> For example the ppc has an instruction for flushing specific pages from cache, unlike the x86 where you can only flush everything. So on the ppc you could leave the GART memory mapped normally and cached. Do all of your fallback calculations, then flush the address range from cache. Now tell the GPU to go use it.
>
> Can't GART memory be normally cached RAM as long as we flush the cache before telling the GPU to use it?
>
> If you are doing fallback calculations in a 6MB buffer that is 1,500 pages. Accessing all of this effectively flushes the data cache. Once you are done with it you probably don't want those pages in the cache anyway.

I don't understand why we have GART memory anyway. It's just main memory and I don't see any point going through the GART to access it with the CPU. Only the graphics card needs to use the GART.

-- 
Ville Syrjälä
[EMAIL PROTECTED]
http://www.sci.fi/~syrjala/
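The "6MB buffer that is 1,500 pages" figure quoted above can be checked with a little arithmetic. This is a small sketch that assumes 4 KiB pages; the 512 KiB cache size is a purely hypothetical value chosen to show why walking the buffer would displace most other cached data.

```python
PAGE_SIZE = 4096                 # assumption: 4 KiB pages
CACHE_BYTES = 512 * 1024         # hypothetical data cache size, for comparison only

buffer_bytes = 6 * 1024 * 1024   # the 6 MB fallback buffer from the message
pages = buffer_bytes // PAGE_SIZE
cache_multiples = buffer_bytes // CACHE_BYTES

print(pages)            # 1536, which the message rounds to 1,500
print(cache_multiples)  # 12: the buffer is many times larger than the assumed cache
```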
Re: [Linux-fbdev-devel] Re: radeon, apertures memory mapping
On Sun, 13 Mar 2005 19:47:14 +0200, Ville Syrjälä [EMAIL PROTECTED] wrote:
> I don't understand why we have GART memory anyway. It's just main memory and I don't see any point going through the GART to access it with the CPU. Only the graphics card needs to use the GART.

I see no need for the CPU to go through the GART either. The main CPU page tables can provide the same rearranging that the GART does. We do need specially marked GART memory because of caching issues. If the CPU writes to GART RAM, the write may still be on the CPU chip in a cache. We have to make sure it gets pushed into physical memory so that the GPU can see it.

The best model would be to chuck the AGP/PCI Express interface on the board and have a hyperchannel instead. Hyperchannel provides full cache consistency without all of these flushing problems. The GPU really is another specialized CPU; give it a CPU class memory interface.

-- 
Jon Smirl
[EMAIL PROTECTED]
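The caching issue described above (a CPU write that is still sitting in the cache when the GPU reads RAM) can be illustrated with a toy model. This is only a sketch under simplifying assumptions: a write-back cache tracked per address, a GPU whose GART reads never snoop the CPU cache, and hypothetical names (`WriteBackSystem`, `flush_range`) that do not correspond to any kernel API.

```python
class WriteBackSystem:
    """Toy model: CPU with a write-back cache, GPU reading RAM without snooping."""

    def __init__(self, size):
        self.ram = [0] * size
        self.cache = {}  # address -> dirty value that has not reached RAM yet

    def cpu_write(self, addr, value):
        self.cache[addr] = value  # the write stays in the cache

    def flush_range(self, start, end):
        """Write dirty entries in [start, end) back to RAM and drop them."""
        for addr in [a for a in self.cache if start <= a < end]:
            self.ram[addr] = self.cache.pop(addr)

    def gpu_read(self, addr):
        return self.ram[addr]  # modelled GART access bypasses the CPU cache


system = WriteBackSystem(16)
system.cpu_write(3, 0xAB)
stale = system.gpu_read(3)   # GPU still sees the old contents of RAM
system.flush_range(0, 16)    # push the dirty data out before handing over
fresh = system.gpu_read(3)   # now the GPU sees the CPU's write
print(stale, fresh)          # 0 171
```

The same model shows why "flush the address range, then tell the GPU to go use it" works: after `flush_range` there is nothing left in the cache that RAM does not already hold.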
Re: [Linux-fbdev-devel] Re: radeon, apertures memory mapping
> > If you are doing fallback calculations in a 6MB buffer that is 1,500 pages. Accessing all of this effectively flushes the data cache. Once you are done with it you probably don't want those pages in the cache anyway.
>
> I don't understand why we have GART memory anyway. It's just main memory and I don't see any point going through the GART to access it with the CPU. Only the graphics card needs to use the GART.

Which is what we do on some archs, like Apple ppcs, where the GART doesn't work for CPU accesses. We just use the real RAM pages. However, we have to map them non-cacheable since the GPU GART accesses bypass the cache coherency protocol (that is the case on most AGP implementations afaik and is broken by design, imho).

Ben.
Re: [Linux-fbdev-devel] Re: radeon, apertures memory mapping
> The best model would be to chuck the AGP/PCI Express interface on the board and have a hyperchannel instead. Hyperchannel provides full cache consistency without all of these flushing problems. The GPU really is another specialized CPU, give it a CPU class memory interface.

You mean HyperTransport? Well, PCI Express isn't far from that either...

Ben.
Re: [Linux-fbdev-devel] Re: radeon, apertures memory mapping
[slightly off topic]

On Sunday, 13.03.2005, at 12:56 -0500, Jon Smirl wrote:
> On Sun, 13 Mar 2005 19:47:14 +0200, Ville Syrjälä [EMAIL PROTECTED] wrote:
> > I don't understand why we have GART memory anyway. It's just main memory and I don't see any point going through the GART to access it with the CPU. Only the graphics card needs to use the GART.
>
> I see no need for the CPU to go through the GART either. The main CPU page tables can provide the same rearranging that the GART does. We do need specially marked GART memory because of caching issues. If the CPU writes to GART RAM the write may still be on the CPU chip in a cache. We have to make sure it gets pushed into physical memory so that the GPU can see it.

If this is true, then I'm surprised that PCI-DMA with normal cacheable memory works. All practical experience with the Savage driver teaches me that a memory barrier is sufficient. Or does a memory barrier really flush all CPU caches?

[snip]

-- 
| Felix Kühling [EMAIL PROTECTED] http://fxk.de.vu |
| PGP Fingerprint: 6A3C 9566 5B30 DDED 73C3 B152 151C 5CC1 D888 E595 |
Re: [Linux-fbdev-devel] Re: radeon, apertures memory mapping
On Mon, 14 Mar 2005 08:52:46 +1100, Benjamin Herrenschmidt [EMAIL PROTECTED] wrote:
> > The best model would be to chuck the AGP/PCI Express interface on the board and have a hyperchannel instead. Hyperchannel provides full cache consistency without all of these flushing problems. The GPU really is another specialized CPU, give it a CPU class memory interface.
>
> You mean HyperTransport? Well, PCI Express isn't far from that either...

Yes, HyperTransport. That's what I get for writing email while the babies are crying.

Does PCI Express support a full cache coherency protocol like HyperTransport? So if the GPU touches a page of system memory, the system memory will get flushed out of the CPU cache? In that case we don't need to mark shared GPU/CPU memory as non-cacheable.

> Ben.

-- 
Jon Smirl
[EMAIL PROTECTED]
Re: [Linux-fbdev-devel] Re: radeon, apertures memory mapping
On Sun, 2005-03-13 at 17:17 -0500, Jon Smirl wrote:
> On Mon, 14 Mar 2005 08:52:46 +1100, Benjamin Herrenschmidt [EMAIL PROTECTED] wrote:
> > > The best model would be to chuck the AGP/PCI Express interface on the board and have a hyperchannel instead. Hyperchannel provides full cache consistency without all of these flushing problems. The GPU really is another specialized CPU, give it a CPU class memory interface.
> >
> > You mean HyperTransport? Well, PCI Express isn't far from that either...
>
> Yes, HyperTransport. That's what I get for writing email while the babies are crying.
>
> Does PCI Express support a full cache coherency protocol like HyperTransport?

Or just like PCI :) AGP is just a special case, and I'm not even sure whether the cache incoherency is a spec thing or just that all implementations are broken :) I'll check when I find my AGP spec.

> So if the GPU touches a page of system memory, the system memory will get flushed out of the CPU cache? In that case we don't need to mark shared GPU/CPU memory as non-cacheable.

Well, that's what happens with a normal cache coherent setup, but I'm fairly sure that won't happen with AGP on a lot of machines.

Ben.
Re: [Linux-fbdev-devel] Re: radeon, apertures memory mapping
On Sun, 2005-03-13 at 23:21 +0100, Felix Kühling wrote:
> [slightly off topic]
>
> On Sunday, 13.03.2005, at 12:56 -0500, Jon Smirl wrote:
> > On Sun, 13 Mar 2005 19:47:14 +0200, Ville Syrjälä [EMAIL PROTECTED] wrote:
> > > I don't understand why we have GART memory anyway. It's just main memory and I don't see any point going through the GART to access it with the CPU. Only the graphics card needs to use the GART.
> >
> > I see no need for the CPU to go through the GART either. The main CPU page tables can provide the same rearranging that the GART does. We do need specially marked GART memory because of caching issues. If the CPU writes to GART RAM the write may still be on the CPU chip in a cache. We have to make sure it gets pushed into physical memory so that the GPU can see it.
>
> If this is true, then I'm surprised that PCI-DMA with normal cacheable memory works. All practical experience with the Savage driver teaches me that a memory barrier is sufficient. Or does a memory barrier really flush all CPU caches?

Normal PCI DMA, on most architectures, is snooped by the bridge/CPU and thus is fully cache coherent. On architectures where it is not, the kernel pci_dma_* and pci_alloc_consistent functions will take care of dealing with cache issues (allocating non-cacheable space for consistent memory and doing cache flushes/invalidates for the rest).

AGP GART accesses tend to be implemented in a weird way in the host bridges that bypasses the cache coherency protocol, I suppose for performance reasons.

Ben.
Re: [Linux-fbdev-devel] Re: radeon, apertures memory mapping
On Mon, 14 Mar 2005 12:05:59 +1100, Benjamin Herrenschmidt [EMAIL PROTECTED] wrote:
> > It should be the responsibility of the memory manager. If anything wants to access the memory it would call lock() and when it's done with the memory it calls unlock(). That's exactly how DirectFB's memory manager works.
>
> In an ideal world ... However, since we are planning to move the memory manager to the kernel, that would mean a kernel access (syscall, ioctl, whatever...) twice per access to AGP memory. Not realistic.

I'm only suggesting this for the DRM/fbdev stack. Anything else from user space can use a non-cached mapping.

It shouldn't hurt to have a parallel non-cached mapping being used in conjunction with this protocol. By definition the non-cached mapping never gets into an inconsistent state.

> The case of the CP ring is easy to deal with by the macros we have there already and it would be kernel-kernel. But it would be a hit for a lot of other things I suppose.

The performance trade-off is: how long does the invalidate take? If the CPU has 2MB of unflushed write data the instruction is going to take a while to finish. In the non-cached scheme this data is flushed in parallel with us playing with the AGP memory. To flush 2MB takes something like 2MB / 400 MHz * 64 bytes * 2 (DDR) = 20 microseconds but it may be more like 1 microsecond on average.

Thinking about this for a while, you can't compute which is the better strategy because everything depends on the workload and how dirty the cache is. Best thing to do would be to code it up and try it. But I want to get a dual head radeon driver working first.

It may also be true that the CP Ring is better left non-cached and only access to the graphics buffers be done with the caching scheme.

BTW, you can implement super fast texture load/unload using a similar scheme. Start with the texture in the user space program. Program wants to upload the texture. Flush CPU cache. Point the GART at the physical pages allocated to the user holding the texture. Now walk the user's page table and mark those pages copy on write. Free the pages the GART was originally pointing at. Reverse the scheme to get data from the GPU. For small textures it is faster to copy them but if you are moving 20MB of data this is much faster.

-- 
Jon Smirl
[EMAIL PROTECTED]
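The flush-time estimate in the message above reads most naturally as "dirty bytes divided by memory bandwidth". The sketch below works that reading through; it is an interpretation, not the author's own calculation. `flush_time_us` is a hypothetical helper, and the 8-byte bus width in the second call is an assumption (a 64-bit bus) added for comparison.

```python
def flush_time_us(dirty_bytes, clock_hz, bytes_per_transfer, transfers_per_clock):
    """Time to write dirty_bytes back to RAM at peak bandwidth, in microseconds."""
    bandwidth = clock_hz * bytes_per_transfer * transfers_per_clock  # bytes per second
    return dirty_bytes / bandwidth * 1e6


dirty = 2 * 1024 * 1024  # 2 MB of unflushed write data

# Figures exactly as written in the message: 400 MHz, 64 bytes, x2 for DDR.
literal = flush_time_us(dirty, 400e6, 64, 2)

# Assumption: if "64" was meant as a 64-bit (8-byte) bus, the estimate grows 8x.
narrow_bus = flush_time_us(dirty, 400e6, 8, 2)

print(round(literal, 2), round(narrow_bus, 2))
```

Taken literally, the figures give roughly 41 microseconds, the same order of magnitude as the 20 microseconds quoted; with an 8-byte bus the result is roughly 328 microseconds. Either way the message's conclusion stands: the real cost depends on how dirty the cache is, so it has to be measured.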
Re: [Linux-fbdev-devel] Re: radeon, apertures memory mapping
On Sun, 2005-03-13 at 20:47 -0500, Jon Smirl wrote:
> On Mon, 14 Mar 2005 12:05:59 +1100, Benjamin Herrenschmidt [EMAIL PROTECTED] wrote:
> > > It should be the responsibility of the memory manager. If anything wants to access the memory it would call lock() and when it's done with the memory it calls unlock(). That's exactly how DirectFB's memory manager works.
> >
> > In an ideal world ... However, since we are planning to move the memory manager to the kernel, that would mean a kernel access (syscall, ioctl, whatever...) twice per access to AGP memory. Not realistic.
>
> I'm only suggesting this for the DRM/fbdev stack. Anything else from user space can use a non-cached mapping.

Then I don't see the point. Especially since the problem I explained would still be there as long as there is a non-cached mapping.

> It shouldn't hurt to have a parallel non-cached mapping being used in conjunction with this protocol. By definition the non-cached mapping never gets into an inconsistent state.

Wrong :) It can badly conflict with the existence of a cached mapping. Re-read my mail that explains the problem carefully.

> > The case of the CP ring is easy to deal with by the macros we have there already and it would be kernel-kernel. But it would be a hit for a lot of other things I suppose.
>
> The performance trade-off is: how long does the invalidate take? If the CPU has 2MB of unflushed write data the instruction is going to take a while to finish. In the non-cached scheme this data is flushed in parallel with us playing with the AGP memory. To flush 2MB takes something like 2MB / 400 MHz * 64 bytes * 2 (DDR) = 20 microseconds but it may be more like 1 microsecond on average.
>
> Thinking about this for a while, you can't compute which is the better strategy because everything depends on the workload and how dirty the cache is. Best thing to do would be to code it up and try it. But I want to get a dual head radeon driver working first.
>
> It may also be true that the CP Ring is better left non-cached and only access to the graphics buffers be done with the caching scheme.

Using a write-through cache might be an interesting tradeoff.

> BTW, you can implement super fast texture load/unload using a similar scheme. Start with the texture in the user space program. Program wants to upload the texture. Flush CPU cache. Point the GART at the physical pages allocated to the user holding the texture. Now walk the user's page table and mark those pages copy on write. Free the pages the GART was originally pointing at. Reverse the scheme to get data from the GPU. For small textures it is faster to copy them but if you are moving 20MB of data this is much faster.

-- 
Benjamin Herrenschmidt [EMAIL PROTECTED]
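The texture upload scheme quoted above (flush, repoint the GART, mark the user pages copy on write, free the old pages) can be walked through with a toy model. This is a sketch only: `Gart`, `UserMapping` and `upload_by_remap` are hypothetical names, pages are plain integers, and nothing here reflects how a real GART or page table is programmed.

```python
class Gart:
    """Toy GART: slot index -> physical page number."""

    def __init__(self, pages):
        self.table = list(pages)


class UserMapping:
    """Toy user page table: virtual page -> [physical page, copy_on_write flag]."""

    def __init__(self, pages):
        self.entries = {v: [p, False] for v, p in enumerate(pages)}


def upload_by_remap(gart, first_slot, user, free_pages, flush_cache):
    """Hand the user's texture pages to the GART without copying any data."""
    flush_cache()                        # make sure the texture data is in RAM
    old_pages = []
    for vpage in sorted(user.entries):   # point the GART at the user's pages
        slot = first_slot + vpage
        old_pages.append(gart.table[slot])
        gart.table[slot] = user.entries[vpage][0]
    for entry in user.entries.values():  # later CPU writes must copy first
        entry[1] = True
    free_pages.extend(old_pages)         # release the pages the GART used before


gart = Gart([100, 101, 102, 103])
user = UserMapping([7, 8])               # texture occupies physical pages 7 and 8
freed, flushes = [], []
upload_by_remap(gart, 1, user, freed, lambda: flushes.append("flush"))
print(gart.table)  # [100, 7, 8, 103]
print(freed)       # [101, 102]
```

No texture data moves in this model; only table entries change, which is why the scheme is attractive for large textures and not worth the bookkeeping for small ones.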
Re: [Linux-fbdev-devel] Re: radeon, apertures memory mapping
[EMAIL PROTECTED] removed from CC since I can't post to it.

Jon Smirl writes:

> It shouldn't hurt to have a parallel non-cached mapping being used in conjunction with this protocol. By definition the non-cached mapping never gets into an inconsistent state.

According to the PowerPC Architecture specification, it is a programming error to have both cacheable and uncacheable mappings of the same page. That means the hardware designers consider that they don't have to worry if the hardware misbehaves if software does that. :P So that is not a feasible solution for us.

Paul.
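One way to see why mixing the two mapping types is dangerous is a toy model of a single location reachable through both a cached and an uncached alias. This is a sketch of one plausible failure mode under a simple write-back assumption; the message above says only that the PowerPC architecture treats such aliasing as a programming error, so real hardware may misbehave differently. `AliasedPage` and its methods are hypothetical names.

```python
class AliasedPage:
    """One memory location seen through a cached and an uncached mapping."""

    def __init__(self):
        self.ram = 0
        self.cache_line = None  # (value, dirty) or None when not cached

    def cached_write(self, value):
        self.cache_line = (value, True)  # write-back: RAM is not updated yet

    def cached_read(self):
        if self.cache_line is None:
            self.cache_line = (self.ram, False)
        return self.cache_line[0]

    def uncached_write(self, value):
        self.ram = value  # goes straight to RAM, the cache line is untouched

    def uncached_read(self):
        return self.ram

    def evict(self):
        if self.cache_line and self.cache_line[1]:
            self.ram = self.cache_line[0]  # dirty write-back overwrites RAM
        self.cache_line = None


page = AliasedPage()
page.cached_write(1)
seen_uncached = page.uncached_read()  # 0: misses the cached write
page.uncached_write(2)
seen_cached = page.cached_read()      # 1: misses the uncached write
page.evict()
after_evict = page.uncached_read()    # 1: the uncached write of 2 was clobbered
print(seen_uncached, seen_cached, after_evict)  # 0 1 1
```

In this model the non-cached mapping does end up inconsistent: it reads stale data, and its own write is later lost to a cache eviction.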
Re: [Linux-fbdev-devel] Re: radeon, apertures memory mapping
On Sun, 2005-03-13 at 17:43 +1100, Benjamin Herrenschmidt wrote:
> And finally, I want to blank the screen (using the accel engine) before setting the new mode, so that we come out clean of the mode setting (without ugly artifact), and I will probably clean both fb's (simpler).

That would break X with UseFBDev.

-- 
Earthling Michel Dänzer | Debian (powerpc), X and DRI developer
Libre software enthusiast | http://svcs.affero.net/rm.php?r=daenzer
Re: [Linux-fbdev-devel] Re: radeon, apertures memory mapping
On Mon, 14 Mar 2005 15:07:26 +1100, Paul Mackerras [EMAIL PROTECTED] wrote:
> [EMAIL PROTECTED] removed from CC since I can't post to it.
>
> Jon Smirl writes:
>
> > It shouldn't hurt to have a parallel non-cached mapping being used in conjunction with this protocol. By definition the non-cached mapping never gets into an inconsistent state.
>
> According to the PowerPC Architecture specification, it is a programming error to have both cacheable and uncacheable mappings of the same page. That means the hardware designers consider that they don't have to worry if the hardware misbehaves if software does that. :P So that is not a feasible solution for us.
>
> Paul.

Ok, I see this is a problem for the PPC. I've never used a PPC so you guys have to tell me what is illegal on it.

-- 
Jon Smirl
[EMAIL PROTECTED]
Re: [Linux-fbdev-devel] Re: radeon, apertures memory mapping
On Sun, 2005-03-13 at 23:40 -0500, Jon Smirl wrote:
> On Mon, 14 Mar 2005 15:07:26 +1100, Paul Mackerras [EMAIL PROTECTED] wrote:
> > [EMAIL PROTECTED] removed from CC since I can't post to it.
> >
> > Jon Smirl writes:
> >
> > > It shouldn't hurt to have a parallel non-cached mapping being used in conjunction with this protocol. By definition the non-cached mapping never gets into an inconsistent state.
> >
> > According to the PowerPC Architecture specification, it is a programming error to have both cacheable and uncacheable mappings of the same page. That means the hardware designers consider that they don't have to worry if the hardware misbehaves if software does that. :P So that is not a feasible solution for us.
> >
> > Paul.
>
> Ok, I see this is a problem for the PPC. I've never used a PPC so you guys have to tell me what is illegal on it.

And probably for other platforms as well. I'm pretty sure some Athlons will be very upset too. I'm not even sure you can do that sort of trick on MIPS, which has strange mapping rules, etc, etc, etc...

Anyway, mixing cacheable and non-cacheable mappings is asking for trouble, just don't do it. Having the ability to do both (selected by the platform type, or some AGP errata bit, or whatever) is a different issue and might be worth investigating.

Ben.
Re: [Linux-fbdev-devel] Re: radeon, apertures memory mapping
On Sun, 2005-03-13 at 23:28 -0500, Michel Dänzer wrote:
> On Sun, 2005-03-13 at 17:43 +1100, Benjamin Herrenschmidt wrote:
> > And finally, I want to blank the screen (using the accel engine) before setting the new mode, so that we come out clean of the mode setting (without ugly artifact), and I will probably clean both fb's (simpler).
>
> That would break X with UseFBDev.

How so?

Ben.