Re: [Dri-devel] r128 strangeness
Eric Anholt wrote:
> I was working on os-independence, starting with the r128 driver because
> I have a linux machine ready with a r128 in it. It's a gentoo 1.1
> system (2.4.18 vanilla), Rage128 Mobility M4 on Inspiron 8000 (i815),
> and 4.2.0 was installed with no DRM.
>
> I made World install with bsd-3-0-0-branch, and compiled r128.o from
> .../linux/drm/kernel. Loaded the module, restarted X, direct rendering
> was enabled but the graphics are garbled. Screen clearing isn't working
> and windows are missing graphics. Upon logging in (which started
> gnome), the system crashed (alt-sysrq to reboot). After rebooting,
> starting X again, and going to console, the XFree86.0.log is full of:
>
> (EE) R128(0): Idle timed out, resetting engine...
>
> bsd-3-0-0-branch has the same r128 ddx driver and kernel module code as
> trunk according to cvs diff -u -rHEAD. The X Server, ddx, drm, and dri
> modules got installed, so I don't think it's a versioning issue.
>
> Has anyone else seen this?

I don't know how many people are using r128's.

Keith

Bringing you mounds of caffeinated joy >>> http://thinkgeek.com/sf<<<
___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: [Dri-devel] R200 kernel interfaces
Jens Owen wrote:
> Linus Torvalds wrote:
>
>> Yes, kernel support (or indirect rendering) is needed for untrusted
>> applications, but it might actually be interesting to see what a
>> direct-rendering all-user-land implementation looks like. It has some
>> debugging advantages, and it may actually make sense to start from a
>> totally trusted app that goes as fast as humanly possible, and then when
>> that has been optimized to death look at just where the interfaces make
>> the most sense..
>
> Keith,
>
> Along these lines, I've been toying around with the idea that direct
> user level access to the ring for commands *might* be able to use a DRM
> locking policy similar to how we protected the ring in the TDFX driver
> where it was directly accessed by the user space driver and indirect
> buffers were only used for indirect kinds of data like arrays and
> textures.

I don't understand why you see the ring as being so special. The hardware
provides indirect buffers just so you don't have to have multiple clients
contending for the ring. I don't see any disadvantages to using that; the
big advantage is you don't have to grab the lock each time you want to
emit a few bytes to an indirect buffer (you would with the ring, and the
fast path on the lock isn't *that* fast).

Arrays & textures have a third level of indirection provided by the
hardware.

> We touched on this a few weeks ago on IRC, and IIRC you thought there
> might be some problems with coordinating access and aging buffers.
> Would it be valuable if I were able to get a prototype going where the
> 2D server access and the kernel DRM module shared direct access to the
> ring protected by the DRM lock? Obviously, multiple instances of a 3D
> driver would be an even better prototype, but I'm looking for an easier
> proof of concept:-)

This is a reasonable thing to do anyway, as the 2d driver currently holds
the lock over its entire operation (perhaps delayed until the first accel
action) and is already 'trusted', so it probably *should* use the ring
directly rather than going through the ioctl overhead for accel, which
typically comes to only a few tens to hundreds of bytes per ioctl.

> If I can get 2D and kernel sharing access, what problems do you foresee
> with getting multiple 3D clients to participate using a policy similar
> to TDFX?

The i810 actually works exactly the way you're describing.

Keith
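To make the trade-off in this exchange concrete, here is a minimal user-space sketch in C. It is not taken from any DRI driver: the structure names, the sizes, and the lock counter that stands in for the DRM hardware lock are all assumptions made for illustration, and the toy ring does no wrap or full checking against a hardware read pointer. It only shows the point being argued: writing straight to a shared ring costs one lock acquisition per emit, while filling a private indirect buffer costs none until the buffer is submitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RING_WORDS 64u   /* hypothetical ring size */
#define IB_WORDS   256u  /* hypothetical indirect buffer size */

struct ring {
    uint32_t words[RING_WORDS];
    unsigned tail;        /* next slot the CPU writes */
    unsigned lock_taken;  /* counts acquisitions; stands in for the DRM lock */
};

struct indirect_buf {
    uint32_t words[IB_WORDS];
    unsigned used;
};

/* Stand-ins for taking and dropping the shared lock. */
static void ring_lock(struct ring *r)   { r->lock_taken++; }
static void ring_unlock(struct ring *r) { (void)r; }

/* Direct path: every emit holds the lock while it touches the ring. */
static void ring_emit_direct(struct ring *r, const uint32_t *cmd, unsigned n)
{
    ring_lock(r);
    for (unsigned i = 0; i < n; i++) {
        r->words[r->tail] = cmd[i];
        r->tail = (r->tail + 1) % RING_WORDS;
    }
    ring_unlock(r);
}

/* Indirect path: filling a private buffer needs no lock at all. */
static int ib_emit(struct indirect_buf *ib, const uint32_t *cmd, unsigned n)
{
    if (ib->used + n > IB_WORDS)
        return -1;        /* buffer full, caller must submit first */
    memcpy(&ib->words[ib->used], cmd, n * sizeof(*cmd));
    ib->used += n;
    return 0;
}

/* Submitting takes the lock once and writes a two-word reference
 * (hypothetical packet layout: buffer id, then word count). */
static void ib_submit(struct ring *r, struct indirect_buf *ib, uint32_t ib_id)
{
    uint32_t ref[2] = { ib_id, ib->used };
    ring_emit_direct(r, ref, 2);
    ib->used = 0;
}
```

Ten three-word emits through `ring_emit_direct` take the lock ten times; the same ten emits through `ib_emit` followed by one `ib_submit` take it once. Whether that difference matters in practice depends on how cheap the lock's fast path really is, which is exactly the open question in the thread.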
[Dri-devel] r128 strangeness
I was working on os-independence, starting with the r128 driver because
I have a linux machine ready with an r128 in it. It's a gentoo 1.1
system (2.4.18 vanilla), Rage128 Mobility M4 on Inspiron 8000 (i815),
and 4.2.0 was installed with no DRM.

I made World install with bsd-3-0-0-branch, and compiled r128.o from
.../linux/drm/kernel. Loaded the module, restarted X, direct rendering
was enabled but the graphics are garbled. Screen clearing isn't working
and windows are missing graphics. Upon logging in (which started
gnome), the system crashed (alt-sysrq to reboot). After rebooting,
starting X again, and going to console, the XFree86.0.log is full of:

(EE) R128(0): Idle timed out, resetting engine...

bsd-3-0-0-branch has the same r128 ddx driver and kernel module code as
trunk according to cvs diff -u -rHEAD. The X Server, ddx, drm, and dri
modules got installed, so I don't think it's a versioning issue.

Has anyone else seen this?

--
Eric Anholt <[EMAIL PROTECTED]>
http://gladstone.uoregon.edu/~eanholt/dri/
Re: [Dri-devel] R200 kernel interfaces
Jeff Hartmann wrote:
>
> Keith Whitwell wrote:
>
> > Benjamin Herrenschmidt wrote:
> >
> >>> HOWEVER, if you tied the GART mapping to the DRM lock, you might be ok.
> >>> That gives you the required system exclusion, and if you make it an
> >>> explicit "get my GART context" function that is only called under
> >>> the DRM lock _and_ only called when you actually need the AGP access,
> >>> you also avoid the unnecessary context switches.
> >>>
> >>> You might still have some performance issues simply because you
> >>> would do extra work when switching aperture mappings, but hopefully
> >>> the GART switch wouldn't be a common operation.
> >>>
> >>> The flexibility you would get _might_ be worth it.
> >>
> >> Well, I would personally vote for the processes _not_ relying on having
> >> the AGP aperture mapped directly, but instead, the various memory pages
> >> making their AGP aperture. Several chipsets (Apple ones for sure, but it
> >> seems others are hitting this too nowadays) don't support AGP aperture
> >> accesses from the CPU.
> >
> > What are you actually saying, that pages mapped in agp can't be
> > written by any means, or just that they can't be written through the
> > agp address range?
> >
> > It sounds kind of broken to me in any case. How do mtrrs work in this
> > world?
>
> Actually we should go to this model eventually. However it needs me to
> have time to finish the Page Attribute Table support I started on at
> VA. This allows write combining to be set on a per page basis, and is
> the direction we want to go even on x86.
>
> >> That way, if you want several AGP contexts, you can have the processes
> >> tapping their AGP buffers without lock, locking would only be required
> >> once it's time to move one of these buffers in/out the physical GART
> >> under the arbitration of the DRM.
> >
> > You don't need to lock to write to agp buffers in the current scheme.
> >
> > You also don't need to play with the gart table just to draw a
> > 2-triangle strip. On some chipsets, particularly under smp,
> > modifying the gart table is very slow. Ask Jeff about this.
> >
> > Keith
>
> This is also true, but I've done a lot of heavy thinking on this very
> issue. The key is to manage the agp aperture and only swap out regions
> when you absolutely have to. The big key to getting something like
> this to work is a memory manager that every client uses, and is based on
> some sort of sarea. It should be designed with a certain minimum block
> size, and have a few different flags for what kind of usage that memory
> block has. (I can go into more detail on design, but you probably have
> a good idea what I mean here.) Then the next step is to create kernel
> calls which can swap things to and from agp space and the card. On
> cards that support it, another path (which prevents GART rewrites
> entirely) is to add support to swap to normal cached memory.
>
> This is what I envision making sense in the long run. A global
> memory manager using an sarea (doesn't have to be the main one) and a
> good aging mechanism get us most of the way there.

Jeff,

It might be helpful to clarify the different uses we are discussing WRT
AGP. In this thread so far, we've been jumping all over. Here's a shot
at an AGP breakdown. Feel free to correct my misconceptions.

1) The original utilization of AGP under Linux is faster MMIO
transactions than PCI. Some level of improvement happens here by simply
accessing a device on an AGP bus, and no special AGP programming is
required.

2) Simple MMIO transactions can be optimized by enabling fast writes.
This case is identical to the MMIO transactions in the first case, but
the bus and graphics chipset utilize hardware pipelining to increase
throughput. There is a penalty for turning the bus around
write/read/write/read because of the pipelining. There are also certain
combinations of host chipsets and graphics chips where enabling fast
writes can cause hangs.

The remaining cases all utilize AGP bus mastering where the graphics chip
can read and write directly from AGP memory.

3) Static AGP Allocation. This is the primary functionality that the
agpgart module provides today. Physical memory is allocated by agpgart
as needed and that memory is managed on behalf of the user space and DRM
drivers at run time. There is a finite amount of this memory available,
dictated by the size of the AGP aperture (typically 64M). We have not
fully exploited this case in user space, yet. The prototype for the AGP
allocator and transfer mechanism of glDrawPixels in the Matrox G400
driver is a good example of the potential here.

4) Dynamic AGP Binding. This functionality is spec'ed in the agpgart
interface but is not fully implemented, yet. The intention is for user
space processes to be able to bind normal virtual pages to the AGP
aperture in a very dynamic fashion. Some of the discussions about
binding and unbindin
Re: [Dri-devel] R200 kernel interfaces
Linus Torvalds wrote:
> Yes, kernel support (or indirect rendering) is needed for untrusted
> applications, but it might actually be interesting to see what a
> direct-rendering all-user-land implementation looks like. It has some
> debugging advantages, and it may actually make sense to start from a
> totally trusted app that goes as fast as humanly possible, and then when
> that has been optimized to death look at just where the interfaces make
> the most sense..

Keith,

Along these lines, I've been toying around with the idea that direct
user level access to the ring for commands *might* be able to use a DRM
locking policy similar to how we protected the ring in the TDFX driver
where it was directly accessed by the user space driver and indirect
buffers were only used for indirect kinds of data like arrays and
textures.

We touched on this a few weeks ago on IRC, and IIRC you thought there
might be some problems with coordinating access and aging buffers.
Would it be valuable if I were able to get a prototype going where the
2D server access and the kernel DRM module shared direct access to the
ring protected by the DRM lock? Obviously, multiple instances of a 3D
driver would be an even better prototype, but I'm looking for an easier
proof of concept:-)

If I can get 2D and kernel sharing access, what problems do you foresee
with getting multiple 3D clients to participate using a policy similar
to TDFX?

Regards,
Jens

--
Jens Owen
[EMAIL PROTECTED]
Steamboat Springs, Colorado
Re: [Dri-devel] radeon driver on current trunk
On Tue, 2002-06-18 at 18:57, Michael wrote:
> On Mon, Jun 17, 2002 at 02:26:49AM +0200, Michel Dänzer wrote:
> > - Portability fixes for the new driver:
> > http://penguinppc.org/~daenzer/DRI/radeon-endianness.diff
> > Feedback and testing appreciated as always, in particular on the changes
> > to the x86 specific parts.
>
> I've been running with this for a few hours now, seems fine.

Great, I've committed them. Thanks for testing.

--
Earthling Michel Dänzer (MrCooper) / Debian GNU/Linux (powerpc) developer
XFree86 and DRI project member / CS student, Free Software enthusiast
Re: [Dri-devel] R200 kernel interfaces
>On Mon, 17 Jun 2002, Benjamin Herrenschmidt wrote:
>>
>> >mmap() and AGP driver gives access to IOMEM/AGP
>>
>> That one is problematic. I don't support the mmap interface properly
>> on Apple chipsets for example, because they don't support the AGP
>> aperture being accessed by the CPU.
>
>I assume you mean that the CPU doesn't honour the AGP mappings, but the
>CPU _can_ access the physical pages themselves.

Of course, it would be pretty useless if the aperture couldn't be used
at all ;) Sorry for the confusion.

>How do you do it right
>now, since we seem to be doing "ioremap_nocache()" all over the place with
>the AGP aperture?

Not that much. I really only bothered with the r128 and radeon drivers;
the kernel side uses a few well-localized ioremap calls that I turned
into an agp_ioremap call. I then implement that function by directly
building a virtual mapping from the underlying AGP pages.

The userland side is using drmMap, which already sets up some vmops for
various cases. I just had to add a specific case for this kind of AGP
host that gives the real RAM pages on the fly.

>But fundamentally that should not be a problem: we can map the (unmapped)
>AGP pages one page at a time (rather than as one contiguous block of
>remapped pages) into user mode.

Using vmops is easy, provided that nobody plays tricks like
binding/unbinding memory from the GART behind our back. That is my main
problem with the current ioctl interface to agpgart. Basically, the API
allows this, and the agptest program itself will, for example, map the
aperture before binding memory to it. It's still workable provided the
aperture isn't accessed before memory is bound, but if we allow dynamic
binding/unbinding of memory while the entire aperture is mmap'ed in some
process space, then we have to potentially tear down mappings of those
other processes on unbind, which is beyond my knowledge of linux vm
(especially on SMP).

>I thought AGP already supported a mmap() interface, and if it really
>doesn't, it should be trivial to do...
>
> [ Time passes, Linus looks at the sources ]
>
>Ok, there does seem to be mmap() support in the AGP module, but it seems
>to use that stupid "remap_page_range()" and the AGP base (similar to
>ioremap() inside the kernel), so it does seem to mmap the _mapped_ AGP
>area.
>
>It would be possible to just install a "nopage" handler, and map one page
>at a time on demand from the pool of (non-GART-mapped) pages that we keep
>in the gatt_table[] or whatever.

Yup, exactly like what I do with drmMap, but I still don't like the API
for the reason I just explained.

>Maybe there is some reason for doing it that way that I don't understand.
>More likely, it's just done that way because it was the simple and stupid
>approach.
>
>However, you seem to prefer a different approach, which would certainly
>work:
>
>> I would much prefer the agpgart interface to be redesigned around
>> different semantics, mostly vmalloc() some space to use as AGP memory,
>> then bind that to the GART, but don't rely on direct AGP aperture
>> access.
>>
>> There are also some slight speed improvements to win using this
>> scheme as I could map the AGP memory as cacheable (which would give a
>> significant boost on PPC) provided buffers & ring get properly flushed
>> before being "passed" to the chip.
>
>Hmm.. It would be fairly simple to do all page allocation in user space,
>and have an interface that says "put the physical page corresponding to my
>virtual address into the AGP aperture at offset ".
>
>This would effectively disallow the above "map by unmapped page" approach,
>because it's too damn expensive to find and flush any existing mappings
>when somebody maps in a new page. And if not all systems support the
>GART-assisted CPU mapping that we do now, that means that nobody can mmap
>the AGP area into memory.

Exactly.

>The expensive part would be the "mark this page uncacheable" when moving
>it to the AGP buffer, which implies a cross-CPU TLB flush for each such
>page. So moving a page into the AGP aperture is fundamentally a fairly
>expensive operation: wbinvd itself takes a _loong_ time, but if you have
>to do it on all CPU's along with the TLB flush, it gets _really_
>expensive.
>
>So moving pages that way is definitely not cheap either. Hmm.

What about simpler semantics? A given client needs well known chunks of
AGP memory (the ring buffer, the indirect buffers, etc...). All we really
need is _one_ call to allocate a chunk of memory and bind it to the GART.
That call would return whatever opaque ID that can be used for processes
to later mmap that into their space and the offset into AGP aperture
where it was bound.

Of course, we need to provide the opposite call for disposing of it, but
in this case, we probably don't need to be smart regarding processes who
still have it mapped as it's typically a fatal thing or programming
error.

This avoids the problem with current agpgart which is to allow more or l
Re: [Dri-devel] Newbie to DRI development
I think their visualization interests cover a wide range of fields such
as fluids, astrophysics, molecular chemistry, etc. They are primarily
interested in, and want, everything to be open source if possible.

I see that you're also from HP. Actually, another person from HP is
supposed to get back to us about possible video cards to run on the
donated Compaq ES45 server that I will be using.

Tom

> Tom,
> Just out of curiosity, what type of visualization applications
> are you going to be running? (Are any of them open-source?)
>
> Thanks,
> --Phil
>
> Hewlett-Packard: High Performance Technical Computing/Visualization
> ---
> [EMAIL PROTECTED]
> Performance/Development
>
> On Tue, 18 Jun 2002 [EMAIL PROTECTED] wrote:
>
> > No, its the University of Western Ontario
> >
> > Its not a class assignment, I'm working for SHARCNet (see this for
> > more info http://www.sharcnet.ca/org_corner/) at UWO. I was hired for
> > the summer by one of the professors in charge of SHARCNet to get video
> > drivers working and optimized on Compaq ES45 servers for scientific
> > visualizations
> >
> > I've started looking at the radeon drivers, and they seem to be better
> > documented and commented so hopefully that will help
> >
> > Tom
> >
> > > [EMAIL PROTECTED] wrote:
> > > >
> > > > Hi, I'm working at a University where they would like me to start
> > > > the development of video drivers for a ATI Radeon or 3Dlabs card
> > > > to run on Alpha Linux(Red Hat).
> > >
> > > I see uwo.ca in your e-mail address. Is the University of Waterloo,
> > > Onterio? Are you the only student working on this, or is this a
> > > class assignment?
> > >
> > > Just curious.
> > >
> > > --
> > > Jens Owen
> > > [EMAIL PROTECTED]
> > > Steamboat Springs, Colorado
Re: [Dri-devel] Newbie to DRI development
No, it's the University of Western Ontario.

It's not a class assignment. I'm working for SHARCNet (see
http://www.sharcnet.ca/org_corner/ for more info) at UWO. I was hired
for the summer by one of the professors in charge of SHARCNet to get
video drivers working and optimized on Compaq ES45 servers for
scientific visualizations.

I've started looking at the radeon drivers, and they seem to be better
documented and commented, so hopefully that will help.

Tom

> [EMAIL PROTECTED] wrote:
> >
> > Hi, I'm working at a University where they would like me to start
> > the development of video drivers for a ATI Radeon or 3Dlabs card to
> > run on Alpha Linux(Red Hat).
>
> I see uwo.ca in your e-mail address. Is the University of Waterloo,
> Onterio? Are you the only student working on this, or is this a class
> assignment?
>
> Just curious.
>
> --
> Jens Owen
> [EMAIL PROTECTED]
> Steamboat Springs, Colorado
[Dri-devel] gcc 3.1: finally OK but VT still problematic
Hi all

Finally, I've got my kernel built with gcc 3.1 (actually, my problems
were in some mystical gcc296 in some compat package). And - wow - mach64
0-0-4 branch works for me! Great thanks to everyone. Even 2D seems to be
OK these days.

The only problem I noticed is VT switching. When I switch to the first
VT, my X crashes. What could this be? Is there any way to track it down?

Great thanks to all of you folks.

Sergey
Re: [Dri-devel] radeon driver on current trunk
On Mon, Jun 17, 2002 at 02:26:49AM +0200, Michel Dänzer wrote:
> - Portability fixes for the new driver:
> http://penguinppc.org/~daenzer/DRI/radeon-endianness.diff
> Feedback and testing appreciated as always, in particular on the changes
> to the x86 specific parts.

I've been running with this for a few hours now, seems fine.

--
Michael.
Re: [Dri-devel] R200 kernel interfaces
On Tue, 18 Jun 2002, Linus Torvalds wrote:
>
> So moving pages that way is definitely not cheap either. Hmm.

In fact, considering the cache and multi-CPU overhead, it's likely to be
faster to just memcpy() the damn thing from a regular cached mapping to an
existing AGP-mapped page.

Which is pretty much what we do right now in kernel space.

Playing VM games tends to be slow for normal mappings thanks to TLB
effects, and playing VM games with AGP stuff is taking that slowness to a
new level. In addition to the TLB effects you now have cache effects and
GART mapping updates. And cache effects are much worse than TLB effects
ever were, simply because caches are a damn lot bigger (not to mention the
fact that the x86 has very limited cache control).

So when it comes to mmap, I think you should either just map the AGP pages
uncached into user space in the first place (so that you don't have any
cache coherency problems at run-time) or you're better off doing the
existing memcpy(). Moving pages is too painful.

Linus
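The copy-based path recommended here is simple enough to sketch in a few lines of C. This is a user-space illustration only, with assumed names: `struct agp_page` is ordinary memory standing in for a page that is already bound in the GART and mapped uncached, and `upload_by_copy` is a hypothetical helper, not a function from agpgart or the DRM. The point it shows is that the data is built in normal cached memory and crosses over in one bulk copy, so no page changes its caching attributes and no GART entry is rewritten.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_BYTES 4096u

/* Stand-in for a page already bound in the GART and mapped uncached.
 * In a real driver the kernel would provide it; here it is plain memory. */
struct agp_page {
    uint8_t bytes[PAGE_BYTES];
};

/* Copy data prepared in ordinary cached memory into the destination page.
 * Returns the number of bytes actually copied, clamped to the page end. */
static size_t upload_by_copy(struct agp_page *dst, size_t dst_off,
                             const void *cached_src, size_t len)
{
    if (dst_off >= PAGE_BYTES)
        return 0;
    if (len > PAGE_BYTES - dst_off)
        len = PAGE_BYTES - dst_off;
    memcpy(dst->bytes + dst_off, cached_src, len);
    return len;
}
```

The cost of this approach is the copy itself, which scales with the data size; the alternative discussed in the thread pays instead in TLB flushes, cache flushes, and GART updates per page moved.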
Re: [Dri-devel] Newbie to DRI development
[EMAIL PROTECTED] wrote:
>
> Hi, I'm working at a University where they would like me to start the
> development of video drivers for a ATI Radeon or 3Dlabs card to run on
> Alpha Linux(Red Hat).

I see uwo.ca in your e-mail address. Is that the University of Waterloo,
Ontario? Are you the only student working on this, or is this a class
assignment?

Just curious.

--
Jens Owen
[EMAIL PROTECTED]
Steamboat Springs, Colorado
Re: [Dri-devel] R200 kernel interfaces
On Mon, 17 Jun 2002, Benjamin Herrenschmidt wrote:
>
> > mmap() and AGP driver gives access to IOMEM/AGP
>
> That one is problematic. I don't support the mmap interface properly
> on Apple chipsets for example, because they don't support the AGP
> aperture being accessed by the CPU.

I assume you mean that the CPU doesn't honour the AGP mappings, but the
CPU _can_ access the physical pages themselves. How do you do it right
now, since we seem to be doing "ioremap_nocache()" all over the place with
the AGP aperture?

But fundamentally that should not be a problem: we can map the (unmapped)
AGP pages one page at a time (rather than as one contiguous block of
remapped pages) into user mode.

I thought AGP already supported a mmap() interface, and if it really
doesn't, it should be trivial to do...

 [ Time passes, Linus looks at the sources ]

Ok, there does seem to be mmap() support in the AGP module, but it seems
to use that stupid "remap_page_range()" and the AGP base (similar to
ioremap() inside the kernel), so it does seem to mmap the _mapped_ AGP
area.

It would be possible to just install a "nopage" handler, and map one page
at a time on demand from the pool of (non-GART-mapped) pages that we keep
in the gatt_table[] or whatever.

Maybe there is some reason for doing it that way that I don't understand.
More likely, it's just done that way because it was the simple and stupid
approach.

However, you seem to prefer a different approach, which would certainly
work:

> I would much prefer the agpgart interface to be redesigned around
> different semantics, mostly vmalloc() some space to use as AGP memory,
> then bind that to the GART, but don't rely on direct AGP aperture
> access.
>
> There are also some slight speed improvements to win using this
> scheme as I could map the AGP memory as cacheable (which would give a
> significant boost on PPC) provided buffers & ring get properly flushed
> before being "passed" to the chip.

Hmm.. It would be fairly simple to do all page allocation in user space,
and have an interface that says "put the physical page corresponding to my
virtual address into the AGP aperture at offset ".

This would effectively disallow the above "map by unmapped page" approach,
because it's too damn expensive to find and flush any existing mappings
when somebody maps in a new page. And if not all systems support the
GART-assisted CPU mapping that we do now, that means that nobody can mmap
the AGP area into memory.

The expensive part would be the "mark this page uncacheable" when moving
it to the AGP buffer, which implies a cross-CPU TLB flush for each such
page. So moving a page into the AGP aperture is fundamentally a fairly
expensive operation: wbinvd itself takes a _loong_ time, but if you have
to do it on all CPU's along with the TLB flush, it gets _really_
expensive.

So moving pages that way is definitely not cheap either. Hmm.

Linus
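The "nopage" idea described in this message, resolving one page at a time on demand from the table of backing pages, can be modelled in plain user-space C. Nothing below is kernel code or the agpgart API: `struct aperture`, `struct user_mapping`, `agp_nopage`, and `user_access` are hypothetical names, the aperture is only 16 pages, and a `NULL` return stands in for what would be a fault signal in a real kernel. It is a sketch of the lookup logic only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT     12u
#define APERTURE_PAGES 16u   /* hypothetical, tiny for illustration */

struct page { uint8_t data[1u << PAGE_SHIFT]; };

struct aperture {
    struct page *gatt[APERTURE_PAGES];  /* bound backing pages, NULL if unbound */
};

struct user_mapping {
    struct page *pte[APERTURE_PAGES];   /* filled lazily, one page per fault */
    unsigned faults;
};

/* Model of a "nopage" handler: resolve one offset to its backing page. */
static struct page *agp_nopage(const struct aperture *ap, size_t offset)
{
    size_t idx = offset >> PAGE_SHIFT;
    if (idx >= APERTURE_PAGES)
        return NULL;
    return ap->gatt[idx];
}

/* Model of a CPU access through the mapping: fault the page in on first touch. */
static uint8_t *user_access(struct user_mapping *m, const struct aperture *ap,
                            size_t offset)
{
    size_t idx = offset >> PAGE_SHIFT;
    if (idx >= APERTURE_PAGES)
        return NULL;
    if (m->pte[idx] == NULL) {
        struct page *pg = agp_nopage(ap, offset);
        if (pg == NULL)
            return NULL;    /* nothing bound there */
        m->pte[idx] = pg;
        m->faults++;
    }
    return &m->pte[idx]->data[offset & ((1u << PAGE_SHIFT) - 1u)];
}
```

The model also makes the hazard raised elsewhere in the thread visible: once `pte[idx]` is filled, clearing `gatt[idx]` does nothing to the cached entry, so an unbind would have to find and tear down every such mapping in every process.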
Re: [Dri-devel] Newbie to DRI development
On Tue, 2002-06-18 at 17:31, [EMAIL PROTECTED] wrote:
> Hi, I'm working at a University where they would like me to start the
> development of video drivers for a ATI Radeon or 3Dlabs card to run on
> Alpha Linux(Red Hat).

I'd expect the radeon driver to work on alpha, have you tried it?

> I've read all the documentation on your website, and some more general
> information in other places but I'm having a problem seeing where to
> start. The documentation on the website is more of an overview(big
> picture) of DRI, rather than details of what needs to be implemented and
> how to implement them in the drivers and so I'm completely stuck.
>
> I've taken a look at some of the code, specifcally for the i810 card as
> is suggested for beginners but its hard to make much sense out of it
> when commenting and documentation of code is very sparse. How am I
> supposed to know what functions need to be implemented or even their
> prototypes.
>
> Any help would be of great assistance, as right now I'm just completely
> lost

In case it doesn't work yet, fixing it (for powerpc) has been a good way
for me to get familiar with it.

--
Earthling Michel Dänzer (MrCooper) / Debian GNU/Linux (powerpc) developer
XFree86 and DRI project member / CS student, Free Software enthusiast
[Dri-devel] Newbie to DRI development
Hi, I'm working at a University where they would like me to start the
development of video drivers for an ATI Radeon or 3Dlabs card to run on
Alpha Linux (Red Hat).

I've read all the documentation on your website, and some more general
information in other places, but I'm having a problem seeing where to
start. The documentation on the website is more of an overview (big
picture) of DRI, rather than details of what needs to be implemented and
how to implement it in the drivers, and so I'm completely stuck.

I've taken a look at some of the code, specifically for the i810 card as
is suggested for beginners, but it's hard to make much sense out of it
when commenting and documentation of code is very sparse. How am I
supposed to know what functions need to be implemented, or even their
prototypes?

Any help would be of great assistance, as right now I'm just completely
lost.

Thanks
Tom
[Dri-devel] Re: Dri-devel digest, Vol 1 #1470 - 11 msgs
On Tue, 18 Jun 2002 [EMAIL PROTECTED] wrote:
> From: Benjamin Herrenschmidt <[EMAIL PROTECTED]>
> To: Linus Torvalds <[EMAIL PROTECTED]>,
>
> > - Interrupts
> >
> > You don't use these right now, and as far as I can tell the main
> > reason for using them would be to just synchronize page flipping
> > with the framerate. No?
>
> Which would be nice to have proper frame-sync on interlaced display
> (especially with Michel Dänzer's work on using DRM for Xv blits).

My two cents: add this to the fbdev driver in a way that allows DRM to
use it from userspace. Matroxfb already has an IOCTL set, which could be
expanded to allow running an automatic page-flipper, and perhaps have a
hook on which to hang any handlers for DRM drivers that for some reason
really do need a kernel module.

Simple apps that don't use X or much 3d (directfb, GGI, SDL) would really
benefit from a unified pageflip/retrace API for fbdev, and if that allows
DRI development to take place in userspace (and in so doing speed up
development) so much the better.

--
Brian S. Julin
Re: [Dri-devel] R200 kernel interfaces
Keith Whitwell wrote:
> Benjamin Herrenschmidt wrote:
>
>>> HOWEVER, if you tied the GART mapping to the DRM lock, you might be ok.
>>> That gives you the required system exclusion, and if you make it an
>>> explicit "get my GART context" function that is only called under
>>> the DRM lock _and_ only called when you actually need the AGP access,
>>> you also avoid the unnecessary context switches.
>>>
>>> You might still have some performance issues simply because you
>>> would do extra work when switching aperture mappings, but hopefully
>>> the GART switch wouldn't be a common operation.
>>>
>>> The flexibility you would get _might_ be worth it.
>>
>> Well, I would personally vote for the processes _not_ relying on having
>> the AGP aperture mapped directly, but instead, the various memory pages
>> making their AGP aperture. Several chipsets (Apple ones for sure, but it
>> seems others are hitting this too nowadays) don't support AGP aperture
>> accesses from the CPU.
>
> What are you actually saying, that pages mapped in agp can't be
> written by any means, or just that they can't be written through the
> agp address range?
>
> It sounds kindof broken to me in any case. How to mtrrs work in this
> world?

Actually we should go to this model eventually. However, it needs me to have time to finish the Page Attribute Table support I started on at VA. This allows write combining to be set on a per-page basis, and is the direction we want to go even on x86.

>> That way, if you want several AGP contexts, you can have the processes
>> tapping their AGP buffers without lock, locking would only be required
>> once it's time to move one of these buffers in/out the physical GART
>> under the arbitration of the DRM.
>
> You don't need to lock to write to agp buffers in the current scheme.
>
> You also don't need to play with the gart table just to draw a
> 2-triangle strip. On some chipsets, particularly under smp,
> modifying the gart table is very slow. Ask Jeff about this.
>
> Keith

This is also true, but I've done a lot of heavy thinking on this very issue. The key is to manage the agp aperture and only swap out regions when you absolutely have to.

The big key to getting something like this to work is a memory manager that every client uses, and that is based on some sort of sarea. It should be designed with a certain minimum block size, and have a few different flags for what kind of usage each memory block has. (I can go into more detail on design, but you probably have a good idea what I mean here.)

Then the next step is to create kernel calls which can swap things to and from agp space and the card. On cards that support it, another path (which prevents GART rewrites entirely) is to add support to swap to normal cached memory.

This is what I envision making sense in the long run. A global memory manager using an sarea (doesn't have to be the main one) and a good aging mechanism get us most of the way there.

-Jeff
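The memory manager described above (fixed minimum block size, per-block usage flags, aging, and swapping only when unavoidable) might look roughly like the sketch below. All names here (aperture_map, aperture_alloc, the usage flags, the block count) are hypothetical and not taken from any DRM source; the eviction policy is plain least-recently-used, which is only one possible aging scheme, and the actual swap-out of block contents is left as a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not an existing DRM interface. */
#define APERTURE_BLOCKS 8            /* aperture size, in minimum-size blocks */

enum block_usage { BLOCK_FREE = 0, BLOCK_TEXTURE, BLOCK_VERTEX, BLOCK_PINNED };

struct aperture_block {
    enum block_usage usage;          /* what kind of data lives in the block */
    int owner;                       /* client id, -1 when free */
    uint32_t last_used;              /* age stamp, updated on every use */
};

struct aperture_map {                /* would live in a shared area (sarea) */
    struct aperture_block blocks[APERTURE_BLOCKS];
    uint32_t clock;                  /* global age counter */
    unsigned swaps;                  /* number of forced evictions */
};

static void aperture_init(struct aperture_map *m)
{
    for (size_t i = 0; i < APERTURE_BLOCKS; i++) {
        m->blocks[i].usage = BLOCK_FREE;
        m->blocks[i].owner = -1;
        m->blocks[i].last_used = 0;
    }
    m->clock = 0;
    m->swaps = 0;
}

/* Mark a block as recently used so that it ages more slowly. */
static void aperture_touch(struct aperture_map *m, int idx)
{
    m->blocks[idx].last_used = ++m->clock;
}

/* Returns a block index, or -1 if every block is pinned.
 * Free blocks are preferred; otherwise the oldest unpinned block is evicted. */
static int aperture_alloc(struct aperture_map *m, int client,
                          enum block_usage usage)
{
    int victim = -1;

    for (int i = 0; i < APERTURE_BLOCKS; i++) {
        struct aperture_block *b = &m->blocks[i];

        if (b->usage == BLOCK_FREE) {
            victim = i;              /* no eviction needed */
            break;
        }
        if (b->usage == BLOCK_PINNED)
            continue;                /* never evict pinned blocks */
        if (victim < 0 || b->last_used < m->blocks[victim].last_used)
            victim = i;              /* oldest candidate so far */
    }
    if (victim < 0)
        return -1;
    if (m->blocks[victim].usage != BLOCK_FREE)
        m->swaps++;                  /* real code would swap the contents out here */

    m->blocks[victim].usage = usage;
    m->blocks[victim].owner = client;
    aperture_touch(m, victim);
    return victim;
}
```

The swaps counter is only there to make the "swap only when you absolutely have to" property observable: it stays at zero until the aperture is full.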
Re: [Dri-devel] R200 kernel interfaces
>What are you actually saying, that pages mapped in agp can't be written
>by any means, or just that they can't be written through the agp address
>range?

Through the AGP address range. I work around this by hacking the DRM to map the RAM pages directly in drmMap using specific vmops and a hacked agp_ioremap. AFAIK, ia64 has similar limitations, and I've been told other recent bridges used on non-x86 at least share this design "mistake".

>It sounds kindof broken to me in any case. How to mtrrs work in this world?

They don't exist ;)

>> That way, if you want several AGP contexts, you can have the processes
>> tapping their AGP buffers without lock, locking would only be required
>> once it's time to move one of these buffers in/out the physical GART
>> under the arbitration of the DRM.
>
>You don't need to lock to write to agp buffers in the current scheme.
>
>You also don't need to play with the gart table just to draw a 2-triangle
>strip. On some chipsets, particularly under smp, modifying the gart table is
>very slow. Ask Jeff about this.

Ok.

Ben.
Re: [Dri-devel] R200 kernel interfaces
Benjamin Herrenschmidt wrote:
>>HOWEVER, if you tied the GART mapping to the DRM lock, you might be ok.
>>That gives you the required system exclusion, and if you make it an
>>explicit "get my GART context" function that is only called under the DRM
>>lock _and_ only called when you actually need the AGP access, you also
>>avoid the unnecessary context switches.
>>
>>You might still have some performance issues simply because you would do
>>extra work when switching aperture mappings, but hopefully the GART switch
>>wouldn't be a common operation.
>>
>>The flexibility you would get _might_ be worth it.
>
> Well, I would personally vote for the processes _not_ relying on having
> the AGP aperture mapped directly, but instead, the various memory pages
> making their AGP aperture. Several chipsets (Apple ones for sure, but it
> seems others are hitting this too nowadays) don't support AGP aperture
> accesses from the CPU.

What are you actually saying: that pages mapped in agp can't be written by any means, or just that they can't be written through the agp address range?

It sounds kind of broken to me in any case. How do MTRRs work in this world?

> That way, if you want several AGP contexts, you can have the processes
> tapping their AGP buffers without lock, locking would only be required
> once it's time to move one of these buffers in/out the physical GART
> under the arbitration of the DRM.

You don't need to lock to write to agp buffers in the current scheme.

You also don't need to play with the gart table just to draw a 2-triangle strip. On some chipsets, particularly under smp, modifying the gart table is very slow. Ask Jeff about this.

Keith
Re: [Dri-devel] R200 kernel interfaces
> - Interrupts
>
>   You don't use these right now, and as far as I can tell the main
>   reason for using them would be to just synchronize page flipping
>   with the framerate. No?

Which would be nice for proper frame-sync on interlaced displays (especially with Michel Danzer's work on using DRM for Xv blits).

> - IOIO and IOMEM access
>
>   iopl() gives access to IOIO

Which sucks on non-x86, but here XFree has its own stuff anyway.

>   mmap() and AGP driver gives access to IOMEM/AGP

That one is problematic. I don't support the mmap interface properly on Apple chipsets, for example, because they don't support the AGP aperture being accessed by the CPU. I play mapping tricks for the in-kernel mapping of the aperture (using a home-made agp_ioremap in the DRM), and I use special vm_ops for drmMap of the AGP so that the real memory pages get mapped in the client processes.

I could do the same with the AGP driver, though the main problem with it currently is that clients using it via the ioctl interface tend to first mmap the aperture, then bind/unbind memory to/from it. I'm not saying that can't be fixed, though ;)

I would much prefer the agpgart interface to be redesigned around different semantics: mostly, vmalloc() some space to use as AGP memory, then bind that to the GART, but don't rely on direct AGP aperture access.

There are also some slight speed improvements to be won using this scheme, as I could map the AGP memory as cacheable (which would give a significant boost on PPC), provided buffers & ring get properly flushed before being "passed" to the chip.

Ben.
Re: [Dri-devel] R200 kernel interfaces
>HOWEVER, if you tied the GART mapping to the DRM lock, you might be ok.
>That gives you the required system exclusion, and if you make it an
>explicit "get my GART context" function that is only called under the DRM
>lock _and_ only called when you actually need the AGP access, you also
>avoid the unnecessary context switches.
>
>You might still have some performance issues simply because you would do
>extra work when switching aperture mappings, but hopefully the GART switch
>wouldn't be a common operation.
>
>The flexibility you would get _might_ be worth it.

Well, I would personally vote for the processes _not_ relying on having the AGP aperture mapped directly, but instead on having the various memory pages that make up their AGP aperture mapped. Several chipsets (Apple ones for sure, but it seems others are hitting this too nowadays) don't support AGP aperture accesses from the CPU.

That way, if you want several AGP contexts, you can have the processes tapping their AGP buffers without a lock; locking would only be required once it's time to move one of these buffers in/out of the physical GART under the arbitration of the DRM.

Ben.
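The quoted "get my GART context" idea, where the expensive remap happens only under the DRM lock and only when a client actually needs AGP access, can be modelled as below. This is a toy model under stated assumptions: gart_state, drm_lock, drm_unlock and gart_get_context are hypothetical names, the lock is a plain flag rather than the real DRM lock, and the GART rewrite itself is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: one physical GART shared by several per-process
 * "GART contexts". None of these names come from the DRM sources. */
struct gart_state {
    bool lock_held;       /* stands in for the DRM hardware lock */
    int lock_owner;       /* context holding the lock, -1 if none */
    int mapped_ctx;       /* context whose pages are in the GART, -1 if none */
    unsigned switches;    /* number of expensive remaps performed */
};

static void gart_init(struct gart_state *g)
{
    g->lock_held = false;
    g->lock_owner = -1;
    g->mapped_ctx = -1;
    g->switches = 0;
}

/* Returns false if the lock is already held; a real lock would block. */
static bool drm_lock(struct gart_state *g, int ctx)
{
    if (g->lock_held)
        return false;
    g->lock_held = true;
    g->lock_owner = ctx;
    return true;
}

static void drm_unlock(struct gart_state *g)
{
    g->lock_held = false;
    g->lock_owner = -1;
}

/* "Get my GART context": legal only while holding the lock, and it remaps
 * only if another context's pages are currently installed.
 * Returns 0 on success, -1 if the caller does not hold the lock. */
static int gart_get_context(struct gart_state *g, int ctx)
{
    if (!g->lock_held || g->lock_owner != ctx)
        return -1;
    if (g->mapped_ctx != ctx) {
        /* real code would rewrite the GART entries here (the slow part) */
        g->mapped_ctx = ctx;
        g->switches++;
    }
    return 0;
}
```

In this model a client that takes the lock but never calls gart_get_context() causes no remap at all, which is the "avoid the unnecessary context switches" property from the quoted text.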
Re: [Dri-devel] R200 kernel interfaces
On Tue, 2002-06-18 at 10:57, Keith Whitwell wrote:
> > - IOIO and IOMEM access
> >
> >   iopl() gives access to IOIO
> >   mmap() and AGP driver gives access to IOMEM/AGP
> >
> >   IOIO is actualy slightly slower in CPL3 than in CPL0, but it's
> >   slower in CPU cycles, not in IO cycles. And since IO cycles
> >   definitely dominate in IOIO (by orders of magnitude), this isn't
> >   likely to be an issue.
> >
> >   And IOMEM is the same speed, since the only overhead for
> >   user space is the TLB, and AGP mappings use the TLB even in kernel
> >   space (vmalloc).
>
> I'm not sure how this works. Does the agp module have a facility to allow the
> client to mmap the card mmio region & the framebuffer? I wasn't aware of this.

I don't know about that, but framebuffer devices certainly do. Not sure if they will after the ongoing API changes in 2.5 though.

And failing that, one could fall back to /dev/mem, like DGA clients. Not that I advocate that. :)

Just two cents of mine...

--
Earthling Michel Dänzer (MrCooper) / Debian GNU/Linux (powerpc) developer
XFree86 and DRI project member / CS student, Free Software enthusiast
Re: [Dri-devel] R200 kernel interfaces
Linus Torvalds wrote:
> Keith,
> I've got a silly question for you..
>
> Why do you need a kernel driver at all for the R200?

I go into your mail below, but the only good answers I have are:

1) To allow us to mmap the framebuffer, agp and mmio regions (or to handle mmio for us without us mapping it).
2) Backwards compatibility. The ddx module is shared with the radeon & wants to talk to a kernel module. This can be worked around.

> There are a few things that the kernel can do for you:
>
>  - Locking.
>
>    However, there are better (and faster) locks available in user
>    space, namely the "futex" interface. They take some getting used
>    to, but you can have some _truly_ low-cost locking using them.
>
>    Example library can be found at:
>    http://www.kernel.org/pub/linux/kernel/people/rusty/futex-2.0.tar.gz

I'm not sure how these are so much better in concept than our existing lock. Both seem to have a userspace fast path (with a locked cycle) and a syscall/ioctl slow path on contention.

The implementation of our lock has various workstation leftovers, like infrastructure for real virtualization of the hardware (the kernel does context switching on lock contention), which aren't really used.

>  - Interrupts
>
>    You don't use these right now, and as far as I can tell the main
>    reason for using them would be to just synchronize page flipping
>    with the framerate. No?

Correct.

>  - IOIO and IOMEM access
>
>    iopl() gives access to IOIO
>    mmap() and AGP driver gives access to IOMEM/AGP
>
>    IOIO is actualy slightly slower in CPL3 than in CPL0, but it's
>    slower in CPU cycles, not in IO cycles. And since IO cycles
>    definitely dominate in IOIO (by orders of magnitude), this isn't
>    likely to be an issue.
>
>    And IOMEM is the same speed, since the only overhead for
>    user space is the TLB, and AGP mappings use the TLB even in kernel
>    space (vmalloc).

I'm not sure how this works. Does the agp module have a facility to allow the client to mmap the card mmio region & the framebuffer? I wasn't aware of this.

>  - Global datastructures
>
>    I think you do the aging right now globally or something.
>
>    What else? Right now you cache some stuff globally (the ring tail
>    ptr etc), but that isn't necessary: you can re-create the
>    information on demand after a lock aquisition (since it is only
>    needed when contention happens).

Contention gives us a hint to check if the cliprects have changed. There's a fairly ugly mechanism for retrieving the new cliprects (drop hw lock, get a spin-type lock, send a request, get a reply, drop the spin-lock, re-acquire the hw lock). However, the check to see if this is necessary is cheap, and the cliprects aren't required that often anyway.

> So from what I can tell, a trusted entity doesn't strictly _need_ any
> kernel support.
>
> Yes, kernel support (or indirect rendering) is needed for untrusted
> applications, but it might actually be interesting to see what a
> direct-rendering all-user-land implementation looks like. It has some
> debugging advantages, and it may actually make sense to start from a
> totally trusted app that goes as fast as humanly possible, and then when
> that has been optimized to death look at just where the interfaces make
> the most sense..

This is closer & closer to the Utah direct rendering model (not that I'm complaining...).

In that model synchronization was achieved by having the X server be the only entity to touch the mmio region, but the client had direct access to a (large) dma buffer which it could ask the X server (via extended X11 protocol) to dispatch for it. The X server would take care of cliprect issues.

This actually worked pretty well, but was limited to a single direct client (second & subsequent clients would go indirect, maybe sw-indirect, I forget). A little bit of work could extend that fairly easily to multiple clients. It also required that the direct client be run as root in order to mmap the framebuffer & dma region.

I think it's probably time to start considering a rewrite/redesign of the 3d infrastructure based around a minimalist approach. There's just so much leftover code hanging around that I have to ask what can be salvaged.

Keith
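The lock shape discussed above (a userspace fast path using one locked cycle, a slow path on contention, and contention used as a hint to re-check the cliprects) can be illustrated with C11 atomics. This is a sketch under assumptions, not the DRM lock itself: the bit layout, the shared_area and client structures, and the function names are all made up for illustration; the slow path here simply retries or gives up where a real driver would make an ioctl and possibly sleep; and the actual cliprect retrieval is reduced to a counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical layout: the lock word holds the id of the last owner,
 * with the top bit set while the lock is held. Context ids are nonzero. */
#define LOCK_HELD 0x80000000u

struct shared_area {
    atomic_uint lock;            /* 0 = never taken, else [HELD |] context id */
    atomic_uint cliprect_stamp;  /* bumped by the server when clipping changes */
};

struct client {
    unsigned ctx;                /* this client's context id */
    unsigned seen_stamp;         /* cliprect stamp we last refreshed against */
    unsigned fast_locks;
    unsigned slow_locks;
    unsigned cliprect_refreshes;
};

/* Returns true if the lock was taken, false if another client holds it
 * (a real driver would block in the kernel instead of returning). */
static bool client_lock(struct shared_area *sa, struct client *c)
{
    unsigned expected = c->ctx;  /* free, and we were the last owner */

    /* Fast path: nobody else took the lock since we released it. */
    if (atomic_compare_exchange_strong(&sa->lock, &expected,
                                       c->ctx | LOCK_HELD)) {
        c->fast_locks++;
        return true;
    }

    /* Slow path: the lock is free, but somebody else owned it last. */
    while (!(expected & LOCK_HELD)) {
        if (atomic_compare_exchange_strong(&sa->lock, &expected,
                                           c->ctx | LOCK_HELD)) {
            unsigned stamp = atomic_load(&sa->cliprect_stamp);

            c->slow_locks++;
            if (stamp != c->seen_stamp) {
                /* real code would fetch the new cliprects here */
                c->seen_stamp = stamp;
                c->cliprect_refreshes++;
            }
            return true;
        }
    }
    return false;
}

static void client_unlock(struct shared_area *sa, struct client *c)
{
    /* Clear the held bit but remember who owned the lock last. */
    atomic_store(&sa->lock, c->ctx);
}
```

The stamp comparison is the "cheap check" from the message: it runs only on the slow path, so an uncontended client pays for a single compare-and-swap per lock acquisition.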