On Tue, 11 Jun 2002, José Fonseca wrote:

> On 2002.06.11 22:12 Leif Delgass wrote:
> > On Tue, 11 Jun 2002, José Fonseca wrote:
> >
> > ...
> >
> > This is where we have to make sure that any assumptions we make can be
> > verified to be true. I haven't done enough testing to really determine
> > a sure-fire way of knowing that the card won't stop yet. What I'm
> > concerned about is that the card might be doing some read-ahead
> > buffering that we don't know about. That's why I was thinking we might
> > have to see the card actually advance a couple of times before
> > determining it won't stop. The test I did with changing BM_GUI_TABLE
> > from a buffer took a couple of descriptors to take effect.
>
> I've already tested that before, and there didn't seem to be any
> significant buffering - at least with respect to the descriptor table.
> Only if there is a lookahead value. In that case we could compare
> BM_GUI_TABLE instead of the head. In any case we would need more testing
> to be sure.
>
> Another idea I had is, instead of having a flag, having a bookmark - the
> value of the last committed ring position. Whenever we commit a buffer
> and the head is after that bookmark, we set the bookmark to the
> beginning of the committed buffer. When we need the card to complete, we
> just wait (restarting the DMA if it stops) for the head to reach the
> bookmark. Once it reaches it, we can be sure the pass will succeed,
> because the ring table won't suffer any change until the end.
>
> In any event this will take some experimentation, and from your comments
> below this isn't as high a priority as condensing the state buffers or
> doing customized vertex buffer templates for the Mach64 vertex format.
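The bookmark idea above can be sketched as a toy model. All names here (`ring_state`, `commit_buffer`, `past_bookmark`, the ring size) are invented for illustration and this is not the actual mach64 DRM code; it also ignores ring wrap-around for simplicity, which a real implementation would have to handle.

```c
#include <assert.h>

/* Toy sketch of the "bookmark" scheme: track the ring position past
 * which the card is guaranteed to run to completion.  Invented names;
 * comparisons ignore wrap-around of the ring. */

#define RING_SIZE 256  /* descriptor slots, assumed power of two */

struct ring_state {
    unsigned head;      /* card's current read position */
    unsigned tail;      /* next free slot for the driver */
    unsigned bookmark;  /* card may stop before this; past it, it won't */
};

/* Commit a buffer occupying `count` descriptors at the tail.  If the
 * head has already reached the old bookmark, the card could still stop
 * before the new data, so move the bookmark to the new buffer. */
static void commit_buffer(struct ring_state *r, unsigned count)
{
    unsigned start = r->tail;
    if (r->head >= r->bookmark)        /* wrap-around ignored */
        r->bookmark = start;
    r->tail = (r->tail + count) % RING_SIZE;
}

/* Once the head passes the bookmark, every committed buffer is
 * guaranteed to complete without another DMA restart; a flush would
 * wait (restarting DMA on stalls) until this returns true. */
static int past_bookmark(const struct ring_state *r)
{
    return r->head >= r->bookmark;     /* same wrap-around caveat */
}
```

The point of the scheme is that the wait loop only has to poll and possibly restart DMA until `past_bookmark()` holds; after that, no further babysitting is needed.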
True. I'm not sure it's worth a bookmarking scheme if we only use it for
the one place the flush ioctl will be used.

> > ...
> >
> > I don't think it's a problem if the head_addr is one behind the actual
> > position if there are 2D commands still in the FIFO (which could only
> > happen at the final descriptor on the ring). We don't actually act on
> > it until the card is idle. It just means that the last buffer in the
> > ring won't be reclaimed until the card is idle. Actually, if you _did_
>
> True.
>
> > advance the head while the card is active, it would trigger the error
> > check you added to freelist_get, because head would equal tail but the
> > card would still be active.
>
> I doubt it, because in that case we would wait for idle and _then_
> restart DMA... (as is done now in CVS)
>
> Please, let's not discuss this further. I think we both agree that using
> a variable is the best way to go, isn't it?

Each and every processor cycle must be precisely documented and accounted
for! The lives of rocket-launcher-toting space marines depend on our
attention to detail! I've clamped onto the throat of this bit of code like
a mad dog and will continue to shake it around while foam dribbles from
the corners of my mouth long after it's dead.

Ahem, ... ok, I'm back now. I think I blacked out for a minute there.

Anyway, I just think that in any case it would be better if we only enable
bus mastering on idle (if things are going well, an active engine should
be the common case). If we do that, I don't think it's really a big deal
to have the extra writes. The writes could be conditional on a read, or we
could use a variable instead, but I'm not sure it's worth it and it could
be error-prone. Now that I think about it, there's an added bit of
security and safety in making sure that the block 1 registers are enabled
and that src_cntl is set for gui-mastering and FIFO synchronization before
starting a new DMA pass.
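The restart-only-on-idle idea with conditional setup writes could look roughly like this. The register names, bit values, and the plain array standing in for MMIO are all placeholders loosely modeled on the discussion, not the real mach64 driver or hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of "restart DMA only when idle, re-checking the setup
 * registers first".  Offsets and bit values are invented; a real
 * driver would use MMIO reads/writes on the actual Mach64 registers. */

enum { REG_GUI_STAT, REG_BUS_CNTL, REG_SRC_CNTL, REG_COUNT };

#define GUI_ACTIVE     0x1   /* engine-busy bit (placeholder value) */
#define BLOCK1_ENABLE  0x2   /* block 1 register access (placeholder) */
#define SRC_BM_SYNC    0x4   /* gui-master + FIFO sync (placeholder) */

static uint32_t regs[REG_COUNT];   /* stand-in for the register file */

/* Try to (re)start a DMA pass.  Returns 0 on success, -1 if the engine
 * is still active, in which case the caller retries later rather than
 * racing register writes against a running pass. */
static int dma_restart_on_idle(void)
{
    if (regs[REG_GUI_STAT] & GUI_ACTIVE)
        return -1;                  /* active engine: leave it alone */

    /* Cheap safety net: verify the one-time setup before every pass,
     * writing only when a read shows it has been lost. */
    if (!(regs[REG_BUS_CNTL] & BLOCK1_ENABLE))
        regs[REG_BUS_CNTL] |= BLOCK1_ENABLE;
    if ((regs[REG_SRC_CNTL] & SRC_BM_SYNC) != SRC_BM_SYNC)
        regs[REG_SRC_CNTL] |= SRC_BM_SYNC;

    /* ...kick off bus mastering here... */
    return 0;
}
```

Making the writes conditional on a read, as sketched here, is the trade-off mentioned above: it saves redundant writes at the cost of a read per restart.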
It would probably help performance in general to find ways to reduce the
number of DMA restarts we do as well.

btw, as I tried to indicate above, I can be a bit of an obstinate bugger
sometimes and I'm often just thinking out loud. You might want to have a
salt shaker at hand and administer a grain or two when you read my posts.
8->~ (that's me foaming at the mouth).

> > > ...
> > >
> > > I'm not sure if I understood correctly what you're saying. Note that
> > > once we restart the card we can be sure that it won't stop until it
> > > finishes _all_ buffers we supplied until _that_ moment.
> >
> > I wasn't very clear here. What I mean is that if the card is idle and
> > we restart, we should be fine. The problem is if we _only_ do that and
> > do nothing if the card is active.
>
> Ok.
>
> > > ...
> > >
> > > I think that having a flag indicating whether the card can stop or
> > > not is more efficient. What do you think?
> >
> > It depends on what's required to set a reliable flag. That would have
> > to be done every time we advance the ring tail, whereas a flush ioctl
> > is less frequent. We can remove the flush ioctls wherever they are
> > followed by an idle ioctl with the current version of the idle ioctl
> > (since it ensures _all_ buffers will complete), which would just leave
> > the flush in DDFlush in the Mesa driver. If an app calls glFlush, it's
> > probably not doing it very often (maybe once or twice a frame?).
>
> I really don't know... I don't even know why a regular application (not
> X) would call an idle if the flush wasn't implicit...

I think that's probably true. With X, there is one case where it seems
excessive: when uploading a new cursor image. This function has an XAA
sync call in it because it writes to the cursor area of the framebuffer.
The cursor image shouldn't be dependent on all 3D draw operations
completing first, AFAICT. The problem is that I can't think of a clean way
to handle this at the moment.
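The flush-versus-idle distinction discussed above can be illustrated against a toy ring model. The `toy_*` names, the `running` flag, and the `toy_step` "tick" standing in for the hardware making progress are all invented; the real ioctls poll and program actual registers.

```c
#include <assert.h>

/* Toy model of the flush vs. idle ioctl semantics.  Invented names;
 * toy_step() stands in for the card processing one descriptor. */

struct toy_ring {
    unsigned head, tail;  /* card read pos / driver write pos */
    int running;          /* is the card processing the ring? */
};

/* FLUSH: make sure the card is working on everything queued so far,
 * but do not wait for completion (what DDFlush needs). */
static void toy_flush(struct toy_ring *r)
{
    if (!r->running && r->head != r->tail)
        r->running = 1;   /* restart DMA */
}

/* One "tick" of the simulated card. */
static void toy_step(struct toy_ring *r)
{
    if (r->running && r->head != r->tail)
        r->head++;
    if (r->head == r->tail)
        r->running = 0;   /* card goes idle at the end of the ring */
}

/* IDLE: guarantee _all_ committed buffers complete, restarting the
 * card if it stops early, and return with the engine idle. */
static void toy_idle(struct toy_ring *r)
{
    while (r->head != r->tail) {
        toy_flush(r);     /* restart if it stopped */
        toy_step(r);      /* stand-in for waiting on the hardware */
    }
}
```

Because `toy_idle` already implies a flush, a flush ioctl immediately followed by an idle ioctl is redundant, which is the basis for removing those flush calls.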
For other XAA functions, we need to complete the ring because the X server
can change register state that won't be accounted for in buffers already
committed to the ring. The other thing, which is really what I was
addressing here, is that we still need a flush ioctl for DDFlush that
doesn't wait for idle, independent of whether the idle ioctl flushes or
not.

> > ...
> >
> > The biggest problem with getting the client-submitted buffers to be
> > used more efficiently is state emits. Client-side state emits aren't
> > secure. The current code _does_ allow multiple primitives in a buffer
> > as long as there is no new state between them. AllocDmaLow will use an
> > existing vertex buffer until it's full or a state change causes a
> > dispatch.
>
> Oh... I didn't have that impression... But even with that restriction,
> they are a lot smaller than I would expect. I would expect that OpenGL
> applications made fewer state changes than that...

Changing textures is one example, and I'd imagine that happens fairly
often. I should point out that a primitive won't be split across multiple
vertex buffers, so that can leave some unused space as well. As you
mentioned, we're going to have to revisit vertex buffers at some point in
any case, both for performance and security.

BTW, last time I had to boot Windows (against my will, of course), I did a
quake3 timedemo. On my laptop, the current dri branch is only behind by
~4 fps with vertex lighting and 2 fps with lightmap lighting (approx. 82%
and 87% of the Windows performance, respectively). I'd say we're making
good progress. My goal is to try to at least match the Windows driver, but
with a more secure implementation.

-- 
Leif Delgass
http://www.retinalburn.net
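The AllocDmaLow behaviour described earlier (fill the current vertex buffer until it is full or a state change forces a dispatch, never splitting a primitive across buffers) can be sketched as follows. The function and variable names, the buffer size, and the dispatch counter are invented for illustration and do not match the real Mesa/DRM code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy sketch of AllocDmaLow-style buffer reuse.  Invented names;
 * dispatch_buffer() stands in for queueing the buffer on the ring. */

#define VB_SIZE 4096           /* bytes per vertex buffer (assumed) */

static size_t vb_used;         /* bytes used in the current buffer */
static int dispatches;         /* buffers flushed so far */

static void dispatch_buffer(void)
{
    if (vb_used) {
        dispatches++;          /* stand-in for committing to the ring */
        vb_used = 0;
    }
}

/* Reserve `bytes` for one primitive, returning its offset.  If it
 * doesn't fit in the space left, dispatch the current buffer and start
 * a fresh one, so a primitive is never split; the unused tail of the
 * old buffer is the wasted space mentioned in the discussion.  A state
 * change likewise forces a dispatch first. */
static size_t alloc_dma_low(size_t bytes, int state_changed)
{
    if (state_changed || vb_used + bytes > VB_SIZE)
        dispatch_buffer();
    size_t offset = vb_used;
    vb_used += bytes;
    return offset;
}
```

The test below shows both causes of a dispatch: running out of room, and a state change (e.g. a texture switch) between primitives.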
_______________________________________________
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel