On Tue, 11 Jun 2002, José Fonseca wrote:

> On 2002.06.11 22:12 Leif Delgass wrote:
> > On Tue, 11 Jun 2002, José Fonseca wrote:
> >
> > ...
> >
> > This is where we have to make sure that any assumptions we make can be
> > verified to be true. I haven't done enough testing to really determine
> > a sure-fire way of knowing that the card won't stop yet. What I'm
> > concerned about is that the card might be doing some read-ahead
> > buffering that we don't know about. That's why I was thinking we might
> > have to see the card actually advance a couple of times before
> > determining it won't stop. The test I did with changing BM_GUI_TABLE
> > from a buffer took a couple of descriptors to take effect.
>
> I've already tested that before, and there didn't seem to be any
> significant buffering - at least with respect to the descriptor table.
> Only if there is a lookahead value. In that case we could compare
> BM_GUI_TABLE instead of the head. In any case we would need more testing
> to be sure.
>
> Another idea I had is, instead of having a flag, having a bookmark - the
> value of the last committed ring position. Whenever we commit a buffer
> and the head is after that bookmark, we set the bookmark to the
> beginning of the committed buffer. When we need the card to complete, we
> just wait (restarting the DMA if it stops) for the head to reach the
> bookmark. Once it reaches it, we can be sure the pass will succeed,
> because the ring table won't suffer any change until the end.
>
> In any event this will take some experimentation, and from your comments
> below this isn't as high a priority as condensing the state buffers or
> doing customized vertex buffer templates for the Mach64 vertex format.
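The bookmark idea above can be sketched as a toy model. All names here (`ring_state`, `commit_buffer`, `past_bookmark`, the ring size) are invented for illustration and this is not the actual mach64 DRM code; it also ignores ring wrap-around for simplicity, which a real implementation would have to handle.

```c
#include <assert.h>

/* Toy sketch of the "bookmark" scheme: track the ring position past
 * which the card is guaranteed to run to completion.  Invented names;
 * comparisons ignore wrap-around of the ring. */

#define RING_SIZE 256  /* descriptor slots, assumed power of two */

struct ring_state {
    unsigned head;      /* card's current read position */
    unsigned tail;      /* next free slot for the driver */
    unsigned bookmark;  /* card may stop before this; past it, it won't */
};

/* Commit a buffer occupying `count` descriptors at the tail.  If the
 * head has already reached the old bookmark, the card could still stop
 * before the new data, so move the bookmark to the new buffer. */
static void commit_buffer(struct ring_state *r, unsigned count)
{
    unsigned start = r->tail;
    if (r->head >= r->bookmark)        /* wrap-around ignored */
        r->bookmark = start;
    r->tail = (r->tail + count) % RING_SIZE;
}

/* Once the head passes the bookmark, every committed buffer is
 * guaranteed to complete without another DMA restart; a flush would
 * wait (restarting DMA on stalls) until this returns true. */
static int past_bookmark(const struct ring_state *r)
{
    return r->head >= r->bookmark;     /* same wrap-around caveat */
}
```

The point of the scheme is that the wait loop only has to poll and possibly restart DMA until `past_bookmark()` holds; after that, no further babysitting is needed.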
True. I'm not sure it's worth a bookmarking scheme if we only use it for
the one place the flush ioctl will be used.

> > ...
> >
> > I don't think it's a problem if the head_addr is one behind the actual
> > position if there are 2D commands still in the FIFO (which could only
> > happen at the final descriptor on the ring). We don't actually act on
> > it until the card is idle. It just means that the last buffer in the
> > ring won't be reclaimed until the card is idle. Actually, if you _did_
>
> True.
>
> > advance the head while the card is active, it would trigger the error
> > check you added to freelist_get, because head would equal tail but the
> > card would still be active.
>
> I doubt it, because in that case we would wait for idle and _then_
> restart DMA... (as is done now in CVS)
>
> Please, let's not discuss this further. I think we both agree that using
> a variable is the best way to go, isn't it?

Each and every processor cycle must be precisely documented and accounted
for! The lives of rocket-launcher-toting space marines depend on our
attention to detail! I've clamped onto the throat of this bit of code like
a mad dog and will continue to shake it around while foam dribbles from
the corners of my mouth long after it's dead.

Ahem, ... ok, I'm back now. I think I blacked out for a minute there.

Anyway, I just think that in any case it would be better if we only enable
bus mastering on idle (if things are going well, an active engine should
be the common case). If we do that, I don't think it's really a big deal
to have the extra writes. The writes could be conditional on a read, or we
could use a variable instead, but I'm not sure it's worth it and it could
be error-prone. Now that I think about it, there's an added bit of
security and safety in making sure that the block 1 registers are enabled
and that src_cntl is set for gui-mastering and FIFO synchronization before
starting a new DMA pass.
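The restart-only-on-idle idea with conditional setup writes could look roughly like this. The register names, bit values, and the plain array standing in for MMIO are all placeholders loosely modeled on the discussion, not the real mach64 driver or hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of "restart DMA only when idle, re-checking the setup
 * registers first".  Offsets and bit values are invented; a real
 * driver would use MMIO reads/writes on the actual Mach64 registers. */

enum { REG_GUI_STAT, REG_BUS_CNTL, REG_SRC_CNTL, REG_COUNT };

#define GUI_ACTIVE     0x1   /* engine-busy bit (placeholder value) */
#define BLOCK1_ENABLE  0x2   /* block 1 register access (placeholder) */
#define SRC_BM_SYNC    0x4   /* gui-master + FIFO sync (placeholder) */

static uint32_t regs[REG_COUNT];   /* stand-in for the register file */

/* Try to (re)start a DMA pass.  Returns 0 on success, -1 if the engine
 * is still active, in which case the caller retries later rather than
 * racing register writes against a running pass. */
static int dma_restart_on_idle(void)
{
    if (regs[REG_GUI_STAT] & GUI_ACTIVE)
        return -1;                  /* active engine: leave it alone */

    /* Cheap safety net: verify the one-time setup before every pass,
     * writing only when a read shows it has been lost. */
    if (!(regs[REG_BUS_CNTL] & BLOCK1_ENABLE))
        regs[REG_BUS_CNTL] |= BLOCK1_ENABLE;
    if ((regs[REG_SRC_CNTL] & SRC_BM_SYNC) != SRC_BM_SYNC)
        regs[REG_SRC_CNTL] |= SRC_BM_SYNC;

    /* ...kick off bus mastering here... */
    return 0;
}
```

Making the writes conditional on a read, as sketched here, is the trade-off mentioned above: it saves redundant writes at the cost of a read per restart.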
It would probably help performance in general to find ways to reduce the
number of DMA restarts we do as well.

btw, as I tried to indicate above, I can be a bit of an obstinate bugger
sometimes and I'm often just thinking out loud. You might want to have a
salt shaker at hand and administer a grain or two when you read my posts.
8->~ (that's me foaming at the mouth).

> > > ...
> > >
> > > I'm not sure if I understood correctly what you're saying. Note that
> > > once we restart the card we can be sure that it won't stop until it
> > > finishes _all_ buffers we supplied until _that_ moment.
> >
> > I wasn't very clear here. What I mean is that if the card is idle and
> > we restart, we should be fine. The problem is if we _only_ do that and
> > do nothing if the card is active.
>
> Ok.
>
> > > ...
> > >
> > > I think that having a flag indicating whether the card can stop or
> > > not is more efficient. What do you think?
> >
> > It depends on what's required to set a reliable flag. That would have
> > to be done every time we advance the ring tail, whereas a flush ioctl
> > is less frequent. We can remove the flush ioctls wherever they are
> > followed by an idle ioctl with the current version of the idle ioctl
> > (since it ensures _all_ buffers will complete), which would just leave
> > the flush in DDFlush in the Mesa driver. If an app calls glFlush, it's
> > probably not doing it very often (maybe once or twice a frame?).
>
> I really don't know... I don't even know why a regular application (not
> X) would call an idle if the flush wasn't implicit...

I think that's probably true. With X, there is one case where it seems
excessive: when uploading a new cursor image. This function has an XAA
sync call in it because it writes to the cursor area of the framebuffer.
The cursor image shouldn't be dependent on all 3D draw operations
completing first, AFAICT. The problem is that I can't think of a clean way
to handle this at the moment.
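The flush-versus-idle distinction discussed above can be illustrated against a toy ring model. The `toy_*` names, the `running` flag, and the `toy_step` "tick" standing in for the hardware making progress are all invented; the real ioctls poll and program actual registers.

```c
#include <assert.h>

/* Toy model of the flush vs. idle ioctl semantics.  Invented names;
 * toy_step() stands in for the card processing one descriptor. */

struct toy_ring {
    unsigned head, tail;  /* card read pos / driver write pos */
    int running;          /* is the card processing the ring? */
};

/* FLUSH: make sure the card is working on everything queued so far,
 * but do not wait for completion (what DDFlush needs). */
static void toy_flush(struct toy_ring *r)
{
    if (!r->running && r->head != r->tail)
        r->running = 1;   /* restart DMA */
}

/* One "tick" of the simulated card. */
static void toy_step(struct toy_ring *r)
{
    if (r->running && r->head != r->tail)
        r->head++;
    if (r->head == r->tail)
        r->running = 0;   /* card goes idle at the end of the ring */
}

/* IDLE: guarantee _all_ committed buffers complete, restarting the
 * card if it stops early, and return with the engine idle. */
static void toy_idle(struct toy_ring *r)
{
    while (r->head != r->tail) {
        toy_flush(r);     /* restart if it stopped */
        toy_step(r);      /* stand-in for waiting on the hardware */
    }
}
```

Because `toy_idle` already implies a flush, a flush ioctl immediately followed by an idle ioctl is redundant, which is the basis for removing those flush calls.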
For other XAA functions, we need to complete the ring because the X server
can change register state that won't be accounted for in buffers already
committed to the ring. The other thing, which is really what I was
addressing here, is that we still need a flush ioctl for DDFlush that
doesn't wait for idle, independent of whether the idle ioctl flushes or
not.

> > ...
> >
> > The biggest problem with getting the client-submitted buffers to be
> > used more efficiently is state emits. Client-side state emits aren't
> > secure. The current code _does_ allow multiple primitives in a buffer
> > as long as there is no new state between them. AllocDmaLow will use an
> > existing vertex buffer until it's full or a state change causes a
> > dispatch.
>
> Oh... I didn't have that impression... But even with that restriction,
> they are a lot smaller than I would expect. I would expect that OpenGL
> applications made fewer state changes than that...

Changing textures is one example, and I'd imagine that happens fairly
often. I should point out that a primitive won't be split across multiple
vertex buffers, so that can leave some unused space as well. As you
mentioned, we're going to have to revisit vertex buffers at some point in
any case, both for performance and security.

BTW, last time I had to boot Windows (against my will, of course), I did a
quake3 timedemo. On my laptop, the current dri branch is only behind by
~4 fps with vertex lighting and 2 fps with lightmap lighting (approx. 82%
and 87% of the Windows performance, respectively). I'd say we're making
good progress. My goal is to try to at least match the Windows driver, but
with a more secure implementation.

-- 
Leif Delgass
http://www.retinalburn.net
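The AllocDmaLow behaviour described earlier (fill the current vertex buffer until it is full or a state change forces a dispatch, never splitting a primitive across buffers) can be sketched as follows. The function and variable names, the buffer size, and the dispatch counter are invented for illustration and do not match the real Mesa/DRM code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy sketch of AllocDmaLow-style buffer reuse.  Invented names;
 * dispatch_buffer() stands in for queueing the buffer on the ring. */

#define VB_SIZE 4096           /* bytes per vertex buffer (assumed) */

static size_t vb_used;         /* bytes used in the current buffer */
static int dispatches;         /* buffers flushed so far */

static void dispatch_buffer(void)
{
    if (vb_used) {
        dispatches++;          /* stand-in for committing to the ring */
        vb_used = 0;
    }
}

/* Reserve `bytes` for one primitive, returning its offset.  If it
 * doesn't fit in the space left, dispatch the current buffer and start
 * a fresh one, so a primitive is never split; the unused tail of the
 * old buffer is the wasted space mentioned in the discussion.  A state
 * change likewise forces a dispatch first. */
static size_t alloc_dma_low(size_t bytes, int state_changed)
{
    if (state_changed || vb_used + bytes > VB_SIZE)
        dispatch_buffer();
    size_t offset = vb_used;
    vb_used += bytes;
    return offset;
}
```

The test below shows both causes of a dispatch: running out of room, and a state change (e.g. a texture switch) between primitives.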
_______________________________________________
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel