On Sun, 12 May 2002, José Fonseca wrote:

> Leif,
> 
> On 2002.05.12 19:15 Leif Delgass wrote:
> > Jose,
> > 
> > I've been experimenting with this too, and was able to get things going
> > with state being emitted either from the client or the drm, though I'm
> > still having lockups and things are generally a bit buggy and unstable
> > still.  To try client side context emits, I basically went back to having
> > each primitive emit state into the vertex buffer before adding the vertex
> > data, like the original hack with MMIO.  This works, but may be emitting
> > state when it's not necessary.
> 
> I don't see how that would happen: only the dirty context was updated 
> before.

It didn't really make sense to me as I was writing this, to tell the
truth. :)  I just had it in my head that this way was a hack. I guess it
was just the client-side register programming that made it "evil" before.
At any rate, as you say, I think doing this in the drm is probably better 
anyway.
 
> > Now I'm trying state emits in the drm, and
> 
> I think that doing the emits on the DRM give us more flexibility than in 
> the client.
> 
> > to do that I'm just grabbing a buffer from the freelist and adding it to
> > the queue before the vertex buffer, so things are in the correct order in
> > the queue.  The downside of this is that buffer space is wasted, since
> > the
> > state emit uses a small portion of a buffer, but putting state in a
> > separate buffer from vertex data allows the proper ordering in the queue.
> 
> Is it a requirement that the addresses stored in the descriptor tables 
> must be aligned on some boundary? If not we could use a single buffer to 
> hold successive context emits, and the first entry of each descriptor 
> table would point to a section of this buffer. This way there wouldn't be any 
> waste of space and a single buffer would suffice for a big number of DMA 
> buffers.

I think the data tables need to be aligned on a 4K boundary, since that's
the maximum size, but I'm not positive.  I know for sure that the
descriptor table has to be aligned to its size.
 
> > 
> > Perhaps we could use a private set of smaller buffers for this.  At any
> > rate, I've done the same for clears and swaps, so I have asynchronous DMA
> > (minus blits) working with gears at least.
> 
> This is another way too. I don't know if we are limited to the kernel 
> memory allocation granularity, so unless this is already done by the pci_* 
> API we might need to split buffers into smaller sizes.

The pci_pool interface is intended for this sort of small buffer, I
think.  We just tell it to give us 4K buffers and allocate as many as we
need with pci_pool_alloc.  That would give us buffers one quarter the size
of a full vertex buffer and still satisfy alignment constraints.  This
would also be more secure, since these buffers would be private to the
drm.  We could use these to terminate each DMA pass as well.  That's one
thing that needs more investigation: which registers need to be reset at
the end of a DMA pass?  Right now I'm only writing src_cntl to disable the
bus mastering bit.  Bus_cntl isn't fifo-ed, so it doesn't make sense to me
to set it, even though the utah driver did.  The only drawback to using 
private buffers is that it complicates the freelist.

> > I'm still getting lockups
> > with
> > anything more complicated and there are still some state problems.  The
> > good news is that I'm finally seeing an increase in frame rate, so
> > there's
> > light at the end of the tunnel.
> 
> My time is limited, and I can't spend more than 3 hrs per day on this, but 
> I think that after the meeting tomorrow we should try to keep the CVS in 
> sync, even if it's less stable - it's a development branch after all and 
> its stability is not as important as making progress.

OK, I'll try to check in more often.  I've been trying a lot of different
things, so I just need to clean things up a bit to minimize the cruft.  I
don't want to check in failed experiments. ;)  For a while the branch is 
likely to cause frequent lockups.  I'm trying to at least get pseudo-DMA 
stable again.

> > 
> > Right now I'm using 1MB (half the buffers) as the high water mark, so
> > there should always be plenty of available buffers for the drm.  To get
> > this working, I've used buffer aging rather than interrupts.
> 
> Which register do you use to keep track of the buffers age?

I'm using the PAT_REG[0,1] registers since they aren't needed for 3D.  As
long as we make sure that DMA is idle and the register contents are
saved/restored when switching contexts between 2D/3D, I think this should
work.  The DDX only uses them for mono pattern fills in the XAA routine,
and it saves and restores them, so we need to do the same.  I've done that
in the Enter/LeaveServer in atidri.c.  We should probably also modify the
DDX's Sync routine for XAA to use the drm idle ioctl.  I think we'll need
to make sure that the DMA queue is flushed before checking for engine 
idle.  At the moment I'm calling the idle ioctl from EnterServer in 
atidri.c, but I haven't touched the XAA Sync function.
 
> > What I
> > realized with interrupts is that there doesn't appear to be an interrupt
> > source that fires often enough to keep up, since a VBLANK is tied to the
> > vertical refresh -- which is relatively infrequent.  I'm thinking that it
> > might be best to start out without interrupts and to use GUI masters for
> > blits and then investigate using interrupts, at least for blits.
> 
> That had crossed my mind before too. I think it may be a good idea too.

I'm keeping Frank's code, so we can return to this, but I've commented out 
the call to handle_dma.  I think I'll just disable the interrupt handler 
for now to eliminate that as a source of problems.
 
> > Anyway,
> > I have an implementation of the freelist and other queues that's
> > functional, though it might require some locks here and there.
> > I'll try to stabilize things more and send a patch for you to look at.
> > 
> 
> Looking forward to that.
> 
> > I've also played around some more with AGP textures.  I have hacked up
> > the
> > performance boxes client-side with clear ioctls, and this helps to see
> > what's going on.  I'll try to clean that up so I can commit it.  I've
> > found some problems with the global LRU and texture aging that I'm trying
> > to fix as well.  I'll post a more detailed summary of that soon.
> > 
> 
> What can I say? Great work Leif! =)
> 
> > BTW, as to your question about multiple clients and state:  I think this
> > is handled when acquiring the lock.  If the context stamp on the SAREA
> > doesn't match the current context after getting the lock, everything is
> > marked as dirty to force the current context to emit all its state.
> > Emitting state to the SAREA is always done while holding the lock.
> > 
> I hadn't realized that before. Thanks for the info.
> 
> Regards,
> 
> José Fonseca
> 

-- 
Leif Delgass 
http://www.retinalburn.net



_______________________________________________
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel
