On 8/24/06, Hamish <[EMAIL PROTECTED]> wrote:

I had a thought... Might be a random brain fart, or it might just use far
too much logic, or whatever...

Most interfaces I've seen that use DMA, you allocate a block of memory, add
commands to it, wait for the card to go idle, and then tell the card to go
for it... Now with hard drives you do pretty much the same, except modern ones
(OK, anything since the early '90s, perhaps, in the SCSI world) can queue
commands.

So why not with the graphics card as well?

To an extent, we'll be doing that.  A typical approach is to have a
central circular buffer ("ring buffer") that is controlled by a
central driver.  Whenever an application wants to send commands, it
writes to its own "indirect" buffer, and when it wants to submit those
commands, a command is put into the ring buffer that points at the
indirect buffer.  [Indirect buffers are linear, not circular.]

The ring buffer is the queue.


Instead of telling the card about a block of commands to execute straight
away, why not simply tell it to enqueue a block of commands for execution?
The card then saves the address of the block in an SRAM block (circular
buffer). The graphics engine itself then simply loops through block after
block until there are no more to execute.

The problem with directly enqueueing into the GPU is that if DMA is
already going on, we contend for the bus.  The bus is already in use,
so we end up wasting thousands of CPU cycles waiting on the bus to
clear just to do a handful of PIO writes.  It's better to use a DMA
buffer in host memory where we have some shared variables between the
CPU and GPU, and we just use those to indicate where the queue head
and tail are.
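One possible layout for that shared page, purely for illustration (the real
register/word layout would be whatever the hardware spec says):

```c
/* Hypothetical shared-variable page in host memory.  "volatile"
 * models the fact that the other side (the GPU, over DMA) updates
 * these words asynchronously. */
#include <stdint.h>

struct ring_shared {
    volatile uint32_t head;  /* written by GPU: next word it will read      */
    volatile uint32_t tail;  /* written by CPU: one past last valid word    */
};

/* Free words available to the producer.  Both sides compute queue
 * state from these two words alone -- no PIO polling of the card. */
static inline uint32_t ring_space(const struct ring_shared *s, uint32_t size)
{
    return (s->head + size - s->tail - 1) % size;
}
```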

But wait, there's more... Prioritise blocks... Have n+1 priorities of command
block. One is reserved as the highest. This could be used to execute
something NOW! Without waiting. (Or perhaps execute as the next block if the
logic to return is too great). You could then optionally have multiple levels
of priority for command blocks...

We could just have the driver do that.  If two applications are trying
to draw at the same time, the driver could give them virtual slices
and priorities.  It won't be perfect (numbers of commands rather than
amounts of time), but there's only so much that is reasonable.


And more still... Tag the block for execution at certain times. Why interrupt
the system processor(s) to say hey retrace is on us, or we're at line x of
the display. Allow blocks to be enqueued for execution at start of the
vblank. Or at line xxx... (Not to interrupt, but to start execution of).

Now vblank-oriented things ARE something I might want hardware support
for.  If you can be sure you're staying just behind the vertical
retrace, you can draw directly to the screen without tearing and have
an entire frame to do it.  However, much of the time we will be drawing
to a back buffer and just swapping front and back buffers in a vertical
blank interrupt, making that not so useful.

The way I see it, any interrupt saved is a bonus. Having the card able to do
as much as possible for scheduling itself has got to save system cpu.

Aside from video interrupt, I think we need two engine interrupts.
One is "dma idle", and the other is "engine idle".

What we do is this:  In a page of DMA buffer, we have some shared
variables between GPU and CPU.  As the GPU consumes ring buffer
entries, it'll periodically update a shared variable that indicates
the "head" of the queue, where words are extracted from the circular
buffer.  Similarly, as the driver fills commands into the ring buffer,
it'll update the "tail" pointer to indicate to the GPU where the end
of the queue is.  Whenever the GPU runs to the end of what it thinks
locally is the tail pointer (no more DMA reads), it'll reread the
pointer in the host.  If what it reads is different from the old
value, it keeps going.  If what it reads is the same, it stops
(nothing more to do) and raises an interrupt.

As long as the updates to the tail pointer can be done atomically
(which, as a 32-bit word, they would be on most architectures), then
we can keep the GPU going continuously without ever issuing an
expensive PIO.  If the interrupt arrives, it means that the GPU has
stopped DMA and won't be trying again automatically, so we'll have to
issue a PIO to get it going again when there's more to do.

Any other engine-related interrupts will be ones inserted as commands
into the queue, as suggested by Jon Smirl.

(The exercise of telling the system when a memory block is free is still an
engineering question... You could put something like a condition
variable/mutex in it... Or a simple flag...)

No flames.  :)

If not done right, there could be a race condition between writing the
tail pointer and getting the interrupt, but this is surely a solved
problem.
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)