Ah, what the heck, here is the program:

http://www.supersecret.org/~mcnamara/fbconvertprogram.txt

Timothy Miller wrote:

On 5/26/05, Viktor Pracht <[EMAIL PROTECTED]> wrote:
On Wednesday, 25.05.2005, at 20:32 -0400, Timothy Miller wrote:

I've been thinking about it, and while I really like the idea of
instructions being lookup tables in RAM, it may not give us the
performance we need.  Things will already be slow.  SO, I suggest we
develop a simple processor and use an FPGA RAM block to store both
nearly 500 instructions and the register file.
That "may not give" is not enough. I want real numbers to make it either
"will give" or "won't give".

The performance of the nanocontroller is adequate in all cases except
where a single VGA operation potentially affects the whole framebuffer
(changing the palette, changing the font etc.), or in text mode, where a
single write changes up to 1 KB of framebuffer but is expected to be
very fast. These cases are simply a lot of memory copying, with an
additional memory read in between (to perform computations on the data).
That becomes six cached instructions, a couple of cached LUTs, and two
parallel, very predictable access patterns.
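For reference, the per-cell work in text mode (read the character/attribute pair, look up the glyph, look up the two colours, write the pixels) might look roughly like this. It is only a sketch: `render_cell` is a hypothetical name, glyphs are assumed to be 8 pixels wide and 16 scanlines tall, and the attribute byte follows the usual VGA layout (bits 0-3 foreground, bits 4-6 background, bit 7 blink, ignored here).

```python
GLYPH_W = 8   # assumed 8-pixel-wide glyphs (real VGA text modes can also use 9)
GLYPH_H = 16  # assumed 16 scanlines per character


def render_cell(ch, attr, font, palette, fb, pitch, col, row):
    """Expand one text-mode character cell into a flat framebuffer list.

    font    -- 256 * GLYPH_H bytes, one bit per pixel, MSB = leftmost pixel
    palette -- 16 colour values (the DAC lookup)
    fb      -- flat list of pixels, `pitch` pixels per scanline
    """
    fg = palette[attr & 0x0F]          # bits 0-3: foreground colour index
    bg = palette[(attr >> 4) & 0x07]   # bits 4-6: background (bit 7 = blink, ignored)
    base = row * GLYPH_H * pitch + col * GLYPH_W
    for y in range(GLYPH_H):
        bits = font[ch * GLYPH_H + y]  # one font-LUT read per scanline
        line = base + y * pitch
        for x in range(GLYPH_W):
            fb[line + x] = fg if bits & (0x80 >> x) else bg
```

The two lookups (font and palette) are exactly the "couple of LUTs"; everything else is sequential reads of the text buffer and sequential writes to the framebuffer.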

Since the 3D pipeline is supposed to be able to redraw the whole screen
at much higher resolutions and framerates than VGA, the memory bandwidth
can't be the bottleneck. The question now is: what does the cache look
like, and how can the nanocontroller be designed to use it optimally?
The ideal case is indeed when the nanocontroller code is inside an FPGA
RAM block, but it's best when that block is part of the normal cache and
isn't wasted in non-VGA mode. (And that's true for any kind of VGA
processor.)
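To put rough numbers on the bandwidth argument, here is a back-of-the-envelope sketch. The modes are assumptions chosen for illustration, not figures from this thread: a 720x400 VGA text-mode screen at 70 Hz versus a 1280x1024 3D scene at 60 Hz, both at 32 bits per pixel.

```python
def fill_bandwidth(width, height, bytes_per_pixel, fps):
    """Bytes per second needed to rewrite every pixel once per frame."""
    return width * height * bytes_per_pixel * fps


# Assumed example modes (illustrative only):
vga = fill_bandwidth(720, 400, 4, 70)      # VGA text mode, 32 bpp output
hires = fill_bandwidth(1280, 1024, 4, 60)  # hypothetical 3D target mode
print(f"VGA: {vga / 1e6:.1f} MB/s, 3D: {hires / 1e6:.1f} MB/s, "
      f"ratio {hires / vga:.1f}x")
```

Under these assumptions a full VGA redraw needs roughly a quarter of the write bandwidth the 3D pipeline must already sustain, which is the point being made: raw bandwidth is not the limit, access pattern and latency are.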

PS:  Don't worry about the idea of custom instructions. It's nothing
more than a memory read with indirect addressing. Any processor that is
capable of looking up colors in the DAC palette is automatically capable
of that.
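A minimal sketch of that equivalence, assuming a flat RAM modelled as a Python list: a "custom instruction" whose behaviour is defined by a table in RAM, next to a palette lookup. The table address, the 4-bit operand packing, and the function names are hypothetical; the point is only that both reduce to one read at `base + index`.

```python
def lut_op(mem, table_base, a, b):
    """Execute a 'custom instruction' whose semantics live in RAM.

    The two 4-bit operands are packed into an 8-bit index, and the result
    is simply the value stored at table_base + index: one indirect read.
    """
    return mem[table_base + (((a & 0xF) << 4) | (b & 0xF))]


def palette_lookup(mem, palette_base, index):
    """A DAC palette lookup: the same indirect read with one operand."""
    return mem[palette_base + (index & 0xFF)]


# Usage: "define" an XOR instruction by filling its table in RAM.
mem = [0] * 1024
XOR_TABLE = 0x100  # hypothetical table address
for a in range(16):
    for b in range(16):
        mem[XOR_TABLE + ((a << 4) | b)] = a ^ b
```

Changing what the instruction does means rewriting the table, not the hardware. Note that each execution is a random-access read into RAM.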


OK, reads in this memory controller are designed for throughput, not
latency.  So streaming reads will be efficient, but atomic reads will
have a latency of AT LEAST 20 clock cycles.  In the 3D pipeline, the
places where this matters have FIFOs to absorb the latency.  But in the
nanocontroller, accesses are mostly atomic, and there's very little you
can do to absorb the latency.  Also, since the memory controller and the
nanoprocessor run at different clock rates, there's additional latency
in the cross-domain synchronization.  So I figure you'll have a delay of
roughly 20 cycles in the processor's 100MHz domain for ANY memory read.
While instructions can be cached to an extent, the lookup tables cannot
be, because the accesses are totally random.  If we pipeline it
properly, that's 20 cycles per instruction, unless the instruction
itself performs a memory read, in which case it costs another 20.
(Writes can be ignored.)
Now, imagine the sort of program that has to be written to convert an
80x25 text mode to graphics.  There are loops and lots of memory reads
and all sorts of stuff.  The throughput's going to be horrible.  If we
can do some estimates on instruction count, we can come up with a
framerate.
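As a starting point for that estimate, here is a sketch of the cost model described in this message: 100 MHz clock, about 20 cycles per instruction, plus about 20 more for any instruction that itself reads memory (LUT reads treated as uncached). The per-cell instruction and read counts below are placeholders, not measurements; plug in real counts once a program exists.

```python
CLOCK_HZ = 100_000_000   # nanocontroller clock (100 MHz)
CYCLES_PER_INSTR = 20    # every instruction pays ~20 cycles of read latency
CYCLES_PER_READ = 20     # an instruction that also reads memory pays ~20 more


def frames_per_second(instrs_per_frame, reads_per_frame):
    """Frame rate implied by the latency-bound cost model."""
    cycles = (instrs_per_frame * CYCLES_PER_INSTR
              + reads_per_frame * CYCLES_PER_READ)
    return CLOCK_HZ / cycles


# Hypothetical per-cell budget for an 80x25 text screen (not measured):
CELLS = 80 * 25
INSTRS_PER_CELL = 80   # assumed: ~4 instructions per scanline x 16, plus overhead
READS_PER_CELL = 19    # assumed: 1 char/attr + 16 font scanlines + 2 palette
fps = frames_per_second(CELLS * INSTRS_PER_CELL, CELLS * READS_PER_CELL)
print(f"{fps:.1f} frames/s")
```

With these placeholder counts it prints 25.3 frames/s for a full-screen redraw; the result scales inversely with the real instruction count, so the number is only as good as the assumed budget.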

_______________________________________________
Open-graphics mailing list
Open-graphics@duskglow.com
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)