Re: In pursuit of Dead Wild Cat

Thomas Harte Fri, 16 May 2008 10:46:06 -0700

Yep, I think I'm on approximately 2000 cycles for every divide. There are no special cases and no tables are used. That number was arrived at through the Sim Coupe debugger, so shouldn't be too inaccurate, though it obviously won't be completely dependable because of the usual RAM access timing issues.
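
For readers unfamiliar with table-free division, here is a minimal Python sketch of the classic shift-and-subtract (restoring) method, which does one trial subtraction per result bit. It is only an illustration of the general technique, not the routine described above; the name `divide_u16`, the 16-bit width and the zero check (there only to keep the Python well defined) are assumptions.

```python
def divide_u16(dividend, divisor):
    """Unsigned 16-bit restoring division; returns (quotient, remainder)."""
    if divisor == 0:
        # Guard added for the sketch only; hypothetical, not from the thread.
        raise ZeroDivisionError("divisor must be non-zero")
    quotient = 0
    remainder = 0
    for bit in range(15, -1, -1):
        # Shift the next dividend bit (most significant first) into the remainder.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        quotient <<= 1
        # Trial subtract: if the divisor fits, keep the result and set this quotient bit.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient, remainder
```

Sixteen iterations of shift, compare and conditional subtract on 16-bit register pairs is why a routine of this shape tends to be expensive on an 8-bit CPU.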

I have two separate types of multiply. Both deal with multiplying 8.8 numbers, but one only pays attention to the bottom 10 bits of both numbers. I could adjust the other one to only use the bottom 10 bits of one of the numbers but I haven't yet.
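
As a rough illustration of what multiplying 8.8 numbers involves, the Python sketch below treats each value as an integer scaled by 256, multiplies by shift-and-add, and shifts the product right by 8 to restore the scale. The names `to_fixed` and `mul_8_8` and the sign handling are assumptions made for the example; the actual routines are not shown in this thread.

```python
FRAC_BITS = 8  # 8.8 format: 8 integer bits, 8 fractional bits


def to_fixed(value):
    """Convert a Python number to a raw 8.8 integer (hypothetical helper)."""
    return int(round(value * (1 << FRAC_BITS)))


def mul_8_8(a, b):
    """Multiply two raw 8.8 values by shift-and-add, returning a raw 8.8 value."""
    negative = (a < 0) != (b < 0)
    a, b = abs(a), abs(b)
    product = 0
    while b:
        if b & 1:          # add the shifted multiplicand for every set bit
            product += a
        a <<= 1
        b >>= 1
    product >>= FRAC_BITS  # the raw product has 16 fractional bits; drop 8 of them
    return -product if negative else product
```

Restricting an operand to its bottom 10 bits shortens the loop, since there are fewer multiplier bits to test.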

The 16x16 multiply uses no tables whatsoever and costs between 700 and 1200 cycles depending on RAM timings. The 10x10 multiply uses a 4kb table and costs between 270 and 400 cycles. By changing my number format slightly and inlining those, I think I can chop the 10x10s down to fewer than 200 paper cycles. The vast majority of my multiplies are 10x10s, and the proportion goes up as models become more detailed, which hopefully will occur if everything else speeds up.
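
The message does not say what the 4kb table holds. One well-known way to trade a table for a multiply is the quarter-square identity, a*b = floor((a+b)^2/4) - floor((a-b)^2/4), sketched below in Python for unsigned 10-bit operands. This is offered purely as an example of a table-driven multiply and may not match the routine described; `QUARTER_SQUARES` and `mul_10x10` are hypothetical names.

```python
MAX_OPERAND = 1023  # largest unsigned 10-bit value

# One entry per possible sum a + b, holding floor(n * n / 4).
QUARTER_SQUARES = [(n * n) // 4 for n in range(2 * MAX_OPERAND + 1)]


def mul_10x10(a, b):
    """Multiply two unsigned 10-bit integers using two table lookups and a subtract."""
    if not (0 <= a <= MAX_OPERAND and 0 <= b <= MAX_OPERAND):
        raise ValueError("operands must fit in 10 bits")
    # a + b and a - b always share parity, so the two floor errors cancel exactly.
    return QUARTER_SQUARES[a + b] - QUARTER_SQUARES[abs(a - b)]
```

The multiply itself becomes an add, a subtract and two lookups, which is the kind of saving that makes a table worth its memory.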

So far I've been a good functional programmer and put lots of code in nice callable places. I think I'm spending 2–3% of my processing time on call/ret pairs. Just inlining the 10x10 multiplies would eliminate most of that.

Divides would be nice to optimise, especially once I'm line clipping, but they occur very infrequently compared to multiplies, so are not so much of a worry.

If you accept that the Cobra Mk 3 is of a similar level of detail to the objects in DWC then I guess the most relevant number to compare is frame rate. My program goes between 11 and 25 fps when the object is approximately screen sized, depending which side you're looking at and therefore how many points actually need to be processed. My program is currently 12.5kb in size, but I'm being quite wasteful in some areas. I have 4kb of sine and cosine tables for example, which is just silly. Though part of my general idea to speed up 10x10 multiplies involves doubling the size of that table.

Re: mirroring, I'm still thinking about how I want to do that in my head. At the minute I have a fairly traditional data structure that stores a separate location for each vertex and fairly traditional code that transforms and projects each of those separately, on demand. Since that object and indeed all but one of the Elite objects (so far as I recall) is symmetrical, I could easily cut a whole bunch of multiplies there without much recoding. A more radical idea is to have each point index tables, e.g. so that instead of saying that a point is at (10, 128, 30), it says that a point is at (vec1, -vec2, vec3) and the various vectors are calculated once for the model then summed and/or negated to get the location of each vertex. That'd cut down even more on the number of multiplies required and take better advantage of symmetry across many more axes, but I'm not sure that the administrative costs wouldn't be more troublesome than is worthwhile if multiplies really are going to significantly drop in cost.
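
To make the shared-vector idea concrete, here is a small Python sketch under my own assumptions: each distinct coordinate magnitude on each model axis is rotated once (three multiplies), and every vertex is then built by adding or subtracting three cached vectors. The function name, the row-major 3x3 matrix layout and the cache keyed on (axis, magnitude) are illustrative choices, not the data structure from the program being discussed.

```python
def rotate_with_shared_vectors(rotation, vertices):
    """Rotate model-space vertices using per-axis vectors computed once per model.

    rotation: 3x3 matrix as a list of rows; vertices: list of (x, y, z) tuples.
    Returns (rotated_vertices, number_of_shared_vectors_computed).
    """
    cache = {}  # (axis, magnitude) -> rotated vector

    def axis_vector(axis, magnitude):
        key = (axis, magnitude)
        if key not in cache:
            # The only multiplies: scale column `axis` of the matrix by the magnitude.
            cache[key] = tuple(rotation[row][axis] * magnitude for row in range(3))
        return cache[key]

    rotated = []
    for vertex in vertices:
        total = [0, 0, 0]
        for axis, coord in enumerate(vertex):
            vec = axis_vector(axis, abs(coord))
            for i in range(3):
                # Mirrored coordinates reuse the same vector with a subtract.
                if coord < 0:
                    total[i] -= vec[i]
                else:
                    total[i] += vec[i]
        rotated.append(tuple(total))
    return rotated, len(cache)
```

For a mirrored pair such as (10, 128, 30) and (-10, 128, 30), only three shared vectors are computed, so the second vertex costs additions and subtractions only. The bookkeeping visible in the cache is the administrative cost weighed above.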

Incidentally, having watched DWC again, I have the feeling that Marc counted the stars in his claim of "70-80 points". Which is silly since you really don't need to put them through a real 3d transformation/projection.


On 16 May 2008, at 17:58, David Brant wrote:

The DWC demo is about a 450K file, and I believe that a lot of this is tables for multiplying and dividing etc. You say that your divide routine costs 2000 cycles; is this each time you divide? How fast is the multiply routine?


Dave

----- Original Message -----
From: "Thomas Harte" <[EMAIL PROTECTED]>
To: <sam-users@nvg.ntnu.no>
Sent: Thursday, May 15, 2008 10:48 PM
Subject: In pursuit of Dead Wild Cat

As I previously said, and as mentioned in the current Sam Revival, I'm
experimenting with 3d on the Sam. At the minute I'm just playing with
vector graphics, since they move reasonably quickly, making it easy to
observe problems with the algorithms.

I'm not doing too terribly (see http://www.youtube.com/watch?v=hcMiB1ZkukM
for my attempt at a Cobra Mk 3 versus the Spectrum original), but
I'm intimidated by the revelation by Marc Broster on this list 12
years ago that "with my dead wild cat demo I had about 70-80 points
being calculated in 3D with lines being plotted in 25fps."

70 points at 25 fps would be just over 3,428 cycles/point, even if
line drawing were completely free and memory was uncontended. Even my
divide routine costs something like 2,000 cycles. Part of it seems to
be that the perspective in Dead Wild Cat isn't correct — when objects
transition in and out they are obviously zooming rather than actually
moving (because the relative perspective of points doesn't change
correctly; though it's mostly hidden by the fact that many of the
objects have a flat front), and moving them around manually shows some
very odd effects. But even eliminating the divide in favour of some
weird limited range (i.e. unsuitable for a real game) table approach
doesn't account for everything.  I would imagine that even when I've
pulled out all the stops, I'd still be spending at least 7,000 to
8,000 (pencil calculated, uncontended and hence unrealistic) cycles to
process each individual point all the way from world space to a
location on screen.
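
The per-point budget quoted above can be reproduced with a few lines of Python. The sketch assumes the Sam's nominal 6 MHz Z80 clock, which is not stated in the message, and ignores contention and drawing time entirely; `cycles_per_point` and `points_per_frame` are hypothetical helper names.

```python
CPU_HZ = 6_000_000  # assumed nominal Sam Coupe CPU clock; not given in the thread


def cycles_per_point(fps, points, cpu_hz=CPU_HZ):
    """Cycles available per point if the whole frame is spent on point processing."""
    return cpu_hz / (fps * points)


def points_per_frame(fps, cycles_each, cpu_hz=CPU_HZ):
    """Whole points that fit in one frame at a given per-point cost."""
    return int(cpu_hz // (fps * cycles_each))
```

With these assumptions, `cycles_per_point(25, 70)` comes to about 3,428.6, matching the figure above, while `points_per_frame(25, 7000)` gives 34, roughly half the claimed point count.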

So, is Marc on the list? If not, has anyone done any significant
disassembly on the demo? I've been told before that most of RAM is
given over to pre-calculated line drawing routines. I'm only spending
something like 15% of my time on drawing so it isn't the major
concern. Is there anything else that can be learnt from it that isn't
essentially limited to doing small objects with local perspective?
