Yep, I think I'm on approximately 2000 cycles for every divide. There are no special cases and no tables are used. That number was arrived at through the Sim Coupe debugger, so shouldn't be too inaccurate, though it obviously won't be completely dependable because of the usual RAM access timing issues.
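
For anyone following along, a table-free divide with no special cases could look roughly like the sketch below. It is not the actual routine: it assumes unsigned 8.8 operands and a plain restoring shift-and-subtract loop, and the name divide_8_8 is made up for illustration. On a Z80 the loop body would be a handful of shifts and a 16-bit subtract per bit.

```python
def divide_8_8(numerator: int, denominator: int) -> int:
    """Unsigned 8.8 / 8.8 -> 8.8 using restoring shift-and-subtract division.

    Hypothetical helper for illustration; not the routine from the post.
    """
    if denominator == 0:
        raise ZeroDivisionError("8.8 divide by zero")

    # Pre-shift the numerator by 8 bits so the quotient keeps 8 fractional bits.
    dividend = (numerator & 0xFFFF) << 8      # 24-bit dividend
    divisor = denominator & 0xFFFF

    quotient = 0
    remainder = 0
    for bit in range(23, -1, -1):
        # Bring down the next dividend bit, most significant first.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        quotient <<= 1
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1

    # Results that do not fit in 8.8 are simply truncated here.
    return quotient & 0xFFFF
```

With this layout 3.0 / 2.0 is divide_8_8(0x0300, 0x0200), which gives 0x0180, i.e. 1.5.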

I have two separate types of multiply. Both deal with multiplying 8.8 numbers, but one only pays attention to the bottom 10 bits of both numbers. I could adjust the other one to only use the bottom 10 bits of one of the numbers but I haven't yet.

The 16x16 multiply uses no tables whatsoever and costs between 700 and 1200 cycles depending on RAM timings. The 10x10 multiply uses a 4kb table and costs between 270 and 400 cycles. By changing my number format slightly and inlining those, I think I can chop the 10x10s down to fewer than 200 paper cycles. The vast majority of my multiplies are 10x10s, and the proportion goes up as models become more detailed, which hopefully will occur if everything else speeds up.
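
To make the two flavours concrete, here is a rough sketch. It assumes unsigned 8.8 values throughout. The table-free version is an ordinary shift-and-add. The table version uses quarter squares, which is one well-known way to multiply from a lookup table; the post doesn't say how the 4kb table is actually laid out, and this sketch makes no attempt to match that size. All names are hypothetical.

```python
# Hypothetical table: QUARTER_SQUARES[n] == n*n // 4 for every possible a+b.
QUARTER_SQUARES = [n * n // 4 for n in range(2 * 1023 + 1)]


def mul_16x16(a: int, b: int) -> int:
    """Unsigned 8.8 * 8.8 -> 8.8 with no tables (shift-and-add)."""
    a &= 0xFFFF
    b &= 0xFFFF
    product = 0
    for bit in range(16):
        if (b >> bit) & 1:
            product += a << bit
    # The raw product is 16.16; drop 8 fractional bits, truncate overflow.
    return (product >> 8) & 0xFFFF


def mul_10x10(a: int, b: int) -> int:
    """Unsigned 8.8 multiply that only looks at the bottom 10 bits of each
    operand, via a*b = floor((a+b)^2 / 4) - floor((a-b)^2 / 4)."""
    a &= 0x3FF
    b &= 0x3FF
    product = QUARTER_SQUARES[a + b] - QUARTER_SQUARES[abs(a - b)]
    return (product >> 8) & 0xFFFF
```

The identity is exact because a+b and a-b always have the same parity, so the two rounded-down quarters lose the same fraction. Both routines agree whenever the operands fit in 10 bits, e.g. 1.5 * 2.0 is 0x0180 * 0x0200 and comes out as 0x0300 either way.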

So far I've been a good functional programmer and put lots of code in nice callable places. I think I'm spending 2–3% of my processing time on call/ret pairs. Just inlining the 10x10 multiplies would eliminate most of that.

Divides would be nice to optimise, especially once I'm line clipping, but they occur very infrequently compared to multiplies, so are not so much of a worry.

If you accept that the Cobra Mk 3 is of a similar level of detail to the objects in DWC then I guess the most relevant number to compare is frame rate. My program goes between 11 and 25 fps when the object is approximately screen sized, depending on which side you're looking at and therefore how many points actually need to be processed. My program is currently 12.5kb in size, but I'm being quite wasteful in some areas. I have 4kb of sine and cosine tables, for example, which is just silly, though part of my general idea to speed up 10x10 multiplies involves doubling the size of that table.

Re: mirroring, I'm still thinking about how I want to do that in my head. At the minute I have a fairly traditional data structure that stores a separate location for each vertex and fairly traditional code that transforms and projects each of those separately, on demand. Since that object and indeed all but one of the Elite objects (so far as I recall) is symmetrical, I could easily cut a whole bunch of multiplies there without much recoding. A more radical idea is to have each point index tables, e.g. so that instead of saying that a point is at (10, 128, 30), it says that a point is at (vec1, -vec2, vec3) and the various vectors are calculated once for the model then summed and/or negated to get the location of each vertex. That'd cut down even more on the number of multiplies required and take better advantage of symmetry across many more axes, but I'm not sure that the administrative costs wouldn't be more troublesome than is worthwhile if multiplies really are going to significantly drop in cost.
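
One way to read the "point indexes tables" idea is sketched below. It is only an interpretation: each distinct axis offset (say 10 along x) is rotated once, costing three multiplies, and every vertex is then just a signed sum of three of those pre-rotated vectors. The names transform_model, axis_offsets and vertex_refs are invented for the example, and plain integers stand in for the fixed-point values.

```python
def transform_model(rotation, axis_offsets, vertex_refs):
    """Rotate a model whose vertices are signed references into a small
    list of axis-aligned offsets.

    rotation     -- 3x3 matrix as three rows
    axis_offsets -- list of (axis, magnitude); axis is 0, 1 or 2
    vertex_refs  -- per vertex, three non-zero 1-based indexes into
                    axis_offsets; a negative index means "subtract"
    """
    # All the multiplies happen here: each offset is one scaled matrix
    # column, so three multiplies per distinct offset.
    vectors = []
    for axis, magnitude in axis_offsets:
        vectors.append([rotation[row][axis] * magnitude for row in range(3)])

    # Each vertex is then built with additions and subtractions only.
    points = []
    for refs in vertex_refs:
        point = [0, 0, 0]
        for ref in refs:
            vector = vectors[abs(ref) - 1]
            for i in range(3):
                if ref > 0:
                    point[i] += vector[i]
                else:
                    point[i] -= vector[i]
        points.append(tuple(point))
    return points
```

With offsets [(0, 10), (1, 128), (2, 30)], the vertex (1, -2, 3) is the point (10, -128, 30) and its mirror image (-1, -2, 3) comes for free. A box-like model with three distinct offsets would need 9 multiplies in total rather than 9 per vertex; the cost moves into the bookkeeping of the index lists, which is the administrative overhead mentioned above.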

Incidentally, having watched DWC again, I have the feeling that Marc counted the stars in his claim of "70-80 points". Which is silly since you really don't need to put them through a real 3d transformation/projection.

On 16 May 2008, at 17:58, David Brant wrote:

The DWC demo is about a 450K file, and I believe that a lot of this is tables for multiplying, dividing, etc. You say that your divide routine costs 2000 cycles; is this each time you divide? How fast is the multiply routine?

Dave

----- Original Message ----- From: "Thomas Harte" <[EMAIL PROTECTED] >
To: <sam-users@nvg.ntnu.no>
Sent: Thursday, May 15, 2008 10:48 PM
Subject: In pursuit of Dead Wild Cat


As I previously said, and as mentioned in the current Sam Revival, I'm
experimenting with 3d on the Sam. At the minute I'm just playing with
vector graphics, since they move reasonably quickly, making it easy to
observe problems with the algorithms.

I'm not doing too terribly (see http://www.youtube.com/watch?v=hcMiB1ZkukM
for my attempt at a Cobra Mk 3 versus the Spectrum original), but
I'm intimidated by the revelation by Marc Broster on this list 12
years ago that "with my dead wild cat demo I had about 70-80 points
being calculated in 3D with lines being plotted in 25fps."

70 points at 25 fps would be just over 3,428 cycles/point, even if
line drawing were completely free and memory was uncontended. Even my
divide routine costs something like 2,000 cycles. Part of it seems to
be that the perspective in Dead Wild Cat isn't correct — when objects
transition in and out they are obviously zooming rather than actually
moving (because the relative perspective of points doesn't change
correctly; though it's mostly hidden by the fact that many of the
objects have a flat front), and moving them around manually shows some
very odd effects. But even eliminating the divide in favour of some
weird limited range (i.e. unsuitable for a real game) table approach
doesn't account for everything.  I would imagine that even when I've
pulled out all the stops, I'd still be spending at least 7,000 to
8,000 (pencil calculated, uncontended and hence unrealistic) cycles to
process each individual point all the way from world space to a
location on screen.
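
The budget arithmetic above can be checked with a few lines. The sketch
assumes the Sam's nominal 6 MHz Z80 clock and ignores memory contention
entirely, so the figures are upper bounds; the function names are
illustrative only.

```python
CPU_HZ = 6_000_000  # assumption: nominal 6 MHz clock, no contention


def cycles_per_point(points: int, fps: float, cpu_hz: int = CPU_HZ) -> float:
    """Cycles available per point if nothing else cost any time."""
    return cpu_hz / (points * fps)


def max_fps(points: int, cycles_each: float, cpu_hz: int = CPU_HZ) -> float:
    """Best-case frame rate when each point costs cycles_each cycles."""
    return cpu_hz / (points * cycles_each)


budget = cycles_per_point(70, 25)    # between 3,428 and 3,429
ceiling = max_fps(70, 7000)          # a little over 12 fps
```

At 7,000 cycles per point, 70 points would cap out at roughly 12 fps
before any line drawing, which is why the 25 fps claim looks so
surprising.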

So, is Marc on the list? If not, has anyone done any significant
disassembly on the demo? I've been told before that most of RAM is
given over to pre-calculated line drawing routines. I'm only spending
something like 15% of my time on drawing so it isn't the major
concern. Is there anything else that can be learnt from it that isn't
essentially limited to doing small objects with local perspective?
