On Wed, 20 Mar 2002, Allan Whiteford wrote:
[snip]
> I think there are conceptual differences.
We'll probably end up arguing over semantics, but here's my spin on it ...
> The algorithm would be as follows:
>
> "Set the total to zero. Go through each of the elements in turn and add
> its value to the total".
... call this X for brevity
> 2 possible implementations of this are:
>
> total=0;
> for (i=0;i<=9;i++)
> {
>     for (j=0;j<=9;j++)
>     {
>         total=total+array[i][j];
>     }
> }
.. we'll call this Y1
> total=0;
> for (i=0;i<=9;i++)
> {
>     for (j=0;j<=9;j++)
>     {
>         total=total+array[j][i];
>     }
> }
.. and this Y2;
> The first solution will be faster but the algorithm hasn't changed. It's
> just about knowing the language/compiler.
Yep, Y1 and Y2 both implement X (I've been hanging around Java programmers
too long;), but there's another layer to this.
Call Y1 "the algorithm", you can implement Y1 as:
lda $30,-432($30)
stq $26,0($30)
stq $15,8($30)
mov $30,$15
.prologue 0
stl $31,16($15)
stl $31,20($15)
$L3:
ldl $1,20($15)
cmple $1,9,$1
bne $1,$L6
br $31,$L4
$L6:
stl $31,24($15)
$L7:
ldl $1,24($15)
cmple $1,9,$1
bne $1,$L10
ldl $1,20($15)
addl $1,1,$1
stl $1,20($15)
br $31,$L3
$L10:
ldl $2,20($15)
mov $2,$1
s4addq $1,0,$1
addq $1,$2,$1
addq $1,$1,$2
ldl $1,24($15)
addq $2,$1,$1
s4addq $1,0,$2
lda $1,32($15)
addq $1,$2,$1
ldl $2,16($15)
ldl $1,0($1)
addl $2,$1,$1
stl $1,16($15)
ldl $1,24($15)
addl $1,1,$1
stl $1,24($15)
br $31,$L7
$L4:
mov $15,$30
ldq $26,0($30)
lda $30,432($30)
ret $31,($26),1
.. we'll call this monstrosity Z1-1. We could also implement Y1 as:
lda $30,-400($30)
.prologue 0
mov $31,$3
.align 4
$L6:
mov $31,$2
.align 4
$L10:
addl $2,1,$2
fnop
cmple $2,9,$1
bne $1,$L10
addl $3,1,$3
fnop
cmple $3,9,$1
bne $1,$L6
lda $30,400($30)
ret $31,($26),1
.. and we'll call this tidier code Z1-2.
The point is both Z1-1 and Z1-2 implement Y1, but Z1-2 will be much
faster.
At a larger scale, you can have problem W
"Alterations are made to a matrix, maintain the correct total".
Algorithm X will do this, but a better way is:
static int first_time = TRUE;

if (first_time) {
    total = algorithmX(matrix);
    first_time = FALSE;
} else {
    total += diff;
}
This saves 100 array memory accesses and 99 additions on every call after
the first.
A compiler is able to produce Z1-2 instead of Z1-1 (in fact gcc did ;) and
a _really_ smart compiler might be able to produce Y1 instead of Y2, but
there's no way a compiler can produce the above algorithm instead of
always using algorithm X.
The compiler's cut-off exists because there's a limit to how smart a
compiler can be. The human's cut-off is how much time they're willing to
spend (and how much they know about the target platform). If the two
cut-offs sit at the same "depth" of complexity, then optimal code is
produced.
> However, once you go to
> something complicated I think it's hard to label an optimisation as
> being purely computational or something to do with the algorithm; it
> just becomes a big grey area.
I think it's all grey ... :)
Now, what about code morphing and Transmeta's Crusoe chips ...
Paul.
------------------------------------------------------------------------------
Paul Millar yo-yo, n. :
Particle Physics Theory Group Something that is occasionally
Department of Physics and Astronomy up but normally down.
University of Glasgow, (see also Computer)
Glasgow G12 8QQ, [EMAIL PROTECTED]
Scotland +44 (0)141 330 4717
------------------------------------------------------------------------------