On Fri, 8 Mar 2002, Joaquim Carvalho wrote:

> A few bits of this:
> 
>       mov     [LIMIT1],eax
>       mov     bx,[LOWER esi]                  ; set LIMIT2 (ddata1 descriptor)
[snip]
>       mov     ebx,[Y esi]
> 
> ...in the right places could do wonders for plex86 execution speed.
> 
> Done right, assembly code can be several times faster than C.
> 
> As this is an 80x86 only project, there's no reason why assembly
> can not be used.
>
> Small critical sections of the C source code can be commented
> out and replace by inline assembly.
[snip]

While I agree with you in principle, the overwhelming majority of assembly
code is done wrong, often results in slower code, and is far more likely
to introduce bugs (a large chunk of my career has been spent fixing
"optimized" assembly language code that was slower than even a crappy C
compiler would generate).  In addition, often simple optimizations of the
C code can get you the same performance gains.  For the case of serious
processors (CISC/RISC, 32-bit or larger) I have seen very few instances
where carefully optimized C code and a good compiler aren't within 1 to 2
percent of the best hand-optimized assembly.  Having said that, my take
for the general case of C vs. assembly for this project would be to make
sure the algorithm is as well optimized as possible in C code, then if the
code section is performance critical to the system, try the assembly
approach and benchmark the code sections, C versus assembly, on several
different processor variations with the C code compiled for each target
processor (such as K6, Duron, Athlon, Pentium III, Pentium IV, etc.).
If the assembly code consistently beats the C code by some significant
amount (a number I won't attempt to define because I'm not sure what it
would/should be), then by all means, we should go with the assembly code.
The cross-CPU testing is important, because every CPU has timing
differences in at least some instructions and instruction combinations,
and if you improve the performance for everyone running Pentium IIIs at
the expense of performance for everyone running a Duron, then your
assembly language code has not really gained anything for the project
overall (unless of course someone wants to optimize it for every processor
variation).
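
A minimal sketch of the kind of side-by-side benchmark I mean, in C.
Everything in it is hypothetical and for illustration only: the routine
being timed (a sum of 32-bit words), the names sum_c, sum_candidate,
time_fn and results_match, and the use of clock() are my assumptions, not
plex86 code.  The "candidate" is an unrolled C loop standing in for the
spot where an inline assembly version would go, so the sketch builds with
any C99 compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical signature of the routine under test. */
typedef uint32_t (*sum_fn)(const uint32_t *data, size_t n);

/* Plain C reference version. */
static uint32_t sum_c(const uint32_t *data, size_t n)
{
    uint32_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += data[i];
    return total;
}

/* Stand-in for the hand-tuned candidate.  A real comparison would put
 * the inline assembly here; this is only a two-way unrolled C loop. */
static uint32_t sum_candidate(const uint32_t *data, size_t n)
{
    uint32_t a = 0, b = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        a += data[i];
        b += data[i + 1];
    }
    if (i < n)              /* odd element left over */
        a += data[i];
    return a + b;
}

/* Check correctness first: a fast wrong answer is not an optimization. */
static int results_match(const uint32_t *data, size_t n)
{
    return sum_c(data, n) == sum_candidate(data, n);
}

/* Time `reps` calls of fn and return the CPU seconds used.  The volatile
 * sink keeps the results live; a serious harness would need more care
 * (warm-up runs, repeated trials, a finer-grained timer). */
static double time_fn(sum_fn fn, const uint32_t *data, size_t n, int reps)
{
    assert(reps > 0);
    volatile uint32_t sink = 0;
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        sink += fn(data, n);
    clock_t end = clock();
    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

The idea would be to build this once per target processor (for example
with gcc's -march option), run it on each machine, and only compare
time_fn(sum_c, ...) against time_fn(sum_candidate, ...) after
results_match() has confirmed that both versions agree.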

Personally, at this point in the project, I think it is a waste of time to
work on performance optimization at the assembly language level, because
there is a lot of other work to be done that could result in design changes
which might cause the section of code you just optimized to be discarded,
in which case all the time spent optimizing it was wasted.

Usually the best approach to new development goes something like this:

   1 - Come up with a design
   2 - Get it working
   3 - Get it stable
   4 - Tune performance/resource utilization

There is always going to be some need to bounce back and forth in the
order above, but if you start doing things out of sequence consistently,
rather than just when it is really needed, it takes a lot more work to get
the job done because you end up discarding work that has been done but
is now obsolete because of changes at the preceding level.

FWIW.

Shannon C. Dealy      |               DeaTech Research Inc.
[EMAIL PROTECTED]     |          - Custom Software Development -
                      |    Embedded Systems, Real-time, Device Drivers
Phone: (800) 467-5820 | Networking, Scientific & Engineering Applications
   or: (541) 451-5177 |                  www.deatech.com

