Hi Shannon

I also agree with you in principle.
*But* in this case we were talking about a couple of standard functions
that are used all over the place (that's what this thread started about
anyway). Functions like memcpy and such could be sped up by a fair
degree by using instructions from MMX, SSE and the like. Many C
compilers do not yet have support for such simple things as the MMX
64-bit copy instructions; it has to be done in asm.
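Just to illustrate what I mean, here is a rough sketch only (the
function name is made up, it assumes the length is a nonzero multiple
of 8, and a real memcpy replacement would need head/tail and alignment
handling):

    #include <stddef.h>

    /* Sketch: copy len bytes 64 bits at a time through an MMX register.
     * Assumes len is a nonzero multiple of 8.  The emms at the end hands
     * the register file back to the x87 FPU. */
    static void mmx_copy64(void *dst, const void *src, size_t len)
    {
        size_t qwords = len / 8;

        if (qwords == 0)
            return;

        __asm__ __volatile__ (
            "1:                   \n\t"
            "movq   (%1), %%mm0   \n\t"   /* load 64 bits from source */
            "movq   %%mm0, (%0)   \n\t"   /* store 64 bits to dest    */
            "add    $8, %0        \n\t"
            "add    $8, %1        \n\t"
            "dec    %2            \n\t"
            "jnz    1b            \n\t"
            "emms                 \n\t"
            : "+r" (dst), "+r" (src), "+r" (qwords)
            :
            : "mm0", "memory", "cc");
    }

Whether something like this actually beats what gcc emits for memcpy on
a given CPU is of course something we would have to measure.
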
But you are absolutely right when we are talking about more specific
parts of plex86. We shouldn't even think of optimizing those in asm yet;
that should wait until we can be positive that the part in question
won't change too much in the future.

The versions of memcpy found in the Linux kernel have been
hand-optimized by *very* competent people, probably also with help from
AMD and Intel engineers. So if we could make use of them, or something
similar, it would be A Good Thing.
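
And to tie that in with your point about benchmarking: a quick-and-dirty
harness like the one below (buffer size and iteration count pulled out
of thin air), run on each of the target CPUs, would show whether a
candidate routine really wins:

    #include <stdio.h>
    #include <string.h>

    /* Rough benchmark sketch: time a copy routine with the x86 cycle
     * counter (rdtsc).  Results should be compared across K6, Duron,
     * Athlon, Pentium III, Pentium IV, etc. */
    static inline unsigned long long read_tsc(void)
    {
        unsigned int lo, hi;
        __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
        return ((unsigned long long)hi << 32) | lo;
    }

    #define BUF_SIZE   (64 * 1024)
    #define ITERATIONS 1000

    static unsigned char src_buf[BUF_SIZE], dst_buf[BUF_SIZE];

    int main(void)
    {
        unsigned long long start, end;
        int i;

        start = read_tsc();
        for (i = 0; i < ITERATIONS; i++)
            memcpy(dst_buf, src_buf, BUF_SIZE);  /* swap in the asm version here */
        end = read_tsc();

        printf("cycles per copy: %llu\n", (end - start) / ITERATIONS);
        return 0;
    }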

Take care
/Gabriel

On Fri, 2002-03-08 at 22:59, Shannon C. Dealy wrote:
> On Fri, 8 Mar 2002, Joaquim Carvalho wrote:
> 
> > A few bits of this:
> > 
> >     mov     [LIMIT1],eax
> >     mov     bx,[LOWER esi]                  ; set LIMIT2 (ddata1 descriptor)
> [snip]
> >     mov     ebx,[Y esi]
> > 
> > ...in the right places could do wonders for plex86 execution speed.
> > 
> > Done right, assembly code can be several times faster than C.
> > 
> > As this is an 80x86 only project, there's no reason why assembly
> > can not be used.
> >
> > Small critical sections of the C source code can be commented
> > out and replaced by inline assembly.
> [snip]
> 
> While I agree with you in principle, the overwhelming majority of assembly
> code is done wrong, often results in slower code, and is far more likely
> to introduce bugs (a large chunk of my career has been spent fixing
> "optimized" assembly language code that was slower than even a crappy C
> compiler would generate).  In addition, often simple optimizations of the
> C code can get you the same performance gains.  For the case of serious
> processors (CISC/RISC 32 bit or larger) I have seen very few instances
> where carefully optimized C code and a good compiler aren't within 1 to 2
> percent of the best hand optimized assembly.  Having said that, my take
> for the general case of C vs. assembly for this project would be to make
> sure the algorithm is as well optimized as possible in C code, then if the
> code section is performance critical to the system, try the assembly
> approach and benchmark the code sections, C versus assembly, on several
> different processor variations with the C code compiled for each target
> processor (such as K6, Duron, Athlon, Pentium III, Pentium IV, etc.).
> If the assembly code consistently beats the C code by some significant
> amount (a number I won't attempt to define because I'm not sure what it
> would/should be), then by all means, we should go with the assembly code.
> The cross CPU testing is important, because every cpu has timing
> differences in at least some instructions and instruction combinations,
> and if you improve the performance for everyone running Pentium III's at
> the expense of performance for everyone running a Duron, then your
> assembly language code has not really gained anything for the project
> overall (unless of course someone wants to optimize it for every processor
> variation).
> 
> Personally, at this point in the project, I think it is a waste of time to
> work on performance optimization at the assembly language level, because
> there is a lot of other work to be done that could result in design changes
> which might cause the section of code you just optimized to be discarded,
> in which case all the time spent optimizing it was wasted.
> 
> Usually the best approach to new development goes something like this:
> 
>    1 - Come up with a design
>    2 - Get it working
>    3 - Get it stable
>    4 - Tune performance/resource utilization
> 
> There is always going to be some need to bounce back and forth in the
> order above, but if you start doing things out of sequence consistently,
> rather than just when it is really needed, it takes a lot more work to get
> the job done because you end up discarding work that has been done but
> is now obsolete because of changes at the preceding level.
> 
> FWIW.
> 
> Shannon C. Dealy      |               DeaTech Research Inc.
> [EMAIL PROTECTED]     |          - Custom Software Development -
>                       |    Embedded Systems, Real-time, Device Drivers
> Phone: (800) 467-5820 | Networking, Scientific & Engineering Applications
>    or: (541) 451-5177 |                  www.deatech.com
> 
> 


