On Fri, 16 Feb 2007, Gilboa Davara wrote:

On Thu, 2007-02-15 at 19:23 +0200, Peter wrote:
On Thu, 15 Feb 2007, Gilboa Davara wrote:

Small example.
About two years ago I got bored and decided to implement binary trees in
(x86) Assembly.
The end result was between 2 and 10 times faster than GCC (-O2/-O3)
generated code (depending on the size of the tree).
The main reason being the lack of a 3 way comparison in C.
(above/below/equal)

And assembly lacks it too.

????????!!!?

cmp %eax,%ebx
jb label_below
ja label_above
<equal code>
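
For comparison, here is a minimal sketch of the same three-way dispatch written in plain C, as a binary-tree lookup. The struct node layout and the name tree_find are assumptions made for illustration; the tree code from the original experiment was not posted. Whether a given compiler turns this into one cmp followed by two conditional jumps depends on the compiler and its flags.

```c
#include <stddef.h>

/* Hypothetical node layout; the original tree code was not posted. */
struct node {
    unsigned key;
    struct node *left;
    struct node *right;
};

/* Iterative lookup: each node takes one of three paths. */
struct node *tree_find(struct node *root, unsigned key)
{
    while (root != NULL) {
        if (key < root->key)        /* "below" */
            root = root->left;
        else if (key > root->key)   /* "above" */
            root = root->right;
        else                        /* "equal" */
            return root;
    }
    return NULL;                    /* key not present */
}
```

The source spells out two comparisons per node, but both test the same pair of values, so a compiler is free to reuse the flags from a single cmp.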

Each jump risks a pipeline flush when it is mispredicted.

But in C you can get creative with compound
statements:
int x,y;
register int t;

/* below() and above() must return non-zero, or the next call in the chain runs too */
(t = x - y) && (((t < 0) && below()) || above()) || equal();

.. Which will only work if below/above/equal are made of short
statements, which is a very problematic prerequisite.

inline int below(your,optional,arguments);

will work fine. So will:

#define below(a,b,c) (z=a+b+c)
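
A minimal sketch of how the compound expression behaves with small inline helpers. The helper bodies, the variable last and the wrapper dispatch are hypothetical, chosen only so the result can be observed. Two assumptions matter: every helper returns non-zero (otherwise the chain falls through to the next call), and x - y does not overflow.

```c
/* Hypothetical helpers: each records which branch ran and returns 1 so
 * that the && / || chain stops after exactly one call. */
static int last;   /* -1 = below, 1 = above, 0 = equal */

static inline int below(void) { last = -1; return 1; }
static inline int above(void) { last =  1; return 1; }
static inline int equal(void) { last =  0; return 1; }

/* Three-way dispatch as a single expression.
 * Assumes x - y does not overflow. */
int dispatch(int x, int y)
{
    int t;

    (void)(((t = x - y) && (((t < 0) && below()) || above())) || equal());
    return last;
}
```

With these helpers, dispatch(1, 2) takes the below path, dispatch(2, 1) the above path, and dispatch(5, 5) the equal path.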

In my case I needed to store some additional information in each leaf -
making each step a compound statement by itself (which, in turn,
rendered your compound less effective).

Don't be so sure about that. A compound statement can be optimized very well.

which wastes 1 register variable. Still, there is no guarantee that this
generates faster code than an optimizing compiler (and gcc is not known
among the best optimizing compilers). Rewriting above using binary
operators and masks may be even faster.
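
One way the "binary operators and masks" idea might look, as a sketch rather than a measured improvement. The names cmp3 and select_mask are hypothetical. There are no branches in the source, but the compiler decides what instructions it actually emits, so any speed claim would have to be checked on the target.

```c
/* Returns -1, 0 or 1 for x < y, x == y, x > y.
 * Uses two comparisons instead of x - y, so it cannot overflow. */
int cmp3(int x, int y)
{
    return (x > y) - (x < y);
}

/* Mask-based select: returns a when cond is non-zero, b otherwise. */
unsigned select_mask(int cond, unsigned a, unsigned b)
{
    unsigned mask = -(unsigned)(cond != 0);   /* all ones or all zeros */

    return (a & mask) | (b & ~mask);
}
```

The result of cmp3 can index a three-entry table (after adding 1), and select_mask can pick between two precomputed values without an if.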

The same code was also tested under Visual Studio 2K3 and showed the same results. The assembly code was considerably faster than the VS-generated binary.

Assembly is not portable and it is a *** to debug. Yes, you can make it run faster. It's fun for the first few days; after that you need to change something or port it to an NSLU2, and things stop being nice very fast. Especially if someone else needs to compile your code.

Atomic code execution should not require assembly because segment
locking can be done using C (even if that C is inline assembly for
some applications).

A. I -was- talking about in-line assembly.
B. How can I implement "lock btX/inc/dec/sub/add" in pure C?
(Let alone using the resulting flags. [setXX])
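
For readers coming to this thread later: the C11 standard, which postdates this exchange, added <stdatomic.h>, and that covers part of this question without inline assembly. The sketch below is only an illustration under that assumption. The function names dec_and_test and test_and_set_bit are hypothetical, and whether the compiler emits a lock-prefixed instruction for them is up to the compiler and target.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Atomically decrement *counter; true if the new value is zero.
 * Roughly the role of "lock dec" followed by "setz" on x86. */
bool dec_and_test(atomic_int *counter)
{
    /* atomic_fetch_sub returns the value held before the subtraction. */
    return atomic_fetch_sub(counter, 1) == 1;
}

/* Atomically set bit `bit` in *word; returns the previous state of that bit.
 * Roughly the role of "lock bts" followed by "setc" on x86. */
bool test_and_set_bit(atomic_uint *word, unsigned bit)
{
    unsigned mask = 1u << bit;

    return (atomic_fetch_or(word, mask) & mask) != 0;
}
```

The flags themselves are still not visible from C; the functions recover the same information from the value the atomic operation returns.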

BTW, another valid excuse for using assembly (at least in
register-barren-world-known-as-i386) is the ability to trash the base
pointer. (every register counts.)

Again, why are you assuming x86 assembly is the target? It could be ARM or MIPS or PPC. Optimizing x86 makes sense for extreme driver writing, kernel code and such. Otherwise it makes little sense on a platform that doubles its MIPS speed every 2 years.

lock exists only on x86, and it exists because x86 is a brainf***d architecture that allows 'long instructions' (once upon a time known as microcode) to be interrupted in the middle. I assure you that this is a very unusual feature among CPUs. Think about it: it's the only popular CPU that can be proud of being theoretically able to throw an EINTR *inside* a machine code instruction.

Modifying BP + small mistake = crash. Oops.

Peter
