It does not depend on what the loop is doing; it depends only on the number of
instructions in the loop. It makes no difference whether I use -msoft-float or
kernel math emulation.
For an integer add, i.e. one or two instructions in the for loop, it takes
270000 microseconds, which is around 27 times slower. If I pad the loop with
enough random integer additions to match the size of the floating point add
function, I get the same results as with floating point. That is no surprise.
My conjecture is that the 8K combined instruction+data cache on the SC400
starts getting thrashed when a context switch happens and the kernel starts
doing something else. The more instructions the loop has, the more the CPU has
to refetch from memory afterwards, and the bigger the performance hit.
I was finally able to reproduce this problem with only init, kflushd, kpiod,
kswapd, login, bash, and my process running. In such a system I would expect my
process to be the only runnable one (apart from bash running the commands I
type, and the serial driver, because I am working over a serial port).
Have the stock Linux kernel and gcc been tuned to take advantage of the bigger
L1 or L2 caches on modern CPUs? Is that why an 8K cache is giving me so much
trouble?
Thanks
Pawan
----- Original Message -----
From: "Bjorn Eriksson" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Cc: "Pawan Singh" <[EMAIL PROTECTED]>
Sent: Tuesday, June 27, 2000 3:30 AM
Subject: RE: Timer interrupt processing latency?
> We are using RedHat 6.0 kernel on an AMD SC400 66Mhz board with
> two serial ports and an ethernet port. <...>
AMD SC410, which lacks a floating point unit? (We designed a board with that
processor, but its address decoding options are so poor (very few programmable
chip selects) that we ditched it and went with the SC520 instead, which has an
FPU.)
<...>I am seeing following
> weirdness for compute intensive pieces of code:
>
> double x = 3.4;
> double y = 5.67;
> for (long i=1; i<10000; i++) {
> double z = x+y;
> }
>
>
> The above piece of code takes 2 seconds to 3 seconds.
> But if I surround the above piece of code with asm("cli") and asm("sti"),
> i.e. turn off interrupts, it takes only 9 milliseconds.
Does this depend on whether the loop operation is done on doubles or integers?
Are you using -msoft-float?
Either way, that's 2.3/0.009 >= 250 times difference, which is a huge
difference!
Singh, Pawan reported poor math emulation performance a week or two ago.
I'm not sure what was discovered there, but could these two be different sides
of the same coin?
//Björnen.
--
To unsubscribe from this list, send a message to [EMAIL PROTECTED]
with the command "unsubscribe linux-embedded" in the message body.
For more information, see <http://waste.org/mail/linux-embedded>.