Bernard Mentink wrote:
Hi All,
I am having trouble getting good performance from the maths library
(libm.a).
I wrote a quick (simple) benchmark program that does some simple
floating point math, including an atan function; it is included below.
To measure the time taken, a scope is attached to the LED line on
P1OUT/BIT4.
With the IAR compiler at fastest optimization, the loop executes 35
times per second
(35 Hz). However, msp430-gcc with -O2 optimization can only do 7 Hz
(hardware is an x149 with a 7.3 MHz clock). This is a big difference.....
I tried the attached little program this afternoon. It declares things
volatile to the extent that nothing is optimised away. As you will see,
it does nothing more than try all four basic arithmetic functions.
I have the latest IAR 2.10A, Quadravox and mspgcc tools on my machine,
and I used an msp430f149 FET tool. The FET tool is fresh from the box,
with no crystal, so the software times itself against the vague woolly
frequency of the RC oscillator. However, I tried all the compilers
within a few minutes, so frequency drift should be minimal.
As posted, with the calculation of the doubles switched off, the
software gives the following values for "ops" as the program ends:
14724  IAR 2.10A compact floating library
 5655  IAR 2.10A true IEEE754 floating library
 4021  mspgcc true IEEE754 floating library
 8064  Quadravox AQ430 non-IEEE754 floating library
When I enabled the calculation of the doubles, each of these counts
roughly halved. In each case, the default behaviour (unless I missed
something) is for doubles to be the same as floats, so this is no surprise.
IAR 2.10A has an option for doubles to be true IEEE754 64-bit floats,
which I did not try. I don't know which version of IAR Bernard used in
his tests. The old 1.26 only supported the compact, not-quite-IEEE754
floating library. Clearly this gains considerable speed by its compromises.
I believe Quadravox's floating library is based on the code from TI's
applications manual and web site. This goes for speed, and is nothing
like IEEE754. It doesn't seem to match the IAR code, though.
Although mspgcc is the slowest, it isn't totally disgraced by the
others. It is a true IEEE754 implementation, so it should be compared
with the slower of the IAR times. I'm unclear how Dmitry handles doubles
right now. There was talk of enabling real doubles, but I don't know if
he has actually provided such an option.
Based on these numbers I assume Bernard's test compared mspgcc with the
IAR compact non-standard library. The ratio between their timings is
comparable to the figures he got. It seems the complexity of the trig
functions must be comparable in these two implementations. For all I
know, they might use the same trig functions, since the library code
used by mspgcc may be used without restriction in any commercial
application.
Another important issue, due to the complexity of floating point code,
is just how big these floating libraries are. This is left as an
exercise for the reader :-) Seriously, making a fair comparison isn't
easy, as different implementations may drag in different amounts of
library code in different circumstances.
Regards,
Steve
#if defined(__GNUC__)
#include <signal.h>
#endif
#include <io.h>
volatile int seconds;
volatile int count;
void setup(void)
{
    WDTCTL = WDT_MDLY_32;               // WDT interval mode: nominally 32ms at 1MHz SMCLK
    IE1 |= WDTIE;                       // Enable the WDT interrupt
    BCSCTL1 |= (RSEL0 | RSEL1 | RSEL2); // Select the highest nominal freq
    DCOCTL = 0xFF;                      // Select the highest speed.
    P1DIR |= 0x01;                      // Set P1.0 to output direction
    seconds = 0;
    count = 0;
}
#if defined(__GNUC__)
interrupt(WDT_VECTOR) one_second_timer(void)
#elif defined(__AQCOMPILER__)
void _INTERRUPT[WDT_VECTOR] one_second_timer(void)
#elif (__VER__ < 200) //Old IAR
void interrupt[WDT_VECTOR] one_second_timer(void)
#else
// New IAR
#pragma vector=WDT_VECTOR
__interrupt void one_second_timer(void)
#endif
{
    seconds++;              // Counts WDT ticks, not true seconds
    if (seconds > 100)
    {
        seconds = 0;
        count++;            // One count per 101 WDT ticks
    }
    P1OUT ^= 0x01;          // Toggle P1.0 on every tick
}
volatile float a = 41.0;
volatile float b = 42.0;
volatile float c = 43.0;
volatile float d = 54.0;
volatile float e;
volatile double da = 81.0;
volatile double db = 82.0;
volatile double dc = 83.0;
volatile double dd = 84.0;
volatile double de;
long int ops;
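int main(int argc, char *argv[])
{
    int old_count;

    setup();
    _EINT();                    // Enable interrupts
    ops = 0;
    // Wait for count to change, so the timing starts on a boundary
    old_count = count;
    while (old_count == count)
        ;
    // Run the arithmetic until count has advanced by 5
    old_count = count + 5;
    while (count < old_count)
    {
        e = a - b;
        e = a + b;
        e = a*b;
        e = a/b;
#if 0   // Change to 1 to include the doubles
        de = da - db;
        de = da + db;
        de = da*db;
        de = da/db;
#endif
        ops++;
    }
    for (;;);                   // Done; ops holds the result
}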