Bernard Mentink wrote:

Hi All,

I am having trouble getting good performance from the maths library (libm.a). I wrote a quick (simple) benchmark program that does some simple floating point math, including an atan function; it is included below.

To measure the time taken, a scope is attached to the LED line on P1OUT/BIT4.

With the IAR compiler at fastest optimization, the loop executes 35 times per second (35 Hz). However, msp430-gcc with -O2 optimization can only do 7 Hz (hardware is an x149 with a 7.3 MHz clock). This is a big difference.....

I tried the attached little program this afternoon. It declares things volatile to the extent that nothing is optimised away. As you will see, it does nothing more than try all four basic arithmetic functions.

I have the latest IAR 2.10A, Quadravox and mspgcc tools on my machine, and I used an msp430f149 FET tool. The FET tool is fresh from the box, with no crystal, so the software times itself against the vague woolly frequency of the RC oscillator. However, I tried all the compilers within a few minutes, so frequency drift should be minimal.

As posted, with the calculation of the doubles switched off, the software gives the following values for "ops" as the program ends.

14724   IAR 2.10A compact floating library
 5655   IAR 2.10A true IEEE754 floating library
 4021   mspgcc true IEEE754 floating library
 8064   Quadravox AQ430 non IEEE754 floating library

When I enabled the calculation of the doubles, each of these counts roughly halved. In each case, the default behaviour (unless I missed something) is for doubles to be the same as floats, so this is no surprise.
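A quick way to confirm how a given compiler treats doubles is to compare sizeof(float) with sizeof(double). The following is only a minimal sketch: the helper names are made up for illustration and are not part of any of the toolchains above. On a desktop compiler double is normally 64 bits, so the result there will differ from the MSP430 defaults described above.

```c
#include <limits.h>

/* Hypothetical helper: returns 1 when this compiler's double occupies
   the same storage as its float, 0 otherwise. */
int double_is_float_sized(void)
{
    return sizeof(double) == sizeof(float);
}

/* Hypothetical helper: storage width of double in bits. */
unsigned double_bits(void)
{
    return (unsigned)(sizeof(double) * CHAR_BIT);
}
```

If double_is_float_sized() returns 1, the double section of the benchmark is just repeating the float arithmetic, which would be consistent with the counts roughly halving.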

IAR 2.10A has an option for doubles to be true IEEE754 64 bit floats, which I did not try. I don't know which version of IAR Bernard used in his tests. The old 1.26 only supported the compact, not-quite-IEEE754 floating library. Clearly this gains considerable speed by its compromises.

I believe Quadravox's floating library is based on the code from TI's applications manual and web site. This goes for speed, and is nothing like IEEE754. It doesn't seem to match the IAR code, though.

Although mspgcc is the slowest, it isn't totally disgraced by the others. It is a true IEEE754 implementation, so it should be compared with the slower of the IAR times. I'm unclear how Dmitry handles doubles right now. There was talk of enabling real doubles, but I don't know if he has actually provided such an option.

Based on these numbers I assume Bernard's test compared mspgcc with the IAR compact non-standard library. The ratio between their timings is comparable to the figures he got. It seems the complexity of the trig functions must be comparable in these two implementations. For all I know, they might use the same trig functions, since the library code used by mspgcc may be used without restriction in any commercial application.
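To make the comparison concrete, the ratios can be worked out from the "ops" counts and from Bernard's loop rates. This is just a sketch of the arithmetic; speed_ratio is a made-up helper name, and the inputs are the figures quoted in this thread.

```c
/* Hypothetical helper: how many times faster one score is than another.
   Both arguments are "higher is faster" figures, such as the ops counts
   or the loop rates in Hz quoted in this thread. */
double speed_ratio(double faster, double slower)
{
    return faster / slower;
}
```

speed_ratio(14724, 4021) comes out at about 3.7 for the IAR compact library against mspgcc, and speed_ratio(5655, 4021) at about 1.4 for the IAR IEEE754 library against mspgcc. Bernard's figures give speed_ratio(35, 7) = 5, for a loop that also includes an atan call.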

Another important issue, due to the complexity of floating point code, is just how big these floating libraries are. This is left as an exercise for the reader :-) Seriously, making a fair comparison isn't easy, as different implementations may drag in different amounts of library code in different circumstances.

Regards,
Steve

#if defined(__GNUC__)
#include <signal.h>
#endif
#include <io.h>

volatile int seconds;
volatile int count;

void setup(void)
{
    WDTCTL = WDT_MDLY_32;               // Set Watchdog Timer interval to ~30ms
    IE1 |= WDTIE;                       // Enable the WDT interrupt

    BCSCTL1 |= (RSEL0 | RSEL1 | RSEL2); // Select the highest nominal freq
    DCOCTL = 0xFF;                      // Select the highest speed.

    P1DIR |= 0x01;                      // Set P1.0 to output direction

    seconds = 0;
    count = 0;
}

#if defined(__GNUC__)
interrupt(WDT_VECTOR) one_second_timer(void)
#elif defined(__AQCOMPILER__)
void _INTERRUPT[WDT_VECTOR] one_second_timer(void)
#elif (__VER__ < 200) //Old IAR
void interrupt[WDT_VECTOR] one_second_timer(void)
#else
// New IAR
#pragma vector=WDT_VECTOR
__interrupt void one_second_timer(void)
#endif
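{
    seconds++;                          // One tick per WDT interval, not a true second
    if (seconds > 100)
    {
        seconds = 0;
        count++;                        // One count per 101 WDT ticks
    }
    P1OUT ^= 0x01;                      // Toggle P1.0 on every tick
}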

volatile float a = 41.0;
volatile float b = 42.0;
volatile float c = 43.0;
volatile float d = 54.0;

volatile float e;

volatile double da = 81.0;
volatile double db = 82.0;
volatile double dc = 83.0;
volatile double dd = 84.0;

volatile double de;

long int ops;

int main(int argc, char *argv[])
{
    int old_count;
    
    setup();
    _EINT();
    ops = 0;
    old_count = count;
    while (old_count == count)          // Wait for the next count boundary
        ;
    old_count = count + 5;              // Run the test for 5 count periods
    while (count < old_count)
    {
        e = a - b;
        e = a + b;
        e = a*b;
        e = a/b;

#if 0
        de = da - db;
        de = da + db;
        de = da*db;
        de = da/db;
#endif

        ops++;
    }
    for (;;);
}
