Hello,

While investigating a timing issue with one of our program, we found out something weird: We've written a small test program that just calls clock_gettime() a lot of times and checks that the time difference between calls makes sense. In the end, it seems it doesn't always do.

Calling twice in a row clock_gettime() takes usually less than 1ms. But with an average load, about 1 time in 200000, more than 10ms are spent between both calls for no apparent reason. According to our tests, when it happens, the time between both calls can go from few milliseconds to many seconds (our best score so far is 10 seconds :). Same goes for gettimeofday().

To reproduce this issue, we use the small test program joined to this mail and openssl speed test commands (usually 'openssl speed -elapsed dsa2048'). It only appears if one of these openssl speed test run on the same core than our test progam, so we have to start as many openssl instances as there are cores on the test computer.

For instance:
- FreeBSD 7.3 + Core 2 duo
- FreeBSD 7.4/STABLE + Core 2 duo
- FreeBSD 8.1 + Core 2 duo
- FreeBSD 8.2-RC3 + Core 2 duo (clean setup)
- FreeBSD 9.0/CURRENT + Core 2 duo (clean setup, checked out last Friday)
On all these setups, with 2 instances of openssl speed tests, on average, for 2000000*2 calls to clock_gettime(), we got about 10 pauses of between 100 and 300ms

- FreeBSD 8.2-RC1 + Xeon W3680 (6 cores):
With 12 instances of openssl speed tests, for 10000000*2 calls to clock_gettime(), we got pauses of between 300ms and 10s.

- FreeBSD 6.3 + VIA C3:
     With 1 instance of openssl, we got pauses of between 70ms and 300ms

- FreeBSD 7.3 + VIA C7:
With 1 instance of openssl, we got pauses of between 150ms and 1s (Note: this computer may be slightly more loaded then the previous one)


For comparison purpose, we also tried on 2 Linux systems:
- Linux 2.6.32.15 (Via nano, with a bunch of services running on it):
   - 1 openssl speed tests + 2000000 iterations: max 10ms
   - 10 openssl speed tests + 2000000 iterations: max 44ms
- Linux 2.6 (Core i7):
   - 10 openssl speed tests + 2000000: max 1ms


We tried setting the test program to the highest priority possible (rtprio(REALTIME, RTP_PRIO_MAX)) and it doesn't seem to change a thing.


Does anyone know if there is a reason for this behavior ? Is there something that can be done to improve things ?


#include <sys/limits.h>
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>
#include <time.h>
#include <err.h>
#include <sysexits.h>

int main(int argc, char *argv[])
{
        struct timespec pre, cur;
        useconds_t d, dmin, dmax, dsum;
        int dweird = 0;
        int i, loop;

        if (argc != 2) {
                fprintf(stderr, "timecheck <loop>\n");
                return EXIT_FAILURE;
        }

        loop = atoi(argv[1]);
        dmax = dsum = 0;
        dmin = UINT_MAX;

        fprintf(stdout, "looping %d times...\n", loop);

        for (i = 0; i < loop; i++) {

                if (clock_gettime(CLOCK_MONOTONIC, &pre) < 0)
                        err(EX_OSERR, NULL);

                if (clock_gettime(CLOCK_MONOTONIC, &cur) < 0)
                        err(EX_OSERR, NULL);

                d = (cur.tv_sec - pre.tv_sec) * 1000000;
                d = d + (cur.tv_nsec / 1000) - (pre.tv_nsec / 1000);

                dsum += d;
                if (d < dmin)
                        dmin = d;
                if (d > dmax)
                        dmax = d;
                if (d > 10*1000) /* 10 ms */
                        dweird++;

                cur = pre;
        }

        fprintf(stdout, "dmin  = %dus\ndmax  = %dus\ndmean = %dus\ndweird = 
%d\n",
                        dmin, dmax, dsum / loop, dweird);
        return 0;
}


Reply via email to