> > +#define SEC_IN_NS 1000000000ULL
> > +
> > +/* Measure TSC frequency. Frequency information registers are defined for x86,
> > + * but those are often not enumerated. */
> > +uint64_t cpu_global_time_freq(void)
> > +{
> > +       struct timespec sleep, ts1, ts2;
> > +       uint64_t t1, t2, ts_nsec, cycles, hz;
> > +       int i;
> > +       uint64_t avg = 0;
> > +       int rounds = 4;
> > +
> > +       for (i = 0; i < rounds; i++) {
> > +               sleep.tv_sec  = 0;
> > +               sleep.tv_nsec = SEC_IN_NS / 10;
> > +
> > +               if (clock_gettime(CLOCK_MONOTONIC_RAW, &ts1)) {
> > +                       ODP_DBG("clock_gettime failed\n");
> > +                       return 0;
> > +               }
> > +
> > +               t1 = cpu_global_time();
> > +
> > +               if (nanosleep(&sleep, NULL) < 0) {
> > +                       ODP_DBG("nanosleep failed\n");
> > +                       return 0;
> > +               }
> > +
> > +               if (clock_gettime(CLOCK_MONOTONIC_RAW, &ts2)) {
> > +                       ODP_DBG("clock_gettime failed\n");
> > +                       return 0;
> > +               }
> > +
> > +               t2 = cpu_global_time();
> > +
> > +               ts_nsec  = (ts2.tv_sec - ts1.tv_sec) * SEC_IN_NS;
> > +               ts_nsec += ts2.tv_nsec - ts1.tv_nsec;
> > +
> > +               cycles = t2 - t1;
> > +
> > +               hz = (cycles * SEC_IN_NS) / ts_nsec;
> > +               avg += hz;
> > +       }
> > +
> > +       return avg / rounds;
> > +}
> 
> This function is not very accurate. Ideally, ts1 and t1 (ts2 and t2)
> should be read at the same instant for this to be accurate. There is
> also the possibility that the 'rdtsc' in cpu_global_time might get
> executed earlier or later than its position in the code suggests.
> 
> Since this is called during init, 'rounds' can be increased to a
> higher value to get a better average. The initial rounds can be
> discarded to exclude cache-warming latencies.
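The reviewer's suggestion of running more rounds and discarding the first ones could look roughly like the sketch below. It keeps the per-round arithmetic of the patch. `measure_round()`, `estimate_hz()`, the `warmup` count and the configurable `interval_ns` are hypothetical names and parameters chosen for illustration. `read_counter()` is an assumed stand-in for cpu_global_time(): RDTSC on x86-64, and a nanosecond clock elsewhere so the sketch builds.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <time.h>

#ifndef CLOCK_MONOTONIC_RAW
#define CLOCK_MONOTONIC_RAW CLOCK_MONOTONIC /* non-Linux fallback */
#endif

#define SEC_IN_NS 1000000000ULL

#if defined(__x86_64__)
#include <x86intrin.h>
/* Assumed stand-in for cpu_global_time(). */
static uint64_t read_counter(void)
{
	return __rdtsc();
}
#else
/* Non-x86 stand-in so the sketch builds: a nanosecond clock. */
static uint64_t read_counter(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * SEC_IN_NS + (uint64_t)ts.tv_nsec;
}
#endif

/* One round as in the patch: counter ticks per second over a sleep of
 * interval_ns. With a sub-second interval, (t2 - t1) * 1e9 fits in
 * 64 bits for counters below ~18 GHz. Returns 0 on failure. */
static uint64_t measure_round(uint64_t interval_ns)
{
	struct timespec req = { .tv_sec = 0, .tv_nsec = (long)interval_ns };
	struct timespec ts1, ts2;
	uint64_t t1, t2;
	int64_t ns;

	if (clock_gettime(CLOCK_MONOTONIC_RAW, &ts1))
		return 0;
	t1 = read_counter();
	if (nanosleep(&req, NULL) < 0)
		return 0;
	if (clock_gettime(CLOCK_MONOTONIC_RAW, &ts2))
		return 0;
	t2 = read_counter();

	ns = (int64_t)(ts2.tv_sec - ts1.tv_sec) * (int64_t)SEC_IN_NS +
	     (ts2.tv_nsec - ts1.tv_nsec);
	if (ns <= 0)
		return 0;
	return (t2 - t1) * SEC_IN_NS / (uint64_t)ns;
}

/* Run warmup + rounds measurements, drop the first warmup results
 * (cold caches, page faults) and average the rest. Returns 0 on failure. */
static uint64_t estimate_hz(int rounds, int warmup, uint64_t interval_ns)
{
	uint64_t sum = 0;
	int i;

	if (rounds < 1 || warmup < 0 || interval_ns >= SEC_IN_NS)
		return 0;

	for (i = 0; i < warmup + rounds; i++) {
		uint64_t hz = measure_round(interval_ns);

		if (hz == 0)
			return 0;
		if (i >= warmup)
			sum += hz;
	}
	return sum / (uint64_t)rounds;
}
```

Each round still costs one full sleep. Discarding warm-up rounds adds their sleep time to the total startup cost.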
> 
> We should fall back to this only if the frequency information
> registers are not available.
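For the "registers first, measurement as fallback" order, the sketch below assumes GCC or Clang on x86, using the `__get_cpuid()` helper from <cpuid.h>. CPUID leaf 0x15 reports the TSC/crystal-clock ratio as EBX/EAX and the crystal frequency in Hz in ECX. A zero in any of these means the value is not enumerated. `tsc_freq_from_cpuid()` is a hypothetical name. The sketch reads only leaf 0x15, and when that leaf is missing or incomplete it returns 0.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* Hypothetical helper: TSC frequency in Hz from CPUID leaf 0x15, or 0 if
 * the CPU does not enumerate it. */
static uint64_t tsc_freq_from_cpuid(void)
{
	unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

	/* __get_cpuid() returns 0 if leaf 0x15 exceeds the maximum leaf. */
	if (!__get_cpuid(0x15, &eax, &ebx, &ecx, &edx))
		return 0;

	/* EAX = ratio denominator, EBX = ratio numerator,
	 * ECX = core crystal clock in Hz; zero means "not enumerated". */
	if (eax == 0 || ebx == 0 || ecx == 0)
		return 0;

	return (uint64_t)ecx * ebx / eax;
}
#else
/* Not x86: nothing to read, always fall back to measuring. */
static uint64_t tsc_freq_from_cpuid(void)
{
	return 0;
}
#endif
```

A caller would then run cpu_global_time_freq() only when tsc_freq_from_cpuid() returns 0.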
> 
> There is a good white paper on methods for measuring cycles for
> code that takes a small number of cycles:
> http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/ia-32-ia-64-benchmark-code-execution-paper.pdf
> 

Execution order or latency does not affect accuracy here, since we measure over 100 ms
(about 200 M CPU cycles). E.g. a time stamp error of 1000 cycles would result in
an error in the 6th digit of the result, which is not very significant. DPDK
uses a similar algorithm to measure the frequency. The measurement now takes 0.4
seconds in total (4 rounds of 100 ms). I would not want to spend several seconds
on every application startup for this.
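The small sketch below spells out the arithmetic behind the 6th-digit claim. It assumes a 2 GHz counter, which is what 200 M cycles in 100 ms implies. The function names are illustrative only, and calibration_seconds() counts sleep time only, ignoring call overhead.

```c
/* Counter cycles elapsed in one calibration round. */
static double round_cycles(double hz, double interval_s)
{
	return hz * interval_s;
}

/* Relative frequency error caused by misplacing one timestamp of the
 * pair by err_cycles: the error is spread over the whole round. */
static double relative_error(double err_cycles, double hz, double interval_s)
{
	return err_cycles / round_cycles(hz, interval_s);
}

/* Sleep time spent on calibration: rounds back-to-back intervals. */
static double calibration_seconds(int rounds, double interval_s)
{
	return rounds * interval_s;
}
```

With hz = 2e9 and interval_s = 0.1, round_cycles() gives 2e8, and relative_error(1000, 2e9, 0.1) gives 5e-6. That is about 10 kHz on 2 GHz, i.e. 2.00001 GHz instead of 2.00000 GHz. calibration_seconds(4, 0.1) gives the 0.4 s of sleep per startup.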

E.g. on my Haswell those registers are not populated, and DPDK does not refer
to them anywhere. So don't expect them to be populated often.

I think this should be accurate enough as is. It can be optimized later if 
needed.

-Petri
