Having thought about this some more, I think your suggestion to add
rtapi_clocks_to_ns (and possibly rtapi_ns_to_clocks) makes sense.
Encouraging the use of delta times mitigates the effect of any rollovers
inherent in the ns<->clock conversions.
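A minimal sketch of the delta-time idea, assuming a free-running 64-bit counter; `clocks_elapsed` is a hypothetical helper, not an existing rtapi function. The difference is taken in raw clocks first and only the (small) delta would then be passed to a conversion such as the proposed rtapi_clocks_to_ns. Because unsigned subtraction in C is defined modulo 2^64, the delta stays correct even if the counter wrapped between the two reads, provided less than one full counter period elapsed.

```c
#include <stdint.h>

/* Hypothetical helper: elapsed clocks between two raw counter reads.
 * Unsigned subtraction wraps modulo 2^64, so a counter rollover between
 * 'then' and 'now' still yields the right (small) delta. */
static inline uint64_t clocks_elapsed(uint64_t then, uint64_t now)
{
    return now - then;
}
```

For example, a read of 2^64 - 10 followed by a read of 5 gives a delta of 15 clocks, not a huge negative jump.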
Computing nanosecond time from the TSC suffers a discontinuity at least when
rdtsc() wraps, but I now think the rtai implementation may have a
discontinuity much more frequently: every time (int)rdtsc() wraps.
The comment on llimd says that it
/* Returns (long long)ll = (int)ll*(int)(mult)/(int)div. */
so the discontinuity actually happens whenever the TSC crosses a 2^31
(2^32?) boundary, not only when the full 64-bit value wraps back to 0.
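To make the effect concrete, here is a small model of the arithmetic that the llimd comment describes, in which only the low 32 bits of the input take part in the multiply. `llimd_model` is a hypothetical illustration, not the RTAI source; it truncates to an unsigned 32-bit value so the behaviour is well defined in C, which places the jump at 2^32 (a signed `(int)` cast would instead jump at 2^31, the other case raised above). The 1/2 scale corresponds to an assumed 2 GHz CPU.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of "(int)ll * mult / div": the upper 32 bits of ll
 * are discarded before multiplying. Illustration only, NOT RTAI's code. */
static uint64_t llimd_model(uint64_t ll, uint32_t mult, uint32_t div)
{
    assert(div != 0);
    uint32_t low = (uint32_t)ll;          /* upper 32 bits are dropped here */
    return (uint64_t)low * mult / div;    /* (2^32-1)^2 still fits in 64 bits */
}
```

With mult = 1 and div = 2, stepping the counter from 0xffffffff to 0x100000000 makes the modelled time fall from 2147483647 ns back to 0, so time appears to run backwards about every 2.1 s on a 2 GHz CPU.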
A better approach would be a routine that takes u64 a, u32 b, and u8 s and
calculates the lower 64 bits of the arbitrary-precision expression
(a * b) >> s.
gcc is able to generate efficient code for this on x86 (two integer multiplies,
about 21 cycles per invocation in a tight loop on a Core 2 CPU). This
algorithm should have a discontinuity only at full TSC rollover, not at
32-bit rollovers. It's also roughly 10 times faster than the current
rtai implementation.
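One way to pin down exactly what "(a * b) >> s, keeping the full 96-bit product" means is a reference version built on a 128-bit intermediate. This is a sketch assuming gcc or clang on a 64-bit target, where `unsigned __int128` is available as a compiler extension (it is not available on 32-bit x86, so it is not a replacement for the proposed routine); `ullms_ref` is a hypothetical name, useful mainly as an oracle for testing a faster implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference: lower 64 bits of (a * b) >> s with no truncation
 * of the intermediate product. Requires unsigned __int128 (gcc/clang,
 * 64-bit targets only). */
static uint64_t ullms_ref(uint64_t a, uint32_t b, uint8_t s)
{
    assert(s <= 32);                                       /* same contract */
    unsigned __int128 product = (unsigned __int128)a * b;  /* at most 96 bits */
    return (uint64_t)(product >> s);
}
```

With a = 2^64 - 1, b = 2^32 - 1 and s = 32, a plain 64-bit a * b would overflow, while the reference returns 0xfffffffeffffffff, the correctly shifted 96-bit product.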
The code:
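//----------------------------------------------------------------------
#include <math.h>    /* round(), for a userspace test build */
#include <stdint.h>  /* uint32_t, uint64_t, UINT32_C */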
static inline uint64_t mul_32x32_64(uint32_t a, uint32_t b)
__attribute__((always_inline));
static inline uint64_t mul_32x32_64(uint32_t a, uint32_t b) {
/* gcc is able to do this with a single 32x32 -> 64 multiply on x86 */
return ((uint64_t)a) * b;
}
/**
* Compute the lower 64 bits of '(a * b) >> s', s<=32
* the temporary (a*b) is 96 bits, not truncated to 64 bits
*/
static inline uint64_t ullms(uint64_t a, uint32_t b, uint8_t s)
{
uint32_t hi = (a>>32), lo = a & UINT32_C(0xffffffff);
uint64_t mul_hi = mul_32x32_64(hi, b), mul_lo = mul_32x32_64(lo, b);
return (mul_hi << (32-s)) + (mul_lo >> s);
}
/**
* b = get_scale_factor(num, denom, &s):
* Compute 'b' and 's' so that ullms(a,b,s) is approximately (a * num / denom)
*
* When using the same num and denom repeatedly, this is much more
* efficient than the implementation that actually performs the
* division. (In 2011 on x86, a single integer division is still about
* 10x the time of a single multiplication)
*
* However, get_scale_factor itself is not particularly efficient (this
* implementation uses fp arithmetic), so it should only be used to
* compute b and s for "constant" num / denom pairs
*/
uint32_t get_scale_factor(uint32_t num, uint32_t denom, uint8_t *scale) {
double d = (double) num / denom;
uint8_t s = 0;
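/* cap s at 32: ullms() requires s <= 32, and on CPUs faster than 2 GHz
 * 1000000/cpu_khz < 0.5, which would otherwise drive s to 33 */
while(d < 2147483647 && s < 32) { d *= 2; s++; }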
*scale = s;
return (uint32_t)(round(d));
}
//----------------------------------------------------------------------
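Since this code is meant for rtapi.ko and the Linux kernel generally avoids floating point (on x86 it must be bracketed with kernel_fpu_begin()/kernel_fpu_end()), an integer-only way to compute the same kind of factor may be handy. The following is a sketch of that swapped-in technique, not part of the proposal above; `get_scale_factor_int` is a hypothetical name. It keeps s <= 32 to match the ullms() contract, and because num < 2^32, `num << 32` still fits in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical integer-only variant: choose s and b so that b / 2^s
 * approximates num / denom, with s capped at 32 for ullms(). */
static uint32_t get_scale_factor_int(uint32_t num, uint32_t denom, uint8_t *scale)
{
    uint8_t s = 0;
    assert(denom != 0);
    /* Double the scale until num * 2^s / denom reaches 2^31 (about 31
     * significant bits in b) or until s hits the cap of 32. */
    while (s < 32 && ((uint64_t)num << s) / denom < UINT64_C(0x80000000))
        s++;
    /* Round to nearest; no overflow since (2^32-1)*2^32 + denom/2 < 2^64. */
    uint64_t b = (((uint64_t)num << s) + denom / 2) / denom;
    if (b > UINT32_MAX)       /* rounding can just reach 2^32; clamp it */
        b = UINT32_MAX;
    *scale = s;
    return (uint32_t)b;
}
```

For an assumed 3 GHz CPU (cpu_khz = 3000000), get_scale_factor_int(1000000, 3000000, &s) gives s = 32 and b = 1431655765, and get_scale_factor_int(3000000, 1000000, &s) gives s = 30 and b = 3221225472.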
Then the rtapi code would look like this:
//----------------------------------------------------------------------
// globals to rtapi.ko
uint32_t tsc2ns_factor, ns2tsc_factor;
uint8_t tsc2ns_shift, ns2tsc_shift;
// somewhere in setup code {
tsc2ns_factor = get_scale_factor(1000000, cpu_khz, &tsc2ns_shift);
ns2tsc_factor = get_scale_factor(cpu_khz, 1000000, &ns2tsc_shift);
// }
uint64_t rtapi_clocks_to_ns(uint64_t clocks) {
return ullms(clocks, tsc2ns_factor, tsc2ns_shift);
}
uint64_t rtapi_ns_to_clocks(uint64_t ns) {
return ullms(ns, ns2tsc_factor, ns2tsc_shift);
}
//----------------------------------------------------------------------
Jeff
_______________________________________________
Emc-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/emc-developers