On 24 April 2017 at 03:07, Savolainen, Petri (Nokia - FI/Espoo) <petri.savolai...@nokia-bell-labs.com> wrote: >> > diff --git a/platform/linux-generic/arch/x86/odp_cpu_arch.c >> b/platform/linux-generic/arch/x86/odp_cpu_arch.c >> > index c8cf27b6..9ba601a3 100644 >> > --- a/platform/linux-generic/arch/x86/odp_cpu_arch.c >> > +++ b/platform/linux-generic/arch/x86/odp_cpu_arch.c >> > @@ -3,7 +3,14 @@ >> > * >> > * SPDX-License-Identifier: BSD-3-Clause >> > */ >> > + >> > +#include <odp_posix_extensions.h> >> > + >> > #include <odp/api/cpu.h> >> > +#include <odp_time_internal.h> >> > +#include <odp_debug_internal.h> >> > + >> > +#include <time.h> >> > >> > uint64_t odp_cpu_cycles(void) >> > { >> > @@ -31,3 +38,55 @@ uint64_t odp_cpu_cycles_resolution(void) >> > { >> > return 1; >> > } >> > + >> > +uint64_t cpu_global_time(void) >> > +{ >> > + return odp_cpu_cycles(); >> >> A cycle counter cannot always be used to measure time. Even on x86, >> odp_cpu_cycles() will return the value of RDTSC which is not actually >> representative of the cycle count. Even if the x86 processor is set >> to a fixed frequency, the Invariant TSC may run at a different fixed >> frequency. Please take a look at the odp_tick_t proposal here: >> >> https://docs.google.com/document/d/1sY7rOxqCNu-bMqjBiT5_keAIohrX1ZW- >> eL0oGLAQ4OM/edit?usp=sharing >> > > From coverletter: > "This patch set modifies time implementation to use TSC when running on a x86 > CPU that has invarint TSC CPU flag set. Otherwise, the same Linux system time > is used as before. TSC is much more efficient both in performance and > latency/jitter wise than Linux system call. This can be seen also with > scheduler latency test which time stamps events with this API. All latency > measurements (min, ave, max) improved significantly."
odp_sched_latency currently uses clock_gettime. It is my understanding that clock_gettime does not have the over head of the system call. Can you elaborate more on the 'improved significantly' part? > > This function (cpu_global_time()) is called only when we have first checked > that TSC is invariant. Also we measure the TSC frequency in that case. This > function is defined in the same file as cpu_cycles(), and the file is x86 > specific. So, we know what we are doing, and just re-using the code to read > TSC. > > > >> > +} >> > + >> > +#define SEC_IN_NS 1000000000ULL >> > + >> > +/* Measure TSC frequency. Frequency information registers are defined >> for x86, >> > + * but those are often not enumerated. */ >> > +uint64_t cpu_global_time_freq(void) >> > +{ >> >> The 0x40000010 leaf is standardized in Hyper-V as returning the TSC's >> frequency in kHz. It would be better to use this when running under a >> hypervisor. It appears to work under VMware as well. > > > Sure, that can be done later, if we want to add any code into odp-linux that > would behave differently under VM vs. host. Today, we don't have any. > > Also there's leaf 0x15, which should give you TSC frequency but apparently is > not enumerated often. So, I ended up to do the same as DPDK - just measure it. > > >> > diff --git a/platform/linux-generic/include/odp/api/plat/time_types.h >> b/platform/linux-generic/include/odp/api/plat/time_types.h >> > index 4847f3b1..8ae7ce82 100644 >> > --- a/platform/linux-generic/include/odp/api/plat/time_types.h >> > +++ b/platform/linux-generic/include/odp/api/plat/time_types.h >> > @@ -26,11 +26,28 @@ extern "C" { >> > * the linux timespec structure, which is dependent on POSIX extension >> level. >> > */ >> > typedef struct odp_time_t { >> > - int64_t tv_sec; /**< @internal Seconds */ >> > - int64_t tv_nsec; /**< @internal Nanoseconds */ >> > + union { >> > + /** @internal Posix timespec */ >> > + struct { >> > + /** @internal Seconds */ >> > + int64_t tv_sec; >> > + >> > + /** @internal Nanoseconds */ >> > + int64_t tv_nsec; >> > + } spec; >> > + >> > + /** @internal HW time counter */ >> > + struct { >> > + /** @internal Counter value */ >> > + uint64_t count; >> > + >> > + /** @internal Reserved */ >> > + uint64_t res; >> > + } hw; >> > + }; >> >> A processor's tick counter cannot be used to measure calendar time by >> itself. The delta between two ticks, however, can be converted to >> calendar time. >> >> Please see the proposal that introduces odp_tick_t which eliminates >> the wasted u64 reserved field in the union above. > > > Nothing is wasted here today. We need to support Posix timespec anyway. > Timespec is used when there's no x86 invariant TSC available - so all non-x86 > and even on some x86 systems. > > Also "count" here starts from zero (we save cpu counter value at start). So, > it takes 580 years to wrap at 1GHz. Why 64 bit counter value (that starts > from zero) is not good for calendar time ? > > > > >> > +/* >> > + * Posix timespec based functions >> > + */ >> > >> > -static inline odp_time_t time_diff(odp_time_t t2, odp_time_t t1) >> > +static inline odp_time_t time_spec_diff(odp_time_t t2, odp_time_t t1) >> >> Static functions in .c files will always be considered for inlining, so >> the >> inline qualifier is unnecessary here. You can use always_inline compiler >> attribute if the disassembly and run time performance is not as expected. >> >> Comment applies to nearly every function in this file. > > > This is because all our code (in this and other files) does "static inline" > for functions that we want to be inlined. Static is used for functions that > are static, but we don't care if those are inlined (slow path). Most time API > functions are fast path. > > >> > >> > -static inline uint64_t time_local_res(void) >> > +static inline uint64_t time_hw_res(void) >> >> I think 'freq' is a more descriptive name than 'res'. >> >> It would also be good to document at either the function declaration >> or the function definition (if multiple implementations differ) whether >> the frequency is in Hz or kHz. For example, on ARMv8 it is just a >> register read to get Hz. No estimation, and no lowering to kHz. > > > This implements resolution API functions ... > > /** > * Global time resolution in hertz > * > * @return Global time resolution in hertz > */ > uint64_t odp_time_global_res(void); > > > ... so, from that context you know that it's Hz. In this implementation, I > lower the measured Hz to be conservative. > > >> > + >> > +static inline odp_time_t time_hw_sum(odp_time_t t1, odp_time_t t2) >> > +{ >> > + odp_time_t time; >> > + >> > + time.hw.res = 0; >> > + time.hw.count = t1.hw.count + t2.hw.count; >> > + >> > + return time; >> > +} >> >> If a single u64 were used this code wouldn't need to exist. > > Which code? hw.res = 0 ? It not actually used for anything and could be > removed. Although, the performance gain would not be huge. Anyway, I'll > remove it for v2. > > -Petri >