On 24 April 2017 at 03:07, Savolainen, Petri (Nokia - FI/Espoo)
<petri.savolai...@nokia-bell-labs.com> wrote:
>> > diff --git a/platform/linux-generic/arch/x86/odp_cpu_arch.c
>> b/platform/linux-generic/arch/x86/odp_cpu_arch.c
>> > index c8cf27b6..9ba601a3 100644
>> > --- a/platform/linux-generic/arch/x86/odp_cpu_arch.c
>> > +++ b/platform/linux-generic/arch/x86/odp_cpu_arch.c
>> > @@ -3,7 +3,14 @@
>> >   *
>> >   * SPDX-License-Identifier:     BSD-3-Clause
>> >   */
>> > +
>> > +#include <odp_posix_extensions.h>
>> > +
>> >  #include <odp/api/cpu.h>
>> > +#include <odp_time_internal.h>
>> > +#include <odp_debug_internal.h>
>> > +
>> > +#include <time.h>
>> >
>> >  uint64_t odp_cpu_cycles(void)
>> >  {
>> > @@ -31,3 +38,55 @@ uint64_t odp_cpu_cycles_resolution(void)
>> >  {
>> >     return 1;
>> >  }
>> > +
>> > +uint64_t cpu_global_time(void)
>> > +{
>> > +   return odp_cpu_cycles();
>>
>> A cycle counter cannot always be used to measure time. Even on x86,
>> odp_cpu_cycles() will return the value of RDTSC which is not actually
>> representative of the cycle count. Even if the x86 processor is set
>> to a fixed frequency, the Invariant TSC may run at a different fixed
>> frequency. Please take a look at the odp_tick_t proposal here:
>>
>> https://docs.google.com/document/d/1sY7rOxqCNu-bMqjBiT5_keAIohrX1ZW-
>> eL0oGLAQ4OM/edit?usp=sharing
>>
>
> From coverletter:
> "This patch set modifies time implementation to use TSC when running on a x86
> CPU that has invarint TSC CPU flag set. Otherwise, the same Linux system time
> is used as before. TSC is much more efficient both in performance and
> latency/jitter wise than Linux system call. This can be seen also with
> scheduler latency test which time stamps events with this API. All latency
> measurements (min, ave, max) improved significantly."

odp_sched_latency currently uses clock_gettime. It is my understanding
that clock_gettime does not have the over head of the system call. Can
you elaborate more on the 'improved significantly' part?

>
> This function (cpu_global_time()) is called only when we have first checked 
> that TSC is invariant. Also we measure the TSC frequency in that case. This 
> function is defined in the same file as cpu_cycles(), and the file is x86 
> specific. So, we know what we are doing, and just re-using the code to read 
> TSC.
>
>
>
>> > +}
>> > +
>> > +#define SEC_IN_NS 1000000000ULL
>> > +
>> > +/* Measure TSC frequency. Frequency information registers are defined
>> for x86,
>> > + * but those are often not enumerated. */
>> > +uint64_t cpu_global_time_freq(void)
>> > +{
>>
>> The 0x40000010 leaf is standardized in Hyper-V as returning the TSC's
>> frequency in kHz. It would be better to use this when running under a
>> hypervisor. It appears to work under VMware as well.
>
>
> Sure, that can be done later, if we want to add any code into odp-linux that 
> would behave differently under VM vs. host. Today, we don't have any.
>
> Also there's leaf 0x15, which should give you TSC frequency but apparently is 
> not enumerated often. So, I ended up to do the same as DPDK - just measure it.
>
>
>> > diff --git a/platform/linux-generic/include/odp/api/plat/time_types.h
>> b/platform/linux-generic/include/odp/api/plat/time_types.h
>> > index 4847f3b1..8ae7ce82 100644
>> > --- a/platform/linux-generic/include/odp/api/plat/time_types.h
>> > +++ b/platform/linux-generic/include/odp/api/plat/time_types.h
>> > @@ -26,11 +26,28 @@ extern "C" {
>> >   * the linux timespec structure, which is dependent on POSIX extension
>> level.
>> >   */
>> >  typedef struct odp_time_t {
>> > -   int64_t tv_sec;      /**< @internal Seconds */
>> > -   int64_t tv_nsec;     /**< @internal Nanoseconds */
>> > +   union {
>> > +           /** @internal Posix timespec */
>> > +           struct {
>> > +                   /** @internal Seconds */
>> > +                   int64_t tv_sec;
>> > +
>> > +                   /** @internal Nanoseconds */
>> > +                   int64_t tv_nsec;
>> > +           } spec;
>> > +
>> > +           /** @internal HW time counter */
>> > +           struct {
>> > +                   /** @internal Counter value */
>> > +                   uint64_t count;
>> > +
>> > +                   /** @internal Reserved */
>> > +                   uint64_t res;
>> > +           } hw;
>> > +   };
>>
>> A processor's tick counter cannot be used to measure calendar time by
>> itself. The delta between two ticks, however, can be converted to
>> calendar time.
>>
>> Please see the proposal that introduces odp_tick_t which eliminates
>> the wasted u64 reserved field in the union above.
>
>
> Nothing is wasted here today. We need to support Posix timespec anyway. 
> Timespec is used when there's no x86 invariant TSC available - so all non-x86 
> and even on some x86 systems.
>
> Also "count" here starts from zero (we save cpu counter value at start). So, 
> it takes 580 years to wrap at 1GHz. Why 64 bit counter value (that starts 
> from zero) is not good for calendar time ?
>
>
>
>
>> > +/*
>> > + * Posix timespec based functions
>> > + */
>> >
>> > -static inline odp_time_t time_diff(odp_time_t t2, odp_time_t t1)
>> > +static inline odp_time_t time_spec_diff(odp_time_t t2, odp_time_t t1)
>>
>> Static functions in .c files will always be considered for inlining, so
>> the
>> inline qualifier is unnecessary here. You can use always_inline compiler
>> attribute if the disassembly and run time performance is not as expected.
>>
>> Comment applies to nearly every function in this file.
>
>
> This is because all our code (in this and other files) does "static inline" 
> for functions that we want to be inlined. Static is used for functions that 
> are static, but we don't care if those are inlined (slow path). Most time API 
> functions are fast path.
>
>
>> >
>> > -static inline uint64_t time_local_res(void)
>> > +static inline uint64_t time_hw_res(void)
>>
>> I think 'freq' is a more descriptive name than 'res'.
>>
>> It would also be good to document at either the function declaration
>> or the function definition (if multiple implementations differ) whether
>> the frequency is in Hz or kHz. For example, on ARMv8 it is just a
>> register read to get Hz. No estimation, and no lowering to kHz.
>
>
> This implements resolution API functions ...
>
> /**
>  * Global time resolution in hertz
>  *
>  * @return      Global time resolution in hertz
>  */
> uint64_t odp_time_global_res(void);
>
>
> ... so, from that context you know that it's Hz. In this implementation, I 
> lower the measured Hz to be conservative.
>
>
>> > +
>> > +static inline odp_time_t time_hw_sum(odp_time_t t1, odp_time_t t2)
>> > +{
>> > +   odp_time_t time;
>> > +
>> > +   time.hw.res = 0;
>> > +   time.hw.count = t1.hw.count + t2.hw.count;
>> > +
>> > +   return time;
>> > +}
>>
>> If a single u64 were used this code wouldn't need to exist.
>
> Which code? hw.res = 0 ? It not actually used for anything and could be 
> removed. Although, the performance gain would not be huge. Anyway, I'll 
> remove it for v2.
>
> -Petri
>

Reply via email to