Re: [lng-odp] RFC: New time API

Savolainen, Petri (Nokia - FI/Espoo) Thu, 03 Sep 2015 04:34:39 -0700


> -----Original Message-----
> From: ext Ivan Khoronzhuk [mailto:ivan.khoronz...@linaro.org]
> Sent: Thursday, September 03, 2015 1:29 AM
> To: Savolainen, Petri (Nokia - FI/Espoo); lng-odp@lists.linaro.org
> Subject: Re: [lng-odp] RFC: New time API
> 
> Hi, Petri
> 
> We have to look at it proceeding from performance, platform portability
> and simplicity
> 
> If you want to split on hi-res time and low-res they must have separate
> functions and
> be not under one common opaque time in order to not break hi-res
> measurements.
> 
> But in fact you split the same quality timers, farther below...

The API has two goals (as does any other API under ODP):
- solve a user problem (take timestamps and work with those)
- enable good performance on multiple HW platforms (enable direct HW time 
counter(s) usage)

> 
> On 02.09.15 18:21, Savolainen, Petri (Nokia - FI/Espoo) wrote:
> > Hi,
> >
> >
> > I think we need to restart the time API discussion and specify it
> only wall time in mind.
> 
> Let's suppose.
> 
> > CPU cycle count APIs can be tuned as a next step. CPU cycle counters
> are affected by frequency scaling, which makes those difficult to use
> for counting linear, real time.
> >  The time API should specify an easy way to check and use the real,
> wall clock time.
> > We need at least one time source that will not wrap in years - here
> it's the "global" time
> > (e.g. in POSIX it's CLOCK_MONOTONIC).
> 
> Don't mix it with this API.
> CLOCK_MONOTONIC is guaranteed by OS, that can handle wraps, OS can use
> interrupts for that, you cannot.

A monotonic time source with a very long wrap-around time is an application 
requirement. CLOCK_MONOTONIC is an example of solving the same requirement in 
the POSIX world.

Yes, an ODP implementation should not be interrupt driven, but an 
implementation can and likely will still serve some interrupts: in the worst 
case on a worker core, in a better case on a control core and in the best case 
on a system core outside of the ODP application (e.g. the Linux kernel on core 
#0). The key is how often those interrupts need to be served and how long 
serving them takes in the worst case. E.g. one interrupt due to a counter wrap 
in 5 years, on the core running the Linux kernel, does not matter much.

Counter wraps are really an issue with short time counters. Today most chips 
provide large enough counters, so that wrap around is not really an issue (max 
one wrap in several years).

> 
> Also it must be zero at platform init and begin count time when
> application starts. For each application that starts.
> Can you guarantee that it's inited to zero for each platform? I
> hesitate to answer on this question.

The API does not specify that the HW counter is reset at any point in time. It 
specifies that wall time (nsec time) is zero at application start-up. In 
practice, the implementation needs to read the HW counter once at start-up and 
store it. Basic stuff.
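
A minimal sketch of that start-up step, assuming a free-running 64-bit HW 
counter. The counter stub and all function names here are hypothetical and not 
part of the proposal; the conversion assumes a counter rate below roughly 
18 GHz so that the intermediate multiplication cannot overflow.

```c
#include <stdint.h>

/* Hypothetical stand-in for a free-running HW counter (never reset). */
static uint64_t fake_hw_counter;

static uint64_t hw_counter_read(void)
{
    return fake_hw_counter;
}

static uint64_t start_count; /* counter value captured once at start-up */
static uint64_t counter_hz;  /* counter rate in Hz */

/* Called once at application start-up: only read access to HW is needed. */
static void sketch_time_init(uint64_t hz)
{
    counter_hz = hz;
    start_count = hw_counter_read();
}

/* Wall time in nsec since start-up. */
static uint64_t sketch_time_ns(void)
{
    uint64_t ticks = hw_counter_read() - start_count;
    uint64_t sec = ticks / counter_hz;
    uint64_t rem = ticks % counter_hz;

    /* Split into whole seconds and remainder to limit overflow risk. */
    return sec * 1000000000ULL + (rem * 1000000000ULL) / counter_hz;
}
```

The HW counter itself is never written, so the same counter can keep serving 
the OS or other applications.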

> 
> But let's suppose that we can guarantee it.
> In this case time should be aligned for all executed applications to
> start from zero.
> Let's suppose some start_time = odp_time() at application init.
> As it was noted earlier, in some fast loop, init_count must be
> extracted in diff function.
> I'm not talking event about checking of time type, as you are going to
> put all of them under one type.
> 
>   Global time can be also compared between threads. Local time is
> defined for optimizing short interval time checks.
> 
> In fact global has same quality as local. As noted earlier, global can
> be emulated with local.
> Why we need to split them in this case? It'll add load on user only.
> 
> It's thread local, may wrap sooner than global time, but may be lower
> overhead to use.
> 
> They are both 64-bit. You are going to use the same function for both,
> overhead the same.

First, this API, like any other ODP API, affects only one ODP application 
(instance). Global time is global between the threads of a single application. 

These are two different application use cases:
- global == time that can be shared between threads
- local  == time that does not need to be shared between threads

The ability to share is the key difference in quality. Yes, when there's a SoC 
level, low latency, high frequency, 64-bit time counter, it's sensible to use 
that to implement both local and global. In this case, the implementation also 
avoids the check between global and local time (it's all the same). I'm 
expecting that this is the common case.

BUT, what if the SoC level HW counter has a high access latency (e.g. 150 CPU 
cycles) and a low frequency? And what if the HW had a low latency, high 
frequency per CPU counter that you could use for counting local time? If the 
API has only the global definition, you could not use that HW resource even 
when the application is not interested in sharing timestamps with other threads 
(needs only local time). The application would run slower on your HW, since 
every (local) timestamp would consume 150 cycles instead of e.g. 1 cycle.

> 
> >
> > There could be actually four time bases defined in the API, if we'd
> want to optimize for each use case (global.hi_res, global.low_res,
> local.hi_res and local.low_res).
> 
> > I'd propose to have only two and give system specific way to
> configure those (rate and duration).
> 
> What do you mean "giv system way...", do you mean add some API?
> If so, I disagree. It's not ODP responsibility and it's not every
> platform applicable.

Time counters are likely also used by the OS, etc. So, the vendor would need to 
document e.g. if and how the ODP global time resolution can be tuned. It may be 
e.g. through a Linux boot parameter, because ODP reads the same counter that 
Linux uses for its wall clock time. It's "system specific" how the HW resource 
(e.g. time counter rate) can be configured, and it's unlikely that an ODP 
application could change the setting. So we don't need an API to set the rate, 
only an API to get the rate.

> 
> >  Typical config would be global.low_res and local.hi_res. User can
> check hz and max time value (wrap around time) with odp_time_info() and
> adapt (fall back to use global time) if e.g. local time wraps too
> quickly (e.g. in 4 sec).
> 
> If this time wrap every 4s, it shouldn't be used at all...(any 32-bits)

Agree. Frequent wraps are the main problem, but can we rule today that any HW 
interesting to ODP must have a HW time counter large enough to wrap no more 
than once per X years? We cannot dictate the implementation (e.g. a 32, 48 or 
64 bit counter), only the API spec. The API can require that there's at least 
one time source that won't wrap often (global). It's then up to the 
implementation how that's guaranteed (natively due to a large enough counter, 
co-operating with the OS, using per core interrupts, ...).

> 
> >
> > See the proposal under.
> >
> > -Petri
> >
> >
> >
> >
> > //
> > // Use cases
> > //
> > //             high resolution                 low resolution
> > //             short interval                  long interval
> > //             low overhead                    high overhead
> > //
> > //
> > //  global     timestamp packets or       |    timestamp log entries or
> > //             other global resources     |    other global resources
> > //             at high rate               |    at low rate
> > //                                        |
> > //             ---------------------------+------------------------------
> > //                                        |
> > //  local      timestamp and sort items   |    measure execution time over
> > //             in thread local work queue,|    many iterations or over
> > //             measure execution time     |    a "long" function
> > //             of a "short" function,     |
> > //             spin and wait for a short  |
> > //             while
> > //
> > //
> 
> No see reason to overload user with this stuff.
> In fact we always need one hi-resolution time with best quality, no
> matter what we measure.
> No matter how resolution it has, it should be the max that platform can
> provide for that.
> At this moment all counters are 64-bit and can not wrap for years.
> On my opinion,32-bit counter we shouldn't take into account.
> 

These are *application use cases*. One use case could be for example: stamp 
every log entry (on average 1 entry per minute) with millisecond resolution in 
global time. That does not need low overhead or high resolution, but it does 
need globally synchronized, linear time.

> >
> > // time in nsec
> > // renamed to leave room for sec or other units in the future
> > #define ODP_TIME_NS_USEC 1000ULL       /**< Microsecond in nsec */
> > #define ODP_TIME_NS_MSEC 1000000ULL    /**< Millisecond in nsec */
> > #define ODP_TIME_NS_SEC  1000000000ULL /**< Second in nsec */
> > #define ODP_TIME_NS_DAY  ((24*60*60)*ODP_TIME_NS_SEC) /**< Day in
> nsec */
> >
> >
> >
> > // Abstract time type
> > // Implementation specific type, includes e.g.
> > // - counter value
> > // - potentially other information: global vs. local time, ...
> > typedef odp_time_t
> 
> This type can be added only with one aim - ask user to use appropriate
> API that
> can handle wraps correctly. In another case, like with global time you
> are
> proposing (no wrap), uint64_t can be used, no need to overload API with
> odp_time_t
> and APIs like diff, cmp, etc.

The main benefit of abstract time is that the implementation can work in native 
counter values. If the API specified that time is always nsec (or sec+nsec like 
in POSIX struct timespec), every timestamp operation would need to convert 
between counter cycles and nsec (which may add e.g. division operations to the 
calls).
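
As a rough illustration of that point, the sketch below keeps timestamps in 
native ticks, converts a nsec timeout to ticks once, and then needs only a 
subtraction and a compare per check. The type and function names are 
hypothetical, and the conversion assumes a counter rate below roughly 18 GHz so 
that the multiplication cannot overflow.

```c
#include <stdint.h>

/* Hypothetical opaque time type holding a native counter value. */
typedef struct {
    uint64_t ticks;
} sketch_time_t;

/* Convert nsec to native ticks: the divisions happen here, once. */
static sketch_time_t sketch_time_from_ns(uint64_t ns, uint64_t hz)
{
    sketch_time_t t;

    t.ticks = (ns / 1000000000ULL) * hz +
              ((ns % 1000000000ULL) * hz) / 1000000000ULL;
    return t;
}

/* Per-check cost: one subtraction and one compare, no unit conversion. */
static int sketch_time_expired(sketch_time_t start, sketch_time_t now,
                               sketch_time_t timeout)
{
    return (now.ticks - start.ticks) >= timeout.ticks;
}
```

A spin-wait loop would call the conversion before the loop and only the 
expiry check inside it.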

> 
> >
> > // Get global time
> > // Global time is common over all threads. Global timestamps can be
> compared
> > // between threads.
> > odp_time_t odp_time(void);
> >
> > // Get thread local time
> > // Thread local time is only meaningful for the calling thread. It
> cannot be
> > // compare with other timestamps (global or local from other
> threads).
> > // May run from different clock source and different rate than global
> time.
> > // User must take care not to mix local and global time values in API
> calls.
> > odp_time_t odp_time_local(void);
> 
> I dislike the idea of local time. Theoretically it can be added, but I
> no see reason for that.
> Even if it's required, it should be handled with separate functions, as
> according to RFC it can
> overlap, global cannot. In every function the time type has to be
> checked and different approach chosen.
> It's time consuming redundancy for short periods and this reduces the
> actual resolution.

The reason is implementation efficiency. Implementation can be optimized for 
local time (e.g. CPU local counters), when user doesn't need globally sharable 
time value.

The spec says: local can wrap, NOT that it must wrap.

Implementation decides and knows:
- if it's possible to wrap (in practice)
- if it's identical to global time (== no redundancy, no checks needed, same 
code serves both)

> 
> >
> > // Compare time values
> > //
> > // Check if t1 is before t2 in absolute time, or if interval t1 is
> shorter
> > // than interval t2
> > //
> > // -1: t2 <  t1
> > //  0: t2 == t1
> > //  1: t2 >  t1
> > int odp_time_cmp(odp_time_t t1, odp_time_t t2);
> 
> This function, according to RFC, must behave differently for local and
> global time.
> And use cases also different, for time than can wrap, it can be used
> only for ranges.
> But, again, I dislike to guarantee any timer linearity.
> I added this function with only one intention - simply compare time
> ranges, not more.

Time is linear - the API needs to support that. The application can check if 
local time stays linear long enough for its use case. If it does not, global 
time should be the fallback (it wraps only after several years).

Range is a relative term - ranges longer than the wrap around time (in real 
time) would again cause problems.

One option would be to force all time sources to have very long wrap around 
times, which may cause low resolution on all of them (not only global). Maybe 
it's better to just specify that cmp() must not be used if (nsec) time can wrap 
between t1 and t2. 
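
To make the restriction concrete, here is a small sketch using a 32-bit counter 
(chosen only because it wraps quickly; the function names are hypothetical). A 
plain ordering compare gives the wrong answer once the counter has wrapped 
between the two samples, while the elapsed interval computed with unsigned 
subtraction stays correct as long as less than one full counter period has 
passed.

```c
#include <stdint.h>

/* Naive ordering check: wrong if the counter wrapped between t1 and t2. */
static int sketch_naive_is_before(uint32_t t1, uint32_t t2)
{
    return t1 < t2;
}

/*
 * Ticks elapsed from t1 to t2. Unsigned arithmetic is modulo 2^32, so the
 * result is correct across a wrap, provided less than one full counter
 * period separates the two samples.
 */
static uint32_t sketch_elapsed_ticks(uint32_t t1, uint32_t t2)
{
    return (uint32_t)(t2 - t1);
}
```

With t1 = 0xFFFFFFF0 and t2 = 0x00000010 (taken 0x20 ticks later), the naive 
check reports that t1 is not before t2, while the elapsed interval is still 
0x20 ticks.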

> 
> >
> > // Sum of t1 and t2
> > //
> > // User can sum timestamps or accumulate multiple intervals before
> > // comparing or converting to nsec
> > odp_time_t odp_time_sum(odp_time_t t1, odp_time_t t2);
> >
> > // Time difference between t1 and t2
> > //
> > // Calculate interval from timestamp t1 to t2, or difference of two
> intervals.
> > // T2 must be the latter timestamp, or the longer interval (t2 >=
> t1).
> > // Use cmp() first, if don't know which timestamp is the
> latter/longer.
> 
> Event if suppose that it's split on local/global
> you cannot use it to compare timestamps that can wrap (local)
> Compare can be used only to compare RANGES.
> 
> > odp_time_t odp_time_diff(odp_time_t t1, odp_time_t t2);
> >
> > // Convert ODP time to wall clock time in nsec
> > //
> > // Wall clock time advances linearly in realtime and starts from 0 in
> ODP init.
> > //
> > // Global time must not wrap in several years (max time value is
> defined by
> > // info.global_nsec_max). Local time may have shorter wrap around
> time
> > // (info.local_nsec_max) than global, but it's also recommended to be
> years.
> > //
> > // Global and local time may run from different time base and thus
> result
> > // different nsec values.
> > uint64_t odp_time_to_ns(odp_time_t time);
> 
> As I see it can be used for "local" time also.
> You cannot get wall clock time from time counter that can wrap with
> this function.
> It can be done only in this way:
> 
> start_time = odp_time(); // at init.
> ....
> 
> odp_time_ns(odp_time_diff(start_time, odp_time()))
> 

Yes, this is what the implementation needs to do when converting odp_time_t to 
nsec time. The user can see from the info struct when (and how often) a time 
source will wrap. Both global and local nsec time may wrap for the first time 
after e.g. >100 years when implemented with 64 bit counters.
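
For a sense of scale, the wrap-around time of a counter follows directly from 
its width and rate. The helper below is a hypothetical sketch, not part of the 
proposal, and the 1 GHz rate used in the comments is only an example value.

```c
#include <stdint.h>

#define SKETCH_SEC_PER_YEAR (365ULL * 24 * 60 * 60)

/*
 * Seconds until a free-running counter of the given width wraps.
 * bits must be in the range 1..64. For 64 bits, UINT64_MAX is used
 * in place of 2^64, which differs by at most one second.
 */
static uint64_t sketch_wrap_seconds(unsigned int bits, uint64_t hz)
{
    if (bits >= 64)
        return UINT64_MAX / hz;

    return (1ULL << bits) / hz;
}

/*
 * At 1 GHz:
 *   32 bit counter: 4 seconds
 *   48 bit counter: 281474 seconds (a little over 3 days)
 *   64 bit counter: 584 years
 */
```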

> >
> > // convert nsec value to global time
> > odp_time_t odp_time_from_ns(uint64_t ns);
> >
> > // convert nsec value to local time
> > odp_time_t odp_time_local_from_ns(uint64_t ns);
> >
> > // Time info structure
> > typedef struct {
> >     // Global timestamp resolution in hz
> >     uint64_t global_hz;
> >     // Max global time value in nsec. Global time values (timestamps
> or
> >     // intervals) larger than this are not handled correctly.
> >        // Global wall clock time wraps back to zero after this value.
> >     uint64_t global_nsec_max;
> 
> User don't need to worry about this parameter.
> Why do we need this? I no see any usecase. Only if user wants catch
> wraps.
> But why then add wall global time if he needs to worry about this.
> Strange.

We can easily spec that this should be at minimum "several years". It gets 
trickier to spec that it must be at least X years. What would be a good number 
that everybody can support efficiently in HW? If we find a number, let's put it 
here. 

> 
> >     // Local timestamp resolution in hz
> >     uint64_t local_hz;
> >     // Max local time value in nsec. Local time values (timestamps or
> >     // intervals) larger than this are not handled correctly.
> >        // Local wall clock time wraps back to zero after this value.
> >     uint64_t local_nsec_max;
> > } odp_time_info_t;
> >
> > // Time info request
> > //
> > // Fill in time info struct. User can check resolutions and max time
> values
> > // in nsec.
> > //
> > // 0 on success
> > // <0 on failure
> > int odp_time_info(odp_time_info_t *info);
> > _______________________________________________
> > lng-odp mailing list
> > lng-odp@lists.linaro.org
> > https://lists.linaro.org/mailman/listinfo/lng-odp
> >
> 
> In summary:
> * I like wall time, but:
>    - it requires to extract init value in places where it's not needed.
>    - requires to guarantee that timer is set counter to 0 at init.
>    - can be replaced with:
>        odp_time_diff(start_init_time, odp_time()) // give you wall
> time,
>                                                   // then convert to ns
> if you need.
>      which doesn't require guarantee to be 0 at init.

Better to do this once and inside the implementation. Only read access to the 
HW counter is needed.

> 
> * According local time
>   - it increases complexity.
>   - it requires to hold different types of time under opaque type, thus
>   - it requires each time check the type of time under odp_time_diff(),
> which
>        can be used in places sensible for that.
>   - if local counters have better characteristics they can emulate
> global timer,
>     in turn global timer can be used everywhere. And doesn't matter if
> they are
>     the same on some platforms, you should worry about it in
> application anyway.

Application gives information (I need to share this timestamp, I don't need to 
share this one), implementation uses that as it wishes. 

> 
> I propose to use always global time and use API, that is enough for all
> cases:
> Mostly it includes and follows existent time API:
> 
> odp_time_t odp_time(void);
> odp_time_t odp_time_diff(odp_time_t t1, odp_time_t t2); // ranges and
> timestamps
> odp_time_t odp_time_sum(odp_time_t t1, odp_time_t t2);
> uint64_t odp_time_to_ns(odp_time_t time);
> odp_time_t odp_time_from_ns(uint64_t ns);
> int odp_time_cmp(odp_time_t t1, odp_time_t t2); // only ranges
> uint64_t odp_time_to_u64(odp_time_t time); // debugg purposes
> ODP_TIME_NULL // for init and comparison

To_u64 can be added. 

In general, odp_time_t could be a struct, and thus a pointer could be used for 
reference. Output would be through a parameter and the return value could 
indicate an error (e.g. the input time value is too large).

Also, #defines should be minimized for possible future binary compatibility. 
So, odp_time_zero(odp_time_t *t) could be a better option.
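
A sketch of what such a pointer-based style might look like. The names are 
hypothetical (prefixed sketch_ to avoid suggesting an actual ODP signature), 
and using overflow as the error case is just one possible choice.

```c
#include <stdint.h>

/* Hypothetical struct-based time type. */
typedef struct {
    uint64_t ticks;
} sketch_time_t;

/* Function instead of a #define for the zero value. */
static void sketch_time_zero(sketch_time_t *t)
{
    t->ticks = 0;
}

/*
 * Sum of t1 and t2, output through a parameter.
 * Returns 0 on success, -1 if the result would not fit.
 */
static int sketch_time_sum(sketch_time_t *out, const sketch_time_t *t1,
                           const sketch_time_t *t2)
{
    if (t1->ticks > UINT64_MAX - t2->ticks)
        return -1;

    out->ticks = t1->ticks + t2->ticks;
    return 0;
}
```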

-Petri

_______________________________________________
lng-odp mailing list
lng-odp@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/lng-odp
