On 03.09.15 14:32, Savolainen, Petri (Nokia - FI/Espoo) wrote:


-----Original Message-----
From: ext Ivan Khoronzhuk [mailto:ivan.khoronz...@linaro.org]
Sent: Thursday, September 03, 2015 1:29 AM
To: Savolainen, Petri (Nokia - FI/Espoo); lng-odp@lists.linaro.org
Subject: Re: [lng-odp] RFC: New time API

Hi, Petri

We have to evaluate it in terms of performance, platform portability
and simplicity.

If you want to split time into hi-res and low-res, they must have separate
functions and not be under one common opaque type, in order not to break
hi-res measurements.

But in fact you split timers of the same quality, further below...


The API has two goals (as any other API under ODP):
- solve a user problem (take timestamps and work with those)
- enable good performance on multiple HW platforms (enable direct HW time
counter(s) usage)




On 02.09.15 18:21, Savolainen, Petri (Nokia - FI/Espoo) wrote:
Hi,


I think we need to restart the time API discussion and specify it
with only wall time in mind.

Let's suppose.

CPU cycle count APIs can be tuned as a next step. CPU cycle counters
are affected by frequency scaling, which makes them difficult to use
for counting linear, real time.
The time API should specify an easy way to check and use the real,
wall clock time.
We need at least one time source that will not wrap in years - here
it's the "global" time
(e.g. in POSIX it's CLOCK_MONOTONIC).

Don't mix it with this API.
CLOCK_MONOTONIC is guaranteed by the OS, which can handle wraps; the OS can
use interrupts for that, you cannot.


A monotonic, very long wrap around time source is an application requirement. 
CLOCK_MONOTONIC is an example of solving the same requirement in POSIX world.

Yes, an ODP implementation should not be interrupt driven, but still an
implementation can and likely will serve some interrupts: in the worst case on
a worker core, in a better case on a control core and in the best case on a
system core outside of the ODP application (e.g. Linux kernel on core #0). The
key is how often those interrupts need to be served and how long they take in
the worst case. E.g. one interrupt due to a counter wrap every 5 years on the
core running the Linux kernel does not matter much.

Counter wraps are really an issue only with short time counters. Today most
chips provide large enough counters, so that wrap around is not really an
issue (max one wrap in several years).



Also, it must be zero at platform init and begin counting time when the
application starts - for each application that starts.
Can you guarantee that it's initialized to zero on every platform? I
hesitate to answer that question.

The API does not specify that the HW counter is reset at any point in time. It
specifies that wall time (nsec time) is zero at application start up. In
practice, the implementation needs to read the HW counter once at start up and
store it. Basic stuff.
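
For illustration, a minimal sketch of that start up step. The
hw_counter_read() and hw_counter_hz() accessors are hypothetical; here they
are stubbed with CLOCK_MONOTONIC only to keep the sketch self-contained, a
real implementation would read the HW counter directly:

#include <stdint.h>
#include <time.h>

/* stand-in for a real HW counter read, just to make the sketch runnable */
static uint64_t hw_counter_read(void)
{
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}

/* stand-in counter rate */
static uint64_t hw_counter_hz(void)
{
        return 1000000000ULL;
}

static uint64_t start_count; /* HW counter value stored once at start up */

void time_init(void)
{
        start_count = hw_counter_read();
}

/* wall time in nsec, zero at application start up */
uint64_t time_now_ns(void)
{
        uint64_t diff = hw_counter_read() - start_count;
        uint64_t hz = hw_counter_hz();

        /* split the conversion to avoid 64-bit overflow of diff * 1e9 */
        return (diff / hz) * 1000000000ULL + ((diff % hz) * 1000000000ULL) / hz;
}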



But let's suppose that we can guarantee it.
In this case time should be aligned for all executed applications to
start from zero.
Let's suppose some start_time = odp_time() at application init.
As noted earlier, in a fast loop the init count must be subtracted in the
diff function.
I'm not even talking about checking the time type, since you are going to
put all of them under one type.

Global time can also be compared between threads. Local time is
defined for optimizing short interval time checks.

In fact global has the same quality as local. As noted earlier, global can
be emulated with local.
Why do we need to split them in this case? It'll only add load on the user.

It's thread local, may wrap sooner than global time, but may be lower
overhead to use.

They are both 64-bit. You are going to use the same function for both, so
the overhead is the same.


First, this, like any other ODP API, affects only one ODP application
(instance). Global time is global between the threads of a single application.

These are two different application use cases:
- global == time that can be shared between threads
- local  == time that does not need to be shared between threads

Ability to share is the key difference in quality. Yes, when there's a SoC
level, low latency, high frequency, 64-bit time counter, it's sensible to use
that to implement both local and global. In this case, the implementation also
avoids the check between global and local time (it's all the same). I'm
expecting that this is the common case.

BUT, what if the SoC level HW counter has high latency to access (e.g. 150 CPU
cycles) and is low frequency, while the HW has a low latency, high frequency
per-CPU counter that you could use for counting local time? If the API had only
the global definition, you could not use that HW resource even when the
application is not interested in sharing timestamps with other threads (needs
only local time). The application would run slower on your HW, since every
(local) timestamp would consume 150 cycles instead of e.g. 1 cycle.
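
A sketch of how an implementation could exploit such HW. The
soc_counter_read() and cpu_counter_read() accessors and the struct layout are
hypothetical, for illustration only:

#include <stdint.h>

typedef struct {
        uint64_t count; /* native counter value */
} odp_time_t;

/* hypothetical platform accessors */
uint64_t soc_counter_read(void); /* shared SoC counter, e.g. 150 cycles */
uint64_t cpu_counter_read(void); /* per-CPU counter, e.g. 1 cycle */

odp_time_t odp_time(void)
{
        /* global time: must be coherent between threads */
        odp_time_t t = { soc_counter_read() };

        return t;
}

odp_time_t odp_time_local(void)
{
        /* local time: meaningful only for the calling thread,
         * but far cheaper to read on such HW */
        odp_time_t t = { cpu_counter_read() };

        return t;
}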




There could actually be four time bases defined in the API, if we'd
want to optimize for each use case (global.hi_res, global.low_res,
local.hi_res and local.low_res).

I'd propose to have only two and give a system specific way to
configure those (rate and duration).

What do you mean by "give a system specific way..." - do you mean adding some
API? If so, I disagree. It's not ODP's responsibility and it's not applicable
to every platform.

Time counters are likely used also for the OS, etc. So, the vendor would need
to document e.g. if and how ODP global time resolution can be tuned. It may be
e.g. through a Linux boot parameter, because ODP reads the same counter that
Linux uses for its wall clock time. It's "system specific" how the HW resource
(e.g. time counter rate) can be configured, and it's unlikely that an ODP
application could change the setting, so we don't need an API to set the rate,
only an API to get the rate.



Typical config would be global.low_res and local.hi_res. User can
check hz and max time value (wrap around time) with odp_time_info() and
adapt (fall back to using global time) if e.g. local time wraps too
quickly (e.g. in 4 sec).

If this time wraps every 4 s, it shouldn't be used at all... (any 32-bit counter)

Agreed. Frequent wraps are the main problem, but can we rule today that any HW
interesting to ODP must have a HW time counter large enough to wrap no more
than once per X years? We cannot dictate the implementation (e.g. 32, 48 or 64
bit counter), only the API spec. The API can require that there's at least one
time source that won't wrap often (global). It's then up to the implementation
how that's guaranteed (natively due to a large enough counter, co-operating
with the OS, using per core interrupts, ...).



See the proposal below.

-Petri




//
// Use cases
//
//             high resolution                 low resolution
//             short interval                  long interval
//             low overhead                    high overhead
//
//
//  global     timestamp packets or       |    timestamp log entries or
//             other global resources     |    other global resources
//             at high rate               |    at low rate
//                                        |
//             ---------------------------+------------------------------
//                                        |
//  local      timestamp and sort items   |    measure execution time over
//             in thread local work queue,|    many iterations or over
//             measure execution time     |    a "long" function
//             of a "short" function,     |
//             spin and wait for a short  |
//             while
//
//

I see no reason to overload the user with this stuff.
In fact we always need one hi-resolution time source of the best quality, no
matter what we measure.
Whatever resolution it has, it should be the maximum the platform can
provide.
At this moment all counters are 64-bit and cannot wrap for years.
In my opinion, we shouldn't take 32-bit counters into account.


These are *application use cases*. One use case could be for example: stamp
every log entry (average 1 entry per minute) with millisecond resolution in
global time... => does not need low overhead or high resolution, but globally
synchronized linear time.



// time in nsec
// renamed to leave room for sec or other units in the future
#define ODP_TIME_NS_USEC 1000ULL       /**< Microsecond in nsec */
#define ODP_TIME_NS_MSEC 1000000ULL    /**< Millisecond in nsec */
#define ODP_TIME_NS_SEC  1000000000ULL /**< Second in nsec */
#define ODP_TIME_NS_DAY  ((24*60*60)*ODP_TIME_NS_SEC) /**< Day in nsec */



// Abstract time type
// Implementation specific type, includes e.g.
// - counter value
// - potentially other information: global vs. local time, ...
typedef odp_time_t

This type can be added with only one aim - to make the user use the
appropriate API that can handle wraps correctly. In the other case, like with
the global time you are proposing (no wrap), uint64_t can be used; there's no
need to overload the API with odp_time_t and APIs like diff, cmp, etc.

The main benefit of abstract time is that the implementation can work on
native counter values. If the API specified that time is always nsec (or
sec+nsec like in the POSIX struct timespec), every timestamp operation would
need to convert between counter cycles and nsec (which may add e.g. division
operations to the calls).
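
A sketch of that point, with an example layout where odp_time_t holds native
ticks: sum/diff stay plain integer operations and the (potentially costly)
division is paid only once, in the nsec conversion. The counter_hz() rate
accessor is hypothetical:

#include <stdint.h>

typedef struct {
        uint64_t ticks; /* native counter value */
} odp_time_t;

uint64_t counter_hz(void); /* hypothetical counter rate accessor */

odp_time_t odp_time_diff(odp_time_t t1, odp_time_t t2)
{
        /* plain subtraction, no cycles-to-nsec conversion here */
        odp_time_t d = { t2.ticks - t1.ticks };

        return d;
}

uint64_t odp_time_to_ns(odp_time_t t)
{
        uint64_t hz = counter_hz();

        /* the only place that divides; split to avoid 64-bit overflow */
        return (t.ticks / hz) * 1000000000ULL
                + ((t.ticks % hz) * 1000000000ULL) / hz;
}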



// Get global time
// Global time is common over all threads. Global timestamps can be compared
// between threads.
odp_time_t odp_time(void);

// Get thread local time
// Thread local time is only meaningful for the calling thread. It cannot be
// compared with other timestamps (global or local from other threads).
// May run from a different clock source and at a different rate than global
// time. User must take care not to mix local and global time values in API
// calls.
odp_time_t odp_time_local(void);

I dislike the idea of local time. Theoretically it can be added, but I see no
reason for that.
Even if it's required, it should be handled with separate functions since,
according to the RFC, it can wrap while global cannot. In every function the
time type has to be checked and a different approach chosen.
It's time consuming redundancy for short periods and it reduces the actual
resolution.


The reason is implementation efficiency. The implementation can be optimized
for local time (e.g. CPU local counters) when the user doesn't need a globally
sharable time value.

The spec says: local can wrap, NOT that it must wrap.

Implementation decides and knows:
- if it's possible to wrap (in practice)
- if it's identical to global time (== no redundancy, no checks needed, same 
code serves both)

Yes, in most cases it will be so, but there's no guarantee.





// Compare time values
//
// Check if t1 is before t2 in absolute time, or if interval t1 is shorter
// than interval t2
//
// -1: t2 <  t1
//  0: t2 == t1
//  1: t2 >  t1
int odp_time_cmp(odp_time_t t1, odp_time_t t2);

This function, according to the RFC, must behave differently for local and
global time. And the use cases are also different: for time that can wrap, it
can be used only for ranges.
But, again, I don't like guaranteeing any timer linearity.
I added this function with only one intention - to simply compare time
ranges, nothing more.


Time is linear - the API needs to support that. Application can check if local 
time stays linear long enough for its use case.

That doesn't sound like simplification. In the current variant the user
doesn't need to worry about this.


If it does not, the global time should be the fall back (wrap only after 
several years).

Range is a relative term - ranges longer than the wrap around time (in real 
time) would again cause problems.

I think there's no need to compare ranges longer than years. That's not the use case here.


One option would be to force all time sources to have very long wrap around 
times, which may cause low resolution on all of them (not only global). Maybe 
it's better to just specify that cmp() must not be used if (nsec) time can wrap 
between t1 and t2.

This also doesn't sound like simplification. How can the user know whether he
is comparing wrapped time or not?
He cannot - that's the problem. Nobody can. You cannot predict which points
the user will compare.
You cannot emulate it in the implementation either (suppose the worst case:
the implementation cannot guarantee the counter is initialized to 0 at board
start), as the first wrap can happen at any time; only the second takes years.
Relying on this makes all applications very configuration dependent.
That is one of the main and clearest examples of why we shouldn't hide wraps.
So this function cannot be used with timestamps at all, only ranges. To get a
range, you must use the diff function, and the diff function can handle wraps
internally. That's it. If you must use diff and cmp anyway, then why bother
with wall time?

Why should the user still think about wraps, if you want to equate it to wall
time?
Or even use this function (it was one of your ideas) to check time order - a
function that itself requires order...
What about not bothering with the chicken/egg issue and always assuming either
that wrap can happen, or that it cannot happen at all?
Or just describe in the API file that it must be, for instance, > 10 years
before the first wrap. Then if your application runs for more than 10 years it
can suddenly fail.
Or add to the description: never change your dtb file to another init value or
frequency if you don't know what you are doing, otherwise your application can
suddenly fail...
It would be a threshold for orientation that both implementation and
application could rely on. And it's hardly controllable.







// Sum of t1 and t2
//
// User can sum timestamps or accumulate multiple intervals before
// comparing or converting to nsec
odp_time_t odp_time_sum(odp_time_t t1, odp_time_t t2);

// Time difference between t1 and t2
//
// Calculate interval from timestamp t1 to t2, or difference of two intervals.
// T2 must be the latter timestamp, or the longer interval (t2 >= t1).
// Use cmp() first, if you don't know which timestamp is the latter/longer.

Even if we suppose that it's split into local/global,
you cannot use it to compare timestamps that can wrap (local).
Compare can be used only to compare RANGES.

odp_time_t odp_time_diff(odp_time_t t1, odp_time_t t2);

// Convert ODP time to wall clock time in nsec
//
// Wall clock time advances linearly in real time and starts from 0 at ODP
// init.
//
// Global time must not wrap in several years (max time value is defined by
// info.global_nsec_max). Local time may have a shorter wrap around time
// (info.local_nsec_max) than global, but it's also recommended to be years.
//
// Global and local time may run from different time bases and thus result in
// different nsec values.
uint64_t odp_time_to_ns(odp_time_t time);

As I see it, this can be used for "local" time also.
You cannot get wall clock time with this function from a time counter that can
wrap. It can be done only this way:

start_time = odp_time(); // at init
....

odp_time_to_ns(odp_time_diff(start_time, odp_time()))



Yes, this is what the implementation needs to do when converting odp_time_t to
nsec time. The user can see from the info struct when (and how often) a time
source will wrap. Both global and local nsec time may wrap the first time after
e.g. >100 years when implemented with 64 bit counters.

Only if it can wrap; if it cannot, odp_time_to_ns(odp_time() - start_time) is
enough.
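
For reference, a complete interval measurement with the proposed API would
then be (do_work() is a placeholder):

odp_time_t t1, t2, interval;
uint64_t ns;

t1 = odp_time();
do_work();                        /* placeholder workload */
t2 = odp_time();

interval = odp_time_diff(t1, t2); /* t2 is the latter timestamp */
ns = odp_time_to_ns(interval);    /* convert once, at the end */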




// convert nsec value to global time
odp_time_t odp_time_from_ns(uint64_t ns);

// convert nsec value to local time
odp_time_t odp_time_local_from_ns(uint64_t ns);

// Time info structure
typedef struct {
        // Global timestamp resolution in hz
        uint64_t global_hz;
        // Max global time value in nsec. Global time values (timestamps or
        // intervals) larger than this are not handled correctly.
        // Global wall clock time wraps back to zero after this value.
        uint64_t global_nsec_max;

The user doesn't need to worry about this parameter.
Why do we need this? I see no use case - only if the user wants to catch
wraps.
But why then add global wall time if he needs to worry about this?
Strange.

We can easily spec that this should be at minimum "several years". It gets
trickier to spec that it must be at least X years. What would be a good number
that everybody can support efficiently in HW? If we find a number, let's put it
here.



        // Local timestamp resolution in hz
        uint64_t local_hz;
        // Max local time value in nsec. Local time values (timestamps or
        // intervals) larger than this are not handled correctly.
        // Local wall clock time wraps back to zero after this value.
        uint64_t local_nsec_max;
} odp_time_info_t;

// Time info request
//
// Fill in time info struct. User can check resolutions and max time values
// in nsec.
// in nsec.
//
// 0 on success
// <0 on failure
int odp_time_info(odp_time_info_t *info);
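
As a side note on the "X years" question above, counter width and rate fix
the wrap time. A small illustration with assumed figures (compile with -lm):

#include <math.h>
#include <stdio.h>

/* seconds until a free-running counter of 'bits' width wraps at rate 'hz' */
static double wrap_seconds(unsigned bits, double hz)
{
        return pow(2.0, bits) / hz;
}

int main(void)
{
        printf("32-bit @ 1 GHz : %.1f sec\n", wrap_seconds(32, 1e9));
        printf("48-bit @ 25 MHz: %.0f days\n",
               wrap_seconds(48, 25e6) / 86400);
        printf("64-bit @ 1 GHz : %.0f years\n",
               wrap_seconds(64, 1e9) / (365.0 * 86400));
        return 0;
}

This prints roughly 4.3 sec, 130 days and 585 years respectively, which is why
a large enough counter makes the wrap problem practically disappear.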


In summary:
* I like wall time, but:
    - it requires extracting the init value in places where it's not needed.
    - it requires guaranteeing that the counter is set to 0 at init.
    - it can be replaced with:
        odp_time_diff(start_init_time, odp_time()) // gives you wall time,
                                                   // then convert to ns
                                                   // if needed.
      which doesn't require the guarantee of being 0 at init.

Better to do this once and inside the implementation. Only read access to the
HW counter is needed.


* Regarding local time:
   - it increases complexity.
   - it requires holding different types of time under the opaque type, thus
   - it requires checking the type of time on each odp_time_diff() call, which
     can be used in places sensitive to that.
   - if local counters have better characteristics they can emulate the global
     timer, and in turn the global timer can be used everywhere. And it doesn't
     matter if they are the same on some platforms; you have to worry about it
     in the application anyway.

Application gives information (I need to share this timestamp, I don't need to 
share this one), implementation uses that as it wishes.


I propose to always use global time and an API that is enough for all cases.
Mostly it includes and follows the existing time API:

odp_time_t odp_time(void);
odp_time_t odp_time_diff(odp_time_t t1, odp_time_t t2); // ranges and timestamps
odp_time_t odp_time_sum(odp_time_t t1, odp_time_t t2);
uint64_t odp_time_to_ns(odp_time_t time);
odp_time_t odp_time_from_ns(uint64_t ns);
int odp_time_cmp(odp_time_t t1, odp_time_t t2); // only ranges
uint64_t odp_time_to_u64(odp_time_t time); // debug purposes
ODP_TIME_NULL // for init and comparison

To_u64 can be added.

In general, odp_time_t could be a struct and thus a pointer could be used for
reference. Output would be through a param and the return value could indicate
an error (e.g. too large a time value as input).

Also, #defines should be minimized for possible future binary compatibility.
So, odp_time_zero(odp_time_t *t) could be a better option.
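
A minimal sketch of that struct/pointer variant (hypothetical signatures,
example layout):

#include <stdint.h>

typedef struct {
        uint64_t count; /* example layout, implementation specific */
} odp_time_t;

/* replaces a binary #define like ODP_TIME_NULL */
void odp_time_zero(odp_time_t *t)
{
        t->count = 0;
}

/* output through param; return 0 on success, <0 on failure
 * (e.g. ns value too large to represent) */
int odp_time_from_ns(uint64_t ns, odp_time_t *time);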

-Petri




I prefer a separate API for the global timer, and no hidden wraps at all.
Everything should be written correctly, without rose-tinted glasses.


--
Regards,
Ivan Khoronzhuk
_______________________________________________
lng-odp mailing list
lng-odp@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/lng-odp
