Re: [lng-odp] RFC: New time API

2015-09-02 Thread Ivan Khoronzhuk

Hi, Petri

We have to approach this from the standpoint of performance, platform portability, and simplicity.

If you want to split time into hi-res and low-res, they must have separate functions and not sit under one common opaque time type, in order not to break hi-res measurements.

But in fact you are splitting timers of the same quality, as discussed further below...

On 02.09.15 18:21, Savolainen, Petri (Nokia - FI/Espoo) wrote:

Hi,


I think we need to restart the time API discussion and specify it with only wall 
time in mind.


Let's suppose.


CPU cycle count APIs can be tuned as a next step. CPU cycle counters are 
affected by frequency scaling, which makes those difficult to use for counting 
linear, real time.
 The time API should specify an easy way to check and use the real, wall clock 
time.
We need at least one time source that will not wrap in years - here it's the 
"global" time
(e.g. in POSIX it's CLOCK_MONOTONIC).


Don't mix that with this API.
CLOCK_MONOTONIC is guaranteed by the OS, which can handle wraps; the OS can use interrupts for that, you cannot.

Also, it must be zero at platform init and begin counting time when the application starts, for each application that starts.
Can you guarantee that it is initialized to zero on every platform? I hesitate to answer that question.

But let's suppose that we can guarantee it.
In this case, time should be aligned for all running applications to start from zero.
Let's suppose some start_time = odp_time() at application init.
As was noted earlier, in some fast loop, init_count must be subtracted in the diff function.
I'm not even talking about checking the time type, since you are going to put all of them under one type.
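To make the overhead concrete, here is a minimal sketch (all names are illustrative, not ODP API) of the scheme being discussed: the counter value captured at application init is stored, and every later timestamp read pays for an extra subtraction, which is the per-call cost in a fast loop:

```c
#include <stdint.h>

/* Hypothetical implementation detail: counter value captured at
 * application init so that reported time starts from zero. */
static uint64_t init_count;

/* Stand-in for a platform HW counter read. */
static uint64_t hw_counter_read(void)
{
    static uint64_t fake = 1000;
    return fake += 7;   /* monotonically increasing stub */
}

static void time_init(void)
{
    init_count = hw_counter_read();
}

/* The extra subtraction here is the per-call cost Ivan refers to
 * when timestamps are taken in a fast loop. */
static uint64_t time_since_start(void)
{
    return hw_counter_read() - init_count;
}
```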

 Global time can be also compared between threads. Local time is defined for 
optimizing short interval time checks.

In fact, global time has the same quality as local time. As noted earlier, global can be emulated with local.
Why do we need to split them in that case? It only adds load on the user.

It's thread local, may wrap sooner than global time, but may be lower overhead 
to use.

They are both 64-bit, and you are going to use the same function for both, so the overhead is the same.



There could be actually four time bases defined in the API, if we'd want to 
optimize for each use case (global.hi_res, global.low_res, local.hi_res and 
local.low_res).



I'd propose to have only two and give system specific way to configure those 
(rate and duration).


What do you mean by "give a system specific way..."? Do you mean adding some API?
If so, I disagree. It's not ODP's responsibility, and it's not applicable to every platform.


 Typical config would be global.low_res and local.hi_res. User can check hz and 
max time value (wrap around time) with odp_time_info() and adapt (fall back to 
use global time) if e.g. local time wraps too quickly (e.g. in 4 sec).


If this time wraps every 4 s, it shouldn't be used at all... (nor should any 32-bit counter)



See the proposal below.

-Petri




//
// Use cases
//
//                 high resolution             low resolution
//                 short interval              long interval
//                 low overhead                high overhead
//
//
//  global    timestamp packets or        |  timestamp log entries or
//            other global resources      |  other global resources
//            at high rate                |  at low rate
//                                        |
//  --------------------------------------+------------------------------
//                                        |
//  local     timestamp and sort items    |  measure execution time over
//            in thread local work queue, |  many iterations or over
//            measure execution time      |  a "long" function
//            of a "short" function,      |
//            spin and wait for a short   |
//            while                       |
//
//

I see no reason to overload the user with this stuff.
In fact, we always need one hi-resolution time of the best quality, no matter what we measure.
Whatever resolution it has, it should be the maximum the platform can provide.
At the moment all counters are 64-bit and cannot wrap for years.
In my opinion, we shouldn't take 32-bit counters into account.



// time in nsec
// renamed to leave room for sec or other units in the future
#define ODP_TIME_NS_USEC 1000ULL   /**< Microsecond in nsec */
#define ODP_TIME_NS_MSEC 1000000ULL    /**< Millisecond in nsec */
#define ODP_TIME_NS_SEC  1000000000ULL /**< Second in nsec */
#define ODP_TIME_NS_DAY  ((24*60*60)*ODP_TIME_NS_SEC) /**< Day in nsec */
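As a quick sanity check of these multipliers, a conversion helper built on them (a sketch, not part of the proposal) behaves as expected:

```c
#include <stdint.h>

#define ODP_TIME_NS_USEC 1000ULL       /* microsecond in nsec */
#define ODP_TIME_NS_MSEC 1000000ULL    /* millisecond in nsec */
#define ODP_TIME_NS_SEC  1000000000ULL /* second in nsec */
#define ODP_TIME_NS_DAY  ((24 * 60 * 60) * ODP_TIME_NS_SEC) /* day in nsec */

/* Convert a timeout given in milliseconds to nanoseconds. */
static uint64_t timeout_ms_to_ns(uint64_t ms)
{
    return ms * ODP_TIME_NS_MSEC;
}
```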



// Abstract time type
// Implementation specific type, includes e.g.
// - counter value
// - potentially other information: global vs. local time, ...
typedef odp_time_t


This type can be added with only one aim: to make the user go through an API that can handle wraps correctly. Otherwise, as with the global time you are proposing (no wrap), uint64_t could be used directly, with no need to overload the API with odp_time_t and functions like diff, cmp, etc.



// Get global time
// Global time is common over 

Re: [lng-odp] RFC: New time API

2015-09-03 Thread Ivan Khoronzhuk

One correction that also worries me a little.

On 03.09.15 01:29, Ivan Khoronzhuk wrote:

Hi, Petri

We have to approach this from the standpoint of performance, platform portability, and simplicity.

If you want to split time into hi-res and low-res, they must have separate functions and not sit under one common opaque time type, in order not to break hi-res measurements.

But in fact you are splitting timers of the same quality, as discussed further below...

On 02.09.15 18:21, Savolainen, Petri (Nokia - FI/Espoo) wrote:

Hi,


I think we need to restart the time API discussion and specify it with only wall 
time in mind.


Let's suppose.


CPU cycle count APIs can be tuned as a next step. CPU cycle counters are 
affected by frequency scaling, which makes those difficult to use for counting 
linear, real time.
 The time API should specify an easy way to check and use the real, wall clock 
time.
We need at least one time source that will not wrap in years - here it's the 
"global" time
(e.g. in POSIX it's CLOCK_MONOTONIC).


Don't mix that with this API.
CLOCK_MONOTONIC is guaranteed by the OS, which can handle wraps; the OS can use interrupts for that, you cannot.

Also, it must be zero at platform init and begin counting time when the application starts, for each application that starts.
Can you guarantee that it is initialized to zero on every platform? I hesitate to answer that question.

But let's suppose that we can guarantee it.
In this case, time should be aligned for all running applications to start from zero.
Let's suppose some start_time = odp_time() at application init.
As was noted earlier, in some fast loop, init_count must be subtracted in the diff function.
I'm not even talking about checking the time type, since you are going to put all of them under one type.

  Global time can be also compared between threads. Local time is defined for 
optimizing short interval time checks.

In fact, global time has the same quality as local time. As noted earlier, global can be emulated with local.


That's not always true, since the counters will anyway drift out of sync; it can require synchronizing the counters periodically, and we cannot guarantee that either.
That's why every multi-core board has to have a common timer/counter on the SoC.
In my case it runs at the same rate as the ARM arch timer, which is local.
I want to believe that holds for other platforms too, but I can't be sure.
Otherwise, it's hard to guarantee global time availability at all.

So we can have a situation where the local time API is the only option here.
For wall time, a common timer or RTC should be used; I see no other option.
And it should be a separate API, without a hard requirement.



Re: [lng-odp] RFC: New time API

2015-09-03 Thread Savolainen, Petri (Nokia - FI/Espoo)


> -Original Message-
> From: ext Ivan Khoronzhuk [mailto:ivan.khoronz...@linaro.org]
> Sent: Thursday, September 03, 2015 1:29 AM
> To: Savolainen, Petri (Nokia - FI/Espoo); lng-odp@lists.linaro.org
> Subject: Re: [lng-odp] RFC: New time API
> 
> Hi, Petri
> 
> We have to look at it proceeding from performance, platform portability
> and simplicity
> 
> If you want to split on hi-res time and low-res they must have separate
> functions and
> be not under one common opaque time in order to not break hi-res
> measurements.
> 
> But in fact you split the same quality timers, farther below...


The API has two goals (like any other ODP API):
- solve a user problem (take timestamps and work with them)
- enable good performance on multiple HW platforms (enable direct HW time counter usage)



> 
> On 02.09.15 18:21, Savolainen, Petri (Nokia - FI/Espoo) wrote:
> > Hi,
> >
> >
> > I think we need to restart the time API discussion and specify it
> only wall time in mind.
> 
> Let's suppose.
> 
> > CPU cycle count APIs can be tuned as a next step. CPU cycle counters
> are affected by frequency scaling, which makes those difficult to use
> for counting linear, real time.
> >  The time API should specify an easy way to check and use the real,
> wall clock time.
> > We need at least one time source that will not wrap in years - here
> it's the "global" time
> > (e.g. in POSIX it's CLOCK_MONOTONIC).
> 
> Don't mix it with this API.
> CLOCK_MONOTONIC is guaranteed by OS, that can handle wraps, OS can use
> interrupts for that, you cannot.


A monotonic, very long wrap around time source is an application requirement. 
CLOCK_MONOTONIC is an example of solving the same requirement in POSIX world.

Yes, an ODP implementation should not be interrupt driven, but still an 
implementation can and likely will serve some interrupts: in worst case on a 
worker core, in a better case on a control core and in the best case on a 
system core outside of the ODP application (e.g. linux kernel on core #0). The 
key is how often those interrupts need to be served and how long it takes in 
the worst case. E.g. one interrupt due to counter wrap in 5 years on the core 
running linux kernel, does not matter much.

Counter wraps are really an issue with short time counters. Today most chips 
provide large enough counters, so that wrap around is not really an issue (max 
one wrap in several years).
 

> 
> Also it must be zero at platform init and begin count time when
> application starts. For each application that starts.
> Can you guarantee that it's inited to zero for each platform? I
> hesitate to answer on this question.

The API does not specify that HW counter is reset at any point in time. It 
specifies that wall time (nsec time) is zero in an application start up. In 
practice, implementation needs to read HW counter once in start up and store 
it. Basic stuff.
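A sketch of what Petri describes (hypothetical names; the stubbed counter stands in for real HW): the counter is never reset, the implementation samples it once at startup, and nsec wall time is reported relative to that sample:

```c
#include <stdint.h>

static uint64_t start_count;
static const uint64_t counter_hz = 25000000; /* example: 25 MHz counter */

/* Stand-in for a free-running HW counter with an arbitrary boot value. */
static uint64_t hw_counter_read(void)
{
    static uint64_t fake = 123456789;
    return fake += 25;
}

/* One counter read at startup, stored; "basic stuff". */
static void time_global_init(void)
{
    start_count = hw_counter_read();
}

/* nsec since application start: zero-based regardless of the raw
 * counter value at boot.  A real implementation must also avoid the
 * multiply overflow for very large tick counts. */
static uint64_t time_global_ns(void)
{
    uint64_t ticks = hw_counter_read() - start_count;
    return ticks * 1000000000ULL / counter_hz;
}
```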


> 
> But let's suppose that we can guarantee it.
> In this case time should be aligned for all executed applications to
> start from zero.
> Let's suppose some start_time = odp_time() at application init.
> As it was noted earlier, in some fast loop, init_count must be
> extracted in diff function.
> I'm not talking event about checking of time type, as you are going to
> put all of them under one type.
> 
>   Global time can be also compared between threads. Local time is
> defined for optimizing short interval time checks.
> 
> In fact global has same quality as local. As noted earlier, global can
> be emulated with local.
> Why we need to split them in this case? It'll add load on user only.
> 
> It's thread local, may wrap sooner than global time, but may be lower
> overhead to use.
> 
> They are both 64-bit. You are going to use the same function for both,
> overhead the same.


First, this API, like any other ODP API, affects only one ODP application (instance). Global time is global between the threads of a single application.

These are two different application use cases:
- global == time that can be shared between threads
- local  == time that does not need to be shared between threads

Ability to share is the key difference in quality. Yes, when there's a SoC 
level, low latency, high frequency, 64-bit time counter - it's sensible to use 
that to implement both local and global. In this case, implementation also 
avoids check between global and local time (it's all the same). I'm expecting 
that this is the common case.

BUT, what if the SoC level HW counter has high latency to access (e.g. 150 CPU cycles) and is low frequency? And the HW would have a low latency, high frequency per-CPU counter that you could use for counting local time? If the API has only the global definition, you could not use that HW resource even when the application is not interested in sharing timestamps with other threads (needs only local time). The application would run slower on your HW, since every (local) timestamp would consume 150 cycles instead of e.g. 1 cycle.

Re: [lng-odp] RFC: New time API

2015-09-03 Thread Ivan Khoronzhuk



On 03.09.15 14:32, Savolainen, Petri (Nokia - FI/Espoo) wrote:




-Original Message-
From: ext Ivan Khoronzhuk [mailto:ivan.khoronz...@linaro.org]
Sent: Thursday, September 03, 2015 1:29 AM
To: Savolainen, Petri (Nokia - FI/Espoo); lng-odp@lists.linaro.org
Subject: Re: [lng-odp] RFC: New time API

Hi, Petri

We have to look at it proceeding from performance, platform portability
and simplicity

If you want to split on hi-res time and low-res they must have separate
functions and
be not under one common opaque time in order to not break hi-res
measurements.

But in fact you split the same quality timers, farther below...



The API has two goals (like any other ODP API):
- solve a user problem (take timestamps and work with them)
- enable good performance on multiple HW platforms (enable direct HW time counter usage)





On 02.09.15 18:21, Savolainen, Petri (Nokia - FI/Espoo) wrote:

Hi,


I think we need to restart the time API discussion and specify it

only wall time in mind.

Let's suppose.


CPU cycle count APIs can be tuned as a next step. CPU cycle counters

are affected by frequency scaling, which makes those difficult to use
for counting linear, real time.

  The time API should specify an easy way to check and use the real,

wall clock time.

We need at least one time source that will not wrap in years - here

it's the "global" time

(e.g. in POSIX it's CLOCK_MONOTONIC).


Don't mix it with this API.
CLOCK_MONOTONIC is guaranteed by OS, that can handle wraps, OS can use
interrupts for that, you cannot.



A monotonic, very long wrap around time source is an application requirement. 
CLOCK_MONOTONIC is an example of solving the same requirement in POSIX world.

Yes, an ODP implementation should not be interrupt driven, but still an 
implementation can and likely will serve some interrupts: in worst case on a 
worker core, in a better case on a control core and in the best case on a 
system core outside of the ODP application (e.g. linux kernel on core #0). The 
key is how often those interrupts need to be served and how long it takes in 
the worst case. E.g. one interrupt due to counter wrap in 5 years on the core 
running linux kernel, does not matter much.

Counter wraps are really an issue with short time counters. Today most chips 
provide large enough counters, so that wrap around is not really an issue (max 
one wrap in several years).




Also it must be zero at platform init and begin count time when
application starts. For each application that starts.
Can you guarantee that it's inited to zero for each platform? I
hesitate to answer on this question.


The API does not specify that HW counter is reset at any point in time. It 
specifies that wall time (nsec time) is zero in an application start up. In 
practice, implementation needs to read HW counter once in start up and store 
it. Basic stuff.




But let's suppose that we can guarantee it.
In this case time should be aligned for all executed applications to
start from zero.
Let's suppose some start_time = odp_time() at application init.
As it was noted earlier, in some fast loop, init_count must be
extracted in diff function.
I'm not talking event about checking of time type, as you are going to
put all of them under one type.

   Global time can be also compared between threads. Local time is
defined for optimizing short interval time checks.

In fact global has same quality as local. As noted earlier, global can
be emulated with local.
Why we need to split them in this case? It'll add load on user only.

It's thread local, may wrap sooner than global time, but may be lower
overhead to use.

They are both 64-bit. You are going to use the same function for both,
overhead the same.



First, this API, like any other ODP API, affects only one ODP application (instance). Global time is global between the threads of a single application.

These are two different application use cases:
- global == time that can be shared between threads
- local  == time that does not need to be shared between threads

Ability to share is the key difference in quality. Yes, when there's a SoC 
level, low latency, high frequency, 64-bit time counter - it's sensible to use 
that to implement both local and global. In this case, implementation also 
avoids check between global and local time (it's all the same). I'm expecting 
that this is the common case.

BUT, what if the SoC level HW counter has high latency to access (e.g. 150 CPU 
cycles) and is low frequency? And the HW would have low latency, high frequency 
per CPU counter that you could use for counting local time? If API has only 
global definition, you could not use that HW resource even when application is 
not interested in sharing timestamps with other thread (needs only local time). 
The application would run slower on your HW, since every (local) timestamp 
would consume 150 cycles instead of e.g. 1 cycle.



Re: [lng-odp] RFC: New time API

2015-09-04 Thread Savolainen, Petri (Nokia - FI/Espoo)


> > Time is linear - the API needs to support that. Application can check
> if local time stays linear long enough for its use case.
> 
> It doesn't sound like simplification. In current variant user don't
> need to worry about this.
> 
> 
> > If it does not, the global time should be the fall back (wrap only
> after several years).
> >
> > Range is a relative term - ranges longer than the wrap around time
> (in real time) would again cause problems.
> 
> I thinks no need to compare ranges more then years. It's not for this
> use-case.


For example, a 32-bit counter at 1 GHz (e.g. a CPU-local counter, used for CPU-local time) wraps in about 4 sec.


> 
> >
> > One option would be to force all time sources to have very long wrap
> around times, which may cause low resolution on all of them (not only
> global). Maybe it's better to just specify that cmp() must not be used
> if (nsec) time can wrap between t1 and t2.
> 
> This also doesn't sounds as simplification. How user can know he is
> comparing wrapped time or not?
> He cannot - that's the problem. No one cannot. You cannot predict what
> points user compare.
> You cannot emulate it in implementation also (suppose worst case -
> implementation cannot guarantee the counter is inited to 0 at board start),
> as first wrap can happen any time, it's second takes years. Relying on
> this makes all applications very configuration dependent.
> That is the one of the main and bright examples that allow to see why
> we don't need to hide wraps.
> So this function cannot be used with timestamps at all, only ranges. To
> get range, you must use diff function,
> diff function can handle wraps inside. That is. If you must use diff
> and cmp then why bother with wall time?
> 
> Why user still should think about wraps, if you want to equalize it to
> wall time?
> Or even, use this function (it was one of your ideas), to check time
> order with function that requires order..
> What about to not bother with chicken/egg issue and always assume that
> wrap can happen or cannot at all.
> Only describe in API file, it must be > 10 years, for instance, before
> first wrap.
> And if your application can run more than 10 years it can suddenly
> fail.
> Uh..or add in description. ..never change your dtb file to another init
> value or freq
> if you don't know what are you doing...in another way you application
> can suddenly fail...
> It be threshold for orientation and both implementation and application
> can lie on it. And hardly control.


Implementation can handle a single counter wrap and maintain nsec time which starts from 0, by reading and storing the counter value in ODP init. The application only needs to worry about nsec time wrap (whether xxx_nsec_max is large enough for its purposes / lifetime).
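One common way an implementation can absorb a single wrap of a short counter (a general technique sketch, not necessarily what Petri has in mind) is to widen it in software, provided the counter is sampled at least once per wrap period:

```c
#include <stdint.h>

/* Widen a wrapping 32-bit HW counter into a monotonic 64-bit tick
 * count.  Works as long as widen_ticks() is called at least once per
 * wrap period (i.e. at most one wrap occurs between calls). */
static uint32_t last_sample;
static uint64_t high_part;   /* accumulated wrapped ticks */

static uint64_t widen_ticks(uint32_t sample)
{
    if (sample < last_sample)          /* counter wrapped since last call */
        high_part += 1ULL << 32;
    last_sample = sample;
    return high_part + sample;
}
```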


For example, an application:
- needs high resolution
- low CPU overhead
- don't need to share the time
- needs to compare timestamps (e.g. sort a list based on timestamps)


SoC 1:
- local time counter is 32 bits and runs at core freq (local_hz == 1GHz, 
local_nsec_max == ~4sec)
  => nsec time wraps every 4 sec
  => nsec time wrap is possible between any t1 and t2
  => cannot use cmp() for absolute time stamps
  => must use global time if need to compare ... 
- global time counter runs at 25 MHz and wraps in 40 years (global_hz == 25 
MHz, global_nsec_max == ~40 years)
  => nsec time (that starts from zero by the spec) will not wrap in the life 
time of this application
  => can use cmp(), but must accept the low resolution. Done.

SoC 2:
- local time counter is 64 bits and runs at core freq (local_hz == 1GHz, 
local_nsec_max == ~580 years)
  => nsec time (that start from zero by the spec) will not wrap in the life 
time of this application
  => nsec time wrap is not possible
  => can use cmp(). Done.
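The wrap-around figures in these examples can be checked with a small helper (illustrative only): with 32 bits at 1 GHz it gives about 4.3 seconds, and with 64 bits at 1 GHz roughly 585 years, matching the ~4 sec and ~580 year numbers above:

```c
#include <stdint.h>

/* Seconds until an n-bit counter running at hz wraps around. */
static double wrap_seconds(unsigned bits, double hz)
{
    double max_ticks = (bits >= 64) ? 18446744073709551616.0 /* 2^64 */
                                    : (double)(1ULL << bits);
    return max_ticks / hz;
}
```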



-Petri

___
lng-odp mailing list
lng-odp@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/lng-odp


Re: [lng-odp] RFC: New time API

2015-09-04 Thread Ivan Khoronzhuk

Hi, Petri

On 04.09.15 13:55, Savolainen, Petri (Nokia - FI/Espoo) wrote:




Time is linear - the API needs to support that. Application can check

if local time stays linear long enough for its use case.

It doesn't sound like simplification. In current variant user don't
need to worry about this.



If it does not, the global time should be the fall back (wrap only

after several years).


Range is a relative term - ranges longer than the wrap around time

(in real time) would again cause problems.

I think there's no need to compare ranges longer than years. That's not this use case.



For example, a 32-bit counter at 1 GHz (e.g. a CPU-local counter, used for CPU-local time) wraps in about 4 sec.


It seems we decided not to consider 32-bit timers here, right?
It's impossible to work with time spans longer than the wrap time, and not only ranges.
None of the time API functions works correctly in that case, not only cmp().








One option would be to force all time sources to have very long wrap

around times, which may cause low resolution on all of them (not only
global). Maybe it's better to just specify that cmp() must not be used
if (nsec) time can wrap between t1 and t2.

This also doesn't sound like a simplification. How can the user know whether he is comparing wrapped time or not?
He cannot - that's the problem. No one can. You cannot predict which points the user will compare.
You cannot emulate it in the implementation either (suppose the worst case: the implementation cannot guarantee the counter is inited to 0 at board start), as the first wrap can happen at any time; it's the second one that takes years. Relying on this makes all applications very configuration dependent.
That is one of the clearest examples of why we shouldn't hide wraps.
So this function cannot be used with timestamps at all, only with ranges. To get a range, you must use the diff function, and the diff function can handle wraps internally. That's it. If you must use diff and cmp anyway, then why bother with wall time?


Don't drop the main problem from this thread.
We need to settle this here, so please answer. Don't forget to mention cmp() and that the first wrap can happen at any time.



Why should the user still think about wraps, if you want to equate it to wall time?
Or even use this function (it was one of your ideas) to check time order with a function that requires order...
What about not bothering with the chicken-and-egg issue and always assuming that a wrap either can happen or cannot happen at all?
Just describe in the API file that it must be > 10 years, for instance, before the first wrap.
And if your application can run for more than 10 years, it can suddenly fail.
Uh... or add to the description: never change your dtb file to another init value or frequency if you don't know what you are doing... otherwise your application can suddenly fail...
It would be a threshold for orientation, and both implementation and application could rely on it. And it is hardly controllable.



Implementation can handle single counter wrap and maintain nsec time which 
starts from 0,

Yes, the implementation can, but only for ns. Then you can compare, diff and sum only in ns.
Correct handling of the first wrap (and only one; we don't need more) can be guaranteed only in the diff function, not in odp_time() and odp_time_cmp(). You cannot compare odp_time_t as wall time, only as ns.
But you probably want more than just ns.
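Ivan's point that only the diff operation can tolerate a wrap rests on unsigned modular arithmetic: subtracting two counter samples yields the correct interval across at most one wrap, while directly comparing the raw values does not. A minimal sketch:

```c
#include <stdint.h>

/* Interval in ticks between two 64-bit counter samples.  Unsigned
 * subtraction is correct across at most one wrap: if t2 wrapped past
 * zero after t1, (t2 - t1) still gives the true distance modulo 2^64. */
static uint64_t time_diff_ticks(uint64_t t1, uint64_t t2)
{
    return t2 - t1;
}
```

A raw comparison t2 < t1 would wrongly report the wrapped sample as earlier, which is exactly why cmp() on timestamps cannot be made wrap-safe.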


by reading and storing the counter value in ODP init.
Application needs only worry about nsec time wrap (is xxx_nsec_max large enough 
for its purposes / lifetime).

Sorry, but if the implementation can maintain nsec time which starts from 0, why does the application need to worry about nsec wrap? I mean only 64-bit counters and above.




For example, an application:
- needs high resolution
- low CPU overhead
- don't need to share the time
- needs to compare timestamps (e.g. sort a list based on timestamps)


SoC 1:
- local time counter is 32 bits and runs at core freq (local_hz == 1GHz, 
local_nsec_max == ~4sec)

32-bit again, but let's assume.


   => nsec time wraps every 4 sec
   => nsec time wrap is possible between any t1 and t2
   => cannot use cmp() for absolute time stamps

and not only for 32-bit, but also for a 64-bit counter that doesn't guarantee zero at init.


   => must use global time if need to compare ...
- global time counter runs at 25 MHz and wraps in 40 years (global_hz == 25 
MHz, global_nsec_max == ~40 years)
   => nsec time (that starts from zero by the spec) will not wrap in the life 
time of this application
   => can use cmp(), but must accept the low resolution. Done.

SoC 2:
- local time counter is 64 bits and runs at core freq (local_hz == 1GHz, 
local_nsec_max == ~580 years)

I tend not to limit the frequency to currently possible frequencies. Don't forget that the counter can be inited to any value.
The first wrap can happen at any time; we should account for such cases too.
It can also be configurable, and that's not safe from a bad configuration.
It can also be h/w-inited to 0, but on some emulator not; then - glitches.
Or some h/w timer test run beforehand can leave it in a non-zero state