Re: [lng-odp] RFC: New time API
Hi, Petri

We have to look at it proceeding from performance, platform portability
and simplicity.

If you want to split on hi-res time and low-res they must have separate
functions and not be under one common opaque time, in order to not break
hi-res measurements.

But in fact you split the same quality timers, further below...

On 02.09.15 18:21, Savolainen, Petri (Nokia - FI/Espoo) wrote:
> Hi,
>
> I think we need to restart the time API discussion and specify it with
> only wall time in mind.

Let's suppose.

> CPU cycle count APIs can be tuned as a next step. CPU cycle counters
> are affected by frequency scaling, which makes those difficult to use
> for counting linear, real time.
> The time API should specify an easy way to check and use the real,
> wall clock time. We need at least one time source that will not wrap
> in years - here it's the "global" time (e.g. in POSIX it's
> CLOCK_MONOTONIC).

Don't mix it with this API.
CLOCK_MONOTONIC is guaranteed by the OS, which can handle wraps; the OS
can use interrupts for that, you cannot.

> Also it must be zero at platform init and begin counting time when the
> application starts. For each application that starts.

Can you guarantee that it's inited to zero for each platform? I hesitate
to answer this question. But let's suppose that we can guarantee it. In
this case time should be aligned for all executed applications to start
from zero. Let's suppose some start_time = odp_time() at application
init. As it was noted earlier, in some fast loop, init_count must be
subtracted in the diff function. I'm not even talking about checking the
time type, as you are going to put all of them under one type.

> Global time can be also compared between threads. Local time is
> defined for optimizing short interval time checks.

In fact global has the same quality as local. As noted earlier, global
can be emulated with local. Why do we need to split them in this case?
It'll only add load on the user.

> It's thread local, may wrap sooner than global time, but may be lower
> overhead to use.
They are both 64-bit. You are going to use the same function for both;
the overhead is the same.

> There could be actually four time bases defined in the API, if we'd
> want to optimize for each use case (global.hi_res, global.low_res,
> local.hi_res and local.low_res). I'd propose to have only two and give
> a system specific way to configure those (rate and duration).

What do you mean "give a system way..."? Do you mean add some API? If
so, I disagree. It's not ODP's responsibility and it's not applicable to
every platform.

> Typical config would be global.low_res and local.hi_res. User can
> check hz and max time value (wrap around time) with odp_time_info()
> and adapt (fall back to use global time) if e.g. local time wraps too
> quickly (e.g. in 4 sec).

If this time wraps every 4s, it shouldn't be used at all... (any 32-bits)

> See the proposal under.
>
> -Petri
>
> //
> // Use cases
> //
> //          high resolution              low resolution
> //          short interval               long interval
> //          low overhead                 high overhead
> //
> //
> // global   timestamp packets or        | timestamp log entries or
> //          other global resources      | other global resources
> //          at high rate                | at low rate
> //                                      |
> //          ----------------------------+---------------------------
> //                                      |
> // local    timestamp and sort items    | measure execution time over
> //          in thread local work queue, | many iterations or over
> //          measure execution time      | a "long" function
> //          of a "short" function,      |
> //          spin and wait for a short   |
> //          while
> //
> //

I see no reason to overload the user with this stuff. In fact we always
need one hi-resolution time with the best quality, no matter what we
measure. No matter what resolution it has, it should be the max that
the platform can provide. At this moment all counters are 64-bit and
cannot wrap for years. In my opinion, we shouldn't take 32-bit counters
into account.
> // time in nsec
> // renamed to leave room for sec or other units in the future
> #define ODP_TIME_NS_USEC 1000ULL       /**< Microsecond in nsec */
> #define ODP_TIME_NS_MSEC 1000000ULL    /**< Millisecond in nsec */
> #define ODP_TIME_NS_SEC  1000000000ULL /**< Second in nsec */
> #define ODP_TIME_NS_DAY  ((24*60*60)*ODP_TIME_NS_SEC) /**< Day in nsec */
>
> // Abstract time type
> // Implementation specific type, includes e.g.
> // - counter value
> // - potentially other information: global vs. local time, ...
> typedef odp_time_t

This type can be added only with one aim - to ask the user to use the
appropriate API that can handle wraps correctly. In the other case, like
with the global time you are proposing (no wrap), uint64_t can be used;
no need to overload the API with odp_time_t and APIs like diff, cmp,
etc.

> // Get global time
> // Global time is common over
Re: [lng-odp] RFC: New time API
One correction, that also makes me worry a little.

On 03.09.15 01:29, Ivan Khoronzhuk wrote:
> [...]
>
> In fact global has same quality as local. As noted earlier, global can
> be emulated with local.

It's not always true. As we anyway will have some out of sync.
It can require periodically synchronizing the counters. We also cannot
guarantee that. That's why every multi-core board has to have a common
timer/counter on the SoC. In my case it runs at the same rate as the
arch ARM timer, which is local. I want to believe that holds for other
platforms also, but I can't be sure. Otherwise it's hard to guarantee
global time availability as well. So we can have a situation where the
local time API is the only variant here. For wall time, a common timer
or RTC should be used, I see no other variant. And it should be a
separate API, with no hard requirement.

> [...]
Re: [lng-odp] RFC: New time API
> -----Original Message-----
> From: ext Ivan Khoronzhuk [mailto:ivan.khoronz...@linaro.org]
> Sent: Thursday, September 03, 2015 1:29 AM
> To: Savolainen, Petri (Nokia - FI/Espoo); lng-odp@lists.linaro.org
> Subject: Re: [lng-odp] RFC: New time API
>
> Hi, Petri
>
> We have to look at it proceeding from performance, platform portability
> and simplicity
>
> If you want to split on hi-res time and low-res they must have separate
> functions and be not under one common opaque time in order to not break
> hi-res measurements.
>
> But in fact you split the same quality timers, farther below...

The API has two goals (as any other API under ODP)
- solve a user problem (take timestamps and work with those)
- enable good performance on multiple HW platforms (enable direct HW
  time counter(s) usage)

> On 02.09.15 18:21, Savolainen, Petri (Nokia - FI/Espoo) wrote:
> > Hi,
> >
> > I think we need to restart the time API discussion and specify it
> > with only wall time in mind.
>
> Let's suppose.
>
> > CPU cycle count APIs can be tuned as a next step. CPU cycle counters
> > are affected by frequency scaling, which makes those difficult to
> > use for counting linear, real time.
> > The time API should specify an easy way to check and use the real,
> > wall clock time. We need at least one time source that will not wrap
> > in years - here it's the "global" time (e.g. in POSIX it's
> > CLOCK_MONOTONIC).
>
> Don't mix it with this API.
> CLOCK_MONOTONIC is guaranteed by OS, that can handle wraps, OS can use
> interrupts for that, you cannot.

A monotonic, very long wrap around time source is an application
requirement. CLOCK_MONOTONIC is an example of solving the same
requirement in the POSIX world. Yes, an ODP implementation should not be
interrupt driven, but still an implementation can and likely will serve
some interrupts: in the worst case on a worker core, in a better case on
a control core and in the best case on a system core outside of the ODP
application (e.g. linux kernel on core #0).
The key is how often those interrupts need to be served and how long
serving takes in the worst case. E.g. one interrupt due to a counter
wrap every 5 years, on the core running the linux kernel, does not
matter much. Counter wraps are really an issue with short time counters.
Today most chips provide large enough counters, so that wrap around is
not really an issue (max one wrap in several years).

> > Also it must be zero at platform init and begin counting time when
> > the application starts. For each application that starts.
>
> Can you guarantee that it's inited to zero for each platform? I
> hesitate to answer on this question.

The API does not specify that the HW counter is reset at any point in
time. It specifies that wall time (nsec time) is zero at application
start up. In practice, the implementation needs to read the HW counter
once in start up and store it. Basic stuff.

> But let's suppose that we can guarantee it.
> In this case time should be aligned for all executed applications to
> start from zero.
> Let's suppose some start_time = odp_time() at application init.
> As it was noted earlier, in some fast loop, init_count must be
> subtracted in the diff function.
> I'm not even talking about checking the time type, as you are going to
> put all of them under one type.
>
> > Global time can be also compared between threads. Local time is
> > defined for optimizing short interval time checks.
>
> In fact global has same quality as local. As noted earlier, global can
> be emulated with local.
> Why do we need to split them in this case? It'll add load on the user
> only.
>
> > It's thread local, may wrap sooner than global time, but may be
> > lower overhead to use.
>
> They are both 64-bit. You are going to use the same function for both,
> overhead the same.

First, this as any other ODP API affects only one ODP application
(instance). Global time is global between threads of a single
application.
These are two different application use cases:
- global == time that can be shared between threads
- local  == time that does not need to be shared between threads

Ability to share is the key difference in quality.

Yes, when there's a SoC level, low latency, high frequency, 64-bit time
counter - it's sensible to use that to implement both local and global.
In this case, the implementation also avoids a check between global and
local time (it's all the same). I'm expecting that this is the common
case.

BUT, what if the SoC level HW counter has high latency to access (e.g.
150 CPU cycles) and is low frequency? And the HW would have a low
latency, high frequency per-CPU counter that you could use for counting
local time? If the API has only the global definition, you could not use
that HW resource even when the application is not interested in sharing
timestamps with other threads (needs only local time). The application
would run slower on your HW, since every (local) timestamp would consume
150 cycles instead of e.g. 1 cycle.
Re: [lng-odp] RFC: New time API
On 03.09.15 14:32, Savolainen, Petri (Nokia - FI/Espoo) wrote:
> [...]
>
> BUT, what if the SoC level HW counter has high latency to access (e.g.
> 150 CPU cycles) and is low frequency? And the HW would have low
> latency, high frequency per CPU counter that you could use for
> counting local time? If API has only global definition, you could not
> use that HW resource even when application is not interested in
> sharing timestamps with other threads (needs only local time). The
> application would run slower on your HW, since every (local) timestamp
> would consume 150 cycles instead of e.g. 1 cycle.
Re: [lng-odp] RFC: New time API
> > Time is linear - the API needs to support that. Application can
> > check if local time stays linear long enough for its use case.
>
> It doesn't sound like simplification. In the current variant the user
> doesn't need to worry about this.
>
> > If it does not, the global time should be the fall back (wrap only
> > after several years).
> >
> > Range is a relative term - ranges longer than the wrap around time
> > (in real time) would again cause problems.
>
> I think there's no need to compare ranges of more than years. It's not
> for this use-case.

For example, a 32 bit counter at 1 GHz (e.g. CPU local counter, used
for CPU local time) wraps in 4 sec.

> > One option would be to force all time sources to have very long wrap
> > around times, which may cause low resolution on all of them (not
> > only global). Maybe it's better to just specify that cmp() must not
> > be used if (nsec) time can wrap between t1 and t2.
>
> This also doesn't sound like simplification. How can the user know
> whether he is comparing wrapped time or not?
> He cannot - that's the problem. No one can. You cannot predict which
> points the user compares.
> You cannot emulate it in the implementation either (suppose the worst
> case - the implementation cannot guarantee the counter is inited to 0
> at board start), as the first wrap can happen any time, the second
> takes years. Relying on this makes all applications very configuration
> dependent.
> That is one of the main and bright examples that shows why we don't
> need to hide wraps.
> So this function cannot be used with timestamps at all, only ranges.
> To get a range, you must use the diff function; the diff function can
> handle wraps inside. That is. If you must use diff and cmp, then why
> bother with wall time?
>
> Why should the user still think about wraps, if you want to equalize
> it to wall time?
> Or even, use this function (it was one of your ideas) to check time
> order with a function that requires order..
> What about to not bother with the chicken/egg issue and always assume
> that wrap can happen, or cannot at all?
> Only describe in the API file that it must be > 10 years, for
> instance, before the first wrap.
> And if your application can run more than 10 years it can suddenly
> fail.
> Uh.. or add in the description: ..never change your dtb file to
> another init value or freq if you don't know what you are doing...
> otherwise your application can suddenly fail...
> It would be a threshold for orientation, and both implementation and
> application can rely on it. And hardly control it.

Implementation can handle a single counter wrap and maintain nsec time
which starts from 0, by reading and storing the counter value in ODP
init. Application needs only worry about nsec time wrap (is
xxx_nsec_max large enough for its purposes / lifetime).

For example, an application:
- needs high resolution
- low CPU overhead
- don't need to share the time
- needs to compare timestamps (e.g. sort a list based on timestamps)

SoC 1:
- local time counter is 32 bits and runs at core freq
  (local_hz == 1GHz, local_nsec_max == ~4sec)
  => nsec time wraps every 4 sec
  => nsec time wrap is possible between any t1 and t2
  => cannot use cmp() for absolute time stamps
  => must use global time if need to compare
  ...
- global time counter runs at 25 MHz and wraps in 40 years
  (global_hz == 25 MHz, global_nsec_max == ~40 years)
  => nsec time (that starts from zero by the spec) will not wrap in the
     life time of this application
  => can use cmp(), but must accept the low resolution. Done.

SoC 2:
- local time counter is 64 bits and runs at core freq
  (local_hz == 1GHz, local_nsec_max == ~580 years)
  => nsec time (that starts from zero by the spec) will not wrap in the
     life time of this application
  => nsec time wrap is not possible
  => can use cmp(). Done.

-Petri

_______________________________________________
lng-odp mailing list
lng-odp@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/lng-odp
Re: [lng-odp] RFC: New time API
Hi, Petri

On 04.09.15 13:55, Savolainen, Petri (Nokia - FI/Espoo) wrote:
> > > Time is linear - the API needs to support that. Application can
> > > check if local time stays linear long enough for its use case.
> >
> > It doesn't sound like simplification. In current variant user don't
> > need to worry about this.
> >
> > > If it does not, the global time should be the fall back (wrap only
> > > after several years).
> > >
> > > Range is a relative term - ranges longer than the wrap around time
> > > (in real time) would again cause problems.
> >
> > I think no need to compare ranges more than years. It's not for this
> > use-case.
>
> For example, a 32 bit counter at 1 GHz (e.g. CPU local counter, used
> for CPU local time) wraps in 4 sec

Seems we decided to not think about a 32-bit timer here. Right?
It's impossible to work with times longer than the time wrap, not only
with ranges. None of the time API functions works correctly in this
case, not only cmp().

> > > One option would be to force all time sources to have very long
> > > wrap around times, which may cause low resolution on all of them
> > > (not only global). Maybe it's better to just specify that cmp()
> > > must not be used if (nsec) time can wrap between t1 and t2.
> >
> > This also doesn't sound like simplification. How can the user know
> > whether he is comparing wrapped time or not?
> > He cannot - that's the problem. No one can. You cannot predict which
> > points the user compares.
> > You cannot emulate it in the implementation either (suppose worst
> > case - implementation cannot guarantee the counter is inited to 0 at
> > board start), as the first wrap can happen any time, the second
> > takes years. Relying on this makes all applications very
> > configuration dependent.
> > That is one of the main and bright examples that shows why we don't
> > need to hide wraps.
> > So this function cannot be used with timestamps at all, only ranges.
> > To get a range, you must use the diff function; the diff function
> > can handle wraps inside. That is. If you must use diff and cmp then
> > why bother with wall time?

Don't remove the main problem from this thread.
We need to put a dot here. So please, answer. Don't forget to mention
cmp() and that the first wrap can happen any time.

> > Why should user still think about wraps, if you want to equalize it
> > to wall time?
> > Or even, use this function (it was one of your ideas) to check time
> > order with a function that requires order..
> >
> > What about to not bother with chicken/egg issue and always assume
> > that wrap can happen, or cannot at all?
> > Only describe in API file, it must be > 10 years, for instance,
> > before first wrap.
> > And if your application can run more than 10 years it can suddenly
> > fail.
> > Uh.. or add in description: ..never change your dtb file to another
> > init value or freq if you don't know what you are doing... otherwise
> > your application can suddenly fail...
> > It would be a threshold for orientation and both implementation and
> > application can rely on it. And hardly control it.
>
> Implementation can handle single counter wrap and maintain nsec time
> which starts from 0,

Yes, implementation can, but only for ns. Then you can compare, diff and
sum only in ns. A correct first wrap (and only one, and we don't need
more) can be guaranteed only in the diff function, not in odp_time() and
odp_time_cmp(). You cannot compare odp_time_t as wall time, only ns. But
you probably want not only ns.

> by reading and storing the counter value in ODP init. Application
> needs only worry about nsec time wrap (is xxx_nsec_max large enough
> for its purposes / lifetime).

Sorry, if the implementation can maintain nsec time which starts from 0,
why does the application need to worry about nsec wrap? I mean only
64-bit counters and above.

> For example, an application:
> - needs high resolution
> - low CPU overhead
> - don't need to share the time
> - needs to compare timestamps (e.g. sort a list based on timestamps)
>
> SoC 1:
> - local time counter is 32 bits and runs at core freq
>   (local_hz == 1GHz, local_nsec_max == ~4sec)

32-bit, again, but let's assume.
> => nsec time wraps every 4 sec
> => nsec time wrap is possible between any t1 and t2
> => cannot use cmp() for absolute time stamps

and not only for 32-bit, also for 64-bit that doesn't guarantee zero at
some init.

> => must use global time if need to compare
> ...
> - global time counter runs at 25 MHz and wraps in 40 years
>   (global_hz == 25 MHz, global_nsec_max == ~40 years)
> => nsec time (that starts from zero by the spec) will not wrap in the
>    life time of this application
> => can use cmp(), but must accept the low resolution. Done.
>
> SoC 2:
> - local time counter is 64 bits and runs at core freq
>   (local_hz == 1GHz, local_nsec_max == ~580 years)

I tend to not limit freq to currently possible frequencies. Don't forget
that the counter can be inited to any value. The first wrap can happen
any time. We should count on such cases also. Also it can be
configurable, and it's not safe from a stupid conf. Also it can be h/w
inited at 0, but on some emulator, not. Then - glitches. Run before some
h/w timer test, that can leave it in a non zero stat