Re: [riot-devel] ztimer - a new high-level timer for RIOT

Kaspar Schleiser Wed, 11 Dec 2019 02:57:58 -0800

Hi Michel,

thanks for all that input. It is *a lot*. I guess it is a complex subject..

Would it make sense to make a micro conference? Get everyone interested
in improving timers in a room and lock it until solutions are presented?

On 12/10/19 6:23 PM, Michel Rottleuthner wrote:
>> RIOT needs an easy to use, efficient and flexible timer system that
>> allows precise high-frequency (microsecond scale) timings *alongside*
>> low-power timers.
>>
>> For example, an application might include a driver requiring
>> high-frequency (microsecond scale) timings, but at the same time needs
>> to run on batteries and thus needs to make use of low-power timers
>> that can wake up the device from deep sleep.
>>
> 
> I fully agree that this currently is a problem and it needs to be resolved.
> But what this statement essentially points out, is that the xtimer API
> either misses
> an instance parameter to be used on different low level timers or that it
> misses the functionality to internally handle multiple low level timers
> to multiplex
> timeouts to the (different) available hardware instances to fulfill
> low-power and high precision sleep requirements.

Agreed.

Adding the instance parameter is not trivial. The implementation needs
to provide the underlying functionality. xtimer currently doesn't allow
that. Much of it needs to be rewritten.

Handling this internally is IMO not feasible for reasons I will point
out later.

> The problem statement implies nothing related to the rest of the high
> level timer API and design at all.
> Thus, it is not a problem statement that shows ztimer is our only option.
> 
> *efficient*
> -in which dimensions? Do we favor small RAM, ROM, CPU overhead?
> -how do we prefer this to scale if more or less timers are used?
> -does it make sense to decide for only one of these things at all? (one
> implementation may not suit everyone)
> -just thinking: would it hurt to have one memory efficient
> implementation and one that is "bigger, but faster"?
> 
> *flexible*
> -in level of manual control it provides?
> -in level of automatic functionality and abstraction it provides?
> 
> *precision (& accuracy)*
> -the bounds are defined by the hardware.
> -the question is how do we come as close as possible to the capabilities
> the HW provides.
> -where do we need trade-offs?
> 
> *low-power*
> -dependencies are mostly defined by the hardware
> -How do we model these dependencies?
> -for our power management we try to do things implicit.
>     -how does this work together with a timer that needs to be called
> with explicit instances?
>     -why should the timer API differ from that principle?
> 
> 
> The following general questions pop up:
> -What does the hardware provide that is abstracted by this high level
> timer?
> -How much abstraction do we need?
> -Where do we need trade-offs?
> -How do we balance them?
> -Based on what information?
> 
> Please don't get me wrong, I'm not in principle for or against xtimer,
> ztimer (or even wtimer, the whatever timer;)
> Yes, 64 bit time intuitively doesn't sound like a perfect fit for the
> IoT, but how does this translate to numbers (also different scenarios)?
> Yes, xtimer is broken. Yes, it needs fixing or replacement.
> *But: the functional problems related to xtimer are not related to it's
> API, it is the implementation!*
> 
> 
>> General requirements:
>> - very efficient timers for use in time-critical drivers
> 
> This statement touches memory, runtime overhead and precision, but isn't
> precise on how to balance between them.

These are all very valid questions.

IMO, coming up with definite answers is quite difficult, unless we move
towards defining bounds.

(Assuming xtimer would be stable / reliable), performance wise, it is
*in the acceptable range* on RAM, ROM, cycle use. We have not yet seen
an application that was in any way limited by it, apart from someone
trying to use it for bit banging with the sub-10-us timings, which as I
described in another mail is (probably) just not doable with a callback
based high level timer.

With "acceptable range" I mean that any implementation that is in the
same ballpark (+- maybe 20%) would be acceptable (if it provides the
necessary functionality). The difference would just don't matter in
practice. Smaller and faster aabsolutely becomes a *nice to have*
compared to the *musts* of reliability, accuracy, low-power friendlyness
and the general possibility to be usable for a high percentage of our
applications.

For (RAM, ROM, CPU) performance, I'd not ditch xtimer. It performs
alright (if it works).

Maybe we can agree that xtimer's performance tradeoffs so far have not
shown to be wrong.

>> easy-to-use interface (unified interface)
> 
> Very much a matter of taste, but also what is the importance of "unified"?
> If there are two completely distinct use-cases (requirements) why
> enforce unified API for that?

With "unified" I actually mean something that xtimer already does: no
matter the timing needs, "xtimer_set(t, period)" has you covered. That
was one of the design ideas of xtimer, and I think it worked quite well
from a user perspective.

Compared to that, non-unified would be what AFAIK contiki is doing,
which has a whole alphabet of timers (not timer history. :), and some
can be used to schedule protothreads, others for other stuff.

Or, in RIOT, applications that do deep sleep in RIOT currently use
RTC/RTT directly, with a different API. If only "sleep x RTT ticks" is
needed, that is an unnecessary difference compared to z/xtimer.

I agree that an RTC API using hh,mm,ss alarms should not be dropped and
every user be forced to use "xtimer_sleep(epoch_target_value)".

>> - adaptable to varying configurations of timers, RTTs, RTCs
>> (use RTC if available for super-long-time timers)
>> - this means that applications and / or system modules need to be able
>> to set timers on different timer hardware (configurations)
>> *simultaneously*
> 
> To me this feels like it is asking for an API below the high level timer
> that fits various kind of hardware (?).

Not at all. xtimer would do fine, if it had a parameter allowing to
specify the clock, and a backend API, ....

periph_timer IMO should be the slimmest layer of hardware abstraction
that makes sense, so users that don't want to do direct non-portable
register based applications get the next "closest to the metal".

> We currently miss such an API. Though, an extended periph_timer could be
> used for most (all?) of it.

What would that extension look like? Would it add a "clock" parameter so
it can deal with "varying configurations of timers, RTTs, RTCs"? Would
it do any kind of timer width extension? Would it add multiplexing?
Would it implement frequency conversion?

Why can't xtimer solve this? (I think ztimer does.)

>> API (necessary functionality):
>> - Wait for a time period (e.g. timer_usleep(howlong))
>> - receive message after given period of time (e.g. timer_msg(howlong))
>> - receive message periodically
>> - Await a point in time
>> - Wait for a (past) time + x
> 
> I think we shouldn't start on the API level when thinking about
> functionality.

I think it wasn't clear that all the requirements in that section were
taken from what we collected before implementing xtimer...

> From a functional point of view there are four(?) different uses for
> high level timers:
> (1) delay execution (semantic: "wait at least...")
> (2) timeout an operation (semantic: "at some not too precise time after,
> but eventually...")
> (3) schedule an action (semantic: "try to hit an exact point in time")
> (4) measure time (semantic: how much time passed between t1 and t2)
> 
> Xtimer doesn't leave much to be desired from that perspective.
> Nonetheless, I think we should try to write down the requirements for
> the above things more precisely.
> How these functionalities are exposed thru an API is a different story.

Agreed.

>> Regarding the general requirements, current __xtimer is not "adaptable
>> to varying configurations of timers, RTTs, RTCs"__.
>>
>> While xtimer can be configured to use low-power timer hardware instead
>> of the (default) high-precision timer hardware, an application needs
>> to *choose* either low-power, low precision timers *or* high-precision
>> timers that prevent the system from sleeping, *system wide*.
>>
>> If xtimer is configured to use a low-power timer, it forces it to
>> quantize *all* timer values to the frequency of the low-power timer,
>> which usually has a low frequency (usually 32kHz).
>>
> As already pointed out regarding the problem statement:
> This only points out that we either need to add an instance parameter to
> the API (for explicit control)
> or that xtimer needs to have access to multiple low level timers
> internally (for implicit adaption).
> As a stupid example, using a perfectly working ztimer and wrapping it in
> the API of xtimer like that:
> 
> xtimer_xxx(uint64_t time) {
>     if (time < whatever_threshold) {
>         ztimer_xxx(HIGH_FREQ_INSTANCE, time);
>     } else{
>         ztimer_xxx(LOW_FREQ_INSTANCE, time / somediv);
>     }
> }
> 
> Clearly this is very simplified but you get the idea.

Yeah, but simplified doesn't cut it. Sleeping one second on an 1HZ timer
is a different thing than sleeping 1000ms on an ms timer. Even with a
perfect implementation, the former will sleep anything from zero to two
seconds. The latter anything from 999 to 1001ms, if they each need to
work with hardware of 1Hz resp. 1000Hz. There's not much to be done there.

That is one of the main issues with an API that doesn't have the clock
parameter, but a fixed (probably high frequency) frequency, as xtimer has.

> I'm not saying this is smart or we should do that, I just want to point
> out that whatever is inside xtimer can be fixed.

Yes, but that might be more work than to start from scratch.
If fixing includes rewriting or fundamentally changing most of the code
and / or concepts, that should not be called a fix, but a rewrite.

> And also it is not absolutely needed to change all things
> (implementation and API) at once.
> Saying that we need to replace it because it is not possible to fix it
> is simply not true.

Agreed. But saying that fixing it might be more work than rewriting it
might be valid.

Also, we're not talking about just fixing reliability issues or bugs.
That certainly can be done.
We're talking about fundamental issues with the API and the underlying
implementation.

> The above example brings me to another thing.
> Did we actually decide and agree that it is smart to force the
> app/developer to decide which timer instance to use?

I think we unfortunately did not decide on anything...

> For timeouts that are calculated at runtime do we really want to always
> add some code to decide which instance to use?

If there are multiple instances, there is code that selects them.
The question would be, do we want

a) to provide an xtimer style API that is fixed on a high level, combine
with logic below that chooses a suitable backend timer

or

b) add a "clock" parameter that explicitly handles this distinction.

> Then, how to write platform independent code if you don't know which
> instances will be available and what quality characteristics it comes with?
> 
> Is the instance name combined with "convention" as only meta-information
> on a timer instance sufficient?
> ->ZTIMER_USEC suggests USEC precision, but how about accuracy, low power
> dependencies etc. ?
> I'm not saying all of this can be solved easily but I think we should
> discuss and document that.

Yes. I think I have started some.

Something along the lines of:

1. ZTIMER_USEC provides at least +-10 us accuracy
2. ZTIMER_USEC prevents sleep if a timeout is set

3. ZTIMER_MSEC provides at least +-2ms accuracy
4. if the hardware supports it, it will wakeup the MCU from sleep

5. ZTIMER_SEC provides +-1 second accuracy
6. if the hardware supports it, it will wakeup the MCU from sleep

This already covers *a lot of our timing needs*, and can easily be
provided (configured automatically). Providing much more is difficult,
and is matter of configuration and documentation. Unless we want runtime
queriable characteristics.

Compare this to the current state:

1. xtimer provides +-31us accuracy and will not wakeup from sleep

>> Worse, much code in RIOT uses xtimer with the expectation of it being
>> high-precision, and breaks in unknown ways if suddenly used with
>> low-frequency timers through the microsecond based API. The API does
>> not reflect this distinction.
> 
> A converted ztimer instance named ZTIMER_USEC has similar problems.

Well, if it is the only choice. But ztimer allows multiple clocks.
It is possible to have ZTIMER_USEC and ZTIMER_MSEC. Applications would
choose the clock which minimally fulfills the timing needs. The
demanding driver sticks to ZTIMER_USEC, and gnrc can convert all
`xtimer_set(x * US_PER_SEC)` to `ztimer_set(x * MS_PER_SEC)` and use the
low power timer, *at the same time*. (that conversion can be done using
coccinelle).

> Why not let the application tell what kind of requirements it has for a
> timer (e.g. with flags)
> and let our high level timer do it's high level stuff to automatically
> map it to what is at hand?
> If we don't want that, are there valid reasons?

We can do that at compile time (when configuring the clocks), and flags
become the clock parameter.

"let our high level timer do it's high level stuff to automatically map
it to what is at hand" is maybe possible.

> Also keep in mind that some code like that will be required anyway for
> runtime calculated timer values
> if we want to make use of low power capable timers.

This I don't get.

>> - it is not ISR safe (#5428, #9530, #11087)
>> - it doesn't have unittests (#10321)
>> - it has some underflow bugs and race conditions (#7116, #9595, #11087)
>> - there's a PR trying to use multiple backend drivers (#9308)
> 
> Please don't tell us this is not-fixable by design. If so, what is it
> that makes these unfixable?

What means *fix*? If I rename ztimer to xtimer, would that could as "fix"?

>> - it sets target timers using absolute target times, making it prone
>> to underflows (#9530 fixes this?)
> 
> Yes its a problem and #9530 shows one possible solution for fixing it.

Yeah, I know. That's why I added the PR's "trying" to fix behind issues.

>> - it forces all timers to 1us ticks at the API level, which in turn
>> forces internal 64bit arithmetic
> 
> No one said we can not add to and adapt the API if requirements are not
> met.

... requiring possibly substantial changes.

> This doesn't require us to start from scratch.

Look, xtimer has around 1k lines of code. *if you know what you are
doing*, you can write those in a week or two, from scratch. *if you
don't*, it takes much longer.

Same goes for "fixing".

What I'm trying to say is that an implementation started from scratch
does not start from scratch in terms of concepts or experience.

> Also when introducing xtimer 1us ticks was considered good enough now it
> is a bad thing.
> What changed?

We put our theories and talking into code, then used that code for a
while, gaining experience.

> The IoT hardware? Our requirements? Your opinion?
> Can we write that down? What are the assumed conditions and scenarios
> for this to be true?
> What are the measurable ups and downs here?

We are talking about the implementation, right?
How many us are one 32768kHz tick? Something around 30.517578125.
when used as internal normalization base, this is weird.

>> - there's no clear path for implementing higher-level functionality
>> while keeping the API, e.g., for implementing a network-wide
>> synchronized clock
> 
> I'm not sure if I understand that. What is the missing clear path and
> how does ztimer provide it?

The clear path is knowing exactly how to implement it. In ztimer, a
clock that's synchronized via network can be a module storing some state
(offset, frequency difference), and re-use ztimer_core, to provide a
timer base that is synchronized, and that an application can be made to
use, using the same API.

How would you implement that using xtimer?

>> With API additions, the forced 1us timer base *could* be removed,
>> making the 64bit arithmetic unnecessary if done right. This would
>> require a tremendous effort and probably touching *every line* in xtimer.
> 
> Rather starting completely from scratch instead of touching existing
> code in master shows how much we can trust the code that is sitting there.
> IMO we can not improve this situation and create maintainable code if we
> always fall back to this "throwaway&rewrite" type of development process.

I don't agree at all. It is much worse to unnecessarily stick to some
code than to correctly trade off the effort to re-write.

And, *all* timer rewrites so far have *substantially* improved upon
their predecessors. *none* of them made it worse in any way.

Also, while the implementation might have been started from scratch, it
builds upon the experience of writing xtimer, ..., and on the updated
experience of where its pain points were.

>> ztimer solves all of xtimer's identified issues.
> 
> Thats impressive and I hope for the best, but I'm at least skeptical
> about not
> introducing new problems we currently don't have.

That's valid. I assure you that anyone trying to "fix" xtimer will
realize about issues they don't know they're having.

>> When compiled, ztimer's base configuration (1MHz periph timer, no
>> conversion) uses ~25% less code than xtimer (measured using direct
>> port of `tests/xtimer_msg` to ztimer).
> 
> Cool. With code you mean ROM? How about RAM?

Yes.

> Since you mention "without conversion" how does this change if
> conversion is enabled?

I can only make statements on specific configurations. Currently, a
ztimer_periph with an ztimer_convert_frac on top, on Cortex-M, is
slightly larger than xtimer, but that's because the frac values are
calculated at run time, which is unnecessary and easy to fix.

ALso, it is easy to implement different conversion modules. There's
already one conversion module that uses 64bit division (which I used for
testing).

> What is the cost of enabling the extension module?
> Also how does this change over different platforms and when using
> multiple timers in various scenarios?

It changes very differently depending on the scenario.

>> - its API only allows 32bit timeouts, vs. full 64bit in xtimer. This
>> allows much simpler (and efficient) arithmetics and uses less memory.
> 
> While I agree this intuitively sounds to be true, it wouldn't hurt to
> collect some numbers in the design document on how this actually
> compares in the different scenarios.

I made some measurements, actually the 32bit code as used is a little
less efficient (~10%).

>> It has been observed that no known hardware can provide both 64bit
>> range and precision, and more importantly, no application needs it.
> 
> Thats not true. Counter example: esp32 has 64 bit timers.

Ok, let me rephrase: No known hardware provides microsecond accuracy
over 64bit range. I'm 100% sure of that, but I'm happy to be proven
wrong. Which will be difficult regarding the time span that proof involves.

> Also there are other platforms that allow chaining together timers to
> get one 48 or 64 bit timer.
> That no application actually needs this is maybe just your opinion.
> Or do you say the hardware manufacturers put this into their chip only
> because there was still some silicon space left?

Specs look good on paper. Anyhow, I agree having >32bit counters is
maybe useful to have.

> Yet, we should try to compile a list of "application's actual needs" to
> see if we are missing something.
> I also offer my help for that.

Yes, but as apart from the ones that jump into our faces (reliability,
possibility to sleep), we don't even know, why bother?

>> - ztimer stores a timer's offset relative to the previous timer. This
>> means the time left until a timer expires cannot be calculated by
>> taking `absolute target time - now`, but needs to be summed up.
>> Essentially, a hypothetical `ztimer_left(clock, timer)` becomes an
>> O(n) operation.
> 
> This, and similar things related to the relative offset concept are
> things we should come up with
> extensive benchmark data of different scenarios.
> We need quantifiable information -arguable facts- to see what are the
> ups and downs of a design decisions.

Agreed.

>> - ztimer disables, as does xtimer, IRQs while manipulating its timer
>> linked lists. Thus the time spent with ISRs disabled is depending on
>> the amount of timers in the list, making ztimer real-time unsafe.
>> (there are ideas to fix this, see Outlook section below)
> 
> What about mutex instead of irq_disable for all the time?

Exactly, that would reduce the "downtime" during list traversion.

>> - Idea: we specify that by convention, ZTIMER_MSEC (and ZTIMER_SEC)
>> *keep running in sleep mode*, whereas ZTIMER_USEC stops when the MCU
>> enters sleep (unless a timeout is scheduled). This is current
>> behaviour *if* ZTIMER_USEC is using periph_timer as backend and
>> ZTIMER_MSEC is using RTT/RTC.
> 
> Wouldn't it be a bit counter intuitive to tie the capabilities to the
> instances if sometimes one is modeled by another (that doesn't have
> these capabilities)?

Yeah, agreed. If ZTIMER_USEC is just a converted 32kHz timer, we're back
to xtimer-on-RTT. Maybe in this case, ZTIMER_USEC should just not be
provided, and be guarded by a feature.

>> This would mean that `before = ztimer_now(clock); do something; diff =
>> ztimer_now(clock) - before;` only works if either `do_something` does
>> not schedule away the thread causing sleep *or* a clock is used that
>> runs in sleep mode.
> 
> The "does not schedule away" is probably hard to impossible to ensure
> and not what I would intuitively
> expect when measuring the execution time.
> Shouldn't a high level timer handle this for the user?

How, without extra API, other than keeping the clock active and blocking
lpm?

> Yes, but how to do that with 32 bit if "clock" is based on a fast
> running timer?
> 
> 
>> - the behaviour could be accessible either through defines
>> (ZTIMER_USEC_LPM_MODE or ZTIMER_USEC_DEEPSLEEP ...), *or* be made part
>> of the ztimer API
> 
> +1 for "made part of the API" over "through defines"

Honestly I don't think either is useful. What's the use case for runtime
querying of timer capabilities? Anyhow, that's easy to add to any
implementation.

>> - in addition, we could add functions to explicitly tell the clocks to
>> stay available until released, e.g., `ztimer_acquire(clock); before =
>> ztimer_now(clock); do something; diff = ztimer_now(clock) - before;
>> ztimer_release(clock);`.
>>     Once the "if timer is scheduled, don't sleep" is implemented, this
>> could also be worked around by:
>>     `ztimer_set(clock, dummy, 0xffffffff); ...; ztimer_cancel(clock,
>> dummy);`
> 
> This looks like a workaround for not considering the use case of
> "measuring time" beforehand.

Not really. Ideally, the high-freq timer is *powered down*. But then
"now()" doesn't work. There are not many options that don't involve
special API.

> Going a bit away from the technical stuff, there are also some things to
> discuss regarding the process.
> 
> TL;DR for the below: shoud we really prefer "throw out & rewrite" over
> "making our code maintainable"?

TL;DR the answer can only be "it depends".

> On 12/9/19 9:25 PM, Oleg Hahm wrote:
> 
>> (...) is apparently not fixable (at least no one came up with a concrete
>> proposal to fix it to the best of my knowledge). 
> 
> I disagree. No! Don't shout "OK, then go fix it!" ;).

Seriously, do.

> There where a lot of people providing fixes, testing and trying to
> improve the situation.
> All of them failed. Neither because all the ideas weren't good nor
> because the solutions weren't working.
> It was because we as a community didn't care enough, didn't put high
> enough priority on it, the code being a complex mess,
> and generally: ain't no body got time for that.
> These problems are serious and we should be able to improve on that
> without always falling back to "starting from scratch" [1].

What about "plan to throw away your first (and possibly second)
implementation"?

We're not talking about simple bug fixes to xtimer. We're talking about
serious refactoring and conceptual changes.

Doing those incrementally will take *ages*. Just note how long #9530 has
been open. To quote you: "I'm still struggling a bit to fully understand
all the changes we'll get there somehow. Just many changes at once and
many side effects - since you wrote and also debugged this, you probably
know what I mean ;)"

And those fixes are just *the tip of the ice berg*. There's not even a
plan for multi-timer support.

All those changes would need to be evaluated, reviewed, tested, ...
All of those in a codebase like xtimer (complex to begin with).

> I also want to recap a thing that was put up on the last summit.
> When xtimer was introduced, it was expected to improve whatever problems
> vtimer had.
> I don't know how vtimer performed, but oh boy, if it was worse than
> xtimer nowadays I'm glad I came to use RIOT after that^^
> What I actually want to say: having a solution at hand that is thought
> to solve all our problems doesn't imply it will.
> Throwing out xtimer because now there is ztimer sounds very much like
> history repeating itself.

As said above, vtimer improved *substantially* over swtimer. xtimer
improved *substantially* over vtimer. Every rewrite was a huge
improvement and did not have noticable downsides. And each time, noone
had either the knowledge or skills or interest to provide a better solution.

If we get the same kind of improvements from ztimer over xtimer, I'd be
super happy.

> If now the same thing happens with ztimer, we didn't learn from the past.

If what happens? If in 5 years, we have learned where ztimer doesn't cut
it and come up with ztimer+ (for lack of letters) that improves the
situation substantially, again?

> If ztimer solves all the problems, we didn't learn either:
> We weren't capable of isolating, fixing and documenting the core problems.
> We weren't agile. We didn't improve and evolve our code, we just
> replaced it without really knowing if and why this is required.
> "Because xtimer didn't work" is obviously not what I'm searching for
> here, and the "64 bit ain't nothin' for the IoT" discussion is
> independent of "making a high level timer functional".

We don't improve and evolve code, we improve and evolve a system.

> Digging down into all these nasty concurrency problems, the low level
> timer stuff and all that
> on the huge number of platforms is really not something you want to do.
> Kaspar already did this for more time than a human being deserves,
> again, kudos to you!
> I really want to help here, but I'm not a big fan of excitedly jumping
> over to ztimer without really
> thinking thru what we are doing here, why we do that, and at what cost.

I think, if we want to have a proper low-power timer story *soon*, we
should go with an implementation that provides this. If that
implementation is within bounds of acceptable performance metrics (or
just *better than what we have*), and can be shown do be at least as
reliable (bug-free) as what we have, and transition is painless, that
should be done.

Any discussion on what would have been better can continue in parallel.

At some point, we need to be pragmatic.

> Sorry for this text-monster and thanks for your time!

Dito. These mails take hours to write. Can we have a meeting?

Kaspar
_______________________________________________
devel mailing list
devel@riot-os.org
https://lists.riot-os.org/mailman/listinfo/devel

Re: [riot-devel] ztimer - a new high-level timer for RIOT

Reply via email to