Hi,

Thanks for starting this! It's very much appreciated.
Discussing these things, reaching common ground and documenting decisions and findings during this process is IMO one of the most important things to do before we move on.

I'm really sorry to write this wall of text, but there are so many aspects of this timer topic where
we IMO lack a fact-driven analysis, decision, and development process.
This is combined with a lack of documentation of decisions and their implications.

TL;DR:
Let's try to decide what is good for our high level timer based on measurable facts. For that we need detailed knowledge of the hardware we target, the software that will use the high level timer, and measurements/benchmarks that tell us the ups and downs of different ways of implementation.

Side note: I think we should move the discussion to an RDM PR for the design document and discuss there, so we don't get lost. In the much too long version below I comment mostly on the design document; quotes from other sources are marked as such.

RIOT needs an easy to use, efficient and flexible timer system that allows precise high-frequency (microsecond scale) timings *alongside* low-power timers.

For example, an application might include a driver requiring high-frequency (microsecond scale) timings, but at the same time needs to run on batteries and thus needs to make use of low-power timers that can wake up the device from deep sleep.


I fully agree that this currently is a problem and it needs to be resolved.
But what this statement essentially points out is that the xtimer API either lacks
an instance parameter for selecting different low level timers, or that it
lacks the functionality to internally handle multiple low level timers, multiplexing timeouts onto the available hardware instances to fulfill both low-power and high-precision sleep requirements.

The problem statement implies nothing related to the rest of the high level timer API and design at all.
Thus, it is not a problem statement that shows ztimer is our only option.

*efficient*
-in which dimensions? Do we favor small RAM, small ROM, or low CPU overhead?
-how do we prefer this to scale when more or fewer timers are used?
-does it make sense to settle on only one of these at all? (one implementation may not suit everyone)
-just thinking: would it hurt to have one memory-efficient implementation and one that is "bigger, but faster"?

*flexible*
-in the level of manual control it provides?
-in the level of automatic functionality and abstraction it provides?

*precision (& accuracy)*
-the bounds are defined by the hardware.
-the question is how we get as close as possible to the capabilities the HW provides.
-where do we need trade-offs?

*low-power*
-dependencies are mostly defined by the hardware
-How do we model these dependencies?
-for our power management, we try to do things implicitly.
    -how does this work together with a timer that needs to be called with explicit instances?
    -why should the timer API differ from that principle?


The following general questions pop up:
-What does the hardware provide that is abstracted by this high level timer?
-How much abstraction do we need?
-Where do we need trade-offs?
-How do we balance them?
-Based on what information?

Please don't get me wrong, I'm not in principle for or against xtimer, ztimer (or even wtimer, the whatever timer ;)). Yes, 64 bit time intuitively doesn't sound like a perfect fit for the IoT, but how does this translate to numbers (also for different scenarios)?
Yes, xtimer is broken. Yes, it needs fixing or replacement.
*But: the functional problems related to xtimer are not related to its API, it is the implementation!*


General requirements:
- very efficient timers for use in time-critical drivers

This statement touches memory, runtime overhead, and precision, but isn't precise about how to balance them.


easy-to-use interface (unified interface)

Very much a matter of taste, but also: what is the importance of "unified"?
If there are two completely distinct use cases (requirements), why enforce a unified API for them?


work with varying MCU timer widths (16, 24, 32-bit timers)

Agree, an absolute must! (though I'd rather specify it more broadly as "any width")


- adaptable to varying configurations of timers, RTTs, RTCs
(use RTC if available for super-long-time timers)
- this means that applications and / or system modules need to be able to set timers on different timer hardware (configurations) *simultaneously*

To me this feels like it is asking for an API below the high level timer that fits various kinds of hardware (?). We currently lack such an API, though an extended periph_timer could be used for most (all?) of it.
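For reference, this is roughly what periph_timer offers today (signatures quoted from memory, so double-check against drivers/include/periph/timer.h); an "extended periph_timer" would have to grow from here:

int timer_init(tim_t dev, unsigned long freq, timer_cb_t cb, void *arg);
int timer_set(tim_t dev, int channel, unsigned int timeout);
int timer_set_absolute(tim_t dev, int channel, unsigned int value);
int timer_clear(tim_t dev, int channel);
unsigned int timer_read(tim_t dev);
void timer_start(tim_t dev);
void timer_stop(tim_t dev);

Presumably what is missing for the above is mainly meta-information per instance (width, frequency, low-power behavior) and a way to enumerate instances.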


API (necessary functionality):
- Wait for a time period (e.g. timer_usleep(howlong))
- receive message after given period of time (e.g. timer_msg(howlong))
- receive message periodically
- Await a point in time
- Wait for a (past) time + x

I think we shouldn't start on the API level when thinking about functionality. From a functional point of view there are four(?) different uses for high level timers:
(1) delay execution (semantic: "wait at least...")
(2) timeout an operation (semantic: "at some not too precise time after, but eventually...")
(3) schedule an action (semantic: "try to hit an exact point in time")
(4) measure time (semantic: how much time passed between t1 and t2)

Xtimer doesn't leave much to be desired from that perspective.
Nonetheless, I think we should try to write down the requirements for the above uses more precisely.
How these functionalities are exposed through an API is a different story.
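Just to illustrate (roughly, using today's xtimer API; message handling details omitted), the four uses map onto calls like this:

#include "msg.h"
#include "thread.h"
#include "xtimer.h"

static void four_uses(void)
{
    /* (1) delay execution: "wait at least ..." */
    xtimer_usleep(1000);

    /* (2) timeout an operation: receive a message after a given offset */
    xtimer_t t;
    msg_t m = { .type = 0 };
    xtimer_set_msg(&t, 500 * 1000, &m, thread_getpid());

    /* (3) schedule an action: periodic wakeup aims at exact points in time */
    xtimer_ticks32_t last = xtimer_now();
    xtimer_periodic_wakeup(&last, 1000 * 1000);

    /* (4) measure time: difference of two timestamps */
    uint32_t t1 = xtimer_now_usec();
    /* ... */
    uint32_t elapsed = xtimer_now_usec() - t1;
    (void)elapsed;
}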


Regarding the general requirements, current __xtimer is not "adaptable to varying configurations of timers, RTTs, RTCs"__.

While xtimer can be configured to use low-power timer hardware instead of the (default) high-precision timer hardware, an application needs to *choose* either low-power, low precision timers *or* high-precision timers that prevent the system from sleeping, *system wide*.

If xtimer is configured to use a low-power timer, it forces it to quantize *all* timer values to the frequency of the low-power timer, which usually has a low frequency (usually 32kHz).

As already pointed out regarding the problem statement:
This only points out that we either need to add an instance parameter to the API (for explicit control) or that xtimer needs access to multiple low level timers internally (for implicit adaptation). As a stupid example, take a perfectly working ztimer and wrap it in the xtimer API like this:

/* hypothetical wrapper, placeholder names: dispatch each timeout to a
 * fast or a low-power ztimer clock depending on its length */
static void xtimer_xxx(uint64_t time)
{
    if (time < WHATEVER_THRESHOLD) {
        /* short: high-frequency clock for precision */
        ztimer_xxx(HIGH_FREQ_INSTANCE, (uint32_t)time);
    } else {
        /* long: low-frequency clock that allows deep sleep */
        ztimer_xxx(LOW_FREQ_INSTANCE, (uint32_t)(time / SOME_DIV));
    }
}

Clearly this is very simplified, but you get the idea.
I'm not saying this is smart or that we should do it; I just want to point out that whatever is inside xtimer can be fixed. It is also not absolutely necessary to change everything (implementation and API) at once. Saying that we need to replace it because it is impossible to fix is simply not true.

The above example brings me to another thing.
Did we actually decide and agree that it is smart to force the app/developer to decide which timer instance to use? For timeouts that are calculated at runtime, do we really want to always add some code that decides which instance to use? And how do you write platform independent code if you don't know which instances will be available and which quality characteristics they come with?

Is the instance name, combined with "convention", sufficient as the only meta-information on a timer instance? ZTIMER_USEC suggests USEC precision, but what about accuracy, low-power dependencies, etc.? I'm not saying all of this can be solved easily, but I think we should discuss and document it.


Worse, much code in RIOT uses xtimer with the expectation of it being high-precision, and breaks in unknown ways if suddenly used with low-frequency timers through the microsecond based API. The API does not reflect this distinction.

A converted ztimer instance named ZTIMER_USEC has similar problems.

Why not let the application state what kind of requirements it has for a timer (e.g. with flags) and let our high level timer do its high level job of automatically mapping that to what is at hand?
If we don't want that, are there valid reasons?
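To make that idea concrete, a minimal sketch; every name in it is made up for illustration, nothing like this exists today:

/* hypothetical: the application states required properties as flags,
 * the high level timer maps them to the best available backend */
typedef struct timer_clock timer_clock_t;   /* hypothetical opaque type */

typedef enum {
    TIMER_REQ_LOW_POWER      = 0x01,  /* must keep running in deep sleep */
    TIMER_REQ_HIGH_PRECISION = 0x02,  /* microsecond-level accuracy */
} timer_req_flags_t;

/* returns the best matching clock instance, or NULL if none qualifies */
const timer_clock_t *timer_request(unsigned flags);

/* usage: platform independent code just states what it needs */
const timer_clock_t *clk = timer_request(TIMER_REQ_LOW_POWER);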

Also keep in mind that some code like this will be required anyway for timer values calculated at runtime,
if we want to make use of low-power capable timers.


- it is not ISR safe (#5428, #9530, #11087)
- it doesn't have unittests (#10321)
- it has some underflow bugs and race conditions (#7116, #9595, #11087)
- there's a PR trying to use multiple backend drivers (#9308)

Please don't tell us these are unfixable by design. If they are, what exactly makes them unfixable?


- it sets target timers using absolute target times, making it prone to underflows (#9530 fixes this?)

Yes, it's a problem, and #9530 shows one possible way to fix it.


- it forces all timers to 1us ticks at the API level, which in turn forces internal 64bit arithmetic

No one said we cannot extend and adapt the API if requirements are not met.
This doesn't require us to start from scratch.
Also, when xtimer was introduced, 1us ticks were considered good enough; now they are considered a bad thing.
What changed? The IoT hardware? Our requirements? Your opinion?
Can we write that down? What are the assumed conditions and scenarios under which this is true?
What are the measurable ups and downs here?
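One measurable "down" we could write up directly: a fixed 1us API forces a conversion on every non-1MHz backend, and the intermediate product overflows 32 bit quickly. A small sketch, assuming a 32768Hz low-power clock:

/* usec -> ticks of a 32768Hz clock: usec * 32768 overflows uint32_t
 * already for usec > ~131ms, so 64-bit arithmetic is forced */
static uint32_t usec_to_ticks(uint32_t usec)
{
    return (uint32_t)(((uint64_t)usec * 32768) / 1000000);
}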

- its optional non-us tick support has undefined and fixed (generally, inflexible) arithmetic for conversion
- its monolithic design does not allow an application to use timer hardware with custom configuration *alongside* the default configuration
- xtimer is depending on correct per-platform configuration values (XTIMER_BACKOFF, XTIMER_ISR_BACKOFF, which are hard to understand)
- subjectively, its code is a mess of microsecond and tick types and complex interactions

All the above points are implementation problems.


- there's no clear path for implementing higher-level functionality while keeping the API, e.g., for implementing a network-wide synchronized clock

I'm not sure I understand that. What is the missing clear path, and how does ztimer provide it?


With API additions, the forced 1us timer base *could* be removed, making the 64bit arithmetic unnecessary if done right. This would require a tremendous effort and probably touching *every line* in xtimer.

That starting completely from scratch is preferred over touching existing code in master shows how much (or how little) we can trust the code that is sitting there. IMO we cannot improve this situation and create maintainable code if we always fall back to this "throw away & rewrite" type of development process.


ztimer solves all of xtimer's identified issues.

That's impressive and I hope for the best, but I'm at least skeptical about not introducing new problems we currently don't have. Testing and time will tell. The fact that ztimer is easier to test is great and surely allows catching
more of the kind of bugs that were hard to find and reproduce in xtimer.


When compiled, ztimer's base configuration (1MHz periph timer, no conversion) uses ~25% less code than xtimer (measured using direct port of `tests/xtimer_msg` to ztimer).

Cool. By "code" do you mean ROM? How about RAM?
Since you mention "without conversion": how does this change if conversion is enabled?
What is the cost of enabling the extension module?
Also, how does this change across platforms, and when using multiple timers in various scenarios?


- its API only allows 32bit timeouts, vs. full 64bit in xtimer. This allows much simpler (and efficient) arithmetics and uses less memory.

While I agree this intuitively sounds true, it wouldn't hurt to collect some numbers in the design document on how this actually compares in the different scenarios.


It has been observed that no known hardware can provide both 64bit range and precision, and more importantly, no application needs it.

That's not true. Counter example: the esp32 has 64 bit timers.
There are also other platforms that allow chaining timers together to get one 48 or 64 bit timer.
That no application actually needs this is maybe just your opinion.
Or are you saying the hardware manufacturers put this into their chips only because there was still some silicon space left?


By offering multiple time bases (e.g. milliseconds for long timeouts and microseconds for short ones) with full 32bit range each, all application's actual needs should be met. If not, a 64bit implementation could be added on top of ztimer, only causing overhead for applications that actually use it.

Sounds reasonable to me.
Yet, we should try to compile a list of "application's actual needs" to see if we are missing something.
I also offer my help with that.
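For the record, the full 32 bit range per time base works out to roughly:

2^32 us ~= 71.6 minutes
2^32 ms ~= 49.7 days
2^32 s  ~= 136 years

So such a list mainly has to show whether any use case needs both more than ~71 minutes of range *and* microsecond precision at the same time.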


- ztimer stores a timer's offset relative to the previous timer. This means the time left until a timer expires cannot be calculated by taking `absolute target time - now`, but needs to be summed up. Essentially, a hypothetical `ztimer_left(clock, timer)` becomes an O(n) operation.

This, and similar things related to the relative-offset concept, are points where we should gather
extensive benchmark data for different scenarios.
We need quantifiable information -arguable facts- to see the ups and downs of a design decision.
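To illustrate the O(n) point (simplified types, not the actual ztimer structs):

#include <stddef.h>
#include <stdint.h>

typedef struct node {
    struct node *next;
    uint32_t offset;    /* ticks relative to the previous list entry */
} node_t;

/* remaining time until 'target' fires: relative offsets must be summed */
static uint32_t time_left(const node_t *head, const node_t *target)
{
    uint32_t sum = 0;
    for (const node_t *n = head; n != NULL; n = n->next) {
        sum += n->offset;
        if (n == target) {
            return sum;     /* total ticks until 'target' expires */
        }
    }
    return 0;               /* 'target' is not queued */
}

Whether that O(n) matters in practice is exactly the kind of thing benchmarks should answer.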


- ztimer disables, as does xtimer, IRQs while manipulating its timer linked lists. Thus the time spent with ISRs disabled is depending on the amount of timers in the list, making ztimer real-time unsafe. (there are ideas to fix this, see Outlook section below)

What about using a mutex instead of disabling IRQs the whole time?
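Roughly what I mean, just as a sketch (the open question being that timer callbacks fire in ISR context, where a mutex cannot block):

#include "mutex.h"

static mutex_t list_lock = MUTEX_INIT;

static void timer_list_insert(/* ... */)
{
    mutex_lock(&list_lock);    /* other threads block, IRQs stay enabled */
    /* ... walk and modify the timer linked list ... */
    mutex_unlock(&list_lock);
}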


- Idea: we specify that by convention, ZTIMER_MSEC (and ZTIMER_SEC) *keep running in sleep mode*, whereas ZTIMER_USEC stops when the MCU enters sleep (unless a timeout is scheduled). This is current behaviour *if* ZTIMER_USEC is using periph_timer as backend and ZTIMER_MSEC is using RTT/RTC.

Wouldn't it be a bit counter-intuitive to tie these capabilities to the instances if sometimes one instance is emulated on top of another (that doesn't have those capabilities)?


This would mean that `before = ztimer_now(clock); do something; diff = ztimer_now(clock) - before;` only works if either `do_something` does not schedule away the thread causing sleep *or* a clock is used that runs in sleep mode.

The "does not schedule away" is probably hard to impossible to ensure and not what I would intuitively
expect when measuring the execution time.
Shouldn't a high level timer handle this for the user?
Yes, but how to do that with 32 bit if "clock" is based on a fast running timer?


- the behaviour could be accessible either through defines (ZTIMER_USEC_LPM_MODE or ZTIMER_USEC_DEEPSLEEP ...), *or* be made part of the ztimer API

+1 for "made part of the API" over "through defines"


- in addition, we could add functions to explicitly tell the clocks to stay available until released, e.g., `ztimer_acquire(clock); before = ztimer_now(clock); do something; diff = ztimer_now(clock) - before; ztimer_release(clock);`.
  Once the "if timer is scheduled, don't sleep" is implemented, this could also be worked around by:
  `ztimer_set(clock, dummy, 0xffffffff); ...; ztimer_cancel(clock, dummy);`

This looks like a workaround for not having considered the "measuring time" use case beforehand.


Going a bit away from the technical stuff, there are also some things to discuss regarding the process.

TL;DR for the below: should we really prefer "throw out & rewrite" over "making our code maintainable"?


On 12/9/19 9:25 PM, Oleg Hahm wrote:

(...) is apparently not fixable (at least no one came up with a concrete
proposal to fix it to the best of my knowledge).

I disagree. No! Don't shout "OK, then go fix it!" ;).
There were a lot of people providing fixes, testing, and trying to improve the situation. All of them failed. Not because the ideas weren't good, nor because the solutions didn't work, but because we as a community didn't care enough, didn't put a high enough priority on it, the code was a complex mess,
and generally: ain't nobody got time for that.
These problems are serious, and we should be able to improve on them without always falling back to "starting from scratch" [1].

Having a concrete implementation proposal is great.
Thanks for the effort of designing and implementing the current state of
ztimer and thanks for the documentation of this effort!

+1 and a big thanks to Kaspar & @gebart!


I also want to recap something that was brought up at the last summit.
When xtimer was introduced, it was expected to fix whatever problems vtimer had. I don't know how vtimer performed, but oh boy, if it was worse than xtimer is nowadays, I'm glad I came to RIOT after that ^^ What I actually want to say: having a solution at hand that is thought to solve all our problems doesn't imply it will. Throwing out xtimer because now there is ztimer sounds very much like history repeating itself.
If the same thing now happens with ztimer, we didn't learn from the past.
And if ztimer solves all the problems, we didn't learn either:
We weren't capable of isolating, fixing and documenting the core problems.
We weren't agile. We didn't improve and evolve our code; we just replaced it without really knowing if and why that was required. "Because xtimer didn't work" is obviously not what I'm looking for here, and the "64 bit ain't nothin' for the IoT" discussion is independent of "making a high level timer functional".

Digging down into all these nasty concurrency problems, the low level timer stuff, and all that,
across the huge number of platforms, is really not something you want to do.
Kaspar already did this for more time than a human being deserves; again, kudos to you! I really want to help here, but I'm not a big fan of excitedly jumping over to ztimer without really
thinking through what we are doing, why we are doing it, and at what cost.

Sorry for this text-monster and thanks for your time!


cheers
Michel

[1]: https://image.slidesharecdn.com/living-with-legacy-software-160628155337/95/strangler-application-patterns-and-antipatterns-9-638.jpg?cb=1467129568
