Hi Kaspar,

thanks a lot for reading through that and for the reply!

Would it make sense to make a micro conference? Get everyone interested
in improving timers in a room and lock it until solutions are presented?
Not convinced about the "lock in a room" ;) - but otherwise: absolutely yes!

What do you think about an RDM PR?
We could just use your design document as a starting point.

RIOT needs an easy to use, efficient and flexible timer system that
allows precise high-frequency (microsecond scale) timings *alongside*
low-power timers.

For example, an application might include a driver requiring
high-frequency (microsecond scale) timings, but at the same time needs
to run on batteries and thus needs to make use of low-power timers
that can wake up the device from deep sleep.

I fully agree that this currently is a problem and it needs to be resolved.
But what this statement essentially points out is that the xtimer API either
lacks an instance parameter to be used on different low level timers, or
lacks the functionality to internally handle multiple low level timers and
multiplex timeouts onto the (different) available hardware instances to
fulfill low-power and high-precision sleep requirements.
Agreed.

Adding the instance parameter is not trivial. The implementation needs
to provide the underlying functionality. xtimer currently doesn't allow
that. Much of it needs to be rewritten.

Handling this internally is IMO not feasible for reasons I will point
out later.


I disagree on the last sentence, but maybe we are just not talking about the same thing.

These are all very valid questions.

IMO, coming up with definite answers is quite difficult, unless we move
towards defining bounds.
Yes. To at least move in that direction, I think we should try to work towards
a clear description of the problem space including, but not limited to:
-Hardware capabilities
-Requirements
-Use-Cases
-Quality metrics
-Benchmarks for these metrics
(...)

With this in place we can try to flesh out the key findings, design decisions and
implications of the timer design.
This will help us a lot once another re-design is considered.
Basically, I'd like to transfer what you call "experience from iterating" into a document
that manifests this experience in a form that our community can build upon.


Maybe we can agree that xtimer's performance tradeoffs so far have not
been shown to be wrong.
Agree.

periph_timer IMO should be the slimmest layer of hardware abstraction
that makes sense, so users that don't want to do direct non-portable
register based applications get the next "closest to the metal".
Agree, but there are some things that we should add to the periph_timer.
E.g. adding support for dedicated overflow interrupts together with an API to read
the corresponding IRQ status bit.
The high level timer would benefit from that on many platforms.
E.g. Ztimer wouldn't require the code for the time partitioning mechanism then.
But that's yet another part of the story...
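
To illustrate the overflow-interrupt idea, here is a minimal sketch of how a high level timer could extend a 16-bit counter to 32 bits when it can read the overflow IRQ status bit. It runs against a simulated timer: sim_timer_t and all sim_* functions are hypothetical and are not part of periph_timer or ztimer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simulated 16-bit hardware timer (hypothetical, for illustration only). */
typedef struct {
    uint16_t counter;        /* hardware count register */
    bool overflow_pending;   /* overflow IRQ status bit */
    uint32_t overflows;      /* software wrap count, maintained by the ISR */
} sim_timer_t;

/* Advance the simulated hardware by less than one full counter period. */
static void sim_advance(sim_timer_t *t, uint32_t ticks)
{
    assert(ticks <= UINT16_MAX);
    uint32_t sum = (uint32_t)t->counter + ticks;
    if (sum > UINT16_MAX) {
        t->overflow_pending = true;   /* the counter wrapped */
    }
    t->counter = (uint16_t)sum;
}

/* Overflow ISR: account for the wrap and clear the status bit. */
static void sim_overflow_isr(sim_timer_t *t)
{
    if (t->overflow_pending) {
        t->overflows++;
        t->overflow_pending = false;
    }
}

/* Read a 32-bit timestamp; assumed to be called with interrupts masked. */
static uint32_t sim_now32(const sim_timer_t *t)
{
    uint32_t high = t->overflows;
    uint16_t low = t->counter;
    if (t->overflow_pending) {
        /* A wrap happened that the ISR has not handled yet. Re-read the
         * counter so it is known to be post-wrap, and count the wrap here. */
        low = t->counter;
        high++;
    }
    return (high << 16) | low;
}
```

With the status bit readable, a timestamp taken between the wrap and the overflow ISR is still correct, which is the case that otherwise needs extra handling in the high level timer.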

We currently miss such an API. Though, an extended periph_timer could be
used for most (all?) of it.
What would that extension look like? Would it add a "clock" parameter so
it can deal with "varying configurations of timers, RTTs, RTCs"? Would
it do any kind of timer width extension? Would it add multiplexing?
Would it implement frequency conversion?

Why can't xtimer solve this? (I think ztimer does.)

That's one of the questions the future RDM should investigate and answer.
It would probably be mostly a slim extension in a way that fits the platform and peripheral. But no multiplexing or frequency conversion stuff; that's what the high level timer is for, right?
Also the term "frequency conversion" is a bit misleading I think.
With a discrete clock you won't be able to just precisely convert a frequency to any other frequency in software. Especially if you want to increase the frequency - it will just be a calculation.

As already pointed out regarding the problem statement:
This only points out that we either need to add an instance parameter to
the API (for explicit control)
or that xtimer needs to have access to multiple low level timers
internally (for implicit adaption).
As a stupid example, using a perfectly working ztimer and wrapping it in
the API of xtimer like that:

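void xtimer_xxx(uint64_t time) {
    if (time < whatever_threshold) {
        /* short timeouts: keep full resolution on the high-frequency timer */
        ztimer_xxx(HIGH_FREQ_INSTANCE, time);
    } else {
        /* long timeouts: scale down to the low-frequency timer's tick rate */
        ztimer_xxx(LOW_FREQ_INSTANCE, time / somediv);
    }
}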

Clearly this is very simplified but you get the idea.
Yeah, but simplified doesn't cut it. Sleeping one second on a 1 Hz timer
is a different thing than sleeping 1000 ms on a ms timer. Even with a
perfect implementation, the former will sleep anything from zero to two
seconds, the latter anything from 999 to 1001 ms, if they each need to
work with hardware of 1 Hz and 1000 Hz respectively. There's not much to be done there.
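
The numbers in the 1 Hz versus 1000 Hz comparison can be reproduced with a small calculation. This is only a sketch under an assumed model, namely that the actual sleep can deviate from the request by up to one tick period in either direction. The names sleep_bounds_t and sleep_bounds() are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Earliest and latest wake-up time, in microseconds. */
typedef struct {
    uint64_t min_us;
    uint64_t max_us;
} sleep_bounds_t;

/* Assumed model: error of at most one tick period either way.
 * The tick period is truncated to whole microseconds. */
static sleep_bounds_t sleep_bounds(uint64_t requested_us, uint32_t freq_hz)
{
    assert(freq_hz > 0 && freq_hz <= 1000000);
    uint64_t tick_us = 1000000ULL / freq_hz;
    sleep_bounds_t b;
    /* a sleep cannot be shorter than zero */
    b.min_us = (requested_us > tick_us) ? requested_us - tick_us : 0;
    b.max_us = requested_us + tick_us;
    return b;
}
```

Under this model, sleep_bounds(1000000, 1) gives 0 to 2000000 us, while sleep_bounds(1000000, 1000) gives 999000 to 1001000 us.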

That is one of the main issues with an API that doesn't have the clock
parameter, but a fixed (probably high frequency) frequency, as xtimer has.

Of course there is a difference.
Here I just wanted to point out that the quality defect of xtimer
not mapping to multiple peripherals is not directly tied to its API.

Further, adding a convention to the xtimer API would allow for
automatic selection of an appropriate low-level timer.
E.g. think of something like "will always use the lowest-power timer that still ensures x.xx% precision". Again, this is just a simple example to explain what I think we should also consider as part of the solution. Forcing the application / developer to select a specific instance also has its downsides.


Yes, but that might be more work than to start from scratch.
If fixing includes rewriting or fundamentally changing most of the code
and / or concepts, that should not be called a fix, but a rewrite.
(...)
Agreed. But saying that fixing it might be more work than rewriting it
might be valid.

Also, we're not talking about just fixing reliability issues or bugs.
That certainly can be done.
We're talking about fundamental issues with the API and the underlying
implementation.

I mostly agree. But as I tried to clarify before:
ztimer is mixing "relevant and valid fixes" and "introducing new design concepts". We should strive for being able to tell what is done because it fixes something
and what is done because the concept is "considered better".
Next to that the "considered better" should then be put to the test.


The above example brings me to another thing.
Did we actually decide and agree that it is smart to force the
app/developer to decide which timer instance to use?
I think we unfortunately did not decide on anything...

That's nothing we can't catch up on ;)


For timeouts that are calculated at runtime do we really want to always
add some code to decide which instance to use?
If there are multiple instances, there is code that selects them.
The question would be, do we want

a) to provide an xtimer style API that is fixed on a high level, combined
with logic below that chooses a suitable backend timer

or

b) add a "clock" parameter that explicitly handles this distinction.

Yeah, that is one key thing.
I think that (a) would in most cases be preferable.

To elaborate, think about this:
-some low-level instances may not be present on all platforms
    -> an application that uses a specific instance then just doesn't work?
    -> ztimer then always adds a conversion instance that just maps to another one?
-handling of dynamic time values that can be a few ms to minutes (e.g. protocol backoffs)
    -> you always need "wrapping code" to decide on a ztimer instance
        -i.e. sleeping a few ms needs to be done by the HF backend to be precise
        -sleeping minutes would optimally be done by an LF backend
    -> wouldn't it make sense to move this (probably repeated) code down, from the app, to the high level timer?

It may be better to not tell the API "use this instance", but instead
something like "please try to schedule this timeout with xx precision".

If no instance is available that can do that, the timer just "does its best". If one is available, it uses the lowest-power instance that covers the requirement.
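
A minimal sketch of what such a precision-driven selection could look like, just to make the idea concrete. Everything here is hypothetical: clock_desc_t, the clocks table, select_clock() and the ppm-based precision parameter are invented for illustration and are not existing xtimer or ztimer API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one backend clock. */
typedef struct {
    const char *name;
    uint32_t resolution_us;   /* length of one tick in microseconds */
    bool low_power;           /* keeps running in deep sleep */
} clock_desc_t;

/* Assumed platform configuration, ordered from lowest power
 * (coarsest resolution) to highest power (finest resolution). */
static const clock_desc_t clocks[] = {
    { "SEC",  1000000, true  },
    { "MSEC", 1000,    true  },
    { "USEC", 1,       false },
};
#define NUM_CLOCKS (sizeof(clocks) / sizeof(clocks[0]))

/* Pick the lowest-power clock whose tick is no longer than the error
 * the caller tolerates for this particular timeout. */
static const clock_desc_t *select_clock(uint64_t timeout_us,
                                        uint32_t max_error_ppm)
{
    assert(max_error_ppm > 0);
    uint64_t allowed_error_us = timeout_us * max_error_ppm / 1000000;
    for (size_t i = 0; i < NUM_CLOCKS; i++) {
        if (clocks[i].resolution_us <= allowed_error_us) {
            return &clocks[i];
        }
    }
    /* Nothing meets the requirement: "do its best" with the finest clock. */
    return &clocks[NUM_CLOCKS - 1];
}
```

With a tolerance of 1000 ppm (0.1%), a 200 ms backoff ends up on the high-frequency clock, while the same backoff grown to 10 seconds moves to the millisecond clock without any wrapping code in the application.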


Yes. I think I have started some.

Something along the lines of:

1. ZTIMER_USEC provides at least +-10 us accuracy
2. ZTIMER_USEC prevents sleep if a timeout is set

3. ZTIMER_MSEC provides at least +-2ms accuracy
4. if the hardware supports it, it will wake up the MCU from sleep

5. ZTIMER_SEC provides +-1 second accuracy
6. if the hardware supports it, it will wake up the MCU from sleep

This already covers *a lot of our timing needs*, and can easily be
provided (configured automatically). Providing much more is difficult,
and is a matter of configuration and documentation. Unless we want runtime
queryable characteristics.

Compare this to the current state:

1. xtimer provides +-31us accuracy and will not wake up from sleep

Yes ztimer does way better than the current implementation of xtimer.
But the "explicit instance selection" statement from above still applies.


Why not let the application tell what kind of requirements it has for a
timer (e.g. with flags)
and let our high level timer do its high level stuff to automatically
map it to what is at hand?
If we don't want that, are there valid reasons?
We can do that at compile time (when configuring the clocks), and flags
become the clock parameter.

You maybe already got that from the above statements, but that's not what I meant. I'm referring to "runtime requirements of one specific timeout" that may differ based on the actual value.
Example: A protocol backoff that is 200ms probably requires some HF timer.
Then, for whatever reason, this may increase to 10 seconds, and using an LF timer becomes practical. Wouldn't it be nice if ztimer then automatically allowed going to power down because it is practical? (All that without adding the wrapping code to decide on the instance in the application.)


"let our high level timer do its high level stuff to automatically map
it to what is at hand" is maybe possible.
Now we are talking!


Also keep in mind that some code like that will be required anyway for
runtime calculated timer values
if we want to make use of low power capable timers.
This I don't get.

Did the above clarify this?


Please don't tell us this is not-fixable by design. If so, what is it
that makes these unfixable?
What does *fix* mean? If I rename ztimer to xtimer, would that count as a "fix"?

If the API didn't change and the provided functionality stayed the same, we could come to an agreement :P


This doesn't require us to start from scratch.
Look, xtimer has around 1k lines of code. *if you know what you are
doing*, you can write those in a week or two, from scratch. *if you
don't*, it takes much longer.

Same goes for "fixing".

What I'm trying to say is that an implementation started from scratch
does not start from scratch in terms of concepts or experience.

Also, when xtimer was introduced, 1us ticks were considered good enough;
now that is a bad thing.
What changed?
We put our theories and talking into code, then used that code for a
while, gaining experience.
Ok, agree.
As I already wrote above: it must be possible to write down the key findings, "the essence" of this gained experience, into a document, otherwise it's worth nothing. We should try to hand over the gained experience to newcomers so they can help with improving what we currently have, without them coming up with ideas that were
proven wrong before.

The IoT hardware? Our requirements? Your opinion?
Can we write that down? What are the assumed conditions and scenarios
for this to be true?
What are the measurable ups and downs here?
We are talking about the implementation, right?
How many us are one 32768 Hz tick? Something around 30.517578125.
When used as internal normalization base, this is weird.
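
For reference, the arithmetic behind that number: 1000000 / 32768 reduces to 15625 / 512, so one tick is exactly 30.517578125 us and no whole number of microseconds fits a single tick. A small sketch of integer conversion in both directions (the function names are made up for this example, not taken from xtimer or ztimer):

```c
#include <assert.h>
#include <stdint.h>

/* 1000000 / 32768 = 15625 / 512 after dividing both by 64. */

/* Convert 32768 Hz ticks to microseconds, truncating. */
static uint64_t ticks32k_to_us(uint64_t ticks)
{
    assert(ticks <= UINT64_MAX / 15625);   /* avoid overflow in the multiply */
    return (ticks * 15625ULL) / 512ULL;
}

/* Convert microseconds to 32768 Hz ticks, truncating. */
static uint64_t us_to_ticks32k(uint64_t us)
{
    assert(us <= UINT64_MAX / 512);        /* avoid overflow in the multiply */
    return (us * 512ULL) / 15625ULL;
}
```

A single tick converts to 30 us (the fraction is lost), and a 30 us request converts to zero ticks, which shows why microseconds are an awkward base on a 32768 Hz clock. Only multiples of 512 ticks (15625 us) convert without a remainder.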

I don't understand this.

If now the same thing happens with ztimer, we didn't learn from the past.
If what happens? If in 5 years, we have learned where ztimer doesn't cut
it and come up with ztimer+ (for lack of letters) that improves the
situation substantially, again?
No, I mean if "having a non functional timer for many years" happens again.
I think the way xtimer did its job over these years is not something we want to repeat.


If ztimer solves all the problems, we didn't learn either:
We weren't capable of isolating, fixing and documenting the core problems.
We weren't agile. We didn't improve and evolve our code, we just
replaced it without really knowing if and why this is required.
"Because xtimer didn't work" is obviously not what I'm searching for
here, and the "64 bit ain't nothin' for the IoT" discussion is
independent of "making a high level timer functional".
We don't improve and evolve code, we improve and evolve a system.
A system that is made of code...

Digging down into all these nasty concurrency problems, the low level
timer stuff and all that
on the huge number of platforms is really not something you want to do.
Kaspar already did this for more time than a human being deserves,
again, kudos to you!
I really want to help here, but I'm not a big fan of excitedly jumping
over to ztimer without really thinking through what we are doing here,
why we do that, and at what cost.
I think, if we want to have a proper low-power timer story *soon*, we
should go with an implementation that provides this. If that
implementation is within bounds of acceptable performance metrics (or
just *better than what we have*), and can be shown to be at least as
reliable (bug-free) as what we have, and transition is painless, that
should be done.

Any discussion on what would have been better can continue in parallel.

At some point, we need to be pragmatic.
Yes, but at some point we should also take a step away and recap instead of only implementing.

Sorry for this text-monster and thanks for your time!
Ditto. These mails take hours to write. Can we have a meeting?

Kaspar
I did it again...
Yes please let's have a meeting (and an RDM PR + discussion there)


cheers
Michel
