Hi Michel,

On 12/11/19 4:50 PM, Michel Rottleuthner wrote:
> Hi Kaspar
>> Would it make sense to make a micro conference? Get everyone interested
>> in improving timers in a room and lock it until solutions are presented?
> Not convinced about the "lock in a room" ;) - but otherwise: absolutely
> yes!
> 
> What do you think about an RDM PR?
> We could just use your design document as a starting point.

Let me propose the following: we make merging ztimer *as a distinct and
optional module* independent of changing RIOT's default timer
implementation. The latter can be done within the RDM.

IMO, ztimer is quite usable already, and even if only used as a
multiplexer for the RTT, it provides a lot of benefit. I don't see a
reason not to merge it (once it has been reviewed properly) as an
optional module.

We can, in parallel, work on the RDM. If it turns out there's some
better option than ztimer, no harm is done, and I'll of course happily
accept that.
I already have a basic xtimer benchmark application (for timing
set()/remove()/now() in pathological cases), which can provide at least
some numbers. I'll PR that today.
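For the curious, the measurements are roughly along these lines (just a
sketch from memory, the actual PR may differ in details):

    #include <inttypes.h>
    #include <stdio.h>

    #include "xtimer.h"

    #define REPEATS (1000U)

    static void _cb(void *arg)
    {
        (void)arg;  /* never fires: timer is removed long before expiry */
    }

    int main(void)
    {
        xtimer_t t = { .callback = _cb };

        uint32_t start = xtimer_now_usec();
        for (unsigned i = 0; i < REPEATS; i++) {
            xtimer_set(&t, 10 * US_PER_SEC);  /* target far in the future */
            xtimer_remove(&t);
        }
        uint32_t total = xtimer_now_usec() - start;

        printf("set()+remove(): %" PRIu32 " us for %u iterations\n",
               total, (unsigned)REPEATS);
        return 0;
    }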

Regarding "fixing" xtimer vs a rewrite from scratch, I'd like to point
out that #9503 alone changes (well, removes) ~450 lines of *logic code*.
That is three quarters of xtimer's total code, including definitions and
prototypes. IMO, we need to acknowledge that changing that amount of
code does not result in the same code base. We should call it "ytimer".
The amount of reviewing, validation and testing should be the same as
for a rewrite. Or maybe it should just be measured by the number of lines
changed.

Regarding whether a "clock" parameter makes sense, this is something we
should explore within the RDM. I think you need to prototype a function
that chooses a suitable frequency from multiple options (without relying
on an explicit parameter for that). (I'd actually suggest you use ztimer
as a basis, as there you have multiple, multiplexed backends using the
same API. :) ). You might even be successful. At that point, an RDM can
decide if that functionality should move down the layers.

More details below:

>> periph_timer IMO should be the slimmest layer of hardware abstraction
>> that makes sense, so users that don't want to do direct non-portable
>> register based applications get the next "closest to the metal".
> Agree, but there are some things that we should add to the periph_timer.
> E.g. adding support for dedicated overflow interrupts together with an
> API to read
> the corresponding IRQ status bit.
> The high level timer would benefit from that on many platforms.
> E.g. Ztimer wouldn't require the code for the time partitioning
> mechanism then.
> But that's yet another part of the story...

Yes. Also, do all platforms support that overflow interrupt?
I don't think so, in which case this cannot be relied upon to be available.

> Also the term "frequency conversion" is a bit misleading I think.
> With a discrete clock you won't be able to just precisely convert a
> frequency to any other frequency in software.
> Especially if you want to increase the frequency - it will just be a
> calculation.

Yup. Frequency conversion makes sense if an application wants to sleep
12345 ms, but the timer is clocked at, e.g., 1024 Hz.
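A minimal sketch of such a conversion (1024 Hz is just the example
frequency here; rounding up so we never sleep shorter than requested):

    #include <stdint.h>

    /* scale a millisecond timeout into ticks of a 1024 Hz timer,
     * rounding up so the timeout never ends up shorter than requested */
    static uint32_t ms_to_ticks_1024hz(uint32_t ms)
    {
        return (uint32_t)(((uint64_t)ms * 1024 + 999) / 1000);
    }

    /* example: ms_to_ticks_1024hz(12345) == 12642 ticks,
     * and 12642 / 1024 Hz ~= 12345.7 ms actually slept */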

>> That is one of the main issues with an API that doesn't have the clock
>> parameter, but a fixed (probably high frequency) frequency, as xtimer
>> has.
> 
> Of course there is a difference.
> Here I just wanted to point out that the quality defect of xtimer
> not mapping to multiple peripherals is not directly tied to its API.
> 
> Further, adding a convention to the xtimer API would allow for
> automatic selection of an appropriate low-level timer.
> E.g. think of something like "will always use the lowest-power timer
> that still ensures x.xx% precision".

That's what ztimer does.

> Again, this is just a simple example to explain what I think we should
> also consider as part of the solution.
> Forcing the application / developer to select a specific instance also
> has its downsides.

With a convention ("ZTIMER_MSEC provides ~1ms accuracy"), the application
developer chooses the intended precision. Without that, and with a fixed
API time base, a requested 1000000us cannot be distinguished from "about
1s" or "about 1000ms"; the timer cannot tell whether microsecond precision
is actually needed.
Maybe it can, this is where you could come up with a prototype.
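For contrast, with ztimer (as in the PR) that distinction is simply the
clock argument:

    #include "ztimer.h"

    int main(void)
    {
        /* both sleep "one second", but the clock argument states how
         * precisely that second needs to be hit */
        ztimer_sleep(ZTIMER_USEC, 1000000LU);  /* wants us precision       */
        ztimer_sleep(ZTIMER_MSEC, 1000);       /* ~1 ms accuracy is enough */
        return 0;
    }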

> I mostly agree. But as I tried to clarify before:
> ztimer is mixing "relevant and valid fixes" and "introducing new design
> concepts".
> We should strive for being able to tell what is done because it fixes
> something
> and what is done because the concept is "considered better".
> Next to that the "considered better" should then be put to the test.

Ok. That might be necessary to choose one implementation over xtimer. -> RDM

>>> For timeouts that are calculated at runtime do we really want to always
>>> add some code to decide which instance to use?
>> If there are multiple instances, there is code that selects them.
>> The question would be, do we want
>>
>> a) to provide an xtimer style API that is fixed on a high level, combined
>> with logic below that chooses a suitable backend timer
>>
>> or
>>
>> b) add a "clock" parameter that explicitly handles this distinction.
>>
> Yeah, that is one key thing.
> I think that (a) would in most cases be preferable.
> 
> To elaborate, think about this:
> -some low-level instances may not be present on all platforms
>     -> an application that uses a specific instance then just doesn't work?

Compile-time error.

>     -> ztimer then always adds a conversion instance that just maps to
> another one?

... if configured to do so at compile time, and if necessary.

> -handling of dynamic time values that can be a few ms to minutes (eg.
> protocol backoffs)
>     -> you always need "wrapping code" to decide for a ztimer instance
>         -i.e. sleeping a few ms need to be done by the HF backend to be
> precise
>         -sleeping minutes would optimally be done by an LF backend

Both can be handled by a millisecond timer.

>     ->wouldn't it make sense to move this (probably repeated) code down,
> from the app, to the high level timer

If the range needed exceeds, e.g., 32 bits of milliseconds, which can
represent more than 49 *days*, such code might make sense.
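(For the record: 2^32 ms = 4294967296 ms ~= 4294967 s ~= 1193 h ~= 49.7
days, so a plain 32bit millisecond value already covers that range.)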

> It may be better to not tell the API "use this instance", but instead
> something like "please try to schedule this timeout with xx precision".

That "precision" is exactly what the instance provides, with convention.
For specifying that, alternatives are "xxtimer_set_usec(t, value)", or
"xxtimer_set(t, value, 1000000LU)", .... Which actually map nicely on
ztimer's api.
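To make that concrete (the xxtimer_* spellings are hypothetical, only the
last line is ztimer's actual API as in the PR):

    #include "ztimer.h"

    static void _cb(void *arg) { (void)arg; }
    static ztimer_t t = { .callback = _cb };

    void set_one_second_with_usec_precision(void)
    {
        /* hypothetical: precision encoded in the function name
         *     xxtimer_set_usec(&t, 1000000LU);
         * hypothetical: precision passed as an extra parameter
         *     xxtimer_set(&t, 1000000LU, 1000000LU);
         * ztimer: precision expressed by the clock argument: */
        ztimer_set(ZTIMER_USEC, &t, 1000000LU);
    }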

> If no instance is available that can do that, the timer just "does its
> best".

That is what a compile-time configured ZTIMER_USEC on a 32kHz clock would
do, if that is desirable.

> If it is available, it uses the most-low-power instance available that
> covers the requirement.

In ztimer's design, the most low-power timer capable of, e.g.,
millisecond precision is provided as ZTIMER_MSEC. No need for runtime logic.

> You maybe already got that from the above statements, but that's not
> what I meant.
> I'm referring to "runtime requirements of one specific timeout" that may
> differ based on the actual value.
> Example: A protocol backoff that is 200ms probably requires some HF timer.
> Then, because of whatever this may increase to 10 seconds and using an
> LF timer becomes practical.

Milliseconds can usually be set on low-power timers (e.g., a 32kHz RTT).
A 32bit range at 1kHz means ~49 days. No need to switch to a different clock.
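To illustrate: the whole 200ms-to-10s backoff range stays on one clock,
e.g. (assuming ZTIMER_MSEC is backed by the board's low-power RTT):

    #include "ztimer.h"

    /* a protocol backoff between 200 ms and many seconds fits one clock;
     * if ZTIMER_MSEC is backed by the RTT, the MCU may power down meanwhile */
    void wait_backoff(uint32_t backoff_ms)
    {
        ztimer_sleep(ZTIMER_MSEC, backoff_ms);
    }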

Another example would be a 200-2000us timer. That cannot be set on a
millisecond clock, as now=500us + 2000us is 2500us, whereas now=0.5ms
(==0ms) + 2ms is somewhere between 2 and 3 ms. With an "at least"
semantic, the timer would choose 3ms as target time, which is up to 1ms
(1000us) off. This *could* be solved by synchronizing twice, e.g., set a
timer to the next ms tick, sleep 1ms, then sleep the remainder of the
timeout in us precision. Doable, but tricky. Especially while keeping
accuracy guarantees over two clock domains.

If an application schedules anything from <1000us to >2**32us, well yeah, it must be
using 64bit values already. In that case, we might need a 64bit API, or
*really* let the application handle that.

> Wouldn't it be nice if ztimer then automatically allows to go to power
> down because it is practical?
> (all that without adding the wrapping code to decide on the instance in
> the application)

Sure. But honestly, that is *one* function, which in any sane
architecture would sit completely on top of something like ztimer.
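A minimal sketch of what I mean, on top of ztimer as in the PR (the name
and the threshold are made up):

    #include <stdint.h>

    #include "ztimer.h"

    /* hypothetical convenience wrapper on top of ztimer: choose the clock
     * per call, based on the actual timeout value */
    void sleep_auto(uint32_t timeout_us)
    {
        if (timeout_us < 10000) {
            /* short timeout: use the high-frequency clock for precision */
            ztimer_sleep(ZTIMER_USEC, timeout_us);
        }
        else {
            /* long timeout: millisecond accuracy suffices, allows low power */
            ztimer_sleep(ZTIMER_MSEC, timeout_us / 1000);
        }
    }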

>> "let our high level timer do it's high level stuff to automatically map
>> it to what is at hand" is maybe possible.
> Now we are talking!

Please prototype!

I think that could, if successful, be the heart of a ztimer 64bit extension.

>>> Please don't tell us this is not-fixable by design. If so, what is it
>>> that makes these unfixable?
>> What does *fix* mean? If I rename ztimer to xtimer, would that count as
>> "fix"?
> 
> If the API wouldn't change and the provided functionality stays the
> same, we could come to an agreement :P

Now that's easy. ;)

>>> The IoT hardware? Our requirements? Your opinion?
>>> Can we write that down? What are the assumed conditions and scenarios
>>> for this to be true?
>>> What are the measurable ups and downs here?
>> We are talking about the implementation, right?
>> How many us is one 32768Hz tick? Something around 30.517578125.
>> When used as an internal normalization base, this is weird.
> 
> I don't understand this.

Nevermind, I was doing the math backwards.

>>> If now the same thing happens with ztimer, we didn't learn from the
>>> past.
>> If what happens? If in 5 years, we have learned where ztimer doesn't cut
>> it and come up with ztimer+ (for lack of letters) that improves the
>> situation substantially, again?
> No, I mean if "having a non functional timer for many years" happens again.
> I think the way xtimer did its job over these years is not something
> we want to repeat.

So an RDM would have prevented this? I doubt it. xtimer's ISR bugginess
was known *for years* even without an RDM. We were just inexperienced
(or ignorant, or incompetent, your pick).

>>> If ztimer solves all the problems, we didn't learn either:
>>> We weren't capable of isolating, fixing and documenting the core
>>> problems.
>>> We weren't agile. We didn't improve and evolve our code, we just
>>> replaced it without really knowing if and why this is required.
>>> "Because xtimer didn't work" is obviously not what I'm searching for
>>> here, and the "64 bit ain't nothin' for the IoT" discussion is
>>> independent of "making a high level timer functional".
>> We don't improve and evolve code, we improve and evolve a system.
> A system that is made of code...

Sure, but in the end, we should prioritize the system, not its
components. If the system consists of a component that has two
alternative implementations, blah blah blah ... ;)

>> At some point, we need to be pragmatic.
> Yes, but at some point we should also take a step away and recap instead
> of only implementing.

That is valid. Please acknowledge that while the recapping happens, I can
already write applications that use both HF and LF timers, while the
recappers might be stuck, literally, because xtimer is broken.

Kaspar