Re: [riot-devel] ztimer - a new high-level timer for RIOT

Marian Buschsieweke Thu, 12 Dec 2019 08:47:14 -0800

Hi,

here are my thoughts on the discussion.


# Not Getting Lost in Requirement Analysis and Problem Specifications

A good requirement analysis is a valuable tool for both development and
evaluation of a software component. But once a solid understanding of the
problem to solve is reached, additional effort put into a requirement analysis
doesn't yield enough additional benefit to justify the work. And every
requirement analysis is flawed: Assembling a complete list of all current
requirements in RIOT's code base is hard. Predicting how future developments
influence the requirements we have is impossible. There has to be a point when
we just stop on collecting more requirements and consider the current list as
good enough; a perfect, complete and definite result cannot be reached.

High level timer API are no new concept and the basic goals and requirements
are well understood. On top of these basic requirements, benchmarks could be a
good tool to quantify how well specific timer implementations perform. To me,
writing a set of benchmarks would be more useful than additional requirements
collection. Not only would it allow to see how good ztimer/xtimer are
performing. They will also be useful for development and reviews of future PRs
targeting RIOTs timer framework.

In the end, RIOT will be judged upon the features it provides. Not on the
features on RIOT's to do lists. Not on how tough and rigorous the requirements
are we have formulated on some yet-to-be-implemented feature. And not on how
much documents and emails have been written about an yet-to-be-implemented
feature.

# Complete Rewrites can be the Best Option

@Michel: You seem to dismiss the development approach of a complete rewrite
fundamentally, apparently for ideological reasons. While most of the time a
complete rewrite is not the best option, there are good reasons for doing so in
some cases: When fundamental architectural changes are needed, the effort of an
rewrite can be less compared to iterative transform the architecture. In such
cases, a rewrite is the better option.

I don't think that there is a reason to see a complete rewrite as some kind
of failure that we should try very hard to prevent for the future. And
ultimately, any attempt to prevent rewrites can fail. Any design decision is
based on the current experience, the available tools, and the current
requirements. But with time, more experience is gained, better tools become
available, and completely new use cases with potentially fully distinct
requirements pop up. Perfectly solid design decisions can be rendered obsolete
by this. And sometimes, fundamental architecture changes are needed in
response. In that cases, a rewrite can be the best option. And this could
happen very well without any mistakes being made in the original implementation.

# Suggestion of a Path Forward

There seems to be an agreement, that an additional parameter is needed in the
API. How about we specify an implementation independent high level timer API
based on xtimer with an additional parameter. The type of the first argument
could just by some `foo_t` that is provided by the implementation of that API.
The implementation could be freely chosen as module, and xtimer could be the
default backend when the user has not specified a backend.

I bet it is trivial to extend xtimer in a way to comply with that API by just
adding an ignored parameter. Everyone with the motivation, knowledge and
dedication to fix the discussed issue in xtimer has still the opportunity to do
so. And providing ztimer as alternative implementation of the same API would
not hurt anyone.

Not defining the semantics and contents of the first parameter sounds a bit
crazy. But I bet that 90% of all use cases will only use two different values
there: One low power low resolution setting for long sleeps and one high power
high precision setting e.g. in drivers. Just providing these two settings as
global variables (or preprocessor macros) under defined names would be enough
to cater 90% of the use cases in an implementation agnostic way. (And the other
use cases could provide their own global variables for the additional settings,
so that only the assignment of those variable depend on the used
implementation.) And if there is indeed enough interest in fixing xtimer, this
would allow them to freely decide if they want this parameter to be a pointer
to a clock struct as in ztimer, or maybe a bitmask holding a set of flags, or
something completely different.

It might be very well possible that the future will bring use cases will
mutually exclusive requirements. A implementation independent API would allow
to just let the users choose what they need.

Kind regards,
Marian


On Tue, 10 Dec 2019 18:23:53 +0100
Michel Rottleuthner <michel.rottleuth...@haw-hamburg.de> wrote:

> Hi,
> 
> Thanks for starting this! It's very much appreciated.
> Discussing these things, reaching common ground and documenting 
> decisions and findings
> during this process is IMO one of the most important things to do before 
> we move on.
> 
> I'm really sorry to write this wall of text, but there are so many 
> things to this timer topic where
> we IMO lack a facts driven analysis, decision- and development-process.
> This is combined with a lack of documentation of decisions and their 
> implications.
> 
> TL;DR:
> let's try to decide what is good for our high level timer based on 
> measurable facts.
> For that we need detailed knowledge on the hardware we target, the 
> software that will use the high level timer,
> and measurements/benchmarks that tell us what are the ups an downs of 
> different ways of implementation.
> 
> Side note: I think we should move the discussion to an RDM PR for the 
> design document and discuss there to not get lost.
> In the much too long version below I comment mostly on the design 
> document, other quotes as indicated.
> 
> > RIOT needs an easy to use, efficient and flexible timer system that 
> > allows precise high-frequency (microsecond scale) timings *alongside* 
> > low-power timers.
> >
> > For example, an application might include a driver requiring 
> > high-frequency (microsecond scale) timings, but at the same time needs 
> > to run on batteries and thus needs to make use of low-power timers 
> > that can wake up the device from deep sleep.
> >  
> 
> I fully agree that this currently is a problem and it needs to be resolved.
> But what this statement essentially points out, is that the xtimer API 
> either misses
> an instance parameter to be used on different low level timers or that it
> misses the functionality to internally handle multiple low level timers 
> to multiplex
> timeouts to the (different) available hardware instances to fulfill 
> low-power and high precision sleep requirements.
> 
> The problem statement implies nothing related to the rest of the high 
> level timer API and design at all.
> Thus, it is not a problem statement that shows ztimer is our only option.
> 
> *efficient*
> -in which dimensions? Do we favor small RAM, ROM, CPU overhead?
> -how do we prefer this to scale if more or less timers are used?
> -does it make sense to decide for only one of these things at all? (one 
> implementation may not suit everyone)
> -just thinking: would it hurt to have one memory efficient 
> implementation and one that is "bigger, but faster"?
> 
> *flexible*
> -in level of manual control it provides?
> -in level of automatic functionality and abstraction it provides?
> 
> *precision (& accuracy)*
> -the bounds are defined by the hardware.
> -the question is how do we come as close as possible to the capabilities 
> the HW provides.
> -where do we need trade-offs?
> 
> *low-power*
> -dependencies are mostly defined by the hardware
> -How do we model these dependencies?
> -for our power management we try to do things implicit.
>      -how does this work together with a timer that needs to be called 
> with explicit instances?
>      -why should the timer API differ from that principle?
> 
> 
> The following general questions pop up:
> -What does the hardware provide that is abstracted by this high level timer?
> -How much abstraction do we need?
> -Where do we need trade-offs?
> -How do we balance them?
> -Based on what information?
> 
> Please don't get me wrong, I'm not in principle for or against xtimer, 
> ztimer (or even wtimer, the whatever timer;)
> Yes, 64 bit time intuitively doesn't sound like a perfect fit for the 
> IoT, but how does this translate to numbers (also different scenarios)?
> Yes, xtimer is broken. Yes, it needs fixing or replacement.
> *But: the functional problems related to xtimer are not related to it's 
> API, it is the implementation!*
> 
> 
> > General requirements:
> > - very efficient timers for use in time-critical drivers  
> 
> This statement touches memory, runtime overhead and precision, but isn't 
> precise on how to balance between them.
> 
> 
> > easy-to-use interface (unified interface)  
> 
> Very much a matter of taste, but also what is the importance of "unified"?
> If there are two completely distinct use-cases (requirements) why 
> enforce unified API for that?
> 
> 
> > work with varying MCU timer widths (16, 24, 32-bit timers)  
> 
> Agree, an absolute must! (though I'd rather specify it more as "any width")
> 
> 
> > - adaptable to varying configurations of timers, RTTs, RTCs
> > (use RTC if available for super-long-time timers)
> > - this means that applications and / or system modules need to be able 
> > to set timers on different timer hardware (configurations) 
> > *simultaneously*  
> 
> To me this feels like it is asking for an API below the high level timer 
> that fits various kind of hardware (?).
> We currently miss such an API. Though, an extended periph_timer could be 
> used for most (all?) of it.
> 
> 
> > API (necessary functionality):
> > - Wait for a time period (e.g. timer_usleep(howlong))
> > - receive message after given period of time (e.g. timer_msg(howlong))
> > - receive message periodically
> > - Await a point in time
> > - Wait for a (past) time + x  
> 
> I think we shouldn't start on the API level when thinking about 
> functionality.
>  From a functional point of view there are four(?) different uses for 
> high level timers:
> (1) delay execution (semantic: "wait at least...")
> (2) timeout an operation (semantic: "at some not too precise time after, 
> but eventually...")
> (3) schedule an action (semantic: "try to hit an exact point in time")
> (4) measure time (semantic: how much time passed between t1 and t2)
> 
> Xtimer doesn't leave much to be desired from that perspective.
> Nonetheless, I think we should try to write down the requirements for 
> the above things more precisely.
> How these functionalities are exposed thru an API is a different story.
> 
> 
> > Regarding the general requirements, current __xtimer is not "adaptable 
> > to varying configurations of timers, RTTs, RTCs"__.
> >
> > While xtimer can be configured to use low-power timer hardware instead 
> > of the (default) high-precision timer hardware, an application needs 
> > to *choose* either low-power, low precision timers *or* high-precision 
> > timers that prevent the system from sleeping, *system wide*.
> >
> > If xtimer is configured to use a low-power timer, it forces it to 
> > quantize *all* timer values to the frequency of the low-power timer, 
> > which usually has a low frequency (usually 32kHz).
> >  
> As already pointed out regarding the problem statement:
> This only points out that we either need to add an instance parameter to 
> the API (for explicit control)
> or that xtimer needs to have access to multiple low level timers 
> internally (for implicit adaption).
> As a stupid example, using a perfectly working ztimer and wrapping it in 
> the API of xtimer like that:
> 
> xtimer_xxx(uint64_t time) {
>      if (time < whatever_threshold) {
>          ztimer_xxx(HIGH_FREQ_INSTANCE, time);
>      } else{
>          ztimer_xxx(LOW_FREQ_INSTANCE, time / somediv);
>      }
> }
> 
> Clearly this is very simplified but you get the idea.
> I'm not saying this is smart or we should do that, I just want to point 
> out that whatever is inside xtimer can be fixed.
> And also it is not absolutely needed to change all things 
> (implementation and API) at once.
> Saying that we need to replace it because it is not possible to fix it 
> is simply not true.
> 
> The above example brings me to another thing.
> Did we actually decide and agree that it is smart to force the 
> app/developer to decide which timer instance to use?
> For timeouts that are calculated at runtime do we really want to always 
> add some code to decide which instance to use?
> Then, how to write platform independent code if you don't know which 
> instances will be available and what quality characteristics it comes with?
> 
> Is the instance name combined with "convention" as only meta-information 
> on a timer instance sufficient?
> ->ZTIMER_USEC suggests USEC precision, but how about accuracy, low power   
> dependencies etc. ?
> I'm not saying all of this can be solved easily but I think we should 
> discuss and document that.
> 
> 
> > Worse, much code in RIOT uses xtimer with the expectation of it being 
> > high-precision, and breaks in unknown ways if suddenly used with 
> > low-frequency timers through the microsecond based API. The API does 
> > not reflect this distinction.  
> 
> A converted ztimer instance named ZTIMER_USEC has similar problems.
> 
> Why not let the application tell what kind of requirements it has for a 
> timer (e.g. with flags)
> and let our high level timer do it's high level stuff to automatically 
> map it to what is at hand?
> If we don't want that, are there valid reasons?
> 
> Also keep in mind that some code like that will be required anyway for 
> runtime calculated timer values
> if we want to make use of low power capable timers.
> 
> 
> > - it is not ISR safe (#5428, #9530, #11087)
> > - it doesn't have unittests (#10321)
> > - it has some underflow bugs and race conditions (#7116, #9595, #11087)
> > - there's a PR trying to use multiple backend drivers (#9308)  
> 
> Please don't tell us this is not-fixable by design. If so, what is it 
> that makes these unfixable?
> 
> 
> > - it sets target timers using absolute target times, making it prone 
> > to underflows (#9530 fixes this?)  
> 
> Yes its a problem and #9530 shows one possible solution for fixing it.
> 
> 
> > - it forces all timers to 1us ticks at the API level, which in turn 
> > forces internal 64bit arithmetic  
> 
> No one said we can not add to and adapt the API if requirements are not met.
> This doesn't require us to start from scratch.
> Also when introducing xtimer 1us ticks was considered good enough now it 
> is a bad thing.
> What changed? The IoT hardware? Our requirements? Your opinion?
> Can we write that down? What are the assumed conditions and scenarios 
> for this to be true?
> What are the measurable ups and downs here?
> 
> > - its optional non-us tick support has undefined and fixed (generally, 
> > inflexible) arithmetic for conversion
> > - its monolithic design does not allow an application to use timer 
> > hardware with custom configuration *alongside* the default configuration
> > - xtimer is depending on correct per-platform configuration values 
> > (XTIMER_BACKOFF, XTIMER_ISR_BACKOFF, which are hard to understand)
> > - subjectively, it's code is a mess of microsecond and tick types and 
> > complex interactions  
> 
> All the above points are implementation problems.
> 
> 
> > - there's no clear path for implementing higher-level functionality 
> > while keeping the API, e.g., for implementing a network-wide 
> > synchronized clock  
> 
> I'm not sure if I understand that. What is the missing clear path and 
> how does ztimer provide it?
> 
> 
> > With API additions, the forced 1us timer base *could* be removed, 
> > making the 64bit arithmetic unnecessary if done right. This would 
> > require a tremendous effort and probably touching *every line* in xtimer.  
> 
> Rather starting completely from scratch instead of touching existing 
> code in master shows how much we can trust the code that is sitting there.
> IMO we can not improve this situation and create maintainable code if we 
> always fall back to this "throwaway&rewrite" type of development process.
> 
> 
> > ztimer solves all of xtimer's identified issues.  
> 
> Thats impressive and I hope for the best, but I'm at least skeptical 
> about not
> introducing new problems we currently don't have. Testing and time will 
> tell.
> The fact that ztimer is easier to test is great and for sure allows for 
> catching
> more bugs that were hard to find and reproduce in xtimer.
> 
> 
> > When compiled, ztimer's base configuration (1MHz periph timer, no 
> > conversion) uses ~25% less code than xtimer (measured using direct 
> > port of `tests/xtimer_msg` to ztimer).  
> 
> Cool. With code you mean ROM? How about RAM?
> Since you mention "without conversion" how does this change if 
> conversion is enabled?
> What is the cost of enabling the extension module?
> Also how does this change over different platforms and when using 
> multiple timers in various scenarios?
> 
> 
> > - its API only allows 32bit timeouts, vs. full 64bit in xtimer. This 
> > allows much simpler (and efficient) arithmetics and uses less memory.  
> 
> While I agree this intuitively sounds to be true, it wouldn't hurt to 
> collect some numbers in the design document on how this actually 
> compares in the different scenarios.
> 
> 
> > It has been observed that no known hardware can provide both 64bit 
> > range and precision, and more importantly, no application needs it.  
> 
> Thats not true. Counter example: esp32 has 64 bit timers.
> Also there are other platforms that allow chaining together timers to 
> get one 48 or 64 bit timer.
> That no application actually needs this is maybe just your opinion.
> Or do you say the hardware manufacturers put this into their chip only 
> because there was still some silicon space left?
> 
> 
> > By offering multiple time bases (e.g. milliseconds for long timeouts 
> > and microseconds for short ones) with full 32bit range each, all 
> > application's actual needs should be met. If not, a 64bit 
> > implementation could be added on top of ztimer, only causing overhead 
> > for applications that actually use it.  
> 
> Sounds reasonable to me.
> Yet, we should try to compile a list of "application's actual needs" to 
> see if we are missing something.
> I also offer my help for that.
> 
> 
> > - ztimer stores a timer's offset relative to the previous timer. This 
> > means the time left until a timer expires cannot be calculated by 
> > taking `absolute target time - now`, but needs to be summed up. 
> > Essentially, a hypothetical `ztimer_left(clock, timer)` becomes an 
> > O(n) operation.  
> 
> This, and similar things related to the relative offset concept are 
> things we should come up with
> extensive benchmark data of different scenarios.
> We need quantifiable information -arguable facts- to see what are the 
> ups and downs of a design decisions.
> 
> 
> > - ztimer disables, as does xtimer, IRQs while manipulating its timer 
> > linked lists. Thus the time spent with ISRs disabled is depending on 
> > the amount of timers in the list, making ztimer real-time unsafe. 
> > (there are ideas to fix this, see Outlook section below)  
> 
> What about mutex instead of irq_disable for all the time?
> 
> 
> > - Idea: we specify that by convention, ZTIMER_MSEC (and ZTIMER_SEC) 
> > *keep running in sleep mode*, whereas ZTIMER_USEC stops when the MCU 
> > enters sleep (unless a timeout is scheduled). This is current 
> > behaviour *if* ZTIMER_USEC is using periph_timer as backend and 
> > ZTIMER_MSEC is using RTT/RTC.  
> 
> Wouldn't it be a bit counter intuitive to tie the capabilities to the 
> instances if sometimes one is modeled by another (that doesn't have 
> these capabilities)?
> 
> 
> > This would mean that `before = ztimer_now(clock); do something; diff = 
> > ztimer_now(clock) - before;` only works if either `do_something` does 
> > not schedule away the thread causing sleep *or* a clock is used that 
> > runs in sleep mode.  
> 
> The "does not schedule away" is probably hard to impossible to ensure 
> and not what I would intuitively
> expect when measuring the execution time.
> Shouldn't a high level timer handle this for the user?
> Yes, but how to do that with 32 bit if "clock" is based on a fast 
> running timer?
> 
> 
> > - the behaviour could be accessible either through defines 
> > (ZTIMER_USEC_LPM_MODE or ZTIMER_USEC_DEEPSLEEP ...), *or* be made part 
> > of the ztimer API  
> 
> +1 for "made part of the API" over "through defines"
> 
> 
> > - in addition, we could add functions to explicitly tell the clocks to 
> > stay available until released, e.g., `ztimer_acquire(clock); before = 
> > ztimer_now(clock); do something; diff = ztimer_now(clock) - before; 
> > ztimer_release(clock);`.
> >     Once the "if timer is scheduled, don't sleep" is implemented, this 
> > could also be worked around by:
> >     `ztimer_set(clock, dummy, 0xffffffff); ...; ztimer_cancel(clock, 
> > dummy);`  
> 
> This looks like a workaround for not considering the use case of 
> "measuring time" beforehand.
> 
> 
> Going a bit away from the technical stuff, there are also some things to 
> discuss regarding the process.
> 
> TL;DR for the below: shoud we really prefer "throw out & rewrite" over 
> "making our code maintainable"?
> 
> 
> On 12/9/19 9:25 PM, Oleg Hahm wrote:
> 
> > (...) is apparently not fixable (at least no one came up with a concrete
> > proposal to fix it to the best of my knowledge).   
> 
> I disagree. No! Don't shout "OK, then go fix it!" ;).
> There where a lot of people providing fixes, testing and trying to 
> improve the situation.
> All of them failed. Neither because all the ideas weren't good nor 
> because the solutions weren't working.
> It was because we as a community didn't care enough, didn't put high 
> enough priority on it, the code being a complex mess,
> and generally: ain't no body got time for that.
> These problems are serious and we should be able to improve on that 
> without always falling back to "starting from scratch" [1].
> 
> > Having a concrete implementation proposal is great.
> > Thanks for the effort of designing and implementing the current state of
> > ztimer and thanks for the documentation of this effort!  
> 
> +1 and a big thanks to Kaspar & @gebart!
> 
> 
> I also want to recap a thing that was put up on the last summit.
> When xtimer was introduced, it was expected to improve whatever problems 
> vtimer had.
> I don't know how vtimer performed, but oh boy, if it was worse than 
> xtimer nowadays I'm glad I came to use RIOT after that^^
> What I actually want to say: having a solution at hand that is thought 
> to solve all our problems doesn't imply it will.
> Throwing out xtimer because now there is ztimer sounds very much like 
> history repeating itself.
> If now the same thing happens with ztimer, we didn't learn from the past.
> If ztimer solves all the problems, we didn't learn either:
> We weren't capable of isolating, fixing and documenting the core problems.
> We weren't agile. We didn't improve and evolve our code, we just 
> replaced it without really knowing if and why this is required.
> "Because xtimer didn't work" is obviously not what I'm searching for 
> here, and the "64 bit ain't nothin' for the IoT" discussion is 
> independent of "making a high level timer functional".
> 
> Digging down into all these nasty concurrency problems, the low level 
> timer stuff and all that
> on the huge number of platforms is really not something you want to do.
> Kaspar already did this for more time than a human being deserves, 
> again, kudos to you!
> I really want to help here, but I'm not a big fan of excitedly jumping 
> over to ztimer without really
> thinking thru what we are doing here, why we do that, and at what cost.
> 
> Sorry for this text-monster and thanks for your time!
> 
> 
> cheers
> Michel
> 
> [1]: 
> https://image.slidesharecdn.com/living-with-legacy-software-160628155337/95/strangler-application-patterns-and-antipatterns-9-638.jpg?cb=1467129568
> 
> _______________________________________________
> devel mailing list
> devel@riot-os.org
> https://lists.riot-os.org/mailman/listinfo/devel

pgpBDlAFaEh0O.pgp
Description: OpenPGP digital signature

_______________________________________________
devel mailing list
devel@riot-os.org
https://lists.riot-os.org/mailman/listinfo/devel

Re: [riot-devel] ztimer - a new high-level timer for RIOT

Reply via email to