Fix-Point commented on PR #17675: URL: https://github.com/apache/nuttx/pull/17675#issuecomment-3692774822
> @Fix-Point
>
> Let me make a conclusion of all my comments for your implementation into four categories:
>
> 1. Your way of resolving multicore running the same hrtimer for SMP system has performance issue and is not deterministic
> 2. You does not consider to remove un-needed operations when running on non-SMP platform, so your implementation is also not good for non-smp platform
> 3. Your APIs are two complicated, not friendly for users
> 4. You do not decouple the queue abstraction for hrtimer very well, the queue related operations are not needed to be exposed to users

Thank you for your suggestions. Your comments are very insightful. Below are the modifications made based on your advice:

- Your suggestion regarding performance improvement in non-SMP mode is excellent. Following your advice, macros have been added to remove unnecessary waits that cannot occur in non-SMP scenarios.
- Regarding the point about the API taking too many parameters, I decided to fully align with the `wdog/workqueue` interface: the original `cancel` is now named `cancel_sync`, while `async_cancel` is renamed to `cancel`.
- Your suggestion to move the `hrtimer_queue.h` implementation into an internal header file is very good, as it reduces the likelihood of exposing this header to users.
- Your suggestions for documentation improvements are very helpful. Since RST documents cannot render Mermaid state diagrams, I used AI-generated ASCII art to describe the state diagram.

Regarding some other points that have not been changed yet, clarifications are as follows:

### 1. Determinism Issues Caused by Waiting

We need to discuss this case by case:

- In **non-SMP mode**, because a timer interrupt cannot occur while a thread is executing `cancel`, the semantics of `hrtimer_cancel` and `hrtimer_cancel_sync` are essentially the same, so no waiting is required. Therefore, both `hrtimer_start` and `hrtimer_cancel` are deterministic and predictable.
- In **SMP mode**, only `hrtimer_cancel_sync` requires waiting, which is unavoidable. This interface is typically called only before releasing a dynamically allocated `hrtimer`, or when all callback executions must be observed to complete. In most cases, **restarting a timer only requires the asynchronous `hrtimer_cancel` + `hrtimer_restart` combination, which does not involve waiting and is deterministic**. However, it requires users to be aware of synchronization issues in their callback functions.

### 2. Interface Design Issues

As mentioned, the reasons for adopting an interface in the form of `hrtimer_start(timer, func, arg, time)` are twofold:

- To align with the semantics of the `wdog/workqueue` interface, improve user-friendliness, and maintain consistency with kernel interface styles.
- To use `hrtimer->func` (`NULL` and `UINTPTR_MAX`) to encode state, saving 4 bytes of storage for the `state` field.

From a performance perspective, compared to the previous implementation, this PR's implementation does incur **two additional writes outside the critical section** when restarting a timer, but it saves **one write inside the critical section** (because `func` is used to encode state), as shown in the table:

| | `hrtimer_start(timer, func, arg, time)` | `hrtimer_start(time, mode)` |
| - | - | - |
| start timer | func, arg, expired | func, arg, expired, **state** |
| restart timer | func, arg, expired | expired, **state** |

Since these are consecutive writes to the same cache line and CPU pipelining can hide some of the latency, the extra writes outside the critical section should not cause significant performance issues in practice. Overall, I believe the `hrtimer_start(timer, func, arg, time)` interface form is better.

### 3. Lock Performance Issues

According to our test results (https://github.com/apache/nuttx/pull/17569/):

- On **non-SMP platforms**, the write-lock performance of seqcount is **1.58x** that of spinlock.
- On **SMP platforms**, the write-lock performance of seqcount is roughly comparable to that of spinlock.

However, these results are from x86 platforms and may vary across architectures. Therefore, I fully agree with your point about lock performance: no single lock fits all scenarios. For example, in multi-core real-time scenarios, we might need starvation-free queuing locks instead of high-throughput spinlocks. To address this, I will attempt to introduce a **composable lock abstraction layer** in future improvements, allowing users to customize the lock used by `hrtimer_queue`. The lock-related interfaces have already been abstracted, and I am considering how to better compose the lock data structure for `hrtimer_queue`.

### 4. Why Forward-declare the Dependent Function `hrtimer_reprogram`?

I also tried two other approaches, but each had its own issues:

- Using a pre-defined `hrtimer_ops_t` callback to inform users that they need to implement `hrtimer_reprogram` when instantiating an `hrtimer`. This works fine under GCC/Clang: the compiler can fully inline a `static const hrtimer_ops_t` and eliminate the overhead of the structure. However, I checked the assembly and found that safety-oriented compilers such as **CompCert** cannot perform such optimizations, so I had to abandon this approach.
- Using higher-order macro functions, passing `hrtimer_reprogram` as a dependent input function. This avoids the compiler optimization issue, but @xiaoxiang781216 and @GUIDINGLI considered it poor for readability.

After trying the above approaches, I chose to forward-declare the dependent function `hrtimer_reprogram`. This implementation meets the readability, performance, and space requirements, but requires slight caution during use (the inclusion order must strictly follow Figure 1; otherwise, compilation errors may occur).

### 5. Why Replace Instead of Improve?

I appreciate your use of **reference counting** in the latest version to solve the memory reclamation issue.
However, the problem is that **reference counting loses version information**. A version-aware method, such as **epoch-based memory reclamation** (a typical kernel implementation is Linux's QSBR-RCU), is needed to address this. As I mentioned in a previous scenario:

- **t0**: Thread 0 on Core 0 starts a periodic timer with callback `CA`, initial delay `1000 ns`, period `UINT32_MAX`.
- **t1**: The timer fires on Core 0, and callback `CA` is executing.
- **t2**: Thread 0 is scheduled onto Core 1, cancels the timer, and restarts a new timer with callback `CB`, initial delay `0 ns`, period `1 000 000 ns`.
- **t3**: The timer fires on Core 1, and callback `CB` starts executing.
- **t4**: Callback `CA` on Core 0 finishes, finds the `hrtimer` in a "running" state (but not of this version), and attempts to restart the timer by setting `hrtimer->expired += UINT32_MAX`.

Here, two errors occur:

- The old periodic timer that should have been canceled continues, overwriting the newly started timer.
- After the old timer restarts, its next trigger time is `t2 + UINT32_MAX`, which is incorrect; it should have been `t0 + 1000 ns + UINT32_MAX`.

**In this PR's HRTimer implementation, this situation is impossible:**

- **t0**: Thread 0 on Core 0 starts a periodic timer with callback `CA`, initial delay `1000 ns`, period `UINT32_MAX`.
- **t1**: The timer fires on Core 0, and callback `CA` is executing.
- **t2**: Thread 0 is scheduled onto Core 1, cancels the timer, **taking ownership** of the `hrtimer` data structure from Core 0, and restarts a new timer with callback `CB`, initial delay `0 ns`, period `1 000 000 ns`.
- **t3**: The timer fires on Core 1, and callback `CB` starts executing.
- **t4**: Callback `CA` on Core 0 finishes, checks the hazard pointer, realizes it has lost ownership of the `hrtimer`, and simply exits without further action.

The key difference is that **reference counting loses version information, while hazard pointers do not**.
Canceling a timer via hazard pointers ensures that, by time **t2**, all cores using the `hrtimer` have lost ownership of it, thereby **guaranteeing that an old timer callback can never overwrite a new timer**.

Designing correct concurrent algorithms requires systematic consideration. As I stated in https://github.com/apache/nuttx/pull/17570, after carefully reviewing your implementation, I could not find a solution that preserves version information without introducing significant performance or space overhead. That is why I suggested we not spend additional time on the previous design and instead directly adopt this implementation. I do not disrespect your work or doubt your abilities; I acknowledge you as a diligent and excellent engineer. I simply do not wish to see anyone stumble twice on the same issue.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
