Fix-Point commented on PR #17675: URL: https://github.com/apache/nuttx/pull/17675#issuecomment-3692774822
> @Fix-Point
>
> Let me make a conclusion of all my comments for your implementation into four categories:
>
> 1. Your way of resolving multicore running the same hrtimer for SMP system has performance issue and is not deterministic
> 2. You does not consider to remove un-needed operations when running on non-SMP platform, so your implementation is also not good for non-smp platform
> 3. Your APIs are two complicated, not friendly for users
> 4. You do not decouple the queue abstraction for hrtimer very well, the queue related operations are not needed to be exposed to users

Thank you for your suggestions. Your comments are very insightful. Below are the modifications made based on your advice:

- Your suggestion regarding performance improvement in non-SMP mode is excellent. Following your advice, macros have been added to remove unnecessary waits that cannot occur in non-SMP scenarios.
- Regarding the point about the API taking too many parameters, I decided to fully align with the `wdog/workqueue` interface: the original `cancel` is now named `cancel_sync`, while `async_cancel` is renamed to `cancel`.
- Your suggestion to move the `hrtimer_queue.h` implementation into an internal header file is very good, as it reduces the likelihood of exposing this header to users.
- Your suggestions for documentation improvements are very helpful. Since RST documents cannot render Mermaid state diagrams, I used AI-generated ASCII art to describe the state diagram.

Regarding some other points that have not been changed yet, clarifications are as follows:

### 1. Determinism Issues Caused by Waiting

We need to discuss this case by case:

- In **non-SMP mode**, because a timer interrupt cannot occur while a thread is executing `cancel`, the semantics of `hrtimer_cancel` and `hrtimer_cancel_sync` are essentially the same, so no waiting is required. Therefore, both `hrtimer_start` and `hrtimer_cancel` are deterministic and predictable.
- In **SMP mode**, only `hrtimer_cancel_sync` requires waiting, which is unavoidable. This interface is typically called only before releasing a dynamically allocated `hrtimer`, or when all callback executions must be observed to complete. In most cases, **restarting a timer only requires the asynchronous `hrtimer_cancel` + `hrtimer_restart` combination, which does not involve waiting and is deterministic**. However, it requires users to be aware of synchronization issues in their callback functions.

### 2. Interface Design Issues

As mentioned, the reasons for adopting an interface in the form of `hrtimer_start(timer, func, arg, time)` are twofold:

- To align with the semantics of the `wdog/workqueue` interface, improve user-friendliness, and maintain consistency with kernel interface styles.
- To use `hrtimer->func` (`NULL` and `UINTPTR_MAX`) to encode state, saving 4 bytes of storage for the `state` field.

From a performance perspective, compared to the previous implementation, this PR's implementation does incur **two additional writes outside the critical section** when restarting a timer, but it saves **one write inside the critical section** (because `func` is used to encode state), as shown in the table:

| | `hrtimer_start(timer, func, arg, time)` | `hrtimer_start(time, mode)` |
| - | - | - |
| start timer | func, arg, expired | func, arg, expired, **state** |
| restart timer | func, arg, expired | expired, **state** |

Since these are consecutive writes to the same cache line and CPU pipelining can hide some of the latency, the extra writes outside the critical section should not cause significant performance issues in practice. Overall, I believe the `hrtimer_start(timer, func, arg, time)` interface form is better.

### 3. Lock Performance Issues

According to our test results (https://github.com/apache/nuttx/pull/17569/):

- On **non-SMP platforms**, the write-lock performance of seqcount is **1.58x** that of spinlock.
- On **SMP platforms**, the write-lock performance of seqcount is roughly comparable to that of spinlock.

However, these results are from x86 platforms and may vary across architectures. Therefore, I fully agree with your point about lock performance: no single lock fits all scenarios. For example, in multi-core real-time scenarios, we might need starvation-free queuing locks instead of high-throughput spinlocks. To address this, I will attempt to introduce a **composable lock abstraction layer** in future improvements, allowing users to customize the lock used by `hrtimer_queue`. The lock-related interfaces have already been abstracted, and I am considering how to better compose the lock data structure for `hrtimer_queue`.

### 4. Why Forward-declare the Dependent Function `hrtimer_reprogram`?

I also tried two other approaches, but each had its own issues:

- Using a pre-defined `hrtimer_ops_t` callback to inform users that they need to implement `hrtimer_reprogram` when instantiating an `hrtimer`. This works fine under GCC/Clang: the compiler can fully inline a `static const hrtimer_ops_t` and eliminate the overhead of the structure. However, I checked the assembly and found that safety-oriented compilers such as **CompCert** cannot perform such optimizations, so I had to abandon this approach.
- Using higher-order macro functions, passing `hrtimer_reprogram` as a dependent input function. This avoids the compiler optimization issue, but @xiaoxiang781216 and @GUIDINGLI considered it poor for readability.

After trying the above approaches, I chose to forward-declare the dependent function `hrtimer_reprogram`. This implementation meets the readability, performance, and space requirements, but requires slight caution during use (the inclusion order must strictly follow Figure 1; otherwise, compilation errors may occur).

### 5. Why Replace Instead of Improve?

I appreciate your use of **reference counting** in the latest version to solve the memory reclamation issue.
However, the problem is that **reference counting loses version information**. A version-aware method, such as **epoch-based memory reclamation** (a typical kernel implementation is Linux's QSBR-RCU), is needed to address this. As I mentioned in a previous scenario:

- **t0**: Thread 0 on Core 0 starts a periodic timer with callback `CA`, initial delay `1000 ns`, period `UINT32_MAX`.
- **t1**: The timer fires on Core 0, and callback `CA` is executing.
- **t2**: Thread 0 is scheduled onto Core 1, cancels the timer, and restarts a new timer with callback `CB`, initial delay `0 ns`, period `1 000 000 ns`.
- **t3**: The timer fires on Core 1, and callback `CB` starts executing.
- **t4**: Callback `CA` on Core 0 finishes, finds the `hrtimer` in a "running" state (but not of this version), and attempts to restart the timer by setting `hrtimer->expired += UINT32_MAX`.

Here, two errors occur:

- The old periodic timer that should have been canceled continues, overwriting the newly started timer.
- After the old timer restarts, its next trigger time is `t2 + UINT32_MAX`, which is incorrect; it should have been `t0 + 1000 ns + UINT32_MAX`.

**In this PR's HRTimer implementation, this situation is impossible:**

- **t0**: Thread 0 on Core 0 starts a periodic timer with callback `CA`, initial delay `1000 ns`, period `UINT32_MAX`.
- **t1**: The timer fires on Core 0, and callback `CA` is executing.
- **t2**: Thread 0 is scheduled onto Core 1, cancels the timer, **taking ownership** of the `hrtimer` data structure from Core 0, and restarts a new timer with callback `CB`, initial delay `0 ns`, period `1 000 000 ns`.
- **t3**: The timer fires on Core 1, and callback `CB` starts executing.
- **t4**: Callback `CA` on Core 0 finishes, checks the hazard pointer, realizes it has lost ownership of the `hrtimer`, and simply exits without further action.

The key difference is that **reference counting loses version information, while hazard pointers do not**.
Canceling a timer via hazard pointers ensures that, by time **t2**, all cores using the `hrtimer` have lost ownership of it, thereby **guaranteeing that an old timer callback can never overwrite a new timer**.

Designing correct concurrent algorithms requires systematic consideration. As I stated in https://github.com/apache/nuttx/pull/17570, after carefully reviewing your implementation, I could not find a solution that preserves version information without introducing significant performance or space overhead. That is why I suggested we not spend additional time on the previous design and instead directly adopt this implementation. I do not disrespect your work or doubt your abilities; I acknowledge you as a diligent and excellent engineer. I simply do not wish to see anyone stumble twice on the same issue.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
