Yes, it worked before, but a long time ago. I tested this on both arm (stm32f7) and risc-v (mpfs) platforms.

I tracked the problem down to this patch:

commit 19758788356f8623bac5f439419e231ff81cac14
Author: Huang Qi <huang...@xiaomi.com>
Date:   Mon Apr 11 18:42:24 2022 +0800
    arch/risc-v: Apply common mtime driver to mtime based chips
    Signed-off-by: Huang Qi <huang...@xiaomi.com>

The problem seems to be specific to RISC-V platforms.

If I revert the changes in my platform (mpfs) in file arch/risc-v/src/mpfs/mpfs_timerisr.c, and handle the timer interrupt there, everything seems to be working again.

The "common mtimer driver" seems a bit complex (using the alarm interface), and I don't have time to debug it right now; I need to come back to the issue later. Maybe there is a race condition somewhere.

Warning to others: this might be broken for other risc-v platforms as well.

- Jukka


On 17.5.2023 18.51, Nathan Hartman wrote:
Was it working before? If so, are you able to use a git bisect to find
the commit where the bug was introduced? This might minimize the
amount of testing and debugging that needs to be done.

On Wed, May 17, 2023 at 11:12 AM Jukka Laitinen <jlait...@gmail.com> wrote:


Petro Karashchenko kirjoitti keskiviikko 17. toukokuuta 2023:
How do you measure the wait period? Are you toggling a pin or using an MCU
free-running HW timer?
I used RISC-V MTIMER, so it is a free running HW counter at 1us resolution

Best regards,
Petro

On Wed, May 17, 2023, 5:43 PM Jukka Laitinen <jukka.laiti...@iki.fi> wrote:

On 17.5.2023 16.38, Gregory Nutt wrote:
On 5/17/2023 7:21 AM, Gregory Nutt wrote:
On 5/17/2023 4:21 AM, Jukka Laitinen wrote:
Hi,

I just observed the behaviour mentioned in the subject;

I tried just calling in a loop:

"

     sem_t sem = SEM_INITIALIZER(0);

     int ret;

     ret = nxsem_tickwait_uninterruptible(&sem, 1);

"

, and never posting the sem from anywhere. The function returns
-ETIMEDOUT properly on every call.

But when measuring the time spent in the wait, I see that the sleep
time is occasionally less than one systick.

If I set systick to 10ms, I see typical (correct) sleep time between
10000 - 20000us. But sometimes (very randomly) between 0 - 10000us.
Also in these error cases the return value is correct (-110,
-ETIMEDOUT).

When sleeping for 2 ticks, I randomly see sleep times between
10000-20000us, and for 3 ticks between 20000-30000us. So, when it
fails, the sleep is exactly one systick too short.
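
The ranges above follow a simple rule: for a request of N ticks with tick period T, a correct wait lasts at least N*T, while the faulty cases fall between (N-1)*T and N*T. Below is a minimal sketch of a checker for logged measurements, assuming the elapsed time was captured in microseconds from a free-running counter. The enum and function names are hypothetical and are not NuttX APIs.

```c
#include <stdint.h>

/* Hypothetical classification of one measured wait (illustrative names,
 * not part of NuttX).
 */

enum wait_result
{
  WAIT_OK,              /* Slept at least the requested number of ticks */
  WAIT_ONE_TICK_SHORT,  /* Slept at least (ticks - 1) ticks, but not ticks */
  WAIT_TOO_SHORT        /* Slept less than (ticks - 1) tick periods */
};

enum wait_result classify_wait(uint32_t elapsed_us, uint32_t ticks,
                               uint32_t usec_per_tick)
{
  /* Use 64 bits so that ticks * usec_per_tick cannot overflow */

  uint64_t requested_us = (uint64_t)ticks * usec_per_tick;

  if (elapsed_us >= requested_us)
    {
      return WAIT_OK;
    }

  /* ticks > 0 here, so the subtraction below cannot underflow */

  if (ticks > 0 && elapsed_us >= requested_us - usec_per_tick)
    {
      return WAIT_ONE_TICK_SHORT;
    }

  return WAIT_TOO_SHORT;
}
```

With a 10 ms tick, a 1-tick wait measured at 4000us or a 3-tick wait measured at 25000us would both be reported as one tick short.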

I looked through the implementation of
"nxsem_tickwait_uninterruptible" itself, and didn't see a problem
there. (Actually, I think there is a bug if -EINTR occurs; in that
case it should always sleep at least one tick more - now it doesn't.
But it is not related to this; in my test there was no -EINTR).

I believe the problem might be somewhere in sched/wdog/, but so far
I couldn't track down what causes it.

Has anyone else seen the same issue?

Br,

Jukka

If I understand what you are seeing properly, then it is normal and
correct behavior for an arbitrary (asynchronous) timer.  See
https://cwiki.apache.org/confluence/display/NUTTX/Short+Time+Delays
for an explanation.

NuttX timers have always worked that way and have confused people that
use the timers near the limits of their resolution.  A solution is to
use a very high resolution timer in tickless mode.


Oops.  You are seeing a timer that is 1 tick too short.  That is an
error and should never happen.  Sorry for reading incorrectly. It was
still early in the morning here.

The timer logic adds +1 tick to the requested delay to ensure that this
error never occurs.  If +1 were not added, the bad result would be
exactly as you describe (and as explained in the confluence reference).
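
To illustrate why the extra tick matters, here is a simplified model of a tick-based wait. It is only a sketch under stated assumptions, not the NuttX implementation: tick interrupts are assumed to fire at exact multiples of the tick period, a wait is assumed to expire on the n-th tick interrupt after it is armed, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model (not NuttX code): a wait armed at start_us expires on
 * the wait_ticks-th tick interrupt that follows.  Returns the modeled
 * elapsed time in microseconds.
 */

uint64_t modeled_elapsed_us(uint64_t start_us, uint64_t wait_ticks,
                            uint32_t usec_per_tick)
{
  uint64_t current_tick;
  uint64_t expiry_us;

  assert(usec_per_tick > 0);

  /* Tick count at the moment the wait is armed (rounded down) */

  current_tick = start_us / usec_per_tick;

  /* The wait ends exactly on a tick boundary */

  expiry_us = (current_tick + wait_ticks) * usec_per_tick;
  return expiry_us - start_us;
}

/* Same model, with one tick added to the requested delay */

uint64_t modeled_elapsed_padded_us(uint64_t start_us,
                                   uint64_t requested_ticks,
                                   uint32_t usec_per_tick)
{
  return modeled_elapsed_us(start_us, requested_ticks + 1, usec_per_tick);
}
```

In this model, a 1-tick wait armed 7000us into a 10000us tick period ends after only 3000us without the padding, and after 13000us with it. The padded wait is never shorter than the request, which matches the behavior described above.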


Hi, yes, exactly. Seeing timeout 1 tick too short. Sorry for not
explaining it clearly enough :)

I fear that there is now some bug. It was rather easy to reproduce,
just a loop with a few thousand iterations, and it occurs (infinite loop,
10 ms tick, less than a minute to catch). Most of the time it works ok;
the sleep time is longer than the requested ticks. But when it triggers,
the sleep is exactly one tick too short (and shorter than the requested
timeout in ticks).

I was just asking, if others have seen this as well; I'd like to know if
it is really a bug in current nuttx main. It is always possible that
there is something funny in our local build - although I can't see what
it could be.

-Jukka



