> From: Thomas Monjalon [mailto:tho...@monjalon.net]
> Sent: Thursday, 26 October 2023 18.07
> 
> 26/10/2023 17:54, Bruce Richardson:
> > On Thu, Oct 26, 2023 at 04:59:51PM +0200, Morten Brørup wrote:
> > > > From: Morten Brørup [mailto:m...@smartsharesystems.com]
> > > > Sent: Thursday, 26 October 2023 16.50
> > > >
> > > > > From: Thomas Monjalon [mailto:tho...@monjalon.net]
> > > > > Sent: Thursday, 26 October 2023 16.31
> > > > >
> > > > > 26/10/2023 16:08, Morten Brørup:
> > > > > > > From: Thomas Monjalon [mailto:tho...@monjalon.net]
> > > > > > > Sent: Thursday, 26 October 2023 16.05
> > > > > > >
> > > > > > > 26/10/2023 15:57, Morten Brørup:
> > > > > > > > > From: Morten Brørup [mailto:m...@smartsharesystems.com]
> > > > > > > > > Sent: Thursday, 26 October 2023 15.45
> > > > > > > > >
> > > > > > > > > > From: Thomas Monjalon [mailto:tho...@monjalon.net]
> > > > > > > > > > Sent: Thursday, 26 October 2023 15.37
> > > > > > > > > >
> > > > > > > > > > 25/10/2023 18:31, Thomas Monjalon:
> > > > > > > > > > > Real-time thread priority was forbidden on Unix
> > > > > > > > > > > because of the problems it can cause.
> > > > > > > > > > > Warnings and helpers are added to avoid deadlocks,
> > > > > > > > > > > so real-time can be allowed on all systems.
> > > > > > > > > >
> > > > > > > > > > Unit test is failing:
> > > > > > > > > > DPDK:fast-tests / threads_autotest      TIMEOUT 600.01 s
> > > > > > > > > >
> > > > > > > > > > It is seen in only 1 target (maybe the failure occurrence is random):
> > > > > > > > > >   Debian 11 (Buster) (ARM) | PASS
> > > > > > > > > >   Fedora 37 (ARM)          | PASS
> > > > > > > > > >   CentOS Stream 9 (ARM)    | FAIL
> > > > > > > > > >   Fedora 38 (ARM)          | PASS
> > > > > > > > > >   Fedora 38 (ARM Clang)    | PASS
> > > > > > > > > >   Ubuntu 20.04 (ARM)       | PASS
> > > > > > > > > >
> > > > > > > > > > I need to send a v4 with new implementation and better comments.
> > > > > > > > > > The Unix sleep will be upgraded from 1 ns to 1 us in case it makes a
> > > > > > > > > > difference.
> > > > > > > > >
> > > > > > > > > It will not make a difference. The kernel will go through the
> > > > > > > > > sleeping steps, then wake up again and see the real-time thread is
> > > > > > > > > ready to run, and then immediately schedule it.
> > > > > > > > >
> > > > > > > > > For testing purposes, consider sleeping 10 milliseconds or something
> > > > > > > > > significant like that.
> > > > > > > >
> > > > > > > > A bit more details...
> > > > > > > >
> > > > > > > > In our recent tests, nanosleep() itself took around 50 us. So you need to
> > > > > > > > sleep longer than that for your thread not to be runnable when the
> > > > > > > > nanosleep() wakes up again, because 50 us has already passed in
> > > > > > > > "nanosleep overhead".
> > > > > > > > 10 milliseconds provides plenty of margin, and corresponds to 10 jiffies on
> > > > > > > > a 1000 Hz kernel. (I don't know if it makes any difference for the kernel
> > > > > > > > scheduler if the timer crosses a jiffy border or not.)
> > > > > > >
> > > > > > > 10 ms looks like an eternity.
> > > > > >
> > > > > > Agree. It is only for functional testing, not for production!
> > > > >
> > > > > > Realtime thread won't make any sense if we have to insert a long sleep.
> > > >
> > > > It seems David came to our rescue here!
> > > >
> > > > > I have just tried running our test again with prctl(PR_SET_TIMERSLACK) of
> > > > > 1 ns, and the nanosleep(1 ns) delay dropped from ca. 50 us to ca. 2.5 us.
> > > >
> > > > > The timeout parameter to epoll_wait() is in milliseconds, which is useless
> > > > > for low-latency.
> > > > > Perhaps real-time threads can be used with epoll() combined with timerfd
> > > > > for nanosecond resolution timeout.
> > >
> > > Or epoll_pwait2(), which has nanosecond resolution timeout.
> > >
> > > > Unfortunately, rte_epoll_wait() is not an experimental API anymore, so we
> > > > cannot change its timeout parameter from milliseconds to micro- or
> > > > nanoseconds. We would have to introduce a new API for this.
> > >
> >
> > > Just an idea - can we change the timeout parameter to float rather than int,
> > > and then use function versioning for backward compatibility for any
> > > binaries passing int?
> > > That way the actual meaning of the parameter doesn't change, but it still
> > > allows sub-millisecond values (albeit with some loss of accuracy due to
> > > float).

Too exotic for my taste. I would rather introduce rte_epoll_wait_ns() with 
timeout in nanoseconds than pass a float.
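
To illustrate what a nanosecond-timeout wait could look like internally, here is a minimal sketch built on the epoll + timerfd idea mentioned above. It assumes Linux. The names sketch_timer_attach(), sketch_epoll_wait_ns() and TIMER_TAG are hypothetical and are not DPDK APIs; the sketch also assumes the caller never uses TIMER_TAG as its own epoll user data.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

#define NS_PER_S 1000000000ULL
/* Assumption: callers never use this value as their own epoll user data. */
#define TIMER_TAG UINT64_MAX

/*
 * Hypothetical helper: create a timerfd and register it in epfd.
 * Returns the timerfd, or -1 with errno set.
 */
static int
sketch_timer_attach(int epfd)
{
	struct epoll_event ev = { .events = EPOLLIN, .data.u64 = TIMER_TAG };
	int tfd = timerfd_create(CLOCK_MONOTONIC, TFD_NONBLOCK | TFD_CLOEXEC);

	if (tfd < 0)
		return -1;
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, tfd, &ev) < 0) {
		close(tfd);
		return -1;
	}
	return tfd;
}

/*
 * Hypothetical helper: wait on epfd for at most timeout_ns nanoseconds.
 * Returns the number of I/O events stored in events, 0 if only the
 * timer fired, or -1 with errno set.
 */
static int
sketch_epoll_wait_ns(int epfd, int tfd, struct epoll_event *events,
		int maxevents, uint64_t timeout_ns)
{
	struct itimerspec arm = {
		.it_value = {
			.tv_sec = timeout_ns / NS_PER_S,
			.tv_nsec = timeout_ns % NS_PER_S,
		},
	};
	const struct itimerspec disarm = { 0 };
	int n, i, out = 0, saved_errno;

	/* An all-zero it_value disarms the timer, so round 0 up to 1 ns. */
	if (timeout_ns == 0)
		arm.it_value.tv_nsec = 1;
	if (timerfd_settime(tfd, 0, &arm, NULL) < 0)
		return -1;

	n = epoll_wait(epfd, events, maxevents, -1);
	saved_errno = errno;
	/* Changing the timer settings also clears any pending expiration. */
	timerfd_settime(tfd, 0, &disarm, NULL);
	errno = saved_errno;
	if (n < 0)
		return -1;

	/* Hide the timer event from the caller, keep the I/O events. */
	for (i = 0; i < n; i++) {
		if (events[i].data.u64 == TIMER_TAG)
			continue;
		events[out++] = events[i];
	}
	return out;
}
```

This costs two extra timerfd_settime() calls per wait. Where the kernel and libc provide it, epoll_pwait2() avoids that overhead by taking a struct timespec timeout directly.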

> 
> Sorry I'm not following why you want to use rte_epoll_wait()?

I don't have experience with it yet, but it seems to be the official DPDK API 
for blocking I/O system calls.

> 
> If the realtime thread has some blocking system calls,
> no sleep is needed I think.

Correct.

> For average realtime thread, I suggest the API
> rte_thread_yield_realtime()
> which could wait for 1 ms or less by default.

If we introduce a yield() function, it should yield according to the O/S 
scheduling policy, e.g. the rest of the time slice allocated to the thread by 
the O/S scheduler (although that might not be available for real-time 
prioritized threads in Linux). I don't think we can make it O/S agnostic.

I don't think it should wait a fixed amount of time - we already have 
rte_delay_us_sleep() for that.

In my experiments with power saving, I ended up with a varying sleep duration, 
depending on traffic load. The l3fwd-power example also uses a varying sleep 
duration depending on traffic load.

> For smaller sleep, we can use PR_SET_TIMERSLACK and
> rte_delay_us_sleep().

Agree.
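
For anyone wanting to repeat the measurement, here is a minimal sketch of the kind of test described earlier in this thread: set the timer slack with prctl(PR_SET_TIMERSLACK) and time nanosleep(1 ns). It assumes Linux. The helper names set_timer_slack_ns() and avg_nanosleep_1ns() are hypothetical, not DPDK APIs, and the resulting numbers will depend on the kernel and hardware.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/prctl.h>
#include <time.h>

/* Monotonic clock reading in nanoseconds. */
static uint64_t
now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/*
 * Hypothetical helper: set the timer slack of the calling thread.
 * Returns 0 on success, -1 with errno set on failure.
 * Note: a value of 0 resets the slack to the thread's default.
 */
static int
set_timer_slack_ns(unsigned long slack_ns)
{
	return prctl(PR_SET_TIMERSLACK, slack_ns, 0, 0, 0);
}

/*
 * Hypothetical helper: average wall-clock duration, in nanoseconds,
 * of a nanosleep() call requesting 1 ns.
 */
static uint64_t
avg_nanosleep_1ns(unsigned int iterations)
{
	const struct timespec req = { .tv_sec = 0, .tv_nsec = 1 };
	uint64_t start = now_ns();
	unsigned int i;

	for (i = 0; i < iterations; i++)
		nanosleep(&req, NULL);
	return (now_ns() - start) / iterations;
}
```

Calling avg_nanosleep_1ns() once before and once after set_timer_slack_ns(1) on the same thread gives the two figures to compare. The timer slack is a per-thread setting, and the prctl(2) man page states that it is not applied to threads running under a real-time scheduling policy, so the measurement should be made under the policy the thread will actually use.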

> If we provide an API for PR_SET_TIMERSLACK, we could adapt the duration
> of rte_thread_yield_realtime() dynamically after calling prctl().
> 

I'm not sure exposing an API for PR_SET_TIMERSLACK is the right solution.

I would rather have the EAL set the timer slack to minimum (1 ns) at EAL 
initialization. An EAL command line parameter could be added to change the 
default from 1 ns.

Also, something similar needs to be done for Windows.
