Re: Process timing issue
On 22/02/2011, at 2:54, Jerome Flesch wrote:
> While investigating a timing issue with one of our programs, we found out
> something weird: we've written a small test program that just calls
> clock_gettime() a lot of times and checks that the time difference between
> calls makes sense. In the end, it seems it doesn't always do.
>
> Calling clock_gettime() twice in a row usually takes less than 1 ms. But
> with an average load, about 1 time in 20, more than 10 ms are spent between
> both calls for no apparent reason. According to our tests, when it happens,
> the time between both calls can range from a few milliseconds to many
> seconds (our best score so far is 10 seconds :). The same goes for
> gettimeofday().
>
> To reproduce this issue, we use the small test program attached to this
> mail and openssl speed test commands (usually 'openssl speed -elapsed
> dsa2048'). It only appears if one of these openssl speed tests runs on the
> same core as our test program, so we have to start as many openssl
> instances as there are cores on the test computer.

This sounds like a problem I have trying to read from a FIFO in hardware over
USB. Most of the time the libusb poll thread doesn't take very long, but
every now and then it takes too long and my FIFO fills up (~75 msec). If you
do find a solution I would be very interested :)

In the end I wrote a kernel driver which read & buffered the data, but I
suppose that is not possible for you.

--
Daniel O'Connor
software and network engineer for Genesis Software - http://www.gsoft.com.au
"The nice thing about standards is that there are so many of them to choose
from." -- Andrew Tanenbaum
GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: Process timing issue
On Thursday, February 24, 2011 12:34:04 pm Andriy Gapon wrote:
> on 24/02/2011 16:18 Jerome Flesch said the following:
> > Thanks for your explanations. It helped greatly. Using ktrdump and
> > schedgraph.py, and after modifying our test program to set and unset
> > debug.ktr.mask automatically, I've been able to get useful information.
> >
> > First, it made me realize that task switching, with default settings and
> > 2 active processes, only occurs every 100 ms. Knowing that, expecting a
> > latency of around 100 ms was kind of silly :)
> >
> > Next, it seems most of the latency spikes are due to a process starting
> > or waking up. For instance, it usually happens when the openssl speed
> > test is started (
> > http://jflesch.kwain.net/~jflesch/sys_latence/sched/sched_graph_openssl_start.png
> > ) or when pagedaemon wakes up (I forgot to disable the swap and my test
> > program used too much memory to store the result values ...). I'm not
> > sure why, but when we start openssl, it is often allowed to run for
> > >= 300 ms, even with our test program set to real-time priority. My
> > intuition is that, at first, it's considered an interactive process,
> > until the scheduler realizes it's not. But then, does anyone know why it
> > would take more than 300 ms for the scheduler to realize that?
> >
> > Anyway, by setting kern.sched.interact=5 (so openssl isn't considered an
> > interactive process), kern.sched.slice=3 (to get a high enough
> > scheduling resolution), and our program to real-time priority, we got
> > rid of both problems. I'm just a little bit worried about
> > kern.sched.slice=3. Is there any known side effect of reducing the slice
> > size?
> >
> > Also, another issue remains: we were hoping to keep our program at a
> > normal priority. However, when we set our test program to a normal
> > priority (but still a higher priority than openssl), both get 50% of the
> > CPU (I guess this is to be expected), and from time to time we have a
> > "hiccup" in the scheduling:
> > http://jflesch.kwain.net/~jflesch/sys_latence/sched/sched_graph_hicups.png
> > . Is there any way to avoid them? In other words, is it possible to make
> > sure that the low-priority process never gets more CPU time than the
> > high-priority one?
>
> The problems that you describe here sound very much like the issues that
> John Baldwin has been trying to solve a short while ago. My recollection
> is that he committed some improvements for real-time priority processes.
> Perhaps he'll have some additional insights based on his observations and
> testing.

Well, the changes I made to 9 simply made rtprio more important than
interactive, so that rtprio will always preempt interactive time-sharing
threads. I'm not quite sure that this is exactly the same. Note that by
default ULE does give interactive processes realtime priority, so that is
why openssl would not yield early on during startup. As to why it takes the
scheduler 300 ms to decide openssl is a CPU hog, I'm less sure. You'd have
to look at the interactive scoring code in ULE to debug that.

How are you setting your program to a "normal" priority that is still higher
than openssl? Are you using nice?

Hmm, during your hiccup it looks like openssl got two time slices back to
back rather than a single slice. Also, note that in the hiccup graph, both
threads have the same priority (183), so openssl effectively has the same
priority as timecheck.

--
John Baldwin
Re: Process timing issue
on 24/02/2011 16:18 Jerome Flesch said the following:
> Thanks for your explanations. It helped greatly. Using ktrdump and
> schedgraph.py, and after modifying our test program to set and unset
> debug.ktr.mask automatically, I've been able to get useful information.
>
> First, it made me realize that task switching, with default settings and 2
> active processes, only occurs every 100 ms. Knowing that, expecting a
> latency of around 100 ms was kind of silly :)
>
> Next, it seems most of the latency spikes are due to a process starting or
> waking up. For instance, it usually happens when the openssl speed test is
> started (
> http://jflesch.kwain.net/~jflesch/sys_latence/sched/sched_graph_openssl_start.png
> ) or when pagedaemon wakes up (I forgot to disable the swap and my test
> program used too much memory to store the result values ...). I'm not sure
> why, but when we start openssl, it is often allowed to run for >= 300 ms,
> even with our test program set to real-time priority. My intuition is
> that, at first, it's considered an interactive process, until the
> scheduler realizes it's not. But then, does anyone know why it would take
> more than 300 ms for the scheduler to realize that?
>
> Anyway, by setting kern.sched.interact=5 (so openssl isn't considered an
> interactive process), kern.sched.slice=3 (to get a high enough scheduling
> resolution), and our program to real-time priority, we got rid of both
> problems. I'm just a little bit worried about kern.sched.slice=3. Is there
> any known side effect of reducing the slice size?
>
> Also, another issue remains: we were hoping to keep our program at a
> normal priority. However, when we set our test program to a normal
> priority (but still a higher priority than openssl), both get 50% of the
> CPU (I guess this is to be expected), and from time to time we have a
> "hiccup" in the scheduling:
> http://jflesch.kwain.net/~jflesch/sys_latence/sched/sched_graph_hicups.png
> . Is there any way to avoid them? In other words, is it possible to make
> sure that the low-priority process never gets more CPU time than the
> high-priority one?

The problems that you describe here sound very much like the issues that
John Baldwin has been trying to solve a short while ago. My recollection is
that he committed some improvements for real-time priority processes.
Perhaps he'll have some additional insights based on his observations and
testing.

--
Andriy Gapon
Re: Process timing issue
Thanks for your explanations. It helped greatly. Using ktrdump and
schedgraph.py, and after modifying our test program to set and unset
debug.ktr.mask automatically, I've been able to get useful information.

First, it made me realize that task switching, with default settings and 2
active processes, only occurs every 100 ms. Knowing that, expecting a
latency of around 100 ms was kind of silly :)

Next, it seems most of the latency spikes are due to a process starting or
waking up. For instance, it usually happens when the openssl speed test is
started (
http://jflesch.kwain.net/~jflesch/sys_latence/sched/sched_graph_openssl_start.png
) or when pagedaemon wakes up (I forgot to disable the swap and my test
program used too much memory to store the result values ...). I'm not sure
why, but when we start openssl, it is often allowed to run for >= 300 ms,
even with our test program set to real-time priority. My intuition is that,
at first, it's considered an interactive process, until the scheduler
realizes it's not. But then, does anyone know why it would take more than
300 ms for the scheduler to realize that?

Anyway, by setting kern.sched.interact=5 (so openssl isn't considered an
interactive process), kern.sched.slice=3 (to get a high enough scheduling
resolution), and our program to real-time priority, we got rid of both
problems. I'm just a little bit worried about kern.sched.slice=3. Is there
any known side effect of reducing the slice size?

Also, another issue remains: we were hoping to keep our program at a normal
priority. However, when we set our test program to a normal priority (but
still a higher priority than openssl), both get 50% of the CPU (I guess this
is to be expected), and from time to time we have a "hiccup" in the
scheduling:
http://jflesch.kwain.net/~jflesch/sys_latence/sched/sched_graph_hicups.png .
Is there any way to avoid them? In other words, is it possible to make sure
that the low-priority process never gets more CPU time than the
high-priority one?

On 23.02.2011 02:08, Ryan Stone wrote:
> To debug weird scheduling issues I find it helpful to start by looking at
> a schedgraph. schedgraph is a tool that can display a graphical
> representation of what the scheduler was doing over a small slice of time.
> The one downside is that you have to recompile your kernel to get the
> hooks that collect the necessary data, but it sounds like your problem is
> pretty easy to reproduce, so that shouldn't be a big problem.
>
> To enable the hooks, put the following in your kernel conf:
>
> options KTR
> options KTR_COMPILE=KTR_SCHED
> options KTR_MASK=KTR_SCHED
> options KTR_ENTRIES=(128*1024)
>
> Then rebuild and install the new kernel.
>
> Next, run your test. The instant that your test has detected the long
> delay, set the sysctl debug.ktr.mask to 0. The scheduler hooks record data
> into a ring buffer, so the interesting data can be flushed out within
> seconds. Clearing that sysctl will stop any further events from being
> logged, which means that the interesting data will stay there until you go
> and collect it.
>
> You can get the raw data by redirecting the output of "ktrdump -ct" into a
> file. Then, from any machine with a graphical interface (FreeBSD with X
> installed, Windows, Mac, whatever) and Python installed, run:
>
> $ python schedgraph.py /path/to/ktrdump/output
>
> You can get schedgraph.py from /usr/src/tools/sched. If you want to
> collect the data again, set the sysctl debug.ktr.mask back to 0x2000 to
> restart gathering scheduler data.
>
> Ryan
Re: Process timing issue
To debug weird scheduling issues I find it helpful to start by looking at a
schedgraph. schedgraph is a tool that can display a graphical representation
of what the scheduler was doing over a small slice of time. The one downside
is that you have to recompile your kernel to get the hooks that collect the
necessary data, but it sounds like your problem is pretty easy to reproduce,
so that shouldn't be a big problem.

To enable the hooks, put the following in your kernel conf:

options KTR
options KTR_COMPILE=KTR_SCHED
options KTR_MASK=KTR_SCHED
options KTR_ENTRIES=(128*1024)

Then rebuild and install the new kernel.

Next, run your test. The instant that your test has detected the long delay,
set the sysctl debug.ktr.mask to 0. The scheduler hooks record data into a
ring buffer, so the interesting data can be flushed out within seconds.
Clearing that sysctl will stop any further events from being logged, which
means that the interesting data will stay there until you go and collect it.

You can get the raw data by redirecting the output of "ktrdump -ct" into a
file. Then, from any machine with a graphical interface (FreeBSD with X
installed, Windows, Mac, whatever) and Python installed, run:

$ python schedgraph.py /path/to/ktrdump/output

You can get schedgraph.py from /usr/src/tools/sched. If you want to collect
the data again, set the sysctl debug.ktr.mask back to 0x2000 to restart
gathering scheduler data.

Ryan
Re: Process timing issue
Jerome Flesch wrote this message on Tue, Feb 22, 2011 at 10:22 +0100:
> We expected both processes (the test program and openssl) to each get half
> the CPU time and to be scheduled quite often (at least once every 10 ms).
> According to the output of our test program, it works fine for most of the
> calls to clock_gettime(), but from time to time (about 1 loop in 20 on my
> computer), we have a latency spike (>= 100 ms).

Are you sure there isn't a cron task or something else that is suddenly
waking up, causing a large CPU spike?

--
John-Mark Gurney				Voice: +1 415 225 5579
"All that I will do, has been done, All that I have, has not."
Re: Process timing issue
On Feb 22, 2011, at 1:22 AM, Jerome Flesch wrote:
>> A scheduler quantum of 10 ms (or HZ=100) is a common granularity;
>> probably some other process got the CPU and your timer process didn't run
>> until the next or some later scheduler tick. If you are maxing out the
>> available CPU by running many "openssl speed" tasks, then this behavior
>> is more-or-less expected.
>
> We did most of our tests with kern.hz=1000 (the default FreeBSD value as
> far as I know) and we also tried with kern.hz=2000 and kern.hz=1. It
> didn't change a thing.

For a long time kern.hz=100 was the default; more recent versions have
switched to kern.hz=1000, which is beneficial for minimizing latency for
things like ipfw/dummynet processing, but also involves greater scheduler
overhead. kern.hz=1 is likely to reduce performance and may have odd effects
on certain kernel timers.

> Also, we are talking about a process not being scheduled for more than
> 100 ms with only 1 instance of openssl on the same CPU core. Even with a
> scheduler quantum of 10 ms, I find that worrying :/

It depends on what else the machine is doing. Gathering some data via
acct/sa might be informative, as you might be running some other tasks via
cron or whatever which your testing isn't expecting.

> We expected both processes (the test program and openssl) to each get half
> the CPU time and to be scheduled quite often (at least once every 10 ms).
> According to the output of our test program, it works fine for most of the
> calls to clock_gettime(), but from time to time (about 1 loop in 20 on my
> computer), we have a latency spike (>= 100 ms).
>
> Thing is, these spikes wouldn't worry us much if they didn't last longer
> than 1 s, but on some occasions they do.

Also, are you sure that you don't have ntpdate or ntpd or something else
adjusting your system clock underneath you...?

Regards,
--
-Chuck
Re: Process timing issue
> On Feb 21, 2011, at 8:24 AM, Jerome Flesch wrote:
>> While investigating a timing issue with one of our programs, we found out
>> something weird: we've written a small test program that just calls
>> clock_gettime() a lot of times and checks that the time difference
>> between calls makes sense. In the end, it seems it doesn't always do.
>>
>> Calling clock_gettime() twice in a row usually takes less than 1 ms. But
>> with an average load, about 1 time in 20, more than 10 ms are spent
>> between both calls for no apparent reason. According to our tests, when
>> it happens, the time between both calls can range from a few milliseconds
>> to many seconds (our best score so far is 10 seconds :). The same goes
>> for gettimeofday().
>
> A scheduler quantum of 10 ms (or HZ=100) is a common granularity; probably
> some other process got the CPU and your timer process didn't run until the
> next or some later scheduler tick. If you are maxing out the available CPU
> by running many "openssl speed" tasks, then this behavior is more-or-less
> expected.

We did most of our tests with kern.hz=1000 (the default FreeBSD value as far
as I know) and we also tried with kern.hz=2000 and kern.hz=1. It didn't
change a thing.

Also, we are talking about a process not being scheduled for more than
100 ms with only 1 instance of openssl on the same CPU core. Even with a
scheduler quantum of 10 ms, I find that worrying :/

We expected both processes (the test program and openssl) to each get half
the CPU time and to be scheduled quite often (at least once every 10 ms).
According to the output of our test program, it works fine for most of the
calls to clock_gettime(), but from time to time (about 1 loop in 20 on my
computer), we have a latency spike (>= 100 ms).

Thing is, these spikes wouldn't worry us much if they didn't last longer
than 1 s, but on some occasions they do.

>> We tried setting the test program to the highest priority possible
>> (rtprio(REALTIME, RTP_PRIO_MAX)) and it doesn't seem to change a thing.
>>
>> Does anyone know if there is a reason for this behavior? Is there
>> something that can be done to improve things?
>
> FreeBSD doesn't offer hard realtime guarantees, and it values maximizing
> throughput for all tasks more than it does providing absolute minimum
> latency even for something wanting RT. There has been some discussion in
> the past about making RT tasks with very high priority less preemptible by
> lower-priority tasks by removing or reducing the priority lowering that
> occurs when a task gets allocated CPU time.
>
> What problem are you trying to solve where continuous CPU load and minimum
> latency realtime are both required?

We are not looking for hard realtime guarantees. Most of our tests were done
at normal priority. Using real-time priority on our test program was just a
try to see if it improves things. From what I can tell, it doesn't :/

Regards,
Re: Process timing issue
On Feb 21, 2011, at 8:24 AM, Jerome Flesch wrote:
> While investigating a timing issue with one of our programs, we found out
> something weird: we've written a small test program that just calls
> clock_gettime() a lot of times and checks that the time difference between
> calls makes sense. In the end, it seems it doesn't always do.
>
> Calling clock_gettime() twice in a row usually takes less than 1 ms. But
> with an average load, about 1 time in 20, more than 10 ms are spent
> between both calls for no apparent reason. According to our tests, when it
> happens, the time between both calls can range from a few milliseconds to
> many seconds (our best score so far is 10 seconds :). The same goes for
> gettimeofday().

A scheduler quantum of 10 ms (or HZ=100) is a common granularity; probably
some other process got the CPU and your timer process didn't run until the
next or some later scheduler tick. If you are maxing out the available CPU
by running many "openssl speed" tasks, then this behavior is more-or-less
expected.

> We tried setting the test program to the highest priority possible
> (rtprio(REALTIME, RTP_PRIO_MAX)) and it doesn't seem to change a thing.
>
> Does anyone know if there is a reason for this behavior? Is there
> something that can be done to improve things?

FreeBSD doesn't offer hard realtime guarantees, and it values maximizing
throughput for all tasks more than it does providing absolute minimum
latency even for something wanting RT. There has been some discussion in the
past about making RT tasks with very high priority less preemptible by
lower-priority tasks by removing or reducing the priority lowering that
occurs when a task gets allocated CPU time.

What problem are you trying to solve where continuous CPU load and minimum
latency realtime are both required?

Regards,
--
-Chuck
Process timing issue
Hello,

While investigating a timing issue with one of our programs, we found out
something weird: we've written a small test program that just calls
clock_gettime() a lot of times and checks that the time difference between
calls makes sense. In the end, it seems it doesn't always do.

Calling clock_gettime() twice in a row usually takes less than 1 ms. But
with an average load, about 1 time in 20, more than 10 ms are spent between
both calls for no apparent reason. According to our tests, when it happens,
the time between both calls can range from a few milliseconds to many
seconds (our best score so far is 10 seconds :). The same goes for
gettimeofday().

To reproduce this issue, we use the small test program attached to this mail
and openssl speed test commands (usually 'openssl speed -elapsed dsa2048').
It only appears if one of these openssl speed tests runs on the same core as
our test program, so we have to start as many openssl instances as there are
cores on the test computer. For instance:

- FreeBSD 7.3 + Core 2 Duo
- FreeBSD 7.4/STABLE + Core 2 Duo
- FreeBSD 8.1 + Core 2 Duo
- FreeBSD 8.2-RC3 + Core 2 Duo (clean setup)
- FreeBSD 9.0/CURRENT + Core 2 Duo (clean setup, checked out last Friday)

On all these setups, with 2 instances of openssl speed tests, on average,
for 200*2 calls to clock_gettime(), we got about 10 pauses of between 100
and 300 ms.

- FreeBSD 8.2-RC1 + Xeon W3680 (6 cores): with 12 instances of openssl
  speed tests, for 1000*2 calls to clock_gettime(), we got pauses of
  between 300 ms and 10 s.
- FreeBSD 6.3 + VIA C3: with 1 instance of openssl, we got pauses of
  between 70 ms and 300 ms.
- FreeBSD 7.3 + VIA C7: with 1 instance of openssl, we got pauses of
  between 150 ms and 1 s (note: this computer may be slightly more loaded
  than the previous one).

For comparison purposes, we also tried on 2 Linux systems:

- Linux 2.6.32.15 (VIA Nano, with a bunch of services running on it):
  - 1 openssl speed test + 200 iterations: max 10 ms
  - 10 openssl speed tests + 200 iterations: max 44 ms
- Linux 2.6 (Core i7):
  - 10 openssl speed tests + 200 iterations: max 1 ms

We tried setting the test program to the highest priority possible
(rtprio(REALTIME, RTP_PRIO_MAX)) and it doesn't seem to change a thing.

Does anyone know if there is a reason for this behavior? Is there something
that can be done to improve things?

#include <err.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <sysexits.h>
#include <time.h>
#include <unistd.h>

int
main(int argc, char *argv[])
{
	struct timespec pre, cur;
	useconds_t d, dmin, dmax, dsum;
	int dweird = 0;
	int i, loop;

	if (argc != 2) {
		fprintf(stderr, "timecheck <loops>\n");
		return EXIT_FAILURE;
	}
	loop = atoi(argv[1]);

	dmax = dsum = 0;
	dmin = UINT_MAX;

	fprintf(stdout, "looping %d times...\n", loop);
	for (i = 0; i < loop; i++) {
		if (clock_gettime(CLOCK_MONOTONIC, &pre) < 0)
			err(EX_OSERR, NULL);
		if (clock_gettime(CLOCK_MONOTONIC, &cur) < 0)
			err(EX_OSERR, NULL);
		/* Elapsed time between the two calls, in microseconds. */
		d = (cur.tv_sec - pre.tv_sec) * 1000000;
		d = d + (cur.tv_nsec / 1000) - (pre.tv_nsec / 1000);
		dsum += d;
		if (d < dmin)
			dmin = d;
		if (d > dmax)
			dmax = d;
		if (d > 10 * 1000)	/* 10 ms */
			dweird++;
	}

	fprintf(stdout, "dmin = %uus\ndmax = %uus\ndmean = %uus\ndweird = %d\n",
	    dmin, dmax, dsum / loop, dweird);
	return 0;
}