Re: [Lse-tech] more on scheduler benchmarks

2001-01-22 Thread Andi Kleen

On Mon, Jan 22, 2001 at 02:23:05PM -0500, Bill Hartner wrote:
> Mike K. wrote:
> 
> >
> > If the above is accurate, then I am wondering what would be a
> > good scheduler benchmark for these low task count situations.
> > I could undo the optimizations in sys_sched_yield() (for testing
> > purposes only!), and run the existing benchmarks.  Can anyone
> > suggest a better solution?
> 
> Hacking sys_sched_yield is one way around it.

How about process pairs that bounce a token back and forth through pipes?
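
Something like this rough sketch, perhaps (illustrative only; the loop
count and timing are made up for the sketch, it is not an existing test):

/*
 * Two processes bounce a one-byte token through a pair of pipes, so
 * every round trip forces two context switches through the scheduler.
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/time.h>
#include <sys/wait.h>

#define ROUNDS 100000

int main(void)
{
    int p2c[2], c2p[2];             /* parent->child, child->parent */
    char tok = 't';
    struct timeval t0, t1;
    double usec;
    int i;

    if (pipe(p2c) < 0 || pipe(c2p) < 0) {
        perror("pipe");
        exit(1);
    }

    if (fork() == 0) {              /* child: echo the token back */
        for (i = 0; i < ROUNDS; i++) {
            read(p2c[0], &tok, 1);
            write(c2p[1], &tok, 1);
        }
        exit(0);
    }

    gettimeofday(&t0, NULL);
    for (i = 0; i < ROUNDS; i++) {  /* parent: send token, wait for echo */
        write(p2c[1], &tok, 1);
        read(c2p[0], &tok, 1);
    }
    gettimeofday(&t1, NULL);
    wait(NULL);

    usec = (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_usec - t0.tv_usec);
    printf("%.2f usec per round trip (two switches)\n", usec / ROUNDS);
    return 0;
}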

-Andi



Re: [Lse-tech] more on scheduler benchmarks

2001-01-22 Thread Bill Hartner


Hubertus wrote:

> The only problem I have with sched_yield-like benchmarks is that they
> create artificial lock contention, as we basically spend most of the time,
> other than the context switch + syscall itself, under the scheduler lock.
> We won't see this in real apps, which is why I think the chatroom numbers
> are probably better indicators.

Agreed. 100% artificial.  The intention of the benchmark is to put a lot
of pressure on the scheduler so that the benchmark results will be very
"sensitive" to changes in schedule().  For example, if you were to split
the scheduling fields used by goodness() into several cache lines, the
benchmark results should reveal the degradation.  chatroom would probably
show it too, though.  At some point, we could run your patch on our
SPECweb99 setup using Zeus; we don't have any lock analysis data on that
workload yet, so I don't know what the contention on the run_queue lock is.
Zeus does not use many threads; Apache does.

Bill




Re: [Lse-tech] more on scheduler benchmarks

2001-01-22 Thread Hubertus Franke


Mike,

Deactivating that optimization is a good idea.
What we are interested in is the general latency of the scheduler code,
and this should help determine that.

The only problem I have with sched_yield-like benchmarks is that they create
artificial lock contention, as we basically spend most of the time, other
than the context switch + syscall itself, under the scheduler lock.  We
won't see this in real apps, which is why I think the chatroom numbers are
probably better indicators.


Hubertus Franke
Enterprise Linux Group (Mgr), Linux Technology Center (Member Scalability),
OS-PIC (Chair)
email: [EMAIL PROTECTED]
(w) 914-945-2003(fax) 914-945-4425   TL: 862-2003



Mike Kravetz <[EMAIL PROTECTED]>@lists.sourceforge.net on 01/22/2001
01:17:38 PM

Sent by:  [EMAIL PROTECTED]


To:   [EMAIL PROTECTED]
cc:   [EMAIL PROTECTED], Ingo Molnar <[EMAIL PROTECTED]>
Subject:  [Lse-tech] more on scheduler benchmarks



Last week while discussing scheduler benchmarks, Bill Hartner
made a comment something like the following: "the benchmark may
not even be invoking the scheduler as you expect".  This comment
did not fully sink in until this weekend when I started thinking
about changes made to sched_yield() in 2.4.0.  (I'm cc'ing Ingo
Molnar because I think he was involved in the changes).  If you
haven't taken a look at sys_sched_yield() in 2.4.0, I suggest
that you do that now.

A result of the new optimizations made to sys_sched_yield() is that
calling sched_yield() does not result in a 'reschedule' if there
are no tasks waiting for CPU resources.  Therefore, I would claim
that running 'scheduler benchmarks' which loop doing sched_yield()
seems to have little meaning/value for runs where the number of
looping tasks is less than the number of CPUs in the system.  Is
that an accurate statement?
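
For reference, the kind of loop I'm talking about is roughly this (a
sketch of the benchmark style, not anyone's actual test code; the thread
and loop counts are arbitrary):

/*
 * N threads spin calling sched_yield() and report the average cost per
 * call.  With fewer runnable threads than CPUs, the 2.4.0 fast path in
 * sys_sched_yield() means each call may return without running schedule().
 */
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <sched.h>
#include <sys/time.h>

#define LOOPS 1000000

static void *yielder(void *arg)
{
    double *usec_per_yield = arg;
    struct timeval t0, t1;
    long i;

    gettimeofday(&t0, NULL);
    for (i = 0; i < LOOPS; i++)
        sched_yield();
    gettimeofday(&t1, NULL);

    *usec_per_yield = ((t1.tv_sec - t0.tv_sec) * 1e6 +
                       (t1.tv_usec - t0.tv_usec)) / LOOPS;
    return NULL;
}

int main(int argc, char **argv)
{
    int nthreads = (argc > 1) ? atoi(argv[1]) : 2;
    pthread_t *tid = malloc(nthreads * sizeof(*tid));
    double *cost = malloc(nthreads * sizeof(*cost));
    int i;

    for (i = 0; i < nthreads; i++)
        pthread_create(&tid[i], NULL, yielder, &cost[i]);
    for (i = 0; i < nthreads; i++) {
        pthread_join(tid[i], NULL);
        printf("thread %d: %.3f usec/yield\n", i, cost[i]);
    }
    return 0;
}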

If the above is accurate, then I am wondering what would be a
good scheduler benchmark for these low task count situations.
I could undo the optimizations in sys_sched_yield() (for testing
purposes only!), and run the existing benchmarks.  Can anyone
suggest a better solution?

Thanks,
--
Mike Kravetz [EMAIL PROTECTED]
IBM Linux Technology Center
15450 SW Koll Parkway
Beaverton, OR 97006-6063 (503)578-3494




Re: [Lse-tech] more on scheduler benchmarks

2001-01-22 Thread Bill Hartner

Mike K. wrote:

>
> If the above is accurate, then I am wondering what would be a
> good scheduler benchmark for these low task count situations.
> I could undo the optimizations in sys_sched_yield() (for testing
> purposes only!), and run the existing benchmarks.  Can anyone
> suggest a better solution?

Hacking sys_sched_yield is one way around it.

Also, if you plot thread count vs. yield times, you might be
able to get an idea of how many microseconds each thread on
the run_queue costs and what the "base overhead" of just running
through the scheduler is.  That could give some indication of
what it costs when a small number of threads are on the run_queue.

On an 8-way, try 8, 10, 12, 14, 16, 18, ... threads and see
what the chart looks like when you plot #threads vs. microseconds.
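
To pull the two numbers out of such a run, a quick least-squares fit
over the samples would do; a sketch (the figures below are placeholders,
not measured data):

/*
 * Fit  usec_per_yield = base + cost_per_thread * nthreads
 * to a handful of (threads, usec) samples by least squares.
 */
#include <stdio.h>

int main(void)
{
    double n[]    = { 8, 10, 12, 14, 16, 18 };          /* thread counts   */
    double usec[] = { 1.0, 1.2, 1.4, 1.6, 1.8, 2.0 };   /* made-up timings */
    int i, k = 6;
    double sx = 0, sy = 0, sxx = 0, sxy = 0, slope, base;

    for (i = 0; i < k; i++) {
        sx  += n[i];
        sy  += usec[i];
        sxx += n[i] * n[i];
        sxy += n[i] * usec[i];
    }
    slope = (k * sxy - sx * sy) / (k * sxx - sx * sx);
    base  = (sy - slope * sx) / k;

    printf("base overhead ~ %.3f usec, ~ %.3f usec per runnable thread\n",
           base, slope);
    return 0;
}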

Then there are always cache issues that send your results all over
the place.  You could hack sched_test_yield to start with 32 threads
and get results, then kill off (2) threads and get results for 30,
then kill off (2) more threads and get results for 28, ...

-

One aspect of the scheduler that is not exercised by the yield
tests is the wake_up_process()/reschedule_idle() path.

So, here is another potential scheduler benchmark ...

The benchmark will do the following :

sched_test_semop -c 4 -l 3

The above command parameters would create 12 threads: 4 chains of 3
threads each.

A1->A2->A3
B1->B2->B3
C1->C2->C3
D1->D2->D3

-c N controls how many chains
-l N controls the length of the chains

Each thread has its own sema4.

When the test starts, the main thread posts the A1, B1, C1, and D1 sema4s.
Then A1 posts A2's sem, B1 posts B2's, C1 posts C2's, and D1 posts D2's,
and each of A1, B1, C1, and D1 goes back to waiting on its own sema4.
And around and around it goes ...

So, you can control the number of threads in the RUN state,
and you exercise the wakeup code.  Your numbers will be
diluted by the overhead in semop().  The metric would be
how many wakeups each thread does.
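
Roughly, each link in a chain would look like this (a sketch using POSIX
semaphores instead of semop(), and a single circular chain for simplicity;
it is not the actual sched_test_semop code):

/*
 * Each thread sleeps on its own semaphore, wakes the next link in the
 * chain (exercising the wake_up_process() path), and counts its wakeups.
 */
#include <stdio.h>
#include <pthread.h>
#include <semaphore.h>

#define CHAIN_LEN 3
#define ROUNDS    100000

static sem_t sem[CHAIN_LEN];
static long wakeups[CHAIN_LEN];

static void *link_thread(void *arg)
{
    int me = (int)(long)arg;
    int next = (me + 1) % CHAIN_LEN;    /* wrap so the token circulates */
    long i;

    for (i = 0; i < ROUNDS; i++) {
        sem_wait(&sem[me]);             /* sleep on my own sema4 */
        sem_post(&sem[next]);           /* wake the next link    */
        wakeups[me]++;
    }
    return NULL;
}

int main(void)
{
    pthread_t tid[CHAIN_LEN];
    int i;

    for (i = 0; i < CHAIN_LEN; i++)
        sem_init(&sem[i], 0, 0);
    for (i = 0; i < CHAIN_LEN; i++)
        pthread_create(&tid[i], NULL, link_thread, (void *)(long)i);

    sem_post(&sem[0]);                  /* main thread starts the chain */

    for (i = 0; i < CHAIN_LEN; i++) {
        pthread_join(tid[i], NULL);
        printf("link %d: %ld wakeups\n", i, wakeups[i]);
    }
    return 0;
}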

Is there a better API that could accomplish the same
wake-someone-up-and-then-sleep semantics?

Bill Hartner
[EMAIL PROTECTED]
Linux Technology Center - Linux Kernel Performance

