Eugene,

Here's the second note: some interesting things discovered during
testing.

Linux Fast-STREAMS performs at 85% of a Linux native pipe when the
STREAMS scheduler is run as a per-CPU software interrupt instead of as
a kernel thread, and runs 3 times faster than LiS instead of just
twice.  The only problem is that put and service procedures then run
at bottom half, which breaks the strinet driver, so I set the default
back to per-CPU kernel threads.  Theoretically there should only be a
difference on non-preemptive kernels (i.e. older 2.4 kernels).
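
For the curious, here is a minimal sketch (not the actual LfS code) of
the two ways of driving the STREAMS scheduler compared above, on a
2.4/2.6-era kernel; streams_runqueues() is a hypothetical stand-in for
the per-CPU queue-run loop:

#include <linux/interrupt.h>
#include <linux/sched.h>
#include <linux/wait.h>

extern void streams_runqueues(void); /* hypothetical: run queued service procedures */

/* Variant 1: software interrupt (tasklet).  Executes on return from a
 * hardware interrupt; put and service procedures run at bottom half
 * and cannot sleep -- this is what broke the strinet driver. */
static void streams_bh(unsigned long data)
{
    streams_runqueues();
}
static DECLARE_TASKLET(streams_tasklet, streams_bh, 0);

static void streams_schedule_bh(void)
{
    tasklet_schedule(&streams_tasklet);
}

/* Variant 2: per-CPU kernel thread (the default).  Put and service
 * procedures run in process context and may block. */
static DECLARE_WAIT_QUEUE_HEAD(streams_waitq);
static volatile int streams_work_pending;

static int streams_kthread(void *dummy)
{
    for (;;) {
        wait_event_interruptible(streams_waitq, streams_work_pending);
        streams_work_pending = 0;
        streams_runqueues();
    }
    return 0;
}

static void streams_schedule_kthread(void)
{
    streams_work_pending = 1;
    wake_up_interruptible(&streams_waitq);
}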

LiS 2.18, like 2.16 before it, runs its kernel threads with FIFO
scheduling and a real-time priority of 50.  Linux software interrupts
run at nice 19 (as nice as they can get), but are executed upon return
from a hardware interrupt regardless of scheduling priority.  I tried
running the LfS kernel threads like LiS (SCHED_FIFO, priority 50) and
it SLOWED DOWN: that is too high a priority at which to run service
procedures.  Try running them SCHED_RR at nice 19 (but then other
things might break because the races will change).
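
For illustration, a sketch of the two setups compared above, assuming
a 2.6-era in-kernel API (not the actual LiS or LfS code; note that the
real-time policies ignore nice values, so the SCHED_RR suggestion
really amounts to choosing ordinary timesharing at nice 19):

#include <linux/sched.h>

/* What LiS 2.18 does: real-time FIFO at priority 50 -- high enough
 * to starve most everything else. */
static void streams_thread_fifo50(void)
{
    struct sched_param param = { .sched_priority = 50 };

    sched_setscheduler(current, SCHED_FIFO, &param);
}

/* The gentler alternative: ordinary timesharing at nice 19, roughly
 * the way the softirq daemons behave. */
static void streams_thread_nice19(void)
{
    struct sched_param param = { .sched_priority = 0 };

    sched_setscheduler(current, SCHED_NORMAL, &param);
    set_user_nice(current, 19);
}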

LiS 2.18, unlike 2.16, does not run the STREAMS scheduler on exit from
system calls.  LfS runs the STREAMS scheduler in the current process
context on exit from system calls, in accordance with SVR 4.2.  This
removes a task switch for calls not subject to flow control in the
Stream and so exhibits better performance (as well as running the put
procedure at the top of the module stack in user context, which is
important to some non-conforming modules).
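
To make the control flow concrete, here is a sketch using purely
hypothetical names (strwrite(), strput_data(), streams_runqueues())
of running the scheduler at system call exit in the caller's context:

#include <linux/fs.h>

extern ssize_t strput_data(struct file *file, const char __user *buf,
                           size_t len); /* hypothetical: put data down the stack */
extern void streams_runqueues(void);    /* hypothetical: run scheduled queues */

static ssize_t strwrite(struct file *file, const char __user *buf,
                        size_t len, loff_t *ppos)
{
    ssize_t err = strput_data(file, buf, len);

    /* Run any service procedures the put procedures scheduled, right
     * here in the current process context, instead of waking a
     * scheduler thread: one task switch saved whenever the Stream is
     * not flow controlled. */
    streams_runqueues();
    return err;
}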

I also tested inter-module put and service procedure performance by
pushing a bunch of pass-through or buffer modules onto the pipe.  LfS
performance is only about 30% better than LiS here.  It appears that
most of LiS's performance problems are in the Stream head and in the
scheduling of service procedures, rather than in put(), putnext() or
service procedure invocation itself.
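
The test setup itself is plain STREAMS.  A user-space sketch (module
name and push count are illustrative; "pipemod" is the customary
pass-through pipe module, and pipe(2) is assumed here to yield a
STREAMS-based pipe):

#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <stropts.h>

int main(void)
{
    int fds[2], i;

    if (pipe(fds) < 0) {        /* assumed: a STREAMS-based pipe */
        perror("pipe");
        return 1;
    }
    for (i = 0; i < 4; i++) {
        if (ioctl(fds[0], I_PUSH, "pipemod") < 0) {
            perror("I_PUSH");
            return 1;
        }
    }
    /* ... time write()/read() round trips between fds[1] and fds[0] ... */
    return 0;
}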

It might be possible to get STREAMS-based pipes to outperform Linux
native pipes with LfS (but likely not with LiS).  LfS uses memory
caches for all STREAMS structures, including the Stream head and queue
pairs.  Linux native pipes kmalloc their pipe end data structures
(but page allocate their buffers).  I think that if I run many pipes
in parallel, LfS could exhibit better performance than native pipes
because the Stream heads are memory cached whereas the native pipes
will cache miss on the pipe end structures.  In a controlled test, LfS
can theoretically exhibit 150% of the per read/write performance of a 
Linux native pipe.  (Boy, would that impress John or what!)  LfS could
exhibit 6x the performance of LiS under the same circumstances (LiS
performance enhancements don't include memory cache for Stream heads).
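
A sketch of the allocation difference, assuming "memory caches" means
the Linux slab allocator (2.4/2.6-era API) and using a hypothetical
struct strhead:

#include <linux/slab.h>
#include <linux/errno.h>

struct strhead {                /* hypothetical stand-in for the real Stream head */
    struct strhead *sh_next;
    /* ... queue pair pointers, locks, etc. ... */
};

static kmem_cache_t *strhead_cachep;

static int strhead_cache_init(void)
{
    strhead_cachep = kmem_cache_create("lfs-strhead",
                                       sizeof(struct strhead), 0,
                                       SLAB_HWCACHE_ALIGN, NULL, NULL);
    return strhead_cachep ? 0 : -ENOMEM;
}

static struct strhead *strhead_alloc(void)
{
    /* frequently reused objects stay hot in the hardware cache,
     * where kmalloc()ed pipe end structures tend to cache miss */
    return kmem_cache_alloc(strhead_cachep, GFP_KERNEL);
}

static void strhead_free(struct strhead *sh)
{
    kmem_cache_free(strhead_cachep, sh);
}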

I haven't done SMP testing, primarily 'cause I don't have an SMP box:
anyone willing to donate an SMP box to the project can have LfS tested
on their platform of choice and get CVS archive access to boot.  Failing
that, I am going to propose LfS to the OSDL for testing, but that will
probably take longer.

LfS runs its per-CPU STREAMS schedulers differently than LiS does:
qenable schedules a queue on the same CPU from which it was invoked,
and the STREAMS scheduler threads each run in their own per-CPU
context (separate run queue lists).  LfS also uses per-CPU thread
info, which greatly reduces lock contention between CPUs.  It would be
quite interesting to perform multiple parallel pipe comparison tests
on an N-way box.  I think LfS will really shine there.
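
A sketch of the per-CPU run queue idea, with hypothetical names and
2.6-era per-CPU primitives: qenable() links the queue onto the list of
the CPU it is called on, so enqueueing never touches another CPU's
list or lock:

#include <linux/percpu.h>
#include <linux/list.h>
#include <linux/spinlock.h>

struct str_runq {
    spinlock_t lock;            /* rarely contended: local CPU only */
    struct list_head qhead;     /* queues awaiting their service procedure */
};

static DEFINE_PER_CPU(struct str_runq, str_runqs);

static void qenable_local(struct list_head *qlink)
{
    struct str_runq *rq = &get_cpu_var(str_runqs); /* this CPU's list */

    spin_lock(&rq->lock);
    list_add_tail(qlink, &rq->qhead);
    spin_unlock(&rq->lock);
    put_cpu_var(str_runqs);
    /* ... wake this CPU's scheduler thread ... */
}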

I will wrap the performance results into a proper report sometime, but
right now I am furiously trying to wrap up the documentation for LfS
so that it can be publicly released.

Hope that helps you in your quest for performance.

--brian


On Fri, 02 Dec 2005, [EMAIL PROTECTED] wrote:

> 
>    Hello,
> 
>    I'm curious about people's impressions of LiS-2.18 performance.
>    Is it better compared to LiS-2.16.18?
> 
>    Are there any known 2.18 issues that can be fixed to improve
>    performance?
> 
>    My understanding is that in LiS-2.18 most (all?) of the queue
>    processing is done by LiS kernel threads and queuerun is never
>    executed from the driver tasklet context.  That may result, I
>    guess, in excessive process switching overhead and poorer
>    performance.  I might be missing something, though.
> 
>    The other thing I noticed when I ran my tests on a 4 processor system
>    is that only one LiS thread accumulated CPU time:
> 
>    root 9574 1 0 Dec01 ? 00:02:27 [LiS-2.18.0:0]  <-------
>    root 9575 1 0 Dec01 ? 00:00:01 [LiS-2.18.0:1]
>    root 9576 1 0 Dec01 ? 00:00:00 [LiS-2.18.0:2]
>    root 9577 1 0 Dec01 ? 00:00:00 [LiS-2.18.0:3]
> 
>    Is it the way it's supposed to be, or is it a bug?
> 
> 
>    I'd appreciate any comments/advice regarding performance issues
>    in LiS-2.18.
> 
>    --
>    Eugene
> 

-- 
Brian F. G. Bidulock    ¦ The reasonable man adapts himself to the ¦
[EMAIL PROTECTED]    ¦ world; the unreasonable one persists in  ¦
http://www.openss7.org/ ¦ trying  to adapt the  world  to himself. ¦
                        ¦ Therefore  all  progress  depends on the ¦
                        ¦ unreasonable man. -- George Bernard Shaw ¦
_______________________________________________
Linux-streams mailing list
[email protected]
http://gsyc.escet.urjc.es/mailman/listinfo/linux-streams
