Re: [Linux-streams] LiS-2.18 performance ?

Brian F. G. Bidulock Fri, 02 Dec 2005 14:42:07 -0800

Eugene,

I ran pipe performance tests on a 2.56GHz PIV 333MHz FSB, the same box I
ran the 2.16.16 and 2.16.18 pipe performance tests on last time (about 18
months ago).  I show 2.18 as comparable to 2.16.16.  The performance gains
of 2.16.18 only appeared when message sizes were within a FASTBUF.

Performance tests on RH7.2 2.4.20-28.7bigmem (the SMP kernel I tested on
last time) show LiS 2.18 STREAMS-based pipes clocking in at a dismal
13% when compared to Linux SVR3-style native pipes.  2.16.18 showed
about 20% 18 months ago, but only beneath 64-byte read/write sizes and
then fell back to about 10% after that.

Performance tests on CentOS 4 (RHEL4 clone) and FC4 show the performance
gains of the 2.6 kernel (and recent re-optimizing compilers) to be quite
significant.

For native pipes, the per-byte read/write latency drops from about
750 ps (picoseconds) on RH7.2 and CL4 to about 500 ps on FC4 (a regparms
kernel).  LiS 2.18 sees its per-byte read/write latency drop from 1600 ps on
RH7.2 to 900 ps on CL4 and FC4.  I attribute the gain on native to the
regparms FC4 kernel.  I attribute the gain on LiS to the better
compilers (3.4.3 and 4.0) on CL4 and FC4 that better find their way
around cruft in the code.

Per-message read/write delays for LiS 2.18 drop from 20 us on RH7.2 to
8 us on CL4 and FC4, while native pipes sit at around 2.5 us on all
three.  I attribute the gain on LiS to the tighter scheduling latency
and O(1) scheduler of the 2.6 kernel.  STREAMS-based pipes are far more
susceptible to scheduling latency.

Overall, when compared to native pipes, LiS 2.18 performs at 12.7% for
RH7.2, 28.1% for CL4 and a top end of 38.8% for FC4.  The FC4 native
pipes really cruise, so 38.8% is quite good.  I attribute the good FC4
results to the O(1) scheduler, the regparms kernel and the re-optimizing
GCC 4 compiler.

It is interesting that kernel improvements generate better performance
gains than could be accomplished within LiS with the changes from
2.16.16 to 2.16.18.  There, it was only a 2x gain when compared to
native pipes and only beneath 64-byte writes.  The FC4 improvements are
across all message sizes (tested linear with .999 correlation up to
4096 bytes).

So I suppose the story with LiS is, if you want the best performance use
a good 2.6 kernel.  Because 2.16.18 only runs on a 2.4 kernel, 2.18 is
the better choice of the two for performance.  If you are running on a
2.4 kernel, however, expect better performance from 2.16.18 at message
sizes within a FASTBUF.

Now, Linux Fast-STREAMS...

LfS (streams-0.7a.4) in the same performance tests relative to Linux
native pipes clocked in at 40%, 60% and 75% on RH7.2, CL4 and FC4 over
all message sizes.  When compared to LiS at 13%, 28% and 39% in the
same tests, LfS delivers 3.1x, 2.1x and 1.9x the performance of LiS.  The 3x
performance gain on 2.4 SMP over LiS 2.18 is quite impressive,
particularly when you consider that compared to native pipes, LfS runs
as fast on RH7.2 2.4 as LiS 2.18 runs on FC4.  The other impressive
figure is that LfS on FC4 is running at 75% of the performance of a
native Linux pipe.  This exceeds John Boyd's "impressed" threshold
(better than 50% native pipe performance).
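
The multipliers quoted above follow directly from the rounded
percentages.  A quick check, as a sketch only (the variable names are
mine, not from perftest):

```python
# Percent of native pipe performance, as quoted in the text above.
systems = ["RH7.2", "CL4", "FC4"]
lfs_percent = [40, 60, 75]
lis_percent = [13, 28, 39]

# Ratio of LfS to LiS on each system, rounded to one decimal place.
speedups = [round(f / s, 1) for f, s in zip(lfs_percent, lis_percent)]

for name, ratio in zip(systems, speedups):
    print(f"{name}: LfS is {ratio}x LiS")
```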

LfS is the best performance choice on any kernel.  Transitioning from
LiS 2.16.18 to LfS is a better performance choice than to LiS 2.18.
But then, that's why I called it "Fast".

I will send a separate note on some of my discoveries regarding
performance on LiS and LfS.

Here is the raw (well, half-cooked) data, obtained using the perftest
program included in the OpenSS7 LiS 2.18.2 release and the streams
0.7a.4 release.

Linear regression was performed on pipe throughput at 4, 8, 16, 32,
64, 128, 256, 512, 1024, 2048 and 4096 byte message sizes running the
pipe wide open (100% CPU utilization).  Correlations were usually 99.9%.
Slope is the per-byte read/write delay; intercept is the per-message
read/write delay.  The delay is y = mx + b, where x is the message size
in bytes, m is the slope and b is the intercept.
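
For anyone wanting to reproduce the fit, here is a minimal sketch of an
ordinary least-squares line through (message size, delay) points.  The
sample delays are synthetic: they are generated from the LiS RH7.2
figures in the tables below (1620 ps per byte, 19.20 us per message)
and are not perftest output.  The name fit_line is my own.

```python
# Ordinary least-squares fit of: delay = slope * size + intercept.
SIZES = [4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096]  # bytes, as tested


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic delays in microseconds (assumption: exact model, no noise).
# 1620 ps per byte is 1620e-6 us per byte.
delays_us = [19.20 + 1620e-6 * size for size in SIZES]

slope_us, intercept_us = fit_line(SIZES, delays_us)
print(f"per-byte delay:    {slope_us * 1e6:.0f} ps")
print(f"per-message delay: {intercept_us:.2f} us")
```

With real measurements in place of the synthetic list, the same function
returns the slope and intercept in whatever units the delays are given.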

Per-byte read/write delay, slope (picoseconds):

        RH7.2           CL4             FC4
        ---------       --------        --------
LiS     1620             886             925
LfS     1230             919             987
Linux    760             750             482

Per-message read/write delay, intercept (microseconds):

        RH7.2           CL4             FC4
        ---------       --------        --------
LiS     19.20            7.72            7.83
LfS      6.54            3.57            4.02
Linux    2.43            2.17            3.03
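
To see what the fitted figures imply at a particular message size, the
model can be evaluated directly.  This is a sketch using the values from
the two tables above.  The dictionary layout and the helper names
delay_us and relative_to_native are mine, and treating performance as
the inverse of the modelled delay is an assumption.

```python
# (slope in ps per byte, intercept in us per message), from the tables above.
FIT = {
    "LiS":   {"RH7.2": (1620, 19.20), "CL4": (886, 7.72), "FC4": (925, 7.83)},
    "LfS":   {"RH7.2": (1230, 6.54),  "CL4": (919, 3.57), "FC4": (987, 4.02)},
    "Linux": {"RH7.2": (760, 2.43),   "CL4": (750, 2.17), "FC4": (482, 3.03)},
}


def delay_us(impl, system, size):
    """Modelled read/write delay in microseconds for one message of `size` bytes."""
    slope_ps, intercept_us = FIT[impl][system]
    return intercept_us + slope_ps * 1e-6 * size


def relative_to_native(impl, system, size):
    """Native pipe delay divided by this implementation's delay (1.0 = parity)."""
    return delay_us("Linux", system, size) / delay_us(impl, system, size)


for system in ("RH7.2", "CL4", "FC4"):
    for size in (64, 4096):
        lis = relative_to_native("LiS", system, size)
        lfs = relative_to_native("LfS", system, size)
        print(f"{system:6} {size:5} bytes: LiS {lis:.1%}  LfS {lfs:.1%}")
```

At very small sizes the intercept dominates, so the ratio approaches the
ratio of the per-message delays (2.43 / 19.20, about 12.7%, for LiS on
RH7.2).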

--brian

On Fri, 02 Dec 2005, [EMAIL PROTECTED] wrote:

> 
>    Hello,
> 
>    I'm curious about people impressions about LiS-2.18 performance.
>    Is it better comparing to LiS-2.16.18 ?
> 
>    Are there any known 2.18 issues that can be fixed to improve
>    performance?
> 
>    My understanding is that in LiS-2.18 most(all?) of the queue
>    processing is done by LiS kernel threads and queuerun is never
>    executed from the driver tasklet context.  That may result, I guess,
>    in excessive process switching overhead and poorer performance.
>    I might be missing something, though.
> 
>    The other thing I noticed when I ran my tests on a 4 processor system
>    is that only one LiS thread accumulated CPU time:
> 
>    root 9574 1 0 Dec01 ? 00:02:27 [LiS-2.18.0:0]  <-------
>    root 9575 1 0 Dec01 ? 00:00:01 [LiS-2.18.0:1]
>    root 9576 1 0 Dec01 ? 00:00:00 [LiS-2.18.0:2]
>    root 9577 1 0 Dec01 ? 00:00:00 [LiS-2.18.0:3]
> 
>    Is it the way it's supposed to be, or  it's a bug?
> 
> 
>    I'd appreciate any comment/advices regarding performance issues on
>    LiS-2.18.
> 
>    --
>    Eugene

-- 
Brian F. G. Bidulock    ¦ The reasonable man adapts himself to the ¦
[EMAIL PROTECTED]    ¦ world; the unreasonable one persists in  ¦
http://www.openss7.org/ ¦ trying  to adapt the  world  to himself. ¦
                        ¦ Therefore  all  progress  depends on the ¦
                        ¦ unreasonable man. -- George Bernard Shaw ¦
_______________________________________________
Linux-streams mailing list
Linux-streams@gsyc.escet.urjc.es
http://gsyc.escet.urjc.es/mailman/listinfo/linux-streams
