Re: [PATCH] Re: Negative scalability by removal of lock_kernel()?(Was:Strange performance behavior of 2.4.0-test9)

2000-11-04 Thread Dave Wagner

Linus Torvalds wrote:
>
> No.
>
> Please use unserialized accept() _always_, because we can fix that.
>
> Even 2.2.x can be fixed to do the wake-one for accept(), if required.
> It's not going to be any worse than the current apache config, and
> basically the less games apache plays, the better the kernel can try to
> accommodate what apache _really_ wants done.  When playing games, you
> hide what you really want done, and suddenly kernel profiles etc end up
> being completely useless, because they no longer give the data we needed
> to fix the problem.
>
> Basically, the whole serialization crap is all about the Apache people
> saying the equivalent of "the OS does a bad job on something we consider
> to be incredibly important, so we do something else instead to hide it".
>
> And regardless of _what_ workaround Apache does, whether it is the sucky
> fcntl() thing or using SysV semaphores, it's going to hide the real
> issue and mean that it never gets fixed properly.
>
> And in the end it will result in really really bad performance.
>
> Instead, if apache had just done the thing it wanted to do in the first
> place, the wake-one accept() semantics would have happened a hell of a
> lot earlier.
>
> Now it's there in 2.4.x. Please use it. PLEASE PLEASE PLEASE don't play
> games trying to outsmart the OS, it will just hurt Apache in the long run.
>

But how would you suggest people using 2.2 configure their
Apache?  Will flock/fcntl or semaphores perform better (albeit
"uglier") than unserialized accept() in 2.2?  I'm willing
and expecting to rebuild apache when 2.4 is released.  I do
not, though, want to leave performance on the table today,
just so I can say that my apache binary is 2.4-ready.

Do any of the apache serialization methods (flock/fcntl/semops)
have any performance improvement over unserialized accept() with
Apache running on a 2.2 kernel?

Dave Wagner

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [PATCH] Re: Negative scalability by removal of lock_kernel()?(Was:Strange performance behavior of 2.4.0-test9)

2000-11-03 Thread Linus Torvalds

In article <[EMAIL PROTECTED]>,
Andrew Morton  <[EMAIL PROTECTED]> wrote:
>
>Neither flock() nor fcntl() serialisation is effective
>on linux 2.2 or linux 2.4.  This is because the file
>locking code still wakes up _all_ waiters.  In my testing
>with fcntl serialisation I have seen a single Apache
>instance get woken and put back to sleep 1,500 times
>before the poor thing actually got to service a request.

Indeed.

flock() is the absolute worst case, and always has been.  I guess nobody
ever actually bothered to benchmark it.

>For kernel 2.2 I recommend that Apache consider using
>sysv semaphores for serialisation. They use wake-one. 
>
>For kernel 2.4 I recommend that Apache use unserialised
>accept.

No.

Please use unserialized accept() _always_, because we can fix that. 

Even 2.2.x can be fixed to do the wake-one for accept(), if required. 
It's not going to be any worse than the current apache config, and
basically the less games apache plays, the better the kernel can try to
accommodate what apache _really_ wants done.  When playing games, you
hide what you really want done, and suddenly kernel profiles etc end up
being completely useless, because they no longer give the data we needed
to fix the problem. 

Basically, the whole serialization crap is all about the Apache people
saying the equivalent of "the OS does a bad job on something we consider
to be incredibly important, so we do something else instead to hide it".

And regardless of _what_ workaround Apache does, whether it is the sucky
fcntl() thing or using SysV semaphores, it's going to hide the real
issue and mean that it never gets fixed properly.

And in the end it will result in really really bad performance. 

Instead, if apache had just done the thing it wanted to do in the first
place, the wake-one accept() semantics would have happened a hell of a
lot earlier. 

Now it's there in 2.4.x. Please use it. PLEASE PLEASE PLEASE don't play
games trying to outsmart the OS, it will just hurt Apache in the long run.

Linus



Re: [PATCH] Re: Negative scalability by removal of lock_kernel()?(Was:Strange performance behavior of 2.4.0-test9)

2000-11-03 Thread Andrew Morton

dean gaudet wrote:
> 
> On Tue, 31 Oct 2000, Andrew Morton wrote:
> 
> > Dean,  it looks like the same problem will occur with flock()-based
> > serialisation.  Does Apache/Linux ever use that option?
> 
> from apache/src/include/ap_config.h in the linux section there's
> this:
> 
> /* flock is faster ... but hasn't been tested on 1.x systems */
> /* PR#3531 indicates flock() may not be stable, probably depends on
>  * kernel version.  Go back to using fcntl, but provide a way for
>  * folks to tweak their Configuration to get flock.
>  */
> #ifndef USE_FLOCK_SERIALIZED_ACCEPT
> #define USE_FCNTL_SERIALIZED_ACCEPT
> #endif
> 
> so you should be able to -DUSE_FLOCK_SERIALIZED_ACCEPT to try it.
> 

Dean,

Neither flock() nor fcntl() serialisation is effective
on linux 2.2 or linux 2.4.  This is because the file
locking code still wakes up _all_ waiters.  In my testing
with fcntl serialisation I have seen a single Apache
instance get woken and put back to sleep 1,500 times
before the poor thing actually got to service a request.

For kernel 2.2 I recommend that Apache consider using
sysv semaphores for serialisation. They use wake-one. 

For kernel 2.4 I recommend that Apache use unserialised
accept.

This means that you'll need to make a runtime decision
on whether to use unserialised, serialised with sysv or
serialised with fcntl (if sysv IPC isn't installed).
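
The runtime decision described above could be sketched roughly as follows.
This is only an illustration under stated assumptions: the enum, both
function names and the have_sysv_ipc flag are hypothetical and are not
taken from Apache, and how the caller finds out whether SysV IPC is
available is left open.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/utsname.h>

/* Hypothetical names: none of these identifiers come from Apache. */
enum accept_mode {
    ACCEPT_UNSERIALISED,  /* 2.4 and later: plain accept() */
    ACCEPT_SYSVSEM,       /* 2.2 with SysV IPC available */
    ACCEPT_FCNTL          /* 2.2 without SysV IPC */
};

/* Pick a mode from a kernel release string such as "2.2.14". */
enum accept_mode choose_accept_mode(const char *release, int have_sysv_ipc)
{
    int major = 0, minor = 0;

    assert(release != NULL);

    /* Unparseable release string: fall back to a serialised mode. */
    if (sscanf(release, "%d.%d", &major, &minor) != 2)
        return have_sysv_ipc ? ACCEPT_SYSVSEM : ACCEPT_FCNTL;

    if (major > 2 || (major == 2 && minor >= 4))
        return ACCEPT_UNSERIALISED;

    return have_sysv_ipc ? ACCEPT_SYSVSEM : ACCEPT_FCNTL;
}

/* Same decision for the running kernel, using uname(2). */
enum accept_mode choose_accept_mode_for_host(int have_sysv_ipc)
{
    struct utsname u;

    if (uname(&u) != 0)
        return have_sysv_ipc ? ACCEPT_SYSVSEM : ACCEPT_FCNTL;
    return choose_accept_mode(u.release, have_sysv_ipc);
}
```

A server would call choose_accept_mode_for_host() once at startup,
before forking its children, so that every child uses the same mode.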


In my testing I launched 3, 10, 30 or 150 Apache instances and then used

httperf --num-conns=2000 --num-calls=1 --uri=/index.html

to open, use and close 2000 connections.

Here are the (terrible) results on 2.4 SMP with fcntl
serialisation:

fcntl accept, 3 servers, vanilla: 938.0 req/s
fcntl accept, 30 servers, vanilla: 697.1 req/s
fcntl accept, 150 servers, vanilla: 99.9 req/s (sic)

2.4 SMP with no serialisation:

unserialised accept, 3 servers, vanilla: 1049.0 req/s
unserialised accept, 10 servers, vanilla: 968.8 req/s
unserialised accept, 30 servers, vanilla: 1040.2 req/s
unserialised accept, 150 servers, vanilla: 1091.4 req/s

2.4 SMP with no serialisation and my patch to the
wakeup and waitqueue code:

unserialised accept, 3 servers, task_exclusive: 1117.4 req/s
unserialised accept, 10 servers, task_exclusive: 1118.6 req/s
unserialised accept, 30 servers, task_exclusive: 1105.6 req/s
unserialised accept, 150 servers, task_exclusive: 1077.1 req/s

2.4 SMP with sysv semaphore serialisation:

sysvsem accept, 3 servers: 1001.2 req/s
sysvsem accept, 10 servers: 1061.0 req/s
sysvsem accept, 30 servers: 1021.2 req/s
sysvsem accept, 150 servers: 943.6 req/s

2.2.14 SMP with fcntl serialisation:

fcntl accept, 3 servers: 1053.8 req/s
fcntl accept, 10 servers: 996.2 req/s
fcntl accept, 30 servers: 934.3 req/s
fcntl accept, 150 servers: 141.4 req/s (sic)

2.2.14 SMP with no serialisation:

unserialised accept, 3 servers: 1039.9 req/s
unserialised accept, 10 servers: 983.1 req/s
unserialised accept, 30 servers: 775.7 req/s
unserialised accept, 150 servers: 220.7 req/s (sic)

2.2.14 SMP with sysv sem serialisation:

sysv accept, 3 servers: 932.2 req/s
sysv accept, 10 servers: 910.6 req/s
sysv accept, 30 servers: 1026.6 req/s
sysv accept, 150 servers: 927.2 req/s
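
One way to compare the configurations above is to express the 150-server
throughput as a percentage of the 3-server throughput for the same
setup.  A minimal sketch of that arithmetic; the function name is
hypothetical and the figures fed to it are the ones listed above:

```c
#include <assert.h>

/* Throughput at the higher server count as a percentage of the baseline. */
double retained_percent(double base_req_s, double loaded_req_s)
{
    assert(base_req_s > 0.0);
    return 100.0 * loaded_req_s / base_req_s;
}
```

For example, retained_percent(938.0, 99.9) for 2.4 with fcntl
serialisation comes to roughly 10.7 percent, while
retained_percent(932.2, 927.2) for 2.2.14 with sysv semaphores comes to
about 99.5 percent.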


Note that the first test (2.4 with fcntl serialisation) was
with an unpatched 2.4.0-test10-pre5.  Once the simple
flock.patch is applied, the performance with 150 servers
doubles.  But it's still sucky.  The flock.patch change
is effective in increasing scalability with a large number
of CPUs, not a large number of httpd's.

Here's the silly patch I used to turn on sysv sem serialisation
in Apache.  There's probably a better way than this :)

--- apache_1.3.14.orig/src/main/http_main.c Fri Sep 29 00:32:36 2000
+++ apache_1.3.14/src/main/http_main.c  Sat Nov  4 15:01:41 2000
@@ -172,6 +172,13 @@
 
 #include "explain.h"
 
+/* AKPM */
+#if 1
+#define NEED_UNION_SEMUN
+#define USE_SYSVSEM_SERIALIZED_ACCEPT
+#define USE_FCNTL_SERIALIZED_ACCEPT
+#endif
+
 #if !defined(max)
 #define max(a,b)(a > b ? a : b)
 #endif
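
For readers unfamiliar with what the patch above switches on, here is a
minimal sketch of serialising accept() with a SysV semaphore.  It is not
Apache's code: the function names and the union name are hypothetical.
The union is declared by hand because Linux expects the caller to define
it, which is presumably what NEED_UNION_SEMUN is for.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Caller-defined argument type for semctl(); the name is hypothetical. */
union semun_arg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Create a one-semaphore set with initial value 1 (unlocked). */
int accept_mutex_create(void)
{
    union semun_arg arg;
    int id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);

    if (id < 0)
        return -1;
    arg.val = 1;
    if (semctl(id, 0, SETVAL, arg) < 0) {
        semctl(id, 0, IPC_RMID);
        return -1;
    }
    return id;
}

/* Add delta to the semaphore; SEM_UNDO releases it if the holder dies. */
static int accept_mutex_op(int id, short delta)
{
    struct sembuf op;

    assert(id >= 0);
    op.sem_num = 0;
    op.sem_op = delta;
    op.sem_flg = SEM_UNDO;
    return semop(id, &op, 1);
}

int accept_mutex_lock(int id)    { return accept_mutex_op(id, -1); }
int accept_mutex_unlock(int id)  { return accept_mutex_op(id, 1); }
int accept_mutex_value(int id)   { return semctl(id, 0, GETVAL); }
int accept_mutex_destroy(int id) { return semctl(id, 0, IPC_RMID); }
```

The parent would create the semaphore before forking; each child then
calls accept_mutex_lock(), accept(), accept_mutex_unlock() in its loop.
A real server would also need to retry semop() when it fails with
EINTR, which this sketch leaves out.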


