Re: [PATCH] Re: Negative scalability by removal of lock_kernel()?(Was:Strange performance behavior of 2.4.0-test9)
Linus Torvalds wrote:
>
> No.
>
> Please use unserialized accept() _always_, because we can fix that.
>
> Even 2.2.x can be fixed to do the wake-one for accept(), if required.
> It's not going to be any worse than the current apache config, and
> basically the less games apache plays, the better the kernel can try to
> accommodate what apache _really_ wants done. When playing games, you
> hide what you really want done, and suddenly kernel profiles etc end up
> being completely useless, because they no longer give the data we needed
> to fix the problem.
>
> Basically, the whole serialization crap is all about the Apache people
> saying the equivalent of "the OS does a bad job on something we consider
> to be incredibly important, so we do something else instead to hide it".
>
> And regardless of _what_ workaround Apache does, whether it is the sucky
> fcntl() thing or using SysV semaphores, it's going to hide the real
> issue and mean that it never gets fixed properly.
>
> And in the end it will result in really really bad performance.
>
> Instead, if apache had just done the thing it wanted to do in the first
> place, the wake-one accept() semantics would have happened a hell of a
> lot earlier.
>
> Now it's there in 2.4.x. Please use it. PLEASE PLEASE PLEASE don't play
> games trying to outsmart the OS, it will just hurt Apache in the long run.
>

But how would you suggest people using 2.2 configure their
Apache? Will flock/fcntl or semaphores perform better (albeit
"uglier") than unserialized accept() calls in 2.2?

I'm willing and expecting to rebuild apache when 2.4 is released.
I do not, though, want to leave performance on the table today,
just so I can say that my apache binary is 2.4-ready.

Do any of the apache serialization methods (flock/fcntl/semops)
have any performance improvement over unserialized accept() with
Apache running on a 2.2 kernel?

Dave Wagner
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Re: Negative scalability by removal of lock_kernel()?(Was:Strange performance behavior of 2.4.0-test9)
In article <[EMAIL PROTECTED]>,
Andrew Morton <[EMAIL PROTECTED]> wrote:
>
>neither flock() nor fcntl() serialisation are effective
>on linux 2.2 or linux 2.4. This is because the file
>locking code still wakes up _all_ waiters. In my testing
>with fcntl serialisation I have seen a single Apache
>instance get woken and put back to sleep 1,500 times
>before the poor thing actually got to service a request.

Indeed.

flock() is the absolute worst case, and always has been. I guess nobody
ever actually bothered to benchmark it.

>For kernel 2.2 I recommend that Apache consider using
>sysv semaphores for serialisation. They use wake-one.
>
>For kernel 2.4 I recommend that Apache use unserialised
>accept.

No.

Please use unserialized accept() _always_, because we can fix that.

Even 2.2.x can be fixed to do the wake-one for accept(), if required.
It's not going to be any worse than the current apache config, and
basically the less games apache plays, the better the kernel can try to
accommodate what apache _really_ wants done. When playing games, you
hide what you really want done, and suddenly kernel profiles etc end up
being completely useless, because they no longer give the data we needed
to fix the problem.

Basically, the whole serialization crap is all about the Apache people
saying the equivalent of "the OS does a bad job on something we consider
to be incredibly important, so we do something else instead to hide it".

And regardless of _what_ workaround Apache does, whether it is the sucky
fcntl() thing or using SysV semaphores, it's going to hide the real
issue and mean that it never gets fixed properly.

And in the end it will result in really really bad performance.

Instead, if apache had just done the thing it wanted to do in the first
place, the wake-one accept() semantics would have happened a hell of a
lot earlier.

Now it's there in 2.4.x. Please use it. PLEASE PLEASE PLEASE don't play
games trying to outsmart the OS, it will just hurt Apache in the long run.

		Linus
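
[Editor's sketch] To make "unserialized accept()" concrete: the pattern under discussion is a set of pre-forked children that all block in accept() on one shared listening socket, with no fcntl(), flock() or semaphore lock wrapped around the call. The C below is a minimal illustration under stated assumptions, not Apache's code: the names make_listener, worker and spawn_workers are hypothetical, each child serves exactly one connection and then exits, and error handling is reduced to the bare minimum.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP listener on 127.0.0.1.
 * Port 0 asks the kernel to pick a free port, reported via port_out.
 * Returns the listening fd, or -1 on failure. */
int make_listener(unsigned short *port_out)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 16) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port_out = ntohs(addr.sin_port);
    return fd;
}

/* Child body: sleep directly in accept(), with no lock taken first.
 * Serves a single connection, then exits (0 on success, 1 on failure). */
void worker(int listen_fd)
{
    int conn = accept(listen_fd, NULL, NULL);

    if (conn >= 0) {
        if (write(conn, "ok\n", 3) != 3) {
            /* a short write is ignored in this sketch */
        }
        close(conn);
    }
    _exit(conn >= 0 ? 0 : 1);
}

/* Fork nworkers children that all wait in accept() on the same socket.
 * Returns the number of children actually started. */
int spawn_workers(int listen_fd, int nworkers)
{
    int started = 0;

    assert(nworkers > 0);
    for (int i = 0; i < nworkers; i++) {
        pid_t pid = fork();

        if (pid == 0)
            worker(listen_fd);   /* never returns */
        if (pid > 0)
            started++;
    }
    return started;
}
```

How many of the sleeping children the kernel wakes for each incoming connection is the kernel's business, which is the point made above: with nothing layered on top of accept(), a profile shows the real wakeup cost.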
Re: [PATCH] Re: Negative scalability by removal of lock_kernel()?(Was:Strange performance behavior of 2.4.0-test9)
dean gaudet wrote:
>
> On Tue, 31 Oct 2000, Andrew Morton wrote:
>
> > Dean, it looks like the same problem will occur with flock()-based
> > serialisation. Does Apache/Linux ever use that option?
>
> from apache/src/include/ap_config.h in the linux section there's
> this:
>
> /* flock is faster ... but hasn't been tested on 1.x systems */
> /* PR#3531 indicates flock() may not be stable, probably depends on
>  * kernel version. Go back to using fcntl, but provide a way for
>  * folks to tweak their Configuration to get flock.
>  */
> #ifndef USE_FLOCK_SERIALIZED_ACCEPT
> #define USE_FCNTL_SERIALIZED_ACCEPT
> #endif
>
> so you should be able to -DUSE_FLOCK_SERIALIZED_ACCEPT to try it.
>

Dean,

neither flock() nor fcntl() serialisation are effective
on linux 2.2 or linux 2.4. This is because the file
locking code still wakes up _all_ waiters. In my testing
with fcntl serialisation I have seen a single Apache
instance get woken and put back to sleep 1,500 times
before the poor thing actually got to service a request.

For kernel 2.2 I recommend that Apache consider using
sysv semaphores for serialisation. They use wake-one.

For kernel 2.4 I recommend that Apache use unserialised
accept.

This means that you'll need to make a runtime decision
on whether to use unserialised, serialised with sysv or
serialised with fcntl (if sysv IPC isn't installed).

In my testing I launched 3, 10, 30 or 150 Apache instances and then used

	httperf --num-conns=2000 --num-calls=1 --uri=/index.html

to open, use and close 2000 connections.

Here are the (terrible) results on 2.4 SMP with fcntl serialisation:

fcntl accept, 3 servers, vanilla:                        938.0 req/s
fcntl accept, 30 servers, vanilla:                       697.1 req/s
fcntl accept, 150 servers, vanilla:                       99.9 req/s (sic)

2.4 SMP with no serialisation:

unserialised accept, 3 servers, vanilla:                1049.0 req/s
unserialised accept, 10 servers, vanilla:                968.8 req/s
unserialised accept, 30 servers, vanilla:               1040.2 req/s
unserialised accept, 150 servers, vanilla:              1091.4 req/s

2.4 SMP with no serialisation and my patch to the wakeup and waitqueue code:

unserialised accept, 3 servers, task_exclusive:         1117.4 req/s
unserialised accept, 10 servers, task_exclusive:        1118.6 req/s
unserialised accept, 30 servers, task_exclusive:        1105.6 req/s
unserialised accept, 150 servers, task_exclusive:       1077.1 req/s

2.4 SMP with sysv semaphore serialisation:

sysvsem accept, 3 servers:                              1001.2 req/s
sysvsem accept, 10 servers:                             1061.0 req/s
sysvsem accept, 30 servers:                             1021.2 req/s
sysvsem accept, 150 servers:                             943.6 req/s

2.2.14 SMP with fcntl serialisation:

fcntl accept, 3 servers:                                1053.8 req/s
fcntl accept, 10 servers:                                996.2 req/s
fcntl accept, 30 servers:                                934.3 req/s
fcntl accept, 150 servers:                               141.4 req/s (sic)

2.2.14 SMP with no serialisation:

unserialised accept, 3 servers:                         1039.9 req/s
unserialised accept, 10 servers:                         983.1 req/s
unserialised accept, 30 servers:                         775.7 req/s
unserialised accept, 150 servers:                        220.7 req/s (sic)

2.2.14 SMP with sysv sem serialisation:

sysv accept, 3 servers:                                  932.2 req/s
sysv accept, 10 servers:                                 910.6 req/s
sysv accept, 30 servers:                                1026.6 req/s
sysv accept, 150 servers:                                927.2 req/s

Note that the first test (2.4 with fcntl serialisation) was with
an unpatched 2.4.0-test10-pre5. Once the simple flock.patch
is applied, the performance with 150 servers doubles. But it's
still sucky. The flock.patch change is effective in increasing
scalability with a large number of CPUs, not a large number of
httpd's.

Here's the silly patch I used to turn on sysv sem serialisation in Apache.
There's probably a better way than this :)

--- apache_1.3.14.orig/src/main/http_main.c	Fri Sep 29 00:32:36 2000
+++ apache_1.3.14/src/main/http_main.c	Sat Nov 4 15:01:41 2000
@@ -172,6 +172,13 @@
 
 #include "explain.h"
 
+/* AKPM */
+#if 1
+#define NEED_UNION_SEMUN
+#define USE_SYSVSEM_SERIALIZED_ACCEPT
+#define USE_FCNTL_SERIALIZED_ACCEPT
+#endif
+
 #if !defined(max)
 #define max(a,b)        (a > b ? a : b)
 #endif
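
[Editor's sketch] The runtime decision recommended in this message (unserialised accept on 2.4, sysv semaphores on 2.2, fcntl only when sysv IPC is missing) could be expressed roughly as below. This is an illustration under assumptions, not code from Apache or from the patch: the enum and both function names are hypothetical, the kernel version is taken from the uname() release string, and whether sysv IPC is available is left to the caller as a flag because the message does not say how to detect it. Development series such as 2.3.x are treated like 2.2 here, which is a guess.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/utsname.h>

/* The three strategies named in the message above. */
enum accept_method {
    ACCEPT_UNSERIALISED,  /* 2.4 and later: plain accept(), no lock       */
    ACCEPT_SYSVSEM,       /* 2.2 with sysv IPC: semaphores use wake-one   */
    ACCEPT_FCNTL          /* fallback when sysv IPC is not installed      */
};

/* Hypothetical helper: map a kernel release string such as "2.2.14"
 * or "2.4.0-test10-pre5" to one of the strategies. An unparsable
 * string is treated like an old kernel. */
enum accept_method choose_accept_method(const char *release,
                                        int have_sysv_ipc)
{
    int major = 0, minor = 0;

    assert(release != NULL);
    if (sscanf(release, "%d.%d", &major, &minor) == 2 &&
        (major > 2 || (major == 2 && minor >= 4)))
        return ACCEPT_UNSERIALISED;
    return have_sysv_ipc ? ACCEPT_SYSVSEM : ACCEPT_FCNTL;
}

/* Same decision for the kernel we are running on right now.
 * If uname() fails, fall back to the conservative choice. */
enum accept_method choose_for_running_kernel(int have_sysv_ipc)
{
    struct utsname u;

    if (uname(&u) != 0)
        return have_sysv_ipc ? ACCEPT_SYSVSEM : ACCEPT_FCNTL;
    return choose_accept_method(u.release, have_sysv_ipc);
}
```

A server would call choose_for_running_kernel() once at startup and set up its accept mutex (or none at all) accordingly, instead of fixing the choice at compile time with a -D flag.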