Re: [users@httpd] Help: Apache Crashing Everyday
Hi!

2017-04-19 8:41 GMT+02:00 Jayaram Ponnusamy:

> Hi Luca,
>
> Thanks for the details.
> 1. Our server's ulimit values are:
> ]$ ulimit -a
> max user processes (-u) 1024
>
> Please let me know whether the values are sufficient to allow at least 500
> concurrent connections.

To be sure, you should check /proc/$pid/limits (where $pid is the PID of one of the Apache processes), but I'd say that your original issue (quoting "and when the Total Children value is reached 999 the Apache is not responding") is related to this limit (max user processes, 1024) being enforced.

> 2. Yes, I checked the mod_jk log when the hang happens, and I am getting the
> errors below continuously.
>
> [Wed Apr 19 02:00:38 2017]loadbalancer www.cmsp1.com 24.843284
> [Wed Apr 19 02:00:38 2017][16313:3878614784] [info] ajp_process_callback::jk_ajp_common.c (1788): Writing to client aborted or client network problems
> [Wed Apr 19 02:00:38 2017][16313:3878614784] [info] ajp_service::jk_ajp_common.c (2447): (qu_prod_live_svr1) sending request to tomcat failed (unrecoverable), because of client write error (attempt=1)
> [Wed Apr 19 02:00:38 2017][16313:3878614784] [info] service::jk_lb_worker.c (1384): service failed, worker qu_prod_live_svr1 is in local error state
> [Wed Apr 19 02:00:38 2017][16313:3878614784] [info] service::jk_lb_worker.c (1403): unrecoverable error 200, request failed. Client failed in the middle of request, we can't recover to another instance.
> [Wed Apr 19 02:00:38 2017]loadbalancer www.cmsp1.com 19.170901
> [Wed Apr 19 02:00:38 2017][16313:3878614784] [info] jk_handler::mod_jk.c (2608): Aborting connection for worker=loadbalancer
> [Wed Apr 19 02:00:39 2017][16261:3878614784] [warn] map_uri_to_worker_ext::jk_uri_worker_map.c (962): Uri * is invalid. Uri must start with /
> [Wed Apr 19 02:00:40 2017][16308:3878614784] [warn] map_uri_to_worker_ext::jk_uri_worker_map.c (962): Uri * is invalid. Uri must start with /

Was Apache asked to reload for log rotation before this? Or did you see an increase in traffic?

> 3. We will upgrade to 2.4.25; could you please share an optimal configuration
> for mpm-event to allow more concurrent users?

I'd suggest starting from https://httpd.apache.org/docs/2.4/mod/event.html, but every server has its own set of requirements and a proper configuration needs a bit of testing, so I suggest setting up a test environment that mimics production and trying 2.4.25 there first. Please also check https://httpd.apache.org/docs/current/upgrading.html; upgrading to 2.4 is not super difficult, but you might be required to make some changes to your config.

Hope that helps!

Luca
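
The /proc/$pid/limits check suggested above can be sketched in Python as follows. This is only an illustration: the helper names (parse_limits, allows) are hypothetical, the sample text is made up apart from the soft limit of 1024 reported in this thread, and on a real Linux host the text would be read from /proc/<pid>/limits for one of the httpd processes.

```python
import re

# Illustrative excerpt in the layout of /proc/<pid>/limits on Linux.
# Only the soft "Max processes" value (1024) comes from this thread;
# the hard limits shown here are assumptions for the example.
SAMPLE = """\
Limit                     Soft Limit           Hard Limit           Units
Max processes             1024                 63714                processes
Max open files            1024                 4096                 files
"""

def parse_limits(text):
    """Parse /proc/<pid>/limits text into {limit name: (soft, hard)}."""
    limits = {}
    for line in text.splitlines()[1:]:  # skip the header row
        # Columns are separated by runs of two or more spaces.
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) >= 3:
            limits[fields[0]] = (fields[1], fields[2])
    return limits

def allows(limits, name, needed):
    """Rough check: is the soft limit at least `needed`?

    The process limit counts every process of the user, so real
    headroom is smaller than this comparison suggests.
    """
    soft = limits[name][0]
    return soft == "unlimited" or int(soft) >= needed

limits = parse_limits(SAMPLE)
print(limits["Max processes"])                # soft and hard limit
print(allows(limits, "Max processes", 500))   # 500 concurrent connections
print(allows(limits, "Max processes", 3500))  # MaxClients from httpd.conf
```

With prefork, each concurrent connection needs its own child process, so a MaxClients of 3500 cannot be reached while the soft process limit stays at 1024.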
Re: [users@httpd] Help: Apache Crashing Everyday
Hi Luca,

Thanks for the details.

1. Our server's ulimit values are:

]$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 63714
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1024
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Please let me know whether the values are sufficient to allow at least 500 concurrent connections.

2. Yes, I checked the mod_jk log when the hang happens, and I am getting the errors below continuously.

[Wed Apr 19 02:00:38 2017]loadbalancer www.cmsp1.com 24.843284
[Wed Apr 19 02:00:38 2017][16313:3878614784] [info] ajp_process_callback::jk_ajp_common.c (1788): Writing to client aborted or client network problems
[Wed Apr 19 02:00:38 2017][16313:3878614784] [info] ajp_service::jk_ajp_common.c (2447): (qu_prod_live_svr1) sending request to tomcat failed (unrecoverable), because of client write error (attempt=1)
[Wed Apr 19 02:00:38 2017][16313:3878614784] [info] service::jk_lb_worker.c (1384): service failed, worker qu_prod_live_svr1 is in local error state
[Wed Apr 19 02:00:38 2017][16313:3878614784] [info] service::jk_lb_worker.c (1403): unrecoverable error 200, request failed. Client failed in the middle of request, we can't recover to another instance.
[Wed Apr 19 02:00:38 2017]loadbalancer www.cmsp1.com 19.170901
[Wed Apr 19 02:00:38 2017][16313:3878614784] [info] jk_handler::mod_jk.c (2608): Aborting connection for worker=loadbalancer
[Wed Apr 19 02:00:39 2017][16261:3878614784] [warn] map_uri_to_worker_ext::jk_uri_worker_map.c (962): Uri * is invalid. Uri must start with /
[Wed Apr 19 02:00:40 2017][16308:3878614784] [warn] map_uri_to_worker_ext::jk_uri_worker_map.c (962): Uri * is invalid. Uri must start with /

3. We will upgrade to 2.4.25; could you please share an optimal configuration for mpm-event to allow more concurrent users?

Thanks
Jay
Re: [users@httpd] Help: Apache Crashing Everyday
Hi,

Some suggestions:

1) Check the RHEL ulimits applied to httpd; the error message "Resource temporarily unavailable: setuid: unable to change to uid" could be related to the maximum number of processes (allowed by the OS) being reached. Raising that limit should allow you to spawn more httpd processes.

2) Have you checked when the "hang" happens? If you have long-lived connections and your httpd server reloads (for example for log rotation), then it might hang a bit while waiting for the remaining connections to drain.

3) If possible, I'd consider upgrading httpd to >= 2.4.25 and using mpm-event (rather than prefork).

Hope that helps!

Luca

2017-04-16 13:18 GMT+02:00 Jayaram Ponnusamy:

> Dear All,
>
> We were running our site on a PHP-based CMS tool earlier, and normally
> 20-30K users access our sites daily. But in the new system with Tomcat, we
> are facing performance and availability issues frequently: when I access
> the Tomcat URL directly the page loads within 3 seconds, but if we access
> the web server URL it takes more than 9 seconds.
>
> Also, each day I am seeing more and more of these in my error_logs, and
> when the Total Children value is reached 999 the Apache is not responding
> and only a server reboot helps to bring the site back. We face this issue
> at least 4-5 times every day (we are using mod_jk to connect to Tomcat).
>
> Kindly help with this.
>
> Usually I am seeing this in my error_log:
> [Sat Apr 15 20:49:33 2017] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 8 children, there are 4 idle, and 31 total children
> [Sat Apr 15 20:51:14 2017] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 8 children, there are 0 idle, and 20 total children
> [Sat Apr 15 20:51:15 2017] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 16 children, there are 0 idle, and 28 total children
> [Sat Apr 15 20:51:16 2017] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 32 children, there are 0 idle, and 44 total children
>
> We are using two Apache nodes connected to two Tomcat servers (with
> application-level clustering).
>
> Apache Servers:
> 4 Core 64-bit, RHEL system running on 16GB RAM (both servers)
> Server version: Apache/2.2.21 (Unix)
>
> *httpd.conf*
> KeepAlive On
> Timeout 300
> MaxKeepAliveRequests 100
> KeepAliveTimeout 15
>
> StartServers 80
> ServerLimit 3500
> MaxClients 3500
> MaxRequestsPerChild 0
>
> *workers.properties*
> worker.list=loadbalancer,status
> worker.qu_prod_live_svr.type=ajp13
> worker.qu_prod_live_svr.host=cmsp1
> worker.qu_prod_live_svr.port=8009
> worker.qu_prod_live_svr.socket_keepalive=1
> worker.qu_prod_live_svr.socket_timeout=300
> worker.qu_prod_live_svr1.type=ajp13
> worker.qu_prod_live_svr1.host=cmsp2
> worker.qu_prod_live_svr1.port=8009
> worker.qu_prod_live_svr1.socket_keepalive=1
> worker.qu_prod_live_svr1.socket_timeout=300
> worker.qu_prod_live_svr.lbfactor=1
> worker.qu_prod_live_svr1.lbfactor=1
> worker.loadbalancer.type=lb
> worker.loadbalancer.balance_workers=qu_prod_live_svr,qu_prod_live_svr1
> worker.status.type=status
>
> *Tomcat Servers:*
> 4 Core 64-bit, RHEL system running on 16GB RAM (both servers)
> Server version: Apache Tomcat/7.0.42
>
> URIEncoding="UTF-8" emptySessionPath="true" maxThreads="500"
> minSpareThreads="10" connectionTimeout="-1" />
> URIEncoding="UTF-8" />
>
> *error_log:*
> [Sat Apr 15 21:52:36 2017] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 32 children, there are 0 idle, and 839 total children
> [Sat Apr 15 21:52:37 2017] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 32 children, there are 0 idle, and 871 total children
> [Sat Apr 15 21:52:38 2017] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 32 children, there are 0 idle, and 903 total children
> [Sat Apr 15 21:52:39 2017] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 32 children, there are 0 idle, and 935 total children
> [Sat Apr 15 21:52:40 2017] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 32 children, there are 0 idle, and 967 total children
> [Sat Apr 15 21:52:41 2017] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 32 children, there are 0 idle, and 999 total children
> [Sat Apr 15 21:52:41 2017] [alert] (11)Resource temporarily unavailable: setuid: unable to change to uid: 2
> [Sat Apr 15 21:52:41 2017] [alert] (11)Resource temporarily unavailable: setuid: unable to change to uid: 2
> [Sat Apr 15 21:52:41 2017] [alert] (11)Resource temporarily unavailable: setuid:
[users@httpd] Help: Apache Crashing Everyday
Dear All,

We were running our site on a PHP-based CMS tool earlier, and normally 20-30K users access our sites daily. But in the new system with Tomcat, we are facing performance and availability issues frequently: when I access the Tomcat URL directly the page loads within 3 seconds, but if we access the web server URL it takes more than 9 seconds.

Also, each day I am seeing more and more of these in my error_logs, and when the Total Children value is reached 999 the Apache is not responding and only a server reboot helps to bring the site back. We face this issue at least 4-5 times every day (we are using mod_jk to connect to Tomcat).

Kindly help with this.

Usually I am seeing this in my error_log:

[Sat Apr 15 20:49:33 2017] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 8 children, there are 4 idle, and 31 total children
[Sat Apr 15 20:51:14 2017] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 8 children, there are 0 idle, and 20 total children
[Sat Apr 15 20:51:15 2017] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 16 children, there are 0 idle, and 28 total children
[Sat Apr 15 20:51:16 2017] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 32 children, there are 0 idle, and 44 total children

We are using two Apache nodes connected to two Tomcat servers (with application-level clustering).

Apache Servers:
4 Core 64-bit, RHEL system running on 16GB RAM (both servers)
Server version: Apache/2.2.21 (Unix)

*httpd.conf*
KeepAlive On
Timeout 300
MaxKeepAliveRequests 100
KeepAliveTimeout 15
StartServers 80
ServerLimit 3500
MaxClients 3500
MaxRequestsPerChild 0

*workers.properties*
worker.list=loadbalancer,status
worker.qu_prod_live_svr.type=ajp13
worker.qu_prod_live_svr.host=cmsp1
worker.qu_prod_live_svr.port=8009
worker.qu_prod_live_svr.socket_keepalive=1
worker.qu_prod_live_svr.socket_timeout=300
worker.qu_prod_live_svr1.type=ajp13
worker.qu_prod_live_svr1.host=cmsp2
worker.qu_prod_live_svr1.port=8009
worker.qu_prod_live_svr1.socket_keepalive=1
worker.qu_prod_live_svr1.socket_timeout=300
worker.qu_prod_live_svr.lbfactor=1
worker.qu_prod_live_svr1.lbfactor=1
worker.loadbalancer.type=lb
worker.loadbalancer.balance_workers=qu_prod_live_svr,qu_prod_live_svr1
worker.status.type=status

*Tomcat Servers:*
4 Core 64-bit, RHEL system running on 16GB RAM (both servers)
Server version: Apache Tomcat/7.0.42

*error_log:*
[Sat Apr 15 21:52:36 2017] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 32 children, there are 0 idle, and 839 total children
[Sat Apr 15 21:52:37 2017] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 32 children, there are 0 idle, and 871 total children
[Sat Apr 15 21:52:38 2017] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 32 children, there are 0 idle, and 903 total children
[Sat Apr 15 21:52:39 2017] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 32 children, there are 0 idle, and 935 total children
[Sat Apr 15 21:52:40 2017] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 32 children, there are 0 idle, and 967 total children
[Sat Apr 15 21:52:41 2017] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 32 children, there are 0 idle, and 999 total children
[Sat Apr 15 21:52:41 2017] [alert] (11)Resource temporarily unavailable: setuid: unable to change to uid: 2
[Sat Apr 15 21:52:41 2017] [alert] (11)Resource temporarily unavailable: setuid: unable to change to uid: 2
[Sat Apr 15 21:52:41 2017] [alert] (11)Resource temporarily unavailable: setuid: unable to change to uid: 2
[Sat Apr 15 21:52:41 2017] [alert] (11)Resource temporarily unavailable: setuid: unable to change to uid: 2
[Sat Apr 15 21:52:41 2017] [alert] Child 9351 returned a Fatal error... Apache is exiting!
[Sat Apr 15 21:52:41 2017] [alert] (11)Resource temporarily unavailable: setuid: unable to change to uid: 2
[Sat Apr 15 21:52:41 2017] [alert] (11)Resource temporarily unavailable: setuid: unable to change to uid: 2
[Sat Apr 15 21:52:41 2017] [alert] (11)Resource temporarily unavailable: setuid: unable to change to uid: 2
[Sat Apr 15 21:53:06 2017] [error] (22)Invalid argument: apr_global_mutex_lock(jk_log_lock) failed
[Sat Apr 15 21:53:06 2017] [error] mod_jk: jk_log_to_file [Sat Apr 15 21:53:06 2017][8752:4177577728] [info] ajp_connection_tcp_get_message::jk_ajp_common.c (1150): (qu_prod_live_svr1) can't receive the response header message from tomcat, network problems or tomcat (10.11.11.32:8009) is down (errno=104)\n failed: Broken pipe
[Sat Apr 15 21:53:06 2017] [error] (22)Invalid
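
The growth in child processes shown in the error_log above can be pulled out with a short script. The following Python sketch is only an illustration: the function names are hypothetical, and it assumes the prefork "server seems busy" message format seen in this thread, with one log entry per line. The three sample entries are copied from the error_log excerpt.

```python
import re

# Matches the prefork "server seems busy" entries quoted in this thread.
BUSY = re.compile(
    r"spawning (\d+) children, there are (\d+) idle, and (\d+) total children"
)

SAMPLE_LOG = """\
[Sat Apr 15 21:52:40 2017] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 32 children, there are 0 idle, and 967 total children
[Sat Apr 15 21:52:41 2017] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 32 children, there are 0 idle, and 999 total children
[Sat Apr 15 21:52:41 2017] [alert] (11)Resource temporarily unavailable: setuid: unable to change to uid: 2
"""

def children_timeline(log_text):
    """Return (timestamp, spawning, idle, total) for each busy entry."""
    rows = []
    for line in log_text.splitlines():
        m = BUSY.search(line)
        if m:
            ts = line[1:line.index("]")]  # text inside the first [...]
            rows.append((ts, *map(int, m.groups())))
    return rows

def last_total_before_alert(log_text, marker="setuid: unable to change to uid"):
    """Return the last reported child total before the first setuid alert."""
    total = None
    for line in log_text.splitlines():
        if marker in line:
            return total
        m = BUSY.search(line)
        if m:
            total = int(m.group(3))
    return None  # no alert found in the log

for row in children_timeline(SAMPLE_LOG):
    print(row)
print(last_total_before_alert(SAMPLE_LOG))
```

In the excerpt, the last total reported before the first alert is 999, and the server was about to spawn 32 more children. If all of those children count against the same per-user process limit of 1024, that spawn would push the count past the limit, which is consistent with the setuid failures that follow.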