Re: [users@httpd] Help: Apache Crashing Everyday

2017-04-20 Thread Luca Toscano
Hi!

2017-04-19 8:41 GMT+02:00 Jayaram Ponnusamy:

> Hi Luca,
>
> Thanks for the details.
> 1. Our server's ulimit values are:
> $ ulimit -a
> max user processes              (-u) 1024
>
> Please let me know whether the values are sufficient to allow at least 500
> concurrent connections.
>

To be sure, you should check /proc/$pid/limits (where $pid is one of the
Apache processes), but I'd say that your original issue (quoting "and when
the Total Children value is reached 999 the Apache is not responding") is
related to this limit being enforced: 999 children plus any other processes
running as the same user is very close to the 1024 "max user processes" limit.
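A minimal sketch of that check, assuming Linux and a POSIX shell (nproc_limits is a made-up helper name, not an httpd tool). It prints the soft and hard "Max processes" values from /proc/$pid/limits. The httpd parent (the process started as root) is the one worth checking, because the children inherit its limits before they switch user.

```shell
#!/bin/sh
# nproc_limits PID: print the soft and hard "Max processes" limits of PID.
# Hypothetical helper; reads /proc/PID/limits, so it only works on Linux.
nproc_limits() {
    limits_file="/proc/$1/limits"
    if [ ! -r "$limits_file" ]; then
        echo "cannot read $limits_file" >&2
        return 1
    fi
    # The line looks like: "Max processes   1024   63714   processes"
    # ($1="Max", $2="processes", $3=soft limit, $4=hard limit).
    awk '/^Max processes/ { print "soft=" $3 " hard=" $4 }' "$limits_file"
}
```

For example, nproc_limits "$(pgrep -o httpd)" shows the limits of the oldest httpd process, which is normally the parent.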


>
> 2. Yes, I checked the mod_jk log when the hang happens, and I am getting
> the errors below continuously.
>
> [Wed Apr 19 02:00:38 2017]loadbalancer www.cmsp1.com 24.843284
> [Wed Apr 19 02:00:38 2017][16313:3878614784] [info]
> ajp_process_callback::jk_ajp_common.c (1788): Writing to client aborted
> or client network problems
> [Wed Apr 19 02:00:38 2017][16313:3878614784] [info]
> ajp_service::jk_ajp_common.c (2447): (qu_prod_live_svr1) sending request to
> tomcat failed (unrecoverable), because of client write error (attempt=1)
> [Wed Apr 19 02:00:38 2017][16313:3878614784] [info]
> service::jk_lb_worker.c (1384): service failed, worker qu_prod_live_svr1 is
> in local error state
> [Wed Apr 19 02:00:38 2017][16313:3878614784] [info]
> service::jk_lb_worker.c (1403): unrecoverable error 200, request failed.
> Client failed in the middle of request, we can't recover to another
> instance.
> [Wed Apr 19 02:00:38 2017]loadbalancer www.cmsp1.com 19.170901
> [Wed Apr 19 02:00:38 2017][16313:3878614784] [info]
> jk_handler::mod_jk.c (2608): Aborting connection for worker=loadbalancer
> [Wed Apr 19 02:00:39 2017][16261:3878614784 <387%20861%204784>] [warn]
> map_uri_to_worker_ext::jk_uri_worker_map.c (962): Uri * is invalid. Uri
> must start with /
> [Wed Apr 19 02:00:40 2017][16308:3878614784 <387%20861%204784>] [warn]
> map_uri_to_worker_ext::jk_uri_worker_map.c (962): Uri * is invalid. Uri
> must start with /
>

Was Apache asked to reload for log rotation right before this? Or did you
see an increase in traffic?
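One way to answer the first question, as a rough sketch: list the error_log lines that mention a restart and compare their timestamps with the hang. This assumes the restart notices contain the word "restart" (httpd 2.2 logs lines such as "Graceful restart requested, doing restart"); find_restarts is a hypothetical helper name.

```shell
#!/bin/sh
# find_restarts ERROR_LOG [TIMESTAMP_PREFIX]: print, with line numbers, the
# error_log lines mentioning a restart, optionally only those whose timestamp
# starts with TIMESTAMP_PREFIX (e.g. "Wed Apr 19 02:").
# Hypothetical helper; assumes restart notices contain the word "restart".
find_restarts() {
    log=$1
    when=${2:-}
    # error_log entries start with "[<timestamp>]", so "[<prefix>" selects a
    # time window; with no prefix every entry matches.
    grep -n -i 'restart' "$log" | grep -F "[$when"
}
```

The mod_jk errors quoted above are stamped 02:00:38, so something like find_restarts /path/to/logs/error_log "Wed Apr 19 02:" would cover that window (the log path is only an example).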


>
> 3. We will upgrade to 2.4.25. Could you please share an optimal
> configuration for mpm-event that allows more concurrent users?
>

I'd suggest starting from https://httpd.apache.org/docs/2.4/mod/event.html,
but every server has its own set of requirements and a proper configuration
needs a bit of testing, so I suggest setting up a test environment that
mimics production first and experimenting with 2.4.25 there.
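As a starting point for that testing, here is a small sketch (event_sizing is a hypothetical helper) that checks a proposed mpm_event sizing the way, as far as I know, httpd 2.4 adjusts it at startup: MaxRequestWorkers is raised to at least ThreadsPerChild, rounded down to a multiple of ThreadsPerChild, and capped at ServerLimit x ThreadsPerChild. The values 16, 25 and 400 used below are the event MPM defaults, not a recommendation. Keep in mind that on Linux threads count against the same "max user processes" limit, so 400 worker threads (plus a few extra threads per child) still use roughly 400 of the 1024 slots allowed for the httpd user.

```shell
#!/bin/sh
# event_sizing SERVER_LIMIT THREADS_PER_CHILD MAX_REQUEST_WORKERS
# Hypothetical helper: approximates the adjustments httpd 2.4's event MPM
# reports at startup and prints the resulting process/thread layout.
event_sizing() {
    server_limit=$1
    threads=$2
    workers=$3
    # MaxRequestWorkers cannot be below ThreadsPerChild...
    if [ "$workers" -lt "$threads" ]; then
        workers=$threads
        echo "note: below ThreadsPerChild, raised to $workers"
    fi
    # ...is rounded down to a multiple of ThreadsPerChild...
    rem=$((workers % threads))
    if [ "$rem" -ne 0 ]; then
        workers=$((workers - rem))
        echo "note: not a multiple of ThreadsPerChild, lowered to $workers"
    fi
    # ...and is capped at ServerLimit x ThreadsPerChild.
    ceiling=$((server_limit * threads))
    if [ "$workers" -gt "$ceiling" ]; then
        workers=$ceiling
        echo "note: exceeds ServerLimit x ThreadsPerChild, lowered to $workers"
    fi
    echo "effective: $((workers / threads)) children x $threads threads = $workers workers"
}
```

For example, event_sizing 16 25 500 reports that 500 exceeds the 16 x 25 ceiling and ends with "effective: 16 children x 25 threads = 400 workers".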

Please also check https://httpd.apache.org/docs/current/upgrading.html:
upgrading to 2.4 is not super difficult, but you might need to make some
changes to your config.
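To get a quick list of lines to look at, a sketch like the following (find_renamed_directives is a hypothetical helper) flags a few 2.2-era directives covered by the upgrading guide: MaxClients (renamed MaxRequestWorkers), MaxRequestsPerChild (renamed MaxConnectionsPerChild), and the Order/Allow/Deny/Satisfy access control directives, which 2.4 replaces with Require unless mod_access_compat is loaded. It is only a grep, so it will not catch everything the guide describes.

```shell
#!/bin/sh
# find_renamed_directives CONF_FILE: list (with line numbers) 2.2-era
# directives that httpd 2.4 renamed or replaced. Hypothetical helper.
#   MaxClients                -> MaxRequestWorkers
#   MaxRequestsPerChild       -> MaxConnectionsPerChild
#   Order/Allow/Deny/Satisfy  -> Require (or mod_access_compat)
find_renamed_directives() {
    # Directive names are case-insensitive, hence -i; requiring whitespace
    # after the name keeps e.g. "AllowOverride" from matching "Allow".
    grep -n -i -E \
        '^[[:space:]]*(MaxClients|MaxRequestsPerChild|Order|Allow|Deny|Satisfy)[[:space:]]' \
        "$1"
}
```

Run against the httpd.conf posted in this thread, it would flag the MaxClients and MaxRequestsPerChild lines.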

Hope that helps!

Luca


Re: [users@httpd] Help: Apache Crashing Everyday

2017-04-19 Thread Jayaram Ponnusamy
Hi Luca,

Thanks for the details.
1. Our server's ulimit values are:
$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 63714
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1024
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Please let me know whether the values are sufficient to allow at least 500
concurrent connections.
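A rough way to reason about this question, under stated assumptions: with prefork every connection occupies one child, so by Little's law the number of busy children is about the rate of new connections times how long each connection holds a child, i.e. the response time plus up to KeepAliveTimeout seconds of keep-alive idling. The sketch below (prefork_children_estimate is a hypothetical helper, integer seconds only) is meant to be fed the 9 seconds and KeepAliveTimeout 15 mentioned in this thread and an assumed 20 new connections per second; it ignores connections that carry several requests, so treat the result as an order of magnitude only.

```shell
#!/bin/sh
# prefork_children_estimate CONN_PER_SEC BUSY_SECS KEEPALIVE_SECS NPROC
# Hypothetical helper: rough Little's-law estimate of busy prefork children
# (each connection holds one child for BUSY_SECS of work plus up to
# KEEPALIVE_SECS of keep-alive idling), compared with the nproc limit.
# Integer arithmetic only.
prefork_children_estimate() {
    rate=$1
    busy=$2
    keepalive=$3
    nproc=$4
    needed=$((rate * (busy + keepalive)))
    echo "estimated busy children: $needed"
    if [ "$needed" -ge "$nproc" ]; then
        echo "at or above the nproc limit of $nproc"
    else
        echo "below the nproc limit of $nproc, which other processes of the same user share"
    fi
}
```

With those assumed numbers, prefork_children_estimate 20 9 15 1024 prints an estimate of 480 busy children, already close to the 500 asked about and almost half of the 1024 limit.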

2. Yes, I checked the mod_jk log when the hang happens, and I am getting
the errors below continuously.

[Wed Apr 19 02:00:38 2017]loadbalancer www.cmsp1.com 24.843284
[Wed Apr 19 02:00:38 2017][16313:3878614784] [info]
ajp_process_callback::jk_ajp_common.c (1788): Writing to client aborted or
client network problems
[Wed Apr 19 02:00:38 2017][16313:3878614784] [info]
ajp_service::jk_ajp_common.c (2447): (qu_prod_live_svr1) sending request to
tomcat failed (unrecoverable), because of client write error (attempt=1)
[Wed Apr 19 02:00:38 2017][16313:3878614784] [info] service::jk_lb_worker.c
(1384): service failed, worker qu_prod_live_svr1 is in local error state
[Wed Apr 19 02:00:38 2017][16313:3878614784] [info] service::jk_lb_worker.c
(1403): unrecoverable error 200, request failed. Client failed in the
middle of request, we can't recover to another instance.
[Wed Apr 19 02:00:38 2017]loadbalancer www.cmsp1.com 19.170901
[Wed Apr 19 02:00:38 2017][16313:3878614784] [info] jk_handler::mod_jk.c
(2608): Aborting connection for worker=loadbalancer
[Wed Apr 19 02:00:39 2017][16261:3878614784] [warn]
map_uri_to_worker_ext::jk_uri_worker_map.c (962): Uri * is invalid. Uri
must start with /
[Wed Apr 19 02:00:40 2017][16308:3878614784] [warn]
map_uri_to_worker_ext::jk_uri_worker_map.c (962): Uri * is invalid. Uri
must start with /
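To see which of these messages dominates over a longer period, a small sketch (summarize_jk_log is a hypothetical helper) counts the entries per log level and function. It assumes each entry sits on a single line in the real mod_jk log file, unlike the wrapped copy in this email.

```shell
#!/bin/sh
# summarize_jk_log MOD_JK_LOG: count mod_jk log entries per level and
# function (e.g. "[warn] map_uri_to_worker_ext::jk_uri_worker_map.c"),
# most frequent first. Hypothetical helper; assumes one entry per line.
summarize_jk_log() {
    grep -o -E '\[(trace|debug|info|warn|error|emerg)\] [A-Za-z_]+::[A-Za-z_.]+' "$1" \
        | sort | uniq -c | sort -rn
}
```

The "Uri * is invalid" warnings are logged for requests whose URI is a bare "*" (such as OPTIONS *), so a high count there points at a different cause than the client write errors.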

3. We will upgrade to 2.4.25. Could you please share an optimal
configuration for mpm-event that allows more concurrent users?

Thanks
Jay


On Tue, Apr 18, 2017 at 10:03 AM, Luca Toscano wrote:

> Hi,
>
> Some suggestions:
>
> 1) Check the RHEL ulimits applied to httpd: the error message "Resource
> temporarily unavailable: setuid: unable to change to uid" could mean that
> the maximum number of processes allowed by the OS has been reached. Raising
> that limit should allow you to spawn more httpd processes.
>
> 2) Have you checked when the "hang" happens? If you have long-lived
> connections and your httpd server reloads (for example, for log rotation),
> it might hang a bit while waiting for the remaining connections to
> drain.
>
> 3) If possible, I'd consider upgrading httpd to >= 2.4.25 and using
> mpm-event (rather than prefork).
>
> Hope that helps!
>
> Luca
>
>
> 2017-04-16 13:18 GMT+02:00 Jayaram Ponnusamy:
>
>> Dear All,
>>
>> We were running our site on a PHP-based CMS earlier, and normally 20-30K
>> users access our sites daily. But in the new system with Tomcat, we are
>> facing frequent performance and availability issues: when I access the
>> Tomcat URL directly, the page loads within 3 seconds, but if we access
>> the web server URL, it takes more than 9 seconds.
>>
>> Also, each day I am seeing more and more of these in my error_logs, and
>> when the Total Children value is reached 999 the Apache is not responding
>> and only a server reboot brings the site back. This happens at least 4-5
>> times every day (we are using mod_jk to connect with Tomcat).
>>
>> Kindly please help on this.
>>
>> Usually I see this in my error_log:
>> [Sat Apr 15 20:49:33 2017] [info] server seems busy, (you may need to
>> increase StartServers, or Min/MaxSpareServers), spawning 8 children, there
>> are 4 idle, and 31 total children
>> [Sat Apr 15 20:51:14 2017] [info] server seems busy, (you may need to
>> increase StartServers, or Min/MaxSpareServers), spawning 8 children, there
>> are 0 idle, and 20 total children
>> [Sat Apr 15 20:51:15 2017] [info] server seems busy, (you may need to
>> increase StartServers, or Min/MaxSpareServers), spawning 16 children, there
>> are 0 idle, and 28 total children
>> [Sat Apr 15 20:51:16 2017] [info] server seems busy, (you may need to
>> increase StartServers, or Min/MaxSpareServers), spawning 32 children, there
>> are 0 idle, and 44 total children
>> We are using two Apache nodes connected to two Tomcat instances
>> (application-level clustering).
>> Apache Servers:
>> 4 Core 64-bit, Rhel System running on 16GB RAM (Both Servers)
>> Server version: Apache/2.2.21 (Unix)
>>
>> *httpd.conf*
>> KeepAlive On
>> Timeout 300
>> MaxKeepAliveRequests 

Re: [users@httpd] Help: Apache Crashing Everyday

2017-04-18 Thread Luca Toscano
Hi,

Some suggestions:

1) Check the RHEL ulimits applied to httpd: the error message "Resource
temporarily unavailable: setuid: unable to change to uid" could mean that
the maximum number of processes allowed by the OS has been reached. Raising
that limit should allow you to spawn more httpd processes.
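Some background on why the limit shows up as a setuid error: on kernels older than Linux 3.1 (RHEL 6 ships 2.6.32), setuid() fails with EAGAIN, errno 11 "Resource temporarily unavailable", when the target user already has as many processes as the caller's RLIMIT_NPROC allows, and Linux counts every thread toward that limit. A minimal sketch (count_tasks_for_uid is a hypothetical helper) counts the tasks whose real UID matches a given uid by walking /proc, so the result for uid 2 (the uid in the log lines) can be compared with the "max user processes" value.

```shell
#!/bin/sh
# count_tasks_for_uid UID: count tasks (processes and their threads) whose
# real UID is UID. Linux compares this count with RLIMIT_NPROC ("max user
# processes"), so threads count as well as processes. Hypothetical helper.
count_tasks_for_uid() {
    uid=$1
    n=0
    for status in /proc/[0-9]*/task/[0-9]*/status; do
        # "Uid:" lists the real, effective, saved and filesystem UIDs;
        # a task may exit while we scan, so read errors are ignored.
        real=$(awk '/^Uid:/ { print $2 }' "$status" 2>/dev/null)
        if [ "$real" = "$uid" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}
```

For example, count_tasks_for_uid 2 while the server is busy; a value approaching the limit would match the setuid failures.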

2) Have you checked when the "hang" happens? If you have long-lived
connections and your httpd server reloads (for example, for log rotation),
it might hang a bit while waiting for the remaining connections to
drain.
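If mod_status is enabled, the scoreboard shows this directly: children still finishing requests after a graceful restart are marked "G", and children holding idle keep-alive connections are marked "K". A sketch (scoreboard_summary is a hypothetical helper) that counts each state in the Scoreboard line returned by /server-status?auto:

```shell
#!/bin/sh
# scoreboard_summary SCOREBOARD: count each worker state in a mod_status
# scoreboard string (the "Scoreboard:" value from /server-status?auto).
# Keys: _ waiting, S starting, R reading request, W sending reply,
# K keepalive, D DNS lookup, C closing, L logging, G gracefully finishing,
# I idle cleanup, . open slot. Hypothetical helper; prints non-zero counts.
scoreboard_summary() {
    for state in _ S R W K D C L G I .; do
        # tr -cd deletes every character except $state; wc -c counts the rest.
        n=$(printf '%s' "$1" | tr -cd "$state" | wc -c | tr -d ' ')
        if [ "$n" -gt 0 ]; then
            printf '%s %s\n' "$state" "$n"
        fi
    done
}
```

Usage, assuming mod_status is mapped to /server-status on localhost: scoreboard_summary "$(curl -s 'http://localhost/server-status?auto' | sed -n 's/^Scoreboard: //p')"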

3) If possible, I'd consider upgrading httpd to >= 2.4.25 and using
mpm-event (rather than prefork).

Hope that helps!

Luca

2017-04-16 13:18 GMT+02:00 Jayaram Ponnusamy:

> Dear All,
>
> We were running our site on a PHP-based CMS earlier, and normally 20-30K
> users access our sites daily. But in the new system with Tomcat, we are
> facing frequent performance and availability issues: when I access the
> Tomcat URL directly, the page loads within 3 seconds, but if we access
> the web server URL, it takes more than 9 seconds.
>
> Also, each day I am seeing more and more of these in my error_logs, and
> when the Total Children value is reached 999 the Apache is not responding
> and only a server reboot brings the site back. This happens at least 4-5
> times every day (we are using mod_jk to connect with Tomcat).
>
> Kindly please help on this.
>
> Usually I see this in my error_log:
> [Sat Apr 15 20:49:33 2017] [info] server seems busy, (you may need to
> increase StartServers, or Min/MaxSpareServers), spawning 8 children, there
> are 4 idle, and 31 total children
> [Sat Apr 15 20:51:14 2017] [info] server seems busy, (you may need to
> increase StartServers, or Min/MaxSpareServers), spawning 8 children, there
> are 0 idle, and 20 total children
> [Sat Apr 15 20:51:15 2017] [info] server seems busy, (you may need to
> increase StartServers, or Min/MaxSpareServers), spawning 16 children, there
> are 0 idle, and 28 total children
> [Sat Apr 15 20:51:16 2017] [info] server seems busy, (you may need to
> increase StartServers, or Min/MaxSpareServers), spawning 32 children, there
> are 0 idle, and 44 total children
> We are using two Apache nodes connected to two Tomcat instances
> (application-level clustering).
> Apache Servers:
> 4 Core 64-bit, Rhel System running on 16GB RAM (Both Servers)
> Server version: Apache/2.2.21 (Unix)
>
> *httpd.conf*
> KeepAlive On
> Timeout 300
> MaxKeepAliveRequests 100
> KeepAliveTimeout 15
> 
> StartServers 80
> ServerLimit 3500
> MaxClients 3500
> MaxRequestsPerChild  0
> 
>
> *workers.properties*
> worker.list=loadbalancer,status
> worker.qu_prod_live_svr.type=ajp13
> worker.qu_prod_live_svr.host=cmsp1
> worker.qu_prod_live_svr.port=8009
> worker.qu_prod_live_svr.socket_keepalive=1
> worker.qu_prod_live_svr.socket_timeout=300
> worker.qu_prod_live_svr1.type=ajp13
> worker.qu_prod_live_svr1.host=cmsp2
> worker.qu_prod_live_svr1.port=8009
> worker.qu_prod_live_svr1.socket_keepalive=1
> worker.qu_prod_live_svr1.socket_timeout=300
> worker.qu_prod_live_svr.lbfactor=1
> worker.qu_prod_live_svr1.lbfactor=1
> worker.loadbalancer.type=lb
> worker.loadbalancer.balance_workers=qu_prod_live_svr,qu_prod_live_svr1
> worker.status.type=status
>
> *Tomcat Servers:*
> 4 Core 64-bit, Rhel System running on 16GB RAM (Both Servers)
> Server version: Apache Tomcat/7.0.42
>  URIEncoding="UTF-8" emptySessionPath="true" maxThreads="500"
> minSpareThreads="10" connectionTimeout="-1" />
>  URIEncoding="UTF-8" />
>
> *error_log:*
> [Sat Apr 15 21:52:36 2017] [info] server seems busy, (you may need to
> increase StartServers, or Min/MaxSpareServers), spawning 32 children, there
> are 0 idle, and 839 total children
> [Sat Apr 15 21:52:37 2017] [info] server seems busy, (you may need to
> increase StartServers, or Min/MaxSpareServers), spawning 32 children, there
> are 0 idle, and 871 total children
> [Sat Apr 15 21:52:38 2017] [info] server seems busy, (you may need to
> increase StartServers, or Min/MaxSpareServers), spawning 32 children, there
> are 0 idle, and 903 total children
> [Sat Apr 15 21:52:39 2017] [info] server seems busy, (you may need to
> increase StartServers, or Min/MaxSpareServers), spawning 32 children, there
> are 0 idle, and 935 total children
> [Sat Apr 15 21:52:40 2017] [info] server seems busy, (you may need to
> increase StartServers, or Min/MaxSpareServers), spawning 32 children, there
> are 0 idle, and 967 total children
> [Sat Apr 15 21:52:41 2017] [info] server seems busy, (you may need to
> increase StartServers, or Min/MaxSpareServers), spawning 32 children, there
> are 0 idle, and 999 total children
> [Sat Apr 15 21:52:41 2017] [alert] (11)Resource temporarily unavailable:
> setuid: unable to change to uid: 2
> [Sat Apr 15 21:52:41 2017] [alert] (11)Resource temporarily unavailable:
> setuid: unable to change to uid: 2
> [Sat Apr 15 21:52:41 2017] [alert] (11)Resource temporarily unavailable:
> setuid: 

[users@httpd] Help: Apache Crashing Everyday

2017-04-16 Thread Jayaram Ponnusamy
Dear All,

We were running our site on a PHP-based CMS earlier, and normally 20-30K
users access our sites daily. But in the new system with Tomcat, we are
facing frequent performance and availability issues: when I access the
Tomcat URL directly, the page loads within 3 seconds, but if we access
the web server URL, it takes more than 9 seconds.
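To compare the two paths with the same tool, a sketch using curl's --write-out timers (time_url is a hypothetical helper). It prints the connect time, the time to first byte and the total time of each request, which helps show whether the extra seconds are spent before the first byte arrives.

```shell
#!/bin/sh
# time_url URL [COUNT]: request URL COUNT times (default 5) and print, per
# request, curl's connect time, time to first byte and total time in seconds.
# Hypothetical helper built on curl's --write-out variables.
time_url() {
    url=$1
    count=${2:-5}
    i=0
    while [ "$i" -lt "$count" ]; do
        curl -s -o /dev/null \
            -w '%{time_connect} %{time_starttransfer} %{time_total}\n' "$url"
        i=$((i + 1))
    done
}
```

Usage with placeholder URLs (the host names, the page path and port 8080 for Tomcat's HTTP connector are assumptions): time_url http://TOMCAT_HOST:8080/some/page, then time_url http://WEB_SERVER_HOST/some/page.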

Also, each day I am seeing more and more of these in my error_logs, and
when the Total Children value is reached 999 the Apache is not responding
and only a server reboot brings the site back. This happens at least 4-5
times every day (we are using mod_jk to connect with Tomcat).

Kindly please help on this.

Usually I see this in my error_log:
[Sat Apr 15 20:49:33 2017] [info] server seems busy, (you may need to
increase StartServers, or Min/MaxSpareServers), spawning 8 children, there
are 4 idle, and 31 total children
[Sat Apr 15 20:51:14 2017] [info] server seems busy, (you may need to
increase StartServers, or Min/MaxSpareServers), spawning 8 children, there
are 0 idle, and 20 total children
[Sat Apr 15 20:51:15 2017] [info] server seems busy, (you may need to
increase StartServers, or Min/MaxSpareServers), spawning 16 children, there
are 0 idle, and 28 total children
[Sat Apr 15 20:51:16 2017] [info] server seems busy, (you may need to
increase StartServers, or Min/MaxSpareServers), spawning 32 children, there
are 0 idle, and 44 total children
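These notices show prefork doubling the number of children it spawns per second (8, 16, 32; 32 is prefork's maximum spawn rate). To follow how fast the pool grows over a whole day, a sketch (children_timeline is a hypothetical helper) extracts the timestamp and the total children count from each notice, assuming one entry per line as in the real error_log.

```shell
#!/bin/sh
# children_timeline ERROR_LOG: print "<timestamp> <total children>" for every
# prefork "server seems busy" notice, one entry per line as in the real log.
# Hypothetical helper.
children_timeline() {
    # Capture the bracketed timestamp and the number before "total children".
    sed -n 's/^\[\([^]]*\)\] \[info\] server seems busy.* and \([0-9][0-9]*\) total children.*$/\1 \2/p' "$1"
}
```

Plotting or simply scanning that output shows whether the count climbs steadily toward 999 or jumps right after a specific event.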
We are using two Apache nodes connected to two Tomcat instances
(application-level clustering).
Apache Servers:
4 Core 64-bit, Rhel System running on 16GB RAM (Both Servers)
Server version: Apache/2.2.21 (Unix)

*httpd.conf*
KeepAlive On
Timeout 300
MaxKeepAliveRequests 100
KeepAliveTimeout 15

StartServers 80
ServerLimit 3500
MaxClients 3500
MaxRequestsPerChild  0


*workers.properties*
worker.list=loadbalancer,status
worker.qu_prod_live_svr.type=ajp13
worker.qu_prod_live_svr.host=cmsp1
worker.qu_prod_live_svr.port=8009
worker.qu_prod_live_svr.socket_keepalive=1
worker.qu_prod_live_svr.socket_timeout=300
worker.qu_prod_live_svr1.type=ajp13
worker.qu_prod_live_svr1.host=cmsp2
worker.qu_prod_live_svr1.port=8009
worker.qu_prod_live_svr1.socket_keepalive=1
worker.qu_prod_live_svr1.socket_timeout=300
worker.qu_prod_live_svr.lbfactor=1
worker.qu_prod_live_svr1.lbfactor=1
worker.loadbalancer.type=lb
worker.loadbalancer.balance_workers=qu_prod_live_svr,qu_prod_live_svr1
worker.status.type=status

*Tomcat Servers:*
4 Core 64-bit, Rhel System running on 16GB RAM (Both Servers)
Server version: Apache Tomcat/7.0.42



*error_log:*
[Sat Apr 15 21:52:36 2017] [info] server seems busy, (you may need to
increase StartServers, or Min/MaxSpareServers), spawning 32 children, there
are 0 idle, and 839 total children
[Sat Apr 15 21:52:37 2017] [info] server seems busy, (you may need to
increase StartServers, or Min/MaxSpareServers), spawning 32 children, there
are 0 idle, and 871 total children
[Sat Apr 15 21:52:38 2017] [info] server seems busy, (you may need to
increase StartServers, or Min/MaxSpareServers), spawning 32 children, there
are 0 idle, and 903 total children
[Sat Apr 15 21:52:39 2017] [info] server seems busy, (you may need to
increase StartServers, or Min/MaxSpareServers), spawning 32 children, there
are 0 idle, and 935 total children
[Sat Apr 15 21:52:40 2017] [info] server seems busy, (you may need to
increase StartServers, or Min/MaxSpareServers), spawning 32 children, there
are 0 idle, and 967 total children
[Sat Apr 15 21:52:41 2017] [info] server seems busy, (you may need to
increase StartServers, or Min/MaxSpareServers), spawning 32 children, there
are 0 idle, and 999 total children
[Sat Apr 15 21:52:41 2017] [alert] (11)Resource temporarily unavailable:
setuid: unable to change to uid: 2
[Sat Apr 15 21:52:41 2017] [alert] (11)Resource temporarily unavailable:
setuid: unable to change to uid: 2
[Sat Apr 15 21:52:41 2017] [alert] (11)Resource temporarily unavailable:
setuid: unable to change to uid: 2
[Sat Apr 15 21:52:41 2017] [alert] (11)Resource temporarily unavailable:
setuid: unable to change to uid: 2
[Sat Apr 15 21:52:41 2017] [alert] Child 9351 returned a Fatal error...
Apache is exiting!
[Sat Apr 15 21:52:41 2017] [alert] (11)Resource temporarily unavailable:
setuid: unable to change to uid: 2
[Sat Apr 15 21:52:41 2017] [alert] (11)Resource temporarily unavailable:
setuid: unable to change to uid: 2
[Sat Apr 15 21:52:41 2017] [alert] (11)Resource temporarily unavailable:
setuid: unable to change to uid: 2
[Sat Apr 15 21:53:06 2017] [error] (22)Invalid argument:
apr_global_mutex_lock(jk_log_lock) failed
[Sat Apr 15 21:53:06 2017] [error] mod_jk: jk_log_to_file
[Sat Apr 15 21:53:06 2017][8752:4177577728] [info]
ajp_connection_tcp_get_message::jk_ajp_common.c (1150): (qu_prod_live_svr1)
can't receive the response header message from tomcat, network problems or
tomcat (10.11.11.32:8009) is down (errno=104)\n failed: Broken pipe
[Sat Apr 15 21:53:06 2017] [error] (22)Invalid