Bug#1057126: Bug#1067104: Acknowledgement (server stalls: AH00046: child process 2876749 still did not exit, sending a SIGKILL)

2024-03-21 Thread Peter Krefting

2024-03-21 13:12, Yaroslav Halchenko wrote:


> FWIW here is a dirty workaround script I just crafted with chatgpt to
> monitor/restart apache2 as soon as it starts happening


My workaround is simpler, I have this line in root's crontab:

 5 * * * * curl --silent --max-time 5 --output /dev/null http://localhost/trac/ || systemctl restart apache2


It seems to restart Apache once every 5-8 days, according to the notices 
I see from Zabbix. The frequency might very well be related to the 
number of accesses to the server.
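
The crontab line above combines a probe and a recovery step. The same logic can be sketched as a small script, which makes it easier to run the check more often or to swap the recovery command. This is only an illustration: the function name `probe_or_recover`, the install path, and the five-minute schedule in the comment are assumptions, not part of the original workaround. The curl flags are the ones from the crontab line.

```bash
#!/bin/bash
# Sketch only: probe_or_recover is a hypothetical name, not an existing tool.
# Usage: probe_or_recover URL COMMAND [ARGS...]
probe_or_recover() {
    local url=$1
    shift
    # Same curl flags as the crontab line: quiet, give up after 5 seconds,
    # discard the response body.
    if curl --silent --max-time 5 --output /dev/null "$url"; then
        return 0
    fi
    echo "probe of $url failed, running: $*" >&2
    # Run the recovery command exactly as given.
    "$@"
}

# Act only when a URL and a recovery command are given on the command line.
# Hypothetical crontab entry, checking every five minutes:
#   */5 * * * * /usr/local/sbin/probe-or-recover http://localhost/trac/ systemctl restart apache2
if [ "$#" -ge 2 ]; then
    probe_or_recover "$@"
fi
```

Passing the recovery command as arguments keeps the probe testable with a harmless command such as `echo` before it is pointed at `systemctl restart apache2`.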


--
\\// Peter - http://www.softwolves.pp.se/



Bug#1057126: Bug#1067104: Acknowledgement (server stalls: AH00046: child process 2876749 still did not exit, sending a SIGKILL)

2024-03-21 Thread Yaroslav Halchenko
"All ingenious is simple" -- thanks for sharing.  I might redo mine following
your example, but have it check more frequently.

On Thu, 21 Mar 2024, Peter Krefting wrote:
> My workaround is simpler, I have this line in root's crontab:

>  5 * * * * curl --silent --max-time 5 --output /dev/null http://localhost/trac/ || systemctl restart apache2

-- 
Yaroslav O. Halchenko
Center for Open Neuroscience http://centerforopenneuroscience.org
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
WWW:   http://www.linkedin.com/in/yarik



Bug#1057126: Bug#1067104: Acknowledgement (server stalls: AH00046: child process 2876749 still did not exit, sending a SIGKILL)

2024-03-21 Thread Yaroslav Halchenko
I think "my" https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1067104
is a duplicate of this one.  I blame mod_wsgi, since I believe this
started happening after I began using it.

FWIW here is a dirty workaround script I just crafted with chatgpt to
monitor/restart apache2 as soon as it starts happening (doesn't happen upon
every maintenance event for me I believe). Let me know if I should gather any
additional information.

#!/bin/bash

set -eu

# Define the lock file and log directory
lock_file="/var/log/apache-scoreboard-restart/lock.lck"
log_dir="/var/log/apache-scoreboard-restart/"

# Ensure the log directory exists
mkdir -p "$log_dir"

# Attempt to acquire a lock
exec 200>"$lock_file"
if ! flock -n 200 ; then
    echo "Another instance is running."
    exit 0
fi

# Function to perform actions when the specified log line is found
handle_scoreboard_full() {
    local timestamp
    timestamp=$(date --iso-8601=seconds)
    local log_file="${log_dir}${timestamp}.log"

    echo "Logging system information to $log_file."
    { ps auxw -H; echo "---"; lsof; } > "$log_file"

    echo "Reloading Apache." >> "$log_file"
    service apache2 reload

    echo "Sleeping for a minute." >> "$log_file"
    sleep 60
}

# Monitor the Apache error log
while true; do
    tail --follow=name /var/log/apache2/error.log | while read -r line ; do
        # -F: match the message as a fixed string, not as a regular expression
        if echo "$line" | grep -qF "AH03490: scoreboard is full, not at MaxRequestWorkers.Increase ServerLimit." ; then
            handle_scoreboard_full
            break  # so we start with a fresh tail
        fi
    done
done
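
To gauge how often the condition occurs, the same fixed-string match can be counted over an error log. The helper below is a sketch under assumptions: the name `scoreboard_full_summary` is hypothetical, and it expects the timestamp to be the first bracketed field of each line, as in the default Apache error log format.

```bash
#!/bin/bash
# Sketch only: scoreboard_full_summary is a hypothetical helper, not part of
# the script above. It reports how many AH03490 lines a log contains and the
# timestamps of the first and last one.
scoreboard_full_summary() {
    local log=$1
    local pattern='AH03490: scoreboard is full'
    local count

    [ -r "$log" ] || { echo "cannot read $log" >&2; return 1; }

    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    count=$(grep -cF "$pattern" "$log" || true)
    echo "matches: $count"

    if [ "$count" -gt 0 ]; then
        # The timestamp is the text between the first "[" and the first "]".
        echo "first: $(grep -F "$pattern" "$log" | head -n 1 | cut -d']' -f1 | tr -d '[')"
        echo "last: $(grep -F "$pattern" "$log" | tail -n 1 | cut -d']' -f1 | tr -d '[')"
    fi
}

# Run only when a log file is given, e.g. /var/log/apache2/error.log
if [ "$#" -eq 1 ]; then
    scoreboard_full_summary "$1"
fi
```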

-- 
Yaroslav O. Halchenko
Center for Open Neuroscience http://centerforopenneuroscience.org
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
WWW:   http://www.linkedin.com/in/yarik





Bug#1067104: server stalls: AH00046: child process 2876749 still did not exit, sending a SIGKILL

2024-03-21 Thread Stefan Fritsch

On 18.03.24 at 13:59, Yaroslav Halchenko wrote:

Package: apache2
Version: 2.4.57-2
Severity: important

Server was working just fine for years and recently started to stall
completely after 3-7 days of functioning normally.  error logs get filled up
first with AH03490 and then eventually with AH00045 messages:

 [Sun Mar 17 02:26:01.353381 2024] [mpm_event:error] [pid 2649373:tid 139846579189632] AH03490: scoreboard is full, not at MaxRequestWorkers.Increase ServerLimit.
 ...
 [Sun Mar 17 22:00:42.201774 2024] [mpm_event:error] [pid 2649373:tid 139846579189632] AH03490: scoreboard is full, not at MaxRequestWorkers.Increase ServerLimit.
 [Sun Mar 17 22:00:42.995574 2024] [mpm_event:error] [pid 2649373:tid 139846579189632] AH03490: scoreboard is full, not at MaxRequestWorkers.Increase ServerLimit.
 [Sun Mar 17 22:00:42.998488 2024] [mpm_event:notice] [pid 2649373:tid 139846579189632] AH00492: caught SIGWINCH, shutting down gracefully
 [Sun Mar 17 22:00:46.358981 2024] [core:warn] [pid 2649373:tid 139846579189632] AH00045: child process 2649375 still did not exit, sending a SIGTERM
 [Sun Mar 17 22:00:46.359064 2024] [core:warn] [pid 2649373:tid 139846579189632] AH00045: child process 2649376 still did not exit, sending a SIGTERM


Have you tried increasing ServerLimit as the warning suggests?

Apart from that, it is probably the same as 
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1057126 . mod_wsgi or 
some Python script is preventing Apache processes from dying, and they 
accumulate until the scoreboard is full. Which versions of the 
wsgi-related packages are you using?
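
For the threaded MPMs, MaxRequestWorkers cannot exceed ServerLimit multiplied by ThreadsPerChild, so the smallest workable ServerLimit is MaxRequestWorkers divided by ThreadsPerChild, rounded up. A value above that minimum should leave spare scoreboard slots for processes that are still exiting, which appears to be what the AH03490 hint is asking for. The sketch below does that arithmetic and lists wsgi-related package versions. The function names are hypothetical, and the numbers in the usage note are placeholders, not values from the report.

```bash
#!/bin/bash
# Sketch only: both function names are hypothetical helpers.

# Smallest ServerLimit that can hold MaxRequestWorkers threads:
# ceil(MaxRequestWorkers / ThreadsPerChild), using integer arithmetic.
min_server_limit() {
    local max_request_workers=$1 threads_per_child=$2
    echo $(( (max_request_workers + threads_per_child - 1) / threads_per_child ))
}

# List packages known to dpkg whose name contains "wsgi", with versions.
# Requires a Debian-style system; defined here but not called automatically.
list_wsgi_packages() {
    dpkg-query -W -f='${Package} ${Version}\n' '*wsgi*'
}

# Run only when both numbers are given: MaxRequestWorkers ThreadsPerChild
if [ "$#" -eq 2 ]; then
    min_server_limit "$1" "$2"
fi
```

For example, `min_server_limit 150 25` prints 6, so with those placeholder settings any ServerLimit above 6 would add headroom.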




Processed: tagging 1032628

2024-03-21 Thread Debian Bug Tracking System
Processing commands for cont...@bugs.debian.org:

> tags 1032628 + pending
Bug #1032628 [libapache2-mod-proxy-uwsgi] please drop transitional package 
libapache2-mod-proxy-uwsgi from src:apache2
Added tag(s) pending.
> thanks
Stopping processing here.

Please contact me if you need assistance.
-- 
1032628: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1032628
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems