FWIW, I've very recently had something similar happen to a 4.5-STABLE box.
The machine was NOT SMP, and the cause, as far as we know, was that /var
had been filled by apache's error_log -- a funky new mod_throttle install
with lots of 
critical_acquire() failed: Permission denied
critical_release() failed: Permission denied
entries.

Now, I assume that the hang was not caused by /var being full, but by the
System V semaphore locking in the mod_throttle code.

In mod_throttle-3.1.2...
The critical_acquire() code from mod_throttle.c (assuming
defined(USE_SYSTEM_V_SERIALIZATION)):

<snip>

struct critical {
        int id;
        struct sembuf on;
        struct sembuf off;
};

</snip><snip>

static int
critical_acquire(t_critical *mp)
{
        for (errno = 0; semop(mp->id, &mp->on, 1) < 0; ) {
                if (errno != EINTR) {
                        /*** We really should kill the server here. ***/
                        perror("critical_acquire() failed");

                        /* Neither of these calls appear to shutdown the
                         * server and its children; exit(APEXIT_CHILDFATAL),
                         * appears to kill only the parent process.
                         */
                        ap_start_shutdown();
                        return -1;
                }
        }

        return 0;
}

</snip>
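
For anyone wanting to poke at this outside of Apache, here is a minimal
sketch of the same acquire/release pattern against a private System V
semaphore. It is not mod_throttle code: sem_apply(), sem_errno_hint(),
sem_demo() and union demo_semun are hypothetical names made up for this
illustration. "Permission denied" is the strerror() text for EACCES, which
semop() returns when the caller lacks permission on the semaphore set. A
full semaphore table would typically show up as ENOSPC from semget()
instead, so the hints below assume that reading of the log entries.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Hypothetical stand-in for union semun, which some platforms leave
 * for the caller to define. Only the val member is used here. */
union demo_semun {
        int val;
        struct semid_ds *buf;
        unsigned short *array;
};

/* Hypothetical helper: apply one operation to semaphore 0 of the set,
 * retrying on EINTR as critical_acquire() does.
 * Returns 0 on success, otherwise the errno from semop(). */
static int
sem_apply(int id, short delta)
{
        struct sembuf op;

        op.sem_num = 0;
        op.sem_op = delta;
        op.sem_flg = 0;

        while (semop(id, &op, 1) < 0) {
                if (errno != EINTR)
                        return errno;
        }
        return 0;
}

/* Hypothetical helper: map the errno values of interest to a hint. */
static const char *
sem_errno_hint(int err)
{
        switch (err) {
        case 0:
                return "ok";
        case EACCES:
                return "no permission on the semaphore set (check owner and mode)";
        case ENOSPC:
                return "system semaphore limit reached";
        case EINVAL:
        case EIDRM:
                return "semaphore set does not exist or was removed";
        default:
                return strerror(err);
        }
}

/* Create a private set with one semaphore, take and release it once,
 * then remove the set. Returns 0 on success, -1 on any failure. */
int
sem_demo(void)
{
        union demo_semun arg;
        int id, err;

        id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
        if (id < 0) {
                fprintf(stderr, "semget: %s\n", sem_errno_hint(errno));
                return -1;
        }

        arg.val = 1;    /* one holder at a time */
        if (semctl(id, 0, SETVAL, arg) < 0) {
                fprintf(stderr, "semctl: %s\n", sem_errno_hint(errno));
                semctl(id, 0, IPC_RMID);
                return -1;
        }

        err = sem_apply(id, -1);                /* acquire */
        if (err == 0)
                err = sem_apply(id, 1);         /* release */
        if (err != 0)
                fprintf(stderr, "semop: %s\n", sem_errno_hint(err));

        semctl(id, 0, IPC_RMID);                /* always clean up */
        return err == 0 ? 0 : -1;
}
```

Calling sem_demo() from a small main() as the user the httpd children run
as, and again as root, is one way to see whether the mode bits on a set
matter on a given box. The 0600 mode here is only an example value.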

Livelock, maybe?  Is there some sort of internal kernel semaphore table which
might be getting filled up or something?  I'd also like to find out more about
this, but sadly, the machine is a remote one and I can't drop into ddb as
suggested...
Thank you all very much.  Hope this information is of use.
-Anthony.


On Sun, May 05, 2002 at 04:31:36PM -0700, Patrick Thomas wrote:
> 
> So, based on a previous thread, it looks like I have a server whose
> userland halted, essentially, but the kernel continued running.
> 
> As evidenced by:
> 
> - you can still ping the server just fine
> - you can still connect to running services just fine - if you ssh to it,
> `ssh -v` (verbose) claims a connection is established, but the server
> doesn't respond in any way over that connection.  Further, you can telnet
> to POP or IMAP or HTTP ports, and get a connection, but you can't get any
> response.
> - cron does NOT run while the server is in this state - no jobs run
> - no response from the console - caps lock does NOT toggle the LED
> 
> So, as was suggested in the previous thread, it looks like my kernel is
> still running, but the userland has halted.  There are no log entries that
> give any clue as to why this happened last week.
> 
> 
> 1. from a theoretical standpoint, how would this happen ?
> 2. Is there any way to watchdog for it and escape from it before the
> userland completely crashes ?
> 3. any previous/old problems that would cause this behavior ?
> 
> 
> It is a FreeBSD 4.5-RELEASE system, and it is SMP - fairly heavily loaded
> (averages 60% CPU idle in `top` output).
> 
> thanks,
> 
> PT
> 
> 
> 
-----------------------------------------------
PGP key at:
    http://www.keyserver.net/
    http://www.anthonydotcom.com/gpgkey/key.txt
Home:
    http://www.anthonydotcom.com
-----------------------------------------------
