Re: Interesting problem (and diald WARNING)
On Tue, 11 Mar 1997 23:04:04 +0100 Lars Hallberg ([EMAIL PROTECTED]) wrote:
> First a small warning about diald: if your /etc/diald.ip-up script for some
> reason is 'hanging', then diald will keep the link up (this is probably
> documented behavior, but I was surprised anyway...). I noticed this
> because my ip-up script was 'hanging' on a pipe to /dev/xconsole. I got
> diald to work as expected by making /etc/diald.ip-up just background another
> shell script (doing all the 'real' work).

Yes, diald 0.14 waits for ip-up to complete before it considers the
initialization OK. The latest version, 0.16-1, backgrounds ip-up...

> Some subprocess of this script is hanging, probably the subprocess
> started by the line:
>
> echo -e `date`: Running /etc/diald.fetch-up\\r > /dev/xconsole &

Yes, /dev/xconsole is a FIFO (aka a named pipe). It is normally read by an
xconsole process started by xdm, but if this process isn't running, the
FIFO fills up... When a FIFO is full, any process write()ing to it hangs
until someone reads it and makes room...

The FIFO is written by syslogd (if you've got the standard distributed
/etc/syslog.conf), and almost every message that goes to the /var/log
logfiles also gets dumped there. You can also read it with
cat /dev/xconsole... Note that once it's emptied, it stays empty (repeating
the same cat will just show nothing).

> I suspect this line because the problem does have something to do with
> /dev/xconsole, as it only hangs when the xconsole logging is broken.
> I don't know what's breaking the xconsole, but it happens within the first
> day of system uptime (and is OK after a reboot).

Yes, because after a reboot the FIFO is empty. It fills up within a few
hours of uptime.

> More interesting are the side effects of this
> process hanging on xconsole. For some reason cron jobs start to
> hang when this process hangs. I can't log in as root when this happens,
> but there are no problems using existing root shells. User login is
> OK too.
> If I kill the process hanging on xconsole, all cron jobs finish
> and the system is back to normal (except xconsole, which is still broken).
> If I don't kill it, the cron jobs keep piling up until the system load
> crashes the machine :(

Let me put on my teacher's hat (actually, while digging this out, I learnt
some stuff too).

Only one process at a time can write to a FIFO. If a process already has
a writing file descriptor on a FIFO, another process that wants to open
the FIFO in write mode will hang in open(). So what happens is this:

1) The FIFO is full.
2) Your echo command has the FIFO open for writing, but its write() hangs
   because the FIFO is full.
3) Syslogd tries to write something to the FIFO and hangs in open().

Strangely enough :-), a root login gets logged by syslogd. Cron job start
and end times also get logged by syslogd (they don't actually appear in
the log file because of the syslogd configuration, but cron calls
syslogd). And since syslogd is hung, the calling processes are also hung
(login, cron, etc...). Your whole system seems to be broken because of
this innocent little sneaky echo to a FIFO!

Note that syslogd is clever enough to know when the pipe is full. Once it
has the FIFO open (assuming it doesn't hang in open()), it does a
non-blocking write(), which fails immediately instead of waiting when the
FIFO is full, so syslogd doesn't get hung. Your problem arises because
syslogd hangs in open(), where it doesn't expect to.

> I work around this problem by letting /etc/diald.ip-up also
> background this script:
[snip]

IMHO, this is ugly. Try using the logger program. It's a shell interface
to syslog. It will do all the hard work for you, will deal nicely with
/dev/xconsole, and you'll also get your messages in the system log files
in addition to /dev/xconsole. Read logger(1).

> o Why is xconsole broken?
> lsattr /dev/xconsole (when it's broken) gives:
> lsattr 1.06, 7-Oct-96 for EXT2 FS 0.5b, 95/08/09
> lsattr: Invalid argument While reading flags on /dev/xconsole

I don't know why lsattr returns this error message. Actually, it fails on
any special file... Maybe this is intended... Any ext2 specialist around
here? As lsattr is only useful for regular files, it makes sense for it to
return an error on a special file.

I hope I was clear enough. This is rather hairy...

Phil.
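The full-FIFO hang described above can be reproduced safely on a present-day Linux system without touching /dev/xconsole. This is a minimal sketch, not something from the original thread: it assumes bash, mkfifo and GNU timeout(1), uses a throwaway FIFO in a temporary directory, and holds that FIFO open read-write on descriptor 3 to stand in for an xconsole that has stopped reading. It demonstrates only the blocked write() of step 2, not the open() hang of step 3.

```shell
#!/bin/bash
# Sketch: a writer blocks once a FIFO's buffer is full and nobody reads it.
# Uses a temporary FIFO, never /dev/xconsole. Assumes Linux + GNU timeout.
dir=$(mktemp -d)
fifo="$dir/demo-fifo"
mkfifo "$fifo"

# Open the FIFO read-write on fd 3 (Linux permits O_RDWR on a FIFO, so this
# does not block). fd 3 acts as a "reader" that never reads, like a stalled
# xconsole: writers can open the FIFO, but nothing ever drains it.
exec 3<>"$fifo"

# Try to write 8 MB, far more than the pipe buffer holds (64 KiB by default
# on x86 Linux). write() blocks once the buffer is full; timeout(1) kills
# the writer's process group after 2 seconds and exits with status 124.
timeout 2 sh -c 'head -c 8000000 /dev/zero > "$1"' sh "$fifo"
status=$?

# Clean up the demo descriptor and directory.
exec 3<&-
rm -rf "$dir"

if [ "$status" -eq 124 ]; then
    echo "writer blocked on the full FIFO (timed out)"
else
    echo "writer finished with status $status"
fi
```

Run as-is, it should report the timeout after about two seconds. Deleting the `exec 3<>` line shows a related way to get stuck: with no reader at all, the writer blocks in open() itself, and timeout(1) fires just the same.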
Interesting problem (and diald WARNING)
Hello

First a small warning about diald: if your /etc/diald.ip-up script for some
reason is 'hanging', then diald will keep the link up (this is probably
documented behavior, but I was surprised anyway...). I noticed this
because my ip-up script was 'hanging' on a pipe to /dev/xconsole. I got
diald to work as expected by making /etc/diald.ip-up just background
another shell script (doing all the 'real' work). Sadly, that does bring
up other problems...

This is the script I background from /etc/diald.ip-up:

-- /etc/diald.fetch-up
#!/bin/bash
#
# Fetch and send mail
#
# Run by diald.ip-up when diald has connected to the internet.
# Has 30 sec to complete before being killed by diald.fetch-kill!
#
echo -e `date`: Running /etc/diald.fetch-up\\r > /dev/tty8 &
echo -e `date`: Running /etc/diald.fetch-up\\r > /dev/xconsole &
echo `date`":" fetchmail run from diald.fetch-up >> /root/mail-log
#
# Stop fetchmail if already running (probably a bug).
#
/usr/local/bin/fetchmail --quit -f /root/.fetchmailrc -L /root/mail-log
#
# Start fetchmail in daemon mode and send queued mail.
#
/usr/local/bin/fetchmail -f /root/.fetchmailrc -b 10 -L /root/mail-log -d 600 -t 300
runq &
-- end /etc/diald.fetch-up

Some subprocess of this script is hanging, probably the subprocess
started by the line:

echo -e `date`: Running /etc/diald.fetch-up\\r > /dev/xconsole &

I suspect this line because the problem does have something to do with
/dev/xconsole, as it only hangs when the xconsole logging is broken. I
don't know what's breaking the xconsole, but it happens within the first
day of system uptime (and is OK after a reboot). I suspect the cron.daily
scripts.

More interesting are the side effects of this process hanging on
xconsole. For some reason cron jobs start to hang when this process
hangs. I can't log in as root when this happens, but there are no
problems using existing root shells. User login is OK too.
If I kill the process hanging on xconsole, all cron jobs finish and the
system is back to normal (except xconsole, which is still broken). If I
don't kill it, the cron jobs keep piling up until the system load crashes
the machine :(

I work around this problem by letting /etc/diald.ip-up also background
this script:

-- /etc/diald.fetch-kill
#!/bin/bash
#
# Kill any 'hanging' diald.fetch-up
# ** DO NOT USE WITH AN ISDN CONNECTION WITHOUT PRECAUTION **
#
# Started by diald.ip-up when diald has connected to the internet
#
# Give diald.fetch-up 30 secs to complete
#
sleep 30
#
# Then kill any remaining diald.fetch-up *twice*!
#
kill `ps -x | grep diald.fetch-up | gawk -- "{print \\\$1}"`
sleep 1
kill `ps -x | grep diald.fetch-up | gawk -- "{print \\\$1}"`
sleep 2
#
# Mean kill of any diald.fetch-up *still* remaining!
#
kill -9 `ps -x | grep diald.fetch-up | gawk -- "{print \\\$1}"`
sleep 1
kill -9 `ps -x | grep diald.fetch-up | gawk -- "{print \\\$1}"`
#
# Hope this is enough. Don't dare kill any more, as diald might have a new
# *real* connection coming up soon... If you use ISDN it is probably
# far too late already
#
-- end /etc/diald.fetch-kill

This is (so far) keeping my system from crashing :) but it is an ugly
work-around. I can live with a broken xconsole, but I still want to know
what's really wrong with my system. Do you have any idea:

o Why is xconsole broken?
  lsattr /dev/xconsole (when it's broken) gives:
  lsattr 1.06, 7-Oct-96 for EXT2 FS 0.5b, 95/08/09
  lsattr: Invalid argument While reading flags on /dev/xconsole
  (I don't know what it gives when xconsole is working, and I don't want
  to reboot, as I am trying to find out if this 'fix' can keep my system
  up for a 'long' time.)

o Why is root login broken when a process is hanging on the broken
  xconsole, and why are cron jobs hanging at that time?

o Any suggestions on what to read or what to investigate on my system
  are welcome. All the cron.daily scripts on the machine are as installed
  by Debian (the standard script has some "find ... rm" lines commented
  out with a reference to a security hole. The other scripts do make
  "find ... rm" calls. Any reason to worry? Where can I read about this
  security hole?)

I know these are inexact questions, but it is the best I can do.

I had this problem with Debian 1.1 and kernel 2.0.0, and I upgraded to
Debian 1.2.6 and kernel 2.0.27 in the hope that it was a 2.0.0-related
problem, but the problem did not go away... I use the prebuilt Debian
'install' kernels (no kernel compile).

This problem has become more 'interesting' and less 'frustrating' now that
I don't need to do bad reboots every day any more.

Ask if you need to know anything more.

Sorry for my poor English, and TIA
/Lars
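The fetch-kill script above finds its targets with `ps -x | grep diald.fetch-up | gawk ...`, a pipeline in which grep also matches its own command line. The following is a minimal sketch of the same lookup that runs no grep at all, assuming Linux /proc: it reads each /proc/<pid>/cmdline (the data ps itself reports) and prints the PIDs whose command line contains a given string. The helper name find_pids_by_cmdline and the looping stand-in script are hypothetical, for illustration only.

```shell
#!/bin/bash
# Sketch: print the PIDs of processes whose command line contains the
# string $1, by reading /proc/<pid>/cmdline directly (Linux-specific).
# No grep process is involved, so nothing can match itself.
# find_pids_by_cmdline is a hypothetical helper name.
find_pids_by_cmdline() {
    local d pid cmd
    for d in /proc/[0-9]*; do
        pid=${d#/proc/}
        # Never report this shell or the subshell running this function.
        if [ "$pid" = "$$" ] || [ "$pid" = "$BASHPID" ]; then
            continue
        fi
        # cmdline is NUL-separated; turn the NULs into spaces. A process
        # may exit between the glob and the read, so skip unreadable ones.
        cmd=$(tr '\0' ' ' 2>/dev/null < "$d/cmdline") || continue
        case $cmd in
            *"$1"*) echo "$pid" ;;
        esac
    done
}

# Demo with a harmless stand-in: a script named diald.fetch-up in a temp
# directory that loops forever, like a hung copy of the real one.
dir=$(mktemp -d)
printf '#!/bin/sh\nwhile :; do sleep 1; done\n' > "$dir/diald.fetch-up"
chmod +x "$dir/diald.fetch-up"
"$dir/diald.fetch-up" &
stand_in=$!
sleep 1                              # give it time to start

# Search for the full temp path so nothing else on the system matches.
pids=$(find_pids_by_cmdline "$dir/diald.fetch-up")
echo "matching PIDs: $pids"

# Same escalation as diald.fetch-kill: plain kill first, then kill -9.
kill $pids 2>/dev/null               # unquoted on purpose: one word per PID
sleep 1
kill -9 $pids 2>/dev/null
wait "$stand_in" 2>/dev/null
rm -rf "$dir"
```

In diald.fetch-kill itself the search string would simply be diald.fetch-up, e.g. `kill $(find_pids_by_cmdline diald.fetch-up)`. That also catches the hung `echo ... > /dev/xconsole &` lines: each runs in a forked copy of the bash running /etc/diald.fetch-up, and since echo is a bash builtin nothing is exec'd, so the fork keeps that command line.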