Re: Interesting problem (and diald WARNING)

1997-03-12 Thread Philippe Troin

On Tue, 11 Mar 1997 23:04:04 +0100 Lars Hallberg ([EMAIL PROTECTED]) 
wrote:

> First a small warning about diald: if your /etc/diald.ip-up script
> hangs for some reason, diald will keep the link up (this is probably
> documented behavior, but I was surprised anyway...). I noticed this
> because my ip-up script was hanging on a pipe to /dev/xconsole. I got
> diald to work as expected by making /etc/diald.ip-up just background
> another shell script (which does all the 'real' work).

Yes, diald 0.14 waits until ip-up completes before it considers 
initialization OK. The latest version, 0.16-1, backgrounds the ip-up...

> Some subprocess of this script is hanging, probably the subprocess
> started by the line:
> 
> echo -e `date`: Running /etc/diald.fetch-up\\r > /dev/xconsole &

Yes, /dev/xconsole is a FIFO (aka a named pipe).
It is normally read by an xconsole process started by xdm, but if this 
process isn't running, the FIFO fills up... When a FIFO is full, any 
process write()ing to it hangs until someone reads from it and makes 
room...
The FIFO is written to by syslogd (if you've got the standard 
distributed /etc/syslog.conf), and almost every message that goes to 
the /var/log logfiles also gets dumped there.
You can also read it with cat /dev/xconsole...
Note that once it has been emptied, it's empty (running the same cat 
again will just show nothing).
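
If you want to watch a FIFO fill up without touching the real 
/dev/xconsole, something like the following sketch should do. It 
assumes Linux, bash and the GNU coreutils timeout command; the 
temporary directory and the demo.fifo name are made up for the 
example, and the exact buffer size (typically 64 KiB on current 
Linux kernels) is an assumption, not something the script depends on.

```shell
#!/bin/bash
# Sketch: a write to a FIFO that nobody drains blocks once the buffer is full.
# All names below are hypothetical and exist only for this demonstration.
demo_dir=$(mktemp -d)
fifo="$demo_dir/demo.fifo"
mkfifo "$fifo"

# Keep the FIFO open on fd 3 so the writers below get past open().
# (Opening a FIFO read-write like this is Linux behaviour; POSIX leaves it
# unspecified.) Nothing ever reads from fd 3, so the buffer is never drained.
exec 3<>"$fifo"

# A small write fits in the buffer and returns at once.
if timeout 2 sh -c 'echo hello > "$1"' sh "$fifo"; then
    small=ok
else
    small=blocked
fi

# A write much larger than the buffer blocks until timeout gives up on it.
if timeout 2 sh -c 'head -c 1000000 /dev/zero > "$1"' sh "$fifo"; then
    big=ok
else
    big=blocked
fi

echo "small write: $small, large write: $big"

# Clean up: close our descriptor and remove the demo FIFO.
exec 3>&-
rm -rf "$demo_dir"
```

The second writer is in the same position as your echo: it has the 
FIFO open but cannot finish its write() until somebody reads.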

> I suspect this line because the problem does have something to do with
> /dev/xconsole, as it only hangs when the xconsole logging is broken.
> I don't know what's breaking the xconsole, but it happens within the
> first day of system uptime (and is OK after a reboot).

Yes, because after a reboot the FIFO is empty. It fills up within 
a few hours of uptime.

> More interesting are the side effects of this
> process hanging on xconsole. For some reason cron jobs start to
> hang when this process hangs. I can't log in as root when this happens,
> but there are no problems using existing root shells. User login is
> OK too. If I kill the process hanging on xconsole, all cron jobs finish
> and the system is back to normal (except xconsole, which is still broken).
> If I don't kill it, the cron jobs keep piling up until the system load
> crashes the machine :(

Let me put on my teacher's hat (actually, while digging into this, I 
learnt some stuff too).
Only one process at a time can write to a FIFO.
If a process already has a writing file descriptor on a FIFO, another 
process wanting to open the FIFO in write mode will hang in 
open().
So what happens is that:
 1) The FIFO is full.
 2) Your echo commands have the FIFO open for writing, but the write() 
hangs because the FIFO is full.
 3) Syslogd tries to write something to the FIFO and hangs in open().
Strangely enough :-), a root login gets logged by syslogd. Cron 
job start and end times get logged by syslogd too (they don't actually 
appear in the log file because of the syslogd configuration, but cron 
calls syslogd).
And as syslogd is hung, the calling processes are also hung (login, 
cron, etc...).
Your whole system seems to be broken because of this little innocent 
sneaky echo to a FIFO!

Note that syslogd is clever enough to know when the pipe is full. 
Once opened (assuming it doesn't hang on open()), it does a 
non-waiting write(), which returns 0 characters written when the FIFO 
is full, and syslogd doesn't get hung.
Your situation is a problem because syslogd hangs in open(), where it 
doesn't expect to.
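
A shell script has no direct way to ask for that kind of non-waiting 
write, but it can at least bound how long it is willing to wait. A 
minimal sketch, assuming GNU coreutils timeout is installed; 
write_with_timeout is a made-up helper name and the 2-second limit is 
arbitrary:

```shell
#!/bin/bash
# write_with_timeout TARGET MESSAGE   (hypothetical helper, not a standard tool)
# Writes MESSAGE plus a newline to TARGET, giving up after 2 seconds if the
# open() or the write() blocks, as can happen with a FIFO.
# Returns 0 on success and timeout's exit status (124) if the write was
# abandoned.
write_with_timeout() {
    local target=$1 msg=$2
    # The redirection happens inside the child shell, so a blocked open()
    # is covered by the timeout as well as a blocked write().
    timeout 2 sh -c 'printf "%s\n" "$2" > "$1"' sh "$target" "$msg"
}

# Possible use in an ip-up style script (left commented out on purpose):
# write_with_timeout /dev/xconsole "$(date): Running /etc/diald.fetch-up" \
#     || echo "xconsole not writable, skipping" >&2
```

The message is simply dropped when the FIFO is stuck, which is usually 
preferable to a process that hangs forever.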

> I work around this problem by letting /etc/diald.ip-up also
> background this script:
[snip]

IMHO, this is ugly.
Try using the logger program. It's a shell interface to syslog. It 
will do all the hard work for you, will deal nicely with 
/dev/xconsole, and you'll also get your stuff in the system log files 
in addition to /dev/xconsole.
Read logger(1).
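
For instance, the two echo lines in your script could become something 
like this. It's only a sketch: log_event is a made-up helper name, the 
tag and the daemon.notice priority are my guesses, and whether the 
message actually reaches /dev/xconsole depends on your syslog 
configuration.

```shell
#!/bin/bash
# log_event MESSAGE...   (hypothetical helper)
# Hands the message to syslog through logger(1) instead of writing to the
# FIFO directly. -t sets the tag shown in the log line, -p the
# facility.priority pair. If logger fails, the message goes to stderr so
# that it is not silently lost.
log_event() {
    logger -t diald.fetch-up -p daemon.notice "$*" 2>/dev/null \
        || echo "diald.fetch-up: $*" >&2
}

# Possible use at the top of /etc/diald.fetch-up (commented out here):
# log_event "Running /etc/diald.fetch-up"
```

syslogd adds the timestamp itself, so the `date` call is no longer 
needed.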

> o Why is xconsole broken? lsattr /dev/xconsole (when it's broken) gives:
>   lsattr 1.06, 7-Oct-96 for EXT2 FS 0.5b, 95/08/09
>   lsattr: Invalid argument While reading flags on /dev/xconsole

I don't know why lsattr returns this error message. Actually, it fails 
on any special file... Maybe this is intended... Any ext2 specialist 
around here?
As lsattr is only useful for regular files, it makes sense for it to 
return an error on a special file.

I hope I was clear enough. This is rather hairy...

Phil.



Interesting problem (and diald WARNING)

1997-03-12 Thread Lars Hallberg
Hello

First a small warning about diald: if your /etc/diald.ip-up script
hangs for some reason, diald will keep the link up (this is probably
documented behavior, but I was surprised anyway...). I noticed this
because my ip-up script was hanging on a pipe to /dev/xconsole. I got
diald to work as expected by making /etc/diald.ip-up just background
another shell script (which does all the 'real' work). Sadly, that
brings up other problems... This is the script I background from
/etc/diald.ip-up:

----- /etc/diald.fetch-up -----
#!/bin/bash
#
# Fetch and send mail
#
# Run by diald.ip-up when diald has connected to the Internet.
# Has 30 seconds to complete before being killed by diald.fetch-kill!
#
echo -e `date`: Running /etc/diald.fetch-up\\r > /dev/tty8 &
echo -e `date`: Running /etc/diald.fetch-up\\r > /dev/xconsole &
echo `date`":" fetchmail run from diald.fetch-up >> /root/mail-log
#
# Stop fetchmail if already running (probably a bug).
#
/usr/local/bin/fetchmail --quit -f /root/.fetchmailrc -L /root/mail-log
#
# Start fetchmail in daemon mode and send queued mail.
#
/usr/local/bin/fetchmail -f /root/.fetchmailrc -b 10 -L /root/mail-log -d 600 -t 300
runq &
----- end /etc/diald.fetch-up -----

Some subprocess of this script is hanging, probably the subprocess
started by the line:

echo -e `date`: Running /etc/diald.fetch-up\\r > /dev/xconsole &

I suspect this line because the problem does have something to do with
/dev/xconsole, as it only hangs when the xconsole logging is broken.
I don't know what's breaking the xconsole, but it happens within the
first day of system uptime (and is OK after a reboot). I suspect the
cron.daily scripts. More interesting are the side effects of this
process hanging on xconsole. For some reason cron jobs start to
hang when this process hangs. I can't log in as root when this happens,
but there are no problems using existing root shells. User login is
OK too. If I kill the process hanging on xconsole, all cron jobs finish
and the system is back to normal (except xconsole, which is still broken).
If I don't kill it, the cron jobs keep piling up until the system load
crashes the machine :(

I work around this problem by letting /etc/diald.ip-up also
background this script:

----- /etc/diald.fetch-kill -----
#!/bin/bash
#
# Kill any 'hanging' diald.fetch-up
# ** DO NOT USE WITH AN ISDN CONNECTION WITHOUT PRECAUTION **
#
# Started by diald.ip-up when diald has connected to the Internet
#
# Give diald.fetch-up 30 seconds to complete
#
sleep 30
#
# Then kill any remaining diald.fetch-up *twice*!
# (The [d] in the pattern keeps grep from matching its own process.)
#
kill $(ps -x | grep '[d]iald.fetch-up' | gawk '{print $1}')
sleep 1
kill $(ps -x | grep '[d]iald.fetch-up' | gawk '{print $1}')
sleep 2
#
# Mean kill any diald.fetch-up *still* remaining!
#
kill -9 $(ps -x | grep '[d]iald.fetch-up' | gawk '{print $1}')
sleep 1
kill -9 $(ps -x | grep '[d]iald.fetch-up' | gawk '{print $1}')
#
# Hope this is enough. I don't dare kill any more, as diald might have a
# new *real* connection coming up soon... If you use ISDN it is probably
# far too late already.
#
----- end /etc/diald.fetch-kill -----
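
A less scattershot variant would remember the PID of the job it
started and kill only that one, instead of grepping the process table.
A rough sketch only: run_with_deadline is a hypothetical helper (not
something diald provides), and the one-second grace period before the
hard kill is an arbitrary choice.

```shell
#!/bin/bash
# run_with_deadline SECONDS COMMAND [ARGS...]   (hypothetical helper)
# Runs COMMAND in the background and kills it if it is still running after
# SECONDS: first with SIGTERM, then one second later with SIGKILL.
# Returns COMMAND's exit status (143 if it was ended by SIGTERM).
run_with_deadline() {
    local secs=$1
    shift
    "$@" &
    local pid=$!

    # Watchdog: sleeps, then kills exactly the PID started above.
    (
        sleep "$secs"
        kill "$pid" 2>/dev/null
        sleep 1
        kill -9 "$pid" 2>/dev/null
    ) >/dev/null 2>&1 &
    local watchdog=$!

    local status=0
    wait "$pid" || status=$?

    # The job is gone; stop the watchdog so it cannot act on a stale PID.
    kill "$watchdog" 2>/dev/null || true
    wait "$watchdog" 2>/dev/null || true
    return "$status"
}

# Possible use from /etc/diald.ip-up (commented out here):
# run_with_deadline 30 /etc/diald.fetch-up &
```

Children that the job itself put into the background (such as the
echo lines above) are not tracked by this and could still be left
hanging.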

This is (so far) keeping my system from crashing :) but it is an
ugly workaround. I can live with a broken xconsole, but I still
want to know what is really wrong with my system. Do you have any
ideas:

o Why is xconsole broken? lsattr /dev/xconsole (when it's broken) gives:
  lsattr 1.06, 7-Oct-96 for EXT2 FS 0.5b, 95/08/09
  lsattr: Invalid argument While reading flags on /dev/xconsole
  (I don't know what it gives when xconsole is working, and I don't
  want to reboot, as I am trying to find out whether this 'fix' can keep
  my system up for a 'long' time.)
o Why is root login broken when a process is hanging on the broken
  xconsole, and why are cron jobs hanging at that time?
o Any suggestions on what to read or what to investigate on my system
  are welcome.

I know these are inexact questions, but it is the best I can do. I
had this problem with Debian 1.1 and kernel 2.0.0, and I upgraded to
Debian 1.2.6 and kernel 2.0.27 in the hope that it was a 2.0.0-related
problem, but the problem did not go away... I use the prebuilt Debian
'install' kernels (no kernel compile). All the cron.daily scripts on
the machine are as installed by Debian (the standard script has some
find -> rm lines commented out with a reference to a security hole.
The other scripts do make find -> rm calls. Any reason to worry? Where
can I read about this security hole?)

This problem has become more 'interesting' and less 'frustrating' as
I don't need to do bad reboots every day any more.

Ask if you need to know anything more.

Sorry for my poor English and TIA /Lars
--
   /