Re: Bind9 stopped after 34 days of uptime
On Wed, Dec 25, 2002 at 03:03:19PM +0100, InfoEmergencias - Luis Gomez wrote:
> Hi all
>
> I've been running my company's server with Linux on the same computer for about six months. Tonight, when I arrived home (my company is in my house) at about 6 a.m., I noticed I could not browse any website, and that the DNS server (bind 9) was stopped. It was up when I left at 15.30h. I restarted the service and everything is OK now.
> [snip]
> Well, if anyone has ever had a problem like this and can lend me a hand or give me some advice, I'll be very happy to hear from you :-)

Apparently you can crash bind9 with a bad request. It has happened to me once so far:

Oct 31 06:29:14 polaris named[2450]: resolver.c:4030: REQUIRE(((query) != ((void *)0)) && (((const isc__magic_t *)(query))->magic == ((('Q') << 24 | ('!') << 16 | ('!') << 8 | ('!'))))) failed
Oct 31 06:29:14 polaris named[2450]: exiting (due to assertion failure)
Oct 31 06:29:14 polaris named[2450]: resolver.c:4030: REQUIRE(((query) != ((void *)0)) && (((const isc__magic_t *)(query))->magic == ((('Q') << 24 | ('!') << 16 | ('!') << 8 | ('!'))))) failed
Oct 31 06:29:14 polaris named[2450]: exiting (due to assertion failure)

IMHO, it should reject something like this and not quit!!!

- Adam

-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]
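For anyone puzzling over the assertion above: the value on the right of "->magic ==" is just four ASCII characters packed into a 32-bit integer. The Python sketch below recomputes it, assuming (as the expanded expression in the log suggests) that 'Q', '!', '!', '!' are shifted into the top, second, third and low bytes. Such a check failing usually means the pointer was NULL or no longer pointed at a live query object (freed or overwritten memory, for example), which is why named aborts instead of carrying on. The name QUERY_MAGIC is used here for illustration only.

```python
# A sketch, not BIND source: the packing mirrors the expanded expression
# in the log line, (('Q') << 24 | ('!') << 16 | ('!') << 8 | ('!')).

def isc_magic(a: str, b: str, c: str, d: str) -> int:
    """Pack four single characters into one 32-bit tag, first character highest."""
    return ord(a) << 24 | ord(b) << 16 | ord(c) << 8 | ord(d)


def magic_to_tag(value: int) -> str:
    """Unpack a 32-bit tag back into its four characters."""
    return "".join(chr((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


# Hypothetical name for the value the REQUIRE() compares against.
QUERY_MAGIC = isc_magic("Q", "!", "!", "!")

if __name__ == "__main__":
    print(hex(QUERY_MAGIC))           # 0x51212121
    print(magic_to_tag(QUERY_MAGIC))  # Q!!!
```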
Re: Bind9 stopped after 34 days of uptime
Hi list,

Here I am again, and a happy New Year to all :)

----- Original message -----
On Wed, 25 Dec 2002 20:54:02 +0100 (CET) Richard [EMAIL PROTECTED] wrote in message [EMAIL PROTECTED]:
> On Wed, 25 Dec 2002, J.Reilink wrote:
> > I've had exactly the same on our corporate primary nameserver (Slackware with bind 9.2.1). Because there was no logging, I couldn't find out why bind stopped working.
>
> Take a look at memory usage when Bind stops working, and monitor for some time how much memory Bind is using. If that amount is growing, Bind probably has a memory leak (it wouldn't be the first time :( ).

The .pid file stays small[1] and so does the memory usage[2].

# uptime
11:43pm up 179 days, 52 min, 1 user, load average: 0.03, 0.07, 0.0

The first crash was 8 days ago, so it first crashed after an uptime of approximately 171 days. In these last 8 days, Bind has crashed 3 times without any apparent reason.

After the last crash yesterday I remembered CA-2002-19 (http://www.cert.org/advisories/CA-2002-19.html) and thought this might be the problem. Since there was no logging[3] I'm still not sure, but I've reconfigured syslog.conf to also log kernel messages, which it didn't at first. Our Bind version is not vulnerable (Bind 9.2.1), but perhaps our libc version (Bind is dynamically linked) is. It is, as far as I can tell, version 2.2.5[4], and I did see some strange messages in dmesg about "UDP: bad checksum". In fact, this is the reason why I turned on the kernel messages and installed tcpdump :)

Perhaps it's a DoS? To be honest, I'm waiting until the next crash to be sure (to see what the logs are telling me).

Since we don't use Debian :( it's rather off-topic on this list, but perhaps it's interesting enough? :)

[1] /var/run/named# ls -la
    -rw-r--r-- 1 root root 5 Jun 6 2002 named.pid
[2] According to ps aux and top, its memory usage stays around 6.2%, and the process shows up 5 times.
[3] There are some known issues with our secondary nameserver; it generates a lot of error messages in /var/log/*. That's why logging was turned off as much as possible.
[4] /usr/lib# ls -la | grep libc
    -rw-r--r-- 1 root root 2347326 May 28 2001 libstdc++-3-libc6.2-2-2.10.0.a
    -r-xr-xr-x 1 root root  274724 May 28 2001 libstdc++-3-libc6.2-2-2.10.0.so*
    lrwxrwxrwx 1 root root      30 Jun  1 2002 libstdc++-libc6.2-2.a.3 -> libstdc++-3-libc6.2-2-2.10.0.a
    lrwxrwxrwx 1 root root      31 Jun  1 2002 libstdc++-libc6.2-2.so.3 -> libstdc++-3-libc6.2-2-2.10.0.so
    Guess these are the files I'm looking for...

Regards,
Jan
-- 
 /\  ASCII Ribbon Campaign
 \ / No HTML in mail or news!
  X
 / \ DSINet: http://www.dsinet.org
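Note that the files listed in [4] appear to belong to libstdc++, the C++ runtime, rather than to glibc itself, so they probably don't settle the libc question. One way to get the C library version on a glibc system is to ask it directly; the Python sketch below does that via os.confstr("CS_GNU_LIBC_VERSION") and falls back to platform.libc_ver(). It reports the libc that Python is linked against, which is assumed here to be the same system glibc a dynamically linked named uses; "ldd /usr/sbin/named" shows which library named actually loads.

```python
import os
import platform
from typing import Tuple


def libc_version() -> Tuple[str, str]:
    """Return (implementation, version) for the C library this Python uses."""
    try:
        # On glibc this returns a string such as "glibc 2.2.5".
        value = os.confstr("CS_GNU_LIBC_VERSION")
    except (AttributeError, ValueError, OSError):
        # AttributeError: no os.confstr on this platform;
        # ValueError: the configuration name is unknown to this system.
        value = None
    if value:
        name, _, version = value.partition(" ")
        return name, version
    # Fallback: let the platform module inspect the interpreter binary.
    return platform.libc_ver()


if __name__ == "__main__":
    name, version = libc_version()
    print(name or "unknown libc", version or "unknown version")
```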
Re: Bind9 stopped after 34 days of uptime
On Wednesday, 25 December 2002 21:54, Richard wrote:
> On Wed, 25 Dec 2002, J.Reilink wrote:
> > I've had exactly the same on our corporate primary nameserver (Slackware with bind 9.2.1). Because there was no logging, I couldn't find out why bind stopped working.
>
> Take a look at memory usage when Bind stops working, and monitor for some time how much memory Bind is using. If that amount is growing, Bind probably has a memory leak (it wouldn't be the first time :( ).

I've made the mistake of running bind with debugging (to find one bug), and had bind create a 2GB /var/named/named.run file. Bind crashed because that file was too big. Doh!

If your Bind crashes regularly after X days, see if it's creating its own (non-syslog) log file. The effects are similar to a memory leak.

> Greetings, Richard.
>
> Paul Vixie in an interview with Sendmail.net: Now that the Internet has the full spectrum of humanity as users, the technology is showing its weakness: it was designed to be used by friendly, smart people. Spammers, as an example of a class, are neither friendly nor smart.

-- 
Berend De Schouwer
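A quick way to act on Berend's suggestion is to look for unexpectedly large files in the directory named writes to. The sketch below lists regular files above a size threshold; the 1 GiB default and the /var/named path are assumptions to adjust for your system. The threshold is kept well below 2 GiB because, on Linux systems of this era without large-file support, a write past 2**31 - 1 bytes typically fails with EFBIG and raises SIGXFSZ, whose default action terminates the process, which would be consistent with the crash Berend describes.

```python
import os

# Assumption: warn at 1 GiB, comfortably below the 2 GiB (2**31 - 1 byte)
# ceiling for files written without large-file support.
DEFAULT_LIMIT = 1 << 30


def oversized_files(directory: str, limit: int = DEFAULT_LIMIT) -> list:
    """Return (path, size) pairs for regular files larger than `limit`, biggest first."""
    results = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                size = entry.stat(follow_symlinks=False).st_size
                if size > limit:
                    results.append((entry.path, size))
    return sorted(results, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    log_dir = "/var/named"  # where Berend's named.run ended up; adjust as needed
    if os.path.isdir(log_dir):
        for path, size in oversized_files(log_dir):
            print(f"{path}: {size / 2**20:.1f} MiB")
```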
Re: Bind9 stopped after 34 days of uptime
On Mon 30 Dec 2002 08:16, Berend De Schouwer wrote:
> I've made the mistake of running bind with debugging (to find one bug), and had bind create a 2GB /var/named/named.run file. Bind crashed because that file was too big. Doh!
>
> If your Bind crashes regularly after X days, see if it's creating its own (non-syslog) log file. The effects are similar to a memory leak.

It's not, but thanks anyway! Since I restarted the service on the night of the 24th, it's been running normally. Maybe I'll wait another 34 days (until January 28th or so) and see what happens.

Thank you!

Pope
-- 
Luis Gomez Miralles
InfoEmergencias - Technical Department
Phone (+34) 654 24 01 34
Fax (+34) 963 49 31 80
[EMAIL PROTECTED]
PGP Public Key available at http://www.infoemergencias.com/lgomez.asc
Re: Bind9 stopped after 34 days of uptime
As you can all see, my bind9 has been up for quite a while:

 2664 ?  S    0:00 /usr/sbin/named -u named
 2665 ?  S    0:16  \_ /usr/sbin/named -u named
 2666 ?  S  137:29  \_ /usr/sbin/named -u named
 2667 ?  S    1:04  \_ /usr/sbin/named -u named
 2668 ?  S   14:17  \_ /usr/sbin/named -u named

This is on a Pentium 100 MHz with around 32 MB of RAM. The box itself has been up 134 days. This is the primary internet server for zionlth.org. Traffic to this domain is modest...

-- 
Phil
PGP/GPG Key: http://www.zionlth.org/~plhofmei/
wget -O - http://www.zionlth.org/~plhofmei/key.txt | gpg --import
-- 
Excuse #189: You did wha... oh _dear_
Re: Bind9 stopped after 34 days of uptime
On Thu, Dec 26, 2002 at 09:16:12AM -0500, Phillip Hofmeister wrote:
> This is on a Pentium 100 MHz with around 32 MB of RAM. The box itself has been up 134 days. This is the primary internet server for zionlth.org. Traffic to this domain is modest...

I have a feeling that it's possible to misconfigure bind9 in such a way that it fails periodically. I had it running on a 200 MHz box with 32 MB RAM, and it failed occasionally, with no indication as to why. However, I've since reworked named.conf, and have not experienced an unexpected failure in the past 6 months.

The original named.conf was written for bind 8, and I just kept it when I upgraded to bind9 (except for the logging configuration, which changed significantly). It was when I ditched the old named.conf and rewrote it for bind9, including a more refined logging configuration, that stability greatly improved.

Of course, for much of the time that bind9 was crashing, the box was running versions older than the one actually released with woody, since it was running woody before woody was released.

noah

-- 
| Web: http://web.morgul.net/~frodo/
| PGP Public Key: http://web.morgul.net/~frodo/mail.html
Re: Bind9 stopped after 34 days of uptime
Hey All,

I have a machine running a 2.4.20 kernel on Debian 2.2r5 and bind 9.2.1, with an uptime of 43 days. I haven't had any issues with bind in that time, but will pop in a note if anything crops up.

I have another machine running Debian 2.2r5 with a 2.2.19 kernel and bind 8 (not sure of the sub-revision), which was crashing every 7 or so hours. Crontab to the rescue on that one, and 6-hourly updates ;)

Regards,
William
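Instead of restarting the daemon blindly every few hours, a cron job could restart it only when it has actually died. The sketch below reads the PID from named's pid file and probes it with signal 0, which checks that the process exists without sending it anything. The pid file path is an assumption taken from Jan's listing earlier in the thread (your distribution may use another one), and the restart itself is left as a placeholder. PIDs can be reused, so after a crash this check can give a false positive; also matching the process name would be more robust.

```python
import errno
import os
from typing import Optional

# Assumed location (from Jan's listing); adjust for your distribution.
PID_FILE = "/var/run/named/named.pid"


def read_pid(path: str) -> Optional[int]:
    """Return the PID stored in `path`, or None if it is missing or unreadable."""
    try:
        with open(path) as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        return None


def pid_alive(pid: int) -> bool:
    """Probe `pid` with signal 0: existence and permission checks only, no signal sent."""
    try:
        os.kill(pid, 0)
    except OSError as exc:
        # EPERM: the process exists but belongs to another user (e.g. "named").
        return exc.errno == errno.EPERM
    return True


def named_running(path: str = PID_FILE) -> bool:
    """True if the pid file names a live process; guards against pid <= 0."""
    pid = read_pid(path)
    return pid is not None and pid > 0 and pid_alive(pid)


if __name__ == "__main__":
    if not named_running():
        print("named is not running; restart it here (e.g. via its init script)")
```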
Re: Bind9 stopped after 34 days of uptime
On Thu 26 Dec 2002 13:26, William wrote:
> Hey All,
>
> I have a machine running a 2.4.20 kernel on Debian 2.2r5 and bind 9.2.1, with an uptime of 43 days. I haven't had any issues with bind in that time, but will pop in a note if anything crops up.

Okay. How much RAM do you have? I have 256 MB here (and 512 MB of swap).

> I have another machine running Debian 2.2r5 with a 2.2.19 kernel and bind 8 (not sure of the sub-revision), which was crashing every 7 or so hours. Crontab to the rescue on that one, and 6-hourly updates ;)

Regards

Pope
-- 
Luis Gomez Miralles
InfoEmergencias - Technical Department
Phone (+34) 654 24 01 34
Fax (+34) 963 49 31 80
[EMAIL PROTECTED]
PGP Public Key available at http://www.infoemergencias.com/lgomez.asc
Bind9 stopped after 34 days of uptime
Hi all

I've been running my company's server with Linux on the same computer for about six months. Tonight, when I arrived home (my company is in my house) at about 6 a.m., I noticed I could not browse any website, and that the DNS server (bind 9) was stopped. It was up when I left at 15.30h. I restarted the service and everything is OK now.

We currently have an uptime of 34 days, and this had never happened before. The computer is running Woody, upgraded every night (via cron.daily).

I've been looking at my logs, trying to determine the exact time the DNS server failed, but I cannot figure it out.

We never became unreachable from the Internet, as we use a secondary name server provided by our domain registrar (gandi.net), so as far as I can tell our name never stopped being resolvable from the outside (which, I think, explains why I kept receiving mail).

Well, if anyone has ever had a problem like this and can lend me a hand or give me some advice, I'll be very happy to hear from you :-)

TIA

Pope
-- 
Luis Gomez Miralles
InfoEmergencias - Technical Department
Phone (+34) 654 24 01 34
Fax (+34) 963 49 31 80
[EMAIL PROTECTED]
PGP Public Key available at http://www.infoemergencias.com/lgomez.asc
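To narrow down when named stopped, it can help to pull out the last few messages named wrote to syslog: the final entry before silence brackets the failure time. The sketch below assumes the classic syslog line format seen in Adam's excerpt in this thread (timestamp, host, program[pid]: message) and takes the log file path as an argument, for example /var/log/syslog or /var/log/daemon.log; both the format and the path are assumptions about your setup.

```python
import re
import sys

# Classic syslog line, e.g.
# "Oct 31 06:29:14 polaris named[2450]: exiting (due to assertion failure)"
SYSLOG_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) "
    r"(?P<prog>[\w./-]+)(?:\[(?P<pid>\d+)\])?: (?P<msg>.*)$"
)


def last_messages(lines, program="named", count=5):
    """Return the last `count` (timestamp, message) pairs logged by `program`."""
    hits = []
    for line in lines:
        match = SYSLOG_RE.match(line.rstrip("\n"))
        if match and match.group("prog") == program:
            hits.append((match.group("stamp"), match.group("msg")))
    return hits[-count:] if count > 0 else []


if __name__ == "__main__":
    # Usage: python last_named.py /var/log/syslog
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as fh:
            for stamp, message in last_messages(fh):
                print(stamp, message)
```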
RE: Bind9 stopped after 34 days of uptime
Hello,

I had the same problem recently. I had deleted the unneeded users and groups from the group and passwd files, but for some reason the utmp group is needed by logrotate (why? :)). Because of this, logrotate didn't run daily and the log files kept getting bigger.

Some days ago I received an alert from netsaint that my shiny new bind9 had stopped running. I started to investigate and was surprised to find that I had only 2 MB of free RAM :/ I figured out that because some of the log files were so big (around 100 MB), syslog-ng had eaten up all of my 256 MB of RAM. So I compressed those log files, restarted syslog-ng and bam, I have 181 MB of free RAM.

So you should check this as well :)

Best Regards,
Domonkos Czinke

-----Original Message-----
From: InfoEmergencias - Luis Gomez [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, December 25, 2002 3:03 PM
To: Debian security
Subject: Bind9 stopped after 34 days of uptime

[snip]
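Domonkos's failure mode can be checked for ahead of time: logrotate reports an error for a create directive naming a group that no longer exists, and, as he found, that can stop rotation altogether. The sketch below scans logrotate configuration text for three-argument create directives (create MODE OWNER GROUP) and reports groups missing from the group database. Debian's stock logrotate.conf appears to rotate /var/log/wtmp with such a directive naming the utmp group, which is likely what bit him. Only that form of the directive is recognized, and the file locations in the main block are the conventional ones, assumed rather than guaranteed.

```python
import glob
import grp
import re

# Three-argument form only: "create MODE OWNER GROUP", e.g. "create 0664 root utmp".
CREATE_RE = re.compile(
    r"^[ \t]*create[ \t]+\d+[ \t]+\S+[ \t]+(\S+)[ \t]*$", re.MULTILINE
)


def groups_in_config(text: str) -> list:
    """Return group names named by create directives, in file order."""
    return CREATE_RE.findall(text)


def missing_groups(names) -> list:
    """Return the names that are not present in the group database."""
    missing = []
    for name in names:
        try:
            grp.getgrnam(name)
        except KeyError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    for path in ["/etc/logrotate.conf", *sorted(glob.glob("/etc/logrotate.d/*"))]:
        try:
            with open(path, errors="replace") as fh:
                groups = groups_in_config(fh.read())
        except OSError:
            continue  # missing or unreadable file: skip it
        for name in missing_groups(groups):
            print(f"{path}: group {name!r} does not exist")
```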
Re: Bind9 stopped after 34 days of uptime
----- Original message -----
On Wed, 25 Dec 2002 15:03:19 +0100 InfoEmergencias - Luis Gomez [EMAIL PROTECTED] wrote in message [EMAIL PROTECTED]:
> Hi all
>
> I've been running my company's server with Linux on the same computer for about six months. Tonight, when I arrived home (my company is in my house) at about 6 a.m., I noticed I could not browse any website, and that the DNS server (bind 9) was stopped. It was up when I left at 15.30h. I restarted the service and everything is OK now.

I've had exactly the same on our corporate primary nameserver (Slackware with bind 9.2.1). Because there was no logging, I couldn't find out why bind stopped working.

Paranoid: perhaps there is something about bind 9 we don't know (yet...)? :-) No, I honestly don't know yet.

> Well, if anyone has ever had a problem like this and can lend me a hand or give me some advice, I'll be very happy to hear from you :-)

Can't help you on this one right now, but I will certainly watch this thread closely :)

Merry Christmas everyone.

Regards,
Jan
-- 
 /\  ASCII Ribbon Campaign
 \ / No HTML in mail or news!
  X
 / \ DSINet: http://www.dsinet.org
Re: Bind9 stopped after 34 days of uptime
On Wed, 25 Dec 2002, J.Reilink wrote:
> I've had exactly the same on our corporate primary nameserver (Slackware with bind 9.2.1). Because there was no logging, I couldn't find out why bind stopped working.

Take a look at memory usage when Bind stops working, and monitor for some time how much memory Bind is using. If that amount is growing, Bind probably has a memory leak (it wouldn't be the first time :( ).

Greetings, Richard.

Paul Vixie in an interview with Sendmail.net: Now that the Internet has the full spectrum of humanity as users, the technology is showing its weakness: it was designed to be used by friendly, smart people. Spammers, as an example of a class, are neither friendly nor smart.
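Richard's suggestion can be automated on Linux by sampling the resident set size that the kernel reports in /proc/<pid>/status. The sketch below parses the VmRSS line and flags a series of samples that never shrinks and grows by at least a chosen amount; the 1024 kB threshold and the sampling schedule are arbitrary assumptions. On the LinuxThreads-based systems discussed in this thread, the several named entries shown by ps are most likely threads sharing one address space, so sampling a single PID should be enough.

```python
import re
import time
from typing import List, Optional

VMRSS_RE = re.compile(r"^VmRSS:\s+(\d+)\s+kB", re.MULTILINE)


def rss_kb(status_text: str) -> Optional[int]:
    """Extract the resident set size (kB) from the text of /proc/<pid>/status."""
    match = VMRSS_RE.search(status_text)
    return int(match.group(1)) if match else None


def steadily_growing(samples: List[int], min_increase_kb: int = 1024) -> bool:
    """True if the samples never decrease and rise by at least min_increase_kb overall."""
    if len(samples) < 2:
        return False
    never_shrinks = all(later >= earlier for earlier, later in zip(samples, samples[1:]))
    return never_shrinks and samples[-1] - samples[0] >= min_increase_kb


def sample_rss(pid: int, count: int, interval_s: float) -> List[int]:
    """Read VmRSS for `pid` `count` times, `interval_s` seconds apart (Linux only)."""
    samples = []
    for i in range(count):
        with open(f"/proc/{pid}/status") as fh:
            value = rss_kb(fh.read())
        if value is not None:
            samples.append(value)
        if i < count - 1:
            time.sleep(interval_s)
    return samples
```

For example, steadily_growing(sample_rss(pid, 12, 3600)) would take hourly samples for half a day before judging, where pid is whichever named PID ps reports on your box.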