Using 4.3-RELEASE's libc on 5.0 causes hard lockups
We had a system running 4.3-RELEASE, which I upgraded to 5.0-RELEASE using the sysinstall upgrade mechanism. I installed compat4x so we could keep using our existing 4.x binaries.

Immediately after rebooting, I noticed that most old 4.x binaries were complaining about _stdoutp being an undefined symbol. The scary part, however, was that when I started apache/mod_php4, the server crashed (hard lockup) within 10 seconds under load. This was easily reproducible: at least a dozen times while trying to debug this, I started httpd and the server locked up within 10 seconds.

I recompiled all of apache, mod_php4 and all of its libraries, started up httpd, and had no problems with that. Things were fine that night until an analog cron job ran. Every time THAT ran, I also got a hard lockup of the server, OR between 100 and 500 of my httpd processes would suddenly SEGV.

After a little more poking around, I saw in /usr/lib:

lrwxr-xr-x  1 root  wheel       9 Feb  1 00:18 libc.so -> libc.so.5
lrwxr-xr-x  1 root  wheel      16 Jul  5  2002 libc.so.3 -> /usr/lib/libc.so
-r--r--r--  1 root  wheel  571480 Aug  5 13:45 libc.so.4
-r--r--r--  1 root  wheel  836892 Feb  1 00:18 libc.so.5

Shouldn't libc.so.4 have been a symlink to libc.so after a compat4x install? In any case, doing that myself seemed to fix everything.

My questions:

1) Shouldn't something along the way of doing a sysinstall upgrade or installing compat4x have fixed /usr/lib/libc.so.4 into a symlink? (That is the correct situation, right?)

2) Is it possible that some kernel interface has changed, and something isn't being validated on the kernel side? Non-root userland applications being able to lock up the server and/or affect other processes simply by using a different libc would seem to indicate this.

I know this is a pretty vague bug report, but this is a production server, so I wasn't able to play around too much with it. I do have a backup of the entire server before it was upgraded to 5.0 if you'd like me to check anything there.
I did compile with INVARIANTS and WITNESS and got no debugging output when things did lock up. The keyboard and serial console were totally dead when this happened, so DDB isn't an option either.

(I originally emailed security-officer about this because of the possibility of a security issue; they told me to forward it here.)

To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: Using 4.3-RELEASE's libc on 5.0 causes hard lockups
On Sun, Feb 02, 2003 at 11:41:32AM -0600, Kevin Day wrote:
> lrwxr-xr-x  1 root  wheel       9 Feb  1 00:18 libc.so -> libc.so.5
> lrwxr-xr-x  1 root  wheel      16 Jul  5  2002 libc.so.3 -> /usr/lib/libc.so

^ This is seriously messed up. See below.

> -r--r--r--  1 root  wheel  571480 Aug  5 13:45 libc.so.4
> -r--r--r--  1 root  wheel  836892 Feb  1 00:18 libc.so.5
>
> Shouldn't libc.so.4 have been a symlink to libc.so after a compat4x install? In any case, doing that myself seemed to fix everything.

No, this would cause you major problems. Binaries that expected the libc.so.4 interface would be calling into libc.so.5, and probably causing very strange behaviour.

Cheers,
-- 
Jacques A. Vidrine [EMAIL PROTECTED] http://www.celabo.org/
NTT/Verio SME . FreeBSD UNIX . Heimdal Kerberos
[EMAIL PROTECTED] . [EMAIL PROTECTED] . [EMAIL PROTECTED]
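The situation described in this message, where a versioned name such as libc.so.3 ends up resolving to a different version of the library, can be detected mechanically. The Python sketch below is an illustration only and not a tool from the thread: the names mismatched_links and build_example are hypothetical, and it assumes that a versioned library name should always resolve to a file with that same name.

```python
import os
import re
import tempfile

# Matches versioned shared-library names such as "libc.so.4".
VERSIONED = re.compile(r"^lib.+\.so\.\d+$")


def mismatched_links(libdir):
    """Return (name, target) pairs for versioned library names that are
    symlinks resolving to a file with a different name."""
    problems = []
    for name in sorted(os.listdir(libdir)):
        path = os.path.join(libdir, name)
        if not VERSIONED.match(name) or not os.path.islink(path):
            continue
        # realpath follows the whole chain, e.g. libc.so.3 -> libc.so -> libc.so.5
        target = os.path.basename(os.path.realpath(path))
        if target != name:
            problems.append((name, target))
    return problems


def build_example(root):
    """Recreate the /usr/lib layout from the listing inside a scratch directory."""
    for real in ("libc.so.4", "libc.so.5"):
        with open(os.path.join(root, real), "w") as handle:
            handle.write("placeholder\n")
    os.symlink("libc.so.5", os.path.join(root, "libc.so"))
    os.symlink(os.path.join(root, "libc.so"), os.path.join(root, "libc.so.3"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        build_example(root)
        for name, target in mismatched_links(root):
            print(f"{name} resolves to {target}")
```

Run against the layout from the listing, the check reports libc.so.3, which resolves to libc.so.5. It would also flag a hand-made libc.so.4 symlink pointing at libc.so.5.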
Re: Using 4.3-RELEASE's libc on 5.0 causes hard lockups
At 11:42 AM 2/2/2003, Jacques A. Vidrine wrote:
> On Sun, Feb 02, 2003 at 11:41:32AM -0600, Kevin Day wrote:
> > lrwxr-xr-x  1 root  wheel       9 Feb  1 00:18 libc.so -> libc.so.5
> > lrwxr-xr-x  1 root  wheel      16 Jul  5  2002 libc.so.3 -> /usr/lib/libc.so
>
> ^ This is seriously messed up. See below.
>
> > -r--r--r--  1 root  wheel  571480 Aug  5 13:45 libc.so.4
> > -r--r--r--  1 root  wheel  836892 Feb  1 00:18 libc.so.5
> >
> > Shouldn't libc.so.4 have been a symlink to libc.so after a compat4x install? In any case, doing that myself seemed to fix everything.
>
> No, this would cause you major problems. Binaries that expected the libc.so.4 interface would be calling into libc.so.5, and probably causing very strange behaviour.

Ok, I admit, no matter how it happened, an application using the wrong libc is a bad thing. But how are things supposed to work?

Apps that were using the old libc.so.4 complained about unresolved symbols (usually _stdoutp). If I removed /usr/lib/libc.so.4, they complained that they couldn't find libc. If I linked libc.so.4 to libc.so.5, everything appeared to work just fine, but I know that's probably a fluke.

In any case, a system lockup or being able to crash other users' processes just by having the wrong libc shouldn't be possible no matter what happens.
Re: Using 4.3-RELEASE's libc on 5.0 causes hard lockups
On Sun, Feb 02, 2003 at 11:53:22AM -0600, Kevin Day wrote:
> Ok, I admit, no matter how it happened, an application using the wrong libc is a bad thing. But, how are things supposed to work?

Apps that need the old libc.so.4 will find it in /usr/lib/compat/libc.so.4 (or /usr/lib/libc.so.4 if you didn't remove it, for that matter).

[...]

> In any case, a system lockup or being able to crash other user's processes just by having the wrong libc shouldn't be possible no matter what happens.

Probably not, although if you have processes running as root and using the `wrong' libc, all bets are off.

Cheers,
-- 
Jacques A. Vidrine [EMAIL PROTECTED] http://www.celabo.org/
NTT/Verio SME . FreeBSD UNIX . Heimdal Kerberos
[EMAIL PROTECTED] . [EMAIL PROTECTED] . [EMAIL PROTECTED]
Re: Using 4.3-RELEASE's libc on 5.0 causes hard lockups
On Sun, Feb 02, 2003 at 11:41:32AM -0600, Kevin Day wrote:
> lrwxr-xr-x  1 root  wheel       9 Feb  1 00:18 libc.so -> libc.so.5
> lrwxr-xr-x  1 root  wheel      16 Jul  5  2002 libc.so.3 -> /usr/lib/libc.so

Delete this.

> -r--r--r--  1 root  wheel  571480 Aug  5 13:45 libc.so.4

Delete this.

> -r--r--r--  1 root  wheel  836892 Feb  1 00:18 libc.so.5
>
> Shouldn't libc.so.4 have been a symlink to libc.so after a compat4x install? In any case, doing that myself seemed to fix everything.

compat4x installs the libraries in /usr/lib/compat.

kargl[202] ldd /usr/local/lib/NAGWare/f95
/usr/local/lib/NAGWare/f95:
        libm.so.2 => /usr/lib/libm.so.2 (0x28075000)
        libc.so.4 => /usr/lib/compat/libc.so.4 (0x28092000)

What does ldd report for the binaries that die?

> My questions:
>
> 1) Shouldn't something along the way of doing a sysinstall upgrade or installing compat4x have fixed /usr/lib/libc.so.4 into a symlink? (That is the correct situation, right?)

No. The reason for the version number bump from 4 to 5 is that an ABI/API has changed. In this case, _stdinp, _stdoutp, and _stderrp have changed.

> I know this is a pretty vague bug report, but this is a production server, so I wasn't able to play around too much with it. I do have a backup of the entire server before it was upgraded to 5.0 if you'd like me to check anything there.

5.0 isn't recommended for production servers.

-- 
Steve
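To answer the ldd question for many binaries at once, the output can be parsed rather than read by eye. The Python sketch below is a rough illustration, not something from the thread: parse_ldd and outside_compat are hypothetical names, and the parser assumes ldd output shaped like the sample in this message ("name => path (address)").

```python
def parse_ldd(output):
    """Map each library name in ldd output to the path it resolved to."""
    resolved = {}
    for line in output.splitlines():
        if "=>" not in line:
            continue  # skip the "binary:" header line and blank lines
        name, _, rest = line.partition("=>")
        fields = rest.split()
        if fields:
            resolved[name.strip()] = fields[0]
    return resolved


def outside_compat(resolved, soname="libc.so.4", compat_dir="/usr/lib/compat"):
    """True if soname was resolved, but from somewhere other than compat_dir."""
    path = resolved.get(soname)
    return path is not None and not path.startswith(compat_dir + "/")


SAMPLE = """/usr/local/lib/NAGWare/f95:
\tlibm.so.2 => /usr/lib/libm.so.2 (0x28075000)
\tlibc.so.4 => /usr/lib/compat/libc.so.4 (0x28092000)
"""

if __name__ == "__main__":
    libs = parse_ldd(SAMPLE)
    print(libs["libc.so.4"], outside_compat(libs))
```

The check looks only at one library name because, as the sample shows, other libraries such as libm.so.2 can legitimately resolve from /usr/lib. A binary whose libc.so.4 resolves to /usr/lib/libc.so.4 would be reported as outside the compat directory.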
Re: Using 4.3-RELEASE's libc on 5.0 causes hard lockups
At 11:54 AM 2/2/2003, Jacques A. Vidrine wrote:
> > Ok, I admit, no matter how it happened, an application using the wrong libc is a bad thing. But, how are things supposed to work?
>
> Apps that need the old libc.so.4 will find it in /usr/lib/compat/libc.so.4 (or /usr/lib/libc.so.4 if you didn't remove it, for that matter).

Well, things were definitely picking /usr/lib/libc.so.4 over anything in compat. Should sysinstall have nuked my /usr/lib/libc if it was putting the correct one in compat?

> > In any case, a system lockup or being able to crash other user's processes just by having the wrong libc shouldn't be possible no matter what happens.
>
> Probably not, although if you have processes running as root and using the `wrong' libc, all bets are off.

Well, after I recompiled httpd (which did have a single process owned by root) and rebooted, nothing at all owned by root touched anything that was compiled under 4.x. The analog process that caused the same behavior was owned by a non-privileged regular user. Running analog under my normal account could make processes owned by nobody die with segfaults.