Using 4.3-RELEASE's libc on 5.0 causes hard lockups

2003-02-02 Thread Kevin Day

We had a system running 4.3-RELEASE that I upgraded to 5.0-RELEASE using 
the sysinstall upgrade mechanism. I installed compat4x to use our 
existing 4.x binaries.

Immediately after rebooting, I noticed most old 4.x binaries were 
complaining about _stdoutp being an undefined symbol. The scary part, 
however, was that when I started apache/mod_php4 the server crashed (hard 
lockup) within 10 seconds under load. This was easily reproducible: at 
least a dozen times while debugging, I started httpd and the server 
locked up within 10 seconds.

I recompiled all of apache, mod_php4 and all of its libraries, started up 
httpd and had no problems with that.

Things were fine that night until an analog cron job ran. Every time THAT 
ran, I also got a hard lockup of the server, OR between 100 and 500 of my 
httpd processes would suddenly SEGV.

After a little more poking around, I saw in /usr/lib:

lrwxr-xr-x 1 root wheel 9 Feb 1 00:18 libc.so -> libc.so.5
lrwxr-xr-x 1 root wheel 16 Jul 5 2002 libc.so.3 -> /usr/lib/libc.so
-r--r--r-- 1 root wheel 571480 Aug 5 13:45 libc.so.4
-r--r--r-- 1 root wheel 836892 Feb 1 00:18 libc.so.5


Shouldn't libc.so.4 have been a symlink to libc.so after a compat4x 
install? In any case, doing that myself seemed to fix everything.

My questions:

1) Shouldn't something along the way of doing a sysinstall upgrade or 
installing compat4x have fixed /usr/lib/libc.so.4 into a symlink? (That is 
the correct situation, right?)

2) Is it possible that some kernel interface has changed, and something 
isn't being validated on the kernel side? Non-root userland applications 
being able to lock up the server and/or affect other processes simply by 
using a different libc would seem to indicate this.


I know this is a pretty vague bug report,  but this is a production server, 
so I wasn't able to play around too much with it. I do have a backup of the 
entire server before it was upgraded to 5.0 if you'd like me to check 
anything there. I did compile with INVARIANTS and WITNESS and got no 
debugging output when things did lock up. The keyboard and serial console 
were totally dead when this happened, so DDB isn't an option either.


(I originally emailed security-officer about this because of the possible 
security issue, and was told to forward it here.)




To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message


Re: Using 4.3-RELEASE's libc on 5.0 causes hard lockups

2003-02-02 Thread Jacques A. Vidrine
On Sun, Feb 02, 2003 at 11:41:32AM -0600, Kevin Day wrote:
> lrwxr-xr-x 1 root wheel 9 Feb 1 00:18 libc.so -> libc.so.5
> lrwxr-xr-x 1 root wheel 16 Jul 5 2002 libc.so.3 -> /usr/lib/libc.so
                                                     ^
This is seriously messed up.  See below.

> -r--r--r-- 1 root wheel 571480 Aug 5 13:45 libc.so.4
> -r--r--r-- 1 root wheel 836892 Feb 1 00:18 libc.so.5
>
> Shouldn't libc.so.4 have been a symlink to libc.so after a compat4x
> install? In any case, doing that myself seemed to fix everything.

No, this would cause you major problems.  Binaries that expected the
libc.so.4 interface would be calling into libc.so.5, and probably
causing very strange behaviour.
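
A quick way to spot this kind of mismatch is to look for versioned library names that are symlinks resolving to a different major version. The following is only a minimal sketch: the function name `cross_version_links` and the naming pattern it matches are illustrative assumptions, not part of any FreeBSD tool.

```python
import os
import re

# Matches names such as "libc.so.4": base "libc.so" plus a major version number.
VERSIONED = re.compile(r"^(?P<base>lib.+\.so)\.(?P<major>\d+)$")


def cross_version_links(libdir):
    """Return (link name, resolved target name) pairs for symlinks like
    libc.so.4 whose final target is the same library under a different
    major version."""
    problems = []
    for name in sorted(os.listdir(libdir)):
        path = os.path.join(libdir, name)
        match = VERSIONED.match(name)
        if not match or not os.path.islink(path):
            continue
        # realpath follows the whole chain, e.g. libc.so.3 -> libc.so -> libc.so.5
        target = os.path.basename(os.path.realpath(path))
        target_match = VERSIONED.match(target)
        if (target_match
                and target_match.group("base") == match.group("base")
                and target_match.group("major") != match.group("major")):
            problems.append((name, target))
    return problems


if __name__ == "__main__":
    for link, target in cross_version_links("/usr/lib"):
        print(f"{link} resolves to {target}: major versions differ")
```

Regular files such as a real libc.so.4 are ignored; only symlinks whose name and final target disagree on the major version are reported.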

Cheers,
-- 
Jacques A. Vidrine [EMAIL PROTECTED]  http://www.celabo.org/
NTT/Verio SME  . FreeBSD UNIX .   Heimdal Kerberos




Re: Using 4.3-RELEASE's libc on 5.0 causes hard lockups

2003-02-02 Thread Kevin Day
At 11:42 AM 2/2/2003, Jacques A. Vidrine wrote:

> On Sun, Feb 02, 2003 at 11:41:32AM -0600, Kevin Day wrote:
> > lrwxr-xr-x 1 root wheel 9 Feb 1 00:18 libc.so -> libc.so.5
> > lrwxr-xr-x 1 root wheel 16 Jul 5 2002 libc.so.3 -> /usr/lib/libc.so
>                                                      ^
> This is seriously messed up.  See below.
>
> > -r--r--r-- 1 root wheel 571480 Aug 5 13:45 libc.so.4
> > -r--r--r-- 1 root wheel 836892 Feb 1 00:18 libc.so.5
> >
> > Shouldn't libc.so.4 have been a symlink to libc.so after a compat4x
> > install? In any case, doing that myself seemed to fix everything.
>
> No, this would cause you major problems.  Binaries that expected the
> libc.so.4 interface would be calling into libc.so.5, and probably
> causing very strange behaviour.


Ok, I admit, no matter how it happened, an application using the wrong libc 
is a bad thing.

But how are things supposed to work? Apps that were using the old 
libc.so.4 complained about unresolved symbols (usually _stdoutp). If I 
removed /usr/lib/libc.so.4, they complained that they couldn't find libc. 
If I linked libc.so.4 to libc.so.5, everything appeared to work just 
fine, but I know that's probably a fluke.

In any case, a system lockup, or being able to crash other users' processes 
just by having the wrong libc, shouldn't be possible no matter what happens.







Re: Using 4.3-RELEASE's libc on 5.0 causes hard lockups

2003-02-02 Thread Jacques A. Vidrine
On Sun, Feb 02, 2003 at 11:53:22AM -0600, Kevin Day wrote:
> Ok, I admit, no matter how it happened, an application using the wrong libc
> is a bad thing.
>
> But, how are things supposed to work?

Apps that need the old libc.so.4 will find it in
/usr/lib/compat/libc.so.4 (or /usr/lib/libc.so.4 if you didn't remove
it, for that matter).

[...]
> In any case, a system lockup or being able to crash other user's processes
> just by having the wrong libc shouldn't be possible no matter what happens.

Probably not, although if you have processes running as root and using
the `wrong' libc, all bets are off.

Cheers,
-- 
Jacques A. Vidrine [EMAIL PROTECTED]  http://www.celabo.org/
NTT/Verio SME  . FreeBSD UNIX .   Heimdal Kerberos




Re: Using 4.3-RELEASE's libc on 5.0 causes hard lockups

2003-02-02 Thread Steve Kargl
On Sun, Feb 02, 2003 at 11:41:32AM -0600, Kevin Day wrote:
 
 
> lrwxr-xr-x 1 root wheel 9 Feb 1 00:18 libc.so -> libc.so.5
> lrwxr-xr-x 1 root wheel 16 Jul 5 2002 libc.so.3 -> /usr/lib/libc.so

Delete this.

> -r--r--r-- 1 root wheel 571480 Aug 5 13:45 libc.so.4

Delete this.

> -r--r--r-- 1 root wheel 836892 Feb 1 00:18 libc.so.5
>
> Shouldn't libc.so.4 have been a symlink to libc.so after a compat4x
> install? In any case, doing that myself seemed to fix everything.

compat4x installs its libraries in /usr/lib/compat.

kargl[202] ldd /usr/local/lib/NAGWare/f95
/usr/local/lib/NAGWare/f95:
        libm.so.2 => /usr/lib/libm.so.2 (0x28075000)
        libc.so.4 => /usr/lib/compat/libc.so.4 (0x28092000)

What does ldd report for the binaries that die?
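
If many binaries need checking, the ldd output can be scanned mechanically. This is a rough sketch that assumes each dependency line has the usual `name => path (address)` form; the helper names `resolved_libraries` and `outside_compat` are made up for illustration.

```python
import re

# One resolved dependency per line: "<name> => <path> (<load address>)".
LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(\S+)\s+\(0x[0-9a-fA-F]+\)\s*$")


def resolved_libraries(ldd_output):
    """Map each library name in ldd's output to the path it resolved to."""
    resolved = {}
    for line in ldd_output.splitlines():
        match = LDD_LINE.match(line)
        if match:
            resolved[match.group(1)] = match.group(2)
    return resolved


def outside_compat(resolved, name="libc.so.4", compat_dir="/usr/lib/compat"):
    """Return the resolved path of `name` if it lies outside compat_dir,
    otherwise None (also None when `name` is not a dependency)."""
    path = resolved.get(name)
    if path is not None and not path.startswith(compat_dir + "/"):
        return path
    return None


if __name__ == "__main__":
    sample = """/usr/local/lib/NAGWare/f95:
        libm.so.2 => /usr/lib/libm.so.2 (0x28075000)
        libc.so.4 => /usr/lib/compat/libc.so.4 (0x28092000)
"""
    libs = resolved_libraries(sample)
    print(libs["libc.so.4"])
    print(outside_compat(libs))
```

A binary whose libc.so.4 resolves to /usr/lib/libc.so.4 rather than the compat copy would be returned by `outside_compat`, which is the case worth looking at here.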

 
> My questions:
>
> 1) Shouldn't something along the way of doing a sysinstall upgrade or
> installing compat4x have fixed /usr/lib/libc.so.4 into a symlink? (That is
> the correct situation, right?)

No. The reason for the version number bump from 4 to 5 is
that the ABI/API has changed.  In this case, _stdinp, _stdoutp,
and _stderrp have changed.

> I know this is a pretty vague bug report, but this is a production server,
> so I wasn't able to play around too much with it. I do have a backup of the
> entire server before it was upgraded to 5.0 if you'd like me to check
> anything there.

5.0 isn't recommended for production servers.




-- 
Steve




Re: Using 4.3-RELEASE's libc on 5.0 causes hard lockups

2003-02-02 Thread Kevin Day
At 11:54 AM 2/2/2003, Jacques A. Vidrine wrote:

> > Ok, I admit, no matter how it happened, an application using the wrong
> > libc is a bad thing.
> >
> > But, how are things supposed to work?
>
> Apps that need the old libc.so.4 will find it in
> /usr/lib/compat/libc.so.4 (or /usr/lib/libc.so.4 if you didn't remove
> it, for that matter).

Well, things were definitely picking /usr/lib/libc.so.4 over anything in 
compat. Should sysinstall have nuked my /usr/lib/libc if it was putting the 
correct one in compat?
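
One way to picture why a leftover /usr/lib/libc.so.4 could win over the compat copy is a first-match search over an ordered list of directories. The sketch below is only an illustration under two assumptions that the thread does not confirm: that the runtime linker takes the first directory containing the file, and that /usr/lib came before /usr/lib/compat on this machine. The function name `first_match` is hypothetical.

```python
import os


def first_match(name, search_dirs):
    """Return the first existing path for `name`, scanning search_dirs in order."""
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.exists(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # Hypothetical search order: /usr/lib ahead of /usr/lib/compat.
    print(first_match("libc.so.4", ["/usr/lib", "/usr/lib/compat"]))
```

Under those assumptions, a stale copy in the earlier directory shadows the one in compat until it is removed.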

> > In any case, a system lockup or being able to crash other user's processes
> > just by having the wrong libc shouldn't be possible no matter what happens.
>
> Probably not, although if you have processes running as root and using
> the `wrong' libc, all bets are off.


Well, after I recompiled httpd (which did have a single process owned by 
root) and rebooted, nothing at all owned by root touched anything that was 
compiled under 4.x. The analog process was owned by a non-privileged 
regular user, and it caused the same behavior. Running analog under my 
normal account could kill processes owned by nobody with segfaults.






