Re: 4.4-RELEASE corrupting filesystems?

2001-12-17 Thread Kent Stewart



Cy Schubert - ITSD Open Systems Group wrote:

 In message [EMAIL PROTECTED], Vivek Khera writes:
 
n == nsayer  [EMAIL PROTECTED] writes:

n One of my machines has been prone to having one of its filesystems curdle
n for an unknown reason.

n I have only ever had 3 filesystems get curdled on FreeBSD,
n but all three were /usr filesystems on that one disk. The first time
n was a while ago, but now it's happened twice in the last week.

This smells *seriously* of hardware failure.  Could be bad controller,
disk, cable, or even memory.  Only detailed diagnostics can pinpoint
the real culprit (either that or sequentially replacing every part one
at a time.)

 
 Or possibly a termination problem (which is also a h/w problem).


Or the NFS corruption problem that has been discussed on -hackers. Fixes
for it are still being developed in -current.

Kent

-- 
Kent Stewart
Richland, WA

mailto:[EMAIL PROTECTED]
http://users.owt.com/kstewart/index.html
FreeBSD News http://daily.daemonnews.org/


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-stable in the body of the message



Re: Userland not installing

2001-12-17 Thread Sam Drinkard

Something is rather odd here, and I'm at a loss to explain it.  Checking
dates does in fact show a Dec 17th datestamp, but comparing the /usr/bin
executables against /usr/obj/usr/src/usr.bin executables shows they are
different in size, but datestamps are the same, Dec 17.  This whole
thing started with a funny netstat output.  Looking at file sizes, I see
in /usr/bin, netstat is:

-r-xr-sr-x  1 root  kmem 91008  Dec 17 13:06 /usr/bin/netstat

and in /usr/obj/usr/src/usr.bin/netstat:

-rwxr-xr-x  1 root  wheel  99182   Dec 17  12:24  netstat


The /usr/obj copy has not been installed, yet it does give proper output.
A look at the script of the install does not show any problems...

More than baffled now...


Sam





Re: Userland not installing

2001-12-17 Thread Michael Lucas

Your issue is basically this?

pedicular~;ls -lai /usr/obj/usr/src/usr.bin/xargs/
total 26
309805 drwxr-xr-x    2 root  wheel   512 Dec 12 12:24 .
238112 drwxr-xr-x  211 root  wheel  6656 Dec 12 11:19 ..
310095 -rw-r--r--    1 root  wheel   884 Dec 12 11:58 .depend
310504 -rwxr-xr-x    1 root  wheel  8425 Dec 12 12:24 xargs
310505 -rw-r--r--    1 root  wheel  2769 Dec 12 12:24 xargs.1.gz
310503 -rw-r--r--    1 root  wheel  4544 Dec 12 12:24 xargs.o
pedicular~;ls -lai /usr/bin/xargs 
8627 -r-xr-xr-x  1 root  wheel  6324 Dec 12 12:53 /usr/bin/xargs

I *believe* that the install strips the binaries before installing
them.  That would explain the size difference.



-- 
Michael Lucas   [EMAIL PROTECTED], [EMAIL PROTECTED]
my FreeBSD column: http://www.oreillynet.com/pub/q/Big_Scary_Daemons

http://www.blackhelicopters.org/~mwlucas/




Re: Userland not installing

2001-12-17 Thread David Wolfskill

Date: Mon, 17 Dec 2001 13:45:13 -0500
From: Sam Drinkard [EMAIL PROTECTED]

Something is rather odd here, and I'm at a loss to explain it.  Checking
dates does in fact show a Dec 17th datestamp, but comparing the /usr/bin
executables against /usr/obj/usr/src/usr.bin executables shows they are
different in size, but datestamps are the same, Dec 17.  This whole
thing started with a funny netstat output.  Looking at file sizes, I see
in /usr/bin, netstat is:

-r-xr-sr-x  1 root  kmem 91008  Dec 17 13:06 /usr/bin/netstat

and in /usr/obj/usr/src/usr.bin/netstat:

-rwxr-xr-x  1 root  wheel  99182   Dec 17  12:24  netstat


The /usr/obj copy has not been installed, yet it does give proper output.
A look at the script of the install does not show any problems...


FYI:

m133[1] file /usr/bin/netstat /usr/obj/usr/src/usr.bin/netstat/netstat
/usr/bin/netstat: setgid ELF 32-bit LSB executable, Intel 80386, version 1 (FreeBSD), dynamically linked (uses shared libs), stripped
/usr/obj/usr/src/usr.bin/netstat/netstat: ELF 32-bit LSB executable, Intel 80386, version 1 (FreeBSD), dynamically linked (uses shared libs), not stripped
m133[2] ls -l !:*
ls -l /usr/bin/netstat /usr/obj/usr/src/usr.bin/netstat/netstat
-r-xr-sr-x  1 root  kmem   91008 Dec 17 05:58 /usr/bin/netstat
-rwxrwxr-x  1 root  wheel  99182 Dec 17 05:38 /usr/obj/usr/src/usr.bin/netstat/netstat
m133[3]


(I.e., the new one *is* installed; it's stripped, while the one in
/usr/obj is not stripped, thus accounting for the size difference.)
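
For reference, file(1) calls a binary "not stripped" when it still carries a symbol table section (SHT_SYMTAB); stripping removes that section, which accounts for much of the size difference. A minimal sketch of that check in C, assuming <elf.h> is available (as on FreeBSD and glibc systems) and that the image's byte order matches the host. elf_has_symtab() and scan_for_symtab() are hypothetical helpers, not existing library calls.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Walk a section header table and report whether it contains SHT_SYMTAB.
   Returns 1 (symbol table present), 0 (absent) or -1 (malformed table). */
static int scan_for_symtab(const unsigned char *buf, size_t len,
                           uint64_t shoff, size_t shnum, size_t shentsize,
                           size_t expected_entsize, size_t type_off)
{
    if (shnum == 0)
        return 0;   /* no section headers (extended numbering is ignored) */
    if (shentsize != expected_entsize || shoff > len ||
        shnum > (len - shoff) / shentsize)
        return -1;  /* truncated table or unexpected entry size */

    for (size_t i = 0; i < shnum; i++) {
        uint32_t type;  /* sh_type is a 32-bit word in both ELF classes */
        memcpy(&type, buf + shoff + i * shentsize + type_off, sizeof type);
        if (type == SHT_SYMTAB)
            return 1;
    }
    return 0;
}

/* Returns 1 if the ELF image in buf[0..len) still has a symbol table
   ("not stripped"), 0 if it has none ("stripped"), -1 if it is not a
   well-formed ELF image.  Assumes host byte order matches the image. */
int elf_has_symtab(const unsigned char *buf, size_t len)
{
    if (len < EI_NIDENT || memcmp(buf, ELFMAG, SELFMAG) != 0)
        return -1;

    if (buf[EI_CLASS] == ELFCLASS32) {
        Elf32_Ehdr eh;
        if (len < sizeof eh)
            return -1;
        memcpy(&eh, buf, sizeof eh);
        return scan_for_symtab(buf, len, eh.e_shoff, eh.e_shnum, eh.e_shentsize,
                               sizeof(Elf32_Shdr), offsetof(Elf32_Shdr, sh_type));
    }
    if (buf[EI_CLASS] == ELFCLASS64) {
        Elf64_Ehdr eh;
        if (len < sizeof eh)
            return -1;
        memcpy(&eh, buf, sizeof eh);
        return scan_for_symtab(buf, len, eh.e_shoff, eh.e_shnum, eh.e_shentsize,
                               sizeof(Elf64_Shdr), offsetof(Elf64_Shdr, sh_type));
    }
    return -1;  /* unknown ELF class */
}
```

Read the whole binary into memory (for example with fread()) and pass the buffer; for the two netstat files above one would expect 0 for /usr/bin/netstat and 1 for the copy in /usr/obj.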

Cheers,
david
-- 
David H. Wolfskill  [EMAIL PROTECTED]
I believe it would be irresponsible (and thus, unethical) for me to advise,
recommend, or support the use of any product that is or depends on any
Microsoft product for any purpose other than personal amusement.




Re: Userland not installing

2001-12-17 Thread Sam Drinkard

That is true; however, I can't explain why netstat -a only gives the
active UNIX domain sockets info, while the unstripped version reports
Active Internet connections (including servers) as it's supposed to.

Sam





Re: Userland not installing

2001-12-17 Thread Michael Lucas

Of course, one could always use file on each binary to see what the
differences are.  But that would be too easy.  

Sigh... the new tea I tried must not have enough caffeine.


-- 
Michael Lucas   [EMAIL PROTECTED], [EMAIL PROTECTED]
my FreeBSD column: http://www.oreillynet.com/pub/q/Big_Scary_Daemons

http://www.blackhelicopters.org/~mwlucas/




on/off NFS connection errors

2001-12-17 Thread Marius M. Rex


For a while I have been treating this as an annoyance, but I thought it
would be wise to investigate whether it might indicate something larger
and more nefarious.

I have a mixed environment of mainly Linux boxen, with a dozen or so
FreeBSD machines (for when we need the kind of network resources that
raising NMBCLUSTERS can offer).  Both types of systems serve mainly as
webservers, serving content that ultimately comes off of exported NFS
directories, from a Network Appliance (NetApp Release 5.3.4R3: Thu Jan 27
12:08:07 PST 2000)   The Linux boxen don't complain at all, but the FreeBSD
boxen can get rather noisy about NFS connection errors.  It happens
on and off like so:

<118>Dec 15 21:01:47 cc117 /kernel: nfs server netapp1:/vol/members: not responding
<118>Dec 15 21:01:47 cc117 /kernel: nfs server netapp1:/vol/members: is alive again
<6>nfs server netapp1:/vol/members: not responding
<6>nfs server netapp1:/vol/members: is alive again
<118>Dec 15 22:34:19 cc117 /kernel: nfs server netapp1:/vol/members: not responding
<118>Dec 15 22:34:19 cc117 /kernel: nfs server netapp1:/vol/members: is alive again
<6>nfs server netapp1:/vol/members: not responding
<6>nfs server netapp1:/vol/members: is alive again
<118>Dec 15 22:39:19 cc117 /kernel: nfs server netapp1:/vol/members: not responding
<118>Dec 15 22:39:19 cc117 /kernel: nfs server netapp1:/vol/members: is alive again
<6>nfs server netapp1:/vol/members: not responding
<6>nfs server netapp1:/vol/members: is alive again
<118>Dec 15 22:40:19 cc117 /kernel: nfs server netapp1:/vol/members: not responding
<118>Dec 15 22:40:19 cc117 /kernel: nfs server netapp1:/vol/members: is alive again

One moment we are connected, another we are down, and then we are back up

One moment we are connected, another we are down, and the we are back up
again.  Some days I can get pages and pages of this, others very little.
Luckily the connection error is so short lived that Apache never hiccups.

Has anyone else seen these kinds of persistent NFS errors in the 4.x
branch?  (This didn't happen noticeably in 3.x, but I would still
maintain that the NFS code in 4.x is an improvement over 3.x.)  Can anyone
suggest a sysctl/kernel variable I might tune to help remedy the problem?
If the root of the problem is more likely on the Netapp side, I have a support
contact and am not afraid to use it.  Anyone have any advice or
suggestions to offer?

This is the platform that I am working on:
FreeBSD cc117 4.2-STABLE FreeBSD 4.2-STABLE #0: Sat Aug 18 00:21:16 EDT
2001 root@cc117:/usr/src/sys/compile/CCI_KERNEL  i386


-
Marius M. Rex

Hardware: n. The parts of a computer that can be kicked.





Re: 4.4-STABLE crashes - suspects new ata-driver over wd-drivers

2001-12-17 Thread Brady Montz

On Fri, Dec 07, 2001 at 01:33:14PM -0800, Brady Montz wrote:
 Richard Nyberg [EMAIL PROTECTED] writes: 
  
  On Fri, Dec 07, 2001 at 10:59:13AM +0100, Samuel Tardieu wrote: 
   I am experiencing the same crashes on my new machine (ATA100 IDE 
   drive): they appeared when I noticed that I had forgotten to use 
   soft-updates. After I have turned them on, I experienced the first 
   crash in 15 minutes. Then I get one every two days, when doing 
   heavy disk IOs. I got a crash 10 minutes ago when the machine was 
   unattended though (and not doing important disk IOs), and could 
   see a panic message on the console. Unfortunately, I hadn't 
   enough free space in /var/crash to save the kernel. 

   Do you people use soft-updates? From my experience on this 
   problem, I assume that either soft-updates or the ATA driver may 
   be causing those spontaneous reboots. 
   
  Yes I use soft-updates. The peculiar thing about my crash though is that 
  there was no panic; the machine just froze and the screen went blank, so 
  maybe I was hit by a different problem. 
   
-Richard 
 
 Yeah, I'm using soft updates too. My crashes are generally the same as
 Richard's - no panic, just a freeze. Except my screen doesn't go blank.

Here's an update ...

I'm fairly certain there's a kernel bug at work here. Last night I rebooted 
to linux (which is on the same disk), and ran batch compiles all
night long without any troubles. In comparison, I can't compile more than
an hour at a time with BSD 4.4 before it crashes.

Again, I ran memtest86 and it didn't find any memory errors, and I'm
not seeing any file system corruption, just hangs and reboots.

I am running the latest 4.4-stable. The other day I went back to 4.4-release
and that didn't help. I've tried both with and without softupdates. The
crashes seem to happen most often when accessing stuff from all over
the filesystem, such as during a large make clean, or most reliably, with 
portsdb -Uu. 

I am growing tired of this. Someone else on this thread mentioned that
their 5.0 machine is doing fine. What shape is that in, and how much effort
would it take to move a 4.4 machine to 5.0?

-- 
  Brady Montz
  [EMAIL PROTECTED]




Re: on/off NFS connection errors

2001-12-17 Thread Ian Dowse

In message [EMAIL PROTECTED], Marius M. Rex 
writes:

<118>Dec 15 22:40:19 cc117 /kernel: nfs server netapp1:/vol/members: not responding
<118>Dec 15 22:40:19 cc117 /kernel: nfs server netapp1:/vol/members: is alive again
...
Has anyone else seen these kinds of persistent NFS errors in the 4.x
branch?

These are a side-effect of the operation of the NFS dynamic retransmit
timeout code. The NFS client measures the request response time for
various types of operations and it sets a timeout based on the mean
and deviation of observed times.

The time taken by the server to perform some operations can vary
wildly though, so occasionally when a large number of operations
complete with very little delay, the response time estimate and
hence the timeout become very small. Then when one request is
unusually slow to complete (such as when the disk on the server is
busy), the client thinks that the server isn't responding and prints
those warnings. A fraction of a second later the request completes
and the client prints an 'is alive again' message.
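
To make the mechanism concrete, here is a sketch of a mean-plus-deviation timeout estimator. It is an illustration under stated assumptions, not the FreeBSD NFS client code: it borrows the TCP-style gains (1/8 for the mean, 1/4 for the deviation) and a timeout of mean + 4 * deviation, and struct rtt_est, the function names and floor_ms are all hypothetical.

```c
#include <stdbool.h>

/* Hypothetical estimator state for one class of NFS operation (ms). */
struct rtt_est {
    double srtt;    /* smoothed mean response time */
    double rttvar;  /* smoothed mean deviation */
    bool primed;    /* false until the first sample has been seen */
};

/* Fold one observed response time into the estimate.  The deviation is
   updated against the old mean first, then the mean moves 1/8 of the way. */
void rtt_update(struct rtt_est *e, double sample_ms)
{
    if (!e->primed) {
        e->srtt = sample_ms;
        e->rttvar = sample_ms / 2.0;
        e->primed = true;
        return;
    }
    double dev = sample_ms - e->srtt;
    if (dev < 0)
        dev = -dev;
    e->rttvar += (dev - e->rttvar) / 4.0;
    e->srtt += (sample_ms - e->srtt) / 8.0;
}

/* Timeout = mean + 4 * deviation, clamped to a hypothetical lower bound. */
double rtt_timeout(const struct rtt_est *e, double floor_ms)
{
    double rto = e->srtt + 4.0 * e->rttvar;
    return rto < floor_ms ? floor_ms : rto;
}

/* True if a reply taking latency_ms would outlive the current timeout,
   i.e. the point at which the client would complain "not responding". */
bool would_time_out(const struct rtt_est *e, double latency_ms, double floor_ms)
{
    return latency_ms > rtt_timeout(e, floor_ms);
}
```

Feeding it a hundred 2 ms replies and then one 300 ms reply (with a 1 ms floor) reproduces the pattern: the timeout collapses to roughly 2 ms, the slow reply overruns it, and only afterwards does the estimate widen to about 337 ms.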

On non-soft mounts these messages are completely harmless because
the client will just wait for the server to eventually reply. On
soft mounts, the effect can cause problems because applications
occasionally see an EINTR error.
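
A deliberately simplified model of the hard/soft difference, assuming a fixed retransmit interval with no backoff and a hypothetical max_tries retry limit. request_completes() is a made-up name, and the sketch does not try to say which error code a real client returns.

```c
#include <stdbool.h>

/* Hypothetical model of one request whose reply arrives after latency_ms,
   with the client retransmitting every timeout_ms (no backoff).
   Returns true if the application eventually gets its answer. */
bool request_completes(bool soft_mount, int max_tries, int timeout_ms,
                       int latency_ms)
{
    if (!soft_mount)
        return true;  /* hard mount: the client keeps waiting for the reply */

    /* Soft mount: after max_tries timeout intervals the client gives up
       and hands an error back to the application instead. */
    return latency_ms <= max_tries * timeout_ms;
}
```

With a timeout shrunk to 2 ms and three tries, a 300 ms server stall is enough to make the soft-mount case fail, while the hard-mount case simply waits.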

The dynamic retransmit timeout code can be disabled with the `-d'
flag to mount_nfs; this is often recommended for fast networks that
see very little packet loss.

Ian




Re: on/off NFS connection errors

2001-12-17 Thread Matthew Dillon


:For a while I have been treating this as an annoyance, but I thought it
:would be wise to investigate if something larger and more nefarious might
:be being indicated by this.
:
:I have a mixed environment of mainly Linux boxen, with a dozen or so
:FreeBSD machines (For when we need the kind of network resources that
:raising the NMBCLUSTERS can offer.)  Both types of systems serve mainly as
:webservers, serving content that ultimately comes off of exported NFS
:directories, from a Network Appliance (NetApp Release 5.3.4R3: Thu Jan 27
:12:08:07 PST 2000)   The Linux boxen don't complain at all, but the FreeBSD
:boxen can get rather noisy about NFS connection errors.  It happens
:on and off like so:
:
:<118>Dec 15 21:01:47 cc117 /kernel: nfs server netapp1:/vol/members: not responding
:<118>Dec 15 21:01:47 cc117 /kernel: nfs server netapp1:/vol/members: is alive again
:..

Our NFS is somewhat finicky about response times.  It could probably
use some tuning.  Try mounting with the 'dumbtimer' mount option
and see if that fixes your problem.  (Also see the -d and -t options
in 'man mount_nfs'.  Note that -t is specified in 1/10 second increments.)
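
Since -t counts tenths of a second, -t 10 is one second and -t 7 is 700 ms. The sketch below only does that arithmetic, under the assumption that with the dynamic estimator turned off (-d, or the dumbtimer option) a stall matters only when it outlasts the fixed interval; backoff on later retransmits is ignored, and both helper names are hypothetical.

```c
#include <stdbool.h>

/* mount_nfs -t takes its value in tenths of a second; convert to ms. */
int timeo_to_ms(int tenths)
{
    return tenths * 100;
}

/* Assumed model: with a fixed timer, a server stall only triggers a
   retransmit if it lasts longer than the -t interval. */
bool stall_exceeds_fixed_timer(int tenths, int stall_ms)
{
    return stall_ms > timeo_to_ms(tenths);
}
```

Under this model a 300 ms stall on the NetApp would pass unnoticed with -t 10, whereas a dynamic estimator that had shrunk to a few milliseconds would have fired.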

-Matt

:Has anyone else seen these kinds of persistent NFS errors in the 4.x
:branch?  (This didn't happen noticeably in 3.x, but I would still
:maintain that the NFS code in 4.x is an improvement over 3.x.)  Can anyone
:suggest a sysctl/kernel variable I might tune to help remedy the problem?
:If the root of the problem is more likely on the Netapp side, I have a support
:contact and am not afraid to use it.  Anyone have any advice or
:suggestions to offer?
:
:This is the platform that I am working on:
:FreeBSD cc117 4.2-STABLE FreeBSD 4.2-STABLE #0: Sat Aug 18 00:21:16 EDT
:2001 root@cc117:/usr/src/sys/compile/CCI_KERNEL  i386





Re: 4.4-STABLE crashes - suspects new ata-driver over wd-drivers

2001-12-17 Thread Chad David

On Mon, Dec 17, 2001 at 04:14:08PM -0800, Brady Montz wrote:
 
 Here's an update ...
 
 I'm fairly certain there's a kernel bug at work here. Last night I rebooted 
 to linux (which is on the same disk), and ran batch compiles all
 night long without any troubles. In comparison, I can't compile more than
 an hour at a time with BSD 4.4 before it crashes.

I still agree.  My -current machines run fine, and I refuse to run -stable on
an SMP machine.

 
 I am running the latest 4.4-stable. The other day I went back to 4.4-release
 and that didn't help. I've tried both with and without softupdates. The
 crashes seem to happen most often when accessing stuff from all over
 the filesystem, such as during a large make clean, or most reliably, with 
 portsdb -Uu. 
 
 I am growing tired of this. Someone else on this thread mentioned that
 their 5.0 machine is doing fine. What shape is that in, and how much effort
 would it take to move a 4.4 machine to 5.0?


Unless you feel confident that you can deal with the problems that arise on
-current, I wouldn't want to be the one to recommend that you change, but
my personal experience has been that -stable is anything but stable on SMP
machines.  On UP machines I have no problems at all.  The -current SMP machines
here are all very stable.  I don't track it daily, and I am careful to build
a test box before I rebuild a box I care about, but generally I have been
much happier with -current than with -stable (this year).

As for the effort to upgrade, it depends on what the box is doing.  I've only
upgraded a few boxes in the last year or so, and found that it was fairly
timing-dependent, but in general I haven't had any real problems (read UPDATING).

I have a little time this afternoon, so I'm going to see if I can figure
something out.  I'll throw -stable onto one of my SMP development machines
and see if I can kill it.  At least there I can debug it. 



A small plug: I've written a script that will rebuild an entire machine, from
cvsup through mergemaster and a reboot.  It doesn't really address anything to
do with this thread, but you might find it handy :)

http://www.acns.ab.ca/projects/rebuild/rebuild.tar.gz


-- 
Chad David[EMAIL PROTECTED]
ACNS Inc. Calgary, Alberta Canada




Re: 4.4-STABLE crashes - suspects new ata-driver over wd-drivers

2001-12-17 Thread David Wolfskill

Date: Mon, 17 Dec 2001 14:21:15 -0700
From: Chad David [EMAIL PROTECTED]

I still agree.  My -current machines run fine, and I refuse to run -stable on
an SMP machine.

For whatever it may be worth, my build machine (which is one of the
machines on which I track both -STABLE and -CURRENT daily) is an SMP box.
It also has a local CVS repository on it (from which I update the CVS
repository on my laptop, which also tracks -STABLE and -CURRENT daily).

I am not having any problems with -STABLE that I know of.

Cheers,
david
-- 
David H. Wolfskill  [EMAIL PROTECTED]
I believe it would be irresponsible (and thus, unethical) for me to advise,
recommend, or support the use of any product that is or depends on any
Microsoft product for any purpose other than personal amusement.




Re: staroffice5.2

2001-12-17 Thread Martin Schweizer

Hello Antonin

I also tried several times, but with no success. Afterwards I used the
original Sun CD-ROM. Do you have one?

On Mon, Dec 17, 2001 at 04:35:46PM +0100 Ing. Antonín Walter wrote:
 maybe I did not use the correct word in English, but
 
 on using the port you have suggested it first downloaded the 97GB file of
 staroffice and then tried to download this
 
 109939-02.tar.Z
 
 file but it was not available at any location the port was looking at

-- 
Regards

Martin Schweizer
[EMAIL PROTECTED]

PC-Service M. Schweizer; Gewerbehaus Schwarz; CH-8608 Bubikon
Tel. +41 55 243 30 00; Fax: +41 55 243 33 22; http://www.pc-service.ch




Re: 4.4-STABLE crashes - suspects new ata-driver over wd-drivers

2001-12-17 Thread Chad David

On Mon, Dec 17, 2001 at 01:33:19PM -0800, David Wolfskill wrote:
 Date: Mon, 17 Dec 2001 14:21:15 -0700
 From: Chad David [EMAIL PROTECTED]
 
 I still agree.  My -current machines run fine, and I refuse to run -stable on
 an SMP machine.
 
 For whatever it may be worth, my build machine (which is one of the
 machines on which I track both -STABLE and -CURRENT daily) is an SMP box.
 It also has a local CVS repository on it (from which I update the CVS
 repository on my laptop, which also tracks -STABLE and -CURRENT daily).
 
 I am not having any problems with -STABLE that I know of.

It might be worth a lot :).  It is possible that I and a few others have
bad hardware, and there is no problem with -stable.  It seems unlikely,
but it is not at all impossible I guess.

Are you running any combination of samba/nfs/ata?

Thanks

-- 
Chad David[EMAIL PROTECTED]
ACNS Inc. Calgary, Alberta Canada
