Errors on UFS Partitions

2010-01-16 Thread The-IRC FreeBSD
Hi,

I am sorry if this question has been brought up before. I have tried to
research the issue, but it could be listed under many different topics, so
please bear with me.

We have had ongoing problems with UFS errors on our root partition (and on
any additional partition that did not have soft-updates enabled by default).
Recently a secondary drive that housed home directories filled up
completely, and everything locked up due to huge CPU and memory usage
because nothing was able to write to the drive. When the server was
rebooted, it failed to boot because of critical errors on the root
partition.

We have /etc and /usr on the root partition, and our /home and /var
partitions mistakenly do not have the soft-updates flag set.

::dmesg::
http://the-irc.com/dmesg

::mount::
/dev/ad4s1a on / (ufs, local)
devfs on /dev (devfs, local, multilabel)
/dev/ad4s1d on /home (ufs, local, with quotas)
/dev/ad4s1e on /tmp (ufs, local, noexec, nosuid, soft-updates)
/dev/ad4s1f on /var (ufs, local)
devfs on /var/named/dev (devfs, local, multilabel)
procfs on /proc (procfs, local)
/dev/ad0s1e on /Backups (ufs, local, soft-updates)
/dev/ad0s1d on /root (ufs, local, soft-updates)

::fsck /::
** /dev/ad4s1a (NO WRITE)
** Last Mounted on /
** Root file system
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
UNREF FILE I=361477  OWNER=root MODE=100666
SIZE=144464 MTIME=Jan  1 03:59 2010
CLEAR? no

UNREF FILE I=966786  OWNER=root MODE=100644
SIZE=0 MTIME=Jan 15 23:02 2010
CLEAR? no

** Phase 5 - Check Cyl groups
SUMMARY INFORMATION BAD
SALVAGE? no

BLK(S) MISSING IN BIT MAPS
SALVAGE? no

549534 files, 4784719 used, 2830920 free (47200 frags, 347965 blocks, 0.6% fragmentation)




::fsck /home::
** /dev/ad4s1d (NO WRITE)
** Last Mounted on /home
** Phase 1 - Check Blocks and Sizes
INCORRECT BLOCK COUNT I=1957573 (4 should be 0)
CORRECT? no

INCORRECT BLOCK COUNT I=10270973 (300 should be 0)
CORRECT? no

INCORRECT BLOCK COUNT I=10270976 (44 should be 0)
CORRECT? no

INCORRECT BLOCK COUNT I=10271040 (48 should be 0)
CORRECT? no

INCORRECT BLOCK COUNT I=11871624 (4 should be 0)
CORRECT? no

** Phase 2 - Check Pathnames
UNALLOCATED  I=732010  OWNER=agrippas MODE=100600
SIZE=33868 MTIME=Jan 16 19:05 2010
FILE=/agrippas/services/lib/akill.db

REMOVE? no

UNALLOCATED  I=4545818  OWNER=port1080 MODE=100600
SIZE=2052 MTIME=Jan 16 19:06 2010
FILE=/port1080/services/nick.db

REMOVE? no

** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
UNREF FILE I=730879  OWNER=agrippas MODE=100664
SIZE=3020510 MTIME=Jan 16 18:54 2010
CLEAR? no

LINK COUNT FILE I=732011  OWNER=agrippas MODE=0
SIZE=0 MTIME=Jan 16 19:05 2010  COUNT 0 SHOULD BE -1
ADJUST? no

UNREF FILE I=2359889  OWNER=killjoyr MODE=100600
SIZE=0 MTIME=Jan 10 17:20 2010
CLEAR? no

UNREF FILE I=2359928  OWNER=killjoyr MODE=100600
SIZE=0 MTIME=Jan 10 17:20 2010
CLEAR? no

UNREF FILE I=2359930  OWNER=killjoyr MODE=100600
SIZE=0 MTIME=Jan 10 17:20 2010
CLEAR? no

UNREF FILE I=2359931  OWNER=killjoyr MODE=100600
SIZE=0 MTIME=Jan 10 17:20 2010
CLEAR? no

UNREF FILE I=2359932  OWNER=killjoyr MODE=100600
SIZE=0 MTIME=Jan 10 17:20 2010
CLEAR? no

UNREF FILE I=2359934  OWNER=killjoyr MODE=100600
SIZE=0 MTIME=Jan 10 17:20 2010
CLEAR? no

UNREF FILE I=2360094  OWNER=killjoyr MODE=100600
SIZE=0 MTIME=Jan 10 17:20 2010
CLEAR? no

UNREF FILE I=2360101  OWNER=killjoyr MODE=100600
SIZE=0 MTIME=Jan 10 17:20 2010
CLEAR? no

UNREF FILE I=2360103  OWNER=killjoyr MODE=100600
SIZE=0 MTIME=Jan 10 17:20 2010
CLEAR? no

UNREF FILE I=2360104  OWNER=killjoyr MODE=100600
SIZE=0 MTIME=Jan 10 17:20 2010
CLEAR? no

UNREF FILE I=2360118  OWNER=killjoyr MODE=100600
SIZE=0 MTIME=Jan 10 17:20 2010
CLEAR? no

UNREF FILE I=2360121  OWNER=killjoyr MODE=100600
SIZE=0 MTIME=Jan 10 17:20 2010
CLEAR? no

UNREF FILE I=2360122  OWNER=killjoyr MODE=100600
SIZE=0 MTIME=Jan 10 17:20 2010
CLEAR? no

UNREF FILE I=2360123  OWNER=killjoyr MODE=100600
SIZE=0 MTIME=Jan 10 17:20 2010
CLEAR? no

UNREF FILE I=2360124  OWNER=killjoyr MODE=100600
SIZE=0 MTIME=Jan 11 00:02 2010
CLEAR? no

UNREF FILE I=2920477  OWNER=marianus MODE=100644
SIZE=6 MTIME=Jan  2 20:27 2010
CLEAR? no

UNREF FILE I=2920480  OWNER=marianus MODE=100644
SIZE=6 MTIME=Jan  2 20:27 2010
CLEAR? no

LINK COUNT FILE I=4545817  OWNER=port1080 MODE=0
SIZE=0 MTIME=Jan 16 19:06 2010  COUNT 0 SHOULD BE -1
ADJUST? no

UNREF FILE I=6267525  OWNER=chijiru MODE=100644
SIZE=5 MTIME=Jan  2 10:05 2010
CLEAR? no

UNREF FILE I=6760292  OWNER=jibbanet MODE=100644
SIZE=6 MTIME=Jan 10 20:21 2010
CLEAR? no

UNREF FILE I=7089454  OWNER=talkingi MODE=100600
SIZE=0 MTIME=Jan 10 22:22 2010
CLEAR? no

UNREF FILE I=8668793  OWNER=mutrcom MODE=100660
SIZE=1074 MTIME=Jan  8 14:32 2010
CLEAR? no

UNREF FILE I=9752529  OWNER=gigircco MODE=100600
SIZE=0 MTIME=Jan 11 00:25 2010
CLEAR? no

UNREF FILE I=9752883  OWNER=gigircco MODE=100600
SIZE=18 MTIME=Jan 12 00:04 2010
CLEAR? 

Re: Errors on UFS Partitions

2010-01-16 Thread Dan Nelson
In the last episode (Jan 16), The-IRC FreeBSD said:
> ::fsck /::
> ** /dev/ad4s1a (NO WRITE)

fsck'ing a filesystem that is currently mounted read-write will always
produce errors.  Boot in single-user mode if you want to check the root
filesystem or other fs'es that you can't dismount in multi-user mode.
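
A minimal sketch of that check, in Python, using a few lines of the mount
output quoted earlier in the thread. It assumes that mount(8) lists
"read-only" among the options of a read-only mount and that anything
without it is mounted read-write; the function names are hypothetical.

```python
import re

# Sample lines copied from the `mount` output posted in this thread.
MOUNT_OUTPUT = """\
/dev/ad4s1a on / (ufs, local)
devfs on /dev (devfs, local, multilabel)
/dev/ad4s1d on /home (ufs, local, with quotas)
/dev/ad4s1e on /tmp (ufs, local, noexec, nosuid, soft-updates)
"""

# "<device> on <mountpoint> (<comma-separated options>)"
MOUNT_LINE = re.compile(r"^(\S+) on (\S+) \(([^)]*)\)$")


def parse_mounts(text):
    """Return a list of (device, mountpoint, options) tuples."""
    mounts = []
    for line in text.splitlines():
        match = MOUNT_LINE.match(line.strip())
        if match:
            device, mountpoint, opts = match.groups()
            options = [opt.strip() for opt in opts.split(",")]
            mounts.append((device, mountpoint, options))
    return mounts


def unsafe_to_fsck(text):
    """Mount points of UFS filesystems that are mounted read-write.

    Assumption: a read-only mount shows "read-only" in its option list.
    """
    return [
        mountpoint
        for _device, mountpoint, options in parse_mounts(text)
        if options[0] == "ufs" and "read-only" not in options
    ]


if __name__ == "__main__":
    for mountpoint in unsafe_to_fsck(MOUNT_OUTPUT):
        print(f"{mountpoint}: mounted read-write, fsck output is not reliable")
```

Every UFS filesystem it reports would need to be unmounted, or checked from
single-user mode, before the fsck output means anything.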

-- 
Dan Nelson
dnel...@allantgroup.com
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to freebsd-questions-unsubscr...@freebsd.org


Re: Errors on UFS Partitions

2010-01-16 Thread Michael Powell
The-IRC FreeBSD wrote:

> Hi,
>
> I am sorry if I am asking a question that might have been brought up
> before I have attempted to research my issue but it has many angles it
> might be listed under so please bear with me.
>
> We have had ongoing problems with UFS Errors on our root partition (and
> any additional partition that did not have soft-updates enabled by
> default) and we recently had a problem with a secondary drive that housed
> home directories completely filled up and then everything locked up due to
> huge CPU and Memory usage because nothing was able to write to the drive
> but when the server was rebooted it failed to boot up because of critical
> errors on the root partition.

A healthy system does not get UFS errors during normal operation.
 
[snip]
 
> To prevent these errors from getting out of control, and since we are not
> able to fix the root partition errors without going into single-user mode
> (and the other partitions by mounting them with the soft-updates flag),
> does anyone advise removing everything from the root partition and leaving
> only the bootloader, thus moving /etc and /usr (or above all just /usr) to
> its own partition? Or do you have a better solution?

No. Proceeding in directions such as this is a waste of time.
 
> Every partition gets errors over time but if you are unable to correct
> them without downtime how are you to correct them before they get out of
> control?

Probably by not looking for a software solution to a hardware problem. It is 
not normal for a file system to behave as you describe. Moving partitions 
around and other such avenues of approach are doomed to failure as they are 
not addressing the underlying problem.

Real server hardware with sophisticated ECC subsystems usually has some
BIOS counters which you can check for stats on memory errors. Hard drives
fail the most often, but bad memory or a bad drive controller can just as
readily corrupt data. If you have a RAID controller with a RAM cache, that
RAM could be defective.

Hardware failure is going to mean downtime. But I'd be looking for a 
hardware problem, get it fixed, then worry about how to proceed. If you have 
decent backups from before the system was corrupted you can get back to 
where you need to be in relatively short order. Not fixing a hardware defect 
will result in you never getting your server back to normal operation.

-Mike





Re: Errors on UFS Partitions

2010-01-16 Thread The-IRC FreeBSD
Thanks, everyone, for your input; it has helped greatly.

Does anyone know a way to toggle soft-updates on a UFS non-root partition
while the system is live or without having to recreate the partition?


Re: Errors on UFS Partitions

2010-01-16 Thread Erik Trulsson
On Sun, Jan 17, 2010 at 12:30:09AM -0500, The-IRC FreeBSD wrote:
> Thanks everyone for their input it has helped greatly.
>
> Does anyone know a way to toggle soft-updates on a UFS non-root partition
> while the system is live or without having to recreate the partition?

Sure. Use the tunefs(8) utility for this. (Note that it cannot be used
on a filesystem which is mounted read-write.)
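
As a rough sketch of how that could look for the partitions in this thread,
the Python below finds the UFS mounts without soft-updates and prints a
command sequence for each one. It only prints; it runs nothing. It assumes
that `tunefs -n enable` is the form used to turn soft-updates on, that each
filesystem has an /etc/fstab entry so `mount <mountpoint>` works, and that
the root filesystem is handled from single-user mode, where it is mounted
read-only. The function names are hypothetical.

```python
import re

# UFS lines copied from the `mount` output posted in this thread.
MOUNT_OUTPUT = """\
/dev/ad4s1a on / (ufs, local)
/dev/ad4s1d on /home (ufs, local, with quotas)
/dev/ad4s1e on /tmp (ufs, local, noexec, nosuid, soft-updates)
/dev/ad4s1f on /var (ufs, local)
"""

# "<device> on <mountpoint> (<comma-separated options>)"
MOUNT_LINE = re.compile(r"^(\S+) on (\S+) \(([^)]*)\)$")


def missing_soft_updates(text):
    """Return (device, mountpoint) for UFS mounts without soft-updates."""
    result = []
    for line in text.splitlines():
        match = MOUNT_LINE.match(line.strip())
        if not match:
            continue
        device, mountpoint, opts = match.groups()
        options = [opt.strip() for opt in opts.split(",")]
        if options[0] == "ufs" and "soft-updates" not in options:
            result.append((device, mountpoint))
    return result


def tunefs_plan(device, mountpoint):
    """Suggested commands (as strings only) to enable soft-updates."""
    if mountpoint == "/":
        # Root cannot be unmounted; assumed to be done in single-user mode.
        return [
            "# boot to single-user mode first (root is mounted read-only)",
            f"tunefs -n enable {device}",
        ]
    return [
        f"umount {mountpoint}",
        f"tunefs -n enable {device}",
        f"mount {mountpoint}",
    ]


if __name__ == "__main__":
    for device, mountpoint in missing_soft_updates(MOUNT_OUTPUT):
        print("\n".join(tunefs_plan(device, mountpoint)))
        print()
```

For /home and /var this still means a short outage, since anything holding
files open on them has to be stopped before the umount will succeed.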



-- 
Insert your favourite quote here.
Erik Trulsson
ertr1...@student.uu.se