RE: Large discrepancy in reported disk usage on USR partition

2008-10-30 Thread Brendan Hart

 I took a look at using the smart tools as you suggested, but have now 
 found that the disk in question is a RAID1 set on a DELL PERC 3/Di 
 controller and smartctl does not appear to be the correct tool to 
 access the SMART data for the individual disks.  After a little 
 research, I have found the aaccli tool and used it to get the following
 information:

 Sadly, that controller does not show you SMART attributes.  This is one of
 the biggest problems with the majority (but not all) of hardware RAID 
 controllers -- they give you no access to disk-level things like SMART.
 FreeBSD has support for such (using CAM's pass(4)), but the driver has
 to support/use it, *and* the card firmware has to support it.  At present,
 Areca, 3Ware, and Promise controllers support such; HighPoint might, but 
 I haven't confirmed it.  Adaptec does not.

 What you showed tells me nothing about SMART, other than the remote
 possibility it's basing some of its decisions on the general SMART health
 status, which means jack squat.  I can explain why this is if need be, but
 it's not related to the problem you're having.

Thanks for this additional information. I hadn't understood that there was
far more information behind the simple SMART ok/not ok reported by the PERC
controller.

 Either way, this is just one of many reasons to avoid hardware RAID
 controllers if given the choice.

I have seen some mentions of using gvinum and/or gmirror to protect against
a single disk being a Single Point Of Failure, which I believe is the reason
that most people, myself included, have specified hardware RAID in their
servers. Is this what you mean by avoiding hardware RAID?


 I hope these are SCSI disks you're showing here, otherwise I'm not sure
 how the controller is able to get the primary defect count of a SATA or
 SAS disk.  So, assuming the numbers shown are accurate, then yes, I don't
 think there's any disk-level problem.

Yes, they are SCSI disks. Not particularly relevant to this topic, but
interesting: I would have thought that SAS would make the same information
available as SCSI does, as it is a serial bus evolution of SCSI. Is this
thinking incorrect?

 I understand at this point you're running around with your arms in the
 air, but you've already confirmed one thing: none of your other systems
 exhibit this problem.  If this is a production environment, step back a
 moment and ask yourself: just how much time is this worth?  It might be
 better to just newfs the filesystem and be done with it, especially if
 this is a one-time-never-seen-before thing.

 I will wait and see if any other list member has any suggestions for 
 me to try, but I am now leaning toward scrubbing the system. Oh well.

 When you say scrubbing, are you referring to actually formatting/wiping
 the system, or are you referring to disk scrubbing?

I meant reformatting and reinstalling, as a way to escape the issue without
spending too much more time on it. I would of course like to understand the
problem so as to know what to avoid in the future, but as you make the point
above, time is money and it is rapidly approaching the point where it isn't
worth any more effort.

Thanks for all your help.

Best Regards,
Brendan Hart

 

__ Information from ESET NOD32 Antivirus, version of virus signature
database 3571 (20081030) __

The message was checked by ESET NOD32 Antivirus.

http://www.eset.com
 

___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to [EMAIL PROTECTED]

RE: Large discrepancy in reported disk usage on USR partition

2008-10-30 Thread Brendan Hart

 #: df -h
 Filesystem       Size    Used   Avail Capacity  Mounted on
 /dev/aacd0s1a    496M    163M    293M    36%    /
 devfs            1.0K    1.0K      0B   100%    /dev
 /dev/aacd0s1e    496M     15M    441M     3%    /tmp
 /dev/aacd0s1f     28G     25G    1.2G    96%    /usr
 /dev/aacd0s1d    1.9G    429M    1.3G    24%    /var

 Is this output untruncated? Is df really df or an alias to 'df -t nonfs'?

Yes, it really is the untruncated output of df -h. I also tried the df -t
nonfs and it gives exactly the same output as df. What are you expecting
that is not present in the output?

 Is it possible that nfs directory got written to /usr at some point in
 time?  You would only notice this with du if the nfs directory is
 unmounted.  Unmount it and ls -al /usr/mountpoint should only give you
 an empty dir

Bingo!! That is exactly the problem. An NFS mount was hiding a 17G local dir
which had an old copy of the entire NFS mounted dir. I guess it must have
been written incorrectly to this standby server by RSYNC before the NFS
mount was put in place. I will add an exclusion to rsync to make sure it
does not happen again even if the NFS dir is not mounted.
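
A minimal sketch of one way to guard the nightly job, assuming a POSIX shell
and a mount point path without spaces. The helper name is_mountpoint and the
rsync source liveserver:/usr/ in the commented usage are hypothetical; only
the mount point path comes from the df output in this thread. The idea is to
refuse to sync unless the NFS directory is really mounted, and to exclude
that directory from the copy in any case.

```sh
#!/bin/sh
# is_mountpoint DIR: succeed only if DIR is itself a mount point.
# `df -P DIR` reports the filesystem that holds DIR; DIR is a mount point
# when the "Mounted on" column (field 6) equals DIR.
# Assumes no spaces in the filesystem name or the mount point.
is_mountpoint() {
    dir=${1%/}                 # drop one trailing slash
    [ -n "$dir" ] || dir=/     # "/" becomes empty above, so restore it
    mounted_on=$(df -P "$dir" 2>/dev/null | awk 'NR == 2 { print $6 }')
    [ "$mounted_on" = "$dir" ]
}

# Hypothetical nightly job (the rsync source is a placeholder):
#
#   NFS_DIR=/usr/home/development/mount/foobar
#   if is_mountpoint "$NFS_DIR"; then
#       rsync -a --exclude='/home/development/mount/foobar/' liveserver:/usr/ /usr/
#   else
#       echo "$NFS_DIR is not mounted, skipping sync" >&2
#       exit 1
#   fi
```

The exclude pattern is anchored with a leading slash, so it is matched
relative to the root of the transfer (/usr/ here) rather than anywhere in the
tree.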

Thank you for your help, you have saved me much time rebuilding this server.

Best Regards,
Brendan Hart

-
Brendan Hart, Development Manager
Strategic Ecommerce Division
Securepay Pty Ltd
Phone: 08-8274-4000
Fax: 08-8274-1400 
 


RE: Large discrepancy in reported disk usage on USR partition

2008-10-30 Thread Brendan Hart

Now that you mention it, it *is* strange that the NFS mount was not listed
by the df function.

Try again after a fresh reboot:

#: df -h
Filesystem                         Size    Used   Avail Capacity  Mounted on
/dev/aacd0s1a                      496M    176M    280M    39%    /
devfs                              1.0K    1.0K      0B   100%    /dev
/dev/aacd0s1e                      496M     15M    441M     3%    /tmp
/dev/aacd0s1f                       28G    4.8G     21G    19%    /usr
/dev/aacd0s1d                      1.9G    430M    1.3G    24%    /var
server2:/storage/blah/foo/data/    397G    103G    262G    28%    /usr/home/development/mount/foobar

I guess I must have missed the final line when copying the output when I
first posted to the mailing list. And then when I replied to Mel, I had
already unmounted the NFS dir when attempting the suggested fix, so it did
not show when I ran df again to double-check, and I did not realize what had
happened.

I apologise for any confusion caused.

Best Regards,
Brendan Hart

-
Brendan Hart, Development Manager
Strategic Ecommerce Division
Securepay Pty Ltd
Phone: 08-8274-4000
Fax: 08-8274-1400 


-Original Message-
From: Jeremy Chadwick [mailto:[EMAIL PROTECTED] 
Sent: Friday, 31 October 2008 12:02 PM
To: Brendan Hart
Cc: 'Mel'; freebsd-questions@freebsd.org
Subject: Re: Large discrepancy in reported disk usage on USR partition

On Fri, Oct 31, 2008 at 11:50:39AM +1030, Brendan Hart wrote:
  #: df -h
  Filesystem       Size    Used   Avail Capacity  Mounted on
  /dev/aacd0s1a    496M    163M    293M    36%    /
  devfs            1.0K    1.0K      0B   100%    /dev
  /dev/aacd0s1e    496M     15M    441M     3%    /tmp
  /dev/aacd0s1f     28G     25G    1.2G    96%    /usr
  /dev/aacd0s1d    1.9G    429M    1.3G    24%    /var
 
  Is this output untruncated? Is df really df or an alias to 'df -t
  nonfs'?
 
 Yes, it really is the untruncated output of df -h. I also tried the
 df -t nonfs and it gives exactly the same output as df. What are
 you expecting that is not present in the output?
 
  Is it possible that nfs directory got written to /usr at some point
  in time?  You would only notice this with du if the nfs directory is
  unmounted.  Unmount it and ls -al /usr/mountpoint should only give you
  an empty dir
 
 Bingo!! That is exactly the problem. An NFS mount was hiding a 17G
 local dir which had an old copy of the entire NFS mounted dir. I guess
 it must have been written incorrectly to this standby server by RSYNC
 before the NFS mount was put in place. I will add an exclusion to
 rsync to make sure it does not happen again even if the NFS dir is not
 mounted.
 
 Thank you for your help, you have saved me much time rebuilding this
 server.

Can either of you outline what exactly happened here?  I'm trying to figure
out how an NFS mount was hiding a 17G local dir, when there's no NFS
mounts shown in the above df output.  This is purely an ignorant question on
my part, but I'm not able to piece together what happened.
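
For readers following along: a directory used as a mount point can hold its
own local files, and those files become invisible to ls and du once something
is mounted on top of it, while df still counts their blocks against the
underlying filesystem. A hedged sketch for finding candidate directories,
assuming a POSIX shell and mount points without spaces; mounts_under is a
hypothetical helper name, not an existing tool:

```sh
#!/bin/sh
# mounts_under DIR: print every mount point located below DIR.
# Each one is a directory whose local contents are hidden while mounted.
# Assumes no spaces in filesystem names or mount points.
mounts_under() {
    root=${1%/}
    df -P 2>/dev/null | awk -v prefix="$root/" '
        NR > 1 && $6 != prefix && index($6, prefix) == 1 { print $6 }'
}

# Example (run as root, and only when nothing is using the mount):
#
#   for mp in $(mounts_under /usr); do
#       umount "$mp" && ls -al "$mp" && du -sh "$mp"
#       # remount afterwards, e.g. `mount "$mp"` if it is listed in /etc/fstab
#   done
```

Anything that ls shows in the unmounted directory lives on the local
filesystem and is a candidate for the space that du could not find.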

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |




Large discrepancy in reported disk usage on USR partition

2008-10-29 Thread Brendan Hart

Hi,

I have inherited some servers running various releases of FreeBSD and I am
having some trouble with the /usr partition on one of these boxen.

The problem is that there appears to be far more space used on the USR
partition than there are actual files on the partition. The utility df -h
reports 25GB used (i.e. nearly the whole partition), but du -x /usr
reports only 7.6GB of files.

I have reviewed the FAQ, particularly item 9.24, "The du and df commands show
different amounts of disk space available. What is going on?". However, the
suggested cause of the discrepancy (large files already unlinked but still
held open by active processes) does not appear to be true in this case, as
the problem is present even after rebooting into single user mode.
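
The comparison described above can be sketched as a small script. This is an
illustration only, assuming a POSIX shell and a df that accepts -P; the
function names are made up for the example. On the figures in this message
the gap is roughly 25G - 7.6G = 17.4G.

```sh
#!/bin/sh
# df_used_kb DIR: kilobytes in use on the filesystem that holds DIR,
# according to the filesystem's own block accounting.
df_used_kb() {
    df -kP "$1" | awk 'NR == 2 { print $3 }'
}

# du_used_kb DIR: kilobytes reachable by walking DIR while staying on one
# filesystem (-x), so data on other mounted filesystems is not counted.
du_used_kb() {
    du -skx "$1" 2>/dev/null | awk '{ print $1 }'
}

# usage_gap_kb DIR: kilobytes that df counts but du cannot find.
usage_gap_kb() {
    echo $(( $(df_used_kb "$1") - $(du_used_kb "$1") ))
}

# Example: usage_gap_kb /usr
```

The result is only meaningful when DIR is the root of its filesystem (such as
/usr here); for any other directory, df still reports the whole filesystem
while du covers just the subtree.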

#: uname -a
FreeBSD ibisweb4spare.strategicecommerce.com.au 6.1-RELEASE FreeBSD
6.1-RELEASE #0: Sun May  7 04:42:56 UTC 2006
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/SMP  i386

#: df -h
Filesystem       Size    Used   Avail Capacity  Mounted on
/dev/aacd0s1a    496M    163M    293M    36%    /
devfs            1.0K    1.0K      0B   100%    /dev
/dev/aacd0s1e    496M     15M    441M     3%    /tmp
/dev/aacd0s1f     28G     25G    1.2G    96%    /usr
/dev/aacd0s1d    1.9G    429M    1.3G    24%    /var

#: du -x -h /usr
2.0K    /usr/.snap
 24M    /usr/bin

  snip

584M    /usr/ports
140K    /usr/lost+found
7.6G    /usr


The server is used as a standby machine, and a nightly cron job uses
RSYNC to make a copy of the /usr partition from a live server. Depending on
how recently the logs have been culled, the live server has approximately
7-10GB of data on the /usr partition, so I would expect the same size of
data on the standby server.

This may be irrelevant, but the server also has an external data directory
with 11GB mounted via NFS as a directory under the USR partition.

Next, I began to suspect some sort of disk corruption (echoes of the old
days of MSDOS lost cluster chains) and I have attempted to find disk issues
by running fsck, but no issues were reported and the issue was not remedied.
I also tried running fsck in single user mode, again, no improvement.

Can anyone suggest what I can try next?

Best Regards,
Brendan Hart

-
Brendan Hart, Development Manager
Strategic Ecommerce Division
Securepay Pty Ltd
Phone: 08-8274-4000
Fax: 08-8274-1400 

 


RE: Large discrepancy in reported disk usage on USR partition

2008-10-29 Thread Brendan Hart

Hi,

The space reserved as minfree does not appear to have been changed from the
default setting of 8%. Is your suggestion that I should change it to a
larger value? I don't understand how modifying it now could fix the
situation, but I could be missing something.
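
As a rough check on whether minfree could account for the gap, the reserve
can be estimated from the figures already posted. This is only a sketch:
minfree_kb is a hypothetical helper, and the 28G size is the rounded df -h
value, so the result is approximate.

```sh
#!/bin/sh
# minfree_kb SIZE_KB PERCENT: kilobytes held back by the minfree reserve.
minfree_kb() {
    echo $(( $1 * $2 / 100 ))
}

# /usr is reported as 28G by df -h, i.e. about 28 * 1024 * 1024 KB.
size_kb=$(( 28 * 1024 * 1024 ))
minfree_kb "$size_kb" 8      # prints 2348810, about 2.2G
```

A reserve of about 2.2G is far smaller than the roughly 17G difference
between df and du, and the reserve is subtracted from the Avail column rather
than added to Used, so changing it would not be expected to close the gap.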

The output of tunefs -p /usr is as follows:

#: tunefs -p /usr
tunefs: ACLs: (-a)                                         disabled
tunefs: MAC multilabel: (-l)                               disabled
tunefs: soft updates: (-n)                                 enabled
tunefs: maximum blocks per file in a cylinder group: (-e)  2048
tunefs: average file size: (-f)                            16384
tunefs: average number of files in a directory: (-s)       64
tunefs: minimum percentage of free space: (-m)             8%
tunefs: optimization preference: (-o)                      time
tunefs: volume label: (-L)

I have not observed the problem on any of the other ~dozen FreeBSD servers
in our data centre. 

Could the missing space be an indication of hardware disk issues i.e.
physical blocks marked as bad? 

Is it possible on UFS2 for disk space to be allocated but hidden somehow
(although I have been running commands such as du -x as superuser)?
Similarly, is it possible on UFS2 for disk space to be allocated in lost
cluster chains?

Best Regards,
Brendan Hart

-Original Message-
From: Jeremy Chadwick [mailto:[EMAIL PROTECTED] 
Sent: Thursday, 30 October 2008 11:50 AM
To: Brendan Hart
Cc: freebsd-questions@freebsd.org
Subject: Re: Large discrepancy in reported disk usage on USR partition

On Thu, Oct 30, 2008 at 11:12:32AM +1030, Brendan Hart wrote:
 I have inherited some servers running various releases of FreeBSD and I am
 having some trouble with the /usr partition on one of these boxen.
 
 The problem is that there appears to be far more space used on the USR
 partition than there are actual files on the partition. The utility df -h
 reports 25GB used (i.e. nearly the whole partition), but du -x /usr
 reports only 7.6GB of files.

Have you tried playing with tunefs(8), -m flag?

I can't reproduce this behaviour on any of our systems.

icarus# df -k /usr
Filesystem   1024-blocks    Used     Avail Capacity  Mounted on
/dev/ad12s1f   167879968 1973344 152476228     1%    /usr
icarus# du -sx /usr
1973344 /usr

eos# df -k /usr
Filesystem  1024-blocks    Used    Avail Capacity  Mounted on
/dev/ad0s1f    32494668 2261670 27633426     8%    /usr
eos# du -sx /usr
2261670 /usr

anubis# df -k /usr
Filesystem  1024-blocks    Used    Avail Capacity  Mounted on
/dev/ad4s1f    80010344 1809620 71799898     2%    /usr
anubis# du -sx /usr
1809620 /usr

horus# df -k /usr
Filesystem  1024-blocks    Used    Avail Capacity  Mounted on
/dev/ad4s1f    32494668 1608458 28286638     5%    /usr
horus# du -sx /usr
1608458 /usr

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |




RE: Large discrepancy in reported disk usage on USR partition

2008-10-29 Thread Brendan Hart

On Thu 30/10/2008 12:25 PM, Jeremy Chadwick wrote:
 Could the missing space be an indication of hardware disk issues i.e.
 physical blocks marked as bad? 

The simple answer is no, bad blocks would not cause what you're seeing.
smartctl -a /dev/disk will help you determine if there's evidence the disk
is in bad shape.  I can help you with reading SMART stats if need be.

I took a look at using the smart tools as you suggested, but have now found
that the disk in question is a RAID1 set on a DELL PERC 3/Di controller and
smartctl does not appear to be the correct tool to access the SMART data for
the individual disks. After a little research, I have found the aaccli tool
and used it to get the following information:

AAC0> disk show smart
Executing: disk show smart

        Smart    Method of         Enable
        Capable  Informational     Exception  Performance  Error
B:ID:L  Device   Exceptions(MRIE)  Control    Enabled      Count
------  -------  ----------------  ---------  -----------  -----
0:00:0  Y        6                 Y          N            0
0:01:0  Y        6                 Y          N            0

AAC0> disk show defects 00
Executing: disk show defects (ID=0)
Number of PRIMARY defects on drive: 285
Number of GROWN defects on drive: 0

AAC0> disk show defects 01
Executing: disk show defects (ID=1)
Number of PRIMARY defects on drive: 193
Number of GROWN defects on drive: 0


This output doesn't seem to indicate existing physical issues on the disks. 

 Since you booted single-user and presumably ran fsck -f /usr, and nothing
 came back, I'm left to believe this isn't filesystem corruption.

Yes, this is the command I tried when I went into the data centre yesterday,
and yes, nothing came back. 

I have done some additional digging and noticed that there is a /usr/.snap
folder present. ls -al shows no content, however. Some quick searching
shows this could possibly be part of a UFS snapshot... I wonder if partition
snapshots might be the cause of my major disk space loss. Some old message
group posts suggest that UFS snapshots were dangerously flaky on Release
6.1, so I would hope that my predecessors were not using them...  Do
you know anything about snapshots, and how I could see what space, if any,
is used by snapshots?

I also took a look to see if the issue could be something like running out
of inodes, but this doesn't seem to be the case:

#: df -ih /usr
Filesystem       Size    Used   Avail Capacity  iused   ifree %iused  Mounted on
/dev/aacd0s1f     28G     25G    1.1G    96%   708181 3107241   19%   /usr


BTW Jeremy, thanks for your help thus far.

I will wait and see if any other list member has any suggestions for me to
try, but I am now leaning toward scrubbing the system. Oh well.

Best Regards,
Brendan Hart

-
Brendan Hart, Development Manager
Strategic Ecommerce Division
Securepay Pty Ltd
Phone: 08-8274-4000
Fax: 08-8274-1400 
 

