RE: Large discrepancy in reported disk usage on USR partition
>> I took a look at using the smart tools as you suggested, but have now found that the disk in question is a RAID1 set on a DELL PERC 3/Di controller and smartctl does not appear to be the correct tool to access the SMART data for the individual disks. After a little research, I have found the aaccli tool and used it to get the following information:

> Sadly, that controller does not show you SMART attributes. This is one of the biggest problems with the majority (but not all) of hardware RAID controllers -- they give you no access to disk-level things like SMART. FreeBSD has support for such (using CAM's pass(4)), but the driver has to support/use it, *and* the card firmware has to support it. At present, Areca, 3Ware, and Promise controllers support such; HighPoint might, but I haven't confirmed it. Adaptec does not.
>
> What you showed tells me nothing about SMART, other than the remote possibility it's basing some of its decisions on the general SMART health status, which means jack squat. I can explain why this is if need be, but it's not related to the problem you're having.

Thanks for this additional information. I hadn't understood that there was far more information behind the simple SMART ok/not ok reported by the PERC controller.

> Either way, this is just one of many reasons to avoid hardware RAID controllers if given the choice.

I have seen some mentions of using gvinum and/or gmirror to achieve the goals of protection from Single Point Of Failure with a single disk, which I believe is the reason that most people, myself included, have specified Hardware RAID in their servers. Is this what you mean by avoiding Hardware RAID?

> I hope these are SCSI disks you're showing here, otherwise I'm not sure how the controller is able to get the primary defect count of a SATA or SAS disk. So, assuming the numbers shown are accurate, then yes, I don't think there's any disk-level problem.

Yes, they are SCSI disks.
Not particularly relevant to this topic, but interesting: I would have thought that SAS would make the same information available as SCSI does, as it is a serial bus evolution of SCSI. Is this thinking incorrect?

> I understand at this point you're running around with your arms in the air, but you've already confirmed one thing: none of your other systems exhibit this problem. If this is a production environment, step back a moment and ask yourself: just how much time is this worth? It might be better to just newfs the filesystem and be done with it, especially if this is a one-time-never-seen-before thing.

>> I will wait and see if any other list member has any suggestions for me to try, but I am now leaning toward scrubbing the system. Oh well.

> When you say scrubbing, are you referring to actually formatting/wiping the system, or are you referring to disk scrubbing?

I meant reformatting and reinstalling, as a way to escape the issue without spending too much more time on it. I would of course like to understand the problem so as to know what to avoid in the future, but as you make the point above, time is money and it is rapidly approaching the point where it isn't worth any more effort.

Thanks for all your help.

Best Regards,
Brendan Hart

__ Information from ESET NOD32 Antivirus, version of virus signature database 3571 (20081030) __
The message was checked by ESET NOD32 Antivirus. http://www.eset.com
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to [EMAIL PROTECTED]
RE: Large discrepancy in reported disk usage on USR partition
>> #: df -h
>> Filesystem      Size  Used  Avail  Capacity  Mounted on
>> /dev/aacd0s1a   496M  163M   293M       36%  /
>> devfs           1.0K  1.0K     0B      100%  /dev
>> /dev/aacd0s1e   496M   15M   441M        3%  /tmp
>> /dev/aacd0s1f    28G   25G   1.2G       96%  /usr
>> /dev/aacd0s1d   1.9G  429M   1.3G       24%  /var

> Is this output untruncated? Is df really df or an alias to 'df -t nonfs'?

Yes, it really is the untruncated output of df -h. I also tried the df -t nonfs and it gives exactly the same output as df. What are you expecting that is not present in the output?

> Is it possible that nfs directory got written to /usr at some point in time? You would only notice this with du if the nfs directory is unmounted. Unmount it and ls -al /usr/mountpoint should only give you an empty dir

Bingo!! That is exactly the problem. An NFS mount was hiding a 17G local dir which had an old copy of the entire NFS mounted dir. I guess it must have been written incorrectly to this standby server by RSYNC before the NFS mount was put in place. I will add an exclusion to rsync to make sure it does not happen again even if the NFS dir is not mounted.

Thank you for your help, you have saved me much time rebuilding this server.

Best Regards,
Brendan Hart

-
Brendan Hart, Development Manager
Strategic Ecommerce Division
Securepay Pty Ltd
Phone: 08-8274-4000 Fax: 08-8274-1400
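A complementary guard to the rsync exclusion mentioned above would be to have the sync job check that the directory really is a mount point before anything gets written into it. The following is only a minimal sketch under stated assumptions: a POSIX sh, `df -P` available, and mount point paths without whitespace. The function name `is_mount_point` is made up for illustration and is not part of any tool discussed in this thread.

```shell
#!/bin/sh
# Succeeds if the given directory is itself a mount point, by comparing
# the "Mounted on" column that df -P reports for it with its real path.
# Assumption: mount point paths contain no whitespace.
is_mount_point() {
    # Resolve to a physical, absolute path; fail if it is not a directory.
    dir=$(cd "$1" 2>/dev/null && pwd -P) || return 1
    # Line 2 of df -P output describes the filesystem holding $dir.
    mounted_on=$(df -P "$dir" | awk 'NR == 2 { print $NF }')
    [ "$mounted_on" = "$dir" ]
}

# Possible use at the top of a cron job (path as shown later in this thread):
#   is_mount_point /usr/home/development/mount/foobar || exit 1
```

The rsync exclusion avoids the stray copy regardless of mount state, so a check like this would only be an extra safety net, not a replacement for it.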
RE: Large discrepancy in reported disk usage on USR partition
Now that you mention it, it *is* strange that the NFS mount was not listed by the df function. Try again after a fresh reboot:

#: df -h
Filesystem                        Size  Used  Avail  Capacity  Mounted on
/dev/aacd0s1a                     496M  176M   280M       39%  /
devfs                             1.0K  1.0K     0B      100%  /dev
/dev/aacd0s1e                     496M   15M   441M        3%  /tmp
/dev/aacd0s1f                      28G  4.8G    21G       19%  /usr
/dev/aacd0s1d                     1.9G  430M   1.3G       24%  /var
server2:/storage/blah/foo/data/   397G  103G   262G       28%  /usr/home/development/mount/foobar

I guess I must have missed the final line when copying the output when I first posted to the mailing list. And then when I replied to Mel, I had already unmounted the NFS dir when attempting the suggested fix, so it did not show when I ran df again to double-check, and I did not realize what had happened.

I apologise for any confusion caused.

Best Regards,
Brendan Hart

-
Brendan Hart, Development Manager
Strategic Ecommerce Division
Securepay Pty Ltd
Phone: 08-8274-4000 Fax: 08-8274-1400

-Original Message-
From: Jeremy Chadwick [mailto:[EMAIL PROTECTED]
Sent: Friday, 31 October 2008 12:02 PM
To: Brendan Hart
Cc: 'Mel'; freebsd-questions@freebsd.org
Subject: Re: Large discrepancy in reported disk usage on USR partition

On Fri, Oct 31, 2008 at 11:50:39AM +1030, Brendan Hart wrote:
>>> #: df -h
>>> Filesystem      Size  Used  Avail  Capacity  Mounted on
>>> /dev/aacd0s1a   496M  163M   293M       36%  /
>>> devfs           1.0K  1.0K     0B      100%  /dev
>>> /dev/aacd0s1e   496M   15M   441M        3%  /tmp
>>> /dev/aacd0s1f    28G   25G   1.2G       96%  /usr
>>> /dev/aacd0s1d   1.9G  429M   1.3G       24%  /var
>>
>> Is this output untruncated? Is df really df or an alias to 'df -t nonfs'?
>
> Yes, it really is the untruncated output of df -h. I also tried the df -t nonfs and it gives exactly the same output as df. What are you expecting that is not present in the output?
>
>> Is it possible that nfs directory got written to /usr at some point in time? You would only notice this with du if the nfs directory is unmounted. Unmount it and ls -al /usr/mountpoint should only give you an empty dir
>
> Bingo!! That is exactly the problem. An NFS mount was hiding a 17G local dir which had an old copy of the entire NFS mounted dir. I guess it must have been written incorrectly to this standby server by RSYNC before the NFS mount was put in place. I will add an exclusion to rsync to make sure it does not happen again even if the NFS dir is not mounted.
>
> Thank you for your help, you have saved me much time rebuilding this server.

Can either of you outline what exactly happened here? I'm trying to figure out how an NFS mount was hiding a 17G local dir, when there's no NFS mounts shown in the above df output.

This is purely an ignorant question on my part, but I'm not able to piece together what happened.

--
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
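On the mechanism asked about here: files written into a directory before another filesystem is mounted on top of it stay on the underlying filesystem and keep counting towards its df usage, while du -x has no way to reach them as long as the mount is in place. A first step in hunting for such data is to list every mount point that sits below the filesystem in question, since each one is a candidate to unmount and inspect. This is a rough sketch only; `mounts_under` is a hypothetical helper name, it reads `df -P` style output on stdin, and it assumes mount point paths without whitespace and a parent other than `/`.

```shell
#!/bin/sh
# Print the mount points located strictly below the given directory.
# Expects "df -P" style output on stdin (header line, then one line
# per filesystem with the mount point in the last column).
mounts_under() {
    parent=${1%/}    # strip one trailing slash, if present
    awk -v p="$parent/" 'NR > 1 && index($NF, p) == 1 { print $NF }'
}

# Typical use:
#   df -P | mounts_under /usr
```

Each path it prints could then be unmounted in turn and checked with ls -al, as Mel suggested, to see whether the directory underneath is really empty.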
Large discrepancy in reported disk usage on USR partition
Hi,

I have inherited some servers running various releases of FreeBSD and I am having some trouble with the /usr partition on one of these boxen.

The problem is that there appears to be far more space used on the USR partition than there are actual files on the partition. The utility df -h reports 25GB used (i.e. nearly the whole partition), but du -x /usr reports only 7.6GB of files.

I have reviewed the FAQ, particularly item 9.24, "The du and df commands show different amounts of disk space available. What is going on?". However, the suggested cause of the discrepancy (large files already unlinked but still held open by active processes) does not appear to be true in this case, as the problem is present even after rebooting into single user mode.

#: uname -a
FreeBSD ibisweb4spare.strategicecommerce.com.au 6.1-RELEASE FreeBSD 6.1-RELEASE #0: Sun May 7 04:42:56 UTC 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/SMP i386

#: df -h
Filesystem      Size  Used  Avail  Capacity  Mounted on
/dev/aacd0s1a   496M  163M   293M       36%  /
devfs           1.0K  1.0K     0B      100%  /dev
/dev/aacd0s1e   496M   15M   441M        3%  /tmp
/dev/aacd0s1f    28G   25G   1.2G       96%  /usr
/dev/aacd0s1d   1.9G  429M   1.3G       24%  /var

#: du -x -h /usr
2.0K    /usr/.snap
 24M    /usr/bin
<snip>
584M    /usr/ports
140K    /usr/lost+found
7.6G    /usr

The server is used as a standby machine, with a nightly cronjob that uses RSYNC to make a copy of the /usr partition from a live server. Depending on how recently the logs have been culled, the Live server has approximately 7-10GB of data on the /usr partition, so I would expect the same size of data on the standby server.

This may be irrelevant, but the server also has an external data directory with 11GB mounted via NFS as a directory under the USR partition.

Next, I began to suspect some sort of disk corruption (echoes of the old days of MSDOS lost cluster chains) and I have attempted to find disk issues by running fsck, but no issues were reported and the issue was not remedied. I also tried running fsck in single user mode; again, no improvement.
Can anyone suggest what I can try next?

Best Regards,
Brendan Hart

-
Brendan Hart, Development Manager
Strategic Ecommerce Division
Securepay Pty Ltd
Phone: 08-8274-4000 Fax: 08-8274-1400
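The df versus du comparison described in this message can be scripted so that it is easy to repeat on several machines. What follows is a minimal sketch, assuming a POSIX sh with `df -P`, `du -x` and awk; `usage_gap_kb` is a hypothetical name chosen for illustration. It prints how many kilobytes df counts as used on a filesystem beyond what du can find by walking it without crossing mount points.

```shell
#!/bin/sh
# Print (df "Used" minus du -x total) in KB for the filesystem mounted
# at the given path. A large positive value means blocks are allocated
# that a directory walk cannot see.
usage_gap_kb() {
    fs=$1
    # Column 3 of the second df -Pk line is the used space in KB.
    df_used=$(df -Pk "$fs" | awk 'NR == 2 { print $3 }')
    # -s: one summary line, -k: KB units, -x: stay on this filesystem.
    du_used=$(du -skx "$fs" 2>/dev/null | awk '{ print $1 }')
    echo $((df_used - du_used))
}

# Typical use (run as root so du can read everything):
#   usage_gap_kb /usr
```

With the figures quoted above (25G according to df, 7.6G according to du) the gap would come out at roughly 17G. Small gaps are normal because of filesystem metadata; it is gaps of that magnitude that call for an explanation.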
RE: Large discrepancy in reported disk usage on USR partition
Hi,

The space reserved as minfree does not appear to have been changed from the default setting of 8%. Is your suggestion that I should change it to a larger value? I don't understand how modifying it now could fix the situation, but I could be missing something.

The output of tunefs -p /usr is as follows:

#: tunefs -p /usr
tunefs: ACLs: (-a)                                         disabled
tunefs: MAC multilabel: (-l)                               disabled
tunefs: soft updates: (-n)                                 enabled
tunefs: maximum blocks per file in a cylinder group: (-e)  2048
tunefs: average file size: (-f)                            16384
tunefs: average number of files in a directory: (-s)       64
tunefs: minimum percentage of free space: (-m)             8%
tunefs: optimization preference: (-o)                      time
tunefs: volume label: (-L)

I have not observed the problem on any of the other ~dozen FreeBSD servers in our data centre.

Could the missing space be an indication of hardware disk issues i.e. physical blocks marked as bad?

Is it possible on UFS2 for disk space to be allocated but hidden somehow? (although I have been running the commands such as du -x as superuser)

Similarly, is it possible on UFS2 for disk space to be allocated in lost cluster chains?

Best Regards,
Brendan Hart

-Original Message-
From: Jeremy Chadwick [mailto:[EMAIL PROTECTED]
Sent: Thursday, 30 October 2008 11:50 AM
To: Brendan Hart
Cc: freebsd-questions@freebsd.org
Subject: Re: Large discrepancy in reported disk usage on USR partition

On Thu, Oct 30, 2008 at 11:12:32AM +1030, Brendan Hart wrote:
> I have inherited some servers running various releases of FreeBSD and I am having some trouble with the /usr partition on one of these boxen.
>
> The problem is that there appears to be far more space used on the USR partition than there are actual files on the partition. The utility df -h reports 25GB used (i.e. nearly the whole partition), but du -x /usr reports only 7.6GB of files.

Have you tried playing with tunefs(8), -m flag?

I can't reproduce this behaviour on any of our systems.
icarus# df -k /usr
Filesystem    1024-blocks     Used      Avail  Capacity  Mounted on
/dev/ad12s1f    167879968  1973344  152476228        1%  /usr
icarus# du -sx /usr
1973344 /usr

eos# df -k /usr
Filesystem    1024-blocks     Used     Avail  Capacity  Mounted on
/dev/ad0s1f      32494668  2261670  27633426        8%  /usr
eos# du -sx /usr
2261670 /usr

anubis# df -k /usr
Filesystem    1024-blocks     Used     Avail  Capacity  Mounted on
/dev/ad4s1f      80010344  1809620  71799898        2%  /usr
anubis# du -sx /usr
1809620 /usr

horus# df -k /usr
Filesystem    1024-blocks     Used     Avail  Capacity  Mounted on
/dev/ad4s1f      32494668  1608458  28286638        5%  /usr
horus# du -sx /usr
1608458 /usr

--
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
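The df figures quoted above also show what the minfree reserve looks like in practice: Used plus Avail falls short of the total block count, and the shortfall is the reserved share. The arithmetic can be checked with a few lines of sh and awk. This is just a worked example on the numbers from this message; `reserve_pct` is a hypothetical helper name, not an existing command.

```shell
#!/bin/sh
# Given total, used and avail in 1K blocks (as printed by df -k),
# print the share of the filesystem that is neither used nor
# available, as a percentage with one decimal place.
reserve_pct() {
    awk -v total="$1" -v used="$2" -v avail="$3" \
        'BEGIN { printf "%.1f\n", (total - used - avail) / total * 100 }'
}

# icarus:/usr from the output above
reserve_pct 167879968 1973344 152476228    # prints 8.0
```

That matches the 8% reported by tunefs -p. At that setting a 28G filesystem holds back roughly 2.2G, so the reserve on its own is far too small to account for a difference of around 17G between df and du.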
RE: Large discrepancy in reported disk usage on USR partition
On Thu 30/10/2008 12:25 PM, Jeremy Chadwick wrote:
>> Could the missing space be an indication of hardware disk issues i.e. physical blocks marked as bad?
>
> The simple answer is no, bad blocks would not cause what you're seeing. smartctl -a /dev/disk will help you determine if there's evidence the disk is in bad shape. I can help you with reading SMART stats if need be.

I took a look at using the smart tools as you suggested, but have now found that the disk in question is a RAID1 set on a DELL PERC 3/Di controller and smartctl does not appear to be the correct tool to access the SMART data for the individual disks. After a little research, I have found the aaccli tool and used it to get the following information:

AAC0 disk show smart
Executing: disk show smart

        Smart    Method of         Enable
        Capable  Informational     Exception  Performance  Error
B:ID:L  Device   Exceptions(MRIE)  Control    Enabled      Count
------  -------  ----------------  ---------  -----------  -----
0:00:0  Y        6                 Y          N            0
0:01:0  Y        6                 Y          N            0

AAC0 disk show defects 00
Executing: disk show defects (ID=0)
Number of PRIMARY defects on drive: 285
Number of GROWN defects on drive: 0

AAC0 disk show defects 01
Executing: disk show defects (ID=1)
Number of PRIMARY defects on drive: 193
Number of GROWN defects on drive: 0

This output doesn't seem to indicate existing physical issues on the disks.

> Since you booted single-user and presumably ran fsck -f /usr, and nothing came back, I'm left to believe this isn't filesystem corruption.

Yes, this is the command I tried when I went into the data centre yesterday, and yes, nothing came back.

I have done some additional digging and noticed that there is a /usr/.snap folder present. ls -al shows no content however. Some quick searching shows this could possibly be part of a UFS snapshot... I wonder if partition snapshots might be the cause of my major disk space loss. Some old message group posts suggest that UFS snapshots were dangerously flakey on Release 6.1, so I would hope that my predecessors were not using them however...
Do you know anything about snapshots, and how I could see what, if any, space is used by snapshots?

I also took a look to see if the issue could be something like running out of inodes, but this doesn't seem to be the case:

#: df -ih /usr
Filesystem      Size  Used  Avail  Capacity   iused    ifree  %iused  Mounted on
/dev/aacd0s1f    28G   25G   1.1G       96%  708181  3107241     19%  /usr

BTW Jeremy, thanks for your help thus far. I will wait and see if any other list member has any suggestions for me to try, but I am now leaning toward scrubbing the system. Oh well.

Best Regards,
Brendan Hart

-
Brendan Hart, Development Manager
Strategic Ecommerce Division
Securepay Pty Ltd
Phone: 08-8274-4000 Fax: 08-8274-1400