Re: Large discrepancy in reported disk usage on USR partition

2008-10-29 Thread Jeremy Chadwick
On Thu, Oct 30, 2008 at 11:12:32AM +1030, Brendan Hart wrote:
> I have inherited some servers running various releases of FreeBSD and I am
> having some trouble with the /usr partition on one of these boxen.
> 
> The problem is that there appears to be far more space used on the USR
> partition than there are actual files on the partition. The utility "df -h"
> reports 25GB used (i.e. nearly the whole partition), but "du -x /usr"
> reports only 7.6GB of files.

Have you tried playing with tunefs(8), -m flag?

I can't reproduce this behaviour on any of our systems.

icarus# df -k /usr
Filesystem   1024-blocks    Used     Avail Capacity  Mounted on
/dev/ad12s1f   167879968 1973344 152476228     1%    /usr
icarus# du -sx /usr
1973344 /usr

eos# df -k /usr
Filesystem  1024-blocks    Used    Avail Capacity  Mounted on
/dev/ad0s1f    32494668 2261670 27633426     8%    /usr
eos# du -sx /usr
2261670 /usr

anubis# df -k /usr
Filesystem  1024-blocks    Used    Avail Capacity  Mounted on
/dev/ad4s1f    80010344 1809620 71799898     2%    /usr
anubis# du -sx /usr
1809620 /usr

horus# df -k /usr
Filesystem  1024-blocks    Used    Avail Capacity  Mounted on
/dev/ad4s1f    32494668 1608458 28286638     5%    /usr
horus# du -sx /usr
1608458 /usr

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |



RE: Large discrepancy in reported disk usage on USR partition

2008-10-29 Thread Brendan Hart
Hi,

The space reserved as minfree does not appear to have been changed from the
default setting of 8%. Is your suggestion that I should change it to a
larger value? I don't understand how modifying it now could fix the
situation, but I could be missing something.

The output of "tunefs -p /usr" is as follows:

#: tunefs -p /usr
tunefs: ACLs: (-a)                                         disabled
tunefs: MAC multilabel: (-l)                               disabled
tunefs: soft updates: (-n)                                 enabled
tunefs: maximum blocks per file in a cylinder group: (-e)  2048
tunefs: average file size: (-f)                            16384
tunefs: average number of files in a directory: (-s)       64
tunefs: minimum percentage of free space: (-m)             8%
tunefs: optimization preference: (-o)                      time
tunefs: volume label: (-L)

I have not observed the problem on any of the other ~dozen FreeBSD servers
in our data centre. 

Could the "missing" space be an indication of hardware disk issues i.e.
physical blocks marked as bad? 

Is it possible on UFS2 for disk space to be allocated but hidden somehow?
(although I have been running the commands such as "du -x" as superuser)
Similarly, is it possible on UFS2 for disk space to be allocated in "lost
cluster chains" ?

Best Regards,
Brendan Hart

-Original Message-
From: Jeremy Chadwick [mailto:[EMAIL PROTECTED] 
Sent: Thursday, 30 October 2008 11:50 AM
To: Brendan Hart
Cc: freebsd-questions@freebsd.org
Subject: Re: Large discrepancy in reported disk usage on USR partition

On Thu, Oct 30, 2008 at 11:12:32AM +1030, Brendan Hart wrote:
> I have inherited some servers running various releases of FreeBSD and I am
> having some trouble with the /usr partition on one of these boxen.
> 
> The problem is that there appears to be far more space used on the USR
> partition than there are actual files on the partition. The utility "df -h"
> reports 25GB used (i.e. nearly the whole partition), but "du -x /usr"
> reports only 7.6GB of files.

Have you tried playing with tunefs(8), -m flag?

I can't reproduce this behaviour on any of our systems.

icarus# df -k /usr
Filesystem   1024-blocks    Used     Avail Capacity  Mounted on
/dev/ad12s1f   167879968 1973344 152476228     1%    /usr
icarus# du -sx /usr
1973344 /usr

eos# df -k /usr
Filesystem  1024-blocks    Used    Avail Capacity  Mounted on
/dev/ad0s1f    32494668 2261670 27633426     8%    /usr
eos# du -sx /usr
2261670 /usr

anubis# df -k /usr
Filesystem  1024-blocks    Used    Avail Capacity  Mounted on
/dev/ad4s1f    80010344 1809620 71799898     2%    /usr
anubis# du -sx /usr
1809620 /usr

horus# df -k /usr
Filesystem  1024-blocks    Used    Avail Capacity  Mounted on
/dev/ad4s1f    32494668 1608458 28286638     5%    /usr
horus# du -sx /usr
1608458 /usr

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |






Re: Large discrepancy in reported disk usage on USR partition

2008-10-29 Thread Jeremy Chadwick
On Thu, Oct 30, 2008 at 12:11:58PM +1030, Brendan Hart wrote:
> The space reserved as minfree does not appear to have been changed from the
> default setting of 8%.

Okay, then that's likely not the problem.

> Is your suggestion that I should change it to a larger value?

That would just make your problem worse.  :-)

> I don't understand how modifying it now could fix the situation, but I
> could be missing something.

Well, the feature I described isn't what's causing your problem, but to
clarify: if you change the percentage, it applies immediately.  I read
"I don't understand how modifying it now could fix ..." to mean "isn't
this option applied during newfs?"

> I have not observed the problem on any of the other ~dozen FreeBSD servers
> in our data centre. 

Unless someone more clueful chimes in with better hints, the obvious
choice here is going to be "recreate the filesystem".  I'd tell you
something like "try using ffsinfo(8)?", but I've never used the tool,
so very little of the output will make sense to me.

> Could the "missing" space be an indication of hardware disk issues i.e.
> physical blocks marked as bad? 

The simple answer is no, bad blocks would not cause what you're seeing.
smartctl -a /dev/disk will help you determine if there's evidence the
disk is in bad shape.  I can help you with reading SMART stats if need
be.
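
Something like this, substituting the actual disk device:

  smartctl -a /dev/ad0      # identity, SMART attributes, and error log

The attributes worth watching are things like Reallocated_Sector_Ct,
Current_Pending_Sector, and Offline_Uncorrectable.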

Since you booted single-user and presumably ran fsck -f /usr, and
nothing came back, I'm left to believe this isn't filesystem corruption.

> Is it possible on UFS2 for disk space to be allocated but hidden somehow?
> (although I have been running the commands such as "du -x" as superuser)

That's exactly what the above tunefs parameter describes.

> Similarly, is it possible on UFS2 for disk space to be allocated in "lost
> cluster chains" ?

I don't know what this means.  Someone more clueful will have to answer.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |



RE: Large discrepancy in reported disk usage on USR partition

2008-10-29 Thread Brendan Hart
On Thu 30/10/2008 12:25 PM, Jeremy Chadwick wrote:
>> Could the "missing" space be an indication of hardware disk issues i.e.
>> physical blocks marked as bad? 

> The simple answer is no, bad blocks would not cause what you're seeing.
> smartctl -a /dev/disk will help you determine if there's evidence the disk
> is in bad shape.  I can help you with reading SMART stats if need be.

I took a look at using the smart tools as you suggested, but have now found
that the disk in question is a RAID1 set on a DELL PERC 3/Di controller and
smartctl does not appear to be the correct tool to access the SMART data for
the individual disks. After a little research, I have found the aaccli tool
and used it to get the following information:

AAC0> disk show smart
Executing: disk show smart
                 Smart    Method of          Enable
                 Capable  Informational      Exception    Performance  Error
B:ID:L  Device            Exceptions(MRIE)   Control      Enabled      Count
------  ------  -------   ----------------   ---------    -----------  -----
0:00:0             Y             6               Y             N         0
0:01:0             Y             6               Y             N         0

AAC0> disk show defects 00
Executing: disk show defects (ID=0)
Number of PRIMARY defects on drive: 285
Number of GROWN defects on drive: 0

AAC0> disk show defects 01
Executing: disk show defects (ID=1)
Number of PRIMARY defects on drive: 193
Number of GROWN defects on drive: 0


This output doesn't seem to indicate existing physical issues on the disks. 

> Since you booted single-user and presumably ran fsck -f /usr, and nothing
> came back, I'm left to believe this isn't filesystem corruption.

Yes, this is the command I tried when I went into the data centre yesterday,
and yes, nothing came back. 

I have done some additional digging and noticed that there is a /usr/.snap
folder present. "ls -al" shows no content, however. Some quick searching
shows this could possibly be part of a UFS snapshot... I wonder if partition
snapshots might be the cause of my major disk space "loss". Some old mailing
list posts suggest that UFS snapshots were dangerously flaky on 6.1-RELEASE,
so I would hope that my predecessors were not using them... Do you know
anything about snapshots, and how I could see what (if any) space is used by
snapshots?

I also took a look to see if the issue could be something like running out
of inodes, but this doesn't seem to be the case:

#: df -ih /usr
Filesystem     Size  Used  Avail Capacity   iused    ifree %iused  Mounted on
/dev/aacd0s1f   28G   25G   1.1G    96%    708181  3107241   19%   /usr


BTW Jeremy, thanks for your help thus far.

I will wait and see if any other list member has any suggestions for me to
try, but I am now leaning toward scrubbing the system. Oh well.

Best Regards,
Brendan Hart

-
Brendan Hart, Development Manager
Strategic Ecommerce Division
Securepay Pty Ltd
Phone: 08-8274-4000
Fax: 08-8274-1400 
 




Re: Large discrepancy in reported disk usage on USR partition

2008-10-29 Thread Jeremy Chadwick
On Thu, Oct 30, 2008 at 02:04:36PM +1030, Brendan Hart wrote:
> On Thu 30/10/2008 12:25 PM, Jeremy Chadwick wrote:
> >> Could the "missing" space be an indication of hardware disk issues i.e.
> >> physical blocks marked as bad? 
> 
> > The simple answer is no, bad blocks would not cause what you're seeing.
> > smartctl -a /dev/disk will help you determine if there's evidence the disk
> > is in bad shape.  I can help you with reading SMART stats if need be.
> 
> I took a look at using the smart tools as you suggested, but have now found
> that the disk in question is a RAID1 set on a DELL PERC 3/Di controller and
> smartctl does not appear to be the correct tool to access the SMART data for
> the individual disks.  After a little research, I have found the aaccli tool
> and used it to get the following information:

Sadly, that controller does not show you SMART attributes.  This is one
of the biggest problems with the majority (but not all) of hardware RAID
controllers -- they give you no access to disk-level things like SMART.
FreeBSD has support for such (using CAM's pass(4)), but the driver has
to support/use it, *and* the card firmware has to support it.  At
present, Areca, 3Ware, and Promise controllers support such; HighPoint
might, but I haven't confirmed it.  Adaptec does not.
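
For example, with a card/driver combination that does support it,
smartmontools can reach the individual members through the controller
(device names and member numbers below are purely illustrative):

  smartctl -a -d 3ware,0 /dev/twe0   # member disk 0 behind a 3Ware card
  smartctl -a /dev/pass0             # or via a CAM pass(4) device directly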

What you showed tells me nothing about SMART, other than the remote
possibility it's basing some of its decisions on the "general SMART
health status", which means jack squat.  I can explain why this is if
need be, but it's not related to the problem you're having.

Either way, this is just one of many reasons to avoid hardware RAID
controllers if given the choice.

> AAC0> disk show defects 00
> Executing: disk show defects (ID=0)
> Number of PRIMARY defects on drive: 285
> Number of GROWN defects on drive: 0
> 
> AAC0> disk show defects 01
> Executing: disk show defects (ID=1)
> Number of PRIMARY defects on drive: 193
> Number of GROWN defects on drive: 0
> 
> This output doesn't seem to indicate existing physical issues on the disks. 

I hope these are SCSI disks you're showing here, otherwise I'm not sure
how the controller is able to get the primary defect count of a SATA or
SAS disk.  So, assuming the numbers shown are accurate, then yes, I
don't think there's any disk-level problem.

> I have done some additional digging and noticed that there is a /usr/.snap
> folder present. "ls -al" shows no content however. Some quick searching
> shows this could possibly be part of a UFS snapshot...

Correct; the .snap directory is used for UFS2 snapshots and
mksnap_ffs(8) (which is also the program dump -L uses).

> I wonder if partition snapshots might be the cause of my major disk
> space "loss".

Your /usr/.snap directory is empty; there are no snapshots.  That said,
are you actually making filesystem snapshots using dump or mksnap_ffs?
If not, then you're barking up the wrong tree.  :-)
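
If you want to double-check, snapshots are just (special) files under
/usr/.snap, so something like this would confirm it (a sketch):

  ls -l /usr/.snap                   # empty == no snapshots being held
  mksnap_ffs /usr /usr/.snap/test    # make one by hand for comparison...
  df -h /usr                         # ...watch Used jump
  rm /usr/.snap/test                 # snapshots are files; rm frees them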

> I also took a look to see if the issue could be something like running out
> of inodes, but this doesn't seem to be the case:
> 
> #: df -ih /usr
> Filesystem     Size  Used  Avail Capacity   iused    ifree %iused  Mounted on
> /dev/aacd0s1f   28G   25G   1.1G    96%    708181  3107241   19%   /usr

inodes != disk space, but I'm pretty sure you know that.

I understand at this point you're running around with your arms in the
air, but you've already confirmed one thing: none of your other systems
exhibit this problem.  If this is a production environment, step back a
moment and ask yourself: "just how much time is this worth?"  It might
be better to just newfs the filesystem and be done with it, especially
if this is a one-time-never-seen-before thing.

> I will wait and see if any other list member has any suggestions for me to
> try, but I am now leaning toward scrubbing the system. Oh well.

When you say scrubbing, are you referring to actually formatting/wiping
the system, or are you referring to disk scrubbing?

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |



Re: Large discrepancy in reported disk usage on USR partition

2008-10-30 Thread Mel
On Thursday 30 October 2008 01:42:32 Brendan Hart wrote:
> Hi,
>
> I have inherited some servers running various releases of FreeBSD and I am
> having some trouble with the /usr partition on one of these boxen.
>
> The problem is that there appears to be far more space used on the USR
> partition than there are actual files on the partition. The utility "df -h"
> reports 25GB used (i.e. nearly the whole partition), but "du -x /usr"
> reports only 7.6GB of files.
>
> I have reviewed the FAQ, particularly item 9.24 "The du and df commands
> show different amounts of disk space available. What is going on?".
> However, the suggested cause of the discrepancy (large files already
> unlinked but still held open by active processes) does not appear to be
> true in this case, as the problem is present even after rebooting into
> single user mode.
>
> #: uname -a
> FreeBSD ibisweb4spare.strategicecommerce.com.au 6.1-RELEASE FreeBSD
> 6.1-RELEASE #0: Sun May  7 04:42:56 UTC 2006
> [EMAIL PROTECTED]:/usr/obj/usr/src/sys/SMP  i386
>
> #: df -h
> Filesystem     Size    Used   Avail Capacity  Mounted on
> /dev/aacd0s1a  496M    163M    293M    36%    /
> devfs          1.0K    1.0K      0B   100%    /dev
> /dev/aacd0s1e  496M     15M    441M     3%    /tmp
> /dev/aacd0s1f   28G     25G    1.2G    96%    /usr
> /dev/aacd0s1d  1.9G    429M    1.3G    24%    /var

Is this output untruncated? Is df really df or an alias to 'df -t nonfs'?

> #: du -x -h /usr
> 2.0K    /usr/.snap
>  24M    /usr/bin
> [...]
> 584M    /usr/ports
> 140K    /usr/lost+found
> 7.6G    /usr

Is it possible that an nfs directory got written to /usr at some point in
time? You would only notice this with du if the nfs directory is unmounted.

Unmount it and ls -al /usr/mountpoint should only give you an empty dir.
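
A quick way to convince yourself this can happen (paths illustrative):

  mkdir -p /mnt/demo
  echo hidden > /mnt/demo/bigfile    # written to the *local* directory
  mount server:/export /mnt/demo     # the mount now shadows bigfile
  ls -al /mnt/demo                   # shows only the NFS contents; du
                                     # can't reach bigfile, df still counts it
  umount /mnt/demo                   # bigfile reappears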
-- 
Mel

Problem with today's modular software: they start with the modules
and never get to the software part.


RE: Large discrepancy in reported disk usage on USR partition

2008-10-30 Thread Brendan Hart
>> I took a look at using the smart tools as you suggested, but have now
>> found that the disk in question is a RAID1 set on a DELL PERC 3/Di
>> controller and smartctl does not appear to be the correct tool to
>> access the SMART data for the individual disks.  After a little
>> research, I have found the aaccli tool and used it to get the following
>> information:

> Sadly, that controller does not show you SMART attributes.  This is one of
> the biggest problems with the majority (but not all) of hardware RAID
> controllers -- they give you no access to disk-level things like SMART.
> FreeBSD has support for such (using CAM's pass(4)), but the driver has
> to support/use it, *and* the card firmware has to support it.  At present,
> Areca, 3Ware, and Promise controllers support such; HighPoint might, but
> I haven't confirmed it.  Adaptec does not.

> What you showed tells me nothing about SMART, other than the remote
> possibility it's basing some of its decisions on the "general SMART
> health status", which means jack squat.  I can explain why this is if
> need be, but it's not related to the problem you're having.

Thanks for this additional information. I hadn't understood that there was
far more information behind the simple SMART ok/not ok reported by the PERC
controller.

> Either way, this is just one of many reasons to avoid hardware RAID
> controllers if given the choice.

I have seen some mentions of using gvinum and/or gmirror to achieve the
goal of protecting against a single disk as a single point of failure,
which I believe is the reason that most people, myself included, have
specified hardware RAID in their servers. Is this what you mean by
avoiding hardware RAID?


> I hope these are SCSI disks you're showing here, otherwise I'm not sure
> how the controller is able to get the primary defect count of a SATA or
> SAS disk.  So, assuming the numbers shown are accurate, then yes, I
> don't think there's any disk-level problem.

Yes, they are SCSI disks. Not particularly relevant to this topic, but
interesting: I would have thought that SAS would make the same information
available as SCSI does, as it is a serial bus evolution of SCSI. Is this
thinking incorrect?

> I understand at this point you're running around with your arms in the
> air, but you've already confirmed one thing: none of your other systems
> exhibit this problem.  If this is a production environment, step back a
> moment and ask yourself: "just how much time is this worth?"  It might
> be better to just newfs the filesystem and be done with it, especially
> if this is a one-time-never-seen-before thing.

>> I will wait and see if any other list member has any suggestions for 
>> me to try, but I am now leaning toward scrubbing the system. Oh well.

> When you say scrubbing, are you referring to actually formatting/wiping
> the system, or are you referring to disk scrubbing?

I meant reformatting and reinstalling, as a way to escape the issue without
spending too much more time on it. I would of course like to understand the
problem so as to know what to avoid in the future, but as you point out
above, time is money and it is rapidly approaching the point where it isn't
worth any more effort.

Thanks for all your help.

Best Regards,
Brendan Hart

 




Re: Large discrepancy in reported disk usage on USR partition

2008-10-30 Thread Jeremy Chadwick
On Fri, Oct 31, 2008 at 11:15:15AM +1030, Brendan Hart wrote:
> > What you showed tells me nothing about SMART, other than the remote
> > possibility it's basing some of its decisions on the "general SMART
> > health status", which means jack squat.  I can explain why this is if
> > need be, but it's not related to the problem you're having.
> 
> Thanks for this additional information. I hadn't understood that there was
> far more information behind the simple SMART ok/not ok reported by the PERC
> controller.

Here's an example of some attributes:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   178   175   021    Pre-fail  Always       -       6066
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       50
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000e   200   200   051    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   085   085   000    Old_age   Always       -       11429
 10 Spin_Retry_Count        0x0012   100   253   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0012   100   253   051    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       48
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       33
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       50
194 Temperature_Celsius     0x0022   117   100   000    Old_age   Always       -       33
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   051    Old_age   Offline      -       0

You probably now understand why having access to this information is
useful.  :-)  It's very disappointing that so many RAID controllers
don't provide a way to get at this information; the ones which do I am
very thankful for!

> > Either way, this is just one of many reasons to avoid hardware RAID
> > controllers if given the choice.
> 
> I have seen some mentions of using gvinum and/or gmirror to achieve the
> goal of protecting against a single disk as a single point of failure,
> which I believe is the reason that most people, myself included, have
> specified hardware RAID in their servers. Is this what you mean by
> avoiding hardware RAID?

More or less.  Hardware RAID has some advantages (I can dig up a mail of
mine long ago outlining what the advantages were), but a lot of the time
the controller acts as more of a hindrance than a benefit.  I personally
feel the negatives outweigh the positives, but each person has different
needs and requirements.  There are some controllers which work very well
and provide a great degree of insight (at a disk level) under FreeBSD,
and those are often what I recommend if someone wants to go that route.

I make it sound like I'm the authoritative voice for what a person
should or should not buy -- I'm not.  I predominantly rely on Intel ICHx
on-board controllers with SATA disks, because ICHx works quite well
under FreeBSD (especially with AHCI).

I personally have no experience with gmirror or gvinum, but I do have
experience with ZFS.  (I'll have a little more experience with gmirror
once I have the time to test some reported problems with gmirror and
high interrupt counts when a disk is hot-swapped).

> > I hope these are SCSI disks you're showing here, otherwise I'm not sure
> > how the controller is able to get the primary defect count of a SATA or
> > SAS disk.  So, assuming the numbers shown are accurate, then yes, I
> > don't think there's any disk-level problem.
>
> Yes, they are SCSI disks. Not particularly relevant to this topic, but
> interesting: I would have thought that SAS would make the same information
> available as SCSI does, as it is a serial bus evolution of SCSI. Is this
> thinking incorrect?

I don't have any experience with SAS, so I can't comment on what
features are available on SAS.

Specifically with regards to SMART: historically, SCSI does not provide
the same degree of granularity/detail with attributes as ATA/SATA does.
I do not consider this a negative against SCSI (in fact, I very much like
SCSI).  SAS might provide these details, but I don't know, as I don't
have any SAS disks.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |

RE: Large discrepancy in reported disk usage on USR partition

2008-10-30 Thread Brendan Hart
>> #: df -h
>> Filesystem     Size    Used   Avail Capacity  Mounted on
>> /dev/aacd0s1a  496M    163M    293M    36%    /
>> devfs          1.0K    1.0K      0B   100%    /dev
>> /dev/aacd0s1e  496M     15M    441M     3%    /tmp
>> /dev/aacd0s1f   28G     25G    1.2G    96%    /usr
>> /dev/aacd0s1d  1.9G    429M    1.3G    24%    /var

> Is this output untruncated? Is df really df or an alias to 'df -t nonfs'?

Yes, it really is the untruncated output of "df -h". I also tried the "df -t
nonfs" and it gives exactly the same output as "df". What are you expecting
that is not present in the output?

> Is it possible that an nfs directory got written to /usr at some point in
> time?
> You would only notice this with du if the nfs directory is unmounted.
> Unmount it and ls -al /usr/mountpoint should only give you an empty dir.

Bingo!! That is exactly the problem. An NFS mount was hiding a 17G local dir
which had an old copy of the entire NFS mounted dir. I guess it must have
been written incorrectly to this standby server by RSYNC before the NFS
mount was put in place. I will add an exclusion to rsync to make sure it
does not happen again even if the NFS dir is not mounted.
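
For the record, the exclusion would be something along these lines (host
name abbreviated and illustrative; the exclude path is anchored at the
transfer root, /usr):

  rsync -a --exclude=/home/development/mount/foobar/ \
      primary:/usr/ /usr/

so rsync can never repopulate the mountpoint while the share happens to be
unmounted.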

Thank you for your help, you have saved me much time rebuilding this server.

Best Regards,
Brendan Hart

-
Brendan Hart, Development Manager
Strategic Ecommerce Division
Securepay Pty Ltd
Phone: 08-8274-4000
Fax: 08-8274-1400 
 




Re: Large discrepancy in reported disk usage on USR partition

2008-10-30 Thread Jeremy Chadwick
On Fri, Oct 31, 2008 at 11:50:39AM +1030, Brendan Hart wrote:
> >> #: df -h
> >> Filesystem     Size    Used   Avail Capacity  Mounted on
> >> /dev/aacd0s1a  496M    163M    293M    36%    /
> >> devfs          1.0K    1.0K      0B   100%    /dev
> >> /dev/aacd0s1e  496M     15M    441M     3%    /tmp
> >> /dev/aacd0s1f   28G     25G    1.2G    96%    /usr
> >> /dev/aacd0s1d  1.9G    429M    1.3G    24%    /var
> 
> > Is this output untruncated? Is df really df or an alias to 'df -t nonfs'?
> 
> Yes, it really is the untruncated output of "df -h". I also tried the "df -t
> nonfs" and it gives exactly the same output as "df". What are you expecting
> that is not present in the output?
> 
> > Is it possible that an nfs directory got written to /usr at some point in
> > time?
> > You would only notice this with du if the nfs directory is unmounted.
> > Unmount it and ls -al /usr/mountpoint should only give you an empty dir.
> 
> Bingo!! That is exactly the problem. An NFS mount was hiding a 17G local dir
> which had an old copy of the entire NFS mounted dir. I guess it must have
> been written incorrectly to this standby server by RSYNC before the NFS
> mount was put in place. I will add an exclusion to rsync to make sure it
> does not happen again even if the NFS dir is not mounted.
> 
> Thank you for your help, you have saved me much time rebuilding this server.

Can either of you outline what exactly happened here?  I'm trying to
figure out how an "NFS mount was hiding a 17G local dir", when there's
no NFS mounts shown in the above df output.  This is purely an ignorant
question on my part, but I'm not able to piece together what happened.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |



Re: Large discrepancy in reported disk usage on USR partition

2008-10-30 Thread Kevin Kinsey

Jeremy Chadwick wrote:

> On Fri, Oct 31, 2008 at 11:50:39AM +1030, Brendan Hart wrote:
>
>>>> #: df -h
>>>> Filesystem     Size    Used   Avail Capacity  Mounted on
>>>> /dev/aacd0s1a  496M    163M    293M    36%    /
>>>> devfs          1.0K    1.0K      0B   100%    /dev
>>>> /dev/aacd0s1e  496M     15M    441M     3%    /tmp
>>>> /dev/aacd0s1f   28G     25G    1.2G    96%    /usr
>>>> /dev/aacd0s1d  1.9G    429M    1.3G    24%    /var
>>>
>>> Is this output untruncated? Is df really df or an alias to 'df -t nonfs'?
>>
>> Yes, it really is the untruncated output of "df -h". I also tried the "df -t
>> nonfs" and it gives exactly the same output as "df". What are you expecting
>> that is not present in the output?

I would have to assume he's looking for an NFS mount ;-)

>>> Is it possible that an nfs directory got written to /usr at some point in
>>> time?
>>> You would only notice this with du if the nfs directory is unmounted.
>>> Unmount it and ls -al /usr/mountpoint should only give you an empty dir.
>>
>> Bingo!! That is exactly the problem. An NFS mount was hiding a 17G local dir
>> which had an old copy of the entire NFS mounted dir. I guess it must have
>> been written incorrectly to this standby server by RSYNC before the NFS
>> mount was put in place. I will add an exclusion to rsync to make sure it
>> does not happen again even if the NFS dir is not mounted.
>>
>> Thank you for your help, you have saved me much time rebuilding this server.
>
> Can either of you outline what exactly happened here?  I'm trying to
> figure out how an "NFS mount was hiding a 17G local dir", when there's
> no NFS mounts shown in the above df output.  This is purely an ignorant
> question on my part, but I'm not able to piece together what happened.


Well, it would appear that perhaps Mel also guessed right about df
being aliased?  Just my guess, but, as you mention, no nfs mounts
appear.  I may be mistaken, but I think it's also possible to get
into this sort of situation by mounting a local partition on a 
non-empty mountpoint---at least, it happened to me recently.


Kevin Kinsey
--
A triangle which has an angle of 135 degrees is called an obscene
triangle.


RE: Large discrepancy in reported disk usage on USR partition

2008-10-30 Thread Brendan Hart
Now that you mention it, it *is* strange that the NFS mount was not listed
in the "df" output.

Trying again after a fresh reboot:

#: df -h
Filesystem                       Size    Used   Avail Capacity  Mounted on
/dev/aacd0s1a                    496M    176M    280M    39%    /
devfs                            1.0K    1.0K      0B   100%    /dev
/dev/aacd0s1e                    496M     15M    441M     3%    /tmp
/dev/aacd0s1f                     28G    4.8G     21G    19%    /usr
/dev/aacd0s1d                    1.9G    430M    1.3G    24%    /var
server2:/storage/blah/foo/data/  397G    103G    262G    28%    /usr/home/development/mount/foobar

I guess I must have missed the final line when copying the output when I
first posted to the mailing list. And then when I replied to Mel, I had
already unmounted the NFS dir while attempting the suggested fix, so it did
not show when I ran "df" again to double-check, and I did not realize what
had happened.

I apologise for any confusion caused.

Best Regards,
Brendan Hart

-
Brendan Hart, Development Manager
Strategic Ecommerce Division
Securepay Pty Ltd
Phone: 08-8274-4000
Fax: 08-8274-1400 


-Original Message-
From: Jeremy Chadwick [mailto:[EMAIL PROTECTED] 
Sent: Friday, 31 October 2008 12:02 PM
To: Brendan Hart
Cc: 'Mel'; freebsd-questions@freebsd.org
Subject: Re: Large discrepancy in reported disk usage on USR partition

On Fri, Oct 31, 2008 at 11:50:39AM +1030, Brendan Hart wrote:
> >> #: df -h
> >> Filesystem     Size    Used   Avail Capacity  Mounted on
> >> /dev/aacd0s1a  496M    163M    293M    36%    /
> >> devfs          1.0K    1.0K      0B   100%    /dev
> >> /dev/aacd0s1e  496M     15M    441M     3%    /tmp
> >> /dev/aacd0s1f   28G     25G    1.2G    96%    /usr
> >> /dev/aacd0s1d  1.9G    429M    1.3G    24%    /var
> 
> > Is this output untruncated? Is df really df or an alias to 'df -t nonfs'?
> 
> Yes, it really is the untruncated output of "df -h". I also tried the 
> "df -t nonfs" and it gives exactly the same output as "df". What are 
> you expecting that is not present in the output?
> 
> > Is it possible that an nfs directory got written to /usr at some point
> > in time?
> > You would only notice this with du if the nfs directory is unmounted.
> > Unmount it and ls -al /usr/mountpoint should only give you an empty
> > dir.
> 
> Bingo!! That is exactly the problem. An NFS mount was hiding a 17G
> local dir which had an old copy of the entire NFS mounted dir. I guess
> it must have been written incorrectly to this standby server by RSYNC
> before the NFS mount was put in place. I will add an exclusion to
> rsync to make sure it does not happen again even if the NFS dir is not
> mounted.
> 
> Thank you for your help, you have saved me much time rebuilding this
> server.

Can either of you outline what exactly happened here?  I'm trying to figure
out how an "NFS mount was hiding a 17G local dir", when there's no NFS
mounts shown in the above df output.  This is purely an ignorant question on
my part, but I'm not able to piece together what happened.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |






Re: Large discrepancy in reported disk usage on USR partition

2008-10-31 Thread Mel
On Friday 31 October 2008 02:20:39 Brendan Hart wrote:

> > Is it possible that an nfs directory got written to /usr at some point in
> > time?
> > You would only notice this with du if the nfs directory is unmounted.
> > Unmount it and ls -al /usr/mountpoint should only give you an empty dir.
>
> Bingo!! That is exactly the problem. An NFS mount was hiding a 17G local
> dir which had an old copy of the entire NFS mounted dir. I guess it must
> have been written incorrectly to this standby server by RSYNC before the
> NFS mount was put in place. I will add an exclusion to rsync to make sure
> it does not happen again even if the NFS dir is not mounted.

I used to nfs mount /usr/ports and run a cron job on the local machine. I made 
a file on the local machine:
echo 'This is a mountpoint' > /usr/ports/KEEP_ME_EMPTY

The script would do something like:

#!/bin/sh
# Sentinel still visible => nothing is mounted over /usr/ports yet.
if [ -e /usr/ports/KEEP_ME_EMPTY ]; then
    mount /usr/ports    # do the NFS mount (assumes an fstab entry)
    if [ -e /usr/ports/KEEP_ME_EMPTY ]; then
        # Still visible: the mount failed; give up (or wait and retry).
        exit 1
    fi
fi

Of course it's fragile, but it works for not-so-critical setups.
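
(The bare "mount /usr/ports" in the sketch above assumes an fstab entry
along these lines -- server name and path are illustrative:

  server:/export/ports  /usr/ports  nfs  rw,noauto  0  0

with noauto so that only the script ever makes the mount.)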


-- 
Mel

Problem with today's modular software: they start with the modules
and never get to the software part.