Re: bad NFS/UDP performance

2008-09-27 Thread Danny Braniss
> 
> :>-vfs.nfs.realign_test: 22141777
> :>+vfs.nfs.realign_test: 498351
> :> 
> :>-vfs.nfsrv.realign_test: 5005908
> :>+vfs.nfsrv.realign_test: 0
> :> 
> :>+vfs.nfsrv.commit_miss: 0
> :>+vfs.nfsrv.commit_blks: 0
> :> 
> :> changing them did nothing - or at least with respect to nfs throughput :-)
> :
> :I'm not sure what any of these do, as NFS is a bit out of my league.
> ::-)  I'll be following this thread though!
> :
> :-- 
> :| Jeremy Chadwick    jdc at parodius.com |
> 
> A non-zero nfs_realign_count is bad: it means NFS had to copy the
> mbuf chain to fix the alignment.  nfs_realign_test is just the
> number of times it checked, so nfs_realign_test is irrelevant;
> it's nfs_realign_count that matters.
> 
It's zero, so I guess I'm OK there.
Funny though: on my 'good' machine vfs.nfsrv.realign_test is 5862999,
and on the slow one it's 0. Then again, the good one has been up
for several days.

> Several things can cause NFS payloads to be improperly aligned.
> Anything from older network drivers which can't start DMA on a 
> 2-byte boundary, resulting in the 14-byte encapsulation header 
> causing improper alignment of the IP header & payload, to rpc
> embedded in NFS TCP streams winding up being misaligned.
> 
> Modern network hardware either supports 2-byte-aligned DMA, allowing
> the encapsulation to be 2-byte aligned so the payload winds up being
> 4-byte aligned, or supports DMA chaining, allowing the payload to be
> placed in its own mbuf, or pads, etc.
> 
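To make the alignment arithmetic above concrete, here is a minimal Python sketch. It assumes a plain 14-byte Ethernet header (no VLAN tag) and a 4-byte alignment requirement for the IP header; the function names are made up for illustration and are not kernel or driver APIs.

```python
# Assumption: plain Ethernet framing, 14-byte encapsulation header.
ETHER_HDR_LEN = 14

def ip_header_offset(dma_start: int) -> int:
    """Byte offset of the IP header when the frame is DMA'd at dma_start."""
    return dma_start + ETHER_HDR_LEN

def is_aligned(offset: int, boundary: int = 4) -> bool:
    """True if offset falls on a multiple of boundary."""
    return offset % boundary == 0

for start in (0, 2):
    off = ip_header_offset(start)
    print(f"DMA start {start}: IP header at byte {off}, "
          f"4-byte aligned: {is_aligned(off)}")
```

Starting DMA at offset 0 puts the IP header at byte 14, which is not a multiple of 4; starting at offset 2 puts it at byte 16, which is.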
> --
> 
> One thing I would check is to be sure a couple of nfsiod's are running
> on the client when doing your tests.  If none are running the RPCs wind
> up being more synchronous and less pipelined.  Another thing I would
> check is IP fragment reassembly statistics (for UDP) - there should be
> none for TCP connections no matter what the NFS I/O size selected.
> 
Ahh, nfsiod. It seems that it's now started dynamically! At least none show
when the host is idle; after I run my tests there are 20, with ppid 0.
I need to refresh my NFS knowledge.
How can I see the IP fragment reassembly statistics?

> (It does seem more likely to be scheduler-related, though).
> 

I tend to agree. I tried both ULE and BSD, but the badness is there either way.

>   -Matt
> 

thanks,
danny


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: bad NFS/UDP performance

2008-09-27 Thread Matthew Dillon
:how can I see the IP fragment reassembly statistics?
:
:thanks,
:   danny

netstat -s

Also look for unexpected dropped packets, dropped fragments, and
errors during the test; they are counted in the statistics
as well.
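One way to follow this advice is to save the netstat -s output before and after a test run and compare the fragment counters. The Python sketch below does that on text input. The sample text and its numbers are made up for illustration, and the counter wording is an assumption modelled on typical output; it may differ between releases.

```python
import re

# Illustrative sample only: the numbers are invented and the counter
# wording is assumed, not copied from a real machine.
SAMPLE = """\
ip:
\t48213 total packets received
\t912 fragments received
\t3 fragments dropped (dup or out of space)
\t7 fragments dropped after timeout
\t300 packets reassembled ok
"""

COUNTER = re.compile(r"^\s*(\d+)\s+(.+)$")

def fragment_counters(text):
    """Return {description: count} for fragment/reassembly lines."""
    counters = {}
    for line in text.splitlines():
        m = COUNTER.match(line)
        if m and ("fragment" in m.group(2) or "reassembl" in m.group(2)):
            counters[m.group(2)] = int(m.group(1))
    return counters

def delta(before, after):
    """Per-counter change between two snapshots."""
    return {key: after[key] - before.get(key, 0) for key in after}

for name, value in fragment_counters(SAMPLE).items():
    print(f"{value:8d}  {name}")
```

Calling delta() on a snapshot taken before the test and one taken after shows which counters moved during the run.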

-Matt
Matthew Dillon 
<[EMAIL PROTECTED]>


Re: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY

2008-09-27 Thread Peter Jeremy
On 2008-Sep-26 23:44:17 -0700, Jeremy Chadwick <[EMAIL PROTECTED]> wrote:
>On Fri, Sep 26, 2008 at 10:35:57PM -0700, Derek Kuliński wrote:
>> As far as I know (at least ideally, when write caching is disabled)
...
>FreeBSD atacontrol does not let you toggle such features (although "cap"
>will show you if feature is available and if it's enabled or not).

True but it can be disabled via the loader tunable hw.ata.wc (at
least in theory - apparently some drives don't obey the cache disable
command to make them look better in benchmarks).

>Users using SCSI will most definitely have the ability to disable
>said feature (either via SCSI BIOS or via camcontrol).

Soft-updates plus write caching isn't an issue with tagged queueing
(which is standard for SCSI) because the critical point for
soft-updates is knowing when the data is written to non-volatile
storage - which tagged queuing provides.

-- 
Peter Jeremy
Please excuse any delays as the result of my ISP's inability to implement
an MTA that is either RFC2821-compliant or matches their claimed behaviour.




Re: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY

2008-09-27 Thread Derek Kuliński
Hello Jeremy,

Friday, September 26, 2008, 11:44:17 PM, you wrote:

>> As far as I know (at least ideally, when write caching is disabled)

> Re: write caching: wheelies and burn-outs in empty parking lots
> detected.

> Let's be realistic.  We're talking about ATA and SATA hard disks, hooked
> up to on-board controllers -- these are the majority of users.  Those
> with ATA/SATA RAID controllers (not on-board RAID either; most/all of
> those do not let you disable drive write caching) *might* have a RAID
> BIOS menu item for disabling said feature.

> FreeBSD atacontrol does not let you toggle such features (although "cap"
> will show you if feature is available and if it's enabled or not).

> Users using SCSI will most definitely have the ability to disable
> said feature (either via SCSI BIOS or via camcontrol).  But the majority
> of users are not using SCSI disks, because the majority of users are not
> going to spend hundreds of dollars on a controller followed by hundreds
> of dollars for a small (~74GB) disk.

> Regardless of all of this, end-users should, in no way shape or form,
> be expected to go to great lengths to disable their disk's write cache.
> They will not, I can assure you.  Thus, we must assume: write caching
> on a disk will be enabled, period.  If a filesystem is engineered with
> that fact ignored, then the filesystem is either 1) worthless, or 2)
> serves a very niche purpose and should not be the default filesystem.

> Do we agree?

Yes, but...

In the link you sent to me, someone mentioned that write caching
always creates problems, regardless of OS or filesystem.

There's more below.

>> the data should always be consistent, and all fsck supposed to be
>> doing is to free unreferenced blocks that were allocated.
> fsck does a heck of a lot more than that, and there's no guarantee
> that's all fsck is going to do on a UFS2+SU filesystem.  I'm under the
> impression it does a lot more than just looking for unref'd blocks.

Yes, fsck does a lot more than that. But the whole point of soft
updates is to reduce the work of fsck to deallocating blocks that were
allocated but never referenced.

Anyway, maybe my information is wrong, though the funny thing is that
Soft Updates was mentioned in one of my lectures on Operating Systems.

Apparently the goal of Soft Updates is to always enforce these rules
in a very efficient manner, by reordering the writes:
1. Never point to a data structure before initializing it
2. Never reuse a structure before nullifying pointers to it
3. Never reset last pointer to live structure before setting a new one
4. Always mark free-block bitmap entries as used before making the
   directory entry point to it

The problem comes with disks which, for performance reasons, cache the
data and then write it back to the disk in a different order.
I think that's the reason why it's recommended to disable write caching.
If a disk is reordering the writes, it renders soft updates
useless.

But if the write order is preserved, all data always remains
consistent; the only thing that might appear are blocks that were
marked as being used, but that nothing was pointing to yet.

So (in an ideal situation, when nothing interferes) all fsck needs to do
is scan the filesystem and deallocate those blocks.
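The argument above can be checked with a toy model. The Python sketch below is only an illustration under simplifying assumptions: creating one file is reduced to three writes ordered according to rules 1 and 4 in the list above, and a crash is modelled as only the first k writes reaching the disk. The write names and the classify() helper are invented for this example and say nothing about how UFS is actually implemented.

```python
from itertools import permutations

# Toy model: the three writes needed to create one file, in the order
# implied by rules 1 and 4 above. Names are hypothetical.
WRITES = ("bitmap_marks_block_used",
          "inode_initialized",
          "dirent_points_to_inode")

def classify(on_disk):
    """Classify the on-disk state left behind by a crash."""
    if "dirent_points_to_inode" in on_disk:
        if ("inode_initialized" not in on_disk
                or "bitmap_marks_block_used" not in on_disk):
            return "inconsistent"   # entry points at garbage or a free block
        return "clean"
    if "bitmap_marks_block_used" in on_disk:
        return "leaked"             # block marked used, nothing references it
    return "clean"

def crash_outcomes(order):
    """States reachable if a crash happens after any prefix of order."""
    return {classify(set(order[:k])) for k in range(len(order) + 1)}

print("ordered writes:", sorted(crash_outcomes(WRITES)))
for order in permutations(WRITES):
    if "inconsistent" in crash_outcomes(order):
        print("reordering can break consistency:", " -> ".join(order))
```

With the intended order, a crash leaves either a clean state or a leaked block, which is exactly the case fsck can handle by deallocating. Once the writes may be reordered, some orderings leave a directory entry pointing at an uninitialized or unallocated structure.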

> The system is already up and the filesystems mounted.  If the error in
> question is of such severity that it would impact a user's ability to
> reliably use the filesystem, how do you expect constant screaming on
> the console will help?  A user won't know what it means; there is
> already evidence of this happening (re: mysterious ATA DMA errors which
> still cannot be figured out[6]).

> IMHO, a dirty filesystem should not be mounted until it's been fully
> analysed/scanned by fsck.  So again, people are putting faith into
> UFS2+SU despite actual evidence proving that it doesn't handle all
> scenarios.

Yes, I think the background fsck should be disabled by default, with a
possibility to enable it if the user is sure that nothing will
interfere with soft updates.

> The problem here is that when it was created, it was sort of an
> "experiment".  Now, when someone installs FreeBSD, UFS2 is the default
> filesystem used, and SU are enabled on every filesystem except the root
> fs.  Thus, we have now put ourselves into a situation where said
> feature ***must*** be reliable in all cases.

I think in the worst case it is just as reliable as if it weren't
enabled (the only danger is the background fsck).

> You're also forgetting a huge focus of SU -- snapshots[1].  However, there
> are more than enough facts on the table at this point concluding that
> snapshots are causing more problems[7] than previously expected.  And
> there's further evidence filesystem snapshots shouldn't even be used in
> this way[8].

there's not much to argue about that.

>> Also, if I remember correctly, PJD said that gjournal is performing
>> much better with small files, while softupdates is faster with big
>> ones.

> Okay

Re: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY

2008-09-27 Thread Erik Trulsson
On Fri, Sep 26, 2008 at 11:44:17PM -0700, Jeremy Chadwick wrote:
> On Fri, Sep 26, 2008 at 10:35:57PM -0700, Derek Kuliński wrote:
> > Hello Jeremy,
> > 
> > Friday, September 26, 2008, 10:14:13 PM, you wrote:
> > 
> > >> Actually what's the advantage of having fsck run in background if it
> > >> isn't capable of fixing things?
> > >> Isn't it more dangerous to be it like that? i.e. administrator might
> > >> not notice the problem; also filesystem could break even further...
> > 
> > > This question should really be directed at a set of different folks,
> > > e.g. actual developers of said stuff (UFS2 and soft updates in
> > > specific), because it's opening up a can of worms.
> > 
> > > I believe it has to do with the fact that there is much faith given to
> > > UFS2 soft updates -- the ability to background fsck allows the user to
> > > boot their system and have it up and working (able to log in, etc.) in a
> > > much shorter amount of time[1].  It makes the assumption that "everything
> > > will work just fine", which is faulty.
> > 
> > As far as I know (at least ideally, when write caching is disabled)
> 
> Re: write caching: wheelies and burn-outs in empty parking lots
> detected.
> 
> Let's be realistic.  We're talking about ATA and SATA hard disks, hooked
> up to on-board controllers -- these are the majority of users.  Those
> with ATA/SATA RAID controllers (not on-board RAID either; most/all of
> those do not let you disable drive write caching) *might* have a RAID
> BIOS menu item for disabling said feature.
> 
> FreeBSD atacontrol does not let you toggle such features (although "cap"
> will show you if feature is available and if it's enabled or not).

No, but using 'sysctl hw.ata.wc=0' will quickly and easily let you disable
write caching on all ATA/SATA devices.
This was actually the default setting briefly (back in 4.3 IIRC) but was
reverted due to the performance penalty being considered too severe.


> 
> Users using SCSI will most definitely have the ability to disable
> said feature (either via SCSI BIOS or via camcontrol).  But the majority
> of users are not using SCSI disks, because the majority of users are not
> going to spend hundreds of dollars on a controller followed by hundreds
> of dollars for a small (~74GB) disk.
> 
> Regardless of all of this, end-users should, in no way shape or form,
> be expected to go to great lengths to disable their disk's write cache.
> They will not, I can assure you.  Thus, we must assume: write caching
> on a disk will be enabled, period.  If a filesystem is engineered with
> that fact ignored, then the filesystem is either 1) worthless, or 2)
> serves a very niche purpose and should not be the default filesystem.
> 
> Do we agree?

Sort of, but soft updates does not technically need write caching to be
disabled. It does assume that disks will not 'lie' about whether data has
actually been written to the disk or just to the disk's cache.  Many (most?)
ATA/SATA disks are unreliable in this regard, which means that the consistency
guarantees Soft Updates normally gives for the file system no
longer hold.



Using UFS2+soft updates on standard ATA/SATA disks (with write caching
enabled) connected to a standard disk controller is not a problem (not any
more than any other file system anyway.)

Using background fsck together with the above setup is not recommended
however.  Background fsck will only handle a subset of the errors that a
standard foreground fsck can handle.  In particular it assumes that the soft
updates guarantees of consistency are in place which would mean that there
are only a few non-critical problems that could happen.  With the above
setup those guarantees are not in place, which means that background fsck
can encounter errors it cannot (and will not) fix.






-- 

Erik Trulsson
[EMAIL PROTECTED]


Re: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY

2008-09-27 Thread sthaug
> > IMHO, a dirty filesystem should not be mounted until it's been fully
> > analysed/scanned by fsck.  So again, people are putting faith into
> > UFS2+SU despite actual evidence proving that it doesn't handle all
> > scenarios.
> 
> Yes, I think the background fsck should be disabled by default, with a
> possibility to enable it if the user is sure that nothing will
> interfere with soft updates.

Having been bitten by problems in this area more than once, I now always
disable background fsck. Having it disabled by default has my vote too.

Steinar Haug, Nethelp consulting, [EMAIL PROTECTED]


Re: bad NFS/UDP performance

2008-09-27 Thread Danny Braniss
> David,
> 
> You beat me to it.
> 
> Danny, read the iperf man page:
>    -b, --bandwidth n[KM]
>           set target bandwidth to n bits/sec (default 1 Mbit/sec).  This
>           setting requires UDP (-u).
> 
> The page needs updating, though. It should read "-b, --bandwidth
> n[KMG]". It also does NOT require -u. If you use -b, UDP is assumed.

I did RTFM(*), but when I tried it, it just wouldn't work. I tried again today
and it's actually working - so don't RTFM before coffee!
BTW, even though iperf sucks, netperf UDP tends to bring the server
to its knees.

danny
PS: * - I don't seem to have the iperf man page; all I have is iperf -h




Re: sysctl maxfiles

2008-09-27 Thread Aristedes Maniatis


On 27/09/2008, at 1:02 PM, Jeremy Chadwick wrote:

> Anyway, I'd like to know why you have so many fds open simultaneously in
> the first place.  We're talking over 11,000 fds actively open at once --
> this is not a small number.  What exactly is this machine doing?  Are
> you absolutely certain tuning this higher is justified?  Have you looked
> into the possibility that you have a program which is exhausting fds by
> not closing them when finished?  (Yes, this is quite common; I've seen
> bad Java code cause this problem on Solaris.)



Well, there was a runaway process which looks like it is leaking fds.  
We haven't solved it yet, but the fact that the maxfiles per machine  
and the maxfiles per process were so close together was really causing  
us grief for a while.





> You're asking for trouble setting these values to the equivalent of
> unlimited.  Instead of asking "what would happen", you should be asking
> "why would I need to do that".
>
> Regarding memory implications, the Handbook goes over it.


Unfortunately I've been unable to find it.  While we fix the fd leak  
I'd like to know how high I can push these numbers and not cause other  
problems.
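While hunting a leak like this, it can help to watch a process's descriptor count against its own limit. The Python sketch below is only an illustration: it reads the per-process limit with resource.getrlimit and counts open descriptors by probing with os.fstat. The 4096 probe cap and the deliberately leaked /dev/null descriptors are assumptions made for the demo; it does not read or change the system-wide maxfiles value discussed in this thread.

```python
import os
import resource

def open_fd_count(limit=4096):
    """Count open descriptors in this process by probing 0..limit-1."""
    count = 0
    for fd in range(limit):
        try:
            os.fstat(fd)
        except OSError:
            continue        # not an open descriptor
        count += 1
    return count

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
before = open_fd_count()

# Simulate a leak: open descriptors and "forget" to close them.
leaked = [os.open(os.devnull, os.O_RDONLY) for _ in range(5)]
after = open_fd_count()

print(f"per-process limit: soft={soft} hard={hard}")
print(f"open fds before={before} after={after} (leaked {after - before})")

for fd in leaked:           # clean up the simulated leak
    os.close(fd)
```

Sampling the count periodically from inside a suspect program (or an equivalent external tool) shows whether the number keeps climbing toward the limit.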


Ari Maniatis



-->
ish
http://www.ish.com.au
Level 1, 30 Wilson Street Newtown 2042 Australia
phone +61 2 9550 5001   fax +61 2 9550 4001
GPG fingerprint CBFB 84B4 738D 4E87 5E5C  5EFA EF6A 7D2E 3E49 102A




Re: 7.1-PRERELEASE freezes (IPFW related)

2008-09-27 Thread Christian Laursen
Jeremy Chadwick <[EMAIL PROTECTED]> writes:

> On Fri, Sep 26, 2008 at 06:21:01PM +0200, Christian Laursen wrote:
>> I decided to give 7.1-PRERELEASE a try on one of my machines to find
>> out if there might be any problems I should be aware of.
>> 
>> I quickly ran into problems. After a while the system freezes
>> completely. It seems to be somehow related to the load of the machine
>> as it doesn't seem to happen when it is idle. I built a kernel with
>> software watchdog enabled and enabled watchdog which had the nice
>> effect of turning the freeze into a panic. Hopefully that will be of
>> some help.
>> 

[snip]

> A couple generic things, although I think jhb@ might be able to figure
> out what's going on here:
>
> 1) Is this machine running the latest BIOS available?
> 2) Are you running powerd(8) on this box?
> 3) Does disabling ACPI (it's a menu option when booting) help?
> 4) Does removing "device cpufreq" help?

I tried without ACPI right after writing the previous mail without any
luck.

However I tried turning off various stuff and found the cause of the
problem. When I tried running without my ipfw rules the crashes went
away. I then immediately suspected the rules using uid matching and
those were indeed responsible.

I am now back to running with everything I usually have running on
this machine (my primary desktop) but without the ipfw uid rules and
the machine is rock stable.

I have been running with debug.mpsafenet="0" most likely because I
have been using ipfw uid matching. Has RELENG_7 had significant
changes in this area?

Since I don't need these rules anymore I have just removed
them.

-- 
Christian Laursen


Re: 7.1-PRERELEASE freezes (IPFW related)

2008-09-27 Thread Robert Watson


On Sat, 27 Sep 2008, Christian Laursen wrote:

> I am now back to running with everything I usually have running on this
> machine (my primary desktop) but without the ipfw uid rules and the machine
> is rock stable.
>
> I have been running with debug.mpsafenet="0" most likely because I have been
> using ipfw uid matching. Has RELENG_7 had significant changes in this area?
>
> Since I don't need these rules anymore I have just removed them.


In the last few days, some previously unknown interactions have been
discovered between the rwlock work for udp/tcp performance and ipfw
uid/gid/jail rules.  In essence, there were a number of edge cases where it
turned out ipfw was relying on lock recursion on those locks, and that's no
longer possible.

I've fixed two such edge cases in HEAD and will MFC them shortly, but there is
at least one other known case.  I'm on the fence about whether to continue
playing whack-a-mole, knocking off the bugs as they are discovered, or to fix
it with a hammer (having ipfw and friends check whether the lock is held before
trying to acquire it). If this keeps up, it's the latter for -STABLE and
continuing to fix them as one-off bugs in HEAD.
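The "hammer" described above can be illustrated in user space. The Python sketch below is not FreeBSD kernel code: OwnedLock, pcb_lock, uid_rule_lookup and input_path are hypothetical names, and threading.Lock merely stands in for a non-recursive lock. It shows a callee checking whether the current thread already holds the lock before trying to acquire it.

```python
import threading

class OwnedLock:
    """Non-recursive lock that remembers which thread holds it."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owner = None

    def held_by_me(self):
        return self._owner == threading.get_ident()

    def acquire(self):
        self._lock.acquire()
        self._owner = threading.get_ident()

    def release(self):
        self._owner = None
        self._lock.release()

pcb_lock = OwnedLock()  # hypothetical lock shared by both paths

def uid_rule_lookup():
    """Stand-in for a firewall rule that needs the lock."""
    took_it = not pcb_lock.held_by_me()
    if took_it:                 # only acquire if the caller has not
        pcb_lock.acquire()
    try:
        return "uid checked"
    finally:
        if took_it:
            pcb_lock.release()

def input_path():
    """Caller that already holds the lock when the rule runs."""
    pcb_lock.acquire()
    try:
        return uid_rule_lookup()  # would deadlock without the check
    finally:
        pcb_lock.release()

print(input_path())       # lock already held by the caller
print(uid_rule_lookup())  # lock not held; the rule takes it itself
```

Without the held_by_me() check, the nested acquire in input_path() would block forever, which is the user-space analogue of relying on recursion that the lock no longer allows.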


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: bad NFS/UDP performance

2008-09-27 Thread Robert Watson

On Fri, 26 Sep 2008, Danny Braniss wrote:

> after more testing, it seems it's related to changes made between Aug 4 and
> Aug 29, i.e. a kernel built on Aug 4 works fine, Aug 29 is slow. I'll now try
> and close the gap.


I think this is the best way forward -- skimming August changes, there are a 
number of candidate commits, including retuning of UDP hashes by mav, my 
rwlock changes, changes to mbuf chain handling, etc.


Thanks,

Robert N M Watson
Computer Laboratory
University of Cambridge


Re: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY

2008-09-27 Thread Jeremy Chadwick
On Sat, Sep 27, 2008 at 12:37:50AM -0700, Derek Kuliński wrote:
> Friday, September 26, 2008, 11:44:17 PM, you wrote:
> 
> >> As far as I know (at least ideally, when write caching is disabled)
> 
> > Re: write caching: wheelies and burn-outs in empty parking lots
> > detected.
> 
> > Let's be realistic.  We're talking about ATA and SATA hard disks, hooked
> > up to on-board controllers -- these are the majority of users.  Those
> > with ATA/SATA RAID controllers (not on-board RAID either; most/all of
> > those do not let you disable drive write caching) *might* have a RAID
> > BIOS menu item for disabling said feature.
> 
> > FreeBSD atacontrol does not let you toggle such features (although "cap"
> > will show you if feature is available and if it's enabled or not).
> 
> > Users using SCSI will most definitely have the ability to disable
> > said feature (either via SCSI BIOS or via camcontrol).  But the majority
> > of users are not using SCSI disks, because the majority of users are not
> > going to spend hundreds of dollars on a controller followed by hundreds
> > of dollars for a small (~74GB) disk.
> 
> > Regardless of all of this, end-users should, in no way shape or form,
> > be expected to go to great lengths to disable their disk's write cache.
> > They will not, I can assure you.  Thus, we must assume: write caching
> > on a disk will be enabled, period.  If a filesystem is engineered with
> > that fact ignored, then the filesystem is either 1) worthless, or 2)
> > serves a very niche purpose and should not be the default filesystem.
> 
> > Do we agree?
> 
> Yes, but...
> 
> In the link you sent to me, someone mentioned that write caching
> always creates problems, regardless of OS or filesystem.
> 
> There's more below.
> 
> >> the data should always be consistent, and all fsck supposed to be
> >> doing is to free unreferenced blocks that were allocated.
> > fsck does a heck of a lot more than that, and there's no guarantee
> > that's all fsck is going to do on a UFS2+SU filesystem.  I'm under the
> > impression it does a lot more than just looking for unref'd blocks.
> 
> Yes, fsck does a lot more than that. But the whole point of soft
> updates is to reduce the work of fsck to deallocate allocated blocks.
> 
> Anyway, maybe my information are invalid, though funny thing is that
> Soft Updates was mentioned in one of my lecture on Operating Systems.
> 
> Apparently the goal of Soft Updates is to always enforce those rules
> in very efficient manner, by reordering the writes:
> 1. Never point to a data structure before initializing it
> 2. Never reuse a structure before nullifying pointers to it
> 3. Never reset last pointer to live structure before setting a new one
> 4. Always mark free-block bitmap entries as used before making the
>    directory entry point to it
> 
> The problem comes with disks which for performance reasons cache the
> data and then write it in different order back to the disk.
> I think that's the reason why it's recommended to disable it.
> If a disk is reordering the writes, it renders the soft updates
> useless.
> 
> But if the writing order is preserved, all data remains always
> consistent, the only thing that might appear are blocks that were
> marked as being used, but nothing was pointing to them yet.
> 
> So (in ideal situation, when nothing interferes) all fsck needs to do
> is just to scan the filesystem and deallocate those blocks.
> 
> > The system is already up and the filesystems mounted.  If the error in
> > question is of such severity that it would impact a user's ability to
> > reliably use the filesystem, how do you expect constant screaming on
> > the console will help?  A user won't know what it means; there is
> > already evidence of this happening (re: mysterious ATA DMA errors which
> > still cannot be figured out[6]).
> 
> > IMHO, a dirty filesystem should not be mounted until it's been fully
> > analysed/scanned by fsck.  So again, people are putting faith into
> > UFS2+SU despite actual evidence proving that it doesn't handle all
> > scenarios.
> 
> Yes, I think the background fsck should be disabled by default, with a
> possibility to enable it if the user is sure that nothing will
> interfere with soft updates.
> 
> > The problem here is that when it was created, it was sort of an
> > "experiment".  Now, when someone installs FreeBSD, UFS2 is the default
> > filesystem used, and SU are enabled on every filesystem except the root
> > fs.  Thus, we have now put ourselves into a situation where said
> > feature ***must*** be reliable in all cases.
> 
> I think in the worst case it is just as reliable as if it weren't
> enabled (the only danger is the background fsck).
> 
> > You're also forgetting a huge focus of SU -- snapshots[1].  However, there
> > are more than enough facts on the table at this point concluding that
> > snapshots are causing more problems[7] than previously expected.  And
> > there's further evidence filesyste

Re: bad NFS/UDP performance

2008-09-27 Thread Danny Braniss
> On Fri, 26 Sep 2008, Danny Braniss wrote:
> 
> > after more testing, it seems it's related to changes made between Aug 4 and
> > Aug 29, i.e. a kernel built on Aug 4 works fine, Aug 29 is slow. I'll now try
> > and close the gap.
> 
> I think this is the best way forward -- skimming August changes, there are a 
> number of candidate commits, including retuning of UDP hashes by mav, my 
> rwlock changes, changes to mbuf chain handling, etc.

It's more difficult than I expected.
For one, the kernel date was misleading; the actual source update is the key,
so the window of changes is now 28 July to 19 August. I have the diffs, but
nothing yet seems relevant.

On the other hand, I tried NFS/TCP, and there things seem OK, i.e. the 'good'
and the 'bad' give the same throughput, which seems to point to UDP changes ...
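Closing a window like this is a bisection over source dates. The Python sketch below shows the idea under stated assumptions: the source tree can be checked out as of any date in the window, and is_bad() stands for "build a kernel from that day's source and measure NFS/UDP throughput". The regression date used in the demo is invented purely to exercise the function.

```python
from datetime import date, timedelta

def first_bad(good, bad, is_bad):
    """Return the earliest bad date, given a known-good and a known-bad
    date and assuming every date after the regression is also bad."""
    while (bad - good).days > 1:
        mid = good + timedelta(days=(bad - good).days // 2)
        if is_bad(mid):
            bad = mid       # regression is at mid or earlier
        else:
            good = mid      # regression is after mid
    return bad

# Hypothetical regression date, for demonstration only.
PRETEND_REGRESSION = date(2008, 8, 7)
tested = []

def is_bad(day):
    tested.append(day)      # in real life: checkout, build, benchmark
    return day >= PRETEND_REGRESSION

culprit = first_bad(date(2008, 7, 28), date(2008, 8, 19), is_bad)
print(f"first bad source date: {culprit} after {len(tested)} builds")
```

For the 22-day window mentioned above, this needs about five builds rather than one per day.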

danny




Re: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY

2008-09-27 Thread Oliver Fromme
[EMAIL PROTECTED] wrote:
 > [...]
 > > > IMHO, a dirty filesystem should not be mounted until it's been fully
 > > > analysed/scanned by fsck.  So again, people are putting faith into
 > > > UFS2+SU despite actual evidence proving that it doesn't handle all
 > > > scenarios.
 > > 
 > > Yes, I think the background fsck should be disabled by default, with a
 > > possibility to enable it if the user is sure that nothing will
 > > interfere with soft updates.
 > 
 > Having been bitten by problems in this area more than once, I now always
 > disable background fsck. Having it disabled by default has my vote too.

Just a "me too" here.

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Commercial register: Registergericht Muenchen, HRA 74606,  Management:
secnetix Verwaltungsgesellsch. mbH, Commercial register: Registergericht
München, HRB 125758,  Managing directors: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD services, products and more:  http://www.secnetix.de/bsd

"If you think C++ is not overly complicated, just what is a protected
abstract virtual base pure virtual private destructor, and when was the
last time you needed one?"
-- Tom Cargil, C++ Journal


Re: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY

2008-09-27 Thread Michel Talon
Jeremy Chadwick wrote:

> I believe we're in overall agreement with regards to background_fsck
> (should be disabled by default).

In fact background fsck was introduced for a good reason:
waiting for a full fsck on modern big disks takes far too long.
Similarly, write caching is enabled on ATA disks because
without it performance suffers too much. My humble opinion is that you
attach far, far too much importance to reliability in this game.
There are many reasons why corruption may happen in files, most
of them hardware related (bad RAM, overheating chipset, etc.).
Hence you can never be assured that your data is perfectly reliable
(except perhaps with ZFS's permanent checksumming); all you have is some
probability of reliability. I think that for most people what is
important is a good balance between the risk of catastrophic failure
(which is always there, and is increased only a little by background fsck)
and performance and ease of use. The FreeBSD developers have
chosen this middle ground, with good reason, in my opinion. People
who are more concerned with the reliability of their data, and
want to pay the price, can always disable background fsck, maintain
backups, etc. Personally I would run away from a system requiring
hours of fsck before being able to run multiuser. Neither Windows,
with NTFS, nor Linux, with ext3, reiserfs, xfs, jfs, etc., requires
any form of scandisk or fsck. Demanding that full fsck be the default in
FreeBSD is akin to alienating a large fraction of users who have greener
pastures easily available. The same goes for asking to disable write
caching on the disks. So for most people there is some probability of one
day getting the UNEXPECTED SOFT UPDATE INCONSISTENCY message. They will run
a full fsck on that occasion, which is not a terrible thing. In many years
of FreeBSD use, it has happened to me a small number of times, and I have
yet to lose a file, at least that I have noticed.

-- 

Michel TALON



7.1-PRERELEASE sporadically panicking with fatal trap 12

2008-09-27 Thread John L. Templer
I'm running 7.1-PRERELEASE, with /usr/src and /usr/ports last csup-ed 
just a few days ago.  After being up for about a day or so the system 
will panic because of a page fault.  I'm not completely sure, but it 
seems that the system is more stable when gdm and gnome are disabled in 
rc.conf.  At least it stayed up for several days when I did that.


I've run memtest several times, so I'm pretty confident it's not a 
memory problem.  Also the stack trace is always the same, so I'm 
thinking it's not hardware related.


I've attached a stack trace from kgdb, and the output from dmesg.  I'd 
appreciate any help you could give me with this.
/var/crash# kgdb -n 5
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd"...

Unread portion of the kernel message buffer:
acd1: WARNING - READ_TOC read data overrun 18>12


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x188
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0xc0782714
stack pointer           = 0x28:0xe52aec00
frame pointer           = 0x28:0xe52aec18
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 18 (swi6: task queue)
trap number             = 12
panic: page fault
cpuid = 0
Uptime: 8h10m38s
Physical memory: 1779 MB
Dumping 195 MB: 180 164 148 132 116 100 84 68 52 36 20 4

Reading symbols from /boot/kernel/sound.ko...Reading symbols from 
/boot/kernel/sound.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/sound.ko
Reading symbols from /boot/kernel/snd_cmi.ko...Reading symbols from 
/boot/kernel/snd_cmi.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/snd_cmi.ko
Reading symbols from /boot/kernel/acpi.ko...Reading symbols from 
/boot/kernel/acpi.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/acpi.ko
Reading symbols from /boot/kernel/linux.ko...Reading symbols from 
/boot/kernel/linux.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/linux.ko
Reading symbols from /usr/local/modules/fuse.ko...done.
Loaded symbols for /usr/local/modules/fuse.ko
Reading symbols from /boot/kernel/mach64.ko...Reading symbols from 
/boot/kernel/mach64.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/mach64.ko
Reading symbols from /boot/kernel/drm.ko...Reading symbols from 
/boot/kernel/drm.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/drm.ko
#0  doadump () at pcpu.h:196
196 pcpu.h: No such file or directory.
in pcpu.h
(kgdb) backtrace
#0  doadump () at pcpu.h:196
#1  0xc078fae7 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
#2  0xc078fda9 in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:572
#3  0xc0aa174c in trap_fatal (frame=0xe52aebc0, eva=392)
at /usr/src/sys/i386/i386/trap.c:939
#4  0xc0aa19d0 in trap_pfault (frame=0xe52aebc0, usermode=0, eva=392)
at /usr/src/sys/i386/i386/trap.c:852
#5  0xc0aa238c in trap (frame=0xe52aebc0) at /usr/src/sys/i386/i386/trap.c:530
#6  0xc0a8827b in calltrap () at /usr/src/sys/i386/i386/exception.s:159
#7  0xc0782714 in _mtx_lock_sleep (m=0xc4ff804c, tid=3302734576, opts=0, 
file=0x0, line=0) at /usr/src/sys/kern/kern_mutex.c:339
#8  0xc078ed66 in _sema_post (sema=0xc4ff804c, file=0x0, line=0)
at /usr/src/sys/kern/kern_sema.c:79
#9  0xc0513350 in ata_completed (context=0xc4ff8000, dummy=1)
at /usr/src/sys/dev/ata/ata-queue.c:481
#10 0xc07c2e15 in taskqueue_run (queue=0xc4dbab80)
at /usr/src/sys/kern/subr_taskqueue.c:282
#11 0xc07c3123 in taskqueue_swi_run (dummy=0x0)
at /usr/src/sys/kern/subr_taskqueue.c:324
#12 0xc076f8db in ithread_loop (arg=0xc4dadb30)
at /usr/src/sys/kern/kern_intr.c:1088
#13 0xc076c449 in fork_exit (callout=0xc076f720 , 
arg=0xc4dadb30, frame=0xe52aed38) at /usr/src/sys/kern/kern_fork.c:804
---Type <return> to continue, or q <return> to quit---
#14 0xc0a882f0 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:264
(kgdb) up 7
#7  0xc0782714 in _mtx_lock_sleep (m=0xc4ff804c, tid=3302734576, opts=0, 
file=0x0, line=0) at /usr/src/sys/kern/kern_mutex.c:339
339 owner = (struct thread *)(v & ~MTX_FLAGMASK);
(kgdb) list
334  * If the owner is running on another CPU, spin until the
335  * owner stops running or the state of the lock changes.
336  */
337 v = m->mtx_lock;
338 if (v != MTX_UNOWNED) {
339 owner = (struct thread *)(v & ~MTX_FLAGMASK);
340 
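
One detail worth checking in the trace above: the "fault virtual address
= 0x188" in the trap message and the eva=392 passed to trap_pfault() are
the same number.  An address that small sits in the first page of memory,
which is what a read through a NULL structure pointer plus a field offset
typically looks like.  Whether that is what happened inside
_mtx_lock_sleep() here is an assumption, not something the dump proves.
A minimal Python sketch of the arithmetic (the helper name is made up for
illustration):

```python
PAGE_SIZE = 4096  # i386 page size; used here only to define "the first page"


def in_null_page(addr, page_size=PAGE_SIZE):
    """True if addr lies in the first page, i.e. NULL plus a small offset."""
    return 0 <= addr < page_size


fault_va = int("0x188", 16)  # "fault virtual address" from the trap message
eva = 392                    # "eva" argument shown in frames #3 and #4

print(fault_va == eva)         # both describe the same faulting address
print(in_null_page(fault_va))  # the address is within the first page
```

By contrast, the instruction pointer 0xc0782714 is an ordinary kernel
address and is nowhere near the first page.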

Re: bad NFS/UDP performance

2008-09-27 Thread Eli Dart



Danny Braniss wrote:
> I know, but I get about 1mgb, which seems somewhat low :-(

If you don't tell iperf how much bandwidth to use for a UDP test, it 
defaults to 1Mbps.  See the -b option.

http://dast.nlanr.net/projects/Iperf/iperfdocs_1.7.0.php#bandwidth

--eli

--
Eli Dart
ESnet Network Engineering Group
Lawrence Berkeley National Laboratory
PGP Key fingerprint = C970 F8D3 CFDD 8FFF 5486 343A 2D31 4478 5F82 B2B3


Recommendations for servers running SATA drives

2008-09-27 Thread Charles Sprickman
I'm forking the thread on fsck/soft-updates in hopes of getting some 
practical advice based on the discussion here of background fsck, 
softupdates and write-caching on SATA drives.


On Fri, 26 Sep 2008, Jeremy Chadwick wrote:


> Let's be realistic.  We're talking about ATA and SATA hard disks, hooked
> up to on-board controllers -- these are the majority of users.  Those
> with ATA/SATA RAID controllers (not on-board RAID either; most/all of
> those do not let you disable drive write caching) *might* have a RAID
> BIOS menu item for disabling said feature.


While I would love to deploy every server with SAS, that's not practical 
in many cases, especially for light-duty servers that are not being pushed 
very hard.  I am taking my chances with multiple affordable drives and 
gmirror where I cannot throw in a 3Ware card.  I imagine that many 
non-desktop FreeBSD users are doing the same considering you can fetch a 
decent 1U box with plenty of storage for not much more than $1K.  I assume 
many here are in agreement on this point -- just making it clear that the 
bargain crowd is not some weird edge case in the userbase...



> Regardless of all of this, end-users should, in no way shape or form,
> be expected to go to great lengths to disable their disk's write cache.
> They will not, I can assure you.  Thus, we must assume: write caching
> on a disk will be enabled, period.  If a filesystem is engineered with
> that fact ignored, then the filesystem is either 1) worthless, or 2)
> serves a very niche purpose and should not be the default filesystem.


Arguments about defaults aside, this is my first question.  If I've got a 
server with multiple SATA drives mirrored with gmirror, is turning on 
write-caching a good idea?  What kind of performance impact should I 
expect?  What is the relationship between caching, soft-updates, and 
either NCQ or TCQ?


Here's an example of a Seagate, trimmed for brevity:

Protocol              Serial ATA v1.0
device model          ST3160811AS

Feature                        Support  Enable  Value     Vendor
write cache                    yes      yes
read ahead                     yes      yes
Native Command Queuing (NCQ)   yes      -       31/0x1F
Tagged Command Queuing (TCQ)   no       no      31/0x1F

TCQ is clearly not supported, NCQ seems to be supported, but I don't know 
how to tell if it's actually enabled or not.  Write-caching is currently 
on.


The tradeoff is apparently performance vs. more reliable recovery should 
the machine lose power, smoke itself, etc., but all I've seen is anecdotal 
evidence of how bad performance gets.


FWIW, this machine in particular had its mainboard go up in smoke last 
week.  One drive was too far gone for gmirror to rebuild it without doing 
a "forget" and "insert".  The remaining drive was too screwy for 
background fsck, but a manual check in single-user left me with no real 
surprises or problems.



> The system is already up and the filesystems mounted.  If the error in
> question is of such severity that it would impact a user's ability to
> reliably use the filesystem, how do you expect constant screaming on
> the console will help?  A user won't know what it means; there is
> already evidence of this happening (re: mysterious ATA DMA errors which
> still cannot be figured out[6]).
>
> IMHO, a dirty filesystem should not be mounted until it's been fully
> analysed/scanned by fsck.  So again, people are putting faith into
> UFS2+SU despite actual evidence proving that it doesn't handle all
> scenarios.


I'll ask, but it seems like the consensus here is that background fsck, 
while the default, is best left disabled.  The cases where it might make 
sense are:


-desktop systems
-servers that have incredibly huge filesystems (and even there being able 
to selectively background fsck filesystems might be helpful)


The first example is obvious, people want a fast-booting desktop.  The 
second is trading long fsck times in single-user for some uncertainty.



> The problem here is that when it was created, it was sort of an
> "experiment".  Now, when someone installs FreeBSD, UFS2 is the default
> filesystem used, and SU are enabled on every filesystem except the root
> fs.  Thus, we have now put ourselves into a situation where said
> feature ***must*** be reliable in all cases.
>
> You're also forgetting a huge focus of SU -- snapshots[1].  However, there
> are more than enough facts on the table at this point concluding that
> snapshots are causing more problems[7] than previously expected.  And
> there's further evidence filesystem snapshots shouldn't even be used in
> this way[8].


...


> Filesystems have to be reliable; data integrity is focus #1, and cannot
> be sacrificed.  Users and administrators *expect* a filesystem to be
> reliable.  No one is going to keep using a filesystem if it has
> disadvantages which can result in data loss or "waste of administrative
> time" (which I believe is what's occurring here).


The softupdates question seems tied quite closely to the wr

Re: sysctl maxfiles

2008-09-27 Thread Miroslav Lachman

Jeremy Chadwick wrote:
> On Sat, Sep 27, 2008 at 11:10:01AM +1000, Aristedes Maniatis wrote:
>
>> By default FreeBSD 7.0 shipped with the sysctls set to:
>>
>> kern.maxfiles: 12328
>> kern.maxfilesperproc: 11095

[...]

> Anyway, I'd like to know why you have so many fds open simultaneously in
> the first place.  We're talking over 11,000 fds actively open at once --
> this is not a small number.  What exactly is this machine doing?  Are
> you absolutely certain tuning this higher is justified?  Have you looked
> into the possibility that you have a program which is exhausting fds by
> not closing them when finished?  (Yes, this is quite common; I've seen
> bad Java code cause this problem on Solaris.)


I can imagine a webhosting machine running Apache virtualhosts. With each 
virtual host using 3 logfiles (access log, error log, IO log), that is 
"only" about 4000 domains (virtualhosts), which is not so uncommon 
these days ;)


I don't know which files are "really" open in the sense of 
kern.maxfiles. I have a webserver with about 100 hosted domains, and here 
are some numbers:


[EMAIL PROTECTED] ~/# fstat -u www | wc -l
9931
[EMAIL PROTECTED] ~/# fstat -u root | wc -l
 718
[EMAIL PROTECTED] ~/# fstat | grep httpd | wc -l
6379
[EMAIL PROTECTED] ~/# fstat | grep httpd | wc -l
6002
[EMAIL PROTECTED] ~/# fstat -u www | wc -l
4691
[EMAIL PROTECTED] ~/# sysctl kern.openfiles
kern.openfiles: 846

All of the above were taken within a few seconds.

Can somebody explain the difference between kern.openfiles and fstat?

Miroslav Lachman


Re: Recommendations for servers running SATA drives

2008-09-27 Thread Jeremy Chadwick
On Sat, Sep 27, 2008 at 03:16:11PM -0400, Charles Sprickman wrote:
> On Fri, 26 Sep 2008, Jeremy Chadwick wrote:
>> Let's be realistic.  We're talking about ATA and SATA hard disks, hooked
>> up to on-board controllers -- these are the majority of users.  Those
>> with ATA/SATA RAID controllers (not on-board RAID either; most/all of
>> those do not let you disable drive write caching) *might* have a RAID
>> BIOS menu item for disabling said feature.
>
> While I would love to deploy every server with SAS, that's not practical  
> in many cases, especially for light-duty servers that are not being 
> pushed very hard.  I am taking my chances with multiple affordable drives 
> and gmirror where I cannot throw in a 3Ware card.  I imagine that many  
> non-desktop FreeBSD users are doing the same considering you can fetch a  
> decent 1U box with plenty of storage for not much more than $1K.  I 
> assume many here are in agreement on this point -- just making it clear 
> that the bargain crowd is not some weird edge case in the userbase...

I'm in full agreement here.  As much as I love SCSI (and I sincerely do)
it's (IMHO unjustifiably) overpriced, simply because "it can be".  You'd
expect the price of SCSI to decrease over the years, but it hasn't; it's
become part of a niche market, primarily intended for large businesses
with cash to blow.  As I said, I love SCSI, the protocol is excellent,
and it's very well-supported all over the place -- and though I have
no personal experience with SAS, it appears to be equally excellent,
yet the price is comparable to SCSI.

Even at my place of work we use SATA disks in our filers.  I suppose this
is justified in the sense that a disk failure there will be less painful
than it would be in a single or dual-disk server, so saving money is
legitimate since RAID-5 (or whatever) is in use.  But with regards to
our server boxes, either single or dual SATA disks are now being used,
rather than SCSI.  I haven't asked our datacenter and engineering folks
why we've switched, but gut feeling says "saving money".

>> Regardless of all of this, end-users should, in no way shape or form,
>> be expected to go to great lengths to disable their disk's write cache.
>> They will not, I can assure you.  Thus, we must assume: write caching
>> on a disk will be enabled, period.  If a filesystem is engineered with
>> that fact ignored, then the filesystem is either 1) worthless, or 2)
>> serves a very niche purpose and should not be the default filesystem.
>
> Arguments about defaults aside, this is my first questions.  If I've got 
> a server with multiple SATA drives mirrored with gmirror, is turning on  
> write-caching a good idea?  What kind of performance impact should I  
> expect?  What is the relationship between caching, soft-updates, and  
> either NCQ or TCQ?
>
> Here's an example of a Seagate, trimmed for brevity:
>
> Protocol              Serial ATA v1.0
> device model          ST3160811AS
>
> Feature                        Support  Enable  Value     Vendor
> write cache                    yes      yes
> read ahead                     yes      yes
> Native Command Queuing (NCQ)   yes      -       31/0x1F
> Tagged Command Queuing (TCQ)   no       no      31/0x1F
>
> TCQ is clearly not supported, NCQ seems to be supported, but I don't know 
> how to tell if it's actually enabled or not.  Write-caching is currently  
> on.

Actually, no -- FreeBSD ata(4) does not support NCQ.  I believe there
are some unofficial patches (or even a PR) floating around which are for
testing, but out of the box, it lacks support.  The hyphen you see under
the Enable column is supposed to signify that (I feel it's badly placed;
it should say "notsupp" or "unsupp" or something like that.  Hyphen is
too vague).

The NCQ support patches might require AHCI as well, I forget.  It's been
a while.

> The tradeoff is apparently performance vs. more reliable recovery should  
> the machine lose power, smoke itself, etc., but all I've seen is 
> anecdotal evidence of how bad performance gets.
>
> FWIW, this machine in particular had it's mainboard go up in smoke last  
> week.  One drive was too far gone for gmirror to rebuild it without doing 
> a "forget" and "insert".  The remaining drive was too screwy for  
> background fsck, but a manual check in single-user left me with no real  
> suprises or problems.

As long as the array rebuilt fine, I believe small quirks are
acceptable.  Scenarios where the array *doesn't* rebuild properly when a
new disk is added are of great concern (and in the case of some features
such as Intel MatrixRAID, the FreeBSD bugs are so severe that you are
liable to lose data in such scenarios.  MatrixRAID != gmirror, of
course).

This also leads me a little off-topic -- when it comes to disk
replacements, administrators want to be able to do this without taking
the system down.  There are problems with this, but it often depends
greatly on hardware and BIOS configuration.


Re: sysctl maxfiles

2008-09-27 Thread Jeremy Chadwick
On Sat, Sep 27, 2008 at 10:14:09PM +0200, Miroslav Lachman wrote:
> Jeremy Chadwick wrote:
>> On Sat, Sep 27, 2008 at 11:10:01AM +1000, Aristedes Maniatis wrote:
>>
>>> By default FreeBSD 7.0 shipped with the sysctls set to:
>>>
>>> kern.maxfiles: 12328
>>> kern.maxfilesperproc: 11095
>
> [...]
>
>> Anyway, I'd like to know why you have so many fds open simultaneously in
>> the first place.  We're talking over 11,000 fds actively open at once --
>> this is not a small number.  What exactly is this machine doing?  Are
>> you absolutely certain tuning this higher is justified?  Have you looked
>> into the possibility that you have a program which is exhausting fds by
>> not closing them when finished?  (Yes, this is quite common; I've seen
>> bad Java code cause this problem on Solaris.)
>
> I can imagine some webhosting machine running Apache virtualhosts. Each  
> virtual host using 3 logfiles (access log, error log, IO log) so it is  
> "only" about 4000 domains (virtualhosts) which is not so uncommon in  
> these days ;)

We're a web/shell hosting provider who used to do it that way.  It
became unreasonable/impossible to manage.  Also, if said logfiles are
being placed in directories where users of those virtualhosts can remove
the files (and make symlinks to other places), that's a security hole
(because Apache opens webserver logfiles as root).

The way we do it is much more resource-friendly: log everything to a
single logfile, then every night split the logfile up (based on the
CustomLog %v parameter) into per-vhost log files.  Apache comes with a
script to do this called split-logfile.
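
To make the idea concrete, here is a rough Python sketch of the nightly
split.  It is not Apache's split-logfile script and not our production
setup; it only assumes, as an illustration, that the combined log was
written with %v as the first field of every line.  The function name and
the sample lines are hypothetical.

```python
from collections import defaultdict


def split_by_vhost(lines):
    """Group log lines by their first field, assumed to be the %v vhost name."""
    per_vhost = defaultdict(list)
    for line in lines:
        vhost, _, rest = line.rstrip("\n").partition(" ")
        if rest:  # skip lines that have no vhost field in front
            per_vhost[vhost.lower()].append(rest)
    return dict(per_vhost)


combined = [
    "a.example 192.0.2.1 - - [27/Sep/2008:00:00:01 +0000] \"GET / HTTP/1.0\" 200 10",
    "b.example 192.0.2.2 - - [27/Sep/2008:00:00:02 +0000] \"GET / HTTP/1.0\" 200 20",
    "A.example 192.0.2.3 - - [27/Sep/2008:00:00:03 +0000] \"GET /x HTTP/1.0\" 404 0",
]

for vhost, entries in sorted(split_by_vhost(combined).items()):
    print(vhost, len(entries))
```

The point of the design is that the webserver keeps one log descriptor
open regardless of how many vhosts it serves, and the per-vhost files are
produced offline.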

> I don't know what files are "really" open in the meaning of  
> kern.maxfiles. I have webserver with about 100 hosted domains and there  
> is some numbers:
>
> [EMAIL PROTECTED] ~/# fstat -u www | wc -l
> 9931

I don't think this is an accurate portrait of the number of open files.
The number is going to be too high; I believe entries that contain
FD=jail/mmap/root/text/tr/wd are not actual descriptors (are they?)

> [EMAIL PROTECTED] ~/# fstat -u root | wc -l
>  718
> [EMAIL PROTECTED] ~/# fstat | grep httpd | wc -l
> 6379
> [EMAIL PROTECTED] ~/# fstat | grep httpd | wc -l
> 6002
> [EMAIL PROTECTED] ~/# fstat -u www | wc -l
> 4691
> [EMAIL PROTECTED] ~/# sysctl kern.openfiles
> kern.openfiles: 846
>
> All above taken within few seconds.
>
> Can somebody explain the difference between kern.openfiles and fstat?

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |



Re: sysctl maxfiles

2008-09-27 Thread Oliver Fromme
Miroslav Lachman wrote:
 > I don't know what files are "really" open in the meaning of 
 > kern.maxfiles. I have webserver with about 100 hosted domains and there 
 > is some numbers:
 > 
 > [EMAIL PROTECTED] ~/# fstat -u www | wc -l
 >  9931
 > [EMAIL PROTECTED] ~/# fstat -u root | wc -l
 >   718
 > [EMAIL PROTECTED] ~/# fstat | grep httpd | wc -l
 >  6379
 > [EMAIL PROTECTED] ~/# fstat | grep httpd | wc -l
 >  6002
 > [EMAIL PROTECTED] ~/# fstat -u www | wc -l
 >  4691
 > [EMAIL PROTECTED] ~/# sysctl kern.openfiles
 > kern.openfiles: 846
 > 
 > All above taken within few seconds.
 > 
 > Can somebody explain the difference between kern.openfiles and fstat?

Those are different things:  fstat lists file descriptors,
while kern.openfiles counts open file objects, which are
often shared among processes.

For example, when the apache master process forks its
children, the children inherit the open file objects from
the parent process.  While every child has its own set of
file descriptors (listed separately by fstat), they
reference the same underlying open file objects, so they
don't contribute separately to kern.openfiles.

In the same way, fstat lists stdin + stdout + stderr for
almost every process, but in most cases they are not
separate file objects because they were inherited from the
parent process.
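
The distinction can be seen from userland with a small experiment.  The
following Python sketch (function name chosen here for illustration) uses
dup() rather than fork(), but the effect is the same: two descriptors
that reference one open file object share its file offset, while a second
open() of the same path creates a separate file object with its own
offset.

```python
import os
import tempfile


def shared_offset_demo():
    """Return (offset seen through a dup'ed fd, offset seen through a fresh open)."""
    fd_a, path = tempfile.mkstemp()
    try:
        os.write(fd_a, b"0123456789")
        os.lseek(fd_a, 0, os.SEEK_SET)
        fd_b = os.dup(fd_a)                # new descriptor, same open file object
        fd_c = os.open(path, os.O_RDONLY)  # new descriptor, new open file object
        try:
            os.read(fd_a, 4)               # moves the offset of the shared object
            dup_offset = os.lseek(fd_b, 0, os.SEEK_CUR)
            fresh_offset = os.lseek(fd_c, 0, os.SEEK_CUR)
        finally:
            os.close(fd_b)
            os.close(fd_c)
    finally:
        os.close(fd_a)
        os.unlink(path)
    return dup_offset, fresh_offset


print(shared_offset_demo())
```

Here fd_a and fd_b are two descriptors backed by one file object, and
fd_c is backed by a second one.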

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Handelsregister: Registergericht Muenchen, HRA 74606,  Geschäftsfuehrung:
secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht Mün-
chen, HRB 125758,  Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD-Dienstleistungen, -Produkte und mehr:  http://www.secnetix.de/bsd

"In My Egoistical Opinion, most people's C programs should be indented
six feet downward and covered with dirt."
-- Blair P. Houghton


Re: sysctl maxfiles

2008-09-27 Thread Peter Jeremy
On 2008-Sep-27 22:14:09 +0200, Miroslav Lachman <[EMAIL PROTECTED]> wrote:
>[EMAIL PROTECTED] ~/# fstat -u www | wc -l
> 9931
>[EMAIL PROTECTED] ~/# fstat -u root | wc -l
>  718
>[EMAIL PROTECTED] ~/# fstat | grep httpd | wc -l
> 6379
>[EMAIL PROTECTED] ~/# fstat | grep httpd | wc -l
> 6002
>[EMAIL PROTECTED] ~/# fstat -u www | wc -l
> 4691
>[EMAIL PROTECTED] ~/# sysctl kern.openfiles
>kern.openfiles: 846

kern.openfiles reflects the total number of open file structures
within the kernel, whereas fstat (and lsof) report both open files
and vnodes associated with each process.  The differences are
1) File structures are shared via fork() etc so the same file structure
   can be reported multiple times.
2) fstat reports executable name, working directory and root

Open files in fstat can be detected because they have numeric values
(possibly with a '*' appended) in the FD column.  Unfortunately, there
doesn't appear to be any easy way to detect shared file structures
(for inode-based files) using either fstat or lsof.

In the case of apache, there are at least 6 file structures shared
by each httpd process (and it looks like it might be about 15).
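
As a rough illustration of the FD-column rule above, the following Python
sketch counts only those fstat lines whose fourth column is a number,
optionally followed by '*'.  The sample text is made up for the example
and only imitates the column layout of fstat output; shared file
structures are still counted once per process, as noted above.

```python
import re

# A descriptor entry has a purely numeric FD column, possibly with a '*' suffix.
NUMERIC_FD = re.compile(r"^\d+\*?$")

# Hypothetical sample in the layout of fstat output, for illustration only.
SAMPLE = """\
USER     CMD          PID   FD MOUNT      INUM MODE         SZ|DV R/W
www      httpd       1234 root /             2 drwxr-xr-x     512  r
www      httpd       1234   wd /             2 drwxr-xr-x     512  r
www      httpd       1234 text /usr      98765 -r-xr-xr-x  400000  r
www      httpd       1234    0 /dev         20 crw-rw-rw-    null  r
www      httpd       1234    3* internet stream tcp c4ff8000
"""


def count_real_descriptors(fstat_output):
    """Count lines whose FD column (4th field) is numeric, skipping root/wd/text."""
    count = 0
    for line in fstat_output.splitlines():
        fields = line.split()
        if len(fields) >= 4 and NUMERIC_FD.match(fields[3]):
            count += 1
    return count


print(count_real_descriptors(SAMPLE))
```

Feeding it the real output of "fstat -u www" instead of SAMPLE should
give a number lower than the plain "wc -l" count.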

-- 
Peter Jeremy
Please excuse any delays as the result of my ISP's inability to implement
an MTA that is either RFC2821-compliant or matches their claimed behaviour.


Warning: known instability using ipfw "uid" rules

2008-09-27 Thread Robert Watson


An FYI: In the past couple of days, presumably as testing of 7.x becomes more 
widespread, I've seen several reports of instability resulting from ipfw 
credential rules.  For those unfamiliar with them, these allow the matching of 
packets in ipfw rules based on the credentials of the socket that generated 
them, or the credentials of the socket that likely will receive them.


These problems are a side effect of eliminating support for lock recursion on 
inpcbinfo locks as part of the UDP performance optimization work for 7.1. 
There are two minor TCP fixes, and a more serious ipfw bug fix, in the queue 
to be MFC'd in the next couple of days.  Once they're fixed, please make sure 
any further problems with deadlocks or panics involving ipfw rules are brought 
to my attention.


Thanks, and apologies for any inconvenience -- this issue did not arise during 
testing in HEAD over the course of several months, but fortunately appears 
fairly straightforward to resolve now that it's a bit better understood.


Robert N M Watson
Computer Laboratory
University of Cambridge


7.1-PRERELEASE : bad network performance (nfe0)

2008-09-27 Thread Arno J. Klaassen


Hello,

I have serious network performance problems on a brand-new HP notebook
based on a Turion X2; I have only run a 7.1-BETA CD and 7-STABLE on
this thing.

Scp-ing ports.tgz from a rock-stable 7-STABLE server to it gives :

  # scp -p ports.tgz [EMAIL PROTECTED]:/tmp/
ports.tgz 100%   98MB  88.7KB/s   18:49 

(doing the same thing by copying from an nfs-mounted disk even
 takes more than an hour ...)
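
As a sanity check on the scp figures, the average rate can be recomputed
from the size and elapsed time it printed.  This is only a small Python
sketch; it assumes scp's "MB" and "KB" are multiples of 1024, and the
function name is made up.

```python
def average_rate_kib(size_mib, minutes, seconds):
    """Average transfer rate in KiB/s for size_mib MiB moved in minutes:seconds."""
    elapsed = minutes * 60 + seconds
    return size_mib * 1024 / elapsed


# 98MB in 18:49, as reported by scp above
print(round(average_rate_kib(98, 18, 49), 1))
```

The result is within a fraction of a KB/s of the 88.7KB/s that scp
reported; the small difference presumably comes from the file size being
rounded to 98MB.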


Running top(1) alongside just shows the box 100% idle :

  PID USERNAME PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
   12 root     171 ki31     0K    16K CPU0   0  38:55 100.00% idle: cpu0
   11 root     171 ki31     0K    16K RUN    1  38:55 100.00% idle: cpu1
   13 root     -32    -     0K    16K WAIT   0   0:02  0.00% swi4: clock sio
   29 root     -68    -     0K    16K -      0   0:00  0.00% nfe0 taskq
   34 root     -64    -     0K    16K WAIT   1   0:00  0.00% irq23: atapci1
 1853 root       8    0  7060K  1920K wait   0   0:00  0.00% sh
  878 nono      44    0  8112K  2288K CPU1   1   0:00  0.00% top
  884 root       8    -     0K    16K -      1   0:00  0.00% nfsiod 0
    4 root      -8    -     0K    16K -      1   0:00  0.00% g_down
   16 root     -16    -     0K    16K -      1   0:00  0.00% yarrow
   46 root      20    -     0K    16K syncer 0   0:00  0.00% syncer
    3 root      -8    -     0K    16K -      0   0:00  0.00% g_up
   30 root     -68    -     0K    16K -      0   0:00  0.00% fw0_taskq


I tested :

  Update Bios
  ULE /4BSD
  PREEMPTION on/off
  PREEMPTION + IPI_PREEMPTION
  hw.nfe.msi[x]_disable=1

None of these seems to make any difference.

I put two tcpdumps (server and client during another scp(1) ) on 
  http://bare.snv.jussieu.fr/temp/tcpdump-s1518.server
  http://bare.snv.jussieu.fr/temp/tcpdump-s1518.client

I'm far from an expert on TCP/IP, but wireshark "expert info" shows
lots of sequences like :

  TCP Previous segment lost
  TCP Duplicate ACK 1
  TCP Window update
  TCP Duplicate ACK 2
  TCP Duplicate ACK 3
  TCP Duplicate ACK 4
  TCP Duplicate ACK 5
  TCP Fast retransmission (suspected)
  TCP ...
  TCP Out-of-Order segment
  TCP ...
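
For readers unfamiliar with the pattern: a run of duplicate ACKs followed
by a fast retransmission is what standard TCP loss recovery looks like,
with the sender retransmitting once it has seen the third duplicate ACK
for the same sequence number.  The Python sketch below is a deliberately
simplified model of that rule (it ignores window updates, SACK and
timers), and the ACK numbers in it are invented, not taken from the
captures.

```python
def fast_retransmit_points(acks, threshold=3):
    """Return the indices at which the threshold-th duplicate ACK arrives."""
    points = []
    last_ack = None
    dup_count = 0
    for index, ack in enumerate(acks):
        if ack == last_ack:
            dup_count += 1
            if dup_count == threshold:
                points.append(index)  # a sender would fast-retransmit here
        else:
            last_ack = ack
            dup_count = 0
    return points


# One new ACK for 2000, then five duplicates of it, then progress again.
acks = [1000, 2000, 2000, 2000, 2000, 2000, 2000, 3000]
print(fast_retransmit_points(acks))
```

Seeing this sequence repeated many times suggests segments are being lost
or reordered somewhere between the two hosts, which would be consistent
with low throughput on an otherwise idle box.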


As usual, feel free to contact me for further info/tests.

Thanx, Arno

# uname -a
FreeBSD mv 7.1-PRERELEASE FreeBSD 7.1-PRERELEASE #0: Fri Sep 26 15:06:07 CEST 
2008 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/PAVILLON  amd64

# pciconf -lcv (bits)
[EMAIL PROTECTED]:0:6:0:class=0x02 card=0x30cf103c chip=0x045010de 
rev=0xa3 hdr=0x00
vendor = 'Nvidia Corp'
device = 'MCP65 Ethernet'
class  = network
subclass   = ethernet
cap 01[44] = powerspec 2  supports D0 D1 D2 D3  current D0


# dmesg -a

Copyright (c) 1992-2008 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.1-PRERELEASE #0: Fri Sep 26 15:06:07 CEST 2008
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/PAVILLON
Timecounter "i8254" frequency 1193250 Hz quality 0
CPU: AMD Turion(tm) 64 X2 Mobile Technology TL-62 (2109.70-MHz K8-class CPU)
  Origin = "AuthenticAMD"  Id = 0x60f82  Stepping = 2
  
Features=0x178bfbff
  Features2=0x2001
  AMD Features=0xea500800
  AMD Features2=0x11f
  Cores per package: 2
usable memory = 3210813440 (3062 MB)
avail memory  = 3104542720 (2960 MB)
ACPI APIC Table: 
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
ioapic0  irqs 0-23 on motherboard
kbd1 at kbdmux0
ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
acpi0:  on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
ACPI Error (dsopcode-0671): Field [I9MN] at 544 exceeds Buffer [IORT] size 464 
(bits) [20070320]
ACPI Error (psparse-0626): Method parse/execution failed 
[\\_SB_.PCI0.LPC0.PMIO._CRS] (Node 0xff00011f50a0), AE_AML_BUFFER_LIMIT
ACPI Error (uteval-0309): Method execution failed [\\_SB_.PCI0.LPC0.PMIO._CRS] 
(Node 0xff00011f50a0), AE_AML_BUFFER_LIMIT
can't fetch resources for \\_SB_.PCI0.LPC0.PMIO - AE_AML_BUFFER_LIMIT
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0
acpi_ec0:  port 0x62,0x66 on acpi0
acpi_hpet0:  iomem 0xfed0-0xfed003ff on acpi0
Timecounter "HPET" frequency 2500 Hz quality 900
acpi_acad0:  on acpi0
battery0:  on acpi0
acpi_lid0:  on acpi0
pcib0:  port 0xcf8-0xcff on acpi0
pci0:  on pcib0
pci0:  at device 0.0 (no driver attached)
isab0:  port 0x1d00-0x1dff at device 1.0 on pci0
isa0:  on isab0
pci0:  at device 1.1 (no driver attached)
pci0:  at device 1.3 (no driver attached)
ohci0:  mem 0xf2486000-0xf2486fff irq 18 at 
device 2.0 on pci0
ohci0: [GIANT-LOCKED]
ohci0: [ITHREAD]
usb0: OHCI version 1.0, legacy support
usb0:  on ohci0
usb0: USB revision 1.0
uhub0:  on usb0
uhub0: 10 ports with 10 removable, self powered
ehci0:  mem 0xf24880

Re: sysctl maxfiles

2008-09-27 Thread Gary Palmer
On Sat, Sep 27, 2008 at 07:05:08PM +1000, Aristedes Maniatis wrote:
> 
> On 27/09/2008, at 1:02 PM, Jeremy Chadwick wrote:
> 
> >Anyway, I'd like to know why you have so many fds open  
> >simultaneously in
> >the first place.  We're talking over 11,000 fds actively open at  
> >once --
> >this is not a small number.  What exactly is this machine doing?  Are
> >you absolutely certain tuning this higher is justified?  Have you  
> >looked
> >into the possibility that you have a program which is exhausting fds  
> >by
> >not closing them when finished?  (Yes, this is quite common; I've seen
> >bad Java code cause this problem on Solaris.)
> 
> 
> Well, there was a runaway process which looks like it is leaking fds.  
> We haven't solved it yet, but the fact that the maxfiles per machine  
> and the maxfiles per process were so close together was really causing  
> us grief for a while.
> 
> 
> 
> >You're asking for trouble setting these values to the equivalent of
> >unlimited.  Instead of asking "what would happen", you should be  
> >asking
> >"why would I need to do that".
> >
> >Regarding memory implications, the Handbook goes over it.
> 
> Unfortunately I've been unable to find it.  While we fix the fd leak  
> I'd like to know how high I can push these numbers and not cause other  
> problems.

At least one port recommends you set

kern.maxfiles="4"

in /boot/loader.conf.  I think it's one of the GNOME ports.  I'm pretty
confident you can run that without too many problems, and maybe go higher,
but if you really want to know the limit, it's probably kernel memory and
that will depend on your workload.

Solving the fd leak is by far the safest path.  Note that tracking
that many files is probably affecting your application performance
in addition to hurting the system.

Regards,

Gary


Re: sysctl maxfiles

2008-09-27 Thread Aristedes Maniatis


On 28/09/2008, at 8:18 AM, Gary Palmer wrote:


> At least one port recommends you set
>
> kern.maxfiles="4"
>
> in /boot/loader.conf.  I think its one of the GNOME ports.  I'm pretty
> confident you can run that without too many problems, and maybe go higher,
> but if you really want to know the limit its probably kernel memory and
> that will depend on your workload.


I guess then I should ask the question a different way. How much  
memory does each fd use and which pool of memory does it come from?  
This is ZFS if that makes any difference.


Or asked a different way, if I set the number to 200,000 and some  
rogue process used 190,000 fds, then what bad thing would happen to  
the system? If any.




> Solving the fd leak is by far the safest path.  Note that tracking
> that many files is probably affecting your application performance
> in addition to hurting the system.


Absolutely. We are working on it. But general Unix principles are that  
a non-root user should not be able to get Unix to a non-functional  
state. It appears that this is a very simple path to DoS, particularly  
since with the default settings it is easy for one process to use up  
all available fds and leave no more for anyone to be able to log in.
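
One way to keep a single runaway process well away from the system-wide
limit is to give it a lower per-process descriptor limit of its own.  The
Python sketch below only illustrates the standard getrlimit/setrlimit
mechanism from inside a process; it is not something proposed earlier in
this thread, the function name is made up, and on FreeBSD the value is in
any case bounded by kern.maxfilesperproc.

```python
import resource


def cap_open_files(limit):
    """Lower this process's soft RLIMIT_NOFILE to at most `limit`; return (soft, hard)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Never raise the soft limit, and never exceed the hard limit.
    finite = [v for v in (soft, hard) if v != resource.RLIM_INFINITY]
    new_soft = min([limit] + finite)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft, hard = cap_open_files(256)
print(soft)
```

A process capped this way gets an error from open() once it reaches its
own limit, long before it can use up the descriptors everyone else needs.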



Ari Maniatis



--
ish
http://www.ish.com.au
Level 1, 30 Wilson Street Newtown 2042 Australia
phone +61 2 9550 5001   fax +61 2 9550 4001
GPG fingerprint CBFB 84B4 738D 4E87 5E5C  5EFA EF6A 7D2E 3E49 102A




Re: sysctl maxfiles

2008-09-27 Thread Miroslav Lachman

Peter Jeremy wrote:

On 2008-Sep-27 22:14:09 +0200, Miroslav Lachman <[EMAIL PROTECTED]> wrote:


[EMAIL PROTECTED] ~/# fstat -u www | wc -l
   9931
[EMAIL PROTECTED] ~/# fstat -u root | wc -l
    718
[EMAIL PROTECTED] ~/# fstat | grep httpd | wc -l
   6379
[EMAIL PROTECTED] ~/# fstat | grep httpd | wc -l
   6002
[EMAIL PROTECTED] ~/# fstat -u www | wc -l
   4691
[EMAIL PROTECTED] ~/# sysctl kern.openfiles
kern.openfiles: 846



kern.openfiles reflects the total number of open file structures
within the kernel, whereas fstat (and lsof) report both open files
and vnodes associated with each process.  The differences are
1) File structures are shared via fork() etc so the same file structure
   can be reported multiple times.
2) fstat also reports each process's executable (text), working directory
   (wd) and root directory, none of which is an open file structure.

Open files in fstat can be detected because they have numeric values
(possibly with a '*' appended) in the FD column.  Unfortunately, there
doesn't appear to be any easy way to detect shared file structures
(for inode-based files) using either fstat or lsof.
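
To illustrate the FD-column rule, here is a rough sketch that splits fstat rows into numeric descriptors and everything else (text, wd, root and so on). It assumes the usual fstat layout in which FD is the fourth whitespace-separated field and command names contain no spaces. The sample rows are invented for illustration and are not taken from the machine discussed here.

```python
def count_fstat_entries(fstat_output):
    """Return (numeric_fd_rows, other_rows) for fstat-style text."""
    open_fds = 0
    other = 0
    for line in fstat_output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row.
        if len(fields) < 4 or fields[0] == "USER":
            continue
        # A trailing '*' marks a descriptor; strip it before testing.
        fd = fields[3].rstrip("*")
        if fd.isdigit():
            open_fds += 1
        else:
            other += 1
    return open_fds, other


# Made-up sample in fstat's column layout (hypothetical values).
SAMPLE = """\
USER     CMD          PID   FD MOUNT      INUM MODE         SZ|DV R/W
www      httpd       1234 root /             2 drwxr-xr-x     512  r
www      httpd       1234   wd /             2 drwxr-xr-x     512  r
www      httpd       1234 text /usr      48211 -r-xr-xr-x  512344  r
www      httpd       1234    0 /dev         12 crw-rw-rw-    null  r
www      httpd       1234    3* internet stream tcp c4a2b000
"""

if __name__ == "__main__":
    print(count_fstat_entries(SAMPLE))  # (2, 3)
```

Even the numeric count overstates kern.openfiles, because descriptors inherited across fork() are listed once per process.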

In the case of apache, there are at least 6 file structures shared
by each httpd process (and it looks like it might be about 15).
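
The sharing described above can be observed directly: after fork(), parent and child descriptors refer to the same file structure, so they share one file offset. The sketch below assumes a POSIX system; the function name and the temporary file are illustrative only.

```python
import os
import tempfile


def offset_after_child_read(nbytes=5):
    """Fork, let the child read nbytes, and return the parent's file offset."""
    with tempfile.TemporaryFile() as tmp:
        tmp.write(b"0123456789")
        tmp.flush()
        fd = tmp.fileno()
        os.lseek(fd, 0, os.SEEK_SET)
        pid = os.fork()
        if pid == 0:
            # Child: reading advances the offset stored in the shared
            # file structure, not a private copy.
            os.read(fd, nbytes)
            os._exit(0)
        os.waitpid(pid, 0)
        # Parent: the offset has moved although the parent never read.
        return os.lseek(fd, 0, os.SEEK_CUR)


if __name__ == "__main__":
    print(offset_after_child_read(5))  # 5
```

A tool that walks each process's descriptor table would list this one file structure twice, once for the parent and once for the child, which is why per-process totals can far exceed kern.openfiles.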


Thank you for your explanations (Jeremy Chadwick, Oliver Fromme, Peter
Jeremy).

Now it makes sense to me.

Miroslav Lachman


Request for testing - top 3.8b1 in the base system

2008-09-27 Thread Edwin Groothuis
I have made an update for the top(1) utility in the FreeBSD base
system to get it from the 3.5b12 version to the 3.8b1 version.

I have tried it on the amd64 architecture on FreeBSD -current and
FreeBSD 7.0 and on the i386 architecture on FreeBSD 7.0.

The big new features are a line in the upper part with kernel statistics
(context switches, traps, interrupts, faults, etc.) and the FLG table
(if your window is big enough).

Some features specific to FreeBSD (dual display (press m), threaded
processes, and jails) have been ported to 3.8b1.

The biggest fix (AFAICT) is the TIME and CPU table for threaded
processes, which are now calculated properly.

The new code can be found at
http://www.mavetju.org/~edwin/freebsd-top-3.8b1-A.tar.gz
Go to 3.8b1/usr.sbin/top and run "make" there to produce the binary,
then run it via "./top".

Please report any issues with it (compile time, run time) and a way
to reproduce them (if possible). Thanks for your help!

Edwin

-- 
Edwin Groothuis
[EMAIL PROTECTED]
http://www.mavetju.org