Re: bad NFS/UDP performance
> :>-vfs.nfs.realign_test: 22141777
> :>+vfs.nfs.realign_test: 498351
> :>
> :>-vfs.nfsrv.realign_test: 5005908
> :>+vfs.nfsrv.realign_test: 0
> :>
> :>+vfs.nfsrv.commit_miss: 0
> :>+vfs.nfsrv.commit_blks: 0
> :>
> :> changing them did nothing - or at least with respect to nfs throughput :-)
> :
> :I'm not sure what any of these do, as NFS is a bit out of my league.
> ::-) I'll be following this thread though!
> :
> :--
> :| Jeremy Chadwick  jdc at parodius.com |
>
> A non-zero nfs_realign_count is bad, it means NFS had to copy the
> mbuf chain to fix the alignment. nfs_realign_test is just the
> number of times it checked.

So nfs_realign_test is irrelevant; it's nfs_realign_count that matters.
It's zero, so I guess I'm ok there. Funny though: on my 'good' machine,
vfs.nfsrv.realign_test is 5862999, and on the slow one it's 0 - but then
again the good one has been up for several days.

> Several things can cause NFS payloads to be improperly aligned.
> Anything from older network drivers which can't start DMA on a
> 2-byte boundary, resulting in the 14-byte encapsulation header
> causing improper alignment of the IP header & payload, to rpc
> embedded in NFS TCP streams winding up being misaligned.
>
> Modern network hardware either supports 2-byte-aligned DMA, allowing
> the encapsulation to be 2-byte aligned so the payload winds up being
> 4-byte aligned, or supports DMA chaining allowing the payload to be
> placed in its own mbuf, or pad, etc.
>
> --
>
> One thing I would check is to be sure a couple of nfsiod's are running
> on the client when doing your tests. If none are running the RPCs wind
> up being more synchronous and less pipelined. Another thing I would
> check is IP fragment reassembly statistics (for UDP) - there should be
> none for TCP connections no matter what the NFS I/O size selected.

Ahh, nfsiod - it seems that it's now dynamically started! At least none
show when the host is idle; after I run my tests there are 20, with
ppid 0. Need to refresh my NFS knowledge.

How can I see the IP fragment reassembly statistics?

> (It does seem more likely to be scheduler-related, though).

Tend to agree. I tried both ULE and 4BSD, but the badness is there.

> -Matt

thanks,
	danny
___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: bad NFS/UDP performance
:how can I see the IP fragment reassembly statistics?
:
:thanks,
:	danny

netstat -s

Also look for unexpected dropped packets, dropped fragments, and errors
during the test and such; they are counted in the statistics as well.

	-Matt
	Matthew Dillon <[EMAIL PROTECTED]>
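Matt's suggestion is easy to script. A minimal sketch, assuming the counter wording of `netstat -s -p ip` on a 7.x box (the sample lines in the here-doc are hypothetical, just to show what the filter keeps; in a real run you would pipe the live command output instead and compare the counters before and after the test):

```shell
# Filter the fragment-reassembly counters out of `netstat -s -p ip`
# output. In a real run:
#   netstat -s -p ip | grep -i fragment
grep -i 'fragment' <<'EOF'
        1528 total packets received
        12 fragments received
        0 fragments dropped (dup or out of space)
        3 packets reassembled ok
EOF
```

With UDP NFS and a large I/O size, each RPC spans several fragments, so these counters climbing during a test is expected; what matters is that the dropped-fragment lines stay near zero.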
Re: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY
On 2008-Sep-26 23:44:17 -0700, Jeremy Chadwick <[EMAIL PROTECTED]> wrote:
>On Fri, Sep 26, 2008 at 10:35:57PM -0700, Derek Kuliński wrote:
>> As far as I know (at least ideally, when write caching is disabled) ...
>FreeBSD atacontrol does not let you toggle such features (although "cap"
>will show you if feature is available and if it's enabled or not).

True, but it can be disabled via the loader tunable hw.ata.wc (at least
in theory - apparently some drives don't obey the cache disable command,
to make them look better in benchmarks).

>Users using SCSI will most definitely have the ability to disable
>said feature (either via SCSI BIOS or via camcontrol).

Soft updates plus write caching isn't an issue with tagged queueing
(which is standard for SCSI) because the critical point for soft updates
is knowing when the data is written to non-volatile storage - which
tagged queueing provides.

--
Peter Jeremy
Please excuse any delays as the result of my ISP's inability to
implement an MTA that is either RFC2821-compliant or matches their
claimed behaviour.
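Concretely, the tunable Peter mentions goes in loader.conf. A minimal sketch, assuming you want the cache-disable request applied to all ATA/SATA devices at boot (verify afterwards with `sysctl hw.ata.wc`, and check what the drive itself reports with `atacontrol cap ad0`):

```shell
# /boot/loader.conf -- ask the ata(4) driver to disable the drive
# write cache at boot (note: some drives reportedly ignore the
# cache-disable command)
hw.ata.wc="0"
```

As noted elsewhere in the thread, setting it via sysctl on a running system has also been reported to work; the loader.conf route makes it take effect from boot.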
Re: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY
Hello Jeremy,

Friday, September 26, 2008, 11:44:17 PM, you wrote:

>> As far as I know (at least ideally, when write caching is disabled)

> Re: write caching: wheelies and burn-outs in empty parking lots
> detected.
>
> Let's be realistic. We're talking about ATA and SATA hard disks, hooked
> up to on-board controllers -- these are the majority of users. Those
> with ATA/SATA RAID controllers (not on-board RAID either; most/all of
> those do not let you disable drive write caching) *might* have a RAID
> BIOS menu item for disabling said feature.
>
> FreeBSD atacontrol does not let you toggle such features (although "cap"
> will show you if feature is available and if it's enabled or not).
>
> Users using SCSI will most definitely have the ability to disable
> said feature (either via SCSI BIOS or via camcontrol). But the majority
> of users are not using SCSI disks, because the majority of users are not
> going to spend hundreds of dollars on a controller followed by hundreds
> of dollars for a small (~74GB) disk.
>
> Regardless of all of this, end-users should, in no way shape or form,
> be expected to go to great lengths to disable their disk's write cache.
> They will not, I can assure you. Thus, we must assume: write caching
> on a disk will be enabled, period. If a filesystem is engineered with
> that fact ignored, then the filesystem is either 1) worthless, or 2)
> serves a very niche purpose and should not be the default filesystem.
>
> Do we agree?

Yes, but... In the link you sent to me, someone mentioned that write
caching always creates problems, regardless of the OS or filesystem.
There's more below.

>> the data should always be consistent, and all fsck supposed to be
>> doing is to free unreferenced blocks that were allocated.

> fsck does a heck of a lot more than that, and there's no guarantee
> that's all fsck is going to do on a UFS2+SU filesystem. I'm under the
> impression it does a lot more than just looking for unref'd blocks.
Yes, fsck does a lot more than that. But the whole point of soft updates
is to reduce the work of fsck to deallocating allocated blocks.

Anyway, maybe my information is out of date, though the funny thing is
that Soft Updates was mentioned in one of my lectures on Operating
Systems. Apparently the goal of Soft Updates is to always enforce these
rules, in a very efficient manner, by reordering the writes:

1. Never point to a data structure before initializing it
2. Never reuse a structure before nullifying pointers to it
3. Never reset the last pointer to a live structure before setting a new one
4. Always mark free-block bitmap entries as used before making the
   directory entry point to them

The problem comes with disks which, for performance reasons, cache the
data and then write it back to the disk in a different order. I think
that's the reason why it's recommended to disable the cache. If a disk
reorders the writes, it renders soft updates useless. But if the write
order is preserved, all data remains consistent at all times; the only
thing that might appear are blocks that were marked as used but that
nothing was pointing to yet. So (in the ideal situation, when nothing
interferes) all fsck needs to do is scan the filesystem and deallocate
those blocks.

> The system is already up and the filesystems mounted. If the error in
> question is of such severity that it would impact a user's ability to
> reliably use the filesystem, how do you expect constant screaming on
> the console will help? A user won't know what it means; there is
> already evidence of this happening (re: mysterious ATA DMA errors which
> still cannot be figured out[6]).
>
> IMHO, a dirty filesystem should not be mounted until it's been fully
> analysed/scanned by fsck. So again, people are putting faith into
> UFS2+SU despite actual evidence proving that it doesn't handle all
> scenarios.
Yes, I think the background fsck should be disabled by default, with the
possibility to enable it if the user is sure that nothing will interfere
with soft updates.

> The problem here is that when it was created, it was sort of an
> "experiment". Now, when someone installs FreeBSD, UFS2 is the default
> filesystem used, and SU are enabled on every filesystem except the root
> fs. Thus, we have now put ourselves into a situation where said
> feature ***must*** be reliable in all cases.

I think in the worst case it is just as reliable as if it weren't
enabled (the only danger is the background fsck).

> You're also forgetting a huge focus of SU -- snapshots[1]. However, there
> are more than enough facts on the table at this point concluding that
> snapshots are causing more problems[7] than previously expected. And
> there's further evidence filesystem snapshots shouldn't even be used in
> this way[8].

There's not much to argue about there.

>> Also, if I remember correctly, PJD said that gjournal is performing
>> much better with small files, while softupdates is faster with big
>> ones.

> Okay
Re: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY
On Fri, Sep 26, 2008 at 11:44:17PM -0700, Jeremy Chadwick wrote:
> On Fri, Sep 26, 2008 at 10:35:57PM -0700, Derek Kuliński wrote:
> > Hello Jeremy,
> >
> > Friday, September 26, 2008, 10:14:13 PM, you wrote:
> >
> > >> Actually what's the advantage of having fsck run in background if it
> > >> isn't capable of fixing things?
> > >> Isn't it more dangerous to leave it like that? i.e. administrator might
> > >> not notice the problem; also the filesystem could break even further...
> >
> > > This question should really be directed at a set of different folks,
> > > e.g. actual developers of said stuff (UFS2 and soft updates in
> > > specific), because it's opening up a can of worms.
> >
> > > I believe it has to do with the fact that there is much faith given to
> > > UFS2 soft updates -- the ability to background fsck allows the user to
> > > boot their system and have it up and working (able to log in, etc.) in a
> > > much shorter amount of time[1]. It makes the assumption that "everything
> > > will work just fine", which is faulty.
> >
> > As far as I know (at least ideally, when write caching is disabled)
>
> Re: write caching: wheelies and burn-outs in empty parking lots
> detected.
>
> [...]
>
> FreeBSD atacontrol does not let you toggle such features (although "cap"
> will show you if feature is available and if it's enabled or not).

No, but using 'sysctl hw.ata.wc=0' will quickly and easily let you
disable write caching on all ATA/SATA devices. This was actually the
default setting briefly (back in 4.3, IIRC) but was reverted due to the
performance penalty being considered too severe.
> Users using SCSI will most definitely have the ability to disable
> said feature (either via SCSI BIOS or via camcontrol). But the majority
> of users are not using SCSI disks, because the majority of users are not
> going to spend hundreds of dollars on a controller followed by hundreds
> of dollars for a small (~74GB) disk.
>
> Regardless of all of this, end-users should, in no way shape or form,
> be expected to go to great lengths to disable their disk's write cache.
> They will not, I can assure you. Thus, we must assume: write caching
> on a disk will be enabled, period. If a filesystem is engineered with
> that fact ignored, then the filesystem is either 1) worthless, or 2)
> serves a very niche purpose and should not be the default filesystem.
>
> Do we agree?

Sort of, but soft updates does not technically need write caching to be
disabled. It does assume that disks will not 'lie' about whether data
has actually been written to the disk or just to the disk's cache. Many
(most?) ATA/SATA disks are unreliable in this regard, which means that
the guarantees soft updates normally gives about consistency of the
file system can no longer be upheld.

Using UFS2+soft updates on standard ATA/SATA disks (with write caching
enabled) connected to a standard disk controller is not a problem (not
any more than with any other file system, anyway). Using background fsck
together with the above setup is not recommended, however. Background
fsck will only handle a subset of the errors that a standard foreground
fsck can handle. In particular it assumes that the soft updates
guarantees of consistency are in place, which would mean that only a few
non-critical problems could occur. With the above setup those guarantees
are not in place, which means that background fsck can encounter errors
it cannot (and will not) fix.
--
Erik Trulsson
[EMAIL PROTECTED]
Re: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY
> > IMHO, a dirty filesystem should not be mounted until it's been fully
> > analysed/scanned by fsck. So again, people are putting faith into
> > UFS2+SU despite actual evidence proving that it doesn't handle all
> > scenarios.
>
> Yes, I think the background fsck should be disabled by default, with a
> possibility to enable it if the user is sure that nothing will
> interfere with soft updates.

Having been bitten by problems in this area more than once, I now always
disable background fsck. Having it disabled by default has my vote too.

Steinar Haug, Nethelp consulting, [EMAIL PROTECTED]
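For anyone wanting to follow Steinar's practice, this is a one-line rc.conf setting; a minimal sketch using the standard rc.conf knob:

```shell
# /etc/rc.conf -- always run fsck in the foreground after an unclean
# shutdown, so filesystems are fully checked before being used
background_fsck="NO"
```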
Re: bad NFS/UDP performance
> David,
>
> You beat me to it.
>
> Danny, read the iperf man page:
>	-b, --bandwidth n[KM]
>	set target bandwidth to n bits/sec (default 1 Mbit/sec). This
>	setting requires UDP (-u).
>
> The page needs updating, though. It should read "-b, --bandwidth
> n[KMG]". It also does NOT require -u. If you use -b, UDP is assumed.

I did RTFM(*), but when I tried it, it just wouldn't work. I tried again
today and it's actually working - so don't RTFM before coffee!

BTW, even though iperf sucks, netperf UDP tends to bring the server down
to its knees.

danny

PS: * - I don't seem to have the iperf man page; all I have is iperf -h
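For reference, a hedged sketch of the UDP invocation being discussed (the host name is a placeholder; `-b` syntax per the iperf 1.7 docs quoted above, with `-u` given explicitly even though `-b` is said to imply it):

```shell
# On the server:
iperf -s -u
# On the client: push 100 Mbit/s of UDP toward the server for 10 s
iperf -c nfs-server -u -b 100M -t 10
```

Without `-b`, a UDP test silently runs at the 1 Mbit/s default, which looks exactly like "bad UDP throughput".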
Re: sysctl maxfiles
On 27/09/2008, at 1:02 PM, Jeremy Chadwick wrote:

> Anyway, I'd like to know why you have so many fds open simultaneously
> in the first place. We're talking over 11,000 fds actively open at
> once -- this is not a small number. What exactly is this machine
> doing? Are you absolutely certain tuning this higher is justified?
> Have you looked into the possibility that you have a program which is
> exhausting fds by not closing them when finished? (Yes, this is quite
> common; I've seen bad Java code cause this problem on Solaris.)

Well, there was a runaway process which looks like it is leaking fds. We
haven't solved it yet, but the fact that the maxfiles per machine and
the maxfiles per process were so close together was really causing us
grief for a while.

> You're asking for trouble setting these values to the equivalent of
> unlimited. Instead of asking "what would happen", you should be asking
> "why would I need to do that". Regarding memory implications, the
> Handbook goes over it.

Unfortunately I've been unable to find it. While we fix the fd leak I'd
like to know how high I can push these numbers and not cause other
problems.

Ari Maniatis

--> ish
http://www.ish.com.au
Level 1, 30 Wilson Street Newtown 2042 Australia
phone +61 2 9550 5001 fax +61 2 9550 4001
GPG fingerprint CBFB 84B4 738D 4E87 5E5C 5EFA EF6A 7D2E 3E49 102A
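As a sketch of the headroom idea discussed above (keeping the per-process cap well below the global cap so a single leaking process cannot exhaust the whole machine), with purely illustrative numbers, not recommendations:

```shell
# /etc/sysctl.conf -- illustrative values only
kern.maxfiles=65536          # global fd cap for the whole system
kern.maxfilesperproc=32768   # keep well below kern.maxfiles
```

While hunting the leak, current consumption can be watched with `sysctl kern.openfiles`.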
Re: 7.1-PRERELEASE freezes (IPFW related)
Jeremy Chadwick <[EMAIL PROTECTED]> writes:

> On Fri, Sep 26, 2008 at 06:21:01PM +0200, Christian Laursen wrote:
>> I decided to give 7.1-PRERELEASE a try on one of my machines to find
>> out if there might be any problems I should be aware of.
>>
>> I quickly ran into problems. After a while the system freezes
>> completely. It seems to be somehow related to the load of the machine
>> as it doesn't seem to happen when it is idle. I built a kernel with
>> software watchdog enabled and enabled watchdog which had the nice
>> effect of turning the freeze into a panic. Hopefully that will be of
>> some help.
>> [snip]
>
> A couple generic things, although I think jhb@ might be able to figure
> out what's going on here:
>
> 1) Is this machine running the latest BIOS available?
> 2) Are you running powerd(8) on this box?
> 3) Does disabling ACPI (it's a menu option when booting) help?
> 4) Does removing "device cpufreq" help?

I tried without ACPI right after writing the previous mail, without any
luck. However, I tried turning off various stuff and found the cause of
the problem. When I tried running without my ipfw rules the crashes went
away. I then immediately suspected the rules using uid matching, and
those were indeed responsible.

I am now back to running with everything I usually have running on this
machine (my primary desktop) but without the ipfw uid rules, and the
machine is rock stable.

I have been running with debug.mpsafenet="0", most likely because I have
been using ipfw uid matching. Has RELENG_7 had significant changes in
this area? Since I don't need these rules anymore I have just removed
them.

--
Christian Laursen
Re: 7.1-PRERELEASE freezes (IPFW related)
On Sat, 27 Sep 2008, Christian Laursen wrote:

> I am now back to running with everything I usually have running on this
> machine (my primary desktop) but without the ipfw uid rules and the
> machine is rock stable.
>
> I have been running with debug.mpsafenet="0" most likely because I have
> been using ipfw uid matching. Has RELENG_7 had significant changes in
> this area? Since I don't need these rules anymore I have just removed
> them.

In the last few days, some previously undiscovered interactions have
been discovered between the rwlock work for udp/tcp performance and ipfw
uid/gid/jail rules. In essence, there were a number of edge cases where
it turned out ipfw was relying on lock recursion on those locks, and
that's no longer possible. I've fixed two such edge cases in HEAD and
will MFC them shortly, but there is at least one other known case. I'm
on the fence between continuing to play whack-a-mole, knocking off the
bugs as they are discovered, and fixing it with a hammer (having ipfw
and friends check for the lock held before trying to acquire it) -- if
this keeps up it's the latter for -STABLE, while continuing to fix them
as one-off bugs in HEAD.

Robert N M Watson
Computer Laboratory
University of Cambridge
Re: bad NFS/UDP performance
On Fri, 26 Sep 2008, Danny Braniss wrote:

> after more testing, it seems it's related to changes made between Aug 4
> and Aug 29, ie, a kernel built on Aug 4 works fine, Aug 29 is slow.
> I'll now try and close the gap.

I think this is the best way forward -- skimming August changes, there
are a number of candidate commits, including retuning of UDP hashes by
mav, my rwlock changes, changes to mbuf chain handling, etc.

Thanks,

Robert N M Watson
Computer Laboratory
University of Cambridge
Re: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY
On Sat, Sep 27, 2008 at 12:37:50AM -0700, Derek Kuliński wrote:
> [...]
Re: bad NFS/UDP performance
> On Fri, 26 Sep 2008, Danny Braniss wrote:
>
> > after more testing, it seems it's related to changes made between Aug 4 and
> > Aug 29 ie, a kernel built on Aug 4 works fine, Aug 29 is slow. I'll now try
> > and close the gap.
>
> I think this is the best way forward -- skimming August changes, there are a
> number of candidate commits, including retuning of UDP hashes by mav, my
> rwlock changes, changes to mbuf chain handling, etc.

It's more difficult than I expected. For one, the kernel date was
misleading; the actual source update is the key, so the window of
changes is now 28/July to 19/August. I have the diffs, but nothing yet
seems relevant. On the other hand, I tried NFS/TCP, and there things
seem OK, i.e. the 'good' and the 'bad' give the same throughput, which
seems to point to the UDP changes ...

danny
Re: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY
[EMAIL PROTECTED] wrote:
> [...]
> > > IMHO, a dirty filesystem should not be mounted until it's been fully
> > > analysed/scanned by fsck. So again, people are putting faith into
> > > UFS2+SU despite actual evidence proving that it doesn't handle all
> > > scenarios.
> >
> > Yes, I think the background fsck should be disabled by default, with a
> > possibility to enable it if the user is sure that nothing will
> > interfere with soft updates.
>
> Having been bitten by problems in this area more than once, I now always
> disable background fsck. Having it disabled by default has my vote too.

Just a "me too" here.

Best regards
   Oliver

--
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Handelsregister: Registergericht Muenchen, HRA 74606, Geschäftsfuehrung:
secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht Mün-
chen, HRB 125758, Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart
FreeBSD-Dienstleistungen, -Produkte und mehr: http://www.secnetix.de/bsd

"If you think C++ is not overly complicated, just what is a protected
abstract virtual base pure virtual private destructor, and when was the
last time you needed one?" -- Tom Cargil, C++ Journal
Re: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY
Jeremy Chadwick wrote:
> I believe we're in overall agreement with regards to background_fsck
> (should be disabled by default).

In fact background fsck was introduced for a good reason: waiting for a
full fsck on modern big disks takes far too long. Similarly, write
caching is enabled on ATA disks for the reason that without it
performance suffers too much.

My humble opinion is that you attach far too much importance to
reliability in this game. There are many reasons why corruption may
happen in files, most of them hardware related (bad RAM, overheating
chipset, etc.). Hence you can never be assured that your data is
perfectly reliable (except perhaps with ZFS's permanent checksumming);
all you have is some probability of reliability. I think that for most
people what is important is a good balance between the risk of
catastrophic failure (which is always there, and is increased a little
by background fsck) and performance and ease of use. The FreeBSD
developers have chosen this middle ground, with good reason, in my
opinion. People who are more concerned with the reliability of their
data, and want to pay the price, can always disable background fsck,
maintain backups, etc.

Personally I would run away from a system requiring hours of fsck before
being able to run multiuser. Neither Windows, with NTFS, nor Linux, with
ext3, reiserfs, xfs, jfs, etc., requires any form of scandisk or fsck.
Demanding that a full fsck be the default in FreeBSD is akin to
alienating a large fraction of users who have greener pastures easily
available. The same goes for asking users to disable write caching on
their disks.

So for most people there is a probability of someday getting the
UNEXPECTED SOFT UPDATE INCONSISTENCY message. They will run a full fsck
on that occasion; not a terrible thing. In many years of FreeBSD use it
has happened to me a small number of times, and I have yet to lose a
file, at least that I noticed.
--
Michel TALON
7.1-PRERELEASE sporadically panicking with fatal trap 12
I'm running 7.1-PRERELEASE, with /usr/src and /usr/ports last csup-ed
just a few days ago. After being up for about a day or so the system
will panic because of a page fault. I'm not completely sure, but it
seems that the system is more stable when gdm and gnome are disabled in
rc.conf. At least it stayed up for several days when I did that. I've
run memtest several times, so I'm pretty confident it's not a memory
problem. Also the stack trace is always the same, so I'm thinking it's
not hardware related. I've attached a stack trace from kgdb, and the
output from dmesg. I'd appreciate any help you could give me with this.

/var/crash# kgdb -n 5
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd"...

Unread portion of the kernel message buffer:
acd1: WARNING - READ_TOC read data overrun 18>12

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x188
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0xc0782714
stack pointer           = 0x28:0xe52aec00
frame pointer           = 0x28:0xe52aec18
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 18 (swi6: task queue)
trap number             = 12
panic: page fault
cpuid = 0
Uptime: 8h10m38s
Physical memory: 1779 MB
Dumping 195 MB: 180 164 148 132 116 100 84 68 52 36 20 4

Reading symbols from /boot/kernel/sound.ko...Reading symbols from /boot/kernel/sound.ko.symbols...done. done.
Loaded symbols for /boot/kernel/sound.ko
Reading symbols from /boot/kernel/snd_cmi.ko...Reading symbols from /boot/kernel/snd_cmi.ko.symbols...done. done.
Loaded symbols for /boot/kernel/snd_cmi.ko
Reading symbols from /boot/kernel/acpi.ko...Reading symbols from /boot/kernel/acpi.ko.symbols...done. done.
Loaded symbols for /boot/kernel/acpi.ko
Reading symbols from /boot/kernel/linux.ko...Reading symbols from /boot/kernel/linux.ko.symbols...done. done.
Loaded symbols for /boot/kernel/linux.ko
Reading symbols from /usr/local/modules/fuse.ko...done.
Loaded symbols for /usr/local/modules/fuse.ko
Reading symbols from /boot/kernel/mach64.ko...Reading symbols from /boot/kernel/mach64.ko.symbols...done. done.
Loaded symbols for /boot/kernel/mach64.ko
Reading symbols from /boot/kernel/drm.ko...Reading symbols from /boot/kernel/drm.ko.symbols...done. done.
Loaded symbols for /boot/kernel/drm.ko
#0  doadump () at pcpu.h:196
196     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) backtrace
#0  doadump () at pcpu.h:196
#1  0xc078fae7 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
#2  0xc078fda9 in panic (fmt=Variable "fmt" is not available.) at /usr/src/sys/kern/kern_shutdown.c:572
#3  0xc0aa174c in trap_fatal (frame=0xe52aebc0, eva=392) at /usr/src/sys/i386/i386/trap.c:939
#4  0xc0aa19d0 in trap_pfault (frame=0xe52aebc0, usermode=0, eva=392) at /usr/src/sys/i386/i386/trap.c:852
#5  0xc0aa238c in trap (frame=0xe52aebc0) at /usr/src/sys/i386/i386/trap.c:530
#6  0xc0a8827b in calltrap () at /usr/src/sys/i386/i386/exception.s:159
#7  0xc0782714 in _mtx_lock_sleep (m=0xc4ff804c, tid=3302734576, opts=0, file=0x0, line=0) at /usr/src/sys/kern/kern_mutex.c:339
#8  0xc078ed66 in _sema_post (sema=0xc4ff804c, file=0x0, line=0) at /usr/src/sys/kern/kern_sema.c:79
#9  0xc0513350 in ata_completed (context=0xc4ff8000, dummy=1) at /usr/src/sys/dev/ata/ata-queue.c:481
#10 0xc07c2e15 in taskqueue_run (queue=0xc4dbab80) at /usr/src/sys/kern/subr_taskqueue.c:282
#11 0xc07c3123 in taskqueue_swi_run (dummy=0x0) at /usr/src/sys/kern/subr_taskqueue.c:324
#12 0xc076f8db in ithread_loop (arg=0xc4dadb30) at /usr/src/sys/kern/kern_intr.c:1088
#13 0xc076c449 in fork_exit (callout=0xc076f720, arg=0xc4dadb30, frame=0xe52aed38) at /usr/src/sys/kern/kern_fork.c:804
#14 0xc0a882f0 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:264
(kgdb) up 7
#7  0xc0782714 in _mtx_lock_sleep (m=0xc4ff804c, tid=3302734576, opts=0, file=0x0, line=0)
    at /usr/src/sys/kern/kern_mutex.c:339
339             owner = (struct thread *)(v & ~MTX_FLAGMASK);
(kgdb) list
334              * If the owner is running on another CPU, spin until the
335              * owner stops running or the state of the lock changes.
336              */
337             v = m->mtx_lock;
338             if (v != MTX_UNOWNED) {
339                     owner = (struct thread *)(v & ~MTX_FLAGMASK);
340
Re: bad NFS/UDP performance
Danny Braniss wrote:
> I know, but I get about 1mgb, which seems somewhat low :-(

If you don't tell iperf how much bandwidth to use for a UDP test, it defaults to 1 Mbps. See the -b option:

http://dast.nlanr.net/projects/Iperf/iperfdocs_1.7.0.php#bandwidth

--eli

--
Eli Dart, ESnet Network Engineering Group
Lawrence Berkeley National Laboratory
PGP Key fingerprint = C970 F8D3 CFDD 8FFF 5486 343A 2D31 4478 5F82 B2B3

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
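For reference, the behavior Eli describes is easy to reproduce from the command line; a sketch, with the host name as a placeholder:

```
# UDP test with no -b: iperf throttles itself to the 1 Mbit/s default,
# so ~1 Mbps is expected regardless of the link speed
iperf -c server.example.org -u

# UDP test at an explicit 100 Mbit/s (server side started as "iperf -s -u")
iperf -c server.example.org -u -b 100M
```

With -b set near the link rate, the reported loss percentage rather than the throughput number becomes the interesting figure.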
Recommendations for servers running SATA drives
I'm forking the thread on fsck/soft-updates in hopes of getting some practical advice based on the discussion here of background fsck, softupdates and write-caching on SATA drives.

On Fri, 26 Sep 2008, Jeremy Chadwick wrote:

> Let's be realistic. We're talking about ATA and SATA hard disks, hooked
> up to on-board controllers -- these are the majority of users. Those
> with ATA/SATA RAID controllers (not on-board RAID either; most/all of
> those do not let you disable drive write caching) *might* have a RAID
> BIOS menu item for disabling said feature.

While I would love to deploy every server with SAS, that's not practical in many cases, especially for light-duty servers that are not being pushed very hard. I am taking my chances with multiple affordable drives and gmirror where I cannot throw in a 3Ware card. I imagine that many non-desktop FreeBSD users are doing the same, considering you can fetch a decent 1U box with plenty of storage for not much more than $1K. I assume many here are in agreement on this point -- just making it clear that the bargain crowd is not some weird edge case in the userbase...

> Regardless of all of this, end-users should, in no way shape or form,
> be expected to go to great lengths to disable their disk's write cache.
> They will not, I can assure you. Thus, we must assume: write caching
> on a disk will be enabled, period. If a filesystem is engineered with
> that fact ignored, then the filesystem is either 1) worthless, or 2)
> serves a very niche purpose and should not be the default filesystem.

Arguments about defaults aside, this is my first question. If I've got a server with multiple SATA drives mirrored with gmirror, is turning on write-caching a good idea? What kind of performance impact should I expect? What is the relationship between caching, soft-updates, and either NCQ or TCQ?

Here's an example of a Seagate, trimmed for brevity:

Protocol                              Serial ATA v1.0
device model                          ST3160811AS

Feature                       Support  Enable  Value    Vendor
write cache                   yes      yes
read ahead                    yes      yes
Native Command Queuing (NCQ)  yes      -       31/0x1F
Tagged Command Queuing (TCQ)  no       no      31/0x1F

TCQ is clearly not supported, NCQ seems to be supported, but I don't know how to tell if it's actually enabled or not. Write-caching is currently on. The tradeoff is apparently performance vs. more reliable recovery should the machine lose power, smoke itself, etc., but all I've seen is anecdotal evidence of how bad performance gets.

FWIW, this machine in particular had its mainboard go up in smoke last week. One drive was too far gone for gmirror to rebuild it without doing a "forget" and "insert". The remaining drive was too screwy for background fsck, but a manual check in single-user left me with no real surprises or problems.

> The system is already up and the filesystems mounted. If the error in
> question is of such severity that it would impact a user's ability to
> reliably use the filesystem, how do you expect constant screaming on the
> console will help? A user won't know what it means; there is already
> evidence of this happening (re: mysterious ATA DMA errors which still
> cannot be figured out[6]). IMHO, a dirty filesystem should not be
> mounted until it's been fully analysed/scanned by fsck. So again, people
> are putting faith into UFS2+SU despite actual evidence proving that it
> doesn't handle all scenarios.

I'll ask, but it seems like the consensus here is that background fsck, while the default, is best left disabled. The cases where it might make sense are:

- desktop systems
- servers that have incredibly huge filesystems (and even there being
  able to selectively background fsck filesystems might be helpful)

The first example is obvious, people want a fast-booting desktop. The second is trading long fsck times in single-user for some uncertainty.

> The problem here is that when it was created, it was sort of an
> "experiment". Now, when someone installs FreeBSD, UFS2 is the default
> filesystem used, and SU are enabled on every filesystem except the root
> fs. Thus, we have now put ourselves into a situation where said feature
> ***must*** be reliable in all cases.
>
> You're also forgetting a huge focus of SU -- snapshots[1]. However,
> there are more than enough facts on the table at this point concluding
> that snapshots are causing more problems[7] than previously expected.
> And there's further evidence filesystem snapshots shouldn't even be used
> in this way[8].
>
> ...
>
> Filesystems have to be reliable; data integrity is focus #1, and cannot
> be sacrificed. Users and administrators *expect* a filesystem to be
> reliable. No one is going to keep using a filesystem if it has
> disadvantages which can result in data loss or "waste of administrative
> time" (which I believe is what's occurring here).

The softupdates question seems tied quite closely to the wr
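On checking and toggling the cache settings asked about above: on FreeBSD 7 a feature table like the one quoted earlier can be obtained with atacontrol, and write caching on ata(4) disks is governed by a loader tunable. A sketch only; the device name is an assumption for your hardware:

```
# show the feature/capability table for a SATA disk on ata(4)
atacontrol cap ad4

# in /boot/loader.conf, to disable ATA write caching at the next boot:
hw.ata.wc="0"
```

Disabling hw.ata.wc trades write throughput for the ordering guarantees softupdates expects, which is exactly the tradeoff being debated in this thread.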
Re: sysctl maxfiles
Jeremy Chadwick wrote:
> On Sat, Sep 27, 2008 at 11:10:01AM +1000, Aristedes Maniatis wrote:
>> By default FreeBSD 7.0 shipped with the sysctls set to:
>>
>> kern.maxfiles: 12328
>> kern.maxfilesperproc: 11095

[...]

> Anyway, I'd like to know why you have so many fds open simultaneously in
> the first place. We're talking over 11,000 fds actively open at once --
> this is not a small number. What exactly is this machine doing? Are you
> absolutely certain tuning this higher is justified? Have you looked into
> the possibility that you have a program which is exhausting fds by not
> closing them when finished? (Yes, this is quite common; I've seen bad
> Java code cause this problem on Solaris.)

I can imagine some webhosting machine running Apache virtualhosts. Each virtual host using 3 logfiles (access log, error log, IO log), so it is "only" about 4000 domains (virtualhosts), which is not so uncommon these days ;)

I don't know what files are "really" open in the meaning of kern.maxfiles. I have a webserver with about 100 hosted domains, and here are some numbers:

[EMAIL PROTECTED] ~/# fstat -u www | wc -l
9931
[EMAIL PROTECTED] ~/# fstat -u root | wc -l
718
[EMAIL PROTECTED] ~/# fstat | grep httpd | wc -l
6379
[EMAIL PROTECTED] ~/# fstat | grep httpd | wc -l
6002
[EMAIL PROTECTED] ~/# fstat -u www | wc -l
4691
[EMAIL PROTECTED] ~/# sysctl kern.openfiles
kern.openfiles: 846

All of the above were taken within a few seconds.

Can somebody explain the difference between kern.openfiles and fstat?

Miroslav Lachman
Re: Recommendations for servers running SATA drives
On Sat, Sep 27, 2008 at 03:16:11PM -0400, Charles Sprickman wrote:
> On Fri, 26 Sep 2008, Jeremy Chadwick wrote:
>> Let's be realistic. We're talking about ATA and SATA hard disks, hooked
>> up to on-board controllers -- these are the majority of users. Those
>> with ATA/SATA RAID controllers (not on-board RAID either; most/all of
>> those do not let you disable drive write caching) *might* have a RAID
>> BIOS menu item for disabling said feature.
>
> While I would love to deploy every server with SAS, that's not practical
> in many cases, especially for light-duty servers that are not being
> pushed very hard. I am taking my chances with multiple affordable drives
> and gmirror where I cannot throw in a 3Ware card. I imagine that many
> non-desktop FreeBSD users are doing the same considering you can fetch a
> decent 1U box with plenty of storage for not much more than $1K. I
> assume many here are in agreement on this point -- just making it clear
> that the bargain crowd is not some weird edge case in the userbase...

I'm in full agreement here. As much as I love SCSI (and I sincerely do) it's (IMHO unjustifiably) overpriced, simply because "it can be". You'd expect the price of SCSI to decrease over the years, but it hasn't; it's become part of a niche market, primarily intended for large businesses with cash to blow. As I said, I love SCSI, the protocol is excellent, and it's very well-supported all over the place -- and though I have no personal experience with SAS, it appears to be equally as excellent, yet the price is comparable to SCSI.

Even at my place of work we use SATA disks in our filers. I suppose this is justified in the sense that a disk failure there will be less painful than it would be in a single or dual-disk server, so saving money is legitimate since RAID-5 (or whatever) is in use. But with regards to our server boxes, either single or dual SATA disks are now being used, rather than SCSI. I haven't asked our datacenter and engineering folks why we've switched, but gut feeling says "saving money".

>> Regardless of all of this, end-users should, in no way shape or form,
>> be expected to go to great lengths to disable their disk's write cache.
>> They will not, I can assure you. Thus, we must assume: write caching
>> on a disk will be enabled, period. If a filesystem is engineered with
>> that fact ignored, then the filesystem is either 1) worthless, or 2)
>> serves a very niche purpose and should not be the default filesystem.
>
> Arguments about defaults aside, this is my first question. If I've got
> a server with multiple SATA drives mirrored with gmirror, is turning on
> write-caching a good idea? What kind of performance impact should I
> expect? What is the relationship between caching, soft-updates, and
> either NCQ or TCQ?
>
> Here's an example of a Seagate, trimmed for brevity:
>
> Protocol                              Serial ATA v1.0
> device model                          ST3160811AS
>
> Feature                       Support  Enable  Value    Vendor
> write cache                   yes      yes
> read ahead                    yes      yes
> Native Command Queuing (NCQ)  yes      -       31/0x1F
> Tagged Command Queuing (TCQ)  no       no      31/0x1F
>
> TCQ is clearly not supported, NCQ seems to be supported, but I don't
> know how to tell if it's actually enabled or not. Write-caching is
> currently on.

Actually, no -- FreeBSD ata(4) does not support NCQ. I believe there are some unofficial patches (or even a PR) floating around which are for testing, but out of the box, it lacks support. The hyphen you see under the Enable column is supposed to signify that (I feel it's badly placed; it should say "notsupp" or "unsupp" or something like that. Hyphen is too vague). The NCQ support patches might require AHCI as well, I forget. It's been a while.

> The tradeoff is apparently performance vs. more reliable recovery should
> the machine lose power, smoke itself, etc., but all I've seen is
> anecdotal evidence of how bad performance gets.
>
> FWIW, this machine in particular had its mainboard go up in smoke last
> week. One drive was too far gone for gmirror to rebuild it without doing
> a "forget" and "insert". The remaining drive was too screwy for
> background fsck, but a manual check in single-user left me with no real
> surprises or problems.

As long as the array rebuilt fine, I believe small quirks are acceptable. Scenarios where the array *doesn't* rebuild properly when a new disk is added are of great concern (and in the case of some features such as Intel MatrixRAID, the FreeBSD bugs are so severe that you are liable to lose data in such scenarios. MatrixRAID != gmirror, of course).

This also leads me a little off-topic -- when it comes to disk replacements, administrators want to be able to do this without taking the system down. There are problems with this, but it often depends greatly on hardware and BIOS configuration.
Re: sysctl maxfiles
On Sat, Sep 27, 2008 at 10:14:09PM +0200, Miroslav Lachman wrote:
> Jeremy Chadwick wrote:
>> On Sat, Sep 27, 2008 at 11:10:01AM +1000, Aristedes Maniatis wrote:
>>> By default FreeBSD 7.0 shipped with the sysctls set to:
>>>
>>> kern.maxfiles: 12328
>>> kern.maxfilesperproc: 11095
>
> [...]
>
>> Anyway, I'd like to know why you have so many fds open simultaneously in
>> the first place. We're talking over 11,000 fds actively open at once --
>> this is not a small number. What exactly is this machine doing? Are
>> you absolutely certain tuning this higher is justified? Have you looked
>> into the possibility that you have a program which is exhausting fds by
>> not closing them when finished? (Yes, this is quite common; I've seen
>> bad Java code cause this problem on Solaris.)
>
> I can imagine some webhosting machine running Apache virtualhosts. Each
> virtual host using 3 logfiles (access log, error log, IO log) so it is
> "only" about 4000 domains (virtualhosts) which is not so uncommon in
> these days ;)

We're a web/shell hosting provider who used to do it that way. It became unreasonable/impossible to manage. Also, if said logfiles are being placed in directories where users of those virtualhosts can remove the files (and make symlinks to other places), that's a security hole (because Apache opens webserver logfiles as root).

The way we do it is much more resource-friendly: log everything to a single logfile, then every night split the logfile up (based on the CustomLog %v parameter) into per-vhost log files. Apache comes with a script to do this called split-logfile.

> I don't know what files are "really" open in the meaning of
> kern.maxfiles. I have webserver with about 100 hosted domains and there
> is some numbers:
>
> [EMAIL PROTECTED] ~/# fstat -u www | wc -l
> 9931

I don't think this is an accurate portrait of the number of open files. The number is going to be too high; I believe entries that contain FD=jail/mmap/root/text/tr/wd are not actual descriptors (are they?)

> [EMAIL PROTECTED] ~/# fstat -u root | wc -l
> 718
> [EMAIL PROTECTED] ~/# fstat | grep httpd | wc -l
> 6379
> [EMAIL PROTECTED] ~/# fstat | grep httpd | wc -l
> 6002
> [EMAIL PROTECTED] ~/# fstat -u www | wc -l
> 4691
> [EMAIL PROTECTED] ~/# sysctl kern.openfiles
> kern.openfiles: 846
>
> All above taken within few seconds.
>
> Can somebody explain the difference between kern.openfiles and fstat?

--
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
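The single-logfile scheme Jeremy describes relies on each entry starting with the virtual host name (%v), which split-logfile later splits on. A sketch of the relevant httpd.conf lines, with paths chosen only for illustration:

```
# one common access log, prefixed with the virtual host name
LogFormat "%v %h %l %u %t \"%r\" %>s %b" vcommon
CustomLog "/var/log/httpd/access_log" vcommon

# nightly, from cron:  split-logfile < /var/log/httpd/access_log
```

This keeps httpd at a handful of log descriptors total instead of several per vhost.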
Re: sysctl maxfiles
Miroslav Lachman wrote:
> I don't know what files are "really" open in the meaning of
> kern.maxfiles. I have webserver with about 100 hosted domains and there
> is some numbers:
>
> [EMAIL PROTECTED] ~/# fstat -u www | wc -l
> 9931
> [EMAIL PROTECTED] ~/# fstat -u root | wc -l
> 718
> [EMAIL PROTECTED] ~/# fstat | grep httpd | wc -l
> 6379
> [EMAIL PROTECTED] ~/# fstat | grep httpd | wc -l
> 6002
> [EMAIL PROTECTED] ~/# fstat -u www | wc -l
> 4691
> [EMAIL PROTECTED] ~/# sysctl kern.openfiles
> kern.openfiles: 846
>
> All above taken within few seconds.
>
> Can somebody explain the difference between kern.openfiles and fstat?

Those are different things: fstat lists file descriptors, while kern.openfiles counts open file objects, which are often shared among processes. For example, when the apache master process forks its children, the children inherit the open file objects from the parent process. While every child has its own set of file descriptors (listed separately by fstat), they reference the same underlying open file objects, so they don't contribute separately to kern.openfiles.

In the same way, fstat lists stdin + stdout + stderr for almost every process, but in most cases they are not separate file objects because they were inherited from the parent process.

Best regards
Oliver

--
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Handelsregister: Registergericht Muenchen, HRA 74606, Geschäftsfuehrung:
secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht Mün-
chen, HRB 125758, Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD-Dienstleistungen, -Produkte und mehr: http://www.secnetix.de/bsd

"In My Egoistical Opinion, most people's C programs should be indented
six feet downward and covered with dirt." -- Blair P. Houghton
Re: sysctl maxfiles
On 2008-Sep-27 22:14:09 +0200, Miroslav Lachman <[EMAIL PROTECTED]> wrote:
> [EMAIL PROTECTED] ~/# fstat -u www | wc -l
> 9931
> [EMAIL PROTECTED] ~/# fstat -u root | wc -l
> 718
> [EMAIL PROTECTED] ~/# fstat | grep httpd | wc -l
> 6379
> [EMAIL PROTECTED] ~/# fstat | grep httpd | wc -l
> 6002
> [EMAIL PROTECTED] ~/# fstat -u www | wc -l
> 4691
> [EMAIL PROTECTED] ~/# sysctl kern.openfiles
> kern.openfiles: 846

kern.openfiles reflects the total number of open file structures within the kernel, whereas fstat (and lsof) report both open files and vnodes associated with each process. The differences are:

1) File structures are shared via fork() etc so the same file structure
   can be reported multiple times.
2) fstat reports executable name, working directory and root

Open files in fstat can be detected because they have numeric values (possibly with a '*' appended) in the FD column. Unfortunately, there doesn't appear to be any easy way to detect shared file structures (for inode-based files) using either fstat or lsof.

In the case of apache, there are at least 6 file structures shared by each httpd process (and it looks like it might be about 15).

--
Peter Jeremy
Please excuse any delays as the result of my ISP's inability to implement an MTA that is either RFC2821-compliant or matches their claimed behaviour.
Warning: known instability using ipfw "uid" rules
An FYI: In the past couple of days, presumably as testing of 7.x becomes more widespread, I've seen several reports of instability resulting from ipfw credential rules. For those unfamiliar with them, these allow the matching of packets in ipfw rules based on the credentials of the socket that generated them, or the credentials of the socket that likely will receive them.

These problems are a side effect of eliminating support for lock recursion on inpcbinfo locks as part of the UDP performance optimization work for 7.1. There are two minor TCP fixes, and a more serious ipfw bug fix, in the queue to be MFC'd in the next couple of days. Once they're fixed, please make sure any further problems with deadlocks or panics involving ipfw rules are brought to my attention.

Thanks, and apologies for any inconvenience -- this issue did not arise during testing in HEAD over the course of several months, but fortunately appears fairly straightforward to resolve now that it's a bit better understood.

Robert N M Watson
Computer Laboratory
University of Cambridge
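For readers who haven't used the rule type in question, a credential-matching rule looks roughly like this (rule number and user name are examples only, not from any report in this thread):

```
# match outbound TCP generated by sockets owned by user www
ipfw add 1000 count tcp from me to any uid www
```

Evaluating such a rule requires looking up the owning socket, which is why it interacts with the inpcbinfo locking changes described above.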
7.1-PRERELEASE : bad network performance (nfe0)
Hello,

I've serious network performance problems on a HP Turion X2 based brand new notebook; I've only used a 7.1-Beta CD and 7-STABLE on this thing. Scp-ing ports.tgz from a rock-stable 7-STABLE server to it gives:

# scp -p ports.tgz [EMAIL PROTECTED]:/tmp/
ports.tgz    100%   98MB   88.7KB/s   18:49

(doing the same thing by copy from an nfs-mounted disk even takes more than an hour ...)

Doing a top(1) aside just shows the box 100% idle:

  PID USERNAME  PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
   12 root      171 ki31     0K    16K CPU0    0  38:55 100.00% idle: cpu0
   11 root      171 ki31     0K    16K RUN     1  38:55 100.00% idle: cpu1
   13 root      -32    -     0K    16K WAIT    0   0:02  0.00% swi4: clock sio
   29 root      -68    -     0K    16K -       0   0:00  0.00% nfe0 taskq
   34 root      -64    -     0K    16K WAIT    1   0:00  0.00% irq23: atapci1
 1853 root        8    0  7060K  1920K wait    0   0:00  0.00% sh
  878 nono       44    0  8112K  2288K CPU1    1   0:00  0.00% top
  884 root        8    -     0K    16K -       1   0:00  0.00% nfsiod 0
    4 root       -8    -     0K    16K -       1   0:00  0.00% g_down
   16 root      -16    -     0K    16K -       1   0:00  0.00% yarrow
   46 root       20    -     0K    16K syncer  0   0:00  0.00% syncer
    3 root       -8    -     0K    16K -       0   0:00  0.00% g_up
   30 root      -68    -     0K    16K -       0   0:00  0.00% fw0_taskq

I tested:
- Update Bios
- ULE / 4BSD
- PREEMPTION on/off
- PREEMPTION + IPI_PREEMPTION
- hw.nfe.msi[x]_disable=1

All don't seem to matter to the problem.

I put two tcpdumps (server and client during another scp(1)) on

http://bare.snv.jussieu.fr/temp/tcpdump-s1518.server
http://bare.snv.jussieu.fr/temp/tcpdump-s1518.client

I'm far from an expert on TCP/IP, but wireshark "expert info" shows lots of sequences like:

TCP Previous segment lost
TCP Duplicate ACK 1
TCP Window update
TCP Duplicate ACK 2
TCP Duplicate ACK 3
TCP Duplicate ACK 4
TCP Duplicate ACK 5
TCP Fast retransmission (suspected)
TCP ...
TCP Out-of-Order segment
TCP ...

As usual, feel free to contact me for further info/tests.

Thanx,
Arno

# uname -a
FreeBSD mv 7.1-PRERELEASE FreeBSD 7.1-PRERELEASE #0: Fri Sep 26 15:06:07 CEST 2008 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/PAVILLON amd64

# pciconf -lcv (bits)
[EMAIL PROTECTED]:0:6:0: class=0x02 card=0x30cf103c chip=0x045010de rev=0xa3 hdr=0x00
    vendor   = 'Nvidia Corp'
    device   = 'MCP65 Ethernet'
    class    = network
    subclass = ethernet
    cap 01[44] = powerspec 2  supports D0 D1 D2 D3  current D0

# dmesg -a
Copyright (c) 1992-2008 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.1-PRERELEASE #0: Fri Sep 26 15:06:07 CEST 2008
    [EMAIL PROTECTED]:/usr/obj/usr/src/sys/PAVILLON
Timecounter "i8254" frequency 1193250 Hz quality 0
CPU: AMD Turion(tm) 64 X2 Mobile Technology TL-62 (2109.70-MHz K8-class CPU)
  Origin = "AuthenticAMD"  Id = 0x60f82  Stepping = 2
  Features=0x178bfbff
  Features2=0x2001
  AMD Features=0xea500800
  AMD Features2=0x11f
  Cores per package: 2
usable memory = 3210813440 (3062 MB)
avail memory  = 3104542720 (2960 MB)
ACPI APIC Table:
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): APIC ID: 0
 cpu1 (AP): APIC ID: 1
ioapic0 irqs 0-23 on motherboard
kbd1 at kbdmux0
ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
acpi0: on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
ACPI Error (dsopcode-0671): Field [I9MN] at 544 exceeds Buffer [IORT] size 464 (bits) [20070320]
ACPI Error (psparse-0626): Method parse/execution failed [\\_SB_.PCI0.LPC0.PMIO._CRS] (Node 0xff00011f50a0), AE_AML_BUFFER_LIMIT
ACPI Error (uteval-0309): Method execution failed [\\_SB_.PCI0.LPC0.PMIO._CRS] (Node 0xff00011f50a0), AE_AML_BUFFER_LIMIT
can't fetch resources for \\_SB_.PCI0.LPC0.PMIO - AE_AML_BUFFER_LIMIT
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0
acpi_ec0: port 0x62,0x66 on acpi0
acpi_hpet0: iomem 0xfed0-0xfed003ff on acpi0
Timecounter "HPET" frequency 2500 Hz quality 900
acpi_acad0: on acpi0
battery0: on acpi0
acpi_lid0: on acpi0
pcib0: port 0xcf8-0xcff on acpi0
pci0: on pcib0
pci0: at device 0.0 (no driver attached)
isab0: port 0x1d00-0x1dff at device 1.0 on pci0
isa0: on isab0
pci0: at device 1.1 (no driver attached)
pci0: at device 1.3 (no driver attached)
ohci0: mem 0xf2486000-0xf2486fff irq 18 at device 2.0 on pci0
ohci0: [GIANT-LOCKED]
ohci0: [ITHREAD]
usb0: OHCI version 1.0, legacy support
usb0: on ohci0
usb0: USB revision 1.0
uhub0: on usb0
uhub0: 10 ports with 10 removable, self powered
ehci0: mem 0xf24880
Re: sysctl maxfiles
On Sat, Sep 27, 2008 at 07:05:08PM +1000, Aristedes Maniatis wrote:
> On 27/09/2008, at 1:02 PM, Jeremy Chadwick wrote:
>> Anyway, I'd like to know why you have so many fds open simultaneously
>> in the first place. We're talking over 11,000 fds actively open at once
>> -- this is not a small number. What exactly is this machine doing? Are
>> you absolutely certain tuning this higher is justified? Have you looked
>> into the possibility that you have a program which is exhausting fds by
>> not closing them when finished? (Yes, this is quite common; I've seen
>> bad Java code cause this problem on Solaris.)
>
> Well, there was a runaway process which looks like it is leaking fds.
> We haven't solved it yet, but the fact that the maxfiles per machine
> and the maxfiles per process were so close together was really causing
> us grief for a while.
>
>> You're asking for trouble setting these values to the equivalent of
>> unlimited. Instead of asking "what would happen", you should be asking
>> "why would I need to do that".
>>
>> Regarding memory implications, the Handbook goes over it.
>
> Unfortunately I've been unable to find it. While we fix the fd leak
> I'd like to know how high I can push these numbers and not cause other
> problems.

At least one port recommends you set kern.maxfiles="4" in /boot/loader.conf. I think it's one of the GNOME ports. I'm pretty confident you can run that without too many problems, and maybe go higher, but if you really want to know the limit, it's probably kernel memory, and that will depend on your workload.

Solving the fd leak is by far the safest path. Note that tracking that many files is probably affecting your application performance in addition to hurting the system.

Regards,

Gary
Re: sysctl maxfiles
On 28/09/2008, at 8:18 AM, Gary Palmer wrote:
> At least one port recommends you set kern.maxfiles="4" in
> /boot/loader.conf. I think its one of the GNOME ports. I'm pretty
> confident you can run that without too many problems, and maybe go
> higher, but if you really want to know the limit its probably kernel
> memory and that will depend on your workload.

I guess then I should ask the question a different way. How much memory does each fd use, and which pool of memory does it come from? This is ZFS if that makes any difference.

Or asked a different way: if I set the number to 200,000 and some rogue process used 190,000 fds, then what bad thing would happen to the system? If any.

> Solving the fd leak is by far the safest path. Note that tracking that
> many files is probably affecting your application performance in
> addition to hurting the system.

Absolutely. We are working on it. But general Unix principles are that a non-root user should not be able to get Unix to a non-functional state. It appears that this is a very simple path to DoS, particularly since with the default settings it is easy for one process to use up all available fds and leave no more for anyone to be able to log in.

Ari Maniatis

-->
ish
http://www.ish.com.au
Level 1, 30 Wilson Street Newtown 2042 Australia
phone +61 2 9550 5001  fax +61 2 9550 4001
GPG fingerprint CBFB 84B4 738D 4E87 5E5C 5EFA EF6A 7D2E 3E49 102A
Re: sysctl maxfiles
Peter Jeremy wrote:
> On 2008-Sep-27 22:14:09 +0200, Miroslav Lachman <[EMAIL PROTECTED]> wrote:
>> [EMAIL PROTECTED] ~/# fstat -u www | wc -l
>> 9931
>> [EMAIL PROTECTED] ~/# fstat -u root | wc -l
>> 718
>> [EMAIL PROTECTED] ~/# fstat | grep httpd | wc -l
>> 6379
>> [EMAIL PROTECTED] ~/# fstat | grep httpd | wc -l
>> 6002
>> [EMAIL PROTECTED] ~/# fstat -u www | wc -l
>> 4691
>> [EMAIL PROTECTED] ~/# sysctl kern.openfiles
>> kern.openfiles: 846
>
> kern.openfiles reflects the total number of open file structures within
> the kernel, whereas fstat (and lsof) report both open files and vnodes
> associated with each process. The differences are
> 1) File structures are shared via fork() etc so the same file structure
>    can be reported multiple times.
> 2) fstat reports executable name, working directory and root
>
> Open files in fstat can be detected because they have numeric values
> (possibly with a '*' appended) in the FD column. Unfortunately, there
> doesn't appear to be any easy way to detect shared file structures (for
> inode-based files) using either fstat or lsof.
>
> In the case of apache, there are at least 6 file structures shared by
> each httpd process (and it looks like it might be about 15).

Thank you for your explanation (Jeremy Chadwick, Oliver Fromme, Peter Jeremy). Now it makes sense to me.

Miroslav Lachman
Request for testing - top 3.8b1 in the base system
I have made an update for the top(1) utility in the FreeBSD base system to get it from the 3.5b12 version to the 3.8b1 version. I have tried it on the amd64 architecture on FreeBSD -current and FreeBSD 7.0, and on the i386 architecture on FreeBSD 7.0.

The big new features are a new upper part with kernel statistics (context switches, traps, interrupts, faults, etc.) and the FLG table (if your window is big enough). Some features specific to FreeBSD (dual display (press m), threaded processes, and jails) have been ported to 3.8b1. The biggest fix (AFAICT) is the TIME and CPU table for threaded processes, which are now calculated properly.

The new code can be found on

http://www.mavetju.org/~edwin/freebsd-top-3.8b1-A.tar.gz

Go to 3.8b1/usr.sbin/top and run "make" there to produce the binary, then run it via "./top". Please report any issues with it (compile time, run time) and a way to reproduce it (if possible).

Thanks for your help!

Edwin

--
Edwin Groothuis   [EMAIL PROTECTED]   http://www.mavetju.org