Re: [IMPORTANT] Adaptec no longer supporting iir(4) driver ...
On Fri, 4 Aug 2006, John-Mark Gurney wrote:

User Freebsd wrote this message on Mon, Jul 31, 2006 at 22:44 -0300:

For those that haven't been following the discussion on this, the iir(4) driver in FreeBSD 6.x appears to have a deadlock issue under medium to heavy load, where the 'blocked' state just continues to rise until file accesses just no longer work ... So, if you are running a server that is using the iir(4) device driver and are considering upgrading to FreeBSD 6.x and beyond, or are looking to build a new machine using a device that relies on this driver, do so at your own peril ... Please note that this deadlock issue exists on *both* the ICP Vortex cards, *and* the Intel based RAID controllers ...

Have you tried the driver in -current and/or 6.1-R? Specifically v1.14 and v1.13.2.1 of iir.c, which limit the simq to 32 commands? We are running w/ these modifications w/o issues on 6.0-R w/ SRCU31A and SRCU42L cards... We have a few GDT cards also that I don't believe we are having any issues with...

Yes, this was the first thing ScottL asked when we narrowed the problem down ... this appears to be a different issue than the one you were seeing :(

Marc G. Fournier
Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED] MSN . [EMAIL PROTECTED]
Yahoo . yscrappy Skype: hub.org ICQ . 7615664
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: em device hangs on ifconfig alias ...
Any status on this patch being merged in?

On Mon, 10 Jul 2006, Atanas wrote:

Pyun YongHyeon said the following on 7/7/06 8:32 PM:

On Fri, Jul 07, 2006 at 10:38:01PM +0100, Robert Watson wrote:

Yes -- basically, there are two problems:

(1) A little problem, in which an arp announcement is sent before the link has settled after reset.
(2) A big problem, in which the interface is gratuitously reset, requiring long settling times.

I'd really like to see a fix to the second of these problems (not resetting when an IP is added or removed, resulting in link renegotiation); the first one I'm less concerned about, although it would make some amount of sense to do an arp announcement when the link goes up.

Ah, I see. Thanks for the insight. How about the attached patch?

This patch seems to fix both of the issues, or at least this is what I see now:

- the card no longer gets reset when adding an alias;
- the arp packet gets delivered;
- adding 250 aliases takes less than a second.

I haven't fully tested whether all 250 IP aliases were accessible (I used non-routable IP addresses), but I suppose so. Also I couldn't stress the patched driver enough to see whether it performs as expected. But overall it looks good. I guess some more testing might be needed in order to merge the patch into the source tree.
Regards,
Atanas

Index: if_em.c
===================================================================
RCS file: /pool/ncvs/src/sys/dev/em/if_em.c,v
retrieving revision 1.116
diff -u -r1.116 if_em.c
--- if_em.c 6 Jun 2006 08:03:49 -0000 1.116
+++ if_em.c 8 Jul 2006 03:30:36 -0000
@@ -67,6 +67,7 @@
 #include <netinet/in_systm.h>
 #include <netinet/in.h>
+#include <netinet/if_ether.h>
 #include <netinet/ip.h>
 #include <netinet/tcp.h>
 #include <netinet/udp.h>
@@ -692,6 +693,9 @@
     EM_LOCK_ASSERT(sc);
+    if ((ifp->if_drv_flags & (IFF_DRV_RUNNING|IFF_DRV_OACTIVE)) !=
+        IFF_DRV_RUNNING)
+        return;
     if (!sc->link_active)
         return;
@@ -745,6 +749,7 @@
 {
     struct em_softc *sc = ifp->if_softc;
     struct ifreq *ifr = (struct ifreq *)data;
+    struct ifaddr *ifa = (struct ifaddr *)data;
     int error = 0;
     if (sc->in_detach)
@@ -752,9 +757,22 @@
     switch (command) {
     case SIOCSIFADDR:
-    case SIOCGIFADDR:
-        IOCTL_DEBUGOUT("ioctl rcv'd: SIOCxIFADDR (Get/Set Interface Addr)");
-        ether_ioctl(ifp, command, data);
+        if (ifa->ifa_addr->sa_family == AF_INET) {
+            /*
+             * XXX
+             * Since resetting hardware takes a very long time
+             * we only initialize the hardware only when it is
+             * absolutely required.
+             */
+            ifp->if_flags |= IFF_UP;
+            if (!(ifp->if_drv_flags & IFF_DRV_RUNNING)) {
+                EM_LOCK(sc);
+                em_init_locked(sc);
+                EM_UNLOCK(sc);
+            }
+            arp_ifinit(ifp, ifa);
+        } else
+            error = ether_ioctl(ifp, command, data);
         break;
     case SIOCSIFMTU:
         {
@@ -802,17 +820,19 @@
         IOCTL_DEBUGOUT("ioctl rcv'd: SIOCSIFFLAGS (Set Interface Flags)");
         EM_LOCK(sc);
         if (ifp->if_flags & IFF_UP) {
-            if (!(ifp->if_drv_flags & IFF_DRV_RUNNING)) {
+            if ((ifp->if_drv_flags & IFF_DRV_RUNNING)) {
+                if ((ifp->if_flags ^ sc->if_flags) &
+                    IFF_PROMISC) {
+                    em_disable_promisc(sc);
+                    em_set_promisc(sc);
+                }
+            } else
                 em_init_locked(sc);
-            }
-
-            em_disable_promisc(sc);
-            em_set_promisc(sc);
         } else {
-            if (ifp->if_drv_flags & IFF_DRV_RUNNING) {
+            if (ifp->if_drv_flags & IFF_DRV_RUNNING)
                 em_stop(sc);
-            }
         }
+        sc->if_flags = ifp->if_flags;
         EM_UNLOCK(sc);
         break;
     case SIOCADDMULTI:
@@ -878,8 +898,8 @@
         break;
         }
     default:
-        IOCTL_DEBUGOUT1("ioctl received: UNKNOWN (0x%x)", (int)command);
-        error = EINVAL;
+        error = ether_ioctl(ifp, command, data);
+        break;
     }
     return (error);
Index: if_em.h
===================================================================
RCS file: /pool/ncvs/src/sys/dev/em/if_em.h,v
retrieving revision 1.44
diff
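The SIOCSIFFLAGS hunk of the patch above caches the previous interface flags and only reprograms promiscuous mode when that bit actually changed, instead of reinitializing the card on every call. A minimal standalone sketch of that decision logic follows. It is not the driver code: the `DEMO_*` constants, the `demo_action` enum, and `demo_on_set_flags()` are hypothetical names invented for illustration, and the flag values are placeholders rather than the ones from `<net/if.h>`.

```c
#include <stdbool.h>

/* Hypothetical flag bits for illustration only; not the <net/if.h> values. */
#define DEMO_IFF_UP      0x0001u
#define DEMO_IFF_PROMISC 0x0100u

/* What the ioctl handler would do next (hypothetical names). */
enum demo_action {
    DEMO_NONE,           /* nothing to do, hardware left alone */
    DEMO_INIT,           /* interface marked up but not running: initialize */
    DEMO_STOP,           /* interface marked down but running: stop */
    DEMO_UPDATE_PROMISC  /* running, and only the promisc bit changed */
};

/*
 * Decide how to react to a "set interface flags" request.
 * if_flags     - flags requested now
 * cached_flags - flags seen on the previous call
 * running      - whether the hardware is already initialized
 */
static enum demo_action
demo_on_set_flags(unsigned int if_flags, unsigned int cached_flags, bool running)
{
    if (if_flags & DEMO_IFF_UP) {
        if (!running)
            return DEMO_INIT;
        /* XOR leaves only the bits that differ; mask picks out promisc. */
        if ((if_flags ^ cached_flags) & DEMO_IFF_PROMISC)
            return DEMO_UPDATE_PROMISC;
        return DEMO_NONE;
    }
    return running ? DEMO_STOP : DEMO_NONE;
}
```

The point of the XOR test is that a request which leaves the promiscuous bit untouched falls through to `DEMO_NONE`, so an already running interface is not reset.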
Re: Safe card to replace for ICP Vortex GDT8514RZ ...
On Tue, 1 Aug 2006, Patrick M. Hausen wrote:

Hi!

On Mon, Jul 31, 2006 at 10:49:27PM -0300, User Freebsd wrote:

Official word from Adaptec is that FreeBSD is no longer a supported platform, so, I either live with the deadlocks, or try and figure out a suitable replacement for the card ...

That's really really bad news. Oddly, ICP Vortex Germany told me the opposite w.r.t. their new line of cards. They said, they were working on full FreeBSD support.

Great, that definitely wasn't the feel that I got from them ... I've been using Adaptec products since early 90's, mainly because they have always been the 'tried-n-true' product ... As I mentioned to someone else, I'm willing to endure having the server hang up a few times in order to debug the problem, and fix the driver, but any correspondence that I actually got answers back on gave me the feel that I was on my own ... my previous email to this was meant to warn others to think twice, especially with newer FreeBSD boxes, about going with anything that runs on the iir(4) driver ...

Marc G. Fournier
Re: Safe card to replace for ICP Vortex GDT8514RZ ...
On Tue, 1 Aug 2006, Christian Brueffer wrote:

On Mon, Jul 31, 2006 at 10:49:27PM -0300, User Freebsd wrote:

I have a remote server, running the above RAID controller, that, as most ppl here have seen over the past few weeks, is causing endless headaches ... Official word from Adaptec is that FreeBSD is no longer a supported platform, so, I either live with the deadlocks, or try and figure out a suitable replacement for the card ... So, can anyone recommend a card to replace this with? It's a remote server, so I'm looking for something that will be plug-n-play, same slot that the GDT is in ... I realize that I'll have to reformat the server afterwards ...

I contacted Achim Leubner not long ago, about whether he still maintains and supports the iir(4) driver, as claimed in the SEE ALSO section of the manpage. His answer was yes.

I email'd him several weeks back, as soon as it was determined that the problem I've been experiencing with the deadlocks looked to be iir related, and didn't hear anything back :(

Marc G. Fournier
Re: [IMPORTANT] Adaptec no longer supporting iir(4) driver ...
A quick follow up on this email ... please note that I have not, in this email, pointed to anything but the iir(4) driver, and, more specifically, the GDT controller card ... I have been using Adaptec products since the early 90's, and, until upgrading to FreeBSD 6.x, *never* had a complaint with them ... this email was meant to be a 'caveat emptor' for anyone looking to use the iir(4) driver, and is not meant to apply to *all* Adaptec cards, as they don't all use the iir(4) driver ...

Apologies to all who took this as a broad attack against Adaptec, it was not meant as such ...

On Mon, 31 Jul 2006, User Freebsd wrote:

'k, I finally got ahold of someone @ adaptec, and the official word seems to be:

FreeBSD 6 is not officially supported for the GDT based ICP RAID controllers. Nevertheless the inbox driver should work.

Great, well, the inbox driver doesn't work with FreeBSD 6.x, and support doesn't exist to get it fixed, mainly since, as most ppl here know, the specs are closed, so even a non-Adaptec person can't do much to fix the problem(s) ...

For those that haven't been following the discussion on this, the iir(4) driver in FreeBSD 6.x appears to have a deadlock issue under medium to heavy load, where the 'blocked' state just continues to rise until file accesses just no longer work ... So, if you are running a server that is using the iir(4) device driver and are considering upgrading to FreeBSD 6.x and beyond, or are looking to build a new machine using a device that relies on this driver, do so at your own peril ... Please note that this deadlock issue exists on *both* the ICP Vortex cards, *and* the Intel based RAID controllers ...

If anyone from Adaptec is out there and is actually interested in seeing this problem fixed, *please* let me know ... I have three servers, all three exhibiting this problem, and one of them is fully loaded with the kernel debug stuff so that I can (I think) give you almost *anything* you want in the way of information concerning the problem ...

Marc G. Fournier
iir(4) driver (Was: Re: Safe card to replace for ICP Vortex GDT851...)
On Tue, 1 Aug 2006, Patrick M. Hausen wrote:

Hello!

On Tue, Aug 01, 2006 at 09:51:59AM +0200, Patrick M. Hausen wrote:

That's really really bad news. Oddly, ICP Vortex Germany told me the opposite w.r.t. their new line of cards. They said, they were working on full FreeBSD support.

I'll check what they have to say about the GDT controllers.

OK - so here's the deal: The GDT products are officially EOE (End Of Engineering). ICP Vortex will not provide capacity to update their own driver for FreeBSD 6. The new products will feature full FreeBSD support, eventually. (couple of weeks, he said)

'k, just to clarify here ... the new products won't be based on the iir(4) driver then? Basically, should the iir(4) driver be considered EOE also?

Marc G. Fournier
Re: iir(4) driver (Was: Re: Safe card to replace for ICP Vortex GDT851...)
On Tue, 1 Aug 2006, Scott Long wrote:

Ok guys, time for a small breather here. All these claims about EoE and orphanage and whatnot are a bit premature and underinformed.

First, the iir driver is being worked on when the need arises. Several bugs were fixed in it a few months ago, and until Mark's recent series of mails on it, no other problems had been reported. So far there is only one person reporting unhappiness with it, which doesn't necessarily mean that there is systematic trouble with the driver or the hardware.

Second, various Adaptec sources have confirmed that they do support FreeBSD. Making big statements in public that they don't, or that it's not up to one's standards or hopes, isn't terribly useful or productive. I'd hate for FreeBSD to turn into That Other BSD that publicly abuses and harasses vendors for perceived slights. There are much more positive and productive ways to fix problems and form good relationships, and those ways are actively being pursued by some people right now.

As email'd previously, I do apologize if my email was taken as disgruntled against Adaptec, for it was not meant as such ... it was merely meant as a warning to others, similar to your disclaimer below, that if they are running a card using the iir(4) driver, and are looking to move up to FreeBSD 6.x, they might experience issues ...

Please also note that until I hit what, from most angles, was appearing to be major brick walls, I was doing everything I could to, and am still willing to, provide all of the information I can towards diagnosing and fixing the issue ... I had tried all avenues that I knew about ... I tried email'ng the listed MAINTAINER, no response ... I got an email from one developer telling me that there wasn't much that could be done, due to the closed specs, without being able to get ahold of said MAINTAINER ... and the response I got back from ICP Vortex was one of "the inbox driver should work fine, but we don't officially support FreeBSD" ... it doesn't leave much of a warm feeling that the driver is anything but orphaned :(

My email was meant as a warning so that others could hopefully avoid the several weeks it took me to get to the point that all *appeared* lost ...

Also, please note that in my email, I did finish it off with a plea that if anyone from Adaptec, or working with them, was out there, that my server was pretty much at their disposal to fix the problem, even at the risk of losing clients due to the downtime ...

And here again is my standard disclaimer: I highly recommend that anyone who takes their data integrity seriously should spend time qualifying any RAID solution that they are interested in before putting it into production. What works for your workload might not work for someone else's workload, and vice-versa.

In this case, we're talking about 3 servers that ran flawlessly with the iir(4) driver under 4.x, that are now exhibiting the deadlock/hang issues, after upgrading to FreeBSD 6.x ... Up until upgrading to FreeBSD 6.x, I've *never* had a problem with either an Adaptec controller, or running one with FreeBSD ...

Scott

Patrick M. Hausen wrote:

Hello!

'k, just to clarify here ... the new products won't be based on the iir(4) driver then?

Yes, they won't.

Basically, should the iir(4) driver be considered EOE also?

As far as Adaptec and ICP Vortex are concerned, yes. Since the driver is Open Source, there is no enforced EOE, just orphanage, if nobody is willing to work on it.

Regards,
Patrick M. Hausen
Leiter Netzwerke und Sicherheit

Marc G. Fournier
Safe card to replace for ICP Vortex GDT8514RZ ...
I have a remote server, running the above RAID controller, that, as most ppl here have seen over the past few weeks, is causing endless headaches ...

Official word from Adaptec is that FreeBSD is no longer a supported platform, so, I either live with the deadlocks, or try and figure out a suitable replacement for the card ...

So, can anyone recommend a card to replace this with? It's a remote server, so I'm looking for something that will be plug-n-play, same slot that the GDT is in ... I realize that I'll have to reformat the server afterwards ...

Marc G. Fournier
[IMPORTANT] Adaptec no longer supporting iir(4) driver ...
'k, I finally got ahold of someone @ adaptec, and the official word seems to be:

FreeBSD 6 is not officially supported for the GDT based ICP RAID controllers. Nevertheless the inbox driver should work.

Great, well, the inbox driver doesn't work with FreeBSD 6.x, and support doesn't exist to get it fixed, mainly since, as most ppl here know, the specs are closed, so even a non-Adaptec person can't do much to fix the problem(s) ...

For those that haven't been following the discussion on this, the iir(4) driver in FreeBSD 6.x appears to have a deadlock issue under medium to heavy load, where the 'blocked' state just continues to rise until file accesses just no longer work ... So, if you are running a server that is using the iir(4) device driver and are considering upgrading to FreeBSD 6.x and beyond, or are looking to build a new machine using a device that relies on this driver, do so at your own peril ... Please note that this deadlock issue exists on *both* the ICP Vortex cards, *and* the Intel based RAID controllers ...

If anyone from Adaptec is out there and is actually interested in seeing this problem fixed, *please* let me know ... I have three servers, all three exhibiting this problem, and one of them is fully loaded with the kernel debug stuff so that I can (I think) give you almost *anything* you want in the way of information concerning the problem ...

Marc G. Fournier
Re: file system deadlock - the whole story?
On Wed, 19 Jul 2006, Kostik Belousov wrote:

On Wed, Jul 19, 2006 at 01:31:17AM -0300, User Freebsd wrote:

Kostik/Robert ... does this provide enough (any?) information concerning the deadlock situation(s) that are being reported? is there anything else I should do the next time it happens? I tried to submit a GnATs report on this also, but fear that the attachment was a wee bit too big :(

Marc, thank you for the report. It does contain useful information, I'm looking into it. I see at least one obvious deadlock (your shell becomes unresponsive when you tried to make auto-completion, right ?).

Yup, that was when I first noticed it this time through, actually ...

On Tue, 18 Jul 2006, User Freebsd wrote:

'k, had a bunch of fun tonight, but one of the results is that I was able to achieve file system deadlock, or so it appears ...

Using the following from DDB:

    set $lines=0
    show pcpu
    show allpcpu
    ps
    trace
    alltrace
    show locks
    show alllocks
    show uma
    show malloc
    show lockedvnods
    call doadump

I've been able to produce the attached output, as well as have a core dump that can hopefully be used to gather any that I may have missed this time *cross fingers*

Marc G. Fournier
Re: file system deadlock - the whole story?
On Wed, 19 Jul 2006, Kostik Belousov wrote:

You did not provide the output of show lockedbufs,

Added to my debug list ...

but, even without that data, I doubt that the buf subsystem deadlocked by itself. I make a conjecture that the problem is either with your disk hardware (i.e., actual hard drive or disk controller), or in the controller driver.

The problem that I have with this theory is that it isn't just one server doing this, or one type of hardware ... all three of the servers that I've upgraded to FreeBSD 6.x are doing it at some point or another ... I'm just getting jupiter (older Dual-PIII server) rebooted now :(

Also note that under FreeBSD 4.x, all three of these machines were pretty much my more solid machines, with even more vServers running on them than I'm able to run with 6.x ... once I got rid of using unionfs, stability skyrocketed :(

Hr ... but, your 'controller driver' comment ... that is one common thing amongst all three servers ... they are all running the iir driver ... not sure the *exact* controller, but pluto (older Dual-PIII) shows it as:

    iir0: Intel Integrated RAID Controller mem 0xfc8f-0xfc8f3fff irq 30 at device 9.0 on pci1
    iir0: [GIANT-LOCKED]

Beyond that controller, jupiter/pluto are Dual-PIII with 36G Seagate drives, uranus is a Dual-Xeon with 72G Seagate drives ...

At least, you could show us the dmesg.

I'll have to get that for you after next reboot, as /var/run/dmesg.boot shows:

    uranus# less /var/run/dmesg.boot
    WARNING: /tmp was not properly dismounted
    WARNING: /usr was not properly dismounted
    WARNING: /var was not properly dismounted

And that's it :(

Marc G. Fournier
Re: file system deadlock - the whole story?
On Wed, 19 Jul 2006, Robert Watson wrote:

On Wed, 19 Jul 2006, User Freebsd wrote:

Also note that under FreeBSD 4.x, all three of these machines were pretty much my more solid machines, with even more vServers running on them than I'm able to run with 6.x ... once I got rid of using unionfs, stability skyrocketed :(

Hr ... but, your 'controller driver' comment ... that is one common thing amongst all three servers ... they are all running the iir driver ... not sure the *exact* controller, but pluto (older Dual-PIII) shows it as:

Yes, this was going to be my next question -- if you're seeing wedges under load and there's a common controller in use, maybe we're looking at a driver bug. Bugs of that sort typically look a lot like what you describe: an I/O is lost and so everything that depends on the I/O wedges waiting for it, leading to a lot of processes hanging around waiting for vnode locks, etc.

'k, but how do we debug *that*? :( If it was one, I'd suspect hardware ... but *three*, and only acting up *after* upgrading to FreeBSD 6.x, and only acting up under load ...

Marc G. Fournier
Re: file system deadlock - the whole story?
On Wed, 19 Jul 2006, Kostik Belousov wrote:

On Wed, Jul 19, 2006 at 11:23:21AM -0300, User Freebsd wrote:

On Wed, 19 Jul 2006, Robert Watson wrote:

On Wed, 19 Jul 2006, User Freebsd wrote:

Also note that under FreeBSD 4.x, all three of these machines were pretty much my more solid machines, with even more vServers running on them than I'm able to run with 6.x ... once I got rid of using unionfs, stability skyrocketed :(

Hr ... but, your 'controller driver' comment ... that is one common thing amongst all three servers ... they are all running the iir driver ... not sure the *exact* controller, but pluto (older Dual-PIII) shows it as:

Yes, this was going to be my next question -- if you're seeing wedges under load and there's a common controller in use, maybe we're looking at a driver bug. Bugs of that sort typically look a lot like what you describe: an I/O is lost and so everything that depends on the I/O wedges waiting for it, leading to a lot of processes hanging around waiting for vnode locks, etc.

'k, but how do we debug *that*? :( If it was one, I'd suspect hardware ... but *three*, and only acting up *after* upgrading to FreeBSD 6.x, and only acting up under load ...

Obvious step would be to replace the controller with some different kind.

Unfortunately, that one isn't an option ... these aren't local machines that I can easily swap hardware in :(

Marc G. Fournier
Re: file system deadlock - the whole story?
On Wed, 19 Jul 2006, Robert Watson wrote:

On Wed, 19 Jul 2006, User Freebsd wrote:

Yes, this was going to be my next question -- if you're seeing wedges under load and there's a common controller in use, maybe we're looking at a driver bug. Bugs of that sort typically look a lot like what you describe: an I/O is lost and so everything that depends on the I/O wedges waiting for it, leading to a lot of processes hanging around waiting for vnode locks, etc.

'k, but how do we debug *that*? :( If it was one, I'd suspect hardware ... but *three*, and only acting up *after* upgrading to FreeBSD 6.x, and only acting up under load ...

There are two normal approaches:

(1) Switch controllers and see if the problem goes away, then blame the controller that was replaced. :-)
(2) Debug the driver when the system is in the wedged state.

When Scott Long helped me out with an identical problem with the 3ware driver a few years ago, he basically added debugging output for the driver in the debugger to list the state of outstanding I/Os, count the number of in-bound, out-bound I/Os, etc, to try and find where the missing one was leaked. My impression is that once he had confirmed the presence of the problem, it was fairly easy to fix, but that confirming it required quite a bit of paperwork.

'k, first question is will the core file provide any insight into this? ie. provide further confirmation that it looks like the driver vs file system?

second question, who is currently maintaining the iir driver? I've CC'd Achim in this, as he's listed in the man page as being the maintainer ...

Now, uranus has all the various kernel debugging enabled right now, and a serial console, so we're good for the debugging side of things ... and I believe that I can fairly easily recreate the issue by just moving a whack of vServers onto that machine to give it the load that seems to kill it ... *and* uranus is one of my newer machines, so the card that is in it is fairly new ... but, since I have a full BIOS serial console working on it, I should be able to get full model # and firmware version, which I take it will help some?

Marc G. Fournier
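The second approach described above (counting in-bound and out-bound I/Os and listing the outstanding ones to find the request that was leaked) can be sketched in a few lines of standalone C. This is only an illustration of the bookkeeping idea, not code from the iir(4) or 3ware drivers: every name here (`demo_io_stats`, `demo_io_submit()`, `demo_io_find_stuck()`, `DEMO_MAX_TAGS`) is hypothetical, and the timestamps are plain counters that the caller supplies and that must be nonzero, since 0 marks a free slot.

```c
#include <stddef.h>

#define DEMO_MAX_TAGS 32   /* hypothetical number of request slots */

struct demo_io_stats {
    unsigned long submitted;                 /* requests handed to the card */
    unsigned long completed;                 /* completions seen so far */
    unsigned long issue_tick[DEMO_MAX_TAGS]; /* submit time per slot, 0 = free */
};

/* Record a new request; returns its slot number, or -1 if all slots are busy. */
static int
demo_io_submit(struct demo_io_stats *st, unsigned long now)
{
    for (int tag = 0; tag < DEMO_MAX_TAGS; tag++) {
        if (st->issue_tick[tag] == 0) {
            st->issue_tick[tag] = now;
            st->submitted++;
            return tag;
        }
    }
    return -1;
}

/* Record a completion; completions for free or invalid slots are ignored. */
static void
demo_io_complete(struct demo_io_stats *st, int tag)
{
    if (tag >= 0 && tag < DEMO_MAX_TAGS && st->issue_tick[tag] != 0) {
        st->issue_tick[tag] = 0;
        st->completed++;
    }
}

/* Requests submitted but not yet completed. */
static unsigned long
demo_io_outstanding(const struct demo_io_stats *st)
{
    return st->submitted - st->completed;
}

/* First slot outstanding for longer than max_age ticks, or -1 if none. */
static int
demo_io_find_stuck(const struct demo_io_stats *st, unsigned long now,
    unsigned long max_age)
{
    for (int tag = 0; tag < DEMO_MAX_TAGS; tag++) {
        if (st->issue_tick[tag] != 0 && now - st->issue_tick[tag] > max_age)
            return tag;
    }
    return -1;
}
```

In a wedged system, a nonzero outstanding count that never drains, together with a slot whose age keeps growing, would point at a lost completion rather than at the file system layers waiting above it.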
Re: file system deadlock - the whole story?
On Wed, 19 Jul 2006, Scott Long wrote:

Now, uranus has all the various kernel debugging enabled right now, and a serial console, so we're good for the debugging side of things ... and I believe that I can fairly easily recreate the issue by just moving a whack of vServers onto that machine to give it the load that seems to kill it ... *and* uranus is one of my newer machines, so the card that is in it is fairly new ... but, since I have a full BIOS serial console working on it, I should be able to get full model # and firmware version, which I take it will help some?

What exact version of FreeBSD are you dealing with?

6-STABLE from ~Jun 28th ... but, I can upgrade it to the latest -STABLE if you feel that that might either help, or at least make debugging easier ...

Marc G. Fournier
Re: file system deadlock - the whole story?
Kostik/Robert ... does this provide enough (any?) information concerning the deadlock situation(s) that are being reported? is there anything else I should do the next time it happens? I tried to submit a GnATs report on this also, but fear that the attachment was a wee bit too big :(

On Tue, 18 Jul 2006, User Freebsd wrote:

'k, had a bunch of fun tonight, but one of the results is that I was able to achieve file system deadlock, or so it appears ...

Using the following from DDB:

    set $lines=0
    show pcpu
    show allpcpu
    ps
    trace
    alltrace
    show locks
    show alllocks
    show uma
    show malloc
    show lockedvnods
    call doadump

I've been able to produce the attached output, as well as have a core dump that can hopefully be used to gather any that I may have missed this time *cross fingers*

Marc G. Fournier
vm_map.c lock up (Was: Re: NFS Locking Issue)
On Wed, 5 Jul 2006, Robert Watson wrote:

If you can get into DDB when the hang has occurred, output via serial console for the following commands would be very helpful:

    show pcpu
    show allpcpu
    ps
    trace
    traceall
    show locks
    show alllocks
    show uma
    show malloc
    show lockedvnods

'k, after 16 days uptime, the server that I got all the debugging turned on for finally hung up solid ... I was able to break into DDB over the serial link, and have run all of the above on it ... and the output is attached ...

One thing to note is that the ps listing is not complete ... there are 6k processes running at the time, and I don't know how to get rid of the '--more--' prompt :( After 1k processes, I just hit 'q' and went onto the other commands ...

Also, traceall gave me a 'No such command' error ... now that I think about it, my luck, it was supposed to be 'trace all'?

If this doesn't provide enough information, please let me know what else I should do the next time through, besides the above commands ...

Oh, and how do you get DDB to 'dump core' in 6.x? Back in 4.x days, I'd just do 'panic' (maybe twice) at the DDB prompt, but that didn't work with 6.x ... it just gave me a stacktrace and then the DDB prompt both times ...

Marc G. Fournier

typescript.gz
Description: Binary data
Re: vm_map.c lock up (Was: Re: NFS Locking Issue)
On Sat, 15 Jul 2006, User Freebsd wrote:

On Wed, 5 Jul 2006, Robert Watson wrote:

If you can get into DDB when the hang has occurred, output via serial console for the following commands would be very helpful:

    show pcpu
    show allpcpu
    ps
    trace
    traceall
    show locks
    show alllocks
    show uma
    show malloc
    show lockedvnods

'k, after 16 days uptime, the server that I got all the debugging turned on for finally hung up solid ... I was able to break into DDB over the serial link, and have run all of the above on it ... and the output is attached ...

One thing to note is that the ps listing is not complete ... there are 6k processes running at the time, and I don't know how to get rid of the '--more--' prompt :( After 1k processes, I just hit 'q' and went onto the other commands ...

Also, traceall gave me a 'No such command' error ... now that I think about it, my luck, it was supposed to be 'trace all'?

If this doesn't provide enough information, please let me know what else I should do the next time through, besides the above commands ...

Oh, and how do you get DDB to 'dump core' in 6.x? Back in 4.x days, I'd just do 'panic' (maybe twice) at the DDB prompt, but that didn't work with 6.x ... it just gave me a stacktrace and then the DDB prompt both times ...

Quick addendum ... the kernel on this server is from June 28th of this year ...

Marc G. Fournier
Re: vm_map.c lock up (Was: Re: NFS Locking Issue)
On Sat, 15 Jul 2006, Kostik Belousov wrote:

> On Sat, Jul 15, 2006 at 12:10:29AM -0300, User Freebsd wrote:
>> On Wed, 5 Jul 2006, Robert Watson wrote:
>>> If you can get into DDB when the hang has occurred, output via serial
>>> console for the following commands would be very helpful:
>>>
>>>   show pcpu
>>>   show allpcpu
>>>   ps
>>>   trace
>>>   traceall
>>>   show locks
>>>   show alllocks
>>>   show uma
>>>   show malloc
>>>   show lockedvnods
>>
>> 'k, after 16 days uptime, the server that I got all the debugging turned on for finally hung up solid ... I was able to break into DDB over the serial link, and have run all of the above on it ... and the output is attached ...
>>
>> One thing to note is that the ps listing is not complete ... there are 6k processes running at the time, and I don't know how to get rid of the '--more--' prompt :( After 1k processes, I just hit 'q' and went onto the other commands ...
>
> set lines=0
>
>> Also, traceall gave me a 'No such command' error ... now that I think about it, my luck, it was supposed to be 'trace all'?
>
> It is alltrace.
>
>> If this doesn't provide enough information, please let me know what else I should do the next time through, besides the above commands ...
>
> Missing alltrace output seems to be critical. If this is not feasible, please provide at least the output of bt pid for each pid shown in show lockedvnods and show alllocks. In your case, bt 64880 was the most interesting. It is a pity that you had reset the machine.

Was down for too long as it was ... it, of course, happened while I was out with the family :( Will keep all of this in mind next time I get a chance to run through things ...

Any idea why 'panic' doesn't produce a core like it used to?

> Just in case, do you use mlocked mappings? Also, why does such a huge number of crons exist in the system? They are all forking now. It may be (cannot say definitely without further investigation) just a fork bomb.

mlocked mappings? What are they? :)

re: crons ... this, I'm not sure of, but my suspicion was that the crons weren't able to complete, since the file system was locked up, while the next ones were still being started ... *shrug*

Marc G. Fournier, Hub.Org Networking Services (http://www.hub.org)
Re: em device hangs on ifconfig alias ...
On Mon, 10 Jul 2006, Patrick M. Hausen wrote:

> Mornin'!
>
> On Mon, Jul 10, 2006 at 12:11:36AM -0400, Mike Tancsa wrote:
>>> Not sure what STP is
>>
>> Spanning Tree Protocol. Having the link go up and down would cause the switch port to block traffic for a period of time.
>
> Of course, any reasonable administrator would configure
>
>   interface FastEthernet0/1
>    spanning-tree portfast

'k, I know nothing about Cisco but do have access to change my configs (knowing nothing tends to keep me from doing too much playing) ... what does the above do, exactly?

Marc G. Fournier, Hub.Org Networking Services (http://www.hub.org)
Re: em device hangs on ifconfig alias ...
On Mon, 10 Jul 2006, Mike Tancsa wrote: At 01:20 PM 08/07/2006, Ruslan Ermilov wrote: Ah, I see. Thanks for the insight. How about the attached patch? I've been working on this problem for Mike Tancsa about a year ago, and my fix was naive. I ended up not committing it because I found that it broke something else, but I don't remember what exactly now. Ahh, I seem to remember now -- setting a different MAC address was not programmed into a hardware with my patch applied. For my uses, this was a non issue. Having STP block for 20 seconds because I add or remove an alias made it kind of a non issue. Not sure what STP is, but I've not noticed any blocking on removing an alias, only on adding one ... Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email . [EMAIL PROTECTED] MSN . [EMAIL PROTECTED] Yahoo . yscrappy Skype: hub.orgICQ . 7615664 ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: em device hangs on ifconfig alias ...
On Sat, 8 Jul 2006, Michael Vince wrote:

> I thought I remember a developer working on the em driver saying just before 6.1 was released that this reset was needed and couldn't be avoided to ensure the device performs at its best. I can't remember his explanation, but this topic has come up before; of course, anything is possible to fix.

The thing is, and I may be misunderstanding the explanations so far, the 'reset' is to renegotiate the connection ... if that is the case, and both the switch and the interface are already locked at a speed (in my case, both are hard coded to 100baseTX full duplex), then what is there to re-negotiate?

And, why does it appear that *only* the em driver/interface requires this? I run bge and fxp interfaces on this same network, against the same switch, all locked at the same speed, and only the em driver exhibits this problem ... in fact, it's only the *newer* em driver that does, as I have one server on the network, using an em interface, that is running an older FreeBSD 4.x kernel, that performs the same as the bge/fxp (i.e. perfectly)

Marc G. Fournier, Hub.Org Networking Services (http://www.hub.org)
Re: em device hangs on ifconfig alias ...
On Fri, 7 Jul 2006, Atanas wrote: Robert Watson said the following on 7/7/06 7:17 AM: I just left a tcpdump -n arp host 10.10.64.40 on a third machine sniffing around and tested all em module versions I had (the stock 6.1, 6-STABLE and 6-STABLE with your patch), but got silence on all three: That's odd. I've tested it on CURRENT and I could see the ARP packet. Are you sure you patched correctly? If so I have to build a RELENG_6 machine and give it try. Is it possible you're seeing an interaction between the reset generated as part of IP address changing, and the time it takes to negotiate link? It's possible that the arp packets are being eaten during the link negotiation, so for systems negotiating quickly (or not at all) then the arp packet is seen on other hosts, and otherwise not... Looks like this is exactly what happens. I was able to see it by running two tcpdump instances - one on the EM machine running in background and another running elsewhere on the same subnet. So on the EM machine the arp packet actually gets generated by em(4) and caught by the tcpdump running there: EM# tcpdump -n arp and ether src 00:04:23:b5:1b:ff EM# EM# ifconfig em1 inet alias 10.10.64.40 EM# 11:28:37.178946 arp who-has 10.10.64.40 tell 10.10.64.40 EM# But it doesn't reach the other tcpdump instance running on another host. It seems that the arp packet gets killed before leaving the EM machine, due to the card initialization or something else. I tried sending it manually with arping, just to make sure both tcpdumps operate properly and yes, the packet got delivered to both. I think that I have patched, built and loaded the em(4) kernel module correctly. After applying the patch there were no rejects, before building the module I intentionally appended (patched) to its version string in if_em.c, and could see that in dmesg every time I loaded the module: em1: Intel(R) PRO/1000 Network Connection Version - 3.2.18 (patched) Is it possible that we're going at this issue backwards? 
It isn't the lack of ARP packet going out that is causing the problems with moving IPs, but that delay that we're seeing when aliasing a new IP on the stack? The ARP packet *is* being attempted, but is timing out before the re-init is completing? Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email . [EMAIL PROTECTED] MSN . [EMAIL PROTECTED] Yahoo . yscrappy Skype: hub.orgICQ . 7615664 ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: NFS Locking Issue
On Wed, 5 Jul 2006, Robert Watson wrote: On Wed, 5 Jul 2006, Danny Braniss wrote: In my case our main servers are NetApp, and the problems are more related to am-utils running into some race condition (need more time to debug this :-) the other problem is related to throughput, freebsd is slower than linux, and while freebsd/nfs/tcp is faster on Freebsd than udp, on linux it's the same. So it seems some tunning is needed. our main problem now is samba/rpc.lockd, we are stuck with a server running FreeBSD 5.4 which crashes, and we can't upgrade to 6.1 because lockd doesn't work. So, if someone is willing to look into the lockd issue, we would like to help. The most significant problem working with rpc.lockd is creating easy to reproduce test cases. Not least because they can potentially involve multiple clients. If you can help to produce simple test cases to reproduce the bugs you're seeing, that would be invaluable. I'm aware of two general classes of problems with rpc.lockd. First, architectural issues, some derived from architectural problems in the NLM protocol: for example, assumptions that there can be a clean mapping of process lock owners to locks, which fall down as locks are properties of file descriptors that can be inheritted. Second, implementation bugs/misfeatures, such as the kernel not knowing how to cancel lock requests, so being unable to implement interruptible waits on locks in the distributed case. Reducing complex failure modes to easily reproduced test cases is tricky also, though. It requires careful analysis, often with ktrace and tcpdump/ethereal to work out what's going on, and not a little luck to perform the reduction of a large trace down to a simple test scenario. The first step is to try and figure out what, if any, specific workload results in a problem. For example, can you trigger it using work on just one client against a server, without client-client interactions? 
This makes tracking and reproduction a lot easier, as multi-client test cases are really tricky! Once you've established whether it can be reproduced with a single client, you have to track down the behavior that triggers it -- normally, this is done by attempting to narrow down the specific program or sequence of events that causes the bug to trigger, removing things one at a time to see what causes the problem to disappear. This is made more difficult as lock managers are sensitive to timing, so removing a high load item from the list, even if it isn't the source of the problem, might cause it to trigger less frequently. I'm not sure if this is an option for anyone, either developer or user, but in the past, on particularly tricky bugs where I seemed to be the only one to be able to produce it, I've given access to a 'trusted developer' to the machine itself, to minimize the time lag that emails create ... but, also, to let the developer at a machine that has the load required to easily reproduce it ... Not sure if there is anyone out there, on either side of the proverbial fence, that feels comfortable doing this, but figured I'd throw the idea out ... I believe, in Francisco's case, they are willing to pay someone to fix the NFS issues they are having, which, i'd assume, means easy access to the problematic server(s) to do proper testing in a real life scenario ... Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email . [EMAIL PROTECTED] MSN . [EMAIL PROTECTED] Yahoo . yscrappy Skype: hub.orgICQ . 7615664 ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]
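The "remove things one at a time" reduction described above can be sketched in a few lines of Python. This is only an illustration, not anything from the thread: the workload names and the still_fails() callback are hypothetical, and on a real system still_fails() would mean re-running the reduced workload (probably several times, since lock managers are timing sensitive) and checking whether the lock problem still appears.

```python
def minimize(items, still_fails):
    """Drop items one at a time, keeping a removal only if the failure persists.

    items       -- list of workload components (hypothetical names)
    still_fails -- callback taking a candidate list, returning True if the
                   problem still reproduces with just those components
    """
    current = list(items)
    i = 0
    while i < len(current):
        candidate = current[:i] + current[i + 1:]
        if still_fails(candidate):
            current = candidate   # item i was not needed to trigger the bug
        else:
            i += 1                # item i is needed; keep it and move on
    return current


if __name__ == "__main__":
    # Hypothetical example: the bug needs both "samba" and "maildrop" running.
    workload = ["cron", "maildrop", "samba", "backup"]
    needs_both = lambda ws: "samba" in ws and "maildrop" in ws
    print(minimize(workload, needs_both))
```

Because a flaky reproduction can make still_fails() return False by accident, a real run would want to repeat each trial before concluding that a component is required.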
Re: NFS Locking Issue
On Wed, 5 Jul 2006, Michel Talon wrote:

> So it may be relevant to say that i have kernels without IPV6 support. Recall that i have absolutely no problem with the client in FreeBSD-6.1. Tomorrow i will test one of the 6.1 machines as a NFS server and the other as a client, and will make you know if i see something.
>
> Well, i have checked between 2 FreeBSD-6.1-RELEASE machines on the network, both have fxp ethernet driver running at 100 Mb/s, one is NFS server other NFS client. Both run lockd and statd. I have absolutely no problem exchanging files, for example if i begin to copy /usr/src through NFS from one machine to the other, which makes a lot of transactions of all sorts, i get:
>
>   niobe# mount asmodee:/usr/src /mnt
>   cp -R /mnt/src .
>   ... after some time i interrupt the transfer
>   niobe% du -sh .
>   131M    .
>
> and during this time i observe the following type of statistics
>
>   asmodee% netstat -w 1 -I fxp0
>               input         (fxp0)           output
>      packets  errs      bytes    packets  errs      bytes colls
>          542     0      84116       1330     0    1219388     0
>          515     0      72806       1290     0    1196330     0
>          501     0      95722       1081     0     741048     0
>          539     0      90704       1090     0    1228052     0
>          645     0      67888        902     0    1451098     0
>          405     0      81264       1609     0     604278     0
>          503     0      74218        709     0     924422     0
>          500     0      98904        973     0     619350     0
>          550     0     100122        855     0     836328     0
>          615     0      79336       1081     0     862772     0
>          577     0      82862        901     0    1005024     0
>
> which looks decent to me. Doing the same with just one big file no problem either, and i get a transfer speed of 6.60 MB/s which is perhaps a little less than with linux, but nothing catastrophic. I get 8.20 MB/s for FreeBSD client interacting with the Linux server. Now netstat gives
>
>      packets  errs      bytes    packets  errs      bytes colls
>          785     0     123266       4716     0    6825600     0
>          759     0     139898       4530     0    7747276     0
>          852     0     124652       5106     0    6902566     0
>          863     0     128040       5170     0    7081738     0
>          811     0     123760       4862     0    6851498     0
>          789     0     123540       4720     0    6834310     0
>          840     0     115378       5024     0    6382114     0
>
> So up to what i can see NFS works OK for me on FreeBSD-6.1. So the main difference with other people cases may be that i have removed IPV6 support from kernel.

What are others using for ethernet? In your case, you say you are running between fxp cards ... I've heard some report, in another thread, problems with the bge driver ... could we be possibly talking internet vs nfs issues?

Marc G. Fournier, Hub.Org Networking Services (http://www.hub.org)
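For anyone wanting to compare their own numbers against the netstat listings above, a small Python sketch follows. It simply averages the output-bytes column of the second listing (one sample per second). Treating "MB" as 2**20 bytes is an assumption, and the byte values are as read from the listing, where some columns had run together; wire bytes also include protocol overhead, so this is an upper bound on file throughput.

```python
def mean_rate(byte_counts, interval_s=1.0, unit=2**20):
    """Average throughput over per-interval byte counts, in units of `unit` bytes/s."""
    if not byte_counts:
        raise ValueError("need at least one sample")
    return sum(byte_counts) / (len(byte_counts) * interval_s) / unit


# Output-bytes column of the second netstat listing (single large file copy),
# one sample per second as produced by `netstat -w 1`.
out_bytes = [6825600, 7747276, 6902566, 7081738, 6851498, 6834310, 6382114]

if __name__ == "__main__":
    print(f"{mean_rate(out_bytes):.2f} MB/s")
```

With these seven samples the average comes out at roughly 6.6 (2**20-byte) MB/s on the wire.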
Re: NFS Locking Issue
On Wed, 5 Jul 2006, Francisco Reyes wrote:

> Scott Long writes:
>> For what it's worth, I recently spent a lot of time putting FreeBSD 6.1 to the test as both an NFS client and server in a mixed OS environment.
>
> I have a few debugging settings/suggestions that have been sent my way and I plan to try them tonight, but this is just another report.. FreeBSD only environment. Today after hours going crazy with horrible performance I brought down nfsd and brought it back up.. that simple process got vmstat 'b' column down and everything was back to normal. Again this will not help anyone troubleshoot, but just to mention that it happens even with a FreeBSD only environment.

'k, to those out there that know what is useful, and what isn't ... If Francisco had DDB enabled, did a CTL-ALT-ESC when the above happens, and did a 'panic' to crash the server and dump a core ... can anything useful be gleaned from that core dump?

Marc G. Fournier, Hub.Org Networking Services (http://www.hub.org)
Re: NFS Locking Issue
On Mon, 3 Jul 2006, Francisco Reyes wrote:

> Kostik Belousov writes:
>> I think that then 6.2 and 6.3 is not for you either. Problems cannot be fixed until enough information is given.
>
> I am trying.. but so far only other users who are having the same problem are commenting on this and other similar threads. We just need some guidance.. Marc gave me a URL to turn on debugging and volunteered to give me some pointers.. I will try, but I will likely try on my own time, on my own machines.. I can not tell the owner of the company I work for to let me try.. or play around in production machines.. as we lose customers because of current problems with the 6.X line.
>
>> Since nobody except you experience that problems (at least, only you notified about the problem existence)
>
> Did you miss the part of:
>
> User Freebsd writes:
>> Since there are several of us experiencing what looks to be the same sort of deadlock issue, I beseech you not to give up
>
> I am not the only one reporting or having the issue.

Careful here, I think this is where things are getting confused ... the above is related to the deadlock (high vmstat 'b' column) issue, not the NFS issue ... we're getting two different issues confused :)

>> improved handling of signals in nfs client. If you could test it, that would be useful.
>
> Does it matter if the OS is i386 or amd64? Have an amd64 machine I can more easily play with... with no risk to production.

Does the amd64 machine exhibit the same problem?

Marc G. Fournier, Hub.Org Networking Services (http://www.hub.org)
Re: High vmstat, filesystem unresponsive then hang 6.1 Stable
This is the same issue that I've been hitting, and that requires the serial console / DDB stuff described in the debugging deadlocks web page that I pointed you at ...

So far *knock on wood* since adding all of the debugging to one of my servers, none of mine have done it ... but the more ppl experiencing this, and getting the debugging in place to provide proper kernel traces, the better ...

On Sat, 1 Jul 2006, Francisco Reyes wrote:

> I believe this may be related to the NFS issues mentioned recently, but hopefully I may have captured enough info to help others troubleshoot.. I got the header of some ps commands.. and when was about to do full listing of the same ps commands to files.. the machine hung up.
>
> The machine is 6.1 Stable around 6-25 (plus or minus 1 day).
>
> iostat 5 (not much of a load)
>
>       tty             da0            cpu
>  tin tout  KB/t tps  MB/s  us ni sy in id
>    0   31 17.71 125  2.17  20  0  5  1 74
>    0   26  8.57  23  0.19   0  0  1  0 99
>    0    9 33.73  10  0.34   0  0  0  0 99
>    0   21  8.42  18  0.15   0  0  1  1 99
>    0    9 15.92  58  0.90   0  0  0  0 99
>    0    9 15.18   7  0.10   0  0  0  0 99
>    0   53 12.93   9  0.11   0  0  1  0 99
>    0   31  5.17  58  0.29   0  0  1  1 99
>
> vmstat 5 (very high 'b' column)
>
>  procs      memory      page                    disk   faults      cpu
>  r   b w     avm    fre  flt re pi po   fr  sr da0   in   sy   cs us sy id
>  0 248 2 1410436 110728 1519  2  0  0 1644 264   0 4481 8862 9168 20  6 74
>  0 248 0 1410436 110796    0  0  0  0   13   0   4  700   40 1426  0  1 99
>  0 248 0 1410436 110764    1  0  0  0   39   0  14 1253  722 2615  0  1 99
>  0 248 0 1410436 110720    1  0  0  0   10   0   5  407  396  899  0  1 99
>  0 248 0 1410436 110704    1  0  0  0   60   0  21 2822  360 5695  0  2 98
>  0 248 0 1410436 110684    1  0  0  0   10   0   7  538  434 1166  0  1 99
>  0 248 0 1410436 110668    0  0  0  0   75   0  51  576  163 1026  0  0 99
>  0 248 0 1410436 110696    0  0  0  0   23   0  31 1171  190 2271  0  1 99
>
> vmstat 5
>
>  procs      memory      page                    disk   faults      cpu
>  r   b w     avm    fre  flt re pi po   fr  sr da0   in   sy   cs us sy id
>  0 250 1 1399688 152000 1517  2  0  0 1643 264   0 4479 8853 9163 20  6 74
>  0 250 0 1399688 151968    2  0  0  0   25   0  28 1395  966 2852  0  2 98
>  0 250 0 1399692 151892    1  0  0  0   12   0   6  446  540  986  0  0 99
>  0 250 2 1399692 151604    1  0  0  0   50   0  37  803  675 1611  0  1 99
>
> Don't recall which ps..
>
>    411     1    0 ufs ?? Ds 0:04.81 /usr/sbin/mountd -r
>  37675   650    0 ufs ?? D  0:00.46 /usr/bin/perl /data/backaway/mailarchive/client/bin/smtpproxy 127.0.0.1:10026 127.0.0.1:10025 (perl5.8.7)
>  37919   650    0 ufs ?? D  0:00.46 /usr/bin/perl /data/backaway/mailarchive/client/bin/smtpproxy 127.0.0.1:10026 127.0.0.1:10025 (perl5.8.7)
>  39306   650    0 ufs ?? D  0:00.39 /usr/bin/perl /data/backaway/mailarchive/client/bin/smtpproxy 127.0.0.1:10026 127.0.0.1:10025 (perl5.8.7)
>  40214 38649 4100 ufs ?? Ds 0:00.00 /usr/local/bin/maildrop -d [EMAIL PROTECTED]
>  40220 32943 4100 ufs ?? Ds 0:00.00 /usr/local/bin/maildrop -d [EMAIL PROTECTED]
>  40223 33257 4100 ufs ?? Ds 0:00.00 /usr/local/bin/maildrop -d [EMAIL PROTECTED]
>  40226 32942 4100 ufs ?? Ds 0:00.00 /usr/local/bin/maildrop -d [EMAIL PROTECTED]
>  40228 33199 4100 ufs ?? Ds 0:00.00 /usr/local/bin/maildrop -d [EMAIL PROTECTED]
>  40231 38599 4100 ufs ?? Ds 0:00.00 /usr/local/bin/maildrop -d [EMAIL PROTECTED]
>  40233 32896 4100 ufs ?? Ds 0:00.00 /usr/local/bin/maildrop -d [EMAIL PROTECTED]
>  40236 33224 4100 ufs ?? Ds 0:00.00 /usr/local/bin/maildrop -d [EMAIL PROTECTED]
>  40238 32876 4100 ufs ?? Ds 0:00.00 /usr/local/bin/maildrop -d [EMAIL PROTECTED]
>  40240 32976 4100 ufs ?? Ds 0:00.00 /usr/local/bin/maildrop -d [EMAIL PROTECTED]
>  40242 35580 4100 ufs ?? Ds 0:00.00 /usr/local/bin/maildrop -d [EMAIL PROTECTED]
>  40246 35593 4100 ufs ?? Ds 0:00.00 /usr/local/bin/maildrop -d [EMAIL PROTECTED]
>  40248 32923 4100 ufs ?? Ds 0:00.00 /usr/local/bin/maildrop -d [EMAIL PROTECTED]
>  40252 35596 4100 ufs ?? Ds 0:00.01 /usr/local/bin/maildrop -d [EMAIL PROTECTED]
>  40253 29833 4100 ufs ?? Ds 0:00.01 /usr/local/bin/maildrop -d [EMAIL PROTECTED]
>
> ps ax -O ppid,flags,mwchan | awk '$6 ~ /^D/ || $6 == STAT'
>
>  PID PPID   F MWCHAN TT STAT     TIME COMMAND
>    2    0 204 -      ?? DL    0:17.68 [g_event]
>    3    0 204 -      ?? DL    9:14.85 [g_up]
>    4    0 204 -      ?? DL   10:50.81 [g_down]
>    5    0 204 -      ?? DL    0:02.93 [thread taskq]
>    6    0 204 -      ?? DL    0:00.00 [acpi_task0]
>    7    0 204 -      ?? DL    0:00.00 [acpi_task1]
>    8    0 204 -      ?? DL    0:00.00 [acpi_task2]
>    9    0 204 -      ?? DL    0:00.00 [kqueue taskq]
>   15    0
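Since the symptom in this report is the vmstat 'b' column staying high, a small watcher can flag the condition early enough to break into DDB before the box hangs completely. The following Python sketch only parses vmstat text; the threshold of 100 and the "three consecutive samples" rule are arbitrary assumptions, and the sample rows are adapted from the vmstat output above.

```python
def blocked_counts(vmstat_text):
    """Return the 'b' (blocked) value from each data row of vmstat output."""
    counts = []
    seen_header = False
    for line in vmstat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[:3] == ["r", "b", "w"]:
            seen_header = True        # column header row: r b w avm fre ...
            continue
        if seen_header and fields[0].isdigit():
            counts.append(int(fields[1]))   # 'b' is the second column
    return counts


def looks_stuck(counts, threshold=100, min_samples=3):
    """True if the last min_samples readings are all at or above threshold."""
    recent = counts[-min_samples:]
    return len(recent) == min_samples and all(c >= threshold for c in recent)


SAMPLE = """\
 procs      memory      page                    disk   faults      cpu
 r b w     avm    fre  flt  re  pi  po  fr  sr da0   in   sy  cs us sy id
 0 248 2 1410436 110728 1519 2 0 0 1644 264 0 4481 8862 9168 20 6 74
 0 248 0 1410436 110796 0 0 0 0 13 0 4 700 40 1426 0 1 99
 0 248 0 1410436 110764 1 0 0 0 39 0 14 1253 722 2615 0 1 99
"""

if __name__ == "__main__":
    counts = blocked_counts(SAMPLE)
    print(counts, "stuck" if looks_stuck(counts) else "ok")
```

In practice the text would come from a running `vmstat 5` rather than a fixed string, and the threshold would need tuning to the machine's normal load.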
Re: NFS Locking Issue
On Sat, 1 Jul 2006, Francisco Reyes wrote:

> John Hay writes:
>> I only started to see the lockd problems when upgrading the server side to FreeBSD 6.x and later. I had various FreeBSD clients, between 4.x and 7-current and the lockd problem only showed up when upgrading the server from 5.x to 6.x.
>
> It confirms the same we are experiencing.. constant freezing/locking issues. I guess no more 6.X for us.. for the foreseeable future..

Since there are several of us experiencing what looks to be the same sort of deadlock issue, I beseech you not to give up ... right now, all we've been able to get to the developers is virtually useless information (vmstat and such shows the problem, but it doesn't allow developers to identify the problem) ...

Is this a problem that you can easily recreate, even on a non-production machine?

In my case, I have one machine fully configured for debugging, but, of course, since re-configuring it, it hasn't exhibited the problem ... the more of us that get our machines configured properly to give useful information to the developers to debug this, the faster it will get fixed ...

My experience with most of the developers is that if you can get into DDB and give them 'internal traces' of the code, bugs tend to get fixed very quickly ... vmstat/ps give external views, more summaries than anything ... it's the details under the hood that they need ... it's not much different than your auto-mechanic ... try telling him there is a 'knocking under the hood, please tell me how to fix it, but you can't have my car', and he'll brush you off ... give him 30 minutes under the hood, and not only will he have identified it, but he'll probably fix it too ...

Marc G. Fournier, Hub.Org Networking Services (http://www.hub.org)
Re: em device hangs on ifconfig alias ...
On Fri, 30 Jun 2006, Atanas wrote:

> A workaround is to power both of the systems down and then power them up. This however cannot be done remotely and in case there were IP aliases, they still don't get any traffic.

see 'arping' ... great little tool, solved all my problems as far as moving around IPs ...

> I still have many 4.x based machines, and both em issues (the card reset on each alias and the arp packets not being sent when going down) were present when I was doing my tests.

Right, what version of 4.x? The one that I have working is from ~Feb 2005 .. if I were to upgrade that to the latest 4-STABLE, it would break like the rest ... the older 4.x had a different em driver in the kernel than the newer one ...

Marc G. Fournier, Hub.Org Networking Services (http://www.hub.org)
Re: configuring sio1 for serial console ...
On Thu, 29 Jun 2006, Robert Watson wrote: On Wed, 28 Jun 2006, User Freebsd wrote: On Wed, 28 Jun 2006, Robert Watson wrote: On Wed, 28 Jun 2006, User Freebsd wrote: Instead of changing your kernel config, edit the sio1 entries in /boot/device.hints. (This assumes you left device sio in your kernel -- if not, you need to re-add it). 'k, re-adding ... and I take it there is no more 'DDB_UNATTENDED' option? Something equivalent? This is now KDB_UNATTENDED, since it affects both DDB and GDB. KDB is the common debugger framework backend used to implement front-end debugging services. Ya, figured this one out when I tried to compile ... someone might want to add a mention of the new options in the ddb man page though :) You mean like the following text in the ddb(4) man page? NAME ddb -- interactive kernel debugger SYNOPSIS options KDB options DDB To prevent activation of the debugger on kernel panic(9): options KDB_UNATTENDED Ack, I was probably reading on one of my 4.x boxes :( But, the handbook does need to be updated: http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-online-ddb.html Marc G. Fournier, Hub.Org Networking Services (http://www.hub.org)
Re: em device hangs on ifconfig alias ...
On Thu, 29 Jun 2006, Peter Jeremy wrote: On Thu, 2006-Jun-29 17:30:07 +1000, Michael Vince wrote: For me its IP alias additions take 1 or maybe 2secs, but it is noticeable, but really isn't an issue for me. But it obviously is for Atanas, who has 100's of aliases. In my case, it isn't 100's, but the problem is noticeable ... I have my start up scripts, right now, do the ifconfig, sleep for 45 seconds, and then start up the jail ... and even then, apache doesn't *always* start up, since sometimes that isn't long enough for the network to come back up for DNS to be reachable :( Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email . [EMAIL PROTECTED] MSN . [EMAIL PROTECTED] Yahoo . yscrappy Skype: hub.orgICQ . 7615664 ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]
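The fixed 45 second sleep described above can be replaced by polling for the thing the jail actually needs (here, working DNS) with an upper bound on the wait. This is a minimal Python sketch under stated assumptions: the hostname is a placeholder, the timeout values are arbitrary, and a real startup script would call something like this between the ifconfig and the jail start.

```python
import socket
import time


def wait_until(check, timeout_s=60.0, interval_s=1.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll check() until it returns True or timeout_s elapses.

    Returns True as soon as check() succeeds, False on timeout.
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


def dns_resolves(name="www.freebsd.org"):
    """True if the resolver can look up name right now (placeholder hostname)."""
    try:
        socket.getaddrinfo(name, None)
        return True
    except OSError:       # socket.gaierror is a subclass of OSError
        return False
```

Usage would be along the lines of `if wait_until(dns_resolves, timeout_s=120): ...start the jail...`, which returns as soon as the link is usable instead of always waiting the worst case, and reports failure instead of starting apache without DNS.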
Re: 6.1-R ? 6-Stable ? 5.5-R ?
On Thu, 29 Jun 2006, Francisco Reyes wrote: Kostik Belousov writes: Approved by:pjd (mentor) Revision ChangesPath 1.156.2.3 +16 -0 src/sys/nfsserver/nfs_serv.c 1.136.2.3 +4 -0 src/sys/nfsserver/nfs_srvsubs.c The above files are what I have. Yes from a 6.1 stable around 6-25-06 What this means ? That you have _this_ revisions of the files, and your LA skyrocketed ? LA = load average? Our problem is vmstat 'b' column growing and nfs causing locks on the server side. When the machine locked it was running a background fsck. I saw Giant a lot in the status of the nfsd. 'k, you are going through something similar to me, it seems ... have you implemented the stuff on: http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html and: enabled DDB within your kernel? (man ddb) ... DDB is required for the deadlocks debugging ... also, is this a machine you have 'easy hands on' for? (ie. in my case, I'm dealing with remote servers, which is *really* fun) ... Actually, what you will want to do is setup a serial console if you can, so that you can 'trap the output' of the commands that stuff like ps and all that will throw out from DDB, unless you *really* like to write? I can help you get this all setup offlist if you wish, just email me and we'll work through the steps required ... once you have a debug environment in place, then generating a good/complete/in depth problem report is easier ... Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email . [EMAIL PROTECTED] MSN . [EMAIL PROTECTED] Yahoo . yscrappy Skype: hub.orgICQ . 7615664 ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: NFS Locking Issue
On Thu, 29 Jun 2006, Francisco Reyes wrote:

> Michel Talon writes:
>> Strange, since i upgraded to FreeBSD-6.1 and the NFS server to Fedora Core 5, my machine, NFS client is happy, and lockd works.
>
> What volume are we talking about? My own problems and other reports I see are all under heavy load.

the one thing that sticks out to me about this report is that they upgraded the NFS server to FC5 ... what was the server running before? if FreeBSD, could the problem be an interaction problem between the NFS server and client, vs just the client side?

Marc G. Fournier, Hub.Org Networking Services (http://www.hub.org)
Re: em device hangs on ifconfig alias ...
On Fri, 30 Jun 2006, Michael Vince wrote:

> The thing that I have to ask is, if Atanas has 100's, why can't he just boot FreeBSD and have them all prebound to the interface at startup? Why would you need to add and remove them constantly by the hundreds during normal server uptime? I do restart my jails now and then, but because the IPs are already bound to the interface I don't have any pause issues.

In my case, I move jails around between machines for load balancing reasons ... so, a physical server may be up for, hell, in one case, 211 days, but a vServer may only have been on it a few days ...

The other funny thing about the current em driver is that if you move an IP to it from a different server, the appropriate ARP packets aren't sent out to redirect the IP traffic .. recently, someone pointed me to arping, which has solved my problem *external* to the driver ...

I have a third machine that uses an em driver, but it's an older 4.x kernel, and it operates perfectly ... no timeouts/hangs and sends out the appropriate ARP packet ... all three servers are connected to the same Cisco switch, with all ports configured identically, so it isn't a switch issue, as someone else intimated ...

Marc G. Fournier, Hub.Org Networking Services (http://www.hub.org)
How to enter DDB through a terminal server / remote console ... ?
'k, now that I'm up to 3 6-STABLE servers that are deadlocking, I'm spending time with the remote tech today to get a serial console put online ... how do I drop into DDB remotely, where the serial console is going through a Portmaster Terminal server? issuing CTL-ALT-ESC, I doubt, will work, will it? Thx Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email . [EMAIL PROTECTED] MSN . [EMAIL PROTECTED] Yahoo . yscrappy Skype: hub.orgICQ . 7615664 ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]
em device hangs on ifconfig alias ...
Has anyone figured out why the em device 'hangs' for about 30-45 seconds whenever you ifconfig alias a new IP on to the device?

Marc G. Fournier, Hub.Org Networking Services (http://www.hub.org)
configuring sio1 for serial console ...
Following the instructions in the Handbook, I've added the following line to my kernel config:

    device sio1 at isa? port IO_COM2 flags 0x10 irq 3

but, when I try to build it:

    config: /usr/src/sys/i386/conf/kernel:71: syntax error
    *** Error code 1

so, obviously that is wrong for 6.x? :(

Marc G. Fournier, Hub.Org Networking Services (http://www.hub.org)
Re: How to enter DDB through a terminal server / remote console ... ?
On Wed, 28 Jun 2006, Robert Watson wrote:
> On Wed, 28 Jun 2006, User Freebsd wrote:
> > 'k, now that I'm up to 3 6-STABLE servers that are deadlocking, I'm spending time with the remote tech today to get a serial console put online ... how do I drop into DDB remotely, where the serial console is going through a Portmaster terminal server? Issuing CTL-ALT-ESC, I doubt, will work, will it?
>
> If configured to use a serial console (console=comconsole in loader.conf), you can enter the debugger with BREAK_TO_DEBUGGER in the kernel config by sending a serial break. With my portmasters, I telnet to a TCP port to connect to the serial console, so I send a telnet break, using ^]send break.

Have you ever had a problem with this warning in the Handbook?

    (useful for remote diagnostics, but also dangerous if you generate a spurious BREAK on the serial port!)

Marc G. Fournier, Hub.Org Networking Services (http://www.hub.org)
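For reference, the setup Robert describes could look roughly like the fragments below. This is a sketch under assumptions, not a tested configuration: the thread only names console=comconsole and BREAK_TO_DEBUGGER, and the KDB and DDB lines are assumed to be needed as well so that there is a debugger to break into.

```
# /boot/loader.conf: use the serial port as the system console
console="comconsole"

# Custom kernel config file (file name is site-specific)
options 	KDB			# debugger framework (assumed prerequisite)
options 	DDB			# interactive in-kernel debugger (assumed prerequisite)
options 	BREAK_TO_DEBUGGER	# a serial BREAK on the console enters the debugger
```

With options like these, any BREAK seen on the console port stops the machine in the debugger, which is the risk the Handbook warning is pointing at: a spurious BREAK from a noisy line or a misbehaving terminal server can halt a production box.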
Re: configuring sio1 for serial console ...
On Wed, 28 Jun 2006, Robert Watson wrote:
> On Wed, 28 Jun 2006, User Freebsd wrote:
> > Following the instructions in the Handbook, I've added the following line to my kernel config:
> >
> >     device sio1 at isa? port IO_COM2 flags 0x10 irq 3
> >
> > but, when I try to build it:
> >
> >     config: /usr/src/sys/i386/conf/kernel:71: syntax error
> >     *** Error code 1
> >
> > so, obviously that is wrong for 6.x? :(
>
> Instead of changing your kernel config, edit the sio1 entries in /boot/device.hints. (This assumes you left device sio in your kernel -- if not, you need to re-add it).

'k, re-adding ... and I take it there is no more 'DDB_UNATTENDED' option? Something equivalent?

Marc G. Fournier, Hub.Org Networking Services (http://www.hub.org)
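As an illustration of the device.hints approach, the old-style kernel config line above translates to hints along these lines. This is a sketch, assuming the port sits at the conventional COM2 address (0x2F8 is what IO_COM2 stands for) on IRQ 3; check the values against the actual hardware.

```
# /boot/device.hints: sio1 on COM2, flagged as a potential console
hint.sio.1.at="isa"
hint.sio.1.port="0x2F8"
hint.sio.1.irq="3"
hint.sio.1.flags="0x10"
```

If the sio0 entry also carries flags 0x10, the lower-numbered unit is normally the one picked as console, so that flag may need to be cleared on sio0 for sio1 to be used; see sio(4) for the exact flag semantics.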
Expensive timeout?
Just got this on the console of one of the servers that has been causing problems ...

    Expensive timeout(9) function: 0xc0520e18(0xc8b223a0) 0.296959250 s

not a very informative error, and that is all that was there, nothing before, nothing after ...

Marc G. Fournier, Hub.Org Networking Services (http://www.hub.org)
Re: Expensive timeout?
Oh, wait, does this have something to do with the Deadlock options I just added to the kernel?

On Wed, 28 Jun 2006, User Freebsd wrote:
> Just got this on the console of one of the servers that has been causing problems ...
>
>     Expensive timeout(9) function: 0xc0520e18(0xc8b223a0) 0.296959250 s
>
> not a very informative error, and that is all that was there, nothing before, nothing after ...

Marc G. Fournier, Hub.Org Networking Services (http://www.hub.org)
Re: configuring sio1 for serial console ...
On Wed, 28 Jun 2006, Robert Watson wrote:
> On Wed, 28 Jun 2006, User Freebsd wrote:
> > > Instead of changing your kernel config, edit the sio1 entries in /boot/device.hints. (This assumes you left device sio in your kernel -- if not, you need to re-add it).
> >
> > 'k, re-adding ... and I take it there is no more 'DDB_UNATTENDED' option? Something equivalent?
>
> This is now KDB_UNATTENDED, since it affects both DDB and GDB. KDB is the common debugger framework backend used to implement front-end debugging services.

Ya, figured this one out when I tried to compile ... someone might want to add a mention of the new options in the ddb man page though :)

Marc G. Fournier, Hub.Org Networking Services (http://www.hub.org)
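For anyone landing here from the old DDB_UNATTENDED name, a kernel config using the renamed option might look like the fragment below. It is only a sketch: KDB_UNATTENDED is the name given in the thread, while pairing it with KDB and DDB is an assumption about a typical debugging kernel.

```
# Custom kernel config file (file name is site-specific)
options 	KDB			# debugger framework shared by DDB and GDB
options 	DDB			# interactive in-kernel debugger
options 	KDB_UNATTENDED		# replaces DDB_UNATTENDED: do not stop in the debugger on panic
```

As far as I know, the same behaviour can also be toggled at run time through the debug.debugger_on_panic sysctl, which is handy on a remote box where nobody is watching the console.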
Re: Expensive timeout?
On Wed, 28 Jun 2006, Jonathan Noack wrote:
> Please don't top-post...
>
> User Freebsd wrote:
> > On Wed, 28 Jun 2006, User Freebsd wrote:
> > > Just got this on the console of one of the servers that has been causing problems ...
> > >
> > >     Expensive timeout(9) function: 0xc0520e18(0xc8b223a0) 0.296959250 s
> > >
> > > not a very informative error, and that is all that was there, nothing before, nothing after ...
> >
> > Oh, wait, does this have something to do with the Deadlock options I just added to the kernel?
>
> Yes, if you look in /sys/kern/kern_timeout.c you'll note that the Expensive timeout(9) function printf is inside an #ifdef DIAGNOSTIC.

'k, but is this something that I should be concerned about, or just ignore?

Marc G. Fournier, Hub.Org Networking Services (http://www.hub.org)
Re: em device hangs on ifconfig alias ...
On Wed, 28 Jun 2006, Atanas wrote:
> I have some newer machines with 2 Broadcom chips on-board. I plan to give them a try at some point in the future, but I'm not sure how stable the bge driver is when compared to fxp and em.

I'm using the bge driver on our new HP servers, and haven't noticed any problems with them to date ...

Marc G. Fournier, Hub.Org Networking Services (http://www.hub.org)