Re: FreeBSD unstable on Dell 1750 using SMP?
It doesn't look like a power problem. We have it with several systems in different datacenters. I've tried the "giantlock" setting; let's hope it works! Am I safe to assume that it can (negatively) impact the performance of the system? And what could cause "fine grained locking" to crash the machine in the first place? I'm willing to let a developer play around with one of the affected machines... Thanks again for the suggestion, Ulrich. Kind Regards, Rutger Bevaart On Apr 5, 2006, at 1:53 AM, Ulrich Keil wrote: We solved the problem by running the network stack with the Giant lock (set "debug.mpsafenet=0" in loader.conf). Since then the machine has been rock stable. Ulrich ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: FreeBSD unstable on Dell 1750 using SMP?
On Apr 4, 2006, at 4:37 AM, Rutger Bevaart wrote: This because we have 2850's that experience exactly the same problems, just less frequently (about once every 4 months). I'm completely at a loss, and inclined to remove FreeBSD and install "another OS" as it is an important management machine for us, that reboots about monthly. By all means, feel free to see whether the problem recurs using another OS, but it sounds like an intermittent hardware failure or power drop to me. I've got a dozen or so Dell 2800 or 2850 machines which have no problems reaching 6+ months of uptime. -- -Chuck
Re: FreeBSD unstable on Dell 1750 using SMP?
On Apr 4, 2006, at 4:37 AM, Rutger Bevaart wrote: I'm completely at a loss, and inclined to remove FreeBSD and install "another OS" as it is an important management machine for us, that reboots about monthly. Any clues, tips, help, known bugs? Either bad hardware or pilot error. Here are some stats for you:
[morebiz]% grep DELL /var/run/dmesg.boot
ACPI APIC Table:
acpi0: on motherboard
[morebiz]% sysctl kern.boottime
kern.boottime: { sec = 1130521993, usec = 140021 } Fri Oct 28 13:53:13 2005
[morebiz]% date
Tue Apr 4 09:58:18 EDT 2006
[morebiz]% uptime
9:58AM up 157 days, 20:05, 1 user, load averages: 0.00, 0.00, 0.00
[morebiz]% uname -r
5.4-RELEASE-p8
This machine runs two instances of apache on two IPs, a postgres server and a mysql server to run a few different web sites. It gets a fair number of hits, many of which hit the DBs. I run with hyperthreading enabled, but when I next upgrade this box to 6.1, I will turn it off. I don't have any 2850s, but the one 1850 I have has been 100% stable since it went into production last October running FreeBSD 6.0. I'd buy it again in a heartbeat. Are you sure your electrical power is stable?
Re: FreeBSD unstable on Dell 1750 using SMP?
Argh. After all the fixes done on the 5.4-STABLE and 6.0 codebases, my Dell PE1750 still reboots randomly. Again last night at 03.03 :-( The messages log still shows nothing, and nothing special was going on at the time (loadavg ~ 0.00). It's running:
FreeBSD xyz 6.0-RELEASE-p4 FreeBSD 6.0-RELEASE-p4 #0: Sun Feb 19 21:15:01 CET 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/SMP i386
What I've tried to fix the problems:
- kern_proc.c patch submitted to freebsd-stable by Don Lewis
- disable HTT
- upgrade to 5-STABLE
- upgrade to 6.0-RELEASE-p1,2,3,4
What we've _not_ tried:
- swap the memory modules
This is because we have 2850s that experience exactly the same problems, just less frequently (about once every 4 months). I'm completely at a loss, and inclined to remove FreeBSD and install "another OS", as it is an important management machine for us and it reboots about monthly. Any clues, tips, help, known bugs? Regards Rutger Bevaart
Re: FreeBSD unstable on Dell 1750 using SMP?
On 30 Nov, Dan Charrois wrote: > This is encouraging - it's the first I've heard of someone who has > found a way to trigger the problem "on demand". The problems I was > experiencing were on a dual Xeon with HTT enabled as well. Perhaps > someone out there who knows much more about the inner workings of > FreeBSD may have an idea of why running top in "aggressive mode" like > this might trigger the random rebooting. In particular, it would be > nice to *know* that someone out there specifically fixed whatever is > wrong in 5.4 when bringing it to 6.0. It's encouraging that you > haven't had any problems since upgrading to 6.0, but I have to wonder > if the bug's actually fixed, or the specific trigger of running top > doesn't trigger the problem but the problem is still lurking in the > background waiting to strike with the right combination of events. > > In any case, I'm anxious to try it out myself on our server to see if > "top -s0" brings it down "on command" with HTT enabled, and not with > HTT disabled. But I'm going to have to wait until some time over the > Christmas holidays to do that sort of experimentation at a time when > it isn't affecting the end users of the machine. I may also upgrade > to 6.0 at that time, since by then it will have been out for a couple > of months, so most of the worst quirks should be worked out by then. > > In the meantime, disabling HTT as I've done seems like a reasonable > precaution to improve the stability. > > Thanks for your help! > > Dan Try this patch, which I posted to stable@ on October 15. I had hoped to commit it to RELENG_5 in November, but my day job intervened.
-- Forwarded message --
From: Don Lewis <[EMAIL PROTECTED]>
Subject: testers wanted for 5.4-STABLE sysctl kern.proc patch
Date: Sat, 15 Oct 2005 14:51:37 -0700 (PDT)
To: [EMAIL PROTECTED]
Cc:

The patch below is the 5.4-STABLE version of a patch that was recently committed to HEAD and 6.0-BETA5 to fix locking problems in the kern.proc sysctl handler that could cause panics or deadlocks. It has already been tested by myself and one other person in 5.4-STABLE, but I think it deserves wider testing before I commit it. Testing on SMP systems, while running threaded applications, and on systems that have experienced panics in the existing code is of the most interest. Also be on the lookout for any regressions, such as incorrect data being returned.

Index: sys/kern/kern_proc.c
===
RCS file: /home/ncvs/src/sys/kern/kern_proc.c,v
retrieving revision 1.215.2.6
diff -u -r1.215.2.6 kern_proc.c
--- sys/kern/kern_proc.c	22 Mar 2005 13:40:23 -0000	1.215.2.6
+++ sys/kern/kern_proc.c	12 Oct 2005 19:13:14 -0000
@@ -72,6 +72,8 @@
 static void doenterpgrp(struct proc *, struct pgrp *);
 static void orphanpg(struct pgrp *pg);
+static void fill_kinfo_proc_only(struct proc *p, struct kinfo_proc *kp);
+static void fill_kinfo_thread(struct thread *td, struct kinfo_proc *kp);
 static void pgadjustjobc(struct pgrp *pgrp, int entering);
 static void pgdelete(struct pgrp *);
 static int proc_ctor(void *mem, int size, void *arg, int flags);
@@ -601,33 +603,22 @@
 	}
 }
 #endif /* DDB */
-void
-fill_kinfo_thread(struct thread *td, struct kinfo_proc *kp);
 
 /*
- * Fill in a kinfo_proc structure for the specified process.
+ * Clear kinfo_proc and fill in any information that is common
+ * to all threads in the process.
  * Must be called with the target process locked.
  */
-void
-fill_kinfo_proc(struct proc *p, struct kinfo_proc *kp)
-{
-	fill_kinfo_thread(FIRST_THREAD_IN_PROC(p), kp);
-}
-
-void
-fill_kinfo_thread(struct thread *td, struct kinfo_proc *kp)
+static void
+fill_kinfo_proc_only(struct proc *p, struct kinfo_proc *kp)
 {
-	struct proc *p;
 	struct thread *td0;
-	struct ksegrp *kg;
 	struct tty *tp;
 	struct session *sp;
 	struct timeval tv;
 	struct ucred *cred;
 	struct sigacts *ps;
 
-	p = td->td_proc;
-
 	bzero(kp, sizeof(*kp));
 
 	kp->ki_structsize = sizeof(*kp);
@@ -685,7 +676,8 @@
 		kp->ki_tsize = vm->vm_tsize;
 		kp->ki_dsize = vm->vm_dsize;
 		kp->ki_ssize = vm->vm_ssize;
-	}
+	} else if (p->p_state == PRS_ZOMBIE)
+		kp->ki_stat = SZOMB;
 	if ((p->p_sflag & PS_INMEM) && p->p_stats) {
 		kp->ki_start = p->p_stats->p_start;
 		timevaladd(&kp->ki_start, &boottime);
@@ -704,71 +696,6 @@
 	kp->ki_nice = p->p_nice;
 	bintime2timeval(&p->p_runtime, &tv);
 	kp->ki_runtime = tv.tv_sec * (u_int64_t)1000000 + tv.tv_usec;
-	if (p->p_state != PRS_ZOMBIE) {
-#if 0
-		if (td == NULL) {
-			/* XXXKSE: This should never happen. */
-			printf("fill_kinfo_proc(): pid %d has no threads!\n",
-
Re: FreeBSD unstable on Dell 1750 using SMP?
This is encouraging - it's the first I've heard of someone who has found a way to trigger the problem "on demand". The problems I was experiencing were on a dual Xeon with HTT enabled as well. Perhaps someone out there who knows much more about the inner workings of FreeBSD may have an idea of why running top in "aggressive mode" like this might trigger the random rebooting. In particular, it would be nice to *know* that someone out there specifically fixed whatever is wrong in 5.4 when bringing it to 6.0. It's encouraging that you haven't had any problems since upgrading to 6.0, but I have to wonder if the bug's actually fixed, or the specific trigger of running top doesn't trigger the problem but the problem is still lurking in the background waiting to strike with the right combination of events. In any case, I'm anxious to try it out myself on our server to see if "top -s0" brings it down "on command" with HTT enabled, and not with HTT disabled. But I'm going to have to wait until some time over the Christmas holidays to do that sort of experimentation at a time when it isn't affecting the end users of the machine. I may also upgrade to 6.0 at that time, since by then it will have been out for a couple of months, so most of the worst quirks should be worked out by then. In the meantime, disabling HTT as I've done seems like a reasonable precaution to improve the stability. Thanks for your help! Dan On Nov 29, 2005, at 10:50 PM, Stephen Montgomery-Smith wrote: Dan Charrois wrote: It actually may be a comfort, since perhaps HTT is related to the culprit. Since the last crash, about a month ago, I disabled HTT, both in the kernel as well in the BIOS. So as far as I know, it's completely been disabled (and the boot messages and top only show 2 CPUs). And I haven't had the system go down for nearly a month now. I don't know if it is related, but I used to have random reboots on a dual Xeon system with HTT enabled.
It happened when I ran a CPU intensive threaded program at the same time as "top" - running "top -s0" (which you have to do as root) could usually kill the machine in seconds if not minutes. All I can tell you is that with FreeBSD 6.0 the problem disappeared. Well not totally - I still get a bunch of harmless calcru negative messages, although I don't know if it is actually related to the boot problems I used to have with FreeBSD 5.4, because I get the calcru backwards messages even with HTT disabled. Anyway, if you are in the mood to try it out, you might like to try re-enabling HTT, starting up whatever process you usually use (I'm guessing it is MySQL), and then run "top -s0". If you get a crash soon after that, you have the same problem I had. Let me also add that these crashes usually did not trigger a crash dump (I had dumpon set), and when it did the resulting dump looked rather corrupted. Stephen -- Syzygy Research & Technology Box 83, Legal, AB T0G 1L0 Canada Phone: 780-961-2213
Re: FreeBSD unstable on Dell 1750 using SMP?
Dan Charrois wrote: It actually may be a comfort, since perhaps HTT is related to the culprit. Since the last crash, about a month ago, I disabled HTT, both in the kernel as well in the BIOS. So as far as I know, it's completely been disabled (and the boot messages and top only show 2 CPUs). And I haven't had the system go down for nearly a month now. I don't know if it is related, but I used to have random reboots on a dual Xeon system with HTT enabled. It happened when I ran a CPU intensive threaded program at the same time as "top" - running "top -s0" (which you have to do as root) could usually kill the machine in seconds if not minutes. All I can tell you is that with FreeBSD 6.0 the problem disappeared. Well not totally - I still get a bunch of harmless calcru negative messages, although I don't know if it is actually related to the boot problems I used to have with FreeBSD 5.4, because I get the calcru backwards messages even with HTT disabled. Anyway, if you are in the mood to try it out, you might like to try re-enabling HTT, starting up whatever process you usually use (I'm guessing it is MySQL), and then run "top -s0". If you get a crash soon after that, you have the same problem I had. Let me also add that these crashes usually did not trigger a crash dump (I had dumpon set), and when it did the resulting dump looked rather corrupted. Stephen
Re: FreeBSD unstable on Dell 1750 using SMP?
Rutger Bevaart wrote: Same here on several 1750's, 1850's and 2850's. Tomorrow I'll disable USB in the BIOS on one of the 1750's and see if it makes a difference. It's the only one of the set that I could get downtime for because it rebooted yesterday ;-) I disabled USB in the BIOS on my 2850 much earlier, when I was getting interrupt storms, since I didn't need USB anyway. That solved the interrupt storms, but it didn't seem to have any impact on the mysterious unsolicited rebooting problem. Claus Guttesen wrote: It's not any comfort to you but I have two Dell PE 1750's running very reliable using FreeBSD 5.4 stable as of Wed. the 28'th of Sep. 2005. It has two Xeon at 3 GHz, 2 GB RAM, a LSILogic 1030 Ultra4 Adapter. HTT is *off*. HTT does not yield any higher performance for most purposes. I can send you my kernel if you want. It actually may be a comfort, since perhaps HTT is related to the problem. Since the last crash, about a month ago, I have disabled HTT, both in the kernel as well as in the BIOS. So as far as I know, it has been completely disabled (and the boot messages and top only show 2 CPUs). And the system hasn't gone down for nearly a month now. Of course, I also did some other things at the same time, so it's unclear which of them may have helped. I had noticed that in the past it had rebooted itself twice right while mysqlhotcopy was running as root, during a period when the server may have been rather heavily loaded. So in addition to turning off hyperthreading, I also moved mysqlhotcopy to a time likely to see a lighter load, and changed things so it no longer runs as root. Not that I think mysqlhotcopy itself was the culprit, but it does cause a fairly large burst of disk activity when it runs, and it does seem to be related to triggering the event, at least in my situation. In any case, since I've done those three things, I haven't had a crash yet.
Of course, the lack of a crash doesn't prove anything, but the more time that passes, the better I feel. That is, until one day I wake up to find that it died again. In any case, if that happens, I'll at least know more about what the problem isn't related to. Vivek Khera wrote: I'd recommend running the Dell diags. They're pretty good at picking out hardware trouble, which it sounds like the OP is having. In my case, anyway, I have run the Dell diagnostics, and they showed everything to be just fine. Kevin Oberman wrote: As far as I can tell, hyperthreading is not much of a win for anyone. See the article at: http://news.zdnet.co.uk/0,39020330,39237341,00.htm It reports that HTT slows performance even on threaded and, theoretically, HTT-ideal apps. (And this was with Windows.) So I've heard. I was hoping that hyperthreading might help a dedicated MySQL server handle a somewhat higher load, but I never had the chance to benchmark it with and without hyperthreading before I had to put the machine into production. So it's disabled now - it can't hurt the stability of the system and can only potentially help it. Time will tell. Thanks for your replies, everyone! Dan
Re: FreeBSD unstable on Dell 1750 using SMP?
As far as I can tell, hyperthreading is not much of a win for anyone. See the article at: http://news.zdnet.co.uk/0,39020330,39237341,00.htm It reports that HTT slows performance even on threaded and, theoretically, HTT-ideal apps. (And this was with Windows.) -- R. Kevin Oberman, Network Engineer Energy Sciences Network (ESnet) Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab) E-mail: [EMAIL PROTECTED] Phone: +1 510 486-8634
Re: FreeBSD unstable on Dell 1750 using SMP?
On Nov 29, 2005, at 10:46 AM, Claus Guttesen wrote: It's not any comfort to you but I have two Dell PE 1750's running very reliable using FreeBSD 5.4 stable as of Wed. the 28'th of Sep. 2005. I'd recommend running the Dell diags. They're pretty good at picking out hardware trouble, which it sounds like the OP is having.
Re: FreeBSD unstable on Dell 1750 using SMP?
> Thanks everyone for replies made over the past few days about the > "unsolicited" rebooting problem. At first, I thought there was a > memory allocation bug as judged by the output of "netstat -m", but > apparently it's just a cosmetic statistics reporting bug and nothing > related to the instability itself. It's not any comfort to you, but I have two Dell PE 1750s running very reliably using FreeBSD 5.4-STABLE as of Wed. the 28th of Sep. 2005. Each has two 3 GHz Xeons, 2 GB RAM and an LSILogic 1030 Ultra4 adapter. HTT is *off*. HTT does not yield any higher performance for most purposes. I can send you my kernel if you want. regards Claus
Re: FreeBSD unstable on Dell 1750 using SMP?
Thanks, everyone, for the replies over the past few days about the "unsolicited" rebooting problem. At first, I thought there was a memory allocation bug, judging by the output of "netstat -m", but apparently it's just a cosmetic statistics-reporting bug, unrelated to the instability itself. Unfortunately, that means I still haven't been able to find a solution to the problem (and apparently, I'm not the only one to experience it). Since only one of our machines experiences the problem (infrequently at that), and it happens to be a production machine, it's difficult to test and resolve. It's been suggested that FreeBSD 6.0 may fix the problem, but considering the bugs that inevitably creep into new releases, I'm reluctant to go there until things settle down in 6.0 (plus, I haven't seen any documentation suggesting that 6.0 fixes this problem in any case). If it weren't a production machine that needs to be reliable, stable, and available, I'd have a better chance of testing it under 6.0. Some have speculated that it is triggered by buggy Ethernet drivers, etc. In my case, though it's possible, I doubt it, since my machine has rebooted itself right when mysqlhotcopy was about to run (and that runs locally, without causing any network activity that I'm aware of). My first thought was that it might be caused by faulty memory or something similar, but Dell's hardware diagnostics report everything as perfectly okay. What I find strange is that the kernel doesn't lock up or anything - the machine just suddenly restarts (caches aren't flushed to disk or anything). It's just as if someone literally pulled the power plug midstream and then plugged it back in. The only indication that something weird has happened is that in the server logs everything seems to be crunching away happily, and then suddenly I see the boot messages when it restarts all by itself.
In any case, if anyone else with a dual processor machine (I have a PowerEdge 2850 myself) has experienced the rebooting problem discussed a few days ago and resolved it, I'd very much like to hear from you. Dan
Re: FreeBSD unstable on Dell 1750 using SMP?
Both of the servers where I replaced the NIC with an old Intel 10/100 card have reached a week of uptime! I'm seriously suspecting a problem with the "em" Ethernet driver. later, gino From: Kris Kennaway <[EMAIL PROTECTED]> To: Rutger Bevaart <[EMAIL PROTECTED]> CC: Kris Kennaway <[EMAIL PROTECTED]>, freebsd-stable@freebsd.org, Gino Ruopolo <[EMAIL PROTECTED]> Subject: Re: FreeBSD unstable on Dell 1750 using SMP? Date: Fri, 25 Nov 2005 14:09:19 -0500 On Fri, Nov 25, 2005 at 01:22:01PM +0100, Rutger Bevaart wrote: > Hello Kris (& list), > > Thanks for helping the 1750 and 2850 owners on this list. Unfortunately I > cannot find any references to the leak or the fix you are referring to in > the Release errata (http://www.freebsd.org/releases/5.4R/errata.html). It was in the 5.3 errata, sorry. I don't think it was fixed until after 5.4 though. > We are trying really hard to resolve the stability issues with our Dell > servers and would be very happy to know when the fix for what was > committed. As I said twice already, the stats leak is ***HARMLESS***. It only gives the wrong value to counters that are unused for anything except reporting to the user. > No way we'll be upgrading to 6.0 without knowing exactly what > is going on (remembering broken 4.10 -> 5.3 systems) ... 5.4 -> 6.0 is really a very minor jump. But if you're not willing to even test it out on one machine to see whether it resolves your problems, you'll likely just have to get used to the instability until someone can identify your problem and then fix it. Kris
Re: FreeBSD unstable on Dell 1750 using SMP?
On Nov 25, 2005, at 8:09 PM, Kris Kennaway wrote: As I said twice already, the stats leak is ***HARMLESS***. It only gives the wrong value to counters that are unused for anything except reporting to the user. Aha, that's what I was trying to clear up when the whole counters issue came along in the first place. I must have missed your previous remark about it. No way we'll be upgrading to 6.0 without knowing exactly what is going on (remembering broken 4.10 -> 5.3 systems) ... 5.4 -> 6.0 is really a very minor jump. But if you're not willing to even test it out on one machine to see whether it resolves your problems, you'll likely just have to get used to the instability until someone can identify your problem and then fix it. Of course I'm trying it out; it's just hard to get a spare Dell 2850 and put it through the same day-to-day use as the rest of them. That's a matter of cost. In other posts I've basically offered anything short of root access to resolve this. There's actually quite a group of people with this issue. Some think it's an ACPI issue. An IRQ conflict with USB has been suggested. The 'em' driver was suspect, the 'bge' driver was suspect, the 'amr' driver was suspect. The funny thing is that we have this 1750 (2x 2.4 GHz Xeon) running 5.3-BETA6 that takes a major hitting each day, and it has never crashed. If you have any clues on where I can look further (previous posts at: http://lists.freebsd.org/pipermail/freebsd-smp/2005-July/000930.html), they would be greatly appreciated. Regards & thanks for all the help, Rutger
Re: FreeBSD unstable on Dell 1750 using SMP?
On Fri, Nov 25, 2005 at 01:22:01PM +0100, Rutger Bevaart wrote: > Hello Kris (& list), > > Thanks for helping the 1750 and 2850 owners on this list. Unfortunately I > cannot find any references to the leak or the fix you are referring to in > the Release errata (http://www.freebsd.org/releases/5.4R/errata.html). It was in the 5.3 errata, sorry. I don't think it was fixed until after 5.4 though. > We are trying really hard to resolve the stability issues with our Dell > servers and would be very happy to know when the fix for what was > committed. As I said twice already, the stats leak is ***HARMLESS***. It only gives the wrong value to counters that are unused for anything except reporting to the user. > No way we'll be upgrading to 6.0 without knowing exactly what > is going on (remembering broken 4.10 -> 5.3 systems) ... 5.4 -> 6.0 is really a very minor jump. But if you're not willing to even test it out on one machine to see whether it resolves your problems, you'll likely just have to get used to the instability until someone can identify your problem and then fix it. Kris
Re: FreeBSD unstable on Dell 1750 using SMP?
Hello Kris (& list), Thanks for helping the 1750 and 2850 owners on this list. Unfortunately I cannot find any references to the leak or the fix you are referring to in the Release errata (http://www.freebsd.org/releases/5.4R/errata.html). We are trying really hard to resolve the stability issues with our Dell servers and would be very happy to know when and where the fix was committed. No way we'll be upgrading to 6.0 without knowing exactly what is going on (remembering broken 4.10 -> 5.3 systems) ... Regards Rutger Bevaart On Thu, November 24, 2005 21:22, Kris Kennaway wrote: > On Thu, Nov 24, 2005 at 09:45:08AM +0100, Rutger Bevaart wrote: >> Hi Kris, >> >> I cannot find anything about that in the /usr/src/UPDATING for the 5.4 >> branch. > > I didn't say anything about UPDATING, I said the release errata. > >> We're running "FreeBSD xyz 5.4-RELEASE-p5 FreeBSD 5.4-RELEASE-p5" >> and p6 and later only fix some IPSEC and SSL stuff. >> >> Is it in 6.0 and if so, will somebody backport that fix? > > Yes and as I said, it already was. > >> > This is documented in the 5.4 errata, it's a leak in the stats >> > counting on SMP machines. It was fixed after 5.4. > > Kris > Rutger Bevaart :: illian.networks
Re: FreeBSD unstable on Dell 1750 using SMP?
On Thu, Nov 24, 2005 at 02:49:10PM -0700, Dan Charrois wrote: > I just thought of one other bit of info that may be relevant to the > auto-rebooting problem I've experienced with our PowerEdge 2850. > Since the problem may be related to memory allocation, I thought I > should mention that we have more memory in that machine that is > typical for some users. We have 5 Gigs installed. From "top": > > Mem: 175M Active, 4121M Inact, 244M Wired, 244M Cache, 214M Buf, 23M > Free > Swap: 10G Total, 12K Used, 10G Free > > If this turns out to be an AMD64 vs. 386 issue and we were to revert > to the 386 branch, would we still be able to access this memory, or > would the 386 be limited to 4Gb (or maybe 2Gb) due to 32 bit > addressing? We don't need anywhere near this much memory for user > space programs, but the kernel does make good use of it to cache > commonly accessed regions of the file system in memory. There are no issues with using 5GB of RAM on AMD64, unless of course you have bad memory (I assume you already ruled this out by swapping out the RAM, making sure you don't have mismatched RAM with different characteristics, etc). On i386 this would be limited to 4GB unless you enable PAE, which has performance implications (how much depends on your CPU) and which may not be supported by the drivers you need (see the PAE kernel config file). Kris
Re: FreeBSD unstable on Dell 1750 using SMP?
On Thu, Nov 24, 2005 at 02:36:01PM -0700, Dan Charrois wrote: > But here's about where any troubleshooting on my own reaches its > limit. I noticed that Kris mentioned it was a known problem in the > stats counting for SMP machines and had been fixed, but haven't been > able to find a reference to that, or any indication of how to do so. > Is this fix supposed to have been an accounting bug in the report for > netstat, or is it something which would have taken down the machine > as has been happening? It's a leak in the stats counting that has no implications other than cosmetic ones. If you update to 5.4-STABLE that should be fixed. Anyway, if 5.4 is giving you stability problems then you should try 6.0 to see if the bug is already fixed. Kris
Re: FreeBSD unstable on Dell 1750 using SMP?
I just thought of one other bit of info that may be relevant to the auto-rebooting problem I've experienced with our PowerEdge 2850. Since the problem may be related to memory allocation, I thought I should mention that we have more memory in that machine than is typical for some users. We have 5 GB installed. From "top":
Mem: 175M Active, 4121M Inact, 244M Wired, 244M Cache, 214M Buf, 23M Free
Swap: 10G Total, 12K Used, 10G Free
If this turns out to be an amd64 vs. i386 issue and we were to revert to the i386 branch, would we still be able to access this memory, or would i386 be limited to 4 GB (or maybe 2 GB) due to 32-bit addressing? We don't need anywhere near this much memory for user space programs, but the kernel does make good use of it to cache commonly accessed regions of the file system in memory. Dan
Re: FreeBSD unstable on Dell 1750 using SMP?
Hi Kris, Rutger, and others that have commented on this thread. I'm happy to hear that I'm not the only one experiencing problems like this. I posted a similar question a month or so ago about a PowerEdge 2850 using SMP (dual Xeons) and never received any responses that helped solve the problem, or even any indication that others had the same problem. As you know, troubleshooting this is quite difficult, since it can take weeks to go down, and then the "auto-reboot" doesn't result in any clues as to why in the log file - it's just suddenly started again as if someone had pulled the plug on it. I've been pulling my hair out. My machine crashed twice in the last month or so, within two weeks of each other. Both times, it was just as a cron task was about to schedule the mysqlhotcopy script to back up some SQL databases that are being hosted on that machine, so I thought it may have something to do with that (I had it running as a root crontask so figured that maybe some bug in that caused things to go weird - it was running as root, after all). I changed it to run under a less privileged user and the machine hasn't died for about 2 1/2 weeks. But that's hardly a conclusive case of having solved the situation - it's probably planning on surviving just long enough to last until the point I need it the most to work. It sounds as though memory buffer allocations are going wacky or something, in which anything could take it down given the wrong combination of events. 
In any case, we're running the amd64 version of FreeBSD 5.4-RELEASE-p6:

    FreeBSD 5.4-RELEASE-p6 #3: Fri Aug 5 18:18:10 MDT 2005

A netstat -m (which I'd never tried before) yields:

    18446744073709551402 mbufs in use
    49/25600 mbuf clusters in use (current/max)
    0/0/0 sfbufs in use (current/peak/max)
    44 KBytes allocated to network
    0 requests for sfbufs denied
    0 requests for sfbufs delayed
    0 requests for I/O initiated by sendfile
    884 calls to protocol drain routines

Obviously, the "mbufs in use" figure on that machine is way out to lunch. Interestingly, the maximum of 25600 mbuf clusters is also identical to that in the other netstat -m reports from people having this problem.

Another machine (an older single-CPU Dell) running the i386 version of FreeBSD 5.4-RELEASE-p5

    FreeBSD 5.4-RELEASE-p5 #1: Thu Jul 21 22:30:46 MDT 2005

has a saner netstat -m:

    130 mbufs in use
    128/8896 mbuf clusters in use (current/max)
    0/177/2480 sfbufs in use (current/peak/max)
    288 KBytes allocated to network
    0 requests for sfbufs denied
    0 requests for sfbufs delayed
    208493 requests for I/O initiated by sendfile
    26697 calls to protocol drain routines

But this is about where my own troubleshooting reaches its limit. Kris mentioned that this is a known problem in the stats counting on SMP machines and that it has been fixed, but I haven't been able to find a reference to it, or any indication of how to apply the fix. Was the bug only an accounting error in what netstat reports, or is it something that could have taken down the machine, as has been happening?

If switching to single-CPU mode works, it's good to know I have that option if things continue to act up. But I'd really rather not have to "dumb down" the machine to one CPU when it has two. Most of the time it's not under a huge load, but periodically there are massive spikes, and that's when having two CPUs really helps.
If anyone can shed further light on a fix for this problem, it would be greatly appreciated!

Dan
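Rutger's guess at a signed/unsigned counter bug can be checked directly. The sketch below reinterprets the huge "mbufs in use" figures (and Gino's cluster figure) quoted in this thread as two's-complement values. The counter widths (64 bits on Dan's amd64 box, 32 bits on the others) are inferred from the number of digits, not stated by the posters, and `as_signed` is a hypothetical helper, not part of netstat.

```python
def as_signed(value: int, bits: int) -> int:
    """Reinterpret an unsigned `bits`-wide integer as two's-complement signed."""
    if not 0 <= value < 2 ** bits:
        raise ValueError(f"{value} does not fit in {bits} bits")
    return value - 2 ** bits if value >= 2 ** (bits - 1) else value


# Counter values copied from netstat -m outputs in this thread; the widths
# are inferred from the magnitudes (20 digits on amd64, 10 digits elsewhere).
SAMPLES = [
    ("Dan, amd64 2850: mbufs", 18446744073709551402, 64),
    ("Rutger, 2850: mbufs", 4294966848, 32),
    ("Gino, 1850: mbufs", 4294899289, 32),
    ("Gino, 1850: clusters", 4294940375, 32),
]

for label, raw, bits in SAMPLES:
    print(f"{label:24} {raw:>22} -> {as_signed(raw, bits)}")
```

Each figure decodes to a small negative number (-214, -448, -68007 and -26921), which is what a counter that has been decremented more often than it was incremented looks like when printed as unsigned.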
Re: FreeBSD unstable on Dell 1750 using SMP?
On Thu, Nov 24, 2005 at 09:45:08AM +0100, Rutger Bevaart wrote:
> Hi Kris,
>
> I cannot find anything about that in the /usr/src/UPDATING for the 5.4
> branch.

I didn't say anything about UPDATING; I said the release errata.

> We're running "FreeBSD xyz 5.4-RELEASE-p5 FreeBSD 5.4-RELEASE-p5"
> and p6 and later only fix some IPSEC and SSL stuff.
>
> Is it in 6.0 and if so, will somebody backport that fix?

Yes, and as I said, it already was.

> > This is documented in the 5.4 errata, it's a leak in the stats
> > counting on SMP machines. It was fixed after 5.4.

Kris
Re: FreeBSD unstable on Dell 1750 using SMP?
Hi Kris,

I cannot find anything about that in the /usr/src/UPDATING for the 5.4 branch. We're running "FreeBSD xyz 5.4-RELEASE-p5 FreeBSD 5.4-RELEASE-p5", and p6 and later only fix some IPSEC and SSL stuff. Is it in 6.0, and if so, will somebody backport that fix?

Regards
Rutger

On Wed, November 23, 2005 22:39, Kris Kennaway wrote:
> This is documented in the 5.4 errata, it's a leak in the stats
> counting on SMP machines. It was fixed after 5.4.
>
> Kris

Rutger Bevaart :: illian.networks
Re: FreeBSD unstable on Dell 1750 using SMP?
On Sun, Nov 20, 2005 at 07:24:25PM +0100, Rutger Bevaart wrote:
> Strange indeed.
>
> On a 1750 with bge's:
> 475 mbufs in use
> 501/25600 mbuf clusters in use (current/max)
> 0/3/6656 sfbufs in use (current/peak/max)
> 1120 KBytes allocated to network
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 0 requests for I/O initiated by sendfile
> 100 calls to protocol drain routines
>
> On a 2850 (hardware identical to an 1850):
> $ netstat -m
> 4294966848 mbufs in use
> 565/25600 mbuf clusters in use (current/max)
> 0/67/6656 sfbufs in use (current/peak/max)
> 1018 KBytes allocated to network
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 16449 requests for I/O initiated by sendfile
> 589 calls to protocol drain routines
>
> Both experience the "auto reboot" feature. The mbufs on the 2850 look
> like a counter (signed/unsigned) bug, maybe even just in the
> printing. Other than that I'm having a hard time interpreting these
> results.

This is documented in the 5.4 errata, it's a leak in the stats
counting on SMP machines. It was fixed after 5.4.

Kris
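The thread does not say how the 5.4 stats leak actually worked. As a generic illustration only, assuming a shared "in use" counter updated with an unlocked load-then-store (a hypothetical model, not taken from the FreeBSD source), this deterministic replay shows how lost updates on two CPUs can push such a count below zero, after which it prints as a huge unsigned number. `SimulatedCPU` and `simulate` are illustrative names.

```python
class SimulatedCPU:
    """A CPU doing a non-atomic counter update: load now, store later."""

    def __init__(self, shared: dict) -> None:
        self.shared = shared
        self.loaded = 0

    def load(self) -> None:
        self.loaded = self.shared["mbufs"]

    def store_add(self, delta: int) -> None:
        # Writes back a value computed from a possibly stale load.
        self.shared["mbufs"] = self.loaded + delta


def simulate() -> int:
    """Replay one bad interleaving and return the final counter value."""
    shared = {"mbufs": 0}  # hypothetical global "mbufs in use" counter
    cpu_a, cpu_b = SimulatedCPU(shared), SimulatedCPU(shared)

    # Both CPUs allocate an mbuf at the same moment.
    cpu_a.load()         # reads 0
    cpu_b.load()         # also reads 0
    cpu_a.store_add(1)   # counter = 1
    cpu_b.store_add(1)   # counter = 1 again: one increment is lost

    # Later both mbufs are freed one after the other, so both decrements land.
    cpu_a.load()
    cpu_a.store_add(-1)  # counter = 0
    cpu_b.load()
    cpu_b.store_add(-1)  # counter = -1 with no mbufs actually in use
    return shared["mbufs"]


final = simulate()
print("signed:", final)
print("as 32-bit unsigned:", final & 0xFFFFFFFF)
```

Each such race loses one count, so the error only grows with uptime. Common remedies for this pattern are atomic increments, a lock around the update, or per-CPU counters that are summed when read.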
Re: FreeBSD unstable on Dell 1750 using SMP?
On Nov 20, 2005, at 1:24 PM, Rutger Bevaart wrote:
> Both experience the "auto reboot" feature. The mbufs on the 2850 look
> like a counter (signed/unsigned) bug, maybe even just in the
> printing. Other than that I'm having a hard time interpreting these
> results.

FreeBSD 4.x, 5.x, and 6.x have been stable for me on all Dell hardware:

  - 4.x (currently 4.11) on 1550s, 1650s, a 2650 and 1750s for more than 3 years
  - 5.4 on a 2450 for ~6 months
  - 6.0 on 1750, 1850, and 2650 since 6.0-RC2, currently running 6.0-RELEASE

I've never had a flake-out that wasn't due to a hardware failure, and those happened on only two of the 1550s over 4 years' time. I did have the 5.4 box (running 5.4-RELEASE-p7) lock up once, but was unable to determine the cause.
Re: FreeBSD unstable on Dell 1750 using SMP?
Strange indeed.

On a 1750 with bge's:

    475 mbufs in use
    501/25600 mbuf clusters in use (current/max)
    0/3/6656 sfbufs in use (current/peak/max)
    1120 KBytes allocated to network
    0 requests for sfbufs denied
    0 requests for sfbufs delayed
    0 requests for I/O initiated by sendfile
    100 calls to protocol drain routines

On a 2850 (hardware identical to an 1850):

    $ netstat -m
    4294966848 mbufs in use
    565/25600 mbuf clusters in use (current/max)
    0/67/6656 sfbufs in use (current/peak/max)
    1018 KBytes allocated to network
    0 requests for sfbufs denied
    0 requests for sfbufs delayed
    16449 requests for I/O initiated by sendfile
    589 calls to protocol drain routines

Both experience the "auto reboot" feature. The mbufs on the 2850 look like a counter (signed/unsigned) bug, maybe even just in the printing. Other than that I'm having a hard time interpreting these results.

Regards
Rutger Bevaart

On Nov 20, 2005, at 5:07 PM, Gino Ruopolo wrote:
> Hello Rutger,
>
> I read your post but I'm unable to reply on the list because of some
> firewall settings. I'm having the same problems with various Dell 1850s
> running FreeBSD 5.4. Last week I noticed the following:
>
> # netstat -m
> 4294899289 mbufs in use   !?!?!??!!?
> 4294940375/25600 mbuf clusters in use (current/max)   !?!?!?!??!
> 0/9/6656 sfbufs in use (current/peak/max)
> 4123460 KBytes allocated to network
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 34 requests for I/O initiated by sendfile
> 2533 calls to protocol drain routines
>
> Here is the output of the same command on a different server with the fxp
> Ethernet driver (fxp0), also running FreeBSD 5.4 and doing the same work:
>
> # netstat -m
> 194 mbufs in use
> 171/25600 mbuf clusters in use (current/max)
> 0/4/6656 sfbufs in use (current/peak/max)
> 390 KBytes allocated to network
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 0 requests for I/O initiated by sendfile
> 0 calls to protocol drain routines
>
> So I've tried putting an old PCI 10/100 Ethernet card (fxp driver) in a
> Dell 1850 suffering from the "self-reboot" problem. I'm getting 5 days of
> uptime without a single reboot... What about a problem with the em driver?
>
> Regards,
> gino
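Gino's "4123460 KBytes allocated to network" looks alarming, but it appears to be derived from the same wrapped counters. A minimal sketch, assuming the KBytes line is computed as (mbufs * MSIZE + clusters * MCLBYTES) / 1024 with MSIZE = 256, MCLBYTES = 2048 and unsigned wraparound (a reconstruction from the numbers, not taken from netstat's source): under that model every KBytes figure quoted in this thread is reproduced exactly. `kbytes_allocated` is a hypothetical name, and the counter widths are inferred from the digit counts.

```python
# Assumed buffer sizes (typical FreeBSD defaults, not stated in the thread).
MSIZE = 256       # bytes per mbuf
MCLBYTES = 2048   # bytes per mbuf cluster


def kbytes_allocated(mbufs: int, clusters: int, bits: int) -> int:
    """Hypothetical model of the "KBytes allocated to network" line:
    (mbufs * MSIZE + clusters * MCLBYTES) / 1024, with the byte total
    wrapping at `bits` bits like an unsigned machine integer."""
    total_bytes = (mbufs * MSIZE + clusters * MCLBYTES) % (2 ** bits)
    return total_bytes // 1024


# (label, mbufs in use, clusters in use, counter width, reported KBytes),
# copied from the netstat -m outputs quoted in this thread.
REPORTS = [
    ("Gino, Dell 1850", 4294899289, 4294940375, 32, 4123460),
    ("Gino, fxp server", 194, 171, 32, 390),
    ("Rutger, 2850", 4294966848, 565, 32, 1018),
    ("Rutger, 1750 (bge)", 475, 501, 32, 1120),
    ("Dan, amd64 2850", 18446744073709551402, 49, 64, 44),
    ("Dan, i386 box", 130, 128, 32, 288),
]

for label, mbufs, clusters, bits, reported in REPORTS:
    computed = kbytes_allocated(mbufs, clusters, bits)
    status = "match" if computed == reported else "MISMATCH"
    print(f"{label:20} computed {computed:>8} reported {reported:>8} {status}")
```

If the model is right, the roughly 4 GB figure is an artifact of the wrapped counters rather than 4 GB of mbuf memory actually in use, which fits Kris's description of a leak in the stats counting.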