Re: Panic during kernel booting on HP Proliant DL180G6 and latest STABLE

2011-09-22 Thread David G Lawrence
> I have a lot of supermicro motherboards and the newest ones have igb
> chipsets; they've been quite a headache with respect to FreeBSD 8. I'm
> running 8.2-RELEASE but have upgraded parts of my kernel to 8-RELENG (as
> of a few months ago). Some of them work ok while others panic on bootup.
> Upgrading to newer versions of the intel igb code fixes some but breaks
> others. It's been frustrating.
> 
> While working on this today, I saw two different kernel panics:
> 
> Could not setup receive structures
> m_getzone: m_getjcl: invalid cluster type

   I fixed this a while back in my local sources. A 12-core Supermicro
motherboard system I'm building here was hitting the bug 100% of the time
during startup. Patch attached.

-DG

Dr. David G. Lawrence
President
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
Pave the road of life with opportunities.

Index: if_igb.c
===================================================================
RCS file: /home/ncvs/src/sys/dev/e1000/if_igb.c,v
retrieving revision 1.21.2.20
diff -c -r1.21.2.20 if_igb.c
*** if_igb.c	29 Jun 2011 16:16:59 -0000	1.21.2.20
--- if_igb.c	22 Sep 2011 10:04:31 -0000
***************
*** 1278,1286 ****
  	/* Don't lose promiscuous settings */
  	igb_set_promisc(adapter);
  
- 	ifp->if_drv_flags |= IFF_DRV_RUNNING;
- 	ifp->if_drv_flags &= ~IFF_DRV_OACTIVE;
- 
  	callout_reset(&adapter->timer, hz, igb_local_timer, adapter);
  	e1000_clear_hw_cntrs_base_generic(&adapter->hw);
  
--- 1278,1283 ----
***************
*** 1308,1313 ****
--- 1305,1313 ----
  
  	/* Don't reset the phy next time init gets called */
  	adapter->hw.phy.reset_disable = TRUE;
+ 
+ 	ifp->if_drv_flags |= IFF_DRV_RUNNING;
+ 	ifp->if_drv_flags &= ~IFF_DRV_OACTIVE;
  }
  
  static void
***************
*** 1490,1501 ****
  	E1000_WRITE_REG(&adapter->hw, E1000_EIMC, que->eims);
  	++que->irqs;
  
  	IGB_TX_LOCK(txr);
  	more_tx = igb_txeof(txr);
  	IGB_TX_UNLOCK(txr);
  
- 	more_rx = igb_rxeof(que, adapter->rx_process_limit, NULL);
- 
  	if (igb_enable_aim == FALSE)
  		goto no_calc;
  	/*
--- 1490,1505 ----
  	E1000_WRITE_REG(&adapter->hw, E1000_EIMC, que->eims);
  	++que->irqs;
  
+ 	if (!(adapter->ifp->if_drv_flags & IFF_DRV_RUNNING)) {
+ 		return;
+ 	}
+ 
+ 	more_rx = igb_rxeof(que, adapter->rx_process_limit, NULL);
+ 
  	IGB_TX_LOCK(txr);
  	more_tx = igb_txeof(txr);
  	IGB_TX_UNLOCK(txr);
  
  	if (igb_enable_aim == FALSE)
  		goto no_calc;
  	/*
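
The ordering idea behind the patch (set the "running" flag only at the end of init, and have the queue interrupt bail out while it is not set) can be sketched in user space. This is a hypothetical illustration, not igb(4) code: struct fake_adapter, fake_init() and fake_queue_intr() are made-up names, and C11 release/acquire atomics stand in for whatever ordering the kernel actually provides.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical user-space stand-ins for driver state (not igb structures).
 * `running` plays the role of IFF_DRV_RUNNING.
 */
struct fake_adapter {
	atomic_bool running;
	bool rx_ready;      /* receive structures have been set up */
	int rx_processed;   /* how many times the RX path actually ran */
};

/* init analog: finish all setup first, then publish "running" last. */
static void fake_init(struct fake_adapter *a)
{
	a->rx_ready = true;     /* ... allocate and fill receive rings ... */
	atomic_store_explicit(&a->running, true, memory_order_release);
}

/* Queue-interrupt analog: ignore interrupts that arrive before init is done. */
static void fake_queue_intr(struct fake_adapter *a)
{
	if (!atomic_load_explicit(&a->running, memory_order_acquire))
		return;
	/* The release/acquire pair guarantees the setup above is visible. */
	assert(a->rx_ready);
	a->rx_processed++;
}
```

An interrupt delivered before fake_init() is simply dropped, so the RX path can never run against half-built receive structures.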
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: HEADS UP: inpcb/inpcbinfo rwlocking: coming to a 7-STABLE branch near you

2008-08-05 Thread David G Lawrence
> The thrust of this change is to replace the mutexes protecting the inpcb 
> and inpcbinfo data structures with read-write locks (rwlocks).  These 

   That's really cool and directly affects my current work project. I'm
developing (have developed, actually) a multi-threaded, 5000+ member VoIP/SIP
conferencing server called Nconnect. It is primarily a UDP application running
on FreeBSD 7. It generates and receives about 250,000 UDP packets a second,
with 200-byte packets, resulting in about 400Mbps of traffic in each
direction. The current bottleneck is the kernel UDP processing. It should
be possible to scale to 1+ members if kernel UDP processing had optimal
concurrency.
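
A quick check of those figures, assuming 250,000 packets per second in each direction and counting the full 200 bytes of each packet toward the bit rate (no additional header overhead). payload_mbps() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Bit rate in Mbit/s for a given packet rate and per-packet size in bytes. */
static double payload_mbps(uint64_t packets_per_sec, uint64_t bytes_per_packet)
{
	assert(bytes_per_packet > 0);
	return (double)(packets_per_sec * bytes_per_packet * 8) / 1e6;
}
```

250,000 x 200 bytes x 8 bits = 400,000,000 bit/s, which matches the quoted 400Mbps per direction.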
   Anyway, thumbs up (and not for the middle-eastern meaning :-)) - I'm
looking forward to the MFC.

-DG



Re: RELENG_7 2008/01/10 desktop system also periodically freezes

2008-01-12 Thread David G Lawrence
> On Fri, 11 Jan 2008 12:49:29 -0500, I wrote:
> >
> > I have yet to experience a "random" freeze not directly attributable
> > to a softupdate while running the lock profiling.  I am running with
> > lock profiling on, and resetting the profiling counters once a minute.
> > Yesterday and this morning, I've run for quite a while now with lock
> > profiling on but without a "random" freeze.  I'll wait some more, but
> > I'm hoping that enabling the lock profiling hasn't masked the freeze.
> > I'll post again when I see one..
> > 
> 
> It is looking more likely to me that enabling lock profiling does mask
> the freeze.  I ran for more than 10 hours yesterday with lock profiling
> enabled and did not observe a single freeze.  After about 7 hours, I
> stopped the lock profiling and within 20 mins or so, I experienced a
> NINE MINUTE freeze!!  On re-enabling the lock profiling, I ran for about
> 3 more hours with no further freezes.
> 
> At the time of that long freeze, all I was doing was typing an email
> message.  The load average was almost 0.  Mail client is claws-email.
> Also running but idle were firefox, ical, several xterms, fvwm & its
> children (Fvwm{Buttons,Event,Pager,IconMan}), xload and xclock.  And
> xorg which uses the xf86-video-intel driver.  Daemons running were
> wpa_supplicant, dhclient, devd, syslogd, cupsd, ntpd, powerd, sshd,
> sendmail, cron, moused and xdm.  That is all.

   You might want to try disabling powerd and see if that mitigates the
problem. powerd is going to be messing with the CPU clock when it is near
idle. Your system would be less idle with lock profiling enabled, which
might explain why the problem seems to happen less often in that case.
 
-DG

David G. Lawrence
President
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
The FreeBSD Project - http://www.freebsd.org
Pave the road of life with opportunities.


Re: Packet loss every 30.999 seconds

2007-12-21 Thread David G Lawrence
> > > Can you use a placeholder vnode as a place to restart the scan?
> > > you might have to mark it special so that other threads/things
> > > (getnewvnode()?) don't molest it, but it can provide for a convenient
> > > restart point.
> > 
> >That was one of the solutions that I considered and rejected since it
> > would significantly increase the overhead of the loop.
> >The solution provided by Kostik Belousov that uses uio_yield looks like
> > a find solution. I intend to try it out on some servers RSN.
> 
> Out of curiosity's sake, why would it make the loop slower?  one
> would only add the placeholder when yielding, not for every iteration.

   Actually, I misread your suggestion and was thinking marker flag,
rather than placeholder vnode. Sorry about that. The current code
actually already uses a marker vnode. It is hidden and obfuscated in
the MNT_VNODE_FOREACH macro, further hidden in the __mnt_vnode_first/next
functions, so it should be safe from vnode reclamation/free problems.
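
The marker (placeholder) technique can be illustrated in user space with the BSD queue(3) TAILQ macros (glibc ships a compatible <sys/queue.h>). This is a hypothetical sketch, not the MNT_VNODE_FOREACH or __mnt_vnode_first/next code: struct node, walk_with_marker() and remove_current() are illustrative names, and the "yield" is simulated by a callback that unlinks the node the walker was sitting on. That is exactly the case where following a saved pointer to that node would be unsafe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical node type; not the kernel's struct vnode. */
struct node {
	TAILQ_ENTRY(node) link;
	bool is_marker;   /* markers are skipped by every walker */
	int id;
};
TAILQ_HEAD(node_list, node);

/*
 * Visit every real node, "yielding" every `batch` visits. Before each
 * yield a marker is linked in after the current node, and the walk
 * resumes from the marker, so it survives removal of the current node.
 * Returns the sum of the ids of all visited nodes.
 */
static int walk_with_marker(struct node_list *list, int batch,
    void (*during_yield)(struct node_list *, struct node *))
{
	struct node marker = { .is_marker = true, .id = 0 };
	int visited = 0, sum = 0;
	struct node *n = TAILQ_FIRST(list);

	assert(batch > 0);
	while (n != NULL) {
		if (n->is_marker) {              /* another walker's marker */
			n = TAILQ_NEXT(n, link);
			continue;
		}
		sum += n->id;
		if (++visited % batch == 0) {
			TAILQ_INSERT_AFTER(list, n, &marker, link);
			if (during_yield != NULL)
				during_yield(list, n);   /* n may be unlinked now */
			n = TAILQ_NEXT(&marker, link);
			TAILQ_REMOVE(list, &marker, link);
		} else {
			n = TAILQ_NEXT(n, link);
		}
	}
	return sum;
}

/* Example "concurrent" mutation: unlink the node the walker was on. */
static void remove_current(struct node_list *list, struct node *cur)
{
	TAILQ_REMOVE(list, cur, link);
}
```

Because traversal continues from TAILQ_NEXT(&marker, link) rather than from the current node, the walk stays valid even when that node disappears during the yield. The cost is only paid at yield points, not on every iteration.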

-DG



Re: Packet loss every 30.999 seconds

2007-12-21 Thread David G Lawrence
> As Bruce Evans noted, there is a vfs_msync() that do almost the same
> traversal of the vnodes. It was missed in the previous patch. Try this one.

   I forgot to comment on that when Bruce pointed that out. My solution
has been to comment out the call to vfs_msync. :-) It comes into play
when you have files modified through the mmap interface (kind of rare
on most systems). Obviously I have mixed feelings about vfs_msync, but
I'm not suggesting here that we should get rid of it as any sort of
solution.

-DG



Re: Packet loss every 30.999 seconds

2007-12-21 Thread David G Lawrence
> I'm just an observer, and I may be confused, but it seems to me that this is
> motion in the wrong direction (at least, it's not going to fix the actual
> problem). As I understand the problem, once you reach a certain point, the
> system slows down *every* 30.999 seconds. Now, it's possible for the code to
> cause one slowdown as it cleans up, but why does it need to clean up so much
> 31 seconds later?
> 
> Why not find/fix the actual bug? Then work on getting the yield right if it
> turns out there's an actual problem for it to fix.
> 
> If the problem is that too much work is being done at a stretch and it turns
> out this is because work is being done erroneously or needlessly, fixing
> that should solve the whole problem. Doing the work that doesn't need to be
> done more slowly is at best an ugly workaround.
> 
> Or am I misunderstanding?

   It's the syncer that is causing the problem, and it runs every 31 seconds.
Historically, the syncer ran every 30 seconds, but things have changed a
bit over time.
   The reason that the syncer takes so much time is that ffs_sync is a bit
stupid in how it works - it loops through all of the vnodes on each ffs
mountpoint (typically almost all of the vnodes in the system) to see if
any of them need to be synced out. This was marginally okay when there
were perhaps a thousand vnodes in the system, but when the maximum number
of vnodes was dramatically increased in FreeBSD some years ago (to
typically 5-10), combined with the kernel threads of FreeBSD 5,
this resulted in some rather bad side effects.
   I think the proper solution would be to create an ffs_sync work list
(another TAILQ/LISTQ), probably with the head in the mountpoint struct,
that has on it any vnodes that need to be synced. Unfortunately, such a
change would be extensive, scattered throughout much of the ufs/ffs code.
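
A rough sketch of the proposed per-mount dirty work list, using a queue(3) LIST. All names here (struct obj, struct mount_state, mark_dirty(), sync_dirty()) are hypothetical. In a real change every path that sets an inode flag or dirties a buffer would need to call something like mark_dirty(), which is why the change would be scattered through ufs/ffs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical inode-like object; not a real ufs/ffs structure. */
struct obj {
	LIST_ENTRY(obj) dirty_link;   /* linkage on the per-mount dirty list */
	bool on_dirty_list;
	int id;
};
LIST_HEAD(dirty_head, obj);

/* Hypothetical per-mount state holding the head of the work list. */
struct mount_state {
	struct dirty_head dirty;
};

/* Every code path that dirties an object would have to call this. */
static void mark_dirty(struct mount_state *ms, struct obj *o)
{
	if (!o->on_dirty_list) {
		LIST_INSERT_HEAD(&ms->dirty, o, dirty_link);
		o->on_dirty_list = true;
	}
}

/*
 * Sync pass: walk only the dirty list, so the cost is proportional to
 * the number of dirty objects, not to every cached object.
 * Returns the number of objects flushed.
 */
static size_t sync_dirty(struct mount_state *ms)
{
	size_t flushed = 0;
	struct obj *o;

	while ((o = LIST_FIRST(&ms->dirty)) != NULL) {
		assert(o->on_dirty_list);
		/* ... write the object's metadata and buffers here ... */
		LIST_REMOVE(o, dirty_link);
		o->on_dirty_list = false;
		flushed++;
	}
	return flushed;
}
```

With 1000 cached objects and two dirty ones, a sync pass touches two objects. A second pass on a quiet system touches none, which is the behavior the syncer should have.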

-DG



Re: Packet loss every 30.999 seconds

2007-12-21 Thread David G Lawrence
> >What patch you have used ?
> 
> This is hand applied from the diff you sent December 19, 2007 1:24:48  
> PM EST

   Mark, try the previous patch from Kostik - the one that does the one-tick
msleep. I think you'll find that that one does work. The likely
problem with the second version is that uio_yield doesn't lower the
priority enough for the other threads to run. Forcing it to msleep for
a tick takes priority out of the equation.

-DG



Re: Packet loss every 30.999 seconds

2007-12-21 Thread David G Lawrence
> >Unfortunately, the version of the patch that I sent out isn't going to
> > help your problem. It needs to yield at the top of the loop, but vp isn't
> > necessarily valid after the wakeup from the msleep. That's a problem that
> > I'm having trouble figuring out a solution to - the solutions that come
> > to mind will all significantly increase the overhead of the loop.
> 
> I apologize for not reading the code as I am swamped, but a technique
> that Matt Dillon used for bufs might work here.
> 
> Can you use a placeholder vnode as a place to restart the scan?
> you might have to mark it special so that other threads/things
> (getnewvnode()?) don't molest it, but it can provide for a convenient
> restart point.

   That was one of the solutions that I considered and rejected since it
would significantly increase the overhead of the loop.
   The solution provided by Kostik Belousov that uses uio_yield looks like
a fine solution. I intend to try it out on some servers RSN.

-DG



Re: Packet loss every 30.999 seconds

2007-12-19 Thread David G Lawrence
> >Try it with "find / -type f >/dev/null" to duplicate the problem  
> >almost
> >instantly.
> 
> I was able to verify last night that (cd /; tar -cpf -) > all.tar would
> trigger the problem.  I'm working getting a test running with
> David's ffs_sync() workaround now, adding a few counters there should
> get this narrowed down a little more.

   Unfortunately, the version of the patch that I sent out isn't going to
help your problem. It needs to yield at the top of the loop, but vp isn't
necessarily valid after the wakeup from the msleep. That's a problem that
I'm having trouble figuring out a solution to - the solutions that come
to mind will all significantly increase the overhead of the loop.
   As a very inadequate work-around, you might consider lowering
kern.maxvnodes to something like 2 - that might be low enough to
not trigger the problem, but also be high enough to not significantly
affect system I/O performance.

-DG



Re: Packet loss every 30.999 seconds

2007-12-19 Thread David G Lawrence
> >  In any case, it appears that my patch is a no-op, at least for the
> >problem I was trying to solve. This has me confused, however, because at
> >one point the problem was mitigated with it. The patch has gone through
> >several iterations, however, and it could be that it was made to the top
> >of the loop, before any of the checks, in a previous version. Hmmm.
> 
> The patch should work fine.  IIRC, it yields voluntarily so that other
> things can run.  I committed a similar hack for uiomove().  It was

   It patches the bottom of the loop, which is only reached if the vnode
is dirty. So it will only help if there are thousands of dirty vnodes.
While that condition can certainly happen, it isn't the case that I'm
particularly interested in.

> CPUs, everything except interrupts has to wait for these syscalls.  Now
> the main problem is to figure out why PREEMPTION doesn't work.  I'm
> not working on this directly since I'm running ~5.2 where nearly-full
> kernel preemption doesn't work due to Giant locking.

   I don't understand how PREEMPTION is supposed to work (I mean
to any significant detail), so I can't really comment on that.

-DG



Re: Packet loss every 30.999 seconds

2007-12-19 Thread David G Lawrence
>In any case, it appears that my patch is a no-op, at least for the
> problem I was trying to solve. This has me confused, however, because at
> one point the problem was mitigated with it. The patch has gone through
> several iterations, however, and it could be that it was made to the top
> of the loop, before any of the checks, in a previous version. Hmmm.

(replying to myself)

   I just found an earlier version of the patch, and sure enough, it was
to the top of the loop. Unfortunately, that version caused the system to
crash because vp was occasionally invalid after the wakeup.

   Anyway, let's see if Mark's packet loss problem is indeed related to
this code. If he does the find just after boot and immediately sees the
problem, then I would say that is fairly conclusive. He could also release
the cached vnodes by temporarily setting kern.maxvnodes=1 and then
setting it back to whatever it was previously (probably 6-10).
If the problem then goes away for a while, that would be another good
indicator.

-DG



Re: Packet loss every 30.999 seconds

2007-12-19 Thread David G Lawrence
> Try it with "find / -type f >/dev/null" to duplicate the problem almost
> instantly.

   FreeBSD used to have some code that would cause vnodes with no cached
pages to be recycled quickly (which would have made a simple find
ineffective without reading the files at least a little bit). I guess
that got removed when the size of the vnode pool was dramatically
increased.

-DG



Re: Packet loss every 30.999 seconds

2007-12-19 Thread David G Lawrence
> On Tue, 18 Dec 2007, David G Lawrence wrote:
> 
> >>>I got an almost identical delay (with 64000 vnodes).
> >>>
> >>>Now, 17ms isn't much.
> >>
> >>   Says you. On modern systems, trying to run a pseudo real-time 
> >>   application
> >>on an otherwise quiescent system, 17ms is just short of an eternity. I 
> >>agree
> >>that the syncer should be preemptable (which is what my bandaid patch
> >>attempts to do), but that probably wouldn't have helped my specific 
> >>problem
> >>since my application was a user process, not a kernel thread.
> 
> FreeBSD isn't a real-time system, and 17ms isn't much for it.  I saw lots

   I never said it was, but that doesn't stop us from using FreeBSD in
pseudo real-time applications. This is made possible by fast CPUs and
dedicated-task systems where the load is carefully controlled.

> of syscall delays of nearly 1 second while debugging this.  (With another

   I can make the delay several minutes by pushing the reset button.
 
> Debugging shows that the problem is like I said.  The loop really does
> take 125 ns per iteration.  This time is actually not very much.  The

   Considering that the CPU clock cycle time is on the order of 300ps, I
would say 125ns to do a few checks is pathetic.

   In any case, it appears that my patch is a no-op, at least for the
problem I was trying to solve. This has me confused, however, because at
one point the problem was mitigated with it. The patch has gone through
several iterations, however, and it could be that it was made to the top
of the loop, before any of the checks, in a previous version. Hmmm.

-DG



Re: Packet loss every 30.999 seconds

2007-12-18 Thread David G Lawrence
> > I got an almost identical delay (with 64000 vnodes).
> > 
> > Now, 17ms isn't much.
> 
>Says you. On modern systems, trying to run a pseudo real-time application
> on an otherwise quiescent system, 17ms is just short of an eternity. I agree
> that the syncer should be preemptable (which is what my bandaid patch
> attempts to do), but that probably wouldn't have helped my specific problem
> since my application was a user process, not a kernel thread.

   One more followup (I swear I'm done, really!)... I have a laptop here
that runs at 150MHz when it is in the lowest running CPU power save mode.
At that speed, this bug causes a delay of more than 300ms and is enough
to cause loss of keyboard input. I have to switch into high speed mode
before I try to type anything, else I end up with random typos. Very
annoying.

-DG



Re: Packet loss every 30.999 seconds

2007-12-18 Thread David G Lawrence
> I got an almost identical delay (with 64000 vnodes).
> 
> Now, 17ms isn't much.

   Says you. On modern systems, trying to run a pseudo real-time application
on an otherwise quiescent system, 17ms is just short of an eternity. I agree
that the syncer should be preemptable (which is what my bandaid patch
attempts to do), but that probably wouldn't have helped my specific problem
since my application was a user process, not a kernel thread.
   All of my systems have options PREEMPTION - that is the default in
6+. It doesn't affect this problem.
   On the other hand, the syncer shouldn't be consuming this much CPU in
the first place. There is obviously a bug here. Of course looking through
all of the vnodes in the system for something dirty is stupid in the
first place; there should be a separate list for that. ...but a simple
fix is what is needed right now.
   I'm going to have to bow out of this discussion now. I just don't have
the time for it.

-DG



Re: Packet loss every 30.999 seconds

2007-12-18 Thread David G Lawrence
> On Tue, 18 Dec 2007, David G Lawrence wrote:
> 
> >>Thanks.  Have a kernel building now.  It takes about a day of uptime
> >>after reboot before I'll see the problem.
> >
> >  You may also wish to try to get the problem to occur sooner after boot
> >on a non-patched system by doing a "tar cf /dev/null /" (note: substitute
> >/dev/zero instead of /dev/null, if you use GNU tar, to disable its
> >"optimization"). You can stop it after it has gone through a 100K files.
> >Verify by looking at "sysctl vfs.numvnodes".
> 
> Hmm, I said to use "find /", but that is not so good since it only
> looks at directories and directories (and their inodes) are not packed
> as tightly as files (and their inodes).  Optimized tar, or "find /
> -type f", or "ls -lR /", should work best, by doing not much more than
> stat()ing lots of files, while full tar wastes time reading file data.

   I have no reason to believe that just reading directories will
reproduce the problem with file vnodes. You need to open the files
and read them. Nothing else will do.

-DG



Re: Packet loss every 30.999 seconds

2007-12-18 Thread David G Lawrence
> Thanks.  Have a kernel building now.  It takes about a day of uptime  
> after reboot before I'll see the problem.

   You may also wish to try to get the problem to occur sooner after boot
on a non-patched system by doing a "tar cf /dev/null /" (note: substitute
/dev/zero instead of /dev/null, if you use GNU tar, to disable its
"optimization"). You can stop it after it has gone through 100K files.
Verify by looking at "sysctl vfs.numvnodes".
   Doing this would help to further prove that lots of allocated vnodes
is the prerequisite for the problem.

-DG



Re: Packet loss every 30.999 seconds

2007-12-18 Thread David G Lawrence
> >Right, it's a non-optimal loop when N is very large, and that's a fairly
> >well understood problem.  I think what DG was getting at, though, is
> >that this massive flush happens every time the syncer runs, which
> >doesn't seem correct.  Sure, maybe you just rsynced 100,000 files 20
> >seconds ago, so the upcoming flush is going to be expensive.  But the
> >next flush 30 seconds after that shouldn't be just as expensive, yet it
> >appears to be so.
> 
> I'm sure it doesn't cause many bogus flushes.  iostat shows zero writes
> caused by calling this incessantly using "while :; do sync; done".

   I didn't say it caused any bogus disk I/O. My original problem
(after a day or two of uptime) was an occasional large scheduling delay
for a process that needed to process VoIP frames in real-time. It was
happening every 31 seconds and was causing voice frames to be dropped
due to the large latency causing the frame to be outside of the jitter
window. I wrote a program that measures the scheduling delay by sleeping
for one tick and then comparing the time of day on wakeup against the
expected wakeup time. This revealed that every 31 seconds, the process was seeing
a 17ms delay in scheduling. Further investigation found that 1) the
syncer was the process that was running every 31 seconds and causing
the delay (and it was the only one in the system with that timing
interval), and that 2) lowering the kern.maxvnodes to something lowish
(5000) would mostly mitigate the problem. The patch to limit the number
of vnodes to process in the loop before sleeping was then developed
and it completely resolved the problem. Since the wait that I added
is at the bottom of the loop and the limit is 500 vnodes, this tells
me that every 31 seconds, there are a whole lot of vnodes that are
being "synced", when there shouldn't have been any (this fact wasn't
apparent to me at the time, but when I later realized this, I had
no time to investigate further). My tests and analysis have all been
on an otherwise quiet system (no disk I/O), so the bottom of the
ffs_sync vnode loop should not have been reached at all, let alone
tens of thousands of times every 31 seconds. All machines were
uniprocessor, FreeBSD 6+. I don't know if this problem is present in 5.2.
I didn't see ffs_syncvnode in your call graph, so it probably is not.
   Anyway, someone needs to instrument the vnode loop in ffs_sync and
figure out what is going on. As you've pointed out, it is necessary
to first read a lot of files (I use tar to /dev/null and make sure it
reads at least 100K files) in order to get the vnodes allocated. As
I mentioned previously, I suspect that either ip->i_flag is not getting
completely cleared in ffs_syncvnode or its children or
v_bufobj.bo_dirty.bv_cnt accounting is broken.
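
A rough sketch of the kind of scheduling-delay probe described above: sleep for one tick, then compare the actual wakeup time against the expected one. This is not the original program. It uses POSIX clock_gettime(CLOCK_MONOTONIC) and nanosleep() instead of the time-of-day clock, it assumes a 1 ms tick (hz=1000), and max_wakeup_lateness_ns() is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

static int64_t ts_to_ns(const struct timespec *ts)
{
	return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/*
 * Sleep for `period_ns` (e.g. one 1 ms tick) `iterations` times and
 * return the worst lateness: how much longer than requested it took
 * for the thread to wake up and get the CPU back.
 */
static int64_t max_wakeup_lateness_ns(int iterations, long period_ns)
{
	struct timespec req = { .tv_sec = 0, .tv_nsec = period_ns };
	int64_t worst = 0;

	assert(iterations > 0 && period_ns > 0 && period_ns < 1000000000L);
	for (int i = 0; i < iterations; i++) {
		struct timespec before, after;

		clock_gettime(CLOCK_MONOTONIC, &before);
		nanosleep(&req, NULL);   /* an early return on a signal just reads as "not late" */
		clock_gettime(CLOCK_MONOTONIC, &after);
		int64_t late = ts_to_ns(&after) - ts_to_ns(&before) - period_ns;
		if (late > worst)
			worst = late;
	}
	return worst;
}
```

Calling it once per second (say, 1000 iterations per call) and logging the result for a few minutes should make a periodic spike, such as one every ~31 seconds, stand out against the normal background jitter.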

-DG



Re: Packet loss every 30.999 seconds

2007-12-17 Thread David G Lawrence
> While trying to diagnose a packet loss problem in a RELENG_6 snapshot  
> dated
> November 8, 2007 it looks like I've stumbled across a broken driver or
> kernel routine which stops interrupt processing long enough to severly
> degrade network performance every 30.99 seconds.

   I noticed this as well some time ago. The problem has to do with the
processing (syncing) of vnodes. When the total number of allocated vnodes
in the system grows to tens of thousands, the ~31-second periodic sync
process takes a long time to run. Try this patch and let people know if
it helps your problem. It will periodically wait for one tick (1ms at
hz=1000) every 500 vnodes of processing, which will allow other things to run.

Index: ufs/ffs/ffs_vfsops.c
===================================================================
RCS file: /home/ncvs/src/sys/ufs/ffs/ffs_vfsops.c,v
retrieving revision 1.290.2.16
diff -c -r1.290.2.16 ffs_vfsops.c
*** ufs/ffs/ffs_vfsops.c	9 Oct 2006 19:47:17 -0000	1.290.2.16
--- ufs/ffs/ffs_vfsops.c	25 Apr 2007 01:58:15 -0000
***************
*** 1109,1114 ****
--- 1109,1115 ----
  	int softdep_deps;
  	int softdep_accdeps;
  	struct bufobj *bo;
+ 	int flushed_count = 0;
  
  	fs = ump->um_fs;
  	if (fs->fs_fmod != 0 && fs->fs_ronly != 0) {		/* XXX */
***************
*** 1174,1179 ****
--- 1175,1184 ----
  			allerror = error;
  		vput(vp);
  		MNT_ILOCK(mp);
+ 		if (flushed_count++ > 500) {
+ 			flushed_count = 0;
+ 			msleep(&flushed_count, MNT_MTX(mp), PZERO, "syncw", 1);
+ 		}
  	}
  	MNT_IUNLOCK(mp);
  	/*

-DG



Re: Packet loss every 30.999 seconds

2007-12-17 Thread David G Lawrence
   One more comment on my last email... The patch that I included is not
meant as a real fix - it is just a bandaid. The real problem appears to
be that a very large number of vnodes (all of them?) are getting synced
(i.e. calling ffs_syncvnode()) every time. This should normally only
happen for dirty vnodes. I suspect that something is broken with this
check:

	if (vp->v_type == VNON || ((ip->i_flag &
	    (IN_ACCESS | IN_CHANGE | IN_MODIFIED | IN_UPDATE)) == 0 &&
	    vp->v_bufobj.bo_dirty.bv_cnt == 0)) {
		VI_UNLOCK(vp);
		continue;
	}


   ...like the i_flag flags aren't ever getting properly cleared (or bv_cnt
is always non-zero).
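
To make that suspicion concrete, here is the quoted skip test as a standalone predicate. The struct, the enum, and the HYP_IN_* bit values are hypothetical placeholders (the real IN_* values are defined in the kernel's UFS headers), and dirty_bufs stands in for v_bufobj.bo_dirty.bv_cnt. The sketch shows that a single flag bit that never gets cleared, or a buffer count that never drops to zero, is enough to send a vnode to ffs_syncvnode() on every pass.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for illustration only; not the kernel's values. */
#define HYP_IN_ACCESS   0x1u
#define HYP_IN_CHANGE   0x2u
#define HYP_IN_MODIFIED 0x4u
#define HYP_IN_UPDATE   0x8u

enum hyp_vtype { HYP_VNON, HYP_VREG };

struct hyp_vnode {
	enum hyp_vtype v_type;
	unsigned i_flag;     /* stands in for ip->i_flag */
	int dirty_bufs;      /* stands in for v_bufobj.bo_dirty.bv_cnt */
};

/* Mirrors the quoted test: true means the loop would skip this vnode. */
static bool would_skip(const struct hyp_vnode *vp)
{
	return vp->v_type == HYP_VNON ||
	    ((vp->i_flag & (HYP_IN_ACCESS | HYP_IN_CHANGE | HYP_IN_MODIFIED |
	    HYP_IN_UPDATE)) == 0 && vp->dirty_bufs == 0);
}

/* How many of `n` vnodes would fall through to the sync call. */
static int count_synced(const struct hyp_vnode *v, int n)
{
	int synced = 0;

	assert(n >= 0);
	for (int i = 0; i < n; i++)
		if (!would_skip(&v[i]))
			synced++;
	return synced;
}
```

Instrumenting the real loop with a counter like count_synced() on an idle system would show directly whether the skip test is failing for vnodes that should be clean.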

   ...but I don't have the time to chase this down.

-DG



Re: Watchdog Timeout - bge devices

2006-10-03 Thread David G Lawrence
> Very interesting data point.  I wonder if this accounts for some of the
> inconsistency in the reporting from others.  In any case, SCHED_ULE is
> still considered to be highly experimental.  Hopefully it will get some
> more attention in the near future to bring it closer to production
> quality.

   I'm not using SCHED_ULE on any of the machines on which I'm seeing the
timeout problem with em and fxp devices. I suspect the problem has to do
with interrupt thread scheduling; maybe SCHED_ULE just somehow makes the
problem worse?

-DG

David G. Lawrence
President
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
The FreeBSD Project - http://www.freebsd.org
Pave the road of life with opportunities.


Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-10-01 Thread David G Lawrence
> Just an observation.
> 
> All the boxes I've had this problem on have _two_ em interfaces. I have 
> never seen it on my boxes with just one em NIC.
> 
> The error is always em0 timeout - never em1 (I haven't seen any!)
> 
> Yesterday my local network got completely wacky, the gateway had em0 
> timeouts on the screen: but em0 is the _outside_ the windows box that I 
> had to reboot was attached to the inside on em1!
> 
> Could there be something wrong in the driver if we have more than one em 
> interface?

   A machine I have here that shows the problem has one fxp and one em and
the timeouts occur on both interfaces.

-DG

David G. Lawrence
President
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
The FreeBSD Project - http://www.freebsd.org
Pave the road of life with opportunities.


Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-29 Thread David G Lawrence
> Are you enabling an option, like IPv6, that puts Giant over the network 
> stack?

From dmesg:

WARNING: debug.mpsafenet forced to 0 as ipsec requires Giant
WARNING: MPSAFE network stack disabled, expect reduced performance.

   ...the kernel has IPSEC.

-DG

David G. Lawrence
President
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
The FreeBSD Project - http://www.freebsd.org
Pave the road of life with opportunities.


Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-29 Thread David G Lawrence
> >Do you have any history of seeing the watchdog timeout problem on your
> > machine?
> 
> On this machine no - but it's the only one running em0. On other
> machines running bge0 then, yes, I see it a lot. But those are all
> SMP machines, aside from one. On that one I am currently building
> the latest 6-STABLE and when it's done (give it a couple of hours)
> I will give it a shot with your code and see what happens.

   Another data point: After rebooting my machine, the program no longer
causes the problem. It appears that something else has to occur first on
the machine to put it into a state that makes it susceptible to the
program.

-DG

David G. Lawrence
President
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
The FreeBSD Project - http://www.freebsd.org
Pave the road of life with opportunities.


Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-29 Thread David G Lawrence
> >Attached is a simple user program that will immediately cause pretty much
> > all of the network drivers (at least the ones I own) to stop working and
> > get watchdog timeouts.
> 
> I am running this on a single processor machine with an SMP kernel and
> it does not have any effect. I don't think I have any single processor machines
> running a non SMP kernel to try it on though. Not particularly helpful I 
> know. I'll

   Actually, I think it is helpful to know that the program only has an
effect on some machines. We just need to figure out what the common
denominator is.

> try building a non SMP kernel for this machine if I can.

   Do you have any history of seeing the watchdog timeout problem on your
machine?

-DG

David G. Lawrence
President
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
The FreeBSD Project - http://www.freebsd.org
Pave the road of life with opportunities.


Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-29 Thread David G Lawrence
> On Fri, Sep 29, 2006 at 12:27:41AM -0700, David G Lawrence wrote:
> >Attached is a simple user program that will immediately cause pretty much
> > all of the network drivers (at least the ones I own) to stop working and
> > get watchdog timeouts.
> > 
> > WARNING: This program will kill the network on your 6.x server. Do not run 
> > this on a production machine unless you are on the console and can ctrl-C
> > it!
> I have tried this program on my workstation and I have not got any
> timeouts, network works good.
> sysadm:~>uname -a
> FreeBSD sysadm.stc 6.1-STABLE FreeBSD 6.1-STABLE #4: Fri Aug 11 14:11:18

   Is this build date also about the same date that you cvsup'd the sources?

> MSD 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/SYSADM  amd64
> sysadm:~> ifconfig 
> nve0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
> inet6 fe80::2e0:81ff:fe55:bc54%nve0 prefixlen 64 scopeid 0x1 
> inet 192.168.2.26 netmask 0xffffff00 broadcast 192.168.2.255
> inet 192.168.2.55 netmask 0xffffffff broadcast 192.168.2.55
> ether 00:e0:81:55:bc:54
> media: Ethernet autoselect (100baseTX )
> status: active

   Is this a UP machine or MP machine?

-DG

David G. Lawrence
President
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
The FreeBSD Project - http://www.freebsd.org
Pave the road of life with opportunities.


Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-29 Thread David G Lawrence
>Attached is a simple user program that will immediately cause pretty much
> all of the network drivers (at least the ones I own) to stop working and
> get watchdog timeouts.

   Oh, one more thing - I've only tried this on uni-processor machines. The
only MP machine that I have here is a production machine that I can't test
this on right now.
   If running this on an SMP machine doesn't show the problem, then try
running multiple copies of it (one for each CPU).

-DG

David G. Lawrence
President
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
The FreeBSD Project - http://www.freebsd.org
Pave the road of life with opportunities.


Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-29 Thread David G Lawrence
   Attached is a simple user program that will immediately cause pretty much
all of the network drivers (at least the ones I own) to stop working and
get watchdog timeouts.

WARNING: This program will kill the network on your 6.x server. Do not run 
this on a production machine unless you are on the console and can ctrl-C
it!

-DG

David G. Lawrence
President
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
The FreeBSD Project - http://www.freebsd.org
Pave the road of life with opportunities.
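#include <poll.h>

/*
 * Spin on poll(2) waiting for stdout to become writable.  When stdout
 * is already writable, poll() returns immediately, so this is a tight
 * loop of system calls that never sleeps.
 */
int
main(void)
{
	struct pollfd pfd;

	pfd.fd = 1;		/* stdout */
	pfd.events = POLLOUT;
	pfd.revents = 0;

	for (;;) {
		/* one descriptor, infinite timeout */
		if (poll(&pfd, 1, -1) < 0)
			break;
	}
	return (1);
}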

Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-27 Thread David G Lawrence
> In the past (RELENG_5) I've had major problems with syncer delaying
> interrupt threads for long periods (I've seen 8msec).  See
> http://lists.freebsd.org/pipermail/freebsd-stable/2005-February/012346.html
> I'm not sure if this is still a problem (but I am still having some
> problems which may be caused by excessive interrupt and will be doing
> some debugging as I get time).
...
> tool and then post-process the file looking for oddities.  In my case,
> there was a _very_ high correlation between long latencies and syncer.
> If anyone's interested in this approach, I can provide the relevant
> code diffs.

   I've seen this problem as well; it results in around 9-10ms of occasional
scheduling delay for a real-time streaming application that I'm developing.
Shutting off softupdates on all of the mounted filesystems helps.
   Note that the watchdog timeout for the network drivers is usually 8000ms
(8 seconds), so this is unlikely to be related to that problem.

-DG

David G. Lawrence
President
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
The FreeBSD Project - http://www.freebsd.org
Pave the road of life with opportunities.


Re: nve timeout (and down) regression?

2006-03-25 Thread David G. Lawrence
> This happens w/o any "real" activity on that interface (which goes into
> an Allied Telesyn switch):
> ...
> Mar 24 19:39:54 worf kernel: nve0: device timeout (1)
> Mar 24 19:39:54 worf kernel: nve0: link state changed to DOWN
> Mar 24 19:39:55 worf kernel: nve0: link state changed to UP
> Mar 24 19:40:14 worf kernel: nve0: device timeout (1)

   The problem is the watchdog timeout itself. I've attached an email that
I sent a few months ago which describes the problem, along with a simple
patch which disables the watchdog timer.

-DG

David G. Lawrence
President
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
The FreeBSD Project - http://www.freebsd.org
Pave the road of life with opportunities.

Date: Wed, 4 Jan 2006 16:21:03 -0800
Subject: Re: nve(4) patch - please test!

> Since I sent the mail below I had to discover that the new driver
> has a problem when no cable is plugged in, at least on my Asus board.
> 
> It doesn't only run into timeouts, during some of these timeout the
> machine or at least the keyboard hangs for about a minute.
> 
> Is there anything I can do to help debug this?

   I ran into this problem recently as well and spent some time diagnosing
it. It's not that the cable isn't plugged in - rather it happens whenever
the traffic levels are low.
   The problem is that the nvidia-supplied portion of the driver is deferring
the release of completed transmit buffers, and this occasionally
results in if_timer expiring, causing the driver watchdog routine to be
called ("device timeout"). The watchdog routine resets the card and the
nvidia-supplied code sits in a high-priority loop waiting for the card
to reset. This can take many seconds and your system will be hung until
it completes.
   I have a work-around patch for the problem that I've attached to this
email. It simply disables the watchdog. A real fix would involve accounting
for the outstanding transmit buffers differently (or perhaps not at all -
e.g. always attempt to call the nvidia-supplied code and if a queue-full
error occurs, then wait for an interrupt before trying to queue more
transmit packets).
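The watchdog path described above can be sketched as a small simulation. It is modeled on the once-per-second countdown that if_slowtimo() applied to if_timer in FreeBSD of this era, where a value of 0 means "disarmed"; the fake_* names, the struct and the simulate() helper are hypothetical, and the driver's real transmit and card-reset code is not modeled.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two ifnet fields that matter here. */
struct fake_ifnet {
	int	if_timer;	/* seconds left before the watchdog fires; 0 = off */
	int	watchdog_calls;	/* times the "device timeout" reset path ran */
};

/* Transmit path: arm the watchdog when a packet is handed to the card. */
static void
fake_start(struct fake_ifnet *ifp, int timeout)
{
	ifp->if_timer = timeout;
}

/* Transmit-complete path: buffers reclaimed, disarm the watchdog. */
static void
fake_txeof(struct fake_ifnet *ifp)
{
	ifp->if_timer = 0;
}

/* Once-per-second tick: count down and fire when the timer hits zero. */
static void
fake_slowtimo(struct fake_ifnet *ifp)
{
	if (ifp->if_timer == 0 || --ifp->if_timer != 0)
		return;
	ifp->watchdog_calls++;	/* the real driver resets the card here */
}

/* Queue one packet, let `delay` seconds pass, then reclaim the buffer. */
static int
simulate(int timeout, int delay)
{
	struct fake_ifnet ifp = { 0, 0 };

	fake_start(&ifp, timeout);
	for (int s = 0; s < delay; s++)
		fake_slowtimo(&ifp);
	fake_txeof(&ifp);
	return (ifp.watchdog_calls);
}
```

With the original value of 8, `simulate(8, 10)` fires the watchdog once because the buffer release was deferred past 8 ticks, while `simulate(8, 3)` does not. With the attached patch's value of 0, `simulate(0, n)` never fires, which is why the work-around hides the symptom rather than fixing the deferred release.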

-DG

David G. Lawrence
President
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
The FreeBSD Project - http://www.freebsd.org
Pave the road of life with opportunities.

Index: if_nve.c
===
RCS file: /home/ncvs/src/sys/dev/nve/if_nve.c,v
retrieving revision 1.7.2.8
diff -c -r1.7.2.8 if_nve.c
*** if_nve.c	25 Dec 2005 21:57:03 -  1.7.2.8
--- if_nve.c	5 Jan 2006 00:12:45 -
***
*** 943,949 
return;
}
/* Set watchdog timer. */
!   ifp->if_timer = 8;
  
/* Copy packet to BPF tap */
BPF_MTAP(ifp, m0);
--- 943,949 
return;
}
/* Set watchdog timer. */
!   ifp->if_timer = 0;
  
/* Copy packet to BPF tap */
BPF_MTAP(ifp, m0);


Re: sendfile + non local filesystem + lighttpd = EOPNOTSUPP

2005-10-28 Thread David G. Lawrence
> Hello,
> 
> I seem to have a problem serving files with
> lighttpd from non local (smbfs) filesystem.
> Lighttpd tries to use sendfile(2) but, it returns
> with -1 and errno "Operation not supported",
> but I can't find this error in the documented errors
> on the manpage.
> Forcing lighttpd to not use sendfile fixes the problem,
> but i would really like to use it...
> 
> Any suggestions?

   What version of FreeBSD?
 

-DG

David G. Lawrence
President
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
TeraSolutions, Inc. - http://www.terasolutions.com - (888) 346 7175
The FreeBSD Project - http://www.freebsd.org
Pave the road of life with opportunities.


Re: PXEBOOT/TFTPBOOT + big MD_ROOT problem

2005-04-19 Thread David G. Lawrence
> Hi, 
> 
> I'm trying to make very big MD_ROOT (300MB) sent using PXEBOOT+TFTPBOOT. No
> NFS. It's a sort of diskless machine with all the system on ram. There is a
> problem when the preloaded image is >~32MB. Kernel loads but it does not
> seem to find the files. It seems as if only part of the image is really
> there. With a "small" image (<~32MB), no problem. I use the same image, of
> course, same init etc... just more data for my application in the big image
> case. 
...
> Am I missing something obvious? 

   I assume you saw this in the tftpd manual page?

BUGS
 Files larger than 33488896 octets (65535 blocks) cannot be transferred
 without client and server supporting blocksize negotiation (RFC1783).

 Many tftp clients will not transfer files over 16744448 octets (32767
 blocks).
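The ~32MB threshold in the question lines up with this limit. Here is a minimal sketch of the arithmetic, assuming RFC 1350's 512-byte default block size, a 16-bit block counter that is not allowed to wrap, and RFC 2348's 65464-byte maximum when blocksize negotiation is available. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define TFTP_DEFAULT_BLKSIZE	512u	/* RFC 1350 data block size */
#define TFTP_MAX_BLKSIZE	65464u	/* largest blksize RFC 2348 allows */
#define TFTP_MAX_BLOCKNO	65535u	/* 16-bit block number, no wrap */

/*
 * A transfer ends with a block shorter than blksize (possibly empty),
 * so a file needs size / blksize full blocks plus one final short one.
 */
static uint64_t
tftp_blocks_needed(uint64_t size, uint32_t blksize)
{
	return (size / blksize + 1);
}

/* True if `size` bytes can be sent before the block counter runs out. */
static bool
tftp_fits(uint64_t size, uint32_t blksize, uint32_t max_blocks)
{
	return (tftp_blocks_needed(size, blksize) <= max_blocks);
}
```

Under these assumptions a 300MB image needs 614401 blocks at 512 bytes, far beyond 65535, but only 4806 blocks at the maximum negotiated size. Passing 32767 as `max_blocks` models the clients the manual page warns about.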


-DG

David G. Lawrence
President
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
TeraSolutions, Inc. - http://www.terasolutions.com - (888) 346 7175
The FreeBSD Project - http://www.freebsd.org
Pave the road of life with opportunities.


Re: SATA RAID Support

2005-02-25 Thread David G. Lawrence
> On Fri, Feb 25, 2005 at 12:18:51PM -0800, David G. Lawrence wrote:
> > Answer  
> > Problem: 
> > WD EIDE drives are dropped from an IDE RAID array or system after several
> > days or weeks of error-free operation.
> 
> Of course I looked at 3ware, at WD and in google to find an 
> answer. I found both the articles you posted.
> 
> One applies to PATA drives, which we don't have. It gives no hint about 
> software versions for SATA drives and I did not find a firmware upgrade 
> at WD either.

   All SATA drives made prior to a few months ago are PATA drives with a
serializer on the front end. It is likely that the firmware for the PATA
Raptor is the same as for the SATA Raptor, so any problem affecting one
would likely affect the other.
 
> The other one applies to SATA drives. When I found this, I checked with 
> smartctl in another machine (smartctl does not see single drives on the 
> 3ware) all the RAID drives. Acoustic management was disabled on all 
> disks. This does not really surprise me, though. 10 krpm disks are 
> probably meant for servers and to deliver high performance - switching 
> on acoustic management by default on these drives would not be very 
> smart.

   Perhaps, but I found these problem descriptions by doing a search on
3ware controllers with Raptor drives.
   In any case, my point is that the problem you described appears to be
a problem with the drive, not the controller. The fact that you don't
see the problem with the ICP Vortex controller is not proof that the
3ware controller is at fault.

-DG

David G. Lawrence
President
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
TeraSolutions, Inc. - http://www.terasolutions.com - (888) 346 7175
The FreeBSD Project - http://www.freebsd.org
Pave the road of life with opportunities.


Re: SATA RAID Support

2005-02-25 Thread David G. Lawrence
> > > Under heavy load (I/O load on the disks constantly over 200 tps, average
> > > at about 250 tps, peaks over 600 tps) a random drive disconnects from
> > > the RAID 10. After removing the drive from the config and rescanning the
> > > bus, the drive does not show up anymore. The only way to get the drive
> > > back is to unplug the drive (or switch the computer off, so that power
> > > is removed).  After that there is no problem to rebuild the RAID with
> > > the drive.
> > 
> > Interesting. AFAIR the same sort of errors occurred with some older WD
> > drives and 3Ware 750x controllers.
> > The solution was to flash the drives' firmware.
> > 
> > A quick googling found some reference to the problem on this document:
> > http://japan.3ware.com/products/pdf/Drive_compatibility_list.pdf

   ...and here is even more information about the problem:


Question
Why do EIDE drives disappear from the IDE RAID array or system after
a short period of error-free operation?

Affected drives:

WD EIDE drives with capacities between 40GB & 120GB
WD EIDE drives with greater than 120GB capacity with a date earlier than
3/25/03.


Answer  
Problem: 
WD EIDE drives are dropped from an IDE RAID array or system after several
days or weeks of error-free operation.

Solution:
The problem is a result of a feature that reduces idle acoustic noise in
desktop drives. This feature may cause a timeout, most likely (though not
exclusively) in an IDE RAID environment. To disable the feature, you can run
a simple Western Digital utility to turn off a single bit in the drive's
run-time configuration. Disabling this feature will NOT impact normal
system operations. No firmware or hardware changes are required.

IDE Upgrade Utility (Non-3Ware controller cards)
For all configurations other than 3Ware controller cards, download the IDE
Upgrade Utility for the Desktop PC.

3Ware controller cards
If you are using one or more 3Ware controller cards in an IDE RAID
configuration, download the IDE RAID Compatibility Upgrade Utility for 3Ware
7500-X controller cards.



-DG

David G. Lawrence
President
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
TeraSolutions, Inc. - http://www.terasolutions.com - (888) 346 7175
The FreeBSD Project - http://www.freebsd.org
Pave the road of life with opportunities.


Re: SATA RAID Support

2005-02-25 Thread David G. Lawrence
> > Under heavy load (I/O load on the disks constantly over 200 tps, average
> > at about 250 tps, peaks over 600 tps) a random drive disconnects from
> > the RAID 10. After removing the drive from the config and rescanning the
> > bus, the drive does not show up anymore. The only way to get the drive
> > back is to unplug the drive (or switch the computer off, so that power
> > is removed).  After that there is no problem to rebuild the RAID with
> > the drive.
> 
> Interesting. AFAIR the same sort of errors occurred with some older WD
> drives and 3Ware 750x controllers.
> The solution was to flash the drives' firmware.
> 
> A quick googling found some reference to the problem on this document:
> http://japan.3ware.com/products/pdf/Drive_compatibility_list.pdf

   Here is what I dug up from Western Digital:

IDE RAID Compatibility Upgrade - 3Ware Cards
Version: 1.07
Publish Date: Apr, 2003
Description: This utility runs within DOS and is used to update WD drives
that are connected to one or more 3Ware 7500-X IDE (Parallel ATA) RAID
controllers. Affected drives: WD drives with capacities between 40GB and
120GB. WD drives with greater than 120GB capacity with Mfg. date codes
earlier than 3/25/03.
Download: Wdc_upd.zip (140 KB)
Operating System: Windows 2000, Windows XP, Windows ME, Windows 98SE,
Windows 98
Instructions:
1. Download the wdc_upd.zip file.
2. Extract the file onto bootable medium (floppy, CD-RW, network drive, etc.).
3. Boot the system to be updated to the medium where the update files were
   unzipped to.
4. Run wdcfgupd.bat
5. The utility will proceed to update all the drives on that controller. This
   process takes approximately 1 min. per drive.
6. Once the update completes, re-boot the system.
The update is complete.
Upload Date: 04/01/2004


-DG

David G. Lawrence
President
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
TeraSolutions, Inc. - http://www.terasolutions.com - (888) 346 7175
The FreeBSD Project - http://www.freebsd.org
Pave the road of life with opportunities.


Re: SATA RAID Support

2005-02-25 Thread David G. Lawrence
> Under heavy load (I/O load on the disks constantly over 200 tps, average
> at about 250 tps, peaks over 600 tps) a random drive disconnects from
> the RAID 10. After removing the drive from the config and rescanning the
> bus, the drive does not show up anymore. The only way to get the drive
> back is to unplug the drive (or switch the computer off, so that power
> is removed).  After that there is no problem to rebuild the RAID with
> the drive.
> 
> -> It's not reproducible. The error occurs under high load, sometimes
>three times a week, sometimes it does not happen in 3 months.
> 
> -> It happens only with the Raptors.
> 
> -> It's always a random drive, there's no drive, that disconnects more
>often

   This sounds like a bug in the drive firmware. Did you look into any
firmware updates from Western Digital? It's possible that the 3ware
controllers push the drives a bit harder and expose problems that wouldn't
show up at slightly lower TPS rates.
   We (TeraSolutions and Download Technologies) have deployed the 3ware
9500S controllers extensively and haven't seen any problems with the 7k250
and 7k400 series Hitachi drives that we use with them.
   One of the cool things about the 3ware controllers is that they will
automatically do bad block reassignment by first recovering the data
from the redundancy, issuing a block reassignment to the drive, and then
writing the recovered block back out to the new (reassigned) block. This
may seem pretty basic for RAID, but many controllers we've tested 
actually don't do this.
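The recover, reassign and rewrite sequence described above can be sketched on a toy two-way mirror. This is not 3ware firmware logic: the toy_disk structure, its bad[] flags and the disk_* helpers are hypothetical stand-ins for a drive's medium errors and its spare-sector remapping.

```c
#include <stdbool.h>
#include <string.h>

#define NBLOCKS	8
#define BLKSZ	16

/* Toy disk: a block flagged bad fails every read until it is reassigned. */
struct toy_disk {
	char	data[NBLOCKS][BLKSZ];
	bool	bad[NBLOCKS];
	int	reassigned;	/* blocks remapped to spare sectors */
};

static bool
disk_read(const struct toy_disk *d, int blk, char *out)
{
	if (d->bad[blk])
		return (false);	/* medium error */
	memcpy(out, d->data[blk], BLKSZ);
	return (true);
}

/* The drive maps the LBA to a spare sector whose contents are undefined. */
static void
disk_reassign(struct toy_disk *d, int blk)
{
	d->bad[blk] = false;
	memset(d->data[blk], 0, BLKSZ);
	d->reassigned++;
}

static void
disk_write(struct toy_disk *d, int blk, const char *in)
{
	memcpy(d->data[blk], in, BLKSZ);
}

/*
 * Read from a two-way mirror.  On a medium error: recover the data from
 * the other copy, reassign the bad block, then write the recovered data
 * back so both copies are redundant again.
 */
static bool
mirror_read(struct toy_disk *a, struct toy_disk *b, int blk, char *out)
{
	if (disk_read(a, blk, out))
		return (true);
	if (!disk_read(b, blk, out))
		return (false);	/* both copies lost: unrecoverable */
	disk_reassign(a, blk);
	disk_write(a, blk, out);
	return (true);
}
```

The last step matters: a controller that only returns the data from the good copy, without the rewrite, leaves the array running on a single copy of that block.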

-DG

David G. Lawrence
President
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
TeraSolutions, Inc. - http://www.terasolutions.com - (888) 346 7175
The FreeBSD Project - http://www.freebsd.org
Pave the road of life with opportunities.


Re: Bad disk or kernel (ATA Driver) problem? - SOLVED

2005-01-19 Thread David G. Lawrence
> I have had spectacularly bad luck with Maxtor SATA drives.  I've already 
> RMA'd 4 of 8 drives and have 2 more waiting to go back.  1 was DOA, the 
> rest failed completely while in operation (typically locking up the 
> machine).  These were all Maxtor DiamondMax Plus 9 6Y160M0 160GB Serial 
> ATA drives. After testing for power, cables, disk controller, 
> motherboard, and other potential reasons I've decided the drives were 
> just faulty and have switched to another vendor.

   Yeah, the 6Y series Maxtor is particularly bad. More than half of all
of them that I've had (a few dozen in total) have gone bad. The 7Y series
is slightly better, but still well short of an acceptable failure rate.
   On the other hand, the Hitachi 7K250 and 7K400 series SATA drives have
worked quite well for us, with failure rates in the low single digits.
Seagate SATA drives also seem to be reliable, although they don't perform
very well.

-DG

David G. Lawrence
President
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
TeraSolutions, Inc. - http://www.terasolutions.com - (888) 346 7175
The FreeBSD Project - http://www.freebsd.org
Pave the road of life with opportunities.


Re: serious networking (em) performance (ggate and NFS) problem

2004-11-25 Thread David G. Lawrence
> >>tests.  With the re driver, no change except placing a 100BT setup with
> >>no packet loss to a gigE setup (both linksys switches) will cause
> >>serious packet loss at 20Mbps data rates.  I have discovered the only
> >>way to get good performance with no packet loss was to
> >>
> >>1) Remove interrupt moderation
> >>2) defrag each mbuf that comes in to the driver.
> >
> >Sounds like you're bumping into a queue limit that is made worse by
> >interrupting less frequently, resulting in bursts of packets that are
> >relatively large, rather than a trickle of packets at a higher rate.
> >Perhaps a limit on the number of outstanding descriptors in the driver or
> >hardware and/or a limit in the netisr/ifqueue queue depth.  You might try
> >changing the default IFQ_MAXLEN from 50 to 128 to increase the size of the
> >ifnet and netisr queues.  You could also try setting net.isr.enable=1 to
> >enable direct dispatch, which in the in-bound direction would reduce the
> >number of context switches and queueing.  It sounds like the device driver
> >has a limit of 256 receive and transmit descriptors, which one supposes is
> >probably derived from the hardware limit, but I have no documentation on
> >hand so can't confirm that.
> >
> >It would be interesting on the send and receive sides to inspect the
> >counters for drops at various points in the network stack; i.e., are we
> >dropping packets at the ifq handoff because we're overfilling the
> >descriptors in the driver, are packets dropped on the inbound path going
> >into the netisr due to over-filling before the netisr is scheduled, etc. 
> >And, it's probably interesting to look at stats on filling the socket
> >buffers for the same reason: if bursts of packets come up the stack, the
> >socket buffers could well be being over-filled before the user thread can
> >run.
> 
> I think it's the tcp_output() path that overflows the transmit side of
> the card.  I take that from the better numbers when he defrags the packets.
> Once I catch up with my mails I start to put up the code I wrote over the
> last two weeks. :-)  You can call me Mr. TCP now. ;-)

   He was doing his test with NFS over TCP, right? ...That would be a single
connection, so how is it possible to 'overflow the transmit side of the
card'? The TCP window size will prevent more than 64KB from being outstanding.
Assuming standard-size ethernet frames, that would be a maximum of 45 packets
in flight at any time (65536/1460, rounded up, is 45), well below the 256
available transmit descriptors.
   It is also worth pointing out that 45 full-size packets take 540us to
transmit at gig-e speeds. Even when you add up typical switch latencies and
interrupt overhead and coalescing on both sides, it's hard to imagine that
the window size (bandwidth * delay) would be a significant limiting factor
across a gig-e LAN.
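The window arithmetic above, as a small sketch. It assumes a 1460-byte MSS and, like the 540us figure, counts 1500 bytes per frame while ignoring the Ethernet header, preamble and inter-frame gap. The helper names are hypothetical.

```c
#include <stdint.h>

/* Segments needed to fill a window, rounding up (65536/1460 -> 45). */
static uint32_t
segments_in_flight(uint32_t window_bytes, uint32_t mss)
{
	return ((window_bytes + mss - 1) / mss);
}

/* Serialization time in microseconds for `pkts` frames of `frame_bytes`. */
static double
wire_time_us(uint32_t pkts, uint32_t frame_bytes, double bits_per_sec)
{
	double bits = (double)pkts * frame_bytes * 8.0;

	return (bits * 1e6 / bits_per_sec);
}
```

For example, `segments_in_flight(65536, 1460)` gives 45, well under 256 descriptors, and `wire_time_us(45, 1500, 1e9)` gives 540.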
   I too am seeing low NFS performance (both TCP and UDP) with non-SMP
5.3, but on the same systems I can measure raw TCP performance (using
ttcp) of >850Mbps. It looks to me like there is something wrong with
NFS, perhaps caused by delays with scheduling nfsd?

-DG

David G. Lawrence
President
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
TeraSolutions, Inc. - http://www.terasolutions.com - (888) 346 7175
The FreeBSD Project - http://www.freebsd.org
Pave the road of life with opportunities.


Re: Problems reclaiming VM cache = XFree86 startup annoyance

2003-12-20 Thread David G. Lawrence
...
> As you can probably gather, all this manual intervention is a bit of a
> hassle.  So, my question is this: is there a way explicitly to force
> the kernel to flush its VM cache (to move it to "Free").  Failing
> that, are there any sysctls to tune to help alleviate the problem?
> The only sysctls I change in /etc/sysctl.conf are as follows:

   I don't know what is causing your problem, but 'cache' pages in FreeBSD
are free pages - they can be allocated directly in the page allocation code.
They only differ from "free" pages in that they contain cached file data.
   So the number of 'cache' vs. 'free' pages isn't the cause of the problem.
 
> net.inet.tcp.sendspace=65536

   You might want to use 65535 there instead to avoid the system having to
use large-window TCP extensions.
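The reason for 65535 is that the TCP header's window field is 16 bits wide, so anything larger can only be advertised with the RFC 1323 window-scale option. The sketch below computes the smallest shift a given buffer size would need. It only illustrates the field width and is not FreeBSD's logic for choosing which scale to request; wscale_needed() is hypothetical and the two constants are defined locally.

```c
#include <stdint.h>

#define TCP_MAXWIN		65535u	/* largest value of the 16-bit window field */
#define TCP_MAX_WINSHIFT	14	/* largest shift RFC 1323 permits */

/*
 * Smallest window-scale shift that lets a buffer of `bytes` be
 * advertised in full; 0 means no scaling option is needed.
 */
static int
wscale_needed(uint32_t bytes)
{
	int shift = 0;

	while (shift < TCP_MAX_WINSHIFT && (TCP_MAXWIN << shift) < bytes)
		shift++;
	return (shift);
}
```

Under this model, 65535 needs no scaling, while 65536 already needs a shift of 1.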

> kern.ipc.shmmax=67108864
> kern.ipc.shmall=32768
> 
> The latter two were in response to an installation message of a port I
> installed quite some time ago (xine, perhaps?).

   Uh, yeah. You might want to take that out and see if it affects the
problem. shmall is counted in pages, so a shmall of 32768 could potentially
result in up to 134MB of virtual memory being consumed, and in most FreeBSD
kernel configurations, this would cause it to run out.
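The 134MB figure comes from kern.ipc.shmall being counted in pages while kern.ipc.shmmax is in bytes. Here is a minimal sketch of the unit conversion, assuming 4KB pages (as on i386). The helper names are hypothetical.

```c
#include <stdint.h>

/* kern.ipc.shmall is in pages; convert it to bytes. */
static uint64_t
shmall_bytes(uint64_t shmall_pages, uint64_t page_size)
{
	return (shmall_pages * page_size);
}

/* How many maximum-size (shmmax-byte) segments the shmall limit admits. */
static uint64_t
max_full_segments(uint64_t shmall_pages, uint64_t page_size, uint64_t shmmax)
{
	return (shmall_bytes(shmall_pages, page_size) / shmmax);
}
```

With the quoted settings, `shmall_bytes(32768, 4096)` is 134217728 bytes, enough for two segments of the 67108864-byte shmmax.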

-DG

David G. Lawrence
President
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
TeraSolutions, Inc. - http://www.terasolutions.com - (888) 346 7175
The FreeBSD Project - http://www.freebsd.org
Pave the road of life with opportunities.


Re: Ok, are all the panics fixed now?

2003-08-27 Thread David G. Lawrence
> At 11:42 PM 27/08/2003 +0400, Maxim Konovalov wrote:
> 
> >It's simple: we need to backout all these untested MFCs.
> 
> I don't think people throw untested MFCs into STABLE. They do their best 
> effort and with that, there will still be some bugs.  It's that simple.

   Well, we all have different definitions for 'testing'.
   To me, 'testing' is something that takes anywhere from several weeks to
several months to do and often involves hundreds of machines, configurations,
and load mixes.
   Perhaps to some others, 'testing' means that the code compiles and
the system boots up without panicking.
   For -current, perhaps the fast-and-loose definition is appropriate.
   For -stable that is nearing a 'x.9' release, I think my definition (very
conservative) should be the standard. By that definition, the PAE changes
should be promptly removed (and should have been when the first signs of
trouble showed up).

-DG

David G. Lawrence
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
TeraSolutions, Inc. - http://www.terasolutions.com - (888) 346 7175
The FreeBSD Project - http://www.freebsd.org
Pave the road of life with opportunities.


Re: Please confirm (conf#3cf11a7145595546740c6064dbc27044)

2003-06-30 Thread David G. Lawrence
>> I hate to do it, but I am leaning towards a confirmation system as well.
> 
>confirmation requests not going through because the other person uses
>a confirmation system as well sounds like a lot of fun. 
>
>Not that I'd have a better solution ...

   ASK deals with that as well. If the other confirmation system includes
the original email in its confirmation request, ASK will let that request
through because it matches a special phrase in the email header or in your
signature. ASK also limits the number of confirmations sent to a particular
address to prevent loops like this.
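The two loop-avoidance rules described above can be sketched in C roughly as follows. This is not ASK's code (its internals are not shown in this thread); the special phrase, the per-address cap of 3, the fixed-size table, and the names `classify()` and `may_send_confirmation()` are all hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>

/* Hypothetical limits for this sketch; ASK's real values are not known. */
#define MAX_TRACKED  64
#define MAX_CONFIRMS 3

struct confirm_count {
    char addr[128];   /* address we have sent confirmations to */
    int  sent;        /* how many confirmations went to it */
};

static struct confirm_count table[MAX_TRACKED];
static int ntracked;

enum action { DELIVER, SEND_CONFIRMATION, QUEUE_SILENTLY };

/* True if the message contains the user's special phrase, e.g. because
 * another confirmation system quoted our original email (and signature). */
bool has_special_phrase(const char *msg, const char *phrase)
{
    return strstr(msg, phrase) != NULL;
}

/* True (and counted) if one more confirmation may go to addr; false once
 * MAX_CONFIRMS have been sent, which breaks confirmation loops.
 * Addresses are compared case-insensitively for simplicity. */
bool may_send_confirmation(const char *addr)
{
    for (int i = 0; i < ntracked; i++) {
        if (strcasecmp(table[i].addr, addr) == 0) {
            if (table[i].sent >= MAX_CONFIRMS)
                return false;
            table[i].sent++;
            return true;
        }
    }
    if (ntracked == MAX_TRACKED)
        return false;               /* table full: send nothing */
    snprintf(table[ntracked].addr, sizeof table[ntracked].addr, "%s", addr);
    table[ntracked].sent = 1;
    ntracked++;
    return true;
}

/* Decide what to do with mail from a sender not on the whitelist. */
enum action classify(const char *from, const char *msg, const char *phrase)
{
    if (has_special_phrase(msg, phrase))
        return DELIVER;
    if (may_send_confirmation(from))
        return SEND_CONFIRMATION;
    return QUEUE_SILENTLY;          /* keep the mail, but stop replying */
}
```

With this sketch, a confirmation request that quotes the special phrase is delivered, while a fourth unconfirmed message from the same address is only queued. A real implementation would also persist and expire the counters.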
   I think by now we've strayed far away from freebsd-stable. Information
on ASK can be found at http://www.paganini.net/ask - so please go there if
you want to know more about it.
   It's also worth noting a new anti-spam scheme I heard of recently that
operates at the SMTP level. If the mail system hasn't seen email from your
server IP + email address before, it defers reception of the email for a few
hours. This stops drive-by spamming, since drive-by spammers don't try to
re-deliver after temporary failures.
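The SMTP-level scheme described above is usually called greylisting. Below is a minimal C sketch, assuming the key is the (server IP, sender address) pair as described here and a hypothetical three-hour delay. The table size, the fail-open behaviour when the table is full, and the name `grey_check()` are illustrative choices, not taken from any particular mail server. Common implementations also add the recipient address to the key.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

#define GREY_MAX_ENTRIES 256
#define GREY_DELAY_SECS  (3 * 60 * 60)  /* "a few hours": hypothetical 3h */

struct grey_entry {
    char   ip[64];        /* connecting server's IP address */
    char   sender[256];   /* envelope sender (MAIL FROM) */
    time_t first_seen;    /* when this pair was first deferred */
};

static struct grey_entry grey_db[GREY_MAX_ENTRIES];
static int grey_count;

enum grey_verdict { GREY_ACCEPT, GREY_DEFER };

static struct grey_entry *grey_lookup(const char *ip, const char *sender)
{
    for (int i = 0; i < grey_count; i++)
        if (strcmp(grey_db[i].ip, ip) == 0 &&
            strcmp(grey_db[i].sender, sender) == 0)
            return &grey_db[i];
    return NULL;
}

/* GREY_DEFER means the SMTP server should answer with a 4xx temporary
 * failure (e.g. 451); a well-behaved MTA retries later, while a drive-by
 * spammer typically does not. */
enum grey_verdict grey_check(const char *ip, const char *sender, time_t now)
{
    struct grey_entry *e = grey_lookup(ip, sender);

    if (e == NULL) {
        if (grey_count == GREY_MAX_ENTRIES)
            return GREY_ACCEPT;     /* table full: fail open, don't block mail */
        e = &grey_db[grey_count++];
        snprintf(e->ip, sizeof e->ip, "%s", ip);
        snprintf(e->sender, sizeof e->sender, "%s", sender);
        e->first_seen = now;
        return GREY_DEFER;          /* first contact: defer */
    }
    /* Known pair: accept only once the delay has elapsed since first contact. */
    return difftime(now, e->first_seen) >= GREY_DELAY_SECS ? GREY_ACCEPT
                                                           : GREY_DEFER;
}
```

The current time is passed in rather than read inside the function, so the deferral window can be checked deterministically. A production version would also expire old entries.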

-DG

David G. Lawrence
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
TeraSolutions, Inc. - http://www.terasolutions.com - (888) 346 7175
The FreeBSD Project - http://www.freebsd.org
Pave the road of life with opportunities.


Re: Please confirm (conf#3cf11a7145595546740c6064dbc27044)

2003-06-30 Thread David G. Lawrence
>David G. Lawrence wrote:
>>Michael Sierchio wrote:
>[ ... ]
>>>No, the real issue is that there are scads of virii/worms in the wild
>>>which forge message envelope senders.  It is absurd to send
>>>autoresponder messages to a mailing list.
>>
>>   ASK doesn't normally send autoresponder messages to mailing lists, and
>>of course I have freebsd-stable in my whitelist, but this particular forged
>>piece of spam managed to not match my whitelist entry and also didn't look 
>>like it was from a mailing list.
>
>This ASK autoresponder should pay attention to the Precedence: header and 
>not generate mail in response to 'bulk' or 'list' traffic types.

   It does. The email that ASK responded to did not come from the
freebsd-stable mailing list.
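The Precedence: rule suggested in the quoted message can be sketched as a small C check. It is a hypothetical helper, not ASK's code. It compares only the first token of the header value, case-insensitively, against 'bulk' and 'list'. As the reply points out, a forged message that carries no such header passes this check, so the header alone cannot stop every misdirected confirmation.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>
#include <strings.h>

/* Returns true if an autoresponder may reply to a message whose
 * Precedence: header value is `prec` (NULL if the header is absent). */
bool may_autorespond(const char *prec)
{
    if (prec == NULL)
        return true;                 /* no header: nothing to go on */
    while (isspace((unsigned char)*prec))
        prec++;                      /* skip whitespace after the colon */

    /* Measure the first token of the header value. */
    size_t len = 0;
    while (prec[len] != '\0' && !isspace((unsigned char)prec[len]))
        len++;

    /* Never auto-reply to bulk or mailing-list traffic. */
    static const char *const no_reply[] = { "bulk", "list" };
    for (size_t i = 0; i < sizeof no_reply / sizeof no_reply[0]; i++)
        if (len == strlen(no_reply[i]) &&
            strncasecmp(prec, no_reply[i], len) == 0)
            return false;
    return true;
}
```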

-DG

David G. Lawrence
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
TeraSolutions, Inc. - http://www.terasolutions.com - (888) 346 7175
The FreeBSD Project - http://www.freebsd.org
Pave the road of life with opportunities.


Re: Please confirm (conf#3cf11a7145595546740c6064dbc27044)

2003-06-30 Thread David G. Lawrence
>What would it do with my hotel reservation confirmation email?

   It stores all messages waiting for confirmation in a queue. It doesn't
throw anything away. If I'm expecting something like a hotel reservation
confirmation, then I can look in the queue for it. It is really a small
price to pay to avoid the ~400 spams that I get each day.
   I never said this scheme was perfect. For me, spam has reached crisis
proportions - and I only get about 400/day. I know some people who get
over a thousand a day. It all depends on how long and how heavily you've been
using a particular email address. Aside from changing email addresses or
shutting off email entirely, there are few effective alternatives. I despise
third-party blacklists, especially SPEWS, which has victimized me several
times.

-DG

David G. Lawrence
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
TeraSolutions, Inc. - http://www.terasolutions.com - (888) 346 7175
The FreeBSD Project - http://www.freebsd.org
Pave the road of life with opportunities.