Bill Paul wrote:

Yes, but did you do it with a Smartbits though, or just with a couple of
other FreeBSD machines? Unfortunately, a typical FreeBSD system on its own
won't generate frames anywhere near fast enough to really torture test a
gigE interface. At best you might hit around 200000 to 300000 frames/sec.
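
For scale, the theoretical frame-rate ceiling of a gigabit link can be worked out from the frame size plus the fixed per-frame wire overhead (preamble, start-of-frame delimiter, and inter-frame gap, 20 bytes in total). The sketch below is only that arithmetic; the function name max_frames_per_sec is made up for illustration and is not part of any driver or test tool.

```c
#include <stdio.h>

/*
 * Bytes on the wire per frame beyond the frame itself:
 * 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap.
 */
#define WIRE_OVERHEAD_BYTES 20

/*
 * Hypothetical helper: theoretical maximum frames per second for a link
 * speed in bits/sec and an Ethernet frame size in bytes (FCS included).
 */
static unsigned long
max_frames_per_sec(unsigned long long link_bps, unsigned int frame_bytes)
{
	unsigned long long bits_per_frame;

	bits_per_frame = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8ULL;
	return ((unsigned long)(link_bps / bits_per_frame));
}
```

Calling max_frames_per_sec(1000000000ULL, 64) gives 1488095 for minimum-size frames, and max_frames_per_sec(1000000000ULL, 1518) gives 81274 for full-size frames. A sender topping out at 200000 to 300000 frames/sec therefore reaches only a fraction of the small-frame line rate.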


Yes, it was some model of a Smartbits.

A given Smartbits system doesn't need special hardware to run a
bi-directional forwarding test. If you're using SmartApps, you just
have to click the "Bi-Directional" checkbox on the main setup window.
(At least, that's how it is with the ones at work.)

Didn't know the details here.


Being able to flood the link with the Smartbits is also handy for
provoking error conditions (RX overruns and TX underruns, mostly), which
shows you how well (or not) the driver's error recovery works.

Yup, tested that =-)


In the past I considered creating a kernel module that would grab hold
of a given interface and blast traffic through it with as little software
overhead as possible (e.g. sending the same mbuf over and over) in order
to create my own test system that could hopefully rival the Smartbits,
but I never got around to it. I'm not sure that it's really possible
without custom hardware though.


I tried this.  It was too crude.


Prior to the INTR_FAST change, the machine would live-lock.  Now it
survives, stays responsive, and drops packets as needed.


The wide range of failures people seem to be reporting might mean that
the driver code itself is not the issue, but that there's an interaction
with some other part of the system. This means torture testing the driver
itself might not be enough to provoke the problems.


It's indeed a complex problem, but I haven't ruled out the driver.
Shifting timing around in innocent ways seems to be the key.

Unfortunately, nobody seems to have nailed down a good test case for
any of these failures. I strongly suspect people are leaving out details
which seem obvious and/or trivial to them, but which are critical to
finding the problem. ("Oh, I was using SCHED_ULE... was I not supposed
to do that? Tee-hee." *curls finger in blonde hair*)

The survey that Kris and I sent out specifically asked about ULE, as
well as other 'deceptively obvious' attributes.


Another thing that might be handy is improving the watchdog timeout
message so that it dumps the state of the ICR and ICM registers (and
maybe some other interesting driver and/or device state). The timeout
implies no interrupts were delivered for a Long Time (tm). If the
ICM register indicates interrupts have been masked, then that means
em_intr_fast() was triggered by an interrupt and it scheduled work,
but that work never executed. If that really is what happened, then
I can understand the watchdog error occurring. If that's _not_ what
happened, then something else is screwed up.

Yes, instrumenting em_watchdog is on my TODO list, and will hopefully
reveal a lot more information here.
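
As a rough illustration of the diagnosis described above, the following standalone sketch classifies a watchdog timeout from two register snapshots and formats a message containing both. Everything in it is hypothetical: struct wd_snapshot, wd_classify(), and wd_format() are invented names, the snapshots are plain integers rather than real register reads, and it assumes the mask snapshot reads as zero when all interrupts are masked. It is not the em(4) driver's code.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical verdicts for a watchdog timeout. */
enum wd_verdict {
	WD_DEFERRED_WORK_STALLED,	/* handler ran and masked, work never ran */
	WD_OTHER_FAILURE		/* interrupts still enabled: look elsewhere */
};

/* Hypothetical snapshot of device state taken when the watchdog fires. */
struct wd_snapshot {
	uint32_t cause;		/* pending interrupt causes */
	uint32_t enabled;	/* enabled interrupt bits; assumed 0 = all masked */
};

/*
 * If everything is masked, the fast handler presumably ran and scheduled
 * work that never executed. Otherwise the cause lies somewhere else.
 */
static enum wd_verdict
wd_classify(const struct wd_snapshot *s)
{
	return (s->enabled == 0 ? WD_DEFERRED_WORK_STALLED : WD_OTHER_FAILURE);
}

/* Format a diagnostic line; returns what snprintf() returns. */
static int
wd_format(const struct wd_snapshot *s, char *buf, size_t len)
{
	const char *why;

	why = (wd_classify(s) == WD_DEFERRED_WORK_STALLED) ?
	    "interrupts masked, deferred work never ran" :
	    "interrupts not masked, cause unknown";
	return (snprintf(buf, len,
	    "watchdog timeout: cause=0x%08x enabled=0x%08x (%s)",
	    (unsigned int)s->cause, (unsigned int)s->enabled, why));
}
```

Keeping the classification separate from the formatting means the same check could feed a counter or a debugger hook as well as the log message.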

Scott

_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable