Oliver Lehmann wrote:
Hi,

I have problems with my 3ware controller. Under heavy I/O load (e.g.
running 40 port builds over the day with tinderbox, which involves
un-tarring a whole FreeBSD tree 40 times), my system hangs with the
well-known

swap_pager: indefinite wait buffer: bufobj: 0, blkno: 2, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 2, size: 4096

error. I've opened a ticket at 3ware and after half a month of
dummy tests (are your drives fine, can you run a stress test), it
looks like I was redirected to someone from 2nd-level support, and he
told me:

  There are 2 things that you can try,
  1, disable apic in your bootloader.conf file, or RMA the controller.

  The error that you have is generally caused by an interrupt problem,
  defective backplane, bad drive or bad controller.

and after I told him that I intend to use the 2 CPUs I have and not
fall back to one CPU forever, he responded:

  Yes I do understand about disabling APIC, but the feature is sometimes
  not stable in all dual proc systems.  There are many variables, the
  CPU's have to be matched down to the Lot #, the motherboard must have a
  good design and the kernel supporting APIC must be stable. But, it is a
  good test to see if it is software or hardware.

So what I did was compile a kernel w/o apic/smp, and I've been running
this configuration for 3 days now, stressing the system w/o running into
the swap_pager problem. Can it still be a controller problem, or is it
more likely a problem with FreeBSD's smp/apic implementation or the board
I'm using (Intel L440GX)?

I'm asking because I'm not sure which problem it is now, and before
telling 3ware and having them respond "ok it is a FreeBSD problem"
or "ok it is a board problem", I'd like to know what can be the case here.

(please keep me CCed, I'm not subscribed to smp@)

Further information (and the history) on this topic can be found here
(and following):

http://lists.freebsd.org/pipermail/freebsd-stable/2008-September/045500.html



The probability that it's a problem in the generic interrupt/APIC code in FreeBSD is low. That code has matured quite well over the last 5 years, and it is very solid for just about every other hardware configuration out there. I'd suspect the following things in the following order:

1. Driver bug. The driver might be losing an interrupt, or might be deadlocking due to coding/design problems.
2. Defective controller.
3. Buggy firmware on the controller.  FreeBSD does tend to push I/O
controllers a lot harder than other OS's, resulting in strange bugs
sometimes being found.
4. Defective motherboard.

The fact that it's running fine with SMP/APIC disabled could easily mean
that it's not taking as high a load, and is thus avoiding problems.
It could also mean that latent bugs in the driver are not being exposed.
I don't have a lot of time to spend debugging this, but I'd suggest that
you either take up AMCC's offer to RMA the board, or put a spare ATA
drive in the chassis and set it up as a dump partition, then get a
crashdump of the system when it gets into this state.
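
For the dump setup, a minimal sketch of the rc.conf side might look like
the following. The device name ad4s1b is only a placeholder for whatever
swap-type partition the spare ATA drive ends up with, and /var/crash is
the stock default directory, not anything specific to this machine.

```sh
# /etc/rc.conf fragment (rc.conf is read as sh, so these are plain assignments)
# ASSUMPTION: /dev/ad4s1b is a hypothetical name for the partition on the
# spare ATA drive; substitute the real device. It should be at least as
# large as physical RAM for a full dump.
dumpdev="/dev/ad4s1b"
# Directory where savecore(8) stores vmcore.N on the next boot.
dumpdir="/var/crash"
```

Because the box hangs rather than panics, a dump will probably have to be
forced by hand. Assuming the kernel is built with "options KDB" and
"options DDB", breaking into the debugger on the console and typing
"call doadump" at the db> prompt should write the dump to that partition;
savecore(8) then picks it up at the next boot.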

Scott

