On Mon, Mar 29, 2010 at 09:21:42PM +0200, Attila Nagy wrote:
> Pyun YongHyeon wrote:
> > On Mon, Mar 29, 2010 at 12:57:59PM +0200, Attila Nagy wrote:
> >
> >> Hi,
> >>
> >> Michael Loftis wrote:
> >>
> >>> --On Thursday, March 25, 2010 3:22 PM +0100 Attila Nagy <b...@fsn.hu>
> >>> wrote:
> >>>
> >>> <...>
> >>>
> >>>> Both unbound and python accept DNS requests, and it seems when 25%
> >>>> interrupt happens, only unbound is in *udp state, where it is 50%, both
> >>>> programs are in that state.
> >>>>
> >>> Try turning off hardware TSO/checksum offload if it's available on your
> >>> chipset? ifconfig <interface> -rxcsum -txcsum -tso -- I'm only using
> >>> nfe chips right now, but w/ the TSO/CSUM on they lock up constantly
> >>> under high load. We're pretty sure it's mostly the nfe driver, or the
> >>> chips themselves, but have never ruled out some generic 8.x hardware
> >>> offload issues.
> >>>
> >> Bingo, this solved the problem. The current uptime nears four days.
> >> Previously I couldn't go further than a day.
> >>
> >> The machine gets very light TCP load (and other machines which get work
> >> well), so I guess it's UDP RX or TX checksum related.
> >>
> >
> > Hmm, this is an unexpected result. Since you're using UDP, TSO is not
> > involved in this issue. Because you disabled RX/TX checksum
> > offloading, could you check how many 'bad checksum' and
> > 'no checksum' counts you have from netstat(1)?
> > To narrow down which side of checksum offloading causes the issue,
> > would you just disable one side at a time? For instance, disable TX
> > checksum offloading with RX checksum offloading enabled and see how
> > bce(4) works.
> > # ifconfig bce0 -txcsum rxcsum
> > If that shows the same issue, try disabling RX checksum offloading
> > but enabling TX checksum offloading.
> > # ifconfig bce0 txcsum -rxcsum
> >
> It's interesting. During the day, I've disabled only HW checksumming and
> left TSO enabled. It couldn't run more than a few hours.
> I have disabled TSO again to see what happens.
>
> BTW, of course there is TCP traffic on that interface (DNS is also
> available on TCP), maybe this causes the problem.

The only guess I can think of at the moment is incorrect use of
bus_dma(9) in the TX path, but I'm not sure this is related to the
issue you're seeing. Would you try the experimental patch at the
following URL?
http://people.freebsd.org/~yongari/bce/bce.20100305.diff
Please make sure to back up your old bce(4) driver before applying
the patch. I didn't see anything abnormal in testing, but the patch
has not been stressed much.
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
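
The checksum-offload bisection and the netstat(1) check suggested earlier in the thread could be scripted roughly as follows. This is only a sketch: the helper names offload_phase_cmd and udp_csum_counters are made up for illustration, the first helper only prints the ifconfig command instead of running it, and the second assumes that the UDP statistics contain lines worded "with bad checksum" and "with no checksum", which may differ between releases.

```shell
#!/bin/sh
# Sketch only: both helper names below are hypothetical, not part of FreeBSD.

# Print (do not run) the ifconfig command for one step of the bisection
# described in the thread: turn off one checksum-offload direction while
# leaving the other enabled.
offload_phase_cmd() {
    iface=$1
    phase=$2
    case "$phase" in
        tx-off) echo "ifconfig $iface -txcsum rxcsum" ;;
        rx-off) echo "ifconfig $iface txcsum -rxcsum" ;;
        *)      echo "unknown phase: $phase" >&2; return 1 ;;
    esac
}

# Read UDP statistics text on stdin and print the two counters of interest.
# Assumption: the counters appear on lines worded "N with bad checksum" and
# "N with no checksum"; a counter that is not found is reported as "?".
udp_csum_counters() {
    awk '
        BEGIN { bad = "?"; none = "?" }
        /with bad checksum/ { bad = $1 }
        /with no checksum/  { none = $1 }
        END { printf "bad=%s none=%s\n", bad, none }
    '
}
```

For example, `offload_phase_cmd bce0 tx-off` prints the first command from the thread, and `netstat -s -p udp | udp_csum_counters` gives a one-line summary that can be compared before and after each step.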