date:20110209

Re: Slow Intel 10GbE CX4 adapter behaviour

2011-02-09 Thread rihad

Problem solved, I'm so embarrassed :) The issue on 7.2 mentioned above
with ixgbe (tons of "fragmentation failed" errors) was real. The issue
in 8.3-RC3 was because dummynet wasn't being loaded at all, so no
traffic could pass on it, despite dummynet_load="YES" being set in
/boot/loader.conf. So I turned it on in /etc/rc.conf with
dummynet_enable="YES" and loaded it with "kldload dummynet" to avoid
a reboot. Works like a charm so far. Thanks to all!

___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org

Re: igb driver RX (was TX) hangs when out of mbuf clusters

2011-02-09 Thread Michael Tuexen

Hi Jack,

I could recreate the problem. When the problem occurs, we see

rx_nxt_check = n
rx_nxt_refresh = n + 1

(This was also reported in a mail from Karim)

This means that the *whole* receive ring has no buffers anymore. This can
occur if, for some amount of time, no clusters are available.

Now outside of the driver, at some point of time, clusters are freed.
I don't think that igb_refresh_mbufs() gets called, since it only gets
called from igb_rxeof(), which gets called when a packet has been received,
which can not happen since the receive ring is empty. So how can the driver
know? I have no idea. Maybe we can periodically check for such an event
and call igb_refresh_mbufs().

Does this make sense to you?

Best regards
Michael
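
The stall described above can be illustrated with a small, self-contained C model. This is only a sketch and not driver code: the ring layout, the names rx_ring, ring_refresh, ring_is_starved and ring_watchdog, and the clusters_available counter are all assumptions made up for illustration. Only the idea of periodically checking for a starved ring and calling the refresh routine comes from the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8	/* hypothetical; the thread mentions 1024 descriptors */

struct rx_ring {
	bool	has_buf[RING_SIZE];	/* does the slot hold a receive buffer? */
	size_t	next_to_check;		/* next slot the RX path would examine */
	size_t	next_to_refresh;	/* next slot that needs a new buffer */
};

/* Hypothetical stand-in for the system-wide cluster pool. */
int clusters_available;

/* Refill empty slots while clusters last; returns how many were refilled. */
size_t
ring_refresh(struct rx_ring *r)
{
	size_t filled = 0;

	while (!r->has_buf[r->next_to_refresh] && clusters_available > 0) {
		r->has_buf[r->next_to_refresh] = true;
		clusters_available--;
		r->next_to_refresh = (r->next_to_refresh + 1) % RING_SIZE;
		filled++;
	}
	return (filled);
}

/* True when no slot holds a buffer, so no packet can arrive to trigger a refresh. */
bool
ring_is_starved(const struct rx_ring *r)
{
	for (size_t i = 0; i < RING_SIZE; i++)
		if (r->has_buf[i])
			return (false);
	return (true);
}

/* Periodic check (e.g. from a timer): refresh only if the ring is starved. */
size_t
ring_watchdog(struct rx_ring *r)
{
	assert(r != NULL);
	return (ring_is_starved(r) ? ring_refresh(r) : 0);
}
```

In this model the receive path never runs while the ring is starved, so only the watchdog can notice that clusters have become available again and restart reception.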


On Feb 9, 2011, at 8:32 AM, Jack Vogel wrote:

> Hmmm, well so much for that theory :)
>
> Jack
>
>
> On Tue, Feb 8, 2011 at 4:06 PM, Karim Fodil-Lemelin
> <fodillemlinka...@gmail.com> wrote:
>
>
> 2011/2/8 Jack Vogel <jfvo...@gmail.com>
>
>
> I have been following this, and thinking about it. I still am working
> from a theoretical standpoint, but based on a patch I got quite a long
> time back and never quite grokked, I believe now that I might have a
> solution.
>
> The original PR and patch was kern/150516 from Beezar Liu. I was never
> quite comfortable with the code changes, nor convinced that it was a
> real issue and not a misunderstanding. However, I think now that this
> very report might be behind what we are seeing today. I have a slightly
> different approach to solving it; of course it remains to be seen if it
> handles it properly.
>
> Please try the patch I've attached, I'm open to further correction or
> polishing of the changes. And thanks to Beezar for his original report
> and changes. This is not for em, but if this eliminates the problem
> it's clearly needed in all drivers.
>
> Jack
>
>
> Hi Jack,
>
> Thanks for your help. I tried your patch and it didn't work, so I added
> a couple of printfs to see if the added code was getting hit:
 
> --- a/freebsd/sys/dev/e1000/if_igb.c
> +++ b/freebsd/sys/dev/e1000/if_igb.c
> @@ -612,7 +612,7 @@ igb_attach(device_t dev)
>  	    device_get_nameunit(dev));
>  
>  	INIT_DEBUGOUT("igb_attach: end");
> -
> +	printf("this driver has a patch from Jack Vogel\n");
>  	return (0);
>  
>  err_late:
> @@ -4131,6 +4131,7 @@ igb_rxeof(struct igb_queue *que, int count, int *done)
>  		struct mbuf		*sendmp, *mh, *mp;
>  		struct igb_rx_buf	*rxbuf;
>  		u16			hlen, plen, hdr, vtag;
> +		int			commit;
>  		bool			eop = FALSE;
>  
>  		cur = &rxr->rx_base[i];
> @@ -4255,10 +4256,23 @@ next_desc:
>  		bus_dmamap_sync(rxr->rxdma.dma_tag, rxr->rxdma.dma_map,
>  		    BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
>  
> +		commit = i; /* capture the old index */
> +
>  		/* Advance our pointers to the next descriptor. */
>  		if (++i == adapter->num_rx_desc)
>  			i = 0;
>  		/*
> +		** Sanity test for ring full, if this
> +		** happens we need to refresh immediately
> +		** or refresh may deadlock.
> +		*/
> +		if (i == rxr->next_to_refresh) {
> +			igb_refresh_mbufs(rxr, commit);
> +			printf("igb_refresh_mbufs called with commit %d\n", commit);
> +			processed = 0;
> +		}
> +
> +		/*
>  		** Send to the stack or LRO
>  		*/
>  		if (sendmp != NULL) {
 
> Here are the results:
>
> # dmesg | grep Vogel
> this driver has a patch from Jack Vogel
> this driver has a patch from Jack Vogel
>
> # netstat -m
> 60453/52707/113160 mbufs in use (current/cache/total)
> 48416/51584/10/10 mbuf clusters in use (current/cache/total/max)
> 2894/690 mbuf+clusters out of packet secondary zone in use (current/cache)
> 11946/854/12800/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
> 0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
> 0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
> 164834K/119760K/284595K bytes allocated to network (current/cache/total)
> 0/339/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
> 0/4/6656 sfbufs in use (current/peak/max)
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 0 requests for I/O initiated by sendfile
> 0 calls to protocol drain routines
> # dmesg | grep commit
>
> At this point RX has hung.
>
> Somehow the check (i == rxr->next_to_refresh) is never true in this case.
> Also, I did read kern/150516 and couldn't wrap my head around the patch
> for the em driver that Beezar Liu suggested.
>
> Regards,
>
> Karim.
 
 


Re: Slow Intel 10GbE CX4 adapter behaviour

2011-02-09 Thread Sergey Kandaurov

On 9 February 2011 12:37, rihad <ri...@mail.ru> wrote:
> Problem solved, I'm so embarrassed :) The issue on 7.2 mentioned above
> with ixgbe (tons of "fragmentation failed" errors) was real. The issue
> in 8.3-RC3 was because dummynet wasn't being loaded at all, so no
> traffic could pass on it, despite dummynet_load="YES" being set in
> /boot/loader.conf. So I turned it on in /etc/rc.conf with
> dummynet_enable="YES" and loaded it with "kldload dummynet" to avoid
> a reboot. Works like a charm so far. Thanks to all!

Looks like loading dummynet.ko via /boot/loader.conf doesn't work because
dummynet.ko depends on dummynet.ko, but of a different version.

There are even stranger things:
1) dummynet.ko declares itself as version 1:
/sys/netinet/ipfw/ip_dummynet.c: MODULE_VERSION(dummynet, 1);
2) dummynet.ko compiles the various schedulers into itself: fifo, prio, rr, etc.;
3) these schedulers presumably think they are compiled standalone, so they
explicitly and strictly depend on dummynet version 3 (why?):
/sys/netinet/ipfw/dn_sched.h: MODULE_DEPEND(name, dummynet, 3, 3, 3);

[* That makes the loader fail with "dummynet: loading required module
'dummynet'" and, if dummynet.ko is loaded manually at the loader prompt,
with "module 'dummynet' exists but with wrong version".]

This should fix the problem; rebuilding only dummynet should be enough.
%%%
Index: /sys/netinet/ipfw/ip_dummynet.c
===
--- /sys/netinet/ipfw/ip_dummynet.c (revision 218026)
+++ /sys/netinet/ipfw/ip_dummynet.c (working copy)
@@ -2294,7 +2294,7 @@
 #define	DN_MODEV_ORD	(SI_ORDER_ANY - 128) /* after ipfw */
 DECLARE_MODULE(dummynet, dummynet_mod, DN_SI_SUB, DN_MODEV_ORD);
 MODULE_DEPEND(dummynet, ipfw, 2, 2, 2);
-MODULE_VERSION(dummynet, 1);
+MODULE_VERSION(dummynet, 3);

 /*
  * Starting up. Done in order after dummynet_modevent() has been called.
%%%
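
The version mismatch described above can be modelled in a few lines of C. This is a sketch under a stated assumption: that a dependency declared as (min, preferred, max) is satisfied only when the version the provider declares lies within [min, max]. The struct dep_range and the function version_satisfies are hypothetical names for illustration and are not the kernel linker's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Version range in the spirit of MODULE_DEPEND(name, dep, min, pref, max). */
struct dep_range {
	int	min;
	int	preferred;
	int	max;
};

/* True if a module declaring version `provided` satisfies the range. */
bool
version_satisfies(const struct dep_range *d, int provided)
{
	assert(d != NULL && d->min <= d->max);
	return (provided >= d->min && provided <= d->max);
}
```

With the range 3, 3, 3 from dn_sched.h, a module that declares version 1 is rejected and one that declares version 3 is accepted, which is why bumping MODULE_VERSION to 3 is expected to resolve the load failure.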

-- 
wbr,
pluknet

Re: Slow Intel 10GbE CX4 adapter behaviour

2011-02-09 Thread rihad


On 02/09/2011 05:47 PM, Sergey Kandaurov wrote:
> On 9 February 2011 12:37, rihad <ri...@mail.ru> wrote:
> > Problem solved, I'm so embarrassed :) The issue on 7.2 mentioned above
> > with ixgbe (tons of "fragmentation failed" errors) was real. The issue
> > in 8.3-RC3 was because dummynet wasn't being loaded at all, so no
> > traffic could pass on it, despite dummynet_load="YES" being set in
> > /boot/loader.conf. So I turned it on in /etc/rc.conf with
> > dummynet_enable="YES" and loaded it with "kldload dummynet" to avoid
> > a reboot. Works like a charm so far. Thanks to all!
>
> Looks like loading dummynet.ko via /boot/loader.conf doesn't work because
> dummynet.ko depends on dummynet.ko, but of a different version.

Would dummynet_enable=YES in rc.conf still work? We haven't yet had a 
chance to reboot to test that.


Re: Slow Intel 10GbE CX4 adapter behaviour

2011-02-09 Thread Sergey Kandaurov

On 9 February 2011 18:15, rihad <ri...@mail.ru> wrote:
> On 02/09/2011 05:47 PM, Sergey Kandaurov wrote:
> > On 9 February 2011 12:37, rihad <ri...@mail.ru> wrote:
> > > Problem solved, I'm so embarrassed :) The issue on 7.2 mentioned above
> > > with ixgbe (tons of "fragmentation failed" errors) was real. The issue
> > > in 8.3-RC3 was because dummynet wasn't being loaded at all, so no
> > > traffic could pass on it, despite dummynet_load="YES" being set in
> > > /boot/loader.conf. So I turned it on in /etc/rc.conf with
> > > dummynet_enable="YES" and loaded it with "kldload dummynet" to avoid
> > > a reboot. Works like a charm so far. Thanks to all!
> >
> > Looks like loading dummynet.ko via /boot/loader.conf doesn't work because
> > dummynet.ko depends on dummynet.ko, but of a different version.
>
> Would dummynet_enable=YES in rc.conf still work? We haven't yet had a
> chance to reboot to test that.


Yes, it would.
Note that it depends on firewall_enable=YES also present in rc.conf.

-- 
wbr,
pluknet

route messages from NDP

2011-02-09 Thread Sergey Matveychuk


Hello.

In my routing table I see entries created after Neighbor Discovery
Protocol processing:

...
2a02:6b8:0:401:51:4809:8158:1dcd  00:22:fb:3d:82:fe   UHLW  vlan438
...

I'd like to catch them via a routing socket when they appear.

First, I add a static entry:
ndp -s 2a02:6b8:0:403::1:1 00:0e:0c:09:2e:7b

and look at the "route -n monitor" output:
got message of size 240 on Wed Feb  9 17:26:50 2011
RTM_ADD: Add Route: len 240, pid: 82741, seq 2, errno 0, flags:<HOST,DONE,STATIC>
locks:  inits:
sockaddrs: <DST,GATEWAY>
 2a02:6b8:0:403::1:1 0.e.c.9.2e.7b

We have two sections here: DST and GATEWAY. DST is an IPv6 address
(sa_family == AF_INET6) and GATEWAY is a MAC address (sa_family == AF_LINK).
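
For readers who want to pick these sections apart in code, here is a rough C sketch of walking the sockaddrs that follow a routing message header. It assumes the usual BSD layout, in which the address bitmask says which sockaddrs are present, they appear in bit order, and each one is padded to a multiple of sizeof(long). The RTA_* values are defined locally so the sketch builds anywhere, and the helper names sa_rounded_len and walk_sockaddrs are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Address-presence bits, defined locally for this sketch. */
enum {
	RTA_DST = 0x1, RTA_GATEWAY = 0x2, RTA_NETMASK = 0x4,
	RTA_GENMASK = 0x8, RTA_IFP = 0x10, RTA_IFA = 0x20
};
#define RTAX_MAX 8

/* Round a sockaddr length up to a multiple of sizeof(long); 0 counts as sizeof(long). */
size_t
sa_rounded_len(uint8_t sa_len)
{
	if (sa_len == 0)
		return (sizeof(long));
	return (1 + (((size_t)sa_len - 1) | (sizeof(long) - 1)));
}

/*
 * Record where each present sockaddr starts inside the payload that follows
 * the message header. offsets[i] is (size_t)-1 when bit i is not set.
 * Returns the number of bytes consumed, or 0 if the payload is too short.
 */
size_t
walk_sockaddrs(const uint8_t *payload, size_t len, int addrs,
    size_t offsets[RTAX_MAX])
{
	size_t pos = 0;

	assert(payload != NULL && offsets != NULL);
	for (int i = 0; i < RTAX_MAX; i++)
		offsets[i] = (size_t)-1;
	for (int i = 0; i < RTAX_MAX; i++) {
		if ((addrs & (1 << i)) == 0)
			continue;
		if (pos >= len)
			return (0);
		offsets[i] = pos;
		pos += sa_rounded_len(payload[pos]);	/* first byte is sa_len */
	}
	return (pos <= len ? pos : 0);
}
```

For the static entry above, offsets[0] would locate the DST sockaddr and offsets[1] the GATEWAY sockaddr, whose sa_family can then be inspected.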


Just for info, the sockaddr_dl looks like this:
$1 = {sdl_len = 54 '6', sdl_family = 18 '\022', sdl_index = 24,
  sdl_type = 135 '\207', sdl_nlen = 0 '\0', sdl_alen = 6 '\006',
  sdl_slen = 0 '\0', sdl_data = "\000\016\f\t.{", '\0' <repeats 39 times>}


Looks good. Let's wait for an NDP entry...

Here it is:
got message of size 328 on Wed Feb  9 17:27:11 2011
RTM_ADD: Add Route: len 328, pid: 0, seq 0, errno 0, flags:<UP,HOST,DONE,LLINFO,WASCLONED>
locks:  inits:
sockaddrs: <DST,GATEWAY,IFP,IFA>
 2a02:6b8:0:40c:daa2:5eff:fe8c:139  vlan438:0.30.48.33.4.92 fe80::230:48ff:fe33:492%vlan438


We have four sections here: DST, GATEWAY, IFP and IFA.
DST is an IPv6 address, I don't care about IFP and IFA, and the GATEWAY
section is empty. Let's see why:
$1 = {sdl_len = 54 '6', sdl_family = 18 '\022', sdl_index = 8,
  sdl_type = 135 '\207', sdl_nlen = 0 '\0', sdl_alen = 0 '\0',
  sdl_slen = 0 '\0', sdl_data = '\0' <repeats 45 times>}

The family is AF_LINK (18), which is correct, but sdl_alen and sdl_data
are zero.
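
A receiver can guard against the empty GATEWAY with a check along these lines. This is a sketch only: struct dl_addr is a local mirror of the fields printed by gdb above, not the system header, and dl_get_lladdr and AF_LINK_VALUE are hypothetical names. It assumes the link-layer address is stored in sdl_data right after the interface name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the fields shown in the gdb output; 8 + 46 = 54 bytes. */
struct dl_addr {
	uint8_t		sdl_len;
	uint8_t		sdl_family;
	uint16_t	sdl_index;
	uint8_t		sdl_type;
	uint8_t		sdl_nlen;	/* interface name length */
	uint8_t		sdl_alen;	/* link-layer address length */
	uint8_t		sdl_slen;
	char		sdl_data[46];
};

#define AF_LINK_VALUE 18	/* sdl_family as seen in the output above */

/* Copy out a 6-byte MAC if one is present; false for an empty gateway. */
bool
dl_get_lladdr(const struct dl_addr *sdl, uint8_t mac[6])
{
	assert(sdl != NULL && mac != NULL);
	if (sdl->sdl_family != AF_LINK_VALUE || sdl->sdl_alen != 6)
		return (false);
	if ((size_t)sdl->sdl_nlen + sdl->sdl_alen > sizeof(sdl->sdl_data))
		return (false);
	/* The address follows the (possibly empty) interface name. */
	memcpy(mac, sdl->sdl_data + sdl->sdl_nlen, 6);
	return (true);
}
```

Applied to the two dumps above, the static entry yields 00:0e:0c:09:2e:7b, while the NDP-generated message is rejected because sdl_alen is 0.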


I see this for all routing messages from NDP. All the routing table
entries that get created are good (no problems there).

Why does the sockaddr_dl in the GATEWAY section have a zero-length
address? Is it a bug?

--
Sem.

Re: igb driver RX (was TX) hangs when out of mbuf clusters

2011-02-09 Thread Jack Vogel

OK, but the question is why the ring gets totally consumed this way. The
ring has 1024 descriptors; it seems unintuitive that that whole quantity
can be used without some being recharged. Do you see the system mbuf pool
being depleted at the same time?

Since you can reproduce it, do me a favor: in rxeof, change the processed
value from 8 to 4 and then 1, effectively calling refresh for every
descriptor, and see if that eliminates the issue.

Thanks for your help,

Jack
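
To make the effect of that threshold concrete, here is a tiny C model of a batched refresh. It is a sketch under assumptions: the receive loop is taken to call the refresh routine once every `threshold` processed descriptors and once more on exit if anything is left over. The function count_refresh_calls is hypothetical and only counts how often a refresh would run.

```c
#include <assert.h>
#include <stddef.h>

/* Count how often a hypothetical refresh routine would be invoked. */
size_t
count_refresh_calls(size_t descriptors_done, size_t threshold)
{
	size_t processed = 0, calls = 0;

	assert(threshold > 0);
	for (size_t n = 0; n < descriptors_done; n++) {
		if (++processed == threshold) {
			calls++;	/* refresh in the middle of the loop */
			processed = 0;
		}
	}
	if (processed != 0)
		calls++;		/* final refresh when leaving the loop */
	return (calls);
}
```

In this model, 20 descriptors give 3 refresh calls with a threshold of 8, 5 with a threshold of 4, and 20 with a threshold of 1, so lowering the value trades more refresh calls for a shorter window in which descriptors sit without buffers.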



Re: if_run in hostap mode: issue with stations in the power save mode

2011-02-09 Thread Bernhard Schmidt

On Tuesday 08 February 2011 10:52:53 Bernhard Schmidt wrote:
> I've combined both patches (see attachment), if I get an ACK from both
> of you I'll try get this into the tree ASAP.

Committed, thanks!

-- 
Bernhard

Re: igb driver RX (was TX) hangs when out of mbuf clusters

2011-02-09 Thread Michael Tuexen

On Feb 9, 2011, at 6:35 PM, Jack Vogel wrote:

> OK, but the question is why the ring gets totally consumed this way. The
> ring has 1024 descriptors; it seems unintuitive that that whole quantity
> can be used without some being recharged. Do you see the system mbuf pool
> being depleted at the same time?

That was the test case I created: I set up a server accepting connections
but not reading anything. So the driver passes the mbufs to the transport
stack and they are not consumed. Then the problem occurs. Then I kill the
server. Now there are mbufs available again, but the driver doesn't know.

I had the impression that these were the circumstances in which the problem
showed up (mbuf allocations failing).
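
A test server of the kind described here might look roughly like the following C sketch. It is not the program used in the thread: the function names open_listener and hold_connections, the backlog of 128 and the choice of INADDR_ANY are assumptions. The point is only that accepted connections are kept open and never read, so received data stays queued in socket buffers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a TCP listener on the given port (0 picks a free one); -1 on error. */
int
open_listener(uint16_t port)
{
	struct sockaddr_in sin;
	int fd, on = 1;

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0)
		return (-1);
	(void)setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_ANY);
	sin.sin_port = htons(port);
	if (bind(fd, (struct sockaddr *)&sin, sizeof(sin)) < 0 ||
	    listen(fd, 128) < 0) {
		close(fd);
		return (-1);
	}
	return (fd);
}

/* Accept up to max connections and deliberately never read from them. */
int
hold_connections(int lfd, int *held, int max)
{
	int n = 0;

	assert(held != NULL);
	while (n < max) {
		int c = accept(lfd, NULL, NULL);	/* blocks until a peer connects */
		if (c < 0)
			break;
		held[n++] = c;	/* kept open; data piles up unread */
	}
	return (n);
}
```

Killing such a process closes the held sockets, which releases the queued mbufs in one go, matching the "mbufs available again, but the driver doesn't know" step of the reproduction.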
 
> Since you can reproduce it, do me a favor: in rxeof, change the processed
> value from 8 to 4 and then 1, effectively calling refresh for every
> descriptor, and see if that eliminates the issue.

I will do. Need to see if I can do it remotely, since I'm not in my lab
right now. Can do it tomorrow for sure.

But I do not think that this solves the problem, since I did the things
very slowly and you call it at least when you are leaving rxeof.

Best regards
Michael
 
> Thanks for your help,
>
> Jack
 
 

Problem with re0

2011-02-09 Thread Gabor Radnai

> Both the em and re drivers have had a lot of work done recently.  Are
> you trying with 8.2RC1 ?

Tried with 8.2RC2 (via fixit shell with em): the same symptoms, sadly.
The card is recognized and the driver is loaded, so ifconfig reports it
as an available interface. But neither static IP addressing nor DHCP
makes it reachable on the network. The interface cannot even ping the
default gateway, and this machine cannot be pinged either.

I am very sad :-(

Re: bge wedging 8.2-RC1

2011-02-09 Thread Pyun YongHyeon

On Mon, Feb 07, 2011 at 08:27:43PM -0600, Peter Lai wrote:
> On Feb 7, 2011 7:38 PM, Pyun YongHyeon <pyu...@gmail.com> wrote:
> >
> > On Mon, Feb 07, 2011 at 06:09:16PM -0600, Peter Lai wrote:
> > > Hello
> > >
> > > I've got a new Dell Precision workstation here with a BCM5761 on intel
> > > mobo for westmere xeons that is wedging with interrupt storm and will
> > > lockup the system randomly. I have turned HTT and auto powermanagement
> > > off in bios (system cannot sleep), lowest cpu acpi state is C1.
> > >
> > > Here is dmesg:
> > > bge0: <Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev.
> > > 0x5761100> mem 0xf3be-0xf3be,0xf3bf-0xf3bf irq 17 at
> > > device 0.0 on pci6
> > > bge0: CHIP ID 0x05761100; ASIC REV 0x5761; CHIP REV 0x57611; PCI-E
> > > miibus0: <MII bus> on bge0
> > > brgphy0: <BCM5761 10/100/1000baseTX PHY> PHY 1 on miibus0
> > > brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT,
> > > 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
> > >
> > > Here is pciconf -lv:
> > > bge0@pci0:6:0:0:  class=0x02 card=0x026d1028 chip=0x168114e4
> > > rev=0x10 hdr=0x00
> > >     vendor     = 'Broadcom Corporation'
> > >     device     = 'Broadcom 57XX Gigabit Integrated Controller (BCM5761)'
> > >     class      = network
> > >     subclass   = ethernet
> > >
> > > here is the setup in rc.conf:
> > >
> > > ifconfig_bge0="polling -tso -vlanhwtso -vlanhwtag -vlanmtu inet
> > > 192.168.123.124 netmask 255.255.255.0"
> > >
> > > I have the card plugged into a dlink DSS8 100mbps switch with one
> > > other 100mbps device on it (rich man's crossover cable).
> > >
> > > Before turning off TSO4 and VLAN tagging (because I don't use them),
> > > the card would do several things:
> > > 1. 1 out of 3 reboots: Fail to bring interface up. ifconfig would hang
> > > and systat/vmstat showed 800+ interrupts per second on IRQ256
> >
> > This is strange. bge(4) does not use MSI if you build bge(4) with
> > DEVICE_POLLING so seeing IRQ256 interrupts looks odd to me.
> > Are you sure bge(4) is using IRQ256?
>
> This is with GENERIC. I will rebuild with POLLING and try...
>

Let me know whether the attached patch makes any difference on your box.
The patch contains some other changes, but those wouldn't affect your
BCM5761 controller. If you see a CLKREQ enabled message after
applying the patch, let me know that too.

 
> > > 2. After a few hours lock up the system, requiring hard reboot
> > >
> > > After disabling TSO4 and VLAN stuff:
> > > bge0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
> > >         options=80083<RXCSUM,TXCSUM,VLAN_HWCSUM,LINKSTATE>
> > >         media: Ethernet autoselect (100baseTX
> > > <full-duplex,flowcontrol,rxpause,txpause>)
> > >
> > > Everything seemed fine for about two weeks and then suddenly started
> > > acting up again, locked up, after hard reboot, soft reboot, link will
> > > not come up and I see interrupt storm again
> > >
> >
> > If you don't use DEVICE_POLLING, rebuild bge(4) with
> > DEVICE_POLLING. For most cases, you don't need to enable polling on
> > intelligent controllers like bge(4).
> >
> > I also have BCM5761 PCIe controller which shows no such issues. I
> > know there is an edge case(send BD corruption) for BCM5761/BCM5784/
> > BCM57780 which needs to be investigated. I'm not sure you're seeing
> > that edge case though.
> >
> > > I am close to buying an intel card to replace the bcm, but then I
> > > noticed that the main intel desktop PCI-E card is 82574L-based and
> > > people are having em driver wedging on that too. So now I have broken
> > > ethernet on this box; my primary link is atheros 5212 pci card and I
> > > may be out of pci slots (or else I might try a pci intel card).

Index: sys/dev/bge/if_bgereg.h
===
--- sys/dev/bge/if_bgereg.h	(revision 218409)
+++ sys/dev/bge/if_bgereg.h	(working copy)
@@ -2004,6 +2004,11 @@
 #define	BGE_EECTL_DATAOUT		0x0010
 #define	BGE_EECTL_DATAIN		0x0020
 
+/* PCIe Link control register */
+#define	BGE_PCIE_LNKCTL			0x7D54
+#define	BGE_PCIE_LNKCTL_L1_PLL_PD_ENB	0x0008
+#define	BGE_PCIE_LNKCTL_L1_PLL_PD_DIS	0x0080
+
 /* MDI (MII/GMII) access register */
 #define	BGE_MDI_DATA			0x0001
 #define	BGE_MDI_DIR			0x0002
@@ -2769,6 +2774,7 @@
 #define	BGE_FLAG_4G_BNDRY_BUG	0x0200
 #define	BGE_FLAG_RX_ALIGNBUG	0x0400
 #define	BGE_FLAG_SHORT_DMA_BUG	0x0800
+#define	BGE_FLAG_CLKREQ_BUG	0x1000
 	uint32_t		bge_phy_flags;
 #define	BGE_PHY_WIRESPEED	0x0001
 #define	BGE_PHY_ADC_BUG		0x0002
Index: sys/dev/bge/if_bge.c
===
--- sys/dev/bge/if_bge.c	(revision 218409)
+++ sys/dev/bge/if_bge.c	(working copy)
@@ -879,6 +879,8 @@
 {
 	struct bge_softc *sc;
 	struct mii_data *mii;
+	uint16_t lnkctl;
+
 	sc = device_get_softc(dev);
 	mii = device_get_softc(sc->bge_miibus);
 
@@ -905,6 +907,18 @@
 		sc->bge_link = 0;
 	if (sc->bge_link == 0)
 		return;
+	/* Disable CLKREQ when controller is running at 10/100Mbps. */
+	if (sc->bge_flags & BGE_FLAG_CLKREQ_BUG) {
+		lnkctl = pci_read_config(sc->bge_dev, sc->bge_expcap +
+

Re: bge wedging 8.2-RC1

2011-02-09 Thread Peter Lai


> Let me know whether the attached patch makes any difference on your box.
> The patch contains some other changes, but those wouldn't affect your
> BCM5761 controller. If you see a CLKREQ enabled message after
> applying the patch, let me know that too.


Can I apply this to 8.2-RC1 or should I update it to -RC3?

Re: bge wedging 8.2-RC1

2011-02-09 Thread Pyun YongHyeon

On Wed, Feb 09, 2011 at 06:28:31PM -0600, Peter Lai wrote:
> > Let me know whether the attached patch makes any difference on your box.
> > The patch contains some other changes, but those wouldn't affect your
> > BCM5761 controller. If you see a CLKREQ enabled message after
> > applying the patch, let me know that too.
>
> Can I apply this to 8.2-RC1 or should I update it to -RC3?

I guess you can apply it to 8.2-RC1 without a problem.

Re: Slow Intel 10GbE CX4 adapter behaviour

2011-02-09 Thread rihad


On 02/09/2011 07:27 PM, Sergey Kandaurov wrote:
> On 9 February 2011 18:15, rihad <ri...@mail.ru> wrote:
> > On 02/09/2011 05:47 PM, Sergey Kandaurov wrote:
> > > On 9 February 2011 12:37, rihad <ri...@mail.ru> wrote:
> > > > Problem solved, I'm so embarrassed :) The issue on 7.2 mentioned above
> > > > with ixgbe (tons of "fragmentation failed" errors) was real. The issue
> > > > in 8.3-RC3 was because dummynet wasn't being loaded at all, so no
> > > > traffic could pass on it, despite dummynet_load="YES" being set in
> > > > /boot/loader.conf. So I turned it on in /etc/rc.conf with
> > > > dummynet_enable="YES" and loaded it with "kldload dummynet" to avoid
> > > > a reboot. Works like a charm so far. Thanks to all!
> > >
> > > Looks like loading dummynet.ko via /boot/loader.conf doesn't work because
> > > dummynet.ko depends on dummynet.ko, but of a different version.
> >
> > Would dummynet_enable=YES in rc.conf still work? We haven't yet had a
> > chance to reboot to test that.
>
> Yes, it would.
> Note that it depends on firewall_enable=YES also present in rc.conf.



Thanks, I see. Now I think that changing through rc.conf is the 
official, or supported, way of enabling dummynet upon reboot, but 
loader.conf is a little way under the hood. I always asked myself why it 
was settable in two places, and not one. But now I know. The fact that 
dummynet can be set to load in loader.conf is more like an undesired 
effect of generality.


Re: kern/154600: [tcp] [panic] Random kernel panics on tcp_output

2011-02-09 Thread linimon

Old Synopsis: Random kernel panics on tcp_output
New Synopsis: [tcp] [panic] Random kernel panics on tcp_output

Responsible-Changed-From-To: freebsd-amd64->freebsd-net
Responsible-Changed-By: linimon
Responsible-Changed-When: Thu Feb 10 05:41:32 UTC 2011
Responsible-Changed-Why: 
reclassify and assign.

http://www.freebsd.org/cgi/query-pr.cgi?pr=154600

Re: kern/154591: [msk] [panic] if_msk driver causes kernel panic (fatal trap while in kernel mode)

2011-02-09 Thread linimon

Old Synopsis: if_msk driver causes kernel panic (fatal trap while in kernel mode)
New Synopsis: [msk] [panic] if_msk driver causes kernel panic (fatal trap while in kernel mode)

Responsible-Changed-From-To: freebsd-bugs->freebsd-net
Responsible-Changed-By: linimon
Responsible-Changed-When: Thu Feb 10 05:43:45 UTC 2011
Responsible-Changed-Why: 
reassign.

http://www.freebsd.org/cgi/query-pr.cgi?pr=154591
