Re: tg3 issues
Michael Chan wrote:
> > Last update on this.
> >
> > "udelay(100 + net_random() % 300)" seems to work much better, and I
> > have not had a single problem getting the link up within 10 seconds
> > of a cold or warm boot. Most often the link comes up directly without
> > any sort of delay, whereas before it could hang for 30 seconds before
> > getting a link, if you got a link at all.
>
> We'll have to do some testing to see if we can find a better solution.
> Adding up to 400 usec of busy wait is not ideal. Are you connecting two
> 5701 fiber cards directly to each other in your setup?

Hi,

Yeah, it's an ugly hack, but it's a workaround that at least works for
me.

I have done some additional tests, and to me it seems that the driver
just needs to wait a bit longer to detect the link, plus some random
time to get around the issues it might have when chasing the other
system's state. I don't have the numbers in front of me, but I did some
tests to find the 'optimum' delay, and it seems that the larger the wait
time (even up to around 40-50 ms!), the smoother and faster the autoneg
goes.

And remember that the link can be reported up while no traffic can pass
over it until the autoneg is complete and both cards report that flow
control is enabled, so you need to verify the link by trying to send
some data and not just look at the link status.

I hope my conclusions help, or at least point you in the right
direction.

I have two of these cards, connected via a single FC cable:

01:08.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5701 Gigabit Ethernet (rev 15) (14e4:1645)

The two systems are one single-core and one dual-core AMD system.

Regards,
Patric
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: tg3 issues
On Sun, 2007-07-22 at 13:43 +0200, patric wrote:
> patric wrote:
> > Hi,
> >
> > I think I got something working, for me at least. The fix is quite
> > minimal, and the only downside I can see is that you might get a
> > small delay when bringing up the interface, but that's probably
> > better than getting a non-functional interface that reports that
> > it's up.
> >
> > The fix seems to be quite simple: just a random sleep at the end of
> > "tg3_setup_fiber_by_hand()":
> >
> >                 tw32_f(MAC_MODE, tp->mac_mode);
> >                 udelay(40);
> >         }
> >
> > out:
> >         udelay(net_random() % 400);
> >         return current_link_up;
> > }
> >
> > I'm not sure whether this is a good fix or whether it might break on
> > other systems, but maybe you could have a quick look at it?
> >
> > Regards,
> > Patric
>
> Last update on this.
>
> "udelay(100 + net_random() % 300)" seems to work much better, and I
> have not had a single problem getting the link up within 10 seconds of
> a cold or warm boot. Most often the link comes up directly without any
> sort of delay, whereas before it could hang for 30 seconds before
> getting a link, if you got a link at all.

We'll have to do some testing to see if we can find a better solution.
Adding up to 400 usec of busy wait is not ideal. Are you connecting two
5701 fiber cards directly to each other in your setup?
Re: tg3 issues
patric wrote:
> Hi,
>
> I think I got something working, for me at least. The fix is quite
> minimal, and the only downside I can see is that you might get a small
> delay when bringing up the interface, but that's probably better than
> getting a non-functional interface that reports that it's up.
>
> The fix seems to be quite simple: just a random sleep at the end of
> "tg3_setup_fiber_by_hand()":
>
>                 tw32_f(MAC_MODE, tp->mac_mode);
>                 udelay(40);
>         }
>
> out:
>         udelay(net_random() % 400);
>         return current_link_up;
> }
>
> I'm not sure whether this is a good fix or whether it might break on
> other systems, but maybe you could have a quick look at it?
>
> Regards,
> Patric

Last update on this.

"udelay(100 + net_random() % 300)" seems to work much better, and I have
not had a single problem getting the link up within 10 seconds of a cold
or warm boot. Most often the link comes up directly without any sort of
delay, whereas before it could hang for 30 seconds before getting a
link, if you got a link at all.

/Patric
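To make the range of that delay explicit, here is a small user-space sketch of the expression `100 + net_random() % 300`. The names `BASE_DELAY_US`, `JITTER_SPAN_US` and `jittered_delay_us` are made up for illustration, and the random value is passed in as a parameter so the arithmetic can be checked without the kernel.

```c
#include <assert.h>
#include <stdint.h>

#define BASE_DELAY_US  100u /* fixed part of the delay */
#define JITTER_SPAN_US 300u /* random part covers 0..299 */

/* Map a raw random value to a delay in the range [100, 399] usec. */
static uint32_t jittered_delay_us(uint32_t rnd)
{
    /* % binds tighter than +, so this is 100 + (rnd % 300). */
    uint32_t delay = BASE_DELAY_US + rnd % JITTER_SPAN_US;

    assert(delay >= BASE_DELAY_US);
    assert(delay < BASE_DELAY_US + JITTER_SPAN_US);
    return delay;
}
```

The result is never below 100 usec and never above 399 usec, which matches the "up to 400 usec of busy wait" mentioned elsewhere in the thread. The fixed floor is what distinguishes it from the earlier `net_random() % 400`, which could come out as zero.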
Re: tg3 issues
Michael Chan wrote:
> On Fri, 2007-07-20 at 18:59 +0200, patric wrote:
> > Thanks... That's a confirmation of what I suspected. I think I'll
> > dig into the code instead and try to figure out some way of making
> > it work a bit better, for my setup at least. I'll post a patch later
> > if I get something working. Btw, do you have any timings on how long
> > it takes for the cards to get a lock on the link?
>
> Serdes link comes up very fast. The clause 37 autoneg involves going
> through a few states and exchanging base pages and optional next
> pages. Each state completes in 10 msec, so the whole autoneg should
> complete in roughly 100 msec or so. But because it is done in software
> for these older cards, it can take much longer because it can get
> out-of-sync.

Hi,

I think I got something working, for me at least. The fix is quite
minimal, and the only downside I can see is that you might get a small
delay when bringing up the interface, but that's probably better than
getting a non-functional interface that reports that it's up.

The fix seems to be quite simple: just a random sleep at the end of
"tg3_setup_fiber_by_hand()":

                tw32_f(MAC_MODE, tp->mac_mode);
                udelay(40);
        }

out:
        udelay(net_random() % 400);
        return current_link_up;
}

I'm not sure whether this is a good fix or whether it might break on
other systems, but maybe you could have a quick look at it?

Regards,
Patric
Re: tg3 issues
On Fri, 2007-07-20 at 18:59 +0200, patric wrote:
> Thanks... That's a confirmation of what I suspected. I think I'll dig
> into the code instead and try to figure out some way of making it work
> a bit better, for my setup at least. I'll post a patch later if I get
> something working. Btw, do you have any timings on how long it takes
> for the cards to get a lock on the link?

Serdes link comes up very fast. The clause 37 autoneg involves going
through a few states and exchanging base pages and optional next pages.
Each state completes in 10 msec, so the whole autoneg should complete in
roughly 100 msec or so. But because it is done in software for these
older cards, it can take much longer because it can get out-of-sync.
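The timing estimate above can be written down as simple arithmetic. The sketch below assumes about ten state steps, which is only inferred from the 10 msec per state and roughly 100 msec total quoted in the message. The `restarts` term is an illustrative model of the two ends falling out of sync and starting over; it is not taken from the driver, and `autoneg_estimate_ms` is a hypothetical name.

```c
#include <assert.h>

#define STATE_TIME_MS 10 /* per-state time quoted in the message */

/*
 * Estimated duration of software autoneg: one clean pass through
 * `steps` states, plus `restarts` complete passes that were lost
 * because the link partners got out of sync (assumed model).
 */
static int autoneg_estimate_ms(int steps, int restarts)
{
    assert(steps > 0 && restarts >= 0);
    return (restarts + 1) * steps * STATE_TIME_MS;
}
```

Under these assumptions a clean run of ten steps gives 100 ms, and three lost passes already push it to 400 ms. Timing that is this sensitive to resynchronisation may be why a small random delay helps two directly connected cards.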
Re: tg3 issues
Michael Chan wrote:
> On Thu, 2007-07-19 at 15:24 +0200, patric wrote:
> > Just a hypothetical question: if the two network cards start the
> > autonegotiation, would it be possible for them to get into a loop
> > where they are chasing each other's state? Maybe a fix could be to
> > add a sleep of a random length that would enable them to catch up?
> > Maybe you also know whether any of the fiber cards support running
> > without flow control, since the cards don't seem to be able to get a
> > link with flow control turned off, at least in this setup.
>
> The old 5701 fiber NICs do not support autonegotiation in hardware so
> it is done "by hand" in the driver. It is not the most robust way of
> doing autoneg and what you described is totally possible.
>
> You might want to try disabling autoneg to see if it works any better.
> There is only one possible speed in fiber and autoneg is really only
> used to negotiate flow control. Some switch ports will not link up if
> the link partner does not do autoneg though. You have to use ethtool
> in initrd to turn off autoneg or just modify the driver to disable
> autoneg.

Thanks... That's a confirmation of what I suspected. I think I'll dig
into the code instead and try to figure out some way of making it work a
bit better, for my setup at least. I'll post a patch later if I get
something working. Btw, do you have any timings on how long it takes for
the cards to get a lock on the link?

Thanks for your input!

/Patric
Re: tg3 issues
On Thu, 2007-07-19 at 15:24 +0200, patric wrote:
> Just a hypothetical question: if the two network cards start the
> autonegotiation, would it be possible for them to get into a loop
> where they are chasing each other's state? Maybe a fix could be to add
> a sleep of a random length that would enable them to catch up? Maybe
> you also know whether any of the fiber cards support running without
> flow control, since the cards don't seem to be able to get a link with
> flow control turned off, at least in this setup.

The old 5701 fiber NICs do not support autonegotiation in hardware so it
is done "by hand" in the driver. It is not the most robust way of doing
autoneg and what you described is totally possible.

You might want to try disabling autoneg to see if it works any better.
There is only one possible speed in fiber and autoneg is really only
used to negotiate flow control. Some switch ports will not link up if
the link partner does not do autoneg though. You have to use ethtool in
initrd to turn off autoneg or just modify the driver to disable autoneg.
Re: tg3 issues
Neil Horman wrote:
> On Thu, Jul 19, 2007 at 04:49:13PM +0530, pradeep singh wrote:
> > CCing: netdev
> >
> > On 7/19/07, patric <[EMAIL PROTECTED]> wrote:
> > > Hi,
> > >
> > > To start with, I'm not sure if this should go to the dev or user
> > > list, but I'll start here.
> > >
> > > I'm currently running an nfsroot via a Broadcom NetXtreme 1000-SX
> > > card (BCM5701) and I have a big problem with the tg3 driver's
> > > autonegotiation.
> > >
> > > The issue seems to be that when the kernel gets as far as trying
> > > to mount the root, the autonegotiation has not yet completed,
> > > which then causes a panic since it cannot mount the nfsroot.
> > >
> > > From some debugging I have done here, the issue seems to be
> > > related to the flow-control configuration, and just to make it a
> > > bit more fun it does work some of the time (around once every
> > > 5-10 attempts).
> > >
> > > On the console it looks something like this when failing (written
> > > from memory since I don't have netconsole enabled):
> > >
> > > tg3: eth0: Link is up at 1000 Mbps, full duplex.
> > > tg3: eth0: Flow control is off for TX and off for RX.
> > > IP-Config: Complete:
> > >       device=eth0, addr=192.168.1.10, mask=255.255.255.0, gw=255.255.255.255,
> > >      host=amd, domain=, nis-domain=(none),
> > >      bootserver=255.255.255.255, rootserver=192.168.1.1, rootpath=
> > > Root-NFS: unknown option: nolocks
> > > Looking up port of RPC 13/3 on 192.168.1.1
> > > rpcbind: server 192.168.1.1 not responding, timed out
> > > Root-NFS: Unable to get nfsd port number from server, using default
> > > Looking up port of RPC 13/3 on 192.168.1.1
> > > rpcbind: server 192.168.1.1 not responding, timed out
> > > Root-NFS: Unable to get nfsd port number from server, using default
> > >
> > > and so on until it panics...
>
> IIRC, there are two main problems in this type of situation:
>
> 1) Spanning tree convergence
> 2) Firmware initialization latency
>
> If you are running spanning tree on your network, it can take up to 2
> minutes before your port will forward frames properly. If you have the
> option available, disable spanning tree on the switch port connected
> to your host system, or at least enable portfast if it is an option.
> That should fix any spanning tree issues you have.
>
> If the tg3 card is just taking a long time to initialize, there is not
> too much you can do about it. If your goal is to use nfs root, then
> instead of enabling nfs-root as a kernel config option, I would create
> an initramfs that:
>
> A) Brings up your NIC
> B) Mounts your nfs partition
> C) Executes a switch_root or pivot_root operation
>
> That way you can calibrate a delay between steps (A) and (B) in your
> initramfs init script.
>
> Regards
> Neil

Hi Neil, and thanks for your quick reply, and thanks Pradeep for
forwarding the question to the correct mailing list.

Well, I'm not using any switches, just a crossed cable between the
machines. I did notice that it seems to get a 'good link' more often
when cold-booting the client.

I have been thinking about using an initrd to get around the issue, but
the problem is that you never know how long the init will take, so there
would always have to be quite a big delay before the system can boot.
But I don't really think the issue is that the card takes a long time to
initialize, since it does sometimes work without delay during a warm
boot, and the cards do report that they are up but then report different
flow-control states. Maybe I'll set the flow control statically in the
driver as a test, if I can figure out how this driver works. :)

Just a hypothetical question: if the two network cards start the
autonegotiation, would it be possible for them to get into a loop where
they are chasing each other's state? Maybe a fix could be to add a sleep
of a random length that would enable them to catch up? Maybe you also
know whether any of the fiber cards support running without flow
control, since the cards don't seem to be able to get a link with flow
control turned off, at least in this setup.

Regards,
Patric
Re: tg3 issues
On Thu, Jul 19, 2007 at 04:49:13PM +0530, pradeep singh wrote:
> CCing: netdev
>
> On 7/19/07, patric <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > To start with, I'm not sure if this should go to the dev or user
> > list, but I'll start here.
> >
> > I'm currently running an nfsroot via a Broadcom NetXtreme 1000-SX
> > card (BCM5701) and I have a big problem with the tg3 driver's
> > autonegotiation.
> >
> > The issue seems to be that when the kernel gets as far as trying to
> > mount the root, the autonegotiation has not yet completed, which
> > then causes a panic since it cannot mount the nfsroot.
> >
> > From some debugging I have done here, the issue seems to be related
> > to the flow-control configuration, and just to make it a bit more
> > fun it does work some of the time (around once every 5-10 attempts).
> >
> > On the console it looks something like this when failing (written
> > from memory since I don't have netconsole enabled):
> >
> > tg3: eth0: Link is up at 1000 Mbps, full duplex.
> > tg3: eth0: Flow control is off for TX and off for RX.
> > IP-Config: Complete:
> >       device=eth0, addr=192.168.1.10, mask=255.255.255.0, gw=255.255.255.255,
> >      host=amd, domain=, nis-domain=(none),
> >      bootserver=255.255.255.255, rootserver=192.168.1.1, rootpath=
> > Root-NFS: unknown option: nolocks
> > Looking up port of RPC 13/3 on 192.168.1.1
> > rpcbind: server 192.168.1.1 not responding, timed out
> > Root-NFS: Unable to get nfsd port number from server, using default
> > Looking up port of RPC 13/3 on 192.168.1.1
> > rpcbind: server 192.168.1.1 not responding, timed out
> > Root-NFS: Unable to get nfsd port number from server, using default
> >
> > and so on until it panics...

IIRC, there are two main problems in this type of situation:

1) Spanning tree convergence
2) Firmware initialization latency

If you are running spanning tree on your network, it can take up to 2
minutes before your port will forward frames properly. If you have the
option available, disable spanning tree on the switch port connected to
your host system, or at least enable portfast if it is an option. That
should fix any spanning tree issues you have.

If the tg3 card is just taking a long time to initialize, there is not
too much you can do about it. If your goal is to use nfs root, then
instead of enabling nfs-root as a kernel config option, I would create
an initramfs that:

A) Brings up your NIC
B) Mounts your nfs partition
C) Executes a switch_root or pivot_root operation

That way you can calibrate a delay between steps (A) and (B) in your
initramfs init script.

Regards
Neil

--
/***
 *Neil Horman
 *Software Engineer
 *Red Hat, Inc. [EMAIL PROTECTED]
 *gpg keyid: 1024D / 0x92A74FA1
 *http://pgp.mit.edu
 ***/
Re: tg3 issues
CCing: netdev

On 7/19/07, patric <[EMAIL PROTECTED]> wrote:
> Hi,
>
> To start with, I'm not sure if this should go to the dev or user list,
> but I'll start here.
>
> I'm currently running an nfsroot via a Broadcom NetXtreme 1000-SX card
> (BCM5701) and I have a big problem with the tg3 driver's
> autonegotiation.
>
> The issue seems to be that when the kernel gets as far as trying to
> mount the root, the autonegotiation has not yet completed, which then
> causes a panic since it cannot mount the nfsroot.
>
> From some debugging I have done here, the issue seems to be related to
> the flow-control configuration, and just to make it a bit more fun it
> does work some of the time (around once every 5-10 attempts).
>
> On the console it looks something like this when failing (written from
> memory since I don't have netconsole enabled):
>
> tg3: eth0: Link is up at 1000 Mbps, full duplex.
> tg3: eth0: Flow control is off for TX and off for RX.
> IP-Config: Complete:
>       device=eth0, addr=192.168.1.10, mask=255.255.255.0, gw=255.255.255.255,
>      host=amd, domain=, nis-domain=(none),
>      bootserver=255.255.255.255, rootserver=192.168.1.1, rootpath=
> Root-NFS: unknown option: nolocks
> Looking up port of RPC 13/3 on 192.168.1.1
> rpcbind: server 192.168.1.1 not responding, timed out
> Root-NFS: Unable to get nfsd port number from server, using default
> Looking up port of RPC 13/3 on 192.168.1.1
> rpcbind: server 192.168.1.1 not responding, timed out
> Root-NFS: Unable to get nfsd port number from server, using default
>
> and so on until it panics...
>
> And for a successful attempt:
>
> tg3: eth0: Link is up at 1000 Mbps, full duplex.
> tg3: eth0: Flow control is off for TX and off for RX.
> IP-Config: Complete:
>       device=eth0, addr=192.168.1.10, mask=255.255.255.0, gw=255.255.255.255,
>      host=amd, domain=, nis-domain=(none),
>      bootserver=255.255.255.255, rootserver=192.168.1.1, rootpath=
> Root-NFS: unknown option: nolocks
> Looking up port of RPC 13/3 on 192.168.1.1
> tg3: eth0: Link is up at 1000 Mbps, full duplex.
> tg3: eth0: Flow control is on for TX and on for RX.
> rpcbind: server 192.168.1.1 not responding, timed out
> Root-NFS: Unable to get nfsd port number from server, using default
> Looking up port of RPC 15/3 on 192.168.1.1
> VFS: Mounted root (nfs filesystem).
>
> Also, if I get the "flowcontrol is on" message before it tries to
> mount the root, it does not report any issues. And this is not unique
> to NFS booting; it happens any time I bring the link up (when not
> using it for booting).
>
> Currently I'm running 2.6.22-git9 (for testing), and I experience the
> same issues in plain 2.6.22 and before.
>
> Regards,
> Patric
> -
> To unsubscribe from this list: send the line "unsubscribe linux-net" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
play the game