Re: tg3 issues

2007-07-24 Thread patric

Michael Chan wrote:

> > Last update on this.
> >
> > "udelay( 100 + net_random % 300 )" seems to work much better and i have
> > not had a single problem getting the link up within 10 seconds of a cold
> > or warm-boot, and most often the link comes up directly without any sort
> > of delay instead like before when it could hang for 30 seconds before
> > getting a link, if you even got a link.
>
> We'll have to do some testing to see if we can find a better solution.
> Adding up to 400 usec of busy wait is not ideal.  Are you connecting two
> 5701 fiber cards directly to each other in your setup?

Hi,

Yeah, it's an ugly hack, but it's a workaround that at least works for me.
I have done some additional tests, and it seems to me that the driver just
needs to wait a bit longer to detect the link, plus some random time to get
around the issues it might have when chasing the other system's state.
I don't have the numbers in front of me, but I did some tests to find the
'optimum' delay, and it seems that the longer the wait (even up to around
40-50 ms!), the smoother and faster the autoneg goes. And remember that the
link can be reported up while no traffic can pass until the autoneg is
complete and both cards report that flow control is enabled, so you need to
verify the link by trying to send some data and not just look at the
link status.

Hope my conclusions help, or at least point you in the right direction.
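
The point about verifying the link with real traffic can be sketched in userspace C. This is only an illustration under stated assumptions, not tg3 driver code: link_probe_fn, probe_countdown, sleep_with_jitter and wait_for_usable_link are hypothetical names, and the probe stands in for whatever "send some data and see a reply" means in a given setup (a ping, an RPC lookup, and so on).

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical probe: returns true once real data has made a round trip.
 * It is an assumption for this sketch and not part of the tg3 driver. */
typedef bool (*link_probe_fn)(void *ctx);

/* Stand-in probe for demonstration: succeeds on the Nth call, where N is
 * the initial value of the int that ctx points to. */
static bool probe_countdown(void *ctx)
{
    int *remaining = ctx;
    if (*remaining > 0)
        (*remaining)--;
    return *remaining == 0;
}

/* Sleep for base_ms plus a random 0..jitter_ms-1 milliseconds. */
static void sleep_with_jitter(unsigned base_ms, unsigned jitter_ms)
{
    unsigned ms = base_ms + (jitter_ms ? (unsigned)rand() % jitter_ms : 0);
    struct timespec ts = {
        .tv_sec = ms / 1000,
        .tv_nsec = (long)(ms % 1000) * 1000000L,
    };
    nanosleep(&ts, NULL);
}

/* Call the probe up to max_attempts times, waiting between failures.
 * Returns the 1-based attempt on which the probe succeeded, or 0. */
static int wait_for_usable_link(link_probe_fn probe, void *ctx,
                                int max_attempts,
                                unsigned base_ms, unsigned jitter_ms)
{
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (probe(ctx))
            return attempt;
        sleep_with_jitter(base_ms, jitter_ms);
    }
    return 0;
}
```

A caller would pass its own probe in place of probe_countdown and treat a return value of 0 as "link reported up but not usable yet".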

I have two of these cards:

01:08.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5701 Gigabit Ethernet (rev 15) (14e4:1645)

connected via a single FC cable.
And the two systems are one single-core and one dual-core AMD system.

Regards,
Patric

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: tg3 issues

2007-07-23 Thread Michael Chan
On Sun, 2007-07-22 at 13:43 +0200, patric wrote:
> patric wrote:
> 
> > Hi,
> >
> > Think i got something working for me at least, and the fix is quite 
> > minimal and only downside that i could see from it was that you might 
> > get a small delay when bringing up the interface, but that's probably 
> > better than getting a non-functional interface that reports that it's up.
> >
> > The fix seems to be quite simple with just a random sleep at the end 
> > of  "tg3_setup_fiber_by_hand():"
> >
> >    tw32_f(MAC_MODE, tp->mac_mode);
> >    udelay(40);
> >    }
> >
> > out:
> >    udelay( net_random() % 400 );
> >    return current_link_up;
> > }
> >
> > Not sure that this is a good fix or if it might break on other 
> > systems, but maybe you could have a quick look at that?
> >
> > Regards,
> > Patric
> >
> Last update on this.
> 
> "udelay( 100 + net_random % 300 )" seems to work much better and i have 
> not had a single problem getting the link up within 10 seconds of a cold 
> or warm-boot, and most often the link comes up directly without any sort 
> of delay instead like before when it could hang for 30 seconds before 
> getting a link, if you even got a link.
> 

We'll have to do some testing to see if we can find a better solution.
Adding up to 400 usec of busy wait is not ideal.  Are you connecting two
5701 fiber cards directly to each other in your setup?



Re: tg3 issues

2007-07-22 Thread patric

patric wrote:

> Hi,
>
> Think i got something working for me at least, and the fix is quite
> minimal and only downside that i could see from it was that you might
> get a small delay when bringing up the interface, but that's probably
> better than getting a non-functional interface that reports that it's up.
>
> The fix seems to be quite simple with just a random sleep at the end
> of  "tg3_setup_fiber_by_hand():"
>
>    tw32_f(MAC_MODE, tp->mac_mode);
>    udelay(40);
>    }
>
> out:
>    udelay( net_random() % 400 );
>    return current_link_up;
> }
>
> Not sure that this is a good fix or if it might break on other
> systems, but maybe you could have a quick look at that?
>
> Regards,
> Patric

Last update on this.

"udelay( 100 + net_random() % 300 )" seems to work much better: I have
not had a single problem getting the link up within 10 seconds of a cold
or warm boot. Most often the link comes up directly without any delay,
unlike before, when it could hang for 30 seconds before getting a link,
if it got a link at all.
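
For reference, the two delay expressions tried in this thread differ only in their range. The userspace sketch below makes that explicit. The names jitter_v1 and jitter_v2 are made up for illustration, and the rnd argument stands in for the value returned by net_random() in the driver.

```c
/* First attempt in the thread: net_random() % 400, giving 0..399 usec.
 * rnd is a stand-in for the random value (an assumption of this sketch). */
static unsigned jitter_v1(unsigned rnd)
{
    return rnd % 400;
}

/* Revised attempt: 100 + net_random() % 300, giving 100..399 usec,
 * so the wait is never shorter than 100 usec. */
static unsigned jitter_v2(unsigned rnd)
{
    return 100 + rnd % 300;
}
```

Both variants have the same upper bound of 399 usec. The revised one only raises the minimum, which fits the observation that a wait that is too short does not help.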


/Patric



Re: tg3 issues

2007-07-20 Thread patric

Michael Chan wrote:

> On Fri, 2007-07-20 at 18:59 +0200, patric wrote:
>
> > Thanks... That's a confirmation on what i  suspected.. I think i'll dig
> > into the code instead and try to figure out some way of making it work a
> > bit better for my setup atleast...
> > I'll post a patch later if i get something working... Btw, do you have
> > any timings on how long it takes for the cards to get a lock on the link?
>
> Serdes link comes up very fast.  The clause 37 autoneg involves going
> through a few states and exchanging base pages and optional next pages.
> Each state completes in 10 msec, so the whole autoneg should complete in
> roughly 100 msec or so.  But because it is done in software for these
> older cards, it can take much longer because it can get out-of-sync.

Hi,

I think I got something working, for me at least. The fix is quite
minimal, and the only downside I can see is a small delay when bringing
up the interface, but that's probably better than a non-functional
interface that reports that it's up.

The fix is just a random delay at the end of tg3_setup_fiber_by_hand():


		tw32_f(MAC_MODE, tp->mac_mode);
		udelay(40);
	}

out:
	/* workaround: busy-wait a random 0-399 usec before returning */
	udelay(net_random() % 400);
	return current_link_up;
}

Not sure that this is a good fix or if it might break on other systems, 
but maybe you could have a quick look at that?


Regards,
Patric



Re: tg3 issues

2007-07-20 Thread Michael Chan
On Fri, 2007-07-20 at 18:59 +0200, patric wrote:

> Thanks... That's a confirmation on what i  suspected.. I think i'll dig 
> into the code instead and try to figure out some way of making it work a 
> bit better for my setup atleast...
> I'll post a patch later if i get something working... Btw, do you have 
> any timings on how long it takes for the cards to get a lock on the link?
> 

Serdes link comes up very fast.  The clause 37 autoneg involves going
through a few states and exchanging base pages and optional next pages.
Each state completes in 10 msec, so the whole autoneg should complete in
roughly 100 msec or so.  But because it is done in software for these
older cards, it can take much longer because it can get out-of-sync. 



Re: tg3 issues

2007-07-20 Thread patric

Michael Chan wrote:

> On Thu, 2007-07-19 at 15:24 +0200, patric wrote:
>
> > Just a hypothetical question. If the 2 network cards starts the
> > autonegotiation would it be possible that they get into a loop where
> > they are chasing each others state?  Maybe a fix could be to add a sleep
> > of a random length that would enable them to catch up? Maybe you know if
> > any of the fiber-cards so support running without flowcontrol too since
> > the cards don't seem to be able to get a link with flowcontrol turned
> > off at least in this setup.
>
> The old 5701 fiber NICs do not support autonegotiation in hardware so it
> is done "by hand" in the driver.  It is not the most robust way of doing
> autoneg and what you described is totally possible.  You might want to
> try disabling autoneg to see if it works any better.  There is only one
> possible speed in fiber and autoneg is really only used to negotiate
> flow control.  Some switch ports will not link up if the link partner
> does not do autoneg though.
>
> You have to use ethtool in initrd to turn off autoneg or just modify the
> driver to disable autoneg.
  
Thanks, that confirms what I suspected. I think I'll dig into the code
instead and try to find some way of making it work a bit better, for my
setup at least.
I'll post a patch later if I get something working. By the way, do you
have any timings on how long it takes for the cards to get a lock on the
link?


Thanks for your input!

/Patric



Re: tg3 issues

2007-07-19 Thread Michael Chan
On Thu, 2007-07-19 at 15:24 +0200, patric wrote:

> 
> Just a hypothetical question. If the 2 network cards starts the 
> autonegotiation would it be possible that they get into a loop where 
> they are chasing each others state?  Maybe a fix could be to add a sleep 
> of a random length that would enable them to catch up? Maybe you know if 
> any of the fiber-cards so support running without flowcontrol too since 
> the cards don't seem to be able to get a link with flowcontrol turned 
> off at least in this setup.
> 
> 

The old 5701 fiber NICs do not support autonegotiation in hardware so it
is done "by hand" in the driver.  It is not the most robust way of doing
autoneg and what you described is totally possible.  You might want to
try disabling autoneg to see if it works any better.  There is only one
possible speed in fiber and autoneg is really only used to negotiate
flow control.  Some switch ports will not link up if the link partner
does not do autoneg though.

You have to use ethtool in initrd to turn off autoneg or just modify the
driver to disable autoneg.



Re: tg3 issues

2007-07-19 Thread patric

Neil Horman wrote:

> On Thu, Jul 19, 2007 at 04:49:13PM +0530, pradeep singh wrote:
>
> > CCing: netdev
> >
> > On 7/19/07, patric <[EMAIL PROTECTED]> wrote:
> >
> > > Hi,
> > >
> > > To start with, i'm not sure if this should go to the dev or user list,
> > > but i'll start here..
> > >
> > > I'm currently running a nfsroot via a Broadcom NetXtreme 1000-SX card
> > > (BCM5701) and i have a big problem with the tg3 drivers autonegotiation.
> > >
> > > The issue seems to be that when the kernel comes so far as it's trying
> > > to mount the boot the autonegotiation has not yet completed and then
> > > causes a panic since it cannot mount the nfsroot.
> > >
> > > From some debugging i have done here the issues seems to be related to
> > > the flowcontrol configuration, and just to make it a bit more fun it
> > > does work some of the time.. (around once every 5-10 attempts.)
> > >
> > > On the console it looks something like this when failing. (written from
> > > memory since i don't have netconsole enabled)
> > >
> > > tg3: eth0: Link is up at 1000 Mbps, full duplex.
> > > tg3: eth0: Flow control is off for TX and off for RX.
> > > IP-Config: Complete:
> > >  device=eth0, addr=192.168.1.10, mask=255.255.255.0,
> > > gw=255.255.255.255,
> > > host=amd, domain=, nis-domain=(none),
> > > bootserver=255.255.255.255, rootserver=192.168.1.1, rootpath=
> > > Root-NFS: unknown option: nolocks
> > > Looking up port of RPC 13/3 on 192.168.1.1
> > > rpcbind: server 192.168.1.1 not responding, timed out
> > >
> > > Root-NFS: Unable to get nfsd port number from server, using default
> > >
> > > Looking up port of RPC 13/3 on 192.168.1.1
> > > rpcbind: server 192.168.1.1 not responding, timed out
> > >
> > > Root-NFS: Unable to get nfsd port number from server, using default
> > >
> > > and so on until it panics...
>
> IIRC, there are two main problems in this typ of situation
>
> 1) Spanning tree convergence
> 2) Firmware initalization latency
>
> If you are running spanning tree on your network, it can take up to 2 minutes
> before your port will forward frames properly.  if you have the options
> available, disable spanning tree on the switch port connected to your host
> system, or at least enable portfast if it is an option.  That should fix any
> spanning tree issues you have
>
> If the tg3 card is just taking a long time to initalize, there is not too much
> you can do about it.  If your goal is to use nfs root, I would, instead of
> enabling nfs-root as a kernel config option, I would create an initramfs that:
> A) Brings up your NIC
> B) Mounts your nfs partition
> C) executes a switch_root or pivot_root operation
>
> That way you can calibrate a delay between steps (A) and (B) in your initramfs
> init script
>
> Regards
> Neil
  
Hi Neil, and thanks for your quick reply. Thanks also to Pradeep for
forwarding the question to the correct mailing list.

Well, I'm not using any switches, just a crossed cable between the
machines. I did notice that it seems to get a 'good link' more often when
cold-booting the client.
I have been thinking about using an initrd to get around the issue, but the
problem is that you never know how long the init will take, so there would
always have to be quite a big delay before the system can boot. But I
don't really think the issue is that the card takes a long time to
initialize, since it sometimes works without delay during a warm boot, and
the cards do report that they are up but then report different
flow-control states. Maybe I'll set the flow control statically in the
driver as a test, if I can figure out how this driver works. :)

Just a hypothetical question: if the two network cards start
autonegotiation, is it possible that they get into a loop where they are
chasing each other's state? Maybe a fix could be to add a sleep of random
length that would enable them to catch up. Also, do you know whether any of
the fiber cards support running without flow control? The cards don't seem
to be able to get a link with flow control turned off, at least in this
setup.



Regards,
Patric





Re: tg3 issues

2007-07-19 Thread Neil Horman
On Thu, Jul 19, 2007 at 04:49:13PM +0530, pradeep singh wrote:
> CCing: netdev
> 
> On 7/19/07, patric <[EMAIL PROTECTED]> wrote:
> >Hi,
> >
> >
> >To start with, i'm not sure if this should go to the dev or user list,
> >but i'll start here..
> >
> >
> >I'm currently running a nfsroot via a Broadcom NetXtreme 1000-SX card
> >(BCM5701) and i have a big problem with the tg3 drivers autonegotiation.
> >
> >The issue seems to be that when the kernel comes so far as it's trying
> >to mount the boot the autonegotiation has not yet completed and then
> >causes a panic since it cannot mount the nfsroot.
> >
> >
> > From some debugging i have done here the issues seems to be related to
> >the flowcontrol configuration, and just to make it a bit more fun it
> >does work some of the time.. (around once every 5-10 attempts.)
> >
> >
> >On the console it looks something like this when failing. (written from
> >memory since i don't have netconsole enabled)
> >
> >tg3: eth0: Link is up at 1000 Mbps, full duplex.
> >tg3: eth0: Flow control is off for TX and off for RX.
> >IP-Config: Complete:
> >  device=eth0, addr=192.168.1.10, mask=255.255.255.0,
> >gw=255.255.255.255,
> > host=amd, domain=, nis-domain=(none),
> > bootserver=255.255.255.255, rootserver=192.168.1.1, rootpath=
> >Root-NFS: unknown option: nolocks
> >Looking up port of RPC 13/3 on 192.168.1.1
> >rpcbind: server 192.168.1.1 not responding, timed out
> >
> >Root-NFS: Unable to get nfsd port number from server, using default
> >
> >Looking up port of RPC 13/3 on 192.168.1.1
> >rpcbind: server 192.168.1.1 not responding, timed out
> >
> >Root-NFS: Unable to get nfsd port number from server, using default
> >
> >
> >and so on until it panics...
> >

IIRC, there are two main problems in this type of situation:

1) Spanning tree convergence
2) Firmware initialization latency

If you are running spanning tree on your network, it can take up to 2 minutes
before your port will forward frames properly.  If you have the option
available, disable spanning tree on the switch port connected to your host
system, or at least enable portfast if it is an option.  That should fix any
spanning tree issues you have.

If the tg3 card is just taking a long time to initialize, there is not too much
you can do about it.  If your goal is to use nfs root, then instead of
enabling nfs-root as a kernel config option, I would create an initramfs that:
A) Brings up your NIC
B) Mounts your nfs partition
C) Executes a switch_root or pivot_root operation

That way you can calibrate a delay between steps (A) and (B) in your initramfs
init script.
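
One way to replace a fixed delay between steps (A) and (B) is to poll the link state with a timeout. The C sketch below is one possible helper for an initramfs, under assumptions: the names carrier_is_up and wait_for_carrier are hypothetical, and the caller passes a path such as /sys/class/net/eth0/carrier, which on Linux reads "1" when the link is up. As noted elsewhere in this thread, a reported link may not be enough on this hardware, so a data-level check may still be needed before mounting.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

/* Returns true if the file at path starts with '1'.
 * A missing or unreadable file counts as "link down". */
static bool carrier_is_up(const char *path)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return false;
    int c = fgetc(f);
    fclose(f);
    return c == '1';
}

/* Poll every interval_ms until the carrier is up or timeout_ms has passed.
 * Returns true as soon as the carrier is seen, false on timeout. */
static bool wait_for_carrier(const char *path,
                             unsigned timeout_ms, unsigned interval_ms)
{
    if (interval_ms == 0)
        interval_ms = 1;  /* avoid spinning forever without progress */

    for (unsigned waited = 0; ; waited += interval_ms) {
        if (carrier_is_up(path))
            return true;
        if (waited >= timeout_ms)
            return false;
        struct timespec ts = {
            .tv_sec = interval_ms / 1000,
            .tv_nsec = (long)(interval_ms % 1000) * 1000000L,
        };
        nanosleep(&ts, NULL);
    }
}
```

An init program could call wait_for_carrier("/sys/class/net/eth0/carrier", 30000, 100) after bringing the NIC up and only then attempt the NFS mount.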

Regards
Neil

-- 
/***
 *Neil Horman
 *Software Engineer
 *Red Hat, Inc.
 [EMAIL PROTECTED]
 *gpg keyid: 1024D / 0x92A74FA1
 *http://pgp.mit.edu
 ***/


Re: tg3 issues

2007-07-19 Thread pradeep singh

CCing: netdev

On 7/19/07, patric <[EMAIL PROTECTED]> wrote:

Hi,


To start with, I'm not sure if this should go to the dev or user list,
but I'll start here.

I'm currently running an nfsroot via a Broadcom NetXtreme 1000-SX card
(BCM5701) and I have a big problem with the tg3 driver's autonegotiation.

The issue seems to be that when the kernel gets as far as trying to mount
the root filesystem, the autonegotiation has not yet completed, which
causes a panic since it cannot mount the nfsroot.

From some debugging I have done here, the issue seems to be related to
the flow-control configuration, and just to make it a bit more fun, it
does work some of the time (around once every 5-10 attempts).

On the console it looks something like this when failing (written from
memory since I don't have netconsole enabled):

tg3: eth0: Link is up at 1000 Mbps, full duplex.
tg3: eth0: Flow control is off for TX and off for RX.
IP-Config: Complete:
  device=eth0, addr=192.168.1.10, mask=255.255.255.0, gw=255.255.255.255,
 host=amd, domain=, nis-domain=(none),
 bootserver=255.255.255.255, rootserver=192.168.1.1, rootpath=
Root-NFS: unknown option: nolocks
Looking up port of RPC 13/3 on 192.168.1.1
rpcbind: server 192.168.1.1 not responding, timed out

Root-NFS: Unable to get nfsd port number from server, using default

Looking up port of RPC 13/3 on 192.168.1.1
rpcbind: server 192.168.1.1 not responding, timed out

Root-NFS: Unable to get nfsd port number from server, using default


and so on until it panics...


And for a successful attempt:

tg3: eth0: Link is up at 1000 Mbps, full duplex.
tg3: eth0: Flow control is off for TX and off for RX.
IP-Config: Complete:
  device=eth0, addr=192.168.1.10, mask=255.255.255.0, gw=255.255.255.255,
 host=amd, domain=, nis-domain=(none),
 bootserver=255.255.255.255, rootserver=192.168.1.1, rootpath=
Root-NFS: unknown option: nolocks
Looking up port of RPC 13/3 on 192.168.1.1
tg3: eth0: Link is up at 1000 Mbps, full duplex.
tg3: eth0: Flow control is on for TX and on for RX.
rpcbind: server 192.168.1.1 not responding, timed out
Root-NFS: Unable to get nfsd port number from server, using default
Looking up port of RPC 15/3 on 192.168.1.1
VFS: Mounted root (nfs filesystem).


Also, if I get the "Flow control is on" message before it tries to mount
the root, it does not report any issues. This is also not unique to NFS
booting; it happens any time I bring the link up (when not using it for
booting).

Currently I'm running 2.6.22-git9 (for testing), and I experienced the
same issues in plain 2.6.22 and earlier.


Regards,

Patric










--
play the game