Re: [danie...@cisco.com: Re: gianfar: Implement MAC reset and reconfig procedure]
On Thu, Oct 18, 2018 at 04:49:26PM +, Claudiu Manoil wrote:
>
> I can only advise you to check whether the MACCFG2 register settings are
> consistent at this point, when ping fails. You should check the I/F Mode
> bits (22-23) and the Full Duplex bit (31), in big-endian format. If these
> do not match the 100Mbps full duplex link mode, then it might be that
> another thread (probably doing reset_gfar) changes MACCFG2 concurrently.
> I think MACCFG2 may be dumped with ethtool -d.
> I can get my hands on a board no sooner than maybe next week.

What does the MACCFG2 register actually do? Is it connected to the PHY somehow?

I'm asking because the gianfar driver seems to be doing the right things:
adjust_link() is getting called, and so on. Something seems not to tolerate
the change from GMII to MII.

Daniel
Re: [danie...@cisco.com: Re: gianfar: Implement MAC reset and reconfig procedure]
On Thu, Oct 18, 2018 at 04:49:26PM +, Claudiu Manoil wrote:
> I can only advise you to check whether the MACCFG2 register settings are
> consistent at this point, when ping fails. You should check the I/F Mode
> bits (22-23) and the Full Duplex bit (31), in big-endian format. If these
> do not match the 100Mbps full duplex link mode, then it might be that
> another thread (probably doing reset_gfar) changes MACCFG2 concurrently.
> I think MACCFG2 may be dumped with ethtool -d.
> I can get my hands on a board no sooner than maybe next week.

A board won't help you. I'm running on customer hardware which you don't have
access to.

After boot up, MACCFG2 = 0x7205, which is the same as the INIT settings. After
the interface is brought up, adjust_link() changes it to MACCFG2 = 0x7105,
which I think is MII. 0x7105 stays after the interface is brought down, until
gfar_mac_reset() sets it back to 0x7205 (GMII). Then adjust_link() resets it to
0x7105 (MII). That goes on and on each time the interface is brought down/up.

It seems like this is what you're expecting to happen, but it doesn't seem to
work 100% of the time.

Daniel
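
For readers following the register values, here is a minimal sketch that decodes the two MACCFG2 values quoted above. The masks are derived from the field positions given in this thread (I/F Mode in big-endian bits 22-23, Full Duplex in big-endian bit 31 of a 32-bit register). The macro and function names are hypothetical and are not taken from the driver; the field encoding (1 = MII, 2 = GMII) is an assumption that should be checked against the eTSEC reference manual.

```c
#include <stdint.h>

/* Assumed masks (hypothetical names). Big-endian bits 22-23 and bit 31 of a
 * 32-bit register correspond to 0x300 and 0x1 in the usual LSB-0 notation. */
#define IF_MODE_MASK   0x00000300u
#define IF_MODE_SHIFT  8
#define FULL_DUPLEX    0x00000001u

/* Return the 2-bit I/F Mode field (assumed: 1 = MII, 2 = GMII). */
static unsigned int maccfg2_if_mode(uint32_t maccfg2)
{
	return (maccfg2 & IF_MODE_MASK) >> IF_MODE_SHIFT;
}

/* Return 1 if the Full Duplex bit is set, 0 otherwise. */
static int maccfg2_full_duplex(uint32_t maccfg2)
{
	return (maccfg2 & FULL_DUPLEX) != 0;
}

/* Human-readable name for the I/F Mode field. */
static const char *maccfg2_if_mode_name(uint32_t maccfg2)
{
	switch (maccfg2_if_mode(maccfg2)) {
	case 1:
		return "MII";
	case 2:
		return "GMII";
	default:
		return "reserved";
	}
}
```

Under these assumptions, 0x7205 decodes to GMII with full duplex and 0x7105 decodes to MII with full duplex, which matches the interpretation given in the message above: the two values differ only in the I/F Mode field.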
RE: [danie...@cisco.com: Re: gianfar: Implement MAC reset and reconfig procedure]
>-----Original Message-----
>From: Daniel Walker
>Sent: Thursday, October 18, 2018 5:05 PM
>To: Claudiu Manoil
>Cc: Hemant Ramdasi; netdev@vger.kernel.org
>Subject: Re: [danie...@cisco.com: Re: gianfar: Implement MAC reset and
>reconfig procedure]
>
[...]
>
>Here's some parts of the logs. I added a dump_stack() into adjust_link(). It
>does appear to be running, but it seems it's not working or not doing what you
>think it should be doing. The signature of the issue is below, you bring up the
>interface the first time and it works, then bring it down/up and no traffic.
>You can see in the second ping there is %100 packet loss.
>
>Seems the "Link is Up" lines indicate what adjust_link() changes.
>
>IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
>CPU: 0 PID: 24 Comm: kworker/0:1 Not tainted 3.14.0-rc3 #174
>Workqueue: events_power_efficient phy_state_machine
>Call Trace:
>[e81ffdb0] [c0008718] show_stack+0xfc/0x1bc (unreliable)
>[e81ffe00] [c0602168] dump_stack+0x78/0xa0
>[e81ffe10] [c0437b20] adjust_link+0x30/0x2b0
>[e81ffe50] [c0430f1c] phy_state_machine+0x428/0x47c
>[e81ffe70] [c0060a84] process_one_work+0x158/0x3c4
>[e81ffea0] [c0061120] worker_thread+0x138/0x384
>[e81ffed0] [c0068714] kthread+0xd0/0xe4
>[e81fff40] [c0011bc8] ret_from_kernel_thread+0x5c/0x64
>CPU: 0 PID: 24 Comm: kworker/0:1 Not tainted 3.14.0-rc3 #174
>Workqueue: events_power_efficient phy_state_machine
>Call Trace:
>[e81ffdb0] [c0008718] show_stack+0xfc/0x1bc (unreliable)
>[e81ffe00] [c0602168] dump_stack+0x78/0xa0
>[e81ffe10] [c0437b20] adjust_link+0x30/0x2b0
>[e81ffe50] [c0430e60] phy_state_machine+0x36c/0x47c
>[e81ffe70] [c0060a84] process_one_work+0x158/0x3c4
>[e81ffea0] [c0061120] worker_thread+0x138/0x384
>[e81ffed0] [c0068714] kthread+0xd0/0xe4
>[e81fff40] [c0011bc8] ret_from_kernel_thread+0x5c/0x64
>fsl-gianfar ff725000.ethernet eth1: Link is Up - 100Mbps/Full - flow control off
>IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
>PING 10.126.154.1 (10.126.154.1): 56 data bytes
>64 bytes from 10.126.154.1: seq=0 ttl=255 time=5.606 ms
>
>--- 10.126.154.1 ping statistics ---
>1 packets transmitted, 1 packets received, 0% packet loss
>round-trip min/avg/max = 5.606/5.606/5.606 ms
>CPU: 0 PID: 24 Comm: kworker/0:1 Not tainted 3.14.0-rc3 #174
>Workqueue: events_power_efficient phy_state_machine
>Call Trace:
>[e81ffdb0] [c0008718] show_stack+0xfc/0x1bc (unreliable)
>[e81ffe00] [c0602168] dump_stack+0x78/0xa0
>[e81ffe10] [c0437b20] adjust_link+0x30/0x2b0
>[e81ffe50] [c0430f1c] phy_state_machine+0x428/0x47c
>[e81ffe70] [c0060a84] process_one_work+0x158/0x3c4
>[e81ffea0] [c0061120] worker_thread+0x138/0x384
>[e81ffed0] [c0068714] kthread+0xd0/0xe4
>[e81fff40] [c0011bc8] ret_from_kernel_thread+0x5c/0x64
>CPU: 0 PID: 24 Comm: kworker/0:1 Not tainted 3.14.0-rc3 #174
>Workqueue: events_power_efficient phy_state_machine
>Call Trace:
>[e81ffdb0] [c0008718] show_stack+0xfc/0x1bc (unreliable)
>[e81ffe00] [c0602168] dump_stack+0x78/0xa0
>[e81ffe10] [c0437b20] adjust_link+0x30/0x2b0
>[e81ffe50] [c0430e60] phy_state_machine+0x36c/0x47c
>[e81ffe70] [c0060a84] process_one_work+0x158/0x3c4
>[e81ffea0] [c0061120] worker_thread+0x138/0x384
>[e81ffed0] [c0068714] kthread+0xd0/0xe4
>[e81fff40] [c0011bc8] ret_from_kernel_thread+0x5c/0x64
>fsl-gianfar ff725000.ethernet eth1: Link is Up - 100Mbps/Full - flow control off
>PING 10.126.154.1 (10.126.154.1): 56 data bytes
>
>--- 10.126.154.1 ping statistics ---
>1 packets transmitted, 0 packets received, 100% packet loss

I can only advise you to check whether the MACCFG2 register settings are
consistent at this point, when ping fails. You should check the I/F Mode bits
(22-23) and the Full Duplex bit (31), in big-endian format. If these do not
match the 100Mbps full duplex link mode, then it might be that another thread
(probably doing reset_gfar) changes MACCFG2 concurrently.
I think MACCFG2 may be dumped with ethtool -d.
I can get my hands on a board no sooner than maybe next week.

-Claudiu
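
The bit positions in the advice above use big-endian numbering, where bit 0 is the most significant bit of the 32-bit register. A minimal sketch of how those positions translate into ordinary masks, and how the suggested consistency check could look, follows. All function names are hypothetical, and the assumption that an I/F Mode value of binary 01 (big-endian bit 23 set, bit 22 clear) means MII for a 10/100 link should be verified against the hardware manual.

```c
#include <stdint.h>

/* Mask for one bit given in big-endian numbering (bit 0 = MSB)
 * of a 32-bit register. Valid for n in 0..31. */
static uint32_t be_bit(unsigned int n)
{
	return UINT32_C(1) << (31u - n);
}

/* Mask covering big-endian bits first..last inclusive
 * (requires first <= last <= 31). */
static uint32_t be_field(unsigned int first, unsigned int last)
{
	uint32_t mask = 0;
	unsigned int n;

	for (n = first; n <= last; n++)
		mask |= be_bit(n);
	return mask;
}

/* Hypothetical check: is this MACCFG2 value consistent with a
 * 100 Mbps full-duplex link? Assumes I/F Mode == binary 01 means MII. */
static int maccfg2_matches_100_full(uint32_t maccfg2)
{
	uint32_t if_mode = maccfg2 & be_field(22, 23);

	return if_mode == be_bit(23) && (maccfg2 & be_bit(31)) != 0;
}
```

With this numbering, bits 22-23 become the mask 0x300 and bit 31 becomes 0x1, so a dumped value can be tested directly, for example `maccfg2_matches_100_full(0x7105)`.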
Re: [danie...@cisco.com: Re: gianfar: Implement MAC reset and reconfig procedure]
On Thu, Oct 18, 2018 at 12:16:06PM +, Claudiu Manoil wrote:
> Hi,
>
> Sorry but I never heard about the phy you're quoting, this m88e1101, what is it?
> Link mode? (SGMII, RGMII, ?)
> Our boards (the ones I know) have Vitesse or Atheros phys.
> If the maccfg2 setting you're mentioning really makes the difference, then it looks
> like your phy enters in 10/100 Mbit or half duplex operation mode after MAC reset,
> aka lower speed MII mode, whereas the INIT_SETTINGS set up the MAC to operate
> in 1000 full duplex mode (GMII mode) by default.
> Link speed settings for the MACCFG2 register should be later adjusted via adjust_link() callback,
> so that if the initial maccfg2 settings don't match with the phy settings they will be adjusted
> by phylib's adjust_link(). For some reason this doesn't seem to happen on your setup either.
> So, could you please confirm whether after MAC reset your phy enters lower speed mode (MII),
> and whether the adjust_link() callback is getting invoked after ifconfig up?

Here are some parts of the logs. I added a dump_stack() to adjust_link(). It
does appear to be running, but it seems it's not working, or not doing what you
think it should be doing. The signature of the issue is below: you bring up the
interface the first time and it works, then bring it down/up and there is no
traffic. You can see in the second ping there is 100% packet loss.

It seems the "Link is Up" lines indicate what adjust_link() changes.

IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
CPU: 0 PID: 24 Comm: kworker/0:1 Not tainted 3.14.0-rc3 #174
Workqueue: events_power_efficient phy_state_machine
Call Trace:
[e81ffdb0] [c0008718] show_stack+0xfc/0x1bc (unreliable)
[e81ffe00] [c0602168] dump_stack+0x78/0xa0
[e81ffe10] [c0437b20] adjust_link+0x30/0x2b0
[e81ffe50] [c0430f1c] phy_state_machine+0x428/0x47c
[e81ffe70] [c0060a84] process_one_work+0x158/0x3c4
[e81ffea0] [c0061120] worker_thread+0x138/0x384
[e81ffed0] [c0068714] kthread+0xd0/0xe4
[e81fff40] [c0011bc8] ret_from_kernel_thread+0x5c/0x64
CPU: 0 PID: 24 Comm: kworker/0:1 Not tainted 3.14.0-rc3 #174
Workqueue: events_power_efficient phy_state_machine
Call Trace:
[e81ffdb0] [c0008718] show_stack+0xfc/0x1bc (unreliable)
[e81ffe00] [c0602168] dump_stack+0x78/0xa0
[e81ffe10] [c0437b20] adjust_link+0x30/0x2b0
[e81ffe50] [c0430e60] phy_state_machine+0x36c/0x47c
[e81ffe70] [c0060a84] process_one_work+0x158/0x3c4
[e81ffea0] [c0061120] worker_thread+0x138/0x384
[e81ffed0] [c0068714] kthread+0xd0/0xe4
[e81fff40] [c0011bc8] ret_from_kernel_thread+0x5c/0x64
fsl-gianfar ff725000.ethernet eth1: Link is Up - 100Mbps/Full - flow control off
IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
PING 10.126.154.1 (10.126.154.1): 56 data bytes
64 bytes from 10.126.154.1: seq=0 ttl=255 time=5.606 ms

--- 10.126.154.1 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 5.606/5.606/5.606 ms
CPU: 0 PID: 24 Comm: kworker/0:1 Not tainted 3.14.0-rc3 #174
Workqueue: events_power_efficient phy_state_machine
Call Trace:
[e81ffdb0] [c0008718] show_stack+0xfc/0x1bc (unreliable)
[e81ffe00] [c0602168] dump_stack+0x78/0xa0
[e81ffe10] [c0437b20] adjust_link+0x30/0x2b0
[e81ffe50] [c0430f1c] phy_state_machine+0x428/0x47c
[e81ffe70] [c0060a84] process_one_work+0x158/0x3c4
[e81ffea0] [c0061120] worker_thread+0x138/0x384
[e81ffed0] [c0068714] kthread+0xd0/0xe4
[e81fff40] [c0011bc8] ret_from_kernel_thread+0x5c/0x64
CPU: 0 PID: 24 Comm: kworker/0:1 Not tainted 3.14.0-rc3 #174
Workqueue: events_power_efficient phy_state_machine
Call Trace:
[e81ffdb0] [c0008718] show_stack+0xfc/0x1bc (unreliable)
[e81ffe00] [c0602168] dump_stack+0x78/0xa0
[e81ffe10] [c0437b20] adjust_link+0x30/0x2b0
[e81ffe50] [c0430e60] phy_state_machine+0x36c/0x47c
[e81ffe70] [c0060a84] process_one_work+0x158/0x3c4
[e81ffea0] [c0061120] worker_thread+0x138/0x384
[e81ffed0] [c0068714] kthread+0xd0/0xe4
[e81fff40] [c0011bc8] ret_from_kernel_thread+0x5c/0x64
fsl-gianfar ff725000.ethernet eth1: Link is Up - 100Mbps/Full - flow control off
PING 10.126.154.1 (10.126.154.1): 56 data bytes

--- 10.126.154.1 ping statistics ---
1 packets transmitted, 0 packets received, 100% packet loss
Re: [danie...@cisco.com: Re: gianfar: Implement MAC reset and reconfig procedure]
On Thu, Oct 18, 2018 at 12:16:06PM +, Claudiu Manoil wrote:
> Hi,
>
> Sorry but I never heard about the phy you're quoting, this m88e1101, what is it?
> Link mode? (SGMII, RGMII, ?)
> Our boards (the ones I know) have Vitesse or Atheros phys.
> If the maccfg2 setting you're mentioning really makes the difference, then it looks
> like your phy enters in 10/100 Mbit or half duplex operation mode after MAC reset,
> aka lower speed MII mode, whereas the INIT_SETTINGS set up the MAC to operate
> in 1000 full duplex mode (GMII mode) by default.
> Link speed settings for the MACCFG2 register should be later adjusted via adjust_link() callback,
> so that if the initial maccfg2 settings don't match with the phy settings they will be adjusted
> by phylib's adjust_link(). For some reason this doesn't seem to happen on your setup either.
> So, could you please confirm whether after MAC reset your phy enters lower speed mode (MII),
> and whether the adjust_link() callback is getting invoked after ifconfig up?
>

It's a Marvell PHY. This is not an eval board from NXP; it's custom hardware.
The link on this board is set up to run at 100Mbps.

Here's a snippet of the logs during a test run.

IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
fsl-gianfar ff725000.ethernet eth1: Link is Up - 100Mbps/Full - flow control off
IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
PING 10.126.154.1 (10.126.154.1): 56 data bytes
64 bytes from 10.126.154.1: seq=0 ttl=255 time=2.101 ms

--- 10.126.154.1 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss

I can check whether adjust_link() is running. This kernel has only a very few
changes to allow the hardware to work, all isolated under arch/powerpc/, and
certainly no changes under drivers/. So if it's supposed to be running, there
is no reason why it wouldn't be.

Daniel
Re: gianfar: Implement MAC reset and reconfig procedure
On Tue, Oct 16, 2018 at 03:03:07PM -0700, Florian Fainelli wrote:
> On 10/16/2018 02:36 PM, Daniel Walker wrote:
> > Hi,
> >
> > I would like to report an issue in the gianfar driver. The issue is as
> > follows.
> >
> > We have a P2020 board that uses the gianfar driver, and we have a m88e1101
> > PHY connect. When the interface is initially brought up traffic flows as
> > normal. If you take the interface down then bring it back up traffic stops
> > flowing. If you do this sequence over and over up/down/up we find that the
> > interface will allow traffic to flow at a low percentage.
> >
> > In v4.9 interface allows traffic about %10 of the time.
> >
> > In v4.19-rc8 the allows traffic %30 of the time.
> >
> > After bisecting I found that in v3.14 the interface was rock solid and
> > never did we see this issue. However, v3.15 we started to see this issue.
> > After bisecting I found the following change is the first one which causes
> > the issue,
> >
> > a328ac9 gianfar: Implement MAC reset and reconfig procedure
> >
> > I was able to revert this in v3.15 , however with later development a revert
> > doesn't appear to be possible. We have no fix for this currently.
> >
> > I can do testing if you have an idea what might cause the issue.
>
> What we have seen being typically the problem is that when you have a
> PHY connection whereby the PHY provides the RX clock to the MAC (e.g:
> RGMII), it is very easy to get in a situation where the PHY clock is
> stopped, and the MAC is asked to be reset, but the HW design does not
> like that at all since it e.g: stops on packet boundaries and need some
> clock cycles to do that, and that results in all sorts of issues (in our
> case it was some FIFO corruption). We solved that in bcmgenet.c with
> looping internally the TX clock to the RX clock to make sure the
> Ethernet MAC (UniMAC in our designs) was successfully reset:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=28c2d1a7a0bfdf3617800d2beae1c67983c03d15
>
> Could that somehow be the problem here?

A little more context on this issue after some debugging. The patch I quote
above adds a line to startup_gfar() which does,

	gfar_mac_reset(priv);

If this line is removed then everything starts working again (this is debugging
at the v3.15 source level). On further inspection, the block of code inside
gfar_mac_reset() that causes the problem is this one,

	/* Initialize MACCFG2. */
	tempval = MACCFG2_INIT_SETTINGS;
	if (gfar_has_errata(priv, GFAR_ERRATA_74))
		tempval |= MACCFG2_HUGEFRAME | MACCFG2_LENGTHCHECK;
	gfar_write(&regs->maccfg2, tempval);

and if you change this block to this,

	tempval = gfar_read(&regs->maccfg2);
	if (gfar_has_errata(priv, GFAR_ERRATA_74))
		tempval |= MACCFG2_HUGEFRAME | MACCFG2_LENGTHCHECK;
	gfar_write(&regs->maccfg2, tempval);

then everything starts working. On my hardware, at least, using gfar_read()
when the hardware first comes up doesn't cause any issues; I don't know about
other hardware.

It would seem that MACCFG2_INIT_SETTINGS is not set up correctly, or shouldn't
be used in this context.

Daniel
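
To make the difference between the two blocks concrete, here is a small user-space model of what each variant leaves in MACCFG2 across a reset. This is a sketch, not driver code: the register is passed in as a plain integer, the function names are hypothetical, MACCFG2_INIT_SETTINGS uses the 0x7205 value reported elsewhere in this thread, and the HUGEFRAME and LENGTHCHECK values are assumptions that should be checked against gianfar.h.

```c
#include <stdint.h>

#define MACCFG2_INIT_SETTINGS 0x00007205u /* value reported in this thread */
#define MACCFG2_HUGEFRAME     0x00000020u /* assumed; verify in gianfar.h */
#define MACCFG2_LENGTHCHECK   0x00000010u /* assumed; verify in gianfar.h */

/* Models the block as it stands after the commit: the previous register
 * contents are discarded and the init settings (GMII) are written. */
static uint32_t maccfg2_reset_from_init(uint32_t current, int has_errata_74)
{
	uint32_t tempval = MACCFG2_INIT_SETTINGS;

	(void)current; /* previous value is intentionally ignored */
	if (has_errata_74)
		tempval |= MACCFG2_HUGEFRAME | MACCFG2_LENGTHCHECK;
	return tempval;
}

/* Models the experimental change: start from the current register value,
 * so whatever adjust_link() last programmed is kept. */
static uint32_t maccfg2_reset_from_readback(uint32_t current, int has_errata_74)
{
	uint32_t tempval = current;

	if (has_errata_74)
		tempval |= MACCFG2_HUGEFRAME | MACCFG2_LENGTHCHECK;
	return tempval;
}
```

Starting from 0x7105 (the value observed after adjust_link() on a 100Mbps link), the first variant yields 0x7205 and the second leaves 0x7105 in place. The model only shows which bits change; it does not explain why the hardware fails to recover when adjust_link() later rewrites the register.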
Re: gianfar: Implement MAC reset and reconfig procedure
On 10/16/2018 02:36 PM, Daniel Walker wrote:
> Hi,
>
> I would like to report an issue in the gianfar driver. The issue is as
> follows.
>
> We have a P2020 board that uses the gianfar driver, and we have a m88e1101
> PHY connect. When the interface is initially brought up traffic flows as
> normal. If you take the interface down then bring it back up traffic stops
> flowing. If you do this sequence over and over up/down/up we find that the
> interface will allow traffic to flow at a low percentage.
>
> In v4.9 interface allows traffic about %10 of the time.
>
> In v4.19-rc8 the allows traffic %30 of the time.
>
> After bisecting I found that in v3.14 the interface was rock solid and never
> did we see this issue. However, v3.15 we started to see this issue. After
> bisecting I found the following change is the first one which causes the
> issue,
>
> a328ac9 gianfar: Implement MAC reset and reconfig procedure
>
> I was able to revert this in v3.15 , however with later development a revert
> doesn't appear to be possible. We have no fix for this currently.
>
> I can do testing if you have an idea what might cause the issue.

What we have typically seen to be the problem is this: when you have a PHY
connection whereby the PHY provides the RX clock to the MAC (e.g. RGMII), it
is very easy to get into a situation where the PHY clock is stopped and the
MAC is asked to be reset. The HW design does not like that at all, since it
e.g. stops on packet boundaries and needs some clock cycles to do that, and
that results in all sorts of issues (in our case it was some FIFO corruption).
We solved that in bcmgenet.c by internally looping the TX clock to the RX
clock to make sure the Ethernet MAC (UniMAC in our designs) was successfully
reset:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=28c2d1a7a0bfdf3617800d2beae1c67983c03d15

Could that somehow be the problem here?
--
Florian
Re: gianfar: Implement MAC reset and reconfig procedure
Hi,

I would like to report an issue in the gianfar driver. The issue is as
follows.

We have a P2020 board that uses the gianfar driver, with a m88e1101 PHY
connected. When the interface is initially brought up, traffic flows as
normal. If you take the interface down and then bring it back up, traffic
stops flowing. If you repeat this up/down/up sequence over and over, the
interface allows traffic to flow only a low percentage of the time.

In v4.9 the interface allows traffic about 10% of the time.

In v4.19-rc8 the interface allows traffic about 30% of the time.

After bisecting, I found that in v3.14 the interface was rock solid and we
never saw this issue. In v3.15, however, we started to see it. The bisection
shows that the following change is the first one which causes the issue,

a328ac9 gianfar: Implement MAC reset and reconfig procedure

I was able to revert this in v3.15; however, with later development a revert
doesn't appear to be possible. We have no fix for this currently.

I can do testing if you have an idea what might cause the issue.

Daniel