Re: [danie...@cisco.com: Re: gianfar: Implement MAC reset and reconfig procedure]

2018-10-19 Thread Daniel Walker
On Thu, Oct 18, 2018 at 04:49:26PM +, Claudiu Manoil wrote:
> 
> I can only advise you to check whether the MACCFG2 register settings are
> consistent at this point, when ping fails.  You should check the I/F Mode
> bits (22-23) and the Full Duplex bit (31), in big-endian format.  If these
> do not match the 100Mbps full duplex link mode, then it might be that
> another thread (probably doing reset_gfar) changes MACCFG2 concurrently.
> I think MACCFG2 may be dumped with ethtool -d.
> I can get my hands on a board no sooner than maybe next week.


What does the MACCFG2 register actually do? Is it connected to the PHY
somehow? I'm wondering because it seems like the gianfar driver is doing the
right things, and adjust_link() is getting called, etc. Something seems not to
tolerate the change from GMII to MII.

Daniel


Re: [danie...@cisco.com: Re: gianfar: Implement MAC reset and reconfig procedure]

2018-10-18 Thread Daniel Walker
On Thu, Oct 18, 2018 at 04:49:26PM +, Claudiu Manoil wrote:
> I can only advise you to check whether the MACCFG2 register settings are
> consistent at this point, when ping fails.  You should check the I/F Mode
> bits (22-23) and the Full Duplex bit (31), in big-endian format.  If these
> do not match the 100Mbps full duplex link mode, then it might be that
> another thread (probably doing reset_gfar) changes MACCFG2 concurrently.
> I think MACCFG2 may be dumped with ethtool -d.
> I can get my hands on a board no sooner than maybe next week.

A board won't help you; I'm running on customer hardware that you don't have
access to.

After boot-up, MACCFG2 = 0x7205, which is the same as the INIT settings.

After the interface is brought up, adjust_link() changes MACCFG2 to 0x7105,
which I think is MII.

0x7105 stays after the interface is brought down, until gfar_mac_reset() sets it
back to 0x7205 (GMII); then adjust_link() sets it to 0x7105 (MII) again.

That goes on and on each time the interface is brought down/up. It seems like
this is what you're expecting to happen, but it doesn't seem to work 100% of the
time.
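The MACCFG2 sequence described above can be modelled in userspace. This is a sketch under stated assumptions, not gianfar code: the masks (I/F Mode = 0x300, MII = 0x100, GMII = 0x200, Full Duplex = 0x1) are derived from the big-endian bit positions Claudiu quotes (22-23 and 31) and from the observed 0x7205/0x7105 values, and model_adjust_maccfg2() and model_up_cycle() are hypothetical stand-ins for the register update an adjust_link()-style callback performs.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values: big-endian bit n corresponds to LSB-0 bit (31 - n). */
#define MACCFG2_INIT_SETTINGS 0x7205u /* value read after boot-up (GMII, full duplex) */
#define MACCFG2_IF            0x0300u /* I/F Mode field, big-endian bits 22-23 */
#define MACCFG2_MII           0x0100u /* 10/100 Mbps, per the 0x7105 reading */
#define MACCFG2_GMII          0x0200u /* 1000 Mbps, per the 0x7205 reading */
#define MACCFG2_FULL_DUPLEX   0x0001u /* Full Duplex, big-endian bit 31 */

/* Hypothetical model of the MACCFG2 update done when phylib reports a new
 * link state: replace the I/F Mode field and set or clear Full Duplex,
 * leaving every other bit (preamble length, pad/CRC, ...) untouched. */
static uint32_t model_adjust_maccfg2(uint32_t val, int speed, int full_duplex)
{
    assert(speed == 10 || speed == 100 || speed == 1000);

    val &= ~MACCFG2_IF;
    val |= (speed == 1000) ? MACCFG2_GMII : MACCFG2_MII;

    if (full_duplex)
        val |= MACCFG2_FULL_DUPLEX;
    else
        val &= ~MACCFG2_FULL_DUPLEX;
    return val;
}

/* One down/up cycle as observed on the board: the reset rewrites the init
 * value (GMII), then the link callback switches back to MII at 100/Full. */
static uint32_t model_up_cycle(void)
{
    uint32_t maccfg2 = MACCFG2_INIT_SETTINGS;     /* gfar_mac_reset() */
    return model_adjust_maccfg2(maccfg2, 100, 1); /* adjust_link() */
}
```

Under these assumptions model_adjust_maccfg2(0x7205, 100, 1) yields 0x7105 and the same call at 1000 Mbps turns 0x7105 back into 0x7205, matching the values read on the board. In other words, the register values by themselves look consistent with a 100Mbps full duplex link once adjust_link() has run.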

Daniel


RE: [danie...@cisco.com: Re: gianfar: Implement MAC reset and reconfig procedure]

2018-10-18 Thread Claudiu Manoil
>-----Original Message-----
>From: Daniel Walker 
>Sent: Thursday, October 18, 2018 5:05 PM
>To: Claudiu Manoil 
>Cc: Hemant Ramdasi ; netdev@vger.kernel.org
>Subject: Re: [danie...@cisco.com: Re: gianfar: Implement MAC reset and
>reconfig procedure]
>
[...]
>

I can only advise you to check whether the MACCFG2 register settings are
consistent at this point, when ping fails.  You should check the I/F Mode bits
(22-23) and the Full Duplex bit (31), in big-endian format.  If these do not
match the 100Mbps full duplex link mode, then it might be that another thread
(probably doing reset_gfar) changes MACCFG2 concurrently.  I think MACCFG2 may
be dumped with ethtool -d.
I can get my hands on a board no sooner than maybe next week.
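To check the bits mentioned above in a value taken from a register dump, the big-endian bit numbers have to be translated into ordinary masks: big-endian bit n of a 32-bit register is bit 31 - n when counting from the LSB. A minimal sketch follows; the helper names are hypothetical, and the encoding of the I/F Mode field (01 = MII, 10 = GMII) is an assumption that matches Daniel's reading of 0x7105 as MII and 0x7205 as GMII.

```c
#include <assert.h>
#include <stdint.h>

/* Big-endian (bit 0 = MSB) bit number -> mask in LSB-0 numbering. */
static uint32_t be_bit_mask(int be_bit)
{
    assert(be_bit >= 0 && be_bit <= 31);
    return UINT32_C(1) << (31 - be_bit);
}

/* I/F Mode: big-endian bits 22-23, i.e. LSB-0 bits 9..8 (mask 0x300). */
static unsigned maccfg2_if_mode(uint32_t maccfg2)
{
    uint32_t mask = be_bit_mask(22) | be_bit_mask(23);
    return (unsigned)((maccfg2 & mask) >> (31 - 23));
}

/* Full Duplex: big-endian bit 31, i.e. LSB-0 bit 0. */
static int maccfg2_full_duplex(uint32_t maccfg2)
{
    return (maccfg2 & be_bit_mask(31)) != 0;
}

/* Does the value match a 100Mbps full duplex link?
 * Assumes I/F Mode 1 means MII (10/100), as described in the lead-in. */
static int maccfg2_is_100_full(uint32_t maccfg2)
{
    return maccfg2_if_mode(maccfg2) == 1 && maccfg2_full_duplex(maccfg2);
}
```

With these assumptions 0x7105 decodes as I/F Mode 1 with Full Duplex set (consistent with 100Mbps/Full), while 0x7205 decodes as I/F Mode 2 and would not match that link mode.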

-Claudiu


Re: [danie...@cisco.com: Re: gianfar: Implement MAC reset and reconfig procedure]

2018-10-18 Thread Daniel Walker
On Thu, Oct 18, 2018 at 12:16:06PM +, Claudiu Manoil wrote:
> Hi,
> 
> Sorry, but I've never heard of the PHY you're quoting, this m88e1101. What
> is it? Link mode? (SGMII, RGMII, ?)
> Our boards (the ones I know) have Vitesse or Atheros PHYs.
> If the maccfg2 setting you're mentioning really makes the difference, then it
> looks like your PHY enters 10/100 Mbit or half duplex operation mode after
> MAC reset, aka lower speed MII mode, whereas the INIT_SETTINGS set up the MAC
> to operate in 1000 full duplex mode (GMII mode) by default.
> Link speed settings for the MACCFG2 register should later be adjusted via the
> adjust_link() callback, so that if the initial maccfg2 settings don't match
> the PHY settings, they will be adjusted by phylib's adjust_link().  For some
> reason this doesn't seem to happen on your setup either.
> So, could you please confirm whether your PHY enters lower speed mode (MII)
> after MAC reset, and whether the adjust_link() callback is getting invoked
> after ifconfig up?

Here are some parts of the logs. I added a dump_stack() to adjust_link(). It
does appear to be running, but it seems it's not working, or not doing what you
think it should be doing. The signature of the issue is below: you bring up the
interface the first time and it works, then you bring it down/up and there is no
traffic. You can see that the second ping has 100% packet loss.

It seems the "Link is Up" lines indicate what adjust_link() changes.

IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
CPU: 0 PID: 24 Comm: kworker/0:1 Not tainted 3.14.0-rc3 #174
Workqueue: events_power_efficient phy_state_machine
Call Trace:
[e81ffdb0] [c0008718] show_stack+0xfc/0x1bc (unreliable)
[e81ffe00] [c0602168] dump_stack+0x78/0xa0
[e81ffe10] [c0437b20] adjust_link+0x30/0x2b0
[e81ffe50] [c0430f1c] phy_state_machine+0x428/0x47c
[e81ffe70] [c0060a84] process_one_work+0x158/0x3c4
[e81ffea0] [c0061120] worker_thread+0x138/0x384
[e81ffed0] [c0068714] kthread+0xd0/0xe4
[e81fff40] [c0011bc8] ret_from_kernel_thread+0x5c/0x64
CPU: 0 PID: 24 Comm: kworker/0:1 Not tainted 3.14.0-rc3 #174
Workqueue: events_power_efficient phy_state_machine
Call Trace:
[e81ffdb0] [c0008718] show_stack+0xfc/0x1bc (unreliable)
[e81ffe00] [c0602168] dump_stack+0x78/0xa0
[e81ffe10] [c0437b20] adjust_link+0x30/0x2b0
[e81ffe50] [c0430e60] phy_state_machine+0x36c/0x47c
[e81ffe70] [c0060a84] process_one_work+0x158/0x3c4
[e81ffea0] [c0061120] worker_thread+0x138/0x384
[e81ffed0] [c0068714] kthread+0xd0/0xe4
[e81fff40] [c0011bc8] ret_from_kernel_thread+0x5c/0x64
fsl-gianfar ff725000.ethernet eth1: Link is Up - 100Mbps/Full - flow control off
IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
PING 10.126.154.1 (10.126.154.1): 56 data bytes
64 bytes from 10.126.154.1: seq=0 ttl=255 time=5.606 ms

--- 10.126.154.1 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 5.606/5.606/5.606 ms
CPU: 0 PID: 24 Comm: kworker/0:1 Not tainted 3.14.0-rc3 #174
Workqueue: events_power_efficient phy_state_machine
Call Trace:
[e81ffdb0] [c0008718] show_stack+0xfc/0x1bc (unreliable)
[e81ffe00] [c0602168] dump_stack+0x78/0xa0
[e81ffe10] [c0437b20] adjust_link+0x30/0x2b0
[e81ffe50] [c0430f1c] phy_state_machine+0x428/0x47c
[e81ffe70] [c0060a84] process_one_work+0x158/0x3c4
[e81ffea0] [c0061120] worker_thread+0x138/0x384
[e81ffed0] [c0068714] kthread+0xd0/0xe4
[e81fff40] [c0011bc8] ret_from_kernel_thread+0x5c/0x64
CPU: 0 PID: 24 Comm: kworker/0:1 Not tainted 3.14.0-rc3 #174
Workqueue: events_power_efficient phy_state_machine
Call Trace:
[e81ffdb0] [c0008718] show_stack+0xfc/0x1bc (unreliable)
[e81ffe00] [c0602168] dump_stack+0x78/0xa0
[e81ffe10] [c0437b20] adjust_link+0x30/0x2b0
[e81ffe50] [c0430e60] phy_state_machine+0x36c/0x47c
[e81ffe70] [c0060a84] process_one_work+0x158/0x3c4
[e81ffea0] [c0061120] worker_thread+0x138/0x384
[e81ffed0] [c0068714] kthread+0xd0/0xe4
[e81fff40] [c0011bc8] ret_from_kernel_thread+0x5c/0x64
fsl-gianfar ff725000.ethernet eth1: Link is Up - 100Mbps/Full - flow control off
PING 10.126.154.1 (10.126.154.1): 56 data bytes

--- 10.126.154.1 ping statistics ---
1 packets transmitted, 0 packets received, 100% packet loss
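To turn logs like the one above into a pass rate over many down/up cycles (the 10% and 30% figures in the original report), the ping summary line can be parsed. A minimal sketch, assuming the summary format shown in this log (it looks like BusyBox ping); the struct and function names are hypothetical, and a cycle is counted as passing when at least one reply was received.

```c
#include <assert.h>
#include <stdio.h>

/* Parse a summary line such as
 * "1 packets transmitted, 0 packets received, 100% packet loss".
 * Returns 1 and fills tx/rx on success, 0 if the line doesn't match. */
static int parse_ping_summary(const char *line, int *tx, int *rx)
{
    return sscanf(line, "%d packets transmitted, %d packets received",
                  tx, rx) == 2;
}

/* Hypothetical tally over many interface down/up cycles. */
struct cycle_tally {
    int cycles; /* summary lines seen, one per ping run */
    int passed; /* runs with at least one reply */
};

/* Feed every log line; non-summary lines are ignored. */
static void tally_cycle(struct cycle_tally *t, const char *line)
{
    int tx, rx;

    if (!parse_ping_summary(line, &tx, &rx))
        return;
    t->cycles++;
    if (rx > 0)
        t->passed++;
}

static double pass_rate_percent(const struct cycle_tally *t)
{
    assert(t->cycles >= 0 && t->passed <= t->cycles);
    return t->cycles ? 100.0 * t->passed / t->cycles : 0.0;
}
```

Feeding the two ping runs from this log gives 2 cycles, 1 passed, i.e. a 50% pass rate; running the same tally over a long up/down loop would give numbers comparable to the per-version percentages in the report.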



Re: [danie...@cisco.com: Re: gianfar: Implement MAC reset and reconfig procedure]

2018-10-18 Thread Daniel Walker
On Thu, Oct 18, 2018 at 12:16:06PM +, Claudiu Manoil wrote:
> Hi,
> 
> Sorry, but I've never heard of the PHY you're quoting, this m88e1101. What
> is it? Link mode? (SGMII, RGMII, ?)
> Our boards (the ones I know) have Vitesse or Atheros PHYs.
> If the maccfg2 setting you're mentioning really makes the difference, then it
> looks like your PHY enters 10/100 Mbit or half duplex operation mode after
> MAC reset, aka lower speed MII mode, whereas the INIT_SETTINGS set up the MAC
> to operate in 1000 full duplex mode (GMII mode) by default.
> Link speed settings for the MACCFG2 register should later be adjusted via the
> adjust_link() callback, so that if the initial maccfg2 settings don't match
> the PHY settings, they will be adjusted by phylib's adjust_link().  For some
> reason this doesn't seem to happen on your setup either.
> So, could you please confirm whether your PHY enters lower speed mode (MII)
> after MAC reset, and whether the adjust_link() callback is getting invoked
> after ifconfig up?
> 


It's a Marvell PHY. This is not an eval board from NXP; it's custom hardware.
The link on this board is set up to run at 100Mbps. Here's a snippet of the logs
during a test run.

IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready

fsl-gianfar ff725000.ethernet eth1: Link is Up - 100Mbps/Full - flow control off
IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
PING 10.126.154.1 (10.126.154.1): 56 data bytes
64 bytes from 10.126.154.1: seq=0 ttl=255 time=2.101 ms

--- 10.126.154.1 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss


I can check whether adjust_link() is running. This kernel has only a few
changes to allow the hardware to work, all isolated under arch/powerpc/, and
certainly no changes under drivers/. So if it's supposed to be running, there is
no reason why it wouldn't be.

Daniel


Re: gianfar: Implement MAC reset and reconfig procedure

2018-10-17 Thread Daniel Walker
On Tue, Oct 16, 2018 at 03:03:07PM -0700, Florian Fainelli wrote:
> On 10/16/2018 02:36 PM, Daniel Walker wrote:
> > Hi,
> > 
> > I would like to report an issue in the gianfar driver. The issue is as follows.
> > 
> > We have a P2020 board that uses the gianfar driver, and we have an m88e1101
> > PHY connected. When the interface is initially brought up, traffic flows as
> > normal. If you take the interface down and then bring it back up, traffic
> > stops flowing. If you repeat this up/down/up sequence over and over, we find
> > that the interface allows traffic to flow only a small percentage of the time.
> > 
> > In v4.9 the interface allows traffic about 10% of the time.
> > 
> > In v4.19-rc8 the interface allows traffic 30% of the time.
> > 
> > After bisecting, I found that in v3.14 the interface was rock solid and we
> > never saw this issue; in v3.15, however, we started to see it. The bisect
> > identified the following change as the first one that causes the issue:
> > 
> > a328ac9 gianfar: Implement MAC reset and reconfig procedure
> > 
> > I was able to revert this in v3.15; however, with later development a revert
> > doesn't appear to be possible. We have no fix for this currently.
> > 
> > I can do testing if you have an idea what might cause the issue.
> 
> What we have typically seen to be the problem is that when you have a
> PHY connection whereby the PHY provides the RX clock to the MAC (e.g.
> RGMII), it is very easy to get into a situation where the PHY clock is
> stopped and the MAC is asked to be reset, but the HW design does not
> like that at all, since it e.g. stops on packet boundaries and needs some
> clock cycles to do that, and that results in all sorts of issues (in our
> case it was some FIFO corruption). We solved that in bcmgenet.c by
> internally looping the TX clock to the RX clock to make sure the
> Ethernet MAC (UniMAC in our designs) was successfully reset:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=28c2d1a7a0bfdf3617800d2beae1c67983c03d15
> 
> Could that somehow be the problem here?

A little more context on this issue after some debugging.


The patch I quote above adds a line to startup_gfar() that does:

gfar_mac_reset(priv);

If this line is removed, everything starts working again (this is debugging
at the v3.15 source level).

On further inspection, the block of code inside gfar_mac_reset() that causes the
problem is this one:

/* Initialize MACCFG2. */
tempval = MACCFG2_INIT_SETTINGS;
if (gfar_has_errata(priv, GFAR_ERRATA_74))
	tempval |= MACCFG2_HUGEFRAME | MACCFG2_LENGTHCHECK;
gfar_write(&regs->maccfg2, tempval);

If you change this block to this:

tempval = gfar_read(&regs->maccfg2);
if (gfar_has_errata(priv, GFAR_ERRATA_74))
	tempval |= MACCFG2_HUGEFRAME | MACCFG2_LENGTHCHECK;
gfar_write(&regs->maccfg2, tempval);

Then everything starts working.
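The difference between the two versions can be illustrated with a small userspace model, with a mock variable standing in for regs->maccfg2. This is only a sketch: the HUGEFRAME/LENGTHCHECK mask values are assumptions to be checked against gianfar.h, 0x7205 and 0x7105 are the values Daniel read on his board, and the function names are hypothetical. It shows overwrite versus read-modify-write; it is not offered as the fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MACCFG2_INIT_SETTINGS 0x7205u /* GMII, full duplex (as read after boot) */
#define MACCFG2_MII_100_FULL  0x7105u /* value after adjust_link() at 100/Full */
#define MACCFG2_HUGEFRAME     0x0020u /* assumed mask value */
#define MACCFG2_LENGTHCHECK   0x0010u /* assumed mask value */

/* Mock register standing in for regs->maccfg2 (userspace model only). */
static uint32_t mock_maccfg2;

/* v3.15 behaviour: overwrite with the init value on every MAC reset. */
static void model_reset_overwrite(bool errata_74)
{
    uint32_t tempval = MACCFG2_INIT_SETTINGS;

    if (errata_74)
        tempval |= MACCFG2_HUGEFRAME | MACCFG2_LENGTHCHECK;
    mock_maccfg2 = tempval;
}

/* Daniel's experiment: keep the current register contents (including the
 * I/F mode chosen by adjust_link()) and only OR in the errata bits. */
static void model_reset_preserve(bool errata_74)
{
    uint32_t tempval = mock_maccfg2;

    if (errata_74)
        tempval |= MACCFG2_HUGEFRAME | MACCFG2_LENGTHCHECK;
    mock_maccfg2 = tempval;
}
```

In the model, the overwriting reset puts an MII-configured MAC back into GMII mode until adjust_link() runs again, while the preserving variant never leaves MII. The trade-off is that the preserving variant also keeps any stale bits, and it relies on the register holding a sane value at first boot, which Daniel could only confirm for his own hardware.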

At least on my hardware, doing a gfar_read() when the hardware first comes up
doesn't cause any issues; however, I don't know about other hardware. It would
seem that MACCFG2_INIT_SETTINGS is not set up correctly or shouldn't be used in
this context.

Daniel


Re: gianfar: Implement MAC reset and reconfig procedure

2018-10-16 Thread Florian Fainelli
On 10/16/2018 02:36 PM, Daniel Walker wrote:
> Hi,
> 
> I would like to report an issue in the gianfar driver. The issue is as follows.
> 
> We have a P2020 board that uses the gianfar driver, and we have an m88e1101
> PHY connected. When the interface is initially brought up, traffic flows as
> normal. If you take the interface down and then bring it back up, traffic stops
> flowing. If you repeat this up/down/up sequence over and over, we find that the
> interface allows traffic to flow only a small percentage of the time.
> 
> In v4.9 the interface allows traffic about 10% of the time.
> 
> In v4.19-rc8 the interface allows traffic 30% of the time.
> 
> After bisecting, I found that in v3.14 the interface was rock solid and we
> never saw this issue; in v3.15, however, we started to see it. The bisect
> identified the following change as the first one that causes the issue:
> 
> a328ac9 gianfar: Implement MAC reset and reconfig procedure
> 
> I was able to revert this in v3.15; however, with later development a revert
> doesn't appear to be possible. We have no fix for this currently.
> 
> I can do testing if you have an idea what might cause the issue.

What we have typically seen to be the problem is that when you have a
PHY connection whereby the PHY provides the RX clock to the MAC (e.g.
RGMII), it is very easy to get into a situation where the PHY clock is
stopped and the MAC is asked to be reset, but the HW design does not
like that at all, since it e.g. stops on packet boundaries and needs some
clock cycles to do that, and that results in all sorts of issues (in our
case it was some FIFO corruption). We solved that in bcmgenet.c by
internally looping the TX clock to the RX clock to make sure the
Ethernet MAC (UniMAC in our designs) was successfully reset:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=28c2d1a7a0bfdf3617800d2beae1c67983c03d15

Could that somehow be the problem here?
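The failure mode described above can be illustrated with a toy model. Everything here is hypothetical: it is not bcmgenet or gianfar code, and real hardware is not modelled cycle-accurately. The MAC is assumed to need a few RX clock edges to finish the current frame before a reset completes, so a reset issued while the PHY's RX clock is stopped never finishes unless the TX clock is looped to RX for the duration of the reset.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed: RX clock edges the toy MAC needs to reach a packet boundary
 * and drain its FIFO before a soft reset can complete. */
#define RESET_EDGES_NEEDED 4

struct toy_mac {
    bool phy_rx_clk;        /* RX clock driven by the PHY (e.g. RGMII) */
    bool tx_to_rx_loopback; /* TX clock looped internally to RX */
};

/* Returns true if a reset completes within max_cycles clock periods. */
static bool toy_mac_reset(const struct toy_mac *mac, int max_cycles)
{
    int edges = 0;

    for (int i = 0; i < max_cycles; i++) {
        if (mac->phy_rx_clk || mac->tx_to_rx_loopback)
            edges++;
        if (edges >= RESET_EDGES_NEEDED)
            return true;
    }
    return false; /* reset stuck; real HW might end up with a corrupt FIFO */
}

/* Workaround in the spirit of the one described above: enable the
 * loopback only while resetting, then restore the previous setting. */
static bool toy_mac_reset_with_loopback(struct toy_mac *mac, int max_cycles)
{
    bool saved = mac->tx_to_rx_loopback;
    bool ok;

    mac->tx_to_rx_loopback = true;
    ok = toy_mac_reset(mac, max_cycles);
    mac->tx_to_rx_loopback = saved;
    return ok;
}
```

In the model, a plain reset with the PHY clock stopped never completes, while the looped-back reset does and leaves the loopback setting as it found it.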
-- 
Florian


Re: gianfar: Implement MAC reset and reconfig procedure

2018-10-16 Thread Daniel Walker
Hi,

I would like to report an issue in the gianfar driver. The issue is as follows.

We have a P2020 board that uses the gianfar driver, and we have an m88e1101
PHY connected. When the interface is initially brought up, traffic flows as
normal. If you take the interface down and then bring it back up, traffic stops
flowing. If you repeat this up/down/up sequence over and over, we find that the
interface allows traffic to flow only a small percentage of the time.

In v4.9 the interface allows traffic about 10% of the time.

In v4.19-rc8 the interface allows traffic 30% of the time.

After bisecting, I found that in v3.14 the interface was rock solid and we never
saw this issue; in v3.15, however, we started to see it. The bisect identified
the following change as the first one that causes the issue:

a328ac9 gianfar: Implement MAC reset and reconfig procedure

I was able to revert this in v3.15; however, with later development a revert
doesn't appear to be possible. We have no fix for this currently.

I can do testing if you have an idea what might cause the issue.

Daniel