Re: [PATCH net v2] net: phy: Correctly process PHY_HALTED in phy_stop_machine()

2017-08-30 Thread Florian Fainelli
On 08/30/2017 05:13 PM, David Daney wrote:
> On 07/31/2017 05:28 PM, David Miller wrote:
>> From: Florian Fainelli 
>> Date: Fri, 28 Jul 2017 11:58:36 -0700
>>
>>> Marc reported that he was not getting the PHY library adjust_link()
>>> callback function to run when calling phy_stop() + phy_disconnect()
>>> which does not indeed happen because we set the state machine to
>>> PHY_HALTED but we don't get to run it to process this state past that
>>> point.
>>>
>>> Fix this with a synchronous call to phy_state_machine() in order to have
>>> the state machine actually act on PHY_HALTED, set the PHY device's link
>>> down, turn the network device's carrier off and finally call the
>>> adjust_link() function.
>>>
>>> Reported-by: Marc Gonzalez 
>>> Fixes: a390d1f379cf ("phylib: convert state_queue work to delayed_work")
>>> Signed-off-by: Florian Fainelli 
>>> ---
>>> Changes in v2:
>>>
>>> - reword subject and commit message based on changes
>>> - dropped flush_scheduled_work() since it is redundant
>>
>> Applied and queued up for -stable, thanks.
>>
> 
> 
> This is broken.  Please revert.
> 
> Upstream commit 7ad813f20853 and in the stable branches as well.
> 
> When ndo_stop() is called we call:
> 
> 
>  phy_disconnect()
> +---> phy_stop_interrupts() implies: phydev->irq = PHY_POLL;
> +---> phy_stop_machine()
> |  +---> phy_stop_machine()
> |  +> queue_delayed_work(): Work queued.
> +--->phy_detach() implies: phydev->attached_dev = NULL;
> 
> Now at a later time the queued work does:
> 
>  phy_state_machine()
> +>netif_carrier_off(phydev->attached_dev): Oh no! It is NULL:

How about the following instead of a revert (which I have queued locally
as well along with your correct call graph). This still would not fix
Geert's problem where with this change, we do actually call back into
adjust_link after a phy_stop() which may be problematic for him so I
think the revert is just easier and Marc, we'll figure out something for
nb8800?

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index d0626bf5c540..78168e19bd5d 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -1234,7 +1234,7 @@ void phy_state_machine(struct work_struct *work)
 * PHY, if PHY_IGNORE_INTERRUPT is set, then we will be moving
 * between states from phy_mac_interrupt()
 */
-   if (phydev->irq == PHY_POLL)
+   if (phydev->irq == PHY_POLL && phydev->state != PHY_HALTED)
queue_delayed_work(system_power_efficient_wq,
>state_queue,
   PHY_STATE_TIME * HZ);
 }
> 
> 
>  CPU 12 Unable to handle kernel paging request at virtual address
> 0048, epc == 80de37ec, ra == 80c7c
> Oops[#1]:
> CPU: 12 PID: 1502 Comm: kworker/12:1 Not tainted 4.9.43-Cavium-Octeon+ #1
> Workqueue: events_power_efficient phy_state_machine
> task: 8004021ed100 task.stack: 800409d7
> $ 0   :  84720060 0048 0004
> $ 4   :  0001 0004 
> $ 8   :   98f3 
> $12   : 800409d73fe0 9c00 846547c8 af3b
> $16   : 8004096bab68 8004096babd0  8004096ba800
> $20   :   8109 0008
> $24   : 0061 808637b0
> $28   : 800409d7 800409d73cf0 8000271bd300 80c7804c
> Hi: 002a
> Lo: 003f
> epc   : 80de37ec netif_carrier_off+0xc/0x58
> ra: 80c7804c phy_state_machine+0x48c/0x4f8
> Status: 14009ce3KX SX UX KERNEL EXL IE
> Cause : 0088 (ExcCode 02)
> BadVA : 0048
> PrId  : 000d9501 (Cavium Octeon III)
> Modules linked in:
> Process kworker/12:1 (pid: 1502, threadinfo=800409d7,
> task=8004021ed100, tls=)
> Stack : 800409a54000 8004096bab68 8000271bd300 8000271c1e00
>  808a1708 800409a54000 8000271bd300
> 8000271bd320 800409a54030 80ff0f00 0001
> 8109 808a1ac0 800402182080 8465
> 800402182080 8465 80ff 800409a54000
> 808a1970  8004099e8000 800402099240
>  808a8598  800408eeeb00
> 800409a54000 810a1d00  800409d73de8
> 800409d73de8 0088 0c009c00 800409d73e08
> 800409d73e08 800402182080 808a84d0 800402182080
> ...
> Call Trace:
> [] netif_carrier_off+0xc/0x58
> [] phy_state_machine+0x48c/0x4f8
> [] process_one_work+0x158/0x368
> [] worker_thread+0x150/0x4c0
> [] 

Re: [PATCH net v2] net: phy: Correctly process PHY_HALTED in phy_stop_machine()

2017-08-30 Thread Florian Fainelli
On 08/30/2017 05:13 PM, David Daney wrote:
> On 07/31/2017 05:28 PM, David Miller wrote:
>> From: Florian Fainelli 
>> Date: Fri, 28 Jul 2017 11:58:36 -0700
>>
>>> Marc reported that he was not getting the PHY library adjust_link()
>>> callback function to run when calling phy_stop() + phy_disconnect()
>>> which does not indeed happen because we set the state machine to
>>> PHY_HALTED but we don't get to run it to process this state past that
>>> point.
>>>
>>> Fix this with a synchronous call to phy_state_machine() in order to have
>>> the state machine actually act on PHY_HALTED, set the PHY device's link
>>> down, turn the network device's carrier off and finally call the
>>> adjust_link() function.
>>>
>>> Reported-by: Marc Gonzalez 
>>> Fixes: a390d1f379cf ("phylib: convert state_queue work to delayed_work")
>>> Signed-off-by: Florian Fainelli 
>>> ---
>>> Changes in v2:
>>>
>>> - reword subject and commit message based on changes
>>> - dropped flush_scheduled_work() since it is redundant
>>
>> Applied and queued up for -stable, thanks.
>>
> 
> 
> This is broken.  Please revert.

This has been causing problem for Geert as well, 2 vs 1, Marc, you lose,
I will send a revert for this shortly, sorry about that.

> 
> Upstream commit 7ad813f20853 and in the stable branches as well.
> 
> When ndo_stop() is called we call:
> 
> 
>  phy_disconnect()
> +---> phy_stop_interrupts() implies: phydev->irq = PHY_POLL;
> +---> phy_stop_machine()
> |  +---> phy_stop_machine()
> |  +> queue_delayed_work(): Work queued.
> +--->phy_detach() implies: phydev->attached_dev = NULL;
> 
> Now at a later time the queued work does:
> 
>  phy_state_machine()
> +>netif_carrier_off(phydev->attached_dev): Oh no! It is NULL:
> 
> 
>  CPU 12 Unable to handle kernel paging request at virtual address
> 0048, epc == 80de37ec, ra == 80c7c
> Oops[#1]:
> CPU: 12 PID: 1502 Comm: kworker/12:1 Not tainted 4.9.43-Cavium-Octeon+ #1
> Workqueue: events_power_efficient phy_state_machine
> task: 8004021ed100 task.stack: 800409d7
> $ 0   :  84720060 0048 0004
> $ 4   :  0001 0004 
> $ 8   :   98f3 
> $12   : 800409d73fe0 9c00 846547c8 af3b
> $16   : 8004096bab68 8004096babd0  8004096ba800
> $20   :   8109 0008
> $24   : 0061 808637b0
> $28   : 800409d7 800409d73cf0 8000271bd300 80c7804c
> Hi: 002a
> Lo: 003f
> epc   : 80de37ec netif_carrier_off+0xc/0x58
> ra: 80c7804c phy_state_machine+0x48c/0x4f8
> Status: 14009ce3KX SX UX KERNEL EXL IE
> Cause : 0088 (ExcCode 02)
> BadVA : 0048
> PrId  : 000d9501 (Cavium Octeon III)
> Modules linked in:
> Process kworker/12:1 (pid: 1502, threadinfo=800409d7,
> task=8004021ed100, tls=)
> Stack : 800409a54000 8004096bab68 8000271bd300 8000271c1e00
>  808a1708 800409a54000 8000271bd300
> 8000271bd320 800409a54030 80ff0f00 0001
> 8109 808a1ac0 800402182080 8465
> 800402182080 8465 80ff 800409a54000
> 808a1970  8004099e8000 800402099240
>  808a8598  800408eeeb00
> 800409a54000 810a1d00  800409d73de8
> 800409d73de8 0088 0c009c00 800409d73e08
> 800409d73e08 800402182080 808a84d0 800402182080
> ...
> Call Trace:
> [] netif_carrier_off+0xc/0x58
> [] phy_state_machine+0x48c/0x4f8
> [] process_one_work+0x158/0x368
> [] worker_thread+0x150/0x4c0
> [] kthread+0xc8/0xe0
> [] ret_from_kernel_thread+0x14/0x1c


-- 
Florian


Re: [PATCH net v2] net: phy: Correctly process PHY_HALTED in phy_stop_machine()

2017-08-30 Thread David Daney

And of course I mess up my pretty picture, see below.

On 08/30/2017 05:13 PM, David Daney wrote:

On 07/31/2017 05:28 PM, David Miller wrote:

From: Florian Fainelli 
Date: Fri, 28 Jul 2017 11:58:36 -0700


Marc reported that he was not getting the PHY library adjust_link()
callback function to run when calling phy_stop() + phy_disconnect()
which does not indeed happen because we set the state machine to
PHY_HALTED but we don't get to run it to process this state past that
point.

Fix this with a synchronous call to phy_state_machine() in order to have
the state machine actually act on PHY_HALTED, set the PHY device's link
down, turn the network device's carrier off and finally call the
adjust_link() function.

Reported-by: Marc Gonzalez 
Fixes: a390d1f379cf ("phylib: convert state_queue work to delayed_work")
Signed-off-by: Florian Fainelli 
---
Changes in v2:

- reword subject and commit message based on changes
- dropped flush_scheduled_work() since it is redundant


Applied and queued up for -stable, thanks.




This is broken.  Please revert.

Upstream commit 7ad813f20853 and in the stable branches as well.

When ndo_stop() is called we call:


  phy_disconnect()
 +---> phy_stop_interrupts() implies: phydev->irq = PHY_POLL;
 +---> phy_stop_machine()
 |  +---> phy_stop_machine()


s/phy_stop_machine/phy_state_machine/

The call that the offending patch adds.



 |  +> queue_delayed_work(): Work queued.
 +--->phy_detach() implies: phydev->attached_dev = NULL;

Now at a later time the queued work does:

  phy_state_machine()
 +>netif_carrier_off(phydev->attached_dev): Oh no! It is NULL:


  CPU 12 Unable to handle kernel paging request at virtual address
0048, epc == 80de37ec, ra == 80c7c
Oops[#1]:
CPU: 12 PID: 1502 Comm: kworker/12:1 Not tainted 4.9.43-Cavium-Octeon+ #1
Workqueue: events_power_efficient phy_state_machine
task: 8004021ed100 task.stack: 800409d7
$ 0   :  84720060 0048 0004
$ 4   :  0001 0004 
$ 8   :   98f3 
$12   : 800409d73fe0 9c00 846547c8 af3b
$16   : 8004096bab68 8004096babd0  8004096ba800
$20   :   8109 0008
$24   : 0061 808637b0
$28   : 800409d7 800409d73cf0 8000271bd300 80c7804c
Hi: 002a
Lo: 003f
epc   : 80de37ec netif_carrier_off+0xc/0x58
ra: 80c7804c phy_state_machine+0x48c/0x4f8
Status: 14009ce3KX SX UX KERNEL EXL IE
Cause : 0088 (ExcCode 02)
BadVA : 0048
PrId  : 000d9501 (Cavium Octeon III)
Modules linked in:
Process kworker/12:1 (pid: 1502, threadinfo=800409d7,
task=8004021ed100, tls=)
Stack : 800409a54000 8004096bab68 8000271bd300 8000271c1e00
  808a1708 800409a54000 
8000271bd300
 8000271bd320 800409a54030 80ff0f00 
0001
 8109 808a1ac0 800402182080 
8465
 800402182080 8465 80ff 
800409a54000
 808a1970  8004099e8000 
800402099240
  808a8598  
800408eeeb00
 800409a54000 810a1d00  
800409d73de8
 800409d73de8 0088 0c009c00 
800409d73e08
 800409d73e08 800402182080 808a84d0 
800402182080

 ...
Call Trace:
[] netif_carrier_off+0xc/0x58
[] phy_state_machine+0x48c/0x4f8
[] process_one_work+0x158/0x368
[] worker_thread+0x150/0x4c0
[] kthread+0xc8/0xe0
[] ret_from_kernel_thread+0x14/0x1c




Re: [PATCH net v2] net: phy: Correctly process PHY_HALTED in phy_stop_machine()

2017-08-30 Thread David Daney

On 07/31/2017 05:28 PM, David Miller wrote:

From: Florian Fainelli 
Date: Fri, 28 Jul 2017 11:58:36 -0700


Marc reported that he was not getting the PHY library adjust_link()
callback function to run when calling phy_stop() + phy_disconnect()
which does not indeed happen because we set the state machine to
PHY_HALTED but we don't get to run it to process this state past that
point.

Fix this with a synchronous call to phy_state_machine() in order to have
the state machine actually act on PHY_HALTED, set the PHY device's link
down, turn the network device's carrier off and finally call the
adjust_link() function.

Reported-by: Marc Gonzalez 
Fixes: a390d1f379cf ("phylib: convert state_queue work to delayed_work")
Signed-off-by: Florian Fainelli 
---
Changes in v2:

- reword subject and commit message based on changes
- dropped flush_scheduled_work() since it is redundant


Applied and queued up for -stable, thanks.




This is broken.  Please revert.

Upstream commit 7ad813f20853 and in the stable branches as well.

When ndo_stop() is called we call:


 phy_disconnect()
+---> phy_stop_interrupts() implies: phydev->irq = PHY_POLL;
+---> phy_stop_machine()
|  +---> phy_stop_machine()
|  +> queue_delayed_work(): Work queued.
+--->phy_detach() implies: phydev->attached_dev = NULL;

Now at a later time the queued work does:

 phy_state_machine()
+>netif_carrier_off(phydev->attached_dev): Oh no! It is NULL:


 CPU 12 Unable to handle kernel paging request at virtual address
0048, epc == 80de37ec, ra == 80c7c
Oops[#1]:
CPU: 12 PID: 1502 Comm: kworker/12:1 Not tainted 4.9.43-Cavium-Octeon+ #1
Workqueue: events_power_efficient phy_state_machine
task: 8004021ed100 task.stack: 800409d7
$ 0   :  84720060 0048 0004
$ 4   :  0001 0004 
$ 8   :   98f3 
$12   : 800409d73fe0 9c00 846547c8 af3b
$16   : 8004096bab68 8004096babd0  8004096ba800
$20   :   8109 0008
$24   : 0061 808637b0
$28   : 800409d7 800409d73cf0 8000271bd300 80c7804c
Hi: 002a
Lo: 003f
epc   : 80de37ec netif_carrier_off+0xc/0x58
ra: 80c7804c phy_state_machine+0x48c/0x4f8
Status: 14009ce3KX SX UX KERNEL EXL IE
Cause : 0088 (ExcCode 02)
BadVA : 0048
PrId  : 000d9501 (Cavium Octeon III)
Modules linked in:
Process kworker/12:1 (pid: 1502, threadinfo=800409d7,
task=8004021ed100, tls=)
Stack : 800409a54000 8004096bab68 8000271bd300 8000271c1e00
 808a1708 800409a54000 8000271bd300
8000271bd320 800409a54030 80ff0f00 0001
8109 808a1ac0 800402182080 8465
800402182080 8465 80ff 800409a54000
808a1970  8004099e8000 800402099240
 808a8598  800408eeeb00
800409a54000 810a1d00  800409d73de8
800409d73de8 0088 0c009c00 800409d73e08
800409d73e08 800402182080 808a84d0 800402182080
...
Call Trace:
[] netif_carrier_off+0xc/0x58
[] phy_state_machine+0x48c/0x4f8
[] process_one_work+0x158/0x368
[] worker_thread+0x150/0x4c0
[] kthread+0xc8/0xe0
[] ret_from_kernel_thread+0x14/0x1c


Re: [PATCH net v2] net: phy: Correctly process PHY_HALTED in phy_stop_machine()

2017-07-31 Thread David Miller
From: Florian Fainelli 
Date: Fri, 28 Jul 2017 11:58:36 -0700

> Marc reported that he was not getting the PHY library adjust_link()
> callback function to run when calling phy_stop() + phy_disconnect()
> which does not indeed happen because we set the state machine to
> PHY_HALTED but we don't get to run it to process this state past that
> point.
> 
> Fix this with a synchronous call to phy_state_machine() in order to have
> the state machine actually act on PHY_HALTED, set the PHY device's link
> down, turn the network device's carrier off and finally call the
> adjust_link() function.
> 
> Reported-by: Marc Gonzalez 
> Fixes: a390d1f379cf ("phylib: convert state_queue work to delayed_work")
> Signed-off-by: Florian Fainelli 
> ---
> Changes in v2:
> 
> - reword subject and commit message based on changes
> - dropped flush_scheduled_work() since it is redundant

Applied and queued up for -stable, thanks.


Re: [PATCH net v2] net: phy: Correctly process PHY_HALTED in phy_stop_machine()

2017-07-28 Thread Marc Gonzalez
On 28/07/2017 20:58, Florian Fainelli wrote:

> Marc reported that he was not getting the PHY library adjust_link()
> callback function to run when calling phy_stop() + phy_disconnect()
> which does not indeed happen because we set the state machine to
> PHY_HALTED but we don't get to run it to process this state past that
> point.
> 
> Fix this with a synchronous call to phy_state_machine() in order to have
> the state machine actually act on PHY_HALTED, set the PHY device's link
> down, turn the network device's carrier off and finally call the
> adjust_link() function.
> 
> Reported-by: Marc Gonzalez 
> Fixes: a390d1f379cf ("phylib: convert state_queue work to delayed_work")
> Signed-off-by: Florian Fainelli 
> ---
> Changes in v2:
> 
> - reword subject and commit message based on changes
> - dropped flush_scheduled_work() since it is redundant
> 
>  drivers/net/phy/phy.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
> index d0626bf5c540..5068c582d502 100644
> --- a/drivers/net/phy/phy.c
> +++ b/drivers/net/phy/phy.c
> @@ -749,6 +749,9 @@ void phy_stop_machine(struct phy_device *phydev)
>   if (phydev->state > PHY_UP && phydev->state != PHY_HALTED)
>   phydev->state = PHY_UP;
>   mutex_unlock(>lock);
> +
> + /* Now we can run the state machine synchronously */
> + phy_state_machine(>state_queue.work);
>  }
>  
>  /**

Signed-off-by: Marc Gonzalez 

Regards.


[PATCH net v2] net: phy: Correctly process PHY_HALTED in phy_stop_machine()

2017-07-28 Thread Florian Fainelli
Marc reported that he was not getting the PHY library adjust_link()
callback function to run when calling phy_stop() + phy_disconnect()
which does not indeed happen because we set the state machine to
PHY_HALTED but we don't get to run it to process this state past that
point.

Fix this with a synchronous call to phy_state_machine() in order to have
the state machine actually act on PHY_HALTED, set the PHY device's link
down, turn the network device's carrier off and finally call the
adjust_link() function.

Reported-by: Marc Gonzalez 
Fixes: a390d1f379cf ("phylib: convert state_queue work to delayed_work")
Signed-off-by: Florian Fainelli 
---
Changes in v2:

- reword subject and commit message based on changes
- dropped flush_scheduled_work() since it is redundant

 drivers/net/phy/phy.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index d0626bf5c540..5068c582d502 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -749,6 +749,9 @@ void phy_stop_machine(struct phy_device *phydev)
if (phydev->state > PHY_UP && phydev->state != PHY_HALTED)
phydev->state = PHY_UP;
mutex_unlock(>lock);
+
+   /* Now we can run the state machine synchronously */
+   phy_state_machine(>state_queue.work);
 }
 
 /**
-- 
2.9.3