[PATCH net v2] net: phy: Correctly process PHY_HALTED in phy_stop_machine()
Marc reported that he was not getting the PHY library adjust_link() callback function to run when calling phy_stop() + phy_disconnect() which does not indeed happen because we set the state machine to PHY_HALTED but we don't get to run it to process this state past that point. Fix this with a synchronous call to phy_state_machine() in order to have the state machine actually act on PHY_HALTED, set the PHY device's link down, turn the network device's carrier off and finally call the adjust_link() function. Reported-by: Marc Gonzalez Fixes: a390d1f379cf ("phylib: convert state_queue work to delayed_work") Signed-off-by: Florian Fainelli --- Changes in v2: - reword subject and commit message based on changes - dropped flush_scheduled_work() since it is redundant drivers/net/phy/phy.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c index d0626bf5c540..5068c582d502 100644 --- a/drivers/net/phy/phy.c +++ b/drivers/net/phy/phy.c @@ -749,6 +749,9 @@ void phy_stop_machine(struct phy_device *phydev) if (phydev->state > PHY_UP && phydev->state != PHY_HALTED) phydev->state = PHY_UP; mutex_unlock(&phydev->lock); + + /* Now we can run the state machine synchronously */ + phy_state_machine(&phydev->state_queue.work); } /** -- 2.9.3
Re: [PATCH net v2] net: phy: Correctly process PHY_HALTED in phy_stop_machine()
On 28/07/2017 20:58, Florian Fainelli wrote: > Marc reported that he was not getting the PHY library adjust_link() > callback function to run when calling phy_stop() + phy_disconnect() > which does not indeed happen because we set the state machine to > PHY_HALTED but we don't get to run it to process this state past that > point. > > Fix this with a synchronous call to phy_state_machine() in order to have > the state machine actually act on PHY_HALTED, set the PHY device's link > down, turn the network device's carrier off and finally call the > adjust_link() function. > > Reported-by: Marc Gonzalez > Fixes: a390d1f379cf ("phylib: convert state_queue work to delayed_work") > Signed-off-by: Florian Fainelli > --- > Changes in v2: > > - reword subject and commit message based on changes > - dropped flush_scheduled_work() since it is redundant > > drivers/net/phy/phy.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c > index d0626bf5c540..5068c582d502 100644 > --- a/drivers/net/phy/phy.c > +++ b/drivers/net/phy/phy.c > @@ -749,6 +749,9 @@ void phy_stop_machine(struct phy_device *phydev) > if (phydev->state > PHY_UP && phydev->state != PHY_HALTED) > phydev->state = PHY_UP; > mutex_unlock(&phydev->lock); > + > + /* Now we can run the state machine synchronously */ > + phy_state_machine(&phydev->state_queue.work); > } > > /** Signed-off-by: Marc Gonzalez Regards.
Re: [PATCH net v2] net: phy: Correctly process PHY_HALTED in phy_stop_machine()
From: Florian Fainelli Date: Fri, 28 Jul 2017 11:58:36 -0700 > Marc reported that he was not getting the PHY library adjust_link() > callback function to run when calling phy_stop() + phy_disconnect() > which does not indeed happen because we set the state machine to > PHY_HALTED but we don't get to run it to process this state past that > point. > > Fix this with a synchronous call to phy_state_machine() in order to have > the state machine actually act on PHY_HALTED, set the PHY device's link > down, turn the network device's carrier off and finally call the > adjust_link() function. > > Reported-by: Marc Gonzalez > Fixes: a390d1f379cf ("phylib: convert state_queue work to delayed_work") > Signed-off-by: Florian Fainelli > --- > Changes in v2: > > - reword subject and commit message based on changes > - dropped flush_scheduled_work() since it is redundant Applied and queued up for -stable, thanks.
Re: [PATCH net v2] net: phy: Correctly process PHY_HALTED in phy_stop_machine()
On 07/31/2017 05:28 PM, David Miller wrote: From: Florian Fainelli Date: Fri, 28 Jul 2017 11:58:36 -0700 Marc reported that he was not getting the PHY library adjust_link() callback function to run when calling phy_stop() + phy_disconnect() which does not indeed happen because we set the state machine to PHY_HALTED but we don't get to run it to process this state past that point. Fix this with a synchronous call to phy_state_machine() in order to have the state machine actually act on PHY_HALTED, set the PHY device's link down, turn the network device's carrier off and finally call the adjust_link() function. Reported-by: Marc Gonzalez Fixes: a390d1f379cf ("phylib: convert state_queue work to delayed_work") Signed-off-by: Florian Fainelli --- Changes in v2: - reword subject and commit message based on changes - dropped flush_scheduled_work() since it is redundant Applied and queued up for -stable, thanks. This is broken. Please revert. Upstream commit 7ad813f20853 and in the stable branches as well. When ndo_stop() is called we call: phy_disconnect() +---> phy_stop_interrupts() implies: phydev->irq = PHY_POLL; +---> phy_stop_machine() | +---> phy_stop_machine() | +> queue_delayed_work(): Work queued. +--->phy_detach() implies: phydev->attached_dev = NULL; Now at a later time the queued work does: phy_state_machine() +>netif_carrier_off(phydev->attached_dev): Oh no! It is NULL: CPU 12 Unable to handle kernel paging request at virtual address 0048, epc == 80de37ec, ra == 80c7c Oops[#1]: CPU: 12 PID: 1502 Comm: kworker/12:1 Not tainted 4.9.43-Cavium-Octeon+ #1 Workqueue: events_power_efficient phy_state_machine task: 8004021ed100 task.stack: 800409d7 $ 0 : 84720060 0048 0004 $ 4 : 0001 0004 $ 8 : 98f3 $12 : 800409d73fe0 9c00 846547c8 af3b $16 : 8004096bab68 8004096babd0 8004096ba800 $20 : 8109 0008 $24 : 0061 808637b0 $28 : 800409d7 800409d73cf0 8000271bd300 80c7804c Hi: 002a Lo: 003f epc : 80de37ec netif_carrier_off+0xc/0x58 ra: 80c7804c phy_state_machine+0x48c/0x4f8 Status: 14009ce3KX SX UX KERNEL EXL IE Cause : 0088 (ExcCode 02) BadVA : 0048 PrId : 000d9501 (Cavium Octeon III) Modules linked in: Process kworker/12:1 (pid: 1502, threadinfo=800409d7, task=8004021ed100, tls=) Stack : 800409a54000 8004096bab68 8000271bd300 8000271c1e00 808a1708 800409a54000 8000271bd300 8000271bd320 800409a54030 80ff0f00 0001 8109 808a1ac0 800402182080 8465 800402182080 8465 80ff 800409a54000 808a1970 8004099e8000 800402099240 808a8598 800408eeeb00 800409a54000 810a1d00 800409d73de8 800409d73de8 0088 0c009c00 800409d73e08 800409d73e08 800402182080 808a84d0 800402182080 ... Call Trace: [] netif_carrier_off+0xc/0x58 [] phy_state_machine+0x48c/0x4f8 [] process_one_work+0x158/0x368 [] worker_thread+0x150/0x4c0 [] kthread+0xc8/0xe0 [] ret_from_kernel_thread+0x14/0x1c
Re: [PATCH net v2] net: phy: Correctly process PHY_HALTED in phy_stop_machine()
On 08/30/2017 05:13 PM, David Daney wrote: > On 07/31/2017 05:28 PM, David Miller wrote: >> From: Florian Fainelli >> Date: Fri, 28 Jul 2017 11:58:36 -0700 >> >>> Marc reported that he was not getting the PHY library adjust_link() >>> callback function to run when calling phy_stop() + phy_disconnect() >>> which does not indeed happen because we set the state machine to >>> PHY_HALTED but we don't get to run it to process this state past that >>> point. >>> >>> Fix this with a synchronous call to phy_state_machine() in order to have >>> the state machine actually act on PHY_HALTED, set the PHY device's link >>> down, turn the network device's carrier off and finally call the >>> adjust_link() function. >>> >>> Reported-by: Marc Gonzalez >>> Fixes: a390d1f379cf ("phylib: convert state_queue work to delayed_work") >>> Signed-off-by: Florian Fainelli >>> --- >>> Changes in v2: >>> >>> - reword subject and commit message based on changes >>> - dropped flush_scheduled_work() since it is redundant >> >> Applied and queued up for -stable, thanks. >> > > > This is broken. Please revert. This has been causing problem for Geert as well, 2 vs 1, Marc, you lose, I will send a revert for this shortly, sorry about that. > > Upstream commit 7ad813f20853 and in the stable branches as well. > > When ndo_stop() is called we call: > > > phy_disconnect() > +---> phy_stop_interrupts() implies: phydev->irq = PHY_POLL; > +---> phy_stop_machine() > | +---> phy_stop_machine() > | +> queue_delayed_work(): Work queued. > +--->phy_detach() implies: phydev->attached_dev = NULL; > > Now at a later time the queued work does: > > phy_state_machine() > +>netif_carrier_off(phydev->attached_dev): Oh no! It is NULL: > > > CPU 12 Unable to handle kernel paging request at virtual address > 0048, epc == 80de37ec, ra == 80c7c > Oops[#1]: > CPU: 12 PID: 1502 Comm: kworker/12:1 Not tainted 4.9.43-Cavium-Octeon+ #1 > Workqueue: events_power_efficient phy_state_machine > task: 8004021ed100 task.stack: 800409d7 > $ 0 : 84720060 0048 0004 > $ 4 : 0001 0004 > $ 8 : 98f3 > $12 : 800409d73fe0 9c00 846547c8 af3b > $16 : 8004096bab68 8004096babd0 8004096ba800 > $20 : 8109 0008 > $24 : 0061 808637b0 > $28 : 800409d7 800409d73cf0 8000271bd300 80c7804c > Hi: 002a > Lo: 003f > epc : 80de37ec netif_carrier_off+0xc/0x58 > ra: 80c7804c phy_state_machine+0x48c/0x4f8 > Status: 14009ce3KX SX UX KERNEL EXL IE > Cause : 0088 (ExcCode 02) > BadVA : 0048 > PrId : 000d9501 (Cavium Octeon III) > Modules linked in: > Process kworker/12:1 (pid: 1502, threadinfo=800409d7, > task=8004021ed100, tls=) > Stack : 800409a54000 8004096bab68 8000271bd300 8000271c1e00 > 808a1708 800409a54000 8000271bd300 > 8000271bd320 800409a54030 80ff0f00 0001 > 8109 808a1ac0 800402182080 8465 > 800402182080 8465 80ff 800409a54000 > 808a1970 8004099e8000 800402099240 > 808a8598 800408eeeb00 > 800409a54000 810a1d00 800409d73de8 > 800409d73de8 0088 0c009c00 800409d73e08 > 800409d73e08 800402182080 808a84d0 800402182080 > ... > Call Trace: > [] netif_carrier_off+0xc/0x58 > [] phy_state_machine+0x48c/0x4f8 > [] process_one_work+0x158/0x368 > [] worker_thread+0x150/0x4c0 > [] kthread+0xc8/0xe0 > [] ret_from_kernel_thread+0x14/0x1c -- Florian
Re: [PATCH net v2] net: phy: Correctly process PHY_HALTED in phy_stop_machine()
And of course I mess up my pretty picture, see below. On 08/30/2017 05:13 PM, David Daney wrote: On 07/31/2017 05:28 PM, David Miller wrote: From: Florian Fainelli Date: Fri, 28 Jul 2017 11:58:36 -0700 Marc reported that he was not getting the PHY library adjust_link() callback function to run when calling phy_stop() + phy_disconnect() which does not indeed happen because we set the state machine to PHY_HALTED but we don't get to run it to process this state past that point. Fix this with a synchronous call to phy_state_machine() in order to have the state machine actually act on PHY_HALTED, set the PHY device's link down, turn the network device's carrier off and finally call the adjust_link() function. Reported-by: Marc Gonzalez Fixes: a390d1f379cf ("phylib: convert state_queue work to delayed_work") Signed-off-by: Florian Fainelli --- Changes in v2: - reword subject and commit message based on changes - dropped flush_scheduled_work() since it is redundant Applied and queued up for -stable, thanks. This is broken. Please revert. Upstream commit 7ad813f20853 and in the stable branches as well. When ndo_stop() is called we call: phy_disconnect() +---> phy_stop_interrupts() implies: phydev->irq = PHY_POLL; +---> phy_stop_machine() | +---> phy_stop_machine() s/phy_stop_machine/phy_state_machine/ The call that the offending patch adds. | +> queue_delayed_work(): Work queued. +--->phy_detach() implies: phydev->attached_dev = NULL; Now at a later time the queued work does: phy_state_machine() +>netif_carrier_off(phydev->attached_dev): Oh no! It is NULL: CPU 12 Unable to handle kernel paging request at virtual address 0048, epc == 80de37ec, ra == 80c7c Oops[#1]: CPU: 12 PID: 1502 Comm: kworker/12:1 Not tainted 4.9.43-Cavium-Octeon+ #1 Workqueue: events_power_efficient phy_state_machine task: 8004021ed100 task.stack: 800409d7 $ 0 : 84720060 0048 0004 $ 4 : 0001 0004 $ 8 : 98f3 $12 : 800409d73fe0 9c00 846547c8 af3b $16 : 8004096bab68 8004096babd0 8004096ba800 $20 : 8109 0008 $24 : 0061 808637b0 $28 : 800409d7 800409d73cf0 8000271bd300 80c7804c Hi: 002a Lo: 003f epc : 80de37ec netif_carrier_off+0xc/0x58 ra: 80c7804c phy_state_machine+0x48c/0x4f8 Status: 14009ce3KX SX UX KERNEL EXL IE Cause : 0088 (ExcCode 02) BadVA : 0048 PrId : 000d9501 (Cavium Octeon III) Modules linked in: Process kworker/12:1 (pid: 1502, threadinfo=800409d7, task=8004021ed100, tls=) Stack : 800409a54000 8004096bab68 8000271bd300 8000271c1e00 808a1708 800409a54000 8000271bd300 8000271bd320 800409a54030 80ff0f00 0001 8109 808a1ac0 800402182080 8465 800402182080 8465 80ff 800409a54000 808a1970 8004099e8000 800402099240 808a8598 800408eeeb00 800409a54000 810a1d00 800409d73de8 800409d73de8 0088 0c009c00 800409d73e08 800409d73e08 800402182080 808a84d0 800402182080 ... Call Trace: [] netif_carrier_off+0xc/0x58 [] phy_state_machine+0x48c/0x4f8 [] process_one_work+0x158/0x368 [] worker_thread+0x150/0x4c0 [] kthread+0xc8/0xe0 [] ret_from_kernel_thread+0x14/0x1c
Re: [PATCH net v2] net: phy: Correctly process PHY_HALTED in phy_stop_machine()
On 08/30/2017 05:13 PM, David Daney wrote: > On 07/31/2017 05:28 PM, David Miller wrote: >> From: Florian Fainelli >> Date: Fri, 28 Jul 2017 11:58:36 -0700 >> >>> Marc reported that he was not getting the PHY library adjust_link() >>> callback function to run when calling phy_stop() + phy_disconnect() >>> which does not indeed happen because we set the state machine to >>> PHY_HALTED but we don't get to run it to process this state past that >>> point. >>> >>> Fix this with a synchronous call to phy_state_machine() in order to have >>> the state machine actually act on PHY_HALTED, set the PHY device's link >>> down, turn the network device's carrier off and finally call the >>> adjust_link() function. >>> >>> Reported-by: Marc Gonzalez >>> Fixes: a390d1f379cf ("phylib: convert state_queue work to delayed_work") >>> Signed-off-by: Florian Fainelli >>> --- >>> Changes in v2: >>> >>> - reword subject and commit message based on changes >>> - dropped flush_scheduled_work() since it is redundant >> >> Applied and queued up for -stable, thanks. >> > > > This is broken. Please revert. > > Upstream commit 7ad813f20853 and in the stable branches as well. > > When ndo_stop() is called we call: > > > phy_disconnect() > +---> phy_stop_interrupts() implies: phydev->irq = PHY_POLL; > +---> phy_stop_machine() > | +---> phy_stop_machine() > | +> queue_delayed_work(): Work queued. > +--->phy_detach() implies: phydev->attached_dev = NULL; > > Now at a later time the queued work does: > > phy_state_machine() > +>netif_carrier_off(phydev->attached_dev): Oh no! It is NULL: How about the following instead of a revert (which I have queued locally as well along with your correct call graph). This still would not fix Geert's problem where with this change, we do actually call back into adjust_link after a phy_stop() which may be problematic for him so I think the revert is just easier and Marc, we'll figure out something for nb8800? diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c index d0626bf5c540..78168e19bd5d 100644 --- a/drivers/net/phy/phy.c +++ b/drivers/net/phy/phy.c @@ -1234,7 +1234,7 @@ void phy_state_machine(struct work_struct *work) * PHY, if PHY_IGNORE_INTERRUPT is set, then we will be moving * between states from phy_mac_interrupt() */ - if (phydev->irq == PHY_POLL) + if (phydev->irq == PHY_POLL && phydev->state != PHY_HALTED) queue_delayed_work(system_power_efficient_wq, &phydev->state_queue, PHY_STATE_TIME * HZ); } > > > CPU 12 Unable to handle kernel paging request at virtual address > 0048, epc == 80de37ec, ra == 80c7c > Oops[#1]: > CPU: 12 PID: 1502 Comm: kworker/12:1 Not tainted 4.9.43-Cavium-Octeon+ #1 > Workqueue: events_power_efficient phy_state_machine > task: 8004021ed100 task.stack: 800409d7 > $ 0 : 84720060 0048 0004 > $ 4 : 0001 0004 > $ 8 : 98f3 > $12 : 800409d73fe0 9c00 846547c8 af3b > $16 : 8004096bab68 8004096babd0 8004096ba800 > $20 : 8109 0008 > $24 : 0061 808637b0 > $28 : 800409d7 800409d73cf0 8000271bd300 80c7804c > Hi: 002a > Lo: 003f > epc : 80de37ec netif_carrier_off+0xc/0x58 > ra: 80c7804c phy_state_machine+0x48c/0x4f8 > Status: 14009ce3KX SX UX KERNEL EXL IE > Cause : 0088 (ExcCode 02) > BadVA : 0048 > PrId : 000d9501 (Cavium Octeon III) > Modules linked in: > Process kworker/12:1 (pid: 1502, threadinfo=800409d7, > task=8004021ed100, tls=) > Stack : 800409a54000 8004096bab68 8000271bd300 8000271c1e00 > 808a1708 800409a54000 8000271bd300 > 8000271bd320 800409a54030 80ff0f00 0001 > 8109 808a1ac0 800402182080 8465 > 800402182080 8465 80ff 800409a54000 > 808a1970 8004099e8000 800402099240 > 808a8598 800408eeeb00 > 800409a54000 810a1d00 800409d73de8 > 800409d73de8 0088 0c009c00 800409d73e08 > 800409d73e08 800402182080 808a84d0 800402182080 > ... > Call Trace: > [] netif_carrier_off+0xc/0x58 > [] phy_state_machine+0x48c/0x4f8 > [] process_one_work+0x158/0x368 > [] worker_thread+0x150/0x4c0 > [] kthread+0xc8/0xe0 > [] ret_from_kernel_thread+0x14/0x1c -- Floria