Re: [PATCH 3/3] PCI: Convert rtw88 power cycle quirk to shutdown quirk

2021-03-03 Thread Kai-Heng Feng
On Sat, Feb 27, 2021 at 2:17 AM Bjorn Helgaas  wrote:
>
> On Fri, Feb 26, 2021 at 02:31:31PM +0100, Heiner Kallweit wrote:
> > On 26.02.2021 13:18, Kai-Heng Feng wrote:
> > > On Fri, Feb 26, 2021 at 8:10 PM Heiner Kallweit  
> > > wrote:
> > >>
> > >> On 26.02.2021 08:12, Kalle Valo wrote:
> > >>> Kai-Heng Feng  writes:
> > >>>
> > >>>> Now we have a generic D3 shutdown quirk, so convert the original
> > >>>> approach to a PCI quirk.
> > >>>>
> > >>>> Signed-off-by: Kai-Heng Feng 
> > >>>> ---
> > >>>>  drivers/net/wireless/realtek/rtw88/pci.c | 2 --
> > >>>>  drivers/pci/quirks.c | 6 ++
> > >>>>  2 files changed, 6 insertions(+), 2 deletions(-)
> > >>>
> > >>> It would have been nice to CC linux-wireless also on patches 1-2. I only
> > >>> saw patch 3 and had to search the rest of patches from lkml.
> > >>>
> > >>> I assume this goes via the PCI tree so:
> > >>>
> > >>> Acked-by: Kalle Valo 
> > >>
> > >> To me it looks odd to (mis-)use the quirk mechanism to set a device
> > >> to D3cold on shutdown. As I see it the quirk mechanism is used to work
> > >> around certain device misbehavior. And setting a device to a D3
> > >> state on shutdown is a normal activity, and the shutdown() callback
> > >> seems to be a good place for it.
> > >> I miss an explanation what the actual benefit of the change is.
> > >
> > > To make putting device to D3 more generic, as there are more than one
> > > device need the quirk.
> > >
> > > Here's the discussion:
> > > https://lore.kernel.org/linux-usb/00de6927-3fa6-a9a3-2d65-2b4d4e8f0...@linux.intel.com/
> > >
> >
> > Thanks for the link. For the AMD USB use case I don't have a strong opinion,
> > what's considered the better option may be a question of personal taste.
> > For rtw88 however I'd still consider it over-engineering to replace a simple
> > call to pci_set_power_state() with a PCI quirk.
> > I may be biased here because I find it sometimes bothering if I want to
> > look up how a device is handled and in addition to checking the respective
> > driver I also have to grep through quirks.c whether there's any special
> > handling.
>
> I haven't looked at these patches carefully, but in general, I agree
> that quirks should be used to work around hardware defects in the
> device.  If the device behaves correctly per spec, we should use a
> different mechanism so the code remains generic and all devices get
> the benefit.
>
> If we do add quirks, the commit log should explain what the device
> defect is.

So maybe it's reasonable to put all PCI devices to D3 at shutdown?

Kai-Heng

>
> Bjorn


Re: [PATCH 3/3] PCI: Convert rtw88 power cycle quirk to shutdown quirk

2021-02-26 Thread Kai-Heng Feng
On Fri, Feb 26, 2021 at 8:10 PM Heiner Kallweit  wrote:
>
> On 26.02.2021 08:12, Kalle Valo wrote:
> > Kai-Heng Feng  writes:
> >
> >> Now we have a generic D3 shutdown quirk, so convert the original
> >> approach to a PCI quirk.
> >>
> >> Signed-off-by: Kai-Heng Feng 
> >> ---
> >>  drivers/net/wireless/realtek/rtw88/pci.c | 2 --
> >>  drivers/pci/quirks.c | 6 ++
> >>  2 files changed, 6 insertions(+), 2 deletions(-)
> >
> > It would have been nice to CC linux-wireless also on patches 1-2. I only
> > saw patch 3 and had to search the rest of patches from lkml.
> >
> > I assume this goes via the PCI tree so:
> >
> > Acked-by: Kalle Valo 
> >
>
> To me it looks odd to (mis-)use the quirk mechanism to set a device
> to D3cold on shutdown. As I see it the quirk mechanism is used to work
> around certain device misbehavior. And setting a device to a D3
> state on shutdown is a normal activity, and the shutdown() callback
> seems to be a good place for it.
> I miss an explanation what the actual benefit of the change is.

To make putting device to D3 more generic, as there are more than one
device need the quirk.

Here's the discussion:
https://lore.kernel.org/linux-usb/00de6927-3fa6-a9a3-2d65-2b4d4e8f0...@linux.intel.com/

Kai-Heng


[PATCH 3/3] PCI: Convert rtw88 power cycle quirk to shutdown quirk

2021-02-25 Thread Kai-Heng Feng
Now we have a generic D3 shutdown quirk, so convert the original
approach to a PCI quirk.

Signed-off-by: Kai-Heng Feng 
---
 drivers/net/wireless/realtek/rtw88/pci.c | 2 --
 drivers/pci/quirks.c | 6 ++
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtw88/pci.c 
b/drivers/net/wireless/realtek/rtw88/pci.c
index 786a48649946..cddc9b09bb1f 100644
--- a/drivers/net/wireless/realtek/rtw88/pci.c
+++ b/drivers/net/wireless/realtek/rtw88/pci.c
@@ -1709,8 +1709,6 @@ void rtw_pci_shutdown(struct pci_dev *pdev)
 
if (chip->ops->shutdown)
chip->ops->shutdown(rtwdev);
-
-   pci_set_power_state(pdev, PCI_D3hot);
 }
 EXPORT_SYMBOL(rtw_pci_shutdown);
 
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 0a848ef0b7db..dfb8746e3b72 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -5627,3 +5627,9 @@ static void pci_fixup_shutdown_d3(struct pci_dev *pdev)
pci_set_power_state(pdev, PCI_D3cold);
 }
 DECLARE_PCI_FIXUP_SHUTDOWN(PCI_VENDOR_ID_AMD, 0x1639, pci_fixup_shutdown_d3);
+DECLARE_PCI_FIXUP_SHUTDOWN(PCI_VENDOR_ID_REALTEK, 0xd723, 
pci_fixup_shutdown_d3);
+DECLARE_PCI_FIXUP_SHUTDOWN(PCI_VENDOR_ID_REALTEK, 0xc821, 
pci_fixup_shutdown_d3);
+DECLARE_PCI_FIXUP_SHUTDOWN(PCI_VENDOR_ID_REALTEK, 0xb822, 
pci_fixup_shutdown_d3);
+DECLARE_PCI_FIXUP_SHUTDOWN(PCI_VENDOR_ID_REALTEK, 0xb822, 
pci_fixup_shutdown_d3);
+DECLARE_PCI_FIXUP_SHUTDOWN(PCI_VENDOR_ID_REALTEK, 0xc822, 
pci_fixup_shutdown_d3);
+DECLARE_PCI_FIXUP_SHUTDOWN(PCI_VENDOR_ID_REALTEK, 0xc82f, 
pci_fixup_shutdown_d3);
-- 
2.30.0



Re: [PATCH] rtw88: 8821c: Add RFE 2 support

2021-02-01 Thread Kai-Heng Feng
On Tue, Feb 2, 2021 at 3:02 PM Larry Finger  wrote:
>
> On 2/2/21 12:29 AM, Kalle Valo wrote:
> > Kai-Heng Feng  writes:
> >
> >> On Wed, Aug 5, 2020 at 7:24 PM Kai-Heng Feng
> >>  wrote:
> >>>
> >>> Hi Tony,
> >>>
> >>>> On Aug 5, 2020, at 19:18, Tony Chuang  wrote:
> >>>>
> >>>>> 8821CE with RFE 2 isn't supported:
> >>>>> [   12.404834] rtw_8821ce :02:00.0: rfe 2 isn't supported
> >>>>> [   12.404937] rtw_8821ce :02:00.0: failed to setup chip efuse info
> >>>>> [   12.404939] rtw_8821ce :02:00.0: failed to setup chip information
> >>>>>
> >>>>
> >>>> NACK
> >>>>
> >>>> The RFE type 2 should be working with some additional fixes.
> >>>> Did you tested connecting to AP with BT paired?
> >>>
> >>> No, I only tested WiFi.
> >>>
> >>>> The antenna configuration is different with RFE type 0.
> >>>> I will ask someone else to fix them.
> >>>> Then the RFE type 2 modules can be supported.
> >>>
> >>> Good to know that, I'll be patient and wait for a real fix.
> >>
> >> It's been quite some time, is support for RFE type 2 ready now?
> >
> > It looks like this patch should add it:
> >
> > https://patchwork.kernel.org/project/linux-wireless/patch/20210202055012.8296-4-pks...@realtek.com/
> >
> New firmware (rtw8821c_fw.bin) is also needed. That is available at
> https://github.com/lwfinger/rtw88.git.

Thanks. RFE2 works with the new firmware.

Kai-Heng

>
> Larry
>


[PATCH v2] r8169: Add support for another RTL8168FP

2021-02-01 Thread Kai-Heng Feng
According to the vendor driver, the new chip with XID 0x54b is
essentially the same as the one with XID 0x54a, but it doesn't need the
firmware.

So add support accordingly.

Signed-off-by: Kai-Heng Feng 
---
v2:
 - Add phy support.
 - Rebase on net-next.

 drivers/net/ethernet/realtek/r8169.h|  1 +
 drivers/net/ethernet/realtek/r8169_main.c   | 17 +++--
 drivers/net/ethernet/realtek/r8169_phy_config.c |  1 +
 3 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.h 
b/drivers/net/ethernet/realtek/r8169.h
index 7be86ef5a584..2728df46ec41 100644
--- a/drivers/net/ethernet/realtek/r8169.h
+++ b/drivers/net/ethernet/realtek/r8169.h
@@ -63,6 +63,7 @@ enum mac_version {
RTL_GIGA_MAC_VER_50,
RTL_GIGA_MAC_VER_51,
RTL_GIGA_MAC_VER_52,
+   RTL_GIGA_MAC_VER_53,
RTL_GIGA_MAC_VER_60,
RTL_GIGA_MAC_VER_61,
RTL_GIGA_MAC_VER_63,
diff --git a/drivers/net/ethernet/realtek/r8169_main.c 
b/drivers/net/ethernet/realtek/r8169_main.c
index 475e6f01ea10..48697ec36e55 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -146,6 +146,7 @@ static const struct {
[RTL_GIGA_MAC_VER_50] = {"RTL8168ep/8111ep" },
[RTL_GIGA_MAC_VER_51] = {"RTL8168ep/8111ep" },
[RTL_GIGA_MAC_VER_52] = {"RTL8168fp/RTL8117",  FIRMWARE_8168FP_3},
+   [RTL_GIGA_MAC_VER_53] = {"RTL8168fp/RTL8117",   },
[RTL_GIGA_MAC_VER_60] = {"RTL8125A" },
[RTL_GIGA_MAC_VER_61] = {"RTL8125A",FIRMWARE_8125A_3},
/* reserve 62 for CFG_METHOD_4 in the vendor driver */
@@ -696,7 +697,7 @@ static bool rtl_is_8168evl_up(struct rtl8169_private *tp)
 {
return tp->mac_version >= RTL_GIGA_MAC_VER_34 &&
   tp->mac_version != RTL_GIGA_MAC_VER_39 &&
-  tp->mac_version <= RTL_GIGA_MAC_VER_52;
+  tp->mac_version <= RTL_GIGA_MAC_VER_53;
 }
 
 static bool rtl_supports_eee(struct rtl8169_private *tp)
@@ -763,7 +764,9 @@ static bool name ## _check(struct rtl8169_private *tp)
 static void r8168fp_adjust_ocp_cmd(struct rtl8169_private *tp, u32 *cmd, int 
type)
 {
/* based on RTL8168FP_OOBMAC_BASE in vendor driver */
-   if (tp->mac_version == RTL_GIGA_MAC_VER_52 && type == ERIAR_OOB)
+   if (type == ERIAR_OOB &&
+   (tp->mac_version == RTL_GIGA_MAC_VER_52 ||
+tp->mac_version == RTL_GIGA_MAC_VER_53))
*cmd |= 0x7f0 << 18;
 }
 
@@ -1238,7 +1241,7 @@ static enum rtl_dash_type rtl_check_dash(struct 
rtl8169_private *tp)
case RTL_GIGA_MAC_VER_28:
case RTL_GIGA_MAC_VER_31:
return r8168dp_check_dash(tp) ? RTL_DASH_DP : RTL_DASH_NONE;
-   case RTL_GIGA_MAC_VER_49 ... RTL_GIGA_MAC_VER_52:
+   case RTL_GIGA_MAC_VER_49 ... RTL_GIGA_MAC_VER_53:
return r8168ep_check_dash(tp) ? RTL_DASH_EP : RTL_DASH_NONE;
default:
return RTL_DASH_NONE;
@@ -1962,6 +1965,7 @@ static enum mac_version rtl8169_get_mac_version(u16 xid, 
bool gmii)
{ 0x7c8, 0x608, RTL_GIGA_MAC_VER_61 },
 
/* RTL8117 */
+   { 0x7cf, 0x54b, RTL_GIGA_MAC_VER_53 },
{ 0x7cf, 0x54a, RTL_GIGA_MAC_VER_52 },
 
/* 8168EP family. */
@@ -2236,7 +2240,7 @@ static void rtl_init_rxcfg(struct rtl8169_private *tp)
case RTL_GIGA_MAC_VER_38:
RTL_W32(tp, RxConfig, RX128_INT_EN | RX_MULTI_EN | 
RX_DMA_BURST);
break;
-   case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_52:
+   case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_53:
RTL_W32(tp, RxConfig, RX128_INT_EN | RX_MULTI_EN | RX_DMA_BURST 
| RX_EARLY_OFF);
break;
case RTL_GIGA_MAC_VER_60 ... RTL_GIGA_MAC_VER_63:
@@ -2410,7 +2414,7 @@ DECLARE_RTL_COND(rtl_rxtx_empty_cond_2)
 static void rtl_wait_txrx_fifo_empty(struct rtl8169_private *tp)
 {
switch (tp->mac_version) {
-   case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_52:
+   case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_53:
rtl_loop_wait_high(tp, &rtl_txcfg_empty_cond, 100, 42);
rtl_loop_wait_high(tp, &rtl_rxtx_empty_cond, 100, 42);
break;
@@ -3669,6 +3673,7 @@ static void rtl_hw_config(struct rtl8169_private *tp)
[RTL_GIGA_MAC_VER_50] = rtl_hw_start_8168ep_2,
[RTL_GIGA_MAC_VER_51] = rtl_hw_start_8168ep_3,
[RTL_GIGA_MAC_VER_52] = rtl_hw_start_8117,
+   [RTL_GIGA_MAC_VER_53] = rtl_hw_start_8117,
[RTL_GIGA_MAC_VER_60] = rtl_hw_start_8125a_1,
[RTL_GIGA_MAC_VER_61] = rtl_hw_start_8125a_2,
[RTL_GIGA_MAC_VER_63] = rtl_hw_s

Re: [PATCH] r8169: Add support for another RTL8168FP

2021-02-01 Thread Kai-Heng Feng
On Tue, Feb 2, 2021 at 2:54 AM Heiner Kallweit  wrote:
>
> On 01.02.2021 17:47, Kai-Heng Feng wrote:
> > According to the vendor driver, the new chip with XID 0x54b is
> > essentially the same as the one with XID 0x54a, but it doesn't need the
> > firmware.
> >
> > So add support accordingly.
> >
> > Signed-off-by: Kai-Heng Feng 
> > ---
> >  drivers/net/ethernet/realtek/r8169.h  |  1 +
> >  drivers/net/ethernet/realtek/r8169_main.c | 21 +
> >  2 files changed, 14 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/realtek/r8169.h 
> > b/drivers/net/ethernet/realtek/r8169.h
> > index 7be86ef5a584..2728df46ec41 100644
> > --- a/drivers/net/ethernet/realtek/r8169.h
> > +++ b/drivers/net/ethernet/realtek/r8169.h
> > @@ -63,6 +63,7 @@ enum mac_version {
> >   RTL_GIGA_MAC_VER_50,
> >   RTL_GIGA_MAC_VER_51,
> >   RTL_GIGA_MAC_VER_52,
> > + RTL_GIGA_MAC_VER_53,
> >   RTL_GIGA_MAC_VER_60,
> >   RTL_GIGA_MAC_VER_61,
> >   RTL_GIGA_MAC_VER_63,
> > diff --git a/drivers/net/ethernet/realtek/r8169_main.c 
> > b/drivers/net/ethernet/realtek/r8169_main.c
> > index a569abe7f5ef..ee1f72a9d453 100644
> > --- a/drivers/net/ethernet/realtek/r8169_main.c
> > +++ b/drivers/net/ethernet/realtek/r8169_main.c
> > @@ -145,6 +145,7 @@ static const struct {
> >   [RTL_GIGA_MAC_VER_50] = {"RTL8168ep/8111ep" },
> >   [RTL_GIGA_MAC_VER_51] = {"RTL8168ep/8111ep" },
> >   [RTL_GIGA_MAC_VER_52] = {"RTL8168fp/RTL8117",  FIRMWARE_8168FP_3},
> > + [RTL_GIGA_MAC_VER_53] = {"RTL8168fp/RTL8117",   },
> >   [RTL_GIGA_MAC_VER_60] = {"RTL8125A" },
> >   [RTL_GIGA_MAC_VER_61] = {"RTL8125A",FIRMWARE_8125A_3},
> >   /* reserve 62 for CFG_METHOD_4 in the vendor driver */
> > @@ -682,7 +683,7 @@ static bool rtl_is_8168evl_up(struct rtl8169_private 
> > *tp)
> >  {
> >   return tp->mac_version >= RTL_GIGA_MAC_VER_34 &&
> >  tp->mac_version != RTL_GIGA_MAC_VER_39 &&
> > -tp->mac_version <= RTL_GIGA_MAC_VER_52;
> > +tp->mac_version <= RTL_GIGA_MAC_VER_53;
> >  }
> >
> >  static bool rtl_supports_eee(struct rtl8169_private *tp)
> > @@ -1012,7 +1013,9 @@ static u16 rtl_ephy_read(struct rtl8169_private *tp, 
> > int reg_addr)
> >  static void r8168fp_adjust_ocp_cmd(struct rtl8169_private *tp, u32 *cmd, 
> > int type)
> >  {
> >   /* based on RTL8168FP_OOBMAC_BASE in vendor driver */
> > - if (tp->mac_version == RTL_GIGA_MAC_VER_52 && type == ERIAR_OOB)
> > + if (type == ERIAR_OOB &&
> > + (tp->mac_version == RTL_GIGA_MAC_VER_52 ||
> > +  tp->mac_version == RTL_GIGA_MAC_VER_53))
> >   *cmd |= 0x7f0 << 18;
> >  }
> >
> > @@ -1164,7 +1167,7 @@ static void rtl8168_driver_start(struct 
> > rtl8169_private *tp)
> >   case RTL_GIGA_MAC_VER_31:
> >   rtl8168dp_driver_start(tp);
> >   break;
> > - case RTL_GIGA_MAC_VER_49 ... RTL_GIGA_MAC_VER_52:
> > + case RTL_GIGA_MAC_VER_49 ... RTL_GIGA_MAC_VER_53:
> >   rtl8168ep_driver_start(tp);
> >   break;
> >   default:
> > @@ -1195,7 +1198,7 @@ static void rtl8168_driver_stop(struct 
> > rtl8169_private *tp)
> >   case RTL_GIGA_MAC_VER_31:
> >   rtl8168dp_driver_stop(tp);
> >   break;
> > - case RTL_GIGA_MAC_VER_49 ... RTL_GIGA_MAC_VER_52:
> > + case RTL_GIGA_MAC_VER_49 ... RTL_GIGA_MAC_VER_53:
> >   rtl8168ep_driver_stop(tp);
> >   break;
> >   default:
> > @@ -1223,7 +1226,7 @@ static bool r8168_check_dash(struct rtl8169_private 
> > *tp)
> >   case RTL_GIGA_MAC_VER_28:
> >   case RTL_GIGA_MAC_VER_31:
> >   return r8168dp_check_dash(tp);
> > - case RTL_GIGA_MAC_VER_49 ... RTL_GIGA_MAC_VER_52:
> > + case RTL_GIGA_MAC_VER_49 ... RTL_GIGA_MAC_VER_53:
> >   return r8168ep_check_dash(tp);
> >   default:
> >   return false;
> > @@ -1930,6 +1933,7 @@ static enum mac_version rtl8169_get_mac_version(u16 
> > xid, bool gmii)
> >   { 0x7c8, 0x608, RTL_GIGA_MAC_VER_61 },
> >
> >   /* RTL8117 */
> > + { 0x7cf, 0x54b, RTL_GIGA_MAC_VER_53

[PATCH] r8169: Add support for another RTL8168FP

2021-02-01 Thread Kai-Heng Feng
According to the vendor driver, the new chip with XID 0x54b is
essentially the same as the one with XID 0x54a, but it doesn't need the
firmware.

So add support accordingly.

Signed-off-by: Kai-Heng Feng 
---
 drivers/net/ethernet/realtek/r8169.h  |  1 +
 drivers/net/ethernet/realtek/r8169_main.c | 21 +
 2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.h 
b/drivers/net/ethernet/realtek/r8169.h
index 7be86ef5a584..2728df46ec41 100644
--- a/drivers/net/ethernet/realtek/r8169.h
+++ b/drivers/net/ethernet/realtek/r8169.h
@@ -63,6 +63,7 @@ enum mac_version {
RTL_GIGA_MAC_VER_50,
RTL_GIGA_MAC_VER_51,
RTL_GIGA_MAC_VER_52,
+   RTL_GIGA_MAC_VER_53,
RTL_GIGA_MAC_VER_60,
RTL_GIGA_MAC_VER_61,
RTL_GIGA_MAC_VER_63,
diff --git a/drivers/net/ethernet/realtek/r8169_main.c 
b/drivers/net/ethernet/realtek/r8169_main.c
index a569abe7f5ef..ee1f72a9d453 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -145,6 +145,7 @@ static const struct {
[RTL_GIGA_MAC_VER_50] = {"RTL8168ep/8111ep" },
[RTL_GIGA_MAC_VER_51] = {"RTL8168ep/8111ep" },
[RTL_GIGA_MAC_VER_52] = {"RTL8168fp/RTL8117",  FIRMWARE_8168FP_3},
+   [RTL_GIGA_MAC_VER_53] = {"RTL8168fp/RTL8117",   },
[RTL_GIGA_MAC_VER_60] = {"RTL8125A" },
[RTL_GIGA_MAC_VER_61] = {"RTL8125A",FIRMWARE_8125A_3},
/* reserve 62 for CFG_METHOD_4 in the vendor driver */
@@ -682,7 +683,7 @@ static bool rtl_is_8168evl_up(struct rtl8169_private *tp)
 {
return tp->mac_version >= RTL_GIGA_MAC_VER_34 &&
   tp->mac_version != RTL_GIGA_MAC_VER_39 &&
-  tp->mac_version <= RTL_GIGA_MAC_VER_52;
+  tp->mac_version <= RTL_GIGA_MAC_VER_53;
 }
 
 static bool rtl_supports_eee(struct rtl8169_private *tp)
@@ -1012,7 +1013,9 @@ static u16 rtl_ephy_read(struct rtl8169_private *tp, int 
reg_addr)
 static void r8168fp_adjust_ocp_cmd(struct rtl8169_private *tp, u32 *cmd, int 
type)
 {
/* based on RTL8168FP_OOBMAC_BASE in vendor driver */
-   if (tp->mac_version == RTL_GIGA_MAC_VER_52 && type == ERIAR_OOB)
+   if (type == ERIAR_OOB &&
+   (tp->mac_version == RTL_GIGA_MAC_VER_52 ||
+tp->mac_version == RTL_GIGA_MAC_VER_53))
*cmd |= 0x7f0 << 18;
 }
 
@@ -1164,7 +1167,7 @@ static void rtl8168_driver_start(struct rtl8169_private 
*tp)
case RTL_GIGA_MAC_VER_31:
rtl8168dp_driver_start(tp);
break;
-   case RTL_GIGA_MAC_VER_49 ... RTL_GIGA_MAC_VER_52:
+   case RTL_GIGA_MAC_VER_49 ... RTL_GIGA_MAC_VER_53:
rtl8168ep_driver_start(tp);
break;
default:
@@ -1195,7 +1198,7 @@ static void rtl8168_driver_stop(struct rtl8169_private 
*tp)
case RTL_GIGA_MAC_VER_31:
rtl8168dp_driver_stop(tp);
break;
-   case RTL_GIGA_MAC_VER_49 ... RTL_GIGA_MAC_VER_52:
+   case RTL_GIGA_MAC_VER_49 ... RTL_GIGA_MAC_VER_53:
rtl8168ep_driver_stop(tp);
break;
default:
@@ -1223,7 +1226,7 @@ static bool r8168_check_dash(struct rtl8169_private *tp)
case RTL_GIGA_MAC_VER_28:
case RTL_GIGA_MAC_VER_31:
return r8168dp_check_dash(tp);
-   case RTL_GIGA_MAC_VER_49 ... RTL_GIGA_MAC_VER_52:
+   case RTL_GIGA_MAC_VER_49 ... RTL_GIGA_MAC_VER_53:
return r8168ep_check_dash(tp);
default:
return false;
@@ -1930,6 +1933,7 @@ static enum mac_version rtl8169_get_mac_version(u16 xid, 
bool gmii)
{ 0x7c8, 0x608, RTL_GIGA_MAC_VER_61 },
 
/* RTL8117 */
+   { 0x7cf, 0x54b, RTL_GIGA_MAC_VER_53 },
{ 0x7cf, 0x54a, RTL_GIGA_MAC_VER_52 },
 
/* 8168EP family. */
@@ -2274,7 +2278,7 @@ static void rtl_init_rxcfg(struct rtl8169_private *tp)
case RTL_GIGA_MAC_VER_38:
RTL_W32(tp, RxConfig, RX128_INT_EN | RX_MULTI_EN | 
RX_DMA_BURST);
break;
-   case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_52:
+   case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_53:
RTL_W32(tp, RxConfig, RX128_INT_EN | RX_MULTI_EN | RX_DMA_BURST 
| RX_EARLY_OFF);
break;
case RTL_GIGA_MAC_VER_60 ... RTL_GIGA_MAC_VER_63:
@@ -2449,7 +2453,7 @@ DECLARE_RTL_COND(rtl_rxtx_empty_cond_2)
 static void rtl_wait_txrx_fifo_empty(struct rtl8169_private *tp)
 {
switch (tp->mac_version) {
-   case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_52:
+   case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_53:
rtl_loop_wait_high(tp, &rtl_txcfg_empty_cond, 100, 42);
 

Re: [PATCH] rtw88: 8821c: Add RFE 2 support

2021-01-06 Thread Kai-Heng Feng
On Wed, Aug 5, 2020 at 7:24 PM Kai-Heng Feng
 wrote:
>
> Hi Tony,
>
> > On Aug 5, 2020, at 19:18, Tony Chuang  wrote:
> >
> >> 8821CE with RFE 2 isn't supported:
> >> [   12.404834] rtw_8821ce :02:00.0: rfe 2 isn't supported
> >> [   12.404937] rtw_8821ce :02:00.0: failed to setup chip efuse info
> >> [   12.404939] rtw_8821ce :02:00.0: failed to setup chip information
> >>
> >
> > NACK
> >
> > The RFE type 2 should be working with some additional fixes.
> > Did you tested connecting to AP with BT paired?
>
> No, I only tested WiFi.
>
> > The antenna configuration is different with RFE type 0.
> > I will ask someone else to fix them.
> > Then the RFE type 2 modules can be supported.
>
> Good to know that, I'll be patient and wait for a real fix.

It's been quite some time, is support for RFE type 2 ready now?

Kai-Heng

>
> Kai-Heng
>
> >
> > Yen-Hsuan
>


Re: [PATCH 1/2][v3] e1000e: Leverage direct_complete to speed up s2ram

2020-12-02 Thread Kai-Heng Feng
Hi Yu,

> On Dec 1, 2020, at 09:21, Chen Yu  wrote:
> 
> The NIC is put in runtime suspend status when there is no cable connected.
> As a result, it is safe to keep non-wakeup NIC in runtime suspended during
> s2ram because the system does not rely on the NIC plug event nor WoL to wake
> up the system. Besides that, unlike the s2idle, s2ram does not need to
> manipulate S0ix settings during suspend.
> 
> This patch introduces the .prepare() for e1000e so that if the NIC is runtime
> suspended the subsequent suspend/resume hooks will be skipped so as to speed
> up the s2ram. The pm core will check whether the NIC is a wake up device so
> there's no need to check it again in .prepare(). DPM_FLAG_SMART_PREPARE flag
> should be set during probe to ask the pci subsystem to honor the driver's
> prepare() result. Besides, the NIC remains runtime suspended after resumed
> from s2ram as there is no need to resume it.
> 
> Tested on i7-2600K with 82579V NIC
> Before the patch:
> e1000e :00:19.0: pci_pm_suspend+0x0/0x160 returned 0 after 225146 usecs
> e1000e :00:19.0: pci_pm_resume+0x0/0x90 returned 0 after 140588 usecs
> 
> After the patch:
> echo disabled > //sys/devices/pci\:00/\:00\:19.0/power/wakeup
> becomes 0 usecs because the hooks will be skipped.
> 
> Suggested-by: Kai-Heng Feng 
> Signed-off-by: Chen Yu 

Well, I was intended to send it, but anyway :)

> ---
> v2: Added test data and some commit log revise(Paul Menzel)
>    Only skip the suspend/resume if the NIC is not a wake up device specified
>by the user(Kai-Heng Feng)
> v3: Leverage direct complete mechanism to skip all hooks(Kai-Heng Feng)
> ---
> drivers/net/ethernet/intel/e1000e/netdev.c | 10 +-
> 1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
> b/drivers/net/ethernet/intel/e1000e/netdev.c
> index b30f00891c03..b210bba3f20a 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -25,6 +25,7 @@
> #include 
> #include 
> #include 
> +#include 
> 
> #include "e1000.h"
> 
> @@ -6957,6 +6958,12 @@ static int __e1000_resume(struct pci_dev *pdev)
>   return 0;
> }
> 
> +static int e1000e_pm_prepare(struct device *dev)
> +{
> + return pm_runtime_suspended(dev) &&
> + pm_suspend_via_firmware();
> +}
> +
> static __maybe_unused int e1000e_pm_suspend(struct device *dev)
> {
>   struct net_device *netdev = pci_get_drvdata(to_pci_dev(dev));
> @@ -7665,7 +7672,7 @@ static int e1000_probe(struct pci_dev *pdev, const 
> struct pci_device_id *ent)
> 
>   e1000_print_device_info(adapter);
> 
> - dev_pm_set_driver_flags(&pdev->dev, DPM_FLAG_NO_DIRECT_COMPLETE);
> + dev_pm_set_driver_flags(&pdev->dev, DPM_FLAG_SMART_PREPARE);

This isn't required for pci_pm_prepare() to use driver's .prepare callback.

> 
>   if (pci_dev_run_wake(pdev) && hw->mac.type < e1000_pch_cnp)
>   pm_runtime_put_noidle(&pdev->dev);
> @@ -7890,6 +7897,7 @@ MODULE_DEVICE_TABLE(pci, e1000_pci_tbl);
> 
> static const struct dev_pm_ops e1000_pm_ops = {
> #ifdef CONFIG_PM_SLEEP
> + .prepare= e1000e_pm_prepare,

How do we make sure a link change happened in S3 can be detect after resume, 
without a .complete callback which ask device to runtime resume?

Kai-Heng

>   .suspend= e1000e_pm_suspend,
>   .resume = e1000e_pm_resume,
>   .freeze = e1000e_pm_freeze,
> -- 
> 2.17.1
> 



[PATCH] igc: Report speed and duplex as unknown when device is runtime suspended

2020-12-01 Thread Kai-Heng Feng
h begin()
and complete() callbacks, and calls runtime resume and runtime suspend
routine respectively. However, igc is like igb, runtime resume routine
uses rtnl_lock() which upper ethtool layer also uses.

So to prevent a deadlock on rtnl, take a different approach, use
pm_runtime_suspended() to avoid reading register while device is runtime
suspended.

Cc: 
Signed-off-by: Kai-Heng Feng 
---
 drivers/net/ethernet/intel/igc/igc_ethtool.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/igc/igc_ethtool.c 
b/drivers/net/ethernet/intel/igc/igc_ethtool.c
index 61d331ce38cd..4b9eac9dc090 100644
--- a/drivers/net/ethernet/intel/igc/igc_ethtool.c
+++ b/drivers/net/ethernet/intel/igc/igc_ethtool.c
@@ -1708,7 +1708,8 @@ static int igc_ethtool_get_link_ksettings(struct 
net_device *netdev,
 Asym_Pause);
}
 
-   status = rd32(IGC_STATUS);
+   status = pm_runtime_suspended(&adapter->pdev->dev) ?
+0 : rd32(IGC_STATUS);
 
if (status & IGC_STATUS_LU) {
if (status & IGC_STATUS_SPEED_1000) {
-- 
2.29.2



Re: [Intel-wired-lan] [PATCH v4] e1000e: Increase polling timeout on MDIC ready bit

2020-10-04 Thread Kai-Heng Feng
Hi Vitaly,

> On Sep 30, 2020, at 14:54, Vitaly Lifshits  wrote:
> 
> On 9/29/2020 18:08, Kai-Heng Feng wrote:
> 
> Hello Kai-Heng,
>>> On Sep 29, 2020, at 21:46, Neftin, Sasha  wrote:
>>> 
>>> Hello Kai-Heng,
>>> On 9/29/2020 16:31, Kai-Heng Feng wrote:
>>>> Hi Sasha,
>>>>> On Sep 29, 2020, at 21:08, Neftin, Sasha  wrote:
>>>>> 
>>>>> On 9/28/2020 11:36, Kai-Heng Feng wrote:
>>>>>> We are seeing the following error after S3 resume:
>>>>>> [  704.746874] e1000e :00:1f.6 eno1: Setting page 0x6020
>>>>>> [  704.844232] e1000e :00:1f.6 eno1: MDI Write did not complete
>>>>>> [  704.902817] e1000e :00:1f.6 eno1: Setting page 0x6020
>>>>>> [  704.903075] e1000e :00:1f.6 eno1: reading PHY page 769 (or 0x6020 
>>>>>> shifted) reg 0x17
>>>>>> [  704.903281] e1000e :00:1f.6 eno1: Setting page 0x6020
>>>>>> [  704.903486] e1000e :00:1f.6 eno1: writing PHY page 769 (or 0x6020 
>>>>>> shifted) reg 0x17
>>>>>> [  704.943155] e1000e :00:1f.6 eno1: MDI Error
>>>>>> ...
>>>>>> [  705.108161] e1000e :00:1f.6 eno1: Hardware Error
>>>>>> As Andrew Lunn pointed out, MDIO has nothing to do with phy, and indeed
>>>>>> increase polling iteration can resolve the issue.
>>>>>> This patch only papers over the symptom, as we don't really know the
>>>>>> root cause of the issue. The most possible culprit is Intel ME, which
>>>>>> may do its own things that conflict with software.
>>>>>> Signed-off-by: Kai-Heng Feng 
>>>>>> ---
>>>>>> v4:
>>>>>>  - States that this patch just papers over the symptom.
>>>>>> v3:
>>>>>>  - Moving delay to end of loop doesn't save anytime, move it back.
>>>>>>  - Point out this is quitely likely caused by Intel ME.
>>>>>> v2:
>>>>>>  - Increase polling iteration instead of powering down the phy.
>>>>>>  drivers/net/ethernet/intel/e1000e/phy.c | 2 +-
>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>> diff --git a/drivers/net/ethernet/intel/e1000e/phy.c 
>>>>>> b/drivers/net/ethernet/intel/e1000e/phy.c
>>>>>> index e11c877595fb..e6d4acd90937 100644
>>>>>> --- a/drivers/net/ethernet/intel/e1000e/phy.c
>>>>>> +++ b/drivers/net/ethernet/intel/e1000e/phy.c
>>>>>> @@ -203,7 +203,7 @@ s32 e1000e_write_phy_reg_mdic(struct e1000_hw *hw, 
>>>>>> u32 offset, u16 data)
>>>>>>   * Increasing the time out as testing showed failures with
>>>>>>   * the lower time out
>>>>>>   */
>>>>>> -for (i = 0; i < (E1000_GEN_POLL_TIMEOUT * 3); i++) {
>>>>>> +for (i = 0; i < (E1000_GEN_POLL_TIMEOUT * 10); i++) {
>>>>> As we discussed (many threads) - AMT/ME systems not supported on Linux as 
>>>>> properly. I do not think increasing polling iteration will solve the 
>>>>> problem. Rather mask it.
>>>> I am aware of the status quo of no proper support on Intel ME.
>>>>> I prefer you check option to disable ME vi BIOS on your system.
>>>> We can't ask user to change the BIOS to accommodate Linux. So before a 
>>>> proper solution comes out, masking the problem is good enough for me.
>>>> Until then, I'll carry it as a downstream distro patch.
>>> What will you do with system that even after increasing polling time will 
>>> run into HW error?
>> Hope we finally have proper ME support under Linux?
>> Kai-Heng
>>>> Kai-Heng
>>>>>>  udelay(50);
>>>>>>  mdic = er32(MDIC);
>>>>>>  if (mdic & E1000_MDIC_READY)
>>>>> Thanks,
>>>>> Sasha
>>> Thanks,
>>> Sasha
> 
> On which device ID/platform do you see the issue?

HP Z4 G4 Workstation,
00:1f.6 Ethernet controller [0200]: Intel Corporation Ethernet Connection (2) 
I219-LM [8086:15b7]

> What is the Firmware version on your platform?

BIOS version: P61 v02.59


> What is the ME firmware version that you have?

ME version: 11.11.50.1422
ME mode: AMT disable

Kai-Heng

> 
> I am asking these questions, since I know there is supposed to be a fix in 
> the firmware to many issues that are related to ME and device 
> interoperability.
> 
> Thanks,
> 
> Vitaly



Re: [Intel-wired-lan] [PATCH v4] e1000e: Increase polling timeout on MDIC ready bit

2020-09-29 Thread Kai-Heng Feng



> On Sep 29, 2020, at 23:11, David Laight  wrote:
> 
>> Hope we finally have proper ME support under Linux?
> 
> How about a way to disable it.

This will do, too :)

Kai-Heng

> 
>   David
> 
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 
> 1PT, UK
> Registration No: 1397386 (Wales)
> 



Re: [Intel-wired-lan] [PATCH v4] e1000e: Increase polling timeout on MDIC ready bit

2020-09-29 Thread Kai-Heng Feng



> On Sep 29, 2020, at 21:46, Neftin, Sasha  wrote:
> 
> Hello Kai-Heng,
> On 9/29/2020 16:31, Kai-Heng Feng wrote:
>> Hi Sasha,
>>> On Sep 29, 2020, at 21:08, Neftin, Sasha  wrote:
>>> 
>>> On 9/28/2020 11:36, Kai-Heng Feng wrote:
>>>> We are seeing the following error after S3 resume:
>>>> [  704.746874] e1000e :00:1f.6 eno1: Setting page 0x6020
>>>> [  704.844232] e1000e :00:1f.6 eno1: MDI Write did not complete
>>>> [  704.902817] e1000e :00:1f.6 eno1: Setting page 0x6020
>>>> [  704.903075] e1000e :00:1f.6 eno1: reading PHY page 769 (or 0x6020 
>>>> shifted) reg 0x17
>>>> [  704.903281] e1000e :00:1f.6 eno1: Setting page 0x6020
>>>> [  704.903486] e1000e :00:1f.6 eno1: writing PHY page 769 (or 0x6020 
>>>> shifted) reg 0x17
>>>> [  704.943155] e1000e :00:1f.6 eno1: MDI Error
>>>> ...
>>>> [  705.108161] e1000e :00:1f.6 eno1: Hardware Error
>>>> As Andrew Lunn pointed out, MDIO has nothing to do with phy, and indeed
>>>> increase polling iteration can resolve the issue.
>>>> This patch only papers over the symptom, as we don't really know the
>>>> root cause of the issue. The most possible culprit is Intel ME, which
>>>> may do its own things that conflict with software.
>>>> Signed-off-by: Kai-Heng Feng 
>>>> ---
>>>> v4:
>>>>  - States that this patch just papers over the symptom.
>>>> v3:
>>>>  - Moving delay to end of loop doesn't save anytime, move it back.
>>>>  - Point out this is quitely likely caused by Intel ME.
>>>> v2:
>>>>  - Increase polling iteration instead of powering down the phy.
>>>>  drivers/net/ethernet/intel/e1000e/phy.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>> diff --git a/drivers/net/ethernet/intel/e1000e/phy.c 
>>>> b/drivers/net/ethernet/intel/e1000e/phy.c
>>>> index e11c877595fb..e6d4acd90937 100644
>>>> --- a/drivers/net/ethernet/intel/e1000e/phy.c
>>>> +++ b/drivers/net/ethernet/intel/e1000e/phy.c
>>>> @@ -203,7 +203,7 @@ s32 e1000e_write_phy_reg_mdic(struct e1000_hw *hw, u32 
>>>> offset, u16 data)
>>>> * Increasing the time out as testing showed failures with
>>>> * the lower time out
>>>> */
>>>> -  for (i = 0; i < (E1000_GEN_POLL_TIMEOUT * 3); i++) {
>>>> +  for (i = 0; i < (E1000_GEN_POLL_TIMEOUT * 10); i++) {
>>> As we discussed (many threads) - AMT/ME systems not supported on Linux as 
>>> properly. I do not think increasing polling iteration will solve the 
>>> problem. Rather mask it.
>> I am aware of the status quo of no proper support on Intel ME.
>>> I prefer you check option to disable ME vi BIOS on your system.
>> We can't ask user to change the BIOS to accommodate Linux. So before a 
>> proper solution comes out, masking the problem is good enough for me.
>> Until then, I'll carry it as a downstream distro patch.
> What will you do with system that even after increasing polling time will run 
> into HW error?

Hope we finally have proper ME support under Linux?

Kai-Heng

>> Kai-Heng
>>>>udelay(50);
>>>>mdic = er32(MDIC);
>>>>if (mdic & E1000_MDIC_READY)
>>> Thanks,
>>> Sasha
> Thanks,
> Sasha



Re: [Intel-wired-lan] [PATCH v4] e1000e: Increase polling timeout on MDIC ready bit

2020-09-29 Thread Kai-Heng Feng
Hi Sasha,

> On Sep 29, 2020, at 21:08, Neftin, Sasha  wrote:
> 
> On 9/28/2020 11:36, Kai-Heng Feng wrote:
>> We are seeing the following error after S3 resume:
>> [  704.746874] e1000e :00:1f.6 eno1: Setting page 0x6020
>> [  704.844232] e1000e :00:1f.6 eno1: MDI Write did not complete
>> [  704.902817] e1000e :00:1f.6 eno1: Setting page 0x6020
>> [  704.903075] e1000e :00:1f.6 eno1: reading PHY page 769 (or 0x6020 
>> shifted) reg 0x17
>> [  704.903281] e1000e :00:1f.6 eno1: Setting page 0x6020
>> [  704.903486] e1000e :00:1f.6 eno1: writing PHY page 769 (or 0x6020 
>> shifted) reg 0x17
>> [  704.943155] e1000e :00:1f.6 eno1: MDI Error
>> ...
>> [  705.108161] e1000e :00:1f.6 eno1: Hardware Error
>> As Andrew Lunn pointed out, MDIO has nothing to do with phy, and indeed
>> increase polling iteration can resolve the issue.
>> This patch only papers over the symptom, as we don't really know the
>> root cause of the issue. The most possible culprit is Intel ME, which
>> may do its own things that conflict with software.
>> Signed-off-by: Kai-Heng Feng 
>> ---
>> v4:
>>  - States that this patch just papers over the symptom.
>> v3:
>>  - Moving delay to end of loop doesn't save anytime, move it back.
>>  - Point out this is quitely likely caused by Intel ME.
>> v2:
>>  - Increase polling iteration instead of powering down the phy.
>>  drivers/net/ethernet/intel/e1000e/phy.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> diff --git a/drivers/net/ethernet/intel/e1000e/phy.c 
>> b/drivers/net/ethernet/intel/e1000e/phy.c
>> index e11c877595fb..e6d4acd90937 100644
>> --- a/drivers/net/ethernet/intel/e1000e/phy.c
>> +++ b/drivers/net/ethernet/intel/e1000e/phy.c
>> @@ -203,7 +203,7 @@ s32 e1000e_write_phy_reg_mdic(struct e1000_hw *hw, u32 
>> offset, u16 data)
>>   * Increasing the time out as testing showed failures with
>>   * the lower time out
>>   */
>> -for (i = 0; i < (E1000_GEN_POLL_TIMEOUT * 3); i++) {
>> +for (i = 0; i < (E1000_GEN_POLL_TIMEOUT * 10); i++) {
> As we discussed (many threads) - AMT/ME systems not supported on Linux as 
> properly. I do not think increasing polling iteration will solve the problem. 
> Rather mask it.

I am aware of the status quo of no proper support on Intel ME. 

> I prefer you check option to disable ME vi BIOS on your system.

We can't ask user to change the BIOS to accommodate Linux. So before a proper 
solution comes out, masking the problem is good enough for me.
Until then, I'll carry it as a downstream distro patch.

Kai-Heng

>>  udelay(50);
>>  mdic = er32(MDIC);
>>  if (mdic & E1000_MDIC_READY)
> Thanks,
> Sasha



[PATCH v2] rtw88: pci: Power cycle device during shutdown

2020-09-28 Thread Kai-Heng Feng
There are reports that 8822CE fails to work rtw88 with "failed to read DBI
register" error. Also I have a system with 8723DE which freezes the whole
system when the rtw88 is probing the device.

According to [1], platform firmware may not properly power manage the
device during shutdown. I did some expirements and putting the device to
D3 can workaround the issue.

So let's power cycle the device by putting the device to D3 at shutdown
to prevent the issue from happening.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=206411#c9

BugLink: https://bugs.launchpad.net/bugs/1872984
Signed-off-by: Kai-Heng Feng 
---
v2:
 - Add more detail in commit log.

 drivers/net/wireless/realtek/rtw88/pci.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/wireless/realtek/rtw88/pci.c 
b/drivers/net/wireless/realtek/rtw88/pci.c
index 3413973bc475..7f1f5073b9f4 100644
--- a/drivers/net/wireless/realtek/rtw88/pci.c
+++ b/drivers/net/wireless/realtek/rtw88/pci.c
@@ -1599,6 +1599,8 @@ void rtw_pci_shutdown(struct pci_dev *pdev)
 
if (chip->ops->shutdown)
chip->ops->shutdown(rtwdev);
+
+   pci_set_power_state(pdev, PCI_D3hot);
 }
 EXPORT_SYMBOL(rtw_pci_shutdown);
 
-- 
2.17.1



[PATCH v4] e1000e: Increase polling timeout on MDIC ready bit

2020-09-28 Thread Kai-Heng Feng
We are seeing the following error after S3 resume:
[  704.746874] e1000e :00:1f.6 eno1: Setting page 0x6020
[  704.844232] e1000e :00:1f.6 eno1: MDI Write did not complete
[  704.902817] e1000e :00:1f.6 eno1: Setting page 0x6020
[  704.903075] e1000e :00:1f.6 eno1: reading PHY page 769 (or 0x6020 
shifted) reg 0x17
[  704.903281] e1000e :00:1f.6 eno1: Setting page 0x6020
[  704.903486] e1000e :00:1f.6 eno1: writing PHY page 769 (or 0x6020 
shifted) reg 0x17
[  704.943155] e1000e :00:1f.6 eno1: MDI Error
...
[  705.108161] e1000e :00:1f.6 eno1: Hardware Error

As Andrew Lunn pointed out, MDIO has nothing to do with phy, and indeed
increase polling iteration can resolve the issue.

This patch only papers over the symptom, as we don't really know the
root cause of the issue. The most possible culprit is Intel ME, which
may do its own things that conflict with software.

Signed-off-by: Kai-Heng Feng 
---
v4:
 - States that this patch just papers over the symptom.

v3:
 - Moving delay to end of loop doesn't save anytime, move it back.
 - Point out this is quitely likely caused by Intel ME.

v2:
 - Increase polling iteration instead of powering down the phy.

 drivers/net/ethernet/intel/e1000e/phy.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/e1000e/phy.c 
b/drivers/net/ethernet/intel/e1000e/phy.c
index e11c877595fb..e6d4acd90937 100644
--- a/drivers/net/ethernet/intel/e1000e/phy.c
+++ b/drivers/net/ethernet/intel/e1000e/phy.c
@@ -203,7 +203,7 @@ s32 e1000e_write_phy_reg_mdic(struct e1000_hw *hw, u32 
offset, u16 data)
 * Increasing the time out as testing showed failures with
 * the lower time out
 */
-   for (i = 0; i < (E1000_GEN_POLL_TIMEOUT * 3); i++) {
+   for (i = 0; i < (E1000_GEN_POLL_TIMEOUT * 10); i++) {
udelay(50);
mdic = er32(MDIC);
if (mdic & E1000_MDIC_READY)
-- 
2.17.1



Re: [PATCH v3] e1000e: Increase iteration on polling MDIC ready bit

2020-09-24 Thread Kai-Heng Feng



> On Sep 25, 2020, at 03:57, Andrew Lunn  wrote:
> 
> On Fri, Sep 25, 2020 at 12:45:42AM +0800, Kai-Heng Feng wrote:
>> We are seeing the following error after S3 resume:
>> [  704.746874] e1000e :00:1f.6 eno1: Setting page 0x6020
>> [  704.844232] e1000e :00:1f.6 eno1: MDI Write did not complete
>> [  704.902817] e1000e :00:1f.6 eno1: Setting page 0x6020
>> [  704.903075] e1000e :00:1f.6 eno1: reading PHY page 769 (or 0x6020 
>> shifted) reg 0x17
>> [  704.903281] e1000e :00:1f.6 eno1: Setting page 0x6020
>> [  704.903486] e1000e :00:1f.6 eno1: writing PHY page 769 (or 0x6020 
>> shifted) reg 0x17
>> [  704.943155] e1000e :00:1f.6 eno1: MDI Error
>> ...
>> [  705.108161] e1000e :00:1f.6 eno1: Hardware Error
>> 
>> As Andrew Lunn pointed out, MDIO has nothing to do with phy, and indeed
>> increase polling iteration can resolve the issue.
>> 
>> The root cause is quite likely Intel ME, since it's a blackbox to the
>> kernel so the only approach we can take is to be patient and wait
>> longer.
> 
> Please could you explain how you see Intel ME being responsible for
> this. I'm not convinced.

Some other occurrences:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d17c7868b2f8e329dcee4ecd2f5d16cfc9b26ac8
https://lore.kernel.org/netdev/20200323191639.48826-1-aaron...@canonical.com/

Of course we need an ACK from Intel this one is also related to ME.

Kai-Heng

> 
>  Andrew



[PATCH v3] e1000e: Increase iteration on polling MDIC ready bit

2020-09-24 Thread Kai-Heng Feng
We are seeing the following error after S3 resume:
[  704.746874] e1000e :00:1f.6 eno1: Setting page 0x6020
[  704.844232] e1000e :00:1f.6 eno1: MDI Write did not complete
[  704.902817] e1000e :00:1f.6 eno1: Setting page 0x6020
[  704.903075] e1000e :00:1f.6 eno1: reading PHY page 769 (or 0x6020 
shifted) reg 0x17
[  704.903281] e1000e :00:1f.6 eno1: Setting page 0x6020
[  704.903486] e1000e :00:1f.6 eno1: writing PHY page 769 (or 0x6020 
shifted) reg 0x17
[  704.943155] e1000e :00:1f.6 eno1: MDI Error
...
[  705.108161] e1000e :00:1f.6 eno1: Hardware Error

As Andrew Lunn pointed out, MDIO has nothing to do with phy, and indeed
increase polling iteration can resolve the issue.

The root cause is quite likely Intel ME, since it's a blackbox to the
kernel so the only approach we can take is to be patient and wait
longer.

Signed-off-by: Kai-Heng Feng 
---
v3:
 - Moving delay to end of loop doesn't save anytime, move it back.
 - Point out this is quitely likely caused by Intel ME.

v2:
 - Increase polling iteration instead of powering down the phy.

 drivers/net/ethernet/intel/e1000e/phy.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/e1000e/phy.c 
b/drivers/net/ethernet/intel/e1000e/phy.c
index e11c877595fb..e6d4acd90937 100644
--- a/drivers/net/ethernet/intel/e1000e/phy.c
+++ b/drivers/net/ethernet/intel/e1000e/phy.c
@@ -203,7 +203,7 @@ s32 e1000e_write_phy_reg_mdic(struct e1000_hw *hw, u32 
offset, u16 data)
 * Increasing the time out as testing showed failures with
 * the lower time out
 */
-   for (i = 0; i < (E1000_GEN_POLL_TIMEOUT * 3); i++) {
+   for (i = 0; i < (E1000_GEN_POLL_TIMEOUT * 10); i++) {
udelay(50);
mdic = er32(MDIC);
if (mdic & E1000_MDIC_READY)
-- 
2.17.1



Re: [PATCH v2] e1000e: Increase iteration on polling MDIC ready bit

2020-09-24 Thread Kai-Heng Feng
Hi Andrew,

> On Sep 24, 2020, at 23:53, Andrew Lunn  wrote:
> 
> On Thu, Sep 24, 2020 at 11:09:58PM +0800, Kai-Heng Feng wrote:
>> We are seeing the following error after S3 resume:
>> [  704.746874] e1000e :00:1f.6 eno1: Setting page 0x6020
>> [  704.844232] e1000e :00:1f.6 eno1: MDI Write did not complete
>> [  704.902817] e1000e :00:1f.6 eno1: Setting page 0x6020
>> [  704.903075] e1000e :00:1f.6 eno1: reading PHY page 769 (or 0x6020 
>> shifted) reg 0x17
>> [  704.903281] e1000e :00:1f.6 eno1: Setting page 0x6020
>> [  704.903486] e1000e :00:1f.6 eno1: writing PHY page 769 (or 0x6020 
>> shifted) reg 0x17
>> [  704.943155] e1000e :00:1f.6 eno1: MDI Error
>> ...
>> [  705.108161] e1000e :00:1f.6 eno1: Hardware Error
>> 
>> As Andrew Lunn pointed out, MDIO has nothing to do with phy, and indeed
>> increase polling iteration can resolve the issue.
>> 
>> While at it, also move the delay to the end of loop, to potentially save
>> 50 us.
> 
> You are unlikely to save any time. 64 bits at 2.5MHz is 25.6uS. So it
> is very unlikely doing a read directly after setting is going is going
> to have E1000_MDIC_READY set. So this change likely causes an addition
> read on MDIC. Did you profile this at all, for the normal case?

You are right, it's actually may add an additional read.
Let me send a v3.

> 
> I also don't fully understand the fix. You are now looping up to 6400
> times, each with a delay of 50uS. So that is around 12800 times more
> than it actually needs to transfer the 64 bits! I've no idea how this
> hardware works, but my guess would be, something is wrong with the
> clock setup?

It's probably caused by Intel ME. This is not something new, you can find many 
polling codes in e1000e driver are for ME, especially after S3 resume.

Unless Intel is willing to open up ME, being patient and wait for a longer 
while is the best approach we got.

Kai-Heng

> 
> Andrew



[PATCH v2] e1000e: Increase iteration on polling MDIC ready bit

2020-09-24 Thread Kai-Heng Feng
We are seeing the following error after S3 resume:
[  704.746874] e1000e :00:1f.6 eno1: Setting page 0x6020
[  704.844232] e1000e :00:1f.6 eno1: MDI Write did not complete
[  704.902817] e1000e :00:1f.6 eno1: Setting page 0x6020
[  704.903075] e1000e :00:1f.6 eno1: reading PHY page 769 (or 0x6020 
shifted) reg 0x17
[  704.903281] e1000e :00:1f.6 eno1: Setting page 0x6020
[  704.903486] e1000e :00:1f.6 eno1: writing PHY page 769 (or 0x6020 
shifted) reg 0x17
[  704.943155] e1000e :00:1f.6 eno1: MDI Error
...
[  705.108161] e1000e :00:1f.6 eno1: Hardware Error

As Andrew Lunn pointed out, MDIO has nothing to do with phy, and indeed
increase polling iteration can resolve the issue.

While at it, also move the delay to the end of loop, to potentially save
50 us.

Signed-off-by: Kai-Heng Feng 
---
v2:
 - Increase polling iteration instead of powering down the phy.

 drivers/net/ethernet/intel/e1000e/phy.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/phy.c 
b/drivers/net/ethernet/intel/e1000e/phy.c
index e11c877595fb..72968a01164b 100644
--- a/drivers/net/ethernet/intel/e1000e/phy.c
+++ b/drivers/net/ethernet/intel/e1000e/phy.c
@@ -203,11 +203,12 @@ s32 e1000e_write_phy_reg_mdic(struct e1000_hw *hw, u32 
offset, u16 data)
 * Increasing the time out as testing showed failures with
 * the lower time out
 */
-   for (i = 0; i < (E1000_GEN_POLL_TIMEOUT * 3); i++) {
-   udelay(50);
+   for (i = 0; i < (E1000_GEN_POLL_TIMEOUT * 10); i++) {
mdic = er32(MDIC);
if (mdic & E1000_MDIC_READY)
break;
+
+   udelay(50);
}
if (!(mdic & E1000_MDIC_READY)) {
e_dbg("MDI Write did not complete\n");
-- 
2.17.1



Re: [PATCH] e1000e: Power cycle phy on PM resume

2020-09-24 Thread Kai-Heng Feng
Hi Andrew,

> On Sep 23, 2020, at 23:37, Andrew Lunn  wrote:
> 
> On Wed, Sep 23, 2020 at 10:44:10PM +0800, Kai-Heng Feng wrote:
>> Hi Andrew,
>> 
>>> On Sep 23, 2020, at 20:17, Andrew Lunn  wrote:
>>> 
>>> On Wed, Sep 23, 2020 at 03:47:51PM +0800, Kai-Heng Feng wrote:
>>>> We are seeing the following error after S3 resume:
>>>> [  704.746874] e1000e :00:1f.6 eno1: Setting page 0x6020
>>>> [  704.844232] e1000e :00:1f.6 eno1: MDI Write did not complete
>>>> [  704.902817] e1000e :00:1f.6 eno1: Setting page 0x6020
>>>> [  704.903075] e1000e :00:1f.6 eno1: reading PHY page 769 (or 0x6020 
>>>> shifted) reg 0x17
>>>> [  704.903281] e1000e :00:1f.6 eno1: Setting page 0x6020
>>>> [  704.903486] e1000e :00:1f.6 eno1: writing PHY page 769 (or 0x6020 
>>>> shifted) reg 0x17
>>>> [  704.943155] e1000e :00:1f.6 eno1: MDI Error
>>>> ...
>>>> [  705.108161] e1000e :00:1f.6 eno1: Hardware Error
>>>> 
>>>> Since we don't know what platform firmware may do to the phy, so let's
>>>> power cycle the phy upon system resume to resolve the issue.
>>>> 
>>>> Signed-off-by: Kai-Heng Feng 
>>>> ---
>>>> drivers/net/ethernet/intel/e1000e/netdev.c | 2 ++
>>>> 1 file changed, 2 insertions(+)
>>>> 
>>>> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
>>>> b/drivers/net/ethernet/intel/e1000e/netdev.c
>>>> index 664e8ccc88d2..c2a87a408102 100644
>>>> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
>>>> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
>>>> @@ -6968,6 +6968,8 @@ static __maybe_unused int e1000e_pm_resume(struct 
>>>> device *dev)
>>>>!e1000e_check_me(hw->adapter->pdev->device))
>>>>e1000e_s0ix_exit_flow(adapter);
>>>> 
>>>> +  e1000_power_down_phy(adapter);
>>>> +
>>> 
>>> static void e1000_power_down_phy(struct e1000_adapter *adapter)
>>> {
>>> struct e1000_hw *hw = &adapter->hw;
>>> 
>>> /* Power down the PHY so no link is implied when interface is down *
>>>  * The PHY cannot be powered down if any of the following is true *
>>>  * (a) WoL is enabled
>>>  * (b) AMT is active
>>>  * (c) SoL/IDER session is active
>>>  */
>>> if (!adapter->wol && hw->mac_type >= e1000_82540 &&
>>>hw->media_type == e1000_media_type_copper) {
>> 
>> Looks like the the function comes from e1000, 
>> drivers/net/ethernet/intel/e1000/e1000_main.c.
>> However, this patch is for e1000e, so the function with same name is 
>> different.
> 
> Ah! Sorry. Missed that. Also it is not nice there are two functions in
> the kernel with the same name.
> 
>>> Could it be coming out of S3 because it just received a WoL?
>> 
>> No, the issue can be reproduced by pressing keyboard or rtcwake.
> 
> Not relevant now, since i was looking at the wrong function. But i was
> meaning the call is a NOP in the case WoL caused the wake up. So if
> the issues can also happen after WoL, your fix is not going to fix it.
> 
>>> It seems unlikely that it is the MII_CR_POWER_DOWN which is helping,
>>> since that is an MDIO write itself. Do you actually know how this call
>>> to e1000_power_down_phy() fixes the issues?
>> 
> 
>> I don't know from hardware's perspective, but I think the comment on
>> e1000_power_down_phy_copper() can give us some insight:
> 
> And there is only one function called e1000_power_down_phy_copper()
> :-)
> 
>> 
>> /**
>> * e1000_power_down_phy_copper - Restore copper link in case of PHY power down
>> * @hw: pointer to the HW structure
>> *
>> * In the case of a PHY power down to save power, or to turn off link during a
>> * driver unload, or wake on lan is not enabled, restore the link to previous
>> * settings.   
>> **/
>> void e1000_power_down_phy_copper(struct e1000_hw *hw)
>> {
>>u16 mii_reg = 0;
>> 
>>/* The PHY will retain its settings across a power down/up cycle */
>>e1e_rphy(hw, MII_BMCR, &mii_reg);
>>mii_reg |= BMCR_PDOWN;
>>e1e_wphy(hw, MII_BMCR, mii_reg);
>>usleep_range(1000, 2000);
>> }
> 
> I don't really see how this explains this:
> 
>>>> [  704.746874] e1000e :00:1

[PATCH] iwlwifi: mvm: Increase session protection duration for association

2020-09-23 Thread Kai-Heng Feng
Sometimes Intel AX201 fails to associate with AP:
[  839.290042] wlp0s20f3: authenticate with xx:xx:xx:xx:xx:xx
[  839.291737] wlp0s20f3: send auth to xx:xx:xx:xx:xx:xx (try 1/3)
[  839.350010] wlp0s20f3: send auth to xx:xx:xx:xx:xx:xx (try 2/3)
[  839.360826] wlp0s20f3: authenticated
[  839.363205] wlp0s20f3: associate with xx:xx:xx:xx:xx:xx (try 1/3)
[  839.370342] wlp0s20f3: RX AssocResp from xx:xx:xx:xx:xx:xx (capab=0x431 
status=0 aid=12)
[  839.378925] wlp0s20f3: associated
[  839.431788] wlp0s20f3: deauthenticated from xx:xx:xx:xx:xx:xx (Reason: 
2=PREV_AUTH_NOT_VALID)

It fails because EAPOL hasn't finished. Increase the  session protection
duration to 1200TU can eliminate the problem.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=209237
Signed-off-by: Kai-Heng Feng 
---
 drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c 
b/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c
index 9374c85c5caf..54acd9a68955 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c
@@ -3297,13 +3297,13 @@ static void iwl_mvm_mac_mgd_prepare_tx(struct 
ieee80211_hw *hw,
 * session for a much longer time since the firmware will internally
 * create two events: a 300TU one with a very high priority that
 * won't be fragmented which should be enough for 99% of the cases,
-* and another one (which we configure here to be 900TU long) which
+* and another one (which we configure here to be 1200TU long) which
 * will have a slightly lower priority, but more importantly, can be
 * fragmented so that it'll allow other activities to run.
 */
if (fw_has_capa(&mvm->fw->ucode_capa,
IWL_UCODE_TLV_CAPA_SESSION_PROT_CMD))
-   iwl_mvm_schedule_session_protection(mvm, vif, 900,
+   iwl_mvm_schedule_session_protection(mvm, vif, 1200,
min_duration, false);
else
iwl_mvm_protect_session(mvm, vif, duration,
-- 
2.17.1



Re: [Intel-wired-lan] [PATCH] e1000e: Power cycle phy on PM resume

2020-09-23 Thread Kai-Heng Feng
Hi Paul,

> On Sep 23, 2020, at 21:28, Paul Menzel  wrote:
> 
> Dear Kai-Heng,
> 
> 
> Am 23.09.20 um 09:47 schrieb Kai-Heng Feng:
>> We are seeing the following error after S3 resume:
>> [  704.746874] e1000e :00:1f.6 eno1: Setting page 0x6020
>> [  704.844232] e1000e :00:1f.6 eno1: MDI Write did not complete
>> [  704.902817] e1000e :00:1f.6 eno1: Setting page 0x6020
>> [  704.903075] e1000e :00:1f.6 eno1: reading PHY page 769 (or 0x6020 
>> shifted) reg 0x17
>> [  704.903281] e1000e :00:1f.6 eno1: Setting page 0x6020
>> [  704.903486] e1000e :00:1f.6 eno1: writing PHY page 769 (or 0x6020 
>> shifted) reg 0x17
>> [  704.943155] e1000e :00:1f.6 eno1: MDI Error
>> ...
>> [  705.108161] e1000e :00:1f.6 eno1: Hardware Error
>> Since we don't know what platform firmware may do to the phy, so let's
>> power cycle the phy upon system resume to resolve the issue.
> 
> Is there a bug report or list thread for this issue?

No. That's why I sent a patch to start discussion :)

> 
>> Signed-off-by: Kai-Heng Feng 
>> ---
>>  drivers/net/ethernet/intel/e1000e/netdev.c | 2 ++
>>  1 file changed, 2 insertions(+)
>> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
>> b/drivers/net/ethernet/intel/e1000e/netdev.c
>> index 664e8ccc88d2..c2a87a408102 100644
>> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
>> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
>> @@ -6968,6 +6968,8 @@ static __maybe_unused int e1000e_pm_resume(struct 
>> device *dev)
>>  !e1000e_check_me(hw->adapter->pdev->device))
>>  e1000e_s0ix_exit_flow(adapter);
>>  +   e1000_power_down_phy(adapter);
>> +
>>  rc = __e1000_resume(pdev);
>>  if (rc)
>>  return rc;
> 
> How much does this increase the resume time?

I didn't measure it. Because for me it's more important to have a working 
device.

Does it have a noticeable impact on your system's resume time?

Kai-Heng

> 
> 
> Kind regards,
> 
> Paul
> 



Re: [PATCH] e1000e: Power cycle phy on PM resume

2020-09-23 Thread Kai-Heng Feng
Hi Andrew,

> On Sep 23, 2020, at 20:17, Andrew Lunn  wrote:
> 
> On Wed, Sep 23, 2020 at 03:47:51PM +0800, Kai-Heng Feng wrote:
>> We are seeing the following error after S3 resume:
>> [  704.746874] e1000e :00:1f.6 eno1: Setting page 0x6020
>> [  704.844232] e1000e :00:1f.6 eno1: MDI Write did not complete
>> [  704.902817] e1000e :00:1f.6 eno1: Setting page 0x6020
>> [  704.903075] e1000e :00:1f.6 eno1: reading PHY page 769 (or 0x6020 
>> shifted) reg 0x17
>> [  704.903281] e1000e :00:1f.6 eno1: Setting page 0x6020
>> [  704.903486] e1000e :00:1f.6 eno1: writing PHY page 769 (or 0x6020 
>> shifted) reg 0x17
>> [  704.943155] e1000e :00:1f.6 eno1: MDI Error
>> ...
>> [  705.108161] e1000e :00:1f.6 eno1: Hardware Error
>> 
>> Since we don't know what platform firmware may do to the phy, so let's
>> power cycle the phy upon system resume to resolve the issue.
>> 
>> Signed-off-by: Kai-Heng Feng 
>> ---
>> drivers/net/ethernet/intel/e1000e/netdev.c | 2 ++
>> 1 file changed, 2 insertions(+)
>> 
>> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
>> b/drivers/net/ethernet/intel/e1000e/netdev.c
>> index 664e8ccc88d2..c2a87a408102 100644
>> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
>> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
>> @@ -6968,6 +6968,8 @@ static __maybe_unused int e1000e_pm_resume(struct 
>> device *dev)
>>  !e1000e_check_me(hw->adapter->pdev->device))
>>  e1000e_s0ix_exit_flow(adapter);
>> 
>> +e1000_power_down_phy(adapter);
>> +
> 
> static void e1000_power_down_phy(struct e1000_adapter *adapter)
> {
>   struct e1000_hw *hw = &adapter->hw;
> 
>   /* Power down the PHY so no link is implied when interface is down *
>* The PHY cannot be powered down if any of the following is true *
>* (a) WoL is enabled
>* (b) AMT is active
>* (c) SoL/IDER session is active
>*/
>   if (!adapter->wol && hw->mac_type >= e1000_82540 &&
>  hw->media_type == e1000_media_type_copper) {

Looks like the the function comes from e1000, 
drivers/net/ethernet/intel/e1000/e1000_main.c.
However, this patch is for e1000e, so the function with same name is different.

> 
> Could it be coming out of S3 because it just received a WoL?

No, the issue can be reproduced by pressing keyboard or rtcwake.

> 
> It seems unlikely that it is the MII_CR_POWER_DOWN which is helping,
> since that is an MDIO write itself. Do you actually know how this call
> to e1000_power_down_phy() fixes the issues?

I don't know from hardware's perspective, but I think the comment on 
e1000_power_down_phy_copper() can give us some insight:

/**
 * e1000_power_down_phy_copper - Restore copper link in case of PHY power down
 * @hw: pointer to the HW structure
 *
 * In the case of a PHY power down to save power, or to turn off link during a
 * driver unload, or wake on lan is not enabled, restore the link to previous
 * settings.   
 **/
void e1000_power_down_phy_copper(struct e1000_hw *hw)
{
u16 mii_reg = 0;

/* The PHY will retain its settings across a power down/up cycle */
e1e_rphy(hw, MII_BMCR, &mii_reg);
mii_reg |= BMCR_PDOWN;
e1e_wphy(hw, MII_BMCR, mii_reg);
usleep_range(1000, 2000);
}

Kai-Heng

> 
>   Andrew



[PATCH] e1000e: Power cycle phy on PM resume

2020-09-23 Thread Kai-Heng Feng
We are seeing the following error after S3 resume:
[  704.746874] e1000e :00:1f.6 eno1: Setting page 0x6020
[  704.844232] e1000e :00:1f.6 eno1: MDI Write did not complete
[  704.902817] e1000e :00:1f.6 eno1: Setting page 0x6020
[  704.903075] e1000e :00:1f.6 eno1: reading PHY page 769 (or 0x6020 
shifted) reg 0x17
[  704.903281] e1000e :00:1f.6 eno1: Setting page 0x6020
[  704.903486] e1000e :00:1f.6 eno1: writing PHY page 769 (or 0x6020 
shifted) reg 0x17
[  704.943155] e1000e :00:1f.6 eno1: MDI Error
...
[  705.108161] e1000e :00:1f.6 eno1: Hardware Error

Since we don't know what platform firmware may do to the phy, so let's
power cycle the phy upon system resume to resolve the issue.

Signed-off-by: Kai-Heng Feng 
---
 drivers/net/ethernet/intel/e1000e/netdev.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c
index 664e8ccc88d2..c2a87a408102 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -6968,6 +6968,8 @@ static __maybe_unused int e1000e_pm_resume(struct device 
*dev)
!e1000e_check_me(hw->adapter->pdev->device))
e1000e_s0ix_exit_flow(adapter);
 
+   e1000_power_down_phy(adapter);
+
rc = __e1000_resume(pdev);
if (rc)
return rc;
-- 
2.17.1



Re: [PATCH] rtw88: pci: Power cycle device during shutdown

2020-09-09 Thread Kai-Heng Feng



> On Aug 26, 2020, at 08:27, Brian Norris  wrote:
> 
> On Mon, Aug 24, 2020 at 2:32 AM Kai-Heng Feng
>  wrote:
>> 
>> Sometimes system freeze on cold/warm boot when rtw88 is probing.
>> 
>> According to [1], platform firmware may not properly power manage the
>> device during shutdown. I did some expirements and putting the device to
>> D3 can workaround the issue.
>> 
>> So let's power cycle the device by putting the device to D3 at shutdown
>> to prevent the issue from happening.
>> 
>> [1] https://bugzilla.kernel.org/show_bug.cgi?id=206411#c9
>> 
>> Signed-off-by: Kai-Heng Feng 
> 
> Can you at least include some more details, like which hardware
> version and firmware?

8723DE, 8822BE and 8822CE are affected [1].

> And which platform?

Many x86 laptops.

Some users claim BIOS update can fix the issue, however some are still affected.

> It seems a bit harsh to
> include a platform workaround to run for everyone, unless there's
> truly no downside. But even then, it's good to log what you're working
> with, in case there are ever problems with this in the future.

Ok. I can send V2 with more detailed commit message.

[1] https://bugs.launchpad.net/bugs/1872984

Kai-Heng

> 
> Brian



[PATCH] rtw88: pci: Power cycle device during shutdown

2020-08-24 Thread Kai-Heng Feng
Sometimes system freeze on cold/warm boot when rtw88 is probing.

According to [1], platform firmware may not properly power manage the
device during shutdown. I did some expirements and putting the device to
D3 can workaround the issue.

So let's power cycle the device by putting the device to D3 at shutdown
to prevent the issue from happening.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=206411#c9

Signed-off-by: Kai-Heng Feng 
---
 drivers/net/wireless/realtek/rtw88/pci.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/wireless/realtek/rtw88/pci.c 
b/drivers/net/wireless/realtek/rtw88/pci.c
index 3413973bc475..7f1f5073b9f4 100644
--- a/drivers/net/wireless/realtek/rtw88/pci.c
+++ b/drivers/net/wireless/realtek/rtw88/pci.c
@@ -1599,6 +1599,8 @@ void rtw_pci_shutdown(struct pci_dev *pdev)
 
if (chip->ops->shutdown)
chip->ops->shutdown(rtwdev);
+
+   pci_set_power_state(pdev, PCI_D3hot);
 }
 EXPORT_SYMBOL(rtw_pci_shutdown);
 
-- 
2.17.1



Re: [PATCH] rtw88: 8821c: Add RFE 2 support

2020-08-05 Thread Kai-Heng Feng
Hi Tony,

> On Aug 5, 2020, at 19:18, Tony Chuang  wrote:
> 
>> 8821CE with RFE 2 isn't supported:
>> [   12.404834] rtw_8821ce :02:00.0: rfe 2 isn't supported
>> [   12.404937] rtw_8821ce :02:00.0: failed to setup chip efuse info
>> [   12.404939] rtw_8821ce :02:00.0: failed to setup chip information
>> 
> 
> NACK
> 
> The RFE type 2 should be working with some additional fixes.
> Did you tested connecting to AP with BT paired?

No, I only tested WiFi.

> The antenna configuration is different with RFE type 0.
> I will ask someone else to fix them.
> Then the RFE type 2 modules can be supported.

Good to know that, I'll be patient and wait for a real fix.

Kai-Heng

> 
> Yen-Hsuan



[PATCH] rtw88: 8821c: Add RFE 2 support

2020-08-05 Thread Kai-Heng Feng
8821CE with RFE 2 isn't supported:
[   12.404834] rtw_8821ce :02:00.0: rfe 2 isn't supported
[   12.404937] rtw_8821ce :02:00.0: failed to setup chip efuse info
[   12.404939] rtw_8821ce :02:00.0: failed to setup chip information

It works well if both type0 tables are in use, so add it to the RFE
default.

Signed-off-by: Kai-Heng Feng 
---
 drivers/net/wireless/realtek/rtw88/rtw8821c.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/wireless/realtek/rtw88/rtw8821c.c 
b/drivers/net/wireless/realtek/rtw88/rtw8821c.c
index d8863d8a5468..f7270d0f1d55 100644
--- a/drivers/net/wireless/realtek/rtw88/rtw8821c.c
+++ b/drivers/net/wireless/realtek/rtw88/rtw8821c.c
@@ -1410,6 +1410,7 @@ static const struct rtw_intf_phy_para_table 
phy_para_table_8821c = {
 
 static const struct rtw_rfe_def rtw8821c_rfe_defs[] = {
[0] = RTW_DEF_RFE(8821c, 0, 0),
+   [2] = RTW_DEF_RFE(8821c, 0, 0),
 };
 
 static struct rtw_hw_reg rtw8821c_dig[] = {
-- 
2.17.1



Re: [Regression] "SUNRPC: Add "@len" parameter to gss_unwrap()" breaks NFS Kerberos on upstream stable 5.4.y

2020-07-17 Thread Kai-Heng Feng



> On Jul 18, 2020, at 01:34, Chuck Lever  wrote:
> 
> 
> 
>> On Jul 17, 2020, at 1:29 PM, Pierre Sauter  wrote:
>> 
>> Hi Chuck,
>> 
>> Am Donnerstag, 16. Juli 2020, 21:25:40 CEST schrieb Chuck Lever:
>>> So this makes me think there's a possibility you are not using upstream
>>> stable kernels. I can't help if I don't know what source code and commit
>>> stream you are using. It also makes me question the bisect result.
>> 
>> Yes you are right, I was referring to Ubuntu kernels 5.4.0-XX. From the
>> discussion in the Ubuntu bugtracker I got the impression that Ubuntu kernels
>> 5.4.0-XX and upstream 5.4.XX are closely related, obviously they are not. The
>> bisection was done by the original bug reporter and also refers to the Ubuntu
>> kernel.
>> 
>> In the meantime I tested v5.4.51 upstream, which shows no problems. Sorry for
>> the bother.
> 
> Pierre, thanks for confirming!
> 
> Kai-Heng suspected an upstream stable commit that is missing in 5.4.0-40,
> but I don't have any good suggestions.

Well, Ubuntu's 5.4 kernel is based on upstream stable v5.4, so I asked users to 
test stable v5.4.51, however the feedback was negative, and that's the reason 
why I raised the issue here.

Anyway, good to know that it's fixed in upstream stable, everything's good now!
Thanks for your effort Chuck.

Kai-Heng


> 
> 
 My krb5 etype is aes256-cts-hmac-sha1-96.
>>> 
>>> Thanks! And what is your NFS server and filesystem? It's possible that the
>>> client is not estimating the size of the reply correctly. Variables include
>>> the size of file handles, MIC verifiers, and wrap tokens.
>> 
>> The server is Debian with v4.19.130 upstream, filesystem ext4.
>> 
>>> You might try:
>>> 
>>> e8d70b321ecc ("SUNRPC: Fix another issue with MIC buffer space")
>> 
>> That one is actually in Ubuntus 5.4.0-40, from looking at the code.
> 
> --
> Chuck Lever



Re: [Regression] "SUNRPC: Add "@len" parameter to gss_unwrap()" breaks NFS Kerberos on upstream stable 5.4.y

2020-07-15 Thread Kai-Heng Feng



> On Jul 15, 2020, at 23:02, Chuck Lever  wrote:
> 
> 
> 
>> On Jul 15, 2020, at 10:48 AM, Kai-Heng Feng  
>> wrote:
>> 
>> Hi,
>> 
>> Multiple users reported NFS causes NULL pointer dereference [1] on Ubuntu, 
>> due to commit "SUNRPC: Add "@len" parameter to gss_unwrap()" and commit 
>> "SUNRPC: Fix GSS privacy computation of auth->au_ralign".
>> 
>> The same issue happens on upstream stable 5.4.y branch.
>> The mainline kernel doesn't have this issue though.
>> 
>> Should we revert them? Or is there any missing commits need to be backported 
>> to v5.4?
>> 
>> [1] https://bugs.launchpad.net/bugs/1886277
>> 
>> Kai-Heng
> 
> 31c9590ae468 ("SUNRPC: Add "@len" parameter to gss_unwrap()") is a refactoring
> change. It shouldn't have introduced any behavior difference. But in theory,
> practice and theory should be the same...
> 
> Check if 0a8e7b7d0846 ("SUNRPC: Revert 241b1f419f0e ("SUNRPC: Remove 
> xdr_buf_trim()")")
> is also applied to 5.4.0-40-generic.

Yes, it's included. The commit is part of upstream stable 5.4.

> 
> It would help to know if v5.5 stable is working for you. I haven't had any
> problems with it.

I'll ask users to test it out. 
Thanks for you quick reply!

Kai-Heng

> 
> 
> --
> Chuck Lever
> 
> 
> 



[Regression] "SUNRPC: Add "@len" parameter to gss_unwrap()" breaks NFS Kerberos on upstream stable 5.4.y

2020-07-15 Thread Kai-Heng Feng
Hi,

Multiple users reported NFS causes NULL pointer dereference [1] on Ubuntu, due 
to commit "SUNRPC: Add "@len" parameter to gss_unwrap()" and commit "SUNRPC: 
Fix GSS privacy computation of auth->au_ralign".

The same issue happens on upstream stable 5.4.y branch.
The mainline kernel doesn't have this issue though.

Should we revert them? Or is there any missing commits need to be backported to 
v5.4?

[1] https://bugs.launchpad.net/bugs/1886277

Kai-Heng


Re: [PATCH] net: atlantic: Add support for firmware v4

2020-07-07 Thread Kai-Heng Feng



> On Jul 7, 2020, at 16:46, Alexander Lobakin  wrote:
> 
> From:   Kai-Heng Feng 
> Date:   Tue,  7 Jul 2020 14:38:28 +0800
> 
>> We have a new ethernet card that is supported by the atlantic driver:
>> 01:00.0 Ethernet controller [0200]: Aquantia Corp. AQC107 NBase-T/IEEE 
>> 802.3bz Ethernet Controller [AQtion] [1d6a:07b1] (rev 02)
>> 
>> But the driver failed to probe the device:
>> kernel: atlantic: Bad FW version detected: 41e
>> kernel: atlantic: probe of :01:00.0 failed with error -95
>> 
>> As a pure guesswork, simply adding the firmware version to the driver
> 
> Please don't send "pure guessworks" to net-fixes tree.

Is matching a new firmware version really that different to matching a new PCI 
ID or USB ID?

> You should have
> reported this as a bug to LKML and/or atlantic team, so we could issue
> it.

Right, I'll wait for a new patch from atlantic team.

Kai-Heng

> 
>> can make it function. Doing iperf3 as a smoketest doesn't show any
>> abnormality either.
>> 
>> Signed-off-by: Kai-Heng Feng 
>> ---
>> drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c | 4 
>> 1 file changed, 4 insertions(+)
>> 
>> diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c 
>> b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c
>> index 73c0f41df8d8..0b4cd1c0e022 100644
>> --- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c
>> +++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c
>> @@ -46,6 +46,7 @@
>> #define HW_ATL_FW_VER_1X 0x01050006U
>> #define HW_ATL_FW_VER_2X 0x0200U
>> #define HW_ATL_FW_VER_3X 0x0300U
>> +#define HW_ATL_FW_VER_4X 0x041EU
>> 
>> #define FORCE_FLASHLESS 0
>> 
>> @@ -81,6 +82,9 @@ int hw_atl_utils_initfw(struct aq_hw_s *self, const struct 
>> aq_fw_ops **fw_ops)
>>  } else if (hw_atl_utils_ver_match(HW_ATL_FW_VER_3X,
>>self->fw_ver_actual) == 0) {
>>  *fw_ops = &aq_fw_2x_ops;
>> +} else if (hw_atl_utils_ver_match(HW_ATL_FW_VER_4X,
>> +  self->fw_ver_actual) == 0) {
>> +*fw_ops = &aq_fw_2x_ops;
>>  } else {
>>  aq_pr_err("Bad FW version detected: %x\n",
>>self->fw_ver_actual);
>> -- 
>> 2.17.1



[PATCH] net: atlantic: Add support for firmware v4

2020-07-06 Thread Kai-Heng Feng
We have a new ethernet card that is supported by the atlantic driver:
01:00.0 Ethernet controller [0200]: Aquantia Corp. AQC107 NBase-T/IEEE 802.3bz 
Ethernet Controller [AQtion] [1d6a:07b1] (rev 02)

But the driver failed to probe the device:
kernel: atlantic: Bad FW version detected: 41e
kernel: atlantic: probe of :01:00.0 failed with error -95

As a pure guesswork, simply adding the firmware version to the driver
can make it function. Doing iperf3 as a smoketest doesn't show any
abnormality either.

Signed-off-by: Kai-Heng Feng 
---
 drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c 
b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c
index 73c0f41df8d8..0b4cd1c0e022 100644
--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c
+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c
@@ -46,6 +46,7 @@
 #define HW_ATL_FW_VER_1X 0x01050006U
 #define HW_ATL_FW_VER_2X 0x0200U
 #define HW_ATL_FW_VER_3X 0x0300U
+#define HW_ATL_FW_VER_4X 0x041EU
 
 #define FORCE_FLASHLESS 0
 
@@ -81,6 +82,9 @@ int hw_atl_utils_initfw(struct aq_hw_s *self, const struct 
aq_fw_ops **fw_ops)
} else if (hw_atl_utils_ver_match(HW_ATL_FW_VER_3X,
  self->fw_ver_actual) == 0) {
*fw_ops = &aq_fw_2x_ops;
+   } else if (hw_atl_utils_ver_match(HW_ATL_FW_VER_4X,
+ self->fw_ver_actual) == 0) {
+   *fw_ops = &aq_fw_2x_ops;
} else {
aq_pr_err("Bad FW version detected: %x\n",
  self->fw_ver_actual);
-- 
2.17.1



Re: [PATCH] e1000e: continue to init phy even when failed to disable ULP

2020-06-16 Thread Kai-Heng Feng



> On Jun 16, 2020, at 18:05, Aaron Ma  wrote:
> 
> After commit "e1000e: disable s0ix entry and exit flows for ME systems",
> some ThinkPads always failed to disable ulp by ME.
> commit "e1000e: Warn if disabling ULP failed" break out of init phy:
> 
> error log:
> [   42.364753] e1000e :00:1f.6 enp0s31f6: Failed to disable ULP
> [   42.524626] e1000e :00:1f.6 enp0s31f6: PHY Wakeup cause - Unicast 
> Packet
> [   42.822476] e1000e :00:1f.6 enp0s31f6: Hardware Error
> 
> When disable s0ix, E1000_FWSM_ULP_CFG_DONE will never be 1.
> If continue to init phy like before, it can work as before.
> iperf test result good too.
> 
> Chnage e_warn to e_dbg, in case it confuses.
> 
> Signed-off-by: Aaron Ma 
> ---
> drivers/net/ethernet/intel/e1000e/ich8lan.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c 
> b/drivers/net/ethernet/intel/e1000e/ich8lan.c
> index f999cca37a8a..63405819eb83 100644
> --- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
> +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
> @@ -302,8 +302,7 @@ static s32 e1000_init_phy_workarounds_pchlan(struct 
> e1000_hw *hw)
>   hw->dev_spec.ich8lan.ulp_state = e1000_ulp_state_unknown;
>   ret_val = e1000_disable_ulp_lpt_lp(hw, true);

If si0x entry isn't enabled, maybe skip calling e1000_disable_ulp_lpt_lp() 
altogether?
We can use e1000e_check_me() to check that.

>   if (ret_val) {
> - e_warn("Failed to disable ULP\n");
> - goto out;
> + e_dbg("Failed to disable ULP\n");
>   }

The change of "e1000e: Warn if disabling ULP failed" is intentional to catch 
bugs like this.

Kai-Heng

> 
>   ret_val = hw->phy.ops.acquire(hw);
> -- 
> 2.26.2
> 



[PATCH] e1000e: Disable TSO for buffer overrun workaround

2020-05-07 Thread Kai-Heng Feng
Commit b10effb92e27 ("e1000e: fix buffer overrun while the I219 is
processing DMA transactions") imposes roughly 30% performance penalty.

The commit log states that "Disabling TSO eliminates performance loss
for TCP traffic without a noticeable impact on CPU performance", so
let's disable TSO by default to regain the loss.

Fixes: b10effb92e27 ("e1000e: fix buffer overrun while the I219 is processing 
DMA transactions")
BugLink: https://bugs.launchpad.net/bugs/1802691
Signed-off-by: Kai-Heng Feng 
---
 drivers/net/ethernet/intel/e1000e/netdev.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c
index 177c6da80c57..9b5509149578 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -5294,6 +5294,10 @@ static void e1000_watchdog_task(struct work_struct *work)
/* oops */
break;
}
+   if (hw->mac.type == e1000_pch_spt) {
+   netdev->features &= ~NETIF_F_TSO;
+   netdev->features &= ~NETIF_F_TSO6;
+   }
}
 
/* enable transmits in the hardware, need to do this
-- 
2.17.1



[PATCH] e1000e: Warn if disabling ULP failed

2020-05-06 Thread Kai-Heng Feng
The hardware may stop working if driver failed to disable ULP mode.

Take the return value of e1000_disable_ulp_lpt_lp() into account, and
pass up the error if it fails.

Signed-off-by: Kai-Heng Feng 
---
 drivers/net/ethernet/intel/e1000e/ich8lan.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c 
b/drivers/net/ethernet/intel/e1000e/ich8lan.c
index 15f6c0a4dc63..9cbd2d6c7da4 100644
--- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
+++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
@@ -300,7 +300,11 @@ static s32 e1000_init_phy_workarounds_pchlan(struct 
e1000_hw *hw)
 * so forcibly disable it.
 */
hw->dev_spec.ich8lan.ulp_state = e1000_ulp_state_unknown;
-   e1000_disable_ulp_lpt_lp(hw, true);
+   ret_val = e1000_disable_ulp_lpt_lp(hw, true);
+   if (ret_val) {
+   e_warn("Failed to disable ULP\n");
+   goto out;
+   }
 
ret_val = hw->phy.ops.acquire(hw);
if (ret_val) {
-- 
2.17.1



[PATCH v2] igb: Report speed and duplex as unknown when device is runtime suspended

2020-05-04 Thread Kai-Heng Feng
igb device gets runtime suspended when there's no link partner. We can't
get correct speed under that state:
$ cat /sys/class/net/enp3s0/speed
1000

In addition to that, an error can also be spotted in dmesg:
[  385.991957] igb :03:00.0 enp3s0: PCIe link lost

Since device can only be runtime suspended when there's no link partner,
we can skip reading register and let the following logic set speed and
duplex with correct status.

The more generic approach will be wrap get_link_ksettings() with begin()
and complete() callbacks. However, for this particular issue, begin()
calls igb_runtime_resume() , which tries to rtnl_lock() while the lock
is already hold by upper ethtool layer.

So let's take this approach until the igb_runtime_resume() no longer
needs to hold rtnl_lock.

Cc: Jeff Kirsher 
Cc: Aaron Brown 
Suggested-by: Alexander Duyck 
Signed-off-by: Kai-Heng Feng 
---
v2:
 - Don't early return the routine so other info can be set.

 drivers/net/ethernet/intel/igb/igb_ethtool.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c 
b/drivers/net/ethernet/intel/igb/igb_ethtool.c
index 39d3b76a6f5d..2cd003c5ad43 100644
--- a/drivers/net/ethernet/intel/igb/igb_ethtool.c
+++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c
@@ -143,7 +143,8 @@ static int igb_get_link_ksettings(struct net_device *netdev,
u32 speed;
u32 supported, advertising;
 
-   status = rd32(E1000_STATUS);
+   status = pm_runtime_suspended(&adapter->pdev->dev) ?
+0 : rd32(E1000_STATUS);
if (hw->phy.media_type == e1000_media_type_copper) {
 
supported = (SUPPORTED_10baseT_Half |
-- 
2.17.1



[PATCH] igb: Report speed and duplex as unknown when device is runtime suspended

2020-05-04 Thread Kai-Heng Feng
igb device gets runtime suspended when there's no link partner. We can't
get correct speed under that state:
$ cat /sys/class/net/enp3s0/speed
1000

In addition to that, an error can also be spotted in dmesg:
[  385.991957] igb :03:00.0 enp3s0: PCIe link lost

Since device can only be runtime suspended when there's no link partner,
we can directly report the speed and duplex as unknown.

The more generic approach will be wrap get_link_ksettings() with begin()
and complete() callbacks. However, for this particular issue, begin()
calls igb_runtime_resume() , which tries to rtnl_lock() while the lock
is already hold by upper ethtool layer.

So let's take this approach until the igb_runtime_resume() no longer
needs to hold rtnl_lock.

Suggested-by: Alexander Duyck 
Signed-off-by: Kai-Heng Feng 
---
 drivers/net/ethernet/intel/igb/igb_ethtool.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c 
b/drivers/net/ethernet/intel/igb/igb_ethtool.c
index 39d3b76a6f5d..b429bca4aa6a 100644
--- a/drivers/net/ethernet/intel/igb/igb_ethtool.c
+++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c
@@ -143,6 +143,12 @@ static int igb_get_link_ksettings(struct net_device 
*netdev,
u32 speed;
u32 supported, advertising;
 
+   if (pm_runtime_suspended(&adapter->pdev->dev)) {
+   cmd->base.duplex = DUPLEX_UNKNOWN;
+   cmd->base.speed = SPEED_UNKNOWN;
+   return 0;
+   }
+
status = rd32(E1000_STATUS);
if (hw->phy.media_type == e1000_media_type_copper) {
 
-- 
2.17.1



Re: Latitude 5495's tg3 hangs under heavy load

2019-05-19 Thread Kai-Heng Feng

Hi Siva,

at 21:42, Siva Reddy Kallam  wrote:


On Mon, Mar 11, 2019 at 9:23 AM Kai-Heng Feng
 wrote:

[snipped]

Hi again,

Any update?

Kai-Heng

Sorry for the late reply. We will provide our feedback soon.


Any good news? It still happens on latest mainline kernel.

Kai-Heng


Re: Latitude 5495's tg3 hangs under heavy load

2019-03-10 Thread Kai-Heng Feng

[snipped]

Hi again,

Any update?

Kai-Heng


[PATCH] sky2: Disable MSI on Dell Inspiron 1545 and Gateway P-79

2019-03-03 Thread Kai-Heng Feng
Some sky2 chips fire IRQ after S3, before the driver is fully resumed:
[ 686.804877] do_IRQ: 1.37 No irq handler for vector

This is likely a platform bug that device isn't fully quiesced during
S3. Use MSI-X, maskable MSI or INTx can prevent this issue from
happening.

Since MSI-X and maskable MSI are not supported by this device, fallback
to use INTx on affected platforms.

BugLink: https://bugs.launchpad.net/bugs/1807259
BugLink: https://bugs.launchpad.net/bugs/1809843
Signed-off-by: Kai-Heng Feng 
---
 drivers/net/ethernet/marvell/sky2.c | 24 +++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/sky2.c 
b/drivers/net/ethernet/marvell/sky2.c
index 57727fe1501e..8b3495ee2b6e 100644
--- a/drivers/net/ethernet/marvell/sky2.c
+++ b/drivers/net/ethernet/marvell/sky2.c
@@ -46,6 +46,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -93,7 +94,7 @@ static int copybreak __read_mostly = 128;
 module_param(copybreak, int, 0);
 MODULE_PARM_DESC(copybreak, "Receive copy threshold");
 
-static int disable_msi = 0;
+static int disable_msi = -1;
 module_param(disable_msi, int, 0);
 MODULE_PARM_DESC(disable_msi, "Disable Message Signaled Interrupt (MSI)");
 
@@ -4917,6 +4918,24 @@ static const char *sky2_name(u8 chipid, char *buf, int 
sz)
return buf;
 }
 
+static const struct dmi_system_id msi_blacklist[] = {
+   {
+   .ident = "Dell Inspiron 1545",
+   .matches = {
+   DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
+   DMI_MATCH(DMI_PRODUCT_NAME, "Inspiron 1545"),
+   },
+   },
+   {
+   .ident = "Gateway P-79",
+   .matches = {
+   DMI_MATCH(DMI_SYS_VENDOR, "Gateway"),
+   DMI_MATCH(DMI_PRODUCT_NAME, "P-79"),
+   },
+   },
+   {}
+};
+
 static int sky2_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 {
struct net_device *dev, *dev1;
@@ -5028,6 +5047,9 @@ static int sky2_probe(struct pci_dev *pdev, const struct 
pci_device_id *ent)
goto err_out_free_pci;
}
 
+   if (disable_msi == -1)
+   disable_msi = !!dmi_check_system(msi_blacklist);
+
if (!disable_msi && pci_enable_msi(pdev) == 0) {
err = sky2_test_msi(hw);
if (err) {
-- 
2.17.1



[PATCH] sky2: Increase D3 delay again

2019-02-19 Thread Kai-Heng Feng
Another platform requires even longer delay to make the device work
correctly after S3.

So increase the delay to 300ms.

BugLink: https://bugs.launchpad.net/bugs/1798921

Cc: sta...@vger.kernel.org
Signed-off-by: Kai-Heng Feng 
---
 drivers/net/ethernet/marvell/sky2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/sky2.c 
b/drivers/net/ethernet/marvell/sky2.c
index f3a5fa84860f..57727fe1501e 100644
--- a/drivers/net/ethernet/marvell/sky2.c
+++ b/drivers/net/ethernet/marvell/sky2.c
@@ -5073,7 +5073,7 @@ static int sky2_probe(struct pci_dev *pdev, const struct 
pci_device_id *ent)
INIT_WORK(&hw->restart_work, sky2_restart);
 
pci_set_drvdata(pdev, hw);
-   pdev->d3_delay = 200;
+   pdev->d3_delay = 300;
 
return 0;
 
-- 
2.17.1



Re: [PATCH] PCI / ACPI: Don't clear pme_poll on device that has unreliable ACPI wake

2019-02-18 Thread Kai Heng Feng



> On Feb 4, 2019, at 6:20 PM, Bjorn Helgaas  wrote:
> 
> On Sun, Feb 03, 2019 at 01:46:50AM +0800, Kai Heng Feng wrote:
>>> On Jan 28, 2019, at 3:51 PM, Kai Heng Feng  
>>> wrote:
>> 
>>>> If I understand correctly, the bugzilla lspci
>>>> (https://bugzilla.kernel.org/attachment.cgi?id=280691) was collected
>>>> at point 8, and it shows PME_Status=1 when it should be 0.
>>>> 
>>>> If we write a 1 to PME_Status to clear it, and it remains set, that's
>>>> obviously a hardware defect, and Intel should document that in an
>>>> erratum, and a quirk would be the appropriate way to work around it.
>>>> But I doubt that's what's happening.
>>> 
>>> I’ll ask them if they can provide an erratum.
>> 
>> Got confirmed with e1000e folks, I219 (the device in question) doesn’t
>> really support runtime D3.
> 
> Did you get a reference, e.g., an intel.com URL for that?  Intel
> usually publishes errata for hardware defects, which is nice because
> it means every customer doesn't have to experimentally rediscover
> them.

Unfortunately no.

> 
>> I also checked the behavior of the device under Windows, and it
>> stays at D0 all the time even when it’s not in use.
> 
> I think there are two possible explanations for this:
> 
>  1) This device requires a Windows or a driver update with a
>  device-specific quirk similar to what you're proposing for Linux.

I am sure the latest driver is loaded under Windows.

> 
>  2) Windows correctly detects that this device doesn't support D3,
>  and Linux has a bug and does not detect that.

I think that’s the case.

> 
> Obviously nobody wants to require OS or driver updates just for minor
> device changes, and the PCI and ACPI specs are designed to allow
> generic, non device-specific code to detect D3 support, so the first
> case should be a result of a hardware defect.

Yea, that’s why my original idea is to workaround it in PCI/ACPI.

> 
>> So I sent a patch [1] to disable it.
>> 
>> [1] https://lkml.org/lkml/2019/2/2/200
> 
> OK.  Since that's in drivers/net/..., I have no objection and the
> e1000e maintainers would deal with that.

Thanks.

Kai-Heng

> 
> Bjorn



Re: [PATCH] PCI / ACPI: Don't clear pme_poll on device that has unreliable ACPI wake

2019-02-02 Thread Kai Heng Feng
Hi Bjorn,

> On Jan 28, 2019, at 3:51 PM, Kai Heng Feng  
> wrote:
[snipped]

>> If I understand correctly, the bugzilla lspci
>> (https://bugzilla.kernel.org/attachment.cgi?id=280691) was collected
>> at point 8, and it shows PME_Status=1 when it should be 0.
>> 
>> If we write a 1 to PME_Status to clear it, and it remains set, that's
>> obviously a hardware defect, and Intel should document that in an
>> erratum, and a quirk would be the appropriate way to work around it.
>> But I doubt that's what's happening.
> 
> I’ll ask them if they can provide an erratum.

Got confirmed with e1000e folks, I219 (the device in question) doesn’t really 
support runtime D3.
I also checked the behavior of the device under Windows, and it stays at D0 all 
the time even when it’s not in use.

So I sent a patch [1] to disable it.

[1] https://lkml.org/lkml/2019/2/2/200

Kai-Heng




[PATCH] e1000e: Disable runtime PM on CNP+

2019-02-02 Thread Kai-Heng Feng
There are some new e1000e devices can only be woken up from D3 one time,
by plugging ethernet cable. Subsequent cable plugging does set PME bit
correctly, but it still doesn't get woken up.

Since e1000e connects to the root complex directly, we rely on ACPI to
wake it up. In this case, the GPE from _PRW only works once and stops
working after that. Though it appears to be a platform bug, e1000e
maintainers confirmed that I219 does not support D3.

So disable runtime PM on CNP+ chips. We may need to disable earlier
generations if this bug also hit older platforms.

Bugzilla: https://bugzilla.kernel.org/attachment.cgi?id=280819
Signed-off-by: Kai-Heng Feng 
---
 drivers/net/ethernet/intel/e1000e/netdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c
index 189f231075c2..9366b9d19a6f 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -7351,7 +7351,7 @@ static int e1000_probe(struct pci_dev *pdev, const struct 
pci_device_id *ent)
 
e1000_print_device_info(adapter);
 
-   if (pci_dev_run_wake(pdev))
+   if (pci_dev_run_wake(pdev) && hw->mac.type < e1000_pch_cnp)
pm_runtime_put_noidle(&pdev->dev);
 
return 0;
-- 
2.17.1



Re: Latitude 5495's tg3 hangs under heavy load

2019-01-31 Thread Kai-Heng Feng
Hi tg3 folks, 

> On Jan 9, 2019, at 13:42, Kai Heng Feng  wrote:
> 
> 
> 
>> On Jan 7, 2019, at 5:12 PM, Siva Reddy Kallam  
>> wrote:
>> 
>> On Mon, Jan 7, 2019 at 11:34 AM Kai Heng Feng
>>  wrote:
>>> 
>>> Hi tg3 folks,
>>> 
>>> Any idea how to solve the bug?
>>> 
>>> Kai-Heng
>> Hi,
>> Can you share Register dump(ethtool -d)?
> 
> `ethtool -d` before freeze:
> https://pastebin.com/MSkJzhcv
> 
> `ethtool -d` after freeze:
> https://pastebin.com/dQj8mLsN
> 
>> Is ifconfig down/up bringing back interface?
> 
> Yes. And seems like network works fine afterward.
> 
> `ethtool -d` after down/up:
> https://pastebin.com/vL1gCC2n

Wondering if there’s any update?

Are these info sufficient?

Kai-Heng

> 
> Kai-Heng
> 
>> 
>>> 
>>>> On Dec 7, 2018, at 17:27, Kai Heng Feng  
>>>> wrote:
>>>> 
>>>> Hi tg3 maintainers,
>>>> 
>>>> I’ve encountered network freeze when using tg3 in gigabits net.
>>>> 
>>>> The issue can be easily reproduced when using scp to transfer files in 
>>>> local network.
>>>> 
>>>> The symptom is pretty similar to what this commit is trying to solve:
>>>> commit 3a498606bb04af603a46ebde8296040b2de350d1
>>>> Author: Sanjeev Bansal 
>>>> Date:   Mon Jul 16 11:13:32 2018 +0530
>>>> 
>>>>  tg3: Add higher cpu clock for 5762.
>>>> 
>>>>  This patch has fix for TX timeout while running bi-directional
>>>>  traffic with 100 Mbps using 5762.
>>>> 
>>>> But reverting this commit doesn’t help.
>>>> 
>>>> Latitude 5495 is a AMD Raven Ridge platform, not sure if this matters.
>>>> 
>>>> Here’s the lspci for this device:
>>>> 03:00.0 Ethernet controller [0200]: Broadcom Inc. and subsidiaries 
>>>> NetXtreme BCM5762 Gigabit Ethernet PCIe [14e4:1687] (rev 10)
>>>>  Subsystem: Dell NetXtreme BCM5762 Gigabit Ethernet PCIe [1028:0814]
>>>>  Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
>>>> Stepping- SERR- FastB2B- DisINTx+
>>>>  Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- 
>>>> SERR- >>>  Latency: 0
>>>>  Interrupt: pin A routed to IRQ 16
>>>>  Region 0: Memory at e022 (64-bit, prefetchable) [size=64K]
>>>>  Region 2: Memory at e021 (64-bit, prefetchable) [size=64K]
>>>>  Region 4: Memory at e020 (64-bit, prefetchable) [size=64K]
>>>>  Capabilities: [48] Power Management version 3
>>>>  Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA 
>>>> PME(D0+,D1-,D2-,D3hot+,D3cold+)
>>>>  Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
>>>>  Capabilities: [50] Vital Product Data
>>>>  Product Name: Broadcom NetXtreme Gigabit Ethernet Controller
>>>>  Read-only fields:
>>>>  [PN] Part number: BCM95762
>>>>  [EC] Engineering changes: 106679-15
>>>>  [SN] Serial number: 0123456789
>>>>  [MN] Manufacture ID: 31 34 65 34
>>>>  [RV] Reserved: checksum good, 28 byte(s) reserved
>>>>  Read/write fields:
>>>>  [YA] Asset tag: XYZ01234567
>>>>  [RW] Read-write area: 107 byte(s) free
>>>>  End
>>>>  Capabilities: [58] MSI: Enable- Count=1/8 Maskable- 64bit+
>>>>  Address:   Data: 
>>>>  Capabilities: [a0] MSI-X: Enable+ Count=6 Masked-
>>>>  Vector table: BAR=4 offset=
>>>>  PBA: BAR=2 offset=0120
>>>>  Capabilities: [ac] Express (v2) Endpoint, MSI 00
>>>>  DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, 
>>>> L1 <64us
>>>>  ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ 
>>>> SlotPowerLimit 0.000W
>>>>  DevCtl: Report errors: Correctable- Non-Fatal- Fatal- 
>>>> Unsupported-
>>>>  RlxdOrd- ExtTag- PhantFunc- AuxPwr+ NoSnoop- FLReset-
>>>>  MaxPayload 128 bytes, MaxReadReq 4096 bytes
>>>>  DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ 

Re: [PATCH] PCI / ACPI: Don't clear pme_poll on device that has unreliable ACPI wake

2019-01-27 Thread Kai Heng Feng



> On Jan 25, 2019, at 4:05 AM, Bjorn Helgaas  wrote:
> 
> On Thu, Jan 24, 2019 at 11:29:37PM +0800, Kai Heng Feng wrote:
>>> On Jan 24, 2019, at 11:15 PM, Bjorn Helgaas  wrote:
>>> On Wed, Jan 23, 2019 at 03:17:37PM +0800, Kai Heng Feng wrote:
>>>>> On Jan 23, 2019, at 7:51 AM, Bjorn Helgaas  wrote:
>>>>> On Tue, Jan 22, 2019 at 02:45:44PM +0800, Kai-Heng Feng wrote:
>>>>>> There are some e1000e devices can only be woken up from D3 one time, by
>>>>>> plugging ethernet cable. Subsequent cable plugging does set PME bit
>>>>>> correctly, but it still doesn't get woken up.
>>>>>> 
>>>>>> Since e1000e connects to the root complex directly, we rely on ACPI to
>>>>>> wake it up. In this case, the GPE from _PRW only works once and stops
>>>>>> working after that.
>>>>>> 
>>>>>> So introduce a new PCI quirk, to avoid clearing pme_poll flag for buggy
>>>>>> platform firmwares that have unreliable GPE wake.
>>>>> 
>>>>> This quirk applies to all 0x15bb (E1000_DEV_ID_PCH_CNP_I219_LM7) and
>>>>> 0x15bd (E1000_DEV_ID_PCH_CNP_I219_LM6) devices.  The e1000e driver
>>>>> claims about a zillion different device IDs.
>>>>> 
>>>>> I would be surprised if these two devices are defective but all the
>>>>> others work correctly.  Could it be that there is a problem with the
>>>>> wiring on this particular motherboard or with the ACPI _PRW methods
>>>>> (or the way Linux interprets them) in this firmware?
>>>> 
>>>> If this is a motherboard issue or platform specific, do you prefer to use
>>>> DMI matches here?
>>> 
>>> I'm not sure what the problem is yet, so let's hold off on the exact
>>> structure of the fix.
>> 
>> I think DMI table can put in e1000e driver instead of PCI quirk.
> 
> I don't think we should add a quirk or DMI table yet because we
> haven't gotten to the root cause of this problem.  If the root cause
> is a problem in the Linux code, adding a quirk will mask the problem
> for this specific system, but will leave other systems with similar
> problems.
> 
>>> If I understand correctly, e1000e wakeup works once, but doesn't work
>>> after that.  Your lspci (from after that first wakeup, from
>>> https://bugzilla.kernel.org/attachment.cgi?id=280691) shows this:
>>> 
>>> 00:14.0 XHC  XHCI USB
>>>   Flags: PMEClk- DSI- D1- D2- ... PME(D0-,D1-,D2-,D3hot+,D3cold+)
>>>   Status: D3 NoSoftRst+ PME-Enable+ DSel=0 DScale=0 PME-
>>> 00:1f.3 HDAS audio
>>>   Flags: PMEClk- DSI- D1- D2- ... PME(D0-,D1-,D2-,D3hot+,D3cold+)
>>>   Status: D3 NoSoftRst+ PME-Enable+ DSel=0 DScale=0 PME-
>>> 00:1f.6 GLAN e1000e
>>>   Flags: PMEClk- DSI+ D1- D2- ... PME(D0+,D1-,D2-,D3hot+,D3cold+)
>>>   Status: D3 NoSoftRst+ PME-Enable+ DSel=0 DScale=1 PME+
>>> 
>>> So the e1000e PME_Status bit is still set, which means it probably
>>> won't generate another PME interrupt, which would explain why wakeup
>>> doesn't work.  To test this theory, can you try this:
>>> 
>>> - sleep
>>> - wakeup via e1000e
>>> # DEV=00:1f.6
>>> # lspci -vvs $DEV
>>> # setpci -s $DEV CAP_PM+4.W
>>> # setpci -s $DEV CAP_PM+4.W=0x8100
>>> - sleep
>>> - attempt another wakeup via e1000e
>>> 
>>> If this second wakeup works, it would suggest that PME_Status isn't
>>> being cleared correctly.  I see code, e.g., in
>>> acpi_setup_gpe_for_wake(), that *looks* like it would arrange to clear
>>> it, but I'm not very familiar with it.  Maybe there's some issue with
>>> multiple devices sharing an "implicit notification" situation like
>>> this.
>> 
>> The PME status is being cleared correctly.
> 
> I was hoping to understand this better via the experiment above, but
> I'm still confused.  Here's the scenario as I understand it:
> 
>  0) fresh boot
>  1) e1000e PME_Status should be 0
>  2) sleep
>  3) wakeup via e1000e succeeds
>  4) e1000e PME_Status should be 0
>  5) sleep
>  6) wakeup via e1000e fails
>  7) wakeup via USB succeeds
>  8) e1000e PME_Status should be 0, but is actually 1

Sorry for not illustrating the scenario more clearly, here’s the test scenario:
 0) fresh boot
 1) no ethernet cable plugged
 2) e1000e runtime suspend
 3) PME_Status is 0
 4) plug ethernet cable
 5) e1000e gets wo

Re: [PATCH] PCI / ACPI: Don't clear pme_poll on device that has unreliable ACPI wake

2019-01-24 Thread Kai Heng Feng



> On Jan 24, 2019, at 11:15 PM, Bjorn Helgaas  wrote:
> 
> On Wed, Jan 23, 2019 at 03:17:37PM +0800, Kai Heng Feng wrote:
>>> On Jan 23, 2019, at 7:51 AM, Bjorn Helgaas  wrote:
>>> On Tue, Jan 22, 2019 at 02:45:44PM +0800, Kai-Heng Feng wrote:
>>>> There are some e1000e devices can only be woken up from D3 one time, by
>>>> plugging ethernet cable. Subsequent cable plugging does set PME bit
>>>> correctly, but it still doesn't get woken up.
>>>> 
>>>> Since e1000e connects to the root complex directly, we rely on ACPI to
>>>> wake it up. In this case, the GPE from _PRW only works once and stops
>>>> working after that.
>>>> 
>>>> So introduce a new PCI quirk, to avoid clearing pme_poll flag for buggy
>>>> platform firmwares that have unreliable GPE wake.
>>> 
>>> This quirk applies to all 0x15bb (E1000_DEV_ID_PCH_CNP_I219_LM7) and
>>> 0x15bd (E1000_DEV_ID_PCH_CNP_I219_LM6) devices.  The e1000e driver
>>> claims about a zillion different device IDs.
>>> 
>>> I would be surprised if these two devices are defective but all the
>>> others work correctly.  Could it be that there is a problem with the
>>> wiring on this particular motherboard or with the ACPI _PRW methods
>>> (or the way Linux interprets them) in this firmware?
>> 
>> If this is a motherboard issue or platform specific, do you prefer to use
>> DMI matches here?
> 
> I'm not sure what the problem is yet, so let's hold off on the exact
> structure of the fix.

I think DMI table can put in e1000e driver instead of PCI quirk.

> 
> If I understand correctly, e1000e wakeup works once, but doesn't work
> after that.  Your lspci (from after that first wakeup, from
> https://bugzilla.kernel.org/attachment.cgi?id=280691) shows this:
> 
>  00:14.0 XHC  XHCI USB
>Flags: PMEClk- DSI- D1- D2- ... PME(D0-,D1-,D2-,D3hot+,D3cold+)
>Status: D3 NoSoftRst+ PME-Enable+ DSel=0 DScale=0 PME-
>  00:1f.3 HDAS audio
>Flags: PMEClk- DSI- D1- D2- ... PME(D0-,D1-,D2-,D3hot+,D3cold+)
>Status: D3 NoSoftRst+ PME-Enable+ DSel=0 DScale=0 PME-
>  00:1f.6 GLAN e1000e
>Flags: PMEClk- DSI+ D1- D2- ... PME(D0+,D1-,D2-,D3hot+,D3cold+)
>Status: D3 NoSoftRst+ PME-Enable+ DSel=0 DScale=1 PME+
> 
> So the e1000e PME_Status bit is still set, which means it probably
> won't generate another PME interrupt, which would explain why wakeup
> doesn't work.  To test this theory, can you try this:
> 
>  - sleep
>  - wakeup via e1000e
>  # DEV=00:1f.6
>  # lspci -vvs $DEV
>  # setpci -s $DEV CAP_PM+4.W
>  # setpci -s $DEV CAP_PM+4.W=0x8100
>  - sleep
>  - attempt another wakeup via e1000e
> 
> If this second wakeup works, it would suggest that PME_Status isn't
> being cleared correctly.  I see code, e.g., in
> acpi_setup_gpe_for_wake(), that *looks* like it would arrange to clear
> it, but I'm not very familiar with it.  Maybe there's some issue with
> multiple devices sharing an "implicit notification" situation like
> this.

The PME status is being cleared correctly.

The lspci is captured after I plugged the ethernet cable second time,
i.e. PME is set but not being woken up.

Kai-Heng

> 
>> As for _PRW, it’s shared by USB controller, Audio controller and ethernet.
>> Only the ethernet (e1000e) has this issue.
>> 
>> When this issue happens, the e1000e doesn’t get woken up by ethernet cable
>> plugging, but inserting a USB device or plugging audio jack can wake up all
>> three devices. So I think Linux interprets ACPI correctly here.
>> 
>> Their _PRW here:
>> USB controller:
>>Scope (_SB.PCI0)   
>>{  
>>Device (XDCI)
>>{
>>Method (_PRW, 0, NotSerialized)  // _PRW: Power Resources for Wake
>>{
>>Return (GPRW (0x6D, 0x04))
>>} 
>> 
>> Audio controller:
>> Scope (_SB.PCI0) 
>>   
>>{ 
>>  
>>Device (HDAS)  
>>{ 
>> 
>>… 
>>Method (_PRW, 0, NotSerialized)  // _PRW: Power Resources for 
>> Wake 
>>{  

Re: [PATCH] PCI / ACPI: Don't clear pme_poll on device that has unreliable ACPI wake

2019-01-22 Thread Kai Heng Feng



> On Jan 23, 2019, at 7:51 AM, Bjorn Helgaas  wrote:
> 
> On Tue, Jan 22, 2019 at 02:45:44PM +0800, Kai-Heng Feng wrote:
>> There are some e1000e devices can only be woken up from D3 one time, by
>> plugging ethernet cable. Subsequent cable plugging does set PME bit
>> correctly, but it still doesn't get woken up.
>> 
>> Since e1000e connects to the root complex directly, we rely on ACPI to
>> wake it up. In this case, the GPE from _PRW only works once and stops
>> working after that.
>> 
>> So introduce a new PCI quirk, to avoid clearing pme_poll flag for buggy
>> platform firmwares that have unreliable GPE wake.
> 
> This quirk applies to all 0x15bb (E1000_DEV_ID_PCH_CNP_I219_LM7) and
> 0x15bd (E1000_DEV_ID_PCH_CNP_I219_LM6) devices.  The e1000e driver
> claims about a zillion different device IDs.
> 
> I would be surprised if these two devices are defective but all the
> others work correctly.  Could it be that there is a problem with the
> wiring on this particular motherboard or with the ACPI _PRW methods
> (or the way Linux interprets them) in this firmware?

If this is a motherboard issue or platform specific, do you prefer to use
DMI matches here?

As for _PRW, it’s shared by USB controller, Audio controller and ethernet.
Only the ethernet (e1000e) has this issue.

When this issue happens, the e1000e doesn’t get woken up by ethernet cable
plugging, but inserting a USB device or plugging audio jack can wake up all
three devices. So I think Linux interprets ACPI correctly here.

Their _PRW here:
USB controller:
Scope (_SB.PCI0)   
{  
Device (XDCI)
{
Method (_PRW, 0, NotSerialized)  // _PRW: Power Resources for Wake
{
Return (GPRW (0x6D, 0x04))
} 

Audio controller:
Scope (_SB.PCI0)
   
{   
   
Device (HDAS)  
{   
  
… 
Method (_PRW, 0, NotSerialized)  // _PRW: Power Resources for Wake  
   
{
Return (GPRW (0x6D, 0x04))   
}  

Ethernet controller:
Scope (_SB.PCI0)
 
{ 
Device (GLAN)  
{   
 
   …
Method (_PRW, 0, NotSerialized)  // _PRW: Power Resources for Wake  
   
{
Return (GPRW (0x6D, 0x04))   
}  
} 
}  


> 
> Would you mind attaching a complete dmesg log and "sudo lspci -vvv"
> output to the bugzilla, please?

Sure.

Kai-Heng

> 
>> Signed-off-by: Kai-Heng Feng 
>> ---
>> drivers/pci/pci-acpi.c | 2 +-
>> drivers/pci/quirks.c   | 8 
>> include/linux/pci.h| 1 +
>> 3 files changed, 10 insertions(+), 1 deletion(-)
>> 
>> diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
>> index e1949f7efd9c..184e2fc8a294 100644
>> --- a/drivers/pci/pci-acpi.c
>> +++ b/drivers/pci/pci-acpi.c
>> @@ -430,7 +430,7 @@ static void pci_acpi_wake_dev(struct 
>> acpi_device_wakeup_context *context)
>> 
>>  pci_dev = to_pci_dev(context->dev);
>> 
>> -if (pci_dev->pme_poll)
>> +if (pci_dev->pme_poll && !pci_dev->unreliable_acpi_wake)
>>  pci_dev->pme_poll = false;
>> 
>>  if (pci_dev->current_state == PCI_D3cold) {
>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>> index b0a413f3f7ca..ed4863496fa8 100644
>> --- a/drivers/pci/quirks.c
>> +++ b/drivers/pci/quirks.c
>> @@ -4948,6 +4948,14 @@ DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_AMD, 
>> PCI_ANY_ID,
>> DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID,
>>PCI_CLASS_MULTIMEDIA_HD_AUDIO, 8, quirk_gpu_hda);
>> 
>> +static void quirk_unreliable_acpi_wake(struct pci_dev *pdev)
>> +{
>> +pci_info(pdev, "ACPI Wake unreliable, al

Re: [PATCH] PCI / ACPI: Don't clear pme_poll on device that has unreliable ACPI wake

2019-01-21 Thread Kai-Heng Feng



> On Jan 22, 2019, at 14:45, Kai-Heng Feng  wrote:
> 
> There are some e1000e devices can only be woken up from D3 one time, by
> plugging ethernet cable. Subsequent cable plugging does set PME bit
> correctly, but it still doesn't get woken up.
> 
> Since e1000e connects to the root complex directly, we rely on ACPI to
> wake it up. In this case, the GPE from _PRW only works once and stops
> working after that.
> 
> So introduce a new PCI quirk, to avoid clearing pme_poll flag for buggy
> platform firmwares that have unreliable GPE wake.

Forgot this:
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202181

> 
> Signed-off-by: Kai-Heng Feng 
> ---
> drivers/pci/pci-acpi.c | 2 +-
> drivers/pci/quirks.c   | 8 
> include/linux/pci.h| 1 +
> 3 files changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
> index e1949f7efd9c..184e2fc8a294 100644
> --- a/drivers/pci/pci-acpi.c
> +++ b/drivers/pci/pci-acpi.c
> @@ -430,7 +430,7 @@ static void pci_acpi_wake_dev(struct 
> acpi_device_wakeup_context *context)
> 
>   pci_dev = to_pci_dev(context->dev);
> 
> - if (pci_dev->pme_poll)
> + if (pci_dev->pme_poll && !pci_dev->unreliable_acpi_wake)
>   pci_dev->pme_poll = false;
> 
>   if (pci_dev->current_state == PCI_D3cold) {
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index b0a413f3f7ca..ed4863496fa8 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -4948,6 +4948,14 @@ DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_AMD, 
> PCI_ANY_ID,
> DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID,
> PCI_CLASS_MULTIMEDIA_HD_AUDIO, 8, quirk_gpu_hda);
> 
> +static void quirk_unreliable_acpi_wake(struct pci_dev *pdev)
> +{
> + pci_info(pdev, "ACPI Wake unreliable, always poll PME\n");
> + pdev->unreliable_acpi_wake = 1;
> +}
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x15bb, 
> quirk_unreliable_acpi_wake);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x15bd, 
> quirk_unreliable_acpi_wake);
> +
> /*
>  * Some IDT switches incorrectly flag an ACS Source Validation error on
>  * completions for config read requests even though PCIe r4.0, sec
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 65f1d8c2f082..d22065c1576f 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -331,6 +331,7 @@ struct pci_dev {
>   unsigned intpme_support:5;  /* Bitmask of states from which PME#
>  can be generated */
>   unsigned intpme_poll:1; /* Poll device's PME status bit */
> + unsigned intunreliable_acpi_wake:1; /* ACPI Wake doesn't always 
> work */
>   unsigned intd1_support:1;   /* Low power state D1 is supported */
>   unsigned intd2_support:1;   /* Low power state D2 is supported */
>   unsigned intno_d1d2:1;  /* D1 and D2 are forbidden */
> -- 
> 2.17.1
> 



[PATCH] PCI / ACPI: Don't clear pme_poll on device that has unreliable ACPI wake

2019-01-21 Thread Kai-Heng Feng
There are some e1000e devices can only be woken up from D3 one time, by
plugging ethernet cable. Subsequent cable plugging does set PME bit
correctly, but it still doesn't get woken up.

Since e1000e connects to the root complex directly, we rely on ACPI to
wake it up. In this case, the GPE from _PRW only works once and stops
working after that.

So introduce a new PCI quirk, to avoid clearing pme_poll flag for buggy
platform firmwares that have unreliable GPE wake.

Signed-off-by: Kai-Heng Feng 
---
 drivers/pci/pci-acpi.c | 2 +-
 drivers/pci/quirks.c   | 8 
 include/linux/pci.h| 1 +
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
index e1949f7efd9c..184e2fc8a294 100644
--- a/drivers/pci/pci-acpi.c
+++ b/drivers/pci/pci-acpi.c
@@ -430,7 +430,7 @@ static void pci_acpi_wake_dev(struct 
acpi_device_wakeup_context *context)
 
pci_dev = to_pci_dev(context->dev);
 
-   if (pci_dev->pme_poll)
+   if (pci_dev->pme_poll && !pci_dev->unreliable_acpi_wake)
pci_dev->pme_poll = false;
 
if (pci_dev->current_state == PCI_D3cold) {
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index b0a413f3f7ca..ed4863496fa8 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -4948,6 +4948,14 @@ DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_AMD, 
PCI_ANY_ID,
 DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID,
  PCI_CLASS_MULTIMEDIA_HD_AUDIO, 8, quirk_gpu_hda);
 
+static void quirk_unreliable_acpi_wake(struct pci_dev *pdev)
+{
+   pci_info(pdev, "ACPI Wake unreliable, always poll PME\n");
+   pdev->unreliable_acpi_wake = 1;
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x15bb, 
quirk_unreliable_acpi_wake);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x15bd, 
quirk_unreliable_acpi_wake);
+
 /*
  * Some IDT switches incorrectly flag an ACS Source Validation error on
  * completions for config read requests even though PCIe r4.0, sec
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 65f1d8c2f082..d22065c1576f 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -331,6 +331,7 @@ struct pci_dev {
unsigned intpme_support:5;  /* Bitmask of states from which PME#
   can be generated */
unsigned intpme_poll:1; /* Poll device's PME status bit */
+   unsigned intunreliable_acpi_wake:1; /* ACPI Wake doesn't always 
work */
unsigned intd1_support:1;   /* Low power state D1 is supported */
unsigned intd2_support:1;   /* Low power state D2 is supported */
unsigned intno_d1d2:1;  /* D1 and D2 are forbidden */
-- 
2.17.1



Re: Latitude 5495's tg3 hangs under heavy load

2019-01-08 Thread Kai Heng Feng



> On Jan 7, 2019, at 5:12 PM, Siva Reddy Kallam  
> wrote:
> 
> On Mon, Jan 7, 2019 at 11:34 AM Kai Heng Feng
>  wrote:
>> 
>> Hi tg3 folks,
>> 
>> Any idea how to solve the bug?
>> 
>> Kai-Heng
> Hi,
> Can you share Register dump(ethtool -d)?

`ethtool -d` before freeze:
https://pastebin.com/MSkJzhcv

`ethtool -d` after freeze:
https://pastebin.com/dQj8mLsN

> Is ifconfig down/up bringing back interface?

Yes. And seems like network works fine afterward.

`ethtool -d` after down/up:
https://pastebin.com/vL1gCC2n

Kai-Heng

> 
>> 
>>> On Dec 7, 2018, at 17:27, Kai Heng Feng  wrote:
>>> 
>>> Hi tg3 maintainers,
>>> 
>>> I’ve encountered network freeze when using tg3 in gigabits net.
>>> 
>>> The issue can be easily reproduced when using scp to transfer files in 
>>> local network.
>>> 
>>> The symptom is pretty similar to what this commit is trying to solve:
>>> commit 3a498606bb04af603a46ebde8296040b2de350d1
>>> Author: Sanjeev Bansal 
>>> Date:   Mon Jul 16 11:13:32 2018 +0530
>>> 
>>>   tg3: Add higher cpu clock for 5762.
>>> 
>>>   This patch has fix for TX timeout while running bi-directional
>>>   traffic with 100 Mbps using 5762.
>>> 
>>> But reverting this commit doesn’t help.
>>> 
>>> Latitude 5495 is a AMD Raven Ridge platform, not sure if this matters.
>>> 
>>> Here’s the lspci for this device:
>>> 03:00.0 Ethernet controller [0200]: Broadcom Inc. and subsidiaries 
>>> NetXtreme BCM5762 Gigabit Ethernet PCIe [14e4:1687] (rev 10)
>>>   Subsystem: Dell NetXtreme BCM5762 Gigabit Ethernet PCIe [1028:0814]
>>>   Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
>>> Stepping- SERR- FastB2B- DisINTx+
>>>   Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- 
>>> SERR- >>   Latency: 0
>>>   Interrupt: pin A routed to IRQ 16
>>>   Region 0: Memory at e022 (64-bit, prefetchable) [size=64K]
>>>   Region 2: Memory at e021 (64-bit, prefetchable) [size=64K]
>>>   Region 4: Memory at e020 (64-bit, prefetchable) [size=64K]
>>>   Capabilities: [48] Power Management version 3
>>>   Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA 
>>> PME(D0+,D1-,D2-,D3hot+,D3cold+)
>>>   Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
>>>   Capabilities: [50] Vital Product Data
>>>   Product Name: Broadcom NetXtreme Gigabit Ethernet Controller
>>>   Read-only fields:
>>>   [PN] Part number: BCM95762
>>>   [EC] Engineering changes: 106679-15
>>>   [SN] Serial number: 0123456789
>>>   [MN] Manufacture ID: 31 34 65 34
>>>   [RV] Reserved: checksum good, 28 byte(s) reserved
>>>   Read/write fields:
>>>   [YA] Asset tag: XYZ01234567
>>>   [RW] Read-write area: 107 byte(s) free
>>>   End
>>>   Capabilities: [58] MSI: Enable- Count=1/8 Maskable- 64bit+
>>>   Address:   Data: 
>>>   Capabilities: [a0] MSI-X: Enable+ Count=6 Masked-
>>>   Vector table: BAR=4 offset=
>>>   PBA: BAR=2 offset=0120
>>>   Capabilities: [ac] Express (v2) Endpoint, MSI 00
>>>   DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, 
>>> L1 <64us
>>>   ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ 
>>> SlotPowerLimit 0.000W
>>>   DevCtl: Report errors: Correctable- Non-Fatal- Fatal- 
>>> Unsupported-
>>>   RlxdOrd- ExtTag- PhantFunc- AuxPwr+ NoSnoop- FLReset-
>>>   MaxPayload 128 bytes, MaxReadReq 4096 bytes
>>>   DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ 
>>> TransPend-
>>>   LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Exit 
>>> Latency L0s <2us, L1 <64us
>>>   ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
>>>   LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
>>>   ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
>>>   LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ 
>>> DLActive- BWMgmt- ABWMgmt-
>>>   De

Re: r8152: data corruption in various scenarios

2019-01-06 Thread Kai Heng Feng



> On Jan 7, 2019, at 12:13, Mark Lord  wrote:
> 
> On 2019-01-06 11:09 p.m., Kai Heng Feng wrote:
>> 
>> 
>>> On Jan 7, 2019, at 05:16, Mark Lord  wrote:
>>> 
>>> On 2019-01-06 4:13 p.m., Mark Lord wrote:
>>>> On 2019-01-06 2:14 p.m., Kai Heng Feng wrote:>> On Jan 5, 2019, at 10:14 
>>>> PM, Mark Lord
>>>>  wrote:
>>>> ..
>>>>>> There is even now a special hack in the upstream r8152.c to attempt to 
>>>>>> detect
>>>>>> a Dell TB16 dock and disable RX Aggregation in the driver to prevent 
>>>>>> such issues.
>>>>>> 
>>>>>> Well.. I have a WD15 dock, not a TB16, and that same hack also catches 
>>>>>> my dock
>>>>>> in its net:
>>>>>> 
>>>>>>  [5.794641] usb 4-1.2: Dell TB16 Dock, disable RX aggregation
>>>>> 
>>>>> The serial should be unique according to Dell.
>>>>> 
>>>>>> So one issue is that the code is not correctly identifying the dock,
>>>>>> and the WD15 is claimed to be immune from the r8152 issues.
>>>>> 
>>>>> The WD15 I tested didn't use that serial number though...
>>>> 
>>>> What info do you need from me about the WD15 so this can be corrected?
>>>> 
>>>>>> xhci_hcd :39:00.0: ERROR Transfer event TRB DMA ptr not part of 
>>>>>> current TD ep_index 13
>>>>>> comp_code 1
>>>>> 
>>>>> This is probably an xHC bug. A similar issue is fixed by commit 
>>>>> 9da5a1092b13
>>>>> ("xhci: Bad Ethernet performance plugged in ASM1042A host”). 
>>>>> 
>>>>>> I just got that exact message above, with the r8152 in my 1-day old WD15 
>>>>>> dock,
>>>>>> with the TB16 "workaround" enabled in Linux kernel 4.20.0.
>>>>> 
>>>>> Is the xHC WD15 connected an ASMedia one?
>>>> 
>>>> I don't know.  I *think* it identifies as a DSL6340 (see below).
>>>> 
>>>> Here is lspci and lsusb:
>>>> 
>>>> $ lspci -vt
>>>> -[:00]-+-00.0  Intel Corporation Xeon E3-1200 v6/7th Gen Core 
>>>> Processor Host Bridge/DRAM Registers
>>>>  +-02.0  Intel Corporation UHD Graphics 620
>>>>  +-04.0  Intel Corporation Skylake Processor Thermal Subsystem
>>>>  +-14.0  Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller
>>>>  +-14.2  Intel Corporation Sunrise Point-LP Thermal subsystem
>>>>  +-15.0  Intel Corporation Sunrise Point-LP Serial IO I2C 
>>>> Controller #0
>>>>  +-15.1  Intel Corporation Sunrise Point-LP Serial IO I2C 
>>>> Controller #1
>>>>  +-16.0  Intel Corporation Sunrise Point-LP CSME HECI #1
>>>>  +-1c.0-[01-39]00.0-[02-39]--+-00.0-[03]--
>>>>  |   +-01.0-[04-38]--
>>>>  |   \-02.0-[39]00.0  Intel 
>>>> Corporation DSL6340 USB 3.1
>>>> Controller [Alpine Ridge]
>>>>  +-1c.4-[3a]00.0  Qualcomm Atheros QCA6174 802.11ac Wireless 
>>>> Network Adapter
>>>>  +-1d.0-[3b]00.0  Samsung Electronics Co Ltd Device a808
>>>>  +-1f.0  Intel Corporation Device 9d4e
>>>>  +-1f.2  Intel Corporation Sunrise Point-LP PMC
>>>>  +-1f.3  Intel Corporation Sunrise Point-LP HD Audio
>>>>  \-1f.4  Intel Corporation Sunrise Point-LP SMBus
>>> 
>>> 
>>> Mmm.. lspci -vt isn't as verbose as I thought, so here is plain lspci to 
>>> fill in the blanks:
>>> 
>>> $ lspci
>>> 00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v6/7th Gen Core 
>>> Processor Host Bridge/DRAM
>>> Registers (rev 08)
>>> 00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 620 (rev 
>>> 07)
>>> 
>>> 00:04.0 Signal processing controller: Intel Corporation Skylake Processor 
>>> Thermal Subsystem (rev 08)
>>> 
>>> 00:14.0 USB controller: Intel Corporation Sunrise Point-LP USB 3.0 xHCI 
>>> Controller (rev 21)
>>> 
>>> 00:14.2 Signal processing controller: Intel Corporation Sunrise Point-LP 
>>> Thermal subsystem (rev 21)
>>> 
>>> 00:15.0 Signal processi

Re: Latitude 5495's tg3 hangs under heavy load

2019-01-06 Thread Kai Heng Feng
Hi tg3 folks,

Any idea how to solve the bug?

Kai-Heng

> On Dec 7, 2018, at 17:27, Kai Heng Feng  wrote:
> 
> Hi tg3 maintainers,
> 
> I’ve encountered network freeze when using tg3 in gigabits net.
> 
> The issue can be easily reproduced when using scp to transfer files in local 
> network.
> 
> The symptom is pretty similar to what this commit is trying to solve:
> commit 3a498606bb04af603a46ebde8296040b2de350d1
> Author: Sanjeev Bansal 
> Date:   Mon Jul 16 11:13:32 2018 +0530
> 
>tg3: Add higher cpu clock for 5762.
> 
>This patch has fix for TX timeout while running bi-directional
>traffic with 100 Mbps using 5762.
> 
> But reverting this commit doesn’t help.
> 
> Latitude 5495 is a AMD Raven Ridge platform, not sure if this matters.
> 
> Here’s the lspci for this device:
> 03:00.0 Ethernet controller [0200]: Broadcom Inc. and subsidiaries NetXtreme 
> BCM5762 Gigabit Ethernet PCIe [14e4:1687] (rev 10)
>Subsystem: Dell NetXtreme BCM5762 Gigabit Ethernet PCIe [1028:0814]
>Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
> Stepping- SERR- FastB2B- DisINTx+
>Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- 
> SERR- Latency: 0
>Interrupt: pin A routed to IRQ 16
>Region 0: Memory at e022 (64-bit, prefetchable) [size=64K]
>Region 2: Memory at e021 (64-bit, prefetchable) [size=64K]
>Region 4: Memory at e020 (64-bit, prefetchable) [size=64K]
>Capabilities: [48] Power Management version 3
>Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA 
> PME(D0+,D1-,D2-,D3hot+,D3cold+)
>Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
>Capabilities: [50] Vital Product Data
>Product Name: Broadcom NetXtreme Gigabit Ethernet Controller
>Read-only fields:
>[PN] Part number: BCM95762
>[EC] Engineering changes: 106679-15
>[SN] Serial number: 0123456789
>[MN] Manufacture ID: 31 34 65 34
>[RV] Reserved: checksum good, 28 byte(s) reserved
>Read/write fields:
>[YA] Asset tag: XYZ01234567
>[RW] Read-write area: 107 byte(s) free
>End
>Capabilities: [58] MSI: Enable- Count=1/8 Maskable- 64bit+
>Address:   Data: 
>Capabilities: [a0] MSI-X: Enable+ Count=6 Masked-
>Vector table: BAR=4 offset=
>PBA: BAR=2 offset=0120
>Capabilities: [ac] Express (v2) Endpoint, MSI 00
>DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, 
> L1 <64us
>ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ 
> SlotPowerLimit 0.000W
>DevCtl: Report errors: Correctable- Non-Fatal- Fatal- 
> Unsupported-
>RlxdOrd- ExtTag- PhantFunc- AuxPwr+ NoSnoop- FLReset-
>MaxPayload 128 bytes, MaxReadReq 4096 bytes
>DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ 
> TransPend-
>LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Exit 
> Latency L0s <2us, L1 <64us
>ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
>LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
>ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
>LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ 
> DLActive- BWMgmt- ABWMgmt-
>DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, 
> OBFF Via WAKE#
>DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, 
> OBFF Disabled
>LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
> Transmit Margin: Normal Operating Range, 
> EnterModifiedCompliance- ComplianceSOS-
> Compliance De-emphasis: -6dB
>LnkSta2: Current De-emphasis Level: -3.5dB, 
> EqualizationComplete-, EqualizationPhase1-
> EqualizationPhase2-, EqualizationPhase3-, 
> LinkEqualizationRequest-
>Capabilities: [100 v1] Advanced Error Reporting
>UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- 
> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- 
> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- 
> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>CESta

Re: r8152: data corruption in various scenarios

2019-01-06 Thread Kai Heng Feng



> On Jan 7, 2019, at 05:16, Mark Lord  wrote:
> 
> On 2019-01-06 4:13 p.m., Mark Lord wrote:
>> On 2019-01-06 2:14 p.m., Kai Heng Feng wrote:>> On Jan 5, 2019, at 10:14 PM, 
>> Mark Lord
>>  wrote:
>> ..
>>>> There is even now a special hack in the upstream r8152.c to attempt to 
>>>> detect
>>>> a Dell TB16 dock and disable RX Aggregation in the driver to prevent such 
>>>> issues.
>>>> 
>>>> Well.. I have a WD15 dock, not a TB16, and that same hack also catches my 
>>>> dock
>>>> in its net:
>>>> 
>>>>   [5.794641] usb 4-1.2: Dell TB16 Dock, disable RX aggregation
>>> 
>>> The serial should be unique according to Dell.
>>> 
>>>> So one issue is that the code is not correctly identifying the dock,
>>>> and the WD15 is claimed to be immune from the r8152 issues.
>>> 
>>> The WD15 I tested didn't use that serial number though...
>> 
>> What info do you need from me about the WD15 so this can be corrected?
>> 
>>>>  xhci_hcd :39:00.0: ERROR Transfer event TRB DMA ptr not part of 
>>>> current TD ep_index 13
>>>> comp_code 1
>>> 
>>> This is probably an xHC bug. A similar issue is fixed by commit 9da5a1092b13
>>> ("xhci: Bad Ethernet performance plugged in ASM1042A host”). 
>>> 
>>>> I just got that exact message above, with the r8152 in my 1-day old WD15 
>>>> dock,
>>>> with the TB16 "workaround" enabled in Linux kernel 4.20.0.
>>> 
>>> Is the xHC WD15 connected an ASMedia one?
>> 
>> I don't know.  I *think* it identifies as a DSL6340 (see below).
>> 
>> Here is lspci and lsusb:
>> 
>> $ lspci -vt
>> -[:00]-+-00.0  Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor 
>> Host Bridge/DRAM Registers
>>   +-02.0  Intel Corporation UHD Graphics 620
>>   +-04.0  Intel Corporation Skylake Processor Thermal Subsystem
>>   +-14.0  Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller
>>   +-14.2  Intel Corporation Sunrise Point-LP Thermal subsystem
>>   +-15.0  Intel Corporation Sunrise Point-LP Serial IO I2C 
>> Controller #0
>>   +-15.1  Intel Corporation Sunrise Point-LP Serial IO I2C 
>> Controller #1
>>   +-16.0  Intel Corporation Sunrise Point-LP CSME HECI #1
>>   +-1c.0-[01-39]00.0-[02-39]--+-00.0-[03]--
>>   |   +-01.0-[04-38]--
>>   |   \-02.0-[39]00.0  Intel 
>> Corporation DSL6340 USB 3.1
>> Controller [Alpine Ridge]
>>   +-1c.4-[3a]00.0  Qualcomm Atheros QCA6174 802.11ac Wireless 
>> Network Adapter
>>   +-1d.0-[3b]00.0  Samsung Electronics Co Ltd Device a808
>>   +-1f.0  Intel Corporation Device 9d4e
>>   +-1f.2  Intel Corporation Sunrise Point-LP PMC
>>   +-1f.3  Intel Corporation Sunrise Point-LP HD Audio
>>   \-1f.4  Intel Corporation Sunrise Point-LP SMBus
> 
> 
> Mmm.. lspci -vt isn't as verbose as I thought, so here is plain lspci to fill 
> in the blanks:
> 
> $ lspci
> 00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor 
> Host Bridge/DRAM
> Registers (rev 08)
> 00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 620 (rev 07)
> 
> 00:04.0 Signal processing controller: Intel Corporation Skylake Processor 
> Thermal Subsystem (rev 08)
> 
> 00:14.0 USB controller: Intel Corporation Sunrise Point-LP USB 3.0 xHCI 
> Controller (rev 21)
> 
> 00:14.2 Signal processing controller: Intel Corporation Sunrise Point-LP 
> Thermal subsystem (rev 21)
> 
> 00:15.0 Signal processing controller: Intel Corporation Sunrise Point-LP 
> Serial IO I2C Controller #0
> (rev 21)
> 00:15.1 Signal processing controller: Intel Corporation Sunrise Point-LP 
> Serial IO I2C Controller #1
> (rev 21)
> 00:16.0 Communication controller: Intel Corporation Sunrise Point-LP CSME 
> HECI #1 (rev 21)
> 
> 00:1c.0 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port 
> (rev f1)
> 
> 00:1c.4 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port 
> #5 (rev f1)
> 
> 00:1d.0 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port 
> #9 (rev f1)
> 
> 00:1f.0 ISA bridge: Intel Corporation Device 9d4e (rev 21)
> 
> 00:1f.2 Memory controller: Intel Corporation Sunrise Point-LP PMC (rev 21)
> 
> 00:1f.3 Audio device: Intel Corporation Sunrise Point-

Re: r8152: data corruption in various scenarios

2019-01-06 Thread Kai Heng Feng



> On Jan 5, 2019, at 10:14 PM, Mark Lord  wrote:
> 
> A couple of years back, I reported data corruption resulting from
> a change in kernel 3.16 which enabled hardware checksums in the r8152 driver.
> This was happening on an embedded system that was using a r8152 USB dongle.
> 
> At the time, it was very difficult to figure out what could possibly be 
> causing it,
> other than that re-enabling software checksums prevented corrupted packets 
> from
> resulting in more serious issues.
> 
> Since that time, more and more reports of similar corruption and issues
> have been trickling in.  Eg.
> 
>   https://lore.kernel.org/patchwork/patch/873920/
> 
> Note that there are reports in the thread above that the issues
> are not limited to only the built-in ethernet chip of the dock.
> 
> There is even now a special hack in the upstream r8152.c to attempt to detect
> a Dell TB16 dock and disable RX Aggregation in the driver to prevent such 
> issues.
> 
> Well.. I have a WD15 dock, not a TB16, and that same hack also catches my dock
> in its net:
> 
>[5.794641] usb 4-1.2: Dell TB16 Dock, disable RX aggregation

The serial should be unique according to Dell.

> 
> So one issue is that the code is not correctly identifying the dock,
> and the WD15 is claimed to be immune from the r8152 issues.

The WD15 I tested didn't use that serial number though...

> 
> One of the symptoms of the r8152 issue, reported by Ansis Atteka,
> were messages like this:
> 
>   xhci_hcd :39:00.0: ERROR Transfer event TRB DMA ptr not part of current 
> TD ep_index 13
> comp_code 1

This is probably an xHC bug. A similar issue is fixed by commit 9da5a1092b13
("xhci: Bad Ethernet performance plugged in ASM1042A host”). 

> 
> I just got that exact message above, with the r8152 in my 1-day old WD15 dock,
> with the TB16 "workaround" enabled in Linux kernel 4.20.0.

Is the xHC WD15 connected an ASMedia one?

Kai-Heng

> 
> From this I conclude that the workaround is not 100% complete yet.
> -- 
> Mark Lord
> Real-Time Remedies Inc.
> ml...@pobox.com



[PATCH] r8169: Add support for new Realtek Ethernet

2019-01-01 Thread Kai-Heng Feng
There are two new Realtek Ethernet devices which are re-branded r8168h.
Add the IDs to to support them.

Signed-off-by: Kai-Heng Feng 
---
 drivers/net/ethernet/realtek/r8169.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/realtek/r8169.c 
b/drivers/net/ethernet/realtek/r8169.c
index 99bc3de906e2..66a3f3bf7835 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -205,6 +205,8 @@ enum cfg_version {
 };
 
 static const struct pci_device_id rtl8169_pci_tbl[] = {
+   { PCI_VDEVICE(REALTEK,  0x2502), RTL_CFG_1 },
+   { PCI_VDEVICE(REALTEK,  0x2600), RTL_CFG_1 },
{ PCI_VDEVICE(REALTEK,  0x8129), RTL_CFG_0 },
{ PCI_VDEVICE(REALTEK,  0x8136), RTL_CFG_2 },
{ PCI_VDEVICE(REALTEK,  0x8161), RTL_CFG_1 },
-- 
2.17.1



Re: [PATCH] rtlwifi: Fix non-working BSS STA mode

2018-12-20 Thread Kai Heng Feng



> On Dec 20, 2018, at 14:42, Kalle Valo  wrote:
> 
> Kai-Heng Feng  wrote:
> 
>> Once BSS STA mode gets started, it can be scanned by other clients but
>> cannot entablish a connection.
>> 
>> Turns out the set_bcn_reg() and its *_set_beacon_related_registers()
>> callbacks never get called so it has problem beaconing.
>> 
>> Enable the function in rtl_op_bss_info_changed() can make BSS STA mode
>> start to work.
>> 
>> Signed-off-by: Kai-Heng Feng 
> 
> The commit log is quite misleading. It implies that the client mode is broken
> for all rtlwifi hardware and that can't be the case, otherwise we would be
> flooded by bug reports. So please improve the commit log and describe clearly
> the problem you are solving.

You are right. I'll wait for PKShih’s test result and update commit log 
afterwards.

Kai-Heng

> 
> -- 
> https://patchwork.kernel.org/patch/10725537/
> 
> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
> 



Re: [PATCH] rtlwifi: Fix non-working BSS STA mode

2018-12-13 Thread Kai Heng Feng



> On Dec 13, 2018, at 15:39, Pkshih  wrote:
> 
> On Thu, 2018-12-13 at 13:36 +0800, Kai Heng Feng wrote:
>>> On Dec 13, 2018, at 08:35, Pkshih  wrote:
>>>  
>>> On Wed, 2018-12-12 at 13:13 +0800, Kai-Heng Feng wrote:
>>>> Once BSS STA mode gets started, it can be scanned by other clients but
>>>> cannot entablish a connection.
>>>   ^^^ typo: establish
>>>>  
>>>> Turns out the set_bcn_reg() and its *_set_beacon_related_registers()
>>>> callbacks never get called so it has problem beaconing.
>>>>  
>>>> Enable the function in rtl_op_bss_info_changed() can make BSS STA mode
>>>> start to work.
>>>>  
>>>> Signed-off-by: Kai-Heng Feng 
>>>> ---
>>>>   drivers/net/wireless/realtek/rtlwifi/core.c | 2 +-
>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>  
>>>> diff --git a/drivers/net/wireless/realtek/rtlwifi/core.c
>>>> b/drivers/net/wireless/realtek/rtlwifi/core.c
>>>> index 4bf7967590ca..11d27a5cc576 100644
>>>> --- a/drivers/net/wireless/realtek/rtlwifi/core.c
>>>> +++ b/drivers/net/wireless/realtek/rtlwifi/core.c
>>>> @@ -1054,7 +1054,7 @@ static void rtl_op_bss_info_changed(struct
>> ieee80211_hw
>>>> *hw,
>>>> "BSS_CHANGED_BEACON_ENABLED\n");
>>>>   
>>>>/*start hw beacon interrupt. */
>>>> -  /*rtlpriv->cfg->ops->set_bcn_reg(hw); */
>>>> +  rtlpriv->cfg->ops->set_bcn_reg(hw);
>>>>mac->beacon_enabled = 1;
>>>>rtlpriv->cfg->ops-
>>> update_interrupt_mask(hw,
>>>>rtlpriv->cfg->maps
>>>  
>>> Which wifi chip do you use? And, please share your test scenario.
>> 
>> It’s Realtek 8723DE, which is currently not supported in mainline so I use
>> rtl8723de in rtlwifi_new [1] to test it out.
>> 
>> The test scenario is simply enable hotspot through network manager, which 
>> uses
>> wpa_supplicant to do the work.
>> 
>> [1] https://github.com/lwfinger/rtlwifi_new
>> 
> 
> Since rtl8723de isn't supported yet, this patch would be pending.
> I'll take time to check whether it works on existing chips.

Thanks, that will be great.

Unrelated question:
Is it possible to use a DMI table to select correct ant_sel for affected HP 
laptops?
This can bring better out-of-the-box experience for users.

Kai-Heng

> 
> Thanks
> PK



Re: [PATCH] rtlwifi: Fix non-working BSS STA mode

2018-12-12 Thread Kai Heng Feng



> On Dec 13, 2018, at 08:35, Pkshih  wrote:
> 
> On Wed, 2018-12-12 at 13:13 +0800, Kai-Heng Feng wrote:
>> Once BSS STA mode gets started, it can be scanned by other clients but
>> cannot entablish a connection.
>  ^^^ typo: establish
>> 
>> Turns out the set_bcn_reg() and its *_set_beacon_related_registers()
>> callbacks never get called so it has problem beaconing.
>> 
>> Enable the function in rtl_op_bss_info_changed() can make BSS STA mode
>> start to work.
>> 
>> Signed-off-by: Kai-Heng Feng 
>> ---
>>  drivers/net/wireless/realtek/rtlwifi/core.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/drivers/net/wireless/realtek/rtlwifi/core.c
>> b/drivers/net/wireless/realtek/rtlwifi/core.c
>> index 4bf7967590ca..11d27a5cc576 100644
>> --- a/drivers/net/wireless/realtek/rtlwifi/core.c
>> +++ b/drivers/net/wireless/realtek/rtlwifi/core.c
>> @@ -1054,7 +1054,7 @@ static void rtl_op_bss_info_changed(struct ieee80211_hw
>> *hw,
>>   "BSS_CHANGED_BEACON_ENABLED\n");
>>  
>>  /*start hw beacon interrupt. */
>> -/*rtlpriv->cfg->ops->set_bcn_reg(hw); */
>> +rtlpriv->cfg->ops->set_bcn_reg(hw);
>>  mac->beacon_enabled = 1;
>>  rtlpriv->cfg->ops->update_interrupt_mask(hw,
>>  rtlpriv->cfg->maps
> 
> Which wifi chip do you use? And, please share your test scenario.

It’s Realtek 8723DE, which is currently not supported in mainline so I use 
rtl8723de in rtlwifi_new [1] to test it out.

The test scenario is simply enable hotspot through network manager, which uses 
wpa_supplicant to do the work.

[1] https://github.com/lwfinger/rtlwifi_new

Kai-Heng

> 
> Thanks
> 



[PATCH] rtlwifi: Fix non-working BSS STA mode

2018-12-11 Thread Kai-Heng Feng
Once BSS STA mode gets started, it can be scanned by other clients but
cannot entablish a connection.

Turns out the set_bcn_reg() and its *_set_beacon_related_registers()
callbacks never get called so it has problem beaconing.

Enable the function in rtl_op_bss_info_changed() can make BSS STA mode
start to work.

Signed-off-by: Kai-Heng Feng 
---
 drivers/net/wireless/realtek/rtlwifi/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/realtek/rtlwifi/core.c 
b/drivers/net/wireless/realtek/rtlwifi/core.c
index 4bf7967590ca..11d27a5cc576 100644
--- a/drivers/net/wireless/realtek/rtlwifi/core.c
+++ b/drivers/net/wireless/realtek/rtlwifi/core.c
@@ -1054,7 +1054,7 @@ static void rtl_op_bss_info_changed(struct ieee80211_hw 
*hw,
 "BSS_CHANGED_BEACON_ENABLED\n");
 
/*start hw beacon interrupt. */
-   /*rtlpriv->cfg->ops->set_bcn_reg(hw); */
+   rtlpriv->cfg->ops->set_bcn_reg(hw);
mac->beacon_enabled = 1;
rtlpriv->cfg->ops->update_interrupt_mask(hw,
rtlpriv->cfg->maps
-- 
2.17.1



[PATCH 1/2] e1000e: Exclude device from suspend direct complete optimization

2018-12-11 Thread Kai-Heng Feng
e1000e sets different WoL settings in system suspend callback and
runtime suspend callback.

The suspend direct complete optimization leaves e1000e in runtime
suspneded state with wrong WoL setting during system suspend.

To fix this, we need to disable suspend direct complete optimization to
let e1000e always use suspend callback to set correct WoL during system
suspend.

Signed-off-by: Kai-Heng Feng 
---
 drivers/net/ethernet/intel/e1000e/netdev.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c
index e434a6a64966..4a22390f9f49 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -7351,6 +7351,8 @@ static int e1000_probe(struct pci_dev *pdev, const struct 
pci_device_id *ent)
 
e1000_print_device_info(adapter);
 
+   dev_pm_set_driver_flags(&pdev->dev, DPM_FLAG_NEVER_SKIP);
+
if (pci_dev_run_wake(pdev))
pm_runtime_put_noidle(&pdev->dev);
 
-- 
2.17.1



[PATCH 2/2] igb: Exclude device from suspend direct complete optimization

2018-12-11 Thread Kai-Heng Feng
igb sets different WoL settings in system suspend callback and runtime
suspend callback.

The suspend direct complete optimization leaves igb in runtime suspneded
state with wrong WoL setting during system suspend.

To fix this, we need to disable suspend direct complete optimization to
let igb always use suspend callback to set correct WoL during system
suspend.

Signed-off-by: Kai-Heng Feng 
---
 drivers/net/ethernet/intel/igb/igb_main.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
b/drivers/net/ethernet/intel/igb/igb_main.c
index 2ce3f2cb156d..e0e8f7a708e9 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -3467,6 +3467,9 @@ static int igb_probe(struct pci_dev *pdev, const struct 
pci_device_id *ent)
break;
}
}
+
+   dev_pm_set_driver_flags(&pdev->dev, DPM_FLAG_NEVER_SKIP);
+
pm_runtime_put_noidle(&pdev->dev);
return 0;
 
-- 
2.17.1



Latitude 5495's tg3 hangs under heavy load

2018-12-07 Thread Kai Heng Feng
Hi tg3 maintainers,

I’ve encountered network freeze when using tg3 in gigabits net.

The issue can be easily reproduced when using scp to transfer files in local 
network.

The symptom is pretty similar to what this commit is trying to solve:
commit 3a498606bb04af603a46ebde8296040b2de350d1
Author: Sanjeev Bansal 
Date:   Mon Jul 16 11:13:32 2018 +0530

tg3: Add higher cpu clock for 5762.

This patch has fix for TX timeout while running bi-directional
traffic with 100 Mbps using 5762.

But reverting this commit doesn’t help.

Latitude 5495 is a AMD Raven Ridge platform, not sure if this matters.

Here’s the lspci for this device:
03:00.0 Ethernet controller [0200]: Broadcom Inc. and subsidiaries NetXtreme 
BCM5762 Gigabit Ethernet PCIe [14e4:1687] (rev 10)
Subsystem: Dell NetXtreme BCM5762 Gigabit Ethernet PCIe [1028:0814]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- 
Capabilities: [160 v1] Virtual Channel
Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
Arb:Fixed- WRR32- WRR64- WRR128-
Ctrl:   ArbSelect=Fixed
Status: InProgress-
VC0:Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb:Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
Status: NegoPending- InProgress-
Capabilities: [1b0 v1] Latency Tolerance Reporting
Max snoop latency: 1048576ns
Max no snoop latency: 1048576ns
Capabilities: [230 v1] Transaction Processing Hints
Interrupt vector mode supported
Steering table in MSI-X table
Kernel driver in use: tg3
Kernel modules: tg3

Please let me know if you need more information.

Kai-Heng


[PATCH] igb: Fix an issue that PME is not enabled during runtime suspend

2018-12-02 Thread Kai-Heng Feng
I210 ethernet card doesn't wakeup when a cable gets plugged. It's
because its PME is not set.

Since commit 42eca2302146 ("PCI: Don't touch card regs after runtime
suspend D3"), if the PCI state is saved, pci_pm_runtime_suspend() stops
calling pci_finish_runtime_suspend(), which enables the PCI PME.

To fix the issue, let's not to save PCI states when it's runtime
suspend, to let the PCI subsytem enables PME.

Fixes: 42eca2302146 ("PCI: Don't touch card regs after runtime suspend D3")
Signed-off-by: Kai-Heng Feng 
---
 drivers/net/ethernet/intel/igb/igb_main.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
b/drivers/net/ethernet/intel/igb/igb_main.c
index 5df88ad8ac81..93f150784cfc 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -8770,9 +8770,11 @@ static int __igb_shutdown(struct pci_dev *pdev, bool 
*enable_wake,
rtnl_unlock();
 
 #ifdef CONFIG_PM
-   retval = pci_save_state(pdev);
-   if (retval)
-   return retval;
+   if (!runtime) {
+   retval = pci_save_state(pdev);
+   if (retval)
+   return retval;
+   }
 #endif
 
status = rd32(E1000_STATUS);
-- 
2.17.1



[PATCH] r8169: Clear RTL_FLAG_TASK_*_PENDING when clearing RTL_FLAG_TASK_ENABLED

2018-09-09 Thread Kai-Heng Feng
After system suspend, sometimes the r8169 doesn't work when ethernet
cable gets pluggued.

This issue happens because rtl_reset_work() doesn't get called from
rtl8169_runtime_resume(), after system suspend.

In rtl_task(), RTL_FLAG_TASK_* only gets cleared if this condition is
met:
if (!netif_running(dev) ||
!test_bit(RTL_FLAG_TASK_ENABLED, tp->wk.flags))
...

If RTL_FLAG_TASK_ENABLED was cleared during system suspend while
RTL_FLAG_TASK_RESET_PENDING was set, the next rtl_schedule_task() won't
schedule task as the flag is still there.

So in addition to clearing RTL_FLAG_TASK_ENABLED, also clears other
flags.

Cc: Heiner Kallweit 
Signed-off-by: Kai-Heng Feng 
---
 drivers/net/ethernet/realtek/r8169.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c 
b/drivers/net/ethernet/realtek/r8169.c
index b08d51bf7a20..20593245ef53 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -6648,6 +6648,7 @@ static int rtl8169_close(struct net_device *dev)
 {
struct rtl8169_private *tp = netdev_priv(dev);
struct pci_dev *pdev = tp->pci_dev;
+   int i;
 
pm_runtime_get_sync(&pdev->dev);
 
@@ -6655,7 +6656,9 @@ static int rtl8169_close(struct net_device *dev)
rtl8169_update_counters(tp);
 
rtl_lock_work(tp);
-   clear_bit(RTL_FLAG_TASK_ENABLED, tp->wk.flags);
+   /* Clear all task flags */
+   for (i = 0; i < RTL_FLAG_MAX; i++)
+   clear_bit(i, tp->wk.flags);
 
rtl8169_down(dev);
rtl_unlock_work(tp);
@@ -6828,6 +6831,7 @@ rtl8169_get_stats64(struct net_device *dev, struct 
rtnl_link_stats64 *stats)
 static void rtl8169_net_suspend(struct net_device *dev)
 {
struct rtl8169_private *tp = netdev_priv(dev);
+   int i;
 
if (!netif_running(dev))
return;
@@ -6838,7 +6842,10 @@ static void rtl8169_net_suspend(struct net_device *dev)
 
rtl_lock_work(tp);
napi_disable(&tp->napi);
-   clear_bit(RTL_FLAG_TASK_ENABLED, tp->wk.flags);
+   /* Clear all task flags */
+   for (i = 0; i < RTL_FLAG_MAX; i++)
+   clear_bit(i, tp->wk.flags);
+
rtl_unlock_work(tp);
 
rtl_pll_power_down(tp);
-- 
2.17.1



Re: tg3 crashes under high load, when using 100Mbits

2018-04-14 Thread Kai-Heng Feng
Hi Satish,

> On 2018Mar21, at 00:57, Kai-Heng Feng  wrote:
> 
> Satish Baddipadige  wrote:
> 
>> On Thu, Feb 15, 2018 at 7:37 PM, Siva Reddy Kallam
>>  wrote:
>>> On Mon, Feb 12, 2018 at 10:59 AM, Siva Reddy Kallam
>>>  wrote:
>>>> On Fri, Feb 9, 2018 at 10:41 AM, Kai Heng Feng
>>>>  wrote:
>>>>> Hi Broadcom folks,
>>>>> 
>>>>> We are now enabling a new platform with tg3 nic, unfortunately we observed
>>>>> the bug [1] that dated back to 2015.
>>>>> I tried commit 4419bb1cedcd ("tg3: Add workaround to restrict 5762 MRRS to
>>>>> 2048”) but it does’t work.
>>>>> 
>>>>> Do you have any idea how to solve the issue?
>>>>> 
>>>>> [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1447664
>>>>> 
>>>>> Kai-Heng
>>>> Thank you for reporting. We will check and update you.
>>> With link aware mode, the clock speed could be slow and boot code does not
>>> complete within the expected time with lower link speeds. Need to override
>>> and the clock in driver. We are checking the feasibility of adding
>>> this in driver or firmware.
>> 
>> Hi Kai-Heng,
>> 
>> Can you please test the attached patch?
> 
> I built a kernel and asked affected users to try.

Users reported that the crash still happens with the patch.

Kai-Heng

> 
> Thanks for your work.
> 
> Kai-Heng
> 
>> 
>> Thanks,
>> Satish
>> 



[PATCH] sky2: Increase D3 delay to sky2 stops working after suspend

2018-03-31 Thread Kai-Heng Feng
The sky2 ethernet stops working after system resume from suspend:
[ 582.852065] sky2 :04:00.0: Refused to change power state, currently in D3

The current 150ms delay is not enough, change it to 200ms can solve the
issue.

BugLink: https://bugs.launchpad.net/bugs/1758507
Cc: Stable 
Signed-off-by: Kai-Heng Feng 
---
 drivers/net/ethernet/marvell/sky2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/sky2.c 
b/drivers/net/ethernet/marvell/sky2.c
index 9fe85300e7b6..5754116a6a4d 100644
--- a/drivers/net/ethernet/marvell/sky2.c
+++ b/drivers/net/ethernet/marvell/sky2.c
@@ -5087,7 +5087,7 @@ static int sky2_probe(struct pci_dev *pdev, const struct 
pci_device_id *ent)
INIT_WORK(&hw->restart_work, sky2_restart);
 
pci_set_drvdata(pdev, hw);
-   pdev->d3_delay = 150;
+   pdev->d3_delay = 200;
 
return 0;
 
-- 
2.15.1



Re: [PATCH 1/2] r8169: reinstate ALDPS for power saving

2018-03-22 Thread Kai-Heng Feng

Kai Heng Feng  wrote:


Hopefully Hayes (or Realtek) can shed more lights on the issue. Apparently
ALDPS and ASPM for r8169 is enabled in different commercial products, just
not in Linux mainline.


Hayes and Realtek folks,

How do we make this patch going forward?
Do you find the root cause that make this patch got reverted?

I guess ALDPS is no longer needed after commit a92a08499b1f ("r8169:  
improve runtime pm in general and suspend unused ports"), now the device  
gets runtime suspended when link is down.


OTOH, ASPM is still quite useful though. When it's enabled, it can save 1W  
power usage, which is quite substantial for a laptop.


So, I'd like to hear your feedback and make ASPM for r8169 eventually gets  
upstreamed.


Kai-Heng



Re: tg3 crashes under high load, when using 100Mbits

2018-03-20 Thread Kai-Heng Feng

Satish Baddipadige  wrote:


On Thu, Feb 15, 2018 at 7:37 PM, Siva Reddy Kallam
 wrote:

On Mon, Feb 12, 2018 at 10:59 AM, Siva Reddy Kallam
 wrote:

On Fri, Feb 9, 2018 at 10:41 AM, Kai Heng Feng
 wrote:

Hi Broadcom folks,

We are now enabling a new platform with tg3 nic, unfortunately we  
observed

the bug [1] that dated back to 2015.
I tried commit 4419bb1cedcd ("tg3: Add workaround to restrict 5762  
MRRS to

2048”) but it does’t work.

Do you have any idea how to solve the issue?

[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1447664

Kai-Heng

Thank you for reporting. We will check and update you.

With link aware mode, the clock speed could be slow and boot code does not
complete within the expected time with lower link speeds. Need to override
and the clock in driver. We are checking the feasibility of adding
this in driver or firmware.


Hi Kai-Heng,

Can you please test the attached patch?


I built a kernel and asked affected users to try.

Thanks for your work.

Kai-Heng



Thanks,
Satish






Re: [PATCH] ath9k: turn on btcoex_enable as default

2018-02-11 Thread Kai Heng Feng


> On 10 Feb 2018, at 10:05 PM, Felix Fietkau  wrote:
> 
> On 2018-02-10 14:56, Kai Heng Feng wrote:
>> 
>>> On 9 Feb 2018, at 3:16 PM, Kalle Valo  wrote:
>>> Sure, but we have to make sure that we don't create regressions on
>>> existing systems. For example, did you test this with any system which
>>> don't support btcoex? (just asking, haven't tested this myself)
>> 
>> No not really, but I will definitely test it.
>> The only module I have that uses ath9k is Dell’s DW1707.
>> How do I check if it support btcoex or not?
> I just reviewed the code again, and I am sure that we cannot merge this
> patch. Enabling the btcoex parameter makes the driver enable a whole
> bunch of code starting timers, listening to some GPIOs, etc.
> 
> On non-btcoex systems, some of those GPIOs might be floating or even
> connected to different things, which could cause a lot of undefined
> behavior.
> 
> This is simply too big a risk, so there absolutely needs to be a
> whitelist for systems that need this, otherwise it has to remain
> disabled by default.

So what information can we use to whitelist btcoex chips?
Can we get btcoex support status at ath9k probing?

Kai-Heng

> 
> - Felix



Re: [PATCH] ath9k: turn on btcoex_enable as default

2018-02-10 Thread Kai Heng Feng

> On 9 Feb 2018, at 3:16 PM, Kalle Valo  wrote:
> Sure, but we have to make sure that we don't create regressions on
> existing systems. For example, did you test this with any system which
> don't support btcoex? (just asking, haven't tested this myself)

No not really, but I will definitely test it.
The only module I have that uses ath9k is Dell’s DW1707.
How do I check if it support btcoex or not?

(I resend the mail because my last mail get changed to HTML by my mail client)

Kai-Heng

> 
> -- 
> Kalle Valo



tg3 crashes under high load, when using 100Mbits

2018-02-08 Thread Kai Heng Feng

Hi Broadcom folks,

We are now enabling a new platform with tg3 nic, unfortunately we observed  
the bug [1] that dated back to 2015.
I tried commit 4419bb1cedcd ("tg3: Add workaround to restrict 5762 MRRS to  
2048”) but it does’t work.


Do you have any idea how to solve the issue?

[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1447664

Kai-Heng



Re: [PATCH] ath9k: turn on btcoex_enable as default

2018-02-08 Thread Kai Heng Feng

Hi Felix,


On Feb 8, 2018, at 7:02 PM, Felix Fietkau  wrote:

On 2018-02-08 06:28, Kai-Heng Feng wrote:

Without btcoex_enable, WiFi activies make both WiFi and Bluetooth
unstable if there's a bluetooth connection.

Enable this option when bt_ant_diversity is disabled.

BugLink: https://bugs.launchpad.net/bugs/1746164
Signed-off-by: Kai-Heng Feng 

I think this might cause regressions on devices that don't have
bluetooth. This probably either needs more EEPROM checks, or something
to selectively enable it only on affected platforms.



I think it’s better not to use dmi_match. This issue should affect more  
ath9k.
And bluetooth peripherals are more than ever now, so it would be great to  
use BT out of the box.


Can you take a look at the bug link, maybe there are other things caused  
the erratic behavior that I didn’t notice?


Kai-Heng


- Felix


[PATCH] ath9k: turn on btcoex_enable as default

2018-02-07 Thread Kai-Heng Feng
Without btcoex_enable, WiFi activies make both WiFi and Bluetooth
unstable if there's a bluetooth connection.

Enable this option when bt_ant_diversity is disabled.

BugLink: https://bugs.launchpad.net/bugs/1746164
Signed-off-by: Kai-Heng Feng 
---
 drivers/net/wireless/ath/ath9k/init.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/net/wireless/ath/ath9k/init.c 
b/drivers/net/wireless/ath/ath9k/init.c
index e479fae5aab9..f8f6b091a077 100644
--- a/drivers/net/wireless/ath/ath9k/init.c
+++ b/drivers/net/wireless/ath/ath9k/init.c
@@ -56,7 +56,7 @@ static int ath9k_led_active_high = -1;
 module_param_named(led_active_high, ath9k_led_active_high, int, 0444);
 MODULE_PARM_DESC(led_active_high, "Invert LED polarity");
 
-static int ath9k_btcoex_enable;
+static int ath9k_btcoex_enable = 1;
 module_param_named(btcoex_enable, ath9k_btcoex_enable, int, 0444);
 MODULE_PARM_DESC(btcoex_enable, "Enable wifi-BT coexistence");
 
@@ -693,7 +693,6 @@ static int ath9k_init_softc(u16 devid, struct ath_softc *sc,
common->hw = sc->hw;
common->priv = sc;
common->debug_mask = ath9k_debug;
-   common->btcoex_enabled = ath9k_btcoex_enable == 1;
common->disable_ani = false;
 
/*
@@ -715,14 +714,17 @@ static int ath9k_init_softc(u16 devid, struct ath_softc 
*sc,
/*
 * Enable WLAN/BT RX Antenna diversity only when:
 *
-* - BTCOEX is disabled.
 * - the user manually requests the feature.
 * - the HW cap is set using the platform data.
 */
-   if (!common->btcoex_enabled && ath9k_bt_ant_diversity &&
+   if (ath9k_bt_ant_diversity &&
(pCap->hw_caps & ATH9K_HW_CAP_BT_ANT_DIV))
common->bt_ant_diversity = 1;
 
+   /* Enable btcoex when ant_diversity is disabled */
+   if (!common->bt_ant_diversity && ath9k_btcoex_enable)
+   common->btcoex_enabled = 1;
+
spin_lock_init(&common->cc_lock);
spin_lock_init(&sc->intr_lock);
spin_lock_init(&sc->sc_serial_rw);
-- 
2.15.1



Re: [PATCH v2] r8152: disable RX aggregation on Dell TB16 dock

2018-01-18 Thread Kai Heng Feng


> On 18 Jan 2018, at 10:50 PM, David Miller  wrote:
> 
> From: Hayes Wang 
> Date: Thu, 18 Jan 2018 03:04:08 +
> 
>> [...]
 r8153 on Dell TB15/16 dock corrupts rx packets.
 
 This change is suggested by Realtek. They guess that the XHCI
 controller doesn't have enough buffer, and their guesswork is correct,
 once the RX aggregation gets disabled, the issue is gone.
 
 ASMedia is currently working on a real sulotion for this issue.
 
 Dell and ODM confirm the bcdDevice and iSerialNumber is unique for TB16.
 
 Note that TB15 has different bcdDevice and iSerialNumber, which are
 not unique values. If you still have TB15, please contact Dell to
 replace it with TB16.
>> 
>> Excuse me. I don't understand why this patch is for specific USB nic rather 
>> than xHCI.
>> It seems to make the specific USB nic working and the other ones keeping 
>> error.
> 
> Well, are we sure that the device being in the TB16 dock doesn't
> contribute to the issue as well?

This is what vendors concluded for now. The very same NIC on WD15 doesn’t
have the issue.

> 
> If the problem only shows up with XHCI and this dock, then this patch
> was the appropriate way to deal with the problem for now.



Re: [PATCH v2] r8152: disable RX aggregation on Dell TB16 dock

2018-01-17 Thread Kai Heng Feng

> On 18 Jan 2018, at 11:04 AM, Hayes Wang  wrote:
> 
> [...]
>>> r8153 on Dell TB15/16 dock corrupts rx packets.
>>> 
>>> This change is suggested by Realtek. They guess that the XHCI
>>> controller doesn't have enough buffer, and their guesswork is correct,
>>> once the RX aggregation gets disabled, the issue is gone.
>>> 
>>> ASMedia is currently working on a real sulotion for this issue.
>>> 
>>> Dell and ODM confirm the bcdDevice and iSerialNumber is unique for TB16.
>>> 
>>> Note that TB15 has different bcdDevice and iSerialNumber, which are
>>> not unique values. If you still have TB15, please contact Dell to
>>> replace it with TB16.
> 
> Excuse me. I don't understand why this patch is for specific USB nic rather 
> than xHCI.
> It seems to make the specific USB nic working and the other ones keeping 
> error.

This patch is transient, once ASMedia find the correct solution, the patch
can be reverted.

This patch only targets the builtin 8153 because externally plugged r8152 or
asix do not suffered from this issue.

Kai-Heng

> 
> 
> Best Regards,
> Hayes
> 
> 



[PATCH v2] r8152: disable RX aggregation on Dell TB16 dock

2018-01-16 Thread Kai-Heng Feng
r8153 on Dell TB15/16 dock corrupts rx packets.

This change is suggested by Realtek. They guess that the XHCI controller
doesn't have enough buffer, and their guesswork is correct, once the RX
aggregation gets disabled, the issue is gone.

ASMedia is currently working on a real sulotion for this issue.

Dell and ODM confirm the bcdDevice and iSerialNumber is unique for TB16.

Note that TB15 has different bcdDevice and iSerialNumber, which are not
unique values. If you still have TB15, please contact Dell to replace it
with TB16.

BugLink: https://bugs.launchpad.net/bugs/1729674
Cc: Mario Limonciello 
Signed-off-by: Kai-Heng Feng 
---
v2:
- Disable RX aggregation instead of disable RX checksum
- Use bcdDevice and iSerialNumber to uniquely identify Dell TB16

 drivers/net/usb/r8152.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index d51d9abf7986..0657203ffb91 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -606,6 +606,7 @@ enum rtl8152_flags {
PHY_RESET,
SCHEDULE_NAPI,
GREEN_ETHERNET,
+   DELL_TB_RX_AGG_BUG,
 };
 
 /* Define these values to match your device */
@@ -1798,6 +1799,9 @@ static int r8152_tx_agg_fill(struct r8152 *tp, struct 
tx_agg *agg)
dev_kfree_skb_any(skb);
 
remain = agg_buf_sz - (int)(tx_agg_align(tx_data) - agg->head);
+
+   if (test_bit(DELL_TB_RX_AGG_BUG, &tp->flags))
+   break;
}
 
if (!skb_queue_empty(&skb_head)) {
@@ -4133,6 +4137,9 @@ static void r8153_init(struct r8152 *tp)
/* rx aggregation */
ocp_data = ocp_read_word(tp, MCU_TYPE_USB, USB_USB_CTRL);
ocp_data &= ~(RX_AGG_DISABLE | RX_ZERO_EN);
+   if (test_bit(DELL_TB_RX_AGG_BUG, &tp->flags))
+   ocp_data |= RX_AGG_DISABLE;
+
ocp_write_word(tp, MCU_TYPE_USB, USB_USB_CTRL, ocp_data);
 
rtl_tally_reset(tp);
@@ -5207,6 +5214,12 @@ static int rtl8152_probe(struct usb_interface *intf,
netdev->hw_features &= ~NETIF_F_RXCSUM;
}
 
+   if (le16_to_cpu(udev->descriptor.bcdDevice) == 0x3011 &&
+   udev->serial && !strcmp(udev->serial, "0100")) {
+   dev_info(&udev->dev, "Dell TB16 Dock, disable RX aggregation");
+   set_bit(DELL_TB_RX_AGG_BUG, &tp->flags);
+   }
+
netdev->ethtool_ops = &ops;
netif_set_gso_max_size(netdev, RTL_LIMITED_TSO_SIZE);
 
-- 
2.15.1



Re: [PATCH] r8152: disable rx checksum offload on Dell TB dock

2017-11-28 Thread Kai Heng Feng


> On 27 Nov 2017, at 11:13 PM,  
>  wrote:
> 
> This is quite surprising to me too.  The externally plugged in r8153 dongle,
> was it connected over type C port or over type A port?  AFAIK Type C port is
> actually Alpine ridge pass through port.  It is not connected to XHCI 
> controller
> or USB hub.

Over the front type A port, which connects to SMSC hub then ASMedia controller.

The type C port indeed connects to AR directly, hence no such issue.

Kai-Heng



Re: [PATCH] r8152: disable rx checksum offload on Dell TB dock

2017-11-24 Thread Kai Heng Feng

> Also the MAC address is different, can you just trigger off of Dell's
> MAC address space instead of the address space of the dongle device?

A really good idea, never thought of this. Thanks for the hint :)
Still, I need to ask Dell folks to get all the answers.

Kai-Heng



Re: [PATCH] r8152: disable rx checksum offload on Dell TB dock

2017-11-24 Thread Kai Heng Feng


> On 24 Nov 2017, at 4:28 PM, Greg KH  wrote:
> 
> The bcdDevice is different between the dock device and the "real"
> device, why not use that?

Yea, I’ll poke around and see if bcdDevice alone can be a good predicate.

> Then there is still a bug.  Who as ASMedia is working on this, have they
> posted anything to the linux-usb mailing list about it?

I think they are doing this internally. I’ll advice them to ask questions here 
if
they encounter any problem.

> Maybe.  Have you tried using usbmon to see what the data streams are for
> the two devices and where they have problems and diverge?  Is the dock
> device doing something different in response to something from the host
> that the "real" device does not do?

No I haven’t.
Not really sure how do debug network packets over USB. I’ll do some research
on the topic.

Kai-Heng


Re: [PATCH] r8152: disable rx checksum offload on Dell TB dock

2017-11-23 Thread Kai Heng Feng


> On 23 Nov 2017, at 5:24 PM, Greg KH  wrote:
> 
> On Thu, Nov 23, 2017 at 04:53:41PM +0800, Kai Heng Feng wrote:
>> 
>> What I want to do here is to finding this connection:
>> Realtek r8153 <-> SMSC hub (USD ID: 0424:5537) <-> 
>> ASMedia XHCI controller (PCI ID: 1b21:1142).
>> 
>> Is there a safer way to do this?
> 
> Nope!  You can't do that at all from within a USB driver, sorry.  As you
> really should not care at all :)

Got it :)

The r8153 in Dell TB dock has version information, RTL_VER_05.
We can use it to check for workaround, but many working RTL_VER_05 devices
will also be affected.
Do you think it’s an acceptable compromise?

>> I have a r8153 <-> USB 3.0 dongle which work just fine. I can’t find any 
>> information to differentiate them. Hence I want to use the connection to
>> identify if r8153 is on a Dell TB dock.
> 
> Are you sure there is nothing different in the version or release number
> of the device?  'lsusb -v' shows the exact same information for both
> devices?

Yes. I attached `lsusb -v` for r8153 on Dell TB dock, on a RJ45 <-> USB 3.0 
dongle,
and on a RJ45 <-> USB Type-C dongle.

>> Yes. From what I know, ASMedia is working on it, but not sure how long it
>> will take. In the meantime, I’d like to workaround this issue for the users.
> 
> Again, it's a host controller bug, it should be fixed there, don't try
> to paper over the real issue in different individual drivers.
> 
> I think I've seen various patches on the linux-usb list for this
> controller already, have you tried them?

Yes. These patches are all in mainline Linux now.

>> Actually no.
>> I just plugged r8153 dongle into the same hub, surprisingly the issue
>> doesn’t happen in this scenario.
> 
> Then something seems to be wrong with the device itself, as that would
> be the same exact electrical/logical path, right?

I have no idea why externally plugged one doesn’t have this issue.
Maybe it’s related how it’s wired inside the Dell TB dock...

Kai-Heng



lsusb-a
Description: Binary data


lsusb-c
Description: Binary data


lsusb-dock
Description: Binary data

> thanks,
> 
> greg k-h



Re: [PATCH] r8152: disable rx checksum offload on Dell TB dock

2017-11-23 Thread Kai Heng Feng

> On 23 Nov 2017, at 3:58 PM, Greg KH  wrote:
> 
> On Thu, Nov 23, 2017 at 01:38:38AM -0500, Kai-Heng Feng wrote:
>> r8153 on Dell TB dock corrupts rx packets.
>> 
>> The root cause is not found yet, but disabling rx checksumming can
>> workaround the issue. We can use this connection to decide if it's
>> a Dell TB dock:
>> Realtek r8153 <-> SMSC hub <-> ASMedia XHCI controller
>> 
>> BugLink: https://bugs.launchpad.net/bugs/1729674
>> Cc: Mario Limonciello 
>> Signed-off-by: Kai-Heng Feng 
>> ---
>> drivers/net/usb/r8152.c | 33 -
>> 1 file changed, 32 insertions(+), 1 deletion(-)
>> 
>> diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
>> index d51d9abf7986..58b80b5e7803 100644
>> --- a/drivers/net/usb/r8152.c
>> +++ b/drivers/net/usb/r8152.c
>> @@ -27,6 +27,8 @@
>> #include 
>> #include 
>> #include 
>> +#include 
>> +#include 
>> 
>> /* Information for net-next */
>> #define NETNEXT_VERSION  "09"
>> @@ -5135,6 +5137,35 @@ static u8 rtl_get_version(struct usb_interface *intf)
>>  return version;
>> }
>> 
>> +/* Ethernet on Dell TB 15/16 dock is connected this way:
>> + * Realtek r8153 <-> SMSC hub <-> ASMedia XHCI controller
>> + * We use this connection to make sure r8153 is on the Dell TB dock.
>> + */
>> +static bool check_dell_tb_dock(struct usb_device *udev)
>> +{
>> +struct usb_device *hub = udev->parent;
>> +struct usb_device *root_hub;
>> +struct pci_dev *controller;
>> +
>> +if (!hub)
>> +return false;
>> +
>> +if (!(le16_to_cpu(hub->descriptor.idVendor) == 0x0424 &&
>> +  le16_to_cpu(hub->descriptor.idProduct) == 0x5537))
>> +return false;
>> +
>> +root_hub = hub->parent;
>> +if (!root_hub || root_hub->parent)
>> +return false;
>> +
>> +controller = to_pci_dev(bus_to_hcd(root_hub->bus)->self.controller);
> 
> That's a very scary, and dangerous, cast.  You can not ever be sure that
> the hub really is a "root hub" like this.

What I want to do here is to finding this connection:
Realtek r8153 <-> SMSC hub (USD ID: 0424:5537) <-> 
ASMedia XHCI controller (PCI ID: 1b21:1142).

Is there a safer way to do this?

> 
>> +if (controller->vendor == 0x1b21 && controller->device == 0x1142)
>> +return true;
> 
> Why can't you just look at the USB device itself and go off of a quirk
> in it?  Something like a version or string or something else?

I have a r8153 <-> USB 3.0 dongle which work just fine. I can’t find any 
information to differentiate them. Hence I want to use the connection to
identify if r8153 is on a Dell TB dock.

> 
> This sounds like a USB host controller issue, not a USB device issue,
> can't we fix the "real" problem here instead of this crazy work-around?

Yes. From what I know, ASMedia is working on it, but not sure how long it
will take. In the meantime, I’d like to workaround this issue for the users.

> Odds are any device plugged into the hub should have the same issue,
> right?

Actually no.
I just plugged r8153 dongle into the same hub, surprisingly the issue
doesn’t happen in this scenario.

Kai-Heng

> 
> thanks,
> 
> greg k-h



[PATCH] r8152: disable rx checksum offload on Dell TB dock

2017-11-22 Thread Kai-Heng Feng
r8153 on Dell TB dock corrupts rx packets.

The root cause is not found yet, but disabling rx checksumming can
workaround the issue. We can use this connection to decide if it's
a Dell TB dock:
Realtek r8153 <-> SMSC hub <-> ASMedia XHCI controller

BugLink: https://bugs.launchpad.net/bugs/1729674
Cc: Mario Limonciello 
Signed-off-by: Kai-Heng Feng 
---
 drivers/net/usb/r8152.c | 33 -
 1 file changed, 32 insertions(+), 1 deletion(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index d51d9abf7986..58b80b5e7803 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -27,6 +27,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 /* Information for net-next */
 #define NETNEXT_VERSION"09"
@@ -5135,6 +5137,35 @@ static u8 rtl_get_version(struct usb_interface *intf)
return version;
 }
 
+/* Ethernet on Dell TB 15/16 dock is connected this way:
+ * Realtek r8153 <-> SMSC hub <-> ASMedia XHCI controller
+ * We use this connection to make sure r8153 is on the Dell TB dock.
+ */
+static bool check_dell_tb_dock(struct usb_device *udev)
+{
+   struct usb_device *hub = udev->parent;
+   struct usb_device *root_hub;
+   struct pci_dev *controller;
+
+   if (!hub)
+   return false;
+
+   if (!(le16_to_cpu(hub->descriptor.idVendor) == 0x0424 &&
+ le16_to_cpu(hub->descriptor.idProduct) == 0x5537))
+   return false;
+
+   root_hub = hub->parent;
+   if (!root_hub || root_hub->parent)
+   return false;
+
+   controller = to_pci_dev(bus_to_hcd(root_hub->bus)->self.controller);
+
+   if (controller->vendor == 0x1b21 && controller->device == 0x1142)
+   return true;
+
+   return false;
+}
+
 static int rtl8152_probe(struct usb_interface *intf,
 const struct usb_device_id *id)
 {
@@ -5202,7 +5233,7 @@ static int rtl8152_probe(struct usb_interface *intf,
NETIF_F_HIGHDMA | NETIF_F_FRAGLIST |
NETIF_F_IPV6_CSUM | NETIF_F_TSO6;
 
-   if (tp->version == RTL_VER_01) {
+   if (tp->version == RTL_VER_01 || check_dell_tb_dock(udev)) {
netdev->features &= ~NETIF_F_RXCSUM;
netdev->hw_features &= ~NETIF_F_RXCSUM;
}
-- 
2.14.1



Re: [PATCH 1/2] r8169: reinstate ALDPS for power saving

2017-11-15 Thread Kai Heng Feng


> On 15 Nov 2017, at 6:53 PM, David Miller  wrote:
> 
> From: Kai-Heng Feng 
> Date: Wed, 15 Nov 2017 04:00:18 -0500
> 
>> Commit ("r8169: enable ALDPS for power saving") caused a regression on
>> RTL8168evl/8111evl [1], so it got reverted.
>> 
>> Instead of reverting the whole commit, let's reinstate ALDPS for all
>> models other than RTL8168evl/8111evl.
>> 
>> [1] https://lkml.org/lkml/2013/1/5/36
>> 
>> Cc: Francois Romieu 
>> Cc: Hayes Wang 
>> Cc: Jörg Otte 
>> Signed-off-by: Kai-Heng Feng 
> 
> Sorry, this isn't going to work.
> 
> The problem is that we have no idea what chips this really
> works for.  All we know is that it definitely does not work
> for one particular chip variant.

Another guy tried to enable ASPM before [1], with extra knobs,
which is discouraged.
I want to do the same without the knobs, with the intention not to
re-introduce the regression to RTL8168evl/8111evl.

> The amount of coverage this change is going to get is very small as
> well, meaning an even greater change of regressions.
> 
> Therefore the only acceptable way to handle this is to have
> a white-list, specific chips that have been explicitly tested
> and are known to work with this feature, rather than the other
> way around.

I think a chip-specific whitelist will be quite hard. That requires *both*
Realtek and OEM/ODM to do the test. IIUC, the very same chip can be used
in different Laptops/Motherboards, regression may happen on a very specific 
combination.

A more plausible solution is model specific whitelist. In this case, 
extra knobs are desirable, so users can do the testing themselves before
adding new models to whitelist.

Currently users report that by enabling r8169’s ASPM on Dell Inspiron 7559,
power consumption can be drastically reduced.

> Furthermore, you're not even checking the chip version, you're
> checking instead whether the firmware is loaded or not.  That
> doesn't seem like a safe way to guard this at all.

Hopefully Hayes (or Realtek) can shed more lights on the issue. Apparently
ALDPS and ASPM for r8169 is enabled in different commercial products, just
not in Linux mainline.

Kai-Heng

[1] https://www.spinics.net/lists/netdev/msg403994.html




[PATCH 2/2] r8169: reinstate internal ASPM and clock request settings

2017-11-15 Thread Kai-Heng Feng
Commit ("r8169: enable internal ASPM and clock request settings") caused
a regression on RTL8168evl/8111evl [1], so it got reverted.

Instead of reverting the whole commit, let's reinstate ASPM for all
models other than RTL8168evl/8111evl.

[1] https://lkml.org/lkml/2013/1/5/36

Cc: Francois Romieu 
Cc: Hayes Wang 
Cc: Jörg Otte 
Signed-off-by: Kai-Heng Feng 
---
 drivers/net/ethernet/realtek/r8169.c | 28 ++--
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c 
b/drivers/net/ethernet/realtek/r8169.c
index 54bb9b15d541..6d6f3079320a 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -510,6 +510,7 @@ enum rtl8168_registers {
 #define PWM_EN (1 << 22)
 #define RXDV_GATED_EN  (1 << 19)
 #define EARLY_TALLY_EN (1 << 16)
+#define FORCE_CLK  (1 << 15) /* force clock request */
 };
 
 enum rtl_register_content {
@@ -5992,6 +5993,11 @@ static void rtl_hw_start_8168e_2(struct rtl8169_private 
*tp)
 
RTL_W8(DLLPR, RTL_R8(DLLPR) | PFM_EN);
RTL_W32(MISC, RTL_R32(MISC) | PWM_EN);
+
+   /* Don't enable ASPM for RTL8168evl/8111evl.
+* https://lkml.org/lkml/2013/1/5/36
+*/
+
RTL_W8(Config5, RTL_R8(Config5) & ~Spi_en);
 }
 
@@ -6017,13 +6023,12 @@ static void rtl_hw_start_8168f(struct rtl8169_private 
*tp)
 
RTL_W8(MaxTxPacketSize, EarlySize);
 
-   rtl_disable_clock_request(pdev);
-
RTL_W32(TxConfig, RTL_R32(TxConfig) | TXCFG_AUTO_FIFO);
RTL_W8(MCU, RTL_R8(MCU) & ~NOW_IS_OOB);
RTL_W8(DLLPR, RTL_R8(DLLPR) | PFM_EN);
-   RTL_W32(MISC, RTL_R32(MISC) | PWM_EN);
-   RTL_W8(Config5, RTL_R8(Config5) & ~Spi_en);
+   RTL_W32(MISC, RTL_R32(MISC) | PWM_EN | FORCE_CLK);
+   RTL_W8(Config5, (RTL_R8(Config5) & ~Spi_en) | ASPM_en);
+   RTL_W8(Config2, RTL_R8(Config2) | ClkReqEn);
 }
 
 static void rtl_hw_start_8168f_1(struct rtl8169_private *tp)
@@ -6083,8 +6088,10 @@ static void rtl_hw_start_8168g(struct rtl8169_private 
*tp)
rtl_w0w1_eri(tp, 0xdc, ERIAR_MASK_0001, 0x01, 0x00, ERIAR_EXGMAC);
rtl_eri_write(tp, 0x2f8, ERIAR_MASK_0011, 0x1d8f, ERIAR_EXGMAC);
 
-   RTL_W32(MISC, RTL_R32(MISC) & ~RXDV_GATED_EN);
+   RTL_W32(MISC, (RTL_R32(MISC) | FORCE_CLK) & ~RXDV_GATED_EN);
RTL_W8(MaxTxPacketSize, EarlySize);
+   RTL_W8(Config5, RTL_R8(Config5) | ASPM_en);
+   RTL_W8(Config2, RTL_R8(Config2) | ClkReqEn);
 
rtl_eri_write(tp, 0xc0, ERIAR_MASK_0011, 0x, ERIAR_EXGMAC);
rtl_eri_write(tp, 0xb8, ERIAR_MASK_0011, 0x, ERIAR_EXGMAC);
@@ -6594,6 +6601,9 @@ static void rtl_hw_start_8105e_1(struct rtl8169_private 
*tp)
 
RTL_W8(MCU, RTL_R8(MCU) | EN_NDP | EN_OOB_RESET);
RTL_W8(DLLPR, RTL_R8(DLLPR) | PFM_EN);
+   RTL_W8(Config5, RTL_R8(Config5) | ASPM_en);
+   RTL_W8(Config2, RTL_R8(Config2) | ClkReqEn);
+   RTL_W32(MISC, RTL_R32(MISC) | FORCE_CLK);
 
rtl_ephy_init(tp, e_info_8105e_1, ARRAY_SIZE(e_info_8105e_1));
 
@@ -6621,6 +6631,9 @@ static void rtl_hw_start_8402(struct rtl8169_private *tp)
 
RTL_W32(TxConfig, RTL_R32(TxConfig) | TXCFG_AUTO_FIFO);
RTL_W8(MCU, RTL_R8(MCU) & ~NOW_IS_OOB);
+   RTL_W8(Config5, RTL_R8(Config5) | ASPM_en);
+   RTL_W8(Config2, RTL_R8(Config2) | ClkReqEn);
+   RTL_W32(MISC, RTL_R32(MISC) | FORCE_CLK);
 
rtl_ephy_init(tp, e_info_8402, ARRAY_SIZE(e_info_8402));
 
@@ -6644,7 +6657,10 @@ static void rtl_hw_start_8106(struct rtl8169_private *tp)
/* Force LAN exit from ASPM if Rx/Tx are not idle */
RTL_W32(FuncEvent, RTL_R32(FuncEvent) | 0x002800);
 
-   RTL_W32(MISC, (RTL_R32(MISC) | DISABLE_LAN_EN) & ~EARLY_TALLY_EN);
+   RTL_W32(MISC,
+   (RTL_R32(MISC) | DISABLE_LAN_EN | FORCE_CLK) & ~EARLY_TALLY_EN);
+   RTL_W8(Config5, RTL_R8(Config5) | ASPM_en);
+   RTL_W8(Config2, RTL_R8(Config2) | ClkReqEn);
RTL_W8(MCU, RTL_R8(MCU) | EN_NDP | EN_OOB_RESET);
RTL_W8(DLLPR, RTL_R8(DLLPR) & ~PFM_EN);
 
-- 
2.14.1



[PATCH 1/2] r8169: reinstate ALDPS for power saving

2017-11-15 Thread Kai-Heng Feng
Commit ("r8169: enable ALDPS for power saving") caused a regression on
RTL8168evl/8111evl [1], so it got reverted.

Instead of reverting the whole commit, let's reinstate ALDPS for all
models other than RTL8168evl/8111evl.

[1] https://lkml.org/lkml/2013/1/5/36

Cc: Francois Romieu 
Cc: Hayes Wang 
Cc: Jörg Otte 
Signed-off-by: Kai-Heng Feng 
---
 drivers/net/ethernet/realtek/r8169.c | 58 +---
 1 file changed, 48 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c 
b/drivers/net/ethernet/realtek/r8169.c
index e03fcf914690..54bb9b15d541 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -733,6 +733,7 @@ enum features {
RTL_FEATURE_WOL = (1 << 0),
RTL_FEATURE_MSI = (1 << 1),
RTL_FEATURE_GMII= (1 << 2),
+   RTL_FEATURE_FW_LOADED   = (1 << 3),
 };
 
 struct rtl8169_counters {
@@ -2785,8 +2786,10 @@ static void rtl_apply_firmware(struct rtl8169_private 
*tp)
struct rtl_fw *rtl_fw = tp->rtl_fw;
 
/* TODO: release firmware once rtl_phy_write_fw signals failures. */
-   if (!IS_ERR_OR_NULL(rtl_fw))
+   if (!IS_ERR_OR_NULL(rtl_fw)) {
rtl_phy_write_fw(tp, rtl_fw);
+   tp->features |= RTL_FEATURE_FW_LOADED;
+   }
 }
 
 static void rtl_apply_firmware_cond(struct rtl8169_private *tp, u8 reg, u16 
val)
@@ -2797,6 +2800,31 @@ static void rtl_apply_firmware_cond(struct 
rtl8169_private *tp, u8 reg, u16 val)
rtl_apply_firmware(tp);
 }
 
+static void r810x_aldps_disable(struct rtl8169_private *tp)
+{
+   rtl_writephy(tp, 0x1f, 0x);
+   rtl_writephy(tp, 0x18, 0x0310);
+   msleep(100);
+}
+
+static void r810x_aldps_enable(struct rtl8169_private *tp)
+{
+   if (!(tp->features & RTL_FEATURE_FW_LOADED))
+   return;
+
+   rtl_writephy(tp, 0x1f, 0x);
+   rtl_writephy(tp, 0x18, 0x8310);
+}
+
+static void r8168_aldps_enable_1(struct rtl8169_private *tp)
+{
+   if (!(tp->features & RTL_FEATURE_FW_LOADED))
+   return;
+
+   rtl_writephy(tp, 0x1f, 0x);
+   rtl_w0w1_phy(tp, 0x15, 0x1000, 0x);
+}
+
 static void rtl8169s_hw_phy_config(struct rtl8169_private *tp)
 {
static const struct phy_reg phy_reg_init[] = {
@@ -3587,6 +3615,10 @@ static void rtl8168e_2_hw_phy_config(struct 
rtl8169_private *tp)
rtl_w0w1_phy(tp, 0x10, 0x, 0x0400);
rtl_writephy(tp, 0x1f, 0x);
 
+   /* Don't enable ALDPS for RTL8168evl/8111evl.
+* https://lkml.org/lkml/2013/1/5/36
+*/
+
/* Broken BIOS workaround: feed GigaMAC registers with MAC address. */
rtl_rar_exgmac_set(tp, tp->dev->dev_addr);
 }
@@ -3661,6 +3693,8 @@ static void rtl8168f_1_hw_phy_config(struct 
rtl8169_private *tp)
rtl_writephy(tp, 0x05, 0x8b85);
rtl_w0w1_phy(tp, 0x06, 0x4000, 0x);
rtl_writephy(tp, 0x1f, 0x);
+
+   r8168_aldps_enable_1(tp);
 }
 
 static void rtl8168f_2_hw_phy_config(struct rtl8169_private *tp)
@@ -3668,6 +3702,8 @@ static void rtl8168f_2_hw_phy_config(struct 
rtl8169_private *tp)
rtl_apply_firmware(tp);
 
rtl8168f_hw_phy_config(tp);
+
+   r8168_aldps_enable_1(tp);
 }
 
 static void rtl8411_hw_phy_config(struct rtl8169_private *tp)
@@ -4019,6 +4055,8 @@ static void rtl8168h_2_hw_phy_config(struct 
rtl8169_private *tp)
rtl_w0w1_phy(tp, 0x10, 0x, 0x0004);
 
rtl_writephy(tp, 0x1f, 0x);
+
+   r8168_aldps_enable_1(tp);
 }
 
 static void rtl8168ep_1_hw_phy_config(struct rtl8169_private *tp)
@@ -4188,21 +4226,19 @@ static void rtl8105e_hw_phy_config(struct 
rtl8169_private *tp)
};
 
/* Disable ALDPS before ram code */
-   rtl_writephy(tp, 0x1f, 0x);
-   rtl_writephy(tp, 0x18, 0x0310);
-   msleep(100);
+   r810x_aldps_disable(tp);
 
rtl_apply_firmware(tp);
 
rtl_writephy_batch(tp, phy_reg_init, ARRAY_SIZE(phy_reg_init));
+
+   r810x_aldps_enable(tp);
 }
 
 static void rtl8402_hw_phy_config(struct rtl8169_private *tp)
 {
/* Disable ALDPS before setting firmware */
-   rtl_writephy(tp, 0x1f, 0x);
-   rtl_writephy(tp, 0x18, 0x0310);
-   msleep(20);
+   r810x_aldps_disable(tp);
 
rtl_apply_firmware(tp);
 
@@ -4212,6 +4248,8 @@ static void rtl8402_hw_phy_config(struct rtl8169_private 
*tp)
rtl_writephy(tp, 0x10, 0x401f);
rtl_writephy(tp, 0x19, 0x7030);
rtl_writephy(tp, 0x1f, 0x);
+
+   r810x_aldps_enable(tp);
 }
 
 static void rtl8106e_hw_phy_config(struct rtl8169_private *tp)
@@ -4224,9 +4262,7 @@ static void rtl8106e_hw_phy_config(struct rtl8169_private 
*tp)
};
 
/* Disable ALDPS before ram code */
-   rtl_writephy(tp, 0x1f, 0x);
-   rtl_writephy(tp, 0x18, 0x0310);
-   msleep(100);
+   r810x_aldps_disable

Re: [PATCH] usbnet: prevent device rpm suspend in usbnet_probe function

2016-11-16 Thread Kai-Heng Feng
On Mon, Nov 14, 2016 at 3:34 PM, Kai-Heng Feng
 wrote:
> On Fri, Nov 11, 2016 at 10:44 PM, Mathias Nyman
>  wrote:
>> On 10.11.2016 13:22, Oliver Neukum wrote:
>>>
>>> On Thu, 2016-11-10 at 12:09 +0100, Bjørn Mork wrote:
>>>>
>>>> Kai-Heng Feng  writes:
>>>>>
>>>>> On Wed, Nov 9, 2016 at 8:32 PM, Bjørn Mork  wrote:
>>>>>>
>>>>>> Oliver Neukum  writes:
>>>>>>
>>>>>>> On Tue, 2016-11-08 at 13:44 -0500, Alan Stern wrote:
>>>>>>>
>>>>>>>> These problems could very well be caused by running at SuperSpeed
>>>>>>>> (USB-3) instead of high speed (USB-2).
>>>>>
>>>>>
>>>>> Yes, it's running at SuperSpeed, on a Kabylake laptop.
>>>>>
>>>>> It does not have this issue on a Broadwell laptop, also running at
>>>>> SuperSpeed.
>>>>
>>>>
>>>> Then I must join Oliver, being very surprised by where in the stack you
>>>> attempt to fix the issue.  What you write above indicates a problem in
>>>> pci bridge or usb host controller, doesn't it?
>
> Yes, I was totally wrong about that.
>
>>>
>>>
>>> Indeed. And this means we need an XHCI specialist.
>>> Mathias, we have a failure specific to one implementation of XHCI.
>>>
>>
>>
>> Could be related to resume singnalling time.
>> Does the xhci fix for it in 4.9-rc3 help?
>>
>> commit 7d3b016a6f5a0fa610dfd02b05654c08fa4ae514
>> xhci: use default USB_RESUME_TIMEOUT when resuming ports.
>>
>> It doesn't directly explain why it would work on Broadwell but not Kabylake,
>> but it resolved very similar cases.
>>
>> If not, then adding dynamic debug for xhci could show something.
>
> I tried the latest commit, 6005a545cadb2adca64350c7aee17d002563e8c7,
> on for-usb-next branch.
>
> Now the cdc_mbim still probe failed at the first time, but somehow it
> re-probed again with a success.
>
> I reverted commit 7d3b016a6f5a0fa610dfd02b05654c08fa4ae514 and the
> behavior is the same, first time probed failed, second time probed
> success.
>
> The attached dmesg is with usbcore and xhci_hcd dynamic debug enabled.

I filed a bug report on bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=187861

>
>>
>> -Mathias
>>


Re: [PATCH] usbnet: prevent device rpm suspend in usbnet_probe function

2016-11-09 Thread Kai-Heng Feng
Hi,

On Wed, Nov 9, 2016 at 8:32 PM, Bjørn Mork  wrote:
> Oliver Neukum  writes:
>
>> On Tue, 2016-11-08 at 13:44 -0500, Alan Stern wrote:
>>
>>> These problems could very well be caused by running at SuperSpeed
>>> (USB-3) instead of high speed (USB-2).

Yes, it's running at SuperSpeed, on a Kabylake laptop.

It does not have this issue on a Broadwell laptop, also running at SuperSpeed.

>>>
>>> Is there any way to test what happens when the device is attached to
>>> the computer by a USB-2 cable?  That would prevent it from operating at
>>> SuperSpeed.

I recall old Intel PCH can change the USB host from XHCI to EHCI,
newer PCH does not have this option.

Is there a way to force XHCI run at HighSpeed?

>>>
>>> The main point, however, is that the proposed patch doesn't seem to
>>> address the true problem, which is that the device gets suspended
>>> between probes.  The patch only tries to prevent it from being
>>> suspended during a probe -- which is already prevented by the USB core.
>>
>> But why doesn't it fail during normal operation?
>>
>> I suspect that its firmware requires the altsetting
>>
>> /* should we change control altsetting on a NCM/MBIM function? */
>> if (cdc_ncm_select_altsetting(intf) == CDC_NCM_COMM_ALTSETTING_MBIM) 
>> {
>> data_altsetting = CDC_NCM_DATA_ALTSETTING_MBIM;
>> ret = cdc_mbim_set_ctrlalt(dev, intf, 
>> CDC_NCM_COMM_ALTSETTING_MBIM);
>>
>> to be set before it accepts a suspension.
>
> Could be, but I don't think so.  The above code is effectively a noop
> unless the function is a combined NCM/MBIM function.  Something I've
> never seen on a Sierra Wireless device (ignoring the infamous EM7345,
> which really is an Intel device).
>
> This is a typical example of a Sierra Wireless modem configured for
> MBIM:
>
> P:  Vendor=1199 ProdID=9079 Rev= 0.06
> S:  Manufacturer=Sierra Wireless, Incorporated
> S:  Product=Sierra Wireless EM7455 Qualcomm Snapdragon X7 LTE-A
> S:  SerialNumber=LF615126xxx
> C:* #Ifs= 2 Cfg#= 1 Atr=a0 MxPwr=500mA
> A:  FirstIf#=12 IfCount= 2 Cls=02(comm.) Sub=0e Prot=00
> I:* If#=12 Alt= 0 #EPs= 1 Cls=02(comm.) Sub=0e Prot=00 Driver=(none)
> E:  Ad=82(I) Atr=03(Int.) MxPS=  64 Ivl=32ms
> I:* If#=13 Alt= 0 #EPs= 0 Cls=0a(data ) Sub=00 Prot=02 Driver=(none)
> I:  If#=13 Alt= 1 #EPs= 2 Cls=0a(data ) Sub=00 Prot=02 Driver=(none)
> E:  Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
> E:  Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
>
>
> The control interface of plain MBIM functions will always have a single
> altsetting, like the example above. So cdc_ncm_select_altsetting(intf)
> returns "0", while CDC_NCM_COMM_ALTSETTING_MBIM is "1".
>
>
> Just for reference, using the Intel^H^H^H^H^HEM7345 as example, this is
> what a combined NCM/MBIM function looks like:
>
>
> P:  Vendor=1199 ProdID=a001 Rev=17.29
> S:  Manufacturer=Sierra Wireless Inc.
> S:  Product=Sierra Wireless EM7345 4G LTE
> S:  SerialNumber=013937000xx
> C:* #Ifs= 4 Cfg#= 1 Atr=e0 MxPwr=100mA
> A:  FirstIf#= 0 IfCount= 2 Cls=02(comm.) Sub=0d Prot=00
> A:  FirstIf#= 2 IfCount= 2 Cls=02(comm.) Sub=02 Prot=01
> I:  If#= 0 Alt= 0 #EPs= 1 Cls=02(comm.) Sub=0d Prot=00 Driver=cdc_mbim
> E:  Ad=81(I) Atr=03(Int.) MxPS=  64 Ivl=1ms
> I:* If#= 0 Alt= 1 #EPs= 1 Cls=02(comm.) Sub=0e Prot=00 Driver=cdc_mbim
> E:  Ad=81(I) Atr=03(Int.) MxPS=  64 Ivl=1ms
> I:  If#= 1 Alt= 0 #EPs= 0 Cls=0a(data ) Sub=00 Prot=01 Driver=cdc_mbim
> I:  If#= 1 Alt= 1 #EPs= 2 Cls=0a(data ) Sub=00 Prot=01 Driver=cdc_mbim
> E:  Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
> E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
> I:* If#= 1 Alt= 2 #EPs= 2 Cls=0a(data ) Sub=00 Prot=02 Driver=cdc_mbim
> E:  Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
> E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
> I:* If#= 2 Alt= 0 #EPs= 1 Cls=02(comm.) Sub=02 Prot=01 Driver=(none)
> E:  Ad=83(I) Atr=03(Int.) MxPS=  64 Ivl=1ms
> I:* If#= 3 Alt= 0 #EPs= 2 Cls=0a(data ) Sub=00 Prot=00 Driver=(none)
> E:  Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
> E:  Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
>
>
> And this is what the code you quote is trying to deal with.  Note the
> different subclass of altsetting 0 and 1 This is incredibly ugly.
>
> FWIW, the modem in question cannot be an EM7345. That modem does not
> have the static interface numbering oddity.  Another sign that it isn't
> a true Sierra device.

Yes, this modem is an EM7445.

>
>
>
>
> Bjørn


Re: [PATCH] usbnet: prevent device rpm suspend in usbnet_probe function

2016-11-07 Thread Kai-Heng Feng
Hi,

On Mon, Nov 7, 2016 at 7:02 PM, Oliver Neukum  wrote:
> On Fri, 2016-11-04 at 17:57 +0800, Kai-Heng Feng wrote:
>> Sometimes cdc_mbim failed to probe if runtime pm is enabled:
>> [9.305626] cdc_mbim: probe of 2-2:1.12 failed with error -22
>>
>> This can be solved by increase its pm usage counter.
>>
>> Signed-off-by: Kai-Heng Feng 
>
> For the record:
>
> NAK. This fixes a symptom. If this patch helps something is broken in
> device core. We need to find that.
>

Please check attached dmesg with usbcore.dyndbg="+p".

Thanks!

> Regards
> Oliver
>
>


dmesg
Description: Binary data


[PATCH] usbnet: prevent device rpm suspend in usbnet_probe function

2016-11-04 Thread Kai-Heng Feng
Sometimes cdc_mbim failed to probe if runtime pm is enabled:
[9.305626] cdc_mbim: probe of 2-2:1.12 failed with error -22

This can be solved by increase its pm usage counter.

Signed-off-by: Kai-Heng Feng 
---
 drivers/net/usb/usbnet.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index d5071e3..f77b4bf 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -1674,12 +1674,15 @@ usbnet_probe (struct usb_interface *udev, const struct 
usb_device_id *prod)
net->watchdog_timeo = TX_TIMEOUT_JIFFIES;
net->ethtool_ops = &usbnet_ethtool_ops;
 
+   if (usb_autopm_get_interface(dev->intf) < 0)
+   goto out1;
+
// allow device-specific bind/init procedures
// NOTE net->name still not usable ...
if (info->bind) {
status = info->bind (dev, udev);
if (status < 0)
-   goto out1;
+   goto out2;
 
// heuristic:  "usb%d" for links we know are two-host,
// else "eth%d" when there's reasonable doubt.  userspace
@@ -1772,6 +1775,8 @@ usbnet_probe (struct usb_interface *udev, const struct 
usb_device_id *prod)
 out3:
if (info->unbind)
info->unbind (dev, udev);
+out2:
+   usb_autopm_put_interface(dev->intf);
 out1:
/* subdrivers must undo all they did in bind() if they
 * fail it, but we may fail later and a deferred kevent
-- 
2.7.4