Re: [PATCH net] r8169: fix mac address change
On Jul 2 22:49, Heiner Kallweit wrote: > Network core refuses to change mac address because flag > IFF_LIVE_ADDR_CHANGE isn't set. Set this missing flag. > > Fixes: 1f7aa2bc268e ("r8169: simplify rtl_set_mac_address") > Reported-by: Corinna Vinschen > Signed-off-by: Heiner Kallweit > --- > drivers/net/ethernet/realtek/r8169.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/net/ethernet/realtek/r8169.c > b/drivers/net/ethernet/realtek/r8169.c > index f80ac894..a390db27 100644 > --- a/drivers/net/ethernet/realtek/r8169.c > +++ b/drivers/net/ethernet/realtek/r8169.c > @@ -7607,6 +7607,7 @@ static int rtl_init_one(struct pci_dev *pdev, const > struct pci_device_id *ent) > NETIF_F_HW_VLAN_CTAG_RX; > dev->vlan_features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_TSO | > NETIF_F_HIGHDMA; > + dev->priv_flags |= IFF_LIVE_ADDR_CHANGE; > > tp->cp_cmd |= RxChkSum | RxVlan; > > -- > 2.18.0 Tested-by: Corinna Vinschen This patch allows to change the MAC any time, just as before 1f7aa2bc268e, and thus also fixes the problem reported in https://www.spinics.net/lists/netdev/msg510926.html Thanks, Corinna
Regression introduced by "r8169: simplify rtl_set_mac_address"
Hi, the patch 1f7aa2bc268e, "r8169: simplify rtl_set_mac_address", introduced a regression found by trying to team a r8169 NIC. Try the following (assuming the r8169 NIC is eth0): $ nmcli con add type team con-name team0 ifname nm-team config \ '{"runner": {"name": "lacp"}, "link_watch": {"name": "ethtool"}}' \ ipv4.method disable ipv6.method ignore $ nmcli con add type ethernet ifname eth0 con-name team0.0 master nm-team $ nmcli con up team0.0 $ teamdctl nm-team port present eth0 command call failed (No such device) Bisecting turned up commit 1f7aa2bc268e, "r8169: simplify rtl_set_mac_address" as the culprit. Reverting this patch fixes the issue and the teamdctl call succeeds. The reason is apparently the usage of eth_mac_addr here. eth_mac_addr calls eth_prepare_mac_addr_change which checks for IFF_LIVE_ADDR_CHANGE. Debugging shows this flag not being set on r8169, thus eth_prepare_mac_addr_change returns -EBUSY (no idea why userspace claims "No such device", rather than "Device or resource busy", but that's not the point here). Note that other devices like igb, don't call eth_mac_addr either, but rather call memcpy by themselves to copy the new MAC, just as the original r8169 code did, too. Consequentially this problem is not present on igb. I suggest to revert this change in the first place, but I wonder if we're not just missing to set IFF_LIVE_ADDR_CHANGE in a lot of drivers. Thanks, Corinna
Re: [Intel-wired-lan] [PATCH] igb: add VF trust infrastructure
On Jan 24 03:41, Brown, Aaron F wrote: > > From: Intel-wired-lan [mailto:intel-wired-lan-boun...@osuosl.org] On > > Behalf Of Alexander Duyck > > Sent: Monday, January 22, 2018 2:58 PM > > To: intel-wired-lan <intel-wired-...@lists.osuosl.org>; Netdev > > <netdev@vger.kernel.org> > > Cc: Corinna Vinschen <vinsc...@redhat.com> > > Subject: Re: [Intel-wired-lan] [PATCH] igb: add VF trust infrastructure > > > > Hi Corinna, > > > > I've looked over the patch and didn't see any issues. My understanding > > is that Jeff has pulled it into his tree and it should be going > > through testing shortly. > > Indeed it is / was ;) Thanks guys! Corinna > > > > > Thanks. > > > > - Alex > > > > On Mon, Jan 22, 2018 at 12:54 AM, Corinna Vinschen > > <vinsc...@redhat.com> wrote: > > > Hi, > > > > > > Could somebody please review this patch? > > > > > > > > > Thanks, > > > Corinna > > > > > > > > > On Jan 17 11:53, Corinna Vinschen wrote: > > >> * Add a per-VF value to know if a VF is trusted, by default don't > > >> trust VFs. > > >> > > >> * Implement netdev op to trust VFs (igb_ndo_set_vf_trust) and add > > >> trust status to ndo_get_vf_config output. > > >> > > >> * Allow a trusted VF to change MAC and MAC filters even if MAC > > >> has been administratively set. > > >> > > >> Signed-off-by: Corinna Vinschen <vinsc...@redhat.com> > > >> --- > > >> drivers/net/ethernet/intel/igb/igb.h | 1 + > > >> drivers/net/ethernet/intel/igb/igb_main.c | 30 > > +++--- > > >> 2 files changed, 28 insertions(+), 3 deletions(-) > > Tested-by: Aaron Brown <aaron.f.br...@intel.com> > ___ > Intel-wired-lan mailing list > intel-wired-...@osuosl.org > https://lists.osuosl.org/mailman/listinfo/intel-wired-lan
Re: [PATCH] igb: add VF trust infrastructure
Hi, Could somebody please review this patch? Thanks, Corinna On Jan 17 11:53, Corinna Vinschen wrote: > * Add a per-VF value to know if a VF is trusted, by default don't > trust VFs. > > * Implement netdev op to trust VFs (igb_ndo_set_vf_trust) and add > trust status to ndo_get_vf_config output. > > * Allow a trusted VF to change MAC and MAC filters even if MAC > has been administratively set. > > Signed-off-by: Corinna Vinschen <vinsc...@redhat.com> > --- > drivers/net/ethernet/intel/igb/igb.h | 1 + > drivers/net/ethernet/intel/igb/igb_main.c | 30 +++--- > 2 files changed, 28 insertions(+), 3 deletions(-) > > diff --git a/drivers/net/ethernet/intel/igb/igb.h > b/drivers/net/ethernet/intel/igb/igb.h > index 1c6b8d9176a8..55d6f17d5799 100644 > --- a/drivers/net/ethernet/intel/igb/igb.h > +++ b/drivers/net/ethernet/intel/igb/igb.h > @@ -109,6 +109,7 @@ struct vf_data_storage { > u16 pf_qos; > u16 tx_rate; > bool spoofchk_enabled; > + bool trusted; > }; > > /* Number of unicast MAC filters reserved for the PF in the RAR registers */ > diff --git a/drivers/net/ethernet/intel/igb/igb_main.c > b/drivers/net/ethernet/intel/igb/igb_main.c > index 749fb1f720b4..5cbec6cf2b2a 100644 > --- a/drivers/net/ethernet/intel/igb/igb_main.c > +++ b/drivers/net/ethernet/intel/igb/igb_main.c > @@ -190,6 +190,8 @@ static int igb_ndo_set_vf_vlan(struct net_device *netdev, > static int igb_ndo_set_vf_bw(struct net_device *, int, int, int); > static int igb_ndo_set_vf_spoofchk(struct net_device *netdev, int vf, > bool setting); > +static int igb_ndo_set_vf_trust(struct net_device *netdev, int vf, > + bool setting); > static int igb_ndo_get_vf_config(struct net_device *netdev, int vf, >struct ifla_vf_info *ivi); > static void igb_check_vf_rate_limit(struct igb_adapter *); > @@ -2527,6 +2529,7 @@ static const struct net_device_ops igb_netdev_ops = { > .ndo_set_vf_vlan= igb_ndo_set_vf_vlan, > .ndo_set_vf_rate= igb_ndo_set_vf_bw, > .ndo_set_vf_spoofchk= igb_ndo_set_vf_spoofchk, > + .ndo_set_vf_trust = igb_ndo_set_vf_trust, > .ndo_get_vf_config = igb_ndo_get_vf_config, > #ifdef CONFIG_NET_POLL_CONTROLLER > .ndo_poll_controller= igb_netpoll, > @@ -6383,6 +6386,9 @@ static int igb_vf_configure(struct igb_adapter > *adapter, int vf) > /* By default spoof check is enabled for all VFs */ > adapter->vf_data[vf].spoofchk_enabled = true; > > + /* By default VFs are not trusted */ > + adapter->vf_data[vf].trusted = false; > + > return 0; > } > > @@ -6940,13 +6946,13 @@ static int igb_set_vf_mac_filter(struct igb_adapter > *adapter, const int vf, > } > break; > case E1000_VF_MAC_FILTER_ADD: > - if (vf_data->flags & IGB_VF_FLAG_PF_SET_MAC) { > + if ((vf_data->flags & IGB_VF_FLAG_PF_SET_MAC) && > + !vf_data->trusted) { > dev_warn(>dev, >"VF %d requested MAC filter but is > administratively denied\n", >vf); > return -EINVAL; > } > - > if (!is_valid_ether_addr(addr)) { > dev_warn(>dev, >"VF %d attempted to set invalid MAC filter\n", > @@ -6998,7 +7004,8 @@ static int igb_set_vf_mac_addr(struct igb_adapter > *adapter, u32 *msg, int vf) > int ret = 0; > > if (!info) { > - if (vf_data->flags & IGB_VF_FLAG_PF_SET_MAC) { > + if ((vf_data->flags & IGB_VF_FLAG_PF_SET_MAC) && > + !vf_data->trusted) { > dev_warn(>dev, >"VF %d attempted to override administratively > set MAC address\nReload the VF driver to resume operations\n", >vf); > @@ -8934,6 +8941,22 @@ static int igb_ndo_set_vf_spoofchk(struct net_device > *netdev, int vf, > return 0; > } > > +static int igb_ndo_set_vf_trust(struct net_device *netdev, int vf, bool > setting) > +{ > + struct igb_adapter *adapter = netdev_priv(netdev); > + > + if (vf >= adapter->vfs_allocated_count) > + return -EINVAL; > + if (adapter->vf_data[vf].trusted == setting) > + return 0; > + > + adapter->vf_data[vf].trusted = setting; > + > + dev_info(>pdev->dev, &
[PATCH] igb: add VF trust infrastructure
* Add a per-VF value to know if a VF is trusted, by default don't trust VFs. * Implement netdev op to trust VFs (igb_ndo_set_vf_trust) and add trust status to ndo_get_vf_config output. * Allow a trusted VF to change MAC and MAC filters even if MAC has been administratively set. Signed-off-by: Corinna Vinschen <vinsc...@redhat.com> --- drivers/net/ethernet/intel/igb/igb.h | 1 + drivers/net/ethernet/intel/igb/igb_main.c | 30 +++--- 2 files changed, 28 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h index 1c6b8d9176a8..55d6f17d5799 100644 --- a/drivers/net/ethernet/intel/igb/igb.h +++ b/drivers/net/ethernet/intel/igb/igb.h @@ -109,6 +109,7 @@ struct vf_data_storage { u16 pf_qos; u16 tx_rate; bool spoofchk_enabled; + bool trusted; }; /* Number of unicast MAC filters reserved for the PF in the RAR registers */ diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index 749fb1f720b4..5cbec6cf2b2a 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -190,6 +190,8 @@ static int igb_ndo_set_vf_vlan(struct net_device *netdev, static int igb_ndo_set_vf_bw(struct net_device *, int, int, int); static int igb_ndo_set_vf_spoofchk(struct net_device *netdev, int vf, bool setting); +static int igb_ndo_set_vf_trust(struct net_device *netdev, int vf, + bool setting); static int igb_ndo_get_vf_config(struct net_device *netdev, int vf, struct ifla_vf_info *ivi); static void igb_check_vf_rate_limit(struct igb_adapter *); @@ -2527,6 +2529,7 @@ static const struct net_device_ops igb_netdev_ops = { .ndo_set_vf_vlan= igb_ndo_set_vf_vlan, .ndo_set_vf_rate= igb_ndo_set_vf_bw, .ndo_set_vf_spoofchk= igb_ndo_set_vf_spoofchk, + .ndo_set_vf_trust = igb_ndo_set_vf_trust, .ndo_get_vf_config = igb_ndo_get_vf_config, #ifdef CONFIG_NET_POLL_CONTROLLER .ndo_poll_controller= igb_netpoll, @@ -6383,6 +6386,9 @@ static int igb_vf_configure(struct igb_adapter *adapter, int vf) /* By default spoof check is enabled for all VFs */ adapter->vf_data[vf].spoofchk_enabled = true; + /* By default VFs are not trusted */ + adapter->vf_data[vf].trusted = false; + return 0; } @@ -6940,13 +6946,13 @@ static int igb_set_vf_mac_filter(struct igb_adapter *adapter, const int vf, } break; case E1000_VF_MAC_FILTER_ADD: - if (vf_data->flags & IGB_VF_FLAG_PF_SET_MAC) { + if ((vf_data->flags & IGB_VF_FLAG_PF_SET_MAC) && + !vf_data->trusted) { dev_warn(>dev, "VF %d requested MAC filter but is administratively denied\n", vf); return -EINVAL; } - if (!is_valid_ether_addr(addr)) { dev_warn(>dev, "VF %d attempted to set invalid MAC filter\n", @@ -6998,7 +7004,8 @@ static int igb_set_vf_mac_addr(struct igb_adapter *adapter, u32 *msg, int vf) int ret = 0; if (!info) { - if (vf_data->flags & IGB_VF_FLAG_PF_SET_MAC) { + if ((vf_data->flags & IGB_VF_FLAG_PF_SET_MAC) && + !vf_data->trusted) { dev_warn(>dev, "VF %d attempted to override administratively set MAC address\nReload the VF driver to resume operations\n", vf); @@ -8934,6 +8941,22 @@ static int igb_ndo_set_vf_spoofchk(struct net_device *netdev, int vf, return 0; } +static int igb_ndo_set_vf_trust(struct net_device *netdev, int vf, bool setting) +{ + struct igb_adapter *adapter = netdev_priv(netdev); + + if (vf >= adapter->vfs_allocated_count) + return -EINVAL; + if (adapter->vf_data[vf].trusted == setting) + return 0; + + adapter->vf_data[vf].trusted = setting; + + dev_info(>pdev->dev, "VF %u is %strusted\n", +vf, setting ? "" : "not "); + return 0; +} + static int igb_ndo_get_vf_config(struct net_device *netdev, int vf, struct ifla_vf_info *ivi) { @@ -8947,6 +8970,7 @@ static int igb_ndo_get_vf_config(struct net_device *netdev, ivi->vlan = adapter->vf_data[vf].pf_vlan; ivi->qos = adapter->vf_data[vf].pf_qos; ivi->spoofchk = adapter->vf_data[vf].spoofchk_enabled; + ivi->trusted = adapter->vf_data[vf].trusted; return 0; } -- 2.14.3
[PATCH v3] igb: Allow to remove administratively set MAC on VFs
Before libvirt modifies the MAC address and vlan tag for an SRIOV VF for use by a virtual machine (either using vfio device assignment or macvtap passthru mode), it saves the current MAC address and vlan tag so that it can reset them to their original value when the guest is done. Libvirt can't leave the VF MAC set to the value used by the now-defunct guest since it may be started again later using a different VF, but it certainly shouldn't just pick any random value, either. So it saves the state of everything prior to using the VF, and resets it to that. The igb driver initializes the MAC addresses of all VFs to 00:00:00:00:00:00, and reports that when asked (via an RTM_GETLINK netlink message, also visible in the list of VFs in the output of "ip link show"). But when libvirt attempts to restore the MAC address back to 00:00:00:00:00:00 (using an RTM_SETLINK netlink message) the kernel responds with "Invalid argument". Forbidding a reset back to the original value leaves the VF MAC at the value set for the now-defunct virtual machine. Especially on a system with NetworkManager enabled, this has very bad consequences, since NetworkManager forces all interfacess to be IFF_UP all the time - if the same virtual machine is restarted using a different VF (or even on a different host), there will be multiple interfaces watching for traffic with the same MAC address. To allow libvirt to revert to the original state, we need a way to remove the administrative set MAC on a VF, to allow normal host operation again, and to reset/overwrite the VF MAC via VF netdev. This patch implements the outlined scenario by allowing to set the VF MAC to 00:00:00:00:00:00 via RTM_SETLINK on the PF. igb_ndo_set_vf_mac resets the IGB_VF_FLAG_PF_SET_MAC flag to 0, so it's possible to reset the VF MAC back to the original value via the VF netdev. Note: Recent patches to libvirt allow for a workaround if the NIC isn't capable of resetting the administrative MAC back to all 0, but in theory the NIC should allow resetting the MAC in the first place. Signed-off-by: Corinna Vinschen <vinsc...@redhat.com> --- drivers/net/ethernet/intel/igb/igb_main.c | 42 +++ 1 file changed, 31 insertions(+), 11 deletions(-) diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index 1cf74aa..dbb4e3c 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -8358,7 +8358,8 @@ static void igb_rar_set_index(struct igb_adapter *adapter, u32 index) /* Indicate to hardware the Address is Valid. */ if (adapter->mac_table[index].state & IGB_MAC_STATE_IN_USE) { - rar_high |= E1000_RAH_AV; + if (is_valid_ether_addr(addr)) + rar_high |= E1000_RAH_AV; if (hw->mac.type == e1000_82575) rar_high |= E1000_RAH_POOL_1 * @@ -8396,17 +8397,36 @@ static int igb_set_vf_mac(struct igb_adapter *adapter, static int igb_ndo_set_vf_mac(struct net_device *netdev, int vf, u8 *mac) { struct igb_adapter *adapter = netdev_priv(netdev); - if (!is_valid_ether_addr(mac) || (vf >= adapter->vfs_allocated_count)) + + if (vf >= adapter->vfs_allocated_count) + return -EINVAL; + + /* Setting the VF MAC to 0 reverts the IGB_VF_FLAG_PF_SET_MAC +* flag and allows to overwrite the MAC via VF netdev. This +* is necessary to allow libvirt a way to restore the original +* MAC after unbinding vfio-pci and reloading igbvf after shutting +* down a VM. +*/ + if (is_zero_ether_addr(mac)) { + adapter->vf_data[vf].flags &= ~IGB_VF_FLAG_PF_SET_MAC; + dev_info(>pdev->dev, +"remove administratively set MAC on VF %d\n", +vf); + } else if (is_valid_ether_addr (mac)) { + adapter->vf_data[vf].flags |= IGB_VF_FLAG_PF_SET_MAC; + dev_info(>pdev->dev, "setting MAC %pM on VF %d\n", +mac, vf); + dev_info(>pdev->dev, +"Reload the VF driver to make this change effective."); + /* Generate additional warning if PF is down */ + if (test_bit(__IGB_DOWN, >state)) { + dev_warn(>pdev->dev, +"The VF MAC address has been set, but the PF device is not up.\n"); + dev_warn(>pdev->dev, +"Bring the PF device up before attempting to use the VF device.\n"); + } + } else { return -EINVAL; - adapter->vf_data[vf].flags |= IGB_VF_FLAG_PF_SET_MAC; - dev_
Re: [Intel-wired-lan] [PATCH v2] igb: Allow to remove administratively set MAC on VFs
On Apr 7 12:06, Jeff Kirsher wrote: > On Wed, 2017-04-05 at 15:46 +0200, Corinna Vinschen wrote: > > Before libvirt modifies the MAC address and vlan tag for an SRIOV > > VF > > for use by a virtual machine (either using vfio device assignment > > or > > macvtap passthru mode), it saves the current MAC address and vlan > > tag > > so that it can reset them to their original value when the guest is > > done. Libvirt can't leave the VF MAC set to the value used by the > > now-defunct guest since it may be started again later using a > > different VF, but it certainly shouldn't just pick any random > > value, > > either. So it saves the state of everything prior to using the VF, > > and > > resets it to that. > > > > The igb driver initializes the MAC addresses of all VFs to > > 00:00:00:00:00:00, and reports that when asked (via an RTM_GETLINK > > netlink message, also visible in the list of VFs in the output of > > "ip > > link show"). But when libvirt attempts to restore the MAC address > > back > > to 00:00:00:00:00:00 (using an RTM_SETLINK netlink message) the > > kernel > > responds with "Invalid argument". > > > > Forbidding a reset back to the original value leaves the VF MAC at > > the > > value set for the now-defunct virtual machine. Especially on a > > system > > with NetworkManager enabled, this has very bad consequences, since > > NetworkManager forces all interfacess to be IFF_UP all the time - > > if > > the same virtual machine is restarted using a different VF (or even > > on > > a different host), there will be multiple interfaces watching for > > traffic with the same MAC address. > > > > To allow libvirt to revert to the original state, we need a way to > > remove the administrative set MAC on a VF, to allow normal host > > operation again, and to reset/overwrite the VF MAC via VF netdev. > > > > This patch implements the outlined scenario by allowing to set the > > VF MAC to 00:00:00:00:00:00 via RTM_SETLINK on the PF. > > igb_ndo_set_vf_mac resets the IGB_VF_FLAG_PF_SET_MAC flag to 0, > > so it's possible to reset the VF MAC back to the original value via > > the VF netdev. > > > > Note: Recent patches to libvirt allow for a workaround if the NIC > > isn't capable of resetting the administrative MAC back to all 0, > > but > > in theory the NIC should allow resetting the MAC in the first > > place. > > > > Signed-off-by: Corinna Vinschen <vinsc...@redhat.com> > > --- > > drivers/net/ethernet/intel/igb/igb_main.c | 42 > > +++ > > 1 file changed, 31 insertions(+), 11 deletions(-) > > This patch does not apply (not even close). Please make sure to base > you patch off my dev-queue branch of my next-queue tree on kernel.org. Right, I applied the patch against net-next. Actually, only the first hunk didn't apply to dev-queue because of the change from igb_rar_set_qsel to igb_rar_set_index. I'll send a v3 agains dev-queue in a bit. Corinna signature.asc Description: PGP signature
[PATCH v2] igb: Allow to remove administratively set MAC on VFs
Before libvirt modifies the MAC address and vlan tag for an SRIOV VF for use by a virtual machine (either using vfio device assignment or macvtap passthru mode), it saves the current MAC address and vlan tag so that it can reset them to their original value when the guest is done. Libvirt can't leave the VF MAC set to the value used by the now-defunct guest since it may be started again later using a different VF, but it certainly shouldn't just pick any random value, either. So it saves the state of everything prior to using the VF, and resets it to that. The igb driver initializes the MAC addresses of all VFs to 00:00:00:00:00:00, and reports that when asked (via an RTM_GETLINK netlink message, also visible in the list of VFs in the output of "ip link show"). But when libvirt attempts to restore the MAC address back to 00:00:00:00:00:00 (using an RTM_SETLINK netlink message) the kernel responds with "Invalid argument". Forbidding a reset back to the original value leaves the VF MAC at the value set for the now-defunct virtual machine. Especially on a system with NetworkManager enabled, this has very bad consequences, since NetworkManager forces all interfacess to be IFF_UP all the time - if the same virtual machine is restarted using a different VF (or even on a different host), there will be multiple interfaces watching for traffic with the same MAC address. To allow libvirt to revert to the original state, we need a way to remove the administrative set MAC on a VF, to allow normal host operation again, and to reset/overwrite the VF MAC via VF netdev. This patch implements the outlined scenario by allowing to set the VF MAC to 00:00:00:00:00:00 via RTM_SETLINK on the PF. igb_ndo_set_vf_mac resets the IGB_VF_FLAG_PF_SET_MAC flag to 0, so it's possible to reset the VF MAC back to the original value via the VF netdev. Note: Recent patches to libvirt allow for a workaround if the NIC isn't capable of resetting the administrative MAC back to all 0, but in theory the NIC should allow resetting the MAC in the first place. Signed-off-by: Corinna Vinschen <vinsc...@redhat.com> --- drivers/net/ethernet/intel/igb/igb_main.c | 42 +++ 1 file changed, 31 insertions(+), 11 deletions(-) diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index 26a821f..c7673a2 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -8093,7 +8093,8 @@ static void igb_rar_set_qsel(struct igb_adapter *adapter, u8 *addr, u32 index, rar_high = le16_to_cpup((__le16 *)(addr + 4)); /* Indicate to hardware the Address is Valid. */ - rar_high |= E1000_RAH_AV; + if (is_valid_ether_addr(addr)) + rar_high |= E1000_RAH_AV; if (hw->mac.type == e1000_82575) rar_high |= E1000_RAH_POOL_1 * qsel; @@ -8125,17 +8126,36 @@ static int igb_set_vf_mac(struct igb_adapter *adapter, static int igb_ndo_set_vf_mac(struct net_device *netdev, int vf, u8 *mac) { struct igb_adapter *adapter = netdev_priv(netdev); - if (!is_valid_ether_addr(mac) || (vf >= adapter->vfs_allocated_count)) + + if (vf >= adapter->vfs_allocated_count) + return -EINVAL; + + /* Setting the VF MAC to 0 reverts the IGB_VF_FLAG_PF_SET_MAC +* flag and allows to overwrite the MAC via VF netdev. This +* is necessary to allow libvirt a way to restore the original +* MAC after unbinding vfio-pci and reloading igbvf after shutting +* down a VM. +*/ + if (is_zero_ether_addr(mac)) { + adapter->vf_data[vf].flags &= ~IGB_VF_FLAG_PF_SET_MAC; + dev_info(>pdev->dev, +"remove administratively set MAC on VF %d\n", +vf); + } else if (is_valid_ether_addr (mac)) { + adapter->vf_data[vf].flags |= IGB_VF_FLAG_PF_SET_MAC; + dev_info(>pdev->dev, "setting MAC %pM on VF %d\n", +mac, vf); + dev_info(>pdev->dev, +"Reload the VF driver to make this change effective."); + /* Generate additional warning if PF is down */ + if (test_bit(__IGB_DOWN, >state)) { + dev_warn(>pdev->dev, +"The VF MAC address has been set, but the PF device is not up.\n"); + dev_warn(>pdev->dev, +"Bring the PF device up before attempting to use the VF device.\n"); + } + } else { return -EINVAL; - adapter->vf_data[vf].flags |= IGB_VF_FLAG_PF_SET_MAC; - dev_info(>pdev->dev, "setting MAC %pM on VF %d
Re: [Intel-wired-lan] [PATCH] igb: Allow to remove administratively set MAC on VFs
Hi Alex, On Apr 4 10:33, Alexander Duyck wrote: > On Tue, Apr 4, 2017 at 10:16 AM, Duyck, Alexander H > <alexander.h.du...@intel.com> wrote: > >> -Original Message- > >> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@lists.osuosl.org] On > >> Behalf Of Corinna Vinschen > >> Sent: Tuesday, April 4, 2017 8:11 AM > >> To: intel-wired-...@lists.osuosl.org > >> Cc: netdev@vger.kernel.org > >> Subject: [Intel-wired-lan] [PATCH] igb: Allow to remove administratively > >> set MAC > >> on VFs > >> [...] > > So I just realized there is one other minor issue I just found. In > igb_rar_set_qsel you should probably add a check for > "is_valid_ether_addr(addr)" before you set the E1000_RAH_AV bit. For > the zeroed MAC address it should be cleared so that we aren't > filtering on a MAC address of all 0's for the VF. > > - Alex I see your point, but I'm a bit reluctant to do that because igb_vf_configure() calls igb_set_vf_mac() with addr set to the zeroed MAC explicitely: eth_zero_addr(mac_addr); igb_set_vf_mac(adapter, vf, mac_addr); So in this case the zero MAC is already treated as valid address and the E1000_RAH_AV bit is set. Is that just a bug? Corinna signature.asc Description: PGP signature
[PATCH] igb: Allow to remove administratively set MAC on VFs
Before libvirt modifies the MAC address and vlan tag for an SRIOV VF for use by a virtual machine (either using vfio device assignment or macvtap passthru mode), it saves the current MAC address and vlan tag so that it can reset them to their original value when the guest is done. Libvirt can't leave the VF MAC set to the value used by the now-defunct guest since it may be started again later using a different VF, but it certainly shouldn't just pick any random value, either. So it saves the state of everything prior to using the VF, and resets it to that. The igb driver initializes the MAC addresses of all VFs to 00:00:00:00:00:00, and reports that when asked (via an RTM_GETLINK netlink message, also visible in the list of VFs in the output of "ip link show"). But when libvirt attempts to restore the MAC address back to 00:00:00:00:00:00 (using an RTM_SETLINK netlink message) the kernel responds with "Invalid argument". Forbidding a reset back to the original value leaves the VF MAC at the value set for the now-defunct virtual machine. Especially on a system with NetworkManager enabled, this has very bad consequences, since NetworkManager forces all interfacess to be IFF_UP all the time - if the same virtual machine is restarted using a different VF (or even on a different host), there will be multiple interfaces watching for traffic with the same MAC address. To allow libvirt to revert to the original state, we need a way to remove the administrative set MAC on a VF, to allow normal host operation again, and to reset/overwrite the VF MAC via VF netdev. This patch implements the outlined scenario by allowing to set the VF MAC to 00:00:00:00:00:00 via RTM_SETLINK on the PF. igb_ndo_set_vf_mac resets the IGB_VF_FLAG_PF_SET_MAC flag to 0, so it's possible to reset the VF MAC back to the original value via the VF netdev. Note: Recent patches to libvirt allow for a workaround if the NIC isn't capable of resetting the administrative MAC back to all 0, but in theory the NIC should allow resetting the MAC in the fisr place. Signed-off-by: Corinna Vinschen <vinsc...@redhat.com> --- drivers/net/ethernet/intel/igb/igb_main.c | 25 - 1 file changed, 20 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index 26a821f..e7a61b1 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -8125,12 +8125,27 @@ static int igb_set_vf_mac(struct igb_adapter *adapter, static int igb_ndo_set_vf_mac(struct net_device *netdev, int vf, u8 *mac) { struct igb_adapter *adapter = netdev_priv(netdev); - if (!is_valid_ether_addr(mac) || (vf >= adapter->vfs_allocated_count)) + + if (vf >= adapter->vfs_allocated_count) + return -EINVAL; + /* Setting the VF MAC to 0 reverts the IGB_VF_FLAG_PF_SET_MAC + flag and allows to overwrite the MAC via VF netdev. This + is necessary to allow libvirt a way to restore the original + MAC after unbinding vfio-pci and reloading igbvf after shutting + down a VM. */ + if (is_zero_ether_addr(mac)) { + adapter->vf_data[vf].flags &= ~IGB_VF_FLAG_PF_SET_MAC; + dev_info(>pdev->dev, +"remove administratively set MAC on VF %d\n", +vf); + } else if (is_valid_ether_addr (mac)) { + adapter->vf_data[vf].flags |= IGB_VF_FLAG_PF_SET_MAC; + dev_info(>pdev->dev, "setting MAC %pM on VF %d\n", +mac, vf); + dev_info(>pdev->dev, +"Reload the VF driver to make this change effective."); + } else return -EINVAL; - adapter->vf_data[vf].flags |= IGB_VF_FLAG_PF_SET_MAC; - dev_info(>pdev->dev, "setting MAC %pM on VF %d\n", mac, vf); - dev_info(>pdev->dev, -"Reload the VF driver to make this change effective."); if (test_bit(__IGB_DOWN, >state)) { dev_warn(>pdev->dev, "The VF MAC address has been set, but the PF device is not up.\n"); -- 2.9.3
Re: [Intel-wired-lan] [PATCH] igb: use igb_adapter->io_addr instead of e1000_hw->hw_addr
On Nov 10 05:48, Hisashi T Fujinaka wrote: > On Thu, 10 Nov 2016, Corinna Vinschen wrote: > > On Nov 8 11:33, Alexander Duyck wrote: > ... > > > The question I would have is what is reading the device when it is in > > > this state. The watchdog and any other functions that would read the > > > device should be disabled. > > > > > > One possibility could be a race between a call to igb_close and the > > > igb_suspend function. We have seen some of those pop up recently on > > > ixgbe and it looks like igb has the same bug. We should probably be > > > using the rtnl_lock to guarantee that netif_device_detach and the call > > > to __igb_close are completed before igb_close could possibly be called > > > by the network stack. > > > > Do you have a pointer to the related ixgbe patch, by any chance? > ... > Here's the initial patch for igb I have, but it's on hold awaiting more > changes in ixgbe regarding AER. Thanks a lot! Corinna signature.asc Description: PGP signature
Re: [Intel-wired-lan] [PATCH] igb: use igb_adapter->io_addr instead of e1000_hw->hw_addr
On Nov 8 11:33, Alexander Duyck wrote: > On Tue, Nov 8, 2016 at 10:37 AM, Corinna Vinschen <vinsc...@redhat.com> wrote: > > On Nov 8 09:16, Hisashi T Fujinaka wrote: > >> On Tue, 8 Nov 2016, Corinna Vinschen wrote: > >> > On Nov 8 15:06, Cao jin wrote: > >> > > When running as guest, under certain condition, it will oops as > >> > > following. > >> > > writel() in igb_configure_tx_ring() results in oops, because > >> > > hw->hw_addr > >> > > is NULL. While other register access won't oops kernel because they use > >> > > wr32/rd32 which have a defense against NULL pointer. > >> > > [...] > >> > > >> > Incidentally we're just looking for a solution to that problem too. > >> > Do three patches to fix the same problem at rougly the same time already > >> > qualify as freak accident? > >> > > >> > FTR, I attached my current patch, which I was planning to submit after > >> > some external testing. > >> > > >> > However, all three patches have one thing in common: They workaround > >> > a somewhat dubious resetting of the hardware address to NULL in case > >> > reading from a register failed. > >> > > >> > That makes me wonder if setting the hardware address to NULL in > >> > rd32/igb_rd32 is really such a good idea. It's performed in a function > >> > which return value is *never* tested for validity in the calling > >> > functions and leads to subsequent crashes since no tests for hw_addr == > >> > NULL are performed. > >> > > >> > Maybe commit 22a8b2915 should be reconsidered? Isn't there some more > >> > graceful way to handle the "surprise removal"? > >> > >> Answering this from my home account because, well, work is Outlook. > >> > >> "Reconsidering" would be great. In fact, revert if if you'd like. I'm > >> uncertain that the surprise removal code actually works the way I > >> thought previously and I think I took a lot of it out of my local code. > >> > >> Unfortuantely I don't have any equipment that I can use to reproduce > >> surprise removal any longer so that means I wouldn't be able to test > >> anything. I have to defer to you or Cao Jin. > > > > I'm not too keen to rip out a PCIe NIC under power from my locale > > desktop machine, but I think an actual surprise removal is not the > > problem. > > > > As described in my git log entry, the error condition in igb_rd32 can be > > triggered during a suspend. The HW has been put into a sleep state but > > some register read requests are apparently not guarded against that > > situation. Reading a register in this state returns -1, thus a suspend > > is erroneously triggering the "surprise removal" sequence. > > The question I would have is what is reading the device when it is in > this state. The watchdog and any other functions that would read the > device should be disabled. > > One possibility could be a race between a call to igb_close and the > igb_suspend function. We have seen some of those pop up recently on > ixgbe and it looks like igb has the same bug. We should probably be > using the rtnl_lock to guarantee that netif_device_detach and the call > to __igb_close are completed before igb_close could possibly be called > by the network stack. Do you have a pointer to the related ixgbe patch, by any chance? > > Here's a raw idea: > > > > - Note that device is suspended in e1000_hw struct. Don't trigger > > error sequence in igb_rd32 if so (...and return a 0 value???) > > The thing is that a suspended device should not be accessed at all. > If we are accessing it while it is suspended then that is a bug. If > you could throw a WARN_ON call in igb_rd32 to capture where this is > being triggered that might be useful. > > > - Otherwise assume it's actually a surprise removal. In theory that > > should somehow trigger a device removal sequence, kind of like > > calling igb_remove, no? > > Well a read of the MMIO region while suspended is more of a surprise > read since there shouldn't be anything going on. We need to isolate > where that read is coming from and fix it. That would be ideal, but the problem couldn't be reproduced yet apart from at a customer's customer site. It's not clear yet if we can access the machine for further testing. Corinna signature.asc Description: PGP signature
Re: [Intel-wired-lan] [PATCH] igb: use igb_adapter->io_addr instead of e1000_hw->hw_addr
On Nov 8 09:16, Hisashi T Fujinaka wrote: > On Tue, 8 Nov 2016, Corinna Vinschen wrote: > > On Nov 8 15:06, Cao jin wrote: > > > When running as guest, under certain condition, it will oops as following. > > > writel() in igb_configure_tx_ring() results in oops, because hw->hw_addr > > > is NULL. While other register access won't oops kernel because they use > > > wr32/rd32 which have a defense against NULL pointer. > > > [...] > > > > Incidentally we're just looking for a solution to that problem too. > > Do three patches to fix the same problem at rougly the same time already > > qualify as freak accident? > > > > FTR, I attached my current patch, which I was planning to submit after > > some external testing. > > > > However, all three patches have one thing in common: They workaround > > a somewhat dubious resetting of the hardware address to NULL in case > > reading from a register failed. > > > > That makes me wonder if setting the hardware address to NULL in > > rd32/igb_rd32 is really such a good idea. It's performed in a function > > which return value is *never* tested for validity in the calling > > functions and leads to subsequent crashes since no tests for hw_addr == > > NULL are performed. > > > > Maybe commit 22a8b2915 should be reconsidered? Isn't there some more > > graceful way to handle the "surprise removal"? > > Answering this from my home account because, well, work is Outlook. > > "Reconsidering" would be great. In fact, revert if if you'd like. I'm > uncertain that the surprise removal code actually works the way I > thought previously and I think I took a lot of it out of my local code. > > Unfortuantely I don't have any equipment that I can use to reproduce > surprise removal any longer so that means I wouldn't be able to test > anything. I have to defer to you or Cao Jin. I'm not too keen to rip out a PCIe NIC under power from my locale desktop machine, but I think an actual surprise removal is not the problem. As described in my git log entry, the error condition in igb_rd32 can be triggered during a suspend. The HW has been put into a sleep state but some register read requests are apparently not guarded against that situation. Reading a register in this state returns -1, thus a suspend is erroneously triggering the "surprise removal" sequence. Here's a raw idea: - Note that device is suspended in e1000_hw struct. Don't trigger error sequence in igb_rd32 if so (...and return a 0 value???) - Otherwise assume it's actually a surprise removal. In theory that should somehow trigger a device removal sequence, kind of like calling igb_remove, no? Thanks, Corinna signature.asc Description: PGP signature
Re: [Intel-wired-lan] [PATCH] igb: use igb_adapter->io_addr instead of e1000_hw->hw_addr
On Nov 8 15:06, Cao jin wrote: > When running as guest, under certain condition, it will oops as following. > writel() in igb_configure_tx_ring() results in oops, because hw->hw_addr > is NULL. While other register access won't oops kernel because they use > wr32/rd32 which have a defense against NULL pointer. > > [ 141.225449] pcieport :00:1c.0: AER: Multiple Uncorrected (Fatal) > error received: id=0101 > [ 141.225523] igb :01:00.1: PCIe Bus Error: > severity=Uncorrected (Fatal), type=Unaccessible, > id=0101(Unregistered Agent ID) > [ 141.299442] igb :01:00.1: broadcast error_detected message > [ 141.300539] igb :01:00.0 enp1s0f0: PCIe link lost, device now > detached > [ 141.351019] igb :01:00.1 enp1s0f1: PCIe link lost, device now > detached > [ 143.465904] pcieport :00:1c.0: Root Port link has been reset > [ 143.465994] igb :01:00.1: broadcast slot_reset message > [ 143.466039] igb :01:00.0: enabling device ( -> 0002) > [ 144.389078] igb :01:00.1: enabling device ( -> 0002) > [ 145.312078] igb :01:00.1: broadcast resume message > [ 145.322211] BUG: unable to handle kernel paging request at > 3818 > [ 145.361275] IP: [] > igb_configure_tx_ring+0x14d/0x280 [igb] > [ 145.400048] PGD 0 > [ 145.438007] Oops: 0002 [#1] SMP > > A similiar issue & solution could be found at: > http://patchwork.ozlabs.org/patch/689592/ > > Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com> > --- > drivers/net/ethernet/intel/igb/igb_main.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/ethernet/intel/igb/igb_main.c > b/drivers/net/ethernet/intel/igb/igb_main.c > index edc9a6a..3f240ac 100644 > --- a/drivers/net/ethernet/intel/igb/igb_main.c > +++ b/drivers/net/ethernet/intel/igb/igb_main.c > @@ -3390,7 +3390,7 @@ void igb_configure_tx_ring(struct igb_adapter *adapter, >tdba & 0xULL); > wr32(E1000_TDBAH(reg_idx), tdba >> 32); > > - ring->tail = hw->hw_addr + E1000_TDT(reg_idx); > + ring->tail = adapter->io_addr + E1000_TDT(reg_idx); > wr32(E1000_TDH(reg_idx), 0); > writel(0, ring->tail); > > @@ -3729,7 +3729,7 @@ void igb_configure_rx_ring(struct igb_adapter *adapter, >ring->count * sizeof(union e1000_adv_rx_desc)); > > /* initialize head and tail */ > - ring->tail = hw->hw_addr + E1000_RDT(reg_idx); > + ring->tail = adapter->io_addr + E1000_RDT(reg_idx); > wr32(E1000_RDH(reg_idx), 0); > writel(0, ring->tail); > > -- > 2.1.0 Incidentally we're just looking for a solution to that problem too. Do three patches to fix the same problem at rougly the same time already qualify as freak accident? FTR, I attached my current patch, which I was planning to submit after some external testing. However, all three patches have one thing in common: They workaround a somewhat dubious resetting of the hardware address to NULL in case reading from a register failed. That makes me wonder if setting the hardware address to NULL in rd32/igb_rd32 is really such a good idea. It's performed in a function which return value is *never* tested for validity in the calling functions and leads to subsequent crashes since no tests for hw_addr == NULL are performed. Maybe commit 22a8b2915 should be reconsidered? Isn't there some more graceful way to handle the "surprise removal"? Thanks, Corinna From c6e80c04a7f20bdf3bde490ff842bcc1c800bf2a Mon Sep 17 00:00:00 2001 From: Corinna Vinschen <vinsc...@redhat.com> Date: Mon, 24 Oct 2016 16:44:55 +0200 Subject: [RHEL7.4 PATCH] igb_resume: Fix up hw_addr on resume A suspend hanging for too long can trigger spurious reads. The device is already suspended so igb_rd32 fails to read and sets hw->hw_addr to NULL. If we don't fix that here, subsequent code will fail to write to HW registers and igb crashes eventually in a writel call. Signed-off-by: Corinna Vinschen <vinsc...@redhat.com> --- drivers/net/ethernet/intel/igb/igb_main.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index 8e96c35..d1f9d34 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -7577,6 +7577,9 @@ static int igb_resume(struct device *dev) pci_enable_wake(pdev, PCI_D3hot, 0); pci_enable_wake(pdev, PCI_D3cold, 0); + if (E1000_REMOVED(hw->hw_addr) && adapter->io_addr) + hw->hw_addr = adapter->io_addr; + if (igb_init_interrupt_scheme(adapter, true)) { dev_err(>dev, "Unable to allocate memory for queues\n"); return -ENOMEM; -- 2.5.5 signature.asc Description: PGP signature
Re: [PATCH] igb: Fix VLAN tag stripping on Intel i350
Hi Jeff, On Jan 27 11:25, Jeff Kirsher wrote: > On Wed, 2016-01-27 at 14:28 +0100, Corinna Vinschen wrote: > > Problem: When switching off VLAN offloading on an i350, the VLAN > > interface gets unusable. For testing, set up a VLAN on an i350 > > and some remote machine, e.g.: > > [...] > I tried applying your patch to my tree for review and validation, but > due to patches already applied against the igb driver in my tree, your > patch does not apply cleanly. > > Can you please update your patch to apply cleanly to my tree? > git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue.git > dev-queue > > (next-queue tree and dev-queue branch) The attached patch is against the dev-queue branch of your next-queue tree. I retested again using an i350 as well as a 82580. On the i350 I tested additionally adding and removing VFs while offloading was switched on or off, and it worked as desired. I'm just a bit unsure about the correctness of the igb_set_vf_vlan_strip call from igb_vf_reset. I think it's the right thing to do since resetting the VF requires to reevaluate the STRVLAN flag. I'd feel better if you could double check, though. Thanks, Corinna From ab27415c3d3be2beb978316007f84fb54d12b619 Mon Sep 17 00:00:00 2001 From: Corinna Vinschen <vinsc...@redhat.com> Date: Thu, 28 Jan 2016 13:31:11 +0100 Subject: [PATCH] igb: Fix VLAN tag stripping on Intel i350 Problem: When switching off VLAN offloading on an i350, the VLAN interface gets unusable. For testing, set up a VLAN on an i350 and some remote machine, e.g.: $ ip link add link eth0 name eth0.42 type vlan id 42 $ ip addr add 192.168.42.1/24 dev eth0.42 $ ip link set dev eth0.42 up Offloading is switched on by default: $ ethtool -k eth0 | grep vlan-offload rx-vlan-offload: on tx-vlan-offload: on $ ping -c 3 -I eth0.42 192.168.42.2 [...works as usual...] Now switch off VLAN offloading and try again: $ ethtool -K eth0 rxvlan off Actual changes: rx-vlan-offload: off tx-vlan-offload: off [requested on] $ ping -c 3 -I eth0.42 192.168.42.2 PING 192.168.42.2 (192.168.42.2) from 192.168.42.1 eth0.42: 56(84) bytes of da ta. --- 192.168.42.2 ping statistics --- 3 packets transmitted, 0 received, 100% packet loss, time 1999ms I can only reproduce it on an i350, the above works fine on a 82580. While inspecting the igb source, I came across the code in igb_set_vmolr which sets the E1000_VMOLR_STRVLAN/E1000_DVMOLR_STRVLAN flags once and for all, and in all of the igb code there's no other place where the STRVLAN is set or cleared. Thus, VLAN stripping is enabled in igb unconditionally, independently of the offloading setting. I compared that to the latest Intel igb-5.3.3.5 driver from http://sourceforge.net/projects/e1000/ which in fact sets and clears the STRVLAN flag independently from igb_set_vmolr in its own function igb_set_vf_vlan_strip, depending on the vlan settings. So I included the STRVLAN handling from the igb-5.3.3.5 driver into our current igb driver and tested the above scenario again. This time ping still works after switching off VLAN offloading. Tested on i350, with and without addtional VFs, as well as on 82580 successfully. Signed-off-by: Corinna Vinschen <vinsc...@redhat.com> --- drivers/net/ethernet/intel/igb/igb_main.c | 41 --- 1 file changed, 32 insertions(+), 9 deletions(-) diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index 95d6042..c414e41 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -3699,6 +3699,28 @@ static inline int igb_set_vf_rlpml(struct igb_adapter *adapter, int size, return 0; } +static inline void igb_set_vf_vlan_strip(struct igb_adapter *adapter, +int vfn, bool enable) +{ + struct e1000_hw *hw = >hw; + u32 val, reg; + + if (hw->mac.type < e1000_82576) + return; + + if (hw->mac.type == e1000_i350) + reg = E1000_DVMOLR(vfn); + else + reg = E1000_VMOLR(vfn); + + val = rd32(reg); + if (enable) + val |= E1000_VMOLR_STRVLAN; + else + val &= ~(E1000_VMOLR_STRVLAN); + wr32(reg, val); +} + static inline void igb_set_vmolr(struct igb_adapter *adapter, int vfn, bool aupe) { @@ -3712,14 +3734,6 @@ static inline void igb_set_vmolr(struct igb_adapter *adapter, return; vmolr = rd32(E1000_VMOLR(vfn)); - vmolr |= E1000_VMOLR_STRVLAN; /* Strip vlan tags */ - if (hw->mac.type == e1000_i350) { - u32 dvmolr; - - dvmolr = rd32(E1000_DVMOLR(vfn)); - dvmolr |= E1000_DVMOLR_STRVLAN; - wr32(E1000_DVMOLR(vfn), dvmolr); - } if (aupe)
[PATCH] igb: Fix VLAN tag stripping on Intel i350
Problem: When switching off VLAN offloading on an i350, the VLAN interface gets unusable. For testing, set up a VLAN on an i350 and some remote machine, e.g.: $ ip link add link eth0 name eth0.42 type vlan id 42 $ ip addr add 192.168.42.1/24 dev eth0.42 $ ip link set dev eth0.42 up Offloading is switched on by default: $ ethtool -k eth0 | grep vlan-offload rx-vlan-offload: on tx-vlan-offload: on $ ping -c 3 -I eth0.42 192.168.42.2 [...works as usual...] Now switch off VLAN offloading and try again: $ ethtool -K eth0 rxvlan off Actual changes: rx-vlan-offload: off tx-vlan-offload: off [requested on] $ ping -c 3 -I eth0.42 192.168.42.2 PING 192.168.42.2 (192.168.42.2) from 192.168.42.1 eth0.42: 56(84) bytes of data. --- 192.168.42.2 ping statistics --- 3 packets transmitted, 0 received, 100% packet loss, time 1999ms I can only reproduce it on an i350, the above works fine on a 82580. While inspecting the igb source, I came across the code in igb_set_vmolr which sets the E1000_VMOLR_STRVLAN/E1000_DVMOLR_STRVLAN flags once and for all, and in all of the igb code there's no other place where the STRVLAN is set or cleared. Thus, VLAN stripping is enabled in igb unconditionally, independently of the offloading setting. I compared that to the latest Intel igb-5.3.3.5 driver from http://sourceforge.net/projects/e1000/ which in fact sets and clears the STRVLAN flag independently from igb_set_vmolr in its own function igb_set_vf_vlan_strip, depending on the vlan settings. So I included the STRVLAN handling from the igb-5.3.3.5 driver into our current igb driver and tested the above scenario again. This time ping still works after switching off VLAN offloading. Tested on i350, with and without addtional VFs, as well as on 82580 successfully. Signed-off-by: Corinna Vinschen <vinsc...@redhat.com> --- drivers/net/ethernet/intel/igb/igb_main.c | 39 --- 1 file changed, 31 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index 31e5f39..afd3990 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -3580,6 +3580,28 @@ static void igb_rlpml_set(struct igb_adapter *adapter) wr32(E1000_RLPML, max_frame_size); } +static inline void igb_set_vf_vlan_strip(struct igb_adapter *adapter, +int vfn, bool enable) +{ + struct e1000_hw *hw = >hw; + u32 val, reg; + + if (hw->mac.type < e1000_82576) + return; + + if (hw->mac.type == e1000_i350) + reg = E1000_DVMOLR(vfn); + else + reg = E1000_VMOLR(vfn); + + val = rd32(reg); + if (enable) + val |= E1000_VMOLR_STRVLAN; + else + val &= ~(E1000_VMOLR_STRVLAN); + wr32(reg, val); +} + static inline void igb_set_vmolr(struct igb_adapter *adapter, int vfn, bool aupe) { @@ -3593,14 +3615,6 @@ static inline void igb_set_vmolr(struct igb_adapter *adapter, return; vmolr = rd32(E1000_VMOLR(vfn)); - vmolr |= E1000_VMOLR_STRVLAN; /* Strip vlan tags */ - if (hw->mac.type == e1000_i350) { - u32 dvmolr; - - dvmolr = rd32(E1000_DVMOLR(vfn)); - dvmolr |= E1000_DVMOLR_STRVLAN; - wr32(E1000_DVMOLR(vfn), dvmolr); - } if (aupe) vmolr |= E1000_VMOLR_AUPE; /* Accept untagged packets */ else @@ -5937,6 +5951,7 @@ static int igb_ndo_set_vf_vlan(struct net_device *netdev, goto out; igb_set_vmvir(adapter, vlan | (qos << VLAN_PRIO_SHIFT), vf); igb_set_vmolr(adapter, vf, !vlan); + igb_set_vf_vlan_strip(adapter, vf, true); adapter->vf_data[vf].pf_vlan = vlan; adapter->vf_data[vf].pf_qos = qos; dev_info(>pdev->dev, @@ -5952,6 +5967,7 @@ static int igb_ndo_set_vf_vlan(struct net_device *netdev, false, vf); igb_set_vmvir(adapter, vlan, vf); igb_set_vmolr(adapter, vf, true); + igb_set_vf_vlan_strip(adapter, vf, false); adapter->vf_data[vf].pf_vlan = 0; adapter->vf_data[vf].pf_qos = 0; } @@ -5986,6 +6002,11 @@ static int igb_set_vf_vlan(struct igb_adapter *adapter, u32 *msgbuf, u32 vf) int vid = (msgbuf[1] & E1000_VLVF_VLANID_MASK); int err = 0; + if (vid) + igb_set_vf_vlan_strip(adapter, vf, true); + else + igb_set_vf_vlan_strip(adapter, vf, false); + /* If in promiscuous mode we need to make sure the PF also has * the VLAN filter set. */ @@ -7202,6 +7223,8 @@ static void igb_vlan_mode(struct net_device *netd
Re: [PATCH v2] r8169: Don't claim WoL works if LanWake flag is not set
On Dec 11 01:06, Francois Romieu wrote: > Corinna Vinschen <vinsc...@redhat.com> : > [...] > > It's still a bit weird. On the machines I tested this on, if I disable > > LanWake and shutdown the machine, I can send, e.g., MagicPackets as much > > as I like, the machined don't come up. Isn't it a bit misleading then > > if ethtool reports that some WoL method is enabled but it doesn't work? > > Of course it is. :o( > > I'm fine with Config5.LanWake changes if you have empirical evidences that > it helps. > > We have terse - outdated ? - documentation and some hint from > http://marc.info/?l=linux-netdev=137654699802446. I'm unable to figure > what an/the adequate change could be, especially a low level chance of > regression one. I think the problem here is that LanWake only switches off aspects of the WoL capability which can't be reflected in a reliable way to the kernel. That's certainly one reason for the driver to enable/disable LanWake always in lock-step with PMEnable. So I wonder if we shouldn't just add some code to rtl_init_one (or create a new function called from rtl_init_one) which checks the WoL flags and if the PmConfig and LanWake flags are set inconsistently (aka "differently") then set them to an equal value, either 0 (no WoL method enabled) or 1 (any WoL method enabled). Does that make sense? Thanks, Corinna pgpN4_FLvqZpy.pgp Description: PGP signature
Re: [PATCH v2] r8169: Don't claim WoL works if LanWake flag is not set
On Dec 9 23:43, Francois Romieu wrote: > Corinna Vinschen <vinsc...@redhat.com> : > [...] > > This patch is supposed to fix this behaviour. If LanWake is 0, the > > function now returns 0. Thus ethtool correctly reports "Wake-on: d". > > Can you turn it into a DMI controlled one (something like > drivers/net/ethernet/marvell/skge.c use of dmi_check_system) I could do this (after I could lay my hands on such a board, that is), but I'm not convinced that this makes a lot of sense for two reasons. > in order to > avoid a global change of behavior ? 1. There is no global change in behaviour. The usual way to handle the WoL flags is to set the affected method flags and additionally set LanWake if any of the method flags is set. The fact that the method flags don't enable WoL without also settting the LanWake flag is documented. __rtl8169_get_wol not reflecting this is a bug. The code lazily assumes that checking the WoL method flags is sufficient while in fact it isn't. __rtl8169_set_wol sets the LanWake flag accordingly, but that doesn't mean the driver may assume that the flags haven't been set differently. I can easily hack the driver to set LanWake to 0 and ethtool would still happily report WoL is active. That's plain wrong. 2. While we now know a single board which neglects to set the LanWake flag, that doesn't mean there aren't other boards out there doing the same. On top of that, the state of the NIC registers in terms of WoL are *not* board-specific. They are regular NIC registers which are just set in a combination which the driver in it's current state evaluates wrongly. It doesn't matter who and why the flags have been set that way. The driver should reflect the actual state, and that requires to check for LanWake. For those reasons I think that my fix is the right thing to do. > Btw it's probably time to emit some warning during driver probe if wol > bits are not consistent with LanWake. That's a good idea. I'll propose a followup patch with this addition. Thanks, Corinna pgp6boYAPgk5h.pgp Description: PGP signature
Re: [PATCH v2] r8169: Don't claim WoL works if LanWake flag is not set
On Dec 10 21:40, Francois Romieu wrote: > Corinna Vinschen <vinsc...@redhat.com> : > [...] > > I could do this (after I could lay my hands on such a board, that is), > > but I'm not convinced that this makes a lot of sense for two reasons. > > Ok, let's get this change applied. Whatever happens should not be > hard to manage (I'm thinking about other boards or BIOSes relying on the > current - broken as it can be - behavior to work correctly). > > [...] > > 1. There is no global change in behaviour. The usual way to handle the > > WoL flags is to set the affected method flags and additionally set > > LanWake if any of the method flags is set. The fact that the method > > flags don't enable WoL without also settting the LanWake flag is > > documented. > > I see no such thing in "7.5 Power Management Function" of the 8168c > registers datasheet. While Config3 states Magic Packet and Link Up > dependencies on Config1.PMEn, it says nothing about Config5.LanWake. > > On old 8169 chipsets LanWake is autoloaded from EEPROM. > > Plausible for Config5.{B, M, U}WF ? Ok. > > Documented ? I am genuinely curious to know where. Ok, I reread the documentation I have, and I got that wrong it seems. Apparently the LanWake flag enables or disables the LANWAKE/LANWAKEB pin only but not the other possible PM events. So, self-NACKed. It's still a bit weird. On the machines I tested this on, if I disable LanWake and shutdown the machine, I can send, e.g., MagicPackets as much as I like, the machined don't come up. Isn't it a bit misleading then if ethtool reports that some WoL method is enabled but it doesn't work? Corinna pgpdp6cN6Fxeq.pgp Description: PGP signature
[PATCH v2] r8169: Don't claim WoL works if LanWake flag is not set
[v2 adds missing text to the patch comment] The __rtl8169_get_wol function returns the state of the various WoL method bits (MagicPaket, Unicast, etc.). While the PMEnable bit is tested at entry, the function doesn't check the LanWake flag. Even if any one of the WoL bits is set, WoL is deactivated as a whole if LanWake isn't set. The return value from __rtl8169_get_wol doesn't reflect that. Unfortunately BIOS exist (HP T620) which set the MagicPaket bit to 1 but the LanWake flag to 0 when switching on WoL in the BIOS. On those machines, ethtool wrongly claims that WoL via MagicPaket is activated while in fact it isn't, unless you explicitely called, e.g, `ethtool -s wol g'. This patch is supposed to fix this behaviour. If LanWake is 0, the function now returns 0. Thus ethtool correctly reports "Wake-on: d". The patch is against Dave's net-next tree. Signed-off-by: Corinna Vinschen <vinsc...@redhat.com> --- drivers/net/ethernet/realtek/r8169.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c index 79ef799..813ad2b 100644 --- a/drivers/net/ethernet/realtek/r8169.c +++ b/drivers/net/ethernet/realtek/r8169.c @@ -1705,6 +1705,10 @@ static u32 __rtl8169_get_wol(struct rtl8169_private *tp) if (!(options & PMEnable)) return 0; + options = RTL_R8(Config5); + if (!(options & LanWake)) + return 0; + options = RTL_R8(Config3); if (options & LinkUp) wolopts |= WAKE_PHY; -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] r8169: Don't claim WoL works if LanWake flag is not set
The __rtl8169_get_wol function returns the state of the various WoL method bits (MagicPaket, Unicast, etc.). While the PMEnable bit is tested at entry, the function doesn't check the LanWake flag. Even if any one of the WoL bits is set, WoL is deactivated as a whole if LanWake isn't set. The return value from __rtl8169_get_wol doesn't reflect that. Unfortunately BIOS exist (HP T620) which set the MagicPaket bit to 1 but the LanWake flag to 0 when switching on WoL in the BIOS. On those machines, ethtool wrongly claims that WoL via MagicPaket is activated while in fact it isn't, unless you explicitely called This patch is supposed to fix this behaviour. If LanWake is 0, the function now returns 0. Thus ethtool correctly reports "Wake-on: d". The patch is against Dave's net-next tree. Signed-off-by: Corinna Vinschen <vinsc...@redhat.com> --- drivers/net/ethernet/realtek/r8169.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c index 79ef799..813ad2b 100644 --- a/drivers/net/ethernet/realtek/r8169.c +++ b/drivers/net/ethernet/realtek/r8169.c @@ -1705,6 +1705,10 @@ static u32 __rtl8169_get_wol(struct rtl8169_private *tp) if (!(options & PMEnable)) return 0; + options = RTL_R8(Config5); + if (!(options & LanWake)) + return 0; + options = RTL_R8(Config3); if (options & LinkUp) wolopts |= WAKE_PHY; -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kasan r8169 use-after-free trace.
On Nov 11 05:16, Eric Dumazet wrote: > On Wed, 2015-11-11 at 10:19 +0100, Francois Romieu wrote: > > Dave Jones <da...@codemonkey.org.uk> : > > > This happens during boot, (and then there's a flood of traces that happen > > > so fast > > > afterwards it completely overwhelms serial console; not sure if they're > > > the > > > same/related or not). > > > > > > == > > > BUG: KASAN: use-after-free in rtl8169_poll+0x4b6/0xb70 at addr > > > 8801d43b3288 > > > Read of size 1 by task kworker/0:3/188 > > > = > > > BUG kmalloc-256 (Not tainted): kasan: bad access detected > > > - > > > > > > Disabling lock debugging due to kernel taint > > > INFO: Slab 0xea000750ecc0 objects=16 used=16 fp=0x (null) > > > flags=0x8080 > > > INFO: Object 0x8801d43b3200 @offset=512 fp=0x8801d43b3800 > > > > > > Bytes b4 8801d43b31f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > > 00 > > > Object 8801d43b3200: 00 38 3b d4 01 88 ff ff 00 00 00 00 00 00 00 00 > > > .8;. > > > > Does the patch below cure it ? > > > > diff --git a/drivers/net/ethernet/realtek/r8169.c > > b/drivers/net/ethernet/realtek/r8169.c > > index b4f2123..79ef799 100644 > > --- a/drivers/net/ethernet/realtek/r8169.c > > +++ b/drivers/net/ethernet/realtek/r8169.c > > @@ -7429,15 +7429,15 @@ process_pkt: > > > > rtl8169_rx_vlan_tag(desc, skb); > > > > + if (skb->pkt_type == PACKET_MULTICAST) > > + dev->stats.multicast++; > > + > > napi_gro_receive(>napi, skb); > > > > u64_stats_update_begin(>rx_stats.syncp); > > tp->rx_stats.packets++; > > tp->rx_stats.bytes += pkt_size; > > u64_stats_update_end(>rx_stats.syncp); > > - > > - if (skb->pkt_type == PACKET_MULTICAST) > > - dev->stats.multicast++; > > } > > release_descriptor: > > desc->opts2 = 0; > > This looks obvious indeed, please submit this formally Francois ;) Yes, please. Thank you Francois. > Fixes: d7d2d89d4b0af ("r8169: Add software counter for multicast packages") > Acked-by: Eric Dumazet <eduma...@google.com> Acked-by: Corinna Vinschen <vinsc...@redhat.com> Corinna pgpOTQv1S7P6H.pgp Description: PGP signature
Re: [PATCH v2] r8169: Fix sleeping function called during get_stats64
On Sep 9 20:31, David Miller wrote: > From: Corinna Vinschen <vinsc...@redhat.com> > Date: Wed, 9 Sep 2015 23:16:40 +0200 > > > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=104031 > > Fixes: 6e85d5ad36a26debc23a9a865c029cbe242b2dc8 > > > > Based on the discussion starting at > > http://www.spinics.net/lists/netdev/msg342193.html > > > > Tested locally on RTL8168evl/8111evl with various concurrent processes > > accessing /proc/net/dev while changing the link state as well as > > removing/reloading the r8169 module. > > > > Signed-off-by: Corinna Vinschen <vinsc...@redhat.com> > > Please address Francois's minor feedback, camelcase drives me nuts > too FWIW :-) Haha, with pleasure. I don't like them much either. I just used them to align to the original code, which uses camel back for the rx/tx descriptor ring buffers allocated the same way. I'll send a v3 in a minute which fixes all of Francois points. Thanks, Corinna pgpkZoFaiGneN.pgp Description: PGP signature
[PATCH v3] r8169: Fix sleeping function called during get_stats64, v2
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=104031 Fixes: 6e85d5ad36a26debc23a9a865c029cbe242b2dc8 Based on the discussion starting at http://www.spinics.net/lists/netdev/msg342193.html Tested locally on RTL8168evl/8111evl with various concurrent processes accessing /proc/net/dev while changing the link state as well as removing/reloading the r8169 module. Signed-off-by: Corinna Vinschen <vinsc...@redhat.com> --- v2 -> v3: Rename CntPhysAddr -> counters_phys_addr Rename CntArray-> counters Fix whitespace issue Fix label names in rtl_init_one. drivers/net/ethernet/realtek/r8169.c | 137 ++- 1 file changed, 54 insertions(+), 83 deletions(-) diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c index 24dcbe6..2b32e0c 100644 --- a/drivers/net/ethernet/realtek/r8169.c +++ b/drivers/net/ethernet/realtek/r8169.c @@ -833,7 +833,8 @@ struct rtl8169_private { unsigned features; struct mii_if_info mii; - struct rtl8169_counters counters; + dma_addr_t counters_phys_addr; + struct rtl8169_counters *counters; struct rtl8169_tc_offsets tc_offset; u32 saved_wolopts; u32 opts1_mask; @@ -2190,53 +2191,37 @@ static int rtl8169_get_sset_count(struct net_device *dev, int sset) } } -static struct rtl8169_counters *rtl8169_map_counters(struct net_device *dev, -dma_addr_t *paddr, -u32 counter_cmd) +DECLARE_RTL_COND(rtl_counters_cond) { - struct rtl8169_private *tp = netdev_priv(dev); void __iomem *ioaddr = tp->mmio_addr; - struct device *d = >pci_dev->dev; - struct rtl8169_counters *counters; - u32 cmd; - counters = dma_alloc_coherent(d, sizeof(*counters), paddr, GFP_KERNEL); - if (counters) { - RTL_W32(CounterAddrHigh, (u64)*paddr >> 32); - cmd = (u64)*paddr & DMA_BIT_MASK(32); - RTL_W32(CounterAddrLow, cmd); - RTL_W32(CounterAddrLow, cmd | counter_cmd); - } - return counters; + return RTL_R32(CounterAddrLow) & (CounterReset | CounterDump); } -static void rtl8169_unmap_counters (struct net_device *dev, - dma_addr_t paddr, - struct rtl8169_counters *counters) +static bool rtl8169_do_counters(struct net_device *dev, u32 counter_cmd) { struct rtl8169_private *tp = netdev_priv(dev); void __iomem *ioaddr = tp->mmio_addr; - struct device *d = >pci_dev->dev; + dma_addr_t paddr = tp->counters_phys_addr; + u32 cmd; + bool ret; - RTL_W32(CounterAddrLow, 0); - RTL_W32(CounterAddrHigh, 0); + RTL_W32(CounterAddrHigh, (u64)paddr >> 32); + cmd = (u64)paddr & DMA_BIT_MASK(32); + RTL_W32(CounterAddrLow, cmd); + RTL_W32(CounterAddrLow, cmd | counter_cmd); - dma_free_coherent(d, sizeof(*counters), counters, paddr); -} + ret = rtl_udelay_loop_wait_low(tp, _counters_cond, 10, 1000); -DECLARE_RTL_COND(rtl_reset_counters_cond) -{ - void __iomem *ioaddr = tp->mmio_addr; + RTL_W32(CounterAddrLow, 0); + RTL_W32(CounterAddrHigh, 0); - return RTL_R32(CounterAddrLow) & CounterReset; + return ret; } static bool rtl8169_reset_counters(struct net_device *dev) { struct rtl8169_private *tp = netdev_priv(dev); - struct rtl8169_counters *counters; - dma_addr_t paddr; - bool ret = true; /* * Versions prior to RTL_GIGA_MAC_VER_19 don't support resetting the @@ -2245,32 +2230,13 @@ static bool rtl8169_reset_counters(struct net_device *dev) if (tp->mac_version < RTL_GIGA_MAC_VER_19) return true; - counters = rtl8169_map_counters(dev, , CounterReset); - if (!counters) - return false; - - if (!rtl_udelay_loop_wait_low(tp, _reset_counters_cond, 10, 1000)) - ret = false; - - rtl8169_unmap_counters(dev, paddr, counters); - - return ret; -} - -DECLARE_RTL_COND(rtl_counters_cond) -{ - void __iomem *ioaddr = tp->mmio_addr; - - return RTL_R32(CounterAddrLow) & CounterDump; + return rtl8169_do_counters(dev, CounterReset); } static bool rtl8169_update_counters(struct net_device *dev) { struct rtl8169_private *tp = netdev_priv(dev); void __iomem *ioaddr = tp->mmio_addr; - struct rtl8169_counters *counters; - dma_addr_t paddr; - bool ret = true; /* * Some chips are unable to dump tally counters when the receiver @@ -2279,23 +2245,13 @@ static bool rtl8169_update_counters(struct net_device *dev) if ((RTL_R8(ChipCmd) & CmdRxEnb) == 0) return tru
Re: [PATCH net 1/1] r8169: fix sleepable allocation during netdevice stats retrieval.
On Sep 10 01:51, poma wrote: > [PATCH v2] r8169: Fix sleeping function called during get_stats64 > http://www.spinics.net/lists/netdev/msg342490.html > - the noise is still present > > Corinna, care to share kernel-core & kernel-modules you are testing with, > to unravel the Gordian Knot? Somehow your initrd seem to refuse to pick up the new kernel module. Here's basically what I do: - Test system is RHEL7 - git clone git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git - edit drivers/net/ethernet/realtek/r8169.c - make config; make -j42 - make install_modules && make install - init 6 Alternatively, if you already did the above once: - git am http://www.spinics.net/lists/netdev/msg342490.html> - make modules - cp drivers/net/ethernet/realtek/r8169.ko \ /lib/modules/4.2.0+/kernel/drivers/net/ethernet/realtek/r8169.c - dracut -f --kver '4.2.0+' - init 6 Corinna pgpVpl0dCHn6g.pgp Description: PGP signature
Re: [PATCH net 1/1] r8169: fix sleepable allocation during netdevice stats retrieval.
On Sep 10 14:36, poma wrote: > On 10.09.2015 10:47, Corinna Vinschen wrote: > > On Sep 10 01:51, poma wrote: > >> [PATCH v2] r8169: Fix sleeping function called during get_stats64 > >> http://www.spinics.net/lists/netdev/msg342490.html > >> - the noise is still present > >> > >> Corinna, care to share kernel-core & kernel-modules you are testing with, > >> to unravel the Gordian Knot? > > > > Somehow your initrd seem to refuse to pick up the new kernel module. > > [...etc...] > > [PATCH v3] r8169: Fix sleeping function called during get_stats64, v2 > http://marc.info/?l=linux-netdev=144187488130444=raw > > # dracut --omit-drivers r8169 -f --kver 4.3.0-0.rc0.git11.1.fc24.x86_64 > > - the noise is gone > > Tested-by: poma <pomidorabelis...@gmail.com> Thanks for testing! Corinna pgpSwWLxiLb3R.pgp Description: PGP signature
Re: [PATCH net 3/3] r8169: increase the lifespan of the hardware counters dump area.
On Sep 9 01:27, Francois Romieu wrote: > Corinna Vinschen <vinsc...@redhat.com> : > [...] > > - Alternatively I can still reproduce the SEGV in rtl_remove_one > > when trying to rmmod the module, I just don't have the stack dump > > handy while writing this mail. I can show it if needed. > > I see it too. > > > I debugged this on and off the entire day (tweaking, compiling, rebooting, > > kernel crash, rinse and repeat). > > > > And the result of my debugging is totally crazy: > > My patch corrupts memory. That's rather obvious, I just fail to see how. I try to implement this from scratch, maybe I stumble over the cause by accident... Corinna pgpRDJT_M8GnX.pgp Description: PGP signature
Re: [GIT] Networking
On Sep 8 22:16, Konrad Rzeszutek Wilk wrote: > On Tue, Sep 8, 2015 at 10:14 PM, Konrad Rzeszutek Wilk > <kon...@kernel.org> wrote: > > > > (Removed Linus and Andrew from the To, added Corinna ..) > > and resending again without HTML (sorry, thought I had HTML-emails > disabled by default) > > > > On Thu, Sep 3, 2015 at 1:35 AM, David Miller <da...@davemloft.net> wrote: > >> > >> > >> Another merge window, another set of networking changes. I've heard > >> rumblings that the lightweight tunnels infrastructure has been voted > >> networking change of the year. But what do I know? > >> > > .. snip.. > >> > >> Corinna Vinschen (2): > >> r8169: Add values missing in @get_stats64 from HW counters Thanks, this has been reported over the weekend in https://bugzilla.kernel.org/show_bug.cgi?id=104031 Strangly this didn't occur during my tests. Looking into it... Corinna pgpQbD31yZa48.pgp Description: PGP signature
[PATCH net] r8169: Fix sleeping function called during get_stats64
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=104031 Fixes: 6e85d5ad36a26debc23a9a865c029cbe242b2dc8 Based on the discussion starting at http://www.spinics.net/lists/netdev/msg342193.html Tested locally on RTL8168evl/8111evl with various concurrent processes accessing /proc/net/dev while changing the link state as well as removing/reloading the r8169 module at the same time. Signed-off-by: Corinna Vinschen <vinsc...@redhat.com> --- drivers/net/ethernet/realtek/r8169.c | 140 ++- 1 file changed, 56 insertions(+), 84 deletions(-) diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c index 24dcbe6..630811a 100644 --- a/drivers/net/ethernet/realtek/r8169.c +++ b/drivers/net/ethernet/realtek/r8169.c @@ -833,7 +833,8 @@ struct rtl8169_private { unsigned features; struct mii_if_info mii; - struct rtl8169_counters counters; + dma_addr_t CntPhysAddr; + struct rtl8169_counters *CntArray; struct rtl8169_tc_offsets tc_offset; u32 saved_wolopts; u32 opts1_mask; @@ -2190,53 +2191,40 @@ static int rtl8169_get_sset_count(struct net_device *dev, int sset) } } -static struct rtl8169_counters *rtl8169_map_counters(struct net_device *dev, -dma_addr_t *paddr, -u32 counter_cmd) +DECLARE_RTL_COND(rtl_counters_cond) { - struct rtl8169_private *tp = netdev_priv(dev); void __iomem *ioaddr = tp->mmio_addr; - struct device *d = >pci_dev->dev; - struct rtl8169_counters *counters; - u32 cmd; - counters = dma_alloc_coherent(d, sizeof(*counters), paddr, GFP_KERNEL); - if (counters) { - RTL_W32(CounterAddrHigh, (u64)*paddr >> 32); - cmd = (u64)*paddr & DMA_BIT_MASK(32); - RTL_W32(CounterAddrLow, cmd); - RTL_W32(CounterAddrLow, cmd | counter_cmd); - } - return counters; + return RTL_R32(CounterAddrLow) & (CounterReset | CounterDump); } -static void rtl8169_unmap_counters (struct net_device *dev, - dma_addr_t paddr, - struct rtl8169_counters *counters) +static bool rtl8169_do_counters(struct net_device *dev, u32 counter_cmd) { struct rtl8169_private *tp = netdev_priv(dev); void __iomem *ioaddr = tp->mmio_addr; - struct device *d = >pci_dev->dev; + dma_addr_t paddr = tp->CntPhysAddr; + u32 cmd; + bool ret; - RTL_W32(CounterAddrLow, 0); - RTL_W32(CounterAddrHigh, 0); + if (!paddr) + return false; - dma_free_coherent(d, sizeof(*counters), counters, paddr); -} + RTL_W32(CounterAddrHigh, (u64)paddr >> 32); + cmd = (u64)paddr & DMA_BIT_MASK(32); + RTL_W32(CounterAddrLow, cmd); + RTL_W32(CounterAddrLow, cmd | counter_cmd); -DECLARE_RTL_COND(rtl_reset_counters_cond) -{ - void __iomem *ioaddr = tp->mmio_addr; + ret = rtl_udelay_loop_wait_low(tp, _counters_cond, 10, 1000); - return RTL_R32(CounterAddrLow) & CounterReset; + RTL_W32(CounterAddrLow, 0); + RTL_W32(CounterAddrHigh, 0); + + return ret; } static bool rtl8169_reset_counters(struct net_device *dev) { struct rtl8169_private *tp = netdev_priv(dev); - struct rtl8169_counters *counters; - dma_addr_t paddr; - bool ret = true; /* * Versions prior to RTL_GIGA_MAC_VER_19 don't support resetting the @@ -2245,32 +2233,13 @@ static bool rtl8169_reset_counters(struct net_device *dev) if (tp->mac_version < RTL_GIGA_MAC_VER_19) return true; - counters = rtl8169_map_counters(dev, , CounterReset); - if (!counters) - return false; - - if (!rtl_udelay_loop_wait_low(tp, _reset_counters_cond, 10, 1000)) - ret = false; - - rtl8169_unmap_counters(dev, paddr, counters); - - return ret; -} - -DECLARE_RTL_COND(rtl_counters_cond) -{ - void __iomem *ioaddr = tp->mmio_addr; - - return RTL_R32(CounterAddrLow) & CounterDump; +return rtl8169_do_counters(dev, CounterReset); } static bool rtl8169_update_counters(struct net_device *dev) { struct rtl8169_private *tp = netdev_priv(dev); void __iomem *ioaddr = tp->mmio_addr; - struct rtl8169_counters *counters; - dma_addr_t paddr; - bool ret = true; /* * Some chips are unable to dump tally counters when the receiver @@ -2279,23 +2248,13 @@ static bool rtl8169_update_counters(struct net_device *dev) if ((RTL_R8(ChipCmd) & CmdRxEnb) == 0) return true; - counters = rtl8169_map_counters(dev, , CounterDump); - if (!counters) - return false; - - if (
Re: [PATCH net 1/1] r8169: fix sleepable allocation during netdevice stats retrieval.
On Sep 9 17:54, Corinna Vinschen wrote: > On Sep 9 17:24, poma wrote: > > [PATCH net] r8169: Fix sleeping function called during get_stats64 > > http://marc.info/?l=linux-netdev=144180123410135=raw > > - the noise is still present > > Are you really sure? The entire dma_alloc/dma_free stuff has been moved > away. There's no locking or sleeping involved different from what was > there before my original patch when calling .ndo_get_stats64. > > How would I be able to reproduce this on the command line? It would also be interesting to see the "noise" as it looks after applying the above patch... Corinna pgpyx8KB9HKKJ.pgp Description: PGP signature
Re: [PATCH net 1/1] r8169: fix sleepable allocation during netdevice stats retrieval.
On Sep 9 17:24, poma wrote: > [PATCH net] r8169: Fix sleeping function called during get_stats64 > http://marc.info/?l=linux-netdev=144180123410135=raw > - the noise is still present Are you really sure? The entire dma_alloc/dma_free stuff has been moved away. There's no locking or sleeping involved different from what was there before my original patch when calling .ndo_get_stats64. How would I be able to reproduce this on the command line? Corinna pgpcvEJOJJCGT.pgp Description: PGP signature
Re: [PATCH net 1/1] r8169: fix sleepable allocation during netdevice stats retrieval.
On Sep 9 20:34, poma wrote: > On 09/09/2015 05:55 PM, Corinna Vinschen wrote: > > On Sep 9 17:54, Corinna Vinschen wrote: > >> On Sep 9 17:24, poma wrote: > >>> [PATCH net] r8169: Fix sleeping function called during get_stats64 > >>> http://marc.info/?l=linux-netdev=144180123410135=raw > >>> - the noise is still present > >> > >> Are you really sure? The entire dma_alloc/dma_free stuff has been moved > >> away. There's no locking or sleeping involved different from what was > >> there before my original patch when calling .ndo_get_stats64. > >> > > > It is literally the kernel ring buffer output, > so I really can not understand your question. I'm asking because I'm wondering if you're actually testing the right r8169.ko module, the one with the patch applied. See below. > >> How would I be able to reproduce this on the command line? > > This I have already written, here's once more for you, > > This noise is induced via userspace, xfce4-netload-plugin, > http://goodies.xfce.org/projects/panel-plugins/xfce4-netload-plugin > > $ grep -i device .config/xfce4/panel/netload-16.rc > Network_Device=enp3s0 > > $ ethtool -i enp3s0 | grep driver > driver: r8169 > > > Therefore, to try to reproduce this issue, 'xfce4-netload-plugin' must run > within 'xfce4-panel', > moreover 'xfce4-netload-plugin' must be configured to monitor affected > network interface. I'lll see if I can try this tomorrow. > No command line this time, hombre. If it has to be spanish, I'd prefer mujer, but whatever. > > It would also be interesting to see the "noise" as it looks after > > applying the above patch... > > The "noise" after applying "r8169: Fix sleeping function called during > get_stats64": > [...] > [ 215.049067] Call Trace: > [ 215.049078] [] dump_stack+0x4b/0x63 > [ 215.049090] [] lockdep_rcu_suspicious+0xd7/0x110 > [ 215.049099] [] ___might_sleep+0xa7/0x230 > [ 215.049107] [] __might_sleep+0x49/0x80 > [ 215.049121] [] __alloc_pages_nodemask+0x2fe/0xb90 > [ 215.049130] [] ? debug_lockdep_rcu_enabled+0x1d/0x20 > [ 215.049141] [] ? sched_clock+0x9/0x10 > [ 215.049149] [] ? local_clock+0x1c/0x20 > [ 215.049157] [] ? debug_lockdep_rcu_enabled+0x1d/0x20 > [ 215.049168] [] dma_generic_alloc_coherent+0x96/0x130 > [ 215.049178] [] x86_swiotlb_alloc_coherent+0x25/0x50 > [ 215.049193] [] dma_alloc_attrs+0x6d/0xe0 > [ 215.049208] [] rtl8169_map_counters+0x3e/0x70 [r8169] This is very certainly not the r8169.ko driver with my patch applied. There is no rtl8169_map_counters function anymore, just as with Francois' patch. I'm not sure what you're doing wrong there, but this stack dump definitely cannot be produced with either Francois or my patch, so you're apparently testing the unpatched upstream driver all the time. > ... > etc. > etc. > etc. > > This looks the same as at the beginning, as if you were dealing with > an entirely different problem, hombre. No, sorry, it's you running the wrong kernel module, and a single occurence of the stack dump would have been sufficient, but thanks all the same. Corinna pgpskJpMKmMHs.pgp Description: PGP signature
Re: [PATCH net] r8169: Fix sleeping function called during get_stats64
On Sep 9 22:23, Francois Romieu wrote: > Corinna Vinschen <vinsc...@redhat.com> : > [...] > > diff --git a/drivers/net/ethernet/realtek/r8169.c > > b/drivers/net/ethernet/realtek/r8169.c > > index 24dcbe6..630811a 100644 > > --- a/drivers/net/ethernet/realtek/r8169.c > > +++ b/drivers/net/ethernet/realtek/r8169.c > [...] > > + if (!paddr) > > + return false; > > I guess this is the secret recipe. Actually, no. I started out stress testing this combined with a printk to show if paddr can be NULL, but this never occurs. ndo_get_stats64 is apparently only called after registering the device, and this occurs after calling dma_alloc_coherent in rtl_init_one. Rather than removing the entire test, I acidentally only removed the printk. Sorry about that. > [...] > > @@ -8447,9 +8411,14 @@ static int rtl_init_one(struct pci_dev *pdev, const > > struct pci_device_id *ent) > > > > tp->rtl_fw = RTL_FIRMWARE_UNKNOWN; > > > > + tp->CntArray = dma_alloc_coherent (>dev, sizeof(*tp->CntArray), > > + >CntPhysAddr, GFP_KERNEL); > > + if (!tp->CntArray) > > + goto err_out_cnt_4; > > + > > rc is still zero here so rtl_init_one will return success. Thanks for catching! I'm sending a fixed patch in a minute, removing the paddr test and setting rc to -ENOMEM if dma_alloc_coherent fails. Corinna pgp_SzSXvSkJ2.pgp Description: PGP signature
[PATCH v2] r8169: Fix sleeping function called during get_stats64
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=104031 Fixes: 6e85d5ad36a26debc23a9a865c029cbe242b2dc8 Based on the discussion starting at http://www.spinics.net/lists/netdev/msg342193.html Tested locally on RTL8168evl/8111evl with various concurrent processes accessing /proc/net/dev while changing the link state as well as removing/reloading the r8169 module. Signed-off-by: Corinna Vinschen <vinsc...@redhat.com> --- drivers/net/ethernet/realtek/r8169.c | 139 ++- 1 file changed, 55 insertions(+), 84 deletions(-) diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c index 24dcbe6..c12 100644 --- a/drivers/net/ethernet/realtek/r8169.c +++ b/drivers/net/ethernet/realtek/r8169.c @@ -833,7 +833,8 @@ struct rtl8169_private { unsigned features; struct mii_if_info mii; - struct rtl8169_counters counters; + dma_addr_t CntPhysAddr; + struct rtl8169_counters *CntArray; struct rtl8169_tc_offsets tc_offset; u32 saved_wolopts; u32 opts1_mask; @@ -2190,53 +2191,37 @@ static int rtl8169_get_sset_count(struct net_device *dev, int sset) } } -static struct rtl8169_counters *rtl8169_map_counters(struct net_device *dev, -dma_addr_t *paddr, -u32 counter_cmd) +DECLARE_RTL_COND(rtl_counters_cond) { - struct rtl8169_private *tp = netdev_priv(dev); void __iomem *ioaddr = tp->mmio_addr; - struct device *d = >pci_dev->dev; - struct rtl8169_counters *counters; - u32 cmd; - counters = dma_alloc_coherent(d, sizeof(*counters), paddr, GFP_KERNEL); - if (counters) { - RTL_W32(CounterAddrHigh, (u64)*paddr >> 32); - cmd = (u64)*paddr & DMA_BIT_MASK(32); - RTL_W32(CounterAddrLow, cmd); - RTL_W32(CounterAddrLow, cmd | counter_cmd); - } - return counters; + return RTL_R32(CounterAddrLow) & (CounterReset | CounterDump); } -static void rtl8169_unmap_counters (struct net_device *dev, - dma_addr_t paddr, - struct rtl8169_counters *counters) +static bool rtl8169_do_counters(struct net_device *dev, u32 counter_cmd) { struct rtl8169_private *tp = netdev_priv(dev); void __iomem *ioaddr = tp->mmio_addr; - struct device *d = >pci_dev->dev; + dma_addr_t paddr = tp->CntPhysAddr; + u32 cmd; + bool ret; - RTL_W32(CounterAddrLow, 0); - RTL_W32(CounterAddrHigh, 0); + RTL_W32(CounterAddrHigh, (u64)paddr >> 32); + cmd = (u64)paddr & DMA_BIT_MASK(32); + RTL_W32(CounterAddrLow, cmd); + RTL_W32(CounterAddrLow, cmd | counter_cmd); - dma_free_coherent(d, sizeof(*counters), counters, paddr); -} + ret = rtl_udelay_loop_wait_low(tp, _counters_cond, 10, 1000); -DECLARE_RTL_COND(rtl_reset_counters_cond) -{ - void __iomem *ioaddr = tp->mmio_addr; + RTL_W32(CounterAddrLow, 0); + RTL_W32(CounterAddrHigh, 0); - return RTL_R32(CounterAddrLow) & CounterReset; + return ret; } static bool rtl8169_reset_counters(struct net_device *dev) { struct rtl8169_private *tp = netdev_priv(dev); - struct rtl8169_counters *counters; - dma_addr_t paddr; - bool ret = true; /* * Versions prior to RTL_GIGA_MAC_VER_19 don't support resetting the @@ -2245,32 +2230,13 @@ static bool rtl8169_reset_counters(struct net_device *dev) if (tp->mac_version < RTL_GIGA_MAC_VER_19) return true; - counters = rtl8169_map_counters(dev, , CounterReset); - if (!counters) - return false; - - if (!rtl_udelay_loop_wait_low(tp, _reset_counters_cond, 10, 1000)) - ret = false; - - rtl8169_unmap_counters(dev, paddr, counters); - - return ret; -} - -DECLARE_RTL_COND(rtl_counters_cond) -{ - void __iomem *ioaddr = tp->mmio_addr; - - return RTL_R32(CounterAddrLow) & CounterDump; +return rtl8169_do_counters(dev, CounterReset); } static bool rtl8169_update_counters(struct net_device *dev) { struct rtl8169_private *tp = netdev_priv(dev); void __iomem *ioaddr = tp->mmio_addr; - struct rtl8169_counters *counters; - dma_addr_t paddr; - bool ret = true; /* * Some chips are unable to dump tally counters when the receiver @@ -2279,23 +2245,13 @@ static bool rtl8169_update_counters(struct net_device *dev) if ((RTL_R8(ChipCmd) & CmdRxEnb) == 0) return true; - counters = rtl8169_map_counters(dev, , CounterDump); - if (!counters) - return false; - - if (rtl_udelay_loop_wait_low(tp, _counters_cond, 10, 1000)) - me
Re: [PATCH net 3/3] r8169: increase the lifespan of the hardware counters dump area.
Hi David, On Sep 7 17:00, David Miller wrote: > From: Corinna Vinschen <vinsc...@redhat.com> > Date: Mon, 7 Sep 2015 11:29:49 +0200 > > > Still wondering though. Given that the driver never failed before if > > the counter values couldn't be updated, and given that these counter > > values only have statistical relevance, why should this suddenly result > > in a fatal failure at open time? > > Failing to allocate such a small buffer means we have much deeper issues > at hand. A GFP_KERNEL allocation of this size really should not fail. I'm not talking about the allocation. I agree with you on that score. What I'm talking about is the situation where the NIC hardware fails to reset or update its own counters for whatever reason. Apparently the mechanism is supposed to be performed within a given timeframe. The code sets some registers and then waits for a flag bit to be set to 0. For that it utilizes a busy loop checking the flag bit up to 1000 times with a delay of about 10 us. The error condition is that the flag bit hasn't been set to 0 when the loop exits, after roughly 10ms, and *this* part does not constitute a fatal error which breaks the operation of the NIC. So, from my perspective a timeout while trying to wait for updated counter values from the NIC at @ndo_open time should not be treated as fatal. Corinna pgpk7BRl5wT6Y.pgp Description: PGP signature
Re: [PATCH net 3/3] r8169: increase the lifespan of the hardware counters dump area.
On Sep 7 23:52, Francois Romieu wrote: > Corinna Vinschen <vinsc...@redhat.com> : > [...] > > I have a bit of a problem with this patch. It crashes when loading the > > driver manually w/ modprobe. For some reason dev_get_stats is called > > during rtl_init_one and at that time the counters pointer is NULL, so > > the kernel gets a SEGV. > > > > After I worked around that I got a SEGV in > > rtl_remove_one, which I still have to find out why. I didn't test with > > the latest kernel, though, so I still have to check if that's the reason > > for the crashes. That takes a bit longer, I just wanted to let you > > know. > > Call me stupid: I forgot that it's perfectly fine to request the stats > of a down interface. :o/ > > Updated patch is on the way. > > > There's also a problem with rtl8169_cmd_counters: It always calls > > rtl_udelay_loop_wait_low w/ rtl_reset_counters_cond, even when called > > from rtl8169_update_counters, where it should call > > rtl_udelay_loop_wait_low w/ rtl_counters_cond to check for the > > CounterDump flag, rather than for the CounterReset flag. > > > > For now I applied the below patch, but I think it's a bit ugly to > > conditionalize the rtl_cond struct by the incoming flag value. > > > > Let's check both at once: > > return RTL_R32(CounterAddrLow) & (CounterDump | CounterReset); If you're sure that's valid, then why not? It's certainly cleaner code. Corinna pgpNZQ0zHUR2P.pgp Description: PGP signature
Re: [PATCH net 3/3] r8169: increase the lifespan of the hardware counters dump area.
On Sep 8 02:02, Francois Romieu wrote: > Francois Romieu: > [...] > > Updated patch is on the way. > > Fixed memcpy in patch 0001, moved counters allocation from open() > to probe(), returned open() to its original state but something is > still wrong: the link does not come up. I tested and debugged the attached patches. Just as you noticed, the interfaces (my test machine has two) don't come up at boot time and subsequently I can also reproduce two kinds of crashes: - Calling `ip link ... up' crashes the kernel in rtl_open like this: [ 138.031190] [] dump_stack+0x44/0x55 [ 138.036311] [] __setup_irq+0x515/0x580 [ 138.041693] [] ? rtl8169_gset_xmii+0x20/0x20 [r8169] [ 138.048284] [] request_threaded_irq+0xf4/0x1a0 [ 138.054357] [] rtl_open+0x3a7/0xab4 [r8169] [...] - Alternatively I can still reproduce the SEGV in rtl_remove_one when trying to rmmod the module, I just don't have the stack dump handy while writing this mail. I can show it if needed. I debugged this on and off the entire day (tweaking, compiling, rebooting, kernel crash, rinse and repeat). And the result of my debugging is totally crazy: If I disable the call to rtl_init_counter_offsets in rtl_open, as in #if 0 retval = rtl_init_counter_offsets(dev); if (retval < 0) netif_warn(tp, hw, dev, "counter reset/update failed\n"); #endif the interfaces come up just fine. If I reenable the rtl_init_counter_offsets call in rtl_open, and reduce the rtl_init_counter_offsets function to just this: static int rtl_init_counter_offsets(struct net_device *dev) { return 1; } then the interfaces refuse to come up, and a subsequent `ip link ... up' crashes the kernel. No, I do not understand this :( Corinna pgp2h7zEl3YBC.pgp Description: PGP signature
Re: [PATCH net 1/1] r8169: fix sleepable allocation during netdevice stats retrieval.
On Sep 7 09:13, poma wrote: > On 06.09.2015 12:19, Corinna Vinschen wrote: > > On Sep 4 22:59, Francois Romieu wrote: > >> Applies against davem's net as of > >> f1ccbfce2fc787981d1182d09c1f6b67766783a8. > >> > >> drivers/net/ethernet/realtek/r8169.c | 2 +- > >> 1 file changed, 1 insertion(+), 1 deletion(-) > >> > >> diff --git a/drivers/net/ethernet/realtek/r8169.c > >> b/drivers/net/ethernet/realtek/r8169.c > >> index 24dcbe6..56829ea 100644 > >> --- a/drivers/net/ethernet/realtek/r8169.c > >> +++ b/drivers/net/ethernet/realtek/r8169.c > >> @@ -2200,7 +2200,7 @@ static struct rtl8169_counters > >> *rtl8169_map_counters(struct net_device *dev, > >>struct rtl8169_counters *counters; > >>u32 cmd; > >> > >> - counters = dma_alloc_coherent(d, sizeof(*counters), paddr, GFP_KERNEL); > >> + counters = dma_alloc_coherent(d, sizeof(*counters), paddr, GFP_ATOMIC); > >>if (counters) { > >>RTL_W32(CounterAddrHigh, (u64)*paddr >> 32); > >>cmd = (u64)*paddr & DMA_BIT_MASK(32); > >> -- > >> 2.4.3 > > [...] > 1. Reverted to r8169.c?id=eb78139 >- the noise is still present > 2. Patches applied - Francois Romieu (3): >r8169: decouple the counters data and the device private area. >r8169: move rtl_reset_counters_cond before the hardware counters helpers. >r8169: increase the lifespan of the hardware counters dump area. >- the noise is still present Sure you tested the right code? I'm just asking because... > [ 70.016445] [] dump_stack+0x4b/0x63 > [ 70.016456] [] lockdep_rcu_suspicious+0xd7/0x110 > [ 70.016465] [] ___might_sleep+0xa7/0x230 > [ 70.016472] [] __might_sleep+0x49/0x80 > [ 70.016481] [] __alloc_pages_nodemask+0x2fe/0xb90 > [ 70.016490] [] ? debug_lockdep_rcu_enabled+0x1d/0x20 > [ 70.016499] [] ? sched_clock+0x9/0x10 > [ 70.016507] [] ? local_clock+0x1c/0x20 > [ 70.016514] [] ? debug_lockdep_rcu_enabled+0x1d/0x20 > [ 70.016524] [] dma_generic_alloc_coherent+0x96/0x130 > [ 70.016534] [] x86_swiotlb_alloc_coherent+0x25/0x50 > [ 70.016541] [] dma_alloc_attrs+0x6d/0xe0 > [ 70.016555] [] rtl8169_map_counters+0x3e/0x70 [r8169] ...rtl8169_map_counters only exists before Francois patch. The patch removes this function. Corinna pgpVJzCq1DXC1.pgp Description: PGP signature
Re: [PATCH net 3/3] r8169: increase the lifespan of the hardware counters dump area.
On Sep 6 22:21, Francois Romieu wrote: > Corinna Vinschen <vinsc...@redhat.com> : > > On Sep 5 14:18, rom...@fr.zoreil.com wrote: > [...] > > > - rtl_reset_counters_cond induced failures in open() are also considered > > > fatal: it takes acceptable work to unwind comfortably. > > > > Why? > > > Crap, my description does not match the code wrt rtl_reset_counters_cond. :o/ > > s/rtl8169_reset_counters/rtl8169_update_counters/g > > The code is right. > > [...] > > This returns -EINVAL even for older chip versions which are not capable > > of resetting the counters. The result is, this driver will not work at > > all on chip versions prior to RTL_GIGA_MAC_VER_19 anymore, because > > rtl_open will always fail. > > No. My changelog was misleading. rtl_init_counter_offsets handles this part > correctly. Oh, right, I missed that rtl_init_counter_offsets checks for `if (!rc)', not for `if (rc < 0)' as for the call to rtl8169_update_counters. Still wondering though. Given that the driver never failed before if the counter values couldn't be updated, and given that these counter values only have statistical relevance, why should this suddenly result in a fatal failure at open time? Corinna pgpU8NPcDgH_n.pgp Description: PGP signature
Re: [PATCH net 3/3] r8169: increase the lifespan of the hardware counters dump area.
On Sep 6 12:37, Corinna Vinschen wrote: > On Sep 5 14:18, rom...@fr.zoreil.com wrote: > > From: Francois Romieu <rom...@fr.zoreil.com> > > > > net/core/net-sysfs.c::netstat_show retrieves stats with spinlock held. > > > > This change avoids sleepable allocation and performs some housekeeping: > > - receive ring, transmit ring and counters dump area allocation failures > > are all considered fatal during open() > > - netif_warn is now redundant with rtl_reset_counters_cond built-in > > failure message. > > - rtl_reset_counters_cond induced failures in open() are also considered > > fatal: it takes acceptable work to unwind comfortably. > > Why? The counter reset is not a fatal condition for the operation of > the NIC. Even if the counters show incorrect values, the normal > operation of the NIC is not affected. And to top that off, even if > resetting the counters actually doesn't work, the driver keeps the > offset values, so a failed reset won't even harm the statistics. It > just doesn't make sense to break the NIC operation for such a minor > problem. And worse: > > > -static bool rtl8169_reset_counters(struct net_device *dev) > > +static int rtl8169_reset_counters(struct net_device *dev) > > { > > struct rtl8169_private *tp = netdev_priv(dev); > > - struct rtl8169_counters *counters; > > - dma_addr_t paddr; > > - bool ret = true; > > > > /* > > * Versions prior to RTL_GIGA_MAC_VER_19 don't support resetting the > > * tally counters. > > */ > > if (tp->mac_version < RTL_GIGA_MAC_VER_19) > > - return true; > > - > > - counters = rtl8169_map_counters(dev, , CounterReset); > > - if (!counters) > > - return false; > > - > > - if (!rtl_udelay_loop_wait_low(tp, _reset_counters_cond, 10, 1000)) > > - ret = false; > > - > > - rtl8169_unmap_counters(dev, paddr, counters); > > + return -EINVAL; > > This returns -EINVAL even for older chip versions which are not capable > of resetting the counters. The result is, this driver will not work at > all on chip versions prior to RTL_GIGA_MAC_VER_19 anymore, because > rtl_open will always fail. > > Other then that, I can test the patch probably tomorrow. I have a bit of a problem with this patch. It crashes when loading the driver manually w/ modprobe. For some reason dev_get_stats is called during rtl_init_one and at that time the counters pointer is NULL, so the kernel gets a SEGV. After I worked around that I got a SEGV in rtl_remove_one, which I still have to find out why. I didn't test with the latest kernel, though, so I still have to check if that's the reason for the crashes. That takes a bit longer, I just wanted to let you know. There's also a problem with rtl8169_cmd_counters: It always calls rtl_udelay_loop_wait_low w/ rtl_reset_counters_cond, even when called from rtl8169_update_counters, where it should call rtl_udelay_loop_wait_low w/ rtl_counters_cond to check for the CounterDump flag, rather than for the CounterReset flag. For now I applied the below patch, but I think it's a bit ugly to conditionalize the rtl_cond struct by the incoming flag value. Corinna diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c index ae07567..cd7adbf 100644 --- a/drivers/net/ethernet/realtek/r8169.c +++ b/drivers/net/ethernet/realtek/r8169.c @@ -2200,6 +2200,13 @@ DECLARE_RTL_COND(rtl_reset_counters_cond) return RTL_R32(CounterAddrLow) & CounterReset; } +DECLARE_RTL_COND(rtl_counters_cond) +{ + void __iomem *ioaddr = tp->mmio_addr; + + return RTL_R32(CounterAddrLow) & CounterDump; +} + static int rtl8169_cmd_counters(struct net_device *dev, u32 counter_cmd) { struct rtl8169_private *tp = netdev_priv(dev); @@ -2212,7 +2219,10 @@ static int rtl8169_cmd_counters(struct net_device *dev, u32 counter_cmd) RTL_W32(CounterAddrLow, cmd); RTL_W32(CounterAddrLow, cmd | counter_cmd); - return rtl_udelay_loop_wait_low(tp, _reset_counters_cond, 10, 1000); + return rtl_udelay_loop_wait_low(tp, + counter_cmd == CounterDump ? _counters_cond : +_reset_counters_cond, + 10, 1000); } static int rtl8169_reset_counters(struct net_device *dev) @@ -2229,13 +2239,6 @@ static int rtl8169_reset_counters(struct net_device *dev) return rtl8169_cmd_counters(dev, CounterReset); } -DECLARE_RTL_COND(rtl_counters_cond) -{ - void __iomem *ioaddr = tp->mmio_addr; - - return RTL_R32(CounterAddrLow) & CounterDump; -} - static int rtl8169_update_counters(struct net_device *dev) { struct rtl8169_privat
Re: [PATCH net 1/1] r8169: fix sleepable allocation during netdevice stats retrieval.
On Sep 4 22:59, Francois Romieu wrote: > net/core/net-sysfs.c::netstat_show fetches statistics under dev_base_lock. > > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=104031 > Fixes: 6e85d5ad36a2 ("r8169: Add values missing in @get_stats64 from HW > counters") > Signed-off-by: Francois Romieu <rom...@fr.zoreil.com> > Cc: Corinna Vinschen <vinsc...@redhat.com> > --- > > Applies against davem's net as of f1ccbfce2fc787981d1182d09c1f6b67766783a8. > > drivers/net/ethernet/realtek/r8169.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/net/ethernet/realtek/r8169.c > b/drivers/net/ethernet/realtek/r8169.c > index 24dcbe6..56829ea 100644 > --- a/drivers/net/ethernet/realtek/r8169.c > +++ b/drivers/net/ethernet/realtek/r8169.c > @@ -2200,7 +2200,7 @@ static struct rtl8169_counters > *rtl8169_map_counters(struct net_device *dev, > struct rtl8169_counters *counters; > u32 cmd; > > - counters = dma_alloc_coherent(d, sizeof(*counters), paddr, GFP_KERNEL); > + counters = dma_alloc_coherent(d, sizeof(*counters), paddr, GFP_ATOMIC); > if (counters) { > RTL_W32(CounterAddrHigh, (u64)*paddr >> 32); > cmd = (u64)*paddr & DMA_BIT_MASK(32); > -- > 2.4.3 I'll have a look tomorrow what I'm back to work, but for the time being, two questions. - The code worked for me in local testing without problem, so I'm rather puzzled. How can I reproduce the problem? - I'm pretty new to this stuff, so I don't understand this: The dma_alloc_coherent(...,GFP_KERNEL) call is not new in the code, it's there since at least 2010. It appears to work fine in the context of @get_ethtool_stats. Why does this not work in the context of @ndo_get_stats64? Thanks, Corinna pgpdO7Z2CLpTf.pgp Description: PGP signature
Re: [PATCH net 3/3] r8169: increase the lifespan of the hardware counters dump area.
On Sep 5 14:18, rom...@fr.zoreil.com wrote: > From: Francois Romieu> > net/core/net-sysfs.c::netstat_show retrieves stats with spinlock held. > > This change avoids sleepable allocation and performs some housekeeping: > - receive ring, transmit ring and counters dump area allocation failures > are all considered fatal during open() > - netif_warn is now redundant with rtl_reset_counters_cond built-in > failure message. > - rtl_reset_counters_cond induced failures in open() are also considered > fatal: it takes acceptable work to unwind comfortably. Why? The counter reset is not a fatal condition for the operation of the NIC. Even if the counters show incorrect values, the normal operation of the NIC is not affected. And to top that off, even if resetting the counters actually doesn't work, the driver keeps the offset values, so a failed reset won't even harm the statistics. It just doesn't make sense to break the NIC operation for such a minor problem. And worse: > -static bool rtl8169_reset_counters(struct net_device *dev) > +static int rtl8169_reset_counters(struct net_device *dev) > { > struct rtl8169_private *tp = netdev_priv(dev); > - struct rtl8169_counters *counters; > - dma_addr_t paddr; > - bool ret = true; > > /* >* Versions prior to RTL_GIGA_MAC_VER_19 don't support resetting the >* tally counters. >*/ > if (tp->mac_version < RTL_GIGA_MAC_VER_19) > - return true; > - > - counters = rtl8169_map_counters(dev, , CounterReset); > - if (!counters) > - return false; > - > - if (!rtl_udelay_loop_wait_low(tp, _reset_counters_cond, 10, 1000)) > - ret = false; > - > - rtl8169_unmap_counters(dev, paddr, counters); > + return -EINVAL; This returns -EINVAL even for older chip versions which are not capable of resetting the counters. The result is, this driver will not work at all on chip versions prior to RTL_GIGA_MAC_VER_19 anymore, because rtl_open will always fail. Other then that, I can test the patch probably tomorrow. Corinna pgpcwnfNo06nh.pgp Description: PGP signature
Re: [PATCH net 1/3] r8169: decouple the counters data and the device private area.
On Sep 5 14:18, rom...@fr.zoreil.com wrote: > From: Francois Romieu> @@ -2335,24 +2337,25 @@ static void rtl8169_get_ethtool_stats(struct > net_device *dev, > struct ethtool_stats *stats, u64 *data) > { > struct rtl8169_private *tp = netdev_priv(dev); > + struct rtl8169_counters *c = tp->counters; Minor style nit: Shouldn't that local var ideally be called "counters" as you did in the other functions? Corinna pgpRd7HUBOTob.pgp Description: PGP signature
[PATCH net-next] r8169: Add software counter for multicast packages
The multicast hardware counter on 8168/8111 chips is only 32 bit while the statistics in struct rtnl_link_stats64 are 64 bit. Given that statistics are requested on an irregular basis, an overflow of the hardware counter can go unnoticed. To count even very large numbers of multicast packets reliably, add a software counter and remove previously applied code to fill the multicast field requested by @rtl8169_get_stats64 with the values read from the rx_multicast hardware counter. Signed-off-by: Corinna Vinschen vinsc...@redhat.com --- drivers/net/ethernet/realtek/r8169.c | 9 - 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c index d6d39df..24dcbe6 100644 --- a/drivers/net/ethernet/realtek/r8169.c +++ b/drivers/net/ethernet/realtek/r8169.c @@ -754,7 +754,6 @@ struct rtl8169_tc_offsets { boolinited; __le64 tx_errors; __le32 tx_multi_collision; - __le32 rx_multicast; __le16 tx_aborted; }; @@ -2326,7 +2325,6 @@ static bool rtl8169_init_counter_offsets(struct net_device *dev) tp-tc_offset.tx_errors = tp-counters.tx_errors; tp-tc_offset.tx_multi_collision = tp-counters.tx_multi_collision; - tp-tc_offset.rx_multicast = tp-counters.rx_multicast; tp-tc_offset.tx_aborted = tp-counters.tx_aborted; tp-tc_offset.inited = true; @@ -7480,6 +7478,9 @@ process_pkt: tp-rx_stats.packets++; tp-rx_stats.bytes += pkt_size; u64_stats_update_end(tp-rx_stats.syncp); + + if (skb-pkt_type == PACKET_MULTICAST) + dev-stats.multicast++; } release_descriptor: desc-opts2 = 0; @@ -7790,7 +7791,6 @@ rtl8169_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats) stats-rx_bytes = tp-rx_stats.bytes; } while (u64_stats_fetch_retry_irq(tp-rx_stats.syncp, start)); - do { start = u64_stats_fetch_begin_irq(tp-tx_stats.syncp); stats-tx_packets = tp-tx_stats.packets; @@ -7804,6 +7804,7 @@ rtl8169_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats) stats-rx_crc_errors= dev-stats.rx_crc_errors; stats-rx_fifo_errors = dev-stats.rx_fifo_errors; stats-rx_missed_errors = dev-stats.rx_missed_errors; + stats-multicast= dev-stats.multicast; /* * Fetch additonal counter values missing in stats collected by driver @@ -7819,8 +7820,6 @@ rtl8169_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats) le64_to_cpu(tp-tc_offset.tx_errors); stats-collisions = le32_to_cpu(tp-counters.tx_multi_collision) - le32_to_cpu(tp-tc_offset.tx_multi_collision); - stats-multicast = le32_to_cpu(tp-counters.rx_multicast) - - le32_to_cpu(tp-tc_offset.rx_multicast); stats-tx_aborted_errors = le16_to_cpu(tp-counters.tx_aborted) - le16_to_cpu(tp-tc_offset.tx_aborted); -- 2.1.0 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 net-next] r8169: Add values missing in @get_stats64 from HW counters
On Aug 26 00:54, Francois Romieu wrote: David Miller da...@davemloft.net : [...] Your counter offsets should be read at probe time, not open time. It can be done but the CmdRxEnb / rx traffic must be enabled constraint will make it a major pita. Reading counter offsets at the end of open() naturally solves this constraint (retentive error unwinding in opne() stops being completely trivial though :o/ ). Bringing the interface is brought down/up should not reset the counters. Afaiks rtl8169_tc_offsets.inited in rtl8169_init_counter_offsets takes care of it: it's set during the first open() after probe(). Looking at it again, the patch directly stores 16 and 32 bit values in rtnl_link_stats64. Nobody should care about exact exceedingly high error count but rx_multicast ought to be accumulated. I'll have a look into that for a followup patch. Thanks, Corinna pgpgKxmrSJF6p.pgp Description: PGP signature
Re: [PATCH v4 net-next] r8169: Add values missing in @get_stats64 from HW counters
On Aug 25 16:03, David Miller wrote: From: David Miller da...@davemloft.net Date: Tue, 25 Aug 2015 15:59:21 -0700 (PDT) From: Francois Romieu rom...@fr.zoreil.com Date: Wed, 26 Aug 2015 00:54:06 +0200 Bringing the interface is brought down/up should not reset the counters. Afaiks rtl8169_tc_offsets.inited in rtl8169_init_counter_offsets takes care of it: it's set during the first open() after probe(). Ok, then it's fine. And as such I've applied this patch, thanks. Thanks, Corinna pgpWiElqoUtc5.pgp Description: PGP signature
Re: [PATCH v2 net-next] r8169: Add values missing in @get_stats64 from HW counters
On Aug 25 01:33, Francois Romieu wrote: Corinna Vinschen vinsc...@redhat.com : On Aug 22 13:23, Francois Romieu wrote: [...] Sorry, my english was really bad: the code should propagate failure when rtl8169_reset_counters and rtl8169_update_counters *simultaneously* fail. Uhm... sorry, but that still doesn't answer the question. As you can see in my patch, the initalization at open time is already encapsulated in a function rtl8169_init_counter_offsets. I have read your patch, I have already answered the question and I have already said that it wasn't a showstopper. Ok. I hope v4 of the patch is ok? Thanks, Corinna pgpNpx7FA2Ait.pgp Description: PGP signature
Re: [PATCH v2 net-next] r8169: Add values missing in @get_stats64 from HW counters
On Aug 22 13:23, Francois Romieu wrote: Corinna Vinschen vinsc...@redhat.com : [...] That won't happen with the current patch because only rtl8169_reset_counters would print a log message, it's only called from open, and open occurs rather seldom. Atop of that the code only tries to reset counters on HW supporting it, and only if resetting on the HW fails, there will be a log message at all. There's no reasonable chance that failing to reset the counters will lead to log flooding. Thanks for reformulating it. We are in violent agreement here. [...] I'm not trying to avoid work, I'm trying to understand. As far as I see it failing to reset the counters has no impact on the viability of the code. It's still working with offsets and if the offset is 0 or non-0, the user space won't see the difference in the values returned by @get_stats64. Successful resetting the counters is just a bonus. Sorry, my english was really bad: the code should propagate failure when rtl8169_reset_counters and rtl8169_update_counters *simultaneously* fail. Uhm... sorry, but that still doesn't answer the question. As you can see in my patch, the initalization at open time is already encapsulated in a function rtl8169_init_counter_offsets. Assuming rtl8169_init_counter_offsets returns -1 if both functions, rtl8169_reset_counters and rtl8169_update_counters fail. Then... what? Not being able to reset or update the counters is still not at all fatal for the operation of the NIC as a whole and rtl_open in particular: rtl_open() { [...] /* This is non-fatal. */ if (!rtl8169_init_counter_offsets ()) { /* What to do here??? */ } [...] } Corinna pgpg0O3EnYyyX.pgp Description: PGP signature
[PATCH v3 net-next] r8169: Add values missing in @get_stats64 from HW counters
The r8169 driver collects statistical information returned by @get_stats64 by counting them in the driver itself, even though many (but not all) of the values are already collected by tally counters (TCs) in the NIC. Some of these TC values are not returned by @get_stats64. Especially the received multicast packages are missing from /proc/net/dev. Rectify this by fetching the TCs and returning them from rtl8169_get_stats64. The counters collected in the driver obviously disappear as soon as the driver is unloaded so after a driver is loaded the counters always start at 0. The TCs on the other hand are only reset by a power cycle. Without further considerations the values collected by the driver would not match up against the TC values. This patch introduces a new function rtl8169_reset_counters which resets the TCs. Also, since rtl8169_reset_counters shares most of its code with rtl8169_update_counters, refactor the shared code into two new functions rtl8169_map_counters and rtl8169_unmap_counters. Unfortunately chip versions prior to RTL_GIGA_MAC_VER_19 don't allow to reset the TCs programatically. Therefore introduce an addition to the rtl8169_private struct and a function rtl8169_init_counter_offsets to store the TCs at first rtl_open. Use these values as offsets in rtl8169_get_stats64. Propagate a failure to reset *and* update the counters up to rtl_open and emit a warning message, if so. Signed-off-by: Corinna Vinschen vinsc...@redhat.com --- drivers/net/ethernet/realtek/r8169.c | 165 +++ 1 file changed, 150 insertions(+), 15 deletions(-) diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c index f790f61..1e4322c 100644 --- a/drivers/net/ethernet/realtek/r8169.c +++ b/drivers/net/ethernet/realtek/r8169.c @@ -637,6 +637,9 @@ enum rtl_register_content { /* _TBICSRBit */ TBILinkOK = 0x0200, + /* ResetCounterCommand */ + CounterReset= 0x1, + /* DumpCounterCommand */ CounterDump = 0x8, @@ -747,6 +750,14 @@ struct rtl8169_counters { __le16 tx_underun; }; +struct rtl8169_tc_offsets { + boolinited; + __le64 tx_errors; + __le32 tx_multi_collision; + __le32 rx_multicast; + __le16 tx_aborted; +}; + enum rtl_flag { RTL_FLAG_TASK_ENABLED, RTL_FLAG_TASK_SLOW_PENDING, @@ -824,6 +835,7 @@ struct rtl8169_private { struct mii_if_info mii; struct rtl8169_counters counters; + struct rtl8169_tc_offsets tc_offset; u32 saved_wolopts; u32 opts1_mask; @@ -2179,6 +2191,73 @@ static int rtl8169_get_sset_count(struct net_device *dev, int sset) } } +static struct rtl8169_counters *rtl8169_map_counters(struct net_device *dev, +dma_addr_t *paddr, +u32 counter_cmd) +{ + struct rtl8169_private *tp = netdev_priv(dev); + void __iomem *ioaddr = tp-mmio_addr; + struct device *d = tp-pci_dev-dev; + struct rtl8169_counters *counters; + u32 cmd; + + counters = dma_alloc_coherent(d, sizeof(*counters), paddr, GFP_KERNEL); + if (counters) { + RTL_W32(CounterAddrHigh, (u64)*paddr 32); + cmd = (u64)*paddr DMA_BIT_MASK(32); + RTL_W32(CounterAddrLow, cmd); + RTL_W32(CounterAddrLow, cmd | counter_cmd); + } + return counters; +} + +static void rtl8169_unmap_counters (struct net_device *dev, + dma_addr_t paddr, + struct rtl8169_counters *counters) +{ + struct rtl8169_private *tp = netdev_priv(dev); + void __iomem *ioaddr = tp-mmio_addr; + struct device *d = tp-pci_dev-dev; + + RTL_W32(CounterAddrLow, 0); + RTL_W32(CounterAddrHigh, 0); + + dma_free_coherent(d, sizeof(*counters), counters, paddr); +} + +DECLARE_RTL_COND(rtl_reset_counters_cond) +{ + void __iomem *ioaddr = tp-mmio_addr; + + return RTL_R32(CounterAddrLow) CounterReset; +} + +static bool rtl8169_reset_counters(struct net_device *dev) +{ + struct rtl8169_private *tp = netdev_priv(dev); + struct rtl8169_counters *counters; + dma_addr_t paddr; + bool ret = true; + + /* +* Versions prior to RTL_GIGA_MAC_VER_19 don't support resetting the +* tally counters. +*/ + if (tp-mac_version RTL_GIGA_MAC_VER_19) + return true; + + counters = rtl8169_map_counters(dev, paddr, CounterReset); + if (!counters) + return false; + + if (!rtl_udelay_loop_wait_low(tp, rtl_reset_counters_cond, 10, 1000)) + ret = false; + + rtl8169_unmap_counters(dev, paddr, counters); + + return ret; +} + DECLARE_RTL_COND(rtl_counters_cond) { void __iomem *ioaddr = tp-mmio_addr; @@ -2186,38
Re: [PATCH v2 net-next] r8169: Add values missing in @get_stats64 from HW counters
On Aug 24 09:33, Corinna Vinschen wrote: On Aug 22 13:23, Francois Romieu wrote: Corinna Vinschen vinsc...@redhat.com : [...] That won't happen with the current patch because only rtl8169_reset_counters would print a log message, it's only called from open, and open occurs rather seldom. Atop of that the code only tries to reset counters on HW supporting it, and only if resetting on the HW fails, there will be a log message at all. There's no reasonable chance that failing to reset the counters will lead to log flooding. Thanks for reformulating it. We are in violent agreement here. [...] I'm not trying to avoid work, I'm trying to understand. As far as I see it failing to reset the counters has no impact on the viability of the code. It's still working with offsets and if the offset is 0 or non-0, the user space won't see the difference in the values returned by @get_stats64. Successful resetting the counters is just a bonus. Sorry, my english was really bad: the code should propagate failure when rtl8169_reset_counters and rtl8169_update_counters *simultaneously* fail. Uhm... sorry, but that still doesn't answer the question. As you can see in my patch, the initalization at open time is already encapsulated in a function rtl8169_init_counter_offsets. Assuming rtl8169_init_counter_offsets returns -1 if both functions, rtl8169_reset_counters and rtl8169_update_counters fail. Then... what? Not being able to reset or update the counters is still not at all fatal for the operation of the NIC as a whole and rtl_open in particular: rtl_open() { [...] /* This is non-fatal. */ if (!rtl8169_init_counter_offsets ()) { /* What to do here??? */ } [...] } Never mind, I rearranged the code so that the warning is now printed in rtl_open. All functions related to the counter reset/update are now bool functions. I just sent a v3 patch. Corinna pgp7HuqR5UDnR.pgp Description: PGP signature
[PATCH v4 net-next] r8169: Add values missing in @get_stats64 from HW counters
The r8169 driver collects statistical information returned by @get_stats64 by counting them in the driver itself, even though many (but not all) of the values are already collected by tally counters (TCs) in the NIC. Some of these TC values are not returned by @get_stats64. Especially the received multicast packages are missing from /proc/net/dev. Rectify this by fetching the TCs and returning them from rtl8169_get_stats64. The counters collected in the driver obviously disappear as soon as the driver is unloaded so after a driver is loaded the counters always start at 0. The TCs on the other hand are only reset by a power cycle. Without further considerations the values collected by the driver would not match up against the TC values. This patch introduces a new function rtl8169_reset_counters which resets the TCs. Also, since rtl8169_reset_counters shares most of its code with rtl8169_update_counters, refactor the shared code into two new functions rtl8169_map_counters and rtl8169_unmap_counters. Unfortunately chip versions prior to RTL_GIGA_MAC_VER_19 don't allow to reset the TCs programatically. Therefore introduce an addition to the rtl8169_private struct and a function rtl8169_init_counter_offsets to store the TCs at first rtl_open. Use these values as offsets in rtl8169_get_stats64. Propagate a failure to reset *and* update the counters up to rtl_open and emit a warning message, if so. Signed-off-by: Corinna Vinschen vinsc...@redhat.com --- [v3 - v4]: Remove useless rtl8169_close chunk changing case of comment drivers/net/ethernet/realtek/r8169.c | 163 --- 1 file changed, 149 insertions(+), 14 deletions(-) diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c index f790f61..d6d39df 100644 --- a/drivers/net/ethernet/realtek/r8169.c +++ b/drivers/net/ethernet/realtek/r8169.c @@ -637,6 +637,9 @@ enum rtl_register_content { /* _TBICSRBit */ TBILinkOK = 0x0200, + /* ResetCounterCommand */ + CounterReset= 0x1, + /* DumpCounterCommand */ CounterDump = 0x8, @@ -747,6 +750,14 @@ struct rtl8169_counters { __le16 tx_underun; }; +struct rtl8169_tc_offsets { + boolinited; + __le64 tx_errors; + __le32 tx_multi_collision; + __le32 rx_multicast; + __le16 tx_aborted; +}; + enum rtl_flag { RTL_FLAG_TASK_ENABLED, RTL_FLAG_TASK_SLOW_PENDING, @@ -824,6 +835,7 @@ struct rtl8169_private { struct mii_if_info mii; struct rtl8169_counters counters; + struct rtl8169_tc_offsets tc_offset; u32 saved_wolopts; u32 opts1_mask; @@ -2179,6 +2191,73 @@ static int rtl8169_get_sset_count(struct net_device *dev, int sset) } } +static struct rtl8169_counters *rtl8169_map_counters(struct net_device *dev, +dma_addr_t *paddr, +u32 counter_cmd) +{ + struct rtl8169_private *tp = netdev_priv(dev); + void __iomem *ioaddr = tp-mmio_addr; + struct device *d = tp-pci_dev-dev; + struct rtl8169_counters *counters; + u32 cmd; + + counters = dma_alloc_coherent(d, sizeof(*counters), paddr, GFP_KERNEL); + if (counters) { + RTL_W32(CounterAddrHigh, (u64)*paddr 32); + cmd = (u64)*paddr DMA_BIT_MASK(32); + RTL_W32(CounterAddrLow, cmd); + RTL_W32(CounterAddrLow, cmd | counter_cmd); + } + return counters; +} + +static void rtl8169_unmap_counters (struct net_device *dev, + dma_addr_t paddr, + struct rtl8169_counters *counters) +{ + struct rtl8169_private *tp = netdev_priv(dev); + void __iomem *ioaddr = tp-mmio_addr; + struct device *d = tp-pci_dev-dev; + + RTL_W32(CounterAddrLow, 0); + RTL_W32(CounterAddrHigh, 0); + + dma_free_coherent(d, sizeof(*counters), counters, paddr); +} + +DECLARE_RTL_COND(rtl_reset_counters_cond) +{ + void __iomem *ioaddr = tp-mmio_addr; + + return RTL_R32(CounterAddrLow) CounterReset; +} + +static bool rtl8169_reset_counters(struct net_device *dev) +{ + struct rtl8169_private *tp = netdev_priv(dev); + struct rtl8169_counters *counters; + dma_addr_t paddr; + bool ret = true; + + /* +* Versions prior to RTL_GIGA_MAC_VER_19 don't support resetting the +* tally counters. +*/ + if (tp-mac_version RTL_GIGA_MAC_VER_19) + return true; + + counters = rtl8169_map_counters(dev, paddr, CounterReset); + if (!counters) + return false; + + if (!rtl_udelay_loop_wait_low(tp, rtl_reset_counters_cond, 10, 1000)) + ret = false; + + rtl8169_unmap_counters(dev, paddr, counters); + + return ret; +} + DECLARE_RTL_COND
Re: [PATCH v2 net-next] r8169: Add values missing in @get_stats64 from HW counters
Sorry, I forgot to mention that I tested this patch on three different chip versions, RTL_GIGA_MAC_VER_23, RTL_GIGA_MAC_VER_33 and RTL_GIGA_MAC_VER_35. I couldn't test on pre-RTL_GIGA_MAC_VER_19, but the offset handling without counter reset already worked as expected on later chip versions, so I'm pretty confident that older chip versions should work accordingly. On Aug 21 12:09, Corinna Vinschen wrote: The r8169 driver collects statistical information returned by @get_stats64 by counting them in the driver itself, even though many (but not all) of the values are already collected by tally counters (TCs) in the NIC. Some of these TC values are not returned by @get_stats64. Especially the received multicast packages are missing from /proc/net/dev. Rectify this by fetching the TCs and returning them from rtl8169_get_stats64. The counters collected in the driver obviously disappear as soon as the driver is unloaded so after a driver is loaded the counters always start at 0. The TCs on the other hand are only reset by a power cycle. Without further considerations the values collected by the driver would not match up against the TC values. This patch introduces a new function rtl8169_reset_counters which resets the TCs. Unfortunately chip versions prior to RTL_GIGA_MAC_VER_19 don't allow to reset the TCs programatically. Therefore introduce an addition to the rtl8169_private struct and a function rtl8169_init_counter_offsets to store the TCs at first rtl_open. Use these values as offsets in rtl8169_get_stats64. Signed-off-by: Corinna Vinschen vinsc...@redhat.com --- drivers/net/ethernet/realtek/r8169.c | 107 +++ 1 file changed, 107 insertions(+) diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c index f790f61..f26a48d 100644 --- a/drivers/net/ethernet/realtek/r8169.c +++ b/drivers/net/ethernet/realtek/r8169.c @@ -637,6 +637,9 @@ enum rtl_register_content { /* _TBICSRBit */ TBILinkOK = 0x0200, + /* ResetCounterCommand */ + CounterReset= 0x1, + /* DumpCounterCommand */ CounterDump = 0x8, @@ -747,6 +750,14 @@ struct rtl8169_counters { __le16 tx_underun; }; +struct rtl8169_tc_offsets { + boolinited; + __le64 tx_errors; + __le32 tx_multi_collision; + __le32 rx_multicast; + __le16 tx_aborted; +}; + enum rtl_flag { RTL_FLAG_TASK_ENABLED, RTL_FLAG_TASK_SLOW_PENDING, @@ -824,6 +835,7 @@ struct rtl8169_private { struct mii_if_info mii; struct rtl8169_counters counters; + struct rtl8169_tc_offsets tc_offset; u32 saved_wolopts; u32 opts1_mask; @@ -2179,6 +2191,47 @@ static int rtl8169_get_sset_count(struct net_device *dev, int sset) } } +DECLARE_RTL_COND(rtl_reset_counters_cond) +{ + void __iomem *ioaddr = tp-mmio_addr; + + return RTL_R32(CounterAddrLow) CounterReset; +} + +static void rtl8169_reset_counters(struct net_device *dev) +{ + struct rtl8169_private *tp = netdev_priv(dev); + void __iomem *ioaddr = tp-mmio_addr; + struct device *d = tp-pci_dev-dev; + struct rtl8169_counters *counters; + dma_addr_t paddr; + u32 cmd; + + /* + * Versions prior to RTL_GIGA_MAC_VER_19 don't support resetting the + * tally counters. + */ + if (tp-mac_version RTL_GIGA_MAC_VER_19) + return; + + counters = dma_alloc_coherent(d, sizeof(*counters), paddr, GFP_KERNEL); + if (!counters) + return; + + RTL_W32(CounterAddrHigh, (u64)paddr 32); + cmd = (u64)paddr DMA_BIT_MASK(32); + RTL_W32(CounterAddrLow, cmd); + RTL_W32(CounterAddrLow, cmd | CounterReset); + + if (!rtl_udelay_loop_wait_low(tp, rtl_reset_counters_cond, 10, 1000)) + netif_warn(tp, hw, dev, counter reset failed\n); + + RTL_W32(CounterAddrLow, 0); + RTL_W32(CounterAddrHigh, 0); + + dma_free_coherent(d, sizeof(*counters), counters, paddr); +} + DECLARE_RTL_COND(rtl_counters_cond) { void __iomem *ioaddr = tp-mmio_addr; @@ -2220,6 +2273,39 @@ static void rtl8169_update_counters(struct net_device *dev) dma_free_coherent(d, sizeof(*counters), counters, paddr); } +static void rtl8169_init_counter_offsets(struct net_device *dev) +{ + struct rtl8169_private *tp = netdev_priv(dev); + + /* + * rtl8169_init_counter_offsets is called from rtl_open. On chip + * versions prior to RTL_GIGA_MAC_VER_19 the tally counters are only + * reset by a power cycle, while the counter values collected by the + * driver are reset at every driver unload/load cycle. + * + * To make sure the HW values returned by @get_stats64 match the SW + * values, we collect the initial values at first open(*) and use them + * as offsets to normalize the values
Re: [PATCH v2 net-next] r8169: Add values missing in @get_stats64 from HW counters
On Aug 21 21:39, Francois Romieu wrote: Corinna Vinschen vinsc...@redhat.com : [...] diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c index f790f61..f26a48d 100644 --- a/drivers/net/ethernet/realtek/r8169.c +++ b/drivers/net/ethernet/realtek/r8169.c [...] @@ -2179,6 +2191,47 @@ static int rtl8169_get_sset_count(struct net_device *dev, int sset) } } +DECLARE_RTL_COND(rtl_reset_counters_cond) +{ + void __iomem *ioaddr = tp-mmio_addr; + + return RTL_R32(CounterAddrLow) CounterReset; +} + +static void rtl8169_reset_counters(struct net_device *dev) +{ rtl8169_reset_counters duplicates most of rtl8169_update_counters. Please factor out the dma_alloc + parametrized CounterAddrLow write + cleanup. Ok, will do. + rtl8169_reset_counters(dev); + + rtl8169_update_counters(dev); The code should propagate failure when both rtl8169_reset_counters and rtl8169_update_counters fail. This one I don't understand. Neither failing to reset the counters nor failing to update the counters is fatal for the driver. So far the (unchanged) rtl8169_update_counters doesn't even print a log message, while a failing reset in rtl8169_reset_counters now does. Why is that not sufficent? Thanks, Corinna pgpKL0AKrvb57.pgp Description: PGP signature
[PATCH v2 net-next] r8169: Add values missing in @get_stats64 from HW counters
The r8169 driver collects statistical information returned by @get_stats64 by counting them in the driver itself, even though many (but not all) of the values are already collected by tally counters (TCs) in the NIC. Some of these TC values are not returned by @get_stats64. Especially the received multicast packages are missing from /proc/net/dev. Rectify this by fetching the TCs and returning them from rtl8169_get_stats64. The counters collected in the driver obviously disappear as soon as the driver is unloaded so after a driver is loaded the counters always start at 0. The TCs on the other hand are only reset by a power cycle. Without further considerations the values collected by the driver would not match up against the TC values. This patch introduces a new function rtl8169_reset_counters which resets the TCs. Unfortunately chip versions prior to RTL_GIGA_MAC_VER_19 don't allow to reset the TCs programatically. Therefore introduce an addition to the rtl8169_private struct and a function rtl8169_init_counter_offsets to store the TCs at first rtl_open. Use these values as offsets in rtl8169_get_stats64. Signed-off-by: Corinna Vinschen vinsc...@redhat.com --- drivers/net/ethernet/realtek/r8169.c | 107 +++ 1 file changed, 107 insertions(+) diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c index f790f61..f26a48d 100644 --- a/drivers/net/ethernet/realtek/r8169.c +++ b/drivers/net/ethernet/realtek/r8169.c @@ -637,6 +637,9 @@ enum rtl_register_content { /* _TBICSRBit */ TBILinkOK = 0x0200, + /* ResetCounterCommand */ + CounterReset= 0x1, + /* DumpCounterCommand */ CounterDump = 0x8, @@ -747,6 +750,14 @@ struct rtl8169_counters { __le16 tx_underun; }; +struct rtl8169_tc_offsets { + boolinited; + __le64 tx_errors; + __le32 tx_multi_collision; + __le32 rx_multicast; + __le16 tx_aborted; +}; + enum rtl_flag { RTL_FLAG_TASK_ENABLED, RTL_FLAG_TASK_SLOW_PENDING, @@ -824,6 +835,7 @@ struct rtl8169_private { struct mii_if_info mii; struct rtl8169_counters counters; + struct rtl8169_tc_offsets tc_offset; u32 saved_wolopts; u32 opts1_mask; @@ -2179,6 +2191,47 @@ static int rtl8169_get_sset_count(struct net_device *dev, int sset) } } +DECLARE_RTL_COND(rtl_reset_counters_cond) +{ + void __iomem *ioaddr = tp-mmio_addr; + + return RTL_R32(CounterAddrLow) CounterReset; +} + +static void rtl8169_reset_counters(struct net_device *dev) +{ + struct rtl8169_private *tp = netdev_priv(dev); + void __iomem *ioaddr = tp-mmio_addr; + struct device *d = tp-pci_dev-dev; + struct rtl8169_counters *counters; + dma_addr_t paddr; + u32 cmd; + + /* +* Versions prior to RTL_GIGA_MAC_VER_19 don't support resetting the +* tally counters. +*/ + if (tp-mac_version RTL_GIGA_MAC_VER_19) + return; + + counters = dma_alloc_coherent(d, sizeof(*counters), paddr, GFP_KERNEL); + if (!counters) + return; + + RTL_W32(CounterAddrHigh, (u64)paddr 32); + cmd = (u64)paddr DMA_BIT_MASK(32); + RTL_W32(CounterAddrLow, cmd); + RTL_W32(CounterAddrLow, cmd | CounterReset); + + if (!rtl_udelay_loop_wait_low(tp, rtl_reset_counters_cond, 10, 1000)) + netif_warn(tp, hw, dev, counter reset failed\n); + + RTL_W32(CounterAddrLow, 0); + RTL_W32(CounterAddrHigh, 0); + + dma_free_coherent(d, sizeof(*counters), counters, paddr); +} + DECLARE_RTL_COND(rtl_counters_cond) { void __iomem *ioaddr = tp-mmio_addr; @@ -2220,6 +2273,39 @@ static void rtl8169_update_counters(struct net_device *dev) dma_free_coherent(d, sizeof(*counters), counters, paddr); } +static void rtl8169_init_counter_offsets(struct net_device *dev) +{ + struct rtl8169_private *tp = netdev_priv(dev); + + /* +* rtl8169_init_counter_offsets is called from rtl_open. On chip +* versions prior to RTL_GIGA_MAC_VER_19 the tally counters are only +* reset by a power cycle, while the counter values collected by the +* driver are reset at every driver unload/load cycle. +* +* To make sure the HW values returned by @get_stats64 match the SW +* values, we collect the initial values at first open(*) and use them +* as offsets to normalize the values returned by @get_stats64. +* +* (*) We can't call rtl8169_init_counter_offsets from rtl_init_one +* for the reason stated in rtl8169_update_counters; CmdRxEnb is only +* set at open time by rtl_hw_start. +*/ + + if (tp-tc_offset.inited) + return; + + rtl8169_reset_counters(dev); + + rtl8169_update_counters(dev); + + tp-tc_offset.tx_errors = tp
Re: [PATCH] r8169: Add values missing in @get_stats64 from HW counters
On Aug 20 02:43, Hayes Wang wrote: Corinna Vinschen [mailto:vinsc...@redhat.com] Sent: Thursday, August 20, 2015 3:24 AM [...] + /* +* Versions prior to RTL_GIGA_MAC_VER_19 don't support resetting the +* tally counters. +*/ + if (tp-mac_version = RTL_GIGA_MAC_VER_19) { + RTL_W32(CounterAddrHigh, 0); + RTL_W32(CounterAddrLow, CounterReset); I check these with our engineers, and they say the bit 6 ~ 63 should be the valid 64 byte alignment memory address. Although you don’t want to dump the counters, the hw may also clear the data in the memory which is indicated by bit 6 ~ 63, when you reset the counters. Ok, that's easy enough to implement. What about CmdRxEnb? Are there chips which need this flag set to perform the counter reset? Thanks, Corinna pgpGrVZNY4vq8.pgp Description: PGP signature
Re: [PATCH] r8169: Add values missing in @get_stats64 from HW counters
On Aug 19 09:31, Hayes Wang wrote: Corinna Vinschen [mailto:vinsc...@redhat.com] Sent: Wednesday, August 19, 2015 5:13 PM [...] It could be cleared by setting bit 0, such as rtl_tally_reset() of r8152. Is it safe to assume that this is implemented in all NICs covered by r8169? It is supported from RTL8111C. That is, RTL_GIGA_MAC_VER_19 and later. Thanks. In that case I would prefer the same generic method for all chip versions, so I'd opt for storing the offset values at rtl_open time as my patch is doing right now. Is that acceptable? If so, wouldn't it make even more sense to use the hardware collected information in @get_stats64 throughout, except for the numbers collected *only* in software? I would be willing to propose a matching patch. Thanks, Corinna pgpJydjgGbJe7.pgp Description: PGP signature
Re: [PATCH] r8169: Add values missing in @get_stats64 from HW counters
On Aug 19 15:07, Corinna Vinschen wrote: On Aug 19 09:31, Hayes Wang wrote: Corinna Vinschen [mailto:vinsc...@redhat.com] Sent: Wednesday, August 19, 2015 5:13 PM [...] It could be cleared by setting bit 0, such as rtl_tally_reset() of r8152. Is it safe to assume that this is implemented in all NICs covered by r8169? It is supported from RTL8111C. That is, RTL_GIGA_MAC_VER_19 and later. Thanks. In that case I would prefer the same generic method for all chip versions, so I'd opt for storing the offset values at rtl_open time as my patch is doing right now. Is that acceptable? If so, wouldn't it make even more sense to use the hardware collected information in @get_stats64 throughout, except for the numbers collected *only* in software? I would be willing to propose a matching patch. It just occured to me that the combination of resetting the counters on post-RTL_GIGA_MAC_VER_19 chips plus offset handling would be quite nice, because it would reset also the small 16 and 32 bit counters. So I'd like to propose a patch which combines both techniques, if that's an acceptable way to go forward. Btw., does setting the reset bit in CounterAddrLow work the same way as setting the CounterDump flag? I.e, does the driver have to wait for the hardware to set the bit to 0 again to be sure the reset is finished? Thanks in advance, Corinna pgp8ULHO1_RPj.pgp Description: PGP signature
Re: [PATCH] r8169: Add values missing in @get_stats64 from HW counters
On Aug 19 16:24, Corinna Vinschen wrote: On Aug 19 15:07, Corinna Vinschen wrote: On Aug 19 09:31, Hayes Wang wrote: Corinna Vinschen [mailto:vinsc...@redhat.com] Sent: Wednesday, August 19, 2015 5:13 PM [...] It could be cleared by setting bit 0, such as rtl_tally_reset() of r8152. Is it safe to assume that this is implemented in all NICs covered by r8169? It is supported from RTL8111C. That is, RTL_GIGA_MAC_VER_19 and later. Thanks. In that case I would prefer the same generic method for all chip versions, so I'd opt for storing the offset values at rtl_open time as my patch is doing right now. Is that acceptable? If so, wouldn't it make even more sense to use the hardware collected information in @get_stats64 throughout, except for the numbers collected *only* in software? I would be willing to propose a matching patch. It just occured to me that the combination of resetting the counters on post-RTL_GIGA_MAC_VER_19 chips plus offset handling would be quite nice, because it would reset also the small 16 and 32 bit counters. So I'd like to propose a patch which combines both techniques, if that's an acceptable way to go forward. Btw., does setting the reset bit in CounterAddrLow work the same way as setting the CounterDump flag? I.e, does the driver have to wait for the hardware to set the bit to 0 again to be sure the reset is finished? Here's a preliminary implementation of the counter reset code. It's based on my previous patch. It calls the new rtl8169_reset_counters prior to rtl8169_update_counters from rtl_open. I tested the patch successfully on a RTL8111f NIC (RTL_GIGA_MAC_VER_35). This isn't an official patch submission, I'm just sending this for further discussion. I'll make a full patch submission if the code is acceptable. diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c index e7c7955..204f7e7 100644 --- a/drivers/net/ethernet/realtek/r8169.c +++ b/drivers/net/ethernet/realtek/r8169.c @@ -637,6 +637,9 @@ enum rtl_register_content { /* _TBICSRBit */ TBILinkOK = 0x0200, + /* ResetCounterCommand */ + CounterReset= 0x1, + /* DumpCounterCommand */ CounterDump = 0x8, @@ -2188,6 +2191,31 @@ static int rtl8169_get_sset_count(struct net_device *dev, int sset) } } +DECLARE_RTL_COND(rtl_reset_counters_cond) +{ + void __iomem *ioaddr = tp-mmio_addr; + + return RTL_R32(CounterAddrLow) CounterReset; +} + +static void rtl8169_reset_counters(struct net_device *dev) +{ + struct rtl8169_private *tp = netdev_priv(dev); + void __iomem *ioaddr = tp-mmio_addr; + + /* +* Versions prior to RTL_GIGA_MAC_VER_19 don't support resetting the +* tally counters. +*/ + if (tp-mac_version = RTL_GIGA_MAC_VER_19) { + RTL_W32(CounterAddrHigh, 0); + RTL_W32(CounterAddrLow, CounterReset); + if (!rtl_udelay_loop_wait_low(tp, rtl_reset_counters_cond, + 10, 1000)) + netif_warn(tp, hw, dev, counter reset failed\n); + } +} + DECLARE_RTL_COND(rtl_counters_cond) { void __iomem *ioaddr = tp-mmio_addr; @@ -2234,11 +2262,10 @@ static void rtl8169_init_tc_counter_offset(struct net_device *dev) struct rtl8169_private *tp = netdev_priv(dev); /* -* rtl8169_init_tc_counter_offset is called from rtl_open. The tally -* counters are only reset by a power cycle, while the counter values -* collected by the driver are reset at every driver unload/load cycle. -* There's also no (documented?) way to reset the tally counters -* programatically. +* rtl8169_init_tc_counter_offset is called from rtl_open. On chip +* versions prior to RTL_GIGA_MAC_VER_19 the tally counters are only +* reset by a power cycle, while the counter values collected by the +* driver are reset at every driver unload/load cycle. * * To make sure the HW values returned by @get_stats64 match the SW * values, we collect the initial values at first open(*) and use them @@ -2252,6 +2279,8 @@ static void rtl8169_init_tc_counter_offset(struct net_device *dev) if (tp-tc_offset.inited) return; + rtl8169_reset_counters(dev); + rtl8169_update_counters(dev); tp-tc_offset.tx_errors = tp-counters.tx_errors; Thanks, Corinna pgplEX60t_D0q.pgp Description: PGP signature
Re: [PATCH] r8169: Add values missing in @get_stats64 from HW counters
On Aug 19 02:51, Hayes Wang wrote: [...] The TCs are only reset by a power cycle and there's no known way to reset them programatically. It could be cleared by setting bit 0, such as rtl_tally_reset() of r8152. Thanks for this hint. I'll give it a try. Is it safe to assume that this is implemented in all NICs covered by r8169? Otherwise it's more safe to use offset values as in my patch, I think. Thanks, Corinna pgpLJr0cODr0S.pgp Description: PGP signature
Re: [PATCH] r8169: Add values missing in @get_stats64 from HW counters
On Aug 18 19:05, David Miller wrote: From: Francois Romieu rom...@fr.zoreil.com Date: Tue, 18 Aug 2015 23:40:17 +0200 Corinna Vinschen vinsc...@redhat.com : The r8169 driver collects statistical information returned by @get_stats64 by counting them in the driver itself, even though many (but not all) of the values are already collected by tally counters (TCs) in the NIC. Some of these TC values are not returned by @get_stats64. Especially the received multicast packages are missing from /proc/net/dev. Rectify this by fetching the TCs and returning them from rtl8169_get_stats64. It is not exactly a some / many / not all / confused [*] picture - get_stats64 provides software managed values - ethtool provides hardware only values [*] ok, rx_missed is kinda special. The counters collected in the driver obviously disappear as soon as the driver is unloaded so after a driver is loaded the counters always start at 0. The TCs are only reset by a power cycle and there's no known way to reset them programatically. Without further considerations the values collected by the driver would not match up against the TC values. I'd rather: 1. keep software and hardware stats separated 2. push to userspace the responsibility to offset device specific hardware values jumps My 0.02 €. Any value that is provided by -get_stats64() should be provided by whatever mechanism is available and determined to be prudent for the device in question. If the device is not tracking the event, this means pure software counters. If the device tracks the event, you should attempt to use the hardware facility to provide the -get_stats64() values. If the hardware provides counters not starting at zero on driver load, one option to handle that is to load the initial values of all of those counters and calculate diffs at -get_stats64() time. That's what my code is doing, just for a limited set of values. I was already wondering why some values available in the hardware tally counters are additionally counted in software. I was planning to propose to use the hardware counters in @get_stats64 in the first place and only count values in the driver which are not provided by the hardware Corinna pgpv2lzWCTlZ1.pgp Description: PGP signature
[PATCH] r8169: Add values missing in @get_stats64 from HW counters
The r8169 driver collects statistical information returned by @get_stats64 by counting them in the driver itself, even though many (but not all) of the values are already collected by tally counters (TCs) in the NIC. Some of these TC values are not returned by @get_stats64. Especially the received multicast packages are missing from /proc/net/dev. Rectify this by fetching the TCs and returning them from rtl8169_get_stats64. The counters collected in the driver obviously disappear as soon as the driver is unloaded so after a driver is loaded the counters always start at 0. The TCs are only reset by a power cycle and there's no known way to reset them programatically. Without further considerations the values collected by the driver would not match up against the TC values. Therefore introduce an addition to the rtl8169_private struct and a function rtl8169_init_tc_counter_offset to store the TCs at first rtl_open. Use these values as offsets in rtl8169_get_stats64. Signed-off-by: Corinna Vinschen vinsc...@redhat.com --- drivers/net/ethernet/realtek/r8169.c | 62 1 file changed, 62 insertions(+) diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c index f790f61..e7c7955 100644 --- a/drivers/net/ethernet/realtek/r8169.c +++ b/drivers/net/ethernet/realtek/r8169.c @@ -747,6 +747,14 @@ struct rtl8169_counters { __le16 tx_underun; }; +struct rtl8169_tc_offsets { + boolinited; + __le64 tx_errors; + __le32 tx_multi_collision; + __le32 rx_multicast; + __le16 tx_aborted; +}; + enum rtl_flag { RTL_FLAG_TASK_ENABLED, RTL_FLAG_TASK_SLOW_PENDING, @@ -824,6 +832,7 @@ struct rtl8169_private { struct mii_if_info mii; struct rtl8169_counters counters; + struct rtl8169_tc_offsets tc_offset; u32 saved_wolopts; u32 opts1_mask; @@ -2220,6 +2229,38 @@ static void rtl8169_update_counters(struct net_device *dev) dma_free_coherent(d, sizeof(*counters), counters, paddr); } +static void rtl8169_init_tc_counter_offset(struct net_device *dev) +{ + struct rtl8169_private *tp = netdev_priv(dev); + + /* +* rtl8169_init_tc_counter_offset is called from rtl_open. The tally +* counters are only reset by a power cycle, while the counter values +* collected by the driver are reset at every driver unload/load cycle. +* There's also no (documented?) way to reset the tally counters +* programatically. +* +* To make sure the HW values returned by @get_stats64 match the SW +* values, we collect the initial values at first open(*) and use them +* as offsets to normalize the values returned by @get_stats64. +* +* (*) We can't call rtl8169_init_tc_counter_offset from rtl_init_one +* for the reason stated in rtl8169_update_counters; CmdRxEnb is only +* set at open time by rtl_hw_start. +*/ + + if (tp-tc_offset.inited) + return; + + rtl8169_update_counters(dev); + + tp-tc_offset.tx_errors = tp-counters.tx_errors; + tp-tc_offset.tx_multi_collision = tp-counters.tx_multi_collision; + tp-tc_offset.rx_multicast = tp-counters.rx_multicast; + tp-tc_offset.tx_aborted = tp-counters.tx_aborted; + tp-tc_offset.inited = true; +} + static void rtl8169_get_ethtool_stats(struct net_device *dev, struct ethtool_stats *stats, u64 *data) { @@ -7631,6 +7672,8 @@ static int rtl_open(struct net_device *dev) rtl_hw_start(dev); + rtl8169_init_tc_counter_offset(dev); + netif_start_queue(dev); rtl_unlock_work(tp); @@ -7689,6 +7732,25 @@ rtl8169_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats) stats-rx_fifo_errors = dev-stats.rx_fifo_errors; stats-rx_missed_errors = dev-stats.rx_missed_errors; + /* +* Fetch additonal counter values missing in stats collected by driver +* from tally counters. +*/ + rtl8169_update_counters(dev); + + /* +* Subtract values fetched during initalization. +* See rtl8169_init_tc_counter_offset for a description why we do that. +*/ + stats-tx_errors = le64_to_cpu(tp-counters.tx_errors) - + le64_to_cpu(tp-tc_offset.tx_errors); + stats-collisions = le32_to_cpu(tp-counters.tx_multi_collision) - + le32_to_cpu(tp-tc_offset.tx_multi_collision); + stats-multicast = le32_to_cpu(tp-counters.rx_multicast) - + le32_to_cpu(tp-tc_offset.rx_multicast); + stats-tx_aborted_errors = le16_to_cpu(tp-counters.tx_aborted) - + le16_to_cpu(tp-tc_offset.tx_aborted); + return stats; } -- 2.1.0 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo