Re: [PATCH RFC net] igb: Fix XDP with PTP enabled

2021-04-13 Thread Alexander Duyck
On Mon, Apr 12, 2021 at 7:29 AM Jesper Dangaard Brouer wrote:
>
>
> On Mon, 12 Apr 2021 12:17:13 +0200
> Kurt Kanzenbach  wrote:
>
> > When using native XDP with the igb driver, the XDP frame data doesn't
> > point to the beginning of the packet. It's off by 16 bytes. Everything
> > works as expected with XDP skb mode.
> >
> > Actually these 16 bytes are used to store the packet timestamps. Therefore,
> > pull the timestamp before executing any XDP operations and adjust all other
> > code accordingly. The igc driver does it like that as well.
>
> (Cc. Alexander Duyck)
>
> Do we have enough room for the packet page-split tricks when these 16
> bytes are added?
>
> AFAIK this driver, like ixgbe and i40e, splits the page into two 2048-byte buffers.
>
>  The XDP headroom is reduced to 192 bytes.
>  The skb_shared_info is 320 bytes in size.
>
> 2048-192-320 = 1536 bytes
>
>  MTU(L3) 1500
>  Ethernet (L2) headers 14 bytes
>  VLAN 4 bytes, but Q-in-Q vlan 8 bytes.
>
> Single VLAN: 1536-1500-14-4 = 18 bytes left
> Q-in-q VLAN: 1536-1500-14-8 = 14 bytes left

So the Q-in-Q case should kick us over to jumbo frames, since the extra
VLAN size has to be added into the final supported frame size. The size
itself should work.
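
For reference, here is the arithmetic from the thread written out as a small
sketch. The 2048/192/320 values are the ones quoted above and are assumptions
about this particular buffer layout, not values read back from the driver:

/* Bytes of slack left in a half-page igb Rx buffer once the reduced XDP
 * headroom and the skb_shared_info tailroom are carved out.  Sketch for
 * illustration only; sizes are taken from the discussion above.
 */
#define RX_BUF_SIZE	2048	/* half-page buffer */
#define XDP_HEADROOM	192	/* reduced headroom, per the thread */
#define SHINFO_SIZE	320	/* sizeof(struct skb_shared_info) */

static int bytes_left(int mtu, int l2_hlen, int vlan_hlen)
{
	int data_room = RX_BUF_SIZE - XDP_HEADROOM - SHINFO_SIZE;	/* 1536 */

	return data_room - mtu - l2_hlen - vlan_hlen;
}

/*
 * bytes_left(1500, 14, 4) = 18   single VLAN
 * bytes_left(1500, 14, 8) = 14   Q-in-Q
 *
 * Once the configured max frame size grows past what fits here (as it
 * would when the extra Q-in-Q header is added in), the driver is expected
 * to fall back to its jumbo-frame buffer handling.
 */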

> > diff --git a/drivers/net/ethernet/intel/igb/igb_ptp.c 
> > b/drivers/net/ethernet/intel/igb/igb_ptp.c
> > index 86a576201f5f..0cbdf48285d3 100644
> > --- a/drivers/net/ethernet/intel/igb/igb_ptp.c
> > +++ b/drivers/net/ethernet/intel/igb/igb_ptp.c
> > @@ -863,23 +863,22 @@ static void igb_ptp_tx_hwtstamp(struct igb_adapter 
> > *adapter)
> >   * igb_ptp_rx_pktstamp - retrieve Rx per packet timestamp
> >   * @q_vector: Pointer to interrupt specific structure
> >   * @va: Pointer to address containing Rx buffer
> > - * @skb: Buffer containing timestamp and packet
> >   *
> >   * This function is meant to retrieve a timestamp from the first buffer of 
> > an
> >   * incoming frame.  The value is stored in little endian format starting on
> >   * byte 8
> >   *
> > - * Returns: 0 if success, nonzero if failure
> > + * Returns: 0 on failure, timestamp on success
> >   **/
> > -int igb_ptp_rx_pktstamp(struct igb_q_vector *q_vector, void *va,
> > - struct sk_buff *skb)
> > +ktime_t igb_ptp_rx_pktstamp(struct igb_q_vector *q_vector, void *va)
> >  {
> >   struct igb_adapter *adapter = q_vector->adapter;
> > + struct skb_shared_hwtstamps ts;
> >   __le64 *regval = (__le64 *)va;
> >   int adjust = 0;
> >
> >   if (!(adapter->ptp_flags & IGB_PTP_ENABLED))
> > - return IGB_RET_PTP_DISABLED;
> > + return 0;
> >
> >   /* The timestamp is recorded in little endian format.
> >* DWORD: 0123
> > @@ -888,10 +887,9 @@ int igb_ptp_rx_pktstamp(struct igb_q_vector *q_vector, 
> > void *va,
> >
> >   /* check reserved dwords are zero, be/le doesn't matter for zero */
> >   if (regval[0])
> > - return IGB_RET_PTP_INVALID;
> > + return 0;
> >

One thing that needs to be cleaned up in the patch: if it is going to
drop these return values, it should probably drop the defines for them
as well, since I don't think they are used anywhere else.


Re: [igb] netconsole triggers warning in netpoll_poll_dev

2021-04-07 Thread Alexander Duyck
On Wed, Apr 7, 2021 at 11:07 AM Jakub Kicinski  wrote:
>
> On Wed, 7 Apr 2021 09:25:28 -0700 Alexander Duyck wrote:
> > On Wed, Apr 7, 2021 at 8:37 AM Jakub Kicinski  wrote:
> > >
> > > On Wed, 7 Apr 2021 08:00:53 +0200 Oleksandr Natalenko wrote:
> > > > Thanks for the effort, but reportedly [1] it made no difference,
> > > > unfortunately.
> > > >
> > > > [1] https://bugzilla.kernel.org/show_bug.cgi?id=212573#c8
> > >
> > > The only other option I see is that somehow the NAPI has no rings.
> > >
> > > diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
> > > b/drivers/net/ethernet/intel/igb/igb_main.c
> > > index a45cd2b416c8..24568adc2fb1 100644
> > > --- a/drivers/net/ethernet/intel/igb/igb_main.c
> > > +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> > > @@ -7980,7 +7980,7 @@ static int igb_poll(struct napi_struct *napi, int 
> > > budget)
> > > struct igb_q_vector *q_vector = container_of(napi,
> > >  struct igb_q_vector,
> > >  napi);
> > > -   bool clean_complete = true;
> > > +   bool clean_complete = q_vector->tx.ring || q_vector->rx.ring;
> > > int work_done = 0;
> > >
> > >  #ifdef CONFIG_IGB_DCA
> >
> > It might make sense to just cast work_done as an unsigned int, and
> > then at the end of igb_poll use:
> >   return min_t(unsigned int, work_done, budget - 1);
>
> Sure, that's simplest. I wasn't sure something is supposed to prevent
> this condition or if it's okay to cover it up.

I'm pretty sure it is okay to cover it up. In this case "budget - 1" is
supposed to be the upper limit on what can be reported, and I think the
code was assuming an unsigned value anyway.

Another alternative would be to default clean_complete to !!budget.
Then if budget is 0, clean_complete would always end up false.
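
A rough sketch combining the two suggestions against the igb_poll() quoted
above (illustrative only, not a tested patch; the elided middle stands in for
the existing Tx/Rx cleanup loops):

static int igb_poll(struct napi_struct *napi, int budget)
{
	struct igb_q_vector *q_vector = container_of(napi,
						     struct igb_q_vector,
						     napi);
	/* Suggestion 2: seed clean_complete from budget so a zero budget
	 * (the netpoll case) always takes the "keep polling" path and
	 * igb_poll() returns 0.
	 */
	bool clean_complete = !!budget;
	/* Suggestion 1: make work_done unsigned */
	unsigned int work_done = 0;

	/* ... existing Tx/Rx cleanup updates clean_complete/work_done ... */

	if (!clean_complete)
		return budget;

	/* Exit polling mode */
	if (likely(napi_complete_done(napi, work_done)))
		igb_ring_irq_enable(q_vector);

	/* Suggestion 1 (cont.): with unsigned operands, budget == 0 can no
	 * longer turn this min() into a negative (nonzero) return value.
	 */
	return min_t(unsigned int, work_done, budget - 1);
}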


Re: [igb] netconsole triggers warning in netpoll_poll_dev

2021-04-07 Thread Alexander Duyck
On Wed, Apr 7, 2021 at 8:37 AM Jakub Kicinski  wrote:
>
> On Wed, 7 Apr 2021 08:00:53 +0200 Oleksandr Natalenko wrote:
> > Thanks for the effort, but reportedly [1] it made no difference,
> > unfortunately.
> >
> > [1] https://bugzilla.kernel.org/show_bug.cgi?id=212573#c8
>
> The only other option I see is that somehow the NAPI has no rings.
>
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
> b/drivers/net/ethernet/intel/igb/igb_main.c
> index a45cd2b416c8..24568adc2fb1 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -7980,7 +7980,7 @@ static int igb_poll(struct napi_struct *napi, int 
> budget)
> struct igb_q_vector *q_vector = container_of(napi,
>  struct igb_q_vector,
>  napi);
> -   bool clean_complete = true;
> +   bool clean_complete = q_vector->tx.ring || q_vector->rx.ring;
> int work_done = 0;
>
>  #ifdef CONFIG_IGB_DCA

It might make sense to just cast work_done as an unsigned int, and
then at the end of igb_poll use:
  return min_t(unsigned int, work_done, budget - 1);


Re: [PATCH mlx5-next v7 0/4] Dynamically assign MSI-X vectors count

2021-03-26 Thread Alexander Duyck
On Fri, Mar 26, 2021 at 10:08 AM Bjorn Helgaas  wrote:
>
> On Fri, Mar 26, 2021 at 09:00:50AM -0700, Alexander Duyck wrote:
> > On Thu, Mar 25, 2021 at 11:44 PM Leon Romanovsky  wrote:
> > > On Thu, Mar 25, 2021 at 03:28:36PM -0300, Jason Gunthorpe wrote:
> > > > On Thu, Mar 25, 2021 at 01:20:21PM -0500, Bjorn Helgaas wrote:
> > > > > On Thu, Mar 25, 2021 at 02:36:46PM -0300, Jason Gunthorpe wrote:
> > > > > > On Thu, Mar 25, 2021 at 12:21:44PM -0500, Bjorn Helgaas wrote:
> > > > > >
> > > > > > > NVMe and mlx5 have basically identical functionality in this 
> > > > > > > respect.
> > > > > > > Other devices and vendors will likely implement similar 
> > > > > > > functionality.
> > > > > > > It would be ideal if we had an interface generic enough to support
> > > > > > > them all.
> > > > > > >
> > > > > > > Is the mlx5 interface proposed here sufficient to support the NVMe
> > > > > model?  I think it's close, but not quite, because the NVMe
> > > > > > > "offline" state isn't explicitly visible in the mlx5 model.
> > > > > >
> > > > > > I thought Keith basically said "offline" wasn't really useful as a
> > > > > > distinct idea. It is an artifact of nvme being a standards body
> > > > > > divorced from the operating system.
> > > > > >
> > > > > > In linux offline and no driver attached are the same thing, you'd
> > > > > > never want an API to make a nvme device with a driver attached 
> > > > > > offline
> > > > > > because it would break the driver.
> > > > >
> > > > > I think the sticky part is that Linux driver attach is not visible to
> > > > > the hardware device, while the NVMe "offline" state *is*.  An NVMe PF
> > > > > can only assign resources to a VF when the VF is offline, and the VF
> > > > > is only usable when it is online.
> > > > >
> > > > > For NVMe, software must ask the PF to make those online/offline
> > > > > transitions via Secondary Controller Offline and Secondary Controller
> > > > > Online commands [1].  How would this be integrated into this sysfs
> > > > > interface?
> > > >
> > > > Either the NVMe PF driver tracks the driver attach state using a bus
> > > > notifier and mirrors it to the offline state, or it simply
> > > > offline/onlines as part of the sequence to program the MSI change.
> > > >
> > > > I don't see why we need any additional modeling of this behavior.
> > > >
> > > > What would be the point of onlining a device without a driver?
> > >
> > > Agree, we should remember that we are talking about Linux kernel model
> > > and implementation, where _no_driver_ means _offline_.
> >
> > The only means you have of guaranteeing the driver is "offline" is by
> > holding the device lock and checking it. So it is only really
> > useful for one operation and then you have to release the lock. The
> > idea behind having an "offline" state would be to allow you to
> > aggregate multiple potential operations into a single change.
> >
> > For example you would place the device offline, then change
> > interrupts, and then queues, and then you could online it again. The
> > kernel code could have something in place to prevent driver load on
> > "offline" devices. What it gives you is more of a transactional model
> > versus what you have right now which is more of a concurrent model.
>
> Thanks, Alex.  Leon currently does enforce the "offline" situation by
> holding the VF device lock while checking that it has no driver and
> asking the PF to do the assignment.  I agree this is only useful for a
> single operation.  Would the current series *prevent* a transactional
> model from being added later if it turns out to be useful?  I think I
> can imagine keeping the same sysfs files but changing the
> implementation to check for the VF being offline, while adding
> something new to control online/offline.

My concern would be that we are defining the user space interface.
Once we have this working as a single operation, I could see us having
to support it that way going forward, as somebody will script something
not expecting an "offline" sysfs file, a

Re: [PATCH mlx5-next v7 0/4] Dynamically assign MSI-X vectors count

2021-03-26 Thread Alexander Duyck
On Thu, Mar 25, 2021 at 11:44 PM Leon Romanovsky  wrote:
>
> On Thu, Mar 25, 2021 at 03:28:36PM -0300, Jason Gunthorpe wrote:
> > On Thu, Mar 25, 2021 at 01:20:21PM -0500, Bjorn Helgaas wrote:
> > > On Thu, Mar 25, 2021 at 02:36:46PM -0300, Jason Gunthorpe wrote:
> > > > On Thu, Mar 25, 2021 at 12:21:44PM -0500, Bjorn Helgaas wrote:
> > > >
> > > > > NVMe and mlx5 have basically identical functionality in this respect.
> > > > > Other devices and vendors will likely implement similar functionality.
> > > > > It would be ideal if we had an interface generic enough to support
> > > > > them all.
> > > > >
> > > > > Is the mlx5 interface proposed here sufficient to support the NVMe
> > > > model?  I think it's close, but not quite, because the NVMe
> > > > > "offline" state isn't explicitly visible in the mlx5 model.
> > > >
> > > > I thought Keith basically said "offline" wasn't really useful as a
> > > > distinct idea. It is an artifact of nvme being a standards body
> > > > divorced from the operating system.
> > > >
> > > > In linux offline and no driver attached are the same thing, you'd
> > > > never want an API to make a nvme device with a driver attached offline
> > > > because it would break the driver.
> > >
> > > I think the sticky part is that Linux driver attach is not visible to
> > > the hardware device, while the NVMe "offline" state *is*.  An NVMe PF
> > > can only assign resources to a VF when the VF is offline, and the VF
> > > is only usable when it is online.
> > >
> > > For NVMe, software must ask the PF to make those online/offline
> > > transitions via Secondary Controller Offline and Secondary Controller
> > > Online commands [1].  How would this be integrated into this sysfs
> > > interface?
> >
> > Either the NVMe PF driver tracks the driver attach state using a bus
> > notifier and mirrors it to the offline state, or it simply
> > offline/onlines as part of the sequence to program the MSI change.
> >
> > I don't see why we need any additional modeling of this behavior.
> >
> > What would be the point of onlining a device without a driver?
>
> Agree, we should remember that we are talking about Linux kernel model
> and implementation, where _no_driver_ means _offline_.

The only means you have of guaranteeing the driver is "offline" is by
holding the device lock and checking it. So it is only really
useful for one operation and then you have to release the lock. The
idea behind having an "offline" state would be to allow you to
aggregate multiple potential operations into a single change.

For example you would place the device offline, then change
interrupts, and then queues, and then you could online it again. The
kernel code could have something in place to prevent driver load on
"offline" devices. What it gives you is more of a transactional model
versus what you have right now which is more of a concurrent model.


Re: [PATCH net-next 8/9] net: hns3: add support for queue bonding mode of flow director

2021-03-17 Thread Alexander Duyck
On Wed, Mar 17, 2021 at 6:28 PM Jakub Kicinski  wrote:
>
> On Thu, 18 Mar 2021 09:02:54 +0800 Huazhong Tan wrote:
> > On 2021/3/16 4:04, Jakub Kicinski wrote:
> > > On Mon, 15 Mar 2021 20:23:50 +0800 Huazhong Tan wrote:
> > >> From: Jian Shen 
> > >>
> > >> For device version V3, the hardware supports queue bonding: it can
> > >> identify the tuple information of a TCP stream and create flow
> > >> director rules automatically, in order to keep the tx and rx
> > >> packets of a flow in the same queue pair. The driver sets the FD_ADD
> > >> field of the TX BD for TCP SYN packets, and sets the FD_DEL field for
> > >> TCP FIN or RST packets. The hardware creates or removes an fd rule
> > >> according to the TX BD, and it can also age out a rule that has not
> > >> been hit for a long time.
> > >>
> > >> Queue bonding mode is disabled by default, and can be
> > >> enabled/disabled with the ethtool priv-flags command.
> > > This seems like fairly well defined behavior, IMHO we should have a full
> > > device feature for it, rather than a private flag.
> >
> > Should we add a NETIF_F_NTUPLE_HW feature for it?
>
> It'd be better to keep the configuration close to the existing RFS
> config, no? Perhaps a new file under
>
>   /sys/class/net/$dev/queues/rx-$id/
>
> to enable the feature would be more appropriate?
>
> Otherwise I'd call it something like NETIF_F_RFS_AUTO ?
>
> Alex, any thoughts? IIRC Intel HW had a similar feature?

Yeah, this is pretty much what Intel used to put out as ATR aka Flow
Director, although with that there was also an XPS component. Flow
Director was the name of the hardware feature, and ATR, Application
Targeted Routing, was the software feature that had the Tx path adding
rules by default.

The i40e driver supports disabling it via the "flow-director-atr" private flag.

As far as tying this into NTUPLE goes, that is definitely a no-go.
Generally NTUPLE rules and ATR are mutually exclusive since they compete
for resources within the same device.

> > > Does the device need to be able to parse the frame fully for this
> > > mechanism to work? Will it work even if the TCP segment is encapsulated
> > > in a custom tunnel?
> >
> > no, custom tunnel is not supported.
>
> Hm, okay, it's just queue mapping, if device gets it wrong not the end
> of the world (provided security boundaries are preserved).

So yes/no in terms of this not causing serious issues. Where this
tends to get ugly is if it is combined with something like XPS, which
appears to be enabled for hns3. In that case the flow can jump queues,
and when it does, the Rx side can either jump to follow it, causing
out-of-order delivery, or be left behind, with being left behind being
the safer of the two cases.

Really I think this feature would be better served by implementing
Accelerated RFS and adding support for ndo_rx_flow_steer.
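
For context, a minimal outline of what that would look like. The
ndo_rx_flow_steer signature and skb_flow_dissect_flow_keys() are the in-kernel
APIs, while hns3_add_fd_rule_for_flow() is a hypothetical helper standing in
for whatever would program the same hardware flow-director rule the
queue-bonding feature programs today:

#include <linux/in.h>
#include <linux/netdevice.h>
#include <net/flow_dissector.h>

/* Hedged sketch of an accelerated-RFS hook; not a real hns3 patch. */
static int hns3_rx_flow_steer(struct net_device *dev,
			      const struct sk_buff *skb,
			      u16 rxq_index, u32 flow_id)
{
	struct flow_keys keys;

	if (!skb_flow_dissect_flow_keys(skb, &keys, 0))
		return -EPROTONOSUPPORT;

	if (keys.basic.ip_proto != IPPROTO_TCP)
		return -EPROTONOSUPPORT;

	/* Program a hardware rule steering this 4-tuple to rxq_index and
	 * return a filter id the stack can use to expire the rule later.
	 * This helper is hypothetical.
	 */
	return hns3_add_fd_rule_for_flow(dev, &keys, rxq_index, flow_id);
}

/* Wired up via net_device_ops under CONFIG_RFS_ACCEL:
 *	.ndo_rx_flow_steer = hns3_rx_flow_steer,
 */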


[net-next PATCH v2 10/10] ionic: Update driver to use ethtool_sprintf

2021-03-16 Thread Alexander Duyck
From: Alexander Duyck 

Update the ionic driver to make use of ethtool_sprintf. In addition add
separate functions for Tx/Rx stats strings in order to reduce the total
amount of indenting needed in the driver code.

Acked-by: Shannon Nelson 
Signed-off-by: Alexander Duyck 
---
 drivers/net/ethernet/pensando/ionic/ionic_stats.c |  145 +
 1 file changed, 60 insertions(+), 85 deletions(-)

diff --git a/drivers/net/ethernet/pensando/ionic/ionic_stats.c 
b/drivers/net/ethernet/pensando/ionic/ionic_stats.c
index 6ae75b771a15..308b4ac6c57b 100644
--- a/drivers/net/ethernet/pensando/ionic/ionic_stats.c
+++ b/drivers/net/ethernet/pensando/ionic/ionic_stats.c
@@ -246,98 +246,73 @@ static u64 ionic_sw_stats_get_count(struct ionic_lif *lif)
return total;
 }
 
+static void ionic_sw_stats_get_tx_strings(struct ionic_lif *lif, u8 **buf,
+ int q_num)
+{
+   int i;
+
+   for (i = 0; i < IONIC_NUM_TX_STATS; i++)
+   ethtool_sprintf(buf, "tx_%d_%s", q_num,
+   ionic_tx_stats_desc[i].name);
+
+   if (!test_bit(IONIC_LIF_F_UP, lif->state) ||
+   !test_bit(IONIC_LIF_F_SW_DEBUG_STATS, lif->state))
+   return;
+
+   for (i = 0; i < IONIC_NUM_TX_Q_STATS; i++)
+   ethtool_sprintf(buf, "txq_%d_%s", q_num,
+   ionic_txq_stats_desc[i].name);
+   for (i = 0; i < IONIC_NUM_DBG_CQ_STATS; i++)
+   ethtool_sprintf(buf, "txq_%d_cq_%s", q_num,
+   ionic_dbg_cq_stats_desc[i].name);
+   for (i = 0; i < IONIC_NUM_DBG_INTR_STATS; i++)
+   ethtool_sprintf(buf, "txq_%d_intr_%s", q_num,
+   ionic_dbg_intr_stats_desc[i].name);
+   for (i = 0; i < IONIC_MAX_NUM_SG_CNTR; i++)
+   ethtool_sprintf(buf, "txq_%d_sg_cntr_%d", q_num, i);
+}
+
+static void ionic_sw_stats_get_rx_strings(struct ionic_lif *lif, u8 **buf,
+ int q_num)
+{
+   int i;
+
+   for (i = 0; i < IONIC_NUM_RX_STATS; i++)
+   ethtool_sprintf(buf, "rx_%d_%s", q_num,
+   ionic_rx_stats_desc[i].name);
+
+   if (!test_bit(IONIC_LIF_F_UP, lif->state) ||
+   !test_bit(IONIC_LIF_F_SW_DEBUG_STATS, lif->state))
+   return;
+
+   for (i = 0; i < IONIC_NUM_DBG_CQ_STATS; i++)
+   ethtool_sprintf(buf, "rxq_%d_cq_%s", q_num,
+   ionic_dbg_cq_stats_desc[i].name);
+   for (i = 0; i < IONIC_NUM_DBG_INTR_STATS; i++)
+   ethtool_sprintf(buf, "rxq_%d_intr_%s", q_num,
+   ionic_dbg_intr_stats_desc[i].name);
+   for (i = 0; i < IONIC_NUM_DBG_NAPI_STATS; i++)
+   ethtool_sprintf(buf, "rxq_%d_napi_%s", q_num,
+   ionic_dbg_napi_stats_desc[i].name);
+   for (i = 0; i < IONIC_MAX_NUM_NAPI_CNTR; i++)
+   ethtool_sprintf(buf, "rxq_%d_napi_work_done_%d", q_num, i);
+}
+
 static void ionic_sw_stats_get_strings(struct ionic_lif *lif, u8 **buf)
 {
int i, q_num;
 
-   for (i = 0; i < IONIC_NUM_LIF_STATS; i++) {
-   snprintf(*buf, ETH_GSTRING_LEN, ionic_lif_stats_desc[i].name);
-   *buf += ETH_GSTRING_LEN;
-   }
+   for (i = 0; i < IONIC_NUM_LIF_STATS; i++)
+   ethtool_sprintf(buf, ionic_lif_stats_desc[i].name);
 
-   for (i = 0; i < IONIC_NUM_PORT_STATS; i++) {
-   snprintf(*buf, ETH_GSTRING_LEN,
-ionic_port_stats_desc[i].name);
-   *buf += ETH_GSTRING_LEN;
-   }
+   for (i = 0; i < IONIC_NUM_PORT_STATS; i++)
+   ethtool_sprintf(buf, ionic_port_stats_desc[i].name);
 
-   for (q_num = 0; q_num < MAX_Q(lif); q_num++) {
-   for (i = 0; i < IONIC_NUM_TX_STATS; i++) {
-   snprintf(*buf, ETH_GSTRING_LEN, "tx_%d_%s",
-q_num, ionic_tx_stats_desc[i].name);
-   *buf += ETH_GSTRING_LEN;
-   }
+   for (q_num = 0; q_num < MAX_Q(lif); q_num++)
+   ionic_sw_stats_get_tx_strings(lif, buf, q_num);
 
-   if (test_bit(IONIC_LIF_F_UP, lif->state) &&
-   test_bit(IONIC_LIF_F_SW_DEBUG_STATS, lif->state)) {
-   for (i = 0; i < IONIC_NUM_TX_Q_STATS; i++) {
-   snprintf(*buf, ETH_GSTRING_LEN,
-"txq_%d_%s",
-q_num,
-ionic_txq_stats_desc[i].name);
-   *buf += ETH_GSTRING_LEN;
-   }
-   for (i = 0; i 

[net-next PATCH v2 09/10] bna: Update driver to use ethtool_sprintf

2021-03-16 Thread Alexander Duyck
From: Alexander Duyck 

Update the bnad_get_strings to make use of ethtool_sprintf and avoid
unnecessary line wrapping. To do this we invert the logic for the string
set test and instead exit immediately if we are not working with the stats
strings. In addition the function is broken up into subfunctions for each
area so that we can simply call ethtool_sprintf once for each string in a
given subsection.

Signed-off-by: Alexander Duyck 
---
 drivers/net/ethernet/brocade/bna/bnad_ethtool.c |  266 +--
 1 file changed, 105 insertions(+), 161 deletions(-)

diff --git a/drivers/net/ethernet/brocade/bna/bnad_ethtool.c 
b/drivers/net/ethernet/brocade/bna/bnad_ethtool.c
index 588c4804d10a..265c2fa6bbe0 100644
--- a/drivers/net/ethernet/brocade/bna/bnad_ethtool.c
+++ b/drivers/net/ethernet/brocade/bna/bnad_ethtool.c
@@ -524,6 +524,68 @@ bnad_set_pauseparam(struct net_device *netdev,
return 0;
 }
 
+static void bnad_get_txf_strings(u8 **string, int f_num)
+{
+   ethtool_sprintf(string, "txf%d_ucast_octets", f_num);
+   ethtool_sprintf(string, "txf%d_ucast", f_num);
+   ethtool_sprintf(string, "txf%d_ucast_vlan", f_num);
+   ethtool_sprintf(string, "txf%d_mcast_octets", f_num);
+   ethtool_sprintf(string, "txf%d_mcast", f_num);
+   ethtool_sprintf(string, "txf%d_mcast_vlan", f_num);
+   ethtool_sprintf(string, "txf%d_bcast_octets", f_num);
+   ethtool_sprintf(string, "txf%d_bcast", f_num);
+   ethtool_sprintf(string, "txf%d_bcast_vlan", f_num);
+   ethtool_sprintf(string, "txf%d_errors", f_num);
+   ethtool_sprintf(string, "txf%d_filter_vlan", f_num);
+   ethtool_sprintf(string, "txf%d_filter_mac_sa", f_num);
+}
+
+static void bnad_get_rxf_strings(u8 **string, int f_num)
+{
+   ethtool_sprintf(string, "rxf%d_ucast_octets", f_num);
+   ethtool_sprintf(string, "rxf%d_ucast", f_num);
+   ethtool_sprintf(string, "rxf%d_ucast_vlan", f_num);
+   ethtool_sprintf(string, "rxf%d_mcast_octets", f_num);
+   ethtool_sprintf(string, "rxf%d_mcast", f_num);
+   ethtool_sprintf(string, "rxf%d_mcast_vlan", f_num);
+   ethtool_sprintf(string, "rxf%d_bcast_octets", f_num);
+   ethtool_sprintf(string, "rxf%d_bcast", f_num);
+   ethtool_sprintf(string, "rxf%d_bcast_vlan", f_num);
+   ethtool_sprintf(string, "rxf%d_frame_drops", f_num);
+}
+
+static void bnad_get_cq_strings(u8 **string, int q_num)
+{
+   ethtool_sprintf(string, "cq%d_producer_index", q_num);
+   ethtool_sprintf(string, "cq%d_consumer_index", q_num);
+   ethtool_sprintf(string, "cq%d_hw_producer_index", q_num);
+   ethtool_sprintf(string, "cq%d_intr", q_num);
+   ethtool_sprintf(string, "cq%d_poll", q_num);
+   ethtool_sprintf(string, "cq%d_schedule", q_num);
+   ethtool_sprintf(string, "cq%d_keep_poll", q_num);
+   ethtool_sprintf(string, "cq%d_complete", q_num);
+}
+
+static void bnad_get_rxq_strings(u8 **string, int q_num)
+{
+   ethtool_sprintf(string, "rxq%d_packets", q_num);
+   ethtool_sprintf(string, "rxq%d_bytes", q_num);
+   ethtool_sprintf(string, "rxq%d_packets_with_error", q_num);
+   ethtool_sprintf(string, "rxq%d_allocbuf_failed", q_num);
+   ethtool_sprintf(string, "rxq%d_mapbuf_failed", q_num);
+   ethtool_sprintf(string, "rxq%d_producer_index", q_num);
+   ethtool_sprintf(string, "rxq%d_consumer_index", q_num);
+}
+
+static void bnad_get_txq_strings(u8 **string, int q_num)
+{
+   ethtool_sprintf(string, "txq%d_packets", q_num);
+   ethtool_sprintf(string, "txq%d_bytes", q_num);
+   ethtool_sprintf(string, "txq%d_producer_index", q_num);
+   ethtool_sprintf(string, "txq%d_consumer_index", q_num);
+   ethtool_sprintf(string, "txq%d_hw_consumer_index", q_num);
+}
+
 static void
 bnad_get_strings(struct net_device *netdev, u32 stringset, u8 *string)
 {
@@ -531,175 +593,57 @@ bnad_get_strings(struct net_device *netdev, u32 
stringset, u8 *string)
int i, j, q_num;
u32 bmap;
 
+   if (stringset != ETH_SS_STATS)
+   return;
+
mutex_lock(&bnad->conf_mutex);
 
-   switch (stringset) {
-   case ETH_SS_STATS:
-   for (i = 0; i < BNAD_ETHTOOL_STATS_NUM; i++) {
-   BUG_ON(!(strlen(bnad_net_stats_strings[i]) <
-  ETH_GSTRING_LEN));
-   strncpy(string, bnad_net_stats_strings[i],
-   ETH_GSTRING_LEN);
-   string += ETH_GSTRING_LEN;
-   }
-   bmap = bna_t

[net-next PATCH v2 08/10] vmxnet3: Update driver to use ethtool_sprintf

2021-03-16 Thread Alexander Duyck
From: Alexander Duyck 

So this patch actually does 3 things.

First it removes a stray white space at the start of the variable
declaration in vmxnet3_get_strings.

Second it flips the logic for the string test so that we exit immediately
if we are not looking for the stats strings. Doing this we can avoid
unnecessary indentation and line wrapping.

Then finally it updates the code to use ethtool_sprintf rather than a
memcpy and pointer increment to write the ethtool strings.

Signed-off-by: Alexander Duyck 
---
 drivers/net/vmxnet3/vmxnet3_ethtool.c |   53 -
 1 file changed, 19 insertions(+), 34 deletions(-)

diff --git a/drivers/net/vmxnet3/vmxnet3_ethtool.c 
b/drivers/net/vmxnet3/vmxnet3_ethtool.c
index 7ec8652f2c26..c0bd9cbc43b1 100644
--- a/drivers/net/vmxnet3/vmxnet3_ethtool.c
+++ b/drivers/net/vmxnet3/vmxnet3_ethtool.c
@@ -218,43 +218,28 @@ vmxnet3_get_drvinfo(struct net_device *netdev, struct 
ethtool_drvinfo *drvinfo)
 static void
 vmxnet3_get_strings(struct net_device *netdev, u32 stringset, u8 *buf)
 {
-struct vmxnet3_adapter *adapter = netdev_priv(netdev);
-   if (stringset == ETH_SS_STATS) {
-   int i, j;
-   for (j = 0; j < adapter->num_tx_queues; j++) {
-   for (i = 0; i < ARRAY_SIZE(vmxnet3_tq_dev_stats); i++) {
-   memcpy(buf, vmxnet3_tq_dev_stats[i].desc,
-  ETH_GSTRING_LEN);
-   buf += ETH_GSTRING_LEN;
-   }
-   for (i = 0; i < ARRAY_SIZE(vmxnet3_tq_driver_stats);
-i++) {
-   memcpy(buf, vmxnet3_tq_driver_stats[i].desc,
-  ETH_GSTRING_LEN);
-   buf += ETH_GSTRING_LEN;
-   }
-   }
+   struct vmxnet3_adapter *adapter = netdev_priv(netdev);
+   int i, j;
 
-   for (j = 0; j < adapter->num_rx_queues; j++) {
-   for (i = 0; i < ARRAY_SIZE(vmxnet3_rq_dev_stats); i++) {
-   memcpy(buf, vmxnet3_rq_dev_stats[i].desc,
-  ETH_GSTRING_LEN);
-   buf += ETH_GSTRING_LEN;
-   }
-   for (i = 0; i < ARRAY_SIZE(vmxnet3_rq_driver_stats);
-i++) {
-   memcpy(buf, vmxnet3_rq_driver_stats[i].desc,
-  ETH_GSTRING_LEN);
-   buf += ETH_GSTRING_LEN;
-   }
-   }
+   if (stringset != ETH_SS_STATS)
+   return;
 
-   for (i = 0; i < ARRAY_SIZE(vmxnet3_global_stats); i++) {
-   memcpy(buf, vmxnet3_global_stats[i].desc,
-   ETH_GSTRING_LEN);
-   buf += ETH_GSTRING_LEN;
-   }
+   for (j = 0; j < adapter->num_tx_queues; j++) {
+   for (i = 0; i < ARRAY_SIZE(vmxnet3_tq_dev_stats); i++)
+   ethtool_sprintf(&buf, vmxnet3_tq_dev_stats[i].desc);
+   for (i = 0; i < ARRAY_SIZE(vmxnet3_tq_driver_stats); i++)
+   ethtool_sprintf(&buf, vmxnet3_tq_driver_stats[i].desc);
+   }
+
+   for (j = 0; j < adapter->num_rx_queues; j++) {
+   for (i = 0; i < ARRAY_SIZE(vmxnet3_rq_dev_stats); i++)
+   ethtool_sprintf(&buf, vmxnet3_rq_dev_stats[i].desc);
+   for (i = 0; i < ARRAY_SIZE(vmxnet3_rq_driver_stats); i++)
+   ethtool_sprintf(&buf, vmxnet3_rq_driver_stats[i].desc);
}
+
+   for (i = 0; i < ARRAY_SIZE(vmxnet3_global_stats); i++)
+   ethtool_sprintf(&buf, vmxnet3_global_stats[i].desc);
 }
 
 netdev_features_t vmxnet3_fix_features(struct net_device *netdev,




[net-next PATCH v2 07/10] virtio_net: Update driver to use ethtool_sprintf

2021-03-16 Thread Alexander Duyck
From: Alexander Duyck 

Update the code to replace instances of snprintf and a pointer update with
just calling ethtool_sprintf.

Also replace the char pointer with a u8 pointer to avoid having to recast
the pointer type.

Acked-by: Michael S. Tsirkin 
Acked-by: Jason Wang 
Signed-off-by: Alexander Duyck 
---
 drivers/net/virtio_net.c |   18 +++---
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index e97288dd6e5a..77ba8e2fc11c 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2138,25 +2138,21 @@ static int virtnet_set_channels(struct net_device *dev,
 static void virtnet_get_strings(struct net_device *dev, u32 stringset, u8 
*data)
 {
struct virtnet_info *vi = netdev_priv(dev);
-   char *p = (char *)data;
unsigned int i, j;
+   u8 *p = data;
 
switch (stringset) {
case ETH_SS_STATS:
for (i = 0; i < vi->curr_queue_pairs; i++) {
-   for (j = 0; j < VIRTNET_RQ_STATS_LEN; j++) {
-   snprintf(p, ETH_GSTRING_LEN, "rx_queue_%u_%s",
-i, virtnet_rq_stats_desc[j].desc);
-   p += ETH_GSTRING_LEN;
-   }
+   for (j = 0; j < VIRTNET_RQ_STATS_LEN; j++)
+   ethtool_sprintf(&p, "rx_queue_%u_%s", i,
+   virtnet_rq_stats_desc[j].desc);
}
 
for (i = 0; i < vi->curr_queue_pairs; i++) {
-   for (j = 0; j < VIRTNET_SQ_STATS_LEN; j++) {
-   snprintf(p, ETH_GSTRING_LEN, "tx_queue_%u_%s",
-i, virtnet_sq_stats_desc[j].desc);
-   p += ETH_GSTRING_LEN;
-   }
+   for (j = 0; j < VIRTNET_SQ_STATS_LEN; j++)
+   ethtool_sprintf(&p, "tx_queue_%u_%s", i,
+   virtnet_sq_stats_desc[j].desc);
}
break;
}




[net-next PATCH v2 05/10] ena: Update driver to use ethtool_sprintf

2021-03-16 Thread Alexander Duyck
From: Alexander Duyck 

Replace instances of snprintf or memcpy followed by a pointer update with
ethtool_sprintf.

Acked-by: Arthur Kiyanovski 
Signed-off-by: Alexander Duyck 
---
 drivers/net/ethernet/amazon/ena/ena_ethtool.c |   25 +++--
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_ethtool.c 
b/drivers/net/ethernet/amazon/ena/ena_ethtool.c
index d6cc7aa612b7..2fe7ccee55b2 100644
--- a/drivers/net/ethernet/amazon/ena/ena_ethtool.c
+++ b/drivers/net/ethernet/amazon/ena/ena_ethtool.c
@@ -251,10 +251,10 @@ static void ena_queue_strings(struct ena_adapter 
*adapter, u8 **data)
for (j = 0; j < ENA_STATS_ARRAY_TX; j++) {
ena_stats = &ena_stats_tx_strings[j];
 
-   snprintf(*data, ETH_GSTRING_LEN,
-"queue_%u_%s_%s", i,
-is_xdp ? "xdp_tx" : "tx", ena_stats->name);
-   (*data) += ETH_GSTRING_LEN;
+   ethtool_sprintf(data,
+   "queue_%u_%s_%s", i,
+   is_xdp ? "xdp_tx" : "tx",
+   ena_stats->name);
}
 
if (!is_xdp) {
@@ -264,9 +264,9 @@ static void ena_queue_strings(struct ena_adapter *adapter, 
u8 **data)
for (j = 0; j < ENA_STATS_ARRAY_RX; j++) {
ena_stats = &ena_stats_rx_strings[j];
 
-   snprintf(*data, ETH_GSTRING_LEN,
-"queue_%u_rx_%s", i, ena_stats->name);
-   (*data) += ETH_GSTRING_LEN;
+   ethtool_sprintf(data,
+   "queue_%u_rx_%s", i,
+   ena_stats->name);
}
}
}
@@ -280,9 +280,8 @@ static void ena_com_dev_strings(u8 **data)
for (i = 0; i < ENA_STATS_ARRAY_ENA_COM; i++) {
ena_stats = &ena_stats_ena_com_strings[i];
 
-   snprintf(*data, ETH_GSTRING_LEN,
-"ena_admin_q_%s", ena_stats->name);
-   (*data) += ETH_GSTRING_LEN;
+   ethtool_sprintf(data,
+   "ena_admin_q_%s", ena_stats->name);
}
 }
 
@@ -295,15 +294,13 @@ static void ena_get_strings(struct ena_adapter *adapter,
 
for (i = 0; i < ENA_STATS_ARRAY_GLOBAL; i++) {
ena_stats = &ena_stats_global_strings[i];
-   memcpy(data, ena_stats->name, ETH_GSTRING_LEN);
-   data += ETH_GSTRING_LEN;
+   ethtool_sprintf(&data, ena_stats->name);
}
 
if (eni_stats_needed) {
for (i = 0; i < ENA_STATS_ARRAY_ENI(adapter); i++) {
ena_stats = &ena_stats_eni_strings[i];
-   memcpy(data, ena_stats->name, ETH_GSTRING_LEN);
-   data += ETH_GSTRING_LEN;
+   ethtool_sprintf(&data, ena_stats->name);
}
}
 




[net-next PATCH v2 04/10] hisilicon: Update drivers to use ethtool_sprintf

2021-03-16 Thread Alexander Duyck
From: Alexander Duyck 

Update the hisilicon drivers to make use of ethtool_sprintf. The general
idea is to reduce code size and overhead by replacing the repeated pattern
of string printf statements and ETH_GSTRING_LEN pointer increments.

Signed-off-by: Alexander Duyck 
---
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c |9 +-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c  |   41 +++-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c  |   91 ++
 .../net/ethernet/hisilicon/hns/hns_dsaf_xgmac.c|8 +-
 drivers/net/ethernet/hisilicon/hns/hns_ethtool.c   |  103 +++-
 5 files changed, 90 insertions(+), 162 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c 
b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c
index 7fb7a419607d..04878b145626 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c
@@ -687,17 +687,14 @@ static void hns_gmac_get_stats(void *mac_drv, u64 *data)
 
 static void hns_gmac_get_strings(u32 stringset, u8 *data)
 {
-   char *buff = (char *)data;
+   u8 *buff = data;
u32 i;
 
if (stringset != ETH_SS_STATS)
return;
 
-   for (i = 0; i < ARRAY_SIZE(g_gmac_stats_string); i++) {
-   snprintf(buff, ETH_GSTRING_LEN, "%s",
-g_gmac_stats_string[i].desc);
-   buff = buff + ETH_GSTRING_LEN;
-   }
+   for (i = 0; i < ARRAY_SIZE(g_gmac_stats_string); i++)
+   ethtool_sprintf(&buff, g_gmac_stats_string[i].desc);
 }
 
 static int hns_gmac_get_sset_count(int stringset)
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c 
b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
index d0f8b1fff333..ff03cafccb66 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
@@ -462,33 +462,22 @@ int hns_ppe_get_regs_count(void)
  */
 void hns_ppe_get_strings(struct hns_ppe_cb *ppe_cb, int stringset, u8 *data)
 {
-   char *buff = (char *)data;
int index = ppe_cb->index;
-
-   snprintf(buff, ETH_GSTRING_LEN, "ppe%d_rx_sw_pkt", index);
-   buff = buff + ETH_GSTRING_LEN;
-   snprintf(buff, ETH_GSTRING_LEN, "ppe%d_rx_pkt_ok", index);
-   buff = buff + ETH_GSTRING_LEN;
-   snprintf(buff, ETH_GSTRING_LEN, "ppe%d_rx_drop_pkt_no_bd", index);
-   buff = buff + ETH_GSTRING_LEN;
-   snprintf(buff, ETH_GSTRING_LEN, "ppe%d_rx_alloc_buf_fail", index);
-   buff = buff + ETH_GSTRING_LEN;
-   snprintf(buff, ETH_GSTRING_LEN, "ppe%d_rx_alloc_buf_wait", index);
-   buff = buff + ETH_GSTRING_LEN;
-   snprintf(buff, ETH_GSTRING_LEN, "ppe%d_rx_pkt_drop_no_buf", index);
-   buff = buff + ETH_GSTRING_LEN;
-   snprintf(buff, ETH_GSTRING_LEN, "ppe%d_rx_pkt_err_fifo_full", index);
-   buff = buff + ETH_GSTRING_LEN;
-
-   snprintf(buff, ETH_GSTRING_LEN, "ppe%d_tx_bd", index);
-   buff = buff + ETH_GSTRING_LEN;
-   snprintf(buff, ETH_GSTRING_LEN, "ppe%d_tx_pkt", index);
-   buff = buff + ETH_GSTRING_LEN;
-   snprintf(buff, ETH_GSTRING_LEN, "ppe%d_tx_pkt_ok", index);
-   buff = buff + ETH_GSTRING_LEN;
-   snprintf(buff, ETH_GSTRING_LEN, "ppe%d_tx_pkt_err_fifo_empty", index);
-   buff = buff + ETH_GSTRING_LEN;
-   snprintf(buff, ETH_GSTRING_LEN, "ppe%d_tx_pkt_err_csum_fail", index);
+   u8 *buff = data;
+
+   ethtool_sprintf(&buff, "ppe%d_rx_sw_pkt", index);
+   ethtool_sprintf(&buff, "ppe%d_rx_pkt_ok", index);
+   ethtool_sprintf(&buff, "ppe%d_rx_drop_pkt_no_bd", index);
+   ethtool_sprintf(&buff, "ppe%d_rx_alloc_buf_fail", index);
+   ethtool_sprintf(&buff, "ppe%d_rx_alloc_buf_wait", index);
+   ethtool_sprintf(&buff, "ppe%d_rx_pkt_drop_no_buf", index);
+   ethtool_sprintf(&buff, "ppe%d_rx_pkt_err_fifo_full", index);
+
+   ethtool_sprintf(&buff, "ppe%d_tx_bd", index);
+   ethtool_sprintf(&buff, "ppe%d_tx_pkt", index);
+   ethtool_sprintf(&buff, "ppe%d_tx_pkt_ok", index);
+   ethtool_sprintf(&buff, "ppe%d_tx_pkt_err_fifo_empty", index);
+   ethtool_sprintf(&buff, "ppe%d_tx_pkt_err_csum_fail", index);
 }
 
 void hns_ppe_get_stats(struct hns_ppe_cb *ppe_cb, u64 *data)
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c 
b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
index b6c8910cf7ba..37c8effa421c 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
@@ -929,69 +929,42 @@ int hns_rcb_get_ring_regs_count(void)
  */
 void hns_rcb_get_strings(int stringset, u8 *data, int index)
 {
-

[net-next PATCH v2 06/10] netvsc: Update driver to use ethtool_sprintf

2021-03-16 Thread Alexander Duyck
From: Alexander Duyck 

Replace instances of sprintf or memcpy followed by a pointer update with
ethtool_sprintf.

Signed-off-by: Alexander Duyck 
---
 drivers/net/hyperv/netvsc_drv.c |   33 +++--
 1 file changed, 11 insertions(+), 22 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 15f262b70489..97b5c9b60503 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -1612,34 +1612,23 @@ static void netvsc_get_strings(struct net_device *dev, 
u32 stringset, u8 *data)
 
switch (stringset) {
case ETH_SS_STATS:
-   for (i = 0; i < ARRAY_SIZE(netvsc_stats); i++) {
-   memcpy(p, netvsc_stats[i].name, ETH_GSTRING_LEN);
-   p += ETH_GSTRING_LEN;
-   }
+   for (i = 0; i < ARRAY_SIZE(netvsc_stats); i++)
+   ethtool_sprintf(&p, netvsc_stats[i].name);
 
-   for (i = 0; i < ARRAY_SIZE(vf_stats); i++) {
-   memcpy(p, vf_stats[i].name, ETH_GSTRING_LEN);
-   p += ETH_GSTRING_LEN;
-   }
+   for (i = 0; i < ARRAY_SIZE(vf_stats); i++)
+   ethtool_sprintf(&p, vf_stats[i].name);
 
for (i = 0; i < nvdev->num_chn; i++) {
-   sprintf(p, "tx_queue_%u_packets", i);
-   p += ETH_GSTRING_LEN;
-   sprintf(p, "tx_queue_%u_bytes", i);
-   p += ETH_GSTRING_LEN;
-   sprintf(p, "rx_queue_%u_packets", i);
-   p += ETH_GSTRING_LEN;
-   sprintf(p, "rx_queue_%u_bytes", i);
-   p += ETH_GSTRING_LEN;
-   sprintf(p, "rx_queue_%u_xdp_drop", i);
-   p += ETH_GSTRING_LEN;
+   ethtool_sprintf(&p, "tx_queue_%u_packets", i);
+   ethtool_sprintf(&p, "tx_queue_%u_bytes", i);
+   ethtool_sprintf(&p, "rx_queue_%u_packets", i);
+   ethtool_sprintf(&p, "rx_queue_%u_bytes", i);
+   ethtool_sprintf(&p, "rx_queue_%u_xdp_drop", i);
}
 
for_each_present_cpu(cpu) {
-   for (i = 0; i < ARRAY_SIZE(pcpu_stats); i++) {
-   sprintf(p, pcpu_stats[i].name, cpu);
-   p += ETH_GSTRING_LEN;
-   }
+   for (i = 0; i < ARRAY_SIZE(pcpu_stats); i++)
+   ethtool_sprintf(&p, pcpu_stats[i].name, cpu);
}
 
break;




[net-next PATCH v2 03/10] nfp: Replace nfp_pr_et with ethtool_sprintf

2021-03-16 Thread Alexander Duyck
From: Alexander Duyck 

The nfp_pr_et function is nearly identical to ethtool_sprintf except that
it takes the string pointer by value and returns the updated pointer,
whereas ethtool_sprintf takes a pointer to the pointer and updates it in
place.

Since they are so close, just update nfp to make use of ethtool_sprintf.

Reviewed-by: Simon Horman 
Signed-off-by: Alexander Duyck 
---
 drivers/net/ethernet/netronome/nfp/abm/main.c  |4 +
 .../net/ethernet/netronome/nfp/nfp_net_ethtool.c   |   79 +---
 drivers/net/ethernet/netronome/nfp/nfp_port.h  |2 -
 3 files changed, 36 insertions(+), 49 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/abm/main.c 
b/drivers/net/ethernet/netronome/nfp/abm/main.c
index bdbf0726145e..605a1617b195 100644
--- a/drivers/net/ethernet/netronome/nfp/abm/main.c
+++ b/drivers/net/ethernet/netronome/nfp/abm/main.c
@@ -419,8 +419,8 @@ nfp_abm_port_get_stats_strings(struct nfp_app *app, struct 
nfp_port *port,
return data;
alink = repr->app_priv;
for (i = 0; i < alink->vnic->dp.num_r_vecs; i++) {
-   data = nfp_pr_et(data, "q%u_no_wait", i);
-   data = nfp_pr_et(data, "q%u_delayed", i);
+   ethtool_sprintf(&data, "q%u_no_wait", i);
+   ethtool_sprintf(&data, "q%u_delayed", i);
}
return data;
 }
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
index 9c9ae33d84ce..1b482446536d 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
@@ -429,17 +429,6 @@ static int nfp_net_set_ringparam(struct net_device *netdev,
return nfp_net_set_ring_size(nn, rxd_cnt, txd_cnt);
 }
 
-__printf(2, 3) u8 *nfp_pr_et(u8 *data, const char *fmt, ...)
-{
-   va_list args;
-
-   va_start(args, fmt);
-   vsnprintf(data, ETH_GSTRING_LEN, fmt, args);
-   va_end(args);
-
-   return data + ETH_GSTRING_LEN;
-}
-
 static unsigned int nfp_vnic_get_sw_stats_count(struct net_device *netdev)
 {
struct nfp_net *nn = netdev_priv(netdev);
@@ -454,29 +443,29 @@ static u8 *nfp_vnic_get_sw_stats_strings(struct 
net_device *netdev, u8 *data)
int i;
 
for (i = 0; i < nn->max_r_vecs; i++) {
-   data = nfp_pr_et(data, "rvec_%u_rx_pkts", i);
-   data = nfp_pr_et(data, "rvec_%u_tx_pkts", i);
-   data = nfp_pr_et(data, "rvec_%u_tx_busy", i);
+   ethtool_sprintf(&data, "rvec_%u_rx_pkts", i);
+   ethtool_sprintf(&data, "rvec_%u_tx_pkts", i);
+   ethtool_sprintf(&data, "rvec_%u_tx_busy", i);
}
 
-   data = nfp_pr_et(data, "hw_rx_csum_ok");
-   data = nfp_pr_et(data, "hw_rx_csum_inner_ok");
-   data = nfp_pr_et(data, "hw_rx_csum_complete");
-   data = nfp_pr_et(data, "hw_rx_csum_err");
-   data = nfp_pr_et(data, "rx_replace_buf_alloc_fail");
-   data = nfp_pr_et(data, "rx_tls_decrypted_packets");
-   data = nfp_pr_et(data, "hw_tx_csum");
-   data = nfp_pr_et(data, "hw_tx_inner_csum");
-   data = nfp_pr_et(data, "tx_gather");
-   data = nfp_pr_et(data, "tx_lso");
-   data = nfp_pr_et(data, "tx_tls_encrypted_packets");
-   data = nfp_pr_et(data, "tx_tls_ooo");
-   data = nfp_pr_et(data, "tx_tls_drop_no_sync_data");
-
-   data = nfp_pr_et(data, "hw_tls_no_space");
-   data = nfp_pr_et(data, "rx_tls_resync_req_ok");
-   data = nfp_pr_et(data, "rx_tls_resync_req_ign");
-   data = nfp_pr_et(data, "rx_tls_resync_sent");
+   ethtool_sprintf(&data, "hw_rx_csum_ok");
+   ethtool_sprintf(&data, "hw_rx_csum_inner_ok");
+   ethtool_sprintf(&data, "hw_rx_csum_complete");
+   ethtool_sprintf(&data, "hw_rx_csum_err");
+   ethtool_sprintf(&data, "rx_replace_buf_alloc_fail");
+   ethtool_sprintf(&data, "rx_tls_decrypted_packets");
+   ethtool_sprintf(&data, "hw_tx_csum");
+   ethtool_sprintf(&data, "hw_tx_inner_csum");
+   ethtool_sprintf(&data, "tx_gather");
+   ethtool_sprintf(&data, "tx_lso");
+   ethtool_sprintf(&data, "tx_tls_encrypted_packets");
+   ethtool_sprintf(&data, "tx_tls_ooo");
+   ethtool_sprintf(&data, "tx_tls_drop_no_sync_data");
+
+   ethtool_sprintf(&data, "hw_tls_no_space");
+   ethtool_sprintf(&data, "rx_tls_resync_req_ok");
+   ethtool_sprintf(&data, "rx_tls_resync_req_ign");
+   ethtool_sprintf(&data, "

[net-next PATCH v2 01/10] ethtool: Add common function for filling out strings

2021-03-16 Thread Alexander Duyck
From: Alexander Duyck 

Add a function to handle the common pattern of printing a string into the
ethtool strings interface and incrementing the string pointer by the
ETH_GSTRING_LEN. Most of the drivers end up doing this and several have
implemented their own versions of this function so it would make sense to
consolidate on one implementation.

Signed-off-by: Alexander Duyck 
---
 include/linux/ethtool.h |9 +
 net/ethtool/ioctl.c |   12 
 2 files changed, 21 insertions(+)

diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index ec4cd3921c67..3583f7fc075c 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -571,4 +571,13 @@ struct ethtool_phy_ops {
  */
 void ethtool_set_ethtool_phy_ops(const struct ethtool_phy_ops *ops);
 
+/**
+ * ethtool_sprintf - Write formatted string to ethtool string data
+ * @data: Pointer to start of string to update
+ * @fmt: Format of string to write
+ *
+ * Write formatted string to data. Update data to point at start of
+ * next string.
+ */
+extern __printf(2, 3) void ethtool_sprintf(u8 **data, const char *fmt, ...);
 #endif /* _LINUX_ETHTOOL_H */
diff --git a/net/ethtool/ioctl.c b/net/ethtool/ioctl.c
index 24783b71c584..0788cc3b3114 100644
--- a/net/ethtool/ioctl.c
+++ b/net/ethtool/ioctl.c
@@ -1844,6 +1844,18 @@ static int ethtool_get_strings(struct net_device *dev, 
void __user *useraddr)
return ret;
 }
 
+__printf(2, 3) void ethtool_sprintf(u8 **data, const char *fmt, ...)
+{
+   va_list args;
+
+   va_start(args, fmt);
+   vsnprintf(*data, ETH_GSTRING_LEN, fmt, args);
+   va_end(args);
+
+   *data += ETH_GSTRING_LEN;
+}
+EXPORT_SYMBOL(ethtool_sprintf);
+
 static int ethtool_phys_id(struct net_device *dev, void __user *useraddr)
 {
struct ethtool_value id;




[net-next PATCH v2 02/10] intel: Update drivers to use ethtool_sprintf

2021-03-16 Thread Alexander Duyck
From: Alexander Duyck 

Update the Intel drivers to make use of ethtool_sprintf. The general idea
is to reduce code size and overhead by replacing the repeated pattern of
string printf statements and ETH_GSTRING_LEN pointer increments.

Signed-off-by: Alexander Duyck 
---
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c   |   16 ++
 drivers/net/ethernet/intel/ice/ice_ethtool.c |   55 +++---
 drivers/net/ethernet/intel/igb/igb_ethtool.c |   40 ++--
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |   40 ++--
 4 files changed, 50 insertions(+), 101 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c 
b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index c70dec65a572..3c9054e13aa5 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -2368,21 +2368,15 @@ static void i40e_get_priv_flag_strings(struct 
net_device *netdev, u8 *data)
struct i40e_netdev_priv *np = netdev_priv(netdev);
struct i40e_vsi *vsi = np->vsi;
struct i40e_pf *pf = vsi->back;
-   char *p = (char *)data;
unsigned int i;
+   u8 *p = data;
 
-   for (i = 0; i < I40E_PRIV_FLAGS_STR_LEN; i++) {
-   snprintf(p, ETH_GSTRING_LEN, "%s",
-i40e_gstrings_priv_flags[i].flag_string);
-   p += ETH_GSTRING_LEN;
-   }
+   for (i = 0; i < I40E_PRIV_FLAGS_STR_LEN; i++)
+   ethtool_sprintf(&p, i40e_gstrings_priv_flags[i].flag_string);
if (pf->hw.pf_id != 0)
return;
-   for (i = 0; i < I40E_GL_PRIV_FLAGS_STR_LEN; i++) {
-   snprintf(p, ETH_GSTRING_LEN, "%s",
-i40e_gl_gstrings_priv_flags[i].flag_string);
-   p += ETH_GSTRING_LEN;
-   }
+   for (i = 0; i < I40E_GL_PRIV_FLAGS_STR_LEN; i++)
+   ethtool_sprintf(&p, i40e_gl_gstrings_priv_flags[i].flag_string);
 }
 
 static void i40e_get_strings(struct net_device *netdev, u32 stringset,
diff --git a/drivers/net/ethernet/intel/ice/ice_ethtool.c 
b/drivers/net/ethernet/intel/ice/ice_ethtool.c
index 2dcfa685b763..4f738425fb44 100644
--- a/drivers/net/ethernet/intel/ice/ice_ethtool.c
+++ b/drivers/net/ethernet/intel/ice/ice_ethtool.c
@@ -871,68 +871,47 @@ static void ice_get_strings(struct net_device *netdev, 
u32 stringset, u8 *data)
 {
struct ice_netdev_priv *np = netdev_priv(netdev);
struct ice_vsi *vsi = np->vsi;
-   char *p = (char *)data;
unsigned int i;
+   u8 *p = data;
 
switch (stringset) {
case ETH_SS_STATS:
-   for (i = 0; i < ICE_VSI_STATS_LEN; i++) {
-   snprintf(p, ETH_GSTRING_LEN, "%s",
-ice_gstrings_vsi_stats[i].stat_string);
-   p += ETH_GSTRING_LEN;
-   }
+   for (i = 0; i < ICE_VSI_STATS_LEN; i++)
+   ethtool_sprintf(&p,
+   ice_gstrings_vsi_stats[i].stat_string);
 
ice_for_each_alloc_txq(vsi, i) {
-   snprintf(p, ETH_GSTRING_LEN,
-"tx_queue_%u_packets", i);
-   p += ETH_GSTRING_LEN;
-   snprintf(p, ETH_GSTRING_LEN, "tx_queue_%u_bytes", i);
-   p += ETH_GSTRING_LEN;
+   ethtool_sprintf(&p, "tx_queue_%u_packets", i);
+   ethtool_sprintf(&p, "tx_queue_%u_bytes", i);
}
 
ice_for_each_alloc_rxq(vsi, i) {
-   snprintf(p, ETH_GSTRING_LEN,
-"rx_queue_%u_packets", i);
-   p += ETH_GSTRING_LEN;
-   snprintf(p, ETH_GSTRING_LEN, "rx_queue_%u_bytes", i);
-   p += ETH_GSTRING_LEN;
+   ethtool_sprintf(&p, "rx_queue_%u_packets", i);
+   ethtool_sprintf(&p, "rx_queue_%u_bytes", i);
}
 
if (vsi->type != ICE_VSI_PF)
return;
 
-   for (i = 0; i < ICE_PF_STATS_LEN; i++) {
-   snprintf(p, ETH_GSTRING_LEN, "%s",
-ice_gstrings_pf_stats[i].stat_string);
-   p += ETH_GSTRING_LEN;
-   }
+   for (i = 0; i < ICE_PF_STATS_LEN; i++)
+   ethtool_sprintf(&p,
+   ice_gstrings_pf_stats[i].stat_string);
 
for (i = 0; i < ICE_MAX_USER_PRIORITY; i++) {
-   snprintf(p, ETH_GSTRING_LEN,
-"tx_priority_%u_xon.nic", i);
-   p += ETH_GSTRING_LEN;
-

[net-next PATCH v2 00/10] ethtool: Factor out common code related to writing ethtool strings

2021-03-16 Thread Alexander Duyck
This patch set is meant to be a cleanup and refactoring of common code bits
from several drivers. Specifically, a number of drivers engage in a pattern
where they will use some variant on an sprintf or memcpy to write a string
into the ethtool string array and then they will increment their pointer by
ETH_GSTRING_LEN.

Instead of having each driver implement this independently I am refactoring
the code so that we have one central function, ethtool_sprintf that does
all this and takes a double pointer to access the data, a formatted string
to print, and the variable arguments that are associated with the string.

Changes from v1:
Fixed usage of char ** vs  unsigned char ** in hisilicon drivers

Changes from RFC:
Renamed ethtool_gsprintf to ethtool_sprintf
Fixed reverse xmas tree issue in patch 2

---

Alexander Duyck (10):
  ethtool: Add common function for filling out strings
  intel: Update drivers to use ethtool_sprintf
  nfp: Replace nfp_pr_et with ethtool_sprintf
  hisilicon: Update drivers to use ethtool_sprintf
  ena: Update driver to use ethtool_sprintf
  netvsc: Update driver to use ethtool_sprintf
  virtio_net: Update driver to use ethtool_sprintf
  vmxnet3: Update driver to use ethtool_sprintf
  bna: Update driver to use ethtool_sprintf
  ionic: Update driver to use ethtool_sprintf


 drivers/net/ethernet/amazon/ena/ena_ethtool.c |  25 +-
 .../net/ethernet/brocade/bna/bnad_ethtool.c   | 266 +++---
 .../ethernet/hisilicon/hns/hns_dsaf_gmac.c|   9 +-
 .../net/ethernet/hisilicon/hns/hns_dsaf_ppe.c |  41 +--
 .../net/ethernet/hisilicon/hns/hns_dsaf_rcb.c |  91 +++---
 .../ethernet/hisilicon/hns/hns_dsaf_xgmac.c   |   8 +-
 .../net/ethernet/hisilicon/hns/hns_ethtool.c  | 103 +++
 .../net/ethernet/intel/i40e/i40e_ethtool.c|  16 +-
 drivers/net/ethernet/intel/ice/ice_ethtool.c  |  55 ++--
 drivers/net/ethernet/intel/igb/igb_ethtool.c  |  40 +--
 .../net/ethernet/intel/ixgbe/ixgbe_ethtool.c  |  40 +--
 drivers/net/ethernet/netronome/nfp/abm/main.c |   4 +-
 .../ethernet/netronome/nfp/nfp_net_ethtool.c  |  79 +++---
 drivers/net/ethernet/netronome/nfp/nfp_port.h |   2 -
 .../net/ethernet/pensando/ionic/ionic_stats.c | 145 --
 drivers/net/hyperv/netvsc_drv.c   |  33 +--
 drivers/net/virtio_net.c  |  18 +-
 drivers/net/vmxnet3/vmxnet3_ethtool.c |  53 ++--
 18 files changed, 389 insertions(+), 639 deletions(-)

--



Re: [PATCH net-next v4 0/6] net: qualcomm: rmnet: stop using C bit-fields

2021-03-15 Thread Alexander Duyck
On Mon, Mar 15, 2021 at 6:36 AM Alex Elder  wrote:
>
> The main reason for version 4 of this series is that a bug was
> introduced in version 3, and that is fixed.
>
> But a nice note from Vladimir Oltean got me thinking about the
> necessity of using accessors defined in , and I
> concluded there was no need.  So this version simplifies things
> further, using bitwise AND and OR operators (rather than, e.g.,
> u8_get_bits()) to access all values encoded in bit fields.
>
> This version has been tested using IPv4 with checksum offload
> enabled and disabled.  Traffic over the link included ICMP (ping),
> UDP (iperf), and TCP (wget).
>
> Version 3 of this series used BIT() rather than GENMASK() to define
> single-bit masks, and bitwise AND operators to access them.
>
> Version 2 fixed bugs in the way the value written into the header
> was computed in version 1.
>
> The series was first posted here:
>   https://lore.kernel.org/netdev/20210304223431.15045-1-el...@linaro.org/
>
> -Alex
>
> Alex Elder (6):
>   net: qualcomm: rmnet: mark trailer field endianness
>   net: qualcomm: rmnet: simplify some byte order logic
>   net: qualcomm: rmnet: kill RMNET_MAP_GET_*() accessor macros
>   net: qualcomm: rmnet: use masks instead of C bit-fields
>   net: qualcomm: rmnet: don't use C bit-fields in rmnet checksum trailer
>   net: qualcomm: rmnet: don't use C bit-fields in rmnet checksum header
>
>  .../ethernet/qualcomm/rmnet/rmnet_handlers.c  | 10 +--
>  .../net/ethernet/qualcomm/rmnet/rmnet_map.h   | 12 
>  .../qualcomm/rmnet/rmnet_map_command.c| 11 +++-
>  .../ethernet/qualcomm/rmnet/rmnet_map_data.c  | 60 -
>  include/linux/if_rmnet.h  | 65 +--
>  5 files changed, 69 insertions(+), 89 deletions(-)
>

Other than the minor nit I pointed out in patch 2 the set looks good to me.

Reviewed-by: Alexander Duyck 


Re: [PATCH net-next v4 2/6] net: qualcomm: rmnet: simplify some byte order logic

2021-03-15 Thread Alexander Duyck
On Mon, Mar 15, 2021 at 6:36 AM Alex Elder  wrote:
>
> In rmnet_map_ipv4_ul_csum_header() and rmnet_map_ipv6_ul_csum_header()
> the offset within a packet at which checksumming should commence is
> calculated.  This calculation involves byte swapping and a forced type
> conversion that makes it hard to understand.
>
> Simplify this by computing the offset in host byte order, then
> converting the result when assigning it into the header field.
>
> Signed-off-by: Alex Elder 
> Reviewed-by: Bjorn Andersson 
> ---
>  .../ethernet/qualcomm/rmnet/rmnet_map_data.c  | 22 ++-
>  1 file changed, 12 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c 
> b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
> index 21d38167f9618..bd1aa11c9ce59 100644
> --- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
> +++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
> @@ -197,12 +197,13 @@ rmnet_map_ipv4_ul_csum_header(void *iphdr,
>   struct rmnet_map_ul_csum_header *ul_header,
>   struct sk_buff *skb)
>  {
> -   struct iphdr *ip4h = (struct iphdr *)iphdr;
> -   __be16 *hdr = (__be16 *)ul_header, offset;
> +   __be16 *hdr = (__be16 *)ul_header;
> +   struct iphdr *ip4h = iphdr;
> +   u16 offset;
> +
> +   offset = skb_transport_header(skb) - (unsigned char *)iphdr;
> +   ul_header->csum_start_offset = htons(offset);

Rather than using skb_transport_header, the correct pointer to use is
probably skb_checksum_start. The two are essentially synonymous, but
the checksumming code is supposed to use skb_checksum_start.

Alternatively you could look at using skb_network_header_len, as that
would be the same value assuming both headers are the outer headers.
Then you could avoid the extra pointer overhead.
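
A sketch of the two alternatives. skb_checksum_start() and
skb_network_header_len() are existing skbuff.h helpers; the wrapper names
below are just for illustration and assume the IP header passed in is the
outer/network header:

#include <linux/skbuff.h>

/* Using skb_checksum_start(), which is what the checksumming code is
 * expected to key off of.
 */
static u16 rmnet_ul_csum_start_offset(struct sk_buff *skb, void *iphdr)
{
	return skb_checksum_start(skb) - (unsigned char *)iphdr;
}

/* Or, since the IP header here is the outer network header, the same
 * value without the extra pointer arithmetic.
 */
static u16 rmnet_ul_csum_start_offset_hdrlen(struct sk_buff *skb)
{
	return skb_network_header_len(skb);
}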

>
> -   offset = htons((__force u16)(skb_transport_header(skb) -
> -(unsigned char *)iphdr));
> -   ul_header->csum_start_offset = offset;
> ul_header->csum_insert_offset = skb->csum_offset;
> ul_header->csum_enabled = 1;
> if (ip4h->protocol == IPPROTO_UDP)
> @@ -239,12 +240,13 @@ rmnet_map_ipv6_ul_csum_header(void *ip6hdr,
>   struct rmnet_map_ul_csum_header *ul_header,
>   struct sk_buff *skb)
>  {
> -   struct ipv6hdr *ip6h = (struct ipv6hdr *)ip6hdr;
> -   __be16 *hdr = (__be16 *)ul_header, offset;
> +   __be16 *hdr = (__be16 *)ul_header;
> +   struct ipv6hdr *ip6h = ip6hdr;
> +   u16 offset;
> +
> +   offset = skb_transport_header(skb) - (unsigned char *)ip6hdr;
> +   ul_header->csum_start_offset = htons(offset);

Same here.

>
> -   offset = htons((__force u16)(skb_transport_header(skb) -
> -(unsigned char *)ip6hdr));
> -   ul_header->csum_start_offset = offset;
> ul_header->csum_insert_offset = skb->csum_offset;
> ul_header->csum_enabled = 1;
>
> --
> 2.27.0
>


Re: [PATCH net-next] net: ipa: make ipa_table_hash_support() inline

2021-03-15 Thread Alexander Duyck
On Mon, Mar 15, 2021 at 8:01 AM Alex Elder  wrote:
>
> In review, Alexander Duyck suggested that ipa_table_hash_support()
> was trivial enough that it could be implemented as a static inline
> function in the header file.  But the patch had already been
> accepted.  Implement his suggestion.
>
> Signed-off-by: Alex Elder 

Looks good to me.

Reviewed-by: Alexander Duyck 

> ---
>  drivers/net/ipa/ipa_table.c | 5 -
>  drivers/net/ipa/ipa_table.h | 5 -
>  2 files changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/ipa/ipa_table.c b/drivers/net/ipa/ipa_table.c
> index baaab3dd0e63c..7450e27068f19 100644
> --- a/drivers/net/ipa/ipa_table.c
> +++ b/drivers/net/ipa/ipa_table.c
> @@ -239,11 +239,6 @@ static void ipa_table_validate_build(void)
>
>  #endif /* !IPA_VALIDATE */
>
> -bool ipa_table_hash_support(struct ipa *ipa)
> -{
> -   return ipa->version != IPA_VERSION_4_2;
> -}
> -
>  /* Zero entry count means no table, so just return a 0 address */
>  static dma_addr_t ipa_table_addr(struct ipa *ipa, bool filter_mask, u16 
> count)
>  {
> diff --git a/drivers/net/ipa/ipa_table.h b/drivers/net/ipa/ipa_table.h
> index 1a68d20f19d6a..889c2e93b1223 100644
> --- a/drivers/net/ipa/ipa_table.h
> +++ b/drivers/net/ipa/ipa_table.h
> @@ -55,7 +55,10 @@ static inline bool ipa_filter_map_valid(struct ipa *ipa, 
> u32 filter_mask)
>   * ipa_table_hash_support() - Return true if hashed tables are supported
>   * @ipa:   IPA pointer
>   */
> -bool ipa_table_hash_support(struct ipa *ipa);
> +static inline bool ipa_table_hash_support(struct ipa *ipa)
> +{
> +   return ipa->version != IPA_VERSION_4_2;
> +}
>
>  /**
>   * ipa_table_reset() - Reset filter and route tables entries to "none"
> --
> 2.27.0
>


Re: [PATCH] SUNRPC: Refresh rq_pages using a bulk page allocator

2021-03-12 Thread Alexander Duyck
On Fri, Mar 12, 2021 at 1:57 PM Chuck Lever  wrote:
>
> Reduce the rate at which nfsd threads hammer on the page allocator.
> This improves throughput scalability by enabling the threads to run
> more independently of each other.
>
> Signed-off-by: Chuck Lever 
> ---
> Hi Mel-
>
> This patch replaces patch 5/7 in v4 of your alloc_pages_bulk()
> series. It implements code clean-ups suggested by Alexander Duyck.
> It builds and has seen some light testing.
>
>
>  net/sunrpc/svc_xprt.c |   39 +++
>  1 file changed, 27 insertions(+), 12 deletions(-)

The updated patch looks good to me. I am good with having my
Reviewed-by added for patches 1-6. I think the only one that still
needs work is patch 7.

Reviewed-by: Alexander Duyck 


Re: [PATCH 7/7] net: page_pool: use alloc_pages_bulk in refill code path

2021-03-12 Thread Alexander Duyck
On Fri, Mar 12, 2021 at 7:43 AM Mel Gorman  wrote:
>
> From: Jesper Dangaard Brouer 
>
> There are cases where the page_pool need to refill with pages from the
> page allocator. Some workloads cause the page_pool to release pages
> instead of recycling these pages.
>
> For these workload it can improve performance to bulk alloc pages from
> the page-allocator to refill the alloc cache.
>
> For XDP-redirect workload with 100G mlx5 driver (that use page_pool)
> redirecting xdp_frame packets into a veth, that does XDP_PASS to create
> an SKB from the xdp_frame, which then cannot return the page to the
> page_pool. In this case, we saw[1] an improvement of 18.8% from using
> the alloc_pages_bulk API (3,677,958 pps -> 4,368,926 pps).
>
> [1] 
> https://github.com/xdp-project/xdp-project/blob/master/areas/mem/page_pool06_alloc_pages_bulk.org
>
> Signed-off-by: Jesper Dangaard Brouer 
> Signed-off-by: Mel Gorman 
> Reviewed-by: Ilias Apalodimas 
> ---
>  net/core/page_pool.c | 62 
>  1 file changed, 39 insertions(+), 23 deletions(-)
>
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 40e1b2beaa6c..a5889f1b86aa 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -208,44 +208,60 @@ noinline
>  static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool,
>  gfp_t _gfp)
>  {
> +   const int bulk = PP_ALLOC_CACHE_REFILL;
> +   struct page *page, *next, *first_page;
> unsigned int pp_flags = pool->p.flags;
> -   struct page *page;
> +   unsigned int pp_order = pool->p.order;
> +   int pp_nid = pool->p.nid;
> +   LIST_HEAD(page_list);
> gfp_t gfp = _gfp;
>
> -   /* We could always set __GFP_COMP, and avoid this branch, as
> -* prep_new_page() can handle order-0 with __GFP_COMP.
> -*/
> -   if (pool->p.order)
> +   /* Don't support bulk alloc for high-order pages */
> +   if (unlikely(pp_order)) {
> gfp |= __GFP_COMP;
> +   first_page = alloc_pages_node(pp_nid, gfp, pp_order);
> +   if (unlikely(!first_page))
> +   return NULL;
> +   goto out;
> +   }
>
> -   /* FUTURE development:
> -*
> -* Current slow-path essentially falls back to single page
> -* allocations, which doesn't improve performance.  This code
> -* need bulk allocation support from the page allocator code.
> -*/
> -
> -   /* Cache was empty, do real allocation */
> -#ifdef CONFIG_NUMA
> -   page = alloc_pages_node(pool->p.nid, gfp, pool->p.order);
> -#else
> -   page = alloc_pages(gfp, pool->p.order);
> -#endif
> -   if (!page)
> +   if (unlikely(!__alloc_pages_bulk(gfp, pp_nid, NULL, bulk, 
> &page_list)))
> return NULL;
>
> +   /* First page is extracted and returned to caller */
> +   first_page = list_first_entry(&page_list, struct page, lru);
> +   list_del(&first_page->lru);
> +

This seems kind of broken to me. If you pull the first page and then
cannot map it you end up returning NULL even if you placed a number of
pages in the cache.

It might make more sense to have the loop below record a pointer to
the last page you processed and handle things in two stages so that on
the first iteration you map one page.

So something along the lines of:
1. Initialize last_page to NULL

for each page in the list
  2. Map page
  3. If last_page is non-NULL, move to cache
  4. Assign page to last_page
  5. Return to step 2 for each page in list

6. return last_page
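In code that would look something like this (untested sketch; the cache
size check and tracing are omitted for brevity):

	struct page *last_page = NULL;

	list_for_each_entry_safe(page, next, &page_list, lru) {
		list_del(&page->lru);

		if ((pp_flags & PP_FLAG_DMA_MAP) &&
		    unlikely(!page_pool_dma_map(pool, page))) {
			put_page(page);
			continue;
		}

		/* the previously mapped page goes into the cache, the
		 * newest mapped page is held back for the caller
		 */
		if (last_page)
			pool->alloc.cache[pool->alloc.count++] = last_page;
		last_page = page;
	}

	/* NULL only if nothing could be mapped */
	return last_page;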

> +   /* Remaining pages store in alloc.cache */
> +   list_for_each_entry_safe(page, next, &page_list, lru) {
> +   list_del(&page->lru);
> +   if ((pp_flags & PP_FLAG_DMA_MAP) &&
> +   unlikely(!page_pool_dma_map(pool, page))) {
> +   put_page(page);
> +   continue;
> +   }

So if you added a last_page pointer what you could do is check for it
here and assign it to the alloc cache. If last_page is not set the
block would be skipped.

> +   if (likely(pool->alloc.count < PP_ALLOC_CACHE_SIZE)) {
> +   pool->alloc.cache[pool->alloc.count++] = page;
> +   pool->pages_state_hold_cnt++;
> +   trace_page_pool_state_hold(pool, page,
> +  
> pool->pages_state_hold_cnt);
> +   } else {
> +   put_page(page);

If you are just calling put_page here aren't you leaking DMA mappings?
Wouldn't you need to potentially unmap the page before you call
put_page on it?

> +   }
> +   }
> +out:
> if ((pp_flags & PP_FLAG_DMA_MAP) &&
> -   unlikely(!page_pool_dma_map(pool, page))) {
> -   put_page(page);
> +   unlikely(!page_pool_dma_map(pool, first_pag

Re: [PATCH 5/7] SUNRPC: Refresh rq_pages using a bulk page allocator

2021-03-12 Thread Alexander Duyck
On Fri, Mar 12, 2021 at 7:43 AM Mel Gorman  wrote:
>
> From: Chuck Lever 
>
> Reduce the rate at which nfsd threads hammer on the page allocator.
> This improves throughput scalability by enabling the threads to run
> more independently of each other.
>
> Signed-off-by: Chuck Lever 
> Signed-off-by: Mel Gorman 
> ---
>  net/sunrpc/svc_xprt.c | 43 +++
>  1 file changed, 31 insertions(+), 12 deletions(-)
>
> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> index cfa7e4776d0e..38a8d6283801 100644
> --- a/net/sunrpc/svc_xprt.c
> +++ b/net/sunrpc/svc_xprt.c
> @@ -642,11 +642,12 @@ static void svc_check_conn_limits(struct svc_serv *serv)
>  static int svc_alloc_arg(struct svc_rqst *rqstp)
>  {
> struct svc_serv *serv = rqstp->rq_server;
> +   unsigned long needed;
> struct xdr_buf *arg;
> +   struct page *page;
> int pages;
> int i;
>
> -   /* now allocate needed pages.  If we get a failure, sleep briefly */
> pages = (serv->sv_max_mesg + 2 * PAGE_SIZE) >> PAGE_SHIFT;
> if (pages > RPCSVC_MAXPAGES) {
> pr_warn_once("svc: warning: pages=%u > RPCSVC_MAXPAGES=%lu\n",
> @@ -654,19 +655,28 @@ static int svc_alloc_arg(struct svc_rqst *rqstp)
> /* use as many pages as possible */
> pages = RPCSVC_MAXPAGES;
> }
> -   for (i = 0; i < pages ; i++)
> -   while (rqstp->rq_pages[i] == NULL) {
> -   struct page *p = alloc_page(GFP_KERNEL);
> -   if (!p) {
> -   set_current_state(TASK_INTERRUPTIBLE);
> -   if (signalled() || kthread_should_stop()) {
> -   set_current_state(TASK_RUNNING);
> -   return -EINTR;
> -   }
> -   schedule_timeout(msecs_to_jiffies(500));
> +

> +   for (needed = 0, i = 0; i < pages ; i++)
> +   if (!rqstp->rq_pages[i])
> +   needed++;

I would use opening and closing braces for the for loop since
technically the if is a multi-line statement. It will make this more
readable.
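That is, something like:

	for (needed = 0, i = 0; i < pages; i++) {
		if (!rqstp->rq_pages[i])
			needed++;
	}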

> +   if (needed) {
> +   LIST_HEAD(list);
> +
> +retry:

Rather than open-coding a while loop with a retry label, why not just
make this "while (needed)"? Then all you have to do is break out of the
for loop and you will automatically land back here instead of having to
jump between two different labels.

> +   alloc_pages_bulk(GFP_KERNEL, needed, &list);

Rather than ignoring the return value, would it make sense here to
subtract it from needed? Then you would know whether any of the
allocation requests went unfulfilled.

> +   for (i = 0; i < pages; i++) {

It is probably optimizing for the exception case, but I don't think
you want the "i = 0" here. If you are having to stop because the list
is empty it probably makes sense to resume where you left off. So you
should probably be initializing i to 0 before we check for needed.

> +   if (!rqstp->rq_pages[i]) {

It might be cleaner here to just do a "continue" if rq_pages[i] is populated.

> +   page = list_first_entry_or_null(&list,
> +   struct page,
> +   lru);
> +   if (unlikely(!page))
> +   goto empty_list;

I think I preferred the original code that wasn't jumping away from
the loop here. With the change I suggested above that switches the
if (needed) to while (needed), you could just break out of the for loop
to land back in the while loop.
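Putting all of that together, and assuming alloc_pages_bulk() returns
the number of pages it placed on the list, the allocation path could
end up looking roughly like this (untested, just to show the shape):

	for (needed = 0, i = 0; i < pages; i++) {
		if (!rqstp->rq_pages[i])
			needed++;
	}

	i = 0;
	while (needed) {
		LIST_HEAD(list);

		needed -= alloc_pages_bulk(GFP_KERNEL, needed, &list);

		for (; i < pages; i++) {
			if (rqstp->rq_pages[i])
				continue;
			page = list_first_entry_or_null(&list,
							struct page, lru);
			if (!page)
				break;
			list_del(&page->lru);
			rqstp->rq_pages[i] = page;
		}

		if (!needed)
			break;

		/* the list ran dry before every slot was filled; sleep
		 * briefly and retry, leaving i alone so we resume right
		 * where we left off
		 */
		set_current_state(TASK_INTERRUPTIBLE);
		if (signalled() || kthread_should_stop()) {
			set_current_state(TASK_RUNNING);
			return -EINTR;
		}
		schedule_timeout(msecs_to_jiffies(500));
	}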

> +   list_del(&page->lru);
> +   rqstp->rq_pages[i] = page;
> +   needed--;
> }
> -   rqstp->rq_pages[i] = p;
> }
> +   }
> rqstp->rq_page_end = &rqstp->rq_pages[pages];
> rqstp->rq_pages[pages] = NULL; /* this might be seen in 
> nfsd_splice_actor() */
>
> @@ -681,6 +691,15 @@ static int svc_alloc_arg(struct svc_rqst *rqstp)
> arg->len = (pages-1)*PAGE_SIZE;
> arg->tail[0].iov_len = 0;
> return 0;
> +
> +empty_list:
> +   set_current_state(TASK_INTERRUPTIBLE);
> +   if (signalled() || kthread_should_stop()) {
> +   set_current_state(TASK_RUNNING);
> +   return -EINTR;
> +   }
> +   schedule_timeout(msecs_to_jiffies(500));
> +   goto retry;
>  }
>
>  static bool
> --
> 2.26.2
>


[net-next PATCH 10/10] ionic: Update driver to use ethtool_sprintf

2021-03-12 Thread Alexander Duyck
From: Alexander Duyck 

Update the ionic driver to make use of ethtool_sprintf. In addition add
separate functions for Tx/Rx stats strings in order to reduce the total
amount of indenting needed in the driver code.

Acked-by: Shannon Nelson 
Signed-off-by: Alexander Duyck 
---
 drivers/net/ethernet/pensando/ionic/ionic_stats.c |  145 +
 1 file changed, 60 insertions(+), 85 deletions(-)

diff --git a/drivers/net/ethernet/pensando/ionic/ionic_stats.c 
b/drivers/net/ethernet/pensando/ionic/ionic_stats.c
index 6ae75b771a15..308b4ac6c57b 100644
--- a/drivers/net/ethernet/pensando/ionic/ionic_stats.c
+++ b/drivers/net/ethernet/pensando/ionic/ionic_stats.c
@@ -246,98 +246,73 @@ static u64 ionic_sw_stats_get_count(struct ionic_lif *lif)
return total;
 }
 
+static void ionic_sw_stats_get_tx_strings(struct ionic_lif *lif, u8 **buf,
+ int q_num)
+{
+   int i;
+
+   for (i = 0; i < IONIC_NUM_TX_STATS; i++)
+   ethtool_sprintf(buf, "tx_%d_%s", q_num,
+   ionic_tx_stats_desc[i].name);
+
+   if (!test_bit(IONIC_LIF_F_UP, lif->state) ||
+   !test_bit(IONIC_LIF_F_SW_DEBUG_STATS, lif->state))
+   return;
+
+   for (i = 0; i < IONIC_NUM_TX_Q_STATS; i++)
+   ethtool_sprintf(buf, "txq_%d_%s", q_num,
+   ionic_txq_stats_desc[i].name);
+   for (i = 0; i < IONIC_NUM_DBG_CQ_STATS; i++)
+   ethtool_sprintf(buf, "txq_%d_cq_%s", q_num,
+   ionic_dbg_cq_stats_desc[i].name);
+   for (i = 0; i < IONIC_NUM_DBG_INTR_STATS; i++)
+   ethtool_sprintf(buf, "txq_%d_intr_%s", q_num,
+   ionic_dbg_intr_stats_desc[i].name);
+   for (i = 0; i < IONIC_MAX_NUM_SG_CNTR; i++)
+   ethtool_sprintf(buf, "txq_%d_sg_cntr_%d", q_num, i);
+}
+
+static void ionic_sw_stats_get_rx_strings(struct ionic_lif *lif, u8 **buf,
+ int q_num)
+{
+   int i;
+
+   for (i = 0; i < IONIC_NUM_RX_STATS; i++)
+   ethtool_sprintf(buf, "rx_%d_%s", q_num,
+   ionic_rx_stats_desc[i].name);
+
+   if (!test_bit(IONIC_LIF_F_UP, lif->state) ||
+   !test_bit(IONIC_LIF_F_SW_DEBUG_STATS, lif->state))
+   return;
+
+   for (i = 0; i < IONIC_NUM_DBG_CQ_STATS; i++)
+   ethtool_sprintf(buf, "rxq_%d_cq_%s", q_num,
+   ionic_dbg_cq_stats_desc[i].name);
+   for (i = 0; i < IONIC_NUM_DBG_INTR_STATS; i++)
+   ethtool_sprintf(buf, "rxq_%d_intr_%s", q_num,
+   ionic_dbg_intr_stats_desc[i].name);
+   for (i = 0; i < IONIC_NUM_DBG_NAPI_STATS; i++)
+   ethtool_sprintf(buf, "rxq_%d_napi_%s", q_num,
+   ionic_dbg_napi_stats_desc[i].name);
+   for (i = 0; i < IONIC_MAX_NUM_NAPI_CNTR; i++)
+   ethtool_sprintf(buf, "rxq_%d_napi_work_done_%d", q_num, i);
+}
+
 static void ionic_sw_stats_get_strings(struct ionic_lif *lif, u8 **buf)
 {
int i, q_num;
 
-   for (i = 0; i < IONIC_NUM_LIF_STATS; i++) {
-   snprintf(*buf, ETH_GSTRING_LEN, ionic_lif_stats_desc[i].name);
-   *buf += ETH_GSTRING_LEN;
-   }
+   for (i = 0; i < IONIC_NUM_LIF_STATS; i++)
+   ethtool_sprintf(buf, ionic_lif_stats_desc[i].name);
 
-   for (i = 0; i < IONIC_NUM_PORT_STATS; i++) {
-   snprintf(*buf, ETH_GSTRING_LEN,
-ionic_port_stats_desc[i].name);
-   *buf += ETH_GSTRING_LEN;
-   }
+   for (i = 0; i < IONIC_NUM_PORT_STATS; i++)
+   ethtool_sprintf(buf, ionic_port_stats_desc[i].name);
 
-   for (q_num = 0; q_num < MAX_Q(lif); q_num++) {
-   for (i = 0; i < IONIC_NUM_TX_STATS; i++) {
-   snprintf(*buf, ETH_GSTRING_LEN, "tx_%d_%s",
-q_num, ionic_tx_stats_desc[i].name);
-   *buf += ETH_GSTRING_LEN;
-   }
+   for (q_num = 0; q_num < MAX_Q(lif); q_num++)
+   ionic_sw_stats_get_tx_strings(lif, buf, q_num);
 
-   if (test_bit(IONIC_LIF_F_UP, lif->state) &&
-   test_bit(IONIC_LIF_F_SW_DEBUG_STATS, lif->state)) {
-   for (i = 0; i < IONIC_NUM_TX_Q_STATS; i++) {
-   snprintf(*buf, ETH_GSTRING_LEN,
-"txq_%d_%s",
-q_num,
-ionic_txq_stats_desc[i].name);
-   *buf += ETH_GSTRING_LEN;
-   }
-   for (i = 0; i 

[net-next PATCH 07/10] virtio_net: Update driver to use ethtool_sprintf

2021-03-12 Thread Alexander Duyck
From: Alexander Duyck 

Update the code to replace instances of snprintf and a pointer update with
just calling ethtool_sprintf.

Also replace the char pointer with a u8 pointer to avoid having to recast
the pointer type.

Signed-off-by: Alexander Duyck 
---
 drivers/net/virtio_net.c |   18 +++---
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index e97288dd6e5a..77ba8e2fc11c 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2138,25 +2138,21 @@ static int virtnet_set_channels(struct net_device *dev,
 static void virtnet_get_strings(struct net_device *dev, u32 stringset, u8 
*data)
 {
struct virtnet_info *vi = netdev_priv(dev);
-   char *p = (char *)data;
unsigned int i, j;
+   u8 *p = data;
 
switch (stringset) {
case ETH_SS_STATS:
for (i = 0; i < vi->curr_queue_pairs; i++) {
-   for (j = 0; j < VIRTNET_RQ_STATS_LEN; j++) {
-   snprintf(p, ETH_GSTRING_LEN, "rx_queue_%u_%s",
-i, virtnet_rq_stats_desc[j].desc);
-   p += ETH_GSTRING_LEN;
-   }
+   for (j = 0; j < VIRTNET_RQ_STATS_LEN; j++)
+   ethtool_sprintf(&p, "rx_queue_%u_%s", i,
+   virtnet_rq_stats_desc[j].desc);
}
 
for (i = 0; i < vi->curr_queue_pairs; i++) {
-   for (j = 0; j < VIRTNET_SQ_STATS_LEN; j++) {
-   snprintf(p, ETH_GSTRING_LEN, "tx_queue_%u_%s",
-i, virtnet_sq_stats_desc[j].desc);
-   p += ETH_GSTRING_LEN;
-   }
+   for (j = 0; j < VIRTNET_SQ_STATS_LEN; j++)
+   ethtool_sprintf(&p, "tx_queue_%u_%s", i,
+   virtnet_sq_stats_desc[j].desc);
}
break;
}




[net-next PATCH 08/10] vmxnet3: Update driver to use ethtool_sprintf

2021-03-12 Thread Alexander Duyck
From: Alexander Duyck 

So this patch actually does 3 things.

First it removes a stray white space at the start of the variable
declaration in vmxnet3_get_strings.

Second it flips the logic for the string set test so that we exit
immediately if we are not looking for the stats strings. Doing this
lets us avoid unnecessary indentation and line wrapping.

Then finally it updates the code to use ethtool_sprintf rather than a
memcpy and pointer increment to write the ethtool strings.

Signed-off-by: Alexander Duyck 
---
 drivers/net/vmxnet3/vmxnet3_ethtool.c |   53 -
 1 file changed, 19 insertions(+), 34 deletions(-)

diff --git a/drivers/net/vmxnet3/vmxnet3_ethtool.c 
b/drivers/net/vmxnet3/vmxnet3_ethtool.c
index 7ec8652f2c26..c0bd9cbc43b1 100644
--- a/drivers/net/vmxnet3/vmxnet3_ethtool.c
+++ b/drivers/net/vmxnet3/vmxnet3_ethtool.c
@@ -218,43 +218,28 @@ vmxnet3_get_drvinfo(struct net_device *netdev, struct 
ethtool_drvinfo *drvinfo)
 static void
 vmxnet3_get_strings(struct net_device *netdev, u32 stringset, u8 *buf)
 {
-struct vmxnet3_adapter *adapter = netdev_priv(netdev);
-   if (stringset == ETH_SS_STATS) {
-   int i, j;
-   for (j = 0; j < adapter->num_tx_queues; j++) {
-   for (i = 0; i < ARRAY_SIZE(vmxnet3_tq_dev_stats); i++) {
-   memcpy(buf, vmxnet3_tq_dev_stats[i].desc,
-  ETH_GSTRING_LEN);
-   buf += ETH_GSTRING_LEN;
-   }
-   for (i = 0; i < ARRAY_SIZE(vmxnet3_tq_driver_stats);
-i++) {
-   memcpy(buf, vmxnet3_tq_driver_stats[i].desc,
-  ETH_GSTRING_LEN);
-   buf += ETH_GSTRING_LEN;
-   }
-   }
+   struct vmxnet3_adapter *adapter = netdev_priv(netdev);
+   int i, j;
 
-   for (j = 0; j < adapter->num_rx_queues; j++) {
-   for (i = 0; i < ARRAY_SIZE(vmxnet3_rq_dev_stats); i++) {
-   memcpy(buf, vmxnet3_rq_dev_stats[i].desc,
-  ETH_GSTRING_LEN);
-   buf += ETH_GSTRING_LEN;
-   }
-   for (i = 0; i < ARRAY_SIZE(vmxnet3_rq_driver_stats);
-i++) {
-   memcpy(buf, vmxnet3_rq_driver_stats[i].desc,
-  ETH_GSTRING_LEN);
-   buf += ETH_GSTRING_LEN;
-   }
-   }
+   if (stringset != ETH_SS_STATS)
+   return;
 
-   for (i = 0; i < ARRAY_SIZE(vmxnet3_global_stats); i++) {
-   memcpy(buf, vmxnet3_global_stats[i].desc,
-   ETH_GSTRING_LEN);
-   buf += ETH_GSTRING_LEN;
-   }
+   for (j = 0; j < adapter->num_tx_queues; j++) {
+   for (i = 0; i < ARRAY_SIZE(vmxnet3_tq_dev_stats); i++)
+   ethtool_sprintf(&buf, vmxnet3_tq_dev_stats[i].desc);
+   for (i = 0; i < ARRAY_SIZE(vmxnet3_tq_driver_stats); i++)
+   ethtool_sprintf(&buf, vmxnet3_tq_driver_stats[i].desc);
+   }
+
+   for (j = 0; j < adapter->num_rx_queues; j++) {
+   for (i = 0; i < ARRAY_SIZE(vmxnet3_rq_dev_stats); i++)
+   ethtool_sprintf(&buf, vmxnet3_rq_dev_stats[i].desc);
+   for (i = 0; i < ARRAY_SIZE(vmxnet3_rq_driver_stats); i++)
+   ethtool_sprintf(&buf, vmxnet3_rq_driver_stats[i].desc);
}
+
+   for (i = 0; i < ARRAY_SIZE(vmxnet3_global_stats); i++)
+   ethtool_sprintf(&buf, vmxnet3_global_stats[i].desc);
 }
 
 netdev_features_t vmxnet3_fix_features(struct net_device *netdev,




[net-next PATCH 09/10] bna: Update driver to use ethtool_sprintf

2021-03-12 Thread Alexander Duyck
From: Alexander Duyck 

Update the bnad_get_strings to make use of ethtool_sprintf and avoid
unnecessary line wrapping. To do this we invert the logic for the string
set test and instead exit immediately if we are not working with the stats
strings. In addition the function is broken up into subfunctions for each
area so that we can simply call ethtool_sprintf once for each string in a
given subsection.

Signed-off-by: Alexander Duyck 
---
 drivers/net/ethernet/brocade/bna/bnad_ethtool.c |  266 +--
 1 file changed, 105 insertions(+), 161 deletions(-)

diff --git a/drivers/net/ethernet/brocade/bna/bnad_ethtool.c 
b/drivers/net/ethernet/brocade/bna/bnad_ethtool.c
index 588c4804d10a..265c2fa6bbe0 100644
--- a/drivers/net/ethernet/brocade/bna/bnad_ethtool.c
+++ b/drivers/net/ethernet/brocade/bna/bnad_ethtool.c
@@ -524,6 +524,68 @@ bnad_set_pauseparam(struct net_device *netdev,
return 0;
 }
 
+static void bnad_get_txf_strings(u8 **string, int f_num)
+{
+   ethtool_sprintf(string, "txf%d_ucast_octets", f_num);
+   ethtool_sprintf(string, "txf%d_ucast", f_num);
+   ethtool_sprintf(string, "txf%d_ucast_vlan", f_num);
+   ethtool_sprintf(string, "txf%d_mcast_octets", f_num);
+   ethtool_sprintf(string, "txf%d_mcast", f_num);
+   ethtool_sprintf(string, "txf%d_mcast_vlan", f_num);
+   ethtool_sprintf(string, "txf%d_bcast_octets", f_num);
+   ethtool_sprintf(string, "txf%d_bcast", f_num);
+   ethtool_sprintf(string, "txf%d_bcast_vlan", f_num);
+   ethtool_sprintf(string, "txf%d_errors", f_num);
+   ethtool_sprintf(string, "txf%d_filter_vlan", f_num);
+   ethtool_sprintf(string, "txf%d_filter_mac_sa", f_num);
+}
+
+static void bnad_get_rxf_strings(u8 **string, int f_num)
+{
+   ethtool_sprintf(string, "rxf%d_ucast_octets", f_num);
+   ethtool_sprintf(string, "rxf%d_ucast", f_num);
+   ethtool_sprintf(string, "rxf%d_ucast_vlan", f_num);
+   ethtool_sprintf(string, "rxf%d_mcast_octets", f_num);
+   ethtool_sprintf(string, "rxf%d_mcast", f_num);
+   ethtool_sprintf(string, "rxf%d_mcast_vlan", f_num);
+   ethtool_sprintf(string, "rxf%d_bcast_octets", f_num);
+   ethtool_sprintf(string, "rxf%d_bcast", f_num);
+   ethtool_sprintf(string, "rxf%d_bcast_vlan", f_num);
+   ethtool_sprintf(string, "rxf%d_frame_drops", f_num);
+}
+
+static void bnad_get_cq_strings(u8 **string, int q_num)
+{
+   ethtool_sprintf(string, "cq%d_producer_index", q_num);
+   ethtool_sprintf(string, "cq%d_consumer_index", q_num);
+   ethtool_sprintf(string, "cq%d_hw_producer_index", q_num);
+   ethtool_sprintf(string, "cq%d_intr", q_num);
+   ethtool_sprintf(string, "cq%d_poll", q_num);
+   ethtool_sprintf(string, "cq%d_schedule", q_num);
+   ethtool_sprintf(string, "cq%d_keep_poll", q_num);
+   ethtool_sprintf(string, "cq%d_complete", q_num);
+}
+
+static void bnad_get_rxq_strings(u8 **string, int q_num)
+{
+   ethtool_sprintf(string, "rxq%d_packets", q_num);
+   ethtool_sprintf(string, "rxq%d_bytes", q_num);
+   ethtool_sprintf(string, "rxq%d_packets_with_error", q_num);
+   ethtool_sprintf(string, "rxq%d_allocbuf_failed", q_num);
+   ethtool_sprintf(string, "rxq%d_mapbuf_failed", q_num);
+   ethtool_sprintf(string, "rxq%d_producer_index", q_num);
+   ethtool_sprintf(string, "rxq%d_consumer_index", q_num);
+}
+
+static void bnad_get_txq_strings(u8 **string, int q_num)
+{
+   ethtool_sprintf(string, "txq%d_packets", q_num);
+   ethtool_sprintf(string, "txq%d_bytes", q_num);
+   ethtool_sprintf(string, "txq%d_producer_index", q_num);
+   ethtool_sprintf(string, "txq%d_consumer_index", q_num);
+   ethtool_sprintf(string, "txq%d_hw_consumer_index", q_num);
+}
+
 static void
 bnad_get_strings(struct net_device *netdev, u32 stringset, u8 *string)
 {
@@ -531,175 +593,57 @@ bnad_get_strings(struct net_device *netdev, u32 
stringset, u8 *string)
int i, j, q_num;
u32 bmap;
 
+   if (stringset != ETH_SS_STATS)
+   return;
+
mutex_lock(&bnad->conf_mutex);
 
-   switch (stringset) {
-   case ETH_SS_STATS:
-   for (i = 0; i < BNAD_ETHTOOL_STATS_NUM; i++) {
-   BUG_ON(!(strlen(bnad_net_stats_strings[i]) <
-  ETH_GSTRING_LEN));
-   strncpy(string, bnad_net_stats_strings[i],
-   ETH_GSTRING_LEN);
-   string += ETH_GSTRING_LEN;
-   }
-   bmap = bna_t

[net-next PATCH 06/10] netvsc: Update driver to use ethtool_sprintf

2021-03-12 Thread Alexander Duyck
From: Alexander Duyck 

Replace instances of sprintf or memcpy plus a pointer update with
ethtool_sprintf.

Signed-off-by: Alexander Duyck 
---
 drivers/net/hyperv/netvsc_drv.c |   33 +++--
 1 file changed, 11 insertions(+), 22 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 15f262b70489..97b5c9b60503 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -1612,34 +1612,23 @@ static void netvsc_get_strings(struct net_device *dev, 
u32 stringset, u8 *data)
 
switch (stringset) {
case ETH_SS_STATS:
-   for (i = 0; i < ARRAY_SIZE(netvsc_stats); i++) {
-   memcpy(p, netvsc_stats[i].name, ETH_GSTRING_LEN);
-   p += ETH_GSTRING_LEN;
-   }
+   for (i = 0; i < ARRAY_SIZE(netvsc_stats); i++)
+   ethtool_sprintf(&p, netvsc_stats[i].name);
 
-   for (i = 0; i < ARRAY_SIZE(vf_stats); i++) {
-   memcpy(p, vf_stats[i].name, ETH_GSTRING_LEN);
-   p += ETH_GSTRING_LEN;
-   }
+   for (i = 0; i < ARRAY_SIZE(vf_stats); i++)
+   ethtool_sprintf(&p, vf_stats[i].name);
 
for (i = 0; i < nvdev->num_chn; i++) {
-   sprintf(p, "tx_queue_%u_packets", i);
-   p += ETH_GSTRING_LEN;
-   sprintf(p, "tx_queue_%u_bytes", i);
-   p += ETH_GSTRING_LEN;
-   sprintf(p, "rx_queue_%u_packets", i);
-   p += ETH_GSTRING_LEN;
-   sprintf(p, "rx_queue_%u_bytes", i);
-   p += ETH_GSTRING_LEN;
-   sprintf(p, "rx_queue_%u_xdp_drop", i);
-   p += ETH_GSTRING_LEN;
+   ethtool_sprintf(&p, "tx_queue_%u_packets", i);
+   ethtool_sprintf(&p, "tx_queue_%u_bytes", i);
+   ethtool_sprintf(&p, "rx_queue_%u_packets", i);
+   ethtool_sprintf(&p, "rx_queue_%u_bytes", i);
+   ethtool_sprintf(&p, "rx_queue_%u_xdp_drop", i);
}
 
for_each_present_cpu(cpu) {
-   for (i = 0; i < ARRAY_SIZE(pcpu_stats); i++) {
-   sprintf(p, pcpu_stats[i].name, cpu);
-   p += ETH_GSTRING_LEN;
-   }
+   for (i = 0; i < ARRAY_SIZE(pcpu_stats); i++)
+   ethtool_sprintf(&p, pcpu_stats[i].name, cpu);
}
 
break;




[net-next PATCH 05/10] ena: Update driver to use ethtool_sprintf

2021-03-12 Thread Alexander Duyck
From: Alexander Duyck 

Replace instances of snprintf or memcpy plus a pointer update with
ethtool_sprintf.

Acked-by: Arthur Kiyanovski 
Signed-off-by: Alexander Duyck 
---
 drivers/net/ethernet/amazon/ena/ena_ethtool.c |   25 +++--
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_ethtool.c 
b/drivers/net/ethernet/amazon/ena/ena_ethtool.c
index d6cc7aa612b7..2fe7ccee55b2 100644
--- a/drivers/net/ethernet/amazon/ena/ena_ethtool.c
+++ b/drivers/net/ethernet/amazon/ena/ena_ethtool.c
@@ -251,10 +251,10 @@ static void ena_queue_strings(struct ena_adapter 
*adapter, u8 **data)
for (j = 0; j < ENA_STATS_ARRAY_TX; j++) {
ena_stats = &ena_stats_tx_strings[j];
 
-   snprintf(*data, ETH_GSTRING_LEN,
-"queue_%u_%s_%s", i,
-is_xdp ? "xdp_tx" : "tx", ena_stats->name);
-   (*data) += ETH_GSTRING_LEN;
+   ethtool_sprintf(data,
+   "queue_%u_%s_%s", i,
+   is_xdp ? "xdp_tx" : "tx",
+   ena_stats->name);
}
 
if (!is_xdp) {
@@ -264,9 +264,9 @@ static void ena_queue_strings(struct ena_adapter *adapter, 
u8 **data)
for (j = 0; j < ENA_STATS_ARRAY_RX; j++) {
ena_stats = &ena_stats_rx_strings[j];
 
-   snprintf(*data, ETH_GSTRING_LEN,
-"queue_%u_rx_%s", i, ena_stats->name);
-   (*data) += ETH_GSTRING_LEN;
+   ethtool_sprintf(data,
+   "queue_%u_rx_%s", i,
+   ena_stats->name);
}
}
}
@@ -280,9 +280,8 @@ static void ena_com_dev_strings(u8 **data)
for (i = 0; i < ENA_STATS_ARRAY_ENA_COM; i++) {
ena_stats = &ena_stats_ena_com_strings[i];
 
-   snprintf(*data, ETH_GSTRING_LEN,
-"ena_admin_q_%s", ena_stats->name);
-   (*data) += ETH_GSTRING_LEN;
+   ethtool_sprintf(data,
+   "ena_admin_q_%s", ena_stats->name);
}
 }
 
@@ -295,15 +294,13 @@ static void ena_get_strings(struct ena_adapter *adapter,
 
for (i = 0; i < ENA_STATS_ARRAY_GLOBAL; i++) {
ena_stats = &ena_stats_global_strings[i];
-   memcpy(data, ena_stats->name, ETH_GSTRING_LEN);
-   data += ETH_GSTRING_LEN;
+   ethtool_sprintf(&data, ena_stats->name);
}
 
if (eni_stats_needed) {
for (i = 0; i < ENA_STATS_ARRAY_ENI(adapter); i++) {
ena_stats = &ena_stats_eni_strings[i];
-   memcpy(data, ena_stats->name, ETH_GSTRING_LEN);
-   data += ETH_GSTRING_LEN;
+   ethtool_sprintf(&data, ena_stats->name);
}
}
 




[net-next PATCH 04/10] hisilicon: Update drivers to use ethtool_sprintf

2021-03-12 Thread Alexander Duyck
From: Alexander Duyck 

Update the hisilicon drivers to make use of ethtool_sprintf. The general
idea is to reduce code size and overhead by replacing the repeated pattern
of string printf statements and ETH_STRING_LEN counter increments.

Signed-off-by: Alexander Duyck 
---
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c |7 -
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c  |   37 +++-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c  |   89 ++
 .../net/ethernet/hisilicon/hns/hns_dsaf_xgmac.c|6 -
 drivers/net/ethernet/hisilicon/hns/hns_ethtool.c   |   97 +++-
 5 files changed, 82 insertions(+), 154 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c 
b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c
index 7fb7a419607d..91b64db91e51 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c
@@ -693,11 +693,8 @@ static void hns_gmac_get_strings(u32 stringset, u8 *data)
if (stringset != ETH_SS_STATS)
return;
 
-   for (i = 0; i < ARRAY_SIZE(g_gmac_stats_string); i++) {
-   snprintf(buff, ETH_GSTRING_LEN, "%s",
-g_gmac_stats_string[i].desc);
-   buff = buff + ETH_GSTRING_LEN;
-   }
+   for (i = 0; i < ARRAY_SIZE(g_gmac_stats_string); i++)
+   ethtool_sprintf(&buff, g_gmac_stats_string[i].desc);
 }
 
 static int hns_gmac_get_sset_count(int stringset)
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c 
b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
index d0f8b1fff333..f331621fcc41 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
@@ -465,30 +465,19 @@ void hns_ppe_get_strings(struct hns_ppe_cb *ppe_cb, int 
stringset, u8 *data)
char *buff = (char *)data;
int index = ppe_cb->index;
 
-   snprintf(buff, ETH_GSTRING_LEN, "ppe%d_rx_sw_pkt", index);
-   buff = buff + ETH_GSTRING_LEN;
-   snprintf(buff, ETH_GSTRING_LEN, "ppe%d_rx_pkt_ok", index);
-   buff = buff + ETH_GSTRING_LEN;
-   snprintf(buff, ETH_GSTRING_LEN, "ppe%d_rx_drop_pkt_no_bd", index);
-   buff = buff + ETH_GSTRING_LEN;
-   snprintf(buff, ETH_GSTRING_LEN, "ppe%d_rx_alloc_buf_fail", index);
-   buff = buff + ETH_GSTRING_LEN;
-   snprintf(buff, ETH_GSTRING_LEN, "ppe%d_rx_alloc_buf_wait", index);
-   buff = buff + ETH_GSTRING_LEN;
-   snprintf(buff, ETH_GSTRING_LEN, "ppe%d_rx_pkt_drop_no_buf", index);
-   buff = buff + ETH_GSTRING_LEN;
-   snprintf(buff, ETH_GSTRING_LEN, "ppe%d_rx_pkt_err_fifo_full", index);
-   buff = buff + ETH_GSTRING_LEN;
-
-   snprintf(buff, ETH_GSTRING_LEN, "ppe%d_tx_bd", index);
-   buff = buff + ETH_GSTRING_LEN;
-   snprintf(buff, ETH_GSTRING_LEN, "ppe%d_tx_pkt", index);
-   buff = buff + ETH_GSTRING_LEN;
-   snprintf(buff, ETH_GSTRING_LEN, "ppe%d_tx_pkt_ok", index);
-   buff = buff + ETH_GSTRING_LEN;
-   snprintf(buff, ETH_GSTRING_LEN, "ppe%d_tx_pkt_err_fifo_empty", index);
-   buff = buff + ETH_GSTRING_LEN;
-   snprintf(buff, ETH_GSTRING_LEN, "ppe%d_tx_pkt_err_csum_fail", index);
+   ethtool_sprintf(&buff, "ppe%d_rx_sw_pkt", index);
+   ethtool_sprintf(&buff, "ppe%d_rx_pkt_ok", index);
+   ethtool_sprintf(&buff, "ppe%d_rx_drop_pkt_no_bd", index);
+   ethtool_sprintf(&buff, "ppe%d_rx_alloc_buf_fail", index);
+   ethtool_sprintf(&buff, "ppe%d_rx_alloc_buf_wait", index);
+   ethtool_sprintf(&buff, "ppe%d_rx_pkt_drop_no_buf", index);
+   ethtool_sprintf(&buff, "ppe%d_rx_pkt_err_fifo_full", index);
+
+   ethtool_sprintf(&buff, "ppe%d_tx_bd", index);
+   ethtool_sprintf(&buff, "ppe%d_tx_pkt", index);
+   ethtool_sprintf(&buff, "ppe%d_tx_pkt_ok", index);
+   ethtool_sprintf(&buff, "ppe%d_tx_pkt_err_fifo_empty", index);
+   ethtool_sprintf(&buff, "ppe%d_tx_pkt_err_csum_fail", index);
 }
 
 void hns_ppe_get_stats(struct hns_ppe_cb *ppe_cb, u64 *data)
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c 
b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
index b6c8910cf7ba..f9f0736a2c63 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
@@ -934,64 +934,37 @@ void hns_rcb_get_strings(int stringset, u8 *data, int 
index)
if (stringset != ETH_SS_STATS)
return;
 
-   snprintf(buff, ETH_GSTRING_LEN, "tx_ring%d_rcb_pkt_num", index);
-   buff = buff + ETH_GSTRING_LEN;
-   snprintf(buff, ETH_GSTRING_LEN, "tx_ring%d_ppe_tx_pkt_nu

[net-next PATCH 03/10] nfp: Replace nfp_pr_et with ethtool_sprintf

2021-03-12 Thread Alexander Duyck
From: Alexander Duyck 

The nfp_pr_et function is nearly identical to ethtool_sprintf, except
that it takes the string pointer by value and returns the updated
pointer, whereas ethtool_sprintf takes a pointer to the pointer and
updates it in place.

Since they are so close, just update nfp to make use of ethtool_sprintf.

Reviewed-by: Simon Horman 
Signed-off-by: Alexander Duyck 
---
 drivers/net/ethernet/netronome/nfp/abm/main.c  |4 +
 .../net/ethernet/netronome/nfp/nfp_net_ethtool.c   |   79 +---
 drivers/net/ethernet/netronome/nfp/nfp_port.h  |2 -
 3 files changed, 36 insertions(+), 49 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/abm/main.c 
b/drivers/net/ethernet/netronome/nfp/abm/main.c
index bdbf0726145e..605a1617b195 100644
--- a/drivers/net/ethernet/netronome/nfp/abm/main.c
+++ b/drivers/net/ethernet/netronome/nfp/abm/main.c
@@ -419,8 +419,8 @@ nfp_abm_port_get_stats_strings(struct nfp_app *app, struct 
nfp_port *port,
return data;
alink = repr->app_priv;
for (i = 0; i < alink->vnic->dp.num_r_vecs; i++) {
-   data = nfp_pr_et(data, "q%u_no_wait", i);
-   data = nfp_pr_et(data, "q%u_delayed", i);
+   ethtool_sprintf(&data, "q%u_no_wait", i);
+   ethtool_sprintf(&data, "q%u_delayed", i);
}
return data;
 }
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
index 9c9ae33d84ce..1b482446536d 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
@@ -429,17 +429,6 @@ static int nfp_net_set_ringparam(struct net_device *netdev,
return nfp_net_set_ring_size(nn, rxd_cnt, txd_cnt);
 }
 
-__printf(2, 3) u8 *nfp_pr_et(u8 *data, const char *fmt, ...)
-{
-   va_list args;
-
-   va_start(args, fmt);
-   vsnprintf(data, ETH_GSTRING_LEN, fmt, args);
-   va_end(args);
-
-   return data + ETH_GSTRING_LEN;
-}
-
 static unsigned int nfp_vnic_get_sw_stats_count(struct net_device *netdev)
 {
struct nfp_net *nn = netdev_priv(netdev);
@@ -454,29 +443,29 @@ static u8 *nfp_vnic_get_sw_stats_strings(struct 
net_device *netdev, u8 *data)
int i;
 
for (i = 0; i < nn->max_r_vecs; i++) {
-   data = nfp_pr_et(data, "rvec_%u_rx_pkts", i);
-   data = nfp_pr_et(data, "rvec_%u_tx_pkts", i);
-   data = nfp_pr_et(data, "rvec_%u_tx_busy", i);
+   ethtool_sprintf(&data, "rvec_%u_rx_pkts", i);
+   ethtool_sprintf(&data, "rvec_%u_tx_pkts", i);
+   ethtool_sprintf(&data, "rvec_%u_tx_busy", i);
}
 
-   data = nfp_pr_et(data, "hw_rx_csum_ok");
-   data = nfp_pr_et(data, "hw_rx_csum_inner_ok");
-   data = nfp_pr_et(data, "hw_rx_csum_complete");
-   data = nfp_pr_et(data, "hw_rx_csum_err");
-   data = nfp_pr_et(data, "rx_replace_buf_alloc_fail");
-   data = nfp_pr_et(data, "rx_tls_decrypted_packets");
-   data = nfp_pr_et(data, "hw_tx_csum");
-   data = nfp_pr_et(data, "hw_tx_inner_csum");
-   data = nfp_pr_et(data, "tx_gather");
-   data = nfp_pr_et(data, "tx_lso");
-   data = nfp_pr_et(data, "tx_tls_encrypted_packets");
-   data = nfp_pr_et(data, "tx_tls_ooo");
-   data = nfp_pr_et(data, "tx_tls_drop_no_sync_data");
-
-   data = nfp_pr_et(data, "hw_tls_no_space");
-   data = nfp_pr_et(data, "rx_tls_resync_req_ok");
-   data = nfp_pr_et(data, "rx_tls_resync_req_ign");
-   data = nfp_pr_et(data, "rx_tls_resync_sent");
+   ethtool_sprintf(&data, "hw_rx_csum_ok");
+   ethtool_sprintf(&data, "hw_rx_csum_inner_ok");
+   ethtool_sprintf(&data, "hw_rx_csum_complete");
+   ethtool_sprintf(&data, "hw_rx_csum_err");
+   ethtool_sprintf(&data, "rx_replace_buf_alloc_fail");
+   ethtool_sprintf(&data, "rx_tls_decrypted_packets");
+   ethtool_sprintf(&data, "hw_tx_csum");
+   ethtool_sprintf(&data, "hw_tx_inner_csum");
+   ethtool_sprintf(&data, "tx_gather");
+   ethtool_sprintf(&data, "tx_lso");
+   ethtool_sprintf(&data, "tx_tls_encrypted_packets");
+   ethtool_sprintf(&data, "tx_tls_ooo");
+   ethtool_sprintf(&data, "tx_tls_drop_no_sync_data");
+
+   ethtool_sprintf(&data, "hw_tls_no_space");
+   ethtool_sprintf(&data, "rx_tls_resync_req_ok");
+   ethtool_sprintf(&data, "rx_tls_resync_req_ign");
+   ethtool_sprintf(&data, "

[net-next PATCH 02/10] intel: Update drivers to use ethtool_sprintf

2021-03-12 Thread Alexander Duyck
From: Alexander Duyck 

Update the Intel drivers to make use of ethtool_sprintf. The general idea
is to reduce code size and overhead by replacing the repeated pattern of
string printf statements and ETH_GSTRING_LEN counter increments.

Signed-off-by: Alexander Duyck 
---
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c   |   16 ++
 drivers/net/ethernet/intel/ice/ice_ethtool.c |   55 +++---
 drivers/net/ethernet/intel/igb/igb_ethtool.c |   40 ++--
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |   40 ++--
 4 files changed, 50 insertions(+), 101 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c 
b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index c70dec65a572..3c9054e13aa5 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -2368,21 +2368,15 @@ static void i40e_get_priv_flag_strings(struct 
net_device *netdev, u8 *data)
struct i40e_netdev_priv *np = netdev_priv(netdev);
struct i40e_vsi *vsi = np->vsi;
struct i40e_pf *pf = vsi->back;
-   char *p = (char *)data;
unsigned int i;
+   u8 *p = data;
 
-   for (i = 0; i < I40E_PRIV_FLAGS_STR_LEN; i++) {
-   snprintf(p, ETH_GSTRING_LEN, "%s",
-i40e_gstrings_priv_flags[i].flag_string);
-   p += ETH_GSTRING_LEN;
-   }
+   for (i = 0; i < I40E_PRIV_FLAGS_STR_LEN; i++)
+   ethtool_sprintf(&p, i40e_gstrings_priv_flags[i].flag_string);
if (pf->hw.pf_id != 0)
return;
-   for (i = 0; i < I40E_GL_PRIV_FLAGS_STR_LEN; i++) {
-   snprintf(p, ETH_GSTRING_LEN, "%s",
-i40e_gl_gstrings_priv_flags[i].flag_string);
-   p += ETH_GSTRING_LEN;
-   }
+   for (i = 0; i < I40E_GL_PRIV_FLAGS_STR_LEN; i++)
+   ethtool_sprintf(&p, i40e_gl_gstrings_priv_flags[i].flag_string);
 }
 
 static void i40e_get_strings(struct net_device *netdev, u32 stringset,
diff --git a/drivers/net/ethernet/intel/ice/ice_ethtool.c 
b/drivers/net/ethernet/intel/ice/ice_ethtool.c
index 2dcfa685b763..4f738425fb44 100644
--- a/drivers/net/ethernet/intel/ice/ice_ethtool.c
+++ b/drivers/net/ethernet/intel/ice/ice_ethtool.c
@@ -871,68 +871,47 @@ static void ice_get_strings(struct net_device *netdev, 
u32 stringset, u8 *data)
 {
struct ice_netdev_priv *np = netdev_priv(netdev);
struct ice_vsi *vsi = np->vsi;
-   char *p = (char *)data;
unsigned int i;
+   u8 *p = data;
 
switch (stringset) {
case ETH_SS_STATS:
-   for (i = 0; i < ICE_VSI_STATS_LEN; i++) {
-   snprintf(p, ETH_GSTRING_LEN, "%s",
-ice_gstrings_vsi_stats[i].stat_string);
-   p += ETH_GSTRING_LEN;
-   }
+   for (i = 0; i < ICE_VSI_STATS_LEN; i++)
+   ethtool_sprintf(&p,
+   ice_gstrings_vsi_stats[i].stat_string);
 
ice_for_each_alloc_txq(vsi, i) {
-   snprintf(p, ETH_GSTRING_LEN,
-"tx_queue_%u_packets", i);
-   p += ETH_GSTRING_LEN;
-   snprintf(p, ETH_GSTRING_LEN, "tx_queue_%u_bytes", i);
-   p += ETH_GSTRING_LEN;
+   ethtool_sprintf(&p, "tx_queue_%u_packets", i);
+   ethtool_sprintf(&p, "tx_queue_%u_bytes", i);
}
 
ice_for_each_alloc_rxq(vsi, i) {
-   snprintf(p, ETH_GSTRING_LEN,
-"rx_queue_%u_packets", i);
-   p += ETH_GSTRING_LEN;
-   snprintf(p, ETH_GSTRING_LEN, "rx_queue_%u_bytes", i);
-   p += ETH_GSTRING_LEN;
+   ethtool_sprintf(&p, "rx_queue_%u_packets", i);
+   ethtool_sprintf(&p, "rx_queue_%u_bytes", i);
}
 
if (vsi->type != ICE_VSI_PF)
return;
 
-   for (i = 0; i < ICE_PF_STATS_LEN; i++) {
-   snprintf(p, ETH_GSTRING_LEN, "%s",
-ice_gstrings_pf_stats[i].stat_string);
-   p += ETH_GSTRING_LEN;
-   }
+   for (i = 0; i < ICE_PF_STATS_LEN; i++)
+   ethtool_sprintf(&p,
+   ice_gstrings_pf_stats[i].stat_string);
 
for (i = 0; i < ICE_MAX_USER_PRIORITY; i++) {
-   snprintf(p, ETH_GSTRING_LEN,
-"tx_priority_%u_xon.nic", i);
-   p += ETH_GSTRING_LEN;
-

[net-next PATCH 01/10] ethtool: Add common function for filling out strings

2021-03-12 Thread Alexander Duyck
From: Alexander Duyck 

Add a function to handle the common pattern of printing a string into the
ethtool strings interface and incrementing the string pointer by the
ETH_GSTRING_LEN. Most of the drivers end up doing this and several have
implemented their own versions of this function so it would make sense to
consolidate on one implementation.

Signed-off-by: Alexander Duyck 
---
 include/linux/ethtool.h |9 +
 net/ethtool/ioctl.c |   12 
 2 files changed, 21 insertions(+)

diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index ec4cd3921c67..3583f7fc075c 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -571,4 +571,13 @@ struct ethtool_phy_ops {
  */
 void ethtool_set_ethtool_phy_ops(const struct ethtool_phy_ops *ops);
 
+/**
+ * ethtool_sprintf - Write formatted string to ethtool string data
+ * @data: Pointer to start of string to update
+ * @fmt: Format of string to write
+ *
+ * Write formatted string to data. Update data to point at start of
+ * next string.
+ */
+extern __printf(2, 3) void ethtool_sprintf(u8 **data, const char *fmt, ...);
 #endif /* _LINUX_ETHTOOL_H */
diff --git a/net/ethtool/ioctl.c b/net/ethtool/ioctl.c
index 24783b71c584..0788cc3b3114 100644
--- a/net/ethtool/ioctl.c
+++ b/net/ethtool/ioctl.c
@@ -1844,6 +1844,18 @@ static int ethtool_get_strings(struct net_device *dev, 
void __user *useraddr)
return ret;
 }
 
+__printf(2, 3) void ethtool_sprintf(u8 **data, const char *fmt, ...)
+{
+   va_list args;
+
+   va_start(args, fmt);
+   vsnprintf(*data, ETH_GSTRING_LEN, fmt, args);
+   va_end(args);
+
+   *data += ETH_GSTRING_LEN;
+}
+EXPORT_SYMBOL(ethtool_sprintf);
+
 static int ethtool_phys_id(struct net_device *dev, void __user *useraddr)
 {
struct ethtool_value id;




[net-next PATCH 00/10] ethtool: Factor out common code related to writing ethtool strings

2021-03-12 Thread Alexander Duyck
This patch set is meant to be a cleanup and refactoring of common code bits
from several drivers. Specifically a number of drivers engage in a pattern
where they use some variant of sprintf or memcpy to write a string
into the ethtool string array and then increment their pointer by
ETH_GSTRING_LEN.

Instead of having each driver implement this independently I am refactoring
the code so that we have one central function, ethtool_sprintf that does
all this and takes a double pointer to access the data, a formatted string
to print, and the variable arguments that are associated with the string.
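To give a feel for it, the conversion in each driver essentially boils
down to replacing this pattern:

	snprintf(p, ETH_GSTRING_LEN, "tx_queue_%u_packets", i);
	p += ETH_GSTRING_LEN;

with:

	ethtool_sprintf(&p, "tx_queue_%u_packets", i);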

Changes from RFC:
Renamed ethtool_gsprintf to ethtool_sprintf
Fixed reverse xmas tree issue in patch 2
Added Acked-by/Reviewed-by tags from RFC review

---

Alexander Duyck (10):
  ethtool: Add common function for filling out strings
  intel: Update drivers to use ethtool_sprintf
  nfp: Replace nfp_pr_et with ethtool_sprintf
  hisilicon: Update drivers to use ethtool_sprintf
  ena: Update driver to use ethtool_sprintf
  netvsc: Update driver to use ethtool_sprintf
  virtio_net: Update driver to use ethtool_sprintf
  vmxnet3: Update driver to use ethtool_sprintf
  bna: Update driver to use ethtool_sprintf
  ionic: Update driver to use ethtool_sprintf


 drivers/net/ethernet/amazon/ena/ena_ethtool.c |  25 +-
 .../net/ethernet/brocade/bna/bnad_ethtool.c   | 266 +++---
 .../ethernet/hisilicon/hns/hns_dsaf_gmac.c|   7 +-
 .../net/ethernet/hisilicon/hns/hns_dsaf_ppe.c |  37 +--
 .../net/ethernet/hisilicon/hns/hns_dsaf_rcb.c |  89 ++
 .../ethernet/hisilicon/hns/hns_dsaf_xgmac.c   |   6 +-
 .../net/ethernet/hisilicon/hns/hns_ethtool.c  |  97 +++
 .../net/ethernet/intel/i40e/i40e_ethtool.c|  16 +-
 drivers/net/ethernet/intel/ice/ice_ethtool.c  |  55 ++--
 drivers/net/ethernet/intel/igb/igb_ethtool.c  |  40 +--
 .../net/ethernet/intel/ixgbe/ixgbe_ethtool.c  |  40 +--
 drivers/net/ethernet/netronome/nfp/abm/main.c |   4 +-
 .../ethernet/netronome/nfp/nfp_net_ethtool.c  |  79 +++---
 drivers/net/ethernet/netronome/nfp/nfp_port.h |   2 -
 .../net/ethernet/pensando/ionic/ionic_stats.c | 145 --
 drivers/net/hyperv/netvsc_drv.c   |  33 +--
 drivers/net/virtio_net.c  |  18 +-
 drivers/net/vmxnet3/vmxnet3_ethtool.c |  53 ++--
 18 files changed, 381 insertions(+), 631 deletions(-)

--



Re: [PATCH mlx5-next v7 0/4] Dynamically assign MSI-X vectors count

2021-03-12 Thread Alexander Duyck
On Thu, Mar 11, 2021 at 10:32 PM Leon Romanovsky  wrote:
>
> On Thu, Mar 11, 2021 at 06:53:16PM -0800, Alexander Duyck wrote:
> > On Thu, Mar 11, 2021 at 3:21 PM Jason Gunthorpe  wrote:
> > >
> > > On Thu, Mar 11, 2021 at 01:49:24PM -0800, Alexander Duyck wrote:
> > > > > We don't need to invent new locks and new complexity for something
> > > > > that is trivially solved already.
> > > >
> > > > I am not wanting a new lock. What I am wanting is a way to mark the VF
> > > > as being stale/offline while we are performing the update. With that
> > > > we would be able to apply similar logic to any changes in the future.
> > >
> > > I think we should hold off doing this until someone comes up with HW
> > > that needs it. The response time here is microseconds, it is not worth
> > > any complexity
>
> <...>
>
> > Another way to think of this is that we are essentially pulling a
> > device back after we have already allocated the VFs and we are
> > reconfiguring it before pushing it back out for usage. Having a flag
> > that we could set on the VF device to say it is "under
> > construction"/modification/"not ready for use" would be quite useful I
> > would think.
>
> It is not simple flag change, but change of PCI state machine, which is
> far more complex than holding two locks or call to sysfs_create_file in
> the loop that made Bjorn nervous.
>
> I want to remind again that the suggestion here has nothing to do with
> the real use case of SR-IOV capable devices in the Linux.
>
> The flow is:
> 1. Disable SR-IOV driver autoprobe
> 2. Create as much as possible VFs
> 3. Wait for request from the user to get VM
> 4. Change MSI-X table according to requested in item #3
> 5. Bind ready to go VF to VM
> 6. Inform user about VM readiness
>
> The destroy flow includes VM destroy and unbind.
>
> Let's focus on solutions for real problems instead of trying to solve 
> theoretical
> cases that are not going to be tested and deployed.
>
> Thanks

So part of the problem with this all along has been that you are only
focused on how you are going to use this and don't think about how
somebody else might need to use or implement it. In addition there are
a number of half measures even within your own flow. In reality if we
are thinking we are going to have to reconfigure every device it might
make sense to simply block the driver from being able to load until
you have configured it. Then the SR-IOV autoprobe would be redundant
since you could use something like the "offline" flag to avoid that.

If you are okay with step 1 where you are setting a flag to prevent
driver auto probing why is it so much more overhead to set a bit
blocking drivers from loading entirely while you are changing the
config space? Sitting on two locks and assuming a synchronous
operation is assuming a lot about the hardware and how this is going
to be used.

In addition it seems like the logic is that step 4 will always
succeed. What happens if, for example, you send the message to the
firmware and you don't get a response? Do you just say the request
failed and let the VF be used anyway? This is another reason why I would
be much more comfortable with the option to offline the device and
then tinker with it rather than hope that your operation can somehow
do everything in one shot.

In my mind step 4 really should be 4 steps.

1. Offline VF to reserve it for modification
2. Submit request for modification
3. Verify modification has occurred, reset if needed.
4. Online VF

Doing it in that order allows for handling many more scenarios,
including those where step 2 actually consists of several changes to
support any future extensions that are needed. Splitting steps 2 and 3
allows for an asynchronous flow where you can wait if the firmware
takes an excessively long time, or, if step 2 somehow fails, you can
repeat or revert it to get back to a consistent state. Lastly, by
splitting out the onlining step you avoid potentially releasing a
broken VF for use if there is some sort of unrecoverable error between
steps 2 and 3.


Re: [PATCH mlx5-next v7 0/4] Dynamically assign MSI-X vectors count

2021-03-11 Thread Alexander Duyck
On Thu, Mar 11, 2021 at 3:21 PM Jason Gunthorpe  wrote:
>
> On Thu, Mar 11, 2021 at 01:49:24PM -0800, Alexander Duyck wrote:
> > > We don't need to invent new locks and new complexity for something
> > > that is trivially solved already.
> >
> > I am not wanting a new lock. What I am wanting is a way to mark the VF
> > as being stale/offline while we are performing the update. With that
> > we would be able to apply similar logic to any changes in the future.
>
> I think we should hold off doing this until someone comes up with HW
> that needs it. The response time here is microseconds, it is not worth
> any complexity

I disagree. Take a look at section 8.5.3 in the NVMe document that was
linked to earlier:
https://nvmexpress.org/wp-content/uploads/NVM-Express-1_4a-2020.03.09-Ratified.pdf

This is exactly what they are doing and I think it makes a ton of
sense. Basically the VF has to be taken "offline" before you are
allowed to start changing resources on it. It would only require one
extra sysfs file and has additional uses beyond just the
configuration of MSI-X vectors.

We would just have to add one additional sysfs file, maybe modify the
"dead" device flag to be "offline", and we could make this work with
minimal changes to the patch set you already have. We could probably
toggle to "offline" while holding just the VF lock. To toggle the VF
back to being "online" we might need to take the PF device lock since
it is ultimately responsible for guaranteeing we have the resources.

Another way to think of this is that we are essentially pulling a
device back after we have already allocated the VFs and we are
reconfiguring it before pushing it back out for usage. Having a flag
that we could set on the VF device to say it is "under
construction"/modification/"not ready for use" would be quite useful I
would think.


Re: [PATCH mlx5-next v7 0/4] Dynamically assign MSI-X vectors count

2021-03-11 Thread Alexander Duyck
On Thu, Mar 11, 2021 at 12:19 PM Jason Gunthorpe  wrote:
>
> On Thu, Mar 11, 2021 at 11:37:28AM -0800, Alexander Duyck wrote:
>
>
> > Then the flow for this could be changed where we take the VF lock and
> > mark it as "stale" to prevent any driver binding and then we can
> > release the VF lock. Next we would perform the PF operation telling it
> > to update the VF.  Then we spin on the VF waiting for the stale data
> > to be updated and once that happens we can pop the indication that the
> > device is "stale" freeing it for use.
>
> I always get leary when people propose to open code locking constructs
> :\

I'm not suggesting we replace the lock. It is more about essentially
revoking the VF. What we are doing is rewriting the PCIe config of the
VF, so in my mind it makes sense to take sort of an RCU-like approach
where the old device is still readable, but not something a new driver
can be bound to.

> There is already an existing lock to prevent probe() it is the
> device_lock() mutex on the VF. With no driver bound there is not much
> issue to hold it over the HW activity.

Yes. But sitting on those locks also has side effects, such as
preventing us from taking any other actions like disabling SR-IOV.
One concern I have is that if somebody else tries to implement this in
the future and they don't have a synchronous setup, or worse yet they
do but it takes a long time to process a request because they have a
slow controller, it would be preferable to just have us post the
message to the PF and then have the thread spin and wait on the VF to
be updated rather than block on the PF while sitting on two locks.

> This lock is normally held around the entire probe() and remove()
> function which has huge amounts of HW activity already.

Yes, but usually that activity is time bound. You are usually reading
values and processing them in a timely fashion. In the case of probe
we even have cases where we have to defer because we don't want to
hold these locks for too long.

> We don't need to invent new locks and new complexity for something
> that is trivially solved already.

I am not wanting a new lock. What I am wanting is a way to mark the VF
as being stale/offline while we are performing the update. With that
we would be able to apply similar logic to any changes in the future.


Re: [PATCH mlx5-next v7 0/4] Dynamically assign MSI-X vectors count

2021-03-11 Thread Alexander Duyck
On Thu, Mar 11, 2021 at 11:51 AM Leon Romanovsky  wrote:
>
> On Thu, Mar 11, 2021 at 11:37:28AM -0800, Alexander Duyck wrote:
> > On Thu, Mar 11, 2021 at 10:17 AM Bjorn Helgaas  wrote:
> > >
> > > On Wed, Mar 10, 2021 at 03:34:01PM -0800, Alexander Duyck wrote:
> > > > On Wed, Mar 10, 2021 at 11:09 AM Bjorn Helgaas  
> > > > wrote:
> > > > > On Sun, Mar 07, 2021 at 10:55:24AM -0800, Alexander Duyck wrote:
> > > > > > On Sun, Feb 28, 2021 at 11:55 PM Leon Romanovsky  
> > > > > > wrote:
> > > > > > > From: Leon Romanovsky 
> > > > > > >
> > > > > > > @Alexander Duyck, please update me if I can add your ROB tag again
> > > > > > > to the series, because you liked v6 more.
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > -
> > > > > > > Changelog
> > > > > > > v7:
> > > > > > >  * Rebase on top v5.12-rc1
> > > > > > >  * More english fixes
> > > > > > >  * Returned to static sysfs creation model as was implemented in 
> > > > > > > v0/v1.
>
> <...>
>
> > > > representors rather than being actual PCIe devices. Having
> > > > functionality that only works when the VF driver is not loaded just
> > > > feels off. The VF sysfs directory feels like it is being used as a
> > > > subdirectory of the PF rather than being a device on its own.
> > >
> > > Moving "virtfnX_msix_count" to the PF seems like it would mitigate
> > > this somewhat.  I don't know how to make this work while a VF driver
> > > is bound without making the VF feel even less like a PCIe device,
> > > i.e., we won't be able to use the standard MSI-X model.
> >
> > Yeah, I actually do kind of like that idea. In addition it would
> > address one of the things I pointed out as an issue before as you
> > could place the virtfn values and the total value in the same folder
> > so that it is all in one central spot rather than having to walk all
> > over the sysfs hierarchy to check the setting for each VF when trying
> > to figure out how the vectors are currently distributed.
>
> User binds specific VF with specific PCI ID to VM, so instead of
> changing MSI-X table for that specific VF, he will need to translate
> from virtfn25_msix_count to PCI ID.

Wouldn't that just be a matter of changing the naming so that the PCI
ID was present in the virtfn name?

> I also gave an example of my system where I have many PFs and VFs
> function numbers are not distributed nicely. On that system 
> virtfn25_msix_count
> won't translate to AA:BB:CC.25 but to something else.

That isn't too surprising since normally we only support 7 functions
per device. I am okay with not using the name virtfnX. If you wanted
to embed the bus, device, func in the naming scheme that would work
for me too.

Really, in general, just using a logical number as a naming scheme has
probably never provided all that much value. There may be an argument
to be made for renaming the virtfn symlinks to include bus, device,
function instead.


Re: [PATCH mlx5-next v7 0/4] Dynamically assign MSI-X vectors count

2021-03-11 Thread Alexander Duyck
On Thu, Mar 11, 2021 at 10:17 AM Bjorn Helgaas  wrote:
>
> On Wed, Mar 10, 2021 at 03:34:01PM -0800, Alexander Duyck wrote:
> > On Wed, Mar 10, 2021 at 11:09 AM Bjorn Helgaas  wrote:
> > > On Sun, Mar 07, 2021 at 10:55:24AM -0800, Alexander Duyck wrote:
> > > > On Sun, Feb 28, 2021 at 11:55 PM Leon Romanovsky  
> > > > wrote:
> > > > > From: Leon Romanovsky 
> > > > >
> > > > > @Alexander Duyck, please update me if I can add your ROB tag again
> > > > > to the series, because you liked v6 more.
> > > > >
> > > > > Thanks
> > > > >
> > > > > -
> > > > > Changelog
> > > > > v7:
> > > > >  * Rebase on top v5.12-rc1
> > > > >  * More english fixes
> > > > >  * Returned to static sysfs creation model as was implemented in 
> > > > > v0/v1.
> > > >
> > > > Yeah, so I am not a fan of the series. The problem is there is only
> > > > one driver that supports this, all VFs are going to expose this sysfs,
> > > > and I don't know how likely it is that any others are going to
> > > > implement this functionality. I feel like you threw out all the
> > > > progress from v2-v6.
> > >
> > > pci_enable_vfs_overlay() turned up in v4, so I think v0-v3 had static
> > > sysfs files regardless of whether the PF driver was bound.
> > >
> > > > I really feel like the big issue is that this model is broken as you
> > > > have the VFs exposing sysfs interfaces that make use of the PFs to
> > > > actually implement. Greg's complaint was the PF pushing sysfs onto the
> > > > VFs. My complaint is VFs sysfs files operating on the PF. The trick is
> > > > to find a way to address both issues.
> > > >
> > > > Maybe the compromise is to reach down into the IOV code and have it
> > > > register the sysfs interface at device creation time in something like
> > > > pci_iov_sysfs_link if the PF has the functionality present to support
> > > > it.
> > >
> > > IIUC there are two questions on the table:
> > >
> > >   1) Should the sysfs files be visible only when a PF driver that
> > >  supports MSI-X vector assignment is bound?
> > >
> > >  I think this is a cosmetic issue.  The presence of the file is
> > >  not a reliable signal to management software; it must always
> > >  tolerate files that don't exist (e.g., on old kernels) or files
> > >  that are visible but don't work (e.g., vectors may be exhausted).
> > >
> > >  If we start with the files always being visible, we should be
> > >  able to add smarts later to expose them only when the PF driver
> > >  is bound.
> > >
> > >  My concerns with pci_enable_vf_overlay() are that it uses a
> > >  little more sysfs internals than I'd like (although there are
> > >  many callers of sysfs_create_files()) and it uses
> > >  pci_get_domain_bus_and_slot(), which is generally a hack and
> > >  creates refcounting hassles.  Speaking of which, isn't v6 missing
> > >  a pci_dev_put() to match the pci_get_domain_bus_and_slot()?
> >
> > I'm not so much worried about management software as the fact that
> > this is a vendor specific implementation detail that is shaping how
> > the kernel interfaces are meant to work. Other than the mlx5 I don't
> > know if there are any other vendors really onboard with this sort of
> > solution.
>
> I know this is currently vendor-specific, but I thought the value
> proposition of dynamic configuration of VFs for different clients
> sounded compelling enough that other vendors would do something
> similar.  But I'm not an SR-IOV guy and have no vendor insight, so
> maybe that's not the case?

The problem is there are multiple ways to deal with this issue. I have
worked on parts in the past that simply advertised a fixed table size
and then only allowed for configuring the number of vectors internally
so some vectors simply couldn't be unmasked. I don't know if that was
any better than this though. It is just yet another way to do this.

> > In addition it still feels rather hacky to be modifying read-only PCIe
> > configuration space on the fly via a backdoor provided by the PF. It
> > almost feels like this should be some sort of quirk rather th

Re: [PATCH 2/5] mm/page_alloc: Add a bulk page allocator

2021-03-11 Thread Alexander Duyck
On Thu, Mar 11, 2021 at 3:49 AM Mel Gorman  wrote:
>
> This patch adds a new page allocator interface via alloc_pages_bulk,
> and __alloc_pages_bulk_nodemask. A caller requests a number of pages
> to be allocated and added to a list. They can be freed in bulk using
> free_pages_bulk().
>
> The API is not guaranteed to return the requested number of pages and
> may fail if the preferred allocation zone has limited free memory, the
> cpuset changes during the allocation or page debugging decides to fail
> an allocation. It's up to the caller to request more pages in batch
> if necessary.
>
> Note that this implementation is not very efficient and could be improved
> but it would require refactoring. The intent is to make it available early
> to determine what semantics are required by different callers. Once the
> full semantics are nailed down, it can be refactored.
>
> Signed-off-by: Mel Gorman 
> ---
>  include/linux/gfp.h |  13 +
>  mm/page_alloc.c | 118 +++-
>  2 files changed, 129 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index 8572a1474e16..4903d1cc48dc 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -515,6 +515,10 @@ static inline int arch_make_page_accessible(struct page 
> *page)
>  }
>  #endif
>
> +int __alloc_pages_bulk_nodemask(gfp_t gfp_mask, int preferred_nid,
> +   nodemask_t *nodemask, int nr_pages,
> +   struct list_head *list);
> +
>  struct page *
>  __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> nodemask_t *nodemask);
> @@ -525,6 +529,14 @@ __alloc_pages(gfp_t gfp_mask, unsigned int order, int 
> preferred_nid)
> return __alloc_pages_nodemask(gfp_mask, order, preferred_nid, NULL);
>  }
>
> +/* Bulk allocate order-0 pages */
> +static inline unsigned long
> +alloc_pages_bulk(gfp_t gfp_mask, unsigned long nr_pages, struct list_head 
> *list)
> +{
> +   return __alloc_pages_bulk_nodemask(gfp_mask, numa_mem_id(), NULL,
> +   nr_pages, list);
> +}
> +
>  /*
>   * Allocate pages, preferring the node given as nid. The node must be valid 
> and
>   * online. For more general interface, see alloc_pages_node().
> @@ -594,6 +606,7 @@ void * __meminit alloc_pages_exact_nid(int nid, size_t 
> size, gfp_t gfp_mask);
>
>  extern void __free_pages(struct page *page, unsigned int order);
>  extern void free_pages(unsigned long addr, unsigned int order);
> +extern void free_pages_bulk(struct list_head *list);
>
>  struct page_frag_cache;
>  extern void __page_frag_cache_drain(struct page *page, unsigned int count);
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3e4b29ee2b1e..415059324dc3 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4436,6 +4436,21 @@ static void wake_all_kswapds(unsigned int order, gfp_t 
> gfp_mask,
> }
>  }
>
> +/* Drop reference counts and free order-0 pages from a list. */
> +void free_pages_bulk(struct list_head *list)
> +{
> +   struct page *page, *next;
> +
> +   list_for_each_entry_safe(page, next, list, lru) {
> +   trace_mm_page_free_batched(page);
> +   if (put_page_testzero(page)) {
> +   list_del(&page->lru);
> +   __free_pages_ok(page, 0, FPI_NONE);
> +   }
> +   }
> +}
> +EXPORT_SYMBOL_GPL(free_pages_bulk);
> +
>  static inline unsigned int
>  gfp_to_alloc_flags(gfp_t gfp_mask)
>  {
> @@ -4919,6 +4934,9 @@ static inline bool prepare_alloc_pages(gfp_t gfp_mask, 
> unsigned int order,
> struct alloc_context *ac, gfp_t *alloc_mask,
> unsigned int *alloc_flags)
>  {
> +   gfp_mask &= gfp_allowed_mask;
> +   *alloc_mask = gfp_mask;
> +
> ac->highest_zoneidx = gfp_zone(gfp_mask);
> ac->zonelist = node_zonelist(preferred_nid, gfp_mask);
> ac->nodemask = nodemask;

It might be better to pull this and the change from the bottom out
into a separate patch. I was reviewing this and when I hit the bottom
I had the same question other reviewers had about whether it was
intentional. Splitting it out would make it easier to review.
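
As a general aside on the interface itself, a caller of the proposed API
would look roughly like the sketch below. This is based only on the
prototypes quoted above and is purely illustrative:

	LIST_HEAD(page_list);
	unsigned long allocated;

	allocated = alloc_pages_bulk(GFP_KERNEL, 32, &page_list);
	if (allocated < 32) {
		/* the API is not guaranteed to return everything requested;
		 * retry or fall back to single-page allocations as needed
		 */
	}

	/* ... consume the pages on page_list ... */

	free_pages_bulk(&page_list);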

> @@ -4960,6 +4978,104 @@ static inline bool prepare_alloc_pages(gfp_t 
> gfp_mask, unsigned int order,
> return true;
>  }
>
> +/*
> + * This is a batched version of the page allocator that attempts to
> + * allocate nr_pages quickly from the preferred zone and add them to list.
> + *
> + * Returns the number of pages allocated.
> + */
> +int __alloc_pages_bulk_nodemask(gfp_t gfp_mask, int preferred_nid,
> +   nodemask_t *nodemask, int nr_pages,
> +   struct list_head *alloc_list)
> +{
> +   struct page *page;
> +   unsigned long flags;
> +   struct zone *zone;
> +   struct zoneref *z

Re: [PATCH net-next 2/6] ionic: implement Rx page reuse

2021-03-10 Thread Alexander Duyck
On Wed, Mar 10, 2021 at 11:28 AM Shannon Nelson  wrote:
>
> Rework the Rx buffer allocations to use pages twice when using
> normal MTU in order to cut down on buffer allocation and mapping
> overhead.
>
> Instead of tracking individual pages, in which we may have
> wasted half the space when using standard 1500 MTU, we track
> buffers which use half pages, so we can use the second half
> of the page rather than allocate and map a new page once the
> first buffer has been used.
>
> Signed-off-by: Shannon Nelson 

So looking at the approach taken here it just seems like you are doing
the linear walk approach and getting 2 uses per 4K page. If you are
taking that route it might make more sense to just split the page and
use both pieces immediately to populate 2 entries instead of waiting
on the next loop through the ring. Then you could just split the page
into multiple buffers and fill your sg list using fewer total pages
rather than having 2K gaps between your entries. An added advantage
would be that you could simply merge the page fragments in the event
that you have something writing to the full 2K buffers and you cannot
use copybreak.
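
For illustration, the immediate-split idea might look something like the
sketch below. It reuses names from this patch (ionic_rx_page_alloc, struct
ionic_buf_info, IONIC_PAGE_SPLIT_SZ) but is only a rough sketch of the
suggestion, not tested code:

/* Fill two ring entries from one page: allocate and map once, then point
 * the second entry at the upper half of the same page.
 */
static int ionic_rx_page_split(struct ionic_queue *q,
			       struct ionic_buf_info *first,
			       struct ionic_buf_info *second)
{
	int err;

	err = ionic_rx_page_alloc(q, first);
	if (err)
		return err;

	get_page(first->page);
	second->page = first->page;
	second->page_offset = first->page_offset + IONIC_PAGE_SPLIT_SZ;
	second->dma_addr = first->dma_addr + IONIC_PAGE_SPLIT_SZ;

	return 0;
}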

> ---
>  .../net/ethernet/pensando/ionic/ionic_dev.h   |  12 +-
>  .../net/ethernet/pensando/ionic/ionic_txrx.c  | 215 +++---
>  2 files changed, 138 insertions(+), 89 deletions(-)
>
> diff --git a/drivers/net/ethernet/pensando/ionic/ionic_dev.h 
> b/drivers/net/ethernet/pensando/ionic/ionic_dev.h
> index 690768ff0143..0f877c86eba6 100644
> --- a/drivers/net/ethernet/pensando/ionic/ionic_dev.h
> +++ b/drivers/net/ethernet/pensando/ionic/ionic_dev.h
> @@ -170,9 +170,15 @@ typedef void (*ionic_desc_cb)(struct ionic_queue *q,
>   struct ionic_desc_info *desc_info,
>   struct ionic_cq_info *cq_info, void *cb_arg);
>
> -struct ionic_page_info {
> +#define IONIC_PAGE_SIZEPAGE_SIZE
> +#define IONIC_PAGE_SPLIT_SZ(PAGE_SIZE / 2)

This probably doesn't work out too well when the page size gets up to
64K. I don't know of too many networks that support a 32K MTU.. :)

> +#define IONIC_PAGE_GFP_MASK(GFP_ATOMIC | __GFP_NOWARN |\
> +__GFP_COMP | __GFP_MEMALLOC)
> +
> +struct ionic_buf_info {
> struct page *page;
> dma_addr_t dma_addr;
> +   u32 page_offset;
>  };

I'm not really sure the rename was needed. You are still just working
with a page, aren't you? It would actually reduce the complexity of
this patch a bunch as you could drop the renaming changes.

>  struct ionic_desc_info {
> @@ -187,8 +193,8 @@ struct ionic_desc_info {
> struct ionic_txq_sg_desc *txq_sg_desc;
> struct ionic_rxq_sg_desc *rxq_sgl_desc;
> };
> -   unsigned int npages;
> -   struct ionic_page_info pages[IONIC_RX_MAX_SG_ELEMS + 1];
> +   unsigned int nbufs;
> +   struct ionic_buf_info bufs[IONIC_RX_MAX_SG_ELEMS + 1];
> ionic_desc_cb cb;
> void *cb_arg;
>  };
> diff --git a/drivers/net/ethernet/pensando/ionic/ionic_txrx.c 
> b/drivers/net/ethernet/pensando/ionic/ionic_txrx.c
> index 70b997f302ac..3e13cfee9ecd 100644
> --- a/drivers/net/ethernet/pensando/ionic/ionic_txrx.c
> +++ b/drivers/net/ethernet/pensando/ionic/ionic_txrx.c
> @@ -54,7 +54,7 @@ static struct sk_buff *ionic_rx_skb_alloc(struct 
> ionic_queue *q,
> if (frags)
> skb = napi_get_frags(&q_to_qcq(q)->napi);
> else
> -   skb = netdev_alloc_skb_ip_align(netdev, len);
> +   skb = napi_alloc_skb(&q_to_qcq(q)->napi, len);
>
> if (unlikely(!skb)) {
> net_warn_ratelimited("%s: SKB alloc failed on %s!\n",
> @@ -66,8 +66,15 @@ static struct sk_buff *ionic_rx_skb_alloc(struct 
> ionic_queue *q,
> return skb;
>  }
>
> +static void ionic_rx_buf_reset(struct ionic_buf_info *buf_info)
> +{
> +   buf_info->page = NULL;
> +   buf_info->page_offset = 0;
> +   buf_info->dma_addr = 0;
> +}
> +

Technically speaking you probably only need to reset the page value.
You could hold off on resetting the page_offset and dma_addr until you
actually are populating the page.
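
A minimal version of that, sketched against the structures in this patch
(illustrative only), would be:

static void ionic_rx_buf_reset(struct ionic_buf_info *buf_info)
{
	/* page_offset and dma_addr get rewritten when the buffer is
	 * (re)populated, so clearing the page pointer is enough here.
	 */
	buf_info->page = NULL;
}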

>  static int ionic_rx_page_alloc(struct ionic_queue *q,
> -  struct ionic_page_info *page_info)
> +  struct ionic_buf_info *buf_info)
>  {
> struct ionic_lif *lif = q->lif;
> struct ionic_rx_stats *stats;
> @@ -78,26 +85,26 @@ static int ionic_rx_page_alloc(struct ionic_queue *q,
> dev = lif->ionic->dev;
> stats = q_to_rx_stats(q);
>
> -   if (unlikely(!page_info)) {
> -   net_err_ratelimited("%s: %s invalid page_info in alloc\n",
> +   if (unlikely(!buf_info)) {
> +   net_err_ratelimited("%s: %s invalid buf_info in alloc\n",
> netdev->name, q->name);
> return -EINV

[RFC PATCH 10/10] ionic: Update driver to use ethtool_gsprintf

2021-03-10 Thread Alexander Duyck
From: Alexander Duyck 

Update the ionic driver to make use of ethtool_gsprintf. In addition add
separate functions for Tx/Rx stats strings in order to reduce the total
amount of indenting needed in the driver code.

Signed-off-by: Alexander Duyck 
---
 drivers/net/ethernet/pensando/ionic/ionic_stats.c |  145 +
 1 file changed, 60 insertions(+), 85 deletions(-)

diff --git a/drivers/net/ethernet/pensando/ionic/ionic_stats.c 
b/drivers/net/ethernet/pensando/ionic/ionic_stats.c
index 6ae75b771a15..1dac960386df 100644
--- a/drivers/net/ethernet/pensando/ionic/ionic_stats.c
+++ b/drivers/net/ethernet/pensando/ionic/ionic_stats.c
@@ -246,98 +246,73 @@ static u64 ionic_sw_stats_get_count(struct ionic_lif *lif)
return total;
 }
 
+static void ionic_sw_stats_get_tx_strings(struct ionic_lif *lif, u8 **buf,
+ int q_num)
+{
+   int i;
+
+   for (i = 0; i < IONIC_NUM_TX_STATS; i++)
+   ethtool_gsprintf(buf, "tx_%d_%s", q_num,
+ionic_tx_stats_desc[i].name);
+
+   if (!test_bit(IONIC_LIF_F_UP, lif->state) ||
+   !test_bit(IONIC_LIF_F_SW_DEBUG_STATS, lif->state))
+   return;
+
+   for (i = 0; i < IONIC_NUM_TX_Q_STATS; i++)
+   ethtool_gsprintf(buf, "txq_%d_%s", q_num,
+ionic_txq_stats_desc[i].name);
+   for (i = 0; i < IONIC_NUM_DBG_CQ_STATS; i++)
+   ethtool_gsprintf(buf, "txq_%d_cq_%s", q_num,
+ionic_dbg_cq_stats_desc[i].name);
+   for (i = 0; i < IONIC_NUM_DBG_INTR_STATS; i++)
+   ethtool_gsprintf(buf, "txq_%d_intr_%s", q_num,
+ionic_dbg_intr_stats_desc[i].name);
+   for (i = 0; i < IONIC_MAX_NUM_SG_CNTR; i++)
+   ethtool_gsprintf(buf, "txq_%d_sg_cntr_%d", q_num, i);
+}
+
+static void ionic_sw_stats_get_rx_strings(struct ionic_lif *lif, u8 **buf,
+ int q_num)
+{
+   int i;
+
+   for (i = 0; i < IONIC_NUM_RX_STATS; i++)
+   ethtool_gsprintf(buf, "rx_%d_%s", q_num,
+ionic_rx_stats_desc[i].name);
+
+   if (!test_bit(IONIC_LIF_F_UP, lif->state) ||
+   !test_bit(IONIC_LIF_F_SW_DEBUG_STATS, lif->state))
+   return;
+
+   for (i = 0; i < IONIC_NUM_DBG_CQ_STATS; i++)
+   ethtool_gsprintf(buf, "rxq_%d_cq_%s", q_num,
+ionic_dbg_cq_stats_desc[i].name);
+   for (i = 0; i < IONIC_NUM_DBG_INTR_STATS; i++)
+   ethtool_gsprintf(buf, "rxq_%d_intr_%s", q_num,
+ionic_dbg_intr_stats_desc[i].name);
+   for (i = 0; i < IONIC_NUM_DBG_NAPI_STATS; i++)
+   ethtool_gsprintf(buf, "rxq_%d_napi_%s", q_num,
+ionic_dbg_napi_stats_desc[i].name);
+   for (i = 0; i < IONIC_MAX_NUM_NAPI_CNTR; i++)
+   ethtool_gsprintf(buf, "rxq_%d_napi_work_done_%d", q_num, i);
+}
+
 static void ionic_sw_stats_get_strings(struct ionic_lif *lif, u8 **buf)
 {
int i, q_num;
 
-   for (i = 0; i < IONIC_NUM_LIF_STATS; i++) {
-   snprintf(*buf, ETH_GSTRING_LEN, ionic_lif_stats_desc[i].name);
-   *buf += ETH_GSTRING_LEN;
-   }
+   for (i = 0; i < IONIC_NUM_LIF_STATS; i++)
+   ethtool_gsprintf(buf, ionic_lif_stats_desc[i].name);
 
-   for (i = 0; i < IONIC_NUM_PORT_STATS; i++) {
-   snprintf(*buf, ETH_GSTRING_LEN,
-ionic_port_stats_desc[i].name);
-   *buf += ETH_GSTRING_LEN;
-   }
+   for (i = 0; i < IONIC_NUM_PORT_STATS; i++)
+   ethtool_gsprintf(buf, ionic_port_stats_desc[i].name);
 
-   for (q_num = 0; q_num < MAX_Q(lif); q_num++) {
-   for (i = 0; i < IONIC_NUM_TX_STATS; i++) {
-   snprintf(*buf, ETH_GSTRING_LEN, "tx_%d_%s",
-q_num, ionic_tx_stats_desc[i].name);
-   *buf += ETH_GSTRING_LEN;
-   }
+   for (q_num = 0; q_num < MAX_Q(lif); q_num++)
+   ionic_sw_stats_get_tx_strings(lif, buf, q_num);
 
-   if (test_bit(IONIC_LIF_F_UP, lif->state) &&
-   test_bit(IONIC_LIF_F_SW_DEBUG_STATS, lif->state)) {
-   for (i = 0; i < IONIC_NUM_TX_Q_STATS; i++) {
-   snprintf(*buf, ETH_GSTRING_LEN,
-"txq_%d_%s",
-q_num,
-ionic_txq_stats_desc[i].name);
-   *buf += ETH_GSTRING_LEN;
-   }
-   for (i = 0; i 

[RFC PATCH 09/10] bna: Update driver to use ethtool_gsprintf

2021-03-10 Thread Alexander Duyck
From: Alexander Duyck 

Update the bnad_get_strings to make use of ethtool_gsprintf and avoid
unnecessary line wrapping. To do this we invert the logic for the string
set test and instead exit immediately if we are not working with the stats
strings. In addition the function is broken up into subfunctions for each
area so that we can simply call ethtool_gsprintf once for each string in a
given subsection.

Signed-off-by: Alexander Duyck 
---
 drivers/net/ethernet/brocade/bna/bnad_ethtool.c |  266 +--
 1 file changed, 105 insertions(+), 161 deletions(-)

diff --git a/drivers/net/ethernet/brocade/bna/bnad_ethtool.c 
b/drivers/net/ethernet/brocade/bna/bnad_ethtool.c
index 588c4804d10a..9d72f896880d 100644
--- a/drivers/net/ethernet/brocade/bna/bnad_ethtool.c
+++ b/drivers/net/ethernet/brocade/bna/bnad_ethtool.c
@@ -524,6 +524,68 @@ bnad_set_pauseparam(struct net_device *netdev,
return 0;
 }
 
+static void bnad_get_txf_strings(u8 **string, int f_num)
+{
+   ethtool_gsprintf(string, "txf%d_ucast_octets", f_num);
+   ethtool_gsprintf(string, "txf%d_ucast", f_num);
+   ethtool_gsprintf(string, "txf%d_ucast_vlan", f_num);
+   ethtool_gsprintf(string, "txf%d_mcast_octets", f_num);
+   ethtool_gsprintf(string, "txf%d_mcast", f_num);
+   ethtool_gsprintf(string, "txf%d_mcast_vlan", f_num);
+   ethtool_gsprintf(string, "txf%d_bcast_octets", f_num);
+   ethtool_gsprintf(string, "txf%d_bcast", f_num);
+   ethtool_gsprintf(string, "txf%d_bcast_vlan", f_num);
+   ethtool_gsprintf(string, "txf%d_errors", f_num);
+   ethtool_gsprintf(string, "txf%d_filter_vlan", f_num);
+   ethtool_gsprintf(string, "txf%d_filter_mac_sa", f_num);
+}
+
+static void bnad_get_rxf_strings(u8 **string, int f_num)
+{
+   ethtool_gsprintf(string, "rxf%d_ucast_octets", f_num);
+   ethtool_gsprintf(string, "rxf%d_ucast", f_num);
+   ethtool_gsprintf(string, "rxf%d_ucast_vlan", f_num);
+   ethtool_gsprintf(string, "rxf%d_mcast_octets", f_num);
+   ethtool_gsprintf(string, "rxf%d_mcast", f_num);
+   ethtool_gsprintf(string, "rxf%d_mcast_vlan", f_num);
+   ethtool_gsprintf(string, "rxf%d_bcast_octets", f_num);
+   ethtool_gsprintf(string, "rxf%d_bcast", f_num);
+   ethtool_gsprintf(string, "rxf%d_bcast_vlan", f_num);
+   ethtool_gsprintf(string, "rxf%d_frame_drops", f_num);
+}
+
+static void bnad_get_cq_strings(u8 **string, int q_num)
+{
+   ethtool_gsprintf(string, "cq%d_producer_index", q_num);
+   ethtool_gsprintf(string, "cq%d_consumer_index", q_num);
+   ethtool_gsprintf(string, "cq%d_hw_producer_index", q_num);
+   ethtool_gsprintf(string, "cq%d_intr", q_num);
+   ethtool_gsprintf(string, "cq%d_poll", q_num);
+   ethtool_gsprintf(string, "cq%d_schedule", q_num);
+   ethtool_gsprintf(string, "cq%d_keep_poll", q_num);
+   ethtool_gsprintf(string, "cq%d_complete", q_num);
+}
+
+static void bnad_get_rxq_strings(u8 **string, int q_num)
+{
+   ethtool_gsprintf(string, "rxq%d_packets", q_num);
+   ethtool_gsprintf(string, "rxq%d_bytes", q_num);
+   ethtool_gsprintf(string, "rxq%d_packets_with_error", q_num);
+   ethtool_gsprintf(string, "rxq%d_allocbuf_failed", q_num);
+   ethtool_gsprintf(string, "rxq%d_mapbuf_failed", q_num);
+   ethtool_gsprintf(string, "rxq%d_producer_index", q_num);
+   ethtool_gsprintf(string, "rxq%d_consumer_index", q_num);
+}
+
+static void bnad_get_txq_strings(u8 **string, int q_num)
+{
+   ethtool_gsprintf(string, "txq%d_packets", q_num);
+   ethtool_gsprintf(string, "txq%d_bytes", q_num);
+   ethtool_gsprintf(string, "txq%d_producer_index", q_num);
+   ethtool_gsprintf(string, "txq%d_consumer_index", q_num);
+   ethtool_gsprintf(string, "txq%d_hw_consumer_index", q_num);
+}
+
 static void
 bnad_get_strings(struct net_device *netdev, u32 stringset, u8 *string)
 {
@@ -531,175 +593,57 @@ bnad_get_strings(struct net_device *netdev, u32 
stringset, u8 *string)
int i, j, q_num;
u32 bmap;
 
+   if (stringset != ETH_SS_STATS)
+   return;
+
mutex_lock(&bnad->conf_mutex);
 
-   switch (stringset) {
-   case ETH_SS_STATS:
-   for (i = 0; i < BNAD_ETHTOOL_STATS_NUM; i++) {
-   BUG_ON(!(strlen(bnad_net_stats_strings[i]) <
-  ETH_GSTRING_LEN));
-   strncpy(string, bnad_net_stats_strings[i],
-   ETH_GSTRING_LEN);
-   st

[RFC PATCH 05/10] ena: Update driver to use ethtool_gsprintf

2021-03-10 Thread Alexander Duyck
From: Alexander Duyck 

Replace instances of snprintf or memcpy followed by a pointer update with
calls to ethtool_gsprintf.

Signed-off-by: Alexander Duyck 
---
 drivers/net/ethernet/amazon/ena/ena_ethtool.c |   25 +++--
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_ethtool.c 
b/drivers/net/ethernet/amazon/ena/ena_ethtool.c
index d6cc7aa612b7..42f6bad7ca26 100644
--- a/drivers/net/ethernet/amazon/ena/ena_ethtool.c
+++ b/drivers/net/ethernet/amazon/ena/ena_ethtool.c
@@ -251,10 +251,10 @@ static void ena_queue_strings(struct ena_adapter 
*adapter, u8 **data)
for (j = 0; j < ENA_STATS_ARRAY_TX; j++) {
ena_stats = &ena_stats_tx_strings[j];
 
-   snprintf(*data, ETH_GSTRING_LEN,
-"queue_%u_%s_%s", i,
-is_xdp ? "xdp_tx" : "tx", ena_stats->name);
-   (*data) += ETH_GSTRING_LEN;
+   ethtool_gsprintf(data,
+"queue_%u_%s_%s", i,
+is_xdp ? "xdp_tx" : "tx",
+ena_stats->name);
}
 
if (!is_xdp) {
@@ -264,9 +264,9 @@ static void ena_queue_strings(struct ena_adapter *adapter, 
u8 **data)
for (j = 0; j < ENA_STATS_ARRAY_RX; j++) {
ena_stats = &ena_stats_rx_strings[j];
 
-   snprintf(*data, ETH_GSTRING_LEN,
-"queue_%u_rx_%s", i, ena_stats->name);
-   (*data) += ETH_GSTRING_LEN;
+   ethtool_gsprintf(data,
+"queue_%u_rx_%s", i,
+ena_stats->name);
}
}
}
@@ -280,9 +280,8 @@ static void ena_com_dev_strings(u8 **data)
for (i = 0; i < ENA_STATS_ARRAY_ENA_COM; i++) {
ena_stats = &ena_stats_ena_com_strings[i];
 
-   snprintf(*data, ETH_GSTRING_LEN,
-"ena_admin_q_%s", ena_stats->name);
-   (*data) += ETH_GSTRING_LEN;
+   ethtool_gsprintf(data,
+"ena_admin_q_%s", ena_stats->name);
}
 }
 
@@ -295,15 +294,13 @@ static void ena_get_strings(struct ena_adapter *adapter,
 
for (i = 0; i < ENA_STATS_ARRAY_GLOBAL; i++) {
ena_stats = &ena_stats_global_strings[i];
-   memcpy(data, ena_stats->name, ETH_GSTRING_LEN);
-   data += ETH_GSTRING_LEN;
+   ethtool_gsprintf(&data, ena_stats->name);
}
 
if (eni_stats_needed) {
for (i = 0; i < ENA_STATS_ARRAY_ENI(adapter); i++) {
ena_stats = &ena_stats_eni_strings[i];
-   memcpy(data, ena_stats->name, ETH_GSTRING_LEN);
-   data += ETH_GSTRING_LEN;
+   ethtool_gsprintf(&data, ena_stats->name);
}
}
 




[RFC PATCH 06/10] netvsc: Update driver to use ethtool_gsprintf

2021-03-10 Thread Alexander Duyck
From: Alexander Duyck 

Replace instances of sprintf or memcpy followed by a pointer update with
calls to ethtool_gsprintf.

Signed-off-by: Alexander Duyck 
---
 drivers/net/hyperv/netvsc_drv.c |   33 +++--
 1 file changed, 11 insertions(+), 22 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 15f262b70489..4e8446a81c0b 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -1612,34 +1612,23 @@ static void netvsc_get_strings(struct net_device *dev, 
u32 stringset, u8 *data)
 
switch (stringset) {
case ETH_SS_STATS:
-   for (i = 0; i < ARRAY_SIZE(netvsc_stats); i++) {
-   memcpy(p, netvsc_stats[i].name, ETH_GSTRING_LEN);
-   p += ETH_GSTRING_LEN;
-   }
+   for (i = 0; i < ARRAY_SIZE(netvsc_stats); i++)
+   ethtool_gsprintf(&p, netvsc_stats[i].name);
 
-   for (i = 0; i < ARRAY_SIZE(vf_stats); i++) {
-   memcpy(p, vf_stats[i].name, ETH_GSTRING_LEN);
-   p += ETH_GSTRING_LEN;
-   }
+   for (i = 0; i < ARRAY_SIZE(vf_stats); i++)
+   ethtool_gsprintf(&p, vf_stats[i].name);
 
for (i = 0; i < nvdev->num_chn; i++) {
-   sprintf(p, "tx_queue_%u_packets", i);
-   p += ETH_GSTRING_LEN;
-   sprintf(p, "tx_queue_%u_bytes", i);
-   p += ETH_GSTRING_LEN;
-   sprintf(p, "rx_queue_%u_packets", i);
-   p += ETH_GSTRING_LEN;
-   sprintf(p, "rx_queue_%u_bytes", i);
-   p += ETH_GSTRING_LEN;
-   sprintf(p, "rx_queue_%u_xdp_drop", i);
-   p += ETH_GSTRING_LEN;
+   ethtool_gsprintf(&p, "tx_queue_%u_packets", i);
+   ethtool_gsprintf(&p, "tx_queue_%u_bytes", i);
+   ethtool_gsprintf(&p, "rx_queue_%u_packets", i);
+   ethtool_gsprintf(&p, "rx_queue_%u_bytes", i);
+   ethtool_gsprintf(&p, "rx_queue_%u_xdp_drop", i);
}
 
for_each_present_cpu(cpu) {
-   for (i = 0; i < ARRAY_SIZE(pcpu_stats); i++) {
-   sprintf(p, pcpu_stats[i].name, cpu);
-   p += ETH_GSTRING_LEN;
-   }
+   for (i = 0; i < ARRAY_SIZE(pcpu_stats); i++)
+   ethtool_gsprintf(&p, pcpu_stats[i].name, cpu);
}
 
break;




[RFC PATCH 07/10] virtio_net: Update driver to use ethtool_gsprintf

2021-03-10 Thread Alexander Duyck
From: Alexander Duyck 

Update the code to replace instances of snprintf and a pointer update with
just calling ethtool_gsprintf.

Also replace the char pointer with a u8 pointer to avoid having to recast
the pointer type.

Signed-off-by: Alexander Duyck 
---
 drivers/net/virtio_net.c |   18 +++---
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 82e520d2cb12..f1a05b43dde7 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2104,25 +2104,21 @@ static int virtnet_set_channels(struct net_device *dev,
 static void virtnet_get_strings(struct net_device *dev, u32 stringset, u8 
*data)
 {
struct virtnet_info *vi = netdev_priv(dev);
-   char *p = (char *)data;
unsigned int i, j;
+   u8 *p = data;
 
switch (stringset) {
case ETH_SS_STATS:
for (i = 0; i < vi->curr_queue_pairs; i++) {
-   for (j = 0; j < VIRTNET_RQ_STATS_LEN; j++) {
-   snprintf(p, ETH_GSTRING_LEN, "rx_queue_%u_%s",
-i, virtnet_rq_stats_desc[j].desc);
-   p += ETH_GSTRING_LEN;
-   }
+   for (j = 0; j < VIRTNET_RQ_STATS_LEN; j++)
+   ethtool_gsprintf(&p, "rx_queue_%u_%s", i,
+virtnet_rq_stats_desc[j].desc);
}
 
for (i = 0; i < vi->curr_queue_pairs; i++) {
-   for (j = 0; j < VIRTNET_SQ_STATS_LEN; j++) {
-   snprintf(p, ETH_GSTRING_LEN, "tx_queue_%u_%s",
-i, virtnet_sq_stats_desc[j].desc);
-   p += ETH_GSTRING_LEN;
-   }
+   for (j = 0; j < VIRTNET_SQ_STATS_LEN; j++)
+   ethtool_gsprintf(&p, "tx_queue_%u_%s", i,
+virtnet_sq_stats_desc[j].desc);
}
break;
}




[RFC PATCH 08/10] vmxnet3: Update driver to use ethtool_gsprintf

2021-03-10 Thread Alexander Duyck
From: Alexander Duyck 

So this patch actually does 3 things.

First it removes a stray white space at the start of the variable
declaration in vmxnet3_get_strings.

Second it flips the logic for the string test so that we exit immediately
if we are not looking for the stats strings. Doing this we can avoid
unnecessary indentation and line wrapping.

Then finally it updates the code to use ethtool_gsprintf rather than a
memcpy and pointer increment to write the ethtool strings.

Signed-off-by: Alexander Duyck 
---
 drivers/net/vmxnet3/vmxnet3_ethtool.c |   53 -
 1 file changed, 19 insertions(+), 34 deletions(-)

diff --git a/drivers/net/vmxnet3/vmxnet3_ethtool.c 
b/drivers/net/vmxnet3/vmxnet3_ethtool.c
index 7ec8652f2c26..4ec674380a91 100644
--- a/drivers/net/vmxnet3/vmxnet3_ethtool.c
+++ b/drivers/net/vmxnet3/vmxnet3_ethtool.c
@@ -218,43 +218,28 @@ vmxnet3_get_drvinfo(struct net_device *netdev, struct 
ethtool_drvinfo *drvinfo)
 static void
 vmxnet3_get_strings(struct net_device *netdev, u32 stringset, u8 *buf)
 {
-struct vmxnet3_adapter *adapter = netdev_priv(netdev);
-   if (stringset == ETH_SS_STATS) {
-   int i, j;
-   for (j = 0; j < adapter->num_tx_queues; j++) {
-   for (i = 0; i < ARRAY_SIZE(vmxnet3_tq_dev_stats); i++) {
-   memcpy(buf, vmxnet3_tq_dev_stats[i].desc,
-  ETH_GSTRING_LEN);
-   buf += ETH_GSTRING_LEN;
-   }
-   for (i = 0; i < ARRAY_SIZE(vmxnet3_tq_driver_stats);
-i++) {
-   memcpy(buf, vmxnet3_tq_driver_stats[i].desc,
-  ETH_GSTRING_LEN);
-   buf += ETH_GSTRING_LEN;
-   }
-   }
+   struct vmxnet3_adapter *adapter = netdev_priv(netdev);
+   int i, j;
 
-   for (j = 0; j < adapter->num_rx_queues; j++) {
-   for (i = 0; i < ARRAY_SIZE(vmxnet3_rq_dev_stats); i++) {
-   memcpy(buf, vmxnet3_rq_dev_stats[i].desc,
-  ETH_GSTRING_LEN);
-   buf += ETH_GSTRING_LEN;
-   }
-   for (i = 0; i < ARRAY_SIZE(vmxnet3_rq_driver_stats);
-i++) {
-   memcpy(buf, vmxnet3_rq_driver_stats[i].desc,
-  ETH_GSTRING_LEN);
-   buf += ETH_GSTRING_LEN;
-   }
-   }
+   if (stringset != ETH_SS_STATS)
+   return;
 
-   for (i = 0; i < ARRAY_SIZE(vmxnet3_global_stats); i++) {
-   memcpy(buf, vmxnet3_global_stats[i].desc,
-   ETH_GSTRING_LEN);
-   buf += ETH_GSTRING_LEN;
-   }
+   for (j = 0; j < adapter->num_tx_queues; j++) {
+   for (i = 0; i < ARRAY_SIZE(vmxnet3_tq_dev_stats); i++)
+   ethtool_gsprintf(&buf, vmxnet3_tq_dev_stats[i].desc);
+   for (i = 0; i < ARRAY_SIZE(vmxnet3_tq_driver_stats); i++)
+   ethtool_gsprintf(&buf, vmxnet3_tq_driver_stats[i].desc);
+   }
+
+   for (j = 0; j < adapter->num_rx_queues; j++) {
+   for (i = 0; i < ARRAY_SIZE(vmxnet3_rq_dev_stats); i++)
+   ethtool_gsprintf(&buf, vmxnet3_rq_dev_stats[i].desc);
+   for (i = 0; i < ARRAY_SIZE(vmxnet3_rq_driver_stats); i++)
+   ethtool_gsprintf(&buf, vmxnet3_rq_driver_stats[i].desc);
}
+
+   for (i = 0; i < ARRAY_SIZE(vmxnet3_global_stats); i++)
+   ethtool_gsprintf(&buf, vmxnet3_global_stats[i].desc);
 }
 
 netdev_features_t vmxnet3_fix_features(struct net_device *netdev,




[RFC PATCH 04/10] hisilicon: Update drivers to use ethtool_gsprintf

2021-03-10 Thread Alexander Duyck
From: Alexander Duyck 

Update the hisilicon drivers to make use of ethtool_gsprintf. The general
idea is to reduce code size and overhead by replacing the repeated pattern
of string printf statements and ETH_GSTRING_LEN pointer increments.

Signed-off-by: Alexander Duyck 
---
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c |7 -
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c  |   37 +++-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c  |   89 ++
 .../net/ethernet/hisilicon/hns/hns_dsaf_xgmac.c|6 -
 drivers/net/ethernet/hisilicon/hns/hns_ethtool.c   |   97 +++-
 5 files changed, 82 insertions(+), 154 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c 
b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c
index 7fb7a419607d..c43acb73f1e3 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c
@@ -693,11 +693,8 @@ static void hns_gmac_get_strings(u32 stringset, u8 *data)
if (stringset != ETH_SS_STATS)
return;
 
-   for (i = 0; i < ARRAY_SIZE(g_gmac_stats_string); i++) {
-   snprintf(buff, ETH_GSTRING_LEN, "%s",
-g_gmac_stats_string[i].desc);
-   buff = buff + ETH_GSTRING_LEN;
-   }
+   for (i = 0; i < ARRAY_SIZE(g_gmac_stats_string); i++)
+   ethtool_gsprintf(&buff, g_gmac_stats_string[i].desc);
 }
 
 static int hns_gmac_get_sset_count(int stringset)
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c 
b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
index d0f8b1fff333..35a149e31a43 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
@@ -465,30 +465,19 @@ void hns_ppe_get_strings(struct hns_ppe_cb *ppe_cb, int 
stringset, u8 *data)
char *buff = (char *)data;
int index = ppe_cb->index;
 
-   snprintf(buff, ETH_GSTRING_LEN, "ppe%d_rx_sw_pkt", index);
-   buff = buff + ETH_GSTRING_LEN;
-   snprintf(buff, ETH_GSTRING_LEN, "ppe%d_rx_pkt_ok", index);
-   buff = buff + ETH_GSTRING_LEN;
-   snprintf(buff, ETH_GSTRING_LEN, "ppe%d_rx_drop_pkt_no_bd", index);
-   buff = buff + ETH_GSTRING_LEN;
-   snprintf(buff, ETH_GSTRING_LEN, "ppe%d_rx_alloc_buf_fail", index);
-   buff = buff + ETH_GSTRING_LEN;
-   snprintf(buff, ETH_GSTRING_LEN, "ppe%d_rx_alloc_buf_wait", index);
-   buff = buff + ETH_GSTRING_LEN;
-   snprintf(buff, ETH_GSTRING_LEN, "ppe%d_rx_pkt_drop_no_buf", index);
-   buff = buff + ETH_GSTRING_LEN;
-   snprintf(buff, ETH_GSTRING_LEN, "ppe%d_rx_pkt_err_fifo_full", index);
-   buff = buff + ETH_GSTRING_LEN;
-
-   snprintf(buff, ETH_GSTRING_LEN, "ppe%d_tx_bd", index);
-   buff = buff + ETH_GSTRING_LEN;
-   snprintf(buff, ETH_GSTRING_LEN, "ppe%d_tx_pkt", index);
-   buff = buff + ETH_GSTRING_LEN;
-   snprintf(buff, ETH_GSTRING_LEN, "ppe%d_tx_pkt_ok", index);
-   buff = buff + ETH_GSTRING_LEN;
-   snprintf(buff, ETH_GSTRING_LEN, "ppe%d_tx_pkt_err_fifo_empty", index);
-   buff = buff + ETH_GSTRING_LEN;
-   snprintf(buff, ETH_GSTRING_LEN, "ppe%d_tx_pkt_err_csum_fail", index);
+   ethtool_gsprintf(&buff, "ppe%d_rx_sw_pkt", index);
+   ethtool_gsprintf(&buff, "ppe%d_rx_pkt_ok", index);
+   ethtool_gsprintf(&buff, "ppe%d_rx_drop_pkt_no_bd", index);
+   ethtool_gsprintf(&buff, "ppe%d_rx_alloc_buf_fail", index);
+   ethtool_gsprintf(&buff, "ppe%d_rx_alloc_buf_wait", index);
+   ethtool_gsprintf(&buff, "ppe%d_rx_pkt_drop_no_buf", index);
+   ethtool_gsprintf(&buff, "ppe%d_rx_pkt_err_fifo_full", index);
+
+   ethtool_gsprintf(&buff, "ppe%d_tx_bd", index);
+   ethtool_gsprintf(&buff, "ppe%d_tx_pkt", index);
+   ethtool_gsprintf(&buff, "ppe%d_tx_pkt_ok", index);
+   ethtool_gsprintf(&buff, "ppe%d_tx_pkt_err_fifo_empty", index);
+   ethtool_gsprintf(&buff, "ppe%d_tx_pkt_err_csum_fail", index);
 }
 
 void hns_ppe_get_stats(struct hns_ppe_cb *ppe_cb, u64 *data)
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c 
b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
index b6c8910cf7ba..a7232b906be4 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
@@ -934,64 +934,37 @@ void hns_rcb_get_strings(int stringset, u8 *data, int 
index)
if (stringset != ETH_SS_STATS)
return;
 
-   snprintf(buff, ETH_GSTRING_LEN, "tx_ring%d_rcb_pkt_num", index);
-   buff = buff + ETH_GSTRING_LEN;
-   snprintf(buff, ETH_GSTRING_LEN, "tx_ring%d_ppe

[RFC PATCH 03/10] nfp: Replace nfp_pr_et with ethtool_gsprintf

2021-03-10 Thread Alexander Duyck
From: Alexander Duyck 

The nfp_pr_et function is nearly identical to ethtool_gsprintf, except that
it takes the data pointer by value and returns the advanced pointer, whereas
ethtool_gsprintf takes a pointer to the data pointer and updates it in place.

Since they are so close, just update nfp to make use of ethtool_gsprintf.

Signed-off-by: Alexander Duyck 
---
 drivers/net/ethernet/netronome/nfp/abm/main.c  |4 +
 .../net/ethernet/netronome/nfp/nfp_net_ethtool.c   |   79 +---
 drivers/net/ethernet/netronome/nfp/nfp_port.h  |2 -
 3 files changed, 36 insertions(+), 49 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/abm/main.c 
b/drivers/net/ethernet/netronome/nfp/abm/main.c
index bdbf0726145e..3e8a9a7d7327 100644
--- a/drivers/net/ethernet/netronome/nfp/abm/main.c
+++ b/drivers/net/ethernet/netronome/nfp/abm/main.c
@@ -419,8 +419,8 @@ nfp_abm_port_get_stats_strings(struct nfp_app *app, struct 
nfp_port *port,
return data;
alink = repr->app_priv;
for (i = 0; i < alink->vnic->dp.num_r_vecs; i++) {
-   data = nfp_pr_et(data, "q%u_no_wait", i);
-   data = nfp_pr_et(data, "q%u_delayed", i);
+   ethtool_gsprintf(&data, "q%u_no_wait", i);
+   ethtool_gsprintf(&data, "q%u_delayed", i);
}
return data;
 }
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
index 9c9ae33d84ce..33097c411d7d 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
@@ -429,17 +429,6 @@ static int nfp_net_set_ringparam(struct net_device *netdev,
return nfp_net_set_ring_size(nn, rxd_cnt, txd_cnt);
 }
 
-__printf(2, 3) u8 *nfp_pr_et(u8 *data, const char *fmt, ...)
-{
-   va_list args;
-
-   va_start(args, fmt);
-   vsnprintf(data, ETH_GSTRING_LEN, fmt, args);
-   va_end(args);
-
-   return data + ETH_GSTRING_LEN;
-}
-
 static unsigned int nfp_vnic_get_sw_stats_count(struct net_device *netdev)
 {
struct nfp_net *nn = netdev_priv(netdev);
@@ -454,29 +443,29 @@ static u8 *nfp_vnic_get_sw_stats_strings(struct 
net_device *netdev, u8 *data)
int i;
 
for (i = 0; i < nn->max_r_vecs; i++) {
-   data = nfp_pr_et(data, "rvec_%u_rx_pkts", i);
-   data = nfp_pr_et(data, "rvec_%u_tx_pkts", i);
-   data = nfp_pr_et(data, "rvec_%u_tx_busy", i);
+   ethtool_gsprintf(&data, "rvec_%u_rx_pkts", i);
+   ethtool_gsprintf(&data, "rvec_%u_tx_pkts", i);
+   ethtool_gsprintf(&data, "rvec_%u_tx_busy", i);
}
 
-   data = nfp_pr_et(data, "hw_rx_csum_ok");
-   data = nfp_pr_et(data, "hw_rx_csum_inner_ok");
-   data = nfp_pr_et(data, "hw_rx_csum_complete");
-   data = nfp_pr_et(data, "hw_rx_csum_err");
-   data = nfp_pr_et(data, "rx_replace_buf_alloc_fail");
-   data = nfp_pr_et(data, "rx_tls_decrypted_packets");
-   data = nfp_pr_et(data, "hw_tx_csum");
-   data = nfp_pr_et(data, "hw_tx_inner_csum");
-   data = nfp_pr_et(data, "tx_gather");
-   data = nfp_pr_et(data, "tx_lso");
-   data = nfp_pr_et(data, "tx_tls_encrypted_packets");
-   data = nfp_pr_et(data, "tx_tls_ooo");
-   data = nfp_pr_et(data, "tx_tls_drop_no_sync_data");
-
-   data = nfp_pr_et(data, "hw_tls_no_space");
-   data = nfp_pr_et(data, "rx_tls_resync_req_ok");
-   data = nfp_pr_et(data, "rx_tls_resync_req_ign");
-   data = nfp_pr_et(data, "rx_tls_resync_sent");
+   ethtool_gsprintf(&data, "hw_rx_csum_ok");
+   ethtool_gsprintf(&data, "hw_rx_csum_inner_ok");
+   ethtool_gsprintf(&data, "hw_rx_csum_complete");
+   ethtool_gsprintf(&data, "hw_rx_csum_err");
+   ethtool_gsprintf(&data, "rx_replace_buf_alloc_fail");
+   ethtool_gsprintf(&data, "rx_tls_decrypted_packets");
+   ethtool_gsprintf(&data, "hw_tx_csum");
+   ethtool_gsprintf(&data, "hw_tx_inner_csum");
+   ethtool_gsprintf(&data, "tx_gather");
+   ethtool_gsprintf(&data, "tx_lso");
+   ethtool_gsprintf(&data, "tx_tls_encrypted_packets");
+   ethtool_gsprintf(&data, "tx_tls_ooo");
+   ethtool_gsprintf(&data, "tx_tls_drop_no_sync_data");
+
+   ethtool_gsprintf(&data, "hw_tls_no_space");
+   ethtool_gsprintf(&data, "rx_tls_resync_req_ok");
+   ethtool_gsprintf(&data, "rx_tls_resync_req_ign");
+   ethtool_gsprintf(&data, "r

[RFC PATCH 01/10] ethtool: Add common function for filling out strings

2021-03-10 Thread Alexander Duyck
From: Alexander Duyck 

Add a function to handle the common pattern of printing a string into the
ethtool strings interface and incrementing the string pointer by the
ETH_GSTRING_LEN. Most of the drivers end up doing this and several have
implemented their own versions of this function so it would make sense to
consolidate on one implementation.

Signed-off-by: Alexander Duyck 
---
 include/linux/ethtool.h |9 +
 net/ethtool/ioctl.c |   12 
 2 files changed, 21 insertions(+)

diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index ec4cd3921c67..0493f13b2b20 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -571,4 +571,13 @@ struct ethtool_phy_ops {
  */
 void ethtool_set_ethtool_phy_ops(const struct ethtool_phy_ops *ops);
 
+/**
+ * ethtool_gsprintf - Write formatted string to ethtool string data
+ * @data: Pointer to start of string to update
+ * @fmt: Format of string to write
+ *
+ * Write formatted string to data. Update data to point at start of
+ * next string.
+ */
+extern __printf(2, 3) void ethtool_gsprintf(u8 **data, const char *fmt, ...);
 #endif /* _LINUX_ETHTOOL_H */
diff --git a/net/ethtool/ioctl.c b/net/ethtool/ioctl.c
index 24783b71c584..44ac73780b6e 100644
--- a/net/ethtool/ioctl.c
+++ b/net/ethtool/ioctl.c
@@ -1844,6 +1844,18 @@ static int ethtool_get_strings(struct net_device *dev, 
void __user *useraddr)
return ret;
 }
 
+__printf(2, 3) void ethtool_gsprintf(u8 **data, const char *fmt, ...)
+{
+   va_list args;
+
+   va_start(args, fmt);
+   vsnprintf(*data, ETH_GSTRING_LEN, fmt, args);
+   va_end(args);
+
+   *data += ETH_GSTRING_LEN;
+}
+EXPORT_SYMBOL(ethtool_gsprintf);
+
 static int ethtool_phys_id(struct net_device *dev, void __user *useraddr)
 {
struct ethtool_value id;




[RFC PATCH 02/10] intel: Update drivers to use ethtool_gsprintf

2021-03-10 Thread Alexander Duyck
From: Alexander Duyck 

Update the Intel drivers to make use of ethtool_gsprintf. The general idea
is to reduce code size and overhead by replacing the repeated pattern of
string printf statements and ETH_GSTRING_LEN pointer increments.

Signed-off-by: Alexander Duyck 
---
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c   |   16 ++
 drivers/net/ethernet/intel/ice/ice_ethtool.c |   55 +++---
 drivers/net/ethernet/intel/igb/igb_ethtool.c |   40 ++--
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |   40 ++--
 4 files changed, 50 insertions(+), 101 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c 
b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index c70dec65a572..932c6635cfd6 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -2368,21 +2368,15 @@ static void i40e_get_priv_flag_strings(struct 
net_device *netdev, u8 *data)
struct i40e_netdev_priv *np = netdev_priv(netdev);
struct i40e_vsi *vsi = np->vsi;
struct i40e_pf *pf = vsi->back;
-   char *p = (char *)data;
+   u8 *p = data;
unsigned int i;
 
-   for (i = 0; i < I40E_PRIV_FLAGS_STR_LEN; i++) {
-   snprintf(p, ETH_GSTRING_LEN, "%s",
-i40e_gstrings_priv_flags[i].flag_string);
-   p += ETH_GSTRING_LEN;
-   }
+   for (i = 0; i < I40E_PRIV_FLAGS_STR_LEN; i++)
+   ethtool_gsprintf(&p, i40e_gstrings_priv_flags[i].flag_string);
if (pf->hw.pf_id != 0)
return;
-   for (i = 0; i < I40E_GL_PRIV_FLAGS_STR_LEN; i++) {
-   snprintf(p, ETH_GSTRING_LEN, "%s",
-i40e_gl_gstrings_priv_flags[i].flag_string);
-   p += ETH_GSTRING_LEN;
-   }
+   for (i = 0; i < I40E_GL_PRIV_FLAGS_STR_LEN; i++)
+   ethtool_gsprintf(&p, 
i40e_gl_gstrings_priv_flags[i].flag_string);
 }
 
 static void i40e_get_strings(struct net_device *netdev, u32 stringset,
diff --git a/drivers/net/ethernet/intel/ice/ice_ethtool.c 
b/drivers/net/ethernet/intel/ice/ice_ethtool.c
index 2dcfa685b763..cef5ebeae886 100644
--- a/drivers/net/ethernet/intel/ice/ice_ethtool.c
+++ b/drivers/net/ethernet/intel/ice/ice_ethtool.c
@@ -871,68 +871,47 @@ static void ice_get_strings(struct net_device *netdev, 
u32 stringset, u8 *data)
 {
struct ice_netdev_priv *np = netdev_priv(netdev);
struct ice_vsi *vsi = np->vsi;
-   char *p = (char *)data;
unsigned int i;
+   u8 *p = data;
 
switch (stringset) {
case ETH_SS_STATS:
-   for (i = 0; i < ICE_VSI_STATS_LEN; i++) {
-   snprintf(p, ETH_GSTRING_LEN, "%s",
-ice_gstrings_vsi_stats[i].stat_string);
-   p += ETH_GSTRING_LEN;
-   }
+   for (i = 0; i < ICE_VSI_STATS_LEN; i++)
+   ethtool_gsprintf(&p,
+ice_gstrings_vsi_stats[i].stat_string);
 
ice_for_each_alloc_txq(vsi, i) {
-   snprintf(p, ETH_GSTRING_LEN,
-"tx_queue_%u_packets", i);
-   p += ETH_GSTRING_LEN;
-   snprintf(p, ETH_GSTRING_LEN, "tx_queue_%u_bytes", i);
-   p += ETH_GSTRING_LEN;
+   ethtool_gsprintf(&p, "tx_queue_%u_packets", i);
+   ethtool_gsprintf(&p, "tx_queue_%u_bytes", i);
}
 
ice_for_each_alloc_rxq(vsi, i) {
-   snprintf(p, ETH_GSTRING_LEN,
-"rx_queue_%u_packets", i);
-   p += ETH_GSTRING_LEN;
-   snprintf(p, ETH_GSTRING_LEN, "rx_queue_%u_bytes", i);
-   p += ETH_GSTRING_LEN;
+   ethtool_gsprintf(&p, "rx_queue_%u_packets", i);
+   ethtool_gsprintf(&p, "rx_queue_%u_bytes", i);
}
 
if (vsi->type != ICE_VSI_PF)
return;
 
-   for (i = 0; i < ICE_PF_STATS_LEN; i++) {
-   snprintf(p, ETH_GSTRING_LEN, "%s",
-ice_gstrings_pf_stats[i].stat_string);
-   p += ETH_GSTRING_LEN;
-   }
+   for (i = 0; i < ICE_PF_STATS_LEN; i++)
+   ethtool_gsprintf(&p,
+ice_gstrings_pf_stats[i].stat_string);
 
for (i = 0; i < ICE_MAX_USER_PRIORITY; i++) {
-   snprintf(p, ETH_GSTRING_LEN,
-"tx_priority_%u_xon.nic", i);
-  

[RFC PATCH 00/10] ethtool: Factor out common code related to writing ethtool strings

2021-03-10 Thread Alexander Duyck
This patch set is meant to be a cleanup and refactoring of common code bits
from several drivers. Specifically, a number of drivers engage in a pattern
where they use some variant of sprintf or memcpy to write a string into the
ethtool string array and then increment their pointer by ETH_GSTRING_LEN.

Instead of having each driver implement this independently, I am refactoring
the code so that we have one central function, ethtool_gsprintf, that does
all of this. It takes a double pointer to access the data, a formatted string
to print, and the variable arguments that are associated with the string.

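For illustration, the typical conversion in a driver's get_strings handler
ends up looking roughly like the sketch below; the queue-stat name and the
example function are invented here and not taken from any of the drivers
touched by the series:

#include <linux/ethtool.h>

static void example_get_strings(u8 *data, unsigned int num_queues)
{
	u8 *p = data;
	unsigned int i;

	/* old pattern:
	 *	snprintf(p, ETH_GSTRING_LEN, "queue_%u_packets", i);
	 *	p += ETH_GSTRING_LEN;
	 * new pattern:
	 */
	for (i = 0; i < num_queues; i++)
		ethtool_gsprintf(&p, "queue_%u_packets", i);
}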

---

Alexander Duyck (10):
  ethtool: Add common function for filling out strings
  intel: Update drivers to use ethtool_gsprintf
  nfp: Replace nfp_pr_et with ethtool_gsprintf
  hisilicon: Update drivers to use ethtool_gsprintf
  ena: Update driver to use ethtool_gsprintf
  netvsc: Update driver to use ethtool_gsprintf
  virtio_net: Update driver to use ethtool_gsprintf
  vmxnet3: Update driver to use ethtool_gsprintf
  bna: Update driver to use ethtool_gsprintf
  ionic: Update driver to use ethtool_gsprintf


 drivers/net/ethernet/amazon/ena/ena_ethtool.c |  25 +-
 .../net/ethernet/brocade/bna/bnad_ethtool.c   | 266 +++---
 .../ethernet/hisilicon/hns/hns_dsaf_gmac.c|   7 +-
 .../net/ethernet/hisilicon/hns/hns_dsaf_ppe.c |  37 +--
 .../net/ethernet/hisilicon/hns/hns_dsaf_rcb.c |  89 ++
 .../ethernet/hisilicon/hns/hns_dsaf_xgmac.c   |   6 +-
 .../net/ethernet/hisilicon/hns/hns_ethtool.c  |  97 +++
 .../net/ethernet/intel/i40e/i40e_ethtool.c|  16 +-
 drivers/net/ethernet/intel/ice/ice_ethtool.c  |  55 ++--
 drivers/net/ethernet/intel/igb/igb_ethtool.c  |  40 +--
 .../net/ethernet/intel/ixgbe/ixgbe_ethtool.c  |  40 +--
 drivers/net/ethernet/netronome/nfp/abm/main.c |   4 +-
 .../ethernet/netronome/nfp/nfp_net_ethtool.c  |  79 +++---
 drivers/net/ethernet/netronome/nfp/nfp_port.h |   2 -
 .../net/ethernet/pensando/ionic/ionic_stats.c | 145 --
 drivers/net/hyperv/netvsc_drv.c   |  33 +--
 drivers/net/virtio_net.c  |  18 +-
 drivers/net/vmxnet3/vmxnet3_ethtool.c |  53 ++--
 18 files changed, 381 insertions(+), 631 deletions(-)

--



Re: [PATCH mlx5-next v7 0/4] Dynamically assign MSI-X vectors count

2021-03-10 Thread Alexander Duyck
On Wed, Mar 10, 2021 at 11:09 AM Bjorn Helgaas  wrote:
>
> On Sun, Mar 07, 2021 at 10:55:24AM -0800, Alexander Duyck wrote:
> > On Sun, Feb 28, 2021 at 11:55 PM Leon Romanovsky  wrote:
> > > From: Leon Romanovsky 
> > >
> > > @Alexander Duyck, please update me if I can add your ROB tag again
> > > to the series, because you liked v6 more.
> > >
> > > Thanks
> > >
> > > -
> > > Changelog
> > > v7:
> > >  * Rebase on top v5.12-rc1
> > >  * More english fixes
> > >  * Returned to static sysfs creation model as was implemented in v0/v1.
> >
> > Yeah, so I am not a fan of the series. The problem is there is only
> > one driver that supports this, all VFs are going to expose this sysfs,
> > and I don't know how likely it is that any others are going to
> > implement this functionality. I feel like you threw out all the
> > progress from v2-v6.
>
> pci_enable_vfs_overlay() turned up in v4, so I think v0-v3 had static
> sysfs files regardless of whether the PF driver was bound.
>
> > I really feel like the big issue is that this model is broken as you
> > have the VFs exposing sysfs interfaces that make use of the PFs to
> > actually implement. Greg's complaint was the PF pushing sysfs onto the
> > VFs. My complaint is VFs sysfs files operating on the PF. The trick is
> > to find a way to address both issues.
> >
> > Maybe the compromise is to reach down into the IOV code and have it
> > register the sysfs interface at device creation time in something like
> > pci_iov_sysfs_link if the PF has the functionality present to support
> > it.
>
> IIUC there are two questions on the table:
>
>   1) Should the sysfs files be visible only when a PF driver that
>  supports MSI-X vector assignment is bound?
>
>  I think this is a cosmetic issue.  The presence of the file is
>  not a reliable signal to management software; it must always
>  tolerate files that don't exist (e.g., on old kernels) or files
>  that are visible but don't work (e.g., vectors may be exhausted).
>
>  If we start with the files always being visible, we should be
>  able to add smarts later to expose them only when the PF driver
>  is bound.
>
>  My concerns with pci_enable_vf_overlay() are that it uses a
>  little more sysfs internals than I'd like (although there are
>  many callers of sysfs_create_files()) and it uses
>  pci_get_domain_bus_and_slot(), which is generally a hack and
>  creates refcounting hassles.  Speaking of which, isn't v6 missing
>  a pci_dev_put() to match the pci_get_domain_bus_and_slot()?

I'm not so much worried about management software as the fact that
this is a vendor specific implementation detail that is shaping how
the kernel interfaces are meant to work. Other than the mlx5 I don't
know if there are any other vendors really onboard with this sort of
solution.

In addition it still feels rather hacky to be modifying read-only PCIe
configuration space on the fly via a backdoor provided by the PF. It
almost feels like this should be some sort of quirk rather than a
standard feature for an SR-IOV VF.

>   2) Should a VF sysfs file use the PF to implement this?
>
>  Can you elaborate on your idea here?  I guess
>  pci_iov_sysfs_link() makes a "virtfnX" link from the PF to the
>  VF, and you're thinking we could also make a "virtfnX_msix_count"
>  in the PF directory?  That's a really interesting idea.

I would honestly be more comfortable if the PF owned these files
instead of the VFs. One of the things I didn't like about this back
during the V1/2 days was the fact that it gave the impression that
MSI-X count was something that is meant to be edited. Since then I
think at least the naming was changed so that it implies that this is
only possible due to SR-IOV.

I also didn't like that it makes the VFs feel like they are port
representors rather than being actual PCIe devices. Having
functionality that only works when the VF driver is not loaded just
feels off. The VF sysfs directory feels like it is being used as a
subdirectory of the PF rather than being a device on its own.

> > Also we might want to double check that the PF cannot be unbound while
> > the VF is present. I know for a while there it was possible to remove
> > the PF driver while the VF was present. The Mellanox drivers may not
> > allow it but it might not hurt to look at taking a reference against
> > the PF driver if you are allocating the VF MSI-X configuration sysfs
> > f

Re: [PATCH] net: sock: simplify tw proto registration

2021-03-09 Thread Alexander Duyck
On Tue, Mar 9, 2021 at 5:48 PM Tonghao Zhang  wrote:
>
> On Wed, Mar 10, 2021 at 1:39 AM Alexander Duyck
>  wrote:
> >
> > On Mon, Mar 8, 2021 at 7:15 PM  wrote:
> > >
> > > From: Tonghao Zhang 
> > >
> > > Introduce a new function twsk_prot_init, inspired by
> > > req_prot_init, to simplify the "proto_register" function.
> > >
> > > Signed-off-by: Tonghao Zhang 
> > > ---
> > >  net/core/sock.c | 44 
> > >  1 file changed, 28 insertions(+), 16 deletions(-)
> > >
> > > diff --git a/net/core/sock.c b/net/core/sock.c
> > > index 0ed98f20448a..610de4295101 100644
> > > --- a/net/core/sock.c
> > > +++ b/net/core/sock.c
> > > @@ -3475,6 +3475,32 @@ static int req_prot_init(const struct proto *prot)
> > > return 0;
> > >  }
> > >
> > > +static int twsk_prot_init(const struct proto *prot)
> > > +{
> > > +   struct timewait_sock_ops *twsk_prot = prot->twsk_prot;
> > > +
> > > +   if (!twsk_prot)
> > > +   return 0;
> > > +
> > > +   twsk_prot->twsk_slab_name = kasprintf(GFP_KERNEL, "tw_sock_%s",
> > > + prot->name);
> > > +   if (!twsk_prot->twsk_slab_name)
> > > +   return -ENOMEM;
> > > +
> > > +   twsk_prot->twsk_slab =
> > > +   kmem_cache_create(twsk_prot->twsk_slab_name,
> > > + twsk_prot->twsk_obj_size, 0,
> > > + SLAB_ACCOUNT | prot->slab_flags,
> > > + NULL);
> > > +   if (!twsk_prot->twsk_slab) {
> > > +   pr_crit("%s: Can't create timewait sock SLAB cache!\n",
> > > +   prot->name);
> > > +   return -ENOMEM;
> > > +   }
> > > +
> > > +   return 0;
> > > +}
> > > +
> >
> > So one issue here is that you have two returns but they both have the
> > same error clean-up outside of the function. It might make more sense
> > to look at freeing the kasprintf if the slab allocation fails and then
> > using the out_free_request_sock_slab jump label below if the slab
> > allocation failed.
> Hi, thanks for your review.
> if twsk_prot_init failed, (kasprintf, or slab alloc), we will invoke
> the tw_prot_cleanup() to clean up
> the sources allocated.
> 1. kfree(twsk_prot->twsk_slab_name); // if name is NULL, kfree() will
> return directly
> 2. kmem_cache_destroy(twsk_prot->twsk_slab); // if slab is NULL,
> kmem_cache_destroy() will return directly too.
> so we don't care what err in twsk_prot_init().
>
> and req_prot_cleanup() will clean up all sources allocated for 
> req_prot_init().

I see. Okay so the expectation is that tw_prot_cleanup will take care
of a partially initialized timewait_sock_ops.

With that being the case the one change I would ask you to make would
be to look at moving the function up so it is just below
tw_prot_cleanup so it is obvious that the two are meant to be paired
rather than placing it after req_prot_init.

Otherwise the patch set itself looks good to me.

Reviewed-by: Alexander Duyck 


Re: [PATCH] net: sock: simplify tw proto registration

2021-03-09 Thread Alexander Duyck
On Mon, Mar 8, 2021 at 7:15 PM  wrote:
>
> From: Tonghao Zhang 
>
> Introduce a new function twsk_prot_init, inspired by
> req_prot_init, to simplify the "proto_register" function.
>
> Signed-off-by: Tonghao Zhang 
> ---
>  net/core/sock.c | 44 
>  1 file changed, 28 insertions(+), 16 deletions(-)
>
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 0ed98f20448a..610de4295101 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -3475,6 +3475,32 @@ static int req_prot_init(const struct proto *prot)
> return 0;
>  }
>
> +static int twsk_prot_init(const struct proto *prot)
> +{
> +   struct timewait_sock_ops *twsk_prot = prot->twsk_prot;
> +
> +   if (!twsk_prot)
> +   return 0;
> +
> +   twsk_prot->twsk_slab_name = kasprintf(GFP_KERNEL, "tw_sock_%s",
> + prot->name);
> +   if (!twsk_prot->twsk_slab_name)
> +   return -ENOMEM;
> +
> +   twsk_prot->twsk_slab =
> +   kmem_cache_create(twsk_prot->twsk_slab_name,
> + twsk_prot->twsk_obj_size, 0,
> + SLAB_ACCOUNT | prot->slab_flags,
> + NULL);
> +   if (!twsk_prot->twsk_slab) {
> +   pr_crit("%s: Can't create timewait sock SLAB cache!\n",
> +   prot->name);
> +   return -ENOMEM;
> +   }
> +
> +   return 0;
> +}
> +

So one issue here is that you have two returns but they both have the
same error clean-up outside of the function. It might make more sense
to look at freeing the kasprintf if the slab allocation fails and then
using the out_free_request_sock_slab jump label below if the slab
allocation failed.
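
For illustration, one shape that suggestion could take, based on the
twsk_prot_init() proposed above (a sketch, not a tested change):

static int twsk_prot_init(const struct proto *prot)
{
	struct timewait_sock_ops *twsk_prot = prot->twsk_prot;

	if (!twsk_prot)
		return 0;

	twsk_prot->twsk_slab_name = kasprintf(GFP_KERNEL, "tw_sock_%s",
					      prot->name);
	if (!twsk_prot->twsk_slab_name)
		return -ENOMEM;

	twsk_prot->twsk_slab =
		kmem_cache_create(twsk_prot->twsk_slab_name,
				  twsk_prot->twsk_obj_size, 0,
				  SLAB_ACCOUNT | prot->slab_flags, NULL);
	if (!twsk_prot->twsk_slab) {
		pr_crit("%s: Can't create timewait sock SLAB cache!\n",
			prot->name);
		/* free the name here so proto_register() only needs the
		 * out_free_request_sock_slab label on failure
		 */
		kfree(twsk_prot->twsk_slab_name);
		twsk_prot->twsk_slab_name = NULL;
		return -ENOMEM;
	}

	return 0;
}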

>  int proto_register(struct proto *prot, int alloc_slab)
>  {
> int ret = -ENOBUFS;
> @@ -3496,22 +3522,8 @@ int proto_register(struct proto *prot, int alloc_slab)
> if (req_prot_init(prot))
> goto out_free_request_sock_slab;
>
> -   if (prot->twsk_prot != NULL) {
> -   prot->twsk_prot->twsk_slab_name = 
> kasprintf(GFP_KERNEL, "tw_sock_%s", prot->name);
> -
> -   if (prot->twsk_prot->twsk_slab_name == NULL)
> -   goto out_free_request_sock_slab;
> -
> -   prot->twsk_prot->twsk_slab =
> -   
> kmem_cache_create(prot->twsk_prot->twsk_slab_name,
> - 
> prot->twsk_prot->twsk_obj_size,
> - 0,
> - SLAB_ACCOUNT |
> - prot->slab_flags,
> - NULL);
> -   if (prot->twsk_prot->twsk_slab == NULL)
> -   goto out_free_timewait_sock_slab;
> -   }
> +   if (twsk_prot_init(prot))
> +   goto out_free_timewait_sock_slab;

So assuming the code above takes care of freeing the slab name in case
of slab allocation failure then this would be better off jumping to
out_free_request_sock_slab.

> }
>
> mutex_lock(&proto_list_mutex);
> --
> 2.27.0
>


[net PATCH] ixgbe: Fix NULL pointer dereference in ethtool loopback test

2021-03-08 Thread Alexander Duyck
From: Alexander Duyck 

The ixgbe driver currently generates a NULL pointer dereference when
performing the ethtool loopback test. This is due to the fact that there
isn't a q_vector associated with the test ring when it is setup as
interrupts are not normally added to the test rings.

To address this I have added code that will check for a q_vector before
returning a napi_id value. If a q_vector is not present it will return a
value of 0.

Fixes: b02e5a0ebb17 ("xsk: Propagate napi_id to XDP socket Rx path")
Signed-off-by: Alexander Duyck 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index fae84202d870..724cdd669957 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -6534,6 +6534,13 @@ static int ixgbe_setup_all_tx_resources(struct 
ixgbe_adapter *adapter)
return err;
 }
 
+static int ixgbe_rx_napi_id(struct ixgbe_ring *rx_ring)
+{
+   struct ixgbe_q_vector *q_vector = rx_ring->q_vector;
+
+   return q_vector ? q_vector->napi.napi_id : 0;
+}
+
 /**
  * ixgbe_setup_rx_resources - allocate Rx resources (Descriptors)
  * @adapter: pointer to ixgbe_adapter
@@ -6582,7 +6589,7 @@ int ixgbe_setup_rx_resources(struct ixgbe_adapter 
*adapter,
 
/* XDP RX-queue info */
if (xdp_rxq_info_reg(&rx_ring->xdp_rxq, adapter->netdev,
-rx_ring->queue_index, 
rx_ring->q_vector->napi.napi_id) < 0)
+rx_ring->queue_index, ixgbe_rx_napi_id(rx_ring)) < 
0)
goto err;
 
rx_ring->xdp_prog = adapter->xdp_prog;




Re: [PATCH 0/3] fix a couple of atm->phy_data related issues

2021-03-08 Thread Alexander Duyck
Hi Tong,

Is this direct-assigned hardware or is QEMU being used to emulate the
hardware here? Admittedly I don't know that much about ATM, so I am
not sure when/if those phys would have gone out of production. However
since the code dates back to 2005 I am guessing it is on the old side.

Ultimately the decision is up to Chas. However, if code that triggers
this kind of null pointer dereference has been in place for this long,
it strongly suggests those phys have not been in use since at least
when Linus switched over to git in 2005.

Thanks,

- Alex

On Mon, Mar 8, 2021 at 9:55 AM Tong Zhang  wrote:
>
> Hi Alex,
> attached is the kernel log for zatm(uPD98402) -- I also have
> idt77252's log -- which is similar to this one --
> I think it makes sense to drop if no one is actually using it --
> - Tong
>
> [5.740774] BUG: KASAN: null-ptr-deref in uPD98402_start+0x5e/0x219
> [uPD98402]
> [5.741179] Write of size 4 at addr 002c by task modprobe/96
> [5.741548]
> [5.741637] CPU: 0 PID: 96 Comm: modprobe Not tainted 5.12.0-rc2-dirty #71
> [5.742017] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
> BIOS rel-1.13.0-48-gd9c812dda519-prebuilt.qemu.org 04/01/2014
> [5.742635] Call Trace:
> [5.742775]  dump_stack+0x8a/0xb5
> [5.742966]  kasan_report.cold+0x10f/0x111
> [5.743197]  ? uPD98402_start+0x5e/0x219 [uPD98402]
> [5.743473]  uPD98402_start+0x5e/0x219 [uPD98402]
> [5.743739]  zatm_init_one+0x10b5/0x1311 [zatm]
> [5.743998]  ? zatm_int.cold+0x30/0x30 [zatm]
> [5.744246]  ? _raw_write_lock_irqsave+0xd0/0xd0
> [5.744507]  ? __mutex_lock_slowpath+0x10/0x10
> [5.744757]  ? _raw_spin_unlock_irqrestore+0xd/0x20
> [5.745030]  ? zatm_int.cold+0x30/0x30 [zatm]
> [5.745278]  local_pci_probe+0x6f/0xb0
> [5.745492]  pci_device_probe+0x171/0x240
> [5.745718]  ? pci_device_remove+0xe0/0xe0
> [5.745949]  ? kernfs_create_link+0xb6/0x110
> [5.746190]  ? sysfs_do_create_link_sd.isra.0+0x76/0xe0
> [5.746482]  really_probe+0x161/0x420
> [5.746691]  driver_probe_device+0x6d/0xd0
> [5.746923]  device_driver_attach+0x82/0x90
> [5.747158]  ? device_driver_attach+0x90/0x90
> [5.747402]  __driver_attach+0x60/0x100
> [5.747621]  ? device_driver_attach+0x90/0x90
> [5.747864]  bus_for_each_dev+0xe1/0x140
> [5.748075]  ? subsys_dev_iter_exit+0x10/0x10
> [5.748320]  ? klist_node_init+0x61/0x80
> [5.748542]  bus_add_driver+0x254/0x2a0
> [5.748760]  driver_register+0xd3/0x150
> [5.748977]  ? 0xc003
> [5.749163]  do_one_initcall+0x84/0x250
> [5.749380]  ? trace_event_raw_event_initcall_finish+0x150/0x150
> [5.749714]  ? _raw_spin_unlock_irqrestore+0xd/0x20
> [5.749987]  ? create_object+0x395/0x510
> [5.750210]  ? kasan_unpoison+0x21/0x50
> [5.750427]  do_init_module+0xf8/0x350
> [5.750640]  load_module+0x40c5/0x4410
> [5.750854]  ? module_frob_arch_sections+0x20/0x20
> [5.751123]  ? kernel_read_file+0x1cd/0x3e0
> [5.751364]  ? __do_sys_finit_module+0x108/0x170
> [5.751628]  __do_sys_finit_module+0x108/0x170
> [5.751879]  ? __ia32_sys_init_module+0x40/0x40
> [5.752126]  ? file_open_root+0x200/0x200
> [5.752353]  ? do_sys_open+0x85/0xe0
> [5.752556]  ? filp_open+0x50/0x50
> [5.752750]  ? fpregs_assert_state_consistent+0x4d/0x60
> [5.753042]  ? exit_to_user_mode_prepare+0x2f/0x130
> [5.753316]  do_syscall_64+0x33/0x40
> [5.753519]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [5.753802] RIP: 0033:0x7ff64032dcf7
>  ff c3 48 c7 c6 01 00 00 00 e9 a1
> [5.755029] RSP: 002b:7ffd250ea358 EFLAGS: 0246 ORIG_RAX:
> 0139
> [5.755449] RAX: ffda RBX: 01093a70 RCX: 
> 7ff64032dcf7
> [5.755847] RDX:  RSI: 010929e0 RDI: 
> 0003
> [5.756242] RBP: 0003 R08:  R09: 
> 00000001
> [5.756635] R10: 7ff640391300 R11: 0246 R12: 
> 010929e0
> [5.757029] R13:  R14: 01092dd0 R15: 
> 0001
>
> On Mon, Mar 8, 2021 at 12:47 PM Alexander Duyck
>  wrote:
> >
> > On Mon, Mar 8, 2021 at 12:39 AM Tong Zhang  wrote:
> > >
> > > there are two drivers(zatm and idt77252) using PRIV() (i.e. atm->phy_data)
> > > to store private data, but the driver happens to populate wrong
> > > pointers: atm->dev_data. which actually cause null-ptr-dereference in
> > > following PRIV(dev). This patch series attempts to fix those two issues
> > > along with a typo in atm struct.
> > >
> > > Tong Zhang (3)

Re: [PATCH 0/3] fix a couple of atm->phy_data related issues

2021-03-08 Thread Alexander Duyck
On Mon, Mar 8, 2021 at 12:39 AM Tong Zhang  wrote:
>
> there are two drivers(zatm and idt77252) using PRIV() (i.e. atm->phy_data)
> to store private data, but the driver happens to populate wrong
> pointers: atm->dev_data. which actually cause null-ptr-dereference in
> following PRIV(dev). This patch series attempts to fix those two issues
> along with a typo in atm struct.
>
> Tong Zhang (3):
>   atm: fix a typo in the struct description
>   atm: uPD98402: fix incorrect allocation
>   atm: idt77252: fix null-ptr-dereference
>
>  drivers/atm/idt77105.c | 4 ++--
>  drivers/atm/uPD98402.c | 2 +-
>  include/linux/atmdev.h | 2 +-
>  3 files changed, 4 insertions(+), 4 deletions(-)

For the 2 phys, did you actually see null pointer dereferences, or are
your changes based on just code review?

I ask because it seems like this code has been this way since 2005,
and in the case of uPD98402_start it doesn't look like the code could
ever have worked as-is: PRIV() is phy_data, and the problem is pretty
obvious since the initialization happens immediately after the
allocation.

I'm just wondering if it might make more sense to drop the code if it
hasn't been run in 15+ years rather than updating it?


Re: [PATCH mlx5-next v7 0/4] Dynamically assign MSI-X vectors count

2021-03-08 Thread Alexander Duyck
On Sun, Mar 7, 2021 at 11:19 AM Leon Romanovsky  wrote:
>
> On Sun, Mar 07, 2021 at 10:55:24AM -0800, Alexander Duyck wrote:
> > On Sun, Feb 28, 2021 at 11:55 PM Leon Romanovsky  wrote:
> > >
> > > From: Leon Romanovsky 
> > >
> > > @Alexander Duyck, please update me if I can add your ROB tag again
> > > to the series, because you liked v6 more.
> > >
> > > Thanks
> > >
> > > -
> > > Changelog
> > > v7:
> > >  * Rebase on top v5.12-rc1
> > >  * More english fixes
> > >  * Returned to static sysfs creation model as was implemented in v0/v1.
> >
> > Yeah, so I am not a fan of the series. The problem is there is only
> > one driver that supports this, all VFs are going to expose this sysfs,
> > and I don't know how likely it is that any others are going to
> > implement this functionality. I feel like you threw out all the
> > progress from v2-v6.
>
> I'm with you here and tried to present the rationale in v6 when had
> a discussion with Bjorn, so it is unfair to say "you threw out".
>
> Bjorn expressed his preference, and no one came forward to support v6.

Sorry, it wasn't my intention to be accusatory. I'm just not a fan of
going back to where we were with v1.

With that said, if it is what Bjorn wants then you are probably better
off going with that. However if that is the direction we are going in
then you should probably focus on getting his Reviewed-by or Ack since
he will ultimately be the maintainer for the code.

> >
> > I really feel like the big issue is that this model is broken as you
> > have the VFs exposing sysfs interfaces that make use of the PFs to
> > actually implement. Greg's complaint was the PF pushing sysfs onto the
> > VFs. My complaint is VFs sysfs files operating on the PF. The trick is
> > to find a way to address both issues.
>
> It is hard to say something meaningful about Greg's complain, he was
> added in the middle of the discussion without much chances to get full
> picture.

Right, but what I am getting at is that the underlying problem is that
you either have sysfs being pushed onto a remote device, or sysfs that
is having to call into another device. It's not exactly something we
have had precedent for enabling before, and either perspective seems a
bit ugly.

> >
> > Maybe the compromise is to reach down into the IOV code and have it
> > register the sysfs interface at device creation time in something like
> > pci_iov_sysfs_link if the PF has the functionality present to support
> > it.
>
> IMHO, it adds nothing.

My thought was to reduce clutter. As I mentioned before with this
patch set we are enabling sysfs for functionality that is currently
only exposed by one device. I'm not sure whether it will be used by
many others or not. Having these sysfs interfaces instantiated at probe
time or at creation time in the case of VFs was preferable to me.

> >
> > Also we might want to double check that the PF cannot be unbound while
> > the VF is present. I know for a while there it was possible to remove
> > the PF driver while the VF was present. The Mellanox drivers may not
> > allow it but it might not hurt to look at taking a reference against
> > the PF driver if you are allocating the VF MSI-X configuration sysfs
> > file.
>
> Right now, we always allocate these sysfs without relation if PF
> supports or not. The check is done during write() call to such sysfs
> and at that phase we check the existence of the drivers. It greatly
> simplifies creation phase.

Yeah, I see that. From what I can tell the locking looks correct, and
for what we have it is probably good enough to avoid any issues. My
concern was more about needing to prevent the PF driver from being
unloaded if we only exposed these interfaces when the PF driver
supported them.


Re: [PATCH mlx5-next v7 0/4] Dynamically assign MSI-X vectors count

2021-03-07 Thread Alexander Duyck
On Sun, Feb 28, 2021 at 11:55 PM Leon Romanovsky  wrote:
>
> From: Leon Romanovsky 
>
> @Alexander Duyck, please update me if I can add your ROB tag again
> to the series, because you liked v6 more.
>
> Thanks
>
> -
> Changelog
> v7:
>  * Rebase on top v5.12-rc1
>  * More english fixes
>  * Returned to static sysfs creation model as was implemented in v0/v1.

Yeah, so I am not a fan of the series. The problem is there is only
one driver that supports this, all VFs are going to expose this sysfs,
and I don't know how likely it is that any others are going to
implement this functionality. I feel like you threw out all the
progress from v2-v6.

I really feel like the big issue is that this model is broken as you
have the VFs exposing sysfs interfaces that make use of the PFs to
actually implement. Greg's complaint was the PF pushing sysfs onto the
VFs. My complaint is VFs sysfs files operating on the PF. The trick is
to find a way to address both issues.

Maybe the compromise is to reach down into the IOV code and have it
register the sysfs interface at device creation time in something like
pci_iov_sysfs_link if the PF has the functionality present to support
it.
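
Roughly what I am thinking of (pseudo-code; the attribute and callback
names here are assumptions based on this series):

	/* in pci_iov_add_virtfn(), next to pci_iov_sysfs_link() */
	if (dev->driver && dev->driver->sriov_set_msix_vec_count)
		rc = sysfs_create_file(&virtfn->dev.kobj,
				       &dev_attr_sriov_vf_msix_count.attr);

That way the file only shows up on VFs whose PF actually implements
the callback.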

Also we might want to double check that the PF cannot be unbound while
the VF is present. I know for a while there it was possible to remove
the PF driver while the VF was present. The Mellanox drivers may not
allow it but it might not hurt to look at taking a reference against
the PF driver if you are allocating the VF MSI-X configuration sysfs
file.


Re: [PATCH net] net: tcp: don't allocate fast clones for fastopen SYN

2021-03-03 Thread Alexander Duyck
On Wed, Mar 3, 2021 at 4:07 PM Jakub Kicinski  wrote:
>
> On Wed, 3 Mar 2021 13:35:53 -0800 Alexander Duyck wrote:
> > On Tue, Mar 2, 2021 at 1:37 PM Eric Dumazet  wrote:
> > > On Tue, Mar 2, 2021 at 7:08 AM Jakub Kicinski  wrote:
> > > > When receiver does not accept TCP Fast Open it will only ack
> > > > the SYN, and not the data. We detect this and immediately queue
> > > > the data for (re)transmission in tcp_rcv_fastopen_synack().
> > > >
> > > > In DC networks with very low RTT and without RFS the SYN-ACK
> > > > may arrive before NIC driver reported Tx completion on
> > > > the original SYN. In which case skb_still_in_host_queue()
> > > > returns true and sender will need to wait for the retransmission
> > > > timer to fire milliseconds later.
> > > >
> > > > Revert back to non-fast clone skbs, this way
> > > > skb_still_in_host_queue() won't prevent the recovery flow
> > > > from completing.
> > > >
> > > > Suggested-by: Eric Dumazet 
> > > > Fixes: 355a901e6cf1 ("tcp: make connect() mem charging friendly")
> > >
> > > Hmmm, not sure if this Fixes: tag makes sense.
> > >
> > > Really, if we delay TX completions by say 10 ms, other parts of the
> > > stack will misbehave anyway.
> > >
> > > Also, backporting this patch up to linux-3.19 is going to be tricky.
> > >
> > > The real issue here is that skb_still_in_host_queue() can give a false 
> > > positive.
> > >
> > > I have mixed feelings here, as you can read my answer :/
> > >
> > > Maybe skb_still_in_host_queue() signal should not be used when a part
> > > of the SKB has been received/acknowledged by the remote peer
> > > (in this case the SYN part).
> > >
> > > Alternative is that drivers unable to TX complete their skbs in a
> > > reasonable time should call skb_orphan()
> > >  to avoid skb_unclone() penalties (and this skb_still_in_host_queue() 
> > > issue)
> > >
> > > If you really want to play and delay TX completions, maybe provide a
> > > way to disable skb_still_in_host_queue() globally,
> > > using a static key ?
> >
> > The problem as I see it is that the original fclone isn't what we sent
> > out on the wire and that is confusing things. What we sent was a SYN
> > with data, but what we have now is just a data frame that hasn't been
> > put out on the wire yet.
>
> Not sure I understand why it's the key distinction here. Is it
> re-transmitting part of the frame or having different flags?
> Is re-transmit of half of a GSO skb also considered not the same?

The difference in my mind is the flags. So specifically the clone of
the syn_data frame in the case of the TCP fast open isn't actually a
clone of the sent frame. Instead we end up modifying the flags so that
it becomes the first data frame. We already have the SYN sitting in
the retransmit queue before we send the SYN w/ data frame. In addition
the SYN packet in the retransmit queue has a reference count of 1 so
it is not encumbered by the fclone reference count check so it could
theoretically be retransmitted immediately, it is just the data packet
that is being held.

If we replay a GSO frame we will get the same frames all over again.
In the case of a TCP fast open syn_data packet that isn't the case.
The first time out it is one packet, the second time it is two.

> To me the distinction is that the receiver has implicitly asked
> us for the re-transmission. If it was requested by SACK we should
> ignore "in_queue" for the first transmission as well, even if the
> skb state is identical.

In my mind the distinction is the fact that what we have in the
retransmit queue is 2 frames, a SYN and a data. Whereas what we have
put on the wire is SYN w/ data.

> > I wonder if we couldn't get away with doing something like adding a
> > fourth option of SKB_FCLONE_MODIFIED that we could apply to fastopen
> > skbs? That would keep the skb_still_in_host queue from triggering as
> > we would be changing the state from SKB_FCLONE_ORIG to
> > SKB_FCLONE_MODIFIED for the skb we store in the retransmit queue. In
> > addition if we have to clone it again and the fclone reference count
> > is 1 we could reset it back to SKB_FCLONE_ORIG.
>
> The unused value of fclone was tempting me as well :)
>
> AFAICT we have at least these options:
>
> 1 - don't use a fclone skb [v1]
>
> 2 - mark the fclone as "special" at Tx to escape the "in queue" check

This is what I had in 

Re: [PATCH net] net: tcp: don't allocate fast clones for fastopen SYN

2021-03-03 Thread Alexander Duyck
On Tue, Mar 2, 2021 at 1:37 PM Eric Dumazet  wrote:
>
> On Tue, Mar 2, 2021 at 7:08 AM Jakub Kicinski  wrote:
> >
> > When receiver does not accept TCP Fast Open it will only ack
> > the SYN, and not the data. We detect this and immediately queue
> > the data for (re)transmission in tcp_rcv_fastopen_synack().
> >
> > In DC networks with very low RTT and without RFS the SYN-ACK
> > may arrive before NIC driver reported Tx completion on
> > the original SYN. In which case skb_still_in_host_queue()
> > returns true and sender will need to wait for the retransmission
> > timer to fire milliseconds later.
> >
> > Revert back to non-fast clone skbs, this way
> > skb_still_in_host_queue() won't prevent the recovery flow
> > from completing.
> >
> > Suggested-by: Eric Dumazet 
> > Fixes: 355a901e6cf1 ("tcp: make connect() mem charging friendly")
>
> Hmmm, not sure if this Fixes: tag makes sense.
>
> Really, if we delay TX completions by say 10 ms, other parts of the
> stack will misbehave anyway.
>
> Also, backporting this patch up to linux-3.19 is going to be tricky.
>
> The real issue here is that skb_still_in_host_queue() can give a false 
> positive.
>
> I have mixed feelings here, as you can read my answer :/
>
> Maybe skb_still_in_host_queue() signal should not be used when a part
> of the SKB has been received/acknowledged by the remote peer
> (in this case the SYN part).
>
> Alternative is that drivers unable to TX complete their skbs in a
> reasonable time should call skb_orphan()
>  to avoid skb_unclone() penalties (and this skb_still_in_host_queue() issue)
>
> If you really want to play and delay TX completions, maybe provide a
> way to disable skb_still_in_host_queue() globally,
> using a static key ?

The problem as I see it is that the original fclone isn't what we sent
out on the wire and that is confusing things. What we sent was a SYN
with data, but what we have now is just a data frame that hasn't been
put out on the wire yet.

I wonder if we couldn't get away with doing something like adding a
fourth option of SKB_FCLONE_MODIFIED that we could apply to fastopen
skbs? That would keep the skb_still_in_host queue from triggering as
we would be changing the state from SKB_FCLONE_ORIG to
SKB_FCLONE_MODIFIED for the skb we store in the retransmit queue. In
addition if we have to clone it again and the fclone reference count
is 1 we could reset it back to SKB_FCLONE_ORIG.
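
To make that concrete, roughly something like this (just a sketch of
the idea):

	enum {
		SKB_FCLONE_UNAVAILABLE,	/* skb has no fclone (from head_cache) */
		SKB_FCLONE_ORIG,	/* orig skb (from fclone_cache) */
		SKB_FCLONE_CLONE,	/* companion fclone skb (from fclone_cache) */
		SKB_FCLONE_MODIFIED,	/* orig skb whose flags/data no longer
					 * match what was handed to the driver
					 */
	};

skb_fclone_busy()/skb_still_in_host_queue() would then only report the
skb as busy for SKB_FCLONE_ORIG, and the fastopen code would flip the
state to SKB_FCLONE_MODIFIED when it repurposes the syn_data skb as
the first data frame.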


Re: [PATCH net] net: fix race between napi kthread mode and busy poll

2021-02-25 Thread Alexander Duyck
On Thu, Feb 25, 2021 at 5:20 PM Jakub Kicinski  wrote:
>
> On Thu, 25 Feb 2021 16:16:20 -0800 Wei Wang wrote:
> > On Thu, Feb 25, 2021 at 3:00 PM Jakub Kicinski  wrote:
> > > On Thu, 25 Feb 2021 10:29:47 -0800 Wei Wang wrote:
> > > > Hmm... I don't think the above patch would work. Consider a situation 
> > > > that:
> > > > 1. At first, the kthread is in sleep mode.
> > > > 2. Then someone calls napi_schedule() to schedule work on this napi.
> > > > So napi_schedule() is called. But at this moment, the kthread is
> > > > not yet in RUNNING state. So this function does not set SCHED_THREAD
> > > > bit.
> > > > 3. Then wake_up_process() is called to wake up the thread.
> > > > 4. Then napi_threaded_poll() calls napi_thread_wait().
> > >
> > > But how is the task not in running state outside of napi_thread_wait()?
> > >
> > > My scheduler knowledge is rudimentary, but AFAIU off CPU tasks which
> > > were not put to sleep are still in RUNNING state, so unless we set
> > > INTERRUPTIBLE the task will be running, even if it's stuck in 
> > > cond_resched().
> >
> > I think the thread is only in RUNNING state after wake_up_process() is
> > called on the thread in napi_schedule(). Before that, it should be
> > in INTERRUPTIBLE state. napi_thread_wait() explicitly calls
> > set_current_state(TASK_INTERRUPTIBLE) when it finishes 1 round of
> > polling.
>
> Are you concerned about it not being in RUNNING state after it's
> spawned but before it's first parked?
>
> > > > woken is false
> > > > and SCHED_THREAD bit is not set. So the kthread will go to sleep again
> > > > (in INTERRUPTIBLE mode) when schedule() is called, and waits to be
> > > > woken up by the next napi_schedule().
> > > > That will introduce arbitrary delay for the napi->poll() to be called.
> > > > Isn't it? Please enlighten me if I did not understand it correctly.
> > >
> > > Probably just me not understanding the scheduler :)
> > >
> > > > I personally prefer to directly set SCHED_THREAD bit in 
> > > > napi_schedule().
> > > > Or stick with SCHED_BUSY_POLL solution and replace kthread_run() with
> > > > kthread_create().
> > >
> > > Well, I'm fine with that too, no point arguing further if I'm not
> > > convincing anyone. But we need a fix which fixes the issue completely,
> > > not just one of three incarnations.
> >
> > Alexander and Eric,
> > Do you guys have preference on which approach to take?
> > If we keep the current SCHED_BUSY_POLL patch, I think we need to
> > change kthread_run() to kthread_create() to address the warning Martin
> > reported.
> > Or if we choose to set SCHED_THREADED, we could keep kthread_run().
> > But there is 1 extra set_bit() operation.
>
> To be clear extra set_bit() only if thread is running, which if IRQ
> coalescing works should be rather rare.

I was good with either approach. My preference would probably be to
use kthread_create() regardless, as it doesn't make much sense to have
the thread running until we really need it anyway.
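
i.e. in napi_kthread_create() something like the below, assuming the
current code is a straight kthread_run() call (argument list from
memory, adjust to what the function actually passes):

-	n->thread = kthread_run(napi_threaded_poll, n, "napi/%s-%d",
-				n->dev->name, n->napi_id);
+	n->thread = kthread_create(napi_threaded_poll, n, "napi/%s-%d",
+				   n->dev->name, n->napi_id);

The thread then simply isn't woken until the first napi_schedule().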


RE: [PATCH net] net: fix race between napi kthread mode and busy poll

2021-02-24 Thread Alexander Duyck



> -Original Message-
> From: Jakub Kicinski 
> Sent: Wednesday, February 24, 2021 4:21 PM
> To: Alexander Duyck 
> Cc: Eric Dumazet ; Wei Wang
> ; David S . Miller ; netdev
> ; Paolo Abeni ; Hannes
> Frederic Sowa ; Martin Zaharinov
> 
> Subject: Re: [PATCH net] net: fix race between napi kthread mode and busy
> poll
> 
> On Thu, 25 Feb 2021 00:11:34 + Alexander Duyck wrote:
> > > > We were trying to not pollute the list (with about 40 different
> > > > emails so far)
> > > >
> > > > (Note this was not something I initiated, I only hit Reply all
> > > > button)
> > > >
> > > > OK, I will shut up, since you seem to take over this matter, and
> > > > it is 1am here in France.
> > >
> > > Are you okay with adding a SCHED_THREADED bit for threaded NAPI to
> > > be set in addition to SCHED? At least that way the bit is associated with 
> > > it's
> user.
> > > IIUC since the extra clear_bit() in busy poll was okay so should be
> > > a new set_bit()?
> >
> > The problem with adding a bit for SCHED_THREADED is that you would
> > have to heavily modify napi_schedule_prep so that it would add the
> > bit. That is the reason for going with adding the bit to the busy poll
> > logic because it added no additional overhead. Adding another atomic
> > bit setting operation or heavily modifying the existing one would add
> > considerable overhead as it is either adding a complicated conditional
> > check to all NAPI calls, or adding an atomic operation to the path for
> > the threaded NAPI.
> 
> I wasn't thinking of modifying the main schedule logic, just the threaded
> parts:
> 
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index
> ddf4cfc12615..6953005d06af 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -360,6 +360,7 @@ enum {
> NAPI_STATE_IN_BUSY_POLL,/* sk_busy_loop() owns this NAPI */
> NAPI_STATE_PREFER_BUSY_POLL,/* prefer busy-polling over softirq
> processing*/
> NAPI_STATE_THREADED,/* The poll is performed inside its 
> own
> thread*/
> +   NAPI_STATE_SCHED_THREAD,/* Thread owns the NAPI and will poll
> */
>  };
> 
>  enum {
> diff --git a/net/core/dev.c b/net/core/dev.c index
> 6c5967e80132..23e53f971478 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4294,6 +4294,7 @@ static inline void napi_schedule(struct
> softnet_data *sd,
>  */
> thread = READ_ONCE(napi->thread);
> if (thread) {
> +   set_bit(NAPI_STATE_SCHED_THREAD, &napi->state);
> wake_up_process(thread);
> return;
> }
> @@ -6486,7 +6487,8 @@ bool napi_complete_done(struct napi_struct *n,
> int work_done)
> WARN_ON_ONCE(!(val & NAPIF_STATE_SCHED));
> 
> new = val & ~(NAPIF_STATE_MISSED | NAPIF_STATE_SCHED |
> - NAPIF_STATE_PREFER_BUSY_POLL);
> + NAPIF_STATE_PREFER_BUSY_POLL |
> + NAPI_STATE_SCHED_THREAD);
> 
> /* If STATE_MISSED was set, leave STATE_SCHED set,
>  * because we will call napi->poll() one more time.
> @@ -6971,7 +6973,9 @@ static int napi_thread_wait(struct napi_struct
> *napi)
> set_current_state(TASK_INTERRUPTIBLE);
> 
> while (!kthread_should_stop() && !napi_disable_pending(napi)) {
> -   if (test_bit(NAPI_STATE_SCHED, &napi->state)) {
> +   if (test_bit(NAPI_STATE_SCHED_THREAD, &napi->state)) {
> +   WARN_ON(!test_bit(test_bit(NAPI_STATE_SCHED,
> +  &napi->state)));
> WARN_ON(!list_empty(&napi->poll_list));
> __set_current_state(TASK_RUNNING);
> return 0;

Yeah, that was the patch Wei had done earlier. Eric complained about the extra 
set_bit atomic operation in the threaded path. That is when I came up with the 
idea of just adding a bit to the busy poll logic so that the only extra cost in 
the threaded path was having to check 2 bits instead of 1.
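
For reference, the shape of the check in napi_thread_wait() with that
approach is roughly the following (the SCHED_BUSY_POLL bit name is
taken from Wei's patch, the rest is from memory):

	while (!kthread_should_stop() && !napi_disable_pending(napi)) {
		unsigned long val = READ_ONCE(napi->state);

		/* We own the NAPI only if SCHED is set and busy poll has
		 * not taken it over, so two bit tests here instead of an
		 * extra set_bit() in the wakeup path.
		 */
		if ((val & NAPIF_STATE_SCHED) &&
		    !(val & NAPIF_STATE_SCHED_BUSY_POLL)) {
			WARN_ON(!list_empty(&napi->poll_list));
			__set_current_state(TASK_RUNNING);
			return 0;
		}

		schedule();
		set_current_state(TASK_INTERRUPTIBLE);
	}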


RE: [PATCH net] net: fix race between napi kthread mode and busy poll

2021-02-24 Thread Alexander Duyck



> -Original Message-
> From: Jakub Kicinski 
> Sent: Wednesday, February 24, 2021 4:07 PM
> To: Eric Dumazet 
> Cc: Wei Wang ; David S . Miller
> ; netdev ; Paolo Abeni
> ; Hannes Frederic Sowa
> ; Alexander Duyck
> ; Martin Zaharinov 
> Subject: Re: [PATCH net] net: fix race between napi kthread mode and busy
> poll
> 
> On Thu, 25 Feb 2021 00:59:25 +0100 Eric Dumazet wrote:
> > On Thu, Feb 25, 2021 at 12:52 AM Jakub Kicinski  wrote:
> > > Interesting, vger seems to be CCed but it isn't appearing on the ML.
> > > Perhaps just a vger delay :S
> > >
> > > Not really upsetting. I'm just trying to share what I learned
> > > devising more advanced pollers. The bits get really messy really quickly.
> > > Especially that the proposed fix adds a bit for a poor bystander
> > > (busy
> > > poll) while it's the threaded IRQ that is incorrectly not preserving
> > > its ownership.
> > >
> > > > Additional 16 bytes here, possibly in a shared cache line, [1] I
> > > > prefer using a bit in hot n->state, we have plenty of them available.
> > >
> > > Right, presumably the location of the new member could be optimized.
> > > I typed this proposal up in a couple of minutes.
> > >
> > > > We worked hours with Alexander, Wei, I am sorry you think we did a
> poor job.
> > > > I really thought we instead solved the issue at hand.
> > > >
> > > > May I suggest you defer your idea of redesigning the NAPI model
> > > > for net-next ?
> > >
> > > Seems like you decided on this solution off list and now the fact
> > > that there is a discussion on the list is upsetting you. May I
> > > suggest that discussions should be conducted on list to avoid such
> situations?
> >
> > We were trying to not pollute the list (with about 40 different emails
> > so far)
> >
> > (Note this was not something I initiated, I only hit Reply all button)
> >
> > OK, I will shut up, since you seem to take over this matter, and it is
> > 1am here in France.
> 
> Are you okay with adding a SCHED_THREADED bit for threaded NAPI to be
> set in addition to SCHED? At least that way the bit is associated with it's 
> user.
> IIUC since the extra clear_bit() in busy poll was okay so should be a new
> set_bit()?

The problem with adding a bit for SCHED_THREADED is that you would have to 
heavily modify napi_schedule_prep so that it would add the bit. That is the 
reason for going with adding the bit to the busy poll logic because it added no 
additional overhead. Adding another atomic bit setting operation or heavily 
modifying the existing one would add considerable overhead as it is either 
adding a complicated conditional check to all NAPI calls, or adding an atomic 
operation to the path for the threaded NAPI.


Re: [PATCH v6 net-next 00/11] skbuff: introduce skbuff_heads bulking and reusing

2021-02-13 Thread Alexander Duyck
On Sat, Feb 13, 2021 at 6:10 AM Alexander Lobakin  wrote:
>
> Currently, all sorts of skb allocation always do allocate
> skbuff_heads one by one via kmem_cache_alloc().
> On the other hand, we have percpu napi_alloc_cache to store
> skbuff_heads queued up for freeing and flush them by bulks.
>
> We can use this cache not only for bulk-wiping, but also to obtain
> heads for new skbs and avoid unconditional allocations, as well as
> for bulk-allocating (like XDP's cpumap code and veth driver already
> do).
>
> As this might affect latencies, cache pressure and lots of hardware
> and driver-dependent stuff, this new feature is mostly optional and
> can be issued via:
>  - a new napi_build_skb() function (as a replacement for build_skb());
>  - existing {,__}napi_alloc_skb() and napi_get_frags() functions;
>  - __alloc_skb() with passing SKB_ALLOC_NAPI in flags.
>
> iperf3 showed 35-70 Mbps bumps for both TCP and UDP while performing
> VLAN NAT on 1.2 GHz MIPS board. The boost is likely to be bigger
> on more powerful hosts and NICs with tens of Mpps.
>
> Note on skbuff_heads from distant slabs or pfmemalloc'ed slabs:
>  - kmalloc()/kmem_cache_alloc() itself allows by default allocating
>memory from the remote nodes to defragment their slabs. This is
>controlled by sysctl, but according to this, skbuff_head from a
>remote node is an OK case;
>  - The easiest way to check if the slab of skbuff_head is remote or
>pfmemalloc'ed is:
>
> if (!dev_page_is_reusable(virt_to_head_page(skb)))
> /* drop it */;
>
>...*but*, regarding that most slabs are built of compound pages,
>virt_to_head_page() will hit unlikely-branch every single call.
>This check costed at least 20 Mbps in test scenarios and seems
>like it'd be better to _not_ do this.



> Alexander Lobakin (11):
>   skbuff: move __alloc_skb() next to the other skb allocation functions
>   skbuff: simplify kmalloc_reserve()
>   skbuff: make __build_skb_around() return void
>   skbuff: simplify __alloc_skb() a bit
>   skbuff: use __build_skb_around() in __alloc_skb()
>   skbuff: remove __kfree_skb_flush()
>   skbuff: move NAPI cache declarations upper in the file
>   skbuff: introduce {,__}napi_build_skb() which reuses NAPI cache heads
>   skbuff: allow to optionally use NAPI cache from __alloc_skb()
>   skbuff: allow to use NAPI cache from __napi_alloc_skb()
>   skbuff: queue NAPI_MERGED_FREE skbs into NAPI cache instead of freeing
>
>  include/linux/skbuff.h |   4 +-
>  net/core/dev.c |  16 +-
>  net/core/skbuff.c  | 428 +++--
>  3 files changed, 242 insertions(+), 206 deletions(-)
>

With the last few changes and testing to verify the need to drop the
cache clearing this patch set looks good to me.

Reviewed-by: Alexander Duyck 


Re: [PATCH net-next 02/11] i40e: drop misleading function comments

2021-02-12 Thread Alexander Duyck
On Fri, Feb 12, 2021 at 2:46 PM Tony Nguyen  wrote:
>
> From: Maciej Fijalkowski 
>
> i40e_cleanup_headers has a statement about check against skb being
> linear or not which is not relevant anymore, so let's remove it.
>
> Same case for i40e_can_reuse_rx_page, it references things that are not
> present there anymore.
>
> Reviewed-by: Björn Töpel 
> Signed-off-by: Maciej Fijalkowski 
> Tested-by: Tony Brelinski 
> Signed-off-by: Tony Nguyen 
> ---
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c | 33 -
>  1 file changed, 6 insertions(+), 27 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c 
> b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> index 3d24c6032616..5f6aa13e85ca 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> @@ -1963,9 +1963,6 @@ void i40e_process_skb_fields(struct i40e_ring *rx_ring,
>   * @skb: pointer to current skb being fixed
>   * @rx_desc: pointer to the EOP Rx descriptor
>   *
> - * Also address the case where we are pulling data in on pages only
> - * and as such no data is present in the skb header.
> - *
>   * In addition if skb is not at least 60 bytes we need to pad it so that
>   * it is large enough to qualify as a valid Ethernet frame.
>   *
> @@ -1998,33 +1995,15 @@ static bool i40e_cleanup_headers(struct i40e_ring 
> *rx_ring, struct sk_buff *skb,
>  }
>
>  /**
> - * i40e_can_reuse_rx_page - Determine if this page can be reused by
> - * the adapter for another receive
> - *
> + * i40e_can_reuse_rx_page - Determine if page can be reused for another Rx
>   * @rx_buffer: buffer containing the page
>   * @rx_buffer_pgcnt: buffer page refcount pre xdp_do_redirect() call
>   *
> - * If page is reusable, rx_buffer->page_offset is adjusted to point to
> - * an unused region in the page.
> - *
> - * For small pages, @truesize will be a constant value, half the size
> - * of the memory at page.  We'll attempt to alternate between high and
> - * low halves of the page, with one half ready for use by the hardware
> - * and the other half being consumed by the stack.  We use the page
> - * ref count to determine whether the stack has finished consuming the
> - * portion of this page that was passed up with a previous packet.  If
> - * the page ref count is >1, we'll assume the "other" half page is
> - * still busy, and this page cannot be reused.
> - *
> - * For larger pages, @truesize will be the actual space used by the
> - * received packet (adjusted upward to an even multiple of the cache
> - * line size).  This will advance through the page by the amount
> - * actually consumed by the received packets while there is still
> - * space for a buffer.  Each region of larger pages will be used at
> - * most once, after which the page will not be reused.
> - *
> - * In either case, if the page is reusable its refcount is increased.
> - **/
> + * If page is reusable, we have a green light for calling i40e_reuse_rx_page,
> + * which will assign the current buffer to the buffer that next_to_alloc is
> + * pointing to; otherwise, the DMA mapping needs to be destroyed and
> + * page freed
> + */
>  static bool i40e_can_reuse_rx_page(struct i40e_rx_buffer *rx_buffer,
>int rx_buffer_pgcnt)
>  {

So this lost all of the context for why and how the function works.

You should probably call out that for 4K pages it uses a simple page
count check, where the page cannot be reused once the count hits 2,
and that for pages bigger than 4K it checks the remaining unused space
in the buffer to determine whether reuse will fail or not.
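
Something along these lines would keep the important bits (the wording
is just a suggestion):

 * If page is reusable, we have a green light for calling
 * i40e_reuse_rx_page, which will assign the current buffer to the
 * buffer that next_to_alloc is pointing to; otherwise, the DMA
 * mapping needs to be destroyed and page freed.
 *
 * For 4K pages this is a simple page count check: if the count hits 2
 * the stack still holds a reference and the page cannot be reused.
 * For pages larger than 4K we instead check whether there is enough
 * unused space left in the page for another buffer.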


Re: [PATCH v3 net-next 4/5] net: ipa: introduce ipa_table_hash_support()

2021-02-12 Thread Alexander Duyck
On Fri, Feb 12, 2021 at 6:40 AM Alex Elder  wrote:
>
> Introduce a new function to abstract the knowledge of whether hashed
> routing and filter tables are supported for a given IPA instance.
>
> IPA v4.2 is the only one that doesn't support hashed tables (now
> and for the foreseeable future), but the name of the helper function
> is better for explaining what's going on.
>
> Signed-off-by: Alex Elder 
> ---
> v2: - Update copyrights.
>
>  drivers/net/ipa/ipa_cmd.c   |  2 +-
>  drivers/net/ipa/ipa_table.c | 16 +---
>  drivers/net/ipa/ipa_table.h |  8 +++-
>  3 files changed, 17 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/net/ipa/ipa_cmd.c b/drivers/net/ipa/ipa_cmd.c
> index fd8bf6468d313..35e35852c25c5 100644
> --- a/drivers/net/ipa/ipa_cmd.c
> +++ b/drivers/net/ipa/ipa_cmd.c
> @@ -268,7 +268,7 @@ static bool ipa_cmd_register_write_valid(struct ipa *ipa)
> /* If hashed tables are supported, ensure the hash flush register
>  * offset will fit in a register write IPA immediate command.
>  */
> -   if (ipa->version != IPA_VERSION_4_2) {
> +   if (ipa_table_hash_support(ipa)) {
> offset = ipa_reg_filt_rout_hash_flush_offset(ipa->version);
> name = "filter/route hash flush";
> if (!ipa_cmd_register_write_offset_valid(ipa, name, offset))
> diff --git a/drivers/net/ipa/ipa_table.c b/drivers/net/ipa/ipa_table.c
> index 32e2d3e052d55..baaab3dd0e63c 100644
> --- a/drivers/net/ipa/ipa_table.c
> +++ b/drivers/net/ipa/ipa_table.c
> @@ -1,7 +1,7 @@
>  // SPDX-License-Identifier: GPL-2.0
>
>  /* Copyright (c) 2012-2018, The Linux Foundation. All rights reserved.
> - * Copyright (C) 2018-2020 Linaro Ltd.
> + * Copyright (C) 2018-2021 Linaro Ltd.
>   */
>
>  #include 
> @@ -239,6 +239,11 @@ static void ipa_table_validate_build(void)
>
>  #endif /* !IPA_VALIDATE */
>
> +bool ipa_table_hash_support(struct ipa *ipa)
> +{
> +   return ipa->version != IPA_VERSION_4_2;
> +}
> +

Since this is only a single comparison it might make more sense to
make this a static inline and place it in ipa.h. Otherwise you are
just bloating the code up to jump to such a small function.

>  /* Zero entry count means no table, so just return a 0 address */
>  static dma_addr_t ipa_table_addr(struct ipa *ipa, bool filter_mask, u16 
> count)
>  {
> @@ -412,8 +417,7 @@ int ipa_table_hash_flush(struct ipa *ipa)
> struct gsi_trans *trans;
> u32 val;
>
> -   /* IPA version 4.2 does not support hashed tables */
> -   if (ipa->version == IPA_VERSION_4_2)
> +   if (!ipa_table_hash_support(ipa))
> return 0;
>
> trans = ipa_cmd_trans_alloc(ipa, 1);
> @@ -531,8 +535,7 @@ static void ipa_filter_config(struct ipa *ipa, bool modem)
> enum gsi_ee_id ee_id = modem ? GSI_EE_MODEM : GSI_EE_AP;
> u32 ep_mask = ipa->filter_map;
>
> -   /* IPA version 4.2 has no hashed route tables */
> -   if (ipa->version == IPA_VERSION_4_2)
> +   if (!ipa_table_hash_support(ipa))
> return;
>
> while (ep_mask) {
> @@ -582,8 +585,7 @@ static void ipa_route_config(struct ipa *ipa, bool modem)
>  {
> u32 route_id;
>
> -   /* IPA version 4.2 has no hashed route tables */
> -   if (ipa->version == IPA_VERSION_4_2)
> +   if (!ipa_table_hash_support(ipa))
> return;
>
> for (route_id = 0; route_id < IPA_ROUTE_COUNT_MAX; route_id++)
> diff --git a/drivers/net/ipa/ipa_table.h b/drivers/net/ipa/ipa_table.h
> index 78038d14fcea9..1a68d20f19d6a 100644
> --- a/drivers/net/ipa/ipa_table.h
> +++ b/drivers/net/ipa/ipa_table.h
> @@ -1,7 +1,7 @@
>  /* SPDX-License-Identifier: GPL-2.0 */
>
>  /* Copyright (c) 2012-2018, The Linux Foundation. All rights reserved.
> - * Copyright (C) 2019-2020 Linaro Ltd.
> + * Copyright (C) 2019-2021 Linaro Ltd.
>   */
>  #ifndef _IPA_TABLE_H_
>  #define _IPA_TABLE_H_
> @@ -51,6 +51,12 @@ static inline bool ipa_filter_map_valid(struct ipa *ipa, 
> u32 filter_mask)
>
>  #endif /* !IPA_VALIDATE */
>
> +/**
> + * ipa_table_hash_support() - Return true if hashed tables are supported
> + * @ipa:   IPA pointer
> + */
> +bool ipa_table_hash_support(struct ipa *ipa);
> +
>  /**
>   * ipa_table_reset() - Reset filter and route tables entries to "none"
>   * @ipa:   IPA pointer

Just define the function here and make it a static inline.
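
i.e. drop the prototype and just do this in the header:

/**
 * ipa_table_hash_support() - Return true if hashed tables are supported
 * @ipa:	IPA pointer
 */
static inline bool ipa_table_hash_support(struct ipa *ipa)
{
	return ipa->version != IPA_VERSION_4_2;
}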


Re: [PATCH v5 net-next 06/11] skbuff: remove __kfree_skb_flush()

2021-02-11 Thread Alexander Duyck
On Thu, Feb 11, 2021 at 10:57 AM Alexander Lobakin  wrote:
>
> This function isn't much needed as NAPI skb queue gets bulk-freed
> anyway when there's no more room, and even may reduce the efficiency
> of bulk operations.
> It will be even less needed after reusing skb cache on allocation path,
> so remove it and this way lighten network softirqs a bit.
>
> Suggested-by: Eric Dumazet 
> Signed-off-by: Alexander Lobakin 

I'm wondering if you have any actual gains to show from this patch?

The reason why I ask is because the flushing was happening at the end
of the softirq before the system basically gave control back over to
something else. As such there is a good chance for the memory to be
dropped from the cache by the time we come back to it. So it may be
just as expensive if not more so than accessing memory that was just
freed elsewhere and placed in the slab cache.

> ---
>  include/linux/skbuff.h |  1 -
>  net/core/dev.c |  7 +--
>  net/core/skbuff.c  | 12 
>  3 files changed, 1 insertion(+), 19 deletions(-)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 0a4e91a2f873..0e0707296098 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -2919,7 +2919,6 @@ static inline struct sk_buff *napi_alloc_skb(struct 
> napi_struct *napi,
>  }
>  void napi_consume_skb(struct sk_buff *skb, int budget);
>
> -void __kfree_skb_flush(void);
>  void __kfree_skb_defer(struct sk_buff *skb);
>
>  /**
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 321d41a110e7..4154d4683bb9 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4944,8 +4944,6 @@ static __latent_entropy void net_tx_action(struct 
> softirq_action *h)
> else
> __kfree_skb_defer(skb);
> }
> -
> -   __kfree_skb_flush();
> }
>
> if (sd->output_queue) {
> @@ -7012,7 +7010,6 @@ static int napi_threaded_poll(void *data)
> __napi_poll(napi, &repoll);
> netpoll_poll_unlock(have);
>
> -   __kfree_skb_flush();
> local_bh_enable();
>
> if (!repoll)

So it looks like this is the one exception to my comment above. Here
we should probably be adding an "if (!repoll)" check before calling
__kfree_skb_flush().
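
Something like this instead of dropping the call entirely:

-		__kfree_skb_flush();
+		if (!repoll)
+			__kfree_skb_flush();
 		local_bh_enable();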

> @@ -7042,7 +7039,7 @@ static __latent_entropy void net_rx_action(struct 
> softirq_action *h)
>
> if (list_empty(&list)) {
> if (!sd_has_rps_ipi_waiting(sd) && 
> list_empty(&repoll))
> -   goto out;
> +   return;
> break;
> }
>
> @@ -7069,8 +7066,6 @@ static __latent_entropy void net_rx_action(struct 
> softirq_action *h)
> __raise_softirq_irqoff(NET_RX_SOFTIRQ);
>
> net_rps_action_and_irq_enable(sd);
> -out:
> -   __kfree_skb_flush();
>  }
>
>  struct netdev_adjacent {
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 1c6f6ef70339..4be2bb969535 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -838,18 +838,6 @@ void __consume_stateless_skb(struct sk_buff *skb)
> kfree_skbmem(skb);
>  }
>
> -void __kfree_skb_flush(void)
> -{
> -   struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
> -
> -   /* flush skb_cache if containing objects */
> -   if (nc->skb_count) {
> -   kmem_cache_free_bulk(skbuff_head_cache, nc->skb_count,
> -nc->skb_cache);
> -   nc->skb_count = 0;
> -   }
> -}
> -
>  static inline void _kfree_skb_defer(struct sk_buff *skb)
>  {
> struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
> --
> 2.30.1
>
>


Re: [PATCH v5 net-next 09/11] skbuff: allow to optionally use NAPI cache from __alloc_skb()

2021-02-11 Thread Alexander Duyck
On Thu, Feb 11, 2021 at 11:00 AM Alexander Lobakin  wrote:
>
> Reuse the old and forgotten SKB_ALLOC_NAPI to add an option to get
> an skbuff_head from the NAPI cache instead of inplace allocation
> inside __alloc_skb().
> This implies that the function is called from softirq or BH-off
> context, not for allocating a clone or from a distant node.
>
> Signed-off-by: Alexander Lobakin 
> ---
>  net/core/skbuff.c | 13 +
>  1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 9e1a8ded4acc..a0b457ae87c2 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -397,15 +397,20 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t 
> gfp_mask,
> struct sk_buff *skb;
> u8 *data;
> bool pfmemalloc;
> +   bool clone;
>
> -   cache = (flags & SKB_ALLOC_FCLONE)
> -   ? skbuff_fclone_cache : skbuff_head_cache;
> +   clone = !!(flags & SKB_ALLOC_FCLONE);

The boolean conversion here is probably unnecessary. I would make
clone an int like flags and work with that. I suspect the compiler is
doing it already, but it is better to be explicit.

> +   cache = clone ? skbuff_fclone_cache : skbuff_head_cache;
>
> if (sk_memalloc_socks() && (flags & SKB_ALLOC_RX))
> gfp_mask |= __GFP_MEMALLOC;
>
> /* Get the HEAD */
> -   skb = kmem_cache_alloc_node(cache, gfp_mask & ~__GFP_DMA, node);
> +   if ((flags & SKB_ALLOC_NAPI) && !clone &&

Rather than having to do two checks you could test for SKB_ALLOC_NAPI
and SKB_ALLOC_FCLONE in a single check, something like:
if ((flags & (SKB_ALLOC_FCLONE | SKB_ALLOC_NAPI)) == SKB_ALLOC_NAPI)

That way you can avoid the extra conditional jumps and can start
computing the flags value sooner.
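
Folded into the hunk above that would look roughly like (untested):

	int clone = flags & SKB_ALLOC_FCLONE;
	...
	if ((flags & (SKB_ALLOC_FCLONE | SKB_ALLOC_NAPI)) == SKB_ALLOC_NAPI &&
	    likely(node == NUMA_NO_NODE || node == numa_mem_id()))
		skb = napi_skb_cache_get();
	else
		skb = kmem_cache_alloc_node(cache, gfp_mask & ~__GFP_DMA, node);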

> +   likely(node == NUMA_NO_NODE || node == numa_mem_id()))
> +   skb = napi_skb_cache_get();
> +   else
> +   skb = kmem_cache_alloc_node(cache, gfp_mask & ~GFP_DMA, node);
> if (unlikely(!skb))
> return NULL;
> prefetchw(skb);
> @@ -436,7 +441,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t 
> gfp_mask,
> __build_skb_around(skb, data, 0);
> skb->pfmemalloc = pfmemalloc;
>
> -   if (flags & SKB_ALLOC_FCLONE) {
> +   if (clone) {
> struct sk_buff_fclones *fclones;
>
> fclones = container_of(skb, struct sk_buff_fclones, skb1);
> --
> 2.30.1
>
>


Re: [PATCH net-next v2 0/3] bonding: 3ad: support for 200G/400G ports and more verbose warning

2021-02-10 Thread Alexander Duyck
On Wed, Feb 10, 2021 at 12:43 PM Nikolay Aleksandrov
 wrote:
>
> From: Nikolay Aleksandrov 
>
> Hi,
> We'd like to have proper 200G and 400G support with 3ad bond mode, so we
> need to add new definitions for them in order to have separate oper keys,
> aggregated bandwidth and proper operation (patches 01 and 02). In
> patch 03 Ido changes the code to use pr_err_once instead of
> pr_warn_once which would help future detection of unsupported speeds.
>
> v2: patch 03: use pr_err_once instead of WARN_ONCE
>
> Thanks,
>  Nik
>
> Ido Schimmel (1):
>   bonding: 3ad: Print an error for unknown speeds
>
> Nikolay Aleksandrov (2):
>   bonding: 3ad: add support for 200G speed
>   bonding: 3ad: add support for 400G speed
>
>  drivers/net/bonding/bond_3ad.c | 26 ++
>  1 file changed, 22 insertions(+), 4 deletions(-)
>

With this update the series looks good to me.

Reviewed-by: Alexander Duyck 


Re: [PATCH net-next 3/3] bonding: 3ad: Use a more verbose warning for unknown speeds

2021-02-10 Thread Alexander Duyck
On Tue, Feb 9, 2021 at 2:42 AM Nikolay Aleksandrov  wrote:
>
> From: Ido Schimmel 
>
> The bond driver needs to be patched to support new ethtool speeds.
> Currently it emits a single warning [1] when it encounters an unknown
> speed. As evident by the two previous patches, this is not explicit
> enough. Instead, use WARN_ONCE() to get a more verbose warning [2].
>
> [1]
> bond10: (slave swp1): unknown ethtool speed (20) for port 1 (set it to 0)
>
> [2]
> bond20: (slave swp2): unknown ethtool speed (40) for port 1 (set it to 0)
> WARNING: CPU: 5 PID: 96 at drivers/net/bonding/bond_3ad.c:317 
> __get_link_speed.isra.0+0x110/0x120
> Modules linked in:
> CPU: 5 PID: 96 Comm: kworker/u16:5 Not tainted 
> 5.11.0-rc6-custom-02818-g69a767ec7302 #3243
> Hardware name: Mellanox Technologies Ltd. MSN4700/VMOD0010, BIOS 5.11 
> 01/06/2019
> Workqueue: bond20 bond_mii_monitor
> RIP: 0010:__get_link_speed.isra.0+0x110/0x120
> Code: 5b ff ff ff 52 4c 8b 4e 08 44 0f b7 c7 48 c7 c7 18 46 4a b8 48 8b 16 c6 
> 05 d9 76 41 01 01 49 8b 31 89 44 24 04 e8 a2 8a 3f 00 <0f> 0b 8b 44 24 04 59 
> c3 0
> f 1f 84 00 00 00 00 00 48 85 ff 74 3b 53
> RSP: 0018:b683c03afde0 EFLAGS: 00010282
> RAX:  RBX: 96bd3f2a9a38 RCX: 
> RDX: 96c06fd67560 RSI: 96c06fd57850 RDI: 96c06fd57850
> RBP:  R08: b8b49888 R09: 9ffb
> R10: e000 R11: 3fff R12: 
> R13: 96bd3f2a9a38 R14: 96bd49c56400 R15: 96bd49c564f0
> FS:  () GS:96c06fd4() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 7f327ad804b0 CR3: 000142ad5006 CR4: 003706e0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> Call Trace:
>  ad_update_actor_keys+0x36/0xc0
>  bond_3ad_handle_link_change+0x5d/0xf0
>  bond_mii_monitor.cold+0x1c2/0x1e8
>  process_one_work+0x1c9/0x360
>  worker_thread+0x48/0x3c0
>  kthread+0x113/0x130
>  ret_from_fork+0x1f/0x30
>
> Signed-off-by: Ido Schimmel 

I'm not sure making the warning consume more text is really going to
solve the problem. I was actually much happier with just the first
message, as I don't need a stack trace. Having that one line is enough
information to search for and find the cause of the issue. Adding a
backtrace is just overkill.

If we really think this is important, maybe we should promote it to an
error instead of a warning. For example, why not make this use
pr_err_once() instead of pr_warn_once()? That should make it more
likely to be highlighted in the system log.
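
i.e. keep the one-line message from [1] exactly as it is and just bump
the printk level, roughly (assuming the message currently comes from a
pr_warn_once(); adjust to whatever helper bond_3ad.c actually uses):

-	pr_warn_once(... "unknown ethtool speed (%d) for port %d (set it to 0)\n", ...);
+	pr_err_once(... "unknown ethtool speed (%d) for port %d (set it to 0)\n", ...);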


Re: [PATCH net-next v2] net: octeontx2: Fix the confusion in buffer alloc failure path

2021-02-10 Thread Alexander Duyck
On Tue, Feb 9, 2021 at 2:23 AM Kevin Hao  wrote:
>
> Pavel pointed that the return of dma_addr_t in
> otx2_alloc_rbuf/__otx2_alloc_rbuf() seem suspicious because a negative
> error code may be returned in some cases. For a dma_addr_t, the error
> code such as -ENOMEM does seem a valid value, so we can't judge if the
> buffer allocation fail or not based on that value. Add a parameter for
> otx2_alloc_rbuf/__otx2_alloc_rbuf() to store the dma address and make
> the return value to indicate if the buffer allocation really fail or
> not.
>
> Reported-by: Pavel Machek 
> Signed-off-by: Kevin Hao 
> Tested-by: Subbaraya Sundeep 

Actually, in most cases -ENOMEM wouldn't be a valid DMA address
anyway. Interpreted as a dma_addr_t it sits only 12 bytes below the
point where the address space wraps, so there isn't enough room left
there to store anything useful. That is the same reason ~0 is used as
the DMA_MAPPING_ERROR value: there is only enough space to possibly
store a single byte before the address overflows.

I wonder if it wouldn't make sense to look at coming up with a set of
macros to convert the error values into a dma_addr_t value and to test
for those errors being present similar to what we already have for
pointers. It should work for most cases as I think the error values
are only up to something like -133 and I don't think we have too many
cases where something like an Rx buffer will be that small.

Anyway that is future work for another time.

The code itself looks fine.

Reviewed-by: Alexander Duyck 


Re: [PATCH mlx5-next v6 0/4] Dynamically assign MSI-X vectors count

2021-02-09 Thread Alexander Duyck
On Tue, Feb 9, 2021 at 5:34 AM Leon Romanovsky  wrote:
>
> From: Leon Romanovsky 



> 
> Hi,
>
> The number of MSI-X vectors is PCI property visible through lspci, that
> field is read-only and configured by the device.
>
> The static assignment of an amount of MSI-X vectors doesn't allow utilize
> the newly created VF because it is not known to the device the future load
> and configuration where that VF will be used.
>
> The VFs are created on the hypervisor and forwarded to the VMs that have
> different properties (for example number of CPUs).
>
> To overcome the inefficiency in the spread of such MSI-X vectors, we
> allow the kernel to instruct the device with the needed number of such
> vectors, before VF is initialized and bounded to the driver.
>
> Before this series:
> [root@server ~]# lspci -vs :08:00.2
> 08:00.2 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5 
> Virtual Function]
> 
> Capabilities: [9c] MSI-X: Enable- Count=12 Masked-
>
> Configuration script:
> 1. Start fresh
> echo 0 > /sys/bus/pci/devices/\:08\:00.0/sriov_numvfs
> modprobe -q -r mlx5_ib mlx5_core
> 2. Ensure that driver doesn't run and it is safe to change MSI-X
> echo 0 > /sys/bus/pci/devices/\:08\:00.0/sriov_drivers_autoprobe
> 3. Load driver for the PF
> modprobe mlx5_core
> 4. Configure one of the VFs with new number
> echo 2 > /sys/bus/pci/devices/\:08\:00.0/sriov_numvfs
> echo 21 > /sys/bus/pci/devices/\:08\:00.2/sriov_vf_msix_count
>
> After this series:
> [root@server ~]# lspci -vs :08:00.2
> 08:00.2 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5 
> Virtual Function]
> 
> Capabilities: [9c] MSI-X: Enable- Count=21 Masked-
>
> Thanks
>
> Leon Romanovsky (4):
>   PCI: Add sysfs callback to allow MSI-X table size change of SR-IOV VFs
>   net/mlx5: Add dynamic MSI-X capabilities bits
>   net/mlx5: Dynamically assign MSI-X vectors count
>   net/mlx5: Allow to the users to configure number of MSI-X vectors
>
>  Documentation/ABI/testing/sysfs-bus-pci   |  28 
>  .../net/ethernet/mellanox/mlx5/core/main.c|  17 ++
>  .../ethernet/mellanox/mlx5/core/mlx5_core.h   |  27 
>  .../net/ethernet/mellanox/mlx5/core/pci_irq.c |  72 +
>  .../net/ethernet/mellanox/mlx5/core/sriov.c   |  58 ++-
>  drivers/pci/iov.c | 153 ++
>  include/linux/mlx5/mlx5_ifc.h |  11 +-
>  include/linux/pci.h   |  12 ++
>  8 files changed, 375 insertions(+), 3 deletions(-)
>

This seems much improved from the last time I reviewed the patch set.
I am good with the drop of the folder in favor of using "sriov" in the
naming of the fields.

For the series:
Reviewed-by: Alexander Duyck 


Re: [PATCH net-next 00/12][pull request] 100GbE Intel Wired LAN Driver Updates 2021-02-08

2021-02-09 Thread Alexander Duyck
On Mon, Feb 8, 2021 at 5:19 PM Tony Nguyen  wrote:
>
> This series contains updates to the ice driver and documentation.
>
> Brett adds a log message when a trusted VF goes in and out of promiscuous
> for consistency with i40e driver.
>
> Dave implements a new LLDP command that allows adding VSI destinations to
> existing filters and adds support for netdev bonding events, current
> support is software based.
>
> Michal refactors code to move from VSI stored xsk_buff_pools to
> netdev-provided ones.
>
> Kiran implements the creation scheduler aggregator nodes and distributing
> VSIs within the nodes.
>
> Ben modifies rate limit calculations to use clock frequency from the
> hardware instead of using a hardcoded one.
>
> Jesse adds support for user to control writeback frequency.
>
> Chinh refactors DCB variables out of the ice_port_info struct.
>
> Bruce removes some unnecessary casting.
>
> Mitch fixes an error message that was reported as if_up instead of if_down.
>
> Tony adjusts fallback allocation for MSI-X to use all given vectors instead
> of using only the minimum configuration and updates documentation for
> the ice driver.
>
> The following are changes since commit 
> 08cbabb77e9098ec6c4a35911effac53e943c331:
>   Merge tag 'mlx5-updates-2021-02-04' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
> and are available in the git repository at:
>   git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue 100GbE
>
> Ben Shelton (1):
>   ice: Use PSM clock frequency to calculate RL profiles
>
> Brett Creeley (1):
>   ice: log message when trusted VF goes in/out of promisc mode
>
> Bruce Allan (1):
>   ice: remove unnecessary casts
>
> Chinh T Cao (1):
>   ice: Refactor DCB related variables out of the ice_port_info struct
>
> Dave Ertman (2):
>   ice: implement new LLDP filter command
>   ice: Add initial support framework for LAG
>
> Jesse Brandeburg (1):
>   ice: fix writeback enable logic
>
> Kiran Patil (1):
>   ice: create scheduler aggregator node config and move VSIs
>
> Michal Swiatkowski (1):
>   ice: Remove xsk_buff_pool from VSI structure
>
> Mitch Williams (1):
>   ice: Fix trivial error message
>
> Tony Nguyen (2):
>   ice: Improve MSI-X fallback logic
>   Documentation: ice: update documentation
>
>  .../device_drivers/ethernet/intel/ice.rst | 1027 -
>  drivers/net/ethernet/intel/ice/Makefile   |1 +
>  drivers/net/ethernet/intel/ice/ice.h  |   52 +-
>  .../net/ethernet/intel/ice/ice_adminq_cmd.h   |   25 +
>  drivers/net/ethernet/intel/ice/ice_common.c   |   58 +-
>  drivers/net/ethernet/intel/ice/ice_common.h   |3 +
>  drivers/net/ethernet/intel/ice/ice_controlq.c |4 +-
>  drivers/net/ethernet/intel/ice/ice_dcb.c  |   40 +-
>  drivers/net/ethernet/intel/ice/ice_dcb_lib.c  |   47 +-
>  drivers/net/ethernet/intel/ice/ice_dcb_nl.c   |   50 +-
>  drivers/net/ethernet/intel/ice/ice_ethtool.c  |   14 +-
>  .../net/ethernet/intel/ice/ice_flex_pipe.c|   10 +-
>  .../net/ethernet/intel/ice/ice_hw_autogen.h   |3 +
>  drivers/net/ethernet/intel/ice/ice_lag.c  |  445 ++
>  drivers/net/ethernet/intel/ice/ice_lag.h  |   87 ++
>  drivers/net/ethernet/intel/ice/ice_lib.c  |  142 +-
>  drivers/net/ethernet/intel/ice/ice_main.c |   87 +-
>  drivers/net/ethernet/intel/ice/ice_sched.c| 1283 +++--
>  drivers/net/ethernet/intel/ice/ice_sched.h|   24 +-
>  drivers/net/ethernet/intel/ice/ice_switch.c   |2 +-
>  drivers/net/ethernet/intel/ice/ice_txrx.c |   61 +-
>  drivers/net/ethernet/intel/ice/ice_txrx.h |1 -
>  drivers/net/ethernet/intel/ice/ice_type.h |   27 +-
>  .../net/ethernet/intel/ice/ice_virtchnl_pf.c  |   72 +-
>  drivers/net/ethernet/intel/ice/ice_xsk.c  |   71 +-
>  25 files changed, 3234 insertions(+), 402 deletions(-)
>  create mode 100644 drivers/net/ethernet/intel/ice/ice_lag.c
>  create mode 100644 drivers/net/ethernet/intel/ice/ice_lag.h
>

I looked over the patch set and it seems good to me.

Reviewed-by: Alexander Duyck 


Re: [PATCH net-next v2] cxgb4: collect serial config version from register

2021-02-09 Thread Alexander Duyck
On Mon, Feb 8, 2021 at 10:10 PM Rahul Lakkireddy
 wrote:
>
> Collect serial config version information directly from an internal
> register, instead of explicitly resizing VPD.
>
> v2:
> - Add comments on info stored in PCIE_STATIC_SPARE2 register.
>
> Signed-off-by: Rahul Lakkireddy 
> ---
>  .../net/ethernet/chelsio/cxgb4/cudbg_entity.h |  3 ---
>  .../net/ethernet/chelsio/cxgb4/cudbg_lib.c| 24 +++
>  drivers/net/ethernet/chelsio/cxgb4/t4_regs.h  |  6 +
>  3 files changed, 9 insertions(+), 24 deletions(-)
>

Looks good.

Reviewed-by: Alexander Duyck 


Re: [PATCH net-next] cxgb4: collect serial config version from register

2021-02-08 Thread Alexander Duyck
On Mon, Feb 8, 2021 at 3:48 PM Rahul Lakkireddy
 wrote:
>
> Collect serial config version information directly from an internal
> register, instead of explicitly resizing VPD.
>
> Signed-off-by: Rahul Lakkireddy 
> ---
>  .../net/ethernet/chelsio/cxgb4/cudbg_entity.h |  3 ---
>  .../net/ethernet/chelsio/cxgb4/cudbg_lib.c| 24 +++
>  drivers/net/ethernet/chelsio/cxgb4/t4_regs.h  |  2 ++
>  3 files changed, 5 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h 
> b/drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h
> index 876f90e5795e..d5218e74284c 100644
> --- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h
> +++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h
> @@ -220,9 +220,6 @@ struct cudbg_mps_tcam {
> u8 reserved[2];
>  };
>
> -#define CUDBG_VPD_PF_SIZE 0x800
> -#define CUDBG_SCFG_VER_ADDR 0x06
> -#define CUDBG_SCFG_VER_LEN 4
>  #define CUDBG_VPD_VER_ADDR 0x18c7
>  #define CUDBG_VPD_VER_LEN 2
>
> diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c 
> b/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c
> index 75474f810249..6c85a10f465c 100644
> --- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c
> +++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c
> @@ -2686,10 +2686,10 @@ int cudbg_collect_vpd_data(struct cudbg_init 
> *pdbg_init,
> struct adapter *padap = pdbg_init->adap;
> struct cudbg_buffer temp_buff = { 0 };
> char vpd_str[CUDBG_VPD_VER_LEN + 1];
> -   u32 scfg_vers, vpd_vers, fw_vers;
> struct cudbg_vpd_data *vpd_data;
> struct vpd_params vpd = { 0 };
> -   int rc, ret;
> +   u32 vpd_vers, fw_vers;
> +   int rc;
>
> rc = t4_get_raw_vpd_params(padap, &vpd);
> if (rc)
> @@ -2699,24 +2699,6 @@ int cudbg_collect_vpd_data(struct cudbg_init 
> *pdbg_init,
> if (rc)
> return rc;
>
> -   /* Serial Configuration Version is located beyond the PF's vpd size.
> -* Temporarily give access to entire EEPROM to get it.
> -*/
> -   rc = pci_set_vpd_size(padap->pdev, EEPROMVSIZE);
> -   if (rc < 0)
> -   return rc;
> -
> -   ret = cudbg_read_vpd_reg(padap, CUDBG_SCFG_VER_ADDR, 
> CUDBG_SCFG_VER_LEN,
> -&scfg_vers);
> -
> -   /* Restore back to original PF's vpd size */
> -   rc = pci_set_vpd_size(padap->pdev, CUDBG_VPD_PF_SIZE);
> -   if (rc < 0)
> -   return rc;
> -
> -   if (ret)
> -   return ret;
> -
> rc = cudbg_read_vpd_reg(padap, CUDBG_VPD_VER_ADDR, CUDBG_VPD_VER_LEN,
> vpd_str);
> if (rc)
> @@ -2737,7 +2719,7 @@ int cudbg_collect_vpd_data(struct cudbg_init *pdbg_init,
> memcpy(vpd_data->bn, vpd.pn, PN_LEN + 1);
> memcpy(vpd_data->na, vpd.na, MACADDR_LEN + 1);
> memcpy(vpd_data->mn, vpd.id, ID_LEN + 1);
> -   vpd_data->scfg_vers = scfg_vers;
> +   vpd_data->scfg_vers = t4_read_reg(padap, PCIE_STATIC_SPARE2_A);
> vpd_data->vpd_vers = vpd_vers;
> vpd_data->fw_major = FW_HDR_FW_VER_MAJOR_G(fw_vers);
> vpd_data->fw_minor = FW_HDR_FW_VER_MINOR_G(fw_vers);

All of the above looks good to me.

> diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h 
> b/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h
> index b11a172b5174..2d7bb8b66a3e 100644
> --- a/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h
> +++ b/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h
> @@ -884,6 +884,8 @@
>  #define TDUE_V(x) ((x) << TDUE_S)
>  #define TDUE_FTDUE_V(1U)
>
> +#define PCIE_STATIC_SPARE2_A   0x5bfc
> +
>  /* registers for module MC */
>  #define MC_INT_CAUSE_A 0x7518
>  #define MC_P_INT_CAUSE_A   0x41318

I cannot say I am a fan of the naming. I assume that is the name of an
existing spare register that someone repurposed to store the serial
config version? A comment explaining what all is stored in the
register might be useful, since the name doesn't imply anything related
to a serial config version being stored there.
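
Even something minimal along these lines would already help (the wording
below is my guess and would need to be corrected against what the firmware
actually puts in that register):

/* Spare PCIe config register that the firmware repurposes to hold the
 * Serial Configuration version (exact contents to be confirmed).
 */
#define PCIE_STATIC_SPARE2_A   0x5bfc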


Re: [PATCH net-next v2 12/12] net-sysfs: move the xps cpus/rxqs retrieval in a common function

2021-02-08 Thread Alexander Duyck
On Mon, Feb 8, 2021 at 9:19 AM Antoine Tenart  wrote:
>
> Most of the xps_cpus_show and xps_rxqs_show functions share the same
> logic. Having it in two different functions does not help maintenance.
> This patch moves their common logic into a new function, xps_queue_show,
> to improve this.
>
> Signed-off-by: Antoine Tenart 
> ---
>  net/core/net-sysfs.c | 98 ++--
>  1 file changed, 31 insertions(+), 67 deletions(-)
>
> diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
> index 6ce5772e799e..984c15248483 100644
> --- a/net/core/net-sysfs.c
> +++ b/net/core/net-sysfs.c
> @@ -1314,35 +1314,31 @@ static const struct attribute_group dql_group = {
>  #endif /* CONFIG_BQL */
>
>  #ifdef CONFIG_XPS
> -static ssize_t xps_cpus_show(struct netdev_queue *queue,
> -char *buf)
> +static ssize_t xps_queue_show(struct net_device *dev, unsigned int index,
> + char *buf, enum xps_map_type type)
>  {
> -   struct net_device *dev = queue->dev;
> struct xps_dev_maps *dev_maps;
> -   unsigned int index, nr_ids;
> -   int j, len, ret, tc = 0;
> unsigned long *mask;
> -
> -   if (!netif_is_multiqueue(dev))
> -   return -ENOENT;
> -
> -   index = get_netdev_queue_index(queue);
> -
> -   /* If queue belongs to subordinate dev use its map */
> -   dev = netdev_get_tx_queue(dev, index)->sb_dev ? : dev;
> +   unsigned int nr_ids;
> +   int j, len, tc = 0;
>
> tc = netdev_txq_to_tc(dev, index);
> if (tc < 0)
> return -EINVAL;
>
> rcu_read_lock();
> -   dev_maps = rcu_dereference(dev->xps_maps[XPS_CPUS]);
> -   nr_ids = dev_maps ? dev_maps->nr_ids : nr_cpu_ids;
> +   dev_maps = rcu_dereference(dev->xps_maps[type]);
> +
> +   /* Default to nr_cpu_ids/dev->num_rx_queues and do not just return 0
> +* when dev_maps hasn't been allocated yet, to be backward compatible.
> +*/
> +   nr_ids = dev_maps ? dev_maps->nr_ids :
> +(type == XPS_CPUS ? nr_cpu_ids : dev->num_rx_queues);
>
> mask = bitmap_zalloc(nr_ids, GFP_KERNEL);
> if (!mask) {
> -   ret = -ENOMEM;
> -   goto err_rcu_unlock;
> +   rcu_read_unlock();
> +   return -ENOMEM;
> }
>
> if (!dev_maps || tc >= dev_maps->num_tc)
> @@ -1368,11 +1364,24 @@ static ssize_t xps_cpus_show(struct netdev_queue 
> *queue,
>
> len = bitmap_print_to_pagebuf(false, buf, mask, nr_ids);
> bitmap_free(mask);
> +
> return len < PAGE_SIZE ? len : -EINVAL;
> +}
>
> -err_rcu_unlock:
> -   rcu_read_unlock();
> -   return ret;
> +static ssize_t xps_cpus_show(struct netdev_queue *queue, char *buf)
> +{
> +   struct net_device *dev = queue->dev;
> +   unsigned int index;
> +
> +   if (!netif_is_multiqueue(dev))
> +   return -ENOENT;
> +
> +   index = get_netdev_queue_index(queue);
> +
> +   /* If queue belongs to subordinate dev use its map */
> +   dev = netdev_get_tx_queue(dev, index)->sb_dev ? : dev;
> +
> +   return xps_queue_show(dev, index, buf, XPS_CPUS);
>  }
>
>  static ssize_t xps_cpus_store(struct netdev_queue *queue,

So this patch has the same issue as the one that was removing the
rtnl_lock. Basically the sb_dev needs to still be protected by the
rtnl_lock. We might need to take the rtnl_lock and maybe make use of
the get_device/put_device logic to make certain the device cannot be
freed while you are passing it to xps_queue_show.
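
A rough sketch of the simpler variant, just keeping the rtnl lock held
across the call instead of taking a device reference (untested):

static ssize_t xps_cpus_show(struct netdev_queue *queue, char *buf)
{
	struct net_device *dev = queue->dev;
	unsigned int index;
	ssize_t ret;

	if (!netif_is_multiqueue(dev))
		return -ENOENT;

	index = get_netdev_queue_index(queue);

	if (!rtnl_trylock())
		return restart_syscall();

	/* sb_dev is protected by the rtnl lock, so keep it held while the
	 * pointer is in use.
	 */
	dev = netdev_get_tx_queue(dev, index)->sb_dev ? : dev;

	ret = xps_queue_show(dev, index, buf, XPS_CPUS);

	rtnl_unlock();
	return ret;
}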


[net-next PATCH] net-sysfs: Add rtnl locking for getting Tx queue traffic class

2021-02-08 Thread Alexander Duyck
From: Alexander Duyck 

In order to access the subordinate dev for a device we should be holding
the rtnl_lock when outside of the transmit path. The existing code was not
doing that for the sysfs dump function and as a result we were open to a
possible race.

To resolve that take the rtnl lock prior to accessing the sb_dev field of
the Tx queue and release it after we have retrieved the tc for the queue.

Signed-off-by: Alexander Duyck 
---
 net/core/net-sysfs.c |   13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index daf502c13d6d..91afb0b6de69 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -1136,18 +1136,25 @@ static ssize_t traffic_class_show(struct netdev_queue 
*queue,
  char *buf)
 {
struct net_device *dev = queue->dev;
+   int num_tc, tc;
int index;
-   int tc;
 
if (!netif_is_multiqueue(dev))
return -ENOENT;
 
+   if (!rtnl_trylock())
+   return restart_syscall();
+
index = get_netdev_queue_index(queue);
 
/* If queue belongs to subordinate dev use its TC mapping */
dev = netdev_get_tx_queue(dev, index)->sb_dev ? : dev;
 
+   num_tc = dev->num_tc;
tc = netdev_txq_to_tc(dev, index);
+
+   rtnl_unlock();
+
if (tc < 0)
return -EINVAL;
 
@@ -1158,8 +1165,8 @@ static ssize_t traffic_class_show(struct netdev_queue 
*queue,
 * belongs to the root device it will be reported with just the
 * traffic class, so just "0" for TC 0 for example.
 */
-   return dev->num_tc < 0 ? sprintf(buf, "%d%d\n", tc, dev->num_tc) :
-sprintf(buf, "%d\n", tc);
+   return num_tc < 0 ? sprintf(buf, "%d%d\n", tc, num_tc) :
+   sprintf(buf, "%d\n", tc);
 }
 
 #ifdef CONFIG_XPS




Re: [PATCH net-next v2 09/12] net-sysfs: remove the rtnl lock when accessing the xps maps

2021-02-08 Thread Alexander Duyck
On Mon, Feb 8, 2021 at 9:19 AM Antoine Tenart  wrote:
>
> Now that nr_ids and num_tc are stored in the xps dev_maps, which are RCU
> protected, we do not have the need to protect the xps_cpus_show and
> xps_rxqs_show functions with the rtnl lock.
>
> Signed-off-by: Antoine Tenart 
> ---
>  net/core/net-sysfs.c | 26 --
>  1 file changed, 4 insertions(+), 22 deletions(-)
>
> diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
> index c2276b589cfb..6ce5772e799e 100644
> --- a/net/core/net-sysfs.c
> +++ b/net/core/net-sysfs.c
> @@ -1328,17 +1328,12 @@ static ssize_t xps_cpus_show(struct netdev_queue 
> *queue,
>
> index = get_netdev_queue_index(queue);
>
> -   if (!rtnl_trylock())
> -   return restart_syscall();
> -
> /* If queue belongs to subordinate dev use its map */
> dev = netdev_get_tx_queue(dev, index)->sb_dev ? : dev;
>
> tc = netdev_txq_to_tc(dev, index);
> -   if (tc < 0) {
> -   ret = -EINVAL;
> -   goto err_rtnl_unlock;
> -   }
> +   if (tc < 0)
> +   return -EINVAL;
>
> rcu_read_lock();
> dev_maps = rcu_dereference(dev->xps_maps[XPS_CPUS]);

So I think we hit a snag here. The sb_dev pointer is protected by the
rtnl_lock. So I don't think we can release the rtnl_lock until after
we are done with the dev pointer.

Also I am not sure it is safe to use netdev_txq_to_tc without holding
the lock. I don't know if we ever went through and guaranteed that it
will always work if the lock isn't held since in theory the device
could reprogram all the map values out from under us.

Odds are we should probably fix traffic_class_show as I suspect it
probably also needs to be holding the rtnl_lock to prevent any
possible races. I'll submit a patch for that.

> @@ -1371,16 +1366,12 @@ static ssize_t xps_cpus_show(struct netdev_queue 
> *queue,
>  out_no_maps:
> rcu_read_unlock();
>
> -   rtnl_unlock();
> -
> len = bitmap_print_to_pagebuf(false, buf, mask, nr_ids);
> bitmap_free(mask);
> return len < PAGE_SIZE ? len : -EINVAL;
>
>  err_rcu_unlock:
> rcu_read_unlock();
> -err_rtnl_unlock:
> -   rtnl_unlock();
> return ret;
>  }
>
> @@ -1435,14 +1426,9 @@ static ssize_t xps_rxqs_show(struct netdev_queue 
> *queue, char *buf)
>
> index = get_netdev_queue_index(queue);
>
> -   if (!rtnl_trylock())
> -   return restart_syscall();
> -
> tc = netdev_txq_to_tc(dev, index);
> -   if (tc < 0) {
> -   ret = -EINVAL;
> -   goto err_rtnl_unlock;
> -   }
> +   if (tc < 0)
> +   return -EINVAL;
>
> rcu_read_lock();
> dev_maps = rcu_dereference(dev->xps_maps[XPS_RXQS]);
> @@ -1475,8 +1461,6 @@ static ssize_t xps_rxqs_show(struct netdev_queue 
> *queue, char *buf)
>  out_no_maps:
> rcu_read_unlock();
>
> -   rtnl_unlock();
> -
> len = bitmap_print_to_pagebuf(false, buf, mask, nr_ids);
> bitmap_free(mask);
>
> @@ -1484,8 +1468,6 @@ static ssize_t xps_rxqs_show(struct netdev_queue 
> *queue, char *buf)
>
>  err_rcu_unlock:
> rcu_read_unlock();
> -err_rtnl_unlock:
> -   rtnl_unlock();
> return ret;
>  }
>
> --
> 2.29.2
>


Re: [PATCH net-next v2 07/12] net: remove the xps possible_mask

2021-02-08 Thread Alexander Duyck
On Mon, Feb 8, 2021 at 9:19 AM Antoine Tenart  wrote:
>
> Remove the xps possible_mask. It was an optimization but we can just
> loop from 0 to nr_ids now that it is embedded in the xps dev_maps. That
> simplifies the code a bit.
>
> Suggested-by: Alexander Duyck 
> Signed-off-by: Antoine Tenart 
> ---
>  net/core/dev.c   | 43 ++-
>  net/core/net-sysfs.c |  4 ++--
>  2 files changed, 16 insertions(+), 31 deletions(-)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index abbb2ae6b3ed..d0c07ccea2e5 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2505,33 +2505,27 @@ static void reset_xps_maps(struct net_device *dev,
> kfree_rcu(dev_maps, rcu);
>  }
>
> -static void clean_xps_maps(struct net_device *dev, const unsigned long *mask,
> +static void clean_xps_maps(struct net_device *dev,
>struct xps_dev_maps *dev_maps, u16 offset, u16 
> count,
>bool is_rxqs_map)
>  {
> -   unsigned int nr_ids = dev_maps->nr_ids;
> bool active = false;
> int i, j;
>
> -   for (j = -1; j = netif_attrmask_next(j, mask, nr_ids), j < nr_ids;)
> -   active |= remove_xps_queue_cpu(dev, dev_maps, j, offset,
> -  count);
> +   for (j = 0; j < dev_maps->nr_ids; j++)
> +   active |= remove_xps_queue_cpu(dev, dev_maps, j, offset, 
> count);
> if (!active)
> reset_xps_maps(dev, dev_maps, is_rxqs_map);
>
> -   if (!is_rxqs_map) {
> -   for (i = offset + (count - 1); count--; i--) {
> +   if (!is_rxqs_map)
> +   for (i = offset + (count - 1); count--; i--)
> netdev_queue_numa_node_write(
> -   netdev_get_tx_queue(dev, i),
> -   NUMA_NO_NODE);
> -   }
> -   }
> +   netdev_get_tx_queue(dev, i), NUMA_NO_NODE);
>  }
>

This violates the coding-style guide for the kernel. The if statement
should still have braces, since its body (the for loop plus the
netdev_queue_numa_node_write() call) spans more than a single line. I'd be
curious to see if checkpatch also complains about this, because it
probably should.

For reference see the end of section 3.0 in
Documentation/process/coding-style.rst.
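
In other words I would keep the outer braces even though the loop itself
only wraps a single call:

	if (!is_rxqs_map) {
		for (i = offset + (count - 1); count--; i--)
			netdev_queue_numa_node_write(
				netdev_get_tx_queue(dev, i), NUMA_NO_NODE);
	}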

Other than that the rest of the patch seemed to be fine.


Re: [PATCH net-next v11 3/3] net: add sysfs attribute to control napi threaded mode

2021-02-08 Thread Alexander Duyck
 threaded = false;
}

Anyway it is just a suggestion for improvement since it is
functionally the same as the code above but greatly reduces the
indentation. I am okay with the code as is as well, it just seemed
like a lot of braces on the end there.

> +   dev->threaded = threaded;
> +
> +   /* Make sure kthread is created before THREADED bit
> +* is set.
> +*/
> +   smp_mb__before_atomic();
> +
> +   /* Setting/unsetting threaded mode on a napi might not immediately
> +* take effect, if the current napi instance is actively being
> +* polled. In this case, the switch between threaded mode and
> +* softirq mode will happen in the next round of napi_schedule().
> +* This should not cause hiccups/stalls to the live traffic.
> +*/
> +   list_for_each_entry(napi, &dev->napi_list, dev_list) {
> +   if (threaded)
> +   set_bit(NAPI_STATE_THREADED, &napi->state);
> +   else
> +   clear_bit(NAPI_STATE_THREADED, &napi->state);
> +   }
> +
> +   return err;
> +}
> +
>  void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
> int (*poll)(struct napi_struct *, int), int weight)
>  {
> diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
> index daf502c13d6d..e72d474c2623 100644
> --- a/net/core/net-sysfs.c
> +++ b/net/core/net-sysfs.c
> @@ -538,6 +538,45 @@ static ssize_t phys_switch_id_show(struct device *dev,
>  }
>  static DEVICE_ATTR_RO(phys_switch_id);
>
> +static ssize_t threaded_show(struct device *dev,
> +struct device_attribute *attr, char *buf)
> +{
> +   struct net_device *netdev = to_net_dev(dev);
> +   ssize_t ret = -EINVAL;
> +
> +   if (!rtnl_trylock())
> +   return restart_syscall();
> +
> +   if (dev_isalive(netdev))
> +   ret = sprintf(buf, fmt_dec, netdev->threaded);
> +
> +   rtnl_unlock();
> +   return ret;
> +}
> +
> +static int modify_napi_threaded(struct net_device *dev, unsigned long val)
> +{
> +   int ret;
> +
> +   if (list_empty(&dev->napi_list))
> +   return -EOPNOTSUPP;
> +
> +   if (val != 0 && val != 1)
> +   return -EOPNOTSUPP;
> +
> +   ret = dev_set_threaded(dev, val);
> +
> +   return ret;
> +}
> +
> +static ssize_t threaded_store(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t len)
> +{
> +   return netdev_store(dev, attr, buf, len, modify_napi_threaded);
> +}
> +static DEVICE_ATTR_RW(threaded);
> +
>  static struct attribute *net_class_attrs[] __ro_after_init = {
> &dev_attr_netdev_group.attr,
> &dev_attr_type.attr,
> @@ -570,6 +609,7 @@ static struct attribute *net_class_attrs[] 
> __ro_after_init = {
> &dev_attr_proto_down.attr,
> &dev_attr_carrier_up_count.attr,
> &dev_attr_carrier_down_count.attr,
> +   &dev_attr_threaded.attr,
> NULL,
>  };
>  ATTRIBUTE_GROUPS(net_class);

Other than the style nit I mentioned above the code looks good to me.

Reviewed-by: Alexander Duyck 


Re: [PATCH net-next v2 3/3] cxgb4: remove changing VPD len

2021-02-08 Thread Alexander Duyck
On Fri, Feb 5, 2021 at 2:18 PM Heiner Kallweit  wrote:
>
> Now that the PCI VPD for Chelsio devices from T4 has been changed and VPD
> len is set to PCI_VPD_MAX_SIZE (32K), we don't have to change the VPD len
> any longer.
>
> Signed-off-by: Heiner Kallweit 
> ---
>  .../net/ethernet/chelsio/cxgb4/cudbg_entity.h |  1 -
>  .../net/ethernet/chelsio/cxgb4/cudbg_lib.c| 21 ---
>  2 files changed, 4 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h 
> b/drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h
> index 876f90e57..02ccb610a 100644
> --- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h
> +++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h
> @@ -220,7 +220,6 @@ struct cudbg_mps_tcam {
> u8 reserved[2];
>  };
>
> -#define CUDBG_VPD_PF_SIZE 0x800
>  #define CUDBG_SCFG_VER_ADDR 0x06
>  #define CUDBG_SCFG_VER_LEN 4
>  #define CUDBG_VPD_VER_ADDR 0x18c7
> diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c 
> b/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c
> index 75474f810..addac5518 100644
> --- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c
> +++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c
> @@ -2689,7 +2689,7 @@ int cudbg_collect_vpd_data(struct cudbg_init *pdbg_init,
> u32 scfg_vers, vpd_vers, fw_vers;
> struct cudbg_vpd_data *vpd_data;
> struct vpd_params vpd = { 0 };
> -   int rc, ret;
> +   int rc;
>
> rc = t4_get_raw_vpd_params(padap, &vpd);
> if (rc)
> @@ -2699,24 +2699,11 @@ int cudbg_collect_vpd_data(struct cudbg_init 
> *pdbg_init,
> if (rc)
> return rc;
>
> -   /* Serial Configuration Version is located beyond the PF's vpd size.
> -* Temporarily give access to entire EEPROM to get it.
> -*/
> -   rc = pci_set_vpd_size(padap->pdev, EEPROMVSIZE);
> -   if (rc < 0)
> -   return rc;
> -
> -   ret = cudbg_read_vpd_reg(padap, CUDBG_SCFG_VER_ADDR, 
> CUDBG_SCFG_VER_LEN,
> -&scfg_vers);
> -
> -   /* Restore back to original PF's vpd size */
> -   rc = pci_set_vpd_size(padap->pdev, CUDBG_VPD_PF_SIZE);
> -   if (rc < 0)
> +   rc = cudbg_read_vpd_reg(padap, CUDBG_SCFG_VER_ADDR, 
> CUDBG_SCFG_VER_LEN,
> +   &scfg_vers);
> +   if (rc)
> return rc;
>
> -   if (ret)
> -   return ret;
> -
> rc = cudbg_read_vpd_reg(padap, CUDBG_VPD_VER_ADDR, CUDBG_VPD_VER_LEN,
> vpd_str);
> if (rc)

Assuming that patch 2 is okay then this patch should be fine since it
is just toggling back and forth between the same value anyway.

Reviewed-by: Alexander Duyck 


Re: [PATCH net-next v2 1/3] cxgb4: remove unused vpd_cap_addr

2021-02-08 Thread Alexander Duyck
On Fri, Feb 5, 2021 at 2:29 PM Heiner Kallweit  wrote:
>
> Supposedly this is a leftover from T3 driver heritage. cxgb4 uses the
> PCI core VPD access code that handles detection of VPD capabilities.
>
> Signed-off-by: Heiner Kallweit 

Instead of starting with the "Supposedly this is" it might be better
to word it along the lines of "This is likely". The "Supposedly" makes
it sound like you heard this as a rumor from somebody else.

Other than that nit about the description the change looks good to me.

Reviewed-by: Alexander Duyck 

> ---
>  drivers/net/ethernet/chelsio/cxgb4/cxgb4.h  | 1 -
>  drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 2 --
>  2 files changed, 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h 
> b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
> index 8e681ce72..314f8d806 100644
> --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
> +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
> @@ -414,7 +414,6 @@ struct pf_resources {
>  };
>
>  struct pci_params {
> -   unsigned int vpd_cap_addr;
> unsigned char speed;
> unsigned char width;
>  };
> diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c 
> b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> index 9f1965c80..6264bc66a 100644
> --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> @@ -3201,8 +3201,6 @@ static void cxgb4_mgmt_fill_vf_station_mac_addr(struct 
> adapter *adap)
> int err;
> u8 *na;
>
> -   adap->params.pci.vpd_cap_addr = pci_find_capability(adap->pdev,
> -   PCI_CAP_ID_VPD);
> err = t4_get_raw_vpd_params(adap, &adap->params.vpd);
> if (err)
> return;
> --
> 2.30.0
>
>


Re: [PATCH net-next v2 2/3] PCI/VPD: Change Chelsio T4 quirk to provide access to full virtual address space

2021-02-08 Thread Alexander Duyck
On Fri, Feb 5, 2021 at 2:15 PM Heiner Kallweit  wrote:
>
> cxgb4 uses the full VPD address space for accessing its EEPROM (with some
> mapping, see t4_eeprom_ptov()). In cudbg_collect_vpd_data() it sets the
> VPD len to 32K (PCI_VPD_MAX_SIZE), and then back to 2K (CUDBG_VPD_PF_SIZE).
> Having official (structured) and inofficial (unstructured) VPD data
> violates the PCI spec, let's set VPD len according to all data that can be
> accessed via PCI VPD access, no matter of its structure.
>
> Signed-off-by: Heiner Kallweit 
> ---
>  drivers/pci/vpd.c | 7 +++
>  1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/pci/vpd.c b/drivers/pci/vpd.c
> index 7915d10f9..06a7954d0 100644
> --- a/drivers/pci/vpd.c
> +++ b/drivers/pci/vpd.c
> @@ -633,9 +633,8 @@ static void quirk_chelsio_extend_vpd(struct pci_dev *dev)
> /*
>  * If this is a T3-based adapter, there's a 1KB VPD area at offset
>  * 0xc00 which contains the preferred VPD values.  If this is a T4 or
> -* later based adapter, the special VPD is at offset 0x400 for the
> -* Physical Functions (the SR-IOV Virtual Functions have no VPD
> -* Capabilities).  The PCI VPD Access core routines will normally
> +* later based adapter, provide access to the full virtual EEPROM
> +* address space. The PCI VPD Access core routines will normally
>  * compute the size of the VPD by parsing the VPD Data Structure at
>  * offset 0x000.  This will result in silent failures when attempting
>  * to accesses these other VPD areas which are beyond those computed
> @@ -644,7 +643,7 @@ static void quirk_chelsio_extend_vpd(struct pci_dev *dev)
> if (chip == 0x0 && prod >= 0x20)
> pci_set_vpd_size(dev, 8192);
> else if (chip >= 0x4 && func < 0x8)
> -   pci_set_vpd_size(dev, 2048);
> +   pci_set_vpd_size(dev, PCI_VPD_MAX_SIZE);
>  }
>
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, PCI_ANY_ID,

So as I recall the size value was added when some hardware was hanging
when an out-of-bounds read occurred from various tools accessing the
VPD. I'm assuming if you are enabling full access the T4 hardware can
handle cases where an out-of-bounds read is requested?

Otherwise the code itself looks fine to me.

Reviewed-by: Alexander Duyck 


Re: [PATCH net-next RESEND 3/5] net: stmmac: dwmac-sun8i: Use reset_control_reset

2021-02-08 Thread Alexander Duyck
On Sun, Feb 7, 2021 at 10:32 PM Samuel Holland  wrote:
>
> Use the appropriate function instead of reimplementing it,
> and update the error message to match the code.
>
> Reviewed-by: Chen-Yu Tsai 
> Signed-off-by: Samuel Holland 
> ---
>  drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c 
> b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
> index 3c3d0b99d3e8..0e8d88417251 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
> @@ -806,11 +806,9 @@ static int sun8i_dwmac_power_internal_phy(struct 
> stmmac_priv *priv)
> /* Make sure the EPHY is properly reseted, as U-Boot may leave
>  * it at deasserted state, and thus it may fail to reset EMAC.
>  */
> -   reset_control_assert(gmac->rst_ephy);
> -
> -   ret = reset_control_deassert(gmac->rst_ephy);
> +   ret = reset_control_reset(gmac->rst_ephy);
> if (ret) {
> -   dev_err(priv->device, "Cannot deassert internal phy\n");
> +   dev_err(priv->device, "Cannot reset internal PHY\n");
> clk_disable_unprepare(gmac->ephy_clk);
> return ret;
> }

I'm assuming you have exclusive access to the PHY and this isn't a
shared line? Just wanting to confirm, since the function's documentation
header contains the following comment:

 * Consumers must not use reset_control_(de)assert on shared reset lines when
 * reset_control_reset has been used.
 *

If that is the case it might not hurt to add some documentation to
your call to reset_control_reset here explaining that it is safe to do
so since you have exclusive access.
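
Something along these lines above the call would be enough (the wording is
only a placeholder, adjust it to how the reset line is actually wired up):

	/* The EPHY reset line is exclusive to this MAC, so using
	 * reset_control_reset() here does not conflict with the rule about
	 * not mixing it with reset_control_(de)assert() on shared lines.
	 */
	ret = reset_control_reset(gmac->rst_ephy);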


Re: [PATCH net-next v10 3/3] net: add sysfs attribute to control napi threaded mode

2021-02-04 Thread Alexander Duyck
On Thu, Feb 4, 2021 at 1:35 PM Wei Wang  wrote:
>
> This patch adds a new sysfs attribute to the network device class.
> Said attribute provides a per-device control to enable/disable the
> threaded mode for all the napi instances of the given network device,
> without the need for a device up/down.
> User sets it to 1 or 0 to enable or disable threaded mode.
> Note: when switching between threaded and the current softirq based mode
> for a napi instance, it will not immediately take effect if the napi is
> currently being polled. The mode switch will happen for the next time
> napi_schedule() is called.
>
> Co-developed-by: Paolo Abeni 
> Signed-off-by: Paolo Abeni 
> Co-developed-by: Hannes Frederic Sowa 
> Signed-off-by: Hannes Frederic Sowa 
> Co-developed-by: Felix Fietkau 
> Signed-off-by: Felix Fietkau 
> Signed-off-by: Wei Wang 
> ---
>  Documentation/ABI/testing/sysfs-class-net | 15 +
>  include/linux/netdevice.h |  2 +
>  net/core/dev.c| 67 ++-
>  net/core/net-sysfs.c  | 45 +++
>  4 files changed, 127 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/ABI/testing/sysfs-class-net 
> b/Documentation/ABI/testing/sysfs-class-net
> index 1f2002df5ba2..1419103d11f9 100644
> --- a/Documentation/ABI/testing/sysfs-class-net
> +++ b/Documentation/ABI/testing/sysfs-class-net
> @@ -337,3 +337,18 @@ Contact:   netdev@vger.kernel.org
>  Description:
> 32-bit unsigned integer counting the number of times the link 
> has
> been down
> +
> +What:  /sys/class/net//threaded
> +Date:  Jan 2021
> +KernelVersion: 5.12
> +Contact:   netdev@vger.kernel.org
> +Description:
> +   Boolean value to control the threaded mode per device. User 
> could
> +   set this value to enable/disable threaded mode for all napi
> +   belonging to this device, without the need to do device 
> up/down.
> +
> +   Possible values:
> +   == ==
> +   0  threaded mode disabled for this dev
> +   1  threaded mode enabled for this dev
> +   == ==
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 99fb4ec9573e..1340327f7abf 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -497,6 +497,8 @@ static inline bool napi_complete(struct napi_struct *n)
> return napi_complete_done(n, 0);
>  }
>
> +int dev_set_threaded(struct net_device *dev, bool threaded);
> +
>  /**
>   * napi_disable - prevent NAPI from scheduling
>   * @n: NAPI context
> diff --git a/net/core/dev.c b/net/core/dev.c
> index a8c5eca17074..9cc9b245419e 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4290,8 +4290,9 @@ static inline void napi_schedule(struct 
> softnet_data *sd,
>
> if (test_bit(NAPI_STATE_THREADED, &napi->state)) {
> /* Paired with smp_mb__before_atomic() in
> -* napi_enable(). Use READ_ONCE() to guarantee
> -* a complete read on napi->thread. Only call
> +* napi_enable()/napi_set_threaded().
> +* Use READ_ONCE() to guarantee a complete
> +* read on napi->thread. Only call
>  * wake_up_process() when it's not NULL.
>  */
> thread = READ_ONCE(napi->thread);
> @@ -6743,6 +6744,68 @@ static void init_gro_hash(struct napi_struct *napi)
> napi->gro_bitmask = 0;
>  }
>
> +/* Setting/unsetting threaded mode on a napi might not immediately
> + * take effect, if the current napi instance is actively being
> + * polled. In this case, the switch between threaded mode and
> + * softirq mode will happen in the next round of napi_schedule().
> + * This should not cause hiccups/stalls to the live traffic.
> + */
> +static int napi_set_threaded(struct napi_struct *n, bool threaded)
> +{
> +   int err = 0;
> +
> +   if (threaded == !!test_bit(NAPI_STATE_THREADED, &n->state))
> +   return 0;
> +
> +   if (!threaded) {
> +   clear_bit(NAPI_STATE_THREADED, &n->state);
> +   return 0;
> +   }


> +
> +   if (!n->thread) {
> +   err = napi_kthread_create(n);
> +   if (err)
> +   return err;
> +   }

This piece needs to be broken out similar to what we did for the
napi_add and napi enable. In the case where we are enabling the
threaded NAPI you should first go through and allocate all the
threads. Then once all the threads are allocated you then enable them
by setting the NAPI_STATE_THREADED bit.

I would pull this section out and place it in a loop in
dev_set_threaded to handle creating the threads before you set
dev->threaded and then set the threaded flags in the napi instances.

> +
> +   /* Make sure kthread is created before 

Re: [PATCH net-next v10 2/3] net: implement threaded-able napi poll loop support

2021-02-04 Thread Alexander Duyck
On Thu, Feb 4, 2021 at 1:34 PM Wei Wang  wrote:
>
> This patch allows running each napi poll loop inside its own
> kernel thread.
> The kthread is created during netif_napi_add() if dev->threaded
> is set. And threaded mode is enabled in napi_enable(). We will
> provide a way to set dev->threaded and enable threaded mode
> without a device up/down in the following patch.
>
> Once that threaded mode is enabled and the kthread is
> started, napi_schedule() will wake-up such thread instead
> of scheduling the softirq.
>
> The threaded poll loop behaves quite likely the net_rx_action,
> but it does not have to manipulate local irqs and uses
> an explicit scheduling point based on netdev_budget.
>
> Co-developed-by: Paolo Abeni 
> Signed-off-by: Paolo Abeni 
> Co-developed-by: Hannes Frederic Sowa 
> Signed-off-by: Hannes Frederic Sowa 
> Co-developed-by: Jakub Kicinski 
> Signed-off-by: Jakub Kicinski 
> Signed-off-by: Wei Wang 
> ---
>  include/linux/netdevice.h |  21 +++
>  net/core/dev.c| 112 ++
>  2 files changed, 119 insertions(+), 14 deletions(-)
>

Looks good.

Reviewed-by: Alexander Duyck 


Re: [PATCH net-next v3 2/4] net: Introduce {netdev,napi}_alloc_frag_align()

2021-02-04 Thread Alexander Duyck
On Thu, Feb 4, 2021 at 3:06 AM Kevin Hao  wrote:
>
> In the current implementation of {netdev,napi}_alloc_frag(), it doesn't
> have any align guarantee for the returned buffer address, But for some
> hardwares they do require the DMA buffer to be aligned correctly,
> so we would have to use some workarounds like below if the buffers
> allocated by the {netdev,napi}_alloc_frag() are used by these hardwares
> for DMA.
> buf = napi_alloc_frag(really_needed_size + align);
> buf = PTR_ALIGN(buf, align);
>
> These codes seems ugly and would waste a lot of memories if the buffers
> are used in a network driver for the TX/RX. We have added the align
> support for the page_frag functions, so add the corresponding
> {netdev,napi}_frag functions.
>
> Signed-off-by: Kevin Hao 
> ---
> v3: Use align mask and refactor the {netdev,napi}_alloc_frag_align() as
> suggested by Alexander.
>
>  include/linux/skbuff.h | 36 ++--
>  net/core/skbuff.c  | 26 ++----
>  2 files changed, 44 insertions(+), 18 deletions(-)

Reviewed-by: Alexander Duyck 


Re: [PATCH net-next v3 1/4] mm: page_frag: Introduce page_frag_alloc_align()

2021-02-04 Thread Alexander Duyck
On Thu, Feb 4, 2021 at 3:06 AM Kevin Hao  wrote:
>
> In the current implementation of page_frag_alloc(), it doesn't have
> any align guarantee for the returned buffer address. But for some
> hardwares they do require the DMA buffer to be aligned correctly,
> so we would have to use some workarounds like below if the buffers
> allocated by the page_frag_alloc() are used by these hardwares for
> DMA.
> buf = page_frag_alloc(really_needed_size + align);
> buf = PTR_ALIGN(buf, align);
>
> These codes seems ugly and would waste a lot of memories if the buffers
> are used in a network driver for the TX/RX. So introduce
> page_frag_alloc_align() to make sure that an aligned buffer address is
> returned.
>
> Signed-off-by: Kevin Hao 
> Acked-by: Vlastimil Babka 
> ---
> v3: Use align mask as suggested by Alexander.
>
>  include/linux/gfp.h | 12 ++--
>  mm/page_alloc.c |  8 +---
>  2 files changed, 15 insertions(+), 5 deletions(-)

Looks good to me.

Reviewed-by: Alexander Duyck 


Re: [PATCH net v2] udp: fix skb_copy_and_csum_datagram with odd segment sizes

2021-02-03 Thread Alexander Duyck
On Wed, Feb 3, 2021 at 11:29 AM Willem de Bruijn
 wrote:
>
> From: Willem de Bruijn 
>
> When iteratively computing a checksum with csum_block_add, track the
> offset "pos" to correctly rotate in csum_block_add when offset is odd.
>
> The open coded implementation of skb_copy_and_csum_datagram did this.
> With the switch to __skb_datagram_iter calling csum_and_copy_to_iter,
> pos was reinitialized to 0 on each call.
>
> Bring back the pos by passing it along with the csum to the callback.
>
> Changes v1->v2
>   - pass csum value, instead of csump pointer (Alexander Duyck)
>
> Link: https://lore.kernel.org/netdev/20210128152353.GB27281@optiplex/
> Fixes: 950fcaecd5cc ("datagram: consolidate datagram copy to iter helpers")
> Reported-by: Oliver Graute 
> Signed-off-by: Willem de Bruijn 

Looks good to me.

Reviewed-by: Alexander Duyck 


Re: [PATCH net-next v9 3/3] net: add sysfs attribute to control napi threaded mode

2021-02-03 Thread Alexander Duyck
On Tue, Feb 2, 2021 at 5:01 PM Jakub Kicinski  wrote:
>
> On Fri, 29 Jan 2021 10:18:12 -0800 Wei Wang wrote:
> > This patch adds a new sysfs attribute to the network device class.
> > Said attribute provides a per-device control to enable/disable the
> > threaded mode for all the napi instances of the given network device,
> > without the need for a device up/down.
> > User sets it to 1 or 0 to enable or disable threaded mode.
> >
> > Co-developed-by: Paolo Abeni 
> > Signed-off-by: Paolo Abeni 
> > Co-developed-by: Hannes Frederic Sowa 
> > Signed-off-by: Hannes Frederic Sowa 
> > Co-developed-by: Felix Fietkau 
> > Signed-off-by: Felix Fietkau 
> > Signed-off-by: Wei Wang 
>
> > +static int napi_set_threaded(struct napi_struct *n, bool threaded)
> > +{
> > + int err = 0;
> > +
> > + if (threaded == !!test_bit(NAPI_STATE_THREADED, &n->state))
> > + return 0;
> > +
> > + if (!threaded) {
> > + clear_bit(NAPI_STATE_THREADED, &n->state);
>
> Can we put a note in the commit message saying that stopping the
> threads is slightly tricky but we'll do it if someone complains?
>
> Or is there a stronger reason than having to wait for thread to finish
> up with the NAPI not to stop them?

Normally if we are wanting to shut down NAPI we would have to go
through a coordinated process with us setting the NAPI_STATE_DISABLE
bit and then having to sit on NAPI_STATE_SCHED. Doing that would
likely cause a traffic hiccup if somebody toggles this while the NIC
is active so probably best to not interfere.

I suspect this should be more than enough to have us switch in and out
of the threaded setup. I don't think leaving the threads allocated
after someone has enabled it once should be much of an issue. As far
as using just the bit to do the disable, I think the most it would
probably take is a second or so for the queues to switch over from
threaded to normal NAPI again.

> > + return 0;
> > + }
> > +
> > + if (!n->thread) {
> > + err = napi_kthread_create(n);
> > + if (err)
> > + return err;
> > + }
> > +
> > + /* Make sure kthread is created before THREADED bit
> > +  * is set.
> > +  */
> > + smp_mb__before_atomic();
> > + set_bit(NAPI_STATE_THREADED, &n->state);
> > +
> > + return 0;
> > +}
> > +
> > +static void dev_disable_threaded_all(struct net_device *dev)
> > +{
> > + struct napi_struct *napi;
> > +
> > + list_for_each_entry(napi, &dev->napi_list, dev_list)
> > + napi_set_threaded(napi, false);
> > + dev->threaded = 0;
> > +}
> > +
> > +int dev_set_threaded(struct net_device *dev, bool threaded)
> > +{
> > + struct napi_struct *napi;
> > + int ret;
> > +
> > + dev->threaded = threaded;
> > + list_for_each_entry(napi, &dev->napi_list, dev_list) {
> > + ret = napi_set_threaded(napi, threaded);
> > + if (ret) {
> > + /* Error occurred on one of the napi,
> > +  * reset threaded mode on all napi.
> > +  */
> > + dev_disable_threaded_all(dev);
> > + break;
> > + }
> > + }
> > +
> > + return ret;
> > +}
> > +
> >  void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
> >   int (*poll)(struct napi_struct *, int), int weight)
> >  {
> > diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
> > index daf502c13d6d..884f049ee395 100644
> > --- a/net/core/net-sysfs.c
> > +++ b/net/core/net-sysfs.c
> > @@ -538,6 +538,55 @@ static ssize_t phys_switch_id_show(struct device *dev,
> >  }
> >  static DEVICE_ATTR_RO(phys_switch_id);
> >
> > +static ssize_t threaded_show(struct device *dev,
> > +  struct device_attribute *attr, char *buf)
> > +{
> > + struct net_device *netdev = to_net_dev(dev);
> > + int ret;
> > +
> > + if (!rtnl_trylock())
> > + return restart_syscall();
> > +
> > + if (!dev_isalive(netdev)) {
> > + ret = -EINVAL;
> > + goto unlock;
> > + }
> > +
> > + if (list_empty(&netdev->napi_list)) {
> > + ret = -EOPNOTSUPP;
> > + goto unlock;
> > + }
>
> Maybe others disagree but I'd take this check out. What's wrong with
> letting users see that threaded napi is disabled for devices without
> NAPI?
>
> This will also help a little devices which remove NAPIs when they are
> down.
>
> I've been caught off guard in the past by the fact that kernel returns
> -ENOENT for XPS map when device has a single queue.

I agree there isn't any point to the check. I think this is a
hold-over from the original code that was querying each napi structure
assigned to the device.

> > + ret = sprintf(buf, fmt_dec, netdev->threaded);
> > +
> > +unlock:
> > + rtnl_unlock();
> > + return ret;
> > +}
> > +
> > +static int modify_napi_threaded(struct net_device *dev, unsigned long val)

Re: [PATCH net-next v9 2/3] net: implement threaded-able napi poll loop support

2021-02-03 Thread Alexander Duyck
On Fri, Jan 29, 2021 at 10:22 AM Wei Wang  wrote:
>
> This patch allows running each napi poll loop inside its own
> kernel thread.
> The kthread is created during netif_napi_add() if dev->threaded
> is set. And threaded mode is enabled in napi_enable(). We will
> provide a way to set dev->threaded and enable threaded mode
> without a device up/down in the following patch.
>
> Once that threaded mode is enabled and the kthread is
> started, napi_schedule() will wake-up such thread instead
> of scheduling the softirq.
>
> The threaded poll loop behaves quite likely the net_rx_action,
> but it does not have to manipulate local irqs and uses
> an explicit scheduling point based on netdev_budget.
>
> Co-developed-by: Paolo Abeni 
> Signed-off-by: Paolo Abeni 
> Co-developed-by: Hannes Frederic Sowa 
> Signed-off-by: Hannes Frederic Sowa 
> Co-developed-by: Jakub Kicinski 
> Signed-off-by: Jakub Kicinski 
> Signed-off-by: Wei Wang 
> ---
>  include/linux/netdevice.h |  21 +++
>  net/core/dev.c| 117 ++
>  2 files changed, 124 insertions(+), 14 deletions(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 02dcef4d66e2..f1e9fe9017ac 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -347,6 +347,7 @@ struct napi_struct {
> struct list_headdev_list;
> struct hlist_node   napi_hash_node;
> unsigned intnapi_id;
> +   struct task_struct  *thread;
>  };
>
>  enum {
> @@ -358,6 +359,7 @@ enum {
> NAPI_STATE_NO_BUSY_POLL,/* Do not add in napi_hash, no busy 
> polling */
> NAPI_STATE_IN_BUSY_POLL,/* sk_busy_loop() owns this NAPI */
> NAPI_STATE_PREFER_BUSY_POLL,/* prefer busy-polling over softirq 
> processing*/
> +   NAPI_STATE_THREADED,/* The poll is performed inside its 
> own thread*/
>  };
>
>  enum {
> @@ -369,6 +371,7 @@ enum {
> NAPIF_STATE_NO_BUSY_POLL= BIT(NAPI_STATE_NO_BUSY_POLL),
> NAPIF_STATE_IN_BUSY_POLL= BIT(NAPI_STATE_IN_BUSY_POLL),
> NAPIF_STATE_PREFER_BUSY_POLL= BIT(NAPI_STATE_PREFER_BUSY_POLL),
> +   NAPIF_STATE_THREADED= BIT(NAPI_STATE_THREADED),
>  };
>
>  enum gro_result {
> @@ -503,20 +506,7 @@ static inline bool napi_complete(struct napi_struct *n)
>   */
>  void napi_disable(struct napi_struct *n);
>
> -/**
> - * napi_enable - enable NAPI scheduling
> - * @n: NAPI context
> - *
> - * Resume NAPI from being scheduled on this context.
> - * Must be paired with napi_disable.
> - */
> -static inline void napi_enable(struct napi_struct *n)
> -{
> -   BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));
> -   smp_mb__before_atomic();
> -   clear_bit(NAPI_STATE_SCHED, &n->state);
> -   clear_bit(NAPI_STATE_NPSVC, &n->state);
> -}
> +void napi_enable(struct napi_struct *n);
>
>  /**
>   * napi_synchronize - wait until NAPI is not running
> @@ -1826,6 +1816,8 @@ enum netdev_priv_flags {
>   *
>   * @wol_enabled:   Wake-on-LAN is enabled
>   *
> + * @threaded:  napi threaded mode is enabled
> + *
>   * @net_notifier_list: List of per-net netdev notifier block
>   * that follow this device when it is moved
>   * to another network namespace.
> @@ -2143,6 +2135,7 @@ struct net_device {
> struct lock_class_key   *qdisc_running_key;
> boolproto_down;
> unsignedwol_enabled:1;
> +   unsignedthreaded:1;
>
> struct list_headnet_notifier_list;
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 7d23bff03864..743dd69fba19 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -91,6 +91,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -1493,6 +1494,37 @@ void netdev_notify_peers(struct net_device *dev)
>  }
>  EXPORT_SYMBOL(netdev_notify_peers);
>
> +static int napi_threaded_poll(void *data);
> +
> +static int napi_kthread_create(struct napi_struct *n)
> +{
> +   int err = 0;
> +
> +   /* Create and wake up the kthread once to put it in
> +* TASK_INTERRUPTIBLE mode to avoid the blocked task
> +* warning and work with loadavg.
> +*/
> +   n->thread = kthread_run(napi_threaded_poll, n, "napi/%s-%d",
> +   n->dev->name, n->napi_id);
> +   if (IS_ERR(n->thread)) {
> +   err = PTR_ERR(n->thread);
> +   pr_err("kthread_run failed with err %d\n", err);
> +   n->thread = NULL;
> +   }
> +
> +   return err;
> +}
> +
> +static void napi_kthread_stop(struct napi_struct *n)
> +{
> +   if (!n->thread)
> +   return;
> +
> +   kthread_stop(n->thread);
> +   clear_bit(NAPI_STATE_THREADED, &n->state);
> +   n->thread = NULL;
> +}
> +

So I think the napi_kthread_stop

Re: [PATCH net-next v9 1/3] net: extract napi poll functionality to __napi_poll()

2021-02-03 Thread Alexander Duyck
On Fri, Jan 29, 2021 at 10:20 AM Wei Wang  wrote:
>
> From: Felix Fietkau 
>
> This commit introduces a new function __napi_poll() which does the main
> logic of the existing napi_poll() function, and will be called by other
> functions in later commits.
> This idea and implementation is done by Felix Fietkau  and
> is proposed as part of the patch to move napi work to work_queue
> context.
> This commit by itself is a code restructure.
>
> Signed-off-by: Felix Fietkau 
> Signed-off-by: Wei Wang 
> ---
>  net/core/dev.c | 35 +--
>  1 file changed, 25 insertions(+), 10 deletions(-)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 0332f2e8f7da..7d23bff03864 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -6768,15 +6768,10 @@ void __netif_napi_del(struct napi_struct *napi)
>  }
>  EXPORT_SYMBOL(__netif_napi_del);
>
> -static int napi_poll(struct napi_struct *n, struct list_head *repoll)
> +static int __napi_poll(struct napi_struct *n, bool *repoll)
>  {
> -   void *have;
> int work, weight;
>
> -   list_del_init(&n->poll_list);
> -
> -   have = netpoll_poll_lock(n);
> -
> weight = n->weight;
>
> /* This NAPI_STATE_SCHED test is for avoiding a race
> @@ -6796,7 +6791,7 @@ static int napi_poll(struct napi_struct *n, struct 
> list_head *repoll)
> n->poll, work, weight);
>
> if (likely(work < weight))
> -   goto out_unlock;
> +   return work;
>
> /* Drivers must not modify the NAPI state if they
>  * consume the entire weight.  In such cases this code
> @@ -6805,7 +6800,7 @@ static int napi_poll(struct napi_struct *n, struct 
> list_head *repoll)
>  */
> if (unlikely(napi_disable_pending(n))) {
> napi_complete(n);
> -   goto out_unlock;
> +   return work;
> }
>
> /* The NAPI context has more processing work, but busy-polling
> @@ -6818,7 +6813,7 @@ static int napi_poll(struct napi_struct *n, struct 
> list_head *repoll)
>  */
> napi_schedule(n);
> }
> -   goto out_unlock;
> +   return work;
> }
>
> if (n->gro_bitmask) {
> @@ -6836,9 +6831,29 @@ static int napi_poll(struct napi_struct *n, struct 
> list_head *repoll)
> if (unlikely(!list_empty(&n->poll_list))) {
> pr_warn_once("%s: Budget exhausted after napi rescheduled\n",
>  n->dev ? n->dev->name : "backlog");
> -   goto out_unlock;
> +   return work;
> }
>
> +   *repoll = true;
> +
> +   return work;
> +}
> +
> +static int napi_poll(struct napi_struct *n, struct list_head *repoll)
> +{
> +   bool do_repoll = false;
> +   void *have;
> +   int work;
> +
> +   list_del_init(&n->poll_list);
> +
> +   have = netpoll_poll_lock(n);
> +
> +   work = __napi_poll(n, &do_repoll);
> +
> +   if (!do_repoll)
> +   goto out_unlock;
> +
> list_add_tail(&n->poll_list, repoll);
>
>  out_unlock:

Instead of using the out_unlock label why don't you only do the
list_add_tail if do_repoll is true? It will allow you to drop a few
lines of noise. Otherwise this looks good to me.
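
Something like this is what I had in mind (untested):

static int napi_poll(struct napi_struct *n, struct list_head *repoll)
{
	bool do_repoll = false;
	void *have;
	int work;

	list_del_init(&n->poll_list);

	have = netpoll_poll_lock(n);

	work = __napi_poll(n, &do_repoll);

	if (do_repoll)
		list_add_tail(&n->poll_list, repoll);

	netpoll_poll_unlock(have);

	return work;
}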

Reviewed-by: Alexander Duyck 


Re: [PATCH net] udp: fix skb_copy_and_csum_datagram with odd segment sizes

2021-02-02 Thread Alexander Duyck
r now */
> return 0;
> @@ -1561,7 +1564,8 @@ size_t csum_and_copy_to_iter(const void *addr, size_t 
> bytes, void *csump,
> off += v.iov_len;
> })
> )
> -   *csum = sum;
> +   *csstate->csump = sum;
> +   csstate->off = off;
> return bytes;
>  }
>  EXPORT_SYMBOL(csum_and_copy_to_iter);
> diff --git a/net/core/datagram.c b/net/core/datagram.c
> index 81809fa735a7..c6ac5413dda9 100644
> --- a/net/core/datagram.c
> +++ b/net/core/datagram.c
> @@ -721,8 +721,10 @@ static int skb_copy_and_csum_datagram(const struct 
> sk_buff *skb, int offset,
>   struct iov_iter *to, int len,
>   __wsum *csump)
>  {
> +   struct csum_state csdata = { .csump = csump };
> +
> return __skb_datagram_iter(skb, offset, to, len, true,
> -   csum_and_copy_to_iter, csump);
> +   csum_and_copy_to_iter, &csdata);
>  }
>
>  /**

The rest of this looks good to me, and my only complaint is the
performance nit called out above.

Reviewed-by: Alexander Duyck 


Re: [PATCH net-next v2 2/4] net: Introduce {netdev,napi}_alloc_frag_align()

2021-02-02 Thread Alexander Duyck
On Sun, Jan 31, 2021 at 12:17 AM Kevin Hao  wrote:
>
> In the current implementation of {netdev,napi}_alloc_frag(), it doesn't
> have any align guarantee for the returned buffer address, But for some
> hardwares they do require the DMA buffer to be aligned correctly,
> so we would have to use some workarounds like below if the buffers
> allocated by the {netdev,napi}_alloc_frag() are used by these hardwares
> for DMA.
> buf = napi_alloc_frag(really_needed_size + align);
> buf = PTR_ALIGN(buf, align);
>
> These codes seems ugly and would waste a lot of memories if the buffers
> are used in a network driver for the TX/RX. We have added the align
> support for the page_frag functions, so add the corresponding
> {netdev,napi}_frag functions.
>
> Signed-off-by: Kevin Hao 
> ---
> v2: Inline {netdev,napi}_alloc_frag().
>
>  include/linux/skbuff.h | 22 --
>  net/core/skbuff.c  | 25 +
>  2 files changed, 29 insertions(+), 18 deletions(-)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 9313b5aaf45b..7e8beff4ff22 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -2818,7 +2818,19 @@ void skb_queue_purge(struct sk_buff_head *list);
>
>  unsigned int skb_rbtree_purge(struct rb_root *root);
>
> -void *netdev_alloc_frag(unsigned int fragsz);
> +void *netdev_alloc_frag_align(unsigned int fragsz, int align);
> +
> +/**
> + * netdev_alloc_frag - allocate a page fragment
> + * @fragsz: fragment size
> + *
> + * Allocates a frag from a page for receive buffer.
> + * Uses GFP_ATOMIC allocations.
> + */
> +static inline void *netdev_alloc_frag(unsigned int fragsz)
> +{
> +   return netdev_alloc_frag_align(fragsz, 0);
> +}
>

So one thing we may want to do is actually split this up so that we
have a __netdev_alloc_frag_align function that is called by one of two
inline functions. The standard netdev_alloc_frag would be like what
you have here, however we would be passing ~0 for the mask.

The "align" version would be taking in an unsigned int align value and
converting it to a mask. The idea is that your mask value is likely a
constant so converting the constant to a mask would be much easier to
do in an inline function as the compiler can take care of converting
the value during compile time.

An added value to that is you could also add tests to the align value
to guarantee that the value being passed is a power of 2 so that it
works with the alignment mask generation as expected.
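
I.e. something along these lines for the netdev case (untested sketch; the
napi variants could get the same treatment):

void *__netdev_alloc_frag_align(unsigned int fragsz, unsigned int align_mask);

static inline void *netdev_alloc_frag(unsigned int fragsz)
{
	return __netdev_alloc_frag_align(fragsz, ~0u);
}

static inline void *netdev_alloc_frag_align(unsigned int fragsz,
					    unsigned int align)
{
	/* align must be a power of 2 for -align to be a valid mask */
	WARN_ON_ONCE(!is_power_of_2(align));
	return __netdev_alloc_frag_align(fragsz, -align);
}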

>  struct sk_buff *__netdev_alloc_skb(struct net_device *dev, unsigned int 
> length,
>gfp_t gfp_mask);
> @@ -2877,7 +2889,13 @@ static inline void skb_free_frag(void *addr)
> page_frag_free(addr);
>  }
>
> -void *napi_alloc_frag(unsigned int fragsz);
> +void *napi_alloc_frag_align(unsigned int fragsz, int align);
> +
> +static inline void *napi_alloc_frag(unsigned int fragsz)
> +{
> +   return napi_alloc_frag_align(fragsz, 0);
> +}
> +
>  struct sk_buff *__napi_alloc_skb(struct napi_struct *napi,
>  unsigned int length, gfp_t gfp_mask);
>  static inline struct sk_buff *napi_alloc_skb(struct napi_struct *napi,

Same for the __napi_alloc_frag code. You could probably convert the
__napi_alloc_frag below into an __napi_alloc_frag_align that you pass
a mask to. Then you could convert the other two functions to either
pass ~0 or the align value and add align value validation.

> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 2af12f7e170c..a35e75f12428 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -374,29 +374,22 @@ struct napi_alloc_cache {
>  static DEFINE_PER_CPU(struct page_frag_cache, netdev_alloc_cache);
>  static DEFINE_PER_CPU(struct napi_alloc_cache, napi_alloc_cache);
>
> -static void *__napi_alloc_frag(unsigned int fragsz, gfp_t gfp_mask)
> +static void *__napi_alloc_frag(unsigned int fragsz, gfp_t gfp_mask, int 
> align)
>  {
> struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
>
> -   return page_frag_alloc(&nc->page, fragsz, gfp_mask);
> +   return page_frag_alloc_align(&nc->page, fragsz, gfp_mask, align);
>  }
>
> -void *napi_alloc_frag(unsigned int fragsz)
> +void *napi_alloc_frag_align(unsigned int fragsz, int align)
>  {
> fragsz = SKB_DATA_ALIGN(fragsz);
>
> -   return __napi_alloc_frag(fragsz, GFP_ATOMIC);
> +   return __napi_alloc_frag(fragsz, GFP_ATOMIC, align);
>  }
> -EXPORT_SYMBOL(napi_alloc_frag);
> +EXPORT_SYMBOL(napi_alloc_frag_align);
>
> -/**
> - * netdev_alloc_frag - allocate a page fragment
> - * @fragsz: fragment size
> - *
> - * Allocates a frag from a page for receive buffer.
> - * Uses GFP_ATOMIC allocations.
> - */
> -void *netdev_alloc_frag(unsigned int fragsz)
> +void *netdev_alloc_frag_align(unsigned int fragsz, int align)
>  {
> struct page_frag_cache *nc;
> void *data;
> @@ -404,15 +397,15 @@ void *netdev_alloc_frag(unsigned int fragsz)
>  

Re: [PATCH net-next v2 1/4] mm: page_frag: Introduce page_frag_alloc_align()

2021-02-02 Thread Alexander Duyck
On Sat, Jan 30, 2021 at 11:54 PM Kevin Hao  wrote:
>
> In the current implementation of page_frag_alloc(), it doesn't have
> any align guarantee for the returned buffer address. But for some
> hardwares they do require the DMA buffer to be aligned correctly,
> so we would have to use some workarounds like below if the buffers
> allocated by the page_frag_alloc() are used by these hardwares for
> DMA.
> buf = page_frag_alloc(really_needed_size + align);
> buf = PTR_ALIGN(buf, align);
>
> These codes seems ugly and would waste a lot of memories if the buffers
> are used in a network driver for the TX/RX. So introduce
> page_frag_alloc_align() to make sure that an aligned buffer address is
> returned.
>
> Signed-off-by: Kevin Hao 
> Acked-by: Vlastimil Babka 
> ---
> v2:
>   - Inline page_frag_alloc()
>   - Adopt Vlastimil's suggestion and add his Acked-by
>
>  include/linux/gfp.h | 12 ++--
>  mm/page_alloc.c |  8 +---
>  2 files changed, 15 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index 6e479e9c48ce..39f4b3070d09 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -583,8 +583,16 @@ extern void free_pages(unsigned long addr, unsigned int 
> order);
>
>  struct page_frag_cache;
>  extern void __page_frag_cache_drain(struct page *page, unsigned int count);
> -extern void *page_frag_alloc(struct page_frag_cache *nc,
> -unsigned int fragsz, gfp_t gfp_mask);
> +extern void *page_frag_alloc_align(struct page_frag_cache *nc,
> +  unsigned int fragsz, gfp_t gfp_mask,
> +  int align);
> +
> +static inline void *page_frag_alloc(struct page_frag_cache *nc,
> +unsigned int fragsz, gfp_t gfp_mask)
> +{
> +   return page_frag_alloc_align(nc, fragsz, gfp_mask, 0);
> +}
> +
>  extern void page_frag_free(void *addr);
>
>  #define __free_page(page) __free_pages((page), 0)
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 519a60d5b6f7..4667e7b6993b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5137,8 +5137,8 @@ void __page_frag_cache_drain(struct page *page, 
> unsigned int count)
>  }
>  EXPORT_SYMBOL(__page_frag_cache_drain);
>
> -void *page_frag_alloc(struct page_frag_cache *nc,
> - unsigned int fragsz, gfp_t gfp_mask)
> +void *page_frag_alloc_align(struct page_frag_cache *nc,
> + unsigned int fragsz, gfp_t gfp_mask, int align)

I would make "align" unsigned since really we are using it as a mask.
Actually passing it as a mask might be even better. More on that
below.

>  {
> unsigned int size = PAGE_SIZE;
> struct page *page;
> @@ -5190,11 +5190,13 @@ void *page_frag_alloc(struct page_frag_cache *nc,
> }
>
> nc->pagecnt_bias--;
> +   if (align)
> +   offset = ALIGN_DOWN(offset, align);
> nc->offset = offset;
>
> return nc->va + offset;
>  }
> -EXPORT_SYMBOL(page_frag_alloc);
> +EXPORT_SYMBOL(page_frag_alloc_align);
>
>  /*
>   * Frees a page fragment allocated out of either a compound or order 0 page.

Rather than using the conditional branch it might be better to just do
"offset &= align_mask". Then you would be adding at most 1 instruction
which can likely occur in parallel with the other work that is going
on versus the conditional branch which requires a test, jump, and then
the 3 alignment instructions to do the subtraction, inversion, and
AND.
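
In other words, assuming the parameter becomes an align_mask and the plain
page_frag_alloc() wrapper passes ~0u for the unaligned case, the hunk would
boil down to:

	nc->pagecnt_bias--;
-	if (align)
-		offset = ALIGN_DOWN(offset, align);
+	offset &= align_mask;
	nc->offset = offset;

	return nc->va + offset;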

This would ripple through the other patches, as you would also need to
update them to assume ~0 in the unaligned case. For the aligned cases you
could just use the negative alignment value to generate the mask, which
the compiler would likely take care of at compile time.


Re: [PATCHv3 net-next 1/2] net: support ip generic csum processing in skb_csum_hwoffload_help

2021-01-28 Thread Alexander Duyck
On Thu, Jan 28, 2021 at 12:00 PM Willem de Bruijn
 wrote:
>
> On Thu, Jan 28, 2021 at 2:46 PM Alexander Duyck
>  wrote:
> >
> > On Thu, Jan 28, 2021 at 6:07 AM Willem de Bruijn
> >  wrote:
> > >
> > > On Thu, Jan 28, 2021 at 4:29 AM Xin Long  wrote:
> > > >
> > > > NETIF_F_IP|IPV6_CSUM feature flag indicates UDP and TCP csum offload
> > > > while NETIF_F_HW_CSUM feature flag indicates ip generic csum offload
> > > > for HW, which includes not only for TCP/UDP csum, but also for other
> > > > protocols' csum like GRE's.
> > > >
> > > > However, in skb_csum_hwoffload_help() it only checks features against
> > > > NETIF_F_CSUM_MASK(NETIF_F_HW|IP|IPV6_CSUM). So if it's a non TCP/UDP
> > > > packet and the features doesn't support NETIF_F_HW_CSUM, but supports
> > > > NETIF_F_IP|IPV6_CSUM only, it would still return 0 and leave the HW
> > > > to do csum.
> > > >
> > > > This patch is to support ip generic csum processing by checking
> > > > NETIF_F_HW_CSUM for all protocols, and check (NETIF_F_IP_CSUM |
> > > > NETIF_F_IPV6_CSUM) only for TCP and UDP.
> > > >
> > > > Note that we're using skb->csum_offset to check if it's a TCP/UDP
> > > > protocol, this might be fragile. However, as Alex said, for now we
> > > > only have a few L4 protocols that are requesting Tx csum offload,
> > > > we'd better fix this until a new protocol comes with a same csum
> > > > offset.
> > > >
> > > > v1->v2:
> > > >   - not extend skb->csum_not_inet, but use skb->csum_offset to tell
> > > > if it's an UDP/TCP csum packet.
> > > > v2->v3:
> > > >   - add a note in the changelog, as Willem suggested.
> > > >
> > > > Suggested-by: Alexander Duyck 
> > > > Signed-off-by: Xin Long 
> > > > ---
> > > >  net/core/dev.c | 13 -
> > > >  1 file changed, 12 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > > index 6df3f1b..aae116d 100644
> > > > --- a/net/core/dev.c
> > > > +++ b/net/core/dev.c
> > > > @@ -3621,7 +3621,18 @@ int skb_csum_hwoffload_help(struct sk_buff *skb,
> > > > return !!(features & NETIF_F_SCTP_CRC) ? 0 :
> > > > skb_crc32c_csum_help(skb);
> > > >
> > > > -   return !!(features & NETIF_F_CSUM_MASK) ? 0 : 
> > > > skb_checksum_help(skb);
> > > > +   if (features & NETIF_F_HW_CSUM)
> > > > +   return 0;
> > > > +
> > > > +   if (features & (NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM)) {
> > >
> > > Should this check the specific feature flag against skb->protocol? I
> > > don't know if there are actually instances that only support one of
> > > the two flags.
> >
> > The issue is at a certain point we start excluding devices that were
> > previously working.
> >
> > All this patch is really doing is using the checksum offset to
> > identify the cases that were previously UDP or TCP offloads and
> > letting those through with the legacy path, while any offsets that are
> > not those two, such as the GRE checksum will now have to be explicitly
> > caught by the NETIF_F_HW_CSUM case and not accepted by the other
> > cases.
>
> I understand. But letting through an IPv6 packet to a nic that
> advertises NETIF_F_IP_CSUM, but not NETIF_F_IPV6_CSUM, is still
> incorrect, right?

That all depends. The problem is that if we are going to look at the
protocol we essentially have to work our way through a number of
fields, sort out whether there are tunnels, and if so determine what
the protocol of the inner headers is and whether that is supported. It
might make more sense in that case to incorporate a v4/v6 specific
check into netif_skb_features so we could mask off the bit there.

The question I would have is how this code has been working up until
now without that check. If we are broken outright and need to add it,
then maybe this should be treated more as a fix and pushed for net,
with the protocol bit masking added.
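
As a rough illustration of that direction (a sketch only, nothing that
has been posted, and the helper name is made up), the masking would
drop whichever protocol specific flag does not match the packet's
network protocol, leaving the protocol independent NETIF_F_HW_CSUM
alone:

static netdev_features_t mask_l3_csum_features(const struct sk_buff *skb,
					       netdev_features_t features)
{
	switch (skb->protocol) {
	case htons(ETH_P_IP):
		return features & ~NETIF_F_IPV6_CSUM;
	case htons(ETH_P_IPV6):
		return features & ~NETIF_F_IP_CSUM;
	default:
		return features & ~(NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM);
	}
}

Tunneled traffic is what makes this messy in practice, since the
decision would really have to be made on the inner headers as noted
above.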


Re: [PATCHv3 net-next 1/2] net: support ip generic csum processing in skb_csum_hwoffload_help

2021-01-28 Thread Alexander Duyck
On Thu, Jan 28, 2021 at 6:07 AM Willem de Bruijn
 wrote:
>
> On Thu, Jan 28, 2021 at 4:29 AM Xin Long  wrote:
> >
> > NETIF_F_IP|IPV6_CSUM feature flag indicates UDP and TCP csum offload
> > while NETIF_F_HW_CSUM feature flag indicates ip generic csum offload
> > for HW, which includes not only for TCP/UDP csum, but also for other
> > protocols' csum like GRE's.
> >
> > However, in skb_csum_hwoffload_help() it only checks features against
> > NETIF_F_CSUM_MASK(NETIF_F_HW|IP|IPV6_CSUM). So if it's a non TCP/UDP
> > packet and the features doesn't support NETIF_F_HW_CSUM, but supports
> > NETIF_F_IP|IPV6_CSUM only, it would still return 0 and leave the HW
> > to do csum.
> >
> > This patch is to support ip generic csum processing by checking
> > NETIF_F_HW_CSUM for all protocols, and check (NETIF_F_IP_CSUM |
> > NETIF_F_IPV6_CSUM) only for TCP and UDP.
> >
> > Note that we're using skb->csum_offset to check if it's a TCP/UDP
> > protocol, this might be fragile. However, as Alex said, for now we
> > only have a few L4 protocols that are requesting Tx csum offload,
> > we'd better fix this until a new protocol comes with a same csum
> > offset.
> >
> > v1->v2:
> >   - not extend skb->csum_not_inet, but use skb->csum_offset to tell
> > if it's an UDP/TCP csum packet.
> > v2->v3:
> >   - add a note in the changelog, as Willem suggested.
> >
> > Suggested-by: Alexander Duyck 
> > Signed-off-by: Xin Long 
> > ---
> >  net/core/dev.c | 13 -
> >  1 file changed, 12 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 6df3f1b..aae116d 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -3621,7 +3621,18 @@ int skb_csum_hwoffload_help(struct sk_buff *skb,
> > return !!(features & NETIF_F_SCTP_CRC) ? 0 :
> > skb_crc32c_csum_help(skb);
> >
> > -   return !!(features & NETIF_F_CSUM_MASK) ? 0 : 
> > skb_checksum_help(skb);
> > +   if (features & NETIF_F_HW_CSUM)
> > +   return 0;
> > +
> > +   if (features & (NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM)) {
>
> Should this check the specific feature flag against skb->protocol? I
> don't know if there are actually instances that only support one of
> the two flags.

The issue is at a certain point we start excluding devices that were
previously working.

All this patch is really doing is using the checksum offset to
identify the cases that were previously UDP or TCP offloads and
letting those through with the legacy path, while any offsets that are
not those two, such as the GRE checksum will now have to be explicitly
caught by the NETIF_F_HW_CSUM case and not accepted by the other
cases.
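
Spelled out, the approach described here amounts to roughly the
following (my paraphrase of the idea, not the posted diff, which is
truncated in the quote above):

	if (features & NETIF_F_HW_CSUM)
		return 0;

	if (features & (NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM)) {
		switch (skb->csum_offset) {
		case offsetof(struct tcphdr, check):
		case offsetof(struct udphdr, check):
			return 0;	/* legacy TCP/UDP offload */
		}
	}

	return skb_checksum_help(skb);	/* e.g. GRE csum done in software */

Anything with a different csum_offset, such as the GRE checksum, only
stays offloaded when the device advertises NETIF_F_HW_CSUM.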


Re: [PATCH net-next 03/11] net-sysfs: move the xps cpus/rxqs retrieval in a common function

2021-01-28 Thread Alexander Duyck
On Thu, Jan 28, 2021 at 6:44 AM Antoine Tenart  wrote:
>
> Most of the xps_cpus_show and xps_rxqs_show functions share the same
> logic. Having it in two different functions does not help maintenance
> and we can already see small implementation differences. This should not
> be the case and this patch moves their common logic into a new function,
> xps_queue_show, to improve maintenance.
>
> While the rtnl lock could be held in the new xps_queue_show, it is still
> held in xps_cpus_show and xps_rxqs_show as this is an important
> information when looking at those two functions. This does not add
> complexity.
>
> Signed-off-by: Antoine Tenart 
> ---
>  net/core/net-sysfs.c | 168 ---
>  1 file changed, 79 insertions(+), 89 deletions(-)
>
> diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
> index 5a39e9b38e5f..6e6bc05181f6 100644
> --- a/net/core/net-sysfs.c
> +++ b/net/core/net-sysfs.c
> @@ -1314,77 +1314,98 @@ static const struct attribute_group dql_group = {
>  #endif /* CONFIG_BQL */
>
>  #ifdef CONFIG_XPS
> -static ssize_t xps_cpus_show(struct netdev_queue *queue,
> -char *buf)
> +/* Should be called with the rtnl lock held. */
> +static int xps_queue_show(struct net_device *dev, unsigned long **mask,
> + unsigned int index, bool is_rxqs_map)
>  {
> -   int cpu, len, ret, num_tc = 1, tc = 0;
> -   struct net_device *dev = queue->dev;
> +   const unsigned long *possible_mask = NULL;
> +   int j, num_tc = 0, tc = 0, ret = 0;
> struct xps_dev_maps *dev_maps;
> -   unsigned long *mask;
> -   unsigned int index;
> -
> -   if (!netif_is_multiqueue(dev))
> -   return -ENOENT;
> -
> -   index = get_netdev_queue_index(queue);
> -
> -   if (!rtnl_trylock())
> -   return restart_syscall();
> +   unsigned int nr_ids;
>
> if (dev->num_tc) {
> /* Do not allow XPS on subordinate device directly */
> num_tc = dev->num_tc;
> -   if (num_tc < 0) {
> -   ret = -EINVAL;
> -   goto err_rtnl_unlock;
> -   }
> +   if (num_tc < 0)
> +   return -EINVAL;
>
> /* If queue belongs to subordinate dev use its map */
> dev = netdev_get_tx_queue(dev, index)->sb_dev ? : dev;
>
> tc = netdev_txq_to_tc(dev, index);
> -   if (tc < 0) {
> -   ret = -EINVAL;
> -   goto err_rtnl_unlock;
> -   }
> +   if (tc < 0)
> +   return -EINVAL;
> }
>
> -   mask = bitmap_zalloc(nr_cpu_ids, GFP_KERNEL);
> -   if (!mask) {
> -   ret = -ENOMEM;
> -   goto err_rtnl_unlock;
> +   rcu_read_lock();
> +
> +   if (is_rxqs_map) {
> +   dev_maps = rcu_dereference(dev->xps_rxqs_map);
> +   nr_ids = dev->num_rx_queues;
> +   } else {
> +   dev_maps = rcu_dereference(dev->xps_cpus_map);
> +   nr_ids = nr_cpu_ids;
> +   if (num_possible_cpus() > 1)
> +   possible_mask = cpumask_bits(cpu_possible_mask);
> }

I was good with what we had up until this point. The issue is that we
are allocating nr_ids for the bitmap in one location, and then
populating it here.

In order to keep this consistent we would need to either hold the
rtnl_lock for the Rx queue count, or call cpus_read_lock to prevent
the number of CPUs from changing. It may be better to look at encoding
the number of IDs into the map first, and then using that value.
Otherwise we need to be holding the appropriate lock or passing the
number of IDs ourselves as an argument.

Also, we can just drop possible_mask. There is no point in carrying
it, and dropping it will simplify the loop below, since we shouldn't
have added the CPU to the map if it wasn't possible to access it.
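
One way to picture the "encode the number of IDs into the map" option
(hypothetical sketch only; the field name is invented for illustration)
is to record the value when the map is built so readers under RCU never
have to re-derive it:

struct xps_dev_maps {
	struct rcu_head rcu;
	unsigned int nr_ids;	/* hypothetical: nr_cpu_ids or
				 * dev->num_rx_queues captured when the
				 * map was created
				 */
	struct xps_map __rcu *attr_map[];
};

The show path would then read dev_maps->nr_ids after the
rcu_dereference() instead of consulting dev->num_rx_queues or
nr_cpu_ids directly.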

> +   if (!dev_maps)
> +   goto rcu_unlock;
>
> -   rcu_read_lock();
> -   dev_maps = rcu_dereference(dev->xps_cpus_map);
> -   if (dev_maps) {
> -   for_each_possible_cpu(cpu) {
> -   int i, tci = cpu * num_tc + tc;
> -   struct xps_map *map;
> -
> -   map = rcu_dereference(dev_maps->attr_map[tci]);
> -   if (!map)
> -   continue;
> -
> -   for (i = map->len; i--;) {
> -   if (map->queues[i] == index) {
> -   set_bit(cpu, mask);
> -   break;
> -   }
> +   for (j = -1; j = netif_attrmask_next(j, possible_mask, nr_ids),
> +j < nr_ids;) {

I would drop the mask check and just work from 0 to nr_ids - 1.
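
In other words the walk would simply become something like this (sketch
only, reusing the names from the quoted patch):

	for (j = 0; j < nr_ids; j++) {
		int i, tci = j * num_tc + tc;
		struct xps_map *map;

		map = rcu_dereference(dev_maps->attr_map[tci]);
		if (!map)
			continue;

		for (i = map->len; i--;) {
			if (map->queues[i] == index) {
				set_bit(j, *mask);	/* bitmap allocated by the caller */
				break;
			}
		}
	}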

> +   int i, tci = j * num_tc + tc
