from:"Ross Lagerwall"

[PATCH] xen/netback: Reset nr_frags before freeing skb

2019-08-05 Thread Ross Lagerwall

At this point nr_frags has been incremented but the frag does not yet
have a page assigned so freeing the skb results in a crash. Reset
nr_frags before freeing the skb to prevent this.

Signed-off-by: Ross Lagerwall 
---
 drivers/net/xen-netback/netback.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/xen-netback/netback.c 
b/drivers/net/xen-netback/netback.c
index 1d9940d4e8c7..c9262ffeefe4 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -925,6 +925,7 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue,
skb_shinfo(skb)->nr_frags = MAX_SKB_FRAGS;
nskb = xenvif_alloc_skb(0);
if (unlikely(nskb == NULL)) {
+   skb_shinfo(skb)->nr_frags = 0;
kfree_skb(skb);
xenvif_tx_err(queue, &txreq, extra_count, idx);
if (net_ratelimit())
@@ -940,6 +941,7 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue,
 
if (xenvif_set_skb_gso(queue->vif, skb, gso)) {
/* Failure in xenvif_set_skb_gso is fatal. */
+   skb_shinfo(skb)->nr_frags = 0;
kfree_skb(skb);
kfree_skb(nskb);
break;
-- 
2.17.2

[PATCH] net: Fix usage of pskb_trim_rcsum

2019-01-17 Thread Ross Lagerwall

In certain cases, pskb_trim_rcsum() may change skb pointers.
Reinitialize header pointers afterwards to avoid potential
use-after-frees. Add a note in the documentation of
pskb_trim_rcsum(). Found by KASAN.

Signed-off-by: Ross Lagerwall 
---
 drivers/net/ppp/pppoe.c  | 1 +
 include/linux/skbuff.h   | 1 +
 net/bridge/br_netfilter_ipv6.c   | 1 +
 net/bridge/netfilter/nft_reject_bridge.c | 1 +
 net/ipv4/ip_input.c  | 1 +
 5 files changed, 5 insertions(+)

diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
index 62dc564b251d..f22639f0116a 100644
--- a/drivers/net/ppp/pppoe.c
+++ b/drivers/net/ppp/pppoe.c
@@ -445,6 +445,7 @@ static int pppoe_rcv(struct sk_buff *skb, struct net_device 
*dev,
if (pskb_trim_rcsum(skb, len))
goto drop;
 
+   ph = pppoe_hdr(skb);
pn = pppoe_pernet(dev_net(dev));
 
/* Note that get_item does a sock_hold(), so sk_pppox(po)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 93f56fddd92a..95d25b010a25 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3218,6 +3218,7 @@ int pskb_trim_rcsum_slow(struct sk_buff *skb, unsigned 
int len);
  *
  * This is exactly the same as pskb_trim except that it ensures the
  * checksum of received packets are still valid after the operation.
+ * It can change skb pointers.
  */
 
 static inline int pskb_trim_rcsum(struct sk_buff *skb, unsigned int len)
diff --git a/net/bridge/br_netfilter_ipv6.c b/net/bridge/br_netfilter_ipv6.c
index 94039f588f1d..564710f88f93 100644
--- a/net/bridge/br_netfilter_ipv6.c
+++ b/net/bridge/br_netfilter_ipv6.c
@@ -131,6 +131,7 @@ int br_validate_ipv6(struct net *net, struct sk_buff *skb)
IPSTATS_MIB_INDISCARDS);
goto drop;
}
+   hdr = ipv6_hdr(skb);
}
if (hdr->nexthdr == NEXTHDR_HOP && br_nf_check_hbh_len(skb))
goto drop;
diff --git a/net/bridge/netfilter/nft_reject_bridge.c 
b/net/bridge/netfilter/nft_reject_bridge.c
index 08cbed7d940e..419e8edf23ba 100644
--- a/net/bridge/netfilter/nft_reject_bridge.c
+++ b/net/bridge/netfilter/nft_reject_bridge.c
@@ -229,6 +229,7 @@ static bool reject6_br_csum_ok(struct sk_buff *skb, int 
hook)
pskb_trim_rcsum(skb, ntohs(ip6h->payload_len) + sizeof(*ip6h)))
return false;
 
+   ip6h = ipv6_hdr(skb);
thoff = ipv6_skip_exthdr(skb, ((u8*)(ip6h+1) - skb->data), &proto, &fo);
if (thoff < 0 || thoff >= skb->len || (fo & htons(~0x7)) != 0)
return false;
diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
index 26921f6b3b92..51d8efba6de2 100644
--- a/net/ipv4/ip_input.c
+++ b/net/ipv4/ip_input.c
@@ -488,6 +488,7 @@ static struct sk_buff *ip_rcv_core(struct sk_buff *skb, 
struct net *net)
goto drop;
}
 
+   iph = ip_hdr(skb);
skb->transport_header = skb->network_header + iph->ihl*4;
 
/* Remove any debris in the socket control block */
-- 
2.17.2

[PATCH] openvswitch: Avoid OOB read when parsing flow nlattrs

2019-01-14 Thread Ross Lagerwall

For nested and variable attributes, the expected length of an attribute
is not known and marked by a negative number.  This results in an OOB
read when the expected length is later used to check if the attribute is
all zeros. Fix this by using the actual length of the attribute rather
than the expected length.

Signed-off-by: Ross Lagerwall 
---
 net/openvswitch/flow_netlink.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index 435a4bdf8f89..691da853bef5 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -500,7 +500,7 @@ static int __parse_flow_nlattrs(const struct nlattr *attr,
return -EINVAL;
}
 
-   if (!nz || !is_all_zero(nla_data(nla), expected_len)) {
+   if (!nz || !is_all_zero(nla_data(nla), nla_len(nla))) {
attrs |= 1 << type;
a[type] = nla;
}
-- 
2.17.2

Re: [PATCH] net: Fix xps_needed inc/dec mismatch

2018-12-07 Thread Ross Lagerwall


On 12/7/18 11:01 AM, Sabrina Dubroca wrote:

Hi Ross,

2018-12-07, 10:16:21 +, Ross Lagerwall wrote:

xps_needed is incremented only when a new dev map is allocated (in
__netif_set_xps_queue). Therefore it should be decremented only when we
actually have a dev map to destroy. Without this, it may be decremented
too many times which causes netif_reset_xps_queues to return early and
not actually clean up the old dev maps. This results in a crash in
__netif_set_xps_queue when it is called later.

The crash occurred when having multiple ixgbe devices in a host. lldpad
would reconfigure them to be FCoE-capable causing reset_xps_queues /
set_xps_queue to be called several times. The xps_needed count would get
out of sync and eventually the above-mentioned crash would occur.

Signed-off-by: Ross Lagerwall 


I posted another patchset recently (commits f28c020fb488 and
867d0ad476db in the "net" tree) for issues in XPS, including broken
xps_needed accounting, so your patch won't apply to David's "net"
tree. Could you try it with your use case, and if you still see
issues, fix them on top? You can grab the latest net tree here:

git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git



Your two commits fix the issue I was seeing. Thanks!

--
Ross Lagerwall

[PATCH] net: Fix xps_needed inc/dec mismatch

2018-12-07 Thread Ross Lagerwall

xps_needed is incremented only when a new dev map is allocated (in
__netif_set_xps_queue). Therefore it should be decremented only when we
actually have a dev map to destroy. Without this, it may be decremented
too many times which causes netif_reset_xps_queues to return early and
not actually clean up the old dev maps. This results in a crash in
__netif_set_xps_queue when it is called later.

The crash occurred when having multiple ixgbe devices in a host. lldpad
would reconfigure them to be FCoE-capable causing reset_xps_queues /
set_xps_queue to be called several times. The xps_needed count would get
out of sync and eventually the above-mentioned crash would occur.

Signed-off-by: Ross Lagerwall 
---
 net/core/dev.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index ddc551f24ba2..8aa72e93af9f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2233,11 +2233,12 @@ static void netif_reset_xps_queues(struct net_device 
*dev, u16 offset,
clean_xps_maps(dev, possible_mask, dev_maps, nr_ids, offset, count,
   false);
 
-out_no_maps:
if (static_key_enabled(&xps_rxqs_needed))
static_key_slow_dec_cpuslocked(&xps_rxqs_needed);
 
static_key_slow_dec_cpuslocked(&xps_needed);
+
+out_no_maps:
mutex_unlock(&xps_map_mutex);
cpus_read_unlock();
 }
-- 
2.17.1

[PATCH] ixgbe: Fix race when the VF driver does a reset

2018-12-05 Thread Ross Lagerwall

When the VF driver does a reset, it (at least the Linux one) writes to
the VFCTRL register to issue a reset and then immediately sends a reset
message using the mailbox API. This is racy because when the PF driver
detects that the VFCTRL register reset pin has been asserted, it clears
the mailbox memory. Depending on ordering, the reset message sent by
the VF could be cleared by the PF driver. It then responds to the
cleared message with a NACK which causes the VF driver to malfunction.
Fix this by deferring clearing the mailbox memory until the reset
message is received.

Fixes: 939b701ad633 ("ixgbe: fix driver behaviour after issuing VFLR")
Signed-off-by: Ross Lagerwall 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
index 5dacfc870259..345701af7749 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
@@ -700,7 +700,6 @@ static inline void ixgbe_vf_reset_event(struct 
ixgbe_adapter *adapter, u32 vf)
u8 num_tcs = adapter->hw_tcs;
u32 reg_val;
u32 queue;
-   u32 word;
 
/* remove VLAN filters beloning to this VF */
ixgbe_clear_vf_vlans(adapter, vf);
@@ -758,6 +757,14 @@ static inline void ixgbe_vf_reset_event(struct 
ixgbe_adapter *adapter, u32 vf)
}
}
 
+   IXGBE_WRITE_FLUSH(hw);
+}
+
+static void ixgbe_vf_clear_mbx(struct ixgbe_adapter *adapter, u32 vf)
+{
+   struct ixgbe_hw *hw = &adapter->hw;
+   u32 word;
+
/* Clear VF's mailbox memory */
for (word = 0; word < IXGBE_VFMAILBOX_SIZE; word++)
IXGBE_WRITE_REG_ARRAY(hw, IXGBE_PFMBMEM(vf), word, 0);
@@ -831,6 +838,8 @@ static int ixgbe_vf_reset_msg(struct ixgbe_adapter 
*adapter, u32 vf)
/* reset the filters for the device */
ixgbe_vf_reset_event(adapter, vf);
 
+   ixgbe_vf_clear_mbx(adapter, vf);
+
/* set vf mac address */
if (!is_zero_ether_addr(vf_mac))
ixgbe_set_vf_mac(adapter, vf, vf_mac);
-- 
2.17.1

Re: [PATCH 2/2] xen-netfront: Fix race between device setup and open

2018-01-11 Thread Ross Lagerwall

On 01/11/2018 04:08 PM, David Miller wrote:

From: Ross Lagerwall 
Date: Thu, 11 Jan 2018 15:49:07 +

I've now added CC'd netdev on the other two.

That doesn't work.

If you don't post the originals explicitly to netdev then it won't
get properly queued in patchwork.

The series fixes two crashes when using netfront. The first is actually 
fixed in Xen common code, not netfront, which is why I only CC'd netdev 
on the second. I can resend the originals to netdev if you want but IMO 
it is not necessary.

--
Ross Lagerwall

Re: [PATCH 1/2] xen/grant-table: Use put_page instead of free_page

2018-01-11 Thread Ross Lagerwall


+CC netdev

On 01/11/2018 09:36 AM, Ross Lagerwall wrote:

The page given to gnttab_end_foreign_access() to free could be a
compound page so use put_page() instead of free_page() since it can
handle both compound and single pages correctly.

This bug was discovered when migrating a Xen VM with several VIFs and
CONFIG_DEBUG_VM enabled. It hits a BUG usually after fewer than 10
iterations. All netfront devices disconnect from the backend during a
suspend/resume and this will call gnttab_end_foreign_access() if a
netfront queue has an outstanding skb. The mismatch between calling
get_page() and free_page() on a compound page causes a reference
counting error which is detected when DEBUG_VM is enabled.

Signed-off-by: Ross Lagerwall 
---
  drivers/xen/grant-table.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index f45114f..27be107 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -382,7 +382,7 @@ static void gnttab_handle_deferred(struct timer_list 
*unused)
if (entry->page) {
pr_debug("freeing g.e. %#x (pfn %#lx)\n",
 entry->ref, page_to_pfn(entry->page));
-   __free_page(entry->page);
+   put_page(entry->page);
} else
pr_info("freeing g.e. %#x\n", entry->ref);
kfree(entry);
@@ -438,7 +438,7 @@ void gnttab_end_foreign_access(grant_ref_t ref, int 
readonly,
if (gnttab_end_foreign_access_ref(ref, readonly)) {
put_free_entry(ref);
if (page != 0)
-   free_page(page);
+   put_page(virt_to_page(page));
} else
gnttab_add_deferred(ref, readonly,
page ? virt_to_page(page) : NULL);

Re: [PATCH 2/2] xen-netfront: Fix race between device setup and open

2018-01-11 Thread Ross Lagerwall

On 01/11/2018 03:26 PM, David Miller wrote:

From: Ross Lagerwall 
Date: Thu, 11 Jan 2018 09:36:38 +

When a netfront device is set up it registers a netdev fairly early on,
before it has set up the queues and is actually usable. A userspace tool
like NetworkManager will immediately try to open it and access its state
as soon as it appears. The bug can be reproduced by hotplugging VIFs
until the VM runs out of grant refs. It registers the netdev but fails
to set up any queues (since there are no more grant refs). In the
meantime, NetworkManager opens the device and the kernel crashes trying
to access the queues (of which there are none).

Fix this in two ways:
* For initial setup, register the netdev much later, after the queues
are setup. This avoids the race entirely.
* During a suspend/resume cycle, the frontend reconnects to the backend
and the queues are recreated. It is possible (though highly unlikely) to
race with something opening the device and accessing the queues after
they have been destroyed but before they have been recreated. Extend the
region covered by the rtnl semaphore to protect against this race. There
is a possibility that we fail to recreate the queues so check for this
in the open function.

Signed-off-by: Ross Lagerwall 

Where is patch 1/2 and the 0/2 header posting which explains what this
patch series is doing, how it is doing it, and why it is doing it that
way?

I've now added CC'd netdev on the other two.

Cheers,
--
Ross Lagerwall

Re: [PATCH 0/2] Fix a couple of crashes in netfront

2018-01-11 Thread Ross Lagerwall


+CC netdev

On 01/11/2018 09:36 AM, Ross Lagerwall wrote:

Here are a couple of patches to fix two crashes in netfront.

Ross Lagerwall (2):
   xen/grant-table: Use put_page instead of free_page
   xen-netfront: Fix race between device setup and open

  drivers/net/xen-netfront.c | 46 --
  drivers/xen/grant-table.c  |  4 ++--
  2 files changed, 26 insertions(+), 24 deletions(-)

[PATCH 2/2] xen-netfront: Fix race between device setup and open

2018-01-11 Thread Ross Lagerwall

When a netfront device is set up it registers a netdev fairly early on,
before it has set up the queues and is actually usable. A userspace tool
like NetworkManager will immediately try to open it and access its state
as soon as it appears. The bug can be reproduced by hotplugging VIFs
until the VM runs out of grant refs. It registers the netdev but fails
to set up any queues (since there are no more grant refs). In the
meantime, NetworkManager opens the device and the kernel crashes trying
to access the queues (of which there are none).

Fix this in two ways:
* For initial setup, register the netdev much later, after the queues
are setup. This avoids the race entirely.
* During a suspend/resume cycle, the frontend reconnects to the backend
and the queues are recreated. It is possible (though highly unlikely) to
race with something opening the device and accessing the queues after
they have been destroyed but before they have been recreated. Extend the
region covered by the rtnl semaphore to protect against this race. There
is a possibility that we fail to recreate the queues so check for this
in the open function.

Signed-off-by: Ross Lagerwall 
---
 drivers/net/xen-netfront.c | 46 --
 1 file changed, 24 insertions(+), 22 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 9bd7dde..8328d39 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -351,6 +351,9 @@ static int xennet_open(struct net_device *dev)
unsigned int i = 0;
struct netfront_queue *queue = NULL;
 
+   if (!np->queues)
+   return -ENODEV;
+
for (i = 0; i < num_queues; ++i) {
queue = &np->queues[i];
napi_enable(&queue->napi);
@@ -1358,18 +1361,8 @@ static int netfront_probe(struct xenbus_device *dev,
 #ifdef CONFIG_SYSFS
info->netdev->sysfs_groups[0] = &xennet_dev_group;
 #endif
-   err = register_netdev(info->netdev);
-   if (err) {
-   pr_warn("%s: register_netdev err=%d\n", __func__, err);
-   goto fail;
-   }
 
return 0;
-
- fail:
-   xennet_free_netdev(netdev);
-   dev_set_drvdata(&dev->dev, NULL);
-   return err;
 }
 
 static void xennet_end_access(int ref, void *page)
@@ -1737,8 +1730,6 @@ static void xennet_destroy_queues(struct netfront_info 
*info)
 {
unsigned int i;
 
-   rtnl_lock();
-
for (i = 0; i < info->netdev->real_num_tx_queues; i++) {
struct netfront_queue *queue = &info->queues[i];
 
@@ -1747,8 +1738,6 @@ static void xennet_destroy_queues(struct netfront_info 
*info)
netif_napi_del(&queue->napi);
}
 
-   rtnl_unlock();
-
kfree(info->queues);
info->queues = NULL;
 }
@@ -1764,8 +1753,6 @@ static int xennet_create_queues(struct netfront_info 
*info,
if (!info->queues)
return -ENOMEM;
 
-   rtnl_lock();
-
for (i = 0; i < *num_queues; i++) {
struct netfront_queue *queue = &info->queues[i];
 
@@ -1774,7 +1761,7 @@ static int xennet_create_queues(struct netfront_info 
*info,
 
ret = xennet_init_queue(queue);
if (ret < 0) {
-   dev_warn(&info->netdev->dev,
+   dev_warn(&info->xbdev->dev,
 "only created %d queues\n", i);
*num_queues = i;
break;
@@ -1788,10 +1775,8 @@ static int xennet_create_queues(struct netfront_info 
*info,
 
netif_set_real_num_tx_queues(info->netdev, *num_queues);
 
-   rtnl_unlock();
-
if (*num_queues == 0) {
-   dev_err(&info->netdev->dev, "no queues\n");
+   dev_err(&info->xbdev->dev, "no queues\n");
return -EINVAL;
}
return 0;
@@ -1828,6 +1813,7 @@ static int talk_to_netback(struct xenbus_device *dev,
goto out;
}
 
+   rtnl_lock();
if (info->queues)
xennet_destroy_queues(info);
 
@@ -1838,6 +1824,7 @@ static int talk_to_netback(struct xenbus_device *dev,
info->queues = NULL;
goto out;
}
+   rtnl_unlock();
 
/* Create shared ring, alloc event channel -- for each queue */
for (i = 0; i < num_queues; ++i) {
@@ -1934,8 +1921,10 @@ static int talk_to_netback(struct xenbus_device *dev,
xenbus_transaction_end(xbt, 1);
  destroy_ring:
xennet_disconnect_backend(info);
+   rtnl_lock();
xennet_destroy_queues(info);
  out:
+   rtnl_unlock();
device_unregister(&dev->dev);
return err;
 }
@@ -1965,6 +1954,15 @@ static int xennet_connect(struct net_device *dev)
netdev_update_features(dev

[PATCH RESEND 4.4-only] netlink: Allow direct reclaim for fallback allocation

2017-05-03 Thread Ross Lagerwall

The backport of d35c99ff77ec ("netlink: do not enter direct reclaim from
netlink_dump()") to the 4.4 branch (first in 4.4.32) mistakenly removed
direct claim from the initial large allocation _and_ the fallback
allocation which means that allocations can spuriously fail.
Fix the issue by adding back the direct reclaim flag to the fallback
allocation.

Fixes: 6d123f1d396b ("netlink: do not enter direct reclaim from netlink_dump()")
Signed-off-by: Ross Lagerwall 
---

Note that this is only for the 4.4 branch as the regression is only in
this branch. Consequently, there is no corresponding upstream commit.

I'm resending this to the linux-stable list since I now understand the
netdev maintainer only handles backports for the last couple of versions
of Linux.

 net/netlink/af_netlink.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 8e33019..acfb16f 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2107,7 +2107,7 @@ static int netlink_dump(struct sock *sk)
if (!skb) {
alloc_size = alloc_min_size;
skb = netlink_alloc_skb(sk, alloc_size, nlk->portid,
-   (GFP_KERNEL & ~__GFP_DIRECT_RECLAIM));
+   GFP_KERNEL);
}
if (!skb)
goto errout_skb;
-- 
2.7.4

[PATCH 4.4-only] netlink: Allow direct reclaim for fallback allocation

2017-04-21 Thread Ross Lagerwall

The backport of d35c99ff77ec ("netlink: do not enter direct reclaim from
netlink_dump()") to the 4.4 branch (first in 4.4.32) mistakenly removed
direct claim from the initial large allocation and the fallback
allocation which means that allocations can spuriously fail.
Fix the issue by adding back the direct reclaim flag to the fallback
allocation.

Fixes: 6d123f1d396b ("netlink: do not enter direct reclaim from netlink_dump()")
Signed-off-by: Ross Lagerwall 
---
 net/netlink/af_netlink.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 8e33019..acfb16f 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2107,7 +2107,7 @@ static int netlink_dump(struct sock *sk)
if (!skb) {
alloc_size = alloc_min_size;
skb = netlink_alloc_skb(sk, alloc_size, nlk->portid,
-   (GFP_KERNEL & ~__GFP_DIRECT_RECLAIM));
+   GFP_KERNEL);
}
if (!skb)
goto errout_skb;
-- 
2.7.4

[PATCH v3] xen-netfront: Improve error handling during initialization

2017-02-08 Thread Ross Lagerwall

This fixes a crash when running out of grant refs when creating many
queues across many netdevs.

* If creating queues fails (i.e. there are no grant refs available),
call xenbus_dev_fatal() to ensure that the xenbus device is set to the
closed state.
* If no queues are created, don't call xennet_disconnect_backend as
netdev->real_num_tx_queues will not have been set correctly.
* If setup_netfront() fails, ensure that all the queues created are
cleaned up, not just those that have been set up.
* If any queues were set up and an error occurs, call
xennet_destroy_queues() to clean up the napi context.
* If any fatal error occurs, unregister and destroy the netdev to avoid
leaving around a half setup network device.

Signed-off-by: Ross Lagerwall 
---

Changed in V3:
* If xennet_create_queues returns < 0, it will not have created any
  queues so there's no need to go to destroy_ring.

 drivers/net/xen-netfront.c | 29 +++--
 1 file changed, 11 insertions(+), 18 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 722fe9f..f90fc72 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -1823,27 +1823,19 @@ static int talk_to_netback(struct xenbus_device *dev,
xennet_destroy_queues(info);
 
err = xennet_create_queues(info, &num_queues);
-   if (err < 0)
-   goto destroy_ring;
+   if (err < 0) {
+   xenbus_dev_fatal(dev, err, "creating queues");
+   kfree(info->queues);
+   info->queues = NULL;
+   goto out;
+   }
 
/* Create shared ring, alloc event channel -- for each queue */
for (i = 0; i < num_queues; ++i) {
queue = &info->queues[i];
err = setup_netfront(dev, queue, feature_split_evtchn);
-   if (err) {
-   /* setup_netfront() will tidy up the current
-* queue on error, but we need to clean up
-* those already allocated.
-*/
-   if (i > 0) {
-   rtnl_lock();
-   netif_set_real_num_tx_queues(info->netdev, i);
-   rtnl_unlock();
-   goto destroy_ring;
-   } else {
-   goto out;
-   }
-   }
+   if (err)
+   goto destroy_ring;
}
 
 again:
@@ -1933,9 +1925,10 @@ static int talk_to_netback(struct xenbus_device *dev,
xenbus_transaction_end(xbt, 1);
  destroy_ring:
xennet_disconnect_backend(info);
-   kfree(info->queues);
-   info->queues = NULL;
+   xennet_destroy_queues(info);
  out:
+   unregister_netdev(info->netdev);
+   xennet_free_netdev(info->netdev);
return err;
 }
 
-- 
2.7.4

Re: [PATCH v2] xen-netfront: Improve error handling during initialization

2017-02-08 Thread Ross Lagerwall


On 02/07/2017 11:33 PM, Boris Ostrovsky wrote:

On 02/07/2017 09:55 AM, Ross Lagerwall wrote:

This fixes a crash when running out of grant refs when creating many
queues across many netdevs.

* If creating queues fails (i.e. there are no grant refs available),
call xenbus_dev_fatal() to ensure that the xenbus device is set to the
closed state.
* If no queues are created, don't call xennet_disconnect_backend as
netdev->real_num_tx_queues will not have been set correctly.
* If setup_netfront() fails, ensure that all the queues created are
cleaned up, not just those that have been set up.
* If any queues were set up and an error occurs, call
xennet_destroy_queues() to clean up the napi context.
* If any fatal error occurs, unregister and destroy the netdev to avoid
leaving around a half setup network device.

Signed-off-by: Ross Lagerwall 
---

Changed in V2:
* Retested on top of v4.10-rc7 + "xen-netfront: Delete rx_refill_timer
  in xennet_disconnect_backend()".
* Don't move setup_timer as it is not necessary.

 drivers/net/xen-netfront.c | 33 +++--
 1 file changed, 15 insertions(+), 18 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 722fe9f..5399a86 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -1823,27 +1823,23 @@ static int talk_to_netback(struct xenbus_device *dev,
xennet_destroy_queues(info);

err = xennet_create_queues(info, &num_queues);
-   if (err < 0)
-   goto destroy_ring;
+   if (err < 0) {
+   xenbus_dev_fatal(dev, err, "creating queues");
+   if (num_queues > 0) {
+   goto destroy_ring;


The only way for us to have (err<0) && (num_queues>0) is when we get a
-ENOMEM right at the top, isn't it? So there is nothing to disconnect or
destroy, it seems to me. And if that's true you can directly 'goto out'.



You're right, although that might make it a bit more fragile if 
something in xennet_create_queues() changes in the future. Nevertheless, 
I'll update the patch.


--
Ross Lagerwall

[PATCH v2] xen-netfront: Improve error handling during initialization

2017-02-07 Thread Ross Lagerwall

This fixes a crash when running out of grant refs when creating many
queues across many netdevs.

* If creating queues fails (i.e. there are no grant refs available),
call xenbus_dev_fatal() to ensure that the xenbus device is set to the
closed state.
* If no queues are created, don't call xennet_disconnect_backend as
netdev->real_num_tx_queues will not have been set correctly.
* If setup_netfront() fails, ensure that all the queues created are
cleaned up, not just those that have been set up.
* If any queues were set up and an error occurs, call
xennet_destroy_queues() to clean up the napi context.
* If any fatal error occurs, unregister and destroy the netdev to avoid
leaving around a half setup network device.

Signed-off-by: Ross Lagerwall 
---

Changed in V2:
* Retested on top of v4.10-rc7 + "xen-netfront: Delete rx_refill_timer
  in xennet_disconnect_backend()".
* Don't move setup_timer as it is not necessary.

 drivers/net/xen-netfront.c | 33 +++--
 1 file changed, 15 insertions(+), 18 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 722fe9f..5399a86 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -1823,27 +1823,23 @@ static int talk_to_netback(struct xenbus_device *dev,
xennet_destroy_queues(info);
 
err = xennet_create_queues(info, &num_queues);
-   if (err < 0)
-   goto destroy_ring;
+   if (err < 0) {
+   xenbus_dev_fatal(dev, err, "creating queues");
+   if (num_queues > 0) {
+   goto destroy_ring;
+   } else {
+   kfree(info->queues);
+   info->queues = NULL;
+   goto out;
+   }
+   }
 
/* Create shared ring, alloc event channel -- for each queue */
for (i = 0; i < num_queues; ++i) {
queue = &info->queues[i];
err = setup_netfront(dev, queue, feature_split_evtchn);
-   if (err) {
-   /* setup_netfront() will tidy up the current
-* queue on error, but we need to clean up
-* those already allocated.
-*/
-   if (i > 0) {
-   rtnl_lock();
-   netif_set_real_num_tx_queues(info->netdev, i);
-   rtnl_unlock();
-   goto destroy_ring;
-   } else {
-   goto out;
-   }
-   }
+   if (err)
+   goto destroy_ring;
}
 
 again:
@@ -1933,9 +1929,10 @@ static int talk_to_netback(struct xenbus_device *dev,
xenbus_transaction_end(xbt, 1);
  destroy_ring:
xennet_disconnect_backend(info);
-   kfree(info->queues);
-   info->queues = NULL;
+   xennet_destroy_queues(info);
  out:
+   unregister_netdev(info->netdev);
+   xennet_free_netdev(info->netdev);
return err;
 }
 
-- 
2.7.4

Re: [PATCH] xen-netfront: Improve error handling during initialization

2017-02-02 Thread Ross Lagerwall


On 02/01/2017 06:54 PM, Boris Ostrovsky wrote:

On 02/01/2017 10:50 AM, Ross Lagerwall wrote:

Improve error handling during initialization. This fixes a crash when
running out of grant refs when creating many queues across many netdevs.

* Delay timer creation so that if initializing a queue fails, the timer
has not been setup yet.
* If creating queues fails (i.e. there are no grant refs available),
call xenbus_dev_fatal() to ensure that the xenbus device is set to the
closed state.
* If no queues are created, don't call xennet_disconnect_backend as
netdev->real_num_tx_queues will not have been set correctly.
* If setup_netfront() fails, ensure that all the queues created are
cleaned up, not just those that have been set up.
* If any queues were set up and an error occurs, call
xennet_destroy_queues() to stop the timer and clean up the napi context.


We need to stop the timer in xennet_disconnect_backend(). I sent a patch
a couple of day ago

https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg03269.html

but was about to resend it with del_timer_sync() moved after
napi_synchronize().



OK, but the patch is still relevant since I believe we still need to 
clean up the napi context in this case (plus the patch fixes a lot of 
other issues).


But I will respin it on top of your patch(es) and re-test it before 
resending.


--
Ross Lagerwall

[PATCH] xen-netfront: Improve error handling during initialization

2017-02-01 Thread Ross Lagerwall

Improve error handling during initialization. This fixes a crash when
running out of grant refs when creating many queues across many netdevs.

* Delay timer creation so that if initializing a queue fails, the timer
has not been setup yet.
* If creating queues fails (i.e. there are no grant refs available),
call xenbus_dev_fatal() to ensure that the xenbus device is set to the
closed state.
* If no queues are created, don't call xennet_disconnect_backend as
netdev->real_num_tx_queues will not have been set correctly.
* If setup_netfront() fails, ensure that all the queues created are
cleaned up, not just those that have been set up.
* If any queues were set up and an error occurs, call
xennet_destroy_queues() to stop the timer and clean up the napi context.
* If any fatal error occurs, unregister and destroy the netdev to avoid
leaving around a half setup network device.

Signed-off-by: Ross Lagerwall 
---
 drivers/net/xen-netfront.c | 39 ++-
 1 file changed, 18 insertions(+), 21 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 8315fe7..8ca85af 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -1596,9 +1596,6 @@ static int xennet_init_queue(struct netfront_queue *queue)
spin_lock_init(&queue->tx_lock);
spin_lock_init(&queue->rx_lock);
 
-   setup_timer(&queue->rx_refill_timer, rx_refill_timeout,
-   (unsigned long)queue);
-
snprintf(queue->name, sizeof(queue->name), "%s-q%u",
 queue->info->netdev->name, queue->id);
 
@@ -1632,6 +1629,9 @@ static int xennet_init_queue(struct netfront_queue *queue)
goto exit_free_tx;
}
 
+   setup_timer(&queue->rx_refill_timer, rx_refill_timeout,
+   (unsigned long)queue);
+
return 0;
 
  exit_free_tx:
@@ -1822,27 +1822,23 @@ static int talk_to_netback(struct xenbus_device *dev,
xennet_destroy_queues(info);
 
err = xennet_create_queues(info, &num_queues);
-   if (err < 0)
-   goto destroy_ring;
+   if (err < 0) {
+   xenbus_dev_fatal(dev, err, "creating queues");
+   if (num_queues > 0) {
+   goto destroy_ring;
+   } else {
+   kfree(info->queues);
+   info->queues = NULL;
+   goto out;
+   }
+   }
 
/* Create shared ring, alloc event channel -- for each queue */
for (i = 0; i < num_queues; ++i) {
queue = &info->queues[i];
err = setup_netfront(dev, queue, feature_split_evtchn);
-   if (err) {
-   /* setup_netfront() will tidy up the current
-* queue on error, but we need to clean up
-* those already allocated.
-*/
-   if (i > 0) {
-   rtnl_lock();
-   netif_set_real_num_tx_queues(info->netdev, i);
-   rtnl_unlock();
-   goto destroy_ring;
-   } else {
-   goto out;
-   }
-   }
+   if (err)
+   goto destroy_ring;
}
 
 again:
@@ -1932,9 +1928,10 @@ static int talk_to_netback(struct xenbus_device *dev,
xenbus_transaction_end(xbt, 1);
  destroy_ring:
xennet_disconnect_backend(info);
-   kfree(info->queues);
-   info->queues = NULL;
+   xennet_destroy_queues(info);
  out:
+   unregister_netdev(info->netdev);
+   xennet_free_netdev(info->netdev);
return err;
 }
 
-- 
2.7.4

[PATCH v2] xen/netback: Wake dealloc thread after completing zerocopy work

2015-08-04 Thread Ross Lagerwall

Waking the dealloc thread before decrementing inflight_packets is racy
because it means the thread may go to sleep before inflight_packets is
decremented. If kthread_stop() has already been called, the dealloc
thread may wait forever with nothing to wake it. Instead, wake the
thread only after decrementing inflight_packets.

Signed-off-by: Ross Lagerwall 
---
Changed in V2: Move wakeup into zerocopy_complete function.

 drivers/net/xen-netback/interface.c | 6 ++
 drivers/net/xen-netback/netback.c   | 1 -
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/xen-netback/interface.c 
b/drivers/net/xen-netback/interface.c
index 1a83e19..28577a3 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -61,6 +61,12 @@ void xenvif_skb_zerocopy_prepare(struct xenvif_queue *queue,
 void xenvif_skb_zerocopy_complete(struct xenvif_queue *queue)
 {
atomic_dec(&queue->inflight_packets);
+
+   /* Wake the dealloc thread _after_ decrementing inflight_packets so
+* that if kthread_stop() has already been called, the dealloc thread
+* does not wait forever with nothing to wake it.
+*/
+   wake_up(&queue->dealloc_wq);
 }
 
 int xenvif_schedulable(struct xenvif *vif)
diff --git a/drivers/net/xen-netback/netback.c 
b/drivers/net/xen-netback/netback.c
index 7d50711..09ffda4 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -1536,7 +1536,6 @@ void xenvif_zerocopy_callback(struct ubuf_info *ubuf, 
bool zerocopy_success)
smp_wmb();
queue->dealloc_prod++;
} while (ubuf);
-   wake_up(&queue->dealloc_wq);
spin_unlock_irqrestore(&queue->callback_lock, flags);
 
if (likely(zerocopy_success))
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] xen/netback: Wake dealloc thread after completing zerocopy work

2015-08-04 Thread Ross Lagerwall

Waking the dealloc thread before decrementing inflight_packets is racy
because it means the thread may go to sleep before inflight_packets is
decremented. If kthread_stop() has already been called, the dealloc
thread may wait forever with nothing to wake it. Instead, wake the
thread only after decrementing inflight_packets.

Signed-off-by: Ross Lagerwall 
---
 drivers/net/xen-netback/netback.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/xen-netback/netback.c 
b/drivers/net/xen-netback/netback.c
index 7d50711..e95ee20 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -1536,7 +1536,6 @@ void xenvif_zerocopy_callback(struct ubuf_info *ubuf, 
bool zerocopy_success)
smp_wmb();
queue->dealloc_prod++;
} while (ubuf);
-   wake_up(&queue->dealloc_wq);
spin_unlock_irqrestore(&queue->callback_lock, flags);
 
if (likely(zerocopy_success))
@@ -1544,6 +1543,7 @@ void xenvif_zerocopy_callback(struct ubuf_info *ubuf, 
bool zerocopy_success)
else
queue->stats.tx_zerocopy_fail++;
xenvif_skb_zerocopy_complete(queue);
+   wake_up(&queue->dealloc_wq);
 }
 
 static inline void xenvif_tx_dealloc_action(struct xenvif_queue *queue)
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] xen-netback: Allocate fraglist early to avoid complex rollback

2015-08-03 Thread Ross Lagerwall

Determine if a fraglist is needed in the tx path, and allocate it if
necessary before setting up the copy and map operations.
Otherwise, undoing the copy and map operations is tricky.

This fixes a use-after-free: if allocating the fraglist failed, the copy
and map operations that had been set up were still executed, writing
over the data area of a freed skb.

Signed-off-by: Ross Lagerwall 
---
 drivers/net/xen-netback/netback.c | 61 +--
 1 file changed, 33 insertions(+), 28 deletions(-)

diff --git a/drivers/net/xen-netback/netback.c 
b/drivers/net/xen-netback/netback.c
index 7d50711..1b406e7 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -810,23 +810,17 @@ static inline struct sk_buff *xenvif_alloc_skb(unsigned 
int size)
 static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif_queue 
*queue,
struct sk_buff *skb,
struct 
xen_netif_tx_request *txp,
-   struct 
gnttab_map_grant_ref *gop)
+   struct 
gnttab_map_grant_ref *gop,
+   unsigned int 
frag_overflow,
+   struct sk_buff *nskb)
 {
struct skb_shared_info *shinfo = skb_shinfo(skb);
skb_frag_t *frags = shinfo->frags;
u16 pending_idx = XENVIF_TX_CB(skb)->pending_idx;
int start;
pending_ring_idx_t index;
-   unsigned int nr_slots, frag_overflow = 0;
+   unsigned int nr_slots;
 
-   /* At this point shinfo->nr_frags is in fact the number of
-* slots, which can be as large as XEN_NETBK_LEGACY_SLOTS_MAX.
-*/
-   if (shinfo->nr_frags > MAX_SKB_FRAGS) {
-   frag_overflow = shinfo->nr_frags - MAX_SKB_FRAGS;
-   BUG_ON(frag_overflow > MAX_SKB_FRAGS);
-   shinfo->nr_frags = MAX_SKB_FRAGS;
-   }
nr_slots = shinfo->nr_frags;
 
/* Skip first skb fragment if it is on same page as header fragment. */
@@ -841,13 +835,6 @@ static struct gnttab_map_grant_ref 
*xenvif_get_requests(struct xenvif_queue *que
}
 
if (frag_overflow) {
-   struct sk_buff *nskb = xenvif_alloc_skb(0);
-   if (unlikely(nskb == NULL)) {
-   if (net_ratelimit())
-   netdev_err(queue->vif->dev,
-  "Can't allocate the frag_list 
skb.\n");
-   return NULL;
-   }
 
shinfo = skb_shinfo(nskb);
frags = shinfo->frags;
@@ -1175,9 +1162,10 @@ static void xenvif_tx_build_gops(struct xenvif_queue 
*queue,
 unsigned *copy_ops,
 unsigned *map_ops)
 {
-   struct gnttab_map_grant_ref *gop = queue->tx_map_ops, *request_gop;
-   struct sk_buff *skb;
+   struct gnttab_map_grant_ref *gop = queue->tx_map_ops;
+   struct sk_buff *skb, *nskb;
int ret;
+   unsigned int frag_overflow;
 
while (skb_queue_len(&queue->tx_queue) < budget) {
struct xen_netif_tx_request txreq;
@@ -1265,6 +1253,29 @@ static void xenvif_tx_build_gops(struct xenvif_queue 
*queue,
break;
}
 
+   skb_shinfo(skb)->nr_frags = ret;
+   if (data_len < txreq.size)
+   skb_shinfo(skb)->nr_frags++;
+   /* At this point shinfo->nr_frags is in fact the number of
+* slots, which can be as large as XEN_NETBK_LEGACY_SLOTS_MAX.
+*/
+   frag_overflow = 0;
+   nskb = NULL;
+   if (skb_shinfo(skb)->nr_frags > MAX_SKB_FRAGS) {
+   frag_overflow = skb_shinfo(skb)->nr_frags - 
MAX_SKB_FRAGS;
+   BUG_ON(frag_overflow > MAX_SKB_FRAGS);
+   skb_shinfo(skb)->nr_frags = MAX_SKB_FRAGS;
+   nskb = xenvif_alloc_skb(0);
+   if (unlikely(nskb == NULL)) {
+   kfree_skb(skb);
+   xenvif_tx_err(queue, &txreq, idx);
+   if (net_ratelimit())
+   netdev_err(queue->vif->dev,
+  "Can't allocate the 
frag_list skb.\n");
+   break;
+   }
+   }
+
if (extras[XEN_NETIF_EXTRA_TYPE_GSO - 1].type) {
struct xen_netif_extra_info *gso;
gso = &extras[XEN_NETIF_EXTRA_TYPE_GSO - 1];
@@ -127

[PATCH net v2] xen/netback: Properly initialize credit_bytes

2015-05-27 Thread Ross Lagerwall

Commit e9ce7cb6b107 ("xen-netback: Factor queue-specific data into queue
struct") introduced a regression when moving queue-specific data into
the queue struct by failing to set the credit_bytes field. This
prevented bandwidth limiting from working. Initialize the field as it
was done before multiqueue support was added.

Signed-off-by: Ross Lagerwall 
---

Changed in v2: Fixed multiple-assignment.

 drivers/net/xen-netback/xenbus.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index 3d8dbf5..fee0241 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -793,6 +793,7 @@ static void connect(struct backend_info *be)
goto err;
}
 
+   queue->credit_bytes = credit_bytes;
queue->remaining_credit = credit_bytes;
queue->credit_usec = credit_usec;
 
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] xen/netback: Reset nr_frags before freeing skb

[PATCH] net: Fix usage of pskb_trim_rcsum

[PATCH] openvswitch: Avoid OOB read when parsing flow nlattrs

Re: [PATCH] net: Fix xps_needed inc/dec mismatch

[PATCH] net: Fix xps_needed inc/dec mismatch

[PATCH] ixgbe: Fix race when the VF driver does a reset

Re: [PATCH 2/2] xen-netfront: Fix race between device setup and open

Re: [PATCH 1/2] xen/grant-table: Use put_page instead of free_page

Re: [PATCH 2/2] xen-netfront: Fix race between device setup and open

Re: [PATCH 0/2] Fix a couple of crashes in netfront

[PATCH 2/2] xen-netfront: Fix race between device setup and open

[PATCH RESEND 4.4-only] netlink: Allow direct reclaim for fallback allocation

[PATCH 4.4-only] netlink: Allow direct reclaim for fallback allocation

[PATCH v3] xen-netfront: Improve error handling during initialization

Re: [PATCH v2] xen-netfront: Improve error handling during initialization

[PATCH v2] xen-netfront: Improve error handling during initialization

Re: [PATCH] xen-netfront: Improve error handling during initialization

[PATCH] xen-netfront: Improve error handling during initialization

[PATCH v2] xen/netback: Wake dealloc thread after completing zerocopy work

[PATCH] xen/netback: Wake dealloc thread after completing zerocopy work

[PATCH] xen-netback: Allocate fraglist early to avoid complex rollback

[PATCH net v2] xen/netback: Properly initialize credit_bytes

22 matches

Site Navigation

Mail list logo

Footer information