[PATCH] xen/netback: Reset nr_frags before freeing skb
At this point nr_frags has been incremented but the frag does not yet have a page assigned so freeing the skb results in a crash. Reset nr_frags before freeing the skb to prevent this. Signed-off-by: Ross Lagerwall --- drivers/net/xen-netback/netback.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c index 1d9940d4e8c7..c9262ffeefe4 100644 --- a/drivers/net/xen-netback/netback.c +++ b/drivers/net/xen-netback/netback.c @@ -925,6 +925,7 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue, skb_shinfo(skb)->nr_frags = MAX_SKB_FRAGS; nskb = xenvif_alloc_skb(0); if (unlikely(nskb == NULL)) { + skb_shinfo(skb)->nr_frags = 0; kfree_skb(skb); xenvif_tx_err(queue, &txreq, extra_count, idx); if (net_ratelimit()) @@ -940,6 +941,7 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue, if (xenvif_set_skb_gso(queue->vif, skb, gso)) { /* Failure in xenvif_set_skb_gso is fatal. */ + skb_shinfo(skb)->nr_frags = 0; kfree_skb(skb); kfree_skb(nskb); break; -- 2.17.2
[PATCH] net: Fix usage of pskb_trim_rcsum
In certain cases, pskb_trim_rcsum() may change skb pointers. Reinitialize header pointers afterwards to avoid potential use-after-frees. Add a note in the documentation of pskb_trim_rcsum(). Found by KASAN. Signed-off-by: Ross Lagerwall --- drivers/net/ppp/pppoe.c | 1 + include/linux/skbuff.h | 1 + net/bridge/br_netfilter_ipv6.c | 1 + net/bridge/netfilter/nft_reject_bridge.c | 1 + net/ipv4/ip_input.c | 1 + 5 files changed, 5 insertions(+) diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c index 62dc564b251d..f22639f0116a 100644 --- a/drivers/net/ppp/pppoe.c +++ b/drivers/net/ppp/pppoe.c @@ -445,6 +445,7 @@ static int pppoe_rcv(struct sk_buff *skb, struct net_device *dev, if (pskb_trim_rcsum(skb, len)) goto drop; + ph = pppoe_hdr(skb); pn = pppoe_pernet(dev_net(dev)); /* Note that get_item does a sock_hold(), so sk_pppox(po) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 93f56fddd92a..95d25b010a25 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -3218,6 +3218,7 @@ int pskb_trim_rcsum_slow(struct sk_buff *skb, unsigned int len); * * This is exactly the same as pskb_trim except that it ensures the * checksum of received packets are still valid after the operation. + * It can change skb pointers. */ static inline int pskb_trim_rcsum(struct sk_buff *skb, unsigned int len) diff --git a/net/bridge/br_netfilter_ipv6.c b/net/bridge/br_netfilter_ipv6.c index 94039f588f1d..564710f88f93 100644 --- a/net/bridge/br_netfilter_ipv6.c +++ b/net/bridge/br_netfilter_ipv6.c @@ -131,6 +131,7 @@ int br_validate_ipv6(struct net *net, struct sk_buff *skb) IPSTATS_MIB_INDISCARDS); goto drop; } + hdr = ipv6_hdr(skb); } if (hdr->nexthdr == NEXTHDR_HOP && br_nf_check_hbh_len(skb)) goto drop; diff --git a/net/bridge/netfilter/nft_reject_bridge.c b/net/bridge/netfilter/nft_reject_bridge.c index 08cbed7d940e..419e8edf23ba 100644 --- a/net/bridge/netfilter/nft_reject_bridge.c +++ b/net/bridge/netfilter/nft_reject_bridge.c @@ -229,6 +229,7 @@ static bool reject6_br_csum_ok(struct sk_buff *skb, int hook) pskb_trim_rcsum(skb, ntohs(ip6h->payload_len) + sizeof(*ip6h))) return false; + ip6h = ipv6_hdr(skb); thoff = ipv6_skip_exthdr(skb, ((u8*)(ip6h+1) - skb->data), &proto, &fo); if (thoff < 0 || thoff >= skb->len || (fo & htons(~0x7)) != 0) return false; diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c index 26921f6b3b92..51d8efba6de2 100644 --- a/net/ipv4/ip_input.c +++ b/net/ipv4/ip_input.c @@ -488,6 +488,7 @@ static struct sk_buff *ip_rcv_core(struct sk_buff *skb, struct net *net) goto drop; } + iph = ip_hdr(skb); skb->transport_header = skb->network_header + iph->ihl*4; /* Remove any debris in the socket control block */ -- 2.17.2
[PATCH] openvswitch: Avoid OOB read when parsing flow nlattrs
For nested and variable attributes, the expected length of an attribute is not known and marked by a negative number. This results in an OOB read when the expected length is later used to check if the attribute is all zeros. Fix this by using the actual length of the attribute rather than the expected length. Signed-off-by: Ross Lagerwall --- net/openvswitch/flow_netlink.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c index 435a4bdf8f89..691da853bef5 100644 --- a/net/openvswitch/flow_netlink.c +++ b/net/openvswitch/flow_netlink.c @@ -500,7 +500,7 @@ static int __parse_flow_nlattrs(const struct nlattr *attr, return -EINVAL; } - if (!nz || !is_all_zero(nla_data(nla), expected_len)) { + if (!nz || !is_all_zero(nla_data(nla), nla_len(nla))) { attrs |= 1 << type; a[type] = nla; } -- 2.17.2
Re: [PATCH] net: Fix xps_needed inc/dec mismatch
On 12/7/18 11:01 AM, Sabrina Dubroca wrote: Hi Ross, 2018-12-07, 10:16:21 +, Ross Lagerwall wrote: xps_needed is incremented only when a new dev map is allocated (in __netif_set_xps_queue). Therefore it should be decremented only when we actually have a dev map to destroy. Without this, it may be decremented too many times which causes netif_reset_xps_queues to return early and not actually clean up the old dev maps. This results in a crash in __netif_set_xps_queue when it is called later. The crash occurred when having multiple ixgbe devices in a host. lldpad would reconfigure them to be FCoE-capable causing reset_xps_queues / set_xps_queue to be called several times. The xps_needed count would get out of sync and eventually the above-mentioned crash would occur. Signed-off-by: Ross Lagerwall I posted another patchset recently (commits f28c020fb488 and 867d0ad476db in the "net" tree) for issues in XPS, including broken xps_needed accounting, so your patch won't apply to David's "net" tree. Could you try it with your use case, and if you still see issues, fix them on top? You can grab the latest net tree here: git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git Your two commits fix the issue I was seeing. Thanks! -- Ross Lagerwall
[PATCH] net: Fix xps_needed inc/dec mismatch
xps_needed is incremented only when a new dev map is allocated (in __netif_set_xps_queue). Therefore it should be decremented only when we actually have a dev map to destroy. Without this, it may be decremented too many times which causes netif_reset_xps_queues to return early and not actually clean up the old dev maps. This results in a crash in __netif_set_xps_queue when it is called later. The crash occurred when having multiple ixgbe devices in a host. lldpad would reconfigure them to be FCoE-capable causing reset_xps_queues / set_xps_queue to be called several times. The xps_needed count would get out of sync and eventually the above-mentioned crash would occur. Signed-off-by: Ross Lagerwall --- net/core/dev.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/core/dev.c b/net/core/dev.c index ddc551f24ba2..8aa72e93af9f 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -2233,11 +2233,12 @@ static void netif_reset_xps_queues(struct net_device *dev, u16 offset, clean_xps_maps(dev, possible_mask, dev_maps, nr_ids, offset, count, false); -out_no_maps: if (static_key_enabled(&xps_rxqs_needed)) static_key_slow_dec_cpuslocked(&xps_rxqs_needed); static_key_slow_dec_cpuslocked(&xps_needed); + +out_no_maps: mutex_unlock(&xps_map_mutex); cpus_read_unlock(); } -- 2.17.1
[PATCH] ixgbe: Fix race when the VF driver does a reset
When the VF driver does a reset, it (at least the Linux one) writes to the VFCTRL register to issue a reset and then immediately sends a reset message using the mailbox API. This is racy because when the PF driver detects that the VFCTRL register reset pin has been asserted, it clears the mailbox memory. Depending on ordering, the reset message sent by the VF could be cleared by the PF driver. It then responds to the cleared message with a NACK which causes the VF driver to malfunction. Fix this by deferring clearing the mailbox memory until the reset message is received. Fixes: 939b701ad633 ("ixgbe: fix driver behaviour after issuing VFLR") Signed-off-by: Ross Lagerwall --- drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c index 5dacfc870259..345701af7749 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c @@ -700,7 +700,6 @@ static inline void ixgbe_vf_reset_event(struct ixgbe_adapter *adapter, u32 vf) u8 num_tcs = adapter->hw_tcs; u32 reg_val; u32 queue; - u32 word; /* remove VLAN filters beloning to this VF */ ixgbe_clear_vf_vlans(adapter, vf); @@ -758,6 +757,14 @@ static inline void ixgbe_vf_reset_event(struct ixgbe_adapter *adapter, u32 vf) } } + IXGBE_WRITE_FLUSH(hw); +} + +static void ixgbe_vf_clear_mbx(struct ixgbe_adapter *adapter, u32 vf) +{ + struct ixgbe_hw *hw = &adapter->hw; + u32 word; + /* Clear VF's mailbox memory */ for (word = 0; word < IXGBE_VFMAILBOX_SIZE; word++) IXGBE_WRITE_REG_ARRAY(hw, IXGBE_PFMBMEM(vf), word, 0); @@ -831,6 +838,8 @@ static int ixgbe_vf_reset_msg(struct ixgbe_adapter *adapter, u32 vf) /* reset the filters for the device */ ixgbe_vf_reset_event(adapter, vf); + ixgbe_vf_clear_mbx(adapter, vf); + /* set vf mac address */ if (!is_zero_ether_addr(vf_mac)) ixgbe_set_vf_mac(adapter, vf, vf_mac); -- 2.17.1
Re: [PATCH 2/2] xen-netfront: Fix race between device setup and open
On 01/11/2018 04:08 PM, David Miller wrote: From: Ross Lagerwall Date: Thu, 11 Jan 2018 15:49:07 + I've now added CC'd netdev on the other two. That doesn't work. If you don't post the originals explicitly to netdev then it won't get properly queued in patchwork. The series fixes two crashes when using netfront. The first is actually fixed in Xen common code, not netfront, which is why I only CC'd netdev on the second. I can resend the originals to netdev if you want but IMO it is not necessary. -- Ross Lagerwall
Re: [PATCH 1/2] xen/grant-table: Use put_page instead of free_page
+CC netdev On 01/11/2018 09:36 AM, Ross Lagerwall wrote: The page given to gnttab_end_foreign_access() to free could be a compound page so use put_page() instead of free_page() since it can handle both compound and single pages correctly. This bug was discovered when migrating a Xen VM with several VIFs and CONFIG_DEBUG_VM enabled. It hits a BUG usually after fewer than 10 iterations. All netfront devices disconnect from the backend during a suspend/resume and this will call gnttab_end_foreign_access() if a netfront queue has an outstanding skb. The mismatch between calling get_page() and free_page() on a compound page causes a reference counting error which is detected when DEBUG_VM is enabled. Signed-off-by: Ross Lagerwall --- drivers/xen/grant-table.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c index f45114f..27be107 100644 --- a/drivers/xen/grant-table.c +++ b/drivers/xen/grant-table.c @@ -382,7 +382,7 @@ static void gnttab_handle_deferred(struct timer_list *unused) if (entry->page) { pr_debug("freeing g.e. %#x (pfn %#lx)\n", entry->ref, page_to_pfn(entry->page)); - __free_page(entry->page); + put_page(entry->page); } else pr_info("freeing g.e. %#x\n", entry->ref); kfree(entry); @@ -438,7 +438,7 @@ void gnttab_end_foreign_access(grant_ref_t ref, int readonly, if (gnttab_end_foreign_access_ref(ref, readonly)) { put_free_entry(ref); if (page != 0) - free_page(page); + put_page(virt_to_page(page)); } else gnttab_add_deferred(ref, readonly, page ? virt_to_page(page) : NULL);
Re: [PATCH 2/2] xen-netfront: Fix race between device setup and open
On 01/11/2018 03:26 PM, David Miller wrote: From: Ross Lagerwall Date: Thu, 11 Jan 2018 09:36:38 + When a netfront device is set up it registers a netdev fairly early on, before it has set up the queues and is actually usable. A userspace tool like NetworkManager will immediately try to open it and access its state as soon as it appears. The bug can be reproduced by hotplugging VIFs until the VM runs out of grant refs. It registers the netdev but fails to set up any queues (since there are no more grant refs). In the meantime, NetworkManager opens the device and the kernel crashes trying to access the queues (of which there are none). Fix this in two ways: * For initial setup, register the netdev much later, after the queues are setup. This avoids the race entirely. * During a suspend/resume cycle, the frontend reconnects to the backend and the queues are recreated. It is possible (though highly unlikely) to race with something opening the device and accessing the queues after they have been destroyed but before they have been recreated. Extend the region covered by the rtnl semaphore to protect against this race. There is a possibility that we fail to recreate the queues so check for this in the open function. Signed-off-by: Ross Lagerwall Where is patch 1/2 and the 0/2 header posting which explains what this patch series is doing, how it is doing it, and why it is doing it that way? I've now added CC'd netdev on the other two. Cheers, -- Ross Lagerwall
Re: [PATCH 0/2] Fix a couple of crashes in netfront
+CC netdev On 01/11/2018 09:36 AM, Ross Lagerwall wrote: Here are a couple of patches to fix two crashes in netfront. Ross Lagerwall (2): xen/grant-table: Use put_page instead of free_page xen-netfront: Fix race between device setup and open drivers/net/xen-netfront.c | 46 -- drivers/xen/grant-table.c | 4 ++-- 2 files changed, 26 insertions(+), 24 deletions(-)
[PATCH 2/2] xen-netfront: Fix race between device setup and open
When a netfront device is set up it registers a netdev fairly early on, before it has set up the queues and is actually usable. A userspace tool like NetworkManager will immediately try to open it and access its state as soon as it appears. The bug can be reproduced by hotplugging VIFs until the VM runs out of grant refs. It registers the netdev but fails to set up any queues (since there are no more grant refs). In the meantime, NetworkManager opens the device and the kernel crashes trying to access the queues (of which there are none). Fix this in two ways: * For initial setup, register the netdev much later, after the queues are setup. This avoids the race entirely. * During a suspend/resume cycle, the frontend reconnects to the backend and the queues are recreated. It is possible (though highly unlikely) to race with something opening the device and accessing the queues after they have been destroyed but before they have been recreated. Extend the region covered by the rtnl semaphore to protect against this race. There is a possibility that we fail to recreate the queues so check for this in the open function. Signed-off-by: Ross Lagerwall --- drivers/net/xen-netfront.c | 46 -- 1 file changed, 24 insertions(+), 22 deletions(-) diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c index 9bd7dde..8328d39 100644 --- a/drivers/net/xen-netfront.c +++ b/drivers/net/xen-netfront.c @@ -351,6 +351,9 @@ static int xennet_open(struct net_device *dev) unsigned int i = 0; struct netfront_queue *queue = NULL; + if (!np->queues) + return -ENODEV; + for (i = 0; i < num_queues; ++i) { queue = &np->queues[i]; napi_enable(&queue->napi); @@ -1358,18 +1361,8 @@ static int netfront_probe(struct xenbus_device *dev, #ifdef CONFIG_SYSFS info->netdev->sysfs_groups[0] = &xennet_dev_group; #endif - err = register_netdev(info->netdev); - if (err) { - pr_warn("%s: register_netdev err=%d\n", __func__, err); - goto fail; - } return 0; - - fail: - xennet_free_netdev(netdev); - dev_set_drvdata(&dev->dev, NULL); - return err; } static void xennet_end_access(int ref, void *page) @@ -1737,8 +1730,6 @@ static void xennet_destroy_queues(struct netfront_info *info) { unsigned int i; - rtnl_lock(); - for (i = 0; i < info->netdev->real_num_tx_queues; i++) { struct netfront_queue *queue = &info->queues[i]; @@ -1747,8 +1738,6 @@ static void xennet_destroy_queues(struct netfront_info *info) netif_napi_del(&queue->napi); } - rtnl_unlock(); - kfree(info->queues); info->queues = NULL; } @@ -1764,8 +1753,6 @@ static int xennet_create_queues(struct netfront_info *info, if (!info->queues) return -ENOMEM; - rtnl_lock(); - for (i = 0; i < *num_queues; i++) { struct netfront_queue *queue = &info->queues[i]; @@ -1774,7 +1761,7 @@ static int xennet_create_queues(struct netfront_info *info, ret = xennet_init_queue(queue); if (ret < 0) { - dev_warn(&info->netdev->dev, + dev_warn(&info->xbdev->dev, "only created %d queues\n", i); *num_queues = i; break; @@ -1788,10 +1775,8 @@ static int xennet_create_queues(struct netfront_info *info, netif_set_real_num_tx_queues(info->netdev, *num_queues); - rtnl_unlock(); - if (*num_queues == 0) { - dev_err(&info->netdev->dev, "no queues\n"); + dev_err(&info->xbdev->dev, "no queues\n"); return -EINVAL; } return 0; @@ -1828,6 +1813,7 @@ static int talk_to_netback(struct xenbus_device *dev, goto out; } + rtnl_lock(); if (info->queues) xennet_destroy_queues(info); @@ -1838,6 +1824,7 @@ static int talk_to_netback(struct xenbus_device *dev, info->queues = NULL; goto out; } + rtnl_unlock(); /* Create shared ring, alloc event channel -- for each queue */ for (i = 0; i < num_queues; ++i) { @@ -1934,8 +1921,10 @@ static int talk_to_netback(struct xenbus_device *dev, xenbus_transaction_end(xbt, 1); destroy_ring: xennet_disconnect_backend(info); + rtnl_lock(); xennet_destroy_queues(info); out: + rtnl_unlock(); device_unregister(&dev->dev); return err; } @@ -1965,6 +1954,15 @@ static int xennet_connect(struct net_device *dev) netdev_update_features(dev
[PATCH RESEND 4.4-only] netlink: Allow direct reclaim for fallback allocation
The backport of d35c99ff77ec ("netlink: do not enter direct reclaim from netlink_dump()") to the 4.4 branch (first in 4.4.32) mistakenly removed direct claim from the initial large allocation _and_ the fallback allocation which means that allocations can spuriously fail. Fix the issue by adding back the direct reclaim flag to the fallback allocation. Fixes: 6d123f1d396b ("netlink: do not enter direct reclaim from netlink_dump()") Signed-off-by: Ross Lagerwall --- Note that this is only for the 4.4 branch as the regression is only in this branch. Consequently, there is no corresponding upstream commit. I'm resending this to the linux-stable list since I now understand the netdev maintainer only handles backports for the last couple of versions of Linux. net/netlink/af_netlink.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index 8e33019..acfb16f 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -2107,7 +2107,7 @@ static int netlink_dump(struct sock *sk) if (!skb) { alloc_size = alloc_min_size; skb = netlink_alloc_skb(sk, alloc_size, nlk->portid, - (GFP_KERNEL & ~__GFP_DIRECT_RECLAIM)); + GFP_KERNEL); } if (!skb) goto errout_skb; -- 2.7.4
[PATCH 4.4-only] netlink: Allow direct reclaim for fallback allocation
The backport of d35c99ff77ec ("netlink: do not enter direct reclaim from netlink_dump()") to the 4.4 branch (first in 4.4.32) mistakenly removed direct claim from the initial large allocation and the fallback allocation which means that allocations can spuriously fail. Fix the issue by adding back the direct reclaim flag to the fallback allocation. Fixes: 6d123f1d396b ("netlink: do not enter direct reclaim from netlink_dump()") Signed-off-by: Ross Lagerwall --- net/netlink/af_netlink.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index 8e33019..acfb16f 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -2107,7 +2107,7 @@ static int netlink_dump(struct sock *sk) if (!skb) { alloc_size = alloc_min_size; skb = netlink_alloc_skb(sk, alloc_size, nlk->portid, - (GFP_KERNEL & ~__GFP_DIRECT_RECLAIM)); + GFP_KERNEL); } if (!skb) goto errout_skb; -- 2.7.4
[PATCH v3] xen-netfront: Improve error handling during initialization
This fixes a crash when running out of grant refs when creating many queues across many netdevs. * If creating queues fails (i.e. there are no grant refs available), call xenbus_dev_fatal() to ensure that the xenbus device is set to the closed state. * If no queues are created, don't call xennet_disconnect_backend as netdev->real_num_tx_queues will not have been set correctly. * If setup_netfront() fails, ensure that all the queues created are cleaned up, not just those that have been set up. * If any queues were set up and an error occurs, call xennet_destroy_queues() to clean up the napi context. * If any fatal error occurs, unregister and destroy the netdev to avoid leaving around a half setup network device. Signed-off-by: Ross Lagerwall --- Changed in V3: * If xennet_create_queues returns < 0, it will not have created any queues so there's no need to go to destroy_ring. drivers/net/xen-netfront.c | 29 +++-- 1 file changed, 11 insertions(+), 18 deletions(-) diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c index 722fe9f..f90fc72 100644 --- a/drivers/net/xen-netfront.c +++ b/drivers/net/xen-netfront.c @@ -1823,27 +1823,19 @@ static int talk_to_netback(struct xenbus_device *dev, xennet_destroy_queues(info); err = xennet_create_queues(info, &num_queues); - if (err < 0) - goto destroy_ring; + if (err < 0) { + xenbus_dev_fatal(dev, err, "creating queues"); + kfree(info->queues); + info->queues = NULL; + goto out; + } /* Create shared ring, alloc event channel -- for each queue */ for (i = 0; i < num_queues; ++i) { queue = &info->queues[i]; err = setup_netfront(dev, queue, feature_split_evtchn); - if (err) { - /* setup_netfront() will tidy up the current -* queue on error, but we need to clean up -* those already allocated. -*/ - if (i > 0) { - rtnl_lock(); - netif_set_real_num_tx_queues(info->netdev, i); - rtnl_unlock(); - goto destroy_ring; - } else { - goto out; - } - } + if (err) + goto destroy_ring; } again: @@ -1933,9 +1925,10 @@ static int talk_to_netback(struct xenbus_device *dev, xenbus_transaction_end(xbt, 1); destroy_ring: xennet_disconnect_backend(info); - kfree(info->queues); - info->queues = NULL; + xennet_destroy_queues(info); out: + unregister_netdev(info->netdev); + xennet_free_netdev(info->netdev); return err; } -- 2.7.4
Re: [PATCH v2] xen-netfront: Improve error handling during initialization
On 02/07/2017 11:33 PM, Boris Ostrovsky wrote: On 02/07/2017 09:55 AM, Ross Lagerwall wrote: This fixes a crash when running out of grant refs when creating many queues across many netdevs. * If creating queues fails (i.e. there are no grant refs available), call xenbus_dev_fatal() to ensure that the xenbus device is set to the closed state. * If no queues are created, don't call xennet_disconnect_backend as netdev->real_num_tx_queues will not have been set correctly. * If setup_netfront() fails, ensure that all the queues created are cleaned up, not just those that have been set up. * If any queues were set up and an error occurs, call xennet_destroy_queues() to clean up the napi context. * If any fatal error occurs, unregister and destroy the netdev to avoid leaving around a half setup network device. Signed-off-by: Ross Lagerwall --- Changed in V2: * Retested on top of v4.10-rc7 + "xen-netfront: Delete rx_refill_timer in xennet_disconnect_backend()". * Don't move setup_timer as it is not necessary. drivers/net/xen-netfront.c | 33 +++-- 1 file changed, 15 insertions(+), 18 deletions(-) diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c index 722fe9f..5399a86 100644 --- a/drivers/net/xen-netfront.c +++ b/drivers/net/xen-netfront.c @@ -1823,27 +1823,23 @@ static int talk_to_netback(struct xenbus_device *dev, xennet_destroy_queues(info); err = xennet_create_queues(info, &num_queues); - if (err < 0) - goto destroy_ring; + if (err < 0) { + xenbus_dev_fatal(dev, err, "creating queues"); + if (num_queues > 0) { + goto destroy_ring; The only way for us to have (err<0) && (num_queues>0) is when we get a -ENOMEM right at the top, isn't it? So there is nothing to disconnect or destroy, it seems to me. And if that's true you can directly 'goto out'. You're right, although that might make it a bit more fragile if something in xennet_create_queues() changes in the future. Nevertheless, I'll update the patch. -- Ross Lagerwall
[PATCH v2] xen-netfront: Improve error handling during initialization
This fixes a crash when running out of grant refs when creating many queues across many netdevs. * If creating queues fails (i.e. there are no grant refs available), call xenbus_dev_fatal() to ensure that the xenbus device is set to the closed state. * If no queues are created, don't call xennet_disconnect_backend as netdev->real_num_tx_queues will not have been set correctly. * If setup_netfront() fails, ensure that all the queues created are cleaned up, not just those that have been set up. * If any queues were set up and an error occurs, call xennet_destroy_queues() to clean up the napi context. * If any fatal error occurs, unregister and destroy the netdev to avoid leaving around a half setup network device. Signed-off-by: Ross Lagerwall --- Changed in V2: * Retested on top of v4.10-rc7 + "xen-netfront: Delete rx_refill_timer in xennet_disconnect_backend()". * Don't move setup_timer as it is not necessary. drivers/net/xen-netfront.c | 33 +++-- 1 file changed, 15 insertions(+), 18 deletions(-) diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c index 722fe9f..5399a86 100644 --- a/drivers/net/xen-netfront.c +++ b/drivers/net/xen-netfront.c @@ -1823,27 +1823,23 @@ static int talk_to_netback(struct xenbus_device *dev, xennet_destroy_queues(info); err = xennet_create_queues(info, &num_queues); - if (err < 0) - goto destroy_ring; + if (err < 0) { + xenbus_dev_fatal(dev, err, "creating queues"); + if (num_queues > 0) { + goto destroy_ring; + } else { + kfree(info->queues); + info->queues = NULL; + goto out; + } + } /* Create shared ring, alloc event channel -- for each queue */ for (i = 0; i < num_queues; ++i) { queue = &info->queues[i]; err = setup_netfront(dev, queue, feature_split_evtchn); - if (err) { - /* setup_netfront() will tidy up the current -* queue on error, but we need to clean up -* those already allocated. -*/ - if (i > 0) { - rtnl_lock(); - netif_set_real_num_tx_queues(info->netdev, i); - rtnl_unlock(); - goto destroy_ring; - } else { - goto out; - } - } + if (err) + goto destroy_ring; } again: @@ -1933,9 +1929,10 @@ static int talk_to_netback(struct xenbus_device *dev, xenbus_transaction_end(xbt, 1); destroy_ring: xennet_disconnect_backend(info); - kfree(info->queues); - info->queues = NULL; + xennet_destroy_queues(info); out: + unregister_netdev(info->netdev); + xennet_free_netdev(info->netdev); return err; } -- 2.7.4
Re: [PATCH] xen-netfront: Improve error handling during initialization
On 02/01/2017 06:54 PM, Boris Ostrovsky wrote: On 02/01/2017 10:50 AM, Ross Lagerwall wrote: Improve error handling during initialization. This fixes a crash when running out of grant refs when creating many queues across many netdevs. * Delay timer creation so that if initializing a queue fails, the timer has not been setup yet. * If creating queues fails (i.e. there are no grant refs available), call xenbus_dev_fatal() to ensure that the xenbus device is set to the closed state. * If no queues are created, don't call xennet_disconnect_backend as netdev->real_num_tx_queues will not have been set correctly. * If setup_netfront() fails, ensure that all the queues created are cleaned up, not just those that have been set up. * If any queues were set up and an error occurs, call xennet_destroy_queues() to stop the timer and clean up the napi context. We need to stop the timer in xennet_disconnect_backend(). I sent a patch a couple of day ago https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg03269.html but was about to resend it with del_timer_sync() moved after napi_synchronize(). OK, but the patch is still relevant since I believe we still need to clean up the napi context in this case (plus the patch fixes a lot of other issues). But I will respin it on top of your patch(es) and re-test it before resending. -- Ross Lagerwall
[PATCH] xen-netfront: Improve error handling during initialization
Improve error handling during initialization. This fixes a crash when running out of grant refs when creating many queues across many netdevs. * Delay timer creation so that if initializing a queue fails, the timer has not been setup yet. * If creating queues fails (i.e. there are no grant refs available), call xenbus_dev_fatal() to ensure that the xenbus device is set to the closed state. * If no queues are created, don't call xennet_disconnect_backend as netdev->real_num_tx_queues will not have been set correctly. * If setup_netfront() fails, ensure that all the queues created are cleaned up, not just those that have been set up. * If any queues were set up and an error occurs, call xennet_destroy_queues() to stop the timer and clean up the napi context. * If any fatal error occurs, unregister and destroy the netdev to avoid leaving around a half setup network device. Signed-off-by: Ross Lagerwall --- drivers/net/xen-netfront.c | 39 ++- 1 file changed, 18 insertions(+), 21 deletions(-) diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c index 8315fe7..8ca85af 100644 --- a/drivers/net/xen-netfront.c +++ b/drivers/net/xen-netfront.c @@ -1596,9 +1596,6 @@ static int xennet_init_queue(struct netfront_queue *queue) spin_lock_init(&queue->tx_lock); spin_lock_init(&queue->rx_lock); - setup_timer(&queue->rx_refill_timer, rx_refill_timeout, - (unsigned long)queue); - snprintf(queue->name, sizeof(queue->name), "%s-q%u", queue->info->netdev->name, queue->id); @@ -1632,6 +1629,9 @@ static int xennet_init_queue(struct netfront_queue *queue) goto exit_free_tx; } + setup_timer(&queue->rx_refill_timer, rx_refill_timeout, + (unsigned long)queue); + return 0; exit_free_tx: @@ -1822,27 +1822,23 @@ static int talk_to_netback(struct xenbus_device *dev, xennet_destroy_queues(info); err = xennet_create_queues(info, &num_queues); - if (err < 0) - goto destroy_ring; + if (err < 0) { + xenbus_dev_fatal(dev, err, "creating queues"); + if (num_queues > 0) { + goto destroy_ring; + } else { + kfree(info->queues); + info->queues = NULL; + goto out; + } + } /* Create shared ring, alloc event channel -- for each queue */ for (i = 0; i < num_queues; ++i) { queue = &info->queues[i]; err = setup_netfront(dev, queue, feature_split_evtchn); - if (err) { - /* setup_netfront() will tidy up the current -* queue on error, but we need to clean up -* those already allocated. -*/ - if (i > 0) { - rtnl_lock(); - netif_set_real_num_tx_queues(info->netdev, i); - rtnl_unlock(); - goto destroy_ring; - } else { - goto out; - } - } + if (err) + goto destroy_ring; } again: @@ -1932,9 +1928,10 @@ static int talk_to_netback(struct xenbus_device *dev, xenbus_transaction_end(xbt, 1); destroy_ring: xennet_disconnect_backend(info); - kfree(info->queues); - info->queues = NULL; + xennet_destroy_queues(info); out: + unregister_netdev(info->netdev); + xennet_free_netdev(info->netdev); return err; } -- 2.7.4
[PATCH v2] xen/netback: Wake dealloc thread after completing zerocopy work
Waking the dealloc thread before decrementing inflight_packets is racy because it means the thread may go to sleep before inflight_packets is decremented. If kthread_stop() has already been called, the dealloc thread may wait forever with nothing to wake it. Instead, wake the thread only after decrementing inflight_packets. Signed-off-by: Ross Lagerwall --- Changed in V2: Move wakeup into zerocopy_complete function. drivers/net/xen-netback/interface.c | 6 ++ drivers/net/xen-netback/netback.c | 1 - 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c index 1a83e19..28577a3 100644 --- a/drivers/net/xen-netback/interface.c +++ b/drivers/net/xen-netback/interface.c @@ -61,6 +61,12 @@ void xenvif_skb_zerocopy_prepare(struct xenvif_queue *queue, void xenvif_skb_zerocopy_complete(struct xenvif_queue *queue) { atomic_dec(&queue->inflight_packets); + + /* Wake the dealloc thread _after_ decrementing inflight_packets so +* that if kthread_stop() has already been called, the dealloc thread +* does not wait forever with nothing to wake it. +*/ + wake_up(&queue->dealloc_wq); } int xenvif_schedulable(struct xenvif *vif) diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c index 7d50711..09ffda4 100644 --- a/drivers/net/xen-netback/netback.c +++ b/drivers/net/xen-netback/netback.c @@ -1536,7 +1536,6 @@ void xenvif_zerocopy_callback(struct ubuf_info *ubuf, bool zerocopy_success) smp_wmb(); queue->dealloc_prod++; } while (ubuf); - wake_up(&queue->dealloc_wq); spin_unlock_irqrestore(&queue->callback_lock, flags); if (likely(zerocopy_success)) -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] xen/netback: Wake dealloc thread after completing zerocopy work
Waking the dealloc thread before decrementing inflight_packets is racy because it means the thread may go to sleep before inflight_packets is decremented. If kthread_stop() has already been called, the dealloc thread may wait forever with nothing to wake it. Instead, wake the thread only after decrementing inflight_packets. Signed-off-by: Ross Lagerwall --- drivers/net/xen-netback/netback.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c index 7d50711..e95ee20 100644 --- a/drivers/net/xen-netback/netback.c +++ b/drivers/net/xen-netback/netback.c @@ -1536,7 +1536,6 @@ void xenvif_zerocopy_callback(struct ubuf_info *ubuf, bool zerocopy_success) smp_wmb(); queue->dealloc_prod++; } while (ubuf); - wake_up(&queue->dealloc_wq); spin_unlock_irqrestore(&queue->callback_lock, flags); if (likely(zerocopy_success)) @@ -1544,6 +1543,7 @@ void xenvif_zerocopy_callback(struct ubuf_info *ubuf, bool zerocopy_success) else queue->stats.tx_zerocopy_fail++; xenvif_skb_zerocopy_complete(queue); + wake_up(&queue->dealloc_wq); } static inline void xenvif_tx_dealloc_action(struct xenvif_queue *queue) -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] xen-netback: Allocate fraglist early to avoid complex rollback
Determine if a fraglist is needed in the tx path, and allocate it if necessary before setting up the copy and map operations. Otherwise, undoing the copy and map operations is tricky. This fixes a use-after-free: if allocating the fraglist failed, the copy and map operations that had been set up were still executed, writing over the data area of a freed skb. Signed-off-by: Ross Lagerwall --- drivers/net/xen-netback/netback.c | 61 +-- 1 file changed, 33 insertions(+), 28 deletions(-) diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c index 7d50711..1b406e7 100644 --- a/drivers/net/xen-netback/netback.c +++ b/drivers/net/xen-netback/netback.c @@ -810,23 +810,17 @@ static inline struct sk_buff *xenvif_alloc_skb(unsigned int size) static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif_queue *queue, struct sk_buff *skb, struct xen_netif_tx_request *txp, - struct gnttab_map_grant_ref *gop) + struct gnttab_map_grant_ref *gop, + unsigned int frag_overflow, + struct sk_buff *nskb) { struct skb_shared_info *shinfo = skb_shinfo(skb); skb_frag_t *frags = shinfo->frags; u16 pending_idx = XENVIF_TX_CB(skb)->pending_idx; int start; pending_ring_idx_t index; - unsigned int nr_slots, frag_overflow = 0; + unsigned int nr_slots; - /* At this point shinfo->nr_frags is in fact the number of -* slots, which can be as large as XEN_NETBK_LEGACY_SLOTS_MAX. -*/ - if (shinfo->nr_frags > MAX_SKB_FRAGS) { - frag_overflow = shinfo->nr_frags - MAX_SKB_FRAGS; - BUG_ON(frag_overflow > MAX_SKB_FRAGS); - shinfo->nr_frags = MAX_SKB_FRAGS; - } nr_slots = shinfo->nr_frags; /* Skip first skb fragment if it is on same page as header fragment. */ @@ -841,13 +835,6 @@ static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif_queue *que } if (frag_overflow) { - struct sk_buff *nskb = xenvif_alloc_skb(0); - if (unlikely(nskb == NULL)) { - if (net_ratelimit()) - netdev_err(queue->vif->dev, - "Can't allocate the frag_list skb.\n"); - return NULL; - } shinfo = skb_shinfo(nskb); frags = shinfo->frags; @@ -1175,9 +1162,10 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue, unsigned *copy_ops, unsigned *map_ops) { - struct gnttab_map_grant_ref *gop = queue->tx_map_ops, *request_gop; - struct sk_buff *skb; + struct gnttab_map_grant_ref *gop = queue->tx_map_ops; + struct sk_buff *skb, *nskb; int ret; + unsigned int frag_overflow; while (skb_queue_len(&queue->tx_queue) < budget) { struct xen_netif_tx_request txreq; @@ -1265,6 +1253,29 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue, break; } + skb_shinfo(skb)->nr_frags = ret; + if (data_len < txreq.size) + skb_shinfo(skb)->nr_frags++; + /* At this point shinfo->nr_frags is in fact the number of +* slots, which can be as large as XEN_NETBK_LEGACY_SLOTS_MAX. +*/ + frag_overflow = 0; + nskb = NULL; + if (skb_shinfo(skb)->nr_frags > MAX_SKB_FRAGS) { + frag_overflow = skb_shinfo(skb)->nr_frags - MAX_SKB_FRAGS; + BUG_ON(frag_overflow > MAX_SKB_FRAGS); + skb_shinfo(skb)->nr_frags = MAX_SKB_FRAGS; + nskb = xenvif_alloc_skb(0); + if (unlikely(nskb == NULL)) { + kfree_skb(skb); + xenvif_tx_err(queue, &txreq, idx); + if (net_ratelimit()) + netdev_err(queue->vif->dev, + "Can't allocate the frag_list skb.\n"); + break; + } + } + if (extras[XEN_NETIF_EXTRA_TYPE_GSO - 1].type) { struct xen_netif_extra_info *gso; gso = &extras[XEN_NETIF_EXTRA_TYPE_GSO - 1]; @@ -127
[PATCH net v2] xen/netback: Properly initialize credit_bytes
Commit e9ce7cb6b107 ("xen-netback: Factor queue-specific data into queue struct") introduced a regression when moving queue-specific data into the queue struct by failing to set the credit_bytes field. This prevented bandwidth limiting from working. Initialize the field as it was done before multiqueue support was added. Signed-off-by: Ross Lagerwall --- Changed in v2: Fixed multiple-assignment. drivers/net/xen-netback/xenbus.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c index 3d8dbf5..fee0241 100644 --- a/drivers/net/xen-netback/xenbus.c +++ b/drivers/net/xen-netback/xenbus.c @@ -793,6 +793,7 @@ static void connect(struct backend_info *be) goto err; } + queue->credit_bytes = credit_bytes; queue->remaining_credit = credit_bytes; queue->credit_usec = credit_usec; -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html