Re: [PATCH net 1/1] r8169: fix sleepable allocation during netdevice stats retrieval.
2nd batch: 0001-r8169-decouple-the-counters-data-and-the-device-priv.patch 0002-r8169-use-a-single-condition-to-check-the-completion.patch 0003-r8169-increase-the-lifespan-of-the-hardware-counters.patch - the noise is still present -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: parsing the output of ethtool programmatically
On 9/5/15, 2:53 AM, Ben Hutchings wrote: On Thu, 2015-06-25 at 03:28 +0200, Robert Urban wrote: Hello, there doesn't seem to be any flag to request ethtool to present its output in a script-friendly form. Things like: Link partner advertised link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Full are kind of pain to parse. Something to consider? Or write a library for using SIOCETHTOOL. The interface is fairly well documented now and there are other users besides the ethtool utility. one of the requests which we usually get is ethtool output in json format. since iproute2 tools are already getting an option to print in json format, would be nice if we considered an option to print in json for ethtool too. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net v3] ipv6: fix multipath route replace error recovery
On 9/6/15, 1:46 PM, Roopa Prabhu wrote: From: Roopa Prabhu Problem: The ecmp route replace support for ipv6 in the kernel, deletes the existing ecmp route too early, ie when it installs the first nexthop. If there is an error in installing the subsequent nexthops, its too late to recover the already deleted existing route This patch fixes the problem with the following: a) Changes the existing multipath route add code to a two stage process: build rt6_infos + insert them ip6_route_add rt6_info creation code is moved into ip6_route_info_create. b) This ensures that all errors are caught during building rt6_infos and we fail early The other way I have been thinking of solving the problem is to mark the sibling routes being replaced with some state ...so they can be restored on error. Still figuring out a way to do this in a clean and non-intrusive way. Or maybe just save the sibling routes (rt6_infos) being replaced in a list and re-insert them on error. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net v3] ipv6: fix multipath route replace error recovery
On 9/7/15, 5:37 AM, Nicolas Dichtel wrote: Le 06/09/2015 22:46, Roopa Prabhu a écrit : From: Roopa Prabhu I've sent you some comments about the v2, so please keep me in CC for the next versions. Problem: The ecmp route replace support for ipv6 in the kernel, deletes the existing ecmp route too early, ie when it installs the first nexthop. If there is an error in installing the subsequent nexthops, its too late to recover the already deleted existing route This patch fixes the problem with the following: It does not really 'fix' the problem, it only reduces the probability to have an error. This is really different. The status is much better after this patch, but it could be good to reword a bit the commitlog to reflect that. sure. a) Changes the existing multipath route add code to a two stage process: build rt6_infos + insert them ip6_route_add rt6_info creation code is moved into ip6_route_info_create. b) This ensures that all errors are caught during building rt6_infos and we fail early c) Separates multipath add and del code. Because add needs the special two stage mode in a) and delete essentially does not care. d) In any event if the code fails during inserting a route again, a warning is printed (This should be unlikely) Before the patch: $ip -6 route show 3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024 3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024 3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024 /* Try replacing the route with a duplicate nexthop */ $ip -6 route change 3000:1000:1000:1000::2/128 nexthop via fe80::202:ff:fe00:b dev swp49s0 nexthop via fe80::202:ff:fe00:d dev swp49s1 nexthop via fe80::202:ff:fe00:d dev swp49s1 RTNETLINK answers: File exists $ip -6 route show /* previously added ecmp route 3000:1000:1000:1000::2 dissappears from * kernel */ After the patch: $ip -6 route show 3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024 3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024 3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024 /* Try replacing the route with a duplicate nexthop */ $ip -6 route change 3000:1000:1000:1000::2/128 nexthop via fe80::202:ff:fe00:b dev swp49s0 nexthop via fe80::202:ff:fe00:d dev swp49s1 nexthop via fe80::202:ff:fe00:d dev swp49s1 RTNETLINK answers: File exists $ip -6 route show 3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024 3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024 3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024 Fixes: 27596472473a ("ipv6: fix ECMP route replacement") As said in the v2 thread, I still don't agree with this tag. [snip] +static void ip6_print_replace_route_err(struct list_head *rt6_nh_list) +{ +struct rt6_nh *nh; + +list_for_each_entry(nh, rt6_nh_list, next) { +pr_warn("IPV6: replace premature del %pI6 nexthop %pI6 ifi %d\n", I don't think that a user (who didn't read the code) can understand this sentence. Another suggestion: "ECMPv6: route replacement failed (check the consistency of the installed route)". Not sure that the nexthops should be listed after. sure, i don't have a preference. I will resubmit if we converge on the commit message. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net 3/3] r8169: increase the lifespan of the hardware counters dump area.
Francois Romieu : [...] > Updated patch is on the way. Fixed memcpy in patch 0001, moved counters allocation from open() to probe(), returned open() to its original state but something is still wrong: the link does not come up. Patches attached. I'll try again tomorrow. -- Ueimor >From 8f1b87a115f6c340558b95f6a1748ddbf866d95f Mon Sep 17 00:00:00 2001 Message-Id: <8f1b87a115f6c340558b95f6a1748ddbf866d95f.1441670158.git.rom...@fr.zoreil.com> From: Francois Romieu Date: Sat, 5 Sep 2015 13:26:30 +0200 Subject: [PATCH 1/3] r8169: decouple the counters data and the device private area. X-Organisation: Land of Sunshine Inc. Signed-off-by: Francois Romieu --- drivers/net/ethernet/realtek/r8169.c | 69 ++-- 1 file changed, 42 insertions(+), 27 deletions(-) diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c index 24dcbe6..ff1b834 100644 --- a/drivers/net/ethernet/realtek/r8169.c +++ b/drivers/net/ethernet/realtek/r8169.c @@ -833,7 +833,7 @@ struct rtl8169_private { unsigned features; struct mii_if_info mii; - struct rtl8169_counters counters; + struct rtl8169_counters *counters; struct rtl8169_tc_offsets tc_offset; u32 saved_wolopts; u32 opts1_mask; @@ -2284,7 +2284,7 @@ static bool rtl8169_update_counters(struct net_device *dev) return false; if (rtl_udelay_loop_wait_low(tp, &rtl_counters_cond, 10, 1000)) - memcpy(&tp->counters, counters, sizeof(*counters)); + memcpy(tp->counters, counters, sizeof(*counters)); else ret = false; @@ -2296,6 +2296,8 @@ static bool rtl8169_update_counters(struct net_device *dev) static bool rtl8169_init_counter_offsets(struct net_device *dev) { struct rtl8169_private *tp = netdev_priv(dev); + struct rtl8169_counters *counters = tp->counters; + struct rtl8169_tc_offsets *offset = &tp->tc_offset; bool ret = false; /* @@ -2313,7 +2315,7 @@ static bool rtl8169_init_counter_offsets(struct net_device *dev) * set at open time by rtl_hw_start. */ - if (tp->tc_offset.inited) + if (offset->inited) return true; /* If both, reset and update fail, propagate to caller. */ @@ -2323,10 +2325,10 @@ static bool rtl8169_init_counter_offsets(struct net_device *dev) if (rtl8169_update_counters(dev)) ret = true; - tp->tc_offset.tx_errors = tp->counters.tx_errors; - tp->tc_offset.tx_multi_collision = tp->counters.tx_multi_collision; - tp->tc_offset.tx_aborted = tp->counters.tx_aborted; - tp->tc_offset.inited = true; + offset->tx_errors = counters->tx_errors; + offset->tx_multi_collision = counters->tx_multi_collision; + offset->tx_aborted = counters->tx_aborted; + offset->inited = true; return ret; } @@ -2335,24 +2337,25 @@ static void rtl8169_get_ethtool_stats(struct net_device *dev, struct ethtool_stats *stats, u64 *data) { struct rtl8169_private *tp = netdev_priv(dev); + struct rtl8169_counters *c = tp->counters; ASSERT_RTNL(); rtl8169_update_counters(dev); - data[0] = le64_to_cpu(tp->counters.tx_packets); - data[1] = le64_to_cpu(tp->counters.rx_packets); - data[2] = le64_to_cpu(tp->counters.tx_errors); - data[3] = le32_to_cpu(tp->counters.rx_errors); - data[4] = le16_to_cpu(tp->counters.rx_missed); - data[5] = le16_to_cpu(tp->counters.align_errors); - data[6] = le32_to_cpu(tp->counters.tx_one_collision); - data[7] = le32_to_cpu(tp->counters.tx_multi_collision); - data[8] = le64_to_cpu(tp->counters.rx_unicast); - data[9] = le64_to_cpu(tp->counters.rx_broadcast); - data[10] = le32_to_cpu(tp->counters.rx_multicast); - data[11] = le16_to_cpu(tp->counters.tx_aborted); - data[12] = le16_to_cpu(tp->counters.tx_underun); + data[ 0] = le64_to_cpu(c->tx_packets); + data[ 1] = le64_to_cpu(c->rx_packets); + data[ 2] = le64_to_cpu(c->tx_errors); + data[ 3] = le32_to_cpu(c->rx_errors); + data[ 4] = le16_to_cpu(c->rx_missed); + data[ 5] = le16_to_cpu(c->align_errors); + data[ 6] = le32_to_cpu(c->tx_one_collision); + data[ 7] = le32_to_cpu(c->tx_multi_collision); + data[ 8] = le64_to_cpu(c->rx_unicast); + data[ 9] = le64_to_cpu(c->rx_broadcast); + data[10] = le32_to_cpu(c->rx_multicast); + data[11] = le16_to_cpu(c->tx_aborted); + data[12] = le16_to_cpu(c->tx_underun); } static void rtl8169_get_strings(struct net_device *dev, u32 stringset, u8 *data) @@ -7779,6 +7782,8 @@ static struct rtnl_link_stats64 * rtl8169_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats) { struct rtl8169_private *tp = netdev_priv(dev); + struct rtl8169_tc_offsets *offset = &tp->tc_
Re: [PATCH net-next v2] ipv6: fix multipath route replace error recovery
On 9/7/15, 5:03 AM, Nicolas Dichtel wrote: The bug you're trying to fix has been introduced by the commit 51ebd3181572 ("ipv6: add support of equal cost multipath (ECMP)"). Commit 27596472473a ("ipv6: fix ECMP route replacement") didn't try to fix this problem. The reason I say "27596472473a ("ipv6: fix ECMP route replacement") ", has introduced the problem i am trying to fix is because of the below change. The part where it deletes all siblings of an existing route diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c index 96dbfff..bde57b1 100644 --- a/net/ipv6/ip6_fib.c +++ b/net/ipv6/ip6_fib.c [..snip] @@ -835,8 +851,27 @@ add: info->nl_net->ipv6.rt6_stats->fib_route_nodes++; fn->fn_flags |= RTN_RTINFO; } + nsiblings = iter->rt6i_nsiblings; fib6_purge_rt(iter, fn, info->nl_net); rt6_release(iter); + + if (nsiblings) { + /* Replacing an ECMP route, remove all siblings */ + ins = &rt->dst.rt6_next; + iter = *ins; + while (iter) { + if (rt6_qualify_for_ecmp(iter)) { + *ins = iter->dst.rt6_next; + fib6_purge_rt(iter, fn, info->nl_net); + rt6_release(iter); + nsiblings--; + } else { + ins = &iter->dst.rt6_next; + } + iter = *ins; + } + WARN_ON(nsiblings != 0); + } } return 0; Note that the first patch you submitted to fix this pb (cf http://patchwork.ozlabs.org/patch/362296/) was in June 2014, ie one year before the commit 27596472473a. yes, i had submitted the patch you mention above to fix a slightly different problem that existed then..which was introduced by "51ebd3181572 ("ipv6: add support of equal cost multipath (ECMP)")". Commit "35f1b4e96b9258a3668872b1139c51e5a23eb876 ipv6: do not delete previously existing ECMP routes if add fails" subsequently fixed it. Thanks, Roopa -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net 3/3] r8169: increase the lifespan of the hardware counters dump area.
From: Corinna Vinschen Date: Mon, 7 Sep 2015 11:29:49 +0200 > Still wondering though. Given that the driver never failed before if > the counter values couldn't be updated, and given that these counter > values only have statistical relevance, why should this suddenly result > in a fatal failure at open time? Failing to allocate such a small buffer means we have much deeper issues at hand. A GFP_KERNEL allocation of this size really should not fail. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net 3/3] r8169: increase the lifespan of the hardware counters dump area.
Corinna Vinschen : [...] > I have a bit of a problem with this patch. It crashes when loading the > driver manually w/ modprobe. For some reason dev_get_stats is called > during rtl_init_one and at that time the counters pointer is NULL, so > the kernel gets a SEGV. > > After I worked around that I got a SEGV in > rtl_remove_one, which I still have to find out why. I didn't test with > the latest kernel, though, so I still have to check if that's the reason > for the crashes. That takes a bit longer, I just wanted to let you > know. Call me stupid: I forgot that it's perfectly fine to request the stats of a down interface. :o/ Updated patch is on the way. > There's also a problem with rtl8169_cmd_counters: It always calls > rtl_udelay_loop_wait_low w/ rtl_reset_counters_cond, even when called > from rtl8169_update_counters, where it should call > rtl_udelay_loop_wait_low w/ rtl_counters_cond to check for the > CounterDump flag, rather than for the CounterReset flag. > > For now I applied the below patch, but I think it's a bit ugly to > conditionalize the rtl_cond struct by the incoming flag value. Let's check both at once: return RTL_R32(CounterAddrLow) & (CounterDump | CounterReset); Thanks for reviewing. -- Ueimor -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] stmmac: fix check for phydev being open
Hi Sergei, On Tue, 2015-09-08 at 00:24 +0300, Sergei Shtylyov wrote: > On 09/07/2015 11:50 PM, Alexey Brodkin wrote: > > > Current implementation via IS_ERR(phydev) may make no sense because > > of_phy_attach() returns NULL on failure instead of error value. > > Not of_phy_connect()? I already noticed this misspelling - it came from my exploration of what happens inside of_phy_connect(). So sort of braindump. I will fix in v3 re-spin. -Alexey-- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] stmmac: fix check for phydev being open
On 09/07/2015 11:50 PM, Alexey Brodkin wrote: Current implementation via IS_ERR(phydev) may make no sense because of_phy_attach() returns NULL on failure instead of error value. Not of_phy_connect()? Still for checking result of phy_connect() IS_ERR() is useful. To address both situations we use combined IS_ERR_OR_NULL() check. Cc: Giuseppe Cavallaro Cc: linux-ker...@vger.kernel.org Cc: sta...@vger.kernel.org Cc: David Miller Cc: Sergei Shtylyov Signed-off-by: Alexey Brodkin MBR, Sergei -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/3] Network stack, first user of SLAB/kmem_cache bulk free API.
On 09/07/2015 01:16 AM, Jesper Dangaard Brouer wrote: On Fri, 4 Sep 2015 11:09:21 -0700 Alexander Duyck wrote: This is an interesting start. However I feel like it might work better if you were to create a per-cpu pool for skbs that could be freed and allocated in NAPI context. So for example we already have napi_alloc_skb, why not just add a napi_free_skb I do like the idea... If nothing else you want to avoid having to redo this code for every driver. If you can just replace dev_kfree_skb with some other freeing call it will make it much easier to convert other drivers. and then make the array of objects to be freed part of a pool that could be used for either allocation or freeing? If the pool runs empty you just allocate something like 8 or 16 new skb heads, and if you fill it you just free half of the list? But I worry that this algorithm will "randomize" the (skb) objects. And the SLUB bulk optimization only works if we have many objects belonging to the same page. Agreed to some extent, however at the same time what this does is allow for a certain amount of skb recycling. So instead of freeing the buffers received from the socket you would likely be recycling them and sending them back as Rx skbs. In the case of a heavy routing workload you would likely just be cycling through the same set of buffers and cleaning them off of transmit and placing them back on receive. The general idea is to keep the memory footprint small so recycling Tx buffers to use for Rx can have its advantages in terms of keeping things confined to limits of the L1/L2 cache. It would likely be fastest to implement a simple stack (for these per-cpu pools), but I again worry that it would randomize the object-pages. A simple queue might be better, but slightly slower. Guess I could just reuse part of qmempool / alf_queue as a quick test. I would say don't over engineer it. A stack is the simplest. The qmempool / alf_queue is just going to add extra overhead. The added advantage to the stack is that you are working with pointers and you are guaranteed that the list of pointers are going to be linear. If you use a queue clean-up will require up to 2 blocks of freeing in case the ring has wrapped. Having a per-cpu pool in networking would solve the problem of the slub per-cpu pool isn't large enough for our use-case. On the other hand, maybe we should fix slub to dynamically adjust the size of it's per-cpu resources? The per-cpu pool is just meant to replace the the per-driver pool you were using. By using a per-cpu pool you would get better aggregation and can just flush the freed buffers at the end of the Rx softirq or when the pool is full instead of having to flush smaller lists per call to napi->poll. A pre-req knowledge (for people not knowing slub's internal details): Slub alloc path will pickup a page, and empty all objects for that page before proceeding to the next page. Thus, slub bulk alloc will give many objects belonging to the page. I'm trying to keep these objects grouped together until they can be free'ed in a bulk. The problem is you aren't going to be able to keep them together very easily. Yes they might be allocated all from one spot on Rx but they can very easily end up scattered to multiple locations. The same applies to Tx where you will have multiple flows all outgoing on one port. That is why I was thinking adding some skb recycling via a per-cpu stack might be useful especially since you have to either fill or empty the stack when you allocate or free multiple skbs anyway. In addition it provides an easy way for a bulk alloc and a bulk free to share data structures without adding additional overhead by keeping them separate. If you managed it with some sort of high-water/low-water mark type setup you could very well keep the bulk-alloc/free busy without too much fragmentation. For the socket transmit/receive case the thing you have to keep in mind is that if you reuse the buffers you are just going to be throwing them back at the sockets which are likely not using bulk-free anyway. So in that case reuse could actually improve things by simply reducing the number of calls to bulk-alloc you will need to make since things like TSO allow you to send 64K using a single sk_buff, while you will be likely be receiving one or more acks on the receive side which will require allocations. - Alex -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: slab-nomerge (was Re: [git pull] device mapper changes for 4.3)
On Mon, 7 Sep 2015 13:22:13 -0700 Linus Torvalds wrote: > On Mon, Sep 7, 2015 at 2:30 AM, Jesper Dangaard Brouer > wrote: > > > > The slub allocator have a faster "fastpath", if your workload is > > fast-reusing within the same per-cpu page-slab, but once the workload > > increases you hit the slowpath, and then slab catches up. Slub looks > > great in micro-benchmarking. > > > > And with "slab_nomerge" I get even high performance: > > I think those two are related. > > Not merging means that effectively the percpu caches end up being > bigger (simply because there are more of them), and so it captures > more of the fastpath cases. Yes, that was also my theory. As manually tuning the percpu sizes gave me almost the same boost. > Obviously the percpu queue size is an easy tunable too, but there are > real downsides to that too. The easy fix is to introduce a subsystem specific percpu cache that is large enough for our use-case. That seems to be a trend. I'm hoping to come up with something smarter that every subsystem can benefit from. E.g some heuristic that can dynamic adjust SLUB according to the usage pattern. I can imagine something as simple as a counter for every slowpath call, that is only valid as long as the jiffies count matches (reset to zero, and store new jiffies cnt). (But I have not thought this through...) > I suspect your IP forwarding case isn't so > different from some of the microbenchmarks, it just has more > outstanding work.. Yes, I will admit that my testing is very close to micro benchmarking, and it is specifically designed to pressure the system to its limits[1]. Especially the minimum frame size is evil and unrealistic, but the real purpose is preparing the stack for increasing speeds like 100Gbit/s. > And yes, the slow path (ie not hitting in the percpu cache) of SLUB > could hopefully be optimizable too, although maybe the bulk patches > are the way to go (and unrelated to this thread - at least part of > your bulk patches actually got merged last Friday - they were part of > Andrew's patch-bomb). Cool. Yes, it is only part of the bulk patches. The real performance boosters are not in yet (but I need to make them work correctly with memory debugging enabled before they can get merged). At least the main API is in, which allows me to implement use-case easier in other subsystems :-) [1] http://netoptimizer.blogspot.dk/2014/09/packet-per-sec-measurements-for.html -- Best regards, Jesper Dangaard Brouer MSc.CS, Sr. Network Kernel Developer at Red Hat Author of http://www.iptv-analyzer.org LinkedIn: http://www.linkedin.com/in/brouer -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] stmmac: fix check for phydev being open
Hi Sergei, On Mon, 2015-09-07 at 23:53 +0300, Sergei Shtylyov wrote: > On 09/07/2015 11:50 PM, Alexey Brodkin wrote: > > > Current implementation via IS_ERR(phydev) may make no sense because > > of_phy_attach() returns NULL on failure instead of error value. > > > > Still for checking result of phy_connect() IS_ERR() is useful. > > > > To address both situations we use combined IS_ERR_OR_NULL() check. > > > > Cc: Giuseppe Cavallaro > > Cc: linux-ker...@vger.kernel.org > > Cc: sta...@vger.kernel.org > > Cc: David Miller > > Cc: Sergei Shtylyov > > Signed-off-by: Alexey Brodkin > > --- > > > > Changes compared to v1: > > * Use IS_ERR_OR_NULL() instead of discrete checks for null and err > > > > drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > > b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > > index 864b476..7985d8a 100644 > > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > > @@ -837,7 +837,7 @@ static int stmmac_init_phy(struct net_device *dev) > > interface); > > } > > > > - if (IS_ERR(phydev)) { > > + if (IS_ERR_OR_NULL(phydev)) { > > pr_err("%s: Could not attach to PHY\n", dev->name); > > return PTR_ERR(phydev); > > Hm, in case of phydev == NULL, you're going to return 0 here... is that > what you want? Ah, right. So then the question would be what's a proper error code for !phydev: -ENOENT or -ENODEV? -Alexey-- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] stmmac: fix check for phydev being open
On 09/07/2015 11:50 PM, Alexey Brodkin wrote: Current implementation via IS_ERR(phydev) may make no sense because of_phy_attach() returns NULL on failure instead of error value. Still for checking result of phy_connect() IS_ERR() is useful. To address both situations we use combined IS_ERR_OR_NULL() check. Cc: Giuseppe Cavallaro Cc: linux-ker...@vger.kernel.org Cc: sta...@vger.kernel.org Cc: David Miller Cc: Sergei Shtylyov Signed-off-by: Alexey Brodkin --- Changes compared to v1: * Use IS_ERR_OR_NULL() instead of discrete checks for null and err drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index 864b476..7985d8a 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -837,7 +837,7 @@ static int stmmac_init_phy(struct net_device *dev) interface); } - if (IS_ERR(phydev)) { + if (IS_ERR_OR_NULL(phydev)) { pr_err("%s: Could not attach to PHY\n", dev->name); return PTR_ERR(phydev); Hm, in case of phydev == NULL, you're going to return 0 here... is that what you want? MBR, Sergei -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2] stmmac: fix check for phydev being open
Current implementation via IS_ERR(phydev) may make no sense because of_phy_attach() returns NULL on failure instead of error value. Still for checking result of phy_connect() IS_ERR() is useful. To address both situations we use combined IS_ERR_OR_NULL() check. Cc: Giuseppe Cavallaro Cc: linux-ker...@vger.kernel.org Cc: sta...@vger.kernel.org Cc: David Miller Cc: Sergei Shtylyov Signed-off-by: Alexey Brodkin --- Changes compared to v1: * Use IS_ERR_OR_NULL() instead of discrete checks for null and err drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index 864b476..7985d8a 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -837,7 +837,7 @@ static int stmmac_init_phy(struct net_device *dev) interface); } - if (IS_ERR(phydev)) { + if (IS_ERR_OR_NULL(phydev)) { pr_err("%s: Could not attach to PHY\n", dev->name); return PTR_ERR(phydev); } -- 2.4.3 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: slab-nomerge (was Re: [git pull] device mapper changes for 4.3)
On Mon, Sep 7, 2015 at 2:30 AM, Jesper Dangaard Brouer wrote: > > The slub allocator have a faster "fastpath", if your workload is > fast-reusing within the same per-cpu page-slab, but once the workload > increases you hit the slowpath, and then slab catches up. Slub looks > great in micro-benchmarking. > > And with "slab_nomerge" I get even high performance: I think those two are related. Not merging means that effectively the percpu caches end up being bigger (simply because there are more of them), and so it captures more of the fastpath cases. Obviously the percpu queue size is an easy tunable too, but there are real downsides to that too. I suspect your IP forwarding case isn't so different from some of the microbenchmarks, it just has more outstanding work.. And yes, the slow path (ie not hitting in the percpu cache) of SLUB could hopefully be optimizable too, although maybe the bulk patches are the way to go (and unrelated to this thread - at least part of your bulk patches actually got merged last Friday - they were part of Andrew's patch-bomb). Linus -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 1/3] net: introduce kfree_skb_bulk() user of kmem_cache_free_bulk()
On Mon, 7 Sep 2015 09:25:49 -0700 Tom Herbert wrote: > >> What not pass a list of skbs (e.g. using skb->next)? > > > > Because the next layer, the slab API needs an array: > > kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p) > > > > I suppose we could ask the same question of that function. IMO > encouraging drivers to define arrays of pointers on the stack like > you're doing in the ixgbe patch is a bad direction. > > In any case I believe this would be simpler in the networking side > just to maintain a list of skb's to free. Then the dev_free_waitlist > structure might not be needed then since we could just use a > skb_buf_head for that. I guess it is more natural for the network side to work with skb lists. But I'm keeping it for slab/slub as we cannot assume/enforce objects of a specific data type. I worried about how large bulk free we should allow, due to the interaction with skb->destructor which for sockets affect their memory accounting. E.g. we have seen issues with hypervisor network drivers (Xen and HyperV) that are too slow to cleanup their TX completion queue that their TCP bandwidth get limited by tcp_limit_output_bytes. I capped it at 32, and the NAPI budget will cap it at 64. By the following argument, bulk free of 64 objects/skb's is not a problem. The delay I'm introducing is very small, before the first real kfree_skb is called, which calls the destructor with free up socket memory accounting. Assume measured packet rate of: 2105011 pps Time between packets (1/2105011*10^9): 475 ns Perf shows ixgbe_clean_tx_irq() takes: 1.23% Extrapolating the function call cost: 5.84 ns (475*(1.23/100)) Processing 64 packets in ixgbe_clean_tx_irq() 373 ns. At 10Gbit/s how many bytes can arrive in this period, only: 466 bytes. ((373/10^9)*(1*10^6)/8) -- Best regards, Jesper Dangaard Brouer MSc.CS, Sr. Network Kernel Developer at Red Hat Author of http://www.iptv-analyzer.org LinkedIn: http://www.linkedin.com/in/brouer -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] ip link/addr show: add empty line between interfaces
Excerpts from Stephen Hemminger's message of 2015-09-07 11:09:32 -0700: > Agreed with Jiri. I prefer the new shorter brief format, not longer format. As long as it requires extra typing, few are going to use it. It's about the default. Did you try the brief format with a few v6 addresses on an interface? When the line gets wrapped, it gets hard to read. Besides, this isn't about the brief format, but about making the long format less unreadable. > Did you ever try a system with 10 interfaces or 1000 tunnels?? The largest I tried it on had 11 interfaces (8 of which are for tunnels). That's not 1000, but how many people have that? After all the positive feedback I got locally I'm genuinely surprised that you don't like it, but I guess that's the way it is. Thanks for looking at it, though! Felix -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] stmmac: fix check for phydev being open
Hello. On 09/07/2015 09:15 PM, Alexey Brodkin wrote: Current implementation via IS_ERR(phydev) may make no sense because of_phy_attach() returns NULL on failure instead of error value. Still for checking result of phy_connect() IS_ERR() is useful. So adding explicit check for NULL. Cc: Giuseppe Cavallaro Cc: arc-linux-...@synopsys.com Cc: linux-ker...@vger.kernel.org Cc: sta...@vger.kernel.org Cc: David Miller Signed-off-by: Alexey Brodkin --- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index 864b476..ad2ce3e 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -837,7 +837,7 @@ static int stmmac_init_phy(struct net_device *dev) interface); } - if (IS_ERR(phydev)) { + if (!phydev || IS_ERR(phydev)) { There's IS_ERR_OR_NULL(). WBR, Sergei -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] stmmac: fix check for phydev being open
Current implementation via IS_ERR(phydev) may make no sense because of_phy_attach() returns NULL on failure instead of error value. Still for checking result of phy_connect() IS_ERR() is useful. So adding explicit check for NULL. Cc: Giuseppe Cavallaro Cc: arc-linux-...@synopsys.com Cc: linux-ker...@vger.kernel.org Cc: sta...@vger.kernel.org Cc: David Miller Signed-off-by: Alexey Brodkin --- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index 864b476..ad2ce3e 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -837,7 +837,7 @@ static int stmmac_init_phy(struct net_device *dev) interface); } - if (IS_ERR(phydev)) { + if (!phydev || IS_ERR(phydev)) { pr_err("%s: Could not attach to PHY\n", dev->name); return PTR_ERR(phydev); } -- 2.4.3 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] iproute: print more verbose error on route cache flush
On Sat, 5 Sep 2015 10:40:50 +0300 Denis Kirjanov wrote: > Before: > kda@vfirst ~/devel/iproute2 $ ./ip/ip route flush cache > Cannot open "/proc/sys/net/ipv4/route/flush" > > After: > kda@vfirst ~/devel/iproute2/ip $ ./ip route flush cache > Cannot open "/proc/sys/net/ipv4/route/flush": Permission denied > > Signed-off-by: Denis Kirjanov > --- > ip/iproute.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/ip/iproute.c b/ip/iproute.c > index 8f49e62..abe4180 100644 > --- a/ip/iproute.c > +++ b/ip/iproute.c > @@ -1213,7 +1213,8 @@ static int iproute_flush_cache(void) > char *buffer = "-1"; > > if (flush_fd < 0) { > - fprintf (stderr, "Cannot open \"%s\"\n", ROUTE_FLUSH_PATH); > + fprintf (stderr, "Cannot open \"%s\": %s\n", > + ROUTE_FLUSH_PATH, strerror(errno)); > return -1; > } > Sure why not, applied -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] ip link/addr show: add empty line between interfaces
On Mon, 7 Sep 2015 17:54:11 +0200 Jiri Benc wrote: > On Mon, 07 Sep 2015 17:38:48 +0200, Felix Kaiser wrote: > > My opinion is that if someone parses that, they deserve the opportunity > > to fix their code. At the very least they could have used -oneline! > > No doubt. I also don't like scripts that parse the default "ip a" > output. > > I just pointed out that there are different opinions and that it can > have some consequences. Personally, I have no strong feeling against > nor for the patch. It's up to Stephen to decide. > > > I've met users who use a shell alias to hack newlines in > > Which is not a bad solution, actually. > > Jiri > Agreed with Jiri. I prefer the new shorter brief format, not longer format. Did you ever try a system with 10 interfaces or 1000 tunnels?? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Fw: [Bug 104161] New: DLNA services disappear on bridge device shortly after starting the dlna service
Begin forwarded message: Date: Mon, 7 Sep 2015 13:31:54 + From: "bugzilla-dae...@bugzilla.kernel.org" To: "shemmin...@linux-foundation.org" Subject: [Bug 104161] New: DLNA services disappear on bridge device shortly after starting the dlna service https://bugzilla.kernel.org/show_bug.cgi?id=104161 Bug ID: 104161 Summary: DLNA services disappear on bridge device shortly after starting the dlna service Product: Networking Version: 2.5 Kernel Version: 4.2 Hardware: All OS: Linux Tree: Mainline Status: NEW Severity: normal Priority: P1 Component: IPV4 Assignee: shemmin...@linux-foundation.org Reporter: t.p...@gmx.de Regression: No Hi, https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=9afd85c9e4552b276e2f4cfefd622bdeeffbbf26 This commit introduces a weird behaviour on my dlna server, commit was bisected. This commit was merged into 4.2 series, which makes my dlna/upnp services disappear real soon after starting from discovering from other clients. Localhost seems not to be affected but every external discovery is broken. The dlna services run on a bridge device with IPV4. Mythtv and Minidlna both disappear after some minutes of running. 4.2 with those 3 patches reverted makes it work as it should: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=a516993f0ac1694673412eb2d16a091eafa77d2a https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=fcba67c94abe83e0e69a65737000ccbb16a4fa03 https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=9afd85c9e4552b276e2f4cfefd622bdeeffbbf26 Do you need any more information to get this really weird bug fixed? Thanks in advance. -- You are receiving this mail because: You are the assignee for the bug. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 18/20] net/xen-netback: Make it running on 64KB page granularity
You might need to rebase you patch. A patch to netback went it recently. On Mon, Sep 07, 2015 at 04:33:56PM +0100, Julien Grall wrote: > The PV network protocol is using 4KB page granularity. The goal of this > patch is to allow a Linux using 64KB page granularity working as a > network backend on a non-modified Xen. > > It's only necessary to adapt the ring size and break skb data in small > chunk of 4KB. The rest of the code is relying on the grant table code. > > Signed-off-by: Julien Grall > Reviewed-by: Wei Liu -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 1/3] net: introduce kfree_skb_bulk() user of kmem_cache_free_bulk()
>> What not pass a list of skbs (e.g. using skb->next)? > > Because the next layer, the slab API needs an array: > kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p) > I suppose we could ask the same question of that function. IMO encouraging drivers to define arrays of pointers on the stack like you're doing in the ixgbe patch is a bad direction. In any case I believe this would be simpler in the networking side just to maintain a list of skb's to free. Then the dev_free_waitlist structure might not be needed then since we could just use a skb_buf_head for that. Tom > Look at the patch: > [PATCH V2 3/3] slub: build detached freelist with look-ahead > http://thread.gmane.org/gmane.linux.kernel.mm/137469/focus=137472 > > Where I use this array to progressively scan for objects belonging to > the same page. (A subtle detail is I manage to zero out the array, > which is good from a security/error-handling point of view, as pointers > to the objects are not left dangling on the stack). > > > I cannot argue that, writing skb->next comes as an additional cost, > because the slUb free also writes into this cacheline. Perhaps the > slAb allocator does not? > > [...] >> > + if (likely(cnt)) { >> > + kmem_cache_free_bulk(skbuff_head_cache, cnt, (void **) >> > skbs); >> > + } >> > +} >> > +EXPORT_SYMBOL(kfree_skb_bulk); > > -- > Best regards, > Jesper Dangaard Brouer > MSc.CS, Sr. Network Kernel Developer at Red Hat > Author of http://www.iptv-analyzer.org > LinkedIn: http://www.linkedin.com/in/brouer -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] ip link/addr show: add empty line between interfaces
On Mon, 07 Sep 2015 17:38:48 +0200, Felix Kaiser wrote: > My opinion is that if someone parses that, they deserve the opportunity > to fix their code. At the very least they could have used -oneline! No doubt. I also don't like scripts that parse the default "ip a" output. I just pointed out that there are different opinions and that it can have some consequences. Personally, I have no strong feeling against nor for the patch. It's up to Stephen to decide. > I've met users who use a shell alias to hack newlines in Which is not a bad solution, actually. Jiri -- Jiri Benc -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v4 17/20] net/xen-netfront: Make it running on 64KB page granularity
The PV network protocol is using 4KB page granularity. The goal of this patch is to allow a Linux using 64KB page granularity using network device on a non-modified Xen. It's only necessary to adapt the ring size and break skb data in small chunk of 4KB. The rest of the code is relying on the grant table code. Note that we allocate a Linux page for each rx skb but only the first 4KB is used. We may improve the memory usage by extending the size of the rx skb. Signed-off-by: Julien Grall Reviewed-by: David Vrabel --- Cc: Konrad Rzeszutek Wilk Cc: Boris Ostrovsky Cc: netdev@vger.kernel.org Improvement such as support of 64KB grant is not taken into consideration in this patch because we have the requirement to run a Linux using 64KB pages on a non-modified Xen. Tested with workload such as ping, ssh, wget, git... I would happy if someone give details how to test all the path. Changes in v4: - s/gnttab_one_grant/gnttab_for_one_grant/ based on the new naming - Add David's reviewed-by Changes in v3: - Fix errors reported by checkpatch.pl - s/mfn/gfn/ base on the new naming - xennet_tx_setup_grant was calling itself resulting an guest stall when using iperf. - The grant callback doesn't allow anymore to change the len (wasn't used here) - gnttab_foreach_grant has been renamed to gnttab_foreach_grant_in_range - gnttab_page_grant_foreign_ref has been renamed to gnttab_foreach_grant_foreign_ref_one Changes in v2: - Use gnttab_foreach_grant to split a Linux page in grant - Fix count slots --- drivers/net/xen-netfront.c | 122 - 1 file changed, 86 insertions(+), 36 deletions(-) diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c index 47f791e..17b1013 100644 --- a/drivers/net/xen-netfront.c +++ b/drivers/net/xen-netfront.c @@ -74,8 +74,8 @@ struct netfront_cb { #define GRANT_INVALID_REF 0 -#define NET_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, PAGE_SIZE) -#define NET_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, PAGE_SIZE) +#define NET_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, XEN_PAGE_SIZE) +#define NET_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, XEN_PAGE_SIZE) /* Minimum number of Rx slots (includes slot for GSO metadata). */ #define NET_RX_SLOTS_MIN (XEN_NETIF_NR_SLOTS_MIN + 1) @@ -291,7 +291,7 @@ static void xennet_alloc_rx_buffers(struct netfront_queue *queue) struct sk_buff *skb; unsigned short id; grant_ref_t ref; - unsigned long gfn; + struct page *page; struct xen_netif_rx_request *req; skb = xennet_alloc_one_rx_buffer(queue); @@ -307,14 +307,13 @@ static void xennet_alloc_rx_buffers(struct netfront_queue *queue) BUG_ON((signed short)ref < 0); queue->grant_rx_ref[id] = ref; - gfn = xen_page_to_gfn(skb_frag_page(&skb_shinfo(skb)->frags[0])); + page = skb_frag_page(&skb_shinfo(skb)->frags[0]); req = RING_GET_REQUEST(&queue->rx, req_prod); - gnttab_grant_foreign_access_ref(ref, - queue->info->xbdev->otherend_id, - gfn, - 0); - + gnttab_page_grant_foreign_access_ref_one(ref, + queue->info->xbdev->otherend_id, +page, +0); req->id = id; req->gref = ref; } @@ -415,25 +414,33 @@ static void xennet_tx_buf_gc(struct netfront_queue *queue) xennet_maybe_wake_tx(queue); } -static struct xen_netif_tx_request *xennet_make_one_txreq( - struct netfront_queue *queue, struct sk_buff *skb, - struct page *page, unsigned int offset, unsigned int len) +struct xennet_gnttab_make_txreq { + struct netfront_queue *queue; + struct sk_buff *skb; + struct page *page; + struct xen_netif_tx_request *tx; /* Last request */ + unsigned int size; +}; + +static void xennet_tx_setup_grant(unsigned long gfn, unsigned int offset, + unsigned int len, void *data) { + struct xennet_gnttab_make_txreq *info = data; unsigned int id; struct xen_netif_tx_request *tx; grant_ref_t ref; - - len = min_t(unsigned int, PAGE_SIZE - offset, len); + /* convenient aliases */ + struct page *page = info->page; + struct netfront_queue *queue = info->queue; + struct sk_buff *skb = info->skb; id = get_id_from_freelist(&queue->tx_skb_freelist, queue->tx_skbs); tx = RING_GET_REQUEST(&queue->tx, queue->tx.req_prod_pvt++); ref = gnttab_claim_
[PATCH v4 18/20] net/xen-netback: Make it running on 64KB page granularity
The PV network protocol is using 4KB page granularity. The goal of this patch is to allow a Linux using 64KB page granularity working as a network backend on a non-modified Xen. It's only necessary to adapt the ring size and break skb data in small chunk of 4KB. The rest of the code is relying on the grant table code. Signed-off-by: Julien Grall --- Cc: Ian Campbell Cc: Wei Liu Cc: netdev@vger.kernel.org Improvement such as support of 64KB grant is not taken into consideration in this patch because we have the requirement to run a Linux using 64KB pages on a non-modified Xen. Note that I haven't add a comment why the offset is 0 after the first iteration. See [1] for more details. [1] https://lkml.org/lkml/2015/8/10/456 Changes in v4: - Add a comment to explain how we compute MAX_XEN_SKB_FRAGS Changes in v3: - Fix errors reported by checkpatch.pl - s/mfn/gfn/ based on the new naming - gnttab_foreach_grant has been renamed to gnttab_forach_grant_in_range - The grant callback doesn't allow anymore to use less data. An helpers has been added in netback to handle this. Changes in v2: - Correctly set MAX_GRANT_COPY_OPS and XEN_NETBK_RX_SLOTS_MAX - Don't use XEN_PAGE_SIZE in handle_frag_list as we coalesce fragment into a new skb - Use gnntab_foreach_grant to split a Linux page into grant --- drivers/net/xen-netback/common.h | 18 +++-- drivers/net/xen-netback/netback.c | 153 -- 2 files changed, 110 insertions(+), 61 deletions(-) diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h index 8a495b3..24cb365 100644 --- a/drivers/net/xen-netback/common.h +++ b/drivers/net/xen-netback/common.h @@ -44,6 +44,7 @@ #include #include #include +#include #include typedef unsigned int pending_ring_idx_t; @@ -64,8 +65,8 @@ struct pending_tx_info { struct ubuf_info callback_struct; }; -#define XEN_NETIF_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, PAGE_SIZE) -#define XEN_NETIF_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, PAGE_SIZE) +#define XEN_NETIF_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, XEN_PAGE_SIZE) +#define XEN_NETIF_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, XEN_PAGE_SIZE) struct xenvif_rx_meta { int id; @@ -80,16 +81,21 @@ struct xenvif_rx_meta { /* Discriminate from any valid pending_idx value. */ #define INVALID_PENDING_IDX 0x -#define MAX_BUFFER_OFFSET PAGE_SIZE +#define MAX_BUFFER_OFFSET XEN_PAGE_SIZE #define MAX_PENDING_REQS XEN_NETIF_TX_RING_SIZE +/* The maximum number of frags is derived from the size of a grant (same + * as a Xen page size for now). + */ +#define MAX_XEN_SKB_FRAGS (65536 / XEN_PAGE_SIZE + 1) + /* It's possible for an skb to have a maximal number of frags * but still be less than MAX_BUFFER_OFFSET in size. Thus the - * worst-case number of copy operations is MAX_SKB_FRAGS per + * worst-case number of copy operations is MAX_XEN_SKB_FRAGS per * ring slot. */ -#define MAX_GRANT_COPY_OPS (MAX_SKB_FRAGS * XEN_NETIF_RX_RING_SIZE) +#define MAX_GRANT_COPY_OPS (MAX_XEN_SKB_FRAGS * XEN_NETIF_RX_RING_SIZE) #define NETBACK_INVALID_HANDLE -1 @@ -203,7 +209,7 @@ struct xenvif_queue { /* Per-queue data for xenvif */ /* Maximum number of Rx slots a to-guest packet may use, including the * slot needed for GSO meta-data. */ -#define XEN_NETBK_RX_SLOTS_MAX (MAX_SKB_FRAGS + 1) +#define XEN_NETBK_RX_SLOTS_MAX ((MAX_XEN_SKB_FRAGS + 1)) enum state_bit_shift { /* This bit marks that the vif is connected */ diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c index d4c1bc7..b1649aa 100644 --- a/drivers/net/xen-netback/netback.c +++ b/drivers/net/xen-netback/netback.c @@ -263,6 +263,80 @@ static struct xenvif_rx_meta *get_next_rx_buffer(struct xenvif_queue *queue, return meta; } +struct gop_frag_copy { + struct xenvif_queue *queue; + struct netrx_pending_operations *npo; + struct xenvif_rx_meta *meta; + int head; + int gso_type; + + struct page *page; +}; + +static void xenvif_setup_copy_gop(unsigned long gfn, + unsigned int offset, + unsigned int *len, + struct gop_frag_copy *info) +{ + struct gnttab_copy *copy_gop; + struct xen_page_foreign *foreign; + /* Convenient aliases */ + struct xenvif_queue *queue = info->queue; + struct netrx_pending_operations *npo = info->npo; + struct page *page = info->page; + + BUG_ON(npo->copy_off > MAX_BUFFER_OFFSET); + + if (npo->copy_off == MAX_BUFFER_OFFSET) + info->meta = get_next_rx_buffer(queue, npo); + + if (npo->copy_off + *len > MAX_BUFFER_OFFSET) + *len = MAX_BUFFER_OFFSET - npo->copy_off; + + copy_gop = npo->copy + npo->copy_prod++; + copy_gop->flags = GNTCOPY_de
Re: [PATCH] ip link/addr show: add empty line between interfaces
Excerpts from Jiri Benc's message of 2015-09-07 14:46:09 +0200: > Or makes the readability worse by requiring more scrolling, depending > on number of interfaces you have and your taste. Of course it's slightly subjective, but I think it improves readability quite a bit *especially* on servers with many interfaces. Without the additional lines, you can obviously fit more on one screen, but I don't think it's much, and I think that once you have a screenful of text, adding some whitespace makes it much, much easier to focus on the individual sections. > Many users prefer the current format. I've met users who prefer ifconfigs format (and have literally told me that they find ips hard to read), and I've met users who use a shell alias to hack newlines in (which is the motivation for this patch), and everyone that I've shown side-by-side comparisons has been supportive. > Also, as sad as it is, there are scripts out there that parse ip output > and I have no doubts this would break them. I'm aware. My opinion is that if someone parses that, they deserve the opportunity to fix their code. At the very least they could have used -oneline! The default output format is clearly intended for humans; it seems like no attention at all has been given to parseability. And that's as it should be - since, because it requires the fewest keystrokes, that's the format that nearly all interactive users are going to see, and the output should be optimized for them. If absolute compatibility even for badly written scripts that misuse the tool had to be maintained then that'd mean that no improvements could ever be made, and that'd be sad. Felix -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v4 01/20] net/xen-netback: xenvif_gop_frag_copy: move GSO check out of the loop
The skb doesn't change within the function. Therefore it's only necessary to check if we need GSO once at the beginning. Signed-off-by: Julien Grall Acked-by: Wei Liu --- Cc: Ian Campbell Cc: netdev@vger.kernel.org Changes in v4: - Add Wei's acked Changes in v2: - Patch added --- drivers/net/xen-netback/netback.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c index 7c64c74..d4c1bc7 100644 --- a/drivers/net/xen-netback/netback.c +++ b/drivers/net/xen-netback/netback.c @@ -277,6 +277,13 @@ static void xenvif_gop_frag_copy(struct xenvif_queue *queue, struct sk_buff *skb unsigned long bytes; int gso_type = XEN_NETIF_GSO_TYPE_NONE; + if (skb_is_gso(skb)) { + if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4) + gso_type = XEN_NETIF_GSO_TYPE_TCPV4; + else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6) + gso_type = XEN_NETIF_GSO_TYPE_TCPV6; + } + /* Data must not cross a page boundary. */ BUG_ON(size + offset > PAGE_SIZEgso_type & SKB_GSO_TCPV6) - gso_type = XEN_NETIF_GSO_TYPE_TCPV6; - } - if (*head && ((1 << gso_type) & queue->vif->gso_mask)) queue->rx.req_cons++; -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2] 9p: trans_fd, bail out if recv fcall if missing
req->rc is pre-allocated early on with p9_tag_alloc and shouldn't be missing Signed-off-by: Dominique Martinet --- net/9p/trans_fd.c | 13 ++--- 1 file changed, 6 insertions(+), 7 deletions(-) Feel free to adapt error code/message if you can think of something better. diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c index a270dcc..a6d89c0 100644 --- a/net/9p/trans_fd.c +++ b/net/9p/trans_fd.c @@ -356,13 +356,12 @@ static void p9_read_work(struct work_struct *work) } if (m->req->rc == NULL) { - m->req->rc = kmalloc(sizeof(struct p9_fcall) + - m->client->msize, GFP_NOFS); - if (!m->req->rc) { - m->req = NULL; - err = -ENOMEM; - goto error; - } + p9_debug(P9_DEBUG_ERROR, +"No recv fcall for tag %d (req %p), disconnecting!\n", +m->rc.tag, m->req); + m->req = NULL; + err = -EIO; + goto error; } m->rc.sdata = (char *)m->req->rc + sizeof(struct p9_fcall); memcpy(m->rc.sdata, m->tmp_buf, m->rc.capacity); -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv1 net-next 0/5] netlink: mmap: kernel panic and some issues
On 08/17/2015 11:02 PM, David Miller wrote: ... I would seriously rather see us do an expensive full copy of the SKB than to have traffic which is unexpectedly invisible to taps. I've been looking into this issue a bit further, so the copy for the tap seems doable, but while further going through the code to find similar issues elsewhere, and doing some experiments, it looks like we write shared info also in some edge-cases of upcalls such as nfqueue or ovs when mmaped netlink is used for rx. I did a test with nfqueue using the libmnl mmap branch [1]. My test case reduced the hard-coded frame size in libmnl to 4096. I then crafted via pktgen in receive mode skbs that f.e. have a MTU of 9000, and which do contain page frags. Redirecting that traffic into nfqueue with a listener that is rx mmaped, I can see that in case of nfqnl_build_packet_message(), the kernel is, when invoking skb_zerocopy(), failing the (len <= skb_tailroom(to)) condition, writing some header part, and filling the rest with frags[] in the for loop (thus writing/ leaking into the user mmap buffer), so given the right preconditions, the kernel wouldn't prevent us from doing that. One way to fix it would be to change netlink_alloc_skb() and all its call-sites to add another size param, say 'ldiff', that would contain the size difference that would be needed to copy the non-linear parts from the network skb into the linear ring buffer slot of the netlink skb. Thus, that also in case it exceeds the slot size, the kernel will just fall back to alloc_skb() inside of netlink_alloc_skb() and fill the slot with status NL_MMAP_STATUS_COPY. With this, no doubt it's not super pretty to add that additional size param, but the kernel could still work with mmap receivers. Other than that, I found an skb_copy() in __netlink_dump_start(), but that is a left-over from tx mmap in netlink, so not an issue here. Thanks, Daniel [1] http://git.netfilter.org/libmnl/log/?h=nl-mmap -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net 3/3] r8169: increase the lifespan of the hardware counters dump area.
On Sep 6 12:37, Corinna Vinschen wrote: > On Sep 5 14:18, rom...@fr.zoreil.com wrote: > > From: Francois Romieu > > > > net/core/net-sysfs.c::netstat_show retrieves stats with spinlock held. > > > > This change avoids sleepable allocation and performs some housekeeping: > > - receive ring, transmit ring and counters dump area allocation failures > > are all considered fatal during open() > > - netif_warn is now redundant with rtl_reset_counters_cond built-in > > failure message. > > - rtl_reset_counters_cond induced failures in open() are also considered > > fatal: it takes acceptable work to unwind comfortably. > > Why? The counter reset is not a fatal condition for the operation of > the NIC. Even if the counters show incorrect values, the normal > operation of the NIC is not affected. And to top that off, even if > resetting the counters actually doesn't work, the driver keeps the > offset values, so a failed reset won't even harm the statistics. It > just doesn't make sense to break the NIC operation for such a minor > problem. And worse: > > > -static bool rtl8169_reset_counters(struct net_device *dev) > > +static int rtl8169_reset_counters(struct net_device *dev) > > { > > struct rtl8169_private *tp = netdev_priv(dev); > > - struct rtl8169_counters *counters; > > - dma_addr_t paddr; > > - bool ret = true; > > > > /* > > * Versions prior to RTL_GIGA_MAC_VER_19 don't support resetting the > > * tally counters. > > */ > > if (tp->mac_version < RTL_GIGA_MAC_VER_19) > > - return true; > > - > > - counters = rtl8169_map_counters(dev, &paddr, CounterReset); > > - if (!counters) > > - return false; > > - > > - if (!rtl_udelay_loop_wait_low(tp, &rtl_reset_counters_cond, 10, 1000)) > > - ret = false; > > - > > - rtl8169_unmap_counters(dev, paddr, counters); > > + return -EINVAL; > > This returns -EINVAL even for older chip versions which are not capable > of resetting the counters. The result is, this driver will not work at > all on chip versions prior to RTL_GIGA_MAC_VER_19 anymore, because > rtl_open will always fail. > > Other then that, I can test the patch probably tomorrow. I have a bit of a problem with this patch. It crashes when loading the driver manually w/ modprobe. For some reason dev_get_stats is called during rtl_init_one and at that time the counters pointer is NULL, so the kernel gets a SEGV. After I worked around that I got a SEGV in rtl_remove_one, which I still have to find out why. I didn't test with the latest kernel, though, so I still have to check if that's the reason for the crashes. That takes a bit longer, I just wanted to let you know. There's also a problem with rtl8169_cmd_counters: It always calls rtl_udelay_loop_wait_low w/ rtl_reset_counters_cond, even when called from rtl8169_update_counters, where it should call rtl_udelay_loop_wait_low w/ rtl_counters_cond to check for the CounterDump flag, rather than for the CounterReset flag. For now I applied the below patch, but I think it's a bit ugly to conditionalize the rtl_cond struct by the incoming flag value. Corinna diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c index ae07567..cd7adbf 100644 --- a/drivers/net/ethernet/realtek/r8169.c +++ b/drivers/net/ethernet/realtek/r8169.c @@ -2200,6 +2200,13 @@ DECLARE_RTL_COND(rtl_reset_counters_cond) return RTL_R32(CounterAddrLow) & CounterReset; } +DECLARE_RTL_COND(rtl_counters_cond) +{ + void __iomem *ioaddr = tp->mmio_addr; + + return RTL_R32(CounterAddrLow) & CounterDump; +} + static int rtl8169_cmd_counters(struct net_device *dev, u32 counter_cmd) { struct rtl8169_private *tp = netdev_priv(dev); @@ -2212,7 +2219,10 @@ static int rtl8169_cmd_counters(struct net_device *dev, u32 counter_cmd) RTL_W32(CounterAddrLow, cmd); RTL_W32(CounterAddrLow, cmd | counter_cmd); - return rtl_udelay_loop_wait_low(tp, &rtl_reset_counters_cond, 10, 1000); + return rtl_udelay_loop_wait_low(tp, + counter_cmd == CounterDump ? &rtl_counters_cond : +&rtl_reset_counters_cond, + 10, 1000); } static int rtl8169_reset_counters(struct net_device *dev) @@ -2229,13 +2239,6 @@ static int rtl8169_reset_counters(struct net_device *dev) return rtl8169_cmd_counters(dev, CounterReset); } -DECLARE_RTL_COND(rtl_counters_cond) -{ - void __iomem *ioaddr = tp->mmio_addr; - - return RTL_R32(CounterAddrLow) & CounterDump; -} - static int rtl8169_update_counters(struct net_device *dev) { struct rtl8169_private *tp = netdev_priv(dev); @@ -7800,8 +7803,11 @@ rtl8169_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats) /* * Fetch additonal counter values missing in stats collected by driver -* from tally counters. +* from tally counters, but only if w
Re: [PATCH net 1/1] r8169: fix sleepable allocation during netdevice stats retrieval.
On 07.09.2015 10:50, Corinna Vinschen wrote: > On Sep 7 09:13, poma wrote: >> On 06.09.2015 12:19, Corinna Vinschen wrote: >>> On Sep 4 22:59, Francois Romieu wrote: Applies against davem's net as of f1ccbfce2fc787981d1182d09c1f6b67766783a8. drivers/net/ethernet/realtek/r8169.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c index 24dcbe6..56829ea 100644 --- a/drivers/net/ethernet/realtek/r8169.c +++ b/drivers/net/ethernet/realtek/r8169.c @@ -2200,7 +2200,7 @@ static struct rtl8169_counters *rtl8169_map_counters(struct net_device *dev, struct rtl8169_counters *counters; u32 cmd; - counters = dma_alloc_coherent(d, sizeof(*counters), paddr, GFP_KERNEL); + counters = dma_alloc_coherent(d, sizeof(*counters), paddr, GFP_ATOMIC); if (counters) { RTL_W32(CounterAddrHigh, (u64)*paddr >> 32); cmd = (u64)*paddr & DMA_BIT_MASK(32); -- 2.4.3 >>> [...] >> 1. Reverted to r8169.c?id=eb78139 >>- the noise is still present >> 2. Patches applied - Francois Romieu (3): >>r8169: decouple the counters data and the device private area. >>r8169: move rtl_reset_counters_cond before the hardware counters helpers. >>r8169: increase the lifespan of the hardware counters dump area. >>- the noise is still present > > Sure you tested the right code? I'm just asking because... > >> [ 70.016445] [] dump_stack+0x4b/0x63 >> [ 70.016456] [] lockdep_rcu_suspicious+0xd7/0x110 >> [ 70.016465] [] ___might_sleep+0xa7/0x230 >> [ 70.016472] [] __might_sleep+0x49/0x80 >> [ 70.016481] [] __alloc_pages_nodemask+0x2fe/0xb90 >> [ 70.016490] [] ? debug_lockdep_rcu_enabled+0x1d/0x20 >> [ 70.016499] [] ? sched_clock+0x9/0x10 >> [ 70.016507] [] ? local_clock+0x1c/0x20 >> [ 70.016514] [] ? debug_lockdep_rcu_enabled+0x1d/0x20 >> [ 70.016524] [] dma_generic_alloc_coherent+0x96/0x130 >> [ 70.016534] [] x86_swiotlb_alloc_coherent+0x25/0x50 >> [ 70.016541] [] dma_alloc_attrs+0x6d/0xe0 >> [ 70.016555] [] rtl8169_map_counters+0x3e/0x70 [r8169] > > ...rtl8169_map_counters only exists before Francois patch. The patch > removes this function. > > > Corinna > I do not know, is this OK? $ rpm -i https://kojipkgs.fedoraproject.org/packages/kernel/4.3.0/0.rc0.git7.1.fc24/src/kernel-4.3.0-0.rc0.git7.1.fc24.src.rpm $ rpmbuild -bp rpmbuild/SPECS/kernel.spec $ cd rpmbuild/BUILD/kernel-4.2.fc24/linux-4.3.0-0.rc0.git7.1.fc24.x86_64/ $ curl -s "http://marc.info/?l=linux-netdev&m=144145559429943&q=raw"; | patch -p1 patching file drivers/net/ethernet/realtek/r8169.c patch unexpectedly ends in middle of line $ curl -s "http://marc.info/?l=linux-netdev&m=144145559729944&q=raw"; | patch -p1 patching file drivers/net/ethernet/realtek/r8169.c patch unexpectedly ends in middle of line $ curl -s "http://marc.info/?l=linux-netdev&m=144145558929942&q=raw"; | patch -p1 patching file drivers/net/ethernet/realtek/r8169.c patch unexpectedly ends in middle of line $ sed -i 's/PATCHLEVEL = 2/PATCHLEVEL = 3/' Makefile $ sed -i 's/EXTRAVERSION =/EXTRAVERSION = -0.rc0.git7.1.fc24.x86_64/' Makefile $ make -j drivers/net/ethernet/realtek/r8169.ko $ su # cp drivers/net/ethernet/realtek/r8169.ko /usr/lib/modules/4.3.0-0.rc0.git7.1.fc24.x86_64/updates/ # depmod # dracut -f --kver 4.3.0-0.rc0.git7.1.fc24.x86_64 # lsinitrd --kver 4.3.0-0.rc0.git7.1.fc24.x86_64 | grep r8169 ... usr/lib/modules/4.3.0-0.rc0.git7.1.fc24.x86_64/updates/r8169.ko # systemctl reboot -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [V9fs-developer] [PATCH] 9p: trans_fd, initialize recv fcall properly if not set
I thought the nature of trans_fd would have prevented any sort of true zero copy, but I suppose one less is always welcome :) -eric On Sun, Sep 6, 2015 at 1:55 AM, Dominique Martinet wrote: > Eric Van Hensbergen wrote on Sat, Sep 05, 2015: >> On Thu, Sep 3, 2015 at 4:38 AM, Dominique Martinet >> wrote: >> > To be honest, I think it might be better to just bail out if we get in >> > this switch (m->req->rc == NULL after p9_tag_lookup) and not try to >> > allocate more, because if we get there it's likely a race condition and >> > silently re-allocating will end up in more troubles than trying to >> > recover is worth. >> > Thoughts ? >> > >> >> Hmmm...trying to rattle my brain and remember why I put it in there >> back in 2008. >> It might have just been over-defensive programming -- or more likely it just >> pre-dated all the zero copy infrastructure which pretty much guaranteed we >> had >> an rc allocated and what is there is vestigial. I'm happy to accept a >> patch which >> makes this an assert, or perhaps just resets the connection because something >> has gone horribly wrong (similar to the ENOMEM path that is there now). > > Yeah, it looks like the safety comes from the zero-copy stuff that came > much later. > Let's go with resetting the connection then. Hmm. EIO is a bit too > generic so would be good to avoid that if possible, but can't think of > anything better... > > > Speaking of zero-copy, I believe it should be fairly straight-forward to > implement for trans_fd now I've actually looked at it, since we do the > payload read after a p9_tag_lookup, would just need m->req to point to a > zc buffer. Write is similar, if there's a zc buffer just send it after > the header. > The cost is a couple more pointers in req and an extra if in both > workers, that seems pretty reasonable. > > Well, I'm not using trans_fd much here (and unfortunately zero-copy > isn't possible at all given the transport protocol for RDMA, at least > for recv), but if anyone cares it probably could be done without too > much hassle for the fd workers. > > -- > Dominique -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] ath6kl: remove redundant null pointer check on send_pkt
From: Colin Ian King The check for send_pkt being NULL is redundant before the call to htc_reclaim_txctrl_buf, therefore it should be removed. This was detected by static analysis by cppcheck. Signed-off-by: Colin Ian King --- drivers/net/wireless/ath/ath6kl/htc_mbox.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/net/wireless/ath/ath6kl/htc_mbox.c b/drivers/net/wireless/ath/ath6kl/htc_mbox.c index e481f14..fffb65b 100644 --- a/drivers/net/wireless/ath/ath6kl/htc_mbox.c +++ b/drivers/net/wireless/ath/ath6kl/htc_mbox.c @@ -1085,9 +1085,7 @@ static int htc_setup_tx_complete(struct htc_target *target) send_pkt->completion = NULL; ath6kl_htc_tx_prep_pkt(send_pkt, 0, 0, 0); status = ath6kl_htc_tx_issue(target, send_pkt); - - if (send_pkt != NULL) - htc_reclaim_txctrl_buf(target, send_pkt); + htc_reclaim_txctrl_buf(target, send_pkt); return status; } -- 2.5.0 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] x86: Wire up 32-bit direct socket calls
On Wednesday 02 September 2015 13:16:19 H. Peter Anvin wrote: > On 09/02/2015 02:48 AM, Geert Uytterhoeven wrote: > > > > Should all other architectures follow suit? > > Or should we follow the s390 approach: > > > > It is up to the maintainer(s), largely dependent on how likely you are > going to want to support this in your libc, but in general, socketcall > is an abomination which there is no reason not to bypass. > > So follow suit unless you have a strong reason not to. +1 In my y2038 syscall series, I'm adding a new recvmmsg64 call, and we may decide to add new setsockopt/getsockopt variants as well. This is probably not the last change to socketcall, and it would be made much easier if all architectures had separate calls here. It seems that there are very few architectures that don't already have the separate calls: $ git grep -l __NR_socketcall arch/*/include/uapi | xargs git grep -L recvmsg arch/cris/include/uapi/asm/unistd.h arch/frv/include/uapi/asm/unistd.h arch/m32r/include/uapi/asm/unistd.h arch/m68k/include/uapi/asm/unistd.h arch/mn10300/include/uapi/asm/unistd.h arch/s390/include/uapi/asm/unistd.h These are of course all examples of architectures that originally followed the i386 syscall scheme closely rather than trying to leave out obsolete calls. Arnd -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] ip link/addr show: add empty line between interfaces
On Sun, 6 Sep 2015 13:01:09 +0200, Felix Kaiser wrote: > This improves the readability of the output. Or makes the readability worse by requiring more scrolling, depending on number of interfaces you have and your taste. Many users prefer the current format. Also, as sad as it is, there are scripts out there that parse ip output and I have no doubts this would break them. Jiri -- Jiri Benc -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net v3] ipv6: fix multipath route replace error recovery
Le 06/09/2015 22:46, Roopa Prabhu a écrit : From: Roopa Prabhu I've sent you some comments about the v2, so please keep me in CC for the next versions. Problem: The ecmp route replace support for ipv6 in the kernel, deletes the existing ecmp route too early, ie when it installs the first nexthop. If there is an error in installing the subsequent nexthops, its too late to recover the already deleted existing route This patch fixes the problem with the following: It does not really 'fix' the problem, it only reduces the probability to have an error. This is really different. The status is much better after this patch, but it could be good to reword a bit the commitlog to reflect that. a) Changes the existing multipath route add code to a two stage process: build rt6_infos + insert them ip6_route_add rt6_info creation code is moved into ip6_route_info_create. b) This ensures that all errors are caught during building rt6_infos and we fail early c) Separates multipath add and del code. Because add needs the special two stage mode in a) and delete essentially does not care. d) In any event if the code fails during inserting a route again, a warning is printed (This should be unlikely) Before the patch: $ip -6 route show 3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024 3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024 3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024 /* Try replacing the route with a duplicate nexthop */ $ip -6 route change 3000:1000:1000:1000::2/128 nexthop via fe80::202:ff:fe00:b dev swp49s0 nexthop via fe80::202:ff:fe00:d dev swp49s1 nexthop via fe80::202:ff:fe00:d dev swp49s1 RTNETLINK answers: File exists $ip -6 route show /* previously added ecmp route 3000:1000:1000:1000::2 dissappears from * kernel */ After the patch: $ip -6 route show 3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024 3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024 3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024 /* Try replacing the route with a duplicate nexthop */ $ip -6 route change 3000:1000:1000:1000::2/128 nexthop via fe80::202:ff:fe00:b dev swp49s0 nexthop via fe80::202:ff:fe00:d dev swp49s1 nexthop via fe80::202:ff:fe00:d dev swp49s1 RTNETLINK answers: File exists $ip -6 route show 3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024 3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024 3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024 Fixes: 27596472473a ("ipv6: fix ECMP route replacement") As said in the v2 thread, I still don't agree with this tag. [snip] +static void ip6_print_replace_route_err(struct list_head *rt6_nh_list) +{ + struct rt6_nh *nh; + + list_for_each_entry(nh, rt6_nh_list, next) { + pr_warn("IPV6: replace premature del %pI6 nexthop %pI6 ifi %d\n", I don't think that a user (who didn't read the code) can understand this sentence. Another suggestion: "ECMPv6: route replacement failed (check the consistency of the installed route)". Not sure that the nexthops should be listed after. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net] cxgb4: Changes for new firmware 1.14.4.0
Incorporate fw_ldst_cmd structure change for new firmware and also update version string for the same Signed-off-by: Hariprasad Shenai --- drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h | 33 +-- drivers/net/ethernet/chelsio/cxgb4/t4fw_version.h | 12 - 2 files changed, 37 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h b/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h index ab46746..a32de30 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h +++ b/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h @@ -762,8 +762,6 @@ enum fw_ldst_func_mod_index { struct fw_ldst_cmd { __be32 op_to_addrspace; -#define FW_LDST_CMD_ADDRSPACE_S0 -#define FW_LDST_CMD_ADDRSPACE_V(x) ((x) << FW_LDST_CMD_ADDRSPACE_S) __be32 cycles_to_len16; union fw_ldst { struct fw_ldst_addrval { @@ -788,6 +786,13 @@ struct fw_ldst_cmd { __be16 vctl; __be16 rval; } mdio; + struct fw_ldst_cim_rq { + u8 req_first64[8]; + u8 req_second64[8]; + u8 resp_first64[8]; + u8 resp_second64[8]; + __be32 r3[2]; + } cim_rq; union fw_ldst_mps { struct fw_ldst_mps_rplc { __be16 fid_idx; @@ -828,9 +833,33 @@ struct fw_ldst_cmd { __be16 nset_pkd; __be32 data[12]; } pcie; + struct fw_ldst_i2c_deprecated { + u8 pid_pkd; + u8 base; + u8 boffset; + u8 data; + __be32 r9; + } i2c_deprecated; + struct fw_ldst_i2c { + u8 pid; + u8 did; + u8 boffset; + u8 blen; + __be32 r9; + __u8 data[48]; + } i2c; + struct fw_ldst_le { + __be32 index; + __be32 r9; + u8 val[33]; + u8 r11[7]; + } le; } u; }; +#define FW_LDST_CMD_ADDRSPACE_S0 +#define FW_LDST_CMD_ADDRSPACE_V(x) ((x) << FW_LDST_CMD_ADDRSPACE_S) + #define FW_LDST_CMD_MSG_S 31 #define FW_LDST_CMD_MSG_V(x) ((x) << FW_LDST_CMD_MSG_S) diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4fw_version.h b/drivers/net/ethernet/chelsio/cxgb4/t4fw_version.h index 32b2135..58a0327 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/t4fw_version.h +++ b/drivers/net/ethernet/chelsio/cxgb4/t4fw_version.h @@ -36,18 +36,18 @@ #define __T4FW_VERSION_H__ #define T4FW_VERSION_MAJOR 0x01 -#define T4FW_VERSION_MINOR 0x0D -#define T4FW_VERSION_MICRO 0x20 +#define T4FW_VERSION_MINOR 0x0E +#define T4FW_VERSION_MICRO 0x04 #define T4FW_VERSION_BUILD 0x00 #define T5FW_VERSION_MAJOR 0x01 -#define T5FW_VERSION_MINOR 0x0D -#define T5FW_VERSION_MICRO 0x20 +#define T5FW_VERSION_MINOR 0x0E +#define T5FW_VERSION_MICRO 0x04 #define T5FW_VERSION_BUILD 0x00 #define T6FW_VERSION_MAJOR 0x01 -#define T6FW_VERSION_MINOR 0x0D -#define T6FW_VERSION_MICRO 0x2D +#define T6FW_VERSION_MINOR 0x0E +#define T6FW_VERSION_MICRO 0x04 #define T6FW_VERSION_BUILD 0x00 #endif -- 2.3.4 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next v2] ipv6: fix multipath route replace error recovery
+ Michal Kubecek Le 06/09/2015 22:46, roopa a écrit : On 9/4/15, 1:12 AM, Nicolas Dichtel wrote: Le 03/09/2015 01:44, Roopa Prabhu a écrit : [snip] Fixes: 4a287eba2de3 ("IPv6 routing, NLM_F_* flag support: REPLACE and EXCL flags support, warn about missing CREATE flag") ECMP was added one year after this patch. The right tag is: Fixes: 51ebd3181572 ("ipv6: add support of equal cost multipath (ECMP)") I went back and looked again. It is a recent one 27596472473a ("ipv6: fix ECMP route replacement"). The bug you're trying to fix has been introduced by the commit 51ebd3181572 ("ipv6: add support of equal cost multipath (ECMP)"). Commit 27596472473a ("ipv6: fix ECMP route replacement") didn't try to fix this problem. Note that the first patch you submitted to fix this pb (cf http://patchwork.ozlabs.org/patch/362296/) was in June 2014, ie one year before the commit 27596472473a. Regards, Nicolas -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [GIT] Networking
From: Rustad, Mark D ... > >> static int smp_ah(struct crypto_blkcipher *tfm, const u8 irk[16], > >> const u8 r[3], u8 res[3]) > > > > Expect that it looks like you are passing arrays by value, > > but instead you are passing by reference. > > > > Explicitly pass by reference and sizeof works. > > It depends on what you mean by works. It at least doesn't look so misleading > when passing by reference > and so works more as expected. The sizeof in either case will never return > the size of the array. To > have sizeof return the size of the array would require a typedef of the array > to pass by reference. In > some cases that could be the right thing to do. Feed: int bar(int (*f)[10]) { return sizeof *f; } into cc -O2 -S and look at the generated code - returns 40 not 4. David -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: IPv6 xfrm GSO fragmentation bug
On Fri, Sep 04, 2015 at 01:21:06PM +0800, Herbert Xu wrote: > On Mon, Aug 31, 2015 at 03:35:26PM +0800, Herbert Xu wrote: > > > > I see where the bug came from. Indeed IPv6 does do fragmentation > > but only for tunnel mode. While your patch added a check that also > > affected transport mode. So in addition to the GSO fix we should > > also make the MTU check conditional to tunnel mode. > > Here is the patch: > > ---8<--- > ipv6: Fix IPsec pre-encap fragmentation check > > The IPv6 IPsec pre-encap path performs fragmentation for tunnel-mode > packets. That is, we perform fragmentation pre-encap rather than > post-encap. > > A check was added later to ensure that proper MTU information is > passed back for locally generated traffic. Unfortunately this > check was performed on all IPsec packets, including transport-mode > packets. > > What's more, the check failed to take GSO into account. > > The end result is that transport-mode GSO packets get dropped at > the check. > > This patch fixes it by moving the tunnel mode check forward as well > as adding the GSO check. > > Fixes: dd767856a36e ("xfrm6: Don't call icmpv6_send on local error") > Signed-off-by: Herbert Xu Applied to the ipsec tree, thanks Herbert! -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[net-next PATCH v3] drivers: net: cpsw: Add support to drive gpios for ethernet to be functional
In DRA72x EVM, by default slave 1 is connected to the onboard phy, but slave 2 pins are also muxed with video input module which is controlled by pcf857x gpio and currently to select slave 0 to connect to phy gpio hogging is used, but with omap2plus_defconfig the pcf857x gpio is built as module. So when using NFS on DRA72x EVM, board doesn't boot as gpio hogging do not set proper gpio state to connect slave 0 to phy as it is built as module and you do not see any errors for not setting gpio and just mentions dhcp reply not got. To solve this issue, introducing "mode-gpios" in DT when gpio based muxing is required. This will throw a warning when gpio get fails and returns probe defer. When gpio-pcf857x module is installed, cpsw probes again and ethernet becomes functional. Verified this on DRA72x with pcf as module and ramdisk. Signed-off-by: Mugunthan V N --- Changes from v2: * Used mode-gpios, so that the driver is generic enough to handle multiple gpios This patch is tested on DRA72x, Logs [1] and pushed a branch [2] [1]: http://pastebin.ubuntu.com/12306224/ [2]: git://git.ti.com/~mugunthanvnm/ti-linux-kernel/linux.git cpsw-gpio-optional-v3 --- Documentation/devicetree/bindings/net/cpsw.txt | 7 +++ drivers/net/ethernet/ti/cpsw.c | 9 + 2 files changed, 16 insertions(+) diff --git a/Documentation/devicetree/bindings/net/cpsw.txt b/Documentation/devicetree/bindings/net/cpsw.txt index a9df21a..676ecf6 100644 --- a/Documentation/devicetree/bindings/net/cpsw.txt +++ b/Documentation/devicetree/bindings/net/cpsw.txt @@ -30,6 +30,13 @@ Optional properties: - dual_emac: Specifies Switch to act as Dual EMAC - syscon : Phandle to the system control device node, which is the control module device of the am33x +- mode-gpios : Should be added if one/multiple gpio lines are + required to be driven so that cpsw data lines + can be connected to the phy via selective mux. + For example in dra72x-evm, pcf gpio has to be + driven low so that cpsw slave 0 and phy data + lines are connected via mux. + Slave Properties: Required properties: diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c index 8fc90f1..c670317 100644 --- a/drivers/net/ethernet/ti/cpsw.c +++ b/drivers/net/ethernet/ti/cpsw.c @@ -29,6 +29,7 @@ #include #include #include +#include #include #include #include @@ -2207,6 +2208,7 @@ static int cpsw_probe(struct platform_device *pdev) void __iomem*ss_regs; struct resource *res, *ss_res; const struct of_device_id *of_id; + struct gpio_descs *mode; u32 slave_offset, sliver_offset, slave_size; int ret = 0, i; int irq; @@ -2232,6 +2234,13 @@ static int cpsw_probe(struct platform_device *pdev) goto clean_ndev_ret; } + mode = devm_gpiod_get_array_optional(&pdev->dev, "mode", GPIOD_OUT_LOW); + if (IS_ERR(mode)) { + ret = PTR_ERR(mode); + dev_err(&pdev->dev, "gpio request failed, ret %d\n", ret); + goto clean_ndev_ret; + } + /* * This may be required here for child devices. */ -- 2.6.0.rc0.24.gec371ff -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: slab-nomerge (was Re: [git pull] device mapper changes for 4.3)
On Thu, 3 Sep 2015 20:51:09 -0700 Linus Torvalds wrote: > On Thu, Sep 3, 2015 at 8:26 PM, Dave Chinner wrote: > > > > The double standard is the problem here. No notification, proof, > > discussion or review was needed to turn on slab merging for > > everyone, but you're setting a very high bar to jump if anyone wants > > to turn it off in their code. > > Ehh. You realize that almost the only load that is actually seriously > allocator-limited is networking? > > And slub was beating slab on that? And slub has been doing the merging > since day one. Slab was just changed to try to keep up with the > winning strategy. Sorry, I have to correct you on this. The slub allocator is not as fast as you might think. The slab allocator is actually faster for networking. IP-forwarding, single CPU, single flow UDP (highly tuned): * Allocator slub: 2043575 pps * Allocator slab: 2088295 pps Difference slab faster than slub: * +44720 pps and -10.48ns The slub allocator have a faster "fastpath", if your workload is fast-reusing within the same per-cpu page-slab, but once the workload increases you hit the slowpath, and then slab catches up. Slub looks great in micro-benchmarking. As you can see in patchset: [PATCH 0/3] Network stack, first user of SLAB/kmem_cache bulk free API. http://thread.gmane.org/gmane.linux.kernel.mm/137469/focus=376625 I'm working on speeding up slub to the level of slab. And it seems like I have succeeded with half-a-nanosec 2090522 pps (+2227 pps or 0.51 ns). And with "slab_nomerge" I get even high performance: * slub: bulk-free and slab_nomerge: 2121824 pps * Diff to slub: +78249 and -18.05ns -- Best regards, Jesper Dangaard Brouer MSc.CS, Sr. Network Kernel Developer at Red Hat Author of http://www.iptv-analyzer.org LinkedIn: http://www.linkedin.com/in/brouer -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net 3/3] r8169: increase the lifespan of the hardware counters dump area.
On Sep 6 22:21, Francois Romieu wrote: > Corinna Vinschen : > > On Sep 5 14:18, rom...@fr.zoreil.com wrote: > [...] > > > - rtl_reset_counters_cond induced failures in open() are also considered > > > fatal: it takes acceptable work to unwind comfortably. > > > > Why? > > > Crap, my description does not match the code wrt rtl_reset_counters_cond. :o/ > > s/rtl8169_reset_counters/rtl8169_update_counters/g > > The code is right. > > [...] > > This returns -EINVAL even for older chip versions which are not capable > > of resetting the counters. The result is, this driver will not work at > > all on chip versions prior to RTL_GIGA_MAC_VER_19 anymore, because > > rtl_open will always fail. > > No. My changelog was misleading. rtl_init_counter_offsets handles this part > correctly. Oh, right, I missed that rtl_init_counter_offsets checks for `if (!rc)', not for `if (rc < 0)' as for the call to rtl8169_update_counters. Still wondering though. Given that the driver never failed before if the counter values couldn't be updated, and given that these counter values only have statistical relevance, why should this suddenly result in a fatal failure at open time? Corinna pgpU8NPcDgH_n.pgp Description: PGP signature
RE: [PATCH] net: tipc: fix stall during bclink wakeup procedure
If an attempt to wake up users of broadcast link is made when there is no enough place in send queue than it may hang up inside the tipc_sk_rcv() function since the loop breaks only after the wake up queue becomes empty. This can lead to complete CPU stall with the following message generated by RCU: INFO: rcu_sched self-detected stall on CPU { 0} (t=2101 jiffies g=54225 c=54224 q=11465) Task dump for CPU 0: tpchR running task0 39949 39948 0x000a 818536c0 88181fa037a0 8106a4be 818536c0 88181fa037c0 8106d8a8 88181fa03800 0001 88181fa037f0 81094a50 88181fa15680 Call Trace: [] sched_show_task+0xae/0x120 [] dump_cpu_task+0x38/0x40 [] rcu_dump_cpu_stacks+0x90/0xd0 [] rcu_check_callbacks+0x3eb/0x6e0 [] ? account_system_time+0x7f/0x170 [] update_process_times+0x34/0x60 [] tick_sched_handle.isra.18+0x31/0x40 [] tick_sched_timer+0x3c/0x70 [] __run_hrtimer.isra.34+0x3d/0xc0 [] hrtimer_interrupt+0xc5/0x1e0 [] ? native_smp_send_reschedule+0x42/0x60 [] local_apic_timer_interrupt+0x34/0x60 [] smp_apic_timer_interrupt+0x3c/0x60 [] apic_timer_interrupt+0x6b/0x70 [] ? _raw_spin_unlock_irqrestore+0x9/0x10 [] __wake_up_sync_key+0x4f/0x60 [] tipc_write_space+0x31/0x40 [tipc] [] filter_rcv+0x31f/0x520 [tipc] [] ? tipc_sk_lookup+0xc9/0x110 [tipc] [] ? _raw_spin_lock_bh+0x19/0x30 [] tipc_sk_rcv+0x2dc/0x3e0 [tipc] [] tipc_bclink_wakeup_users+0x2f/0x40 [tipc] [] tipc_node_unlock+0x186/0x190 [tipc] [] ? kfree_skb+0x2c/0x40 [] tipc_rcv+0x2ac/0x8c0 [tipc] [] tipc_l2_rcv_msg+0x38/0x50 [tipc] [] __netif_receive_skb_core+0x5a3/0x950 [] __netif_receive_skb+0x13/0x60 [] netif_receive_skb_internal+0x1e/0x90 [] napi_gro_receive+0x78/0xa0 [] tg3_poll_work+0xc54/0xf40 [tg3] [] ? consume_skb+0x2c/0x40 [] tg3_poll_msix+0x41/0x160 [tg3] [] net_rx_action+0xe2/0x290 [] __do_softirq+0xda/0x1f0 [] irq_exit+0x76/0xa0 [] do_IRQ+0x55/0xf0 [] common_interrupt+0x6b/0x6b The issue occurs only when tipc_sk_rcv() is used to wake up postponed senders: tipc_bclink_wakeup_users() // wakeupq - is a queue which consists of special // messages with SOCK_WAKEUP type. tipc_sk_rcv(wakeupq) ... while (skb_queue_len(inputq)) { filter_rcv(skb) // Here the type of message is checked // and if it is SOCK_WAKEUP then // it tries to wake up a sender. tipc_write_space(sk) wake_up_interruptible_sync_poll() } After the sender thread is woke up it can gather control and perform an attempt to send a message. But if there is no enough place in send queue it will call link_schedule_user() function which puts a message of type SOCK_WAKEUP to the wakeup queue and put the sender to sleep. Thus the size of the queue actually is not changed and the while() loop never exits. The approach I proposed is to wake up only senders for which there is enough place in send queue so the described issue can't occur. Moreover the same approach is already used to wake up senders on unicast links. I have got into the issue on our product code but to reproduce the issue I changed a benchmark test application (from tipcutils/demos/benchmark) to perform the following scenario: 1. Run 64 instances of test application (nodes). It can be done on the one physical machine. 2. Each application connects to all other using TIPC sockets in RDM mode. 3. When setup is done all nodes start simultaneously send broadcast messages. 4. Everything hangs up. The issue is reproducible only when a congestion on broadcast link occurs. For example, when there are only 8 nodes it works fine since congestion doesn't occur. Send queue limit is 40 in my case (I use a critical importance level) and when 64 nodes send a message at the same moment a congestion occurs every time. Signed-off-by: Dmitry S Kolmakov Reviewed-by: Jon Maloy Acked-by: Ying Xue --- v2: Updated after comments from Jon and Ying. diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c index c5cbdcb..997dd60 100644 --- a/net/tipc/bcast.c +++ b/net/tipc/bcast.c @@ -169,6 +169,30 @@ static void bclink_retransmit_pkt(struct tipc_net *tn, u32 after, u32 to) } /** + * bclink_prepare_wakeup - prepare users for wakeup after congestion + * @bcl: broadcast link + * @resultq: queue for users which can be woken up + * Move a number of waiting users, as permitted by available space in + * the send queue, from link wait queue to specified queue for wakeup + */ +static void bclink_prepare_wakeup(struct tipc_link *bcl, struct sk_
Re: [PATCH net 1/1] r8169: fix sleepable allocation during netdevice stats retrieval.
On Sep 7 09:13, poma wrote: > On 06.09.2015 12:19, Corinna Vinschen wrote: > > On Sep 4 22:59, Francois Romieu wrote: > >> Applies against davem's net as of > >> f1ccbfce2fc787981d1182d09c1f6b67766783a8. > >> > >> drivers/net/ethernet/realtek/r8169.c | 2 +- > >> 1 file changed, 1 insertion(+), 1 deletion(-) > >> > >> diff --git a/drivers/net/ethernet/realtek/r8169.c > >> b/drivers/net/ethernet/realtek/r8169.c > >> index 24dcbe6..56829ea 100644 > >> --- a/drivers/net/ethernet/realtek/r8169.c > >> +++ b/drivers/net/ethernet/realtek/r8169.c > >> @@ -2200,7 +2200,7 @@ static struct rtl8169_counters > >> *rtl8169_map_counters(struct net_device *dev, > >>struct rtl8169_counters *counters; > >>u32 cmd; > >> > >> - counters = dma_alloc_coherent(d, sizeof(*counters), paddr, GFP_KERNEL); > >> + counters = dma_alloc_coherent(d, sizeof(*counters), paddr, GFP_ATOMIC); > >>if (counters) { > >>RTL_W32(CounterAddrHigh, (u64)*paddr >> 32); > >>cmd = (u64)*paddr & DMA_BIT_MASK(32); > >> -- > >> 2.4.3 > > [...] > 1. Reverted to r8169.c?id=eb78139 >- the noise is still present > 2. Patches applied - Francois Romieu (3): >r8169: decouple the counters data and the device private area. >r8169: move rtl_reset_counters_cond before the hardware counters helpers. >r8169: increase the lifespan of the hardware counters dump area. >- the noise is still present Sure you tested the right code? I'm just asking because... > [ 70.016445] [] dump_stack+0x4b/0x63 > [ 70.016456] [] lockdep_rcu_suspicious+0xd7/0x110 > [ 70.016465] [] ___might_sleep+0xa7/0x230 > [ 70.016472] [] __might_sleep+0x49/0x80 > [ 70.016481] [] __alloc_pages_nodemask+0x2fe/0xb90 > [ 70.016490] [] ? debug_lockdep_rcu_enabled+0x1d/0x20 > [ 70.016499] [] ? sched_clock+0x9/0x10 > [ 70.016507] [] ? local_clock+0x1c/0x20 > [ 70.016514] [] ? debug_lockdep_rcu_enabled+0x1d/0x20 > [ 70.016524] [] dma_generic_alloc_coherent+0x96/0x130 > [ 70.016534] [] x86_swiotlb_alloc_coherent+0x25/0x50 > [ 70.016541] [] dma_alloc_attrs+0x6d/0xe0 > [ 70.016555] [] rtl8169_map_counters+0x3e/0x70 [r8169] ...rtl8169_map_counters only exists before Francois patch. The patch removes this function. Corinna pgpVJzCq1DXC1.pgp Description: PGP signature
Re: [RFC PATCH 1/3] net: introduce kfree_skb_bulk() user of kmem_cache_free_bulk()
On Fri, 4 Sep 2015 11:47:17 -0700 Tom Herbert wrote: > On Fri, Sep 4, 2015 at 10:00 AM, Jesper Dangaard Brouer > wrote: > > Introduce the first user of SLAB bulk free API kmem_cache_free_bulk(), > > in the network stack in form of function kfree_skb_bulk() which bulk > > free SKBs (not skb clones or skb->head, yet). > > [...] > > +/** > > + * kfree_skb_bulk - bulk free SKBs when refcnt allows to > > + * @skbs: array of SKBs to free > > + * @size: number of SKBs in array > > + * > > + * If SKB refcnt allows for free, then release any auxiliary data > > + * and then bulk free SKBs to the SLAB allocator. > > + * > > + * Note that interrupts must be enabled when calling this function. > > + */ > > +void kfree_skb_bulk(struct sk_buff **skbs, unsigned int size) > > +{ > > What not pass a list of skbs (e.g. using skb->next)? Because the next layer, the slab API needs an array: kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p) Look at the patch: [PATCH V2 3/3] slub: build detached freelist with look-ahead http://thread.gmane.org/gmane.linux.kernel.mm/137469/focus=137472 Where I use this array to progressively scan for objects belonging to the same page. (A subtle detail is I manage to zero out the array, which is good from a security/error-handling point of view, as pointers to the objects are not left dangling on the stack). I cannot argue that, writing skb->next comes as an additional cost, because the slUb free also writes into this cacheline. Perhaps the slAb allocator does not? [...] > > + if (likely(cnt)) { > > + kmem_cache_free_bulk(skbuff_head_cache, cnt, (void **) > > skbs); > > + } > > +} > > +EXPORT_SYMBOL(kfree_skb_bulk); -- Best regards, Jesper Dangaard Brouer MSc.CS, Sr. Network Kernel Developer at Red Hat Author of http://www.iptv-analyzer.org LinkedIn: http://www.linkedin.com/in/brouer -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/3] Network stack, first user of SLAB/kmem_cache bulk free API.
On Fri, 4 Sep 2015 11:09:21 -0700 Alexander Duyck wrote: > This is an interesting start. However I feel like it might work better > if you were to create a per-cpu pool for skbs that could be freed and > allocated in NAPI context. So for example we already have > napi_alloc_skb, why not just add a napi_free_skb I do like the idea... > and then make the array > of objects to be freed part of a pool that could be used for either > allocation or freeing? If the pool runs empty you just allocate > something like 8 or 16 new skb heads, and if you fill it you just free > half of the list? But I worry that this algorithm will "randomize" the (skb) objects. And the SLUB bulk optimization only works if we have many objects belonging to the same page. It would likely be fastest to implement a simple stack (for these per-cpu pools), but I again worry that it would randomize the object-pages. A simple queue might be better, but slightly slower. Guess I could just reuse part of qmempool / alf_queue as a quick test. Having a per-cpu pool in networking would solve the problem of the slub per-cpu pool isn't large enough for our use-case. On the other hand, maybe we should fix slub to dynamically adjust the size of it's per-cpu resources? A pre-req knowledge (for people not knowing slub's internal details): Slub alloc path will pickup a page, and empty all objects for that page before proceeding to the next page. Thus, slub bulk alloc will give many objects belonging to the page. I'm trying to keep these objects grouped together until they can be free'ed in a bulk. -- Best regards, Jesper Dangaard Brouer MSc.CS, Sr. Network Kernel Developer at Red Hat Author of http://www.iptv-analyzer.org LinkedIn: http://www.linkedin.com/in/brouer -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net 1/1] r8169: fix sleepable allocation during netdevice stats retrieval.
On 06.09.2015 12:19, Corinna Vinschen wrote: > On Sep 4 22:59, Francois Romieu wrote: >> net/core/net-sysfs.c::netstat_show fetches statistics under dev_base_lock. >> >> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=104031 >> Fixes: 6e85d5ad36a2 ("r8169: Add values missing in @get_stats64 from HW >> counters") >> Signed-off-by: Francois Romieu >> Cc: Corinna Vinschen >> --- >> >> Applies against davem's net as of f1ccbfce2fc787981d1182d09c1f6b67766783a8. >> >> drivers/net/ethernet/realtek/r8169.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/drivers/net/ethernet/realtek/r8169.c >> b/drivers/net/ethernet/realtek/r8169.c >> index 24dcbe6..56829ea 100644 >> --- a/drivers/net/ethernet/realtek/r8169.c >> +++ b/drivers/net/ethernet/realtek/r8169.c >> @@ -2200,7 +2200,7 @@ static struct rtl8169_counters >> *rtl8169_map_counters(struct net_device *dev, >> struct rtl8169_counters *counters; >> u32 cmd; >> >> -counters = dma_alloc_coherent(d, sizeof(*counters), paddr, GFP_KERNEL); >> +counters = dma_alloc_coherent(d, sizeof(*counters), paddr, GFP_ATOMIC); >> if (counters) { >> RTL_W32(CounterAddrHigh, (u64)*paddr >> 32); >> cmd = (u64)*paddr & DMA_BIT_MASK(32); >> -- >> 2.4.3 > > I'll have a look tomorrow what I'm back to work, but for the time > being, two questions. > > - The code worked for me in local testing without problem, so I'm rather > puzzled. How can I reproduce the problem? > > - I'm pretty new to this stuff, so I don't understand this: > > The dma_alloc_coherent(...,GFP_KERNEL) call is not new in the code, > it's there since at least 2010. It appears to work fine in the > context of @get_ethtool_stats. Why does this not work in the context > of @ndo_get_stats64? > > > Thanks, > Corinna > 1. Reverted to r8169.c?id=eb78139 - the noise is still present 2. Patches applied - Francois Romieu (3): r8169: decouple the counters data and the device private area. r8169: move rtl_reset_counters_cond before the hardware counters helpers. r8169: increase the lifespan of the hardware counters dump area. - the noise is still present $ modinfo -n r8169 /lib/modules/4.3.0-0.rc0.git7.1.fc24.x86_64/updates/r8169.ko This noise is induced via userspace, xfce4-netload-plugin, http://goodies.xfce.org/projects/panel-plugins/xfce4-netload-plugin $ grep -i device .config/xfce4/panel/netload-16.rc Network_Device=enp3s0 $ ethtool -i enp3s0 | grep driver driver: r8169 $ dmesg | grep panel-16-netloa | wc -l 237 $ dmesg ... [ 68.597764] bridge0: port 1(enp3s0) entered forwarding state [ 70.016291] === [ 70.016296] [ INFO: suspicious RCU usage. ] [ 70.016303] 4.3.0-0.rc0.git7.1.fc24.x86_64 #1 Not tainted [ 70.016308] --- [ 70.016314] include/linux/rcupdate.h:579 Illegal context switch in RCU read-side critical section! [ 70.016319] other info that might help us debug this: [ 70.016327] rcu_scheduler_active = 1, debug_locks = 0 [ 70.016334] 2 locks held by panel-16-netloa/1806: [ 70.016338] #0: (&p->lock){+.+.+.}, at: [] seq_read+0x4c/0x3e0 [ 70.016362] #1: (rcu_read_lock){..}, at: [] dev_seq_start+0x5/0x140 [ 70.016380] stack backtrace: [ 70.016390] CPU: 3 PID: 1806 Comm: panel-16-netloa Not tainted 4.3.0-0.rc0.git7.1.fc24.x86_64 #1 ... [ 70.016435] Call Trace: [ 70.016445] [] dump_stack+0x4b/0x63 [ 70.016456] [] lockdep_rcu_suspicious+0xd7/0x110 [ 70.016465] [] ___might_sleep+0xa7/0x230 [ 70.016472] [] __might_sleep+0x49/0x80 [ 70.016481] [] __alloc_pages_nodemask+0x2fe/0xb90 [ 70.016490] [] ? debug_lockdep_rcu_enabled+0x1d/0x20 [ 70.016499] [] ? sched_clock+0x9/0x10 [ 70.016507] [] ? local_clock+0x1c/0x20 [ 70.016514] [] ? debug_lockdep_rcu_enabled+0x1d/0x20 [ 70.016524] [] dma_generic_alloc_coherent+0x96/0x130 [ 70.016534] [] x86_swiotlb_alloc_coherent+0x25/0x50 [ 70.016541] [] dma_alloc_attrs+0x6d/0xe0 [ 70.016555] [] rtl8169_map_counters+0x3e/0x70 [r8169] [ 70.016567] [] rtl8169_update_counters+0x64/0x140 [r8169] [ 70.016578] [] rtl8169_get_stats64+0xbf/0x130 [r8169] [ 70.016587] [] dev_get_stats+0x54/0x100 [ 70.016595] [] dev_seq_printf_stats+0x37/0x120 [ 70.016605] [] dev_seq_show+0x14/0x30 [ 70.016611] [] seq_read+0x2fa/0x3e0 [ 70.016621] [] proc_reg_read+0x42/0x70 [ 70.016628] [] __vfs_read+0x37/0x100 [ 70.016637] [] ? security_file_permission+0xa3/0xc0 [ 70.016644] [] vfs_read+0x86/0x130 [ 70.016652] [] SyS_read+0x58/0xd0 [ 70.016660] [] entry_SYSCALL_64_fastpath+0x12/0x76 [ 70.016698] BUG: sleeping function called from invalid context at mm/page_alloc.c:3186 [ 70.016704] in_atomic(): 1, irqs_disabled(): 0, pid: 1806, name: panel-16-netloa [ 70.016708] INFO: lockdep is turned off. [ 70.016715] CPU: 3 PID: 1806 Comm: panel-16-netloa Not tainted 4.3.0-0.rc0.git
Re: [PATCH net 3/3] r8169: increase the lifespan of the hardware counters dump area.
From: Francois Romieu Date: Sun, 6 Sep 2015 22:21:53 +0200 > Corinna Vinschen : >> On Sep 5 14:18, rom...@fr.zoreil.com wrote: > [...] >> > - rtl_reset_counters_cond induced failures in open() are also considered >> > fatal: it takes acceptable work to unwind comfortably. >> >> Why? > > > Crap, my description does not match the code wrt rtl_reset_counters_cond. :o/ > > s/rtl8169_reset_counters/rtl8169_update_counters/g > > The code is right. Please resubmit with fixed commit log messages, thanks! -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html