date:20150907

Re: [PATCH net 1/1] r8169: fix sleepable allocation during netdevice stats retrieval.

2015-09-07 Thread poma

2nd batch:
0001-r8169-decouple-the-counters-data-and-the-device-priv.patch
0002-r8169-use-a-single-condition-to-check-the-completion.patch
0003-r8169-increase-the-lifespan-of-the-hardware-counters.patch
- the noise is still present

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: parsing the output of ethtool programmatically

2015-09-07 Thread roopa


On 9/5/15, 2:53 AM, Ben Hutchings wrote:

On Thu, 2015-06-25 at 03:28 +0200, Robert Urban wrote:

Hello,

there doesn't seem to be any flag to request ethtool to present its output in a
script-friendly form.  Things like:

 Link partner advertised link modes:  10baseT/Half 10baseT/Full
  100baseT/Half 100baseT/Full
  1000baseT/Full

are kind of pain to parse.

Something to consider?

Or write a library for using SIOCETHTOOL.  The interface is fairly well
documented now and there are other users besides the ethtool utility.


one of the requests which we usually get is ethtool output in json 
format. since iproute2 tools are already getting
an option to print in json format, would be nice if we considered an 
option to print in json for ethtool too.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH net v3] ipv6: fix multipath route replace error recovery

2015-09-07 Thread roopa


On 9/6/15, 1:46 PM, Roopa Prabhu wrote:

From: Roopa Prabhu 

Problem:
The ecmp route replace support for ipv6 in the kernel, deletes the
existing ecmp route too early, ie when it installs the first nexthop.
If there is an error in installing the subsequent nexthops, its too late
to recover the already deleted existing route

This patch fixes the problem with the following:
a) Changes the existing multipath route add code to a two stage process:
   build rt6_infos + insert them
ip6_route_add rt6_info creation code is moved into
ip6_route_info_create.
b) This ensures that all errors are caught during building rt6_infos
   and we fail early

The other way I have been thinking of solving the problem is to mark the 
sibling routes being replaced with some state
...so they can be restored on error. Still figuring out a way to do this 
in a clean and non-intrusive way.
Or maybe  just save the sibling routes (rt6_infos) being replaced in a 
list and re-insert them on error.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH net v3] ipv6: fix multipath route replace error recovery

2015-09-07 Thread roopa


On 9/7/15, 5:37 AM, Nicolas Dichtel wrote:

Le 06/09/2015 22:46, Roopa Prabhu a écrit :

From: Roopa Prabhu 
I've sent you some comments about the v2, so please keep me in CC for 
the next

versions.



Problem:
The ecmp route replace support for ipv6 in the kernel, deletes the
existing ecmp route too early, ie when it installs the first nexthop.
If there is an error in installing the subsequent nexthops, its too late
to recover the already deleted existing route

This patch fixes the problem with the following:
It does not really 'fix' the problem, it only reduces the probability 
to have
an error. This is really different. The status is much better after 
this patch,

but it could be good to reword a bit the commitlog to reflect that.


sure.



a) Changes the existing multipath route add code to a two stage process:
   build rt6_infos + insert them
ip6_route_add rt6_info creation code is moved into
ip6_route_info_create.
b) This ensures that all errors are caught during building rt6_infos
   and we fail early
c) Separates multipath add and del code. Because add needs the special
   two stage mode in a) and delete essentially does not care.
d) In any event if the code fails during inserting a route again, a
   warning is printed (This should be unlikely)

Before the patch:
$ip -6 route show
3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024
3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024
3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024

/* Try replacing the route with a duplicate nexthop */
$ip -6 route change 3000:1000:1000:1000::2/128 nexthop via
fe80::202:ff:fe00:b dev swp49s0 nexthop via fe80::202:ff:fe00:d dev
swp49s1 nexthop via fe80::202:ff:fe00:d dev swp49s1
RTNETLINK answers: File exists

$ip -6 route show
/* previously added ecmp route 3000:1000:1000:1000::2 dissappears from
  * kernel */

After the patch:
$ip -6 route show
3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024
3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024
3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024

/* Try replacing the route with a duplicate nexthop */
$ip -6 route change 3000:1000:1000:1000::2/128 nexthop via
fe80::202:ff:fe00:b dev swp49s0 nexthop via fe80::202:ff:fe00:d dev
swp49s1 nexthop via fe80::202:ff:fe00:d dev swp49s1
RTNETLINK answers: File exists

$ip -6 route show
3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024
3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024
3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024

Fixes: 27596472473a ("ipv6: fix ECMP route replacement")

As said in the v2 thread, I still don't agree with this tag.

[snip]

+static void ip6_print_replace_route_err(struct list_head *rt6_nh_list)
+{
+struct rt6_nh *nh;
+
+list_for_each_entry(nh, rt6_nh_list, next) {
+pr_warn("IPV6: replace premature del %pI6 nexthop %pI6 ifi 
%d\n",

I don't think that a user (who didn't read the code) can understand this
sentence. Another suggestion:
"ECMPv6: route replacement failed (check the consistency of the 
installed route)". Not sure that the nexthops should be listed after.
sure, i don't have a preference. I will resubmit if we converge on the 
commit message.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH net 3/3] r8169: increase the lifespan of the hardware counters dump area.

2015-09-07 Thread Francois Romieu

Francois Romieu  :
[...]
> Updated patch is on the way.

Fixed memcpy in patch 0001, moved counters allocation from open() 
to probe(), returned open() to its original state but something is
still wrong: the link does not come up.

Patches attached.

I'll try again tomorrow.

-- 
Ueimor
>From 8f1b87a115f6c340558b95f6a1748ddbf866d95f Mon Sep 17 00:00:00 2001
Message-Id: 
<8f1b87a115f6c340558b95f6a1748ddbf866d95f.1441670158.git.rom...@fr.zoreil.com>
From: Francois Romieu 
Date: Sat, 5 Sep 2015 13:26:30 +0200
Subject: [PATCH 1/3] r8169: decouple the counters data and the device private
 area.
X-Organisation: Land of Sunshine Inc.

Signed-off-by: Francois Romieu 
---
 drivers/net/ethernet/realtek/r8169.c | 69 ++--
 1 file changed, 42 insertions(+), 27 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c 
b/drivers/net/ethernet/realtek/r8169.c
index 24dcbe6..ff1b834 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -833,7 +833,7 @@ struct rtl8169_private {
unsigned features;
 
struct mii_if_info mii;
-   struct rtl8169_counters counters;
+   struct rtl8169_counters *counters;
struct rtl8169_tc_offsets tc_offset;
u32 saved_wolopts;
u32 opts1_mask;
@@ -2284,7 +2284,7 @@ static bool rtl8169_update_counters(struct net_device 
*dev)
return false;
 
if (rtl_udelay_loop_wait_low(tp, &rtl_counters_cond, 10, 1000))
-   memcpy(&tp->counters, counters, sizeof(*counters));
+   memcpy(tp->counters, counters, sizeof(*counters));
else
ret = false;
 
@@ -2296,6 +2296,8 @@ static bool rtl8169_update_counters(struct net_device 
*dev)
 static bool rtl8169_init_counter_offsets(struct net_device *dev)
 {
struct rtl8169_private *tp = netdev_priv(dev);
+   struct rtl8169_counters *counters = tp->counters;
+   struct rtl8169_tc_offsets *offset = &tp->tc_offset;
bool ret = false;
 
/*
@@ -2313,7 +2315,7 @@ static bool rtl8169_init_counter_offsets(struct 
net_device *dev)
 * set at open time by rtl_hw_start.
 */
 
-   if (tp->tc_offset.inited)
+   if (offset->inited)
return true;
 
/* If both, reset and update fail, propagate to caller. */
@@ -2323,10 +2325,10 @@ static bool rtl8169_init_counter_offsets(struct 
net_device *dev)
if (rtl8169_update_counters(dev))
ret = true;
 
-   tp->tc_offset.tx_errors = tp->counters.tx_errors;
-   tp->tc_offset.tx_multi_collision = tp->counters.tx_multi_collision;
-   tp->tc_offset.tx_aborted = tp->counters.tx_aborted;
-   tp->tc_offset.inited = true;
+   offset->tx_errors = counters->tx_errors;
+   offset->tx_multi_collision = counters->tx_multi_collision;
+   offset->tx_aborted = counters->tx_aborted;
+   offset->inited = true;
 
return ret;
 }
@@ -2335,24 +2337,25 @@ static void rtl8169_get_ethtool_stats(struct net_device 
*dev,
  struct ethtool_stats *stats, u64 *data)
 {
struct rtl8169_private *tp = netdev_priv(dev);
+   struct rtl8169_counters *c = tp->counters;
 
ASSERT_RTNL();
 
rtl8169_update_counters(dev);
 
-   data[0] = le64_to_cpu(tp->counters.tx_packets);
-   data[1] = le64_to_cpu(tp->counters.rx_packets);
-   data[2] = le64_to_cpu(tp->counters.tx_errors);
-   data[3] = le32_to_cpu(tp->counters.rx_errors);
-   data[4] = le16_to_cpu(tp->counters.rx_missed);
-   data[5] = le16_to_cpu(tp->counters.align_errors);
-   data[6] = le32_to_cpu(tp->counters.tx_one_collision);
-   data[7] = le32_to_cpu(tp->counters.tx_multi_collision);
-   data[8] = le64_to_cpu(tp->counters.rx_unicast);
-   data[9] = le64_to_cpu(tp->counters.rx_broadcast);
-   data[10] = le32_to_cpu(tp->counters.rx_multicast);
-   data[11] = le16_to_cpu(tp->counters.tx_aborted);
-   data[12] = le16_to_cpu(tp->counters.tx_underun);
+   data[ 0] = le64_to_cpu(c->tx_packets);
+   data[ 1] = le64_to_cpu(c->rx_packets);
+   data[ 2] = le64_to_cpu(c->tx_errors);
+   data[ 3] = le32_to_cpu(c->rx_errors);
+   data[ 4] = le16_to_cpu(c->rx_missed);
+   data[ 5] = le16_to_cpu(c->align_errors);
+   data[ 6] = le32_to_cpu(c->tx_one_collision);
+   data[ 7] = le32_to_cpu(c->tx_multi_collision);
+   data[ 8] = le64_to_cpu(c->rx_unicast);
+   data[ 9] = le64_to_cpu(c->rx_broadcast);
+   data[10] = le32_to_cpu(c->rx_multicast);
+   data[11] = le16_to_cpu(c->tx_aborted);
+   data[12] = le16_to_cpu(c->tx_underun);
 }
 
 static void rtl8169_get_strings(struct net_device *dev, u32 stringset, u8 
*data)
@@ -7779,6 +7782,8 @@ static struct rtnl_link_stats64 *
 rtl8169_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
 {
struct rtl8169_private *tp = netdev_priv(dev);
+   struct rtl8169_tc_offsets *offset = &tp->tc_

Re: [PATCH net-next v2] ipv6: fix multipath route replace error recovery

2015-09-07 Thread roopa


On 9/7/15, 5:03 AM, Nicolas Dichtel wrote:
The bug you're trying to fix has been introduced by the commit 
51ebd3181572

("ipv6: add support of equal cost multipath (ECMP)").
Commit 27596472473a ("ipv6: fix ECMP route replacement") didn't try to 
fix this

problem.
The reason I say "27596472473a ("ipv6: fix ECMP route replacement") ", 
has introduced the problem
 i am trying to fix is because of the below change. The part where it 
deletes all siblings of an existing route


diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 96dbfff..bde57b1 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
[..snip]
@@ -835,8 +851,27 @@ add:
info->nl_net->ipv6.rt6_stats->fib_route_nodes++;
fn->fn_flags |= RTN_RTINFO;
}
+   nsiblings = iter->rt6i_nsiblings;
fib6_purge_rt(iter, fn, info->nl_net);
rt6_release(iter);
+
+   if (nsiblings) {
+   /* Replacing an ECMP route, remove all siblings */
+   ins = &rt->dst.rt6_next;
+   iter = *ins;
+   while (iter) {
+   if (rt6_qualify_for_ecmp(iter)) {
+   *ins = iter->dst.rt6_next;
+   fib6_purge_rt(iter, fn, 
info->nl_net);

+   rt6_release(iter);
+   nsiblings--;
+   } else {
+   ins = &iter->dst.rt6_next;
+   }
+   iter = *ins;
+   }
+   WARN_ON(nsiblings != 0);
+   }
}

return 0;



Note that the first patch you submitted to fix this pb (cf
http://patchwork.ozlabs.org/patch/362296/) was in June 2014, ie one 
year before

the commit 27596472473a.

yes, i had submitted the patch you mention above to fix a slightly 
different problem that existed then..which
was introduced by "51ebd3181572 ("ipv6: add support of equal cost 
multipath (ECMP)")".
Commit "35f1b4e96b9258a3668872b1139c51e5a23eb876 ipv6: do not delete 
previously existing ECMP routes if add fails"

subsequently fixed it.

Thanks,
Roopa

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH net 3/3] r8169: increase the lifespan of the hardware counters dump area.

2015-09-07 Thread David Miller

From: Corinna Vinschen 
Date: Mon, 7 Sep 2015 11:29:49 +0200

> Still wondering though.  Given that the driver never failed before if
> the counter values couldn't be updated, and given that these counter
> values only have statistical relevance, why should this suddenly result
> in a fatal failure at open time?

Failing to allocate such a small buffer means we have much deeper issues
at hand.  A GFP_KERNEL allocation of this size really should not fail.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH net 3/3] r8169: increase the lifespan of the hardware counters dump area.

2015-09-07 Thread Francois Romieu

Corinna Vinschen  :
[...]
> I have a bit of a problem with this patch.  It crashes when loading the
> driver manually w/ modprobe.  For some reason dev_get_stats is called
> during rtl_init_one and at that time the counters pointer is NULL, so
> the kernel gets a SEGV.
>
>  After I worked around that I got a SEGV in
> rtl_remove_one, which I still have to find out why.  I didn't test with
> the latest kernel, though, so I still have to check if that's the reason
> for the crashes.  That takes a bit longer, I just wanted to let you
> know.

Call me stupid: I forgot that it's perfectly fine to request the stats
of a down interface. :o/

Updated patch is on the way.

> There's also a problem with rtl8169_cmd_counters:  It always calls
> rtl_udelay_loop_wait_low w/ rtl_reset_counters_cond, even when called
> from rtl8169_update_counters, where it should call
> rtl_udelay_loop_wait_low w/ rtl_counters_cond to check for the
> CounterDump flag, rather than for the CounterReset flag.
> 
> For now I applied the below patch, but I think it's a bit ugly to
> conditionalize the rtl_cond struct by the incoming flag value.



Let's check both at once:

return RTL_R32(CounterAddrLow) & (CounterDump | CounterReset);



Thanks for reviewing.

-- 
Ueimor
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2] stmmac: fix check for phydev being open

2015-09-07 Thread Alexey Brodkin

Hi Sergei,

On Tue, 2015-09-08 at 00:24 +0300, Sergei Shtylyov wrote:
> On 09/07/2015 11:50 PM, Alexey Brodkin wrote:
> 
> > Current implementation via IS_ERR(phydev) may make no sense because
> > of_phy_attach() returns NULL on failure instead of error value.
> 
> Not of_phy_connect()?

I already noticed this misspelling - it came from my exploration of
what happens inside of_phy_connect(). So sort of braindump.

I will fix in v3 re-spin.

-Alexey--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2] stmmac: fix check for phydev being open

2015-09-07 Thread Sergei Shtylyov


On 09/07/2015 11:50 PM, Alexey Brodkin wrote:


Current implementation via IS_ERR(phydev) may make no sense because
of_phy_attach() returns NULL on failure instead of error value.


   Not of_phy_connect()?


Still for checking result of phy_connect() IS_ERR() is useful.

To address both situations we use combined IS_ERR_OR_NULL() check.

Cc: Giuseppe Cavallaro 
Cc: linux-ker...@vger.kernel.org
Cc: sta...@vger.kernel.org
Cc: David Miller 
Cc: Sergei Shtylyov 
Signed-off-by: Alexey Brodkin 


MBR, Sergei

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH 0/3] Network stack, first user of SLAB/kmem_cache bulk free API.

2015-09-07 Thread Alexander Duyck


On 09/07/2015 01:16 AM, Jesper Dangaard Brouer wrote:

On Fri, 4 Sep 2015 11:09:21 -0700
Alexander Duyck  wrote:


This is an interesting start.  However I feel like it might work better
if you were to create a per-cpu pool for skbs that could be freed and
allocated in NAPI context.  So for example we already have
napi_alloc_skb, why not just add a napi_free_skb

I do like the idea...


If nothing else you want to avoid having to redo this code for every 
driver.  If you can just replace dev_kfree_skb with some other freeing 
call it will make it much easier to convert other drivers.



and then make the array
of objects to be freed part of a pool that could be used for either
allocation or freeing?  If the pool runs empty you just allocate
something like 8 or 16 new skb heads, and if you fill it you just free
half of the list?

But I worry that this algorithm will "randomize" the (skb) objects.
And the SLUB bulk optimization only works if we have many objects
belonging to the same page.


Agreed to some extent, however at the same time what this does is allow 
for a certain amount of skb recycling.  So instead of freeing the 
buffers received from the socket you would likely be recycling them and 
sending them back as Rx skbs.  In the case of a heavy routing workload 
you would likely just be cycling through the same set of buffers and 
cleaning them off of transmit and placing them back on receive.  The 
general idea is to keep the memory footprint small so recycling Tx 
buffers to use for Rx can have its advantages in terms of keeping things 
confined to limits of the L1/L2 cache.



It would likely be fastest to implement a simple stack (for these
per-cpu pools), but I again worry that it would randomize the
object-pages.  A simple queue might be better, but slightly slower.
Guess I could just reuse part of qmempool / alf_queue as a quick test.


I would say don't over engineer it.  A stack is the simplest.  The 
qmempool / alf_queue is just going to add extra overhead.


The added advantage to the stack is that you are working with pointers 
and you are guaranteed that the list of pointers are going to be 
linear.  If you use a queue clean-up will require up to 2 blocks of 
freeing in case the ring has wrapped.



Having a per-cpu pool in networking would solve the problem of the slub
per-cpu pool isn't large enough for our use-case.  On the other hand,
maybe we should fix slub to dynamically adjust the size of it's per-cpu
resources?


The per-cpu pool is just meant to replace the the per-driver pool you 
were using.  By using a per-cpu pool you would get better aggregation 
and can just flush the freed buffers at the end of the Rx softirq or 
when the pool is full instead of having to flush smaller lists per call 
to napi->poll.



A pre-req knowledge (for people not knowing slub's internal details):
Slub alloc path will pickup a page, and empty all objects for that page
before proceeding to the next page.  Thus, slub bulk alloc will give
many objects belonging to the page.  I'm trying to keep these objects
grouped together until they can be free'ed in a bulk.


The problem is you aren't going to be able to keep them together very 
easily.  Yes they might be allocated all from one spot on Rx but they 
can very easily end up scattered to multiple locations. The same applies 
to Tx where you will have multiple flows all outgoing on one port.  That 
is why I was thinking adding some skb recycling via a per-cpu stack 
might be useful especially since you have to either fill or empty the 
stack when you allocate or free multiple skbs anyway.  In addition it 
provides an easy way for a bulk alloc and a bulk free to share data 
structures without adding additional overhead by keeping them separate.


If you managed it with some sort of high-water/low-water mark type setup 
you could very well keep the bulk-alloc/free busy without too much 
fragmentation.  For the socket transmit/receive case the thing you have 
to keep in mind is that if you reuse the buffers you are just going to 
be throwing them back at the sockets which are likely not using 
bulk-free anyway.  So in that case reuse could actually improve things 
by simply reducing the number of calls to bulk-alloc you will need to 
make since things like TSO allow you to send 64K using a single sk_buff, 
while you will be likely be receiving one or more acks on the receive 
side which will require allocations.


- Alex
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: slab-nomerge (was Re: [git pull] device mapper changes for 4.3)

2015-09-07 Thread Jesper Dangaard Brouer

On Mon, 7 Sep 2015 13:22:13 -0700
Linus Torvalds  wrote:

> On Mon, Sep 7, 2015 at 2:30 AM, Jesper Dangaard Brouer
>  wrote:
> >
> > The slub allocator have a faster "fastpath", if your workload is
> > fast-reusing within the same per-cpu page-slab, but once the workload
> > increases you hit the slowpath, and then slab catches up. Slub looks
> > great in micro-benchmarking.
> >
> > And with "slab_nomerge" I get even high performance:
> 
> I think those two are related.
> 
> Not merging means that effectively the percpu caches end up being
> bigger (simply because there are more of them), and so it captures
> more of the fastpath cases.

Yes, that was also my theory.  As manually tuning the percpu sizes gave
me almost the same boost.

> Obviously the percpu queue size is an easy tunable too, but there are
> real downsides to that too. 

The easy fix is to introduce a subsystem specific percpu cache that is
large enough for our use-case.  That seems to be a trend. I'm hoping to
come up with something smarter that every subsystem can benefit from. 
E.g some heuristic that can dynamic adjust SLUB according to the usage
pattern. I can imagine something as simple as a counter for every
slowpath call, that is only valid as long as the jiffies count matches
(reset to zero, and store new jiffies cnt).  (But I have not thought
this through...)

> I suspect your IP forwarding case isn't so
> different from some of the microbenchmarks, it just has more
> outstanding work..

Yes, I will admit that my testing is very close to micro benchmarking,
and it is specifically designed to pressure the system to its limits[1].
Especially the minimum frame size is evil and unrealistic, but the real
purpose is preparing the stack for increasing speeds like 100Gbit/s.

> And yes, the slow path (ie not hitting in the percpu cache) of SLUB
> could hopefully be optimizable too, although maybe the bulk patches
> are the way to go (and unrelated to this thread - at least part of
> your bulk patches actually got merged last Friday - they were part of
> Andrew's patch-bomb).

Cool. Yes, it is only part of the bulk patches. The real performance
boosters are not in yet (but I need to make them work correctly with
memory debugging enabled before they can get merged).  At least the
main API is in, which allows me to implement use-case easier in other
subsystems :-)

[1] http://netoptimizer.blogspot.dk/2014/09/packet-per-sec-measurements-for.html
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2] stmmac: fix check for phydev being open

2015-09-07 Thread Alexey Brodkin

Hi Sergei,

On Mon, 2015-09-07 at 23:53 +0300, Sergei Shtylyov wrote:
> On 09/07/2015 11:50 PM, Alexey Brodkin wrote:
> 
> > Current implementation via IS_ERR(phydev) may make no sense because
> > of_phy_attach() returns NULL on failure instead of error value.
> > 
> > Still for checking result of phy_connect() IS_ERR() is useful.
> > 
> > To address both situations we use combined IS_ERR_OR_NULL() check.
> > 
> > Cc: Giuseppe Cavallaro 
> > Cc: linux-ker...@vger.kernel.org
> > Cc: sta...@vger.kernel.org
> > Cc: David Miller 
> > Cc: Sergei Shtylyov 
> > Signed-off-by: Alexey Brodkin 
> > ---
> > 
> > Changes compared to v1:
> >   * Use IS_ERR_OR_NULL() instead of discrete checks for null and err
> > 
> >   drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 
> > b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > index 864b476..7985d8a 100644
> > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > @@ -837,7 +837,7 @@ static int stmmac_init_phy(struct net_device *dev)
> >  interface);
> > }
> > 
> > -   if (IS_ERR(phydev)) {
> > +   if (IS_ERR_OR_NULL(phydev)) {
> > pr_err("%s: Could not attach to PHY\n", dev->name);
> > return PTR_ERR(phydev);
> 
> Hm, in case of phydev == NULL, you're going to return 0 here... is that 
> what you want?

Ah, right.

So then the question would be what's a proper error code for !phydev:
-ENOENT or -ENODEV?

-Alexey--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2] stmmac: fix check for phydev being open

2015-09-07 Thread Sergei Shtylyov


On 09/07/2015 11:50 PM, Alexey Brodkin wrote:


Current implementation via IS_ERR(phydev) may make no sense because
of_phy_attach() returns NULL on failure instead of error value.

Still for checking result of phy_connect() IS_ERR() is useful.

To address both situations we use combined IS_ERR_OR_NULL() check.

Cc: Giuseppe Cavallaro 
Cc: linux-ker...@vger.kernel.org
Cc: sta...@vger.kernel.org
Cc: David Miller 
Cc: Sergei Shtylyov 
Signed-off-by: Alexey Brodkin 
---

Changes compared to v1:
  * Use IS_ERR_OR_NULL() instead of discrete checks for null and err

  drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 864b476..7985d8a 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -837,7 +837,7 @@ static int stmmac_init_phy(struct net_device *dev)
 interface);
}

-   if (IS_ERR(phydev)) {
+   if (IS_ERR_OR_NULL(phydev)) {
pr_err("%s: Could not attach to PHY\n", dev->name);
return PTR_ERR(phydev);


   Hm, in case of phydev == NULL, you're going to return 0 here... is that 
what you want?


MBR, Sergei

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2] stmmac: fix check for phydev being open

2015-09-07 Thread Alexey Brodkin

Current implementation via IS_ERR(phydev) may make no sense because
of_phy_attach() returns NULL on failure instead of error value.

Still for checking result of phy_connect() IS_ERR() is useful.

To address both situations we use combined IS_ERR_OR_NULL() check.

Cc: Giuseppe Cavallaro 
Cc: linux-ker...@vger.kernel.org
Cc: sta...@vger.kernel.org
Cc: David Miller 
Cc: Sergei Shtylyov 
Signed-off-by: Alexey Brodkin 
---

Changes compared to v1:
 * Use IS_ERR_OR_NULL() instead of discrete checks for null and err

 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 864b476..7985d8a 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -837,7 +837,7 @@ static int stmmac_init_phy(struct net_device *dev)
 interface);
}
 
-   if (IS_ERR(phydev)) {
+   if (IS_ERR_OR_NULL(phydev)) {
pr_err("%s: Could not attach to PHY\n", dev->name);
return PTR_ERR(phydev);
}
-- 
2.4.3

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: slab-nomerge (was Re: [git pull] device mapper changes for 4.3)

2015-09-07 Thread Linus Torvalds

On Mon, Sep 7, 2015 at 2:30 AM, Jesper Dangaard Brouer
 wrote:
>
> The slub allocator have a faster "fastpath", if your workload is
> fast-reusing within the same per-cpu page-slab, but once the workload
> increases you hit the slowpath, and then slab catches up. Slub looks
> great in micro-benchmarking.
>
> And with "slab_nomerge" I get even high performance:

I think those two are related.

Not merging means that effectively the percpu caches end up being
bigger (simply because there are more of them), and so it captures
more of the fastpath cases.

Obviously the percpu queue size is an easy tunable too, but there are
real downsides to that too. I suspect your IP forwarding case isn't so
different from some of the microbenchmarks, it just has more
outstanding work..

And yes, the slow path (ie not hitting in the percpu cache) of SLUB
could hopefully be optimizable too, although maybe the bulk patches
are the way to go (and unrelated to this thread - at least part of
your bulk patches actually got merged last Friday - they were part of
Andrew's patch-bomb).

Linus
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH 1/3] net: introduce kfree_skb_bulk() user of kmem_cache_free_bulk()

2015-09-07 Thread Jesper Dangaard Brouer

On Mon, 7 Sep 2015 09:25:49 -0700 Tom Herbert  wrote:

> >> What not pass a list of skbs (e.g. using skb->next)?
> >
> > Because the next layer, the slab API needs an array:
> >   kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)
> >
> 
> I suppose we could ask the same question of that function. IMO
> encouraging drivers to define arrays of pointers on the stack like
> you're doing in the ixgbe patch is a bad direction.
> 
> In any case I believe this would be simpler in the networking side
> just to maintain a list of skb's to free. Then the dev_free_waitlist
> structure might not be needed then since we could just use a
> skb_buf_head for that.

I guess it is more natural for the network side to work with skb lists.
But I'm keeping it for slab/slub as we cannot assume/enforce objects of a
specific data type.

I worried about how large bulk free we should allow, due to the
interaction with skb->destructor which for sockets affect their memory
accounting. E.g. we have seen issues with hypervisor network drivers
(Xen and HyperV) that are too slow to cleanup their TX completion queue
that their TCP bandwidth get limited by tcp_limit_output_bytes.
I capped it at 32, and the NAPI budget will cap it at 64.

By the following argument, bulk free of 64 objects/skb's is not a problem.
The delay I'm introducing is very small, before the first real
kfree_skb is called, which calls the destructor with free up socket
memory accounting.

Assume measured packet rate of: 2105011 pps
Time between packets (1/2105011*10^9): 475 ns

Perf shows ixgbe_clean_tx_irq() takes: 1.23%
Extrapolating the function call cost: 5.84 ns (475*(1.23/100))

Processing 64 packets in ixgbe_clean_tx_irq() 373 ns.
At 10Gbit/s how many bytes can arrive in this period, only: 466 bytes.
((373/10^9)*(1*10^6)/8)

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] ip link/addr show: add empty line between interfaces

2015-09-07 Thread Felix Kaiser

Excerpts from Stephen Hemminger's message of 2015-09-07 11:09:32 -0700:
> Agreed with Jiri. I prefer the new shorter brief format, not longer format.

As long as it requires extra typing, few are going to use it.
It's about the default.

Did you try the brief format with a few v6 addresses on an
interface? When the line gets wrapped, it gets hard to read.

Besides, this isn't about the brief format, but about making
the long format less unreadable.

> Did you ever try a system with 10 interfaces or 1000 tunnels??

The largest I tried it on had 11 interfaces (8 of which are for tunnels).
That's not 1000, but how many people have that?

After all the positive feedback I got locally I'm genuinely
surprised that you don't like it, but I guess that's the way it is.

Thanks for looking at it, though!
Felix
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] stmmac: fix check for phydev being open

2015-09-07 Thread Sergei Shtylyov


Hello.

On 09/07/2015 09:15 PM, Alexey Brodkin wrote:


Current implementation via IS_ERR(phydev) may make no sense because
of_phy_attach() returns NULL on failure instead of error value.



Still for checking result of phy_connect() IS_ERR() is useful.



So adding explicit check for NULL.



Cc: Giuseppe Cavallaro 
Cc: arc-linux-...@synopsys.com
Cc: linux-ker...@vger.kernel.org
Cc: sta...@vger.kernel.org
Cc: David Miller 
Signed-off-by: Alexey Brodkin 
---
  drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 864b476..ad2ce3e 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -837,7 +837,7 @@ static int stmmac_init_phy(struct net_device *dev)
 interface);
}

-   if (IS_ERR(phydev)) {
+   if (!phydev || IS_ERR(phydev)) {


   There's IS_ERR_OR_NULL().

WBR, Sergei

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] stmmac: fix check for phydev being open

2015-09-07 Thread Alexey Brodkin

Current implementation via IS_ERR(phydev) may make no sense because
of_phy_attach() returns NULL on failure instead of error value.

Still for checking result of phy_connect() IS_ERR() is useful.

So adding explicit check for NULL.

Cc: Giuseppe Cavallaro 
Cc: arc-linux-...@synopsys.com
Cc: linux-ker...@vger.kernel.org
Cc: sta...@vger.kernel.org
Cc: David Miller 
Signed-off-by: Alexey Brodkin 
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 864b476..ad2ce3e 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -837,7 +837,7 @@ static int stmmac_init_phy(struct net_device *dev)
 interface);
}
 
-   if (IS_ERR(phydev)) {
+   if (!phydev || IS_ERR(phydev)) {
pr_err("%s: Could not attach to PHY\n", dev->name);
return PTR_ERR(phydev);
}
-- 
2.4.3

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] iproute: print more verbose error on route cache flush

2015-09-07 Thread Stephen Hemminger

On Sat,  5 Sep 2015 10:40:50 +0300
Denis Kirjanov  wrote:

> Before:
> kda@vfirst ~/devel/iproute2 $ ./ip/ip route flush cache
> Cannot open "/proc/sys/net/ipv4/route/flush"
> 
> After:
> kda@vfirst ~/devel/iproute2/ip $ ./ip route flush cache
> Cannot open "/proc/sys/net/ipv4/route/flush": Permission denied
> 
> Signed-off-by: Denis Kirjanov 
> ---
>  ip/iproute.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/ip/iproute.c b/ip/iproute.c
> index 8f49e62..abe4180 100644
> --- a/ip/iproute.c
> +++ b/ip/iproute.c
> @@ -1213,7 +1213,8 @@ static int iproute_flush_cache(void)
>   char *buffer = "-1";
>  
>   if (flush_fd < 0) {
> - fprintf (stderr, "Cannot open \"%s\"\n", ROUTE_FLUSH_PATH);
> + fprintf (stderr, "Cannot open \"%s\": %s\n",
> + ROUTE_FLUSH_PATH, strerror(errno));
>   return -1;
>   }
>  

Sure why not, applied
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] ip link/addr show: add empty line between interfaces

2015-09-07 Thread Stephen Hemminger

On Mon, 7 Sep 2015 17:54:11 +0200
Jiri Benc  wrote:

> On Mon, 07 Sep 2015 17:38:48 +0200, Felix Kaiser wrote:
> > My opinion is that if someone parses that, they deserve the opportunity
> > to fix their code. At the very least they could have used -oneline!
> 
> No doubt. I also don't like scripts that parse the default "ip a"
> output.
> 
> I just pointed out that there are different opinions and that it can
> have some consequences. Personally, I have no strong feeling against
> nor for the patch. It's up to Stephen to decide.
> 
> > I've met users who use a shell alias to hack newlines in
> 
> Which is not a bad solution, actually.
> 
>  Jiri
> 

Agreed with Jiri. I prefer the new shorter brief format, not longer format.
Did you ever try a system with 10 interfaces or 1000 tunnels??
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Fw: [Bug 104161] New: DLNA services disappear on bridge device shortly after starting the dlna service

2015-09-07 Thread Stephen Hemminger

Begin forwarded message:

Date: Mon, 7 Sep 2015 13:31:54 +
From: "bugzilla-dae...@bugzilla.kernel.org" 

To: "shemmin...@linux-foundation.org" 
Subject: [Bug 104161] New: DLNA services disappear on bridge device shortly 
after starting the dlna service

https://bugzilla.kernel.org/show_bug.cgi?id=104161

Bug ID: 104161
   Summary: DLNA services disappear on bridge device shortly after
starting the dlna service
   Product: Networking
   Version: 2.5
Kernel Version: 4.2
  Hardware: All
OS: Linux
  Tree: Mainline
Status: NEW
  Severity: normal
  Priority: P1
 Component: IPV4
  Assignee: shemmin...@linux-foundation.org
  Reporter: t.p...@gmx.de
Regression: No

Hi,
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=9afd85c9e4552b276e2f4cfefd622bdeeffbbf26

This commit introduces a weird behaviour on my dlna server, commit was
bisected.

This commit was merged into 4.2 series, which makes my dlna/upnp services
disappear real soon after starting from discovering from other clients.
Localhost seems not to be affected but every external discovery is broken.
The dlna services run on a bridge device with IPV4. Mythtv and Minidlna both
disappear after some minutes of running.
4.2 with those 3 patches reverted makes it work as it should:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=a516993f0ac1694673412eb2d16a091eafa77d2a
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=fcba67c94abe83e0e69a65737000ccbb16a4fa03
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=9afd85c9e4552b276e2f4cfefd622bdeeffbbf26

Do you need any more information to get this really weird bug fixed?

Thanks in advance.

-- 
You are receiving this mail because:
You are the assignee for the bug.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v4 18/20] net/xen-netback: Make it running on 64KB page granularity

2015-09-07 Thread Wei Liu

You might need to rebase you patch. A patch to netback went it recently.

On Mon, Sep 07, 2015 at 04:33:56PM +0100, Julien Grall wrote:
> The PV network protocol is using 4KB page granularity. The goal of this
> patch is to allow a Linux using 64KB page granularity working as a
> network backend on a non-modified Xen.
> 
> It's only necessary to adapt the ring size and break skb data in small
> chunk of 4KB. The rest of the code is relying on the grant table code.
> 
> Signed-off-by: Julien Grall 
> 

Reviewed-by: Wei Liu 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH 1/3] net: introduce kfree_skb_bulk() user of kmem_cache_free_bulk()

2015-09-07 Thread Tom Herbert

>> What not pass a list of skbs (e.g. using skb->next)?
>
> Because the next layer, the slab API needs an array:
>   kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)
>

I suppose we could ask the same question of that function. IMO
encouraging drivers to define arrays of pointers on the stack like
you're doing in the ixgbe patch is a bad direction.

In any case I believe this would be simpler in the networking side
just to maintain a list of skb's to free. Then the dev_free_waitlist
structure might not be needed then since we could just use a
skb_buf_head for that.


Tom

> Look at the patch:
>  [PATCH V2 3/3] slub: build detached freelist with look-ahead
>  http://thread.gmane.org/gmane.linux.kernel.mm/137469/focus=137472
>
> Where I use this array to progressively scan for objects belonging to
> the same page.  (A subtle detail is I manage to zero out the array,
> which is good from a security/error-handling point of view, as pointers
> to the objects are not left dangling on the stack).
>
>
> I cannot argue that, writing skb->next comes as an additional cost,
> because the slUb free also writes into this cacheline.  Perhaps the
> slAb allocator does not?
>
> [...]
>> > +   if (likely(cnt)) {
>> > +   kmem_cache_free_bulk(skbuff_head_cache, cnt, (void **) 
>> > skbs);
>> > +   }
>> > +}
>> > +EXPORT_SYMBOL(kfree_skb_bulk);
>
> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Sr. Network Kernel Developer at Red Hat
>   Author of http://www.iptv-analyzer.org
>   LinkedIn: http://www.linkedin.com/in/brouer
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] ip link/addr show: add empty line between interfaces

2015-09-07 Thread Jiri Benc

On Mon, 07 Sep 2015 17:38:48 +0200, Felix Kaiser wrote:
> My opinion is that if someone parses that, they deserve the opportunity
> to fix their code. At the very least they could have used -oneline!

No doubt. I also don't like scripts that parse the default "ip a"
output.

I just pointed out that there are different opinions and that it can
have some consequences. Personally, I have no strong feeling against
nor for the patch. It's up to Stephen to decide.

> I've met users who use a shell alias to hack newlines in

Which is not a bad solution, actually.

 Jiri

-- 
Jiri Benc
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v4 17/20] net/xen-netfront: Make it running on 64KB page granularity

2015-09-07 Thread Julien Grall

The PV network protocol is using 4KB page granularity. The goal of this
patch is to allow a Linux using 64KB page granularity using network
device on a non-modified Xen.

It's only necessary to adapt the ring size and break skb data in small
chunk of 4KB. The rest of the code is relying on the grant table code.

Note that we allocate a Linux page for each rx skb but only the first
4KB is used. We may improve the memory usage by extending the size of
the rx skb.

Signed-off-by: Julien Grall 
Reviewed-by: David Vrabel 

---
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 
Cc: netdev@vger.kernel.org

Improvement such as support of 64KB grant is not taken into
consideration in this patch because we have the requirement to run a Linux
using 64KB pages on a non-modified Xen.

Tested with workload such as ping, ssh, wget, git... I would happy if
someone give details how to test all the path.

Changes in v4:
- s/gnttab_one_grant/gnttab_for_one_grant/ based on the new naming
- Add David's reviewed-by

Changes in v3:
- Fix errors reported by checkpatch.pl
- s/mfn/gfn/ base on the new naming
- xennet_tx_setup_grant was calling itself resulting an
guest stall when using iperf.
- The grant callback doesn't allow anymore to change the len
(wasn't used here)
- gnttab_foreach_grant has been renamed to gnttab_foreach_grant_in_range
- gnttab_page_grant_foreign_ref has been renamed to
gnttab_foreach_grant_foreign_ref_one

Changes in v2:
- Use gnttab_foreach_grant to split a Linux page in grant
- Fix count slots
---
 drivers/net/xen-netfront.c | 122 -
 1 file changed, 86 insertions(+), 36 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 47f791e..17b1013 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -74,8 +74,8 @@ struct netfront_cb {
 
 #define GRANT_INVALID_REF  0
 
-#define NET_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, PAGE_SIZE)
-#define NET_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, PAGE_SIZE)
+#define NET_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, XEN_PAGE_SIZE)
+#define NET_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, XEN_PAGE_SIZE)
 
 /* Minimum number of Rx slots (includes slot for GSO metadata). */
 #define NET_RX_SLOTS_MIN (XEN_NETIF_NR_SLOTS_MIN + 1)
@@ -291,7 +291,7 @@ static void xennet_alloc_rx_buffers(struct netfront_queue 
*queue)
struct sk_buff *skb;
unsigned short id;
grant_ref_t ref;
-   unsigned long gfn;
+   struct page *page;
struct xen_netif_rx_request *req;
 
skb = xennet_alloc_one_rx_buffer(queue);
@@ -307,14 +307,13 @@ static void xennet_alloc_rx_buffers(struct netfront_queue 
*queue)
BUG_ON((signed short)ref < 0);
queue->grant_rx_ref[id] = ref;
 
-   gfn = 
xen_page_to_gfn(skb_frag_page(&skb_shinfo(skb)->frags[0]));
+   page = skb_frag_page(&skb_shinfo(skb)->frags[0]);
 
req = RING_GET_REQUEST(&queue->rx, req_prod);
-   gnttab_grant_foreign_access_ref(ref,
-   queue->info->xbdev->otherend_id,
-   gfn,
-   0);
-
+   gnttab_page_grant_foreign_access_ref_one(ref,
+
queue->info->xbdev->otherend_id,
+page,
+0);
req->id = id;
req->gref = ref;
}
@@ -415,25 +414,33 @@ static void xennet_tx_buf_gc(struct netfront_queue *queue)
xennet_maybe_wake_tx(queue);
 }
 
-static struct xen_netif_tx_request *xennet_make_one_txreq(
-   struct netfront_queue *queue, struct sk_buff *skb,
-   struct page *page, unsigned int offset, unsigned int len)
+struct xennet_gnttab_make_txreq {
+   struct netfront_queue *queue;
+   struct sk_buff *skb;
+   struct page *page;
+   struct xen_netif_tx_request *tx; /* Last request */
+   unsigned int size;
+};
+
+static void xennet_tx_setup_grant(unsigned long gfn, unsigned int offset,
+ unsigned int len, void *data)
 {
+   struct xennet_gnttab_make_txreq *info = data;
unsigned int id;
struct xen_netif_tx_request *tx;
grant_ref_t ref;
-
-   len = min_t(unsigned int, PAGE_SIZE - offset, len);
+   /* convenient aliases */
+   struct page *page = info->page;
+   struct netfront_queue *queue = info->queue;
+   struct sk_buff *skb = info->skb;
 
id = get_id_from_freelist(&queue->tx_skb_freelist, queue->tx_skbs);
tx = RING_GET_REQUEST(&queue->tx, queue->tx.req_prod_pvt++);
ref = gnttab_claim_

[PATCH v4 18/20] net/xen-netback: Make it running on 64KB page granularity

2015-09-07 Thread Julien Grall

The PV network protocol is using 4KB page granularity. The goal of this
patch is to allow a Linux using 64KB page granularity working as a
network backend on a non-modified Xen.

It's only necessary to adapt the ring size and break skb data in small
chunk of 4KB. The rest of the code is relying on the grant table code.

Signed-off-by: Julien Grall 

---
Cc: Ian Campbell 
Cc: Wei Liu 
Cc: netdev@vger.kernel.org

Improvement such as support of 64KB grant is not taken into
consideration in this patch because we have the requirement to run a
Linux using 64KB pages on a non-modified Xen.

Note that I haven't add a comment why the offset is 0 after the first
iteration. See [1] for more details.

[1] https://lkml.org/lkml/2015/8/10/456

Changes in v4:
- Add a comment to explain how we compute MAX_XEN_SKB_FRAGS

Changes in v3:
- Fix errors reported by checkpatch.pl
- s/mfn/gfn/ based on the new naming
- gnttab_foreach_grant has been renamed to gnttab_forach_grant_in_range
- The grant callback doesn't allow anymore to use less data. An
helpers has been added in netback to handle this.

Changes in v2:
- Correctly set MAX_GRANT_COPY_OPS and XEN_NETBK_RX_SLOTS_MAX
- Don't use XEN_PAGE_SIZE in handle_frag_list as we coalesce
fragment into a new skb
- Use gnntab_foreach_grant to split a Linux page into grant
---
 drivers/net/xen-netback/common.h  |  18 +++--
 drivers/net/xen-netback/netback.c | 153 --
 2 files changed, 110 insertions(+), 61 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 8a495b3..24cb365 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -44,6 +44,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 typedef unsigned int pending_ring_idx_t;
@@ -64,8 +65,8 @@ struct pending_tx_info {
struct ubuf_info callback_struct;
 };
 
-#define XEN_NETIF_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, PAGE_SIZE)
-#define XEN_NETIF_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, PAGE_SIZE)
+#define XEN_NETIF_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, XEN_PAGE_SIZE)
+#define XEN_NETIF_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, XEN_PAGE_SIZE)
 
 struct xenvif_rx_meta {
int id;
@@ -80,16 +81,21 @@ struct xenvif_rx_meta {
 /* Discriminate from any valid pending_idx value. */
 #define INVALID_PENDING_IDX 0x
 
-#define MAX_BUFFER_OFFSET PAGE_SIZE
+#define MAX_BUFFER_OFFSET XEN_PAGE_SIZE
 
 #define MAX_PENDING_REQS XEN_NETIF_TX_RING_SIZE
 
+/* The maximum number of frags is derived from the size of a grant (same
+ * as a Xen page size for now).
+ */
+#define MAX_XEN_SKB_FRAGS (65536 / XEN_PAGE_SIZE + 1)
+
 /* It's possible for an skb to have a maximal number of frags
  * but still be less than MAX_BUFFER_OFFSET in size. Thus the
- * worst-case number of copy operations is MAX_SKB_FRAGS per
+ * worst-case number of copy operations is MAX_XEN_SKB_FRAGS per
  * ring slot.
  */
-#define MAX_GRANT_COPY_OPS (MAX_SKB_FRAGS * XEN_NETIF_RX_RING_SIZE)
+#define MAX_GRANT_COPY_OPS (MAX_XEN_SKB_FRAGS * XEN_NETIF_RX_RING_SIZE)
 
 #define NETBACK_INVALID_HANDLE -1
 
@@ -203,7 +209,7 @@ struct xenvif_queue { /* Per-queue data for xenvif */
 /* Maximum number of Rx slots a to-guest packet may use, including the
  * slot needed for GSO meta-data.
  */
-#define XEN_NETBK_RX_SLOTS_MAX (MAX_SKB_FRAGS + 1)
+#define XEN_NETBK_RX_SLOTS_MAX ((MAX_XEN_SKB_FRAGS + 1))
 
 enum state_bit_shift {
/* This bit marks that the vif is connected */
diff --git a/drivers/net/xen-netback/netback.c 
b/drivers/net/xen-netback/netback.c
index d4c1bc7..b1649aa 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -263,6 +263,80 @@ static struct xenvif_rx_meta *get_next_rx_buffer(struct 
xenvif_queue *queue,
return meta;
 }
 
+struct gop_frag_copy {
+   struct xenvif_queue *queue;
+   struct netrx_pending_operations *npo;
+   struct xenvif_rx_meta *meta;
+   int head;
+   int gso_type;
+
+   struct page *page;
+};
+
+static void xenvif_setup_copy_gop(unsigned long gfn,
+ unsigned int offset,
+ unsigned int *len,
+ struct gop_frag_copy *info)
+{
+   struct gnttab_copy *copy_gop;
+   struct xen_page_foreign *foreign;
+   /* Convenient aliases */
+   struct xenvif_queue *queue = info->queue;
+   struct netrx_pending_operations *npo = info->npo;
+   struct page *page = info->page;
+
+   BUG_ON(npo->copy_off > MAX_BUFFER_OFFSET);
+
+   if (npo->copy_off == MAX_BUFFER_OFFSET)
+   info->meta = get_next_rx_buffer(queue, npo);
+
+   if (npo->copy_off + *len > MAX_BUFFER_OFFSET)
+   *len = MAX_BUFFER_OFFSET - npo->copy_off;
+
+   copy_gop = npo->copy + npo->copy_prod++;
+   copy_gop->flags = GNTCOPY_de

Re: [PATCH] ip link/addr show: add empty line between interfaces

2015-09-07 Thread Felix Kaiser

Excerpts from Jiri Benc's message of 2015-09-07 14:46:09 +0200:
> Or makes the readability worse by requiring more scrolling, depending
> on number of interfaces you have and your taste.

Of course it's slightly subjective, but I think it improves readability
quite a bit *especially* on servers with many interfaces.

Without the additional lines, you can obviously fit more on one screen,
but I don't think it's much, and I think that once you have a screenful
of text, adding some whitespace makes it much, much easier to focus on
the individual sections.

> Many users prefer the current format.

I've met users who prefer ifconfigs format (and have literally told me
that they find ips hard to read), and I've met users who use a shell
alias to hack newlines in (which is the motivation for this patch), and
everyone that I've shown side-by-side comparisons has been supportive.

> Also, as sad as it is, there are scripts out there that parse ip output
> and I have no doubts this would break them.

I'm aware.

My opinion is that if someone parses that, they deserve the opportunity
to fix their code. At the very least they could have used -oneline!

The default output format is clearly intended for humans; it seems
like no attention at all has been given to parseability. And that's
as it should be - since, because it requires the fewest keystrokes,
that's the format that nearly all interactive users are going to see,
and the output should be optimized for them.

If absolute compatibility even for badly written scripts that misuse
the tool had to be maintained then that'd mean that no improvements
could ever be made, and that'd be sad.

Felix
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v4 01/20] net/xen-netback: xenvif_gop_frag_copy: move GSO check out of the loop

2015-09-07 Thread Julien Grall

The skb doesn't change within the function. Therefore it's only
necessary to check if we need GSO once at the beginning.

Signed-off-by: Julien Grall 
Acked-by: Wei Liu 

---
Cc: Ian Campbell 
Cc: netdev@vger.kernel.org

Changes in v4:
- Add Wei's acked

Changes in v2:
- Patch added
---
 drivers/net/xen-netback/netback.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/xen-netback/netback.c 
b/drivers/net/xen-netback/netback.c
index 7c64c74..d4c1bc7 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -277,6 +277,13 @@ static void xenvif_gop_frag_copy(struct xenvif_queue 
*queue, struct sk_buff *skb
unsigned long bytes;
int gso_type = XEN_NETIF_GSO_TYPE_NONE;
 
+   if (skb_is_gso(skb)) {
+   if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4)
+   gso_type = XEN_NETIF_GSO_TYPE_TCPV4;
+   else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6)
+   gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
+   }
+
/* Data must not cross a page boundary. */
BUG_ON(size + offset > PAGE_SIZEgso_type & SKB_GSO_TCPV6)
-   gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
-   }
-
if (*head && ((1 << gso_type) & queue->vif->gso_mask))
queue->rx.req_cons++;
 
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2] 9p: trans_fd, bail out if recv fcall if missing

2015-09-07 Thread Dominique Martinet

req->rc is pre-allocated early on with p9_tag_alloc and shouldn't be missing

Signed-off-by: Dominique Martinet 
---
 net/9p/trans_fd.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

Feel free to adapt error code/message if you can think of something better.

diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index a270dcc..a6d89c0 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -356,13 +356,12 @@ static void p9_read_work(struct work_struct *work)
}
 
if (m->req->rc == NULL) {
-   m->req->rc = kmalloc(sizeof(struct p9_fcall) +
-   m->client->msize, GFP_NOFS);
-   if (!m->req->rc) {
-   m->req = NULL;
-   err = -ENOMEM;
-   goto error;
-   }
+   p9_debug(P9_DEBUG_ERROR,
+"No recv fcall for tag %d (req %p), 
disconnecting!\n",
+m->rc.tag, m->req);
+   m->req = NULL;
+   err = -EIO;
+   goto error;
}
m->rc.sdata = (char *)m->req->rc + sizeof(struct p9_fcall);
memcpy(m->rc.sdata, m->tmp_buf, m->rc.capacity);
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCHv1 net-next 0/5] netlink: mmap: kernel panic and some issues

2015-09-07 Thread Daniel Borkmann


On 08/17/2015 11:02 PM, David Miller wrote:
...

I would seriously rather see us do an expensive full copy of the SKB
than to have traffic which is unexpectedly invisible to taps.


I've been looking into this issue a bit further, so the copy for the
tap seems doable, but while further going through the code to find similar
issues elsewhere, and doing some experiments, it looks like we write
shared info also in some edge-cases of upcalls such as nfqueue or ovs
when mmaped netlink is used for rx. I did a test with nfqueue using
the libmnl mmap branch [1].

My test case reduced the hard-coded frame size in libmnl to 4096. I
then crafted via pktgen in receive mode skbs that f.e. have a MTU
of 9000, and which do contain page frags. Redirecting that traffic
into nfqueue with a listener that is rx mmaped, I can see that in case
of nfqnl_build_packet_message(), the kernel is, when invoking skb_zerocopy(),
failing the (len <= skb_tailroom(to)) condition, writing some header
part, and filling the rest with frags[] in the for loop (thus writing/
leaking into the user mmap buffer), so given the right preconditions,
the kernel wouldn't prevent us from doing that.

One way to fix it would be to change netlink_alloc_skb() and all its
call-sites to add another size param, say 'ldiff', that would contain
the size difference that would be needed to copy the non-linear parts
from the network skb into the linear ring buffer slot of the netlink skb.
Thus, that also in case it exceeds the slot size, the kernel will just
fall back to alloc_skb() inside of netlink_alloc_skb() and fill the
slot with status NL_MMAP_STATUS_COPY. With this, no doubt it's not
super pretty to add that additional size param, but the kernel could
still work with mmap receivers.

Other than that, I found an skb_copy() in __netlink_dump_start(), but
that is a left-over from tx mmap in netlink, so not an issue here.

Thanks,
Daniel

  [1] http://git.netfilter.org/libmnl/log/?h=nl-mmap
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH net 3/3] r8169: increase the lifespan of the hardware counters dump area.

2015-09-07 Thread Corinna Vinschen

On Sep  6 12:37, Corinna Vinschen wrote:
> On Sep  5 14:18, rom...@fr.zoreil.com wrote:
> > From: Francois Romieu 
> > 
> > net/core/net-sysfs.c::netstat_show retrieves stats with spinlock held.
> > 
> > This change avoids sleepable allocation and performs some housekeeping:
> > - receive ring, transmit ring and counters dump area allocation failures
> >   are all considered fatal during open()
> > - netif_warn is now redundant with rtl_reset_counters_cond built-in
> >   failure message.
> > - rtl_reset_counters_cond induced failures in open() are also considered
> >   fatal: it takes acceptable work to unwind comfortably.
> 
> Why?  The counter reset is not a fatal condition for the operation of
> the NIC.  Even if the counters show incorrect values, the normal
> operation of the NIC is not affected.  And to top that off, even if
> resetting the counters actually doesn't work, the driver keeps the
> offset values, so a failed reset won't even harm the statistics.  It
> just doesn't make sense to break the NIC operation for such a minor
> problem.  And worse:
> 
> > -static bool rtl8169_reset_counters(struct net_device *dev)
> > +static int rtl8169_reset_counters(struct net_device *dev)
> >  {
> > struct rtl8169_private *tp = netdev_priv(dev);
> > -   struct rtl8169_counters *counters;
> > -   dma_addr_t paddr;
> > -   bool ret = true;
> >  
> > /*
> >  * Versions prior to RTL_GIGA_MAC_VER_19 don't support resetting the
> >  * tally counters.
> >  */
> > if (tp->mac_version < RTL_GIGA_MAC_VER_19)
> > -   return true;
> > -
> > -   counters = rtl8169_map_counters(dev, &paddr, CounterReset);
> > -   if (!counters)
> > -   return false;
> > -
> > -   if (!rtl_udelay_loop_wait_low(tp, &rtl_reset_counters_cond, 10, 1000))
> > -   ret = false;
> > -
> > -   rtl8169_unmap_counters(dev, paddr, counters);
> > +   return -EINVAL;
> 
> This returns -EINVAL even for older chip versions which are not capable
> of resetting the counters.  The result is, this driver will not work at
> all on chip versions prior to RTL_GIGA_MAC_VER_19 anymore, because
> rtl_open will always fail.
> 
> Other then that, I can test the patch probably tomorrow.

I have a bit of a problem with this patch.  It crashes when loading the
driver manually w/ modprobe.  For some reason dev_get_stats is called
during rtl_init_one and at that time the counters pointer is NULL, so
the kernel gets a SEGV.  After I worked around that I got a SEGV in
rtl_remove_one, which I still have to find out why.  I didn't test with
the latest kernel, though, so I still have to check if that's the reason
for the crashes.  That takes a bit longer, I just wanted to let you
know.

There's also a problem with rtl8169_cmd_counters:  It always calls
rtl_udelay_loop_wait_low w/ rtl_reset_counters_cond, even when called
from rtl8169_update_counters, where it should call
rtl_udelay_loop_wait_low w/ rtl_counters_cond to check for the
CounterDump flag, rather than for the CounterReset flag.

For now I applied the below patch, but I think it's a bit ugly to
conditionalize the rtl_cond struct by the incoming flag value.


Corinna


diff --git a/drivers/net/ethernet/realtek/r8169.c 
b/drivers/net/ethernet/realtek/r8169.c
index ae07567..cd7adbf 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -2200,6 +2200,13 @@ DECLARE_RTL_COND(rtl_reset_counters_cond)
return RTL_R32(CounterAddrLow) & CounterReset;
 }
 
+DECLARE_RTL_COND(rtl_counters_cond)
+{
+   void __iomem *ioaddr = tp->mmio_addr;
+
+   return RTL_R32(CounterAddrLow) & CounterDump;
+}
+
 static int rtl8169_cmd_counters(struct net_device *dev, u32 counter_cmd)
 {
struct rtl8169_private *tp = netdev_priv(dev);
@@ -2212,7 +2219,10 @@ static int rtl8169_cmd_counters(struct net_device *dev, 
u32 counter_cmd)
RTL_W32(CounterAddrLow, cmd);
RTL_W32(CounterAddrLow, cmd | counter_cmd);
 
-   return rtl_udelay_loop_wait_low(tp, &rtl_reset_counters_cond, 10, 1000);
+   return rtl_udelay_loop_wait_low(tp,
+   counter_cmd == CounterDump ? &rtl_counters_cond :
+&rtl_reset_counters_cond,
+   10, 1000);
 }
 
 static int rtl8169_reset_counters(struct net_device *dev)
@@ -2229,13 +2239,6 @@ static int rtl8169_reset_counters(struct net_device *dev)
return rtl8169_cmd_counters(dev, CounterReset);
 }
 
-DECLARE_RTL_COND(rtl_counters_cond)
-{
-   void __iomem *ioaddr = tp->mmio_addr;
-
-   return RTL_R32(CounterAddrLow) & CounterDump;
-}
-
 static int rtl8169_update_counters(struct net_device *dev)
 {
struct rtl8169_private *tp = netdev_priv(dev);
@@ -7800,8 +7803,11 @@ rtl8169_get_stats64(struct net_device *dev, struct 
rtnl_link_stats64 *stats)
 
/*
 * Fetch additonal counter values missing in stats collected by driver
-* from tally counters.
+* from tally counters, but only if w

Re: [PATCH net 1/1] r8169: fix sleepable allocation during netdevice stats retrieval.

2015-09-07 Thread poma

On 07.09.2015 10:50, Corinna Vinschen wrote:
> On Sep  7 09:13, poma wrote:
>> On 06.09.2015 12:19, Corinna Vinschen wrote:
>>> On Sep  4 22:59, Francois Romieu wrote:
  Applies against davem's net as of 
 f1ccbfce2fc787981d1182d09c1f6b67766783a8.

  drivers/net/ethernet/realtek/r8169.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/drivers/net/ethernet/realtek/r8169.c 
 b/drivers/net/ethernet/realtek/r8169.c
 index 24dcbe6..56829ea 100644
 --- a/drivers/net/ethernet/realtek/r8169.c
 +++ b/drivers/net/ethernet/realtek/r8169.c
 @@ -2200,7 +2200,7 @@ static struct rtl8169_counters 
 *rtl8169_map_counters(struct net_device *dev,
struct rtl8169_counters *counters;
u32 cmd;
  
 -  counters = dma_alloc_coherent(d, sizeof(*counters), paddr, GFP_KERNEL);
 +  counters = dma_alloc_coherent(d, sizeof(*counters), paddr, GFP_ATOMIC);
if (counters) {
RTL_W32(CounterAddrHigh, (u64)*paddr >> 32);
cmd = (u64)*paddr & DMA_BIT_MASK(32);
 -- 
 2.4.3
>>> [...]
>> 1. Reverted to r8169.c?id=eb78139
>>- the noise is still present
>> 2. Patches applied - Francois Romieu (3):
>>r8169: decouple the counters data and the device private area.
>>r8169: move rtl_reset_counters_cond before the hardware counters helpers.
>>r8169: increase the lifespan of the hardware counters dump area.
>>- the noise is still present
> 
> Sure you tested the right code?  I'm just asking because...
> 
>> [   70.016445]  [] dump_stack+0x4b/0x63
>> [   70.016456]  [] lockdep_rcu_suspicious+0xd7/0x110
>> [   70.016465]  [] ___might_sleep+0xa7/0x230
>> [   70.016472]  [] __might_sleep+0x49/0x80
>> [   70.016481]  [] __alloc_pages_nodemask+0x2fe/0xb90
>> [   70.016490]  [] ? debug_lockdep_rcu_enabled+0x1d/0x20
>> [   70.016499]  [] ? sched_clock+0x9/0x10
>> [   70.016507]  [] ? local_clock+0x1c/0x20
>> [   70.016514]  [] ? debug_lockdep_rcu_enabled+0x1d/0x20
>> [   70.016524]  [] dma_generic_alloc_coherent+0x96/0x130
>> [   70.016534]  [] x86_swiotlb_alloc_coherent+0x25/0x50
>> [   70.016541]  [] dma_alloc_attrs+0x6d/0xe0
>> [   70.016555]  [] rtl8169_map_counters+0x3e/0x70 [r8169]
> 
> ...rtl8169_map_counters only exists before Francois patch.  The patch
> removes this function.
> 
> 
> Corinna
> 

I do not know, is this OK?

$ rpm -i 
https://kojipkgs.fedoraproject.org/packages/kernel/4.3.0/0.rc0.git7.1.fc24/src/kernel-4.3.0-0.rc0.git7.1.fc24.src.rpm
$ rpmbuild -bp rpmbuild/SPECS/kernel.spec
$ cd rpmbuild/BUILD/kernel-4.2.fc24/linux-4.3.0-0.rc0.git7.1.fc24.x86_64/

$ curl -s "http://marc.info/?l=linux-netdev&m=144145559429943&q=raw"; | patch -p1
patching file drivers/net/ethernet/realtek/r8169.c
patch unexpectedly ends in middle of line
$ curl -s "http://marc.info/?l=linux-netdev&m=144145559729944&q=raw"; | patch -p1
patching file drivers/net/ethernet/realtek/r8169.c
patch unexpectedly ends in middle of line
$ curl -s "http://marc.info/?l=linux-netdev&m=144145558929942&q=raw"; | patch -p1
patching file drivers/net/ethernet/realtek/r8169.c
patch unexpectedly ends in middle of line

$ sed -i 's/PATCHLEVEL = 2/PATCHLEVEL = 3/' Makefile
$ sed -i 's/EXTRAVERSION =/EXTRAVERSION = -0.rc0.git7.1.fc24.x86_64/' Makefile

$ make -j drivers/net/ethernet/realtek/r8169.ko
$ su
# cp drivers/net/ethernet/realtek/r8169.ko 
/usr/lib/modules/4.3.0-0.rc0.git7.1.fc24.x86_64/updates/
# depmod 

# dracut -f --kver 4.3.0-0.rc0.git7.1.fc24.x86_64 
# lsinitrd --kver 4.3.0-0.rc0.git7.1.fc24.x86_64 | grep r8169
... usr/lib/modules/4.3.0-0.rc0.git7.1.fc24.x86_64/updates/r8169.ko

# systemctl reboot


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [V9fs-developer] [PATCH] 9p: trans_fd, initialize recv fcall properly if not set

2015-09-07 Thread Eric Van Hensbergen

I thought the nature of trans_fd would have prevented any sort of true
zero copy, but I suppose one less is always welcome :)

-eric


On Sun, Sep 6, 2015 at 1:55 AM, Dominique Martinet
 wrote:
> Eric Van Hensbergen wrote on Sat, Sep 05, 2015:
>> On Thu, Sep 3, 2015 at 4:38 AM, Dominique Martinet
>>  wrote:
>> > To be honest, I think it might be better to just bail out if we get in
>> > this switch (m->req->rc == NULL after p9_tag_lookup) and not try to
>> > allocate more, because if we get there it's likely a race condition and
>> > silently re-allocating will end up in more troubles than trying to
>> > recover is worth.
>> > Thoughts ?
>> >
>>
>> Hmmm...trying to rattle my brain and remember why I put it in there
>> back in 2008.
>> It might have just been over-defensive programming -- or more likely it just
>> pre-dated all the zero copy infrastructure which pretty much guaranteed we 
>> had
>> an rc allocated and what is there is vestigial.  I'm happy to accept a
>> patch which
>> makes this an assert, or perhaps just resets the connection because something
>> has gone horribly wrong (similar to the ENOMEM path that is there now).
>
> Yeah, it looks like the safety comes from the zero-copy stuff that came
> much later.
> Let's go with resetting the connection then. Hmm. EIO is a bit too
> generic so would be good to avoid that if possible, but can't think of
> anything better...
>
>
> Speaking of zero-copy, I believe it should be fairly straight-forward to
> implement for trans_fd now I've actually looked at it, since we do the
> payload read after a p9_tag_lookup, would just need m->req to point to a
> zc buffer. Write is similar, if there's a zc buffer just send it after
> the header.
> The cost is a couple more pointers in req and an extra if in both
> workers, that seems pretty reasonable.
>
> Well, I'm not using trans_fd much here (and unfortunately zero-copy
> isn't possible at all given the transport protocol for RDMA, at least
> for recv), but if anyone cares it probably could be done without too
> much hassle for the fd workers.
>
> --
> Dominique
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] ath6kl: remove redundant null pointer check on send_pkt

2015-09-07 Thread Colin King

From: Colin Ian King 

The check for send_pkt being NULL is redundant before the call
to htc_reclaim_txctrl_buf, therefore it should be removed. This was
detected by static analysis by cppcheck.

Signed-off-by: Colin Ian King 
---
 drivers/net/wireless/ath/ath6kl/htc_mbox.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/wireless/ath/ath6kl/htc_mbox.c 
b/drivers/net/wireless/ath/ath6kl/htc_mbox.c
index e481f14..fffb65b 100644
--- a/drivers/net/wireless/ath/ath6kl/htc_mbox.c
+++ b/drivers/net/wireless/ath/ath6kl/htc_mbox.c
@@ -1085,9 +1085,7 @@ static int htc_setup_tx_complete(struct htc_target 
*target)
send_pkt->completion = NULL;
ath6kl_htc_tx_prep_pkt(send_pkt, 0, 0, 0);
status = ath6kl_htc_tx_issue(target, send_pkt);
-
-   if (send_pkt != NULL)
-   htc_reclaim_txctrl_buf(target, send_pkt);
+   htc_reclaim_txctrl_buf(target, send_pkt);
 
return status;
 }
-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] x86: Wire up 32-bit direct socket calls

2015-09-07 Thread Arnd Bergmann

On Wednesday 02 September 2015 13:16:19 H. Peter Anvin wrote:
> On 09/02/2015 02:48 AM, Geert Uytterhoeven wrote:
> > 
> > Should all other architectures follow suit?
> > Or should we follow the s390 approach:
> > 
> 
> It is up to the maintainer(s), largely dependent on how likely you are
> going to want to support this in your libc, but in general, socketcall
> is an abomination which there is no reason not to bypass.
> 
> So follow suit unless you have a strong reason not to.

+1

In my y2038 syscall series, I'm adding a new recvmmsg64 call, and
we may decide to add new setsockopt/getsockopt variants as well.
This is probably not the last change to socketcall, and it would
be made much easier if all architectures had separate calls here.

It seems that there are very few architectures that don't already have
the separate calls:

$ git grep -l __NR_socketcall arch/*/include/uapi  | xargs git grep -L recvmsg 
arch/cris/include/uapi/asm/unistd.h
arch/frv/include/uapi/asm/unistd.h
arch/m32r/include/uapi/asm/unistd.h
arch/m68k/include/uapi/asm/unistd.h
arch/mn10300/include/uapi/asm/unistd.h
arch/s390/include/uapi/asm/unistd.h

These are of course all examples of architectures that originally followed
the i386 syscall scheme closely rather than trying to leave out obsolete
calls.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] ip link/addr show: add empty line between interfaces

2015-09-07 Thread Jiri Benc

On Sun,  6 Sep 2015 13:01:09 +0200, Felix Kaiser wrote:
> This improves the readability of the output.

Or makes the readability worse by requiring more scrolling, depending
on number of interfaces you have and your taste. Many users prefer the
current format.

Also, as sad as it is, there are scripts out there that parse ip output
and I have no doubts this would break them.

 Jiri

-- 
Jiri Benc
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH net v3] ipv6: fix multipath route replace error recovery

2015-09-07 Thread Nicolas Dichtel


Le 06/09/2015 22:46, Roopa Prabhu a écrit :

From: Roopa Prabhu 

I've sent you some comments about the v2, so please keep me in CC for the next
versions.



Problem:
The ecmp route replace support for ipv6 in the kernel, deletes the
existing ecmp route too early, ie when it installs the first nexthop.
If there is an error in installing the subsequent nexthops, its too late
to recover the already deleted existing route

This patch fixes the problem with the following:

It does not really 'fix' the problem, it only reduces the probability to have
an error. This is really different. The status is much better after this patch,
but it could be good to reword a bit the commitlog to reflect that.


a) Changes the existing multipath route add code to a two stage process:
   build rt6_infos + insert them
ip6_route_add rt6_info creation code is moved into
ip6_route_info_create.
b) This ensures that all errors are caught during building rt6_infos
   and we fail early
c) Separates multipath add and del code. Because add needs the special
   two stage mode in a) and delete essentially does not care.
d) In any event if the code fails during inserting a route again, a
   warning is printed (This should be unlikely)

Before the patch:
$ip -6 route show
3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024
3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024
3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024

/* Try replacing the route with a duplicate nexthop */
$ip -6 route change 3000:1000:1000:1000::2/128 nexthop via
fe80::202:ff:fe00:b dev swp49s0 nexthop via fe80::202:ff:fe00:d dev
swp49s1 nexthop via fe80::202:ff:fe00:d dev swp49s1
RTNETLINK answers: File exists

$ip -6 route show
/* previously added ecmp route 3000:1000:1000:1000::2 dissappears from
  * kernel */

After the patch:
$ip -6 route show
3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024
3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024
3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024

/* Try replacing the route with a duplicate nexthop */
$ip -6 route change 3000:1000:1000:1000::2/128 nexthop via
fe80::202:ff:fe00:b dev swp49s0 nexthop via fe80::202:ff:fe00:d dev
swp49s1 nexthop via fe80::202:ff:fe00:d dev swp49s1
RTNETLINK answers: File exists

$ip -6 route show
3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024
3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024
3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024

Fixes: 27596472473a ("ipv6: fix ECMP route replacement")

As said in the v2 thread, I still don't agree with this tag.

[snip]

+static void ip6_print_replace_route_err(struct list_head *rt6_nh_list)
+{
+   struct rt6_nh *nh;
+
+   list_for_each_entry(nh, rt6_nh_list, next) {
+   pr_warn("IPV6: replace premature del %pI6 nexthop %pI6 ifi 
%d\n",

I don't think that a user (who didn't read the code) can understand this
sentence. Another suggestion:
"ECMPv6: route replacement failed (check the consistency of the installed 
route)". Not sure that the nexthops should be listed after.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH net] cxgb4: Changes for new firmware 1.14.4.0

2015-09-07 Thread Hariprasad Shenai

Incorporate fw_ldst_cmd structure change for new firmware and also
update version string for the same

Signed-off-by: Hariprasad Shenai 
---
 drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h | 33 +--
 drivers/net/ethernet/chelsio/cxgb4/t4fw_version.h | 12 -
 2 files changed, 37 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h 
b/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h
index ab46746..a32de30 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h
@@ -762,8 +762,6 @@ enum fw_ldst_func_mod_index {
 
 struct fw_ldst_cmd {
__be32 op_to_addrspace;
-#define FW_LDST_CMD_ADDRSPACE_S0
-#define FW_LDST_CMD_ADDRSPACE_V(x) ((x) << FW_LDST_CMD_ADDRSPACE_S)
__be32 cycles_to_len16;
union fw_ldst {
struct fw_ldst_addrval {
@@ -788,6 +786,13 @@ struct fw_ldst_cmd {
__be16 vctl;
__be16 rval;
} mdio;
+   struct fw_ldst_cim_rq {
+   u8 req_first64[8];
+   u8 req_second64[8];
+   u8 resp_first64[8];
+   u8 resp_second64[8];
+   __be32 r3[2];
+   } cim_rq;
union fw_ldst_mps {
struct fw_ldst_mps_rplc {
__be16 fid_idx;
@@ -828,9 +833,33 @@ struct fw_ldst_cmd {
__be16 nset_pkd;
__be32 data[12];
} pcie;
+   struct fw_ldst_i2c_deprecated {
+   u8 pid_pkd;
+   u8 base;
+   u8 boffset;
+   u8 data;
+   __be32 r9;
+   } i2c_deprecated;
+   struct fw_ldst_i2c {
+   u8 pid;
+   u8 did;
+   u8 boffset;
+   u8 blen;
+   __be32 r9;
+   __u8   data[48];
+   } i2c;
+   struct fw_ldst_le {
+   __be32 index;
+   __be32 r9;
+   u8 val[33];
+   u8 r11[7];
+   } le;
} u;
 };
 
+#define FW_LDST_CMD_ADDRSPACE_S0
+#define FW_LDST_CMD_ADDRSPACE_V(x) ((x) << FW_LDST_CMD_ADDRSPACE_S)
+
 #define FW_LDST_CMD_MSG_S   31
 #define FW_LDST_CMD_MSG_V(x)   ((x) << FW_LDST_CMD_MSG_S)
 
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4fw_version.h 
b/drivers/net/ethernet/chelsio/cxgb4/t4fw_version.h
index 32b2135..58a0327 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4fw_version.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4fw_version.h
@@ -36,18 +36,18 @@
 #define __T4FW_VERSION_H__
 
 #define T4FW_VERSION_MAJOR 0x01
-#define T4FW_VERSION_MINOR 0x0D
-#define T4FW_VERSION_MICRO 0x20
+#define T4FW_VERSION_MINOR 0x0E
+#define T4FW_VERSION_MICRO 0x04
 #define T4FW_VERSION_BUILD 0x00
 
 #define T5FW_VERSION_MAJOR 0x01
-#define T5FW_VERSION_MINOR 0x0D
-#define T5FW_VERSION_MICRO 0x20
+#define T5FW_VERSION_MINOR 0x0E
+#define T5FW_VERSION_MICRO 0x04
 #define T5FW_VERSION_BUILD 0x00
 
 #define T6FW_VERSION_MAJOR 0x01
-#define T6FW_VERSION_MINOR 0x0D
-#define T6FW_VERSION_MICRO 0x2D
+#define T6FW_VERSION_MINOR 0x0E
+#define T6FW_VERSION_MICRO 0x04
 #define T6FW_VERSION_BUILD 0x00
 
 #endif
-- 
2.3.4

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH net-next v2] ipv6: fix multipath route replace error recovery

2015-09-07 Thread Nicolas Dichtel


+ Michal Kubecek

Le 06/09/2015 22:46, roopa a écrit :

On 9/4/15, 1:12 AM, Nicolas Dichtel wrote:

Le 03/09/2015 01:44, Roopa Prabhu a écrit :

[snip]

Fixes: 4a287eba2de3 ("IPv6 routing, NLM_F_* flag support: REPLACE and EXCL
flags support, warn about missing CREATE flag")

ECMP was added one year after this patch. The right tag is:
Fixes: 51ebd3181572 ("ipv6: add support of equal cost multipath (ECMP)")

I went back and looked again. It is a recent one 27596472473a ("ipv6: fix ECMP
route replacement").

The bug you're trying to fix has been introduced by the commit 51ebd3181572
("ipv6: add support of equal cost multipath (ECMP)").
Commit 27596472473a ("ipv6: fix ECMP route replacement") didn't try to fix this
problem.

Note that the first patch you submitted to fix this pb (cf
http://patchwork.ozlabs.org/patch/362296/) was in June 2014, ie one year before
the commit 27596472473a.


Regards,
Nicolas
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [GIT] Networking

2015-09-07 Thread David Laight

From: Rustad, Mark D
...
> >> static int smp_ah(struct crypto_blkcipher *tfm, const u8 irk[16],
> >>  const u8 r[3], u8 res[3])
> >
> > Expect that it looks like you are passing arrays by value,
> > but instead you are passing by reference.
> >
> > Explicitly pass by reference and sizeof works.
> 
> It depends on what you mean by works. It at least doesn't look so misleading 
> when passing by reference
> and so works more as expected. The sizeof in either case will never return 
> the size of the array. To
> have sizeof return the size of the array would require a typedef of the array 
> to pass by reference. In
> some cases that could be the right thing to do.

Feed:
int bar(int (*f)[10]) { return sizeof *f; }
into cc -O2 -S and look at the generated code - returns 40 not 4.

David

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: IPv6 xfrm GSO fragmentation bug

2015-09-07 Thread Steffen Klassert

On Fri, Sep 04, 2015 at 01:21:06PM +0800, Herbert Xu wrote:
> On Mon, Aug 31, 2015 at 03:35:26PM +0800, Herbert Xu wrote:
> > 
> > I see where the bug came from.  Indeed IPv6 does do fragmentation
> > but only for tunnel mode.  While your patch added a check that also
> > affected transport mode.  So in addition to the GSO fix we should
> > also make the MTU check conditional to tunnel mode.
> 
> Here is the patch:
> 
> ---8<---
> ipv6: Fix IPsec pre-encap fragmentation check
> 
> The IPv6 IPsec pre-encap path performs fragmentation for tunnel-mode
> packets.  That is, we perform fragmentation pre-encap rather than
> post-encap.
> 
> A check was added later to ensure that proper MTU information is
> passed back for locally generated traffic.  Unfortunately this
> check was performed on all IPsec packets, including transport-mode
> packets.
> 
> What's more, the check failed to take GSO into account.
> 
> The end result is that transport-mode GSO packets get dropped at
> the check.
> 
> This patch fixes it by moving the tunnel mode check forward as well
> as adding the GSO check.
> 
> Fixes: dd767856a36e ("xfrm6: Don't call icmpv6_send on local error")
> Signed-off-by: Herbert Xu 

Applied to the ipsec tree, thanks Herbert!
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[net-next PATCH v3] drivers: net: cpsw: Add support to drive gpios for ethernet to be functional

2015-09-07 Thread Mugunthan V N

In DRA72x EVM, by default slave 1 is connected to the onboard
phy, but slave 2 pins are also muxed with video input module
which is controlled by pcf857x gpio and currently to select slave
0 to connect to phy gpio hogging is used, but with
omap2plus_defconfig the pcf857x gpio is built as module. So when
using NFS on DRA72x EVM, board doesn't boot as gpio hogging do
not set proper gpio state to connect slave 0 to phy as it is
built as module and you do not see any errors for not setting
gpio and just mentions dhcp reply not got.

To solve this issue, introducing "mode-gpios" in DT when gpio
based muxing is required. This will throw a warning when gpio
get fails and returns probe defer. When gpio-pcf857x module is
installed, cpsw probes again and ethernet becomes functional.
Verified this on DRA72x with pcf as module and ramdisk.

Signed-off-by: Mugunthan V N 
---

Changes from v2:
* Used mode-gpios, so that the driver is generic enough to handle
  multiple gpios

This patch is tested on DRA72x, Logs [1] and pushed a branch [2]

[1]: http://pastebin.ubuntu.com/12306224/
[2]: git://git.ti.com/~mugunthanvnm/ti-linux-kernel/linux.git 
cpsw-gpio-optional-v3

---
 Documentation/devicetree/bindings/net/cpsw.txt | 7 +++
 drivers/net/ethernet/ti/cpsw.c | 9 +
 2 files changed, 16 insertions(+)

diff --git a/Documentation/devicetree/bindings/net/cpsw.txt 
b/Documentation/devicetree/bindings/net/cpsw.txt
index a9df21a..676ecf6 100644
--- a/Documentation/devicetree/bindings/net/cpsw.txt
+++ b/Documentation/devicetree/bindings/net/cpsw.txt
@@ -30,6 +30,13 @@ Optional properties:
 - dual_emac: Specifies Switch to act as Dual EMAC
 - syscon   : Phandle to the system control device node, which is
  the control module device of the am33x
+- mode-gpios   : Should be added if one/multiple gpio lines are
+ required to be driven so that cpsw data lines
+ can be connected to the phy via selective mux.
+ For example in dra72x-evm, pcf gpio has to be
+ driven low so that cpsw slave 0 and phy data
+ lines are connected via mux.
+
 
 Slave Properties:
 Required properties:
diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 8fc90f1..c670317 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -2207,6 +2208,7 @@ static int cpsw_probe(struct platform_device *pdev)
void __iomem*ss_regs;
struct resource *res, *ss_res;
const struct of_device_id   *of_id;
+   struct gpio_descs   *mode;
u32 slave_offset, sliver_offset, slave_size;
int ret = 0, i;
int irq;
@@ -2232,6 +2234,13 @@ static int cpsw_probe(struct platform_device *pdev)
goto clean_ndev_ret;
}
 
+   mode = devm_gpiod_get_array_optional(&pdev->dev, "mode", GPIOD_OUT_LOW);
+   if (IS_ERR(mode)) {
+   ret = PTR_ERR(mode);
+   dev_err(&pdev->dev, "gpio request failed, ret %d\n", ret);
+   goto clean_ndev_ret;
+   }
+
/*
 * This may be required here for child devices.
 */
-- 
2.6.0.rc0.24.gec371ff

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: slab-nomerge (was Re: [git pull] device mapper changes for 4.3)

2015-09-07 Thread Jesper Dangaard Brouer

On Thu, 3 Sep 2015 20:51:09 -0700 Linus Torvalds 
 wrote:

> On Thu, Sep 3, 2015 at 8:26 PM, Dave Chinner  wrote:
> >
> > The double standard is the problem here. No notification, proof,
> > discussion or review was needed to turn on slab merging for
> > everyone, but you're setting a very high bar to jump if anyone wants
> > to turn it off in their code.
> 
> Ehh. You realize that almost the only load that is actually seriously
> allocator-limited is networking?
> 
> And slub was beating slab on that? And slub has been doing the merging
> since day one. Slab was just changed to try to keep up with the
> winning strategy.

Sorry, I have to correct you on this.  The slub allocator is not as
fast as you might think.  The slab allocator is actually faster for
networking.

IP-forwarding, single CPU, single flow UDP (highly tuned):
 * Allocator slub: 2043575 pps
 * Allocator slab: 2088295 pps

Difference slab faster than slub:
 * +44720 pps and -10.48ns

The slub allocator have a faster "fastpath", if your workload is
fast-reusing within the same per-cpu page-slab, but once the workload
increases you hit the slowpath, and then slab catches up. Slub looks
great in micro-benchmarking.

As you can see in patchset:
 [PATCH 0/3] Network stack, first user of SLAB/kmem_cache bulk free API.
 http://thread.gmane.org/gmane.linux.kernel.mm/137469/focus=376625

I'm working on speeding up slub to the level of slab.  And it seems
like I have succeeded with half-a-nanosec 2090522 pps (+2227 pps or
0.51 ns).

And with "slab_nomerge" I get even high performance:
 * slub: bulk-free and slab_nomerge: 2121824 pps
 * Diff to slub: +78249 and -18.05ns

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH net 3/3] r8169: increase the lifespan of the hardware counters dump area.

2015-09-07 Thread Corinna Vinschen

On Sep  6 22:21, Francois Romieu wrote:
> Corinna Vinschen  :
> > On Sep  5 14:18, rom...@fr.zoreil.com wrote:
> [...]
> > > - rtl_reset_counters_cond induced failures in open() are also considered
> > >   fatal: it takes acceptable work to unwind comfortably.
> > 
> > Why?
> 
> 
> Crap, my description does not match the code wrt rtl_reset_counters_cond. :o/
> 
> s/rtl8169_reset_counters/rtl8169_update_counters/g
> 
> The code is right.
> 
> [...]
> > This returns -EINVAL even for older chip versions which are not capable
> > of resetting the counters.  The result is, this driver will not work at
> > all on chip versions prior to RTL_GIGA_MAC_VER_19 anymore, because
> > rtl_open will always fail.
> 
> No. My changelog was misleading. rtl_init_counter_offsets handles this part
> correctly.

Oh, right, I missed that rtl_init_counter_offsets checks for `if (!rc)',
not for `if (rc < 0)' as for the call to rtl8169_update_counters.

Still wondering though.  Given that the driver never failed before if
the counter values couldn't be updated, and given that these counter
values only have statistical relevance, why should this suddenly result
in a fatal failure at open time?

Corinna

pgpU8NPcDgH_n.pgp
Description: PGP signature

RE: [PATCH] net: tipc: fix stall during bclink wakeup procedure

2015-09-07 Thread Kolmakov Dmitriy

If an attempt to wake up users of broadcast link is made when there is
no enough place in send queue than it may hang up inside the
tipc_sk_rcv() function since the loop breaks only after the wake up
queue becomes empty. This can lead to complete CPU stall with the
following message generated by RCU:

INFO: rcu_sched self-detected stall on CPU { 0}  (t=2101 jiffies
g=54225 c=54224 q=11465)
Task dump for CPU 0:
tpchR  running task0 39949  39948 0x000a
 818536c0 88181fa037a0 8106a4be 
 818536c0 88181fa037c0 8106d8a8 88181fa03800
 0001 88181fa037f0 81094a50 88181fa15680
Call Trace:
   [] sched_show_task+0xae/0x120
 [] dump_cpu_task+0x38/0x40
 [] rcu_dump_cpu_stacks+0x90/0xd0
 [] rcu_check_callbacks+0x3eb/0x6e0
 [] ? account_system_time+0x7f/0x170
 [] update_process_times+0x34/0x60
 [] tick_sched_handle.isra.18+0x31/0x40
 [] tick_sched_timer+0x3c/0x70
 [] __run_hrtimer.isra.34+0x3d/0xc0
 [] hrtimer_interrupt+0xc5/0x1e0
 [] ? native_smp_send_reschedule+0x42/0x60
 [] local_apic_timer_interrupt+0x34/0x60
 [] smp_apic_timer_interrupt+0x3c/0x60
 [] apic_timer_interrupt+0x6b/0x70
 [] ? _raw_spin_unlock_irqrestore+0x9/0x10
 [] __wake_up_sync_key+0x4f/0x60
 [] tipc_write_space+0x31/0x40 [tipc]
 [] filter_rcv+0x31f/0x520 [tipc]
 [] ? tipc_sk_lookup+0xc9/0x110 [tipc]
 [] ? _raw_spin_lock_bh+0x19/0x30
 [] tipc_sk_rcv+0x2dc/0x3e0 [tipc]
 [] tipc_bclink_wakeup_users+0x2f/0x40 [tipc]
 [] tipc_node_unlock+0x186/0x190 [tipc]
 [] ? kfree_skb+0x2c/0x40
 [] tipc_rcv+0x2ac/0x8c0 [tipc]
 [] tipc_l2_rcv_msg+0x38/0x50 [tipc]
 [] __netif_receive_skb_core+0x5a3/0x950
 [] __netif_receive_skb+0x13/0x60
 [] netif_receive_skb_internal+0x1e/0x90
 [] napi_gro_receive+0x78/0xa0
 [] tg3_poll_work+0xc54/0xf40 [tg3]
 [] ? consume_skb+0x2c/0x40
 [] tg3_poll_msix+0x41/0x160 [tg3]
 [] net_rx_action+0xe2/0x290
 [] __do_softirq+0xda/0x1f0
 [] irq_exit+0x76/0xa0
 [] do_IRQ+0x55/0xf0
 [] common_interrupt+0x6b/0x6b
 

The issue occurs only when tipc_sk_rcv() is used to wake up postponed
senders:

tipc_bclink_wakeup_users()
// wakeupq - is a queue which consists of special
//   messages with SOCK_WAKEUP type.
tipc_sk_rcv(wakeupq)
...
while (skb_queue_len(inputq)) {
filter_rcv(skb)
// Here the type of message is checked
// and if it is SOCK_WAKEUP then
// it tries to wake up a sender.
tipc_write_space(sk)

wake_up_interruptible_sync_poll()
}

After the sender thread is woke up it can gather control and perform
an attempt to send a message. But if there is no enough place in send
queue it will call link_schedule_user() function which puts a message
of type SOCK_WAKEUP to the wakeup queue and put the sender to sleep.
Thus the size of the queue actually is not changed and the while()
loop never exits.

The approach I proposed is to wake up only senders for which there is
enough place in send queue so the described issue can't occur.
Moreover the same approach is already used to wake up senders on
unicast links.

I have got into the issue on our product code but to reproduce the
issue I changed a benchmark test application (from
tipcutils/demos/benchmark) to perform the following scenario:
1. Run 64 instances of test application (nodes). It can be done
   on the one physical machine.
2. Each application connects to all other using TIPC sockets in
   RDM mode.
3. When setup is done all nodes start simultaneously send
   broadcast messages.
4. Everything hangs up.

The issue is reproducible only when a congestion on broadcast link
occurs. For example, when there are only 8 nodes it works fine since
congestion doesn't occur. Send queue limit is 40 in my case (I use a
critical importance level) and when 64 nodes send a message at the
same moment a congestion occurs every time.

Signed-off-by: Dmitry S Kolmakov 
Reviewed-by: Jon Maloy 
Acked-by: Ying Xue 
---
v2: Updated after comments from Jon and Ying.

diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
index c5cbdcb..997dd60 100644
--- a/net/tipc/bcast.c
+++ b/net/tipc/bcast.c
@@ -169,6 +169,30 @@ static void bclink_retransmit_pkt(struct tipc_net *tn, u32 
after, u32 to)
 }

 /**
+ * bclink_prepare_wakeup - prepare users for wakeup after congestion
+ * @bcl: broadcast link
+ * @resultq: queue for users which can be woken up
+ * Move a number of waiting users, as permitted by available space in
+ * the send queue, from link wait queue to specified queue for wakeup
+ */
+static void bclink_prepare_wakeup(struct tipc_link *bcl, struct sk_

Re: [PATCH net 1/1] r8169: fix sleepable allocation during netdevice stats retrieval.

2015-09-07 Thread Corinna Vinschen

On Sep  7 09:13, poma wrote:
> On 06.09.2015 12:19, Corinna Vinschen wrote:
> > On Sep  4 22:59, Francois Romieu wrote:
> >>  Applies against davem's net as of 
> >> f1ccbfce2fc787981d1182d09c1f6b67766783a8.
> >>
> >>  drivers/net/ethernet/realtek/r8169.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/net/ethernet/realtek/r8169.c 
> >> b/drivers/net/ethernet/realtek/r8169.c
> >> index 24dcbe6..56829ea 100644
> >> --- a/drivers/net/ethernet/realtek/r8169.c
> >> +++ b/drivers/net/ethernet/realtek/r8169.c
> >> @@ -2200,7 +2200,7 @@ static struct rtl8169_counters 
> >> *rtl8169_map_counters(struct net_device *dev,
> >>struct rtl8169_counters *counters;
> >>u32 cmd;
> >>  
> >> -  counters = dma_alloc_coherent(d, sizeof(*counters), paddr, GFP_KERNEL);
> >> +  counters = dma_alloc_coherent(d, sizeof(*counters), paddr, GFP_ATOMIC);
> >>if (counters) {
> >>RTL_W32(CounterAddrHigh, (u64)*paddr >> 32);
> >>cmd = (u64)*paddr & DMA_BIT_MASK(32);
> >> -- 
> >> 2.4.3
> > [...]
> 1. Reverted to r8169.c?id=eb78139
>- the noise is still present
> 2. Patches applied - Francois Romieu (3):
>r8169: decouple the counters data and the device private area.
>r8169: move rtl_reset_counters_cond before the hardware counters helpers.
>r8169: increase the lifespan of the hardware counters dump area.
>- the noise is still present

Sure you tested the right code?  I'm just asking because...

> [   70.016445]  [] dump_stack+0x4b/0x63
> [   70.016456]  [] lockdep_rcu_suspicious+0xd7/0x110
> [   70.016465]  [] ___might_sleep+0xa7/0x230
> [   70.016472]  [] __might_sleep+0x49/0x80
> [   70.016481]  [] __alloc_pages_nodemask+0x2fe/0xb90
> [   70.016490]  [] ? debug_lockdep_rcu_enabled+0x1d/0x20
> [   70.016499]  [] ? sched_clock+0x9/0x10
> [   70.016507]  [] ? local_clock+0x1c/0x20
> [   70.016514]  [] ? debug_lockdep_rcu_enabled+0x1d/0x20
> [   70.016524]  [] dma_generic_alloc_coherent+0x96/0x130
> [   70.016534]  [] x86_swiotlb_alloc_coherent+0x25/0x50
> [   70.016541]  [] dma_alloc_attrs+0x6d/0xe0
> [   70.016555]  [] rtl8169_map_counters+0x3e/0x70 [r8169]

...rtl8169_map_counters only exists before Francois patch.  The patch
removes this function.


Corinna


pgpVJzCq1DXC1.pgp
Description: PGP signature

Re: [RFC PATCH 1/3] net: introduce kfree_skb_bulk() user of kmem_cache_free_bulk()

2015-09-07 Thread Jesper Dangaard Brouer

On Fri, 4 Sep 2015 11:47:17 -0700 Tom Herbert  wrote:

> On Fri, Sep 4, 2015 at 10:00 AM, Jesper Dangaard Brouer  
> wrote:
> > Introduce the first user of SLAB bulk free API kmem_cache_free_bulk(),
> > in the network stack in form of function kfree_skb_bulk() which bulk
> > free SKBs (not skb clones or skb->head, yet).
> >
[...]
> > +/**
> > + * kfree_skb_bulk - bulk free SKBs when refcnt allows to
> > + * @skbs: array of SKBs to free
> > + * @size: number of SKBs in array
> > + *
> > + * If SKB refcnt allows for free, then release any auxiliary data
> > + * and then bulk free SKBs to the SLAB allocator.
> > + *
> > + * Note that interrupts must be enabled when calling this function.
> > + */
> > +void kfree_skb_bulk(struct sk_buff **skbs, unsigned int size)
> > +{
>
> What not pass a list of skbs (e.g. using skb->next)?

Because the next layer, the slab API needs an array:
  kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)

Look at the patch:
 [PATCH V2 3/3] slub: build detached freelist with look-ahead
 http://thread.gmane.org/gmane.linux.kernel.mm/137469/focus=137472

Where I use this array to progressively scan for objects belonging to
the same page.  (A subtle detail is I manage to zero out the array,
which is good from a security/error-handling point of view, as pointers
to the objects are not left dangling on the stack).


I cannot argue that, writing skb->next comes as an additional cost,
because the slUb free also writes into this cacheline.  Perhaps the
slAb allocator does not?

[...]
> > +   if (likely(cnt)) {
> > +   kmem_cache_free_bulk(skbuff_head_cache, cnt, (void **) 
> > skbs);
> > +   }
> > +}
> > +EXPORT_SYMBOL(kfree_skb_bulk);

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH 0/3] Network stack, first user of SLAB/kmem_cache bulk free API.

2015-09-07 Thread Jesper Dangaard Brouer

On Fri, 4 Sep 2015 11:09:21 -0700
Alexander Duyck  wrote:

> This is an interesting start.  However I feel like it might work better 
> if you were to create a per-cpu pool for skbs that could be freed and 
> allocated in NAPI context.  So for example we already have 
> napi_alloc_skb, why not just add a napi_free_skb

I do like the idea...

> and then make the array 
> of objects to be freed part of a pool that could be used for either 
> allocation or freeing?  If the pool runs empty you just allocate 
> something like 8 or 16 new skb heads, and if you fill it you just free 
> half of the list?

But I worry that this algorithm will "randomize" the (skb) objects.
And the SLUB bulk optimization only works if we have many objects
belonging to the same page.

It would likely be fastest to implement a simple stack (for these
per-cpu pools), but I again worry that it would randomize the
object-pages.  A simple queue might be better, but slightly slower.
Guess I could just reuse part of qmempool / alf_queue as a quick test.

Having a per-cpu pool in networking would solve the problem of the slub
per-cpu pool isn't large enough for our use-case.  On the other hand,
maybe we should fix slub to dynamically adjust the size of it's per-cpu
resources?

A pre-req knowledge (for people not knowing slub's internal details):
Slub alloc path will pickup a page, and empty all objects for that page
before proceeding to the next page.  Thus, slub bulk alloc will give
many objects belonging to the page.  I'm trying to keep these objects
grouped together until they can be free'ed in a bulk.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH net 1/1] r8169: fix sleepable allocation during netdevice stats retrieval.

2015-09-07 Thread poma

On 06.09.2015 12:19, Corinna Vinschen wrote:
> On Sep  4 22:59, Francois Romieu wrote:
>> net/core/net-sysfs.c::netstat_show fetches statistics under dev_base_lock.
>>
>> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=104031
>> Fixes: 6e85d5ad36a2 ("r8169: Add values missing in @get_stats64 from HW 
>> counters")
>> Signed-off-by: Francois Romieu 
>> Cc: Corinna Vinschen 
>> ---
>>
>>  Applies against davem's net as of f1ccbfce2fc787981d1182d09c1f6b67766783a8.
>>
>>  drivers/net/ethernet/realtek/r8169.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/realtek/r8169.c 
>> b/drivers/net/ethernet/realtek/r8169.c
>> index 24dcbe6..56829ea 100644
>> --- a/drivers/net/ethernet/realtek/r8169.c
>> +++ b/drivers/net/ethernet/realtek/r8169.c
>> @@ -2200,7 +2200,7 @@ static struct rtl8169_counters 
>> *rtl8169_map_counters(struct net_device *dev,
>>  struct rtl8169_counters *counters;
>>  u32 cmd;
>>  
>> -counters = dma_alloc_coherent(d, sizeof(*counters), paddr, GFP_KERNEL);
>> +counters = dma_alloc_coherent(d, sizeof(*counters), paddr, GFP_ATOMIC);
>>  if (counters) {
>>  RTL_W32(CounterAddrHigh, (u64)*paddr >> 32);
>>  cmd = (u64)*paddr & DMA_BIT_MASK(32);
>> -- 
>> 2.4.3
> 
> I'll have a look tomorrow what I'm back to work, but for the time
> being, two questions.
> 
> - The code worked for me in local testing without problem, so I'm rather
>   puzzled.  How can I reproduce the problem?
> 
> - I'm pretty new to this stuff, so I don't understand this:
> 
>   The dma_alloc_coherent(...,GFP_KERNEL) call is not new in the code,
>   it's there since at least 2010.  It appears to work fine in the
>   context of @get_ethtool_stats.  Why does this not work in the context
>   of @ndo_get_stats64?
> 
> 
> Thanks,
> Corinna
> 

1. Reverted to r8169.c?id=eb78139
   - the noise is still present
2. Patches applied - Francois Romieu (3):
   r8169: decouple the counters data and the device private area.
   r8169: move rtl_reset_counters_cond before the hardware counters helpers.
   r8169: increase the lifespan of the hardware counters dump area.
   - the noise is still present

$ modinfo -n r8169 
/lib/modules/4.3.0-0.rc0.git7.1.fc24.x86_64/updates/r8169.ko

This noise is induced via userspace, xfce4-netload-plugin,
http://goodies.xfce.org/projects/panel-plugins/xfce4-netload-plugin

$ grep -i device .config/xfce4/panel/netload-16.rc
Network_Device=enp3s0

$ ethtool -i enp3s0 | grep driver
driver: r8169

$ dmesg | grep panel-16-netloa | wc -l
237

$ dmesg
...
[   68.597764] bridge0: port 1(enp3s0) entered forwarding state

[   70.016291] ===
[   70.016296] [ INFO: suspicious RCU usage. ]
[   70.016303] 4.3.0-0.rc0.git7.1.fc24.x86_64 #1 Not tainted
[   70.016308] ---
[   70.016314] include/linux/rcupdate.h:579 Illegal context switch in RCU 
read-side critical section!
[   70.016319] 
   other info that might help us debug this:

[   70.016327] 
   rcu_scheduler_active = 1, debug_locks = 0
[   70.016334] 2 locks held by panel-16-netloa/1806:
[   70.016338]  #0:  (&p->lock){+.+.+.}, at: [] 
seq_read+0x4c/0x3e0
[   70.016362]  #1:  (rcu_read_lock){..}, at: [] 
dev_seq_start+0x5/0x140
[   70.016380] 
   stack backtrace:
[   70.016390] CPU: 3 PID: 1806 Comm: panel-16-netloa Not tainted 
4.3.0-0.rc0.git7.1.fc24.x86_64 #1
...
[   70.016435] Call Trace:
[   70.016445]  [] dump_stack+0x4b/0x63
[   70.016456]  [] lockdep_rcu_suspicious+0xd7/0x110
[   70.016465]  [] ___might_sleep+0xa7/0x230
[   70.016472]  [] __might_sleep+0x49/0x80
[   70.016481]  [] __alloc_pages_nodemask+0x2fe/0xb90
[   70.016490]  [] ? debug_lockdep_rcu_enabled+0x1d/0x20
[   70.016499]  [] ? sched_clock+0x9/0x10
[   70.016507]  [] ? local_clock+0x1c/0x20
[   70.016514]  [] ? debug_lockdep_rcu_enabled+0x1d/0x20
[   70.016524]  [] dma_generic_alloc_coherent+0x96/0x130
[   70.016534]  [] x86_swiotlb_alloc_coherent+0x25/0x50
[   70.016541]  [] dma_alloc_attrs+0x6d/0xe0
[   70.016555]  [] rtl8169_map_counters+0x3e/0x70 [r8169]
[   70.016567]  [] rtl8169_update_counters+0x64/0x140 [r8169]
[   70.016578]  [] rtl8169_get_stats64+0xbf/0x130 [r8169]
[   70.016587]  [] dev_get_stats+0x54/0x100
[   70.016595]  [] dev_seq_printf_stats+0x37/0x120
[   70.016605]  [] dev_seq_show+0x14/0x30
[   70.016611]  [] seq_read+0x2fa/0x3e0
[   70.016621]  [] proc_reg_read+0x42/0x70
[   70.016628]  [] __vfs_read+0x37/0x100
[   70.016637]  [] ? security_file_permission+0xa3/0xc0
[   70.016644]  [] vfs_read+0x86/0x130
[   70.016652]  [] SyS_read+0x58/0xd0
[   70.016660]  [] entry_SYSCALL_64_fastpath+0x12/0x76
[   70.016698] BUG: sleeping function called from invalid context at 
mm/page_alloc.c:3186
[   70.016704] in_atomic(): 1, irqs_disabled(): 0, pid: 1806, name: 
panel-16-netloa
[   70.016708] INFO: lockdep is turned off.
[   70.016715] CPU: 3 PID: 1806 Comm: panel-16-netloa Not tainted 
4.3.0-0.rc0.git

Re: [PATCH net 3/3] r8169: increase the lifespan of the hardware counters dump area.

2015-09-07 Thread David Miller

From: Francois Romieu 
Date: Sun, 6 Sep 2015 22:21:53 +0200

> Corinna Vinschen  :
>> On Sep  5 14:18, rom...@fr.zoreil.com wrote:
> [...]
>> > - rtl_reset_counters_cond induced failures in open() are also considered
>> >   fatal: it takes acceptable work to unwind comfortably.
>> 
>> Why?
> 
> 
> Crap, my description does not match the code wrt rtl_reset_counters_cond. :o/
> 
> s/rtl8169_reset_counters/rtl8169_update_counters/g
> 
> The code is right.

Please resubmit with fixed commit log messages, thanks!
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

52 matches

Mail list logo